BIANA

BIANA: Biologic Interaction and Network Analysis

Biana: a software framework for compiling biological interactions and analyzing networks
Javier Garcia-Garcia, Emre Guney, Ramon Aragues, Joan Planas-Iglesias and Baldo Oliva.
BMC Bioinformatics 2010, 11:56. 27 January 2010

BIANA Database Parsers

BIANA provides default parsers for the most common biological databases and formats. BIANA has been designed to store any kind of biological database, leaving it to the user to decide how data are integrated across databases by choosing which combinations of attributes must be shared. However, because there are many different databases, formats and versions, and different versions of the same database often use different formats, not every database with biological data has a working BIANA parser. Despite the existence of standard interchange formats, databases often change their formats, so parsers are not guaranteed to work with every database version. To address this, we provide a set of default parsers, which will be kept up to date in the list of available parsers below.

If an existing parser no longer works with a new database version, or you need a parser for a database not listed here, you can ask us to write one (this may take some time) or create a new parser yourself (see the 'How to make a parser for my own database or data' section below). Alternatively, you can use the BIANA Generic Parser, which accepts data in a predefined tab-separated format: once you convert your data to that format, the Generic Parser can parse it. Please refer to the BIANA Manual for details on using the Generic Parser.

Available Parsers

Parsed Database Name (or format)	Parser file
Uniprot	uniprotParser.py
PSI-MI 2.5 format	psimi25Parser.py
BioPAX Level 2 format	biopaxLevel2Parser.py
NCBI non-redundant (NR) BLAST database	nrParser.py
Clusters of Orthologous Groups database (COGs)	cogParser.py
GenBank	ncbiGenPeptParser.py
Taxonomy	taxonomyParser.py
Proteomics Standards Initiative Molecular Interactions ontology (PSI-MI OBO)	psimioboParser.py
International Protein Index Database (IPI)	ipiParser.py
HUGO Gene Nomenclature Committee (HGNC)	hgncParser.py
Kyoto Encyclopedia of Genes and Genomes (KEGG)	keggGeneParser.py, keggKOParser.py, keggligandParser.py
Structural Classification of Proteins (SCOP)	scopParser.py
Protein Families database (PFAM)	pfamParser.py
Gene Ontology (GO)	goParser.py
Tabulated File Parser (Generic Parser)	GenericParser.py
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)	stringParser.py

Proposed unification protocol to be used

Databases	Attributes
Uniprot, GenBank, IPI, KeggGene, COG, String	ProteinSequence AND taxID
Uniprot, HGNC, HPRD, DIP, MPACT, Reactome, IPI, BioGrid, MINT, IntAct, String	UniprotAccession
Uniprot, String	UniprotEntry
Uniprot, HGNC, HPRD, DIP, String	GeneID
Uniprot, SCOP (promiscuous)	PDB
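Each row above names the attribute (or combination of attributes) that entries from the listed databases must share to be unified. As a rough illustration of that idea, and not BIANA's own implementation (which works on its database tables), the sketch below groups toy external entities that have equal values for every attribute of at least one rule. It assumes unification is transitive: if A matches B by one rule and B matches C by another, all three end up in one group. The rule representation and the toy records are assumptions of this sketch.

```python
from collections import defaultdict


def unify(entities, attribute_sets):
    """Group entities that share values for every attribute in any one rule.

    entities: dict mapping an entity id to a dict of attribute name -> value.
    attribute_sets: list of tuples of attribute names (an AND rule per tuple).
    Returns a list of sets of entity ids, one set per unified entity.
    """
    parent = {eid: eid for eid in entities}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for attrs in attribute_sets:
        first_seen = {}
        for eid, values in entities.items():
            # An entity only takes part in a rule if it has all its attributes.
            if all(a in values for a in attrs):
                key = tuple(values[a] for a in attrs)
                if key in first_seen:
                    union(eid, first_seen[key])
                else:
                    first_seen[key] = eid

    groups = defaultdict(set)
    for eid in entities:
        groups[find(eid)].add(eid)
    return list(groups.values())


# Toy records (sequence values are made up for the example).
entities = {
    "uniprot:1": {"UniprotAccession": "P04637", "TaxID": 9606, "ProteinSequence": "MEEP"},
    "hgnc:1": {"UniprotAccession": "P04637", "GeneID": 7157},
    "string:1": {"ProteinSequence": "MEEP", "TaxID": 9606},
    "string:2": {"ProteinSequence": "MEEP", "TaxID": 10090},
}
rules = [("ProteinSequence", "TaxID"), ("UniprotAccession",), ("GeneID",)]
groups = unify(entities, rules)
```

Here string:2 stays separate because the ProteinSequence AND TaxID rule requires both values to match, and its taxonomy differs.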

How to make a parser for my own database or data

All BIANA parsers inherit from the BianaParser class found in biana/BianaParser/bianaParser.py. To write your own parser, create a new Python class whose parent is BianaParser. Then you only need to define the arguments your parser requires in the __init__ method (the class constructor) and override the parse_database method, which is responsible for reading your data files and inserting their information into the database.
Here is an example parser (MyDataParser.py) that inserts data in a user-specified format into BIANA. Let's go over the code.
We start by subclassing BianaParser:


# The star import is assumed to also provide ExternalEntity and the related classes used below
from bianaParser import *

class MyDataParser(BianaParser):
    """
    MyData Parser Class

    Parses data in the following format (meaning Uniprot_id1 interacts with Uniprot_id2 and some scores are associated with both the participants and the interaction):

        Uniprot_id1 Description1 Participant_score1 Uniprot_id2 Description2 Participant_score2 Interaction_Affinity_score
    """

    name = "mydata"
    description = "This file implements a program that fills up tables in BIANA database from data in MyData format"
    external_entity_definition = "An external entity represents a protein"
    external_entity_relations = "An external relation represents an interaction with given affinity"

Above, we introduce our parser and set its name and description attributes, which are mandatory fields BIANA uses to describe the parser. Next, we create the __init__ method, where we call the constructor of the parent class (BianaParser) with some additional descriptive arguments. You can add extra compulsory arguments to be requested from the user by passing "additional_compulsory_arguments" as a list of triplets (argument name, default value, description); see the list of command line arguments accepted by BianaParser by default.


    def __init__(self):
        """
        Start with the default values
        """
        BianaParser.__init__(self, default_db_description = "MyData parser",
                             default_script_name = "MyDataParser.py",
                             default_script_description = MyDataParser.description,
                             additional_compulsory_arguments = [])
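To make the triplet format concrete, a parser that needs one extra argument might build its list as in the sketch below. The min-affinity argument is hypothetical, and the trailing "=" (mirroring default argument names such as "biana-dbname=") is an assumption rather than something this page confirms.

```python
# Hypothetical extra argument for MyDataParser: a minimum affinity score to insert.
# Each entry is a triplet (argument name, default value, description).
additional_compulsory_arguments = [
    ("min-affinity=", "0.0", "Minimum interaction affinity score to insert"),
]

# The list would then be passed to the parent constructor in MyDataParser.__init__:
# BianaParser.__init__(self, ..., additional_compulsory_arguments = additional_compulsory_arguments)
```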

Next, we override the parse_database method (responsible for reading your data files and inserting their information), starting with some initial setup that tells BIANA about the characteristics of the data we are going to insert:


    def parse_database(self):
        """
        Method that implements the specific operations of a MyData formatted file
        """
        # Add affinity score as a valid external entity relation attribute since it is not recognized by BIANA
        self.biana_access.add_valid_external_entity_attribute_type( name = "AffinityScore",
                                                                    data_type = "double",
                                                                    category = "eE numeric attribute" )

        # Add score as a valid external entity relation participant attribute since it is not recognized by BIANA
        # (Do not confuse it with an external entity/relation score attribute; participants can have their own attributes as well)
        self.biana_access.add_valid_external_entity_relation_participant_attribute_type( name = "Score", data_type = "float unsigned" )

        # Since we have added new attributes that are not in the default BIANA distribution, we execute the following command
        self.biana_access.refresh_database_information()

BIANA provides various attributes and types to annotate data entries coming from external databases (see the attributes and types recognized by BIANA for details). If we need attributes or types that BIANA does not recognize by default, we must first make them known to BIANA, as done above with the add_valid_external_entity_attribute_type and add_valid_external_entity_relation_participant_attribute_type methods (see defining new attributes and types for details).


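        # Open input file for reading
        self.input_file_fd = open(self.input_file, 'r')

        # Keep track of data entries in the file and ids assigned by BIANA for them in a dictionary
        self.external_entity_ids_dict = {}

        for line in self.input_file_fd:
            # Skip blank lines, which would otherwise make the unpacking below fail
            if not line.strip():
                continue
            # Fields are whitespace-separated, so descriptions must not contain spaces
            (id1, desc1, score1, id2, desc2, score2, score_int) = line.strip().split()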

Above, we open the file and start reading it line by line. Next, we convert the data read from each line into objects BIANA understands and insert them into the database:


            # Create an external entity corresponding to Uniprot_id1 in database (if it is not already created)
            if id1 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id1 (use "UniprotEntry" instead if your ids are entry names such as P53_HUMAN)
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier = "UniprotAccession", value = id1, type = "cross-reference" ) )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier = "Description", value = desc1 ) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id1] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )

            # Create an external entity corresponding to Uniprot_id2 in database (if it is not already created)
            if id2 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id2
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier = "UniprotAccession", value = id2, type = "cross-reference" ) )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier = "Description", value = desc2 ) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id2] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )
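The two if blocks above repeat the same get-or-insert logic for id1 and id2. One way to avoid that duplication is a small caching helper like the sketch below, which calls the inserting function only the first time an identifier is seen. It runs without BIANA by using a stand-in inserter that just counts calls; inside the parser, the callable would be a hypothetical method (for example self._insert_protein, not part of BIANA) that builds the ExternalEntity as above and returns the id from insert_new_external_entity.

```python
def get_or_insert(cache, key, make_and_insert):
    """Return the id cached for key, calling make_and_insert(key) only the first time.

    cache: dict mapping a data identifier (e.g. a Uniprot id) to its inserted id.
    make_and_insert: callable that builds and inserts the entity and returns its id.
    """
    if key not in cache:
        cache[key] = make_and_insert(key)
    return cache[key]


# Stand-in inserter that records calls instead of touching a database.
inserted = []

def fake_insert(uniprot_id):
    inserted.append(uniprot_id)
    return len(inserted)  # pretend this is the id assigned by the database

cache = {}
ids = [get_or_insert(cache, u, fake_insert) for u in ["P1", "P2", "P1"]]
```

In parse_database, a call might then look like get_or_insert(self.external_entity_ids_dict, id1, lambda uid: self._insert_protein(uid, desc1)), where _insert_protein is the hypothetical helper mentioned above.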

Finally, we insert the information about the interaction as follows:


            # Create an external entity relation corresponding to interaction between Uniprot_id1 and Uniprot_id2 in database
            new_external_entity_relation = ExternalEntityRelation( source_database = self.database, relation_type = "interaction" )
            # Associate Uniprot_id1 as the first participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id1] )
            # Associate Uniprot_id2 as the second participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id2] )
            # Associate score of first participant Uniprot_id1 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id1],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score1 ) )
            # Associate score of second participant Uniprot_id2 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id2],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score2 ) )
            # Associate the affinity score with this interaction
            new_external_entity_relation.add_attribute( ExternalEntityRelationAttribute( attribute_identifier = "AffinityScore",
                                                                                         value = score_int ) )
            # Insert this external entity relation into database
            self.biana_access.insert_new_external_entity( externalEntity = new_external_entity_relation )
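To see where each score ends up, the relation built above can be pictured as a plain Python structure. This is only an illustration with ordinary dicts, not a BIANA class: the AffinityScore belongs to the relation as a whole, while each Score belongs to one participant of that relation.

```python
def build_interaction_record(eid1, eid2, score1, score2, score_int):
    """Plain-dict picture of one MyData interaction (not a BIANA object)."""
    return {
        "relation_type": "interaction",
        # Relation-level attribute: describes the interaction itself
        "attributes": {"AffinityScore": float(score_int)},
        # Participant attributes: one entry per participant, keyed by entity id,
        # so a self-interaction (eid1 == eid2) keeps only one entry here
        "participants": {
            eid1: {"Score": float(score1)},
            eid2: {"Score": float(score2)},
        },
    }


record = build_interaction_record(101, 102, "0.8", "0.6", "7.5")
```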

As good programming practice, we close the file we read once parsing is finished:


        self.input_file_fd.close()
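An alternative to opening and closing the file by hand is a with statement, which closes the file even if parsing raises an exception. The sketch below is a BIANA-independent reader for the MyData format. It assumes whitespace-separated fields with no spaces inside descriptions, skips blank lines, reports malformed lines with their line number, and converts scores to float. Whether BIANA expects strings or numbers as attribute values is not stated on this page, so the conversion is an assumption of the sketch.

```python
import os
import tempfile


def iter_mydata_records(path):
    """Yield (id1, desc1, score1, id2, desc2, score2, score_int) for each data line."""
    with open(path, "r") as fd:  # the file is closed automatically, even on errors
        for line_number, line in enumerate(fd, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 7:
                raise ValueError("line %d: expected 7 fields, got %d" % (line_number, len(fields)))
            id1, desc1, score1, id2, desc2, score2, score_int = fields
            yield (id1, desc1, float(score1), id2, desc2, float(score2), float(score_int))


# Usage with a small temporary file in the MyData format.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("P1 kinase 0.9 P2 phosphatase 0.7 5.5\n\n")
records = list(iter_mydata_records(tmp.name))
os.remove(tmp.name)
```

Inside parse_database, the loop could then read for (id1, desc1, score1, id2, desc2, score2, score_int) in iter_mydata_records(self.input_file):, with no explicit close() call.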

Command line arguments accepted by parsers

By default, BIANA parsers require the following arguments:

"input-identifier=": path or file name of the input file(s) containing the database data. Path names must end with /.
"biana-dbname=": name of the BIANA database to be used
"biana-dbhost=": name of the host where the BIANA database is located
"database-name=": internal identifier name for this database (it must be unique in the BIANA database)
"database-version=": version of the database to be inserted

The following optional arguments are also recognized:

"biana-dbuser=": user name for the specified host
"biana-dbpass=": password for the specified user name and host
"optimize-for-parsing": set to disable indices (if there are any) and reduce parsing time. Useful when inserting a considerable amount of data into an existing BIANA database that already has indices created
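The trailing "=" in the names above follows the getopt convention for long options that take a value. Assuming parsers are invoked with GNU-style options, as in python MyDataParser.py --input-identifier=mydata.txt --biana-dbname=... (check the BIANA Manual for the exact invocation), the sketch below uses Python's standard getopt module to show how such a command line splits into required and optional values. It is an illustration, not BianaParser's own argument handling.

```python
import getopt

# Long options as listed above; a trailing "=" means the option takes a value.
LONG_OPTIONS = [
    "input-identifier=", "biana-dbname=", "biana-dbhost=",
    "database-name=", "database-version=",
    "biana-dbuser=", "biana-dbpass=", "optimize-for-parsing",
]
REQUIRED = ["input-identifier", "biana-dbname", "biana-dbhost",
            "database-name", "database-version"]


def parse_parser_arguments(argv):
    """Parse parser options into a dict and check that required ones are present."""
    opts, _ = getopt.getopt(argv, "", LONG_OPTIONS)
    values = dict((name[2:], value) for name, value in opts)  # strip leading "--"
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    return values


args = parse_parser_arguments([
    "--input-identifier=mydata.txt", "--biana-dbname=test_biana",
    "--biana-dbhost=localhost", "--database-name=mydata",
    "--database-version=1.0", "--optimize-for-parsing",
])
```

A flag without "=", such as optimize-for-parsing, is stored with an empty string as its value.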

Attributes and types recognized by BIANA and defining new ones

BIANA uses a set of attributes and types to describe external entities coming from external biological databases (for example, Uniprot Accession, STRING id and GO id as attributes, and protein, DNA, interaction and complex as types). If you write a parser for your own data, you can either use existing attributes and types to annotate the entries in your data or create new ones if the existing ones do not fit. Here is the list of valid BIANA attributes:

External Entity & External Entity Relation Attributes

CHEBI
COG
CYGD
DIP
EC
Encode
Ensembl
FlyBase
GDB
GeneID
GeneSymbol
GenomeReviews
GI
GO
HGNC
HPRD
Huge
IMGT
IntAct
IntEnz
InterPro
IPI
KeggCode
KeggGene
Method_id
MGI
MIM
MINT
MIPS
OrderedLocusName
ORFName
PFAM
PIR
PRINTS
PRODOM
Prosite
psimi_name
PubChemCompound
Ratmap
Reactome
RGD
SCOP
SGD
STRING
Tair
TaxID
Unigene
UniParc
UniprotEntry
WormBaseGeneID
WormBaseSequenceName
YPD
AccessionNumber
RefSeq
TIGR
UniprotAccession
Disease
Function
Keyword
Description
SubcellularLocation
Name
Pubmed
Formula
Pvalue
Score
ProteinSequence
Pattern
STRINGScore
SequenceMap
NucleotideSequence
PDB
TaxID_category
TaxID_name
GO_name

External Entity Relation Participant Attributes
cardinality
detection_method
GO
KeggCode
role

And here is the list of valid BIANA types:

External Entity Types
protein
DNA
RNA
mRNA
tRNA
rRNA
CDS
gene
sRNA
snRNA
snoRNA
structure
pattern
compound
drug
glycan
enzyme
relation
ontology
SCOPElement
taxonomyElement
PsiMiOboOntologyElement
GOElement

External Entity Relation Types
interaction
no_interaction
reaction
cluster
homology
pathway
alignment
complex
regulation
cooperation
forward_reaction
backward_reaction

If you need to annotate your data with an attribute or type that is not in the lists above, you can introduce it to BIANA with the following methods of self.biana_access. To add a new:

External Entity Type: add_valid_external_entity_type(new_type)
External Entity Relation Type: add_valid_external_entity_relation_type(new_type)
External Entity Attribute (Textual): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
External Entity Attribute (Numeric): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
External Entity Relation Attribute (Textual): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
External Entity Relation Attribute (Numeric): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
External Entity Relation Participant Attribute: add_valid_external_entity_relation_participant_attribute_type(new_attribute, data_type)
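A minimal sketch of a helper that registers a new entity type and new attributes before parsing, then refreshes BIANA's database information as the example parser does after adding attributes. RecordingAccess is a hypothetical stand-in for self.biana_access that only records calls, so the sketch runs without a BIANA database. The names miRNA and miRBaseID are illustrative, and passing a MySQL-style column type such as "varchar(255)" as data_type is an assumption based on the "double" and "float unsigned" values used in the example parser.

```python
class RecordingAccess(object):
    """Hypothetical stand-in for self.biana_access that records method calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, method_name):
        # Any unknown attribute is treated as a method whose calls are recorded.
        def record(*args, **kwargs):
            self.calls.append((method_name, args, kwargs))
        return record


def register_custom_definitions(biana_access):
    """Register a new type and attributes, then refresh database information."""
    biana_access.add_valid_external_entity_type("miRNA")
    biana_access.add_valid_external_entity_attribute_type(
        name="miRBaseID", data_type="varchar(255)", category="eE identifier attribute")
    biana_access.add_valid_external_entity_relation_participant_attribute_type(
        name="Score", data_type="float unsigned")
    biana_access.refresh_database_information()


access = RecordingAccess()
register_custom_definitions(access)
```

In a real parser, register_custom_definitions(self.biana_access) would be called at the start of parse_database.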

Submit a new parser

Have you written a new BIANA parser for your own data or for another database? Submit your parser and share it with others!

A submission should include:

Parsed database name and link
Database reference
List of files necessary for parsing
Parser file
Your e-mail
Additional comments