BIANA: Biologic Interaction and Network Analysis
BIANA Database Parsers
BIANA provides default parsers for the most common databases and formats. BIANA has been designed to store data from any kind of biological database, leaving it to the user to decide how data are integrated across databases by choosing which combinations of attributes must be shared. However, because of the large number of databases, formats and versions, and because different versions of the same database often use different formats, not every database with biological data has a working BIANA parser. Even where standard interchange formats exist, databases often change their formats, so parsers are not guaranteed to work with all database versions. To address this, we provide a set of default parsers, which are kept up to date in the list of available parsers below.
If you find that an existing parser no longer works for a new database version, or you need a parser for a database not listed here, you can ask us to write it (this can take some time) or try creating a new parser yourself (see the 'How to make a parser for my own database or data' section below). Alternatively, you can use the BIANA Generic Parser, which accepts data in a prespecified tab-separated format. Once you have converted your data to that format, you can use the Generic Parser to parse it. Please refer to the BIANA Manual for details on using the Generic Parser.
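As a rough illustration of the conversion step, the sketch below writes records to tab-separated text with Python's csv module. The column names (`id`, `description`), the header row and the helper `write_tab_separated` are placeholders invented for this example; the actual column layout expected by the Generic Parser is defined in the BIANA Manual and is not reproduced here.

```python
import csv
import io

def write_tab_separated(records, columns, handle):
    """Write dict records to handle as tab-separated text with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        # Missing values are written as empty fields
        writer.writerow([record.get(column, "") for column in columns])

# Hypothetical records; replace with your own data
records = [
    {"id": "P04637", "description": "Cellular tumor antigen p53"},
    {"id": "Q00987", "description": "E3 ubiquitin-protein ligase Mdm2"},
]
buffer = io.StringIO()
write_tab_separated(records, ["id", "description"], buffer)
tsv_text = buffer.getvalue()
```

In practice the handle would be a file opened for writing rather than an in-memory buffer.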
Available Parsers
Parsed Database Name (or format) | Parser file |
---|---|
Uniprot | uniprotParser.py |
PSI-MI 2.5 format | psimi25Parser.py |
Biopax Level 2 format | biopaxLevel2Parser.py |
Non-redundant Blast NR database | nrParser.py |
Cluster Of Orthologous Genes Database (COGs) | cogParser.py |
GenBank | ncbiGenPeptParser.py |
Taxonomy | taxonomyParser.py |
Protein-protein interactions Open Biomedical Ontologies (PSI-MI OBO) | psimioboParser.py |
International Protein Index Database (IPI) | ipiParser.py |
HUGO Gene Nomenclature Committee (HGNC) | hgncParser.py |
Kyoto Encyclopedia of Genes and Genomes (KEGG) | keggGeneParser.py, keggKOParser.py, keggligandParser.py |
Structural Classification of Proteins (SCOP) | scopParser.py |
Protein Families database (PFAM) | pfamParser.py |
Gene Ontology (GO) | goParser.py |
Tabulated File Parser (Generic Parser) | GenericParser.py |
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) | stringParser.py |
Proposed unification protocol to be used
Databases | Attributes |
---|---|
Uniprot, GenBank, IPI, KeggGene, COG, String | ProteinSequence AND taxID |
Uniprot, HGNC, HPRD, DIP, MPACT, Reactome, IPI, BioGrid, MINT, IntAct, String | UniprotAccession |
Uniprot, String | UniprotEntry |
Uniprot, HGNC, HPRD, DIP, String | GeneID |
Uniprot, SCOP (promiscuous) | PDB |
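To make the idea behind such a protocol concrete, the toy sketch below merges entries that come from the listed databases and share the same values for all attributes of a rule. It is only an illustration written for this page, not BIANA's implementation: the function `unify`, the entry ids and the data layout are invented, and each attribute is assumed to hold a single value.

```python
from collections import defaultdict

def unify(entries, rules):
    """Group entry ids that share attribute values under at least one rule.

    entries: {entry_id: {"database": str, "attributes": {name: value}}}
    rules:   list of (set_of_databases, tuple_of_attribute_names)
    """
    parent = {entry_id: entry_id for entry_id in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for databases, attribute_names in rules:
        by_key = defaultdict(list)
        for entry_id, entry in entries.items():
            attributes = entry["attributes"]
            if entry["database"] in databases and all(n in attributes for n in attribute_names):
                by_key[tuple(attributes[n] for n in attribute_names)].append(entry_id)
        # Entries with identical values for this rule end up in the same group
        for ids in by_key.values():
            for other in ids[1:]:
                parent[find(other)] = find(ids[0])

    groups = defaultdict(set)
    for entry_id in entries:
        groups[find(entry_id)].add(entry_id)
    return sorted(sorted(group) for group in groups.values())

entries = {
    "u1": {"database": "Uniprot", "attributes": {"UniprotAccession": "P04637", "GeneID": "7157"}},
    "h1": {"database": "HGNC", "attributes": {"GeneID": "7157"}},
    "i1": {"database": "IntAct", "attributes": {"UniprotAccession": "P04637"}},
    "i2": {"database": "IntAct", "attributes": {"UniprotAccession": "Q00987"}},
}
rules = [
    ({"Uniprot", "HGNC", "IntAct"}, ("UniprotAccession",)),
    ({"Uniprot", "HGNC"}, ("GeneID",)),
]
groups = unify(entries, rules)
```

Here `u1`, `h1` and `i1` fall into one group because `u1` shares an accession with `i1` and a GeneID with `h1`, while `i2` stays on its own.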
How to make a parser for my own database or data
All parsers in BIANA inherit from the BianaParser class found in biana/BianaParser/bianaParser.py. To write your own parser, create a new Python class whose parent is BianaParser. Then all you need to do is define the arguments your parser requires in the __init__ (class constructor) method and override the parse_database member method, which is responsible for reading your data files and inserting their information into BIANA.
Here is an example parser (MyDataParser.py) that inserts data in a user-specified format into BIANA. Let's go over the code.
First we start with subclassing BianaParser:
from bianaParser import *

class MyDataParser(BianaParser):
    """
    MyData Parser Class

    Parses data in the following format (meaning Uniprot_id1 interacts with Uniprot_id2, and some scores are associated with both the participants and the interaction):

    Uniprot_id1 Description1 Participant_score1 Uniprot_id2 Description2 Participant_score2 Interaction_Affinity_score
    """

    name = "mydata"
    description = "This file implements a program that fills up tables in BIANA database from data in MyData format"
    external_entity_definition = "An external entity represents a protein"
    external_entity_relations = "An external relation represents an interaction with given affinity"
Above, we introduce our parser and set its name and description attributes, mandatory fields that BIANA uses to describe this parser. Then we create the __init__ method, where we call the constructor of the parent class (BianaParser) with some additional descriptive arguments. You can request additional compulsory arguments from the user by passing "additional_compulsory_arguments" a list of triplets (argument name, default value, description) (see 'Command line arguments accepted by parsers' below for the arguments BianaParser accepts by default).
    def __init__(self):
        """
        Start with the default values
        """
        BianaParser.__init__(self, default_db_description = "MyData parser",
                             default_script_name = "MyDataParser.py",
                             default_script_description = MyDataParser.description,
                             additional_compulsory_arguments = [])
Next, we override the parse_database method (responsible for reading your data files and inserting their information), starting with some initial arrangements that tell BIANA about the characteristics of the data we are going to insert:
    def parse_database(self):
        """
        Method that implements the specific operations of a MyData formatted file
        """
        # Add affinity score as a valid external entity relation attribute since it is not recognized by BIANA by default
        self.biana_access.add_valid_external_entity_attribute_type( name = "AffinityScore",
                                                                    data_type = "double",
                                                                    category = "eE numeric attribute")
        # Add score as a valid external entity relation participant attribute since it is not recognized by BIANA by default
        # (Do not confuse this with the Score attribute of external entities/relations; participants can have their own attributes as well)
        self.biana_access.add_valid_external_entity_relation_participant_attribute_type( name = "Score", data_type = "float unsigned" )
        # Since we have added new attributes that are not in the default BIANA distribution, we execute the following command
        self.biana_access.refresh_database_information()
BIANA offers various attributes and types to annotate data entries coming from external databases (see 'Attributes and types recognized by BIANA and defining new ones' below for details). If we need attributes or types that BIANA does not recognize by default, we have to make them known to BIANA, as is done above with the add_valid_external_entity_attribute_type and add_valid_external_entity_relation_participant_attribute_type functions.
        # Open input file for reading
        self.input_file_fd = open(self.input_file, 'r')
        # Keep track of data entries in the file and ids assigned by BIANA for them in a dictionary
        self.external_entity_ids_dict = {}
        for line in self.input_file_fd:
            (id1, desc1, score1, id2, desc2, score2, score_int) = line.strip().split()
Above, we open the input file and read it line by line. Each line is then converted into objects that BIANA understands, which are inserted into the database:
            # Create an external entity corresponding to Uniprot_id1 in database (if it is not already created)
            if id1 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id1
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Uniprot", value=id1, type="cross-reference") )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Description", value=desc1) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id1] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )
            # Create an external entity corresponding to Uniprot_id2 in database (if it is not already created)
            if id2 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id2
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Uniprot", value=id2, type="cross-reference") )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Description", value=desc2) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id2] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )
Finally, we insert the interaction itself as follows:
            # Create an external entity relation corresponding to interaction between Uniprot_id1 and Uniprot_id2 in database
            new_external_entity_relation = ExternalEntityRelation( source_database = self.database, relation_type = "interaction" )
            # Associate Uniprot_id1 as the first participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id1] )
            # Associate Uniprot_id2 as the second participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id2] )
            # Associate score of first participant Uniprot_id1 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id1],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score1 ) )
            # Associate score of second participant Uniprot_id2 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id2],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score2 ) )
            # Associate the score of the interaction with this interaction
            new_external_entity_relation.add_attribute( ExternalEntityRelationAttribute( attribute_identifier = "AffinityScore",
                                                                                         value = score_int ) )
            # Insert this external entity relation into database
            self.biana_access.insert_new_external_entity( externalEntity = new_external_entity_relation )
As good programming practice, we do not forget to close the file we read, once the loop has finished:
        self.input_file_fd.close()
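The line-splitting step in the example assumes that every line holds exactly seven whitespace-separated fields, so a blank line or a description containing a space would raise an error. A more defensive variant might look like the sketch below. It assumes tab-separated fields and `#` comment lines, neither of which is stated in the format description, and the helper name `parse_mydata_line` is made up for this example.

```python
def parse_mydata_line(line):
    """Split one MyData line into its seven fields.

    Returns None for blank lines and lines starting with '#'.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split("\t")
    if len(fields) != 7:
        raise ValueError("expected 7 tab-separated fields, got %d: %r" % (len(fields), line))
    id1, desc1, score1, id2, desc2, score2, score_int = fields
    # Scores are converted to float so that malformed numbers fail early
    return id1, desc1, float(score1), id2, desc2, float(score2), float(score_int)

example = "P04637\tTumor antigen p53\t0.9\tQ00987\tE3 ligase Mdm2\t0.8\t12.5"
parsed = parse_mydata_line(example)
```

Inside parse_database, the loop could then skip lines for which the helper returns None and unpack the tuple otherwise.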
Command line arguments accepted by parsers
By default BIANA parsers require:
- "input-identifier=": path or file name of the input file(s) containing the database data. Path names must end with "/".
- "biana-dbname=": name of the BIANA database to be used
- "biana-dbhost=": name of the host where the BIANA database is located
- "database-name=": internal identifier for this database (it must be unique within the BIANA database)
- "database-version=": version of the database to be inserted
The following optional arguments are also recognized:
- "biana-dbuser=": user name for the specified host
- "biana-dbpass=": password for the specified user name and host
- "optimize-for-parsing": set to disable indices (if there are any) and reduce parsing time. Useful when you want to insert a considerable amount of data into an existing BIANA database that already has indices
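The trailing "=" in the option names above matches the convention of Python's standard getopt module, where it marks a long option that takes a value, while a name without it, such as "optimize-for-parsing", is a plain flag. Whether BIANA parses its command line with getopt is an assumption here; the sketch below only shows how such a command line can be split into options. The helper `parse_arguments` and all file and database names are placeholders.

```python
import getopt

LONG_OPTIONS = [
    "input-identifier=", "biana-dbname=", "biana-dbhost=",
    "database-name=", "database-version=",
    "biana-dbuser=", "biana-dbpass=", "optimize-for-parsing",
]
REQUIRED = ["input-identifier", "biana-dbname", "biana-dbhost",
            "database-name", "database-version"]

def parse_arguments(argv):
    """Return a dict mapping option names (without leading dashes) to values."""
    options, _remaining = getopt.getopt(argv, "", LONG_OPTIONS)
    flags = {"--" + name for name in LONG_OPTIONS if not name.endswith("=")}
    # Flags carry no value, so they are stored as True
    return {name[2:]: (True if name in flags else value) for name, value in options}

arguments = parse_arguments([
    "--input-identifier=mydata.txt",
    "--biana-dbname=test_biana",
    "--biana-dbhost=localhost",
    "--database-name=mydata",
    "--database-version=1.0",
    "--optimize-for-parsing",
])
# Required options that were not supplied on the command line
missing = [name for name in REQUIRED if name not in arguments]
```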
Attributes and types recognized by BIANA and defining new ones
BIANA uses a set of attributes and types to describe external entities coming from external biological databases (attributes such as Uniprot Accession, STRING id or GO id, and types such as protein, DNA, interaction or complex). If you write a parser specialized for your own data, you can either use the existing attributes and types to annotate your entries or create new ones if the existing ones do not fit.
Here we give a list of valid BIANA attributes:
- External Entity & External Entity Relation Attributes
name |
---|
CHEBI |
COG |
CYGD |
DIP |
EC |
Encode |
Ensembl |
FlyBase |
GDB |
GeneID |
GeneSymbol |
GenomeReviews |
GI |
GO |
HGNC |
HPRD |
Huge |
IMGT |
IntAct |
IntEnz |
InterPro |
IPI |
KeggCode |
KeggGene |
Method_id |
MGI |
MIM |
MINT |
MIPS |
OrderedLocusName |
ORFName |
PFAM |
PIR |
PRINTS |
PRODOM |
Prosite |
psimi_name |
PubChemCompound |
Ratmap |
Reactome |
RGD |
SCOP |
SGD |
STRING |
Tair |
TaxID |
Unigene |
UniParc |
UniprotEntry |
WormBaseGeneID |
WormBaseSequenceName |
YPD |
AccessionNumber |
RefSeq |
TIGR |
UniprotAccession |
Disease |
Function |
Keyword |
Description |
SubcellularLocation |
Name |
Pubmed |
Formula |
Pvalue |
Score |
ProteinSequence |
Pattern |
STRINGScore |
SequenceMap |
NucleotideSequence |
PDB |
TaxID_category |
TaxID_name |
GO_name |
- External Entity Relation Participant Attributes
name |
---|
cardinality |
detection_method |
GO |
KeggCode |
role |
And here is the list of valid BIANA types:
- External Entity Types
name |
---|
protein |
DNA |
RNA |
mRNA |
tRNA |
rRNA |
CDS |
gene |
sRNA |
snRNA |
snoRNA |
structure |
pattern |
compound |
drug |
glycan |
enzyme |
relation |
ontology |
SCOPElement |
taxonomyElement |
PsiMiOboOntologyElement |
GOElement |
- External Entity Relation Types
name |
---|
interaction |
no_interaction |
reaction |
cluster |
homology |
pathway |
alignment |
complex |
regulation |
cooperation |
forward_reaction |
backward_reaction |
If you need to annotate your data with an attribute or type that does not appear in the lists above, you can use the following methods to introduce your attributes and types to BIANA.
To add a new:
- External Entity Type:
add_valid_external_entity_type(new_type)
- External Entity Relation Type:
add_valid_external_entity_relation_type(new_type)
- External Entity Attribute (Textual):
add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
- External Entity Attribute (Numeric):
add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
- External Entity Relation Attribute (Textual):
add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
- External Entity Relation Attribute (Numeric):
add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
- External Entity Relation Participant Attribute:
add_valid_external_entity_relation_participant_attribute_type(new_attribute, data_type)
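Put together, registering new types from inside parse_database might look like the sketch below. Because BIANA is not imported here, `RecordingAccess` is a stand-in written for this example that only records the calls; in a real parser the same calls would go to self.biana_access. The names "peptide", "co_expression" and "BindingEnergy" are invented, the keyword arguments follow the example parser above, and the final call to refresh_database_information is taken from that example as well.

```python
class RecordingAccess(object):
    """Stand-in for self.biana_access: records method calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, method_name):
        # Only reached for names that are not real attributes, i.e. the BIANA methods
        def record(*args, **kwargs):
            self.calls.append((method_name, args, kwargs))
        return record

def register_custom_types(biana_access):
    """Register hypothetical types and attributes, then refresh BIANA's view of them."""
    biana_access.add_valid_external_entity_type("peptide")
    biana_access.add_valid_external_entity_relation_type("co_expression")
    biana_access.add_valid_external_entity_attribute_type(
        name="BindingEnergy", data_type="double", category="eE numeric attribute")
    biana_access.add_valid_external_entity_relation_participant_attribute_type(
        name="Score", data_type="float unsigned")
    biana_access.refresh_database_information()

access = RecordingAccess()
register_custom_types(access)
called_methods = [name for name, _args, _kwargs in access.calls]
```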
Submit a new parser
Have you written a new BIANA parser for your own data or for another database? Submit your own parser and share it with others!