BIANA: Biologic Interaction and Network Analysis



BIANA Database Parsers

BIANA provides default parsers for the most common databases and formats. BIANA is designed to store any kind of biological database and leaves it to the user to decide how data are integrated across databases, by choosing which combinations of attributes must be shared. However, because of the large number of databases, formats, and versions, and because different versions of the same database often use different formats, not every database with biological data has a working BIANA parser. Even though standard interchange formats exist, databases often change their formats, so a parser is not guaranteed to work with every version of a database. To address this, we provide a set of default parsers and keep the list of available parsers below up to date.

If an existing parser no longer works with a new database version, or you need a parser for a database not listed here, you can ask us to write it (this can take some time) or write a new parser yourself (see the 'How to make a parser for my own database or data' section below). Alternatively, you can use the BIANA Generic Parser, which accepts data in a prespecified tab-separated format: once you have converted your data to that format, the Generic Parser can load it. Please refer to the BIANA Manual for details on using the Generic Parser.


Available Parsers

Parsed Database Name (or format)                                      Parser file
Uniprot                                                               uniprotParser.py
PSI-MI 2.5 format                                                     psimi25Parser.py
Biopax Level 2 format                                                 biopaxLevel2Parser.py
Non-redundant Blast NR database                                       nrParser.py
Cluster Of Orthologous Genes Database (COGs)                          cogParser.py
GenBank                                                               ncbiGenPeptParser.py
Taxonomy                                                              taxonomyParser.py
Protein-protein interactions Open Biomedical Ontologies (PSI-MI OBO)  psimioboParser.py
International Protein Index Database (IPI)                            ipiParser.py
HUGO Gene Nomenclature Committee (HGNC)                               hgncParser.py
Kyoto Encyclopedia of Genes and Genomes (KEGG)                        keggGeneParser.py, keggKOParser.py, keggligandParser.py
Structural Classification of Proteins (SCOP)                          scopParser.py
Protein Families database (PFAM)                                      pfamParser.py
Gene Ontology (GO)                                                    goParser.py
Tabulated File Parser (Generic Parser)                                GenericParser.py
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)  stringParser.py


Proposed unification protocol to be used

Databases                                                                      Attributes
Uniprot, GenBank, IPI, KeggGene, COG, String                                   ProteinSequence AND TaxID
Uniprot, HGNC, HPRD, DIP, MPACT, Reactome, IPI, BioGrid, MINT, IntAct, String  UniprotAccession
Uniprot, String                                                                UniprotEntry
Uniprot, HGNC, HPRD, DIP, String                                               GeneID
Uniprot, SCOP (promiscuous)                                                    PDB
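
To make the table concrete, here is a minimal sketch of what unifying on a shared attribute combination means: entries from different databases end up in the same group when they agree on every attribute of at least one chosen combination. It is an illustration only, not BIANA's implementation; the function name unify, the dictionary layout of an entry, and all sample values are assumptions made up for the example.

```python
def unify(entries, combinations):
    """Group entry indices that agree on all attributes of any combination.

    entries: list of dicts mapping attribute name -> value (hypothetical layout).
    combinations: list of tuples of attribute names,
                  e.g. [("ProteinSequence", "TaxID"), ("UniprotAccession",)].
    """
    parent = list(range(len(entries)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for combo in combinations:
        first_seen = {}
        for index, entry in enumerate(entries):
            if any(attr not in entry for attr in combo):
                continue  # entry lacks part of the combination, so it cannot match on it
            key = tuple(entry[attr] for attr in combo)
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    groups = {}
    for index in range(len(entries)):
        groups.setdefault(find(index), []).append(index)
    return sorted(groups.values())


# Made-up entries; accession "ACC1" and sequence "MKT" are placeholders.
entries = [
    {"db": "Uniprot", "UniprotAccession": "ACC1", "ProteinSequence": "MKT", "TaxID": 9606},
    {"db": "IPI", "UniprotAccession": "ACC1"},
    {"db": "GenBank", "ProteinSequence": "MKT", "TaxID": 9606},
    {"db": "GenBank", "ProteinSequence": "MKT", "TaxID": 10090},
]
groups = unify(entries, [("ProteinSequence", "TaxID"), ("UniprotAccession",)])
print(groups)  # prints [[0, 1, 2], [3]]
```

The last entry stays on its own because it shares the sequence but not the TaxID, which is why the first row of the table requires both attributes together.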


How to make a parser for my own database or data

All parsers in BIANA inherit from the BianaParser class found in biana/BianaParser/bianaParser.py. To write your own parser, create a new Python class whose parent is BianaParser. Then define the arguments your parser requires in the __init__ (class constructor) method and override the parse_database method, which is responsible for reading your data files and inserting their contents into the database.
Here is an example parser (MyDataParser.py) that inserts data in a user-specified format into BIANA. Let's go over the code.
We start by subclassing BianaParser:


from bianaParser import *


class MyDataParser(BianaParser):
    """
    MyData Parser Class

    Parses data in the following format (meaning Uniprot_id1 interacts with Uniprot_id2, and scores are associated with both the participants and the interaction):

        Uniprot_id1 Description1 Participant_score1 Uniprot_id2 Description2 Participant_score2 Interaction_Affinity_score
    """

    name = "mydata"
    description = "This file implements a program that fills up tables in BIANA database from data in MyData format"
    external_entity_definition = "An external entity represents a protein"
    external_entity_relations = "An external relation represents an interaction with given affinity"

	
Above we introduce our parser and set its name and description attributes, mandatory fields that BIANA uses to describe this parser. Next we create the __init__ method, where we call the constructor of the parent class (BianaParser) with some additional descriptive arguments. To request additional compulsory arguments from the user, pass "additional_compulsory_arguments" as a list of triplets (argument name, default value, description) (see the list of command line arguments accepted by BianaParser by default).

    def __init__(self):
        """
        Start with the default values
        """
        BianaParser.__init__(self, default_db_description = "MyData parser",
                             default_script_name = "MyDataParser.py",
                             default_script_description = MyDataParser.description,
                             additional_compulsory_arguments = [])
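
As an illustration of the triplet layout, the sketch below declares two extra arguments and renders a help text from them. The argument names (score-threshold=, species-taxid=) and the helper describe_arguments are hypothetical. The trailing = on the names is an assumption borrowed from the default argument list further down, and treating a None default as "must be supplied" is likewise an assumption; check bianaParser.py for the exact conventions before relying on either.

```python
# Hypothetical extra arguments, each a (name, default value, description) triplet.
additional_compulsory_arguments = [
    ("score-threshold=", 0.5, "Minimum affinity score for an interaction to be inserted"),
    ("species-taxid=", None, "Taxonomy identifier the data file refers to"),
]


def describe_arguments(triplets):
    """Return one help line per triplet, flagging arguments without a default."""
    lines = []
    for name, default, description in triplets:
        if default is None:
            suffix = "(required)"  # assumption: no default means the user must supply it
        else:
            suffix = "(default: %s)" % default
        lines.append("--%s %s %s" % (name, description, suffix))
    return lines


help_lines = describe_arguments(additional_compulsory_arguments)
print("\n".join(help_lines))
```

A list like this would be passed as additional_compulsory_arguments in the BianaParser.__init__ call in place of the empty list.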

	
Next, we override the parse_database method (responsible for reading your data files and inserting their contents), starting with some initial setup that tells BIANA about the characteristics of the data we are going to insert:

    def parse_database(self):
        """
        Method that implements the specific operations of a MyData formatted file
        """
        # Add affinity score as a valid external entity relation attribute since it is not recognized by BIANA
        self.biana_access.add_valid_external_entity_attribute_type( name = "AffinityScore",
                                                                    data_type = "double",
                                                                    category = "eE numeric attribute")

        # Add score as a valid external entity relation participant attribute since it is not recognized by BIANA
        # (Do not confuse with external entity/relation score attribute, participants can have their attributes as well)
        self.biana_access.add_valid_external_entity_relation_participant_attribute_type( name = "Score", data_type = "float unsigned" )

        # Since we have added new attributes that are not in the default BIANA distribution, we execute the following command
        self.biana_access.refresh_database_information()

	
BIANA offers various attributes and types to annotate data entries coming from external databases (see attributes and types recognized by BIANA for details). If we need attributes or types that BIANA does not recognize by default, we must register them with BIANA first, as done above with the add_valid_external_entity_attribute_type and add_valid_external_entity_relation_participant_attribute_type functions (see defining new attributes and types for details).

        # Open input file for reading
        self.input_file_fd = open(self.input_file, 'r')

        # Keep track of data entries in the file and ids assigned by BIANA for them in a dictionary
        self.external_entity_ids_dict = {}

        for line in self.input_file_fd:
            # Fields are split on whitespace, so descriptions must not contain spaces
            (id1, desc1, score1, id2, desc2, score2, score_int) = line.strip().split()

	
Above we open the file and start reading it line by line. Next we convert the data read from each line into objects that BIANA understands and insert them into the database:

            # Create an external entity corresponding to Uniprot_id1 in database (if it is not already created)
            if id1 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id1
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Uniprot", value=id1, type="cross-reference") )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Description", value=desc1) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id1] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )

            # Create an external entity corresponding to Uniprot_id2 in database (if it is not already created)
            if id2 not in self.external_entity_ids_dict:
                new_external_entity = ExternalEntity( source_database = self.database, type = "protein" )
                # Annotate it as Uniprot_id2
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Uniprot", value=id2, type="cross-reference") )
                # Associate its description
                new_external_entity.add_attribute( ExternalEntityAttribute( attribute_identifier= "Description", value=desc2) )
                # Insert this external entity into database
                self.external_entity_ids_dict[id2] = self.biana_access.insert_new_external_entity( externalEntity = new_external_entity )

	
Finally, we insert the interaction itself as follows:

            # Create an external entity relation corresponding to interaction between Uniprot_id1 and Uniprot_id2 in database
            new_external_entity_relation = ExternalEntityRelation( source_database = self.database, relation_type = "interaction" )
            # Associate Uniprot_id1 as the first participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id1] )
            # Associate Uniprot_id2 as the second participant in this interaction
            new_external_entity_relation.add_participant( externalEntityID = self.external_entity_ids_dict[id2] )
            # Associate score of first participant Uniprot_id1 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id1],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score1 ) )
            # Associate score of second participant Uniprot_id2 with this interaction
            new_external_entity_relation.add_participant_attributes( externalEntityID = self.external_entity_ids_dict[id2],
                                                                     participantAttribute = ExternalEntityRelationParticipantAttribute( attribute_identifier = "Score", value = score2 ) )
            # Associate the score of the interaction with this interaction
            new_external_entity_relation.add_attribute( ExternalEntityRelationAttribute( attribute_identifier = "AffinityScore",
                                                                                         value = score_int ) )
            # Insert this external entity relation into database
            self.biana_access.insert_new_external_entity( externalEntity = new_external_entity_relation )

	
As good programming practice, we close the file we read once the loop has finished:

        self.input_file_fd.close()
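
The insert-once pattern used in the loop can be tried without a BIANA installation. The sketch below replaces biana_access with a hypothetical stand-in (FakeAccess) that merely hands out increasing ids, and represents entities and relations as plain tuples; none of these names belong to BIANA. It also splits lines on tabs rather than on arbitrary whitespace, an assumption made so that descriptions may contain spaces, and the identifiers in the sample lines are placeholders.

```python
class FakeAccess(object):
    """Hypothetical stand-in for biana_access: records inserts, returns new ids."""

    def __init__(self):
        self.inserted = []

    def insert_new_external_entity(self, externalEntity):
        self.inserted.append(externalEntity)
        return len(self.inserted)  # ids start at 1


def load_interactions(lines, access):
    """Insert each protein once and every interaction, mirroring the loop above."""
    entity_ids = {}
    for line_number, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            raise ValueError("line %d: expected 7 fields, got %d"
                             % (line_number, len(fields)))
        id1, desc1, score1, id2, desc2, score2, score_int = fields
        # Insert a protein only the first time its identifier is seen.
        for protein_id, description in ((id1, desc1), (id2, desc2)):
            if protein_id not in entity_ids:
                entity_ids[protein_id] = access.insert_new_external_entity(
                    externalEntity=("protein", protein_id, description))
        # Every line contributes one interaction between the two stored ids.
        access.insert_new_external_entity(
            externalEntity=("interaction", entity_ids[id1], entity_ids[id2],
                            float(score1), float(score2), float(score_int)))
    return entity_ids


sample = [
    "PROT_A\tfirst protein\t0.9\tPROT_B\tsecond protein\t0.4\t12.5\n",
    "PROT_A\tfirst protein\t0.7\tPROT_C\tthird protein\t0.2\t3.0\n",
]
access = FakeAccess()
ids = load_interactions(sample, access)
print(ids)
print(len(access.inserted))
```

PROT_A appears on both lines but is inserted only once, which is the purpose of external_entity_ids_dict in the parser.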

	


Command line arguments accepted by parsers

By default, BIANA parsers require:
  • "input-identifier=": path or file name of the input file(s) containing the database data. Path names must end with /.
  • "biana-dbname=": name of the BIANA database to be used
  • "biana-dbhost=": name of the host where the BIANA database is located
  • "database-name=": internal identifier for this database (it must be unique within the BIANA database)
  • "database-version=": version of the database to be inserted
The following optional arguments are also recognized:
  • "biana-dbuser=": user name for the specified host
  • "biana-dbpass=": password for the specified user name and host
  • "optimize-for-parsing": disables indices (if there are any) to reduce parsing time. Useful when inserting a considerable amount of data into an existing BIANA database that already has indices
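
The trailing = in the names above matches the long-option notation of Python's getopt module, where it marks an option that takes a value. Assuming the parsers follow that convention (worth confirming in bianaParser.py), an invocation and its decoding might look like the sketch below. The file name, database names, version string, and the helper read_arguments are placeholders invented for the example.

```python
import getopt

REQUIRED = ["input-identifier=", "biana-dbname=", "biana-dbhost=",
            "database-name=", "database-version="]
OPTIONAL = ["biana-dbuser=", "biana-dbpass=", "optimize-for-parsing"]


def read_arguments(argv):
    """Decode long options and report any required one that is missing."""
    parsed, leftover = getopt.getopt(argv, "", REQUIRED + OPTIONAL)
    # getopt returns names with their leading dashes, e.g. "--biana-dbname".
    values = dict((name.lstrip("-"), value) for name, value in parsed)
    missing = [name.rstrip("=") for name in REQUIRED
               if name.rstrip("=") not in values]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    return values


# Assumed command line:
#   python MyDataParser.py --input-identifier=mydata.txt --biana-dbname=biana_test ...
arguments = read_arguments([
    "--input-identifier=mydata.txt",
    "--biana-dbname=biana_test",
    "--biana-dbhost=localhost",
    "--database-name=mydata",
    "--database-version=2011-01",
    "--optimize-for-parsing",
])
print(arguments["database-name"])
```

Options without a trailing =, such as optimize-for-parsing, are flags: they appear in the result with an empty string as their value.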

Attributes and types recognized by BIANA and defining new ones

BIANA uses a set of attributes and types to describe external entities coming from external biological databases (attributes such as Uniprot Accession, STRING id, or GO id; types such as protein, DNA, interaction, or complex). If you write a parser for your own data, you can either annotate your entries with existing attributes and types or create new ones if the existing ones do not fit. Here is the list of valid BIANA attributes:
  • External Entity & External Entity Relation Attributes
    name
    CHEBI
    COG
    CYGD
    DIP
    EC
    Encode
    Ensembl
    FlyBase
    GDB
    GeneID
    GeneSymbol
    GenomeReviews
    GI
    GO
    HGNC
    HPRD
    Huge
    IMGT
    IntAct
    IntEnz
    InterPro
    IPI
    KeggCode
    KeggGene
    Method_id
    MGI
    MIM
    MINT
    MIPS
    OrderedLocusName
    ORFName
    PFAM
    PIR
    PRINTS
    PRODOM
    Prosite
    psimi_name
    PubChemCompound
    Ratmap
    Reactome
    RGD
    SCOP
    SGD
    STRING
    Tair
    TaxID
    Unigene
    UniParc
    UniprotEntry
    WormBaseGeneID
    WormBaseSequenceName
    YPD
    AccessionNumber
    RefSeq
    TIGR
    UniprotAccession
    Disease
    Function
    Keyword
    Description
    SubcellularLocation
    Name
    Pubmed
    Formula
    Pvalue
    Score
    ProteinSequence
    Pattern
    STRINGScore
    SequenceMap
    NucleotideSequence
    PDB
    TaxID_category
    TaxID_name
    GO_name
  • External Entity Relation Participant Attributes
    name
    cardinality
    detection_method
    GO
    KeggCode
    role

And here is the list of valid BIANA types:
  • External Entity Types
    name
    protein
    DNA
    RNA
    mRNA
    tRNA
    rRNA
    CDS
    gene
    sRNA
    snRNA
    snoRNA
    structure
    pattern
    compound
    drug
    glycan
    enzyme
    relation
    ontology
    SCOPElement
    taxonomyElement
    PsiMiOboOntologyElement
    GOElement
  • External Entity Relation Types
    name
    interaction
    no_interaction
    reaction
    cluster
    homology
    pathway
    alignment
    complex
    regulation
    cooperation
    forward_reaction
    backward_reaction
If you need to annotate your data with an attribute or type that is not in the lists above, you can use the following methods to introduce it to BIANA. To add a new:
  • External Entity Type: add_valid_external_entity_type(new_type)
  • External Entity Relation Type: add_valid_external_entity_relation_type(new_type)
  • External Entity Attribute (Textual): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
  • External Entity Attribute (Numeric): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
  • External Entity Relation Attribute (Textual): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE identifier attribute")
  • External Entity Relation Attribute (Numeric): add_valid_external_entity_attribute_type(new_attribute, data_type, "eE numeric attribute")
  • External Entity Relation Participant Attribute: add_valid_external_entity_relation_participant_attribute_type(new_attribute, data_type)
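
Put together, a parser would typically register only what BIANA does not already know and then refresh the database information, as the example parser does. The sketch below shows that order of calls against a hypothetical recording stub (RecordingAccess), so it runs without BIANA. The method names and argument order are those listed above, while the helper register_extras and the type and attribute names (phosphorylation, BindingEnergy) are invented for the example.

```python
class RecordingAccess(object):
    """Hypothetical stub: records the name and arguments of every call made on it."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, method_name):
        # Only reached for attributes that do not exist, i.e. the BIANA-style methods.
        def record(*args):
            self.calls.append((method_name,) + args)
        return record


# Relation types taken from the list of valid BIANA types above.
KNOWN_RELATION_TYPES = set(["interaction", "no_interaction", "reaction", "cluster",
                            "homology", "pathway", "alignment", "complex",
                            "regulation", "cooperation", "forward_reaction",
                            "backward_reaction"])


def register_extras(access, relation_type, numeric_attribute):
    """Register a relation type if it is new, plus one numeric attribute."""
    if relation_type not in KNOWN_RELATION_TYPES:
        access.add_valid_external_entity_relation_type(relation_type)
    access.add_valid_external_entity_attribute_type(
        numeric_attribute, "double", "eE numeric attribute")
    # The example parser refreshes after adding new attributes.
    access.refresh_database_information()


access = RecordingAccess()
register_extras(access, "phosphorylation", "BindingEnergy")
for call in access.calls:
    print(call)
```

In a real parser the same calls would be made on self.biana_access at the start of parse_database, before any entity is inserted.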

Submit a new parser

Have you written a new BIANA parser for your own data or for another database? Submit your own parser and share it with others!


Parsed Database Name Parsed Database link
Database Reference
List necessary files to parse
Parser file
Your e-mail
Additional comments

Copyright 2011 BIANA. All Rights Reserved.