Python: module PianaDBaccess

PianaDBaccess

../../../../piana/code/PianaDB/PianaDBaccess.py

File        : PianaDBaccess.py
Author      : R. Aragues & J. Planas
Creation    : 23.01.2004
Contents    : class used as an interface with database piana
Called from : all classes/programs that select/insert information from/into piana

=======================================================================================================

This class is the piana user interface to the piana database. There are lower levels that can be used to select/insert information from/into database piana (e.g. the PianaDB class), but they shouldn't be used unless the programmer has a complete understanding of the system.

This class implements functions such as insert_interaction() that, given all data about a particular interaction, will make consistency checks and all required insertions into piana tables. This is done to ensure that a user doesn't forget to update all tables that are related to a new interaction, making the internal structure of the database transparent to him.

Summarizing, this class should be the only means of interaction with the database used by piana.

Modules

Bio
MySQLdb
PianaGlobals
os
re
sys
time
utilities

Classes



__builtin__.object

Buffer
BufferElement

TemporalTableBuffer

PianaDBaccess

class Buffer(__builtin__.object)

    Class used as a buffer for piana inserts

Methods defined here:

__init__(self, max_size, dbaccessObject)

get_buffer(self)
Returns the buffer dictionary

get_buffer_element(self, key)

insert2buffer(self, key, table, columns, values)
Inserts a query into the insert buffer. Alert! Only used for insert buffers!

insert2tempbuffer(self, key, table, columns, conditions)
Inserts a query into the temptable buffer

Data and other attributes defined here:

__dict__ = <dictproxy object>
dictionary for instance variables (if defined)

__weakref__ = <attribute '__weakref__' of 'Buffer' objects>
list of weak references to the object (if defined)

class BufferElement(__builtin__.object)

    Class used in Buffer object to store a buffer element

Methods defined here:

__init__(self, max_size, table, columns, values=None)
Initializes a new buffer element
   "max_size" is the maximum size of the buffer element
   "table" is the table name in the database
   "columns" is a tuple with the names of the columns
   "values" is a tuple with the values to insert

getColumns(self)

getSize(self)

getTable(self)

getValues(self)

insert_values(self, values)
Inserts a new tuple of values into the buffer element. "values" must be a tuple of values to insert. If the values cannot be inserted because the size has been exceeded, returns None.

restart_bufferElement(self, values)
Restarts a buffer element to its initial values and inserts the new tuple of values.

Data and other attributes defined here:

__dict__ = <dictproxy object>
dictionary for instance variables (if defined)

__weakref__ = <attribute '__weakref__' of 'BufferElement' objects>
list of weak references to the object (if defined)
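The BufferElement description above (a maximum size, a target table and columns, insert_values() returning None once the size is exceeded, and a restart method) can be modelled in memory as follows. This is a minimal sketch, not PIANA's code: the class name SketchBufferElement is hypothetical, "max_size" is assumed to count value tuples (the reference does not say which unit it uses), and the non-None return value on success (the new size) is also an assumption.

```python
class SketchBufferElement:
    """Hypothetical stand-in for BufferElement: collects rows for one table."""

    def __init__(self, max_size, table, columns, values=None):
        self.max_size = max_size        # assumed: maximum number of value tuples held
        self.table = table              # target table name
        self.columns = tuple(columns)   # column names, in insert order
        self.values = []                # buffered value tuples
        if values is not None:
            self.values.append(tuple(values))

    def insert_values(self, values):
        """Append a tuple; return None (and store nothing) if the element is full."""
        if len(self.values) >= self.max_size:
            return None
        self.values.append(tuple(values))
        return len(self.values)         # assumed success value: the new size

    def restart_bufferElement(self, values):
        """Empty the element and start again with one new tuple."""
        self.values = [tuple(values)]

    def getSize(self):
        return len(self.values)
```

A caller would flush the element (execute the buffered insert) when insert_values() returns None, then call restart_bufferElement() with the tuple that did not fit.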

class PianaDBaccess(__builtin__.object)

    Class used as an interface with database piana

Methods defined here:

__getnewargs__(self)

__getstate__(self)
Methods required for using pickle with piana objects

__init__(self, dbname=None, dbhost=None, dbuser=None, dbpassword=None, dbport=None, buffer=1)
"dbname" is the name of the database you want to connect to (required)
"dbhost" is the machine with the MySQL server that holds the piana database (required)
"dbuser" is the MySQL user (not required in most systems)
"dbpassword" is the MySQL password (not required in most systems)
"dbport" is the MySQL port (not required in most systems)
"buffer" indicates whether the user wants to use the internal Piana buffer. It can be None (do not use the buffer) or 1 (use the buffer). The buffer is used by default.

__setstate__(self, dict)
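This reference does not explain what the pickle hooks (__getnewargs__, __getstate__, __setstate__) actually do. A common reason for such hooks is that a live database connection cannot be pickled, so the state is saved without it and the connection is reopened on load. The sketch below assumes that reason; it uses a hypothetical SketchDBAccess class and the standard-library sqlite3 module in place of MySQLdb so that it runs without a server. It is not PIANA's implementation.

```python
import pickle
import sqlite3


class SketchDBAccess:
    """Hypothetical object holding a live connection that cannot be pickled."""

    def __init__(self, dbname):
        self.dbname = dbname
        self.conn = sqlite3.connect(dbname)   # live handle: pickling it raises TypeError

    def __getstate__(self):
        # Copy the attribute dict and drop the live connection.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then reopen the connection.
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.dbname)


original = SketchDBAccess(":memory:")
clone = pickle.loads(pickle.dumps(original))   # works because conn is not pickled
```

Note that reopening ":memory:" gives a new, empty database; with a real server the reconnect would reach the same data.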

add_external_database_information_type(self, externalDatabase, information_type)
Adds to an external database which kind of information is used from it.
ATTENTION! information_type must be one of the valid types (it has to be checked beforehand!):
   - "protein sequences"
   - "protein attributes"
   - "protein-protein interactions"

check_keywords_in_protein(self, list_proteinPiana, protein_ext_code=None, id_type=None, keywords=[], search_in=None)
Checks all protein info [i.e. description, function, keyword list] for keywords given by the user.
Returns the list of words in "keywords" that appear somewhere in the protein info (empty if nothing is found).
"list_proteinPiana" is the list of proteins that you want to check (if it is just one protein, write [your_proteinPiana])
   You can also use external code types: in that case, set list_proteinPiana to None and set protein_ext_code and id_type:
      "protein_ext_code" is the protein external code
      "id_type" is the type of code used for protein_ext_code
         --> it has to be one of the types listed in the PIANA reference card (read README.piana_tutorial)
         --> attention: the tax id is not fixed here, therefore I recommend not using geneNames...
"keywords" is a list of strings
   --> keywords must be in lower case
   --> this method will check that the keywords appear for the protein, regardless of the case of the words in the protein info
   --> for example, keywords could be ['cancer', 'onco', 'apoptosis']
"search_in" is a list of texts where the keywords should be searched
   if it is not used, description, function and keywords will be queried from the database
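The matching rule described above (lower-case keywords, searched regardless of the case of the protein text) can be sketched on a plain list of texts, such as the "search_in" argument. Whether PIANA matches substrings or whole words is an assumption here: the example keyword 'onco' suggests substring matching, so that is what this hypothetical find_keywords() helper does.

```python
def find_keywords(texts, keywords):
    """Return the keywords (lower case) that occur in any of the texts.

    Matching is case-insensitive on the text side and by substring,
    so 'onco' also matches 'Oncogene'.
    """
    lowered = [t.lower() for t in texts if t]
    return [kw for kw in keywords if any(kw in t for t in lowered)]
```

The result keeps the order of "keywords", and an empty list means no keyword was found.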

check_lock_frequency(self, table_list=[])

check_protein(self, proteinSequence_value, tax_id_value=None)
Method used to check if a protein is already in the database. Returns a proteinPiana if the protein exists, or None if it does not. Since pianaDB is a sequence-based DB, proteinSequence_value is mandatory. There is one proteinPiana for each (sequence, tax_id) pair: therefore, tax_id is mandatory. However, when the user doesn't know the tax id of the protein, a dummy tax_id (i.e. 0) is used to allow searching for that sequence in the database. Therefore, if the tax_id is unknown, leave it as None.
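The "one proteinPiana per (sequence, tax_id)" rule, with 0 standing in for an unknown tax id, can be sketched with an in-memory dictionary in place of the database. Keying on the MD5 digest of the sequence is an assumption (the module does expose MD5-based lookups such as get_list_proteinPiana_from_MD5, but this reference does not describe the key check_protein uses); protein_key() and check_protein() below are hypothetical helpers, not the real method.

```python
import hashlib

UNKNOWN_TAX_ID = 0   # dummy tax id used when the species is unknown


def protein_key(sequence, tax_id=None):
    """Build the (sequence digest, tax_id) lookup key for one protein."""
    digest = hashlib.md5(sequence.encode("ascii")).hexdigest()
    return digest, (UNKNOWN_TAX_ID if tax_id is None else tax_id)


def check_protein(index, sequence, tax_id=None):
    """Return the proteinPiana stored for (sequence, tax_id), or None."""
    return index.get(protein_key(sequence, tax_id))
```

Because None is mapped to 0 before the lookup, a sequence stored with an unknown species is found only when the caller also leaves tax_id as None.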

check_proteins_similarity(self, proteinPiana_a_value, proteinPiana_b_value)
Returns 1 if the proteins are the same (ie. there is an entry proteinPiana_a, proteinPiana_b in table proteinSimilarity)

close(self)
Close the connection with database

delete_interaction(self, interactionPiana_value)
Deletes the entry of table interaction with interactionPiana = interactionPiana_value. Attention! You should not delete an interactionPiana before making sure that the entries for that interactionPiana in the other interaction tables are deleted as well: this method only deletes the row in table interaction.

delete_interaction_method_for_sourceDBID(self, sourceDBID_value)
Deletes entries of table interactionMethod with sourceDBID = "sourceDBID_value"

delete_interaction_protein_source_for_sourceDBID(self, sourceDBID_value)
Deletes entries of table interactionProteinSource with sourceDBID = "sourceDBID_value"

delete_interaction_scores_for_sourceDBID(self, sourceDBID_value)
Deletes entries of table interactionScores with sourceDBID = "sourceDBID_value"

delete_interaction_sourceDB_for_sourceDBID(self, sourceDBID_value)
Deletes entries of table interactionSourceDB with sourceDBID = "sourceDBID_value". Before the deletion, it performs a select to keep a list of the entries that are going to be deleted. Returns a list of the interactionPianas that have been deleted from table interactionSourceDB.
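The select-then-delete pattern described above can be sketched with the standard-library sqlite3 module standing in for MySQL. The two-column layout of interactionSourceDB (interactionPiana, sourceDBID) and the source names "db_x"/"db_y" are hypothetical; note also that sqlite3 uses "?" placeholders where MySQLdb uses "%s".

```python
import sqlite3


def delete_source_db_entries(conn, source_db_id):
    """Delete rows for one sourceDBID and return the interactionPianas removed."""
    # Remember which rows are about to go.
    rows = conn.execute(
        "SELECT interactionPiana FROM interactionSourceDB WHERE sourceDBID = ?",
        (source_db_id,)).fetchall()
    with conn:  # commits the delete on success, rolls it back on error
        conn.execute("DELETE FROM interactionSourceDB WHERE sourceDBID = ?",
                     (source_db_id,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactionSourceDB (interactionPiana INTEGER, sourceDBID TEXT)")
conn.executemany("INSERT INTO interactionSourceDB VALUES (?, ?)",
                 [(1, "db_x"), (2, "db_x"), (3, "db_y")])
conn.commit()
deleted = delete_source_db_entries(conn, "db_x")
```

The returned list is what a caller would use to decide whether those interactionPianas still appear in other source databases.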

disable_buffer(self)
Disables the use of the insert buffer. If the buffer contains anything, it empties it first

drop_external_database(self, database_name, pianaDB)
Method used to delete from the database all records from an external database

drop_multiple_external_databases(self, list_databases, pianaDB)
Method used to delete from the database all records from multiple external databases

empty_temporal_table(self)
Executes all commands used to empty the temporal table and update the tables. All tables will be emptied.

execute_insert_buffer(self, key_buffer=None)
Executes all inserts in an insert buffer and empties it. If key_buffer is None, it will execute the whole buffer. Otherwise, it will insert only the key specified.
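Flushing a buffer element (table, columns, buffered value tuples) amounts to turning the buffered rows into insert statements. One way to do it, shown here as an assumption rather than what execute_insert_buffer actually emits, is a single multi-row INSERT with "%s" placeholders (MySQLdb's parameter style), so that the driver escapes the values. build_multirow_insert() is a hypothetical helper.

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for one INSERT covering all buffered rows.

    Table and column names are pasted into the SQL, so they must come from
    trusted code; only the values go through placeholders.
    """
    if not rows:
        raise ValueError("nothing to flush")
    width = len(columns)
    for row in rows:
        if len(row) != width:
            raise ValueError("row %r does not match columns %r" % (row, columns))
    one_row = "(" + ", ".join(["%s"] * width) + ")"
    sql = "INSERT INTO %s (%s) VALUES %s" % (
        table, ", ".join(columns), ", ".join([one_row] * len(rows)))
    params = [v for row in rows for v in row]   # flattened, in row order
    return sql, params
```

The pair could then be passed to a DB-API cursor as cursor.execute(sql, params), replacing one round trip per row with one per flush.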

get_MD5_and_sequence_from_sequenceID(self, sequenceID=None)
returns the sequence that corresponds to sequenceID

get_all_g2_partners(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
Returns a list with all partners (proteinPianas) at distance 2 of "proteinPiana_value", considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
   --> If A interacts with B and C, B interacts with D, and C interacts with no other protein, then the partners of A at distance 2 are [D]
"threshold" is used to limit the number of partners that each of the proteins at distance 1 can have:
   - if "threshold" == 0, then always return all partners for all proteins
   - if threshold != 0 and number of partners > threshold, then return an empty list (no partners) for that particular protein
   --> ATTENTION!!!! threshold does not apply to the partners being returned by this method, but to the individual calls
       that this method makes to get_all_partners. Therefore, this method can return any number of partners,
       while guaranteeing that no single protein has added more than "threshold" partners to the list
Returns an empty list if no partner at distance 2 is found
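The distance-2 rule and the per-protein threshold can be sketched on an in-memory adjacency dictionary instead of database queries. Assumptions: the threshold is applied only to the distance-1 proteins (as the text above describes), the query protein itself is dropped from the result, and proteins that are also direct partners are not removed (the reference does not say). partners() and g2_partners() are hypothetical helpers.

```python
def partners(graph, protein, threshold=0):
    """Direct partners of protein; empty if threshold != 0 and it is exceeded."""
    found = sorted(graph.get(protein, ()))
    if threshold != 0 and len(found) > threshold:
        return []
    return found


def g2_partners(graph, protein, threshold=0):
    """Proteins two steps from `protein`, applying threshold per distance-1 protein."""
    result = set()
    for middle in partners(graph, protein):               # distance 1, no threshold
        for far in partners(graph, middle, threshold):    # threshold applies here
            if far != protein:
                result.add(far)
    return sorted(result)


# The example from the text: A-B, A-C, B-D; C has no other partner.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
```

With threshold=1, B (two partners) contributes nothing, so A has no partners at distance 2, even though the final list itself is never capped.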

get_all_partners(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
Returns a list with all partners (proteinPianas) of "proteinPiana_value", considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
If use_self_ints == "yes", it won't return the protein itself as a partner. Set it to 'no' if you want to get it in case it exists.
"threshold" is used to limit the number of partners that can be returned:
   - if "threshold" == 0, then always return all partners
   - if threshold != 0 and number of partners > threshold, then return an empty list (no partners)
Returns an empty list if no partner is found

get_all_partners_with_interactionPiana(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
Returns a list with all partners (proteinPiana, interactionPiana) of "proteinPiana_value" and their associated interactionPiana, considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
If use_self_ints == "yes", it won't return the protein itself as a partner. Set it to 'no' if you want to get it in case it exists.
"threshold" is used to limit the number of partners that can be returned:
   - if "threshold" == 0, then always return all partners
   - if threshold != 0 and number of partners > threshold, then return an empty list (no partners)
Returns an empty list if no partner is found

get_all_proteinPiana(self, tax_id_value=0)
returns all proteinPianas in pianaDB of species tax_id_value (if 0, returns all proteinPianas in database)

get_all_proteinSimilarity(self, sourceDB_value)
Returns a tuple of tuples of similar proteins (proteinPianaA,proteinPianaB) for a given database

get_all_protein_dbali_cluster(self, clustering_method)
Returns a list with all tuples (proteinPiana, dbali cluster ID) for a specific clustering method

get_all_protein_protein_interactions(self, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', only_id='no', tax_id_value='all')
Returns a list of triplets (proteinPianaA, proteinPianaB, interactionPiana), considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods).
If "inverse_dbs"/"inverse_methods" is 'yes', then it returns everything except for those dbs/methods.
If use_self_ints == "no", it won't return interactions between a protein and itself. Set it to 'yes' if you want to get them.
If only_id is 'yes', then a list of interactionPianas is returned instead of triplets.
If no interaction is found, returns an empty list.
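The list_source_dbs / inverse_dbs and list_source_methods / inverse_methods pairs recur across many methods in this class. Their described semantics can be sketched as a filter over hypothetical in-memory records (proteinPianaA, proteinPianaB, interactionPiana, sourceDB, method). Assumptions: 'all' disables a filter even when the matching inverse flag is 'yes', and an interaction reported by several sources is returned once. use_self_ints and tax_id_value are left out, and the source and method names are made up.

```python
def select_interactions(records, list_source_dbs="all", inverse_dbs="no",
                        list_source_methods="all", inverse_methods="no",
                        only_id="no"):
    """Filter (proteinPianaA, proteinPianaB, interactionPiana, sourceDB, method) records."""

    def keep(value, allowed, inverse):
        # 'all' disables the filter; otherwise test membership, negated if inverse == 'yes'
        if allowed == "all":
            return True
        if inverse == "yes":
            return value not in allowed
        return value in allowed

    seen = set()
    triplets = []
    for a, b, interaction, db, method in records:
        if (keep(db, list_source_dbs, inverse_dbs)
                and keep(method, list_source_methods, inverse_methods)
                and (a, b, interaction) not in seen):
            seen.add((a, b, interaction))
            triplets.append((a, b, interaction))
    if only_id == "yes":
        return [interaction for _, _, interaction in triplets]
    return triplets


records = [
    (1, 2, 100, "db_x", "m1"),
    (1, 2, 100, "db_y", "m1"),
    (2, 3, 101, "db_y", "m2"),
]
```

Note that excluding "db_x" with inverse_dbs='yes' still returns interaction 100, because db_y also reports it: the filter works per source record, not per interaction.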

get_code_value(self, proteinCode_table, proteinCode_col, proteinPiana_value=None, answer_mode=None, source_db_info='no', id_type=None)
Method used to retrieve a protein external code from column "proteinCode_col" of table "proteinCode_table" for a given proteinPiana (it is a generalization to obtain external codes from a given table that correspond to protein proteinPiana)
"answer_mode" can be
   - 'single' (if you just want one external code)
   - 'list' (if you want all external codes from that table)
   - 'list_primary' (if you want all external codes from that table, but the list should contain only primary codes, unless there were none)
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
   - "no" will simply return a list of external codes
   - "yes" will return a list of tuples (external_code, sourceDBID)
In those cases where there is only one external code for "proteinPiana", "answer_mode" 'single' will return it as a single element and 'list' as the first (and only) element of a list.
If nothing is found, the method will return None (in "answer_mode" 'single') and an empty list (in "answer_mode" 'list').
"id_type" is only used when proteinCode_table is the generalized table.
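The three answer modes can be sketched on a hypothetical list of (code, is_primary) pairs already fetched for one proteinPiana. Which code 'single' returns when there are several is not specified above; taking the first one is an assumption of this hypothetical pick_codes() helper.

```python
def pick_codes(codes, answer_mode="list"):
    """Select external codes from (code, is_primary) pairs as answer_mode asks."""
    if answer_mode == "single":
        return codes[0][0] if codes else None        # assumed: first code wins
    if answer_mode == "list":
        return [code for code, _ in codes]
    if answer_mode == "list_primary":
        primary = [code for code, is_primary in codes if is_primary]
        return primary if primary else [code for code, _ in codes]   # fall back to all
    raise ValueError("unknown answer_mode: %r" % (answer_mode,))
```

The 'list_primary' fallback mirrors the "unless there were none" clause: non-primary codes are returned only when no primary code exists.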

get_column_of_id_type(self, id_type)
Method that returns the column in which "id_type" can be found

get_complete_table(self, table, attributes=[])
Get the complete content of table "table" for "attributes" values, without removing duplicates

get_dic_proteins_with_known_ints(self, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
Returns a dictionary whose keys are those proteinPianas for which there are known interactions, considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods). The values of the dictionary are None for all proteinPianas. If use_self_ints == "no", it won't return proteinPianas that only interact with themselves. Set it to 'yes' if you want to get them.

get_from_dict(self, dict_name, description_value)
returns the ID for a given description "description_value" of a dictionarized entity. description_value must exist in the dictionary PianaGlobals.dict_name, otherwise nothing will be returned.

get_go_depth(self, term_id_value)
Returns depth value in GO tree for "term_id_value"

get_interactionPiana(self, proteinPianaA_value, proteinPianaB_value, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
Returns the interactionPiana for the interaction between proteins "proteinPianaA_value" and "proteinPianaB_value", considering only those interaction source databases listed in "list_source_dbs" ('all' returns all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' returns all interactions from all methods). There is no need to respect the order of proteinPianaA and proteinPianaB: this method takes care of it, so place them as you want. If there is no such interaction, returns None.
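Order-independent lookup, as promised above, is usually achieved by storing and querying each pair in a canonical order. The sketch below assumes smaller-id-first ordering and uses a dictionary in place of the interaction table; the reference does not say how PIANA orders the pair internally, and interaction_key() and get_interaction() are hypothetical helpers.

```python
def interaction_key(protein_a, protein_b):
    """Order a protein pair canonically so (a, b) and (b, a) map to one key."""
    return (protein_a, protein_b) if protein_a <= protein_b else (protein_b, protein_a)


def get_interaction(index, protein_a, protein_b):
    """Return the interaction id for the pair, or None if there is none."""
    return index.get(interaction_key(protein_a, protein_b))
```

Both the insert path and the lookup path must apply the same key function, otherwise a pair stored as (7, 3) would never be found.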

get_interaction_line_style(self, interaction_type)
returns the line style established in PianaGlobals for interaction_type. interaction_type can be:
     - normal
     - expanded

get_interaction_methodID_list(self, interactionPiana_value)
Returns a list of methodIDs for a particular "interactionPiana_value". Returns an empty list if nothing is found.
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
   -> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by the user.
   -> Right now this method is only used to check if an interaction can be deleted (./dbModification/delete_interactions_from_db.py) and to calculate piana stats

get_interaction_methodID_list_for_sourceDB(self, interactionPiana_value, sourceDBID_value)
Returns a list of methodIDs for a particular interaction "interactionPiana_value" for a given "sourceDBID_value" (here, sourceDBID is the name of the database, not the integer associated to it). Returns an empty list if nothing is found.
Attention! There is no argument list_source_dbs: since the arguments of this method fix a sourceDB, I let the user be responsible for not asking for methodIDs of a database that is not the one he wants...

get_interaction_pubmedID_list(self, interactionPiana_value)
Returns a list of pubmedIDs for a particular "interactionPiana_value". Returns an empty list if nothing is found.
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
   -> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by the user.

get_interaction_source_database_color(self, database_name)
returns the color code established in PianaGlobals for database_name

get_interactors(self, interaction_id=None)
Returns the tuple of interactors (proteinPianaA, proteinPianaB) for a given interaction_id. There is no need to check if the interaction belongs to a specific method: the user only calls this method after having retrieved an interactionPiana, and he will only retrieve interactionPianas that respect the source db and method restrictions.

get_list_proteinMD5(self)
returns a list with all distinct proteinMD5

get_list_proteinPiana(self, protein_id=None, id_type=None, tax_id_value=0, source_db_info='no', only_primary='default')
Method used to retrieve a list of proteinPiana identifiers from a given protein "protein_id" of type "id_type".
For example, one geneName "mot1" could correspond to many proteinPianas, since different databases give different sequences to mot1.
Internally, we have to work with all proteinPianas, and then give the answer to the user in the type of code he chose.
There can be many proteinPianas [id, id, id], just one [id] or zero [].
Valid "id_type" values are those valid protein identifiers in the current database.
   For types "geneName" and "uniprotAcc", only will be returned
"tax_id_value" fixes the species for the proteinPianas that will be returned. If tax_id_value is 0, all proteinPianas associated to protein_id will be returned.
   In case you don't know the species of your code, set tax_id_value to 0. If your code implicitly points to a single species, this will be OK.
   In which cases does tax_id_value have an effect on the proteinPianas that are returned? For example, if a geneName is associated to several proteinPianas, only those that are of tax_id_value will be returned.
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
   - "no" will simply return a list of proteinPianas
   - "yes" will return a list of tuples (proteinPiana, sourceDBID)
If the protein code being passed is a pdb code, the format of the code must be pdb_code.chain_id
   -> If the chain_id is None, write pdb_code. (leaving the dot to mark that the chain is unknown)
"only_primary" determines, for those cases in which external codes can be primary or not (geneName and uniprotAcc), whether only primary correspondences are taken:
   - "yes" will return only those proteinPianas that correspond to the code when it is primary.
   - "yes-no" will return those proteinPianas that correspond to the code when it is primary. If none is found, then all correspondences are searched.
   - "no" will return all proteinPianas, regardless of whether the external code is primary or not
   - "default" will be:
        "yes-no" for geneName and uniprotAccession (uniacc)
        "no" for the rest
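The pdb code convention above ("pdb_code.chain_id", keeping the trailing dot when the chain is unknown) is easy to get wrong when building codes by hand. The two hypothetical helpers below build and split such codes; the pdb code "1abc" is made up for illustration.

```python
def format_pdb_code(pdb_code, chain_id=None):
    """Build 'pdb_code.chain_id', keeping the dot when the chain is unknown."""
    return "%s.%s" % (pdb_code, chain_id if chain_id is not None else "")


def parse_pdb_code(code):
    """Split 'pdb_code.chain_id' back into (pdb_code, chain_id or None)."""
    pdb_code, sep, chain_id = code.partition(".")
    if not sep:
        raise ValueError("expected 'pdb_code.chain_id' or 'pdb_code.', got %r" % code)
    return pdb_code, (chain_id or None)
```

For example, format_pdb_code("1abc", "A") gives "1abc.A" and format_pdb_code("1abc") gives "1abc.", the form to pass as protein_id when the chain is unknown.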

get_list_proteinPiana_from_MD5(self, md5_value=None)
returns a list of proteinPianas that correspond to a sequence (represented by its md5 code)

get_list_protein_external_codes(self, proteinPiana=None, id_type=None, alternative_id_types=[], answer_mode='list', source_db_info='no')
returns the list of external protein codes of type "id_type" that correspond to protein proteinPiana. If no code is found for "id_type", it successively tries with the types listed in alternative_id_types
   -> if a code of an alternative type is found, the list returned will contain strings of the form "id_type:external_code"
   -> it will only return the list of codes for the first alternative type found
   -> e.g., alternative_id_types could be ["uniacc", "geneName", "md5"]
"proteinPiana" is the internal piana code for the protein whose external codes you want to get
"id_type" is an easy-to-remember protein code type
   -> they are stored in valid_protein_ids in the MySQL database, and can be consulted using a piana command
   -> valid id_type values are those in the PIANA reference card (read README.piana_tutorial)
   -> it can be "all", meaning that the list returned will contain all codes of all types (all codes will be preceded by the type name)
      -> alternative_id_types are ignored if id_type is "all"
"alternative_id_types" is a list of easy-to-remember protein code types
   for example, alternative_id_types could be ["uniacc", "gi", "proteinPiana"]
   I recommend always writing md5 at the end to make sure you don't have Nones in your output
      -> alternative_id_types are ignored if id_type is "all"
"answer_mode" can be:
   - "single" (if you just want one external identifier)
   - "list" (if you want all external identifiers for that proteinPiana)
   - "list_primary" (if you want all external identifiers for that proteinPiana, but identifiers should be primary, unless none is found)
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
   - "no" will simply return a list of external_codes
   - "yes" will return a list of tuples (external_code, sourceDBID) (this mode does not allow alternative_id_types nor id_type=='all')
Attention!!! This method does not work if you set "id_type" to 'all' and "answer_mode" to 'single'!!!!
Attention!!! This method does not work if you ask for all external codes (id_type="all") with their source db infos (source_db_info="yes")!!!!
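The alternative_id_types fallback described above (try id_type, then each alternative in order, stop at the first type that has codes, and prefix those codes with their type name) can be sketched on a hypothetical dictionary mapping id type names to the codes of one protein. Returning an empty list when nothing is found, and the made-up md5 value, are assumptions; id_type="all", answer_mode and source_db_info are not modelled.

```python
def external_codes(codes_by_type, id_type, alternative_id_types=()):
    """Return codes of id_type, else those of the first alternative type that has any.

    codes_by_type maps an id type name to the list of codes for one protein.
    Codes from an alternative type come back as "type:code".
    """
    codes = codes_by_type.get(id_type, [])
    if codes:
        return list(codes)
    for alt_type in alternative_id_types:
        alt_codes = codes_by_type.get(alt_type, [])
        if alt_codes:
            return ["%s:%s" % (alt_type, code) for code in alt_codes]
    return []   # assumed behaviour when no type has codes
```

This also shows why putting md5 last is recommended: every protein has a sequence, so an md5 alternative always yields something.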

get_list_sequenceID(self, taxID=0)
returns a list with all distinct sequenceIDs. "taxID" is the tax ID of the species for which we want to obtain the list of sequenceIDs. If it is 0, it is not taken into account.

get_list_sourceDB_names_from_interactionPiana(self, interactionPiana_value)
Returns a list of sourceDBs for a particular "interactionPiana_value". Returns an empty list if nothing is found.
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
   -> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by the user.

get_list_sourceDB_names_from_sourceDB_id(self, sourceDB_id)
returns a list of strings (ie. names of databases) from the integer associated to the record (i.e. sourceDB_id)

get_list_valid_protein_ids(self)
returns a sorted list with valid protein identifiers

get_list_valid_sources(self)
returns a sorted list with valid database sources

get_max_proteinPiana(self)
Method that returns the maximum value of proteinPiana in the database

get_methodID(self, methodDescription_value)
returns the methodID for a given description "methodDescription_value" of a method used to find interactions. methodDescription_value must exist in the dictionary PianaGlobals.method_names, otherwise nothing will be returned.

get_new_interaction_piana(self)
Method used to retrieve a new interactionPiana identifier from table proteinPianaCounter. Called every time we want to insert a new interaction in the database. It takes care of incrementing the counter in the table by 1. proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana.

get_new_protein_piana(self)
Method used to retrieve a new proteinPiana identifier from table proteinPianaCounter. Called every time we want to insert a new (sequence, taxID) in the database. It takes care of incrementing the counter in the table by 1. proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana.
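Reading and incrementing a counter table, as described for the get_new_* methods, should happen in one transaction so that two writers never receive the same identifier. The sketch below uses the standard-library sqlite3 module instead of MySQL so it runs anywhere; the single-column layout of proteinPianaCounter is hypothetical. On MySQL, a documented alternative is UPDATE ... SET col = LAST_INSERT_ID(col + 1) followed by SELECT LAST_INSERT_ID(); whether PIANA does either is not stated here.

```python
import sqlite3


def next_counter_value(conn):
    """Increment the single-row counter and return the new value."""
    with conn:  # the UPDATE opens a transaction; both statements commit together
        conn.execute("UPDATE proteinPianaCounter SET proteinPiana = proteinPiana + 1")
        (value,) = conn.execute(
            "SELECT proteinPiana FROM proteinPianaCounter").fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteinPianaCounter (proteinPiana INTEGER)")  # hypothetical layout
conn.execute("INSERT INTO proteinPianaCounter VALUES (0)")
conn.commit()
```

Each call returns the next identifier: 1, then 2, and so on.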

get_new_sequenceID(self)
Method used to retrieve a new sequenceID identifier from table proteinPianaCounter. Called every time we want to insert a new sequence in the database. It takes care of incrementing the counter in the table by 1. proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana.

get_node_border_color(self, node_origin)
returns the color code established in PianaGlobals for node_origin. node_origin can be:
     - expanded
     - normal

get_node_fill_color(self, node_type)
returns the color code established in PianaGlobals for node_type. node_type can be:
     - root
     - normal

get_pair_method_protein_dbali_cluster(self, proteinPiana_value)
Returns a list of tuples (dbali cluster ID, clustering_method) for protein "proteinPiana_value"

get_partner(self, interaction_id=None, proteinPiana_value=None)
Returns the partner (proteinPiana) of "proteinPiana_value" in interaction "interaction_id". There is no need to check if the interaction belongs to a specific method: the user only calls this method after having retrieved an interactionPiana, and he will only retrieve interactionPianas that respect the source db and method restrictions.

get_partners_of_proteins_sharing_cog(self, proteinPiana_value=None, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
Returns interaction partners (proteinPianas) of those proteins that share the cogID with "proteinPiana_value", considering only those interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction source methods listed in "list_source_methods" ('all' gets all interactions from all methods)

get_piana_db(self)
Returns the PianaDB object used for this PianaDBaccess object

get_proteinPianas_with_value(self, value, proteinCode_table, proteinCode_col)
returns proteinPianas whose value for column "proteinCode_col" in table "proteinCode_table" is "value"

get_protein_cath(self, proteinPiana_value, residue_value=None)
Returns a list of cathIDs for protein "proteinPiana_value" If "residue_value" is not None, then it will return only caths that are defined around that residue If "residue_value" is None, returns all cath values for the protein

get_protein_cog(self, proteinPiana_value)
returns a list with all COG identifiers that are assigned to protein "proteinPiana_value" In case no COG identifier is found, returns an empty list.

get_protein_dbali_cluster(self, proteinPiana_value, clustering_method, source_db)
Returns a list with all dbali cluster IDs of a proteinPiana for a specific clustering method "source_db" can be 'dbali', 'blast_transfer' or 'all'. It can be used to limit the dbali_clusters returned to those found through a specific technique "clustering_method" refers to the parameters used for the clustering. Valid values are shown in PianaGlobals.pibase_dbali_methods

get_protein_description(self, proteinPiana_value)
returns a list with all descriptions (text string) that are assigned to protein "proteinPiana_value" In case no description is found, returns an empty list.

get_protein_ec(self, proteinPiana_value)
returns a list with all EC identifiers that are assigned to protein "proteinPiana_value" In case no EC identifier is found, returns an empty list.

get_protein_function(self, proteinPiana_value)
returns a list with all functions (text string) that are assigned to protein "proteinPiana_value" In case no function is found, returns an empty list.

get_protein_go_accession(self, proteinPiana_value)
returns a list with all GO accessions (codes that look like "GO:234324") that are assigned to protein "proteinPiana_value" In case no GO term is found, returns an empty list.

get_protein_go_name(self, go_term_id_value)
returns the GO name associated to "go_term_id_value". In case no GO term is found, returns None

get_protein_go_term_id(self, proteinPiana_value, term_type_value=None)
returns a list with all GO term ids (internal GO identifiers) that are assigned to protein "proteinPiana_value". In case no GO term is found, returns an empty list.
"term_type_value" can be None or one of the following:
   - "molecular_function"
   - "biological_process"
   - "cellular_component"
If term_type_value is None, then returns all GO terms independently of their term type category

get_protein_go_term_name(self, proteinPiana_value, term_type_value=None)
returns a list with all GO term names (e.g. Apoptosis) that are assigned to protein "proteinPiana_value". In case no GO term is found, returns an empty list.
"term_type_value" can be None or one of the following:
   - "molecular_function"
   - "biological_process"
   - "cellular_component"
If term_type_value is None, then returns all GO terms independently of their term type category

get_protein_ip(self, proteinPiana=None)
returns isoelectric point of protein proteinPiana

get_protein_keyword(self, proteinPiana_value)
returns a list with all keywords (text string) that are assigned to protein "proteinPiana_value" In case no keyword is found, returns an empty list.

get_protein_kingdoms(self, proteinPiana_value)
Returns a list with kingdoms (as defined by ncbi) related to "proteinPiana_value"

get_protein_mw(self, proteinPiana=None)
returns molecular weight of protein proteinPiana

get_protein_reactome_ids(self, proteinPiana_value)
returns a list with all REACTOME ids (e.g. REACT_1698.1) that are assigned to protein "proteinPiana_value" In case no REACTOME is found, returns an empty list

get_protein_reactome_names(self, proteinPiana_value)
returns a list with all REACTOME names (e.g. Gene Expression) that are assigned to protein "proteinPiana_value" In case no REACTOME is found, returns an empty list

get_protein_scop_cf(self, proteinPiana_value)
Returns a list of cf codes for protein "proteinPiana_value"

get_protein_scop_cf_sf_fa(self, proteinPiana_value)
Returns a list of tuples (cf, sf, fa) for protein "proteinPiana_value"

get_protein_scop_fa(self, proteinPiana_value)
Returns a list of fa codes for protein "proteinPiana_value"

get_protein_scop_sf(self, proteinPiana_value)
Returns a list of sf codes for protein "proteinPiana_value"

get_protein_sequence(self, proteinPiana=None)
returns the sequence of protein proteinPiana

get_protein_sequenceLength(self, proteinPiana_value=None)
returns the length of the sequence of protein proteinPiana_value

get_protein_species_names(self, proteinPiana_value)
Returns a list with species names (as defined by ncbi) related to "proteinPiana_value"

get_protein_subcellularLocation(self, proteinPiana_value)
returns a list with all cellular locations (text strings) that are assigned to protein "proteinPiana_value". If no cellular location is found, returns an empty list.

get_protein_taxonomy_ids(self, proteinPiana_value)
Returns a list with taxonomy ids (as defined by ncbi) related to "proteinPiana_value"

get_proteins_sharing_cog(self, proteinPiana_value)
Returns a list of proteinPiana proteins that share a COG identifier with "proteinPiana_value". Returns an empty list if nothing is found.

get_proteins_sharing_ec(self, proteinPiana_value)
Returns a list of proteinPiana proteins that share an EC identifier with "proteinPiana_value". Returns an empty list if nothing is found.

get_proteins_sharing_go(self, proteinPiana_value)
Returns a list with the proteinPianas that have the same go_term_id as "proteinPiana_value"

get_proteins_sharing_interpro(self, proteinPiana_value)
Returns a list of proteinPiana proteins that share an InterPro identifier with "proteinPiana_value". Returns an empty list if nothing is found.

get_proteins_sharing_scop(self, proteinPiana_value)
Returns a list of proteinPiana proteins that share a SCOP family with "proteinPiana_value". Returns an empty list if nothing is found.

get_proteins_sharing_species(self, species_name_value=None, taxonomy_value=None)
Returns a list of proteinPiana proteins that have species "taxonomy_value" (or "species_name_value").
The user can give either a "taxonomy_value" (9606 for human, etc.) or a species name string "species_name_value" ('human', 'yeast', ...).
If both species_name_value and taxonomy_value are different from None, taxonomy_value is used.
Returns an empty list if nothing is found.
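A minimal sketch of the precedence rule above, assuming two hypothetical in-memory dictionaries in place of the database tables (proteinPiana -> tax id, and species name -> list of tax ids). The function name and data layout are illustrative only; what the real method does when both arguments are None is not documented, and this sketch simply returns an empty list.

```python
# Hypothetical sketch of the documented precedence rule. The two dictionaries
# stand in for database tables; they are illustrative, not PIANA's schema.
def proteins_sharing_species(protein_taxonomy, name_to_taxonomies,
                             species_name_value=None, taxonomy_value=None):
    """Return sorted proteinPianas whose taxonomy id matches the requested species."""
    if taxonomy_value is not None:
        # taxonomy_value takes precedence when both arguments are given
        wanted = {taxonomy_value}
    elif species_name_value is not None:
        wanted = set(name_to_taxonomies.get(species_name_value, []))
    else:
        wanted = set()
    return sorted(protein for protein, tax_id in protein_taxonomy.items()
                  if tax_id in wanted)


protein_taxonomy = {1: 9606, 2: 4932, 3: 9606}  # proteinPiana -> tax id
name_to_taxonomies = {"human": [9606], "yeast": [4932]}
print(proteins_sharing_species(protein_taxonomy, name_to_taxonomies,
                               species_name_value="yeast", taxonomy_value=9606))  # [1, 3]
```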

get_proteins_with_scop(self, cf=None, sf=None, fa=None)
Returns a list of proteinPianas that have a given SCOP code. If any of the three categories (cf, sf, fa) is None, that category is ignored; if all three categories are set, the returned proteins must have all three.
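The "None means ignore this category" rule can be sketched as a filter over (proteinPiana, (cf, sf, fa)) pairs. The helper `proteins_with_scop` and its input layout are hypothetical; the documentation does not say what happens when all three categories are None, and this sketch returns every protein in that case.

```python
# Hypothetical sketch: each assignment pairs a proteinPiana with its (cf, sf, fa) codes.
def proteins_with_scop(scop_assignments, cf=None, sf=None, fa=None):
    """Return sorted proteinPianas matching every SCOP category that is not None."""
    wanted = (cf, sf, fa)
    hits = set()
    for protein, codes in scop_assignments:
        # a None category is ignored; a set category must match exactly
        if all(w is None or w == code for w, code in zip(wanted, codes)):
            hits.add(protein)
    return sorted(hits)


scop_assignments = [(10, (1, 2, 3)), (11, (1, 2, 4)), (12, (5, 6, 7))]
print(proteins_with_scop(scop_assignments, cf=1))              # [10, 11]
print(proteins_with_scop(scop_assignments, cf=1, sf=2, fa=4))  # [11]
```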

get_reactome_names_from_reactome_ids(self, list_reactome_ids=[])
returns the reactome names that are associated to reactome ids in list

get_sequence_from_sequenceID(self, sequenceID=None)
returns the sequence that corresponds to sequenceID

get_similar_proteins_dic(self, proteinPiana_value)
Returns a dictionary whose keys are the proteins that are similar to "proteinPiana_value"

get_sourceDB_id_from_interactionPiana(self, interactionPiana_value)
Returns the integer sourceDBID of the source database for a particular "interactionPiana_value".
   --> this method returns an integer, not a list of sourceDB names!
   --> it should only be used internally by PianaGraphEdgeAttribute, to minimize the amount of memory used
Returns None if nothing is found.
Attention! There is no argument list_source_dbs or list_source_methods: the user can check whether the interaction is in other databases or methods, even if the network interactions were limited to those in a list of source interaction databases.
   -> This is done on purpose because it has no effect on the interactions added to the network: it only affects the edge attribute, in the sense that the attribute will have a complete list of sourceDBs and methods, regardless of restrictions imposed by the user.

get_source_dbs_dict(self, database_type='all')
Returns a dictionary with complete information on the external databases currently stored in the PIANA database.
"database_type" is used to restrict the databases for which information is retrieved. It can be:
          "all"
          "protein sequences"
          "protein attributes"
          "protein-protein interactions"
          "identifiers cross-references"
The dictionary has the following format:
        key: databaseName (the database identifier in the PIANA database)
        Each record in the dictionary is itself a dictionary with the following keys (and their contents):
                databaseVersion
                parsed_file
                date (date of the parsing)
                database_description
                database_type: list of the kinds of data the database inserts
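To make the dictionary format above concrete, here is a hypothetical example of the structure together with a sketch of the "database_type" restriction. The helper `filter_source_dbs`, the sample entries and their values are illustrative assumptions, not data from a real PIANA database.

```python
VALID_DATABASE_TYPES = (
    "protein sequences",
    "protein attributes",
    "protein-protein interactions",
    "identifiers cross-references",
)


def filter_source_dbs(source_dbs, database_type="all"):
    """Return the entries of source_dbs that insert data of database_type (hypothetical)."""
    if database_type == "all":
        return dict(source_dbs)
    if database_type not in VALID_DATABASE_TYPES:
        raise ValueError("unknown database type: %r" % (database_type,))
    # keep a database if database_type appears in its list of inserted data kinds
    return dict((name, info) for name, info in source_dbs.items()
                if database_type in info["database_type"])


# Illustrative entries only: versions, files and dates are made up.
example_source_dbs = {
    "dip": {
        "databaseVersion": "v1",
        "parsed_file": "dip.xml",
        "date": "2004-01-23",
        "database_description": "Database of Interacting Proteins",
        "database_type": ["protein-protein interactions"],
    },
    "uniprot": {
        "databaseVersion": "v1",
        "parsed_file": "uniprot.dat",
        "date": "2004-01-23",
        "database_description": "UniProt knowledgebase",
        "database_type": ["protein sequences", "protein attributes",
                          "identifiers cross-references"],
    },
}

print(sorted(filter_source_dbs(example_source_dbs, "protein sequences")))  # ['uniprot']
```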

get_species_names_from_taxonomies(self, list_taxonomy_ids=[], only_scientific='no')
Returns a list with the species names that are related to the taxonomy ids in "list_taxonomy_ids". If only_scientific is 'yes', only the scientific name is returned.

get_table_of_id_type(self, id_type)
Method that returns the table in which "id_type" can be found (i.e. the pianaDB table where this protein identifier is located)

get_taxonomies_from_species_name(self, species_name_value=None)
Returns a list with taxonomy ids for a given speciesName species_name_value

get_temporalSimilarity_data(self)
Gets all protein-protein similarities from the database. TO CHANGE: it currently reads from temporalProteinSimilarity, but it should read them directly from search...

get_term2term_distance(self, term1, term2)
Returns distance in the GO tree between "term1" and "term2"

get_valid_protein_ids(self)

get_valid_sources(self)
returns a dictionary with valid external database sources

insert_accessionNumber(self, accessionNumber_value, proteinPiana_value, accessionNumber_source_value)
Insert correspondence between accessionNumber "accessionNumber_value" and "proteinPiana_value" in table accessionNumber. Insert as well the source database (to keep the origin database of that code).

insert_blast_results(self, sequenceID_A_value, sequenceID_B_value, score_value, bit_score_value, start_A_value, end_A_value, start_B_value, end_B_value, identities_value, similarity_value, gaps_value, program_value, filter_value)
Inserts the blast result of two sequences into the database.
"program_value" is the program used to obtain the blast result. It can be "bl2seq" or "blastall".
"filter_value" indicates whether the filter was used when running blast. It can be "T" if the filter was used or "F" if it was not.

insert_cog(self, cog_id, cog_description, cog_function, source_db)
Inserts a COG (Cluster of Orthologous Genes) entry into Piana.
"cog_id" is the COG identifier
"cog_description" is the text string describing the COG
"cog_function" is the text string describing the function of the genes in this cluster
"source_db" is the database that is giving this information

insert_ensembl_code(self, ensembl_code_value, proteinPiana_value, ensembl_source_value)
Insert correspondence between ensembl "ensembl_code_value" and "proteinPiana_value" in table ensembl. Insert as well the source database (to keep the origin database of that code).

insert_external_id_code(self, external_id_type, external_id_value, proteinPiana_value, sourceDBID)
Insert correspondence between external id "external_id_value" of type "external_id_type" and "proteinPiana_value"

insert_geneID_code(self, geneID_code_value, proteinPiana_value, geneID_source_value)
Insert correspondence between geneID "geneID_code_value" and "proteinPiana_value" in table geneID. Insert as well the source database (to keep the origin database of that code).

insert_geneName_code(self, geneName_code_value, proteinPiana_value, geneName_source_value, isPrimary_value)
Insert correspondence between geneName "geneName_code_value" and "proteinPiana_value" in table geneName. Insert as well the source database (to keep the origin database of that code).

insert_gi_code(self, gi_code_value, proteinPiana_value, gi_source_value)
Insert correspondence between gi "gi_code_value" and "proteinPiana_value" in table gi. Insert as well the source database (to keep the origin database of that code).

insert_go(self, go_id, go_name, acc, term_type, distance2root, source_db)
Inserts a GO (Gene Ontology) entry into Piana.
"go_id" is the GO term id
"go_name" is the text string associated with the term id
"acc" is the GO accession (a code that looks like 'GO:234234')
"term_type" is the type of GO term -> can be one of the following (or None):
  - "molecular_function"
  - "biological_process"
  - "cellular_component"
"distance2root" is the distance between this term id and the root of the GO hierarchy
"source_db" is the external database that is giving this information

insert_go_term2term_distance(self, term1_id, term2_id, distance)
Inserts distance "distance" between GO terms "term1" and "term2" This is the distance between those terms in the GO hierarchy

insert_interPro_code(self, interProID_code_value, proteinPiana_value, interProDescription_value, interPro_source_value)
Insert InterPro information into table interPro (interProID and interProDescription). Insert as well the source (to keep the origin database of that code) and proteinPiana (to establish the relationship).

insert_interaction(self, proteinPianaA_value, isSourceA_value, proteinPianaB_value, isSourceB_value, interactionConfidence_value, methodDescription_value, sourceDBDescription_value, confidenceAssignedSourceDB_value, pubmed_id_value='unknown')
Inserts a new interaction into the piana database.
"proteinPianaA_value" is the proteinPiana for one side of the interaction
"isSourceA_value" sets whether the interaction goes from A to B (1) or not (0)
"proteinPianaB_value" is the proteinPiana for the other side of the interaction
"isSourceB_value" sets whether the interaction goes from B to A (1) or not (0)
   If "isSourceA_value" and "isSourceB_value" are both 1, the interaction is bi-directional.
"interactionConfidence_value" is the reliability of the interaction (not currently used)
"methodDescription_value" is the name of the method that detected this interaction (e.g. 'yeast two hybrids')
   It can be a single value or a list of values. The user does not need to worry about it: the method handles both.
   --> must appear in PianaGlobals.method_names
"sourceDBDescription_value" is the source database that contains this interaction (e.g. 'DIP')
   --> the source database must already exist in the database
"confidenceAssignedSourceDB_value" is the reliability assigned to the interaction by the source database
"pubmed_id_value" is the pubmed identifier of the article where this interaction was described
   It can be a single value or a list of values. The user does not need to worry about it: the method handles both.
Things this method does:
   - makes sure the order proteinPianaA < proteinPianaB is respected
   - looks up the methodID corresponding to the method description
   - looks up the sourceDBID corresponding to the sourceDB description
   1. checks whether the interaction already exists, and retrieves its interactionPiana if it does
   2. if interactionPiana is None:
         inserts the information into table interaction and retrieves the new interactionPiana
   3. inserts interactionSourceDB with interactionPiana
   4. inserts interactionMethod according to this sourceDB with interactionPiana
Attention! The user cannot limit the insertions to those from a specific database or method. To filter the database, do it afterwards: pianaDB will contain all interactions regardless of origin. (If this ever needs to change, the method will have to receive the arguments list_source_dbs and list_source_methods and use them to limit the insertion.)
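The first of the steps listed above (ordering the pair and accepting single values or lists) can be sketched without a database. This is a hypothetical helper, not the PIANA implementation; in particular, it assumes that when A and B are swapped their isSource flags are swapped with them so the direction of the interaction is preserved. The dictionary keys are made up for illustration.

```python
# Hypothetical sketch of the normalization insert_interaction() describes
# before touching any table. Assumption: direction flags travel with their proteins.
def normalize_interaction(proteinPianaA_value, isSourceA_value,
                          proteinPianaB_value, isSourceB_value,
                          methodDescription_value, pubmed_id_value="unknown"):
    """Return an interaction record with proteinPianaA <= proteinPianaB and list fields."""
    if proteinPianaA_value > proteinPianaB_value:
        # swap proteins and their direction flags together so direction is preserved
        proteinPianaA_value, proteinPianaB_value = proteinPianaB_value, proteinPianaA_value
        isSourceA_value, isSourceB_value = isSourceB_value, isSourceA_value

    def as_list(value):
        # accept either a single value or a list/tuple of values
        return list(value) if isinstance(value, (list, tuple)) else [value]

    return {
        "proteinPianaA": proteinPianaA_value,
        "isSourceA": isSourceA_value,
        "proteinPianaB": proteinPianaB_value,
        "isSourceB": isSourceB_value,
        "methods": as_list(methodDescription_value),
        "pubmed_ids": as_list(pubmed_id_value),
    }


record = normalize_interaction(7, 1, 3, 0, "yeast two hybrid", [11111, 22222])
print(record["proteinPianaA"], record["isSourceB"], record["pubmed_ids"])  # 3 1 [11111, 22222]
```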

insert_interaction_scores(self, interactionPiana_value, sourceDBDescription_value, equiv_nscore_value, equiv_nscore_transferred_value, equiv_fscore_value, equiv_pscore_value, equiv_hscore_value, array_score_value, array_score_transferred_value, experimental_score_value, experimental_score_transferred_value, database_score_value, database_score_transferred_value, textmining_score_value, textmining_score_transferred_value, combined_score_value)
Inserts interaction scores into table interactionScores_table. This table only holds information for interactions contained in STRING. Refer to the STRING manual for a description of the different arguments.

insert_new_external_database(self, databaseName=None, databaseVersion=None, parsedFile=None, databaseDescription=None, databaseInformation=None)
Inserts into the database the information of a new external database that is being integrated into the PIANA database.
"databaseName" will be the internal identifier for this database, so it must be unique.
"pianaDB" is the name of the PIANA database where the new protein id type is inserted. It is used to generate automatic documentation on the current database.
"databaseInformation" is a list of the types of information that the database inserts into the PIANA database.
ATTENTION! Each information type must be one of the valid types:
         "protein sequences"
         "protein attributes"
         "protein-protein interactions"
         "identifiers cross-references"

insert_new_id_type(self, proteinTypeId, proteinTable='proteinExternalId', externalIdColumn='proteinExternalId', externalIdDescription=None)
Inserts into the PIANA database a new external id type. First, it checks whether this external id type was already in the database: if it wasn't, it is inserted; otherwise, it is an error.
"proteinTable" is the SQL table where protein ids of type "proteinTypeId" are kept
"externalIdColumn" is the SQL column of "proteinTable" where protein ids of type "proteinTypeId" are kept

insert_pdb_code(self, pdb_code_value, proteinPiana_value, chain_value, range_value, pdb_source_value)
Insert correspondence between pdb codes and "proteinPiana_value" in table pdb. Internally, the pdb code is formed by "pdb_code_value" + "." + "chain_value". Insert as well the source database (to keep the origin database of that code).

insert_pfam_code(self, pfamID_code_value, proteinPiana_value, pfamDescription_value, pfam_source_value)
Insert pfam information into table pfam (pfamID and pfamDescription). Insert as well the source (to keep the origin database of that code) and proteinPiana (to establish the relationship).

insert_protein(self, proteinSequence_value, sourceDBID, tax_id_value=None)
Method used to insert new proteins into pianaDB in the `protein` table.
Returns a proteinPiana: the code corresponding to the (sequence, tax_id) pair
    --> if (sequence, tax_id) didn't exist already, creates a new proteinPiana
    --> if (sequence, tax_id) is already present in pianaDB, returns the previous proteinPiana
--------------------------------------------------------------------------------------------------------------
Since pianaDB is a sequence-based DB, proteinSequence_value is mandatory.
There is one proteinPiana for each (sequence, tax_id): therefore, tax_id is mandatory. However, when the user doesn't know the tax id of the protein, a dummy tax_id (i.e. 0) is used to allow inserting that sequence. Therefore, if the tax_id is unknown, leave it as None.
MD5 codes (calculated here) are used instead of sequences to speed up sequence comparison.
MW and IP values for the protein are calculated using BioPython methods.
This method takes care of handling proteinPiana codes:
  1. It first looks for the (sequence, tax_id) in table protein, which is the registry of correspondences between (protein sequence, tax id) and proteinPiana identifiers. This registry is needed to make sure that proteinPiana identifiers do not change when updating the database, or when building a new one from scratch.
  2. If the (sequence, tax_id) does not exist in the database, it obtains a new proteinPiana identifier. This is done with a method that looks into a counter table, returns its value and increases the counter for the next proteinPiana identifier.
  3. Once the proteinPiana is known (newly generated, or assigned in an earlier insertion), it inserts the protein into the database.
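The three-step proteinPiana assignment above can be sketched with an in-memory dictionary standing in for the `protein` registry and an integer standing in for the counter table. `ProteinRegistry` is hypothetical; it uses Python's `hashlib.md5` for the sequence digest, and it leaves out the actual row insertion and the BioPython MW/IP calculation.

```python
import hashlib


class ProteinRegistry(object):
    """Hypothetical in-memory stand-in for the `protein` registry and its counter table."""

    UNKNOWN_TAX_ID = 0  # dummy tax_id used when the taxonomy is unknown

    def __init__(self, first_proteinPiana=1):
        self._registry = {}  # (sequence MD5, tax_id) -> proteinPiana
        self._counter = first_proteinPiana

    @staticmethod
    def sequence_md5(sequence):
        # compare fixed-length digests instead of full sequences
        return hashlib.md5(sequence.encode("utf-8")).hexdigest()

    def insert_protein(self, proteinSequence_value, tax_id_value=None):
        tax_id = self.UNKNOWN_TAX_ID if tax_id_value is None else tax_id_value
        key = (self.sequence_md5(proteinSequence_value), tax_id)
        # 1. look for an existing (sequence, tax_id) entry
        proteinPiana = self._registry.get(key)
        if proteinPiana is None:
            # 2. take a new identifier from the counter and advance it
            proteinPiana = self._counter
            self._counter += 1
            self._registry[key] = proteinPiana
        # 3. the real method would now insert the protein row (sequence, MW, IP, ...)
        return proteinPiana


registry = ProteinRegistry()
print(registry.insert_protein("MKTAYIAK", 9606))  # 1 (new pair)
print(registry.insert_protein("MKTAYIAK", 9606))  # 1 (same pair, same id)
print(registry.insert_protein("MKTAYIAK"))        # 2 (unknown tax_id stored as 0)
```

Keying on the (digest, tax_id) pair is what keeps identifiers stable across rebuilds in this sketch: the same input always maps back to the id it was first given.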

insert_protein_cath(self, cath_id_value, res_start_value, res_end_value, segmentID_value, proteinPiana_value, proteinCathSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its CATH "cath_id_value".
"cath_id_value" is the CATH id
"res_start_value" is the residue where the CATH domain starts
"res_end_value" is the residue where the CATH domain ends
"segmentID_value" indicates which segment of the domain is being inserted (there can be several separate segments for one domain)
"proteinPiana_value" is the internal piana identifier for the protein
"proteinCathSource_value" is the external database that has set this correspondence

insert_protein_cog(self, cog_id, proteinPiana_value, proteinCogSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its COG (Cluster of Orthologous Genes) "cog_id".
"cog_id" is the COG term id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinCogSource_value" is the external database that has set this correspondence

insert_protein_dbali_cluster(self, dbali_cluster_id_value, proteinPiana_value, clustering_method_value, patch_residues_value, protein_dbali_cluster_source)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its DBAli cluster "dbali_cluster_id_value".
"dbali_cluster_id_value" is the cluster id assigned by DBAli to that protein
"proteinPiana_value" is the internal piana identifier for the protein
"clustering_method_value" is the method followed by DBAli to establish the correspondence
   -> the method must be listed in PianaGlobals.pibase_dbali_methods
"patch_residues_value" is the list of residues (comma-separated string) in the protein that correspond to that DBAli cluster
"protein_dbali_cluster_source" is the external database that has set this correspondence

insert_protein_description(self, description, proteinPiana_value, proteinDescriptionSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its description (text string).
"description" is the text string describing the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinDescriptionSource_value" is the external database that has set this correspondence

insert_protein_disease(self, disease, proteinPiana_value, proteinDiseaseSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated disease (text string).
"disease" is the text string describing the pathologies/diseases associated with the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinDiseaseSource_value" is the external database that has set this correspondence

insert_protein_ec(self, ec_id, proteinPiana_value, proteinECSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its EC code "ec_id".
"ec_id" is the EC id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinECSource_value" is the external database that has set this correspondence

insert_protein_function(self, function, proteinPiana_value, proteinFunctionSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its function (text string).
"function" is the text string describing the function of the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinFunctionSource_value" is the external database that has set this correspondence

insert_protein_go(self, go_id, proteinPiana_value, proteinGoSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its GO (Gene Ontology) term id "go_id".
"go_id" is the GO term id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinGoSource_value" is the external database that has set this correspondence

insert_protein_keyword(self, keyword, proteinPiana_value, proteinKeywordSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and an associated keyword.
"keyword" is the text string with a keyword associated with the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinKeywordSource_value" is the external database that has set this correspondence

insert_protein_mim(self, mimID_value, proteinPiana_value, proteinMIMSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated MIM ID "mimID_value".
"mimID_value" is the integer identification number in the MIM database
"proteinPiana_value" is the internal piana identifier for the protein
"proteinMIMSource_value" is the external database that has set this correspondence

insert_protein_reactome(self, reactome_id_value, proteinPiana_value, reactome_pathwayname_value, proteinReactomeSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated Reactome ID "reactome_id_value".
"reactome_id_value" is the Reactome ID
"proteinPiana_value" is the internal piana identifier for the protein
"reactome_pathwayname_value" is the Reactome pathway name
"proteinReactomeSource_value" is the external database that has set this correspondence (must appear in PianaGlobals.source_databases)

insert_protein_scop(self, cf, sf, fa, proteinPiana_value, proteinScopSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its SCOP "cf", "sf", "fa" values.
"cf" is the fold id
"sf" is the superfamily id
"fa" is the family id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinScopSource_value" is the external database that has set this correspondence

insert_protein_similarity(self, proteinPiana_a_value, proteinPiana_b_value, sourceDB_value)
Insert a pair of proteinPianas that are in fact the "same" protein (they are similar enough to be considered the same in some situations). This is used to avoid comparing two proteins that are in fact the same. insert_protein_similarity() makes sure that the order proteinPianaA < proteinPianaB is respected.
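A minimal sketch of why ordering the pair matters: if every pair is stored as (smaller, larger), a lookup works regardless of argument order and the same pair is never stored twice. `SimilarityRegistry` and `ordered_pair` are hypothetical in-memory stand-ins for the similarity table, and the sourceDB_value argument is left out.

```python
def ordered_pair(proteinPiana_a, proteinPiana_b):
    """Return the pair with the smaller proteinPiana first."""
    if proteinPiana_a <= proteinPiana_b:
        return (proteinPiana_a, proteinPiana_b)
    return (proteinPiana_b, proteinPiana_a)


class SimilarityRegistry(object):
    """Hypothetical in-memory stand-in for the protein similarity table."""

    def __init__(self):
        self._pairs = set()

    def insert_protein_similarity(self, proteinPiana_a_value, proteinPiana_b_value):
        # storing only the ordered pair means (a, b) and (b, a) map to one entry
        self._pairs.add(ordered_pair(proteinPiana_a_value, proteinPiana_b_value))

    def are_similar(self, proteinPiana_a, proteinPiana_b):
        # lets a caller skip comparing two proteins already known to be the "same"
        return ordered_pair(proteinPiana_a, proteinPiana_b) in self._pairs


similarities = SimilarityRegistry()
similarities.insert_protein_similarity(9, 4)
similarities.insert_protein_similarity(4, 9)
print(similarities.are_similar(4, 9), len(similarities._pairs))  # True 1
```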

insert_protein_subcellularLocation(self, subcellularLocation, proteinPiana_value, proteinSubcellularLocationSource_value)
Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its cellular location.
"subcellularLocation" is the text string with the cellular location of the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinSubcellularLocationSource_value" is the external database that has set this correspondence

insert_refseq_code(self, refseq_code_value, proteinPiana_value, refseq_source_value)
Insert correspondence between refseq "refseq_code_value" and "proteinPiana_value" in table refseq. Insert as well the source database (to keep the origin database of that code).

insert_species(self, tax_id, tax_name, tax_comment, tax_kingdom=None, source_db=None)
Inserts a species into Piana.
"tax_id" is the ncbi taxonomy identifier (e.g. 1)
"tax_name" is the name given to the species by ncbi (e.g. 'human')
"tax_comment" is the comment associated with the species (e.g. 'nothing')
"tax_kingdom" is the kingdom of the species (e.g. 'Eukaryota')
"source_db" is the external database that is giving this data (e.g. 'uniprot')

insert_species_kingdom(self, tax_id, tax_name, tax_kingdom, source_db)
If tax_id exists in pianaDB, inserts the kingdom for tax id "tax_id" (the tax name is ignored). If tax_id doesn't exist, inserts "tax_id", "tax_name", "tax_kingdom" and "source_db" as in insert_species().

insert_unigene_code(self, unigene_code_value, proteinPiana_value, unigene_source_value)
Insert correspondence between unigene "unigene_code_value" and "proteinPiana_value" in table unigene. Insert as well the source database (to keep the origin database of that code).

insert_uniprotAcc(self, uniprotAcc_value, proteinPiana_value, sourceDBID, isPrimary_value)
Insert correspondence between "uniprotAcc_value" and "proteinPiana_value" in table uniprotAcc. Insert as well the source database (to keep the origin database of that code). isPrimary_value indicates whether it is the primary accession code or not.

insert_uniprotEntry(self, uniprotEntry, proteinPiana_value, sourceDBID)
Insert correspondence between "uniprotEntry" and "proteinPiana_value" in table uniprotEntry. Insert as well the source database (to keep the origin database of that code).

insert_uniprotInfo(self, proteinPiana_value, uniprotEntry_value, uniprotAcc_value, data_class_value, description_value, geneName_value, organism_value, organelle_value, proteinSequenceLength_value, proteinMW_value)
Inserts a uniprot entry into piana (all the info found in uniprot; this is independent of uniprot entries and uniprot accession numbers). This is rarely used: it is kept so that piana can be queried for uniprot info. For a description of these fields, please refer to the uniprot manual.

lock_tables(self, table_list=None)
Locks the mysql tables indicated in "table_list".
"table_list": list of tables to lock. If it is not defined or if the list is [], all tables are locked.

set_lock_frequency(self, frequency_value)
Method to change the lock/unlock frequency (used only in parsers, to speed up insertions and deletions)

unlock_tables(self)
Unlocks tables previously locked with method lock_tables()
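The "empty or undefined list means all tables" rule of lock_tables() can be sketched as a builder for MySQL's LOCK TABLES statement, with UNLOCK TABLES as its counterpart. This is an assumption about how such a method could work, not the PIANA code: the helper name, the `all_tables` argument and the default WRITE lock type are hypothetical. While a session holds table locks, MySQL only lets it access the tables it has locked, which is one reason to lock all tables at once.

```python
def build_lock_statement(table_list, all_tables, lock_type="WRITE"):
    """Build a MySQL LOCK TABLES statement; None or [] means lock every table."""
    if lock_type not in ("READ", "WRITE"):
        raise ValueError("lock_type must be 'READ' or 'WRITE'")
    tables = table_list if table_list else all_tables
    return "LOCK TABLES " + ", ".join("%s %s" % (table, lock_type) for table in tables)


UNLOCK_STATEMENT = "UNLOCK TABLES"  # what an unlock_tables() call would need to send

print(build_lock_statement(["protein", "gi"], ["protein", "gi", "pdb"]))
# LOCK TABLES protein WRITE, gi WRITE
```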

update_table_column(self, proteinPiana=None, table=None, column=None, new_value=None)
Updates column "column" of table "table" where proteinPiana="proteinPiana" with the value "new_value". Used to update values in protein tables where the unique identifier is proteinPiana. It replaces the current value in the column with the new value provided (see its use in update_sequence_ip.py).
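Because SQL table and column names cannot be passed as query parameters, a method like update_table_column() has to interpolate them into the statement text. The sketch below, a hypothetical helper rather than the PIANA code, validates the identifiers and leaves the values as %s placeholders for a MySQLdb-style `cursor.execute(sql, params)` call. The column name `proteinIP` in the example is made up.

```python
import re

# SQL identifiers cannot be passed as query parameters, so they are checked instead
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_update_query(table, column, new_value, proteinPiana):
    """Return (sql, params) for updating one column of one protein row."""
    for name in (table, column):
        if not _IDENTIFIER.match(name or ""):
            raise ValueError("unsafe SQL identifier: %r" % (name,))
    sql = "UPDATE %s SET %s = %%s WHERE proteinPiana = %%s" % (table, column)
    return sql, (new_value, proteinPiana)


sql, params = build_update_query("protein", "proteinIP", 6.25, 42)
print(sql)     # UPDATE protein SET proteinIP = %s WHERE proteinPiana = %s
print(params)  # (6.25, 42)
# with a MySQLdb cursor: cursor.execute(sql, params)
```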

Data and other attributes defined here:

__dict__ = <dictproxy object>
dictionary for instance variables (if defined)

__weakref__ = <attribute '__weakref__' of 'PianaDBaccess' objects>
list of weak references to the object (if defined)

class TemporalTableBuffer(BufferElement)

    Class to store the information about the temporal table. It differs from BufferElement in that it has a new attribute and does not contain values to insert (those are inserted into a table of the database).

Method resolution order:

TemporalTableBuffer

BufferElement

__builtin__.object

Methods defined here:

__init__(self, max_size, table, columns, conditions)

get_columns_tuple(self)

get_conditions_list(self)

Methods inherited from BufferElement:

getColumns(self)

getSize(self)

getTable(self)

getValues(self)

insert_values(self, values)
Insert into the buffer element a new tuple of values.
"values" must be a tuple of values to insert.
If the values cannot be inserted because the size has been exceeded, returns None.

restart_bufferElement(self, values)
Restarts a Buffer Element to its initial values and insert the new tuple of values
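A minimal sketch of the BufferElement contract documented above (insert_values() returns None once the buffer is full; restart_bufferElement() starts over with the rejected tuple), plus one way the buffered rows could be flushed as a single multi-row INSERT. `SketchBufferElement`, its success return value and `to_insert_query` are assumptions, not the PIANA classes; the `gi` table and column names are illustrative.

```python
class SketchBufferElement(object):
    """Hypothetical bounded buffer of rows destined for one table."""

    def __init__(self, max_size, table, columns):
        self.max_size = max_size
        self.table = table
        self.columns = tuple(columns)
        self.values = []

    def insert_values(self, values):
        # mirror the documented contract: None when the buffer is already full
        if len(self.values) >= self.max_size:
            return None
        self.values.append(tuple(values))
        return len(self.values)  # assumption: success value is the new size

    def restart_bufferElement(self, values):
        # empty the buffer and start again with the rejected tuple
        self.values = [tuple(values)]

    def to_insert_query(self):
        # one multi-row INSERT with %s placeholders (MySQLdb paramstyle)
        row = "(" + ", ".join(["%s"] * len(self.columns)) + ")"
        sql = "INSERT INTO %s (%s) VALUES %s" % (
            self.table, ", ".join(self.columns), ", ".join([row] * len(self.values)))
        params = tuple(v for row_values in self.values for v in row_values)
        return sql, params


buffer_element = SketchBufferElement(2, "gi", ["gi", "proteinPiana"])
buffer_element.insert_values((1001, 10))
buffer_element.insert_values((1002, 20))
print(buffer_element.insert_values((1003, 30)))  # None: buffer is full
print(buffer_element.to_insert_query()[0])
# INSERT INTO gi (gi, proteinPiana) VALUES (%s, %s), (%s, %s)
```

A caller would execute the query returned by to_insert_query(), then call restart_bufferElement() with the tuple that did not fit.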

Data and other attributes inherited from BufferElement:

__dict__ = <dictproxy object>
dictionary for instance variables (if defined)

__weakref__ = <attribute '__weakref__' of 'BufferElement' objects>
list of weak references to the object (if defined)

Data

verbose = 0
verbose_insert_interaction = 0
verbose_species = 0
