- __builtin__.object
-
- Buffer
- BufferElement
-
- TemporalTableBuffer
- PianaDBaccess
class Buffer(__builtin__.object)

Class used as a buffer for piana inserts

Methods defined here:
- __init__(self, max_size, dbaccessObject)
- get_buffer(self)
- Returns the buffer dictionary
- get_buffer_element(self, key)
- insert2buffer(self, key, table, columns, values)
- Inserts a query into the insert buffer
Alert! Only used for insert buffers!!!
- insert2tempbuffer(self, key, table, columns, conditions)
- Inserts a query into the temptable buffer
Data and other attributes defined here:
- __dict__ = <dictproxy object>
- dictionary for instance variables (if defined)
- __weakref__ = <attribute '__weakref__' of 'Buffer' objects>
- list of weak references to the object (if defined)
|
class BufferElement(__builtin__.object)

Class used in Buffer object to store a buffer element

Methods defined here:
- __init__(self, max_size, table, columns, values=None)
- Initializes a new buffer element
"max_size" is the maximum size of the buffer element
"table" is the table name in the database
"columns" is a tuple with the names of the columns
"values" is a tuple with the values to insert
- getColumns(self)
- getSize(self)
- getTable(self)
- getValues(self)
- insert_values(self, values)
- Insert into buffer element a new tuple of values
"values" must be a tuple of values to insert
If the values cannot be inserted because the size has been exceeded, return None
- restart_bufferElement(self, values)
- Restarts a Buffer Element to its initial values and inserts the new tuple of values
Data and other attributes defined here:
- __dict__ = <dictproxy object>
- dictionary for instance variables (if defined)
- __weakref__ = <attribute '__weakref__' of 'BufferElement' objects>
- list of weak references to the object (if defined)
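A minimal sketch of how a buffer element with the interface listed above might behave. It assumes that the size of an element is the number of tuples it holds and that insert_values returns a non-None value on success; the class name SketchBufferElement, its internals, and the table and column names in the usage lines are illustrative and are not taken from the PIANA source.

```python
class SketchBufferElement(object):
    """Hypothetical stand-in for BufferElement; the internals are assumptions."""

    def __init__(self, max_size, table, columns, values=None):
        self.max_size = max_size
        self.table = table
        self.columns = columns
        self.values = []
        if values is not None:
            self.values.append(values)

    def getSize(self):
        # Assumption: the size is the number of stored tuples.
        return len(self.values)

    def insert_values(self, values):
        # Documented behavior: return None when the size would be exceeded.
        if len(self.values) >= self.max_size:
            return None
        self.values.append(values)
        return 1  # assumption: any non-None value signals success

    def restart_bufferElement(self, values):
        # Drop everything stored so far and start again with one tuple.
        self.values = [values]


# Illustrative table and column names.
element = SketchBufferElement(2, "protein", ("proteinPiana", "taxID"))
first = element.insert_values((1, 9606))
second = element.insert_values((2, 9606))
third = element.insert_values((3, 9606))  # the element is full here
if third is None:
    # A caller would write the stored rows to the database at this point,
    # then restart the element with the tuple that did not fit.
    element.restart_bufferElement((3, 9606))
```

The None return value lets the caller decide when to flush, which keeps the element itself free of any database logic.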
|
class PianaDBaccess(__builtin__.object)

Class used as an interface with database piana

Methods defined here:
- __getnewargs__(self)
- __getstate__(self)
- # ----
# methods required for using pickle with piana objects
# ----
- __init__(self, dbname=None, dbhost=None, dbuser=None, dbpassword=None, dbport=None, buffer=1)
- "dbname" is the name of the database you want to connect to (required)
"dbhost" is the machine with the mysql server that holds the piana database (required)
"dbuser" is the mysql user (not required in most systems)
"dbpassword" is the mysql password (not required in most systems)
"dbport" is the mysql port (not required in most systems)
"buffer" indicates whether the user wants to use the internal Piana buffer. It can be None (do not use the buffer) or 1 (use the buffer). The buffer is used by default
- __setstate__(self, dict)
- add_external_database_information_type(self, externalDatabase, information_type)
- Adds to an external database which kind of information is used from it
ATTENTION! information_type must be one of the valid types (this has to be checked beforehand!):
"protein sequences"
"protein attributes"
"protein-protein interactions"
- check_keywords_in_protein(self, list_proteinPiana, protein_ext_code=None, id_type=None, keywords=[], search_in=None)
- Checks all protein info [ie. description, function, keyword list] for keywords given by user
Returns the list of words in "keywords" that appear somewhere in the protein. (empty if nothing found)
"list_proteinPiana" is the list of proteins that you want to check (if it is just one protein, write [your_proteinPiana])
You can also use external code types: in that case, set list_proteinPiana to None and set protein_ext_code and id_type to something:
"protein_ext_code" is the protein external code
"id_type" is the type of code used for protein_ext_code
--> it has to be one of the types listed in the PIANA reference card (read README.piana_tutorial)
--> attention: not fixing tax id here, therefore, I recommend not using geneNames...
"keywords" is a list of strings
--> strings of keywords must be in lower case.
--> This method will check that the keywords appear for the protein, regardless of the case of the words in the protein info
--> for example, keywords could be ['cancer', 'onco', 'apoptosis']
"search_in" is a list of texts where the keywords should be searched
if it is not used, description, function and keyword will be queried to database
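The case handling described above can be sketched as follows. This is an illustration only: it assumes plain substring matching (suggested by the example keyword 'onco'), and the function name keywords_found and the sample texts are hypothetical.

```python
def keywords_found(keywords, texts):
    """Return the keywords that occur in any of the texts.

    The keywords are expected in lower case; the texts are lowered before
    matching, so their original case does not matter.
    """
    lowered = [text.lower() for text in texts]
    return [kw for kw in keywords if any(kw in text for text in lowered)]


# Sample protein texts (made up for the example).
texts = ["Oncogene product involved in APOPTOSIS", "Kinase activity"]
found = keywords_found(["cancer", "onco", "apoptosis"], texts)
```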
- check_lock_frequency(self, table_list=[])
- check_protein(self, proteinSequence_value, tax_id_value=None)
- Method used to check if a protein is already in the database
Returns a proteinPiana if the protein exists, or None if it does not exist
Since pianaDB is a sequence-based DB, introducing proteinSequence_value is mandatory.
There is one proteinPiana for each (sequence, tax_id): therefore, tax_id is mandatory. However, when
the user doesn't know the tax id for the protein, we use a dummy tax_id (ie. 0) to allow searching
that sequence in the database. Therefore, if the tax_id is unknown, leave it as None.
- check_proteins_similarity(self, proteinPiana_a_value, proteinPiana_b_value)
- Returns 1 if the proteins are the same (ie. there is an entry proteinPiana_a, proteinPiana_b in table proteinSimilarity)
- close(self)
- Close the connection with database
- delete_interaction(self, interactionPiana_value)
- Deletes entry of table interaction with interactionPiana = interactionPiana_value
Attention! Before deleting an interactionPiana, make sure that its entries in the other interaction tables are deleted as well:
this method only deletes the row in table interaction
- delete_interaction_method_for_sourceDBID(self, sourceDBID_value)
- Deletes entries of table interactionMethod with sourceDBID = "sourceDBID_value"
- delete_interaction_protein_source_for_sourceDBID(self, sourceDBID_value)
- Deletes entries of table interactionProteinSource with sourceDBID = "sourceDBID_value"
- delete_interaction_scores_for_sourceDBID(self, sourceDBID_value)
- Deletes entries of table interactionScores with sourceDBID = "sourceDBID_value"
- delete_interaction_sourceDB_for_sourceDBID(self, sourceDBID_value)
- Deletes entries of table interactionSourceDB with sourceDBID = "sourceDBID_value"
Before the deletion, it performs a select to keep a list of the entries that are going to be deleted.
Returns a list of interactionPiana that have been deleted from table interactionSourceDB
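The select-then-delete pattern described for delete_interaction_sourceDB_for_sourceDBID can be sketched like this. The sketch uses an in-memory SQLite database as a stand-in for the MySQL server; the column names, the text type of sourceDBID, the sample rows and the function name delete_for_source_db are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactionSourceDB (interactionPiana INTEGER, sourceDBID TEXT)"
)
conn.executemany(
    "INSERT INTO interactionSourceDB VALUES (?, ?)",
    [(1, "db_one"), (2, "db_one"), (3, "db_two")],
)


def delete_for_source_db(connection, source_db_id):
    """Record the affected interactionPianas, delete their rows, return the record."""
    rows = connection.execute(
        "SELECT interactionPiana FROM interactionSourceDB "
        "WHERE sourceDBID = ? ORDER BY interactionPiana",
        (source_db_id,),
    ).fetchall()
    connection.execute(
        "DELETE FROM interactionSourceDB WHERE sourceDBID = ?", (source_db_id,)
    )
    connection.commit()
    return [row[0] for row in rows]


deleted = delete_for_source_db(conn, "db_one")
remaining = conn.execute("SELECT COUNT(*) FROM interactionSourceDB").fetchone()[0]
```

Selecting first is what makes it possible to report which interactionPianas were removed, since the rows no longer exist after the delete.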
- disable_buffer(self)
- Disables the use of the insert buffer. If the buffer contains anything, it empties it first
- drop_external_database(self, database_name, pianaDB)
- Method used to delete from the database all records from an external database
- drop_multiple_external_databases(self, list_databases, pianaDB)
- Method used to delete from the database all records from multiple external databases
- empty_temporal_table(self)
- Executes all commands used to empty temporal table and update tables
The whole table will be emptied
- execute_insert_buffer(self, key_buffer=None)
- Executes all inserts in an insert buffer and empties it
If key_buffer is None, it will execute all buffers. Otherwise, it will execute only the inserts for the specified key
- get_MD5_and_sequence_from_sequenceID(self, sequenceID=None)
- returns the sequence that corresponds to sequenceID
- get_all_g2_partners(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
- Returns a list with all partners (proteinPianas) at distance 2 of "proteinPiana_value", considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
--> If A interacts with B and C, and B interacts with D and C with no protein, then partners of A at distance 2 are [D]
"threshold" is used to limit the number of partners that each of the proteins at distance 1 can have
- if "threshold" == 0, then always return all partners for all proteins
- if threshold!=0 and number of partners > threshold, then return empty list (no partners) for that particular protein
--> ATTENTION!!!! threshold does not apply to the partners being returned by this method, but to the individual calls
that this method makes to get_all_partners. Therefore, this method can return any number of
partners, while guaranteeing that no single protein has added more than "threshold" partners to the list
Returns empty list if no partner at distance 2 is found
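A small sketch of the distance-2 search and the per-protein threshold, run on an in-memory dictionary instead of the database. Several points are assumptions: the threshold is applied only to the calls made for the proteins at distance 1, and the query protein and its direct partners are removed from the result. The function names partners and g2_partners are hypothetical.

```python
def partners(graph, protein, threshold=0):
    """Direct partners of a protein; empty list if the threshold is exceeded."""
    found = sorted(graph.get(protein, set()))
    if threshold != 0 and len(found) > threshold:
        return []
    return found


def g2_partners(graph, protein, threshold=0):
    """Partners at distance 2, applying the threshold per distance-1 protein."""
    direct = partners(graph, protein)  # assumption: no threshold on this call
    result = set()
    for neighbour in direct:
        result.update(partners(graph, neighbour, threshold))
    # Assumption: the query protein and its direct partners are not reported.
    result.discard(protein)
    result.difference_update(direct)
    return sorted(result)


# The example from the description: A interacts with B and C, B interacts with D.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
all_g2 = g2_partners(graph, "A")
limited_g2 = g2_partners(graph, "A", threshold=1)
```

With threshold=1, protein B has two partners and therefore contributes none, so D is no longer reached.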
- get_all_partners(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
- Returns a list with all partners (proteinPianas) of "proteinPiana_value", considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
If use_self_ints == "no", it won't return itself as a partner. Set to 'yes' if you want to get it in case it exists.
"threshold" is used to limit the number of partners that can be returned:
- if "threshold" == 0, then always return all partners
- if threshold!=0 and number of partners > threshold, then return empty list (no partners)
Returns empty list if no partner is found
- get_all_partners_with_interactionPiana(self, proteinPiana_value=None, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', threshold=0)
- Returns a list with all partners (proteinPiana,interactionPiana) of "proteinPiana_value" and their associated interactionPiana, considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
If use_self_ints == "no", it won't return itself as a partner. Set to 'yes' if you want to get it in case it exists.
"threshold" is used to limit the number of partners that can be returned:
- if "threshold" == 0, then always return all partners
- if threshold!=0 and number of partners > threshold, then return empty list (no partners)
Returns empty list if no partner is found
- get_all_proteinPiana(self, tax_id_value=0)
- returns all proteinPianas in pianaDB of species tax_id_value (if 0, returns all proteinPianas in database)
- get_all_proteinSimilarity(self, sourceDB_value)
- Returns a tuple of tuples of similar proteins (proteinPianaA,proteinPianaB) for a given database
- get_all_protein_dbali_cluster(self, clustering_method)
- Returns a list with all tuples (proteinPiana, dbali cluster ID) for a specific clustering method
- get_all_protein_protein_interactions(self, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no', only_id='no', tax_id_value='all')
- Returns a list of triplets (proteinPianaA, proteinPianaB, interactionPiana), considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods).
If "inverse_dbs"/"inverse_methods" is 'yes', then it returns everything except for those dbs/methods
If use_self_ints == "no", it won't return interactions between a protein and itself. Set to 'yes' if you want to get them
if only_id is 'yes', then, instead of triplets, a list of interactionPianas is returned
if no interaction is found, returns empty list
- get_code_value(self, proteinCode_table, proteinCode_col, proteinPiana_value=None, answer_mode=None, source_db_info='no', id_type=None)
- Method used to retrieve a protein external code from column "proteinCode_col" of table "proteinCode_table"
for a given proteinPiana (it is a generalization to obtain external code from a given table that corresponds to protein proteinPiana)
"answer_mode" can be
- 'single' (if you just want one external code)
- 'list' (if you want all external codes from that table)
- 'list_primary' (if you want all external codes from that table, but the list should contain only primary codes, unless there were none)
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
- "no" will simply return a list of proteinPianas
- "yes" will return a list of tuples (proteinPiana, sourceDBID)
in those cases where there is only one external code for "proteinPiana", "answer_mode" 'single' will return it as a single element and "list"
as the first (and only) element of a list
If nothing is found, the method will return None (in "answer_mode" "single") and an empty list (in "answer_mode" "list")
"id_type" is only used when proteinCode_table is the generalized table
- get_column_of_id_type(self, id_type)
- Method that returns the column in which "id_type" can be found
- get_complete_table(self, table, attributes=[])
- Get the complete content of table "table" for "attributes" values, without removing duplicates
- get_dic_proteins_with_known_ints(self, use_self_ints='yes', list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Returns a dictionary whose keys are those proteinPianas for which there are known interactions, considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
The values of the dictionary are None for all proteinPianas.
If use_self_ints == "no", it won't return proteinPianas that only interact with themselves. Set to 'yes' if you want to get them
- get_from_dict(self, dict_name, description_value)
- returns the ID for the description "description_value" of a dictionarized entity
description_value must exist in the dictionary PianaGlobals.dict_name, otherwise nothing will be returned
- get_go_depth(self, term_id_value)
- Returns depth value in GO tree for "term_id_value"
- get_interactionPiana(self, proteinPianaA_value, proteinPianaB_value, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Returns the interactionPiana for the interaction between proteins "proteinPianaA_value" and "proteinPianaB_value", considering only those
interaction source databases listed in "list_source_dbs" ('all' returns all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' returns all interactions from all methods)
No need to respect order of proteinPianaA and proteinPianaB: this method will take care of it, place them as you want.
If there is no such interaction, returns None
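One way an order-independent lookup like the one described for get_interactionPiana can be achieved is to normalize the pair before searching. The sketch below assumes a "smaller identifier first" storage convention, which is hypothetical and not confirmed by this documentation; the names pair_key, lookup and the sample identifiers are illustrative.

```python
def pair_key(protein_a, protein_b):
    """Normalize a pair so that (a, b) and (b, a) give the same key."""
    return (min(protein_a, protein_b), max(protein_a, protein_b))


# Made-up identifiers: proteins 7 and 12 interact through interaction 1001.
interactions = {pair_key(12, 7): 1001}


def lookup(protein_a, protein_b):
    """Return the interaction identifier for the pair, or None if there is none."""
    return interactions.get(pair_key(protein_a, protein_b))
```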
- get_interaction_line_style(self, interaction_type)
- returns the line style established in PianaGlobals for interaction_type
interaction_type can be:
- normal
- expanded
- get_interaction_methodID_list(self, interactionPiana_value)
- Returns a list of methodID for a particular "interactionPiana_value"
Returns empty list if nothing is found
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction
is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
-> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the
sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by user.
-> Right now this method is only used to check if an interaction can be deleted (./dbModification/delete_interactions_from_db.py)
and to calculate piana stats
- get_interaction_methodID_list_for_sourceDB(self, interactionPiana_value, sourceDBID_value)
- Returns a list of methodID for a particular interaction "interactionPiana_value" for a given "sourceDBID"
(here, sourceDBID is the name of the database, not the integer associated to it)
Returns empty list if nothing is found
Attention! There is no argument list_source_dbs: since arguments of this method fix a sourceDB, I let the user be responsible for
not asking to get methodIDs for a database that is not the one he wants...
- get_interaction_pubmedID_list(self, interactionPiana_value)
- Returns a list of pubmedID for a particular "interactionPiana_value"
Returns empty list if nothing is found
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction
is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
-> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the
sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by user.
- get_interaction_source_database_color(self, database_name)
- returns the color code established in PianaGlobals for database_name
- get_interactors(self, interaction_id=None)
- Returns the tuple of interactors (proteinPianaA, proteinPianaB) for a given interaction_id
No need to check if the interaction belongs to a specific method: the user is only calling this method when he already
retrieved an interactionPiana: and he will only retrieve interactionPianas that respect the source db and method restrictions
- get_list_proteinMD5(self)
- returns a list with all distinct proteinMD5
- get_list_proteinPiana(self, protein_id=None, id_type=None, tax_id_value=0, source_db_info='no', only_primary='default')
- Method used to retrieve a list of proteinPiana identifiers from a given protein "protein_id" of type "id_type"
For example, one geneName "mot1" could correspond to many proteinPianas, since different databases give different sequences to mot1
Internally, we have to work with all proteinPianas, and then give the answer to the user in the type of code he chose
There can be many proteinPianas [id, id, id] or just one [id] or zero [].
valid "id_type" are those valid protein identifiers in current database.
For types "geneName" and "uniprotAcc", only will be returned
"tax_id_value" fixes the species for the proteinPiana that will be returned. If tax_id_value is 0, all proteinPianas associated to
protein_id will be returned.
In case you don't know the species of your code, set tax_id_value to 0. If your code implicitly points to a single
species, this will be OK.
In which cases does tax_id_value have an effect on the proteinPianas that are returned?
For example, if a geneName is associated to several proteinPianas, only those that are of tax_id_value will be returned.
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
- "no" will simply return a list of proteinPianas
- "yes" will return a list of tuples (proteinPiana, sourceDBID)
if the protein_id being passed is a pdb code, the format of the code must be pdb_code.chain_id
-> If the chain_id is None, write pdb_code. (leaving the dot to mark that chain is unknown)
"only_primary" determines, for those cases in which external codes can be primary or not (geneName and uniprotAcc), if only
primary correspondences are taken or not:
- "yes" will return only those proteinPianas that correspond to the code when this is primary.
- "yes-no" will return those proteinPianas that correspond to the code when this is primary. If none is found, it then searches all correspondences for this code
- "no" will return all proteinPianas, regardless of whether the external code is primary or not
- "default": will be:
"yes-no" for geneName and uniprotAccession (uniacc)
"no" for the rest
- get_list_proteinPiana_from_MD5(self, md5_value=None)
- returns a list of proteinPianas that correspond to a sequence (represented by its md5 code)
- get_list_protein_external_codes(self, proteinPiana=None, id_type=None, alternative_id_types=[], answer_mode='list', source_db_info='no')
- returns the list of external protein codes in type "id_type" that correspond to protein proteinPiana.
If no code is found for "id_type", it successively tries the types listed in alternative_id_types
-> if an alternative type name code is found, the list returned will contain strings of the form "id_type:external_code"
-> it will only return the list of codes for the first alternative type found
-> eg, alternative_id_types could be ["uniacc", "geneName", "md5"]
"proteinPiana" is the internal piana code for the protein you want to get the external codes
"id_type" is an easy-to-remember protein code type
-> They are stored in valid_protein_ids in MySQL database, and can be consulted using a piana command
-> valid id_type values are those in the PIANA reference card (read README.piana_tutorial)
-> it can be "all", meaning that the list returned will contain all codes in all types (all codes will be preceded by type name)
-> alternative_id_types are ignored if id_type is all
"alternative_id_types" is a list of easy-to-remember protein code type
for example, alternative_id_types could be ["uniacc", "gi", "proteinPiana"]
I recommend always writing md5 at the end to make sure you don't have Nones in your output
-> alternative_id_types are ignored if id_type is all
"answer_mode" can be:
- "single" (if you just want one external identifier)
- "list" (if you want all external identifiers for that proteinPiana)
- "list_primary" (if you want all external identifiers for that proteinPiana, but identifiers should be primary, unless none is found)
"source_db_info" determines if information about the sourceDB that inserted the proteinPiana is returned or not
- "no" will simply return a list of external_codes
- "yes" will return a list of tuples (external_code, sourceDBID) (this mode does not allow alternative_id_types nor id_type=='all')
Attention!!! This method does not work if you set "id_type" to 'all' and "answer_mode" to 'single'!!!!
Attention!!! This method does not work if you ask for all external codes (id_type="all") with their source db infos (source_db_info="yes")!!!!
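The fallback through alternative_id_types can be sketched as follows, using a plain dictionary in place of the database. The function name external_codes, the dictionary layout and the sample codes are hypothetical; only the fallback order and the "type:code" prefix for alternative types follow the description above.

```python
def external_codes(codes_by_type, id_type, alternative_id_types=()):
    """Return the codes of id_type, or the codes of the first alternative type found."""
    codes = codes_by_type.get(id_type, [])
    if codes:
        return list(codes)
    for alt_type in alternative_id_types:
        alt_codes = codes_by_type.get(alt_type, [])
        if alt_codes:
            # Codes from an alternative type are prefixed with the type name.
            return ["%s:%s" % (alt_type, code) for code in alt_codes]
    return []


# Made-up codes for one protein: it has a gi and an md5, but no uniacc.
codes_for_protein = {"gi": ["12345"], "md5": ["0123abcd"]}
direct = external_codes(codes_for_protein, "gi")
fallback = external_codes(codes_for_protein, "uniacc", ["geneName", "gi", "md5"])
```

Only the first alternative type that yields codes is used, so md5 at the end of the list acts as a last resort.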
- get_list_sequenceID(self, taxID=0)
- returns a list with all distinct sequenceID
"taxID" is the tax ID of the species for which we want to obtain the list of sequenceID. If it is 0, it is not taken into account
- get_list_sourceDB_names_from_interactionPiana(self, interactionPiana_value)
- Returns a list of sourceDB for a particular "interactionPiana_value"
Returns empty list if nothing is found
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction
is in other databases or methods, even if he limited the network interactions to those in a list of source interaction databases
-> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the
sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by user.
- get_list_sourceDB_names_from_sourceDB_id(self, sourceDB_id)
- returns a list of strings (ie. names of databases) from the integer associated to the record (i.e. sourceDB_id)
- get_list_valid_protein_ids(self)
- returns a sorted list with valid protein identifiers
- get_list_valid_sources(self)
- returns a sorted list with valid database sources
- get_max_proteinPiana(self)
- Method that returns the maximum value of proteinPiana in the database
- get_methodID(self, methodDescription_value)
- returns methodID for a given description of a method to find interactions "methodDescription_value"
methodDescription_value must exist in dictionary PianaGlobals.method_names, otherwise nothing will be returned
- get_new_interaction_piana(self)
- Method used to retrieve a new interactionPiana identifier from table proteinPianaCounter
Called every time we want to insert a new interaction in the database
It takes care of increasing by 1 the counter table
proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana
- get_new_protein_piana(self)
- Method used to retrieve a new proteinPiana identifier from table proteinPianaCounter
Called every time we want to insert a new (sequence,taxID) in the database
It takes care of increasing by 1 the counter table
proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana
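A counter table of this kind can be sketched as shown below. The sketch uses an in-memory SQLite database instead of MySQL. The single-row layout, the column name, and the choice to return the value after the increment are assumptions; the function name new_protein_piana is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server; the table layout is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteinPianaCounter (proteinPiana INTEGER)")
conn.execute("INSERT INTO proteinPianaCounter VALUES (0)")


def new_protein_piana(connection):
    """Increase the counter by 1 and return the new value."""
    connection.execute(
        "UPDATE proteinPianaCounter SET proteinPiana = proteinPiana + 1"
    )
    row = connection.execute(
        "SELECT proteinPiana FROM proteinPianaCounter"
    ).fetchone()
    connection.commit()
    return row[0]


first_id = new_protein_piana(conn)
second_id = new_protein_piana(conn)
```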
- get_new_sequenceID(self)
- Method used to retrieve a new sequenceID identifier from table proteinPianaCounter
Called every time we want to insert a new sequence in the database
It takes care of increasing by 1 the counter table
proteinPiana is just an internal identifier used to uniquely identify proteins: each sequence is a different proteinPiana
- get_node_border_color(self, node_origin)
- returns the color code established in PianaGlobals for node_origin
node_origin can be:
- expanded
- normal
- get_node_fill_color(self, node_type)
- returns the color code established in PianaGlobals for node_type
node_type can be:
- root
- normal
- get_pair_method_protein_dbali_cluster(self, proteinPiana_value)
- Returns a list of tuples (dbali cluster ID, clustering_method) for protein "proteinPiana_value"
- get_partner(self, interaction_id=None, proteinPiana_value=None)
- Returns the partner (proteinPiana) of "proteinPiana_value" in interaction "interaction_id"
No need to check if the interaction belongs to a specific method: the user is only calling this method when he already
retrieved an interactionPiana: and he will only retrieve interactionPianas that respect the source db and method restrictions
- get_partners_of_proteins_sharing_cog(self, proteinPiana_value=None, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Returns interaction partners (proteinPianas) of those proteins that share the cogID with "proteinPiana_value", considering only those
interaction source databases listed in "list_source_dbs" ('all' gets all interactions from all databases) and those interaction
source methods listed in "list_source_methods" ('all' gets all interactions from all methods)
- get_piana_db(self)
- Returns the PianaDB object used for this PianaDBaccess object
- get_proteinPianas_with_value(self, value, proteinCode_table, proteinCode_col)
- returns proteinPianas whose value for column "proteinCode_col" in table "proteinCode_table" is "value"
- get_protein_cath(self, proteinPiana_value, residue_value=None)
- Returns a list of cathIDs for protein "proteinPiana_value"
If "residue_value" is not None, then it will return only caths that are defined around that residue
If "residue_value" is None, returns all cath values for the protein
- get_protein_cog(self, proteinPiana_value)
- returns a list with all COG identifiers that are assigned to protein "proteinPiana_value"
In case no COG identifier is found, returns an empty list.
- get_protein_dbali_cluster(self, proteinPiana_value, clustering_method, source_db)
- Returns a list with all dbali cluster IDs of a proteinPiana for a specific clustering method
"source_db" can be 'dbali', 'blast_transfer' or 'all'. It can be used to limit the dbali_clusters returned to those found through a specific technique
"clustering_method" refers to the parameters used for the clustering. Valid values are shown in PianaGlobals.pibase_dbali_methods
- get_protein_description(self, proteinPiana_value)
- returns a list with all descriptions (text string) that are assigned to protein "proteinPiana_value"
In case no description is found, returns an empty list.
- get_protein_ec(self, proteinPiana_value)
- returns a list with all EC identifiers that are assigned to protein "proteinPiana_value"
In case no EC identifier is found, returns an empty list.
- get_protein_function(self, proteinPiana_value)
- returns a list with all functions (text string) that are assigned to protein "proteinPiana_value"
In case no function is found, returns an empty list.
- get_protein_go_accession(self, proteinPiana_value)
- returns a list with all GO accessions (codes that look like "GO:234324") that are assigned to protein "proteinPiana_value"
In case no GO term is found, returns an empty list.
- get_protein_go_name(self, go_term_id_value)
- returns the GO name associated to "go_term_id_value"
In case no GO term is found, returns None
- get_protein_go_term_id(self, proteinPiana_value, term_type_value=None)
- returns a list with all GO terms ids (internal go identifiers) that are assigned to protein "proteinPiana_value"
In case no GO term is found, returns an empty list.
"term_type_value" can be None or one of the following:
- "molecular_function"
- "biological_process"
- "cellular_component"
if term_type_value is None, then returns all go terms independently of their term type category
- get_protein_go_term_name(self, proteinPiana_value, term_type_value=None)
- returns a list with all GO terms names (e.g. Apoptosis) that are assigned to protein "proteinPiana_value"
In case no GO term is found, returns an empty list.
"term_type_value" can be None or one of the following:
- "molecular_function"
- "biological_process"
- "cellular_component"
if term_type_value is None, then returns all go terms independently of their term type category
- get_protein_ip(self, proteinPiana=None)
- returns isoelectric point of protein proteinPiana
- get_protein_keyword(self, proteinPiana_value)
- returns a list with all keywords (text string) that are assigned to protein "proteinPiana_value"
In case no keyword is found, returns an empty list.
- get_protein_kingdoms(self, proteinPiana_value)
- Returns a list with kingdoms (as defined by ncbi) related to "proteinPiana_value"
- get_protein_mw(self, proteinPiana=None)
- returns molecular weight of protein proteinPiana
- get_protein_reactome_ids(self, proteinPiana_value)
- returns a list with all REACTOME ids (e.g. REACT_1698.1) that are assigned to protein "proteinPiana_value"
In case no REACTOME is found, returns an empty list
- get_protein_reactome_names(self, proteinPiana_value)
- returns a list with all REACTOME names (e.g. Gene Expression) that are assigned to protein "proteinPiana_value"
In case no REACTOME is found, returns an empty list
- get_protein_scop_cf(self, proteinPiana_value)
- Returns a list of cf codes for protein "proteinPiana_value"
- get_protein_scop_cf_sf_fa(self, proteinPiana_value)
- Returns a list of tuples (cf, sf, fa) for protein "proteinPiana_value"
- get_protein_scop_fa(self, proteinPiana_value)
- Returns a list of fa codes for protein "proteinPiana_value"
- get_protein_scop_sf(self, proteinPiana_value)
- Returns a list of sf codes for protein "proteinPiana_value"
- get_protein_sequence(self, proteinPiana=None)
- returns the sequence of protein proteinPiana
- get_protein_sequenceLength(self, proteinPiana_value=None)
- returns the length of the sequence of protein proteinPiana_value
- get_protein_species_names(self, proteinPiana_value)
- Returns a list with species names (as defined by ncbi) related to "proteinPiana_value"
- get_protein_subcellularLocation(self, proteinPiana_value)
- returns a list with all cellular locations (text string) that are assigned to protein "proteinPiana_value"
In case no cellular location is found, returns an empty list.
- get_protein_taxonomy_ids(self, proteinPiana_value)
- Returns a list with taxonomy ids (as defined by ncbi) related to "proteinPiana_value"
- get_proteins_sharing_cog(self, proteinPiana_value)
- Returns a list of proteinPiana proteins that share a COG identifier with "proteinPiana_value"
Empty list returned if nothing is found
- get_proteins_sharing_ec(self, proteinPiana_value)
- Returns a list of proteinPiana proteins that share an EC identifier with "proteinPiana_value"
Empty list returned if nothing is found
- get_proteins_sharing_go(self, proteinPiana_value)
- Returns a list with proteinPianas that have the same go_term_id as "proteinPiana_value"
- get_proteins_sharing_interpro(self, proteinPiana_value)
- Returns a list of proteinPiana proteins that share an InterPro identifier with "proteinPiana_value"
Empty list returned if nothing is found
- get_proteins_sharing_scop(self, proteinPiana_value)
- Returns a list of proteinPiana proteins that share a SCOP family with "proteinPiana_value"
Empty list returned if nothing is found
- get_proteins_sharing_species(self, species_name_value=None, taxonomy_value=None)
- Returns a list of proteinPiana proteins that belong to species taxonomy_value (or species_name_value)
The user can pass "taxonomy_value" (9606 for human, etc) or a "species_name_value" string ('human', 'yeast', ...)
If both species_name_value and taxonomy_value are set to something other than None, taxonomy_value is used
Empty list returned if nothing is found
- get_proteins_with_scop(self, cf=None, sf=None, fa=None)
- Returns a list of proteinPianas that have a given SCOP code
if any of the three categories is None, that category is ignored
if all categories are set, then this method requires that the proteins returned have those three categories
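
The category-matching rule of get_proteins_with_scop can be illustrated with a minimal in-memory sketch. The `scop_rows` data and the `filter_by_scop` helper are hypothetical and are not part of PianaDBaccess; the real method queries the database. Returning every protein when all three categories are None is an assumption of this sketch, not documented behaviour.

```python
# Hypothetical stand-in for the SCOP table: (proteinPiana, cf, sf, fa).
# The codes below are made-up integers used only for illustration.
scop_rows = [
    (1, 46456, 46457, 46458),
    (2, 46456, 46457, 46463),
    (3, 48724, 48725, 48726),
]


def filter_by_scop(rows, cf=None, sf=None, fa=None):
    """Return proteinPianas whose codes match every category that is not None."""
    wanted = (cf, sf, fa)
    matches = []
    for row in rows:
        protein, codes = row[0], row[1:]
        # A category set to None is ignored; every other category must match.
        if all(w is None or w == c for w, c in zip(wanted, codes)):
            matches.append(protein)
    return matches
```

With this data, passing only cf=46456 selects proteins 1 and 2, while setting all three categories narrows the result to the single protein that matches all of them.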
- get_reactome_names_from_reactome_ids(self, list_reactome_ids=[])
- returns the reactome names that are associated with the reactome ids in "list_reactome_ids"
- get_sequence_from_sequenceID(self, sequenceID=None)
- returns the sequence that corresponds to sequenceID
- get_similar_proteins_dic(self, proteinPiana_value)
- Returns a dictionary whose keys are the proteins that are similar to proteinPiana_value
- get_sourceDB_id_from_interactionPiana(self, interactionPiana_value)
- Returns the database integer sourceDBID for a particular "interactionPiana_value"
--> this method returns an integer, not a list of sourceDB names!
--> this should only be used internally by PianaGraphEdgeAttribute, to minimize the amount of memory used
Returns None if nothing is found
Attention! There is no argument list_source_dbs or list_source_methods: I let the user check if the interaction
is in other databases or methods, even if the network interactions were limited to those in a list of source interaction databases
-> I do it this way because this has no effect on the interactions added to the network: it will just affect the edge attribute, in the
sense that it will have a complete list of sourceDBs and methods, regardless of restrictions imposed by the user.
- get_source_dbs_dict(self, database_type='all')
- Returns a dictionary with complete information of current external databases stored in PIANA database
"database_type" is used to restrict those databases for which information must be retrieved. It can be:
"all"
"protein sequences"
"protein attributes"
"protein-protein interactions"
"identifiers cross-references"
The dictionary has the following format:
key: databaseName (the database identifier in PIANA database)
Each record in the dictionary contains a dictionary with the keys (and their corresponding contents):
databaseVersion
parsed_file
date (date of the parsing)
database_description
database_type: List of which kind of data the database inserts
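
As an illustration of the format described above, the following sketch builds a dictionary of that shape by hand and filters it by type. The database name and every value are made up, and `names_with_type` is a hypothetical helper, not a PianaDBaccess method; the actual format of fields such as the date is not specified here.

```python
# Hypothetical example of the structure described above; all values are invented.
source_dbs = {
    "example_db": {
        "databaseVersion": "1.0",
        "parsed_file": "example_db.dat",
        "date": "2006-01-01",  # real date format is an assumption
        "database_description": "Example interaction database",
        "database_type": ["protein-protein interactions"],
    },
}


def names_with_type(dbs, wanted_type):
    """Return, sorted, the database names whose database_type list contains wanted_type."""
    return sorted(name for name, info in dbs.items()
                  if wanted_type in info["database_type"])
```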
- get_species_names_from_taxonomies(self, list_taxonomy_ids=[], only_scientific='no')
- Returns a list with species names that are related to the taxonomy ids in "list_taxonomy_ids"
if only_scientific is 'yes', then only the scientific name is returned
- get_table_of_id_type(self, id_type)
- Method that returns the table in which "id_type" can be found (ie the pianaDB table where this protein identifier is located)
- get_taxonomies_from_species_name(self, species_name_value=None)
- Returns a list with taxonomy ids for a given speciesName species_name_value
- get_temporalSimilarity_data(self)
- Get all protein-protein similarities from database
TO CHANGE: It now reads from temporalProteinSimilarity, but it must read it directly from search...
- get_term2term_distance(self, term1, term2)
- Returns distance in the GO tree between "term1" and "term2"
- get_valid_protein_ids(self)
- get_valid_sources(self)
- returns a dictionary with valid external database sources
- insert_accessionNumber(self, accessionNumber_value, proteinPiana_value, accessionNumber_source_value)
- Insert correspondence between accessionNumber "accessionNumber_value" and "proteinPiana_value" in table accessionNumber
Insert as well source database (to keep origin database of that code)
- insert_blast_results(self, sequenceID_A_value, sequenceID_B_value, score_value, bit_score_value, start_A_value, end_A_value, start_B_value, end_B_value, identities_value, similarity_value, gaps_value, program_value, filter_value)
- Inserts the blast result of two sequences in the database
"program_value" is the program used to obtain the blast result. It can be "bl2seq" or "blastall"
"filter_value" indicates whether the filter was used when running blast. It can be "T" if the filter is used or "F" if it is not used
- insert_cog(self, cog_id, cog_description, cog_function, source_db)
- Inserts a COG (Cluster of Orthologous Genes) entry into Piana
"cog_id" is the COG identifier
"cog_description" is the text string describing the COG
"cog_function" is the text string describing the function of genes in this cluster
"source_db" is the database that is giving this information
- insert_ensembl_code(self, ensembl_code_value, proteinPiana_value, ensembl_source_value)
- Insert correspondence between ensembl "ensembl_code_value" and "proteinPiana_value" in table ensembl
Insert as well source database (to keep origin database of that code)
- insert_external_id_code(self, external_id_type, external_id_value, proteinPiana_value, sourceDBID)
- Insert correspondence between external id "external_id_value" of type "external_id_type" and "proteinPiana_value"
- insert_geneID_code(self, geneID_code_value, proteinPiana_value, geneID_source_value)
- Insert correspondence between geneID "geneID_code_value" and "proteinPiana_value" in table geneID
Insert as well source database (to keep origin database of that code)
- insert_geneName_code(self, geneName_code_value, proteinPiana_value, geneName_source_value, isPrimary_value)
- Insert correspondence between geneName "geneName_code_value" and "proteinPiana_value" in table geneName
Insert as well source database (to keep origin database of that code)
- insert_gi_code(self, gi_code_value, proteinPiana_value, gi_source_value)
- Insert correspondence between gi "gi_code_value" and "proteinPiana_value" in table gi
Insert as well source database (to keep origin database of that code)
- insert_go(self, go_id, go_name, acc, term_type, distance2root, source_db)
- Inserts a GO (Gene Ontology) entry into Piana
"go_id" is the go term id
"go_name" is the text string associated to the term id
"acc" is the go accession (code that looks like 'GO:234234')
"term_type" is the type of GO term
-> can be one of the following (or None)
- "molecular_function"
- "biological_process"
- "cellular_component"
"distance2root" is the distance between this term id and the root of the GO hierarchy
"source_db" is the external database that is giving this information
- insert_go_term2term_distance(self, term1_id, term2_id, distance)
- Inserts distance "distance" between GO terms "term1_id" and "term2_id"
This is the distance between those terms in the GO hierarchy
- insert_interPro_code(self, interProID_code_value, proteinPiana_value, interProDescription_value, interPro_source_value)
- Insert interPro information into table interPro (interProID and interProDescription)
Insert as well source (to keep origin database of that code) and proteinPiana (to establish the relationship)
- insert_interaction(self, proteinPianaA_value, isSourceA_value, proteinPianaB_value, isSourceB_value, interactionConfidence_value, methodDescription_value, sourceDBDescription_value, confidenceAssignedSourceDB_value, pubmed_id_value='unknown')
- Inserts a new interaction into piana database.
"proteinPianaA_value" is the proteinPiana for one side of the interaction
"isSourceA_value" sets if the interaction goes from A to B (1) or not (0)
"proteinPianaB_value" is the proteinPiana for the other side of the interaction
"isSourceB_value" sets if the interaction goes from B to A (1) or not (0)
if "isSourceA_value" and "isSourceB_value" are 1, the interaction is bi-directional
"interactionConfidence_value" is the reliability of the interaction (not being used currently)
"methodDescription_value" is the method name that detected this interaction (eg. 'yeast two hybrids')
It can be a single value or a list of values; the method handles both cases
--> must appear in PianaGlobals.method_names
"sourceDBDescription_value" is the source database that contains this interaction (eg. 'DIP')
--> The database must exist in the database
"confidenceAssignedSourceDB_value" is the reliability assigned to the interaction by the source database
"pubmed_id_value" is the pubmed identifier for the article where this interaction was described
It can be a single value or a list of values; the method handles both cases
Things this method does are:
- makes sure the order proteinPianaA < proteinPianaB is respected
- searches the methodID corresponding to the method description
- searches the sourceDBID corresponding to the sourceDB description
1. check if interaction exists already, retrieve interactionPiana in case it does
2. if interactionPiana is None:
insert information of table interaction and retrieve new interactionPiana
3. insert interactionSourceDB with interactionPiana
4. insert interactionMethod according to this sourceDB with interactionPiana
Attention! I don't allow the user to limit the insertions to those from a specific database or method. If they want to
filter the database, they can do it afterwards... the pianaDB will contain all interactions regardless of origin
(if one day somebody wants to change this, this method will have to accept arguments list_source_dbs and list_source_methods
and use them below to limit the insertion)
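
The numbered steps listed for insert_interaction can be sketched with an in-memory stand-in. The `InteractionStore` class, its attribute names and its return value are hypothetical; the sketch ignores direction flags, confidence values and pubmed ids, and uses plain Python containers where the real method uses database tables.

```python
class InteractionStore(object):
    """Hypothetical in-memory stand-in for the interaction tables."""

    def __init__(self):
        self.interactions = {}   # (proteinA, proteinB) -> interactionPiana
        self.source_dbs = set()  # (interactionPiana, sourceDB)
        self.methods = set()     # (interactionPiana, sourceDB, method)
        self._next_id = 1

    def insert_interaction(self, protein_a, protein_b, method, source_db):
        # Keep the order proteinPianaA < proteinPianaB.
        if protein_a > protein_b:
            protein_a, protein_b = protein_b, protein_a
        # Accept a single method name or a list of names.
        methods = method if isinstance(method, (list, tuple)) else [method]
        key = (protein_a, protein_b)
        # 1. Check whether the interaction already exists.
        interaction_id = self.interactions.get(key)
        # 2. If it does not, insert it and obtain a new interactionPiana.
        if interaction_id is None:
            interaction_id = self._next_id
            self._next_id += 1
            self.interactions[key] = interaction_id
        # 3. Record the source database for this interaction.
        self.source_dbs.add((interaction_id, source_db))
        # 4. Record each method reported by this source database.
        for name in methods:
            self.methods.add((interaction_id, source_db, name))
        return interaction_id
```

Inserting the same pair in either order returns the same identifier, which is the point of normalising the order before the existence check.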
- insert_interaction_scores(self, interactionPiana_value, sourceDBDescription_value, equiv_nscore_value, equiv_nscore_transferred_value, equiv_fscore_value, equiv_pscore_value, equiv_hscore_value, array_score_value, array_score_transferred_value, experimental_score_value, experimental_score_transferred_value, database_score_value, database_score_transferred_value, textmining_score_value, textmining_score_transferred_value, combined_score_value)
- Inserts interaction scores into table interactionScores_table
This table only holds information for interactions contained in STRING
Refer to the string manual for description of the different arguments
- insert_new_external_database(self, databaseName=None, databaseVersion=None, parsedFile=None, databaseDescription=None, databaseInformation=None)
- Inserts into database the information of a new external database that is being integrated into PIANA database
"databaseName" will be the internal identifier for this database, so, it must be unique.
"pianaDB" is the name of the PIANA database where the new protein id type is inserted. It is used to generate automatic documentation on current database
"databaseInformation" is a list of the type of information that the database inserts into PIANA database
ATTENTION! Each element of "databaseInformation" must be one of the valid types:
"protein sequences"
"protein attributes"
"protein-protein interactions"
"identifiers cross-references"
- insert_new_id_type(self, proteinTypeId, proteinTable='proteinExternalId', externalIdColumn='proteinExternalId', externalIdDescription=None)
- Inserts into PIANA database a new external Id identifier
First, it will check if this external Type Id was previously in the database: if it wasn't, insert it. Otherwise, error
"proteinTable" is the SQL table where protein ids of "proteinTypeId" are kept
"externalIdColumn" is the SQL column of "proteinTable" where protein ids of "proteinTypeId" are kept
- insert_pdb_code(self, pdb_code_value, proteinPiana_value, chain_value, range_value, pdb_source_value)
- Insert correspondence between pdbs and "proteinPiana_value" in table pdb
Internally, the pdb code is formed by "pdb_code_value" + "." + "chain_value"
Insert as well source database (to keep origin database of that code)
- insert_pfam_code(self, pfamID_code_value, proteinPiana_value, pfamDescription_value, pfam_source_value)
- Insert pfam information into table pfam (pfamID and pfamDescription)
Insert as well source (to keep origin database of that code) and proteinPiana (to establish the relationship)
- insert_protein(self, proteinSequence_value, sourceDBID, tax_id_value=None)
- Method used to insert new proteins into pianaDB in `protein` table.
Returns a proteinPiana: it is the code corresponding to the (sequence, tax_id)
--> if (sequence, tax) didn't exist already, creates a new proteinPiana
--> if the sequence is already present in pianaDB, returns the previous proteinPiana
--------------------------------------------------------------------------------------------------------------
Since pianaDB is a sequence-based DB, introducing proteinSequence_value is mandatory.
There is one proteinPiana for each (sequence, tax_id): therefore, tax_id is mandatory. However, when
the user doesn't know the tax id for the protein, we use a dummy tax_id (ie. 0) to allow inserting
that sequence. Therefore, if the tax_id is unknown, leave it as None.
We use MD5 codes (calculated here) instead of sequences to speed up the process of comparing sequences
MW and IP values for the protein are calculated using BioPython methods
This method takes care of handling proteinPiana codes:
1. It first looks for existence of the (sequence, tax_id) in table protein,
which is the registry of correspondences between (protein sequence, tax id) and proteinPiana identifiers. We need to keep this
registry in order to make sure that proteinPiana identifiers do not change when updating the database, or building a
new one from scratch.
2. If the (sequence, tax_id) does not exist in the database, then obtain a new proteinPiana identifier. This is done with a method
that looks into a counter table, returns its value and increases the counter for the next proteinPiana identifier.
3. Once proteinPiana is known (newly generated, or assigned from an old insertion) insert the protein into the database
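
The identifier handling described for insert_protein can be sketched as follows. The `ProteinRegistry` class is hypothetical: a dictionary stands in for the protein table, an integer stands in for the counter table, and the MW and IP calculations are left out. Keying the registry on the MD5 digest together with the tax id follows the "(sequence, tax_id)" description above and is an assumption about how the lookup is done.

```python
import hashlib


class ProteinRegistry(object):
    """Hypothetical in-memory stand-in for the protein registry and counter table."""

    def __init__(self):
        self._ids = {}     # (md5 of sequence, tax_id) -> proteinPiana
        self._counter = 1  # next proteinPiana to hand out

    def insert_protein(self, sequence, tax_id=None):
        # Unknown species are stored under the dummy tax id 0.
        if tax_id is None:
            tax_id = 0
        # Compare fixed-length digests instead of full sequences.
        md5 = hashlib.md5(sequence.encode("ascii")).hexdigest()
        key = (md5, tax_id)
        # 1. Look for an existing (sequence, tax_id) entry.
        if key not in self._ids:
            # 2. Not found: take the counter value and increase it.
            self._ids[key] = self._counter
            self._counter += 1
        # 3. Return the identifier, whether new or previously assigned.
        return self._ids[key]
```

Because the registry is only ever extended, a sequence inserted twice with the same tax id keeps its original identifier.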
- insert_protein_cath(self, cath_id_value, res_start_value, res_end_value, segmentID_value, proteinPiana_value, proteinCathSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its CATH "cath_id_value"
"cath_id_value" is the CATH id
"res_start_value" is the residue where the CATH domain starts
"res_end_value" is the residue where the CATH domain ends
"segmentID_value" indicates which segment of the domain we are inserting (there can be several separate segments for one domain)
"proteinPiana_value" is the internal piana identifier for the protein
"proteinCathSource_value" is the external database that has set this correspondence
- insert_protein_cog(self, cog_id, proteinPiana_value, proteinCogSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its COG (Cluster of Orthologous Genes) "cog_id"
"cog_id" is the COG term id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinCogSource_value" is the external database that has set this correspondence
- insert_protein_dbali_cluster(self, dbali_cluster_id_value, proteinPiana_value, clustering_method_value, patch_residues_value, protein_dbali_cluster_source)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its DBAli cluster "dbali_cluster_id_value"
"dbali_cluster_id_value" is the cluster id given by DBAli to that protein
"proteinPiana_value" is the internal piana identifier for the protein
"clustering_method_value" is the method followed by DBAli to establish the correspondence
-> the method must be listed in PianaGlobals.pibase_dbali_methods
"patch_residues_value" is the list of residues (string comma-separated) in the protein that correspond to that DBAli cluster
"protein_dbali_cluster_source" is the external database that has set this correspondence
- insert_protein_description(self, description, proteinPiana_value, proteinDescriptionSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its description (text string)
"description" is the text string describing the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinDescriptionSource_value" is the external database that has set this correspondence
- insert_protein_disease(self, disease, proteinPiana_value, proteinDiseaseSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated disease (text string)
"disease" is the text string describing the associated pathologies/disease with the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinDiseaseSource_value" is the external database that has set this correspondence
- insert_protein_ec(self, ec_id, proteinPiana_value, proteinECSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its EC code "ec_id"
"ec_id" is the EC id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinECSource_value" is the external database that has set this correspondence
- insert_protein_function(self, function, proteinPiana_value, proteinFunctionSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its function (text string)
"function" is the text string describing the function of the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinFunctionSource_value" is the external database that has set this correspondence
- insert_protein_go(self, go_id, proteinPiana_value, proteinGoSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its GO (Gene Ontology) term id "go_id"
"go_id" is the GO term id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinGoSource_value" is the external database that has set this correspondence
- insert_protein_keyword(self, keyword, proteinPiana_value, proteinKeywordSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and an associated keyword
"keyword" is the text string with a keyword associated to the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinKeywordSource_value" is the external database that has set this correspondence
- insert_protein_mim(self, mimID_value, proteinPiana_value, proteinMIMSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated MIM ID
"mimID_value" is the integer Identification Number in MIM database
"proteinPiana_value" is the internal piana identifier for the protein
"proteinMIMSource_value" is the external database that has set this correspondence
- insert_protein_reactome(self, reactome_id_value, proteinPiana_value, reactome_pathwayname_value, proteinReactomeSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its associated Reactome ID "reactome_id_value"
"reactome_id_value" is the reactome ID from Reactome
"proteinPiana_value" is the internal piana identifier for the protein
"reactome_pathwayname_value" is the Reactome Pathway name
"proteinReactomeSource_value" is the external database that has set this correspondence (must appear in PianaGlobals.source_databases)
- insert_protein_scop(self, cf, sf, fa, proteinPiana_value, proteinScopSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its SCOP "cf", "sf", "fa" values
"cf" is the fold id
"sf" is the superfamily id
"fa" is the family id
"proteinPiana_value" is the internal piana identifier for the protein
"proteinScopSource_value" is the external database that has set this correspondence
- insert_protein_similarity(self, proteinPiana_a_value, proteinPiana_b_value, sourceDB_value)
- Insert a pair of proteinPianas that are in fact the "same" protein (they are sufficiently similar to be considered the same
for some situations)
This is used to avoid comparing two proteins that are in fact the same.
insert_protein_similarity makes sure that the order proteinPianaA < proteinPianaB is respected
- insert_protein_subcellularLocation(self, subcellularLocation, proteinPiana_value, proteinSubcellularLocationSource_value)
- Method that inserts the correspondence between proteinPiana "proteinPiana_value" and its cellular location
"subcellularLocation" is the text string with the cellular location of the protein
"proteinPiana_value" is the internal piana identifier for the protein
"proteinSubcellularLocationSource_value" is the external database that has set this correspondence
- insert_refseq_code(self, refseq_code_value, proteinPiana_value, refseq_source_value)
- Insert correspondence between refseq "refseq_code_value" and "proteinPiana_value" in table refseq
Insert as well source database (to keep origin database of that code)
- insert_species(self, tax_id, tax_name, tax_comment, tax_kingdom=None, source_db=None)
- Inserts a species into Piana
"tax_id" is the ncbi taxonomy identifier (eg. 1)
"tax_name" is the name given to the species by ncbi (eg. 'human')
"tax_comment" is the associated comment to the species (eg 'nothing')
"tax_kingdom" is the kingdom of the species (eg. 'Eukaryota')
"source_db" is the external database that is giving this data (eg. 'uniprot')
- insert_species_kingdom(self, tax_id, tax_name, tax_kingdom, source_db)
- If tax_id exists in pianaDB, inserts the kingdom for tax id "tax_id" (tax name will be ignored)
If tax_id doesn't exist, inserts the "tax_id", the "tax_name", the "tax_kingdom" and "source_db" as in insert_species()
- insert_unigene_code(self, unigene_code_value, proteinPiana_value, unigene_source_value)
- Insert correspondence between unigene "unigene_code_value" and "proteinPiana_value" in table unigene
Insert as well source database (to keep origin database of that code)
- insert_uniprotAcc(self, uniprotAcc_value, proteinPiana_value, sourceDBID, isPrimary_value)
- Insert correspondence between "uniprotAcc_value" and "proteinPiana_value" in table uniprotAcc
Insert as well source database (to keep origin database of that code)
isPrimary_value indicates whether it is the primary accession code or not
- insert_uniprotEntry(self, uniprotEntry, proteinPiana_value, sourceDBID)
- Insert correspondence between "uniprotEntry" and "proteinPiana_value" in table uniprotEntry
Insert as well source database (to keep origin database of that code)
- insert_uniprotInfo(self, proteinPiana_value, uniprotEntry_value, uniprotAcc_value, data_class_value, description_value, geneName_value, organism_value, organelle_value, proteinSequenceLength_value, proteinMW_value)
- Inserts a uniprot entry into piana (all info found in uniprot... this is independent from uniprot entries and uniprot accession numbers)
(this is rarely used... it is kept here so that piana can be queried about uniprot info)
For a description of these fields, please refer to the uniprot manual
- lock_tables(self, table_list=None)
- Locks mysql tables indicated in "table_list"
"table_list": list of tables to lock. If it is not defined or if the list is [], all tables will be locked
- set_lock_frequency(self, frequency_value)
- Method to change the lock/unlock frequency (used only in parsers, to speed up insertions and deletions)
- unlock_tables(self)
- Unlocks tables previously locked with method lock_tables()
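
For orientation, the MySQL statements behind this kind of locking might be assembled as in the sketch below. `build_lock_statement` is a hypothetical helper and the use of WRITE locks is an assumption; how PianaDBaccess actually builds and issues its statements is not shown here. LOCK TABLES and UNLOCK TABLES themselves are standard MySQL statements.

```python
UNLOCK_STATEMENT = "UNLOCK TABLES"


def build_lock_statement(table_list, all_tables):
    """Build a LOCK TABLES statement; fall back to all_tables when table_list is None or []."""
    tables = table_list if table_list else all_tables
    # Assumption: every table is locked for writing.
    return "LOCK TABLES " + ", ".join("%s WRITE" % name for name in tables)
```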
- update_table_column(self, proteinPiana=None, table=None, column=None, new_value=None)
- updates column "column" of table "table" where proteinPiana="proteinPiana" with value "new_value"
Used to update values in protein tables where the unique identifier is proteinPiana. It replaces the current value
in the column with the new value provided (see its use in update_sequence_ip.py)
Data and other attributes defined here:
- __dict__ = <dictproxy object>
- dictionary for instance variables (if defined)
- __weakref__ = <attribute '__weakref__' of 'PianaDBaccess' objects>
- list of weak references to the object (if defined)
|
|