Methods defined here:
- __init__(self, piana_dbname=None, piana_dbhost=None, piana_dbuser=None, piana_dbpass=None)
- add_file_interactions_to_piana_graph(self, file_object, protein_type_name)
- Adds interactions from a file to current piana_graph
(adds only those interactions described in the file. This command doesn't search for interactions in the database for these proteins.)
(this is not inserting the interactions into the piana database: it is just adding it to the current network)
"file_object" is a file object describing one interaction per line
--> The interactions file follows the format (set unknown values to None):
protein_a<TAB>protein_b<TAB>source_database<TAB>detection_method<TAB>confidence_score
for example, a line could be:
HOG1 MOT1 None y2h None
This format is described in detail in file piana/code/dbParsers/piana_text_intParser/README.piana_interaction_data_format
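The five-column layout above can be illustrated with a small sketch in plain Python. The helper names format_interaction_line and parse_interaction_line are hypothetical and are not part of PIANA; the sketch only assumes that unknown values are written as the text None, as in the example line.

```python
# Hypothetical helpers (not part of PIANA) for the five-column interaction format.
FIELDS = ("protein_a", "protein_b", "source_database",
          "detection_method", "confidence_score")


def format_interaction_line(protein_a, protein_b, source_database=None,
                            detection_method=None, confidence_score=None):
    """Return one tab-separated line; unknown values are written as 'None'."""
    values = (protein_a, protein_b, source_database,
              detection_method, confidence_score)
    return "\t".join(str(value) for value in values)


def parse_interaction_line(line):
    """Split one line into a dict, mapping the text 'None' back to None."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d tab-separated fields, got %d"
                         % (len(FIELDS), len(values)))
    return {name: (None if value == "None" else value)
            for name, value in zip(FIELDS, values)}


line = format_interaction_line("HOG1", "MOT1", detection_method="y2h")
record = parse_interaction_line(line)
```

A file built from such lines can then be opened and handed to add_file_interactions_to_piana_graph as "file_object".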
"protein_type_name" is the type of code used for protein protein_code
- add_file_proteins_to_piana_graph(self, file_object, protein_type_name, tax_id_value, depth, hub_threshold, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Adds proteins and their interactions to current piana_graph. Proteins to add are in file "file_object"
"file_object" is a file object pointing to a file that contains one protein per line
"protein_type_name" is the type of code used for the proteins listed in the file
"tax_id_value" sets the species of the proteins that are being added (can be used for eliminating ambiguities between codes across species)
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"depth" fixes the depth at which interactions will be added (eg. depth 2 will add partners of partners of each protein in the file)
"hub_threshold" sets the maximum number of interactions a protein can have for it to be added to the piana_graph
-> 0 is equivalent to not applying any hub threshold
"list_source_dbs" sets the interaction databases that will be used to get interactions when building the piana_graph
-> can be a list of dbs (eg ["dip", "string"]) or "all" (all source dbs used)
"inverse_dbs" can be:
no (databases in list_source_dbs will be used to build the network)
yes (all databases except those in list_source_dbs will be used to build the network)
"list_source_methods" sets the methods that will be used to get interactions when building the piana_graph
-> can be a list of methods (eg ["y2h", "tap"]) or "all" (all methods used)
"inverse_methods" can be:
no (methods in list_source_methods will be used to build the network)
yes (all methods except those in list_source_methods will be used to build the network)
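The way a list and its "inverse" flag combine can be sketched as follows. This is only an illustration of the semantics described above, not PIANA's code: the function select_sources and the list of available databases are hypothetical, and the behaviour for "all" combined with "yes" is an assumption, since the text does not describe that case.

```python
def select_sources(available, list_source="all", inverse="no"):
    """Return the sorted names that would be used to build the network.

    available   -- every known database (or method) name
    list_source -- a list of names, or the string "all"
    inverse     -- "no": use the listed names; "yes": use everything else
    """
    available = set(available)
    if list_source == "all":
        # Assumption: "all" selects everything, whatever the inverse flag says.
        return sorted(available)
    listed = set(list_source) & available
    if inverse == "yes":
        return sorted(available - listed)
    return sorted(listed)


# Illustrative database names only; the real ones live in PianaGlobals.
databases = ["dip", "string", "mips"]
used = select_sources(databases, ["dip", "string"], inverse="no")
used_inverse = select_sources(databases, ["dip", "string"], inverse="yes")
```

The same logic applies to list_source_methods and inverse_methods.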
- add_interaction_to_piana_graph(self, protein_a, protein_b, protein_type_name, source_db=None, method=None, confidence=1)
- Adds one interaction between protein_a and protein_b to current piana_graph
(adds only this interaction. This command doesn't search for interactions in the database for these proteins.)
(this is not inserting the interaction into the piana database: it is just adding it to the current network)
"source_db" is the database from which you have obtained the interaction
--> set it to None if this is not relevant for your analysis
--> set it to 'user' if you want to label it as 'added by user' (will appear in yellow)
"method" is the method you have used to detect the interaction
--> set it to None if this is not relevant for your analysis
"confidence" is not currently being used... set it to 1
"protein_type_name" is the type of code used for protein protein_codes protein_a and protein_b (it has to be the same for both)
- add_protein_to_piana_graph(self, protein_code, protein_type_name, tax_id_value, depth, hub_threshold, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Adds one protein and its interactions to current piana_graph.
"protein_code" is a string with the protein code (must be of type protein_type_name)
"protein_type_name" is the type of code used for protein protein_code
"tax_id_value" sets the species of the protein that is being added (can be used for eliminating ambiguities between codes across species)
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"depth" fixes the depth at which interactions will be added (eg. depth 2 will add partners of partners of protein protein_code)
"hub_threshold" sets the maximum number of interactions a protein can have for it to be added to the piana_graph
-> 0 is equivalent to not applying any hub threshold
"list_source_dbs" sets the interaction databases that will be used to get interactions when building the piana_graph
-> can be a list of dbs (eg ["dip", "string"]) or "all" (all source dbs used)
"inverse_dbs" can be:
no (databases in list_source_dbs will be used to build the network)
yes (all databases except those in list_source_dbs will be used to build the network)
"list_source_methods" sets the methods that will be used to get interactions when building the piana_graph
-> can be a list of methods (eg ["y2h", "tap"]) or "all" (all methods used)
"inverse_methods" can be:
no (methods in list_source_methods will be used to build the network)
yes (all methods except those in list_source_methods will be used to build the network)
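To make "depth" and "hub_threshold" concrete, here is a minimal sketch over a toy dictionary of interactions. It is not PIANA's implementation: the function expand is hypothetical, and it assumes that the hub threshold is applied to partners only and that a protein counts as a hub when it has more interactions than the threshold.

```python
def expand(interactions, root, depth, hub_threshold=0):
    """Return the edges reachable from root in at most `depth` steps.

    interactions  -- dict mapping each protein to a list of its partners
    hub_threshold -- 0 disables the check; otherwise partners with more
                     interactions than the threshold are skipped
    """
    def is_hub(protein):
        return (hub_threshold > 0
                and len(interactions.get(protein, ())) > hub_threshold)

    edges = set()
    visited = {root}
    frontier = {root}
    for _ in range(depth):
        next_frontier = set()
        for protein in frontier:
            for partner in interactions.get(protein, ()):
                if is_hub(partner):
                    continue
                edges.add(frozenset((protein, partner)))
                if partner not in visited:
                    visited.add(partner)
                    next_frontier.add(partner)
        frontier = next_frontier
    return edges


toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E", "F"],
       "D": ["B", "C"], "E": ["C"], "F": ["C"]}
depth_1 = expand(toy, "A", 1)
depth_2 = expand(toy, "A", 2)
depth_2_no_hubs = expand(toy, "A", 2, hub_threshold=3)
```

In the toy data, protein C has four interactions, so a threshold of 3 keeps it (and everything only reachable through it) out of the result.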
- create_database_method_piana_graph(self, tax_id_value=0, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Replaces current piana_graph with the interaction piana_graph for a given list of databases and methods
(ie. adds all interactions that appear in the databases in list_source_dbs and that were detected by a method in list_source_methods)
"tax_id_value" is the species for which the piana_graph will be built
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_source_dbs" sets the interaction databases that will be used to get interactions when building the piana_graph
-> can be a list of dbs (eg ["dip", "string"]) or "all" (all source dbs used)
"inverse_dbs" can be:
no (databases in list_source_dbs will be used to build the network)
yes (all databases except those in list_source_dbs will be used to build the network)
"list_source_methods" sets the methods that will be used to get interactions when building the piana_graph
-> can be a list of methods (eg ["y2h", "tap"]) or "all" (all methods used)
"inverse_methods" can be:
no (methods in list_source_methods will be used to build the network)
yes (all methods except those in list_source_methods will be used to build the network)
Note: no depth argument is needed, since we are adding all interactions that meet the given criteria
- create_go_clustered_network(self, output_target=None, term_type=None, score_threshold=None, sim_mode=None, level_threshold=None, distance_threshold=None, rep_term=None, print_id=None)
- Creates a network of GO (Gene Ontology) terms from the piana graph
Then, clusters the GO network using the parameters provided
Finally, prints the clustered GO network
"output_target" is the file object where the clustered network in DOT format will be printed
- "term_type" sets the kind of GO terms that will be used for the clustering.
-> term-type can be "molecular_function", "biological_process" or "cellular_component"
- "score_threshold" is the lowest score obtained by the similarity function allowed for continuing the clustering
-> can be any real number from 0 to 100 (0 will group all proteins, 100 will not group any proteins). To obtain a relevant clustered network
use score thresholds between 0.1 and 1
- "sim_mode" sets how to calculate distances between two clusters
- "random" takes a random element from each cluster and evaluates similarity between them
- "min" takes the minimal distance between elements of each cluster
- "max" takes the maximal distance between elements of each cluster
- "average" takes the average distance between all elements of each cluster
- "level_threshold" is the lowest level of the go term in the cluster allowed for continuing the clustering
-> GO is a hierarchy organized from an initial root level (ie. 0), with terms becoming more specific at each deeper level.
Therefore, the higher the level used, the less clustering will be performed. To obtain a relevant clustered network
use level thresholds between 1 and 3. It all depends on how general you want to be in the interpretation of the network.
- "distance_threshold" is the maximum distance allowed between two proteins in order to be clustered
-> can be any integer between 1 and ...
- "rep_term" sets which of the GO terms of the cluster will be used for printing output
-> can be min (term of minimal depth in the hierarchy) or max (maximal depth)
- "print_id" sets which id will be used for identifying the clusters in the printed output
-> can be "no" (default id: go term name) or "yes" (a more complex id)
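The four "sim_mode" values described above can be sketched with a toy distance function. This is only an illustration: cluster_distance and toy_distance are hypothetical names, and the GO-based similarity that PIANA actually uses is not reproduced here.

```python
import random


def cluster_distance(cluster_a, cluster_b, distance, sim_mode="average"):
    """Combine element-to-element distances into one cluster distance."""
    if sim_mode == "random":
        # One random element from each cluster.
        return distance(random.choice(cluster_a), random.choice(cluster_b))
    pairwise = [distance(a, b) for a in cluster_a for b in cluster_b]
    if sim_mode == "min":
        return min(pairwise)
    if sim_mode == "max":
        return max(pairwise)
    if sim_mode == "average":
        return sum(pairwise) / float(len(pairwise))
    raise ValueError("unknown sim_mode: %r" % (sim_mode,))


def toy_distance(a, b):
    """Stand-in for a real similarity measure: absolute difference."""
    return abs(a - b)


cluster_a, cluster_b = [1, 2], [4, 8]
d_min = cluster_distance(cluster_a, cluster_b, toy_distance, "min")
d_max = cluster_distance(cluster_a, cluster_b, toy_distance, "max")
d_avg = cluster_distance(cluster_a, cluster_b, toy_distance, "average")
d_rnd = cluster_distance(cluster_a, cluster_b, toy_distance, "random")
```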
- create_species_piana_graph(self, species_name=None, tax_id=None, hub_threshold=0, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Replaces current piana_graph with the protein-protein interaction piana_graph for a given species (using all proteins)
The user can fix the species using a tax id or a species name (don't use both at the same time):
"tax_id" is the tax id for which the piana_graph will be built (set it to None if not using it)
-> has to be a valid tax id (eg. 9606; ...)
"species_name" is the species for which the piana_graph will be built (set it to None if not using it)
-> has to be the name of a species (eg. "human", "yeast", ...)
--> if both species_name and tax_id are set to something other than None, tax_id is used
"hub_threshold" sets the maximum number of interactions a protein can have for it to be added to the piana_graph
-> 0 is equivalent to not applying any hub threshold
"list_source_dbs" sets the interaction databases that will be used to get interactions when building the piana_graph
-> can be a list of dbs (eg ["dip", "string"]) or "all" (all source dbs used)
"inverse_dbs" can be:
no (databases in list_source_dbs will be used to build the network)
yes (all databases except those in list_source_dbs will be used to build the network)
"list_source_methods" sets the methods that will be used to get interactions when building the piana_graph
-> can be a list of methods (eg ["y2h", "tap"]) or "all" (all methods used)
"inverse_methods" can be:
no (methods in list_source_methods will be used to build the network)
yes (all methods except those in list_source_methods will be used to build the network)
Note: no depth argument is needed, since we are adding all interactions for all proteins of a given species
- expand_piana_graph_interactions(self, expansion_type, expansion_mode, expansion_threshold, hub_threshold, exp_output_mode, output_file_object, proteins_type_name, list_alternative_type_names, output_tax_id, list_source_dbs='all', inverse_dbs='no', list_source_methods='all', inverse_methods='no')
- Expands interactions in the current piana_graph by propagating interactions to nodes "expansion_mode" from all nodes that
have a common characteristic "expansion_type"
"expansion_type" defines the characteristic that is used to propagate interactions between nodes
-> valid expansion-type values are those defined in PianaGlobals.expansion_types (currently can be cog, scop (ie. scop family), interpro or ec)
-> if two proteins share expansion-type, interactions are interpropagated
"expansion_mode" defines to which nodes we will propagate the interactions to
-> valid expansion-nodes values are: all (all proteins in piana_graph are expanded) or root (only root proteins are expanded)
-> if you are looking for new interactions (predictions) for your input proteins, use root
-> if you want to expand all the proteins in the piana_graph (partners of root proteins as well) use all
-> root proteins are the source proteins used to build the piana_graph
"expansion_threshold" is used to avoid propagating interactions when there are too many nodes that share the expansion type
-> valid values are: 0 (no thresholds applied) and positive integers
"hub_threshold" sets the maximum number of interactions a protein can have for it to be added to the piana_graph
-> 0 is equivalent to not applying any hub threshold
"list_source_dbs" sets the interaction databases that will be used to get interactions when building the piana_graph
-> can be a list of dbs (eg ["dip", "string"]) or "all" (all source dbs used)
"inverse_dbs" can be:
no (databases in list_source_dbs will be used to build the network)
yes (all databases except those in list_source_dbs will be used to build the network)
"list_source_methods" sets the methods that will be used to get interactions when building the piana_graph
-> can be a list of methods (eg ["y2h", "tap"]) or "all" (all methods used)
"inverse_methods" can be:
no (methods in list_source_methods will be used to build the network)
yes (all methods except those in list_source_methods will be used to build the network)
"exp_output_mode" sets whether new interactions are added to the piana_graph or printed to an output file
-> valid exp-output-mode values are: add (add predictions to piana_graph) and print (print to output-target)
-> 'add' will add to the piana_graph the predictions found by expansion
-> 'print' will print to output-target (or to default results file) the list of predictions found by expansion
-> for example, if you wanted to get predictions for root nodes using double cog expansion
you would first use command expand-interactions with expansion-nodes=all and mode=add
and then, another command expand-interactions with expansion-nodes=root and mode=print
doing this "double expansion" you will be predicting interactions based on a previous expansion
- if exp-output-mode is add, the following arguments can be ignored (leave them blank)
- if exp-output-mode is "print" then :
-> "output_file_object" is the file object where the interactions will be printed
-> "output_tax_id" restricts the species of proteins in the interactions that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
-> "proteins_type_name" is the type of code that should be used for printing proteins identifiers
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
-> the results will follow the following format (one interaction per line):
protein1<TAB>protein2<TAB>expansion_type<TAB>source_interactionPiana<TAB>source_proteinPiana
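A file printed in this five-column format could be read back with a few lines of Python. The sketch below is hypothetical (read_predictions is not a PIANA function), and the values in the sample line are made up for illustration.

```python
from collections import namedtuple

# Field names follow the column order given above.
Prediction = namedtuple("Prediction", [
    "protein1", "protein2", "expansion_type",
    "source_interactionPiana", "source_proteinPiana",
])


def read_predictions(lines):
    """Yield one Prediction per non-empty tab-separated line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        yield Prediction(*line.split("\t"))


# Made-up sample line; a real file would come from exp_output_mode "print".
sample = ["protA\tprotB\tcog\t12\t34\n", "\n"]
predictions = list(read_predictions(sample))
```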
- get_one_tax_id_from_species_name(self, species_name=None)
- returns the tax id associated to "species_name"
If there is more than one tax id associated, raises an error (to avoid ambiguities)
If there is no tax id associated to it, raises an error (to avoid using a name that doesn't exist)
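The "exactly one tax id" rule can be sketched with a toy lookup table. Everything here is an assumption made for illustration: the function one_tax_id, the in-memory table and the use of ValueError are hypothetical, and the text does not say which exception PIANA raises.

```python
def one_tax_id(species_name, name_to_tax_ids):
    """Return the single tax id for species_name, or raise ValueError."""
    tax_ids = name_to_tax_ids.get(species_name, [])
    if not tax_ids:
        raise ValueError("no tax id found for %r" % (species_name,))
    if len(tax_ids) > 1:
        raise ValueError("ambiguous name %r: %r" % (species_name, tax_ids))
    return tax_ids[0]


# Toy table: "yeast" is deliberately given two ids to show the ambiguous case.
toy_table = {"human": [9606], "yeast": [4932, 559292]}
human_id = one_tax_id("human", toy_table)
```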
- load_piana_graph(self, file_object)
- loads a previously saved piana_graph into the current piana_graph (ie replaces current piana_graph by the one in file_object)
file_object must have been opened using binary read (ie. file_object = file(file_name, "rb"))
- print_all_proteins_information(self, protein_type_name, output_file_object, output_mode='compact', format_mode='txt', list_alternative_type_names=[], tax_id_value=0, list_keywords=[], file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints information about all the proteins in the current piana_graph
"protein_type_name" is the type of code that should be used for printing proteins identifiers
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object (sys.stdout to print to screen) where interactions will be printed
"output_mode" can be:
- 'compact': all relevant information in one line
- 'extended': all information in text paragraphs
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"list_alternative_type_names" can be used to set a list of alternative types in case no protein_type_name code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_type_names can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"tax_id_value" restricts the species of the proteins that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it doesn't do anything
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_connecting_proteins_information(self, protein_type_name, output_file_object, output_mode='compact', format_mode='txt', list_alternative_type_names=[], tax_id_value=0, list_keywords=[], file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints information about linkers in the current piana_graph
-> a linker is a protein that connects two or more root proteins
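The definition of a linker can be sketched on a toy network. This is not PIANA's code: find_linkers is a hypothetical helper, and it assumes that a linker is a non-root protein that interacts directly with at least two root proteins.

```python
def find_linkers(interactions, root_proteins):
    """Return the non-root proteins that touch two or more root proteins."""
    roots = set(root_proteins)
    linkers = set()
    for protein, partners in interactions.items():
        if protein in roots:
            continue
        if len(roots.intersection(partners)) >= 2:
            linkers.add(protein)
    return linkers


toy = {"R1": ["X", "Y"], "R2": ["X"], "X": ["R1", "R2"], "Y": ["R1"]}
linkers = find_linkers(toy, ["R1", "R2"])
```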
"protein_type_name" is the type of code that should be used for printing proteins identifiers
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object (sys.stdout to print to screen) where interactions will be printed
"output_mode" can be:
- 'compact': all relevant information in one line
- 'extended': all information in text paragraphs
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"list_alternative_type_names" can be used to set a list of alternative types in case no protein_type_name code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_type_names can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"tax_id_value" restricts the species of the proteins that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it doesn't do anything
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_file_proteins_information(self, input_file_object, input_proteins_type, output_file_object, output_proteins_type, output_mode, format_mode='txt', list_keywords=[], list_alternative_type_names=[], tax_id_value=0, file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints information for all proteins in file "input_file_object"
--> only works in compact mode to prevent the creation of enormous text files
"input_file_object" is the file object with the protein codes for which you want to obtain the information
"input_proteins_type" is the type of code used for the proteins in input_file_object
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object where the protein information will be written
"output_proteins_type" is the type of code that will be used to identify proteins in the output file
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"tax_id_value" sets the species of the protein that will be printed (can be used for eliminating ambiguities between codes across species)
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"output_mode" can be:
- 'extended' (multiple lines per protein to be shown directly to the screen)
- 'compact' (one line per protein to be shown directly on the screen)
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it doesn't do anything
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_interactions(self, protein_type_name, output_file_object, output_format, print_mode, format_mode, list_alternative_type_names, tax_id_value, list_keywords=[], intersection_dbs=None, file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints interactions from current piana_graph in the format chosen by the user.
"protein_type_name" is the type of code that should be used for printing proteins identifiers
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object (sys.stdout to print to screen) where interactions will be printed
"output_format" is the format that will be followed for the output
-'table': prints interactions in table format
-'network': prints interactions in a format that can be visualized as a network
"format_mode" sets the type of format that will be used for output
-> valid formats for output_format 'table' are:
- 'txt' will print flat text
- 'html' will print html
-> valid formats for output_format 'network' are:
- 'dot': uses DOT format as defined in www.graphviz.org
--> format dot will produce an output that can be then given to visualization programs
for example, neato from GraphViz, would work by:
$> cat output_in_dot_format | neato -Tgif -o network.gif
--> format_mode 'txt' for table will print a table in the format indicated in the description of command print-table-*
in the template for piana configuration files: piana/code/execs/conf_files/general_template.piana_conf
"print_mode" sets which proteins will be printed
-> "all" will print all interactions in the piana_graph
-> "all_root" will print all interactions in the piana_graph where at least one partner is a root protein
-> "only_root" will print only interactions between root proteins
-> "connecting" will print only interactions between root proteins and those proteins that connect more than one root protein
"list_alternative_types" can be used to set a list of alternative types in case no protein_type_name code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_types can for example look like this: ["gi", "uniacc", "md5"]
-> The values that this list can contain can be obtained by running python piana.py or
by looking at the variable valid_protein_types in PianaGlobals.py
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"tax_id_value" restricts the species of the proteins that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it highlights in red the proteins in the DOT file that contain at least one keyword in the function, description or name
-> currently, it is not used when printing the interactions in a table
-> If you are interested in highlighting proteins related to cancer, list_keywords could be: ['cancer', 'onco', 'carcinoma', 'tumor']
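The combination of DOT output and keyword highlighting can be sketched as follows. The exact DOT layout that PIANA prints is not shown in this text, so to_dot is a hypothetical function that only demonstrates the idea: any protein whose name or description contains a keyword gets a red node. The protein names and descriptions are made up.

```python
def to_dot(interactions, descriptions, list_keywords=()):
    """Return an undirected DOT graph; keyword matches are coloured red."""
    lines = ["graph piana_sketch {"]
    proteins = sorted({p for pair in interactions for p in pair})
    for protein in proteins:
        text = (protein + " " + descriptions.get(protein, "")).lower()
        if any(keyword.lower() in text for keyword in list_keywords):
            lines.append('    "%s" [color=red];' % protein)
    for protein_a, protein_b in interactions:
        lines.append('    "%s" -- "%s";' % (protein_a, protein_b))
    lines.append("}")
    return "\n".join(lines)


# Made-up data for illustration only.
interactions = [("protA", "protB"), ("protB", "protC")]
descriptions = {"protA": "putative tumor suppressor", "protB": "kinase"}
dot_text = to_dot(interactions, descriptions,
                  list_keywords=["cancer", "onco", "carcinoma", "tumor"])
```

The resulting text can be written to a file and passed to a GraphViz program such as neato.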
"intersection_dbs" sets intersection mode, which only prints out interactions that appear in all dbs of the list being passed
-> it can be None (no intersection mode applied) or a list of database names
-> valid database names are those in PianaGlobals.interaction_databases
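Intersection mode can be pictured as a filter that keeps an interaction only if every listed database reports it. The sketch below is hypothetical (filter_by_intersection is not a PIANA function) and uses made-up interactions.

```python
def filter_by_intersection(interactions, intersection_dbs=None):
    """Keep interactions reported by all databases in intersection_dbs.

    interactions -- iterable of (protein_a, protein_b, set_of_source_dbs)
    """
    if intersection_dbs is None:
        # No intersection mode: everything is kept.
        return list(interactions)
    required = set(intersection_dbs)
    return [(a, b, dbs) for a, b, dbs in interactions
            if required.issubset(dbs)]


toy = [("A", "B", {"dip", "string"}), ("A", "C", {"dip"})]
in_both = filter_by_intersection(toy, ["dip", "string"])
unfiltered = filter_by_intersection(toy)
```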
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_list_proteins_information(self, protein_list, input_proteins_type, output_file_object, output_proteins_type, list_alternative_type_names=[], output_mode='compact', format_mode='txt', list_keywords=[], tax_id_value=0, file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints information for all proteins in list "protein_list" (does not take into account the network, only these proteins)
--> only works in compact mode to prevent the creation of enormous text files
"protein_list" is the list of proteins for which you want to retrieve the information
"input_proteins_type" is the type of code of proteins in the protein list
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object where the protein information will be written
"output_proteins_type" is the type of code that will be used to identify proteins in the output file
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"tax_id_value" restricts the species of the proteins that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_alternative_type_names" can be used to set a list of alternative types in case no output_proteins_type code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_type_names can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
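The fallback through alternative types can be sketched like this. It is an illustration under assumptions, not PIANA's code: pick_code is a hypothetical helper, and the codes in the example are made up.

```python
def pick_code(codes_by_type, requested_type, list_alternative_type_names=()):
    """Return (type_name, code) for the first type that has a code."""
    candidates = [requested_type] + list(list_alternative_type_names)
    for type_name in candidates:
        code = codes_by_type.get(type_name)
        if code:
            return type_name, code
    return None, None


# Made-up codes: this protein has no "gi" code but has the other two.
codes = {"gi": None, "uniacc": "ACC_EXAMPLE", "md5": "0123abcd"}
with_fallback = pick_code(codes, "gi", ["uniacc", "md5"])
without_fallback = pick_code(codes, "gi")
```

Without alternatives the sketch returns None for this protein, which is the situation the md5 suggestion is meant to avoid.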
"output_mode" can be:
- 'extended' (multiple lines per protein to be shown directly to the screen)
- 'compact' (one line per protein to be shown directly on the screen)
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it doesn't do anything
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_proteins_at_distance_x(self, query_protein, distance, input_protein_type, output_protein_type, list_alternative_type_names, output_file_object, format_mode, info, tax_id_value=0)
- Prints to "output_file_object" the proteins from the network that are at distance "distance" from the protein "query_protein"
"query_protein" is the query protein: proteins returned will be at distance "distance" from this protein
"distance" is the distance at which the proteins will be searched
"input_protein_type" is the type of code used for protein query_protein
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_protein_type" is the type of code that will be used to print proteins
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"list_alternative_type_names" can be used to set a list of alternative types in case no output_protein_type code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_type_names can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"output_file_object" is the file object (sys.stdout to print to screen) where proteins at distance X will be printed
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"info" sets whether domain information associated to the proteins will be printed or not
- 'all' will show all domain information
- 'scop' will show SCOP domain information
- 'cath' will show CATH domain information
- 'no' shows no domain information
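Finding proteins at a given distance can be sketched with a breadth-first search over a toy network. The function proteins_at_distance is hypothetical, and the sketch assumes that "distance" means the length of the shortest path from the query protein.

```python
from collections import deque


def proteins_at_distance(interactions, query_protein, distance):
    """Return the sorted proteins whose shortest path to the query is `distance`."""
    shortest = {query_protein: 0}
    queue = deque([query_protein])
    while queue:
        protein = queue.popleft()
        if shortest[protein] == distance:
            continue  # no need to search beyond the requested distance
        for partner in interactions.get(protein, ()):
            if partner not in shortest:
                shortest[partner] = shortest[protein] + 1
                queue.append(partner)
    return sorted(p for p, d in shortest.items() if d == distance)


toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}
at_1 = proteins_at_distance(toy, "A", 1)
at_2 = proteins_at_distance(toy, "A", 2)
at_3 = proteins_at_distance(toy, "A", 3)
```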
- print_root_proteins_information(self, protein_type_name, output_file_object, output_mode='compact', format_mode='txt', list_alternative_type_names=[], tax_id_value=0, list_keywords=[], file_over_expressed=None, file_infra_expressed=None, expression_protein_type=None)
- Prints information about root proteins in the current piana_graph
-> a root protein is a protein that was given by the user as input (ie. protein of interest)
"protein_type_name" is the type of code that should be used for printing proteins identifiers
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object (sys.stdout to print to screen) where interactions will be printed
"output_mode" can be:
- 'compact': all relevant information in one line
- 'extended': all information in text paragraphs
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"list_alternative_type_names" can be used to set a list of alternative types in case no protein_type_name code is found
--> user must provide a list of valid easy-to-remember type names
list_alternative_type_names can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"tax_id_value" restricts the species of the proteins that will be printed
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
"list_keywords" sets a list of keywords that will be used to highlight important proteins in the network
-> currently, it doesn't do anything
"file_over_expressed" and "file_infra_expressed" are names of files that contain proteins that are either over expressed or infra expressed
"expression_protein_type" is the type of protein code used in files file_over_expressed and file_infra_expressed
- print_spot_protein_correspondence(self, spots_file_object, molecular_error_bounds, isoelectric_error_bounds, output_file_object, output_proteins_type, list_alternative_types, format_mode)
- Finds correspondences between proteins in the piana_graph and spots in a 2D electrophoresis gel
--> It does so by comparing the molecular weight (mw) and isoelectric point (ip) of each spot with the mw and ip of the protein sequences
--> prints to "output_file_object" (a file object) the spot matches for each error allowed, using protein codes indicated by "output_proteins_type"
--> one spot can be assigned to several proteinPianas and vice versa (this is just a matching by mw and ip...)
"spots_file_object": a file object with spots from a Gel, and their Molecular Weight and Isoelectric Point
the text file with spots must follow this format (one spot per line):
spot_id<TAB>Molecular Weight<TAB>Isoelectric Point
Attention: - Numbers must use a decimal point (American style): 234234.45 and not 234234,45
- No headers or footers allowed
"molecular_error_bounds" and "isoelectric_error_bounds" are lists of error bounds (they must have the same number of elements; they are called mw_error_bounds and ip_error_bounds in the example below)
the error bounds describe the percentage of error admitted when matching a spot mw or ip to the theoretical mw or ip of a protein
for example:
mw_error_bounds = [0.0, 0.0025, 0.005, 0.01]
ip_error_bounds = [0.0, 0.0025, 0.005, 0.01]
"output_proteins_type" is the easy-to-remember type name that will be used for printing the proteins that match the 2D gel
-> Valid protein_type_name are those listed in PianaGlobals.valid_protein_types
"list_alternative_types" can be used to set a list of alternative types in case no protein_type_name code is found
-> the user must provide a list of valid easy-to-remember type names
list_alternative_types can for example look like this: ["gi", "uniacc", "md5"]
I suggest always placing md5 at the end of alternative types, so you never get a None in the output
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
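The matching described above can be sketched as follows. This is only an illustration under stated assumptions, not PIANA's implementation: the helper names parse_spots, within and match_spots, the in-memory table of theoretical mw and ip values, and the reading of each error bound as a relative fraction of the theoretical value are all hypothetical.

```python
import io

def parse_spots(spots_file_object):
    """Read spot_id<TAB>mw<TAB>ip lines (no header or footer) into a list of tuples."""
    spots = []
    for line in spots_file_object:
        line = line.strip()
        if not line:
            continue
        spot_id, mw, ip = line.split("\t")
        # float() accepts 234234.45 but raises ValueError on 234234,45
        spots.append((spot_id, float(mw), float(ip)))
    return spots

def within(observed, theoretical, bound):
    """True if observed deviates from theoretical by at most bound (assumed relative)."""
    return abs(observed - theoretical) <= bound * theoretical

def match_spots(spots, proteins, molecular_error_bounds, isoelectric_error_bounds):
    """Return {(mw_bound, ip_bound): [(spot_id, protein_code), ...]}."""
    if len(molecular_error_bounds) != len(isoelectric_error_bounds):
        raise ValueError("error bound lists must have the same number of elements")
    results = {}
    for mw_b, ip_b in zip(molecular_error_bounds, isoelectric_error_bounds):
        # A spot may match several proteins and vice versa.
        results[(mw_b, ip_b)] = [
            (spot_id, code)
            for spot_id, mw, ip in spots
            for code, (p_mw, p_ip) in sorted(proteins.items())
            if within(mw, p_mw, mw_b) and within(ip, p_ip, ip_b)
        ]
    return results

# Hypothetical data: two spots and two proteins with theoretical (mw, ip).
spots = parse_spots(io.StringIO("s1\t50000.0\t6.50\ns2\t30100.0\t5.00\n"))
proteins = {"P1": (50000.0, 6.50), "P2": (30000.0, 5.02)}
matches = match_spots(spots, proteins, [0.0, 0.01], [0.0, 0.01])
```

With a bound of 0.0 only exact matches are kept; loosening the bounds adds spots whose measured values fall close to the theoretical ones.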
- protein_code_2_protein_code(self, input_file_object, input_proteins_type, output_file_object, output_proteins_type, list_alternative_types, format_mode, tax_id_value=0)
- Translates protein codes of a certain type into another type
"input_file_object" is the file object with the protein codes to translate (one protein code per line)
"input_proteins_type" is the type of code used to identify proteins in input_file_object
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"output_file_object" is the file object where the translated codes will be written
"output_proteins_type" is the target type of code
-> valid protein-type values are those defined in PianaGlobals.valid_protein_types
"format_mode" sets the type of format that will be used for output
- 'txt' will print flat text
- 'html' will print html
"tax_id_value" sets the species of the proteins being translated
--> valid tax ids are 0 (do not take into account the species) and those taxonomy ids provided by ncbi taxonomy
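The role of "list_alternative_types" can be illustrated with a small fallback routine. This is a sketch under assumptions: pick_code and the codes_by_type dictionary are hypothetical and the codes are made up; how PIANA actually looks up codes in its database is not shown in this document.

```python
def pick_code(codes_by_type, output_proteins_type, list_alternative_types):
    """Return (type_name, code) for the first type that has a code, else (None, None)."""
    # Try the requested type first, then each alternative in the given order.
    for type_name in [output_proteins_type] + list(list_alternative_types):
        codes = codes_by_type.get(type_name)
        if codes:
            return type_name, codes[0]
    return None, None

# Hypothetical protein that has no geneName but does have gi and md5 codes.
known = {"gi": ["123456"], "md5": ["0123abcd"]}
chosen = pick_code(known, "geneName", ["gi", "uniacc", "md5"])
```

Placing a type that every protein has at the end of the list is what prevents a None result, which is the reasoning behind the md5 suggestion given for list_alternative_types.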
- reset_piana_graph(self)
- starts a new piana_graph
- save_piana_graph(self, file_object)
- saves the current piana_graph in file file_object
file_object must have been created using binary write (i.e. file_object = file(file_name, "wb"))
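The binary-write requirement can be sketched as below. Two assumptions apply: the example uses open(), which is the Python 3 replacement for the file() builtin shown above, and it uses pickle as a stand-in serializer, since this document does not say how save_piana_graph writes the graph. The dictionary graph_stand_in is a hypothetical placeholder, not a real piana_graph.

```python
import os
import pickle
import tempfile

# Hypothetical placeholder for a graph; not the real piana_graph structure.
graph_stand_in = {"HOG1": ["MOT1"]}

path = os.path.join(tempfile.mkdtemp(), "graph.bin")

# Open in binary write mode ("wb"), as the method requires.
with open(path, "wb") as file_object:
    pickle.dump(graph_stand_in, file_object)

# Reading the data back also needs binary mode ("rb").
with open(path, "rb") as file_object:
    restored = pickle.load(file_object)
```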
Data and other attributes defined here:
- __dict__ = <dictproxy object>
- dictionary for instance variables (if defined)
- __weakref__ = <attribute '__weakref__' of 'PianaApi' objects>
- list of weak references to the object (if defined)
- info_values = {'all': None, 'cath': None, 'no': None, 'scop': None}
- max_min_values = {'max': None, 'min': None}
- valid_exp_output_modes = {'add': None, 'print': None}
- valid_expansion_modes = {'all': None, 'root': None}
- valid_expansion_types = {'cog': ['cog', 'expansionsamecog'], 'ec': ['ec', 'expansionsameec'], 'interpro': ['interpro', 'expansionsameinterpro'], 'scop': ['scop', 'expansionsamescop']}
- valid_format_modes = {'dot': None, 'html': None, 'txt': None}
- valid_methods = {'3dstruct': ['3dstruct', 'three dimensional structure', '3d structure'], 'adhesion': ['adhesion', 'interaction adhesion', 'interaction adhesion assay'], 'affinchrom': ['affinchrom', 'affinity chromatogra', 'affinity chromatografy', 'affinity chromatography', 'affinity chromatography technologies'], 'alanine': ['alanine', 'alanine scanning'], 'atomic': ['atomic', 'atomic force microsc', 'atomic force microscopy'], 'biacore': ['biacore', 'biacore sensor chip'], 'calcium': ['calcium', 'calcium mobilization', 'calcium mobilization assay'], 'chemotaxis': ['chemotaxis'], 'colocalization': ['colocalization', 'colocalization/visualisation technologies'], 'competition': ['competition', 'competition binding'], ...}
- valid_output_formats = {'network': None, 'table': None}
- valid_output_modes = {'compact': None, 'extended': None}
- valid_print_modes = {'all': None, 'all_root': None, 'connecting': None, 'only_root': None}
- valid_protein_types = {'emblacc': {'emblAccession': 'emblAccessionID'}, 'emblpid': {'emblPID': 'emblPID'}, 'fasta': {'protein': 'proteinSequence'}, 'geneName': {'geneName': 'geneName'}, 'gi': {'gi': 'giID'}, 'interpro': {'interPro': 'interProID'}, 'md5': {'protein': 'proteinMD5'}, 'pdb.chain': {'pdb': 'pdb_chain'}, 'pirAccession': {'pirAccession': 'pirAccessionID'}, 'pirEntry': {'pirEntry': 'pirEntryID'}, ...}
- valid_sim_modes = {'average': None, 'max': None, 'min': None, 'random': None}
- valid_source_dbs = {'bind': ['bind'], 'bind_c': ['bind_c'], 'blast_transfer': ['blast_transfer'], 'cog': ['cog'], 'completion': ['completion'], 'dbali': ['dbali'], 'dip': ['dip'], 'dip_c': ['dip_c'], 'expansion': ['expansion'], 'expansion_c': ['expansion_c'], ...}
- valid_term_types = {'biological_process': None, 'cellular_component': None, 'molecular_function': None}
- yes_no_values = {'no': None, 'yes': None}
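The dictionaries above suggest two lookup patterns, sketched here. The assumptions are that the keys are the accepted values and that list values hold accepted aliases for each key; the helpers is_valid and canonical_name are hypothetical, and the dictionaries are truncated copies of the ones listed above.

```python
valid_format_modes = {"dot": None, "html": None, "txt": None}
valid_methods = {
    "3dstruct": ["3dstruct", "three dimensional structure", "3d structure"],
    "alanine": ["alanine", "alanine scanning"],
    "chemotaxis": ["chemotaxis"],
}

def is_valid(value, valid_values):
    """True if value is one of the keys of a validation dictionary."""
    return value in valid_values

def canonical_name(alias, valid_aliases):
    """Map an alias to its key (case-insensitive match), or None if unknown."""
    wanted = alias.strip().lower()
    for name, aliases in valid_aliases.items():
        if wanted in aliases:
            return name
    return None

format_ok = is_valid("html", valid_format_modes)
method = canonical_name("Alanine Scanning", valid_methods)
```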
|