Structural BioInformatics Research Lab |
PIANA: Protein Interactions And Network Analysis
PIANA can also be used as a stand-alone application to create
protein interaction networks and perform analyses on them.
Currently, PIANA can integrate data extracted from DIP, MIPS, HPRD, BioGrid, IntAct, MINT, BIND, STRING and any source that follows the HUPO PSI standard into a single protein interaction network. PIANA can also use interactions specified by the user via simple flat text files. Therefore, analyses can be performed on a single network regardless of the sources from which the interactions were extracted. Moreover, PIANA integrates proteins coming from UniProt and NCBI GenBank and contains co-references between different types of protein identifiers.
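The idea of merging several sources into one network can be illustrated with a minimal Python sketch. This is not PIANA's code or API: the function name merge_interactions, the toy identifiers and the in-memory dictionaries are all assumptions made for illustration. Each interaction is stored once, as an unordered pair of internal identifiers, together with the set of sources that report it.

```python
from collections import defaultdict


def merge_interactions(sources, id_map):
    """Merge per-source interaction lists into a single network.

    sources: dict mapping a source name to a list of (id_a, id_b) pairs
    id_map:  dict mapping external identifiers to internal identifiers
    Returns a dict mapping frozenset({a, b}) to the set of source names.
    (Hypothetical helper, not part of PIANA.)
    """
    network = defaultdict(set)
    for source, pairs in sources.items():
        for id_a, id_b in pairs:
            # Skip interactions whose identifiers cannot be mapped.
            if id_a not in id_map or id_b not in id_map:
                continue
            # An unordered pair makes (a, b) and (b, a) the same interaction.
            key = frozenset((id_map[id_a], id_map[id_b]))
            network[key].add(source)
    return dict(network)


# Toy data: two identifier types pointing at the same internal proteins.
id_map = {"acc_A": 1, "name_A": 1, "acc_B": 2, "name_B": 2, "acc_C": 3}
sources = {
    "DIP": [("acc_A", "acc_B")],
    "MINT": [("name_B", "name_A"), ("acc_B", "acc_C")],
}
merged = merge_interactions(sources, id_map)
```

In this toy example the interaction between proteins 1 and 2 is reported by both sources under different identifier types, yet it ends up as a single entry in the merged network.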
PIANA is not a network visualizer and does not (currently) have a graphical user interface. However, PIANA is a powerful tool for analyzing protein interaction networks, and it integrates most repositories of protein interaction data into a single network. Therefore, PIANA will be of most interest to bioinformaticians who are already comfortable working with computers. The standard procedure for using PIANA is:
All README files linked from this page can be found in the main directory of PIANA once you have installed it on your machine.
In addition to providing a framework for working with protein interaction networks, PIANA can also be used as a stand-alone application to create and analyze protein interaction networks.

Data Integration

PIANA accepts most types of protein codes and contains co-references between the different types. Therefore, PIANA accepts data from most external databases, and interactions from different sources are integrated into a single network. Moreover, the list of input proteins provided by the user can be in any of the protein code types accepted by PIANA: UniProt entry names and accession numbers, gene names, NCBI GenBank gi, geneID, UniGene, FlyBase, ENSEMBL, PDB, PIR and the protein sequence in FASTA format. PIANA transforms these codes into its internal identifiers, processes the data and returns the results using the type of protein code chosen by the user. PIANA contains a very extensive mapping between protein identifiers that has been created by parsing multiple databases and applying our own algorithms for creating co-references. As a consequence, PIANA users can work with all interactions from all databases integrated into a single network, allowing them to perform more comprehensive studies of protein interaction networks. PIANA can also be used as a translator between different code types.

Creation of protein-protein interaction networks

The user can choose to retrieve the interactions from the PIANA MySQL database, add their own interactions, or use a combination of both. Usually, a list of proteins of interest is given as input (referred to hereafter as “root proteins”) and PIANA adds interactions extracted from the database for these proteins until the depth (number of interaction steps from a root protein) chosen by the user is reached.
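The depth-limited expansion from root proteins can be sketched as follows. This is an illustration only and does not reproduce PIANA's implementation: the function build_network is hypothetical, and a plain dictionary of interaction partners stands in for the PIANA MySQL database. Exactly how PIANA treats interactions at the outermost depth is not specified here, so the behavior below is an assumption.

```python
def build_network(roots, partners, depth):
    """Expand a network from root proteins up to `depth` interaction steps.

    roots:    iterable of root protein identifiers
    partners: dict mapping a protein to the set of proteins it interacts with
              (an in-memory stand-in for a database lookup)
    depth:    maximum number of interaction steps from any root protein
    Returns (proteins, edges), where edges is a set of frozenset pairs.
    (Hypothetical helper, not part of PIANA.)
    """
    proteins = set(roots)
    edges = set()
    frontier = set(roots)
    for _ in range(depth):
        next_frontier = set()
        for protein in frontier:
            for partner in partners.get(protein, ()):
                edges.add(frozenset((protein, partner)))
                if partner not in proteins:
                    proteins.add(partner)
                    next_frontier.add(partner)
        # Only newly discovered proteins are expanded in the next step.
        frontier = next_frontier
    return proteins, edges


# Toy chain A - B - C - D with a single root protein.
partners = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
proteins_d1, edges_d1 = build_network(["A"], partners, depth=1)
proteins_d2, edges_d2 = build_network(["A"], partners, depth=2)
```

With depth 1 only the direct partners of the root are added; with depth 2 the partners of those partners are added as well.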
The user can also restrict the network to contain proteins and interactions according to different criteria: the species of the proteins, the source interaction databases and the method used to determine the interaction (e.g. build a network of human protein interactions detected by means of two-hybrid experiments).

Interpreting the protein interaction network

The network and analysis results can be printed out as a detailed table of protein interactions or as a file for graphical visualization. PIANA can also present only the interactions that appear in the intersection of the databases set by the user. Furthermore, PIANA can identify proteins that act as “linkers” between root proteins. A protein that connects two root proteins is likely to be important in the pathways in which the root proteins are involved.

New: PIANA now outputs results in Cytoscape format. You can combine the data integration and analysis of PIANA with the visualization capabilities of Cytoscape.

When visualizing the network, the user can ask PIANA to highlight proteins that have specific keywords in their function or description. PIANA also accepts files with over/under expressed genes to indicate in its output which of the proteins in the network also appear as being "relevant" in a microarray experiment.

Checking pathways related to your PPI network

Given a list of proteins that are known to belong to a specific pathway, PIANA checks which proteins of the network appear on those pathways. Furthermore, if you have different PPI networks you can ask PIANA to compare them in terms of the pathways that are 'affected' by each network.

Predicting new interactions

PIANA can predict protein interactions by transferring interactions between proteins that share a given property. For example, PIANA predicts interactions using interologs (i.e. orthologous proteins interact with the same proteins) by means of COG codes.
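The transfer of interactions between proteins that share a property can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PIANA's prediction code: the function predict_interactions, the protein names and the code labels are all hypothetical. If protein X interacts with protein Y, every other protein that shares X's code is predicted to interact with Y.

```python
from collections import defaultdict


def predict_interactions(known, codes):
    """Transfer known interactions between proteins sharing a code.

    known: list of (protein_a, protein_b) known interactions
    codes: dict mapping a protein to a group code (e.g. a COG code)
    Returns a set of frozenset pairs that are predicted but not known.
    (Hypothetical helper, not part of PIANA.)
    """
    # Group proteins by the code they share.
    members = defaultdict(set)
    for protein, code in codes.items():
        members[code].add(protein)

    known_pairs = {frozenset(pair) for pair in known}
    predicted = set()
    for a, b in known:
        # Transfer in both directions: from a's group to b, and from b's to a.
        for source, partner in ((a, b), (b, a)):
            code = codes.get(source)
            if code is None:
                continue
            for relative in members[code]:
                candidate = frozenset((relative, partner))
                is_new = candidate not in known_pairs
                # len == 2 excludes predicted self-interactions.
                if relative != source and len(candidate) == 2 and is_new:
                    predicted.add(candidate)
    return predicted


# Toy data: humanA and yeastA share a (made-up) code; yeastA binds yeastB.
known = [("yeastA", "yeastB")]
codes = {"yeastA": "COG_X", "humanA": "COG_X"}
predictions = predict_interactions(known, codes)
```

Here the only prediction is an interaction between humanA and yeastB, transferred from the known yeastA and yeastB interaction. Replacing the code dictionary with another shared property gives the same kind of transfer for that property.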
In a similar way, SCOP codes can be used to transfer interactions between proteins that share a similar type of domain family.

Finding “interaction distance” between proteins

Obtaining lists of proteins that are at a certain interaction distance (i.e. the minimum number of interaction steps that have to be taken between two proteins) from another protein can be useful for tasks such as searching for remote similarities between proteins. PIANA integrates algorithms such as Dijkstra's for efficiently finding the interaction distance and the set of proteins that is at a given distance from a root protein.

Identifying spots in 2D gels from electrophoresis experiments

In combination with the results of an electrophoresis experiment, PIANA can be used to identify spots in a 2D gel. By comparing the molecular weight and isoelectric point of the proteins in the network with the features of the spots in the 2D gel, spots that could not be identified by mass spectrometry can be assigned to proteins in the network.

Clustering proteins using their GO terms

Networks can become very complex, and hence clustering methods are needed to ease their interpretation. PIANA provides a clustering library for protein interaction networks and, specifically, methods for clustering proteins by their GO terms.

Extending PIANA

PIANA is designed so that new and independent modules can be easily added. Moreover, PIANA libraries can be used to work with protein interaction networks in external Python programs that do not want to take care of the low-level operations related to graphs, databases and protein interaction networks.

For more PIANA capabilities, please refer to the documentation provided along with the code. Moreover, all PIANA commands are described in this file: general_template.piana_conf
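As an illustration of the 2D gel spot identification described above, the comparison of molecular weight and isoelectric point can be sketched as a simple tolerance match. This is not PIANA's algorithm: the function match_spots, the tolerance values (10% on molecular weight, 0.5 pH units on isoelectric point) and the toy data are assumptions chosen only to make the idea concrete.

```python
def match_spots(spots, proteins, mw_tol=0.10, pi_tol=0.5):
    """Assign candidate proteins to gel spots by MW and pI similarity.

    spots:    dict mapping a spot id to (molecular_weight_kDa, isoelectric_point)
    proteins: dict mapping a protein name to (molecular_weight_kDa, isoelectric_point)
    mw_tol:   allowed relative difference in molecular weight (assumed value)
    pi_tol:   allowed absolute difference in isoelectric point (assumed value)
    Returns a dict mapping each spot id to a sorted list of candidate proteins.
    (Hypothetical helper, not part of PIANA.)
    """
    matches = {}
    for spot_id, (spot_mw, spot_pi) in spots.items():
        candidates = []
        for name, (mw, pi) in proteins.items():
            mw_ok = abs(mw - spot_mw) <= mw_tol * spot_mw
            pi_ok = abs(pi - spot_pi) <= pi_tol
            if mw_ok and pi_ok:
                candidates.append(name)
        matches[spot_id] = sorted(candidates)
    return matches


# Toy data: two unidentified spots and three proteins from a network.
spots = {"spot1": (50.0, 6.0), "spot2": (20.0, 9.0)}
proteins = {"protA": (52.0, 6.2), "protB": (80.0, 6.1), "protC": (21.0, 8.7)}
assignments = match_spots(spots, proteins)
```

In this toy example spot1 is matched to protA and spot2 to protC, while protB is too heavy for either spot. A spot with several candidates would still need further evidence to be assigned.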
PIANA can be used as a stand-alone application or as a library for working with graphs, protein-protein interaction networks and protein data. These are some of the README files provided along with the code. You can read them before deciding whether it is worth it for you to download PIANA or not:
If you wish to use PIANA for implementing your own classes, programs or scripts, you can use its classes and methods. Here, you can read the documentation related to the four fundamental classes contained in PIANA:
Full HTML documentation of all classes and methods is provided along with the code (piana/docs/documentation/piana_documentation.html). If you are just interested in using the parsers provided by PIANA, you should read README.populate_piana_db
PIANA requires a MySQL database for creating the protein-protein interaction networks. Currently, we do not provide a web server, so PIANA users must use the tools we provide to create their own database. On the one hand, this makes PIANA more difficult to install. On the other hand, it gives PIANA users more control over the data they use for their analyses. There are two options for the PIANA user:
PIANA is normally used to perform the analysis of protein interaction networks built from a group of "interesting proteins", where 'interesting' can refer to different concepts: their genes were found over/under expressed in a microarray experiment, proteins known to be involved in a pathway being studied, proteins identified by mass spectrometry, etc. A standard use of PIANA would be:
To illustrate the analyses performed by PIANA, we have used genes that mediate breast cancer metastasis to lung, extracted from an article published by the Massague group at Memorial Sloan Kettering: click here to see the analysis we have performed for these genes. This file describes step by step how the above analysis was performed using PIANA. For a complete listing of all PIANA options, this file describes in detail all input and output parameters of PIANA, as well as the commands that PIANA can execute.
This will allow you to resume the download (e.g. with wget -c) in case there are any problems with the connection.
If you use this software to either build your own code or perform biological analyses, please do not forget to make reference to this article:
If you use the interaction predictions by distant structure/sequence patterns provided with pianaDB_limited, please do not forget to make reference to this article:
Don't forget that if you use PIANA for your analyses, apart from citing PIANA you must make reference to the databases that contain the information that allowed you to reach your results. For example, if PIANA finds interactions extracted from DIP for your proteins of interest, you must also make reference to DIP in your articles.
If you encounter problems using PIANA, or have suggestions on how to improve it, send an e-mail to boliva at imim.es

PIANA is distributed under the GNU General Public License.