-----------------------
README.pianaDB_limited
-----------------------

This file describes the database we distribute: pianaDB_limited

----------------------------------------------------------------

If you do not want to build your database from scratch, you can use the
one we provide as a mysql dump.

You can also use this database as the starting point to build a more
complete database (create pianaDB_limited and then insert more
information using the parsers we provide, following the instructions in
README.populate_piana_db).


To create pianaDB_limited in your mysql machine
-----------------------------------------------

- you need a mysql server and client installed
  (see piana/README.piana_requirements)
- you can have the database on one machine and the piana code on
  another machine
- you need mysql rights to create a database (otherwise, ask your
  system admin to create the database for you; you can then populate
  it as indicated below)

1 --> get the file pianaDB_limited.vX.X.mysqldump.gz from
      http://sbi.imim.es/piana (where vX.X is the version of the
      database)

      (we recommend using wget instead of doing an http download)

      $> wget http://sbi.imim.es/piana/pianaDB_limited.vX.X.mysqldump.gz

2 --> uncompress the file:

      $> gzip -d pianaDB_limited.vX.X.mysqldump.gz

3 --> create a database in your mysql server

      $machine where you want to have the piana database> mysql
      $mysql> create database pianaDB_limited;

      (you can give the database any name you want)

4 --> populate the database using the file
      pianaDB_limited.vX.X.mysqldump

      $> mysql --host=mysql_server_machine --database=pianaDB_limited < pianaDB_limited.vX.X.mysqldump

      (depending on your machine speed, it can take a few hours to
      completely load the database)

Now you have a database named pianaDB_limited that you can start using
with PIANA. Continue reading README.piana_tutorial for more information
on PIANA, or start using PIANA by following the examples in
README.piana_examples

Attention!
pianaDB_limited does not contain all the information that PIANA can
contain: it is a limited version of the database. If you want to take
full advantage of PIANA, we recommend that you populate piana with more
information (e.g. MIPS, HPRD, BIND, STRING, ...). To learn more on how
to do this, continue reading...


Why a pianaDB_limited?
----------------------

For several reasons (namely file size and copyright issues) we do not
provide the complete version of the piana database we use locally at
our lab. However, creating and populating a database takes a long time,
even though it is straightforward if you follow the instructions in
README.populate_piana_db, so we decided to provide a version of a PIANA
database along with the code.

Attention! Don't forget that if you use this piana database for your
analysis, in addition to citing the bioinformatics PIANA paper, you must
cite the databases that were used to generate your results. The papers
to cite for each database are listed under each directory of
piana/data/externalDBs


Contents of pianaDB_limited
---------------------------

Remember: pianaDB_limited does not allow you to extract the full
potential of PIANA. This is just a limited version of the database so
you can test the system, but if you want real world results, you'll
need to create a more complete database.
Read the following file: README.populate_piana_database

--> These are the databases from which the information has been
    extracted to create pianaDB_limited:
    (for a full description of these databases please refer to
    README.populate_piana_db)

    ----------------------------------------------------
    -- Databases of protein sequences and information --
    ----------------------------------------------------

    -> ncbi taxonomy 3-Apr-2007
    -> uniprot (swissprot and trembl)
         UniProtKB/Swiss-Prot 3-Apr-2007
         UniProtKB/TrEMBL 3-Apr-2007
    -> genpept rel 158 (downloaded 3-Apr-2007)
    -> ncbi nr (downloaded 3-Apr-2007)
    -> ncbi swissprot and pdbaa (downloaded 9-Apr-2007)
    -> pdbsprotec version 15-Jan-2007
    -> COG (downloaded January 2007)
    -> KOG (downloaded January 2007)
    -> SCOP 1.71
    -> Gene Ontology (GO) (downloaded 3-Apr-2007)

    ---------------------------------------
    -- Databases of protein interactions --
    ---------------------------------------

    -> interactions from (Espadaler J., O. Romero-Isart, et al (2005)
       "Prediction of protein-protein interactions using distant
       conservation of sequence patterns and structure relationships"
       Bioinformatics 21(16):3360-8)
       --> these interactions are internally labeled as source
           database 'ori'
    -> DIP 19-Feb-2007
       --> these interactions are internally labeled as 'dip'
       --> for copyright reasons, the method information has not been
           inserted: all interactions are labeled as "x"

===> Attention! Apart from having a limited set of the interactions,
     pianaDB_limited has also been modified to make it smaller. The
     list of modifications that have been applied to this database is
     described below.

===> You can populate pianaDB_limited with more data! Follow the
     instructions in README.populate_piana_db to parse MIPS, HPRD,
     BIND, STRING and your own interaction data. pianaDB_limited can be
     used as a starting point to create the complete piana database
     you'll use for your analyses.
     However, even if you populate pianaDB_limited with more
     information, it will still be a limited database (see below)

===> If you are going to use pianaDB_limited and you need to know which
     method was used to detect an interaction (information which, as
     explained above, has been deleted to respect the DIP license), we
     suggest you do the following:

       - delete all interactions coming from DIP as explained in
         piana/code/dbModification/README.delete_interactions
       - parse DIP as explained in README.populate_piana_db

     --> Now you have pianaDB_limited containing all methods used to
         detect interactions coming from DIP

The main particularity of this database with respect to the one we use
in our lab is that it *ONLY* contains protein interactions from DIP and
from (Espadaler J., O. Romero-Isart, et al (2005) "Prediction of
protein-protein interactions using distant conservation of sequence
patterns and structure relationships" Bioinformatics 21(16):3360-8).
Other interaction databases require licensing or special permissions,
and we didn't include them in pianaDB_limited. However, they can be
easily downloaded from their homepages and inserted into your own piana
database by following the instructions in README.populate_piana_db

DIP also requires permission from UCLA and David Eisenberg. In order
not to violate this license we have modified the PIANA database to
respect the requirements that were set by the DIP team: all DIP
interactions are marked in pianaDB_limited as 'dip' (table
interactionSourceDB). Furthermore, we haven't included any data on
which methods were used to detect the interactions: all interactions in
pianaDB_limited coming from DIP appear in the database as methodID="x".
For any extra information on these interactions you should visit their
homepage http://dip.doe-mbi.ucla.edu or parse DIP again as explained
above.

We provide the parsers needed to insert all the interaction data you
want into PIANA.
PIANA users can choose which of those external databases they want to
use by following the instructions in README.populate_piana_db. For
example, if you want to have access to method information (i.e. know
which method was used to detect the interaction), the parser we provide
for DIP inserts that information into piana databases.

Other differences between pianaDB_limited and a standard one:
(these modifications have been made to make pianaDB_limited smaller)

- the protein sequence has been removed from the database (the md5 code
  has been left, so new proteins can be added to this database and
  still keep consistency)
- description, function and disease information for proteins has been
  removed

All modifications that have been done are described in
piana/code/dbModification/remove_data_pianaDB_limited.sql

Briefly, pianaDB_limited is enough for you to start using PIANA, but if
you want to take full advantage of the data integration PIANA provides,
you should think about inserting information from other external
databases (e.g. STRING, MIPS, IntAct...). It is quite easy, although it
takes time (on our system, it took 5 days to create pianaDB_limited
from scratch).

README.populate_piana_db gives step-by-step instructions on how to
insert more protein interaction data. You'll also need to follow those
instructions if you want to create a database from scratch, if you want
to use PIANA as a parser, or if you want to update your database with
new information.
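
The four setup steps given at the top of this file (download,
uncompress, create the database, load the dump) can be sketched as one
shell script. This is only a sketch under assumptions: the variable
names (DB_VERSION, DB_NAME, DB_HOST, DRY_RUN) and the helper functions
are illustrative and not part of the PIANA distribution, vX.X must be
replaced with the real version, and any mysql user/password options
your server needs are left out. By default the script only prints the
commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of setup steps 1-4. All variable and function names below are
# hypothetical placeholders, not part of the PIANA distribution.
DB_VERSION="${DB_VERSION:-vX.X}"            # replace with the real version
DB_NAME="${DB_NAME:-pianaDB_limited}"       # any database name is allowed
DB_HOST="${DB_HOST:-mysql_server_machine}"  # machine running the mysql server
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands

DUMP="pianaDB_limited.${DB_VERSION}.mysqldump"

# Print a command line; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

# Run the four steps in order, stopping at the first failure.
setup_piana_db() {
    run "wget http://sbi.imim.es/piana/${DUMP}.gz" &&
    run "gzip -d ${DUMP}.gz" &&
    run "mysql --host=${DB_HOST} --execute='create database ${DB_NAME};'" &&
    run "mysql --host=${DB_HOST} --database=${DB_NAME} < ${DUMP}"
}

setup_piana_db
```

Run it once with the default DRY_RUN=1 to check the printed commands
before letting it touch your mysql server; as noted in step 4, the last
command can take a few hours.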