-----------------------
README.pianaDB_limited
-----------------------

This file describes pianaDB_limited, the PIANA database we distribute.

----------------------------------------------------------------

If you do not want to build your database from scratch, you can use
the one we provide as a mysql dump. You can also use this database as
the starting point to build a more complete database (create
pianaDB_limited and then insert more information using the parsers we
provide, following the instructions in README.populate_piana_db).


To create pianaDB_limited on your MySQL machine
-----------------------------------------------

- you need the MySQL server and client installed
  (see piana/README.piana_requirements)

- you can have the database on one machine and the PIANA code on
  another machine

- you'll need MySQL privileges to create a database (otherwise, ask
  your system administrator to create the database for you; you can
  then populate it as indicated below)
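
If your system administrator creates the database for you, the
statements they run might look like the ones printed below. This is a
minimal sketch, assuming MySQL 5.7.6 or later (needed for CREATE USER
IF NOT EXISTS); the function piana_admin_sql, the account name
piana_user and the password are hypothetical placeholders, not part of
PIANA. The function only prints SQL, so it can be reviewed before
being piped into the mysql client.

```shell
# Hypothetical helper (not part of PIANA): print the SQL an administrator
# could run to create the PIANA database and give one MySQL account full
# rights on it.
# Usage: piana_admin_sql DATABASE USER HOST PASSWORD
piana_admin_sql() {
    db=$1; user=$2; host=$3; pass=$4
    printf "CREATE DATABASE IF NOT EXISTS %s;\n" "$db"
    # ASSUMPTION: the server is MySQL 5.7.6+ (CREATE USER IF NOT EXISTS)
    printf "CREATE USER IF NOT EXISTS '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
}
```

For example, using host '%' so the PIANA code can connect from another
machine:

    $> piana_admin_sql pianaDB_limited piana_user '%' 'change_me' | mysql -u root -p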


1 --> get the file pianaDB_limited.vX.X.mysqldump.gz from
      http://sbi.imim.es/piana

    (where vX.X is the version of the database)

    (we recommend using wget rather than downloading the file through
     a web browser)

    $> wget http://sbi.imim.es/piana/pianaDB_limited.vX.X.mysqldump.gz

2 --> uncompress the file:
    $> gzip -d pianaDB_limited.vX.X.mysqldump.gz

3 --> create a database on your MySQL server

    $machine where you want to have the piana database> mysql
    $mysql> create database pianaDB_limited;

   (you can give the database any name you want; just use the same
    name in step 4)

4 --> populate the database using the file pianaDB_limited.vX.X.mysqldump

    $> mysql --host=mysql_server_machine --database=pianaDB_limited < pianaDB_limited.vX.X.mysqldump

    (depending on the speed of your machine, loading the complete
     database can take a few hours)
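
Steps 2 to 4 can also be done in one pass, decompressing the dump on
the fly instead of writing the uncompressed file to disk. The sketch
below assumes a POSIX shell with gzip and the mysql client on the PATH;
the function names, and the choice to check the archive with gzip -t
before loading it, are our own suggestions and not part of PIANA.

```shell
# Hypothetical helpers (not part of PIANA) wrapping steps 1-4 above.

# Name of the dump file for a given database version (the "X.X" part).
piana_dump_name() {
    printf 'pianaDB_limited.v%s.mysqldump.gz\n' "$1"
}

# Succeed only if the downloaded file is a complete, valid gzip archive
# (catches truncated downloads before a load that can take hours).
piana_check_dump() {
    gzip -t "$1" 2>/dev/null
}

# Create the database if needed and load the compressed dump into it,
# streaming through gzip so the uncompressed file is never written.
# Usage: piana_load_dump DUMP_FILE MYSQL_HOST DATABASE
piana_load_dump() {
    dump=$1; host=$2; db=$3
    piana_check_dump "$dump" || { echo "corrupt or incomplete: $dump" >&2; return 1; }
    mysql --host="$host" -e "CREATE DATABASE IF NOT EXISTS $db" || return 1
    gzip -dc "$dump" | mysql --host="$host" --database="$db" || return 1
    # Quick sanity check: list the tables that were created.
    mysql --host="$host" --database="$db" -e "SHOW TABLES"
}
```

For example (replace X.X with the actual version):

    $> piana_load_dump pianaDB_limited.vX.X.mysqldump.gz mysql_server_machine pianaDB_limited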


Now you have a database named pianaDB_limited that you can start
using with PIANA. Continue reading README.piana_tutorial for more
information on PIANA, or start using PIANA right away by following the
examples in README.piana_examples.

Attention! pianaDB_limited does not contain all the information that
PIANA can handle: it is a limited version of the database. If you
want to take full advantage of PIANA, we recommend that you populate
it with more information (e.g. MIPS, HPRD, BIND, STRING, ...). To
learn how to do this, continue reading...

Why a pianaDB_limited?
----------------------

For several reasons (e.g. file size and copyright issues) we do
not provide the complete version of the PIANA database we use locally
in our lab. However, since we understand that creating and populating
a database is a tedious task that takes a long time (even though it
can be done easily by following the instructions in
README.populate_piana_db), we decided to distribute a version of a
PIANA database along with the code.

Attention! Don't forget that if you use this PIANA database for your
analysis, in addition to citing the PIANA paper published in
Bioinformatics, you must cite the databases that were used to generate
your results. The papers to cite for each database are listed in the
corresponding directory under piana/data/externalDBs.

Contents of pianaDB_limited
---------------------------

Remember: pianaDB_limited does not let you exploit the full
potential of PIANA. It is only a limited version of the database so
you can test the system; if you want real-world results, you'll
need to create a more complete database. Read the following file:
README.populate_piana_db


--> These are the databases from which the information has been
    extracted to create pianaDB_limited:

   (for a full description of these databases please refer to
    README.populate_piana_db)

     ----------------------------------------------
  -- Databases of protein sequences and information --
     ----------------------------------------------

    -> ncbi taxonomy 3-Apr-2007

    -> uniprot (swissprot and trembl)
        UniProtKB/Swiss-Prot  3-Apr-2007
        UniProtKB/TrEMBL 3-Apr-2007

    -> genpept rel 158 (downloaded 3-Apr-2007)

    -> ncbi nr (downloaded 3-Apr-2007)

    -> ncbi swissprot and pdbaa  (downloaded 9-Apr-2007)

    -> pdbsprotec version 15-Jan-2007

    -> COG (downloaded January 2007)
    -> KOG (downloaded January 2007)

    -> SCOP 1.71 

    -> Gene Ontology (GO) (downloaded 3-Apr-2007)

     ---------------------------------
  -- Databases of protein interactions --
     ---------------------------------

  -> interactions from (Espadaler J., O. Romero-Isart, et al
    (2005) "Prediction of protein-protein interactions using distant
    conservation of sequence patterns and structure relationships"
    Bioinformatics 21(16):3360-8)

      --> these interactions are internally labeled as source database 'ori'

  -> DIP 19-Feb-2007
      
      --> these interactions are internally labeled as 'dip'

      --> for copyright reasons, the detection method has not been
          included: all these interactions have method "x"


===> Attention! Apart from containing only a limited set of the
     interactions, pianaDB_limited has also been modified to make it
     smaller. These modifications are described below.

===> You can populate pianaDB_limited with more data! Follow the
     instructions in README.populate_piana_db to parse MIPS, HPRD,
     BIND, STRING and your own interaction data. pianaDB_limited can
     be used as a starting point to create the complete PIANA database
     you'll use for your analyses. However, even if you populate
     pianaDB_limited with more information, it will still be a limited
     database (see below).


===> If you are going to use pianaDB_limited and you need to know which
     method was used to detect each interaction (this information was
     removed to respect the DIP license, as explained above), we
     suggest the following:

     - delete all interactions coming from DIP as explained in
       piana/code/dbModification/README.delete_interactions
     - parse DIP as explained in README.populate_piana_db

     --> You now have a pianaDB_limited that includes the detection
         methods for all interactions coming from DIP

The main difference between this database and the one we use in our
lab is that it *ONLY* contains protein interactions from DIP and
from (Espadaler J., O. Romero-Isart, et al (2005) "Prediction of
protein-protein interactions using distant conservation of sequence
patterns and structure relationships" Bioinformatics 21(16):3360-8).
Other interaction databases require licensing or special permissions,
so we didn't include them in pianaDB_limited. However, they can easily
be downloaded from their homepages and inserted into your own PIANA
database by following the instructions in README.populate_piana_db.

DIP also requires permission from UCLA and David Eisenberg. In order
not to violate this license, we have modified the PIANA database to
meet the requirements set by the DIP team: all DIP interactions are
marked in pianaDB_limited as 'dip' (table interactionSourceDB).
Furthermore, we haven't included any data on which methods were used
to detect the interactions: all interactions in pianaDB_limited coming
from DIP appear in the database with methodID="x". For more
information on these interactions, visit the DIP homepage
http://dip.doe-mbi.ucla.edu or parse DIP again as explained above.
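
To check how many interactions in your copy come from each source, and
that DIP interactions only carry the placeholder method "x", you could
run queries like the ones printed below. This is a sketch only: apart
from the table interactionSourceDB and the values 'dip' and 'x'
mentioned above, the table interactionMethod and the column names
sourceDBID and methodID as used here are assumptions about the PIANA
schema. Run "DESCRIBE interactionSourceDB;" to check the real names
before relying on these queries.

```shell
# Hypothetical helper (not part of PIANA): print SQL that counts
# interactions per source database, then the methods recorded for one
# source (default 'dip').
# ASSUMPTION: table interactionMethod and column sourceDBID are guesses
# about the schema; adjust them to what DESCRIBE reports.
piana_source_sql() {
    src=${1:-dip}
    cat <<EOF
SELECT sourceDBID, COUNT(*) AS n
  FROM interactionSourceDB
 GROUP BY sourceDBID;
SELECT methodID, COUNT(*) AS n
  FROM interactionMethod
 WHERE sourceDBID = '$src'
 GROUP BY methodID;
EOF
}
```

For example:

    $> piana_source_sql dip | mysql --host=mysql_server_machine --database=pianaDB_limited

In an unmodified pianaDB_limited, the second query should report a
single methodID, "x", for DIP.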

We provide the parsers needed to insert all the interaction data you
want into PIANA. PIANA users can choose which of these external
databases to use by following the instructions in
README.populate_piana_db. For example, if you want access to
method information (i.e. which method was used to detect each
interaction), the parser we provide for DIP inserts that information
into PIANA databases.

Other differences between pianaDB_limited and a standard PIANA database
 (these modifications were made to make pianaDB_limited smaller):

   - protein sequences have been removed from the database
     (their md5 codes have been kept, so new proteins can still be
      added to this database consistently)

   - protein description, function and disease information has been
     removed
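
The point of keeping the md5 code is, we assume, that it is computed
from the sequence itself: a parser that later reads the same sequence
from another database obtains the same md5 and can attach the new
information to the existing protein, even though the sequence text is
no longer stored. The sketch below only illustrates that idea. It
assumes GNU coreutils md5sum (on macOS, "md5 -q" plays the same role)
and that whitespace is removed before hashing; the name seq_md5 is
hypothetical, and PIANA's parsers may normalize sequences differently,
so do not rely on it to reproduce the keys stored in the database.

```shell
# Hypothetical helper (not part of PIANA): md5 hex digest of a sequence.
# ASSUMPTION: spaces, tabs and line breaks are stripped before hashing;
# PIANA's own normalization may differ.
seq_md5() {
    printf '%s' "$1" | tr -d ' \t\r\n' | md5sum | cut -d' ' -f1
}
```

For example, "seq_md5 MKVL" and the same sequence split over several
FASTA lines give the same 32-character code.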

All the modifications are listed in
piana/code/dbModification/remove_data_pianaDB_limited.sql

In short, pianaDB_limited is enough for you to start using PIANA, but
if you want to take full advantage of the data integration PIANA
provides, you should consider inserting information from other
external databases (e.g. STRING, MIPS, IntAct...). It is quite easy,
although it takes time (on our system, creating pianaDB_limited from
scratch took 5 days). README.populate_piana_db gives step-by-step
instructions on how to insert more protein interaction data. You'll
also need to follow those instructions if you want to create a
database from scratch, use PIANA as a parser, or update your database
with new information.