---------------------
README.piana_tutorial
---------------------

Read this file and follow the instructions provided.

### =========================================================== ###
###              INTRODUCTION                                   ###
### =========================================================== ###

PIANA - Protein Interactions And Network Analysis


PIANA is a framework for (i) integrating data from multiple sources
about proteins and their interactions; (ii) working with this data in
an easy-to-use manner; and (iii) doing automatic analysis of the
protein interaction networks. It has been mainly designed to be used
in batch mode, but it also features a simple user interface.


Main help files you should read to learn how to use PIANA:

- piana/README.piana_tutorial     -- describes PIANA and how to use it

- piana/README.piana_installation -- explains how to install PIANA

- piana/README.piana_examples     -- examples on how to use PIANA

- piana/code/execs/conf_files/general_template.piana_conf 
                                  -- description of all piana commands 
                                     and parameters. Also describes
                                     the format followed by PIANA
                                     outputs

- PIANA reference card            -- the reference card you should 
                                     always have next to you
                                     (read section "PIANA reference
                                       card" to learn how to create it
                                      for your PIANA database)

- our website: http://sbi.imim.es/piana
                                  -- check that you have the latest
                                     version of PIANA. You can also
                                     subscribe to our PIANA groups
                                     to receive an email when we
                                     release a new version.
                                  -- for any questions that you cannot
                                     solve reading the documentation,
                                     you can always ask your question
                                     at the PIANA group for discussion


If you are 100% new to PIANA (i.e. you need to install it), read
section "Preparing PIANA". If you are a user of PIANA, or you already
have access to a PIANA installation and/or PIANA database, you can
skip that section.

If you are one of those individuals who cannot wait to use a new toy,
read README.read_me_first and maybe you will be lucky enough to learn
how to use PIANA in just a few minutes... then, you should try
the examples in README.piana_examples


### =========================================================== ###
###           PREPARING PIANA                                   ###
### =========================================================== ###

PIANA code must be installed somewhere in order to be used (no web
server services currently provided). Apart from the code, you will
also need to either create and populate your own piana database or
have access to an external one (a team of people can have PIANA
installed on several computers but have all data centralized in a
single mysql server). You can also use the database provided in our
web page.
   
 --> if you need to install PIANA on your computer, follow steps
     in file piana/README.piana_installation

 --> the piana database can be either on your computer or in a
     PIANA host (i.e. a machine with mysql server):

     - if you will be using an external piana database, find out
       the name of the database you'll be using and the server
       where it is located

     - if you have to use your own piana database, follow steps in
       file piana/README.populate_piana_db before reading on.
       You can also use pianaDB_limited, the database we provide
       along with the code. Read piana/README.pianaDB_limited
       for more info on this subject.


### =========================================================== ###
###             PIANA REFERENCE CARD                            ###
### =========================================================== ###

On many occasions throughout this text and other README files, you
will be referred to the PIANA reference card. This card contains
(almost) all information you will need to have at hand when you become
an expert on PIANA, as well as other useful information that will help
you speed up your PIANA usage.

As you might already know, PIANA accepts multiple protein identifier
types for your input and output. For example, PIANA can receive gene
names as input and produce the output using NCBI GeneID. Although
this is transparent to you, there is one thing you must know when
asking PIANA to do the work: how to tell PIANA that your input
identifiers are gene names. And... how do you do that? Usually, by
setting the command line argument --input-id-type=geneName, or by
setting the parameter input-id-type in your configuration file. The
same goes for your output:
--output-id-type=geneid.

How did I know that NCBI GeneID is called by PIANA geneid?

I read the PIANA reference card: along with other information, it
tells you the PIANA name for each type of identifier that it accepts.


I hope you are convinced you need to print the PIANA reference card
before continuing to read this tutorial. To do so:

$> cd piana/code/execs
$> python piana.py --piana-dbname=your_pianaDB --piana-dbhost=your_host --print-reference-card > PIANA_reference_card.txt
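
If you launch PIANA from your own scripts, the same call can be
assembled programmatically. The following is only a minimal sketch:
the helper piana_command is hypothetical (it is not part of PIANA),
and it uses nothing but flags that appear in this tutorial. It builds
the argument list and does not run anything.

```python
def piana_command(python_exe="python", **params):
    """Turn keyword arguments into a piana.py argument list.

    Underscores in keyword names become hyphens, so input_id_type
    yields --input-id-type. A value of True yields a bare flag.
    Illustrative helper only, not part of PIANA.
    """
    args = [python_exe, "piana.py"]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        args.append(flag if value is True else "%s=%s" % (flag, value))
    return args


# Same placeholders as in the command above.
card = piana_command(piana_dbname="your_pianaDB",
                     piana_dbhost="your_host",
                     print_reference_card=True)
print(" ".join(card))
```

To actually execute the list, you could pass it to subprocess.run
from inside piana/code/execs and redirect stdout to
PIANA_reference_card.txt.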


### =========================================================== ###
###             PIANA MAIN COMPONENTS                           ###
### =========================================================== ###

How does PIANA work?

This is the standard procedure of use for PIANA:

 1. Create a configuration file as described in the template for
    configuration files: 
            piana/code/execs/conf_files/general_template.piana_conf

    (you have several examples of PIANA configuration files under
     directory piana/code/execs/conf_files )
   
 2. Run piana: 
    $> python piana.py --configuration-file=your_configuration_file

    (you can also set some parameters through the command line,
     see below for more details on this)
 
 3. Analyze your results


The following is a short explanation of the different modules of
PIANA:

(if you can't wait to use piana, README.piana_examples is a 
 step-by-step guide for using (and learning) PIANA)

-------------
- piana.py  -
-------------

To use PIANA as a tool, you must call piana.py with the command 
line arguments that tell PIANA what you want it to do.

   - piana.py is to be found under directory piana/code/execs

   - piana.py admits two execution modes, interactive and batch

      - in interactive mode, parameters are requested from the 
        user interactively. (some specific parameters are not 
        requested from the user and, if no configuration file is
        provided, defaults are used)

      - in batch mode, all parameters and commands are written
        in a configuration file.

      - in interactive execution mode, you can configure the parameters 
        through a configuration file, but not the commands

      - use piana/code/execs/conf_files/general_template.piana_conf to
        create your own configuration file

      - see examples of configuration files under
        piana/code/execs/conf_files/

      - main parameters can also be set on the command line (instead
        of doing it in the configuration file) for easy-and-quick 
        changes of parameter values. There are some specific 
        parameters that cannot be set through the command line (eg.
        list-source-methods). A command line argument overrides
        whichever value was set in the configuration file. This is
        useful because you can have a configuration file with most
        parameters set to default values and then tune your PIANA
        execution by overriding defaults with new values set on
        the command line
        
   - to get a list of command line options 
     (remember you also have the PIANA reference card)

        $> python piana.py --help 

   - typical piana actions are described in
                               piana/README.piana_examples

   - one typical example would be:
       
       # go to the piana execution directory
       $> cd piana/code/execs

       # create a file with one protein per line 
       $> cat > example_protein.txt
          BAXA_HUMAN
         (Ctrl-d to exit the cat command)

       # execute piana with the configuration file to get 
       # network and table
       $> python piana.py --input-file=example_protein.txt --piana-dbname=pianaDB_limited --piana-dbhost=localhost --depth=1 --input-id-type=unientry --output-id-type=unientry --results-prefix=example_results --configuration-file=conf_files/get_example_results.piana_conf

       # visualize the network
       $> neato -Tgif -o example_results.gif example_results.all.print-network
       $> xview example_results.gif


    If you take a look at the configuration file that was used
    (conf_files/get_example_results.piana_conf) you will see why
    you are obtaining those results files (i.e. the configuration
    file executes several commands). Read file
    piana/code/execs/conf_files/general_template.piana_conf for a 
    full explanation of each command and a description of the
    format followed by the different PIANA outputs.
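
The rule that a command line argument takes precedence over the
configuration file can be pictured as a simple merge. This is a
sketch of the documented behaviour only, not PIANA's internal code:
the function name effective_parameters and the plain dictionaries
are assumptions made for illustration, and the values are made up.

```python
def effective_parameters(config_values, command_line_values):
    """Return the parameters that end up being used in this sketch.

    Values from the configuration file are the starting point; any
    value actually given on the command line (i.e. not None)
    replaces the one from the configuration file.
    """
    merged = dict(config_values)
    for name, value in command_line_values.items():
        if value is not None:
            merged[name] = value
    return merged


# Made-up values, for illustration only.
from_config_file = {"depth": "2",
                    "input-id-type": "geneName",
                    "output-id-type": "geneid"}
from_command_line = {"depth": "1", "output-id-type": None}

params = effective_parameters(from_config_file, from_command_line)
print(params["depth"])           # 1 (the command line wins)
print(params["output-id-type"])  # geneid (not given on the command line)
```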

     Note: 

      - piana.py will work at the same time with all proteins
        provided in the input file: the network will be built 
        from all proteins, by retrieving known interactions 
        for each of the proteins in your input file

      There is another way of running PIANA that lets you 
      perform multiple piana runs with just one command, whether
      you want to run piana.py for all protein files in a 
      given directory or you want to create an independent
      network for each protein in an input file: run_multiple_pianas.py

      - run_multiple_pianas.py has two modes:


   mode 1) in which you set an input_file_name and a separate
           network is built for each protein in the file and
           your commands are executed on that network. When
           PIANA finishes processing that network and writing
           results, it will go to the next protein and repeat
           the same operations.
           --> PIANA will work separately with each protein: 
               one network will be built at a time for each 
               protein, and analyses will be done on this 
               individual network. In cases where you do not 
               need the network to contain all your input proteins 
               (eg. predicting interactions) it is faster to use
               run_piana_protein_by_protein.py

   mode 2) in which you set an input_dir that contains files with
           proteins (all have to be of the same type of identifier)
           and piana.py is run for each of those files independently.
           This is useful when you want to apply the same commands
           (with the same parameters) to a number of input protein 
           files          


    File README.piana_examples has more examples on how to use piana.py
    and run_multiple_pianas.py.
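
    The difference between the two modes can be pictured with a
    small sketch. It is not the code of run_multiple_pianas.py: the
    functions read_proteins and plan_runs are hypothetical, and the
    sketch only decides which proteins go into each independent
    run; it does not call PIANA.

```python
import os
import tempfile


def read_proteins(path):
    """Return the protein identifiers in a file, one per line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def plan_runs(input_file_name=None, input_dir=None):
    """Return one list of proteins per independent PIANA run.

    Mode 1 (input_file_name): one run, and one network, per protein.
    Mode 2 (input_dir): one run per file, using all proteins in it.
    """
    if (input_file_name is None) == (input_dir is None):
        raise ValueError("set exactly one of input_file_name or input_dir")
    if input_file_name is not None:
        return [[protein] for protein in read_proteins(input_file_name)]
    runs = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        if os.path.isfile(path):
            runs.append(read_proteins(path))
    return runs


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as workdir:
    first = os.path.join(workdir, "set_a.txt")
    second = os.path.join(workdir, "set_b.txt")
    with open(first, "w") as handle:
        handle.write("BAXA_HUMAN\nMOT1_YEAST\n")
    with open(second, "w") as handle:
        handle.write("MOT1_HUMAN\n")

    per_protein = plan_runs(input_file_name=first)
    per_file = plan_runs(input_dir=workdir)

print(per_protein)  # [['BAXA_HUMAN'], ['MOT1_YEAST']]
print(per_file)     # [['BAXA_HUMAN', 'MOT1_YEAST'], ['MOT1_HUMAN']]
```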


-----------------------------------------------------------
- PIANA databases contain all the information PIANA needs -
-----------------------------------------------------------

   - If you are a PIANA user not very interested in how it works,
     you can safely skip this section. However, you will be even
     safer if you read it: there are some details that might be
     important when deciding how you are going to use PIANA.

   - PIANA databases follow the structure described in
     piana/code/dbCreation/create_piana_tables.sql

   - each time you work with PIANA, you must specify which PIANA 
     database it should use by setting the parameters piana-dbname
     and piana-dbhost and, only if your mysql server requires
     authentication, piana-dbuser and piana-dbpass

   - the results PIANA will produce are those extracted from the
     database you told PIANA to work with. If you use a dummy
     database, the results will also be dummy. If you use a good
     database, the results will be good. To create your own PIANA
     database and populate it with information from UniProt, NCBI,
     IntAct, DIP, etc, read README.populate_piana_db

   - PIANA uses a long integer to uniquely identify each protein. This
     identifier is called proteinPiana. Each identifier corresponds to
     one unique combination of the protein sequence and the protein 
     taxonomy id (ie. different [sequence, tax id] = different 
     proteinPiana). Therefore, if there are two proteins in a given
     species that have the same sequence, they will be considered to 
     be the same protein, ie. they will have the same proteinPiana. 
     We use tax ids from NCBI. From the PIANA user perspective, this
     is transparent: you can work with PIANA for years and 
     never read proteinPiana in your output files, since PIANA will
     identify proteins using your preferred type of identifier (e.g.
     NCBI GeneID)

     There are many tables in the PIANA database that contain
     correspondences between proteinPianas and third-party identifiers. 
     For example, table geneName contains correspondences between 
     proteinPianas and gene names. PIANA uses these tables to translate
     between identifiers.  All operations done by PIANA internally use
     proteinPiana as an identifier, and only at the last step does it 
     translate the identifier to the type demanded by the user. The
     types of identifiers that PIANA accepts and the names PIANA uses
     to refer to them can be read on the PIANA reference card (see 
     section "PIANA reference card" on how to print it).

     Attention! proteinPiana identifiers are not maintained across
     PIANA databases (unless you have synchronized DBs, see below): 
     you cannot use proteinPiana as an identifier for your 
     protein of interest if you are going to use several PIANA 
     databases. Even if you only use one database, you will update it
     some day, and your proteinPianas will not be coherent between
     the updated version and the one you are using now. For example,
     MOT1_YEAST might be 11111 in one piana database and 22222 in the 
     next version created: it all depends on the order in which the 
     proteins are inserted. However, there are some cases in which it 
     can be useful to keep your results using proteinPianas 
     (eg. predicting interactions), but keep in mind that they are 
     only valid for that specific database (or a synchronized one).

     When analysing your results, you might face the following
     problem: two proteins that in reality are "the same" appear in
     the results as two separate entities. This occurs when there
     are two similar sequences that were obtained from two different 
     databases and there was no cross-reference between them. Different 
     techniques have been implemented to avoid this problem (see 
     section 'PIANA and protein identifiers') but it is not 100% solved, 
     because it is impossible to create a perfect database from 
     far-from-perfect databases (and, let's admit it: there is no
     single biological database that is perfect).

   - if you have access to a pianaDB (i.e. somebody else already created
     it in your lab) then all you need is access to the machine where it
     is installed (and, if required, a mysql user name and password)

   - if you don't have access to a pianaDB, you will either need to
     create one (read README.populate_piana_db for that) or use the
     one we provide in our website (read README.pianaDB_limited
     for that)

   - Although PIANA has parameters that let you set which
     third-party databases have to be used for building the
     networks, these restrictions slow down the program. To avoid
     introducing restrictions (e.g. use only interactions detected
     experimentally), our PIANA MySQL server holds one piana
     database that only contains experimental
     data (DIP, MIPS, HPRD, BIND, ...) and another database with all
     interaction data (DIP, MIPS, HPRD, BIND, STRING, interologs,
     predictions from sequence/structure, ...). Then, each user
     chooses which database to use - only experimental or
     all interactions - just by setting the corresponding 
     database in parameter piana-dbname

     Our advice is that you keep these two separate DBs (only
     experimental and experimental + predictions) synchronized.
     This is very easy to do, and can be very useful when building 
     your networks (for example, you can use parameter use-secondary-db
     in your piana configuration files).
     How do you keep two PIANA databases synchronized? Simply start
     creating a database as indicated in README.populate_piana_db. When
     you reach the point where the remaining interactions are ones you
     don't want in your primary PIANA database, do a mysqldump of the
     database and then use that dump to create a new PIANA database.
     For example, if your primary PIANA database is called
     pianaDB_experimental, you do a mysqldump of it (see documentation
     on MySQL website) and then use that dump to create
     pianaDB_all_ints. Then, you simply continue parsing the
     interaction databases you wish into pianaDB_all_ints.
     As a result, pianaDB_experimental and pianaDB_all_ints will use
     the same proteinPianas and will contain the same protein
     information. The only difference will be in the number of
     interactions they contain, pianaDB_experimental being a subset
     of pianaDB_all_ints.
     Nevertheless, this is not required, since you can always restrict
     your network to contain specific interactions by setting the
     appropriate parameters (i.e. list-source-dbs and inverse-dbs) in
     your piana configuration file for your experiment.
   

   - if you want to add information to a PIANA database, you can 
     use the parsers described in README.populate_piana_db or
     create your own parsers. 


   - what is parameter use-secondary-db in the configuration files?

     As presented previously, in our lab we maintain two different
     PIANA databases: one with experimental interactions and 
     another one with experimental and predicted interactions. There
     are cases in which we want to retrieve interactions for a group
     of proteins, restricting those interactions to be only from 
     experimental methods. But, for those cases in which no interactions
     can be found for a given protein using the primary PIANA database
     (i.e. the one set in piana-dbname), sometimes we want to add
     predictions automatically to the network, just to be sure that
     at least something is said about that protein. Parameter 
     use-secondary-db is used in these cases: it tells PIANA to use the 
     secondary PIANA database for proteins for which no interactions
     can be found in the primary database.
     Apart from setting use-secondary-db to yes in your configuration
     file, you will need to update file 
     piana/code/utilities/piana_configuration_parameters.py with the 
     name, host, user and pass of your secondary PIANA database.

     More information on the secondary db can be read in file
     piana/code/execs/conf_files/general_template.piana_conf
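
The fallback behaviour of use-secondary-db described above can be
sketched as follows. This is only an illustration of the idea, not
PIANA's implementation: the function partners_with_fallback is
hypothetical, the two dictionaries stand in for the primary and
secondary PIANA databases, and the protein names are made up.

```python
def partners_with_fallback(proteins, primary, secondary, use_secondary_db):
    """Look up interaction partners, falling back to a secondary source.

    primary and secondary map a protein to its list of partners.
    The secondary source is consulted only for proteins that have
    no interactions at all in the primary one.
    """
    found = {}
    for protein in proteins:
        partners = primary.get(protein, [])
        if not partners and use_secondary_db:
            partners = secondary.get(protein, [])
        found[protein] = list(partners)
    return found


# Made-up data: experimental only vs. experimental + predictions.
experimental = {"protA": ["protB"]}
all_interactions = {"protA": ["protB", "protC"], "protD": ["protE"]}

with_fallback = partners_with_fallback(
    ["protA", "protD"], experimental, all_interactions, True)
without_fallback = partners_with_fallback(
    ["protA", "protD"], experimental, all_interactions, False)

print(with_fallback)     # {'protA': ['protB'], 'protD': ['protE']}
print(without_fallback)  # {'protA': ['protB'], 'protD': []}
```

Note that protA keeps only its experimental partner: predictions are
added only for proteins with nothing in the primary database.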

---------------------------------------------------------------
- interface to pianaDB_* : piana/code/PianaDB/PianaDBaccess.py
---------------------------------------------------------------

   - methods and classes used to access information in pianaDB

   - you don't need to use this component unless you are developing
     piana code or accessing the PIANA database from your own
     programs.

   - full documentation of PianaDBaccess can be found at:
     piana/docs/documentation/pydoc_docs/PianaDBaccess.html

   - if you want to use PIANA for developing your own code, please
     read README.piana_developers
         
-------------------------------------------------------------
- parsers for third party databases - piana/code/dbParsers/*
-------------------------------------------------------------

   - PIANA databases are populated with data from third party
     databases (eg. UNIPROT, DIP, BIND, ...). In order to 
     populate the PIANA database, we provide a number of 
     parsers for these external databases.
      
   - all parsers can be found under piana/code/dbParsers

   - to learn how these parsers are used, read
     piana/README.populate_piana_db

   - if you want to develop your own piana parsers, please 
     read section "PIANA parsers" of README.piana_developers

----------------------------------------------
- Graph Management tools - piana/code/Graph/*
----------------------------------------------

   - classes and methods used to manage networks

   - you don't need to use this component unless you are 
     developing piana code

   - full documentation of classes Graph, PianaGraph and 
     others can be found at:
     piana/docs/documentation/piana_documentation.html

   - if you want to use the graph library of PIANA, please 
     read README.piana_developers


-------------------------------------------------
- PIANA library - piana/code/PianaApi/PianaApi.py
-------------------------------------------------

   - PianaApi.py is the module you have to import from your
     python script if you want to use PIANA directly from
     your code. PianaApi has all methods related with creating,
     analyzing and working with PIANA. In fact, piana.py is 
     just a user interface to PianaApi.

   - All PianaApi methods are documented in 
     piana/docs/documentation/PianaApi.html

### =========================================================== ###
###             USING PIANA                                     ###
### =========================================================== ###

Once you have installed PIANA and you know which piana database you
will be using, you are ready to start using PIANA.

First of all, we recommend reading this whole help file
(piana/README.piana_tutorial). Then you can read some examples
(piana/README.piana_examples) of different cases where piana 
has been used.

If you are going to use piana in its interactive mode (which provides
the same functionalities as the batch mode, with the difference that 
commands have to be executed manually one by one and that some 
parameters cannot be set) you can already try it by doing:

$piana/code/execs> python2.3 piana.py

PIANA will ask you for some information needed for execution (database
and mysql server) and then will show you a menu with all execution
options. You should start building a network using commands
add-protein, add-proteins-file or add-interactions-file. Alternatively, 
if you gave an argument --input-file in the command line the network 
will be automatically built before presenting the menu. In our lab, we
never use PIANA in the interactive mode, and although every effort is
made to ensure it works, we all know that things that are not 
frequently used are more likely to contain errors.

If you are going to use PIANA in its batch mode (the mode for which
piana has been designed; we strongly suggest that you use it),
then read README.piana_examples to learn more about it.

For a complete description of all piana commands and parameters, and 
to interpret the outputs of PIANA commands, please read the descriptions 
given on piana/code/execs/conf_files/general_template.piana_conf


PIANA types of users
--------------------

PIANA has three types of users: "developer", "advanced_user" and 
"simple_user".

Most users will be "simple_user". Unless you are going to use PIANA
for analyzing the topology of your network (connectivity, adjacency
matrix, ...), you shouldn't worry about the types of users: the
default version of PIANA is set to "simple_user". If you believe you 
are not a 'simple_user', you can modify your profile on file
piana/code/utilities/piana_configuration_parameters.py

Types of users have been introduced so standard users of PIANA do not
need to install all external modules required for doing more 
complicated network operations. If you change your profile to
"developer" or "advanced_user", you will also need to meet the
extra requirements listed in README.piana_requirements for
non-standard users of PIANA.

This section might not be the best place to mention the following, but
I'll do it anyway: PIANA is not very good at analyzing the interaction
networks from a mathematical network perspective: we currently do not
provide many internal methods that calculate things such as clustering
coefficients or betweenness or scalefreeness. However, if you are
interested in keeping an integrated repository of interactions coming
from multiple sources, or you want to perform biological analyses of
your networks, then PIANA is the perfect tool for you (and I might
add, the best on the market ;-) )


PIANA memory usage
------------------

PIANA works in connection with a MySQL database. To make PIANA
faster, most of the information that has been queried is stored
in memory in order to avoid repeated database queries. However, in
large networks this can be a problem, as the memory fills up quickly.
To be able to manage large networks without memory problems, a
parameter can be specified in configuration files: "memory-usage". 

The parameter memory-usage can take values:

- "high": all information of the network is stored in memory.
          Information from database is only retrieved when needed,
          but then it is stored in memory.
          It is slower to build the network, but it is faster when 
          information is printed more than once.

- "low": It uses low memory, as all the information is retrieved
         from database when needed, and it is never stored in memory.
         It is faster to build the network, but it is slower to print
         and to post-process the network.

The default value for this parameter is "high". It is recommended to
use memory-usage=high. If the network is too big and you have memory
problems, then set memory-usage to "low". See comments on Graph for 
more info on the different memory usages available in PIANA
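
The trade-off between the two values can be pictured as caching
versus not caching database lookups. The sketch below is an
assumption-laden illustration, not the code in Graph: the class
CachedLookup and the function fake_fetch are hypothetical, and
fake_fetch merely stands in for a query to the MySQL server.

```python
class CachedLookup:
    """Retrieve protein information with or without an in-memory cache."""

    def __init__(self, fetch, memory_usage="high"):
        if memory_usage not in ("high", "low"):
            raise ValueError("memory_usage must be 'high' or 'low'")
        self._fetch = fetch
        # "high": keep everything that was retrieved; "low": keep nothing.
        self._cache = {} if memory_usage == "high" else None

    def get(self, protein_id):
        if self._cache is None:
            return self._fetch(protein_id)
        if protein_id not in self._cache:
            self._cache[protein_id] = self._fetch(protein_id)
        return self._cache[protein_id]


calls = []


def fake_fetch(protein_id):
    """Stand-in for a database query; records every call made."""
    calls.append(protein_id)
    return "info for %s" % protein_id


high = CachedLookup(fake_fetch, "high")
high.get(1)
high.get(1)
queries_high = len(calls)  # 1: the second request is served from memory

del calls[:]
low = CachedLookup(fake_fetch, "low")
low.get(1)
low.get(1)
queries_low = len(calls)   # 2: every request goes back to the database

print(queries_high, queries_low)
```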


PIANA protein identifier types
------------------------------

Something important when using PIANA is the type of protein identifier
you use. It is well known to all of us how messy the world of "protein
naming" is, and although PIANA tries to simplify this by providing a
unique interface for all codes, there is still something you must do:
tell PIANA which type of identifier you want to use each time you run
it.

The types of identifiers that PIANA admits can be seen on the PIANA
reference card (to print the PIANA reference card, read section
"PIANA reference card" above). For example, if you are using Uniprot
Accessions to keep your lists of proteins, then your id type is
Uniprot Accession (which in PIANA is referred to as 'uniacc').
Therefore, when running PIANA you will do something like:
 $> python piana.py --input-id-type=uniacc --configuration-file=...


Most PIANA names for identifier types are self explanatory, and you
can read their description on the PIANA reference card. One important
remark is that when you give PDB identifiers to PIANA, you must always
specify the chain of the PDB code, respecting the format
"pdbfile.chain" (for example 1b5n.a)

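If you generate your input files from a script, it can help to check
the "pdbfile.chain" format before handing the identifiers to PIANA.
The helper below is hypothetical (it is not part of PIANA) and only
checks the shape of the identifier, i.e. one dot with something on
both sides; it does not verify that the PDB code or chain exists.

```python
def split_pdb_identifier(identifier):
    """Split an identifier of the form pdbfile.chain into its parts.

    Raises ValueError if the chain is missing or the identifier has
    more than one dot.
    """
    pdb_code, separator, chain = identifier.partition(".")
    if not separator or not pdb_code or not chain or "." in chain:
        raise ValueError(
            "expected an identifier like 1b5n.a, got %r" % identifier)
    return pdb_code, chain


print(split_pdb_identifier("1b5n.a"))  # ('1b5n', 'a')
```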

PIANA and protein identifiers (i.e. the name given to a protein)
----------------------------------------------------------------

Protein naming is one of the main problems faced by biologists and
bioinformaticians. Databases are not coherent, there are thousands of
name conflicts, the same sequence can have many names associated with
it and, vice versa, the same name can refer to different sequences.
PIANA tries to alleviate this problem by doing the following:

- the user can use any type of protein identifier for his/her input
  proteins. The only requirement is to tell PIANA which type of
  identifier he/she is using

- the user can set the type of identifier he/she wants to use for
  outputting information

- PIANA uses as internal identifier what we call a proteinPiana (an
  integer). Each proteinPiana is linked to a pair [sequence, tax id], 
  so there is a unique identifier for each protein. Two proteins 
  with the same sequence from different species will have different
  proteinPianas. Read more about proteinPianas in section "PIANA 
  databases" of this README file.

- PIANA has thousands of cross-references between third-party
  databases (obtained from several repositories) and all third-party
  identifiers are linked to a proteinPiana. Therefore, at any PIANA
  execution the process followed is: 1) user identifiers are
  'translated' to proteinPianas; 2) all network operations are
  performed; 3) proteinPianas are translated to the type of identifier
  demanded by the user (and the final analysis is done based on the
  protein interaction network for the user identifiers)

- PIANA uses a unique protein name in the output:

  PIANA "guarantees" that each protein name used in the output
  refers to a different [sequence, tax id] and that sequences that are
  in fact "the same" protein appear as a single node of the network.
  Due to the problems mentioned before, one might find that PIANA has 
  considered two proteins to be the same in cases where this was not 
  true, but there is little we can do against this. Furthermore, if the 
  user gave a list of proteins to build the network with, PIANA will 
  give preference to those names over other names found in the database 
  for that protein.

  This is achieved by doing a 'name unification', by which all 
  proteinPianas in the network that share at least one external id 
  (eg. gene name) are linked to the same protein name. The type 
  of external id is determined by the user using the parameter 
  output-id-type in the piana configuration file. If you want to 
  learn more about this unification, you'll have to look into 
  PianaGraph.py, mainly the comments on method _create_unified_network

- Advice: gene names are by far the worst protein code type that
  exists... if you use gene names, expect problems with the names when
  interpreting the results... Moreover, since PIANA joins in a single
  "node" those proteins that have the same gene name, you might find
  that many of your proteins are placed in the same node (for example,
  if all proteins belonging to the same complex have the same gene
  name). Whenever it is possible, PIANA gives the official gene name 
  to a protein and disregards other gene names that are also related
  to that protein. However, many wet lab biologists do not use official
  gene names in their daily work, and that's why PIANA is prepared to
  accept those names as input as well. To reduce gene name ambiguity
  as much as possible, PIANA uses the species given by the user to
  limit the gene names and their associated sequences (this is done
  with parameters input-proteins-species and output-proteins-species)

- To get a detailed description of the names assigned to a protein in
  the databases and the name that PIANA has chosen for a given protein,
  read results from commands print-*-prots-info in compact mode. These
  results files produced by PIANA give information on all the protein
  identifiers and sequences that are associated to the protein name used 
  in the output.
     
- Attention! proteinPiana identifiers are not maintained across
  piana databases: you cannot use proteinPiana as an identifier for
  your protein of interest. MOT1_HUMAN might be 11111 in one piana
  database and 22222 in the next version created: it all depends on
  the order in which the proteins are inserted. Read more about this
  in section Piana Database (above).

- When analysing your results, you might face the following
  problem: two proteins that in reality are "the same" appear in
  the results as two separate entities. This occurs when there
  are two similar sequences that were obtained from two different 
  databases and there was no cross-reference between them. Different 
  techniques have been implemented to avoid this problem, but it 
  is not 100% solved.
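
The 'name unification' described at the start of this list can be
pictured with the following minimal Python sketch. It is NOT the code
of _create_unified_network in PianaGraph.py: the function name, the
data layout and the alphabetical tie-break are assumptions made for
illustration only, and the gene names are toy values.

```python
from collections import defaultdict


def unify_names(external_ids, preferred=()):
    """Group proteinPianas that share at least one external id.

    external_ids: dict mapping a proteinPiana (int) to a set of external
                  ids of the chosen output-id-type (e.g. gene names)
    preferred:    names supplied by the user; they win over other names
    Returns a dict mapping each proteinPiana to its unified name.
    """
    parent = {pp: pp for pp in external_ids}

    def find(pp):
        # Follow parent links up to the representative of the group.
        while parent[pp] != pp:
            parent[pp] = parent[parent[pp]]
            pp = parent[pp]
        return pp

    # Merge the groups of any two proteinPianas that share an external id.
    first_seen = {}
    for pp, names in external_ids.items():
        for name in names:
            if name in first_seen:
                parent[find(pp)] = find(first_seen[name])
            else:
                first_seen[name] = pp

    # Collect every name known for each group.
    group_names = defaultdict(set)
    for pp, names in external_ids.items():
        group_names[find(pp)].update(names)

    unified = {}
    for pp in external_ids:
        names = group_names[find(pp)]
        user_names = sorted(n for n in names if n in preferred)
        if user_names:
            unified[pp] = user_names[0]
        else:
            # Assumption: ties are broken alphabetically; a protein
            # without any name falls back to its proteinPiana.
            unified[pp] = min(names) if names else str(pp)
    return unified


ids = {11111: {"MOT1", "BTAF1"}, 22222: {"BTAF1"}, 33333: {"TBP"}}
print(unify_names(ids, preferred={"MOT1"}))
# {11111: 'MOT1', 22222: 'MOT1', 33333: 'TBP'}
```

In this toy example, proteinPianas 11111 and 22222 share one gene name
and therefore end up in the same output node, named after the name the
user supplied.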


Setting which interaction databases to use
------------------------------------------

PIANA databases can contain information from many external interaction
databases. To let the user choose which interactions to use, there are
two options: restrict the source databases (e.g. DIP and mips) and
restrict the accepted detection methods (e.g. y2h and tap). A user can
also set both to all, which uses all interactions in the PIANA
database. This is achieved with parameters list-source-dbs and
list-source-methods in your PIANA configuration file. Moreover, if you
want to use all interactions except those coming from a specific
database or method, you can set inverse-dbs (or inverse-methods) to
yes, which will exclude from the network any interactions coming from
list-source-dbs (or list-source-methods). Read more about these
parameters in piana/code/execs/conf_files/general_template.piana_conf

When list-source-dbs (or list-source-methods) is all, inverse-dbs (or
inverse-methods) is not taken into account.
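
The selection rules above can be summarised with a small Python
sketch. This is only an illustration of the described behaviour, not
PIANA code: the function names are hypothetical, the comparison is
case-sensitive, and combining the database and method restrictions
with a logical AND is an assumption.

```python
def accepts(value, allowed, inverse=False):
    """Decide whether an interaction attribute passes a restriction.

    value:   source database or detection method of the interaction
    allowed: the string "all", or a list such as ["dip", "mips"]
    inverse: mirrors inverse-dbs / inverse-methods (yes -> True)
    """
    if allowed == "all":
        # inverse is not taken into account when the list is "all"
        return True
    listed = value in allowed
    return not listed if inverse else listed


def keep_interaction(source_db, method, dbs="all", methods="all",
                     inverse_dbs=False, inverse_methods=False):
    # Assumption: an interaction must pass both restrictions.
    return (accepts(source_db, dbs, inverse_dbs)
            and accepts(method, methods, inverse_methods))


print(keep_interaction("dip", "y2h", dbs=["dip", "mips"]))           # True
print(keep_interaction("dip", "y2h", dbs=["dip"], inverse_dbs=True)) # False
print(keep_interaction("dip", "y2h", dbs="all", inverse_dbs=True))   # True
```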


PIANA results and output formats
--------------------------------

PIANA results are written to files that you can then read/process
to make your analysis. Each of the results files will have the
results prefix you set in the configuration file, and a file
extension with the name of the command that originated those results.

For example, if you set as results prefix 'test' and you execute
commands print-network (with format-mode dot) and 
print-table (with format-mode txt), once PIANA has finished there
will be two results files:

 - test.print-network.dot 
   (a DOT file that can be used to create an image of the network)
 - test.print-table.txt 
   (where each line is one of the interactions found by PIANA).

Other outputs are HTML tables with the interactions (which can be 
viewed by opening the local file in your favourite web browser) and
SIF files, which can be imported into programs such as Cytoscape for
more detailed visualization.

PIANA DOT result files are to be converted into an image using
a program that 'reads' DOT files (eg. neato). To learn how to
create images of networks from DOT files, please read file
piana/code/execs/README.visualize_piana_network. To learn how
to visualize SIF files using Cytoscape, please refer to the
Cytoscape web page http://www.cytoscape.org
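
As a rough illustration, the Python sketch below builds the name of a
results file and the neato call that would turn a DOT file into an
image. The naming pattern is inferred from the 'test' example above,
the helper names are hypothetical, and the procedure recommended for
PIANA networks is the one in README.visualize_piana_network. The -T
and -o options are standard Graphviz options; the sketch only prints
the command and does not run it.

```python
import shlex


def results_file(results_prefix, command, format_mode):
    # Naming pattern inferred from the 'test' example in this section.
    return f"{results_prefix}.{command}.{format_mode}"


def neato_command(dot_file, image_format="png"):
    """Return the Graphviz neato call that renders dot_file as an image."""
    image_file = dot_file.rsplit(".", 1)[0] + "." + image_format
    return ["neato", f"-T{image_format}", dot_file, "-o", image_file]


dot_file = results_file("test", "print-network", "dot")
print(shlex.join(neato_command(dot_file)))
# neato -Tpng test.print-network.dot -o test.print-network.png
```

If Graphviz is installed, the returned list can be passed to
subprocess.run to create the image.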

PIANA txt result files can be viewed in any text editor or
parsed with your own scripts for further processing. The format of
each txt result file is described in the documentation of the
command that originated that file. All commands are described in
detail in piana/code/execs/conf_files/general_template.piana_conf.


PIANA abbreviations (for methods, source databases, ...)
--------------------------------------------------------

PIANA uses abbreviations for describing the information associated with
interactions (for example, the detection methods for which PIANA has 
interactions). If you need to know the complete name behind one of
these abbreviations, you should read the PIANA reference card.

- Database abbreviations are set at the time of parsing by the PIANA
  administrator (they are inserted automatically when using the parsers
  we provide). 

- Method abbreviations are set in the python dictionary 
  method_names of PianaGlobals.py. Each element of this dictionary
  is as follows:

__check__ have we changed the format of this dictionary?

  "method_abbreviation" : [ name1 for method, 
                            name2 for method,
                            ................
                          ]

  If you are writing your own parser, you must make sure before
  inserting your interactions that your method name appears in this
  dictionary. Otherwise, the method name for your interactions will be
  'unknown', as PIANA won't know which abbreviation to use for your
  method name.
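
The lookup described above can be sketched in Python as follows. The
two dictionary entries are hypothetical examples (the real contents
are in PianaGlobals.py and may use a different format), the function
name is invented for this sketch, and the case-insensitive comparison
is an assumption.

```python
# Hypothetical entries: the real dictionary lives in PianaGlobals.py.
method_names = {
    "y2h": ["yeast two hybrid", "two hybrid"],
    "tap": ["tandem affinity purification"],
}


def method_abbreviation(name, names_by_abbreviation=method_names):
    """Return the abbreviation for a method name, or 'unknown'."""
    wanted = name.strip().lower()
    for abbreviation, names in names_by_abbreviation.items():
        if wanted in (n.lower() for n in names):
            return abbreviation
    # The method name does not appear in the dictionary.
    return "unknown"


print(method_abbreviation("Yeast Two Hybrid"))  # y2h
print(method_abbreviation("my new method"))     # unknown
```

A parser author can run such a check over all method names found in
the input before inserting any interaction.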


PIANA internals 
---------------

Maybe you are wondering how PIANA works. In that case, this 
section is for you. If you just want to use PIANA for creating
and analyzing networks and do not really care about the 
internals, you can skip this section.

 1. proteinPianas

    Always keep in mind that the unique identifier for PIANA is 
    a proteinPiana, and that there is a different proteinPiana for
    each combination of (sequence, taxonomy id). Therefore, if
    there are two different proteins in the same species that
    have the same sequence, they will have the same proteinPiana.
    I don't know if this is entirely correct (I am a computer
    scientist) but I would guess that if there are two identical
    sequences in the same species, they might be considered the
    same protein, even if they are located in different organelles
    or perform different functions.
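
    The idea can be pictured with a toy Python registry. This is a
    sketch under assumptions, not the PIANA database code: the class
    name is hypothetical, the sequences are made up, and ids are
    simply handed out in insertion order (which also shows why a
    proteinPiana is not stable across databases).

```python
class ProteinPianaRegistry:
    """Toy registry: one integer id per (sequence, taxonomy id) pair."""

    def __init__(self):
        self._ids = {}

    def get_or_create(self, sequence, tax_id):
        key = (sequence, tax_id)
        if key not in self._ids:
            # Ids are handed out in insertion order.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = ProteinPianaRegistry()
human = registry.get_or_create("MKTAYIAK", 9606)   # 9606: human
same = registry.get_or_create("MKTAYIAK", 9606)    # same pair, same id
mouse = registry.get_or_create("MKTAYIAK", 10090)  # 10090: mouse
print(human, same, mouse)  # 1 1 2

# A database built in a different order assigns different ids.
other = ProteinPianaRegistry()
print(other.get_or_create("MKTAYIAK", 10090))  # 1
```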

 2. PIANA networks only know proteinPianas

    Although no-one uses proteinPianas as a protein identifier 
    (except for me, but I am supposed to know what I am doing) 
    it is important to understand that PIANA networks always 
    identify nodes using proteinPianas. Other types of identifiers 
    you might use (e.g. uniprot entries) are preprocessed before
    creating the networks and before presenting the results. 
    Therefore, when you ask PIANA to build a network, the first thing
    it does is 'translate' your code into proteinPianas. Due
    to the wonderfully standardized world of protein identifiers,
    you will find all kinds of situations: one uniprot entry that
    has multiple proteinPianas, one proteinPiana that corresponds
    to multiple uniprot entries, new uniprot entries that are not
    known by PIANA (because your database has not been updated),
    etc. Therefore, even if you just give one code to PIANA for
    building the network, PIANA can internally represent that
    code as two different nodes (that might have different
    interactions).

    When you ask PIANA to give you results in a particular type
    of identifier, the opposite process occurs: PIANA 'translates'
    proteinPianas into your type of code. In that translation 
    process, many things can happen as well: what were 5 different
    nodes in the PIANA network might become a single node in the
    output when using your type of identifier. This single node will
    'inherit' the interactions and characteristics of the five nodes that
    were fused to form it. See section "PIANA and protein names" for more
    info on this. 
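
    The way fused nodes 'inherit' interactions can be sketched in
    Python as follows. The function and its arguments are
    hypothetical and do not reproduce PianaGraph.py; dropping the
    self-interactions that appear when two interacting nodes are
    fused is an assumption of this sketch.

```python
def translate_network(edges, name_of):
    """Rewrite proteinPiana edges using output identifiers.

    edges:   iterable of (proteinPiana, proteinPiana) pairs
    name_of: dict mapping each proteinPiana to its output identifier
    Returns a dict mapping each output name to the set of its partners.
    """
    partners = {}
    for a, b in edges:
        name_a, name_b = name_of[a], name_of[b]
        partners.setdefault(name_a, set())
        partners.setdefault(name_b, set())
        if name_a != name_b:
            # Assumption: interactions between two proteinPianas that
            # were fused into the same node are dropped.
            partners[name_a].add(name_b)
            partners[name_b].add(name_a)
    return partners


# proteinPianas 1 and 2 are both translated to the output name "A".
edges = [(1, 10), (2, 11), (1, 2)]
name_of = {1: "A", 2: "A", 10: "X", 11: "Y"}
network = translate_network(edges, name_of)
print(sorted(network["A"]))  # ['X', 'Y']
```

    Node "A" ends up with the partners of both proteinPianas that
    were fused to form it.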

 3. Databases are full of errors

    Protein and interaction databases are full of errors, typos,
    strange characters, wrong labels, etc. Since PIANA parses
    these databases, PIANA databases are also full of errors. 
    And, let's be realistic: no parser is perfect, so you 
    should expect PIANA to contain more errors than the databases
    it uses to populate its own database. Some things are 
    corrected when parsing, but it is almost impossible to
    do it perfectly... This said, we think our parsers are quite
    good. Until the next release of a third-party database, of
    course... Because, let me guess: they are going to change 
    (again) the format of the database, and we will have to 
    develop a new (or modified) parser to read the information.


 4. When in doubt, look at the MySQL database

    When you see strange data or something you do not trust in
    PIANA results, it is always good to know a little MySQL 
    and be able to query the PIANA database. 

    If you go to the machine where the database is placed, run
    '$> mysql' and then 'mysql> use name_of_your_pianaDB'. Then
    you can list the tables with 'mysql> show tables' or ask for
    the description of each table with 'mysql> desc table_name'.

    A few "select" commands can help you to understand why PIANA
    gave you a specific result that you do not understand...

 5. Third party databases change their formats

    Everyone working with biological databases knows that the
    formats are changed from one version to another without
    any notice. Moreover, some fields in the databases
    might be introduced incorrectly, and then the PIANA parsers
    that worked well before do not work anymore. You can try
    correcting the error yourself (and send us the correction,
    please) or wait till somebody else does it for you. But
    this is going to happen from time to time...


### =========================================================== ###
###           WHAT CAN I DO WITH PIANA                          ###
### =========================================================== ###


Before going into the specific details of things you can do with
PIANA, it is important that you understand the "concept of using
PIANA".

Basically, it is the following (when you are using PIANA as a user,
not as a developer):

1 - you've got a list of proteins that are of interest for you
   (hereafter referred to as root proteins)

    -> that you have obtained from mass spectrometry
    -> that your boss has asked you to study
    -> that you've read in an article that are important to a certain
       disease
    -> that belong to a pathway you are studying
    -> .....

2 - one way of studying a list of proteins is by analysing their
    interactions (ie their protein interaction network)

    -> for example, if all of your root proteins appear connected in
       the network, that means that you are looking at proteins that
       are closely related

    -> for example, you could identify other proteins that connect
       your root proteins to each other. If a protein appears in the
       network connecting two root nodes (hereafter referred to as a 
       linker protein) then it is probably also involved in the
       pathway you are studying.

3 - therefore, you feed PIANA with your root proteins, PIANA looks for
    the interactions and builds the network. Then analyses and
    printouts can be performed from this network.

4 - you can ask PIANA to print the network, print a table with the
    interactions, or identify linker proteins.

5 - you can do many other things, but you'll have to read
    piana/code/execs/conf_files/general_template.piana_conf to get a taste
    of all commands and parameters that PIANA accepts
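
The notion of a linker protein from step 2 can be illustrated with a
minimal Python sketch. It only looks at direct neighbours, the
function name is hypothetical, and the linker identification that
PIANA itself performs may differ; check general_template.piana_conf
for the actual command and its parameters.

```python
def find_linkers(interactions, root_proteins):
    """Return non-root proteins that interact with two or more roots.

    interactions:  iterable of (protein, protein) pairs
    root_proteins: the proteins of interest given by the user
    """
    roots = set(root_proteins)
    roots_touched = {}
    for a, b in interactions:
        # Look at each interaction from both ends.
        for protein, partner in ((a, b), (b, a)):
            if protein not in roots and partner in roots:
                roots_touched.setdefault(protein, set()).add(partner)
    return {p: sorted(r) for p, r in roots_touched.items() if len(r) >= 2}


interactions = [("R1", "L"), ("L", "R2"), ("R1", "X"), ("R1", "R2")]
print(find_linkers(interactions, ["R1", "R2"]))
# {'L': ['R1', 'R2']}
```

Protein X touches only one root protein, so it is not reported.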


piana/README.piana_examples explains in detail how to carry out each
of these tasks with PIANA


From the practical point of view, this is a non-exhaustive list of
things you can do with PIANA:

1. build a protein-protein interaction network

      - networks can be built from an input file, or adding protein by
        protein, or adding interactions from a file, or specifying a
        species to retrieve its interactome, or ...

      - for any input protein, piana will look for interactions in the
        database and add them to the network, respecting the restrictions
        (e.g. use only yeast two hybrid interactions) you specified in 
        the input parameters of your piana configuration file.

      - commands and parameters for building networks are described in
        piana/code/execs/conf_files/general_template.piana_conf

2. do 'things' with the network

      - you can just print the network, or do expansions, or match
        proteins to spots, or map over/under expressed lists of
        proteins into the network, or search for keywords in the 
        description of proteins in the network, or ...

      - commands and parameters for doing things with the network 
        are described in
        piana/code/execs/conf_files/general_template.piana_conf

3. translate between protein identifiers or get information about 
   proteins

      - some piana commands do not deal with networks but just with
        proteins

      - commands for retrieving protein information are described in
        piana/code/execs/conf_files/general_template.piana_conf

4. use the database and its interface to develop your own code

     - read README.piana_developers to learn more about this

5. use the Graph library and the PianaGraph library to develop your
   own code

     - read README.piana_developers to learn more about this

6. use the clustering library to perform your own clusterings

     - piana command for clustering according to GO terms (see
       general_template.piana_conf)

     - documentation can be found in
       piana/docs/documentation/pydoc_docs/Clustering.html


### =========================================================== ###
###                       SUMMARY                               ###
### =========================================================== ###

Before executing piana, you should:

  - install PIANA

  - know which pianaDB you want to use (look at the database component
    above to decide)

  - know which are your input proteins, and which identifier type
    they use

  - know which piana commands you want to execute

  - write a configuration file following the template
    piana/code/execs/conf_files/general_template.piana_conf

  - know which interface suits your needs: piana.py or
    run_piana_protein_by_protein.py

  - execute the program, with the required command line parameters and
    setting argument --configuration-file=your_configuration_file

If you are still not sure how this works... 

--> Read piana/code/execs/README.piana_examples to see how to use
    PIANA for specific purposes.
    
    This file explains the procedure followed for some typical piana
    actions. Start from example 1 and execute example by example to 
    better understand how PIANA works. Once you understand the 
    examples you can start basing your own analyses on them.

--> Read piana/code/execs/conf_files/general_template.piana_conf
  
    This file explains how to create your own configuration file, and
    describes all the parameters and commands that are available in
    PIANA.