--------------------------
README.transition_from_1.2
--------------------------

If you are coming to PIANA v1.4 from PIANA v1.2, you must read this file.
(PIANA 1.3 was released internally only and will never be made public.)

There are a number of things you must take into consideration when
transitioning from PIANA 1.2 to PIANA 1.4. This is a summary of your
"To do" list:

======= - PIANA database =======

Your old PIANA v1.2 database won't work with PIANA 1.4. We are sorry, but
we had to take this decision in order to improve PIANA. There were many
things that the old database did not take into account, and we wanted a
well-designed database so that the code could be good as well.

Therefore, you need to create a new PIANA database, using the new parsers.
README.populate_piana_db describes this in detail.

If you are interested in the new database organization, look at
piana/code/dbCreation/create_piana_tables.sql

In our lab, we have made the transition smoothly by keeping the old code
and the old database, so that people not interested in the new PIANA could
keep working with the old version. Any user can have both versions of PIANA
on the same machine and switch between them by changing the PYTHONPATH to
the desired version. However, we recommend that everybody updates to v1.4,
since it is a much better version of PIANA than v1.2.

======= - PIANA configuration files =======

We have changed several things in the configuration files:

- remove parameter ignore-unreliable from your configuration files
  This parameter no longer exists: we decided not to include any unreliable
  identifiers in PIANA, so there is nothing left to ignore.

- all arguments and parameters *-proteins-type have been renamed *-id-type
  The way we were previously naming parameters (e.g. input-proteins-type)
  was a bit confusing, since input-proteins-type could refer to many
  aspects.
  Now, we have changed all parameters to names that are easily understood:
  for example, to set the type of protein identifier you are going to use
  for running PIANA, you now set input-id-type to your desired value. This
  affects not only input-proteins-type, output-proteins-type and
  expression-proteins-type but also the arguments protein-type and
  proteins-type in PIANA commands.

- commands for adding proteins and files with proteins to networks have new
  arguments
  It is important that you read the new description in
  general_template.piana_conf before executing the new version of PIANA
  with your old configuration files. For example, you can now force PIANA
  to use a "secondary database" when adding interactions to your network.

- there are new parameters and commands that you might want to include in
  your configuration files
  Read piana/code/execs/conf_files/general_template.piana_conf to find out
  which parameters and commands are available in this version.

======= - PIANA reference card =======

We have implemented an option in piana.py that prints a PIANA reference
card containing useful information for PIANA users, such as the PIANA
names for the types of protein identifiers, the databases that have been
inserted into your PIANA database, a summary of commands and parameters,
etc.

To print this card, refer to piana/README.piana_tutorial, section "PIANA's
reference card".

======= - Internal aspects =======

If you use PIANA from the inside (e.g. you use PianaDBaccess to get the
proteinPianas associated to a specific protein identifier), you must take
the following into account:

- get_list_protein_piana() has become get_list_proteinPiana()
  Apart from changing the name of the method, we have changed the way it
  works. This method used to receive an SQL column as its argument, which
  didn't make much sense: a user of PianaDBaccess is not supposed to know
  anything about SQL tables and columns.
  Therefore, we have changed the arguments: get_list_proteinPiana() now
  receives the PIANA name for the type of protein identifier, for example
  'uniacc'.

- we used to use utilities.py to obtain the SQL column and table associated
  to a certain identifier type
  We no longer do it like that: now, PianaDBaccess is responsible for
  knowing which table and column correspond to each identifier type. This
  information is kept in the database, and the parser is responsible for
  inserting it at the time of parsing.

- PianaGlobals used to contain a dictionary with valid external databases
  This has been replaced by a table in the PIANA database, which makes it
  easier to update the databases that have been inserted into the PIANA
  database.

- PianaGlobals used to contain a dictionary with valid identifier types
  This has been replaced by a table in the PIANA database.

- identifier type geneName now has a primary mode
  In this mode, the official name is returned instead of just any gene
  name. This works in the same way it used to work for uniprot accession
  numbers.

======= - Parsers =======

The instructions for creating your own PIANA database are in
README.populate_piana_db.

There are new parsers for new databases (e.g. MINT, IntAct, gene_info,
...). Moreover, the parsers are much faster than they used to be. Finally,
pay attention to the new arguments passed to the parsers, such as a
description of which type of information is being parsed.

Apart from updating all parsers to make them compatible with the latest
versions of the databases, we have made them faster with some MySQL tricks,
such as locking/unlocking tables or going through a temporary table. This
doesn't change much the way you should implement parsers but, if you want
to use the tricks, you should pay attention to the README file we have
written on how to write PIANA parsers.
More information about this can be found in piana/README.piana_developers

======= - New functionalities =======

Read piana/code/execs/conf_files/general_template.piana_conf to discover
all the new functionalities of PIANA. As a summary, these are some of the
new things:

- faster access to the database, smaller database thanks to new variable
  types used in MySQL tables, ...

- many new types of protein identifiers accepted: ensembl, refseq, geneID,
  ...

- many new parsers for interactions: IntAct, bioGrid, MINT, ...

- when building the network, PIANA will search for interactions between
  proteins in the last depth level
  In v1.2, if you asked for the network of protein A, and A interacted with
  B, C and D, and C and D interacted with each other, the network built by
  PIANA for A at depth 1 would not include the interaction between C and D.
  In this new release, all interactions are shown.

- new input parameter for your configuration files: special-proteins
  This can be used to highlight or label multiple groups of proteins
  according to your own criteria.

- a command for identifying interacting motifs in your proteins
  Command calculate-imotifs was used for our latest paper.

- a command for labeling proteins by their number of interactions to root
  proteins
  Command classify-network-proteins will label your proteins according to
  how many root proteins they are connected to.

- new script to predict interactions based on common properties
  For example, if you have done blast2all for proteins in PIANA, you can
  predict interactions based on orthology using this script. It is located
  in piana/code/predictions/interactions/
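
As an illustration of the last-depth-level change listed among the new
functionalities, here is a minimal sketch of the A, B, C, D example. It is
not PIANA code: the function build_network, the flag
include_last_level_links and the toy interaction data are all hypothetical
and only serve to show the difference between the two behaviours.

```python
# Hypothetical sketch, not part of PIANA: every name and all data below are
# invented to reproduce the A/B/C/D example from this README.

def build_network(interactions, root, depth, include_last_level_links=True):
    """Return the interactions (as frozensets) in the network of `root`.

    interactions: dict mapping each protein to the set of its partners.
    include_last_level_links=False mimics the v1.2 behaviour described in
    this README; True mimics the v1.4 behaviour.
    """
    seen = {root}
    frontier = {root}
    edges = set()
    # Expand the network one depth level at a time.
    for _ in range(depth):
        next_frontier = set()
        for protein in frontier:
            for partner in interactions.get(protein, set()):
                edges.add(frozenset((protein, partner)))
                if partner not in seen:
                    next_frontier.add(partner)
        seen |= next_frontier
        frontier = next_frontier
    if include_last_level_links:
        # Also keep interactions of the last-level proteins whose partner
        # is already in the network (e.g. C-D).
        for protein in frontier:
            for partner in interactions.get(protein, set()):
                if partner in seen:
                    edges.add(frozenset((protein, partner)))
    return edges


# Toy data: A interacts with B, C and D; C and D interact with each other.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
}

old = build_network(toy, "A", 1, include_last_level_links=False)
new = build_network(toy, "A", 1)

# Interactions present only with the new behaviour.
print(sorted("-".join(sorted(edge)) for edge in new - old))  # ['C-D']
```

With the flag off, the depth 1 network of A contains only A-B, A-C and A-D.
With the flag on, it also contains C-D, which matches the behaviour
described for this release.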