Course on structural bioinformatics
1) Principles of Protein Structure
2) Sequence Comparison
3) Structure Comparison
4) Principles of Comparative Modeling and Threading
6) Evaluation of the model.
7) Advanced exercises.
i. Open PDB file: open <name>
ii. Select chain A: select *A
iii. Coloring red: color red
iv. Format: ribbons
v. Remove from view: ribbons off
vi. Select residue 10 from chain A: select 10 && *A
vii. Select residues 10 and 20: select 10,20
viii. Select residues 10 to 20: select 10-20
ix. Select polar atoms: select polar
ii. Exercise: Identify Polar/Non-polar properties in each fold
1. Open the up-and-down β-barrel
2. Remove chain B from view
3. Select chain A
4. Color chain A white
5. Select polar residues and color them in red
6. Do the same with the sheets of a Rossmann fold
7. Question 1: Why do you think the pattern in the sheet is different?
8. Question 2: Where would the active site be on the Rossmann fold?
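The polar/non-polar pattern asked for in this exercise can also be inspected directly on the sequence of a strand. The following is a minimal sketch, not part of the original exercise: the residue sets are an assumption that roughly follows RasMol's predefined "polar" and "hydrophobic" sets (check the RasMol manual for the exact definition), and the two strand sequences are invented for illustration.

```python
# Assumption: approximate RasMol "polar" and "hydrophobic" residue sets.
POLAR = set("RNDCQEHKST")
HYDROPHOBIC = set("AGILMFPWYV")


def polarity_pattern(sequence):
    """Map each residue to P (polar), H (hydrophobic) or ? (unknown)."""
    return "".join(
        "P" if aa in POLAR else "H" if aa in HYDROPHOBIC else "?"
        for aa in sequence.upper()
    )


def alternation_fraction(pattern):
    """Fraction of neighbouring positions whose polarity class differs."""
    pairs = list(zip(pattern, pattern[1:]))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical strands, invented for illustration only.
strand_one = "TVKVTVEV"
strand_two = "VIVVAILG"

for strand in (strand_one, strand_two):
    pattern = polarity_pattern(strand)
    print(strand, pattern, alternation_fraction(pattern))
```

A strand with one face exposed to solvent tends to give a high alternation fraction, while a strand buried on both faces does not; comparing the two numbers for the strands of each fold is one way to make the visual impression from RasMol quantitative.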
iii. Check the description of secondary structure given in a PDB-formatted file (e.g. code 8fab). Visualize the distribution of hydrogen bonds with RasMol (command hbonds) and compare it with the secondary structure definitions in the PDB file.
iv. Exercise: Open one of the example files with your favorite editor and work out how the coordinates of each atom are described. Identify the Cα atoms that define the protein trace. Copy the file under another name, remove half of the atoms from the list with the editor, and open the result with RasMol.
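The same inspection can be scripted. The sketch below is an illustration under stated assumptions: it reads ATOM/HETATM records by the fixed column positions of the PDB format, extracts the Cα trace, and keeps the first half of the atom list. The helper `make_atom_line` and the toy coordinates are hypothetical and only serve to build a small test input; with a real file you would pass its lines to `parse_atoms` instead.

```python
def make_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a toy ATOM record in fixed-column PDB layout (hypothetical helper)."""
    name_field = (" " + name).ljust(4) if len(name) < 4 else name
    return (f"ATOM  {serial:5d} {name_field} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00")


def parse_atoms(pdb_lines):
    """Read ATOM/HETATM records by column position, not by splitting on spaces."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "name": line[12:16].strip(),     # columns 13-16: atom name
                "resname": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],               # column 22: chain identifier
                "resseq": int(line[22:26]),      # columns 23-26: residue number
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                "line": line,
            })
    return atoms


def ca_trace(atoms):
    """Keep alpha carbons of ATOM records only (a HETATM named CA may be calcium)."""
    return [a for a in atoms if a["record"] == "ATOM" and a["name"] == "CA"]


def first_half(atoms):
    """Return the original lines of the first half of the atom list."""
    return [a["line"] for a in atoms[: len(atoms) // 2]]


# Toy input with invented coordinates.
toy = [
    make_atom_line(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    make_atom_line(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    make_atom_line(3, "C", "ALA", "A", 1, 13.150, 6.200, -5.100),
    make_atom_line(4, "N", "GLY", "A", 2, 13.700, 7.300, -4.600),
    make_atom_line(5, "CA", "GLY", "A", 2, 15.140, 7.500, -4.500),
    make_atom_line(6, "C", "GLY", "A", 2, 15.600, 8.900, -4.100),
]
atoms = parse_atoms(toy)
trace = ca_trace(atoms)
print([(a["resname"], a["resseq"], a["xyz"]) for a in trace])
print(len(first_half(atoms)), "of", len(atoms), "atoms kept")
```

Reading by column rather than by whitespace matters because adjacent fields in real PDB files can touch, for example a four-digit residue number next to the chain identifier.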
i. Exercise: Compare the folds of β-propellers
ii. Determine the folds of your favorite protein from the PDB (e.g. 8fab)
2) Sequence Comparison
i. Select the database to search:
ii. Exercise: Do the search with your favorite sequence(s).
1. Compare the results of the search in different databases
2. Which results contain more information?
3. From the tutorial of BLAST try to answer the following questions:
a. What’s the meaning of the e-value?
b. What are the substitution matrices?
c. What’s the dependence between the e-value and the length of your favorite sequence(s)?
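As a starting point for these questions, the Karlin-Altschul statistics used by BLAST give the expected number of chance hits as E = m · n · 2^(−S'), where S' is the bit score, m the query length and n the database length. The sketch below is only an illustration under assumptions: the lengths and scores are invented, and BLAST itself uses edge-corrected "effective" lengths, so real reports will not scale exactly like this.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """E = m * n * 2^(-S'), using plain (not effective) lengths."""
    return query_len * db_len * 2.0 ** (-bits)


# Invented numbers: a database of 10**9 residues and a hit of 40 bits.
db_len = 10 ** 9
for query_len in (100, 200, 400):
    print(query_len, expected_hits(40, query_len, db_len))

# The same bit score is less significant in a larger search space.
print(expected_hits(40, 100, 10 * db_len))
```

In this simplified form the e-value of a hit with a fixed score grows linearly with the query length and with the database size, which is why the same alignment can be significant in a small database and not in NR.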
i. Iterative search: do at least two iterative PSI-BLAST runs after a BLASTP search in the NR database of your favorite sequence(s).
ii. Choose a format to keep the PSSM.
iii. Use the PSSM to search the PDB database
1. Compare the e-values of the same pair of sequences aligned with BLAST between two iterations. Why are they different?
2. When do you think we may need to do the search using a PSSM?
3. Search for 4 sequences with known structures that can be comparatively aligned with each of the following sequences
c. Multiple Alignment of Sequences
i. Find the main motifs of your favorite sequence(s).
ii. How many of these motifs can you confirm as appropriate for it?
iii. What are the common motifs of your favorite sequence(s) and the 4 previously aligned sequences with known structure?
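Motifs reported in PROSITE-style notation can be checked by hand against each sequence. The sketch below is a simplified illustration, not the scanner used by any motif server: it translates a subset of the PROSITE pattern syntax (single residues, x, [..], {..} and repeat counts) into a regular expression and lists the matches. The example uses the well-known P-loop pattern [AG]-x(4)-G-K-[ST]; the sequence is a short toy fragment.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset of the syntax) into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


p_loop = "[AG]-x(4)-G-K-[ST]"
toy_sequence = "MTEYKLVVVGAGGVGKSALTIQ"
print(prosite_to_regex(p_loop))
print(find_motifs(toy_sequence, p_loop))
```

Running the same pattern over your favorite sequence(s) and over the 4 sequences with known structure shows quickly which motifs are shared and at which positions.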
i. Split this protein sequence into domains
ii. Check the function, motifs and main properties in InterPro
3) Structure Comparison
b. Exercises: Check pairwise (2 proteins) and multiple (>2) alignments of known structures in the PDB and from structures uploaded from your own computer.
i. Use proteins of the families of Immunoglobulin VL-lambda and NKP44
iii. Use proteins from different superfamilies of the 6-bladed β-propeller fold.
c. Exercise: Download the PDB files of the 4 sequences previously aligned with your favorite sequence(s) in 2.c.ii. Obtain the multiple structure superposition and extract the multiple alignment of their sequences. Compare the multiple alignments based on the structure, ClustalW, T-Coffee, PFAM and SMART.
d. Exercise: Compare the previous alignments with the alignments in HSSP database
1. Use the alignments obtained in 2.c.ii and 2.d.iv to run a driven modeling of your favorite sequence(s).
2. Extract the alignment with the best template (smallest e-value and largest Id%) from the alignments in 2.c.ii and 2.d.iv to run a driven modeling of your favorite sequence(s).
3. Obtain the structural alignment of the candidate templates using your favorite superposition program (exercise 3). Generate the Hidden Markov Model with hmmbuild. Align your templates and the target to the resulting profile using hmmalign. Extract the alignment with the best template (smallest e-value and largest Id%) from the multiple alignment and run a driven modeling of your favorite sequence(s).
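Choosing the best template by smallest e-value and largest Id% can be made explicit once the pairwise alignments are extracted. The following is a minimal sketch under assumptions: percent identity is computed over columns where neither sequence has a gap (one of several conventions in use), and the template names, e-values and aligned sequences are invented.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of identical residues over the columns that contain no gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_target, aligned_template)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def best_template(candidates):
    """Pick the candidate with the smallest e-value; ties go to the larger Id%."""
    return min(
        candidates,
        key=lambda c: (c["e_value"], -percent_identity(c["target"], c["template"])),
    )


# Hypothetical candidates extracted from a multiple alignment.
candidates = [
    {"name": "tmplA", "e_value": 1e-30,
     "target": "ACDEFGHIK-LM", "template": "ACDQFGH-KALM"},
    {"name": "tmplB", "e_value": 1e-12,
     "target": "ACDEFGHIK-LM", "template": "ACNEYGHIKTLM"},
]
for c in candidates:
    print(c["name"], c["e_value"], round(percent_identity(c["target"], c["template"]), 1))
print("best:", best_template(candidates)["name"])
```

When the two criteria disagree (one template has the better e-value, another the higher identity), the ordering in `best_template` is a choice, and it is worth building models from both and comparing them.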
iii. Compare your model with the structures of the templates using the structural superposition.
iv. Compare your model with the model in ModBase using the structural superposition.
v. Compare driven and automatic models by superimposition.
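Once two structures have been superposed, the comparison can be summarized as an RMSD over the equivalent Cα atoms, and the per-residue deviations show where the models differ. The sketch below assumes the coordinates have already been superposed and paired by your superposition program; it does not compute the optimal superposition itself, and the coordinates are invented.

```python
import math


def per_residue_distance(coords_a, coords_b):
    """Distance between each pair of equivalent atoms (already superposed)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root mean square deviation over the paired atoms."""
    distances = per_residue_distance(coords_a, coords_b)
    if not distances:
        raise ValueError("no atoms to compare")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Invented C-alpha coordinates for a model and a template after superposition.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0), (11.4, 4.0, 0.0)]

print("RMSD:", rmsd(model, template))
print("per residue:", per_residue_distance(model, template))
```

A single RMSD value hides where the differences are; the per-residue list usually points to loops or termini, which is useful when comparing driven and automatic models.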
i. Compare the alignments of the related proteins with known structure and the alignments of the same sequences obtained with ClustalW, T-Coffee, PFAM and SMART.
ii. Compare by superposition the structures assigned by threading (all servers) and the structures obtained by sequence search
iv. Compare by superposition the models by homology and by threading. What are the main differences and why? What models do you think are more reliable and why?
g. Predict the putative fold of these problematic sequences by threading and fold prediction: split them into domains and assign the fold of each domain using all available servers
a. Getting used to ArchDB. Browsing the database of protein loops
i. Check loops with 4 residues between α-helices.
ii. Check the affinity of β-α loops 5 residues long for binding ATP. Which classes/sub-classes are the best?
iii. Check the loops of the template structures used on the model building of your favorite sequence(s)
iv. Download the structure of one of your templates and check the residue interval of one of its loops with RasMol. Remove this loop from the structure (see exercise 1.b.iv) and upload the new coordinates as a structure query to the database. How is the removed loop classified?
b. Query ArchDB with the model of your favorite structure (use any of the structures previously modeled for uploading the structure)
i. Compare the loops of the model and the loops of the templates (check if all belong to the same classes and sub-classes). Check the loops with different conformation between the model and the template.
ii. Check the classes and subclasses assigned (if any) to the loops of the target that could not be aligned with the sequence of the template(s).
iii. Compare the putative conformations of the loops of the model that were previously checked in 5.b.i: download the protein structures that contain the loops with the same geometry (disposition of secondary structures), extract the coordinates of the particular loop (see exercise 1.b.iv) and superpose them with the loop of the model (identical procedure for using these coordinates alone).
i. Compare the previous and last model of the loops using ArchDB.
6) Evaluation of the model.
a. Download and install the program Prosa 2003 on your computer.
i. Run the tutorial examples (sessions 1 and 2) of the manual, downloading the files 2aat, 3aat, 1aaw and 1spa from the PDB.
i. Compare both energy graphs (for the ANOLEA results you can use your favorite graphics program, e.g. Excel).
ii. Identify the peaks of positive energy as the regions where the model is likely wrong, and identify the best model among the ones you have built.
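Locating the positive-energy regions can also be done numerically once the per-residue energies are exported as a list of numbers. This is a minimal sketch under assumptions: the energies are invented, the profile is smoothed with a simple moving average (the window size is an arbitrary choice, not the one used by Prosa or ANOLEA), and the "best" model is taken to be the one with the lowest mean energy.

```python
def smooth(values, window=5):
    """Moving average with a centred window, shortened at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def positive_regions(energies, window=5, threshold=0.0):
    """Return 1-based (start, end) ranges whose smoothed energy exceeds threshold."""
    regions = []
    start = None
    for i, energy in enumerate(smooth(energies, window), start=1):
        if energy > threshold and start is None:
            start = i
        elif energy <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(energies)))
    return regions


def best_model(profiles):
    """Name of the model with the lowest mean per-residue energy."""
    return min(profiles, key=lambda name: sum(profiles[name]) / len(profiles[name]))


# Invented per-residue energies for two hypothetical models.
profiles = {
    "model1": [-1.0, -1.2, -0.8, -1.1, 2.0, 2.5, 2.2, -0.9, -1.0, -1.3],
    "model2": [-1.0, -1.2, -0.8, -1.1, -0.5, 0.4, -0.6, -0.9, -1.0, -1.3],
}
for name, energies in profiles.items():
    print(name, positive_regions(energies, window=1), positive_regions(energies, window=3))
print("best:", best_model(profiles))
```

A wider window merges neighbouring peaks and widens the flagged ranges, so it is worth trying more than one window before deciding which segments to rebuild.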
i. Compare the predicted secondary structure and the secondary structure of the model in the regions where the model is likely wrong.
ii. If the model and the prediction differ, check the accuracy of the prediction. Modify the model accordingly by extending or shortening the secondary structure elements.
iii. Reconstruct the model, check the loops and evaluate the new energy. Calculate the pseudo-energy of the new models and compare it with the previous models.
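The comparison between predicted and modeled secondary structure can be done position by position when both are written as strings of states. The sketch below assumes a three-state alphabet (H helix, E strand, C coil) and equal-length strings aligned to the same residue numbering; the two strings and the suspect region are invented.

```python
def disagreements(predicted, observed):
    """1-based positions where the two secondary structure strings differ."""
    if len(predicted) != len(observed):
        raise ValueError("both strings must cover the same residues")
    return [i for i, (p, o) in enumerate(zip(predicted, observed), start=1) if p != o]


def agreement(predicted, observed):
    """Fraction of positions with the same state in both strings."""
    if not predicted:
        return 0.0
    return 1.0 - len(disagreements(predicted, observed)) / len(predicted)


def within(positions, regions):
    """Keep the positions that fall inside any 1-based (start, end) region."""
    return [p for p in positions if any(start <= p <= end for start, end in regions)]


# Invented example: prediction versus secondary structure of the model.
predicted = "CCHHHHHHCCEEEECC"
model_ss = "CCHHHHCCCCEEEECC"
suspect_regions = [(5, 9)]  # e.g. residues with positive pseudo-energy

diff = disagreements(predicted, model_ss)
print("agreement:", agreement(predicted, model_ss))
print("differences:", diff)
print("differences in suspect regions:", within(diff, suspect_regions))
```

Differences that fall inside the high-energy regions are the first candidates for extending or shortening a secondary structure element before rebuilding the model.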
7) Advanced exercises:
a. Model the following sequence
b. We only know the sequence of a protein. Can you tell us what should be its function and if this can be performed?
c. We have the coordinates of a protein Cα trace. We wish to evaluate the difference between the pseudo-energy distribution along its sequence calculated with Prosa 2003 and with statistical potentials for Cα and for Cβ atoms.
d. Detect the errors in the following model and fix them.