Course
on structural bioinformatics
Tutorial
Index
1)
Principles
of Protein Structure
2)
Sequence
Comparison
3)
Structure
Comparison
4)
Principles
of Comparative Modeling and Threading
5)
6)
Evaluation
of the model.
7)
Advanced exercises.
Tutorial
1)
Principles of Protein Structure and Protein Crystallography
a.
Visualization with RasMol: Download the program RasMol
and install it on your computer. Download a set of example
proteins.
i.
Open PDB
file: open <name>
ii.
Select
chain A: select *A
iii.
Coloring
red: color red
iv.
Format:
ribbons
v.
Remove
from view: ribbons off
vi.
Select
residue 10 from chain A: select 10 && *A
vii.
Select
residues 10 and 20: select 10,20
viii.
Select
from 10 to 20: select 10-20
ix.
Select
polar atoms: select polar
b.
Use the
list of examples and the database of protein structures
PDB for the following
exercises:
i.
Exercise:
How many chains and domains are in the problem
structure? Identify the fold type of each domain.
ii.
Exercise:
Identify Polar/Non-polar properties in each fold
1.
Open
Up&Down β barrel
2.
Remove
from view chain B
3.
Select
chain A
4.
Color
white chain A
5.
Select
polar residues and color them in red
6.
Do the
same with the sheets of a Rossmann fold
7.
Question
1: Why do you think the pattern in the sheet is different?
8.
Question
2: Where would the active site be on the Rossmann fold?
iii.
Check the
description of secondary structure given in a PDB-formatted file (e.g. code
8fab). Visualize with RasMol the distribution of hydrogen bonds (command
hbonds) and compare with the definition of secondary structures from the PDB
file.
iv.
Exercise:
Open one of the example files with your favorite editor and study how the
coordinates of each atom are described. Identify the Cα atoms that trace the protein backbone. Copy
the file under another name, delete half of the atoms in the list with the
editor, and open the copy with RasMol.
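The coordinate section can also be read programmatically. The following is a minimal sketch, assuming standard fixed-column ATOM records as described in the PDB format guide; the names `Atom`, `parse_atoms`, `ca_trace` and `format_atom` are hypothetical helpers introduced here for illustration and are not part of RasMol or of any PDB library.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str       # atom name, e.g. "CA" for the alpha carbon
    res_name: str   # three-letter residue name
    chain: str      # chain identifier
    res_seq: int    # residue number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HELIX, SHEET, REMARK, ...
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def ca_trace(atoms: List[Atom]) -> List[Atom]:
    """Keep only the alpha carbons, which describe the protein trace."""
    return [atom for atom in atoms if atom.name == "CA"]


def format_atom(serial: int, atom: Atom) -> str:
    """Hypothetical helper: write a minimal ATOM record (no occupancy or B-factor)."""
    name = atom.name if len(atom.name) == 4 else f" {atom.name:<3s}"
    return (f"ATOM  {serial:5d} {name} {atom.res_name:>3s} {atom.chain}"
            f"{atom.res_seq:4d}    {atom.x:8.3f}{atom.y:8.3f}{atom.z:8.3f}")


if __name__ == "__main__":
    demo = [
        Atom("N", "ALA", "A", 1, 11.104, 6.134, -6.504),
        Atom("CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
        Atom("CA", "GLY", "A", 2, 10.000, 5.000, -4.000),
    ]
    text = "\n".join(format_atom(i + 1, a) for i, a in enumerate(demo))
    for atom in ca_trace(parse_atoms(text)):
        print(atom.res_name, atom.res_seq, atom.x, atom.y, atom.z)
```

Reading by column position rather than splitting on whitespace matters because neighbouring fields in a PDB file can touch each other without a separating space.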
c.
SCOP / CATH / DALI / DBAli / HOMSTRAD
i.
Exercise:
Compare the folds of β-propellers
ii.
Determine
the folds of your favorite protein from PDB (e.g. 8fab)
2)
Sequence
Comparison
i.
Select the
database to search:
1.
Searching
your favorite sequence(s) in UniProt/Swiss-Prot
2.
Searching
your favorite sequence(s) in PDB
ii.
Exercise:
Do the search with your favorite sequence(s).
1.
Compare
the results of the search in different databases
2.
Which
results contain more information?
3.
Using
the BLAST tutorial, try to answer the
following questions:
a.
What’s the
meaning of the e-value?
b.
What are
the substitution matrices?
c.
What is the
relationship between the e-value and the length of your favorite sequence(s)?
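As a starting point for these questions, the sketch below evaluates the Karlin-Altschul relation E = K·m·n·exp(-λS) and the corresponding bit score. It is a simplified illustration: BLAST itself applies effective-length corrections that are ignored here, and the λ and K values in the example are illustrative placeholders (the real ones are printed at the end of each BLAST report). The function names are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least raw_score.

    Simplification: uses the raw query and database lengths (m and n)
    instead of the effective lengths that BLAST computes.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    lam, k = 0.267, 0.041      # placeholder parameters, for illustration only
    db_len = 50_000_000        # hypothetical database size in residues
    for query_len in (100, 200, 400):
        print(query_len, e_value(120, query_len, db_len, lam, k))
```

With the score and the database held fixed, the e-value in this simplified form grows in proportion to the query length, and it falls exponentially as the score increases.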
i.
Iterative
search: do at least two iterative PSI-BLAST runs after a BLASTP search in the
NR database of your favorite sequence(s).
ii.
Choose a
format to keep the PSSM.
iii.
Use the
PSSM to search the PDB database
iv.
Questions:
1.
Compare
the e-values of the same pair of sequences aligned with BLAST between two
iterations. Why are they different?
2.
When do
you think we may need to do the search using a PSSM?
3.
Search for 4
sequences with known structures that can be comparatively aligned with each of
the following sequences
c.
Multiple
Alignment of Sequences
i.
Download 4
sequences of the same family of Globins and 4 sequences of Phycocyanin-like
phycobilisome proteins and
obtain a multiple alignment using ClustalW
and T-Coffee
ii.
Align the
4 sequences found in the previous exercise with PSI-BLAST and your favorite
sequence(s) using ClustalW and T-Coffee.
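When comparing the ClustalW and T-Coffee results, it helps to attach a number to each pair of aligned rows. The sketch below computes a pairwise percent identity from two rows of an alignment. The convention used (identical residues divided by the number of columns in which neither row has a gap) is an assumption made here; other definitions of Id% exist and give different values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both rows carry a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("rows of an alignment must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # Two toy rows, not real globin sequences.
    print(percent_identity("MKV-LA", "MRVTLA"))
```

The same function can be applied to every pair of rows of a multiple alignment to see which program yields the higher identities for the same sequences.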
d.
Hidden
Markov Models: PFAM / SMART / Superfamily (it may be
useful to download and install on your computer the package HMMER, or run it on the web server)
i.
Search the
PFAM / SMART and/or
Superfamily domain
of your favorite sequence(s).
ii.
Download
the PFAM / SMART profiles to which your favorite sequence(s) belong
iii.
Obtain the
alignment of this sequence with the sequence of known structure that had the smallest e-value
in PSI-BLAST, using the previous PFAM
/ SMART profile.
iv.
Obtain a
multiple alignment of the 4 sequences previously aligned with ClustalW and
T-Coffee plus your favorite sequence(s) using the PFAM / SMART profile
and compare the alignments.
v.
Search
with the chosen PFAM / SMART profile in the set of sequences of
PDB+NR+Swissprot+PIR.
e.
Searching
short FingerPrints
and PROSITE
patterns: ScanProsite, Motifs, PPSearch
i.
Find the
main motifs of your favorite sequence(s).
ii.
How many
motifs would you confirm are appropriate for it?
iii.
Which motifs are
shared by your favorite sequence(s) and the 4 previously aligned sequences
with known structure?
f.
Sequence
domains: InterPro / CDD / Prodom
i.
Split this
protein sequence into domains
ii.
Check the
function, motifs and main properties in InterPro
3)
Structure
Comparison
a.
Understand
the methods from the manual and tutorials of the programs for 3D superposition:
CE / MAMMOTH / STRAP / STAMP / SUPERPOSE / MISTRAL / DBAli
b.
Exercises:
Check pairwise (2 proteins) and multiple (>2) alignments of known structures
in the PDB and from structures uploaded from your own computer.
i.
Use
proteins of the families of Immunoglobulin VL-lambda and
NKP44
ii.
Use
proteins of the superfamilies of Globins and Phycocyanin-like
phycobilisome proteins
iii.
Use
proteins from different superfamilies of the 6-bladed β-propeller fold.
c.
Exercise:
Download the PDB files of the 4 sequences previously aligned with your favorite
sequence(s) in 2.c.ii. Obtain the multiple structure superposition and extract the
multiple alignment of their sequences. Compare the multiple alignments based on
the structure, ClustalW, T-Coffee, PFAM and SMART.
d.
Exercise:
Compare the previous alignments with the alignments in HSSP
database
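The comparison between a structure-based alignment and a sequence-based one can be made quantitative by counting how many residue pairs of the reference alignment are reproduced. The sketch below does this for two rows taken from each alignment. It assumes both alignments contain the same two sequences and that `-` marks a gap; the function names are hypothetical, and this is only one of several possible agreement measures.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Tuple[int, int]]:
    """Return the set of (index in A, index in B) residue pairs placed in the same column."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_agreement(reference: Tuple[str, str], test: Tuple[str, str]) -> float:
    """Fraction of reference residue pairs that also appear in the test alignment."""
    ref = aligned_pairs(*reference)
    tst = aligned_pairs(*test)
    return len(ref & tst) / len(ref) if ref else 0.0


if __name__ == "__main__":
    structure_based = ("ACDEF", "A-DEF")  # toy reference alignment
    sequence_based = ("ACDEF", "AD-EF")   # toy alignment to evaluate
    print(pair_agreement(structure_based, sequence_based))
```

Taking the structural superposition as the reference, a low agreement for ClustalW, T-Coffee or the HMM profiles points to the regions where sequence information alone misplaces the gaps.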
4)
Principles of Comparative Modeling and
Threading
e.
Model
building of your favorite sequence(s) in the servers of: ModWeb, Swiss-Model and 3D-Jigsaw
i.
ii.
Driven
1.
Use the
alignments obtained in 2.c.ii and 2.d.iv to run a driven modeling of your
favorite sequence(s).
2.
Extract
the alignment with the best template (smallest e-value and largest Id%) from
the alignments in 2.c.ii and 2.d.iv to run a driven modeling of your favorite sequence(s).
3.
Obtain the
structural alignment of the candidate templates using your favorite superposition
program (exercise 3). Build the Hidden Markov Model from it
with hmmbuild. Align your templates and the target to the resulting
profile using hmmalign.
Extract the alignment with the best template (smallest e-value and largest Id%)
from the multiple alignment and run a driven modeling of your favorite sequence(s).
iii.
Compare
your model with the structures of the templates using the structural
superposition.
iv.
Compare
your model with the model in ModBase using
the structural superposition.
v.
Compare
driven and automatic models by superimposition.
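Once two models (or a model and a template) have been superimposed, their difference is commonly summarised as the root-mean-square deviation (RMSD) over matched Cα atoms. The sketch below only computes the RMSD of coordinates that are already superimposed and paired one-to-one; it does not perform the superposition itself, which is left to the programs listed in section 3. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between two equally long lists of already superimposed coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both coordinate lists must contain the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: every atom of the second set is shifted by 1 along z.
    driven = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    automatic = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(driven, automatic))
```

The value depends on which atoms are paired, so the same residue correspondence should be used when comparing several models against one template.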
f.
Assign the
fold of your favorite sequence(s) using threading: 3D-PSSM/PHYRE, FUGUE, LOOPP, Threader and PredictProtein (TOPITS)
i.
Compare
the alignments of the related proteins with known structure and the alignments
of the same sequences obtained with ClustalW, T-Coffee, PFAM and SMART.
ii.
Compare by
superposition the structures assigned by threading (all servers) and the
structures obtained by sequence search
iii.
Build
models of your favorite sequence(s) using the alignments obtained by threading: 3D-PSSM/PHYRE, FUGUE, LOOPP and PredictProtein (TOPITS).
iv.
Compare by
superposition the models obtained by homology and by threading. What are the main
differences and why? Which models do you think are more reliable and why?
g.
Predict
the putative fold of these problematic sequences by threading and fold
prediction: split them into domains and assign
the fold of each domain using all available servers
h.
Build
with the servers ModWeb, Swiss-Model and 3D-Jigsaw the models of the
domains of the problematic sequences using the threading
alignments.
5)
a.
Getting
used to ArchDB.
Browsing the database of protein loops
i.
Check
loops with 4 residues between α-helices.
ii.
Check the
affinity of β-α loops of 5 residues in length
to bind ATP. Which classes/sub-classes are the best?
iii.
Check the
loops of the template structures used on the model building of your favorite sequence(s)
iv.
Download
the structure of one of your templates and check the interval of residues of
one of its loops with RasMol. Remove this loop from the structure (see exercise
1.b.iv) and upload the new coordinates to query as structure on the database.
How is the removed loop classified?
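Removing a loop by hand in an editor is error-prone for long files. The sketch below drops all coordinate records of a residue interval from one chain, as an alternative to manual editing. It assumes standard fixed-column ATOM/HETATM records and leaves every other line untouched; `remove_residues` and the demo records are hypothetical, built only for illustration.

```python
def remove_residues(pdb_text: str, chain: str, first: int, last: int) -> str:
    """Return the PDB text without the atoms of residues first..last of one chain."""
    kept = []
    for line in pdb_text.splitlines():
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if (is_coordinate
                and line[21] == chain                    # column 22: chain identifier
                and first <= int(line[22:26]) <= last):  # columns 23-26: residue number
            continue  # this atom belongs to the loop being removed
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Toy Calpha-only chain of six glycines, built with the PDB column layout.
    lines = [
        f"ATOM  {i:5d}  CA  GLY A{i:4d}    {float(i):8.3f}{0.0:8.3f}{0.0:8.3f}"
        for i in range(1, 7)
    ]
    trimmed = remove_residues("\n".join(lines), chain="A", first=3, last=4)
    print(trimmed, end="")
```

The interval to remove is the one read from RasMol in the previous step; the resulting text can be saved under a new name and uploaded as the query structure.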
b.
Query ArchDB with the model of
your favorite structure (use any of the structures previously modeled for
uploading the structure)
i.
Compare
the loops of the model and the loops of the templates (check if all belong to
the same classes and sub-classes). Check the loops with different conformation
between the model and the template.
ii.
Check the
classes and subclasses assigned (if any) to the loops of the target that could
not be aligned with the sequence of the template(s).
iii.
Compare
the putative conformations of the loops of the model that were previously
checked in 5.b.i: download the protein structures that contain the loops with
the same geometry (disposition of secondary structures), extract the
coordinates of the particular loop (see exercise 1.b.iv) and superpose them
with the loop of the model (identical procedure for using these coordinates
alone).
c.
Model
the problematic loops with the servers ArchPred and ModLoop
i.
Compare
the previous and last model of the loops using ArchDB.
6)
Evaluation
of the model.
a.
Download
and install the program Prosa 2003 on
your computer.
i.
Run the
tutorial examples (sessions 1, 2) of the manual by downloading from the PDB the
files 2aat, 3aat, 1aaw and 1spa.
b.
Evaluate
the pseudo-energy of your model(s), according to statistical potentials, with ANOLEA, Verify3D and with Prosa 2003.
i.
Compare
both graphs of energy (for the results of ANOLEA you can use your favorite
graphics program, e.g. Excel).
ii.
Identify
the peaks of positive energy as those where the model is likely wrong, and the
best model among the ones you have built.
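Peaks are easier to locate after smoothing the per-residue profile. The sketch below averages the energies over a sliding window and reports the residue intervals where the smoothed value is positive. The input is simply a list of per-residue energies; no particular ANOLEA, Verify3D or Prosa output format is assumed, and the window size is a free parameter chosen here for illustration. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def window_average(values: Sequence[float], window: int) -> List[float]:
    """Centred moving average; the window is truncated at both ends of the chain."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def positive_regions(values: Sequence[float], window: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive residue intervals whose smoothed energy is above zero."""
    regions = []
    start = None
    smoothed = window_average(values, window)
    for i, value in enumerate(smoothed):
        if value > 0 and start is None:
            start = i                       # a positive stretch begins here
        elif value <= 0 and start is not None:
            regions.append((start + 1, i))  # stretch ended at index i - 1
            start = None
    if start is not None:
        regions.append((start + 1, len(smoothed)))
    return regions


if __name__ == "__main__":
    energies = [-1, -1, -1, 2, 3, 2, -1, -1, -1]  # toy profile, arbitrary units
    print(positive_regions(energies, window=3))
```

Running the same function on each model makes it possible to compare them by the number and length of their positive stretches.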
c.
Run the
prediction of secondary structure of your target sequence
with PSIPRED, JPRED, PROF and PredictProtein (PHD)
i.
Compare
the predicted secondary structure and the secondary structure of the model in
the regions where the model is likely wrong.
ii.
If the
model and the prediction differ, check the accuracy of the prediction. Modify
the model accordingly by extending or shortening the secondary structure
elements.
iii.
Reconstruct
the model, check the loops and evaluate the new energy. Calculate the
pseudo-energy of the new models and compare it with the previous models.
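The comparison between predicted and modelled secondary structure in 6.c can be expressed as a per-residue agreement. The sketch below reduces both strings to three states and reports the fraction of matching positions together with the positions that differ. The reduction used (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, as are the function names and the requirement that both strings cover the same residues.

```python
from typing import List

# Assumed reduction of eight-state assignments to three states.
REDUCE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss: str) -> str:
    """Map a secondary structure string to helix (H), strand (E) or coil (C)."""
    return "".join(REDUCE.get(symbol.upper(), "C") for symbol in ss)


def agreement(predicted: str, modelled: str) -> float:
    """Fraction of residues with the same three-state assignment."""
    if len(predicted) != len(modelled):
        raise ValueError("both strings must describe the same residues")
    if not predicted:
        return 0.0
    pred, model = to_three_state(predicted), to_three_state(modelled)
    return sum(p == m for p, m in zip(pred, model)) / len(pred)


def disagreements(predicted: str, modelled: str) -> List[int]:
    """1-based positions where prediction and model differ."""
    pred, model = to_three_state(predicted), to_three_state(modelled)
    return [i + 1 for i, (p, m) in enumerate(zip(pred, model)) if p != m]


if __name__ == "__main__":
    predicted = "CHHHHCCEEE"  # toy prediction
    modelled = "CHHHGCCCEE"   # toy assignment taken from a model
    print(agreement(predicted, modelled), disagreements(predicted, modelled))
```

Positions reported by `disagreements` that fall inside the high-energy regions of the model are the first candidates for rebuilding.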
7)
Advanced
exercises:
a.
Model the
following sequence
b.
We only
know the sequence of a protein. Can you tell us what
its function should be and whether it can be performed?
c.
We have
the coordinates of a protein Cα trace. We wish to evaluate the difference
between the pseudo-energy distributions
along its sequence calculated with Prosa 2003 and with statistical
potentials for Cα atoms and for Cβ.
d.
Detect the
errors in the following model and fix them.