Because proteins whose amino acid sequences are sufficiently similar are well known to adopt similar 3D structures, several databases link sequence databases to structure databases. However, the term homology, a fundamental concept in bioinformatics, is often used incorrectly. Sequences are homologous if they are related by divergence from a common ancestor (one consequence is that searching sequence databases for homologues is used to infer the function of proteins). Conversely, analogy refers to the acquisition of common structural or functional features via convergent evolution from unrelated ancestors. Homology is therefore not a measure of similarity but an absolute statement that sequences share a divergent rather than a convergent relationship. Among homologous sequences we can distinguish orthologs (homologues in different species that arose by speciation and usually perform the same function) and paralogs (homologues that arose by gene duplication within one genome and often perform different but related functions).
Building a model of a target structure by comparison with data extracted from homologous sequences of known structure (parents or templates) is called comparative modelling. The approach can also be extended to homologues sharing a low percentage of sequence identity. All current comparative modelling methods consist of four sequential steps: 1) fold assignment and template selection; 2) template-target alignment; 3) model building; and 4) model evaluation.
STEP 1: Fold assignment
To start the modelling process, we have to identify the template and define an alignment (residue-by-residue equivalences between the target and template sequences). In homology modelling the stretches to be built are chosen according to this sequence alignment, so alignment is the most crucial step of the modelling process: errors at this stage are usually impossible to correct later. The sequences of the fold with the greatest similarity to the target sequence are taken as parents or templates. Currently, around 40% of all protein sequences have at least one domain that can be modelled on a related known protein structure. In particular, some proteins have very low sequence identity and yet share the same fold and a closely related function. The current theory of evolution holds that such structures, having diverged from a common ancestor, often retain some functional and sequence similarity. In addition, divergent evolution has recently been reported, on the basis of biochemical pathway evolution, for some proteins with a common (βα)8 barrel fold for which no sequence similarity was detected.
Originally, sequences homologous to the target were searched for with local alignment programs such as FASTA, SSEARCH or BLAST, which find similarities shared between pairs of related sequences. With the high rate at which new sequences become available from genome projects, sensitive methods for recognising distant homologies have become more important. Such methods are the main source of annotation, and in the last decade very sensitive approaches have been developed to recognise folds. They have succeeded, to varying degrees, in identifying relationships between remote homologues. These methods include:
2) Advanced sequence comparison procedures that use multiple sequence alignments with a position-specific scoring system, provided either by profile methods based on probabilistic machine-learning models (hidden Markov models), by position-specific iterated BLAST (PSI-BLAST), or by searching sequence space through intermediate sequences (ISS). These methods have been shown to give better results than simple threading.
3) Finally, newer approaches that combine sequence profiles with knowledge-based threading potentials have improved the recognition of remote homologues.
2) It exploits the transitivity of homology, as in the intermediate sequence search, in which a query sequence is first aligned against a database (e.g. SWISS-PROT). All aligned sequences with highly significant similarity (E-value < 0.001) are then used as new seeds, and the process is iterated until no new sequences are found. This covers a wider search space than a single-sequence search.
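The iterative seeding described above amounts to a breadth-first expansion over significant hits. The sketch below assumes the pairwise search results have already been computed and stored in a dictionary `hits` (a hypothetical stand-in for running BLAST or a similar tool against a database such as SWISS-PROT); `transitive_search` is an illustrative name, and the 0.001 cutoff is the E-value quoted in the text.

```python
from collections import deque


def transitive_search(query, hits, e_cutoff=1e-3):
    """Collect every sequence reachable from `query` through significant hits.

    `hits` maps a sequence ID to a list of (hit_id, e_value) pairs, i.e. the
    (hypothetical) result of searching that sequence against the database.
    Each hit below `e_cutoff` becomes a new seed; the loop stops when a round
    of searches finds no new sequences.
    """
    found = {query}
    seeds = deque([query])
    while seeds:
        seed = seeds.popleft()
        for hit_id, e_value in hits.get(seed, []):
            if e_value < e_cutoff and hit_id not in found:
                found.add(hit_id)
                seeds.append(hit_id)  # significant and new: search with it too
    found.discard(query)
    return found


# Hypothetical precomputed search results.
hits = {
    "target": [("A", 1e-10), ("B", 0.5)],
    "A": [("C", 1e-5)],
    "C": [("target", 1e-5), ("D", 1e-4)],
    "B": [("E", 1e-20)],
}
print(sorted(transitive_search("target", hits)))  # ['A', 'C', 'D']
```

Sequence B is never used as a seed because its hit to the target is not significant, so E is not reached even though B and E are strongly related.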
STEP 2: Template selection and alignment
For template selection, one or more templates can be used. Using multiple templates is not justified when the sequence spread between the parents, relative to the target, is not appropriate for the expected level of model error. If the average sequence identity between target and parents is larger than 40% and the sequence spread between parents is too small, a single parent is used. The database search produces several local alignments ranked by the score relating the target and template sequences. However, these are not necessarily the best alignments for identifying residue correspondences and building the target conformation, because the search procedure was tuned to find remote homologues rather than to produce the best alignment. Therefore, although target and templates sharing more than 40% identity are likely to be correctly aligned, they need to be realigned if they fall in the "twilight zone" of less than 30% identity.
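A minimal sketch of the identity thresholds quoted above, assuming the two sequences are already aligned as gapped strings. Percent identity is computed here over columns where neither sequence has a gap; this denominator is only one of several conventions and is an assumption, and the label for the 30-40% band is ours, since the text does not name it.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def alignment_zone(identity):
    """Rough classification using the 40% and 30% thresholds quoted in the text."""
    if identity > 40.0:
        return "likely correct"
    if identity >= 30.0:
        return "check carefully"
    return "twilight zone: realign"


ident = percent_identity("ACDE-FG", "ACDQKFG")
print(round(ident, 1), alignment_zone(ident))  # 83.3 likely correct
```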
The optimal alignment between homologous proteins, one of which has a known 3D structure (the template), is then used to construct a model of the spatial structure of the target. However, even after the protein cores are superimposed, amino acids in loop regions can be significantly displaced. At least two thirds of comparative protein modelling cases are based on less than 40% sequence identity between target and templates. To obtain a reasonable level of accuracy, models must be based on alignments with few errors; such alignments can usually be obtained when the sequence identity between the modelled sequence and at least one known structure is larger than 30%. A marked improvement is obtained by using global multiple alignments plus additional structural information instead of the pairwise local alignments produced while searching for likely relatives. Several alignment programs (MULTIALIGN, MULTAL, CLUSTALW) have been tested against a database of correctly aligned multiple sequences (BAliBASE). More recent approaches have obtained very good results: those that incorporate pre-processed local alignments, such as those found by PSI-BLAST (e.g. DbClustal); those that recalculate local alignments (e.g. with Lalign) and pre-processed segment-pair alignments (e.g. with Dialign2), as in the program T-Coffee; and those that iteratively refine the multiple alignment, as in the program Prrp.
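Benchmarks such as BAliBASE score a test alignment by how many residue pairs it aligns in the same way as a reference; the same measure can be used to pick, among several automatic alignments, the one closest to a structural alignment. The sketch below is a simplified sum-of-pairs score computed over all columns (BAliBASE itself restricts scoring to annotated core blocks), assuming both alignments contain the same sequences in the same order.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Residue pairs (seq i, residue k, seq j, residue l) placed in one column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs = set()
    for column in zip(*alignment):
        idx = []
        for s, ch in enumerate(column):
            if ch == gap:
                idx.append(None)
            else:
                idx.append(counters[s])
                counters[s] += 1
        for i, j in combinations(range(len(column)), 2):
            if idx[i] is not None and idx[j] is not None:
                pairs.add((i, idx[i], j, idx[j]))
    return pairs


def sum_of_pairs_score(test, reference, gap="-"):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference, gap)
    if not ref:
        return 0.0
    return len(aligned_pairs(test, gap) & ref) / len(ref)


reference = ["AC-DE", "ACGDE"]  # e.g. derived from a structural alignment
test = ["ACD-E", "ACGDE"]       # e.g. produced by a sequence aligner
print(sum_of_pairs_score(test, reference))  # 0.75
```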
Nevertheless, all these alignments lose the structural information provided by the templates whose conformation is known. When very similar structures are superimposed, regions of higher conservation are immediately apparent; these are commonly referred to as structurally conserved regions (SCRs), whereas the regions that show the largest differences in conformation are referred to as structurally variable regions (SVRs). To avoid losing this structural information, we suggest the following re-alignment between the target sequence and the templates:
2) Using the sequence alignment previously obtained for these templates, build a hidden Markov model (HMM) profile and align the target sequence to it; alternatively, use one or more of the following steps instead:
3') Align the related sequences (target, templates and additional related sequences) with DbClustal, T-Coffee, Prrp, etc.; select the result closest to the structural alignment and refine the alignment of the target sequence manually.
3'') Use the different alignments obtained in steps 3 and/or 3' to build several models, and evaluate them by other means (see "evaluation of the model") to choose the best one.
3''') Align all the related sequences as in 3' and obtain hidden Markov profiles from these alignments; then align the two HMM profiles obtained structurally (from step 2) and sequentially (as in 3'). Several alignments of the target sequence with the templates of known 3D structure are extracted from the final alignments. These alignments are used to build several conformations of the target sequence as in 3'', and the resulting models are also evaluated by other methods in order to choose the best one.
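The SCR/SVR distinction used in the re-alignment above can be sketched as follows, assuming the template structures have already been superimposed and their equivalent Cα atoms arranged by alignment column. The 3.0 Å cutoff and the minimum run length of three residues are hypothetical parameters chosen for illustration, not values taken from the text.

```python
import math
from itertools import combinations


def structural_regions(ca_columns, cutoff=3.0, min_len=3):
    """Label runs of alignment columns as SCR or SVR.

    ca_columns[k] holds one (x, y, z) Calpha position per superimposed template
    for alignment column k, or None where that template has a gap. A column is
    conserved if every template has a residue there and all pairwise Calpha
    distances are <= cutoff (Angstrom). Conserved runs shorter than min_len are
    treated as variable.
    """
    conserved = []
    for column in ca_columns:
        if any(p is None for p in column):
            conserved.append(False)
            continue
        worst = max((math.dist(p, q) for p, q in combinations(column, 2)), default=0.0)
        conserved.append(worst <= cutoff)

    regions, start = [], 0
    for k in range(1, len(conserved) + 1):
        if k == len(conserved) or conserved[k] != conserved[start]:
            is_scr = conserved[start] and k - start >= min_len
            label = "SCR" if is_scr else "SVR"
            if regions and regions[-1][0] == label:
                # merge with the previous run of the same kind
                regions[-1] = (label, regions[-1][1], k - 1)
            else:
                regions.append((label, start, k - 1))
            start = k
    return regions


ca = [
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
    [(4.0, 0.0, 0.0), (9.0, 0.0, 0.0)],  # templates diverge here
    [(5.0, 0.0, 0.0), None],             # gap in the second template
]
print(structural_regions(ca))  # [('SCR', 0, 3), ('SVR', 4, 5)]
```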
Methods of model building
Several algorithms have been developed to obtain a rigid-body superposition of structures whose sequences are not directly related (JIG-SAW and COMPOSER, among others). SCR construction follows the original approach of Greer, which uses sequence-similar SCRs from homologous proteins to define the new core from a multiple alignment: 1) superimpose the known structures of the homologous proteins (parents) using the SCRs to construct a framework; 2) superimpose the template closest in sequence to the target onto the averaged main chain of the framework; 3) build the main-chain conformations of the SVRs by fitting compatible structures onto the anchored stumps of the framework (see the section on SVR modelling for identifying the stretches to use); and 4) complete the target structure by modelling the side chains of the target sequence.
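Superimposing parents on their SCRs is a least-squares fitting problem. The sketch below uses the standard Kabsch (SVD) algorithm with NumPy; it is not necessarily the procedure used inside JIG-SAW or COMPOSER, and it assumes the equivalent atoms (for example, SCR Cα atoms) have already been paired row by row.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares rigid-body fit of `mobile` onto `target` (both N x 3).

    Returns (R, t, rmsd) such that mobile @ R.T + t best matches target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc           # centre both sets
    H = P.T @ Q                               # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if the optimal orthogonal matrix is a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tc - mc @ R.T
    fitted = mobile @ R.T + t
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return R, t, rmsd


# Usage: recover a known 90-degree rotation about z plus a translation.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mobile = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
target = mobile @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t, rmsd = kabsch_superpose(mobile, target)
print(np.allclose(R, Rz), round(rmsd, 6))
```

In a framework-building setting, only the SCR atom pairs would be passed in, and the resulting R and t would then be applied to the whole parent structure.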
Methods based on the satisfaction of spatial restraints (such as MODELLER) generate as many constraints (or restraints) as possible from the structural alignment of the parents and build the target structure as in NMR structure determination (using additional energetic restraints that enforce the correct stereochemistry of the protein chain). Regions where the homologous templates cannot be structurally aligned, or where no alignment is given between the target and the multiple alignment of the templates, must be built with an additional function. Most structural changes occur in loop regions, but secondary structure elements may occasionally also be involved in variable regions. In the case of multiple superimposed parents, the coordinates are separated into conserved secondary structure elements and conserved loops.
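As a much simplified illustration of restraint-based building, the sketch below derives Cα-Cα distance restraints from one template through the target-template alignment and scores a candidate model by its harmonic violation. MODELLER itself uses probability density functions over many restraint types, so the harmonic form, the 10 Å cutoff and the function names here are assumptions for illustration only.

```python
import math


def template_distance_restraints(template_ca, mapping, max_dist=10.0):
    """Derive target Calpha-Calpha distance restraints from one template.

    template_ca: template residue index -> (x, y, z) Calpha position.
    mapping: target residue index -> aligned template residue index.
    Each pair of aligned target residues whose template Calphas lie within
    max_dist (Angstrom) becomes a restraint with the template distance.
    """
    aligned = sorted(mapping)
    restraints = {}
    for n, i in enumerate(aligned):
        for j in aligned[n + 1:]:
            d = math.dist(template_ca[mapping[i]], template_ca[mapping[j]])
            if d <= max_dist:
                restraints[(i, j)] = d
    return restraints


def restraint_violation(model_ca, restraints, k=1.0):
    """Harmonic penalty: sum of k * (model distance - restraint distance)**2."""
    return sum(
        k * (math.dist(model_ca[i], model_ca[j]) - d0) ** 2
        for (i, j), d0 in restraints.items()
    )


template_ca = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (7.6, 0.0, 0.0), 3: (30.0, 0.0, 0.0)}
mapping = {0: 0, 1: 1, 2: 2, 5: 3}  # target residue 5 aligns to template residue 3
restraints = template_distance_restraints(template_ca, mapping)
model = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (8.6, 0.0, 0.0)}
print(sorted(restraints), round(restraint_violation(model, restraints), 3))
```

A model builder would then move the target atoms to minimise this violation together with stereochemical terms; the pair involving residue 5 is dropped here because the template distance exceeds the cutoff.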
Modelling SVRs can be seen as a miniature protein folding problem, and methods for predicting loop conformation accordingly fall into two classes: ab initio methods, and database-search or knowledge-based approaches.
1. Ab initio prediction is based on a conformational search guided by a scoring or energy function: (φ,ψ) space sampling; the minimum perturbation random tweak method; systematic conformational search; global energy minimization; local energy minimization; molecular dynamics simulations; genetic algorithms; combined Monte Carlo and molecular dynamics; Monte Carlo sampling; multiple copy sampling; searching discrete conformations by dynamic programming; and self-consistent field optimization, among others.
2. The database approach to loop prediction consists of finding a segment of main chain that fits the two stem regions of a loop. The procedure has improved since the early modelling work: in recent years, instead of a single conformation, a set of loop conformations spread as uniformly as possible is selected for each gap. Hence, the loops remaining from multiple-parent modelling, and all loops in single-parent modelling, are modelled by searching three different databases: 1) homologous structures; 2) a cluster database of loops; and 3) a non-redundant database of proteins with less than 25% sequence identity and resolution better than 2.5 Å.
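Selecting loop conformations that are "as uniformly spread as possible" can be approximated with a greedy max-min (farthest-point) selection. The sketch assumes the candidate loops are Cα traces of equal length already expressed in the frame of the stems (so RMSD is computed without refitting) and sorted best-first by stem fit; both assumptions, and the function names, are illustrative rather than taken from the cited methods.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length Calpha traces in the same frame (no refitting)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def spread_selection(candidates, k):
    """Greedy max-min choice of up to k loop conformations.

    candidates are assumed sorted best-first; the first one is always kept and
    each further pick is the candidate farthest (by RMSD) from all picks so far.
    Returns the indices of the chosen candidates in selection order.
    """
    if not candidates or k <= 0:
        return []
    chosen = [0]
    nearest = [rmsd(c, candidates[0]) for c in candidates]
    while len(chosen) < min(k, len(candidates)):
        nxt = max((i for i in range(len(candidates)) if i not in chosen),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        # distance of every candidate to its closest chosen conformation
        nearest = [min(nearest[i], rmsd(candidates[i], candidates[nxt]))
                   for i in range(len(candidates))]
    return chosen


loops = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],  # near-duplicate of loop 0
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(0.0, 6.0, 0.0), (1.0, 6.0, 0.0)],
]
print(spread_selection(loops, 3))  # [0, 3, 2]
```

The near-duplicate loop 1 is picked last, which is the behaviour wanted when only a few representative conformations can be carried forward.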
The chosen cluster of loop conformations must satisfy two requirements: 1) it must fit between the two bracing secondary structures, and 2) the target loop to be modelled must present the cluster's sequence pattern. This procedure is based on the successful work on canonical loop structures of the immunoglobulin complementarity-determining regions (CDRs) by Chothia et al. Nevertheless, database search is valid only for short and medium-sized loops, or for special cases where homologous proteins share some structural features in loops that are still considered variable regions (as is the case for immunoglobulins). To date, classifications of long loops have failed, and it has been shown that a correlation between the geometric variables describing the loop stems is needed to obtain such a classification; this has been established only for short and medium-sized loops.
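The two requirements on a loop cluster can be combined in a simple filter. In the sketch below, each cluster is summarised by its loop length, a single stem geometric variable (the Cα-Cα distance between the two anchor residues) and a sequence pattern written as a regular expression; real classifications use several correlated stem variables, and the `LoopCluster` records, patterns and the 1 Å tolerance are hypothetical.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class LoopCluster:
    name: str
    length: int           # number of loop residues
    stem_distance: float  # Calpha-Calpha distance between the anchor residues (Angstrom)
    pattern: str          # regular expression the loop sequence must match


def matching_clusters(clusters, loop_seq, stem_distance, tolerance=1.0):
    """Names of clusters compatible with the target loop, best geometric fit first."""
    hits = []
    for c in clusters:
        if c.length != len(loop_seq):
            continue
        deviation = abs(c.stem_distance - stem_distance)
        if deviation <= tolerance and re.fullmatch(c.pattern, loop_seq):
            hits.append((deviation, c.name))
    return [name for _, name in sorted(hits)]


clusters = [
    LoopCluster("A1", 4, 6.0, "G..[ST]"),
    LoopCluster("A2", 4, 9.5, "...."),
    LoopCluster("B1", 5, 6.2, "G.{4}"),
]
# Stem distance measured between the anchor Calphas of the target framework.
stem = math.dist((0.0, 0.0, 0.0), (6.3, 0.0, 0.0))
print(matching_clusters(clusters, "GNDS", stem))  # ['A1']
```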
Errors in comparative modelling arise mainly from the lack of templates and from decreasing sequence identity between the target and the templates. These errors fall into five categories:
Finally, recent work by Lazaridis and Karplus shows that adding solvation (environmental) terms to the classical molecular mechanics energy improves the detection of wrongly modelled regions. Consequently, the criticism levelled at potentials of mean force does not apply to this approach, which performed as well as statistical functions in discriminating correct from misfolded models.
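Energy-based detection of wrongly modelled regions is often done by smoothing per-residue energies along the sequence and flagging stretches that remain high. The sketch assumes per-residue energies (higher meaning worse) have already been computed with some energy function, for example one that includes solvation terms; the window size, the threshold and the function names are hypothetical.

```python
def windowed_profile(residue_energies, window=9):
    """Average per-residue energies over a centred window, truncated at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(residue_energies)
    profile = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        profile.append(sum(residue_energies[lo:hi]) / (hi - lo))
    return profile


def flag_regions(residue_energies, threshold, window=9):
    """Indices of residues whose window-averaged energy exceeds threshold."""
    profile = windowed_profile(residue_energies, window)
    return [i for i, e in enumerate(profile) if e > threshold]


energies = [0.0] * 10 + [5.0] * 5 + [0.0] * 10  # hypothetical per-residue energies
print(flag_regions(energies, threshold=2.0, window=3))  # [10, 11, 12, 13, 14]
```

Smoothing suppresses single-residue noise, so only stretches that are consistently unfavourable are reported as candidate modelling errors.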
The model can be evaluated experimentally only by site-directed mutagenesis or by other additional information, which is not commonly available. One way to do without experiments is to use the knowledge obtained from a highly diverse multiple alignment of related sequences, applying the following conditions:
REFERENCES
N. Alexandrov and R. Luethy. (1998). Alignment algorithm for homology modeling and threading. Protein Sci 7, 254-258.
B. Al-Lazikani, A. Lesk and C. Chothia. (1997). Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. 273, 927-948.
P. Aloy, J. Mas, M. Martí-Renom, E. Querol, F. Avilés and B. Oliva. (2000). Refinement of modelled structures by knowledge based energy profiles and secondary structure prediction: Application to the Human Procarboxypeptidase A2. J Comput-Aided Molec. Des. 14, 83-92.
S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
T. Attwood. (2000). The Babel of Bioinformatics. Science 290, 471-473.
A. Bairoch and R. Apweiler. (1997). The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res. 25, 31-36.
G. Barton and M. Sternberg. (1987). A strategy for the rapid multiple alignment of protein sequences; confidence levels from tertiary structure comparisons. J. Mol. Biol. 198, 327-337.
A. Bateman, E. Birney, R. Durbin, S. Eddy, K. Howe and E. Sonnhammer. (2000). The Pfam protein family database. Nucleic Acids Res. 28, 263-266.
P. Bates and M. Sternberg. (1998). From Sequence to Structure. Protein Structure Prediction: A practical approach (M. Sternberg, Ed.), Oxford Univ. Press, Oxford,UK.
P. A. Bates and M. Sternberg. (1999). Model building by comparison at CASP3: Using expert knowledge and computer automation. Proteins: Struct., Func. and Gene. Suppl. 3, 47-54.
J. U. Bowie, R. Luthy and D. Eisenberg. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164-170.
B. Brooks, R. Bruccoleri, B. Olafson, D. States, S. Swaminathan and M. Karplus. (1983). CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comp. Chem. 4, 187-217.
R. Bruccoleri and M. Karplus. (1987). Prediction of the folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26, 137-138.
A. Brünger. (1992). X-PLOR: A system for X-ray crystallography and NMR. Yale University Press, New Haven.
V. Collura, J. Higo and J. Garnier. (1993). Modeling of protein loops by simulated annealing. Protein Sci. 2, 1502-1510.
R. Copley and P. Bork. (2000). Homology among (βα)8 barrels: implications for the evolution of metabolic pathways. J. Mol. Biol. 303, 627-640.
C. Chothia, A. Lesk, A. Tramontano, M. Levitt, S. Smith-Gill, G. Air, S. Sheriff, E. Padlan, D. Davies, W. Tulip, P. Colman, S. Spinelli, P. Alzari and R. Poljak. (1989). Conformations of Immunoglobulin Hypervariable Regions. Nature 342, 877-883.
S. Chung and S. Subbiah. (1996). A structural explanation for the twilight zone of protein sequence homology. Structure 4, 1123-1127.
C. Deane, Q. Kaas and T. Blundell. (2001). SCORE: predicting the core of protein models. Bioinformatics 17, 541-550.
R. Dima, J. Banavar and A. Maritan. (2000). Scoring functions in protein folding and design. Protein Sci. 9, 812-819.
F. S. Domingues, W. A. Koppensteiner, M. Jaritz, A. Prlic, C. Weichenberger, M. Wiederstein, H. Floeckner, P. Lackner and M. Sippl. (1999). Sustained performance of knowledge-based potentials in fold recognition. Proteins: Struct., Func. & Gene. Suppl. 3, 112-120.
L. Donate, S. Rufino, L. Canard and T. Blundell. (1996). Conformational analysis and clustering of short and medium size loops connecting regular secondary structures. A database for modelling and prediction. Protein Sci. 5, 2600-2616.
M. Dudek, K. Ramnarayan and J. Ponder. (1998). Protein structure prediction using a combination of sequence homology and global energy minimization: II. Energy functions. J. Comp. Chem. 19, 548-573.
S. Eddy. (1998). Profile hidden Markov models. Bioinformatics 14, 755-763.
K. Fidelis, P. Stern, D. Bacon and J. Moult. (1994). Comparison of systematic search and database methods for constructing segments of protein structure. Protein Eng. 7, 953-960.
D. Fischer and D. Eisenberg. (1996). Protein fold recognition using sequence-derived predictions. Protein Science 5, 947-955.
A. Fiser, R. Do and A. Sali. (2000). Modeling of loops in protein structures. Protein Sci. 9, 1753-1773.
I. Friedberg, T. Kaplan and H. Margalit. (2000). Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci 9, 2278-2284.
D. W. Gatchell, S. Dennis and S. Vajda. (2000). Discrimination of Near-native Protein Structures from Misfolded Models by Empirical Free Energy Functions. Proteins: Struct., Func. & Gene. 41, 518-534.
C. Geourjon, C. Combet, C. Blanchet and G. Deléage. (2001). Identification of related proteins with weak sequence identity using secondary structure information. Protein Sci. 10, 788-797.
O. Gotoh. (1996). Significant improvement in accuracy of multiple sequence alignments by iterative refinements assessed by reference to structural alignments. J. Mol. Biol. 264, 823-838.
J. Greer. (1990). Comparative modeling methods: application to the family of the mammalian serine proteases. Proteins: Struc. Func. and Gene. 7, 317-334.
W. v. Gunsteren, S. Billeter, A. Eising, P. Hünenberger, P. Krüger, A. Mark, W. Scott and I. Tironi. (1996). Biomolecular Simulation: The GROMOS96 Manual and User Guide. Verlag der Fachvereine, Zürich.
R. Hooft, G. Vriend and C. Sander. (1996). Verification of protein structures: side-chain planarity. J. Appl. Crystallogr. 29, 714-716.
X. Huang and W. Miller. (1991). A time-efficient linear-space local similarity algorithm. Adv. Appl. Math. 12, 337-357.
J. Irving, J. Whisstock and A. Lesk. (2001). Protein structural alignments and functional genomics. Proteins: Struc. Func. and Gene. 42, 378-382.
L. Jaroszewski, L. Rychlewski and A. Godzik. (2000). Improving the quality of twilight-zone alignments. Protein Sci. 9, 1487-1496.
A. Jennings, C. Edge and M. Sternberg. (2001). An approach to improving multiple alignments of protein sequences using predicted secondary structure. Protein Eng. 14, 227-231.
D. Jones. (1999). GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797-815.
T. A. Jones and S. Thirup. (1986). Using known substructures in protein model building and crystallography. EMBO J. 5, 819-822.
K. Karplus, C. Barrett, M. Cline, M. Diekhans, L. Grate and R. Hughey. (1999). Predicting protein structure using only sequence information. Proteins: Struc. Func. and Gene. Suppl 3, 121-125.
L. A. Kelley, R. M. MacCallum and M. Sternberg. (2000). Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499-520.
A. Kidera. (1995). Enhanced conformational sampling in Monte Carlo simulations of proteins: Applications to a constrained peptide. Proc. Natl. Acad. Sci. USA 92, 9886-9889.
P. Koehl and M. Delarue. (1995). A self-consistent mean field approach to simultaneous gap closure and side-chain positioning in protein homology modeling. Nat. Struct. Biol. 2, 163-170.
P. Koehl and M. Delarue. (1996). Mean-field minimization methods for biological macromolecules. Curr. Opin. Struct. Biol. 6, 222-226.
R. Laskowski, M. MacArthur and J. Thornton. (1998). Validation of Protein models derived from experiment. Curr. Opin. Struct. Biol. 8, 631-639.
T. Lazaridis and M. Karplus. (1999). Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J. Mol. Biol. 288, 477-487.
R. Luthy, J. U. Bowie and D. Eisenberg. (1992). Assessment of protein models with three-dimensional profiles. Nature 356, 83-85.
A. Martin, J. Cheetham and A. Rees. (1989). Modeling antibody hypervariable loops: a combined algorithm. Proc. Natl. Acad. Sci. USA 86, 9268-9272.
A. Martin and J. Thornton. (1996). Structural Families in Loops of Homologous Proteins: Automatic Classification, Modelling and Application to Antibodies. J.Mol.Biol. 263, 800-815.
M. Martí-Renom, J. Mas, P. Aloy, E. Querol, F. Aviles and B. Oliva. (1998). Statistical Analysis of the loop-geometry on a non-redundant database of proteins. J Mol. Mod. 4, 347-354.
M. A. Martí-Renom, A. Stuart, A. Fiser, R. Sánchez, F. Melo and A. Sali. (2000). Comparative protein structure modeling of genes and genomes. Ann. Rev. Biophys. Biomolec. Struc. 29, 291-325.
C. Mattos, G. Petsko and M. Karplus. (1994). Analysis of two residue turns in proteins. J.Mol. Biol. 238, 733-747.
M. McGregor, S. Islam and M. Sternberg. (1987). Analysis of the relationship between side-chain conformation and secondary structure in globular proteins. J. Mol. Biol. 198, 295-310.
F. Melo and E. Feytmans. (1997). Novel knowledge-based mean force potential at atomic level. J. Mol. Biol. 267, 207-222.
F. Melo and E. Feytmans. (1998). Assessing protein structures with a non local atomic interaction energy. J. Mol. Biol. 277, 1141-1152.
V. Morea, A. Tramontano, M. Rustici, C. Chothia and A. Lesk. (1998). Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. 275, 265-294.
B. Morgenstern. (1999). Dialign2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211-218.
J. Moult and M. James. (1986). An algorithm for determining the conformation of polypeptide segments in proteins by systematic search. Proteins: Struc. Func. and Gene. 1, 156-163.
N. Nakajima, J. Higo and A. Kidera. (2000). Free energy landscapes of peptides by enhanced conformational sampling. J. Mol Biol. 296, 197-216.
C. Notredame, D. Higgins and J. Heringa. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
T. Oldfield. (1992). Squid: a program for the analysis and display of data from crystallography and molecular dynamics. J. Mol. Graph. 10, 247-252.
B. Oliva, P. Bates, E. Querol, F. Avilés and M. Sternberg. (1997). An automatic Classification of the structure of protein loops. J. Mol. Biol. 266, 814-830.
B. Oliva, P. Bates, E. Querol, F. Avilés and M. Sternberg. (1998). Automated Classification of Antibody Complementarity Determining Region 3 of the Heavy Chain (H3) Loops into Canonical Forms and Its Application to Protein Structure Prediction. J. Mol. Biol. 279, 1193-1210.
O. Olmea, B. Rost and A. Valencia. (1999). Effective use of sequence correlation and conservation in fold recognition. J. Mol. Biol. 293, 1221-1239.
A. Panchenko, A. Marchler-Bauer and S. H. Bryant. (2000). Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296, 1319-1331.
K. Pawlowski, A. Bierzynski and A. Godzik. (1996). Structural diversity in a family of homologous proteins. J. Mol. Biol. 258, 349-366.
W. Pearson. (1996). Effective protein sequence comparison. Meth. Enz. 266, 227-258.
W. Pearson and D. Lipman. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.
R. Petrella, T. Lazaridis and M. Karplus. (1998). Protein sidechain conformer prediction: a test of the energy function. Folding and Design 3, 353-377.
C. Rapp and R. Friesner. (1999). Prediction of loop geometries using a generalized Born model of solvation effect. Proteins: Struc., Func. and Gene. 35, 173-183.
C. Ring and F. Cohen. (1994). Conformational sampling of loop structures using genetic algorithm. Isr. J. Chem. 34, 245-252.
D. Rosenbach and R. Rosenfeld. (1995). Simultaneous modeling of multiple loops in proteins. Protein Sci. 4, 496-505.
B. Rost. (1999). Twilight zone of protein sequence alignments. Protein Eng. 12, 85-94.
S. Rufino, L. Donate, L. Canard and T. Blundell. (1997). Predicting the Conformational Class of Short and Medium Size Loops Connecting Regular Secondary Structures: Application to Comparative Modelling. J. Mol. Biol. 267, 352-367.
R. Russell, M. Saqi, R. Sayle, P. Bates and M. Sternberg. (1997). Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J. Mol. Biol. 269, 423-439.
R. Russell, P. Sasieni and M. Sternberg. (1998). Supersites within superfolds. Binding site similarity in the absence of homology. J. Mol. Biol. 282, 903-918.
L. Rychlewski, L. Jaroszewski, L. Weizhong and A. Godzik. (2000). Comparison of sequence profiles. Structural prediction with no structure information. Protein Sci. 8, 232-241.
G. Salem, E. Hutchinson, C. Orengo and J. Thornton. (1999). Correlation of observed fold frequency with the occurrence of local structural motifs. J. Mol. Biol. 287, 969-981.
A. Sali and T. Blundell. (1993). Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815.
R. Sánchez, U. Pieper, F. Melo, N. Eswar, M. Martí-Renom, M. Madhusudhan, N. Mirkovic and A. Sali. (2000). Protein Structure Modeling for Structural Genomics. Nature Struct. Biol. Suppl. November, 986-990.
R. Sánchez and A. Sali. (1997). Advances in comparative protein structure modeling. Curr. Opin. Struct. Biol. 7, 206-214.
R. Sánchez and A. Sali. (1997). Evaluation of comparative protein structure modeling by MODELLER-3. Proteins: Struc. Func. and Gene. Suppl 1, 50-58.
M. Saqi, R. Russell and M. Sternberg. (1999). Misleading local sequence alignment: implications for comparative modelling. Protein Eng. 11, 627-630.
J. Sauder, J. Arthur and R. Dunbrack. (2000). Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins: Struc. Func. and Gene. 40, 6-22.
P. Shenkin, D. Yarmush, R. Fine, H. Wang and C. Levinthal. (1987). Predicting antibody hypervariable loop conformation: I. Ensembles of random conformations for ring-like structures. Biopolymers 26, 2053-2085.
H. Shirai, A. Kidera and H. Nakamura. (1999). H3-rules: identification of CDR-H3 structures in antibodies. FEBS Letters 455, 188-197.
M. Sippl. (1993). Recognition of errors in three-dimensional structures of proteins. Proteins: Struc. Func. and Gene. 17, 355-362.
K. Smith and B. Honig. (1994). Evaluation of the conformational free energies of loops in proteins. Proteins: Struc. Func. and Gene. 18, 119-132.
T. Smith and M. Waterman. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
M. Sternberg, P. Bates, L. Kelley and R. MacCallum. (1999). Progress in protein structure prediction: assessment of CASP3. Curr. Opin. Struct. Biol. 9, 368-373.
M. Sutcliffe, F. Hayes and T. Blundell. (1987). Knowledge-based modeling of homologous proteins, part II: rules for the conformations of substituted side-chains. Protein Eng. 1, 385-392.
M. Sutcliffe, F. Hayes, D. Carney and T. Blundell. (1987). Knowledge-based modeling of homologous proteins, part I. Three dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng. 1, 377-384.
W. Taylor. (1988). A flexible method to align large numbers of biological sequences. J. Mol. Evol. 28, 161-169.
S. Teichmann, C. Chothia, G. Church and J. Park. (2000). Fast assignments of protein structures to sequences using the intermediate sequence library. Bioinformatics 16, 117-124.
J. Thompson, D. Higgins and T. Gibson. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680.
J. Thompson, F. Plewniak and O. Poch. (1999). BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87-88.
J. Thompson, F. Plewniak and O. Poch. (1999). A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682-2690.
J. Thompson, F. Plewniak, J. Thierry and O. Poch. (2000). DbClustal: rapid and reliable global multiple alignments of protein sequence detected by database searches. Nucleic Acids Res. 28, 2919-2926.
C. Topham, N. Srinivasan, C. Thorpe, J. Overington and N. Kalsheker. (1994). Comparative modeling of major house dust mite allergen der p I: structure validation using an extended environmental amino acid propensity table. Protein Eng. 7, 869-894.
A. Torda. (1997). Perspectives in protein fold recognition. Curr. Opin. Struct. Biol. 7, 200-205.
A. Tramontano, C. Chothia and A. Lesk. (1989). Structural determinants of the conformations of medium sized loops in proteins. Proteins: Struc. Func. and Gene. 6, 382-394.
S. Vajda and C. DeLisi. (1990). Determining minimum energy conformations of polypeptides by dynamic programming. Biopolymers 29, 1755-1772.
M. Vasquez. (1996). Modeling side-chain conformation. Curr. Opin. Struct. Biol. 6, 217-221.
H. W. v. Vlijmen and M. Karplus. (1997). PDB-based protein loop prediction: parameters for selection and methods for optimization. J. Mol. Biol. 267, 975-1001.
J. Wojcik, J. Mornon and J. Chomilier. (1999). New efficient statistical sequence-dependent structure prediction of short to medium-sized protein loops based on an exhaustive loop classification. J. Mol. Biol. 289, 1469-1490.