SINGLE TEMPLATE PROTEIN MODELING

Acetate kinase from Rhizobium meliloti


INTRODUCTION
AIMS OF THE STUDY
MATERIALS AND METHODS
Sequences (swissprot entry)
Programs
RESULTS
Template selection
Resulting models
Model selection
Model optimization
DISCUSSION
CONCLUSIONS
FURTHER STUDIES
REFERENCES
ACKNOWLEDGEMENTS
PEOPLE



INTRODUCTION
 

    Acetate kinases are a protein family found mainly in microorganisms. They catalyze the conversion of acetate to acetyl phosphate in the presence of ATP and a divalent cation, a step in the conversion of organic matter to CO2 and CH4. Structurally, according to the SCOP classification, they belong to the alpha/beta class and to the ASKHA superfamily (acetate and sugar kinases/Hsc70/actin). Members of this superfamily share a conserved structural "bbbababa" core (b: beta strand, a: alpha helix) and an unusual epsilon conformation at Ala 330 (phi = 75.4º, psi = 175.3º) that initiates a turn of a helix in all members of the superfamily.
 
 


Fig.1 Topological diagram of the template. The "bbbababa" core mentioned above is shown in dark grey; the remaining secondary structure elements (subdomains) are insertions between particular elements of the beta sheet and are characteristic of each member of the superfamily.


    The structure of the acetate kinase from Rhizobium meliloti (P58382) has not yet been determined experimentally. We model it on a single template, the acetate kinase of Methanosarcina thermophila (1g99), whose structure was solved in 2000 at 2.50 Å resolution (Buss et al., 2001).

    Although the molecule is a dimer, both monomers are identical in sequence (100% identity, see dimer.aln), so we have used only one monomer of the template (chain A). All the Rasmol views below therefore show one monomer only (each monomer consists of a central beta sheet surrounded by alpha helices). The following view of the template shows the complete molecule in the so-called "bird" conformation: the body of the bird is formed by the C-terminal domains (a 7-stranded beta sheet and 11 helices in each monomer), and the wings by the N-terminal domains (an 8-stranded beta sheet and 8 helices in each monomer). ATP binds in the cleft between the two domains without forming any hydrogen bonds with the protein, which explains why the enzyme is not specific for a particular nucleotide triphosphate as the phosphoryl donor. The cleft closes and brings the two catalytic residues together (Glu384: one on each side of the molecule), so that they form the active site and participate directly in the phosphorylation.
 
 


Fig.2 Frontal and side views of the template's bird conformation (A and B, respectively)
 



AIMS OF THE STUDY
 

    To compare two modeling approaches: one based on primary sequence homology and one based on secondary structure prediction.
 



MATERIALS AND METHODS
 

Sequences (swissprot entry)
 
 
 

      P58382 Acetate kinase from Rhizobium meliloti
      P38502 Acetate kinase from Methanosarcina thermophila
      Q59331 Acetate kinase from Clostridium thermosaccharolyticum
      P37877 Acetate kinase from Bacillus subtilis
      P74879 Propionate kinase from Salmonella typhimurium
      P77845 Acetate kinase from Corynebacterium glutamicum
      P11868 Propionate kinase from Escherichia coli
      Q9X4M1 Acetate kinase from Lactobacillus sakei
      Q05619 Butyrate kinase from Clostridium beijerinckii

Programs
 

HMMER
CLUSTALW
TOPITS
MODELLER
RASMOL
PROSAII
PROCHECK
GRUMOS
ARCHTYPE
 



RESULTS
 

Template selection
 

    The different search strategies we use all return the same single template (1g99), with E-values ranging from e-218 to e-110.
 


    Since there is no template other than 1g99, the only alignment we can feed into Modeller is the one between the template sequence (taken from the PDB file) and the problem sequence. A second approach is to use the Topits server to make a secondary structure prediction for the problem sequence based on the information in the PDB file of the template (see the files containing these procedures, Alin.aln and topits.html). The server, however, extracts the template sequence from Swissprot and not from the PDB file. We want to preserve the Topits alignment but use the PDB sequence, because otherwise Modeller would not recognize the input file. We therefore align both template sequences (the Swissprot one and the PDB one) and add the gaps in the resulting file to the normal Clustalw alignment (Alin.aln), which then also contains the only gap returned by Topits (final.clw). Some gaps in the Clustalw alignment between the template and the problem sequence also have to be moved: those that fall at the end of an alpha helix (located thanks to the Topits prediction). We simply shift them slightly forward so that they do not cut the helix; cutting it would raise the overall energy profile, since these secondary structure elements stabilize the final tertiary structure (compare final.clw with pdb_problema.aln).
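    The gap adjustment described above was done by hand. As an illustration only, the same idea can be sketched in Python. The function names, the helix mask (one character per alignment column, 'H' for helix) and the toy sequence are assumptions made for this example; they are not taken from the Topits output or from our files.

```python
def helix_segments(mask):
    """Return (start, end) column ranges marked 'H' in the mask (end exclusive)."""
    segments, start = [], None
    for i, ch in enumerate(mask):
        if ch == "H" and start is None:
            start = i
        elif ch != "H" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


def shift_gaps_out_of_helices(aligned, mask, gap="-"):
    """Move gaps that fall inside helix columns to just after the helix.

    The residue order and the alignment length are preserved; only the
    position of the gaps changes.
    """
    if len(aligned) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    cols = list(aligned)
    for start, end in helix_segments(mask):
        if gap not in cols[start:end]:
            continue
        # Extend the window forward until it holds enough residues
        # to fill every helix column.
        needed = end - start
        residues, stop = [], start
        while stop < len(cols) and len(residues) < needed:
            if cols[stop] != gap:
                residues.append(cols[stop])
            stop += 1
        n_gaps = (stop - start) - len(residues)
        cols[start:stop] = residues + [gap] * n_gaps
    return "".join(cols)


# Toy example: the gap in column 4 sits inside a six-column helix.
print(shift_gaps_out_of_helices("ACD-EFGHIK", "HHHHHH----"))
```

    Moving a gap changes which residues are paired with the template, so the result should still be inspected by eye before it is passed to Modeller.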
 

Resulting models
 

    After converting the alignments from Clustalw format to PIR format (alin.pir and final.pir) and generating the respective .top input files (alin.top and topits.top), we run the Modeller program.
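    For readers unfamiliar with the PIR layout, the sketch below writes one alignment entry in the form we understand Modeller to expect: a ">P1;code" line, a header with ten colon-separated fields, and the aligned sequence terminated by "*". The function name, the codes "tmpl" and "targ", the residue numbers and the sequences are hypothetical; the field order should be checked against the manual of the Modeller version in use.

```python
def pir_entry(code, aligned_seq, kind="sequence", chain="", first="", last="", width=60):
    """Format one alignment entry in a PIR-style layout.

    Header fields (assumed order): entry type, code, first residue, chain,
    last residue, chain, protein name, source, resolution, R-factor.
    Unused fields are left empty.
    """
    fields = [kind, code, str(first), chain, str(last), chain, "", "", "", ""]
    header = ":".join(fields)
    body = aligned_seq + "*"  # the sequence ends with an asterisk
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;" + code, header] + lines) + "\n"


# Toy alignment with made-up sequences: template first, then the target.
template = pir_entry("tmpl", "ACD-EF", kind="structureX", chain="A", first=1, last=5)
target = pir_entry("targ", "ACDQEF")
print(template + target)
```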

    We obtain four models (obj1, obj2, obj3 and obj4) and the respective Rasmol views are:
 
 

[Figure panels: Sequence model 1, Sequence model 2, Topits model 1, Topits model 2]

Figs.3-6  Rasmol views of the objects 1,2,3 and 4, respectively
 
 


Fig.7 Rasmol view of the common superposed structure of the four previous models


Model selection
 

    The energy profiles calculated by Prosa II for each model (combined energy plots) show at least 3 peaks of positive energy: the most important one at about residue 80, and two more at about residues 230 and 250 (Prosa1, Prosa2, Prosa3 and Prosa4). Comparing the two sequence-based models (obj12prosa.jpg, code: obj1-yellow, obj2-red) and the two Topits models (obj34prosa.jpg, code: obj3-yellow, obj4-red), we decide to use those with the lowest energy profiles (obj2 and obj3) to continue with the optimization (in red in the Prosa II plots). The Ramachandran plots (from the Procheck analysis, at 2.5 Å) show on average 0.6% of residues in disallowed regions for both models, obj2 and obj3 (0.0% for the template, 1g99A.sum).

    The second and third models have the same number of bad contacts (3) in the Procheck output (P58382.sum, predict_h3170.sum), compared with 1 bad contact in the template (1g99A.sum).

    Of these two models (obj2 and obj3), the first has the better energies; nevertheless, we try to reduce the peaks of both using three different strategies: Grumos, Archtype and Molecular Dynamics.
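    We read the peaks and compared the models by eye on the Prosa II plots. A minimal sketch of the same two steps, locating stretches of positive energy and picking the profile with the lower mean energy, is given below. The function names and all numbers are invented for the example; they are not Prosa II output, and the sketch does not reproduce the window averaging that Prosa II applies.

```python
def _summarise(energies, start, end):
    """Describe one positive stretch; positions are 1-based and inclusive."""
    top = max(range(start, end), key=lambda i: energies[i])
    return (start + 1, end, top + 1, energies[top])


def positive_peaks(energies, threshold=0.0):
    """Return (first, last, peak position, peak value) for each stretch above threshold."""
    peaks, start = [], None
    for i, value in enumerate(energies):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            peaks.append(_summarise(energies, start, i))
            start = None
    if start is not None:
        peaks.append(_summarise(energies, start, len(energies)))
    return peaks


def lower_energy_model(profiles):
    """Return the name of the profile with the lowest mean energy."""
    return min(profiles, key=lambda name: sum(profiles[name]) / len(profiles[name]))


# Hypothetical per-residue energies for two short models.
profiles = {
    "model_a": [-1.0, -0.5, 0.2, 0.8, 0.1, -0.3, -1.2, 0.4, -0.6],
    "model_b": [-1.1, -0.7, 0.1, 0.5, -0.2, -0.4, -1.3, 0.2, -0.8],
}
print(positive_peaks(profiles["model_a"]))
print(lower_energy_model(profiles))
```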
 

Model optimization
 

    The Grumos optimization (prior preparation of the model, no disulfide bridges, water-free system, Steepest Descent energy minimization, 10 optimizations, one every 1000 steps, without shake, interactions between groups) does not substantially improve either of the two selected models.

    The peaks still appear in the combined Prosa II plots (Grumos2, code: Obj2-yellow, Opt2-red; Grumos3, code: Obj3-yellow, Opt3-red). There is a small decrease in the latter two peaks in the optimization of the sequence-based model, but the first and worst peak even increases.
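    To make the Steepest Descent idea concrete, here is a toy sketch: each step moves the coordinates a short distance against the energy gradient until the gradient vanishes. The system (two particles on a line joined by a harmonic spring), the fixed step size and every name in the code are assumptions for illustration; the Grumos implementation works on a full force field and is not reproduced here.

```python
def steepest_descent(gradient, x0, step=0.1, max_steps=1000, tol=1e-8):
    """Minimise an energy by repeatedly stepping against its gradient."""
    x = list(x0)
    for _ in range(max_steps):
        g = gradient(x)
        # Stop once the gradient norm is negligible.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def spring_gradient(x, k=1.0, d0=1.5):
    """Gradient of E = 0.5 * k * (x[1] - x[0] - d0)**2 for two particles on a line."""
    stretch = (x[1] - x[0]) - d0
    return [-k * stretch, k * stretch]


# Start with the spring stretched to twice its rest length.
relaxed = steepest_descent(spring_gradient, [0.0, 3.0])
print(relaxed, relaxed[1] - relaxed[0])
```

    Steepest descent only moves downhill from the starting structure, so it relaxes local strain but cannot cross energy barriers; this is consistent with the persistence of the large peaks after optimization.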

    We cannot use Archtype to minimize the energy of the first 100 residues, because the loop found in this region is 12 aa long; we wait for the Molecular Dynamics run to see whether anything can be done with it. We try the program anyway, but the results are discouraging. The protein has 5 loops at the peak sites, of which only four are short enough to be analyzed with Archtype (less than 9 residues long). Starting with the fourth loop (parameters) and lowering the cutoff from 60 to 45 (which is rather inefficient), the program returns a list of templates, the best of which is 1mro_B (PDB entry), cluster_k, with a score of 100. However, after pasting the loop region into the normal template-problem sequence alignment (and making the modifications needed to run Modeller), the overall energy profile rises (plot, code: Lup-magenta, Lup2-green, Opt2-red). Since the resulting plot does not solve the energy problem, and given the low cutoff used, we decide not to pursue this strategy.
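    The length filter applied above (only loops shorter than 9 residues can be submitted) can be sketched as follows. The secondary structure string, its alphabet ('H' helix, 'E' strand, '-' loop) and the function names are assumptions for the example and do not come from Archtype itself.

```python
from itertools import groupby


def find_loops(ss, loop_char="-"):
    """Return (first, last, length) for each loop flanked by secondary structure.

    Positions are 1-based and inclusive; unstructured termini are skipped.
    """
    loops, pos = [], 0
    for ch, run in groupby(ss):
        length = len(list(run))
        first, last = pos + 1, pos + length
        if ch == loop_char and pos > 0 and last < len(ss):
            loops.append((first, last, length))
        pos += length
    return loops


def analysable_loops(ss, max_len=8):
    """Keep only the loops short enough to submit (fewer than 9 residues)."""
    return [loop for loop in find_loops(ss) if loop[2] <= max_len]


# Made-up secondary structure string with loops of 3, 12 and 2 residues.
ss = "HHHH---EEEE------------HHH--EE"
print(find_loops(ss))
print(analysable_loops(ss))
```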

  ! Since this modeling exercise is based on the practical courses of the subject Structural Biology, our knowledge has almost reached its limits.

    The last resort in the model optimization is to run a Molecular Dynamics analysis with the corresponding option in the Grumos program (default temperature, no periodicity, rotation 50, 10 ps trajectory time, 10 output files, 100 ps total dynamics time). We do the same for both models (the sequence-based one and the one based on secondary structure prediction). The 10 resulting output files, superposed with XAM, show the fragment with the worst energy profile (1-100 aa) in a rigid conformational state in both models (green), so at the temperature used in the dynamics (278 K) the secondary structure in this region does not move. The second region studied (230-255 aa) appears to be very flexible in the first dynamics run (Dynamics2, magenta), which starts from a fairly successful energy optimization. Surprisingly, in the other run, which starts from an unsuccessful energy optimization, this region does not seem to be as flexible as in the first case (Dynamics2; Dynamics3, code: 1-100 aa in green and 230-255 aa in magenta).
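    We judged rigidity and flexibility visually from the superposed output files. A numerical counterpart, not used in this work, is the root-mean-square fluctuation (RMSF) of each atom around its mean position over the frames. The sketch below assumes the frames are already superposed and hold one atom (for example the alpha carbon) per residue; the function names and the coordinates are invented.

```python
from math import sqrt


def rmsf(frames):
    """Per-atom root-mean-square fluctuation over superposed frames.

    frames: list of frames, each a list of (x, y, z) tuples in the same atom order.
    """
    n_frames = len(frames)
    result = []
    for a in range(len(frames[0])):
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(sqrt(msd))
    return result


def mean_rmsf(values, first, last):
    """Average fluctuation over residues first..last (1-based, inclusive)."""
    segment = values[first - 1:last]
    return sum(segment) / len(segment)


# Two toy frames with two atoms: the first atom is rigid, the second moves.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(frames)
print(values, mean_rmsf(values, 1, 2))
```

    With real trajectories, comparing the mean fluctuation over residues 1-100 and 230-255 would put a number on the difference seen in the superposed views.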
 




DISCUSSION
 

    Because the catalytic center is the part of a protein that matters most for its function, we continue the analysis by comparing the important residues of the template with those found in the model. The model chosen is obj2, the sequence-based model.

    According to the article of reference (Buss et al., 2001), the template presents 4 important regions:
 
 
 

Region                  Sub-region                             Involved amino acids                         Color
ADP binding region      Adenine binding (hydrophobic pocket)   A285, I332, I339, D283                       Orange
                        Ribose binding                         F284                                         Green
                        alpha-phosphate binding                G331                                         Pink
                        beta-phosphate binding                 H208, S209, G210                             Cyan
Substrate binding       Acetyl phosphate binding               R91, H123, H180 (phosphate moiety)           Brown
                        Acetate binding                        V93, F179, M228 (methyl group) P232, R241    Yellow
Cation binding (Mg2+)                                          D148                                         Magenta
Phosphorylation site                                           E384                                         Red

    The comparison of the two structures (the template and the model) shows a conserved catalytic site, with some changes in specific residues due to a shift between the sequences in the alignment:
 


[Figure panels: General view, Side 1 view, Side 2 view, Back view]

Figs.8-11 General, two side and back Rasmol views of the active site in the optimized sequence-based model, respectively


    We observe that the important regions are conserved in the optimized sequence-based model.

    The next important feature to check in our model is the unusual epsilon conformation at residue 325 (the alanine found at position 330 in the template). Another superposition of the two structures confirms this conformation, which is absolutely conserved among all members of the ASKHA superfamily.
 
 


Fig.12 Rasmol view of the conserved epsilon conformation in residues 330 (Template, in blue) and 325 (Model, in green); both are Alanines
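    The conservation was checked visually by superposition. It could also be checked numerically by computing the backbone dihedral angles of the residue: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below computes a dihedral angle from four points and compares a (phi, psi) pair with the template values quoted in the introduction (75.4º, 175.3º). The 30º tolerance, the function names and the test coordinates are assumptions for illustration.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))


def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def close_to_reference(phi, psi, ref=(75.4, 175.3), tol=30.0):
    """True if (phi, psi) lies within tol degrees of the reference pair."""
    return abs(angle_diff(phi, ref[0])) <= tol and abs(angle_diff(psi, ref[1])) <= tol


# Simple geometric check of the dihedral function on made-up points.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
# Hypothetical angles for a model residue, compared with the template values.
print(close_to_reference(80.0, -178.0))
```

    The periodic difference matters here because psi is close to 180º: values such as 175º and -178º are only 7º apart.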
 



CONCLUSIONS
 

   None of the strategies used gives a perfect model of the problem sequence in terms of the energy profile. The Topits-based model should have been the best, since it introduces structural information from the beginning, but we find that small changes in the sequence, in non-functional regions of the protein, can greatly influence the end result. Nevertheless, given the high homology between the two sequences (92%), the important features of the protein are well modeled (membership in the superfamily and a functional active site).
 




FURTHER STUDIES
 

   Despite the low cutoff, the remaining loops could be further optimized with the Archtype program.
   The temperature of the Molecular Dynamics analysis could be raised to allow some problematic regions to move and adopt a more stable conformational state.
   There are probably other programs or approaches that would give a better model, but ours is a limited assignment.
   Another possible template has recently been determined by theoretical methods; the PDB file was released in the database today, 12th June 2002, by the group of R. Sagajkar and R. Ramchandra (1LP2, acetate kinase of Pasteurella multocida). A modeling that includes this second template could be of great help. Unfortunately, the article has not been published yet.
 




REFERENCES
 

Buss, K. A., Cooper, D. R., et al. 2001. Urkinase: structure of acetate kinase, a member of the ASKHA superfamily of phosphotransferases. Journal of Bacteriology 183: 680-686.

TOPITS server

ARCHTYPE server
 




ACKNOWLEDGEMENTS
 

    To our devoted teachers in Structural Biology, especially Nuria Centeno, for her patience and encouragement.
 




PEOPLE
 

Isabel Lorenzo Sánchez
Dominique Monferrer Ventura
Ángel Núñez Pagán
 




Barcelona, 12th June 2002