Difference between revisions of "Structure-Based Mutation Analysis Hemochromatosis"

From Bioinformatikpedia
(Gromacs)
(Minimise)
 
(27 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
[[Hemochromatosis|Hemochromatosis]]>>[[Structure-Based Mutation Analysis Hemochromatosis|Task 7: Structure-based mutation analysis]]
 
[[Hemochromatosis|Hemochromatosis]]>>[[Structure-Based Mutation Analysis Hemochromatosis|Task 7: Structure-based mutation analysis]]
 
<br style="clear:both;">
 
 
== Riddle of the task ==
 
 
 
It took you over an hour to figure out the [[User:Bernhoferm#Riddle_2|right combination]], but the door is finally open. The sight is unbelievable. Inside the next room lies a treasure beyond imagination: heaps of gems, gold, and jewelry. Exotic furs, marvelous paintings, and many more. You step inside to collect what should now be yours...
 
 
The moment you reach out for the first piece of treasure it vanishes into thin air. ALL of it. The treasure was just an illusion... You look around and see another entrance into the room. A collapsed one. Across the room is a person, kneeling before another door. You shout... No answer. He didn't even move. As you get closer to him you see that, whoever it was, is dead. His skin mummified due to the dry air. Next to him an old leathery backpack. You reach out to take it as you notice small fragments on the floor. They look like tiny bits of red glass. Now that you're in front of him you also see many of these splinters buried inside the person's flesh. Within the backpack you find several glass orbs: a blue one, a yellow one, a green one, an orange one, a cyan one, and a violet one. In front of the dead man, at the bottom of the door, you notice three slots. Each of them about the size of the orbs. One of them is red, the second one orange, and the third one yellow...
 
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 16: Line 7:
   
 
Detailed description: [[Task_7_-_Structure-based_mutation_analysis|Structure-based mutation analysis]]
 
Detailed description: [[Task_7_-_Structure-based_mutation_analysis|Structure-based mutation analysis]]
  +
  +
In this task we employed several methods for structure-based predictions of mutation effects. The methods were SCWRL, FoldX, Minimise, and Gromacs. After the generation of models for each mutation and method we used PyMol and energy statistics to classify the mutations into disease causing ones and benign mutations.
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 43: Line 36:
   
   
In order to analyze the effects of the mutations we have created several models with SCWRL and FoldX. These models were then superimposed onto the reference structure (1a6zC). Our analysis included changes in the hydrogen bonds, differences in the potential energy, and surface changes (unless the residue is buried within the protein). The color codes in the following section are:
+
In order to analyze the effects of the mutations we have created several models with SCWRL<ref name="scwrl">Georgii G. Krivov, Maxim V. Shapovalov, and Roland L. Dunbrack, Jr. (2009): Improved prediction of protein side-chain conformations with SCWRL4. PMID 19603484</ref> and FoldX<ref name="foldx1">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005): The FoldX web server: an online force field. Nucleic Acids Research, vol 33, pW382-8. PMID 15980494</ref><ref name="foldx2">Schymkowitz J. W., Rousseau F., Martins I. C., Ferkinghoff-Borg J., Stricher F., Serrano L. (2005): Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA, vol 102, p 10147-52. PMID 16006526</ref><ref name="foldx3">Guerois R., Nielsen J. E., Serrano L. (2002): Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, vol 320, p369-87 PMID 12079393</ref>. These models were then superimposed onto the reference structure (1a6zC). Our analysis included changes in the hydrogen bonds, differences in the potential energy, and surface changes (unless the residue is buried within the protein). The color codes in the following section are:
 
* green: reference (1a6zC)
 
* green: reference (1a6zC)
 
* cyan: SCWRL wildtype
 
* cyan: SCWRL wildtype
Line 51: Line 44:
   
   
An overview table containing all energy values for the models and their wildtypes can be found [[Hemo_Task7_SCWRL_FoldX_Table|here]].
+
An overview table containing all energy values for the models and their wildtypes can be found [[Hemo_Task7_SCWRL_FoldX_Table|here]]. In the following section only the normalized energy change will be given. This number represents the difference in energy compared to the wildtype model in one-tenth of a percent (i.e. 17 means +1.7%).
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 293: Line 286:
   
   
Next we used Minimise to minimize (lame pun...) the energy for each of the 31 models created with SCWRL (10 mutations + WT) and FoldX (10 mutations and wildtypes). Each model was consecutively minimized five times (i.e. the output from the previous iteration was used as input for the next one). A table with the absolute energy values can be found [[Hemo_Task7_Minimise_Table|here]].
+
Next we used Minimise to minimize the energy for each of the 31 models created with SCWRL (10 mutations + WT) and FoldX (10 mutations and wildtypes). Each model was consecutively minimized five times (i.e. the output from the previous iteration was used as input for the next one). A table with the absolute energy values can be found [[Hemo_Task7_Minimise_Table|here]].
   
 
The median energy change per iteration in relation to the first iteration is shown in <xr id="energy_gain"/>. It shows that additional iterations eventually fail to improve the models and even make them worse. For the FoldX models only the second iteration improves the models; every iteration thereafter leaves them worse than they were after the first one. The SCWRL models stop improving after the third iteration, and after the fifth iteration they are about as good as after the first.
 
The median energy change per iteration in relation to the first iteration is shown in <xr id="energy_gain"/>. It shows that additional iterations eventually fail to improve the models and even make them worse. For the FoldX models only the second iteration improves the models; every iteration thereafter leaves them worse than they were after the first one. The SCWRL models stop improving after the third iteration, and after the fifth iteration they are about as good as after the first.
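The consecutive minimization described above (each output becomes the next input) can be sketched as follows. This is only an illustration: `minimise_once` is a hypothetical stand-in for the actual Minimise call, whose exact command line is not given here, and the file name is made up.

```python
from typing import Callable, List


def chain_minimise(start_model: str,
                   minimise_once: Callable[[str], str],
                   iterations: int = 5) -> List[str]:
    """Run `minimise_once` repeatedly, feeding each output into the next run.

    Returns the outputs of all iterations in order, so that each
    iteration can later be compared with the first one.
    """
    outputs = []
    current = start_model
    for _ in range(iterations):
        current = minimise_once(current)  # output of run i is the input of run i+1
        outputs.append(current)
    return outputs


def fake_minimise(path: str) -> str:
    """Hypothetical placeholder: only tags the file name, does no real minimization."""
    return path + ".min"


models = chain_minimise("R224W_foldx.pdb", fake_minimise)
```

Keeping every intermediate output makes it possible to pick the second iteration afterwards instead of always using the last one.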
Line 310: Line 303:
   
   
  +
In order to compare the Minimise results for SCWRL and FoldX with each other and with the original values given by the modeling programs, we chose the second-iteration results, as these showed an improvement in energy for all models. The energy values were then normalized again (cf. Section: [[Structure-Based_Mutation_Analysis_Hemochromatosis#SCWRL_and_FoldX|SCWRL and FoldX]]). An overview table for these values can be found [[Hemo_Task7_Minimise_Table#label-it2_norm|here]]. <xr id="it2_comparison"/> shows a comparison between the two normalized values. After the minimization every mutation shows the same sign of energy change whether it is a SCWRL or a FoldX model, and even the magnitudes of the changes are quite similar for both methods. In contrast, the new values show no apparent correlation with their original ones. This suggests that Minimise's result is almost independent of the input model. The new values also agree well with the mutations' effects (i.e. positive = malign, negative = benign): only M35T and V53M would result in false predictions (assuming R224W is indeed benign).
 
  +
 
  +
<figtable id="it2_comparison">
  +
{| class="wikitable" style="width: 600px; margin: 1em 1em 1em 0; border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000"
  +
|-
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |Mutation
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |SCWRL norm.
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |original norm.
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |FoldX norm.
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |original norm.
  +
! style="text-align:left; border-style: solid; border-width: 1px 1px 2px 1px" |Validation
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |M35T
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |2.431904837
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |17.72037844
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |2.656691758
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |7.744433688
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |benign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |V53M
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-1.390424842
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |70.34982121
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-2.620524491
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |3.705278503
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |G93R
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |7.858037681
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |16.14153574
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |10.91059879
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-3.772906935
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |Q127H
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |5.379939254
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |17.91113835
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |5.091157565
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-3.676245328
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |A162S
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-1.378597488
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |38.09921951
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-1.029897803
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |6.643335904
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |benign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |L183P
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |19.9535606
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |0.284110511
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |13.45154088
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |33.99673341
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |T217I
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-1.748299106
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |48.10396821
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-0.268900854
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |9.312876843
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |benign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |R224W
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-8.539391409
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |53.50612664
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |-10.18388517
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |2.104124083
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |benign (uncertain)
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |E277K
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |9.191026282
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |122.4069842
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |1.966468695
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |25.85492841
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |C282S
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |1.037044876
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |34.74671548
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |1.456294229
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |46.69076258
  +
| style="border-style: solid; border-width: 1px 1px 1px 1px" |malign
  +
|-
  +
|+ style="caption-side: top; text-align: left" | <font size=1>'''Table 12''': Comparison between the normalized values for the second iteration of Minimise and the values for the original models from SCWRL and FoldX. The last column shows the actual annotation of each mutation. For R224W it is uncertain whether the mutation is indeed benign or has simply not yet been classified as malign.
  +
|}
  +
</figtable>
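The sign rule applied to Table 12 (positive = malign, negative = benign) can be checked with a short sketch. The values are copied from the SCWRL and FoldX columns of the table; the function and variable names are our own, and R224W is treated as benign, as assumed in the text.

```python
# Normalized energy changes after the second Minimise iteration (Table 12):
# mutation -> (SCWRL norm., FoldX norm., annotation)
table12 = {
    "M35T": (2.431904837, 2.656691758, "benign"),
    "V53M": (-1.390424842, -2.620524491, "malign"),
    "G93R": (7.858037681, 10.91059879, "malign"),
    "Q127H": (5.379939254, 5.091157565, "malign"),
    "A162S": (-1.378597488, -1.029897803, "benign"),
    "L183P": (19.9535606, 13.45154088, "malign"),
    "T217I": (-1.748299106, -0.268900854, "benign"),
    "R224W": (-8.539391409, -10.18388517, "benign"),  # annotation uncertain
    "E277K": (9.191026282, 1.966468695, "malign"),
    "C282S": (1.037044876, 1.456294229, "malign"),
}


def predict(value: float) -> str:
    """Sign rule from the text: positive change = malign, negative = benign."""
    return "malign" if value > 0 else "benign"


# Do the SCWRL- and FoldX-based values agree in sign for every mutation?
same_sign = all((s > 0) == (f > 0) for s, f, _ in table12.values())

# Mutations whose prediction (SCWRL column) contradicts the annotation.
wrong = sorted(m for m, (s, _, label) in table12.items() if predict(s) != label)
accuracy = 1 - len(wrong) / len(table12)
```

Because both columns share the same sign for every mutation, using the FoldX column instead gives the same two misclassified mutations.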
  +
  +
  +
<figure id="R224W_min_fx">
  +
[[File:hemo_R224W_min_foldx.gif|thumb|400px|<font size=1>'''Figure 2:''' Changes in structure for the R224W FoldX model over 5 iterations with Minimise. Residues within 5Å are also shown as sticks. Colors are: green (1a6zC), blue (FoldX model), cyan (iteration 1), yellow (iteration 2), orange (iteration 3), red (iteration 4), and magenta (iteration 5).]]
  +
</figure>
  +
  +
  +
<xr id="R224W_min_fx"/> shows the changes in the structure of the R224W mutation based on the FoldX model. The first and second iterations of Minimise have the biggest impact on the protein structure while improving its potential energy. The following iterations make only minor changes and also make the potential energy worse. Although this is only one example, the other mutation models show about the same behavior.
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 317: Line 401:
 
== Gromacs ==
 
== Gromacs ==
   
For Gromacs we used the models created with [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Structure-Based_Mutation_Analysis_Hemochromatosis#SCWRL_and_FoldX SCWRL and FoldX]. (This replaces steps 1 to 3 in the [[Task_7_-_Structure-based_mutation_analysis#Gromacs|task description]].)
+
For Gromacs we used the models created with [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Structure-Based_Mutation_Analysis_Hemochromatosis#SCWRL_and_FoldX SCWRL and FoldX] that were repaired with repairPDB, as for the [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Task7_Hemochromatosis_Protocol#Minimise Minimise step]. (This replaces steps 1 to 3 in the [[Task_7_-_Structure-based_mutation_analysis#Gromacs|task description]].)
   
   
Line 343: Line 427:
   
   
 
The output of g_energy can be seen in the following pictures. These are only for chosen models. The total amount of pictures can be found [[hemochromatosis_gromacs_energy_table_pictures#Pictures|here]]. The final calculated energies can also be found on that page under section [[hemochromatosis_gromacs_energy_table_pictures#Tables|"Tables"]].
 
   
 
<!--
 
<!--
Line 360: Line 442:
 
| align="right" | [[File:Hemochromatosis mdrun_runtime2.png|thumb|200px|Runtime of mdrun with different number of steps, stepwidth 1]]
 
| align="right" | [[File:Hemochromatosis mdrun_runtime2.png|thumb|200px|Runtime of mdrun with different number of steps, stepwidth 1]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table TODO:''' Plots of runtimes of mdrun against the number of steps.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 12:''' Plots of runtimes of mdrun against the number of steps.
 
|}
 
|}
  +
</figtable>
 
<br style="clear:both;">
 
<br style="clear:both;">
   
   
The pictures in <xr id="gromacs_energies"/> show the resulting calculated values of bonds, angles and potential based on the number of steps taken. Here you can see that at the beginning the potential and bond values are very high and with each step (for about the first 20 steps) improve to values that seem to be near the ones that are calculated in a later step. For the angles: they start at a value that is found in the end, but at first (about the first 20 steps) are raised and then reduced again.
+
The pictures in <xr id="gromacs_energies"/> show the calculated bond, angle and potential values as a function of the number of steps for selected models. The complete set of plots (for all models) can be found [[hemochromatosis_gromacs_energy_table_pictures#Pictures|here]]. The final calculated energies can also be found on that page under the section [[hemochromatosis_gromacs_energy_table_pictures#Tables|"Tables"]].
  +
  +
In <xr id="gromacs_energies"/> you can see that the potential and bond values start out very high and, over roughly the first 20 steps, improve to values close to those calculated in later steps. The angle values start near their final value, but rise over roughly the first 20 steps and then fall again.
   
 
The potential decreases steadily over the number of steps. At the same time the bond and angle values increase.
 
The potential decreases steadily over the number of steps. At the same time the bond and angle values increase.
   
 
As the potential is the only value that decreases continuously over time, we use it to predict the disease-causing mutations.
 
As the potential is the only value that decreases continuously over time, we use it to predict the disease-causing mutations.
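As a rough illustration of how the potential could be read from g_energy output and compared between a mutant and its wildtype, here is a minimal sketch. It relies on the .xvg convention that lines starting with `#` or `@` are comments or plot metadata; the column layout (step in column 0, potential in column 1) and all numbers are made up for the example.

```python
def read_xvg(text: str) -> list:
    """Parse the numeric rows of an .xvg file, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(x) for x in line.split()])
    return rows


def final_value(rows: list, column: int) -> float:
    """Value of the given column in the last recorded step."""
    return rows[-1][column]


# Hypothetical g_energy-style output (values invented for illustration).
wildtype_xvg = """# hypothetical wildtype run
@ s0 legend "Potential"
0 -1000.0
1 -1500.0
"""
mutant_xvg = """# hypothetical mutant run
@ s0 legend "Potential"
0 -900.0
1 -1450.0
"""

# Change in potential of the mutant relative to its wildtype model.
delta = final_value(read_xvg(mutant_xvg), 1) - final_value(read_xvg(wildtype_xvg), 1)
```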
  +
  +
<xr id="delta_potential"/> lists the change in potential observed when comparing each mutation model against the wildtype model. The models used were all created with FoldX. <!--We chose these because, as noted in Section [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Structure-Based_Mutation_Analysis_Hemochromatosis#SCWRL_and_FoldX SCWRL and FoldX], the energies of the models created with FoldX seem to be lower in general. Although these are not always lower, but by using only FoldX structures we make sure to have the values comparable. -->
   
 
<figtable id="gromacs_energies">
 
<figtable id="gromacs_energies">
 
{| class="wikitable" style="float: right; margin: 0 0 0 1em; border: 2px solid darkgray;" cellpadding="2"
 
{| class="wikitable" style="float: right; margin: 0 0 0 1em; border: 2px solid darkgray;" cellpadding="2"
 
! scope="row" align="left" |
 
! scope="row" align="left" |
  +
| align="right" | [[File:Hemochromatosis Amber03foldX A162S.png|thumb|200px|A162S mutant, calculated with foldX and the Amber03 forcefield.]]
 
| align="right" | [[File:Hemochromatosis Amber03foldX A162S_WT.png|thumb|200px|Wildtype based on the foldX A162S mutation. The force field used was Amber03.]]
 
| align="right" | [[File:Hemochromatosis Amber03foldX A162S_WT.png|thumb|200px|Wildtype based on the foldX A162S mutation. The force field used was Amber03.]]
  +
| align="right" | [[File:Hemochromatosis Amber03foldX A162S.png|thumb|200px|Mutant, calculated with foldX and the Amber03 forcefield.]]
 
 
|-
 
|-
 
! scope="row" align="left" |
 
! scope="row" align="left" |
Line 385: Line 473:
 
| align="right"| [[File:Hemochromatosis Amber99sb-ildnscwrl_WT.png|thumb|200px|Wildtype based on the SCWRL method. The force field used was Amber99sb-ildn.]]
 
| align="right"| [[File:Hemochromatosis Amber99sb-ildnscwrl_WT.png|thumb|200px|Wildtype based on the SCWRL method. The force field used was Amber99sb-ildn.]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table TODO:''' Plots of the energy values (bond, angle and potential) of the model .
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 13:''' Plots of the energy values (bond, angle and potential) of the models.
 
|}
 
|}
  +
</figtable>
   
  +
<figtable id="delta_potential">
 
<figtable id="amber03-table">
 
 
{| class="wikitable" style="border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000"
 
{| class="wikitable" style="border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000"
 
|-
 
|-
Line 436: Line 524:
 
| malign
 
| malign
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table TODO:''' change in potential when comparing the mutated foldX models with the wildtype ones.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 14:''' change in potential when comparing the mutated foldX models with the wildtype ones.
 
|}
 
|}
  +
</figtable>
 
   
 
Based on our data, a cutoff of +/-175 for the change in potential seems to work best (below 175: predicted as benign, otherwise malign). In our case this would lead to an accuracy of 80% (90% if R224W is in fact malign). However, one should keep in mind that we only have 10 mutations here. Also, the potential does not correlate that well with the benign/malign state: the C282S mutation has a change in potential of only 32 (which would suggest the same structural attributes as the wildtype) but is classified as malign.
 
Based on our data, a cutoff of +/-175 for the change in potential seems to work best (below 175: predicted as benign, otherwise malign). In our case this would lead to an accuracy of 80% (90% if R224W is in fact malign). However, one should keep in mind that we only have 10 mutations here. Also, the potential does not correlate that well with the benign/malign state: the C282S mutation has a change in potential of only 32 (which would suggest the same structural attributes as the wildtype) but is classified as malign.
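The cutoff rule can be written down as a small sketch. We assume here that "+/-175" means the absolute value of the change in potential is compared with the cutoff; the function name is our own. The only value used is the C282S change of 32 quoted in the text.

```python
def predict_from_delta(delta_potential: float, cutoff: float = 175.0) -> str:
    """Cutoff rule from the text: |change| below the cutoff = benign, else malign.

    Assumption: the sign of the change is ignored ("+/-175").
    """
    return "benign" if abs(delta_potential) < cutoff else "malign"


# C282S has a change in potential of only 32, so the rule predicts benign
# although the mutation is annotated as malign (one of the misclassifications).
c282s_prediction = predict_from_delta(32)
```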
 
<br style="clear:both;">
 
<br style="clear:both;">
   
== Conclusion ==
 
   
   
  +
In addition, we computed potential values for different force fields. It is noticeable that the amber99sb-ildn force field yields a much lower potential (~ -42000) than amber03 and charmm27 (~ -38800) (see [[hemochromatosis_gromacs_energy_table_pictures#Tables|"tables"]]). On the other hand, the amber99sb-ildn force field yields higher bond and angle values than the amber03 force field, whereas charmm27 seems to report only bond values and thereby reaches about the same potential as amber03.
Maybe?
 
   
 
<br style="clear:both;">
 
<br style="clear:both;">

Latest revision as of 14:44, 26 June 2012



Short task description

Detailed description: Structure-based mutation analysis

In this task we employed several methods for structure-based predictions of mutation effects. The methods were SCWRL, FoldX, Minimise, and Gromacs. After the generation of models for each mutation and method we used PyMol and energy statistics to classify the mutations into disease causing ones and benign mutations.


Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.


Structure selection and mapping of the mutations

<figure id="mut_map">

Figure 1: M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S mapped onto 1a6zC. Mutations are shown in sticks representation and colored red. Glycosylation sites are colored cyan. Disulfide bonds are colored orange and also shown as sticks.

</figure>

There are only two structures available for HFE at PDB: 1a6z and 1de4. We chose 1a6z for this task as it has the better resolution (2.6 Å instead of 2.8 Å) and contains only beta-2-microglobulin in addition to HFE, whereas in 1de4 HFE is complexed with the transferrin receptor (TFR). All of the mutations from the previous task (M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S) are included in the PDB structure (residues 26-297).

<xr id="mut_map"/> shows a three-dimensional mapping of the mutations (red) onto 1a6zC. Glycosylation sites (cyan) and disulfide bonds (orange) are also indicated. The only such feature that is directly affected by a mutation is the disulfide bond formed by C225 and C282, where C282 is mutated to serine. However, Q127H, L183P, and R224W are quite close to the glycosylation site at 130 and the two disulfide bonds (C124-C187, C225-C282) and might therefore affect them indirectly.



SCWRL and FoldX

In order to analyze the effects of the mutations we have created several models with SCWRL<ref name="scwrl">Georgii G. Krivov, Maxim V. Shapovalov, and Roland L. Dunbrack, Jr. (2009): Improved prediction of protein side-chain conformations with SCWRL4. PMID 19603484</ref> and FoldX<ref name="foldx1">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005): The FoldX web server: an online force field. Nucleic Acids Research, vol 33, pW382-8. PMID 15980494</ref><ref name="foldx2">Schymkowitz J. W., Rousseau F., Martins I. C., Ferkinghoff-Borg J., Stricher F., Serrano L. (2005): Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA, vol 102, p 10147-52. PMID 16006526</ref><ref name="foldx3">Guerois R., Nielsen J. E., Serrano L. (2002): Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, vol 320, p369-87 PMID 12079393</ref>. These models were then superimposed onto the reference structure (1a6zC). Our analysis included changes in the hydrogen bonds, differences in the potential energy, and surface changes (unless the residue is buried within the protein). The color codes in the following section are:

  • green: reference (1a6zC)
  • cyan: SCWRL wildtype
  • magenta: SCWRL mutant
  • orange: FoldX wildtype
  • red: FoldX mutant


An overview table containing all energy values for the models and their wildtypes can be found here. In the following section only the normalized energy change will be given. This number represents the difference in energy compared to the wildtype model in one-tenth of a percent (i.e. 17 means +1.7%).
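The normalization can be illustrated with a short sketch. The exact formula is not given in the text, so the version below (difference to the wildtype energy, relative to the magnitude of the wildtype energy, in tenths of a percent) is an assumption, as are the function name and the example energies.

```python
def normalized_change(e_mutant: float, e_wildtype: float) -> float:
    """Energy change relative to the wildtype, in one-tenth of a percent.

    Assumed convention: a positive result means the mutant energy is
    higher than the wildtype energy. Dividing by the absolute wildtype
    energy keeps that sign meaningful for negative energies.
    """
    return 1000.0 * (e_mutant - e_wildtype) / abs(e_wildtype)


# Made-up energies: the mutant is 1.7% higher than the wildtype.
example = normalized_change(-983.0, -1000.0)  # 17.0, i.e. +1.7%
```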


M35T

<figtable id="M35T_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 1: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for M35T. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 17.720
  • FoldX energy (norm.): 7.744


The wildtype residue M35 is part of a beta sheet complex in the MHC I domain and forms two hydrogen bonds to a neighboring beta sheet (cf. <xr id="M35T_pymol"/>). Both of these hydrogen bonds are preserved in the mutant models (SCWRL and FoldX). The FoldX model uses a slightly different rotamer, though, which enables it to form an additional hydrogen bond to the previous residue. This might increase the stability compared to the wildtype. The changes to the surface due to the mutation are only minor and should not cause any problems. The surface model also shows that FoldX uses a slightly different rotamer for the wildtype model. The energy values likewise indicate only minor changes in the whole model. Therefore this mutation should be considered non-disease-causing.


V53M

<figtable id="V53M_pymol">

SCWRL.
FoldX.
Table 2: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for V53M. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 70.349
  • FoldX energy (norm.): 3.705


V53M marks the transition of a beta sheet into a turn within the MHC I domain and forms two hydrogen bonds to the beginning of the next beta sheet (cf. <xr id="V53M_pymol"/>). Both mutant models retain these bonds and do not form additional ones. As this residue is buried within the protein, there are no changes to the surface, but the residue loses its strong hydrophobic character, which would force it into the protein during translation/folding. Even though the rotamers used by SCWRL and FoldX differ only slightly, the difference in the energy change is large. While FoldX would not indicate that V53M is disease causing, SCWRL's energy change does. The mutation is quite hard to classify. However, considering that it has the second highest energy change of all SCWRL models and that it loses its hydrophobicity, it is more likely to be disease causing than not.


G93R

<figtable id="G93R_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 3: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for G93R. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 16.141
  • FoldX energy (norm.): -3.772


G93R lies within a big alpha helix in the MHC I domain (cf. <xr id="G93R_pymol"/>). The three hydrogen bonds that are important for the helix stabilization are conserved in both mutant models. In the FoldX model an additional hydrogen bond within the helix structure is formed. While these changes seem harmless at first, it should also be noted that this region is supposed to be the interface for the TFR-HFE complex. This makes the changes to the surface more severe than they would seem on their own. The much bigger arginine causes a massive bulge on the surface which is very likely to interfere with the complex formation. Therefore this mutation should be considered disease causing, even if the energy models do not suggest this.


Q127H

<figtable id="Q127H_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 4: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for Q127H. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 17.911
  • FoldX energy (norm.): -3.676


Q127H is at the start of a coil/turn between two beta sheets in the MHC I domain (cf. <xr id="Q127H_pymol"/>). Both mutant models retain the two hydrogen bonds that stabilize this coil/turn, and the FoldX model even forms an additional one. However, both lose the hydrogen bond that connects Q127 and E125, so the indirect anchor to the previous beta sheet is lost. This might not be that severe, but one of the connected amino acids is the glycosylation site N130 (connected by the lower hydrogen bond in the figures). With this in mind this mutation should be considered disease causing. As with the previous mutation, this is contrary to the energy models.


A162S

<figtable id="A162S_pymol">

SCWRL.
FoldX.
Table 5: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for A162S. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 38.099
  • FoldX energy (norm.): 6.643


A162S is part of a helix in the MHC I domain (cf. <xr id="A162S_pymol"/>). All wildtype hydrogen bonds are preserved in the mutant models and several new ones are formed (3 in SCWRL and 4 in FoldX). This should further stabilize the structure. Additionally, the residue is buried within the protein and thus causes no changes on the surface. The wildtype and mutant amino acids also do not differ much in size. The only indicator for a malign mutation would be the energy change in the SCWRL model, but this has proven to be quite unreliable for the previous mutations. Therefore this mutation should be considered non-disease causing.


L183P

<figtable id="L183P_pymol">

SCWRL.
FoldX.
Table 6: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for L183P. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 0.284
  • FoldX energy (norm.): 33.996


L183P is, again, located in one of the MHC I domain's helices. Proline's effect as a helix breaker is demonstrated in <xr id="L183P_pymol"/>. Both stabilizing hydrogen bonds are lost and no new ones are formed. As mentioned before, this region is the interface for the TFR-HFE complex, and therefore a break in one of the three big helices should be considered disease causing, even though this particular residue is not on the surface of the protein. This is also the first FoldX model to show a big energy change. FoldX's energy model may be a better indicator than SCWRL's.


T217I

<figtable id="T217I_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 7: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for T217I. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 48.103
  • FoldX energy (norm.): 9.312
  • Warning: Highly hydrophobic amino acid on the surface!


T217I is the first mutation that is within the C1 domain (cf. <xr id="T217I_pymol"/>). It is part of a coil/turn between two beta sheets and seems to play an important role in the stabilization of this region, as it forms a total of 5 hydrogen bonds. All but one of these bonds are lost in both mutant models. The hydrogen bond that is conserved, however, is probably the most important one, as it reaches across the coil/turn to the beginning of the next beta sheet. While the changes to the surface are only minor, the fact that the mutant residue is highly hydrophobic indicates a malign mutation. The changes in the energy models also, more or less, suggest that this mutation is disease causing.


R224W

<figtable id="R224W_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 8: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for R224W. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 53.506
  • FoldX energy (norm.): 2.104


R224W lies within one of the C1 domain's beta sheets and forms two stabilizing hydrogen bonds to the neighboring beta sheet (cf. <xr id="R224W_pymol"/>). All hydrogen bonds are unchanged in the mutant models and no new ones are formed. While the mutant residue has quite a different structure than the wildtype, the rotamer chosen by FoldX seems to resemble the original one better. The mutant produces moderate changes on the protein surface, which could be severe considering that this side of the C1 domain is aligned with Beta-2-Microglobulin (when in complex). SCWRL's energy model also indicates a malign mutation. Overall, R224W should be considered disease causing.


E277K

<figtable id="E277K_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 9: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for E277K. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 122.406
  • FoldX energy (norm.): 25.854


E277K is part of a very small helix (4 residues according to DSSP) within the C1 domain (cf. <xr id="E277K_pymol"/>). It seems to have a quite complex role in stabilizing the entire domain, as it forms hydrogen bonds with three different structural elements: one with the following beta sheet (Y280), two with G275, which is within a short coil, and one with T221, which is at the start of another beta sheet within the C1 domain. Both mutant models lose the hydrogen bonds with G275, and the SCWRL model additionally loses the one with T221, which might have severe effects on the tertiary structure of the C1 domain. These destabilizations, the moderate changes on the protein surface (a cavity in the SCWRL model, a bulge in the FoldX one), and the high energy changes for both models strongly indicate a disease causing mutation.


C282S

<figtable id="C282S_pymol">

SCWRL.
FoldX.
Table 10: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for C282S. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 34.746
  • FoldX energy (norm.): 46.690


C282S is located within a beta sheet of the C1 domain (cf. <xr id="C282S_pymol"/>) and forms two hydrogen bonds with the neighboring sheet. These bonds are retained in both mutant models, which even form a third one with the same residue. The difference in residue size is minor and the residue is located within the protein (no surface changes). The major problem with this mutation, however, is the loss of the only disulfide bridge (C225-C282) within the C1 domain, which is also reflected in the big changes in both energy models. This loss alone is enough to consider this mutation disease causing.


Minimise

Next we used Minimise to minimize the energy of each of the 31 models: 11 created with SCWRL (10 mutants + WT) and 20 created with FoldX (10 mutants and their wildtypes). Each model was minimized five times consecutively (i.e. the output of the previous iteration was used as input for the next one). A table with the absolute energy values can be found here.

The median energy change per iteration in relation to the first iteration is shown in <xr id="energy_gain"/>. It clearly demonstrates that too many iterations not only fail to improve the model, but even make it worse. For the FoldX models only the second iteration improves the models; every iteration thereafter makes them worse than they were after the first one. The SCWRL models stop improving after the third iteration. After the fifth iteration they are about as good as after the first iteration.


<figtable id="energy_gain">

All models.
FoldX models.
SCWRL models.
Table 11: Median energy change per iteration of minimization. Each box is based on the energy difference between the current and the first iteration. Statistics are shown for all 31 models (left), all 20 FoldX models (center), and all 11 SCWRL models (right).

</figtable>
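The statistic behind <xr id="energy_gain"/> can be sketched as follows. This is only a minimal illustration: the energies in the dictionary are hypothetical placeholder numbers (the real values are in the linked table), and the function name is our own choice.

```python
from statistics import median

# HYPOTHETICAL energies for three models over five consecutive
# minimisation runs (placeholders, NOT the values from our table).
energies = {
    "model_a": [-100.0, -104.0, -103.0, -101.0, -99.0],
    "model_b": [-200.0, -203.0, -205.0, -202.0, -200.5],
    "model_c": [-150.0, -151.0, -150.5, -149.0, -148.0],
}

def median_change_vs_first(energies_by_model):
    """Median over all models of E(iteration k) - E(iteration 1), k = 2..n."""
    n_iter = len(next(iter(energies_by_model.values())))
    changes = []
    for k in range(1, n_iter):
        diffs = [e[k] - e[0] for e in energies_by_model.values()]
        changes.append(median(diffs))
    return changes

changes = median_change_vs_first(energies)
for k, c in enumerate(changes, start=2):
    # Negative values mean the energy is lower than after the first iteration.
    print(f"iteration {k}: median change vs. iteration 1 = {c:+.2f}")
```

A negative median means that, for the typical model, the iteration still lowered the energy compared to the first run; a positive median means the model got worse.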


In order to compare the Minimise results for SCWRL and FoldX with each other and with the original values given by the modeling programs, we chose the results of the 2nd iteration, as these showed an improvement in energy for all models. Then the energy values were again normalized (cf. Section: SCWRL and FoldX). An overview table for these values can be found here. <xr id="it2_comparison"/> shows a comparison between the two normalized values. After the minimization every mutation exhibits an energy change in the same direction, whether it is a SCWRL or a FoldX model. Even the magnitudes of the changes are quite similar for both methods. In contrast, the new values show no correlation to their original ones at all. This suggests that Minimise's performance is almost independent of the input model. The new values also correlate well with the mutations' effects (i.e. positive = malign, negative = benign). Only M35T and V53M would result in false predictions (assuming R224W is indeed benign).

<figtable id="it2_comparison">

Mutation SCWRL-minimised-(norm.) SCWRL-original-(norm.) FoldX-minimised-(norm.) FoldX-original-(norm.) Validation
M35T 2.431904837 17.72037844 2.656691758 7.744433688 benign
V53M -1.390424842 70.34982121 -2.620524491 3.705278503 malign
G93R 7.858037681 16.14153574 10.91059879 -3.772906935 malign
Q127H 5.379939254 17.91113835 5.091157565 -3.676245328 malign
A162S -1.378597488 38.09921951 -1.029897803 6.643335904 benign
L183P 19.9535606 0.284110511 13.45154088 33.99673341 malign
T217I -1.748299106 48.10396821 -0.268900854 9.312876843 benign
R224W -8.539391409 53.50612664 -10.18388517 2.104124083 benign (uncertain)
E277K 9.191026282 122.4069842 1.966468695 25.85492841 malign
C282S 1.037044876 34.74671548 1.456294229 46.69076258 malign
Table 12: Comparison between the normalized values for the second iteration of minimise and the values for the original models from SCWRL and FoldX. The last column shows the real annotation for the mutation. For R224W it is uncertain if the mutation is indeed benign or if it just has not been classified as malign, yet.

</figtable>
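The sign rule described above (positive = malign, negative = benign) can be checked against <xr id="it2_comparison"/> with a few lines of Python. This is a small sketch; the values are rounded from the table, R224W is assumed to be benign, and the variable and function names are our own.

```python
# Normalised energies after the 2nd Minimise iteration (rounded from the
# comparison table) and the annotated effect. R224W is ASSUMED benign here.
table = {
    "M35T":  (2.43, 2.66, "benign"),
    "V53M":  (-1.39, -2.62, "malign"),
    "G93R":  (7.86, 10.91, "malign"),
    "Q127H": (5.38, 5.09, "malign"),
    "A162S": (-1.38, -1.03, "benign"),
    "L183P": (19.95, 13.45, "malign"),
    "T217I": (-1.75, -0.27, "benign"),
    "R224W": (-8.54, -10.18, "benign"),
    "E277K": (9.19, 1.97, "malign"),
    "C282S": (1.04, 1.46, "malign"),
}

def predict(value):
    """Sign rule: positive normalised energy -> malign, otherwise benign."""
    return "malign" if value > 0 else "benign"

# Do the SCWRL-based and FoldX-based values agree in sign for every mutation?
same_sign = all((s > 0) == (f > 0) for s, f, _ in table.values())

# Mutations for which the sign rule contradicts the annotation.
wrong = [m for m, (s, _, label) in table.items() if predict(s) != label]

print("same sign for SCWRL and FoldX:", same_sign)
print("false predictions:", wrong)
```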


<figure id="R224W_min_fx">

Figure 2: Changes in structure for the R224W FoldX model over 5 iterations with Minimise. Residues within 5Å are also shown as sticks. Colors are: green (1a6zC), blue (FoldX model), cyan (iteration 1), yellow (iteration 2), orange (iteration 3), red (iteration 4), and magenta (iteration 5).

</figure>


<xr id="R224W_min_fx"/> shows the changes in the structure of the R224W mutation based on the FoldX model. The first and second iteration of Minimise have the biggest impact on the protein structure while improving its energy potential. The following iterations perform only minor changes and also make the energy potential worse. Although this is only an example, the other mutation models show about the same behavior.


Gromacs

For Gromacs we used the models created with SCWRL and FoldX, which were repaired (with repairPDB) in the same way as for the Minimise step. This replaces Steps 1 to 3 in the task description.


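title            = PBSA minimization in vacuum
cpp              = /usr/bin/cpp           ; C preprocessor used for the topology
define           = -DFLEXIBLE -DPOSRES    ; flexible water, position restraints
implicit_solvent = GBSA                   ; generalized Born implicit solvent
integrator       = steep                  ; steepest-descent energy minimization
emtol            = 1.0                    ; stop when the maximum force drops below this value (kJ/mol/nm)
nsteps           = 500                    ; maximum number of minimization steps
nstenergy        = 1                      ; write energies every step
energygrps       = System
ns_type          = grid                   ; grid-based neighbor searching
coulombtype      = cut-off
rcoulomb         = 1.0                    ; Coulomb cut-off (nm)
rvdw             = 1.0                    ; van der Waals cut-off (nm)
constraints      = none
pbc              = no                     ; no periodic boundary conditions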


We used this .mdp file for evaluating all energies. For more information regarding the arguments read this.





To measure the runtime of mdrun, we called it repeatedly with different values of nsteps in the .mdp file. At first we looked at 100+X*100 steps, resulting in the left picture of <xr id="runtimes"/>. Here you can see that the runtime (measured as real, i.e. wall-clock, time) is capped at around 32 seconds. As these intervals were too coarse to see whether the growth is linear, we repeated the test with 100+X*1 steps to get more accurate results. The result can be seen in the right picture of <xr id="runtimes"/>.

Based on this result we conclude that the runtime grows linearly with the number of steps up to the point where no further improvement can be made and the program terminates.

<figtable id="runtimes">

Runtime of mdrun with different number of steps, stepwidth 100
Runtime of mdrun with different number of steps, stepwidth 1
Table 12: Plots of runtimes of mdrun against the number of steps.

</figtable>
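A timing series like the one plotted in <xr id="runtimes"/> could be scripted roughly as shown below. This is only a sketch and not the script we actually used: the helper names are hypothetical, and the commented-out command line at the end is an assumption about how mdrun would be invoked on a given installation.

```python
import re
import subprocess
import time

def set_nsteps(mdp_text, nsteps):
    """Return the .mdp text with its 'nsteps' line set to the given value."""
    return re.sub(r"(?m)^(\s*nsteps\s*=\s*)\S+", rf"\g<1>{nsteps}", mdp_text)

def step_schedule(start, width, count):
    """Step counts start + X*width for X = 0 .. count-1 (e.g. 100+X*100)."""
    return [start + x * width for x in range(count)]

def time_command(cmd):
    """Wall-clock seconds needed to run one external command."""
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - t0

mdp = "integrator = steep\nnsteps = 500\nnstenergy = 1\n"
for n in step_schedule(100, 100, 3):
    new_mdp = set_nsteps(mdp, n)
    # In a real run one would write new_mdp to disk, rebuild the run input
    # with grompp and then time mdrun, for example (ASSUMED file names):
    #   seconds = time_command(["mdrun", "-deffnm", "em"])
    print(n, repr(new_mdp.splitlines()[1]))
```

Only the nsteps line is rewritten, so all other settings stay identical between runs and differences in runtime can be attributed to the number of steps.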


The pictures in <xr id="gromacs_energies"/> show the calculated values for bonds, angles and potential as a function of the number of steps for selected models. The complete set of pictures (for all models) can be found here. The final calculated energies can also be found on that page in the section "Tables".

In <xr id="gromacs_energies"/> you can see that the potential and bond values are very high at the beginning and, within about the first 20 steps, drop to values close to those calculated in later steps. The angle values start at about the level they reach in the end, but rise during about the first 20 steps and then fall again.

After this initial phase the potential decreases steadily with the number of steps, while the bond and angle values increase.

As the potential is the only value that decreases continuously over the steps, we use it for the prediction of disease causing mutations.

<xr id="delta_potential"/> lists the change in potential observed when comparing each mutant model with its wildtype model. All models used here were created with FoldX.

<figtable id="gromacs_energies">

A162S mutant, calculated with FoldX and the Amber03 force field.
Wildtype based on the FoldX A162S mutation. The force field used was Amber03.
Wildtype based on the FoldX A162S mutation. The force field used was Charmm27.
Wildtype based on the SCWRL method. The force field used was Charmm27.
Wildtype based on the FoldX A162S mutation. The force field used was Amber99sb-ildn.
Wildtype based on the SCWRL method. The force field used was Amber99sb-ildn.
Table 13: Plots of the energy values (bond, angle and potential) of the models.

</figtable>

<figtable id="delta_potential">

Mutation Change-in-potential Validation
M35T -150.7 benign
V53M 783.4 malign
G93R -529.9 malign
Q127H -196.1 malign
A162S 113.9 benign
L183P 310.5 malign
T217I 159.6 benign
R224W 603.2 benign (should be malign)
E277K 455.8 malign
C282S 32.4 malign
Table 14: Change in potential when comparing the mutated FoldX models with the wildtype ones.

</figtable>

Based on these values, a cutoff of +/-175 for the change in potential appears to work best (absolute change below 175: predicted as benign, otherwise malign). In our case this leads to an accuracy of 80% (90% if R224W is in fact malign). However, one should keep in mind that we only have 10 mutations here. Also, the potential does not correlate that well with the benign/malign state: the C282S mutation has a change in potential of only 32.4 (which would suggest the same structural properties as the wildtype) but is classified as malign.
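The accuracy figures above can be reproduced with the following short sketch. It reads the "+/-175" cutoff as a threshold on the absolute change in potential, which is our interpretation, and uses the values and annotations from <xr id="delta_potential"/>; the names are chosen for illustration only.

```python
# Change in potential (FoldX mutant vs. wildtype) and annotated effect.
delta = {
    "M35T":  (-150.7, "benign"),
    "V53M":  (783.4, "malign"),
    "G93R":  (-529.9, "malign"),
    "Q127H": (-196.1, "malign"),
    "A162S": (113.9, "benign"),
    "L183P": (310.5, "malign"),
    "T217I": (159.6, "benign"),
    "R224W": (603.2, "benign"),
    "E277K": (455.8, "malign"),
    "C282S": (32.4, "malign"),
}

CUTOFF = 175.0  # threshold on the ABSOLUTE change (our reading of +/-175)

def predict(change, cutoff=CUTOFF):
    """Predict malign if the absolute change reaches the cutoff."""
    return "malign" if abs(change) >= cutoff else "benign"

def accuracy(data):
    """Fraction of mutations whose prediction matches the annotation."""
    hits = sum(predict(change) == label for change, label in data.values())
    return hits / len(data)

acc = accuracy(delta)

# Same data, but treating R224W as malign instead of benign.
relabelled = dict(delta, R224W=(603.2, "malign"))
acc_alt = accuracy(relabelled)

print(f"accuracy: {acc:.0%}, with R224W malign: {acc_alt:.0%}")
```

With the annotation as given, the two misses are R224W (predicted malign) and C282S (predicted benign).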


In addition to this we also computed potential values for different force fields. It is noticeable that the amber99sb-ildn force field generates a much lower potential (~ -42000) than amber03 and charmm27 (~ -38800) (see "Tables"). At the same time the amber99sb-ildn force field generates higher bond and angle values than the amber03 force field, whereas charmm27 seems to use only bond values and reaches about the same potential as the amber03 force field in this way.


References

<references/>