Difference between revisions of "Structure-based mutation analysis Gaucher Disease"

From Bioinformatikpedia
(AMBER03)
(Mutations)
 
(37 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
The aim of this task was to carry out a thorough analysis of ten mutations and to classify them as disease-causing or non-disease-causing. For this, we first chose a reliable crystal structure of glucosylceramidase and then mapped the ten mutations onto this structure. Next, different methods were tried for mutating residues and calculating the energy of the resulting models, namely SCWRL<ref name="scwrl">Qiang Wang, Adrian A. Canutescu, and Roland L. Dunbrack, Jr. (2008). [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682191/ SCWRL and MolIDE: Computer programs for side-chain conformation prediction and homology modeling]. Nat Protoc.</ref>, FoldX<ref name="foldx">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005) [http://www.ncbi.nlm.nih.gov/pubmed/15980494?dopt=Citation The FoldX web server: an online force field.] Nucleic Acids Research.</ref>, MINIMISE, and Gromacs<ref name="gromacs">H.J.C. Berendsen, D. van der Spoel, R. van Drunen. (1995) ''GROMACS: A message-passing parallel molecular dynamics implementation.'' Computer Physics Communications.</ref>. In the end, the mutations were classified as disease-causing or non-disease-causing depending on the extent to which they change the energy of the structure. Technical details are reported in our [[Gaucher_Task07_Protocol|protocol]].
The aim of this task was to carry out a thorough analysis of ten mutations and to classify them as disease-causing and non-disease causing. Technical details are reported in our [[Gaucher_Task07_Protocol|protocol]].
 
   
 
== Crystal structure ==
 
== Crystal structure ==
  +
The UniProtKB entry [http://www.uniprot.org/uniprot/p04062#section_x-ref P04062] lists 23 crystal structures of glucosylceramidase. The five structures with the highest resolution are listed in <xr id="tab:mutations"/>. All of them cover the complete sequence of P04062 except for the 39-residue signal peptide at the beginning of the sequence. For the subsequent analysis, we chose chain A of [http://www.rcsb.org/pdb/explore/explore.do?structureId=2nt0 2nt0], which has the highest resolution and was crystallized at pH 4.5, the physiological pH of the lysosome.
  +
 
<figtable id="tab:mutations">
 
<figtable id="tab:mutations">
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="400px"
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="400px"
Line 23: Line 25:
   
 
== Mutations ==
 
== Mutations ==
  +
For the purpose of comparability, we used the same mutations as in the [[Sequence-based mutation analysis Gaucher Disease|sequence-based mutation analysis]]. We used the HHsearch alignment to map these mutations onto 2nt0_A. <xr id="tab:mutations"/> lists all ten mutations and their positions in the sequence P04062 and in the structure 2nt0_A.
  +
 
<figtable id="tab:mutations">
 
<figtable id="tab:mutations">
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="400px"
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="400px"
Line 52: Line 56:
 
<caption>Mutations used for the structure-based mutation analysis.</caption>
 
<caption>Mutations used for the structure-based mutation analysis.</caption>
 
</figtable>
 
</figtable>
  +
  +
<xr id="fig:mutations"/> visualizes the locations of the selected mutations in 2nt0_A. '''W312C''' is closest to the active site residues E235 and E340. '''W209R''', '''L470P''', and '''W312C''' are located in secondary structure elements, and '''L470P''' is likely to break the beta-sheet. The remaining mutations are part of loop regions.
   
 
<figure id="fig:mutations">
 
<figure id="fig:mutations">
Line 59: Line 65:
   
 
== SCWRL ==
 
== SCWRL ==
We employed SCWRL <ref name="scwrl">Qiang Wang, Adrian A. Canutescu, and Roland L. Dunbrack, Jr.(2008). [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682191/ SCWRL and MolIDE: Computer programs for side-chain conformation prediction and homology modeling]. Nat Protoc.</ref> for substituting the wildtype residues listed in <xr id="tab:mutations"/> by the corresponding mutatant residues which are chosen from a rotamer library. <xr id="fig:scwrl"/> denotes the results.
+
We employed SCWRL <ref name="scwrl"/> to substitute the wildtype residues listed in <xr id="tab:mutations"/> with the corresponding mutant residues, whose conformations are chosen from a rotamer library. <xr id="fig:scwrl"/> shows the results.
   
 
<figure id="fig:scwrl">
 
<figure id="fig:scwrl">
Line 84: Line 90:
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="700px"
 
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:center" width="700px"
 
|- style="background-color: lightgrey"
 
|- style="background-color: lightgrey"
! rowspan="2" | Nr !! rowspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Mutation !! colspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Wildtype !! colspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Mutatant !! rowspan="2"| Clashes !! rowspan="2" | Structural<br/>change
+
! rowspan="2" | Nr !! rowspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Mutation !! colspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Wildtype !! colspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Mutant !! rowspan="2"| Clashes !! rowspan="2" | Structural<br/>change
 
|- style="background-color: lightgrey"
 
|- style="background-color: lightgrey"
 
! H-bonds !! style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Hydrophobicity !! H-bonds !! style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Hydrophobicity
 
! H-bonds !! style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Hydrophobicity !! H-bonds !! style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Hydrophobicity
Line 156: Line 162:
   
 
<figure id="fig:scwrl_ss">
 
<figure id="fig:scwrl_ss">
[[File:scwrl_ss.png|thumb|150px|<caption>Seconary structure elements of 2nt0_A (grey) compared to secondary structure elements of models built by SCRWL.</caption>]]
+
[[File:scwrl_ss.png|thumb|150px|<caption>Secondary structure elements of 2nt0_A (grey) compared to secondary structure elements of models built by SCWRL.</caption>]]
 
</figure>
 
</figure>
   
 
== FoldX ==
 
== FoldX ==
  +
FoldX <ref name="foldx"/> is a force field for assessing the impact of point mutations. We called FoldX to determine the optimal side-chain conformation of the mutated site and compared the energy of the mutant model with the wildtype model in order to assess the severity of each mutation.
The superposition of the rotamer configurations predicted by FoldX and SCWRL are shown in <xr id="fig:foldx"/>. The predictions of both tools differed in case of four mutations. In case of '''H60R''', the side-chain orientation of arginine predicted by FoldX forms two instead one hydrogen bonds to T741 and might therefore impact the protein structure more than the orientation of SCRWL. In case of '''A384D''', the romater of FoldX might be more stable than the one of SCWRL since it has a higher distance to the surrounding residues. In case of '''D443N''' we prefer the prediction of SCWRL which is closer to the wildtype configuration. For the same reason we prefer the prediction of FoldX in case of '''R442'''. For the subsequent GROMACS analysis, we hence chose the FoldX model in case of mutation number 8 and 10 and the SCWRL models for all all other mutations.
 
  +
  +
<xr id="fig:foldx"/> depicts the side-chain conformations predicted by FoldX in comparison to SCWRL and the wildtype. The predictions of FoldX and SCWRL differed in four cases. In case of '''H60R''', the side-chain orientation of arginine predicted by FoldX forms two hydrogen bonds instead of one to T741 and might therefore affect the protein structure more than the orientation predicted by SCWRL. In case of '''A384D''', the rotamer of FoldX might be more stable than the one of SCWRL since it is farther away from the surrounding residues. In case of '''D443N''', we prefer the prediction of SCWRL, which is closer to the wildtype configuration. For the same reason, we prefer the prediction of FoldX in case of '''R44S'''. For the subsequent GROMACS analysis, we hence chose the FoldX model for mutations number 8 and 10 and the SCWRL models for all other mutations.
   
 
<figure id="fig:foldx">
 
<figure id="fig:foldx">
Line 178: Line 186:
 
</figure>
 
</figure>
   
A comprehensive list of the differences between the mutant and the wildtype models can be found [[FoldX Difference Mutant/Wildtype|here]]. The total energy increased in case of mutation number 4-8, and 10. Just as in case of SCWRL (cf. <xr id="fig:scwrl"/>), '''L470P''', '''A384D''', and '''W209R''' increased the energy of the model most. Since it is unlikely that mutations like '''V172I''' decrease the energy, we consider the energy calculations of SCWRL as more plausible.
+
A comprehensive list of the differences between the mutant and the wildtype models can be found [[FoldX Difference Mutant/Wildtype|here]]. The total energy increased for mutations number 4-8 and 10. Just as in case of SCWRL (cf. <xr id="fig:scwrl"/>), '''L470P''', '''A384D''', and '''W209R''' increased the energy of the model most, which suggests that these mutations are disease-causing. The fact that some mutations even increase the stability of the model indicates that they have only a mild effect on the protein function.
   
== Minimise ==
+
== MINIMISE ==
  +
MINIMISE is a program that applies the CHARMM22 force field and a rotamer library to minimize the energy of models. We ran five iterations of MINIMISE to refine the models generated by SCWRL and FoldX and compared the energy of the most stable mutant model to that of the most stable wildtype model in order to assess the effect of each point mutation.
{|
 
  +
|<figure id="fig:minmise_scwrl_energies">[[File:minimise_scwrl_energies.png|thumb|300px|<caption>Energy of the SCWRL models vs. the number of <tt>minimise</tt> iterations.</caption>]]</figure>
 
  +
=== SCWRL models===
|<figure id="fig:minmise_foldx_energies">[[File:minimise_foldx_energies.png|thumb|300px|<caption>Energy of the FoldX models vs. the number of <tt>minimise</tt> iterations.</caption>]]</figure>
 
  +
The energy of all SCWRL models decreased up to the second iteration and then gradually increased again. We therefore considered the models after two iterations of MINIMISE to be the most reliable ones and used them to assess the impact of the mutations on the phenotype. For this, we compared the MINIMISE energy of the mutant model to the energy of the wildtype model (cf. the values in brackets in <xr id="fig:minimise_scwrl_e"/>). As already observed when employing SCWRL (cf. <xr id="fig:scwrl"/>) and FoldX (cf. <xr id="fig:foldx"/>), '''L470P''', '''A384D''', and '''W209R''' significantly increase the energy of the model. However, '''H60R''' and '''D443N''' now also lead to unstable models, which would suggest an effect on the phenotype. Comparable to FoldX, '''E111K''' leads to a more stable model. In contrast to SCWRL and FoldX, the resulting model of '''R44S''' is also much more stable than the wildtype model according to the MINIMISE energy function.
|}
 
   
<figure id="fig:minimise_scwrl_mutations">
+
<figure id="fig:minimise_scwrl_e">
 
<gallery perrow=5 widths="100">
 
<gallery perrow=5 widths="100">
File:minimise_scwrl_1.gif|1: H60R
+
File:minimise_scwrl_e_1.png|1: H60R (118.96)
File:minimise_scwrl_2.gif|2: V172I
+
File:minimise_scwrl_e_2.png|2: V172I (2.48)
File:minimise_scwrl_3.gif|3: E111K
+
File:minimise_scwrl_e_3.png|3: E111K (-1.02)
File:minimise_scwrl_4.gif|4: L197P
+
File:minimise_scwrl_e_4.png|4: L197P (17.59)
File:minimise_scwrl_5.gif|5: W209R
+
File:minimise_scwrl_e_5.png|5: W209R (65.80)
File:minimise_scwrl_6.gif|6: L470P
+
File:minimise_scwrl_e_6.png|6: L470P (67.08)
File:minimise_scwrl_7.gif|7: W312C
+
File:minimise_scwrl_e_7.png|7: W312C (15.31)
File:minimise_scwrl_8.gif|8: A384D
+
File:minimise_scwrl_e_8.png|8: A384D (26.65)
File:minimise_scwrl_9.gif|9: D443N
+
File:minimise_scwrl_e_9.png|9: D443N (91.79)
File:minimise_scwrl_10.gif|10: R44S
+
File:minimise_scwrl_e_10.png|10: R44S (-110.46)
 
</gallery>
 
</gallery>
<caption>Side-chain optimization of SCWRL models over five iterations <tt>minimise</tt>. Green: the input model.</caption>
+
<caption>Energy of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of MINIMISE. In brackets: energy(mutant 2nd iteration)-energy(wildtype 2nd iteration).</caption>
 
</figure>
 
</figure>
   
  +
<xr id="fig:minimise_scwrl_m"/> shows the extent to which the MINIMISE optimization alters the side-chain conformation of the mutated residue compared to the wildtype residue. Altogether, the side-chain orientation of the mutated residues changes only slightly, and less than the side-chains of the wildtype model. Note that not only the side-chain of the mutated residue changes but also the side-chains of neighbouring residues.
<figure id="fig:minimise_foldx_mutations">
 
  +
  +
<figure id="fig:minimise_scwrl_m">
 
<gallery perrow=5 widths="100">
 
<gallery perrow=5 widths="100">
File:minimise_foldx_1.gif|1: H60R
+
File:minimise_scwrl_m_1.gif|1: H60R (118.96)
File:minimise_foldx_2.gif|2: V172I
+
File:minimise_scwrl_m_2.gif|2: V172I (2.48)
File:minimise_foldx_3.gif|3: E111K
+
File:minimise_scwrl_m_3.gif|3: E111K (-1.02)
File:minimise_foldx_4.gif|4: L197P
+
File:minimise_scwrl_m_4.gif|4: L197P (17.59)
File:minimise_foldx_5.gif|5: W209R
+
File:minimise_scwrl_m_5.gif|5: W209R (65.80)
File:minimise_foldx_6.gif|6: L470P
+
File:minimise_scwrl_m_6.gif|6: L470P (67.08)
File:minimise_foldx_7.gif|7: W312C
+
File:minimise_scwrl_m_7.gif|7: W312C (15.31)
File:minimise_foldx_8.gif|8: A384D
+
File:minimise_scwrl_m_8.gif|8: A384D (26.65)
File:minimise_foldx_9.gif|9: D443N
+
File:minimise_scwrl_m_9.gif|9: D443N (91.79)
File:minimise_foldx_10.gif|10: R44S
+
File:minimise_scwrl_m_10.gif|10: R44S
 
</gallery>
 
</gallery>
<caption>Side-chain optimization of FoldX models over five iterations <tt>minimise</tt>. Green: the input model.</caption>
+
<caption>Side-chain conformation of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of MINIMISE. In brackets: energy(mutant 2nd iteration)-energy(wildtype 2nd iteration).</caption>
 
</figure>
 
</figure>
   
== Gromacs ==
+
=== FoldX models ===
   
  +
The most stable FoldX models were obtained after one iteration of MINIMISE (cf. <xr id="fig:minimise_foldx_e"/>). Further iterations resulted in models with a higher energy. Hence, we compared the mutant and wildtype models of the first iteration to estimate the impact of the mutations. If the mutations are sorted by their energy difference, the order is similar to that of the SCWRL models optimized by MINIMISE: '''H60R''' increases the energy most, followed by '''L470P''', '''W312C''', and '''W209R'''. In contrast, '''R44S''' results in the most stable model.
=== Runtime analysis ===
 
   
  +
<figure id="fig:minimise_foldx_e">
To show the relationship between nsteps and runtime of 'mdrun', different nstep were chosen from 50 to 1000. Three different energy functions were selected:
 
  +
<gallery perrow=5 widths="100">
AMBER03 protein, nucleic AMBER94
 
  +
File:minimise_foldx_e_1.png|1: H60R (91.17)
CHARMM27 all-atom force field (with CMAP)
 
  +
File:minimise_foldx_e_2.png|2: V172I (-2.52)
OPLS-AA/L all-atom force field
 
  +
File:minimise_foldx_e_3.png|3: E111K (5.91)
  +
File:minimise_foldx_e_4.png|4: L197P (28.80)
  +
File:minimise_foldx_e_5.png|5: W209R (52.35)
  +
File:minimise_foldx_e_6.png|6: L470P (72.20)
  +
File:minimise_foldx_e_7.png|7: W312C (63.56)
  +
File:minimise_foldx_e_8.png|8: A384D (22.79)
  +
File:minimise_foldx_e_9.png|9: D443N (45.93)
  +
File:minimise_foldx_e_10.png|10: R44S (-75.12)
  +
</gallery>
  +
<caption>Energy of the FoldX mutant models compared to the FoldX wildtype models over five iterations of MINIMISE. In brackets: energy(mutant first iteration)-energy(wildtype first iteration).</caption>
  +
</figure>
   
  +
Comparable to the SCWRL models, the side-chain conformations of the mutated residues are changed only slightly by MINIMISE (cf. <xr id="fig:minimise_foldx_m"/>). The side-chains of '''L470P''' and '''W209R''' are moved most. Due to the model refinement, Pymol assigned the secondary structure differently in some parts of the model.
==== AMBER03 ====
 
   
  +
<figure id="fig:minimise_foldx_m">
nstep=50
 
  +
<gallery perrow=5 widths="100">
 
  +
File:minimise_foldx_m_1.gif|1: H60R (91.17)
step=50
 
  +
File:minimise_foldx_m_2.gif|2: V172I (-2.52)
Reached the maximum number of steps before reaching Fmax < 1
 
  +
File:minimise_foldx_m_3.gif|3: E111K (5.91)
real 0m7.446s
 
  +
File:minimise_foldx_m_4.gif|4: L197P (28.80)
user 0m13.230s
 
  +
File:minimise_foldx_m_5.gif|5: W209R (52.35)
sys 0m1.070s
 
  +
File:minimise_foldx_m_6.gif|6: L470P (72.20)
  +
File:minimise_foldx_m_7.gif|7: W312C (63.56)
  +
File:minimise_foldx_m_8.gif|8: A384D (22.79)
  +
File:minimise_foldx_m_9.gif|9: D443N (45.93)
  +
File:minimise_foldx_m_10.gif|10: R44S (-75.12)
  +
</gallery>
  +
<caption>Side-chain conformation of the FoldX mutant models compared to the FoldX wildtype models over five iterations of MINIMISE. In brackets: energy(mutant first iteration)-energy(wildtype first iteration).</caption>
  +
</figure>
   
  +
== Gromacs ==
   
  +
[http://www.gromacs.org/ GROMACS] (GROningen MAchine for Chemical Simulations) is a molecular dynamics simulation package. We used it to minimize our protein in vacuum and analyzed the energies during the minimization.
nstep=100
 
 
step=40
 
Reached the maximum number of steps before reaching Fmax < 1
 
   
  +
=== Runtime analysis ===
real 0m13.987s
 
user 0m25.860s
 
sys 0m1.530s
 
   
  +
To show the relationship between nsteps and the runtime of 'mdrun', different values of nsteps between 50 and 5000 were chosen. Three different force fields were selected:
  +
#AMBER03 protein, nucleic AMBER94
  +
#CHARMM27 all-atom force field (with CMAP)
  +
#OPLS-AA/L all-atom force field
   
  +
<br style="clear:both;">
  +
<figtable id="runtime_table">
  +
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 1px 0 1px 0" align="left" width="1000px"
  +
|-
  +
| style="border-style: solid; border-width: 0 0 1px 0" | Forcefield
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=50
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=100
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=200
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=300
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=400
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=500
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=1000
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=2000
  +
| style="border-style: solid; border-width: 0 0 1px 0" | nstep=5000
  +
|-
  +
|-
  +
| AMBER03 <br/> (360 steps to reach the minimum) || 7.446 || 13.987 ||27.186 || 40.664 || 48.479 || 48.481 || 48.475 || 48.694 || 48.743
  +
|-
  +
| CHARMM27 <br/> (348 steps to reach the minimum)|| 7.292 || 13.785 || 26.843 || 39.990 || 46.322 || 46.158 || 46.121 || 45.345 || 46.013
  +
|-
  +
| OPLS-AA/L <br/> (1177 steps to reach the minimum)|| 6.063 || 11.513 || 22.378 || 32.959 || 43.790 || 54.252 || 107.718 || 126.707 || 126.734
  +
|-
  +
|}
  +
<br style="clear:both;">
  +
<caption>Runtime of the minimization with Gromacs for different nsteps using the AMBER03, CHARMM27, and OPLS-AA/L force fields. Only the 'real' time returned by the Linux 'time' command is shown, in seconds.</caption>
  +
</figtable>
  +
<br style="clear:both;">
   
  +
<xr id="runtime_table"/> presents the runtime of the Gromacs minimization for different settings of nsteps and different force fields. The parameter nsteps sets the maximum number of minimization steps. The runtime increases with nsteps until the minimum is reached. The table also lists the number of steps the program needed to reach the minimum: only 360 with AMBER03 and 348 with CHARMM27, but 1177 with OPLS-AA/L, which is why the minimization with OPLS-AA/L took longer than with the other two force fields.
nstep=200
 
   
  +
<br style="clear:both;">
step=200
 
Reached the maximum number of steps before reaching Fmax < 1
 
   
  +
<figure id="fig:runtim">
real 0m27.186s
 
  +
[[File:runtime.png|thumb|400px|left|<caption> Runtime of minimization with Gromacs for different nsteps by using the OPLS-AA/L forcefield and the wildtype structure 2nt0_A.</caption>]]
user 0m51.340s
 
  +
</figure>
sys 0m2.540s
 
   
  +
<br style="clear:both;">
   
  +
In <xr id="fig:runtim"/>, we show the runtime for different settings of nsteps (from 50 to 5000). With the OPLS-AA/L force field, the program needed 1177 steps to reach the minimum. From nsteps=50 to nsteps=1177, the runtime increases linearly because the program runs exactly nsteps steps. Beyond nsteps=1177, the minimum has already been reached, so the program always stops after 1177 steps regardless of the nsteps setting, and the runtime stays constant.
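
The plateau behaviour described above can be sketched as a simple model in which the runtime is proportional to the number of steps actually executed. This is only an illustration: the function name <tt>predicted_runtime</tt> is hypothetical, and the per-step cost is an assumption estimated from the nstep=1000 entry of the OPLS-AA/L row in the runtime table.

```python
def predicted_runtime(nsteps, steps_to_converge, seconds_per_step):
    """Runtime model: linear in nsteps until the minimizer converges, then flat."""
    executed_steps = min(nsteps, steps_to_converge)
    return executed_steps * seconds_per_step


# Number of steps OPLS-AA/L needed to reach the minimum (from the runtime table).
STEPS_TO_CONVERGE = 1177
# Assumption: per-step cost estimated from the nstep=1000 run (107.718 s real time).
SECONDS_PER_STEP = 107.718 / 1000

for nsteps in (50, 500, 1000, 2000, 5000):
    runtime = predicted_runtime(nsteps, STEPS_TO_CONVERGE, SECONDS_PER_STEP)
    print(f"nsteps={nsteps}: about {runtime:.1f} s")
```

Under this assumption, the model predicts a plateau of roughly 127 s for any nsteps above 1177, which is close to the measured values for nstep=2000 and nstep=5000. For small nsteps the model underestimates the measured runtime slightly, presumably because of a fixed start-up cost that it ignores.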
   
  +
=== Mutations ===
nstep=300
 
   
  +
We applied only the AMBER force field to the ten mutated structures. <xr id="mutation_gromacs"/> shows the average bond, angle, and potential energies of the mutated structures and the wildtype, as well as the corresponding differences between each mutated structure and the wildtype.
Step=300
 
Reached the maximum number of steps before reaching Fmax < 1
 
   
  +
<figtable id="mutation_gromacs">
real 0m40.664s
 
  +
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 1px 0 1px 0" align="left" width="1200px"
user 1m16.690s
 
  +
|-
sys 0m3.860s
 
  +
| style="border-style: solid; border-width: 0 0 1px 0" |
  +
| style="border-style: solid; border-width: 0 0 1px 0" | WT
  +
| style="border-style: solid; border-width: 0 0 1px 0" | H60R
  +
| style="border-style: solid; border-width: 0 0 1px 0" | V172I
  +
| style="border-style: solid; border-width: 0 0 1px 0" | E111K
  +
| style="border-style: solid; border-width: 0 0 1px 0" | L197P
  +
| style="border-style: solid; border-width: 0 0 1px 0" | W209R
  +
| style="border-style: solid; border-width: 0 0 1px 0" | L470P
  +
| style="border-style: solid; border-width: 0 0 1px 0" | W312C
  +
| style="border-style: solid; border-width: 0 0 1px 0" | A384D
  +
| style="border-style: solid; border-width: 0 0 1px 0" | D443N
  +
| style="border-style: solid; border-width: 0 0 1px 0" | R44S
  +
|-
  +
| Average Energy Bond || 1699.12 || 1624.88 || 1581.54 || 1973.46 || 1470.86 || 1598.3 || 1694.03 || 1579.31 || 1451.23 || 1621.83 || 1685.66
  +
|-
  +
| Difference Energy Bond || 0 || -74.24 || -117.58 || 274.34 || -228.26 || -100.82 || -5.09 || -119.81 || -247.89 || -77.29 || -13.46
  +
|-
  +
| Average Energy Angle || 4375.68 || 4289.66 || 4371.18 || 4382.41 || 4381.32 || 4387.08 || 4425.35 || 4361.97 || 4361.9 || 4367.22 || 4368.95
  +
|-
  +
| Difference Energy Angle || 0 || -86.02 || -4.5 || 6.73 || 5.64 || 11.4 || 49.67 || -13.71 || -13.78 || -8.46 || -6.73
  +
|-
  +
| Average Energy Potential || -44445.9 || -45289.6 || -45025.6 || -43553.6 || -45475.1 || -45265.6 || -44266.1 || -45024.9 || -45805.7 || -44803.9 || -43924.1
  +
|-
  +
| Difference Energy Potential || 0 || -843.7 || -579.7 || <span style="color:red">892.3</span> || <span style="color:blue">-1029.2</span> || -819.7 || <span style="color:red">179.8</span> || -579 || <span style="color:blue">-1359.8</span> || -358 || <span style="color:red">521.8</span>
  +
|-
  +
|}
  +
<br style="clear:both;">
  +
<caption> The average bond, angle, and potential energies of the mutated structures and the wildtype, and the corresponding differences. Positive potential energy differences are colored in red, suggesting a less stable structure than the wildtype. The largest negative potential energy differences are colored in blue, suggesting a much more stable structure. </caption>
  +
</figtable>
   
  +
We expected to see significant energy differences between some of the mutated structures and the wildtype structure in <xr id="mutation_gromacs"/>. A large positive difference would suggest that the mutated structure is much less stable than the wildtype and that the mutation could therefore be harmful. A large negative difference would suggest that the mutated structure is much more stable. Such a mutation could be harmful too, because excessive stability might reduce the flexibility of the protein and thereby impair its function. A minor difference would suggest that the mutation is neutral. In <xr id="mutation_gromacs"/>, however, no clearly significant difference was found. Mutations E111K, L470P, and R44S showed an increased potential energy. Mutations L197P and A384D showed the largest negative potential energy differences relative to the wildtype. Since these differences did not appear to be significant, it was hard to decide whether these mutations are candidates for damaging mutations.
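
As a minimal sketch of how the difference rows of the table are derived, the following Python snippet subtracts the wildtype potential energy from each mutant's potential energy and lists the mutations with a positive difference. The energy values are copied from the table; the function and variable names are hypothetical and only serve the illustration.

```python
# Average potential energies copied from the GROMACS table above.
WILDTYPE_POTENTIAL = -44445.9
MUTANT_POTENTIAL = {
    "H60R": -45289.6, "V172I": -45025.6, "E111K": -43553.6, "L197P": -45475.1,
    "W209R": -45265.6, "L470P": -44266.1, "W312C": -45024.9, "A384D": -45805.7,
    "D443N": -44803.9, "R44S": -43924.1,
}


def energy_differences(mutants, wildtype):
    """Return energy(mutant) - energy(wildtype), rounded to one decimal place."""
    return {name: round(energy - wildtype, 1) for name, energy in mutants.items()}


differences = energy_differences(MUTANT_POTENTIAL, WILDTYPE_POTENTIAL)

# A positive difference means the mutant model has a higher potential energy
# than the wildtype model, i.e. it is less stable.
destabilized = sorted(name for name, diff in differences.items() if diff > 0)
print(differences)
print("Higher energy than wildtype:", destabilized)
```

The snippet only reproduces the arithmetic behind the table. Whether a given difference is large enough to be meaningful is a separate question that it does not address.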
   
  +
== Discussion ==
nstep=400
 
  +
In the previous sections, different energy functions were used to calculate the energy of the models that result from substituting particular residues. The energy of these mutant models was then compared to the energy of the wildtype models: mutations which result in a model with a higher energy than the wildtype model are likely to be disease-causing. We therefore took a straightforward approach for classifying a mutation as disease-causing or non-disease-causing depending on its energy difference: a mutation is disease-causing if its energy difference is positive and if it is among the five mutations with the highest energy difference.
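
The classification rule stated above can be sketched in a few lines of Python. This is an illustration rather than the script we actually used: the function name <tt>classify</tt> is hypothetical, ties between equal energy differences are not handled specially, and the example values are the SCWRL energy differences from the summary table of this section.

```python
def classify(differences, top_n=5):
    """Flag a mutation as disease-causing if its energy difference is positive
    and it is among the top_n mutations with the highest energy difference."""
    ranked = sorted(differences, key=differences.get, reverse=True)
    top = set(ranked[:top_n])
    return {name: diff > 0 and name in top for name, diff in differences.items()}


# SCWRL energy differences, energy(mutant) - energy(wildtype).
scwrl = {
    "H60R": 2.88, "V172I": 4.39, "E111K": 8.14, "L197P": 3.28, "W209R": 9.23,
    "L470P": 28.84, "W312C": 0.18, "A384D": 13.50, "D443N": 2.36, "R44S": 4.85,
}

calls = classify(scwrl)
print(sorted(name for name, is_disease in calls.items() if is_disease))
```

Because both conditions must hold, a method can flag fewer than five mutations when fewer than five energy differences are positive, as is the case for the AMBER results.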
   
  +
<figtable id="tab:discussion">
  +
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 2px 0 2px 0; text-align:right" width="1000px"
  +
|- style="background-color: lightgrey; text-align:center"
  +
! colspan="2" style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Method
  +
! 1 !! 2 !! 3 !! 4 !! 5 !! 6 !! 7 !! 8 !! 9 !! 10 !! Prediction
  +
|- style="background-color: lightgrey; text-align:center"
  +
! ''Name'' !! style="border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | ''Weight''
  +
! H60R !! V172I !! E111K !! L197P !! W209R !! L470P !! W312C !! A384D !! D443N !! R44S !! Accuracy
  +
|-
  +
| colspan=13 style="background-color: lightgrey; border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 0 0 1px 0"|
  +
|-
  +
| style="background-color:lightgrey; text-align:left" | [[#SCWRL|SCWRL]]
  +
| style="background-color:lightgrey; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | 1.0
  +
| style="background-color:lightgreen" | 2.88 (8) || style="background-color:lightgreen" | 4.39 (6) || style="background-color:#FF3333" | 8.14 (4) || style="background-color:lightgreen" | 3.28 (7) || style="background-color:#FF3333" | 9.23 (3) || style="background-color:#FF3333" | 28.84 (1) || style="background-color:lightgreen" | 0.18 (10) || style="background-color:#FF3333" | 13.50 (2) || style="background-color:lightgreen" | 2.36 (9) || style="background-color:#FF3333" | 4.85 (5) || 60%
  +
|-
  +
| style="background-color:lightgrey; text-align:left" | [[#FoldX|FoldX]]
  +
| style="background-color:lightgrey; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | 1.0
  +
| style="background-color:lightgreen" | -0.39 (9) || style="background-color:lightgreen" | -0.80 (10) || style="background-color:lightgreen" | -0.36 (8) || style="background-color:lightgreen" | 0.28 (6) || style="background-color:#FF3333" | 2.86 (3) || style="background-color:#FF3333" | 8.59 (1) || style="background-color:#FF3333" | 1.88 (4) || style="background-color:#FF3333" | 4.52 (2) || style="background-color:lightgreen" | -0.34 (7) || style="background-color:#FF3333" | 0.60 (5) || 60%
  +
|-
  +
| style="background-color:lightgrey; text-align:left" | [[#SCWRL models|MINIMISE - SCWRL]]
  +
| style="background-color:lightgrey; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | 1.0
  +
| style="background-color:#FF3333" | 118.96 (1) || style="background-color:lightgreen" | 2.48 (8) || style="background-color:lightgreen" | -1.02 (9) || style="background-color:lightgreen" | 17.59 (6) || style="background-color:#FF3333" | 65.80 (4) || style="background-color:#FF3333" | 67.08 (3) || style="background-color:lightgreen" | 15.31 (7) || style="background-color:#FF3333" | 26.65 (5) || style="background-color:#FF3333" | 91.79 (2) || style="background-color:lightgreen" | -110.46 (10) || 40%
  +
|-
  +
| style="background-color:lightgrey; text-align:left" | [[#FoldX models|MINIMISE - FoldX]]
  +
| style="background-color:lightgrey; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | 1.0
  +
| style="background-color:#FF3333" | 91.17 (1) || style="background-color:lightgreen" | -2.52 (9) || style="background-color:lightgreen" | 5.91 (8) || style="background-color:lightgreen" | 28.80 (6) || style="background-color:#FF3333" | 52.35 (4) || style="background-color:#FF3333" | 72.20 (2) || style="background-color:#FF3333" | 63.56 (3) || style="background-color:lightgreen" | 22.79 (7) || style="background-color:#FF3333" | 45.93 (5) || style="background-color:lightgreen" | -75.12 (10) || 40%
  +
|-
  +
| style="background-color:lightgrey; text-align:left" | [[#Gromacs|AMBER]]
  +
| style="background-color:lightgrey; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | 1.0
  +
| style="background-color:lightgreen" | -843.7 (8) || style="background-color:lightgreen" | -579.7 (6) || style="background-color:#FF3333" | 892.3 (1) || style="background-color:lightgreen" | -1029.2 (9) || style="background-color:lightgreen" | -819.7 (7) || style="background-color:#FF3333" | 179.8 (3) || style="background-color:lightgreen" | -579 (5) || style="background-color:lightgreen" | -1359.8 (10) || style="background-color:lightgreen" | -358 (4) || style="background-color:#FF3333" | 521.8 (2) || 40%
  +
|-
  +
| colspan=13 style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 0 0 1px 0"|
  +
|-
  +
| colspan=13 style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 0 0 1px 0"|
  +
|-
  +
| colspan="2" style="background-color:lightgrey; text-align:left; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Prediction
  +
| style="background-color:lightgreen" | !Disease || style="background-color:lightgreen" | !Disease|| style="background-color:lightgreen" | !Disease|| style="background-color:lightgreen" | !Disease|| style="background-color:#FF3333" | Disease|| style="background-color:#FF3333" | Disease|| style="background-color:lightgreen" | !Disease || style="background-color:#FF3333" | Disease || style="background-color:lightgreen" | !Disease || style="background-color:#FF3333" | Disease || 50%
  +
|-
  +
| colspan=13 style="background-color:lightgrey; border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 0 0 1px 0"|
  +
|-
  +
| colspan="2" style="background-color:lightgrey; text-align:left; border-style: solid; border-spacing: 0; border-width: 0 1px 0 0;" | Verification
  +
| style="background-color:lightgreen" | !Disease || style="background-color:lightgreen" | !Disease || style="background-color:#FF3333" | Disease || style="background-color:#FF3333" | Disease || style="background-color:#FF3333" | Disease || style="background-color:lightgreen" | !Disease || style="background-color:#FF3333" | Disease || style="background-color:#FF3333" | Disease || style="background-color:lightgreen" | !Disease || style="background-color:lightgreen" | !Disease
  +
|}
  +
<caption>Summary of the structure-based mutation analysis. For each mutation, the energy difference, i.e. energy(mutant)-energy(wildtype), is listed. The rank of the mutation after sorting all mutations by their energy difference is given in brackets. A mutation is disease-causing according to a certain method if its energy difference is positive and if it belongs to the five mutations with the highest energy difference. A mutation is predicted as disease-causing if it is disease-causing according to the majority of tools.</caption>
  +
</figtable>
   
  +
<xr id="tab:discussion"/> summarizes the prediction results. Compared to the [[Sequence-based mutation analysis Gaucher Disease|sequence-based mutation analysis]], none of the methods obtained an accuracy above 60%. The SCWRL and the FoldX energy functions turned out to be the most reliable ones. '''L470P''' clearly increases the energy of the model according to all methods, which is strong evidence that this mutation impairs the protein function. We drew the same conclusion in the [[Sequence-based mutation analysis Gaucher Disease|sequence-based mutation analysis]]. Nevertheless, this mutation is not present in the HGMD and is therefore not yet considered disease-causing.
nstep=500
 
 
step=360
 
 
Stepsize too small, or no change in energy.
 
 
real 0m48.481s
 
user 1m32.200s
 
sys 0m4.190s
 
 
nstep=600
 
 
nstep=700
 
 
nstep=800
 
 
nstep=900
 
 
nstep=1000
 
 
step=360
 
Stepsize too small, or no change in energy.
 
 
real 0m48.475s
 
user 1m31.650s
 
sys 0m4.720s
 
 
nstep=1500
 
 
 
nstep=2000
 
step=360
 
Stepsize too small, or no change in energy.
 
real 0m48.694s
 
user 1m31.450s
 
sys 0m4.990s
 
 
 
nstep=2500
 
 
nstep=3000
 
 
nstep=5000
 
step=360
 
Stepsize too small, or no change in energy.
 
real 0m48.743s
 
user 1m32.050s
 
sys 0m4.550s
 
 
==== CHARMM27 ====
 
 
 
nstep=50
 
 
step=45
 
real 0m6.770s
 
user 0m12.000s
 
sys 0m0.970s
 
 
 
nstep=100
 
 
step=40
 
real 0m6.039s
 
user 0m10.690s
 
sys 0m0.850s
 
 
 
nstep=200
 
 
step=44
 
real 0m6.700s
 
user 0m11.640s
 
sys 0m1.010s
 
 
 
nstep=300
 
 
Step=90
 
 
real 0m12.573s
 
user 0m23.430s
 
sys 0m1.210s
 
 
 
nstep=400
 
 
Step=20
 
 
real 0m3.537s
 
user 0m5.670s
 
sys 0m0.730s
 
 
 
nstep=500
 
 
nstep=600
 
 
nstep=700
 
 
nstep=800
 
 
nstep=900
 
 
nstep=1000
 
 
step=61
 
 
real 0m8.915s
 
user 0m16.210s
 
sys 0m0.990s
 
 
 
nstep=1500
 
step=50
 
real 0m7.583s
 
user 0m13.250s
 
sys 0m1.000s
 
 
 
nstep=2000
 
step=29
 
real 0m4.637s
 
user 0m7.850s
 
sys 0m0.740s
 
 
nstep=2500
 
 
nstep=3000
 
 
nstep=5000
 
 
=== Mutations ===
 
 
Mutation 1
 
 
   
  +
== Conclusions ==
Energy Average Err.Est. RMSD Tot-Drift
 
  +
Tools like SCWRL or FoldX can help to find probable side-chain conformations of mutated residues, and the energy values they provide help to assess the impact of mutations. However, the output of these programs is not sufficient to reliably classify mutations as disease-causing or non-disease-causing, since (1) the energy calculations depend on model assumptions which might not hold and (2) models with a higher energy are not necessarily the worse models, as protein flexibility is important for protein function in many cases. We conclude that it is essential to also investigate various structural features manually and to integrate sequence information in order to estimate the severity of mutations.
-------------------------------------------------------------------------------
 
Bond 1624.88 820 5492.83 -5022.62 (kJ/mol)
 
Angle 4289.66 76 402.822 -411.894 (kJ/mol)
 
Potential -45289.6 2900 16896.6 -18817.6 (kJ/mol)
 
   
 
== References ==
 
== References ==

Latest revision as of 22:35, 26 June 2012

The aim of this task was to carry out a thorough analysis of ten mutations and to classify them as disease-causing or non-disease-causing. For this, we first chose a reliable crystal structure of glycosylceramidase and then mapped the ten mutations onto this structure. Next, different methods were tried for mutating residues and calculating the energy of the resulting models, namely SCWRL<ref name="scwrl">Qiang Wang, Adrian A. Canutescu, and Roland L. Dunbrack, Jr. (2008). SCWRL and MolIDE: Computer programs for side-chain conformation prediction and homology modeling. Nat Protoc.</ref>, FoldX<ref name="foldx">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005) The FoldX web server: an online force field. Nucleic Acids Research.</ref>, MINIMISE, and Gromacs<ref name="gromacs">H.J.C. Berendsen, D. van der Spoel, R. van Drunen. (1995) GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications.</ref>. In the end, the mutations were classified as disease-causing or non-disease-causing depending on the extent to which they change the energy of the structure. Technical details are reported in our protocol.

Crystal structure

The UniProtKB entry P04062 lists 23 crystal structures for glycosylceramidase. The five structures with the highest resolution are listed in <xr id="tab:mutations"/>. All of them cover the complete sequence of P04062 except for the 39-residue signal peptide at the beginning. For the subsequent analysis, we chose chain A of 2nt0, which has the highest resolution and was crystallized at the physiological lysosomal pH of 4.5.

<figtable id="tab:mutations">

{| class="wikitable"
! PDB !! Res [Å] !! R value !! Coverage !! pH
|-
| 2nt0 || 1.80 || 0.18 || 96% (40-536) || 4.5
|-
| 3gxi || 1.84 || 0.19 || 96% (40-536) || 5.5
|-
| 2v3f || 1.95 || 0.15 || 96% (40-536) || 6.5
|-
| 2v3d || 1.96 || 0.16 || 96% (40-536) || 6.5
|-
| 1ogs || 2.00 || 0.18 || 96% (40-536) || 4.6
|}

The five crystal structures of glycosylceramidase with the highest resolution. The physiological lysosomal pH value is 4.5. 2nt0 was selected for the analysis. </figtable>

Mutations

For the purpose of comparability, we used the same mutations as in the sequence-based mutation analysis. We used the HHsearch alignment to map these mutations onto 2nt0_A. <xr id="tab:mutations"/> lists all ten mutations and their positions in the sequence P04062 as well as in the structure 2nt0_A.

<figtable id="tab:mutations">

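{| class="wikitable"
! Nr !! Pos P04062 !! Pos 2nt0_A !! From !! To !! Disease causing
|-
| 1 || 99 || 60 || H || R || No
|-
| 2 || 211 || 172 || V || I || No
|-
| 3 || 150 || 111 || E || K || Yes
|-
| 4 || 236 || 197 || L || P || Yes
|-
| 5 || 248 || 209 || W || R || Yes
|-
| 6 || 509 || 470 || L || P || No
|-
| 7 || 351 || 312 || W || C || Yes
|-
| 8 || 423 || 384 || A || D || Yes
|-
| 9 || 482 || 443 || D || N || No
|-
| 10 || 83 || 44 || R || S || No
|}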

Mutations used for the structure-based mutation analysis. </figtable>

<xr id="fig:mutations"/> visualizes the locations of the selected mutations in 2nt0_A. W312C is closest to the active site residues E235 and E340. W209R, L470P, and W312C are located in secondary structure elements, and L470P is likely to break the beta-sheet. The remaining mutations are part of loop regions.

<figure id="fig:mutations">

2nt0_A with the selected mutations used for the structure-based analysis. Blue: wildtype residues; Red: mutant residues; Orange: active site residues E235 and E340.

</figure>

SCWRL

We employed SCWRL <ref name="scwrl"/> to substitute the wildtype residues listed in <xr id="tab:mutations"/> with the corresponding mutant residues, whose conformations are chosen from a rotamer library. <xr id="fig:scwrl"/> shows the results.

<figure id="fig:scwrl">

Rotamers of SNPs from <xr id="tab:mutations"/>. Blue: wildtype; Red: rotamer SCWRL; In brackets: energy(mutant)-energy(wildtype). </figure>

None of the rotamers chosen by SCWRL clashed with another side-chain or the backbone. The only mutation which led to a structural change was L470P. Here, the insertion of proline interrupted the beta-sheet. The hydrogen bonding network changed for mutations 1, 5, 7, and 8 (cf. <xr id="tab:scwrl"/>). W209R introduces a hydrophilic arginine which forms a hydrogen bond to T180. Although not predicted by SCWRL, the arginine might impact the protein structure. W312C is located next to the active site (cf. <xr id="fig:mutations"/>) and there exists a hydrogen bond to E340. Substituting the hydrophobic tryptophan with a hydrophilic cysteine in the vicinity of the active site might account for the disease-causing effect of this mutation.

As expected, all mutations increased the energy of the model (cf. the energy difference in brackets in <xr id="fig:scwrl"/>). The energy increased most in case of L470P due to the break of the beta-sheet. A384D and W209R also made the model less stable, which is caused by substituting a nonpolar residue with a charged residue. Of the four mutations which increased the model energy most (L470P, A384D, W209R, and E111K), all except L470P are annotated as disease-causing in <xr id="tab:mutations"/>.

<figtable id="tab:scwrl">

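{| class="wikitable"
! rowspan="2" | Nr !! rowspan="2" | Mutation !! colspan="2" | Wildtype !! colspan="2" | Mutant !! rowspan="2" | Clashes !! rowspan="2" | Structural change
|-
! H-bonds !! Hydrophobicity !! H-bonds !! Hydrophobicity
|-
| 1 || H60R || T471 || Hydrophilic || G62 || Hydrophilic || No || No
|-
| 2 || V172I || || Hydrophobic || || Hydrophobic || No || No
|-
| 3 || E111K || || Hydrophilic || || Hydrophilic || No || No
|-
| 4 || L197P || || Hydrophobic || || Hydrophobic || No || No
|-
| 5 || W209R || || Hydrophobic || T180 || Hydrophilic || No || No
|-
| 6 || L470P || T482 || Hydrophobic || T482 || Hydrophobic || No || Yes
|-
| 7 || W312C || E340, C342, P316 || Hydrophobic || E340, C342 || Hydrophilic || No || No
|-
| 8 || A384D || || Hydrophobic || V404 || Hydrophilic || No || No
|-
| 9 || D443N || || Hydrophilic || || Hydrophilic || No || No
|-
| 10 || R44S || S13, Y487 || Hydrophilic || S13, Y487 || Hydrophilic || No || No
|}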

Structure-based analysis of SNPs from <xr id="tab:mutations"/>. H-bonds: residues involved in forming hydrogen bonds (cut-off: 3.2 Å). </figtable>

We further noticed that SCWRL changed the backbone at some positions, which led to different secondary structure assignments (<xr id="fig:scwrl_ss"/>). The positions at which these deviations were observed were independent of the mutated sites.

<figure id="fig:scwrl_ss">

Secondary structure elements of 2nt0_A (grey) compared to secondary structure elements of models built by SCWRL.

</figure>

FoldX

FoldX <ref name="foldx"/> is a force field for assessing the impact of point mutations. We called FoldX to determine the optimal side-chain conformation of the mutated site and compared the energy of the mutant model with the wildtype model in order to assess the severity of each mutation.

<xr id="fig:foldx"/> depicts the side-chain conformations predicted by FoldX in comparison to SCWRL and the wildtype. The predictions of FoldX and SCWRL differed in four cases. In case of H60R, the arginine side-chain orientation predicted by FoldX forms two hydrogen bonds instead of one to T471 and might therefore impact the protein structure more than the orientation predicted by SCWRL. In case of A384D, the rotamer of FoldX might be more stable than the one of SCWRL since it is farther away from the surrounding residues. In case of D443N, we prefer the prediction of SCWRL, which is closer to the wildtype conformation. For the same reason, we prefer the prediction of FoldX in case of R44S. For the subsequent GROMACS analysis, we hence chose the FoldX models for mutations 8 and 10 and the SCWRL models for all other mutations.

<figure id="fig:foldx">

Rotamers of SNPs from <xr id="tab:mutations"/>. Blue: wildtype; Red: rotamer SCWRL; Orange: rotamer FoldX; In brackets: energy(mutant)-energy(wildtype). </figure>

A comprehensive list of the differences between the mutant and the wildtype models can be found here. The total energy increased for mutations 4-8 and 10. Just as in case of SCWRL (cf. <xr id="fig:scwrl"/>), L470P, A384D, and W209R increased the energy of the model most, which suggests that these mutations are disease-causing. The fact that some mutations even increase the stability of the model indicates a mild effect on the protein function.

MINIMISE

MINIMISE is a program that applies the CHARMM22 force field and a rotamer library to minimize the energy of models. We ran five iterations of MINIMISE to refine the models generated by SCWRL and FoldX and compared the energy of the most stable mutant model to that of the most stable wildtype model in order to assess the effect of each point mutation.

SCWRL models

The energy of all SCWRL models could be reduced up to the second iteration and then gradually increased again. We therefore considered the models after two iterations of MINIMISE the most reliable ones and used them to assess the impact of the mutations on the phenotype. For this, we compared the MINIMISE energy of the mutant model to the energy of the wildtype model (cf. the values in brackets in <xr id="fig:minimise_scwrl_e"/>). As already observed when employing SCWRL (cf. <xr id="fig:scwrl"/>) and FoldX (cf. <xr id="fig:foldx"/>), L470P, A384D, and W209R increase the energy of the model. However, H60R and D443N now also lead to unstable models, which would suggest an effect on the phenotype. Comparable to FoldX, E111K leads to a more stable model. In contrast to SCWRL and FoldX, the resulting model of R44S is also much more stable than the wildtype model according to the MINIMISE energy function.

<figure id="fig:minimise_scwrl_e">

Energy of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of MINIMISE. In brackets: energy(mutant 2nd iteration)-energy(wildtype 2nd iteration). </figure>

<xr id="fig:minimise_scwrl_m"/> shows to what extent the MINIMISE optimization alters the side-chain conformation of the mutated residue compared to the wildtype residue. Altogether, the side-chain orientation of the mutated residues changes only slightly, and less than the side-chains of the wildtype model. Note that not only the side-chain of the mutated residue changes but also the side-chains of neighbouring residues.

<figure id="fig:minimise_scwrl_m">

Side-chain conformation of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of MINIMISE. In brackets: energy(mutant 2nd iteration)-energy(wildtype 2nd iteration). </figure>

FoldX models

The most stable FoldX models were obtained after one iteration of MINIMISE (cf. <xr id="fig:minimise_foldx_e"/>). Further iterations resulted in models with a higher energy. Hence, we compared the mutant and wildtype models of the first iteration to estimate the impact of the mutations. If the mutations are sorted by their energy difference, the order is similar to the order of the SCWRL models which were optimized by MINIMISE: H60R increases the energy most, followed by L470P, W312C, and W209R. In contrast, R44S results in the most stable model.

<figure id="fig:minimise_foldx_e">

Energy of the FoldX mutant models compared to the FoldX wildtype models over five iterations of MINIMISE. In brackets: energy(mutant first iteration)-energy(wildtype first iteration). </figure>

Comparable to the SCWRL models, the side-chain conformations of the mutated residues are changed only slightly by MINIMISE (cf. <xr id="fig:minimise_foldx_m"/>). The side-chains of L470P and W209R are moved most. Due to the model refinement, Pymol assigned the secondary structure in some parts of the model differently.

<figure id="fig:minimise_foldx_m">

Side-chain conformation of the FoldX mutant models compared to the FoldX wildtype models over five iterations of MINIMISE. In brackets: energy(mutant first iteration)-energy(wildtype first iteration). </figure>

Gromacs

GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics simulation package. We used it to minimize our protein in vacuum and analyzed the energies during the minimization.

Runtime analysis

To show the relationship between nsteps and the runtime of 'mdrun', we chose different values of nsteps from 50 to 5000. Three different force fields were selected:

  1. AMBER03 protein, nucleic AMBER94
  2. CHARMM27 all-atom force field (with CMAP)
  3. OPLS-AA/L all-atom force field


<figtable id="runtime_table">

{| class="wikitable"
! Forcefield !! nstep=50 !! nstep=100 !! nstep=200 !! nstep=300 !! nstep=400 !! nstep=500 !! nstep=1000 !! nstep=2000 !! nstep=5000
|-
| AMBER03<br/>(360 steps to reach the minimum) || 7.446 || 13.987 || 27.186 || 40.664 || 48.479 || 48.481 || 48.475 || 48.694 || 48.743
|-
| CHARMM27<br/>(348 steps to reach the minimum) || 7.292 || 13.785 || 26.843 || 39.990 || 46.322 || 46.158 || 46.121 || 45.345 || 46.013
|-
| OPLS-AA/L<br/>(1177 steps to reach the minimum) || 6.063 || 11.513 || 22.378 || 32.959 || 43.790 || 54.252 || 107.718 || 126.707 || 126.734
|}

Runtime of the Gromacs minimization for different values of nsteps using the AMBER03, CHARMM27, and OPLS-AA/L force fields. Only the 'real' time returned by the Linux 'time' command is shown. All times are given in seconds. </figtable>

<xr id="runtime_table"/> presents the runtime of the Gromacs minimization for different values of nsteps and different force fields. The nsteps parameter sets the maximum number of minimization steps. The computation time increases with nsteps until the minimum is reached. The table also shows the number of steps the program needed to reach the minimum. With AMBER03 and CHARMM27, only 360 and 348 steps were needed, respectively. With OPLS-AA/L, 1177 steps were needed, so the minimization took longer than with the other two force fields.


<figure id="fig:runtim">

Runtime of minimization with Gromacs for different nsteps by using the OPLS-AA/L forcefield and the wildtype structure 2nt0_A.

</figure>


<xr id="fig:runtim"/> shows the runtime for different values of nsteps (from 50 to 5000). Using the OPLS-AA/L force field, the program needed 1177 steps to reach the minimum. From nsteps=50 to nsteps=1177, the runtime increases linearly because the program runs exactly nsteps steps. For nsteps above 1177, the minimum has already been reached, so the program always stops after 1177 steps regardless of the nsteps setting, and the runtime stays the same.
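The plateau described above can be sketched as a simple model in which the runtime is proportional to min(nsteps, steps to reach the minimum). The following Python snippet is only an illustration under that assumption: the timings are copied from the OPLS-AA/L row of the runtime table, while the function names and the least-squares fit through the origin are our own choices and not part of the original analysis.

```python
# Wall-clock times in seconds for OPLS-AA/L, copied from the runtime table.
OPLS_RUNTIME = {50: 6.063, 100: 11.513, 200: 22.378, 300: 32.959, 400: 43.790,
                500: 54.252, 1000: 107.718, 2000: 126.707, 5000: 126.734}
STEPS_TO_MINIMUM = 1177  # steps after which the OPLS-AA/L minimization stopped


def effective_steps(nsteps, steps_to_minimum=STEPS_TO_MINIMUM):
    """Number of steps actually executed: the run stops once the minimum is reached."""
    return min(nsteps, steps_to_minimum)


def seconds_per_step(runtimes, steps_to_minimum=STEPS_TO_MINIMUM):
    """Least-squares slope through the origin of runtime vs. executed steps."""
    xs = [effective_steps(n, steps_to_minimum) for n in runtimes]
    ys = list(runtimes.values())
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict_runtime(nsteps, cost_per_step, steps_to_minimum=STEPS_TO_MINIMUM):
    """Predicted wall-clock time under the assumed linear-then-flat model."""
    return cost_per_step * effective_steps(nsteps, steps_to_minimum)


cost = seconds_per_step(OPLS_RUNTIME)
for nsteps, measured in OPLS_RUNTIME.items():
    print(f"nsteps={nsteps:5d} measured={measured:8.3f} predicted={predict_runtime(nsteps, cost):8.3f}")
```

Under this model, any nsteps value above 1177 yields the same predicted runtime, which is what the table shows for nsteps=2000 and nsteps=5000.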

Mutations

We applied only the AMBER force field to the ten mutated structures. <xr id="mutation_gromacs"/> shows the average bond, angle, and potential energy of the mutated structures and the wildtype, as well as the corresponding differences between each mutated structure and the wildtype.

<figtable id="mutation_gromacs">

{| class="wikitable"
! !! WT !! H60R !! V172I !! E111K !! L197P !! W209R !! L470P !! W312C !! A384D !! D443N !! R44S
|-
| Average Energy Bond || 1699.12 || 1624.88 || 1581.54 || 1973.46 || 1470.86 || 1598.3 || 1694.03 || 1579.31 || 1451.23 || 1621.83 || 1685.66
|-
| Difference Energy Bond || 0 || -74.24 || -117.58 || 274.34 || -228.26 || -100.82 || -5.09 || -119.81 || -247.89 || -77.29 || -13.46
|-
| Average Energy Angle || 4375.68 || 4289.66 || 4371.18 || 4382.41 || 4381.32 || 4387.08 || 4425.35 || 4361.97 || 4361.9 || 4367.22 || 4368.95
|-
| Difference Energy Angle || 0 || -86.02 || -4.5 || 6.73 || 5.64 || 11.4 || 49.67 || -13.71 || -13.78 || -8.46 || -6.73
|-
| Average Energy Potential || -44445.9 || -45289.6 || -45025.6 || -43553.6 || -45475.1 || -45265.6 || -44266.1 || -45024.9 || -45805.7 || -44803.9 || -43924.1
|-
| Difference Energy Potential || 0 || -843.7 || -579.7 || 892.3 || -1029.2 || -819.7 || 179.8 || -579 || -1359.8 || -358 || 521.8
|}

The average bond, angle, and potential energy (in kJ/mol) of the mutated structures and the wildtype, and the corresponding differences. Positive energy differences are colored in red, suggesting a less stable structure than the wildtype. Relatively large negative energy differences are colored in blue, suggesting a much more stable structure. </figtable>

We expected to see significant energy differences between some mutated structures and the wildtype structure in <xr id="mutation_gromacs"/>. A large positive difference would suggest that the mutated structure is much less stable than the wildtype and that the mutation could therefore be harmful. A large negative difference would suggest that the mutated structure is much more stable. Such a mutation could be harmful too, because too much stability might reduce the flexibility of the protein and thereby impair its function. A minor difference would suggest that the mutation is neutral. In <xr id="mutation_gromacs"/>, however, no clearly significant difference was found. Mutations E111K, L470P, and R44S showed an increased potential energy. Mutations L197P and A384D showed the largest negative potential energy differences relative to the wildtype. Since these differences did not seem very significant, it was hard to decide whether they are damaging mutation candidates or not.
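The differences discussed above are plain subtractions, energy(mutant)-energy(wildtype). As a minimal sketch, the following Python snippet recomputes them from the average potential energies listed in the table; the dictionary and function names are hypothetical and only serve this illustration.

```python
# Average potential energies in kJ/mol, copied from the Gromacs table.
POTENTIAL = {
    "WT": -44445.9, "H60R": -45289.6, "V172I": -45025.6, "E111K": -43553.6,
    "L197P": -45475.1, "W209R": -45265.6, "L470P": -44266.1, "W312C": -45024.9,
    "A384D": -45805.7, "D443N": -44803.9, "R44S": -43924.1,
}


def energy_differences(energies, reference="WT"):
    """Return energy(mutant) - energy(reference) for every non-reference entry."""
    ref = energies[reference]
    # Round to one decimal, the precision of the tabulated values.
    return {name: round(e - ref, 1) for name, e in energies.items() if name != reference}


diffs = energy_differences(POTENTIAL)

# Mutations with a positive difference, i.e. a higher potential energy than the wildtype.
destabilizing = sorted((m for m, d in diffs.items() if d > 0), key=diffs.get, reverse=True)
# The two mutations with the most negative difference.
most_stabilizing = sorted(diffs, key=diffs.get)[:2]

print(destabilizing)
print(most_stabilizing)
```

Whether a difference of a few hundred kJ/mol is significant cannot be decided from these numbers alone; the snippet only reproduces the arithmetic behind the table.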

Discussion

In the previous sections, different energy functions were used to calculate the energy of the models which resulted from substituting particular residues. The energy of these mutant models was then compared to the energy of the wildtype models: mutations which result in a model with a higher energy than the wildtype model are likely to be disease-causing. We therefore took a straightforward approach to classify a mutation as disease-causing or non-disease-causing depending on its energy difference: a mutation is disease-causing if its energy difference is positive and if it belongs to the five mutations with the highest energy difference.
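The classification rule stated above can be written down in a few lines. The Python sketch below is our own illustration of that rule, not the script used for the analysis; the function name `disease_calls` is hypothetical, and the energy differences are copied from the SCWRL and AMBER rows of the summary table.

```python
# Energy differences energy(mutant) - energy(wildtype), copied from the summary table.
SCWRL = {"H60R": 2.88, "V172I": 4.39, "E111K": 8.14, "L197P": 3.28, "W209R": 9.23,
         "L470P": 28.84, "W312C": 0.18, "A384D": 13.50, "D443N": 2.36, "R44S": 4.85}
AMBER = {"H60R": -843.7, "V172I": -579.7, "E111K": 892.3, "L197P": -1029.2,
         "W209R": -819.7, "L470P": 179.8, "W312C": -579.0, "A384D": -1359.8,
         "D443N": -358.0, "R44S": 521.8}


def disease_calls(differences, top_n=5):
    """Mutations called disease-causing by one method.

    A mutation is called disease-causing if its energy difference is positive
    AND it is among the top_n mutations with the highest energy difference.
    """
    ranked = sorted(differences, key=differences.get, reverse=True)
    top = set(ranked[:top_n])
    return {m for m in differences if differences[m] > 0 and m in top}


print(sorted(disease_calls(SCWRL)))
print(sorted(disease_calls(AMBER)))
```

The AMBER example shows why both conditions matter: only three mutations have a positive energy difference, so fewer than five mutations are called disease-causing even though five are in the top ranks.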

<figtable id="tab:discussion">

{| class="wikitable"
! colspan="2" | Method !! 1 !! 2 !! 3 !! 4 !! 5 !! 6 !! 7 !! 8 !! 9 !! 10 !! Prediction
|-
! Name !! Weight !! H60R !! V172I !! E111K !! L197P !! W209R !! L470P !! W312C !! A384D !! D443N !! R44S !! Accuracy
|-
| SCWRL || 1.0 || 2.88 (8) || 4.39 (6) || 8.14 (4) || 3.28 (7) || 9.23 (3) || 28.84 (1) || 0.18 (10) || 13.50 (2) || 2.36 (9) || 4.85 (5) || 60%
|-
| FoldX || 1.0 || -0.39 (9) || -0.80 (10) || -0.36 (8) || 0.28 (6) || 2.86 (3) || 8.59 (1) || 1.88 (4) || 4.52 (2) || -0.34 (7) || 0.60 (5) || 60%
|-
| MINIMISE - SCWRL || 1.0 || 118.96 (1) || 2.48 (8) || -1.02 (9) || 17.59 (6) || 65.80 (4) || 67.08 (3) || 15.31 (7) || 26.65 (5) || 91.79 (2) || -110.46 (10) || 40%
|-
| MINIMISE - FoldX || 1.0 || 91.17 (1) || -2.52 (9) || 5.91 (8) || 28.80 (6) || 52.35 (4) || 72.20 (2) || 63.56 (3) || 22.79 (7) || 45.93 (5) || -75.12 (10) || 40%
|-
| AMBER || 1.0 || -843.7 (8) || -579.7 (6) || 892.3 (1) || -1029.2 (9) || -819.7 (7) || 179.8 (3) || -579 (5) || -1359.8 (10) || -358 (4) || 521.8 (2) || 40%
|-
| colspan="2" | Prediction || !Disease || !Disease || !Disease || !Disease || Disease || Disease || !Disease || Disease || !Disease || Disease || 50%
|-
| colspan="2" | Verification || !Disease || !Disease || Disease || Disease || Disease || !Disease || Disease || Disease || !Disease || !Disease ||
|}

Summary of the structure-based mutation analysis. For each mutation, the energy difference, i.e. energy(mutant)-energy(wildtype), is listed. The rank of the mutation after sorting all mutations by their energy difference is given in brackets. A mutation is disease-causing according to a certain method if its energy difference is positive and if it belongs to the five mutations with the highest energy difference. A mutation is predicted as disease-causing if it is disease-causing according to the majority of tools. </figtable>
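The consensus prediction and the accuracies in the table can be retraced with a weighted majority vote. The following Python sketch is an illustration under our own assumptions: the per-method calls were derived by applying the rule from the caption to the tabulated energy differences, all weights are 1.0 as in the table, and the names `majority_vote` and `accuracy` are hypothetical.

```python
MUTATIONS = ["H60R", "V172I", "E111K", "L197P", "W209R",
             "L470P", "W312C", "A384D", "D443N", "R44S"]

# Mutations called disease-causing by each method (positive difference and rank <= 5).
CALLS = {
    "SCWRL": {"E111K", "W209R", "L470P", "A384D", "R44S"},
    "FoldX": {"W209R", "L470P", "W312C", "A384D", "R44S"},
    "MINIMISE - SCWRL": {"H60R", "W209R", "L470P", "A384D", "D443N"},
    "MINIMISE - FoldX": {"H60R", "W209R", "L470P", "W312C", "D443N"},
    "AMBER": {"E111K", "L470P", "R44S"},
}

# Mutations annotated as disease-causing (the "Verification" row of the table).
VERIFIED = {"E111K", "L197P", "W209R", "W312C", "A384D"}


def majority_vote(calls, weights=None):
    """Mutations whose summed method weight exceeds half of the total weight."""
    weights = weights or {method: 1.0 for method in calls}
    total = sum(weights.values())
    return {m for m in MUTATIONS
            if sum(weights[t] for t, called in calls.items() if m in called) > total / 2}


def accuracy(predicted, verified):
    """Fraction of mutations whose predicted label matches the annotation."""
    return sum((m in predicted) == (m in verified) for m in MUTATIONS) / len(MUTATIONS)


consensus = majority_vote(CALLS)
print(sorted(consensus), accuracy(consensus, VERIFIED))
for method, called in CALLS.items():
    print(method, accuracy(called, VERIFIED))
```

Changing the entries of the `weights` dictionary would allow more reliable methods to count more in the vote, which is presumably what the Weight column of the table was intended for.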

<xr id="tab:discussion"/> summarizes the prediction results. Compared to the sequence-based mutation analysis, none of the methods obtained an accuracy above 60%. The SCWRL and the FoldX energy functions turned out to be the most reliable ones. L470P clearly increases the energy of the model according to all methods, which is strong evidence that this mutation impairs the protein function. We drew the same conclusion in the sequence-based mutation analysis. Nevertheless, this mutation is not present in the HGMD and is therefore not yet considered disease-causing.

Conclusions

Tools like SCWRL or FoldX can help to find probable side-chain conformations of mutated residues, and the energy values they provide help to assess the impact of mutations. However, the output of these programs is not sufficient to reliably classify mutations as disease-causing or non-disease-causing, since (1) the energy calculations depend on model assumptions which might not hold and (2) models with a higher energy are not necessarily the worse models, as protein flexibility is important for protein function in many cases. We conclude that it is essential to also investigate various structural features manually and to integrate sequence information in order to estimate the severity of mutations.

References

<references/>