Structure-based mutation analysis Gaucher Disease
The aim of this task was to analyse ten mutations thoroughly and to classify each of them as disease-causing or non-disease-causing. Technical details are reported in our protocol.
|PDB||Res [Å]||R value||Coverage||pH|
The five crystal structures of glucosylceramidase with the highest resolution. The physiological lysosomal pH is 4.5. 2nt0 was selected for the analysis. </figtable>
Mutations used for the structure-based mutation analysis. </figtable>
We employed SCWRL <ref name="scwrl">Qiang Wang, Adrian A. Canutescu, and Roland L. Dunbrack, Jr. (2008). SCWRL and MolIDE: Computer programs for side-chain conformation prediction and homology modeling. Nat Protoc.</ref> to substitute the wildtype residues listed in <xr id="tab:mutations"/> with the corresponding mutant residues, whose conformations are chosen from a rotamer library. <xr id="fig:scwrl"/> shows the results.
Rotamers of SNPs from <xr id="tab:mutations"/>. Blue: wildtype; Red: rotamer SCWRL; In brackets: energy(mutant)-energy(wildtype). </figure>
None of the rotamers chosen by SCWRL clashed with another side-chain or with the backbone. The only mutation that led to a structural change was L470P, where the insertion of proline interrupted the beta-sheet. The hydrogen bonding network changed for mutations 1, 5, 7, and 8 (cf. <xr id="tab:scwrl"/>). W209R introduces a hydrophilic arginine which forms a hydrogen bond to T180. Although SCWRL did not predict a structural change, the arginine might still affect the protein structure. W312C is located next to the active site (cf. <xr id="fig:mutations"/>) and forms a hydrogen bond to E340. Substituting the hydrophobic tryptophan with a hydrophilic cysteine in the vicinity of the active site might account for the disease-causing effect of this mutation.
As expected, all mutations increased the energy of the model (cf. the energy differences in brackets in <xr id="fig:scwrl"/>). The energy increased most for L470P because of the break in the beta-sheet. A384D and W209R also made the model less stable, which can be attributed to the substitution of a nonpolar residue with a charged one. All four mutations that increased the model energy most are disease-causing.
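The bracketed values are the difference energy(mutant) - energy(wildtype), and the paragraph above ranks the mutations by that difference. A minimal sketch of this bookkeeping follows; the function name and all numbers are placeholders chosen for illustration and are not results from this analysis.

```python
def rank_by_energy_difference(wildtype_energy, mutant_energies):
    """Return (mutation, delta) pairs, largest energy increase first.

    delta = energy(mutant) - energy(wildtype); a positive delta means the
    mutant model is less stable than the wildtype model.
    """
    deltas = {name: energy - wildtype_energy
              for name, energy in mutant_energies.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Placeholder energies (arbitrary units), NOT values from the analysis.
ranking = rank_by_energy_difference(
    wildtype_energy=100.0,
    mutant_energies={"mut_a": 130.0, "mut_b": 105.0, "mut_c": 112.5},
)
for name, delta in ranking:
    print(f"{name}: {delta:+.1f}")
```

Sorting by the signed difference rather than its absolute value keeps stabilising mutations (negative delta) at the bottom of the list.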
|7||W312C||E340, C342, P316||Hydrophobic||E340, C342||Hydrophilic||No||No|
|10||R44S||S13, Y487||Hydrophilic||S13, Y487||Hydrophilic||No||No|
Structure-based analysis of SNPs from <xr id="tab:mutations"/>. H-bonds: residues involved in forming hydrogen bonds (cut-off: 3.2 Å). </figtable>
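The hydrogen bond partners in the table were determined with a 3.2 Å distance cut-off. The sketch below shows a distance-only test of that kind. It is an assumption about how such a check could be written, not the tool used here; real hydrogen bond detection usually also checks the donor-hydrogen-acceptor angle. The atom labels and coordinates are hypothetical.

```python
from math import dist

HBOND_CUTOFF = 3.2  # Å, the cut-off quoted in the table caption


def hbond_partners(donors, acceptors, cutoff=HBOND_CUTOFF):
    """Return sorted (donor, acceptor, distance) triples within the cutoff.

    donors and acceptors map an atom label to its (x, y, z) position in Å.
    Only the donor-acceptor distance is tested; angles are ignored.
    """
    pairs = []
    for donor_name, donor_xyz in donors.items():
        for acceptor_name, acceptor_xyz in acceptors.items():
            distance = dist(donor_xyz, acceptor_xyz)
            if distance <= cutoff:
                pairs.append((donor_name, acceptor_name, round(distance, 2)))
    return sorted(pairs)


# Hypothetical coordinates, for illustration only.
donors = {"ARG:NH1": (0.0, 0.0, 0.0)}
acceptors = {"THR:OG1": (2.9, 0.0, 0.0), "GLU:OE1": (0.0, 4.0, 0.0)}
pairs = hbond_partners(donors, acceptors)
print(pairs)
```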
We further noticed that SCWRL changed the backbone at some positions, which led to different secondary structure assignments (<xr id="fig:scwrl_ss"/>). The positions at which these deviations occurred were independent of the mutated sites.
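One way to check that the deviating positions are unrelated to the mutated site is to compare the two secondary structure strings position by position and then discard differences close to the mutation. The sketch below assumes one-letter-per-residue assignment strings of equal length. The strings, the mutated position, and the window size are made up for illustration.

```python
def ss_differences(wildtype_ss, mutant_ss):
    """Return (position, wildtype_state, mutant_state) for each mismatch.

    Positions are 1-based. Both strings must describe the same residues.
    """
    if len(wildtype_ss) != len(mutant_ss):
        raise ValueError("secondary structure strings differ in length")
    return [(index + 1, wt, mut)
            for index, (wt, mut) in enumerate(zip(wildtype_ss, mutant_ss))
            if wt != mut]


def differences_away_from_site(differences, mutated_position, window=3):
    """Keep only differences more than `window` residues from the mutation."""
    return [diff for diff in differences
            if abs(diff[0] - mutated_position) > window]


# Made-up assignment strings (H = helix, E = strand, C = coil).
wildtype = "CCHHHHEEEECC"
mutant = "CCHHHCEEEECC"
diffs = ss_differences(wildtype, mutant)
remote = differences_away_from_site(diffs, mutated_position=10, window=3)
print(diffs, remote)
```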
FoldX <ref>Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005) The FoldX web server: an online force field. Nucleic Acids Research.</ref> is a force field for assessing the impact of point mutations. We called FoldX to determine the optimal side-chain conformation of the mutated site and compared the energy of the mutant model with the wildtype model in order to assess the severity of each mutation.
<xr id="fig:foldx"/> depicts the side-chain conformations predicted by FoldX in comparison to SCWRL and the wildtype. The predictions of FoldX and SCWRL differed in four cases. For H60R, the arginine side-chain orientation predicted by FoldX forms two hydrogen bonds to T741 instead of one and might therefore affect the protein structure more than the orientation predicted by SCWRL. For A384D, the FoldX rotamer might be more stable than the SCWRL rotamer since it lies further from the surrounding residues. For D443N, we prefer the SCWRL prediction, which is closer to the wildtype configuration. For the same reason, we prefer the FoldX prediction for R44S. For the subsequent GROMACS analysis, we hence chose the FoldX models for mutations 8 and 10 and the SCWRL models for all other mutations.
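The preference for the rotamer "closer to the wildtype configuration" could be made quantitative with a side-chain RMSD. The following is only a sketch of that idea, not necessarily how the models were compared here. It assumes the models are already superimposed and that side-chain atoms can be matched one-to-one, and the coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return sqrt(squared / len(coords_a))


def closest_model(reference, candidates):
    """Return the name of the candidate with the lowest RMSD to reference."""
    return min(candidates, key=lambda name: rmsd(reference, candidates[name]))


# Hypothetical side-chain coordinates in Å, for illustration only.
wildtype_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
models = {
    "scwrl": [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
    "foldx": [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
}
print(closest_model(wildtype_atoms, models))
```

Residue pairs with different atom counts would need an explicit atom mapping before this comparison is meaningful.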
Rotamers of SNPs from <xr id="tab:mutations"/>. Blue: wildtype; Red: rotamer SCWRL; Orange: rotamer FoldX; In brackets: energy(mutant)-energy(wildtype). </figure>
A comprehensive list of the differences between the mutant and the wildtype models can be found here. The total energy increased for mutations 4-8 and 10. As with SCWRL (cf. <xr id="fig:scwrl"/>), L470P, A384D, and W209R increased the energy of the model most, which suggests that these mutations are disease-causing. That some mutations even increased the stability of the model suggests that they have only a mild effect on protein function.
Energy of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of Minimise. </figure>
Side-chain conformation of the SCWRL mutant models compared to the SCWRL wildtype models over five iterations of Minimise. </figure>
Energy of the FoldX mutant models compared to the FoldX wildtype models over five iterations of Minimise. </figure>
Side-chain conformation of the FoldX mutant models compared to the FoldX wildtype models over five iterations of Minimise. </figure>
To show the relationship between nsteps and the runtime of 'mdrun', nsteps was varied from 50 to 5000. Three force fields were compared:
- AMBER03 protein, nucleic AMBER94
- CHARMM27 all-atom force field (with CMAP)
- OPLS-AA/L all-atom force field
AMBER03 (360 steps to reach the minimum)
CHARMM27 (348 steps to reach the minimum)
OPLS-AA/L (1177 steps to reach the minimum)
Runtime of the Gromacs minimization for different values of nsteps with the AMBER03, CHARMM27, and OPLS-AA/L force fields. Only the 'real' time returned by the Linux 'time' command is shown, in seconds. </figtable>
<xr id="runtime_table"/> lists the runtime of the Gromacs minimization for different values of nsteps and different force fields. nsteps sets the maximum number of minimization steps. The runtime increased with nsteps. The table also shows the number of steps the program needed to reach the minimum: only 360 with AMBER03 and 348 with CHARMM27, but 1177 with OPLS-AA/L, which is why OPLS-AA/L ran longer than the other two.
<xr id="fig:runtim"/> plots the runtime for nsteps from 50 to 5000. With the OPLS-AA/L force field, the program needed 1177 steps to reach the minimum. From nsteps=50 to nsteps=1177, the runtime increased linearly because the program ran exactly nsteps steps. Beyond nsteps=1177, the minimum had already been reached, so the program always stopped after 1177 steps regardless of the nsteps setting, and the runtime stayed constant.
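The plateau follows from a simple relation: the minimizer executes min(nsteps, steps to reach the minimum) steps. The sketch below encodes this using the step counts reported above. The per-step cost `seconds_per_step` is a hypothetical parameter, and the linear model ignores any fixed start-up overhead.

```python
# Steps needed to reach the minimum, as reported in the text.
STEPS_TO_MINIMUM = {"AMBER03": 360, "CHARMM27": 348, "OPLS-AA/L": 1177}


def steps_executed(nsteps, steps_to_minimum):
    """Steps actually run: capped by nsteps and by convergence."""
    return min(nsteps, steps_to_minimum)


def estimated_runtime(nsteps, steps_to_minimum, seconds_per_step):
    """Linear runtime model; seconds_per_step is an assumed constant."""
    return steps_executed(nsteps, steps_to_minimum) * seconds_per_step


for force_field, needed in STEPS_TO_MINIMUM.items():
    executed = [steps_executed(n, needed) for n in (50, 500, 5000)]
    print(force_field, executed)
```

Under this model the runtime grows linearly with nsteps up to the convergence point and is flat afterwards, which matches the shape described for OPLS-AA/L.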
Energy          Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Bond            1624.88        820    5492.83   -5022.62  (kJ/mol)
Angle           4289.66         76    402.822   -411.894  (kJ/mol)
Potential      -45289.6       2900    16896.6   -18817.6  (kJ/mol)
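For further processing, an energy summary of this shape can be read into a dictionary. The parser below is a minimal sketch that assumes single-word term names followed by four numeric columns, as in the three rows shown here. Term names containing spaces would need extra handling.

```python
COLUMNS = ("Average", "Err.Est.", "RMSD", "Tot-Drift")

SAMPLE = """\
Energy Average Err.Est. RMSD Tot-Drift
-------------------------------------------------------------------------------
Bond 1624.88 820 5492.83 -5022.62 (kJ/mol)
Angle 4289.66 76 402.822 -411.894 (kJ/mol)
Potential -45289.6 2900 16896.6 -18817.6 (kJ/mol)
"""


def parse_energy_table(text):
    """Map each energy term to a dict of its four numeric columns."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # separator or blank line
        try:
            values = [float(field) for field in fields[1:5]]
        except ValueError:
            continue  # header line: columns are not numeric
        rows[fields[0]] = dict(zip(COLUMNS, values))
    return rows


table = parse_energy_table(SAMPLE)
print(table["Potential"]["Average"])
```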