Canavan Task 10 - Molecular Dynamics Simulations

From Bioinformatikpedia

Protocol

Further information and commands can be found in the protocol.

Initial Checks

For all three runs, there are 2000 time frames, with a resolution of 5 psec. Therefore the whole simulation ran for 10000 psec = 10 nsec. No errors occured during the run and the simulation finished properly.

<figtable id="gmxcheck">

<xr nolink id="gmxcheck"/> Results from gmxcheck and the logfile for the three MD simulations for Aspartoacylase wildtype and the mutants K213E and A305E
WT K213E A305E
run time 5h 22:37 5h 08:54 5h 08:35
atoms outside of box 406, 408, 480, 482, 483, 484, 485, 486, 500, 501,.. 406, 480, 482, 483, 484, 485, 486, 500, 502, 503,.. 406, 408, 480, 482, 483, 484, 485, 486, 500, 501,..
Last frame 2000, time 10000.000 2000, time 10000.000 2000, time 10000.000

</figtable>

Energies

Temperature, Pressure, Total Energies and Potential Energies have been analysed for the three proteins for the MD run using g_energy. For all analysed thermodynamical parameters convergence could be observed. Though, for pressure the values vary enormously, but the average pressure is close to the specified value of 1 bar.

Temperature

The temperature fluctuates about the reference temperature of 298K that has been specified in the corresponding mdp file. There are only small deviations from this value, which is reflected in the small error estimates.

<figtable id="energies_temp">

<xr nolink id="energies_temp"/> Quality check for convergence of energy values: temperature.
WT K213E A305E
Reference Value 298 K 298 K 298 K
Average 297.914 K 297.908 297.917
Err.Est. 0.0072 0.0059 0.0047
RMSD 1.38693 1.38991 1.3981
Total Drift -0.00042403 (K) 0.00867898 (K) 0.0154339 (K)
Plot <figure id="wt_temp">
<xr nolink id="wt_temp"/>
Temperature of the system for the wildtype smoothed over a window of 20.
</figure>
<figure id="k213_scwrl_temp">
<xr nolink id="k213_scwrl_temp"/>
Temperature of the system for K213E smoothed over a window of 20.
</figure>
<figure id="a305_scwrl_temp.png">
<xr nolink id="a305_scwrl_temp.png"/>
Temperature of the system for A305E smoothed over a window of 20.
</figure>

</figtable>

Pressure

The pressure varies enormously about the reference pressure of 1 bar. This results in large RMSD values. Yet, the average pressure over the whole simulation is very close to the specified pressure of 1 bar and therefore the error estimates are relatively small.


<figtable id="energies_temp">

<xr nolink id="energies_temp"/> Quality check for convergence of energy values: pressure.
WT K213E A305E
Reference Value 1.0 (Berendsen barostat) 1.0 (Berendsen barostat) 1.0 (Berendsen barostat)
Average 1.00763 1.00683 1.00797
Err.Est. 0.018 0.015 0.022
RMSD 111.943 112.761 112.987
Total Drift -0.0713928 (bar) -0.0585283 (bar) -0.100316 (bar)
Plot <figure id="wt_pressure">
<xr nolink id="wt_pressure"/>
Pressure of the system for the wildtype smoothed over a window of 20.
</figure>
<figure id="k213_scwrl_pressure">
<xr nolink id="k213_scwrl_pressure"/>
Pressure of the system for K213E smoothed over a window of 20.
</figure>
<figure id="a305_scwrl_pressure">
<xr nolink id="a305_scwrl_pressure"/>
Pressure of the system for A305E smoothed over a window of 20.
</figure>

</figtable>

Potential Energy

The potential energy decreases over simulation time, which leads to the conclusion, that the conformation of the structure is energetically optimized during the simulation. The decrease in energy is reflected in the large negative drift values. The wildtype has the best average energies and also the best energies at the end of the simulation with about -592000 kJ/mol, whereas for K213E it is -586000 kJ/mol and for A305E -583000 kJ/mol.


<figtable id="energies_potential">

<xr nolink id="energies_potential"/> Quality check for convergence of energy values: potential energy.
WT K213E A305E
Average -592161 -585993 -583187
Err.Est. 55 65 46
RMSD 721.947 726.81 722.076
Total Drift -252.112 (kJ/mol) -425.823 (kJ/mol) -290.165 (kJ/mol)
Plot <figure id="CD_wt_potenergy">
<xr nolink id="CD_wt_potenergy"/>
Potential energy of the system for the wildtype smoothed over a window of 20
</figure>
<figure id="k213_scwrl_poten">
<xr nolink id="k213_scwrl_poten"/>
Potential energy of the system for K213E smoothed over a window of 20
</figure>
<figure id="a305_scwrl_potenergy">
<xr nolink id="a305_scwrl_potenergy"/>
Potential energy of the system for A305E smoothed over a window of 20
</figure>

</figtable>

Total energy

The total energy values resemble the potential energy values. In contrast to the potential energy, kinetic energies are included in the calculation of the total energy. Again the total energy decreases during the simulation. And again, the wildtype protein has the best average total energy with about -485000 kJ/mol against -480000 kJ/mol for the K213E mutant and -478500 kJ/mol for A305E.

<figtable id="energies_total">

<xr nolink id="energies_total"/> Quality check for convergence of energy values: total energy.
WT K213E A305E
Average -485680 -480905 -478506
Err.Est. 54 64 45
RMSD 886.726 888.522 887.724
Total Drift -252.262 (kJ/mol) -422.763 (kJ/mol) -284.743 (kJ/mol)
Plot <figure id="wt_tot_energy">
<xr nolink id="wt_tot_energy"/>
Total energy of the system for the wildtype, smoothed over a window of 20 timesteps.
</figure>
<figure id="k213_scwrl_toten">
<xr nolink id="k213_scwrl_toten"/>
Total energy of the system for K213E, smoothed over a window of 20 timesteps.
</figure>
<figure id="a305e_scwrl_totenergy">
<xr nolink id="a305e_scwrl_totenergy"/>
Total energy of the system for A305E smoothed over a window of 20
</figure>

</figtable>

distances between periodic boundaries

We calculated the minimum distance between periodic images for the whole protein (not only C-alpha atoms). The suggested distance limit of 2nm is undercut at some timesteps during the simulation. Especially for the mutant K213E the distance often is below 2nm. This might have caused undesired unphysical interactions.

<figtable id="periodic_boundary">

<xr nolink id="periodic_boundary"/> Minimum distances between periodic images for the simulation run calculated with g_mindist.
WT K213E A305E
shortest dist 1.6456 (nm) 1.47431 (nm) 1.59859 (nm)
at time step 7675 (ps) 7515 (ps) 2460 (ps)
between atoms 15 and 4507 598 and 4497 597 and 4507
Plot <figure id="wt_pi">
<xr nolink id="wt_pi"/>
Minimum distance of periodic images for the wildtype protein. Between timesteps 6000-9000 the minimal distance between periodic images is below 2nm.
</figure>
<figure id="k213e_pi">
<xr nolink id="k213e_pi"/>
Minimum distance of periodic images for K213E. Almost during the whole simulation the minimal distance is less than 2 nm.
</figure>
<figure id="a305e_pi">
<xr nolink id="a305e_pi"/>
Minimum distance of periodic images for A305E. The minimum distance between periodic images is larger than 2nm for most of the simulation.
</figure>

</figtable>

RMSF

Only small fluctuations can be observed for the residues of the three proteins.

For all three proteins there is a peak around residues 60-75, which defines this region as rather flexible. Especially for K213E, there is a strong peak of more than 0.35 nm. This region forms a loop, that is highlighted with a red circle in the lower right corner in the B-factor figures (<xr id="wt_bfactors"/>, <xr id="k213e_bfactors"/>, <xr id="a305e_bfactors"/>).

There is another flexible region formed by residues 220-230. This region is also highlighted by a red circle in the b-factor figures in the lower left corner.

For the wildtype, there is a region between residues 120 and 180 that is especially rigid. When looking at the structure, one finds that this regions defines the core of the protein.

The figures of the B-factors of the proteins visualize the fluctuation plots. Most parts of the protein are rather rigid and only some exposed loops have higher bfactors. These loops correspond to the regions identified in the fluctuation plots.

Interestingly, K213E has rather low B-factor values, than one might expect from the fluctuations plot, whereas A305E has higher B-factors than the wildtype and seems to be more motile.



<figtable id="rmsf">

<xr nolink id="rmsf"/> Root Mean Square Fluctuations for the MD simulation run calculated per residue with g_rmsf.
WT K213E A305E
RMSF Plot <figure id="wt_rmsf_plot">
<xr nolink id="wt_rmsf_plot"/>
Per residue root-mean-square-fluctution plot for the wildtype protein. There is a rigid region around residues 120-180.
</figure>
<figure id="k213e_rmsf_plot">
<xr nolink id="k213e_rmsf_plot"/>
Per residue root-mean-square-fluctution plot for K213E.
</figure>
<figure id="a3o5e_rmsf_plot">
<xr nolink id="a3o5e_rmsf_plot"/>
Per residue root-mean-square-fluctution plot for A305E.
</figure>
B-Factors <figure id="wt_bfactors">
<xr nolink id="wt_bfactors"/>
The wildtype protein colored according to the calculated B-Factors. Highly flexible loops (according to residues from the fluctuations plot) are emphasized by red circles.
</figure>
<figure id="k213e_bfactors">
<xr nolink id="k213e_bfactors"/>
K213E colored according to the calculated B-Factors. Highly flexible loops (according to residues from the fluctuations plot) are emphasized by red circles.
</figure>
<figure id="a305e_bfactors">
<xr nolink id="a305e_bfactors"/>
A305E colored according to the calculated B-Factors. Highly flexible loops (according to residues from the fluctuations plot) are emphasized by red circles.
</figure>

</figtable>

RMSF Differences

With this script we compared the RMSF for the three proteins.

For wildtype and K213E the P-value is 0.03297.

For wildtype and A305E the P-value is 2.18339e-11.

For K213E and A305E the P-value is 0.00011.

Average Structure

g_rmsf also generates an unphysical average structure. One can see, that for regions with high b-facotrs, the averaged structure shows several possible residue conformers.

<figtable id="average">

<figure id="wt_average">
<xr nolink id="wt_average"/>
Close-up of the loop region formed by residues 60-75 and the average residue conformers for this loop. The higher the b-factor, the more possible residue conformers are found.
</figure>

</figtable>

Convergence of RMSD

As expected, the RMSD increases when using the starting structure as a reference: During the simulation, the structure changes and deviates more and more from the starting structure. Yet, these changes are not tremendous (RMSD < 0.2 nm), as the starting structure is derived from the crystal structure and therefore should already have adopted an optimal conformation.

When using only C-alpha atoms for the RMSD calculation instead of the whole protein, the values are smaller, since sidechains are the most flexible elements in a structure and cause a higher RMSD compared to only C-alpha atoms.

When taking the average structure as reference, the RMSD is higher at the beginning of the simulation and finally converges as the structure reaches an equilibrium. Only for A305E, the RMSD increases in the last 2000 timesteps and therefore does not show convergence. Another interesting point is that for both mutants the RMSD values are much higher than for the wildtype: For the wildtype the RMSD for the average structure is about 0.48 nm at the end of the simulation, whereas for K213E it is 0.69 nm and for A305E it is 0.61 nm.

Distance-based RMSD

<figure id="db_rmsd">

<xr nolink id="db_rmsd"/>
Coordinate-based-vs-distance-based RMSD.

</figure>

Here, we also included the 'internal' or distance-based RMSD (<xr id="db_rmsd"/>), since a problem with a coordinate-based RMSD is that it involves least squares fitting of the simple coordinates of the atoms, not their distances. We can definitely see differences between those RMSD measures, one being that the wildtype seems to undergo a larger structural change around frame 6000, which is more supported by the internal RMSD rather than the coordinate-based one.


<figtable id="rmsd">

<xr nolink id="rmsd"/> Root Mean Square Deviations of the structure over time towards different reference structures calculated with g_rmsd.
WT K213E A305E
RMSD whole protein compared to starting structure <figure id="wt_rmsd_all_vs_first">
<xr nolink id="wt_rmsd_all_vs_first"/>
RMSD of wildtype protein compared to the starting structure over the simulation time. RMSD increases fast and converges at the end of the simulation.
</figure>
<figure id="k213e_rmsd_all_vs_first">
<xr nolink id="k213e_rmsd_all_vs_first"/>
RMSD of K213E mutant compared to the starting structure over the simulation time. RMSD increases fast and converges at the end of the simulation.
</figure>
<figure id="a3o5e_rmsd_all_vs_first">
<xr nolink id="a3o5e_rmsd_all_vs_first"/>
RMSD of A305E mutant compared to the starting structure over the simulation time. RMSD increases fast and converges at the end of the simulation.
</figure>
RMSD C-alpha atoms compared to starting structure <figure id="wt_rmsd_calpha_vs_first">
<xr nolink id="wt_rmsd_calpha_vs_first"/>
C-alpha RMSD of the wildtype protein compared to the starting structure over the simulation time. At the beginning of the simulation, the RMSD increases a bit and converges towards the end.
</figure>
<figure id="k213_scwrl_rmsd_calpha_first">
<xr nolink id="k213_scwrl_rmsd_calpha_first"/>
C-alpha RMSD of K213E mutant compared to the starting structure over the simulation time. At the beginning of the simulation, the RMSD increases a bit and converges towards the end.
</figure>
<figure id="a305e_scwrl_rmsd_calpha_first">
<xr nolink id="a305e_scwrl_rmsd_calpha_first"/>
C-alpha RMSD of A305E mutant compared to the starting structure over the simulation time. At the beginning of the simulation, the RMSD increases a bit and converges towards the end.
</figure>
Internal RMSD for Calpha atoms <figure id="wt__internal_rmsd">
<xr nolink id="wt__internal_rmsd"/>
C-alpha internal RMSD of the wildtype protein. It starts off similarly to the coordinate-based RMSD, but shows a more definite peak around frame 6000, an observation that we will repeat with further analysis.
</figure>
<figure id="k213_internal_rmsd">
<xr nolink id="k213_internal_rmsd"/>
C-alpha internal RMSD of K213E mutant.
</figure>
<figure id="a305e_internal_rmsd">
<xr nolink id="a305e_internal_rmsd"/>
C-alpha internal RMSD of A305E mutant.
</figure>
RMSD whole protein compared to average structure <figure id="wt_rmsd_all_vs_average">
<xr nolink id="wt_rmsd_all_vs_average"/>
RMSD of wildtype protein compared to the average structure over the simulation time. RMSD decreases as the protein structure converges towards an average structure.
</figure>
<figure id="k213e_rmsd_all_vs_average">
<xr nolink id="k213e_rmsd_all_vs_average"/>
RMSD of K213E mutant compared to the average structure over the simulation time. RMSD decreases as the protein structure converges towards an average structure.
</figure>
<figure id="a305e_rmsd_all_vs_average">
<xr nolink id="a305e_rmsd_all_vs_average"/>
RMSD of A305E mutant compared to the average structure over the simulation time. RMSD decreases at the beginning but increases for the last 2000 time steps. A305E does not show structural convergence.
</figure>
RMSD protein backbone compared to average structure <figure id="wt_rmsd_calpha_vs_average">
<xr nolink id="wt_rmsd_calpha_vs_average"/>
RMSD of the wildtype backbone compared to the average structure over the simulation time. RMSD decreases as the protein structure converges towards an average structure.
</figure>
<figure id="k213_scwrl_rmsd_calpha_average">
<xr nolink id="k213_scwrl_rmsd_calpha_average"/>
RMSD of the K213E mutant backbone compared to the average structure over the simulation time. RMSD decreases as the protein structure converges towards an average structure.
</figure>
<figure id="a305e_scwrl_rmsd_calpha_average">
<xr nolink id="a305e_scwrl_rmsd_calpha_average"/>
RMSD of the A305E mutant backbone compared to the average structure over the simulation time. RMSD varies a lot over simulation time, but in the end has the same value as at the beginning of the simulation.
</figure>

</figtable>

Visualization

Wildtype

wt_movie.avi

K213E

k213e_movie.avi

closeup

A305E

a305e_movie.avi

Radius of gyration

Against our expectations, the radius of gyration increases for the proteins. As the energy of the system decreases during the run, we would expect that the protein becomes more compact. One idea is, that we used the monomeric form of the protein for the simulation, whereas in the crystal structure it is a dimer. Therefore the monomer might have a different energetical optimal conformation than compared to its dimer bound form.

Yet, the changes are only minor and within a range of less than 0.05 nanometer. Thus, the increase in the radius of gyration is not of great impact.


<figtable id="rog">

<xr nolink id="rog"/> Radius of gyration of the proteins for the simulation run calculated with g_gyrate.
WT K213E A305E
Plot <figure id="wt_rg">
<xr nolink id="wt_rg"/>
Radius of gyration of the wildtype protein during the MD simulation.
</figure>
<figure id="k213_scwrl_rog">
<xr nolink id="k213_scwrl_rog"/>
Radius of gyration of the K213E mutant during the MD simulation.
</figure>
<figure id="a305e_scwrl_rog">
<xr nolink id="a305e_scwrl_rog"/>
Radius of gyration of the A305E mutant during the MD simulation.
</figure>

</figtable>


Surface

The surface for the wildtype does not significantly change during the simulations, but rather oscillates around 88nm^2. Big changes in the surface could imply major structural changes like an opening or closing process of the molecule, but we can neither observe that in our visualisations, nor see it implied from the surface data.
For the K213E mutant, which lies on an outer loop far away from the binding site and from the dimer interaction site, but is still reported to affect protein function, we can observe a slight increase of the surface area, but the oscillation is still strong, so we cannot be sure if there is really a difference to the wildtype or if we would only need to run the simulation a little longer to see them converging at the same value.
For the A305E mutant, one of the most frequent of Canavan Disease patients, we can definitely notice a difference to the wildtype's behaviour of surface area during the simulation. First, its area is larger in general, i.e., the average seems to be around 90nm^2, which is definitely a difference which is large enough to be mentioned. Also, its oscillation are a little stronger.

<figtable id="sas">

<xr nolink id="rog"/> The solvent accessible surface of the proteins for the simulation run calculated with g_sas.
WT K213E A305E
Plot <figure id="wt_aspa_sas">
<xr nolink id="wt_rg"/>
The solvent accessible surface of the wildtype during the MD simulation.
</figure>
<figure id="k213_sas">
<xr nolink id="k213_scwrl_rog"/>
The solvent accessible surface of the K213E mutant during the MD simulation.
</figure>
<figure id="a305e_sas">
<xr nolink id="a305e_scwrl_rog"/>
The solvent accessible surface of the A305E mutant during the MD simulation.
</figure>

</figtable>

HBonds within Protein

For the Wildtype protein, the number of HBonds within the protein oscillates around 230. We can see a slight decrease around frame 6000, an observation that will be followed by more hints that some bigger changes might be going on around frame 6000.
The K213E mutant also starts with roughly 230 HBonds, but this number quickly decreases and convergence seems to be around 220 HBonds.
Mutant A305E now shows an enormous difference to the wildtype and K213E mutant - in fact, the difference is so large that we are still figuring out if there is a mistake in our input files to compute it. Todo. We consider it very unlikely that a single mutation can be responsible for such an increase in the number of HBonds. On the other side, A305E is one of the most frequent mutations reported to affect protein function, so we do expect to see differences.

<figtable id="hbonds">

<xr nolink id="hbonds"/> Number of HBonds within the proteins for the simulation run calculated with g_hbond.
WT K213E A305E
Plot <figure id="wt_rg">
<xr nolink id="wt_rg"/>
Number of HBonds within the protein of the wildtype protein during the MD simulation.
</figure>
<figure id="k213_scwrl_rog">
<xr nolink id="k213_scwrl_rog"/>
Number of HBonds within the protein of the K213E mutant during the MD simulation.
</figure>
<figure id="a305e_scwrl_rog">
<xr nolink id="a305e_scwrl_rog"/>
Number of HBonds within the protein of the A305E mutant during the MD simulation.
</figure>

</figtable>

Ramachandran Plots

Under construction: we are hoping to getting round to improving this section. As far as can be seen from the pictures below, there is now larger difference between the Wildtype and the mutations, but we don't quite trust these black-and-white pictures.

<figtable id="hbonds">

<xr nolink id="hbonds"/> Ramachandran plots for the proteins for the MD simulation, calculated with g_rama.
WT K213E A305E
Plot <figure id="wt_aspa_rama">
<xr nolink id="wt_aspa_rama"/>
Ramachandran plots for the protein of the wildtype protein during the MD simulation.
</figure>
<figure id="k213e_rama">
<xr nolink id="k213e_rama"/>
Ramachandran plots for the protein of the K213E mutant during the MD simulation.
</figure>
<figure id="a305e_rama">
<xr nolink id="a305e_rama"/>
Ramachandran plots for the protein of the A305E mutant during the MD simulation.
</figure>

</figtable>

RMSD reloaded

Here, we show an all-against-all frames RMSD comparison. The darker the colour, the larger the difference. For the wildtype, we can again see that a larger structural change seems to be happening around frame 6000.
For the K213E mutant, we observe very slight changes towards the end of the simulation, but not noticeably strong.
The same holds true for A305E - some slightly more noticeable changes between frame 9- and 10000, but not so strong as to suggest further investigation.


<figtable id="hbonds">

<xr nolink id="hbonds"/> Pairwise RMSD comparisons of the structures from the trajectory if the MD simulation, calculated with g_rms.
WT K213E A305E
Plot <figure id="wt_pw_rmsd">
<xr nolink id="wt_pw_rmsd"/>
Pairwise RMSD comparisons of the structures of the wildtype protein during the MD simulation.
</figure>
<figure id="k213e_pw_rmsd">
<xr nolink id="k213e_pw_rmsd"/>
Pairwise RMSD comparisons of the structures of the K213E mutant during the MD simulation.
</figure>
<figure id="a305e_pw_rmsd">
<xr nolink id="a305e_pw_rmsd"/>
Pairwise RMSD comparisons of the structures of the A305E mutant during the MD simulation.
</figure>

</figtable>

Cluster structures in trajectory

To cluster the structures in the trajectory, the RMSD pairwise comparison matrix of the previous section is used. It can be seen in the upper half of the maps shown below.
Again, the wildtype shows a structural transition around frame 6000. However, we were not able to make out a major difference in the visualisations of the wildtype protein.
The RMSD matrix already suggested it, but we were still surprised to find that for both mutants, there was barely any change in clusters at all.


<figtable id="hbonds">

<xr nolink id="hbonds"/> Clustured structures from the trajectory of the MD simulation, calculated with g_cluster.
WT K213E A305E
Plot <figure id="wt__cluster">
<xr nolink id="wt__cluster"/>
Structures of the wildtype clustered according to pairwise RMSDs for the MD simulation.
</figure>
<figure id="k213e_cluster">
<xr nolink id="k213e_cluster"/>
Structures of the K213E mutant clustered according to pairwise RMSDs for the MD simulation.
</figure>
<figure id="a305e_cluster">
<xr nolink id="a305e_cluster"/>
Structures of the A305E mutant clustered according to pairwise RMSDs for the MD simulation.
</figure>

</figtable>