Canavan Task 10 - Molecular Dynamics Simulations
- 1 Protocol
- 2 Initial Checks
- 3 Energies
- 4 Distances between periodic boundaries
- 5 RMSF
- 6 Convergence of RMSD
- 7 Visualization
- 8 Radius of gyration
- 9 Surface
- 10 HBonds within Protein
- 11 Ramachandran Plots
- 12 RMSD reloaded
- 13 Cluster structures in trajectory
Protocol
Further information and commands can be found in the protocol.
Initial Checks
For all three runs there are 2000 time frames with a resolution of 5 ps, so each simulation covers 10000 ps = 10 ns. No errors occurred during the runs and all simulations finished properly.
|run time||5h 22:37||5h 08:54||5h 08:35|
|atoms outside of box||406, 408, 480, 482, 483, 484, 485, 486, 500, 501,..||406, 480, 482, 483, 484, 485, 486, 500, 502, 503,..||406, 408, 480, 482, 483, 484, 485, 486, 500, 501,..|
|Last frame||2000, time 10000.000||2000, time 10000.000||2000, time 10000.000|
Energies
Temperature, pressure, total energy and potential energy were analysed for the MD runs of the three proteins using g_energy. All analysed thermodynamic parameters converge. The pressure values fluctuate enormously, but the average pressure is close to the specified value of 1 bar.
The temperature fluctuates around the reference temperature of 298 K that was specified in the corresponding mdp file. There are only small deviations from this value, which is reflected in the small error estimates.
|Reference Value||298 K||298 K||298 K|
|Total Drift||-0.00042403 (K)||0.00867898 (K)||0.0154339 (K)|
|Plot||<figure id="wt_temp">||<figure id="k213_scwrl_temp">||<figure id="a305_scwrl_temp.png">|
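The average and total drift of an energy term can be recomputed from the .xvg file that g_energy writes. The following is a minimal sketch, assuming that the total drift is the difference between the end points of a straight line fitted to the data by least squares; the function names and the three-line example data are hypothetical and not taken from our runs.

```python
def read_xvg(text):
    """Parse .xvg text into (times, values), using the first data column."""
    times, values = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments (#) and xmgrace directives (@).
        if not line or line[0] in "#@":
            continue
        cols = line.split()
        times.append(float(cols[0]))
        values.append(float(cols[1]))
    return times, values


def average_and_drift(times, values):
    """Return the mean and the total drift of a linear least-squares fit."""
    n = len(values)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    # Assumption: total drift = fitted value at last time minus at first time.
    return mean_v, slope * (times[-1] - times[0])


# Hypothetical toy data (time in ps, temperature in K), not from our runs.
example = """# toy example
@ title "Temperature"
0.0 297.0
5.0 298.0
10.0 299.0
"""
times, values = read_xvg(example)
mean_temp, drift = average_and_drift(times, values)
print(mean_temp, drift)
```

A drift that is small compared to the fluctuations, as in the temperature table, indicates that the quantity is stable over the run.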
The pressure fluctuates enormously around the reference pressure of 1 bar, which results in large RMSD values (the root mean square fluctuation reported by g_energy). Yet the average pressure over the whole simulation is very close to the specified 1 bar, and the error estimates of the average are relatively small.
|Reference Value||1.0 bar (Berendsen barostat)||1.0 bar (Berendsen barostat)||1.0 bar (Berendsen barostat)|
|Total Drift||-0.0713928 (bar)||-0.0585283 (bar)||-0.100316 (bar)|
|Plot||<figure id="wt_pressure">||<figure id="k213_scwrl_pressure">||<figure id="a305_scwrl_pressure">|
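Large fluctuations and a small error estimate do not contradict each other: the fluctuation describes the spread of single frames, while the error estimate describes the uncertainty of the average. The sketch below illustrates this with a simple block average over five blocks. The block count and the synthetic pressure series are assumptions made for illustration; they are not our data and not necessarily what g_energy does internally.

```python
import math
import random


def fluctuation(values):
    """Root mean square deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def block_error(values, n_blocks=5):
    """Standard error of the mean estimated from n_blocks block averages."""
    size = len(values) // n_blocks
    block_means = [
        sum(values[i * size:(i + 1) * size]) / size for i in range(n_blocks)
    ]
    grand_mean = sum(block_means) / n_blocks
    variance = sum((m - grand_mean) ** 2 for m in block_means) / (n_blocks - 1)
    return math.sqrt(variance / n_blocks)


# Synthetic series: 2000 frames scattered widely around 1 bar (assumed spread).
random.seed(0)
pressure = [random.gauss(1.0, 150.0) for _ in range(2000)]
print(fluctuation(pressure), block_error(pressure))
```

With many frames the error of the average shrinks far below the single-frame fluctuation, which matches what we see in the pressure table.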
The potential energy decreases over the simulation time, which suggests that the conformation of the structure is energetically optimized during the simulation. The decrease in energy is reflected in the large negative drift values. The wildtype has the lowest average energy and also the lowest energy at the end of the simulation with about -592000 kJ/mol, compared with -586000 kJ/mol for K213E and -583000 kJ/mol for A305E.
|Total Drift||-252.112 (kJ/mol)||-425.823 (kJ/mol)||-290.165 (kJ/mol)|
|Plot||<figure id="CD_wt_potenergy">||<figure id="k213_scwrl_poten">||<figure id="a305_scwrl_potenergy">|
The total energy values resemble the potential energy values. In contrast to the potential energy, the total energy also includes the kinetic energy. Again, the total energy decreases during the simulation, and again the wildtype protein has the lowest average total energy with about -485000 kJ/mol, against -480000 kJ/mol for the K213E mutant and -478500 kJ/mol for A305E.
|Total Drift||-252.262 (kJ/mol)||-422.763 (kJ/mol)||-284.743 (kJ/mol)|
|Plot||<figure id="wt_tot_energy">||<figure id="k213_scwrl_toten">||<figure id="a305e_scwrl_totenergy">|
Distances between periodic boundaries
We calculated the minimum distance between periodic images for the whole protein (not only the C-alpha atoms). The distance falls below the suggested limit of 2 nm at some time steps during the simulation. For the mutant K213E in particular, the distance is often below 2 nm. This might have caused unphysical interactions between the protein and its periodic images.
|shortest dist||1.6456 (nm)||1.47431 (nm)||1.59859 (nm)|
|at time||7675 (ps)||7515 (ps)||2460 (ps)|
|between atoms||15 and 4507||598 and 4497||597 and 4507|
|Plot||<figure id="wt_pi">||<figure id="k213e_pi">||<figure id="a305e_pi">|
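The idea behind the minimum periodic image distance can be sketched as follows: every atom is compared with the shifted copies of all atoms (including itself) in the neighbouring boxes. This is a brute-force illustration for a single frame that assumes a rectangular box; our actual box type may differ, and the function name and test coordinates are hypothetical.

```python
import itertools
import math


def min_image_distance(coords, box):
    """Smallest distance between any atom and any periodic image of an atom.

    coords: list of (x, y, z) in nm; box: (lx, ly, lz) of a rectangular box.
    Returns (distance, index of atom, index of imaged atom).
    """
    # All 26 neighbouring boxes; the zero shift is the protein itself.
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3)
              if s != (0, 0, 0)]
    best = (math.inf, None, None)
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            for s in shifts:
                d = math.sqrt(sum(
                    (a[k] - (b[k] + s[k] * box[k])) ** 2 for k in range(3)
                ))
                if d < best[0]:
                    best = (d, i, j)
    return best


# Two atoms close to opposite faces of a 5 nm box (hypothetical values).
print(min_image_distance([(0.5, 1.0, 1.0), (4.5, 1.0, 1.0)], (5.0, 5.0, 5.0)))
```

The loop scales quadratically with the number of atoms, so it is only meant to show the principle, not to replace the GROMACS tool on a full trajectory.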
RMSF
Only small fluctuations can be observed for the residues of the three proteins.
For all three proteins there is a peak around residues 60-75, which marks this region as rather flexible. For K213E in particular, there is a strong peak of more than 0.35 nm. This region forms a loop that is highlighted with a red circle in the lower right corner of the B-factor figures (<xr id="wt_bfactors"/>, <xr id="k213e_bfactors"/>, <xr id="a305e_bfactors"/>).
There is another flexible region formed by residues 220-230. This region is also highlighted by a red circle, in the lower left corner of the B-factor figures.
For the wildtype, the region between residues 120 and 180 is especially rigid. Looking at the structure, one finds that this region forms the core of the protein.
The B-factor figures of the proteins visualize the fluctuation plots. Most parts of the protein are rather rigid and only some exposed loops have higher B-factors. These loops correspond to the regions identified in the fluctuation plots.
Interestingly, K213E has lower B-factor values than one might expect from the fluctuation plot, whereas A305E has higher B-factors than the wildtype and seems to be more mobile.
|RMSF Plot||<figure id="wt_rmsf_plot">||<figure id="k213e_rmsf_plot">||<figure id="a3o5e_rmsf_plot">|
|B-Factors||<figure id="wt_bfactors">||<figure id="k213e_bfactors">||<figure id="a305e_bfactors">|
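RMSF and B-factor are two views of the same quantity. The sketch below computes a per-atom RMSF from frames that are assumed to be already superimposed, and converts it with the standard isotropic relation B = (8 π² / 3) · RMSF². The function names and the conversion from nm to Å are our own choices for illustration.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF in nm; trajectory is a list of frames of (x, y, z)."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        # Average position of atom a over all frames.
        mean = [sum(frame[a][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean square fluctuation around that average position.
        msf = sum(
            sum((frame[a][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


def b_factor(rmsf_nm):
    """Isotropic B-factor in A^2 from an RMSF given in nm."""
    rmsf_angstrom = rmsf_nm * 10.0
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2


# One atom moving between two positions 0.2 nm apart (hypothetical frames).
print(rmsf([[(0.0, 0.0, 0.0)], [(0.2, 0.0, 0.0)]]))
print(b_factor(0.35))
```

Because the B-factor grows with the square of the RMSF, peaks such as the 0.35 nm one for K213E stand out much more strongly on the B-factor scale.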
With this script we compared the RMSF for the three proteins.
For wildtype and K213E the P-value is 0.03297.
For wildtype and A305E the P-value is 2.18339e-11.
For K213E and A305E the P-value is 0.00011.
g_rmsf also generates an unphysical average structure. One can see that in regions with high B-factors the averaged structure shows several possible residue conformers.
Convergence of RMSD
As expected, the RMSD increases when the starting structure is used as the reference: over the simulation the structure changes and deviates more and more from the starting structure. Yet these changes are small (RMSD < 0.2 nm), as the starting structure is derived from the crystal structure and should therefore already be close to an optimal conformation.
When only the C-alpha atoms are used for the RMSD calculation instead of the whole protein, the values are smaller. This is because side chains are the most flexible elements of a structure and cause the higher RMSD of the whole protein.
When the average structure is taken as the reference, the RMSD is higher at the beginning of the simulation and finally converges as the structure reaches an equilibrium. Only for A305E does the RMSD increase over the last 2000 ps, so it does not show convergence.
Another interesting point is that the RMSD values of both mutants are much higher than those of the wildtype: for the wildtype the RMSD towards the average structure is about 0.48 nm at the end of the simulation, whereas it is 0.69 nm for K213E and 0.61 nm for A305E.
Here, we also included the 'internal' or distance-based RMSD (<xr id="db_rmsd"/>). The coordinate-based RMSD depends on a least-squares fit of the atomic coordinates, whereas the internal RMSD compares the interatomic distances and needs no fit. We can clearly see differences between the two measures. One of them is that the wildtype seems to undergo a larger structural change around 6000 ps, which shows up more clearly in the internal RMSD than in the coordinate-based one.
|RMSD whole protein towards starting structure||<figure id="wt_rmsd_all_vs_first">||<figure id="k213e_rmsd_all_vs_first">||<figure id="a3o5e_rmsd_all_vs_first">|
|RMSD C-alpha atoms towards starting structure||<figure id="wt_rmsd_calpha_vs_first">||<figure id="k213_scwrl_rmsd_calpha_first">||<figure id="a305e_scwrl_rmsd_calpha_first">|
|Internal RMSD for Calpha atoms||<figure id="wt__internal_rmsd">||<figure id="k213_internal_rmsd">||<figure id="a305e_internal_rmsd">|
|RMSD whole protein towards average structure||<figure id="wt_rmsd_all_vs_average">||<figure id="k213e_rmsd_all_vs_average">||<figure id="a305e_rmsd_all_vs_average">|
|RMSD protein backbone towards average structure||<figure id="wt_rmsd_calpha_vs_average">||<figure id="k213_scwrl_rmsd_calpha_average">||<figure id="a305e_scwrl_rmsd_calpha_average">|
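The difference between the two RMSD flavours can be shown with a small sketch. The coordinate RMSD below is computed without any fitting, so it reacts to a pure translation, while the distance-based RMSD compares all pairwise distances and is unaffected by it. The normalisation over atom pairs is an assumption and may differ from the one used by the GROMACS tool; the function names are hypothetical.

```python
import math


def coordinate_rmsd(a, b):
    """RMSD between two coordinate sets, without superposition."""
    return math.sqrt(
        sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a)
    )


def distance_rmsd(a, b):
    """RMSD between the pairwise distance matrices of two structures."""
    n = len(a)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
            pairs += 1
    return math.sqrt(total / pairs)


# A toy structure and a copy shifted by 2 nm along z (hypothetical values).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in reference]
print(coordinate_rmsd(reference, shifted), distance_rmsd(reference, shifted))
```

In practice the coordinate-based RMSD is computed after a least-squares fit, which removes the translation and rotation; the remaining differences between both measures then come from the fit itself.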
Radius of gyration
Contrary to our expectations, the radius of gyration increases for all proteins. As the energy of the system decreases during the run, we would expect the protein to become more compact. One possible explanation is that we used the monomeric form of the protein for the simulation, whereas it is a dimer in the crystal structure. The monomer might therefore have a different energetically optimal conformation than the dimer-bound form.
Yet the changes are only minor, within a range of less than 0.05 nm. Thus, the increase in the radius of gyration is not of great impact.
|Plot||<figure id="wt_rg">||<figure id="k213_scwrl_rog">||<figure id="a305e_scwrl_rog">|
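For reference, the radius of gyration of a single frame is the mass-weighted root mean square distance of the atoms from the centre of mass. A minimal sketch, with a hypothetical function name and toy input:

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration; coords in nm, masses in any unit."""
    total_mass = sum(masses)
    # Centre of mass of the structure.
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Two equal masses 2 nm apart (hypothetical values).
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [12.0, 12.0]))
```

A larger value means that the mass is distributed further away from the centre, i.e. the structure is less compact.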
Surface
The surface area of the wildtype does not change significantly during the simulation, but rather oscillates around 88 nm^2. Big changes in the surface area could imply major structural changes such as an opening or closing of the molecule, but we can neither observe this in our visualisations nor infer it from the surface data.
The K213E mutation lies on an outer loop far away from the binding site and from the dimer interaction site, but is still reported to affect protein function. For this mutant we observe a slight increase of the surface area. The oscillation is still strong, however, so we cannot be sure whether there really is a difference to the wildtype or whether a somewhat longer simulation would show both converging to the same value.
For the A305E mutant, one of the most frequent mutations in Canavan disease patients, the surface area clearly behaves differently from the wildtype during the simulation. First, its area is larger in general: the average seems to be around 90 nm^2, a difference large enough to be worth mentioning. Second, its oscillations are a little stronger.
|Plot||<figure id="wt_aspa_sas">||<figure id="k213_sas">||<figure id="a305e_sas">|
HBonds within Protein
For the wildtype protein, the number of HBonds within the protein oscillates around 230. We can see a slight decrease around 6000 ps, the first of several hints that some bigger change might be going on at that point of the simulation.
The K213E mutant also starts with roughly 230 HBonds, but this number quickly decreases and seems to converge at around 220 HBonds.
Mutant A305E shows an enormous difference to the wildtype and the K213E mutant. In fact, the difference is so large that we are still checking whether there is a mistake in the input files used to compute it (to do). We consider it very unlikely that a single mutation can be responsible for such an increase in the number of HBonds.
|Plot||<figure id="wt_rg">||<figure id="k213_scwrl_rog">||<figure id="a305e_scwrl_rog">|
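Hydrogen bonds are usually counted with a purely geometric criterion. The sketch below uses a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30 degrees. To our knowledge these correspond to the g_hbond defaults, but both cutoffs should be treated as assumptions here, and the function name is hypothetical.

```python
import math


def is_hbond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Geometric hydrogen bond test; coordinates in nm, angle in degrees."""
    da = [acceptor[k] - donor[k] for k in range(3)]   # donor -> acceptor
    dh = [hydrogen[k] - donor[k] for k in range(3)]   # donor -> hydrogen
    dist_da = math.sqrt(sum(x * x for x in da))
    dist_dh = math.sqrt(sum(x * x for x in dh))
    if dist_da > max_dist:
        return False
    # Angle hydrogen-donor-acceptor, clamped against rounding errors.
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist_da * dist_dh)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= max_angle


# Linear arrangement D-H...A with a 0.3 nm donor-acceptor distance.
print(is_hbond((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0)))
```

Counting the donor-hydrogen-acceptor triplets that pass this test in every frame gives a curve like the ones in the plots; a wrong index group or a wrong donor/acceptor selection changes the count directly, which is one possible source of the unexpected A305E result.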
Ramachandran Plots
Under construction: we hope to get round to improving this section. As far as can be seen from the pictures below, there is no large difference between the wildtype and the mutants, but we don't quite trust these black-and-white pictures.
|Plot||<figure id="wt_aspa_rama">||<figure id="k213e_rama">||<figure id="a305e_rama">|
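A Ramachandran plot shows the backbone dihedral angles phi (C of the previous residue, N, C-alpha, C) and psi (N, C-alpha, C, N of the next residue) of every residue. The sketch below computes one dihedral angle from four points; the helper names are hypothetical and the coordinates in the example are toy values.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy geometry: the last point is rotated by a quarter turn around the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Plotting the (phi, psi) pairs of all residues and frames in colour, for example as a density map, would make differences between the wildtype and the mutants easier to judge than the black-and-white scatter plots.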
RMSD reloaded
Here, we show an all-against-all RMSD comparison of the frames. The darker the colour, the larger the difference. For the wildtype, we can again see that a larger structural change seems to happen around 6000 ps.
|Plot||<figure id="wt_pw_rmsd">||<figure id="k213e_pw_rmsd">||<figure id="a305e_pw_rmsd">|
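The all-against-all comparison boils down to a symmetric matrix with one RMSD value per pair of frames. A minimal sketch, assuming the frames have already been superimposed (no fitting is done here) and using hypothetical function names and toy frames:

```python
import math


def rmsd(a, b):
    """RMSD between two frames of equal length, without superposition."""
    return math.sqrt(
        sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a)
    )


def pairwise_rmsd(frames):
    """Symmetric matrix of RMSD values between all pairs of frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The RMSD is symmetric, so each pair is computed only once.
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Three toy frames of a single atom moving along x (hypothetical values).
frames = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(0.5, 0.0, 0.0)]]
for row in pairwise_rmsd(frames):
    print(row)
```

A structural transition shows up in such a matrix as two blocks of mutually similar frames along the diagonal that are dissimilar to each other.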
Cluster structures in trajectory
To cluster the structures in the trajectory, the RMSD pairwise comparison matrix of the previous section is used. It can be seen in the upper half of the maps shown below.
Again, the wildtype shows a structural transition around 6000 ps. However, we were not able to make out a major difference in the visualisations of the wildtype protein.
|Plot||<figure id="wt__cluster">||<figure id="k213e_cluster">||<figure id="a305e_cluster">|
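How clusters can be derived from a pairwise RMSD matrix is sketched below. The example uses a neighbour-counting scheme in the style of the "gromos" method: the frame with the most neighbours within a cutoff becomes a cluster centre and is removed together with its neighbours, and this is repeated until no frames are left. This page does not record which clustering method and cutoff were actually used, so both are assumptions, and the toy matrix is not our data.

```python
def gromos_clusters(matrix, cutoff):
    """Cluster frames from a pairwise RMSD matrix by neighbour counting.

    Returns a list of (centre index, list of member indices).
    """
    remaining = set(range(len(matrix)))
    clusters = []
    while remaining:
        best_centre, best_members = None, []
        for i in sorted(remaining):
            # Neighbours of frame i (a frame is its own neighbour).
            members = [j for j in sorted(remaining) if matrix[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_centre, best_members = i, members
        clusters.append((best_centre, best_members))
        remaining -= set(best_members)
    return clusters


# Toy matrix built from five one-dimensional "structures" (hypothetical).
positions = [0.0, 0.1, 0.2, 1.0, 1.1]
toy_matrix = [[abs(a - b) for b in positions] for a in positions]
print(gromos_clusters(toy_matrix, cutoff=0.15))
```

In the toy example the first three and the last two structures end up in separate clusters, which is the kind of split one would expect before and after a structural transition.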