Molecular Dynamics Simulations Analysis of Glucocerebrosidase

From Bioinformatikpedia
Revision as of 10:07, 23 August 2011 by Brunners (talk | contribs) (Convergence of radius of gyration)

Introduction

To analyze our Molecular Dynamics Simulations we followed the tutorial described here: http://md.chem.rug.nl/~mdcourse/analysis1.html

Wildtype

A brief check of results

  • command: gmxcheck -f 2NT0_wt_md.xtc

How many frames are in the trajectory file and what is the time resolution?

  • 2001 frames
  • timestep: 5ps

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

  • Simulation run: 8h37:36
  • Simulation speed: 27.821
  • Time to reach a second: 1/0.000000028 = 35714285,714285714 days = 97847,358121331 years

Which contribution to the potential energy accounts for most of the calculations?

  • -9.39801e+05 kJ/mol

Visualization of results

For visualization we extract 1000 frames from the trajectory (-dt 10), only select the protein without the water and remove the jumps over the boundaries to make a continuous trajectory (-pbc nojump).

  • command: trjconv -s 2NT0_wt_md.tpr -f 2NT0_wt_md.xtc -o protein.pdb -pbc nojump -dt 10
Figure 1: The protein with the spectrum
Figure 2: The protein with the box around it
Figure 3: The cartoon presentation after using the command dss
Figure 4: The active site

STILL MISSING: movie

Quality assurance

Convergence of energy terms

First we look at the energy terms. Therefore we look at the temperature, the pressure, the energy, the volume, the density, the box and the interaction energies between protein and solvent (Coulomb and van der Waals).

With the following commands we produced *.xvg files which were visualized with xmgrace.

g_energy -f 2NT0_wt_md.edr -o temperature.xvg
g_energy -f 2NT0_wt_md.edr -o pressure.xvg
g_energy -f 2NT0_wt_md.edr -o energy.xvg
g_energy -f 2NT0_wt_md.edr -o volume.xvg
g_energy -f 2NT0_wt_md.edr -o density.xvg
g_energy -f 2NT0_wt_md.edr -o box.xvg

Temperature
Figure 6: Temperature of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Temperature 297.912 0.0062 1.08924 0.00616934 (K)

What is the average temperature and what is the heat capacity of the system?
297.912 K

In figure 6 you can see the plot for the temperature during the simulation. It fluctuates between 295 K and 301 K. So the temperature is very stable, the difference is only six Kelvin. That means that the system reached very soon a stable temperature.

Pressure
Figure 7: Pressure of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Pressure 1.00032 0.014 87.9016 0.0583997 (bar)

Estimate the plateau values for the pressure.
0 bar? 1.00032 bar? higher?

The pressure fluctuates around -300 bar and +300 bar. That is a difference of 600 bar which is very large. The system seems not to have reached its plateau. So it is also hard to estimate such a value because no trend is observable.

Energy
Figure 8: Energies of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Potential -939801 85 924.013 -583.172 (kJ/mol)
Kinetic En. 170900 3.6 624.851 3.53909 (kJ/mol)
Total Energy -768901 84 1128.21 -579.63 (kJ/mol)

What are the terms plotted in the file energy.xvg?
potential, kinetic and total energy

In the plot you can see that all three energy terms reached a plateau. It is hard to say how large the fluctuations are, because the scale is too large. But as they converged the simulation seems to have reached its optimum.

Volume
Figure 9: Volume of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Volume 735.655 0.042 0.552925 -0.0802412 (nm^3)

Estimate the plateau values for the volume.
735.655? 736?

In figure 9 you can see the volume fluctuating between 734 nm³ and 737 nm³. The difference is only three nm³. So the colume also converged to a an interval.

Density
Figure 10: Density of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Density 1009.64 0.058 0.75885 0.110104 (kg/m^3)

Estimate the plateau values for the density.
1009.64? 1010?

In figure 10 you can see the density fluctuating around 1010 kg/m³. The difference seems to be about 5 kg/m³. So the density also converged.

Box
Figure 11: Boxes of the Molecular Dynamics Simulation of the Wildtype


Energy Average Err.Est. RMSD Tot-Drift
Box-X 10.1328 0.0002 0.00253864 -0.000368389 (nm)
Box-Y 10.1328 0.0002 0.00253864 -0.000368389 (nm)
Box-Z 7.16498 0.00014 0.00179508 -0.000260487 (nm)

What are the terms plotted in the file box.xvg?
The size of the box around the protein.

In figure 11 you can see the sizes of the box, the x-, the y- and the z-term. They are very constant with 10.1328, 10.1328 and 7.16498 nm.

Interaction Energy: Coulomb

Figure 12: Interaction energy of protein and solvent
Energy Average Err.Est. RMSD Tot-Drift
Coul-SR:Protein-non-Protein -24690.5 120 476.146 -405.451 (kJ/mol)
Coul-14:Protein-non-Protein 0 0 0 0 (kJ/mol)

In figure 12 you can see the Coulomb Energy. It has its average at -24690.5 kJ/mol. It fluctuates around that value in an interval of about 1000 kJ/mol. At the end it is more stable. As the equilibration takes longer for this term it may be that the value at the time 8000 to 10000 is the equilibrium.

Interaction Energy: van der Waals

Figure 13: Interaction energy of protein and solvent
Energy Average Err.Est. RMSD Tot-Drift
LJ-SR:Protein-non-Protein -3184.89 4.3 149.546 -12.8142 (kJ/mol)
LJ-14:Protein-non-Protein 0 0 0 0 (kJ/mol)

In figure 13 you can see the van der Waals interaction energy. It has its average at -3184.89. The values are fluctuating in an interval of about 1000 kJ/mol. Also at the end of the time there is no equilibration recognisable. Maybe it always fluctuates in such a big interval or it needs more time to converge.

Minimum distances between periodic images

We used the following command: g_mindist -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -od minimal-periodic-distance.xvg -pi

Figure 14: Minima periodic distance on the protein

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 1.97143 (nm) at time 5825 (ps), between atoms 960 and 2323.

Figure 14: Minima periodic distance on the C-alpha group

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 2.43551 (nm) at time 5345 (ps), between atoms 952 and 2267.

What happens if the minimal distance becomes shorter than the cut-off distance used for electrostatic interactions? Is it the case in your simulations?

It is not the case in our simulation. The energy would "explode".

Run now g_mindist on the C-alpha group, does it change the results? What does is mean for your system?

It changes to a greater distance between two other atoms (952 and 2267). That means that it may be atoms of a side group that have this minimal distance. Also the fluctuation interval of the minimum distance becomes smaller.

Root mean square fluctuations

In the following we want to look at the RMSF, the root mean square fluctuations<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref>.

Therefore we used the following command:

g_rmsf -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 15: RMSF fluctuation

Indicate the start and end residue for the most flexible regions and the maximum amplitudes.

In figure 15 you can see the RMSF fluctuation. With that you can recognise flexible regions. The most flexible region seems to be between residue 125 and residue 175. Other flexible regions are at the beginning of the protein between residue 20 and 50 and at the region about position 350.


Figure 16: Colored structure according to b-factors and flexible regions

In figure 16 you can see the protein colored according to b-factors and flexible regions. Blue means that the region is not flexible and red means that it is a very flexible region. You can see some loops that are flexible and parts of the alpha helices and beta sheets that are a bit more flexible. The most of the inner part of the protein is colored blue and therefore not flexible.

Convergence of RMSD

No we want to calculate the RMSD. It is used as an indicator of convergence of the structure towards an equilibrium state.

We used the following commands:

trjconv -f 2NT0_wt_md.xtc -o traj_nojump.xtc -pbc nojump
g_rms -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o rmsd-all-atom-vs-start.xvg

g_rms -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o rmsd-backbone-vs-start.xvg
echo 1 | trjconv -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o protein.xtc
g_rms -f protein.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg
g_rms -f protein.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Figure 17a: RMSD: All atom versus all, against starting structure
Figure 17b: RMSD: All atom versus all, against average structure
Figure 18a: RMSD: Backbone, against starting structure
Figure 18b: RMSD: Backbone, against average structure

If observed, at what time and value does the RMSD reach a plateau?

If you compare the different methods (all atom/backbonde and against starting structure or average structure) as you can see in figure 17a, b and 18a, b there are different times when the RMSD reaches a plateau. For all atoms and backbone against the starting structure both reach their plateau at about 1500 ps with a value of 0.18 and a bit lower for backbone with about 0.17.

For the comparison to the average structure they reach a plateau later. At 3000 to 4000 ps. All atom versus all reaches a RMSD of about 0.2, the backbone of 0.06, which is very low.

Briefly discuss two differences between the graphs against the starting structure and against the average structure. Which is a better measure for convergence?

As already mentioned the convergence is reached later. The reason for that is that the average structure is nearer to the final structure. Another thing is that the RMSD for the average structures gets lower wheras it gets higher for the comparison to the starting structure. The starting structure is farer away from the final structure, so this is again the reason. It might be better to take the starting structure to recognise the convergence because you can see sooner that an equilibrian state is reached.

Convergence of radius of gyration

Now look at the radius of gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref>. Therefore we used the following command:

g_gyrate -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -o radius-of-gyration.xvg

The radius of gyration gives insight to the shape of the protein.

Figure 19: Radius of gyration

At what time and value does the radius of gyration converge?

The radius of gyration is always constant with about 2.3 nm. The RgX, RgY and RgZ fluctuate until the end and reach values between 1.8 and 1.9.

Structural analysis: properties derived from configurations

Solvent accessible surface area

Hydrogen bonds

Salt bridges

Secondary structure

Ramachandran (phi/psi) plots

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Cluster analysis

Distance RMSD

Mutation 7

still waiting


Mutation 10

still waiting