Difference between revisions of "Molecular Dynamics Simulations Analysis of Glucocerebrosidase"

Latest revision as of 02:18, 26 September 2011

Introduction

To analyze our Molecular Dynamics Simulations we followed the tutorial described here: http://md.chem.rug.nl/~mdcourse/analysis1.html

Wildtype

A brief check of results

command: gmxcheck -f 2NT0_wt_md.xtc

How many frames are in the trajectory file and what is the time resolution?

2001 frames
timestep: 5ps

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

Simulation run: 8h37:36
Simulation speed: 27.821
Time to reach a second: 1/0.000000028 = 35714285,714285714 days = 97847,358121331 years

Which contribution to the potential energy accounts for most of the calculations?

-9.39801e+05 kJ/mol

Visualization of results

For visualization 1000 frames from the trajectory are extracted (-dt 10), only the protein without water is selected and the jumps over the boundaries are removed to make a continuous trajectory (-pbc nojump).

command: trjconv -s 2NT0_wt_md.tpr -f 2NT0_wt_md.xtc -o protein.pdb -pbc nojump -dt 10

Figure 1: The protein with the spectrum

Figure 2: The protein with the box around it

Figure 3: The cartoon presentation after using the command dss

Figure 4: The active site

Figure 5a: Molecular Dynamics Simulation of Glucocerebrosidase

Figure 5b: Molecular Dynamics Simulation of Glucocerebrosidase

In Figure 5 one can see the protein in motion. Compared to the results in Normal Mode Analysis the motion is not as strong and the parts that are moving are smaller than the moving ones of the Normal Mode Analysis. In this simulation mostly side chains and little elements of the protein are bouncing. Big movements like attraction or repulsion, as seen in the Normal Mode Analysis, are not present.

Quality assurance

Convergence of energy terms

To assess the quality of the model, the convergence of thermodynamic parameters, such as the temperature, the pressure, the potential and the kinetic energy is investigated. Not converged values indicate, that the simulation has not reached a thermal equilibrium and should be extended further.

To investigate the convergance, files for temperature, pressure, energy, volume, density, box and interaction energies between protein and solvent (Coulomb and van der Waals) have been created with the following commands:

g_energy -f 2NT0_wt_md.edr -o temperature.xvg g_energy -f 2NT0_wt_md.edr -o pressure.xvg g_energy -f 2NT0_wt_md.edr -o energy.xvg g_energy -f 2NT0_wt_md.edr -o volume.xvg g_energy -f 2NT0_wt_md.edr -o density.xvg g_energy -f 2NT0_wt_md.edr -o box.xvg

In the next steps, the files have been visualized with xmgrace.

Temperature

Figure 6: Temperature of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Temperature	297.912	0.0062	1.08924	0.00616934 (K)

Figure 6 shows the plot for the temperature during the simulation. The temperature fluctuates between 295 K and 301 K and has an average of 297.912 K. The temperature of the system is therefore very stable, with a difference between minimal and maximal temperature of only six Kelvin. This indicates that the temperature of the system converged and reached a stable temperature very fast.

Pressure

Figure 7: Pressure of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Pressure	1.00032	0.014	87.9016	0.0583997 (bar)

The pressure fluctuation, shown in Figure 7, lies within -300 and +300 bar. This results in a difference of 600 bar which is quite large. The average pressure is about 1 bar. The system does not seem to have reached a stable pressure. It is hard to estimate when and at what point it would have converged, as no trend is observable during the simulation.

Energy

Figure 8: Energies of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Potential	-939801	85	924.013	-583.172 (kJ/mol)
Kinetic En.	170900	3.6	624.851	3.53909 (kJ/mol)
Total Energy	-768901	84	1128.21	-579.63 (kJ/mol)

In the energy file three different terms are plotted: potential, kinetic and total energy. The plot in Figure 8 shows that all three energy terms reached their plateau. It is hard to estimate the size of the fluctuations, as the scale is quite large. But as the energy terms converged, the simulation seems to have reached its optimum.

Volume

Figure 9: Volume of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Volume	735.655	0.042	0.552925	-0.0802412 (nm³)

The volume, shown in Figure 9, fluctuates between 734 nm³ and 737 nm³ and the average is 735 nm³. The small differences in volume indicate, that the volume has converged.

Density

Figure 10: Density of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Density	1009.64	0.058	0.75885	0.110104 (kg/m³)

Figure 10 shows the density fluctuating around 1010 kg/m³ with an average of 1009.64 kg/m³. The maximum differences are of only about 5 kg/m³ which indicates, that the density has converged as well.

Box

Figure 11: Boxes of the Molecular Dynamics Simulation of the Wildtype

Energy	Average	Err.Est.	RMSD	Tot-Drift
Box-X	10.1328	0.0002	0.00253864	-0.000368389 (nm)
Box-Y	10.1328	0.0002	0.00253864	-0.000368389 (nm)
Box-Z	7.16498	0.00014	0.00179508	-0.000260487 (nm)

Figure 11 shows the plotted size of the box around the protein. The x-, y- and z-terms are very constant with values of 10.1328, 10.1328 and 7.16498 nm.

Interaction Energy: Coulomb

Figure 12: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
Coul-SR:Protein-Protein	-23823.8	63	232.842	-193.75 (kJ/mol)
Coul-14:Protein-Protein	95543.6	68	227.139	424.841 (kJ/mol)

Figure 12 shows the Coulomb Energy with an average at -24690.5 kJ/mol. The energy fluctuates around that value in an interval of about 1000 kJ/mol. Near the end of the simulation the energy becomes more stable. As the equilibration takes longer for this term than for the others, the value present at the time 8000 to 10000 is the equilibrium.

Interaction Energy: van der Waals

Figure 13: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
LJ-SR:Protein-non-Protein	-16499	3.6	115.164	9.41385 (kJ/mol)
LJ-14:Protein-non-Protein	8538.44	3	61.3311	-1.02288 (kJ/mol)

In Figure 13 one can see the van der Waals interaction energy. The energy has its average at -3184.89 kJ/mol. The values are fluctuating in an interval of about 1000 kJ/mol. The simulation does not seem to have established an equilibrium. This might have two reasons: either the van der Waals interaction energy always fluctuates in such a big interval or it needs more time to converge.

Minimum distances between periodic images

To get the minimum distances between periodic images, the following command has been used: g_mindist -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -od minimal-periodic-distance.xvg -pi

Figure 14: Minimal periodic distance on the protein

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 1.97143 (nm) at time 5825 (ps), between atoms 960 and 2323.

Figure 14: Minimal periodic distance on the C-alpha group

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 2.43551 (nm) at time 5345 (ps), between atoms 952 and 2267.

What happens if the minimal distance becomes shorter than the cut-off distance used for electrostatic interactions? Is it the case in your simulations?

It is not the case in our simulation. If this would be the case, the energy would "explode".

Run now g_mindist on the C-alpha group, does it change the results? What does is mean for your system?

It changes to a greater distance between two other atoms (952 and 2267). This indicates, that probably two atoms of a side group show the minimal distance in the protein. The fluctuation interval of the minimum distance becomes smaller as well.

Root mean square fluctuations

The RMSF, root mean square fluctuation<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref>, was calculated with the following command:

g_rmsf -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 15: RMSF fluctuation

Indicate the start and end residue for the most flexible regions and the maximum amplitudes.

In Figure 15 you can see the RMSF fluctuation of glucocerebrosidase which allows you to recognise the most flexible regions. The most flexible region seems to be between residue 125 and residue 175. Other flexible regions are at the beginning of the protein, i.e. between residue 20 and 50, and at the region around position 350. The active site, Glu235 and Glu340, is not part of those flexible regions.

Figure 16a: Colored structure according to b-factors and flexible regions

Figure 16b: Colored structure according to b-factors and flexible regions

Figure 16 shows the protein colored according to b-factors and flexible regions. Blue indicates, that the region is stable, whereas red marks very flexible regions. Some loops and some parts of alpha helices and beta sheets are shown to be flexible, whereas the center of the protein remains unflexible.

Convergence of RMSD

The RMSD (root mean squared deviation) is used as an indicator of convergence of the structure towards an equilibrium state. It was calculated with the following commands:
trjconv -f 2NT0_wt_md.xtc -o traj_nojump.xtc -pbc nojump g_rms -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o rmsd-all-atom-vs-start.xvg

g_rms -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o rmsd-backbone-vs-start.xvg echo 1 | trjconv -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o protein.xtc g_rms -f protein.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg g_rms -f protein.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Figure 17a: RMSD: All atom versus all, against starting structure

Figure 17b: RMSD: All atom versus all, against average structure

Figure 18a: RMSD: Backbone, against starting structure

Figure 18b: RMSD: Backbone, against average structure

If observed, at what time and value does the RMSD reach a plateau?

If you compare the different methods (all atom/backbonde and against starting structure or average structure), shown in Figure 17 a, b and 18 a, b, the RMSD reaches a plateau at different times. For all atoms and backbone against the starting structure, both reach their plateau at about 1500 ps with a value of 0.18 and a bit lower for backbone with about 0.17. Compared to the average structure they reach a plateau later, at 3000 to 4000 ps. All atom versus all reaches a RMSD of about 0.2, the backbone of 0.06, which is very low.

Briefly discuss two differences between the graphs against the starting structure and against the average structure. Which is a better measure for convergence?

As already mentioned the convergence is reached later. The reason for that is that the average structure is nearer to the final structure. Another thing is that the RMSD of the average structure gets lower whereas it gets higher for the comparison to the starting structure. The starting structure is farer away from the final structure, so this is again the reason. It might be better to take the starting structure to recognise the convergence because one can see sooner that an equilibrium state is reached.

Convergence of radius of gyration

The radius of gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref>, which gives insight to the shape of the protein, was calculated with the following command:

g_gyrate -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -o radius-of-gyration.xvg

Figure 19: Radius of gyration

At what time and value does the radius of gyration converge?

The radius of gyration is always constant with about 2.3 nm. The RgX, RgY and RgZ fluctuate until the end and reach values between 1.8 and 1.9.

Structural analysis: properties derived from configurations

Solvent accessible surface area

The solvent accessible surface area (SASA) <ref>http://en.wikipedia.org/wiki/Accessible_Surface_Area</ref> indicates, which parts of a protein are reachable by solvents.

Figure 20: Solvent Accessable Surface

Figure 21: Atomic SAS

Figure 22: Residue SAS

In Figure 20 you can see the solvent accessible surface in relation to time, in Figure 21 the atomic SAS and in Figure 22 the residue SAS. It is very hard to decide which residues are most accessible to the solvent. The areas between residue 200 and 250 and 300 to 350 and residues 400 and 450 seem to be reachable by solvents the most. Figure 20 shows that the solvation is almost the same at all time.

Hydrogen bonds

Next, the hydrogen bonds in the protein and between the protein and the surrounding solvent have been investigated. Therefore the following commands have been used:

echo 1 1 | g_hbond -f traj_nojump.xtc -s 2NT0_wt_md.tpr -num hydrogen-bonds-intra-protein.xvg echo 1 12 | g_hbond -f traj_nojump.xtc -s 2NT0_wt_md.tpr -num hydrogen-bonds-protein-water.xvg

Figure 23: Hydrogen bonds in the protein

Figure 24: Hydrogen bonds between protein and the surrounding solvent

Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.

In Figure 23 one can see the internal hydrogen bonds, i.e. hydrogen bonds within a single protein, and Figure 24 shows the hydrogen bonds to the surrounding water molecules. Internally, there are less hydrogen bonds built (about 350), than with the surrounding solvent (about 800), but more pairs within 0.35 nm (1700 compared to 1100). The fluctuation is larger for the hydrogen bonds with the solvent. Internally there is only little fluctuation in an interval of about 50 to 100.

Ramachandran (phi/psi) plots

As a next step we plotted the Ramachandran plot with the following command:

g_rama -f traj_nojump.xtc -s 2NT0_wt_md.tpr -o ramachandran.xvg

Figure 26: Ramachandran plot

Compared to standard Ramachandran plots<ref>http://en.wikipedia.org/wiki/Ramachandran_plot</ref> you can see that there are huge regions on the right side which normally do not occur. Looking at the Glycine Ramachandran plot at http://en.wikipedia.org/wiki/Ramachandran_plot there is more analogy than to the Ramachandran plot in general case. But Glycine is not overrepresented (6.8 per cent) in Glucocerebrosidase.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Now we want to compare the different structures during the trajectory. So we can find similar structures.

We therefore used the following commands:

g_rms -s 2NT0_wt_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue gv rmsd-matrix.eps

Figure 27: RMSD Matrix

In figure 27 you can see some clusters that are colored green. The greatest one is at the end of the time, which seems to be the equilibration. It is also interesting that there is a cluster between 1500 and 4000 ps.

Cluster analysis

Now we want to find clusters of similar structures. Therefore we use the following commands:

g_cluster -h echo 6 6 | g_cluster -s 2NT0_wt_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

Figure 28: RMSD Distribution

In figure 28 you can see that most structures have a RMS lower than 0.1. Four clusters were found for the trajectory.

Executive: RMS = 0.656 (121 to 121 atoms)

But there are no notable differences between the clusters that can be seen by observing them in Pymol.

Distance RMSD

Now we want to see the interatomic distances by using the distance RMSD (dRMSD). Therefore we used the following command:

g_rmsdist -s 2NT0_wt_md.tpr -f traj_nojump.xtc -o distance-rmsd.xvg

Figure 29: dRMSD

At what time and value does the dRMSD converge and how does this graph compare to the standard RMSD?

It converges at the end at about 8000. So much time is needed until the protein reaches its equilibrium state. It looks very similar to figure 17a. It is only a bit more fluctuating but the values are quite the same.

Mutation 7

A brief check of results

command: gmxcheck -f 2NT0_mut7_md.xtc

How many frames are in the trajectory file and what is the time resolution?

2001 frames
timestep: 5ps

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

Simulation run: 8h47:42
Simulation speed: 27.287
Time to reach a second: 1/0.000000028 = 35714285,714285714 days = 97847,358121331 years

Which contribution to the potential energy accounts for most of the calculations?

-9.39750e+05 kJ/mol

Visualization of results

For visualization we extract 1000 frames from the trajectory (-dt 10), only select the protein without the water and remove the jumps over the boundaries to make a continuous trajectory (-pbc nojump).

command: trjconv -s 2NT0_mut7_md.tpr -f 2NT0_mut7_md.xtc -o protein.pdb -pbc nojump -dt 10

Figure 30: The protein with the spectrum

Figure 31: The protein with the box around it

Figure 32: The cartoon presentation after using the command dss

Figure 33: The active site

Figure 34a: Molecular Dynamics Simulation of Glucocerebrosidase

Figure 34b: Molecular Dynamics Simulation of Glucocerebrosidase

Quality assurance

Convergence of energy terms

First we look at the energy terms. Therefore we look at the temperature, the pressure, the energy, the volume, the density, the box and the interaction energies between protein and solvent (Coulomb and van der Waals).

With the following commands we produced *.xvg files which were visualized with xmgrace.

g_energy -f 2NT0_mut7_md.edr -o temperature.xvg g_energy -f 2NT0_mut7_md.edr -o pressure.xvg g_energy -f 2NT0_mut7_md.edr -o energy.xvg g_energy -f 2NT0_mut7_md.edr -o volume.xvg g_energy -f 2NT0_mut7_md.edr -o density.xvg g_energy -f 2NT0_mut7_md.edr -o box.xvg

Temperature

Figure 35: Temperature of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Temperature	297.909	0.0064	1.09154	0.00476588 (K)

What is the average temperature and what is the heat capacity of the system?
297.909 K

In figure 35 you can see the plot for the temperature during the simulation. It fluctuates between 295 K and 305 K. So the temperature is stable, the difference is ten Kelvin. That means that the system reached very soon a stable temperature.

Pressure

Figure 36: Pressure of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Pressure	1.00557	0.019	87.8392	-0.0608909 (bar)

Estimate the plateau values for the pressure.
0 bar? 1.00557 bar?

The pressure fluctuates around -300 bar and +300 bar. That is a difference of 600 bar which is very large. The system seems not to have reached its plateau. So it is also hard to estimate such a value because no trend is observable.

Energy

Figure 37: Energies of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Potential	-939750	81	924.746	-485.89 (kJ/mol)
Kinetic En.	170902	3.7	626.184	2.73516 (kJ/mol)
Total Energy	-768848	81	1132.31	-483.155 (kJ/mol)

What are the terms plotted in the file energy.xvg?
potential, kinetic and total energy

In the plot you can see that all three energy terms reached a plateau. It is hard to say how large the fluctuations are, because the scale is too large. But as they converged the simulation seems to have reached its optimum.

Volume

Figure 38: Volume of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Volume	735.625	0.041	0.555533	0.00591744 (nm³)

Estimate the plateau values for the volume.
735.625?

In figure 38 you can see the volume fluctuating between 734 nm³ and 737 nm³. The difference is only three nm³. So the colume also converged to a an interval.

Density

Figure 39: Density of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Density	1009.66	0.057	0.762479	-0.0081254 (kg/m³)

Estimate the plateau values for the density.
1009.66? 1010?

In figure 39 you can see the density fluctuating around 1010 kg/m³. The difference seems to be about 5 kg/m³. So the density also converged.

Box

Figure 40: Boxes of the Molecular Dynamics Simulation of Glucocerebrosidase with Mutation 7

Energy	Average	Err.Est.	RMSD	Tot-Drift
Box-X	10.1327	0.00019	0.00255068	2.71821e-05 (nm)
Box-Y	10.1327	0.00019	0.00255068	2.71821e-05 (nm)
Box-Z	7.16488	0.00013	0.0018036	1.92236e-05 (nm)

What are the terms plotted in the file box.xvg?
The size of the box around the protein.

In figure 40 you can see the sizes of the box, the x-, the y- and the z-term. They are very constant with 10.1327, 10.1327 and 7.16488 nm.

Interaction Energy: Coulomb

Figure 41: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
Coul-SR:Protein-Protein	-23734.3	28	191.785	24.8598 (kJ/mol)
Coul-14:Protein-Protein	95624.1	46	212.282	278.389 (kJ/mol)

In figure 41 you can see the Coulomb Energy. It has its average at -23734.3 kJ/mol and 95624.1 kJ/mol. It fluctuates around that value in an interval of about 1000 kJ/mol. At the end it is more stable. As the equilibration takes longer for this term it may be that the value at the time 8000/-17000 to 10000/-19000 is the equilibrium.

Interaction Energy: van der Waals

Figure 42: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
LJ-SR:Protein-Protein	-16410.5	27	128.606	140.393 (kJ/mol)
LJ-14:Protein-Protein	8533.35	2.4	61.2867	12.2783 (kJ/mol)

In figure 13 you can see the van der Waals interaction energy. It has its average at -16410.5 kJ/mol and 8533.35 kJ/mol. The values are fluctuating in an interval of about 1000 kJ/mol.

Minimum distances between periodic images

We used the following command: g_mindist -f 2NT0_mut7_md.xtc -s 2NT0_mut7_md.tpr -od minimal-periodic-distance.xvg -pi

Figure 43: Minima periodic distance on the protein

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 2.366 (nm) at time 135 (ps), between atoms 930 and 2377. This is a larger minimum distance than in the wildtype.

Figure 44: Minima periodic distance on the C-alpha group

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 2.95133 (nm) at time 240 (ps), between atoms 921 and 2237. This is also larger than in the wildtype.

What happens if the minimal distance becomes shorter than the cut-off distance used for electrostatic interactions? Is it the case in your simulations?

It is not the case in our simulation. The energy would "explode".

Run now g_mindist on the C-alpha group, does it change the results? What does is mean for your system?

It changes to a greater distance between two other atoms (921 and 2237). That means that it may be atoms of a side group that have this minimal distance. Also the fluctuation interval of the minimum distance becomes smaller.

Root mean square fluctuations

In the following we want to look at the RMSF, the root mean square fluctuations<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref>.

Therefore we used the following command:

g_rmsf -f 2NT0_mut7_md.xtc -s 2NT0_mut7_md.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 45: RMSF fluctuation

Indicate the start and end residue for the most flexible regions and the maximum amplitudes.

In figure 45 you can see the RMSF fluctuation. With that you can recognise flexible regions. The most flexible region seems to be around residue 350. This differs a lot from the wildtype where the most flexible regions was between 125 and residue 175 and the whole protein was more flexible.

Figure 46a: Colored structure according to flexible regions

Figure 46b: Colored structure according to b-factors

In figure 46a you can see the protein colored according flexible regions. The b-factors are shown in figure 46b. Blue means that the region is not flexible and red means that it is a very flexible region. You can see one loop that is very flexible and some very small parts of the alpha helices and a beta sheet that are a bit flexible. The most of the inner part of the protein is colored blue and therefore not flexible. Compared to the wildtype the protein lost flexibility.

Convergence of RMSD

Now we want to calculate the RMSD. It is used as an indicator of convergence of the structure towards an equilibrium state.

We used the following commands:

trjconv -f 2NT0_mut7_md.xtc -o traj_nojump.xtc -pbc nojump g_rms -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -o rmsd-all-atom-vs-start.xvg

g_rms -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -o rmsd-backbone-vs-start.xvg echo 1 | trjconv -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -o protein.xtc g_rms -f protein.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg g_rms -f protein.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Figure 47a: RMSD: All atom versus all, against starting structure

Figure 47b: RMSD: All atom versus all, against average structure

Figure 48a: RMSD: Backbone, against starting structure

Figure 48b: RMSD: Backbone, against average structure

If observed, at what time and value does the RMSD reach a plateau?

If you compare the different methods (all atom/backbonde and against starting structure or average structure) as you can see in figure 47a, b and 48a, b there are different times when the RMSD reaches a plateau. For all atoms and backbone against the starting structure both reach their plateau at about 2000 ps with a value of 0.17 and a bit higher for backbone with about 0.18.

For the comparison to the average structure they reach a plateau later. At 4000 to 5000 ps. All atom versus all reaches a RMSD of about 0.16, which is lower than in the wildtype, the backbone of 0.06, which is very low and almost the same as in the wildtype.

Briefly discuss two differences between the graphs against the starting structure and against the average structure. Which is a better measure for convergence?

As already mentioned the convergence is reached later. The reason for that is that the average structure is nearer to the final structure. Another thing is that the RMSD for the average structures gets lower wheras it gets higher for the comparison to the starting structure. The starting structure is farer away from the final structure, so this is again the reason. It might be better to take the starting structure to recognise the convergence because you can see sooner that an equilibrian state is reached.

Convergence of radius of gyration

Now look at the radius of gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref>. Therefore we used the following command:

g_gyrate -f 2NT0_mut7_md.xtc -s 2NT0_mut7_md.tpr -o radius-of-gyration.xvg

The radius of gyration gives insight to the shape of the protein.

Figure 49: Radius of gyration

At what time and value does the radius of gyration converge?

The radius of gyration is always constant with about 2.3 nm. The RgX, RgY and RgZ fluctuate until the end. RgX and RgY reach the value 2.1. RgZ is much lower and reaches a value of 1.5. This differs a lot from the wildtype Molecular Dynamics simulation.

Structural analysis: properties derived from configurations

Solvent accessible surface area

With the Solvent accessible surface area (SASA)<ref>http://en.wikipedia.org/wiki/Accessible_Surface_Area</ref> we can see which parts of the protein are reachable by solvents.

Figure 50: Solvent Accessable Surface

Figure 51: Atomic SAS

Figure 52: Residue SAS

In figure 50 you can see the solvent accessable surface during the time, in figure 51 the atomic SAS and in figure 52 the residue SAS. It is very hard to say which residues are the most accessible to the solvent. But there might be some areas between residue 225, 250 and 300, at residue 350, 400 and also 450. Figure 50 shows that the solvation is almost the same all time.

Hydrogen bonds

As a next step we look at the hydrogen bonds in the protein and between the protein and the surrounding solvent. Therefore we use the following commands:

echo 1 1 | g_hbond -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -num hydrogen-bonds-intra-protein.xvg echo 1 12 | g_hbond -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -num hydrogen-bonds-protein-water.xvg

Figure 53: Hydrogen bonds in the protein

Figure 54: Hydrogen bonds between protein and the surrounding solvent

Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.

In figure 53 you can see the hydrogen bonds internally, that means which are build between the protein itself, and in figure 54 you can see the hydrogen bonds that are build with the surrounding water. Internally there are less hydrogen bonds build (about 350), less than with the water (about 800), but there are more pairs within 0.35 nm (1700 compared to 1100). The fluctuation is larger with the solvent. Internally there is only little fluctuation in an interval of about 50 to 100. That's exactly the same as in the wildtype.

Ramachandran (phi/psi) plots

As a next step we plotted the Ramachandran plot with the following command:

g_rama -f traj_nojump.xtc -s 2NT0_mut7_md.tpr -o ramachandran.xvg

Figure 56: Ramachandran plot

Compared to standard Ramachandran plots<ref>http://en.wikipedia.org/wiki/Ramachandran_plot</ref> you can see that there are huge regions on the right side which normally do not occur. Looking at the Glycine Ramachandran plot at http://en.wikipedia.org/wiki/Ramachandran_plot there is more analogy than to the Ramachandran plot in general case. But Glycine is not overrepresented (6.8 per cent) in Glucocerebrosidase. That is the same as in the wildtype. But there is also one difference. If you compare Psi -50 ans Phi 150 to the wildtype Ramachandran plot you can see that there are some points missing. It is the same for Psi 150 and Phi 125. Although only one residue mutated there is a change in the Phi/Psi-angles.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Now we want to compare the different structures during the trajectory. So we can find similar structures.

We therefore used the following commands:

g_rms -s 2NT0_mut7_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue gv rmsd-matrix.eps

Figure 57: RMSD Matrix

In figure 57 you can see some clusters that are colored blue and green. There are four clusters which are colored green and blue. This is different to the wildtype. There you also can see such clusters but not as clear as in figure 57. The huge cluster between 1500 and 4000 ps appears again but the smaller clusters in that region are stronger now.

Cluster analysis

Now we want to find clusters of similar structures. Therefore we use the following commands:

g_cluster -h echo 6 6 | g_cluster -s 2NT0_mut7_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

Figure 58: RMSD Distribution

In figure 58 you can see that most structures have a RMS lower than 0.1. Six clusters were found for the trajectory. The RMSD ranges from 0.0545 to 0.18 nm, the average RMSD is 0.102969.

But there are no notable differences between the clusters that can be seen by observing them in Pymol. Only some loops look a little different.

Figure 59: Clusters in Pymol

Distance RMSD

Now we want to see the interatomic distances by using the distance RMSD (dRMSD). Therefore we used the following command:

g_rmsdist -s 2NT0_mut7_md.tpr -f traj_nojump.xtc -o distance-rmsd.xvg

Figure 60: dRMSD

At what time and value does the dRMSD converge and how does this graph compare to the standard RMSD?

It converges at the end at about 7000. So much time is needed until the protein reaches its equilibrium state. There it reaches a RMSD of almost 0.25 nm.

Mutation 10

A brief check of results

command: gmxcheck -f 2NT0_mut10_md.xtc

How many frames are in the trajectory file and what is the time resolution?

2001 frames
timestep: 5ps

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

Simulation run: 8h39:07
Simulation speed: 27.739
Time to reach a second: 1/0.000000028 = 35714285,714285714 days = 97847,358121331 years

Which contribution to the potential energy accounts for most of the calculations?

-9.39855e+05 kJ/mol

Visualization of results

For visualization we extract 1000 frames from the trajectory (-dt 10), only select the protein without the water and remove the jumps over the boundaries to make a continuous trajectory (-pbc nojump).

command: trjconv -s 2NT0_mut10_md.tpr -f 2NT0_mut10_md.xtc -o protein.pdb -pbc nojump -dt 10

Figure 61a: The protein with the spectrum

Figure 61b: The protein with the box around it

Figure 61c: The cartoon presentation after using the command dss

Figure 61d: The active site

Figure 62a: Molecular Dynamics Simulation of Glucocerebrosidase

Figure 62b: Molecular Dynamics Simulation of Glucocerebrosidase

Quality assurance

Convergence of energy terms

First we look at the energy terms. Therefore we look at the temperature, the pressure, the energy, the volume, the density, the box and the interaction energies between protein and solvent (Coulomb and van der Waals).

With the following commands we produced *.xvg files which were visualized with xmgrace.

g_energy -f 2NT0_mut10_md.edr -o temperature.xvg g_energy -f 2NT0_mut10_md.edr -o pressure.xvg g_energy -f 2NT0_mut10_md.edr -o energy.xvg g_energy -f 2NT0_mut10_md.edr -o volume.xvg g_energy -f 2NT0_mut10_md.edr -o density.xvg g_energy -f 2NT0_mut10_md.edr -o box.xvg

Temperature

Figure 63: Temperature of the Molecular Dynamics Simulation of the protein with Mutation 10

Energy	Average	Err.Est.	RMSD	Tot-Drift
Temperature	297.907	0.007	1.09216	0.00537584 (K)

What is the average temperature and what is the heat capacity of the system?
297.907 K

In figure 63 you can see the plot for the temperature during the simulation. It fluctuates between 294 K and 302 K. So the temperature is stable, the difference is only eight Kelvin. That means that the system reached very soon a stable temperature.

Pressure

Figure 64: Pressure of the Molecular Dynamics Simulation of the protein with mutation 10

Energy	Average	Err.Est.	RMSD	Tot-Drift
Pressure	0.995622	0.016	87.7623	-0.0375739 (bar)

Estimate the plateau values for the pressure.
0 bar? 0.995622 bar? higher?

The pressure fluctuates around -300 bar and +300 bar. That is a difference of 600 bar which is very large. The system seems not to have reached its plateau. So it is also hard to estimate such a value because no trend is observable.

Energy

Figure 65: Energies of the Molecular Dynamics Simulation

Energy	Average	Err.Est.	RMSD	Tot-Drift
Potential	-939855	120	936.926	-807.027 (kJ/mol)
Kinetic En.	170899	4	626.533	3.08393 (kJ/mol)
Total Energy	-768957	120	1142.33	-803.943 (kJ/mol)

What are the terms plotted in the file energy.xvg?
potential, kinetic and total energy

In the plot you can see that all three energy terms reached a plateau. It is hard to say how large the fluctuations are, because the scale is too large. But as they converged the simulation seems to have reached its optimum.

Volume

Figure 66: Volume of the Molecular Dynamics Simulation

Energy	Average	Err.Est.	RMSD	Tot-Drift
Volume	735.529	0.058	0.543357	-0.382496 (nm³)

Estimate the plateau values for the volume.
735.529? 736?

In figure 66 you can see the volume fluctuating between 734 nm³ and 737 nm³. The difference is only three nm³. So the colume also converged to a an interval.

Density

Figure 67: Density of the Molecular Dynamics Simulation

Energy	Average	Err.Est.	RMSD	Tot-Drift
Density	1009.82	0.08	0.745989	0.525138 (kg/m³)

Estimate the plateau values for the density.
1009.82? 1010?

In figure 67 you can see the density fluctuating around 1010 kg/m³. The difference seems to be about 5 kg/m³. So the density also converged.

Box

Figure 68: Boxes of the Molecular Dynamics Simulation

Energy	Average	Err.Est.	RMSD	Tot-Drift
Box-X	10.1322	0.00027	0.002495	-0.0017563 (nm)
Box-Y	10.1322	0.00027	0.002495	-0.0017563 (nm)
Box-Z	7.16457	0.00019	0.00176423	-0.00124195 (nm)

What are the terms plotted in the file box.xvg?
The size of the box around the protein.

In figure 68 you can see the sizes of the box, the x-, the y- and the z-term. They are very constant with 10.1322, 10.1322 and 7.16457 nm.

Interaction Energy: Coulomb

Figure 69: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
Coul-SR:Protein-Protein	-23792.5	35	192.508	-46.3472 (kJ/mol)
Coul-14:Protein-Protein	95298	89	249.16	613.217 (kJ/mol)

In figure 69 you can see the Coulomb Energy. It has its average at -23792.5 kJ/mol and 95298 kJ/mol. It fluctuates around that value in an interval of about 1000 kJ/mol. At the end it is more stable. As the equilibration takes longer for this term it may be that the value at the time 8000/-15500 to 10000/-16500 is the equilibrium.

Interaction Energy: van der Waals

Figure 70: Interaction energy of protein and solvent

Energy	Average	Err.Est.	RMSD	Tot-Drift
LJ-SR:Protein-Protein	-16508.3	18	117.895	88.1434 (kJ/mol)
LJ-14:Protein-Protein	8509.66	5.3	61.7179	-36.3349 (kJ/mol)

In figure 70 you can see the van der Waals interaction energy. It has its average at -16508.3/8509.66 kJ/mol. The values are fluctuating in an interval of about 1000 kJ/mol. Also at the end of the time there is no equilibration recognisable. Maybe it always fluctuates in such a big interval or it needs more time to converge.

Minimum distances between periodic images

We used the following command: g_mindist -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -od minimal-periodic-distance.xvg -pi

Figure 71: Minima periodic distance on the protein

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 1.90705 (nm) at time 1485 (ps), between atoms 965 and 2274 (different to the wildtype), which is smaller than in the wildtype.

Figure 72: Minima periodic distance on the C-alpha group

What was the minimal distance between periodic images and at what time did that occur?

The shortest periodic distance is 2.37553 (nm) at time 1485 (ps), between atoms 952 and 2267. This is again smaller than in the wildtype although these are the same atoms.

What happens if the minimal distance becomes shorter than the cut-off distance used for electrostatic interactions? Is it the case in your simulations?

It is not the case in our simulation. The energy would "explode".

Run now g_mindist on the C-alpha group, does it change the results? What does is mean for your system?

It changes to a greater distance between two other atoms (952 and 2267). That means that it may be atoms of a side group that have this minimal distance. Also the fluctuation interval of the minimum distance becomes smaller.

Root mean square fluctuations

In the following we want to look at the RMSF, the root mean square fluctuations<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref>.

Therefore we used the following command:

g_rmsf -f 2NT0_mut10_md.xtc -s 2NT0_mut10_md.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 73: RMSF fluctuation

Indicate the start and end residue for the most flexible regions and the maximum amplitudes.

In figure 73 you can see the RMSF fluctuation. With that you can recognise flexible regions. The most flexible region seems to be around residue 50. There are many other flexible regions all over the protein. But there is no huge peak. This differs from the wildtype.

Figure 74a: Colored structure according to flexible regions

Figure 74b: Colored structure according to b-factors

In figure 74 you can see the protein colored according to b-factors and flexible regions. Blue means that the region is not flexible and red means that it is a very flexible region. You can see some loops that are flexible and parts of the alpha helices and beta sheets that are a bit more flexible. The most of the inner part of the protein is colored blue and therefore not flexible. It differs again from the wildtype. Other regions are colored red and are therefore the most flexible and also the flexibility of the alpha helices and the beta sheets differs.

Convergence of RMSD

Now we want to calculate the RMSD. It is used as an indicator of convergence of the structure towards an equilibrium state.

We used the following commands:

trjconv -f 2NT0_mut10_md.xtc -o traj_nojump.xtc -pbc nojump g_rms -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -o rmsd-all-atom-vs-start.xvg

g_rms -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -o rmsd-backbone-vs-start.xvg echo 1 | trjconv -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -o protein.xtc g_rms -f protein.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg g_rms -f protein.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Figure 75a: RMSD: All atom versus all, against starting structure

Figure 75b: RMSD: All atom versus all, against average structure

Figure 76a: RMSD: Backbone, against starting structure

Figure 76b: RMSD: Backbone, against average structure

If observed, at what time and value does the RMSD reach a plateau?

If you compare the different methods (all atom/backbonde and against starting structure or average structure) there are different times when the RMSD reaches a plateau. For all atoms and backbone against the starting structure both reach their plateau at about 3000 ps with a value of 0.2 and lower for backbone with about 0.07. These are lower values than in the wildtype although the equilibration is reached later.

For the comparison to the average structure they reach a plateau later. At 4000 to 5000 ps. Both reach a value of about 0.15, which is very high for the backbone compared to the wildtype simulations.

Briefly discuss two differences between the graphs against the starting structure and against the average structure. Which is a better measure for convergence?

As already mentioned the convergence is reached later. The reason for that is that the average structure is nearer to the final structure. Another thing is that the RMSD for the average structures gets lower wheras it gets higher for the comparison to the starting structure. The starting structure is farer away from the final structure, so this is again the reason. It might be better to take the starting structure to recognise the convergence because you can see sooner that an equilibrian state is reached.

Convergence of radius of gyration

Now look at the radius of gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref>. Therefore we used the following command:

g_gyrate -f 2NT0_mut10_md.xtc -s 2NT0_mut10_md.tpr -o radius-of-gyration.xvg

The radius of gyration gives insight to the shape of the protein.

Figure 77: Radius of gyration

At what time and value does the radius of gyration converge?

The radius of gyration is always constant with about 2.3 nm. The RgX, RgY and RgZ fluctuate until the end, only RgY has relatively small fluctuating intervals. RgY reaches about 1.05 nm, RgX about 1.9 nm and RgZ about 1.75 nm. This again differs compared to the wildtype simulation.

Structural analysis: properties derived from configurations

Solvent accessible surface area

With the Solvent accessible surface area (SASA)<ref>http://en.wikipedia.org/wiki/Accessible_Surface_Area</ref> we can see which parts of the protein are reachable by solvents.

Figure 78a: Solvent Accessable Surface

Figure 78b: Atomic SAS

Figure 78c: Residue SAS

In figure 78a you can see the solvent accessable surface during the time, in figure 78b the atomic SAS and in figure 78c the residue SAS. It is very hard to say which residues are the most accessible to the solvent. But there might be some areas between residue 200 and 250 and 300 to 350, at residue 400 and also 450. Figure 78a shows that the solvation is almost the same all time.

Hydrogen bonds

As a next step we look at the hydrogen bonds in the protein and between the protein and the surrounding solvent. Therefore we use the following commands:

echo 1 1 | g_hbond -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -num hydrogen-bonds-intra-protein.xvg echo 1 12 | g_hbond -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -num hydrogen-bonds-protein-water.xvg

Figure 79a: Hydrogen bonds in the protein

Figure 79b: Hydrogen bonds between protein and the surrounding solvent

Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.

In figure 79a you can see the hydrogen bonds internally, that means which are build between the protein itself, and in figure 79b you can see the hydrogen bonds that are build with the surrounding water. Internally there are less hydrogen bonds build (about 350), less than with the water (about 800), but there are more pairs within 0.35 nm (1700 compared to 1100). The fluctuation is larger with the solvent. Internally there is only little fluctuation in an interval of about 50 to 100. That is the same as in the wildtype. Only the fluctuation seems to be larger at the end.

Ramachandran (phi/psi) plots

As a next step we plotted the Ramachandran plot with the following command:

g_rama -f traj_nojump.xtc -s 2NT0_mut10_md.tpr -o ramachandran.xvg

Figure 81: Ramachandran plot

Compared to standard Ramachandran plots<ref>http://en.wikipedia.org/wiki/Ramachandran_plot</ref> you can see that there are huge regions on the right side which normally do not occur. Looking at the Glycine Ramachandran plot at http://en.wikipedia.org/wiki/Ramachandran_plot there is more analogy than to the Ramachandran plot in general case. But Glycine is not overrepresented (6.8 per cent) in Glucocerebrosidase. It looks similar to the Ramachandran plot of the wildtype, but there are also some differences. The region around Psi 50 and Phi 100 is smaller than in the wildtype. If you look at Psi -100 and Phi -150 you can see that there are more points. Here again the change of only one amino acid changes the Ramachandran plot.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Now we want to compare the different structures during the trajectory. So we can find similar structures.

We therefore used the following commands:

g_rms -s 2NT0_mut10_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue gv rmsd-matrix.eps

Figure 82: RMSD Matrix

In figure 82 you can see some clusters that are colored blue and green. Conspicuous is the blue one from 3000 to 9000. The structures must be very similar in this interval. At the end there must be some change again. The wildtype simulation does not show this.

Cluster analysis

Now we want to find clusters of similar structures. Therefore we use the following commands:

g_cluster -h echo 6 6 | g_cluster -s 2NT0_mut10_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

Figure 83: RMSD Distribution

In figure 83 you can see that half of the structures have a RMS lower than 0.1, half of them higher than 0.1. Compared to the wildtype you can see that the clusters are more different. We could also see this in the previous section. The RMSD ranges from 0.0551 to 0.198 nm, the average RMSD is 0.100617. Seven clusters were found for the trajectory.

Anyway there are again no notable differences between the clusters that can be seen by observing them in Pymol. Only some loops look a little different.

Figure 84: Clusters in Pymol

Distance RMSD

Now we want to see the interatomic distances by using the distance RMSD (dRMSD). Therefore we used the following command:

g_rmsdist -s 2NT0_mut10_md.tpr -f traj_nojump.xtc -o distance-rmsd.xvg

Figure 85: dRMSD

At what time and value does the dRMSD converge and how does this graph compare to the standard RMSD?

It converges at the end at about 6000. So much time is needed until the protein reaches its equilibrium state. It reaches a RMSD of 0.175.

Discussion

Wildtype compared to Mutation 7

Visualization

Only because of the visualization of the whole protein one cannot say something special about the mutated structure. They look very similar und it is not possible to decide, if there is a change. For that we must look a bit closer to the region where the mutation occurs. In this case this is position 370 where Asparagine mutated to Serine.

Figure 86a: Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to Asparagine at position 370

Figure 86b: Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to the mutated Serine at position 370

But also a closer look does not show any changes. The motion of the helix is the same as in the wildtype. There is also no visible change of the active site (figure 4 and 33). So just by visualization we do not see the effect of the mutation.

Energy terms

Next step is to look at the energy terms. They are quite the same for temperature, pressure, potential and kinetic energy, volume, density, Box-X, -Y and -Z and the interaction energies, which means Coulomb and Lennard Jones potential. This is what we expect because there should be an equilibrium state which is almost the same in wildtype and mutant.

Minimum distances between periodic images

The minimum periodic distances on the protein and the C-alpha atoms change. They are about 0.5 nm larger in the mutant. So the arrangement of the atoms changes a bit.

Flexible regions: Root mean square fluctuations

Figure 87: Most flexible regions of wildtype (orange) and mutated structre (green) mapped onto 2NT0. Active site residues are shown in blue, residues forming hydrogen bonds with active site in cyan and mutation 7 in red.

Comparing the root mean square fluctuations of wildtype and mutated structre, one can see that the flexibility of the mutated structure is different than the flexibility of the wildtype. The flexibility of both structures differs the most in two regions: position 125 to position 175 and position 325 to 375. The wildtype structure is more flexible than the mutated one in the area of position 125 to position 175. This region is hilighted in orange in Figure 87. None of the active site residues (shown in blue) nor the residues forming hydrogen bonds with the active site are part of this region. The most flexible region of the mutated structure includes in contrast the active site residue Glu340 and the mutation itself. This region ranges from residue 325 to residue 375 and is shown in green. The missing polar interaction formed between S370 and S366 (see polar interactions in structure-based mutation analysis) might be the reason for the higher flexibility of this region in the mutated structure. The flexibility of a protein is an important aspect for its function. A protein may either be flexible or not flexible at all to perform a molecular function. The fact, that the flexibility of the protein is changing through the mutation, might be the cause for the damaging effect of the mutation N370S, especially as the new most flexible region in the mutant includes one of the active site residues. A more flexible active site than the wildtype active site could have a major impact on the proteins function: one of the active site residues could be shifted or moved away from the other one, which might inhibit the function of the protein. This might be the major cause of a damaging effect.

Convergence of RMSD

For the convergence of the RMSD we get almost the same results. There is no change because of the mutation.

Convergence of radius of gyration

The radius of gyration differs for wildtype and mutated structure. So the different modes during the Molecular Dynamics Simulation differ. But this says nothing about the effect of the mutation.

Structural analysis

The Solvent accessible surface area does not change at the mutated structure. The results for the wildtype simulation are almost the same as for the simulation of the mutated protein. This is important because now we know, that solvent can reach the same area as in the wildtype.

Also the results for the Hydrogen bonds are the same for wildtype and mutation. This is another important point because if there would be a change this could affect the function of the protein.

The interesting thing is the Ramachandran plot. In figure 26 and 56 one can see that the phi and psi angles differ a bit. This means that the orientation of some residues changed. If this affects residues which are important for the protein's function this could be a reason why the mutation causes Gaucher Disease.

Analysis of dynamics and time-averaged properties

This is more important for the simulation. We see some differences for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.

All in all

There are two hints for a harmful mutation. On the one hand we have a different flexibility. This can be very important if it affects the sites which are part of the active site or important for the protein's function. On the other hand the Ramachandran plot changes, which means that some phi/psi angles are different. The orientation of some residues changed and that might be responsible for the loss of the protein's function. We know that this mutation is listed in HGMD, it is even the most common mutation in Gaucher Disease. We do not see the change only by looking at the movement in Pymol but with help of the analysis we can at least assume that the reason for the loss of the protein's function is the different flexibility and the different orientation of some residues.

Wildtype compared to Mutation 10

Visualization

Also in this case we see no difference in the motion of the whole protein. Therefore we now have a closer look to the mutated site. In this case this is position 470 where Leucine mutated to Proline.

Figure 88a: Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to Asparagine at position 370

Figure 88b: Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to the mutated Serine at position 370

And again we do not see any difference by just looking at it with Pymol. Of course the residue itself has another motion but this does not affect the protein.

Energy terms

Now we look at the energy terms. They are again almost the same for temperature, pressure, potential and kinetic energy, volume, density, Box-X, -Y and -Z and the interaction energies, which means Coulomb and Lennard Jones potential. This is what we expect because there should be an equilibrium state which is almost the same in wildtype and mutant.

Minimum distances between periodic images

The minimum periodic distances on the protein and the C-alpha atoms change. They are about 0.07 nm smaller in the mutant. So the arrangement of the atoms changes a bit.

Flexible regions: Root mean square fluctuations

The flexible regions differ compared to the wildtype. In the mutant structure are many other flexible regions all over the protein but no huge peak. The most flexible region of the wildtype, which is between residue 125 and residue 175, is here not the most flexible one. There is no residue with such a high flexibility. So it decreases.

Figure 16a and 74a also show that the flexibility of the protein changes. But it is not as strong as it was in the mutated structure for site 370. But there are definitely differences and so this might influence the function of the protein. For example it can have an affect on the binding property.

Convergence of RMSD

For this mutation the convergence of the RMSD changes a bit. It is an indicator of convergence of the structure towards an equilibrium state. For the mutation the RMSD is lower if you compare it to the starting or average structure. So the simulation reaches faster the equilibrium state. But this only gives a hint, that the structure differs from the wildtype.

Convergence of radius of gyration

The radius of gyration differs for wildtype and mutated structure. So the different modes during the Molecular Dynamics Simulation differ. But this again says nothing about the effect of the mutation, only that the simulation was different.

Structural analysis

The Solvent accessible surface area does not change at the mutated structure. The results for the wildtype simulation are almost the same as for the simulation of the mutated protein. This is important because now we know, that solvent can reach the same area as in the wildtype.

Also the results for the Hydrogen bonds are the same for wildtype and mutation, only the fluctuation is larger in the mutant structure. This is another important point because if there would be a change this could affect the function of the protein.

But also here the Ramachandran plot is the most interesting result. In figure 26 and 81 one can see that the phi and psi angles differ, even more than in the structure of mutation seven. This means that the orientation of some residues changed. If this affects residues which are important for the protein's function this could cause a harmful effect.

Analysis of dynamics and time-averaged properties

This is more important for the simulation. We see some differences for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.

All in all

There are again two hints for a harmful mutation. On the one hand we have a different flexibility. This can be very important if it affects the sites which are part of the active site or important for the protein's function. But in this case the flexibility changes not much, so it might be accepted and the protein's function is maintained.

On the other hand the Ramachandran plot changes and this time even worse than for mutation seven. This means that some phi/psi angles are different. The orientation of some residues changed and that could be responsible for the loss of the protein's function.

But in this case we know that this mutation should be harmless, because it is only listed in dbSNP. With Molecular Dynamics we wanted to see that the mutation is harmless because for our sequence based prediction we had the opposite result. But also with Molecular Dynamics, or at least with the analysis we did above, we can not classify this mutation as harmless. Of course the flexibility changes not much but the Ramachandran plot is again very different. So this also does not help us. Further analysis is necessary.

References

@@ Line 40: / Line 40: @@
 |}
-In Figure 5 one can see the protein in motion. Compared to the results in [[Normal_Mode_Analysis_of_Glucocerebrosidase|Normal Mode Analysis]] the motion is not as strong and the parts that are moving are smaller than the moving ones of the Normal Mode Analysis. In this simulation mostly side chains and little elements of the protein are bouncing. Big movements like attraction or repulsion ,as seen in the Normal Mode Analysis, are not present.
+In Figure 5 one can see the protein in motion. Compared to the results in [[Normal_Mode_Analysis_of_Glucocerebrosidase|Normal Mode Analysis]] the motion is not as strong and the parts that are moving are smaller than the moving ones of the Normal Mode Analysis. In this simulation mostly side chains and little elements of the protein are bouncing. Big movements like attraction or repulsion, as seen in the Normal Mode Analysis, are not present.
 === Quality assurance ===
@@ Line 118: / Line 118: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Volume||735.655||0.042||0.552925||-0.0802412  (nm^3)
+||Volume||735.655||0.042||0.552925||-0.0802412  (nm³)
 |}
@@ Line 132: / Line 132: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Density||1009.64||0.058||0.75885||0.110104  (kg/m^3)
+||Density||1009.64||0.058||0.75885||0.110104  (kg/m³)
 |}
@@ Line 183: / Line 183: @@
 |}
-In Figure 13 one can see the van der Waals interaction energy. The energy has its average at -3184.89 kJ/mol. The values are fluctuating in an interval of about 1000 kJ/mol. The simulation does not seem to have established an equilibrium. This might have two reasons: either the van der Waals interaction energy always fluctuates in such a big interval or it does need more time to converge.
+In Figure 13 one can see the van der Waals interaction energy. The energy has its average at -3184.89 kJ/mol. The values are fluctuating in an interval of about 1000 kJ/mol. The simulation does not seem to have established an equilibrium. This might have two reasons: either the van der Waals interaction energy always fluctuates in such a big interval or it needs more time to converge.
 ==== Minimum distances between periodic images ====
-The get the minimum distances between periodic images, the following command has been used:
+To get the minimum distances between periodic images, the following command has been used:
 <code>g_mindist -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -od minimal-periodic-distance.xvg -pi</code>
@@ Line 213: / Line 213: @@
 The RMSF, root mean square fluctuation<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref>, was calculated with the following command:
 <code>g_rmsf -f 2NT0_wt_md.xtc -s 2NT0_wt_md.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res </code>
@@ Line 219: / Line 220: @@
 '''Indicate the start and end residue for the most flexible regions and the maximum amplitudes.'''
-In Figure 15 you can see the RMSF fluctuation of glucocerebrosidase which allows yout to recognise the most flexible regions. The most flexible region seems to be between residue 125 and residue 175. Other flexible regions are at the beginning of the protein, i.e. between residue 20 and 50, and at the region around position 350. The active site, Glu235 and Glu340, is not part of those flexible regions.
+In Figure 15 you can see the RMSF fluctuation of glucocerebrosidase which allows you to recognise the most flexible regions. The most flexible region seems to be between residue 125 and residue 175. Other flexible regions are at the beginning of the protein, i.e. between residue 20 and 50, and at the region around position 350. The active site, Glu235 and Glu340, is not part of those flexible regions.
@@ Line 229: / Line 230: @@
 ==== Convergence of RMSD ====
-The RMSD (root mean squared deviation) is used as an indicator of convergence of the structure towards an equilibrium state. It was calculated with the following commands:
+The RMSD (root mean squared deviation) is used as an indicator of convergence of the structure towards an equilibrium state. It was calculated with the following commands:<br/>
 <code>
 trjconv -f 2NT0_wt_md.xtc -o traj_nojump.xtc -pbc nojump <br/>
@@ Line 276: / Line 277: @@
 ==== Solvent accessible surface area ====
-With the Solvent accessible surface area (SASA)<ref>http://en.wikipedia.org/wiki/Accessible_Surface_Area</ref> we can see which parts of the protein are reachable by solvents.
+The solvent accessible surface area (SASA) <ref>http://en.wikipedia.org/wiki/Accessible_Surface_Area</ref> indicates, which parts of a protein are reachable by solvents.
 {| class="centered"
@@ Line 284: / Line 285: @@
 |}
-In figure 20 you can see the solvent accessable surface during the time, in figure 21 the atomic SAS and in figure 22 the residue SAS. It is very hard to say which residues are the most accessible to the solvent. But there might be some areas between residue 200 and 250 and 300 to 350, at residue 400 and also 450. Figure 20 shows that the solvation is almost the same all time.
+In Figure 20 you can see the solvent accessible surface in relation to time, in Figure 21 the atomic SAS and in Figure 22 the residue SAS. It is very hard to decide which residues are most accessible to the solvent. The areas between residue 200 and 250 and 300 to 350 and residues 400 and 450 seem to be reachable by solvents the most. Figure 20 shows that the solvation is almost the same at all time.
 ==== Hydrogen bonds ====
-As a next step we look at the hydrogen bonds in the protein and between the protein and the surrounding solvent. Therefore we use the following commands:
+Next, the hydrogen bonds in the protein and between the protein and the surrounding solvent have been investigated. Therefore the following commands have been used:
 <code>
@@ Line 302: / Line 303: @@
 '''Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.'''
-In figure 23 you can see the hydrogen bonds internally, that means which are build between the protein itself, and in figure 24 you can see the hydrogen bonds that are build with the surrounding water. Internally there are less hydrogen bonds build (about 350), less than with the water (about 800), but there are more pairs within 0.35 nm (1700 compared to 1100). The fluctuation is larger with the solvent. Internally there is only little fluctuation in an interval of about 50 to 100.
+In Figure 23 one can see the internal hydrogen bonds, i.e. hydrogen bonds within a single protein, and Figure 24 shows the hydrogen bonds to the surrounding water molecules. Internally, there are less hydrogen bonds built (about 350), than with the surrounding solvent (about 800), but more pairs within 0.35 nm (1700 compared to 1100). The fluctuation is larger for the hydrogen bonds with the solvent. Internally there is only little fluctuation in an interval of about 50 to 100.
 ==== Ramachandran (phi/psi) plots ====
@@ Line 483: / Line 484: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Volume||735.625||0.041||0.555533||0.00591744  (nm^3)
+||Volume||735.625||0.041||0.555533||0.00591744  (nm³)
 |}
@@ Line 500: / Line 501: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Density||1009.66||0.057||0.762479||-0.0081254  (kg/m^3)
+||Density||1009.66||0.057||0.762479||-0.0081254  (kg/m³)
 |}
@@ Line 603: / Line 604: @@
 [[File: glucocerebrosidase_bfactors_rmsf_mut7.png|thumb|400px|center|'''Figure 46b:''' Colored structure according to b-factors]]
-In figure 46a you can see the protein colored according flexible regions. The b-factors are shown in figure 46b. Blue means that the region is not flexible and red means that it is a very flexible region. You can see one loop that is very flexible and parts of the alpha helices and beta sheets that are a bit flexible. The most of the inner part of the protein is colored blue and therefore not flexible. Compared to the wildtype the protein lost some flexibility.
+In figure 46a you can see the protein colored according flexible regions. The b-factors are shown in figure 46b. Blue means that the region is not flexible and red means that it is a very flexible region. You can see one loop that is very flexible and some very small parts of the alpha helices and a beta sheet that are a bit flexible. The most of the inner part of the protein is colored blue and therefore not flexible. Compared to the wildtype the protein lost flexibility.
 ==== Convergence of RMSD ====
@@ Line 868: / Line 869: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Volume||735.529||0.058||0.543357||-0.382496  (nm^3)
+||Volume||735.529||0.058||0.543357||-0.382496  (nm³)
 |}
@@ Line 885: / Line 886: @@
 ||Energy||Average||Err.Est.||RMSD||Tot-Drift
 |-
-||Density||1009.82||0.08||0.745989||0.525138  (kg/m^3)
+||Density||1009.82||0.08||0.745989||0.525138  (kg/m³)
 |}
@@ Line 987: / Line 988: @@
 [[File: glucocerebrosidase_bfactors_rmsf_mut10.png|thumb|400px|center|'''Figure 74b:''' Colored structure according to b-factors]]
-In figure 74 you can see the protein colored according to b-factors and flexible regions. Blue means that the region is not flexible and red means that it is a very flexible region. You can see some loops that are flexible and parts of the alpha helices and beta sheets that are a bit more flexible. The most of the inner part of the protein is colored blue and therefore not flexible. It differs again from the wildtype. Other regions are colored red and therefore the most flexible and also the flexibility of the alpha helices and the beta sheets differs.
+In figure 74 you can see the protein colored according to b-factors and flexible regions. Blue means that the region is not flexible and red means that it is a very flexible region. You can see some loops that are flexible and parts of the alpha helices and beta sheets that are a bit more flexible. The most of the inner part of the protein is colored blue and therefore not flexible. It differs again from the wildtype. Other regions are colored red and are therefore the most flexible and also the flexibility of the alpha helices and the beta sheets differs.
 ==== Convergence of RMSD ====
@@ Line 1,158: / Line 1,159: @@
 '''Flexible regions: Root mean square fluctuations'''
+[[File:Flexible_regions_wildtype_mut7_glucocerebrosidase.png|thumb|center|200px|'''Figure 87:''' Most flexible regions of wildtype (orange) and mutated structre (green) mapped onto 2NT0. Active site residues are shown in blue, residues forming hydrogen bonds with active site in cyan and mutation 7 in red.]]
-The most interesting thing are the root mean square fluctuations. The flexible regions in the mutant are different to the wildtype. Interestingly the highest flexibility is around the mutated site. In the wildtype this is also flexible, but not as strong as in the mutant where the value is almost double of that in the wildtype. This is the first indication, that the mutation really changes something at the protein. In figure 16a and 46a this is even more outstanding. The most flexible loops are different. If the flexibility of the protein changes it might not be able to bind the substrate properly. So the change of the flexibility is a very important fact concerning the effect of the mutation.
+Comparing the root mean square fluctuations of wildtype and mutated structre, one can see that the flexibility of the mutated structure is different than the flexibility of the wildtype. The flexibility of both structures differs the most in two regions: position 125 to position 175 and position 325 to 375.
+The wildtype structure is more flexible than the mutated one in the area of position 125 to position 175. This region is hilighted in orange in Figure 87. None of the active site residues (shown in blue) nor the residues forming hydrogen bonds with the active site are part of this region. The most flexible region of the mutated structure includes in contrast the active site residue Glu340 and the mutation itself. This region ranges from residue 325 to residue 375 and is shown in green. The missing polar interaction formed between S370 and S366 (see polar interactions in structure-based mutation analysis)  might be the reason for the higher flexibility of this region in the mutated structure.
+The flexibility of a protein is an important aspect for its function. A protein may either be flexible or not flexible at all to perform a molecular function. The fact, that the flexibility of the protein is changing through the mutation, might be the cause for the damaging effect of the mutation N370S, especially as the new most flexible region in the mutant includes one of the active site residues.
+A more flexible active site than the wildtype active site could have a major impact on the proteins function: one of the active site residues could be shifted or moved away from the other one, which might inhibit the function of the protein. This might be the major cause of a damaging effect.
 '''Convergence of RMSD'''
@@ Line 1,178: / Line 1,185: @@
 '''Analysis of dynamics and time-averaged properties'''
-This is more important for the simulation. We see here some difference for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.
+This is more important for the simulation. We see some differences for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.
 '''All in all'''
@@ Line 1,191: / Line 1,198: @@
 {| class="centered"
-| [[File: glucocerebrosidase_md_movie_site470_wt.gif|thumb|400px|center|'''Figure 86a:''' Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to Asparagine at position 370]]
+| [[File: glucocerebrosidase_md_movie_site470_wt.gif|thumb|400px|center|'''Figure 88a:''' Molecular Dynamics Simulation of Glucocerebrosidase - a closer look to Asparagine at position 370]]
-| [[File: glucocerebrosidase_md_movie_site470_mut10.gif|thumb|400px|center|'''Figure 86b:''' Molecular Dynamics Simulation of Glucocerebrosidase -  a closer look to the mutated Serine at position 370]]
+| [[File: glucocerebrosidase_md_movie_site470_mut10.gif|thumb|400px|center|'''Figure 88b:''' Molecular Dynamics Simulation of Glucocerebrosidase -  a closer look to the mutated Serine at position 370]]
 |}
@@ Line 1,229: / Line 1,236: @@
 '''Analysis of dynamics and time-averaged properties'''
-This is more important for the simulation. We see here some difference for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.
+This is more important for the simulation. We see some differences for the root mean square deviations, the cluster analysis and also the distance RMSD. This indicates again that the simulation was not the same for wildtype and mutant. But again it does not help to say if the mutation is harmful.
 '''All in all'''
@@ Line 1,237: / Line 1,244: @@
 On the other hand the Ramachandran plot changes and this time even worse than for mutation seven. This means that some phi/psi angles are different. The orientation of some residues changed and that could be responsible for the loss of the protein's function.
-But in this case we know that this mutation should be harmless, because it is only listed in dbSNP. With Molecular Dynamics we wanted to see that the mutation is harmless because for our sequence based prediction we had the opposite result. But also with Molecular Dynamics, or at least with the analysis we did above, we can not classify this mutation as harmless. Of course the flexibility changes not much but the Ramachandran plot is again very different. So this also does not help us.
+But in this case we know that this mutation should be harmless, because it is only listed in dbSNP. With Molecular Dynamics we wanted to see that the mutation is harmless because for our sequence based prediction we had the opposite result. But also with Molecular Dynamics, or at least with the analysis we did above, we can not classify this mutation as harmless. Of course the flexibility changes not much but the Ramachandran plot is again very different. So this also does not help us. Further analysis is necessary.
 == References==

Difference between revisions of "Molecular Dynamics Simulations Analysis of Glucocerebrosidase"

Latest revision as of 02:18, 26 September 2011

Contents

Introduction

Wildtype

A brief check of results

Visualization of results

Quality assurance

Convergence of energy terms

Temperature

Pressure

Energy

Volume

Density

Box

Interaction Energy: Coulomb

Interaction Energy: van der Waals

Minimum distances between periodic images

Root mean square fluctuations

Convergence of RMSD

Convergence of radius of gyration

Structural analysis: properties derived from configurations

Solvent accessible surface area

Hydrogen bonds

Ramachandran (phi/psi) plots

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Cluster analysis

Distance RMSD

Mutation 7

A brief check of results

Visualization of results

Quality assurance

Convergence of energy terms

Temperature

Pressure

Energy

Volume

Density

Box

Interaction Energy: Coulomb

Interaction Energy: van der Waals

Minimum distances between periodic images

Root mean square fluctuations

Convergence of RMSD

Convergence of radius of gyration

Structural analysis: properties derived from configurations

Solvent accessible surface area

Hydrogen bonds

Ramachandran (phi/psi) plots

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Cluster analysis

Distance RMSD

Mutation 10

A brief check of results

Visualization of results

Quality assurance

Convergence of energy terms

Temperature

Pressure

Energy

Volume

Density

Box

Interaction Energy: Coulomb

Interaction Energy: van der Waals

Minimum distances between periodic images

Root mean square fluctuations

Convergence of RMSD

Convergence of radius of gyration

Structural analysis: properties derived from configurations

Solvent accessible surface area

Hydrogen bonds

Ramachandran (phi/psi) plots

Analysis of dynamics and time-averaged properties

Root mean square deviations again

Cluster analysis

Distance RMSD

Discussion