Run MD HEXA
- 1 Run the MD simulation
- 1.1 check the trajectory
- 1.2 Visualization
- 1.3 create a movie
- 1.4 energy calculations for pressure, temperature, potential and total energy
- 1.5 minimum distance between periodic boundary cells
- 1.6 RMSF for protein and C-alpha and PyMol analysis of average and B-factor
- 1.7 Radius of gyration
- 1.8 solvent accessible surface area
- 1.9 hydrogen-bonds between protein and protein / protein and water
- 1.10 Ramachandran plot
- 1.11 RMSD matrix
- 1.12 cluster analysis
- 1.13 internal RMSD
Run the MD simulation
This page is a manual for analyzing the results of an MD simulation, following the steps we used in our section.
check the trajectory
First of all, we checked the trajectory to see whether our simulation finished successfully and whether the file is not corrupted.
gmxcheck -f traj.xtc
Visualization
Next we want to visualize our results:
trjconv -s topol.tpr -f traj.xtc -o protein.pdb -pbc nojump -dt 10
pymol protein.pdb
create a movie
To create a movie, we loaded the multi-frame PDB file in PyMol and saved each frame as a separate PNG file. Next, we converted the PNG files to GIF files and then used [GiftedMotion] to combine them into an animated GIF. We loaded only the first 15 to 40 frames. Sometimes we could use 40 frames, but for the GIFs that show the mutations in detail we had to limit ourselves to 15 frames, because otherwise the file became too large.
energy calculations for pressure, temperature, potential and total energy
In the next analysis step we extracted the pressure, temperature, potential energy and total energy from the energy file with the following commands:
echo 13 0 | g_energy -f ener.edr -o pressure.xvg
echo 12 0 | g_energy -f ener.edr -o temperature.xvg
echo 9 0 | g_energy -f ener.edr -o potential.xvg
echo 11 0 | g_energy -f ener.edr -o total_energy.xvg
We visualized the results of the different runs with the xmgrace program:
xmgrace pressure.xvg
xmgrace temperature.xvg
xmgrace potential.xvg
xmgrace total_energy.xvg
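Besides plotting, it can be useful to summarize each .xvg file numerically. The following is a minimal sketch, assuming the usual .xvg layout (comment lines starting with `#`, xmgrace directives starting with `@`, then whitespace-separated numeric columns with the time in the first column). The function names and the example values are hypothetical and not taken from our runs.

```python
import statistics


def read_xvg(lines):
    """Return the numeric rows of an .xvg file as lists of floats."""
    rows = []
    for line in lines:
        line = line.strip()
        # '#' starts a comment, '@' an xmgrace formatting directive
        if not line or line[0] in "#@":
            continue
        rows.append([float(x) for x in line.split()])
    return rows


def summarize(rows, column=1):
    """Mean and (population) standard deviation of one data column."""
    values = [row[column] for row in rows]
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical excerpt; a real file would be read with open("temperature.xvg")
example = """# produced by g_energy
@ title "Temperature"
0.0 300.0
1.0 302.0
2.0 298.0
""".splitlines()

mean, std = summarize(read_xvg(example))
print(f"mean={mean:.2f} std={std:.2f}")
```

For a real file, `read_xvg(open("temperature.xvg"))` would be used in the same way.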
minimum distance between periodic boundary cells
Next, we calculated the minimum distance between periodic images of the protein. A small distance means that a part of the protein comes into contact with its own periodic image. This should not happen, because a part of the protein should not interact with an identical copy of itself. A small minimum periodic distance therefore indicates that the quality of the model is poor and that the simulation may be wrong. To calculate the minimum distance we used the following command:
g_mindist -f traj.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi
We visualized the results with xmgrace:
xmgrace minimal-periodic-distance.xvg
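To make clear what this quantity means, here is a minimal sketch of the calculation for a single frame. It assumes a rectangular box and a molecule that is whole (not split across the box); the function name and the coordinates are hypothetical, and g_mindist itself handles general box shapes.

```python
import itertools
import math


def min_periodic_distance(coords, box):
    """Smallest distance between any atom and any atom of a periodic image.

    coords: list of (x, y, z) positions in nm
    box:    (lx, ly, lz) edge lengths of a rectangular box in nm
    """
    # All 26 neighbouring images; the zero shift is the molecule itself
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3)
              if s != (0, 0, 0)]
    best = math.inf
    for a in coords:
        for b in coords:
            for s in shifts:
                image = tuple(b[k] + s[k] * box[k] for k in range(3))
                best = min(best, math.dist(a, image))
    return best


# Hypothetical two-atom "molecule" in a 4 nm cubic box
coords = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0)]
box = (4.0, 4.0, 4.0)
closest = min_periodic_distance(coords, box)
print(closest)
```

If this value falls below the interaction cutoff used in the simulation, the protein can interact with its own image.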
RMSF for protein and C-alpha and PyMol analysis of average and B-factor
In the next step, we analyzed the root mean square fluctuations (RMSF) for the complete protein and also for the C-alpha atoms only. The RMSF measures how strongly each atom or residue fluctuates around its average position. We always simulate the same protein, but the structure moves over the simulation time, so the trajectory contains many very similar, but not identical, structures. The RMSF summarizes how far these structures deviate from the average structure, which is the average of all structures calculated during the simulation. Furthermore, we wrote out the B-factors of the residues, which are derived from the RMSF values. Together, these give a good insight into the flexibility of the protein structure. We ran the calculation for the complete protein and for the C-alpha atoms to see how flexible the side chains and the backbone are. We used the following commands:
echo 1 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res
echo 3 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue_c.xvg -ox average_c.pdb -oq bfactors_c.pdb -res
We visualized the rmsf-per-residue files with xmgrace and the PDB files with PyMol. Furthermore, we aligned the calculated structures with the start structure in PyMol to obtain an RMSD value. Additionally, we looked at the most flexible parts of the protein to see how the structure changes over time.
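The relation between RMSF and B-factor can be illustrated with a short sketch. It assumes that all frames have already been superimposed on a common reference, and it uses the standard isotropic relation B = (8π²/3)·RMSF². The function names and the toy coordinates are hypothetical.

```python
import math


def rmsf_per_atom(frames):
    """RMSF of every atom around its average position.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom.
    The frames are assumed to be fitted to a common reference already.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def b_factor(rmsf):
    """Isotropic B-factor, in the squared length unit of the RMSF."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf ** 2


# Hypothetical trajectory: atom 0 is rigid, atom 1 moves along x
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)],
]
rmsf = rmsf_per_atom(frames)
bfac = [b_factor(r) for r in rmsf]
print(rmsf, bfac)
```

With coordinates in nm the B-factors come out in nm²; multiplying by 100 converts them to the Å² used in PDB files.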
Radius of gyration
The radius of gyration is the (mass-weighted) RMS distance of the atoms of the protein from their common center of mass. It therefore gives a good insight into the overall shape of the protein during the simulation: a larger radius means that the atoms are on average farther from the center, so the protein is less compact than before. We calculated the radius of gyration with the following command:
g_gyrate -f traj.xtc -s topol.tpr -o radius-of-gyration.xvg
To visualize the result of this calculation we used two different xmgrace commands.
With the following command, we got a plot which shows the change of the radius of gyration over the simulation time.
xmgrace radius-of-gyration.xvg
With the next command, we got more detailed information: in addition to the total radius of gyration, the plot shows its individual components. These components are related to the eigenvalues of the matrix of inertia, so the first component corresponds to the longest axis of the molecule and so on. Strictly, this only holds when the components are computed about the principal axes (option -p of g_gyrate); by default they are taken about the x, y and z axes.
xmgrace -nxy radius-of-gyration.xvg
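The definition can be written down in a few lines. This is a minimal sketch for a single frame, assuming mass weighting and components taken about the Cartesian axes; the function names, masses and coordinates are hypothetical and only serve as illustration.

```python
import math


def centre_of_mass(coords, masses):
    total = sum(masses)
    return [sum(m * c[k] for m, c in zip(masses, coords)) / total
            for k in range(3)]


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from the centre of mass."""
    com = centre_of_mass(coords, masses)
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / sum(masses))


def radius_about_axis(coords, masses, axis):
    """Same quantity, but using only the distance from one Cartesian axis
    through the centre of mass (axis 0 = x, 1 = y, 2 = z)."""
    com = centre_of_mass(coords, masses)
    others = [k for k in range(3) if k != axis]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in others)
             for m, c in zip(masses, coords))
    return math.sqrt(sq / sum(masses))


# Hypothetical two-atom system stretched along x
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [12.0, 12.0]
rg = radius_of_gyration(coords, masses)
rg_x = radius_about_axis(coords, masses, 0)
rg_y = radius_about_axis(coords, masses, 1)
print(rg, rg_x, rg_y)
```

In this toy system the component about the long (x) axis is the smallest, because no atom lies away from that axis.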
solvent accessible surface area
Another important property is the solvent accessible surface area of the protein. We calculated the average solvent accessibility per residue and per atom, and also the solvent accessibility of the whole protein over the simulation time. First of all, we have to create the traj_nojump.xtc file with the following command:
trjconv -f traj.xtc -o traj_nojump.xtc -pbc nojump
Next we calculate the solvent accessible surface area:
g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg
We visualized all of these files with xmgrace.
xmgrace file.xvg
xmgrace -nxy file.xvg
The second command gave us a more detailed output. For the average solvent accessibility per residue and per atom we also got the standard deviation, which is very useful. For the solvent-accessible-surface file we additionally got the area broken down by physicochemical character (hydrophobic and hydrophilic) over the simulation.
hydrogen-bonds between protein and protein / protein and water
Next, we calculated the number of hydrogen bonds within the protein and also between the protein and the surrounding water.
echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg
echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg
Again, we visualized the files with xmgrace.
xmgrace hydrogen-bonds-intra-protein.xvg
xmgrace -nxy hydrogen-bonds-intra-protein.xvg
The first command gives a plot of the number of hydrogen bonds in the protein over the simulation time. The second plot shows both the hydrogen bonds that are actually formed and the possible ones, that is, the donor-acceptor pairs that are close enough to form a bond.
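A hydrogen bond is usually counted with a geometric criterion. The sketch below assumes a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30 degrees, which to our knowledge are the g_hbond defaults; both values are parameters here and should be checked against the version in use. The function names and coordinates are hypothetical.

```python
import math


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1]
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_dist=0.35, max_angle=30.0):
    """Geometric test; the cutoffs are assumed defaults, in nm and degrees."""
    if math.dist(donor, acceptor) > max_dist:
        return False
    donor_to_h = [h - d for h, d in zip(hydrogen, donor)]
    donor_to_acc = [a - d for a, d in zip(acceptor, donor)]
    return angle_deg(donor_to_h, donor_to_acc) <= max_angle


# Hypothetical geometries (nm)
donor = (0.0, 0.0, 0.0)
hydrogen = (0.1, 0.0, 0.0)
linear = is_hydrogen_bond(donor, hydrogen, (0.3, 0.0, 0.0))
bent = is_hydrogen_bond(donor, hydrogen, (0.0, 0.3, 0.0))
far = is_hydrogen_bond(donor, hydrogen, (0.5, 0.0, 0.0))
print(linear, bent, far)
```

Pairs that pass the distance test but fail the angle test are the "possible" bonds that are not counted as formed.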
Ramachandran plot
Now we calculate and visualize the Ramachandran plot. The plot contains the backbone dihedral angles (phi and psi) of all residues.
g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg
xmgrace ramachandran.xvg
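Each point in a Ramachandran plot is a pair of dihedral angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). As a minimal sketch, a dihedral angle can be computed from four positions as follows; the function names and test coordinates are hypothetical.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees around the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = [x / length for x in b1]
    # Components of b0 and b2 perpendicular to the central bond
    v = sub(b0, [dot(b0, b1) * x for x in b1])
    w = sub(b2, [dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Hypothetical positions with a known geometry
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```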
RMSD matrix
To see whether there is a periodic motion or whether very similar structures recur during the simulation, we calculated an RMSD matrix. Both axes of the plot show the simulation time, and each point shows how similar the structures at the two corresponding time points are to each other. This is an all-against-all structure comparison, so structures from different time points that are very similar are easy to spot in the plot.
g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10
xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue
gv rmsd-matrix.eps
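The all-against-all comparison can be sketched as follows. For brevity this sketch omits the least-squares fit and assumes the frames are already superimposed, whereas g_rms normally fits the structures before computing the RMSD. The function names and the toy frames are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two frames with the same atoms (no fitting)."""
    return math.sqrt(
        sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a)
    )


def rmsd_matrix(frames):
    """Symmetric matrix of RMSD values between all pairs of frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Hypothetical trajectory in which frame 2 returns to the shape of frame 0
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
matrix = rmsd_matrix(frames)
for row in matrix:
    print(row)
```

The zero off the diagonal, at position (0, 2), is the kind of signal that indicates a recurring structure in the plot.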
cluster analysis
Our next analysis is quite similar to the RMSD matrix analysis. Here we grouped very similar structures into clusters. A cluster can contain structures that are close in time, but also structures from time points that are far apart. This gives a rough idea of how many different structures occur during the simulation.
echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
For this analysis we only used the MainChain and the H atoms, because the calculation is very time-consuming and these atoms are sufficient to compare the structures and cluster them. The different clusters were visualized with PyMol:
pymol clusters.pdb
disable all
split_states clusters
show cartoon
Furthermore, it is possible to align different cluster structures to see how large the difference between the clusters is:
align clusters_x, clusters_y
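The gromos method used above works on the RMSD matrix. As we understand the algorithm, the structure with the most neighbours within the cutoff becomes the centre of the first cluster, this cluster is removed from the pool, and the procedure is repeated until no structures are left. The following is a minimal sketch under that assumption; the function name, the tie-breaking rule (lowest index wins) and the distance values are hypothetical.

```python
def gromos_clusters(dist, cutoff):
    """Cluster structures from a symmetric distance matrix.

    Returns a list of (centre, members) tuples in the order found.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Neighbours of each remaining structure, itself included
        neighbours = {
            i: [j for j in remaining if dist[i][j] <= cutoff]
            for i in remaining
        }
        # Structure with the most neighbours; ties go to the lowest index
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        remaining -= set(members)
    return clusters


# Hypothetical RMSD matrix (nm): frames 0 to 2 are similar, frame 3 is not
dist = [
    [0.00, 0.05, 0.08, 0.50],
    [0.05, 0.00, 0.09, 0.50],
    [0.08, 0.09, 0.00, 0.50],
    [0.50, 0.50, 0.50, 0.00],
]
clusters = gromos_clusters(dist, cutoff=0.1)
print(clusters)
```

A smaller cutoff gives more and tighter clusters, so the number of clusters always has to be read together with the cutoff.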
internal RMSD
As the last step, we calculated the internal RMSD of our protein, that is, the RMSD of all interatomic distances relative to the reference structure. Because only distances within the protein are compared, no fitting of the structures is needed. Small values mean that the internal geometry of the protein stays close to the reference, whereas large values mean that the atoms have moved considerably relative to each other. We calculated the internal RMSD with the following command:
g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg
xmgrace distance-rmsd.xvg
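The idea behind the internal (distance) RMSD can be shown with a short sketch: all pairwise distances in a frame are compared with the same distances in a reference structure. The function names and coordinates are hypothetical, and the sketch is unweighted and covers a single frame only.

```python
import itertools
import math


def pair_distances(coords):
    """Distances between all unique pairs of atoms."""
    return [math.dist(a, b) for a, b in itertools.combinations(coords, 2)]


def distance_rmsd(coords, reference):
    """RMSD of the interatomic distances relative to a reference."""
    d = pair_distances(coords)
    d_ref = pair_distances(reference)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d, d_ref)) / len(d))


# Hypothetical three-atom chain
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]

changed = distance_rmsd(stretched, reference)
unchanged = distance_rmsd(shifted, reference)
print(changed, unchanged)
```

The shifted copy gives zero because a rigid translation leaves all internal distances unchanged, which is why no fitting is required.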
With this calculation we finished our analysis of the MD results.