From Bioinformatikpedia
Revision as of 15:04, 25 September 2011 by Link (talk | contribs) (cluster analysis)

Run the MD simulation

Here we want to give a receipt for how to analyse the MD simulation result as we did it in our section.

[Back to Molecular Dynamics]

check the trajectory

First of all we checked the trajectory, to see if our simulation finished successfully and the file is not corrupted.

gmxcheck -f traj.xtc

[Back to Molecular Dynamics]


Next we want to visualise our results:

trjconv -s topol.tpr -f traj.xtc -o protein.pdb -pbc nojump -dt 10 
pymol protein.pdb 

[Back to Molecular Dynamics]

create a movie

Therefore, we load the pdb movie file in pymol and save each single frame on an own png file. Next, we converted the pgn files to gif files and than we used [GiftedMotion] to create a motion gif file. Therefore, we load the first 15 - 40 frames. Sometimes we could use 40 frames, but if we want to create the gifs with the detailed mutations, we had to use 15 frames, because otherwise the file was too large.

[Back to Molecular Dynamics]

energy calculations for pressure, temperature, potential and total energy

In the next analysis step we calculated the energy values for pressure, temperature, potential and total energy with following commands:

echo 13 0 | g_energy -f ener.edr -o pressure.xvg 
echo 12 0 | g_energy -f ener.edr -o temperature.xvg 
echo 9 0 | g_energy -f ener.edr -o potential.xvg 
echo 11 0 | g_energy -f ener.edr -o total_energy.xvg 

We visualised the results of the different runs with the xmgrace program:

xmgrace pressure.xvg
xmgrace temperature.xvg
xmgrace potential.xvg
xmgrace total_energy.xvg

[Back to Molecular Dynamics]

minimum distance between periodic boundary cells

Next, we calculated the minimum distance between periodic boundary cells. A low distance means, that the part of the protein which is in this boundary cell have contacts with itself. This should not be the case, because one part of the protein should not have contacts with the completely equal part of the protein. Therefore, a low periodic boundary cell shows that the quality of the model is bad and the simulation my be wrong. To calculate the minimum distance we used following command:

g_mindist -f traj.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi 

We visualised the results with xmgrace:

xmgrace minimal-periodic-distance.xvg

[Back to Molecular Dynamics]

RMSF for protein and C-alpha and Pymol analysis of average and bfactor

In the next step, we analysed the root mean square fluctuations for the complete protein and also for the C-alpha atoms. With the RMSF you can calculate the differences between two nearly identical structures. In our case, we have a lot of very similar structures. In general we use the same structure but over the simulation time, the structure moves and therefore we got a lot of very similar, but not equal structures during the simulation. We calculate the RMSF between the start structure and the average structure, which is the average of all structures calculated during the simulation. Furthermore, we also calculated the B-factors of the different residues of the structures. Therefore, we can get a good insight in the flexibility of the protein structure. Furthermore, we calculate this for the complete protein and the C-alpha atoms, to get the possibility to see how flexible the backbone and the residues are. Therefore, we used following commands:

echo 1 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res 
echo 3 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue_c.xvg -ox average_c.pdb -oq bfactors_c.pdb -res 

We visualised the rmsf-per-residue file with xmgrace. The pdb files were visualised with pymol. Furthermore, we aligned the calculated structures with the start structure with pymol to get a RMSD value. Additionally, we looked at the parts of the protein which are really flexible to see how the structure change over time.

[Back to Molecular Dynamics]

Radius of gyration

The Radius of gyration is the RMS distance of the protein parts from their centre. So therefore, it is possible to get a good insight into the shape of the protein during simulation, because if the radius is higher, this means the distance between the different protein parts and the protein centre is higher and therefore the protein has a bigger shape than before. We calculate the radius of gyration with following command:

g_gyrate -f traj.xtc -s topol.tpr -o radius-of-gyration.xvg 

To visualise the result of this calculation we use two different xmgrace commands.

With the following command, we got a plot which shows the change of the radius of gyration over simulation time.

xmgrace radius-of-gyration.xvg

With the next command, we got some more detailed information about the radius of gyration. Therefore, we got the individual components of which the radius of gyration consists. These components correspond to the eigenvalues of the matrix of inertia. Therefore, the first component of the plot correspond to the longest axis of the molecule and vice versa.

[Back to Molecular Dynamics]

solvent accessible surface area

Another important point is the solvent accessible surface area of the protein. With following command, we calculated the average solvent accessibility per residue and per atom over time, and also the solvent accessibility of the protein over the simulation time. First of all, we have to create the traj_nojump.xtc file with following command:

trjconv -f traj.xtc -o traj_nojump.xtc -pbc nojump 

Next we calculate the solvent accessible surface area:

g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg 
-oa atomic-sas.xvg -or residue-sas.xvg 

We visualised all of these files with xmgrace.

xmgrace file.xvg
xmgrace -nxy file.xvg

The second command gave us a more detailed output. For the average solvent accessibility per residue and per atom we also got the standard deviation of this calculation which is very useful. For the solvent-accessibility-surface we additionally got the detailed composition of pysicochemical residues over the simulation.

[Back to Molecular Dynamics]

hydrogen-bonds between protein and protein / protein and water

Next, we calculated the numbers of hydrogen bonds between the protein itself and also between the protein and the surrounding water.

echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg 
echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg 

Again, we visualised the files with xmgrace.

xmgrace hydrogen-bonds-intra-protein.xvg 
xmgrace -nxy hydrogen-bonds-intra-protein.xvg 

With the first command we got a plot which shows us how many hydrogen bonds are in the protein over the simulation time. The second plot shows us how many possible and real hydrogen bonds could be found in the protein.

[Back to Molecular Dynamics]

Ramachandran plot

Now we calculate and visualise the Ramachandran plot. The plot we calculated contains all angels of all residues.

g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg 
xmgrace ramachandran.xvg

[Back to Molecular Dynamics]

RMSD matrix

To see, if there is a periodic motion during the simulation of are there very similar structure during the simulation, we calculated a RMSD matrix. Therefore, on both axis are the simulation time. In the plot you can see how similar the two structure at these two simulation points are to each other. So therefore, if there are structures at different time points very similar, you can see this on this plot. So in this case, there is an all-against-all structure comparison.

g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 
xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue 
gv rmsd-matrix.eps 

[Back to Molecular Dynamics]

cluster analysis

Out next analysis is quiet similar to the RMSD matrix analysis. Here we calculated different clusters of very similar structures. This could be structures, which are very near in time, but also structures which are far away during the simulation time. Therefore, we can get a rough insight, how many different structures occur during the simulation.

echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg 
-o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg 
-clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10 

In this simulation, we only use the MainChain and the H atoms, because this calculation is really time consuming, and only looking at the MainChain and the H atoms is enough to get a closer look to the different structures and to cluster them together. The different clusters are visualised with pymol

pymol clusters.pdb
disable all
split_states clusters
show cartoon

Furthermore, it is possible to align different cluster structures, to see how big the difference between the different clusters is.

align clusters_x, clusters_y

[Back to Molecular Dynamics]

internal RMSD

As last step, we calculate the RMSD values in our protein. Therefore, we calculated for each interatomic distance the RMSD value. This is a good hint to see how big the protein is. If there are only a lot of small RMSD values, the protein has to be very compact, whereas if all values a big, the protein needs a lot of space. We calculated the internal RMSD with following command:

g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg 
xmgrace distance-rmsd.xvg

Therefore, with this calculation we finished our MD result analysis.

[Back to Molecular Dynamics]