Difference between revisions of "Run MD HEXA"

From Bioinformatikpedia
(cluster analysis)
(cluster analysis)
 
Line 155: Line 155:
 
-clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
 
-clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
   
In this simulation, we only use the MainChain and the H atoms, because this calculation is really time consuming, and only looking at the MainChain and the H atoms is enough to get a closer look to the different structures and to cluster them together.
+
In this simulation, we use only the MainChain and the H atoms, because this calculation is really time consuming. The analysis of the MainChain and the H atoms provides already a lot insightes into the structure differences and allows a good cluster.
 
The different clusters are visualized with PyMol
 
The different clusters are visualized with PyMol
   

Latest revision as of 19:02, 30 September 2011

Run the MD simulation

Here we want to give a manual for how to analyze the MD simulation result as we did it in our section.
[Back to Molecular Dynamics]

check the trajectory

First of all we checked the trajectory, to see if our simulation finished successfully and the file is not corrupted.

gmxcheck -f traj.xtc

[Back to Molecular Dynamics]

Visualization

Next we want to visualize our results:

trjconv -s topol.tpr -f traj.xtc -o protein.pdb -pbc nojump -dt 10 
pymol protein.pdb 


[Back to Molecular Dynamics]

create a movie

Therefore, we load the PDB movie file in PyMol and save each single frame on an own png file. Next, we converted the png files to gif files and than we used [GiftedMotion] to create a motion gif file. Therefore, we load the first 15 - 40 frames. Sometimes we could use 40 frames, but if we want to create the gifs with the detailed mutations, we had to use 15 frames, because otherwise the file is too large.

[Back to Molecular Dynamics]

energy calculations for pressure, temperature, potential and total energy

In the next analysis step we calculated the energy values for pressure, temperature, potential and total energy with following commands:

echo 13 0 | g_energy -f ener.edr -o pressure.xvg 
echo 12 0 | g_energy -f ener.edr -o temperature.xvg 
echo 9 0 | g_energy -f ener.edr -o potential.xvg 
echo 11 0 | g_energy -f ener.edr -o total_energy.xvg 

We visualized the results of the different runs with the xmgrace program:

xmgrace pressure.xvg
xmgrace temperature.xvg
xmgrace potential.xvg
xmgrace total_energy.xvg

[Back to Molecular Dynamics]

minimum distance between periodic boundary cells

Next, we calculated the minimum distance between periodic boundary cells. A low distance means, that the part of the protein which is in this boundary cell have contacts with itself. This should not be the case, because one part of the protein should not have contacts with the completely equal part of the protein. Therefore, a low periodic boundary cell shows that the quality of the model is bad and the simulation may be wrong. To calculate the minimum distance we used following command:

g_mindist -f traj.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi 

We visualized the results with xmgrace:

xmgrace minimal-periodic-distance.xvg

[Back to Molecular Dynamics]

RMSF for protein and C-alpha and PyMol analysis of average and B-factor

In the next step, we analyzed the root mean square fluctuations for the complete protein and also for the C-alpha atoms. With the RMSF you can calculate the differences between two nearly identical structures. In our case, we have many very similar structures. In general we use the same structure but over the simulation time, the structure moves and therefore we got a lot of very similar, but not equal structures during the simulation. We calculate the RMSF between the start structure and the average structure, which is the average of all structures calculated during the simulation. Furthermore, we also calculated the B-factors of the different residues of the structures. Therefore, we can get a good insight in the flexibility of the protein structure. Furthermore, we calculate this for the complete protein and the C-alpha atoms, to get the possibility to see how flexible the backbone and the residues are. Therefore, we used following commands:

echo 1 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res 
echo 3 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue_c.xvg -ox average_c.pdb -oq bfactors_c.pdb -res 

We visualized the rmsf-per-residue file with xmgrace. The PDB files were visualized with PyMol. Furthermore, we aligned the calculated structures with the start structure with PyMol to get a RMSD value. Additionally, we looked at the real flexible parts of the protein to see how the structure changes over time.

[Back to Molecular Dynamics]

Radius of gyration

The radius of gyration is the RMS distance of the protein parts from their center. So therefore, it is possible to get a good insight into the shape of the protein during simulation, because if the radius is higher, this means the distance between the different protein parts and the protein center is higher and therefore the protein has a bigger shape than before. We calculate the radius of gyration with following command:

g_gyrate -f traj.xtc -s topol.tpr -o radius-of-gyration.xvg 

To visualize the result of this calculation we use two different xmgrace commands.

With the following command, we got a plot which shows the change of the radius of gyration over simulation time.

xmgrace radius-of-gyration.xvg

With the next command, we got some more detailed information about the radius of gyration. Therefore, we got the individual components of which the radius of gyration consists. These components correspond to the eigenvalues of the matrix of inertia. Therefore, the first component of the plot correspond to the longest axis of the molecule and vice versa.

Xmgrace -nxy radius-of-gyration.xvg


[Back to Molecular Dynamics]

solvent accessible surface area

Another important point is the solvent accessible surface area of the protein. With following command, we calculated the average solvent accessibility per residue and per atom over time, and also the solvent accessibility of the protein over the simulation time. First of all, we have to create the traj_nojump.xtc file with following command:

trjconv -f traj.xtc -o traj_nojump.xtc -pbc nojump 

Next we calculated the solvent accessible surface area:

g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg 
-oa atomic-sas.xvg -or residue-sas.xvg 

We visualized all of these files with xmgrace.

xmgrace file.xvg
xmgrace -nxy file.xvg

The second command gave us a more detailed output. For the average solvent accessibility per residue and per atom we also got the standard deviation of this calculation which is very useful. For the solvent-accessibility-surface we additionally got the detailed composition of physicochemical residues over the simulation.

[Back to Molecular Dynamics]

hydrogen-bonds between protein and protein / protein and water

Next, we calculated the numbers of hydrogen bonds between the protein itself and also between the protein and the surrounding water.

echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg 
echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg 

Again, we visualized the files with xmgrace.

xmgrace hydrogen-bonds-intra-protein.xvg 
xmgrace -nxy hydrogen-bonds-intra-protein.xvg 

With the first command we got a plot which shows us how many hydrogen bonds are in the protein over the simulation time. The second plot shows us how many possible and real hydrogen bonds could be found in the protein.

[Back to Molecular Dynamics]

Ramachandran plot

Now we calculate and visualize the Ramachandran plot. The plot we calculated contains all angels of all residues.

g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg 
xmgrace ramachandran.xvg

[Back to Molecular Dynamics]

RMSD matrix

To see, if there is a periodic motion during the simulation or if the structure stays rigid during the simulation, we calculated a RMSD matrix. Therefore, both axes contain the simulation time. In the plot you can see how similar the two structures at these two simulation points are to each other. This means, that this plot can display if there are very similar structures at different time points. This corresponds to an all-against-all structure comparison, which can executed with the following command:

g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 
xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue 
gv rmsd-matrix.eps 

[Back to Molecular Dynamics]

cluster analysis

Out next analysis is quiet similar to the RMSD matrix analysis. Here we calculated different clusters of very similar structures. This could be structures, which are very near in time, but also structures which are far away during the simulation time. Therefore, we can get a rough insight of how many different structures occur during the simulation.

echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg 
-o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg 
-clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10 

In this simulation, we use only the MainChain and the H atoms, because this calculation is really time consuming. The analysis of the MainChain and the H atoms provides already a lot insightes into the structure differences and allows a good cluster. The different clusters are visualized with PyMol

pymol clusters.pdb
disable all
split_states clusters
show cartoon

Furthermore, it is possible to align different cluster structures, to see how big the difference between the different clusters is.

align clusters_x, clusters_y


[Back to Molecular Dynamics]

internal RMSD

As last step, we calculate the RMSD values in our protein. Therefore, we calculated for each inter atomic distance the RMSD value. This is a good hint to see how big the protein is. If there are only a lot of small RMSD values, the protein has to be very compact, whereas if all values a big, the protein needs a lot of space. We calculated the internal RMSD with following command:

g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg 
xmgrace distance-rmsd.xvg

Therefore, with this calculation we finished our MD result analysis.

[Back to Molecular Dynamics]