Difference between revisions of "Run MD HEXA"

From Bioinformatikpedia

Latest revision as of 19:02, 30 September 2011

Run the MD simulation

Here we give a step-by-step guide to analyzing the MD simulation results, as we did in our section.
[Back to Molecular Dynamics]

check the trajectory

First, we checked the trajectory to confirm that our simulation finished successfully and that the file is not corrupted.

gmxcheck -f traj.xtc

[Back to Molecular Dynamics]

Visualization

Next we want to visualize our results:

trjconv -s topol.tpr -f traj.xtc -o protein.pdb -pbc nojump -dt 10 
pymol protein.pdb 


[Back to Molecular Dynamics]

create a movie

To create a movie, we loaded the multi-frame PDB file in PyMOL and saved each frame as a separate PNG file. Next, we converted the PNG files to GIF files and then used [GiftedMotion] to assemble an animated GIF. We loaded only the first 15 to 40 frames. Sometimes 40 frames were possible, but for the GIFs that show the mutations in detail we had to limit ourselves to 15 frames, because otherwise the file became too large.

[Back to Molecular Dynamics]

energy calculations for pressure, temperature, potential and total energy

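In the next analysis step, we extracted the pressure, temperature, potential energy and total energy with the commands below. The number piped in via echo selects the term from the list that g_energy prints; this numbering depends on the energy file, so it should be checked before the commands are reused.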

echo 13 0 | g_energy -f ener.edr -o pressure.xvg 
echo 12 0 | g_energy -f ener.edr -o temperature.xvg 
echo 9 0 | g_energy -f ener.edr -o potential.xvg 
echo 11 0 | g_energy -f ener.edr -o total_energy.xvg 

We visualized the results of the different runs with the xmgrace program:

xmgrace pressure.xvg
xmgrace temperature.xvg
xmgrace potential.xvg
xmgrace total_energy.xvg
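
Besides looking at the plots, it can help to reduce each .xvg file to a mean and a standard deviation. The following is only a minimal Python sketch under our own assumptions: the helper names read_xvg and summarize are hypothetical, the sample data is made up, and we assume the usual .xvg layout in which lines starting with '#' or '@' are comments and plot directives and the value of interest is in the second column.

```python
import math


def read_xvg(text):
    """Return the numeric rows of .xvg text, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def summarize(rows, column=1):
    """Return (mean, population standard deviation) of one column."""
    values = [row[column] for row in rows]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Made-up sample data for illustration, not output of a real run.
sample = """# comment line
@    title "Temperature"
0.0  299.0
1.0  301.0
2.0  300.0
"""

mean, std = summarize(read_xvg(sample))
print(f"mean = {mean:.2f}, std = {std:.2f}")
```

For a real file, the text would be read with open("temperature.xvg").read() before it is passed to read_xvg.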

[Back to Molecular Dynamics]

minimum distance between periodic boundary cells

Next, we calculated the minimum distance between periodic boundary cells, that is, between the protein and its own periodic images. A low distance means that a part of the protein comes into contact with a periodic copy of itself. This should not happen, because a part of the protein should not interact with an identical copy of the same protein. Therefore, a low minimum periodic distance indicates that the quality of the model is poor and that the simulation may be wrong. To calculate the minimum distance, we used the following command:

g_mindist -f traj.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi 

We visualized the results with xmgrace:

xmgrace minimal-periodic-distance.xvg
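
The check described above can also be automated. The sketch below lists all frames in which the minimum periodic distance drops below a threshold. It rests on assumptions of ours: the function name frames_below is hypothetical, we assume that the second column of the file holds the minimum periodic distance in nm, the sample data is made up, and the threshold of 1.0 nm is only a placeholder that should be chosen according to the cutoffs used in the simulation.

```python
def frames_below(xvg_text, threshold_nm):
    """Return (time, distance) pairs whose distance is below the threshold."""
    hits = []
    for line in xvg_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue  # skip comments and plot directives
        fields = line.split()
        time, distance = float(fields[0]), float(fields[1])
        if distance < threshold_nm:
            hits.append((time, distance))
    return hits


# Made-up sample data: time in ps, minimum periodic distance in nm.
sample = """@ title "Minimum distance to periodic image"
0    2.10
10   1.85
20   0.90
30   1.95
"""

# Placeholder threshold, an assumption for this illustration only.
for time, distance in frames_below(sample, threshold_nm=1.0):
    print(f"t = {time} ps: periodic image only {distance} nm away")
```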

[Back to Molecular Dynamics]

RMSF for protein and C-alpha and PyMol analysis of average and B-factor

In the next step, we analyzed the root mean square fluctuation (RMSF) for the complete protein and also for the C-alpha atoms only. The RMSF measures how far each atom or residue deviates, on average, from its mean position over the trajectory. The simulation always uses the same protein, but the structure moves over the simulation time, so we obtain many very similar, but not identical, structures. In addition to the RMSF per residue, g_rmsf writes the average structure, which is the average of all structures sampled during the simulation (-ox), and a structure that carries the B-factors derived from the fluctuations (-oq). Together these give a good insight into the flexibility of the protein structure. We ran the calculation for the complete protein and for the C-alpha atoms to see how flexible the residues and the backbone are. We used the following commands:

echo 1 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res 
echo 3 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue_c.xvg -ox average_c.pdb -oq bfactors_c.pdb -res 

We visualized the rmsf-per-residue file with xmgrace. The PDB files were visualized with PyMOL. Furthermore, we aligned the calculated structures with the start structure in PyMOL to obtain an RMSD value. Additionally, we looked at the most flexible parts of the protein to see how the structure changes over time.
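
To make the quantity behind these files concrete, here is a small Python sketch of an RMSF calculation. It is not the g_rmsf implementation. It assumes that the frames are already superimposed on a common reference, the function names rmsf and b_factor are our own, and the coordinates are made up. The conversion to a B-factor uses the standard relation B = (8 * pi^2 / 3) * RMSF^2, so the result is in the squared length unit of the input.

```python
import math


def rmsf(frames):
    """Per-atom RMSF; frames is a list of frames, each a list of (x, y, z)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its mean position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


def b_factor(rmsf_value):
    """Isotropic B-factor that corresponds to a given RMSF."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_value ** 2


# Two made-up frames: atom 0 does not move, atom 1 moves along x.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
values = rmsf(frames)
print(values, [b_factor(v) for v in values])
```

A rigid atom gets an RMSF of zero, while a flexible atom gets a larger value and, accordingly, a larger B-factor.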

[Back to Molecular Dynamics]

Radius of gyration

The radius of gyration is the RMS distance of the protein parts from their common center. It therefore gives a good insight into the shape of the protein during the simulation: a higher radius means that the protein parts are farther away from the center, so the protein is more extended than before. We calculated the radius of gyration with the following command:

g_gyrate -f traj.xtc -s topol.tpr -o radius-of-gyration.xvg 

To visualize the result of this calculation we use two different xmgrace commands.

With the following command, we got a plot which shows the change of the radius of gyration over simulation time.

xmgrace radius-of-gyration.xvg

With the next command, we got more detailed information about the radius of gyration, namely the individual components of which it consists. These components correspond to the eigenvalues of the matrix of inertia, so the first component of the plot corresponds to the longest axis of the molecule and vice versa.

xmgrace -nxy radius-of-gyration.xvg
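
The definition used above can be written down in a few lines. The following Python sketch computes a mass-weighted radius of gyration for a single frame. The function name radius_of_gyration and the toy coordinates are our own illustration, and we assume mass weighting here, so this is not a copy of what g_gyrate does internally.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    total_mass = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: the same two atoms, once close together and once far apart.
compact = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
extended = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0, 1.0]

print(radius_of_gyration(compact, masses))
print(radius_of_gyration(extended, masses))
```

The extended arrangement gives the larger value, which is the behavior we use to judge whether the protein expands during the simulation.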


[Back to Molecular Dynamics]

solvent accessible surface area

Another important property is the solvent accessible surface area of the protein. We calculated the average solvent accessibility per residue and per atom, and also the solvent accessible surface of the protein over the simulation time. First of all, we had to create the traj_nojump.xtc file with the following command:

trjconv -f traj.xtc -o traj_nojump.xtc -pbc nojump 

Next we calculated the solvent accessible surface area:

g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg \
  -oa atomic-sas.xvg -or residue-sas.xvg

We visualized all of these files with xmgrace.

xmgrace file.xvg
xmgrace -nxy file.xvg

The second command gave us a more detailed output. For the average solvent accessibility per residue and per atom, we also got the standard deviation, which is very useful. For the solvent-accessible-surface file, we additionally got the detailed composition of the surface by physicochemical residue type over the simulation.
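
The -nxy option matters because such files contain several data sets per time point. The Python sketch below reads a multi-column .xvg text and labels each column with its legend. It is an illustration under assumptions: the function name read_xvg_series is hypothetical, we assume that legends are given as lines of the form @ s0 legend "...", and the legend names and numbers in the sample are made up.

```python
import re

# Matches legend lines such as: @ s0 legend "Total"
LEGEND = re.compile(r'@\s*s(\d+)\s+legend\s+"(.*)"')


def read_xvg_series(text):
    """Return (times, {legend: values}) for a multi-column .xvg text."""
    legends, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        match = LEGEND.match(line)
        if match:
            legends[int(match.group(1))] = match.group(2)
        elif line and line[0] not in "#@":
            rows.append([float(value) for value in line.split()])
    times = [row[0] for row in rows]
    n_sets = len(rows[0]) - 1 if rows else 0
    series = {}
    for index in range(n_sets):
        # Fall back to a generic name when a column has no legend.
        name = legends.get(index, f"s{index}")
        series[name] = [row[index + 1] for row in rows]
    return times, series


# Made-up sample with three data sets per time point.
sample = """@ s0 legend "Hydrophobic"
@ s1 legend "Hydrophilic"
@ s2 legend "Total"
0   10.0  5.0  15.0
10  11.0  5.5  16.5
"""

times, series = read_xvg_series(sample)
for name, values in series.items():
    print(name, values)
```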

[Back to Molecular Dynamics]

hydrogen-bonds between protein and protein / protein and water

Next, we calculated the number of hydrogen bonds within the protein itself and also between the protein and the surrounding water.

echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg 
echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg 

Again, we visualized the files with xmgrace.

xmgrace hydrogen-bonds-intra-protein.xvg 
xmgrace -nxy hydrogen-bonds-intra-protein.xvg 

With the first command, we got a plot that shows how many hydrogen bonds are present in the protein over the simulation time. The second plot shows how many possible and how many actual hydrogen bonds were found in the protein.

[Back to Molecular Dynamics]

Ramachandran plot

Now we calculate and visualize the Ramachandran plot. The plot we calculated contains the phi and psi angles of all residues.

g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg 
xmgrace ramachandran.xvg
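
Each point of a Ramachandran plot is a pair of backbone dihedral angles. As an illustration of what is computed, the following Python sketch calculates a dihedral angle from four points. For phi these would be the atoms C of the previous residue, N, CA and C, and for psi the atoms N, CA, C and N of the next residue. The function name dihedral and the test coordinates are our own, and the sketch is not taken from g_rama.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the axis from p1 to p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the central bond lies on the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```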

[Back to Molecular Dynamics]

RMSD matrix

To see whether there is a periodic motion during the simulation or whether the structure stays rigid, we calculated an RMSD matrix. Both axes show the simulation time, and each point of the plot shows how similar the two structures at the corresponding two time points are to each other. The plot therefore reveals whether very similar structures occur at different time points. This corresponds to an all-against-all structure comparison, which can be executed with the following commands:

g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 
xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue 
gv rmsd-matrix.eps 
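
The idea of the all-against-all comparison can be sketched in a few lines of Python. Note the simplification: this sketch compares the coordinates directly and assumes that all frames are already superimposed, whereas a full analysis would first fit each pair of structures. The function names and the toy frames are our own.

```python
import math


def rmsd(frame_a, frame_b):
    """RMSD between two frames with the same atoms, without any fitting."""
    squared = sum(
        (x - y) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for x, y in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise RMSD values between all frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Toy trajectory: the third frame returns to the first conformation.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
matrix = rmsd_matrix(frames)
for row in matrix:
    print(["%.3f" % value for value in row])
```

A low value far away from the diagonal, as between the first and the third frame here, is what a recurring conformation looks like in the matrix.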

[Back to Molecular Dynamics]

cluster analysis

Our next analysis is quite similar to the RMSD matrix analysis. Here we grouped very similar structures into clusters. A cluster can contain structures that are close in time, but also structures that are far apart in simulation time. This gives a rough idea of how many different structures occur during the simulation.

echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg \
  -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg \
  -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

In this analysis, we use only the MainChain and the H atoms, because the calculation is very time consuming. These atoms already provide a lot of insight into the structural differences and allow a good clustering. The different clusters are visualized with PyMOL:

pymol clusters.pdb
disable all
split_states clusters
show cartoon

Furthermore, it is possible to align different cluster structures to see how large the difference between the clusters is.

align clusters_x, clusters_y
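
To illustrate what the gromos method does with the RMSD matrix and the cutoff, here is a simplified Python sketch of the procedure as we understand it: count the neighbors of every structure within the cutoff, take the structure with the most neighbors together with these neighbors as one cluster, remove them from the pool and repeat. The function name gromos_clusters, the tie-breaking by lowest index and the use of "less than or equal" for the cutoff are our assumptions, and the distance matrix is made up.

```python
def gromos_clusters(dist, cutoff):
    """Cluster structures from a symmetric distance matrix.

    Returns a list of (center, members) tuples in the order they are found.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # Neighbors of i among the remaining structures, including i itself.
            members = [j for j in sorted(remaining) if dist[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


# Made-up RMSD matrix in nm: structures 0 to 2 are similar, 3 is an outlier.
dist = [
    [0.00, 0.05, 0.08, 0.50],
    [0.05, 0.00, 0.05, 0.50],
    [0.08, 0.05, 0.00, 0.50],
    [0.50, 0.50, 0.50, 0.00],
]

for center, members in gromos_clusters(dist, cutoff=0.1):
    print(f"center {center}: members {members}")
```

With a smaller cutoff the choice of the center changes, because then only the middle structure still has both others as neighbors.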


[Back to Molecular Dynamics]

internal RMSD

As the last step, we calculated the internal RMSD values of our protein, that is, the RMSD value for each interatomic distance. This is a good hint to see how big the protein is. If there are only small RMSD values, the protein has to be very compact, whereas if all values are large, the protein needs a lot of space. We calculated the internal RMSD with the following command:

g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg 
xmgrace distance-rmsd.xvg
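
One common way to define an RMSD on interatomic distances is to compare all pairwise distances of a frame with those of a reference structure. The Python sketch below follows this definition. It is our own illustration with hypothetical function names and made-up coordinates, and we do not claim that it reproduces the g_rmsdist output exactly. An advantage of this definition is that no superposition is needed, because distances do not change under rotation or translation.

```python
import math


def pair_distances(coords):
    """All interatomic distances of one structure, in a fixed pair order."""
    n = len(coords)
    return [
        math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def distance_rmsd(frame, reference):
    """RMS deviation between the pair distances of a frame and a reference."""
    current = pair_distances(frame)
    expected = pair_distances(reference)
    squared = sum((a - b) ** 2 for a, b in zip(current, expected))
    return math.sqrt(squared / len(current))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]  # same shape
expanded = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # twice as large

print(distance_rmsd(rotated, reference))
print(distance_rmsd(expanded, reference))
```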

With this calculation we finished our analysis of the MD results.

[Back to Molecular Dynamics]