Difference between revisions of "Molecular Dynamics Simulations HEXA"

From Bioinformatikpedia
(RMSF for protein and C-alpha and Pymol analysis of average and bfactor)
(RMSF for protein and C-alpha and Pymol analysis of average and bfactor)
Line 355: Line 355:
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
  +
|original & average (protein)
|Alignment
 
  +
|original & B-Factors (protein)
|RMSD
 
  +
|average & B-Factors (protein)
  +
|
  +
|original & average (c-alpha)
  +
|original & B-Factors (c-alpha)
  +
|average & B-Factors (c-alpha)
 
|-
 
|-
|Wildtype
+
|colspan="7" | Wildtype
 
|-
 
|-
  +
|1.556
  +
|0.349
  +
|1.684
 
|
 
|
  +
|1.373
|
 
  +
|0.279
  +
| -
 
|-
 
|-
  +
|colspan="7" | Mutation 436
|
 
|
 
 
|-
 
|-
  +
|1.525
  +
|0.348
  +
|1.671
 
|
 
|
  +
|1.324
  +
|0.277
  +
|1.334
  +
|-
  +
|colspan="7" | Mutation 485
  +
|-
  +
|1.519
  +
|0.349
  +
|1.727
 
|
 
|
  +
|1.258
  +
|0.283
  +
|1.297
 
|-
 
|-
 
|}
 
|}

Revision as of 14:44, 20 September 2011

Run the MD simulation

Here we want to give a receipt for how to analyse the MD simulation result as we did it in our section.

  • check the trajectory

First of all we checked the trajectory, to see if our simulation finished successfully and the file is not corrupted.

gmxcheck -f traj.xtc


  • Visualistation

Next we want to visualise our results:

trjconv -s topol.tpr -f traj.xtc -o protein.pdb -pbc nojump -dt 10 
pymol protein.pdb 

Still TODO

  • create a movie and skip the g_filter step

Still TODO

  • energy calculations for pressure, temperature, potential and total energy

In the next analysis step we calculated the energy values for pressure, temperature, potential and total energy with following commands:

echo 13 0 | g_energy -f ener.edr -o pressure.xvg 
echo 12 0 | g_energy -f ener.edr -o temperature.xvg 
echo 9 0 | g_energy -f ener.edr -o potential.xvg 
echo 11 0 | g_energy -f ener.edr -o total_energy.xvg 

We visualised the results of the different runs with the xmgrace program:

xmgrace pressure.xvg
xmgrace temperature.xvg
xmgrace potential.xvg
xmgrace total_energy.xvg


  • minimum distance between periodic boundary cells

Next, we calculated the minimum distance between periodic boundary cells. A low distance means, that the part of the protein which is in this boundary cell have contacts with itself. This should not be the case, because one part of the protein should not have contacts with the completely equal part of the protein. Therefore, a low periodic boundary cell shows that the quality of the model is bad and the simulation my be wrong. To calculate the minimum distance we used following command:

g_mindist -f traj.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi 

We visualised the results with xmgrace:

xmgrace minimal-periodic-distance.xvg


  • RMSF for protein and C-alpha and Pymol analysis of average and bfactor

In the next step, we analysed the root mean square fluctuations for the complete protein and also for the C-alpha atoms. With the RMSF you can calculate the differences between two nearly identical structures. In our case, we have a lot of very similar structures. In general we use the same structure but over the simulation time, the structure moves and therefore we got a lot of very similar, but not equal structures during the simulation. We calculate the RMSF between the start structure and the average structure, which is the average of all structures calculated during the simulation. Furthermore, we also calculated the B-factors of the different residues of the structures. Therefore, we can get a good insight in the flexibility of the protein structure. Furthermore, we calculate this for the complete protein and the C-alpha atoms, to get the possibility to see how flexible the backbone and the residues are. Therefore, we used following commands:

echo 1 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res 
echo 3 0 | g_rmsf -f traj.xtc -s topol.tpr -o rmsf-per-residue_c.xvg -ox average_c.pdb -oq bfactors_c.pdb -res 

We visualised the rmsf-per-residue file with xmgrace. The pdb files were visualised with pymol. Furthermore, we aligned the calculated structures with the start structure with pymol to get a RMSD value. Additionally, we looked at the parts of the protein which are really flexible to see how the structure change over time.


  • Radius of gyration

The Radius of gyration is the RMS distance of the protein parts from their centre. So therefore, it is possible to get a good insight into the shape of the protein during simulation, because if the radius is higher, this means the distance between the different protein parts and the protein centre is higher and therefore the protein has a bigger shape than before. We calculate the radius of gyration with following command:

g_gyrate -f traj.xtc -s topol.tpr -o radius-of-gyration.xvg 

To visualise the result of this calculation we use two different xmgrace commands.

With the following command, we got a plot which shows the change of the radius of gyration over simulation time.

xmgrace radius-of-gyration.xvg

With the next command, we got some more detailed information about the radius of gyration. Therefore, we got the individual components of which the radius of gyration consists. These components correspond to the eigenvalues of the matrix of inertia. Therefore, the first component of the plot correspond to the longest axis of the molecule and vice versa.


  • solvent accessible surface area

Another important point is the solvent accessible surface area of the protein. With following command, we calculated the average solvent accessibility per residue and per atom over time, and also the solvent accessibility of the protein over the simulation time.

g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg 

We visualised all of these files with xmgrace.

xmgrace file.xvg
xmgrace -nxy file.xvg

The second command gave us a more detailed output. For the average solvent accessibility per residue and per atom we also got the standard deviation of this calculation which is very useful. For the solvent-accessibility-surface we additionally got the detailed composition of pysicochemical residues over the simulation.


  • hydrogen-bonds between protein and protein / protein and water

Next, we calculated the numbers of hydrogen bonds between the protein itself and also between the protein and the surrounding water.

echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg 
echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg 

Again, we visualised the files with xmgrace.

xmgrace hydrogen-bonds-intra-protein.xvg 
xmgrace -nxy hydrogen-bonds-intra-protein.xvg 

With the first command we got a plot which shows us how many hydrogen bonds are in the protein over the simulation time. The second plot shows us how many possible and real hydrogen bonds could be found in the protein.


  • Ramachandran plot

Now we calculate and visualise the Ramachandran plot. The plot we calculated contains all angels of all residues.

g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg 
xmgrace ramachandran.xvg


  • RMSD matrix

To see, if there is a periodic motion during the simulation of are there very similar structure during the simulation, we calculated a RMSD matrix. Therefore, on both axis are the simulation time. In the plot you can see how similar the two structure at these two simulation points are to each other. So therefore, if there are structures at different time points very similar, you can see this on this plot. So in this case, there is an all-against-all structure comparison.

g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10 
xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue 
gv rmsd-matrix.eps 


  • cluster analysis

Out next analysis is quiet similar to the RMSD matrix analysis. Here we calculated different clusters of very similar structures. This could be structures, which are very near in time, but also structures which are far away during the simulation time. Therefore, we can get a rough insight, how many different structures occur during the simulation.

echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10 

In this simulation, we only use the MainChain and the H atoms, because this calculation is really time consuming, and only looking at the MainChain and the H atoms is enough to get a closer look to the different structures and to cluster them together. The different clusters are visualised with pymol

pymol clusters.pdb
disable all
split_states clusters
show cartoon

Furthermore, it is possible to align different cluster structures, to see how big the difference between the different clusters is.

align clusters_x, clusters_y



  • internal RMSD

As last step, we calculate the RMSD values in our protein. Therefore, we calculated for each interatomic distance the RMSD value. This is a good hint to see how big the protein is. If there are only a lot of small RMSD values, the protein has to be very compact, whereas if all values a big, the protein needs a lot of space. We calculated the internal RMSD with following command:

g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg 
xmgrace distance-rmsd.xvg

Therefore, with this calculation we finished our MD result analysis.

Detailed results

Comparison of the results

In this section, we want to compare the different results of the MD analysis to look if there are differences between the wild type structure and the structures with the mutation. For more information about the single result analysis please look at the point Detailed results.

check the trajectory

 Wildtype  Mutation 436  Mutation 485
Item #frames Timesteps (ps) Item #frames Timesteps (ps) Item #frames Timesteps (ps)
Step 2001 5 Step 2001 5 Step 2001 5
Time 2001 5 Time 2001 5 Time 2001 5
Coords 2001 5 Coords 2001 5 Coords 2001 5

As you can see in the table above, each simulation has the same number of frames on the different items. Therefore, the different results of the different MD simulation runs are comparable. We used the results of these runs for the following comparison of the different results.

Visualize in pymol

create a movie

energy calculations for pressure, temperature, potential and total energy

In this section we compare the pressure, temperature, potential and total energy of the different runs.

Pressure
Wildtype Mutation 436 Mutation 485
Average (bar)
1.00711 1.0066 0.998385
Figure ?: Pressure distribution of the wildtype.
Figure ?: Pressure distribution of the Mutation 436.
Figure ?: Pressure distribution of the Mutation 485.

There are differences between the pressure of the different systems, but these differences are very low and therefore, it should not change the structure a lot. Therefore, we think, that such small differences between the three different structures do not explain why two of them do not function any longer, because of the mutation. If we have a look at Figure ?, Figure ?, and Figure ?, where we can see the distribution of the pressure over the simulation time which are very similar. So therefore, it is not possible to see big differences between the three different simulation results

temperature
Wildtype Mutation 436 Mutation 485
 Average (in K)
297.94 297.94 297.936
 Error Estimation
0.0029 0.0029 0.0045
Temperature distribution of the Wildtype.
Temperature distribution of Mutation 436.
Temperature distribution of Mutation 485.

The temperature of the system is nearly the same, only the temperature of the Mutation 485 is little bit lower. But this difference is that low, so therefore, we can say, the three model have the same temperature. If we have a look at the different plots of the temperatures over the simulation time, all three plots (Figure ?, Figure ? and Figure ?) show nearly the same picture. There are in each plot some outliers to higher or lower degress, but in general almost the complete time the system has a temperature of about 298K.


Potential
Wildtype Mutation 436 Mutation 485
Average (in kJ/mol)
-1.2815e+06 -1.28165e+06 -1.28176e+06
Figure : Potential energy distribution of the Wildtype.
Figure : Potential energy distribution of Mutation 436.
Figure : Potential energy distribution of the Mutation 485.

The average potential of the three different structures is very similar. Althoug there are very small differences between the wildtype structure and the structures with the mutation. The Wildtype has the highest potential energy, whereas Mutation 436 has a potential energy which is a little bit lower. The structure with a mutation at position 485 has the lowest potential. The values we can see on the tabel above, are only average values, therefore, we want to have a more detailed look to the plots of the potential energy distribtuion over time (Figure ?, Figure? and Figure ?). Especially if we compare Figure ? and Figure ?+2, we can see that almost during the complete simulation, the potential is lower than on the wildtype. If we look at Figure ?+1, we can see the same picture, but not that clear than on Figure ?+2. Therefore, both mutations change the potential energy a little bit.


Total Energy
Wildtype Mutation 436 Mutation 485
Average (in kJ/mol)
-1.05177e+06 -1.0519e+06 -1.05203e+06
Figure ?: Total energy distribution of the Wildtype.
Figure ?: Total energy distribution of Mutation 436.
Figure ?: Total energy distribution of Mutation 485.

If we look at the total energy of our different systems, it is clearly to see, that the trend, we already observed by potential energy is more clear to see here. Therefore, the different mutations have an effect on the protein structure and energy. Also in this case, the change is not that much, but nevertheless, there is a change, and also only little changes in the energy of the protein can damage the function. The mutation at position 485 seems has significantly more effect on the energy of the system, than the mutation at position 436, but both mutations decrease the energy of the structure. Therefore, it is possible, that the structure become too rigid and can not bind to their targets as without the mutation.

If we look at the plots (Figure ?, Figure ? and Figure ?) it is easy to see, that the distribution seems to be similar, but the average axis is lower on Figure ? and Figure ? than on the wildtype plot on Figure ?. This is the same result as we saw before on the analysis of the potential energy and it was expected, because the potential energy is a part of the total energy and therefore, if there are differences in the potential energy, there have to be changes in the total energy. In our case, the total energy shows more clearly that there is a energy difference between the wildtype structure and the mutation structures.

minimum distance between periodic boundary cells

Now we want to compare the calculations of the minimum distance between periodic boundary cells. First of all, the distance should not be 0, because than some parts of the protein will interact with itself, which should not occurr in a protein. So therefore, these minimum distance values should not be too low. Second, also small differences between the values of the different system could have big effects on the protein structure, because if some parts of the protein interact with itself, they could not interact with the original partner anylonger and therefore, the shape of the protein could be changed or destroy.

Wildtype Mutation 436 Mutation 485
Figure ?: Plot of the minimum distance between periodic boundary cells of the wildtype.
Figure ?: Plot of the minimum distance between periodic boundary cells of mutation 436.
Figure ?: Plot of the minimum distance between periodic boundary cells of mutation 485.

On the first view, we can see that Figure ?, Figure ? and Figure ? show totally different plots. First of all, it is important to keep in mind, that the MD simulation is a non-deterministic algorithm. Therefore, we can not compare the timeline itself, but we can compare the values and the distribution of the values. So therefore, we can see that on the wildtype plot the values are between 2 and 4. Most of the time the values are about 3.7, and only some values are lower than 3.
If we compare the wildtype to the plot of the structure with the mutation at position 346, we can see that almost all of the minimum distances during the simulation are lower then 3. Therefore, the distance between two interacting parts of the protein is signifiantly lower than on the wildtype. Because there is only one change in the complete sequence of the protein, we suggest, that the part with the mutation causes this changes. Therefore, the mutation lead to significantly different interaction in between the protein and therefore it probably change the shape of the protein.
If we compare the wildtype with the structure with a mutation at position 485, we can see that most of the distance is about 4. So in general, on the first view the two plots seems to be totally different. But if we have a closer look to the plots, these two plots are similar than the wildtype plot compared to the plot of mutation 436. Therefore, in this case the minimum distance seems to increase a little bit, but the difference is not that strong as we could observed by mutation 436. Nevertheless, also only small changes have influence of the function and the shape of the protein. Therefore, the interacting parts seems to be farther away than without the mutation.
Interesstingly, the two mutations have different effects on the interactions in the protein. Mutation 436 decrease the minimum distance between interacting parts of the protein, whereas the Mutation at position 485 increase the minimum distance. Nevertheless, if the minimum distance decrease of increase, in both cases the mutation changes the distances and therefore, we suggest that both mutations has an effect on the protein structure and function.

RMSF for protein and C-alpha and Pymol analysis of average and bfactor

Next we want to check if the mutations change the protein flexibility. Therefore, we calcualte the RMSF for the complete protein and the C-alpha atoms to have the possibility to differ between flexibility at the side chains and flexibility of the back bone. Furthermore, the program calcualted an average protein structure, which consists of all structures which are calculated during the simulation. The program also calculate the B-fact values on basis of the simulated structures. Furthermore, we want to visualise the most interessting results with pymol.

original & average (protein) original & B-Factors (protein) average & B-Factors (protein) original & average (c-alpha) original & B-Factors (c-alpha) average & B-Factors (c-alpha)
 Wildtype
1.556 0.349 1.684 1.373 0.279 -
 Mutation 436
1.525 0.348 1.671 1.324 0.277 1.334
 Mutation 485
1.519 0.349 1.727 1.258 0.283 1.297

Radius of gyration

solvent accesible surface area

hydrogen-bonds between protein and protein / protein and water

Ramachandran plot

RMSD matrix

cluster analysis

internal RMSD

Discussion