Difference between revisions of "MD Mutation436"

From Bioinformatikpedia

Revision as of 19:57, 29 August 2011

check the trajectory

We checked the trajectory with the following command:

gmxcheck -f mut436_md.xtc 

The command gave the following results:

Reading frame       0 time    0.000   
# Atoms  96555
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Furthermore, we got some detailed results about the different items during the simulation.

Item #frames Timestep (ps)
Step 2001 5
Time 2001 5
Lambda 0 -
Coords 2001 5
Velocities 0 -
Forces 0 -
Box 2001 5

The simulation finished on node 0 Fri Aug 26 08:40:07 2011.

Time
Node (s) Real (s) %
34860.474 34860.474 100%
9h41:00

The complete simulation took 9 hours and 41 minutes to finish.

Performance
Mnbf/s GFlops ns/day hour/ns
818.560 60.105 24.785 0.968

As the table above shows, it takes about 1 hour to simulate 1 ns of the system. It would therefore be possible to simulate about 25 ns in one full day of computation time.
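The performance figures can be cross-checked from the frame count and the wall-clock time reported above. The following is a minimal sketch; the function name `performance` is only illustrative, and it assumes that the 2001 frames are evenly spaced at 5 ps starting at time 0.

```python
def performance(n_frames, frame_interval_ps, wall_time_s):
    """Derive simulated time, hour/ns, ns/day and h:m:s from the reported numbers."""
    # 2001 frames at 5 ps spacing span (2001 - 1) * 5 ps = 10000 ps = 10 ns
    simulated_ns = (n_frames - 1) * frame_interval_ps / 1000.0
    hours_per_ns = wall_time_s / 3600.0 / simulated_ns
    ns_per_day = 24.0 / hours_per_ns
    # Split the wall-clock time into hours, minutes and seconds
    hours, rest = divmod(int(wall_time_s), 3600)
    minutes, seconds = divmod(rest, 60)
    return simulated_ns, hours_per_ns, ns_per_day, (hours, minutes, seconds)


simulated_ns, hours_per_ns, ns_per_day, hms = performance(2001, 5.0, 34860.474)
print(f"{simulated_ns:.0f} ns simulated, {hours_per_ns:.3f} hour/ns, "
      f"{ns_per_day:.3f} ns/day, wall time {hms[0]}h{hms[1]:02d}:{hms[2]:02d}")
```

The values obtained this way agree with the hour/ns and ns/day columns of the performance table.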

Visualize in pymol

First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx lets the user choose between different parameters. We decided to visualize the system with the following parameters:

Group 1 Group 2
System Water
Protein Ion
Backbone NA
MainChain+H CL
SideChain

Here is a picture of the visualization with ngmx:

Visualisation of the MD simulation for Mutation 436 with ngmx

Next, we wanted to visualize the protein with pymol. For this we extracted about 1000 frames from the trajectory, leaving out the water and removing jumps across the periodic boundaries to obtain continuous trajectories. We used the following command:

trjconv -s file.tpr -f file.xtc -o output_file.pdb -pbc nojump -dt 10

The program asks for a group to write as output. We wanted to see the whole system, so we chose group 0 (note that this group also contains the solvent).
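The number of frames written with `-dt 10` can be estimated from the trajectory summary above. This is a small sketch, assuming that frames are stored every 5 ps from 0 to 10000 ps and that a frame is kept when its time is a multiple of `dt` counted from the first frame; the helper name `frames_written` is hypothetical.

```python
def frames_written(last_time_ps, frame_interval_ps, dt_ps, first_time_ps=0):
    """Count the frames whose time is a multiple of dt_ps, counted from the first frame."""
    times = range(first_time_ps, last_time_ps + 1, frame_interval_ps)
    return sum(1 for t in times if (t - first_time_ps) % dt_ps == 0)


# Frames every 5 ps up to 10000 ps, keeping one frame every 10 ps
print(frames_written(10000, 5, 10))
```

Under these assumptions every second frame is kept, which gives roughly 1000 frames in the output file.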

create a movie

energy calculations for pressure, temperature, potential and total energy

Temperature

Average (in K) 297.94
Error Estimation 0.0029
RMSD 0.944618
Tot-Drift 0.00834573
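To make the quantities in these tables more concrete, here is a minimal sketch of how an average, an RMSD and a total drift could be computed from an energy time series. It assumes that RMSD means the fluctuation about the mean and that the total drift is the slope of a least-squares line multiplied by the length of the time window; the error estimate is not reproduced here. The function name and the sample values are made up for illustration and are not taken from the simulation.

```python
from math import sqrt


def energy_statistics(times, values):
    """Return (average, RMSD about the mean, total drift from a linear fit)."""
    n = len(values)
    mean_v = sum(values) / n
    rmsd = sqrt(sum((v - mean_v) ** 2 for v in values) / n)
    # Least-squares slope of values against time
    mean_t = sum(times) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = s_tv / s_tt
    # Total drift: change predicted by the fitted line over the whole window
    tot_drift = slope * (times[-1] - times[0])
    return mean_v, rmsd, tot_drift


# Hypothetical temperature samples (K) at 5 ps intervals
times = [0.0, 5.0, 10.0, 15.0, 20.0]
temps = [297.0, 298.5, 297.5, 299.0, 298.0]
print(energy_statistics(times, temps))
```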

The plot with the temperature distribution of the system can be seen here:

Plot of the temperature distribution of the MD system.

As the picture shows, the system has a temperature of about 298 K most of the time. The maximal difference between this average temperature and the minimum/maximum temperature is only about 4 K, which is not that high. We have to keep in mind, however, that a difference of only a few degrees can destroy the function of a protein. 298 K is about 25 °C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37 °C.

Potential

Average (in kJ/mol) -1.28165e+06
Error Estimation 100
RMSD 1080.9
Tot-Drift -714.814

The plot with the potential energy distribution of the system can be seen here:

Plot of the potential energy distribution of the MD system.

Total energy

Average (in kJ/mol) -1.0519e+06
Error Estimation 100
RMSD 1322.68
Tot-Drift -708.38

The plot with the total energy distribution of the system can be seen here:

Plot of the total energy distribution of the MD system.

Pressure

Average (in bar) 1.0066
Error Estimation 0.014
RMSD 71.218
Tot-Drift -0.083422

The plot with the pressure distribution of the system can be seen here:

Plot of the pressure distribution of the MD system.

minimum distance between periodic boundary cells

Next we calculated the minimum distance between periodic boundary cells. As before, the program asks for a group to use in the calculation. We used only the protein (group 1), because the calculation takes a lot of time and the whole system is significantly bigger than the protein alone.
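The idea behind this analysis can be illustrated with a brute-force sketch: for every pair of atoms, measure the distance between one atom and the periodic images of the other, and keep the smallest value. The sketch assumes a rectangular box, which may not match the box type used in the simulation, and the function name is illustrative only.

```python
from math import sqrt


def min_periodic_image_distance(coords, box):
    """Smallest distance between any atom and a periodic image of any atom.

    coords: list of (x, y, z) tuples; box: (lx, ly, lz) of a rectangular box.
    """
    # All 26 neighbouring image shifts (the unshifted cell is excluded)
    shifts = [(i, j, k)
              for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)
              if (i, j, k) != (0, 0, 0)]
    best = float("inf")
    for a in coords:
        for b in coords:
            for s in shifts:
                d = sqrt(sum((a[x] - (b[x] + s[x] * box[x])) ** 2 for x in range(3)))
                best = min(best, d)
    return best


# Two atoms in a 5 x 5 x 5 box: the atom at x=4 has an image at x=-1
print(min_periodic_image_distance([(1.0, 1.0, 1.0), (4.0, 1.0, 1.0)], (5.0, 5.0, 5.0)))
```

The cost grows with the square of the number of atoms, which is consistent with the decision to restrict the calculation to the protein.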

RMSF for protein and C-alpha

Protein

First of all, we calculate the RMSF for the whole protein.

The analysis produces two different pdb files: one file with the average structure of the protein and one file annotated with B-factor values. High B-factor values mark the highly flexible regions of the protein, which are the regions most likely to deviate from the original PDB file.
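As a rough illustration of what these values mean, the sketch below computes per-atom RMSF values from a list of frames and converts an RMSF into a B-factor with the standard relation B = (8π²/3)·RMSF². It assumes the frames are already fitted to a reference structure and that coordinates are in nm; the function names are illustrative and this is not the actual implementation of the analysis tool.

```python
from math import pi, sqrt


def rmsf(frames):
    """Per-atom RMSF; frames is a list of frames, each a list of (x, y, z) per atom."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Average position of atom a over all frames
        mean = [sum(f[a][d] for f in frames) / n_frames for d in range(3)]
        # Mean squared deviation from that average position
        msf = sum(sum((f[a][d] - mean[d]) ** 2 for d in range(3)) for f in frames) / n_frames
        result.append(sqrt(msf))
    return result


def b_factor(rmsf_nm):
    """B-factor in square Angstrom from an RMSF in nm: B = 8*pi^2/3 * RMSF^2."""
    rmsf_angstrom = rmsf_nm * 10.0
    return 8.0 * pi ** 2 / 3.0 * rmsf_angstrom ** 2


# Two atoms, two frames: atom 0 is fixed, atom 1 moves by 0.2 nm along x
frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]]
values = rmsf(frames)
print(values, [b_factor(v) for v in values])
```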

To compare the structures, we aligned them with the original structure in pymol.

original & average original & B-Factors average & B-Factors
Perspective one
Alignment of the original structure (green) and the calculated average structure (turquoise)
Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
Perspective two
Alignment of the original structure (green) and the calculated average structure (turquoise)
Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
RMSD: 1.525 (original & average), 0.348 (original & B-Factors), 1.671 (average & B-Factors)

The structure with the high B-factors is the one most similar to the original structure from the PDB (RMSD 0.348), while the average structure is clearly less similar (RMSD 1.525). We know that regions with high B-factors are very flexible, so in the structure downloaded from the PDB these regions may simply be in a different state. Because of the low RMSD between the high B-factor structure and the original structure, we conclude that the simulation reproduces the structure quite well.
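For reference, the RMSD between two structures with the same atoms is the square root of the mean squared distance between matching atoms. The following sketch assumes the two coordinate sets are already superimposed and matched atom by atom; an alignment tool additionally performs the superposition and may discard outlier atoms, which is not shown here.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    squared = sum((a[d] - b[d]) ** 2
                  for a, b in zip(coords_a, coords_b) for d in range(3))
    return sqrt(squared / len(coords_a))


# Second structure is the first one shifted by 1 along z
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```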

Furthermore, we obtained a plot of the RMSF values of the protein:

Plot of the RMSF values over the whole protein.

There are two regions with very high B-factor values: one around position 150 and the other around position 440. Comparing the picture of the original and the average structure, most regions align very well, whereas some regions differ in position. We therefore want to check whether these are the regions with very high B-factor values.

Part of the alignment between the original structure and the average structure between residue 140 and 160.
Part of the alignment between the original structure and the average structure between residue 430 and 450.

Furthermore, we visualized the B-factors with the B-factor coloring in pymol. We calculated the B-factors for the blue protein. Red marks a very flexible part of the protein; the brighter the color, the higher the flexibility of the residue.

Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color
Part of the alignment between the original structure and the average structure between residue 430 and 450. High B-Factor value -> bright color

In the second picture the color is dark blue. A peak lower than 0.3 therefore does not indicate high flexibility. Our protein thus has only one very flexible region, around residue 140.

As the pictures above show, especially the first one, which covers the highest peak in the plot, the structures differ strongly in position there and the alignment in this part of the protein is very poor, although the rest of the alignment is quite good. The different positions of the flexible parts of the protein also explain the relatively high RMSD values.

C-alpha

Now we repeat the analysis for the C-alpha atoms of the protein, following the same steps as in the section above.

To compare the structures, we aligned them with the original structure in pymol.

original & average original & B-Factors average & B-Factors
Perspective one
Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
Perspective two
Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
RMSD: 1.324 (original & average), 0.277 (original & B-Factors), 1.334 (average & B-Factors)

As in the section above, the structure with high B-factor values is the most similar to the original structure. This was expected, because we used the same model twice, but in this case we neglected the side chains of the residues. The backbone of the protein remains the same.

Furthermore, we obtained a plot of the RMSF values of the C-alpha atoms:

Distribution of the b-factor values by only regarding the backbone of the protein.

In this case there is only one high peak, at position 150. Looking at the whole protein, we could see that the position of the beta sheet differs strongly between the two models. The other peak at position 440 does not appear in this plot. The picture above shows that the backbone does not differ much between the two models in that region. The difference seen for the whole protein must therefore come from the side-chain positions, which does not matter here because we do not consider side chains.

Pymol analysis of average and bfactor

Already done -> maybe no separate section needed for this?

Radius of gyration

Next, we analysed the radius of gyration. We used g_gyrate with only the protein for the calculation.

Plot with the distribution of the radius of gyration over time

The plot above shows the radius of gyration over the simulation time. The radius of gyration is the root-mean-square distance of the parts of an object from its center or gyration axis. The plot shows an average radius of about 2.4 nm, but with large fluctuations, which means that the protein is flexible. The distance between the parts of the protein and its center varies, so the protein seems to pulsate: the curve is roughly periodic and shows the space the protein occupies shrinking and growing.
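The definition above can be written out as a short sketch: the radius of gyration is the mass-weighted root-mean-square distance of the atoms from the center of mass. The function name and the toy coordinates are illustrative only.

```python
from math import sqrt


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    total_mass = sum(masses)
    # Center of mass of the structure
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / total_mass for d in range(3)]
    # Mass-weighted mean squared distance from the center of mass
    weighted = sum(m * sum((c[d] - com[d]) ** 2 for d in range(3))
                   for m, c in zip(masses, coords))
    return sqrt(weighted / total_mass)


# Two equal masses 2 units apart: each is 1 unit away from the center of mass
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [12.0, 12.0]))
```

Evaluating this for every frame gives a curve like the one in the plot, where a larger value corresponds to a more extended protein.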

solvent accessible surface area

Next, we analysed the solvent accessible surface area of the protein, which is the area of the protein that is in contact with the surrounding environment, mainly water.

First of all, we look at the solvent accessibility of each residue, which can be seen in the following plot.

Solvent accessibility of each residue in the protein

Furthermore, it is also possible to look at the solvent accessibility of each atom of the complete protein, which can be seen in the next picture.

Solvent accessibility of each atom of the complete protein.

The last plot shows how much of the protein's area is accessible to the solvent over the complete simulation. As we saw before with the radius of gyration, the values vary during the simulation, which shows that the protein is flexible.

Area of the protein which is accessible to the solvent during the simulation.

The pictures show large differences in solvent accessibility across the protein's surface, which was expected. A closer look at the plots of the atomic and the residue accessibility shows that both agree. During the simulation the accessible area varies between 131 and 146 nm², which is a big difference. The protein therefore has to be quite flexible; otherwise these different values could not be explained.

hydrogen-bonds

In this case, we distinguish between hydrogen bonds within the protein itself and hydrogen bonds between the protein and the water.

As before, the plots show that the protein is flexible, because the number of bonds varies strongly over time.

The following plot shows the hydrogen bonds within the protein. Their number varies between 300 and 355; most of the time, the protein has between 320 and 330 hydrogen bonds.

Number of hydrogen-bonds in the protein over simulation time

Looking at the bonds between the protein and the water, there are many more of them than within the protein. Their number varies between 800 and 900, which is roughly 2.5 to 3 times as many as within the protein. Over the simulation time, the number of bonds between water and protein grows on average, but most of the time the protein has between 840 and 860 bonds with the water.

Number of hydrogen-bonds between the protein and the surrounding water.

This is no surprise, because every residue on the surface is in contact with water, whereas inside the protein many amino acids have no bonding partners because the other amino acids are too far away.
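Hydrogen bonds in a trajectory are usually counted with a geometric criterion. The sketch below assumes a donor-acceptor distance cutoff of 0.35 nm and a hydrogen-donor-acceptor angle cutoff of 30 degrees; these cutoffs are assumptions for illustration and may differ from the settings used for the plots above. The function name is hypothetical.

```python
from math import acos, degrees, sqrt


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Geometric test: donor-acceptor distance (nm) and hydrogen-donor-acceptor angle (deg)."""
    da = [acceptor[i] - donor[i] for i in range(3)]   # donor -> acceptor
    dh = [hydrogen[i] - donor[i] for i in range(3)]   # donor -> hydrogen
    dist_da = sqrt(sum(c * c for c in da))
    dist_dh = sqrt(sum(c * c for c in dh))
    if dist_da > max_dist:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist_da * dist_dh)
    cos_angle = max(-1.0, min(1.0, cos_angle))        # guard against rounding errors
    return degrees(acos(cos_angle)) <= max_angle


donor, hydrogen = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (0.3, 0.0, 0.0)))   # linear and close
print(is_hydrogen_bond(donor, hydrogen, (0.0, 0.3, 0.0)))   # close, but angle is 90 degrees
print(is_hydrogen_bond(donor, hydrogen, (0.5, 0.0, 0.0)))   # linear, but too far away
```

Counting the donor/acceptor pairs that pass this test in each frame gives a time series like the ones plotted above.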

salt bridges

Ramachandran plot

RMSD matrix

cluster analysis

internal RMSD