MD WildType

Revision as of 16:10, 19 September 2011 by Link (talk | contribs) (hydrogen-bonds)

check the trajectory

We checked the trajectory with the following command:

gmxcheck -f 2GJX_A_md.xtc 

This command produced the following results:

Reading frame       0 time    0.000   
# Atoms  96543
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Furthermore, we got some detailed results about the different items during the simulation.

Item #frames Timestep (ps)
Step 2001 5
Time 2001 5
Lambda 0 -
Coords 2001 5
Velocities 0 -
Forces 0 -
Box 2001 5

The simulation finished on node 0 Thu Sep 15 23:45:08 2011

Time
Node (s) Real (s) %
22438.875 22438.875 100%
6h13:58

The complete simulation took 6 hours and 13 minutes (22438.875 s) to finish.

Performance
Mnbf/s GFlops ns/day hour/ns
1271.745 93.383 38.505 0.623

As you can see in the table above, it takes about 0.62 hours (roughly 37 minutes) to simulate 1 ns of the system. It would therefore be possible to simulate about 38.5 ns in one complete day of calculation time.
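The performance figures can be re-derived from the wall-clock time and the simulated time. The following is a minimal sketch; the function names are hypothetical, and the simulated length of 10 ns is taken from the last frame time (10000 ps) reported by gmxcheck.

```python
def format_wall_time(seconds):
    """Split wall-clock seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return int(hours), int(minutes), secs


def performance(simulated_ns, wall_seconds):
    """Return (ns per day, hours per ns) for a finished run."""
    wall_hours = wall_seconds / 3600.0
    ns_per_day = simulated_ns / wall_hours * 24.0
    hours_per_ns = wall_hours / simulated_ns
    return ns_per_day, hours_per_ns


wall = 22438.875      # seconds, from the timing table
simulated = 10.0      # ns, i.e. 10000 ps of trajectory

h, m, s = format_wall_time(wall)
ns_per_day, hours_per_ns = performance(simulated, wall)
print(f"wall time: {h}h {m}min {s:.0f}s")
print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

The two values agree with the ns/day and hour/ns columns of the performance table up to rounding.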

Visualize in pymol

First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx lets the user choose which groups to display, and we decided to visualize the system with the following parameters:

Group 1 Group 2
System Water
Protein Ion
Backbone NA
MainChain+H CL
SideChain

Figure 1 shows the visualization with ngmx:

Figure 1: Visualisation of the MD simulation for the wildtype with ngmx

Create a movie

Next, we want to visualize the protein with pymol. To do so, we extracted 1000 frames from the trajectory, leaving out the water and removing jumps across the periodic boundaries to obtain continuous trajectories. We used the following command:

trjconv -s file.tpr -f file.xtc -o output_file.pdb -pbc nojump -dt 10

The program asks for a group to write as output. We only want to see the protein, so we chose group 1.

Todo: create the movie and produce the filtered version


energy calculations for pressure, temperature, potential and total energy

Temperature

Average (in K) 297.94
Error Estimation 0.0029
RMSD 0.942857
Tot-Drift 0.0066475

The plot with the temperature distribution of the system can be seen here:

Figure 2: Plot of the temperature distribution of the MD system.

As you can see in Figure 2, the system stays at a temperature of about 298 K for most of the simulation. The maximal difference between this average temperature and the minimum/maximum temperature is only about 4 K, which is not that high. Nevertheless, we have to keep in mind that a difference of only a few degrees can already impair the function of a protein. 298 K corresponds to about 25 °C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37 °C.
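The average, RMSD and drift values in the tables of this section can be illustrated with a short calculation on a time series. This is only a sketch under stated assumptions: the function name is hypothetical, the RMSD is taken as the root-mean-square fluctuation around the mean, and the total drift is taken as the slope of a least-squares line multiplied by the covered time span. The exact definitions used by the GROMACS energy tool may differ in detail, and its error estimate is not reproduced here.

```python
import math


def series_statistics(times, values):
    """Return (average, RMS fluctuation, total drift) of a time series.

    The drift is the slope of a least-squares line through the data,
    multiplied by the covered time span (an assumption, see the text).
    """
    n = len(values)
    mean_v = sum(values) / n
    mean_t = sum(times) / n

    # Root-mean-square fluctuation around the average.
    rms = math.sqrt(sum((v - mean_v) ** 2 for v in values) / n)

    # Least-squares slope of value against time.
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = s_tv / s_tt

    drift = slope * (times[-1] - times[0])
    return mean_v, rms, drift


# Made-up example data: temperature in K sampled every 5 ps.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
temps = [297.0, 298.5, 297.8, 298.9, 297.6]
average, rms, drift = series_statistics(times, temps)
print(f"average {average:.2f} K, RMSD {rms:.3f} K, drift {drift:.3f} K")
```

A small drift compared to the RMSD, as in the temperature table, indicates that the quantity fluctuates around a stable mean instead of steadily rising or falling.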

Potential

Average (in kJ/mol) -1.2815e+06
Error Estimation 100
RMSD 1078.55
Tot-Drift -661.902

The plot with the potential energy distribution of the system can be seen here:

Figure 3: Plot of the potential energy distribution of the MD system.

As can be seen in Figure 3, the potential energy of the system stays between -1.285e+06 and -1.278e+06 kJ/mol, which is a relatively low energy. We take this as a sign that the protein is stable and able to function, so the simulation appears plausible. If the energy of the simulated system were too high, we could not trust the results, because the protein would be too unstable to work.

Total energy

Average (in kJ/mol) -1.05177e+06
Error Estimation 100
RMSD 1321.31
Tot-Drift -656.777

The plot with the total energy distribution of the system can be seen here:

Figure 4: Plot of the total energy distribution of the MD system.

As we can see in Figure 4, the total energy of the system is a little higher than its potential energy. In this case, the energy is between -1.055e+06 and -1.048e+06 kJ/mol. These values are still in a range where we can assume that the energy is low enough for the protein to work.

Pressure

Average (in bar) 1.00711
Error Estimation 0.0087
RMSD 71.2473
Tot-Drift -0.0454746

The plot with the pressure distribution of the system can be seen here:

Figure 5: Plot of the pressure distribution of the MD system.

As you can see in Figure 5, the pressure in the system fluctuates around a value close to 0 bar most of the time (the average is 1.007 bar), but there are large outliers of about 250 bar and about -200 bar. We are therefore not sure whether a protein can work under such pressure.

minimum distance between periodic boundary cells

Next we calculate the minimum distance between periodic boundary cells. As before, the program asks for one group to use for the calculation. We decided to use only the protein (group 1), because the calculation takes a long time and the whole system is significantly bigger than the protein alone.

Here you can see the result of this analysis.

Figure 6: Plot of the minimum distance between periodic boundary cells.

As you can see in Figure 6, the distance varies strongly between the time steps. The largest distance is up to 4 nm, whereas the smallest distance is only about 1 nm. This shows that the protein is very flexible over time.
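What is measured here can be sketched for a single frame as follows. This is a brute-force illustration under assumptions not stated in the text: the function name is hypothetical and the box is assumed to be rectangular, so that the 26 neighbouring periodic images are obtained by shifting along the box edges. The simulation box used here may have a different shape.

```python
import itertools
import math


def min_periodic_image_distance(coords, box):
    """Smallest distance between any atom and any atom of a periodic image.

    coords: list of (x, y, z) tuples in nm for the selected group.
    box:    (lx, ly, lz) edge lengths in nm of a rectangular box (assumption).
    """
    # All 26 neighbouring images; the unshifted copy (0, 0, 0) is excluded.
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3)
              if s != (0, 0, 0)]
    best = float("inf")
    for sx, sy, sz in shifts:
        dx, dy, dz = sx * box[0], sy * box[1], sz * box[2]
        for x1, y1, z1 in coords:
            for x2, y2, z2 in coords:
                d = math.sqrt((x2 + dx - x1) ** 2 +
                              (y2 + dy - y1) ** 2 +
                              (z2 + dz - z1) ** 2)
                best = min(best, d)
    return best


# Two atoms in a 3 x 4 x 5 nm box (made-up coordinates).
print(min_periodic_image_distance([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                  (3.0, 4.0, 5.0)))
```

The cost grows with the square of the number of atoms, which is one reason to restrict the calculation to the protein instead of the whole system.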

TODO

RMSF for protein and C-alpha

Protein

First of all, we calculate the RMSF for the whole protein.

The analysis produces two different PDB files: one file with the average structure of the protein and one file with B-factor values, in which high values mark the highly flexible regions that are not in accordance with the original PDB file.
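The relation between the fluctuation of an atom and its B-factor can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming that all frames have already been superimposed on a reference and that the B-factor follows the standard isotropic relation B = (8π²/3)·RMSF², with the RMSF converted from nm to Angstrom.

```python
import math


def rmsf_per_atom(frames):
    """Root-mean-square fluctuation of every atom around its mean position.

    frames: list of frames, each a list of (x, y, z) tuples in nm.
    The frames are assumed to be superimposed already.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in frames) / n_frames
                for k in range(3)]
        msf = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in frames) / n_frames
        result.append(math.sqrt(msf))
    return result


def bfactor_from_rmsf(rmsf_nm):
    """Isotropic B-factor in Angstrom^2 from an RMSF given in nm."""
    rmsf_angstrom = rmsf_nm * 10.0
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2


# One atom that moves by 0.2 nm along x between two frames (made-up data).
frames = [[(0.0, 0.0, 0.0)], [(0.2, 0.0, 0.0)]]
for value in rmsf_per_atom(frames):
    print(f"RMSF {value:.3f} nm, B-factor {bfactor_from_rmsf(value):.2f} A^2")
```

Because the B-factor grows with the square of the RMSF, flexible regions stand out much more clearly in the B-factor column than in the fluctuation itself.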

To compare the structures, we aligned them with the original structure in pymol.

original & average original & B-Factors average & B-Factors
Perspective one
Figure 7: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 8: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 9: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
Perspective two
Figure 10: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 11: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 12: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
RMSD: 1.556 (original & average), 0.349 (original & B-Factors), 1.684 (average & B-Factors)

The structure with the high B-factors is the most similar to the original structure from the PDB (Figure 8 and Figure 11, RMSD 0.349). The average structure is less similar to the original (Figure 7 and Figure 10, RMSD 1.556). However, we know that the regions with high B-factors are very flexible, so the structure downloaded from the PDB may show the protein in another state because of these flexible regions. Given the low RMSD between the high B-factor structure and the original structure, we conclude that the simulation predicts the structure quite well.

Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 13:

Figure 13: Plot of the RMSF values over the whole protein.

There are a lot of regions with high B-factor values. The highest B-factor value can be found at position 150 (Figure 15), but there are also high values at positions 110 (Figure 14), 290 (Figure 16), 320 (Figure 17), 410 (Figure 18), 460 (Figure 19) and 490 (Figure 20). If we compare the pictures of the original and the average structure, we can see that most regions align very well, whereas some regions vary in their position. We therefore want to check whether these are the regions with very high B-factor values.

Figure 14: Part of the alignment between the original structure and the average structure between residue 110 and 120.
Figure 15: Part of the alignment between the original structure and the average structure between residue 140 and 160.
Figure 16: Part of the alignment between the original structure and the average structure between residue 280 and 300.
Figure 17: Part of the alignment between the original structure and the average structure between residue 310 and 330.
Figure 18: Part of the alignment between the original structure and the average structure between residue 400 and 420.
Figure 19: Part of the alignment between the original structure and the average structure between residue 450 and 470.
Figure 20: Part of the alignment between the original structure and the average structure between residue 480 and 500.

As we can see in the pictures above (Figure 14 - Figure 20), there is always a small difference between the two structures. Therefore, these regions seem to be flexible.

Furthermore, we visualized the B-factors with the pymol B-factor coloring method. We calculated the B-factors for the blue protein (Figure 21 - Figure 27). Red marks a very flexible part of the protein. The brighter the color, the higher the flexibility of the residue.

Figure 21: Part of the alignment between the original structure and the average structure between residue 110 and 120. High B-Factor value -> bright color.
Figure 22: Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color.
Figure 23: Part of the alignment between the original structure and the average structure between residue 280 and 300. High B-Factor value -> bright color.
Figure 24: Part of the alignment between the original structure and the average structure between residue 310 and 330. High B-Factor value -> bright color.
Figure 25: Part of the alignment between the original structure and the average structure between residue 400 and 420. High B-Factor value -> bright color.
Figure 26: Part of the alignment between the original structure and the average structure between residue 450 and 470. High B-Factor value -> bright color.
Figure 27: Part of the alignment between the original structure and the average structure between residue 480 and 500. High B-Factor value -> bright color.

In Figure 21 and Figure 22, the protein is colored turquoise, which indicates a relatively high B-factor value. Figure 23 is also colored turquoise. The pictures which show the center of the protein (Figure 24 - Figure 26), however, show only a dark blue protein, which means a low B-factor value. This contradicts the peaks in the plot, so either the plot or the pictures must be wrong. The last picture (Figure 27) shows a high B-factor value in the chain.

C-alpha

Now we repeat the analysis for the C-alpha atoms of the protein, following the same steps as in the section above.

To compare the structures, we aligned them with the original structure in pymol.

original & average original & B-Factors average & B-Factors
Perspective one
Figure 28: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 29: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 30: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
Perspective two
Figure 31: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 32: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 33: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
RMSD: 1.373 (original & average), 0.279 (original & B-Factors), - (average & B-Factors)

As in the section above, the structure with high B-factor values is the most similar to the original structure (Figure 29 and Figure 32). This was expected, because we used the same model twice, only this time we neglected the side chains of the residues, while the backbone of the protein remains the same. The other two comparisons (Figure 28, Figure 30, Figure 31 and Figure 33) appear to be nearly equivalent, although no RMSD value is listed for the alignment of the average structure and the structure with high B-factor values.

Furthermore, we got a plot of the RMSF values of the protein, which can be seen on Figure 24:

Figure 24: Distribution of the b-factor values by only regarding the backbone of the protein.

In this case, there are only three high peaks, at positions 150, 280 and 310. These peaks can also be found in the plot for the whole protein (Figure 13). Furthermore, it was possible to observe these high B-factor values in the pictures.

Radius of gyration

Next, we want to analyse the radius of gyration. We use g_gyrate and select only the protein for the calculation.

Figure : Plot with the distribution of the radius of gyration over time
Figure : Plot with the distribution of the radius of gyration over time

Figure ? shows the radius of gyration over the simulation time. The radius of gyration is the root-mean-square distance of the parts of the protein from its center or from a gyration axis. The plot shows that the average radius is about 2.4 nm, but there are noticeable fluctuations, which means that the protein is flexible. The distance between the different parts of the protein and the center varies, so the protein seems to pulsate: the curve is roughly periodic and shows how the space the protein occupies shrinks and grows. Over time, the radius increases, from about 2.39 nm at the beginning to about 2.43 nm at the end of the simulation.

If we also look at the radii for the individual axes, we can see that the x values change the least during the whole simulation. The z values fluctuate more than the x values, but both are in a similar range. The y values, however, are significantly lower, so the positions contributing to the y values are spread less widely. Especially at the end of the simulation, the Rg values for y are very low. Therefore, most of the radius of gyration comes from the x and z components and less from the y component.
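The quantity discussed above can be sketched for a single frame as follows. The function name is hypothetical. The sketch assumes a mass-weighted definition, and it assumes that the per-axis values are radii of gyration about each axis, so that the value for x is built from the y and z distances to the center of mass. Whether this matches the exact convention of the plotted curves should be checked against the g_gyrate documentation.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration and the radii about the x, y, z axes.

    coords: list of (x, y, z) tuples in nm, masses: list of atomic masses.
    Returns (rg, (rg_x, rg_y, rg_z)).
    """
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    # Squared distance of every atom from the center of mass, per coordinate.
    sq = [[(c[k] - com[k]) ** 2 for k in range(3)] for c in coords]

    rg = math.sqrt(sum(m * (s[0] + s[1] + s[2])
                       for m, s in zip(masses, sq)) / total)
    # The radius about an axis uses the two coordinates perpendicular to it.
    rg_x = math.sqrt(sum(m * (s[1] + s[2]) for m, s in zip(masses, sq)) / total)
    rg_y = math.sqrt(sum(m * (s[0] + s[2]) for m, s in zip(masses, sq)) / total)
    rg_z = math.sqrt(sum(m * (s[0] + s[1]) for m, s in zip(masses, sq)) / total)
    return rg, (rg_x, rg_y, rg_z)


# Two equal masses lying on the x axis (made-up coordinates).
rg, per_axis = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                  [12.0, 12.0])
print(rg, per_axis)
```

Under this convention, a low value for one axis means that the atoms lie close to that axis, so the per-axis curves describe the shape of the protein as well as its size.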

Convergence of RMSD

Next we want to analyse how the RMSD changes over the simulation.

Figure : Plot with the difference between the original structure and the simulated structure measured in Angstrom

Figure ? shows how much the protein structure during the simulation differs from the structure at the beginning of the simulation. At the beginning, the difference between the original and the simulated structure is very low. This was expected, because each simulation step changes the structure of the protein only slightly, so the two structures have to be very similar at the start. Over time, the difference between the original and the simulated structure grows. At the end, the difference is about 0.3 Angstrom, which is relatively low, although it shows that the structure does not stay exactly the same. The protein is therefore flexible, with a deviation of 0.3 Angstrom between the start structure and the simulated structure.

Figure : Plot with the difference between the original structure and the simulated structure (only the backbone) measured in Angstrom

In Figure ? you can see the same plot as in Figure ?, with the difference that only the backbone of the simulated protein is considered. Here the RMSD is lower than in the plot above, but the trend is the same. The RMSD is higher for the whole protein because the side chains also change their positions, but most of the change between the two structures comes from the backbone and not from the side chains.
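The comparison of the whole protein with the backbone alone can be illustrated with a small RMSD function that accepts an optional atom selection. This is a sketch with a hypothetical function name. It assumes that both structures are already superimposed and it omits mass weighting, whereas analysis tools normally perform a least-squares fit before measuring the deviation.

```python
import math


def rmsd(reference, current, indices=None):
    """RMSD between two structures over the selected atoms.

    reference, current: lists of (x, y, z) tuples in the same atom order.
    indices: atoms to include, for example the backbone atoms; all if None.
    Both structures are assumed to be superimposed already.
    """
    if indices is None:
        indices = range(len(reference))
    indices = list(indices)
    squared = sum(
        sum((current[i][k] - reference[i][k]) ** 2 for k in range(3))
        for i in indices
    )
    return math.sqrt(squared / len(indices))


# Made-up example: atom 0 stands for a backbone atom, atom 1 for a side chain.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.4)]
print("all atoms:", rmsd(start, later))
print("backbone only:", rmsd(start, later, indices=[0]))
```

In this made-up example the side chain atom moves further than the backbone atom, so the value for all atoms is larger than the value for the backbone selection.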

Next we calculated an average structure of the simulation and again calculated the convergence of the RMSD. This makes it possible to show how flexible the protein really is.

Figure : Plot with the difference between the average structure and the simulated structure measured in Angstrom
Figure : Plot with the difference between the average structure and the simulated structure (only the backbone) measured in Angstrom

As we can see in Figure ? and Figure ?, the RMSD is significantly lower in this case than the RMSD between the start structure and the simulated structure. Furthermore, the curve looks different: there is no continuous slope, and the curve instead stays at roughly the same values. This again shows that the protein is flexible, because the RMSD between the average structure and the simulated structure still deviates considerably.

solvent accessible surface area

Next, we analysed the solvent accessible surface area of the protein, which is the area of the protein that is in contact with the surrounding environment, mainly water.

First of all, we have a look at the solvent accessibility of each residue, which can be seen in Figure 26.

Figure 26: Solvent accessibility of each residue in the protein

Furthermore, it is also possible to look at the solvent accessibility of each atom of the complete protein, which can be seen in Figure 28.

Figure 28: Solvent accessibility of each atom of the complete protein.

Figure 29 shows how much of the area of the protein is accessible to the solvent during the complete simulation. As we saw before for the radius of gyration, the values vary during the simulation, which shows that the protein is flexible.

Figure 29: Area of the protein which is accessible to the solvent during the simulation.

The pictures show many differences in the solvent accessibility across the protein's surface, which was expected. If we have a closer look at the plots of the atomic accessibility and the residue accessibility, we can see that both agree. During the simulation, the accessible area of the protein changes considerably: it varies between 131 and 146 nm², which is a big difference. Therefore, the protein has to be really flexible, otherwise these different values could not be explained.

hydrogen-bonds

Here we distinguish between hydrogen bonds within the protein itself and hydrogen bonds between the protein and the water.

As before, the plots show that the protein is flexible, because the number of bonds varies considerably over time.

In Figure ? you can see the bonds within the protein. The number varies between 300 and 360 bonds. Most of the time, the protein has between 320 and 340 hydrogen bonds.

Figure 30: Number of hydrogen-bonds in the protein over simulation time

If we have a look at the number of bonds between the protein and the water, which are visualized in Figure 31, we can see that there are many more bonds between protein and water than within the protein. The number varies between 800 and 900, which is roughly 2.5 to 3 times the number of bonds within the protein. Over the simulation time, the number of bonds between water and protein grows on average, but most of the time the protein has between 840 and 860 bonds with the water.

Figure 31: Number of hydrogen-bonds between the protein and the surrounding water.

This is no surprise, because every residue on the surface is in contact with water, whereas inside the protein there are many amino acids without bonding partners, because the other amino acids are too far away.
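How a hydrogen bond is counted can be sketched with a simple geometric test. The text does not state which criterion was used, so the thresholds below are assumptions: a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30 degrees, which are the defaults commonly used by the GROMACS hydrogen bond tool. The function name is hypothetical.

```python
import math


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance=0.35, max_angle=30.0):
    """Geometric hydrogen bond test on three positions given in nm.

    The bond is accepted if the donor-acceptor distance is at most
    max_distance and the hydrogen-donor-acceptor angle is at most max_angle
    (in degrees). Both thresholds are assumed defaults, see the text.
    """
    da = [acceptor[k] - donor[k] for k in range(3)]
    dh = [hydrogen[k] - donor[k] for k in range(3)]
    len_da = math.sqrt(sum(v * v for v in da))
    len_dh = math.sqrt(sum(v * v for v in dh))
    if len_da == 0.0 or len_dh == 0.0 or len_da > max_distance:
        return False
    cosine = sum(a * h for a, h in zip(da, dh)) / (len_da * len_dh)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding errors
    angle = math.degrees(math.acos(cosine))
    return angle <= max_angle


donor = (0.0, 0.0, 0.0)
hydrogen = (0.1, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (0.3, 0.0, 0.0)))  # close and linear
print(is_hydrogen_bond(donor, hydrogen, (0.4, 0.0, 0.0)))  # too far away
```

Counting the donor-hydrogen-acceptor triples that pass this test in every frame gives a curve of the kind shown in the two figures of this section.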

Ramachandran plot

Now we want to have a closer look at the secondary structure of the protein during the simulation. We used a Ramachandran plot to analyse the phi and psi torsion angles of the backbone and to get a better understanding of the secondary structure during the simulation.

Figure : Ramachandran Plot of the wild type.

As we can see in Figure ?, there are a lot of beta sheets, right-handed alpha helices and left-handed alpha helices. The white regions are the regions where no secondary structure can be found, which is as expected.
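The phi and psi values in a Ramachandran plot are torsion angles defined by four consecutive backbone atoms each (C-N-CA-C for phi, N-CA-C-N for psi). The following sketch shows how such an angle can be computed from four positions. The function name and the coordinates are made up for illustration.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in (-180, 180]."""
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)   # points from the central bond back to the first atom
    b1 = sub(p2, p1)   # central bond
    b2 = sub(p3, p2)   # points from the central bond to the last atom

    norm = math.sqrt(dot(b1, b1))
    b1 = [v / norm for v in b1]

    # Components of b0 and b2 perpendicular to the central bond.
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the last atom is rotated by a quarter turn around the bond.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
               (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Each residue in each frame contributes one (phi, psi) pair, and the clustering of these pairs in the plot indicates which secondary structure elements dominate.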

RMSD matrix

Next we analysed the RMSD values using an RMSD matrix. This is useful to see whether there are groups of structures over the simulation that share a common structure. Such groups have lower RMSD values within the group and higher RMSD values compared to structures which are not in the group.
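The idea of an RMSD matrix can be sketched as a pairwise comparison of all frames. The function names are hypothetical, and the frames are assumed to be superimposed already, so no fitting step is included.

```python
import math


def frame_rmsd(a, b):
    """RMSD between two frames given as lists of (x, y, z) tuples."""
    squared = sum(sum((p[k] - q[k]) ** 2 for k in range(3))
                  for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def rmsd_matrix(frames):
    """Symmetric matrix with the RMSD between every pair of frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = frame_rmsd(frames[i], frames[j])
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


# Three made-up frames of a single atom moving along z.
frames = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)], [(0.0, 0.0, 3.0)]]
for row in rmsd_matrix(frames):
    print(row)
```

Groups of similar structures show up as square blocks of low values along the diagonal, because consecutive frames within such a group all have a small RMSD to each other.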

The following matrix shows the RMSD values of our structures.

Figure : RMSD matrix of our structures during the simulation

As you can see in the picture above, there is one big group which is colored green, but it is not possible to find any very dense groups in which all structures have a very low RMSD compared to each other. Only close to the diagonal are there some structures colored cyan. In general, most of the structures are colored green, which means an RMSD value of about 0.15 Angstrom. The protein structure therefore changes during the simulation, and we conclude that this protein is very flexible.

cluster analysis

Next, we started a cluster analysis. First of all, we found 225 different clusters.

We visualized all of these cluster structures:

Figure: Visualisation of the 231 different clusters

Next we aligned some structures of the cluster and measured the RMSD:

Cluster 1 Cluster 2 RMSD
cluster 1 cluster 2 0.880
cluster 1 cluster 5 0.068

The RMSD values of the different structures are very similar, which can be seen in the picture above. Furthermore, the RMSD values between the structures of the clusters are very low. Therefore, the different structures of the simulation are very similar.