MD WildType

From Bioinformatikpedia

Latest revision as of 14:00, 29 September 2011

Check the trajectory

We checked the trajectory and got the following results:

Reading frame       0 time    0.000   
# Atoms  96543
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

We also obtained the number of frames and the time step for each item stored in the trajectory:

Item #frames Timestep (ps)
Step 2001 5
Time 2001 5
Lambda 0 -
Coords 2001 5
Velocities 0 -
Forces 0 -
Box 2001 5

The simulation finished on node 0 Thu September 15 23:45:08 2011

Time
Node (s) Real (s) %
22438.875 22438.875 100%
6h13:58

The complete simulation took 6 hours, 13 minutes and 58 seconds of runtime.

Performance
Mnbf/s GFlops ns/day hour/ns
1271.745 93.383 38.505 0.623

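As the performance table shows, it takes about 0.62 hours (roughly 37 minutes) to simulate 1 ns of the system. Therefore, about 38.5 ns could be simulated in one full day of computation time. This is consistent with the total runtime: the 10 ns trajectory (2001 frames, one every 5 ps) took about 6.2 hours.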
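The numbers in the tables above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only values reported on this page (2001 frames stored every 5 ps, 22438.875 s node time); the helper names are our own and purely illustrative.

```python
# Values copied from the trajectory check and the timing/performance tables.
N_FRAMES = 2001          # frames in the trajectory
FRAME_INTERVAL_PS = 5.0  # time between stored frames (ps)
NODE_TIME_S = 22438.875  # node time reported for the run (s)


def simulated_ns(n_frames: int, interval_ps: float) -> float:
    """Length of the trajectory in ns; n frames span n - 1 intervals."""
    return (n_frames - 1) * interval_ps / 1000.0


def format_hms(seconds: float) -> str:
    """Format a duration as h:mm:ss, dropping fractional seconds."""
    s = int(seconds)
    return f"{s // 3600}h{(s % 3600) // 60:02d}:{s % 60:02d}"


def throughput(runtime_s: float, length_ns: float) -> tuple[float, float]:
    """Return (hours per simulated ns, simulated ns per day)."""
    hours_per_ns = runtime_s / 3600.0 / length_ns
    return hours_per_ns, 24.0 / hours_per_ns


length = simulated_ns(N_FRAMES, FRAME_INTERVAL_PS)
h_per_ns, ns_per_day = throughput(NODE_TIME_S, length)
print(f"simulated length: {length:.1f} ns")
print(f"runtime: {format_hms(NODE_TIME_S)}")
print(f"{h_per_ns:.3f} hour/ns, {ns_per_day:.3f} ns/day")
```

Run as is, it reports a 10.0 ns trajectory, a runtime of 6h13:58, and 0.623 hour/ns and 38.505 ns/day, which matches the performance table.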

Back to [Tay-Sachs Disease].

Visualize in PyMol

First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx lets the user choose which groups to display, and we decided to visualize the system with the following groups:

Group 1 Group 2
System Water
Protein Ion
Backbone NA
MainChain+H CL
SideChain

Figure 1 shows the visualization with ngmx:

Figure 1: Visualization of the MD simulation for the wildtype with ngmx

Furthermore, we visualized the structure with PyMol, as can be seen in Figure 2.

Figure 2: Visualization of the MD simulation for the wildtype with PyMol


Create a movie

Next, we wanted to visualize the motion of the protein with PyMol. For this, we extracted 1000 frames from the trajectory, leaving out the water and removing the jumps over the periodic boundaries so that the trajectory is continuous.

The program asks for the group to write out. Since we only wanted to see the protein, we chose group 1 (Protein).
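Removing the jumps over the periodic boundaries means that an atom leaving the box on one side is not put back on the opposite side, so its path stays continuous. The following minimal sketch shows the idea for one coordinate in a rectangular box; the function is hypothetical and only illustrates the principle, it is not the implementation used by the GROMACS tool.

```python
def unwrap_nojump(wrapped, box_length):
    """Make a 1-D coordinate trace continuous across periodic boundaries.

    Each new position is shifted by the whole number of box lengths that
    brings it closest to the previous (already unwrapped) position.
    """
    unwrapped = [wrapped[0]]
    for x in wrapped[1:]:
        previous = unwrapped[-1]
        step = x - previous
        step -= box_length * round(step / box_length)  # remove whole-box jumps
        unwrapped.append(previous + step)
    return unwrapped


# An atom drifting across the boundary of a 10 nm box
print(unwrap_nojump([9.5, 9.8, 0.1, 0.4], 10.0))  # approx. [9.5, 9.8, 10.1, 10.4]
```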

Here you can see the movie in stick and cartoon mode.

Figure 3: Movie of the motion of the wildtype in stick view.
Figure 4: Movie of the motion of the wildtype in cartoon view.

Figure 3 and Figure 4 show the motion of the protein over time as produced by the MD simulation.


Energy calculations for pressure, temperature, potential and total energy

Pressure

Average (in bar) 1.00711
Error Estimation 0.0087
RMSD 71.2473
Tot-Drift -0.0454746
Minimum -217.3543
Maximum 231.9909

The plot with the pressure distribution of the system can be seen here:

Figure 5: Plot of the pressure distribution of the MD system.

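As you can see in Figure 5, the pressure in the system fluctuates around the average of about 1 bar, but there are large outliers of up to about 232 bar and down to about -217 bar. Therefore, we are not sure whether a protein can work under such pressure. However, strong fluctuations of the instantaneous pressure are common in MD simulations, so the average value is the more meaningful quantity.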
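The columns of the energy tables can be reproduced from a time series. The minimal sketch below assumes that "RMSD" is the root-mean-square fluctuation around the average and that "Tot-Drift" is the slope of a least-squares line through the data multiplied by the length of the time series; this matches our understanding of g_energy, but it should be checked against its documentation. The error estimate (based on block averaging) is not reproduced, and the function name is hypothetical.

```python
import math


def energy_statistics(times, values):
    """Average, RMS fluctuation and total drift of one energy term."""
    n = len(values)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    rmsd = math.sqrt(sum((v - mean_v) ** 2 for v in values) / n)
    # Least-squares slope of the values against time
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var_t = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var_t
    total_drift = slope * (times[-1] - times[0])
    return mean_v, rmsd, total_drift


# Toy series rising by 1 unit per ps over 4 ps:
# mean 3.0, RMS fluctuation sqrt(2), total drift 4.0
print(energy_statistics([0, 1, 2, 3, 4], [1, 2, 3, 4, 5]))
```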

Temperature

Average (in K) 297.94
Error Estimation 0.0029
RMSD 0.942857
Tot-Drift 0.0066475
Minimum (in K) 294.82
Maximum (in K) 301.31

The plot with the temperature distribution of the system can be seen here:

Figure 6: Plot of the temperature distribution of the MD system.

As you can see in Figure 6, the system has a temperature of about 298 K most of the time. The temperature stays within about 3.4 K of the average (between 294.82 K and 301.31 K), which is not much. But we have to keep in mind that a difference of only a few degrees can destroy the function of a protein. 298 K is about 25°C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37°C (310 K).


Potential

Average (in kJ/mol) -1.2815e+06
Error Estimation 100
RMSD 1078.55
Tot-Drift -661.902
Minimum (in kJ/mol) -1.2853e+06
Maximum (in kJ/mol) -1.2778e+06

The plot with the potential energy distribution of the system can be seen here:

Figure 7: Plot of the potential energy distribution of the MD system.

As can be seen in Figure 7, the potential energy of the system stays between -1.2853e+06 and -1.2778e+06 kJ/mol, which is a relatively low energy, and it varies by less than 1% during the simulation. Note that this value refers to the whole system, including the water. We take this low and stable energy as a sign that the protein is stable and able to function, and therefore that our simulation is plausible. If the energy of the simulated system were too high, we could not trust the results, because the protein would be too unstable to work.


Total energy

Average (in kJ/mol) -1.05177e+06
Error Estimation 100
RMSD 1321.31
Tot-Drift -656.777
Minimum (in kJ/mol) -1.05599e+06
Maximum (in kJ/mol) -1.04718e+06

The plot with the total energy distribution of the system can be seen here:

Figure 8: Plot of the total energy distribution of the MD system.

As we can see in Figure 8, the total energy of the system is higher than its potential energy, because the total energy additionally contains the kinetic energy, which is always positive. Here, the total energy lies between -1.05599e+06 and -1.04718e+06 kJ/mol. These values are still in a range where we assume that the energy is low enough for the protein to function.
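Since the total energy is the sum of the potential and the kinetic energy, the difference between the two averages gives the average kinetic energy. As a rough plausibility check of our own (not part of the original analysis), the equipartition relation E_kin = N_df R T / 2 then yields the implied number of degrees of freedom at the average temperature of 297.94 K. This is only a sketch: the actual number of degrees of freedom depends on the constraints used in the simulation, which are not stated here, and the function names are hypothetical.

```python
GAS_CONSTANT = 0.008314462618  # R in kJ/(mol K)


def kinetic_energy(total, potential):
    """Average kinetic energy from the average total and potential energy."""
    return total - potential


def implied_degrees_of_freedom(e_kin, temperature):
    """Invert E_kin = N_df * R * T / 2 (molar units)."""
    return 2.0 * e_kin / (GAS_CONSTANT * temperature)


e_kin = kinetic_energy(-1.05177e6, -1.2815e6)  # averages from the tables above
n_df = implied_degrees_of_freedom(e_kin, 297.94)
print(round(e_kin))  # 229730
print(round(n_df))   # roughly 1.85e5, fewer than 3 * 96543 atomic coordinates
```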


Minimum distance between periodic boundary cells

Next, we calculated the minimum distance between the protein and its periodic images. As before, the program asks for one group to use for the calculation. We decided to use only the protein (group 1), because the calculation takes a lot of time and the whole system is significantly bigger than the protein alone.

Here you can see the result of this analysis.

Figure 9: Plot of the minimum distance between periodic boundary cells.
Average (in nm) 3.139
Minimum 1.770
Maximum 4.081

As you can see in Figure 9, the minimum distance varies strongly between the time steps. The largest distance is about 4.1 nm, whereas the smallest distance is only about 1.8 nm (average 3.1 nm). So the protein always stays between about 1.8 and 4.1 nm away from its nearest periodic image. As long as this distance is larger than the cut-off used for the non-bonded interactions, the protein does not interact with its own periodic image. Because of the strongly jumping curve, we conclude that the protein is flexible: the atoms that come closest to the periodic image change over time, so the minimum distance repeatedly decreases and increases.
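To illustrate what is measured here, the brute-force sketch below computes the smallest distance between any atom of a group and any atom of one of its 26 neighbouring periodic images, assuming a rectangular box. It is our own illustration with hypothetical names, not the GROMACS implementation, and it would be far too slow for a real protein.

```python
import itertools
import math


def min_periodic_image_distance(coords, box):
    """Smallest distance between an atom of the group and an atom of one of
    its periodic images; box = (Lx, Ly, Lz) of a rectangular box.
    Only the 26 directly neighbouring images are checked."""
    best = math.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the group itself
        offset = [s * length for s, length in zip(shift, box)]
        for a in coords:
            for b in coords:
                image_b = [b[k] + offset[k] for k in range(3)]
                best = min(best, math.dist(a, image_b))
    return best


# Two atoms 4 nm apart in a cubic 5 nm box: an image of the second atom is only 1 nm away
print(min_periodic_image_distance([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], (5.0, 5.0, 5.0)))  # 1.0
```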


RMSF for protein and C-alpha and PyMol analysis of average and B-factor

Protein

First of all, we calculated the RMSF for the whole protein.

The analysis produces two PDB files: one with the average structure of the protein and one in which the B-factor column holds values computed from the atomic fluctuations, so that highly flexible regions have high B-factors.
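The RMSF of an atom is the root-mean-square deviation of its position from its time-averaged position, computed after the frames have been fitted onto a reference. A common conversion to isotropic B-factors is B = (8π²/3) · RMSF². The minimal sketch below assumes that the frames are already fitted and that this relation is used to fill the B-factor column; both assumptions should be checked against the g_rmsf documentation, and the function names are hypothetical.

```python
import math


def rmsf(frames):
    """frames[t][i] is the (x, y, z) position of atom i in frame t (already fitted).
    Returns one RMSF value per atom."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


def b_factor(rmsf_value):
    """Isotropic B-factor from an RMSF value (units follow those of the RMSF)."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_value ** 2


# One atom oscillating between x = 0 and x = 2, one atom at rest
frames = [[(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
          [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]]
print(rmsf(frames))  # [1.0, 0.0]
```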

To compare the structures, we aligned them to the original structure with PyMol.

original & average original & B-Factors average & B-Factors
Perspective one
Figure 10: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 11: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 12: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
Perspective two
Figure 13: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 14: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 15: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
RMSD
1.556 0.349 1.684

The structure with the high B-factors is the most similar to the original structure from the PDB (RMSD 0.349; Figure 11 and Figure 14). The average structure is less similar (RMSD 1.556; Figure 10 and Figure 13). However, we know that the regions with high B-factors are very flexible, so in the structure downloaded from the PDB the protein may be in a different state because of these flexible regions. Therefore, the low RMSD between the structure with high B-factors and the original structure shows that the simulation reproduces the structure quite well.

Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 16:

Figure 16: Plot of the RMSF values over the whole protein.

There are many regions with high B-factor values. The highest value can be found at position 150 (Figure 18), but there are also high values at positions 110 (Figure 17), 290 (Figure 19), 320 (Figure 20), 410 (Figure 21), 460 (Figure 22) and 490 (Figure 23). If we compare the pictures of the original and the average structure, we can see that most regions align very well, whereas some regions differ in their position. Therefore, we want to check whether these are the regions with very high B-factor values.

Figure 17: Part of the alignment between the original structure and the average structure between residue 110 and 120.
Figure 18: Part of the alignment between the original structure and the average structure between residue 140 and 160.
Figure 19: Part of the alignment between the original structure and the average structure between residue 280 and 300.
Figure 20: Part of the alignment between the original structure and the average structure between residue 310 and 330.
Figure 21: Part of the alignment between the original structure and the average structure between residue 400 and 420.
Figure 22: Part of the alignment between the original structure and the average structure between residue 450 and 470.
Figure 23: Part of the alignment between the original structure and the average structure between residue 480 and 500.

As we can see in the pictures above (Figure 17 - Figure 23), there is always a small difference between the two structures. Therefore, these regions seem to be flexible.

Furthermore, we visualized the B-factors with PyMol by coloring the structure according to its B-factor values. We did this for the blue protein (Figure 24 - Figure 30). Red parts of the protein are very flexible; the brighter the color, the higher the flexibility of the residue.

Figure 24: Part of the alignment between the original structure and the average structure between residue 110 and 120. High B-Factor value -> bright color.
Figure 25: Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color.
Figure 26: Part of the alignment between the original structure and the average structure between residue 280 and 300. High B-Factor value -> bright color.
Figure 27: Part of the alignment between the original structure and the average structure between residue 310 and 330. High B-Factor value -> bright color.
Figure 28: Part of the alignment between the original structure and the average structure between residue 400 and 420. High B-Factor value -> bright color.
Figure 29: Part of the alignment between the original structure and the average structure between residue 450 and 470. High B-Factor value -> bright color.
Figure 30: Part of the alignment between the original structure and the average structure between residue 480 and 500. High B-Factor value -> bright color.

In Figure 24 and Figure 25, the protein is colored turquoise, which indicates relatively high B-factor values. Figure 26 is also colored turquoise. The pictures that show the center of the protein (Figure 27 - Figure 29), however, show an entirely dark blue protein, which means low B-factor values; this contradicts the plot, so either the plot or the pictures must be wrong. The last picture (Figure 30) shows that there are high B-factor values in this part of the chain.


C-alpha

We then repeated the analysis done for the protein for the C-alpha atoms of the protein, following the same steps as in the section above.

To compare the structures, we aligned them to the original structure with PyMol.

original & average original & B-Factors average & B-Factors
Perspective one
Figure 31: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 32: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 33: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
Perspective two
Figure 34: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 35: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 36: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
RMSD
1.373 0.279 -

As in the section above, the structure with high B-factor values is the most similar to the original structure (RMSD 0.279; Figure 32 and Figure 35). This was expected, because again the same model is compared, but this time only the C-alpha atoms are regarded and the side chains are neglected, while the backbone of the protein remains the same. The other two comparisons (Figure 31, Figure 33, Figure 34 and Figure 36) have nearly the same RMSD value and are therefore similarly close to each other.

Furthermore, we got a plot of the RMSF values of the C-alpha atoms, which can be seen in Figure 37:

Figure 37: Distribution of the B-factor values when only the backbone of the protein is regarded.

In this case, there are only three high peaks, at positions 150, 280 and 310. These peaks are also present in the plot for the whole protein (Figure 16). Furthermore, these high B-factor values could also be observed in the pictures.


Radius of gyration

Next, we analyzed the radius of gyration. For this, we used g_gyrate with only the protein group.

Figure 38: Plot with the distribution of the radius of gyration over time
Figure 39: Plot with the distribution of the radius of gyration about the x, y and z axes over time
Rg (in nm) RgX (in nm) RgY (in nm) RgZ (in nm)
Average 2.407 2.153 1.609 2.084
Minimum 2.346 2.012 1.423 1.945
Maximum 2.440 2.214 1.807 2.238


Figure 38 shows the radius of gyration over the simulation time. The radius of gyration is the mass-weighted root-mean-square distance of the atoms of the protein from its center of mass. The plot shows that the average radius is about 2.4 nm, with fluctuations between 2.35 and 2.44 nm, which indicates that the protein is flexible. Since the distance between the different parts of the protein and its center varies periodically, the protein seems to pulsate: the curve shows how the protein repeatedly needs more and less space. Over time, however, the radius grows: at the beginning of the simulation it is about 2.39 nm, whereas at the end it is about 2.43 nm.

If we also look at the components of the radius of gyration in Figure 39, we can see that RgX varies the least during the simulation (between 2.01 and 2.21 nm). RgZ shows more deflection (between 1.95 and 2.24 nm), but both values are in a similar range. RgY, however, is significantly lower (between 1.42 and 1.81 nm), especially at the end of the simulation. g_gyrate reports these components as the radii of gyration about the x, y and z axes, so RgY measures how far the atoms are from the y axis, i.e. their spread in the x and z directions. The low RgY values therefore mean that the protein is more compact in the x and z directions and comparatively elongated along the y axis.
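The following minimal sketch computes the radius of gyration and its three components for a set of atoms. It assumes mass weighting and that each component is taken about the corresponding axis, i.e. from the distance perpendicular to that axis; this is our reading of the g_gyrate documentation, and the function name is hypothetical.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted Rg and the radii of gyration about the x, y and z axes."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    rel = [[c[k] - com[k] for k in range(3)] for c in coords]

    def weighted_mean_sq(axes):
        # Mass-weighted mean of the squared coordinates listed in `axes`
        return sum(m * sum(r[k] ** 2 for k in axes) for m, r in zip(masses, rel)) / total

    rg = math.sqrt(weighted_mean_sq((0, 1, 2)))
    rg_x = math.sqrt(weighted_mean_sq((1, 2)))  # about the x axis: uses y and z
    rg_y = math.sqrt(weighted_mean_sq((0, 2)))  # about the y axis: uses x and z
    rg_z = math.sqrt(weighted_mean_sq((0, 1)))  # about the z axis: uses x and y
    return rg, rg_x, rg_y, rg_z


# Two equal masses on the x axis: no extent perpendicular to x, so RgX = 0
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [1.0, 1.0]))  # (1.0, 0.0, 1.0, 1.0)
```

With this definition Rg² = (RgX² + RgY² + RgZ²)/2. The averages in the table above approximately satisfy this relation (sqrt((2.153² + 1.609² + 2.084²)/2) ≈ 2.405 nm compared with 2.407 nm), which supports the assumption about how the components are defined.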


Solvent accessible surface area

Next, we analyzed the solvent accessible surface area of the protein, i.e. the part of the protein surface that is accessible to the surrounding solvent, mainly water.

First of all, we have a look at the solvent accessibility of each residue, which can be seen in Figure 40 and 41.

Figure 40: Solvent accessibility of each residue in the protein
Figure 41: Solvent accessibility of each residue in the protein with standard deviation

The following table lists the average, minimum and maximum solvent accessibility of the residues in the protein. Residues with a value of 0 at the beginning and at the end of the residue range are ignored.

Average (in nm²) 0.537
Minimum (in nm²) 0.004
Maximum (in nm²) 2.058

The solvent accessible area of a residue is the part of its surface that is accessible to the solvent. Figure 40 and Figure 41 show the average solvent accessibility of each residue over the complete simulation, since each residue can be more or less exposed to the solvent in each state of the simulation.
Figure 41 shows that the first 89 and the last 100 residues have an average solvent accessibility of 0, which means that these residues are always completely in the interior of the protein. Many residues have a solvent accessibility of about 0.5 nm², and there are only some outliers with an accessibility of more than 1.5 nm². So there are some residues which are almost always on the surface, many residues which are partly or temporarily on the surface, and also many residues which are never on the surface. Figure 41 also shows that the standard deviation is relatively low, which means that there are no states of the system in which residues with low or no solvent accessibility become completely accessible to the solvent. If the standard deviation were very high, there could be very unusual states in the simulation, but this is not the case in our simulation.

Figure 42: Solvent accessibility of each atom of the complete protein.
Figure 43: Solvent accessibility of each atom of the complete protein with standard deviation.

As before, we only regard the atoms with an area of more than 0 in the following table.

Average (in nm²) 0.031
Minimum (in nm²) 0
Maximum (in nm²) 0.560

Figure 42 shows the average area per atom, which gives a similar picture as Figure 40, except that at the beginning there are no atoms without any solvent accessibility. In general, the areas of the atoms are smaller than those of the residues, which is clear, because the area of a residue consists of the areas of the atoms that belong to it. As before, the standard deviation plotted in Figure 43 is not very high. It is a little bit higher than in Figure 41, but that was expected, because this figure is more detailed and the scale is smaller. In general, Figure 42 and Figure 43 confirm the results of Figure 40 and Figure 41. At the end of the plot, there are many atoms with a solvent accessible area of 0, which is consistent with the result for the residues, whereas at the beginning of the plot there are no such atoms. Overall, however, many atoms in the plot have a low or no accessible area.

Figure 44: Area of the protein which is accessible to the solvent during the simulation.
Figure 45: Area of the protein which is accessible to the solvent during the simulation with standard deviation.
Average (in nm²) 135.036
Minimum (in nm²) 129.084
Maximum (in nm²) 142.218

Figure 44 and Figure 45 show the solvent accessible surface of the protein during the simulation. The solvent accessible area of the hydrophobic residues is about 135 nm² and stays relatively constant during the complete simulation. If we take a closer look at the different physicochemical properties and their surface accessibility, we can see that the accessibility of the hydrophobic amino acids is larger than that of the hydrophilic amino acids. This is really surprising, because hydrophobic amino acids generally prefer a location in the core of the protein and not on the surface.


Hydrogen bonds

Here, we distinguish between hydrogen bonds within the protein and hydrogen bonds between the protein and the water.

As before, the plots suggest that the protein is flexible, because the number of hydrogen bonds varies strongly over time.

In Figure 46 you can see the hydrogen bonds within the protein. Their number varies between about 290 and 360 (average 329); most of the time, the protein has between 320 and 340 hydrogen bonds. Figure 47 shows that there are many more possible hydrogen bonds than actually occur: there are about 1540 pairs of atoms with a distance of less than 0.35 nm, a distance at which a hydrogen bond is theoretically possible. Thus, only about 21% of all possible hydrogen bonds are formed. This small number of hydrogen bonds could make the protein very flexible.
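Whether a donor-acceptor pair counts as a hydrogen bond is decided geometrically. The minimal sketch below assumes the criterion that, as we understand it, g_hbond uses by default: a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30°. It also computes the share of possible hydrogen bonds that are formed from the averages in the tables; all function names are hypothetical.

```python
import math


def angle_deg(v, w):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(v, w))
    cos = dot / (math.hypot(*v) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def is_hydrogen_bond(donor, hydrogen, acceptor, r_max=0.35, angle_max=30.0):
    """Donor-acceptor distance <= r_max (nm) and hydrogen-donor-acceptor angle <= angle_max."""
    if math.dist(donor, acceptor) > r_max:
        return False
    d_h = [h - d for h, d in zip(hydrogen, donor)]
    d_a = [a - d for a, d in zip(acceptor, donor)]
    return angle_deg(d_h, d_a) <= angle_max


def fraction_formed(avg_bonds, avg_pairs):
    """Share of donor-acceptor pairs within r_max that actually form a hydrogen bond."""
    return avg_bonds / avg_pairs


print(is_hydrogen_bond((0, 0, 0), (0.1, 0, 0), (0.3, 0, 0)))  # True
print(is_hydrogen_bond((0, 0, 0), (0.1, 0, 0), (0, 0.3, 0)))  # False (angle of 90 degrees)
print(round(fraction_formed(328.758, 1537.77), 3))  # 0.214, within the protein
print(round(fraction_formed(836.94, 981.18), 3))    # 0.853, between protein and water
```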

Figure 46: Number of hydrogen-bonds in the protein over simulation time
Figure 47: Number of hydrogen-bonds and possible hydrogen-bonds in the protein over simulation time
bonds in the protein possible bonds in the protein
Average 328.758 1537.77
Minimum 294 1486
Maximum 361 1587

If we look at the number of hydrogen bonds between the protein and the water (Figure 48), we can see that there are many more bonds between protein and water than within the protein. Their number varies between about 770 and 920, about 2.5 times as many as within the protein. Over the simulation time, the number of bonds between water and protein increases on average, but most of the time the protein has between 840 and 850 hydrogen bonds with the water. Figure 49 shows how many pairs of atoms have a distance of less than 0.35 nm and are thus able to form hydrogen bonds with each other. The gap between possible and actually formed hydrogen bonds in Figure 49 is much smaller than in Figure 47: here, about 85% of all possible hydrogen bonds are actually formed. Therefore, we can see that the binding between protein and water is really stable.

Figure 48: Number of hydrogen-bonds between the protein and the surrounding water.
Figure 49: Number of hydrogen-bonds and possible hydrogen-bonds between the protein and the surrounding water.
bonds between protein and water possible bonds between protein and water
Average 836.94 981.18
Minimum 768 853
Maximum 916 1091

The high number of hydrogen bonds between protein and water is no surprise, because every residue on the surface has contact with water, whereas in the protein there are a lot of amino acids which do not have contact partners, because the other amino acids are too far away.


Ramachandran plot

Now, we want to have a closer look at the secondary structure of the protein during the simulation. For this, we used a Ramachandran plot of the phi and psi torsion angles of the backbone.

Figure 50: Ramachandran Plot of the wild type.

As we can see in Figure 50, there are many residues in the beta-sheet region and in the right-handed and left-handed alpha-helix regions. The white regions are the regions where no secondary structure can be found, as expected.
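Each point in a Ramachandran plot is a pair of backbone torsion angles: phi is defined by the atoms C(i-1), N, CA, C of residue i and psi by N, CA, C, N(i+1). The minimal sketch below computes such a torsion angle from four atom positions with the usual atan2 formula (IUPAC sign convention, result in degrees); extracting the atoms from the trajectory is not shown, and the function names are our own.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], defined by four points."""
    b1 = [q - p for p, q in zip(p0, p1)]
    b2 = [q - p for p, q in zip(p1, p2)]
    b3 = [q - p for p, q in zip(p2, p3)]
    n1 = cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0 (trans)
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # 0.0 (cis)
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))   # 90.0
```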


RMSD matrix

Next, we analyzed the RMSD values with an RMSD matrix. This is useful to see whether there are groups of structures during the simulation that share a common conformation. Structures within such a group have low RMSD values to each other and higher RMSD values to structures outside the group.
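An RMSD matrix simply contains the RMSD between every pair of frames. The minimal sketch below builds such a matrix for frames given as lists of (x, y, z) coordinates. Unlike the usual procedure, it skips the superposition of each pair of structures before the RMSD is computed, so a pure translation (frame 2 in the example) shows up as an RMSD of 1.0 here, whereas it would be 0 after fitting. The function names are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations (lists of (x, y, z)); no fitting is done."""
    sq = sum(sum((p[k] - q[k]) ** 2 for k in range(3)) for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise RMSD values between all frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to frame 0
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],  # frame 0 shifted by 1 along x
]
for row in rmsd_matrix(frames):
    print(row)
```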

The following matrix shows the RMSD values of our structures.

Figure 51: RMSD matrix of our structures during the simulation

As you can see in Figure 51, there is one big group which is colored in green, but it is not possible to find any very dense groups in which all structures have a very low RMSD to each other. Only near the diagonal there are some structures which are colored in cyan; in general, most of the structures are colored in green, which corresponds to an RMSD value of about 0.15 nm. So there are differences in the protein structure during the simulation, and therefore we conclude that this protein is very flexible.


Cluster analysis

Next, we performed a cluster analysis, which resulted in 225 different clusters.
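The text does not state which clustering method or RMSD cut-off was used. Purely as an illustration, the sketch below implements the gromos algorithm, which, as we understand it, is one of the methods offered by g_cluster: the structure with the most neighbours within the cut-off becomes the centre of a cluster, it and its neighbours are removed, and this is repeated until all structures are assigned. The input is an RMSD matrix; the function name and the example values are hypothetical.

```python
def gromos_cluster(rmsd, cutoff):
    """Cluster structures from a symmetric RMSD matrix; returns (centre, members) pairs."""
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        candidates = sorted(remaining)
        # Neighbours of i among the structures not yet clustered (i itself included)
        neighbours = {i: [j for j in candidates if rmsd[i][j] <= cutoff] for i in candidates}
        # Ties are broken by the lowest index, because max() keeps the first maximum
        centre = max(candidates, key=lambda i: len(neighbours[i]))
        clusters.append((centre, neighbours[centre]))
        remaining -= set(neighbours[centre])
    return clusters


example = [
    [0.0, 0.05, 0.5, 0.5],
    [0.05, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.05],
    [0.5, 0.5, 0.05, 0.0],
]
print(gromos_cluster(example, 0.1))  # [(0, [0, 1]), (2, [2, 3])]
```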

We visualized the structures of all of these clusters in Figure 52:

Figure 52: Visualization of the 225 different clusters

Next, we aligned the structures of some clusters with each other and measured the RMSD:

First cluster Second cluster RMSD
cluster 1 cluster 2 0.654
cluster 1 cluster 5 0.899

The RMSD values between the different structures are very similar, as can be seen in the picture above, and in general they are very low. But if we align clusters that are far apart, the RMSD value increases, which shows that over the simulation time the differences between the start structure and the simulated structures increase. Therefore, the different structures of the simulation are very similar, but they change more and more over time.

To get a better insight into the distribution of the RMSD values between the different clusters, we visualized this distribution in Figure 53.

Figure 53: Distribution of the RMSD value over the different clusters

Figure 53 shows that the distribution resembles a Gaussian distribution, with the highest peak at about 0.2 nm. This means that most of the structures have an RMSD of about 0.2 nm compared to the start structure. This value is not very high, but it is a strong hint that the protein is flexible during the simulation.


Internal RMSD

The last step in our analysis is the calculation of the internal RMSD. It is based on the distances between all pairs of atoms within the protein, compared with the corresponding distances in the reference structure, so no superposition of the structures is needed.

Figure 54: Plot of the distance RMS values in the protein.
Average (RMSD in nm) 0.238
Minimum (RMSD in nm) 4.89e-7
Maximum (RMSD in nm) 0.312

As we can see in Figure 54, the internal RMSD is very small at the beginning of our simulation, but it increases during almost the complete simulation. At the end, the internal RMSD is about 0.28 nm (2.8 Å), which is not particularly high. Therefore, the distances within the protein do not change much.
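The internal (distance) RMSD compares every atom-pair distance in a frame with the same distance in the reference structure, so translations and rotations of the whole protein do not contribute. The minimal sketch below follows this definition as we understand it from the g_rmsdist description; the function name and the toy coordinates are hypothetical.

```python
import itertools
import math


def distance_rmsd(conf, reference):
    """RMS deviation of all intra-molecular atom-pair distances of `conf`
    from the corresponding distances in `reference`."""
    pairs = list(itertools.combinations(range(len(conf)), 2))
    sq = sum(
        (math.dist(conf[i], conf[j]) - math.dist(reference[i], reference[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(sq / len(pairs))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]    # same shape, different position
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # distance stretched by 1
print(distance_rmsd(shifted, reference))    # 0.0
print(distance_rmsd(stretched, reference))  # 1.0
```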
