Difference between revisions of "MD Mutation436"

From Bioinformatikpedia
(internal RMSD)
 
(41 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
=== check the trajectory ===
 
=== check the trajectory ===
   
We checked the trajectory with following command:
+
We checked the trajectory and got the following results:
 
gmxcheck -f mut436_md.xtc
 
 
With the command we got following results:
 
   
 
Reading frame 0 time 0.000
 
Reading frame 0 time 0.000
Line 49: Line 45:
 
|}
 
|}
   
The simulation finished on node 0 Fri Aug 26 08:40:07 2011.
+
The simulation finished on node 0 Friday August 26 08:40:07 2011.
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 66: Line 62:
 
|}
 
|}
   
The complete simulation needs 9 hours and 41 minutes to finishing.
+
The complete simulation needed a runtime of 9 hours and 41 minutes.
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|colspan="4" | Performance
 
|colspan="4" | Performance
Line 82: Line 78:
 
|}
 
|}
   
As you can see in the table above, it takes about 1 hour to simulat 1ns of the system. So therefore, it would be possible to simulate about 25ns in one complete day calculation time.
+
As you can see in the table above, it takes about 1 hour to simulate 1 nanosecond (ns) of the system. Therefore, it would be possible to simulate about 25 ns in one complete day of calculation time.
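
The throughput estimate can be checked with a small calculation. The following is only a sketch; the function name <code>ns_per_day</code> is our own and is not part of GROMACS. With exactly 1 hour per ns the result is 24 ns per day, so the value of about 25 ns corresponds to slightly less than 1 hour per ns.

```python
# Hypothetical helper (not a GROMACS tool): convert the cost of a simulation
# in wall-clock hours per simulated nanosecond into nanoseconds per day.
def ns_per_day(hours_per_ns):
    """Return how many nanoseconds can be simulated in 24 hours."""
    if hours_per_ns <= 0:
        raise ValueError("hours_per_ns must be positive")
    return 24.0 / hours_per_ns


# Exactly 1 hour per ns gives 24 ns per day.
print(ns_per_day(1.0))
# 25 ns per day would need about 0.96 hours per ns.
print(ns_per_day(0.96))
```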
   
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
=== Visualize in pymol ===
 
  +
<br><br>
  +
=== Visualize in PyMol ===
   
 
First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx gives the user the possibility to choose different parameters. Therefore, we decided to visualize the system with the following parameters:
 
First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx gives the user the possibility to choose different parameters. Therefore, we decided to visualize the system with the following parameters:
Line 111: Line 109:
 
Figure 1 shows the visualization with ngmx:
 
Figure 1 shows the visualization with ngmx:
   
[[Image:ngmx_mut436.png|thumb|center|Figure 1: Visualisation of the MD simulation for Mutation 436 with ngmx]]
+
[[Image:ngmx_mut436.png|thumb|center|Figure 1: Visualization of the MD simulation for Mutation 436 with ngmx]]
   
  +
Furthermore, we also visualized the structure with PyMol, which can be seen in Figure 2.
=== create a movie ===
 
   
  +
[[Image:mut436.png|thumb|center|Figure 2: Visualization of the MD simulation for mutation 436 with PyMol]]
Next, we want to visualize the protein with pymol. Therefore, we extracted 1000 frames from the trajectory, leaving out the water and jump over the boundaries to make continuse trajectories. Therefore, we used following command:
 
   
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
trjconv -s fole.tpr -f file.xtc -o output_file.pdb -pbc nojump -dt 10
 
  +
<br><br>
  +
=== Create a movie ===
   
  +
Next, we want to visualize the protein with PyMol. Therefore, we extracted 1000 frames from the
The program asks for the a group as output. We only want to see the protein, therefore we decided to use group 1.
 
  +
trajectory, leaving out the water and removing jumps across the periodic boundaries to make the trajectory continuous.
   
  +
The program asks for a group as output. We only want to see the protein, so we chose group 1.
Todo: movie and filtered
 
   
  +
Here you can see the movie in stick view and in cartoon view.
  +
  +
{|
  +
|[[Image:mut436_animation.gif|thumb|center|Figure 3: Movie of the motion of mutation 436 in stick view.]]
  +
|[[Image:mut436_antimation_1.gif|thumb|center|Figure 4: Movie of the motion of mutation 436 in cartoon view]]
  +
|}
  +
  +
In Figure 3 and Figure 4, we can see the motion of the protein over time, as produced by the MD simulation.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
 
=== energy calculations for pressure, temperature, potential and total energy ===
 
=== energy calculations for pressure, temperature, potential and total energy ===
   
Line 141: Line 153:
 
|-
 
|-
 
|Minimum (in bar)
 
|Minimum (in bar)
  +
| -219.7197
|
 
 
|-
 
|-
 
|Maximum (in bar)
 
|Maximum (in bar)
  +
|238.8288
|
 
 
|-
 
|-
 
|}
 
|}
Line 152: Line 164:
 
[[Image:mut436_md_pressure.png|thumb|center|Figure 5: Plot of the pressure distribution of the MD system.]]
 
[[Image:mut436_md_pressure.png|thumb|center|Figure 5: Plot of the pressure distribution of the MD system.]]
   
As you can see on Figure 5, the pressure in the system is most of the time about 1, but there a big outlier with 250 and -250 bar. So therefore we are not sure, if a protein can work with such a pressure.
+
As you can see in Figure 5, the pressure in the system is about 1 bar most of the time, but there are big outliers of up to about 239 bar and down to about -220 bar. Therefore, we are not sure whether a protein can work under such pressure.
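
The average, minimum and maximum values can also be computed directly from the energy output. The following is a minimal sketch, assuming the values were exported to a plain-text <code>.xvg</code> file in which comment lines start with <code>#</code> and plot directives start with <code>@</code>. The helper names and the example numbers are made up for illustration and are not the real simulation output.

```python
def parse_xvg(text):
    """Parse the text of an .xvg file into a list of rows of floats.

    Lines starting with '#' (comments) or '@' (plot directives) are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def summarize(values):
    """Return average, minimum and maximum of a non-empty list of numbers."""
    return {
        "average": sum(values) / len(values),
        "minimum": min(values),
        "maximum": max(values),
    }


# Made-up example data: time in ps, pressure in bar.
example = """\
# made-up example, not the real output
@ title "Pressure"
0.0   1.0
2.0  -3.0
4.0   5.0
"""

pressure = [row[1] for row in parse_xvg(example)]
stats = summarize(pressure)
print(stats)
```

The same two helpers can be reused for the temperature, potential energy and total energy files.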
   
 
==== Temperature ====
 
==== Temperature ====
Line 179: Line 191:
 
The plot with the temperature distribution of the system can be seen here:
 
The plot with the temperature distribution of the system can be seen here:
   
[[Image:mut436_md_temperatur.png|thumb|center|Figure 2: Plot of the temperature distribution of the MD system.]]
+
[[Image:mut436_md_temperatur.png|thumb|center|Figure 6: Plot of the temperature distribution of the MD system.]]
   
As you can see on Figure 2, most of the time the system has a temperature about 298K. The maximal difference between this average temperature and the minimum/maxmimum temperature is only about 6 K, which is not that high. But we have to keep in mind, that only some degree difference can destroy the function of a protein. 298 K is about 25°C, which is relativly cold for a protein to work, because the temperature in our bodies is about 36°C.
+
As you can see in Figure 6, the system has a temperature of about 298 K most of the time. The maximal difference between this average temperature and the minimum/maximum temperature is only about 6 K, which is not that high. But we have to keep in mind that a difference of only a few degrees can destroy the function of a protein. 298 K is about 25°C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37°C.
   
 
==== Potential ====
 
==== Potential ====
Line 208: Line 220:
 
The plot with the potential energy distribution of the system can be seen here:
 
The plot with the potential energy distribution of the system can be seen here:
   
[[Image:mut436_md_potential.png|thumb|center|Figure 3: Plot of the potential energy distribution of the MD system.]]
+
[[Image:mut436_md_potential.png|thumb|center|Figure 7: Plot of the potential energy distribution of the MD system.]]
   
As can be seen on Figure 3, the potential engery of the system is between -1.2771e+06 and -1.2852e+06, which is a relativly low energy. Therefore this means that the protein is stable. So we can suggest, that the protein with such a low energy is able to function and is stable and therefore, our simulation could be true. Otherwise, if the energy of the simulated system is too high, we can not trust the results, because the protein is too instable to work.
+
As can be seen in Figure 7, the potential energy of the system is between -1.2852e+06 and -1.2771e+06, which is a relatively low energy. This suggests that the protein is stable and able to function, so the results of our simulation are plausible. If the energy of the simulated system were too high, we could not trust the results, because the protein would be too unstable to work.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
==== Total energy ====
 
==== Total energy ====
Line 237: Line 252:
 
The plot with the total energy distribution of the system can be seen here:
 
The plot with the total energy distribution of the system can be seen here:
   
[[Image:mut436_md_total.png|thumb|center|Figure 4: Plot of the total energy distribution of the MD system.]]
+
[[Image:mut436_md_total.png|thumb|center|Figure 8: Plot of the total energy distribution of the MD system.]]
   
As we can see on Figure 4 above, the total energy of the protein is a little bit higher than the potential energy of the protein. In this case, the energy is between -1.05e+06 and -1.051e+06. But these values are already in a range, where we can suggest that the energy of the protein is low enough so that this one can work.
+
As we can see in Figure 8 above, the total energy of the protein is a little bit higher than the potential energy of the protein. In this case, the energy is between -1.051e+06 and -1.05e+06. These values are in a range where we can suggest that the energy of the protein is low enough for it to work.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== minimum distance between periodic boundary cells ===
 
=== minimum distance between periodic boundary cells ===
Line 247: Line 265:
 
Here you can see the result of this analysis.
 
Here you can see the result of this analysis.
   
  +
[[Image:mut436_md_periodic_2.png|thumb|center|Figure 6: Plot of the minimum distance between periodic boundary cells.]]
 
  +
[[Image:mut436_md_periodic_2.png|thumb|center|Figure 9: Plot of the minimum distance between periodic boundary cells.]]
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 261: Line 280:
 
|}
 
|}
   
As you can see on Figure 6, there is a huge difference between the different time steps and distances. The highest distance is 4.096 nm, whereas the smallest distance is only 1.408 nm. Therefore, there are some states during the simulation in which atoms are close together if the interact and there are some states in which the atoms who interact are far away. Because of the huge bandwidth of minimum distance we can conclude, that the protein is flexible
+
As you can see in Figure 9, there is a huge difference between the distances at the different time steps. The highest distance is 4.096 nm, whereas the smallest distance is only 1.408 nm. Therefore, there are some states during the simulation in which the interacting atoms are close together and some states in which the interacting atoms are far apart. Because of the wide range of the minimum distance, we can conclude that the protein is flexible.
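
To make clear which quantity is plotted here, the following sketch computes the minimum distance between a set of atoms and its periodic images. It is only an illustration under simplifying assumptions: the box is rectangular, the molecule is whole, and all pairs are compared by brute force. The function name <code>min_image_distance</code> is our own and this is not the GROMACS implementation.

```python
import itertools

import numpy as np


def min_image_distance(coords, box):
    """Smallest distance between any atom and any periodic image of any atom.

    coords: (N, 3) positions in nm
    box:    three edge lengths of a rectangular box in nm (assumption)
    """
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    best = np.inf
    # Visit the 26 neighbouring cells around the central box.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the central cell itself
        image = coords + np.array(shift) * box
        # All pairwise distances between original atoms and image atoms.
        diff = coords[:, None, :] - image[None, :, :]
        best = min(best, np.sqrt((diff ** 2).sum(axis=-1)).min())
    return float(best)


# Made-up example: two atoms in a 5 nm cubic box.
print(min_image_distance([[1, 1, 1], [4, 1, 1]], [5, 5, 5]))
```

In the made-up example, the atom at x = 4 nm and the image of the atom at x = 1 nm (shifted to x = 6 nm) are 2 nm apart, which is closer than any atom is to its own image (5 nm).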
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
=== RMSF for protein and C-alpha and Pymol analysis of average and bfactor ===
+
=== RMSF for protein and C-alpha and PyMol analysis of average and b-factor ===
   
 
==== Protein ====
 
==== Protein ====
Line 269: Line 291:
 
First of all, we calculate the RMSF for the whole protein.
 
First of all, we calculate the RMSF for the whole protein.
   
The analysis produce two different pdb files, one file with the average structure of the protein and one file with high B-Factor values, which means that the high flexbile regions of the protein are not in accordance with the original PDB file.
+
The analysis produces two different PDB files, one file with the average structure of the protein and one file with high B-factor values, which means that the highly flexible regions of the protein are not in accordance with the original PDB file.
   
To compare the structure we align them with pymol with the original structure.
+
To compare the structures, we aligned them to the original structure with PyMol.
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 280: Line 302:
 
|colspan="3" | Perspective one
 
|colspan="3" | Perspective one
 
|-
 
|-
| [[Image:average_original.png|thumb|Figure 7: Alignment of the original structure (green) and the calculated average structure (turquoise)]]
+
| [[Image:average_original.png|thumb|Figure 10: Alignment of the original structure (green) and the calculated average structure (turquoise)]]
| [[Image:bfactor_original.png|thumb|Figure 8: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)]]
+
| [[Image:bfactor_original.png|thumb|Figure 11: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)]]
| [[Image:bfactor_average.png|thumb|Figure 9: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)]]
+
| [[Image:bfactor_average.png|thumb|Figure 12: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)]]
 
|-
 
|-
 
|colspan="3" | Perspective two
 
|colspan="3" | Perspective two
 
|-
 
|-
| [[Image:average_original_2.png|thumb|Figure 10: Alignment of the original structure (green) and the calculated average structure (turquoise)]]
+
| [[Image:average_original_2.png|thumb|Figure 13: Alignment of the original structure (green) and the calculated average structure (turquoise)]]
| [[Image:bfactor_original_2.png|thumb|Figure 11: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)]]
+
| [[Image:bfactor_original_2.png|thumb|Figure 14: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)]]
| [[Image:bfactor_average_2.png|thumb|Figure 12: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)]]
+
| [[Image:bfactor_average_2.png|thumb|Figure 15: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)]]
 
|-
 
|-
 
|colspan="3" | RMSD
 
|colspan="3" | RMSD
Line 298: Line 320:
 
|}
 
|}
   
The structure with the high B-factors is the most similar structure (Figure 8 and Figure 11) compared with the original structure from PDB (Figure 7 and Figure 10). The average structure is not that similar (Figure 10 and Figure 12). But we know, that the regions with high B-Factors are very flexible, and therefore in the structure downloaded from the PDB, the protein is in another state, because of its flexible regions. Therefore, because of the low RMSD between the high B-factors structure and the original structure we can see, that the simulation predicts the structure quite good.
+
The structure with the high B-factors is the most similar structure compared to the original structure from the PDB (Figure 11 and Figure 14). The average structure is not that similar (Figure 10 and Figure 13). But we know that the regions with high B-factor values are very flexible, and therefore the protein in the structure downloaded from the PDB is in another state because of its flexible regions. Because of the low RMSD between the high B-factor structure and the original structure, we can see that the simulation predicts the structure quite well.
   
Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 13:
+
Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 16:
   
[[Image:rmsf_protein.png|thumb|center|Figure 13: Plot of the RMSF values over the whole protein.]]
+
[[Image:rmsf_protein.png|thumb|center|Figure 16: Plot of the RMSF values over the whole protein.]]
   
There are two regions with very high B-factor values. One region at position 150 (Figure 14), and the other region at position 440 (Figure 15). If we compare the picture of the original and the average structure, we can see that most of the regions build a very good alignment, whereas some regions vary in their position. Therefore, we want to compare, if these regions are the regions with very high B-factor values.
+
There are two regions with very high B-factor values: one region at position 150 (Figure 17) and the other at position 440 (Figure 18). If we compare the pictures of the original and the average structure, we can see that most of the regions align very well, whereas some regions vary in their position. Therefore, we want to check whether these regions are the regions with very high B-factor values.
   
 
{|
 
{|
| [[Image:140-160.png|thumb|Figure 14: Part of the alignment between the original structure and the average structure between residue 140 and 160.]]
+
| [[Image:140-160.png|thumb|Figure 17: Part of the alignment between the original structure and the average structure between residue 140 and 160.]]
| [[Image:430-450.png|thumb|Figure 15: Part of the alignment between the original structure and the average structure between residue 430 and 450.]]
+
| [[Image:430-450.png|thumb|Figure 18: Part of the alignment between the original structure and the average structure between residue 430 and 450.]]
 
|}
 
|}
   
Furthermore, we visualized the B-factors with the pymol selection B-factor method. We calculated the B-factors for the blue protein (Figure 16 and Figure 17). If you see red, this part of the protein is very flexible. The brighter the color, the higher is the flexibility of this residue.
+
Furthermore, we visualized the B-factors with the PyMol selection B-factor method. We calculated the B-factors for the blue protein (Figure 19 and Figure 20). If you see red, this part of the protein is very flexible. The brighter the color, the higher the flexibility of this residue.
   
 
{|
 
{|
| [[Image:140.png|thumb|Figure 16: Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color]]
+
| [[Image:140.png|thumb|Figure 19: Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color]]
| [[Image:430.png|thumb|Figure 17: Part of the alignment between the original structure and the average structure between residue 430 and 450. High B-Factor value -> bright color]]
+
| [[Image:430.png|thumb|Figure 20: Part of the alignment between the original structure and the average structure between residue 430 and 450. High B-Factor value -> bright color]]
 
|}
 
|}
   
In the second picture, you can see, that the color is dark blue. Therefore a peak lower than 0.3 do not mean that there is high flexibility. Therefore, our protein has only one very flexible region and this is around residue 140.
+
In the second picture, you can see that the color is dark blue. Therefore, a peak lower than 0.3 does not mean that there is high flexibility. Hence, our protein has only one very flexible region, and this is around residue 140.
   
   
As you can see in the pictures above, especially in the first picture, which is the part with the highest peak in the plot, the structures have a very different position and the alignment in this part of the protein is very bad, although the rest of the alignment is quite good. This also explains the relatively high RMSD values, because of the different positions of the flexible parts of the protein.
+
As you can see in the pictures above, especially in the first picture, which is the part with the highest peak in the plot, the structures have very different positions and the alignment in this part of the protein is very bad, although the rest of the alignment is quite good. This also explains the relatively high RMSD values, because of the different positions of the flexible parts of the protein.
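
The relation between the RMSF values in the plot and the B-factor values used for the coloring can be sketched as follows. We assume here that the trajectory has already been fitted to a reference structure and that the B-factor is derived from the RMSF by the usual isotropic relation B = (8π²/3) · RMSF². The function names and the toy coordinates are our own illustration, not the output of the simulation.

```python
import numpy as np


def rmsf(trajectory):
    """Root mean square fluctuation of each atom around its average position.

    trajectory: array of shape (n_frames, n_atoms, 3), assumed to be
    already fitted to a reference structure.
    """
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                      # average structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=-1)    # squared deviation per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))


def b_factor(rmsf_values):
    """Isotropic B-factor from RMSF: B = (8 * pi^2 / 3) * RMSF^2.

    RMSF in nm gives B in nm^2; multiply by 100 to get the Angstrom^2
    used in PDB files.
    """
    return (8.0 * np.pi ** 2 / 3.0) * np.asarray(rmsf_values, dtype=float) ** 2


# Toy trajectory: atom 0 never moves, atom 1 oscillates between x = -1 and x = +1.
toy = [
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]
fluct = rmsf(toy)
print(fluct)
print(b_factor(fluct))
```

The averaged positions computed inside <code>rmsf</code> correspond to the average structure, while the per-atom fluctuation is what appears as a peak in the RMSF plot.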
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
==== C-alpha ====
 
==== C-alpha ====
   
Now we repeat the analysis done for the protein for the C-alpha atoms of the protein. Therefore, we followed the same steps as in the section above.
+
Now we repeat the analysis that was done for the whole protein for the C-alpha atoms of the protein only. Therefore, we followed the same steps as in the section above.
   
To compare the structure we align them with pymol with the original structure.
+
To compare the structures, we aligned them to the original structure with PyMol.
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 336: Line 361:
 
|colspan="3" | Perspective one
 
|colspan="3" | Perspective one
 
|-
 
|-
| [[Image:average_original_c.png|thumb|Figure 18: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)]]
+
| [[Image:average_original_c.png|thumb|Figure 21: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)]]
| [[Image:bfactor_original_c.png|thumb|Figure 19: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)]]
+
| [[Image:bfactor_original_c.png|thumb|Figure 22: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)]]
| [[Image:bfactor_average_c.png|thumb|Figure 20: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)]]
+
| [[Image:bfactor_average_c.png|thumb|Figure 23: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)]]
 
|-
 
|-
 
|colspan="3" | Perspective two
 
|colspan="3" | Perspective two
 
|-
 
|-
| [[Image:average_original_c_2.png|thumb|Figure 21: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)]]
+
| [[Image:average_original_c_2.png|thumb|Figure 24: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)]]
| [[Image:bfactor_original_c_2.png|thumb|Figure 22: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)]]
+
| [[Image:bfactor_original_c_2.png|thumb|Figure 25: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)]]
| [[Image:bfactor_average_c_2.png|thumb|Figure 23: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)]]
+
| [[Image:bfactor_average_c_2.png|thumb|Figure 26: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)]]
 
|-
 
|-
 
|colspan="3" | RMSD
 
|colspan="3" | RMSD
Line 354: Line 379:
 
|}
 
|}
   
As in the section above, the RMSD between the structure with high B-factor values and the original structure is the most similar (Figure 19 and Figure 22). This was expected, because we used twice the same model, but in this case we neglecte the residues of the atoms. But the backbone of the protein remains the same. The other two models (Figure 18, Figure 20, Figure 21 and Figure 23) have nearly the same RMSD value and therefore there are equally.
+
The structure alignments and the corresponding RMSD values deliver the same results as in the section above. The RMSD of the alignment between the structure with high B-factor values and the original one is the smallest, which indicates that these structures align best (Figure 22 and Figure 25). This was expected, because we used the same model twice, but in this case we neglected the side chains of the residues. The backbone of the protein remains the same. The other two alignments (Figure 21, Figure 23, Figure 24 and Figure 26) have nearly the same RMSD value and are therefore equally similar.
   
Furthermore, we got a plot of the RMSF values of the protein, which can be seen on Figure 24:
+
Furthermore, we got a plot of the RMSF values of the protein, which can be seen on Figure 27:
   
[[Image:bfactor_plot_calpha.png|thumb|center|Figure 24: Distribution of the b-factor values by only regarding the backbone of the protein.]]
+
[[Image:bfactor_plot_calpha.png|thumb|center|Figure 27: Distribution of the b-factor values by only regarding the backbone of the protein.]]
   
In this case, there is only one high peak at position 150. By observing the whole protein, it was possible to see, that the position of the beta sheet differs extremely between the two models. The other peak at position 440 could not be found in the plot. By a look at the picture above, we can see that the backbone do not differ extremely between the two models. Therefore, in this case the position of the residues has to be very different, which is not important in our case, because we do not regard side chains.
+
In this case, there is only one high peak at position 150. Having a closer look at the protein, it can be seen that the position of the beta sheets differs extremely between the two models.
  +
The other peak at position 440 could not be found in the plot. Looking at the pictures above, we can see that the backbones of the two models do not differ extremely. This means that the positions of the side chains of the residues differ a lot, which is not important for us, because we do not regard side chains.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== Radius of gyration ===
 
=== Radius of gyration ===
   
Next, we want to analyse the Radius of gyration. Therefore we use g_gyrate and use only the protein for the calculation.
+
Next, we want to analyze the radius of gyration. Therefore we use g_gyrate and use only the protein for the calculation.
   
 
{|
 
{|
|[[Image:radius_of_gyration.png|thumb|center|Figure 25: Plot with the distribution of the radius of gyration over time]]
+
|[[Image:radius_of_gyration.png|thumb|center|Figure 28: Plot with the distribution of the radius of gyration over time]]
|[[Image:radius_of_gyration_bunt.png|thumb|center|Figure 26: Plot with the distribution of the radius of gyration over time for each axis]]
+
|[[Image:radius_of_gyration_bunt.png|thumb|center|Figure 29: Plot with the distribution of the radius of gyration over time for each axis]]
 
|}
 
|}
   
Line 398: Line 427:
 
|}
 
|}
   
  +
Figure 28 shows the radius of gyration over the simulation time.
Figure 25 shows the radius of gyration over the simulation time. The Radius of gyration is the RMS distance of its parts from its center or gyration axis. On the plot it could be seen, that the average radius is about 2.4, but there are big differences, which means, that the protein is flexible. The distance between the different parts of the protein and the center differs so the protein seems to pulsate, because there is a periodic curve which shows the loss and the gain of space the protein needs.
 
  +
The radius of gyration is the RMS distance of the atoms of the protein from its center of mass or from a gyration axis.
  +
The plot shows that the average radius is about 2.4 nm with some fluctuation. This indicates that the protein is flexible. Furthermore, the fluctuation follows a periodic curve, which shows the loss and the gain of space the protein needs. This suggests that the protein pulsates.
   
If we have a look also to the radius of the different axis (Figure 26), we can see, that only the radius of the x coordinates is mostly consistent around 2nm during the whole simulation. The radius of the z axis shows more deflection than the x coordinates values and decreae during the simulation. Especially at the end of the simulation, the Rg values for z are very low. The y axis values, however, increase during the simulation, but do not reach the values of the x axis. So, therefore, we can see, that most of the Radius of gyration is because of motions in x and at the beginning in z and at the end in y direction.
+
If we take a further look at the radius for the different axes (Figure 29), we can see that the radius for the x axis is the only consistent one, at about 2 nm during the whole simulation. The radius for the z axis decreases towards the end of the simulation, where the Rg values for z are very low. The y axis values, however, increase during the simulation, but do not reach the values of the x axis. This shows that the motion in x direction, the motion in z direction at the beginning and the motion in y direction at the end of the simulation have the most influence on the overall radius of gyration.
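
The definition of the radius of gyration can be written down in a few lines. The following is a minimal sketch with made-up coordinates. We assume that the per-axis values are the radii about the x, y and z axes, which means that each of them uses only the two coordinates perpendicular to that axis. The function names are our own.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from the center of mass."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    sq_dist = ((coords - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def radii_about_axes(coords, masses):
    """Radii of gyration about the x, y and z axes (assumed definition)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    # Mass-weighted sum of squared displacements, separately for x, y and z.
    comp = (masses[:, None] * (coords - com) ** 2).sum(axis=0)
    # The radius about one axis uses only the two perpendicular components.
    return np.sqrt((comp.sum() - comp) / masses.sum())


# Made-up example: two atoms of equal mass on the x axis, 2 nm apart.
toy_coords = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
toy_masses = [1.0, 1.0]
print(radius_of_gyration(toy_coords, toy_masses))
print(radii_about_axes(toy_coords, toy_masses))
```

With this definition, the squared radii about the three axes add up to twice the squared total radius, so a low value about one axis has to be compensated by the other two if the total radius stays constant.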
   
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
=== solvent accesible surface area ===
 
  +
<br><br>
  +
  +
=== solvent accessible surface area ===
   
Next, we analysed the solvent accesible surfare area of the protein, which is the area of the protein which has contacts with the surronding environment, mainly water.
+
Next, we analyzed the solvent accessible surface area of the protein, which is the area of the protein that is in contact with the surrounding environment, mainly water.
   
First of all, we have a look at the solvent accesibility of each residue, which can be seen on Figure 27.
+
First of all, we have a look at the solvent accessibility of each residue, which can be seen in Figure 30.
   
 
{|
 
{|
|[[Image:mut436_md_residue_solvent.png|thumb|center|Figure 27: Solvent accesibility area of each residue in the protein]]
+
|[[Image:mut436_md_residue_solvent.png|thumb|center|Figure 30: Solvent accessibility area of each residue in the protein]]
|[[Image:mut436_md_residue_solvent_bunt.png|thumb|center|Figure 28: Solvent accesibility area of each residue in the protein with standard deviation]]
+
|[[Image:mut436_md_residue_solvent_bunt.png|thumb|center|Figure 31: Solvent accessibility area of each residue in the protein with standard deviation]]
 
|}
 
|}
   
The following table list the average, minimum and maximum values of the Solvent accessibility for each residue in the protein. The residues at the beginning and at the end of the simulation which have a value of 0 are ignored.
+
The following table lists the average, minimum and maximum values of the solvent accessibility for each residue in the protein. The residues at the beginning and at the end of the sequence which have a value of 0 are ignored.
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
|Average (in nm^2)
+
|Average (in nm²)
 
|0.553
 
|0.553
 
|-
 
|-
|Minimum (in nm^2)
+
|Minimum (in nm²)
 
|0.003
 
|0.003
 
|-
 
|-
|Maximum (in nm^2)
+
|Maximum (in nm²)
 
|2.005
 
|2.005
 
|-
 
|-
 
|}
 
|}
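
The values in the table above can be obtained from the per-residue areas by leaving out the residues with a value of 0. The following is only a sketch with made-up numbers; the function name <code>summarize_nonzero</code> is our own.

```python
def summarize_nonzero(areas):
    """Average, minimum and maximum of all per-residue areas that are not 0."""
    nonzero = [area for area in areas if area != 0]
    if not nonzero:
        raise ValueError("all values are 0")
    return {
        "average": sum(nonzero) / len(nonzero),
        "minimum": min(nonzero),
        "maximum": max(nonzero),
    }


# Made-up per-residue areas in nm^2; the zeros stand for the ignored residues.
per_residue_area = [0.0, 0.0, 0.25, 0.5, 0.75, 0.0]
sas_stats = summarize_nonzero(per_residue_area)
print(sas_stats)
```

Leaving out the zeros matters for the average and the minimum: if they were included, the minimum would always be 0 and the average would be lower.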
   
The average area per residue during the trajectory is between 0 and 2nm^2, as can be seen on Figure 27. Most of the residues have an area about 0.5nm^2. So therefore, there are some very flexible residues, but most of the residues only move a little bit during the complete simulation. In Figure 28, you can also see the standard deviation, which is very low, so therefore in there are no big outlier, which means that there is no big deviation from the average area and the residues behave in the same way during the trajectory.
+
The average area per residue during the trajectory is between 0 and 2 nm², which can be seen in Figure 30. Most of the residues have an area of about 0.5 nm². From this it follows that most residues move only a little during the complete simulation, with some exceptions where the residues are very flexible. In Figure 31, you can additionally see the standard deviation, which is very low and indicates that there are no big outliers. This means that there is no big deviation from the average area and that the residues behave in the same way during the trajectory.
  +
Furthermore, it is also possible to look at the solvent accesibility of each atom of the complete protein, which can be seen on Figure 29 and Figure 30.
 
  +
Besides, we can analyze the position of the residues within the protein based on the solvent accessibility. First, we can see in Figure 30 that the first 100 and the last 100 residues have an average solvent accessibility of 0, which means that these residues are always completely in the interior of the protein. Most of the residues have a solvent accessibility of about 0.5 nm², and there are only some outliers with an accessibility of more than 1.5 nm². This means that there are some residues which are almost always on the surface, a lot of residues which are partly or temporarily on the surface and a lot of residues which are never on the surface.
<br>
 
  +
Looking at Figure 31, we can see that the standard deviation is relatively low. This means that there are no system states in which residues with low or no solvent accessibility become completely accessible to the solvent. If the standard deviation were very high, it would indicate that there are some very unusual states in the simulation, which is not the case in our simulation.
We can see on the Figure 27 that the first 100 and the last 100 residues have an average solvent accessibility of 0, therefore, this means these residues are always completely in the interior of the protein. A lot of residues have a solvent accessibility about 0.5nm, and there are only some outlier with an accessibility of more than 1.5nm^2. So therefore, there are some residues which are almost always on the surface, a lot of residues which are partly or temporarly on the surface and also a lot of residues which are never on the surface.
 
  +
If we look at Figure 28, we can see that the standard deviation is relatvely low, which means that there are no states of the system in which some residues with low or no solvent accessibility are complete accessible to the surface. If the standard deviation would be very high, it could be possible that there are states in the simulation which are very unusual. But this is not the case in our simulation.
 
  +
Furthermore, it is possible to look at the solvent accessibility of each atom of the complete protein, which can be seen in Figure 32 and Figure 33.
  +
 
{|
 
{|
|[[Image:mut436_md_atomic_solvent.png|thumb|center|Figure 29: Solvent accesibility of each atom of the complete protein.]]
+
|[[Image:mut436_md_atomic_solvent.png|thumb|center|Figure 32: Solvent accessibility of each atom of the complete protein.]]
|[[Image:mut436_md_atomic_solvent_bunt.png|thumb|center|Figure 30: Solvent accesibility of each atom of the complete protein with standard deviation.]]
+
|[[Image:mut436_md_atomic_solvent_bunt.png|thumb|center|Figure 33: Solvent accessibility of each atom of the complete protein with standard deviation.]]
 
|}
 
|}
   
Line 440: Line 476:
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
|Average (in nm^2)
+
|Average (in nm²)
 
|0.032
 
|0.032
 
|-
 
|-
|Minimum (in nm^2)
+
|Minimum (in nm²)
 
|0
 
|0
 
|-
 
|-
|Maximum (in nm^2)
+
|Maximum (in nm²)
 
|0.558
 
|0.558
 
|-
 
|-
 
|}
 
|}
   
In Figure 29 the average area per atom is ploted, which shows a similar picture as on Figure 27. In general the atoms have not that big area as the residues, which is clear, because the area of the residues consist of the area of the single atoms which belong to this residue. There is a huge number of atoms which have an area of about 0nm^2. As before, the standard deviation is not that high (Figure 30). It is a little bit higher than on Figure 28, but that was expected, because this Figure is more detailed and the scale is smaller, but in general Figure 29 and Figure 30 confirm the results of Figure 27 and Figure 28.
+
In Figure 32 the average area per atom is plotted, which delivers similar results to Figure 30. In general, the atoms do not have as large an area as the residues. This is easily explained, because the area of a residue is the sum of the areas of the single atoms which belong to this residue.
  +
There is a huge number of atoms which have an area of about 0 nm². As before, the standard deviation is not that high (Figure 33). It is a little bit higher than the one in Figure 31, which was expected because of the smaller and more detailed scale of this figure. In general, Figure 32 and Figure 33 confirm the results of Figure 30 and Figure 31.
   
At the end of the plot, there are a lot of atoms which have a surface accessibility area of 0, which is consistent with the result of the residues. But at the beginning of the plot, there are no atoms which have no surface accessibility area. But there are a lot of atoms with low or no accessibility area in the plot and we know, that gromacs is a non-deterministic algorithm. Therefore, this result should be consistent with the results of the different residues.
+
At the end of the plot, there are a lot of atoms which have a surface accessibility area of 0, which is consistent with the result for the residues. At the beginning of the plot, however, there are no atoms without any surface accessibility area. Still, there are a lot of atoms with low or no accessibility area in the plot. Since simulations with GROMACS are not strictly deterministic, we consider this result consistent with the results for the different residues.
  +
<br><br>
 
  +
Figure 31 shows how much of the area of the protein is accesibile to the surface during the complete simulation. As we saw before, by the gyration radius of the protein, the values differ during the simulation, which shows, that the protein is flexible.
 
  +
Figure 34 shows how much of the area of the protein is accessible to the solvent during the complete simulation. As we saw before for the radius of gyration of the protein, the values vary during the simulation, which shows that the protein is flexible.
   
 
{|
 
{|
|[[Image:mut436_md_solvent.png|thumb|center|Figure 31: Area of the protein which is accesible to the surface during the simulation.]]
+
|[[Image:mut436_md_solvent.png|thumb|center|Figure 34: Area of the protein which is accessible to the surface during the simulation.]]
|[[Image:mut436_md_solvent_bunt.png|thumb|center|Figure 32: Area of the protein which is accesible to the surface during the simulation with standard deviation.]]
+
|[[Image:mut436_md_solvent_bunt.png|thumb|center|Figure 35: Area of the protein which is accessible to the surface during the simulation with standard deviation.]]
 
|}
 
|}
   
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
|Average (in nm^2)
+
|Average (in nm²)
 
|138.727
 
|138.727
 
|-
 
|-
|Minimum (in nm^2)
+
|Minimum (in nm²)
 
|127.066
 
|127.066
 
|-
 
|-
|Maximum (in nm^2)
+
|Maximum (in nm²)
 
|146.571
 
|146.571
 
|-
 
|-
 
|}
 
|}
   
On Figure 31 and Figure 32 we can see the solvent accessibility surface of the protein during the simulation. The surface accessibility of the hydrophobic residues has an area of about 140nm^2, which is relatively consistent during the complete simulation. If we have a closer look to the distribution of the different pysiocochemical properties and the surface accessibility of them that the area of the hydrophobic amino acids is larger than the are of the hydrophilic amino acids. This is really surprising, because in general hydrophobic amino acids prefer a location in the core of the protein and not on the surface.
+
Figure 34 and Figure 35 display the solvent accessible surface of the whole protein during the simulation. The surface accessibility of the hydrophobic residues has an area of about 140 nm², which is relatively constant during the complete simulation. The second plot describes the solvent accessibility for different physicochemical properties. It shows that the accessibility of the hydrophobic amino acids is larger than that of the hydrophilic amino acids, which is unexpected. Normally, hydrophobic amino acids prefer a location in the core of the protein and not on the surface.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== hydrogen-bonds ===
 
=== hydrogen-bonds ===
   
In this case, we differ between hydrogen-bonds between the protein itselfs and bonds between the protein and the water.
+
As a next step we analyze the hydrogen bonds formed during the simulation. Here, we distinguish between hydrogen bonds within the protein itself and hydrogen bonds between the protein and the water.
   
As before, it is possible to see in the plot, that the protein is flexible, because the number of bonds differ extremely over the time.
+
The following plots display the number of hydrogen bonds within the protein over the simulation time.
   
 
{|
 
{|
|[[Image:mut436_mt_number_intra.png|thumb|center|Figure 33: Number of hydrogen-bonds in the protein over simulation time]]
+
|[[Image:mut436_mt_number_intra.png|thumb|center|Figure 36: Number of hydrogen-bonds in the protein over simulation time]]
|[[Image:mut436_mt_number_intra_bunt.png|thumb|center|Figure 34: Number of hydrogen-bonds and the possible hydrogen-bonds in the protein over simulation time]]
+
|[[Image:mut436_mt_number_intra_bunt.png|thumb|center|Figure 37: Number of hydrogen-bonds and the possible hydrogen-bonds in the protein over simulation time]]
 
|}
 
|}
   
Line 507: Line 548:
 
|}
 
|}
   
On Figure 33 you can see the bonds between the protein. Here you can see that the number differs between 300 bonds and 355. Most of the time, the protein has between 320 and 330 hydrongen-bonds.
+
In Figure 36 you can see the bonds within the protein. The number varies between 300 and 355 bonds. Most of the time, the protein has between 310 and 330 hydrogen bonds. Besides, the number of bonds declines a bit on average over the simulation time. Furthermore, this plot shows that the protein is flexible, because the number of bonds fluctuates strongly over time.
   
If we look at Figure 34, we can see, that there are a lot more possible hydrogen bindings than occurred in real. There are about 1500 pairs of atoms with a distance less than 0.35nm, which is a distance where a hydrogen bond is theoretically possible. So therefore, you can see, that the protein has only a small number of hydrogen bonds, about 20% of all possible hydrogen bonds. Therefore, this protein could be very flexible, because of the small number of hydrogen bonds.
+
Figure 37 displays the number of hydrogen bonds that occur during the simulation as well as all atom pairs with a distance smaller than 0.35 nm, which is the distance where a hydrogen bond is theoretically possible. This plot shows that there are many more possible hydrogen bonds than actually formed ones. The number of possible pairs is about 1500, whereas the number of formed hydrogen bonds is only between 320 and 330, which is only about 20%. The small number of formed hydrogen bonds may indicate a high flexibility of the protein.
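The percentage above is a simple ratio of formed to possible bonds. A minimal sketch of this arithmetic follows; the function name is hypothetical and the numbers are the approximate values read from the plot, not exact counts.

```python
def formed_fraction(formed_bonds, possible_pairs):
    """Fraction of pairs within 0.35 nm that actually form a hydrogen bond."""
    return formed_bonds / possible_pairs


# Approximate values quoted in the text for the bonds within the protein.
POSSIBLE_PAIRS = 1500
for formed in (320, 330):
    share = 100.0 * formed_fraction(formed, POSSIBLE_PAIRS)
    print(f"{formed} of {POSSIBLE_PAIRS} pairs: {share:.1f} %")
```

Both ends of the quoted range give a share slightly above 20%, which matches the rounded statement in the text.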
   
  +
The following plots display the number of hydrogen bonds between the protein and the surrounding water over the simulation time.
   
 
{|
 
{|
|[[Image:mut436_md_number_water.png|thumb|center|Figure 35: Number of hydrogen-bonds between the protein and the surronding water.]]
+
|[[Image:mut436_md_number_water.png|thumb|center|Figure 38: Number of hydrogen-bonds between the protein and the surrounding water.]]
|[[Image:mut436_md_number_water_bunt.png|thumb|center|Figure 36: Number of hydrogen-bonds and the possible hydrogen-bonds between the protein and the surronding water.]]
+
|[[Image:mut436_md_number_water_bunt.png|thumb|center|Figure 39: Number of hydrogen-bonds and the possible hydrogen-bonds between the protein and the surrounding water.]]
 
|}
 
|}
   
Line 536: Line 578:
 
|}
 
|}
   
If we have a look at the number of bonds between the protein and the water, which are visualized on Figure 35, we can see that there are a lot more bonds between protein and water than in between the protein. The number differs between 800 and 900 and there there are about 3 times more bonds between the protein and the water. Over the simulation time, the number of bonds between water and protein grows in average. But most of the time, the protein has between 840 and 860 bonds with the water.<br>
+
Looking at the number of hydrogen bonds formed between the protein and the surrounding water, which is visualized in Figure 38, we can see that there are many more bonds between protein and water than within the protein. The number varies between 800 and 900, which is about 3 times more than the number within the protein. Besides, the average number of bonds between water and protein increases a bit over the simulation time. However, most of the time, the protein forms between 840 and 860 bonds with the surrounding water.
   
On Figure 36 we can see how many pairs of residues have a distance of less than 0.35 nm, which means these pairs are able to build hydrogen bonds with each other. The distance of possible and real occurring hydrogen bonds is significantly lower than on Figure 34. So therefore, in this case almost 80% of all possible hydrogen bonds are also real hydrogen bonds. Therefore, we can see that the binding between protein and water is really stable.
+
Figure 39 additionally displays the number of pairs with a distance of less than 0.35 nm, which is the distance where a hydrogen bond is theoretically possible. The number of pairs within 0.35 nm is about 1000. Compared to Figure 37, the gap between possible and actually formed hydrogen bonds is significantly smaller. In this case almost 80% of all possible hydrogen bonds are also real hydrogen bonds. Therefore, we can see that the binding between protein and water is really stable.
   
This is no surprise, because every residue on the surface has contact with water, whereas in the protein there are a lot of amino acids which do not have contact partners, because the other amino acids are too far away.
+
This is no surprise, because every residue on the surface has contact with water, whereas within the protein there are a lot of amino acids which have no contact partners, because the other amino acids are too far away.
  +
<br>
 
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== Ramachandran plot ===
 
=== Ramachandran plot ===
   
Now, we want to have a closer look to the secondary structure of the protein during the simulation. Therefore, we used a Ramachandran plot to analyse the phi and psi torsion angles of the backbone to get a better understanding of the secondary structure during the simulation.
+
Now, we want to have a closer look at the secondary structure of the protein during the simulation. Therefore, we used a Ramachandran plot to analyze the phi and psi torsion angles of the backbone to get a better understanding of the secondary structure during the simulation.
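The phi and psi angles of a Ramachandran plot are dihedral angles of four consecutive backbone atoms: phi is defined by C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of how one such angle can be computed from four coordinates. It is not the code used by the analysis tool, and the coordinates in the usage example are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees, in the range -180 to 180, defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated by a quarter turn
# around the central bond relative to the first point.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Evaluating this function for every residue and every frame yields the (phi, psi) pairs that are drawn as points in the plot.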
   
[[Image:ramachandran_wt_new.png|thumb|center|Figure 37: Ramachandran Plot of the wild type.]]
+
[[Image:ramachandran_mut436_new.png|thumb|center|Figure 40: Ramachandran plot of mutation 436.]]
   
As we can see on Figure 37, there are a lot of beta sheets, alpha helices and right-handed alpha helices. The white regions are the regions where no secondary structure can be found, which is right.
+
As we can see in Figure 40, there are a lot of residues in the regions of beta sheets, right-handed alpha helices and left-handed alpha helices. The white regions are the regions where no secondary structure can be found, which is consistent with the white regions of the standard Ramachandran plot.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== RMSD matrix ===
 
=== RMSD matrix ===
   
Next we analysed the RMSD values. Therefore, we used a RMSD matrix. This is useful to see if there are groups of structures over the simulation that share a common structure. These groups will have lower RMSD values withing their group and higher RMSD values compared to structure which are not in the group.
+
Next we analyzed the RMSD values. Therefore, we used an RMSD matrix. This is useful to see if there are groups of structures over the simulation that share a common structure. These groups will have lower RMSD values within their group and higher RMSD values compared to structures which are not in the group.
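The idea of an RMSD matrix can be sketched in a few lines: every frame is compared with every other frame and the RMSD is stored in a symmetric matrix. This is only an illustration under the assumption that all frames are already superimposed on a common reference; the analysis tool performs this fit itself, and the function names here are hypothetical.

```python
import math


def rmsd(frame_a, frame_b):
    """RMSD between two frames given as equally long lists of (x, y, z) tuples."""
    squared = sum((a[k] - b[k]) ** 2
                  for a, b in zip(frame_a, frame_b)
                  for k in range(3))
    return math.sqrt(squared / len(frame_a))


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise RMSD values with zeros on the diagonal."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Toy trajectory with two atoms and three frames (made-up coordinates).
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]
for row in rmsd_matrix(toy_frames):
    print(["%.2f" % value for value in row])
```

In the toy example the first two frames form a group with a low mutual RMSD, while the third frame is far away from both, which is the kind of block pattern one looks for in the plotted matrix.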
   
 
The following matrix shows the RMSD values of our structures.
 
The following matrix shows the RMSD values of our structures.
   
[[Image:mut436_md_rmsd_matrix.png|thumb|center|Figure 38: RMSD matrix of our structures during the simulation]]
+
[[Image:mut436_md_rmsd_matrix.png|thumb|center|Figure 41: RMSD matrix of our structures during the simulation]]
   
As you can see on Figure 38, there is one big group which is colored in green, but it is not possible to find any very dense groups which all have a very low RMSD compared to each other. Therefore, we can conclude, that we do not find very similar structures during the simulation and our protein shows different structures by moving around.
+
As you can see in Figure 41, there is one big group which is colored in green, but it is not possible to find any very dense groups whose members all have a very low RMSD compared to each other. Therefore, we can conclude that there are no very similar structures during the simulation and that our protein adopts different structures while moving around.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== cluster analysis ===
 
=== cluster analysis ===
Line 565: Line 615:
 
Next, we started a cluster analysis. First of all, we found 231 different clusters.
 
Next, we started a cluster analysis. First of all, we found 231 different clusters.
   
We visualized all of these cluster structres in Figure 39:
+
We visualized all of these cluster structures in Figure 42:
   
[[Image:mut436_md_clusters.png|thumb|center|Figure 39: Visualisation of the 231 different clusters]]
+
[[Image:mut436_md_clusters.png|thumb|center|Figure 42: Visualization of the 231 different clusters]]
   
 
Next we aligned some structures of the cluster and measured the RMSD:
 
Next we aligned some structures of the cluster and measured the RMSD:
Line 590: Line 640:
 
To have a better insight into the distribution of the RMSD value between the different clusters, we visualize the distribution in Figure 40.
 
To have a better insight into the distribution of the RMSD value between the different clusters, we visualize the distribution in Figure 43.
   
[[Image:mut436_rmsd_dist.png|thumb|center|Figure 40: Distribution of the RMSD value over the different clusters]]
+
[[Image:mut436_rmsd_dist.png|thumb|center|Figure 43: Distribution of the RMSD value over the different clusters]]
   
On Figure 40, it is possible to see, that the distribution is a gaussian distribution, with the highest peak at 0.18 Angstrom. This means, that most of the structures have a RMSD about 0.18 Angstrom compared to the start structure. This value is not that high, but it is a strong hint, that the protein is flexible during the simulation.
+
In Figure 43, it is possible to see that the distribution is a Gaussian distribution, with the highest peak at 0.18 Angstrom. This means that most of the structures have an RMSD of about 0.18 Angstrom compared to the start structure. This value is not that high, but it is a strong hint that the protein is flexible during the simulation.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>
   
 
=== internal RMSD ===
 
=== internal RMSD ===
Line 598: Line 651:
 
The last point in our analysis is the calculation of the internal RMSD values. This means the distances between the single atoms of the protein, which can us help to obtain the structure of the protein.
 
The last point in our analysis is the calculation of the internal RMSD values. These are based on the distances between the single atoms of the protein, which can help us to understand the structure of the protein.
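A distance-based RMSD needs no superposition, because it only compares interatomic distances. The following is a minimal sketch, assuming that the value for one frame is the root mean square of the differences between all pairwise atom distances in that frame and in a reference frame; the exact normalization used by the analysis tool may differ.

```python
import itertools
import math


def distance_rms(frame, reference):
    """RMS deviation of all pairwise atom distances relative to a reference frame."""
    pairs = list(itertools.combinations(range(len(frame)), 2))
    squared = 0.0
    for i, j in pairs:
        d_frame = math.dist(frame[i], frame[j])
        d_reference = math.dist(reference[i], reference[j])
        squared += (d_frame - d_reference) ** 2
    return math.sqrt(squared / len(pairs))


# Made-up example: the second atom moves away from the first one.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(distance_rms(frame, reference))
```

Because only internal distances enter the calculation, a rigid rotation or translation of the whole protein leaves the value unchanged.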
   
[[Image:mut436_md_internal_rms.png|thumb|center|Figure 41: Plot of the distance RMS values in the protein.]]
+
[[Image:mut436_md_internal_rms.png|thumb|center|Figure 44: Plot of the distance RMS values in the protein.]]
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 612: Line 665:
 
|}
 
|}
   
As we can see on Figure 41, at the beginning of our simulation the RMSD is relatively small, but it decreases almost the complete simulation. Only at Time 6000 there is a vally in the plot. The internal RMSD is in the end at 0.25 Angstorm, which is not relatively high. Therefore the protein is big distances in it self.
+
Figure 44 shows that the RMSD increases steadily during the whole simulation. At the beginning the RMSD is relatively small; it then rises very fast until it reaches a point from which it rises more slowly. Only at time 6000 is there a valley in the plot. At the end, the internal RMSD reaches about 0.25 Angstrom, which is not particularly high. Therefore the protein has big distances within itself.
  +
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]].
  +
<br><br>

Latest revision as of 13:08, 29 September 2011

check the trajectory

We checked the trajectory and got the following results:

Reading frame       0 time    0.000   
# Atoms  96555
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Furthermore, we got some detailed results about the different items during the simulation.

Item #frames Timestep (ps)
Step 2001 5
Time 2001 5
Lambda 0 -
Coords 2001 5
Velocities 0 -
Forces 0 -
Box 2001 5

The simulation finished on node 0 Friday August 26 08:40:07 2011.

Time
Node (s) Real (s) %
34860.474 34860.474 100%
9h41:00

The complete simulation needed 9 hours and 41 minutes of runtime.

Performance
Mnbf/s GFlops ns/day hour/ns
818.560 60.105 24.785 0.968

As you can see in the table above, it takes about 1 hour to simulate 1 nanosecond (ns) of the system. Therefore, it would be possible to simulate about 25 ns in one full day of calculation time.
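The two performance figures follow directly from the wall-clock time and the simulated time. A small sketch of this arithmetic, using the values from the tables above (the function name is hypothetical):

```python
# Values copied from the tables above.
WALL_TIME_S = 34860.474   # "Real (s)" column of the time table
SIMULATED_PS = 10000.0    # time stamp of the last frame in ps


def throughput(wall_time_s, simulated_ps):
    """Return (ns_per_day, hours_per_ns) for a finished run."""
    simulated_ns = simulated_ps / 1000.0
    hours_per_ns = wall_time_s / 3600.0 / simulated_ns
    ns_per_day = 24.0 / hours_per_ns
    return ns_per_day, hours_per_ns


ns_per_day, hours_per_ns = throughput(WALL_TIME_S, SIMULATED_PS)
print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

The result agrees with the ns/day and hour/ns columns of the performance table.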

Back to [Tay-Sachs Disease].

Visualize in PyMol

First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx gives the user the possibility to choose different parameters. We decided to visualize the system with the following parameters:

Group 1 Group 2
System Water
Protein Ion
Backbone NA
MainChain+H CL
SideChain

Figure 1 shows the visualization with ngmx:

Figure 1: Visualization of the MD simulation for Mutation 436 with ngmx

Furthermore, we also visualized the structure with PyMol, which can be seen in Figure 2.

Figure 2: Visualization of the MD simulation for mutation 436 with PyMol

Back to [Tay-Sachs Disease].

Create a movie

Next, we want to visualize the motion of the protein with PyMol. Therefore, we extracted 1000 frames from the trajectory, leaving out the water and removing jumps over the periodic boundaries to obtain continuous trajectories.

The program asks for a group as output. We only want to see the protein, therefore we decided to choose group 1.

Here you can see the movie in stick and cartoon mode.

Figure 3: Movie of the motion of mutation 436 in stick view.
Figure 4: Movie of the motion of mutation 436 in cartoon view

In Figure 3 and Figure 4, we can see the motion of the protein over time, which was created by the MD simulation.

Back to [Tay-Sachs Disease].

energy calculations for pressure, temperature, potential and total energy

Pressure

Average (in bar) 1.0066
Error Estimation 0.014
RMSD 71.218
Tot-Drift -0.083422
Minimum (in bar) -219.7197
Maximum (in bar) 238.8288

The plot with the pressure distribution of the system can be seen here:

Figure 5: Plot of the pressure distribution of the MD system.

As you can see in Figure 5, the pressure in the system is about 1 bar on average, but there are big outliers of about -220 bar and 240 bar. Therefore, we are not sure if a protein can work at such pressures.
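The rows of the energy tables can be illustrated with a small sketch. It assumes that "RMSD" is the fluctuation around the average and that "Tot-Drift" is the slope of a least-squares line multiplied by the covered time span; the exact definitions used by the GROMACS energy tool may differ, and the data in the usage example are made up.

```python
import math


def series_statistics(times, values):
    """Average, fluctuation, total drift, minimum and maximum of a time series."""
    n = len(values)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Fluctuation of the values around their average.
    rmsd = math.sqrt(sum((v - mean_v) ** 2 for v in values) / n)
    # Slope of the least-squares line through (time, value).
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
             / sum((t - mean_t) ** 2 for t in times))
    return {
        "average": mean_v,
        "rmsd": rmsd,
        "tot_drift": slope * (times[-1] - times[0]),
        "minimum": min(values),
        "maximum": max(values),
    }


# Made-up series that fluctuates around 1 without a systematic trend.
print(series_statistics([0, 5, 10, 15, 20], [1.5, 0.5, 1.0, 0.5, 1.5]))
```

A series can therefore have large instantaneous outliers and still show an average close to the target value and a drift close to zero.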

Temperature

Average (in K) 297.94
Error Estimation 0.0029
RMSD 0.944618
Tot-Drift 0.00834573
Minimum (in K) 294.63
Maximum (in K) 300.83

The plot with the temperature distribution of the system can be seen here:

Figure 6: Plot of the temperature distribution of the MD system.

As you can see in Figure 6, most of the time the system has a temperature of about 298 K. The difference between the minimum and the maximum temperature is only about 6 K, which is not that high. But we have to keep in mind that a difference of only a few degrees can destroy the function of a protein. 298 K is about 25°C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37°C.
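The conversion between the two temperature scales used above is a fixed offset of 273.15. A short sketch with the values from the temperature table:

```python
def kelvin_to_celsius(temperature_k):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temperature_k - 273.15


# Values copied from the temperature table above.
for label, temperature_k in [("minimum", 294.63), ("average", 297.94), ("maximum", 300.83)]:
    print(f"{label}: {temperature_k:.2f} K = {kelvin_to_celsius(temperature_k):.2f} °C")
```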

Potential

Average (in kJ/mol) -1.28165e+06
Error Estimation 100
RMSD 1080.9
Tot-Drift -714.814
Minimum (in kJ/mol) -1.2852e+06
Maximum (in kJ/mol) -1.2771e+06

The plot with the potential energy distribution of the system can be seen here:

Figure 7: Plot of the potential energy distribution of the MD system.

As can be seen in Figure 7, the potential energy of the system is between -1.2852e+06 and -1.2771e+06 kJ/mol, which is a relatively low energy. This means that the protein is stable. We can therefore suggest that the protein is able to function at such a low energy and that our simulation could be realistic. Otherwise, if the energy of the simulated system were too high, we could not trust the results, because the protein would be too unstable to work.

Back to [Tay-Sachs Disease].

Total energy

Average (in kJ/mol) -1.0519e+06
Error Estimation 100
RMSD 1322.68
Tot-Drift -708.38
Minimum (in kJ/mol) -1.0557e+06
Maximum (in kJ/mol) -1.0463e+06

The plot with the total energy distribution of the system can be seen here:

Figure 8: Plot of the total energy distribution of the MD system.

As we can see in Figure 8 above, the total energy is a little bit higher than the potential energy. In this case, the energy is between -1.0557e+06 and -1.0463e+06 kJ/mol. These values are still in a range where we can suggest that the energy is low enough for the protein to work.

Back to [Tay-Sachs Disease].

minimum distance between periodic boundary cells

Next we calculate the minimum distance between periodic boundary cells. As before, the program asks for one group to use for the calculation. We decided to use only the protein, because the calculation needs a lot of time and the whole system is significantly bigger than the protein alone. Therefore, we used group 1.

Here you can see the result of this analysis.


Figure 9: Plot of the minimum distance between periodic boundary cells.
Average (in nm) 2.415
Minimum (in nm) 1.408
Maximum (in nm) 4.096

As you can see in Figure 9, the distance varies a lot between the different time steps. The highest distance is 4.096 nm, whereas the smallest distance is only 1.408 nm. Therefore, there are some states during the simulation in which interacting atoms are close together and some states in which they are far apart. Because of the wide range of the minimum distance we can conclude that the protein is flexible.
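The quantity analyzed here can be sketched as follows: for one frame, the selected atoms are shifted by one box length in every direction and the smallest distance between an original atom and a shifted atom is reported. This is a simplified illustration that assumes a cubic box with a hypothetical edge length; the real tool also handles other box shapes.

```python
import itertools
import math


def min_periodic_distance(coords, box_length):
    """Smallest distance between the atoms and their periodic images (cubic box)."""
    # All 26 neighbouring cells, excluding the central cell itself.
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    best = math.inf
    for shift in shifts:
        for atom in coords:
            image = tuple(atom[k] + shift[k] * box_length for k in range(3))
            for other in coords:
                best = min(best, math.dist(image, other))
    return best


# Made-up example: two atoms in a cubic box with an edge length of 5 nm.
print(min_periodic_distance([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 5.0))
```

The loop over all atom pairs and all neighbouring cells is the reason why this analysis is expensive for large groups, which is why only the protein was selected.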

Back to [Tay-Sachs Disease].

RMSF for protein and C-alpha and PyMol analysis of average and b-factor

Protein

First of all, we calculate the RMSF for the whole protein.

The analysis produces two different PDB files: one file with the average structure of the protein and one file with the B-factor values. High B-factor values mark the highly flexible regions of the protein, which can deviate from the original PDB file.

To compare the structures we align them with the original structure in PyMol.
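The relation between the RMSF and the B-factor can be sketched as follows. The sketch assumes that all frames are already fitted to a reference structure and uses the standard isotropic relation B = (8π²/3)·RMSF², with the RMSF converted from nm to Angstrom. The function names are hypothetical and the coordinates in the usage example are made up.

```python
import math


def rmsf_per_atom(frames):
    """RMSF of every atom; frames is a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        # Average position of this atom over all frames.
        mean = [sum(frame[atom][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation from the average position.
        msf = sum((frame[atom][k] - mean[k]) ** 2
                  for frame in frames
                  for k in range(3)) / n_frames
        result.append(math.sqrt(msf))
    return result


def b_factor(rmsf_nm):
    """Isotropic B-factor in square Angstrom for an RMSF given in nm."""
    rmsf_angstrom = rmsf_nm * 10.0
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2


# Made-up trajectory: one rigid atom and one atom that moves by 0.2 nm.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
for value in rmsf_per_atom(toy_frames):
    print(f"RMSF {value:.2f} nm, B-factor {b_factor(value):.1f}")
```

Because the B-factor grows with the square of the RMSF, flexible regions stand out much more clearly in the B-factor coloring than in the RMSF plot.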

original & average original & B-Factors average & B-Factors
Perspective one
Figure 10: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 11: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 12: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
Perspective two
Figure 13: Alignment of the original structure (green) and the calculated average structure (turquoise)
Figure 14: Alignment of the original structure (green) and the calculated structure with high B-Factor values (turquoise)
Figure 15: Alignment of the structure with high B-Factor values (red) and the calculated average structure (blue)
RMSD
1.525 0.348 1.671

The structure with the high B-factors is the most similar structure compared to the original structure from the PDB (Figure 11 and Figure 14). The average structure is less similar (Figure 10 and Figure 13). However, the regions with high B-factor values are very flexible, and therefore the structure downloaded from the PDB may show the protein in another state in these regions. The low RMSD between the high B-factor structure and the original structure shows that the simulation predicts the structure quite well.

Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 16:

Figure 16: Plot of the RMSF values over the whole protein.

There are two regions with very high B-factor values: one region at position 150 (Figure 17) and the other region at position 440 (Figure 18). If we compare the pictures of the original and the average structure, we can see that most of the regions align very well, whereas some regions vary in their position. Therefore, we want to check whether these regions are the regions with very high B-factor values.

Figure 17: Part of the alignment between the original structure and the average structure between residue 140 and 160.
Figure 18: Part of the alignment between the original structure and the average structure between residue 430 and 450.

Furthermore, we visualized the B-factors with the PyMol B-factor selection method. We calculated the B-factors for the blue protein (Figure 19 and Figure 20). Parts of the protein shown in red are very flexible. The brighter the color, the higher the flexibility of the residue.

Figure 19: Part of the alignment between the original structure and the average structure between residue 140 and 160. High B-Factor value -> bright color
Figure 20: Part of the alignment between the original structure and the average structure between residue 430 and 450. High B-Factor value -> bright color

In the second picture, you can see that the color is dark blue. Therefore, a peak lower than 0.3 does not mean that there is high flexibility. Hence, our protein has only one very flexible region, which is around residue 140.


As you can see in the pictures above, especially in the first picture, which shows the part with the highest peak in the plot, the structures have very different positions and the alignment in this part of the protein is very bad, although the rest of the alignment is quite good. The different positions of the flexible parts of the protein also explain the relatively high RMSD values.

Back to [Tay-Sachs Disease].

C-alpha

Now we repeat the analysis, done for the protein, for the C-alpha atoms of the protein. Therefore, we followed the same steps as in the section above.

To compare the structure we align them with PyMol to the original structure.

original & average original & B-Factors average & B-Factors
Perspective one
Figure 21: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 22: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 23: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
Perspective two
Figure 24: Alignment of the original structure (green) and the calculated average structure of the c-alpha atoms (turquoise)
Figure 25: Alignment of the original structure (green) and the calculated structure with high B-Factor values of the c-alpha atoms (turquoise)
Figure 26: Alignment of the structure with high B-Factor values of the c-alpha atoms (red) and the calculated average structure of the c-alpha atoms (blue)
RMSD
1.324 0.277 1.334

The structure alignments and the corresponding RMSD values give the same results as in the section above. The RMSD of the alignment between the structure with high B-factors and the original one is the smallest, which indicates that these two structures align best (Figure 22 and Figure 25). This was expected, because we used the same model twice, only that in this case we neglected the side chains of the residues, while the backbone of the protein remains the same. The other two alignments (Figure 21 and Figure 24, Figure 23 and Figure 26) have nearly the same RMSD value and are therefore equally good.

Furthermore, we plotted the RMSF values of the protein, which can be seen in Figure 27:

Figure 27: Distribution of the b-factor values by only regarding the backbone of the protein.
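RMSF values and B-factors describe the same fluctuation and can be converted into each other with the standard isotropic relation B = (8π²/3) · RMSF². The following minimal sketch shows this conversion. The function name and the assumption that the RMSF is given in nm (and the B-factor in Å²) are ours and are not taken from the analysis above.

```python
import math


def rmsf_to_bfactor(rmsf_nm):
    """Convert an RMSF value in nm to an isotropic B-factor in square Angstrom.

    Uses the standard relation B = (8 * pi^2 / 3) * RMSF^2.
    The unit handling (nm in, Angstrom^2 out) is an assumption of this sketch.
    """
    rmsf_angstrom = rmsf_nm * 10.0  # 1 nm = 10 Angstrom
    return (8.0 * math.pi ** 2 / 3.0) * rmsf_angstrom ** 2


# Illustrative values only, not read from our plots.
for rmsf in (0.1, 0.3):
    print(f"RMSF {rmsf:.1f} nm -> B-factor {rmsf_to_bfactor(rmsf):.1f} A^2")
```

Because the B-factor grows with the square of the RMSF, a region that fluctuates three times as much gets a nine times higher B-factor, which is why single peaks stand out so clearly in the plot.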

In this case, there is only one high peak at position 150. A closer look at the protein shows that the positions of the beta sheets differ extremely between the two models in this region. The other peak at position 440 could not be found in this plot. Looking at the pictures above, we can see that the backbones of the two models do not differ much there. This means that mainly the positions of the side chains differ, which is not relevant here because we do not consider side chains in this analysis.

Back to [Tay-Sachs Disease].

Radius of gyration

Next, we analyzed the radius of gyration with g_gyrate, using only the protein for the calculation.

Figure 28: Plot with the distribution of the radius of gyration over time
Figure 29: Plot with the distribution of the radius of gyration over time for each axis
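{| border="1" style="text-align:center; border-spacing:0;"
| || Rg (in nm) || RgX (in nm) || RgY (in nm) || RgZ (in nm)
|-
| Average || 2.408 || 2.094 || 1.853 || 1.929
|-
| Minimum || 2.344 || 1.986 || 1.581 || 1.618
|-
| Maximum || 2.436 || 2.179 || 2.102 || 2.212
|}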

Figure 28 shows the radius of gyration over the simulation time. The radius of gyration is the mass-weighted root mean square (RMS) distance of the atoms from the center of mass of the protein or, for the per-axis values, from the corresponding axis. The plot shows that the average radius is about 2.4 nm with some fluctuation, which indicates that the protein is flexible. Furthermore, the fluctuation is roughly periodic, which reflects how the space the protein needs shrinks and grows again. This suggests that the protein pulsates.
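To make the definition concrete, the following minimal sketch computes the radius of gyration for one set of coordinates. It is not the g_gyrate implementation. The function name and the toy coordinates are ours, and the per-axis values are assumed to be radii about the x, y and z axes. That assumption fits the table above, where the squared axis values add up to roughly twice the squared total radius.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Return (Rg, [RgX, RgY, RgZ]) for one structure.

    coords: array of shape (n_atoms, 3), masses: array of shape (n_atoms,).
    The axis values are radii about each axis (assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    squared = (coords - center) ** 2  # squared displacement per atom and axis
    total_mass = masses.sum()
    # Total radius: mass-weighted RMS distance from the center of mass.
    rg = np.sqrt((masses * squared.sum(axis=1)).sum() / total_mass)
    # Radius about an axis: only the two perpendicular components count.
    rg_axes = []
    for axis in range(3):
        perpendicular = squared.sum(axis=1) - squared[:, axis]
        rg_axes.append(np.sqrt((masses * perpendicular).sum() / total_mass))
    return rg, rg_axes


# Toy example: four atoms of equal mass on a circle in the xy plane.
toy_coords = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
toy_masses = [1, 1, 1, 1]
rg, (rg_x, rg_y, rg_z) = radius_of_gyration(toy_coords, toy_masses)
print(rg, rg_x, rg_y, rg_z)
```

For a real trajectory this function would be called once per frame, which gives the curves over time.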

If we take a closer look at the radii for the individual axes (Figure 29), we can see that the radius for the x axis is the only constant one, staying at about 2 nm during the simulation. The radius for the z axis decreases towards the end of the simulation, where the Rg values for z are very low. The values for the y axis, in contrast, increase during the simulation, but do not reach the values of the x axis. This shows that the motion in x direction, the motion in z direction at the beginning and the motion in y direction at the end of the simulation have the most influence on the overall radius of gyration.

Back to [Tay-Sachs Disease].

solvent accessible surface area

Next, we analyzed the solvent accessible surface area of the protein, which is the part of the protein surface that is in contact with the surrounding environment, mainly water.

First of all, we look at the solvent accessibility of each residue, which can be seen in Figure 30.

Figure 30: Solvent accessibility area of each residue in the protein
Figure 31: Solvent accessibility area of each residue in the protein with standard deviation

The following table lists the average, minimum and maximum values of the solvent accessibility over all residues in the protein. The residues at the beginning and at the end of the sequence which have a value of 0 are ignored.

Average (in nm²) 0.553
Minimum (in nm²) 0.003
Maximum (in nm²) 2.005
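Summary values like the ones in this table can be computed from the per-residue output with a few lines of code. The following minimal sketch assumes that the values are available as text in the .xvg format, where lines starting with `#` or `@` are comments or plot directives. The function names and the example data are hypothetical and do not come from our simulation.

```python
def read_xvg_column(text, column=1):
    """Return one numeric column from the text of an .xvg file."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '#' or '@' are comments or plot directives.
        if not line or line.startswith(("#", "@")):
            continue
        values.append(float(line.split()[column]))
    return values


def trim_zero_ends(values):
    """Drop the leading and trailing entries that are exactly 0."""
    start, end = 0, len(values)
    while start < end and values[start] == 0.0:
        start += 1
    while end > start and values[end - 1] == 0.0:
        end -= 1
    return values[start:end]


def summarize(values):
    """Return (average, minimum, maximum) of a non-empty list."""
    return sum(values) / len(values), min(values), max(values)


# Hypothetical example data, not taken from our simulation.
example = """# residue  area (nm^2)
@ title "Area per residue"
1 0.0
2 0.0
3 0.4
4 0.0
5 0.8
6 0.0
"""
areas = trim_zero_ends(read_xvg_column(example))
average, minimum, maximum = summarize(areas)
print(average, minimum, maximum)
```

Only the zeros at both ends are removed in this sketch. A zero in the middle of the sequence is kept, because it describes a residue that is really buried.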

The average area per residue during the trajectory is between 0 and 2 nm², which can be seen in Figure 30. Most of the residues have an area of about 0.5 nm². It follows that most residues move only slightly during the complete simulation, with some exceptions where the residues are very flexible. Figure 31 additionally shows the standard deviation, which is very low and indicates that there are no big outliers. This means that there is no big deviation from the average area and that the residues behave in the same way throughout the trajectory.

Besides, we can analyze the position of the residues within the protein based on the solvent accessibility. First, we can see in Figure 30 that the first 100 and the last 100 residues have an average solvent accessibility of 0, which means that these residues are always completely in the interior of the protein. Most of the residues have a solvent accessibility of about 0.5 nm², and there are only some outliers with an accessibility of more than 1.5 nm². This means that there are some residues which are almost always on the surface, a lot of residues which are partly or temporarily on the surface and a lot of residues which are never on the surface. Looking at Figure 31, we can see that the standard deviation is relatively low. This means that there are no system states in which residues with low or no solvent accessibility become completely accessible to the solvent. A very high standard deviation would indicate some very unusual states in the simulation, which is not the case here.

Furthermore, it is possible to look at the solvent accessibility of each atom of the complete protein, which can be seen in Figure 32 and Figure 33.

Figure 32: Solvent accessibility of each atom of the complete protein.
Figure 33: Solvent accessibility of each atom of the complete protein with standard deviation.

As before, the residues at the beginning and the end with a value of 0 are ignored.

Average (in nm²) 0.032
Minimum (in nm²) 0
Maximum (in nm²) 0.558

Figure 32 shows the average area per atom, which gives similar results to Figure 30. In general, the atoms have a smaller area than the residues. This is easily explained, because the area of a residue is the sum of the areas of the atoms which belong to it. There is a huge number of atoms which have an area of about 0 nm². As before, the standard deviation is not that high (Figure 33). It is a little higher than the one in Figure 31, which was expected because of the smaller and more detailed scale of this figure. In general, Figure 32 and Figure 33 confirm the results of Figure 30 and Figure 31.

At the end of the plot, there are a lot of atoms which have a solvent accessible area of 0, which is consistent with the result for the residues. At the beginning of the plot, however, there is no such block of atoms without any accessible area, although there are a lot of atoms with low or no accessible area throughout the plot. A GROMACS simulation is not fully deterministic, so small differences of this kind can occur, and we still regard this result as consistent with the results for the residues.


Figure 34 shows how much of the area of the protein is accessible to the solvent during the complete simulation. As we already saw for the radius of gyration, the values vary during the simulation, which shows that the protein is flexible.

Figure 34: Area of the protein which is accessible to the surface during the simulation.
Figure 35: Area of the protein which is accessible to the surface during the simulation with standard deviation.


Average (in nm²) 138.727
Minimum (in nm²) 127.066
Maximum (in nm²) 146.571

Figure 34 and Figure 35 display the solvent accessible surface of the whole protein during the simulation. The solvent accessible surface of the hydrophobic residues has an area of about 140 nm², which stays relatively constant during the complete simulation. The second plot shows the solvent accessibility split by physicochemical properties. It shows that the accessible area of the hydrophobic amino acids is larger than that of the hydrophilic amino acids, which is unexpected, because hydrophobic amino acids normally prefer a location in the core of the protein and not on the surface.

Back to [Tay-Sachs Disease].

hydrogen-bonds

As a next step, we analyzed the hydrogen bonds formed during the simulation. Here, we distinguish between hydrogen bonds within the protein itself and hydrogen bonds between the protein and the water.

The following plots display the number of hydrogen bonds within the protein over the simulation time.

Figure 36: Number of hydrogen-bonds in the protein over simulation time
Figure 37: Number of hydrogen-bonds and the possible hydrogen-bonds in the protein over simulation time
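{| border="1" style="text-align:center; border-spacing:0;"
| || bonds in the protein || possible bonds in the protein
|-
| Average || 319.787 || 1534.866
|-
| Minimum || 292 || 1483
|-
| Maximum || 356 || 1584
|}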

Figure 36 shows the hydrogen bonds within the protein. Their number varies between about 290 and 355, and most of the time the protein has between 310 and 330 hydrogen bonds. Besides, the number of bonds declines a bit on average over the simulation time. The plot also shows that the protein is flexible, because the number of bonds fluctuates strongly over time.

Figure 37 displays the number of hydrogen bonds that occur during the simulation as well as all pairs with a distance smaller than 0.35 nm, which is the distance at which a hydrogen bond is theoretically possible. This plot shows that there are many more possible hydrogen bonds than are actually formed. The number of possible pairs is about 1500, whereas the number of formed hydrogen bonds is only between 320 and 330, which is only about 20%. The small number of formed hydrogen bonds may indicate the high flexibility of the protein.

The following plots display the number of hydrogen bonds between the protein and the surrounding water over the simulation time.

Figure 38: Number of hydrogen-bonds between the protein and the surrounding water.
Figure 39: Number of hydrogen-bonds and the possible hydrogen-bonds between the protein and the surrounding water.
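{| border="1" style="text-align:center; border-spacing:0;"
| || bonds between protein and water || possible bonds between protein and water
|-
| Average || 853.403 || 999.847
|-
| Minimum || 778 || 905
|-
| Maximum || 907 || 1106
|}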

Looking at the number of hydrogen bonds formed between the protein and the surrounding water, which is visualized in Figure 38, we can see that there are many more bonds between protein and water than within the protein. The number varies between about 780 and 910, which is almost 3 times the number within the protein. Besides, the average number of bonds between water and protein increases a bit over the simulation time. Most of the time, however, the protein forms between 840 and 860 bonds with the surrounding water.

Figure 39 additionally displays the number of pairs with a distance of less than 0.35 nm, which is the distance at which a hydrogen bond is theoretically possible. The number of pairs within 0.35 nm is about 1000. Compared to Figure 37, the gap between possible and actually formed hydrogen bonds is significantly smaller. In this case, about 85% of all possible hydrogen bonds are actually formed (on average 853 of 1000). Therefore, we can see that the binding between protein and water is really stable.
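The share of formed hydrogen bonds can be checked directly from the averages in the two tables above. The following short sketch does this calculation. The function and variable names are ours.

```python
def formed_fraction(formed, possible):
    """Return the share of possible hydrogen bonds that are actually formed."""
    return formed / possible


# Averages taken from the two tables above.
within_protein = formed_fraction(319.787, 1534.866)
protein_water = formed_fraction(853.403, 999.847)

print(f"within the protein:        {within_protein:.1%}")
print(f"between protein and water: {protein_water:.1%}")
print(f"ratio of formed bonds:     {853.403 / 319.787:.2f}")
```

The last line compares the average number of protein-water bonds with the average number of bonds within the protein.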

This is no surprise, because every residue on the surface is in contact with water, whereas within the protein there are a lot of amino acids which have no bonding partner, because the distance to the next amino acid is too large.

Back to [Tay-Sachs Disease].

Ramachandran plot

Now we take a closer look at the secondary structure of the protein during the simulation. For this purpose, we used a Ramachandran plot, which shows the phi and psi torsion angles of the backbone.

Figure 40: Ramachandran Plot of the wild type.

As we can see in Figure 40, there are a lot of beta sheets, right-handed alpha helices and left-handed alpha helices. The white regions are the regions where no secondary structure can be found, which is consistent with the white regions of a standard Ramachandran plot.
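Each point in a Ramachandran plot is a pair of torsion angles, and each torsion angle is defined by four consecutive backbone atoms: C(i-1), N(i), CA(i), C(i) for phi and N(i), CA(i), C(i), N(i+1) for psi. The following minimal sketch shows how one such angle can be computed from four positions. It is not the code used by GROMACS, and the function name and the test coordinates are ours.

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Return the torsion angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Simple geometric checks instead of real backbone atoms.
cis = dihedral_degrees([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1])
trans = dihedral_degrees([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
print(cis, abs(trans))
```

Applying this function to the phi and psi atoms of every residue in every frame gives the cloud of points shown in the plot.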

Back to [Tay-Sachs Disease].

RMSD matrix

Next, we analyzed the RMSD values with an RMSD matrix. This is useful to see whether there are groups of frames in the simulation that share a common structure. Such groups have lower RMSD values within their group and higher RMSD values compared to structures which are not in the group.
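The idea of an RMSD matrix can be sketched in a few lines. The example below compares every frame with every other frame and stores the result in a symmetric matrix. It is only an illustration and not the GROMACS implementation: the function names are ours, the three tiny frames are made up, and the frames are assumed to be superimposed already, so no fitting is done.

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two coordinate sets of shape (n_atoms, 3), without fitting."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_matrix(frames):
    """Return the symmetric matrix of pairwise RMSD values for all frames."""
    n = len(frames)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = rmsd(frames[i], frames[j])
    return matrix


# Made-up frames with two atoms each: frames 0 and 1 are identical,
# frame 2 is shifted by 1 along x.
frames = [
    [[0, 0, 0], [1, 0, 0]],
    [[0, 0, 0], [1, 0, 0]],
    [[1, 0, 0], [2, 0, 0]],
]
print(rmsd_matrix(frames))
```

A group of similar structures shows up as a square block of low values along the diagonal of such a matrix.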

The following matrix shows the RMSD values of our structures.

Figure 41: RMSD matrix of our structures during the simulation

As you can see in Figure 41, there is one big group which is colored in green, but it is not possible to find any very dense groups whose members all have a very low RMSD compared to each other. Therefore, we conclude that the simulation does not contain groups of very similar structures and that our protein adopts different structures while moving around.

Back to [Tay-Sachs Disease].

cluster analysis

Next, we performed a cluster analysis, which found 231 different clusters.

We visualized all of these cluster structures in Figure 42:

Figure 42: Visualization of the 231 different clusters

Next we aligned some structures of the cluster and measured the RMSD:

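{| border="1" style="text-align:center; border-spacing:0;"
| First structure || Second structure || RMSD
|-
| cluster 1 || cluster 2 || 0.880
|-
| cluster 1 || cluster 5 || 0.068
|}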

The different cluster structures look very similar, which can be seen in the picture above. Furthermore, the RMSD values between the structures of the clusters are very low. Therefore, we can see that the different structures of the simulation are very similar.

To get a better insight into the distribution of the RMSD values between the different clusters, we visualized the distribution in Figure 43.

Figure 43: Distribution of the RMSD value over the different clusters

Figure 43 shows that the distribution is approximately Gaussian, with the highest peak at 0.18 Angstrom. This means that most of the structures have an RMSD of about 0.18 Angstrom compared to the start structure. This value is not that high, but it is a strong hint that the protein is flexible during the simulation.

Back to [Tay-Sachs Disease].

internal RMSD

The last point in our analysis is the calculation of the internal RMSD values. These are based on the distances between the individual atoms of the protein, which can help us to understand the structure of the protein.
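One common way to define such an internal RMSD is to compare all atom-pair distances of a structure with the same distances in a reference structure. The following minimal sketch shows this idea. We assume this definition for illustration only; it is not necessarily identical to what the GROMACS tool computes, and the function names and coordinates are ours.

```python
import numpy as np


def pair_distances(coords):
    """Return the distances of all atom pairs (each pair counted once)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(coords), k=1)
    return distances[upper]


def distance_rmsd(coords, reference):
    """RMS deviation of all pair distances from those of the reference."""
    delta = pair_distances(coords) - pair_distances(reference)
    return float(np.sqrt(np.mean(delta ** 2)))


# Made-up reference with three atoms and two variants of it.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shifted = reference + np.array([5.0, -2.0, 3.0])  # pure translation
scaled = 2.0 * reference                           # every distance doubles
print(distance_rmsd(shifted, reference), distance_rmsd(scaled, reference))
```

Because only internal distances are compared, this measure needs no superposition: moving the whole structure does not change the value, while a change of shape does.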

Figure 44: Plot of the distance RMS values in the protein.
Average (RMSD in nm) 0.242
Minimum (RMSD in nm) 0.141
Maximum (RMSD in nm) 0.409

Figure 44 shows that the RMSD increases steadily during the whole simulation. At the beginning the RMSD is relatively small, then it rises very fast until it reaches a point from which it rises more slowly. Only at time 6000 is there a valley in the plot. At the end, the internal RMSD reaches about 0.25 nm, which is not particularly high. Therefore, the internal distances of the protein change noticeably during the simulation, but the protein does not drift far away from its starting structure.

Back to [Tay-Sachs Disease].