Difference between revisions of "MD Mutation485"
(→solvent accesible surface area) |
(→solvent accesible surface area) |
||
Line 456: | Line 456: | ||
{| |
{| |
||
− | |[[Image: |
+ | |[[Image:mut485_md_solv_acc_atomic2.png|thumb|center|Figure 29: Solvent accesibility of each atom of the complete protein.]] |
− | |[[Image: |
+ | |[[Image:mut485_md_solv_acc_atomic.png|thumb|center|Figure 30: Solvent accesibility of each atom of the complete protein with standard deviation.]] |
|} |
|} |
||
Line 463: | Line 463: | ||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
− | |Average (in nm |
+ | |Average (in nm²) |
|0.032 |
|0.032 |
||
|- |
|- |
||
− | |Minimum (in nm |
+ | |Minimum (in nm²) |
|0 |
|0 |
||
|- |
|- |
||
− | |Maximum (in nm |
+ | |Maximum (in nm²) |
|0.561 |
|0.561 |
||
|- |
|- |
||
|} |
|} |
||
− | Figure 29 |
+ | In Figure 29 the average area per atom is ploted, which deliver similar results to Figure 27. In general the atoms have not such a big area as the residues. This can be explained easily because the residue area is consisting of the single atom areas which belong to this residue. |
+ | There are a huge number of atoms which have an area of about 0nm². As before, the standard deviation is not that high (Figure 30). It is a little bit higher than than the one in Figure 28 which was expected, because of the more smaller and detailed scale of this Figure. In general Figure 29 and Figure 30 confirm the results of Figure 27 and Figure 28. |
||
+ | |||
+ | At the end of the plot, there are a lot of atoms which have a surface accessibility area of 0, which is consistent with the result for the residues. But at the beginning of the plot, there are no atoms which have no surface accessibility area. However, there are a lot of atoms with low or no accessibility area in the plot. Gromacs is a non-deterministic algorithm and that is why this result should be consistent with the results for the different residues. |
||
{| |
{| |
Revision as of 16:14, 25 September 2011
Contents
- 1 check the trajectory
- 2 Visualize in pymol
- 3 Create a movie
- 4 energy calculations for pressure, temperature, potential and total energy
- 5 minimum distance between periodic boundary cells
- 6 RMSF for protein and C-alpha
- 7 Radius of gyration
- 8 solvent accesible surface area
- 9 hydrogen-bonds
- 10 Ramachandran plot
- 11 RMSD matrix
- 12 cluster analysis
- 13 internal RMSD
check the trajectory
We checked the trajectory and got following results:
Reading frame 0 time 0.000 # Atoms 96545 Precision 0.001 (nm) Last frame 2000 time 10000.000
Furthermore, we got some detailed results about the different items during the simulation.
Item | #frames | Timestep (ps) |
Step | 2001 | 5 |
Time | 2001 | 5 |
Lambda | 0 | - |
Coords | 2001 | 5 |
Velocities | 0 | - |
Forces | 0 | - |
Box | 2001 | 5 |
The simulation finished on node 0 Thu Sep 15 19:12:47 2011.
Time | ||
Node (s) | Real (s) | % |
22336.000 | 22336.000 | 100% |
6h12:00 |
The complete simulation needs 6 hours and 12 minutes to finishing.
Performance | |||
Mnbf/s | GFlops | ns/day | hour/ns |
1277.617 | 93.808 | 38.682 | 0.620 |
As you can see in the table above, it takes about 2/3 hour to simulate 1 ns of the system. So therefore, it would be possible to simulate about __ns in one complete day calculation time.
Back to [Tay-Sachs Disease].
Visualize in pymol
First of all, we visualized the simulation with with ngmx, because it draws bonds based on the topology file. ngmx gave the user the possibility to choose different parameters. Therefore, we decided to visualize the system with following parameters:
Group 1 | Group 2 |
System | Water |
Protein | Ion |
Backbone | NA |
MainChain+H | CL |
SideChain |
Figure 1 shows the visualization with ngmx:
Furthermore, we also want to visualise the structure with pymol, which can be seen on Figure 2.
Back to [Tay-Sachs Disease].
Create a movie
Next, we want to visualize the protein with pymol. Therefore, we extracted 1000 frames from the trajectory, leaving out the water and jump over the boundaries to make continouse trajectories.
The program asks for the a group as output. We only want to see the protein, therefore we decided to choose group 1.
Here you can see the movie in stick line and cartoon modus.
On Figure 3 and Figure 4, we can see that motion of the protein over time, which was created by the MD simulation.
Back to [Tay-Sachs Disease].
energy calculations for pressure, temperature, potential and total energy
Pressure
Average (in bar) | 0.998385 |
Error Estimation | 0.0058 |
RMSD | 71.0317 |
Tot-Drift | -0.0436306 |
Minimum (in bar) | -230.0158 |
Maximum (in bar) | 243.7419 |
The plot with the pressure distribution of the system can be seen here:
As you can see in Figure 5, the pressure fluctuates in the system around 0 and the amplitude does mostly not exceed 100. Contrary,there are some outlier which reach values of almost 250 and -250 bar. In this cases we are not sure, if a protein works under such a high pressure.
Temperature
Average (in K) | 297.936 |
Error Estimation | 0.0045 |
RMSD | 0.940566 |
Tot-Drift | 0.00654126 |
Minimum (in K) | 294.99 |
Maximum (in K) | 301.08 |
The plot with the temperature distribution of the system can be seen here:
Figure 2 displays the distribution of the temperature in the current MD system. In this case the temperature fluctuates around 298K. The maximal occuring amplitude between the average and the outlier temperature is only about 4 K which is not much. But we have to keep in mind, that only some degree difference can affect huge functional loss of a protein. 298 K corresponds to 25°C, which is relatively low for human protein activity, because the highest activity is normally reached at body temperature which is about 36°C.
Potential
Average (in kJ/mol) | -1.28176e+06 |
Error Estimation | 85 |
RMSD | 1068.67 |
Tot-Drift | -536.314 |
Minimum (in kJ/mol) | -1.28513e+06 |
Maximum (in kJ/mol) | -1.27769e+06 |
The plot with the potential energy distribution of the system can be seen here:
Figure 3 displays the potential engery of the system which is between -1.28513e+06 and -1.27769e+06. This is a relativly low energy. Therefore, this probably means that the protein is stable. Furthermore, we can suggest, that the protein with such a low energy is able to function and is stable. Therefore, our simulation is probably true. Otherwise, if the energy of the simulated system would be too high, we can not trust the results, because the protein is too instable to work.
Total energy
Average (in kJ/mol) | -1.05203e+06 |
Error Estimation | 83 |
RMSD | 1308.04 |
Tot-Drift | -531.275 |
Minimum (in kJ/mol) | -1.05687e+06 |
Maximum (in kJ/mol) | -1.04680e+06 |
The plot with the total energy distribution of the system can be seen here:
Looking at Figure 4, we can see that the total energy deviates a little from the potential energy which is minimal lower. In this case, the total energy is between -1.055e+06 and -1.045e+06. These values lie already in a range, where we can suggest that the protein energy is sufficient low so that this one can work.
minimum distance between periodic boundary cells
Next we try to calculate the minimum distance between periodic boundary cells. As before, the program asks for one group to use for the calculation and we decided to use only the protein, because the calculation needs a lot of time and the whole system is significant bigger than only the protein. So therefore, we used group 1.
Here you can see the result of this analysis:
Average (in nm) | 3.215 |
Minimum | 1.772 |
Maximum | 4.217 |
Figure 6 displays the minimum distance between periodic boundary cells at different time steps. I can be seen that there are huge differences between the distances at different times. The highest distance is 4.217 nm, whereas the smallest distance is only 1.772 nm. This means that there are some states during the simulation in which the interacting atoms are close together. Contrary there are some states in which the atoms who interact are far away. Because of the huge bandwidth of minimum distance we can conclude, that the protein is flexible.
Back to [Tay-Sachs Disease].
RMSF for protein and C-alpha
Protein
First of all, we calculate the RMSF for the whole protein.
The analysis produce two different pdb files, one file with the average structure of the protein and one file with high B-Factor values, which means that the high flexbile regions of the protein are not in accordance with the original PDB file.
To compare the structure we align them with Pymol with the original structure.
original & average | original & B-Factors | average & B-Factors |
Perspective one | ||
Perspective two | ||
RMSD | ||
1.519 | 0.349 | 1.727 |
The RMSD as well as the structure alignement visualization indicates which structures are similar. In this case, the small RMSD at 0.349 as well as the structure alignment where the structures mostly agree indicates that the high B-factors agrees mostly with the original structure (Figure 8 and 11). The alignment of the original structure and the avarage structure delivers a higher disagreement (Figure 10 and Figure 12). There, the RMSD with 1.519 is much higher and the structure alignment is not extremly worse than the other one. The regions with high B-Factors are usually very flexible. This means that the downloaded PDB structure is probably in another state, because of its flexible regions. The low RMSD and the agreeing structure alignment of the gigh B-factor structure and the original one displays that the qualitiy of predicted structure is quite good.
Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 13:
There are two regions with very high RMSF values. The first one is at position 150 (Figure 14), and the other one is at position 350 (Figure 15). The structure alignment of the average structure and the original one show that the most regions match with some exceptions. Therefore, we want to analysis, if these bad-machting regions are the regions with very high B-factor values. The following picture show the two regions around position 150 and 350 which have an arised RMSF.
Furthermore, we visualized the B-factors with the Pymol selection B-factor method at this two regions. We calculated the B-factors for the blue protein (Figure 16 and Figure 17). If you see red, this part of the protein is very flexible. The brighter the color, the higher is the flexibility of this residue.
The first picture displays the alignment around region 150. Here, we can see that the loop is colored in yellow and red. This bright colors shows that the B-factor at this position is very high which indicates a higher flexibility. Contrary, the second picture for the region around residue 350 show only a small part in yellow and some parts in tourqouis whereas the rest is colored in dark blue. Furthermore, the second peak is about 0.35 which does not stand for a high flexibility. All in all, this detailed views assume that thers is only one region in the protein with a high flexibility.
As you can see in the pictures above, especially in the first picture, which is the part with the highest peak in the plot, the structures have a very different position and the alignment in this part of the protein is very bad, although the rest of the alignment is quite good. That is also the reason for the relatively high RMSD which is caused by different position of the flexible parts of the protein.
C-alpha
Now we repeat the analysis done for the protein for the C-alpha atoms of the protein. Therefore, we followed the same steps as in the section above.
To compare the structure we align them with pymol with the original structure.
original & average | original & B-Factors | average & B-Factors |
Perspective one | ||
Perspective two | ||
RMSD | ||
1.258 | 0.283 | 1.297 |
As in the section above, the RMSD between the structure with high B-factor values and the original structure is the most similar (Figure 19 and Figure 22). This was expected, because we used twice the same model, but in this case we neglected the residues of the atoms whereas the backbone of the protein remains the same. The other two models (Figure 18, Figure 20, Figure 21 and Figure 23) have nearly the same RMSD value and therefore there are equally.
Furthermore, we got a plot of the RMSF values of the protein, which can be seen on Figure 24:
In this case, there is only one high peak at position 150. Having a closer look at the protein it can be seen that the position of the beta sheets differ extremly between the two models. The other peak at position 350 could not be found in the plot. Looking at the pictures above, we can see that the backbones of the two different models not differ extremely. This means that the position of the residues differ a lot, which is not important for us, because we do not regard side chains.
Radius of gyration
Next, we want to analyse the Radius of gyration. Therefore we use g_gyrate and use only the protein for the calculation.
Rg (in nm) | RgX (in nm) | RgY (in nm) | RgZ (in nm) | |
Average | 2.416 | 2.145 | 1.630 | 2.094 |
Minimum | 2.347 | 1.992 | 1.444 | 1.809 |
Maximum | 2.449 | 2.212 | 1.927 | 2.219 |
- Hier noch ändern **********************
Figure 25 shows the radius of gyration over the simulation time. The Radius of gyration is the RMS distance of its parts from its center or gyration axis. On the plot it could be seen, that the average radius is about 2.4, but there are big differences, which means, that the protein is flexible. The distance between the different parts of the protein and the center differs so the protein seems to pulsate, because there is a periodic curve which shows the loss and the gain of space the protein needs.
solvent accesible surface area
Next, we analysed the solvent accesible surfare area of the protein, which is the area of the protein which has contacts with the surronding environment, mainly water.
First of all, we have a look at the solvent accesibility of each residue, which can be seen on Figure 27. Furthermore, we regard at the solvent accesibility area of each residue in the protein with standard deviation (Figure 28).
The following table list the average, minimum and maximum values of the Solvent accessibility for each residue in the protein. The residues at the beginning and at the end of the simulation which have a value of 0 are ignored.
Average (in nm²) | 0.542 |
Minimum (in nm²) | 0.007 |
Maximum (in nm²) | 2.014 |
The average area per residue during the trajectory is between 0 and 2nm², as can be seen on Figure 27. Most of the residues have an area about 0.5nm². From this it follows that there are mainly sparse moving residues during the complete simulation with some exceptions where the residues are very flexible. In Figure 28, you can see additional the standard deviation, which is very low and which indicates that there are no big outliers in there. This means that there is no big deviation from the average area and that the residues behave in the same way during the trajectory.
Besides, we can analysis the position of the residues within the protein based on the solvent accesibility. First, we can see in the Figure 27 that the first 100 and the last 100 residues have an average solvent accessibility of 0 which means that these residues are always completely in the interior of the protein. Most of the residues have a solvent accessibility about 0.5nm², and there are only some outlier with an accessibility of more than 1.5nm². This means that there are some residues which are almost always on the surface, a lot of residues which are partly or temporarly on the surface and a lot of residues which are never on the surface. Looking at Figure 28, we can see that the standard deviation is relatvely low. This means that there are no system states in which any residues with low or no solvent accessibility get complete accessible to the surface. If the standard deviation would be very high, it would indicate that there are some very unusual states in the simulation which is not the case in our simulation.
Furthermore, it is possible to look at the solvent accesibility of each atom of the complete protein, which can be seen in Figure 29 and Figure 30.
As before, we only regard the atoms with an area of more than 0 in the following table.
Average (in nm²) | 0.032 |
Minimum (in nm²) | 0 |
Maximum (in nm²) | 0.561 |
In Figure 29 the average area per atom is ploted, which deliver similar results to Figure 27. In general the atoms have not such a big area as the residues. This can be explained easily because the residue area is consisting of the single atom areas which belong to this residue. There are a huge number of atoms which have an area of about 0nm². As before, the standard deviation is not that high (Figure 30). It is a little bit higher than than the one in Figure 28 which was expected, because of the more smaller and detailed scale of this Figure. In general Figure 29 and Figure 30 confirm the results of Figure 27 and Figure 28.
At the end of the plot, there are a lot of atoms which have a surface accessibility area of 0, which is consistent with the result for the residues. But at the beginning of the plot, there are no atoms which have no surface accessibility area. However, there are a lot of atoms with low or no accessibility area in the plot. Gromacs is a non-deterministic algorithm and that is why this result should be consistent with the results for the different residues.
Average (in nm^2) | 135.452 |
Minimum (in nm^2) | 129.167 |
Maximum (in nm^2) | 142.977 |
- Hier noch ändern **********************
So we can see in the pictures there are a lot of differences between the solvent accesibility of the protein's surface, which was expected. If we have a closer look at the plot about the atomic accesibility and the residue accesibility, we can see, that both pictures agree. During the simulation there is a big differences in the accessibility of the protein. There is a area between 131 and 146 nm/2S/N during the simulation, which is a big difference. Therefore, we can see that the protein has to be really flexible, otherwise, these different values could not be explained.
hydrogen-bonds
In this case, we differ between hydrogen-bonds between the protein itselfs and bonds between the protein and the water.
As before, it is possible to see in the plot, that the protein is flexible, because the number of bonds differ extremely over the time.
On Figure 30 you can see the bonds between the protein. Here you can see that the number differs between 300 bonds and 355. Most of the time, the protein has between 320 and 330 hydrongen-bonds.
bonds in the protein | possible bonds in the protein | |
Average | 323.337 | 1543.024 |
Minimum | 300 | 1491 |
Maximum | 354 | 1602 |
- Hier noch ändern **********************
If we have a look at the number of bonds between the protein and the water, which are visualized on Figure 31, we can see that there are a lot more bonds between protein and water than in between the protein. The number differs between 800 and 900 and there there are about 3 times more bonds between the protein and the water. Over the simulation time, the number of bonds between water and protein grows in average. But most of the time, the protein has between 840 and 860 bonds with the water.
bonds between protein and water | possible bonds between protein and water | |
Average | 847.965 | 999.310 |
Minimum | 783 | 882 |
Maximum | 912 | 1126 |
- Hier noch ändern **********************
This is no surprise, because every residue on the surface has contact with water, whereas in the protein there are a lot of amino acids which do not have contact partners, because the other amino acids are too far away.
Ramachandran plot
Now, we want to have a closer look to the secondary structure of the protein during the simulation. Therefore, we used a Ramachandran plot to analyse the phi and psi torsion angles of the backbone to get a better understanding of the secondary structure during the simulation.
- Hier noch ändern **********************
As we can see on Figure ?, there are a lot of beta sheets, alpha helices and right-handed alpha helices. The white regions are the regions where no secondary structure can be found, which is right.
RMSD matrix
Next we analysed the RMSD values. Therefore, we used a RMSD matrix. This is useful to see if there are groups of structures over the simulation that share a common structure. These groups will have lower RMSD values withing their group and higher RMSD values compared to structure which are not in the group.
The following matrix shows the RMSD values of our structures.
- Hier noch ändern **********************
As you can see in the picture above, there is one big group which is colored in green, but it is not possible to find any very dense groups which all have a very low RMSD compared to each other. Only near on the diagonal there are some structures which are colored in cyan, but in general, most of the structures are colored in green which means a RMSD value of about 0.15 Angstrom. So we can see that there are differences of the protein structure during the simulation and therefore we can conclude that this protein is very flexible.
cluster analysis
Next, we started a cluster analysis. First of all, we found 225 different clusters.
We visualized all of these cluster structures:
Next we aligned some structures of the cluster and measured the RMSD:
Cluster 1 | Cluster 2 | RMSD |
cluster 1 | cluster 2 | |
cluster 1 | cluster 5 | |
cluster 30 | cluster 100 |
- Hier noch ändern **********************
The RMSD values of the different structures are very similar, which can be seen in the picture above. Furthermore, the RMSD values of the different structures of the clusters are very low. But if we align structures which are far away in number, the RMSD value increase, which shows us that over the simulation time the differences between the start structure and the simulated structures increase. Therefore, we can see that the different structures of the simulation are very similar, but over the time they change more and more.
To have a better insight into the distribution of the RMSD value between the different clusters, we visualize the distribution in Figure ?.
- Hier noch ändern **********************
On Figure ?, it is possible to see, that the distribution is a gaussian distribution, with the highest peak at 0.2 Angstrom. This means, that most of the structures have a RMSD about 0.2 Angstrom compared to the start structure. This value is relatively high, especially if we keep in mind, that both structure are the same structure, and only for one structure the motion over the time was simulated.
internal RMSD
The last point in our analysis is the calculation of the internal RMSD values. This means the distances between the single atoms of the protein, which can us help to obtain the structure of the protein.
Average (RMSD in nm) | 0.243 |
Minimum (RMSD in nm) | 4.906e-07 |
Maximum (RMSD in nm) | 0.289 |