Difference between revisions of "MD Mutation436"
Revision as of 20:01, 19 September 2011
Contents
- 1 check the trajectory
- 2 Visualize in pymol
- 3 create a movie
- 4 energy calculations for pressure, temperature, potential and total energy
- 5 minimum distance between periodic boundary cells
- 6 RMSF for protein and C-alpha
- 7 Pymol analysis of average and bfactor
- 8 Radius of gyration
- 9 solvent accessible surface area
- 10 hydrogen-bonds
- 11 Ramachandran plot
- 12 RMSD matrix
- 13 cluster analysis
- 14 internal RMSD
check the trajectory
We checked the trajectory with the following command:
gmxcheck -f mut436_md.xtc
The command gave the following results:
Reading frame 0 time 0.000
# Atoms 96555
Precision 0.001 (nm)
Last frame 2000 time 10000.000
Furthermore, we got some detailed results about the different items during the simulation.
Item | #frames | Timestep (ps) |
Step | 2001 | 5 |
Time | 2001 | 5 |
Lambda | 0 | - |
Coords | 2001 | 5 |
Velocities | 0 | - |
Forces | 0 | - |
Box | 2001 | 5 |
The simulation finished on node 0 Fri Aug 26 08:40:07 2011.
Time | ||
Node (s) | Real (s) | % |
34860.474 | 34860.474 | 100% |
9h41:00 |
The complete simulation took 9 hours and 41 minutes to finish.
Performance | |||
Mnbf/s | GFlops | ns/day | hour/ns |
818.560 | 60.105 | 24.785 | 0.968 |
As you can see in the table above, it takes about 1 hour to simulate 1 ns of the system. Therefore, it would be possible to simulate about 25 ns in one complete day of calculation time.
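The two performance values can be checked from the wall time and the simulated time reported above. The following is a minimal sketch; the function name `performance` is only illustrative and not part of GROMACS.

```python
def performance(wall_seconds, simulated_ps):
    """Return (ns per day, hours per ns) for a finished run."""
    simulated_ns = simulated_ps / 1000.0          # 1 ns = 1000 ps
    ns_per_day = simulated_ns / (wall_seconds / 86400.0)
    hours_per_ns = (wall_seconds / 3600.0) / simulated_ns
    return ns_per_day, hours_per_ns


# Values taken from the tables above: 34860.474 s of wall time for 10000 ps.
ns_per_day, hours_per_ns = performance(34860.474, 10000.0)
print(f"{ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
```

With these inputs the result agrees with the ns/day and hour/ns columns of the performance table to three decimal places.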
Visualize in pymol
First of all, we visualized the simulation with ngmx, because it draws bonds based on the topology file. ngmx lets the user choose different parameters. We decided to visualize the system with the following parameters:
Group 1 | Group 2 |
System | Water |
Protein | Ion |
Backbone | NA |
MainChain+H | CL |
SideChain |
Figure 1 shows the visualization with ngmx:
create a movie
Next, we want to visualize the protein with pymol. For this, we extracted 1000 frames from the trajectory, leaving out the water and removing the jumps over the boundaries to get continuous trajectories. We used the following command:
trjconv -s file.tpr -f file.xtc -o output_file.pdb -pbc nojump -dt 10
The program asks for a group to write as output. We only want to see the protein, so we decided to use group 1.
Todo: movie and filtered
energy calculations for pressure, temperature, potential and total energy
Temperature
Average (in K) | 297.94 |
Error Estimation | 0.0029 |
RMSD | 0.944618 |
Tot-Drift | 0.00834573 |
The plot with the temperature distribution of the system can be seen here:
As Figure 2 shows, the system stays at about 298 K for most of the simulation. The largest deviation of the minimum and maximum temperature from this average is only about 4 K, which is not much. However, we have to keep in mind that a difference of only a few degrees can destroy the function of a protein. 298 K is about 25°C, which is relatively cold for a protein to work, because the temperature in our bodies is about 37°C.
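The Average, RMSD and Tot-Drift rows in the energy tables of this section can be reproduced from a time series. The sketch below assumes that RMSD means the fluctuation around the average and that Tot-Drift is the slope of a straight-line fit multiplied by the length of the series; the function name `series_statistics` and the example data are hypothetical, not taken from our run.

```python
import numpy as np


def series_statistics(time_ps, values):
    """Return (average, RMSD around the average, total drift) of a series."""
    time_ps = np.asarray(time_ps, dtype=float)
    values = np.asarray(values, dtype=float)
    average = values.mean()
    rmsd = np.sqrt(np.mean((values - average) ** 2))
    # Least-squares line through the data; the drift is its total rise.
    slope, _intercept = np.polyfit(time_ps, values, 1)
    total_drift = slope * (time_ps[-1] - time_ps[0])
    return average, rmsd, total_drift


# Hypothetical temperature series (K) sampled every 5 ps.
time_ps = np.arange(0.0, 50.0, 5.0)
temperature = 298.0 + np.sin(time_ps)
print(series_statistics(time_ps, temperature))
```

A drift that is small compared with the RMSD, as in the temperature table above, indicates that the quantity fluctuates around a stable value instead of trending in one direction.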
Potential
Average (in kJ/mol) | -1.28165e+06 |
Error Estimation | 100 |
RMSD | 1080.9 |
Tot-Drift | -714.814 |
The plot with the potential energy distribution of the system can be seen here:
As can be seen in Figure 3, the potential energy of the system lies between -1.282e+06 and -1.281e+06 kJ/mol, which is a relatively low energy. This suggests that the protein is stable. We can therefore suggest that a protein with such a low energy is able to function, so our simulation could be realistic. If, on the other hand, the energy of the simulated system were too high, we could not trust the results, because the protein would be too unstable to work.
Total energy
Average (in kJ/mol) | -1.0519e+06 |
Error Estimation | 100 |
RMSD | 1322.68 |
Tot-Drift | -708.38 |
The plot with the total energy distribution of the system can be seen here:
As Figure 4 above shows, the total energy of the system is a little higher than its potential energy. In this case, the energy lies between -1.05e+06 and -1.051e+06 kJ/mol. These values are still in a range where we can suggest that the energy is low enough for the protein to work.
Pressure
Average (in bar) | 1.0066 |
Error Estimation | 0.014 |
RMSD | 71.218 |
Tot-Drift | -0.083422 |
The plot with the pressure distribution of the system can be seen here:
As you can see in Figure 5, the pressure in the system is about 1 bar most of the time, but there are big outliers of up to 250 and -250 bar. Therefore, we are not sure whether a protein can work under such a pressure.
minimum distance between periodic boundary cells
Next, we calculated the minimum distance between periodic boundary cells. As before, the program asks for a group to use for the calculation. We decided to use only the protein (group 1), because the calculation takes a lot of time and the whole system is significantly bigger than the protein alone.
Here you can see the result of this analysis.
As you can see in Figure 6, the distance varies strongly between the time steps. The largest distance is up to 4 nm, whereas the smallest distance is only about 1 nm. This shows that the protein is very flexible over time.
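To illustrate what is measured here, the following sketch computes the smallest distance between a set of atoms and its own periodic images for a single frame. It assumes a rectangular box, which may not match the box type of our simulation, and uses a brute-force search that is only practical for small groups. The function name and the example coordinates are hypothetical.

```python
import itertools

import numpy as np


def min_periodic_image_distance(coords, box):
    """Smallest distance between any atom and any atom of a periodic image.

    coords: array of shape (n_atoms, 3); box: the three edge lengths of a
    rectangular box, in the same unit as coords.
    """
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    best = np.inf
    # Visit the 26 neighbouring cells; (0, 0, 0) is the original cell.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        image = coords + np.array(shift) * box
        diff = coords[:, None, :] - image[None, :, :]
        best = min(best, np.sqrt((diff ** 2).sum(axis=2)).min())
    return best


# Two atoms in a 3 x 4 x 5 nm box: the image of the first atom at x = 3.0
# lies 0.5 nm from the second atom at x = 2.5.
atoms = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0]]
print(min_periodic_image_distance(atoms, [3.0, 4.0, 5.0]))
```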
RMSF for protein and C-alpha
Protein
First of all, we calculate the RMSF for the whole protein.
The analysis produces two different PDB files: one with the average structure of the protein and one with the B-factor values. High B-factor values mark the highly flexible regions of the protein, which are not in accordance with the original PDB file.
To compare the structures, we aligned them with the original structure in pymol.
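The relation between the RMSF and the B-factor can be sketched as follows. The code assumes a trajectory that has already been fitted to a reference, coordinates in nm, and the standard isotropic conversion B = 8π²/3 · RMSF² with the RMSF expressed in Angstrom. The function names and the example array are hypothetical and do not reproduce the exact output of the analysis tool.

```python
import numpy as np


def rmsf(trajectory):
    """Per-atom RMSF for an array of shape (n_frames, n_atoms, 3)."""
    trajectory = np.asarray(trajectory, dtype=float)
    mean_structure = trajectory.mean(axis=0)      # the average structure
    squared_dev = ((trajectory - mean_structure) ** 2).sum(axis=2)
    return np.sqrt(squared_dev.mean(axis=0))


def rmsf_to_bfactor(rmsf_nm):
    """Convert an RMSF in nm to an isotropic B-factor in Angstrom^2."""
    rmsf_angstrom = np.asarray(rmsf_nm, dtype=float) * 10.0
    return 8.0 * np.pi ** 2 / 3.0 * rmsf_angstrom ** 2


# Hypothetical two-frame trajectory: atom 0 moves by 0.2 nm, atom 1 is fixed.
frames = np.array([
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.2, 0.0, 0.0], [1.0, 1.0, 1.0]],
])
print(rmsf(frames), rmsf_to_bfactor(rmsf(frames)))
```

Because the B-factor grows with the square of the RMSF, flexible regions stand out even more clearly in the B-factor coloring than in the RMSF plot.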
original & average | original & B-Factors | average & B-Factors |
Perspective one | ||
Perspective two | ||
RMSD | ||
1.525 | 0.348 | 1.671 |
The structure with the high B-factors is the most similar one (Figure 8 and Figure 11) compared with the original structure from the PDB (Figure 7 and Figure 10). The average structure is not that similar (Figure 10 and Figure 12). However, we know that the regions with high B-factors are very flexible, so in the structure downloaded from the PDB the protein is in another state because of its flexible regions. Because of the low RMSD between the high B-factor structure and the original structure, we can see that the simulation predicts the structure quite well.
Furthermore, we got a plot of the RMSF values of the protein, which can be seen in Figure 13:
There are two regions with very high B-factor values: one at position 150 (Figure 14) and the other at position 440 (Figure 15). If we compare the pictures of the original and the average structure, we can see that most regions align very well, whereas some regions vary in their position. Therefore, we want to check whether these are the regions with very high B-factor values.
Furthermore, we visualized the B-factors with the pymol B-factor selection method. We calculated the B-factors for the blue protein (Figure 16 and Figure 17). Parts of the protein shown in red are very flexible. The brighter the color, the higher the flexibility of the residue.
In the second picture, you can see that the color is dark blue. Therefore, a peak lower than 0.3 does not indicate high flexibility. Our protein thus has only one very flexible region, which is around residue 140.
As you can see in the pictures above, especially in the first picture, which shows the part with the highest peak in the plot, the structures have very different positions and the alignment in this part of the protein is very bad, although the rest of the alignment is quite good. The different positions of the flexible parts of the protein also explain the relatively high RMSD values.
C-alpha
Now we repeat the analysis done for the protein for the C-alpha atoms of the protein. Therefore, we followed the same steps as in the section above.
To compare the structures, we aligned them with the original structure in pymol.
original & average | original & B-Factors | average & B-Factors |
Perspective one | ||
Perspective two | ||
RMSD | ||
1.324 | 0.277 | 1.334 |
As in the section above, the structure with high B-factor values is the most similar to the original structure, as the RMSD shows (Figure 19 and Figure 22). This was expected, because we used the same model twice, but in this case we neglected the side chains of the residues, while the backbone of the protein remains the same. The other two comparisons (Figure 18, Figure 20, Figure 21 and Figure 23) have nearly the same RMSD value and are therefore equally similar.
Furthermore, we got a plot of the RMSF values of the protein, which can be seen on Figure 24:
In this case, there is only one high peak, at position 150. By observing the whole protein, it was possible to see that the position of the beta sheet differs extremely between the two models. The other peak at position 440 could not be found in the plot. Looking at the picture above, we can see that the backbone does not differ extremely between the two models. Therefore, it must be the position of the side chains that is very different there, which is not important in this case, because we do not regard side chains.
Pymol analysis of average and bfactor
already done -> maybe no separate chapter for this?
Radius of gyration
Next, we want to analyse the Radius of gyration. Therefore we use g_gyrate and use only the protein for the calculation.
Figure 25 shows the radius of gyration over the simulation time. The radius of gyration is the RMS distance of the parts of the protein from its center or gyration axis. The plot shows that the average radius is about 2.4 nm, but there are big differences, which means that the protein is flexible. The distance between the different parts of the protein and the center varies, so the protein seems to pulsate: the curve is periodic and shows how the protein loses and gains the space it needs.
If we also look at the radius for the different axes, we can see that the radius for the x axis stays the most constant during the whole simulation. The radius for the z axis shows more deflection than the x values, but both are in a similar range. The y axis values, however, are significantly lower, so the positions differ less along this axis. Especially at the end of the simulation, the Rg values for y are very low. Therefore, we can see that most of the radius of gyration comes from the x and z components and not so much from the y component.
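The quantities discussed in this section can be sketched for a single frame as follows. The code assumes mass-weighted values and treats the per-axis values as radii of gyration about the x, y and z axes, so that the value for x is built from the y and z distances; this is how we read the description of g_gyrate, but it is an assumption here. The function name and the example coordinates are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Return (Rg, (Rg about x, Rg about y, Rg about z)) for one frame."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    squared = (coords - center) ** 2              # per-atom dx^2, dy^2, dz^2
    total_mass = masses.sum()
    rg = np.sqrt((masses * squared.sum(axis=1)).sum() / total_mass)
    about_axes = []
    for axis in range(3):
        # Distance from an axis only uses the two other coordinates.
        others = [i for i in range(3) if i != axis]
        dist_sq = squared[:, others].sum(axis=1)
        about_axes.append(np.sqrt((masses * dist_sq).sum() / total_mass))
    return rg, tuple(about_axes)


# Four atoms of equal mass lying in the xy plane around the origin.
atoms = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
print(radius_of_gyration(atoms, [1, 1, 1, 1]))
```

Under this reading, a low value for one axis means that the atoms lie close to that axis, which is worth keeping in mind when interpreting the y values.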
solvent accessible surface area
Next, we analysed the solvent accessible surface area of the protein, which is the area of the protein that has contact with the surrounding environment, mainly water.
First of all, we look at the solvent accessibility of each residue, which can be seen in Figure 26.
The average area per residue during the trajectory is between 0 and 2 nm^2. Most of the residues have an area of about 0.5 nm^2. So there are some very flexible residues, but most of the residues only move a little during the complete simulation. In Figure ?, you can also see the standard deviation, which is very low. There are no big outliers, which means that there is no big deviation from the average area and the residues behave in the same way during the trajectory. Furthermore, it is also possible to look at the solvent accessibility of each atom of the complete protein, which can be seen in Figure 27.
In Figure ? the average area per atom is plotted, which shows a similar picture as Figure ?. In general, the atoms do not have as big an area as the residues, which is clear, because the area of a residue consists of the areas of the single atoms which belong to this residue. A huge number of atoms have an area of about 0 nm^2. As before, the standard deviation is not that high. It is a little higher than in Figure ?, but that was expected, because this figure is more detailed and the scale is smaller. In general, Figure ? and Figure ? confirm the results of Figure ? and Figure ?.
Figure 29 shows how much of the area of the protein is accessible to the solvent during the complete simulation. As we saw before for the radius of gyration, the values vary during the simulation, which shows that the protein is flexible.
In Figure ? and Figure ? we can see the solvent accessible surface of the protein during the simulation. The accessible surface of the hydrophobic residues has an area of about 140 nm^2, which is relatively constant during the complete simulation. If we look more closely at the distribution of the different physicochemical properties and their surface accessibility, we see that the area of the hydrophobic amino acids is larger than the area of the hydrophilic amino acids. This is really surprising, because in general hydrophobic amino acids prefer a location in the core of the protein and not on the surface.
hydrogen-bonds
In this case, we distinguish between hydrogen bonds within the protein itself and bonds between the protein and the water.
As before, the plots show that the protein is flexible, because the number of bonds varies strongly over time.
In Figure 30 you can see the bonds within the protein. The number varies between 300 and 355 bonds. Most of the time, the protein has between 320 and 330 hydrogen bonds.
If we look at the number of bonds between the protein and the water, which are visualized in Figure 31, we can see that there are many more bonds between protein and water than within the protein. The number varies between 800 and 900, so there are roughly 2.5 to 3 times as many bonds between the protein and the water. Over the simulation time, the number of bonds between water and protein grows on average. Most of the time, the protein has between 840 and 860 bonds with the water.
This is no surprise, because every residue on the surface has contact with water, whereas inside the protein there are a lot of amino acids which do not have contact partners, because the other amino acids are too far away.
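Hydrogen bonds in a trajectory are usually counted with a geometric criterion. The sketch below assumes a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30 degrees; these cutoffs are assumptions, since we did not record the settings of our analysis, and the function name is hypothetical.

```python
import numpy as np


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance_nm=0.35, max_angle_deg=30.0):
    """Check one donor-hydrogen-acceptor triple (coordinates in nm)."""
    donor = np.asarray(donor, dtype=float)
    donor_to_acceptor = np.asarray(acceptor, dtype=float) - donor
    donor_to_hydrogen = np.asarray(hydrogen, dtype=float) - donor
    distance = np.linalg.norm(donor_to_acceptor)
    cosine = np.dot(donor_to_acceptor, donor_to_hydrogen) / (
        distance * np.linalg.norm(donor_to_hydrogen))
    # Clip to guard against rounding slightly outside [-1, 1].
    angle = np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
    return bool(distance <= max_distance_nm and angle <= max_angle_deg)


# Linear arrangement with the acceptor 0.3 nm from the donor.
print(is_hydrogen_bond([0, 0, 0], [0.1, 0, 0], [0.3, 0, 0]))
```

Counting the triples that pass this test in every frame gives curves like the ones in Figure 30 and Figure 31.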
Ramachandran plot
Now, we want to have a closer look at the secondary structure of the protein during the simulation. We used a Ramachandran plot to analyse the phi and psi torsion angles of the backbone and to get a better understanding of the secondary structure during the simulation.
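Each point in a Ramachandran plot is a pair of torsion angles, and each torsion angle is defined by four consecutive backbone atoms (C, N, CA, C for phi and N, CA, C, N for psi). The following sketch computes one torsion angle from four positions; the function name and the example coordinates are hypothetical, and the sign is meant to follow the usual convention for torsion angles.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


# Four points with the outer bonds on the same side of the central bond.
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))
```

Applying this function to the backbone atoms of every residue in every frame yields the phi and psi values that are drawn against each other in the plot.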
todo picture
RMSD matrix
Next, we analysed the RMSD values with an RMSD matrix. This is useful to see whether there are groups of frames over the simulation that share a common structure. Such groups have lower RMSD values within the group and higher RMSD values compared to structures which are not in the group.
The following matrix shows the RMSD values of our structures.
As you can see in the picture above, there is one big group, which is colored in green, but it is not possible to find any very dense groups whose members all have a very low RMSD to each other. Therefore, we can conclude that there are no very similar structures during the simulation and that our protein adopts different structures while moving around.
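How such a matrix is built can be sketched as follows. The code assumes that every pair of frames is superimposed with the Kabsch algorithm before the RMSD is calculated and that all atoms have equal weight, which may differ from the settings of the tool we used. The function names and the example frames are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) structures after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _s, vt = np.linalg.svd(p.T @ q)
    # Avoid an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rotated = p @ rotation.T
    return np.sqrt(np.mean(((p_rotated - q) ** 2).sum(axis=1)))


def rmsd_matrix(frames):
    """Symmetric matrix of pairwise RMSD values for a list of frames."""
    n = len(frames)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = kabsch_rmsd(frames[i], frames[j])
    return matrix


# Hypothetical frames: a structure, a rotated copy and a stretched copy.
base = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
turn = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
print(rmsd_matrix([base, base @ turn.T, base * 1.5]))
```

Blocks of low values along the diagonal of such a matrix correspond to groups of consecutive frames with a common structure.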
cluster analysis
Next, we ran a cluster analysis, which found 231 different clusters.
We visualized all of these cluster structures:
Next, we aligned some structures of the clusters and measured the RMSD:
Cluster 1 | Cluster 2 | RMSD |
cluster 1 | cluster 2 | 0.880 |
cluster 1 | cluster 5 | 0.068 |
The RMSD values of the different structures are very similar, which can be seen in the picture above. Furthermore, the RMSD values between the structures of the clusters are very low. Therefore, we can see that the different structures of the simulation are very similar.
To get a better insight into the distribution of the RMSD values between the different clusters, we visualized the distribution in Figure ?.
internal RMSD
The last point in our analysis is the calculation of the internal RMSD values. These are based on the distances between the single atoms of the protein, which can help us to understand the structure of the protein.
As we can see in Figure ?, the RMSD is relatively small at the beginning of our simulation, but it increases over almost the complete simulation. Only at time 6000 is there a valley in the plot. At the end, the internal RMSD is at 0.25 Angstrom, which is not very high. Therefore, the distances within the protein do not change much.
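The idea of an RMSD based on internal distances can be sketched as follows: instead of superimposing two structures, all atom-pair distances of a frame are compared with those of a reference. This is a minimal illustration with equal weights for all pairs; the function names and the example coordinates are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Matrix of all atom-atom distances for an (n_atoms, 3) array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def distance_rmsd(reference, frame):
    """RMS deviation of the internal distances of frame from reference."""
    upper = np.triu_indices(len(reference), k=1)   # count each pair once
    d_ref = pairwise_distances(reference)[upper]
    d_frame = pairwise_distances(frame)[upper]
    return np.sqrt(np.mean((d_frame - d_ref) ** 2))


# A shifted copy has the same internal distances as the reference.
reference = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
print(distance_rmsd(reference, reference + 5.0))
```

Because only internal distances enter the calculation, the result does not depend on how the protein is translated or rotated in the box.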