Molecular Dynamics Analysis BCKDHA

From Bioinformatikpedia
Revision as of 12:50, 15 September 2011 by Reisinger (talk | contribs) (Internal RMSD)

Contents

Wildtype

A brief check of results

To verified that the simulations finished properly we first use the command

  • gmxcheck -f wt.xtc

How many frames are in the trajectory file and what is the time resolution?

  • frames: 2001
  • time resolution: 5ps

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

  • real time: 9h27:35
  • simulation speed: 25.370 ns/day
  • simulation speed: 107991 years/second

Which contribution to the potential energy accounts for most of the calculations?

  • potential energy: -1.24431e+06

Visualization of results

To get a pdb file to be able to visualize the model with pymol we used the Swiss army knife gromacs tool trjconv:

  • trjconv -s wt.tpr -f wt.xtc -o protein.pdb -pbc nojump -dt 10
Figure1: MD simulation of the movement of BCKDHA

Quality assurance

Energy calculations

To calculate the different energies we used the command:
g_energy -f wtMD.edr -o energy.xvg
After submitting this command we had to choose the energy which should calculated.

  • Pressure: 13
  • Temperature: 12
  • Potential: 9
  • Total Energy: 11

Pressure

Figure2: Plot of the pressure during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (bar)
Pressure 1.01601 0.015 71.2152 -0.0706383

In Figure 2 the pressure of the molecular dynamic system is shown. The average value is 1.0161 bar which is shown in the table above. But as we can see the pressure ranges from about -250 bar to 250 bar. Since there is such a big range of 500 bar we are not sure if this value of 1.0161 bar is the equilibrium which should be reached or only the arithmetic average of all the values.

Temperature

Figure3: Plot of the temperature during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (K)
Temperature 297.941 0.0047 0.954498 0.00557078

In Figure 3 the temperature of the MD simulation is shown. As we can see it ranges between 294 K and 302 K so it has a very small deviation of the average value of 297.9 K. Since there is only such a small fluctuation we can see that the temperature in the system is quite stable which means that it reached an equilibrium.

Potential

Figure4: Plot of the potential during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Potential -1.24431e+06 66 1041.57 -463.992

Figure 4 shows the potential of the md system. As we can see in the picture the potential ranges from -1.24e+06 kJ/mol to -1.25e+06 kJ/mol. Although this is a very huge range of 10000 we can see that all in all the potential is very low. This low potential indicates that the protein is quite stable. Since the structure of a protein is responsible for the function of a protein this stability is important for the function of the protein.

Total Energy

Figure5: Plot of the total energy during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Total Energy -1.02119e+06 65 1279.76 -459.819

The low potential energy already indicated that the total energy of the system has to be quite low. By looking at Figure 5 we can see that the energy is a bit higher than the potential energy but it is still very low. Additionally there is less variation in the energy since it ranges between -1.017e+06 kJ/mol and -1.025e+06 kJ/mol. Again we can say that such a low energy stands for a stable protein which indicates that the simulation was correct.

Minimum distance between periodic boundary cells

It is important to calculate the minimal distances to find out if there are direct interactions. Such interactions could appear if the distances are shorter than the cut off value of electrostatic interactions.

To calculate the minimum distance we used the command
g_mindist -f wtMD.xtc -s wtMD.tpr -od minimal-periodic-distance.xvg -pi
After submitting this command we chose group 1 to calculate the minimum distance for the whole protein.

Figure6: Minimum distance between periodic boundary cells

The shortest periodic distance is 1.40945 nm at time 6090 ps between atoms 25 and 6490. As we can see the values range between 1 nm and 4 nm during the whole simulation. It is interesting that the most variation is in the middle part of the simulation so we can say that in the middle of the simulation. Since the fluctuation of the values correspond with the movement and flexibility of the protein we can say that the protein is very flexible between 4000 ps and 7000 ps. The rest of the time it is also flexible but less than in this period.

Root mean square fluctuations

For calculating the RMSF of a protein each atom of this protein is compared with the calculated average stucture of the protein. By comparing them it is possible to find out how much it varies from its average position and so the flexibility of this region can be calculated. Regions with a high fluctuation are more flexible than regions with a low one.

To calculate the minimum distance we used the command
g_rmsf -f wtMD.xtc -s wtMD.tpr -o rmsf-per-residue.xvg -ox average.pdb -oq bfactors.pdb -res
After submitting this command we had to choose the group we want the RMSF to be calculated for:

  • Protein: 1
  • C-alpha: 3
RMSF for protein RMSF for C-alpha
Figure7: RMSF for protein
Figure8: RMSF for C-alpha

Figure 7 where the RMSF for the whole protein is plotted points out that mainly the beginning of the protein features the most fluctuation. This indicates that this region of the protein is the one with the most flexibility. In the middle part of the protein there are nearly no peaks which means that this part is fixed. In the end there is a peak which is a bit higher than the rest of the protein. This could suggest that this region is also a bit flexible.
In Figure 8 we did the RMSF calculations not for the whole protein but only for the c-alpha atoms. Those atoms are the central carbon atoms of the protein which means that in this case calculation only the backbone is considered. But by comparing the two plots above with each other we can see that in both cases the beginning of the protein is the part which has definitly the huges fluctuation of the average structure. So not only the side chains differ from the average structure but also the backbone which indicates for a strong flexibility.

Pymol analysis of average and bfactors

These average.pdb file was produced automatically during the calculation of the RMSF because it is needed for comparisons. This file contains the average structure of the protein. Because of the option -oq the bfactor.pdb file was produced additionally. In this file the temperature factors (bfactors) are calculated and added to a reference structure by coloring the specific regions of the structure. Normally the parts of the protein which are most flexible have also the highest temperature. To find out if this is the fact in our case we used pymol to analyze the average structure and the bfactors structure. Additionally we compared the predicted average structure with the original structure of our protein.


Protein

1u5b/average 1u5b/bfactors bfactors/average
Figure9: Alignment of the original structure with the average structure
Figure10: Alignment of the original structure with the structure containing the bfactors
Figure11: Alignment of the average structure with the structure containing the bfactors
RMSD: 1.169 RMSD: 0.377 RMSD: 1.422


To find out how accurate the calculated average structure is we aligned it with the original structure of out protein (1u5b). As we can see in Figure 9 and additionally because of the RMSD value of 1.169 the superposition of the two structures is not covering perfectly. The middle part of the protein is aligned quite good. The most deviating parts are the two ends of the structures on the left and on the right side of the picture. By looking at the already discussed RMSF we can see that these regions are the most flexible ones so it is possible that the two structures are only in two different states of movement. Next we compared the structure of 1u5b with the structure containing the bfactors and according to Figure 10 the used reference structure on which the bfactors are added is the structure of 1u5b. This is obvious since they are superposed nearly perfectly. There is a minimal shift in the alignment but since this occurs at the whole structure we consider it to be an error of the superposition tool. Now we come to the analysis of the bfactors. In Figure 11 we can see the alignment of the structur containing the bfactors with the average structure. Of course they are a bit different again because of the different states of movement but in this figure the bfactors are the most interesting observation. As it is shown in the picture only the part in the end of the protein (left side of the picture) is colored indicating that only this part of the protein is flexible. The coloring ranges from yellow to red where yellow stands for little and red for high flexibility. This flexibility according to the bfactors is reflected by the RMSF value above. The other end of the protein is not colored but only a bit shifted. We thought that this shift could be a result of a movement but since it is not colored this theory is perhaps false. But by looking at the RMSF values above we see that there is only a very little fluctuation of the atoms. So perhaps this part is a little bit flexible but not enough to be marked as flexible by the bfactors.

C-alpha

1u5b/average 1u5b/bfactors bfactors/average
Figure12: Alignment of the original structure with the C-alpha atoms of the average structure
Figure13: Alignment of the original structure with the C-alpha atoms of the structure which contains the bfactors
Figure14: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors
RMSD: 0.955 RMSD: 0.300 RMSD: 0.993

To analyse the run where we only considered the C-alpha atoms of the structure again we first wanted to find out how good the average structure fits the original structure. As we can see in Figure 12 there is again a lot of variation between the average structure and the structure of u15b. But by looking at the RMSD value (0.955) we can see that it is smaller than the RMSD value considereing the whole protein for the average structure (1.169). Since we assumed that the variation is aroused by the different states of movement in the different structures we can say that the backbone has a bit less flexibility because of the lower RMSD value. The most variation is again in the both endings of the protein. Next the comparison of the original structure with the C-alpha atoms of the structure containing the bfactors gets analysed. Figure 13 and also the very low RMSD value show that the superposition of the two structures is very close. Perhaps there is a little bit of variation between the two structures or we have the same case as in Figure 10 where we assumed a mistake of the programm since there was a shift during the whole alignment. It is hard to see it here because of the spheres. The last analyses is the detailed one of the structure containing the bfactors ( Figure 14). Again the part with the highest temperature is colored where red means high flexibility and green low flexibility. As we can see only the end of the protein (right side) is colored so only this part of the protein exhibits flexibility. This observation agrees with the RMSF because in both cases the beginning of the protein is predicted to be flexible.

Radius of gyration

The radius of gyration reflects how the structure changes during the simulation and how the shape changes during the time.

To calculate the radius of gyration we used the command
g_gyrate -f wtMD.xtc -s wtMD.tpr -o radius-of-gyration.xvg
After submitting this command we chose group 1 to calculate the radius of gyration for the whole protein

Figure15: Radius of gyration during the MD simulation

According to the black line in Figure 15 the radius of gyration ranges between 2.22 and 2.4. nm during the whole simulation. The black line describes the general change in the shape of the protein. By looking at the plot more closely we can see that there is trend. In the beginning the radius has its maximal value of about 2.4 nm but during the simulation it falls have of the time. But after about 6300 ps the decline of the radius stopps. After this moment the value is between 2.23 and 2.27 nm which shows that the fluctuation is very small. But the fact there is still variation until the end of the simulation shows that there the protein moves all the time indicating the flexibility of the protein. The changes in the profile of the protein are specified by the red (x axis), green (y axis) and blue (z axis) lines.

Structural analysis

First we had to use the command
trjconv -f wtMD.xtc -o wtMD_nojump.xtc -pbc nojump
This is important because the protein possibly jumps out of the box so the trajectory has to be rebuild. This has the effect that the particles are back in the center.

Solvent accessible surface area

The solvent accessible surface area (SASA) of a protein is the part of the surface which is reachable a solvent. This definition of SASA can be devided into two subgroups - hydrophilic SASA and hydrophobic SASA. Which means that the possibility that a solvent can reach the surface depends on its properties.

To calculate the solvent accessible surface area we used the command
g_sas -f wtMD_nojump.xtc -s wtMD.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg
After submitting this command we had to choose two groups. Both times we chose protein.

SAS over time per residue SAS over time per atom Solvent accessible surface
Figure16: Plot of the average solvent accessibe surface over time per residue
Figure17: Plot of the average solvent accessibe surface over time per atom
Figure18: Plot of the solvent accessible surface of the protein during the md simulation


In Figure 16 the average sas for each residue during the simulation is shown. We can see that there is much variation and the solvent asseccible areas for the residues range between 0 nm2 and 2.3 nm2. As there are also regions which have a sas of 0 nm2 we can see that there are parts which are not accessible for solvents but the most regions are accessible. The most accessible one is in the total beginning since the peak is definitly the highest one. Additionally there are two high peaks in the middle of the protein which differ completely from the peaks next to them since they are all quite. This shows that there are only a few parts in the the center of the protein which are accessible for solvents but here the accessibility is very good. Figure 17 the average solvent accessibe surface over time per atom is shown. Again there is a lot variation in the sas. It ranges between 0 nm2 and 0.55 nm2. The last plot ( Figure 18) shows the general sas for the whole protein during the simulation. The red line describes the accessibility for hydrophilic solvents and the black line for hydrophobic solvents. As we can see the accessibility for hydrophobic solvents is a little bit higher but not a lot. The green line which hardly fluctuate shows the general sas for the protein during the whole simulation indicating that the sas is always quite the same.

Hydrogen bonds

There are two different possibilities of hydrogen bonds. They can be inside of the protein (protein-protein) or between the protein and the surrounding solvents. For the building of a hydrogen bond it is important that the hydrogen-donor and the hydrogen-acceptor are not to far away from each other. This means that high flexibility of a protein would lead to high variation in the hydrogen bonds.

To calculate the hydrogen bonds between protein and protein and between protein and water we used the commands

echo 1 1 | g_hbond -f wtMD_nojump.xtc -s wtMD.tpr -num hydrogen-bonds-intra-protein.xvg
echo 1 12 | g_hbond -f wtMD_nojump.xtc -s wtMD.tpr -num hydrogen-bonds-protein-water.xvg

protein and protein protein and water
Figure19a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure20a: Hydrogen bonds with the surrounding solvents and pairs within 0.35 nm during the simulation
Figure19b: Internal hydrogen bonds during the simulation
Figure20b: Hydrogen bonds with the surrounding solvents during the simulation


Donors Acceptors avg.# of h-bonds possible # of h-bonds
protein-protein 594 1158 308.847 343926
protein-water 29470 30034 806.073 4.42551e+08


Figure 19a (left) shows the number of internal hydrogen bonds during the simulation. According to the black line in this plot which describes the hydrogen bonds the number of bonds is about 300. This number is supported by the table above. Since the black line shows nearly no variation during the whole simulation it seems that there is no change in the number of hydrogen bonds. But by looking at Figure 19b we can see that the number of hydrogen bonds change since there is much fluctuation in the curve. Although they vary between 280 and 335 a trend can be seen. In the beginning the average number is about 310 then they go down to about 300 and rise again to 320. So we see that the number of hydrogen bonds first decline a bit but after one third of the simulation they rise again. By comparing this trend with the one in Figure 20b we can say that they are completely oppositional. First the number is low then rises a bit and after about one third of the time they fall again. It has to be recognized that the number of extrenal hydrogen bonds is always much higher than the internal one since the range lies between 740 and 860 but it is interesting that they are completely opposed. It is obvious that they have to be like this because of the movement of the shape of the protein. Since there is movement which is indicated by the alternating hydrogen bonds we can say that the protein is very flexible during the simulation. The red lines in Figure 19a and Figure 20a display the pairs within 0.35 nm. There are much more pairs within this distance inside of the protein (1400-1500 pairs) than with the surrounding solvents (1000 -1200 pairs). Additionally there is much more variation in the number of pairing with the solvents during the simulation than inside of the protein.

salt bridges

Ramachandran plot

In a Ramachandran plot the backbone dihedral angles ψ and φ of the amino acid residues are visualized. On the left upper corner the beta sheets are shown. Under the beta sheets the alpha helices are described. On the right side the lefthanded helices of the protein are shown.


To calculate the ramachandran plot we used the command
g_rama -f wtMD_nojump.xtc -s wtMD.tpr -o ramachandran.xvg

Ramachandran plot of our simulation general Ramachandran plot
Figure22: Ramachandran plot of our protein
Figure23: General Ramachandran plot (<ref>http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg</ref>)


Figure 22 shows the Ramachandran plot of the protein predicted by MD. As we can see the regions for beta sheets and alpha helices are very black and also the part for lefthanded helices. Additionally to these fields the other three corners are black. By comparing it to the general Ramachandran plot (Figure 23) we can say that there are much more black fields in the plot of the simulation. This shows that the angles are not that concentrated on one position but vary a lot. Since there are regions which are completely white it is obvious that some positions and angle combinations not occur in the simulated protein. The fact that there are so many different angle positions and not only the ones like in the general Ramachandran plot could indicate that this protein is flexible.

Analysis of dynamics and time-averaged properties

RMSD matrix

A RMSD matrix is helpful to find groups of structures which are similar between the different points of time. When there are groups of structures which are similar the RMSD value is lower between them and high to other groups. The range of the RMSD value is from 0 (blue) to 0.579 (red).

To calculate the RMSD matrix we used the command
g_rms -s wtMD.tpr -f wtMD_nojump.xtc -f2 wtMD_nojump.xtc -m rmsd-matrix.xpm -dt 10
After submitting this command we had to choose two groups. Both times we chose protein.

Figure24: RMSD matrix of the structures of our protein during the simulation

Figure 24 shows the correlation between the several structures of our protein during the simulation. It is obvious that there have to be a diagonal which is turquoise and blue. As we can see there is only one other part in the matrix which is turquoise and it is in the end of the simulation between 6000 ps and 10000 ps. This shows that these groups of structures are all quite similar. Additionally there are red parts between 6500 ps till 9000 ps and 1000 ps till 2500 ps. This shows that the group of structures which are similar in the end are quite different from the groups in the first part of the simulation. It is also very interesting that the groups of structuers which are completely in the beginning of the simulation seem to be very different to the whole rest of structures during the simulation since thw border of the matrix is red and only in the bottom left corner it is colored green.

Cluster analysis

The similarity of structures which is analysed above can also be calculated and shown by clustering the structures which are similar to each other. This is done in the next step.

To calculate the cluster we used the command
echo 6 6 | g_cluster -s wtMD.tpr -f wtMD_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

For the analysis of the first two cluster we used pymol.

Figure25: Visualisation of the cluster of structure groups
Figure26: Plot of the RMSD values of the clusters


The programm was able to find 542 cluster. In figure Figure 25 the clustered structures of the protein are visualised. The plot of the clustures show the RMSD values of the different plots. The RMSD values range from 0.07 nm to 0.57 nm. The fact that these values are quite low indicates that the all of the groups of structures are not completely different. As we can see most of the clusters have an RMSD value of about 0.35 which shows that that the main part of the structures have a bit similarity to other groups of structures. There is also a little number of groups with a value of 0.57 which shows that these groups of structures only have a bit similartiy during the simulation. Since the peaks between 0.1 and 0.2 are very small there only a few groups of structures which show a very high similarity during the simulation.
Furthermore we compared two of the clusters to each other by comparing the structures. We chose cluster 1 and cluster 2 for this comparison. Since the RMSD value is 0.709 we can see that the clusters are not completely different and there are groups of structures in the clusters which still have similarities.

Internal RMSD

The internal RMSD are the atomic distances inside the protein. With this measure it is possible to get information about the changes of the structure during the simulation.


To calculate the internal RMSD we used the command
g_rmsdist -s wtMD.tpr -f wtMD_nojump.xtc -o distance-rmsd.xvg
After submitting this command we chose group 1 to calculate the internal RMSD for the protein

Figure27: Internal RMSD of our protein during the simulation

The internal RMSD values ranges from 0.1 nm to 0.45 nm so we can see that there is a lot fluctuation during the simulation. As we can see in Figure 27 in the beginning the values are very low but then they rose very fast until 1500 ps. After this point they only range between 0.3 nm and 0.4 nm which is not a huge variation. After about 5000 ps they rise again a bit so that the average value for the following time is 0.4 nm. After 10000 ps it seems that the RMSD converges against 04.nm.

Mutation M82L

A brief check of results

How many frames are in the trajectory file and what is the time resolution?

  • frames: 2001
  • time resolution: 5

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

  • real time: 1d03h11:10
  • simulation speed: 8.828 ns/day
  • simulation speed: 310388 years/second

Which contribution to the potential energy accounts for most of the calculations?

  • potential energy: -1.24452e+06

Quality assurance

Energy calculations

Pressure

Figure29: Plot of the pressure during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (bar)
Pressure 0.998885 0.018 71.3509 0.00251495

In Figure 29 the pressure of the molecular dynamic system is shown. According to the figure and also to the table average value is 0.999 bar. But as we can see the pressure ranges from about -240 bar to 230 bar so the values lie in a big range. Because of this fact we are not sure if this value of 0.999 bar is the equilibrium which should be reached or only the arithmetic average of all the values.

Temperature

Figure30: Plot of the temperature during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (K)
Temperature 297.939 0.0044 0.959135 0.00200358

The temperature which is shown in Figure 30 ranges between 294.5 K and 301 K. As we can see in the figure there is a lot fluctuation around the average value of 197.939 but the range itself is not very big. Because of this fact we can say that the temperature in the system is quite stable which means that it reached an equilibrium.

Potential

Figure31: Plot of the potential during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Potential -1.24452e+06 98 1059.66 -676.063

In Figure 31 where the potential of the system is shown we can see that there is a lot fluctuation during the simulation. The values range between -1.249e+06 and -1.24e+06 during the whole simulation indicating that the potential did not reach an equilibrium. But it is possible to recognize a trend in the fluctuation since the values decline during the whole simulation. A low potential indicates that a protein is stable so in our case we can say that the protein got more stable during the simulation. This fact is very good for the protein since the shape is important for the function and so it is important for a functional protein to be stable.

Total Energy

Figure32: Plot of the total energy during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Total Energy -1.02137e+06 98 1298.13 -674.564

The total energy ( Figure 32) is very low again with an average value of -1.02137e+06 kJ/mol, but it is not as low as the potential. Except for a few peaks we can say that the fluctuation is quite regular indicating that the protein never has a high energy. As already mentioned in the analysis of the potential we can say that this high total energy is important for the stablility and therefore also for the function of the protein.

Minimum distance between periodic boundary cells

Figure33: Minimum distance between periodic boundary cells


The shortest periodic distance is 1.91319 nm at time 630 ps, between atoms 25 and 5578. The distance ranges between 2 nm and 5 nm which shows that the protein is flexible during the whole simulation. By looking at Figure 33 we can recognize a trend in the minimum distance since it starts quite low and fluctuates around a value of 2.5 nm and after 4700 ps the values rise constantly. After 7000 ps they fluctuate around a average value of 4. So we can say that during the simulation the minimum distance rise.

Root mean square fluctuations

RMSF for protein RMSF for C-alpha
Figure34: RMSF for protein
Figure35: RMSF for C-alpha

In Figure 34 the RMSF for each residue of the protein is shown. Because of the huge peak at the first residues it is obvious that the part which varies most of the average structure is in the beginning. This variation indicates that this region of the protein is very flexible because it changes its shape a lot. The rest of the protein is quite fixed beacuse there are no remarkable high peaks. Only the region in the end contains a peak which is a big higher than the other one so we can say that there is perhaps a region which is a bit flexible. Since the trend in Figure 35 which shows the RMSF for the C-alpha atoms is the same we can say that not only the residues in the beginning are flexible but also the backbone. There is also a small peak in the end indicating that there is little flexibility in the end of the backbone of the protein.

Pymol analysis of average and bfactors

Protein

1u5b/average 1u5b/bfactors bfactors/average
Figure36: Alignment of the mutated structure with the average structure
Figure37: Alignment of the mutated structure with the structure containing the bfactors
Figure38: Alignment of the average structure with the structure containing the bfactors
RMSD: 1.106 RMSD: 0.377 RMSD: 1.294


First we aligned the calculated average structure with the original structure of the protein to see if the mutation causes huge changes in the structure (Figure 36). As we can see there are big differences but they've already been there without the mutation (Figure 9) and can be explained that the two structures are in different steps of the movement. In the next figure (Figure 37) we compared the structer of 1u5b with the structures containing the bfactors. Since this structure is the reference structure the two structres should be very similar or perhaps identical. As we can see both structures are aligned nearly perfectly. It only looks like there is a shift in the whole structure but this could be a mistake of the programm. In Figure 38 it can be seen very good that the beginning of the protein is the onlyregio which is colored with another color than blue or turquoise so it is the only region which has a higher temperature. As some parts are completely red we can say that the beginning of the protein is very flexible.

C-alpha

1u5b/average 1u5b/bfactors bfactors/average
Figure39: Alignment of the mutated original structure with the C-alpha atoms of the average structure
Figure40: Alignment of the mutated original structure with the C-alpha atoms of the structure which contains the bfactors
Figure41: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors
RMSD: 0.886 RMSD: 0.289 RMSD: 0.930

To find out if the backbone is also flexible and not only the sidechains we also analyse the C-alpha atoms. In Figure 39 we first aligned the calculated C-alpha atoms with the original structure and as we can see there are huge differences between the two structures expecially in the beginning and in the end of the protein. But these differences are not evoked by the mutation since there are also differences in Figure 12. This variation is only because the two structures are in different steps in the movement. To compare the structure of the bfactors with the original structure we also aligned them ( Figure 40). Here we can see that they cover each other nearly perfectly. This observation is supported by the RMSD value of 0.289 which is really very low. This fact let us assume that there is no influence of the mutation on the structure. In Figure 41 the bfactors are colored and so we can see very good that only the beginning of the protein is not only blue or turquoise. This indicates that only the beginning of the protein is flexible. Since only the very ending of the protein is completely red and not only orange we know that only this part is very flexible and the rest is indeed flexible but not that much.

Radius of gyration

Figure42: Radius of gyration during the MD simulation

The radius of gyration which is visualised in Figure 42 ranges between 2.25 and 2.4 nm. According to the black line which describes the general variation in the shape of the protein the radius decrease during the simulation. Except of the middle of the simulation where the values rise a bit to 2.35 nm again. But in the end we have only a radius of gyration of about 2.25 nm. Since there is onyl a very small fluctuation in the end of the simulation we can assume that there is only a litte flexibility in the protein.

Structural analysis

Solvent accessible surface area

SAS over time per residue SAS over time per atom Solvent accessible surface
Figure43: Plot of the average solvent accessibe surface over time per residue
Figure44: Plot of the average solvent accessibe surface over time per atom
Figure45: Plot of the solvent accessible surface of the protein during the md simulation

In the first figure ( Figure 43) the average solvent accessibility for each residue during the simluation is shown. Since the values range between 0 nm2 and 2 nm2 we can say that the surface of the protein has many very different regions. There are many regions in the middle of the protein which have very low values indicating that these parts of the protein can hardly be reached by solvents. Nevertheless there are three very high peaks at residue 150, 180 and 230 so there are also small regions in this middle part which are good accessible. But the region with the highest accessibility is in the beginning of the protein at residue 10 since the peak at this position is the highest one in the whole plot. All in all we can say looking at the plot that there are more high values in the beginning and in the end of the protein and less in the middle part. In Figure 44 the average sas for each atom during the simulation is shown. Again there is a lot variation between the different parts of the protein since the values range between 0 nm2 and 0.5 nm2. The parts with very low values are wuite the same as in Figure 43 and are located in the middle of the protein. The interesting point is that the highest value is not in the beginning but in the end of the protein. Additionally there are many peaks in the middle part which are as high as the peaks in the beginning. These facts indicate that the first part of the protein is not better accessible for solvents than many other regions of the protein. The accessibility in the end of the protein is very variing. Although there is one vrey high peak the peaks in the neighbourhood are very low so there is only a small region which is good accessible for solvents but around this region the accessibility got worse. In Figure 45 the accessibility is devided in hydrophilic (red) and hydrophobic (black). Both lines are at about 110 nm2 - 120 nm2 and do not change a lot during the simulation. Since the black line is bit over the red one we can conclude that the accessibility for hydropbohic solvents is a bit better but not a lot.

4124 out of 6658 atoms were classified as hydrophobic

Hydrogen bonds

protein and protein protein and water
Figure46a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure47a: Hydrogen bonds and pairs within 0.35 nm with the surrounding solvents during the simulation
Figure46b: Internal hydrogen bonds during the simulation
Figure47b: Hydrogen bonds with the surrounding solvents during the simulation


Donors Acceptors avg.# of h-bonds possible # of h-bonds
protein-protein 594 1158 304.417 343926
protein-water 29474 30038 817.490 4.4267e+08

By comparing Figure 46a which shows the internal hydrogen bonds with Figure 47a we see that there are more hydrogen bonds with the surrounding solvents than in the interior of the protein. This is clear since there are much more possibilities to build bonds with the surrounding solvent. But it is interesting that there are more pairs within 0.35 nm inside of the protein than with the surrounding solvents. Additionally the numbers of these pairs vary lot indicating that there is much movement in the protein which means that the protein is flexible all the time. The black lines which describe the hydrogen bonds seems barely to fluctuate. But by looking at Figure 46b and Figure 47b we can recognize that they indeed fluctuate. The number of hydrogen bonds inside of the protein range between 280 and 330 and the number of hydrogen bonds with the surrounding solvents lies between 750 and 870. Both rise and decline very constantly all the time like a sinus curve. It is remarkable that both curves are inversely. This fact and that they are fluctuating all the time indicates that there is a lot variation in the shape of the protein which means that it is very flexible.

salt bridges

Ramachandran plot

Ramachandran plot of our simulation general Ramachandran plot
Figure49: Ramachandran plot of our protein
Figure50: General Ramachandran plot (<ref>http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg</ref>)

By comparing the Ramachandran plot of our protein ( Figure 49) with the general Ramachandran plot ( Figure 50) we can see that there are huge differences. Indeed the regions where the alpha helices and the beta sheets are visualised are the same but there are much more left-handed alpha helices in our protein. Additionally the right upper corner and the bottom right corner are black which means that these angle combinations are also very common in our protein.

Analysis of dynamics and time-averaged properties

RMSD matrix

Figure51: RMSD matrix of the structures of our protein during the simulation


Figure 51 shows the correlation between the several structures of our protein during the simulation. It is obvious that there have to be a diagonal which is turquoise and blue. In addition to this diagonal there are three regions next to the diagonal which are also turquoise. According to this turquoise coloration we can say that the structures between 0 ps and 4000 ps correlate quite good with each other. There are some green and yellow parts in this square so some of the structures do are not very similar but all in all the structures correlate. The next region is between 4000 ps and 6000 ps where the structures correlate very good. In the last part of the simulation between 6500 ps and 10000 ps the structures correlate the best since the coloring is partly blue. As the coorelating parts are like blocks around the diagonal we can say that the structures change only a little during some parts of the simulation since they all correlate with each other in one black. But then there is a jump to another structure. It seems like there is nearly no change in the structure after 6500 ps since here the structures are very similar. The jumps between the several blocks has to be high since the structures of the beginning and the end of the simulation do not correlate at all. This is shown be the red coloring between 200 ps - 600 ps and 6000 ps - 10000 ps.

Cluster analysis

Figure52: Visualisation of the cluster of structure groups
Figure53: Plot of the RMSD values of the clusters


The programm was able to find 525 cluster. In figure Figure 52 the clustered structures of the protein are visualised. The plot (Figure 53)shows the distribution of the RMSD values of the clustures. The RMSD values range from 0.07 nm to 0.9 nm. Since there are such high RMSD values we can see that some structures are clustered although they are not very similar. Contrary we also have cluster with a very low RMSD of 0.07 which shows that the structures in these clusters have to be very similar. But most of the clusters (19000) have an RMSD of 0.58 indicating that the structures are not very similar but correlate quite good. There is also another high peak with about 10000 clusters which have an RMSd of 0.2 This is a very low value indicating that many very similar structures are clustered.
Additionally we compared the two clusters one and two with each other. The have an RMSD of 0.822 nm. This is interesting since there are clusters which have an internal RMSD which is higher than this one. So we can assume that there are a few structures which are very different to all other structures.

Internal RMSD

Figure54: Internal RMSD of our protein during the simulation

The internal RMSD which is plotted in Figure 54 changes a lot during the simulation. In the beginning it les at 0.1 nm which means that the interatomic distances were all very small. But between 0 ps and 1000 ps there are two huge jumps so that the RMSD value amounts 0.35 nm. Until the end of the simulation it rises so that the RMSD in the end is at about 0.45 nm. Since there is not much fluctuation in the end it seems that the value convergates to 0.45 nm. Still there is a bit variation indicating that the protein stays flexible.

Mutation C264W

A brief check of results

How many frames are in the trajectory file and what is the time resolution?

  • frames:
  • time resolution:

How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

  • real time: 1d03h22:33
  • simulation speed: 8.767 ns/day
  • simulation speed: 312557 years/second

Which contribution to the potential energy accounts for most of the calculations?

  • potential energy: -1.24420e+06


Quality assurance

Energy calculations

Pressure

Figure56: Plot of the pressure during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (bar)
Pressure 1.00041 0.0069 71.2651 -0.0331535

In Figure 56 the pressure of the molecular dynamic system is shown. According to the figure and also to the table average value is 1.00041 bar. But as we can see the pressure ranges from about -250 bar to 250 bar so the values lie in a big range. Because of this fact we are not sure if this value of 1.00041 bar is the equilibrium which should be reached or only the arithmetic average of all the values.

Temperature

Figure57: Plot of the temperature during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (K)
Temperature 297.94 0.0064 0.956382 -8.60686e-05

In Figure 57 the temperature of the MD simulation is shown. As we can see it ranges between 294 K and 300.5 K so it has a very small deviation of the average value of 297.94 K. Since there is only such a small fluctuation we can see that the temperature in the system is quite stable which means that it reached an equilibrium.

Potential

Figure58: Plot of the potential during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Potential -1.2442e+06 75 1052.26 -477.246

Figure 58 shows the potential of the md system. As we can see in the picture the potential ranges from -1.248e+06 kJ/mol to -1.24e+06 kJ/mol. Although this is a very huge range of 10000 we can see that all in all the potential is very low. This low potential indicates that the protein is quite stable. Since the structure of a protein is responsible for the function of a protein this stability is important for the function of the protein.


Total Energy

Figure59: Plot of the total energy during the MD simulation
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Total Energy -1.0211e+06 75 1292.04 -477.313

The low potential energy already indicated that the total energy of the system has to be quite low. By looking at Figure 59 we can see that the energy is a bit higher than the potential energy but it is still very low. Additionally there is less variation in the energy since it ranges between -1.025e+06 kJ/mol and -1.017e+06 kJ/mol. Again we can say that such a low energy stands for a stable protein which indicates that the simulation was correct.

Minimum distance between periodic boundary cells

Figure60: Minimum distance between periodic boundary cells

The shortest periodic distance is 2.01518 nm at time 1590 ps, between atoms 166 and 6569. In Figure 60 the minimum distance is visualised by the black line. In the beginning the minimal distance is 0.25 nm but in the following 4500 ps it rise till 4 nm. After 4500 ps the minimal distance fluctuate around this value. With this information we can conclude that in the beginning there is much more variation and so flexibilty in protein but although the change in the minimum distance in the end is not that much there are still fluctuations. This fact indicates that the protein is still flexible in the end of the simulation but does not change its chape that much as in the beginning.

Root mean square fluctuations

RMSF for protein RMSF for C-alpha
Figure61: RMSF for protein
Figure62: RMSF for C-alpha


Figure 61 where the RMSF for the whole protein is plotted it is obvious that mainly the beginning of the protein shows the most fluctuation. This indicates that this region of the protein is the one with the most flexibility. In the middle part of the protein there are no significant high peaks which means that this part is fixed. In the end there is a peak which is a bit higher than the rest of the protein. This could suggest that this region is also a bit flexible. By comparing this plot with Figure 62 we can see the same trend only the peaks are not as high as in figure 61. This indicates that especially the backbone which is displayed by the C-alpha atoms is very flexible.

Pymol analysis of average and bfactors

Protein

1u5b/average 1u5b/bfactors bfactors/average
Figure63: Alignment of the mutated structure with the average structure
Figure64: Alignment of the mutated structure with the structure containing the bfactors
Figure65: Alignment of the average structure with the structure containing the bfactors
RMSD: 1.319 RMSD: 0.385 RMSD: 1.463

First we wanted to find out if there is a change in the structure because of the mutation so we aligned the average structure with the original structure ( Figure 63). As we can see the structures can not be superposed very good and especially in the beginning the two structures seem to be completely different. But by comparing it with Figure 9 we can see that this bad superposition does not occur because of the mutation. Only the part in the beginning is interesting because in the original structure and in the mutated structure they go in two different directions and this is not the case in the unmutated average structure. In Figure 64 we superposed the original structure (1u5b) with the structure containing the bfactors. The superposition seems to quite perfect which is supported by the RMSD value. In Figure 65 the bfactors are visualised. As we can see the beginning of the protein is red orange and turquoise so we can say that this part is the flexible one. But it is remarkable that this time the red part is not at the complete end but a few residues earlier which is different to the other two cases. Additionally to this fact we can also observe that on the other side of the protein (right) the protein is colored light blue indicating that this region is also a bit flexible.

C-alpha

1u5b/average 1u5b/bfactors bfactors/average
Figure66: Alignment of the mutated structure with the C-alpha atoms of the average structure
Figure67: Alignment of the mutated structure with the C-alpha atoms of the structure which contains the bfactors
Figure68: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors
RMSD: 1.106 RMSD: 0.301 RMSD: 1.112

In Figure 66 we first aligned the calculated C-alpha atoms with the original structure and as we can see there are huge differences between the two structures expecially in the beginning and in the end of the protein. These differences are not only evoked by the mutation since there are also differences in Figure 12. But we can see that in the beginning the structure is different than in Figure 12 so we can assume that this change in the structure is caused by the mutation. In Figure 67 the original structure is aligned with the reference structure containing the bfactors. Here we can see that the two structures are superposed very well. And it seems that there are no differences between the two structures. The bfactors which are shown in Figure 68 indicate that the beginning of the protein (left side) has a high temperature and so this part is flexible. It is interesting that on the right side of the picture there are also some C-alpha atoms which are colored in light blue meaning that this region of the protein is also a bit flexible.

Radius of gyration

Figure69: Radius of gyration during the MD simulation

In Figure 69 the radius of gyration of our protein is shown. It ranges between 2.25 nm and 2.4 nm. In the beginning it lies at about 2.4 nm but the it decrease all the time untill 6500 ps where it reachs a value of 2.25 nm. At 6500 ps it rise a bit so that the radius is about 2.3 nm. During the remaining simulation it fluctuates around this value. Since there is more variation in the radius in the first part of the simulation we can say that here the protein is more flexible. But although the radius of gyration does not change that much in the end there is still a bit fluctuation indicating that the protein is still flexible.

Structural analysis

Solvent accessible surface area

SAS over time per residue SAS over time per atom Solvent accessible surface
Figure70: Plot of the average solvent accessibe surface over time per residue
Figure71: Plot of the average solvent accessibe surface over time per atom
Figure72: Plot of the solvent accessible surface of the protein during the md simulation

The first figure ( Figure 70) shows the average sas for each residue during the simulation. It reaches between 0.01 nm2 and 2 nm2 which are huge differences. By looking at the plot we can see that the most accessible region is in the beginning since this peak is the highest one. After this peak the accessibility decline with each following residue until resiue 100 where the surface seems to be not accessible for solvents. There are more regions on the surface especially in the center of the protein which are not accessible for solvents according to the plot. Although there are so many low values in the middle part there are three high peaks which show that there are some good reachable regions anyway. The end of the protein has not as high peaks as the beginning but there are many quite high peaks indicating that this region is accessible for solvents. Figure 71 shows the same tendency as the figure before as there are much more low values in the middle part of the protein and more high peaks in the both ends of the protein. The interesting point is that there are some peaks in the middle which are as high as the one in the beginning of the protein meaning that those regions are as accessible as the beginning. In the end of the protein at atom 64000 and 6500 there are two peaks which are even higher than the one in the beginning so the accessibility is even higher in some parts in the end of the protein. In the last figure ( Figure 72) the solvents are devided in hydrophilic and hydrophobic so we can see whether the accessibility is better for hydrophilic (red) or hydrophobic (black) solvents. According to the figure the accessibility is better for hydrophobic solvents as the black line is always a bit over the red one. It is like this nearly the whole simulation apart from the end where they have both the same value.

Hydrogen bonds

protein and protein protein and water
Figure73a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure74a: Hydrogen bonds and pairs within 0.35 nm with the surrounding solvents during the simulation
Figure73b: Internal hydrogen bonds during the simulation
Figure74b: Hydrogen bonds with the surrounding solvents during the simulation


Donors Acceptors avg.# of h-bonds possible # of h-bonds
protein-protein 595 1159 302.744 344802
protein-water 29467 30031 807.996 4.42462e+08

In Figure 73a and Figure 74a the hydrogen bonds and the pairs within 0.35 nm are shown. As we can see and also according to the table above there are much more hydrogen bonds between the protein and the solvent (~800) than inside of the protein (~300). It is interesting that there are more pairs within 0.35 nm inside of the protein than with the solvents. Since there are less changes in the number of pairs during the simulation inside of the protein we can assume that the core of the protein is much more stable than the surface. Although it looks like there is no fluctuation in the hydrogen bonds we can see that they vary a lot when we look at Figure 73b and Figure 74b. The hydrogen bonds inside the protein range between 275 and 330 and the hydrogen bonds with the surrounding solvents range between 750 and 870. According to the plots we can see that both fluctuate a lot but all in all the number of hydrogen bonds first declines untill 4000 ps and the they rise again. The hydrogen bonds with the solvents behave contrary since the number rise first untill 4000 ps and then they decline. Since there is so much variation in the hydrogen bonds we can assume that the protein is flexible during the whole simulation.

salt bridges

Ramachandran plot

Figure76: Ramachandran plot of our protein
Figure77: General Ramachandran plot (<ref>http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg</ref>)

In Figure 76 the Ramachandran plot of our protein is shown. By comparing it with the average Ramachandran plot ( Figure 77) we can see that there are some similarities between the two figures although the left has much more black regions. But only the regions which are bordered red in the right figure are filled with black dots in our ramachandran plot except for the middle of the right border. This means that there are the average kind of angle combinations in our protein just in a much higher number. Only the combination of 0 Psi and 150-200 Phi only occurs in our protein accordning to the plots.

Analysis of dynamics and time-averaged properties

RMSD matrix

Figure78: RMSD matrix of the structures of our protein during the simulation

Figure 78 shows the correlation between the several structures of our protein during the simulation. It is obvious that there have to be a diagonal which is turquoise and blue. In addition to this diagonal there are many regions next to the diagonal which are also turquoise. Since nearly the whole neighbourhood around the diagonal is turquoise we can say that all the structures which are simulated are very similar to the ones which are simulated directly before and after themselves. So we can say that the change during the simulation is not very high. Only after 6500 ps it seems like there was a bigger change since the square between 6500 ps and 10000 ps is sometimes blue. Additionally to this observation we can see that the structures simulated between 0 ps - 800 ps and 6800 ps - 10000 ps are completely different to each other since this regions are red.

Cluster analysis

Figure79: Visualisation of the cluster of structure groups
Figure80: Plot of the RMSD values of the clusters

The program found 411 clusters. In Figure 25 the clustered structures of the protein are visualised. The plot of the clustures show the RMSD values of the different plots. The RMSD values range from 0.08 nm to 0.75 nm. A RMSD value of 0.08 nm is very low indicating that the structures in this cluster have to be very similar. Since the peak is very small at this position there are only a few clusters with that similar structures. Contrary to those clusters the clusters with a RMSD value of 0.75 nm contain structures which are not very similar. But again there are only a few clusters with that property as the peak is very low. According to the plot most of the clusters have a RMSD value of 0.13 nm and 0.45 nm because the peaks are the highest ones in this region. Those values are all quite low indicating that most of the structures which are clustured are similar. This shows that there are no big changes in the structure during the simulation. Additionally we compared the first cluster with the second one. By aligning them we got an RMSD value of 0.553 nm. This is interesting because there are cluster which have a higher RMSD value so the structures which are clustered in cluster one and two have to be very similar.

Internal RMSD

Figure81: Internal RMSD of our protein during the simulation

The internal RMSD values ranges from 0.05 nm to 0.48 nm so we can see that there is a lot fluctuation during the simulation. As we can see in Figure 81 in the beginning the values are very low but then they rose very fast until 5800 ps. At 5800 ps the internal RMSD value is 0.48 nm. Until the end of the simulation it only declines a bit but mainly fluctuates around 0.45. Since there is much more variation until 5800 ps we can say that the protein is much more flexible in the beginning of the simulation. But there is still fluctuation after this point so the protein is flexible untill the end. Additionally the differences inside of the protein grow since the value is much higher in the end than in the beginning.