MD simulation analysis TSD
The journal for this task can be found here.
<xr id="tbl:generalstats"/> shows general statistics about the three simulations. It should be noted that the output of gmxcheck does not account for the number of CPUs used in the calculation and only reports the real time that passed. Normalizing the runtimes by the number of CPUs involved yields that the Wildtype and P182L mutation completed within 4 hours. R178H took 4 hours longer, however the difference is negligible, considering that they were not done in a testing environment and it is assumed that all CPUs could maintain equal load throughout the runs. In fact Gromacs reports a particularly high 12% of the time being lost due to particle-particle/Particle-Mesh-Ewald imbalance.
The three trajectories are shown in <xr id="tab:trajectories"/>. It can be seen that there is only little movement during all simulations. Secondary structure elements might slightly bend but never lose their form while loops show more movement and also exhibit larger structural changes. Observing the mutation sites its noticeable that proline obviously remains very rigid and maintains its function as a helix breaker in the wildtype as well as in simulation of R178H. When mutated to leucine the helix still breaks at this position, likely due to surrounding constraints like the following loop that carries a residue reaching into the binding site, however the side chain exhibits more movement. This is even more true for the mutated histidine at position 178. It was previously hypothesized that this residue might still fit into the binding pocket, although charge coordination function is lost, but the MD analysis suggests that it might in fact be too large since it seems to reach out of the pocket most of the time instead of being embedded in it. Globally however the simulation of R178H seems to be the most rigid one, while largest movements can be observed in the P182L simulation.
<xr id="tab:pressure"/> shows the pressure oscillations for the three simulations. As can be seen per-frame values differ by several 100 bar, as to be expected <ref name="gromacsmanualpressure">http://www.gromacs.org/Documentation/Terminology/Pressure</ref>. More importantly the average shows convergence in every simulation and does not undergo any major changes towards the end of the simulations. The final average pressure also lies close to 0 in all cases.
<xr id="tab:temperature"/> shows the temperature variances during the simulations. The mutations both show higher extreme values than the wildtype structure, especially P182L. The average however remains exactly the same for all simulations, which is the expected behavior after a period of time passed <ref name="">http://www.gromacs.org/Documentation/Terminology/Thermostats</ref>. The fact the all simulations arrive at the same value also supports that the simulations went well and arrived at the correct temperature.
<xr id="tab:potentialenergy"/> shows the fluctuations of potential energy during the simulations. As can be seen, during all simulations it is globally decreasing and the final averages of all three runs are similar.
<xr id="tab:totenergy"/> shows the total energy which is composed of the potential energy shown before and the kinetic energy <ref name="gromacstotenergy">http://www.gromacs.org/Documentation/Terminology/Total_Energy</ref>. It also globally decreases for all runs and the final averages are very similar which leads to the conclusion, that the kinetic energy behaves similarly. Given that the WT behaves in the same way than the mutations one cannot say that the mutations have an effect on the total energy.
Radius of Gyration
<xr id="tab:gyration"/> shows the gyration radius for every simulation as well as the single components. The overall movement with respect to the center of mass, is low in every simulation. Small peaks at the beginning and the end of the wildtype simulation cannot be observed in the mutation simulations. In addition, the radius of gyration around the y-axis is always lowest and the individual components reach comparable values in the wildtype and R178H simulations while they do not in the simulation of P182L. The overall low movement is in line with previous observations.
<xr id="tab:bfactor_ca_vs_prot"/> shows a comparison of the B-factors based on the converted RMSF values. It can be seen that the values calculated only from C-alpha atoms are similar to those calculated from all atoms in the structure. As a result, in further analyses, only flexibility inferred from all atoms will be further discussed.
It can be seen that the mutation residues do not seem to be significantly moving during any of the simulations, whether they are mutated or not. The wildtype shows flexibility in several loop regions at the right of the figures. These loops were also found to be variable during NMA, however no functionality could be linked to them. In addition, E323 shows flexibility based on all atoms and also on the backbone only. This is indeed one of the residues that are essential for functionality as outlined in the introduction.
R178H does not exhibit any flexibility at this position which meets expectations, given that the mutated residue might fit into the binding pocket but cannot coordinate the ligand anymore (c.f. MD preparation). It is interesting to see though, that this behavior is observable in the protein although there was no ligand present during the simulation. The molecule generally appears very rigid, with even the outside loops showing no significant movement anymore.
P182L is not known to cause TSD and analysis from flexibility only also suggests this. All regions flexible in the wildtype remain flexible, however the missing proline, which acted as a helix breaker seems to introduce additional flexibility in the right side of the protein. In the native protein conformation the first domain is located at this side of the protein. It might either give additional stability or be disturbed by the missing rigidity in the contact helix. Which of the two is the case would have to be observed by trying to model the unresolved region in this domain (which is why it was excluded in the first place) and performing additional MD runs. It is noteworthy that one of the residues that gain flexibility, E462, is one of the important residues. Whether the mutation shows biological functionality that is not exhibited in the wildtype or whether, in reality, the mutation is accounted for by the presence of the additional domain, cannot be determined at this point.
Comparison to experimental values
Comparing the RMSF-induced B-Factors evaluated above to the ones present in the original PDB file, almost no similarities could be observed. The PDB file shows higher overall flexibility and the few residues with highest flexibility are none of those seen before. The only exception are the two loops at the left, which have high B-Factors in the experimental values as well. Apart from that, there are no residues that could be explained to be flexible by any of the analyses performed so far. Of course the 'real' B-Factor as approximated by the experimental values in the B-Factor is based on more than just movement of the residue. Apparently the RMSF values do not approximate these values very well but instead carry other information, that gives hints to protein functionality.
It should also be noted, that the first domain has a relatively low B-Factor which gives indications that it might in fact stabilize the effects of mutation P182L as already presumed above.
From all movements during the simulation average structures are created during the calculation of RMSF values. The B-Factor structures used above are based on the, biological, input structure. Comparing these structures to the unphysical average structures shows that one a global basis, residues which have a high B-Factor indeed seem to be farther away in the average structure, when compared to the reference. However a high movement does not necessarily mean that a residues travels a large absolute distance, therefore several cases are also observable where the B-Factor is high, but the average and input structure are almost the same. Using the PyMol script "show_bumps" <ref name="pymshowbumps">http://www.pymolwiki.org/index.php/Show_bumps</ref> to visualize clashes between VDW radii did not show significantly more or stronger clashes for the average structures nor did it show a stronger effect for the mutants, compared to the wildtype.
In addition changes at mutated residues do not seem to be larger than anywhere else. There also is no tendency for the average structure of the simulation with R178H to deviate less from the input structure than the averages of the other two simulations do.
RMS Fluctuation plots
<xr id="tab:rmsfprot"/> shows the per residue RMSF values calculated on the full protein. The plots reinforce the results already discusses above when looking at the B-Factors that were inferred from these RMSF values. However, here it becomes apparent that the simulation of R178H is, absolutely, only slightly less flexible than the wildtype or P182L simulation. Since B-Factor highlighting was done relative to the minimum and maximum values and R178H contains a high maximum value at the last residue, which is an artifact and insignificant in terms of function, the B-Factors seemed to suggest no flexibility in this case. Nonetheless the point remains that there is, with the exception of few regions or even single residues that were already discussed above, comparably little movement in the structure during the MD simulation.
To ascertain the motion difference between wildtype and mutant simulations the p-value was computed using a two tailed t-distribution. The more significant the p-value the more distinct are the RMSF values and also fluctuations. The p-value between the P182L and wildtype structure is 0.137. The R178H mutation reaches a similar p-value of 0.127. These significance levels show that there is little difference in the motions.
The RMSD was computed for the whole protein and for the backbone only but as the curves are only slightly shifted by approximately 0.05 Å, only the C-alpha values are displayed (see <xr id="tab:rmsdprot"/>). The RMSD was computed against the starting structure of the simulations. All 3 curves stay most of the time in the area between 0.15 and 0.25. The wildtype RMSD oscillates around 0.2 and has a slight peak in the last quarter of the time. The R178H mutation values, which expresses overall the least RMSD fluctuations, rises slowly in the first half of the time and then it evens out into a comparably straight line at the value 0.25. The RMSD of the P182L simulation is from beginning on around the value of 0.23 and expresses some local minima and maxima. Overall it can be stated that the comparably low RMSD indicates that the structural changes within the simulations are of minor extent.
The Ramachandran plots, see <xr id="tab:rama"/>, depict the φ/ψ distributions of all 3 simulations. For simplicity reasons only every 100th simulation step was plotted. Most points lie in the α-helix area on the left hand side and some are around the helix region on the right. Second most φ/ψ configurations are found in the β-sheet area and various scattered points indicate the loop regions. It should be noted that many beta-sheets are contained in the full alpha subunit, due to the presence of the first domain. However since only the second domain could be used for MD analysis the beta-sheet regions are limited to the central beta-barrel.
Overall all 3 plots are very similar. This is in agreement with previous observations which all indicate that the protein is not very flexible during the molcular dynamics simulations and secondary structure elements are retained, while loops show more movement.
The observation of very limited flexibility is common throughout the molecular dynamics analysis. Although the structure vibrates and has some motile loops, one of which was also identified in NMA already, the quality assessment and further analyses leave doubts about whether the motions of the analysed protein domain can be regarded significant. Throughout the several analyses performed on the simulations, it stands out how little the simulations vary within and also between one and another as there is neither a significant difference between wildtype and mutation nor do the amino acid changes result in any major modifications. One can argue that this is not the point of an MD analysis, and naturally even small changes can have a large impact, however there is hardly anything in the simulations that seems to suggest disturbances of function or explain the enzymatic mechanism. In fact, it is questionable whether few changes might have been observed at all, had they not been known to happen before hand. That includes, that the rigidity of R178H seems to somehow suggest a loss in function, however at the same time the high flexibility of P182L can be considered an equally strong point for loss of function and as it is known from the previous analyses only the former mutation is actually disease causing. Unfortunately none of the simulations gave strong indications as to why this is. This might be attributed to the fact that the MD analysis could have been performed in much more detail but the fact the first domain is missing, which seemed important according to NMA, might have an equally large effect. In addition it should be remembered that the protein in question appears as a heterodimer and association between the alpha- and beta-subunit is essential for enzymatic activity.