Molecular Dynamics Simulations Analysis (PKU)

From Bioinformatikpedia
Revision as of 11:53, 30 August 2012 by Boidolj (talk | contribs) (Root Mean Square Deviations)

Short Introduction

We analyze our completed molecular dynamics simulations, following the task description and the tutorial of the Utrecht University Molecular Modeling Practical. We completed one successful run each for the wildtype protein and for the mutations ALA322GLY and ARG408TRP. The commands used to generate the plots and images can be found in our journal.

Initial Checks

All three simulations ran for the desired 10 ns; each trajectory contains 2000 frames at 5 ps intervals. The wildtype simulation took considerably longer because we used only 16 cores for the wildtype but 32 for the mutants. Almost half of the calculation time, 44.2% in each run, is spent on calculating the Coulomb interactions and the Lennard-Jones potential of the solvent molecules. A few key statistics can be found in <xr id="tab:simulation_stats"/>.

<figtable id="tab:simulation_stats"> Statistics of the MD simulations

Simulation   Wall-clock time (h:min)   Speed (ns/day)   Time to reach 1 s (years)
Wildtype     11:32                     20.8             131,621
ALA322GLY    4:20                      55.3             49,543
ARG408TRP    4:26                      54.1             50,685

</figtable>
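The last column of the table follows directly from the simulation speed. The following is a minimal sketch of that arithmetic, not the script we used: the function names are ours, and a plain 365-day year is an assumption (it reproduces the ALA322GLY entry). Small deviations from the other table entries are to be expected, since the speeds in the table are rounded to one decimal.

```python
NS_PER_SECOND = 1e9   # 1 s of simulated time expressed in ns
DAYS_PER_YEAR = 365   # assumption: plain 365-day year


def years_to_simulate_one_second(speed_ns_per_day):
    """Wall-clock years needed to simulate 1 s at the given speed."""
    days = NS_PER_SECOND / speed_ns_per_day
    return days / DAYS_PER_YEAR


def frames_in_trajectory(length_ns, step_ps):
    """Number of saved frames for a trajectory of given length and output step."""
    return round(length_ns * 1000 / step_ps)


if __name__ == "__main__":
    for name, speed in [("Wildtype", 20.8), ("ALA322GLY", 55.3), ("ARG408TRP", 54.1)]:
        print(f"{name}: {years_to_simulate_one_second(speed):,.0f} years")
    print("frames:", frames_in_trajectory(10, 5))
```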

Simulation Analysis

<figtable id="tab:overlays">

Overlays of the trajectories of all three simulations.
Overlay of all frames of the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
Overlay of all frames of the 10 ns simulation of the Ala322Gly mutation of phenylalanine hydroxylase structure 1J8U.
Overlay of all frames of the 10 ns simulation of the Arg408Trp mutation of phenylalanine hydroxylase structure 1J8U.

</figtable>

<xr id="tab:overlays"/> shows the overlay of all frames of each simulation. The trajectories used for these images have already been corrected for jumps across the periodic boundaries and for the overall motion of the protein in space. We see that the protein remains compact during the simulations, but the overlays reveal few details. In the following sections we analyze the simulations more closely.

Quality Assurance

Convergence of Energy Terms

In the following we present our plots together with a short comparison of the three simulations. For a more detailed view, please refer to the subsections #Wildtype, #Ala322Gly and #Arg408Trp. <figtable id="tab:temperatures">

Plot of the system temperature during the 10 ns simulations.
a) Plot of the system temperature during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system temperature during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system temperature during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable> <xr id="tab:temperatures" /> shows that the temperature differences between the runs are marginal, as expected. The range of the temperature fluctuations, however, is larger than we expected: the instantaneous temperature spans about 10 degrees, from 292.5 K to 302.5 K, whereas the biological range is much smaller. The running average shown in red, on the other hand, stays within the expected range. From the temperature plots alone we cannot explain the large fluctuations.
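The red curves in the plots are running averages over a 100 ps window. The following is a minimal sketch of such a trailing running mean, not the plotting tool's own implementation. The function name is ours, and the number of samples per window depends on the output interval of the energy file: assuming one sample every 5 ps, a 100 ps window corresponds to 20 samples.

```python
def running_average(values, window):
    """Trailing running mean; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one sample: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


if __name__ == "__main__":
    # Hypothetical temperature samples in K, for illustration only.
    temperatures = [297.0, 299.5, 296.2, 298.8, 297.4, 298.1]
    print(running_average(temperatures, window=3))
```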

<figtable id="tab:pressures">

Plot of the system pressure during the 10 ns simulations.
a) Plot of the system pressure during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system pressure during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system pressure during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable> The pressures of the three systems are shown in <xr id="tab:pressures" />, analogous to the temperature plots. Since we have no further information about the pressure in the human body and see no significant differences between the three plots, we assume that these results are typical for such simulations.

<figtable id="tab:volumes">

Plot of the system volume during the 10 ns simulations.
a) Plot of the system volume during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system volume during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system volume during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable> The volumes of the systems in <xr id="tab:volumes" /> show the first noticeable difference between the wildtype and the mutants. The volume of the wildtype (left) is slightly higher than that of the two mutant systems to its right, which appear more compact throughout the simulation.

<figtable id="tab:densities">

Plot of the system density during the 10 ns simulations.
a) Plot of the system density during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system density during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system density during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable> The density plots in <xr id="tab:densities" /> are very similar to each other. Only towards the end does the Arg408Trp plot (right) show an increasing density where the wildtype shows none, but this might be a coincidence.

<figtable id="tab:energies">

Plot of the system's potential, kinetic and total energy during the 10 ns simulations.
a) Plot of the system's potential, kinetic and total energy during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the system's potential, kinetic and total energy during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the system's potential, kinetic and total energy during the 10 ns simulation of the Arg408Trp mutation.

</figtable> The plots in <xr id="tab:energies" /> show no differences visible to the naked eye.

<figtable id="tab:boxes">

Plot of the system extension in 3 dimensions during the 10 ns simulations.
a) Plot of the system extension in 3 dimensions during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. The X- and Y-dimensions overlap and cannot be distinguished in the plot.
b) Plot of the system extension in 3 dimensions during the 10 ns simulation of the Ala322Gly mutation. The X- and Y-dimensions overlap and cannot be distinguished in the plot.
c) Plot of the system extension in 3 dimensions during the 10 ns simulation of the Arg408Trp mutation. The X- and Y-dimensions overlap and cannot be distinguished in the plot.

</figtable>

Like <xr id="tab:energies" />, <xr id="tab:boxes" /> shows no clear differences between the plots.

<figtable id="tab:coulombs">

Plot of the system's Coulomb interaction energy during the 10 ns simulations.
a) Plot of the system's Coulomb interaction energy during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system's Coulomb interaction energy during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system's Coulomb interaction energy during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable> In contrast to the two previous tables, <xr id="tab:coulombs" /> shows differences between the wildtype (left) and the two mutations. Whereas the plot in the center follows a course similar to the wildtype, the right plot shows a completely different picture within a different energy range.

<figtable id="tab:vdWs">

Plot of the system's van-der-Waals interaction energy during the 10 ns simulations.
a) Plot of the system's van-der-Waals interaction energy during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. A running average in a window of length 100 ps is indicated in red.
b) Plot of the system's van-der-Waals interaction energy during the 10 ns simulation of the Ala322Gly mutation. A running average in a window of length 100 ps is indicated in red.
c) Plot of the system's van-der-Waals interaction energy during the 10 ns simulation of the Arg408Trp mutation. A running average in a window of length 100 ps is indicated in red.

</figtable>

Wildtype

<xr id="tab:temperatures"/> a) shows the temperature during the simulation. It fluctuates slightly around 297.9 K (24.7 °C) but stays within just 3 degrees. (The calculation of the heat capacity was erroneous in Gromacs and has been disabled in version 4.5.)
<xr id="tab:pressures"/> a) shows that the pressure fluctuates wildly between -200 and +200 bar, with peaks of up to ±400 bar, during the whole simulation. The average stays very close to the setting of 1 bar. This could either simply be a feature of the simulation or be considered realistic, as the simulation box is very small and small fluctuations in the volume cause large pressure fluctuations (cf. ambermd.org). Accordingly, <xr id="tab:volumes"/> a) shows small changes of the volume, mostly within 0.5 nm^3 of 356.6 nm^3. The density (cf. <xr id="tab:densities"/> a)) remains very stable around 1021.3 kg/m^3, as do the potential and kinetic energy in <xr id="tab:energies"/> a). The size of the simulation box (cf. <xr id="tab:boxes"/> a)) remains almost fixed in all three dimensions. The small peaks are probably water molecules crossing the periodic boundaries. The energies of the van-der-Waals and Coulomb interactions are shown in <xr id="tab:vdWs"/> a) and <xr id="tab:coulombs" /> a), respectively. While the van-der-Waals energy stays roughly constant, the Coulomb energy first drops steeply, then stabilizes but does not converge. Altogether, most terms behave stably, and we assume that the system was equilibrated properly in the short runs before the production run.
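To get a feeling for how sensitive the pressure is to the box volume, the relation can be linearised as dP = -dV / (V * kappa), where kappa is the isothermal compressibility. The following rough sketch is only an order-of-magnitude illustration and not part of our analysis: it assumes the compressibility of pure water (4.5e-5 per bar) for the whole system, and the function name is ours. For a volume change of 0.5 nm^3 around 356.6 nm^3 it yields a pressure change of a few tens of bar.

```python
WATER_COMPRESSIBILITY = 4.5e-5  # 1/bar; assumption: pure-water value for the whole box


def pressure_change_bar(volume_nm3, delta_volume_nm3,
                        compressibility=WATER_COMPRESSIBILITY):
    """Linearised pressure change (bar) for a small volume change."""
    return -delta_volume_nm3 / (volume_nm3 * compressibility)


if __name__ == "__main__":
    # Compression by 0.5 nm^3 of the wildtype box (356.6 nm^3).
    print(f"{pressure_change_bar(356.6, -0.5):.1f} bar")
```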


Ala322Gly

<xr id="tab:temperatures"/> b) shows the temperature during the simulation. It remains around 297.9 K (24.7 °C) and stays mostly within just a few degrees, with a minimum of 292.5 K and a maximum of 303.1 K.
<xr id="tab:pressures"/> b) shows that the pressure fluctuates wildly between -200 and +200 bar, with peaks of up to ±400 bar, during the whole simulation. The average stays very close to the setting of 1 bar, which also differs from the physiological pressure of around 0.37 bar. <xr id="tab:volumes"/> b) shows small changes of the volume, mostly within 0.5 nm^3 of 356.3 nm^3. The density (cf. <xr id="tab:densities"/> b)) remains very stable around 1021.8 kg/m^3, as do the potential and kinetic energy in <xr id="tab:energies"/> b). The size of the simulation box (cf. <xr id="tab:boxes"/> b)) stays almost constant in all three dimensions. The small peaks are probably water molecules crossing the periodic boundaries. The energies of the van-der-Waals and Coulomb interactions are shown in <xr id="tab:vdWs"/> b) and <xr id="tab:coulombs" /> b), respectively. While the van-der-Waals energy stays roughly constant, the Coulomb energy first drops steeply, probably while equilibrating, then fluctuates with peaks around 1700 ps and 5800 ps. Altogether, most terms behave stably, and we assume that the system was equilibrated properly in the short runs before the production run.


Arg408Trp

<xr id="tab:temperatures"/> c) shows the temperature during the simulation. It fluctuates by only a few degrees around 297.9 K (24.7 °C).
<xr id="tab:pressures"/> c) shows that the pressure fluctuates wildly between -200 and +200 bar, with peaks of up to ±400 bar, during the whole simulation. The average is around 0.45 bar. <xr id="tab:volumes"/> c) shows small changes of the volume, mostly within 0.5 nm^3 of 356.36 nm^3. The density (cf. <xr id="tab:densities"/> c)) remains very stable around 1021.7 kg/m^3, as do the potential and kinetic energy in <xr id="tab:energies"/> c). The size of the simulation box (cf. <xr id="tab:boxes"/> c)) remains almost fixed in all three dimensions. The small peaks are probably water molecules crossing the periodic boundaries. The energies of the van-der-Waals and Coulomb interactions are shown in <xr id="tab:vdWs"/> c) and <xr id="tab:coulombs" /> c), respectively. The van-der-Waals energy rises around 6000 ps from -2200 kJ/mol to -2050 kJ/mol and remains unstable at a higher level than in the wildtype. The Coulomb energy decreases continuously, with a few spikes between 5000 ps and 6000 ps. The interaction terms suggest a relevant change in the second half of the simulation. Altogether, most terms behave stably, and we again assume that the simulation was successful.


Minimum Distance Between Periodic Images

Since the simulation uses periodic boundary conditions, the protein could interact with a periodic copy of itself. This interaction could even be indirect if the hydration shells of the protein and its image touch across the boundaries, so the distance between periodic images should be at least 2 nm.
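The quantity checked in this section can be illustrated with a brute-force sketch. This is not the tool we used for the plots: the function name is ours, the box is assumed to be rectangular (the real simulation box may have a different shape), and the protein is assumed to be whole, i.e. not split across the boundaries. The loop over all atom pairs and 26 neighbouring images is only practical for small selections.

```python
import itertools
import math


def min_periodic_image_distance(coords, box):
    """Smallest distance (nm) between any atom and any periodic image of any atom.

    coords: list of (x, y, z) tuples in nm
    box:    (lx, ly, lz) edge lengths of a rectangular box in nm (assumption)
    """
    # All 26 neighbouring images; the zero shift is the molecule itself.
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    best_squared = math.inf
    for shift in shifts:
        offset = [s * length for s, length in zip(shift, box)]
        for a in coords:
            for b in coords:
                d2 = sum((bk + ok - ak) ** 2 for ak, bk, ok in zip(a, b, offset))
                best_squared = min(best_squared, d2)
    return math.sqrt(best_squared)


if __name__ == "__main__":
    # Two hypothetical atoms 2 nm apart in a 3 nm cubic box.
    atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    distance = min_periodic_image_distance(atoms, (3.0, 3.0, 3.0))
    print(distance, "nm; safe" if distance >= 2.0 else "nm; below the 2 nm threshold")
```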

<figtable id="tab:mindist">

Plot of the minimum distance between periodic images of the protein atoms during the 10 ns simulations.
a) Plot of the minimum distance between periodic images of the atoms of the wildtype protein during the 10 ns simulation. The distances for the three dimensions overlap and cannot be distinguished in the plot.
b) Plot of the minimum distance between periodic images of the atoms of the protein during the 10 ns simulation of the Ala322Gly mutation. The distances for the three dimensions overlap and cannot be distinguished in the plot.
c) Plot of the minimum distance between periodic images of the atoms of the protein during the 10 ns simulation of the Arg408Trp mutation. The distances for the three dimensions overlap and cannot be distinguished in the plot.

</figtable>


<figtable id="tab:mindist_c_alpha">

Plot of the minimum distance between periodic images of the C alpha atoms of the backbone during the 10 ns simulations.
a) Plot of the minimum distance between periodic images of the C alpha atoms of the backbone during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U. The distances for the three dimensions overlap and cannot be distinguished in the plot.
b) Plot of the minimum distance between periodic images of the C alpha atoms of the backbone during the 10 ns simulation of the Ala322Gly mutation. The distances for the three dimensions overlap and cannot be distinguished in the plot.
c) Plot of the minimum distance between periodic images of the C alpha atoms of the backbone during the 10 ns simulation of the Arg408Trp mutation. The distances for the three dimensions overlap and cannot be distinguished in the plot.

</figtable>

Wildtype

The minimum distance in this simulation is 1.69 nm around 1350 ps, near the start of the simulation. There is another valley around 7800 ps, but any interaction was only transient and probably did not affect the simulation, as there is no plateau at an unsafe distance (cf. <xr id="tab:mindist"/> a)). Looking only at the backbone C alpha atoms in <xr id="tab:mindist_c_alpha"/> a), the distance is always well above 2 nm. Interactions here would severely affect the simulation, e.g. if hydrogen bonds formed between the backbones of periodic images. There is a sawtooth-like movement between 6000 ps and 7500 ps, where the distance reaches a peak and a minimum twice in short succession. This could indicate spatial movement or a contraction and rebound of the protein in this time window.


Ala322Gly

The minimum distance in this simulation is 1.48 nm around 1680 ps, near the start of the simulation. There is a valley around 8500 ps, but any interaction was only transient and probably did not affect the simulation. For the most part, the distance between the protein atoms and their periodic images remained safely above 2 nm, as can be seen in <xr id="tab:mindist"/> b). Looking only at the backbone C alpha atoms in <xr id="tab:mindist_c_alpha"/> b), the distance is always well above 2 nm, with a plateau from 7000 ps to 9500 ps, suggesting a noticeable movement of the protein, either spatial or internal.


Arg408Trp

The minimum distance in this simulation is 1.77 nm at 5 ps, at the very start of the simulation. There are small fluctuations and even a very slight upward trend, suggesting that the protein's sidechains are pushed inwards. Apart from the initial minimum, the periodic distance of the protein atoms remains safely above 2 nm, as can be seen in <xr id="tab:mindist"/> c). Looking only at the backbone C alpha atoms in <xr id="tab:mindist_c_alpha"/> c), we notice the absence of visible changes compared to the wildtype and the weak mutation (Ala322Gly). Still, the distance remains large enough to prevent periodic interactions.


Root Mean Square Fluctuations

<figtable id="tab:rmsfs">

Plot of the RMSF of all residues of the protein vs. its average position during the simulations.
a) Plot of the RMSF of all residues of the protein vs. its average position during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSF of all residues of the protein vs. its average position during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the RMSF of all residues of the protein vs. its average position during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


<figtable id="tab:b_factors_down_site">

The B-factors of the proteins, view of the binding pocket. Blue indicates little movement, red greater flexibility.
a) The B-factors of the wildtype.
b) The B-factors of the Ala322Gly mutation.
c) The B-factors of the Arg408Trp mutation.

</figtable>


<figtable id="tab:b_factors_up_site">

The B-factors of the proteins, view of the upper site. Blue indicates little movement, red greater flexibility.
a) The B-factors of the wildtype.
b) The B-factors of the Ala322Gly mutation.
c) The B-factors of the Arg408Trp mutation.

</figtable>


<figtable id="tab:b_factors_binding_site">

The B-factors of the binding site. Blue indicates little movement, red greater flexibility.
a) The B-factors of the binding site in the wildtype.
b) The B-factors of the binding site in the Ala322Gly mutation.
c) The B-factors of the binding site in the Arg408Trp mutation.

</figtable>


<figtable id="tab:b_factors_322">

The B-factors of the Ala322Gly mutation site. Blue indicates little movement, red greater flexibility. In the picture, the mutation is located in the middle turn of the helix, on the lower side.
a) The B-factors in the wildtype.
b) The B-factors in the Ala322Gly mutation.

</figtable>


<figtable id="tab:b_factors_408">

The B-factors of the Arg408Trp mutation site. Blue indicates little movement, red greater flexibility. In the picture, the unmutated or mutated residue points inward from the loop coming from the upper helix.
a) The B-factors in the wildtype.
b) The B-factors in the Arg408Trp mutation.

</figtable>

Wildtype

The most flexible regions, corresponding to the peaks in <xr id="tab:rmsfs"/> a), are the loops from residue 18 to 32 (note that the residue numbering of the plot is offset by 118 from the numbering in the PDB files and the Uniprot sequence) with a highly flexible Tyr138 (B-factor 330.63), from 153 to 163, where the most flexible residue is again a tyrosine, Tyr277 (B-factor 203.20), and from 258 to 267 with Phe382 as the most flexible residue (B-factor 148.98). There are other single residues with high B-factors, most of them located at the ends of alpha helices and often tyrosines, as can be seen in <xr id="tab:b_factors_down_site"/> a) and <xr id="tab:b_factors_up_site"/> a). <xr id="tab:b_factors_binding_site"/> a) shows a close-up of the binding site.
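B-factors and RMSF values describe the same fluctuations on different scales. The following sketch uses the standard isotropic relation B = (8 pi^2 / 3) * RMSF^2, and assumes the RMSF is given in nm and the B-factor in square Angstrom; whether the B-factors in our structure files were computed exactly this way is an assumption. The function names are ours, and the offset of 118 between plot index and PDB residue number is taken from the note above.

```python
import math

PDB_OFFSET = 118  # offset between plot index and PDB residue number (from the text)


def rmsf_to_bfactor(rmsf_nm):
    """Isotropic B-factor (A^2) from an RMSF given in nm."""
    rmsf_angstrom = rmsf_nm * 10.0
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2


def bfactor_to_rmsf(bfactor_a2):
    """RMSF in nm from an isotropic B-factor given in A^2."""
    return math.sqrt(3.0 * bfactor_a2 / (8.0 * math.pi ** 2)) / 10.0


def to_pdb_numbering(plot_index):
    """Map a residue index of the RMSF plot to the PDB residue number."""
    return plot_index + PDB_OFFSET


if __name__ == "__main__":
    # B-factor of Tyr138 in the wildtype, as quoted in the text.
    print(f"Tyr138: RMSF of about {bfactor_to_rmsf(330.63):.3f} nm")
    print("plot index 20 -> PDB residue", to_pdb_numbering(20))
```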



Ala322Gly

In <xr id="tab:b_factors_down_site"/> b) and <xr id="tab:b_factors_up_site"/> b) we see that, compared to the wildtype, the same regions and residues are flexible, some of them more mobile, some more rigid. For example, the loop containing the highly flexible Tyr138 (cf. <xr id="tab:b_factors_up_site"/> b) in the lower middle) now appears more rigid, but the N-terminal helix (to the right) from Ile125 to Gln134 has gained flexibility. <xr id="tab:b_factors_binding_site"/> b) shows a close-up of the binding site with very little change in flexibility and only minor changes in structure, which are probably due to the 'natural' variance of the simulation. <xr id="tab:b_factors_322"/> shows the mutated helix. Here we see clearly how the loss of the sidechain after the mutation to glycine increases the flexibility of the helix.


Arg408Trp

In <xr id="tab:b_factors_down_site"/> c) and <xr id="tab:b_factors_up_site"/> c) we see a few key changes in flexibility. Most regions stay similar to the wildtype, but e.g. Tyr138 becomes more rigid (cf. <xr id="tab:b_factors_up_site"/> c) in the lower middle). Also, the previously rigid Val118 gains great flexibility that is present neither in the wildtype nor in the weak mutation (Ala322Gly). Interestingly, flexibility is lost in various places, which is especially visible in the helices in <xr id="tab:b_factors_up_site"/>. <xr id="tab:b_factors_binding_site"/> c) shows a close-up of the binding site with surprisingly little change in flexibility and only minor changes in structure, probably because this is an inherently stable region. <xr id="tab:b_factors_408"/> shows the mutated loop. Here we see, as could be expected, a more flexible tryptophan whose bulk does not fit into the native protein structure and which also disrupts the stability of the secondary structure elements flanking the loop.

Statistical Difference

To test the difference in fluctuations between the wildtype and the mutants, we calculated the p-value with a two-tailed t-test (see the script here). With p = 1.222e-08, there is a significant difference in flexibility between the wildtype and the Ala322Gly mutant. The p-value for wildtype and Arg408Trp is 0.334, so there is no significant difference here.
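As an illustration of such a test, here is a self-contained sketch of a two-sample comparison of per-residue fluctuations. It is not the script linked above, and several choices are assumptions: it uses Welch's t statistic for independent samples (a paired test would be an equally plausible choice), and it approximates the two-tailed p-value with the normal distribution, which is only reasonable for samples of a few hundred residues. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t, approximate two-tailed p) for two independent samples.

    The p-value uses a normal approximation to the t-distribution,
    which is acceptable only for large samples (assumption).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


if __name__ == "__main__":
    # Hypothetical per-residue RMSF values in nm, for illustration only.
    wildtype = [0.08, 0.11, 0.09, 0.15, 0.10, 0.12]
    mutant = [0.09, 0.13, 0.10, 0.18, 0.12, 0.14]
    t_value, p_value = welch_t_test(wildtype, mutant)
    print(f"t = {t_value:.3f}, p = {p_value:.3f}")
```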

Convergence of RMSD

<figure id="fig:1J8U_average">

The average structure of the wildtype during the simulation. The structure is not physical as atom positions are averaged over the whole simulation.

</figure> <xr id="fig:1J8U_average"/> shows the average structure of the wildtype simulation, in which the position of every atom is its average position over the simulation. Such a structure contains physically impossible configurations but serves as a reference for the convergence of the protein during the simulations. Convergence of the RMSD against the starting structure could still mean that the protein switches between conformations that are equally distant from the starting structure, whereas convergence of the RMSD against the average structure indicates a stable conformation. However, since the simulations run only for a short time, the average structure will be closest to the structure the protein assumes in the middle of the simulation and may differ even from a stable conformation reached at the end. The RMSD against the average structure will then rise again at the end of the simulation, which makes this kind of plot more difficult to interpret on its own. Taken together, the RMSD vs. the starting structure and the RMSD vs. the average structure give a more accurate picture of what is going on than either one alone.
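The difference between the two reference structures can be made concrete with a small sketch. It is a simplified illustration, not the procedure used for our plots: the function names are ours, the frames are assumed to be already superimposed (no fitting is done), and the RMSD is not mass-weighted.

```python
import math


def average_structure(frames):
    """Mean position of every atom over all frames.

    frames: list of frames, each a list of (x, y, z) tuples in the same atom order.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Unweighted RMSD between two equally ordered coordinate sets (no fitting)."""
    squared = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd_series(frames, reference):
    """RMSD of every frame against a fixed reference structure."""
    return [rmsd(frame, reference) for frame in frames]


if __name__ == "__main__":
    # Hypothetical one-atom "protein" drifting steadily along x.
    frames = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
    print("vs. start:  ", rmsd_series(frames, frames[0]))
    print("vs. average:", rmsd_series(frames, average_structure(frames)))
```

For this steadily drifting toy system, the RMSD against the start rises monotonically (0, 1, 2), while the RMSD against the average is lowest in the middle and rises again at the end (1, 0, 1).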


<figtable id="tab:rmsd_all-atom-vs-start">

Plot of the RMSD of all atoms of the protein vs. the starting structure during the simulation.
a) Plot of the RMSD of all atoms of the protein vs. the starting structure during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSD of all atoms of the protein vs. the starting structure during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the RMSD of all atoms of the protein vs. the starting structure during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


<figtable id="tab:rmsd_all-atom-vs-average">

Plot of the RMSD of all atoms of the protein vs. the average structure during the simulation.
a) Plot of the RMSD of all atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSD of all atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the RMSD of all atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


<figtable id="tab:rmsd_backbone-vs-start">

Plot of the RMSD of the backbone atoms of the protein vs. the starting structure during the simulation.
a) Plot of the RMSD of the backbone atoms of the protein vs. the starting structure during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSD of the backbone atoms of the protein vs. the starting structure during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the RMSD of the backbone atoms of the protein vs. the starting structure during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


<figtable id="tab:rmsd_backbone-vs-average">

Plot of the RMSD of the backbone atoms of the protein vs. the average structure during the simulation.
a) Plot of the RMSD of the backbone atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSD of the backbone atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the RMSD of the backbone atoms of the protein vs. the (theoretical) average structure during the 10 ns simulation of the Arg408Trp mutation.

</figtable>

Wildtype

<xr id="tab:rmsd_all-atom-vs-start"/> a) shows the RMSD of the simulated protein compared to the starting structure, and <xr id="tab:rmsd_all-atom-vs-average"/> a) the RMSD compared to the average structure. The RMSD compared to the starting structure rises steeply during the first 2 ns, then rises more slowly but without noticeable convergence. In the time window between 6000 ps and 7500 ps we again see the sawtooth-like movement encountered previously.
If the structure changed continually, the RMSD compared to the average structure would follow a U-shaped, hyperbola-like curve over the simulation; if the structure converged, we would see a declining RMSD that converges towards a small value (0 if the final structure were rigid). In fact, we see a hyperbola-like behaviour with a plateau in the middle of the simulation, which fits the observation from <xr id="tab:rmsd_all-atom-vs-start"/> a) that the structure does not converge but changes more slowly towards the end. The most likely conclusion is that the structure has not yet converged towards an equilibrium state. The same applies if we look only at the backbone atoms in <xr id="tab:rmsd_backbone-vs-start"/> a) and <xr id="tab:rmsd_backbone-vs-average"/> a). Here, we again see clearly that the structure does not reach a plateau. Around 1900 ps and around 6500 ps in <xr id="tab:rmsd_backbone-vs-average"/> a) we again see pronounced shifts in the structure.


Ala322Gly

The RMSD against the starting structure in <xr id="tab:rmsd_all-atom-vs-start"/> b) appears to reach a plateau around 0.17 nm at 6000 ps, with a few sharp changes before that, around 2000 ps and 4000 ps. The RMSD against the average structure in <xr id="tab:rmsd_all-atom-vs-average"/> b) also converges, at least better than in the wildtype, suggesting fewer structural changes. The RMSD of the backbone in <xr id="tab:rmsd_backbone-vs-start"/> b) looks less stable, with possible additional changes around 6000 ps and 8100 ps. The RMSD against the average structure in <xr id="tab:rmsd_backbone-vs-average"/> b) appears very flat after a sharp decrease at the start, indicating that the protein stays close to the same conformation.


Arg408Trp

The RMSD against the starting structure in <xr id="tab:rmsd_all-atom-vs-start"/> c) appears to reach a plateau around 0.22 nm at 7500 ps, with a few sharp changes before that, around 4000 ps and 6000 ps. The RMSD against the average structure in <xr id="tab:rmsd_all-atom-vs-average"/> c) declines until 3000 ps and shows a few peaks from 3000 ps to 5000 ps, but settles towards the end of the simulation instead of rising again as in the wildtype. The RMSD of the backbone in <xr id="tab:rmsd_backbone-vs-start"/> c) reaches a stable plateau at 6000 ps but falls towards the end of the simulation; at the very end it starts to rise again and might have returned to the same conformation had the simulation continued. The RMSD against the average structure in <xr id="tab:rmsd_backbone-vs-average"/> c) appears very flat after a steady decrease until 3000 ps, indicating that the protein stays close to the same average conformation.


Convergence of Radius of Gyration

<figtable id="tab:radius_gyration">

Plot of the radius of gyration during the 10 ns simulations.
a) Plot of the radius of gyration during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the radius of gyration during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the radius of gyration during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


<figtable id="tab:inertia">

Plot of the moments of inertia during the 10 ns simulations.
a) Plot of the moments of inertia during the 10 ns simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the moments of inertia during the 10 ns simulation of the Ala322Gly mutation.
c) Plot of the moments of inertia during the 10 ns simulation of the Arg408Trp mutation.

</figtable>


Wildtype

The radius of gyration indicates the global shape of our protein and stays very constant throughout the simulation (cf. <xr id="tab:radius_gyration"/> a)). There is a slight expansion along the Y-axis, which has the shortest extent, at the beginning of the simulation. The changes of shape over time are also depicted in <xr id="tab:overlays"/> a). <xr id="tab:inertia"/> a) shows the moments of inertia of the protein for rotation about the three axes. Our protein is not quite symmetrical around the Y-axis, which explains why the radii along the X- and Z-axes are very close while the moments of inertia differ.
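For reference, the radius of gyration is the mass-weighted root mean square distance of the atoms from the center of mass. The sketch below is a minimal illustration with function names of our own choosing. The per-axis variant computes the radius about one axis, i.e. from the two remaining coordinates only; that the per-axis curves in our plots are defined this way is an assumption.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of the atoms."""
    total_mass = sum(masses)
    return [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)]


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of all atoms from the center of mass."""
    com = center_of_mass(coords, masses)
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / sum(masses))


def radius_of_gyration_about_axis(coords, masses, axis):
    """Radius of gyration about one axis (0 = x, 1 = y, 2 = z)."""
    com = center_of_mass(coords, masses)
    others = [k for k in range(3) if k != axis]
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in others)
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / sum(masses))


if __name__ == "__main__":
    # Two hypothetical unit masses on the x-axis, 2 nm apart.
    atoms = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    masses = [1.0, 1.0]
    print(radius_of_gyration(atoms, masses))
    print([radius_of_gyration_about_axis(atoms, masses, axis) for axis in range(3)])
```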


Ala322Gly

<xr id="tab:radius_gyration"/> b) indicates that our protein keeps very nearly the same proportions during the simulation, with a radius of gyration of 1.93 nm. There is only a slight increase along the X- and Z-axes at the start but no noticeable change later in the simulation. Similarly, the moments of inertia (cf. <xr id="tab:inertia"/> b)) indicate that the protein keeps a stable outer shape during the simulation.

Arg408Trp

<xr id="tab:radius_gyration"/> c) shows that the mutant protein contracts around 5900 ps, from 1.93 nm to 1.91 nm. This transition is abrupt; there is no slow 'drifting apart' or 'agglomeration' during the simulation. The changes appear only along the longer X- and Z-axes. Similarly, the moments of inertia (cf. <xr id="tab:inertia"/> c)) indicate a change in the protein's stability at 5900 ps.

Structural Analysis: Properties Derived from Configurations


Solvent accessible surface

<figtable id="tab:sas">

Plot of the area accessible to the solvent during the 10 ns simulations.
a) Plot of the area accessible to solvent of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the area accessible to solvent of the Ala322Gly mutation.
c) Plot of the area accessible to solvent of the Arg408Trp mutation.

</figtable>


<figtable id="tab:residue_sas">

Plot of the area of every residue accessible to the solvent during the 10 ns simulations.
a) Plot of the area of every residue accessible to solvent of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the area of every residue accessible to solvent of the Ala322Gly mutation.
c) Plot of the area of every residue accessible to solvent of the Arg408Trp mutation.

</figtable>

Wildtype

Not surprisingly, the residues most accessible to the solvent are situated on the outside of the protein. The first 120 residues lie at the outside of the structure; the less accessible residues in this section are mostly bound in secondary structure elements. For example, the residues in the accessibility dip from 38 to 49 (cf. <xr id="tab:residue_sas"/> a)) form a helix, which shields them from the solvent. The peak around residues 150 to 160 is due to a loop that pokes out of the protein, as is the peak at residue 179. The most exposed residues are Lys81, Lys97, Arg179, Glu242, Lys243 and Tyr299. All of them are polar, with sidechains pointing towards the outside, and are located in loops (Glu242, Lys243 and Tyr299) or at the very end of helices (Lys81, Lys97 and Arg179) on the outside. The total accessibility (cf. <xr id="tab:sas"/> a)) increases minimally over the simulation, but there are no abrupt changes that would indicate an interesting activity.

Ala322Gly

The analysis for the wildtype in general also applies to this mutation: the terminal parts forming the outside are more accessible to the solvent, and single loops poking through are also very accessible. The loop in the region of residues 150 to 160 is a bit less accessible (cf. <xr id="tab:residue_sas"/> b)), and one new residue stands out: Lys32, located in an N-terminal loop, with an accessible surface of 1.8 nm^2 compared to an (also large) 1.4 nm^2 in the wildtype. Unfortunately we do not have several runs of the wildtype simulation to assign a clear significance to this single difference, but since the mutated protein still retains catalytic function, this residue most likely has no great functional influence. The total accessibility shown in <xr id="tab:sas"/> b) does not change much during the simulation or compared to the wildtype.

Arg408Trp

Interestingly, the differences between the wildtype and the functionally milder mutation Ala322Gly appear again in this mutation. As with the wildtype, the terminal parts forming the outside are more accessible to the solvent, and single loops poking through are also very accessible. The loop in the region of residues 150 to 160 is a bit less accessible (cf. <xr id="tab:residue_sas"/> c)), and the same single residue stands out: Lys32, with an accessible surface of 1.7 nm^2 compared to 1.4 nm^2 in the wildtype and 1.8 nm^2 in the milder mutation. We still do not have clear data to assign a functional influence to this difference, but the similarity of the two mutations hints at a very specific pattern of accessibility in the wildtype that is easily disrupted by mutations at any site (and that may or may not be of importance). The total accessibility shown in <xr id="tab:sas"/> c) does not change much during the simulation or compared to the wildtype.

Hydrogen Bonds

<figtable id="tab:hydrogen_bonds_protein">

Plot of the number of hydrogen bonds within the protein during the 10 ns simulations.
a) Plot of the number of hydrogen bonds within the protein of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the number of hydrogen bonds within the protein of the Ala322Gly mutation.
c) Plot of the number of hydrogen bonds within the protein of the Arg408Trp mutation.

</figtable> In <xr id="tab:hydrogen_bonds_protein" /> we show the number of hydrogen bonds within the protein over the course of the simulation. A change in this number can indicate a distortion or stabilization of the structure, which may in turn affect the protein and its function. In our case, however, the curves of the wildtype and the two mutants differ very little, so the mutations do not change the overall number of hydrogen bonds.

<figtable id="tab:hydrogen_bonds_water">

Plot of the number of hydrogen bonds from protein to water during the 10 ns simulations.
a) Plot of the number of hydrogen bonds from protein to water of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the number of hydrogen bonds from protein to water of the Ala322Gly mutation.
c) Plot of the number of hydrogen bonds from protein to water of the Arg408Trp mutation.

</figtable> Like <xr id="tab:hydrogen_bonds_protein" />, the plots in <xr id="tab:hydrogen_bonds_water" /> show hydrogen bonds, but this time those formed between the protein and the solvent, which in our case is water. In contrast to the former plots, one can see some changes introduced by the mutations. Here we have to distinguish between hydrogen bonds and pairs within 0.35 nm: the number of hydrogen bonds itself does not change much, but the number of pairs within 0.35 nm differs more strongly. The pairs within 0.35 nm are close enough to interact via a hydrogen bond, but their angle is unfavorable.
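
The distinction between hydrogen bonds and pairs within 0.35 nm can be illustrated with a small sketch. The 0.35 nm distance comes from the plots; the 30 degree limit on the hydrogen-donor-acceptor angle and the function name are assumptions made for this example, not values taken from our runs.

```python
import numpy as np

def classify_pair(donor, hydrogen, acceptor, max_dist=0.35, max_angle=30.0):
    """Classify a donor/acceptor pair (coordinates in nm).

    Returns "hbond" if distance and angle are both favourable,
    "pair" if only the distance criterion is met, else "none".
    """
    donor, hydrogen, acceptor = (
        np.asarray(p, dtype=float) for p in (donor, hydrogen, acceptor)
    )
    da = acceptor - donor
    dist = np.linalg.norm(da)
    if dist > max_dist:
        return "none"
    dh = hydrogen - donor
    # Angle between donor->hydrogen and donor->acceptor (assumed criterion).
    cos_angle = np.dot(dh, da) / (np.linalg.norm(dh) * dist)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return "hbond" if angle <= max_angle else "pair"
```

A donor whose hydrogen points straight at a nearby acceptor is counted as a hydrogen bond; the same acceptor at a right angle to the donor-hydrogen bond only counts as a pair within 0.35 nm.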

Wildtype

<xr id="tab:hydrogen_bonds_water" /> a) shows a strong increase in the number of pairs until around step 2000. Then there is a small decrease for about 100 steps, followed by a slow but steady increase with some fluctuation. The curve ends at about 6000.

Ala322Gly

Like the wildtype, the plot in <xr id="tab:hydrogen_bonds_water" /> b) shows a strong increase in the beginning, but the increase is slower. The number of pairs at step 2000 is about 5000 in the wildtype and around 4000 in this mutation. The curve then rises slowly again, but a small decrease at the end leads to an endpoint of about 5500.

Arg408Trp

This mutation shows the biggest differences in comparison to the other two plots in <xr id="tab:hydrogen_bonds_water" />. The plot shows a slower increase than the wildtype, and at step 2000 there is a drastic decrease in pairs, which drops the number from 5000 (the number of pairs in the wildtype) to around 4000. Then we again see a small increase, which, as in the other mutant, leads to an endpoint of around 5500.

Secondary Structure

<figtable id="tab:sec_struc">

Plot of the number of residues forming secondary structure elements during the 10 ns simulations.
a) Plot of the number of residues forming secondary structure elements of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the number of residues forming secondary structure elements of the Ala322Gly mutation.
c) Plot of the number of residues forming secondary structure elements of the Arg408Trp mutation.

</figtable>


<figtable id="tab:sec_struc2">

Plot of the secondary structure per residue during the 10 ns simulations.
a) Plot of the secondary structure per residue of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the secondary structure per residue of the Ala322Gly mutation.
c) Plot of the secondary structure per residue of the Arg408Trp mutation.

</figtable>
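
The per-frame residue counts shown in these plots can be sketched as follows. This is only an illustration under assumptions: each frame is assumed to be available as a string of one-letter DSSP codes (H for A-Helix, B for B-Bridge, E for B-Sheet, T for Turn, G for 3-Helix, I for 5-Helix, S for Bend), and 'structured' is counted as A-Helix, B-Bridge, B-Sheet and Turn, as in the discussion of the wildtype. The function name is hypothetical.

```python
from collections import Counter

# One-letter codes counted as "structured" (assumed DSSP alphabet):
# H = A-Helix, B = B-Bridge, E = B-Sheet, T = Turn.
STRUCTURED = set("HBET")

def count_structure(dssp_string):
    """Count residues per secondary structure code for a single frame."""
    counts = dict(Counter(dssp_string))
    counts["structured"] = sum(
        n for code, n in counts.items() if code in STRUCTURED
    )
    return counts
```

Applying the function to every frame of a trajectory gives one curve per structure class plus the curve of all structured residues.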

Wildtype

Overall, about two thirds of the protein is structured throughout the whole simulation, where structured means A-Helix, B-Bridge, B-Sheet and Turn. In turn, about two thirds of the structured residues form an A-Helix in the wildtype (see <xr id="tab:sec_struc" /> a)). It is interesting to note that over the course of the simulation the number of residues in an A-Helix increases, whereas the overall number of structured residues stays the same. This means there is a slight shift from other structures to A-Helix. At the same time and by the same amount, the number of residues in 3-Helices decreases (indicated by the purple curve at the very left).

Ala322Gly

The first big difference one notices is the appearance of the 5-Helix in the system, which shows as a change of colors in the plot. This difference from the wildtype (see <xr id="tab:sec_struc" /> a)) is almost negligible, however, as only very few residues adopt this structure, and only for a very short time (see <xr id="tab:sec_struc" /> b), mostly around step 8000).
The total number of structured residues is almost identical to that in the wildtype, but the increase in A-helical residues does not occur. The decrease of 3-Helices is absent as well, and there is a slight increase in residues that are part of turns.

Arg408Trp

Around the mutation (residue 290 after applying the offset, see <xr id="tab:sec_struc2" /> c)) there is a change from helix to turn structure, probably caused by the bulky tryptophan weakening the neighbouring helical elements. The trajectories reveal that the overall conformation does not change significantly (the helix stays in place but 'opens' at the end), but apparently enough to influence the DSSP assignment. Overall, there are fewer A-helical residues, fewer Coil residues, slightly more 3-helical residues and about ten fewer residues altogether in structure elements (see <xr id="tab:sec_struc" /> c)).


Ramachandran Plots

<figtable id="tab:ramachandran">

Ramachandran plot of the residue angles during the 10 ns simulations in 1 ns steps.
a) Ramachandran plot of the residue angles of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Ramachandran plot of the residue angles of the Ala322Gly mutation.
c) Ramachandran plot of the residue angles of the Arg408Trp mutation.

</figtable> <xr id="tab:ramachandran"/> shows the configuration of backbone angles in the simulated proteins in steps of 1 ns. Shown are the plots of all residues, of the highly flexible glycine, of the helix breaker proline, which allows only few angle configurations, and of the residues before proline, which might be of interest. Favored and allowed regions are coloured following Lovell et al. <ref name=Lovell>Lovell, S.C.; Davis, I.W.; Arendall, W.B.; De Bakker, P.I.W.; Word, J.M.; Prisant, M.G.; Richardson, J.S.; Richardson, D.C. (2003). "Structure validation by C-alpha geometry: Phi,Psi and C-beta deviation". Proteins: Structure, Function, and Genetics 50 (3): 437–50</ref>.
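
Each point in a Ramachandran plot is a pair of backbone dihedral angles: phi, defined by the atoms C(i-1), N(i), CA(i), C(i), and psi, defined by N(i), CA(i), C(i), N(i+1). The following sketch shows how one such dihedral can be computed from four atom positions. The function name is hypothetical and this is not the tool we used to produce the plots.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))
```

Calling the function once with the phi atoms and once with the psi atoms of a residue gives the coordinates of that residue in the plot.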

Wildtype

<xr id="tab:ramachandran"/> a) shows the configuration of angles in the wildtype. We see a concentration in the areas allowed for alpha-helices and beta-sheets but also a few residues in the left-handed helix region. The proline and pre-proline angles stay strictly within the favorable regions.

Ala322Gly

<xr id="tab:ramachandran"/> b) shows only very small differences from the wildtype. There is a slightly greater variability in the general plot, but the angles stay in the allowed regions at all times. The special cases glycine, proline and pre-proline appear unchanged.

Arg408Trp

<xr id="tab:ramachandran"/> c) shows a few differences from the wildtype. The clustering in the alpha-helix region is less dense in the general plot and there are more angle configurations outside the favored regions. Still, the angles stay in the allowed regions at all times. For proline, we see two residues in new configurations outside the favored regions, while the glycines and pre-prolines appear unchanged. The plots hint at a greater flexibility in the structure but do not allow conclusions about the functional impediments we know are present in this mutation.


Analysis of Dynamics and Time-averaged Properties


Root Mean Square Deviations

<figtable id="tab:rmsd_backbone_b">

Plot of the RMSD of the backbone and the C beta atom during the 10 ns simulations.
a) Plot of the RMSD of the backbone and the C beta atom of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the RMSD of the backbone and the C beta atom of the Ala322Gly mutation.
c) Plot of the RMSD of the backbone and the C beta atom of the Arg408Trp mutation.

</figtable>


<figtable id="tab:rmsd_backbone_b_matrix">

Matrix of the RMSD of the backbone and the C beta atom during the 10 ns simulations.
a) Matrix of the RMSD of the backbone and the C beta atom of the wildtype phenylalanine hydroxylase structure 1J8U in the upper left half and membership in the same (red) or different (blue) cluster in the lower half.
b) Matrix of the RMSD of the backbone and the C beta atom of the Ala322Gly mutation in the upper left half and membership in the same (red) or different (blue) cluster in the lower half.
c) Matrix of the RMSD of the backbone and the C beta atom of the Arg408Trp mutation in the upper left half and membership in the same (red) or different (blue) cluster in the lower half.

</figtable> Including the C-beta atom in the RMSD takes the direction of the sidechains into account but is insensitive to the exact rotamer of each sidechain. <xr id="tab:rmsd_backbone_b_matrix"/> shows the matrix of pairwise RMSDs. <xr id="tab:rmsd_backbone_b"/> shows the RMSD of the backbone and C-beta atoms with the starting structure as reference. Areas with lower RMSD (blue or light green squares) in the matrix indicate conformations that are stable for a time.
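
As an illustration of how such a matrix is obtained, the sketch below computes the RMSD between two sets of atom positions after an optimal least-squares superposition (Kabsch algorithm) and fills a pairwise matrix for a list of frames. It is a simplified, unweighted version under assumptions: the function names are hypothetical, and any mass weighting applied by the actual analysis tool is not reproduced here.

```python
import numpy as np

def fitted_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # Kabsch: the rotation that best maps a onto b comes from the SVD
    # of the covariance matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b
    return np.sqrt((diff ** 2).sum(axis=1).mean())

def pairwise_rmsd(frames):
    """Symmetric matrix of fitted RMSDs between all pairs of frames."""
    n = len(frames)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = fitted_rmsd(frames[i], frames[j])
    return matrix
```

Here `frames` would hold the backbone and C-beta coordinates of each trajectory frame. Blocks of low values along the diagonal of the resulting matrix correspond to the stable periods discussed in this section.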

Wildtype

In the wildtype we see seven stable sections with durations from 500 to 2500 ps (cf. <xr id="tab:rmsd_backbone_b_matrix"/> a)). Transitions, marked by high RMSD values near the diagonal, occur around 1500 ps, 2700 ps, 4700 ps, 5600 ps, 6800 ps and 8500 ps. These boundaries are somewhat subjective, and in some cases the sections could be subdivided or joined into larger regions. Of interest is the longest stable period, from 2700 ps to 4700 ps. It is disrupted for about 100 ps but returns to the stable state and still shows similarity to the states in the following three ns, indicating a very stable conformation of the native protein.

Ala322Gly

In the Ala322Gly mutation there is a large number of very short stable states and a high average similarity between all states, signified by little blue and red colouring in the matrix (cf. <xr id="tab:rmsd_backbone_b_matrix"/> b)). There is only one larger and more stable period, from 4200 ps to 5500 ps. This fits the observation that the fluctuation of this mutant is significantly smaller and its stability higher. Altogether there are only gradual transitions in a narrower folding space. This restricted range of conformations could also be the reason for the decrease in function in this mutation.

Arg408Trp

In the Arg408Trp mutation we see five stable periods with transitions at approximately 1500 ps, 2000 ps, 3400 ps and 5600 ps. There is one long period at the end of the simulation that is clearly visible as a large blue square. This probably marks a stable conformation of the protein. Although this mutant is of comparable flexibility overall, this conformation might trap the protein in a non-functional state.


Cluster Analysis

With the standard parameters, a cut-off of 0.1 and the gromos clustering algorithm, we obtained 15 clusters in the wildtype, 8 in the Ala322Gly mutation and 13 in the Arg408Trp mutation. The median structures of the two largest clusters can be found in <xr id="tab:structure_clusters"/>. The clustering also reports the number of transitions to and from each cluster, where few transitions mean that the trajectory stays continuously within the cluster for a longer time.
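
The clustering idea can be sketched as follows, assuming that the gromos method as described by Daura et al. is meant: the structure with the most neighbours within the cut-off becomes the centre of a cluster, it is removed together with its neighbours, and the procedure is repeated on the remaining structures. The function names are hypothetical, and the transition count is our own simple reading of 'transitions to and from a cluster', not the exact bookkeeping of the tool.

```python
import numpy as np

def gromos_cluster(rmsd_matrix, cutoff=0.1):
    """Greedy neighbour-count clustering on a pairwise RMSD matrix."""
    rmsd = np.asarray(rmsd_matrix, dtype=float)
    n = len(rmsd)
    neighbours = rmsd <= cutoff          # every frame is its own neighbour
    unassigned = np.ones(n, dtype=bool)
    labels = np.full(n, -1)
    centres = []
    while unassigned.any():
        # Count the still unassigned neighbours of each unassigned frame.
        counts = (neighbours & unassigned).sum(axis=1)
        counts[~unassigned] = -1
        centre = int(np.argmax(counts))
        members = neighbours[centre] & unassigned
        labels[members] = len(centres)
        centres.append(centre)
        unassigned &= ~members
    return labels, centres

def transitions_per_cluster(labels):
    """Count changes between successive frames that enter or leave a cluster."""
    counts = {}
    for prev, cur in zip(labels[:-1], labels[1:]):
        if prev != cur:
            counts[int(prev)] = counts.get(int(prev), 0) + 1
            counts[int(cur)] = counts.get(int(cur), 0) + 1
    return counts
```

Clusters are numbered in the order in which they are found, so cluster 0 is the largest one. Dividing the transition count of a cluster by its number of members gives a rough measure of how continuously the trajectory stays in that cluster.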

<figtable id="tab:structure_clusters">

Aligned structures of the largest two clusters of every mutation.
a) Structures of the two largest clusters in the simulation of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Structures of the two largest clusters in the simulation of the Ala322Gly mutation.
c) Structures of the two largest clusters in the simulation of the Arg408Trp mutation.

</figtable>

Wildtype

The two largest clusters have 477 and 189 members and 138 and 98 cluster transitions between successive frames respectively. Relative to its size, the larger cluster has far fewer transitions (138 for 477 members versus 98 for 189 members), signifying a more stable conformation. The representative structures are the frames at 7640 ps and 2960 ps, which fall in the largest stable periods identified in the previous analysis. The structures show significant differences in the placement of loops, while secondary structure elements are shifted but largely remain in place (see <xr id="tab:structure_clusters"/> a)). The pairwise RMS value calculated by PyMOL is 0.779.

Ala322Gly

The two largest clusters have 791 and 82 members and 189 and 69 cluster transitions between successive frames respectively. The larger cluster is almost ten times as large as the second one and shows fewer transitions relative to its size than the largest wildtype cluster (189 for 791 members versus 138 for 477 members), which fits the observation that there are only few larger changes in the mutant. The representative structures are the frames at 5420 ps and 430 ps. In these structures (see <xr id="tab:structure_clusters"/> b)) we also see far fewer differences than in the wildtype; even loops are placed similarly. The RMS value is 0.736.

Arg408Trp

The two largest clusters have 513 and 205 members and 148 and 45 cluster transitions respectively. These numbers are similar to the wildtype, but the representative structures from the frames at 7300 ps and 530 ps have an RMS value of 0.925. In <xr id="tab:structure_clusters"/> c), even a few secondary structure elements are displaced, most noticeably the short helix to the left of the center.

Cross-Comparison of Clusters

<figtable id="tab:cross_clusters"> RMS value of clusters across simulations

{| class="wikitable"
! !! Ala322Gly c1 !! Ala322Gly c2 !! Arg408Trp c1 !! Arg408Trp c2
|-
! Wildtype c1
| 0.894 || 1.028 || 0.913 || 1.025
|-
! Wildtype c2
| 0.938 || 0.948 || 0.851 || 0.924
|}

</figtable>


Distance RMSD

<figtable id="tab:distance_rmsd">

Plot of the distance-RMSD of protein during the 10 ns simulations.
a) Plot of the distance-RMSD of the wildtype phenylalanine hydroxylase structure 1J8U.
b) Plot of the distance-RMSD of the Ala322Gly mutation.
c) Plot of the distance-RMSD of the Arg408Trp mutation.

</figtable>

The distance RMSD (dRMSD) differs from the normal RMSD in that it compares interatomic distances and therefore does not involve a least-squares fit that could distort the results. The graphs show similar properties; smaller differences are discussed in the subsections.
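
A minimal sketch of the idea: the dRMSD compares all interatomic distances of a frame with the corresponding distances in the reference structure, so no superposition is needed. The function names are hypothetical and the sketch ignores any atom selection or weighting used by the actual analysis tool.

```python
import numpy as np

def distance_matrix(coords):
    """All pairwise distances between the rows of an (N, 3) array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def drmsd(coords, reference):
    """Root mean square deviation of the interatomic distances."""
    d = distance_matrix(coords)
    d_ref = distance_matrix(reference)
    upper = np.triu_indices(len(d), k=1)  # count each atom pair once
    return np.sqrt(np.mean((d[upper] - d_ref[upper]) ** 2))
```

Because only internal distances enter, rotating or translating a frame leaves its dRMSD unchanged.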

Wildtype

We see the same basic tendencies in the plot of the normal RMSD (<xr id="tab:rmsd_all-atom-vs-start"/> a)) and the dRMSD (<xr id="tab:distance_rmsd"/> a)), but they are less pronounced in the dRMSD, for example the spike around 1800 ps. Like the RMSD, the dRMSD shows no clear convergence, but it ends at 0.211, slightly below the RMSD value of 0.225, and shows less of an upwards trend after 6000 ps.

Ala322Gly

Overall the dRMSD (see <xr id="tab:distance_rmsd"/> b)) of the Ala322Gly mutation lies 0.02 below the normal RMSD (see <xr id="tab:rmsd_all-atom-vs-start"/> b)), and shows the same behaviour: It converges at around 5000 ps at a value of 0.15. Some of the spikes are more pronounced, e.g. around 4000 ps and 8500 ps.

Arg408Trp

As with the previous mutation, the dRMSD (see <xr id="tab:distance_rmsd"/> c)) of the Arg408Trp mutation lies approximately 0.02 below the normal RMSD (see <xr id="tab:rmsd_all-atom-vs-start"/> c)). The curve follows the same trend, rising steadily until 8500 ps to 0.19 and falling towards the end of the simulation to 0.175.


Conclusion of the Comparison of Wildtype and Mutants

All conclusions are subject to the caveat that we ran only one simulation each for the wildtype and the two mutations and can only draw on this small set of observations.
We expected to see minor changes in protein behaviour in the Ala322Gly mutation, which inhibits but does not completely disable enzyme function. This is true for the structural properties discussed above, but there are significant differences, e.g. in the overall flexibility of the mutant, which fluctuates less even though the introduced mutation removes steric constraints. The same stability can be observed in the cluster analysis and the RMSD matrix. It is possible that the motions necessary for enzyme function are still possible for this mutant despite its greater rigidity but happen at a lower rate, thereby reducing enzyme activity. We observed no direct influence of the mutation on the binding pocket.

The Arg408Trp mutation causes phenylketonuria, so we expected to see changes in protein behaviour that severely disable enzyme function. Indeed we observed major changes in fundamental properties like the Coulomb interaction energy and even in the hydrogen bond network. Like the previous mutant, this mutant shows decreased flexibility over the whole structure despite the introduction of an ill-fitting residue, although the decrease is not significant. The cluster analysis, on the other hand, shows a behaviour more similar to the wildtype, including changes in conformation over the simulation. Since there is also no observable direct influence on the binding site, we have to assume that the functional break-down has other causes. These could be an inability to undergo larger conformational changes not seen in this simulation (including likely problems with the folding of the protein, since the simulation starts with the fully folded structure), blocked access of ligands to the binding site, improper binding due to increased or decreased flexibility in critical areas outside the binding site, or even reasons not related to the dynamic properties of the protein itself (like agglomeration, polymerization (PAH is a tetramer), (mis-)recognition by binding partners, etc.).

References

<references/>