Difference between revisions of "Molecular Dynamcis analysis"

From Bioinformatikpedia
m (Molecular Dynamics analysis)
(Discussion)
 
(259 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
<sup>by [[User:Greil|Robert Greil]] and [[User:Landerer|Cedric Landerer]]</sup>
=Molecular Dynamics analysis=
 
 
==Wildtype==
 
==Wildtype==
  +
  +
First of all, we checked the resultfile with <code>gmxcheck</code>.
  +
* gmxcheck -f ref_md.tpr.xtc
  +
  +
'''Result'''
  +
Reading frame 0 time 0.000
  +
# Atoms 68601
  +
Precision 0.001 (nm)
  +
Last frame 2000 time 10000.000
  +
  +
Item #frames Timestep (ps)
  +
Step 2001 5
  +
Time 2001 5
  +
Lambda 0
  +
Coords 2001 5
  +
Velocities 0
  +
Forces 0
  +
Box 2001 5
  +
  +
The Simulation took ''6h33:50'' and the simulation speed was ''36.564 ns/day''. So, to reach 1 second of simulation, we had to wait around ''75061'' years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about ''-9.185e+05 kJ/mol'' with a range of about ''0.15e+04 kJ/mol''. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command <code>trjconv -s ref_md.tpr -f ref_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10</code>.
  +
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_cartoon_anim.gif|thumb|Figure 1.1: Motion of the wild-type protein, Cartoon representation]]
  +
|[[File:1a6z_wt_anim.gif|thumb|Figure 1.2: Motion of the wild-type protein]]
  +
|}
  +
  +
To create the images, we saved each frame in PyMol<ref>The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC</ref> as an image, which we converted into gif format by <code>for file in *.png; do convert "$file" "$(basename $file .png).gif"; done</code>. Than we were able to use ''gifsicle'' to create an animated gif with the command <code>gifsicle *.gif -loop</code>.
  +
  +
As one can see in Figure 1.1 and Figure 1.2, we have mostly a motion in space. The motion within the protein, like it is shown by the [[task_9|normal mode analysis]] is not identifiable. There is no movement of the beta-sheet or helical region against each other.
  +
  +
===Quality assurance===
  +
For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.
  +
  +
====Convergence of energy terms====
  +
To create the necessary files to check the convergence of energy terms we used the command:
  +
* <code>g_energy -f ref_md.tpr.edr -o xvg/xxxx.xvg</code>.
  +
  +
'''Temperature'''
  +
  +
[[File:Temperature_wt_1a6z.png|300px|thumb|center|Figure 1.3: Temperature over time of the molecular dynamics simulation of the wild-type protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Temperature 297.942 0.0055 1.11471 0.0143589 (K)
  +
  +
Figure 1.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.
  +
  +
'''Pressure'''
  +
  +
[[File:Pressure_wt_1a6z.png|300px|thumb|center|Figure 1.4: Pressure over time of the molecular dynamics simulation of the wild-type protein]]
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Pressure 0.995939 0.014 83.256 0.0113633 (bar)
  +
  +
Figure 1.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar<ref>D. A. T. Dick and Leah M. Lowenstein Osmotic Equilibria in Human Erythrocytes Studied by Immersion Refractometry</ref>, so the pressure during the simulation is not in the physiological range.
  +
  +
'''Energy'''
  +
[[File:Energy_wt_1a6z.png|300px|thumb|center|Figure 1.5: Energy over time of the molecular dynamics simulation of the wild-type protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Potential -919346 21 889.917 -54.8929 (kJ/mol)
  +
Kinetic En. 164147 3 614.133 7.91086 (kJ/mol)
  +
Total Energy -755199 19 1096.12 -46.9806 (kJ/mol)
  +
  +
Figure 1.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.
  +
  +
'''Volume'''
  +
[[File:Volume_wt_1a6z.png|300px|thumb|center|Figure 1.6: Volume over time of the molecular dynamics simulation of the wild-type protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Volume 694.068 0.012 0.512013 0.00522608 (nm^3)
  +
  +
The simulation volume shown in Figure 1.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.
  +
  +
'''Density'''
  +
[[File:Density_wt_1a6z.png|300px|thumb|center|Figure 1.7: Density over time of the molecular dynamics simulation of the wild-type protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Density 999.461 0.018 0.737307 -0.00750088 (kg/m^3)
  +
  +
The density of the simulation stays all the time in the same range (Figure 1.7), with just a small divergence. Therefor we can assume a stable density for the simulation.
  +
  +
'''Box'''
  +
[[File:Box_wt_1a6z.png|300px|thumb|center|Figure 1.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the wild-type protein. The X and Y values are equal over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Box-X 9.93815 5.8e-05 0.00244379 2.49479e-05 (nm)
  +
Box-Y 9.93815 5.8e-05 0.00244379 2.49479e-05 (nm)
  +
Box-Z 7.02734 4.1e-05 0.00172803 1.76268e-05 (nm)
  +
  +
The size of the box containing the simulation volume is very constant (Figure 2.8).
  +
  +
==== Interaction Energy: Coulomb ====
  +
Here we take a look at the interaction energies between the protein and the solvent.
  +
  +
[[File:1a6z_wt_Coulomb.png|300px|thumb|center|Figure 1.9: Coulomb energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Coul-SR:Protein-non-Protein -21213.5 39 373.942 197.347 (kJ/mol)
  +
Coul-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
The coulomb energy is not converged at all like it is shown in Figure 1.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Interaction Energy: Van der Waals ====
  +
We also take a look at the non covalent interaction energies between the protein and the solvent
  +
  +
[[File:1a6z_wt_Vanderwaals.png|300px|thumb|center|Figure 1.10: Van der Waals energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
LJ-SR:Protein-non-Protein -2237.6 18 140.778 93.3922 (kJ/mol)
  +
LJ-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 1.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Minimum distances between periodic images====
  +
To create the corresponding <code>.xvg</code> files, we used the command <code>g_mindist -f ref_md.tpr.xtc -s ref_md.tpr -od xvg/min_dist_XXXX.xvg -pi</code>.
  +
  +
Figure 1.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 1.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots.
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_min_dist_prot.png|thumb|Figure 1.11: Minimal periodic distance on the protein]]
  +
|[[File:1a6z_wt_min_dist_calpha.png|thumb|Figure 1.12: Minimal periodic distance on the C-alpha groups]]
  +
|}
  +
  +
For the C-alpha groups, the shortest periodic distance is 2.6075 (nm) at time 6015 (ps), between atoms 2513 and 4306. For the whole protein, the shortest periodic distance is 1.74506 (nm) at time 6120 (ps),between atoms 2532 and 3799. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.
  +
  +
====Root mean square fluctuations====
  +
To get the root mean square<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref> (RMSF) fluctuations of the protein, we used the command:
  +
*<code>g_rmsf -f ref_md.tpr.xtc -s ref_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res</code>
  +
  +
Figure 1.13 indicates that the HFE protein is very stable. Just 2 smaller regions about residue 89 and about residue 196 show a larger flexibility. This is not surprising as, according to UniProt, both regions are within a loop.
  +
{|align="center"
  +
|[[File:1a6z_wt_rmsf_per_res.png|thumb|center|Figure 1.13: RMSF per residue of HFE in nm]]
  +
|[[File:1a6z_wt_average_bfactor_cartoon.png|thumb|center|Figure 1.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.]]
  +
|}
  +
Figure 1.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. The possible functional region around the three helices is shown as very stable.
  +
  +
====Convergence of RMSD====
  +
To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:
  +
*<code>trjconv -f ref_md.tpr.xtc -o traj_nojump.xtc -pbc nojump</code>
  +
*<code>g_rms -f traj_nojump.xtc -s ref_md.tpr xvg/rmsd_XXX_vs_start.xvg</code> used option 1,1 and option 1,4.
  +
*<code>echo 1 | trjconv -f traj_nojump.xtc -s ref_md.tpr -o protein.xtc</code>
  +
*<code>g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_all_vs_average.xvg</code> used option 1,1 and option 1,4.
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_rmsd_all_vs_start.png|thumb|center|Figure 1.15: RMSD over time of all atoms versus the starting structure]]
  +
|[[File:1a6z_wt_rmsd_all_vs_average.png|thumb|center|Figure 1.16: RMSD over time of all atoms versus the average structure]]
  +
|[[File:1a6z_wt_rmsd_backbone_vs_start.png|thumb|center|Figure 1.17: RMSD over time of backbone versus the starting structure]]
  +
|[[File:1a6z_wt_rmsd_backbone_vs_average.png|thumb|center|Figure 1.18: RMSD over time of backbone versus the average structure]]
  +
|}
  +
  +
In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing (Figure 1.15, 1.17), the RMSD against the average structure shows an oscillating behavior(Figure 1.16, 1.18). But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 1.15, 1.17 and Figure 1.16, 1.18).
  +
  +
The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.
  +
  +
====Convergence of radius of gyration====
  +
Gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref> is a measure of the size of an object, a surface, or an ensemble of points.
  +
  +
* <code>g_gyrate -f ref_md.tpr.xtc -s ref_md.tpr -o xvg/radius_gyration.xvg</code>
  +
  +
[[File:1a6z_wt_radius_gyration.png|300px|thumb|center|Figure 1.19: radius of gyration in nm]]
  +
  +
There is no convergence of the gyration observable (Figure 1.19). The gyration moves in a range from 2.25 to 2.5. The X and Y directed gyrations show an comparable behavior, while the gyration in Z-direction behave totally different.
  +
  +
=== Structural analysis: properties derived from configurations ===
  +
  +
==== Solvent accessible surface area ====
  +
To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA)<ref>http://en.wikipedia.org/wiki/Accessible_surface_area</ref> by using the command:
  +
*<code>g_sas -f traj_nojump.xtc -s ref_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_solvent_acc_surf.png|thumb|center|Figure 1.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues. ]]
  +
|[[File:1a6z_wt_atomic_sas.png|thumb|center|Figure 1.21: Average area of accessible surface per atom and standard deviation in nm².]]
  +
|[[File:1a6z_wt_residue_sas.png|thumb|center|Figure 1.22: Average area of accessible surface per residue and standard deviation in nm².]]
  +
|}
  +
  +
Figure 1.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Interestingly, an equal amount of hydrophobic and hydrophilic residues are accessible.
  +
Figure 1.21 show the average accessible area of all atoms. Figure 1.22 show the same for each residue. In both cases, a core region which is lies protected within the protein is not observable.
  +
  +
====Hydrogen bonds====
  +
We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:
  +
*<code>echo 1 1 | g_hbond -f traj_nojump.xtc -s ref_md.tpr -num xvg/hydro_bonds_intra_prot.xvg</code>
  +
*<code>echo 1 12 | g_hbond -f traj_nojump.xtc -s ref_md.tpr -num xvg/hydro_bonds_prot_water.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_hydro_bonds_intra_prot.png|thumb|center|Figure 1.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|[[File:1a6z_wt_hydro_bonds_prot_water.png|thumb|center|Figure 1.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|}
  +
  +
Figure 1.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 1.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent.
  +
  +
====Ramachandran (phi/psi) plot====
  +
  +
The Ramachandran plot<ref>http://en.wikipedia.org/wiki/Ramachandran_plot</ref> visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:
  +
* <code>g_rama -f traj_nojump.xtc -s ref_md.tpr -o xvg/ramachandran.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_ramachandran.png|300px|thumb|center|Figure 1.25: Ramachandran plot of the HFE protein.]]
  +
|[[File:rama.gif|300px|thumb|center|Figure 1.26: Standard Ramachandran plot<ref>http://www.cryst.bbk.ac.uk/PPS95/course/3_geometry/rama.html</ref>.]]
  +
|}
  +
  +
We can observe a lot of angle combinations (Figure 1.25) which are not part of the standard ramachandran plot (Figure 1.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 1.25, many left handed alpha helices.
  +
  +
=== Analysis of dynamics and time-averaged properties ===
  +
  +
==== Root mean square deviations again ====
  +
  +
  +
* <code>g_rms -s ref_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10</code>
  +
* <code>xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue</code>
  +
  +
[[File:1a6z_wt_rmsd-matrix.png|200px|thumb|center|Figure 1.27: Dotplot of the RMSD of HFE against HFE over time.]]
  +
  +
Figure 1.27 shows the rmsd over time, we see that the rmsd decreases over time in both dimensions. That means, that the movement after 1200 ps is less strong then before. The protein converges after 1200 ps.
  +
  +
====Cluster analysis====
  +
Here we analyzed cluster distribution and the size of the clusters.
  +
We used the command:
  +
* <code>echo 6 6 | g_cluster -s ref_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_wt_cluster_sizes.png|thumb|center|Figure 1.28: Cluster size distribution]]
  +
|[[File:1a6z_wt_rmsd_disstribution.png|thumb|center|Figure 1.29: RMSD distribution]]
  +
|}
  +
  +
470 clusters were found by <code>g_cluster</code>. The most clusters consists of just one structure. Just 2 clusters consits of 7 structures (Figure 1.28). The RMSD distribution shown in Figure 1.29 shows that the most structures have an RMSD larger than 0.2. This indicates a flexible structure.
  +
  +
====Distance RMSD====
  +
To get the dRMSD, we used the command:
  +
*<code>g_rmsdist -s ref_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg</code>
  +
  +
[[File:1a6z_wt_distance_rmsd.png|200px|thumb|center|Figure 1.30: dRMSD of the wild-type]]
  +
  +
Figure 1.30 shows the same behavior as Figure 1.15. Also no convergence is observable.
  +
 
==Mutation 8a [C282Y]==
 
==Mutation 8a [C282Y]==
  +
still waiting to finish // LRZ won't do anything
 
  +
First of all, we checked the resulting file with <code>gmxcheck</code>.
  +
* gmxcheck -f 282a_md.tpr.xtc
  +
  +
'''Result'''
  +
Reading frame 0 time 0.000
  +
# Atoms 68604
  +
Precision 0.001 (nm)
  +
Last frame 2000 time 10000.000
  +
  +
Item #frames Timestep (ps)
  +
Step 2001 5
  +
Time 2001 5
  +
Lambda 0
  +
Coords 2001 5
  +
Velocities 0
  +
Forces 0
  +
Box 2001 5
  +
  +
The Simulation took ''11h32:38'' and the simulation speed was ''20.790 ns/day''. So, to reach 1 second of simulation, we had to wait around ''131,780.954'' years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about ''-9.19218e+05 kJ/mol'' with a range of about ''0.15e+04 kJ/mol''. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command <code>trjconv -s 282a_md.tpr -f 282a_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10</code>.
  +
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_cartoon_anim.gif|thumb|Figure 2.1: Motion of the mutated protein, Cartoon representation]]
  +
|[[File:1a6z_282a_aim.gif|thumb|Figure 2.2: Motion of the mutated protein]]
  +
|}
  +
  +
To create the images, we saved each frame in PyMol as an image, which we converted into gif format by <code>for file in *.png; do convert "$file" "$(basename $file .png).gif"; done</code>. Than we were able to use ''gifsicle'' to create an animated gif with the command <code>gifsicle *.gif -loop</code>.
  +
  +
As one can see in Figure 2.1 and Figure 2.2, we have mostly a motion in space. The motion within the protein, like it is shown by the [[task_9|normal mode analysis]] is not identifiable. There is no movement of the beta-sheet or helical region against each other.
  +
  +
===Quality assurance===
  +
For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.
  +
  +
====Convergence of energy terms====
  +
To create the necessary files to check the convergence of energy terms we used the command:
  +
* <code>g_energy -f 282a_md.tpr.edr -o xvg/xxxx.xvg</code>.
  +
  +
'''Temperature'''
  +
[[File:1a6z_282a_temperature.png|300px|thumb|center|Figure 2.3: Temperature over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Temperature 297.941 0.0052 1.11152 0.00591841 (K)
  +
  +
Figure 2.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.
  +
  +
'''Pressure'''
  +
[[File:1a6z_282a_pressure.png|300px|thumb|center|Figure 2.4: Pressure over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Pressure 1.00547 0.014 82.9753 0.0439014 (bar)
  +
  +
  +
Figure 2.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar, so the pressure during the simulation is not in the physiological range.
  +
  +
'''Energy'''
  +
[[File:1a6z_282a_energy.png|300px|thumb|center|Figure 2.5: Energy over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Potential -919218 38 888.988 -238.954 (kJ/mol)
  +
Kinetic En. 164142 2.9 612.357 3.26073 (kJ/mol)
  +
Total Energy -755076 37 1093.34 -235.696 (kJ/mol)
  +
  +
Figure 2.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.
  +
  +
'''Volume'''
  +
[[File:1a6z_282a_volume.png|300px|thumb|center|Figure 2.6: Volume over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Volume 694.169 0.02 0.527806 -0.0281068 (nm^3)
  +
  +
The simulation volume shown in Figure 2.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.
  +
  +
'''Density'''
  +
[[File:1a6z_282a_density.png|300px|thumb|center|Figure 2.7: Density over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Density 999.333 0.029 0.759851 0.0403628 (kg/m^3)
  +
  +
  +
The density of the simulation stays all the time in the same range (Figure 2.7), with just a small divergence. Therefor we can assume a stable density for the simulation.
  +
  +
'''Box'''
  +
[[File:1a6z_282a_box.png?|300px|thumb|center|Figure 2.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the mutated protein. The X and Y values are equal over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Box-X 9.93864 9.7e-05 0.00251894 -0.000134026 (nm)
  +
Box-Y 9.93864 9.7e-05 0.00251894 -0.000134026 (nm)
  +
Box-Z 7.02768 6.9e-05 0.00178116 -9.47747e-05 (nm)
  +
  +
  +
The size of the box containing the simulation volume is very constant (Figure 2.8).
  +
  +
==== Interaction Energy: Coulomb ====
  +
Here we also take a look at the non covalent interaction energies between the protein and the solvent
  +
  +
[[File:1a6z_282a_coulomb.png|300px|thumb|center|Figure 2.9: Coulomb energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Coul-SR:Protein-non-Protein -21282.1 120 458.064 -645.984 (kJ/mol)
  +
Coul-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
The coulomb energy is not converged at all like it is shown in Figure 2.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. The fluctuation is much higher than for the wild-type. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Interaction Energy: Van der Waals ====
  +
We also take a look at the non covalent interaction energies between the protein and the solvent
  +
  +
[[File:1a6z_282a_vanderwaals.png|300px|thumb|center|Figure 2.10: Van der Waals energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
LJ-SR:Protein-non-Protein -2230.49 11 135.734 -66.2062 (kJ/mol)
  +
LJ-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
  +
The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 2.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. The energy level is a bit lower as the one of the wild-type. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Minimum distances between periodic images====
  +
To create the corresponding <code>.xvg</code> files, we used the command <code>g_mindist -f 282a_md.tpr.xtc -s 282a_md.tpr -od xvg/mindist_xxxx.xvg -pi</code>.
  +
  +
Figure 2.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 2.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots.
  +
Figure 2.12 shows an oscillation for the minimal distance and a slightly increasing tendency of the maximum internal distance.
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_mindist_prot.png|thumb|Figure 2.11: Minimal periodic distance on the protein]]
  +
|[[File:1a6z_282a_mindist_calpha.png|thumb|Figure 2.12: Minimal periodic distance on the C-alpha groups]]
  +
|}
  +
  +
For the C-alpha groups, the shortest periodic distance is 2.32134 (nm) at time 9040 (ps), between atoms 2472 and 4307. For the whole protein, the shortest periodic distance is 1.55939 (nm) at time 9125 (ps), between atoms 2488 and 4355. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.
  +
  +
====Root mean square fluctuations====
  +
To get the root mean square (RMSF) fluctuations of the protein, we used the command:
  +
*<code>g_rmsf -f 282a_md.tpr.xtc -s 282a_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res</code>
  +
  +
Figure 2.13 shows a different RMS fluctuation as Figure 1.13. The region about residue 196 which is the most flexible one in the wild-type, is still flexible but not as much as in the wild-type. Therefor the region about residue 41 is more flexible than in the wild-type. So, we see a change in flexibility mostly at the C-terminus of the protein.
  +
  +
{|align="center"
  +
|[[File:A6z_282a_rmsf_per_res.png|thumb|center|Figure 2.13: RMSF per residue of HFE in nm]]
  +
|[[File:1a6z_282a_average_bfactor_cartoon.png|thumb|center|Figure 2.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.]]
  +
|}
  +
Figure 2.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. In this case, the possible functional region is mostly affected. The underlying beta-sheet is now more flexible than in the wild-type. Also the alpha-helices are now more flexible. This is quite impressive as the mutation is not interacting with this region.
  +
  +
====Convergence of RMSD====
  +
To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:
  +
*<code>trjconv -f 282a_md.tpr.xtc -o traj_nojump.xtc -pbc nojump</code>
  +
*<code>g_rms -f traj_nojump.xtc -s 282a_md.tpr -o xvg/rmsd_XXX_vs_start.xvg</code> used option 1,1 and option 1,4.
  +
*<code>echo 1 | trjconv -f traj_nojump.xtc -s 282a_md.tpr -o protein.xtc</code>
  +
*<code>g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_XXX_vs_average.xvg</code> used option 1,1 and option 1,4.
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_rmsd_all_vs_start.png|thumb|center|Figure 2.15: RMSD over time of all atoms versus the starting structure]]
  +
|[[File:1a6z_282a_rmsd_all_vs_average.png|thumb|center|Figure 2.16: RMSD over time of all atoms versus the average structure]]
  +
|[[File:1a6z_282a_rmsd_backbone_vs_start.png|thumb|center|Figure 2.17: RMSD over time of backbone versus the starting structure]]
  +
|[[File:1a6z_282a_rmsd_backbone_vs_average.png|thumb|center|Figure 2.18: RMSD over time of backbone versus the average structure]]
  +
|}
  +
  +
In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing with some peaks(Figure 2.15, 2.17), the RMSD against the average structure shows an oscillating behavior(Figure 2.16, 2.18), even more than the wild-type. But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 2.15, 2.17 and Figure 2.16, 2.18).
  +
  +
The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.
  +
  +
====Convergence of radius of gyration====
  +
Gyration is a measure of the size of an object, a surface, or an ensemble of points.
  +
  +
* <code>g_gyrate -f 282a_md.tpr.xtc -s 282a_md.tpr -o xvg/radius_gyration.xvg</code>
  +
  +
[[File:1a6z_282a_radius_gyration.png|300px|thumb|center|Figure 2.19: radius of gyration in nm]]
  +
  +
There is no convergence of the gyration observable (Figure 2.19). Just RgX and RgY reach a common point, but this also happened before about 3000ps. so, we can not be sure if these two really converge. But like in the case of the wild-type, the Z direction shows a different behavior. The gyration moves in a range from 2.25 to 2.5.
  +
  +
=== Structural analysis: properties derived from configurations ===
  +
  +
==== Solvent accessible surface area ====
  +
To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA) by using the command:
  +
*<code>g_sas -f traj_nojump.xtc -s 282a_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_solvent_acc_surf.png|thumb|center|Figure 2.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues.]]
  +
|[[File:1a6z_282a_atomic_sas.png|thumb|center|Figure 2.21: Average area of accessible surface per atom and standard deviation in nm².]]
  +
|[[File:1a6z_282a_residue_sas.png|thumb|center|Figure 2.22: Average area of accessible surface per residue and standard deviation in nm².]]
  +
|}
  +
  +
Figure 2.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Like in the wild-type, an equal amount of hydrophobic and hydrophilic residues are accessible. But here, the amount of hydrophobic accessible residues is slightly higher.
  +
Figure 2.21 show the average accessible area of all atoms. Figure 2.22 show the same for each residue. In both cases, a core region wich is lies protected within the protein is not observable.
  +
  +
====Hydrogen bonds====
  +
We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:
  +
*<code>echo 1 1 | g_hbond -f traj_nojump.xtc -s 282a_md.tpr -num xvg/hydro_bonds_intra_prot.xvg</code>
  +
*<code>echo 1 12 | g_hbond -f traj_nojump.xtc -s 282a_md.tpr -num xvg/hydro_bonds_prot_water.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_hydro_bonds_intra_prot.png|thumb|center|Figure 2.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|[[File:1a6z_282a_hydro_bonds_prot_water.png|thumb|center|Figure 2.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|}
  +
  +
Figure 2.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 2.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent.
  +
  +
In general, there is just a slightly observable difference compared to the wild-type at 4500 ps and about 8000 ps in the amount of hydrogen bonds.
  +
  +
====Ramachandran (phi/psi) plot====
  +
  +
The Ramachandran plot visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:
  +
* <code>g_rama -f traj_nojump.xtc -s 282a_md.tpr -o xvg/ramachandran.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_ramachandran.png|300px|thumb|center|Figure 2.25: Ramachandran plot of the HFE protein.]]
  +
|[[File:rama.gif|300px|thumb|center|Figure 2.26: Standard Ramachandran plot.]]
  +
|}
  +
  +
We can observe a lot of angle combinations (Figure 2.25) which are not part of the standard ramachandran plot (Figure 2.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 2.25, many left handed alpha helices. Interesting is, that the large clusters do not spread out that much as in the case of the wild-type. We would call this plot more regular.
  +
  +
=== Analysis of dynamics and time-averaged properties ===
  +
  +
==== Root mean square deviations again ====
  +
  +
  +
* <code>g_rms -s 282a_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10</code>
  +
* <code>xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue</code>
  +
  +
[[File:1a6z_282a_rmsd-matrix.png|200px|thumb|center|Figure 2.27: Dotplot of the RMSD of HFE against HFE over time.]]
  +
  +
Figure 2.27 shows the rmsd over time, we see a stronger decrease of the rmsd over time in both dimensions as in the case of the wild-type. This probably mean, the mutated structure reaches the equilibrium faster than the wild type.
  +
  +
====Cluster analysis====
  +
Here we analyzed cluster distribution and the size of the clusters.
  +
We used the command:
  +
* <code>echo 6 6 | g_cluster -s 282a_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282a_cluster_sizes.png|thumb|center|Figure 2.28: Cluster size distribution]]
  +
|[[File:1a6z_282a_rmsd_disstribution.png|thumb|center|Figure 2.29: RMSD distribution]]
  +
|}
  +
  +
575 clusters were found by <code>g_cluster</code>. The most clusters consists of just one structure. Just 1 clusters consists of 6 structures (Figure 2.28). The RMSD distribution shown in Figure 2.29 shows that the most structures have an RMSD larger than 0.2. This indicates that the flexibility is not affected by the mutation.
  +
  +
====Distance RMSD====
  +
To get the dRMSD, we used the command:
  +
*<code>g_rmsdist -s 282a_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg</code>
  +
  +
[[File:1a6z_282a_distance_rmsd.png|200px|thumb|center|Figure 2.30: dRMSD of the mutated protein]]
  +
  +
Figure 2.30 shows the same behavior as Figure 2.15. Also no convergence is observable. But, the dRMSD is not so spiky. The peaks about 1500 ps, 5000 ps and 7000 ps are not that high. So, Figure 2.30 is a smoother version of the data shown in Figure 2.15.
  +
 
==Mutation 8b [C282S]==
 
==Mutation 8b [C282S]==
  +
still waiting to finish // LRZ won't do anything
 
  +
First of all, we checked the resulting file with <code>gmxcheck</code>.
  +
* gmxcheck -f 282b_md.tpr.xtc
  +
  +
'''Result'''
  +
Reading frame 0 time 0.000
  +
# Atoms 68600
  +
Precision 0.001 (nm)
  +
Last frame 2000 time 10000.000
  +
  +
Item #frames Timestep (ps)
  +
Step 2001 5
  +
Time 2001 5
  +
Lambda 0
  +
Coords 2001 5
  +
Velocities 0
  +
Forces 0
  +
Box 2001 5
  +
  +
The Simulation took ''11h37:37'' and the simulation speed was ''20.641 ns/day''. So, to reach 1 second of simulation, we had to wait around ''132996.409'' years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about ''-9.19215e+05 kJ/mol'' with a range of about ''0.15e+04 kJ/mol''. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command <code>trjconv -s 282b_md.tpr -f 282b_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10</code>.
  +
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_cartoon_anim.gif|thumb|Figure 3.1: Motion of the mutated protein, Cartoon representation]]
  +
|[[File:1a6z_282b_anim.gif|thumb|Figure 3.2: Motion of the mutated protein]]
  +
|}
  +
  +
To create the images, we saved each frame in PyMol as an image, which we converted into gif format by <code>for file in *.png; do convert "$file" "$(basename $file .png).gif"; done</code>. Than we were able to use ''gifsicle'' to create an animated gif with the command <code>gifsicle *.gif -loop</code>.
  +
  +
As one can see in Figure 3.1 and Figure 3.2, we have mostly a motion in space. The motion within the protein, like it is shown by the [[task_9|normal mode analysis]] is not identifiable. There is no movement of the beta-sheet or helical region against each other.
  +
  +
===Quality assurance===
  +
For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.
  +
  +
====Convergence of energy terms====
  +
To create the necessary files to check the convergence of energy terms we used the command:
  +
* <code>g_energy -f 282b_md.tpr.edr -o xvg/xxxx.xvg</code>.
  +
  +
'''Temperature'''
  +
[[File:1a6z_282b_temperature.png|300px|thumb|center|Figure 3.3: Temperature over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Temperature 297.942 0.0062 1.11548 0.00209918 (K)
  +
  +
Figure 3.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.
  +
  +
'''Pressure'''
  +
[[File:1a6z_282b_pressure.png|300px|thumb|center|Figure 3.4: Pressure over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Pressure 1.00504 0.0098 82.9603 -0.0293434 (bar)
  +
  +
Figure 3.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar, so the pressure during the simulation is not in the physiological range.
  +
  +
'''Energy'''
  +
[[File:1a6z_282b_energy.png|300px|thumb|center|Figure 3.5: Energy over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Potential -919215 33 894.443 -178.792 (kJ/mol)
  +
Kinetic En. 164143 3.4 614.547 1.15673 (kJ/mol)
  +
Total Energy -755072 34 1102.52 -177.636 (kJ/mol)
  +
  +
Figure 3.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.
  +
  +
'''Volume'''
  +
[[File:1a6z_282b_volume.png|300px|thumb|center|Figure 3.6: Volume over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Volume 694.06 0.026 0.544105 -0.162848 (nm^3)
  +
  +
The simulation volume shown in Figure 3.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.
  +
  +
'''Density'''
  +
[[File:1a6z_282b_density.png|300px|thumb|center|Figure 3.7: Density over time of the molecular dynamics simulation of the mutated protein]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Density 999.395 0.037 0.783482 0.234471 (kg/m^3)
  +
  +
The density of the simulation stays all the time in the same range (Figure 3.7), with just a small divergence. Therefor we can assume a stable density for the simulation.
  +
  +
'''Box'''
  +
[[File:1a6z_282b_box.png?|300px|thumb|center|Figure 3.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the mutated protein. The X and Y values are equal over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Box-X 9.93812 0.00012 0.00259699 -0.000777245 (nm)
  +
Box-Y 9.93812 0.00012 0.00259699 -0.000777245 (nm)
  +
Box-Z 7.02731 8.8e-05 0.00183635 -0.000549606 (nm)
  +
  +
The size of the box containing the simulation volume is very constant (Figure 3.8).
  +
  +
==== Interaction Energy: Coulomb ====
  +
Here we also take a look at the non covalent interaction energies between the protein and the solvent
  +
  +
[[File:1a6z_282b_coulomb.png|300px|thumb|center|Figure 3.9: Coulomb energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
Coul-SR:Protein-non-Protein -20938.5 30 368.752 -132.83 (kJ/mol)
  +
Coul-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
The coulomb energy is not converged at all like it is shown in Figure 3.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. The fluctuation is much higher than for the wild-type. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Interaction Energy: Van der Waals ====
  +
We also take a look at the non covalent interaction energies between the protein and the solvent
  +
  +
[[File:1a6z_282b_vanderwaals.png|300px|thumb|center|Figure 3.10: Van der Waals energy of the simulation over time.]]
  +
  +
Energy Average Err.Est. RMSD Tot-Drift
  +
-------------------------------------------------------------------------------
  +
LJ-SR:Protein-non-Protein -2086.53 9.1 137.417 9.01909 (kJ/mol)
  +
LJ-14:Protein-non-Protein 0 0 0 0 (kJ/mol)
  +
  +
The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 3.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. The energy level is a bit lower as the one of the wild-type. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.
  +
  +
====Minimum distances between periodic images====
  +
To create the corresponding <code>.xvg</code> files, we used the command <code>g_mindist -f 282b_md.tpr.xtc -s 282b_md.tpr -od xvg/mindist_xxxx.xvg -pi</code>.
  +
  +
Figure 3.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 3.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots.
  +
Figure 3.12 shows an oscillation for the minimal distance and a slightly increasing tendency of the maximum internal distance.
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_mindist_prot.png|thumb|Figure 3.11: Minimal periodic distance on the protein]]
  +
|[[File:1a6z_282b_mindist_caplha.png|thumb|Figure 3.12: Minimal periodic distance on the C-alpha groups]]
  +
|}
  +
  +
For the C-alpha groups, the shortest periodic distance is 2.94439 (nm) at time 3770 (ps), between atoms 2513 and 3324. For the whole protein, the shortest periodic distance is 2.17028 (nm) at time 2345 (ps), between atoms 2532 and 3330. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.
  +
  +
====Root mean square fluctuations====
  +
To get the root mean square (RMSF) fluctuations of the protein, we used the command:
  +
*<code>g_rmsf -f 282b_md.tpr.xtc -s 282b_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res</code>
  +
  +
Figure 3.13 shows a different RMS fluctuation as Figure 1.13. The region about residue 196 which is the most flexible one in the wild-type, shows here the same flexibility like in the wild-type. The region about residue 86 is no less flexible than in the wild-type.
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_rmsf_per_res.png|thumb|center|Figure 3.13: RMSF per residue of HFE in nm]]
  +
|[[File:1a6z_282b_average_bfactor_cartoon.png|thumb|center|Figure 3.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.]]
  +
|}
  +
  +
Figure 3.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. This mutation shows less impact than the other one on the same position. The flexibility is all in all pretty much the same like in the wild-type.
  +
  +
====Convergence of RMSD====
  +
To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:
  +
*<code>trjconv -f 282b_md.tpr.xtc -o traj_nojump.xtc -pbc nojump</code>
  +
*<code>g_rms -f traj_nojump.xtc -s 282b_md.tpr -o xvg/rmsd_XXX_vs_start.xvg</code> used option 1,1 and option 1,4.
  +
*<code>echo 1 | trjconv -f traj_nojump.xtc -s 282b_md.tpr -o protein.xtc</code>
  +
*<code>g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_XXX_vs_average.xvg</code> used option 1,1 and option 1,4.
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_rmsd_all_vs_start.png|thumb|center|Figure 3.15: RMSD over time of all atoms versus the starting structure]]
  +
|[[File:1a6z_282b_rmsd_all_vs_average.png|thumb|center|Figure 3.16: RMSD over time of all atoms versus the average structure]]
  +
|[[File:1a6z_282b_rmsd_backbone_vs_start.png|thumb|center|Figure 3.17: RMSD over time of backbone versus the starting structure]]
  +
|[[File:1a6z_282b_rmsd_backbone_vs_average.png|thumb|center|Figure 3.18: RMSD over time of backbone versus the average structure]]
  +
|}
  +
  +
In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing with some peaks(Figure 3.15, 3.17), the RMSD against the average structure increases at the beginning, but drops about 800 ps to its initial conditions (Figure 3.16, 3.18). But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 3.15, 3.17 and Figure 3.16, 3.18).
  +
  +
The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.
  +
  +
====Convergence of radius of gyration====
  +
Gyration is a measure of the size of an object, a surface, or an ensemble of points.
  +
  +
* <code>g_gyrate -f 282b_md.tpr.xtc -s 282b_md.tpr -o xvg/radius_gyration.xvg</code>
  +
  +
[[File:1a6z_282b_radius_gyration.png|300px|thumb|center|Figure 3.19: radius of gyration in nm]]
  +
  +
There is no convergence of the gyration observable (Figure 2.19). Just RgZ and RgY reach a common point. The gyration moves in a range from 2.25 to 2.5. The behavior of RgZ is not comparable with the other mutation and the wild-type.
  +
  +
=== Structural analysis: properties derived from configurations ===
  +
  +
==== Solvent accessible surface area ====
  +
To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA) by using the command:
  +
*<code>g_sas -f traj_nojump.xtc -s 282b_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_solvent_acc_surf.png|thumb|center|Figure 3.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues.]]
  +
|[[File:1a6z_282b_atomic_sas.png|thumb|center|Figure 3.21: Average area of accessible surface per atom and standard deviation in nm².]]
  +
|[[File:1a6z_282b_residue_sas.png|thumb|center|Figure 3.22: Average area of accessible surface per residue and standard deviation in nm².]]
  +
|}
  +
  +
Figure 3.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Like in the wild-type, an equal amount of hydrophobic and hydrophilic residues are accessible. So, it is not possible to differentiate between the wild-type and this mutation.
  +
Figure 3.21 show the average accessible area of all atoms. Figure 3.22 show the same for each residue. In both cases, a core region wich is lies protected within the protein is not observable.
  +
  +
====Hydrogen bonds====
  +
We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:
  +
*<code>echo 1 1 | g_hbond -f traj_nojump.xtc -s 282b_md.tpr -num xvg/hydro_bonds_intra_prot.xvg</code>
  +
*<code>echo 1 12 | g_hbond -f traj_nojump.xtc -s 282b_md.tpr -num xvg/hydro_bonds_prot_water.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_hydro_bonds_intra_prot.png|thumb|center|Figure 3.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|[[File:1a6z_282b_hydro_bonds_prot_water.png|thumb|center|Figure 3.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.]]
  +
|}
  +
  +
Figure 3.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 3.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent. The results is nearly the same as for the wild-type.
  +
  +
====Ramachandran (phi/psi) plot====
  +
  +
The Ramachandran plot visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:
  +
* <code>g_rama -f traj_nojump.xtc -s 282b_md.tpr -o xvg/ramachandran.xvg</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_ramachandran.png|300px|thumb|center|Figure 3.25: Ramachandran plot of the HFE protein.]]
  +
|[[File:rama.gif|300px|thumb|center|Figure 3.26: Standard Ramachandran plot.]]
  +
|}
  +
  +
We can observe a lot of angle combinations (Figure 3.25) which are not part of the standard ramachandran plot (Figure 3.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 3.25, many left handed alpha helices. Interesting is, that, also, the large clusters do not spread out that much as in the case of the wild-type. We would call this plot more regular. But compared to the other mutation, the negativ phi area is more dense.
  +
  +
=== Analysis of dynamics and time-averaged properties ===
  +
  +
==== Root mean square deviations again ====
  +
  +
  +
* <code>g_rms -s 282b_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10</code>
  +
* <code>xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue</code>
  +
  +
[[File:1a6z_282b_rmsd-matrix.png|200px|thumb|center|Figure 3.27: Dotplot of the RMSD of HFE against HFE over time.]]
  +
  +
Figure 3.27 shows the rmsd over time, the RMSD decreases about 1500 ps. This is the behavior we can observe in Figure 3.16 and 3.18. Also this matrix is comparable with the wild-type.
  +
  +
====Cluster analysis====
  +
Here we analyzed cluster distribution and the size of the clusters.
  +
We used the command:
  +
* <code>echo 6 6 | g_cluster -s 282b_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10</code>
  +
  +
{|align="center"
  +
|[[File:1a6z_282b_cluster_sizes.png|thumb|center|Figure 3.28: Cluster size distribution]]
  +
|[[File:1a6z_282b_rmsd_distribution.png|thumb|center|Figure 3.29: RMSD distribution]]
  +
|}
  +
  +
400 clusters were found by <code>g_cluster</code>. The most clusters consists of just one structure. Just 1 clusters consists of 9, 8 and 7 structures (Figure 3.28). The RMSD distribution shown in Figure 3.29 shows that the most structures have an RMSD larger than 0.2. This indicates that the flexibility is not affected by the mutation. This obersavtion holds for all three simulations.
  +
  +
====Distance RMSD====
  +
To get the dRMSD, we used the command:
  +
*<code>g_rmsdist -s 282b_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg</code>
  +
  +
[[File:1a6z_282b_distance_rmsd.png|200px|thumb|center|Figure 3.30: dRMSD of the mutated protein]]
  +
  +
Figure 3.30 shows the same behavior as Figure 3.15. Also no convergence is observable. But a bit smoother at the end of the simulation.
  +
  +
==Discussion ==
  +
In all cases, the simulation parameter, like temperature, pressure and so on behave the same. So, we assume that all simulation reached the same state and are comparable.
  +
  +
'''The Solvent''' accessible surface remains the same in all simulations, just in the case of 282a, we see a slightly change towards the hydrophobic residues on the surface. For the wild-type and the 282b structure, an equal amount of hydrophilic and hydrophobic residues are accessible. The high hydrophilic fraction of the surface is probably due to the fact that the HFE protein is located in the plasma membrane, whereat the membrane region is not part of the simulated structure.
  +
  +
'''The Ramachandran''' plots of the simulations show, that the wild-type is more diverse than the two mutated structures. A larger difference is specially in the positive phi range observable (Figures 1.25 2.25 3.25). Here, many angle combinations present in the wild-type are missing in the mutated structures.
  +
  +
The two dimensional '''Root mean square deviation''' over time shows in all three cases the same picture as the one dimensional counterpart. For the wild-type and the 282b structure, we see a slow divergence than in the case of 282a. WT and 282b show a larger divergence in the early states of the simulation while 282a is changing just slightly. This indicates a less flexibility in this structure and a larger one in the other two. This is exactly what one can observe in the flexibility analysis.
  +
  +
'''The flexibility''' is the most interesting part. While the structure 282a is less flexible, the mutated structure 282b is more flexible than the wild-type. The structure 282a has a flexibility index (sum over the rmsd) of 49.57 and the structure has a flexibility index of 53.457, while the wild-type has one of 51.4624 which is very close to the mean of 51.4966. So, the wild-type can be called a mean structure in this case. We can observe four interesting points in the flexibility distribution shown in Figure 4.1. These regions are also marked in the structures shown in Figure 4.2. The first one is within a loop connecting two beta-strands. Here the structure 282a is more flexible than the other structures, so this could causes issues by forming a beta-sheet by the two beta-strands. Therefore the stability of the whole protein could be affected. In the second case, both mutated structures are less flexible than the wild type. The position is also whiting a loop, this time connecting a beta-sheet with an alpha helix. As this helix is assumed to be part of an active site, a loss in flexibility could cause issues at binding at the TFR protein. So, a key-hole principle can no longer act. The third case is also a transfer from a beta-sheet into an alpha-helix. Here we have an interesting change in flexibility as first, the flexibility is increased at the structure 282b and 282a follows the wild-type, while just a few residues later the flexibility changes and the structure follows the wild-type. The other mutated structure (282a) behaves the opposite. The fourth case shows a difference at a very flexible region. The mutation 282a causes a high loss in flexibility at this point, while the flexibility is not affected by the other mutation. All cases have in common that they are not connected to the mutated side in sequence and in structure. Just in case 4, the mutated residue is interacting with the beta-sheet which is connected by the flexible region marked as 4. The mutated residue interacts with a cysteine (Figure 4.3). In the wild-type structure also a cystein is placed which can form a disulfide bond. In case of 282a, the cystein is replaced by an tyrosin, in the case of 282b, the cystein is replaced by an serin. In both cases, the mutated residues can form a hydrogen bond with the cystein to replace the disulfide bond<ref>Gregoret LM, Rader SD, Fletterick RJ, Cohen FE.: Hydrogen bonds involving sulfur atoms in proteins.</ref>. This disulfide bond is more stable as the hydrogen bonds since no water can attack to break up the secondary structure. Therefor it is surprisingly, that the 282b structure is flexible as the wild-type while the flexibility is decreased by the tyrosin in structure 282a. To get a better idea what happened here exactly, the flexibility for all other possible amino acids at this position should be calculated.
  +
  +
  +
{|align="center"
  +
|[[File:disc_compared_flexibility.png|thumb|center|400px|Figure 4.1: Comparison of the flexibility per residue in nm. Also the difference of the mutated structures compared to the wild-type is shown. a value close to zero indicates no change in flexibility while a value larger zero indicate a higher flexibility in the wild-type structure and a value smaller zero a higher flexibility in the mutated structure.]]
  +
|[[File:1a6z_all_superim_bfactors.png|thumb|center|400px|Figure 4.2: Superimposed structures of the wild-type and the two mutated structures colored according to their b-factors. Blue colored regions are stable ones, while red regions are the flexible ones. The marked regions corresponds in their numbering to the boxes numbered in Figure 4.1.]]
  +
|[[File:1a6z_all_superim_bfactors_detail_4.png|thumb|center|400px|Figure 4.3: Detailed look at the mutated side. For each structure, the residue at position 282 is visible, also the interacting cysteine.]]
  +
|}
  +
  +
  +
All in all, we see two different mutation which changes the flexibility all over the protein, not just in the mutated region. As the region is not interaction directly with a binding partner, and also the biochemical properties of the amino acid is not changed, the loss of function is most likely due to the change in flexibility. A major question which remains unanswered is, why are so many regions affected which are placed not even in the same domain. As the HFE forms a complex with Beta-2-Microglobulin (B2M) to stabilize itself and avoid scissors movement like it is observable in the normal mode analysis, the beta-sheet close to the mutated residue could be affected (Figure 4.3) and the stabilizing effect can not take place. It also would be very interesting to see which of the other possible amino acids lead to a gain or loss in flexibility to compare this effect.
  +
  +
== References ==
  +
<references />
  +
  +
[[Category : Hemochromatosis]]

Latest revision as of 17:55, 3 October 2011

by Robert Greil and Cedric Landerer

Contents

Wildtype

First of all, we checked the resultfile with gmxcheck.

  • gmxcheck -f ref_md.tpr.xtc

Result

Reading frame       0 time    0.000   
# Atoms  68601
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Item        #frames Timestep (ps)
Step          2001    5
Time          2001    5
Lambda           0
Coords        2001    5
Velocities       0
Forces           0
Box           2001    5

The Simulation took 6h33:50 and the simulation speed was 36.564 ns/day. So, to reach 1 second of simulation, we had to wait around 75061 years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about -9.185e+05 kJ/mol with a range of about 0.15e+04 kJ/mol. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command trjconv -s ref_md.tpr -f ref_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10.


Figure 1.1: Motion of the wild-type protein, Cartoon representation
Figure 1.2: Motion of the wild-type protein

To create the images, we saved each frame in PyMol<ref>The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC</ref> as an image, which we converted into gif format by for file in *.png; do convert "$file" "$(basename $file .png).gif"; done. Than we were able to use gifsicle to create an animated gif with the command gifsicle *.gif -loop.

As one can see in Figure 1.1 and Figure 1.2, we have mostly a motion in space. The motion within the protein, like it is shown by the normal mode analysis is not identifiable. There is no movement of the beta-sheet or helical region against each other.

Quality assurance

For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.

Convergence of energy terms

To create the necessary files to check the convergence of energy terms we used the command:

  • g_energy -f ref_md.tpr.edr -o xvg/xxxx.xvg.

Temperature

Figure 1.3: Temperature over time of the molecular dynamics simulation of the wild-type protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift 
-------------------------------------------------------------------------------
Temperature                 297.942     0.0055    1.11471  0.0143589  (K)

Figure 1.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.

Pressure

Figure 1.4: Pressure over time of the molecular dynamics simulation of the wild-type protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Pressure                   0.995939      0.014     83.256  0.0113633  (bar)

Figure 1.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar<ref>D. A. T. Dick and Leah M. Lowenstein Osmotic Equilibria in Human Erythrocytes Studied by Immersion Refractometry</ref>, so the pressure during the simulation is not in the physiological range.

Energy

Figure 1.5: Energy over time of the molecular dynamics simulation of the wild-type protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                   -919346         21    889.917   -54.8929  (kJ/mol)
Kinetic En.                  164147          3    614.133    7.91086  (kJ/mol)
Total Energy                -755199         19    1096.12   -46.9806  (kJ/mol)

Figure 1.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.

Volume

Figure 1.6: Volume over time of the molecular dynamics simulation of the wild-type protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Volume                      694.068      0.012   0.512013 0.00522608  (nm^3)

The simulation volume shown in Figure 1.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.

Density

Figure 1.7: Density over time of the molecular dynamics simulation of the wild-type protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Density                     999.461      0.018   0.737307 -0.00750088  (kg/m^3)

The density of the simulation stays all the time in the same range (Figure 1.7), with just a small divergence. Therefor we can assume a stable density for the simulation.

Box

Figure 1.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the wild-type protein. The X and Y values are equal over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Box-X                       9.93815    5.8e-05 0.00244379 2.49479e-05  (nm)
Box-Y                       9.93815    5.8e-05 0.00244379 2.49479e-05  (nm)
Box-Z                       7.02734    4.1e-05 0.00172803 1.76268e-05  (nm)

The size of the box containing the simulation volume is very constant (Figure 2.8).

Interaction Energy: Coulomb

Here we take a look at the interaction energies between the protein and the solvent.

Figure 1.9: Coulomb energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Coul-SR:Protein-non-Protein   -21213.5         39    373.942    197.347  (kJ/mol)
Coul-14:Protein-non-Protein          0          0          0          0  (kJ/mol)

The coulomb energy is not converged at all like it is shown in Figure 1.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Interaction Energy: Van der Waals

We also take a look at the non covalent interaction energies between the protein and the solvent

Figure 1.10: Van der Waals energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
LJ-SR:Protein-non-Protein    -2237.6         18    140.778    93.3922  (kJ/mol)
LJ-14:Protein-non-Protein          0          0          0          0  (kJ/mol)

The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 1.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Minimum distances between periodic images

To create the corresponding .xvg files, we used the command g_mindist -f ref_md.tpr.xtc -s ref_md.tpr -od xvg/min_dist_XXXX.xvg -pi.

Figure 1.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 1.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots.

Figure 1.11: Minimal periodic distance on the protein
Figure 1.12: Minimal periodic distance on the C-alpha groups

For the C-alpha groups, the shortest periodic distance is 2.6075 (nm) at time 6015 (ps), between atoms 2513 and 4306. For the whole protein, the shortest periodic distance is 1.74506 (nm) at time 6120 (ps),between atoms 2532 and 3799. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.

Root mean square fluctuations

To get the root mean square<ref>http://en.wikipedia.org/wiki/Root_mean_square_fluctuation</ref> (RMSF) fluctuations of the protein, we used the command:

  • g_rmsf -f ref_md.tpr.xtc -s ref_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 1.13 indicates that the HFE protein is very stable. Just 2 smaller regions about residue 89 and about residue 196 show a larger flexibility. This is not surprising as, according to UniProt, both regions are within a loop.

Figure 1.13: RMSF per residue of HFE in nm
Figure 1.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.

Figure 1.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. The possible functional region around the three helices is shown as very stable.

Convergence of RMSD

To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:

  • trjconv -f ref_md.tpr.xtc -o traj_nojump.xtc -pbc nojump
  • g_rms -f traj_nojump.xtc -s ref_md.tpr xvg/rmsd_XXX_vs_start.xvg used option 1,1 and option 1,4.
  • echo 1 | trjconv -f traj_nojump.xtc -s ref_md.tpr -o protein.xtc
  • g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_all_vs_average.xvg used option 1,1 and option 1,4.
Figure 1.15: RMSD over time of all atoms versus the starting structure
Figure 1.16: RMSD over time of all atoms versus the average structure
Figure 1.17: RMSD over time of backbone versus the starting structure
Figure 1.18: RMSD over time of backbone versus the average structure

In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing (Figure 1.15, 1.17), the RMSD against the average structure shows an oscillating behavior(Figure 1.16, 1.18). But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 1.15, 1.17 and Figure 1.16, 1.18).

The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.

Convergence of radius of gyration

Gyration<ref>http://en.wikipedia.org/wiki/Radius_of_gyration</ref> is a measure of the size of an object, a surface, or an ensemble of points.

  • g_gyrate -f ref_md.tpr.xtc -s ref_md.tpr -o xvg/radius_gyration.xvg
Figure 1.19: radius of gyration in nm

There is no convergence of the gyration observable (Figure 1.19). The gyration moves in a range from 2.25 to 2.5. The X and Y directed gyrations show an comparable behavior, while the gyration in Z-direction behave totally different.

Structural analysis: properties derived from configurations

Solvent accessible surface area

To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA)<ref>http://en.wikipedia.org/wiki/Accessible_surface_area</ref> by using the command:

  • g_sas -f traj_nojump.xtc -s ref_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg
Figure 1.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues.
Figure 1.21: Average area of accessible surface per atom and standard deviation in nm².
Figure 1.22: Average area of accessible surface per residue and standard deviation in nm².

Figure 1.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Interestingly, an equal amount of hydrophobic and hydrophilic residues are accessible. Figure 1.21 show the average accessible area of all atoms. Figure 1.22 show the same for each residue. In both cases, a core region which is lies protected within the protein is not observable.

Hydrogen bonds

We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:

  • echo 1 1 | g_hbond -f traj_nojump.xtc -s ref_md.tpr -num xvg/hydro_bonds_intra_prot.xvg
  • echo 1 12 | g_hbond -f traj_nojump.xtc -s ref_md.tpr -num xvg/hydro_bonds_prot_water.xvg
Figure 1.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.
Figure 1.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.

Figure 1.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 1.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent.

Ramachandran (phi/psi) plot

The Ramachandran plot<ref>http://en.wikipedia.org/wiki/Ramachandran_plot</ref> visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:

  • g_rama -f traj_nojump.xtc -s ref_md.tpr -o xvg/ramachandran.xvg
Figure 1.25: Ramachandran plot of the HFE protein.
Figure 1.26: Standard Ramachandran plot<ref>http://www.cryst.bbk.ac.uk/PPS95/course/3_geometry/rama.html</ref>.

We can observe a lot of angle combinations (Figure 1.25) which are not part of the standard ramachandran plot (Figure 1.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 1.25, many left handed alpha helices.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

  • g_rms -s ref_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10
  • xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue
Figure 1.27: Dotplot of the RMSD of HFE against HFE over time.

Figure 1.27 shows the rmsd over time, we see that the rmsd decreases over time in both dimensions. That means, that the movement after 1200 ps is less strong then before. The protein converges after 1200 ps.

Cluster analysis

Here we analyzed cluster distribution and the size of the clusters. We used the command:

  • echo 6 6 | g_cluster -s ref_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
Figure 1.28: Cluster size distribution
Figure 1.29: RMSD distribution

470 clusters were found by g_cluster. The most clusters consists of just one structure. Just 2 clusters consits of 7 structures (Figure 1.28). The RMSD distribution shown in Figure 1.29 shows that the most structures have an RMSD larger than 0.2. This indicates a flexible structure.

Distance RMSD

To get the dRMSD, we used the command:

  • g_rmsdist -s ref_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg
Figure 1.30: dRMSD of the wild-type

Figure 1.30 shows the same behavior as Figure 1.15. Also no convergence is observable.

Mutation 8a [C282Y]

First of all, we checked the resulting file with gmxcheck.

  • gmxcheck -f 282a_md.tpr.xtc

Result

Reading frame       0 time    0.000   
# Atoms  68604
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Item        #frames Timestep (ps)
Step          2001    5
Time          2001    5
Lambda           0
Coords        2001    5
Velocities       0
Forces           0
Box           2001    5

The Simulation took 11h32:38 and the simulation speed was 20.790 ns/day. So, to reach 1 second of simulation, we had to wait around 131,780.954 years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about -9.19218e+05 kJ/mol with a range of about 0.15e+04 kJ/mol. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command trjconv -s 282a_md.tpr -f 282a_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10.


Figure 2.1: Motion of the mutated protein, Cartoon representation
Figure 2.2: Motion of the mutated protein

To create the images, we saved each frame in PyMol as an image, which we converted into gif format by for file in *.png; do convert "$file" "$(basename $file .png).gif"; done. Than we were able to use gifsicle to create an animated gif with the command gifsicle *.gif -loop.

As one can see in Figure 2.1 and Figure 2.2, we have mostly a motion in space. The motion within the protein, like it is shown by the normal mode analysis is not identifiable. There is no movement of the beta-sheet or helical region against each other.

Quality assurance

For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.

Convergence of energy terms

To create the necessary files to check the convergence of energy terms we used the command:

  • g_energy -f 282a_md.tpr.edr -o xvg/xxxx.xvg.

Temperature

Figure 2.3: Temperature over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Temperature                 297.941     0.0052    1.11152 0.00591841  (K)

Figure 2.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.

Pressure

Figure 2.4: Pressure over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Pressure                    1.00547      0.014    82.9753  0.0439014  (bar)


Figure 2.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar, so the pressure during the simulation is not in the physiological range.

Energy

Figure 2.5: Energy over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                   -919218         38    888.988   -238.954  (kJ/mol)
Kinetic En.                  164142        2.9    612.357    3.26073  (kJ/mol)
Total Energy                -755076         37    1093.34   -235.696  (kJ/mol)

Figure 2.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.

Volume

Figure 2.6: Volume over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Volume                      694.169       0.02   0.527806 -0.0281068  (nm^3)

The simulation volume shown in Figure 2.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.

Density

Figure 2.7: Density over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Density                     999.333      0.029   0.759851  0.0403628  (kg/m^3)


The density of the simulation stays all the time in the same range (Figure 2.7), with just a small divergence. Therefor we can assume a stable density for the simulation.

Box

File:1a6z 282a box.png?
Figure 2.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the mutated protein. The X and Y values are equal over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Box-X                       9.93864    9.7e-05 0.00251894 -0.000134026  (nm)
Box-Y                       9.93864    9.7e-05 0.00251894 -0.000134026  (nm)
Box-Z                       7.02768    6.9e-05 0.00178116 -9.47747e-05  (nm)


The size of the box containing the simulation volume is very constant (Figure 2.8).

Interaction Energy: Coulomb

Here we also take a look at the non covalent interaction energies between the protein and the solvent

Figure 2.9: Coulomb energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Coul-SR:Protein-non-Protein   -21282.1        120    458.064   -645.984  (kJ/mol)
Coul-14:Protein-non-Protein          0          0          0          0  (kJ/mol)

The coulomb energy is not converged at all like it is shown in Figure 2.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. The fluctuation is much higher than for the wild-type. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Interaction Energy: Van der Waals

We also take a look at the non covalent interaction energies between the protein and the solvent

Figure 2.10: Van der Waals energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
LJ-SR:Protein-non-Protein   -2230.49         11    135.734   -66.2062  (kJ/mol)
LJ-14:Protein-non-Protein          0          0          0          0  (kJ/mol)


The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 2.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. The energy level is a bit lower as the one of the wild-type. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Minimum distances between periodic images

To create the corresponding .xvg files, we used the command g_mindist -f 282a_md.tpr.xtc -s 282a_md.tpr -od xvg/mindist_xxxx.xvg -pi.

Figure 2.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 2.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots. Figure 2.12 shows an oscillation for the minimal distance and a slightly increasing tendency of the maximum internal distance.

Figure 2.11: Minimal periodic distance on the protein
Figure 2.12: Minimal periodic distance on the C-alpha groups

For the C-alpha groups, the shortest periodic distance is 2.32134 (nm) at time 9040 (ps), between atoms 2472 and 4307. For the whole protein, the shortest periodic distance is 1.55939 (nm) at time 9125 (ps), between atoms 2488 and 4355. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.

Root mean square fluctuations

To get the root mean square (RMSF) fluctuations of the protein, we used the command:

  • g_rmsf -f 282a_md.tpr.xtc -s 282a_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 2.13 shows a different RMS fluctuation as Figure 1.13. The region about residue 196 which is the most flexible one in the wild-type, is still flexible but not as much as in the wild-type. Therefor the region about residue 41 is more flexible than in the wild-type. So, we see a change in flexibility mostly at the C-terminus of the protein.

Figure 2.13: RMSF per residue of HFE in nm
Figure 2.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.

Figure 2.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. In this case, the possible functional region is mostly affected. The underlying beta-sheet is now more flexible than in the wild-type. Also the alpha-helices are now more flexible. This is quite impressive as the mutation is not interacting with this region.

Convergence of RMSD

To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:

  • trjconv -f 282a_md.tpr.xtc -o traj_nojump.xtc -pbc nojump
  • g_rms -f traj_nojump.xtc -s 282a_md.tpr -o xvg/rmsd_XXX_vs_start.xvg used option 1,1 and option 1,4.
  • echo 1 | trjconv -f traj_nojump.xtc -s 282a_md.tpr -o protein.xtc
  • g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_XXX_vs_average.xvg used option 1,1 and option 1,4.
Figure 2.15: RMSD over time of all atoms versus the starting structure
Figure 2.16: RMSD over time of all atoms versus the average structure
Figure 2.17: RMSD over time of backbone versus the starting structure
Figure 2.18: RMSD over time of backbone versus the average structure

In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing with some peaks(Figure 2.15, 2.17), the RMSD against the average structure shows an oscillating behavior(Figure 2.16, 2.18), even more than the wild-type. But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 2.15, 2.17 and Figure 2.16, 2.18).

The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.

Convergence of radius of gyration

Gyration is a measure of the size of an object, a surface, or an ensemble of points.

  • g_gyrate -f 282a_md.tpr.xtc -s 282a_md.tpr -o xvg/radius_gyration.xvg
Figure 2.19: radius of gyration in nm

There is no convergence of the gyration observable (Figure 2.19). Just RgX and RgY reach a common point, but this also happened before about 3000ps. so, we can not be sure if these two really converge. But like in the case of the wild-type, the Z direction shows a different behavior. The gyration moves in a range from 2.25 to 2.5.

Structural analysis: properties derived from configurations

Solvent accessible surface area

To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA) by using the command:

  • g_sas -f traj_nojump.xtc -s 282a_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg
Figure 2.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues.
Figure 2.21: Average area of accessible surface per atom and standard deviation in nm².
Figure 2.22: Average area of accessible surface per residue and standard deviation in nm².

Figure 2.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Like in the wild-type, an equal amount of hydrophobic and hydrophilic residues are accessible. But here, the amount of hydrophobic accessible residues is slightly higher. Figure 2.21 show the average accessible area of all atoms. Figure 2.22 show the same for each residue. In both cases, a core region wich is lies protected within the protein is not observable.

Hydrogen bonds

We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:

  • echo 1 1 | g_hbond -f traj_nojump.xtc -s 282a_md.tpr -num xvg/hydro_bonds_intra_prot.xvg
  • echo 1 12 | g_hbond -f traj_nojump.xtc -s 282a_md.tpr -num xvg/hydro_bonds_prot_water.xvg
Figure 2.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.
Figure 2.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.

Figure 2.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 2.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent.

In general, there is just a slightly observable difference compared to the wild-type at 4500 ps and about 8000 ps in the amount of hydrogen bonds.

Ramachandran (phi/psi) plot

The Ramachandran plot visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:

  • g_rama -f traj_nojump.xtc -s 282a_md.tpr -o xvg/ramachandran.xvg
Figure 2.25: Ramachandran plot of the HFE protein.
Figure 2.26: Standard Ramachandran plot.

We can observe a lot of angle combinations (Figure 2.25) which are not part of the standard ramachandran plot (Figure 2.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 2.25, many left handed alpha helices. Interesting is, that the large clusters do not spread out that much as in the case of the wild-type. We would call this plot more regular.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

  • g_rms -s 282a_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10
  • xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue
Figure 2.27: Dotplot of the RMSD of HFE against HFE over time.

Figure 2.27 shows the rmsd over time, we see a stronger decrease of the rmsd over time in both dimensions as in the case of the wild-type. This probably mean, the mutated structure reaches the equilibrium faster than the wild type.

Cluster analysis

Here we analyzed cluster distribution and the size of the clusters. We used the command:

  • echo 6 6 | g_cluster -s 282a_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
Figure 2.28: Cluster size distribution
Figure 2.29: RMSD distribution

575 clusters were found by g_cluster. The most clusters consists of just one structure. Just 1 clusters consists of 6 structures (Figure 2.28). The RMSD distribution shown in Figure 2.29 shows that the most structures have an RMSD larger than 0.2. This indicates that the flexibility is not affected by the mutation.

Distance RMSD

To get the dRMSD, we used the command:

  • g_rmsdist -s 282a_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg
Figure 2.30: dRMSD of the mutated protein

Figure 2.30 shows the same behavior as Figure 2.15. Also no convergence is observable. But, the dRMSD is not so spiky. The peaks about 1500 ps, 5000 ps and 7000 ps are not that high. So, Figure 2.30 is a smoother version of the data shown in Figure 2.15.

Mutation 8b [C282S]

First of all, we checked the resulting file with gmxcheck.

  • gmxcheck -f 282b_md.tpr.xtc

Result

Reading frame       0 time    0.000   
# Atoms  68600
Precision 0.001 (nm)
Last frame       2000 time 10000.000   

Item        #frames Timestep (ps)
Step          2001    5
Time          2001    5
Lambda           0
Coords        2001    5
Velocities       0
Forces           0
Box           2001    5

The Simulation took 11h37:37 and the simulation speed was 20.641 ns/day. So, to reach 1 second of simulation, we had to wait around 132996.409 years. But this is a bit to long for this Project, so we just used the results we got. The potential energy was fluctuating about -9.19215e+05 kJ/mol with a range of about 0.15e+04 kJ/mol. These information are given in the different log-files provided by the simulation. To create the pdb file for the visualization, we used the command trjconv -s 282b_md.tpr -f 282b_md.tpr.xtc -o protein.pdb -pbc nojump -dt 10.


Figure 3.1: Motion of the mutated protein, Cartoon representation
Figure 3.2: Motion of the mutated protein

To create the images, we saved each frame in PyMol as an image, which we converted into gif format by for file in *.png; do convert "$file" "$(basename $file .png).gif"; done. Than we were able to use gifsicle to create an animated gif with the command gifsicle *.gif -loop.

As one can see in Figure 3.1 and Figure 3.2, we have mostly a motion in space. The motion within the protein, like it is shown by the normal mode analysis is not identifiable. There is no movement of the beta-sheet or helical region against each other.

Quality assurance

For the quality assurance, we checked the convergence of different relevant parameters like the temperature, pressure or density. If they reach a thermal equilibrium, we can be sure our simulation has not to be extended any further.

Convergence of energy terms

To create the necessary files to check the convergence of energy terms we used the command:

  • g_energy -f 282b_md.tpr.edr -o xvg/xxxx.xvg.

Temperature

Figure 3.3: Temperature over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Temperature                 297.942     0.0062    1.11548 0.00209918  (K)

Figure 3.3 shows the temperature over the simulation time. The energy fluctuates between 294K (20.85°C) and 302K (28.85°C) around a mean of about 298K (24.85°C). So, the protein is simulated in a temperature range at which a hypothermia would be the consequence. As the temperature stays the whole simulation time in the same range, we can rate this as a stable region, but the fluctuation is about 8K. So the region is quite large and as there is no trend observable, so we can not see any convergence.

Pressure

Figure 3.4: Pressure over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Pressure                    1.00504     0.0098    82.9603 -0.0293434  (bar)

Figure 3.4 shows the pressure during the simulation. The pressure is not converged as it ranges from -300 bar to 300 bar, which corresponds to about 600 atmospheres. The normal pressure in a human cell is about 0.37 bar, so the pressure during the simulation is not in the physiological range.

Energy

Figure 3.5: Energy over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                   -919215         33    894.443   -178.792  (kJ/mol)
Kinetic En.                  164143        3.4    614.547    1.15673  (kJ/mol)
Total Energy                -755072         34    1102.52   -177.636  (kJ/mol)

Figure 3.5 shows the energy terms of the simulation. Included are the potential, the kinetic and the total energy. The total energy is calculated by subtracting the potential energy from the kinetic energy. All energy terms reach the end of the simulation with about the same value as at the beginning of the simulation. So, the fluctuation is just in a small range.

Volume

Figure 3.6: Volume over time of the molecular dynamics simulation of the mutated protein

Energy Average Err.Est. RMSD Tot-Drift


Volume 694.06 0.026 0.544105 -0.162848 (nm^3)

The simulation volume shown in Figure 3.6 moves around 694 nm^3. A convergence is not observable as the fluctuation is about 4 nm^3 during the whole simulation. But as the range of fluctuation is very small, we would classify this as a stable volume.

Density

Figure 3.7: Density over time of the molecular dynamics simulation of the mutated protein
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Density                     999.395      0.037   0.783482   0.234471  (kg/m^3)

The density of the simulation stays all the time in the same range (Figure 3.7), with just a small divergence. Therefor we can assume a stable density for the simulation.

Box

File:1a6z 282b box.png?
Figure 3.8: Box size in X,Y and Z direction over time of the molecular dynamics simulation of the mutated protein. The X and Y values are equal over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Box-X                       9.93812    0.00012 0.00259699 -0.000777245  (nm)
Box-Y                       9.93812    0.00012 0.00259699 -0.000777245  (nm)
Box-Z                       7.02731    8.8e-05 0.00183635 -0.000549606  (nm)

The size of the box containing the simulation volume is very constant (Figure 3.8).

Interaction Energy: Coulomb

Here we also take a look at the non covalent interaction energies between the protein and the solvent

Figure 3.9: Coulomb energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Coul-SR:Protein-non-Protein   -20938.5         30    368.752    -132.83  (kJ/mol)
Coul-14:Protein-non-Protein          0          0          0          0  (kJ/mol)

The coulomb energy is not converged at all like it is shown in Figure 3.9. So the high fluctuation means that we have a high fluctuation interaction with the surrounding solvent. The fluctuation is much higher than for the wild-type. That could mean, that the physiological environment is not matched at all. This could also be the reason for the strong movement in space. So the electrostatic interferences pushes the protein in one direction. As Coul-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Interaction Energy: Van der Waals

We also take a look at the non covalent interaction energies between the protein and the solvent

Figure 3.10: Van der Waals energy of the simulation over time.
Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
LJ-SR:Protein-non-Protein   -2086.53        9.1    137.417    9.01909  (kJ/mol)
LJ-14:Protein-non-Protein          0          0          0          0  (kJ/mol)

The Van der Waals energy describes the close range interaction of non covalent bound atoms. Figure 3.10 shows the interaction energy of the protein with the solvent. Here we see a smaller fluctuation compared to the coulomb energy, but also here, a convergence is not observable. The energy level is a bit lower as the one of the wild-type. As LJ-14:Protein-non-Protein is just 0, we did not include this term to the figure.

Minimum distances between periodic images

To create the corresponding .xvg files, we used the command g_mindist -f 282b_md.tpr.xtc -s 282b_md.tpr -od xvg/mindist_xxxx.xvg -pi.

Figure 3.11 shows the minimal periodic distance on the protein calculated by the simulation. Figure 3.12 shows the minimal periodic distance on the C-alpha groups. In both cases, the boxes 1,2 and 3 have the same values over time, so just box 3 is visible in the plots. Figure 3.12 shows an oscillation for the minimal distance and a slightly increasing tendency of the maximum internal distance.

Figure 3.11: Minimal periodic distance on the protein
Figure 3.12: Minimal periodic distance on the C-alpha groups

For the C-alpha groups, the shortest periodic distance is 2.94439 (nm) at time 3770 (ps), between atoms 2513 and 3324. For the whole protein, the shortest periodic distance is 2.17028 (nm) at time 2345 (ps), between atoms 2532 and 3330. As the shortest periodic distance is within the whole protein smaller than just in the C-alpha groups, the smallest distance must be placed in a side chain. Also the fluctuation interval becomes smaller if we look at the whole protein. If the distance becomes shorter than the cut-off distance for the electrostatic interactions, the energy level would be increase dramatically.

Root mean square fluctuations

To get the root mean square (RMSF) fluctuations of the protein, we used the command:

  • g_rmsf -f 282b_md.tpr.xtc -s 282b_md.tpr -o xvg/rmsf_per_res.xvg -ox average.pdb -oq bfactors.pdb -res

Figure 3.13 shows a different RMS fluctuation as Figure 1.13. The region about residue 196 which is the most flexible one in the wild-type, shows here the same flexibility like in the wild-type. The region about residue 86 is no less flexible than in the wild-type.

Figure 3.13: RMSF per residue of HFE in nm
Figure 3.14: HFE protein colored according to the b-factors. The average and the b-factor structure are superimposed.

Figure 3.14 shows the average structure superimposed with the b-factor structure. Both structures are colored according to there b-factors. Blue regions are less flexible than red region. We can observe a higher difference in structure in regions with higher b-factors. This mutation shows less impact than the other one on the same position. The flexibility is all in all pretty much the same like in the wild-type.

Convergence of RMSD

To create all necessary files for the root mean square deviation (RMSD) calculation , we used the commands:

  • trjconv -f 282b_md.tpr.xtc -o traj_nojump.xtc -pbc nojump
  • g_rms -f traj_nojump.xtc -s 282b_md.tpr -o xvg/rmsd_XXX_vs_start.xvg used option 1,1 and option 1,4.
  • echo 1 | trjconv -f traj_nojump.xtc -s 282b_md.tpr -o protein.xtc
  • g_rms -f protein.xtc -s average.pdb -o xvg/rmsd_XXX_vs_average.xvg used option 1,1 and option 1,4.
Figure 3.15: RMSD over time of all atoms versus the starting structure
Figure 3.16: RMSD over time of all atoms versus the average structure
Figure 3.17: RMSD over time of backbone versus the starting structure
Figure 3.18: RMSD over time of backbone versus the average structure

In all cases, no plateau is reached. The difference between the comparison of the starting structure and the average structure with the all atom and backbone model is clearly visible. Whereas the RMSD against the starting structure is continuously increasing with some peaks(Figure 3.15, 3.17), the RMSD against the average structure increases at the beginning, but drops about 800 ps to its initial conditions (Figure 3.16, 3.18). But in both cases (starting and average structure) the comparison with the all atom and backbone show the same behavior (Figure 3.15, 3.17 and Figure 3.16, 3.18).

The starting structure should be the better measure for convergence because, if the RMSD against the starting structure remains nearly constant, an equilibrium is reached, while an constant RMSD against the average structure could also mean an equal change in both directions over time, like becoming a structure closer to the average and a becoming a structure with a higher RMSD.

Convergence of radius of gyration

Gyration is a measure of the size of an object, a surface, or an ensemble of points.

  • g_gyrate -f 282b_md.tpr.xtc -s 282b_md.tpr -o xvg/radius_gyration.xvg
Figure 3.19: radius of gyration in nm

There is no convergence of the gyration observable (Figure 2.19). Just RgZ and RgY reach a common point. The gyration moves in a range from 2.25 to 2.5. The behavior of RgZ is not comparable with the other mutation and the wild-type.

Structural analysis: properties derived from configurations

Solvent accessible surface area

To get an idea which residues are accessible by the solvent, we calculated the Solvent accessible surface area (SASA) by using the command:

  • g_sas -f traj_nojump.xtc -s 282b_md.tpr -o xvg/solvent_acc_surf.xvg -oa xvg/atomic_sas.xvg -or xvg/residue_sas.xvg
Figure 3.20: Area of solvent accessible surface over time according to hydrophilic and hydrophobic residues.
Figure 3.21: Average area of accessible surface per atom and standard deviation in nm².
Figure 3.22: Average area of accessible surface per residue and standard deviation in nm².

Figure 3.20 shows the total solvent accessible surface area over time. The area is not changing over time, so we have no strong movement within the protein. There is no element adjustment observable. Like in the wild-type, an equal amount of hydrophobic and hydrophilic residues are accessible. So, it is not possible to differentiate between the wild-type and this mutation. Figure 3.21 show the average accessible area of all atoms. Figure 3.22 show the same for each residue. In both cases, a core region wich is lies protected within the protein is not observable.

Hydrogen bonds

We also looked at the hydrogen bonds within the protein and the one which connect the protein with the surrounding solvent. Therefor we used the commands:

  • echo 1 1 | g_hbond -f traj_nojump.xtc -s 282b_md.tpr -num xvg/hydro_bonds_intra_prot.xvg
  • echo 1 12 | g_hbond -f traj_nojump.xtc -s 282b_md.tpr -num xvg/hydro_bonds_prot_water.xvg
Figure 3.23: Hydrogen bonds within the protein in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.
Figure 3.24: Hydrogen bonds between the protein and the solvent in black and the number of pairs within 0.35 nm which are able to form a hydrogen bond in red.

Figure 3.23 show that the number of hydrogen binds is very stable, also the number of pairs which can form a hydrogen bond is fluctuating less that is most likely because there is no strong movement of the different secondary structure elements against each other. So, there are just less hydrogen bonds which brake up or form new. Figure 3.24 shows the number of hydrogen bonds between the protein and the surrounding solvent. The number of hydrogen bonds here is about 3 times larger than within the protein. This is not surprising as the most residues are accessible by the solvent. In this case, the fluctuation is much stronger than within the protein. Most likely because of the movement of the protein in space within the solvent. The results is nearly the same as for the wild-type.

Ramachandran (phi/psi) plot

The Ramachandran plot visualizes the phi and psi angles of the amino acid back bone. To get a plot of the hfe protein, we used the command:

  • g_rama -f traj_nojump.xtc -s 282b_md.tpr -o xvg/ramachandran.xvg
Figure 3.25: Ramachandran plot of the HFE protein.
Figure 3.26: Standard Ramachandran plot.

We can observe a lot of angle combinations (Figure 3.25) which are not part of the standard ramachandran plot (Figure 3.26). Specially in the positive phi range. Most residues are part of beta-sheets and alpha helices. Also, the HFE protein contains, according to Figure 3.25, many left handed alpha helices. Interesting is, that, also, the large clusters do not spread out that much as in the case of the wild-type. We would call this plot more regular. But compared to the other mutation, the negativ phi area is more dense.

Analysis of dynamics and time-averaged properties

Root mean square deviations again

  • g_rms -s 282b_md.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10
  • xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue
Figure 3.27: Dotplot of the RMSD of HFE against HFE over time.

Figure 3.27 shows the rmsd over time, the RMSD decreases about 1500 ps. This is the behavior we can observe in Figure 3.16 and 3.18. Also this matrix is comparable with the wild-type.

Cluster analysis

Here we analyzed cluster distribution and the size of the clusters. We used the command:

  • echo 6 6 | g_cluster -s 282b_md.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist xvg/rmsd_disstribution.xvg -o clusters.xpm -sz xvg/cluster_sizes.xvg -tr cluster_transitions.xpm -ntr xvg/cluster_transitions.xvg -clid xvg/cluster_id_over_time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10
Figure 3.28: Cluster size distribution
Figure 3.29: RMSD distribution

400 clusters were found by g_cluster. The most clusters consists of just one structure. Just 1 clusters consists of 9, 8 and 7 structures (Figure 3.28). The RMSD distribution shown in Figure 3.29 shows that the most structures have an RMSD larger than 0.2. This indicates that the flexibility is not affected by the mutation. This obersavtion holds for all three simulations.

Distance RMSD

To get the dRMSD, we used the command:

  • g_rmsdist -s 282b_md.tpr -f traj_nojump.xtc -o xvg/distance_rmsd.xvg
Figure 3.30: dRMSD of the mutated protein

Figure 3.30 shows the same behavior as Figure 3.15. Also no convergence is observable. But a bit smoother at the end of the simulation.

Discussion

In all cases, the simulation parameter, like temperature, pressure and so on behave the same. So, we assume that all simulation reached the same state and are comparable.

The Solvent accessible surface remains the same in all simulations, just in the case of 282a, we see a slightly change towards the hydrophobic residues on the surface. For the wild-type and the 282b structure, an equal amount of hydrophilic and hydrophobic residues are accessible. The high hydrophilic fraction of the surface is probably due to the fact that the HFE protein is located in the plasma membrane, whereat the membrane region is not part of the simulated structure.

The Ramachandran plots of the simulations show, that the wild-type is more diverse than the two mutated structures. A larger difference is specially in the positive phi range observable (Figures 1.25 2.25 3.25). Here, many angle combinations present in the wild-type are missing in the mutated structures.

The two dimensional Root mean square deviation over time shows in all three cases the same picture as the one dimensional counterpart. For the wild-type and the 282b structure, we see a slow divergence than in the case of 282a. WT and 282b show a larger divergence in the early states of the simulation while 282a is changing just slightly. This indicates a less flexibility in this structure and a larger one in the other two. This is exactly what one can observe in the flexibility analysis.

The flexibility is the most interesting part. While the structure 282a is less flexible, the mutated structure 282b is more flexible than the wild-type. The structure 282a has a flexibility index (sum over the rmsd) of 49.57 and the structure has a flexibility index of 53.457, while the wild-type has one of 51.4624 which is very close to the mean of 51.4966. So, the wild-type can be called a mean structure in this case. We can observe four interesting points in the flexibility distribution shown in Figure 4.1. These regions are also marked in the structures shown in Figure 4.2. The first one is within a loop connecting two beta-strands. Here the structure 282a is more flexible than the other structures, so this could causes issues by forming a beta-sheet by the two beta-strands. Therefore the stability of the whole protein could be affected. In the second case, both mutated structures are less flexible than the wild type. The position is also whiting a loop, this time connecting a beta-sheet with an alpha helix. As this helix is assumed to be part of an active site, a loss in flexibility could cause issues at binding at the TFR protein. So, a key-hole principle can no longer act. The third case is also a transfer from a beta-sheet into an alpha-helix. Here we have an interesting change in flexibility as first, the flexibility is increased at the structure 282b and 282a follows the wild-type, while just a few residues later the flexibility changes and the structure follows the wild-type. The other mutated structure (282a) behaves the opposite. The fourth case shows a difference at a very flexible region. The mutation 282a causes a high loss in flexibility at this point, while the flexibility is not affected by the other mutation. All cases have in common that they are not connected to the mutated side in sequence and in structure. Just in case 4, the mutated residue is interacting with the beta-sheet which is connected by the flexible region marked as 4. The mutated residue interacts with a cysteine (Figure 4.3). In the wild-type structure also a cystein is placed which can form a disulfide bond. In case of 282a, the cystein is replaced by an tyrosin, in the case of 282b, the cystein is replaced by an serin. In both cases, the mutated residues can form a hydrogen bond with the cystein to replace the disulfide bond<ref>Gregoret LM, Rader SD, Fletterick RJ, Cohen FE.: Hydrogen bonds involving sulfur atoms in proteins.</ref>. This disulfide bond is more stable as the hydrogen bonds since no water can attack to break up the secondary structure. Therefor it is surprisingly, that the 282b structure is flexible as the wild-type while the flexibility is decreased by the tyrosin in structure 282a. To get a better idea what happened here exactly, the flexibility for all other possible amino acids at this position should be calculated.


Figure 4.1: Comparison of the flexibility per residue in nm. Also the difference of the mutated structures compared to the wild-type is shown. a value close to zero indicates no change in flexibility while a value larger zero indicate a higher flexibility in the wild-type structure and a value smaller zero a higher flexibility in the mutated structure.
Figure 4.2: Superimposed structures of the wild-type and the two mutated structures colored according to their b-factors. Blue colored regions are stable ones, while red regions are the flexible ones. The marked regions corresponds in their numbering to the boxes numbered in Figure 4.1.
Figure 4.3: Detailed look at the mutated side. For each structure, the residue at position 282 is visible, also the interacting cysteine.


All in all, we see two different mutation which changes the flexibility all over the protein, not just in the mutated region. As the region is not interaction directly with a binding partner, and also the biochemical properties of the amino acid is not changed, the loss of function is most likely due to the change in flexibility. A major question which remains unanswered is, why are so many regions affected which are placed not even in the same domain. As the HFE forms a complex with Beta-2-Microglobulin (B2M) to stabilize itself and avoid scissors movement like it is observable in the normal mode analysis, the beta-sheet close to the mutated residue could be affected (Figure 4.3) and the stabilizing effect can not take place. It also would be very interesting to see which of the other possible amino acids lead to a gain or loss in flexibility to compare this effect.

References

<references />