Structure-Based Mutation Analysis Hemochromatosis

From Bioinformatikpedia
Revision as of 10:23, 25 June 2012 by Joerdensv (talk | contribs) (Gromacs)



Riddle of the task

It took you over an hour to figure out the right combination, but the door is finally open. The sight is unbelievable. Inside the next room lies a treasure beyond imagination: heaps of gems, gold, and jewelry. Exotic furs, marvelous paintings, and many more. You step inside to collect what should now be yours...

The moment you reach out for the first piece of treasure it vanishes into thin air. ALL of it. The treasure was just an illusion... You look around and see another entrance into the room. A collapsed one. Across the room is a person, kneeling before another door. You shout... No answer. He didn't even move. As you get closer to him you see that, whoever it was, is dead. His skin mummified due to the dry air. Next to him an old leathery backpack. You reach out to take it as you notice small fragments on the floor. They look like tiny bits of red glass. Now that you're in front of him you also see many of these splinters buried inside the person's flesh. Within the backpack you find several glass orbs: a blue one, a yellow one, a green one, an orange one, a cyan one, and a violet one. In front of the dead man, at the bottom of the door, you notice three slots. Each of them about the size of the orbs. One of them is red, the second one orange, and the third one yellow...


Short task description

Detailed description: Structure-based mutation analysis


Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.


Structure selection and mapping of the mutations

<figure id="mut_map">

Figure 1: M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S mapped onto 1a6zC. Mutations are shown in stick representation and colored red. Glycosylation sites are colored cyan. Disulfide bonds are colored orange and also shown as sticks.

</figure>

There are only two structures available for HFE in the PDB: 1a6z and 1de4. We chose 1a6z for this task because it has the better resolution (2.6 Å instead of 2.8 Å) and contains only beta-2-microglobulin in addition to HFE. In 1de4, HFE is complexed with the transferrin receptor (TFR). All of the mutations from the previous task (M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S) lie within the part of the protein covered by the PDB structure (residues 26-297).

<xr id="mut_map"/> shows a three-dimensional mapping of the mutations (red) onto 1a6zC. Glycosylation sites (cyan) and disulfide bonds (orange) are also indicated. The only such site that is directly affected by a mutation is the disulfide bond between C225 and C282, where C282 is mutated to serine. However, Q127H, L183P, and R224W are quite close to the glycosylation site at N130 and the two disulfide bonds (C124-C187, C225-C282) and might therefore affect them indirectly.
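The coverage check described above can be written down as a small sketch. The mutation labels, the residue range 26-297, the disulfide cysteines, and the glycosylation site N130 are taken from the text. The function names are our own, and only the sites named in this section are included.

```python
import re

# Mutations and the residue range covered by 1a6z chain C, as stated in the text.
MUTATIONS = ["M35T", "V53M", "G93R", "Q127H", "A162S",
             "L183P", "T217I", "R224W", "E277K", "C282S"]
STRUCTURE_RANGE = (26, 297)

# Only the sites named in this section (assumption: the list is not exhaustive).
DISULFIDE_CYSTEINES = {124, 187, 225, 282}
GLYCOSYLATION_SITES = {130}


def parse_mutation(label):
    """Split a label such as 'C282S' into (wildtype, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point mutation label: {label!r}")
    wildtype, position, mutant = match.groups()
    return wildtype, int(position), mutant


def is_covered(label, residue_range=STRUCTURE_RANGE):
    """True if the mutated position lies inside the resolved residue range."""
    first, last = residue_range
    _, position, _ = parse_mutation(label)
    return first <= position <= last


def directly_affected(labels):
    """Mutations that replace a disulfide cysteine or a glycosylation site."""
    special = DISULFIDE_CYSTEINES | GLYCOSYLATION_SITES
    return [m for m in labels if parse_mutation(m)[1] in special]


covered = [m for m in MUTATIONS if is_covered(m)]
print(f"{len(covered)} of {len(MUTATIONS)} mutations are covered by the structure")
print("directly affected special sites:", directly_affected(MUTATIONS))
```

Spatial proximity (as for Q127H, L183P, and R224W) cannot be read off the residue numbers alone; that part still needs the 3D structure.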



SCWRL and FoldX

In order to analyze the effects of the mutations we created several models with SCWRL and FoldX. These models were then superimposed onto the reference structure (1a6zC). Our analysis covered changes in the hydrogen bonds, differences in the potential energy, and surface changes (unless the residue is buried within the protein). The color codes in the following section are:

  • green: reference (1a6zC)
  • cyan: SCWRL wildtype
  • magenta: SCWRL mutant
  • orange: FoldX wildtype
  • red: FoldX mutant


An overview table containing all energy values for the models and their wildtypes can be found here.


M35T

<figtable id="M35T_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 1: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for M35T. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 17.720
  • FoldX energy (norm.): 7.744


The wildtype residue of M35T is part of a beta sheet complex in the MHC I domain and forms two hydrogen bonds to a neighboring beta sheet (cf. <xr id="M35T_pymol"/>). Both of these hydrogen bonds are preserved in the mutant models (SCWRL and FoldX). The FoldX model uses a slightly different rotamer, though, which enables it to form an additional hydrogen bond to the previous residue. This might increase stability compared to the wildtype. The changes to the surface due to the mutation are only minor and should not cause any problems. The surface model also shows that FoldX uses a slightly different rotamer for the wildtype model. The energy values likewise indicate only minor changes in the whole model. Therefore this mutation should be considered non-disease-causing.


V53M

<figtable id="V53M_pymol">

SCWRL.
FoldX.
Table 2: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for V53M. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 70.349
  • FoldX energy (norm.): 3.705


V53M marks the transition of a beta sheet into a turn within the MHC I domain and forms two hydrogen bonds to the beginning of the next beta sheet (cf. <xr id="V53M_pymol"/>). Both mutant models retain these bonds and form no additional ones. As this residue is buried within the protein, there are no changes to the surface, but the residue loses its strong hydrophobic character which would force it into the protein during translation/folding. Even though the rotamers used by SCWRL and FoldX differ only slightly, the difference between their energy values is large (70.349 vs. 3.705). While FoldX does not indicate that V53M is disease causing, SCWRL's energy change does. The mutation is hard to classify. However, considering that it has the second-highest energy change of all SCWRL models and that it loses its hydrophobicity, it is more likely to be disease causing than not.


G93R

<figtable id="G93R_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 3: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for G93R. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 16.141
  • FoldX energy (norm.): -3.772


G93R lies within a big alpha helix in the MHC I domain (cf. <xr id="G93R_pymol"/>). The three hydrogen bonds that are important for stabilizing the helix are conserved in both mutant models. In the FoldX model an additional hydrogen bond within the helix structure is formed. While these changes seem harmless at first, it should be noted that this region is supposed to be the interface for the TFR-HFE complex. This makes the changes to the surface more severe than they would seem on their own. The much bigger arginine causes a massive bulge on the surface which is very likely to interfere with complex formation. Therefore this mutation should be considered disease causing, even if the energy models do not suggest this.


Q127H

<figtable id="Q127H_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 4: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for Q127H. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 17.911
  • FoldX energy (norm.): -3.676


Q127H is at the start of a coil/turn between two beta sheets in the MHC I domain (cf. <xr id="Q127H_pymol"/>). Both mutant models retain the two hydrogen bonds that stabilize this coil/turn, and the FoldX model even forms an additional one, but both lose the hydrogen bond that connects Q127 and E125. Thus the indirect anchor to the previous beta sheet is lost. This might not be that severe on its own, but one of the connected amino acids is the glycosylation site N130 (connected by the lower hydrogen bond in the figures). With this in mind, this mutation should be considered disease causing. As with the previous mutation, this is contrary to the energy models.


A162S

<figtable id="A162S_pymol">

SCWRL.
FoldX.
Table 5: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for A162S. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 38.099
  • FoldX energy (norm.): 6.643


A162S is part of a helix in the MHC I domain (cf. <xr id="A162S_pymol"/>). All wildtype hydrogen bonds are preserved in the mutant models and several new ones are formed (3 in SCWRL and 4 in FoldX). This should further stabilize the structure. Additionally, the residue is buried within the protein and thus causes no changes on the surface. The wildtype and mutant amino acids also do not differ much in size. The only indicator of a malign mutation would be the energy change in the SCWRL model, but this has proven to be quite unreliable for the previous mutations. Therefore this mutation should be considered non-disease-causing.


L183P

<figtable id="L183P_pymol">

SCWRL.
FoldX.
Table 6: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for L183P. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 0.284
  • FoldX energy (norm.): 33.996


L183P is, again, located in one of the MHC I domain's helices. Proline's effect as a helix breaker is demonstrated in <xr id="L183P_pymol"/>. Both stabilizing hydrogen bonds are lost and no new ones are formed. As mentioned before, this region is the interface for the TFR-HFE complex, and therefore a break in one of the three big helices should be considered disease causing, even though this particular residue is not on the surface of the protein. This is also the first FoldX model to show a big energy change. Maybe FoldX's energy model is a better indicator than SCWRL's.


T217I

<figtable id="T217I_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 7: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for T217I. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 48.103
  • FoldX energy (norm.): 9.312
  • Warning: Highly hydrophobic amino acid on the surface!


T217I is the first mutation that lies within the C1 domain (cf. <xr id="T217I_pymol"/>). It is part of a coil/turn between two beta sheets and seems to play an important role in the stabilization of this region, as it forms a total of 5 hydrogen bonds. All but one of these bonds are lost in both mutant models. The hydrogen bond that is conserved is probably the most important one, though, as it reaches across the coil/turn to the beginning of the next beta sheet. While the changes to the surface are only minor, the fact that the mutant residue is highly hydrophobic indicates a malign mutation. The changes in the energy models also, more or less, suggest that this mutation is disease causing.


R224W

<figtable id="R224W_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 8: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for R224W. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 53.506
  • FoldX energy (norm.): 2.104


R224W lies within one of the C1 domain's beta sheets and forms two stabilizing hydrogen bonds to the neighboring beta sheet (cf. <xr id="R224W_pymol"/>). All hydrogen bonds are unchanged in the mutant models and no new ones are formed. While the mutant residue has quite a different structure than the wildtype, the rotamer chosen by FoldX seems to resemble the original one better. The mutant produces moderate changes on the protein surface, which could be severe considering that this side of the C1 domain is aligned with beta-2-microglobulin (when in complex). SCWRL's energy model also indicates a malign mutation. Overall, R224W should be considered disease causing.


E277K

<figtable id="E277K_pymol">

SCWRL.
FoldX.
SCWRL.
FoldX.
Table 9: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for E277K. The mutated residue is colored: reference (green), SCWRL wt (cyan), SCWRL mt (magenta), FoldX wt (orange), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks. The upper figures show the hydrogen bonds and rotamers, the lower ones the protein surface.

</figtable>

  • SCWRL energy (norm.): 122.406
  • FoldX energy (norm.): 25.854


E277K is part of a very small helix (4 residues according to DSSP) within the C1 domain (cf. <xr id="E277K_pymol"/>). It seems to have quite a complex role in stabilizing the entire domain, as it forms hydrogen bonds with three different structural elements: one with the following beta sheet (Y280), two with G275, which is within a short coil, and one with T221, which is at the start of another beta sheet within the C1 domain. Both mutant models lose the hydrogen bonds with G275, and the SCWRL model additionally loses the one with T221, which might have severe effects on the tertiary structure of the C1 domain. These destabilizations, the moderate changes on the protein surface (a cavity in the SCWRL model, a bulge in the FoldX one), and the high energy changes for both models strongly indicate a disease-causing mutation.


C282S

<figtable id="C282S_pymol">

SCWRL.
FoldX.
Table 10: Comparison between the reference structure (1a6zC) and the SCWRL and FoldX models for C282S. The mutated residue is colored: reference (green), SCWRL mt (magenta), and FoldX mt (red). Additional residues to which hydrogen bonds are formed are shown as sticks.

</figtable>

  • SCWRL energy (norm.): 34.746
  • FoldX energy (norm.): 46.690


C282S is located within a beta sheet of the C1 domain (cf. <xr id="C282S_pymol"/>) and forms two hydrogen bonds with the neighboring sheet. These bonds are retained in both mutant models, which even form a third one with the same residue. The difference in residue size is minor and the residue is located within the protein (no surface changes). The major problem with this mutation, however, is the loss of the only disulfide bridge (C225-C282) within the C1 domain, which is also reflected in the big changes in both energy models. This loss alone is enough to consider this mutation disease causing.
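To make the comparisons between the two energy models easier to follow, the normalized energies listed in the subsections above can be collected and ranked. The following is only a sketch: the values are copied from the bullet lists of this section, while the dictionary layout and function name are our own choices.

```python
# Normalized energies from the per-mutation lists above: (SCWRL, FoldX).
ENERGIES = {
    "M35T": (17.720, 7.744),
    "V53M": (70.349, 3.705),
    "G93R": (16.141, -3.772),
    "Q127H": (17.911, -3.676),
    "A162S": (38.099, 6.643),
    "L183P": (0.284, 33.996),
    "T217I": (48.103, 9.312),
    "R224W": (53.506, 2.104),
    "E277K": (122.406, 25.854),
    "C282S": (34.746, 46.690),
}
SCWRL, FOLDX = 0, 1


def rank_by(tool):
    """Return the mutations sorted by the energy of one tool, highest first."""
    return sorted(ENERGIES, key=lambda m: ENERGIES[m][tool], reverse=True)


scwrl_rank = rank_by(SCWRL)
foldx_rank = rank_by(FOLDX)

print(f"{'mutation':<10}{'SCWRL':>10}{'FoldX':>10}")
for mutation in scwrl_rank:
    scwrl, foldx = ENERGIES[mutation]
    print(f"{mutation:<10}{scwrl:>10.3f}{foldx:>10.3f}")
```

The ranking shows how differently the two tools behave: L183P has the lowest SCWRL value but the second-highest FoldX value, while V53M is second in the SCWRL ranking and close to the bottom in the FoldX one.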


Minimise

Next we used Minimise to minimize (lame pun...) the energy of each of the 31 models created with SCWRL (10 mutants + 1 wildtype) and FoldX (10 mutants and their 10 wildtypes). Each model was minimized five times in a row (i.e. the output of the previous iteration was used as input for the next one). A table with the absolute energy values can be found here.

The median energy change per iteration relative to the first iteration is shown in <xr id="energy_gain"/>. It clearly demonstrates that too many iterations not only fail to improve the model, but even make it worse. For the FoldX models only the second iteration improves the models; every iteration thereafter makes them worse than they were after the first one. The SCWRL models stop improving after the third iteration. After the fifth iteration they are about as good as after the first iteration.
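The statistic behind these plots can be sketched as follows. The energy values below are HYPOTHETICAL placeholders, not our measured energies (those are in the linked table), and the function name is our own; the sketch only shows how the difference to the first iteration and its median over all models are computed.

```python
from statistics import median

# HYPOTHETICAL energies for three models over five minimization iterations.
# These numbers are made up for illustration only.
energies = {
    "model_a": [-100.0, -104.0, -103.0, -101.0, -99.0],
    "model_b": [-200.0, -203.0, -201.0, -198.0, -197.0],
    "model_c": [-150.0, -151.0, -149.0, -148.0, -146.0],
}


def median_change_vs_first(energies_by_model):
    """Median over all models of E(iteration i) - E(iteration 1), for i = 2..n."""
    n_iterations = len(next(iter(energies_by_model.values())))
    changes = []
    for i in range(1, n_iterations):
        diffs = [series[i] - series[0] for series in energies_by_model.values()]
        changes.append(median(diffs))
    return changes


changes = median_change_vs_first(energies)
for iteration, change in enumerate(changes, start=2):
    # Negative values mean the energy is lower (better) than after iteration 1.
    print(f"iteration {iteration}: median change {change:+.1f}")
```

With these placeholder numbers the median change is negative for iterations 2 and 3 and positive afterwards, which is the kind of pattern described above.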


<figtable id="energy_gain">

All models.
FoldX models.
SCWRL models.
Table 11: Median energy change per iteration of minimization. Each box is based on the energy difference between the current and the first iteration. Statistics are shown for all 31 models (left), all 20 FoldX models (center), and all 11 SCWRL models (right).

</figtable>


MUST... NOT... CREATE... MORE... FIGURES...
WILL... GO... CRAZY...


Gromacs

For Gromacs we used the models created with SCWRL and FoldX (this replaces steps 1 to 3 in the task description).


title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no


We used this .mdp file for evaluating all energies. For more information regarding the arguments read this.




The output of g_energy can be seen in the following pictures. These are shown only for selected models. The complete set of pictures can be found here. The final calculated energies can also be found on that page in the section "Tables".


To obtain the runtimes of mdrun, we called mdrun repeatedly with different numbers of steps (nsteps) in the .mdp file. At first we looked at 100+X*100 steps, resulting in <xr id="runtimes"/>, left picture. Here you can see that the runtime (recorded as real time) is capped at around 32 seconds. As these gaps were too big to see whether the growth is linear, we repeated the test with a finer resolution of 100+X*1 steps. The result can be seen in <xr id="runtimes"/>, right picture.

Based on this result, we conclude that the runtime grows linearly with the number of steps up to the point where the minimization can no longer improve the structure and the program terminates early.
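The step scan can be sketched roughly as below. This is not the script we actually used: the helper names, the shortened BASE_MDP, and the number of scan points are assumptions for illustration, and the Gromacs command line is deliberately left out because it depends on the installation. `time_command` measures wall-clock time for whatever command list is passed in.

```python
import subprocess
import time

# Shortened excerpt of the .mdp file shown above (only the lines needed here).
BASE_MDP = "integrator = steep\nemtol = 1.0\nnsteps = 500\n"


def step_schedule(start=100, stride=100, count=10):
    """Return the step counts start + X * stride for X = 0 .. count - 1.

    count is an assumption; the text does not state how many values were scanned.
    """
    return [start + x * stride for x in range(count)]


def with_nsteps(mdp_text, nsteps):
    """Return a copy of mdp_text whose 'nsteps' line is set to the given value."""
    lines = []
    for line in mdp_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "nsteps":
            line = f"nsteps = {nsteps}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def time_command(command):
    """Run a command (list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


coarse = step_schedule(100, 100, 5)  # 100, 200, ...
fine = step_schedule(100, 1, 5)      # 100, 101, ...
print("coarse scan:", coarse)
print("fine scan:", fine)
print(with_nsteps(BASE_MDP, coarse[1]), end="")
```

For each value in the schedule, one would write the modified .mdp file, rebuild the run input, and pass the mdrun call to `time_command`.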

<figtable id="runtimes">

Runtime of mdrun with different number of steps, stepwidth 100
Runtime of mdrun with different number of steps, stepwidth 1
Table TODO: Plots of the runtime of mdrun against the number of steps.

</figtable>



The pictures in <xr id="gromacs_energies"/> show the calculated bond, angle, and potential energies as a function of the number of steps taken. At the beginning the potential and bond values are very high; over roughly the first 20 steps they improve to values close to those reached in later steps. The angle values behave differently: they start near their final value, rise during roughly the first 20 steps, and then fall again.

After this initial phase the potential decreases steadily with the number of steps, while the bond and angle values increase.

As the potential is the only value that decreases continuously over the run, we use it to predict whether a mutation is disease causing.

<figtable id="gromacs_energies">

Wildtype based on the FoldX A162S mutation. The force field used was Amber03.
Mutant, calculated with FoldX and the Amber03 force field.
Wildtype based on the FoldX A162S mutation. The force field used was Charmm27.
Wildtype based on the SCWRL method. The force field used was Charmm27.
Wildtype based on the FoldX A162S mutation. The force field used was Amber99-ildn.
Wildtype based on the SCWRL method. The force field used was Amber99-ildn.
Table TODO: Plots of the energy values (bond, angle, and potential) of the models.

</figtable>


<figtable id="amber03-table">

Mutation    Change in potential    Validation
M35T        -150.7                 benign
V53M         783.4                 malign
G93R        -529.9                 malign
Q127H       -196.1                 malign
A162S        113.9                 benign
L183P        310.5                 malign
T217I        159.6                 benign
R224W        603.2                 benign (should be malign)
E277K        455.8                 malign
C282S         32.4                 malign
Table TODO: Change in potential when comparing the mutated FoldX models with the wildtype ones.

</figtable>


Judging from these ten mutations, a cutoff of +/-175 on the change in potential seems to work best (absolute change below 175: predicted as benign, otherwise malign). In our case this leads to an accuracy of 80% (90% if R224W is counted as malign). However, one should keep in mind that we only have 10 mutations here. The change in potential also does not correlate that well with the benign/malign state: C282S has a change in potential of only 32.4 (which would suggest the same structural attributes as the wildtype) but is classified as malign.
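The accuracy figures can be reproduced with a few lines. The values and labels are taken from the table above; the function names and the reading of "+/-175" as a cutoff on the absolute change are our interpretation.

```python
# Change in potential and validation label for each mutation (from the table above).
RESULTS = {
    "M35T": (-150.7, "benign"),
    "V53M": (783.4, "malign"),
    "G93R": (-529.9, "malign"),
    "Q127H": (-196.1, "malign"),
    "A162S": (113.9, "benign"),
    "L183P": (310.5, "malign"),
    "T217I": (159.6, "benign"),
    "R224W": (603.2, "benign"),
    "E277K": (455.8, "malign"),
    "C282S": (32.4, "malign"),
}


def predict(delta, cutoff=175.0):
    """Predict 'benign' if the absolute change is below the cutoff, else 'malign'."""
    return "benign" if abs(delta) < cutoff else "malign"


def accuracy(results, cutoff=175.0):
    """Fraction of mutations whose prediction matches the validation label."""
    hits = sum(predict(delta, cutoff) == label for delta, label in results.values())
    return hits / len(results)


# Same data, but with R224W treated as malign (see the note in the table).
relabeled = dict(RESULTS, R224W=(603.2, "malign"))

print(f"accuracy as labeled:     {accuracy(RESULTS):.0%}")
print(f"accuracy, R224W malign:  {accuracy(relabeled):.0%}")
print("misclassified:",
      [m for m, (d, label) in RESULTS.items() if predict(d) != label])
```

With the labels as given, the two misclassified mutations are R224W and C282S. The cutoff was chosen on the same ten mutations it is evaluated on, so these numbers say little about how well it would generalize.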

Conclusion

Maybe?

