# Structure-Based Mutation Analysis: Hemochromatosis

## Riddle of the task

It took you over an hour to figure out the right combination, but the door is finally open. The sight is unbelievable. Inside the next room lies a treasure beyond imagination: heaps of gems, gold, and jewelry, exotic furs, marvelous paintings, and much more. You step inside to collect what should now be yours...

The moment you reach out for the first piece of treasure, it vanishes into thin air. ALL of it. The treasure was just an illusion... You look around and see another entrance into the room. A collapsed one. Across the room is a person, kneeling before another door. You shout... No answer. He doesn't even move. As you get closer, you see that whoever it was is dead, his skin mummified by the dry air. Next to him lies an old leathery backpack. You reach out to take it as you notice small fragments on the floor. They look like tiny bits of red glass. Now that you're in front of him, you also see many of these splinters buried in the person's flesh. Inside the backpack you find several glass orbs: a blue one, a yellow one, a green one, an orange one, a cyan one, and a violet one. In front of the dead man, at the bottom of the door, you notice three slots, each about the size of an orb. One of them is red, the second one orange, and the third one yellow...

## Short task description

Detailed description: Structure-based mutation analysis

In this task we employed several methods for structure-based prediction of mutation effects: SCWRL, FoldX, Minimise, and Gromacs. After generating models for each mutation and method, we used PyMOL and energy statistics to classify the mutations as disease-causing or benign.

## Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.

## Structure selection and mapping of the mutations

<figure id="mut_map">

Figure 1: M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S mapped onto 1a6zC. Mutations are shown in stick representation and colored red. Glycosylation sites are colored cyan. Disulfide bonds are colored orange and also shown as sticks.

</figure>

There are only two structures of HFE available in the PDB: 1a6z and 1de4. We chose 1a6z because it has the better resolution (2.6 Å instead of 2.8 Å) and contains only beta-2-microglobulin in addition to HFE, whereas in 1de4 HFE is complexed with the transferrin receptor (TFR). All mutations from the previous task (M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S) lie within the residue range covered by the PDB structure (residues 26-297).

<xr id="mut_map"/> shows a three-dimensional mapping of the mutations (red) onto 1a6zC, together with the glycosylation sites (cyan) and disulfide bonds (orange). The only one of these features directly affected by a mutation is the disulfide bond between C225 and C282, where C282 is mutated to serine. However, Q127H, L183P, and R224W lie close to the glycosylation site N130 and to the two disulfide bonds (C124-C187, C225-C282) and might therefore affect them indirectly.
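Whether a mutation is "close" to a glycosylation site or disulfide bond can also be checked numerically instead of visually. The following is a minimal sketch, not the procedure used here (the mapping above was inspected in PyMOL). It assumes that one representative coordinate per residue (for example the Cα atom) has already been extracted from chain C of 1a6z; the function name `sites_near` and the 8 Å default cutoff are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def sites_near(
    mutations: Dict[str, Coord],
    features: Dict[str, Coord],
    cutoff: float = 8.0,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each mutation site, list the structural features (e.g. N130,
    C124, C187, C225, C282) within `cutoff` Å, sorted by distance.

    Each residue is represented by a single coordinate (assumed Cα).
    """
    result: Dict[str, List[Tuple[str, float]]] = {}
    for mut, m_xyz in mutations.items():
        hits = []
        for feat, f_xyz in features.items():
            d = math.dist(m_xyz, f_xyz)  # Euclidean distance in Å
            if d <= cutoff:
                hits.append((feat, round(d, 2)))
        result[mut] = sorted(hits, key=lambda hit: hit[1])
    return result
```

Passing the Cα coordinates of the ten mutated residues as `mutations` and those of N130 and the four disulfide cysteines as `features` would list, per mutation, the features inside the cutoff; a side-chain-based distance would be a stricter alternative.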

## SCWRL and FoldX

To analyze the effects of the mutations, we created several models with SCWRL<ref name="scwrl">Georgii G. Krivov, Maxim V. Shapovalov, and Roland L. Dunbrack, Jr. (2009): Improved prediction of protein side-chain conformations with SCWRL4. PMID 19603484</ref> and FoldX<ref name="foldx1">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005): The FoldX web server: an online force field. Nucleic Acids Research, vol 33, p W382-8. PMID 15980494</ref><ref name="foldx2">Schymkowitz J. W., Rousseau F., Martins I. C., Ferkinghoff-Borg J., Stricher F., Serrano L. (2005): Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA, vol 102, p 10147-52. PMID 16006526</ref><ref name="foldx3">Guerois R., Nielsen J. E., Serrano L. (2002): Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, vol 320, p 369-87. PMID 12079393</ref>. These models were then superimposed onto the reference structure (1a6zC). Our analysis covered changes in hydrogen bonds, differences in potential energy, and changes to the protein surface (unless the residue is buried within the protein). The color codes used in the following sections are:

• green: reference (1a6zC)
• cyan: SCWRL wildtype
• magenta: SCWRL mutant
• orange: FoldX wildtype
• red: FoldX mutant

An overview table containing all energy values for the models and their wildtypes can be found here. In the following sections only the normalized energy change is given: the energy difference relative to the corresponding wildtype model, expressed in tenths of a percent (per mille; e.g. 17 means +1.7%).
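The normalization can be written as a small function. This is a sketch under one stated assumption: the text does not say how negative wildtype energies are handled, so the hypothetical function `normalized_energy_change` divides by the absolute wildtype energy, which keeps a positive result meaning "mutant energy higher than wildtype".

```python
def normalized_energy_change(e_mutant: float, e_wildtype: float) -> float:
    """Energy difference (mutant - wildtype) in tenths of a percent of the
    wildtype energy, i.e. per mille: 17 means +1.7 %.

    Dividing by abs(e_wildtype) is an assumption so that the sign keeps its
    meaning even for negative wildtype energies.
    """
    if e_wildtype == 0:
        raise ValueError("wildtype energy must be non-zero")
    return (e_mutant - e_wildtype) * 1000.0 / abs(e_wildtype)
```

For example, `normalized_energy_change(101.7, 100.0)` returns approximately 17, i.e. +1.7%.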

### M35T

<figtable id="M35T_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 17.720
• FoldX energy (norm.): 7.744

The wildtype residue M35 is part of a beta sheet in the MHC I domain and forms two hydrogen bonds to a neighboring beta strand (cf. <xr id="M35T_pymol"/>). Both hydrogen bonds are preserved in the mutant models (SCWRL and FoldX). The FoldX model, however, uses a slightly different rotamer, which allows it to form an additional hydrogen bond to the preceding residue; this might even increase stability compared to the wildtype. The mutation changes the surface only slightly, which should not cause any problems. The surface model also shows that FoldX uses a slightly different rotamer for the wildtype model. The energy values likewise indicate only minor changes for the whole model. Therefore, this mutation should be considered not disease-causing.
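The hydrogen-bond comparison used for this and the following mutations (bonds preserved, lost, or newly formed) amounts to set operations on the bonds found in the wildtype and mutant models. A minimal sketch, assuming the bonds have already been listed per model (for example from PyMOL's polar-contact search) and are identified by atom labels of the form "residue:atom"; the function name `compare_hbonds` and all labels are hypothetical. Using atom labels rather than residue pairs keeps two bonds between the same two residues (as for E277 and G275) distinct.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# One hydrogen bond = the unordered pair of atoms it connects,
# e.g. ("res1:N", "res2:O") with hypothetical labels.
HBond = FrozenSet[str]


def compare_hbonds(
    wildtype: Iterable[Tuple[str, str]],
    mutant: Iterable[Tuple[str, str]],
) -> Dict[str, Set[HBond]]:
    """Classify hydrogen bonds as preserved, lost, or gained in the mutant."""
    # frozenset makes (a, b) and (b, a) the same bond
    wt: Set[HBond] = {frozenset(pair) for pair in wildtype}
    mt: Set[HBond] = {frozenset(pair) for pair in mutant}
    return {
        "preserved": wt & mt,  # present in both models
        "lost": wt - mt,       # only in the wildtype model
        "gained": mt - wt,     # only in the mutant model
    }
```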

### V53M

<figtable id="V53M_pymol">

 SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 70.349
• FoldX energy (norm.): 3.705

V53 marks the transition from a beta strand to a turn within the MHC I domain and forms two hydrogen bonds to the beginning of the next beta strand (cf. <xr id="V53M_pymol"/>). Both mutant models retain these bonds and form no additional ones. As the residue is buried within the protein, the surface is unchanged; however, the mutation reduces the strong hydrophobicity that drives this residue into the protein core during folding. Although the rotamers used by SCWRL and FoldX differ only slightly, their energy changes differ enormously (70.349 vs 3.705). While FoldX does not indicate that V53M is disease-causing, SCWRL's energy change does. The mutation is therefore hard to classify. However, considering that it has the second-highest energy change of all SCWRL models and that it loses hydrophobicity, it is more likely to be disease-causing than not.
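The hydrophobicity argument (here for V53M, and later the surface warning for T217I) can be quantified with a hydropathy scale. The sketch below uses the Kyte-Doolittle index, which is not necessarily the scale used in this analysis; the function name `hydropathy_change` is hypothetical, and it only reports the difference, not whether the residue is buried.

```python
# Kyte & Doolittle (1982) hydropathy index; higher = more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(mutation: str) -> float:
    """Hydropathy difference (mutant - wildtype) for a label such as 'V53M'."""
    wildtype, mutant = mutation[0], mutation[-1]  # one-letter codes at both ends
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wildtype], 2)
```

On this scale V53M gives -2.3 (methionine is still hydrophobic, but clearly less so than valine), while T217I gives +5.2, consistent with the warning about a highly hydrophobic residue on the surface.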

### G93R

<figtable id="G93R_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 16.141
• FoldX energy (norm.): -3.772

G93R lies within a large alpha helix in the MHC I domain (cf. <xr id="G93R_pymol"/>). The three hydrogen bonds that stabilize the helix are conserved in both mutant models, and the FoldX model forms an additional hydrogen bond within the helix. While these changes seem harmless at first, this region is thought to form the interface of the TFR-HFE complex, which makes the surface changes far more severe than they would be on their own. The much bigger arginine creates a massive bulge on the surface that is very likely to interfere with complex formation. Therefore, this mutation should be considered disease-causing, even though the energy models do not suggest this.

### Q127H

<figtable id="Q127H_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 17.911
• FoldX energy (norm.): -3.676

Q127 is located at the start of a coil/turn between two beta strands in the MHC I domain (cf. <xr id="Q127H_pymol"/>). Both mutant models retain the two hydrogen bonds that stabilize this coil/turn (the FoldX model even forms an additional one), but both lose the hydrogen bond between Q127 and E125. As a result, the indirect anchor to the previous beta strand is lost. This alone might not be severe, but one of the residues connected to Q127 is the glycosylation site N130 (the lower hydrogen bond in the figures). With this in mind, this mutation should be considered disease-causing. As with G93R, this contradicts the energy models.

### A162S

<figtable id="A162S_pymol">

 SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 38.099
• FoldX energy (norm.): 6.643

A162S is part of a helix in the MHC I domain (cf. <xr id="A162S_pymol"/>). All wildtype hydrogen bonds are preserved in the mutant models, and several new ones are formed (3 in SCWRL and 4 in FoldX), which should further stabilize the structure. Additionally, the residue is buried within the protein and thus causes no changes to the surface. The wildtype and mutant amino acids are also similar in size. The only indicator of a malign mutation is the energy change of the SCWRL model, but this has proven quite unreliable for the previous mutations. Therefore, this mutation should be considered not disease-causing.

### L183P

<figtable id="L183P_pymol">

 SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 0.284
• FoldX energy (norm.): 33.996

L183P is again located in one of the MHC I domain's helices, and <xr id="L183P_pymol"/> demonstrates proline's effect as a helix breaker: both stabilizing hydrogen bonds are lost and no new ones are formed. As mentioned before, this region forms the interface of the TFR-HFE complex, so a break in one of the three large helices should be considered disease-causing, even though this particular residue is not on the surface of the protein. This is also the first FoldX model to show a large energy change, so FoldX's energy model may be a better indicator than SCWRL's.

### T217I

<figtable id="T217I_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 48.103
• FoldX energy (norm.): 9.312
• Warning: Highly hydrophobic amino acid on the surface!

T217I is the first mutation located in the C1 domain (cf. <xr id="T217I_pymol"/>). It is part of a coil/turn between two beta strands and seems to play an important role in stabilizing this region, as it forms a total of 5 hydrogen bonds. All but one of these bonds are lost in both mutant models. However, the conserved hydrogen bond is probably the most important one, as it reaches across the coil/turn to the beginning of the next beta strand. While the changes to the surface are only minor, the highly hydrophobic mutant residue on the surface indicates a malign mutation. The energy changes also, more or less, suggest that this mutation is disease-causing.

### R224W

<figtable id="R224W_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 53.506
• FoldX energy (norm.): 2.104

R224W lies within one of the C1 domain's beta strands and forms two stabilizing hydrogen bonds to the neighboring beta strand (cf. <xr id="R224W_pymol"/>). All hydrogen bonds are unchanged in the mutant models, and no new ones are formed. Although the mutant residue differs considerably in structure from the wildtype, the rotamer chosen by FoldX seems to resemble the original one more closely. The mutation produces moderate changes on the protein surface, which could be severe considering that this side of the C1 domain faces beta-2-microglobulin in the complex. SCWRL's energy model also indicates a malign mutation. Overall, R224W should be considered disease-causing.

### E277K

<figtable id="E277K_pymol">

 SCWRL. FoldX. SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 122.406
• FoldX energy (norm.): 25.854

E277K is part of a very small helix (4 residues according to DSSP) within the C1 domain (cf. <xr id="E277K_pymol"/>). It seems to play a rather complex role in stabilizing the entire domain, as it forms hydrogen bonds with three different structural elements: one with the following beta strand (Y280), two with G275 in a short coil, and one with T221 at the start of another beta strand of the C1 domain. Both mutant models lose the hydrogen bonds with G275, and the SCWRL model additionally loses the one with T221, which might have severe effects on the tertiary structure of the C1 domain. These destabilizations, the moderate changes on the protein surface (a cavity in the SCWRL model, a bulge in the FoldX one), and the high energy changes for both models strongly indicate a disease-causing mutation.

### C282S

<figtable id="C282S_pymol">

 SCWRL. FoldX.

</figtable>

• SCWRL energy (norm.): 34.746
• FoldX energy (norm.): 46.690

C282S is located within a beta strand of the C1 domain (cf. <xr id="C282S_pymol"/>) and forms two hydrogen bonds with the neighboring strand. These bonds are retained in both mutant models, which even form a third one with the same residue. The difference in residue size is minor, and the residue is buried within the protein (no surface changes). The major problem with this mutation, however, is the loss of the only disulfide bridge within the C1 domain (C225-C282), which is also reflected in the large energy changes of both models. This loss alone is enough to consider this mutation disease-causing.

## Minimise

Next, we used Minimise to minimize the energy of each of the 31 models created with SCWRL (10 mutants + 1 wildtype) and FoldX (10 mutants + their 10 wildtypes). Each model was minimized five times in succession (i.e. the output of one iteration served as input for the next). A table with the absolute energy values can be found here.

<xr id="energy_gain"/> shows the median energy change per iteration relative to the first iteration. It shows that too many iterations not only fail to improve the models but actually make them worse. For the FoldX models, only the second iteration brings an improvement; every later iteration leaves the models worse than they were after the first one. The SCWRL models stop improving after the third iteration, and after the fifth iteration they are about as good as after the first.
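The per-iteration medians can be computed directly from the energies written after each Minimise iteration. This sketch assumes that the change is expressed in percent of the first-iteration energy, which the text does not specify; the function name `median_change_vs_first` is hypothetical.

```python
from statistics import median
from typing import Dict, List


def median_change_vs_first(energies: Dict[str, List[float]]) -> List[float]:
    """For each iteration i, the median over all models of the energy change
    relative to iteration 1, in percent: (E_i - E_1) * 100 / |E_1|.

    `energies` maps a model name to its energies after iterations 1..n.
    Negative values mean a lower (better) energy than after iteration 1.
    """
    n_iter = min(len(values) for values in energies.values())
    medians = []
    for i in range(n_iter):
        changes = [(v[i] - v[0]) * 100.0 / abs(v[0]) for v in energies.values()]
        medians.append(median(changes))
    return medians
```

Running it separately on the FoldX models, the SCWRL models, and all models together corresponds to the three panels of the figure.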

<figtable id="energy_gain">

 All models. FoldX models. SCWRL models.

</figtable>

To compare the Minimise results for SCWRL and FoldX with each other and with the original values from the modeling programs, we chose the results of the second iteration, as these showed an improvement in energy for all models. The energy values were then normalized again (cf. Section: SCWRL and FoldX); an overview table of these values can be found here. <xr id="it2_comparison"/> compares the two sets of normalized values. After minimization, every mutation shows an energy change of the same sign for its SCWRL and FoldX model, and for most mutations the magnitudes are also similar. In contrast, the new values show no correlation at all with the original ones. This suggests that Minimise's result is almost independent of the input model. The sign of the new values also agrees well with the mutations' effects (positive = malign, negative = benign): only M35T and V53M would be predicted incorrectly (assuming R224W is indeed benign).

<figtable id="it2_comparison">

| Mutation | SCWRL norm. (Minimise, iteration 2) | SCWRL norm. (original model) | FoldX norm. (Minimise, iteration 2) | FoldX norm. (original model) | Validation |
|---|---|---|---|---|---|
| M35T | 2.431904837 | 17.72037844 | 2.656691758 | 7.744433688 | benign |
| V53M | -1.390424842 | 70.34982121 | -2.620524491 | 3.705278503 | malign |
| G93R | 7.858037681 | 16.14153574 | 10.91059879 | -3.772906935 | malign |
| Q127H | 5.379939254 | 17.91113835 | 5.091157565 | -3.676245328 | malign |
| A162S | -1.378597488 | 38.09921951 | -1.029897803 | 6.643335904 | benign |
| L183P | 19.9535606 | 0.284110511 | 13.45154088 | 33.99673341 | malign |
| T217I | -1.748299106 | 48.10396821 | -0.268900854 | 9.312876843 | benign |
| R224W | -8.539391409 | 53.50612664 | -10.18388517 | 2.104124083 | benign (uncertain) |
| E277K | 9.191026282 | 122.4069842 | 1.966468695 | 25.85492841 | malign |
| C282S | 1.037044876 | 34.74671548 | 1.456294229 | 46.69076258 | malign |

Table 12: Comparison between the normalized values after the second Minimise iteration and the normalized values of the original SCWRL and FoldX models. The last column gives the known annotation of each mutation. For R224W it is uncertain whether the mutation is indeed benign or simply has not been classified as malign yet.

</figtable>
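The sign rule stated above (positive = malign, negative = benign) can be checked against Table 12 in a few lines. This is a sketch: the values are copied from the table, R224W is counted as benign as in the text, and the names `sign_prediction` and `evaluate` are hypothetical.

```python
# Normalized energy changes after the 2nd Minimise iteration, copied from Table 12:
# mutation -> (SCWRL norm., FoldX norm., annotation). R224W is counted as benign.
TABLE_12 = {
    "M35T": (2.431904837, 2.656691758, "benign"),
    "V53M": (-1.390424842, -2.620524491, "malign"),
    "G93R": (7.858037681, 10.91059879, "malign"),
    "Q127H": (5.379939254, 5.091157565, "malign"),
    "A162S": (-1.378597488, -1.029897803, "benign"),
    "L183P": (19.9535606, 13.45154088, "malign"),
    "T217I": (-1.748299106, -0.268900854, "benign"),
    "R224W": (-8.539391409, -10.18388517, "benign"),
    "E277K": (9.191026282, 1.966468695, "malign"),
    "C282S": (1.037044876, 1.456294229, "malign"),
}


def sign_prediction(value: float) -> str:
    """Positive normalized energy change -> malign, otherwise benign."""
    return "malign" if value > 0 else "benign"


def evaluate(column: int) -> tuple:
    """Accuracy and misclassified mutations for column 0 (SCWRL) or 1 (FoldX)."""
    wrong = [m for m, row in TABLE_12.items() if sign_prediction(row[column]) != row[2]]
    accuracy = (len(TABLE_12) - len(wrong)) / len(TABLE_12)
    return accuracy, wrong


# Do the SCWRL- and FoldX-based values agree in sign for every mutation?
same_sign = all(sign_prediction(s) == sign_prediction(f) for s, f, _ in TABLE_12.values())
```

Both columns give an accuracy of 0.8 with M35T and V53M misclassified, and `same_sign` is `True`, matching the observations in the text.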

<figure id="R224W_min_fx">

Figure 2: Changes in structure for the R224W FoldX model over 5 iterations with Minimise. Residues within 5 Å are also shown as sticks. Colors are: green (1a6zC), blue (FoldX model), cyan (iteration 1), yellow (iteration 2), orange (iteration 3), red (iteration 4), and magenta (iteration 5).

</figure>

<xr id="R224W_min_fx"/> shows how the structure of the R224W FoldX model changes over the Minimise iterations. Only the first iteration changes the region around the mutation site; the remaining iterations appear to have no effect on it, and the same holds for the entire protein structure. This raises the question of what Minimise does in the remaining iterations, since the energy values nevertheless keep getting worse. Although this is only one example, the other mutation models behave similarly.
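To put a number on "no effect", one could compute the RMSD between consecutive Minimise iterations, either over the atoms within 5 Å of the mutation site or over the whole protein. A minimal sketch, assuming the atoms were extracted in the same order from each iteration's output and that Minimise does not move the reference frame, so no superposition is performed; the function name `rmsd` is hypothetical here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists.

    No superposition is done: both structures are assumed to share the same
    reference frame (assumed here for consecutive Minimise outputs).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))
```

An RMSD close to zero between iterations 2 and 5, combined with rising energies, would confirm that the later iterations change the energy without changing the coordinates.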

## Gromacs

For Gromacs we used the SCWRL and FoldX models after repairing them with repairPDB, as for the Minimise step (this replaces steps 1 to 3 of the task description).

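```
title            = PBSA minimization in vacuum
cpp              = /usr/bin/cpp        ; path to the C preprocessor
define           = -DFLEXIBLE -DPOSRES ; flexible water, enable #ifdef POSRES restraints
implicit_solvent = GBSA                ; generalized Born/surface area implicit solvent
integrator       = steep               ; steepest-descent energy minimization
emtol            = 1.0                 ; converged when max force < 1.0 kJ/mol/nm
nsteps           = 500                 ; maximum number of minimization steps
nstenergy        = 1                   ; write energies every step
energygrps       = System
ns_type          = grid                ; grid-based neighbor searching
coulombtype      = cut-off
rcoulomb         = 1.0                 ; Coulomb cut-off in nm
rvdw             = 1.0                 ; van der Waals cut-off in nm
constraints      = none
pbc              = no                  ; no periodic boundary conditions
```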

We used this .mdp file for all energy evaluations. More information on the parameters can be found here.

To measure the runtime of mdrun, we called it repeatedly with different values of nsteps in the .mdp file. First we used 100 + X·100 steps, which resulted in the left picture of <xr id="runtimes"/>. There, the runtime (measured as wall-clock "real" time) levels off at around 32 seconds. Because these intervals were too coarse to tell whether the runtime grows linearly, we repeated the test with 100 + X·1 steps; the result is shown in the right picture of <xr id="runtimes"/>.

From this we conclude that the runtime grows linearly with the number of steps up to the point at which the minimization can no longer improve the structure and mdrun terminates early.
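The runtime scan can be scripted. The sketch below mirrors the approach described above (edit nsteps in the .mdp, then run grompp and mdrun) under several assumptions: it uses the `gmx` wrapper of newer GROMACS releases (older releases call `grompp` and `mdrun` directly), an .mdp that sets `implicit_solvent` needs a release that still supports that option, the input files `model.gro` and `topol.top` are hypothetical names, and only mdrun's wall-clock time is measured with `time.perf_counter`. The helper names are hypothetical as well.

```python
import re
import subprocess
import time
from pathlib import Path
from typing import List


def step_schedule(start: int, increment: int, count: int) -> List[int]:
    """nsteps values start + x * increment for x = 0 .. count - 1."""
    return [start + x * increment for x in range(count)]


def with_nsteps(mdp_text: str, nsteps: int) -> str:
    """Return a copy of the .mdp text with the value of its nsteps line replaced."""
    new_text, n = re.subn(r"(?m)^([ \t]*nsteps[ \t]*=[ \t]*)\S+", rf"\g<1>{nsteps}", mdp_text)
    if n != 1:
        raise ValueError("expected exactly one nsteps line")
    return new_text


def time_minimisation(mdp_template: str, nsteps: int, workdir: Path) -> float:
    """Write run.mdp, run grompp and mdrun, and return mdrun's wall-clock seconds."""
    (workdir / "run.mdp").write_text(with_nsteps(mdp_template, nsteps))
    subprocess.run(
        ["gmx", "grompp", "-f", "run.mdp", "-c", "model.gro", "-p", "topol.top", "-o", "run.tpr"],
        cwd=workdir, check=True, capture_output=True,
    )
    start = time.perf_counter()
    subprocess.run(["gmx", "mdrun", "-deffnm", "run"], cwd=workdir, check=True, capture_output=True)
    return time.perf_counter() - start
```

A scan like the first test above would then loop over `step_schedule(100, 100, count)` and call `time_minimisation` for each value; the number of points `count` is an arbitrary choice.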

<figtable id="runtimes">

Left: runtime of mdrun for different numbers of steps (increment of 100 steps). Right: runtime of mdrun for different numbers of steps (increment of 1 step).

</figtable>

The plots in <xr id="gromacs_energies"/> show the bond, angle, and potential energy terms as a function of the number of minimization steps for selected models. Plots for all models can be found here; the final energies are listed on the same page under the section "Tables".

<xr id="gromacs_energies"/> shows that the potential and bond energies start out very high and drop during roughly the first 20 steps to values close to those reached later in the run. The angle energy, in contrast, starts near its final value, rises during roughly the first 20 steps, and then decreases again.

After this initial phase, the potential decreases steadily with the number of steps, while the bond and angle energies increase.

Since the potential is the only term that decreases continuously over the run, we used it to predict disease-causing mutations.

<xr id="delta_potential"/> lists the change in potential between each mutant model and its wildtype model. All models used here were created with FoldX.

<figtable id="gromacs_energies">

• A162S mutant (FoldX model), amber03 force field.
• Wildtype (FoldX wildtype from the A162S run), amber03 force field.
• Wildtype (FoldX wildtype from the A162S run), charmm27 force field.
• Wildtype (SCWRL model), charmm27 force field.
• Wildtype (FoldX wildtype from the A162S run), amber99sb-ildn force field.
• Wildtype (SCWRL model), amber99sb-ildn force field.

</figtable>

<figtable id="delta_potential">

| Mutation | Change in potential (kJ/mol) | Validation |
|---|---|---|
| M35T | -150.7 | benign |
| V53M | 783.4 | malign |
| G93R | -529.9 | malign |
| Q127H | -196.1 | malign |
| A162S | 113.9 | benign |
| L183P | 310.5 | malign |
| T217I | 159.6 | benign |
| R224W | 603.2 | benign (should be malign) |
| E277K | 455.8 | malign |
| C282S | 32.4 | malign |

Table 14: Change in potential when comparing the mutant FoldX models with their wildtype models.

</figtable>

Based on these data, a cutoff of ±175 kJ/mol on the change in potential seems best: mutations whose absolute change is below 175 are predicted benign, all others malign. In our case this gives an accuracy of 80% (90% if R224W is in fact malign). However, as always, keep in mind that this is based on only 10 mutations. Moreover, the potential does not correlate that well with the benign/malign state: C282S, for example, changes the potential by only 32.4 kJ/mol (which would suggest structural properties similar to the wildtype) but is classified as malign.
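The cutoff rule can be checked against Table 14. This sketch copies the values from the table (assumed to be mutant minus wildtype, in kJ/mol) and counts R224W as benign; the names `predict` and `accuracy` are hypothetical. The `use_absolute` switch shows why the rule has to act on the absolute change: only then does it reproduce the stated 80%.

```python
# Change in potential (FoldX models) and annotation, copied from Table 14.
TABLE_14 = {
    "M35T": (-150.7, "benign"),
    "V53M": (783.4, "malign"),
    "G93R": (-529.9, "malign"),
    "Q127H": (-196.1, "malign"),
    "A162S": (113.9, "benign"),
    "L183P": (310.5, "malign"),
    "T217I": (159.6, "benign"),
    "R224W": (603.2, "benign"),
    "E277K": (455.8, "malign"),
    "C282S": (32.4, "malign"),
}


def predict(delta: float, cutoff: float = 175.0, use_absolute: bool = True) -> str:
    """Change below the cutoff -> benign, otherwise malign."""
    value = abs(delta) if use_absolute else delta
    return "benign" if value < cutoff else "malign"


def accuracy(table: dict, cutoff: float = 175.0, use_absolute: bool = True) -> float:
    """Fraction of mutations whose prediction matches the annotation."""
    hits = sum(predict(d, cutoff, use_absolute) == label for d, label in table.values())
    return hits / len(table)
```

With the absolute change, `accuracy(TABLE_14)` is 0.8, and 0.9 if R224W is relabeled as malign. Applying the cutoff to the signed change instead would also misclassify G93R and Q127H, giving only 0.6.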

We also computed potential energies with different force fields. Notably, amber99sb-ildn yields a much lower potential (about -42000 kJ/mol) than amber03 and charmm27 (both about -38800 kJ/mol; see "Tables"). At the same time, amber99sb-ildn produces higher bond and angle energies than amber03, whereas charmm27 appears to rely on the bond term alone and still reaches about the same potential as amber03.


<references/>