Structure-Based Mutation Analysis Hemochromatosis
Riddle of the task
It took you over an hour to figure out the right combination, but the door is finally open. The sight is unbelievable. Inside the next room lies a treasure beyond imagination: heaps of gems, gold, and jewelry. Exotic furs, marvelous paintings, and many more. You step inside to collect what should now be yours...
The moment you reach out for the first piece of treasure it vanishes into thin air. ALL of it. The treasure was just an illusion... You look around and see another entrance into the room. A collapsed one. Across the room is a person, kneeling before another door. You shout... No answer. He didn't even move. As you get closer to him you see that, whoever it was, is dead. His skin mummified due to the dry air. Next to him an old leathery backpack. You reach out to take it as you notice small fragments on the floor. They look like tiny bits of red glass. Now that you're in front of him you also see many of these splinters buried inside the person's flesh. Within the backpack you find several glass orbs: a blue one, a yellow one, a green one, an orange one, a cyan one, and a violet one. In front of the dead man, at the bottom of the door, you notice three slots. Each of them about the size of the orbs. One of them is red, the second one orange, and the third one yellow...
Short task description
Detailed description: Structure-based mutation analysis
In this task we employed several methods for structure-based predictions of mutation effects. The methods were SCWRL, FoldX, Minimise, and Gromacs. After the generation of models for each mutation and method we used PyMol and energy statistics to classify the mutations into disease causing ones and benign mutations.
A protocol with a description of the data acquisition and other scripts used for this task is available here.
Structure selection and mapping of the mutations
There are only two structures available for HFE in the PDB: 1a6z and 1de4. We chose 1a6z for this task as it has the better resolution (2.6 Å instead of 2.8 Å) and contains only beta-2-microglobulin in addition to HFE. In 1de4, HFE is complexed with the transferrin receptor (TFR). All of the mutations from the previous task (M35T, V53M, G93R, Q127H, A162S, L183P, T217I, R224W, E277K, and C282S) are included in the PDB structure (residues 26-297).
<xr id="mut_map"/> shows a three-dimensional mapping of the mutations (red) onto 1a6zC. Glycosylation sites (cyan) and disulfide bonds (orange) are also indicated. The only such feature that is directly affected by a mutation is the disulfide bond between C225 and C282, where C282 is mutated into serine. However, Q127H, L183P, and R224W are quite close to the glycosylation site at N130 and the two disulfide bonds (C124-C187, C225-C282) and might therefore affect them indirectly.
SCWRL and FoldX
In order to analyze the effects of the mutations we created several models with SCWRL<ref name="scwrl">Georgii G. Krivov, Maxim V. Shapovalov, and Roland L. Dunbrack, Jr. (2009): Improved prediction of protein side-chain conformations with SCWRL4. PMID 19603484</ref> and FoldX<ref name="foldx1">Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005): The FoldX web server: an online force field. Nucleic Acids Research, vol 33, pW382-8. PMID 15980494</ref><ref name="foldx2">Schymkowitz J. W., Rousseau F., Martins I. C., Ferkinghoff-Borg J., Stricher F., Serrano L. (2005): Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA, vol 102, p 10147-52. PMID 16006526</ref><ref name="foldx3">Guerois R., Nielsen J. E., Serrano L. (2002): Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, vol 320, p369-87 PMID 12079393</ref>. These models were then superimposed onto the reference structure (1a6zC). Our analysis included changes in the hydrogen bonds, differences in the potential energy, and surface changes (unless the residue is buried within the protein). The color codes in the following section are:
- green: reference (1a6zC)
- cyan: SCWRL wildtype
- magenta: SCWRL mutant
- orange: FoldX wildtype
- red: FoldX mutant
An overview table containing all energy values for the models and their wildtypes can be found here. In the following section only the normalized energy change will be given. This number represents the difference in energy compared to the wildtype model in one-tenth of a percent (i.e. 17 means +1.7%).
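The normalization described above can be sketched as follows. The text does not give the exact formula, so this is only one plausible reading: the difference to the wildtype is divided by the absolute wildtype energy and scaled by 1000, so that 17 corresponds to +1.7%. The function name and the example energies are hypothetical.

```python
def normalized_energy_change(e_mutant: float, e_wildtype: float) -> float:
    """Energy change relative to the wildtype in tenths of a percent.

    Assumption: the difference is divided by |E_wildtype|, so a positive
    result always means the mutant energy is higher than the wildtype
    energy, even when both energies are negative.
    """
    if e_wildtype == 0:
        raise ValueError("wildtype energy must be non-zero")
    return 1000.0 * (e_mutant - e_wildtype) / abs(e_wildtype)


# Hypothetical energies, chosen only to illustrate the scale:
# a mutant lying 1.7% above a wildtype energy of 200.0 gives roughly 17.
print(normalized_energy_change(203.4, 200.0))
```

Dividing by the absolute value keeps the sign convention stable for force fields that report negative total energies.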
- SCWRL energy (norm.): 17.720
- FoldX energy (norm.): 7.744
The wildtype of M35T is part of a beta sheet complex in the MHC I domain and forms two hydrogen bonds to a neighboring beta sheet (cf. <xr id="M35T_pymol"/>). Both of these hydrogen bonds are preserved in the mutant model (SCWRL and FoldX). The FoldX model uses a slightly different rotamer, though, which enables it to form an additional hydrogen bond to the previous residue. This might cause an increased stability over the wildtype. The changes to the surface due to the mutation are only minor and should not cause any problems. The surface model also shows that FoldX uses a slightly different rotamer for the wildtype model. Even the energy values indicate only minor changes in the whole model. Therefore this mutation should be considered non disease causing.
- SCWRL energy (norm.): 70.349
- FoldX energy (norm.): 3.705
V53M marks the transition of a beta sheet into a turn within the MHC I domain and forms two hydrogen bonds to the beginning of the next beta sheet (cf. <xr id="V53M_pymol"/>). Both mutant models retain these bonds and do not form additional ones. As this residue is buried within the protein, there are no changes to the surface, but the residue loses its strong hydrophobic character which would force it into the protein during translation/folding. Even though the rotamers used by SCWRL and FoldX differ only slightly, the difference between their energy changes is large. While FoldX would not indicate that V53M is disease causing, SCWRL's energy change does so. The mutation is quite hard to classify. However, considering that it has the second highest energy change of all SCWRL models and that it loses its hydrophobicity, it is more likely to be disease causing than not.
- SCWRL energy (norm.): 16.141
- FoldX energy (norm.): -3.772
G93R lies within a big alpha helix in the MHC I domain (cf. <xr id="G93R_pymol"/>). The three hydrogen bonds that are important for the helix stabilization are conserved in both mutant models. In the FoldX model an additional hydrogen bond within the helix structure is formed. While these changes seem harmless at first, it should also be noted that this region is supposed to be the interface for the TFR-HFE complex. This makes the changes to the surface even more severe than they would seem on their own. The much bigger arginine causes a massive bulge on the surface which is very likely to interfere with the complex formation. Therefore this mutation should be considered disease causing, even if the energy models do not suggest this.
- SCWRL energy (norm.): 17.911
- FoldX energy (norm.): -3.676
Q127H is at the start of a coil/turn between two beta sheets in the MHC I domain (cf. <xr id="Q127H_pymol"/>). Both mutant models retain the two hydrogen bonds that stabilize this coil/turn (the FoldX model even forms an additional one), but they both lose the hydrogen bond that connects Q127 and E125. Thus the indirect anchor to the previous beta sheet is lost. This might not be that severe, but one of the connected amino acids marks the glycosylation site N130 (connected by the lower hydrogen bond in the figures). With this in mind this mutation should be considered disease causing. As with the previous mutation, this is contrary to the energy models.
- SCWRL energy (norm.): 38.099
- FoldX energy (norm.): 6.643
A162S is part of a helix in the MHC I domain (cf. <xr id="A162S_pymol"/>). All wildtype hydrogen bonds are preserved in the mutant models and several new ones are formed (3 in SCWRL and 4 in FoldX). This should further stabilize the structure. Additionally the residue is buried within the protein and thus causes no changes on the surface. Even the size of the wildtype and mutant amino acids does not differ much. The only indicator for a malign mutation would be the energy change in the SCWRL model, but this has proven to be quite unreliable in the previous mutations. Therefore this mutation should be considered non disease causing.
- SCWRL energy (norm.): 0.284
- FoldX energy (norm.): 33.996
L183P is, again, located in one of the MHC I domain's helices. Proline's effect as a helix breaker is demonstrated in <xr id="L183P_pymol"/>. Both stabilizing hydrogen bonds are lost and no new ones are formed. As mentioned before, this region is the interface for the TFR-HFE complex, and therefore a break in one of the three big helices should be considered disease causing, even though this particular residue is not on the surface of the protein. This is also the first FoldX model to show a big energy change. Maybe FoldX's energy model is a better indicator than SCWRL's.
- SCWRL energy (norm.): 48.103
- FoldX energy (norm.): 9.312
- Warning: Highly hydrophobic amino acid on the surface!
T217I is the first mutation that is within the C1 domain (cf. <xr id="T217I_pymol"/>). It is part of a coil/turn between two beta sheets and seems to play an important role in the stabilization of this region as it forms a total of 5 hydrogen bonds. All but one of these bonds are lost in both mutant models. However, the hydrogen bond which is conserved is probably the most important one as it reaches across the coil/turn to the beginning of the next beta sheet. While the changes to the surface are only minor, the fact that the mutant is highly hydrophobic indicates a malign mutation. The change in the energy models also, more or less, suggests that this mutation is disease causing.
- SCWRL energy (norm.): 53.506
- FoldX energy (norm.): 2.104
R224W lies within one of the C1 domain's beta sheets and forms two stabilizing hydrogen bonds to the neighboring beta sheet (cf. <xr id="R224W_pymol"/>). All hydrogen bonds are unchanged in the mutant models and no new ones are formed. While the mutant residue has quite a different structure than the wildtype, the rotamer chosen by FoldX seems to resemble the original one better. The mutant produces moderate changes on the protein surface which could be severe considering that this side of the C1 domain is aligned with beta-2-microglobulin (when in complex). SCWRL's energy model also indicates a malign mutation. Overall R224W should be considered disease causing.
- SCWRL energy (norm.): 122.406
- FoldX energy (norm.): 25.854
E277K is part of a very small helix (4 residues according to DSSP) within the C1 domain (cf. <xr id="E277K_pymol"/>). It seems to have a quite complex role in stabilizing the entire domain as it forms hydrogen bonds with three different structural elements: one with the following beta sheet (Y280), two with G275 which is within a short coil, and one with T221 which is at the start of another beta sheet within the C1 domain. Both mutant models lose the hydrogen bonds with G275 and the SCWRL model additionally loses the one with T221, which might have severe effects on the tertiary structure of the C1 domain. These destabilizations, the moderate changes on the protein surface (a cavity in the SCWRL model, a bulge in the FoldX one), and the high energy changes for both models strongly indicate a disease causing mutation.
- SCWRL energy (norm.): 34.746
- FoldX energy (norm.): 46.690
C282S is located within a beta sheet of the C1 domain (cf. <xr id="C282S_pymol"/>) and forms two hydrogen bonds with the neighboring sheet. These bonds are retained in both mutant models and they even form a third one with the same residue. The difference in residue size is minor and the residue is located within the protein (no surface changes). The major problem with this mutation, however, is the loss of the only disulfide bridge (C225-C282) within the C1 domain, which is also reflected in the big energy changes of both models. This loss alone is enough to consider this mutation disease causing.
Next we used Minimise to minimize the energy for each of the 31 models: 11 created with SCWRL (10 mutations + WT) and 20 created with FoldX (10 mutations, each with its own wildtype model). Each model was consecutively minimized five times (i.e. the output from the previous iteration was used as input for the next one). A table with the absolute energy values can be found here.
The median energy change per iteration in relation to the first iteration is shown in <xr id="energy_gain"/>. It clearly demonstrates that too many iterations not only fail to improve the model, but make it even worse. For the FoldX models only the second iteration makes the models better; every iteration thereafter makes the models worse than they were after the first one. The SCWRL models stop improving after the third iteration. After the fifth iteration they are about as good as after the first iteration.
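A minimal sketch of how such a per-iteration median could be computed is given below. It assumes the change is the plain difference between the energy after iteration i and the energy after the first iteration; the text does not state whether absolute or relative differences were plotted. The function name and the toy energies are hypothetical and are not the values from the energy table.

```python
from statistics import median


def median_change_per_iteration(energies_by_model):
    """Median energy change of each iteration relative to iteration 1.

    energies_by_model maps a model name to its list of energies, one value
    per consecutive minimization round (index 0 = first iteration).
    Entry i of the result is the median over all models of E[i] - E[0];
    negative values mean the models improved (lower energy).
    """
    n_iter = min(len(e) for e in energies_by_model.values())
    return [
        median(e[i] - e[0] for e in energies_by_model.values())
        for i in range(n_iter)
    ]


# Hypothetical energies for three models over five iterations
toy = {
    "model_a": [-100.0, -104.0, -103.0, -101.0, -99.0],
    "model_b": [-200.0, -203.0, -201.0, -199.0, -198.0],
    "model_c": [-150.0, -152.0, -151.0, -150.0, -149.0],
}
print(median_change_per_iteration(toy))
```

In this toy data the median drops at the second iteration and rises above zero by the fifth, which mimics the kind of curve described above.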
In order to compare the Minimise results for SCWRL and FoldX with each other and with the original values given by the modeling programs, we chose the 2nd iteration results as these showed an improvement in energy for all models. Then the energy values were again normalized (cf. Section: SCWRL and FoldX). An overview table for these values can be found here. <xr id="it2_comparison"/> shows a comparison between the two normalized values. After the minimization every mutation exhibits the same direction of energy change whether it is a SCWRL or FoldX model. Even the magnitudes of the changes are quite similar for both methods. In contrast, the new values show no correlation to their original ones at all. This suggests that Minimise's performance is almost independent of the input model. The new values also correlate well with the mutations' effects (i.e. positive = malign, negative = benign). Only M35T and V53M would result in false predictions (assuming R224W is indeed benign).
|Mutation||SCWRL norm.||original norm.||FoldX norm.||original norm.||Validation|
<xr id="R224W_min_fx"/> shows the changes in the structure of the R224W mutation based on the FoldX model. Only the first iteration of Minimise changes the area around the mutation site. The remaining iterations seem to have no effect on its structure. The same is true for the entire protein structure. This raises the question of what Minimise does in the remaining iterations, as the energy values keep getting worse nevertheless. Although this is only an example, the other mutation models show about the same behavior.
title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no
We used this .mdp file for evaluating all energies. For more information regarding the arguments read this.
To measure the runtimes of mdrun we called it repeatedly with different step counts (nsteps) in the .mdp file. At first we looked at 100+X*100 steps, resulting in <xr id="runtimes"/>, left picture. Here you can see that the runtime (measured as real, i.e. wall-clock, time) is capped at around 32 seconds. As these increments were too coarse to see whether the growth is linear, we repeated the test with 100+X*1 steps to get more accurate results. The result can be seen in <xr id="runtimes"/>, right picture.
Based on this result we conclude that the runtime is linear up to a certain point where no improvement can be made anymore and the program terminates.
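The step series used for the runtime test can be sketched as below. This only generates the step counts and the corresponding .mdp variants; it does not run mdrun. The function names are hypothetical, and the number of values per series (the range of X) is an assumption because it is not given in the text.

```python
import re


def step_schedule(start, stride, count):
    """Step counts start + X*stride for X = 0 .. count-1."""
    return [start + x * stride for x in range(count)]


def with_nsteps(mdp_text, nsteps):
    """Return a copy of an .mdp file's text with its nsteps value replaced."""
    new_text, n = re.subn(
        r"^([ \t]*nsteps[ \t]*=[ \t]*)\S+",
        rf"\g<1>{nsteps}",
        mdp_text,
        flags=re.MULTILINE,
    )
    if n != 1:
        raise ValueError("expected exactly one nsteps line")
    return new_text


# Shortened template with three of the parameters from the .mdp file
template = "integrator = steep\nemtol = 1.0\nnsteps = 500\n"

coarse = step_schedule(100, 100, 5)  # 100, 200, ..., 500 (count is assumed)
fine = step_schedule(100, 1, 5)      # 100, 101, ..., 104 (count is assumed)

# One .mdp text per step count, ready to be written to disk and timed
variants = {n: with_nsteps(template, n) for n in coarse}
print(variants[300])
```

Each variant would then be preprocessed and timed separately, so that the measured runtime can be plotted against the step count.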
The pictures in <xr id="gromacs_energies"/> show the calculated values of bonds, angles and potential as a function of the number of steps for selected models. The complete set of pictures (for all models) can be found here. The final calculated energies can also be found on that page under section "Tables".
Here (<xr id="gromacs_energies"/>) you can see that the potential and bond values are very high at the beginning and, over roughly the first 20 steps, drop to values close to those calculated in later steps. The angle values start at about their final value, rise during roughly the first 20 steps, and are then reduced again.
Over the number of steps the potential decreases steadily. At the same time the bond and angle values increase.
As the potential is the only value that decreases continuously over time, we use it to predict disease causing mutations.
<xr id="delta_potential"/> lists the change in potential observed when comparing each mutation model against the wildtype model. All models used were created with FoldX.
|Mutation||change in potential||validation|
|R224W||603.2||benign(should be malign)|
Based on what we know, a cutoff of +/-175 for the change in potential seems to work best (lower than 175: predicted as benign, otherwise malign). In our case this leads to an accuracy of 80% (90% if R224W is taken to be malign). However, as always, one should keep in mind that we only have 10 mutations here. Also, the potential does not correlate that well with the benign/malign state: the C282S mutation has a change of potential of only 32 (which would suggest the same structural attributes as the wildtype) but is classified as malign.
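The cutoff rule can be written down as a small sketch. It assumes that "+/-175" means the cutoff is applied to the absolute value of the change in potential; the function names are hypothetical. Only the two values quoted in the text (R224W and C282S) are used.

```python
def predict_effect(delta_potential, cutoff=175.0):
    """Classify a mutation from its change in potential vs. the wildtype.

    Assumption: the '+/-175' cutoff is applied to the absolute value, so
    |delta| < cutoff is predicted benign and everything else malign.
    """
    return "benign" if abs(delta_potential) < cutoff else "malign"


def accuracy(deltas, labels, cutoff=175.0):
    """Fraction of mutations whose prediction matches the validation label."""
    hits = sum(predict_effect(deltas[m], cutoff) == labels[m] for m in deltas)
    return hits / len(deltas)


# The only two changes in potential quoted in the text
deltas = {"R224W": 603.2, "C282S": 32.0}
print({m: predict_effect(d) for m, d in deltas.items()})
```

Under this reading R224W is predicted malign and C282S benign, which matches the two problem cases discussed above.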
In addition, we computed potential values for different force fields. It is noticeable that the amber99sb-ildn force field generates a much lower potential (~ -42000) than amber03 and charmm27 (~ -38800) (see "tables"). At the same time the amber99sb-ildn force field generates higher bond and angle values than the amber03 force field, whereas charmm27 seems to use only bond values and reaches about the same potential as amber03 that way.