Structure-based mutation analysis

From Bioinformatikpedia

by Robert Greil and Cedric Landerer

General

According to the UniProt entry for HFE_HUMAN, three 3D structures of the protein are available; they are listed below. We chose '1A6Z' because it has the best resolution, a very good R-value (a measure of how well the model agrees with the crystallographic data), a pH near the physiological optimum, and is almost complete. '1DE4' has a slightly better R-value and pH, but this PDB entry also includes the transferrin receptor, which we neither need nor want in our structure. In addition, the missing residues of chain A are the same in both structures, namely only the first three positions. '1C42' is only a hypothetical model, so we exclude it from further analysis.

Figure 1: stereochemical properties of 1a6z

All stereochemical properties of the structure are shown in Figure 1.<ref>Lebrón JA, Bennett MJ, Vaughn DE, Chirino AJ, Snow PM, Mintier GA, Feder JN, Bjorkman PJ.: Crystal structure of the hemochromatosis protein HFE and characterization of its interaction with transferrin receptor.</ref>

PDB | Method | Resolution (Å) | Chain | R-Value | R-Free | pH | Temperature | Completeness (%) | Missing residues (Chain:pos)
1A6Z | X-ray | 2.60 | A/C | 0.233 | 0.277 | 6.5 | -150.15 °C (123 K) | 98.0 | A:1-3; C:1-3
1C42 | model | - | A | - | - | - | - | - | -
1DE4 | X-ray | 2.80 | A/D/G | 0.231 | 0.265 | 8.0 | -173.15 °C (100 K) | 94.3 | A:1-3; C:121,757-760; D:1-3; F:121,757-760; G:1-3; I:121,757-760

Of the 10 mutations from Task 6, we used the following 7; the other 3 could not be mapped to chain A of 1A6Z, mostly because of position errors. We use the same names as in Task 6. Mutation 8 had to be split into two parts (8a and 8b) because position 282 can mutate into two different amino acids.

Mutation
Mutation 2 [M35T]
Mutation 3 [S65C]
Mutation 4 [I105T]
Mutation 5 [Q127H]
Mutation 6 [A176V]
Mutation 7 [T217I]
Mutation 8a [C282Y]
Mutation 8b [C282S]

Mapping

Because we have no annotation of the active site, we simply visualized the mutations on the '1A6Z' structure in Figure 2. We use the same mutations as in Task 6 and therefore face the same visualization issue (only 7 of 10 can be visualized).

The mutations are scattered across the protein with no apparent preference for a particular region or secondary structure element. Mutations shown in red are near glycosylation sites and mutations shown in yellow are near disulfide bonds (Figure 2). The mutations are shown in detail in the table below.

Figure 2: 1A6ZA with 7 of 10 visualized mutations taken from task 6
[Image table: for each mutation, a mutant view and a superposition view, each modeled once with PyMOL and once with SCWRL.]
Mutation | UniProt position | PDB position (1A6Z chain A)
Mutation 2 [M35T] | 35 | 13
Mutation 3 [S65C] | 65 | 43
Mutation 4 [I105T] | 105 | 83
Mutation 5 [Q127H] | 127 | 105
Mutation 6 [A176V] | 176 | 154
Mutation 7 [T217I] | 217 | 195
Mutation 8a [C282Y] | 282 | 260
Mutation 8b [C282S] | 282 | 260

The structure shown in grey is the reference structure, the amino acid shown in green is the unmutated residue and the one shown in red is the mutated residue. It is clearly visible that the side-chain rotamers chosen by SCWRL often differ strongly from those introduced by PyMOL. PyMOL only changes the side chain of the mutated amino acid, whereas SCWRL also recalculates side chains of non-mutated amino acids when they would otherwise clash. SCWRL mostly introduces small directional changes to avoid these clashes. We believe the rotamers chosen by SCWRL are the more accurate ones, because SCWRL also checks for possible clashes, whereas PyMOL is primarily a visualization tool.
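The UniProt and PDB positions given for each mutation above differ by the same amount in all seven cases. The following Python sketch makes that mapping explicit. The constant offset of 22 is inferred from those position pairs only, and the names `UNIPROT_TO_PDB_OFFSET`, `uniprot_to_pdb` and `parse_mutation` are illustrative and not part of any tool used here.

```python
# Offset inferred from the position pairs UniProt 35 / PDB 13 ... UniProt 282 / PDB 260.
# Assumption: the offset is constant along the whole chain A of 1A6Z.
UNIPROT_TO_PDB_OFFSET = 22

MUTATIONS = ["M35T", "S65C", "I105T", "Q127H", "A176V", "T217I", "C282Y", "C282S"]


def uniprot_to_pdb(position: int, offset: int = UNIPROT_TO_PDB_OFFSET) -> int:
    """Convert a UniProt residue number to the residue number used in the PDB file."""
    pdb_position = position - offset
    if pdb_position < 1:
        raise ValueError(f"UniProt position {position} lies before the start of the chain")
    return pdb_position


def parse_mutation(name: str) -> tuple[str, int, str]:
    """Split a name such as 'C282Y' into (wild-type residue, position, mutant residue)."""
    return name[0], int(name[1:-1]), name[-1]


for name in MUTATIONS:
    wild_type, position, mutant = parse_mutation(name)
    print(f"{name}: UniProt {position} -> PDB {uniprot_to_pdb(position)}")
```

Keeping the conversion in one place makes it easier to spot position errors of the kind that prevented three mutations from being mapped.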

Energy comparison

SCWRL

SCWRL predicts protein side-chain conformations given a fixed backbone. We are using SCWRL4<ref>Georgii G. Krivov, Maxim V. Shapovalov, and Roland L. Dunbrack, Jr.: Improved prediction of protein side-chain conformations with SCWRL4</ref>, released in 2009.

Usage:

  • use only chain A of the backbone PDB file: 1A6ZA.pdb
  • extract the amino acid sequence and convert it to lowercase: aa.txt
  • introduce each mutation as a capital letter into a separate file aa_x.txt
    • cmd: scwrl -i 1A6ZA.pdb -s aa_x.txt -o ./mutant_pdbs/1A6ZA_mutant_x.pdb > 1A6ZA_mutant_x.txt
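
The sequence preparation in the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not necessarily the procedure used for the results below: the function names, the toy sequence and the 0-based `index` are hypothetical. The index refers to the position within the sequence extracted from 1A6ZA.pdb, not to the UniProt number.

```python
from pathlib import Path


def make_mutant_sequence(wild_type: str, index: int, new_residue: str) -> str:
    """Return the sequence in lowercase with only the mutated residue in uppercase."""
    if not 0 <= index < len(wild_type):
        raise IndexError("index is outside the sequence")
    if len(new_residue) != 1 or not new_residue.isalpha():
        raise ValueError("new_residue must be a single one-letter amino acid code")
    sequence = wild_type.lower()
    return sequence[:index] + new_residue.upper() + sequence[index + 1:]


def write_sequence_file(path: str, sequence: str) -> None:
    """Write one sequence file, e.g. aa_x.txt (the file name scheme is taken from the steps above)."""
    Path(path).write_text(sequence + "\n")


# Toy example with a made-up five-residue sequence (not the HFE sequence).
toy_sequence = "MCSAQ"
print(make_mutant_sequence(toy_sequence, 1, "Y"))  # prints "mYsaq"
```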

Results:

Mutation | Position | Energy | Energy normalized
Reference | -- | 247.944 | 1
Mutation 2 [M35T] | 35 | 252.324 | 1.017665279
Mutation 3 [S65C] | 65 | 246.695 | 0.994962572
Mutation 4 [I105T] | 105 | 250.833 | 1.011651825
Mutation 5 [Q127H] | 127 | 252.368 | 1.017842739
Mutation 6 [A176V] | 176 | 280.381 | 1.130823896
Mutation 7 [T217I] | 217 | 260.189 | 1.049386152
Mutation 8a [C282Y] | 282 | 389.539 | 1.571076533
Mutation 8b [C282S] | 282 | 255.859 | 1.031922531
  • The energy is normalized by the wild-type structure. A value larger than 1 means that the energy is increased compared to the wild-type. A value smaller than 1 shows a decreased energy.
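
The normalization described above is a plain ratio to the wild-type energy. A minimal Python sketch using the SCWRL energies from the table; the names `normalize` and `scwrl_energies` are illustrative.

```python
def normalize(energy: float, reference: float) -> float:
    """Divide a mutant energy by the wild-type (reference) energy."""
    if reference == 0:
        raise ValueError("reference energy must not be zero")
    return energy / reference


scwrl_reference = 247.944
scwrl_energies = {
    "M35T": 252.324,
    "S65C": 246.695,
    "I105T": 250.833,
    "Q127H": 252.368,
    "A176V": 280.381,
    "T217I": 260.189,
    "C282Y": 389.539,
    "C282S": 255.859,
}

for name, energy in scwrl_energies.items():
    print(f"{name}: {normalize(energy, scwrl_reference):.6f}")
```

The reading "larger than 1 means increased energy" only holds as long as the reference energy is positive, as it is for SCWRL.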

Only mutation 3 shows a decreased energy level, which means that this mutation is favored and could occur more often. All mutations except 8a have a normalized energy close to 1, which suggests that they should occur in the population about as often as the reference. It is therefore very surprising that mutation 8a shows by far the highest increase in energy, although it is the mutation that causes most hemochromatosis cases.

Minimise

Minimise is able to minimise the energy of a model.

Usage:

  • remove all hydrogen atoms and water molecules from the PDB files with repairPDB
    • cmd: repairPDB 1A6ZA_mutant_x.pdb -nosol > ./repair_pdb/1A6ZA_mutant_x_clean.pdb
  • minimise the energy of the models:
    • cmd: minimise 1A6ZA_mutant_x_clean.pdb ./minimised_pdb/1A6ZA_mutant_x_clean_minimised.pdb > 1A6ZA_mutant_x_clean_minimised.txt
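
For several mutants, the two commands above can be generated from a label rather than typed by hand. The following Python sketch only builds the command strings and does not run anything. It uses the file name pattern and the options shown above; the function name `build_commands`, the labels "2" to "8b", and the choice to pass the cleaned file with its ./repair_pdb/ path to minimise are assumptions.

```python
import shlex


def build_commands(mutant: str) -> list[str]:
    """Return the repairPDB and minimise shell commands for one mutant label."""
    raw = f"1A6ZA_mutant_{mutant}.pdb"
    clean = f"./repair_pdb/1A6ZA_mutant_{mutant}_clean.pdb"
    minimised = f"./minimised_pdb/1A6ZA_mutant_{mutant}_clean_minimised.pdb"
    log = f"1A6ZA_mutant_{mutant}_clean_minimised.txt"
    return [
        # Step 1: strip hydrogen atoms and water, as in the first command above.
        f"repairPDB {shlex.quote(raw)} -nosol > {shlex.quote(clean)}",
        # Step 2: minimise the cleaned model and keep the program output as a log.
        f"minimise {shlex.quote(clean)} {shlex.quote(minimised)} > {shlex.quote(log)}",
    ]


# Hypothetical labels following the mutation names used in this task.
for label in ["2", "3", "4", "5", "6", "7", "8a", "8b"]:
    for command in build_commands(label):
        print(command)
```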

Results:

Mutation | Position | Energy | Energy normalized
Reference | -- | -3724.153777 | 1
Mutation 2 [M35T] | 35 | -5020.465319 | 1.348082174
Mutation 3 [S65C] | 65 | -5040.815685 | 1.353546601
Mutation 4 [I105T] | 105 | -5028.869826 | 1.35033893
Mutation 5 [Q127H] | 127 | -5031.137220 | 1.350947765
Mutation 6 [A176V] | 176 | -4957.946411 | 1.331294761
Mutation 7 [T217I] | 217 | -5037.718631 | 1.352714988
Mutation 8a [C282Y] | 282 | -2596.778899 | 0.697280256
Mutation 8b [C282S] | 282 | -5017.057355 | 1.347167076
  • The energy is normalized by the wild-type structure. Because all energies are negative here, a value larger than 1 means that the energy is more negative, i.e. decreased, compared to the wild-type. A value smaller than 1 means a less negative, i.e. increased, energy.

The result has to be read with this sign in mind. All mutations except 8a reach a markedly lower energy than the reference structure (normalized values of about 1.33 to 1.35), whereas mutation 8a is the only one with a clearly higher energy (0.70). As with SCWRL, mutation 8a is thus singled out as the energetically least favorable mutant. It is not clear why mutation 8a deviates this strongly from all other mutants, and the value may be an artifact of Minimise.
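
Because the Minimise energies are negative, the plain ratio to the wild type is easy to misread: a ratio above 1 then corresponds to a more negative, i.e. lower, energy. A signed relative change avoids this ambiguity. The following Python sketch is one possible alternative measure, not the one used in the table; the name `relative_change` is illustrative.

```python
def relative_change(energy: float, reference: float) -> float:
    """Return (E_mutant - E_reference) / |E_reference|.

    A positive value means a higher energy than the reference, a negative value
    a lower energy, regardless of the sign of the reference energy.
    """
    if reference == 0:
        raise ValueError("reference energy must not be zero")
    return (energy - reference) / abs(reference)


minimise_reference = -3724.153777
minimise_energies = {"M35T": -5020.465319, "C282Y": -2596.778899}

for name, energy in minimise_energies.items():
    change = relative_change(energy, minimise_reference)
    direction = "higher" if change > 0 else "lower"
    print(f"{name}: {change:+.3f} ({direction} energy than the reference)")
```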

FoldX

FoldX<ref>Joost Schymkowitz, Jesper Borg, Francois Stricher, Robby Nys, Frederic Rousseau, and Luis Serrano: The FoldX web server: an online force field</ref> scores how much the amino acid interactions contribute to the overall stability of the protein and calculates its energy.

Usage:

  • create a run file following the tutorial and set the default parameters to known values where possible: runfile.txt
  • create a list file of all PDB files that should be included in the energy calculation: listfile.txt
  • run FoldX with the run file
    • cmd: Foldx -runfile runfile.txt > output.txt

Results:

Mutation | Position | Energy | Energy normalized
Reference | -- | 169.51 | 1
Mutation 2 [M35T] | 35 | 208.08 | 1.227538198
Mutation 3 [S65C] | 65 | 206.66 | 1.219161111
Mutation 4 [I105T] | 105 | 210.39 | 1.241165713
Mutation 5 [Q127H] | 127 | 205.04 | 1.209604153
Mutation 6 [A176V] | 176 | 214.31 | 1.264291192
Mutation 7 [T217I] | 217 | 208.15 | 1.227951153
Mutation 8a [C282Y] | 282 | 242.23 | 1.429001239
Mutation 8b [C282S] | 282 | 215.50 | 1.271311427
  • The energy is normalized by the wild-type structure. A value larger than 1 means that the energy is increased compared to the wild-type. A value smaller than 1 shows a decreased energy.

The result seems to be a mixture of the results of SCWRL and Minimise. All mutations have an increased energy level, which means they should not occur as often as the reference. But again mutation 8a has the highest energy level, which points to some unusual behavior associated with this mutation.

Gromacs

Gromacs<ref>David Van Der Spoel, Erik Lindahl, Berk Hess, Gerrit Groenhof, Alan E. Mark, Herman J. C. Berendsen: GROMACS: Fast, flexible, and free</ref> is a software suite for molecular dynamics simulations, developed at the University of Groningen in the early 1990s.

  • We used the -ignh option to ignore all hydrogen atoms.
  • As force fields, we chose AMBER03, CHARMM27 and AMBERGS; the corresponding energy curves are shown below in Figures 3 to 5, and Figure 6 compares the run times.

Energy table for the AMBER03 force field (energies in kJ/mol; normalized values are the ratio to the wild type; n/a: Gromacs failed to calculate energies for this mutation).

Mutation | Bond | Bond (normalized) | Angle | Angle (normalized) | Potential | Potential (normalized)
Wild-Type | 848.392 | 1 | 2707.42 | 1 | -32380.6 | 1
[M35T] | 774.332 | 0.912705447 | 2707.35 | 0.999974145 | -32738 | 1.011037473
[S65C] | 695.044 | 0.819248649 | 2699.65 | 0.997130109 | -32872.2 | 1.01518193
[I105T] | n/a | n/a | n/a | n/a | n/a | n/a
[Q127H] | 835.05 | 0.984273779 | 2782.21 | 1.027624085 | -32213.2 | 0.994830238
[A176V] | 760.573 | 0.896487709 | 2734.66 | 1.010061239 | -33030.1 | 1.020058307
[T217I] | 727.061 | 0.8569871 | 2712.54 | 1.001891099 | -32609.5 | 1.007069048
[C282Y] | 851.692 | 1.003889711 | 2754.62 | 1.017433571 | -31431.1 | 0.970676887
[C282S] | 852.244 | 1.004540354 | 2706.07 | 0.99950137 | -32036.4 | 0.989370178
Figure 3: Energy curve of the AMBER03 forcefield with nstep = 500
Figure 4: Energy curve of the CHARMM27 forcefield with nstep = 500
Figure 5: Energy curve of the AMBERGS forcefield with nstep = 500
Figure 6: time versus nstep plot of the three different forcefields


Wild-Type force field comparison

Force field | Bond | Angle | Potential
AMBER03 | 848.392 | 2707.42 | -32380.6
CHARMM27 | 1064.95 | --- | -37356.1
AMBERGS | 724.545 | 2785.47 | -40390.8

Energy for the Wild-Type

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 848.392 380 -nan -2320.49
Angle 2707.42 22 -nan -96.9545
Potential -32380.6 1200 -nan -7696.01

Energy for the Mutation [M35T]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 774.332 300 2156.92 -1864.67
Angle 2707.35 16 130.232 -51.7455
Potential -32738 1100 3905.23 -7119.34

Energy for the Mutation [S65C]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 695.044 230 1873.95 -1383.66
Angle 2699.65 9.8 113.651 -42.4234
Potential -32872.2 890 3424.15 -5775.47

Energy for the Mutation [I105T]
For this mutation, Gromacs failed to calculate energies.

Energy for the Mutation [Q127H]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 835.05 370 -nan -2189.82
Angle 2782.21 20 -nan -94.6097
Potential -32213.2 1100 -nan -7254.92


Energy for the Mutation [A176V]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 760.573 290 -nan -1676.2
Angle 2734.66 20 -nan -125.203
Potential -33030.1 1000 -nan -6485.15

Energy for the Mutation [T217I]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 727.061 260 1980.17 -1567.84
Angle 2712.54 12 119.206 -50.0057
Potential -32609.5 980 3633.19 -6435.18


Energy for the Mutation [C282Y]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 851.692 370 2500.48 -2249.54
Angle 2754.62 25 158.776 -152.741
Potential -31431.1 2100 16296.8 -13588.8


Energy for the Mutation [C282S]

Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 852.244 380 2424.32 -2374.34
Angle 2706.07 24 145.766 -110.811
Potential -32036.4 1200 4277.91 -7896.74
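
The per-mutation energy blocks above all share the same five-column layout, so they can be read programmatically. The following Python sketch parses such a block, assuming whitespace-separated columns as shown above; the names `EnergyTerm` and `parse_energy_rows` are illustrative. Entries reported as `-nan` are kept as NaN rather than dropped.

```python
import math
from typing import NamedTuple


class EnergyTerm(NamedTuple):
    name: str
    average: float
    err_est: float
    rmsd: float
    tot_drift: float


def parse_energy_rows(text: str) -> dict[str, EnergyTerm]:
    """Parse rows of the form 'Bond 848.392 380 -nan -2320.49' into EnergyTerm records."""
    terms = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skips blank lines and the six-field header line
        name, *values = fields
        try:
            numbers = [float(value) for value in values]  # float() also accepts "-nan"
        except ValueError:
            continue  # not a data row
        terms[name] = EnergyTerm(name, *numbers)
    return terms


wild_type_block = """\
Energy Average Err.Est. RMSD Tot-Drift (kJ/mol)
Bond 848.392 380 -nan -2320.49
Angle 2707.42 22 -nan -96.9545
Potential -32380.6 1200 -nan -7696.01
"""

wild_type = parse_energy_rows(wild_type_block)
for term in wild_type.values():
    rmsd = "not available" if math.isnan(term.rmsd) else f"{term.rmsd:g}"
    print(f"{term.name}: average {term.average:g} kJ/mol, RMSD {rmsd}")
```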

Discussion

General
In most cases, the energy level changes only slightly, so the loss of function is not due to a stabilization or destabilization of the structure, but rather to changed chemical properties at the surface and in functional regions. The flexibility is therefore most likely not affected. Only Mutation 8a behaves differently and in most cases deviates from the average energy change. This is not surprising, since this mutation is the most common cause of hemochromatosis<ref>Feder J.N. A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis.</ref>. As HFE forms a complex to regulate the transferrin receptor, a change in the stereochemical properties at the surface will change the binding affinity.

Mutation 2 [M35T]
This mutation is non-damaging. The energy differs only slightly from the wild type in all cases, and the normalized energy (for Gromacs, the potential energy) is above 1 for every tool. Methionine has a hydrophobic thioether side chain, whereas threonine carries a hydroxyl group that can form a hydrogen bond. The mutated amino acid can thus easily be stabilized by the surrounding amino acids. Because the mutation lies within a beta sheet, the hydrogen bond can additionally stabilize the sheet. As the mutation is non-damaging, methionine at position 35 is probably not a functional residue.

Mutation 3 [S65C]
In this case, serine is changed to cysteine. In contrast to Mutation 2, an amino acid that can form a hydrogen bond is replaced by one that can form a disulfide bond. Because the mutation is also part of a beta sheet, the missing hydrogen bond destabilizes the structure. SCWRL scores this with a slight decrease and the other tools with an increase of the normalized energy. Because the mutation lies at the end of a beta sheet, it is possible that the sheet is merely shortened and the adjacent turn lengthened. The mutation causes a malfunction of the protein, most likely by damaging the assumed transferrin receptor binding site.

Mutation 4 [I105T]
This mutation is part of a helix. Although it is primarily a change in side-chain size, it damages the structure of the helix: the helix is bent, so the parallel helices are no longer parallel. Since we assume that this is a functional component such as a binding site, the function of the protein is disturbed. Because Gromacs was not able to calculate the energy for this mutation, our assumptions are based only on FoldX, SCWRL and Minimise. The reason why Gromacs was not able to calculate the energy is still unclear.

Mutation 5 [Q127H]
A damaging mutation within a loop. Here a polar amino acid is replaced by an aromatic one. Gromacs is the only tool that yields a normalized potential energy below 1; the other tools yield values above 1. As this mutation is damaging but does not lie within a secondary structure element, the change in function is most likely caused either by the loss of a functional residue or by a change in flexibility. As we have no information about functional residues, we assume that a change in flexibility is the reason for the malfunction.

Mutation 6 [A176V]
This is also a damaging mutation, located at the beginning of a helix. Here two amino acids with similar properties are exchanged. The surface of the protein is changed in a possible functional region, which may cause the functional damage. The energy level also increases, but, as in most cases, only slightly. We therefore assume that the stability and the flexibility of the protein are not changed.

Mutation 7 [T217I]
This non-damaging mutation is at the beginning of a loop. The energy level is only slightly increased, and we see no change in structure. So it is not surprising that this mutation does not cause a malfunction. The flexibility is not affected, and since the function is not affected either, the position is functionally unimportant.

Mutation 8a [C282Y]
Here a cysteine is replaced by a tyrosine. Judging from the energy levels, the additional hydrogen bond further stabilizes the beta sheet, so the flexibility in this region is changed. As this is the most common mutation, the residue could be a functional one. Since a mutation at the neighboring position 283 prevents the normal interaction between HFE and B2M and between HFE and TFRC<ref>Le Gac G. et al. Phenotypic expression of the C282Y/Q283P compound heterozygosity in HFE and molecular modeling of the Q283P mutation effect.</ref>, position 282 could also be important for this interaction.

Mutation 8b [C282S]
In this case, a cysteine is replaced by a serine. In general, a possible disulfide bond is replaced by a hydrogen bond, which also stabilizes the protein. As this mutation is also damaging, the position might be important as well, especially for the interaction with B2M and TFRC. As in most cases, the energy level does not differ much from the mean deviation. We therefore assume that the damaging effect is also due to a negative effect on protein-protein interactions.

References

<references />