Structure-based mutation analysis HEXA

From Bioinformatikpedia
Revision as of 10:40, 29 August 2011 by Link

Sequence Description

We needed a PDB file with no missing residues and a high-quality structure. We found only one PDB structure that was not bound to a ligand, so we could not select a structure based on its quality, pH value, R-factor or coverage. Nevertheless, we list these values in the following table:

Experiment type    X-ray diffraction
Resolution         2.8 Å
Temperature        100 K (-173 °C)
pH                 5.5 (slightly acidic)
R-value            0.270


It was not possible to find a file without any missing residues: every file lacked residues 74 to 89 as well as the last amino acid. We therefore decided to cut off the first 89 residues and use a PDB file containing residues 89 to 528. This file can be found [here].
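The gap described above can also be located programmatically. The following Python sketch is our own illustration, not the procedure we actually used, and its function names are hypothetical. It reads the fixed-column ATOM records of a PDB file, reports jumps in the CA residue numbering, and keeps only the residues in a chosen range such as 89 to 528.

```python
def ca_residue_numbers(pdb_lines, chain="A"):
    """Sorted residue numbers that have a CA atom in the given chain."""
    numbers = set()
    for line in pdb_lines:
        # Fixed PDB columns: atom name 13-16, chain ID 22, residue number 23-26
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21] == chain:
            numbers.add(int(line[22:26]))
    return sorted(numbers)


def find_gaps(numbers):
    """(last present, next present) residue pairs around each numbering jump."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def trim_residues(pdb_lines, start, end):
    """Keep ATOM records with start <= residue number <= end, then close with END."""
    kept = [line for line in pdb_lines
            if line.startswith("ATOM") and start <= int(line[22:26]) <= end]
    return kept + ["END"]
```

For example, `find_gaps(ca_residue_numbers(open("structure.pdb").read().splitlines()))` (with a hypothetical file name) lists every break in the chain, and `trim_residues(lines, 89, 528)` returns the records of the shortened structure.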

Mutations

Because of the shortened PDB file, we could not analyse the first two mutations, at positions 29 and 39.

SNP ID         Codon number   Amino acid change   Codon change
rs4777505      29             Asn -> Ser          AAC -> AGC
rs121907979    39             Leu -> Arg          CTT -> CGT
rs61731240     179            His -> Asp          CAT -> GAT
rs121907974    211            Phe -> Ser          TTC -> TCC
rs61747114     248            Leu -> Phe          CTT -> TTT
rs1054374      293            Ser -> Ile          AGT -> ATT
rs121907967    329            Trp -> TER          TGG -> TAG
rs1800430      399            Asn -> Asp          AAC -> GAC
rs121907982    436            Ile -> Val          ATA -> GTA
rs121907968    485            Trp -> Arg          TGG -> CGG
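As a quick consistency check of the mutation table, the following Python sketch (our own illustration) translates each codon with the standard genetic code and confirms that every listed codon change produces the listed amino acid change through a single-nucleotide substitution.

```python
# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order,
# so AMINO_ACIDS[16*i + 4*j + k] is the product of codon BASES[i]+BASES[j]+BASES[k].
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "TER",
}

# (SNP ID, wild-type aa, mutant aa, wild-type codon, mutant codon), from the table above
MUTATIONS = [
    ("rs4777505", "Asn", "Ser", "AAC", "AGC"),
    ("rs121907979", "Leu", "Arg", "CTT", "CGT"),
    ("rs61731240", "His", "Asp", "CAT", "GAT"),
    ("rs121907974", "Phe", "Ser", "TTC", "TCC"),
    ("rs61747114", "Leu", "Phe", "CTT", "TTT"),
    ("rs1054374", "Ser", "Ile", "AGT", "ATT"),
    ("rs121907967", "Trp", "TER", "TGG", "TAG"),
    ("rs1800430", "Asn", "Asp", "AAC", "GAC"),
    ("rs121907982", "Ile", "Val", "ATA", "GTA"),
    ("rs121907968", "Trp", "Arg", "TGG", "CGG"),
]


def translate(codon):
    """Three-letter code of the amino acid (or TER) encoded by a DNA codon."""
    return THREE_LETTER[CODON_TABLE[codon.upper()]]


def is_consistent(wt_aa, mut_aa, wt_codon, mut_codon):
    """True if both codons encode the listed residues and differ in exactly one base."""
    n_changes = sum(x != y for x, y in zip(wt_codon, mut_codon))
    return translate(wt_codon) == wt_aa and translate(mut_codon) == mut_aa and n_changes == 1


for snp, *fields in MUTATIONS:
    print(snp, "ok" if is_consistent(*fields) else "MISMATCH")
```

All ten rows of the table pass this check.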

Analysis of the mutations

We created a separate page for each mutation. A summary of the analysis can be found in the summary section.


Protocol - Using the methods

PyMOL

We visualized the local hydrogen-bonding network with the following commands:

# show polar contacts (mode=2) up to 3.2 Å between all atoms
distance hbonds, all, all, 3.2, mode=2
zoom resi <interval>
hide labels, all
# highlight the mutated residue
color red, resi <mutation_position>

In addition, we used PyMOL's polar contacts mode to visualize the H-bonds.

Clashes were visualized with the following commands:

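# all atom pairs (mode=0) closer than 2.0 Å between the mutated residue and the rest;
# atoms bonded across the peptide bond are also reported
distance clash, resi <mutation_position>, all and not resi <mutation_position>, 2.0, mode=0
zoom clash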
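The same clash criterion can be sketched outside PyMOL. The following Python function (a hypothetical helper for illustration) reports atom pairs closer than 2.0 Å between the mutated residue and all residues except its direct sequence neighbours, which are covalently bonded to it and would always fall below the cutoff.

```python
import math
from itertools import product


def find_clashes(atoms, resseq, cutoff=2.0):
    """Atom pairs closer than `cutoff` Å between residue `resseq` and every
    residue that is not a direct sequence neighbour.

    atoms: iterable of (residue number, atom name, (x, y, z)) tuples.
    """
    atoms = list(atoms)
    own = [a for a in atoms if a[0] == resseq]
    others = [a for a in atoms if abs(a[0] - resseq) > 1]
    clashes = []
    for (_, name1, xyz1), (res2, name2, xyz2) in product(own, others):
        d = math.dist(xyz1, xyz2)
        if d < cutoff:
            clashes.append((name1, res2, name2, round(d, 2)))
    return sorted(clashes, key=lambda c: c[3])  # closest contacts first
```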

FoldX

To run FoldX, we created a runfile, which can be found [here]. We set the temperature and pH to the values taken from the PDB entry. In addition, we analysed the mutations with a randomly chosen temperature and pH to see how much these parameters influence the results.

We ran FoldX with the following command:

FoldX -runfile run.txt > foldx_output
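Our runfile itself is not reproduced here. For FoldX's BuildModel command, mutations are commonly listed one per line in a mutation list file, written as wild-type residue, chain, residue number and mutant residue followed by ";" (for example HA179D;). Assuming this notation, chain A, and PDB residue numbers equal to the codon numbers of the mutation table, the hypothetical Python helper below builds such a list. Please check the exact file name and syntax against the manual of the FoldX version in use.

```python
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V",
}

# Missense mutations inside the truncated structure (residues 89 to 528).
# rs121907967 (Trp -> TER) introduces a stop codon and is not a substitution.
MUTATIONS = {
    "rs61731240": ("His", 179, "Asp"),
    "rs121907974": ("Phe", 211, "Ser"),
    "rs61747114": ("Leu", 248, "Phe"),
    "rs1054374": ("Ser", 293, "Ile"),
    "rs1800430": ("Asn", 399, "Asp"),
    "rs121907982": ("Ile", 436, "Val"),
    "rs121907968": ("Trp", 485, "Arg"),
}


def foldx_mutation(wt, position, mutant, chain="A"):
    """Hypothetical helper: wild-type code + chain + residue number + mutant code + ';'."""
    return f"{ONE_LETTER[wt]}{chain}{position}{ONE_LETTER[mutant]};"


def individual_list(mutations, chain="A"):
    """One single-point mutant per line, in the order of the dictionary."""
    return "\n".join(foldx_mutation(*m, chain=chain) for m in mutations.values()) + "\n"


print(individual_list(MUTATIONS), end="")
```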

minimise

Next, we used minimise. Unlike FoldX, it does not require a runfile. Unfortunately, we could not find any documentation for minimise, so it was difficult to figure out how it works and what its output means.

We ran minimise with the following command:

minimise <input> <output>

Gromacs

Before running Gromacs, we had to curate our PDB file. We used the script repairPDB to extract chain A. Next, we ran SCWRL to make sure that every residue is complete in the PDB file.
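We used repairPDB for the chain extraction. As a hedged illustration of what this step involves (not the repairPDB code itself), the following Python sketch keeps only the ATOM records of one chain, drops HETATM records such as ligands and water, and keeps only blank or "A" alternate locations so that each atom appears once.

```python
def extract_chain(pdb_lines, chain="A", keep_altloc=(" ", "A")):
    """Keep ATOM records of one chain, drop HETATM records (ligands, water)
    and alternate locations other than `keep_altloc`, then close the chain."""
    kept = [
        line for line in pdb_lines
        if line.startswith("ATOM")
        and len(line) > 21
        and line[21] == chain          # chain identifier, column 22
        and line[16] in keep_altloc    # alternate location indicator, column 17
    ]
    return kept + ["TER", "END"]
```

Side chains that are still incomplete after this step are what SCWRL rebuilds.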

We then ran the commands listed in our task section.


In addition to analysing the mutated structures, we analysed the unmutated protein with different force fields. The results of this analysis are:

Energy (kJ/mol)            AMBER03     AMBER99SB-ILDN   CHARMM27
Bond        Average        852.968     1091.57          1796.6
            Err. est.      -           270              240
            RMSD           42.0241     -nan             2924.29
            Drift          -74.0853    -1622.75         -1404.11
Angle       Average        3438.47     3326.81          4764.7
            Err. est.      -           62               60
            RMSD           16.8864     -nan             466.82
            Drift          -33.7041    404.076          368.45
Potential   Average        -50917.7    -61304.1         166.582
            Err. est.      -           960              39
            RMSD           66.4149     -nan             79.7058
            Drift          -132.636    -6402.44         280.841

We also varied the nsteps parameter. The resulting run times are shown in the following table and graph:

nsteps    real time   user time   sys time   #steps
50        8.268 s     3.860 s     0.110 s    24
500       27.523 s    47.650 s    0.540 s    321
5000      25.281 s    17.710 s    0.210 s    114
50000     14.940 s    14.210 s    0.300 s    91

Results

Energy

Mutation       FoldX energy   FoldX diff. (%)   Minimise energy   Minimise diff. (%)   Gromacs energy   Gromacs diff. (%)
wild type      -154.17        0                 -9610.467157      0                    -61304.1         0
rs61731240     -151.61        1.56              -9480.968602      1.35                 -48160.5         21.44
rs121907974    -144.25        6.34              -9594.637506      0.16                 -46177.4         24.68
rs61747114     -153.78        0.15              -9606.588566      0.04                 -48802.5         20.40
rs1054374      -152.15        1.21              -6189.246312      35.60                -48652.6         20.64
rs121907967    -              -                 -                 -                    -                -
rs1800430      -155.11        -0.72             -9505.864181      1.09                 -48693.9         20.57
rs121907982    -154.36        -0.23             -9618.062763      -0.08                -48418.9         21.02
rs121907968    -149.08        3.20              -9608.663976      0.02                 -49443.8         19.35
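The "diff." (ratio) columns can be read as the relative energy difference between mutant and wild type in percent. The formula below is a reconstruction from the tabulated values rather than one stated elsewhere on this page; it reproduces, for example, the Minimise and Gromacs columns up to rounding in the last digit.

```python
def relative_difference(e_wildtype, e_mutant):
    """Energy change of the mutant relative to the wild type, in percent.

    Positive values mean the mutant energy is higher (less negative) than
    the wild-type energy; negative values mean it is lower.
    """
    return (e_mutant - e_wildtype) / abs(e_wildtype) * 100


MINIMISE_WT = -9610.467157
MINIMISE = {
    "rs61731240": -9480.968602,
    "rs1054374": -6189.246312,
    "rs121907982": -9618.062763,
}

for snp, energy in MINIMISE.items():
    print(f"{snp}: {relative_difference(MINIMISE_WT, energy):.2f} %")
```

This prints 1.35 %, 35.60 % and -0.08 %, matching the Minimise column for these three mutations.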

FoldX

There are energy differences between the mutated structures and the wild-type structure, but in general they are small. The largest differences are found for rs121907974 and rs121907968, whose mutated structures have about 6% and 3% higher energy than the wild type. The smallest difference is between the wild type and rs61747114, at only 0.15%, which is nearly 0. Two mutated structures (rs1800430 and rs121907982) have lower energy than the wild type, but here too the differences are very small (0.72% and 0.23%), so the energy alone gives no indication that these mutations damage the protein.

Minimise

Minimise yields lower energy values than FoldX, but the absolute values are not comparable because the two methods are based on different calculation models. Here, rs1054374 shows the largest energy difference, about 35%, which is very high. The smallest differences are found for rs61747114 (0.04%) and rs121907968 (0.02%). One structure (rs121907982) has a lower energy than the wild type, but the difference is very small (-0.08%). Apart from rs1054374, all mutated structures differ from the wild type by less than 1.5%.

Gromacs

Gromacs yields the lowest energy values, but, as before, these are not comparable with the other methods because of the different calculation models. Here there is a strong difference between wild type and mutants, about 20% on average. Moreover, in contrast to FoldX and Minimise, the differences between the individual mutations are small. The smallest difference is found for rs121907968 (19.35%) and the largest for rs121907974 (24.68%). This is consistent with FoldX, where rs121907974 also shows the largest difference between wild type and mutant.

Discussion

In this section we want to discuss the results of the structure-based mutation analysis.

General

Interestingly, Gromacs does not calculate a lower energy than the wild type for any mutated structure. FoldX and Minimise mostly show the same trend, with two exceptions. For rs1054374, FoldX gives a low ratio (1.21%), whereas Minimise gives a very high one (35.60%). For rs1800430, Minimise calculates a higher energy than the wild type (1.09%), whereas FoldX predicts a lower one (-0.72%). To compare the predictions of the different programs, we plotted their ratios in the following figure.

Visualisation of the ratio of the different programs. (FoldX: orange, Minimise: purple, Gromacs: grey)

As the figure shows, apart from some exceptions, all three programs predict the same energy trend when we compare the ratios rather than the absolute values. Therefore, where there is a strong difference between wild type and mutant, the three methods mostly agree, and we conclude that these mutations damage the protein structure and function.


Bonds

Mutation       Position   H-bonds (wild type)   H-bonds (mutant)
rs61731240     179        none                  none
rs121907974    211        none                  none
rs61747114     248        none                  none
rs1054374      293        two                   none
rs121907967    329        -                     -
rs1800430      399        none                  none
rs121907982    436        none                  none
rs121907968    485        one                   one (different location)

In addition to the energy calculations, we examined the H-bonds between the residue at the mutated position and the rest of the protein. In most cases this residue forms no H-bonds; only two positions do. In the first case (rs1054374), the wild-type residue forms two H-bonds, whereas the mutated residue forms none. We therefore suggest that this mutation impairs the protein function. In the second case (rs121907968), both the wild type and the mutant form one H-bond, but the mutant's H-bond is at a different location. For this reason we classified this mutation as damaging as well.

Conclusion

We now decide for each mutation whether it damages the protein, based on the H-bond analysis and the energy values. If two methods predict the same trend and the third predicts a different one, we follow the majority. We classify a mutation as damaging if its H-bonds change or if the relative energy difference (ratio) between wild type and mutant is greater than 1%.

Mutation       Damaging   Decision based on
rs61731240     yes        FoldX, Minimise, Gromacs
rs121907974    yes        FoldX, Gromacs
rs61747114     no         FoldX, Minimise
rs1054374      yes        Minimise, Gromacs, H-bonds
rs121907967    yes        shorter chain
rs1800430      yes        Minimise, Gromacs
rs121907982    no         FoldX, Minimise
rs121907968    yes        FoldX, Gromacs, H-bonds
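One way to express the decision rule stated above (a majority vote of the three energy methods with a 1% threshold, overridden by a change in the H-bonds) is the following Python sketch. The data structures are our own; the ratios are copied from the energy table, and rs121907967 is left out because it was classified by its truncated chain. It reproduces the yes/no column of the table above.

```python
# Relative energy differences in percent (FoldX, Minimise, Gromacs),
# copied from the energy table.
ENERGY_RATIOS = {
    "rs61731240": (1.56, 1.35, 21.44),
    "rs121907974": (6.34, 0.16, 24.68),
    "rs61747114": (0.15, 0.04, 20.40),
    "rs1054374": (1.21, 35.60, 20.64),
    "rs1800430": (-0.72, 1.09, 20.57),
    "rs121907982": (-0.23, -0.08, 21.02),
    "rs121907968": (3.20, 0.02, 19.35),
}
HBOND_CHANGED = {"rs1054374", "rs121907968"}  # from the H-bond table
THRESHOLD = 1.0  # percent


def is_damaging(ratios, hbonds_changed, threshold=THRESHOLD):
    """Damaging if the H-bonds change, or if a majority of the methods
    report a relative energy difference above `threshold` percent."""
    if hbonds_changed:
        return True
    votes = sum(r > threshold for r in ratios)
    return votes > len(ratios) / 2


for snp, ratios in ENERGY_RATIOS.items():
    print(snp, "yes" if is_damaging(ratios, snp in HBOND_CHANGED) else "no")
```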

Since it is already known whether each mutation is damaging, we compared our conclusions with the database annotations.

Mutation       Database       Effect        Prediction    Comparison
rs61731240     SNP-DB         neutral       non-neutral   wrong
rs121907974    HGMD, SNP-DB   non-neutral   non-neutral   right
rs61747114     SNP-DB         neutral       neutral       right
rs1054374      SNP-DB         neutral       non-neutral   wrong
rs121907967    HGMD, SNP-DB   non-neutral   non-neutral   right
rs1800430      SNP-DB         neutral       non-neutral   wrong
rs121907982    SNP-DB         neutral       neutral       right
rs121907968    SNP-DB, HGMD   non-neutral   non-neutral   right

The last table shows that 5 of our 8 predictions (62.5%) are right and 3 (37.5%) are wrong; all three errors are neutral variants that we predicted as non-neutral. This is a poor result, and it shows that analysing the energy of a mutated structure alone is not sufficient to decide whether a mutation is neutral or non-neutral.
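The counts in the paragraph above follow directly from the comparison table. The short Python sketch below (our own illustration) recomputes them.

```python
# (database annotation, our prediction) for each mutation, from the table above
COMPARISON = {
    "rs61731240": ("neutral", "non-neutral"),
    "rs121907974": ("non-neutral", "non-neutral"),
    "rs61747114": ("neutral", "neutral"),
    "rs1054374": ("neutral", "non-neutral"),
    "rs121907967": ("non-neutral", "non-neutral"),
    "rs1800430": ("neutral", "non-neutral"),
    "rs121907982": ("neutral", "neutral"),
    "rs121907968": ("non-neutral", "non-neutral"),
}

correct = sum(db == pred for db, pred in COMPARISON.values())
false_positives = sum(db == "neutral" and pred == "non-neutral"
                      for db, pred in COMPARISON.values())
accuracy = correct / len(COMPARISON)
print(f"{correct}/{len(COMPARISON)} correct ({accuracy:.1%}), "
      f"{false_positives} neutral variants predicted as non-neutral")
```

It prints "5/8 correct (62.5%), 3 neutral variants predicted as non-neutral".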