Structure-based mutation analysis HEXA
Sequence Description
We needed a [PDB] file that has no missing residues and a high structural quality. We found only one PDB structure that was not bound to a ligand, so we could not select by quality, pH value, R-factor or coverage. Nevertheless, the following table lists these values:
experiment type | X-Ray diffraction |
Resolution | 2.8 Å |
temperature (Kelvin) | 100K |
temperature (Celsius) | -173 °C |
pH-Value | 5.5 (slightly acidic) |
R-Value | 0.270 |
It was not possible to find a file without any missing residues. In each file there was a gap between residues 74 and 89, and the last amino acid was missing as well. Therefore, we decided to cut off the N-terminal part up to the gap and use a PDB file with the structure of residues 89 - 528. This file can be found [here].
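The trimming step can be sketched in a few lines of Python. This is a minimal sketch and not the script we actually used: it assumes a standard fixed-column PDB file, and the names `trim_pdb_lines` and `atom_line` as well as the example records are hypothetical.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build a minimal ATOM record in fixed-column PDB layout (illustration only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def trim_pdb_lines(lines, first=89, last=528, chain="A"):
    """Keep only ATOM records of one chain with residue number in [first, last].

    Relies on the PDB column layout: chain ID in column 22 and the
    residue sequence number in columns 23-26 (1-based columns).
    """
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # drop everything that is not an ATOM record
        if line[21] != chain:
            continue  # wrong chain
        if first <= int(line[22:26]) <= last:
            kept.append(line)
    return kept


# Hypothetical records: residues 1 and 529 and chain B are removed.
example = [
    atom_line(1, "CA", "MET", "A", 1),
    atom_line(2, "CA", "LEU", "A", 89),
    atom_line(3, "CA", "GLN", "A", 528),
    atom_line(4, "CA", "THR", "A", 529),
    atom_line(5, "CA", "LEU", "B", 100),
]
trimmed = trim_pdb_lines(example)
print(len(trimmed), "records kept")
```

Filtering on the residue number rather than on line counts keeps the original numbering, so the mutation positions in the tables still refer to the same residues.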
Back to [Tay-Sachs Disease]
Mutations
Because the PDB file was shortened, we could not analyse the first two mutations at positions 29 and 39.
SNP-id | codon number | amino acid change | triplet change |
rs4777505 | 29 | Asn -> Ser | AAC -> AGC |
rs121907979 | 39 | Leu -> Arg | CTT -> CGT |
rs61731240 | 179 | His -> Asp | CAT -> GAT |
rs121907974 | 211 | Phe -> Ser | TTC -> TCC |
rs61747114 | 248 | Leu -> Phe | CTT -> TTT |
rs1054374 | 293 | Ser -> Ile | AGT -> ATT |
rs121907967 | 329 | Trp -> TER | TGG -> TAG |
rs1800430 | 399 | Asn -> Asp | AAC -> GAC |
rs1800431 | 436 | Ile -> Val | ATA -> GTA |
rs121907968 | 485 | Trp -> Arg | TGG -> CGG |
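The amino acid changes in the table can be cross-checked against the triplet changes with the standard genetic code. The following is a small sketch for that check; the helper name `amino_acid_change` is hypothetical, and the codon table is built from the usual TCAG ordering of the standard code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Three-letter names for the residues that occur in the mutation table.
THREE_LETTER = {
    "N": "Asn", "S": "Ser", "L": "Leu", "R": "Arg", "H": "His", "D": "Asp",
    "F": "Phe", "I": "Ile", "W": "Trp", "V": "Val", "*": "TER",
}

# (SNP id, codon number, wildtype triplet, mutated triplet) from the table.
MUTATIONS = [
    ("rs4777505", 29, "AAC", "AGC"),
    ("rs121907979", 39, "CTT", "CGT"),
    ("rs61731240", 179, "CAT", "GAT"),
    ("rs121907974", 211, "TTC", "TCC"),
    ("rs61747114", 248, "CTT", "TTT"),
    ("rs1054374", 293, "AGT", "ATT"),
    ("rs121907967", 329, "TGG", "TAG"),
    ("rs1800430", 399, "AAC", "GAC"),
    ("rs1800431", 436, "ATA", "GTA"),
    ("rs121907968", 485, "TGG", "CGG"),
]


def amino_acid_change(wt_triplet, mut_triplet):
    """Translate both triplets and return a string such as 'Asn -> Ser'."""
    wt = THREE_LETTER[CODON_TABLE[wt_triplet]]
    mut = THREE_LETTER[CODON_TABLE[mut_triplet]]
    return f"{wt} -> {mut}"


for snp, codon, wt, mut in MUTATIONS:
    print(snp, codon, amino_acid_change(wt, mut))
```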
Analysis of the mutations
We created a separate page for each mutation. The summary of the analysis can be seen in the Summary Section.
Protocol - Using the methods
PyMol
We visualized the local hydrogen-bonding network with the following commands:
distance hbonds, all, all, 3.2, mode=2
zoom resi <interval>
hide labels, all
color red, resi <mutation_position>
Furthermore, we also used the polar contact mode in PyMol to visualize the H-bonds.
The clashes are visualized by the following commands:
distance clash = pos_mutation, all, 2.0, 0
zoom clash
FoldX
To use FoldX, we created a runfile, which can be found [here]. We set the temperature and pH value to the values we extracted from the [PDB] page. Furthermore, we analysed the mutations with a randomly chosen temperature and pH value to see how much influence these parameters have on the analysis.
We ran FoldX with the following command:
FoldX -runfile run.txt > foldx_output
minimise
Next we used minimise. It was not necessary to create any extra file for the run. Unfortunately, we could not find any documentation for minimise, so it was hard to figure out how it works and what the output means.
We ran minimise with the following command:
minimise <input> <output>
Gromacs
Before we could run Gromacs, we had to curate our PDB file. We used the script repairPDB to extract chain A. Next we ran SCWRL to make sure that every residue is complete in the PDB file.
Then we used the commands which are listed in our task section.
In addition to the analysis of our mutated sequences, we also chose different force fields and analysed our protein without any mutation with each of them.
Here are the results of this analysis:
Analysed energies (in kJ/mol) for each force field:
energy term | statistic | AMBER03 | AMBER99SB-ILDN | CHARMM27 |
Bond | Average | 852.968 | 1091.57 | 1796.6 |
Bond | Error estimate | - | 270 | 240 |
Bond | RMSD | 42.0241 | -nan | 2924.29 |
Bond | Drift | -74.0853 | -1622.75 | -1404.11 |
Angle | Average | 3438.47 | 3326.81 | 4764.7 |
Angle | Error estimate | - | 62 | 60 |
Angle | RMSD | 16.8864 | -nan | 466.82 |
Angle | Drift | -33.7041 | 404.076 | 368.45 |
Potential | Average | -50917.7 | -61304.1 | 166.582 |
Potential | Error estimate | - | 960 | 39 |
Potential | RMSD | 66.4149 | -nan | 79.7058 |
Potential | Drift | -132.636 | -6402.44 | 280.841 |
Furthermore, we used different numbers of steps. The run times of these analyses can be found in the following table and graph:
nstep | time real | time user | time sys | #steps |
50 | 8.268s | 3.860s | 0.110s | 24 |
500 | 27.523s | 47.650s | 0.540s | 321 |
5000 | 25.281s | 17.710s | 0.210s | 114 |
50000 | 14.940s | 14.210s | 0.300s | 91 |
Results
Energy
The following table lists the energy values calculated by the different programs. To make the values comparable, we also calculated for each mutation the relative difference between the wildtype energy and the mutant energy (the "ratio difference" columns, in percent). The absolute values of the different programs cannot be compared directly, because each program uses different assumptions for the calculation of the energy. Nevertheless, we expect the general trend to be similar across the programs, and therefore we use the ratio difference to compare them.
Mutation | FoldX energy value | FoldX ratio difference | Minimise energy value | Minimise ratio difference | Gromacs energy value | Gromacs ratio difference |
wildtype | -154.17 | 0 | -9610.467157 | 0 | -61304.1 | 0 |
Rs61731240 | -151.61 | 1.56 | -9480.968602 | 1.35 | -48160.5 | 21.44 |
Rs121907974 | -144.25 | 6.34 | -9594.637506 | 0.16 | -46177.4 | 24.68 |
Rs61747114 | -153.78 | 0.15 | -9606.588566 | 0.04 | -48802.5 | 20.40 |
Rs1054374 | -152.15 | 1.21 | -6189.246312 | 35.60 | -48652.6 | 20.64 |
Rs121907967 | - | - | - | - | - | - |
Rs1800430 | -155.11 | -0.72 | -9505.864181 | 1.09 | -48693.9 | 20.57 |
rs1800431 | -154.36 | -0.23 | -9618.062763 | -0.08 | -48418.9 | 21.02 |
Rs121907968 | -149.08 | 3.20 | -9608.663976 | 0.02 | -49443.8 | 19.35 |
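The ratio difference can be written as a one-line formula. The sketch below assumes that it is the relative difference (E_wildtype - E_mutant) / E_wildtype in percent; this is an assumption about how the columns were computed. It reproduces the Minimise and Gromacs columns above to two decimals, while the FoldX column deviates from it by roughly 0.1. The function name `ratio_difference` is hypothetical.

```python
def ratio_difference(e_wildtype, e_mutant):
    """Relative energy difference between wildtype and mutant, in percent.

    Positive values mean the mutant has a higher (less negative) energy
    than the wildtype, negative values a lower one.
    """
    return (e_wildtype - e_mutant) / e_wildtype * 100.0


# Minimise, Rs61731240
print(round(ratio_difference(-9610.467157, -9480.968602), 2))
# Gromacs, Rs61731240
print(round(ratio_difference(-61304.1, -48160.5), 2))
```

Dividing by the wildtype energy makes the result independent of the absolute energy scale of each program, which is why the three columns can be compared at all.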
FoldX
The energies of the mutated structures differ from the energy of the wild-type structure, but in general these differences are small. The largest differences occur for mutations Rs121907974 and Rs121907968, where the mutated structures have about 6% and 3% more energy than the wild type. The smallest difference is between wildtype and Rs61747114 with only 0.15, which is nearly 0.
Two structures have less energy than the wildtype (mutations Rs1800430 and Rs1800431), but here the differences are also very small (0.72 and 0.23). From the energy alone it is therefore hard to explain why these mutations would damage the protein.
Minimise
Minimise reports lower energy values than FoldX, but the two sets of values are not comparable, because the methods are based on different calculation models. Here Rs1054374 has the largest energy difference with about 35%, which is very high. The smallest energy differences can be seen for mutations Rs61747114 (0.04) and Rs121907968 (0.02). There is also one structure (rs1800431) with a lower energy than the wildtype, but the difference is very small (-0.08). In general, except for mutation Rs1054374, the structures show only a small energy difference compared to the wildtype.
Gromacs
Gromacs reports the lowest energy values, but as before, these values are not comparable with the other methods because of the different calculation models. Here there is a strong difference between wildtype and mutation of about 20% on average. Furthermore, unlike with the previous methods, there is no big difference between the values of the different mutations. Mutation Rs121907968 has the smallest energy difference with 19.35, whereas the largest difference is between wildtype and mutation Rs121907974 with 24.68. This result is consistent with FoldX, where Rs121907974 also has the largest difference between wildtype and mutation.
Discussion
In this section we discuss the results of the structure-based mutation analysis.
General
Interestingly, Gromacs does not calculate a lower energy than the wildtype for any mutated structure. FoldX and Minimise almost always show the same trend, with two exceptions. Mutation Rs1054374 has a low ratio with FoldX and a very high ratio with Minimise. Furthermore, FoldX calculated a lower energy than the wildtype for Rs1800430, whereas Minimise predicts a higher one. To compare the predictions of the different programs, we drew a graph with the different values, which can be seen in Figure 1.
As the figure shows, with some exceptions all three programs predict the same energy trend if we compare the ratios and not the absolute values.
Therefore, if there is a strong difference between wildtype and mutation, all three methods mostly detect it, and we conclude that such mutations damage the protein structure and function.
Bonds
Mutation | Position | H-bonds original | H-bonds mutated |
Rs61731240 | 179 | no | no |
rs121907974 | 211 | no | no |
rs61747114 | 248 | no | no |
rs1054374 | 293 | two | no |
rs121907967 | 329 | - | - |
rs1800430 | 399 | no | no |
rs1800431 | 436 | no | no |
rs121907968 | 485 | one | one (different location) |
In addition to the energy calculation, we also looked at the H-bonds between the amino acid at the mutation position and the rest of the protein. In most cases the amino acid does not form any H-bonds. Only in two cases are there H-bonds. In the first case (rs1054374) the wildtype has two H-bonds, whereas the mutated structure does not have any. Therefore, we suggest that this mutation destroys the protein function. The second case is mutation rs121907968, where both mutation and wildtype have one H-bond, but the H-bond of the mutated amino acid is at a different location. For this reason we decided that this mutation damages the protein.
Conclusion
Now we decide for each mutation whether it damages the protein. This decision is based on the H-bond analysis and the analysis of the energy values. If two methods predict the same trend and the third predicts another, we trust the majority. We classify a mutation as damaging if there are changes in the H-bonds or if the ratio difference between the wildtype and mutation energy is more than 1.
Mutation | damaging | decision because of |
Rs61731240 | yes | FoldX, Minimise, Gromacs |
Rs121907974 | yes | FoldX, Gromacs |
Rs61747114 | no | FoldX, Minimise |
Rs1054374 | yes | Minimise, Gromacs, H-bonds |
Rs121907967 | yes | shorter chain |
Rs1800430 | yes | Minimise, Gromacs |
Rs1800431 | no | FoldX, Minimise |
Rs121907968 | yes | FoldX, Gromacs, H-bonds |
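The decision rule can be sketched as a small majority vote. This is an illustration under stated assumptions, not the exact procedure we followed: it takes the ratio differences from the energy table, applies the threshold of 1 to the signed value, and treats a change in the H-bonds as sufficient on its own. The names `RATIO_DIFFERENCE`, `HBOND_CHANGED` and `is_damaging` are hypothetical. The nonsense mutation Rs121907967 is left out, because it was classified by its shortened chain and has no energy values.

```python
# Ratio differences per mutation: (FoldX, Minimise, Gromacs).
RATIO_DIFFERENCE = {
    "Rs61731240": (1.56, 1.35, 21.44),
    "Rs121907974": (6.34, 0.16, 24.68),
    "Rs61747114": (0.15, 0.04, 20.40),
    "Rs1054374": (1.21, 35.60, 20.64),
    "Rs1800430": (-0.72, 1.09, 20.57),
    "rs1800431": (-0.23, -0.08, 21.02),
    "Rs121907968": (3.20, 0.02, 19.35),
}
# Mutations whose H-bonds differ between wildtype and mutated structure.
HBOND_CHANGED = {"Rs1054374", "Rs121907968"}


def is_damaging(mutation, threshold=1.0):
    """Damaging if the H-bonds change or at least two programs exceed the threshold."""
    votes = sum(value > threshold for value in RATIO_DIFFERENCE[mutation])
    return mutation in HBOND_CHANGED or votes >= 2


for mutation in RATIO_DIFFERENCE:
    print(mutation, "yes" if is_damaging(mutation) else "no")
```

Because Gromacs exceeds the threshold for every mutation, the vote is effectively decided by whether FoldX or Minimise agrees with it.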
Because we already know whether each mutation is damaging or not, we can compare our conclusions with the known effects.
Mutation | database | effect | prediction | comparison |
Rs61731240 | SNP-DB | neutral | non-neutral | wrong |
Rs121907974 | HGMD, SNP-DB | non-neutral | non-neutral | right |
Rs61747114 | SNP-DB | neutral | neutral | right |
Rs1054374 | SNP-DB | neutral | non-neutral | wrong |
Rs121907967 | HGMD, SNP-DB | non-neutral | non-neutral | right |
Rs1800430 | SNP-DB | neutral | non-neutral | wrong |
Rs1800431 | SNP-DB | neutral | neutral | right |
Rs121907968 | SNP-DB, HGMD | non-neutral | non-neutral | right |
In the last table we can see that 5 of our 8 predictions (about 63%) are right, whereas 3 (about 37%) are wrong. This is a poor result, and it shows that analysing the energy of a mutated structure alone is not sufficient to decide whether a mutation is neutral or non-neutral.
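The agreement between prediction and known effect can be tallied directly from the comparison table. The following sketch only counts the rows; the names `COMPARISON` and `tally` are hypothetical.

```python
# (mutation, known effect, our prediction) from the comparison table.
COMPARISON = [
    ("Rs61731240", "neutral", "non-neutral"),
    ("Rs121907974", "non-neutral", "non-neutral"),
    ("Rs61747114", "neutral", "neutral"),
    ("Rs1054374", "neutral", "non-neutral"),
    ("Rs121907967", "non-neutral", "non-neutral"),
    ("Rs1800430", "neutral", "non-neutral"),
    ("Rs1800431", "neutral", "neutral"),
    ("Rs121907968", "non-neutral", "non-neutral"),
]


def tally(rows):
    """Return (number of correct predictions, number of predictions)."""
    correct = sum(effect == prediction for _, effect, prediction in rows)
    return correct, len(rows)


correct, total = tally(COMPARISON)
print(f"{correct} of {total} correct ({100.0 * correct / total:.1f}%)")

# Mutations that were predicted as non-neutral although they are neutral.
false_positives = [m for m, e, p in COMPARISON if e == "neutral" and p == "non-neutral"]
print("neutral but predicted non-neutral:", false_positives)
```

All wrong predictions go in the same direction: neutral mutations are classified as non-neutral, and no non-neutral mutation is missed.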