Fabry:Structure-based mutation analysis
The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.
Preparation
<figtable id="tab:Prep">
All available PDB structures assigned to the UniProt entry P06280, along with the corresponding resolution,
coverage and R-factor. The R-factor was obtained from the PDBsum page. The chosen structure 3S5Y is highlighted.
Entry | Method | Resolution (Å) | Chain | Positions (up to 429) | R-factor<ref>R value http://www.proteopedia.org/wiki/index.php?title=R_value&oldid=569063, last checked on June 21, 2012</ref> | R-free<ref>Free R http://www.proteopedia.org/wiki/index.php?title=Free_R&oldid=1390871, last checked on June 21, 2012</ref> | pH | PDBsum | PDB |
---|---|---|---|---|---|---|---|---|---|
1R46 | X-ray | 3.25 | A/B | 32-422 | 0.262 | 0.301 | 8.0 | [»] | [»] |
1R47 | X-ray | 3.45 | A/B | 32-422 | 0.285 | 0.321 | 8.0 | [»] | [»] |
3GXN | X-ray | 3.01 | A/B | 32-421 | 0.239 | 0.301 | 4.5 | [»] | [»] |
3GXP | X-ray | 2.20 | A/B | 32-422 | 0.204 | 0.265 | 4.5 | [»] | [»] |
3GXT | X-ray | 2.70 | A/B | 32-422 | 0.245 | 0.306 | 4.5 | [»] | [»] |
3HG2 | X-ray | 2.30 | A/B | 32-422 | 0.178 | 0.202 | 4.6 | [»] | [»] |
3HG3 | X-ray | 1.90 | A/B | 32-426 | 0.167 | 0.197 | 6.5 | [»] | [»] |
3HG4 | X-ray | 2.30 | A/B | 32-423 | 0.166 | 0.221 | 4.6 | [»] | [»] |
3HG5 | X-ray | 2.30 | A/B | 32-422 | 0.192 | 0.227 | 4.6 | [»] | [»] |
3LX9 | X-ray | 2.04 | A/B | 32-422 | 0.178 | 0.218 | 6.5 | [»] | [»] |
3LXA | X-ray | 3.04 | A/B | 32-426 | 0.216 | 0.244 | 6.5 | [»] | [»] |
3LXB | X-ray | 2.85 | A/B | 32-427 | 0.227 | 0.264 | 6.5 | [»] | [»] |
3LXC | X-ray | 2.35 | A/B | 32-422 | 0.186 | 0.237 | 6.5 | [»] | [»] |
3S5Y | X-ray | 2.10 | A/B | 32-422 | 0.195 | 0.230 | 5.1 | [»] | [»] |
3S5Z | X-ray | 2.00 | A/B | 32-421 | 0.211 | 0.234 | 5.1 | [»] | [»] |
3TV8 | X-ray | 2.64 | A/B | 32-422 | 0.203 | 0.239 | 4.6 | [»] | [»] |
</figtable>
We did not choose the structure 3HG3, although it has the best resolution (1.90 Å) and the second-best R-factor (see <xr id="tab:Prep"/>), because it has an alanine at position 170 (part of the active site) instead of an aspartic acid. The R-factor is a measure of the agreement between the crystallographic model and the experimental X-ray diffraction data <ref>R-factor (crystallography) (May 17, 2012) http://en.wikipedia.org/wiki/R-factor_%28crystallography%29, June 20, 2012</ref>. After excluding the structures that had deviations in the sequence, we had to choose between ten structures (1R46, 1R47, 3GXN, 3GXP, 3GXT, 3HG2, 3HG4, 3HG5, 3S5Y, 3S5Z) and decided to use 3S5Y. This structure has the advantage of a good pH, very good coverage and still reasonable resolution and R-factor.
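The selection can be retraced with a short Python sketch. It only restates values from <xr id="tab:Prep"/>; the variable names and the decision to sort by resolution are illustrative assumptions and not the procedure that was actually followed.

```python
# (entry, resolution in Å, R-factor, R-free, pH), copied from the table above
structures = [
    ("1R46", 3.25, 0.262, 0.301, 8.0), ("1R47", 3.45, 0.285, 0.321, 8.0),
    ("3GXN", 3.01, 0.239, 0.301, 4.5), ("3GXP", 2.20, 0.204, 0.265, 4.5),
    ("3GXT", 2.70, 0.245, 0.306, 4.5), ("3HG2", 2.30, 0.178, 0.202, 4.6),
    ("3HG3", 1.90, 0.167, 0.197, 6.5), ("3HG4", 2.30, 0.166, 0.221, 4.6),
    ("3HG5", 2.30, 0.192, 0.227, 4.6), ("3LX9", 2.04, 0.178, 0.218, 6.5),
    ("3LXA", 3.04, 0.216, 0.244, 6.5), ("3LXB", 2.85, 0.227, 0.264, 6.5),
    ("3LXC", 2.35, 0.186, 0.237, 6.5), ("3S5Y", 2.10, 0.195, 0.230, 5.1),
    ("3S5Z", 2.00, 0.211, 0.234, 5.1), ("3TV8", 2.64, 0.203, 0.239, 4.6),
]

# Entries that remain after excluding structures with sequence deviations,
# exactly as listed in the text.
candidates = {"1R46", "1R47", "3GXN", "3GXP", "3GXT",
              "3HG2", "3HG4", "3HG5", "3S5Y", "3S5Z"}

kept = [s for s in structures if s[0] in candidates]

# Sorting by resolution is only one possible view; pH and coverage also
# entered the decision and are not ranked here.
by_resolution = sorted(kept, key=lambda s: s[1])
for entry, resolution, r_factor, r_free, ph in by_resolution:
    print(f"{entry}  {resolution:.2f} Å  R={r_factor:.3f}  R-free={r_free:.3f}  pH={ph}")

rank_3s5y = [s[0] for s in by_resolution].index("3S5Y") + 1
print("Rank of 3S5Y by resolution among the candidates:", rank_3s5y)
```

Among the ten remaining candidates, 3S5Y is second only to 3S5Z in resolution, which fits the description of a "still reasonable" resolution.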
Visualisation
At first glance, we would consider A143T disease causing, because it is located right next to a cysteine that forms a disulfide bond (yellow) and might interfere with it. The same applies to S65T, which is in the structural neighborhood of this bond and might also cause atom clashes.
None of the mutations is positioned to interfere with the active site (purple) or the substrate binding site (cyan).
Asn 215, the site of the mutation N215S, seems to play an important role in the binding of the ligand N-acetyl-D-glucosamine.
Neither of the two prolines, Pro 323 and Pro 40, gives the impression of being very important. Arg 356 is also on the surface of the protein and is not involved in any binding, although it might influence the nearby helix. The same applies to the SNP R118H.
It seems possible that the mutation V316I does not have a big impact on the helix in which it is located.
Neither Q279E nor I289V has an obvious reason to be considered crucial in this structure.
Create mutation
PyMOL
<figtable id="tab:Pymol"> Mutagenesis of the 10 selected SNPs performed with PyMOL. This was done on the basis of a backbone-independent library. Usually the rotamer with the fewest atomic clashes was chosen. Clashes are depicted as red and green disks. The wildtype amino acid is shown in green, the mutated one in red. Hydrogen bonds of the mutant to its surroundings are depicted in blue. If shown, the active site (residues 170 and 231) is colored pink, the substrate binding site (positions 203-207) cyan, and the five existing disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382) are highlighted in yellow. Ligand atoms are represented as spheres in dark cyan.
</figtable>
Taking a closer look at each of the mutations shown in <xr id="tab:Pymol"/>, we can confirm most of the assumptions made in the section Visualisation. The SNPs shown there were created with PyMOL using a backbone-independent library, and we tried to pick the rotamer that seemed to produce the fewest atom clashes. In spite of that, A143T interferes with the ligand and with a cysteine that forms a disulfide bond, so this position can be considered an important site.
In this view, the mutation S65T does not seem to be that harmful, since most of the important hydrogen bonds are kept and only minor clashes are caused. The same applies to V316I, Q279E and I289V. Of these three, the glutamate introduced at position 279 produces the most atom clashes, but in all three point mutations the important hydrogen bonds that stabilize the alpha helix are still present.
The assumption that Asn 215 has an important role in binding the ligand can be confirmed. However, mutating it to a serine introduces new hydrogen bonds that form a contact to N-acetyl-D-glucosamine, so the mutation might nevertheless not be disease causing. Again, both proline mutations (P40S and P323T) do not seem to be harmful, since hardly any atom clashes appear and all hydrogen bonds are still present.
The closer look at the SNP R118H indicates that this mutation is disease causing: on the one hand, the size and shape of the amino acid change completely, and on the other hand, the non-covalent contacts to the adjacent alpha helices change completely. The same conclusion can be drawn for the mutation of arginine 356 to tryptophan, because the crucial bonds to the alpha helix on the right are destroyed by the introduced aromatic rings.
<figtable id="tab:PymolSurface"> Comparison of the surface of the wildtype and the mutant for the 10 selected SNPs, performed with PyMOL. This was done on the basis of a backbone-independent library. Usually the rotamer with the fewest atomic clashes was chosen. The wildtype amino acid is shown in green, the mutated one in red. Hydrogen bonds of the mutant to its surroundings are depicted in blue. If shown, the active site (residues 170 and 231) is colored pink, the substrate binding site (positions 203-207) cyan, and the five existing disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382) are highlighted in yellow. Ligand atoms are represented as spheres in dark cyan.
</figtable>
According to the surface comparison of the first SNP, A143T, in <xr id="tab:PymolSurface"/>, this mutation is surely disease causing, because the entrance of the binding pocket for (2R,3S,4R,5S)-2-(hydroxymethyl)piperidine-3,4,5-triol, which is shown in dark cyan, is constricted by the introduced threonine.
There are only two point mutations that do not affect the outside of the protein. Neither introduces a hydrophilic amino acid into the protein. Since P40S does not seem to affect the adjacent secondary structures, it can be considered non-disease causing. The mutation of an isoleucine to valine at position 289 deletes a bond to the sheet in the upper right.
Neither N215S nor S65T is considered disease causing, because they appear not to change the surface of the protein. Similarly, the isoleucine at residue 316 does not change the surface, but it is hydrophobic, and thus we consider the point mutation damaging.
R356W changes the surface structure of the protein and also introduces a hydrophobic amino acid to the surface, and thus has to be considered disease causing.
The remaining three examined mutations (P323T, Q279E and R118H) are regarded as neutral, because they neither change the surface nor add non-polar residues to the surface.
SCWRL
<figtable id="tab:MutSCWRL"> Mutagenesis of the 10 selected SNPs performed with SCWRL. This was done on the basis of a backbone-dependent library. The wildtype amino acid is shown in green, the mutated one in red. The whole structure of the wildtype is colored light blue, the mutant light red. Hydrogen bonds of the mutant to its surroundings are depicted in red, those of the unmutated site in green.
</figtable>
<figtable id="tab:EnergySCWRL"> Minimal energy introduced by the mutated site. Shown is the energy if only chain A is mutated, as well as the energy when both chains are mutated. The last column contains the ratio of the energy of the mutant to that of the wildtype.
SNP | Minimal energy Chain A | Minimal energy Both chains | Energy(mutant)/Energy(wt) |
---|---|---|---|
A143T | 142.857 | 290.650 | 1.029 |
I289V | 138.101 | 277.376 | 0.995 |
N215S | 143.360 | 287.975 | 1.033 |
P323T | 143.375 | 287.394 | 1.033 |
P40S | 144.203 | 289.775 | 1.039 |
Q279E | 143.082 | 286.738 | 1.031 |
R118H | 142.636 | 285.207 | 1.028 |
R356W | 149.729 | 305.003 | 1.079 |
S65T | 145.551 | 293.644 | 1.049 |
V316I | 148.135 | 294.203 | 1.067 |
</figtable>
The mutations created by SCWRL are shown in <xr id="tab:MutSCWRL"/> (only chain A was mutated in the structures shown in these pictures), where they are also compared to the wildtype residue. Here the focus is on the newly introduced and the deleted hydrogen bonds. The number of new contacts correlates with the minimal energy shown in <xr id="tab:EnergySCWRL"/>: the higher the energy, the more new non-covalent bonds are formed and the more severe the changes are.
The energy of the unmutated chain A of the wildtype protein is 138.787. The only SNP with a smaller energy (138.101) is I289V. Looking at the corresponding picture, we see that none of the hydrogen bonds formed by the wildtype is changed by the mutant. The biggest value is observed for the mutation R356W, where all hydrogen bonds on the right-hand side are deleted due to the aromatic rings of the tryptophan. The smallest values for single mutation sites are those of I289V, R118H and A143T.
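The ratios in the last column of <xr id="tab:EnergySCWRL"/> can be reproduced from the chain A energies and the wildtype value of 138.787. The following Python sketch does this; the variable names are illustrative and the numbers are copied from the table.

```python
WT_CHAIN_A = 138.787  # energy of the unmutated chain A, as stated in the text

# Minimal energy with only chain A mutated, copied from the SCWRL energy table
chain_a_energy = {
    "A143T": 142.857, "I289V": 138.101, "N215S": 143.360, "P323T": 143.375,
    "P40S": 144.203, "Q279E": 143.082, "R118H": 142.636, "R356W": 149.729,
    "S65T": 145.551, "V316I": 148.135,
}

# Energy(mutant) / Energy(wt), rounded to three decimals as in the table
ratios = {snp: round(energy / WT_CHAIN_A, 3) for snp, energy in chain_a_energy.items()}

for snp, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{snp:6s} {ratio:.3f}")

lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)
print("lowest:", lowest, "highest:", highest)
```

Sorted this way, I289V is the only mutant with a ratio below 1 and R356W has the highest ratio, matching the description above.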
Comparing the structures in which only one chain was mutated to those in which both chains were mutated, the value for both chains is roughly twice the value for chain A alone, and in most cases slightly larger.
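How close the two-chain energies are to twice the single-chain energies can be checked directly. This is a minimal sketch with values copied from <xr id="tab:EnergySCWRL"/>; the names are illustrative.

```python
# SNP -> (minimal energy chain A, minimal energy both chains), from the SCWRL energy table
energies = {
    "A143T": (142.857, 290.650), "I289V": (138.101, 277.376),
    "N215S": (143.360, 287.975), "P323T": (143.375, 287.394),
    "P40S": (144.203, 289.775), "Q279E": (143.082, 286.738),
    "R118H": (142.636, 285.207), "R356W": (149.729, 305.003),
    "S65T": (145.551, 293.644), "V316I": (148.135, 294.203),
}

# Excess of the two-chain energy over twice the single-chain energy
excess = {snp: round(both - 2 * single, 3) for snp, (single, both) in energies.items()}

larger = [snp for snp, delta in excess.items() if delta > 0]
smaller = [snp for snp, delta in excess.items() if delta < 0]

for snp, delta in excess.items():
    print(f"{snp:6s} {delta:+.3f}")
print(f"{len(larger)} of {len(energies)} two-chain energies exceed twice the chain A value")
```

For eight of the ten SNPs the two-chain energy is slightly larger than twice the chain A energy; R118H and V316I are the exceptions.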
Comparison of energies
FoldX
<figtable id="tab:MutFoldX"> Mutagenesis of the 10 selected SNPs performed with FoldX. The wildtype amino acid is shown in green, the mutated one in red. The whole structure of the wildtype is colored light blue, the mutant light red. Hydrogen bonds of the mutant to its surroundings are depicted in red, those of the unmutated site in green.
</figtable>
The program FoldX calculates the free energy of unfolding ∆G and thus predicts the stability of the mutated protein. The total energy of the starting protein is 34.82; this value serves as the reference for a stable protein. The raw energies and the differences in each term of the force field can be looked up in the RawFoldX table and the DifFoldX table, respectively. Of interest in our case are the differences listed in <xr id="tab:DifEnergies"/>. There the increase or decrease of ∆G is shown with respect to the mentioned "starting energy" of 34.82 and also with respect to the relative wildtype of each mutation, which is computed by FoldX.
Compared to the original protein, all SNPs except A143T have a higher total energy (a fraction above 1 in <xr id="tab:DifEnergies"/>) and are thus predicted to be less stable. Judging by these values, only the mutants R356W and A143T can be considered almost as stable as, or more stable than, the original protein. In the comparison to the relative wildtypes, we consider the mutants R356W, V316I, S65T and I289V equally stable.
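The two "starting energy" columns of <xr id="tab:DifEnergies"/> are linked by simple arithmetic. The sketch below assumes that the difference is defined as starting energy minus mutant energy; this sign convention is inferred from the tabulated numbers and is not stated explicitly in the text. The names are illustrative.

```python
START_ENERGY = 34.82  # total FoldX energy of the starting structure

# "Difference to starting energy" column, copied from the table
diff_to_start = {
    "A143T": 3.38, "I289V": -1.42, "N215S": -4.04, "P40S": -2.39,
    "P323T": -5.26, "Q279E": -2.82, "R118H": -2.43, "R356W": -0.64,
    "S65T": -8.71, "V316I": -8.09,
}

# Assumed convention: difference = starting energy - mutant energy
mutant_energy = {snp: round(START_ENERGY - d, 2) for snp, d in diff_to_start.items()}

# "Fraction starting energy" = mutant energy / starting energy
fraction = {snp: round(e / START_ENERGY, 3) for snp, e in mutant_energy.items()}

for snp in diff_to_start:
    print(f"{snp:6s} energy={mutant_energy[snp]:6.2f} fraction={fraction[snp]:.3f}")

below_start = [snp for snp, f in fraction.items() if f < 1]
print("Mutants with a lower total energy than the starting structure:", below_start)
```

Under this convention the computed fractions agree with the tabulated ones, and A143T is the only mutant whose total energy lies below the starting energy.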
Looking at the pictures in <xr id="tab:MutFoldX"/>, which graphically compare the original wildtype to the mutants, we mainly see a picture similar to the comparisons in <xr id="tab:MutSCWRL"/>. Most of the time, the orientation of the introduced amino acid differs a little, but the hydrogen bonds are comparable to those added or deleted by SCWRL. A big difference can be seen in the mutant N215S, where FoldX does not keep the contact to the ligand and instead connects to another residue in the adjacent helix. This conformation can therefore be considered worse. In P40S and S65T, the hydrogen bonds formed also point more towards the inside and towards already bound structure than those of the SCWRL rotamers.
<figtable id="tab:DifEnergies"> Differences in the free energy of unfolding of each SNP compared to the starting energy of the wildtype (see <xr id="tab:energyStart"/>) and to the energy of its relative wild type (see DifFoldX table), as calculated by FoldX.
SNP | Difference to starting energy | Fraction starting energy | Difference to relative wild type | Fraction relative wild type |
---|---|---|---|---|
A143T | 3.38 | 0.903 | 1.63 | 1.055 |
I289V | -1.42 | 1.041 | 0.88 | 1.025 |
N215S | -4.04 | 1.116 | 1.19 | 1.032 |
P40S | -2.39 | 1.069 | 3.62 | 1.108 |
P323T | -5.26 | 1.151 | 1.54 | 1.040 |
Q279E | -2.82 | 1.081 | 2.95 | 1.085 |
R118H | -2.43 | 1.070 | 1.19 | 1.033 |
R356W | -0.64 | 1.018 | -0.23 | 0.994 |
S65T | -8.71 | 1.250 | 0.16 | 1.004 |
V316I | -8.09 | 1.232 | -0.59 | 0.986 |
</figtable>
<figtable id="tab:energyStart"> Starting energies calculated by FoldX before any stability calculations were performed on the protein.
Type | Energy |
---|---|
BackHbond | -528.41 |
SideHbond | -149.96 |
Energy_VdW | -1013.85 |
Electro | -25.51 |
Energy_SolvP | 1310.89 |
Energy_SolvH | -1318.53 |
Energy_vdwclash | 99.21 |
energy_torsion | 26.73 |
backbone_vdwclash | 478.11 |
Entropy_sidec | 482.20 |
Entropy_mainc | 1219.31 |
water bonds | -29.53 |
helix dipole | -12.51 |
loop_entropy | 0.00 |
cis_bond | 4.50 |
disulfide | -29.93 |
kn electrostatic | -0.98 |
partial covalent interactions | 0.00 |
Energy_Ionisation | 1.20 |
Entropy Complex | 0.00 |
Total | 34.82 |
</figtable>
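The relation between the individual terms and the total in <xr id="tab:energyStart"/> can be checked by summation. The sketch below assumes that backbone_vdwclash is reported for information only and is not part of the total; this reading is an assumption here, supported by the arithmetic rather than stated in the text.

```python
# Energy terms copied from the table of starting energies
terms = {
    "BackHbond": -528.41, "SideHbond": -149.96, "Energy_VdW": -1013.85,
    "Electro": -25.51, "Energy_SolvP": 1310.89, "Energy_SolvH": -1318.53,
    "Energy_vdwclash": 99.21, "energy_torsion": 26.73,
    "backbone_vdwclash": 478.11, "Entropy_sidec": 482.20,
    "Entropy_mainc": 1219.31, "water bonds": -29.53, "helix dipole": -12.51,
    "loop_entropy": 0.00, "cis_bond": 4.50, "disulfide": -29.93,
    "kn electrostatic": -0.98, "partial covalent interactions": 0.00,
    "Energy_Ionisation": 1.20, "Entropy Complex": 0.00,
}
REPORTED_TOTAL = 34.82

sum_all = sum(terms.values())
# Assumption: backbone_vdwclash does not enter the total
sum_without_backbone = sum(v for name, v in terms.items() if name != "backbone_vdwclash")

print(f"sum of all terms:              {sum_all:.2f}")
print(f"sum without backbone_vdwclash: {sum_without_backbone:.2f}")
print(f"reported total:                {REPORTED_TOTAL:.2f}")
```

Leaving out backbone_vdwclash, the terms add up to the reported total within the rounding of the individual values, whereas the sum of all terms is far larger.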
Minimise
<figure id="fig:Minimise1">
</figure> <figure id="fig:Minimise2">
</figure>
<figtable id="tab:Energies"> Comparison of the energies calculated by minimise after each iteration step. Each figure shows the energies of the structures created by SCWRL and FoldX for one point mutation. The SCWRL energies are plotted in red, the FoldX energies in blue. It can be seen in all pictures that the energies become worse (higher) in almost every step and are always worse than at the beginning.
</figtable>
Minimise tries to minimize the energy of the structure. <xr id="fig:Minimise2"/> shows an example of a point mutation where minimise made only small changes. This might be because this position had already been optimized by SCWRL (in this case). Other parts of the structure, on the other hand, were changed dramatically, as can be seen in <xr id="fig:Minimise1"/>, which shows another part of the same structure, in which position 143 is mutated to a threonine. The two sheets shown there were converted into a coil during the fifth iteration of minimise. Similar modifications can be observed in the iterations of the other SNPs, too.
The tables MinItFoldX and MinItSCWRL list the final energies of the iterations. These are also plotted and compared in the figures in <xr id="tab:Energies"/>. The SCWRL structures always have a higher and thus worse energy than the FoldX structures, although after five iterations the energies of the structures from both programs seem to converge. It becomes obvious that minimise does not succeed in lowering the energy of these structures, because the energy gets worse in almost every iteration step.
Gromacs
Conclusion
<figtable id="tab:ScoringScheme"> Scoring scheme of the overall prediction and all its components. For each of the aforementioned methods, one rating from "--" to "+" is assigned. The ratings are summed up to give the overall score and the subsequent prediction.
SNP | Visualisation | PyMOL hbond | PyMOL surface | SCWRL energy | SCWRL visual | FoldX energy | FoldX visual | Gromacs | Sum | Prediction | True classification | Result prediction |
---|---|---|---|---|---|---|---|---|---|---|---|---|
P40S | + | + | + | - | - | - | 0 | | 0 | non-disease-causing | disease-causing | Wrong |
S65T | - | 0 | + | -- | -- | - | -- | | -7 | disease-causing | disease-causing | Right |
R118H | 0 | -- | 0 | 0 | - | 0 | - | | -4 | disease-causing | non-disease-causing | Wrong |
A143T | - | -- | -- | 0 | - | + | - | | -6 | disease-causing | disease-causing | Right |
N215S | -- | 0 | + | - | - | - | 0 | | -4 | disease-causing | disease-causing | Right |
Q279E | + | 0 | 0 | - | - | 0 | - | | -2 | disease-causing | disease-causing | Right |
I289V | + | + | - | + | + | 0 | + | | 4 | non-disease-causing | non-disease-causing | Right |
V316I | 0 | + | - | -- | + | 0 | + | | 0 | non-disease-causing | non-disease-causing | Right |
P323T | + | + | 0 | - | - | 0 | - | | -1 | non-disease-causing | non-disease-causing | Right |
R356W | 0 | -- | -- | -- | -- | 0 | -- | | -10 | disease-causing | disease-causing | Right |
prediction precision | 0.7 | 0.5 | 0.4 | 0.7 | 0.8 | 0.7 | 0.6 | | 0.8 | 0.8 | | |
</figtable>
The prediction in this week's task relied mainly on intuition, especially regarding the visual inspections. We came up with a scoring scheme that ranges from "--" to "+", meaning "very damaging" and "probably not disease causing", respectively. A single "-" means "damaging"; a "0" can mean either "probably not too damaging" or an "indecisive conclusion". The reasoning for the visualisation and the PyMOL scores can be found in the descriptive text in the sections Visualisation and PyMOL.
The SCWRL energy score was assigned a "--" if the "Energy(mutant)/Energy(wt)" fraction was greater than 1.04, a "-" from 1.03 to 1.04, a "0" if it was greater than 1 and smaller than 1.03, and a "+" if the fraction was smaller than 1. The visual score was mainly based on the introduced and deleted hydrogen bonds and the orientation of the mutated amino acid.
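The thresholds for the SCWRL energy score can be written as a small function. This is a sketch of the rule as described above; the function name is hypothetical, and the handling of the boundary values (1.03 and 1.04 counted as "-", exactly 1 counted as "0") is an assumption, since the text does not specify it.

```python
def scwrl_energy_score(ratio: float) -> str:
    """Map an Energy(mutant)/Energy(wt) ratio to the rating described in the text."""
    if ratio > 1.04:
        return "--"
    if ratio >= 1.03:   # "from 1.03 to 1.04", read here as inclusive (assumption)
        return "-"
    if ratio < 1.0:
        return "+"
    return "0"          # 1.0 <= ratio < 1.03; exactly 1.0 rated "0" is an assumption


# Energy(mutant)/Energy(wt) column of the SCWRL energy table
ratios = {
    "A143T": 1.029, "I289V": 0.995, "N215S": 1.033, "P323T": 1.033,
    "P40S": 1.039, "Q279E": 1.031, "R118H": 1.028, "R356W": 1.079,
    "S65T": 1.049, "V316I": 1.067,
}

scores = {snp: scwrl_energy_score(ratio) for snp, ratio in ratios.items()}
for snp, score in scores.items():
    print(f"{snp:6s} {ratios[snp]:.3f} -> {score}")
```

Applied to the ten ratios, the function yields the ratings listed in the "SCWRL energy" column of the scoring table.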
The FoldX energy score is based on two values, the "Fraction starting energy" and the "Fraction relative wild type". These two were evaluated independently and then added. Each of the two was assigned a "+" if its value was smaller than 1, a "0" if it was between 1 and 1.1, and a "-" if it was above 1.1.
All these scores were summed up and the sum was then rated either non-disease causing (greater than or equal to -1) or disease causing (smaller than -1). The result is listed in <xr id="tab:ScoringScheme"/>. From this we conclude that it is not helpful to use only one of these methods, since none of them has very good predictive power on its own (except maybe the visual inspection of the SCWRL result), but using them all together gives quite a good impression of the malignancy of a point mutation.
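The summation and the final classification can be retraced as follows. The sketch assumes the numeric values "--" = -2, "-" = -1, "0" = 0 and "+" = +1, which are not stated explicitly but reproduce the sums in <xr id="tab:ScoringScheme"/>; the names are illustrative.

```python
# Assumed numeric value of each rating symbol
SYMBOL_VALUE = {"--": -2, "-": -1, "0": 0, "+": 1}

# Ratings per SNP in table order (no Gromacs rating is available),
# followed by the true classification.
table = {
    "P40S":  ("+ + + - - - 0".split(),      "disease-causing"),
    "S65T":  ("- 0 + -- -- - --".split(),   "disease-causing"),
    "R118H": ("0 -- 0 0 - 0 -".split(),     "non-disease-causing"),
    "A143T": ("- -- -- 0 - + -".split(),    "disease-causing"),
    "N215S": ("-- 0 + - - - 0".split(),     "disease-causing"),
    "Q279E": ("+ 0 0 - - 0 -".split(),      "disease-causing"),
    "I289V": ("+ + - + + 0 +".split(),      "non-disease-causing"),
    "V316I": ("0 + - -- + 0 +".split(),     "non-disease-causing"),
    "P323T": ("+ + 0 - - 0 -".split(),      "non-disease-causing"),
    "R356W": ("0 -- -- -- -- 0 --".split(), "disease-causing"),
}


def classify(total: int) -> str:
    """Sums of -1 or higher count as non-disease-causing, lower sums as disease-causing."""
    return "non-disease-causing" if total >= -1 else "disease-causing"


results = {}
for snp, (ratings, truth) in table.items():
    total = sum(SYMBOL_VALUE[r] for r in ratings)
    predicted = classify(total)
    results[snp] = (total, predicted, predicted == truth)
    print(f"{snp:6s} sum={total:4d} {predicted:20s} {'Right' if predicted == truth else 'Wrong'}")

accuracy = sum(correct for _, _, correct in results.values()) / len(results)
print("fraction of correct predictions:", accuracy)
```

With these assumptions, eight of the ten SNPs are classified correctly; P40S and R118H are the two misclassified cases.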
References
<references />