Fabry:Structure-based mutation analysis

From Bioinformatikpedia
Revision as of 12:30, 25 June 2012 by Rackersederj (talk | contribs) (Preparation)



The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Preparation

<figtable id="tab:Prep"> All available PDB structures assigned to the UniProt entry P06280, together with their resolution,
coverage and R-factor. The R-factors were obtained from the corresponding PDBsum pages. The chosen structure 3S5Y is highlighted.

{| class="wikitable"
! Entry !! Method !! Resolution (Å) !! Chain !! Positions (up to 429) !! R-factor<ref>R value http://www.proteopedia.org/wiki/index.php?title=R_value&oldid=569063, last checked on June 21, 2012</ref> !! R-free<ref>Free R http://www.proteopedia.org/wiki/index.php?title=Free_R&oldid=1390871, last checked on June 21, 2012</ref> !! pH
|-
| 1R46 || X-ray || 3.25 || A/B || 32-422 || 0.262 || 0.301 || 8.0
|-
| 1R47 || X-ray || 3.45 || A/B || 32-422 || 0.285 || 0.321 || 8.0
|-
| 3GXN || X-ray || 3.01 || A/B || 32-421 || 0.239 || 0.301 || 4.5
|-
| 3GXP || X-ray || 2.20 || A/B || 32-422 || 0.204 || 0.265 || 4.5
|-
| 3GXT || X-ray || 2.70 || A/B || 32-422 || 0.245 || 0.306 || 4.5
|-
| 3HG2 || X-ray || 2.30 || A/B || 32-422 || 0.178 || 0.202 || 4.6
|-
| 3HG3 || X-ray || 1.90 || A/B || 32-426 || 0.167 || 0.197 || 6.5
|-
| 3HG4 || X-ray || 2.30 || A/B || 32-423 || 0.166 || 0.221 || 4.6
|-
| 3HG5 || X-ray || 2.30 || A/B || 32-422 || 0.192 || 0.227 || 4.6
|-
| 3LX9 || X-ray || 2.04 || A/B || 32-422 || 0.178 || 0.218 || 6.5
|-
| 3LXA || X-ray || 3.04 || A/B || 32-426 || 0.216 || 0.244 || 6.5
|-
| 3LXB || X-ray || 2.85 || A/B || 32-427 || 0.227 || 0.264 || 6.5
|-
| 3LXC || X-ray || 2.35 || A/B || 32-422 || 0.186 || 0.237 || 6.5
|-
| '''3S5Y''' || X-ray || 2.10 || A/B || 32-422 || 0.195 || 0.230 || 5.1
|-
| 3S5Z || X-ray || 2.00 || A/B || 32-421 || 0.211 || 0.234 || 5.1
|-
| 3TV8 || X-ray || 2.64 || A/B || 32-422 || 0.203 || 0.239 || 4.6
|}

</figtable>

We did not choose the structure 3HG3, although it has the best resolution (1.90 Å) and the second-best R-factor (see <xr id="tab:Prep"/>), because it carries an alanine instead of an aspartic acid at position 170, which is part of the active site. The R-factor is a measure of the agreement between the crystallographic model and the experimental X-ray diffraction data <ref>R-factor (crystallography) (May 17, 2012‎) http://en.wikipedia.org/wiki/R-factor_%28crystallography%29, June 20, 2012</ref>. After excluding the structures that had deviations in the sequence, we had to choose between ten structures (1R46, 1R47, 3GXN, 3GXP, 3GXT, 3HG2, 3HG4, 3HG5, 3S5Y, 3S5Z) and decided to use 3S5Y. This structure has the advantage of a good pH, very good coverage and a still reasonable resolution and R-factor.
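The kind of comparison described above can be sketched in a few lines of Python. The values are taken from the table; the two cut-offs (`MAX_RESOLUTION`, `MAX_R_FREE`) are purely illustrative assumptions and not criteria stated in this report, and the actual choice also took pH and coverage into account.

```python
from collections import namedtuple

Structure = namedtuple("Structure", "entry resolution r_free ph")

# The ten structures remaining after the sequence check (values from the table).
candidates = [
    Structure("1R46", 3.25, 0.301, 8.0),
    Structure("1R47", 3.45, 0.321, 8.0),
    Structure("3GXN", 3.01, 0.301, 4.5),
    Structure("3GXP", 2.20, 0.265, 4.5),
    Structure("3GXT", 2.70, 0.306, 4.5),
    Structure("3HG2", 2.30, 0.202, 4.6),
    Structure("3HG4", 2.30, 0.221, 4.6),
    Structure("3HG5", 2.30, 0.227, 4.6),
    Structure("3S5Y", 2.10, 0.230, 5.1),
    Structure("3S5Z", 2.00, 0.234, 5.1),
]

# Hypothetical cut-offs, chosen only to illustrate the filtering step.
MAX_RESOLUTION = 2.5  # in Å
MAX_R_FREE = 0.25

# Keep structures within both cut-offs and sort them by resolution (best first).
shortlist = sorted(
    (s for s in candidates
     if s.resolution <= MAX_RESOLUTION and s.r_free <= MAX_R_FREE),
    key=lambda s: s.resolution,
)

for s in shortlist:
    print(f"{s.entry}: {s.resolution:.2f} Å, R-free {s.r_free:.3f}, pH {s.ph}")
```

With these assumed cut-offs, 3S5Y stays on the shortlist together with 3S5Z, 3HG2, 3HG4 and 3HG5.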

Visualisation

<figure id="fig:allSNPs">

All SNPs mapped onto the structure 3S5Y, chain A. Mutated sites are shown in red. The active site (residues 170 and 231) is shown in pink, substrate binding site (position 203-207) in cyan and the existing five disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382) are highlighted in yellow. Ligands are depicted in lines representation in dark cyan.

</figure>

At first glance, we would consider A143T disease-causing, because it lies directly next to a cysteine that forms a disulfide bond (yellow) and might interfere with it. The same applies to S65T, which is in the structural neighbourhood of this bond and might also cause atom clashes.
None of the mutations appears to interfere with the active site (pink) or the substrate binding site (cyan).
The mutation N215S seems to play an important role in the binding of the ligand N-acetyl-D-glucosamine.
Neither of the two prolines, Pro 40 and Pro 323, gives the impression of being very important. Arg 356 is likewise located on the surface of the protein and is not involved in any binding, although it might influence the nearby helix. The same applies to the SNP R118H.
It seems possible that the mutation V316I does not have a big impact on the helix in which it is located.
Neither Q279E nor I289V shows an obvious reason to be considered crucial in this structure.

Create mutation

Pymol

<figtable id="tab:Pymol"> Mutagenesis of the 10 selected SNPs performed with PyMOL, based on a backbone-independent rotamer library. Usually the rotamer with the fewest atomic clashes was chosen. Clashes are depicted as red and green disks. The wild-type amino acid is shown in green, the mutated one in red. Hydrogen bonds of the mutant to its surroundings are depicted in blue. If shown, the active site (residues 170 and 231) is colored pink, the substrate binding site (positions 203-207) cyan, and the five existing disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382) are highlighted in yellow. Ligand atoms are represented as spheres in dark cyan.

A143T
I289V
N215S
P40S
P323T
Q279E
R118H
R356W
S65T
V316I

</figtable>

A closer look at the individual mutations, shown in the pictures in <xr id="tab:Pymol"/>, supports most of the assumptions made in the section Visualisation. The mutants shown there were created with PyMOL using a backbone-independent rotamer library, and we tried to pick the rotamer that seemed to produce the fewest atom clashes. In spite of that, A143T interferes with the ligand and with a cysteine that forms a disulfide bond, and can thus be considered an important site.
In this view, the mutation S65T does not seem that malign, since most of the important hydrogen bonds are kept and only minor clashes are caused. The same applies to V316I, Q279E and I289V. Most atom clashes are produced by the Glu introduced at position 279, but in all three point mutations the important hydrogen bonds that stabilise the alpha helix are still present.
The assumption that Asn 215 plays an important role in binding the ligand can be confirmed. However, the serine introduced by the mutation forms new hydrogen bonds to N-acetyl-D-glucosamine, so the mutation might nevertheless not be disease-causing. Again, both proline mutations (P40S and P323T) do not seem malign, since hardly any atom clashes appear and all hydrogen bonds are still present.
The closer look at the SNP R118H indicates that this mutation has to be disease-causing, since on the one hand the size and shape of the amino acid change completely and on the other hand the non-covalent contacts to the adjacent alpha helices change completely as well. The same conclusion can be drawn for the mutation of arginine 356 to tryptophan, because the crucial bonds to the alpha helix on the right are destroyed by the introduced aromatic rings.
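When such a mutagenesis is scripted rather than done by hand, the SNP labels used throughout this page (for example A143T) first have to be split into wild-type residue, position and mutant residue. The following is a minimal sketch of that step only; the helper names `parse_snp` and `residue_selection` and the object name `3s5y` are assumptions made for illustration, and the selection string merely follows PyMOL's `/object/segment/chain/residue/` macro notation.

```python
import re

# One-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

SNPS = ["A143T", "I289V", "N215S", "P40S", "P323T",
        "Q279E", "R118H", "R356W", "S65T", "V316I"]


def parse_snp(label):
    """Split a label such as 'A143T' into ('ALA', 143, 'THR')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a valid SNP label: {label!r}")
    wild_type, position, mutant = match.groups()
    return ONE_TO_THREE[wild_type], int(position), ONE_TO_THREE[mutant]


def residue_selection(obj, chain, position):
    """Build a selection macro for one residue (object name is hypothetical)."""
    return f"/{obj}//{chain}/{position}/"


for label in SNPS:
    wild_type, position, mutant = parse_snp(label)
    print(label, wild_type, "->", mutant, residue_selection("3s5y", "A", position))
```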

<figtable id="tab:Pymol"> Comparison of the surfaces of the wild type and the mutant for the 10 selected SNPs, created with PyMOL on the basis of a backbone-independent rotamer library. Usually the rotamer with the fewest atomic clashes was chosen. Clashes are depicted as red and green disks. The wild-type amino acid is shown in green, the mutated one in red. Hydrogen bonds of the mutant to its surroundings are depicted in blue. If shown, the active site (residues 170 and 231) is colored pink, the substrate binding site (positions 203-207) cyan, and the five existing disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382) are highlighted in yellow. Ligand atoms are represented as spheres in dark cyan.

A143T
I289V
N215S
P40S
P323T
A143T
I289V
N215S
P40S
P323T
Q279E
R118H
R356W
S65T
V316I
Q279E
R118H
R356W
S65T
V316I

</figtable>

Compare surface

SCWRL

<figtable id="tab:MutSCWRL"> Mutagenesis of the 10 selected SNPs performed with SCWRL, based on a backbone-dependent rotamer library. The wild-type amino acid is shown in green, the mutated one in red. The whole structure of the wild type is colored light blue, that of the mutant light red. Hydrogen bonds of the mutant to its surroundings are depicted in red, those of the wild type in green.

A143T
I289V
N215S
P40S
P323T
Q279E
R118H
R356W
S65T
V316I

</figtable>

<figtable id="tab:EnergySCWRL"> Minimal energy introduced by the mutated site. Shown are the energy if only chain A is mutated and the energy if both chains are mutated. The last column contains the ratio of the chain A energy of the mutant to that of the wild type.

{| class="wikitable"
! SNP !! Minimal energy chain A !! Minimal energy both chains !! Energy(mutant)/Energy(wt)
|-
| A143T || 142.857 || 290.650 || 1.029
|-
| I289V || 138.101 || 277.376 || 0.995
|-
| N215S || 143.360 || 287.975 || 1.033
|-
| P323T || 143.375 || 287.394 || 1.033
|-
| P40S || 144.203 || 289.775 || 1.039
|-
| Q279E || 143.082 || 286.738 || 1.031
|-
| R118H || 142.636 || 285.207 || 1.028
|-
| R356W || 149.729 || 305.003 || 1.079
|-
| S65T || 145.551 || 293.644 || 1.049
|-
| V316I || 148.135 || 294.203 || 1.067
|}

</figtable>

The mutations created by SCWRL are shown in <xr id="tab:MutSCWRL"/> (only chain A was mutated in the structures shown in these pictures), where they are also compared to the wild-type residue. Here the focus is on the newly introduced and the deleted hydrogen bonds. The number of new contacts correlates with the minimal energy shown in <xr id="tab:EnergySCWRL"/>: the higher the value, the more new non-covalent bonds are formed and the more severe the changes are.
The minimal energy of the non-mutated chain A of the wild-type protein is 138.787. The only SNP with a smaller energy (138.101) is I289V. The corresponding picture shows that none of the hydrogen bonds formed by the wild type is changed in the mutant. The largest value is observed for the mutation R356W, where all hydrogen bonds on the right-hand side are lost due to the aromatic rings of the tryptophan. The smallest values for single mutation sites are those of I289V, R118H and A143T.
Comparing the structures in which only one chain was mutated with those in which both chains were mutated, the energy roughly doubles; in most cases the value for both chains is slightly more than twice the value for chain A.
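The two comparisons made above can be recomputed from the tabulated values. This is a minimal sketch; the variable names are chosen here for illustration, and the energies are copied from the SCWRL energy table.

```python
# Minimal energy of the non-mutated chain A of the wild type (from the text).
WT_CHAIN_A = 138.787

# SNP -> (energy with chain A mutated, energy with both chains mutated)
energies = {
    "A143T": (142.857, 290.650),
    "I289V": (138.101, 277.376),
    "N215S": (143.360, 287.975),
    "P323T": (143.375, 287.394),
    "P40S":  (144.203, 289.775),
    "Q279E": (143.082, 286.738),
    "R118H": (142.636, 285.207),
    "R356W": (149.729, 305.003),
    "S65T":  (145.551, 293.644),
    "V316I": (148.135, 294.203),
}

# Ratio Energy(mutant)/Energy(wt) for chain A.
ratio_to_wt = {snp: chain_a / WT_CHAIN_A for snp, (chain_a, _) in energies.items()}

# Factor by which the energy grows when both chains are mutated.
doubling = {snp: both / chain_a for snp, (chain_a, both) in energies.items()}

for snp in energies:
    print(f"{snp}: mutant/wt = {ratio_to_wt[snp]:.3f}, both/chain A = {doubling[snp]:.3f}")

above_two = sum(1 for factor in doubling.values() if factor > 2)
print(f"{above_two} of {len(doubling)} SNPs more than double their energy")
```

The first ratio reproduces the last column of the table; the second shows that the factor is slightly above 2 for eight of the ten SNPs and slightly below 2 for R118H and V316I.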


Comparison energies

FoldX

<figtable id="tab:MutFoldX"> Mutagenesis of the 10 selected SNPs performed with FoldX. The wild-type amino acid is shown in green, the mutated one in red. The whole structure of the wild type is colored light blue, that of the mutant light red. Hydrogen bonds of the mutant to its surroundings are depicted in red, those of the wild type in green.

A143T
I289V
N215S
P40S
P323T
Q279E
R118H
R356W
S65T
V316I

</figtable>


The program FoldX calculates the free energy of unfolding ∆G and thus predicts the stability of the mutated protein. The total energy of unfolding of the starting protein is 34.82, which can be taken as an indicator of a stable protein. The raw energies and the differences in each term of the force field can be looked up in the RawFoldX table and the DifFoldX table, respectively. Of interest in our case are the differences listed in <xr id="tab:DifEnergies"/>. There the increase or decrease of ∆G is given with respect to the "starting energy" of 34.82 and also with respect to the relative wild type of each mutation, which is computed by FoldX. Compared to the original protein, all SNPs except A143T have a lower Gibbs free energy and are thus less stable. Based on these values, only the mutants R356W and A143T can be considered almost as stable as or more stable than the original protein. In the comparison to the relative wild types, we consider the mutants R356W, V316I, S65T and I289V equally stable.
Looking at the pictures in <xr id="tab:MutFoldX"/>, which graphically compare the original wild type with the mutants, we mostly see a picture similar to the comparisons in <xr id="tab:MutSCWRL"/>. In most cases the orientation of the introduced amino acid differs slightly, but the hydrogen bonds are comparable to those added or deleted by SCWRL. A big difference can be seen in the mutant N215S, where FoldX does not keep the contact to the ligand and instead connects to another residue in the adjacent helix. This conformation can therefore be considered worse. In P40S and S65T, the hydrogen bonds formed also point more towards the inside and towards already bound structure than those of the SCWRL rotamers.

<figtable id="tab:DifEnergies"> Differences in the free energy of unfolding of each SNP compared to the starting energy of the wild type (see <xr id="tab:energyStart"/>) and to the energy of its relative wild type (see DifFoldX table), as calculated by FoldX.

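{| class="wikitable"
! SNP !! Difference to starting energy !! Fraction starting energy !! Difference to relative wild type !! Fraction relative wild type
|-
| A143T || 3.38 || 0.903 || 1.63 || 1.055
|-
| I289V || -1.42 || 1.041 || 0.88 || 1.025
|-
| N215S || -4.04 || 1.116 || 1.19 || 1.032
|-
| P40S || -2.39 || 1.069 || 3.62 || 1.108
|-
| P323T || -5.26 || 1.151 || 1.54 || 1.040
|-
| Q279E || -2.82 || 1.081 || 2.95 || 1.085
|-
| R118H || -2.43 || 1.070 || 1.19 || 1.033
|-
| R356W || -0.64 || 1.018 || -0.23 || 0.994
|-
| S65T || -8.71 || 1.250 || 0.16 || 1.004
|-
| V316I || -8.09 || 1.232 || -0.59 || 0.986
|}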

</figtable>
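The column "Fraction starting energy" in the table above is numerically consistent with (starting energy - difference) / starting energy. This relation is inferred here from the tabulated numbers and is not stated explicitly in the text, so the sketch below should be read as a consistency check under that assumption.

```python
# Total starting energy reported by FoldX for the unmutated structure.
START_ENERGY = 34.82

# SNP -> "Difference to starting energy" as listed in the table.
diff_to_start = {
    "A143T": 3.38, "I289V": -1.42, "N215S": -4.04, "P40S": -2.39,
    "P323T": -5.26, "Q279E": -2.82, "R118H": -2.43, "R356W": -0.64,
    "S65T": -8.71, "V316I": -8.09,
}

# Assumed relation: fraction = (start - difference) / start.
fraction_start = {
    snp: (START_ENERGY - diff) / START_ENERGY
    for snp, diff in diff_to_start.items()
}

for snp, fraction in fraction_start.items():
    print(f"{snp}: {fraction:.3f}")
```

Under this reading, a fraction below 1 (only A143T) corresponds to a positive difference, and a fraction above 1 to a negative one.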

<figtable id="tab:energyStart"> Starting energies calculated by FoldX before any stability calculations were performed on the protein.

{| class="wikitable"
! Type !! Energy
|-
| BackHbond || -528.41
|-
| SideHbond || -149.96
|-
| Energy_VdW || -1013.85
|-
| Electro || -25.51
|-
| Energy_SolvP || 1310.89
|-
| Energy_SolvH || -1318.53
|-
| Energy_vdwclash || 99.21
|-
| energy_torsion || 26.73
|-
| backbone_vdwclash || 478.11
|-
| Entropy_sidec || 482.20
|-
| Entropy_mainc || 1219.31
|-
| water bonds || -29.53
|-
| helix dipole || -12.51
|-
| loop_entropy || 0.00
|-
| cis_bond || 4.50
|-
| disulfide || -29.93
|-
| kn electrostatic || -0.98
|-
| partial covalent interactions || 0.00
|-
| Energy_Ionisation || 1.20
|-
| Entropy Complex || 0.00
|-
| Total || 34.82
|}

</figtable>


Minimise

<figure id="fig:Minimise1">

Example of the changes minimise makes to the structure. Depicted are two beta sheets that are structurally far away from the mutation at position 143 (see next figure). Iterations 1 through 4 keep the sheets, but the structure of iteration 5 (red) turns them into a coil.

</figure> <figure id="fig:Minimise2">

Example of the changes minimise makes to the structure at the mutated site. The mutation site at position 143 is changed only slightly and all secondary structure elements are kept. This differs from other parts of the structure, as depicted in the figure above.

</figure>

<figtable id="tab:Energies"> Comparison of the energies calculated by minimise after each iteration step. Each figure shows the energies of the structures created by SCWRL and FoldX for one point mutation. The SCWRL energies are plotted in red, the FoldX energies in blue. In all pictures the energies become worse (larger) in almost every step and are always worse than at the beginning.

A143T
I289V
N215S
P40S
P323T
Q279E
R118H
R356W
S65T
V316I

</figtable>

Minimise tries to ??....??. In <xr id="fig:Minimise2"/> an example of a point mutation is shown, in which position 143 is mutated to a threonine. Here minimise made only small changes. This might be because this position had already been optimised by SCWRL (in this case). Other parts of the same structure, on the other hand, were changed dramatically, as can be seen in <xr id="fig:Minimise1"/>: the two sheets shown there were converted into a coil during the fifth iteration of minimise. Similar modifications can be observed in the iterations of the other SNPs, too.

In the tables MinItFoldX and MinItSCWRL the final energies of the iterations are listed. They are also plotted and compared in the figures in <xr id="tab:Energies"/>. The SCWRL structures always have a higher and thus worse energy than the FoldX structures, although it seems that after five iterations the energies of the structures of both programs tend to converge. (TODO: Maybe on example with more than 5 iterations. P232T?) What becomes obvious is that minimise does not appear to optimise the energy of the structure, because the energy gets worse in almost every iteration step.

Gromacs

Conclusion

What are we concluding this week? Which one is disease causing?

<figtable id="tab:ScoringScheme"> ADD CAPTION HERE

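{| class="wikitable"
! SNP !! Visualisation !! PyMOL hbond !! PyMOL surface !! SCWRL energy !! SCWRL visual !! FoldX energy !! FoldX visual !! Gromacs !! Sum !! Prediction !! True classification !! Result prediction
|-
| P40S || + || + || + || - || - || - || 0 || || 0 || non-disease-causing || disease-causing || Wrong
|-
| S65T || - || 0 || + || -- || -- || - || -- || || -7 || disease-causing || disease-causing || Right
|-
| R118H || 0 || -- || 0 || 0 || - || 0 || - || || -4 || disease-causing || non-disease-causing || Wrong
|-
| A143T || - || -- || -- || 0 || - || + || - || || -6 || disease-causing || disease-causing || Right
|-
| N215S || -- || 0 || + || - || - || - || 0 || || -4 || disease-causing || disease-causing || Right
|-
| Q279E || + || 0 || 0 || - || - || 0 || - || || -2 || disease-causing || disease-causing || Right
|-
| I289V || + || + || - || + || + || 0 || + || || 4 || non-disease-causing || non-disease-causing || Right
|-
| V316I || 0 || + || - || -- || + || 0 || + || || 0 || non-disease-causing || non-disease-causing || Right
|-
| P323T || + || + || 0 || - || - || 0 || - || || -1 || non-disease-causing || non-disease-causing || Right
|-
| R356W || 0 || -- || -- || -- || -- || 0 || -- || || -10 || disease-causing || disease-causing || Right
|}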

</figtable>

The prediction in this week's task relied mainly on intuition, especially for the visual inspections. We came up with a scoring scheme that ranges from "--" to "+", meaning "very damaging" and "probably not disease-causing", respectively. A single "-" means "damaging"; a "0" can mean either "probably not too damaging" or an "indecisive conclusion". The reasoning behind the visualisation and the PyMOL scores can be found in the descriptions in the sections Visualisation and Pymol.

The SCWRL energy score was assigned a "--" if the ratio "Energy(mutant)/Energy(wt)" was greater than 1.04, a "-" from 1.03 to 1.04, a "0" if it was greater than 1 and smaller than 1.03, and a "+" if it was smaller than 1. The visual score was based mainly on the introduced and deleted hydrogen bonds and on the orientation of the mutated amino acid.
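The thresholds for the SCWRL energy score can be written as a small function. This is only a sketch of the rule as described; the function name is made up for illustration, and the treatment of a ratio of exactly 1 (scored "0" here) is an assumption, since the text does not cover that case.

```python
def scwrl_energy_score(ratio):
    """Map the ratio Energy(mutant)/Energy(wt) to a score symbol."""
    if ratio > 1.04:
        return "--"
    if ratio >= 1.03:
        return "-"
    if ratio >= 1.0:  # assumption: exactly 1.0 counts as "0"
        return "0"
    return "+"


# Ratios from the SCWRL energy table.
scwrl_ratios = {
    "A143T": 1.029, "I289V": 0.995, "N215S": 1.033, "P323T": 1.033,
    "P40S": 1.039, "Q279E": 1.031, "R118H": 1.028, "R356W": 1.079,
    "S65T": 1.049, "V316I": 1.067,
}

scwrl_scores = {snp: scwrl_energy_score(r) for snp, r in scwrl_ratios.items()}
for snp, score in scwrl_scores.items():
    print(snp, score)
```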

The FoldX energy score is based on two values, the "Fraction starting energy" and the "Fraction relative wild type". These two were evaluated independently and then added. A "+" was assigned to either of them if its value was smaller than 1, a "0" if it was between 1 and 1.1, and a "-" if it was above 1.1.
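A literal reading of this rule can be sketched as follows. The function names are hypothetical, the score is returned as an integer (+1, 0, -1 per fraction, then added), and the handling of the exact boundaries 1 and 1.1 is an assumption.

```python
def fraction_score(fraction):
    """Score a single FoldX fraction: +1 below 1, 0 up to 1.1, -1 above 1.1."""
    if fraction < 1.0:
        return 1
    if fraction <= 1.1:  # assumption: both boundaries count as "0"
        return 0
    return -1


def foldx_energy_score(fraction_start, fraction_relative_wt):
    """Evaluate both fractions independently and add the two scores."""
    return fraction_score(fraction_start) + fraction_score(fraction_relative_wt)


# Examples with values from the FoldX difference table.
print("A143T", foldx_energy_score(0.903, 1.055))
print("S65T", foldx_energy_score(1.250, 1.004))
print("V316I", foldx_energy_score(1.232, 0.986))
```

Since the scheme as a whole relied partly on intuition, the hand-assigned FoldX scores in the table need not coincide with this literal rule for every SNP.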

All these scores were summed up and each SNP was then rated either non-disease-causing (sum greater than or equal to -1) or disease-causing (sum smaller than -1). The results are listed in <xr id="tab:ScoringScheme"/>.
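The summation and the final rating can be reproduced from the score symbols in the table. The sketch below assumes the numeric values -2, -1, 0 and +1 for "--", "-", "0" and "+", which is consistent with the sums listed; the variable and function names are chosen for illustration.

```python
# Assumed numeric value of each score symbol.
VALUE = {"--": -2, "-": -1, "0": 0, "+": 1}

# Seven scores per SNP in table order (the Gromacs column is empty).
scores = {
    "P40S":  "+ + + - - - 0",
    "S65T":  "- 0 + -- -- - --",
    "R118H": "0 -- 0 0 - 0 -",
    "A143T": "- -- -- 0 - + -",
    "N215S": "-- 0 + - - - 0",
    "Q279E": "+ 0 0 - - 0 -",
    "I289V": "+ + - + + 0 +",
    "V316I": "0 + - -- + 0 +",
    "P323T": "+ + 0 - - 0 -",
    "R356W": "0 -- -- -- -- 0 --",
}

# True classification as listed in the table (True = disease-causing).
truly_disease_causing = {
    "P40S": True, "S65T": True, "R118H": False, "A143T": True, "N215S": True,
    "Q279E": True, "I289V": False, "V316I": False, "P323T": False, "R356W": True,
}


def predict_disease_causing(total):
    """A sum smaller than -1 is rated disease-causing."""
    return total < -1


totals = {snp: sum(VALUE[s] for s in row.split()) for snp, row in scores.items()}
correct = 0
for snp, total in totals.items():
    prediction = predict_disease_causing(total)
    correct += prediction == truly_disease_causing[snp]
    label = "disease-causing" if prediction else "non-disease-causing"
    print(f"{snp}: sum {total:>3} -> {label}")

print(f"{correct} of {len(totals)} predictions agree with the true classification")
```

This reproduces the "Sum" column and yields eight correct predictions out of ten, with P40S and R118H misclassified.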

References

<references />