Fabry:Sequence-based mutation analysis


The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Dataset preparation

Q279E
N215S
I289V
S65T
R356W
V316I
P323T
P40S
R118H
A143T

Amino acid properties

<figtable id="tab:aaProp"> Physicochemical properties of the chosen SNPs and changes of properties between wildtype (wt) and mutant (mt).
Abbreviations used in this table:
AA: Amino Acid, Pol: Side-chain polarity, Charge: Side-chain charge at pH 7.4, HI: Hydropathy index, RM: Residue Mass, iP: isoelectric point

SNP	wt AA	wt Pol	wt Charge	wt HI	wt RM	wt iP	mt AA	mt Pol	mt Charge	mt HI	mt RM	mt iP	change in Pol	change in Charge	change in HI	change in RM	change in iP
Q279E	Q	polar	neutral	-3.5	128.131	5.65	E	polar	negative	-3.5	129.116	3.15	none	neutral to negative	0	0.99	-2.5
N215S	N	polar	neutral	-3.5	114.104	5.41	S	polar	neutral	-0.8	87.078	5.68	none	none	2.7	-27.026	0.27
I289V	I	nonpolar	neutral	4.5	113.160	6.05	V	nonpolar	neutral	4.2	99.133	6.00	none	none	-0.3	-14.027	-0.05
S65T	S	polar	neutral	-0.8	87.078	5.68	T	polar	neutral	-0.7	101.105	5.60	none	none	0.1	14.027	-0.08
R356W	R	polar	positive	-4.5	156.188	10.76	W	nonpolar	neutral	-0.9	186.213	5.89	polar to nonpolar	positive to neutral	3.6	30.025	-4.87
V316I	V	nonpolar	neutral	4.2	99.133	6.00	I	nonpolar	neutral	4.5	113.160	6.05	none	none	0.3	14.027	0.05
P323T	P	nonpolar	neutral	-1.6	97.117	6.30	T	polar	neutral	-0.7	101.105	5.60	nonpolar to polar	none	0.9	3.988	-0.7
P40S	P	nonpolar	neutral	-1.6	97.117	6.30	S	polar	neutral	-0.8	87.078	5.68	nonpolar to polar	none	0.8	-10.039	-0.62
R118H	R	polar	positive	-4.5	156.188	10.76	H	polar	pos(10%), neutr(90%)	-3.2	137.142	7.60	none	positive to pos(10%), neutr(90%)	1.3	-19.046	-3.16
A143T	A	nonpolar	neutral	1.8	71.079	6.01	T	polar	neutral	-0.7	101.105	5.60	nonpolar to polar	none	-2.5	30.026	-0.41

</figtable>

The polarity of the side chain determines whether an amino acid is hydrophobic or not. Hydrophobicity is a measure of how poorly an amino acid dissolves in water. Hydrophobic amino acids are more likely to be found in the interior of a protein, while hydrophilic amino acids tend to be in contact with the aqueous environment. <ref>Hydrophobicity Index for Common Amino Acids http://www.sigmaaldrich.com/life-science/metabolomics/learning-center/amino-acid-reference-chart.html#hydro, June 16, 2012</ref> Therefore, depending on where an amino acid is located, a change in polarity due to a mutation can cause a major defect. This may be the case for the SNPs P40S, A143T, P323T and R356W.
Furthermore, the side-chain charge (positive or negative) is important for the structure of a protein because it governs how the amino acid interacts with nearby residues. A change of charge can therefore disrupt the integrity of the protein, which might happen with the mutations R118H, Q279E and R356W.
The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side chain.<ref>Kyte J, Doolittle RF (May 1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1): 105–32. PMID 7108955. </ref> The larger the number, the more hydrophobic the amino acid; the most hydrophobic amino acid is isoleucine (4.5) and the most hydrophilic one is arginine (-4.5). Taking a hydropathy change of more than 1 in either direction as critical, only one SNP markedly increases the hydrophilic character of the position (A143T, change of -2.5), while three increase its hydrophobicity (R118H, N215S and R356W, with changes of 1.3, 2.7 and 3.6).
The average residue mass ranges from 57.052 (glycine) to 186.213 (tryptophan); we therefore consider a mass change of more than 10 in either direction as critical. This concerns all mutations except Q279E and P323T.
The isoelectric point (pI, listed as iP in the table) is the pH at which an amino acid carries no net charge. Below the pI it carries a net positive charge, above it a net negative charge. Since the pH in the human body is on average 6.7, only three amino acids are positively charged (histidine, lysine and arginine). The pI ranges from 2.85 (aspartic acid) to 10.76 (arginine); we therefore considered a change of at least 0.8 in either direction as probably disease causing. This applies only to R118H, Q279E and R356W.
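The three numeric cutoffs described above can be checked with a few lines of Python. This is a minimal sketch, not the commands used for the original analysis: the property values are copied from the table, while the names `PROPS`, `SNPS` and `flag_snp` and the reading of the cutoffs as absolute differences (more than 1 for HI, more than 10 for RM, at least 0.8 for the isoelectric point) are assumptions made for illustration.

```python
# (hydropathy index, residue mass, isoelectric point) per residue,
# copied from the amino acid property table above.
PROPS = {
    "A": (1.8, 71.079, 6.01),
    "E": (-3.5, 129.116, 3.15),
    "H": (-3.2, 137.142, 7.60),
    "I": (4.5, 113.160, 6.05),
    "N": (-3.5, 114.104, 5.41),
    "P": (-1.6, 97.117, 6.30),
    "Q": (-3.5, 128.131, 5.65),
    "R": (-4.5, 156.188, 10.76),
    "S": (-0.8, 87.078, 5.68),
    "T": (-0.7, 101.105, 5.60),
    "V": (4.2, 99.133, 6.00),
    "W": (-0.9, 186.213, 5.89),
}

SNPS = ["Q279E", "N215S", "I289V", "S65T", "R356W",
        "V316I", "P323T", "P40S", "R118H", "A143T"]


def flag_snp(snp, hi_cut=1.0, mass_cut=10.0, pi_cut=0.8):
    """Return property differences (mutant minus wild type) and cutoff flags."""
    wt, mt = snp[0], snp[-1]
    # Round to three decimals to hide floating-point noise in the differences.
    d_hi, d_mass, d_pi = (round(m - w, 3) for w, m in zip(PROPS[wt], PROPS[mt]))
    return {
        "d_HI": d_hi,
        "d_RM": d_mass,
        "d_pI": d_pi,
        "HI": abs(d_hi) > hi_cut,        # assumed: strictly more than 1
        "RM": abs(d_mass) > mass_cut,    # assumed: strictly more than 10
        "pI": abs(d_pi) >= pi_cut,       # assumed: at least 0.8
    }


flags = {snp: flag_snp(snp) for snp in SNPS}
for snp, result in flags.items():
    print(snp, result)
```

A negative `d_HI` means the position becomes more hydrophilic, a positive one more hydrophobic, following the Kyte-Doolittle convention quoted above.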

Simple structural analysis

P40S

The SNP P40S. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP P40S in the α-galactosidase protein is marked in purple.

</figure>

The SNP P40S. The wild type amino acid is colored in green, the mutation in red.

</figure>

S65T

The SNP S65T. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP S65T in the α-galactosidase protein is marked in purple.

</figure>

The SNP S65T. The wild type amino acid is colored in green, the mutation in red.

</figure>

R118H

The SNP R118H. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP R118H in the α-galactosidase protein is marked in purple.

</figure>

The SNP R118H. The wild type amino acid is colored in green, the mutation in red.

</figure>

A143T

The SNP A143T. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP A143T in the α-galactosidase protein is marked in purple.

</figure>

The SNP A143T. The wild type amino acid is colored in green, the mutation in red.

</figure>

N215S

The SNP N215S. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP N215S in the α-galactosidase protein is marked in purple.

</figure>

The SNP N215S. The wild type amino acid is colored in green, the mutation in red.

</figure>

Q279E

The SNP Q279E. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP Q279E in the α-galactosidase protein is marked in purple.

</figure>

The SNP Q279E. The wild type amino acid is colored in green, the mutation in red.

</figure>

I289V

The SNP I289V. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP I289V in the α-galactosidase protein is marked in purple.

</figure>

The SNP I289V. The wild type amino acid is colored in green, the mutation in red.

</figure>

V316I

The SNP V316I. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP V316I in the α-galactosidase protein is marked in purple.

</figure>

The SNP V316I. The wild type amino acid is colored in green, the mutation in red.

</figure>

P323T

The SNP P323T. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP P323T in the α-galactosidase protein is marked in purple.

</figure>

The SNP P323T. The wild type amino acid is colored in green, the mutation in red.

</figure>

R356W

The SNP R356W. The wild type amino acid is colored in green, the mutation in red.

</figure>

The position of the SNP R356W in the α-galactosidase protein is marked in purple.

</figure>

The SNP R356W. The wild type amino acid is colored in green, the mutation in red.

</figure>

Secondary Structure

<figtable id="tab:Location"> Secondary structure from Task 3, as predicted by Psipred and Reprof and as assigned by DSSP, for the mutated amino acid itself and
the ten adjacent residues to the left and to the right.
H denotes a helix at this position, C a coil region, E a sheet and - a position for which no structure could be determined.

SNP	SecStruc Psipred	SecStruc Psipred long	SecStruc Reprof	SecStruc Reprof long	SecStruc DSSP	SecStruc DSSP long
Q279E	H	CCCCCCCHHHHHHHHHHHHHH	H	EECCCCCCHHHHHHHHHHHHH	H	CCCCC--HHHHHHHHHHHHHC
N215S	H	CCCCCCCCCCHHHHHCCCCCC	H	CECCCCCCCCHHHHHHHHHHH	H	HHHCCCC---HHHHCCC-CEE
I289V	H	HHHHHHHHHHHCCCEEEECCC	H	HHHHHHHHHHHHHCHHCCCCC	C	HHHHHHHHHHCC--EEE-C-C
S65T	H	CCCCCCCCCCHHHHHHHHHHH	H	CCCCCCCHHHHHHHHHHHHHH	H	CCC-CCCC-CHHHHHHHHHHH
R356W	C	CCEEEEEEECCCCCCEEEEEE	C	HHHHHHHHHHCCCCCCCCHHH	-	CEEEEEEEE---CCC-EEEEE
V316I	H	HHHCCCCHHHHHHCCCCCCCC	E	HHHHCCCCCEEEECCCCCCCC	H	HHHHHH-HHHHHHHC-CC---
P323T	C	HHHHHHCCCCCCCCCEEEEEC	C	CCEEEECCCCCCCCCCEECCC	C	HHHHHHHC-CC----EEEE-C
P40S	C	CCCCCCCCCCCCCCCCCCCCC	C	HHCCCCCCCCCCCHHHHHHEE	-	---CC--CC--EEEECHHHHC
R118H	H	CCCCCCCCHHHHHHHHHHCCC	H	CCCCCCHHHHHHHHHHHCCCC	H	-CCC-CCHHHHHHHHHHHCC-
A143T	C	EECCCCCCCCCCCCCCCCHHH	C	EEECCCCCCCCCCCCCCCCCC	C	EEECCCE-CCCCE--CCCHHH

</figtable>

Although we know from last week's task that the disease-causing mutations are spread over the whole protein irrespective of secondary structure, we proceeded as if we had no such prior knowledge. We therefore looked at the secondary structure at the position of each point mutation and in its surroundings (10 residues to the left and 10 residues to the right). The only notable observation is that (almost) none of the mutated residues lies in a sheet. From Task 6 we know that this is a chance effect of the small number of SNPs picked.
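The observation about sheets can be reproduced directly from the 21-residue windows in the table. The following Python sketch copies the windows verbatim and reads off the central (11th) character of each; the names `WINDOWS`, `METHODS` and `central_state` are hypothetical helpers introduced here for illustration.

```python
# 21-residue windows (mutated residue in the middle) copied from the table:
# one tuple per SNP, in the order Psipred, Reprof, DSSP.
METHODS = ("Psipred", "Reprof", "DSSP")
WINDOWS = {
    "Q279E": ("CCCCCCCHHHHHHHHHHHHHH", "EECCCCCCHHHHHHHHHHHHH", "CCCCC--HHHHHHHHHHHHHC"),
    "N215S": ("CCCCCCCCCCHHHHHCCCCCC", "CECCCCCCCCHHHHHHHHHHH", "HHHCCCC---HHHHCCC-CEE"),
    "I289V": ("HHHHHHHHHHHCCCEEEECCC", "HHHHHHHHHHHHHCHHCCCCC", "HHHHHHHHHHCC--EEE-C-C"),
    "S65T":  ("CCCCCCCCCCHHHHHHHHHHH", "CCCCCCCHHHHHHHHHHHHHH", "CCC-CCCC-CHHHHHHHHHHH"),
    "R356W": ("CCEEEEEEECCCCCCEEEEEE", "HHHHHHHHHHCCCCCCCCHHH", "CEEEEEEEE---CCC-EEEEE"),
    "V316I": ("HHHCCCCHHHHHHCCCCCCCC", "HHHHCCCCCEEEECCCCCCCC", "HHHHHH-HHHHHHHC-CC---"),
    "P323T": ("HHHHHHCCCCCCCCCEEEEEC", "CCEEEECCCCCCCCCCEECCC", "HHHHHHHC-CC----EEEE-C"),
    "P40S":  ("CCCCCCCCCCCCCCCCCCCCC", "HHCCCCCCCCCCCHHHHHHEE", "---CC--CC--EEEECHHHHC"),
    "R118H": ("CCCCCCCCHHHHHHHHHHCCC", "CCCCCCHHHHHHHHHHHCCCC", "-CCC-CCHHHHHHHHHHHCC-"),
    "A143T": ("EECCCCCCCCCCCCCCCCHHH", "EEECCCCCCCCCCCCCCCCCC", "EEECCCE-CCCCE--CCCHHH"),
}


def central_state(window):
    """Return the state of the middle residue of an odd-length window."""
    if len(window) % 2 == 0:
        raise ValueError("window must have odd length")
    return window[len(window) // 2]


# Collect every (SNP, method) pair whose mutated residue sits in a sheet.
sheet_hits = [
    (snp, method)
    for snp, windows in WINDOWS.items()
    for method, window in zip(METHODS, windows)
    if central_state(window) == "E"
]
print(sheet_hits)
```

With the windows as listed, the only sheet call at a mutated position comes from Reprof for V316I, which is what the "(almost) no sheets" remark refers to.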

Substitution matrices

<figtable id="tab:Subsmatr"> Substitution values for all SNPs,
assigned by the three substitution matrices
BLOSUM62, PAM1 and PAM250.

SNP	Value BLOSUM62	Value PAM1	Value PAM250
Q279E	3	27	2
N215S	1	20	1
I289V	4	33	4
S65T	2	38	1
R356W	-4	8	2
V316I	4	57	4
P323T	-2	4	0
P40S	-1	12	1
R118H	0	10	2
A143T	0	32	1

</figtable>

The PAM1 and PAM250 matrices are designed for proteins at very different evolutionary distances, roughly 99% and ~20% sequence identity, respectively, so the two tend to give contradictory scores for how likely a substitution is. BLOSUM62 and PAM1, on the other hand, usually give similar predictions, even though the BLOSUM matrix was built from sequences with less than 62 percent identity.
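How closely two matrices agree on the ten SNPs can be quantified with a rank correlation of the scores in the table. The sketch below is one possible check and was not part of the original analysis: the scores are copied from the table, while the choice of Spearman's coefficient and the helper names `average_ranks`, `pearson` and `spearman` are assumptions made for illustration.

```python
# Scores copied from the substitution matrix table, in table order:
# Q279E, N215S, I289V, S65T, R356W, V316I, P323T, P40S, R118H, A143T
BLOSUM62 = [3, 1, 4, 2, -4, 4, -2, -1, 0, 0]
PAM1 = [27, 20, 33, 38, 8, 57, 4, 12, 10, 32]
PAM250 = [2, 1, 4, 1, 2, 4, 0, 1, 2, 1]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


for name, (a, b) in {
    "BLOSUM62 vs PAM1": (BLOSUM62, PAM1),
    "BLOSUM62 vs PAM250": (BLOSUM62, PAM250),
    "PAM1 vs PAM250": (PAM1, PAM250),
}.items():
    print(f"{name}: {spearman(a, b):.2f}")
```

With only ten substitutions the coefficients are rough indicators at best, so they should be read as a sanity check rather than as evidence for or against the general statement above.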

PSSM

<figtable id="tab:pssm"> Occurrence of the wild-type and the mutant residue at each mutated position according to the PSSM generated by PSI-BLAST

conservation of	P40S	S65T	R118H	A143T	N215S	Q279E	I289V	V316I	P323T	R356W
wild type	81%	16%	8%	11%	6%	12%	17%	29%	15%	15%
mutant type	2%	20%	2%	6%	4%	38%	11%	19%	7%	4%

</figtable>
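One simple way to read the PSSM table is to ask at which positions the mutant residue is at least as frequent as the wild type. The Python sketch below copies the percentages from the table; the comparison rule and the name `PSSM_PERCENT` are assumptions made for illustration and were not part of the original analysis.

```python
# (wild type %, mutant %) per SNP, copied from the PSSM table in column order.
PSSM_PERCENT = {
    "P40S": (81, 2),
    "S65T": (16, 20),
    "R118H": (8, 2),
    "A143T": (11, 6),
    "N215S": (6, 4),
    "Q279E": (12, 38),
    "I289V": (17, 11),
    "V316I": (29, 19),
    "P323T": (15, 7),
    "R356W": (15, 4),
}

# SNPs whose mutant residue is observed at least as often as the wild type.
mutant_common = [snp for snp, (wt, mt) in PSSM_PERCENT.items() if mt >= wt]

# Drop in frequency from wild type to mutant, largest drop first.
by_drop = sorted(PSSM_PERCENT, key=lambda s: PSSM_PERCENT[s][0] - PSSM_PERCENT[s][1],
                 reverse=True)

print("mutant at least as frequent as wild type:", mutant_common)
print("largest wild type to mutant drop:", by_drop[0])
```

By this reading only S65T and Q279E have a mutant residue that is more common in the profile than the wild type, while P40S shows by far the strongest preference for the wild-type residue.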

Multiple sequence alignment

A further step closer to evolution: identify all mammalian homologous sequences and create a multiple sequence alignment for them with a method of your choice. From this alignment, the conservation of the wild-type and mutant residues can be calculated again and compared with the matrix- and PSSM-derived results.
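The conservation calculation described here could look roughly as follows once an alignment is available. This is a sketch under stated assumptions: the alignment is given as equal-length gapped strings with the reference (human) sequence first, gaps are written as "-", and `toy_msa` is an invented example, not real α-galactosidase data.

```python
from collections import Counter


def alignment_column(reference_row, position):
    """Map a 1-based position in the ungapped reference to a 0-based column."""
    seen = 0
    for col, char in enumerate(reference_row):
        if char != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position lies beyond the end of the reference sequence")


def residue_frequency(rows, col, residue):
    """Fraction of non-gap residues in the column that equal the given residue."""
    column = [row[col] for row in rows if row[col] != "-"]
    if not column:
        return 0.0
    return Counter(column)[residue] / len(column)


# Invented toy alignment; the first row plays the role of the reference.
toy_msa = [
    "MK-PLS",
    "MKAPLT",
    "MR-PLS",
    "MK-ALS",
]

# Hypothetical substitution P3A in the toy reference sequence.
col = alignment_column(toy_msa[0], 3)
wt_freq = residue_frequency(toy_msa, col, "P")
mt_freq = residue_frequency(toy_msa, col, "A")
print(col, wt_freq, mt_freq)
```

Mapping the sequence position to an alignment column first matters because gaps in the reference row shift all later residues to the right.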

Scoring methods

SIFT

<figtable id="tab:Sift"> SIFT scores

SNP	Prediction	Sift Score	Sequences represented at this position
P40S	AFFECT PROTEIN FUNCTION	0.00	41
S65T	AFFECT PROTEIN FUNCTION	0.01	45
R118H	TOLERATED	0.06	48
A143T	AFFECT PROTEIN FUNCTION	0.01	48
N215S	AFFECT PROTEIN FUNCTION	0.01	48
Q279E	AFFECT PROTEIN FUNCTION	0.00	48
I289V	AFFECT PROTEIN FUNCTION	0.05	48
V316I	TOLERATED	0.75	48
P323T	AFFECT PROTEIN FUNCTION	0.01	48
R356W	AFFECT PROTEIN FUNCTION	0.01	47

</figtable>

Median sequence conservation: 2.99


Polyphen2

<figtable id="tab:Polyphen"> Polyphen2 scores

SNP	rs ID	Sec Struc	Prediction	pph2 Class	pph2 Prob	pph2 FPR	pph2 TPR	pph2 FDR
Q279E	rs28935485	H	probably damaging	deleterious	0.983	0.0387	0.745	0.0657
N215S	rs28935197	.	benign	neutral	0.048	0.167	0.941	0.194
I289V	?	H	probably damaging	deleterious	0.975	0.0436	0.762	0.072
S65T	?	.	probably damaging	deleterious	0.995	0.0277	0.681	0.0521
R356W	?	.	probably damaging	deleterious	1	0.00026	0.00018	0.0109
V316I	?	H	benign	neutral	0.308	0.113	0.904	0.144
P323T	?	T	possibly damaging	deleterious	0.612	0.091	0.872	0.124
P40S	?	.	probably damaging	deleterious	1	0.00026	0.00018	0.0109
R118H	?	H	benign	neutral	0.015	0.209	0.956	0.229
A143T	?	T	probably damaging	deleterious	1	0.00026	0.00018	0.0109

</figtable>

Polyphen2 [1]

SNAP

<figtable id="tab:snap2"> Predictions by SNAP2 of the effect of SNPs relative to the wild type

Mutation	Binary prediction	Reliability Index	Expected Accuracy
P40A	Non-neutral	1	60%
P40C	Non-neutral	2	63%
P40D	Non-neutral	2	63%
P40E	Non-neutral	2	63%
P40F	Non-neutral	5	75%
P40G	Non-neutral	4	71%
P40H	Non-neutral	2	63%
P40I	Non-neutral	3	67%
P40K	Non-neutral	1	60%
P40L	Non-neutral	3	67%
P40M	Non-neutral	2	63%
P40N	Non-neutral	2	63%
P40Q	Non-neutral	1	60%
P40R	Non-neutral	2	63%
P40S	Neutral	0	51%
P40T	Neutral	0	51%
P40V	Non-neutral	1	60%
P40W	Non-neutral	7	80%
P40Y	Non-neutral	4	71%

S65A	Neutral	4	67%
S65C	Neutral	2	59%
S65D	Neutral	2	59%
S65E	Neutral	3	62%
S65F	Non-neutral	1	60%
S65G	Neutral	4	67%
S65H	Neutral	3	62%
S65I	Neutral	1	53%
S65K	Neutral	4	67%
S65L	Neutral	0	51%
S65M	Neutral	0	51%
S65N	Neutral	6	79%
S65P	Neutral	1	53%
S65Q	Neutral	3	62%
S65R	Neutral	3	62%
S65T	Neutral	6	79%
S65V	Neutral	1	53%
S65W	Non-neutral	5	75%
S65Y	Neutral	0	51%

R118A	Non-neutral	4	71%
R118C	Non-neutral	4	71%
R118D	Non-neutral	4	71%
R118E	Non-neutral	3	67%
R118F	Non-neutral	4	71%
R118G	Non-neutral	3	67%
R118H	Non-neutral	1	60%
R118I	Non-neutral	3	67%
R118K	Neutral	1	53%
R118L	Non-neutral	3	67%
R118M	Non-neutral	3	67%
R118N	Neutral	0	51%
R118P	Non-neutral	4	71%
R118Q	Neutral	0	51%
R118S	Neutral	0	51%
R118T	Neutral	0	51%
R118V	Non-neutral	3	67%
R118W	Non-neutral	7	80%
R118Y	Non-neutral	3	67%

A143C	Neutral	4	67%
A143D	Neutral	0	51%
A143E	Neutral	2	59%
A143F	Neutral	0	51%
A143G	Neutral	6	79%
A143H	Neutral	4	67%
A143I	Neutral	4	67%
A143K	Neutral	4	67%
A143L	Neutral	3	62%
A143M	Neutral	4	67%
A143N	Neutral	5	73%
A143P	Neutral	3	62%
A143Q	Neutral	4	67%
A143R	Neutral	3	62%
A143S	Neutral	7	85%
A143T	Neutral	7	85%
A143V	Neutral	5	73%
A143W	Non-neutral	5	75%
A143Y	Neutral	1	53%

N215A	Neutral	1	53%
N215C	Neutral	1	53%
N215D	Neutral	4	67%
N215E	Neutral	4	67%
N215F	Neutral	0	51%
N215G	Neutral	4	67%
N215H	Neutral	5	73%
N215I	Neutral	1	53%
N215K	Neutral	6	79%
N215L	Neutral	0	51%
N215M	Neutral	1	53%
N215P	Neutral	2	59%
N215Q	Neutral	5	73%
N215R	Neutral	5	73%
N215S	Neutral	7	85%
N215T	Neutral	7	85%
N215V	Neutral	2	59%
N215W	Non-neutral	4	71%
N215Y	Neutral	0	51%

Q279A	Neutral	1	53%
Q279C	Neutral	1	53%
Q279D	Neutral	3	62%
Q279E	Neutral	6	79%
Q279F	Neutral	1	53%
Q279G	Neutral	2	59%
Q279H	Neutral	6	79%
Q279I	Neutral	3	62%
Q279K	Neutral	6	79%
Q279L	Neutral	3	62%
Q279M	Neutral	3	62%
Q279N	Neutral	6	79%
Q279P	Neutral	3	62%
Q279R	Neutral	5	73%
Q279S	Neutral	6	79%
Q279T	Neutral	6	79%
Q279V	Neutral	3	62%
Q279W	Non-neutral	2	63%
Q279Y	Neutral	2	59%

I289A	Neutral	0	51%
I289C	Neutral	2	59%
I289D	Non-neutral	4	71%
I289E	Neutral	0	51%
I289F	Neutral	2	59%
I289G	Non-neutral	4	71%
I289H	Neutral	1	53%
I289K	Neutral	0	51%
I289L	Neutral	7	85%
I289M	Neutral	6	79%
I289N	Non-neutral	1	60%
I289P	Non-neutral	3	67%
I289Q	Neutral	1	53%
I289R	Non-neutral	1	60%
I289S	Neutral	0	51%
I289T	Neutral	2	59%
I289V	Neutral	8	91%
I289W	Non-neutral	4	71%
I289Y	Neutral	0	51%

V316A	Neutral	4	67%
V316C	Neutral	3	62%
V316D	Non-neutral	2	63%
V316E	Neutral	2	59%
V316F	Neutral	0	51%
V316G	Non-neutral	1	60%
V316H	Neutral	2	59%
V316I	Neutral	8	91%
V316K	Neutral	0	51%
V316L	Neutral	7	85%
V316M	Neutral	5	73%
V316N	Neutral	1	53%
V316P	Neutral	0	51%
V316Q	Neutral	2	59%
V316R	Neutral	0	51%
V316S	Neutral	1	53%
V316T	Neutral	4	67%
V316W	Non-neutral	4	71%
V316Y	Neutral	0	51%

P323A	Neutral	2	59%
P323C	Neutral	0	51%
P323D	Neutral	1	53%
P323E	Neutral	2	59%
P323F	Non-neutral	2	63%
P323G	Neutral	0	51%
P323H	Neutral	2	59%
P323I	Neutral	0	51%
P323K	Neutral	1	53%
P323L	Neutral	0	51%
P323M	Neutral	0	51%
P323N	Neutral	2	59%
P323Q	Neutral	3	62%
P323R	Neutral	1	53%
P323S	Neutral	3	62%
P323T	Neutral	4	67%
P323V	Neutral	1	53%
P323W	Non-neutral	5	75%
P323Y	Neutral	0	51%

R356A	Non-neutral	3	67%
R356C	Non-neutral	2	63%
R356D	Non-neutral	2	63%
R356E	Non-neutral	2	63%
R356F	Non-neutral	3	67%
R356G	Non-neutral	2	63%
R356H	Neutral	0	51%
R356I	Non-neutral	2	63%
R356K	Neutral	3	62%
R356L	Non-neutral	2	63%
R356M	Non-neutral	2	63%
R356N	Neutral	1	53%
R356P	Non-neutral	3	67%
R356Q	Neutral	0	51%
R356S	Neutral	1	53%
R356T	Neutral	1	53%
R356V	Non-neutral	2	63%
R356W	Non-neutral	5	75%
R356Y	Non-neutral	1	60%

</figtable>

Results and Conclusion

<figtable id="tab:Overview"> This table gives an overview of all features examined in the sections above. A red background indicates a disease-causing prediction, a green background a non-disease-causing one. At the end of each row all red fields are counted, and the resulting value leads to our prediction given in <xr id="tab:Result"/>.
Abbreviations used in this table:
AA: Amino Acid, Pol: Side-chain polarity, Charge: Side-chain charge at pH 7.4, HI: Hydropathy index, RM: Residue Mass, iP: isoelectric point

SNP	change in Pol	change in Charge	change in HI	change in RM	change in iP	SecStruc Psipred	SecStruc Reprof	SecStruc DSSP	Value BLOSUM62	Value PAM1	Value PAM250	Sift Score	pph2 Class	SNAP prediction	Sum bad scores
A143T	nonpolar to polar	none	-2.5	30.026	-0.41	C	C	C	0	32	1	0.01	deleterious	neutral	6
R356W	polar to nonpolar	positive to neutral	3.6	30.025	-4.87	C	C	-	-4	8	2	0.01	deleterious	non-neutral	10
I289V	none	none	-0.3	-14.027	-0.05	H	H	C	4	33	4	0.05	deleterious	neutral	5
V316I	none	none	0.3	14.027	0.05	H	E	H	4	57	4	0.75	neutral	neutral	4
R118H	none	positive to pos(10%), neutr(90%)	1.3	-19.046	-3.16	H	H	H	0	10	2	0.06	neutral	non-neutral	9
N215S	none	none	2.7	-27.026	0.27	H	H	H	1	20	1	0.01	neutral	neutral	7
Q279E	none	neutral to negative	0	0.99	-2.5	H	H	H	3	27	2	0.00	deleterious	neutral	7
P40S	nonpolar to polar	none	0.8	-10.039	-0.62	C	C	-	-1	12	1	0.00	deleterious	neutral	7
S65T	none	none	0.1	14.027	-0.08	H	H	H	2	38	1	0.01	deleterious	neutral	7
P323T	nonpolar to polar	none	0.9	3.988	-0.7	C	C	C	-2	4	0	0.01	deleterious	neutral	6

</figtable>

<figtable id="tab:Result"> All examined SNPs with their "Sum bad scores" value according to <xr id="tab:Overview"/> and our resulting
prediction. The table also shows whether each SNP is truly disease causing and whether our sequence-based
prediction is correct

SNP	Sum bad scores	Prediction	True classification	Result prediction
A143T	6	Non-disease causing	Disease causing	Wrong
R356W	10	Disease causing	Disease causing	Right
I289V	5	Non-disease causing	Non-disease causing	Right
V316I	4	Non-disease causing	Non-disease causing	Right
R118H	9	Disease causing	Non-disease causing	Wrong
N215S	7	Disease causing	Disease causing	Right
Q279E	7	Disease causing	Disease causing	Right
P40S	7	Disease causing	Disease causing	Right
S65T	7	Disease causing	Disease causing	Right
P323T	6	Non-disease causing	Non-disease causing	Right

</figtable>

In <xr id="tab:Overview"/> we list all previously gathered information in condensed form and highlight in red the values that we consider indicators of a disease-causing mutation. Results in green fields are considered neutral. The summed score of disease indicators is shown again in <xr id="tab:Result"/> along with our prediction for each SNP. Mutations with a score below 7 are considered neutral; those with a score of 7 or more are considered disease causing. Next to the prediction we show the true classification of each single nucleotide polymorphism according to the mapping we did in Task 6. Only two of the ten predictions are wrong (A143T and R118H), which we consider a surprisingly good result.
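The decision rule and the comparison with the true classification can be written down in a few lines. In this Python sketch the scores and true labels are copied from the result table; the names `RESULTS` and `predict` are introduced here for illustration only.

```python
# (sum of bad scores, truly disease causing?) copied from the result table.
RESULTS = {
    "A143T": (6, True),
    "R356W": (10, True),
    "I289V": (5, False),
    "V316I": (4, False),
    "R118H": (9, False),
    "N215S": (7, True),
    "Q279E": (7, True),
    "P40S": (7, True),
    "S65T": (7, True),
    "P323T": (6, False),
}


def predict(score, cutoff=7):
    """Call a SNP disease causing if its score reaches the cutoff."""
    return score >= cutoff


# Tally the four outcome classes of the binary prediction.
counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
wrong = []
for snp, (score, truth) in RESULTS.items():
    call = predict(score)
    if call != truth:
        wrong.append(snp)
    key = ("T" if call == truth else "F") + ("P" if call else "N")
    counts[key] += 1

accuracy = (counts["TP"] + counts["TN"]) / len(RESULTS)
print("wrong predictions:", wrong)
print("confusion counts:", counts)
print("accuracy:", accuracy)
```

The tally makes the two errors explicit: A143T is a false negative and R118H a false positive, giving eight correct calls out of ten. Since the cutoff of 7 was chosen on the same ten SNPs, this figure says little about how the rule would perform on other mutations.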

References
