Sequence-based mutation analysis

Task description

For this task, the group that reviewed our page last week was supposed to choose 10 SNPs for us to work on. We assume that we do not know which of the SNPs cause MSUD or affect protein structure or function. As we had not received any message from that group by Thursday, we decided to choose the 10 SNPs ourselves: L17F, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H

Comparison of wild type AA and mutant AA

Physicochemical properties<ref>http://en.wikipedia.org/wiki/Amino_acid</ref> and predicted secondary structure element (reprof with PSSM)

Position	wt-AA	wt properties	mutant-AA	mutant properties	expected effect on protein	sec-struct element (reprof)
17	L	Hydrophobic side chain, non-polar, charge: neutral	F	Hydrophobic side chain, non-polar, charge: neutral	-	L
82	M	Hydrophobic side chain, non-polar, charge: neutral	I	Hydrophobic side chain, non-polar, charge: neutral	-	E
125	Q	polar, charge: neutral	E	polar, charge: negative	+	L
213	I	Hydrophobic side chain, non-polar, charge: neutral	T	polar, charge: neutral	+	H
258	C	polar, charge: neutral	Y	polar, charge: neutral	-	L
310	T	polar, charge: neutral	R	polar, charge: positive	+	H
328	A	non-polar, charge: neutral	T	polar, charge: neutral	-	E
361	I	Hydrophobic side chain, non-polar, charge: neutral	V	non-polar, charge: neutral	-	H
404	N	polar, charge: neutral	S	polar, charge: neutral	-	L
429	R	polar, charge: positive	H	polar, charge: positive (10%), neutral (90%)	+	H

Blosum62, PAM1, PAM250 scores

The scores for an amino acid change in the BLOSUM and PAM matrices give another hint as to whether the substitution has an effect on the resulting protein. The higher the X in BlosumX, the shorter the evolutionary distance between the sequences it is calculated from, while in PAMX the opposite is the case.

Position	wt-AA	mutant-AA	Blosum62-Score<ref>http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm</ref>	PAM1-Score<ref>http://www.icp.ucl.ac.be/~opperd/private/pam1.html</ref>	PAM250-Score<ref>http://www.icp.ucl.ac.be/~opperd/private/pam250.html</ref>
17	L	F	0	13	13
82	M	I	1	5	2
125	Q	E	2	27	7
213	I	T	-2	7	4
258	C	Y	-2	3	4
310	T	R	-1	2	5
328	A	T	-1	32	11
361	I	V	1	33	9
404	N	S	1	20	5
429	R	H	0	10	6

3D-Structure

We used PyMOL's Mutagenesis Wizard to visualize the individual mutations. In each image the carbon chain of the original residue is coloured green and that of the SNP is coloured pink. Oxygen atoms are coloured red, nitrogen blue and sulphur yellow for both residues. The L17F SNP is not displayed because position 17 is not in the PDB file we used (1DTW). The changes that are easiest to spot and look like they might have an effect are C258Y, T310R and R429H. In the C258Y mutation the protein loses a sulphur atom and gains an aromatic ring, while the T310R mutation greatly increases the size of the amino acid. In the R429H mutation the protein again gains an aromatic ring. While the mutant residue in Q125E is similar in size to the original amino acid, the change in charge is very likely to have an effect on the protein.

Mutagenesis images (one per SNP): M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H

PSI-BLAST - PSSM // homolog MSA

The command we used to create the PSSM file, the file itself, and the MSA can be found in the journal.

The underlying database is big_db. To interpret these values better, we should find all mammalian homologs of the gene, build an MSA from them, and calculate the conservation of wild type and mutant in this context. We used HomoloGene<ref>http://www.ncbi.nlm.nih.gov/homologene</ref> from NCBI for this task. Because the search returned no mammalian homologs, we used all results from the search, which can be found at <ref>http://www.ncbi.nlm.nih.gov/sites/homologene/569</ref>.

pos	wt-AA	mut-AA	pssm-score	pssm percent wt	pssm percent mut
17	L	F	1	31	9
82	M	I	1	28	6
125	Q	E	-1	97	0
213	I	T	-2	41	2
258	C	Y	-4	29	0
310	T	R	-3	60	0
328	A	T	2	59	12
361	I	V	2	63	13
404	N	S	2	19	16
429	R	H	-1	26	1

snp	conservation wt	conservation mut	others
L17F	10/15 (67%)	0/15 (0%)	T(1), I(1), R(1), N(1), M(1)
M82I	10/15 (67%)	0/15 (0%)	V(2), L(2), T(1)
Q125E	15/15 (100%)	0/15 (0%)	-
I213T	10/15 (67%)	0/15 (0%)	L(4), M(1)
C258Y	12/15 (80%)	0/15 (0%)	S(1), A(2)
T310R	12/15 (80%)	0/15 (0%)	M(1), V(2)
A328T	14/15 (93%)	0/15 (0%)	C(1)
I361V	13/15 (87%)	1/15 (7%)	L(1)
N404S	10/15 (67%)	0/15 (0%)	A(1), D(1), P(2), H(1)
R429H	8/15 (53%)	0/15 (0%)	E(2), D(2), Q(1), A(1), K(1)
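
The conservation values in the table above are simple residue counts over one column of the MSA. A minimal sketch in Python of how such a column could be tallied; the function name `column_conservation` and the example column (rebuilt from the L17F row, not read from the actual alignment file) are illustrative assumptions, and gap characters are not treated specially.

```python
from collections import Counter


def column_conservation(column, wt, mut):
    """Return (wt fraction, mut fraction, counts of all other residues)
    for a single alignment column given as a string of one-letter codes."""
    counts = Counter(column)
    total = len(column)
    others = {aa: n for aa, n in counts.items() if aa not in (wt, mut)}
    return counts[wt] / total, counts[mut] / total, others


# Hypothetical column for position 17, rebuilt from the L17F table row:
# 10 x L plus one each of T, I, R, N, M (sequence order is arbitrary).
col_17 = "L" * 10 + "TIRNM"

wt_frac, mut_frac, others = column_conservation(col_17, "L", "F")
print(f"wt {wt_frac:.0%}, mut {mut_frac:.0%}, others {others}")
```

For a real alignment, each column would be extracted at the alignment position that corresponds to the residue number in the query sequence, which may be shifted by gaps.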

Tools

Comparison

Figure 1. Venn diagram of the SNPs predicted by PolyPhen, SIFT and SNAP to have a negative effect on protein functionality.

A visual comparison of the predictions can be seen in Figure 1. For SNAP we only used the SNAP2 results in this comparison. The first observation when comparing the three predictions is that the number of SNPs predicted to have a negative effect varies greatly between the tools. While PolyPhen predicts seven of the SNPs to be deleterious, SIFT predicts five and SNAP only four to impact protein function. All three methods predicted the SNPs Q125E, T310R and C258Y to have a negative effect, which increases our confidence in the quality of these predictions. L17F was predicted to be deleterious by PolyPhen and SIFT, and I213T by PolyPhen and SNAP, which again increases our confidence in these predictions. None of the tools predicted I361V or N404S to be deleterious. R429H (SIFT only) as well as M82I and A328T (PolyPhen only) were each predicted by a single tool, which makes a deleterious effect less likely.
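
The regions of the Venn diagram can be reproduced with plain set operations. The sketch below is only an illustration: the three sets are transcribed from the tool tables on this page, under the assumption that PolyPhen's "possibly damaging" and "probably damaging" both count as deleterious and that SNAP2 (not SNAP) is used.

```python
# Deleterious calls per tool, transcribed from the tables on this page.
polyphen = {"L17F", "M82I", "Q125E", "I213T", "C258Y", "T310R", "A328T"}
sift = {"L17F", "Q125E", "C258Y", "T310R", "R429H"}
snap2 = {"Q125E", "I213T", "C258Y", "T310R"}

all_snps = {"L17F", "M82I", "Q125E", "I213T", "C258Y",
            "T310R", "A328T", "I361V", "N404S", "R429H"}
tools = (polyphen, sift, snap2)


def called_by(n):
    """SNPs flagged as deleterious by exactly n of the three tools."""
    return {snp for snp in all_snps if sum(snp in t for t in tools) == n}


all_three = called_by(3)
exactly_two = called_by(2)
exactly_one = called_by(1)
no_tool = called_by(0)

print("all three:", sorted(all_three))
print("two tools:", sorted(exactly_two))
print("one tool: ", sorted(exactly_one))
print("no tool:  ", sorted(no_tool))
```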

Polyphen2

SNP	polyphen-score	sensitivity	specificity	description
L17F	0.995	0.68	0.97	PROBABLY DAMAGING
M82I	0.468	0.89	0.90	POSSIBLY DAMAGING
Q125E	0.544	0.88	0.91	POSSIBLY DAMAGING
I213T	0.644	0.87	0.91	POSSIBLY DAMAGING
C258Y	1.000	0.00	1.00	PROBABLY DAMAGING
T310R	1.000	0.00	1.00	PROBABLY DAMAGING
A328T	0.999	0.14	0.99	PROBABLY DAMAGING
I361V	0.386	0.90	0.89	BENIGN
N404S	0.000	1.00	1.00	BENIGN
R429H	0.002	0.99	0.30	BENIGN

Snap/Snap2

SNP	snap-pred	snap-rel	snap-expected-acc	snap2-pred	snap2-rel	snap2-expected-acc
L17F	Neutral	3	78%	Neutral	6	82%
M82I	Neutral	4	85%	Neutral	4	72%
Q125E	Non-neutral	1	63%	Non-neutral	4	71%
I213T	Neutral	3	78%	Non-neutral	6	80%
C258Y	Non-neutral	2	70%	Non-neutral	6	80%
T310R	Non-neutral	1	63%	Non-neutral	6	80%
A328T	Neutral	4	85%	Neutral	1	57%
I361V	Neutral	5	89%	Neutral	8	93%
N404S	Neutral	7	94%	Neutral	9	97%
R429H	Neutral	6	92%	Neutral	4	72%

SIFT

SNP	sift-prediction	sift-score	sift-conservation	sequences represented at pos
L17F	AFFECT PROTEIN FUNCTION	0.00	4.32	1
M82I	TOLERATED	0.26	3.12	17
Q125E	AFFECT PROTEIN FUNCTION	0.02	3.02	19
I213T	TOLERATED	0.22	3.02	19
C258Y	AFFECT PROTEIN FUNCTION	0.02	3.02	19
T310R	AFFECT PROTEIN FUNCTION	0.01	3.02	19
A328T	TOLERATED	1.00	3.02	19
I361V	TOLERATED	0.06	3.02	19
N404S	TOLERATED	0.99	3.02	19
R429H	AFFECT PROTEIN FUNCTION	0.03	3.21	15
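
The SIFT calls in the table are consistent with a simple score cutoff. The sketch below assumes the commonly cited SIFT threshold of 0.05 (scores below it are called as affecting protein function); the cutoff and the helper name `sift_call` are assumptions for illustration, not something taken from our SIFT run.

```python
# Assumed cutoff: scores below 0.05 are called deleterious.
SIFT_CUTOFF = 0.05

# SIFT scores transcribed from the table above.
sift_scores = {
    "L17F": 0.00, "M82I": 0.26, "Q125E": 0.02, "I213T": 0.22, "C258Y": 0.02,
    "T310R": 0.01, "A328T": 1.00, "I361V": 0.06, "N404S": 0.99, "R429H": 0.03,
}


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Turn a SIFT score into a binary call using the assumed cutoff."""
    return "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"


calls = {snp: sift_call(score) for snp, score in sift_scores.items()}
for snp, call in calls.items():
    print(snp, sift_scores[snp], call)
```

Note that I361V (0.06) lies just above the assumed cutoff, so its "tolerated" call is a borderline one.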

Discussion

Given that all three prediction methods agree, we will assume the mutations Q125E, T310R and C258Y to be deleterious. As neither I361V nor N404S was predicted to be deleterious by a single tool, we will assume them to be neutral.

For the less certain negative effect predictions L17F and I213T we tried to classify based on the difference in amino acid properties and visual inspection. For L17F we believe that the change from L to F will not have a major effect on protein function. The reason for this is that the mutation is inside a loop, so a mutation here is less likely to affect the protein structure and thereby the function. Also, Leu and Phe are similar in both properties and size. The I213T mutation, however, causes a large change in polarity and is inside a helix. Both of these reasons increase the likelihood that this SNP is non-neutral.

The SNPs M82I, A328T and R429H were each predicted by two methods to have no effect on protein function. While M82I and A328T are both inside a sheet, the M82I mutation is not a large change in terms of size or chemical properties. The A328T mutation, however, is a change in polarity, so it might have an effect. For R429H we assume that the change to an amino acid with an aromatic ring inside a helix must have some kind of effect on the protein.

The actual results in HGMD for our SNPs:

SNP	Final Prediction	HGMD
L17F	neutral	neutral
M82I	neutral	neutral
Q125E	negative	negative
I213T	negative	negative
C258Y	negative	negative
T310R	negative	negative
A328T	uncertain	negative
I361V	neutral	neutral
N404S	neutral	neutral
R429H	uncertain	neutral

By simply using the consensus between all three predictions we were already able to correctly classify five of the ten SNPs. For the contradicting predictions it helped in most cases to consider the chemical properties and the size of the amino acids. In the cases of the A328T and R429H mutations we feel that the change in properties might or might not be large enough to safely call the mutation deleterious. Of the prediction methods, SNAP2 performed best on our SNPs and predicted nine out of ten mutations correctly. According to the tables above, PolyPhen got eight out of ten right (counting "possibly damaging" as deleterious) and SIFT six. It should be kept in mind, however, that our test set is very small and therefore might not be representative of SNPs in general, so just applying SNAP2 might not be ideal for every situation. Using multiple prediction methods also helps to detect 'easier' cases, where no closer manual inspection is necessary, which can help if the number of mutations is larger than in our case.
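
The per-tool tally can be recomputed from the tables on this page. This is a minimal sketch under stated assumptions: HGMD "negative" is treated as deleterious, PolyPhen's "possibly damaging" counts as a deleterious call, and the helper name `n_correct` is made up for illustration. The totals depend on how each tool's output is reduced to a yes/no call, so they can differ from a hand count made under other assumptions.

```python
# HGMD annotation from the table above: True = negative (deleterious).
hgmd = {
    "L17F": False, "M82I": False, "Q125E": True, "I213T": True, "C258Y": True,
    "T310R": True, "A328T": True, "I361V": False, "N404S": False, "R429H": False,
}

# SNPs each tool calls deleterious, transcribed from the tool tables.
predicted_deleterious = {
    "PolyPhen2": {"L17F", "M82I", "Q125E", "I213T", "C258Y", "T310R", "A328T"},
    "SIFT": {"L17F", "Q125E", "C258Y", "T310R", "R429H"},
    "SNAP2": {"Q125E", "I213T", "C258Y", "T310R"},
}


def n_correct(predicted, truth):
    """Count SNPs where the binary call agrees with the annotation."""
    return sum((snp in predicted) == is_del for snp, is_del in truth.items())


scores = {tool: n_correct(calls, hgmd)
          for tool, calls in predicted_deleterious.items()}
for tool, n in scores.items():
    print(f"{tool}: {n}/{len(hgmd)} correct")
```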

References

Task 6: MSUD - Sequence-based mutation analysis
