Difference between revisions of "Sequence-based mutation analysis of ARSA"

Revision as of 15:35, 11 August 2011

Many mutations in the human genome are suspected to affect protein function. Predicting the effect of these mutations on function, especially for disease-causing mutations, is therefore an important task. In this task, we apply different sequence-based methods to predict the effect of mutations on the protein's function and then try to discriminate neutral from non-neutral mutations.
We randomly picked 10 missense mutations from dbSNP and HGMD. At this point, we act as if we did not know which of these mutations cause the disease and which do not. After applying the methods and interpreting the results, we lift the curtain and check whether our guesses were correct. The mutations we picked are summarized in the table below:

Nr.	mutation	position
1	Asp-Asn	29
2	Pro-Ala	136
3	Gln-His	153
4	Trp-Cys	193
5	Thr-Met	274
6	Phe-Val	356
7	Thr-Ile	409
8	Asn-Ser	440
9	Cys-Gly	489
10	Arg-His	496

In the following sections, we will apply the methods and discuss the individual results. An overall summary and guess of the impact on the function is then made in the last section.

Substitution Matrices

A first, very rough guess about the effect of a mutation can be made by looking at standard substitution matrices such as BLOSUM and PAM. Low (negative) scores in these matrices indicate that the exchange of the two amino acids is rarely observed, which suggests that the amino acids have very different physico-chemical properties. Consequently, substitutions with low scores might affect the structure and/or the function of the protein.
Scores around zero reflect substitutions that occur with average frequency. The properties of the substituted amino acids are not very similar, but they also do not differ much.
Substitutions with a high positive score are observed very frequently. The properties of the amino acids are therefore similar, and the substitution is not very likely to affect the protein's structure or function.
When doing this analysis, we have to keep in mind that this is a very inaccurate way to "predict" the impact of a certain mutation: the matrices are calculated from a large number of proteins, which evens out effects specific to our protein or protein family. It can, however, give a first indication of whether the mutation is likely to occur in general.
We extracted the scores for our mutations from BLOSUM62, PAM1 and PAM250 and summarized them in the following table:

Nr.	Substitution	BLOSUM62	PAM1	PAM250
1	Asp(D) -> Asn(N)	1 (worst: -4)	36 (worst: 0)	7 (worst: 0)
2	Pro(P) -> Ala(A)	-1 (worst: -4)	22 (worst: 0)	11 (worst: 0)
3	Gln(Q) -> His(H)	0 (worst: -3)	20 (worst: 0)	7 (worst: 0)
4	Trp(W) -> Cys(C)	-2 (worst: -4)	0 (worst: 0)	1 (worst: 1)
5	Thr(T) -> Met(M)	-1 (worst: -3)	2 (worst: 0)	1 (worst: 0)
6	Phe(F) -> Val(V)	-1 (worst: -4)	1 (worst: 0)	10 (worst: 1)
7	Thr(T) -> Ile(I)	-2 (worst: -3)	7 (worst: 0)	4 (worst: 0)
8	Asn(N) -> Ser(S)	1 (worst: -4)	34 (worst: 0)	8 (worst: 0)
9	Cys(C) -> Gly(G)	-3 (worst: -4)	1 (worst: 0)	4 (worst: 0)
10	Arg(R) -> His(H)	0 (worst: -3)	8 (worst: 0)	5 (worst: 1)
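
The lookup behind one table entry can be sketched as follows. This is a minimal illustration, not the procedure actually used for the table: only the Asp (D) row of the standard BLOSUM62 matrix is hard-coded, the helper name `score_and_worst` is hypothetical, and the "worst" value is assumed to be the lowest score in the row of the reference amino acid.

```python
# Asp (D) row of the standard BLOSUM62 matrix, columns in the usual order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62_D = [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3,
              -4, -1, -3, -3, -1, 0, -1, -4, -3, -3]


def score_and_worst(row, mutant):
    """Return (score of reference -> mutant, lowest score in the row)."""
    scores = dict(zip(AMINO_ACIDS, row))
    return scores[mutant], min(scores.values())


# Assumption: "worst" in the table means the row minimum.
score, worst = score_and_worst(BLOSUM62_D, "N")
print(f"D -> N: {score} (worst: {worst})")
```

For Asp -> Asn this reproduces the BLOSUM62 entry of mutation 1. The other rows would need the corresponding matrix rows.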

Secondary Structure

As the picture above shows, none of the mutations lies in the middle of a secondary structure element. Only mutations 1, 2, 4 and 5 are close to or, depending on the prediction method, at the border of a secondary structure element.

Prediction of effect

SNAP

SNAP uses a neural-network approach to predict effects of single amino acid substitutions on protein function. It uses in silico derived protein information - like secondary structure, conservation, solvent accessibility, etc. - for the prediction. <ref> SNAP: predict effect of non-synonymous polymorphisms on function. Yana Bromberg and Burkhard Rost Nucleic Acids Research, 2007, Vol. 35, No. 11 3823-3835 </ref>
We ran SNAP using the following command:


snapfun -i ARSA.fasta -m mutants.txt -o snap.out

output:


nsSNP	Prediction	Reliability Index	Expected Accuracy
-----	------------	-------------------	-------------------
D29N	Non-neutral		7			96%
Q153H	 Neutral 		0			53%
T274M	Non-neutral		6			93%
T409I	Non-neutral		1			63%
C489G	Non-neutral		5			87%
W193C	Non-neutral		3			78%
F356V	 Neutral 		1			60%
N440S	Non-neutral		2			70%
R496H	 Neutral 		1			60%
P136A	Non-neutral		4			82%
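
A small parser for result rows like the ones above could look like this. It is only a sketch: it assumes exactly the four whitespace-separated fields shown in the listing (the raw snapfun output file may be laid out differently), and the names `SnapPrediction` and `parse_snap_line` are made up for illustration.

```python
from typing import NamedTuple


class SnapPrediction(NamedTuple):
    mutation: str
    prediction: str
    reliability: int
    accuracy: int  # expected accuracy in percent


def parse_snap_line(line):
    """Parse one result row such as 'D29N  Non-neutral  7  96%'."""
    mutation, prediction, reliability, accuracy = line.split()
    return SnapPrediction(mutation, prediction,
                          int(reliability), int(accuracy.rstrip("%")))


# Two rows copied from the listing above.
rows = ["D29N\tNon-neutral\t\t7\t\t\t96%", "Q153H\t Neutral \t\t0\t\t\t53%"]
# Print the most reliable predictions first.
for p in sorted(map(parse_snap_line, rows), key=lambda p: -p.reliability):
    print(p.mutation, p.prediction, p.reliability, p.accuracy)
```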

In order to analyze all possible combinations of amino acid substitutions from the above mutated positions, we used the Generate Mutants tool on http://rostlab.org/services/snap/submit to create all possible exchanges from the following pattern: referenceAminoAcidPosition* . Then we again executed snap:


snapfun -i ARSA.fasta -m all_mutants.txt -o snap_all.out

Next, we wrote a Perl script to parse the SNAP output and summarize it in the following table, which shows for each position which amino acid substitutions are Non-neutral or Neutral. We consider a residue as important if 66-100 % of all possible substitutions are Non-neutral, as probably important if 33-66 % are Non-neutral, and as not important if 0-33 % are Non-neutral.

ref\mutation	important	A	R	N	D	C	Q	E	G	H	I	L	K	M	F	P	S	T	W	Y	V
D29	yes	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
Q153	yes	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
T274	yes	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral
T409	yes	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral
C489	yes	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
W193	yes	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral
F356	probably	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Neutral	Neutral	Non-neutral	Neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Neutral
N440	yes	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
R496	yes	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral
P136	yes	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
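
The classification rule can be sketched in a few lines. This is an illustrative Python version, not the Perl script mentioned above. The function name is hypothetical, and the handling of the boundaries (exactly 33 % or 66 %) is an assumption, because the stated ranges overlap.

```python
def residue_importance(predictions):
    """Classify a position from the SNAP calls of all its substitutions."""
    non_neutral = sum(1 for p in predictions if p == "Non-neutral")
    fraction = non_neutral / len(predictions)
    # Assumption: boundary values are assigned to the higher class.
    if fraction >= 0.66:
        return "important"
    if fraction >= 0.33:
        return "probably important"
    return "not important"


# F356: 12 of the 19 substitutions are Non-neutral in the table above.
f356 = ["Non-neutral"] * 12 + ["Neutral"] * 7
print(residue_importance(f356))
```

With 12 of 19 substitutions (about 63 %) called Non-neutral, F356 falls into the middle class, whereas Q153 with 15 of 19 (about 79 %) is classified as important.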

SIFT

SIFT predicts the effect of amino acid substitutions by building a multiple alignment and then calculating the probability of each possible substitution. The score in the SIFT output is the probability of the substitution. SIFT predicts a substitution as damaging if this probability is <= 0.05 and as tolerated if it is > 0.05. The median conservation in the output measures the diversity of the sequences used in the multiple alignment. It should be between 2.75 and 3.25; higher values indicate that the sequences were too closely related. <ref>http://sift.jcvi.org/www/SIFT_help.html</ref> We used SIFT with the UniProt-TrEMBL 2009 database and uploaded a file containing our chosen mutations:

D29N
P136A
Q153H
W193C
T274M
F356V
T409I
N440S
C489G
R496H

As median conservation we used the standard parameter 3.00 and we excluded all sequences with a sequence identity higher than 90%.

Mutation NR	Substitution	predicted	score	median conservation	comment
1	D29N	AFFECT PROTEIN FUNCTION	0.00	3.04
2	P136A	AFFECT PROTEIN FUNCTION	0.00	3.07
3	Q153H	TOLERATED	0.29	3.04
4	W193C	AFFECT PROTEIN FUNCTION	0.04	3.04
5	T274M	AFFECT PROTEIN FUNCTION	0.00	3.04
6	F356V	TOLERATED	0.81	3.04
7	T409I	AFFECT PROTEIN FUNCTION	0.02	3.48	low confidence
8	N440S	TOLERATED	0.07	3.08
9	C489G	AFFECT PROTEIN FUNCTION	0.00	3.56	low confidence
10	R496H	TOLERATED	0.28	3.56
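
The decision rule described above can be written down as a short sketch. The function name `sift_call` is hypothetical. The low-confidence rule (only damaging calls with a median conservation above 3.25 are flagged) is an assumption inferred from the table, not taken from the SIFT source.

```python
def sift_call(score, median_conservation):
    """Apply the SIFT cutoff described above to one substitution."""
    call = "AFFECT PROTEIN FUNCTION" if score <= 0.05 else "TOLERATED"
    # Assumption: only damaging calls based on closely related
    # sequences (median conservation > 3.25) get a warning.
    low_confidence = (call == "AFFECT PROTEIN FUNCTION"
                      and median_conservation > 3.25)
    return call, low_confidence


# Scores and median conservation copied from the table above.
for name, score, cons in [("W193C", 0.04, 3.04),
                          ("T409I", 0.02, 3.48),
                          ("R496H", 0.28, 3.56)]:
    print(name, *sift_call(score, cons))
```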

PolyPhen

PolyPhen predicts whether a mutation is damaging or not using a naïve Bayes approach. The score is the posterior probability that the mutation is damaging.<ref>http://genetics.bwh.harvard.edu/pph2/dokuwiki/overview</ref> We used PolyPhen with standard parameters. The results are shown below.

Mutation NR	Substitution	HumDiv		HumVar		Link (expires in September)
		predicted	score	predicted	score
1	D29N	probably damaging	1.000	probably damaging	0.999	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479962.html
2	P136A	probably damaging	1.000	probably damaging	0.999	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479963.html
3	Q153H	possibly damaging	0.945	possibly damaging	0.520	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479964.html
4	W193C	probably damaging	0.977	possibly damaging	0.633	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479965.html
5	T274M	probably damaging	1.000	probably damaging	1.000	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479966.html
6	F356V	benign	0.000	benign	0.001	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479967.html
7	T409I	probably damaging	0.961	benign	0.432	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479968.html
8	N440S	possibly damaging	0.834	benign	0.255	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479969.html
9	C489G	damaging	0.999	probably damaging	0.906	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479970.html
10	R496H	benign	0.003	benign	0.000	http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479971.html

Comparison of results

To compare the results of the different prediction methods, we created the table below. An "X" marks a mutation that was predicted to have an effect, a "-" a mutation that was predicted to have no effect. For PolyPhen, "X" means "damaging" or "probably damaging", "/" means "possibly damaging" and "-" means "benign".

Mutation NR	Substitution	SNAP	SIFT	PolyPhen
				HumDiv	HumVar
1	D29N	X	X	X	X
2	P136A	X	X	X	X
3	Q153H	-	-	/	/
4	W193C	X	X	X	/
5	T274M	X	X	X	X
6	F356V	-	-	-	-
7	T409I	X	X	X	-
8	N440S	X	-	/	-
9	C489G	X	X	X	X
10	R496H	-	-	-	-
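
The mapping from the raw calls of the three tools to the symbols in this table can be sketched as follows. The function name `symbol` is made up for illustration. The accepted strings are the ones that appear in the result tables above.

```python
def symbol(prediction):
    """Map a raw SNAP, SIFT or PolyPhen call to the symbols X, / and -."""
    call = prediction.strip().lower()
    if call == "possibly damaging":
        return "/"
    if call in {"non-neutral", "affect protein function",
                "damaging", "probably damaging"}:
        return "X"
    if call in {"neutral", "tolerated", "benign"}:
        return "-"
    raise ValueError(f"unknown prediction: {prediction!r}")


# W193C: SNAP, SIFT, PolyPhen HumDiv, PolyPhen HumVar
calls = ["Non-neutral", "AFFECT PROTEIN FUNCTION",
         "probably damaging", "possibly damaging"]
print("\t".join(symbol(c) for c in calls))
```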

Multiple sequence alignments

First, we downloaded the HSSP file for ARSA to get all proteins that are homologous to it. Then we downloaded all mammalian protein sequences from UniProt by searching for the term taxonomy:40674, which selects all mammalian protein sequences, and saved them in one multi-FASTA file. We then extracted all mammalian homologs of human ARSA by mapping the IDs from the HSSP file to the sequence IDs in the multi-FASTA file. This yielded 75 mammalian sequences homologous to human ARSA.
Next, we calculated a multiple sequence alignment of these proteins (including ARSA) with MUSCLE. The Jalview image of the alignment is shown below.

Multiple sequence alignment of all 75 homologous sequences using MUSCLE

The following table shows the conservation of the original amino acid of the reference sequence and of the mutant amino acid at the respective positions.

pos	conservation - reference	conservation - mutant
29	0.86	0
153	0.14	0
274	0.87	0
409	0.35	0.16
489	0.80	0.05
193	0.13	0
356	0.15	0
440	0.15	0
496	0.14	0.01
136	0.93	0
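
One plausible way to compute such conservation values is sketched below. It assumes that conservation means the fraction of aligned sequences (reference included) that carry a given residue in the alignment column of the reference position. The text does not state the exact definition, so this is an assumption, and the toy alignment is made up for illustration.

```python
def conservation(alignment, reference, position, residue):
    """Fraction of aligned sequences carrying `residue` at a reference position.

    alignment: list of equal-length gapped sequences (gaps as '-')
    reference: the gapped reference sequence from the same alignment
    position:  1-based position in the ungapped reference
    """
    seen = 0
    for column, char in enumerate(reference):
        if char != "-":
            seen += 1  # count only real residues of the reference
            if seen == position:
                hits = sum(1 for seq in alignment if seq[column] == residue)
                return hits / len(alignment)
    raise ValueError("position lies outside the reference sequence")


# Toy alignment (not ARSA data); the reference is the first sequence.
toy = ["MD-KT", "MDAKT", "ME-KT", "MD-RS"]
print(conservation(toy, toy[0], 2, "D"))  # reference residue D
print(conservation(toy, toy[0], 2, "E"))  # a mutant residue
```

The gap-aware position mapping matters because a position in the ungapped protein sequence generally does not coincide with the column index in the alignment.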

PSI-BLAST

To infer the position specific sequence profile, we executed PSI-BLAST with the following command:


blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt

The graphic shows the relevant lines of the profile matrix regarding our mutated positions. The scores of interest - which score our mutation substitutions - are highlighted in green.


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 D   -5 -5 -2  8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6    0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  2.49 1.56
 153 Q    3  2 -1  4 -4 -1 -1 -2  0 -2 -3 -3  4 -2 -3 -1 -2 -3 -2 -2   26  10   3  23   0   3   3   3   2   2   1   1  13   2   1   3   2   0   1   2  0.53 1.48
 274 T   -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4  1  8 -6 -5 -3    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7  92   0   0   0  1.94 1.62
 409 T   -1  0  0 -1 -2 -1 -1  0 -1 -1 -1  0 -1 -1  3  0  1  6  0 -1    5   5   5   4   1   3   4   8   1   3   6   5   1   2  13   6   8  11   3   4  0.26 0.95
 489 C    2 -1  1 -4  8 -4 -4 -2 -1 -1 -2 -3 -1 -4 -4  0  0  5 -1 -3   15   4   8   0  36   0   0   2   1   3   3   1   1   0   0   6   5   9   2   0  0.99 1.22
 440 N   -5 -3  6  5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6  2 -2 -3 -6 -6 -5    0   1  46  36   0   1   2   0   0   0   0   1   0   0  10   1   1   0   0   0  1.48 1.67
 356 F   -3 -1 -5 -5 -3  0 -1 -6  1  3  0 -1  0  2 -6 -3 -2 -3  5  3    1   4   0   0   1   5   4   0   3  18   8   5   2   8   0   1   2   0  20  20  0.59 1.62
 193 W   -2  4  2  3 -5  0  0 -2  0 -3 -4  1 -3 -1 -2 -1 -2  1  1 -3    3  25  11  16   0   4   5   3   2   2   1   7   0   2   2   4   2   2   5   2  0.46 1.45
 136 P   -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7  9 -4 -4 -7 -6 -5    1   0   0   0   0   0   0   0   0   0   0   0   0   0  98   0   0   0   0   0  3.03 1.61
 496 R   -3  1  0 -3 -4  1  1 -1  1 -3  1  1 -2  2  4  0 -3 -1 -1 -3    1   7   4   1   0   5  10   4   3   1  16   9   0   9  20   8   1   1   1   1  0.34 0.96
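
Reading the score of a particular substitution from one row of this matrix can be sketched as follows. The helper name `pssm_score` is hypothetical, and the parsing assumes the layout shown above: position, reference residue, then 20 scores in the column order of the header line.

```python
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"


def pssm_score(row, mutant):
    """Return the profile score of `mutant` from one PSSM row."""
    fields = row.split()
    # fields[0] = position, fields[1] = reference residue, next 20 = scores
    scores = [int(value) for value in fields[2:22]]
    return scores[PSSM_ORDER.index(mutant)]


# Score part of the row for position 356, copied from the matrix above.
row_356 = " 356 F   -3 -1 -5 -5 -3  0 -1 -6  1  3  0 -1  0  2 -6 -3 -2 -3  5  3"
print(pssm_score(row_356, "V"))  # score for F356V
```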

Summary and Discussion

The mutations are listed below, each with a PyMOL mutagenesis image and a description of the properties of the exchanged amino acids. We also include short summary tables of the methods we applied and a short discussion/interpretation of the results. For a detailed description of the summary tables, please read the individual sections.

Nr.	mutation	position	reference	mutation	both

1	Asp-Asn	29

Description of Asp-Asn

Aspartic acid (Asp)

Asparagine (Asn)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
1 (worst: -4)	36 (worst: 0)	7 (worst: 0)	-2	X	X	X	X	important	0.86	yes (at the border)

Aspartic acid is an acidic, negatively charged amino acid, while Asparagine is polar but uncharged. The mutation therefore removes a charge and changes the acid-base behaviour of the residue. The lysosomal enzyme works in a very acidic environment, in which acidic amino acids are preferred, so the effect could be deleterious. This hypothesis is supported by all prediction methods and by the negative PSI-BLAST profile score, whereas the general substitution matrices score the exchange comparatively favourably (BLOSUM62: 1, PAM1: 36). The mutation is located at the border of a beta sheet, which is another indicator of a possible deleterious effect. The conservation of the amino acid in the MSA of related sequences is also very high, which indicates that the residue is quite important. Furthermore, our analysis of all possible mutants classifies it as an important residue, i.e. most substitutions lead to a deleterious effect. The effect is not introduced by a structural change of the amino acid itself, since the structures are very similar (see above), but by the drastic change in amino acid properties. Based on our analysis, we classify this mutation as deleterious.
This is supported by the UniProt annotation, which associates the mutation with infantile-onset Metachromatic leukodystrophy. It causes a severe reduction of enzyme activity.

2	Pro-Ala	136

Description of Pro-Ala

Proline (Pro)

Alanine (Ala)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-1 (worst: -4)	22 (worst: 0)	11 (worst: 0)	-3	X	X	X	X	important	0.93	yes (at the border)

Proline and Alanine are both hydrophobic amino acids, so, in contrast to mutation 1, the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is structurally very important. Here it is even located at the border of an alpha-helix. Thus, the change to the small amino acid Alanine could introduce a big structural change despite the similarity in chemical properties.
For this mutation, all predictions yield damaging effects, and the BLOSUM62 score (-1) and the PSI-BLAST profile score (-3) are negative, although the PAM scores are comparatively high. Again, the conservation of the amino acid in the MSA of related sequences is very high, which indicates that the residue is quite important.
Furthermore, our analysis of all possible mutants classifies it as an important residue, i.e. most substitutions lead to a deleterious effect.
According to HGMD, this mutation causes the disease.

3	Gln-His	153

Description of Gln-His

Glutamine (Gln)

Histidine (His)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
0 (worst: -3)	20 (worst: 0)	7 (worst: 0)	0	-	/	/	-	important	0.14	no

Glutamine is a polar, uncharged amino acid, while Histidine is basic. The exchange therefore changes both the behaviour towards water and the charge of the residue. Gln and His also differ considerably in structure: His needs much more space than Gln, which should have a big influence on the structure of ARSA (see the PyMOL images above).
In this case, the mutation is not located within a secondary structure element, and the residue is not conserved in the MSA. Furthermore, the PAM values are quite high, while the BLOSUM62 and PSSM scores are both 0. Together, these factors indicate that the mutation should not have a severe effect.
The predictions made by SNAP, SIFT and PolyPhen are not consistent: whereas SNAP and SIFT predict a neutral effect, PolyPhen predicts a possibly damaging effect. Based on the above results, we tend to classify this mutation as neutral.
This is, however, not the case: HGMD states that the mutation is associated with Metachromatic Leukodystrophy.

4	Trp-Cys	193

Description of Trp-Cys

Tryptophan (Trp)

Cysteine (Cys)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-2 (worst: -4)	0 (worst: 0)	1 (worst: 1)	-5	X	X	/	X	important	0.13	no

Tryptophan is a hydrophobic, aromatic amino acid, while Cysteine is a hydrophilic amino acid, so the behaviour towards water changes dramatically. In addition, Trp is the largest amino acid while Cys is rather small, so the space needed by the residue changes as well. This should have a huge influence on the structure of ARSA.
Again, the residue is not conserved across the homologs of ARSA and is not located within a secondary structure element, which could indicate a neutral substitution. However, all substitution matrices yield very low values for this substitution, and the predictions of SNAP, SIFT and PolyPhen suggest a damaging effect. Based on these results, we would assign a deleterious effect to this mutation.
However, HGMD does not contain this mutation, and dbSNP does not assign a deleterious effect to it. The mutation is a single nucleotide polymorphism (SNP), which by definition occurs in a certain part of the population. As Metachromatic leukodystrophy is not very widespread, this mutation should be a non-damaging natural variant.

5	Thr-Met	274

Description of Thr-Met

Threonine (Thr)

Methionine (Met)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-1 (worst: -3)	2 (worst: 0)	1 (worst: 0)	-3	X	X	X	X	important	0.87	yes (at the border)

Threonine is a hydrophilic amino acid, while Methionine is hydrophobic, so the behaviour towards water changes. Methionine also has a much longer side chain than Threonine, so the structure of ARSA should be altered by this mutation.
In addition, the mutation is located at the border of a secondary structure element, the residue (Thr) is highly conserved across homologs, and all prediction tools predict a deleterious effect on the enzyme's function. Furthermore, the values in the substitution matrices are very low, indicating a deleterious effect. Taken together, the mutation should have a severe effect on structure and function.
HGMD assigns a deleterious effect to the mutation.

6	Phe-Val	356

Description of Phe-Val

Phenylalanine (Phe)

Valine (Val)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-1 (worst: -4)	1 (worst: 0)	10 (worst: 1)	3	-	-	-	-	probably important	0.15	no

Phenylalanine and Valine are both hydrophobic amino acids, so the only impact on structure could come from the structural differences between Phe and Val: Phe has an aromatic ring and therefore needs more space than Val. The scores in the substitution matrices are not great, but also not really bad. The prediction methods all agree that this mutation should have no harmful effect. As the conservation in the MSA is very low and the mutation does not disrupt a secondary structure element, we believe that this mutation is neutral. dbSNP classifies this mutation as a SNP, so it should not be harmful.

7	Thr-Ile	409

Description of Thr-Ile

Threonine (Thr)

Isoleucine (Ile)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-2 (worst: -3)	7 (worst: 0)	4 (worst: 0)	-1	X	X	-	X	important	0.35	no

Threonine is a hydrophilic amino acid, while Isoleucine is hydrophobic, so the behaviour towards water changes. All prediction methods except the HumVar mode of PolyPhen assign a functional change to this mutation. The conservation in the MSA is relatively high, but the mutation does not disrupt a secondary structure element, and the scores in the substitution matrices are not that bad. The mutation is known to cause Metachromatic Leukodystrophy.

8	Asn-Ser	440

Description of Asn-Ser

Asparagine (Asn)

Serine (Ser)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
1 (worst: -4)	34 (worst: 0)	8 (worst: 0)	-2	X	/	-	-	important	0.15	no

Asparagine and Serine are both hydrophilic amino acids of almost the same size, so the mutation should not have a very dramatic effect. The scores in the substitution matrices for this mutation are very high, the conservation in the MSA is very low, and the mutation does not disrupt a secondary structure element. Nevertheless, the prediction methods do not agree on the effect of the mutation. dbSNP classifies this mutation as a SNP, so it should not be harmful.

9	Cys-Gly	489

Description of Cys-Gly

Cysteine (Cys)

Glycine (Gly)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
-3 (worst: -4)	1 (worst: 0)	4 (worst: 0)	-2	X	X	X	X	important	0.80	no

Cysteine and Glycine are both hydrophilic amino acids. One difference is the size: Gly is the smallest of the amino acids, while Cys is a little bigger. More importantly, Cysteine contains sulfur, which is needed to form disulfide bridges, so the function should be changed by this mutation. The conservation of Cysteine in the MSA is very high, and the scores in the substitution matrices are very low. In addition, all 4 methods agree that this mutation changes the function of Arylsulfatase A. This mutation causes Metachromatic leukodystrophy.

10	Arg-His	496

Description of Arg-His

Arginine (Arg)

Histidine (His)

BLOSUM62	PAM1	PAM250	PSI-BLAST	SNAP	Polyphen (HumDiv)	Polyphen (HumVar)	SIFT	Residue Importance	Conservation in MSA	Disrupting SS
0 (worst: -3)	8 (worst: 0)	5 (worst: 1)	1	-	-	-	-	important	0.14	no

Arginine and Histidine are both basic amino acids, so the only effect could come from the difference in size between the two. The conservation of Arginine in the MSA is very low, and all 4 methods agree that this mutation is not disease-causing. The fact that the mutation does not disrupt a secondary structure element supports this idea. The mutation is classified as a SNP and is therefore not considered disease-causing.

References
