Sequence-based mutation analysis of ARSA

From Bioinformatikpedia
Revision as of 20:45, 26 June 2011 by Kassner

Intro

Description of Mutations

We randomly picked 10 missense mutations from dbSNP and HGMD. The mutations are listed below, each with PyMOL mutagenesis images (reference residue, mutant residue, and both together) and a description of how the properties of the amino acid change.

Nr. mutation position reference mutation both
1 Asp-Asn 29
29 arsa asp.png
29 arsa asn.png
29 arsa both.png
Description of Asp-Asn
Aspartic acid (Asp)
Asparagine (Asn)
BLOSUM62 PAM1 PAM250
1 (worst: -4) 36 (worst: 0) 7 (worst: 0)

Aspartic acid is an acidic amino acid that carries a negative charge at physiological pH, while Asparagine is polar but uncharged. Both are hydrophilic, so the mutation mainly removes a negative charge rather than changing the behaviour towards water.

2 Pro-Ala 136
136 arsa PRO.png
136 arsa ALA.png
136 arsa both.png
Description of Pro-Ala
Proline (Pro)
Alanine (Ala)
BLOSUM62 PAM1 PAM250
-1 (worst: -4) 22 (worst: 0) 11 (worst: 0)

Proline and Alanine are both hydrophobic amino acids, so the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is structurally very important. Alanine is one of the smallest amino acids and lacks this ring constraint, so the mutation from Pro to Ala should have a big influence on the structure of ARSA.

3 Gln-His 153
153 arsa GLN.png
153 arsa HIS.png
153 arsa both.png
Description of Gln-His
Glutamine (Gln)
Histidine (His)
BLOSUM62 PAM1 PAM250
0 (worst: -3) 20 (worst: 0) 7 (worst: 0)

Glutamine is a polar, uncharged amino acid while Histidine is a basic amino acid that can carry a positive charge. Both are hydrophilic, so the main chemical change is the possible gain of a charge. In addition, the imidazole ring of His is bulkier than the Gln side chain, which could influence the structure of ARSA.

4 Trp-Cys 193
193 arsa TRP.png
193 arsa CYS.png
193 arsa both.png
Description of Trp-Cys
Tryptophan (Trp)
Cysteine (Cys)
BLOSUM62 PAM1 PAM250
-2 (worst: -4) 0 (worst: 0) 1 (worst: 1)

Tryptophan is a hydrophobic, aromatic amino acid, while Cysteine is a small amino acid with a polar thiol group, so the behaviour towards water may change. Trp is also the largest amino acid while Cys is rather small, so the space needed at this position changes considerably. This should have a large influence on the structure of ARSA.

5 Thr-Met 274
274 arsa THR.png
274 arsa MET.png
274 arsa both.png
Description of Thr-Met
Threonine (Thr)
Methionine (Met)
BLOSUM62 PAM1 PAM250
-1 (worst: -3) 2 (worst: 0) 1 (worst: 0)

Threonine is a hydrophilic amino acid while Methionine is a hydrophobic amino acid. So the behaviour towards water changes. Also, Methionine has a very long sidechain while Threonine does not. So the structure of ARSA should be altered by this mutation.

6 Phe-Val 356
356 arsa PHE.png
356 arsa VAL.png
356 arsa both.png
Description of Phe-Val
Phenylalanine (Phe)
Valine (Val)
BLOSUM62 PAM1 PAM250
-1 (worst: -4) 1 (worst: 0) 10 (worst: 1)

Phenylalanine and Valine are both hydrophobic amino acids. So the only impact on structure could come from the structural differences between Phe and Val: Phe has an aromatic ring and therefore needs more space than Val.

7 Thr-Ile 409
409 arsa THR.png
409 arsa ILE.png
409 arsa both.png
Description of Thr-Ile
Threonine (Thr)
Isoleucine (Ile)
BLOSUM62 PAM1 PAM250
-2 (worst: -3) 7 (worst: 0) 4 (worst: 0)

Threonine is a hydrophilic amino acid while Isoleucine is a hydrophobic amino acid. So the behaviour towards water changes.

8 Asn-Ser 440
440 arsa ASN.png
440 arsa SER.png
440 arsa both.png
Description of Asn-Ser
Asparagine (Asn)
Serine (Ser)
BLOSUM62 PAM1 PAM250
1 (worst: -4) 34 (worst: 0) 8 (worst: 0)

Asparagine and Serine are both hydrophilic amino acids. Also they are almost of the same size. So the mutation should not have a very dramatic effect.

9 Cys-Gly 489
489 arsa CYS.png
489 arsa GLY.png
489 arsa both.png
Description of Cys-Gly
Cysteine (Cys)
Glycine (Gly)
BLOSUM62 PAM1 PAM250
-3 (worst: -4) 1 (worst: 0) 4 (worst: 0)

Cysteine and Glycine are both classified here as hydrophilic amino acids. The only difference is the size: Gly is the smallest of the amino acids while Cys is a little bigger. The effect should not be that dramatic.

10 Arg-His 496
496 arsa ARG.png
496 arsa HIS.png
496 arsa both.png
Description of Arg-His
Arginine (Arg)
Histidine (His)
BLOSUM62 PAM1 PAM250
0 (worst: -3) 8 (worst: 0) 5 (worst: 1)

Arginine and Histidine are both basic amino acids so the only effect could come from the difference in size of the two.

Substitution Matrices

To compare the different mutations, the scores for the three matrices are collected in one table:

Nr. Substitution BLOSUM62 PAM1 PAM250
1 Asp(D) -> Asn(N) 1 (worst: -4) 36 (worst: 0) 7 (worst: 0)
2 Pro(P) -> Ala(A) -1 (worst: -4) 22 (worst: 0) 11 (worst: 0)
3 Gln(Q) -> His(H) 0 (worst: -3) 20 (worst: 0) 7 (worst: 0)
4 Trp(W) -> Cys(C) -2 (worst: -4) 0 (worst: 0) 1 (worst: 1)
5 Thr(T) -> Met(M) -1 (worst: -3) 2 (worst: 0) 1 (worst: 0)
6 Phe(F) -> Val(V) -1 (worst: -4) 1 (worst: 0) 10 (worst: 1)
7 Thr(T) -> Ile(I) -2 (worst: -3) 7 (worst: 0) 4 (worst: 0)
8 Asn(N) -> Ser(S) 1 (worst: -4) 34 (worst: 0) 8 (worst: 0)
9 Cys(C) -> Gly(G) -3 (worst: -4) 1 (worst: 0) 4 (worst: 0)
10 Arg(R) -> His(H) 0 (worst: -3) 8 (worst: 0) 5 (worst: 1)

Secondary Structure

Sec Struct Mutations ARSA.png

As the picture above shows, none of the mutations lies in the middle of a secondary structure element. Only mutations 1, 2, 4 and 5 are close to or, depending on the prediction method, at the border of a secondary structure element.

Prediction of effect

SNAP

We ran SNAP using the following command:


snapfun -i ARSA.fasta -m mutants.txt -o snap.out

output:


nsSNP	Prediction	Reliability Index	Expected Accuracy
-----	------------	-------------------	-------------------
D29N	Non-neutral		7			96%
Q153H	 Neutral 		0			53%
T274M	Non-neutral		6			93%
T409I	Non-neutral		1			63%
C489G	Non-neutral		5			87%
W193C	Non-neutral		3			78%
F356V	 Neutral 		1			60%
N440S	Non-neutral		2			70%
R496H	 Neutral 		1			60%
P136A	Non-neutral		4			82%

To analyze all possible amino acid substitutions at the mutated positions listed above, we used the Generate Mutants tool on http://rostlab.org/services/snap/submit to create all possible exchanges from the following pattern: referenceAminoAcidPosition* . We then ran SNAP again:


snapfun -i ARSA.fasta -m all_mutants.txt -o snap_all.out
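The expansion of the referenceAminoAcidPosition* pattern can also be sketched locally. The following Python snippet is only an illustration, not the Generate Mutants tool itself; it assumes the mutant file holds one substitution per line in the D29N notation used on this page, and the function name expand_pattern is made up for this example.

```python
# Sketch only: expands patterns such as "D29*" into all 19 substitutions.
# The one-mutation-per-line D29N notation is an assumption taken from this page.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the summary table

def expand_pattern(pattern):
    """Expand e.g. 'D29*' into ['D29A', 'D29R', ...], skipping the reference."""
    if not pattern.endswith("*"):
        raise ValueError(f"expected a trailing '*': {pattern!r}")
    ref, pos = pattern[0], int(pattern[1:-1])
    return [f"{ref}{pos}{aa}" for aa in AMINO_ACIDS if aa != ref]

# The ten mutated positions analysed on this page.
patterns = ["D29*", "P136*", "Q153*", "W193*", "T274*",
            "F356*", "T409*", "N440*", "C489*", "R496*"]

mutants = [m for p in patterns for m in expand_pattern(p)]
print("\n".join(mutants[:5]))
print(len(mutants), "mutants in total")
```

Each position yields 19 substitutions because the reference amino acid itself is skipped.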

Next, we wrote a Perl script to parse the SNAP output and summarize it in the following table, which shows for each position which amino acid substitutions are predicted as Non-neutral or Neutral ("-" marks the reference amino acid itself):

ref\mutation A R N D C Q E G H I L K M F P S T W Y V
D29 Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral
Q153 Non-neutral Non-neutral Neutral Non-neutral Non-neutral - Non-neutral Non-neutral Neutral Non-neutral Non-neutral Non-neutral Neutral Non-neutral Neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral
T274 Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral
T409 Neutral Non-neutral Neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral
C489 Non-neutral Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral
W193 Non-neutral Non-neutral Neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral
F356 Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Neutral Non-neutral Non-neutral Neutral Neutral Neutral Non-neutral Neutral - Non-neutral Non-neutral Non-neutral Non-neutral Neutral Neutral
N440 Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral
R496 Non-neutral - Non-neutral Non-neutral Non-neutral Neutral Non-neutral Non-neutral Neutral Non-neutral Non-neutral Neutral Non-neutral Neutral Non-neutral Non-neutral Non-neutral Non-neutral Neutral Non-neutral
P136 Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral - Non-neutral Non-neutral Non-neutral Non-neutral Non-neutral
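The parsing and summarizing step could look roughly like the Python sketch below. It is not the original Perl script; it assumes the SNAP output has the whitespace-separated layout shown earlier on this page (mutation, prediction, reliability index, expected accuracy), and the names parse_snap and summary_rows are hypothetical.

```python
from collections import defaultdict

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the summary table

def parse_snap(lines):
    """Return {(ref, pos): {mutant_aa: prediction}} from SNAP output lines."""
    table = defaultdict(dict)
    for line in lines:
        fields = line.split()
        # Data rows look like: D29N  Non-neutral  7  96%
        if len(fields) != 4 or fields[1] not in ("Neutral", "Non-neutral"):
            continue  # skip header, separator and blank lines
        mutation, prediction = fields[0], fields[1]
        ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
        table[(ref, pos)][alt] = prediction
    return table

def summary_rows(table):
    """One text row per position; '-' where no prediction exists (e.g. the reference)."""
    rows = []
    for (ref, pos), preds in table.items():
        cells = [preds.get(aa, "-") for aa in AMINO_ACIDS]
        rows.append(f"{ref}{pos} " + " ".join(cells))
    return rows

# A few lines copied from the SNAP output shown above.
sample = """nsSNP	Prediction	Reliability Index	Expected Accuracy
-----	------------	-------------------	-------------------
D29N	Non-neutral		7			96%
Q153H	 Neutral 		0			53%
F356V	 Neutral 		1			60%
"""

table = parse_snap(sample.splitlines())
print("ref\\mutation " + " ".join(AMINO_ACIDS))
for row in summary_rows(table):
    print(row)
```

With the full snap_all.out as input, every row would be filled except for the reference amino acid.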

SIFT

SIFT predicts the effect of amino acid substitutions by building a multiple alignment and then calculating the probability of each possible substitution. The score in the SIFT output is the probability of the substitution. SIFT predicts a substitution as damaging if this probability is <= 0.05 and as tolerated if it is > 0.05. The median conservation in the output measures the diversity of the sequences used in the multiple alignment. It should be between 2.75 and 3.25; higher values indicate that the sequences were too closely related. <ref>http://sift.jcvi.org/www/SIFT_help.html</ref> We used SIFT with the UniProt-TrEMBL 2009 database and uploaded a file containing our chosen mutations:

D29N
P136A
Q153H
W193C
T274M
F356V
T409I
N440S
C489G
R496H

As median conservation we used the standard parameter 3.00 and we excluded all sequences with a sequence identity higher than 90%.

Mutation NR Substitution predicted score median conservation comment
1 D29N AFFECT PROTEIN FUNCTION 0.00 3.04
2 P136A AFFECT PROTEIN FUNCTION 0.00 3.07
3 Q153H TOLERATED 0.29 3.04
4 W193C AFFECT PROTEIN FUNCTION 0.04 3.04
5 T274M AFFECT PROTEIN FUNCTION 0.00 3.04
6 F356V TOLERATED 0.81 3.04
7 T409I AFFECT PROTEIN FUNCTION 0.02 3.48 low confidence
8 N440S TOLERATED 0.07 3.08
9 C489G AFFECT PROTEIN FUNCTION 0.00 3.56 low confidence
10 R496H TOLERATED 0.28 3.56
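The decision rule described above can be written down in a few lines. This Python sketch only re-applies the 0.05 cutoff and the 3.25 median conservation limit to the scores in the table; it does not reproduce how SIFT computes the scores, and the function name classify is made up for illustration.

```python
SIFT_CUTOFF = 0.05              # score <= cutoff: predicted to affect function
MAX_MEDIAN_CONSERVATION = 3.25  # above this, the aligned sequences are too similar

def classify(score, median_conservation):
    """Return (prediction, diverse_enough) for one substitution."""
    prediction = "AFFECT PROTEIN FUNCTION" if score <= SIFT_CUTOFF else "TOLERATED"
    diverse_enough = median_conservation <= MAX_MEDIAN_CONSERVATION
    return prediction, diverse_enough

# (substitution, score, median conservation) copied from the table above.
sift_results = [
    ("D29N", 0.00, 3.04), ("P136A", 0.00, 3.07), ("Q153H", 0.29, 3.04),
    ("W193C", 0.04, 3.04), ("T274M", 0.00, 3.04), ("F356V", 0.81, 3.04),
    ("T409I", 0.02, 3.48), ("N440S", 0.07, 3.08), ("C489G", 0.00, 3.56),
    ("R496H", 0.28, 3.56),
]

for name, score, conservation in sift_results:
    prediction, diverse_enough = classify(score, conservation)
    note = "" if diverse_enough else " (median conservation above 3.25)"
    print(f"{name}: {prediction}{note}")
```

Note that N440S, with a score of 0.07, lies only just above the cutoff.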

PolyPhen

PolyPhen predicts whether a mutation is damaging or not by using a Naïve Bayes approach. The score is the posterior probability that the mutation is damaging.<ref>http://genetics.bwh.harvard.edu/pph2/dokuwiki/overview</ref> We used PolyPhen with standard parameters. The results are shown below.

Mutation NR Substitution HumDiv-predicted HumDiv-score HumVar-predicted HumVar-score Link (expires in September)
1 D29N probably damaging 1.000 probably damaging 0.999 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479962.html
2 P136A probably damaging 1.000 probably damaging 0.999 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479963.html
3 Q153H possibly damaging 0.945 possibly damaging 0.520 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479964.html
4 W193C probably damaging 0.977 possibly damaging 0.633 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479965.html
5 T274M probably damaging 1.000 probably damaging 1.000 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479966.html
6 F356V benign 0.000 benign 0.001 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479967.html
7 T409I probably damaging 0.961 benign 0.432 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479968.html
8 N440S possibly damaging 0.834 benign 0.255 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479969.html
9 C489G damaging 0.999 probably damaging 0.906 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479970.html
10 R496H benign 0.003 benign 0.000 http://genetics.bwh.harvard.edu/ggi/pph2/6b8e887bab2c4971aff12f9579630878eaaed666/479971.html

Comparison of results

To compare the results of the different prediction methods, we created the table below. An "X" marks a mutation that was predicted to have an effect, and a "-" marks a mutation that was predicted to have no effect. For PolyPhen, "X" means "damaging" or "probably damaging", "/" means "possibly damaging" and "-" means "benign".

Mutation NR Substitution SNAP SIFT PolyPhen-HumDiv PolyPhen-HumVar
1 D29N X X X X
2 P136A X X X X
3 Q153H - - / /
4 W193C X X X /
5 T274M X X X X
6 F356V - - - -
7 T409I X X X -
8 N440S X - / -
9 C489G X X X X
10 R496H - - - -
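The mapping from the individual predictions to the symbols in this table can be sketched as follows. The snippet is illustrative only: the label sets follow the legend given above, the four example rows are copied from the result tables on this page, and the name symbol is hypothetical.

```python
# Label sets follow the legend of the comparison table.
EFFECT = {"non-neutral", "affect protein function", "damaging", "probably damaging"}
UNCERTAIN = {"possibly damaging"}
NO_EFFECT = {"neutral", "tolerated", "benign"}

def symbol(label):
    """Translate a tool-specific prediction label into X, / or -."""
    key = label.strip().lower()
    if key in EFFECT:
        return "X"
    if key in UNCERTAIN:
        return "/"
    if key in NO_EFFECT:
        return "-"
    raise ValueError(f"unknown prediction label: {label!r}")

# (SNAP, SIFT, PolyPhen HumDiv, PolyPhen HumVar) for four of the ten mutations.
predictions = {
    "D29N": ("Non-neutral", "AFFECT PROTEIN FUNCTION", "probably damaging", "probably damaging"),
    "Q153H": ("Neutral", "TOLERATED", "possibly damaging", "possibly damaging"),
    "T409I": ("Non-neutral", "AFFECT PROTEIN FUNCTION", "probably damaging", "benign"),
    "N440S": ("Non-neutral", "TOLERATED", "possibly damaging", "benign"),
}

for mutation, labels in predictions.items():
    print(mutation, *[symbol(label) for label in labels])
```

Raising an error for unknown labels keeps a misspelled prediction from being silently counted as "no effect".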

Multiple sequence alignments

First, we downloaded the HSSP file for ARSA to get all proteins that are homologous to it. Then we downloaded all mammalian protein sequences from UniProt by searching for the term taxonomy:40674, which selects all mammalian protein sequences, and saved them in one multi-FASTA file. We then extracted all mammalian proteins homologous to human ARSA by mapping the IDs from the HSSP file to the sequence IDs in the multi-FASTA file. This yielded 75 mammalian sequences homologous to human ARSA.
Next, we calculated a multiple sequence alignment of these proteins (including ARSA) with MUSCLE. The Jalview image of the alignment is shown below.

Multiple sequence alignment of all 75 homologous sequences using MUSCLE
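The ID mapping step described above could be sketched in Python as shown below. This is only an illustration under stated assumptions: the FASTA headers are assumed to follow the UniProt layout >db|accession|entry_name, the HSSP IDs are assumed to have been reduced to accessions already, and all records and IDs in the example are toy data, not real entries.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def accession(header):
    """Take the accession from 'db|accession|entry_name ...', else the first word."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]

# Toy input: made-up accessions and sequences, for illustration only.
multi_fasta = """>sp|A00001|TOY1_HUMAN toy record
MAAAK
LLLL
>sp|A00002|TOY2_MOUSE toy record
MCCCK
>sp|A00003|TOY3_RAT toy record
MDDDK
"""
hssp_ids = {"A00001", "A00003"}  # hypothetical IDs taken from an HSSP file

homologs = [(h, s) for h, s in read_fasta(multi_fasta) if accession(h) in hssp_ids]
for header, seq in homologs:
    print(f">{header}\n{seq}")
```

The filtered records can then be written to a single file and passed to the alignment program.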

The following table shows the conservation of the reference amino acid and of the mutant amino acid at each mutated position.

pos conservation(reference) conservation(mutant)
29 0.86 0
153 0.14 0
274 0.87 0
409 0.35 0.16
489 0.80 0.05
193 0.13 0
356 0.15 0
440 0.15 0
496 0.14 0.01
136 0.93 0
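The page does not state how the conservation values were computed. One plausible reading is the fraction of aligned sequences that carry a given residue in the alignment column of the mutated position. The Python sketch below implements that reading on a toy alignment; the function names and the alignment itself are made up, and gaps are counted in the denominator by assumption.

```python
def alignment_column(aligned_seqs, reference, position):
    """Return the alignment column for a 1-based, ungapped reference position."""
    seen = 0
    for idx, char in enumerate(reference):
        if char != "-":
            seen += 1
            if seen == position:
                return [seq[idx] for seq in aligned_seqs]
    raise IndexError(f"reference has fewer than {position} residues")

def residue_frequency(column, residue):
    """Fraction of sequences (gaps included in the total) showing `residue`."""
    return column.count(residue) / len(column)

# Toy alignment; the first sequence plays the role of the reference.
alignment = ["MD-K",
             "MN-K",
             "MDAK",
             "-D-R"]
reference = alignment[0]

column = alignment_column(alignment, reference, 2)   # reference residue D at position 2
print(residue_frequency(column, "D"))                # conservation of the reference
print(residue_frequency(column, "N"))                # conservation of the mutant
```

Counting through the gapped reference is needed because positions such as 29 or 136 refer to the ungapped ARSA sequence, not to alignment columns.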

PSI-BLAST


blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 D   -5 -5 -2  8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6    0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  2.49 1.56
 153 Q    3  2 -1  4 -4 -1 -1 -2  0 -2 -3 -3  4 -2 -3 -1 -2 -3 -2 -2   26  10   3  23   0   3   3   3   2   2   1   1  13   2   1   3   2   0   1   2  0.53 1.48
 274 T   -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4  1  8 -6 -5 -3    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7  92   0   0   0  1.94 1.62
 409 T   -1  0  0 -1 -2 -1 -1  0 -1 -1 -1  0 -1 -1  3  0  1  6  0 -1    5   5   5   4   1   3   4   8   1   3   6   5   1   2  13   6   8  11   3   4  0.26 0.95
 489 C    2 -1  1 -4  8 -4 -4 -2 -1 -1 -2 -3 -1 -4 -4  0  0  5 -1 -3   15   4   8   0  36   0   0   2   1   3   3   1   1   0   0   6   5   9   2   0  0.99 1.22
 440 N   -5 -3  6  5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6  2 -2 -3 -6 -6 -5    0   1  46  36   0   1   2   0   0   0   0   1   0   0  10   1   1   0   0   0  1.48 1.67
 356 F   -3 -1 -5 -5 -3  0 -1 -6  1  3  0 -1  0  2 -6 -3 -2 -3  5  3    1   4   0   0   1   5   4   0   3  18   8   5   2   8   0   1   2   0  20  20  0.59 1.62
 193 W   -2  4  2  3 -5  0  0 -2  0 -3 -4  1 -3 -1 -2 -1 -2  1  1 -3    3  25  11  16   0   4   5   3   2   2   1   7   0   2   2   4   2   2   5   2  0.46 1.45
 136 P   -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7  9 -4 -4 -7 -6 -5    1   0   0   0   0   0   0   0   0   0   0   0   0   0  98   0   0   0   0   0  3.03 1.61
 496 R   -3  1  0 -3 -4  1  1 -1  1 -3  1  1 -2  2  4  0 -3 -1 -1 -3    1   7   4   1   0   5  10   4   3   1  16   9   0   9  20   8   1   1   1   1  0.34 0.96
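To read the score of a specific substitution out of these rows, a small parser is enough. The Python sketch below assumes the row layout shown above (position, reference residue, 20 scores, 20 weighted percentages, two trailing numbers); the function name parse_pssm_row is hypothetical.

```python
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"  # column order of the matrix header

def parse_pssm_row(line):
    """Split one PSSM row into position, reference, scores and percentages."""
    fields = line.split()
    pos, ref = int(fields[0]), fields[1]
    scores = dict(zip(PSSM_ORDER, map(int, fields[2:22])))
    percents = dict(zip(PSSM_ORDER, map(int, fields[22:42])))
    return pos, ref, scores, percents

# Rows for positions 29 and 136, copied from the matrix above.
rows = [
    "29 D -5 -5 -2 8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6 "
    "0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.49 1.56",
    "136 P -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7 9 -4 -4 -7 -6 -5 "
    "1 0 0 0 0 0 0 0 0 0 0 0 0 0 98 0 0 0 0 0 3.03 1.61",
]
mutants = {29: "N", 136: "A"}  # D29N and P136A

for line in rows:
    pos, ref, scores, percents = parse_pssm_row(line)
    alt = mutants[pos]
    print(f"{ref}{pos}{alt}: score {scores[alt]} (reference {scores[ref]}), "
          f"observed {percents[alt]}% vs {percents[ref]}%")
```

For D29N this gives a score of -2 against 8 for the reference residue, and for P136A a score of -3 against 9.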

References

<references />