Difference between revisions of "Sequence-based mutation analysis of ARSA"

Revision as of 17:13, 26 June 2011

Proline and Alanine are both hydrophobic amino acids, so the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is structurally very important. Alanine is one of the smallest amino acids, so the mutation from Pro to Ala should have a big influence on the structure of ARSA.

3

Gln-His

153

Description of Gln-His

Glutamine (Gln)

Histidine (His)

BLOSUM62	PAM1	PAM250
0 (worst: -3)	20 (worst: 0)	7 (worst: 0)

Glutamine is a hydrophilic amino acid while Histidine is a basic amino acid, so the behaviour towards water changes as well as the charge of the amino acid. Gln and His also differ considerably in structure, and His needs more space than Gln, which should have a big influence on the structure of ARSA.

4

Trp-Cys

193

Description of Trp-Cys

Tryptophan (Trp)

Cysteine (Cys)

BLOSUM62	PAM1	PAM250
-2 (worst: -4)	0 (worst: 0)	1 (worst: 1)

Tryptophan is a hydrophobic, aromatic amino acid while Cysteine is a hydrophilic amino acid, so the behaviour towards water changes dramatically. In addition, Trp is the largest amino acid while Cys is rather small, so the space needed by the residue also changes. This should have a huge influence on the structure of ARSA.

5

Thr-Met

274

Description of Thr-Met

Threonine (Thr)

Methionine (Met)

BLOSUM62	PAM1	PAM250
-1 (worst: -3)	2 (worst: 0)	1 (worst: 0)

Threonine is a hydrophilic amino acid while Methionine is a hydrophobic amino acid, so the behaviour towards water changes. Methionine also has a much longer side chain than Threonine, so the structure of ARSA should be altered by this mutation.

6

Phe-Val

356

Description of Phe-Val

Phenylalanine (Phe)

Valine (Val)

BLOSUM62	PAM1	PAM250
-1 (worst: -4)	1 (worst: 0)	10 (worst: 1)

Phenylalanine and Valine are both hydrophobic amino acids, so the only impact on structure could come from the structural differences between Phe and Val. Phe has an aromatic ring and therefore needs more space than Val.

7

Thr-Ile

409

Description of Thr-Ile

Threonine (Thr)

Isoleucine (Ile)

BLOSUM62	PAM1	PAM250
-2 (worst: -3)	7 (worst: 0)	4 (worst: 0)

Threonine is a hydrophilic amino acid while Isoleucine is a hydrophobic amino acid. So the behaviour towards water changes.

8

Asn-Ser

440

Description of Asn-Ser

Asparagine (Asn)

Serine (Ser)

BLOSUM62	PAM1	PAM250
1 (worst: -4)	34 (worst: 0)	8 (worst: 0)

Asparagine and Serine are both hydrophilic amino acids. Also they are almost of the same size. So the mutation should not have a very dramatic effect.

9

Cys-Gly

489

Description of Cys-Gly

Cysteine (Cys)

Glycine (Gly)

BLOSUM62	PAM1	PAM250
-3 (worst: -4)	1 (worst: 0)	4 (worst: 0)

Cysteine and Glycine are both hydrophilic amino acids. The only difference is the size: Gly is the smallest of the amino acids while Cys is a little bigger. The effect should not be that dramatic.

10

Arg-His

496

Description of Arg-His

Arginine (Arg)

Histidine (His)

BLOSUM62	PAM1	PAM250
0 (worst: -3)	8 (worst: 0)	5 (worst: 1)

Arginine and Histidine are both basic amino acids so the only effect could come from the difference in size of the two.

Substitution Matrices

To compare the different mutations, the scores from the three matrices are collected in one table:

Nr.	Substitution	BLOSUM62	PAM1	PAM250
1	Asp(D) -> Asn(N)	1 (worst: -4)	36 (worst: 0)	7 (worst: 0)
2	Pro(P) -> Ala(A)	-1 (worst: -4)	22 (worst: 0)	11 (worst: 0)
3	Gln(Q) -> His(H)	0 (worst: -3)	20 (worst: 0)	7 (worst: 0)
4	Trp(W) -> Cys(C)	-2 (worst: -4)	0 (worst: 0)	1 (worst: 1)
5	Thr(T) -> Met(M)	-1 (worst: -3)	2 (worst: 0)	1 (worst: 0)
6	Phe(F) -> Val(V)	-1 (worst: -4)	1 (worst: 0)	10 (worst: 1)
7	Thr(T) -> Ile(I)	-2 (worst: -3)	7 (worst: 0)	4 (worst: 0)
8	Asn(N) -> Ser(S)	1 (worst: -4)	34 (worst: 0)	8 (worst: 0)
9	Cys(C) -> Gly(G)	-3 (worst: -4)	1 (worst: 0)	4 (worst: 0)
10	Arg(R) -> His(H)	0 (worst: -3)	8 (worst: 0)	5 (worst: 1)
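To make the comparison across the three matrices easier to scan, the scores can be sorted programmatically. The following is only a minimal sketch: it copies the values from the table above into a Python list and ranks the substitutions from the lowest to the highest score for a chosen matrix. The names `SCORES` and `rank_by` are illustrative and not part of the original analysis.

```python
# Scores copied from the table above: (substitution, BLOSUM62, PAM1, PAM250).
SCORES = [
    ("D->N", 1, 36, 7),
    ("P->A", -1, 22, 11),
    ("Q->H", 0, 20, 7),
    ("W->C", -2, 0, 1),
    ("T->M", -1, 2, 1),
    ("F->V", -1, 1, 10),
    ("T->I", -2, 7, 4),
    ("N->S", 1, 34, 8),
    ("C->G", -3, 1, 4),
    ("R->H", 0, 8, 5),
]


def rank_by(column):
    """Return substitutions sorted ascending by one score column.

    column: 1 = BLOSUM62, 2 = PAM1, 3 = PAM250.
    Ties keep their order from the table (Python's sort is stable).
    """
    return [row[0] for row in sorted(SCORES, key=lambda row: row[column])]


if __name__ == "__main__":
    for name, column in (("BLOSUM62", 1), ("PAM1", 2), ("PAM250", 3)):
        print(name, rank_by(column))
```

A low rank in all three lists (for example Trp to Cys) suggests that the matrices agree that the exchange is unfavourable, whereas disagreement between the lists points to substitutions that need a closer look.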

Secondary Structure

As one can see in the picture above, none of the mutations lies in the middle of a secondary structure element. Only mutations 1, 2, 4 and 5 are close to or, depending on the prediction method, at the border of secondary structure elements.

SNAP

We ran SNAP using the following command:


snapfun -i ARSA.fasta -m mutants.txt -o snap.out

output:


nsSNP	Prediction	Reliability Index	Expected Accuracy
-----	------------	-------------------	-------------------
D29N	Non-neutral		7			96%
Q153H	 Neutral 		0			53%
T274M	Non-neutral		6			93%
T409I	Non-neutral		1			63%
C489G	Non-neutral		5			87%
W193C	Non-neutral		3			78%
F356V	 Neutral 		1			60%
N440S	Non-neutral		2			70%
R496H	 Neutral 		1			60%
P136A	Non-neutral		4			82%
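The output above can be read into a small data structure for further processing. The sketch below is written in Python and assumes that each data row consists of four whitespace-separated fields (mutation, prediction, reliability index, expected accuracy), as in the listing above. The function name `parse_snap` and the dictionary keys are our own choice, not something SNAP defines.

```python
import re

# Rows copied from the SNAP output shown above.
SNAP_OUTPUT = """\
nsSNP   Prediction    Reliability Index   Expected Accuracy
-----   ------------  ------------------- -------------------
D29N    Non-neutral   7   96%
Q153H   Neutral       0   53%
T274M   Non-neutral   6   93%
T409I   Non-neutral   1   63%
C489G   Non-neutral   5   87%
W193C   Non-neutral   3   78%
F356V   Neutral       1   60%
N440S   Non-neutral   2   70%
R496H   Neutral       1   60%
P136A   Non-neutral   4   82%
"""

# A mutation looks like D29N: reference residue, position, mutant residue.
MUTATION = re.compile(r"[A-Z]\d+[A-Z]")


def parse_snap(text):
    """Return one dict per data row; header and separator lines are skipped."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not MUTATION.fullmatch(fields[0]):
            continue
        mutation, prediction, reliability, accuracy = fields
        records.append({
            "mutation": mutation,
            "prediction": prediction,
            "reliability": int(reliability),
            "accuracy": int(accuracy.rstrip("%")),
        })
    return records


records = parse_snap(SNAP_OUTPUT)
neutral = [r["mutation"] for r in records if r["prediction"] == "Neutral"]
print(neutral)
```

For the output above, the three mutations predicted as Neutral are Q153H, F356V and R496H, all with a reliability index of 0 or 1.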

In order to analyze all possible amino acid substitutions at the mutated positions above, we used the Generate Mutants tool on http://rostlab.org/services/snap/submit to create all possible exchanges from the following pattern: referenceAminoAcidPosition*. Then we ran SNAP again:


snapfun -i ARSA.fasta -m all_mutants.txt -o snap_all.out
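The expansion of such a pattern can also be reproduced locally. The following Python sketch is not the Generate Mutants tool itself; it only illustrates the idea, assuming that a pattern such as D29* stands for the exchange of the reference residue against each of the other 19 standard amino acids, and that the mutant file contains one mutation per line (the exact file format expected by snapfun is an assumption here).

```python
# The 20 standard amino acids, in the column order used by the tables on this page.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# The ten mutated positions analysed on this page, written as patterns.
PATTERNS = ["D29*", "Q153*", "T274*", "T409*", "C489*",
            "W193*", "F356*", "N440*", "R496*", "P136*"]


def expand_pattern(pattern):
    """Expand e.g. 'D29*' into D29A, D29R, ... (19 mutations, reference excluded)."""
    if not pattern.endswith("*") or pattern[0] not in AMINO_ACIDS:
        raise ValueError(f"unexpected pattern: {pattern!r}")
    reference, position = pattern[0], int(pattern[1:-1])
    return [f"{reference}{position}{aa}" for aa in AMINO_ACIDS if aa != reference]


all_mutants = [m for pattern in PATTERNS for m in expand_pattern(pattern)]

if __name__ == "__main__":
    # One mutation per line; redirect to a file such as all_mutants.txt if needed.
    print("\n".join(all_mutants))
```

With ten positions and 19 alternative residues each, the list contains 190 mutations.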

Next, we wrote a Perl script to parse and summarize the SNAP output in the following table, which shows for each position which amino acid substitutions are predicted as Non-neutral or Neutral:

ref\mutation	A	R	N	D	C	Q	E	G	H	I	L	K	M	F	P	S	T	W	Y	V
D29	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
Q153	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
T274	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral
T409	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral
C489	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
W193	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral
F356	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Neutral	Neutral	Non-neutral	Neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Neutral
N440	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
R496	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Neutral	Non-neutral	Neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Neutral	Non-neutral
P136	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral		Non-neutral	Non-neutral	Non-neutral	Non-neutral	Non-neutral
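The original Perl script is not shown on this page. As an illustration only, the same kind of summary can be sketched in Python: a list of (mutation, prediction) pairs is pivoted into one row per reference position and one column per mutant amino acid. The function names are hypothetical, and the sample data is a small excerpt taken from the Q153 row of the table above.

```python
from collections import defaultdict

# Column order of the summary table above.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pivot(predictions):
    """Group (mutation, prediction) pairs by reference position, e.g. 'Q153'."""
    table = defaultdict(dict)
    for mutation, label in predictions:
        reference_position, mutant = mutation[:-1], mutation[-1]
        table[reference_position][mutant] = label
    return table


def format_row(reference_position, row):
    """Render one tab-separated row; missing cells (e.g. the reference itself) stay empty."""
    cells = [row.get(aa, "") for aa in AMINO_ACIDS]
    return "\t".join([reference_position] + cells)


# Excerpt from the Q153 row of the table above.
sample = [
    ("Q153A", "Non-neutral"),
    ("Q153N", "Neutral"),
    ("Q153H", "Neutral"),
    ("Q153M", "Neutral"),
    ("Q153P", "Neutral"),
]

table = pivot(sample)

if __name__ == "__main__":
    print("ref\\mutation\t" + "\t".join(AMINO_ACIDS))
    for reference_position, row in table.items():
        print(format_row(reference_position, row))
```

Keeping the reference residue's own column empty mirrors the layout of the table above, where the diagonal cell of each row is blank.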

Multiple sequence alignments

First, we downloaded the HSSP file for ARSA to get all proteins that are homologous to it. Then we downloaded all mammalian protein sequences from UniProt by searching for the term taxonomy:40674, which selects all mammalian entries, and saved them in one multi-FASTA file. From this file we extracted the mammalian homologs of human ARSA by mapping the IDs from the HSSP file to the sequence IDs in the multi-FASTA file. This yielded 75 mammalian sequences homologous to human ARSA.
Next, we calculated a multiple sequence alignment of these proteins (including ARSA) with MUSCLE. The Jalview image of the alignment is shown below.

Multiple sequence alignment of all 75 homologous sequences using MUSCLE
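The ID mapping step described above can be sketched as a simple FASTA filter. The Python code below is an illustration under two assumptions: the homolog IDs have already been extracted from the HSSP file into a set of accessions (how this was done is not described here), and the FASTA headers follow the UniProt layout db|accession|entry name. The accessions and sequences in the example are made up and serve only as placeholders.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header):
    """Return the accession from a 'db|accession|name ...' header, else the first word."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


def select_homologs(fasta_text, wanted_ids):
    """Keep only the records whose accession is in wanted_ids."""
    return [(h, s) for h, s in read_fasta(fasta_text) if accession(h) in wanted_ids]


# Hypothetical placeholder data, not real UniProt entries.
EXAMPLE_FASTA = """\
>tr|X00001|EXAMPLE1_MOUSE Placeholder protein 1
MSAPRLLAL
ALGLAVA
>tr|X00002|EXAMPLE2_RAT Placeholder protein 2
MGTTKLLW
>tr|X00003|EXAMPLE3_BOVIN Placeholder protein 3
MSLPAALV
"""

hssp_ids = {"X00001", "X00003"}  # assumed to come from the HSSP file
homologs = select_homologs(EXAMPLE_FASTA, hssp_ids)

if __name__ == "__main__":
    for header, sequence in homologs:
        print(f">{header}\n{sequence}")
```

The filtered records can then be written to one file and passed to the alignment program.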

The following table shows, for each mutated position, the conservation of the original (reference) amino acid and of the mutant amino acid in the alignment.

pos	conservation - reference	conservation - mutant
29	0.86	0
153	0.14	0
274	0.87	0
409	0.35	0.16
489	0.80	0.05
193	0.13	0
356	0.15	0
440	0.15	0
496	0.14	0.01
136	0.93	0
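How the conservation values were computed is not stated on this page. One plausible reading, used in the Python sketch below purely as an assumption, is the fraction of aligned sequences that carry a given residue in the alignment column of the mutated position (gaps count in the denominator). Because the alignment contains gaps, the position in the ungapped reference sequence first has to be mapped to an alignment column. The toy alignment is invented for illustration.

```python
def alignment_column(aligned_reference, position):
    """Map a 1-based position in the ungapped reference to a 0-based alignment column."""
    seen = 0
    for index, char in enumerate(aligned_reference):
        if char != "-":
            seen += 1
            if seen == position:
                return index
    raise ValueError(f"position {position} is beyond the end of the reference")


def conservation(alignment, column, residue):
    """Fraction of sequences with `residue` in the given column (gaps included in the total)."""
    letters = [sequence[column] for sequence in alignment]
    return letters.count(residue) / len(letters)


# Toy alignment (hypothetical); the first sequence plays the role of the reference.
ALIGNMENT = [
    "MD-TQ",
    "MDATQ",
    "MN-TQ",
    "ME-SQ",
]

column = alignment_column(ALIGNMENT[0], 3)  # third residue of the reference (T)
reference_score = conservation(ALIGNMENT, column, "T")
mutant_score = conservation(ALIGNMENT, column, "S")

if __name__ == "__main__":
    print(column, reference_score, mutant_score)
```

In the toy alignment the reference residue T is found in three of four sequences and the residue S in one of four, which corresponds to one row of the table above with a reference and a mutant value.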

PSI-BLAST


blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 D   -5 -5 -2  8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6    0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  2.49 1.56
 153 Q    3  2 -1  4 -4 -1 -1 -2  0 -2 -3 -3  4 -2 -3 -1 -2 -3 -2 -2   26  10   3  23   0   3   3   3   2   2   1   1  13   2   1   3   2   0   1   2  0.53 1.48
 274 T   -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4  1  8 -6 -5 -3    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7  92   0   0   0  1.94 1.62
 409 T   -1  0  0 -1 -2 -1 -1  0 -1 -1 -1  0 -1 -1  3  0  1  6  0 -1    5   5   5   4   1   3   4   8   1   3   6   5   1   2  13   6   8  11   3   4  0.26 0.95
 489 C    2 -1  1 -4  8 -4 -4 -2 -1 -1 -2 -3 -1 -4 -4  0  0  5 -1 -3   15   4   8   0  36   0   0   2   1   3   3   1   1   0   0   6   5   9   2   0  0.99 1.22
 440 N   -5 -3  6  5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6  2 -2 -3 -6 -6 -5    0   1  46  36   0   1   2   0   0   0   0   1   0   0  10   1   1   0   0   0  1.48 1.67
 356 F   -3 -1 -5 -5 -3  0 -1 -6  1  3  0 -1  0  2 -6 -3 -2 -3  5  3    1   4   0   0   1   5   4   0   3  18   8   5   2   8   0   1   2   0  20  20  0.59 1.62
 193 W   -2  4  2  3 -5  0  0 -2  0 -3 -4  1 -3 -1 -2 -1 -2  1  1 -3    3  25  11  16   0   4   5   3   2   2   1   7   0   2   2   4   2   2   5   2  0.46 1.45
 136 P   -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7  9 -4 -4 -7 -6 -5    1   0   0   0   0   0   0   0   0   0   0   0   0   0  98   0   0   0   0   0  3.03 1.61
 496 R   -3  1  0 -3 -4  1  1 -1  1 -3  1  1 -2  2  4  0 -3 -1 -1 -3    1   7   4   1   0   5  10   4   3   1  16   9   0   9  20   8   1   1   1   1  0.34 0.96
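To read off the values for a specific mutation, a row of the matrix above can be split into its parts. The Python sketch below assumes the layout shown above: position, reference residue, 20 log-odds scores, 20 weighted observed percentages, and two trailing values (information per position and relative weight). The function name `parse_pssm_row` is our own; the example row is the one for position 29.

```python
# Column order of the PSSM header line above.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Row for position 29, copied from the matrix above.
ROW_29 = ("  29 D   -5 -5 -2  8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6"
          "    0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0"
          "   0   0   0   0  2.49 1.56")


def parse_pssm_row(line):
    """Split one PSSM row into position, residue, scores and percentages."""
    fields = line.split()
    if len(fields) != 44:
        raise ValueError(f"expected 44 fields, found {len(fields)}")
    position, residue = int(fields[0]), fields[1]
    scores = dict(zip(AMINO_ACIDS, map(int, fields[2:22])))
    percentages = dict(zip(AMINO_ACIDS, map(int, fields[22:42])))
    return position, residue, scores, percentages


position, residue, scores, percentages = parse_pssm_row(ROW_29)

if __name__ == "__main__":
    mutant = "N"  # mutation D29N
    print(f"{residue}{position}{mutant}: "
          f"score {scores[residue]} -> {scores[mutant]}, "
          f"observed {percentages[residue]}% -> {percentages[mutant]}%")
```

For D29N this gives a score of 8 for the reference Asp against -2 for Asn, with Asp observed in 100% and Asn in 0% of the weighted matches at this position.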

@@ Line 271: / Line 271: @@
 ==== Secondary Structure ====
-[[File:Sec_Struct_Mutations_ARSA.png | 900px | Predicted Secondary Structure for ARSA with marked Mutations 1-10]]
+[[ File:Sec_Struct_Mutations_ARSA.png | 900px ]]
+As one can see in the picture above, none of the mutations is in the middle of a secondary structure element. Only the mutations 1,2,4 and 5 are close to or - depending on the prediction method - at the border of secondary structure elements.
 === SNAP ===
