Difference between revisions of "Sequence-based mutation analysis of ARSA"

From Bioinformatikpedia

Revision as of 15:33, 23 June 2011

Intro

SNP type    Mutation    Position
missense    Asp-Asn     29
missense    Gln-His     153
missense    Thr-Met     274
missense    Thr-Ile     409
missense    Cys-Gly     489
missense    Trp-Cys     193
missense    Phe-Val     356
missense    Asn-Ser     440
missense    Arg-His     496
missense    Pro-Ala     136
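
The prediction tools used later refer to mutations in the short one-letter form (for example D29N). As a minimal sketch, the table entries can be converted to that notation as follows. The list `MUTATIONS` and the helper `to_short` are names chosen for this illustration; the input format SNAP expects for its mutant file is not shown here and is not assumed.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# (reference residue, mutant residue, position), copied from the table above.
MUTATIONS = [
    ("Asp", "Asn", 29), ("Gln", "His", 153), ("Thr", "Met", 274),
    ("Thr", "Ile", 409), ("Cys", "Gly", 489), ("Trp", "Cys", 193),
    ("Phe", "Val", 356), ("Asn", "Ser", 440), ("Arg", "His", 496),
    ("Pro", "Ala", 136),
]


def to_short(ref, mut, pos):
    """Return the one-letter notation, e.g. ('Asp', 'Asn', 29) -> 'D29N'."""
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[mut]}"


short_names = [to_short(*m) for m in MUTATIONS]
print("\n".join(short_names))
```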

SNAP

We ran SNAP with the following command:


snapfun -i ARSA.fasta -m mutants.txt -o snap.out


Output:


nsSNP   Prediction    Reliability Index   Expected Accuracy
-----   -----------   -----------------   -----------------
D29N    Non-neutral   7                   96%
Q153H   Neutral       0                   53%
T274M   Non-neutral   6                   93%
T409I   Non-neutral   1                   63%
C489G   Non-neutral   5                   87%
D381D   Neutral       5                   89%
P195P   Neutral       6                   92%
H151H   Neutral       4                   85%
W193C   Non-neutral   3                   78%
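
To work with these predictions programmatically, the output can be read into records and filtered. The following is a sketch only: it assumes the whitespace-separated four-column layout shown above, and the reliability cutoff of 5 is an arbitrary value picked for illustration, not a threshold recommended by SNAP.

```python
SNAP_OUTPUT = """\
nsSNP   Prediction    Reliability Index   Expected Accuracy
-----   -----------   -----------------   -----------------
D29N    Non-neutral   7                   96%
Q153H   Neutral       0                   53%
T274M   Non-neutral   6                   93%
T409I   Non-neutral   1                   63%
C489G   Non-neutral   5                   87%
D381D   Neutral       5                   89%
P195P   Neutral       6                   92%
H151H   Neutral       4                   85%
W193C   Non-neutral   3                   78%
"""


def parse_snap(text):
    """Parse the table into a list of dicts, skipping the two header lines."""
    records = []
    for line in text.splitlines()[2:]:
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        name, prediction, ri, accuracy = fields
        records.append({
            "mutation": name,
            "prediction": prediction,
            "ri": int(ri),
            "accuracy": int(accuracy.rstrip("%")),
        })
    return records


records = parse_snap(SNAP_OUTPUT)

# Hypothetical cutoff: keep non-neutral calls with reliability index >= 5.
MIN_RI = 5
confident = [r["mutation"] for r in records
             if r["prediction"] == "Non-neutral" and r["ri"] >= MIN_RI]
print(confident)
```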

Multiple sequence alignments

First, we downloaded the HSSP file for ARSA to obtain all proteins homologous to it. From this file we extracted all 75 mammalian proteins and downloaded their sequences. Their UniProt identifiers are listed below:

  • sp|Q08DD1|ARSA_BOVIN
  • sp|P15289|ARSA_HUMAN
  • sp|P50428|ARSA_MOUSE
  • sp|P15848|ARSB_HUMAN
  • sp|P50429|ARSB_MOUSE
  • sp|P50430|ARSB_RAT
  • sp|P51689|ARSD_HUMAN
  • sp|P51690|ARSE_HUMAN
  • sp|Q60HH5|ARSE_MACFA
  • sp|P54793|ARSF_HUMAN
  • sp|Q32KH9|ARSG_CANFA
  • sp|Q96EG1|ARSG_HUMAN
  • sp|Q3TYD4|ARSG_MOUSE
  • sp|Q32KJ9|ARSG_RAT
  • sp|Q32KH8|ARSH_CANFA
  • sp|Q5FYA8|ARSH_HUMAN
  • sp|Q32KH7|ARSI_CANFA
  • sp|Q5FYB1|ARSI_HUMAN
  • sp|Q32KI9|ARSI_MOUSE
  • sp|Q32KJ8|ARSI_RAT
  • sp|Q32KH5|GALNS_CANFA
  • sp|P34059|GALNS_HUMAN
  • sp|Q571E4|GALNS_MOUSE
  • sp|Q8WNQ7|GALNS_PIG
  • sp|Q32KJ6|GALNS_RAT
  • sp|P08842|STS_HUMAN
  • sp|P50427|STS_MOUSE
  • sp|P15589|STS_RAT
  • tr|Q8N322|Q8N322_HUMAN
  • tr|Q96I49|Q96I49_HUMAN
  • tr|Q6YL38|Q6YL38_HUMAN
  • tr|Q63HL5|Q63HL5_HUMAN
  • tr|Q6ZNJ9|Q6ZNJ9_HUMAN
  • tr|B4DVI5|B4DVI5_HUMAN
  • tr|A8K4A0|A8K4A0_HUMAN
  • tr|C9J5G7|C9J5G7_HUMAN
  • tr|B7XD04|B7XD04_HUMAN
  • tr|B7Z267|B7Z267_HUMAN
  • tr|B2R6P1|B2R6P1_HUMAN
  • tr|B7Z6V4|B7Z6V4_HUMAN
  • tr|B4DQ74|B4DQ74_HUMAN
  • tr|B7WNL6|B7WNL6_HUMAN
  • tr|A1L484|A1L484_HUMAN
  • tr|B2R7S0|B2R7S0_HUMAN
  • tr|B7Z1M0|B7Z1M0_HUMAN
  • tr|A5D7J7|A5D7J7_BOVIN
  • tr|Q32KI0|Q32KI0_CANFA
  • tr|Q32KI2|Q32KI2_CANFA
  • tr|D2HFI0|D2HFI0_AILME
  • tr|Q2XQY2|Q2XQY2_MACFA
  • tr|Q32KI1|Q32KI1_CANFA
  • tr|D2HFI1|D2HFI1_AILME
  • tr|A6MKC3|A6MKC3_CALJA
  • tr|D2H6D4|D2H6D4_AILME
  • tr|D2HFI2|D2HFI2_AILME
  • tr|Q8WNR3|Q8WNR3_PIG
  • tr|D2HXW7|D2HXW7_AILME
  • tr|Q32KI3|Q32KI3_CANFA
  • tr|A6QLR7|A6QLR7_BOVIN
  • tr|D2I3S5|D2I3S5_AILME
  • tr|A1XI21|A1XI21_HORSE
  • tr|Q32KI5|Q32KI5_CANFA
  • tr|Q19AM0|Q19AM0_BOVIN
  • tr|D2HFH9|D2HFH9_AILME
  • tr|A6QLZ3|A6QLZ3_BOVIN
  • tr|Q15B85|Q15B85_MACFA
  • tr|Q9DC66|Q9DC66_MOUSE
  • tr|Q32KK2|Q32KK2_RAT
  • tr|B5DEF1|B5DEF1_RAT
  • tr|B2RWQ7|B2RWQ7_MOUSE
  • tr|B4F7E2|B4F7E2_RAT
  • tr|Q8CC47|Q8CC47_MOUSE
  • tr|Q32KK0|Q32KK0_RAT
  • tr|Q3KR80|Q3KR80_RAT
  • tr|D3ZC09|D3ZC09_RAT
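
Each identifier in the list has the form `db|accession|entry_name`, where `sp` marks a reviewed Swiss-Prot entry and `tr` an unreviewed TrEMBL entry, and the suffix of the entry name is the species mnemonic. A short sketch of splitting the identifiers and counting species is given below. The function name `parse_uniprot_id` is chosen for this illustration, and only a handful of the 75 identifiers are included to keep the example short.

```python
from collections import Counter


def parse_uniprot_id(identifier):
    """Split 'db|accession|entry_name' into its parts."""
    db, accession, entry_name = identifier.split("|")
    # The species mnemonic follows the last underscore, e.g. ARSA_HUMAN -> HUMAN.
    species = entry_name.rsplit("_", 1)[1]
    return {
        "reviewed": db == "sp",
        "accession": accession,
        "entry_name": entry_name,
        "species": species,
    }


# A small subset of the identifiers listed above.
ids = [
    "sp|Q08DD1|ARSA_BOVIN",
    "sp|P15289|ARSA_HUMAN",
    "sp|P50428|ARSA_MOUSE",
    "tr|D2HFI0|D2HFI0_AILME",
    "tr|Q3KR80|Q3KR80_RAT",
]

parsed = [parse_uniprot_id(i) for i in ids]
species_counts = Counter(p["species"] for p in parsed)
print(species_counts)
```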

Next, we computed multiple sequence alignments of these proteins (including ARSA) with MUSCLE. The Jalview images of the alignments are shown below.

Multiple sequence alignment of all 75 homologous sequences using MUSCLE
Position   Conservation (reference)   Conservation (mutant)
29         0.86                       0
153        0.14                       0
274        0.87                       0
409        0.35                       0.16
489        0.80                       0.05
193        0.13                       0
356        0.15                       0
440        0.15                       0
496        0.14                       0.01
136        0.93                       0
PSI-BLAST


blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 D   -5 -5 -2  8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6    0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  2.49 1.56
 153 Q    3  2 -1  4 -4 -1 -1 -2  0 -2 -3 -3  4 -2 -3 -1 -2 -3 -2 -2   26  10   3  23   0   3   3   3   2   2   1   1  13   2   1   3   2   0   1   2  0.53 1.48
 274 T   -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4  1  8 -6 -5 -3    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   7  92   0   0   0  1.94 1.62
 409 T   -1  0  0 -1 -2 -1 -1  0 -1 -1 -1  0 -1 -1  3  0  1  6  0 -1    5   5   5   4   1   3   4   8   1   3   6   5   1   2  13   6   8  11   3   4  0.26 0.95
 489 C    2 -1  1 -4  8 -4 -4 -2 -1 -1 -2 -3 -1 -4 -4  0  0  5 -1 -3   15   4   8   0  36   0   0   2   1   3   3   1   1   0   0   6   5   9   2   0  0.99 1.22
 440 N   -5 -3  6  5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6  2 -2 -3 -6 -6 -5    0   1  46  36   0   1   2   0   0   0   0   1   0   0  10   1   1   0   0   0  1.48 1.67
 356 F   -3 -1 -5 -5 -3  0 -1 -6  1  3  0 -1  0  2 -6 -3 -2 -3  5  3    1   4   0   0   1   5   4   0   3  18   8   5   2   8   0   1   2   0  20  20  0.59 1.62
 193 W   -2  4  2  3 -5  0  0 -2  0 -3 -4  1 -3 -1 -2 -1 -2  1  1 -3    3  25  11  16   0   4   5   3   2   2   1   7   0   2   2   4   2   2   5   2  0.46 1.45
 136 P   -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7  9 -4 -4 -7 -6 -5    1   0   0   0   0   0   0   0   0   0   0   0   0   0  98   0   0   0   0   0  3.03 1.61
 496 R   -3  1  0 -3 -4  1  1 -1  1 -3  1  1 -2  2  4  0 -3 -1 -1 -3    1   7   4   1   0   5  10   4   3   1  16   9   0   9  20   8   1   1   1   1  0.34 0.96
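
Each matrix row holds the position, the query residue, 20 log-odds scores, 20 weighted observed percentages, the information per position, and the relative weight, with the amino acids in the column order given in the header. As a minimal sketch, a row can be parsed and the wild-type and mutant scores compared as follows. The helper names `parse_pssm_row` and `substitution_summary` are made up for this illustration, and the parser assumes exactly 44 whitespace-separated fields per row.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the matrix header

# Row for position 29, copied from the matrix above.
ROW_29 = (
    "29 D "
    "-5 -5 -2 8 -7 -3 -1 -4 -4 -6 "
    "-7 -4 -6 -7 -5 -3 -4 -7 -6 -6 "
    "0 0 0 100 0 0 0 0 "
    "0 0 0 0 0 0 0 0 "
    "0 0 0 0 "
    "2.49 1.56"
)


def parse_pssm_row(line):
    """Parse one PSSM row into scores and observed percentages per amino acid."""
    fields = line.split()
    if len(fields) != 44:
        raise ValueError(f"expected 44 fields, got {len(fields)}")
    return {
        "position": int(fields[0]),
        "residue": fields[1],
        "scores": dict(zip(AMINO_ACIDS, map(int, fields[2:22]))),
        "percent": dict(zip(AMINO_ACIDS, map(int, fields[22:42]))),
        "information": float(fields[42]),
        "weight": float(fields[43]),
    }


def substitution_summary(row, mutant):
    """Return (wild-type score, mutant score) for a substitution at this row."""
    return row["scores"][row["residue"]], row["scores"][mutant]


row = parse_pssm_row(ROW_29)
wild_type_score, mutant_score = substitution_summary(row, "N")  # D29N
print(wild_type_score, mutant_score)
```

For D29N the wild-type aspartate has a strongly positive score and the mutant asparagine a negative one, which is the kind of contrast the matrix is read for at each mutated position.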