Task 6: Sequence-based mutation analysis

From Bioinformatikpedia
Revision as of 09:07, 27 June 2011 by Meier (talk | contribs) (PSSM)

Task description

A detailed task description can be found here.

Mutation selection

We selected the following ten mutations:

  • I65T
  • R71H
  • R158Q
  • R261Q
  • T266A
  • P275S
  • T278N
  • P281L
  • G312D
  • R408W

Five of these mutations are associated with phenylketonuria, the disease we are studying; the other five are not. The five disease-causing mutations are the most frequent missense/nonsense mutations among people who suffer from phenylketonuria. The frequencies were taken from pahDB. At this point, however, we do not reveal which mutations are disease-associated. We will lift this "secret" after our sequence-based mutation analysis in order to validate our in silico predictions.

To keep the association secret, we encrypted the file that contains the disease association of each mutation with the following Linux command: "vim -x selected_mutations_and_association.txt"

The encrypted output is as follows:

Encrypted img.png


Decryption is only possible with the correct password.

Physicochemical properties and changes

To annotate the physicochemical properties of the amino acids, we used the following Venn diagram from [1]:

Aa venn diagram.png

In addition, we used the table on this Wikipedia page as a reference.

I65T

I: aliphatic, hydrophobic, large, nonpolar, neutral
T: small, hydrophobic, polar, neutral

R71H

R: polar, aliphatic, positively charged, hydrophobic, large, basic
H: positively charged, polar, aromatic, hydrophobic, basic

R158Q

R: polar, aliphatic, positively charged, hydrophobic, large, basic

Q: polar, aliphatic, hydrophobic, neutral

R261Q

R: polar, aliphatic, positively charged, hydrophobic, large, basic
Q: polar, aliphatic, hydrophobic, neutral

T266A

T: small, hydrophobic, polar, neutral

A: tiny, hydrophobic, nonpolar, neutral

P275S

P: small, proline, nonpolar, neutral

S: tiny, polar, neutral

T278N

T: small, hydrophobic, polar, neutral

N: small, polar, neutral

P281L

P: small, proline, nonpolar, neutral

L: hydrophobic, aliphatic, nonpolar, neutral

G312D

G: tiny, nonpolar, neutral
D: polar, small, negatively charged, acidic

R408W

R: polar, aliphatic, positively charged, hydrophobic, large, basic
W: polar, aromatic, large, hydrophobic, neutral
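The comparison of wildtype and mutant properties can also be done programmatically. The following is a minimal sketch for three of the ten mutations; the property sets are transcribed from the annotations above with the wording normalised, and the names PROPERTIES and property_changes are hypothetical helpers, not part of any tool we used.

```python
# Property sets copied from the annotations in this section (wording normalised).
# Only the residues needed for three example mutations are listed.
PROPERTIES = {
    "I": {"aliphatic", "hydrophobic", "large", "nonpolar", "neutral"},
    "T": {"small", "hydrophobic", "polar", "neutral"},
    "A": {"tiny", "hydrophobic", "nonpolar", "neutral"},
    "G": {"tiny", "nonpolar", "neutral"},
    "D": {"polar", "small", "negatively charged", "acidic"},
}


def property_changes(mutation):
    """Return (lost, gained) property lists for a mutation such as 'I65T'."""
    wildtype, mutant = mutation[0], mutation[-1]
    lost = PROPERTIES[wildtype] - PROPERTIES[mutant]
    gained = PROPERTIES[mutant] - PROPERTIES[wildtype]
    return sorted(lost), sorted(gained)


for mutation in ("I65T", "T266A", "G312D"):
    lost, gained = property_changes(mutation)
    print(f"{mutation}: lost {lost}, gained {gained}")
```

The remaining mutations can be handled the same way by adding their residues to the dictionary.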

Visualization of the changed amino acid

We used the structure 1J8U to visualize our mutations with PyMOL. This structure covers only residues 118 to 424, so we could not visualize the mutations at positions 65 and 71.
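As an illustration of the coverage check, the sketch below tests each mutation position against the resolved range and writes PyMOL commands for the positions that are covered. It is only an assumption of how such a script could look: the function names and the "mut_" selection prefix are hypothetical, and the commands merely highlight the wildtype residue rather than modelling the mutant side chain.

```python
# Residues 118-424 are resolved in 1J8U, as stated in the text.
RESOLVED = range(118, 425)

MUTATIONS = ["I65T", "R71H", "R158Q", "R261Q", "T266A",
             "P275S", "T278N", "P281L", "G312D", "R408W"]


def position(mutation):
    """Extract the residue number from a mutation label such as 'R158Q'."""
    return int(mutation[1:-1])


def pymol_commands(mutations, pdb_id="1J8U"):
    """Build a list of PyMOL script lines highlighting the covered positions."""
    lines = [f"fetch {pdb_id}", "hide everything", "show cartoon"]
    for mutation in mutations:
        pos = position(mutation)
        if pos not in RESOLVED:
            # Written as a script comment so the gap stays documented.
            lines.append(f"# {mutation}: residue {pos} is not in the structure")
            continue
        lines.append(f"select mut_{mutation}, resi {pos}")
        lines.append(f"show sticks, mut_{mutation}")
    return lines


print("\n".join(pymol_commands(MUTATIONS)))
```

The printed lines can be saved to a .pml file and run in PyMOL.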

I65T

Not included in the structure used.

R71H

Not included in the structure used.

R158Q

R156Q.png

R261Q

Figure .... Visualization of the mutated residue Q (white) versus the wildtype residue R (green) in the structure of PAH.

T266A

T266A.png

P275S

P275S.png

T278N

T278N.png

P281L

P281L.png

G312D

Figure .... Visualization of the wildtype residue G (blue) in the structure of PAH.
Figure .... Visualization of the mutated residue D (white) in the structure of PAH.

R408W

Figure .... Visualization of the mutated residue W (white) versus the wildtype residue R (green) in the structure of PAH.

Mutations compared to BLOSUM62, PAM (1/250), PSSM and conservation in an MSA of all mammalian homologs

BLOSUM 62 matrix

BLOSUM62 BCKDHA.gif
The BLOSUM62 matrix was calculated from alignment blocks in which sequences sharing at least 62% identity were clustered. More likely substitutions receive a positive score, less likely substitutions a negative score. Source: [2]. Disclaimer: This file is redistributed from Wikimedia and licensed under a Creative Commons licence.
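Looking up our ten substitutions in BLOSUM62 can be sketched as follows. Only the nine residue pairs we need are included; the values are transcribed from the standard BLOSUM62 matrix and should be checked against the figure above. Because the matrix is symmetric, each pair is stored without regard to order. The names BLOSUM62_PAIRS and blosum62_score are hypothetical.

```python
# Subset of BLOSUM62, keyed by unordered residue pair (the matrix is symmetric).
# Values transcribed from the standard matrix; verify against the figure.
BLOSUM62_PAIRS = {
    frozenset("IT"): -1,
    frozenset("RH"): 0,
    frozenset("RQ"): 1,
    frozenset("TA"): 0,
    frozenset("PS"): -1,
    frozenset("TN"): 0,
    frozenset("PL"): -3,
    frozenset("GD"): -1,
    frozenset("RW"): -3,
}

MUTATIONS = ["I65T", "R71H", "R158Q", "R261Q", "T266A",
             "P275S", "T278N", "P281L", "G312D", "R408W"]


def blosum62_score(mutation):
    """Return the BLOSUM62 score for a mutation label such as 'R408W'."""
    wildtype, mutant = mutation[0], mutation[-1]
    return BLOSUM62_PAIRS[frozenset((wildtype, mutant))]


for mutation in MUTATIONS:
    print(mutation, blosum62_score(mutation))
```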

PAM 1/250 matrix

We took the values for our PAM1 and PAM250 matrices from this page.

PAM1 Matrix

      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
A  9867    2    9   10    3    8   17   21    2    6    4    2    6    2   22   35   32    0    2   18
R     1 9913    1    0    1   10    0    0   10    3    1   19    4    1    4    6    1    8    0    1
N     4    1 9822   36    0    4    6    6   21    3    1   13    0    1    2   20    9    1    4    1
D     6    0   42 9859    0    6   53    6    4    1    0    3    0    0    1    5    3    0    0    1
C     1    1    0    0 9973    0    0    0    1    1    0    0    0    0    1    5    1    0    3    2
Q     3    9    4    5    0 9876   27    1   23    1    3    6    4    0    6    2    2    0    0    1
E    10    0    7   56    0   35 9865    4    2    3    1    4    1    0    3    4    2    0    1    2
G    21    1   12   11    1    3    7 9935    1    0    1    2    1    1    3   21    3    0    0    5
H     1    8   18    3    1   20    1    0 9912    0    1    1    0    2    3    1    1    1    4    1
I     2    2    3    1    2    1    2    0    0 9872    9    2   12    7    0    1    7    0    1   33
L     3    1    3    0    0    6    1    1    4   22 9947    2   45   13    3    1    3    4    2   15
K     2   37   25    6    0   12    7    2    2    4    1 9926   20    0    3    8   11    0    1    1
M     1    1    0    0    0    2    0    0    0    5    8    4 9874    1    0    1    2    0    0    4
F     1    1    1    0    0    0    0    1    2    8    6    0    4 9946    0    2    1    3   28    0
P    13    5    2    1    1    8    3    2    5    1    2    2    1    1 9926   12    4    0    0    2
S    28   11   34    7   11    4    6   16    2    2    1    7    4    3   17 9840   38    5    2    2
T    22    2   13    4    1    3    2    2    1   11    2    8    6    1    5   32 9871    0    2    9
W     0    2    0    0    0    0    0    0    0    0    0    0    0    1    0    1    0 9976    1    0
Y     1    0    3    0    3    0    1    0    4    1    1    0    0   21    0    1    1    2 9945    1
V    13    2    1    1    3    2    2    3    3   57   11    1   17    1    3    2   10    0    2 9901
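The PAM1 entries for our mutations can be read from the table with a few lines of code. The sketch below assumes Dayhoff's convention, in which the column is the original residue and the row is its replacement, with entries scaled by 10000; the column sums of roughly 10000 in the table are consistent with that reading. The helper names parse_matrix and pam_entry are hypothetical.

```python
# PAM1 values copied from the table above (entries scaled by 10000).
PAM1_TEXT = """\
A R N D C Q E G H I L K M F P S T W Y V
A 9867 2 9 10 3 8 17 21 2 6 4 2 6 2 22 35 32 0 2 18
R 1 9913 1 0 1 10 0 0 10 3 1 19 4 1 4 6 1 8 0 1
N 4 1 9822 36 0 4 6 6 21 3 1 13 0 1 2 20 9 1 4 1
D 6 0 42 9859 0 6 53 6 4 1 0 3 0 0 1 5 3 0 0 1
C 1 1 0 0 9973 0 0 0 1 1 0 0 0 0 1 5 1 0 3 2
Q 3 9 4 5 0 9876 27 1 23 1 3 6 4 0 6 2 2 0 0 1
E 10 0 7 56 0 35 9865 4 2 3 1 4 1 0 3 4 2 0 1 2
G 21 1 12 11 1 3 7 9935 1 0 1 2 1 1 3 21 3 0 0 5
H 1 8 18 3 1 20 1 0 9912 0 1 1 0 2 3 1 1 1 4 1
I 2 2 3 1 2 1 2 0 0 9872 9 2 12 7 0 1 7 0 1 33
L 3 1 3 0 0 6 1 1 4 22 9947 2 45 13 3 1 3 4 2 15
K 2 37 25 6 0 12 7 2 2 4 1 9926 20 0 3 8 11 0 1 1
M 1 1 0 0 0 2 0 0 0 5 8 4 9874 1 0 1 2 0 0 4
F 1 1 1 0 0 0 0 1 2 8 6 0 4 9946 0 2 1 3 28 0
P 13 5 2 1 1 8 3 2 5 1 2 2 1 1 9926 12 4 0 0 2
S 28 11 34 7 11 4 6 16 2 2 1 7 4 3 17 9840 38 5 2 2
T 22 2 13 4 1 3 2 2 1 11 2 8 6 1 5 32 9871 0 2 9
W 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 9976 1 0
Y 1 0 3 0 3 0 1 0 4 1 1 0 0 21 0 1 1 2 9945 1
V 13 2 1 1 3 2 2 3 3 57 11 1 17 1 3 2 10 0 2 9901
"""


def parse_matrix(text):
    """Parse a labelled, whitespace-separated matrix into matrix[row][column]."""
    lines = text.strip().splitlines()
    columns = lines[0].split()
    matrix = {}
    for line in lines[1:]:
        label, *values = line.split()
        matrix[label] = dict(zip(columns, map(int, values)))
    return matrix


def pam_entry(matrix, mutation):
    """Entry for wildtype (column) being replaced by mutant (row)."""
    wildtype, mutant = mutation[0], mutation[-1]
    return matrix[mutant][wildtype]


PAM1 = parse_matrix(PAM1_TEXT)
for mutation in ["I65T", "R71H", "R158Q", "R261Q", "T266A",
                 "P275S", "T278N", "P281L", "G312D", "R408W"]:
    print(mutation, pam_entry(PAM1, mutation))
```

If the opposite orientation is intended, swapping the two indices in pam_entry is sufficient.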

PAM250 Matrix

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A 13  6  9  9  5  8  9 12  6  8  6  7  7  4 11 11 11  2  4  9
R  3 17  4  3  2  5  3  2  6  3  2  9  4  1  4  4  3  7  2  2
N  4  4  6  7  2  5  6  4  6  3  2  5  3  2  4  5  4  2  3  3
D  5  4  8 11  1  7 10  5  6  3  2  5  3  1  4  5  5  1  2  3
C  2  1  1  1 52  1  1  2  2  2  1  1  1  1  2  3  2  1  4  2
Q  3  5  5  6  1 10  7  3  7  2  3  5  3  1  4  3  3  1  2  3
E  5  4  7 11  1  9 12  5  6  3  2  5  3  1  4  5  5  1  2  3
G 12  5 10 10  4  7  9 27  5  5  4  6  5  3  8 11  9  2  3  7
H  2  5  5  4  2  7  4  2 15  2  2  3  2  2  3  3  2  2  3  2
I  3  2  2  2  2  2  2  2  2 10  6  2  6  5  2  3  4  1  3  9
L  6  4  4  3  2  6  4  3  5 15 34  4 20 13  5  4  6  6  7 13
K  6 18 10  8  2 10  8  5  8  5  4 24  9  2  6  8  8  4  3  5
M  1  1  1  1  0  1  1  1  1  2  3  2  6  2  1  1  1  1  1  2
F  2  1  2  1  1  1  1  1  3  5  6  1  4 32  1  2  2  4 20  3
P  7  5  5  4  3  5  4  5  5  3  3  4  3  2 20  6  5  1  2  4
S  9  6  8  7  7  6  7  9  6  5  4  7  5  3  9 10  9  4  4  6
T  8  5  6  6  4  5  5  6  4  6  4  6  5  3  6  8 11  2  3  6
W  0  2  0  0  0  0  0  0  1  0  1  0  0  1  0  1  0 55  1  0
Y  1  1  2  1  3  1  1  1  3  2  2  1  2 15  1  2  2  3 31  2
V  7  4  4  4  4  4  4  4  5  4 15 10  4 10  5  5  5 72  4 17
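In the Dayhoff model, a PAM250 mutation probability matrix is the PAM1 matrix raised to the 250th power. The sketch below illustrates this relationship using the PAM1 values listed on this page. It assumes that columns hold the original residue, normalises each column to sum to 1 before multiplying, and uses plain Python lists; the function names are hypothetical. The percentages it prints can be compared with the PAM250 table, although small differences due to rounding are to be expected.

```python
# PAM1 values copied from the PAM1 table on this page (entries scaled by 10000).
PAM1_TEXT = """\
A R N D C Q E G H I L K M F P S T W Y V
A 9867 2 9 10 3 8 17 21 2 6 4 2 6 2 22 35 32 0 2 18
R 1 9913 1 0 1 10 0 0 10 3 1 19 4 1 4 6 1 8 0 1
N 4 1 9822 36 0 4 6 6 21 3 1 13 0 1 2 20 9 1 4 1
D 6 0 42 9859 0 6 53 6 4 1 0 3 0 0 1 5 3 0 0 1
C 1 1 0 0 9973 0 0 0 1 1 0 0 0 0 1 5 1 0 3 2
Q 3 9 4 5 0 9876 27 1 23 1 3 6 4 0 6 2 2 0 0 1
E 10 0 7 56 0 35 9865 4 2 3 1 4 1 0 3 4 2 0 1 2
G 21 1 12 11 1 3 7 9935 1 0 1 2 1 1 3 21 3 0 0 5
H 1 8 18 3 1 20 1 0 9912 0 1 1 0 2 3 1 1 1 4 1
I 2 2 3 1 2 1 2 0 0 9872 9 2 12 7 0 1 7 0 1 33
L 3 1 3 0 0 6 1 1 4 22 9947 2 45 13 3 1 3 4 2 15
K 2 37 25 6 0 12 7 2 2 4 1 9926 20 0 3 8 11 0 1 1
M 1 1 0 0 0 2 0 0 0 5 8 4 9874 1 0 1 2 0 0 4
F 1 1 1 0 0 0 0 1 2 8 6 0 4 9946 0 2 1 3 28 0
P 13 5 2 1 1 8 3 2 5 1 2 2 1 1 9926 12 4 0 0 2
S 28 11 34 7 11 4 6 16 2 2 1 7 4 3 17 9840 38 5 2 2
T 22 2 13 4 1 3 2 2 1 11 2 8 6 1 5 32 9871 0 2 9
W 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 9976 1 0
Y 1 0 3 0 3 0 1 0 4 1 1 0 0 21 0 1 1 2 9945 1
V 13 2 1 1 3 2 2 3 3 57 11 1 17 1 3 2 10 0 2 9901
"""

lines = PAM1_TEXT.strip().splitlines()
AA = lines[0].split()
N = len(AA)
counts = [[int(v) for v in line.split()[1:]] for line in lines[1:]]

# Normalise every column to sum to exactly 1 (column = original residue).
col_sums = [sum(counts[i][j] for i in range(N)) for j in range(N)]
pam1 = [[counts[i][j] / col_sums[j] for j in range(N)] for i in range(N)]


def matmul(a, b):
    """Multiply two N x N matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def matrix_power(matrix, exponent):
    """Raise a square matrix to a non-negative integer power by squaring."""
    result = [[float(i == j) for j in range(N)] for i in range(N)]
    base = matrix
    while exponent:
        if exponent & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        exponent >>= 1
    return result


pam250 = matrix_power(pam1, 250)

# Print a few entries in percent: replacement (row) given original (column).
for original, replacement in [("A", "A"), ("W", "W"), ("R", "Q"), ("I", "T")]:
    value = pam250[AA.index(replacement)][AA.index(original)]
    print(f"{original} -> {replacement}: {100 * value:.1f}%")
```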

PSSM

A human-readable (ASCII) version of the PSSM can be written by PSI-BLAST (blastpgp) with the -Q option:
blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e10E-6_i5.blast' -h 10E-6 -j 5 -C './reference_i5_e10E-6.chk' -Q reference_i5_e10E-6.txt
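Once the ASCII file has been written, the scores for a given mutation can be extracted with a short parser. The following is a sketch under assumptions: it expects each matrix row to start with the position and the query residue, followed by 20 integer scores in the column order A R N D C Q E G H I L K M F P S T W Y V, and it assumes that the numbering of the query sequence matches the numbering of our mutations. The sample row is made up for illustration and is not taken from our PSSM; the function names are hypothetical.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # assumed column order of the ASCII PSSM


def parse_pssm(lines):
    """Map position -> (query residue, {amino acid: score}) from PSSM text lines."""
    pssm = {}
    for line in lines:
        fields = line.split()
        # Matrix rows start with an integer position and a one-letter residue.
        if len(fields) >= 22 and fields[0].isdigit() and len(fields[1]) == 1:
            scores = [int(value) for value in fields[2:22]]
            pssm[int(fields[0])] = (fields[1], dict(zip(AA_ORDER, scores)))
    return pssm


def mutation_scores(pssm, mutation):
    """Return (wildtype score, mutant score) for a label such as 'I65T'."""
    wildtype, pos, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    residue, row = pssm[pos]
    if residue != wildtype:
        raise ValueError(f"position {pos} holds {residue}, expected {wildtype}")
    return row[wildtype], row[mutant]


# Made-up sample: a header line plus one invented row (NOT real PSSM data).
SAMPLE = [
    "           A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V",
    "   65 I   -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3",
]
print(mutation_scores(parse_pssm(SAMPLE), "I65T"))

# With the real file, something like this should work:
# with open("reference_i5_e10E-6.txt") as handle:
#     pssm = parse_pssm(handle)
```

The wildtype check guards against an offset between the query sequence and the mutation numbering.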