Difference between revisions of "Sequence-based mutation analysis BCKDHA"

From Bioinformatikpedia

Revision as of 20:09, 26 June 2011

General

We chose the following mutations for the sequence-based mutation analysis:

  • G29E
  • M82L
  • Q125E
  • Y166N
  • G249S
  • C264W
  • R265W
  • I326T
  • F409C
  • Y438N

The mutation positions are relative to the UniProt reference sequence.

A protocol describing all steps for running the programs was created.

Amino Acid Properties

Position | Reference residue | Reference properties | Reference structure | Mutated residue | Mutated properties | Mutated structure | Structural difference | Secondary structure
29 | G | tiny, small | - | E | charged, polar | - | - | C
82 | M | sulphur containing, hydrophobic | BCKDHA M82L M.png | L | hydrophobic, aliphatic | BCKDHA M82L L.png | BCKDHA M82L.png | C
125 | Q | polar | BCKDHAQ80E Q.png | E | charged, polar | BCKDHAQ80E E.png | BCKDHAQ80E.png | C
166 | Y | hydrophobic, aromatic, polar | BCKDHA Y121N Y.png | N | polar, small | BCKDHA Y121N N.png | BCKDHA Y121N.png | H
249 | G | tiny, small | BCKDHA G204S G.png | S | polar, small, tiny, hydroxylic | BCKDHA G204S S.png | BCKDHA G204S.png | H
264 | C | sulphur containing, hydrophobic, tiny, small, polar | BCKDHA C219W C.png | W | hydrophobic, aromatic, polar | BCKDHA C219W W.png | BCKDHA C219W.png | E
265 | R | charged, positive (basic), polar | BCKDHA R220W R.png | W | hydrophobic, aromatic, polar | BCKDHA R220W W.png | BCKDHA R220W.png | E
326 | I | aliphatic, hydrophobic | BCKDHA I281T I.png | T | hydroxylic, hydrophobic, small, polar | BCKDHA I281T T.png | BCKDHA I281T.png | E
409 | F | aromatic, hydrophobic | BCKDHA F364C F.png | C | sulphur containing, hydrophobic, tiny, small, polar | BCKDHA F364C C.png | BCKDHA F364C.png | C
438 | Y | hydrophobic, aromatic, polar | BCKDHA Y393N Y.png | N | polar, small | BCKDHA Y393N N.png | BCKDHA Y393N.png | C

Annotation: H = helix, E = beta-sheet, C = coil

To visualize the mutations in the three-dimensional protein structure, the PDB entry 1U5B for BCKDHA was used. As the PDB file only contains coordinates for the protein itself, the signal peptide (the first 45 amino acids) is not included. Therefore the first mutation at position 29, which lies in the signal peptide, could not be visualized.

The protocol describes in detail how we used PyMOL to visualize our mutations.

Discussion

Looking at the differences in structure and biochemical properties between the reference amino acid and the mutated amino acid, one might draw conclusions about the effect of the mutation on the protein's function.

G29E
This mutation lies in the signal peptide of the protein. Therefore the mutation has no direct effect on the function of the protein. However, the biochemical properties of the two residues are quite different. Glycine in particular, due to its uniquely small size, plays a special role in most protein sequences, and replacing it might disrupt the protein structure. From this point of view the mutation might affect the protein's function.
M82L
Methionine and leucine are both hydrophobic amino acids of similar size. The loss of the sulphur-containing methionine could have a disease-causing effect, but as the structures are quite similar this mutation might be tolerated. Furthermore the residue does not contribute to the formation of any important secondary structure element, so a substitution here is even more likely to be tolerated.
Q125E
The structures of these two amino acids are almost identical. The only difference is that the amino group of the side-chain amide is replaced by a hydroxy group. This changes the residue from uncharged to negatively charged, which might influence the protein's function.
Y166N
This substitution replaces a hydrophobic aromatic residue with a small polar amino acid. As the residue is located in a helix, these differences in the amino acids' properties are likely to change the protein's structure, and therefore this mutation is very likely to affect the protein's function.
G249S
This mutation introduces a polar, hydroxylic amino acid at a position in a helix where usually a tiny, nonpolar amino acid is located. As both amino acids are quite small, the mutation might be benign.
C264W
This mutation has a huge impact on the structure and physicochemical properties of the amino acid at this position. A small, sulphur-containing amino acid is replaced by an aromatic amino acid, which occupies a lot more space. The hydrophobicity and polarity remain the same; nevertheless the amino acids are very different and this mutation is expected to destroy the protein's function.
R265W
Here a positively charged amino acid is substituted by a hydrophobic one. This change is quite severe, therefore we assume the protein's function cannot be maintained.
I326T
This mutation introduces a hydroxy group to the amino acid, which makes it polar. This is quite an important change which might have an influence on the function of the protein.
F409C
This substitution exchanges amino acids that are totally different in structure and physicochemical properties. A bulky aromatic residue is substituted by a small, polar, sulphur-containing amino acid. These drastic changes are very likely to change the protein's structure and therefore affect its function.
Y438N
This mutation also changes the amino acid completely. The big, hydrophobic aromatic residue is substituted by a small polar one. These differences are likely to affect the protein's function.
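
The property comparison used in this discussion can be sketched as a few set operations. This is only an illustration: the property sets are copied from the Amino Acid Properties table for a handful of residues, and the names `PROPERTIES` and `compare` are hypothetical helpers, not part of any tool used on this page.

```python
# Property sets copied from the "Amino Acid Properties" table
# (only a few residues are listed; extend as needed).
PROPERTIES = {
    "C": {"sulphur containing", "hydrophobic", "tiny", "small", "polar"},
    "W": {"hydrophobic", "aromatic", "polar"},
    "F": {"aromatic", "hydrophobic"},
    "I": {"aliphatic", "hydrophobic"},
    "T": {"hydroxylic", "hydrophobic", "small", "polar"},
}


def compare(ref, mut):
    """Return (shared, lost, gained) property sets for the exchange ref -> mut."""
    a, b = PROPERTIES[ref], PROPERTIES[mut]
    return a & b, a - b, b - a


for ref, mut in [("C", "W"), ("F", "C"), ("I", "T")]:
    shared, lost, gained = compare(ref, mut)
    print(f"{ref}->{mut}: shared={sorted(shared)} "
          f"lost={sorted(lost)} gained={sorted(gained)}")
```

For C264W the shared set contains hydrophobicity and polarity, and for F409C only hydrophobicity, which matches the statements made in the text.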

Substitution matrices

Position | AA1/AA2 | BLOSUM62 score | BLOSUM62 worst | PAM1 score | PAM1 worst | PAM250 score | PAM250 worst | Result
29 | G/E | -2 | -4 (I, L) | 7 | 0 (I, W, Y) | 9 | 2 (W) | BLOSUM62 indicates that the mutation is not very likely, whereas PAM1 and PAM250 indicate that it is not unusual.
82 | M/L | 2 | -3 (D, G) | 8 | 0 (N, D, C, E, G, H, P, W, Y) | 3 | 0 (C) | All three values are positive and quite high relative to the other values for methionine, so all three matrices indicate that this mutation occurs quite often.
125 | Q/E | 2 | -3 (C, F, I) | 27 | 0 (F, W, Y) | 7 | 1 (C, F, W) | All three substitution matrices show that this mutation occurs quite often.
166 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) | Since the values of the three matrices are low, the mutation does not occur very often and is therefore not very probable.
249 | G/S | 0 | -4 (I, L) | 21 | 0 (I, W, Y) | 11 | 2 (W) | All three values are high, which indicates that this mutation is quite common and therefore probably not very damaging.
264 | C/W | -2 | -4 (E) | 0 | 0 (N, D, Q, E, G, L, K, M, F, W) | 1 | 1 (R, N, D, Q, E, L, K, M, F, W) | The scores are all low. This reflects that the mutation is rare, so it is very likely to influence the function of the protein.
265 | R/W | -3 | -3 (W, V, F, I, C) | 8 | 0 (D, E, G, Y) | 7 | 1 (F) | PAM1 and PAM250 have high values whereas BLOSUM62 has a low value for this substitution. So BLOSUM62 indicates that this mutation is rare and probably damaging, while PAM1 and PAM250 indicate that it is quite common and so not very damaging.
326 | I/T | -1 | -4 (G) | 7 | 0 (G, A, P, W) | 4 | 1 (W) | Again the three matrices give different results. Whereas BLOSUM62 indicates that the mutation is rare, PAM1 and PAM250 indicate that the mutation has no negative influence on the protein.
409 | F/C | -2 | -4 (P) | 0 | 0 (D, C, Q, E, K, P, V) | 1 | 1 (R, D, C, Q, E, G, K, P) | The three values are all very low, which means that this mutation is very rare. This indicates that the mutation affects the function and the structure of the protein.
438 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) | Again the scores are all low, which indicates a damaging effect of the mutation.
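
How the BLOSUM62 column is turned into a neutral/non-neutral call can be sketched as follows. The scores are copied from the table. The cutoff (a score above 0 counts as neutral) is an assumption inferred from the BLOSUM62 column of the comparison table later on this page; the authors do not state it as a rule.

```python
# BLOSUM62 scores copied from the table above (mutation -> score).
BLOSUM62_SCORES = {
    "G29E": -2, "M82L": 2, "Q125E": 2, "Y166N": -2, "G249S": 0,
    "C264W": -2, "R265W": -3, "I326T": -1, "F409C": -2, "Y438N": -2,
}


def classify(score, cutoff=0):
    """Assumed rule: a score above the cutoff is read as 'neutral'."""
    return "neutral" if score > cutoff else "non-neutral"


calls = {mutation: classify(score) for mutation, score in BLOSUM62_SCORES.items()}
for mutation, call in calls.items():
    print(mutation, BLOSUM62_SCORES[mutation], call)
```

With this cutoff G249S (score 0) falls on the non-neutral side, which is the borderline case discussed in the comparison section.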


BLOSUM62
PAM1
PAM250

PSSM


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 G    1  0  0 -2  1  0 -2  4  1 -1 -3 -1 -3 -2 -3  1 -1 -4 -1 -1   10   6   5   1   3   4   1  37   3   5   2   3   0   2   0   9   2   0   2   6  0.37 0.77
  82 M   -4 -5 -6 -6 -3 -5 -5 -6 -5  2  5 -5  5 -2 -5 -5 -3 -3 -3  1    0   0   0   0   0   0   0   0   0  12  67   0  14   0   0   0   0   0   1   6  1.23 1.27
 125 Q   -1 -1 -3 -3 -5  8  0 -4  0 -3 -4 -1 -1 -6 -4 -1  1 -5 -4 -4    4   2   0   0   0  74   2   0   1   1   1   2   1   0   0   2   9   0   0   0  1.46 1.28
 166 Y    3 -3 -4 -4  3  0 -3 -4  1 -2 -2 -3  0 -2 -4 -1  1  7  3  1   24   1   0   0   7   5   1   1   3   1   3   1   2   1   0   3   8  15  12  11  0.62 1.29
 249 G    5 -4 -3 -4 -4 -3 -2  4 -4 -5 -5 -3 -4 -5 -4  1 -2 -5 -5 -4   54   0   0   0   0   0   3  35   0   0   0   0   0   0   0   8   1   0   0   0  1.12 1.21
 264 C   -2 -5 -3 -5  9 -5 -5 -5 -5  3 -2 -5 -3 -4 -1 -3 -3 -5 -4  4    1   0   2   0  45   0   0   0   0  15   2   0   0   0   4   1   1   0   0  29  1.43 1.18
 265 R   -3  4  2 -3 -5  5  2 -4  0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2  0    0  25  12   0   0  34  15   0   1   2   0   1   0   0   0   1   1   0   1   7  0.88 1.21
 326 I   -3 -5 -6 -6 -4 -5 -6 -6 -6  7  0 -5  0 -2 -5 -5 -3 -5 -4  4    0   0   0   0   0   0   0   0   0  66   6   0   1   1   0   0   0   0   0  26  1.40 1.17
 409 F   -4 -3 -6 -6 -3 -5 -5 -6 -3  0  1 -5  1  8 -6 -5 -3  0  1 -1    1   1   0   0   1   0   0   0   0   5  11   0   3  69   0   0   1   1   3   4  1.56 1.31
 438 Y    0 -2 -2 -4 -2 -1 -1 -3  3 -3 -3 -3 -3  1 -5 -3 -3  3  8 -2    9   2   1   0   1   3   3   1   6   0   1   0   0   1   0   0   0   3  66   2  1.34 0.89

The values in the PSSM reflect the degree of conservation in a multiple alignment. The higher the value, the more often the corresponding amino acid is observed at that position, so a substitution to it is usually tolerated, as both alleles have been passed on successfully. The PSSM values for our mutations have been colored orange. For most of the mutations the PSSM score is negative, so the substituted amino acid is not conserved at that position and the substitution is not likely to be tolerated.

The Q125E mutation has a score of 0.

The G249S substitution has a score of +1, which is quite good. This indicates a low degree of conservation and therefore this mutation might be tolerated in nature.

The highest score among our substitutions is +5 for the M82L mutation. According to the PSSM this substitution is conserved and likely to be tolerated. This can be explained by the similar size and hydrophobic nature of both amino acids.
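
Reading the score of each mutation out of the PSSM can be sketched as below. The rows contain only the first 20 columns (the log-odds scores) of the matrix above; `parse_pssm` and `mutation_score` are hypothetical helper names, and the parser assumes the whitespace-separated layout shown on this page.

```python
# Column order of the score block in the PSSM above.
COLUMNS = "ARNDCQEGHILKMFPSTWYV"

# Position, wild-type residue and the 20 score columns, copied from the PSSM.
PSSM_ROWS = """
29 G 1 0 0 -2 1 0 -2 4 1 -1 -3 -1 -3 -2 -3 1 -1 -4 -1 -1
82 M -4 -5 -6 -6 -3 -5 -5 -6 -5 2 5 -5 5 -2 -5 -5 -3 -3 -3 1
125 Q -1 -1 -3 -3 -5 8 0 -4 0 -3 -4 -1 -1 -6 -4 -1 1 -5 -4 -4
166 Y 3 -3 -4 -4 3 0 -3 -4 1 -2 -2 -3 0 -2 -4 -1 1 7 3 1
249 G 5 -4 -3 -4 -4 -3 -2 4 -4 -5 -5 -3 -4 -5 -4 1 -2 -5 -5 -4
264 C -2 -5 -3 -5 9 -5 -5 -5 -5 3 -2 -5 -3 -4 -1 -3 -3 -5 -4 4
265 R -3 4 2 -3 -5 5 2 -4 0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2 0
326 I -3 -5 -6 -6 -4 -5 -6 -6 -6 7 0 -5 0 -2 -5 -5 -3 -5 -4 4
409 F -4 -3 -6 -6 -3 -5 -5 -6 -3 0 1 -5 1 8 -6 -5 -3 0 1 -1
438 Y 0 -2 -2 -4 -2 -1 -1 -3 3 -3 -3 -3 -3 1 -5 -3 -3 3 8 -2
"""

MUTATIONS = ["G29E", "M82L", "Q125E", "Y166N", "G249S",
             "C264W", "R265W", "I326T", "F409C", "Y438N"]


def parse_pssm(text):
    """Map position -> (wild-type residue, {amino acid: score})."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        position, wild_type = int(fields[0]), fields[1]
        scores = [int(value) for value in fields[2:22]]
        table[position] = (wild_type, dict(zip(COLUMNS, scores)))
    return table


def mutation_score(table, mutation):
    """Look up the PSSM score of a mutation written like 'G249S'."""
    ref, position, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    wild_type, scores = table[position]
    if wild_type != ref:
        raise ValueError(f"{mutation}: PSSM has {wild_type} at {position}")
    return scores[alt]


pssm = parse_pssm(PSSM_ROWS)
scores = {m: mutation_score(pssm, m) for m in MUTATIONS}
print(scores)
```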

Secondary Structure

To find out whether the mutations influence the secondary structure of the protein, we compared the predicted secondary structure of the sequence without mutations with that of the sequence including the mutations. To predict the secondary structure of the two sequences we used PSIPRED.

We compared the structure for each position:

Position 29
seq:     SQAALLLLRQPGARGLARSHPPRQQQQFSSLDDK
non-mut: HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
mut:     HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
Position 82
seq:     VISGIPIYRVMDRQGQIINPSEDPHLPKEKV
non-mut: CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
mut:     CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
Position 125
seq:     KEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYG
non-mut: HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
mut:     HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
Position 166
seq:     EAGVLMYRDYPLELFMAQCYG
non-mut: HHHHHHHCCCCHHHHHHHHCC
mut:     CHHHHHHCCCCHHHHHHHHCC
Position 249
seq:     VVICYFGEGAASEGDAHAGFNFAATLECP
non-mut: EEEEEECCCCCCHHHHHHHHHHHHHHCCC
mut:     EEEEEECCCCCCCHHHHHHHHHHHHCCCC
Position 264
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 265
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 326
seq:     RAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
non-mut: HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
mut:     HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
Position 409
seq:     KPKPNPNLLFSDVYQEMPAQL
non-mut: CCCCCHHHHHHHHHCCCCHHH
mut:     CCCCCHHHHHHHHHCCCCHHH
Position 438
seq:     QEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
non-mut: CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
mut:     CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC

Comparing the predicted secondary structure at all positions shows that most of the mutations have no influence on the secondary structure, since there are no changes at the position of the mutation or in its neighbourhood. Only at positions 166, 249, 264 and 265 do the mutations have an influence. The mutation at position 166 affects the secondary structure six residues earlier: the helix, which normally starts at position 160, now starts at position 161. With the mutation at position 249 the surrounding helix becomes shorter, because it starts one residue later and ends one residue earlier than without the mutation. Because the mutations at positions 264 and 265 are next to each other, it is not clear which of them is responsible for the change in the secondary structure, or whether it is the combination of the two. Nevertheless there is a change in the neighbourhood of these mutations: four or five residues after the mutation a beta sheet occurs which is not present in the wildtype prediction. Additionally the helix which should start 19 or 20 residues after the mutation starts one position earlier.
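
The position-by-position comparison can be sketched with a small helper that lists where two secondary structure strings disagree. The function name `ss_differences` is hypothetical, and the numbering of the example window (first residue at position 160) follows the description in the text.

```python
def ss_differences(wild_type, mutant, first_position=1):
    """List (position, wild-type state, mutant state) for every residue
    where two equally long secondary structure strings (H/E/C) disagree."""
    if len(wild_type) != len(mutant):
        raise ValueError("secondary structure strings must have equal length")
    return [
        (first_position + i, wt, mt)
        for i, (wt, mt) in enumerate(zip(wild_type, mutant))
        if wt != mt
    ]


# Window around position 166, copied from the comparison above.
non_mut = "HHHHHHHCCCCHHHHHHHHCC"
mut = "CHHHHHHCCCCHHHHHHHHCC"
print(ss_differences(non_mut, mut, first_position=160))
```

An empty list means that the prediction did not change within the window, as is the case for positions 29, 82, 125, 326, 409 and 438.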

Multiple Sequence Alignment

To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.

ID Accession Entry name
sp P11178 ODBA_BOVIN
sp P12694 ODBA_HUMAN
sp Q8HXY4 ODBA_MACFA
sp P50136 ODBA_MOUSE
sp A5A6H9 ODBA_PANTR
sp P11960 ODBA_RAT
tr Q6ZSA3 Q6ZSA3_HUMAN
tr E7ESE6 E7ESE6_HUMAN
tr B2R8A9 B2R8A9_HUMAN
tr Q658P7 Q658P7_HUMAN
tr E7EW46 E7EW46_HUMAN
tr B4DP47 B4DP47_HUMAN
tr Q59EI3 Q59EI3_HUMAN
tr F1N5F2 F1N5F2_BOVIN
tr B1PK12 B1PK12_PIG
tr E2RPW4 E2RPW4_CANFA
tr B2LSM3 B2LSM3_SHEEP
tr F1RHA0 F1RHA0_PIG
tr F1PI86 F1PI86_CANFA
tr D2HMT3 D2HMT3_AILME
tr Q2TBT9 Q2TBT9_BOVIN
tr Q3U3J1 Q3U3J1_MOUSE
tr Q99L69 Q99L69_MOUSE
tr Q5EB89 Q5EB89_RAT
tr B1WBN3 B1WBN3_RAT
Multiple Alignment of the homologous sequences of BCKDHA with CLUSTALW

With these 25 results we built a multiple alignment using CLUSTALW. The alignment of all mammalian homologues was quite bad because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the others. So we removed these two sequences and realigned the remaining ones.

With this new multiple alignment we could analyze the 10 positions of our mutations to find out how well they are conserved.

position conservation wildtype conservation mutant
29 0.7 0
82 0.96 0
125 0.96 0.04
166 1 0
249 1 0
264 1 0
265 1 0
326 0.91 0
409 0.91 0
438 0.91 0

The results show that the wildtype amino acids at all observed positions are very well conserved, since the value is always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%. This is not as high as the other values, but the position is still well conserved. Regions of a protein which are well conserved are probably very important for its structure and function. Because all amino acids are very well conserved, mutations at these positions can be very damaging and can have a huge impact on the protein and its function.
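
The conservation values can be thought of as the fraction of aligned sequences that carry a given residue in one alignment column. The sketch below uses a made-up toy alignment, not the real CLUSTALW output, and it assumes that gaps count towards the number of sequences; the page does not say how gaps were handled.

```python
def column_at(alignment, index):
    """Extract one column (0-based index) from equally long aligned sequences."""
    return [sequence[index] for sequence in alignment]


def conservation(column, residue):
    """Fraction of sequences in an alignment column that carry `residue`."""
    if not column:
        raise ValueError("empty alignment column")
    return sum(1 for aa in column if aa == residue) / len(column)


# Hypothetical toy alignment for illustration only.
toy_alignment = [
    "MGQE",
    "MGQE",
    "MGEE",
    "MAQE",
]
column = column_at(toy_alignment, 2)
print(conservation(column, "Q"))  # wild-type residue
print(conservation(column, "E"))  # mutant residue
```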

SNAP

To run SNAP we used the command:

snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out

nsSNP Prediction Reliability Index Expected Accuracy
G29E Neutral 0 53%
M82E Non-neutral 4 82%
Q125E Non-neutral 1 63%
Y166N Non-neutral 2 70%
G249S Neutral 1 60%
C264W Non-neutral 4 82%
R265W Non-neutral 4 82%
I326T Non-neutral 3 78%
F409C Non-neutral 4 82%
Y438N Non-neutral 4 82%

The output of SNAP shows that most of the mutations are predicted to have a damaging effect on the structure and function of the protein. Only the mutations at positions 29 and 249 are predicted to have no influence on the protein.


A second SNAP run was performed in which each of the ten chosen positions was mutated to all possible amino acids. This run should show whether the wildtype amino acid is essential at the corresponding position of the sequence, or whether the mutation is not tolerated because an unwanted effect was introduced by drastically changing the physicochemical properties of the amino acid.
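
The list of substitutions for such a run can be generated programmatically. The sketch below only builds the mutation names. That one name per line in the form `G29A` is what the SNAP mutation file expects is an assumption here, since the exact contents of `mutations.txt` are not shown on this page.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Wild-type residues at the ten chosen positions (from the mutation list above).
WILD_TYPE = {29: "G", 82: "M", 125: "Q", 166: "Y", 249: "G",
             264: "C", 265: "R", 326: "I", 409: "F", 438: "Y"}


def all_substitutions(wild_type):
    """Return every single substitution (e.g. 'G29A') for the given positions,
    skipping the wild-type residue itself."""
    mutations = []
    for position in sorted(wild_type):
        ref = wild_type[position]
        for aa in AMINO_ACIDS:
            if aa != ref:
                mutations.append(f"{ref}{position}{aa}")
    return mutations


mutations = all_substitutions(WILD_TYPE)
print(len(mutations))
print("\n".join(mutations[:3]))
```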

The following table shows to what extent each position is predicted to tolerate mutations.

Mutation | Tolerated Substitutions | Non-tolerated Substitutions | Ratio of tolerated Mutations
G29E | ARNDCQEGHILKMFPSTWYV | - | 100%
M82E | ILM | ARNDCQEGHKFPSTWYV | 15%
Q125E | Q | ARNDCEGHILKMFPSTWYV | 5%
Y166N | QHMFWY | ARNDCEGILKPSTV | 30%
G249S | AGS | RNDCQEHILKMFPTWYV | 15%
C264W | C | ARNDQEGHILKMFPSTWYV | 5%
R265W | R | ANDCQEGHILKMFPSTWYV | 5%
I326T | ILV | ARNDCQEGHKMFPSTWY | 15%
F409C | F | ARNDCQEGHILKMPSTWYV | 5%
Y438N | Y | ARNDCQEGHILKMFPSTWV | 5%

This table shows that only the position 29 is not essential for the protein's functions. This is explainable by the fact that this mutation lies not within the actual protein sequence.

Apart from position 29, the position which allows the most substitutions is the tyrosine at position 166. This tyrosine forms the end of a helix on the surface of the protein, which might explain the variability of this position. The selected mutation to asparagine is not tolerated due to its quite different structure.

Position 82 tolerates three amino acids, which are structurally all very similar. So for this position the structure seems to be important. The same is true for position 249, where alanine, glycine and serine are predicted to be neutral to the protein function. All of these amino acids are quite small and may therefore not disturb the protein structure and function.

In position 326 all three branched-chain amino acids are predicted to have a neutral effect on the protein's function. These amino acids are structurally and biochemically very similar and therefore a substitution is tolerated. The mutation to threonine however is not tolerated as this amino acid has different properties.

All other mutations are not tolerated at all. This means for these positions the wild-type residues are totally essential for the protein's structure and function and cannot be replaced by any other amino acid.
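
The ratio column of the table can be reproduced from the tolerated residues alone. In this sketch the ratio is taken over all 20 amino acids including the wild type, which is an assumption that happens to reproduce the percentages in the table; the helper names are hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Tolerated residues per position, copied from the table above.
TOLERATED = {
    29: "ARNDCQEGHILKMFPSTWYV", 82: "ILM", 125: "Q", 166: "QHMFWY",
    249: "AGS", 264: "C", 265: "R", 326: "ILV", 409: "F", 438: "Y",
}


def tolerated_ratio(tolerated):
    """Fraction of the 20 amino acids predicted as tolerated at one position."""
    return len(set(tolerated)) / len(AMINO_ACIDS)


def not_tolerated(tolerated):
    """Complement of the tolerated residues, in the usual column order."""
    return "".join(aa for aa in AMINO_ACIDS if aa not in tolerated)


for position, residues in TOLERATED.items():
    print(position, f"{tolerated_ratio(residues):.0%}",
          not_tolerated(residues) or "-")
```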

SIFT

The following table displays the SIFT results. The threshold for intolerance is 0.05.

The amino acids are colored in the following way:

  • nonpolar
  • uncharged polar
  • basic
  • acidic

Capital letters: amino acids appear in the alignment

Lower case letters: amino acids result from prediction

Seq Rep: fraction of sequences that contain one of the basic amino acids

Position | Reference AA | Mutated AA | Predict Not Tolerated | Seq Rep | Predict Tolerated | SIFT Matrix (legend: BCKDHA aa.PNG) | Prediction
29 | G | E | - | 0.37 | wcmPdIGnqRhVTkeSFLAy | BCKDHA 29.PNG | tolerated
82 | M | L | dhgnweyrkspqafvi | 0.91 | TCLM | BCKDHA 82.PNG | tolerated
125 | Q | E | ywvtsrpnmlkihgfedca | 0.98 | Q | BCKDHA 125.PNG | not tolerated
166 | Y | N | cpdmeqkngrtisval | 1.00 | FHYW | BCKDHA 166.PNG | not tolerated
249 | G | S | whyfimrqnlckdvtps | 1.00 | EGA | BCKDHA 249.PNG | not tolerated
264 | C | W | ywvtsrqpnmlkihgfeda | 0.98 | C | BCKDHA 264.PNG | not tolerated
265 | R | W | ywvtsqpnmlkihgfedca | 1.00 | R | BCKDHA 265.PNG | not tolerated
326 | I | T | hdwpneqcrsgkytaM | 1.00 | FLVI | BCKDHA 326.PNG | not tolerated
409 | F | C | hndkrqgecpstamvwiy | 1.00 | LF | BCKDHA 409.PNG | not tolerated
438 | Y | N | wvtsrqpnmlkihgfedca | 1.00 | Y | BCKDHA 438.PNG | not tolerated

The only substitutions SIFT predicts not to affect protein function are G29E and M82L. The first substitution may be tolerated because this position is not within the actual protein sequence. The second tolerated exchange is from methionine to leucine. These two amino acids are quite similar in structure and physicochemical properties, so an exchange can be tolerated.

Polyphen2

Position | AA1/AA2 | HumDiv prediction | HumDiv score | HumDiv sensitivity | HumDiv specificity | HumVar prediction | HumVar score | HumVar sensitivity | HumVar specificity
29 | G/E | benign | 0.025 | 0.96 | 0.80 | benign | 0.018 | 0.96 | 0.52
82 | M/L | benign | 0.001 | 0.99 | 0.15 | benign | 0.001 | 0.99 | 0.08
125 | Q/E | possibly damaging | 0.759 | 0.85 | 0.93 | benign | 0.285 | 0.87 | 0.75
166 | Y/N | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.964 | 0.59 | 0.93
249 | G/S | benign | 0.145 | 0.93 | 0.86 | benign | 0.292 | 0.86 | 0.75
264 | C/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00
265 | R/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00
326 | I/T | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.998 | 0.16 | 0.99
409 | F/C | probably damaging | 0.998 | 0.27 | 0.99 | probably damaging | 0.939 | 0.64 | 0.92
438 | Y/N | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.987 | 0.49 | 0.96

PolyPhen2 uses two different datasets for the prediction. As the results show, the two predictions are not always the same. The prediction with the HumDiv dataset says that there are three mutations that probably have no grave effect on the function or structure of the protein, whereas according to HumVar there are four mutations that would have no damaging influence. The three mutations marked as "benign" with both datasets are at positions 29, 82 and 249. The mutation predicted as benign only with the HumVar dataset is at position 125.
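
Which positions are called benign by which dataset can be checked with two sets. The predictions below are copied from the PolyPhen2 table; the variable names are only illustrative.

```python
# (HumDiv prediction, HumVar prediction) per position, copied from the table.
POLYPHEN2 = {
    29: ("benign", "benign"),
    82: ("benign", "benign"),
    125: ("possibly damaging", "benign"),
    166: ("probably damaging", "probably damaging"),
    249: ("benign", "benign"),
    264: ("probably damaging", "probably damaging"),
    265: ("probably damaging", "probably damaging"),
    326: ("probably damaging", "probably damaging"),
    409: ("probably damaging", "probably damaging"),
    438: ("probably damaging", "probably damaging"),
}

humdiv_benign = {pos for pos, (humdiv, _) in POLYPHEN2.items() if humdiv == "benign"}
humvar_benign = {pos for pos, (_, humvar) in POLYPHEN2.items() if humvar == "benign"}

print("benign in both:", sorted(humdiv_benign & humvar_benign))
print("benign only in HumVar:", sorted(humvar_benign - humdiv_benign))
```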

Comparison

Comparison of the predicted results of this TASK

Position | AA1/AA2 | BLOSUM62 | PAM1 | PAM250 | PSSM | Secondary Structure (prediction) | Multiple Alignment (conservation wildtype) | SNAP (prediction) | SIFT (prediction) | PolyPhen2 HumDiv | PolyPhen2 HumVar
29 | G/E | non-neutral | neutral | neutral | non-neutral | neutral | neutral | neutral | neutral | neutral | neutral
82 | M/L | neutral | neutral | neutral | neutral | neutral | non-neutral | non-neutral | neutral | neutral | neutral
125 | Q/E | neutral | neutral | neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral
166 | Y/N | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
249 | G/S | non-neutral | neutral | neutral | neutral | non-neutral | non-neutral | neutral | non-neutral | neutral | neutral
264 | C/W | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
265 | R/W | non-neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
326 | I/T | non-neutral | neutral | neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
409 | F/C | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
438 | Y/N | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
G29E
At position 29 glycine is mutated to glutamic acid. Except for BLOSUM62 and PSSM, all tools and sources indicate that this mutation is neutral. It is interesting that these two sources have quite low scores, which indicates that they are sure about their prediction. But the fact that the other matrices have very high values and that this position, with only 72%, is not very well conserved shows that this amino acid changes quite often. Because of this it is possible that the mutation is neutral. The secondary structure prediction supports this assumption, because there is no difference between the original and the mutated secondary structure. Additionally the residue lies in a coil region, which means it is not part of a structural element. It has to be mentioned that SNAP is not sure about its prediction of neutrality, as can be seen from its reliability index of 0, which is the lowest possible value. In contrast, SIFT and PolyPhen2 are sure about their predictions: SIFT has a quite high score (0.68) and PolyPhen2 has two really low scores (0.025 and 0.018), which shows that these predictions are confident. Since nearly all methods predict the mutation as neutral, and most of them with really good scores, we also predict the mutation as neutral.
M82L
Q125E
Based on the predictions of the different tools it is not possible to decide whether this mutation from glutamine to glutamic acid is neutral or not, because the tools give completely different results. Looking at the structure itself, there is a structural difference between the two variants, since it is not possible to superpose them perfectly. But the difference between the two structures seems to be minimal, which may explain the differing predictions. It is also important to note that most of the tools are not completely sure about their prediction. For example the PSSM value is 0, which is exactly on the border between neutral and non-neutral. SNAP is not very sure either, because the reliability index is only 1. In contrast, the prediction of SIFT is very confident, since the value for this mutation is 0.0, which is the lowest possible value. PolyPhen2 is also uncertain, given that the two datasets lead to two different predictions: one says neutral and the other non-neutral. Because of these many different results it can be assumed that, if this mutation is non-neutral, its influence is minimal. Since the experimental structure is different, we would say that this mutation is non-neutral, but, as already said, the influence is minimal and the difference is not that grave.
Y166N
The mutation from tyrosine to asparagine is predicted to be non-neutral by all tools and sources. The values of the substitution matrices and the PSSM score are all low, which shows that this mutation does not occur very often. The rareness of a change at this position is also shown by the conservation score, which is 1. This reflects that the amino acid at position 166 has an important role for the structure and function of the protein, so the mutation to another amino acid is not neutral. The secondary structure prediction also points out that the different amino acid leads to changes. PolyPhen2 in particular shows how certain this prediction is, since the scores are 0.997 (HumDiv) and 0.964 (HumVar), where 1.0 is the highest score. The prediction can be confirmed by looking at the experimental structure of the two different proteins. The mutant protein has no aromatic ring at this position while the original protein has one. Because of this structural difference the physicochemical properties of the two proteins are not the same.
G249S
The results of the tools and sources contradict each other again. While BLOSUM62 predicts that the mutation is non-neutral, PAM1, PAM250 and the PSSM score indicate that the mutation is neutral. But it has to be pointed out that the BLOSUM62 value is 0, which means that it is on the border to neutral. So judging by the substitution matrices and the PSSM score the mutation is neutral. The predicted secondary structure, however, shows differences, since the helix in the mutated protein is two positions shorter than the original one. This indicates that the mutation is non-neutral, and the fact that this position is 100% conserved also lets us assume that the mutation would have an influence on the structure and function of the protein. SNAP predicts the mutation to be neutral, but the reliability index of only 1 indicates that it is not sure. In contrast to SNAP, SIFT classifies the mutation as non-neutral and is very sure, because the value for serine is 0.0, which is the lowest value and mostly indicates that a mutation is non-neutral. The two predictions of PolyPhen2 declare this SNP as neutral. All in all it is not clear what the effect of this mutation is, and again it is possible that, if there is a change in the structure of the protein, it is not grave, since the different programs are that unsure. Looking at the experimental structure, it can be seen that the mutated protein has an additional side chain. Because of this we predict this mutation as non-neutral.
C264W
The classification of this mutation as non-neutral is clear, since all tools predict it as non-neutral. Based on the substitution matrices this prediction is certain, because the values of all three matrices are very low. The PSSM score is also one of the lowest possible values, which indicates that a change of this amino acid is very rare. The conservation score can be interpreted in the same way: it is 1, which means that a mutation at this position occurs very rarely, and it can be assumed that this amino acid has an important role in the structure and function of the protein. With this knowledge the mutation has to be non-neutral. The three tools SNAP, SIFT and PolyPhen2 are all absolutely sure about their prediction, because all of them give the best possible scores for the influence of the mutation. Looking at the experimental structure and the differences between the two proteins strengthens the assumption that the mutation is non-neutral, because the two proteins have a completely different structure and fold.
R265W
This mutation is predicted to be non-neutral by nearly every tool or source. Only PAM1 and PAM250 declare this mutation as neutral, with high scores of 8 and 7. This is curious, because all other tools and sources are absolutely sure that a change of this amino acid will have an effect on the protein. The PSSM score is -5, which is really low and shows that a mutation at this position is unlikely to be neutral, and the conservation score of 1 shows that this amino acid is very well conserved and a change would be fatal. The tools SNAP, SIFT and PolyPhen2 predict this mutation as definitely damaging. To be really sure we also consider the predicted and the experimental structure of the protein. The predicted structure clearly shows a change, because a beta sheet occurs although there is none in the original protein, and additionally the coil after the mutation is one residue shorter than in the wildtype. The experimental structure also shows that the proteins have a completely different structure and folding. Because of all these facts we would declare this mutation as non-neutral.
I326T
Most of the prediction methods see this mutation as being non-neutral. Only PAM1, PAM250 and the predicted secondary structure declare this mutation as neutral. It is interesting that the two substitution matrices have high values of 7 and 4, which shows that they are quite sure about their prediction. The predicted secondary structure supports this guess, because no change occurs. But all the other methods predict this mutation as being non-neutral. BLOSUM62 and PSSM for example have very low values, which indicates that this mutation is non-neutral. This position is also very well conserved, which suggests that a change at this position does not occur often and that a change would have a damaging influence on the protein. The three prediction methods also declare the SNP to be damaging with very high confidence. The structure also indicates that this mutation has an influence, since the protein which is expressed after the mutation lacks a side chain. Summarizing all the results, we also predict the mutation as being non-neutral.
F409C
Except for the predicted secondary structure, all methods declare the mutation from phenylalanine to cysteine as non-neutral. It is possible that the mutation has an effect at a very distant position, so that it is not detected by looking at the immediate neighbourhood of the mutated position. This can be an explanation for this prediction. All the other methods are sure about the prediction, which can be seen for example from the very low values in the substitution matrices and the PSSM score. The three prediction tools also have scores which point out that the mutation is clearly non-neutral. To safeguard this prediction we also look at the amino acids and their properties. This comparison supports the prediction, since the only shared property is the hydrophobicity. The wildtype amino acid is additionally aromatic, whereas the mutated amino acid is sulphur containing, tiny, small and polar. This shows that the two amino acids are nearly completely different. This is confirmed by looking at the structures and their differences. The three pictures demonstrate that the structures are different, since the mutated protein has no aromatic ring. Because of all these disparities we decide that the mutation is non-neutral.
Y438N
Position 438 behaves almost the same as position 409. The mutation of tyrosine to asparagine is predicted as non-neutral by nearly all methods. Only the predicted secondary structure declares the mutation neutral. One explanation for this result is that the mutation influences a distant position, so that the change cannot be seen in the secondary structure. This is quite plausible, since all other methods predict the mutation as non-neutral. The predictions of the substitution matrices and the PSSM are less confident than for the mutation at position 409, since the values are not as low, but they are still low enough to conclude that the mutation will influence the structure and function of the protein. SNAP and SIFT are as confident as for the previous mutation, and their scores make the result very reliable (SNAP: 4, SIFT: 0.0). PolyPhen2 is even more certain, because it predicts a damaging effect with scores of 1.0 (HumDiv) and 0.987 (HumVar). Again we also compare the physicochemical properties and the structures of the two amino acids. The only property shared by both amino acids is polarity. The other properties are completely different, since the mutated amino acid is acidic and small, whereas the original amino acid is hydrophobic and aromatic. The missing aromatic property of the mutated amino acid implies that the two structures must differ, because the mutated protein will not have an aromatic ring. Looking at the structure confirms this assumption. Since all the methods as well as the structure and the properties point to a non-neutral mutation, we also predict it as non-neutral.

Comparison of the tool predictions with dbSNP and HGMD

Position AA1/AA2 dbSNP/HGMD own prediction SNAP SIFT PolyPhen2 (HumDiv/HumVar)
29 G/E non-neutral/- neutral neutral neutral neutral/neutral
82 M/L non-neutral/- non-neutral neutral neutral/neutral
125 Q/E -/non-neutral non-neutral non-neutral non-neutral non-neutral/neutral
166 Y/N -/non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
249 G/S non-neutral/non-neutral non-neutral neutral non-neutral neutral/neutral
264 C/W non-neutral/- non-neutral non-neutral non-neutral non-neutral/non-neutral
265 R/W non-neutral/non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
326 I/T -/non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
409 F/C non-neutral/non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
438 Y/N non-neutral/non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral



Prediction accuracy of the tools

Mutation SNAP SIFT PolyPhen2 (HumDiv) PolyPhen2 (HumVar)
G29E false false false false
M82L true false false false
Q125E true true true false
Y166N false true true true
G249S true true false false
C264W true true true true
R265W true true true true
I326T true true true true
F409C true true true true
Y438N true true true true



tool true false
SNAP 8 2
SIFT 8 2
PolyPhen2 (HumDiv/HumVar) 7/6 3/4
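
The summary counts can be reproduced from the per-mutation table with a short script. The following is only a minimal sketch: the true/false flags are copied from the table above, while the names `TOOLS`, `RESULTS` and `tally` are hypothetical and chosen just for this illustration.

```python
# Correctness flags per mutation, copied from the per-mutation table above.
# Column order: SNAP, SIFT, PolyPhen2 HumDiv, PolyPhen2 HumVar.
# The names TOOLS, RESULTS and tally are illustrative assumptions.
TOOLS = ["SNAP", "SIFT", "PolyPhen2 HumDiv", "PolyPhen2 HumVar"]

RESULTS = {
    "G29E":  (False, False, False, False),
    "M82L":  (True,  False, False, False),
    "Q125E": (True,  True,  True,  False),
    "Y166N": (False, True,  True,  True),
    "G249S": (True,  True,  False, False),
    "C264W": (True,  True,  True,  True),
    "R265W": (True,  True,  True,  True),
    "I326T": (True,  True,  True,  True),
    "F409C": (True,  True,  True,  True),
    "Y438N": (True,  True,  True,  True),
}


def tally(results, tools):
    """Return {tool: (n_true, n_false)} counted over all mutations."""
    counts = {}
    for column, tool in enumerate(tools):
        flags = [row[column] for row in results.values()]
        n_true = sum(flags)  # True counts as 1, False as 0
        counts[tool] = (n_true, len(flags) - n_true)
    return counts


if __name__ == "__main__":
    for tool, (n_true, n_false) in tally(RESULTS, TOOLS).items():
        share = n_true / (n_true + n_false)
        print(f"{tool}: {n_true} true, {n_false} false ({share:.0%} correct)")
```

With only ten mutations, each count corresponds to a step of ten percentage points, so the differences between the tools should be read with caution.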
