Sequence-based mutation analysis BCKDHA

From Bioinformatikpedia

Latest revision as of 10:37, 26 August 2011

General

We chose the following mutations for the sequence-based mutation analysis:

Figure 1: Reference amino acids which are substituted coloured as follows: SNP is listed only in dbSNP, only in HGMD, in HGMD and dbSNP
Figure 2: Protein with mutated positions
  • G29E
  • M82L
  • Q125E
  • Y166N
  • G249S
  • C264W
  • R265W
  • I326T
  • F409C
  • Y438N

The mutation positions are relative to the UniProt reference sequence. Figure 1 shows the BCKDHA protein with the chosen mutation positions colored according to the source that lists the mutation. Figure 2 shows the protein with the amino acids listed above mutated.


A protocol was created describing all steps for running the programs.

Amino Acid Properties

Position | Reference residue | Properties | Mutated residue | Properties | Secondary structure (predicted)
29 | G | tiny, small | E | charged, polar | C
82 | M | sulphur containing, hydrophobic | L | hydrophobic, aliphatic | C
125 | Q | polar, uncharged | E | charged, polar | C
166 | Y | hydrophobic, aromatic, polar | N | polar, uncharged, small | H
249 | G | tiny, small | S | polar, small, tiny, hydroxylic | H
264 | C | sulphur containing, hydrophobic, tiny, small, polar | W | hydrophobic, aromatic, polar | E
265 | R | charged, positive (basic), polar | W | hydrophobic, aromatic, polar | E
326 | I | aliphatic, hydrophobic | T | hydroxylic, hydrophobic, small, polar | E
409 | F | aromatic, hydrophobic | C | sulphur containing, hydrophobic, tiny, small, polar | C
438 | Y | hydrophobic, aromatic, polar | N | polar, uncharged, small | C

Images (positions 82 to 438): reference residue, mutated residue, wild type and mutant superimposed, position in the protein structure.

Annotation: WT = wild-type amino acid, Mut = mutated amino acid, H = helix, E = beta-sheet, C = coil

To visualize the mutations in the three-dimensional protein structure, the PDB entry for BCKDHA, 1U5B, was used. As the PDB file only contains coordinates for the protein itself, the signal peptide (the first 45 amino acids) is not included. Therefore the first mutation at position 29, which lies in the signal peptide, could not be visualized.

The Protocol describes in detail how we used PyMOL to visualize our mutations.

Discussion

Looking at the differences in structure and biochemical properties between the reference amino acid and the mutated amino acid, one can draw conclusions about the effect of the mutation on the protein's function.

G29E
This mutation lies in the signal peptide of the protein. Therefore it has no direct effect on the function of the mature protein. However, the biochemical properties of the two amino acids are quite different. Glycine in particular, being uniquely small, plays a special role in most protein sequences, and the mutation might destroy the protein structure if it were within the actual protein sequence. From this point of view the mutation might affect the protein's function.
M82L
Methionine and leucine are both hydrophobic amino acids of similar size. The loss of the sulfur-containing methionine could have a disease-causing effect, but as the structures are quite similar this mutation might be tolerated. Furthermore, the amino acid does not contribute to the formation of any important secondary structure element, which makes it even more likely that a substitution here is tolerated.
Q125E
The structures of these two amino acids are almost identical. The only difference is that the amino group of the glutamine side-chain amide is replaced by a hydroxy group. This changes the side chain from uncharged to negatively charged, which might influence the protein's function.
Y166N
This substitution replaces a hydrophobic aromatic residue with a small polar amino acid. These differences in the amino acids' properties are likely to change the protein's structure, as this residue is located in a helix. Therefore this mutation is very likely to affect the protein's function.
G249S
This mutation introduces a polar, hydroxylic amino acid at a position in a helix where a tiny, nonpolar amino acid is usually located. As both amino acids are quite small, the mutation might be benign.
C264W
This mutation has a huge impact on the structure and physicochemical properties of the amino acid at this position. A small, sulphur-containing amino acid is replaced by an aromatic amino acid, which occupies a lot more space. The hydrophobicity and polarity remain the same. Nevertheless, the amino acids are very different and this mutation will destroy the protein's function.
R265W
Here a positively charged amino acid is substituted by a hydrophobic one. This change is quite severe, therefore we assume the protein's function cannot be maintained.
I326T
This mutation introduces a hydroxy group to the amino acid which makes it polar. This is quite an important change which might have an influence on the function of the protein.
F409C
This substitution leads to a totally different amino acid in terms of structure and physicochemical properties. A bulky aromatic residue is substituted by a small, polar, sulphur-containing amino acid. These drastic changes are very likely to change the protein's structure and therefore affect its function.
Y438N
This mutation also changes the amino acid completely. The big aromatic residue which is hydrophobic is substituted by a small polar one. These differences are likely to affect the protein function.

Substitution matrices

Now we take a look at substitution matrices like BLOSUM62 (Figure 3), PAM1 (Figure 4) and PAM250 (Figure 5). We looked at the scores for the real mutations given by HGMD and dbSNP and compared these substitutions with the worst possible substitutions for the corresponding amino acids.

Figure 3: BLOSUM62
Figure 4: PAM1
Figure 5: PAM250


Position AA1/AA2 BLOSUM62 PAM1 PAM250 result
score worst score worst score worst
29 G/E -2 -4 (I, L) 7 0 (I, W, Y) 9 2 (W) BLOSUM62 says that the mutation is not very likely, whereas PAM1 and PAM250 say that the mutation is not anomalous
82 M/L 2 -3 (D, G) 8 0 (N, D, C, E, G, H, P, W, Y) 3 0 (C) The three values are positive and quite high relative to the other values of methionine. This means that all three matrices indicate that this mutation occurs quite often
125 Q/E 2 -3 (C, F, I) 27 0 (F, W, Y) 7 1 (C, F, W) All three substitution matrices show that this mutation occurs quite often
166 Y/N -2 -3 (D, G, P) 3 0 (R, D, Q, G, K, M, P) 2 1 (A, R, D, Q, E, G, K, P) Since the values of the three matrices are low, the mutation does not occur very often, which suggests that it could cause a change in the protein's function
249 G/S 0 -4 (I, L) 21 0 (I, W, Y) 11 2 (W) All three values are high, which indicates that this mutation is quite common and therefore probably not very damaging
264 C/W -2 -4 (E) 0 0 (N, D, Q, E, G, L, K, M, F, W) 1 1 (R, N, D, Q, E, L, K, M, F, W) The scores are all low. This reflects that the mutation is rare and because of this it is very likely that it influences the function of the protein
265 R/W -3 -3 (W, V, F, I, C) 8 0 (D, E, G, Y) 7 1 (F) PAM1 and PAM250 have high values whereas BLOSUM62 has a low value for this substitution. So BLOSUM62 says that this mutation is rare and probably damaging and PAM1 and PAM250 say that the mutation is quite common and so not very damaging
326 I/T -1 -4 (G) 7 0 (G, A, P, W) 4 1 (W) Again the three matrices give different results. Whereas BLOSUM62 says that the mutation is rare, PAM1 and PAM250 indicate that the mutation has no negative influence on the protein
409 F/C -2 -4 (P) 0 0 (D, C, Q, E, K, P, V) 1 1 (R, D, C, Q, E, G, K, P) The three values are all very low which means that this mutation is very rare. This suggests that the mutation affects the function and the structure of the protein
438 Y/N -2 -3 (D, G, P) 3 0 (R, D, Q, G, K, M, P) 2 1 (A, R, D, Q, E, G, K, P) Again the scores are all low which indicates the damaging effect of the mutation

PSSM


Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 G    1  0  0 -2  1  0 -2  4  1 -1 -3 -1 -3 -2 -3  1 -1 -4 -1 -1   10   6   5   1   3   4   1  37   3   5   2   3   0   2   0   9   2   0   2   6  0.37 0.77
  82 M   -4 -5 -6 -6 -3 -5 -5 -6 -5  2  5 -5  5 -2 -5 -5 -3 -3 -3  1    0   0   0   0   0   0   0   0   0  12  67   0  14   0   0   0   0   0   1   6  1.23 1.27
 125 Q   -1 -1 -3 -3 -5  8  0 -4  0 -3 -4 -1 -1 -6 -4 -1  1 -5 -4 -4    4   2   0   0   0  74   2   0   1   1   1   2   1   0   0   2   9   0   0   0  1.46 1.28
 166 Y    3 -3 -4 -4  3  0 -3 -4  1 -2 -2 -3  0 -2 -4 -1  1  7  3  1   24   1   0   0   7   5   1   1   3   1   3   1   2   1   0   3   8  15  12  11  0.62 1.29
 249 G    5 -4 -3 -4 -4 -3 -2  4 -4 -5 -5 -3 -4 -5 -4  1 -2 -5 -5 -4   54   0   0   0   0   0   3  35   0   0   0   0   0   0   0   8   1   0   0   0  1.12 1.21
 264 C   -2 -5 -3 -5  9 -5 -5 -5 -5  3 -2 -5 -3 -4 -1 -3 -3 -5 -4  4    1   0   2   0  45   0   0   0   0  15   2   0   0   0   4   1   1   0   0  29  1.43 1.18
 265 R   -3  4  2 -3 -5  5  2 -4  0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2  0    0  25  12   0   0  34  15   0   1   2   0   1   0   0   0   1   1   0   1   7  0.88 1.21
 326 I   -3 -5 -6 -6 -4 -5 -6 -6 -6  7  0 -5  0 -2 -5 -5 -3 -5 -4  4    0   0   0   0   0   0   0   0   0  66   6   0   1   1   0   0   0   0   0  26  1.40 1.17
 409 F   -4 -3 -6 -6 -3 -5 -5 -6 -3  0  1 -5  1  8 -6 -5 -3  0  1 -1    1   1   0   0   1   0   0   0   0   5  11   0   3  69   0   0   1   1   3   4  1.56 1.31
 438 Y    0 -2 -2 -4 -2 -1 -1 -3  3 -3 -3 -3 -3  1 -5 -3 -3  3  8 -2    9   2   1   0   1   3   3   1   6   0   1   0   0   1   0   0   0   3  66   2  1.34 0.89

The values in the PSSM reflect how often each amino acid is observed at a position of the multiple alignment relative to its background frequency. A positive score means that the amino acid occurs at this position more often than expected, so a substitution to it is usually tolerated, as both variants have been passed on successfully. A negative score means that the amino acid is rarely or never observed at this position. The PSSM value for each of our mutations is the entry in the row of the mutated position and the column of the mutant amino acid. For most of the mutations the PSSM score is negative, so the mutant amino acid is not conserved at this position and the substitution is not likely to be tolerated.

The Q125E mutation has a score of 0, which is exactly on the border between the two cases.

The G249S substitution has a score of +1, which is quite good. Serine is observed at this position in some of the aligned sequences, so this mutation might be tolerated in nature.

The highest score among our substitutions is +5 for the M82L mutation. According to the PSSM, leucine is frequently observed at this position and the mutation is likely to be tolerated. This can be explained by the similar size and hydrophobic nature of both amino acids.
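Reading a substitution score from the PSSM can be sketched as follows. This is a minimal illustration, not part of the original protocol: the function name `pssm_score` is made up here, and the rows are shortened to the position, the wild-type residue and the 20 scores (the percentage columns are omitted).

```python
# Column order of the score block in the PSSM header above.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

def pssm_score(row, mutant):
    """Return the PSSM score of `mutant` from one PSSM row.

    The row is expected to start with the position and the wild-type
    residue, followed by the 20 scores in AA_ORDER.
    """
    fields = row.split()
    scores = [int(value) for value in fields[2:22]]
    return scores[AA_ORDER.index(mutant)]

# Rows copied from the PSSM above (scores only).
row_82 = "82 M   -4 -5 -6 -6 -3 -5 -5 -6 -5  2  5 -5  5 -2 -5 -5 -3 -3 -3  1"
row_125 = "125 Q   -1 -1 -3 -3 -5  8  0 -4  0 -3 -4 -1 -1 -6 -4 -1  1 -5 -4 -4"

print("M82L:", pssm_score(row_82, "L"))
print("Q125E:", pssm_score(row_125, "E"))
```

Looking up the score by the column letter avoids counting columns by hand, which is error-prone with 40 numeric columns per row.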

Secondary Structure

To find out whether the mutations have an influence on the secondary structure of the protein, we compared the predicted secondary structure of the sequence without mutations with that of the sequence including the mutations. To obtain the secondary structure of the two sequences we used PSIPRED.

We compared the structure for each position. The mutation positions are colored red and the regions where changes occur are colored blue:

Position 29
seq:     SQAALLLLRQPGARGLARSHPPRQQQQFSSLDDK
non-mut: HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
mut:     HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
Position 82
seq:     VISGIPIYRVMDRQGQIINPSEDPHLPKEKV
non-mut: CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
mut:     CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
Position 125
seq:     KEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYG
non-mut: HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
mut:     HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
Position 166
seq:     EAGVLMYRDYPLELFMAQCYG
non-mut: HHHHHHHCCCCHHHHHHHHCC
mut:     CHHHHHHCCCCHHHHHHHHCC
Position 249
seq:     VVICYFGEGAASEGDAHAGFNFAATLECP
non-mut: EEEEEECCCCCCHHHHHHHHHHHHHHCCC
mut:     EEEEEECCCCCCCHHHHHHHHHHHHCCCC
Position 264
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 265
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 326
seq:     RAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
non-mut: HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
mut:     HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
Position 409
seq:     KPKPNPNLLFSDVYQEMPAQL
non-mut: CCCCCHHHHHHHHHCCCCHHH
mut:     CCCCCHHHHHHHHHCCCCHHH
Position 438
seq:     QEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
non-mut: CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
mut:     CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC

Comparing the predicted secondary structure at all positions shows that most of the mutations have no influence, since there are no changes at the position of the mutation or in its neighbourhood. Only the mutations at positions 166, 249, 264 and 265 change the predicted structure. The mutation at position 166 affects the secondary structure 6 residues earlier: the helix which normally starts at position 160 now starts at position 161. With the mutation at position 249 the surrounding helix becomes shorter, because it starts one residue later and ends one residue earlier than without the mutation. Because the mutations at positions 264 and 265 are next to each other, it is not clear which of them is responsible for the change in the secondary structure, or whether it is the combination of the two. In any case the neighbourhood of these mutations changes: four or five residues after the mutation a beta sheet occurs which is not present in the wild-type prediction, and the helix which should start 19 or 20 residues after the mutation starts one position earlier.
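The position-wise comparison of the two predictions can be sketched in a few lines. This is only an illustration of the idea, not the procedure from the protocol; the function name `ss_changes` is hypothetical, and the example strings are the PSIPRED states around position 166 from the listing above.

```python
def ss_changes(wild, mutant):
    """Return (index, wild state, mutant state) for every differing position.

    Both arguments are secondary structure strings (H, E, C) of equal
    length; the index is 0-based within the compared window.
    """
    if len(wild) != len(mutant):
        raise ValueError("secondary structure strings differ in length")
    return [(i, w, m) for i, (w, m) in enumerate(zip(wild, mutant)) if w != m]

# Window around position 166, wild type vs. mutant.
wild_166 = "HHHHHHHCCCCHHHHHHHHCC"
mut_166 = "CHHHHHHCCCCHHHHHHHHCC"

for index, wild_state, mut_state in ss_changes(wild_166, mut_166):
    print(f"window index {index}: {wild_state} -> {mut_state}")
```

An empty result means the prediction is unchanged within the window, as for positions 29, 82, 125, 326, 409 and 438.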

Multiple Sequence Alignment

To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.

ID Accession Entry name
sp P11178 ODBA_BOVIN
sp P12694 ODBA_HUMAN
sp Q8HXY4 ODBA_MACFA
sp P50136 ODBA_MOUSE
sp A5A6H9 ODBA_PANTR
sp P11960 ODBA_RAT
tr Q6ZSA3 Q6ZSA3_HUMAN
tr E7ESE6 E7ESE6_HUMAN
tr B2R8A9 B2R8A9_HUMAN
tr Q658P7 Q658P7_HUMAN
tr E7EW46 E7EW46_HUMAN
tr B4DP47 B4DP47_HUMAN
tr Q59EI3 Q59EI3_HUMAN
tr F1N5F2 F1N5F2_BOVIN
tr B1PK12 B1PK12_PIG
tr E2RPW4 E2RPW4_CANFA
tr B2LSM3 B2LSM3_SHEEP
tr F1RHA0 F1RHA0_PIG
tr F1PI86 F1PI86_CANFA
tr D2HMT3 D2HMT3_AILME
tr Q2TBT9 Q2TBT9_BOVIN
tr Q3U3J1 Q3U3J1_MOUSE
tr Q99L69 Q99L69_MOUSE
tr Q5EB89 Q5EB89_RAT
tr B1WBN3 B1WBN3_RAT
Figure 6: Multiple Alignment of the homologous sequences of BCKDHA with CLUSTALW

With these 25 sequences we built a multiple alignment using CLUSTALW. The alignment with all mammalian homologs was quite poor because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the other ones. So we removed those two sequences and realigned the remaining ones (see Figure 6).

With this new multiple alignment we could analyze the 10 positions of our mutations to find out how well they are conserved.

position conservation wildtype conservation mutant
29 0.7 0
82 0.96 0
125 0.96 0.04
166 1 0
249 1 0
264 1 0
265 1 0
326 0.91 0
409 0.91 0
438 0.91 0

The results show that the wild-type amino acids at the observed positions are very well conserved, since the values are almost always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%. This is not as high as the other values, but the position is still well conserved. Regions of the protein which are well conserved are probably very important for its structure and function. Because all of the amino acids are very well conserved, mutations at these positions can be very damaging and can have a huge impact on the protein and its function. The only position where the conservation of the mutant amino acid is higher than 0 is position 125, but the value is so small that it can be neglected.
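The conservation values in the table can be understood as the fraction of aligned sequences that carry a given residue in the alignment column. A minimal sketch, assuming that gaps count as non-matching sequences (the original does not state how gaps were handled); the function name and the toy alignment are made up for illustration and are not BCKDHA data.

```python
def conservation(alignment, column, residue):
    """Fraction of sequences with `residue` in the given alignment column.

    `alignment` is a list of equally long aligned sequences and `column`
    is a 0-based column index. Gaps ('-') simply do not match.
    """
    states = [sequence[column] for sequence in alignment]
    return states.count(residue) / len(states)

# Toy alignment with four sequences (hypothetical data).
toy_alignment = ["MKGA", "MKGA", "MRGA", "MKEA"]

print(conservation(toy_alignment, 2, "G"))  # wild-type residue
print(conservation(toy_alignment, 2, "E"))  # mutant residue
```

Calling the function once with the wild-type residue and once with the mutant residue gives the two columns of the table.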

SNAP

mutation prediction

To run SNAP we used the command:

snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out

nsSNP Prediction Reliability Index Expected Accuracy
G29E Neutral 0 53%
M82E Non-neutral 4 82%
Q125E Non-neutral 1 63%
Y166N Non-neutral 2 70%
G249S Neutral 1 60%
C264W Non-neutral 4 82%
R265W Non-neutral 4 82%
I326T Non-neutral 3 78%
F409C Non-neutral 4 82%
Y438N Non-neutral 4 82%

The output of SNAP shows that most of the mutations are predicted to have a damaging effect on the structure and function of the protein. Only the mutations at positions 29 and 249 are predicted to have no influence on the protein.

position specific prediction

A second SNAP run was performed in which each of the ten chosen positions was mutated to all possible amino acids. This run should show whether the wild-type amino acid is essential at the corresponding position of the sequence, or whether only the specific mutation cannot be tolerated because it drastically changes the physicochemical properties of the amino acid.

The following table shows to what extent each position is predicted to tolerate mutations.
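The list of substitutions for such a scan can be generated rather than typed by hand. The sketch below only builds the mutation names; the notation (wild type, position, mutant, e.g. G29E) follows this page, while the exact file format expected by SNAP is an assumption and is therefore not written out here.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Wild-type residue at each of the ten analysed positions.
POSITIONS = {29: "G", 82: "M", 125: "Q", 166: "Y", 249: "G",
             264: "C", 265: "R", 326: "I", 409: "F", 438: "Y"}

def all_substitutions(wild_type, position):
    """Return the 19 possible substitutions at one position, e.g. 'G29A'."""
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]

scan = [name
        for position, wild_type in POSITIONS.items()
        for name in all_substitutions(wild_type, position)]

print(len(scan))   # 10 positions x 19 substitutions
print(scan[:5])
```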

Mutation Tolerated Substitutions Non-tolerated Substitutions Ratio tolerated Mutations
G29E ARNDCQEGHILKMFPSTWYV 100%
M82E ILM ARNDCQEGHKFPSTWYV 15%
Q125E Q ARNDCEGHILKMFPSTWYV 5%
Y166N QHMFWY ARNDCEGILKPSTV 30%
G249S AGS RNDCQEHILKMFPTWYV 15%
C264W C ARNDQEGHILKMFPSTWYV 5%
R265W R ANDCQEGHILKMFPSTWYV 5%
I326T ILV ARNDCQEGHKMFPSTWY 15%
F409C F ARNDCQEGHILKMPSTWYV 5%
Y438N Y ARNDCQEGHILKMFPSTWV 5%

This table shows that only position 29 is not essential for the protein's function. This can be explained by the fact that this position does not lie within the actual protein sequence.

The position within the protein which allows the most substitutions is the tyrosine at position 166. This tyrosine forms the end of a helix on the surface of the protein, which might explain the variability of this position. The selected mutation to asparagine is not tolerated because of its quite different structure.

Position 82 tolerates 3 amino acids, which are all structurally very similar. So for this position the structure seems to be important. The same is true for position 249, where alanine, glycine and serine are predicted to be neutral to the protein function. All of these amino acids are quite small and may therefore not disturb the protein structure and function.

On position 326 all three branched-chain amino acids are predicted to have a neutral effect on the protein's function. These amino acids are structurally and biochemically very similar and therefore a substitution is tolerated. The mutation to threonine however is not tolerated as this amino acid has different properties.

All other mutations are not tolerated at all. This means for these positions the wild-type residues are totally essential for the protein's structure and function and cannot be replaced by any other amino acid.
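The "Ratio tolerated Mutations" column of the table is the share of the 20 amino acids (including the wild type) that SNAP predicts as neutral at a position. A short sketch of that arithmetic, using tolerated sets copied from the table; the function name is hypothetical.

```python
def tolerated_ratio(tolerated):
    """Percentage of the 20 amino acids predicted as tolerated at a position."""
    return 100 * len(set(tolerated)) / 20

# Tolerated substitutions per position, copied from the table above.
tolerated_by_position = {82: "ILM", 125: "Q", 166: "QHMFWY", 249: "AGS", 326: "ILV"}

for position, residues in tolerated_by_position.items():
    print(position, f"{tolerated_ratio(residues):.0f}%")
```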

SIFT

The following table displays the SIFT results. The threshold for intolerance is 0.05.

The amino acids are colored in the following way:

  • nonpolar
  • uncharged polar
  • basic
  • acidic

Capital letters: amino acids appear in the alignment

Lower case letters: amino acids result from prediction

Seq Rep: fraction of sequences that contain one of the basic amino acids

Pos | Ref AA | Mut AA | Predicted not tolerated | Seq Rep | Predicted tolerated | Prediction
29 | G | E | | 0.37 | wcmPdIGnqRhVTkeSFLAy | tolerated
82 | M | L | dhgnweyrkspqafvi | 0.91 | TCLM | tolerated
125 | Q | E | ywvtsrpnmlkihgfedca | 0.98 | Q | not tolerated
166 | Y | N | cpdmeqkngrtisval | 1.00 | FHYW | not tolerated
249 | G | S | whyfimrqnlckdvtps | 1.00 | EGA | not tolerated
264 | C | W | ywvtsrqpnmlkihgfeda | 0.98 | C | not tolerated
265 | R | W | ywvtsqpnmlkihgfedca | 1.00 | R | not tolerated
326 | I | T | hdwpneqcrsgkytaM | 1.00 | FLVI | not tolerated
409 | F | C | hndkrqgecpstamvwiy | 1.00 | LF | not tolerated
438 | Y | N | wvtsrqpnmlkihgfedca | 1.00 | Y | not tolerated


The only substitutions SIFT predicts not to affect protein function are G29E and M82L. The first substitution may be tolerated because this position is not within the actual protein sequence. The second tolerated amino acid exchange is from methionine to leucine. These two amino acids are quite similar in their structure and physicochemical properties, so an exchange can be tolerated.
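How the threshold of 0.05 turns a SIFT score into a prediction can be sketched as follows. The scores are the ones quoted on this page for four of the mutations; treating a score exactly equal to the threshold as tolerated is an assumption of this sketch, and the function name is made up.

```python
SIFT_THRESHOLD = 0.05  # threshold for intolerance stated above

def sift_prediction(score, threshold=SIFT_THRESHOLD):
    """Classify a SIFT score; scores below the threshold are not tolerated.

    Handling of a score exactly at the threshold is an assumption here.
    """
    return "not tolerated" if score < threshold else "tolerated"

# SIFT scores quoted in the text of this page.
sift_scores = {"G29E": 0.68, "M82L": 0.65, "Q125E": 0.0, "G249S": 0.0}

for mutation, score in sift_scores.items():
    print(mutation, score, sift_prediction(score))
```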

Polyphen2

Position AA1/AA2 HumDiv HumVar
prediction Score Sensitivity Specificity prediction Score Sensitivity Specificity
29 G/E benign 0.025 0.96 0.80 benign 0.018 0.96 0.52
82 M/L benign 0.001 0.99 0.15 benign 0.001 0.99 0.08
125 Q/E possibly damaging 0.759 0.85 0.93 benign 0.285 0.87 0.75
166 Y/N probably damaging 0.997 0.40 0.98 probably damaging 0.964 0.59 0.93
249 G/S benign 0.145 0.93 0.86 benign 0.292 0.86 0.75
264 C/W probably damaging 1.000 0.00 1.00 probably damaging 1.000 0.00 1.00
265 R/W probably damaging 1.000 0.00 1.00 probably damaging 1.000 0.00 1.00
326 I/T probably damaging 0.997 0.40 0.98 probably damaging 0.998 0.16 0.99
409 F/C probably damaging 0.998 0.27 0.99 probably damaging 0.939 0.64 0.92
438 Y/N probably damaging 1.000 0.00 1.00 probably damaging 0.987 0.49 0.96

Polyphen2 makes its predictions with two different datasets, HumDiv and HumVar. As the results show, the two predictions are not always the same. The prediction with the HumDiv dataset says that there are three mutations that probably have no grave effect on the function or structure of the protein, whereas the result of HumVar is that there are four mutations that would probably have no damaging influence. The three mutations which are marked as "benign" with both datasets are at positions 29, 82 and 249. The mutation which is predicted as benign only with the HumVar dataset is at position 125.
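The comparison of the two PolyPhen2 predictions can be reproduced from the table with a small script. This is an illustrative sketch: the predictions are copied from the table above, and mapping "possibly damaging" and "probably damaging" to one damaging class is a simplification made here.

```python
# (HumDiv prediction, HumVar prediction) per mutation, copied from the table.
polyphen = {
    "G29E": ("benign", "benign"),
    "M82L": ("benign", "benign"),
    "Q125E": ("possibly damaging", "benign"),
    "Y166N": ("probably damaging", "probably damaging"),
    "G249S": ("benign", "benign"),
    "C264W": ("probably damaging", "probably damaging"),
    "R265W": ("probably damaging", "probably damaging"),
    "I326T": ("probably damaging", "probably damaging"),
    "F409C": ("probably damaging", "probably damaging"),
    "Y438N": ("probably damaging", "probably damaging"),
}

def is_benign(prediction):
    """Collapse the three PolyPhen2 classes into benign vs. damaging."""
    return prediction == "benign"

benign_in_both = [m for m, (div, var) in polyphen.items()
                  if is_benign(div) and is_benign(var)]
disagreements = [m for m, (div, var) in polyphen.items()
                 if is_benign(div) != is_benign(var)]

print("benign in both:", benign_in_both)
print("HumDiv and HumVar disagree:", disagreements)
```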

Comparison

Comparison of the predicted results of this TASK

Position AA1/AA2 BLOSUM62 PAM1 PAM250 PSSM Secondary Structure Multiple Alignment SNAP SIFT Polyphen2
Prediction Conservation wildtype Prediction Prediction HumDiv HumVar
29 G/E non-neutral neutral neutral non-neutral neutral neutral neutral neutral neutral neutral
82 M/L neutral neutral neutral neutral neutral non-neutral non-neutral neutral neutral neutral
125 Q/E neutral neutral neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral neutral
166 Y/N non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral
249 G/S non-neutral neutral neutral neutral non-neutral non-neutral neutral non-neutral neutral neutral
264 C/W non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral
265 R/W non-neutral neutral neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral non-neutral
326 I/T non-neutral neutral neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral non-neutral
409 F/C non-neutral non-neutral non-neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral non-neutral
438 Y/N non-neutral non-neutral non-neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral non-neutral
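To get a quick overview of the table, the number of methods voting for each class can be counted per mutation. The sketch below encodes each row of the table as a string in column order (BLOSUM62, PAM1, PAM250, PSSM, secondary structure, multiple alignment, SNAP, SIFT, HumDiv, HumVar), with "x" for non-neutral and "." for neutral. The encoding and the function name are choices made for this illustration, and a simple count is not the weighting used in the discussion that follows.

```python
# One character per method, in the column order of the table above:
# "x" = non-neutral, "." = neutral.
votes = {
    "G29E":  "x..x......",
    "M82L":  ".....xx...",
    "Q125E": "...x.xxxx.",
    "Y166N": "xxxxxxxxxx",
    "G249S": "x...xx.x..",
    "C264W": "xxxxxxxxxx",
    "R265W": "x..xxxxxxx",
    "I326T": "x..x.xxxxx",
    "F409C": "xxxx.xxxxx",
    "Y438N": "xxxx.xxxxx",
}

def count_votes(row):
    """Return (number of non-neutral votes, number of neutral votes)."""
    return row.count("x"), row.count(".")

for mutation, row in votes.items():
    non_neutral, neutral = count_votes(row)
    print(f"{mutation}: {non_neutral} non-neutral, {neutral} neutral")
```

Q125E is the only mutation with an even split, which matches the discussion of that mutation.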
G29E
Apart from BLOSUM62 and the PSSM, all tools and sources indicate that this mutation is neutral. It is interesting that these two sources have quite low scores (-2 in both cases), which indicates that they are sure about their prediction. However, the other matrices have very high values and the position is only 72% conserved, which shows that this amino acid changes quite often. It is therefore possible that the mutation is neutral. The secondary structure prediction supports this assumption, because there is no change between the original and the mutated secondary structure. Additionally, the residue lies in a coil region, so it is not part of a regular secondary structure element. It has to be mentioned that SNAP is not at all sure about its prediction of neutrality: its reliability index of 0 is the lowest possible value. In contrast, SIFT and PolyPhen2 are sure about their predictions, because SIFT has a quite high score (0.68) and PolyPhen2 has two very low scores (0.025 and 0.018). Since nearly all methods predict the mutation as neutral and the mutation lies in the signal peptide of our protein, we also assume the mutation to be neutral.
M82L
The mutation at position 82 from methionine to leucine is predicted to be neutral by most of the methods. Only SNAP and the conservation score declare the mutation as non-neutral. The SNAP prediction has a reliability index of 4, which means that SNAP is very sure about its result. The conservation score shows that this position is indeed very well conserved, with a value of 96%. But the value is not 1, which shows that some mutations at this position are possible. In contrast to these two results, the substitution matrices indicate that the mutation is neutral, because all of them have high values for the change from methionine to leucine. This is due to the structural similarity of these two amino acids. The PSSM score is also very high, which indicates that this mutation has no damaging effect on the structure and function of the protein. SIFT and PolyPhen2 are also very sure that this mutation is neutral: SIFT predicts it with a high value of 0.65 and PolyPhen2 with a low value of 0.001. Further information comes from the structure of the two proteins. Comparing the two structures shows that there is nearly no change in the structure of the protein. Because of this, and because all the other methods declare this mutation as neutral, we also predict it as neutral.
Q125E
Based on the predictions of the different tools it is not possible to decide whether this mutation from glutamine to glutamic acid is neutral or not, because the tools give completely different results. The structures of the two amino acids are quite similar, but their physicochemical properties differ. This can explain the different predictions, depending on whether a tool takes the amino acid properties or the structure into account when predicting the effect on the protein's function. It is also important to recognize that most of the tools are not completely sure about their prediction. For example, the PSSM value is 0, which is exactly on the border between neutral and non-neutral. SNAP is not very sure either, because the reliability index is only 1. In contrast, the prediction of SIFT is very certain, since the value for this mutation is 0.0, which is the lowest possible value. PolyPhen2 is uncertain, given that the two datasets lead to two different predictions. Because of these different results it can be assumed that if this mutation is non-neutral, its influence is minimal. Since the experimental structure is different, we would say that this mutation is non-neutral, but as already said the influence is probably minimal and so the effect on the protein function is not that grave.
Y166N
The mutation from tyrosine to asparagine is predicted to be non-neutral by all of the tools and sources. The values of the substitution matrices and the PSSM score are all low, which shows that this mutation does not occur very often. The rarity of a change at this position is also shown by the conservation score, which is 1. This reflects that the amino acid at position 166 has an important role for the structure and function of the protein, so a mutation to another amino acid is not neutral. The secondary structure prediction shows that the different amino acid causes changes. PolyPhen2 in particular shows how certain this prediction is, since the scores are 0.997 (HumDiv) and 0.964 (HumVar), with 1.0 being the highest score. The prediction can be confirmed by looking at the experimental structure of the two different amino acids. The mutant residue has no aromatic ring while the original residue has one. Because of this structural difference the physicochemical properties of the two amino acids are not the same, and a mutation would have a huge impact on the protein's function.
G249S
The results of the tools and sources contradict each other again. While BLOSUM62 indicates that the mutation is non-neutral, PAM1, PAM250 and the PSSM score show that the mutation is neutral. But it has to be pointed out that the value of BLOSUM62 is 0, which means that it is on the border to neutral. So according to the substitution matrices and the PSSM score the mutation is neutral. The predicted secondary structure, however, shows differences, since the helix in the mutated protein is two positions shorter than the original one. This indicates that the mutation is non-neutral, and the fact that this position is 100% conserved also lets us assume that the mutation would have an influence on the structure and function of the protein. SNAP predicts the mutation to be neutral, but the reliability index of only 1 indicates that it is not sure. In contrast to SNAP, SIFT determines the mutation as non-neutral and is very sure, because the value for serine is 0.0, which is the lowest possible value. The two predictions of PolyPhen2 declare this SNP as neutral. All in all it is not clear what the effect of this mutation is, and again it is possible that if there is a change in the structure of the protein it is not grave, since the different tools are that unsure. The experimental structure shows that the mutated residue has an additional side chain, but both amino acids are very small, so the mutation might be tolerated. Weighing this against the full conservation of the position and the change in the predicted helix, we predict this mutation as non-neutral.
C264W
The classification of this mutation as non-neutral is clear, since all tools predict it as non-neutral. The substitution matrices show that this substitution is very uncommon, because the values of all three matrices are very low. The PSSM score is one of the lowest possible values, which also indicates that a change of this amino acid is very rare. The conservation score can be interpreted in the same way: it is 1, which means that mutations at this position occur very rarely, and it can be assumed that this amino acid has an important role in the structure and function of the protein. This is also confirmed by the SNAP scan using all possible mutations. With this knowledge the mutation has to be non-neutral. The three tools SNAP, SIFT and PolyPhen2 are all absolutely sure about their prediction, because all of them give the best possible scores for the influence of the mutation. The experimental structure and the differences between the two amino acids strengthen the assumption that the mutation is non-neutral.
R265W
This mutation is predicted to be non-neutral by nearly every tool or source. Only PAM1 and PAM250 declare this substitution as neutral, with high scores of 8 and 7. This is surprising, because all other tools and sources are very confident that a change of this amino acid will have an effect on the protein. The PSSM score of -5 is very low and shows that a mutation at this position is unlikely to be neutral, and the conservation score of 1 shows that this amino acid is very well conserved and that a change would be severe. The tools SNAP, SIFT and PolyPhen2 all predict this mutation as definitely damaging. To be sure, we also consider the predicted and the experimental structure of the protein. The predicted structure clearly changes: a beta sheet occurs although there is none in the original protein, and the coil after the mutation is one residue shorter than in the wildtype. The damaging effect of this mutation can furthermore be explained by the fact that this position is essential for the protein's function, as SNAP predicts that no mutation at this position is tolerated. Therefore we declare this mutation as non-neutral.
I326T
Most of the prediction methods classify this mutation as non-neutral. Only PAM1, PAM250 and the predicted secondary structure suggest that it is neutral. Interestingly, the two substitution matrices have high values of 7 and 4, which shows that this substitution is quite common, and the predicted secondary structure supports this because no change occurs. All other methods, however, predict this mutation to be non-neutral. BLOSUM62 and PSSM, for example, have very low values, which indicates a damaging effect. The position is also very well conserved, which suggests that changes at this position are rare and that a mutation would have a damaging influence on the protein. The three prediction tools declare the SNP to be damaging with very high confidence. The properties of the two amino acids, which are completely different, also indicate that this mutation has an effect. Summarizing all results, we also predict the mutation as being non-neutral.
F409C
Except for the predicted secondary structure, all methods declare the mutation from phenylalanine to cysteine as non-neutral. A possible explanation for the secondary structure prediction is that the mutation affects a very distant position, so the effect is not detected when looking only at the immediate neighbourhood of the mutated position. All other methods are confident in their prediction, as shown for example by the very low values in the substitution matrices and by the PSSM score. In addition, the scores of the three prediction tools indicate that the mutation is clearly non-neutral. To back up this prediction we also compare the amino acids and their properties. This comparison confirms the prediction, since the bulky hydrophobic ring is replaced by a small, sulfur-containing side chain, so the two amino acids are very different. Furthermore, SNAP predicts this position to be essential for the protein's function, as no amino acid other than phenylalanine is tolerated there. Because of all these differences we decide that the mutation is non-neutral.
Y438N
The situation at position 438 is nearly the same as at position 409. The mutation of tyrosine to asparagine is predicted as non-neutral by nearly all methods. Only the predicted secondary structure could lead to the assumption that the mutation is neutral. One explanation for this result is that the mutation influences a position which is far away, so that the change in the secondary structure cannot be recognized. This is very likely, since all the other methods predict this substitution as non-neutral. The prediction by the substitution matrices and the PSSM is less certain than for the mutation at position 409, since the values are not as low, but they are still low enough to conclude that the mutation will influence the structure and function of the protein. SNAP and SIFT are as confident as for the previous mutation, with very clear values (SNAP: 4, SIFT: 0.0). PolyPhen2 is even more certain, because it predicts the damaging influence of this mutation with 1.0 (HumDiv) and 0.987 (HumVar). Again we also look at the physicochemical properties and the structure of the two amino acids to find out whether there are differences. The only property shared by both amino acids is polarity. The other properties are completely different, since the mutated amino acid is acidic and small whereas the original amino acid is hydrophobic and aromatic. The missing aromatic ring may be the reason why SNAP only allows tyrosine at this position. Since all the methods as well as the structure and the properties indicate that the mutation is non-neutral, we also predict it to be non-neutral.

Comparison of the prediction of the tools with the annotation in dbSNP and HGMD

In order to categorize the SNPs extracted from dbSNP and HGMD into "neutral" and "non-neutral" concerning their effect on the protein function, the following assumption was made:

All SNPs listed in HGMD are disease-related mutations. SNPs that are listed in both databases do have an effect on the protein's structure and function. All SNPs listed only in dbSNP do not affect the protein function.
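
The assumption above can be written as a small rule. The following is only a minimal sketch: the function name `categorize` and its two boolean parameters are hypothetical and merely encode database membership as stated in the text.

```python
def categorize(in_dbsnp: bool, in_hgmd: bool) -> str:
    """Label a SNP according to the assumption stated above.

    Hypothetical helper: the parameters only say in which of the two
    databases the SNP is listed.
    """
    if in_hgmd:
        # Listed in HGMD (alone or together with dbSNP): disease related.
        return "non-neutral"
    if in_dbsnp:
        # Listed only in dbSNP: assumed not to affect protein function.
        return "neutral"
    raise ValueError("SNP is listed in neither dbSNP nor HGMD")


print(categorize(in_dbsnp=True, in_hgmd=False))
print(categorize(in_dbsnp=True, in_hgmd=True))
```

Under this rule, HGMD membership alone decides the label, which is why a SNP listed only in dbSNP is always treated as neutral.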

Position AA1/AA2 dbSNP/HGMD own prediction SNAP SIFT PolyPhen2 (HumDiv/HumVar)
29 G/E neutral neutral neutral neutral neutral/neutral
82 M/L neutral neutral non-neutral neutral neutral/neutral
125 Q/E non-neutral non-neutral non-neutral non-neutral non-neutral/neutral
166 Y/N non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
249 G/S non-neutral non-neutral neutral non-neutral neutral/neutral
264 C/W neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
265 R/W non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
326 I/T non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
409 F/C non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral
438 Y/N non-neutral non-neutral non-neutral non-neutral non-neutral/non-neutral

Prediction accuracy of the tools

The following table shows the number of correctly predicted effects on the protein function, measured against the effect annotated in dbSNP and HGMD (TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives). As the counts show, "neutral" is treated as the positive class and "non-neutral" as the negative class.


Tool TP TN FP FN Sensitivity Specificity Accuracy
SNAP 1 6 1 2 0.33 0.86 0.7
SIFT 2 7 0 1 0.67 1.0 0.9
PolyPhen2 (HumDiv) 2 6 1 1 0.67 0.86 0.8
PolyPhen2 (HumVar) 2 5 2 1 0.67 0.71 0.7


To compare the different prediction methods, one should look at the accuracy of the predictions. In our case SIFT performed best, with 9 out of 10 correct predictions. PolyPhen2 using the HumDiv dataset predicted 8 out of 10 correctly, whereas SNAP and PolyPhen2 (HumVar) each predicted the effect on the protein function correctly for only 7 mutations.
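
The counts and measures can be recomputed from the comparison table. The sketch below is an illustration, not the procedure originally used: the labels are copied from the comparison table, "neutral" is taken as the positive class, and the names `confusion` and `metrics` are chosen here for the example.

```python
POSITIONS = [29, 82, 125, 166, 249, 264, 265, 326, 409, 438]

# True = neutral, False = non-neutral, in the order of POSITIONS.
ANNOTATION = [True, True, False, False, False, True, False, False, False, False]
PREDICTIONS = {
    "SNAP": [True, False, False, False, True, False, False, False, False, False],
    "SIFT": [True, True, False, False, False, False, False, False, False, False],
    "PolyPhen2 (HumDiv)": [True, True, False, False, True, False, False, False, False, False],
    "PolyPhen2 (HumVar)": [True, True, True, False, True, False, False, False, False, False],
}


def confusion(predicted, annotated):
    """Return (TP, TN, FP, FN) with 'neutral' as the positive class."""
    pairs = list(zip(predicted, annotated))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


for tool, predicted in PREDICTIONS.items():
    counts = confusion(predicted, ANNOTATION)
    sens, spec, acc = metrics(*counts)
    print(tool, counts, f"{sens:.2f} {spec:.2f} {acc:.2f}")
```

For SNAP this yields TP = 1, TN = 6, FP = 1 and FN = 2, which is consistent with the reported sensitivity of 0.33, specificity of 0.86 and accuracy of 0.7.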

