Sequence-based mutation analysis BCKDHA
- 1 General
- 2 Amino Acid Properties
- 3 Substitution matrices
- 4 PSSM
- 5 Secondary Structure
- 6 Multiple Sequence Alignment
- 7 SNAP
- 8 SIFT
- 9 Polyphen2
- 10 Comparison
We chose the following mutations for the sequence-based mutation analysis:
The mutation positions are relative to the UniProt reference sequence. Figure 1 shows the BCKDHA protein with the chosen mutation positions colored according to the source that listed the mutation. Figure 2 shows the protein with the amino acids listed above mutated.
A protocol was created that describes all steps for running the programs.
Amino Acid Properties
|Position||Reference amino acid||Properties (reference)||Mutated amino acid||Properties (mutated)||Secondary structure|
|29||G||tiny, small||E||charged, polar||C|
|82||M||sulphur containing, hydrophobic||L||hydrophobic, aliphatic||C|
|125||Q||acidic, polar||E||charged, polar||C|
|166||Y||hydrophobic, aromatic, polar||N||acidic, polar, small||H|
|249||G||tiny, small||S||polar, small, tiny, hydroxylic||H|
|264||C||sulphur containing, hydrophobic, tiny, small, polar||W||hydrophobic, aromatic, polar||E|
|265||R||charged, positive (basic), polar||W||hydrophobic, aromatic, polar||E|
|326||I||aliphatic, hydrophobic||T||hydroxylic, hydrophobic, small, polar||E|
|409||F||aromatic, hydrophobic||C||sulphur containing, hydrophobic, tiny, small, polar||C|
|438||Y||hydrophobic, aromatic, polar||N||acidic, polar, small||C|
Annotation: H = helix, E = beta-sheet, C = coil
To visualize the mutations in the three-dimensional protein structure, the PDB entry for BCKDHA, 1U5B, was used. As the PDB file only contains coordinates for the protein itself, the signal peptide (the first 45 amino acids) is not included. Therefore the first mutation at position 29, which lies in the signal peptide, could not be visualized.
The protocol describes in detail how we used PyMOL to visualize our mutations.
Looking at the differences in structure and biochemical properties between the reference amino acid and the mutated amino acid, one might draw conclusions about the effect of the mutation on the protein's function.
- G29E: This mutation lies in the signal peptide of the protein. Therefore the mutation has no direct effect on the function of the protein. However, the biochemical properties of the two residues are quite different. Glycine in particular, being uniquely small, plays a special role in most protein sequences, and such a mutation might destroy the protein structure if it were located within the actual protein sequence. From this point of view the mutation might affect the protein's function.
- M82L: Methionine and leucine are both hydrophobic amino acids of similar size. The loss of the sulfur-containing methionine could have a disease-causing effect, but as the structures are quite similar this mutation might be tolerated. Furthermore, the residue does not contribute to the formation of any important secondary structure element, which makes a substitution here even more likely to be tolerated.
- Q125E: The structure of these two amino acids is almost identical. The only difference is that the amino group of the glutamine side-chain amide is replaced by a hydroxy group. This changes the side chain from uncharged to negatively charged, which might influence the protein's function.
- Y166N: This substitution replaces a hydrophobic aromatic residue with a small polar amino acid. These differences in the amino acids' properties are likely to change the protein's structure, as the residue is located in a helix, and therefore this mutation is very likely to affect the protein's function.
- G249S: This mutation introduces a polar, hydroxylic amino acid at a position in a helix where usually a tiny, nonpolar amino acid is located. As both amino acids are quite small, the mutation might be benign.
- C264W: This mutation has a huge impact on the structure and physicochemical properties of the amino acid at this position. A small, sulphur-containing amino acid is replaced by an aromatic amino acid, which occupies a lot more space. Hydrophobicity and polarity remain the same; nevertheless the amino acids are very different and this mutation is expected to destroy the protein's function.
- R265W: Here a positively charged amino acid is replaced by a hydrophobic one. This change is quite severe, therefore we assume the protein's function cannot be maintained.
- I326T: This mutation introduces a hydroxy group into the side chain, which makes it polar. This is quite an important change which might influence the function of the protein.
- F409C: The two amino acids differ completely in structure and physicochemical properties. A bulky aromatic residue is replaced by a small, polar, sulphur-containing amino acid. These drastic changes are very likely to change the protein's structure and therefore affect its function.
- Y438N: This mutation also changes the amino acid completely. The large, hydrophobic aromatic residue is replaced by a small polar one. These differences are likely to affect the protein function.
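The property comparison made in the list above can be sketched in a few lines of Python. This is only an illustration: the property sets are copied from the amino acid table on this page, and the function name `compare_properties` is made up for this example, not part of any tool used here.

```python
# Property sets as annotated in the amino acid table on this page.
PROPERTIES = {
    "G": {"tiny", "small"},
    "E": {"charged", "polar"},
    "M": {"sulphur containing", "hydrophobic"},
    "L": {"hydrophobic", "aliphatic"},
    "Q": {"acidic", "polar"},
    "Y": {"hydrophobic", "aromatic", "polar"},
    "N": {"acidic", "polar", "small"},
    "S": {"polar", "small", "tiny", "hydroxylic"},
    "C": {"sulphur containing", "hydrophobic", "tiny", "small", "polar"},
    "W": {"hydrophobic", "aromatic", "polar"},
    "R": {"charged", "positive (basic)", "polar"},
    "I": {"aliphatic", "hydrophobic"},
    "T": {"hydroxylic", "hydrophobic", "small", "polar"},
    "F": {"aromatic", "hydrophobic"},
}


def compare_properties(ref, mut):
    """Return the properties shared, lost and gained by a substitution."""
    ref_props, mut_props = PROPERTIES[ref], PROPERTIES[mut]
    return {
        "shared": sorted(ref_props & mut_props),
        "lost": sorted(ref_props - mut_props),
        "gained": sorted(mut_props - ref_props),
    }
```

For example, `compare_properties("C", "W")` reports hydrophobicity and polarity as shared, which matches the observation for position 264 above.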
Substitution matrices
|Position||AA1/AA2||BLOSUM62||BLOSUM62 lowest score (residues)||PAM1||PAM1 lowest score (residues)||PAM250||PAM250 lowest score (residues)||Interpretation|
|29||G/E||-2||-4 (I, L)||7||0 (I, W, Y)||9||2 (W)||BLOSUM62 rates the mutation as not very likely, whereas PAM1 and PAM250 rate it as not unusual|
|82||M/L||2||-3 (D, G)||8||0 (N, D, C, E, G, H, P, W, Y)||3||0 (C)||All three values are positive and quite high relative to the other values for methionine. All three matrices thus indicate that this mutation occurs quite often|
|125||Q/E||2||-3 (C, F, I)||27||0 (F, W, Y)||7||1 (C, F, W)||All three substitution matrices show that this mutation occurs quite often|
|166||Y/N||-2||-3 (D, G, P)||3||0 (R, D, Q, G, K, M, P)||2||1 (A, R, D, Q, E, G, K, P)||Since the values of all three matrices are low, the mutation does not occur very often and is therefore not very probable|
|249||G/S||0||-4 (I, L)||21||0 (I, W, Y)||11||2 (W)||All three values are high, which indicates that this mutation is quite common and therefore probably not very damaging|
|264||C/W||-2||-4 (E)||0||0 (N, D, Q, E, G, L, K, M, F, W)||1||1 (R, N, D, Q, E, L, K, M, F, W)||The scores are all low. This reflects that the mutation is rare, and it is therefore very likely to influence the function of the protein|
|265||R/W||-3||-3 (W, V, F, I, C)||8||0 (D, E, G, Y)||7||1 (F)||PAM1 and PAM250 have high values, whereas BLOSUM62 has a low value for this substitution. So BLOSUM62 rates this mutation as rare and probably damaging, while PAM1 and PAM250 rate it as quite common and therefore not very damaging|
|326||I/T||-1||-4 (G)||7||0 (G, A, P, W)||4||1 (W)||Again the three matrices give different results. Whereas BLOSUM62 rates the mutation as rare, PAM1 and PAM250 indicate that the mutation has no negative influence on the protein|
|409||F/C||-2||-4 (P)||0||0 (D, C, Q, E, K, P, V)||1||1 (R, D, C, Q, E, G, K, P)||The three values are all very low, which means that this mutation is very rare. This indicates that the mutation affects the function and the structure of the protein|
|438||Y/N||-2||-3 (D, G, P)||3||0 (R, D, Q, G, K, M, P)||2||1 (A, R, D, Q, E, G, K, P)||Again the scores are all low, which indicates a damaging effect of the mutation|
PSSM
Last position-specific scoring matrix computed:
|Position||Residue||A||R||N||D||C||Q||E||G||H||I||L||K||M||F||P||S||T||W||Y||V|
|29||G||1||0||0||-2||1||0||-2||4||1||-1||-3||-1||-3||-2||-3||1||-1||-4||-1||-1|
|82||M||-4||-5||-6||-6||-3||-5||-5||-6||-5||2||5||-5||5||-2||-5||-5||-3||-3||-3||1|
|125||Q||-1||-1||-3||-3||-5||8||0||-4||0||-3||-4||-1||-1||-6||-4||-1||1||-5||-4||-4|
|166||Y||3||-3||-4||-4||3||0||-3||-4||1||-2||-2||-3||0||-2||-4||-1||1||7||3||1|
|249||G||5||-4||-3||-4||-4||-3||-2||4||-4||-5||-5||-3||-4||-5||-4||1||-2||-5||-5||-4|
|264||C||-2||-5||-3||-5||9||-5||-5||-5||-5||3||-2||-5||-3||-4||-1||-3||-3||-5||-4||4|
|265||R||-3||4||2||-3||-5||5||2||-4||0||-2||-4||-1||-2||-5||-4||-2||-2||-5||-2||0|
|326||I||-3||-5||-6||-6||-4||-5||-6||-6||-6||7||0||-5||0||-2||-5||-5||-3||-5||-4||4|
|409||F||-4||-3||-6||-6||-3||-5||-5||-6||-3||0||1||-5||1||8||-6||-5||-3||0||1||-1|
|438||Y||0||-2||-2||-4||-2||-1||-1||-3||3||-3||-3||-3||-3||1||-5||-3||-3||3||8||-2|
Weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts:
|Position||Residue||A||R||N||D||C||Q||E||G||H||I||L||K||M||F||P||S||T||W||Y||V||Information per position||Relative weight|
|29||G||10||6||5||1||3||4||1||37||3||5||2||3||0||2||0||9||2||0||2||6||0.37||0.77|
|82||M||0||0||0||0||0||0||0||0||0||12||67||0||14||0||0||0||0||0||1||6||1.23||1.27|
|125||Q||4||2||0||0||0||74||2||0||1||1||1||2||1||0||0||2||9||0||0||0||1.46||1.28|
|166||Y||24||1||0||0||7||5||1||1||3||1||3||1||2||1||0||3||8||15||12||11||0.62||1.29|
|249||G||54||0||0||0||0||0||3||35||0||0||0||0||0||0||0||8||1||0||0||0||1.12||1.21|
|264||C||1||0||2||0||45||0||0||0||0||15||2||0||0||0||4||1||1||0||0||29||1.43||1.18|
|265||R||0||25||12||0||0||34||15||0||1||2||0||1||0||0||0||1||1||0||1||7||0.88||1.21|
|326||I||0||0||0||0||0||0||0||0||0||66||6||0||1||1||0||0||0||0||0||26||1.40||1.17|
|409||F||1||1||0||0||1||0||0||0||0||5||11||0||3||69||0||0||1||1||3||4||1.56||1.31|
|438||Y||9||2||1||0||1||3||3||1||6||0||1||0||0||1||0||0||0||3||66||2||1.34||0.89|
The values in the PSSM reflect the degree of conservation in a multiple alignment. The higher the value, the better the conservation, and therefore a substitution to the corresponding amino acid is usually tolerated, as both alleles have been passed on successfully. The PSSM values for our mutations have been colored orange. For most of the mutations the PSSM score is negative, which means that the substituted residue is not conserved at this position and the substitution is unlikely to be tolerated.
The Q125E mutation has a score of 0.
The G249S substitution has a score of +1, which is quite good. This indicates a small degree of conservation, and therefore this mutation might be tolerated in nature.
The highest score among our substitutions is +5 for the M82L mutation. According to the PSSM this substitution is conserved and likely to be tolerated. This can be explained by the similar size and hydrophobic nature of both amino acids.
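Reading the score of a substitution out of a PSSM row can be sketched as follows. The sketch assumes whitespace-separated rows in the layout shown on this page (position, residue, then 20 scores in the column order A R N D C Q E G H I L K M F P S T W Y V); the function names are chosen for this example only.

```python
# Column order of the 20 score columns in the matrix on this page.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def parse_pssm_row(line):
    """Split one PSSM row into position, residue and a residue -> score map."""
    fields = line.split()
    position, residue = int(fields[0]), fields[1]
    scores = [int(value) for value in fields[2:22]]
    return position, residue, dict(zip(AMINO_ACIDS, scores))


def substitution_score(line, mutant):
    """Return the PSSM score for replacing the row's residue by `mutant`."""
    _, _, scores = parse_pssm_row(line)
    return scores[mutant]


# Score columns of the rows for positions 82 and 249, copied from the matrix.
ROW_82 = "82 M -4 -5 -6 -6 -3 -5 -5 -6 -5 2 5 -5 5 -2 -5 -5 -3 -3 -3 1"
ROW_249 = "249 G 5 -4 -3 -4 -4 -3 -2 4 -4 -5 -5 -3 -4 -5 -4 1 -2 -5 -5 -4"
```

With these rows, `substitution_score(ROW_82, "L")` gives the +5 quoted for M82L and `substitution_score(ROW_249, "S")` the +1 quoted for G249S.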
Secondary Structure
To find out whether the mutations influence the secondary structure of the protein, we compared the predicted secondary structure of the sequence without mutations with that of the sequence including the mutations. To obtain the secondary structure of the two sequences we used PSIPRED.
We compared the structure for each position:
Position 29
seq:     SQAALLLLRQPGARGLARSHPPRQQQQFSSLDDK
non-mut: HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
mut:     HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
Position 82
seq:     VISGIPIYRVMDRQGQIINPSEDPHLPKEKV
non-mut: CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
mut:     CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
Position 125
seq:     KEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYG
non-mut: HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
mut:     HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
Position 166
seq:     EAGVLMYRDYPLELFMAQCYG
non-mut: HHHHHHHCCCCHHHHHHHHCC
mut:     CHHHHHHCCCCHHHHHHHHCC
Position 249
seq:     VVICYFGEGAASEGDAHAGFNFAATLECP
non-mut: EEEEEECCCCCCHHHHHHHHHHHHHHCCC
mut:     EEEEEECCCCCCCHHHHHHHHHHHHCCCC
Position 264
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 265
seq:     IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut:     EEEEEECCCEEECCCCCHHCCCH
Position 326
seq:     RAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
non-mut: HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
mut:     HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
Position 409
seq:     KPKPNPNLLFSDVYQEMPAQL
non-mut: CCCCCHHHHHHHHHCCCCHHH
mut:     CCCCCHHHHHHHHHCCCCHHH
Position 438
seq:     QEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
non-mut: CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
mut:     CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
As a comparison of the secondary structures at all positions shows, most of the mutations have no influence on the secondary structure, since nothing changes at the position of the mutation or in its neighbourhood. At positions 166, 249, 264 and 265 the mutations do influence the structure. The mutation at position 166 influences the secondary structure 6 residues earlier, because the helix which normally starts at position 160 now starts at position 161. The mutation at position 249 shortens the surrounding helix, which starts one residue later and ends one residue earlier than without the mutation. Because the mutations at positions 264 and 265 are next to each other, it is not clear which of them is responsible for the change in the secondary structure, or whether it is the combination of the two. Nevertheless there is a change in the neighbourhood of these mutations: four or five residues after the mutation a beta sheet occurs which is not present in the wild-type structure. Additionally, the helix which should start 19 or 20 residues after the mutation starts one position earlier.
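The position-by-position comparison described above can be automated with a small helper. This is a minimal sketch, assuming both predictions are given as equally long strings over H, E and C; the function name and the `start` parameter (sequence position of the first character of the window) are illustrative choices, and the two example strings are the window around position 166.

```python
def structure_changes(wild_type, mutant, start=1):
    """List (position, wild-type state, mutant state) wherever the two
    secondary structure strings differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("both structure strings must have the same length")
    return [
        (start + offset, wt_state, mut_state)
        for offset, (wt_state, mut_state) in enumerate(zip(wild_type, mutant))
        if wt_state != mut_state
    ]


# Predicted states for the window around position 166, which starts at
# residue 160 according to the text above.
WT_166 = "HHHHHHHCCCCHHHHHHHHCC"
MUT_166 = "CHHHHHHCCCCHHHHHHHHCC"
```

Calling `structure_changes(WT_166, MUT_166, start=160)` reports a single change at position 160, where the helix state is replaced by coil, so the helix begins at position 161 in the mutant.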
Multiple Sequence Alignment
To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.
With these 25 results we built a multiple alignment using CLUSTALW. The alignment of all mammalian homologs was quite poor because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the others. We therefore removed these two sequences and realigned the remaining ones.
With this new multiple alignment we could analyze the 10 positions of our mutations to find out how well they are conserved.
|position||conservation wildtype||conservation mutant|
The results show that the amino acids at all observed positions are very well conserved, since the value is always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%. This is not as high as the other results, but the position is still well conserved. Regions of a protein which are well conserved are probably very important for its structure and function. Because all amino acids are very well conserved, mutations at these positions can be very damaging and can have a huge impact on the protein and its function.
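How such a conservation value can be obtained from a multiple alignment is sketched below. The page does not state how the values were computed, so this sketch assumes that conservation is the fraction of aligned (non-gap) sequences carrying a given residue in the column of interest. The alignment is passed as a list of equally long gapped strings, and all names are illustrative.

```python
def alignment_column(alignment, position, reference=0):
    """Return the alignment column that holds residue `position`
    (1-based, gaps not counted) of the reference sequence."""
    seen = 0
    for index, char in enumerate(alignment[reference]):
        if char != "-":
            seen += 1
            if seen == position:
                return "".join(sequence[index] for sequence in alignment)
    raise ValueError("position lies outside the reference sequence")


def conservation(column, residue):
    """Fraction of non-gap characters in the column equal to `residue`."""
    residues = [char for char in column if char != "-"]
    if not residues:
        return 0.0
    return residues.count(residue) / len(residues)
```

Applied once with the wild-type residue and once with the mutant residue, this yields the two values per position that the table above is meant to list.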
SNAP
To run SNAP we used the command:
snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out
|nsSNP||Prediction||Reliability Index||Expected Accuracy|
The output of SNAP shows that most of the mutations would have a damaging effect on the structure and function of the protein. Only the mutations at positions 29 and 249 would not have an influence on the protein.
Position-specific prediction
A second SNAP run was performed in which each of the ten chosen positions was mutated to all possible substitutions. This run should show whether the wild-type amino acid is essential at the corresponding position of the sequence, or whether the mutation is not tolerated because it introduces an unwanted effect by drastically changing the physicochemical properties of the amino acid.
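The list of substitutions for such a run can be generated programmatically. The following is a sketch under assumptions: it writes each substitution in the notation used on this page (for example G29E), which may or may not be the exact format SNAP expects in its mutation file, and the names are chosen for this example. The ratio helper assumes 19 possible substitutions per position.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residues at the ten analyzed positions (from the tables on this page).
WILD_TYPE = {
    29: "G", 82: "M", 125: "Q", 166: "Y", 249: "G",
    264: "C", 265: "R", 326: "I", 409: "F", 438: "Y",
}


def all_substitutions(wild_type):
    """Return every possible single substitution, e.g. 'G29E', per position."""
    return [
        f"{ref}{position}{alt}"
        for position, ref in sorted(wild_type.items())
        for alt in AMINO_ACIDS
        if alt != ref
    ]


def tolerated_ratio(tolerated_count):
    """Share of the 19 possible substitutions predicted to be tolerated."""
    return tolerated_count / (len(AMINO_ACIDS) - 1)
```

Ten positions with 19 alternatives each give 190 substitutions in total.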
The following table shows to what extent each position is predicted to tolerate mutations.
|Mutation||Tolerated Substitutions||Non-tolerated Substitutions||Ratio tolerated Mutations|
This table shows that only position 29 is not essential for the protein's function. This can be explained by the fact that this position does not lie within the actual protein sequence.
The position which allows the most substitutions is the tyrosine at position 166. This tyrosine constitutes the end of a helix on the surface of the protein, which might explain the variability of this position. The selected mutation to asparagine is not tolerated because of its quite different structure.
Position 82 tolerates mutation to 3 amino acids, which are all structurally very similar. So for this position the structure seems to be important. The same is true for position 249, where alanine, glycine and serine are predicted to be neutral to the protein function. All of these amino acids are quite small and may therefore not disturb the protein structure and function.
At position 326 all three branched-chain amino acids are predicted to have a neutral effect on the protein's function. These amino acids are structurally and biochemically very similar, and therefore a substitution is tolerated. The mutation to threonine, however, is not tolerated, as this amino acid has different properties.
At all other positions no substitution is tolerated. This means that at these positions the wild-type residues are essential for the protein's structure and function and cannot be replaced by any other amino acid.
SIFT
The following table displays the SIFT results. The threshold for intolerance is 0.05.
The amino acids are colored in the following way:
- uncharged polar
Capital letters: amino acids appear in the alignment
Lower case letters: amino acids result from prediction
Seq Rep: fraction of sequences that contain one of the basic amino acids
The only substitutions SIFT predicts not to affect protein function are G29E and M82L. The first substitution may be tolerated because this position is not within the actual protein sequence. The second tolerated exchange is from methionine to leucine. These two amino acids are quite similar in structure and physicochemical properties, so an exchange can be tolerated.
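Applying the intolerance threshold to SIFT scores is a one-line decision, sketched here. The threshold of 0.05 is the one stated above. How a score of exactly 0.05 is treated is an assumption of this sketch, and the function name is illustrative. The example scores are the ones quoted elsewhere on this page.

```python
SIFT_THRESHOLD = 0.05  # threshold for intolerance as stated above


def sift_prediction(score, threshold=SIFT_THRESHOLD):
    """Classify a SIFT score; scores below the threshold count as not tolerated.
    Treating a score equal to the threshold as tolerated is an assumption."""
    return "tolerated" if score >= threshold else "not tolerated"


# SIFT scores quoted in the comparison section of this page.
SIFT_SCORES = {"G29E": 0.68, "M82L": 0.65, "Q125E": 0.0, "G249S": 0.0, "Y438N": 0.0}
PREDICTIONS = {name: sift_prediction(score) for name, score in SIFT_SCORES.items()}
```

Under this rule G29E and M82L come out as tolerated and the other three as not tolerated, in line with the SIFT results described above.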
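Polyphen2
|Position||AA1/AA2||HumDiv prediction||HumDiv score||HumDiv sensitivity||HumDiv specificity||HumVar prediction||HumVar score||HumVar sensitivity||HumVar specificity|
|166||Y/N||probably damaging||0.997||0.40||0.98||probably damaging||0.964||0.59||0.93|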
|264||C/W||probably damaging||1.000||0.00||1.00||probably damaging||1.000||0.00||1.00|
|265||R/W||probably damaging||1.000||0.00||1.00||probably damaging||1.000||0.00||1.00|
|326||I/T||probably damaging||0.997||0.40||0.98||probably damaging||0.998||0.16||0.99|
|409||F/C||probably damaging||0.998||0.27||0.99||probably damaging||0.939||0.64||0.92|
|438||Y/N||probably damaging||1.000||0.00||1.00||probably damaging||0.987||0.49||0.96|
Polyphen2 uses two different datasets for its predictions. As the results show, the two predictions are not always the same. The prediction with the HumDiv dataset reports three mutations that possibly have no grave effect on the function or structure of the protein, whereas HumVar reports four mutations that may have no damaging influence. The three mutations marked as "benign" with both datasets are at positions 29, 82 and 249. The mutation predicted as benign only with the HumVar dataset is at position 125.
Comparison of the predicted results of this TASK
|Position||AA1/AA2||BLOSUM62||PAM1||PAM250||PSSM||Secondary Structure||Multiple Alignment||SNAP||SIFT||Polyphen2|
- Except for BLOSUM62 and the PSSM, all tools and sources indicate that the G29E mutation is neutral. It is interesting that these two sources have quite low scores, which indicates that they are sure about their prediction. However, the other matrices have very high values, and the position is not strongly conserved at only 72%, which shows that this amino acid changes quite often. It is therefore possible that the mutation is neutral. The secondary structure prediction supports this assumption, because there is no change between the original and the mutated secondary structure. Additionally, the residue lies in a coil region, which means it is not part of a regular structural element. It has to be mentioned that SNAP is not sure about its prediction of neutrality: its reliability index of 0 is the lowest possible, so the prediction is very uncertain. In contrast, SIFT and PolyPhen2 are sure about their predictions, because SIFT has a quite high score (0.68) and PolyPhen2 has two very low scores (0.025 and 0.018). Since nearly all methods predict the mutation as neutral and it lies in the signal peptide of our protein, we also assume the mutation to be neutral.
- The mutation at position 82 from methionine to leucine is predicted to be neutral by most of the methods. Only SNAP and the conservation score declare the mutation non-neutral. The SNAP prediction has a reliability index of 4, which means that SNAP is very sure about its result. The conservation score shows that this position is indeed very well conserved, with a value of 96%. But the value is not 1, which shows that some mutations at this position are possible. In contrast to these two results, the substitution matrices indicate that the mutation is neutral, because all of them have high values for the change from methionine to leucine. This is due to the structural similarity of these two amino acids. The PSSM score is also very high, which indicates that this mutation has no damaging effect on the structure and function of the protein. SIFT and PolyPhen2 are also very sure that this mutation is neutral: SIFT predicts it with a high value of 0.65 and PolyPhen2 with a low value of 0.001. Further information comes from the structures of the two proteins. Comparing them shows that there is nearly no change in the structure of the protein. Because of this and all the other methods which declare this mutation neutral, we also predict it to be neutral.
- Based on the predictions of the different tools it is not possible to decide whether the mutation from glutamine to glutamic acid is neutral or not, because the tools disagree completely. The structures of the two amino acids are quite similar, but their physicochemical properties differ. This can explain the different predictions, depending on whether a tool takes the amino acid properties or the structure into account when predicting the effect on the protein's function. It is also important to note that most of the tools are not completely sure about their prediction. For example, the PSSM value is 0, which is exactly on the border between neutral and non-neutral. SNAP is not very sure either, because its reliability index is only 1. In contrast, the prediction of SIFT is very safe, since the value for this mutation is 0.0, which is the lowest possible value. PolyPhen2 is uncertain, given that the two datasets lead to two different predictions. Because of these different results it can be assumed that if this mutation is non-neutral, its influence is minimal. Since the experimental structure is different, we would say that this mutation is non-neutral, but, as already said, its influence is probably minimal, so the effect on the protein function is not that grave.
- The mutation from tyrosine to asparagine at position 166 is predicted to be non-neutral by all tools and sources. The values of the substitution matrices and the PSSM score are all low, which shows that this mutation does not occur very often. The rarity of a change at this position is also shown by the conservation score, which is 1. This reflects that the amino acid at position 166 plays an important role for the structure and function of the protein, so a mutation to another amino acid is not neutral. The secondary structure prediction shows changes caused by the different amino acid. PolyPhen2 in particular shows how certain this prediction is, since the scores are 0.997 (HumDiv) and 0.964 (HumVar), with 1.0 being the highest score. The prediction can be confirmed by looking at the experimental structure of the two amino acids. The mutant residue has no aromatic ring, while the original residue has one. Because of this structural difference the physicochemical properties of the two amino acids are not the same, and a mutation would have a huge impact on the protein's function.
- For G249S the results of the tools and sources are contradictory again. While BLOSUM62 indicates that the mutation is non-neutral, PAM1, PAM250 and the PSSM score show that the mutation is neutral. It has to be pointed out, however, that the BLOSUM62 value is 0, which means that it is on the border to neutral. So according to the substitution matrices and the PSSM score the mutation is neutral. The predicted secondary structure, on the other hand, differs, since the helix in the mutated protein is two positions shorter than the original one. This indicates that the mutation is non-neutral, and the fact that this position is 100% conserved also lets us assume that the mutation would influence the structure and function of the protein. SNAP predicts the mutation to be neutral, but its reliability index of only 1 indicates that it is not sure. In contrast to SNAP, SIFT classifies the mutation as non-neutral and is very sure, because the value for serine is 0.0, which is the lowest possible value. The two predictions of PolyPhen2 declare this SNP neutral. All in all it is not clear what effect this mutation has, and again it is possible that a change in the structure of the protein, if there is one, is not grave, since the different tools are so unsure. The experimental structure shows that the mutated residue has an additional side chain, but both amino acids are very small and the mutation might be tolerated. All things considered, we predict this mutation to be non-neutral.
- The classification of the C264W mutation as non-neutral is clear, since all tools predict it to be non-neutral. The substitution matrices show that this substitution is very uncommon, because the values of all three matrices are very low. The PSSM score is one of the lowest possible values, which also indicates that a change of this amino acid is very rare. The conservation score can be interpreted the same way: it is 1, which means that mutations at this position occur very rarely, and it can be assumed that this amino acid plays an important role in the structure and function of the protein. This is also confirmed by the SNAP scan using all possible mutations. With this knowledge the mutation has to be non-neutral. The three tools SNAP, SIFT and PolyPhen2 are all absolutely sure about their prediction, because all of them have the best possible scores for predicting the influence of the mutation. The experimental structure and the differences between the two amino acids strengthen the assumption that the mutation is non-neutral.
- The R265W mutation is predicted to be non-neutral by nearly every tool or source. Only PAM1 and PAM250 declare this substitution neutral, with high scores of 8 and 7. This is curious, because all other tools and sources are absolutely sure that a change of this amino acid will have an effect on the protein. The PSSM score is -5, which is really low and shows that a mutation at this position is unlikely to be neutral. The conservation score of 1 also shows that this amino acid is very well conserved and that a change would be fatal. The tools SNAP, SIFT and PolyPhen2 predict this mutation to be definitely damaging. To be really sure we also consider the predicted and the experimental structure of the protein. The predicted structure clearly changes, because a beta sheet occurs although there is none in the original protein, and additionally the coil after the mutation is one residue shorter than in the wild type. The damaging effect of this mutation can furthermore be explained by the fact that this position is absolutely essential for the protein's function, as SNAP predicts no mutation at this position to be tolerated. Therefore we declare this mutation non-neutral.
- Most of the prediction methods consider the I326T mutation non-neutral. Only PAM1, PAM250 and the predicted secondary structure declare this mutation neutral. It is interesting that the two substitution matrices have high values of 7 and 4, which suggests that this substitution is quite common. The predicted structure supports this, because no change occurs. But all the other methods predict this mutation to be non-neutral. BLOSUM62 and the PSSM, for example, have very low values, which indicates a damaging effect of this mutation. This position is also very well conserved, which suggests that a change at this position does not occur often and that a mutation would have a damaging influence on the protein. The three prediction methods declare the SNP damaging with very high confidence. The properties of the two amino acids also indicate that this mutation has an influence, since they are completely different. Summarizing all the results, we also predict the mutation to be non-neutral.
- Except for the predicted secondary structure, all methods declare the mutation from phenylalanine to cysteine non-neutral. It is possible that the mutation has an effect on a very distant position, which is not detected by looking at the immediate neighbourhood of the mutated position. This can explain the secondary structure prediction. All the other methods are sure about their prediction, which can be seen for example in the very low values of the substitution matrices and of the PSSM score. The three prediction tools have scores which point out that the mutation is clearly non-neutral. To back up this prediction we also look at the amino acids and their properties. This comparison confirms the prediction, since a bulky hydrophobic ring is replaced by a small, sulfur-containing side chain. This shows that the two amino acids are totally different. Furthermore, SNAP predicts this position to be essential for the protein's function, as no amino acid other than phenylalanine is tolerated at this position. Because of all these differences we decide that the mutation is non-neutral.
- The situation at position 438 is nearly the same as at position 409. The mutation of tyrosine to asparagine is predicted to be non-neutral by nearly all methods. Only the predicted secondary structure could lead to the assumption that the mutation is neutral. One explanation for this result is that the mutation influences a position which is far away, so the change in the secondary structure cannot be recognized. This is very likely, since all the other methods predict this substitution to be non-neutral. The prediction by the substitution matrices and the PSSM is not as sure as for the mutation at position 409, since the values are not as low, but they are still low enough to decide that the mutation will influence the structure and function of the protein. SNAP and SIFT are as sure as for the previous mutation, since the values are very good (SNAP: 4, SIFT: 0.0). PolyPhen2 is even more certain about the result, because it predicts the damaging influence of this mutation with 1.0 (HumDiv) and 0.987 (HumVar). Again we also look at the physicochemical properties and the structure of the two amino acids to find the differences. The only property which both amino acids share is polarity. The other properties are completely different, since the mutated amino acid is acidic and small whereas the original amino acid is hydrophobic and aromatic. The missing aromatic ring may be the reason why SNAP only allows tyrosine at this position. Since all the methods as well as the structure and the properties indicate that the mutation is non-neutral, we also predict it to be non-neutral.
Comparison of the prediction of the tools with the annotation in dbSNP and HGMD
In order to categorize the SNPs extracted from dbSNP and HGMD as "neutral" or "non-neutral" concerning their effect on the protein function, the following assumption was made:
All SNPs listed in HGMD are disease-related mutations. SNPs that are listed in both databases do have an effect on the protein's structure and function. All SNPs listed only in dbSNP do not affect the protein function.
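The assumption above amounts to a simple decision rule, sketched here. The function name and its two boolean parameters are illustrative and not taken from any database client.

```python
def categorize(in_dbsnp, in_hgmd):
    """Label a SNP following the assumption stated above: anything listed
    in HGMD (alone or together with dbSNP) counts as non-neutral, anything
    listed only in dbSNP counts as neutral."""
    if in_hgmd:
        return "non-neutral"
    if in_dbsnp:
        return "neutral"
    raise ValueError("SNP is listed in neither database")
```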
Prediction accuracy of the tools
The following table shows the number of correctly predicted effects on the protein function, compared with the actual effect on the protein as annotated in dbSNP and HGMD. (TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives)
When comparing the different prediction methods, one should look at the accuracy of the predictions. In our case SIFT performed best and made the most correct predictions. PolyPhen2 using the HumDiv dataset also predicted 8 out of 10 effects correctly, whereas SNAP and PolyPhen2 (HumVar) correctly predicted the effect of only 7 mutations on the protein function.
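The counts and the accuracy used in this comparison can be derived as sketched below. The sketch assumes that "non-neutral" is the positive class and that predictions and annotations are given as equally long lists of labels; the function names are illustrative.

```python
def confusion_counts(predicted, annotated, positive="non-neutral"):
    """Count TP, TN, FP and FN for two equally long lists of labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pred, truth in zip(predicted, annotated):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions: (TP + TN) / all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total
```

A tool with 8 correct predictions out of 10 reaches an accuracy of 0.8, one with 7 correct predictions an accuracy of 0.7.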