Sequence-based mutation analysis BCKDHA
We chose the following mutations for the sequence-based mutation analysis:
The mutation positions are relative to the UniProt reference sequence.
A protocol describing all the steps for running the programs was created.
Amino Acid Properties
|Position||Reference amino acid||Properties (reference)||Mutated amino acid||Properties (mutant)||Secondary structure|
|29||G||tiny, small||E||charged, polar||C|
|82||M||sulphur containing, hydrophobic||L||hydrophobic, aliphatic||C|
|125||Q||acidic, polar||E||charged, polar||C|
|249||G||tiny, small||S||polar, small, tiny, hydroxylic||H|
|264||C||sulphur containing, hydrophobic, tiny, small, polar||W||hydrophobic, aromatic, polar||E|
|265||R||charged, positive (basic), polar||W||hydrophobic, aromatic, polar||E|
|326||I||aliphatic, hydrophobic||T||hydroxylic, hydrophobic, small, polar||E|
|361||I||aliphatic, hydrophobic||V||aliphatic, hydrophobic, small||H|
|409||F||aromatic, hydrophobic||C||sulphur containing, hydrophobic, tiny, small, polar||C|
|438||Y||hydrophobic, aromatic, polar||N||acidic, polar, small||C|
Annotation: H = helix, E = beta-sheet, C = coil
To visualize the mutations in the three-dimensional protein structure, we used the PDB entry for BCKDHA, 1U5B. As the PDB file only contains coordinates for the mature protein, the signal peptide (the first 45 amino acids) is not included. Therefore, the first mutation at position 29, which lies in the signal peptide, could not be visualized.
The protocol describes in detail how we used PyMOL to visualize our mutations.
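The visualization step can be sketched as a small script generator that writes PyMOL commands and reports which positions fall inside the signal peptide. This is a hypothetical helper, not the protocol itself; it assumes that BCKDHA is chain A in 1U5B and that PDB residue numbers equal UniProt positions minus an `offset` (0 by default). Both assumptions should be checked against the PDB file.

```python
# Hypothetical helper: build a PyMOL command script (.pml) for 1U5B.
# ASSUMPTIONS: BCKDHA is chain A, and PDB residue numbers equal UniProt
# positions minus `offset` (0 by default); verify both in the PDB file.

SIGNAL_PEPTIDE_LENGTH = 45  # the first 45 residues are not in the structure


def pymol_script(positions, pdb_id="1u5b", chain="A", offset=0):
    """Return (PyMOL commands, positions that cannot be visualized)."""
    shown = [p for p in positions if p > SIGNAL_PEPTIDE_LENGTH]
    skipped = [p for p in positions if p <= SIGNAL_PEPTIDE_LENGTH]
    resi = "+".join(str(p - offset) for p in shown)
    commands = [
        f"fetch {pdb_id}",
        "hide everything",
        "show cartoon",
        f"select mutations, chain {chain} and resi {resi}",
        "show sticks, mutations",
        "color orange, mutations",
    ]
    return "\n".join(commands), skipped


if __name__ == "__main__":
    positions = [29, 82, 125, 166, 249, 264, 265, 326, 409, 438]
    script, skipped = pymol_script(positions)
    print(script)
    print("Not visualizable (signal peptide):", skipped)  # [29]
```

The output can be saved as a `.pml` file and loaded in PyMOL with `@file.pml`.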
|Position||Mutation||BLOSUM62||BLOSUM62 minimum (amino acids)||PAM1||PAM1 minimum (amino acids)||PAM250||PAM250 minimum (amino acids)||Interpretation|
|29||G/E||-2||-4 (I, L)||7||0 (I, W, Y)||9||2 (W)||BLOSUM62 says that the mutation is not very likely, whereas PAM1 and PAM250 say that the mutation is not unusual.|
|82||M/L||2||-3 (D, G)||8||0 (N, D, C, E, G, H, P, W, Y)||3||0 (C)||All three values are positive and high relative to the other values for methionine. All three matrices therefore indicate that this mutation occurs quite often.|
|125||Q/E||2||-3 (C, F, I)||27||0 (F, W, Y)||7||1 (C, F, W)||All three substitution matrices show that this mutation occurs quite often.|
|166||Y/N||-2||-3 (D, G, P)||3||0 (R, D, Q, G, K, M, P)||2||1 (A, R, D, Q, E, G, K, P)||The values of all three matrices are low, so the mutation is rare and therefore not very likely.|
|249||G/S||0||-4 (I, L)||21||0 (I, W, Y)||11||2 (W)||All three values are relatively high, which means that this mutation is frequent and therefore probably not very damaging.|
|264||C/W||-2||-4 (E)||0||0 (N, D, Q, E, G, L, K, M, F, W)||1||1 (R, N, D, Q, E, L, K, M, F, W)||The scores are all low. This reflects that the mutation is rare, so it is very likely that it affects the function of the protein.|
|265||R/W||-3||-3 (W, V, F, I, C)||8||0 (D, E, G, Y)||7||1 (F)||PAM1 and PAM250 have high values, whereas BLOSUM62 has a low value. BLOSUM62 thus indicates that this mutation is rare and probably damaging, while PAM1 and PAM250 indicate that it is quite frequent and therefore not very damaging.|
|326||I/T||-1||-4 (G)||7||0 (G, A, P, W)||4||1 (W)||Again the three matrices disagree. Whereas BLOSUM62 says that the mutation is rare, PAM1 and PAM250 suggest that it is frequent and therefore probably has no adverse effect on the protein.|
|409||F/C||-2||-4 (P)||0||0 (D, C, Q, E, K, P, V)||1||1 (R, D, C, Q, E, G, K, P)||All three values are very low, which means that this mutation is very rare. This suggests that the mutation damages the function and the structure of the protein.|
|438||Y/N||-2||-3 (D, G, P)||3||0 (R, D, Q, G, K, M, P)||2||1 (A, R, D, Q, E, G, K, P)||Again the scores are all low, which indicates that the mutation is damaging.|
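The BLOSUM62 columns of the table above (score of the substitution, lowest value in the wild-type row and the amino acids that reach it) can be reproduced with a few lines of Python. This is a sketch: only the standard BLOSUM62 rows for the eight wild-type residues are included, and `substitution_summary` is a hypothetical helper name.

```python
# Sketch: reproduce the BLOSUM62 columns of the substitution-matrix table.
# Only the standard BLOSUM62 rows of the eight wild-type residues are included.
ORDER = "ARNDCQEGHILKMFPSTWYV"

BLOSUM62_ROWS = {
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "M": [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1, -1, 1],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "Y": [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],
    "C": [0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],
}


def substitution_summary(wt, mut):
    """Return (score of wt->mut, row minimum, amino acids with that minimum)."""
    row = dict(zip(ORDER, BLOSUM62_ROWS[wt]))
    lowest = min(row.values())
    worst = [aa for aa in ORDER if row[aa] == lowest]
    return row[mut], lowest, worst


MUTATIONS = [("G", 29, "E"), ("M", 82, "L"), ("Q", 125, "E"), ("Y", 166, "N"),
             ("G", 249, "S"), ("C", 264, "W"), ("R", 265, "W"), ("I", 326, "T"),
             ("F", 409, "C"), ("Y", 438, "N")]

for wt, pos, mut in MUTATIONS:
    score, lowest, worst = substitution_summary(wt, mut)
    print(f"{pos} {wt}/{mut}: {score} (row minimum {lowest}: {', '.join(worst)})")
```

The same lookup works for PAM matrices if their rows are filled in instead; the PAM1 and PAM250 values in the table are not reproduced here.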
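Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts (PSI-BLAST output, shown only for the mutated positions).
Log-odds scores:
|Position||Wild type||A||R||N||D||C||Q||E||G||H||I||L||K||M||F||P||S||T||W||Y||V|
|29||G||1||0||0||-2||1||0||-2||4||1||-1||-3||-1||-3||-2||-3||1||-1||-4||-1||-1|
|82||M||-4||-5||-6||-6||-3||-5||-5||-6||-5||2||5||-5||5||-2||-5||-5||-3||-3||-3||1|
|125||Q||-1||-1||-3||-3||-5||8||0||-4||0||-3||-4||-1||-1||-6||-4||-1||1||-5||-4||-4|
|166||Y||3||-3||-4||-4||3||0||-3||-4||1||-2||-2||-3||0||-2||-4||-1||1||7||3||1|
|249||G||5||-4||-3||-4||-4||-3||-2||4||-4||-5||-5||-3||-4||-5||-4||1||-2||-5||-5||-4|
|264||C||-2||-5||-3||-5||9||-5||-5||-5||-5||3||-2||-5||-3||-4||-1||-3||-3||-5||-4||4|
|265||R||-3||4||2||-3||-5||5||2||-4||0||-2||-4||-1||-2||-5||-4||-2||-2||-5||-2||0|
|326||I||-3||-5||-6||-6||-4||-5||-6||-6||-6||7||0||-5||0||-2||-5||-5||-3||-5||-4||4|
|409||F||-4||-3||-6||-6||-3||-5||-5||-6||-3||0||1||-5||1||8||-6||-5||-3||0||1||-1|
|438||Y||0||-2||-2||-4||-2||-1||-1||-3||3||-3||-3||-3||-3||1||-5||-3||-3||3||8||-2|
Weighted observed percentages (rounded down), information per position and relative weight of gapless real matches to pseudocounts:
|Position||Wild type||A||R||N||D||C||Q||E||G||H||I||L||K||M||F||P||S||T||W||Y||V||Information||Relative weight|
|29||G||10||6||5||1||3||4||1||37||3||5||2||3||0||2||0||9||2||0||2||6||0.37||0.77|
|82||M||0||0||0||0||0||0||0||0||0||12||67||0||14||0||0||0||0||0||1||6||1.23||1.27|
|125||Q||4||2||0||0||0||74||2||0||1||1||1||2||1||0||0||2||9||0||0||0||1.46||1.28|
|166||Y||24||1||0||0||7||5||1||1||3||1||3||1||2||1||0||3||8||15||12||11||0.62||1.29|
|249||G||54||0||0||0||0||0||3||35||0||0||0||0||0||0||0||8||1||0||0||0||1.12||1.21|
|264||C||1||0||2||0||45||0||0||0||0||15||2||0||0||0||4||1||1||0||0||29||1.43||1.18|
|265||R||0||25||12||0||0||34||15||0||1||2||0||1||0||0||0||1||1||0||1||7||0.88||1.21|
|326||I||0||0||0||0||0||0||0||0||0||66||6||0||1||1||0||0||0||0||0||26||1.40||1.17|
|409||F||1||1||0||0||1||0||0||0||0||5||11||0||3||69||0||0||1||1||3||4||1.56||1.31|
|438||Y||9||2||1||0||1||3||3||1||6||0||1||0||0||1||0||0||0||3||66||2||1.34||0.89|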
The values in the PSSM reflect how often each amino acid occurs at a given position of a multiple alignment, relative to its background frequency. The higher the value for an amino acid at a position, the more often it is observed there in homologous sequences; a substitution to that amino acid is therefore usually tolerated, as both alleles have been passed on successfully. The PSSM values for our mutations have been colored orange. For most of the mutations the PSSM score is negative, which means that the mutant amino acid is rarely observed at this position and the substitution is not likely to be tolerated.
The Q125E mutation has a score of 0.
The G249S substitution has a score of +1, which is comparatively good. This indicates that serine occurs at this position in some homologues, so this mutation might be tolerated in nature.
The highest score among our substitutions is +5, for the M82L mutation. Leucine is in fact the most frequently observed residue at this position (67% versus 14% for methionine), so this mutation is likely to be tolerated.
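Looking up the mutant residue in a PSSM row can be sketched as follows. The sketch assumes the row layout of PSI-BLAST's ASCII PSSM (position, residue, 20 log-odds scores, 20 weighted percentages, information, relative weight), and the `verdict` helper encodes the reading used in this section (negative = not tolerated, 0 = borderline); both helper names are hypothetical.

```python
# Sketch: read one row of PSI-BLAST's ASCII PSSM and look up the mutant residue.
# Assumed layout: position, residue, 20 log-odds scores, 20 weighted
# percentages, information per position, relative weight.
ORDER = "ARNDCQEGHILKMFPSTWYV"


def parse_pssm_row(line):
    fields = line.split()
    if len(fields) != 44:
        raise ValueError(f"expected 44 fields, got {len(fields)}")
    position, wild_type = int(fields[0]), fields[1]
    scores = dict(zip(ORDER, map(int, fields[2:22])))
    percentages = dict(zip(ORDER, map(int, fields[22:42])))
    return position, wild_type, scores, percentages


def verdict(score):
    """Reading used in the text: negative = not tolerated, 0 = borderline."""
    if score > 0:
        return "likely tolerated"
    if score == 0:
        return "borderline"
    return "unlikely to be tolerated"


row = ("82 M -4 -5 -6 -6 -3 -5 -5 -6 -5 2 5 -5 5 -2 -5 -5 -3 -3 -3 1 "
       "0 0 0 0 0 0 0 0 0 12 67 0 14 0 0 0 0 0 1 6 1.23 1.27")
pos, wt, scores, pct = parse_pssm_row(row)
print(f"{wt}{pos}L: score {scores['L']}, observed {pct['L']}%, {verdict(scores['L'])}")
# M82L: score 5, observed 67%, likely tolerated
```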
To find out whether the mutations influence the secondary structure of the protein, we compared the secondary structure predicted for the wild-type sequence with that predicted for the sequence including the mutations. Both predictions were made with PSIPRED.
We compared the structure for each position:
Position 29
seq: SQAALLLLRQPGARGLARSHPPRQQQQFSSLDDK
non-mut: HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
mut: HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC

Position 82
seq: VISGIPIYRVMDRQGQIINPSEDPHLPKEKV
non-mut: CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
mut: CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH

Position 125
seq: KEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYG
non-mut: HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
mut: HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC

Position 166
seq: EAGVLMYRDYPLELFMAQCYG
non-mut: HHHHHHHCCCCHHHHHHHHCC
mut: CHHHHHHCCCCHHHHHHHHCC

Position 249
seq: VVICYFGEGAASEGDAHAGFNFAATLECP
non-mut: EEEEEECCCCCCHHHHHHHHHHHHHHCCC
mut: EEEEEECCCCCCCHHHHHHHHHHHHCCCC

Positions 264 and 265 (both mutations lie in the same window)
seq: IIFFCRNNGYAISTPTSEQYRGD
non-mut: EEEEEECCCCCCCCCCCHHCCCC
mut: EEEEEECCCEEECCCCCHHCCCH

Position 326
seq: RAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
non-mut: HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC
mut: HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC

Position 409
seq: KPKPNPNLLFSDVYQEMPAQL
non-mut: CCCCCHHHHHHHHHCCCCHHH
mut: CCCCCHHHHHHHHHCCCCHHH

Position 438
seq: QEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
non-mut: CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
mut: CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
Comparing the secondary structures at all positions shows that most of the mutations have no influence on the secondary structure, since there are no changes at the position of the mutation or in its neighbourhood. At positions 166, 249, 264 and 265, however, the mutations do change the predicted structure. The mutation at position 166 affects the secondary structure six residues upstream: the helix that normally starts at position 160 now starts at position 161. The mutation at position 249 shortens the surrounding helix, which starts one residue later and ends one residue earlier than without the mutation. Because the mutations at positions 264 and 265 are adjacent, it is not clear which of them is responsible for the change in the secondary structure, or whether it is caused by the combination of the two. In any case, the neighbourhood of these mutations changes: four or five residues after them, a beta strand appears that is not present in the wild-type prediction. Additionally, the helix that should start 19 or 20 residues after the mutations starts one position earlier.
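The position-by-position comparison of the PSIPRED strings can be automated. The sketch below lists every residue whose predicted state differs and extracts helix or strand segments, so that shifted segment boundaries become explicit. The helper names are hypothetical, and the `first_position=160` in the example is taken from the statement about position 166 above.

```python
# Sketch: compare two PSIPRED strings (H/E/C) for the same sequence window.
def ss_differences(wild_type, mutant, first_position=1):
    """List (position, wild-type state, mutant state) wherever predictions differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("both predictions must cover the same window")
    return [(first_position + i, w, m)
            for i, (w, m) in enumerate(zip(wild_type, mutant)) if w != m]


def segments(ss, state):
    """Return (start, end) indices (0-based, inclusive) of runs of `state`."""
    runs, start = [], None
    for i, s in enumerate(ss + " "):  # trailing sentinel closes the last run
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


# Window around Y166; per the text, its first residue is position 160.
wt = "HHHHHHHCCCCHHHHHHHHCC"
mut = "CHHHHHHCCCCHHHHHHHHCC"
print(ss_differences(wt, mut, first_position=160))  # [(160, 'H', 'C')]
print(segments(wt, "H"), segments(mut, "H"))  # [(0, 6), (11, 18)] [(1, 6), (11, 18)]
```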
Multiple Sequence Alignment
To find sequences homologous to BCKDHA, we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.
We aligned these 25 sequences with CLUSTALW. The alignment of all mammalian homologues was of poor quality because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the others. We therefore removed these two sequences and realigned the remaining ones.
With this new multiple alignment we analyzed the 10 positions of our mutations to find out how well they are conserved.
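Removing sequences that are much longer than the rest before realigning can be sketched as a simple length filter. The cutoff of 1.5 times the median length and the toy records are hypothetical; the two sequences above were removed by inspection, not necessarily with this rule.

```python
# Sketch: drop sequences much longer than the rest before realigning.
# The cutoff (1.5 x median length) and the toy records are hypothetical.
from statistics import median


def drop_length_outliers(records, factor=1.5):
    typical = median(len(seq) for seq in records.values())
    kept = {name: seq for name, seq in records.items() if len(seq) <= factor * typical}
    dropped = sorted(set(records) - set(kept))
    return kept, dropped


toy = {"seq1": "M" * 445, "seq2": "M" * 440, "seq3": "M" * 450, "long1": "M" * 900}
kept, dropped = drop_length_outliers(toy)
print(dropped)  # ['long1']
```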
|position||conservation wildtype||conservation mutant|
The results show that all amino acids at the observed positions are very well conserved, since the value is always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%. This is not as high as the other results, but the position is still well conserved. Well-conserved regions of a protein are probably important for its structure and function. Because all of the mutated amino acids are highly conserved, mutations at these positions may be damaging and may have a large impact on the protein and its function.
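One simple way to obtain a conservation value is to map a UniProt position to its alignment column and count the residues in that column, ignoring gaps. This is a sketch on a hypothetical toy alignment; the values in the table may have been computed with a different tool or scoring scheme.

```python
# Sketch: per-column conservation in a multiple alignment (gaps ignored).
from collections import Counter


def column_for_position(aligned_reference, position):
    """Map a 1-based residue position of the reference to its alignment column."""
    count = 0
    for column, residue in enumerate(aligned_reference):
        if residue != "-":
            count += 1
            if count == position:
                return column
    raise ValueError("position lies beyond the end of the reference sequence")


def conservation(alignment, column):
    """Fraction of non-gap residues in `column` that are each amino acid."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return {}
    return {aa: n / len(residues) for aa, n in Counter(residues).items()}


toy_alignment = ["MA-GE", "MAQGE", "MS-GD", "MA-AE"]  # first row = reference
col = column_for_position(toy_alignment[0], 3)        # third residue, G
print(col, conservation(toy_alignment, col))          # 3 {'G': 0.75, 'A': 0.25}
```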
To run SNAP we used the command:
snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out
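The mutation file passed with `-m` can be generated and checked against the FASTA sequence beforehand, so that a wrong position or wild-type residue is caught before SNAP runs. This sketch assumes one mutation per line in the form `<wild type><position><mutant>` (e.g. `G29E`); the exact format should be confirmed in the SNAP documentation, and the toy sequence is a hypothetical stand-in for the one in `BCKDHA.fasta`.

```python
# Sketch: build the SNAP mutation list. ASSUMPTION: one mutation per line,
# written as <wild type><position><mutant>, e.g. "G29E".
def mutation_lines(sequence, mutations):
    """Validate (wild type, position, mutant) triples against the sequence."""
    lines = []
    for wild_type, position, mutant in mutations:
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        found = sequence[position - 1]  # positions are 1-based
        if found != wild_type:
            raise ValueError(f"position {position} is {found}, not {wild_type}")
        lines.append(f"{wild_type}{position}{mutant}")
    return lines


toy_sequence = "MAGEQ"  # hypothetical stand-in for the sequence in BCKDHA.fasta
lines = mutation_lines(toy_sequence, [("G", 3, "E"), ("Q", 5, "E")])
print("\n".join(lines))  # save this output as mutations.txt
```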
|nsSNP||Prediction||Reliability Index||Expected Accuracy|
The SNAP output shows that most of the mutations are predicted to have a damaging effect on the structure and function of the protein. Only the mutations at positions 29 and 249 are predicted to have no influence on the protein.
A second SNAP run was performed where all ten chosen mutation positions were mutated by all possible substitutions.
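The input for the second run, every alternative amino acid at each of the ten positions, can be enumerated as below. The sketch uses the same `<wild type><position><mutant>` format assumed for the first run; the position list follows the substitution-matrix table.

```python
# Sketch: list every single-residue substitution at the ten positions,
# using the same <wild type><position><mutant> format assumed for SNAP.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

POSITIONS = [(29, "G"), (82, "M"), (125, "Q"), (166, "Y"), (249, "G"),
             (264, "C"), (265, "R"), (326, "I"), (409, "F"), (438, "Y")]

all_substitutions = [f"{wt}{pos}{aa}" for pos, wt in POSITIONS
                     for aa in AMINO_ACIDS if aa != wt]
print(len(all_substitutions))  # 190 = 10 positions x 19 alternatives
```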
The following table displays the SIFT results. Substitutions with a score below the threshold of 0.05 are predicted to be intolerant, i.e. to affect protein function.
The amino acids are colored in the following way:
- uncharged polar
Capital letters: amino acids appear in the alignment
Lower case letters: amino acids result from prediction
Seq Rep: fraction of sequences that contain one of the basic amino acids
The only substitutions that SIFT predicts not to affect protein function are G29E and M82L. The first substitution may be tolerated because this position lies in the signal peptide and not within the mature protein. The second tolerated exchange is from methionine to leucine. These two amino acids are quite similar in structure and physicochemical properties (both are hydrophobic), so an exchange can be tolerated.
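The SIFT cut-off described above amounts to a one-line rule. The sketch applies it to the SIFT scores that are quoted in the comparison section of this page (0.68 for G29E, 0.0 for Q125E and G249S).

```python
# Sketch: SIFT threshold rule (scores below 0.05 are predicted intolerant).
SIFT_THRESHOLD = 0.05


def sift_prediction(score, threshold=SIFT_THRESHOLD):
    return "affects protein function" if score < threshold else "tolerated"


# SIFT scores quoted in the comparison section of this page
for label, score in [("G29E", 0.68), ("Q125E", 0.0), ("G249S", 0.0)]:
    print(label, score, sift_prediction(score))
```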
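|Position||Mutation||HumDiv prediction||HumDiv score||HumDiv sensitivity||HumDiv specificity||HumVar prediction||HumVar score||HumVar sensitivity||HumVar specificity|
|166||Y/N||probably damaging||0.997||0.40||0.98||probably damaging||0.964||0.59||0.93|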
|264||C/W||probably damaging||1.000||0.00||1.00||probably damaging||1.000||0.00||1.00|
|265||R/W||probably damaging||1.000||0.00||1.00||probably damaging||1.000||0.00||1.00|
|326||I/T||probably damaging||0.997||0.40||0.98||probably damaging||0.998||0.16||0.99|
|409||F/C||probably damaging||0.998||0.27||0.99||probably damaging||0.939||0.64||0.92|
|438||Y/N||probably damaging||1.000||0.00||1.00||probably damaging||0.987||0.49||0.96|
PolyPhen2 uses two different datasets for its prediction, HumDiv and HumVar, and as the results show, the two predictions are not always the same. According to the HumDiv-based prediction, three mutations possibly have no serious effect on the function or structure of the protein, whereas according to HumVar four mutations might have no damaging influence. The two mutations marked as "benign" with both datasets are at positions 29 and 249. The mutation predicted as benign only with the HumVar dataset is at position 125.
Comparison of the prediction results
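The difference between the two datasets is partly a matter of different score cut-offs. The sketch below uses the cut-offs commonly reported for PolyPhen-2 (e.g. in the dbNSFP documentation); treat these numbers as an assumption and check them against the PolyPhen-2 documentation. The example scores come from the PolyPhen2 table above.

```python
# Sketch: map PolyPhen-2 scores to classes. ASSUMED cut-offs, as commonly
# reported for PolyPhen-2: (probably damaging, possibly damaging) lower bounds.
CUTOFFS = {"HumDiv": (0.957, 0.453), "HumVar": (0.909, 0.447)}


def polyphen_class(score, model):
    probably, possibly = CUTOFFS[model]
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


# HumDiv and HumVar scores for F409C from the PolyPhen2 table
print(polyphen_class(0.998, "HumDiv"), "/", polyphen_class(0.939, "HumVar"))
```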
|Position||AA1/AA2||BLOSUM62||PAM1||PAM250||PSSM||Secondary Structure||Multiple Alignment||SNAP||SIFT||Polyphen2|
Position 29 (G/E)
Here, glycine is mutated to glutamic acid. Except for BLOSUM62 and the PSSM, all tools and sources indicate that this mutation is neutral. Interestingly, these two sources have quite low scores, which indicates that they are confident in their prediction. However, the other two matrices have very high values, and this position is not very conserved (only 72%), which shows that this amino acid changes quite often. It is therefore possible that the mutation is neutral. The secondary structure prediction supports this assumption, as there is no difference between the original and the mutated secondary structure. Additionally, the residue lies in a coil region, i.e. not in a regular secondary structure element. It has to be mentioned that SNAP is not very confident in its prediction of neutrality: its reliability index is 0, the lowest possible value. In contrast, SIFT and PolyPhen2 are confident in their predictions, since SIFT gives a fairly high score (0.68) and PolyPhen2 two very low scores (0.025 and 0.018). Since nearly all methods predict the mutation as neutral, most of them with good scores, we also predict it as neutral.
Position 125 (Q/E)
Based on the predictions of the different tools, it is not possible to decide whether this mutation from glutamine to glutamic acid is neutral, because the tools disagree completely. Looking at the protein structures, there is a difference between the wild type and the mutant, since they cannot be superposed perfectly. However, the difference seems to be minimal, which may explain why the predictions differ. It is also important to note that most of the tools are not confident in their predictions. For example, the PSSM value is 0, exactly on the border between neutral and non-neutral, and SNAP is not very confident, as its reliability index is only 1. In contrast, the SIFT prediction is very reliable, since the value for this mutation is 0.0, the lowest possible value. PolyPhen2 is uncertain, given that the two datasets lead to different predictions: one neutral, the other non-neutral. Because of these many different results, it can be assumed that if this mutation is non-neutral, its influence is minimal. Since the experimental structures differ, we classify this mutation as non-neutral, but as explained above, the influence, and hence the difference, is probably small.
Position 166 (Y/N)
The mutation from tyrosine to asparagine is predicted to be non-neutral by all tools and sources. The values of the substitution matrices and the PSSM score are all low, which shows that this substitution does not occur very often. The rarity of a change at this position is also reflected in the conservation score of 1. This suggests that the amino acid at position 166 plays an important role in the structure and function of the protein, so a mutation to another amino acid is not neutral. The secondary structure prediction also shows changes caused by the different amino acid. PolyPhen2 in particular shows how certain this prediction is, with scores of 0.997 (HumDiv) and 0.964 (HumVar), where 1.0 is the highest possible score. The prediction is confirmed by the experimental structures of the two proteins: at this position the mutant lacks the aromatic ring of the wild-type tyrosine. Because of this structural difference, the physicochemical properties of the residue at this position differ.
Position 249 (G/S)
Again the results of the different tools and sources contradict each other. While BLOSUM62 predicts that the mutation is non-neutral, PAM1, PAM250 and the PSSM score indicate that it is neutral. However, the BLOSUM62 value is 0, which is on the border to neutral. Judging from the substitution matrices and the PSSM score, the mutation is therefore neutral. The predicted secondary structure, on the other hand, shows differences, since the helix in the mutated protein is two positions shorter than the original one. This indicates that the mutation is non-neutral, and the fact that this position is 100% conserved also suggests that the mutation influences the structure and function of the protein. SNAP predicts the mutation to be neutral, but its reliability index of only 1 indicates that it is not confident. In contrast to SNAP, SIFT classifies the mutation as non-neutral with high confidence, because the value for serine is 0.0, the lowest value, which usually indicates a non-neutral mutation. Both PolyPhen2 predictions classify this SNP as neutral. All in all, the effect of this mutation is not clear, and again it is possible that any change in the structure of the protein is small, given how uncertain the different programs are. The experimental structure shows that the mutant residue has an additional side chain, whereas glycine only has a hydrogen atom. Because of this, we predict this mutation as non-neutral.
Position 264 (C/W)
This mutation is clearly non-neutral, since all tools predict it as non-neutral. The substitution matrices show that this prediction is reliable, because the values of all three matrices are very low. The PSSM score is also one of the lowest possible values, which indicates that a change of this amino acid is very rare. The conservation score can be interpreted in the same way: it is 1, which means that mutations at this position are very rare, and it can be assumed that this amino acid plays an important role in the structure and function of the protein. Given this, the mutation has to be non-neutral. SNAP, SIFT and PolyPhen2 are all very confident in their predictions, as all of them give the best possible scores for a damaging influence of the mutation. The experimental structures of the two proteins strengthen the assumption that the mutation is non-neutral, because they clearly differ in structure and fold.
Position 265 (R/W)
This mutation is predicted to be non-neutral by nearly every tool or source. Only PAM1 and PAM250 classify it as neutral, with high scores of 8 and 7. This is curious, because all other tools and sources are very confident that a change of this amino acid will affect the protein. The PSSM score is -5, which is very low and shows that a mutation at this position is unlikely to be neutral, and the conservation score of 1 shows that this amino acid is very well conserved and that a change would be harmful. SNAP, SIFT and PolyPhen2 predict this mutation as clearly damaging. To be sure, we also included the predicted and the experimental structure of the protein. The predicted structure clearly changes: a beta strand appears that is absent in the original protein, and the coil after the mutation is one residue shorter than in the wild type. The experimental structure also shows that the proteins clearly differ in structure and folding. Because of all these facts, we classify this mutation as non-neutral.
Position 326 (I/T)
Most of the prediction methods classify this mutation as non-neutral. Only PAM1, PAM250 and the predicted secondary structure classify it as neutral. Interestingly, the two substitution matrices have high values of 7 and 4, which shows that they are quite confident in their prediction, and the predicted secondary structure supports this, as no change occurs. All other methods, however, predict this mutation to be non-neutral. BLOSUM62 and the PSSM, for example, have very low values, which shows that this mutation is confidently predicted to be non-neutral. This position is also very well conserved, which suggests that changes at this position are rare and that a change would have a damaging influence on the protein. The three prediction tools SNAP, SIFT and PolyPhen2 also classify the SNP as damaging with very high confidence. The structure also indicates an influence of this mutation, since the mutant residue, threonine, has a smaller, polar side chain in place of the larger isoleucine side chain. Summarizing all results, we also predict the mutation as non-neutral.
Position 409 (F/C)
Except for the predicted secondary structure, all methods classify the mutation from phenylalanine to cysteine as non-neutral. One explanation for the secondary structure result is that the mutation may affect a distant position, which is not detected when looking only at the immediate neighbourhood of the mutated position. All other methods are confident in their prediction, as can be seen, for example, from the very low values of the substitution matrices and the PSSM score, and the scores of the three prediction tools also indicate that the mutation is clearly non-neutral. To safeguard this prediction, we also looked at the amino acids and their properties. This comparison supports the prediction, since the only shared property is hydrophobicity: the wild-type amino acid is also aromatic, whereas the mutant amino acid is sulphur-containing, tiny, small and polar. The two amino acids are thus almost completely different. The structures confirm this: the three pictures show that they differ, since the mutant protein lacks the aromatic ring at this position. Because of all these differences, we conclude that the mutation is non-neutral.
Position 438 (Y/N)
Position 438 behaves much like position 409. The mutation of tyrosine to asparagine is predicted as non-neutral by nearly all methods; only the predicted secondary structure classifies it as neutral. One explanation is that the mutation influences a distant position, so that the change in the secondary structure cannot be recognized locally. This is quite likely, since all other methods predict the mutation as non-neutral. The predictions of the substitution matrices and the PSSM are not as certain as for position 409, since the values are not as low, but they are still low enough to conclude that the mutation will affect the structure and function of the protein. SNAP and SIFT are as confident as for the previous mutation (SNAP: 4, SIFT: 0.0). PolyPhen2 is even more certain, predicting a damaging influence with scores of 1.0 (HumDiv) and 0.987 (HumVar). Again we also compared the physicochemical properties and the structures of the two amino acids. The only property the two amino acids share is polarity. The other properties are completely different, since the mutant amino acid is acidic and small whereas the original amino acid is hydrophobic and aromatic. The missing aromatic property of the mutant amino acid implies that the two structures differ, because the mutant protein lacks the aromatic ring at this position, and the structure confirms this. Since nearly all methods, as well as the structure and the properties, classify the mutation as non-neutral, we also predict it as non-neutral.
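A first, unweighted summary of the comparison table is a simple majority vote over the nine methods. This is only a sketch: the decisions above also weigh each tool's confidence and the structures, which a plain vote ignores. The method names follow the comparison table, and the example encodes position 29 as summarized in the text (only BLOSUM62 and the PSSM say non-neutral).

```python
# Sketch: unweighted majority vote over the methods of the comparison table.
from collections import Counter

METHODS = ("BLOSUM62", "PAM1", "PAM250", "PSSM", "Secondary Structure",
           "Multiple Alignment", "SNAP", "SIFT", "Polyphen2")


def majority_call(calls):
    """calls maps each method to 'neutral' or 'non-neutral'."""
    missing = [m for m in METHODS if m not in calls]
    if missing:
        raise ValueError(f"no call for: {', '.join(missing)}")
    tally = Counter(calls[m] for m in METHODS)
    neutral, damaging = tally["neutral"], tally["non-neutral"]
    if neutral == damaging:
        return "undecided", neutral, damaging
    return ("neutral" if neutral > damaging else "non-neutral"), neutral, damaging


# Position 29 as summarized in the text
g29e = {m: "neutral" for m in METHODS}
g29e["BLOSUM62"] = g29e["PSSM"] = "non-neutral"
print(majority_call(g29e))  # ('neutral', 7, 2)
```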
Comparison of the tool predictions with dbSNP and HGMD