Difference between revisions of "Sequence-based mutation analysis BCKDHA"

Latest revision as of 10:37, 26 August 2011

General

We chose the following mutations for the sequence-based mutation analysis:

Figure 1: Reference amino acids which are substituted coloured as follows: SNP is listed only in dbSNP, only in HGMD, in HGMD and dbSNP

Figure 2: Protein with mutated positions

G29E
M82L
Q125E
Y166N
G249S
C264W
R265W
I326T
F409C
Y438N

The mutation positions are relative to the UniProt reference sequence. Figure 1 shows the BCKDHA protein with the chosen mutation positions colored according to the source that lists the mutation. Figure 2 shows the protein with the amino acids listed above mutated.

A Protocol was created describing all steps for running the programs etc.

Amino Acid Properties

	Reference amino acid			Mutated amino acid			Structural Difference	Secondary Structure
Position	Residue	Properties	Structure	Residue	Properties	Structure	Structural Difference	Prediction	in protein structure
29	G	tiny, small		E	charged, polar			C
82	M	sulphur containing, hydrophobic	Methionine on position 82	L	hydrophobic, aliphatic	Leucine on position 82	Wt and mut on pos 82	C	Pos 82 in protein structure
125	Q	amide, polar	Glutamine on position 125	E	charged, polar	Glutamic acid on position 125	Wt and mut on pos 125	C	Pos 125 in protein structure
166	Y	hydrophobic, aromatic, polar	Tyrosine on position 166	N	amide, polar, small	Asparagine on position 166	Wt and mut on pos 166	H	Pos 166 in protein structure
249	G	tiny, small	Glycine on position 249	S	polar, small, tiny, hydroxylic	Serine on position 249	Wt and mut on pos 249	H	Pos 249 in protein structure
264	C	sulphur containing, hydrophobic, tiny, small, polar	Cysteine on position 264	W	hydrophobic, aromatic, polar	Tryptophan on position 264	Wt and mut on pos 264	E	Pos 264 in protein structure
265	R	charged, positive (basic), polar	Arginine on position 265	W	hydrophobic, aromatic, polar	Tryptophan on position 265	Wt and mut on pos 265	E	Pos 265 in protein structure
326	I	aliphatic, hydrophobic	Isoleucine on position 326	T	hydroxylic, hydrophobic, small, polar	Threonine on position 326	Wt and mut on pos 326	E	Pos 326 in protein structure
409	F	aromatic, hydrophobic	Phenylalanine on position 409	C	sulphur containing, hydrophobic, tiny, small, polar	Cysteine on position 409	Wt and mut on pos 409	C	Pos 409 in protein structure
438	Y	hydrophobic, aromatic, polar	Tyrosine on position 438	N	amide, polar, small	Asparagine on position 438	Wt and mut on pos 438	C	Pos 438 in protein structure

Annotation: WT = wildtype amino acid, Mut = mutated amino acid, H = helix, E = beta-sheet, C = coil

To visualize the mutations in the three-dimensional protein structure, the PDB entry for BCKDHA, 1U5B, was used. As the PDB file only contains coordinates for the protein itself, the signal peptide (the first 45 amino acids) is not included. Therefore the first mutation, at position 29, which lies in the signal peptide, could not be visualized.

The Protocol describes in detail how we used PyMOL to visualize our mutations.

Discussion

Looking at the differences in structure and biochemical properties between the reference amino acid and the mutated amino acid, one can draw conclusions about the effect of the mutation on the protein's function.

G29E: This mutation lies in the signal peptide of the protein, so it has no direct effect on the function of the protein. The biochemical properties of the two residues are nevertheless quite different. Glycine in particular plays a special role in most protein sequences because of its uniquely small size, and such a mutation might destroy the protein structure if it were located within the actual protein sequence. From this point of view the mutation might still have an effect on the protein's function.
M82L: Methionine and leucine are both hydrophobic amino acids of similar size. The loss of the sulfur-containing methionine could have a disease-causing effect, but as the structures are quite similar this mutation might be tolerated. Furthermore, the residue does not take part in any important secondary structure element, which makes a substitution here even more likely to be tolerated.
Q125E: The structures of these two amino acids are almost identical. The only difference is the replacement of the side-chain amino group with a hydroxy group. This changes the residue from uncharged to negatively charged, which might influence the protein's function.
Y166N: This substitution replaces a hydrophobic aromatic residue with a small polar amino acid. Because the residue is located in a helix, these differences in the amino acids' properties are likely to change the protein's structure, and the mutation is therefore very likely to affect the protein's function.
G249S: This mutation introduces a polar, hydroxylic amino acid at a position in a helix where a tiny, nonpolar amino acid is usually located. As both amino acids are quite small, the mutation might be benign.
C264W: This mutation has a huge impact on the structure and physicochemical properties of the amino acid at this position. A small, sulphur-containing amino acid is replaced by an aromatic amino acid, which occupies much more space. Hydrophobicity and polarity remain the same, but the amino acids are otherwise very different, and this mutation is expected to destroy the protein's function.
R265W: Here a positively charged amino acid is substituted by a hydrophobic one. This change is quite severe, so we assume the protein's function cannot be maintained.
I326T: This mutation introduces a hydroxy group into the side chain, which makes the residue polar. This is quite an important change, which might influence the function of the protein.
F409C: This substitution leads to a totally different amino acid in terms of structure and physicochemical properties. A bulky aromatic residue is substituted by a small, polar, sulphur-containing amino acid. These drastic changes are very likely to change the protein's structure and therefore affect its function.
Y438N: This mutation also changes the amino acid completely. The big, hydrophobic aromatic residue is substituted by a small polar one. These differences are likely to affect the protein's function.

Substitution matrices

Now we take a look at substitution matrices like BLOSUM62 (Figure 3), PAM1 (Figure 4) and PAM250 (Figure 5). We looked at the scores for the real mutations given by HGMD and dbSNP and compared these substitutions with the worst possible substitutions for the corresponding amino acids.

Figure 3: BLOSUM62

Figure 4: PAM1

Figure 5: PAM250

Position	AA1/AA2	BLOSUM62		PAM1		PAM250		result
Position	AA1/AA2	score	worst	score	worst	score	worst
29	G/E	-2	-4 (I, L)	7	0 (I, W, Y)	9	2 (W)	BLOSUM62 says that the mutation is not very likely, whereas PAM1 and PAM250 say that the mutation is not anomalous
82	M/L	2	-3 (D, G)	8	0 (N, D, C, E, G, H, P, W, Y)	3	0 (C)	The three values are positive and quite high relative to the other values of methionine. This means that all three matrices indicate that this mutation occurs quite often
125	Q/E	2	-3 (C, F, I)	27	0 (F, W, Y)	7	1 (C, F, W)	All three substitution matrices show that this mutation occurs quite often
166	Y/N	-2	-3 (D, G, P)	3	0 (R, D, Q, G, K, M, P)	2	1 (A, R, D, Q, E, G, K, P)	Since the values of the three matrices are low, the mutation does not occur very often, which suggests that it could change the protein's function
249	G/S	0	-4 (I, L)	21	0 (I, W, Y)	11	2 (W)	All three values are high, which indicates that this mutation is quite common and therefore probably not very damaging
264	C/W	-2	-4 (E)	0	0 (N, D, Q, E, G, L, K, M, F, W)	1	1 (R, N, D, Q, E, L, K, M, F, W)	The scores are all low. This reflects that the mutation is rare and because of this it is very likely that it influences the function of the protein
265	R/W	-3	-3 (W, V, F, I, C)	8	0 (D, E, G, Y)	7	1 (F)	PAM1 and PAM250 have high values whereas BLOSUM62 has a low value for this substitution. So BLOSUM62 says that this mutation is rare and probably damaging and PAM1 and PAM250 say that the mutation is quite common and so not very damaging
326	I/T	-1	-4 (G)	7	0 (G, A, P, W)	4	1 (W)	Again the three matrices give different results. Whereas BLOSUM62 says that the mutation is rare, PAM1 and PAM250 indicate that the mutation has no negative influence on the protein
409	F/C	-2	-4 (P)	0	0 (D, C, Q, E, K, P, V)	1	1 (R, D, C, Q, E, G, K, P)	The three values are all very low which means that this mutation is very rare. This suggests that the mutation affects the function and the structure of the protein
438	Y/N	-2	-3 (D, G, P)	3	0 (R, D, Q, G, K, M, P)	2	1 (A, R, D, Q, E, G, K, P)	Again the scores are all low which indicates the damaging effect of the mutation
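
The comparison against the worst possible substitution can be made explicit with a few lines of Python. This is only an illustrative sketch: the BLOSUM62 scores and worst-case scores are copied from the table above, and the names `BLOSUM62_ROWS` and `margin_over_worst` are hypothetical helpers, not part of any tool used in this analysis.

```python
# (mutation, observed BLOSUM62 score, worst score for the reference residue),
# copied from the table above.
BLOSUM62_ROWS = [
    ("G29E", -2, -4), ("M82L", 2, -3), ("Q125E", 2, -3), ("Y166N", -2, -3),
    ("G249S", 0, -4), ("C264W", -2, -4), ("R265W", -3, -3), ("I326T", -1, -4),
    ("F409C", -2, -4), ("Y438N", -2, -3),
]


def margin_over_worst(rows):
    """Return {mutation: observed score minus worst possible score}."""
    return {mutation: score - worst for mutation, score, worst in rows}


margins = margin_over_worst(BLOSUM62_ROWS)

# Mutations whose observed score is already the worst one in the matrix row.
at_worst = [mutation for mutation, margin in margins.items() if margin == 0]
print(at_worst)
```

A margin of 0 means the observed substitution is as unfavourable as any substitution of that reference residue can be in the matrix; larger margins mean the substitution is further from the worst case.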

PSSM

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  29 G    1  0  0 -2  1  0 -2  4  1 -1 -3 -1 -3 -2 -3  1 -1 -4 -1 -1   10   6   5   1   3   4   1  37   3   5   2   3   0   2   0   9   2   0   2   6  0.37 0.77
  82 M   -4 -5 -6 -6 -3 -5 -5 -6 -5  2  5 -5  5 -2 -5 -5 -3 -3 -3  1    0   0   0   0   0   0   0   0   0  12  67   0  14   0   0   0   0   0   1   6  1.23 1.27
 125 Q   -1 -1 -3 -3 -5  8  0 -4  0 -3 -4 -1 -1 -6 -4 -1  1 -5 -4 -4    4   2   0   0   0  74   2   0   1   1   1   2   1   0   0   2   9   0   0   0  1.46 1.28
 166 Y    3 -3 -4 -4  3  0 -3 -4  1 -2 -2 -3  0 -2 -4 -1  1  7  3  1   24   1   0   0   7   5   1   1   3   1   3   1   2   1   0   3   8  15  12  11  0.62 1.29
 249 G    5 -4 -3 -4 -4 -3 -2  4 -4 -5 -5 -3 -4 -5 -4  1 -2 -5 -5 -4   54   0   0   0   0   0   3  35   0   0   0   0   0   0   0   8   1   0   0   0  1.12 1.21
 264 C   -2 -5 -3 -5  9 -5 -5 -5 -5  3 -2 -5 -3 -4 -1 -3 -3 -5 -4  4    1   0   2   0  45   0   0   0   0  15   2   0   0   0   4   1   1   0   0  29  1.43 1.18
 265 R   -3  4  2 -3 -5  5  2 -4  0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2  0    0  25  12   0   0  34  15   0   1   2   0   1   0   0   0   1   1   0   1   7  0.88 1.21
 326 I   -3 -5 -6 -6 -4 -5 -6 -6 -6  7  0 -5  0 -2 -5 -5 -3 -5 -4  4    0   0   0   0   0   0   0   0   0  66   6   0   1   1   0   0   0   0   0  26  1.40 1.17
 409 F   -4 -3 -6 -6 -3 -5 -5 -6 -3  0  1 -5  1  8 -6 -5 -3  0  1 -1    1   1   0   0   1   0   0   0   0   5  11   0   3  69   0   0   1   1   3   4  1.56 1.31
 438 Y    0 -2 -2 -4 -2 -1 -1 -3  3 -3 -3 -3 -3  1 -5 -3 -3  3  8 -2    9   2   1   0   1   3   3   1   6   0   1   0   0   1   0   0   0   3  66   2  1.34 0.89

The values in the PSSM reflect how often each amino acid is observed at a position of a multiple alignment. The higher the value for an amino acid, the better it is conserved at that position, and a substitution to that amino acid is therefore usually tolerated, as both variants have been passed on successfully. The PSSM values for our mutations have been colored orange. For most of the mutations the PSSM score is negative, which means that the substitution is not conserved and not likely to be tolerated.

The Q125E mutation has a score of 0.

The G249S substitution has a score of +1, which is quite good. This indicates a small rate of conservation and therefore this mutation might be tolerated in nature.

The highest score among our substitutions is +5 for the M82L mutation. According to the PSSM this substitution is conserved and likely to be tolerated. This can be explained by the similar size and hydrophobic nature of both amino acids.
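
Reading the score of a substitution out of the matrix is easy to get wrong by one column, so a small lookup helps. The following is a minimal sketch, assuming the column order `A R N D C Q E G H I L K M F P S T W Y V` shown in the matrix header; the rows are copied from the matrix above with the percentage columns omitted, and `parse_pssm` and `mutation_score` are hypothetical helper names.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# Position, reference residue and the 20 log-odds scores,
# copied from the matrix above (percentage columns omitted).
PSSM_ROWS = """\
  82 M   -4 -5 -6 -6 -3 -5 -5 -6 -5  2  5 -5  5 -2 -5 -5 -3 -3 -3  1
 125 Q   -1 -1 -3 -3 -5  8  0 -4  0 -3 -4 -1 -1 -6 -4 -1  1 -5 -4 -4
 249 G    5 -4 -3 -4 -4 -3 -2  4 -4 -5 -5 -3 -4 -5 -4  1 -2 -5 -5 -4
 265 R   -3  4  2 -3 -5  5  2 -4  0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2  0
"""


def parse_pssm(text):
    """Map position -> (reference residue, {amino acid: log-odds score})."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 22:
            continue  # skip headers and blank lines
        position, reference = int(fields[0]), fields[1]
        scores = dict(zip(AA_ORDER, (int(value) for value in fields[2:22])))
        table[position] = (reference, scores)
    return table


def mutation_score(table, mutation):
    """Return the PSSM score of a mutation written like 'M82L'."""
    reference, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    row_reference, scores = table[position]
    if row_reference != reference:
        raise ValueError(f"{mutation}: matrix has {row_reference} at {position}")
    return scores[mutant]


pssm = parse_pssm(PSSM_ROWS)
for mutation in ("M82L", "Q125E", "G249S", "R265W"):
    print(mutation, mutation_score(pssm, mutation))
```

The check of the reference residue guards against reading a row of the wrong position.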

Secondary Structure

To find out whether the mutations influence the secondary structure of the protein, we compared the predicted secondary structure of the sequence without mutations with that of the sequence including the mutations. We used PSIPRED to predict the secondary structure of the two sequences.

We compared the structure at each position. The mutation positions are colored red and the regions where changes occur are colored blue:

Position 29          

seq:      SQAALLLLRQPGARGLARSHPPRQQQQFSSLDDK
non-mut:  HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC
mut:      HHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCC 


Position 82 

seq:      VISGIPIYRVMDRQGQIINPSEDPHLPKEKV
non-mut:  CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH
mut:      CCCCCCEEEEECCCCCCCCCCCCCCCCHHHH

Position 125         

seq:      KEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYG 
non-mut:  HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC
mut:      HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCC 
 

Position 166 
        
seq:      EAGVLMYRDYPLELFMAQCYG 
non-mut:  HHHHHHHCCCCHHHHHHHHCC 
mut:      CHHHHHHCCCCHHHHHHHHCC 


Position 249         

seq:      VVICYFGEGAASEGDAHAGFNFAATLECP   
non-mut:  EEEEEECCCCCCHHHHHHHHHHHHHHCCC 
mut:      EEEEEECCCCCCCHHHHHHHHHHHHCCCC 


Position 264         

seq:      IIFFCRNNGYAISTPTSEQYRGD 
non-mut:  EEEEEECCCCCCCCCCCHHCCCC
mut:      EEEEEECCCEEECCCCCHHCCCH 


Position 265         

seq:      IIFFCRNNGYAISTPTSEQYRGD 
non-mut:  EEEEEECCCCCCCCCCCHHCCCC 
mut:      EEEEEECCCEEECCCCCHHCCCH 


Position 326          

seq:      RAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
non-mut:  HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC 
mut:      HHHCCCCCEEEEEECCCCCCCCCCCCCCCCCC 


Position 409          

seq:      KPKPNPNLLFSDVYQEMPAQL 
non-mut:  CCCCCHHHHHHHHHCCCCHHH
mut:      CCCCCHHHHHHHHHCCCCHHH 


Position 438          

seq:      QEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
non-mut:  CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
mut:      CCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC

Comparing the secondary structure at all positions shows that most of the mutations have no influence on the secondary structure, since nothing changes at the mutated position or in its neighbourhood. The mutations at positions 166, 249, 264 and 265 do have an influence on the structure. The mutation at position 166 affects the secondary structure 6 residues earlier: the helix that normally starts at position 160 now starts at position 161. The mutation at position 249 shortens the surrounding helix, which starts one residue later and ends one residue earlier than without the mutation. Because the mutations at positions 264 and 265 are next to each other, it is not clear which of them is responsible for the change in the secondary structure, or whether it is the combination of the two. Nevertheless, the neighbourhood of these mutations changes: four or five residues after the mutation a beta sheet appears that is not present in the wildtype structure. Additionally, the helix that should start 19 or 20 residues after the mutation starts one position earlier.
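
The position-by-position comparison can also be done programmatically instead of by eye. The sketch below is an illustration only: it takes the two prediction strings for the window around position 166 from above and lists the offsets at which they differ; `differing_positions` is a hypothetical helper name.

```python
def differing_positions(wild_type, mutant):
    """Return (offset, wild-type state, mutant state) for every difference."""
    if len(wild_type) != len(mutant):
        raise ValueError("prediction strings must have the same length")
    return [
        (offset, wt_state, mut_state)
        for offset, (wt_state, mut_state) in enumerate(zip(wild_type, mutant))
        if wt_state != mut_state
    ]


# Window around position 166, copied from the comparison above.
wild_type = "HHHHHHHCCCCHHHHHHHHCC"
mutant = "CHHHHHHCCCCHHHHHHHHCC"

changes = differing_positions(wild_type, mutant)
print(changes)  # [(0, 'H', 'C')]
```

The offsets are relative to the start of the displayed window, so they have to be shifted by the window's first sequence position to obtain residue numbers.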

Multiple Sequence Alignment

To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.

ID	Accession	Entry name
sp	P11178	ODBA_BOVIN
sp	P12694	ODBA_HUMAN
sp	Q8HXY4	ODBA_MACFA
sp	P50136	ODBA_MOUSE
sp	A5A6H9	ODBA_PANTR
sp	P11960	ODBA_RAT
tr	Q6ZSA3	Q6ZSA3_HUMAN
tr	E7ESE6	E7ESE6_HUMAN
tr	B2R8A9	B2R8A9_HUMAN
tr	Q658P7	Q658P7_HUMAN
tr	E7EW46	E7EW46_HUMAN
tr	B4DP47	B4DP47_HUMAN
tr	Q59EI3	Q59EI3_HUMAN
tr	F1N5F2	F1N5F2_BOVIN
tr	B1PK12	B1PK12_PIG
tr	E2RPW4	E2RPW4_CANFA
tr	B2LSM3	B2LSM3_SHEEP
tr	F1RHA0	F1RHA0_PIG
tr	F1PI86	F1PI86_CANFA
tr	D2HMT3	D2HMT3_AILME
tr	Q2TBT9	Q2TBT9_BOVIN
tr	Q3U3J1	Q3U3J1_MOUSE
tr	Q99L69	Q99L69_MOUSE
tr	Q5EB89	Q5EB89_RAT
tr	B1WBN3	B1WBN3_RAT

Figure 6: Multiple Alignment of the homologous sequences of BCKDHA with CLUSTALW

With these 25 results we created a multiple alignment using CLUSTALW. The alignment of all mammalian homologs was quite bad because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the other ones. We therefore removed these two sequences and realigned the remaining ones (see Figure 6).

With this new multiple alignment we could analyze the 10 positions of our mutations to find out how well they are conserved.

position	conservation wildtype	conservation mutant
29	0.7	0
82	0.96	0
125	0.96	0.04
166	1	0
249	1	0
264	1	0
265	1	0
326	0.91	0
409	0.91	0
438	0.91	0

The results show that the wildtype amino acids at all observed positions are very well conserved, since the values are always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%. This is not as high as the other results, but the residue is still well conserved. Regions of a protein that are well conserved are probably very important for its structure and function. Because all of these amino acids are very well conserved, mutations at these positions can be very damaging and can have a huge impact on the protein and its function. The only position where the conservation of the mutant residue is higher than 0 is position 125, but this conservation is so minimal that it can be neglected.
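
A conservation value of this kind can be computed as the fraction of aligned sequences that carry a given residue in a given alignment column. The following minimal sketch assumes exactly that definition, which is our reading of the table above; the four-sequence alignment is invented for illustration and is not the CLUSTALW alignment of Figure 6.

```python
def residue_fraction(alignment, column, residue):
    """Fraction of sequences with `residue` in the given alignment column.

    Gap characters count as ordinary symbols, so they lower the fraction.
    """
    column_residues = [sequence[column] for sequence in alignment]
    return column_residues.count(residue) / len(column_residues)


# Hypothetical toy alignment (one string per aligned sequence).
toy_alignment = ["MGAC", "MGSC", "MGAC", "MEAC"]

wild_type_fraction = residue_fraction(toy_alignment, 1, "G")
mutant_fraction = residue_fraction(toy_alignment, 1, "E")
print(wild_type_fraction, mutant_fraction)  # 0.75 0.25
```

Applied to a real alignment, `column` must be the alignment column of the residue, which differs from the sequence position whenever gaps precede it.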

SNAP

mutation prediction

To run SNAP we used the command:

snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out

nsSNP	Prediction	Reliability Index	Expected Accuracy
G29E	Neutral	0	53%
M82E	Non-neutral	4	82%
Q125E	Non-neutral	1	63%
Y166N	Non-neutral	2	70%
G249S	Neutral	1	60%
C264W	Non-neutral	4	82%
R265W	Non-neutral	4	82%
I326T	Non-neutral	3	78%
F409C	Non-neutral	4	82%
Y438N	Non-neutral	4	82%

The output of SNAP shows that most of the mutations would have a damaging effect on the structure and function of the protein. Only the mutations at positions 29 and 249 are predicted to have no influence on the protein.

position specific prediction

A second SNAP run was performed in which each of the ten chosen positions was mutated to every possible amino acid. This run should show whether the wildtype amino acid itself is essential at that position of the sequence, or whether only our particular mutation cannot be tolerated because it drastically changes the physicochemical properties of the amino acid.

The following table shows to what extent each position is predicted to tolerate mutations.

Mutation	Tolerated Substitutions	Non-tolerated Substitutions	Ratio tolerated Mutations
G29E	ARNDCQEGHILKMFPSTWYV		100%
M82E	ILM	ARNDCQEGHKFPSTWYV	15%
Q125E	Q	ARNDCEGHILKMFPSTWYV	5%
Y166N	QHMFWY	ARNDCEGILKPSTV	30%
G249S	AGS	RNDCQEHILKMFPTWYV	15%
C264W	C	ARNDQEGHILKMFPSTWYV	5%
R265W	R	ANDCQEGHILKMFPSTWYV	5%
I326T	ILV	ARNDCQEGHKMFPSTWY	15%
F409C	F	ARNDCQEGHILKMPSTWYV	5%
Y438N	Y	ARNDCQEGHILKMFPSTWV	5%

This table shows that only position 29 is not essential for the protein's function. This can be explained by the fact that this mutation does not lie within the actual protein sequence.

The position that allows the most substitutions is the tyrosine at position 166. This tyrosine forms the end of a helix on the surface of the protein, which might explain the variability of this position. The selected mutation to asparagine is not tolerated because of its quite different structure.

Position 82 tolerates three amino acids, which are all structurally very similar, so the structure seems to be important at this position. The same is true for position 249, where alanine, glycine and serine are predicted to be neutral to the protein function. All of these amino acids are quite small and may therefore not disturb the protein structure and function.

On position 326 all three branched-chain amino acids are predicted to have a neutral effect on the protein's function. These amino acids are structurally and biochemically very similar and therefore a substitution is tolerated. The mutation to threonine however is not tolerated as this amino acid has different properties.

All other mutations are not tolerated at all. This means for these positions the wild-type residues are totally essential for the protein's structure and function and cannot be replaced by any other amino acid.
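
The ratio of tolerated mutations and the verdict for each chosen mutation follow directly from the lists of tolerated residues. The sketch below is illustrative only: the residue lists are copied from the table above for the positions that tolerate more than the wildtype residue, and `tolerated_percent` and `is_tolerated` are hypothetical helper names.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Tolerated residues per position, copied from the table above.
TOLERATED = {
    29: "ARNDCQEGHILKMFPSTWYV",
    82: "ILM",
    166: "QHMFWY",
    249: "AGS",
    326: "ILV",
}


def tolerated_percent(tolerated):
    """Percentage of the 20 amino acids that are tolerated at a position."""
    return len(set(tolerated)) * 100 // len(AMINO_ACIDS)


def is_tolerated(mutation, tolerated_by_position):
    """Check whether the mutant residue of e.g. 'Y166N' is in the tolerated set."""
    position, mutant = int(mutation[1:-1]), mutation[-1]
    return mutant in tolerated_by_position[position]


for position, residues in TOLERATED.items():
    print(position, f"{tolerated_percent(residues)}%")
print(is_tolerated("Y166N", TOLERATED), is_tolerated("G249S", TOLERATED))
```

Note that the wildtype residue itself is counted among the tolerated residues, which is why positions that accept no substitution at all still show 5%.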

SIFT

The following table displays the SIFT results. The threshold for intolerance is 0.05.

The amino acids are colored in the following way:

nonpolar
uncharged polar
basic
acidic

Capital letters: amino acids appear in the alignment

Lower case letters: amino acids result from prediction

Seq Rep: fraction of sequences that contain one of the basic amino acids

Pos	Ref AA	Mut AA	SIFT prediction			Prediction
Pos	Ref AA	Mut AA	Predict Not Tolerated	Seq Rep	Predict Tolerated	Prediction
29	G	E		0.37	`wcmPdIGnqRhVTkeSFLAy`	tolerated
82	M	L	`dhgnweyrkspqafvi`	0.91	`TCLM`	tolerated
125	Q	E	`ywvtsrpnmlkihgfedca`	0.98	`Q`	not tolerated
166	Y	N	`cpdmeqkngrtisval`	1.00	`FHYW`	not tolerated
249	G	S	`whyfimrqnlckdvtps`	1.00	`EGA`	not tolerated
264	C	W	`ywvtsrqpnmlkihgfeda`	0.98	`C`	not tolerated
265	R	W	`ywvtsqpnmlkihgfedca`	1.00	`R`	not tolerated
326	I	T	`hdwpneqcrsgkytaM`	1.00	`FLVI`	not tolerated
409	F	C	`hndkrqgecpstamvwiy`	1.00	`LF`	not tolerated
438	Y	N	`wvtsrqpnmlkihgfedca`	1.00	`Y`	not tolerated

The only substitutions SIFT predicts not to affect protein function are G29E and M82L. The first substitution may be tolerated because this position is not within the actual protein sequence. The second tolerated amino acid exchange is from methionine to leucine. These two amino acids are quite similar in structure and physicochemical properties, so an exchange can be tolerated.

Polyphen2

Position	AA1/AA2	HumDiv				HumVar
Position	AA1/AA2	prediction	Score	Sensitivity	Specificity	prediction	Score	Sensitivity	Specificity
29	G/E	benign	0.025	0.96	0.80	benign	0.018	0.96	0.52
82	M/L	benign	0.001	0.99	0.15	benign	0.001	0.99	0.08
125	Q/E	possibly damaging	0.759	0.85	0.93	benign	0.285	0.87	0.75
166	Y/N	probably damaging	0.997	0.40	0.98	probably damaging	0.964	0.59	0.93
249	G/S	benign	0.145	0.93	0.86	benign	0.292	0.86	0.75
264	C/W	probably damaging	1.000	0.00	1.00	probably damaging	1.000	0.00	1.00
265	R/W	probably damaging	1.000	0.00	1.00	probably damaging	1.000	0.00	1.00
326	I/T	probably damaging	0.997	0.40	0.98	probably damaging	0.998	0.16	0.99
409	F/C	probably damaging	0.998	0.27	0.99	probably damaging	0.939	0.64	0.92
438	Y/N	probably damaging	1.000	0.00	1.00	probably damaging	0.987	0.49	0.96

Polyphen2 uses two different datasets for its predictions. As the results show, the two predictions are not always the same. The prediction with the HumDiv dataset says that three mutations possibly have no grave effect on the function or structure of the protein, whereas the prediction with HumVar says that four mutations might have no damaging influence. The three mutations marked as "benign" with both datasets are at positions 29, 82 and 249. The mutation predicted as benign only with the HumVar dataset is at position 125.
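
The agreement between the two datasets can be checked mechanically. The following short sketch copies the two prediction columns from the table above and lists the positions that are benign in both runs and the positions where the runs disagree; the variable names are hypothetical.

```python
# position -> (HumDiv prediction, HumVar prediction), copied from the table above.
POLYPHEN2 = {
    29: ("benign", "benign"),
    82: ("benign", "benign"),
    125: ("possibly damaging", "benign"),
    166: ("probably damaging", "probably damaging"),
    249: ("benign", "benign"),
    264: ("probably damaging", "probably damaging"),
    265: ("probably damaging", "probably damaging"),
    326: ("probably damaging", "probably damaging"),
    409: ("probably damaging", "probably damaging"),
    438: ("probably damaging", "probably damaging"),
}

benign_in_both = sorted(
    position
    for position, (humdiv, humvar) in POLYPHEN2.items()
    if humdiv == humvar == "benign"
)
discordant = sorted(
    position for position, (humdiv, humvar) in POLYPHEN2.items() if humdiv != humvar
)
print(benign_in_both, discordant)  # [29, 82, 249] [125]
```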

Comparison

Comparison of the predicted results of this TASK

Position	AA1/AA2	BLOSUM62	PAM1	PAM250	PSSM	Secondary Structure	Multiple Alignment	SNAP	SIFT	Polyphen2
						Prediction	Conservation wildtype	Prediction	Prediction	HumDiv	HumVar
29	G/E	non-neutral	neutral	neutral	non-neutral	neutral	neutral	neutral	neutral	neutral	neutral
82	M/L	neutral	neutral	neutral	neutral	neutral	non-neutral	non-neutral	neutral	neutral	neutral
125	Q/E	neutral	neutral	neutral	non-neutral	neutral	non-neutral	non-neutral	non-neutral	non-neutral	neutral
166	Y/N	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
249	G/S	non-neutral	neutral	neutral	neutral	non-neutral	non-neutral	neutral	non-neutral	neutral	neutral
264	C/W	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
265	R/W	non-neutral	neutral	neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
326	I/T	non-neutral	neutral	neutral	non-neutral	neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
409	F/C	non-neutral	non-neutral	non-neutral	non-neutral	neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
438	Y/N	non-neutral	non-neutral	non-neutral	non-neutral	neutral	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral
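
To get a quick overview of how split the methods are, the calls in each row can simply be counted. The sketch below does this for four rows of the table above, with each row encoded as one letter per column in table order (`n` = neutral, `x` = non-neutral). This encoding and the helper `tally` are our own illustration. A plain count weighs every column equally, although the three substitution matrices are not independent of each other, so it is only a rough summary and not a replacement for the discussion of the individual mutations.

```python
from collections import Counter

# One letter per column of the comparison table, in table order:
# BLOSUM62, PAM1, PAM250, PSSM, secondary structure, multiple alignment,
# SNAP, SIFT, PolyPhen2 HumDiv, PolyPhen2 HumVar.
CALLS = {
    "G29E": "xnnxnnnnnn",
    "Q125E": "nnnxnxxxxn",
    "Y166N": "xxxxxxxxxx",
    "G249S": "xnnnxxnxnn",
}


def tally(calls):
    """Return (number of neutral calls, number of non-neutral calls)."""
    counts = Counter(calls)
    return counts["n"], counts["x"]


for mutation, calls in CALLS.items():
    neutral, non_neutral = tally(calls)
    print(f"{mutation}: {neutral} neutral, {non_neutral} non-neutral")
```

For Q125E the count is tied, which matches the observation that the tools disagree most strongly on this mutation.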

G29E: Except for BLOSUM62 and the PSSM, all tools and sources indicate that this mutation is neutral. Interestingly, these two sources give quite low scores, which suggests that they are confident in their prediction. However, the other matrices give very high values, and at only 72% this position is not very conserved, which shows that this amino acid changes quite often. It is therefore possible that the mutation is neutral. The secondary structure prediction supports this assumption, because the original and the mutated secondary structure do not differ. Additionally, the residue lies in a coil region, which means it is not part of a structural element. It has to be mentioned that SNAP is not at all sure about its prediction of neutrality: its reliability index of 0 is the lowest possible, so the prediction is very uncertain. In contrast, SIFT and PolyPhen2 are confident in their predictions, because SIFT has a quite high score (0.68) and PolyPhen2 has two really low scores (0.025 and 0.018). Since nearly all methods predict the mutation as neutral and it lies in the signal peptide of our protein, we also assume the mutation to be neutral.

M82L: The mutation at position 82 from methionine to leucine is predicted to be neutral by most of the methods. Only SNAP and the conservation score declare the mutation as non-neutral. The SNAP prediction has a reliability index of 4, which means that SNAP is very sure about its result. The conservation score shows that this position is indeed very well conserved, with a value of 96%. However, the value is not 1, which shows that some mutations at this position are possible. In contrast to these two results, the substitution matrices indicate that the mutation is neutral, because all of them have high values for the change from methionine to leucine. This is due to the structural similarity of these two amino acids. The PSSM score is also very high, which indicates that this mutation has no damaging effect on the structure and function of the protein. SIFT and PolyPhen2 are also very confident that this mutation is neutral: SIFT predicts it with a high value of 0.65 and PolyPhen2 with a low value of 0.001. A further piece of information is the structure of the two proteins. Comparing the two structures shows that the structure of the protein hardly changes. Because of this and all the other methods that declare this mutation as neutral, we also predict it as neutral.

Q125E: Based on the predictions of the different tools it is not possible to decide whether this mutation from glutamine to glutamic acid is neutral or not, because the tools give completely different results. The structures of the two amino acids are quite similar, but their physicochemical properties differ. This may explain the different predictions, depending on whether a tool takes the amino acid properties or the structure into account when predicting the effect on the protein's function. It is also important to recognize that most of the tools are not completely sure about their prediction. For example, the PSSM value is 0, which is exactly on the border between neutral and non-neutral. SNAP is also not very sure, because the reliability index is only 1. In contrast, the prediction of SIFT is very confident, since the value for this mutation is 0.0, which is the lowest possible value. PolyPhen2 is uncertain, given that the two datasets lead to two different predictions. Because of these different results it can be assumed that, if this mutation is non-neutral, its influence is minimal. Since the experimental structure is different, we would say that this mutation is non-neutral, but as already mentioned its influence is probably minimal, so the effect on the protein function is not that grave.

Y166N: The mutation from tyrosine to asparagine is predicted to be non-neutral by all of the tools and sources. The values of the substitution matrices and the PSSM score are all low, which shows that this mutation does not occur very often. The rarity of a change at this position is also shown by the conservation score, which is 1. This reflects that the amino acid at position 166 has an important role in the structure and function of the protein, so a mutation to another amino acid is not neutral. The secondary structure prediction shows that the different amino acid causes changes. PolyPhen2 in particular shows how certain this prediction is, since the scores are 0.997 (HumDiv) and 0.964 (HumVar), with 1.0 being the highest score. The prediction can be confirmed by looking at the experimental structure of the two different amino acids. The mutant residue has no aromatic ring, while the original residue has one. Because of this structural difference the physicochemical properties of the two amino acids are not the same, and a mutation would have a huge impact on the protein's function.

G249S: The results of the tools and sources are contradictory again. While BLOSUM62 indicates that the mutation is non-neutral, PAM1, PAM250 and the PSSM score indicate that it is neutral. It has to be pointed out, however, that the BLOSUM62 value is 0, which is on the border to neutral. So judging by the substitution matrices and the PSSM score, the mutation is neutral. The predicted secondary structure, on the other hand, shows differences, since the helix in the mutated protein is two positions shorter than the original one. This indicates that the mutation is non-neutral, and the fact that this position is 100% conserved also lets us assume that the mutation would influence the structure and function of the protein. SNAP predicts the mutation to be neutral, but the reliability index of only 1 indicates that it is not sure. In contrast to SNAP, SIFT determines the mutation to be non-neutral and is very sure, because the value for serine is 0.0, which is the lowest value. Both predictions of PolyPhen2 declare this SNP as neutral. All in all it is not clear what the effect of this mutation is, and since the different tools are so unsure, it is again possible that any change in the structure of the protein is not grave. The experimental structure shows that the mutated residue has an additional side chain, but both amino acids are very small and a mutation might be tolerated. Weighing this against the complete conservation of the position and the predicted change in secondary structure, we predict this mutation as non-neutral.

C264W: The classification of this mutation as non-neutral is clear, since all tools predict it as non-neutral. The substitution matrices show that this substitution is very uncommon, because the values of all three matrices are very low. The PSSM score is one of the lowest possible values, which also indicates that a change of this amino acid is very rare. The conservation score can be interpreted in the same way: it is 1, which means that this position is hardly ever mutated, and it can be assumed that this amino acid has an important role in the structure and function of the protein. This is also confirmed by the SNAP scan using all possible mutations. With this knowledge the mutation has to be non-neutral. SNAP, SIFT and PolyPhen2 are all absolutely sure about their predictions, because all of them give the best possible scores for the influence of the mutation. Looking at the experimental structure and the differences between the two amino acids strengthens the assumption that the mutation is non-neutral.

R265W: This mutation is predicted to be non-neutral by nearly every tool or source. Only PAM1 and PAM250 declare this substitution as neutral, with high scores of 8 and 7. This is curious, because all other tools and sources are absolutely sure that a change of this amino acid will have an effect on the protein. The PSSM score is -5, which is really low and shows that a mutation at this position is unlikely to be neutral. The conservation score of 1 likewise shows that this amino acid is very well conserved and that a change would be fatal. SNAP, SIFT and PolyPhen2 predict this mutation as definitely damaging. To be really sure, we also consider the predicted and the experimental structure of the protein. The predicted structure clearly changes, because a beta sheet appears although there is none in the original protein, and additionally the coil after the mutation is one residue shorter than in the wildtype. The damaging effect of this mutation can furthermore be explained by the fact that this position is absolutely essential for the protein's function, as SNAP predicts no mutation at this position to be tolerated. Therefore we declare this mutation as non-neutral.

I326T: Most of the prediction methods consider this mutation non-neutral. Only PAM1, PAM250 and the predicted secondary structure declare it neutral. Interestingly, the two PAM matrices have high values of 7 and 4, which suggests that this substitution is quite common, and the predicted secondary structure supports this because no change occurs. All other methods, however, predict the mutation to be non-neutral. BLOSUM62 and PSSM, for example, have very low values, which points to a damaging effect. The position is also highly conserved, which suggests that a change here does not occur often and that a mutation would damage the protein. The three prediction tools declare the SNP damaging with very high confidence. The properties of the two amino acids also indicate an effect, since they are completely different. Summarizing all results, we also predict this mutation to be non-neutral.

F409C: Except for the predicted secondary structure, all methods declare the mutation from phenylalanine to cysteine non-neutral. One possible explanation for the secondary structure result is that the mutation affects a very distant position, so the effect is not detected when looking only at the immediate neighbourhood of the mutated position. All other methods are confident in their prediction, as shown for example by the very low values in the substitution matrices and by the PSSM score. In addition, the scores of the three prediction tools indicate that the mutation is clearly non-neutral. To back up this prediction we also compared the amino acids and their properties. The comparison confirms it, since the bulky hydrophobic ring is replaced by a small, sulfur-containing side chain, so the two amino acids are very different. Furthermore, SNAP predicts this position to be essential for the protein's function, as no amino acid other than phenylalanine is tolerated here. Because of all these differences we decide that the mutation is non-neutral.

Y438N: Position 438 behaves almost like position 409. The mutation of tyrosine to asparagine is predicted as non-neutral by nearly all methods. Only the predicted secondary structure could lead to the assumption that the mutation is neutral. One explanation is that the mutation influences a position far away, so that no change in the secondary structure is recognized at the mutated site. This is very likely, since all other methods predict the substitution as non-neutral. The substitution matrices and the PSSM are less decisive than for position 409, because the values are not as low, but they are still low enough to conclude that the mutation will influence the structure and function of the protein. SNAP and SIFT are as confident as for the previous mutation (SNAP: 4, SIFT: 0.0). PolyPhen2 is even more certain, predicting a damaging effect with 1.0 (HumDiv) and 0.987 (HumVar). Again we also compared the physicochemical properties and the structure of the two amino acids. The only property they share is polarity. The other properties differ completely: the mutated amino acid is classified as acidic and small, whereas the original amino acid is hydrophobic and aromatic. The missing aromatic ring may be the reason why SNAP tolerates only tyrosine at this position. Since all methods as well as the structure and the properties point to a non-neutral mutation, we also predict it to be non-neutral.
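The way the individual sources are weighed against each other in the paragraphs above can be sketched as a simple majority vote. This is only an illustration: the text does not state that a formal vote was used, and the function name `consensus`, the tie rule and the source labels are assumptions made for this sketch. The calls for R265W are taken from the discussion of that mutation.

```python
from collections import Counter


def consensus(calls):
    """Return the majority label among the per-source calls.

    `calls` maps a source name to "neutral" or "non-neutral".
    Assumption of this sketch: a tie counts as non-neutral.
    """
    counts = Counter(calls.values())
    if counts["neutral"] > counts["non-neutral"]:
        return "neutral"
    return "non-neutral"


# Calls for R265W as described in the text: only PAM1 and PAM250 say neutral.
r265w = {
    "BLOSUM62": "non-neutral",
    "PAM1": "neutral",
    "PAM250": "neutral",
    "PSSM": "non-neutral",
    "conservation": "non-neutral",
    "secondary structure": "non-neutral",
    "SNAP": "non-neutral",
    "SIFT": "non-neutral",
    "PolyPhen2": "non-neutral",
}

print("R265W:", consensus(r265w))
```

A plain vote treats every source as equally reliable. The discussion above implicitly gives more weight to the dedicated prediction tools than to the general substitution matrices, which a weighted variant could reflect.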

Comparison of the prediction of the tools with the annotation in dbSNP and HGMD

In order to categorize the SNPs extracted from dbSNP and HGMD as "neutral" or "non-neutral" concerning their effect on the protein function, the following assumption was made:

All SNPs listed in HGMD are disease-related mutations. SNPs that are listed in both databases therefore do have an effect on the protein's structure and function. All SNPs listed only in dbSNP do not affect the protein function.
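The assumption above amounts to a small decision rule, sketched here in Python. The function name `annotate` and its boolean parameters are hypothetical and only serve to illustrate the rule.

```python
def annotate(in_dbsnp, in_hgmd):
    """Label a SNP from its database membership, following the stated assumption."""
    if in_hgmd:
        # Listed in HGMD (alone or together with dbSNP): disease related.
        return "non-neutral"
    if in_dbsnp:
        # Listed only in dbSNP: assumed not to affect the protein function.
        return "neutral"
    raise ValueError("SNP is listed in neither dbSNP nor HGMD")


print(annotate(in_dbsnp=True, in_hgmd=False))
print(annotate(in_dbsnp=True, in_hgmd=True))
print(annotate(in_dbsnp=False, in_hgmd=True))
```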

Position	AA1/AA2	dbSNP/HGMD	own prediction	SNAP	SIFT	PolyPhen2
29	G/E	neutral	neutral	neutral	neutral	neutral/neutral
82	M/L	neutral	neutral	non-neutral	neutral	neutral/neutral
125	Q/E	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/neutral
166	Y/N	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral
249	G/S	non-neutral	non-neutral	neutral	non-neutral	neutral/neutral
264	C/W	neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral
265	R/W	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral
326	I/T	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral
409	F/C	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral
438	Y/N	non-neutral	non-neutral	non-neutral	non-neutral	non-neutral/non-neutral

Prediction accuracy of the tools

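The following table shows how many effects on the protein function each tool predicted correctly, measured against the annotation in dbSNP and HGMD (TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives). Here a positive is a neutral SNP and a negative a non-neutral one, as the counts for the three neutral and seven non-neutral annotations imply.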

tool	TP	TN	FP	FN	Sensitivity	Specificity	Accuracy
SNAP	1	6	1	2	0.33	0.86	0.7
SIFT	2	7	0	1	0.67	1.0	0.9
PolyPhen2 (HumDiv)	2	6	1	1	0.67	0.86	0.8
PolyPhen2 (HumVar)	2	5	2	1	0.67	0.71	0.7

To compare the different prediction methods, one should look at the accuracy of the predictions. In our case SIFT performed best, with 9 out of 10 correct predictions. PolyPhen2 using the HumDiv dataset predicted 8 out of 10 correctly, whereas SNAP and PolyPhen2 (HumVar) each predicted the effect of only 7 mutations on the protein function correctly.
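The sensitivity, specificity and accuracy values can be recomputed from the comparison table. The following is a minimal sketch: the single-letter encoding, the `ROWS` layout and the function name `evaluate` are assumptions of this example, and a neutral SNP is treated as the positive class, which is the convention the reported sensitivity and specificity values follow.

```python
# (position, dbSNP/HGMD, SNAP, SIFT, PolyPhen2 HumDiv, PolyPhen2 HumVar)
# "n" = neutral, "x" = non-neutral, transcribed from the comparison table.
ROWS = [
    (29,  "n", "n", "n", "n", "n"),
    (82,  "n", "x", "n", "n", "n"),
    (125, "x", "x", "x", "x", "n"),
    (166, "x", "x", "x", "x", "x"),
    (249, "x", "n", "x", "n", "n"),
    (264, "n", "x", "x", "x", "x"),
    (265, "x", "x", "x", "x", "x"),
    (326, "x", "x", "x", "x", "x"),
    (409, "x", "x", "x", "x", "x"),
    (438, "x", "x", "x", "x", "x"),
]
TOOLS = ["SNAP", "SIFT", "PolyPhen2 (HumDiv)", "PolyPhen2 (HumVar)"]


def evaluate(truth, predicted, positive="n"):
    """Count TP/TN/FP/FN and derive sensitivity, specificity and accuracy."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


truth = [row[1] for row in ROWS]
results = {
    tool: evaluate(truth, [row[2 + i] for row in ROWS])
    for i, tool in enumerate(TOOLS)
}

for tool, r in results.items():
    print(f"{tool}: sensitivity={r['sensitivity']:.2f} "
          f"specificity={r['specificity']:.2f} accuracy={r['accuracy']:.1f}")
```

With only ten SNPs, three of them neutral, a single differing prediction shifts sensitivity by a third, so the ranking of the tools should be read with caution.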


