Difference between revisions of "Sequence-based mutation analysis BCKDHA"
Revision as of 22:00, 25 June 2011
General
We chose the following mutations for the sequence-based mutation analysis:
- G29E
- Q125E
- Y166N
- G249S
- C264W
- R265W
- I326T
- I361V
- F409C
- Y438N
The mutation positions are relative to the UniProt reference sequence.
A Protocol was created that describes all steps for running the programs.
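The mutation codes above follow the pattern reference residue, 1-based position, mutated residue. As a minimal sketch (the helper names `parse_mutation` and `apply_mutation` are hypothetical and not taken from the Protocol), such codes could be parsed and applied to a sequence as follows:

```python
import re

# Mutations as listed above: <reference residue><position><mutated residue>.
MUTATIONS = ["G29E", "Q125E", "Y166N", "G249S", "C264W",
             "R265W", "I326T", "I361V", "F409C", "Y438N"]

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code such as 'G29E' into (reference, position, mutant)."""
    match = MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"not a valid mutation code: {code!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def apply_mutation(sequence, code):
    """Return a mutated copy of `sequence`; positions are 1-based."""
    ref, pos, alt = parse_mutation(code)
    if sequence[pos - 1] != ref:
        # Guards against a position that does not match the reference sequence.
        raise ValueError(
            f"{code}: expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[:pos - 1] + alt + sequence[pos:]
```

The check against the reference residue is a cheap way to notice that a position was counted relative to a different sequence, for example one without the signal peptide.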
Amino Acid Properties
Position | Reference residue | Reference properties | Mutated residue | Mutated properties | Secondary structure |
---|---|---|---|---|---|
29 | G | tiny, small | E | charged, polar | C |
125 | Q | amide, polar | E | charged, polar | C |
166 | Y | hydrophobic, aromatic, polar | N | amide, polar, small | H |
249 | G | tiny, small | S | polar, small, tiny, hydroxylic | H |
264 | C | sulphur containing, hydrophobic, tiny, small, polar | W | hydrophobic, aromatic, polar | E |
265 | R | charged, positive (basic), polar | W | hydrophobic, aromatic, polar | E |
326 | I | aliphatic, hydrophobic | T | hydroxylic, hydrophobic, small, polar | E |
361 | I | aliphatic, hydrophobic | V | aliphatic, hydrophobic, small | H |
409 | F | aromatic, hydrophobic | C | sulphur containing, hydrophobic, tiny, small, polar | C |
438 | Y | hydrophobic, aromatic, polar | N | amide, polar, small | C |
Annotation: H = helix, E = beta-sheet, C = coil
To visualize the mutations in the three-dimensional protein structure, the PDB entry for BCKDHA, 1U5B, was used. As the PDB file only contains coordinate information about the protein itself, the signal peptide (the first 45 amino acids) is not included. Therefore the first mutation, at position 29, which lies in the signal peptide, could not be visualized.
The Protocol describes in detail how we used PyMOL to visualize our mutations.
Substitution matrices
Position | AA1/AA2 | BLOSUM62 score | BLOSUM62 worst | PAM1 score | PAM1 worst | PAM250 score | PAM250 worst | Result |
---|---|---|---|---|---|---|---|---|
29 | G/E | -2 | -4 (I, L) | 7 | 0 (I, W, Y) | 9 | 2 (W) | BLOSUM62 rates this substitution as unlikely, whereas PAM1 and PAM250 do not rate it as unusual. |
125 | Q/E | 2 | -3 (C, F, I) | 27 | 0 (F, W, Y) | 7 | 1 (C, F, W) | All three substitution matrices indicate that this substitution occurs quite often. |
166 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) | All three scores are low, so the substitution is rare and therefore not very probable. |
249 | G/S | 0 | -4 (I, L) | 21 | 0 (I, W, Y) | 11 | 2 (W) | All three scores are comparatively high, which means that this substitution is frequent and therefore probably not very damaging. |
264 | C/W | -2 | -4 (E) | 0 | 0 (N, D, Q, E, G, L, K, M, F, W) | 1 | 1 (R, N, D, Q, E, L, K, M, F, W) | All scores are low. The substitution is rare, so it is very likely to affect the function of the protein. |
265 | R/W | -3 | -3 (W, V, F, I, C) | 8 | 0 (D, E, G, Y) | 7 | 1 (F) | PAM1 and PAM250 give high scores, whereas BLOSUM62 gives a low score. BLOSUM62 therefore suggests that the substitution is rare and probably damaging, while PAM1 and PAM250 suggest that it is fairly frequent and not very damaging. |
326 | I/T | -1 | -4 (G) | 7 | 0 (G, A, P, W) | 4 | 1 (W) | Again the matrices disagree. BLOSUM62 suggests that the substitution is rare, whereas PAM1 and PAM250 suggest that it has no harmful effect on the protein and is therefore probable. |
361 | I/V | 3 | -4 (G) | 33 | 0 (G, H, P, W) | 9 | 1 (W) | All three scores are high, so the substitution is frequent and very probably not damaging. |
409 | F/C | -2 | -4 (P) | 0 | 0 (D, C, Q, E, K, P, V) | 1 | 1 (R, D, C, Q, E, G, K, P) | All three scores are very low, which means that this substitution is very rare. This suggests that the mutation damages the function and structure of the protein. |
438 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) | Again all scores are low, which indicates a damaging effect of the mutation. |
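The "score" and "worst" columns can be derived from a single matrix row: the score is the entry for the mutated residue, and the worst value is the row minimum together with the residues that reach it. The following is a minimal sketch for the glycine row only. The row values are the BLOSUM62 entries for G as commonly tabulated and should be checked against the matrix file actually used; the function name `score_and_worst` is hypothetical.

```python
# Column order used for the matrix row below.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 row for glycine (G) in the order of AMINO_ACIDS.
# Assumption: values as commonly tabulated; verify against your matrix file.
BLOSUM62_G = [0, -2, 0, -1, -3, -2, -2, 6, -2, -4,
              -4, -2, -3, -3, -2, 0, -2, -2, -3, -3]


def score_and_worst(row, mutant):
    """Return (score for `mutant`, worst score, residues with the worst score)."""
    scores = dict(zip(AMINO_ACIDS, row))
    worst = min(scores.values())
    worst_residues = [aa for aa, s in scores.items() if s == worst]
    return scores[mutant], worst, worst_residues


g_to_e = score_and_worst(BLOSUM62_G, "E")  # G29E
g_to_s = score_and_worst(BLOSUM62_G, "S")  # G249S
```

For this row the sketch reproduces the BLOSUM62 entries given in the table for G/E and G/S.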
PSSM
Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V
29 G 1 0 0 -2 1 0 -2 4 1 -1 -3 -1 -3 -2 -3 1 -1 -4 -1 -1 10 6 5 1 3 4 1 37 3 5 2 3 0 2 0 9 2 0 2 6 0.37 0.77
125 Q -1 -1 -3 -3 -5 8 0 -4 0 -3 -4 -1 -1 -6 -4 -1 1 -5 -4 -4 4 2 0 0 0 74 2 0 1 1 1 2 1 0 0 2 9 0 0 0 1.46 1.28
166 Y 3 -3 -4 -4 3 0 -3 -4 1 -2 -2 -3 0 -2 -4 -1 1 7 3 1 24 1 0 0 7 5 1 1 3 1 3 1 2 1 0 3 8 15 12 11 0.62 1.29
249 G 5 -4 -3 -4 -4 -3 -2 4 -4 -5 -5 -3 -4 -5 -4 1 -2 -5 -5 -4 54 0 0 0 0 0 3 35 0 0 0 0 0 0 0 8 1 0 0 0 1.12 1.21
264 C -2 -5 -3 -5 9 -5 -5 -5 -5 3 -2 -5 -3 -4 -1 -3 -3 -5 -4 4 1 0 2 0 45 0 0 0 0 15 2 0 0 0 4 1 1 0 0 29 1.43 1.18
265 R -3 4 2 -3 -5 5 2 -4 0 -2 -4 -1 -2 -5 -4 -2 -2 -5 -2 0 0 25 12 0 0 34 15 0 1 2 0 1 0 0 0 1 1 0 1 7 0.88 1.21
326 I -3 -5 -6 -6 -4 -5 -6 -6 -6 7 0 -5 0 -2 -5 -5 -3 -5 -4 4 0 0 0 0 0 0 0 0 0 66 6 0 1 1 0 0 0 0 0 26 1.40 1.17
361 I -3 -5 -6 -6 -4 -5 -6 -6 -6 6 3 -5 1 -1 -5 -5 -3 -5 -4 2 0 0 0 0 0 0 0 0 0 55 27 0 3 1 0 0 0 0 0 14 1.22 1.21
409 F -4 -3 -6 -6 -3 -5 -5 -6 -3 0 1 -5 1 8 -6 -5 -3 0 1 -1 1 1 0 0 1 0 0 0 0 5 11 0 3 69 0 0 1 1 3 4 1.56 1.31
438 Y 0 -2 -2 -4 -2 -1 -1 -3 3 -3 -3 -3 -3 1 -5 -3 -3 3 8 -2 9 2 1 0 1 3 3 1 6 0 1 0 0 1 0 0 0 3 66 2 1.34 0.89
The values in the PSSM reflect the degree of conservation in a multiple alignment. The higher the score for an amino acid at a given position, the more often that amino acid is observed there in related sequences, so a substitution to it is usually tolerated, as both amino acids have been passed on successfully. The PSSM values for our mutations have been colored orange. For most of the mutations the PSSM score is negative, so the substituted amino acid is rarely observed at that position and the substitution is not likely to be tolerated.
The Q125E mutation has a score of 0.
The G249S substitution has a score of +1, which is comparatively good. This indicates a low degree of conservation for serine at this position, so the mutation might be tolerated in nature.
The highest score among our substitutions is +2, for the I361V mutation. This substitution is conserved and likely to be tolerated, which can be explained by the structural and physicochemical similarity of isoleucine and valine.
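The scores discussed above can be read directly from the PSSM rows. As a minimal sketch, assuming the row layout shown in the table (position, residue, 20 log-odds scores, 20 weighted percentages, then two summary values), the score of a substitution is the entry in the column of the mutated amino acid. The function name `parse_pssm_row` is hypothetical.

```python
# Column order of the PSSM, as given in its header line.
AMINO_ACIDS = "A R N D C Q E G H I L K M F P S T W Y V".split()

# Two rows copied from the PSSM above.
ROW_249 = ("249 G 5 -4 -3 -4 -4 -3 -2 4 -4 -5 -5 -3 -4 -5 -4 1 -2 -5 -5 -4 "
           "54 0 0 0 0 0 3 35 0 0 0 0 0 0 0 8 1 0 0 0 1.12 1.21")
ROW_361 = ("361 I -3 -5 -6 -6 -4 -5 -6 -6 -6 6 3 -5 1 -1 -5 -5 -3 -5 -4 2 "
           "0 0 0 0 0 0 0 0 0 55 27 0 3 1 0 0 0 0 0 14 1.22 1.21")


def parse_pssm_row(line):
    """Return (position, residue, {amino acid: log-odds score}) for one row."""
    fields = line.split()
    position, residue = int(fields[0]), fields[1]
    # Fields 2..21 hold the 20 scores; the percentages that follow are ignored.
    scores = [int(value) for value in fields[2:22]]
    return position, residue, dict(zip(AMINO_ACIDS, scores))


pos_249, res_249, scores_249 = parse_pssm_row(ROW_249)
pos_361, res_361, scores_361 = parse_pssm_row(ROW_361)
score_g249s = scores_249["S"]
score_i361v = scores_361["V"]
```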
Secondary Structure
To find out whether the mutations have an influence on the secondary structure of the protein, we compared the predicted secondary structure of the sequence without mutations with that of the sequence including the mutations. To obtain the secondary structure of the two sequences we used PSIPRED.
We compared the structure for each position:
Position 29
seq: LLLRQPGARG
non-mut: HHHCCCCCCC
mut: HHHCCCCCCC
Position 125
seq: ESQRQGRISF
non-mut: HHHHCCCCCC
mut: HHHHCCCCCC
Position 166
seq: AGVLMYRDYP
non-mut: HHHHHHCCCC
mut: HHHHHHCCCC
Position 249
seq: ASEGDAHAGF
non-mut: CCHHHHHHHH
mut: CCCHHHHHHH
Position 264
seq: PIIFFCRNNG
non-mut: CEEEEEECCC
mut: CEEEEEECCC
Position 265
seq: PIIFFCRNNG
non-mut: CEEEEEECCC
mut: CEEEEEECCC
Position 326
seq: ENQPFLIEAM
non-mut: CCCCEEEEEE
mut: CCCCEEEEEE
Position 361
seq: ISRLRHYLLS
non-mut: HHHHHHHHHH
mut: HHHHHHHHHH
Position 409
seq: PKPNPNLLFS
non-mut: CCCCHHHHHH
mut: CCCCHHHHHH
Position 438
seq: HLQTYGEHYP
non-mut: HHHHHCCCCC
mut: HHHHHCCCCC
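Comparing the secondary structure at all positions shows that none of the mutations changes the predicted secondary structure of the mutated residue itself. The only difference is in the window around position 249, where the third residue is predicted as helix (H) in the non-mutated sequence and as coil (C) in the mutated sequence, so the helix starts one residue later.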
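The comparison can also be done programmatically. The sketch below takes the ten prediction windows listed above and reports, for each mutation, the 0-based indices within the window at which the two predictions differ. The function name `changed_positions` is hypothetical.

```python
# Predicted secondary structure per window: (non-mutated, mutated).
WINDOWS = {
    29: ("HHHCCCCCCC", "HHHCCCCCCC"),
    125: ("HHHHCCCCCC", "HHHHCCCCCC"),
    166: ("HHHHHHCCCC", "HHHHHHCCCC"),
    249: ("CCHHHHHHHH", "CCCHHHHHHH"),
    264: ("CEEEEEECCC", "CEEEEEECCC"),
    265: ("CEEEEEECCC", "CEEEEEECCC"),
    326: ("CCCCEEEEEE", "CCCCEEEEEE"),
    361: ("HHHHHHHHHH", "HHHHHHHHHH"),
    409: ("CCCCHHHHHH", "CCCCHHHHHH"),
    438: ("HHHHHCCCCC", "HHHHHCCCCC"),
}


def changed_positions(non_mut, mut):
    """Return the 0-based window indices where the two predictions differ."""
    if len(non_mut) != len(mut):
        raise ValueError("predictions must have the same length")
    return [i for i, (a, b) in enumerate(zip(non_mut, mut)) if a != b]


# Keep only the mutations whose window shows at least one difference.
differences = {
    position: changed_positions(non_mut, mut)
    for position, (non_mut, mut) in WINDOWS.items()
    if non_mut != mut
}
```

Only the window for position 249 appears in `differences`; all other windows are identical.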
Multiple Sequence Alignment
To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.
ID | Accession | Entry name |
---|---|---|
sp | P11178 | ODBA_BOVIN |
sp | P12694 | ODBA_HUMAN |
sp | Q8HXY4 | ODBA_MACFA |
sp | P50136 | ODBA_MOUSE |
sp | A5A6H9 | ODBA_PANTR |
sp | P11960 | ODBA_RAT |
tr | Q6ZSA3 | Q6ZSA3_HUMAN |
tr | E7ESE6 | E7ESE6_HUMAN |
tr | B2R8A9 | B2R8A9_HUMAN |
tr | Q658P7 | Q658P7_HUMAN |
tr | E7EW46 | E7EW46_HUMAN |
tr | B4DP47 | B4DP47_HUMAN |
tr | Q59EI3 | Q59EI3_HUMAN |
tr | F1N5F2 | F1N5F2_BOVIN |
tr | B1PK12 | B1PK12_PIG |
tr | E2RPW4 | E2RPW4_CANFA |
tr | B2LSM3 | B2LSM3_SHEEP |
tr | F1RHA0 | F1RHA0_PIG |
tr | F1PI86 | F1PI86_CANFA |
tr | D2HMT3 | D2HMT3_AILME |
tr | Q2TBT9 | Q2TBT9_BOVIN |
tr | Q3U3J1 | Q3U3J1_MOUSE |
tr | Q99L69 | Q99L69_MOUSE |
tr | Q5EB89 | Q5EB89_RAT |
tr | B1WBN3 | B1WBN3_RAT |
With these 25 results we built a multiple alignment using CLUSTALW. The alignment of all mammalian homologs was quite poor because of the sequences "Q6ZSA3" and "E2RPW4", which are much longer than the others. We therefore removed these two sequences and realigned the remaining ones.
With this new multiple alignment we could analyze the 10 positions of our mutations to find out how well they are conserved.
position | conservation wildtype | conservation mutant |
---|---|---|
29 | 0.72 | 0 |
125 | 0.96 | 0 |
166 | 1 | 0 |
249 | 1 | 0 |
264 | 1 | 0 |
265 | 1 | 0 |
326 | 1 | 0 |
361 | 0.92 | 0 |
409 | 0.92 | 0 |
438 | 0.92 | 0 |
The results show that the amino acids at all observed positions are very well conserved, since the value is always close to 1. Only at position 29 is the conservation of glycine lower, at about 72%, but the position is still well conserved. Regions of a protein that are well conserved are probably very important for its structure and function. Because all wild-type amino acids are so well conserved, mutations at these positions can be very damaging and can have a large impact on the protein and its function.
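A conservation value of this kind can be computed per alignment column as the fraction of sequences that carry a given amino acid. The sketch below is one possible definition, not necessarily the one used for the table: it assumes that gaps ("-") count as non-matching sequences, and the example column is invented for illustration.

```python
def conservation(column, residue):
    """Fraction of sequences in an alignment column that carry `residue`.

    Assumption: gap characters count towards the total, so a gapped
    sequence lowers the conservation of every residue.
    """
    if not column:
        raise ValueError("empty alignment column")
    return column.count(residue) / len(column)


# Hypothetical column with four aligned sequences (not from our alignment).
example_column = "GGGA"
wildtype_conservation = conservation(example_column, "G")
mutant_conservation = conservation(example_column, "E")
```

With this definition a mutant conservation of 0 simply means that the mutated amino acid does not occur at that position in any of the aligned sequences.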
SNAP
To run SNAP we used the command:
snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out
nsSNP | Prediction | Reliability Index | Expected Accuracy |
---|---|---|---|
G29E | Neutral | 0 | 53% |
Q125E | Non-neutral | 1 | 63% |
Y166N | Non-neutral | 2 | 70% |
G249S | Neutral | 1 | 60% |
C264W | Non-neutral | 4 | 82% |
R265W | Non-neutral | 4 | 82% |
I326T | Non-neutral | 3 | 78% |
I361V | Neutral | 4 | 85% |
F409C | Non-neutral | 4 | 82% |
Y438N | Non-neutral | 4 | 82% |
The output of SNAP shows that most of the mutations are predicted to be non-neutral, i.e. to affect the structure and function of the protein. Only the mutations at positions 29, 249 and 361 are predicted to be neutral, and the predictions for G29E and G249S have low reliability indices (0 and 1).
A second SNAP run was performed where all ten chosen mutation positions were mutated by all possible substitutions.
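The input for such a run can be generated rather than typed by hand. The sketch below builds all 19 possible substitutions for each of the ten positions. It assumes that the mutation file lists one code such as G29E per line, which should be checked against the SNAP documentation; the function name `all_substitutions` is hypothetical.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residue at each of the ten chosen positions.
WILD_TYPE = {29: "G", 125: "Q", 166: "Y", 249: "G", 264: "C",
             265: "R", 326: "I", 361: "I", 409: "F", 438: "Y"}


def all_substitutions(wild_type):
    """Return every single-residue substitution at the given positions."""
    return [
        f"{ref}{position}{alt}"
        for position, ref in sorted(wild_type.items())
        for alt in AMINO_ACIDS
        if alt != ref  # skip the identity "substitution"
    ]


substitutions = all_substitutions(WILD_TYPE)
# Assumed file format: one mutation code per line.
file_content = "\n".join(substitutions) + "\n"
```

Writing `file_content` to a file would give a list of 190 mutations (10 positions times 19 alternatives).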
SIFT
The following table displays the SIFT results. The threshold for intolerance is 0.05.
The amino acids are colored in the following way:
- nonpolar
- uncharged polar
- basic
- acidic
Capital letters: amino acids appear in the alignment
Lower case letters: amino acids result from prediction
Seq Rep: fraction of sequences that contain one of the basic amino acids
The only substitutions that SIFT predicts not to affect protein function are G29E and I361V. The first substitution may be tolerated because this position is not within the actual protein sequence (it lies in the signal peptide). The second tolerated exchange is from isoleucine to valine, which are both branched-chain amino acids. These two amino acids are quite similar in structure and physicochemical properties, so an exchange can be tolerated.
Polyphen2
Position | AA1/AA2 | HumDiv prediction | HumDiv score | HumDiv sensitivity | HumDiv specificity | HumVar prediction | HumVar score | HumVar sensitivity | HumVar specificity |
---|---|---|---|---|---|---|---|---|---|
29 | G/E | benign | 0.025 | 0.96 | 0.80 | benign | 0.018 | 0.96 | 0.52 |
125 | Q/E | possibly damaging | 0.759 | 0.85 | 0.93 | benign | 0.285 | 0.87 | 0.75 |
166 | Y/N | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.964 | 0.59 | 0.93 |
249 | G/S | benign | 0.145 | 0.93 | 0.86 | benign | 0.292 | 0.86 | 0.75 |
264 | C/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 |
265 | R/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 |
326 | I/T | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.998 | 0.16 | 0.99 |
361 | I/V | benign | 0.039 | 0.95 | 0.82 | benign | 0.178 | 0.89 | 0.70 |
409 | F/C | probably damaging | 0.998 | 0.27 | 0.99 | probably damaging | 0.939 | 0.64 | 0.92 |
438 | Y/N | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.987 | 0.49 | 0.96 |
Polyphen2 uses two different datasets for its predictions, and as the results show, the two predictions do not always agree. With the HumDiv dataset, three mutations are predicted to have no serious effect on the function or structure of the protein, whereas with HumVar four mutations are predicted to have no damaging influence. The three mutations marked as "benign" with both datasets are at positions 29, 249 and 361. The mutation predicted as benign only with the HumVar dataset is at position 125.
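The agreement between the two datasets can be checked with a few lines of code. The sketch below copies the prediction columns from the table above into two dictionaries (the variable names are chosen for illustration) and lists the positions that are benign in both datasets and those where the predictions differ.

```python
# Predictions per position, copied from the Polyphen2 table.
HUMDIV = {
    29: "benign", 125: "possibly damaging", 166: "probably damaging",
    249: "benign", 264: "probably damaging", 265: "probably damaging",
    326: "probably damaging", 361: "benign", 409: "probably damaging",
    438: "probably damaging",
}
HUMVAR = {
    29: "benign", 125: "benign", 166: "probably damaging",
    249: "benign", 264: "probably damaging", 265: "probably damaging",
    326: "probably damaging", 361: "benign", 409: "probably damaging",
    438: "probably damaging",
}

# Positions predicted as benign with both datasets.
benign_in_both = sorted(
    p for p in HUMDIV if HUMDIV[p] == "benign" and HUMVAR[p] == "benign"
)

# Positions where the two datasets give different predictions.
disagreements = sorted(p for p in HUMDIV if HUMDIV[p] != HUMVAR[p])
```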
Comparison
Position | AA1/AA2 | BLOSUM62 | PAM1 | PAM250 | PSSM | Multiple Alignment | SNAP | SIFT | Polyphen2 | ||
---|---|---|---|---|---|---|---|---|---|---|---|
Prediction | Conservation wildtype | Conservation mutant | Prediction | Prediction | HumDiv | HumVar | |||||
29 | G/E | non-neutral | neutral | neutral | non-neutral | neutral | 0 | neutral | neutral | neutral | neutral |
125 | Q/E | neutral | neutral | neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | neutral |
166 | Y/N | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
249 | G/S | non-neutral | neutral | neutral | neutral | non-neutral | 0 | neutral | non-neutral | neutral | neutral |
264 | C/W | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
265 | R/W | non-neutral | neutral | neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
326 | I/T | non-neutral | neutral | neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
361 | I/V | neutral | neutral | neutral | neutral | neutral | 0 | non-neutral | neutral | neutral | neutral |
409 | F/C | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
438 | Y/N | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | 0 | non-neutral | non-neutral | non-neutral | non-neutral |
position 29:
position 125:
position 166:
position 249:
position 264:
position 265:
position 326:
position 361:
position 409:
position 438: