Sequence-based mutation analysis BCKDHA
General
A Protocol was created that describes all the steps for running the programs.
Subset of mutations
Position | Reference residue | Properties | Mutant residue | Properties | Secondary structure |
---|---|---|---|---|---|
29 | G | tiny, small | E | charged, polar | C |
125 | Q | amide, polar | E | charged, polar | C |
166 | Y | hydrophobic, aromatic, polar | N | amide, polar, small | H |
249 | G | tiny, small | S | polar, small, tiny, hydroxylic | H |
264 | C | sulphur containing, hydrophobic, tiny, small, polar | W | hydrophobic, aromatic, polar | E |
265 | R | charged, positive (basic), polar | W | hydrophobic, aromatic, polar | E |
326 | I | aliphatic, hydrophobic | T | hydroxylic, hydrophobic, small, polar | E |
361 | I | aliphatic, hydrophobic | V | aliphatic, hydrophobic, small | H |
409 | F | aromatic, hydrophobic | C | sulphur containing, hydrophobic, tiny, small, polar | C |
438 | Y | hydrophobic, aromatic, polar | N | amide, polar, small | C |
Secondary structure: H = helix, E = beta-strand, C = coil
To visualize the mutations in the three-dimensional protein structure, we used the PDB entry 1U5B for BCKDHA. As the PDB file only contains coordinates for the mature protein, the signal peptide (the first 45 amino acids) is not included. Therefore the mutation at position 29, which lies in the signal peptide, could not be visualized.
The Protocol describes in detail how we used PyMOL to visualize the mutations.
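The mapping between sequence positions and structure positions can be illustrated with a short script. The following Python sketch is a hypothetical helper, not the script from the Protocol: it prints PyMOL commands that load 1U5B and highlight the mutated residues. It assumes, and this must be checked against the PDB file, that the alpha subunit is chain A and that residues are numbered from the mature protein, i.e. UniProt position minus 45. Positions inside the signal peptide are reported instead of being selected.

```python
# Hypothetical helper (not from the Protocol): builds a PyMOL .pml script that
# highlights the BCKDHA mutation sites in PDB entry 1U5B.
# ASSUMPTIONS to verify against the PDB file: the alpha subunit is chain A and
# structure residue number = UniProt (precursor) position - 45.

SIGNAL_PEPTIDE_LENGTH = 45  # residues 1-45 have no coordinates in the structure


def to_structure_numbering(uniprot_pos, offset=SIGNAL_PEPTIDE_LENGTH):
    """Return the mature-chain residue number, or None for signal-peptide positions."""
    if uniprot_pos <= offset:
        return None
    return uniprot_pos - offset


def build_pml(positions, pdb_id="1u5b", chain="A", offset=SIGNAL_PEPTIDE_LENGTH):
    """Return (pml_script, skipped_positions) for the given UniProt positions."""
    mapped = [to_structure_numbering(p, offset) for p in positions]
    skipped = [p for p, m in zip(positions, mapped) if m is None]
    resi = "+".join(str(m) for m in mapped if m is not None)
    commands = [
        f"fetch {pdb_id}",
        "hide everything",
        "show cartoon",
        f"select mutations, chain {chain} and resi {resi}",
        "show sticks, mutations",
        "color red, mutations",
    ]
    return "\n".join(commands), skipped


positions = [29, 125, 166, 249, 264, 265, 326, 361, 409, 438]
script, skipped = build_pml(positions)
print(script)
print("Not in the structure (signal peptide):", skipped)
```

Saving the printed commands to a file such as `mutations.pml` (hypothetical name) and opening it with `pymol mutations.pml` should display the selection. If the highlighted residues are not the expected wild-type amino acids, the offset or chain assumption is wrong.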
Position | AA1/AA2 | BLOSUM62 score | BLOSUM62 worst (residues) | PAM1 score | PAM1 worst (residues) | PAM250 score | PAM250 worst (residues) |
---|---|---|---|---|---|---|---|
29 | G/E | -2 | -4 (I, L) | 7 | 0 (I, W, Y) | 9 | 2 (W) |
125 | Q/E | 2 | -3 (C, F, I) | 27 | 0 (F, W, Y) | 7 | 1 (C, F, W) |
166 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) |
249 | G/S | 0 | -4 (I, L) | 21 | 0 (I, W, Y) | 11 | 2 (W) |
264 | C/W | -2 | -4 (E) | 0 | 0 (N, D, Q, E, G, L, K, M, F, W) | 1 | 1 (R, N, D, Q, E, L, K, M, F, W) |
265 | R/W | -3 | -3 (W, V, F, I, C) | 8 | 0 (D, E, G, Y) | 7 | 1 (F) |
326 | I/T | -1 | -4 (G) | 7 | 0 (G, A, P, W) | 4 | 1 (W) |
361 | I/V | 3 | -4 (G) | 33 | 0 (G, H, P, W) | 9 | 1 (W) |
409 | F/C | -2 | -4 (P) | 0 | 0 (D, C, Q, E, K, P, V) | 1 | 1 (R, D, C, Q, E, G, K, P) |
438 | Y/N | -2 | -3 (D, G, P) | 3 | 0 (R, D, Q, G, K, M, P) | 2 | 1 (A, R, D, Q, E, G, K, P) |
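The BLOSUM62 columns can be recomputed from the matrix itself. The sketch below uses BLOSUM62 rows for the seven wild-type residues, transcribed by hand (an assumption to verify against NCBI's matrix file; Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")` provides the full matrix). It returns both the score of a substitution and the lowest score in the wild-type row, which is what the "worst" column lists.

```python
# BLOSUM62 rows for the wild-type residues in the mutation list, transcribed by
# hand from the standard matrix (verify against NCBI's BLOSUM62 file).
# Column order: A R N D C Q E G H I L K M F P S T W Y V
COLUMNS = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62_ROWS = {
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "C": [0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],
    "Y": [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],
}


def blosum_score(wt, mut):
    """BLOSUM62 score for replacing `wt` by `mut`."""
    return BLOSUM62_ROWS[wt][COLUMNS.index(mut)]


def worst_substitutions(wt):
    """Lowest score in the wild-type row and the residues that reach it."""
    row = BLOSUM62_ROWS[wt]
    worst = min(row)
    return worst, [aa for aa, s in zip(COLUMNS, row) if s == worst]


for pos, wt, mut in [(29, "G", "E"), (125, "Q", "E"), (265, "R", "W"), (361, "I", "V")]:
    worst, residues = worst_substitutions(wt)
    print(pos, f"{wt}/{mut}", blosum_score(wt, mut), worst, residues)
```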
PSSM
In order to obtain a human-readable PSSM file, PSI-BLAST was run with the following command:
blastpgp -i sequence.fasta -j 5 -d /data/blast/nr/nr -C profile.ckp -u 1 -J T
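A PSSM score for a mutation is read from the row of the mutated position and the column of the mutant residue. The parser below is a sketch that assumes the ASCII matrix layout written by `blastpgp -Q` or BLAST+ `psiblast -out_ascii_pssm`: a header row with the 20 one-letter codes, followed by one row per position holding the position, the query residue and 20 log-odds scores. Whether the file produced by the command above has this layout should be checked first.

```python
def parse_ascii_pssm(text):
    """Parse an ASCII PSSM into {position: (residue, {aa: log-odds score})}."""
    columns = None
    pssm = {}
    for line in text.splitlines():
        fields = line.split()
        if columns is None:
            # The first line whose first 20 fields are single letters is the header.
            if len(fields) >= 20 and all(len(f) == 1 and f.isalpha() for f in fields[:20]):
                columns = fields[:20]
            continue
        # Data rows: position, query residue, then 20 log-odds scores
        # (any further fields, e.g. percentages, are ignored).
        if len(fields) >= 22 and fields[0].isdigit():
            scores = dict(zip(columns, (int(v) for v in fields[2:22])))
            pssm[int(fields[0])] = (fields[1], scores)
    return pssm


def pssm_score(pssm, position, wildtype, mutant):
    """Log-odds score for replacing `wildtype` at `position` by `mutant`."""
    residue, scores = pssm[position]
    if residue != wildtype:
        raise ValueError(f"position {position} is {residue}, expected {wildtype}")
    return scores[mutant]
```

For example, `pssm_score(parse_ascii_pssm(open("profile.pssm").read()), 438, "Y", "N")` (hypothetical file name) would return the log-odds score of Y438N. Negative values mean that the mutant residue is observed less often at that position than expected by chance.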
Multiple Sequence Alignment
To find sequences homologous to BCKDHA we used BLAST. It found 250 homologous sequences, 25 of which are from mammals.
ID | Accession | Entry name |
---|---|---|
sp | P11178 | ODBA_BOVIN |
sp | P12694 | ODBA_HUMAN |
sp | Q8HXY4 | ODBA_MACFA |
sp | P50136 | ODBA_MOUSE |
sp | A5A6H9 | ODBA_PANTR |
sp | P11960 | ODBA_RAT |
tr | Q6ZSA3 | Q6ZSA3_HUMAN |
tr | E7ESE6 | E7ESE6_HUMAN |
tr | B2R8A9 | B2R8A9_HUMAN |
tr | Q658P7 | Q658P7_HUMAN |
tr | E7EW46 | E7EW46_HUMAN |
tr | B4DP47 | B4DP47_HUMAN |
tr | Q59EI3 | Q59EI3_HUMAN |
tr | F1N5F2 | F1N5F2_BOVIN |
tr | B1PK12 | B1PK12_PIG |
tr | E2RPW4 | E2RPW4_CANFA |
tr | B2LSM3 | B2LSM3_SHEEP |
tr | F1RHA0 | F1RHA0_PIG |
tr | F1PI86 | F1PI86_CANFA |
tr | D2HMT3 | D2HMT3_AILME |
tr | Q2TBT9 | Q2TBT9_BOVIN |
tr | Q3U3J1 | Q3U3J1_MOUSE |
tr | Q99L69 | Q99L69_MOUSE |
tr | Q5EB89 | Q5EB89_RAT |
tr | B1WBN3 | B1WBN3_RAT |
With these 25 sequences we built a multiple alignment using ClustalW. The alignment of all mammalian homologues was of poor quality because of the sequences Q6ZSA3 and E2RPW4, which are much longer than the others. We therefore removed these two sequences and realigned the remaining ones.
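We identified the two overly long sequences by inspecting the alignment. As an illustration of how such length outliers could be flagged automatically, the sketch below marks every sequence that is longer than a chosen multiple of the median length. The factor of 1.5 is an arbitrary assumption for illustration, not a value used in this analysis, and the minimal FASTA reader stands in for a proper parser.

```python
from statistics import median


def read_fasta_lengths(text):
    """Minimal FASTA reader returning {sequence ID: ungapped length}."""
    lengths, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif line and current is not None:
            lengths[current] += len(line.replace("-", ""))
    return lengths


def flag_long_sequences(lengths, factor=1.5):
    """IDs of sequences longer than `factor` times the median length (arbitrary cut-off)."""
    cutoff = factor * median(lengths.values())
    return sorted(seq_id for seq_id, n in lengths.items() if n > cutoff)
```

Sequences returned by `flag_long_sequences` would then be removed before realigning. A length cut-off only catches the kind of problem described here; misaligned sequences of normal length would still have to be found by inspection.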
Using this new multiple alignment, we analyzed how well the 10 mutated positions are conserved.
Position | Conservation wild type | Conservation mutant |
---|---|---|
29 | 0.72 | 0 |
125 | 0.96 | 0 |
166 | 1 | 0 |
249 | 1 | 0 |
264 | 1 | 0 |
265 | 1 | 0 |
326 | 1 | 0 |
361 | 0.92 | 0 |
409 | 0.92 | 0 |
438 | 0.92 | 0 |
The results show that the wild-type residues at all observed positions are highly conserved, with values at or close to 1. Only at position 29 is the conservation of glycine lower, at about 72%; this is less than at the other positions, but still fairly well conserved. None of the mutant residues occurs at its position in the alignment (conservation 0). Well-conserved regions of a protein are probably important for its structure and function. Since all of these positions are well conserved, mutations at them may be very damaging and have a large impact on the protein and its function.
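The conservation values are fractions of sequences (0.72 corresponds to about 72%). A minimal sketch of how such a value can be computed from an alignment, assuming that conservation means the fraction of aligned sequences carrying the given residue in the alignment column that corresponds to the query position, with gapped sequences counted in the denominator. The toy alignment is made up for illustration.

```python
def query_to_column(aligned_query):
    """Map 1-based positions of the ungapped query to 0-based alignment columns."""
    mapping, pos = {}, 0
    for col, aa in enumerate(aligned_query):
        if aa != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def residue_fraction(alignment, query_index, position, residue):
    """Fraction of aligned sequences with `residue` at the query `position`."""
    col = query_to_column(alignment[query_index])[position]
    column = [seq[col] for seq in alignment]
    return sum(aa == residue for aa in column) / len(column)


# Made-up toy alignment; the first sequence plays the role of the query.
toy = ["MG-KY",
       "MGAKY",
       "MAAKF",
       "MG-RY"]
print(residue_fraction(toy, 0, 2, "G"))  # → 0.75 (column: G, G, A, G)
```

Mapping query positions to alignment columns matters because gaps in the query shift the columns; using the raw sequence position as a column index would compare the wrong residues.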
SNAP
To run SNAP we used the command:
snapfun -i BCKDHA.fasta -m mutations.txt -o SNAP.out
nsSNP | Prediction | Reliability Index | Expected Accuracy |
---|---|---|---|
G29E | Neutral | 0 | 53% |
Q125E | Non-neutral | 1 | 63% |
Y166N | Non-neutral | 2 | 70% |
G249S | Neutral | 1 | 60% |
C264W | Non-neutral | 4 | 82% |
R265W | Non-neutral | 4 | 82% |
I326T | Non-neutral | 3 | 78% |
I361V | Neutral | 4 | 85% |
F409C | Non-neutral | 4 | 82% |
Y438N | Non-neutral | 4 | 82% |
A second SNAP run was performed in which each of the ten mutation positions was mutated to all 19 alternative amino acids.
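The input for this second run can be generated instead of written by hand. The sketch below produces all 19 alternative substitutions for each of the ten positions (190 lines). It assumes that the mutations file passed with `-m` uses the same one-mutation-per-line notation as the SNAP table above (e.g. G29E); the output file name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residues at the ten positions (from the mutation table above).
WILDTYPE = {29: "G", 125: "Q", 166: "Y", 249: "G", 264: "C",
            265: "R", 326: "I", 361: "I", 409: "F", 438: "Y"}


def all_substitutions(wildtype):
    """Yield every non-synonymous substitution, e.g. 'G29A', 'G29C', ..."""
    for pos in sorted(wildtype):
        wt = wildtype[pos]
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{pos}{aa}"


mutations = list(all_substitutions(WILDTYPE))
# Hypothetical file name; it would be passed to snapfun with -m.
with open("all_mutations.txt", "w") as handle:
    handle.write("\n".join(mutations) + "\n")
print(len(mutations))  # → 190
```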
SIFT
The following table displays the SIFT results. The threshold for intolerance is 0.05: substitutions with a score below 0.05 are predicted to affect protein function.
The amino acids are colored in the following way:
- nonpolar
- uncharged polar
- basic
- acidic
Capital letters: amino acids that appear in the alignment
Lower case letters: amino acids that result from the prediction
Seq Rep: fraction of sequences that contain one of the basic amino acids
The only substitutions that SIFT predicts not to affect protein function are G29E and I361V. The first may be tolerated because position 29 lies in the signal peptide, which is not part of the mature protein. The second is an exchange of isoleucine for valine, which are both branched-chain amino acids. The two are very similar in structure and physicochemical properties, so the exchange can be tolerated.
Polyphen2
Position | AA1/AA2 | HumDiv prediction | HumDiv score | HumDiv sensitivity | HumDiv specificity | HumVar prediction | HumVar score | HumVar sensitivity | HumVar specificity |
---|---|---|---|---|---|---|---|---|---|
29 | G/E | benign | 0.025 | 0.96 | 0.80 | benign | 0.018 | 0.96 | 0.52 |
125 | Q/E | possibly damaging | 0.759 | 0.85 | 0.93 | benign | 0.285 | 0.87 | 0.75 |
166 | Y/N | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.964 | 0.59 | 0.93 |
249 | G/S | benign | 0.145 | 0.93 | 0.86 | benign | 0.292 | 0.86 | 0.75 |
264 | C/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 |
265 | R/W | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 |
326 | I/T | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.998 | 0.16 | 0.99 |
361 | I/V | benign | 0.039 | 0.95 | 0.82 | benign | 0.178 | 0.89 | 0.70 |
409 | F/C | probably damaging | 0.998 | 0.27 | 0.99 | probably damaging | 0.939 | 0.64 | 0.92 |
438 | Y/N | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.987 | 0.49 | 0.96 |
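The prediction labels follow from the scores via fixed cut-offs. The sketch below reproduces the labels in the table, assuming the cut-offs listed in the dbNSFP documentation for PolyPhen-2 (HumDiv: probably damaging from 0.957, possibly damaging from 0.453; HumVar: probably damaging from 0.909, possibly damaging from 0.447). These values are assumptions to check against the PolyPhen-2 release that produced the scores.

```python
# Cut-offs as listed in the dbNSFP documentation for PolyPhen-2 (assumption:
# verify against the PolyPhen-2 release that produced the scores).
CUTOFFS = {
    "HumDiv": (0.957, 0.453),  # (probably damaging, possibly damaging)
    "HumVar": (0.909, 0.447),
}


def polyphen_label(score, model):
    """Map a PolyPhen-2 score to its qualitative prediction for `model`."""
    probably, possibly = CUTOFFS[model]
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


# (position, HumDiv score, HumVar score) taken from the table above
SCORES = [(29, 0.025, 0.018), (125, 0.759, 0.285), (166, 0.997, 0.964),
          (249, 0.145, 0.292), (264, 1.000, 1.000), (265, 1.000, 1.000),
          (326, 0.997, 0.998), (361, 0.039, 0.178), (409, 0.998, 0.939),
          (438, 1.000, 0.987)]

for pos, hdiv, hvar in SCORES:
    print(pos, polyphen_label(hdiv, "HumDiv"), polyphen_label(hvar, "HumVar"))
```

The different cut-offs explain why Q125E is "possibly damaging" under HumDiv (0.759) but "benign" under HumVar (0.285): the HumVar model both scores it lower and applies a comparable threshold.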
Comparison
Position | AA1/AA2 | BLOSUM62 | PAM1 | PAM250 | PSSM prediction | Multiple alignment: conservation wild type | Multiple alignment: conservation mutant | SNAP prediction | SIFT prediction | PolyPhen-2 HumDiv | PolyPhen-2 HumVar |
---|---|---|---|---|---|---|---|---|---|---|---|
29 | G/E | -2 | 7 | 9 |  | 0.72 | 0 | Neutral | Tolerated | benign | benign |
125 | Q/E | 2 | 27 | 7 |  | 0.96 | 0 | Non-neutral | Not Tolerated | possibly damaging | benign |
166 | Y/N | -2 | 3 | 2 |  | 1 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
249 | G/S | 0 | 21 | 11 |  | 1 | 0 | Neutral | Not Tolerated | benign | benign |
264 | C/W | -2 | 0 | 1 |  | 1 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
265 | R/W | -3 | 8 | 7 |  | 1 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
326 | I/T | -1 | 7 | 4 |  | 1 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
361 | I/V | 3 | 33 | 9 |  | 0.92 | 0 | Neutral | Tolerated | benign | benign |
409 | F/C | -2 | 0 | 1 |  | 0.92 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
438 | Y/N | -2 | 3 | 2 |  | 0.92 | 0 | Non-neutral | Not Tolerated | probably damaging | probably damaging |
For position 29, the prediction methods (SNAP, SIFT and both PolyPhen-2 models) agree that this mutation will not have a damaging influence on the function or structure of the protein. This is plausible, since this position is less conserved than the others (0.72) and lies in the signal peptide.
The residue at position 125 is well conserved (96%). SNAP and SIFT predict that this mutation will affect the protein, whereas PolyPhen-2 predicts it to be either benign (HumVar) or possibly damaging (HumDiv). Since this position is well conserved, it is more likely that the mutation does affect the function or structure of the protein.
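To summarize how far the predictors agree, the sketch below counts, for each position, how many of SNAP, SIFT, PolyPhen-2 HumDiv and PolyPhen-2 HumVar call the substitution damaging. Equal weighting of the four predictors and counting "possibly damaging" as damaging are assumptions made for illustration, not part of the analysis above.

```python
# Predictions copied from the comparison table:
# (position, SNAP, SIFT, PolyPhen-2 HumDiv, PolyPhen-2 HumVar)
PREDICTIONS = [
    (29, "Neutral", "Tolerated", "benign", "benign"),
    (125, "Non-neutral", "Not Tolerated", "possibly damaging", "benign"),
    (166, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
    (249, "Neutral", "Not Tolerated", "benign", "benign"),
    (264, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
    (265, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
    (326, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
    (361, "Neutral", "Tolerated", "benign", "benign"),
    (409, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
    (438, "Non-neutral", "Not Tolerated", "probably damaging", "probably damaging"),
]

# Labels treated as "damaging" (assumption: "possibly damaging" counts fully).
DAMAGING = {"Non-neutral", "Not Tolerated", "possibly damaging", "probably damaging"}


def damaging_votes(row):
    """Number of the four predictors that call the substitution damaging."""
    return sum(label in DAMAGING for label in row[1:])


votes = {row[0]: damaging_votes(row) for row in PREDICTIONS}
print(votes)
```

With this count, positions 29 and 361 receive no damaging votes, position 249 one, position 125 three, and the remaining six positions all four.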