Difference between revisions of "Mapping SNPs HEXA"
(→Summary) |
(→Mutations annotated only in SNP-DB) |
||
(18 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
== Methods == |
== Methods == |
||
− | First of all, we had to parse the HGMD database and the DB-SNP database. |
+ | First, we parsed the HGMD database and the DB-SNP database to extract the SNPs which are already known for the HEXA protein. |
=== HGMD === |
=== HGMD === |
||
− | We logged |
+ | We logged in, searched for Tay-Sachs disease and chose one entry of HEXA. In our case there were two entries with identical content. We only looked at the missense/nonsense mutations, so in total we found 68 annotated mutations in HGMD. We copied the webpage into a text file and then wrote a short parser, which extracts the codon change, the amino acid change and the codon number. |
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
=== DBSNP === |
=== DBSNP === |
||
− | It was more complicated to parse the DBSNP output, than the output of HGMD. First of all, we search for HEXA in this database and chose only the SNPs which occur in human. We used the |
+ | It was more complicated to parse the DBSNP output than the output of HGMD. First, we searched for HEXA in this database and chose only the SNPs which occur in human. We used the graphical output and again copied and pasted the page into a text file. |
− | An entry in DBSNP has following structure: First of all there is the name of the SNP. Next the sequence and the graphical representation of the sequence. In the next line, there are the |
+ | An entry in DBSNP has the following structure: first there is the name of the SNP. Next, the sequence and the graphical representation of the sequence are listed. The next line contains the allele origin, the clinical relevance and finally the annotations of the mutation. |
− | We parsed the SNP-id. If there was an NP entry on the line with the annotations, there was a detailed description on which position which amino acid is changed. Than we used this annotation. If we could not find such an annotation we used the NM or NR annotation. Both of them describe the mutation at the nucleotide sequence. We used the position and divided it by 3, so therefore, we know the codon position and the position of the nucleotide exchange in the codon (1, 2 or 3). We used the sequence to get the triplet of this codon and than we mapped the original triplet and the mutated triplet to amino acids. Therefore, it was possible to annotate the position, codon position and the mutations of this position, which we wrote in our tables. |
+ | We parsed the SNP-id. If there was an NP entry on the line with the annotations, it gave a detailed description of which amino acid is changed at which position in the protein, and we used this annotation. If we could not find such an annotation, we used the NM or NR annotation. Both of them describe the mutation at the level of the nucleotide sequence. We divided the nucleotide position by 3, which gives us the codon position and the position of the nucleotide exchange within the codon (1, 2 or 3). We used the sequence to get the triplet of this codon and then translated the original triplet and the mutated triplet into amino acids. This made it possible to annotate the position, the codon position and the mutations at this position, which we wrote in our tables. |
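The mapping from a nucleotide position to a codon can be sketched as follows. This is only a minimal illustration, not the parser we actually used: the function names `locate_in_codon` and `apply_snp` are hypothetical, and the sketch assumes that the position is 1-based and counted from the first base of the start codon of the coding sequence.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def locate_in_codon(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon_number = (cds_pos - 1) // 3 + 1
    pos_in_codon = (cds_pos - 1) % 3 + 1
    return codon_number, pos_in_codon


def apply_snp(cds, cds_pos, new_base):
    """Return codon number, position in codon, both triplets and both amino acids."""
    codon_number, pos_in_codon = locate_in_codon(cds_pos)
    start = (codon_number - 1) * 3
    ref_codon = cds[start:start + 3]
    # Replace only the affected base inside the triplet.
    mut_codon = ref_codon[:pos_in_codon - 1] + new_base + ref_codon[pos_in_codon:]
    return (codon_number, pos_in_codon, ref_codon, mut_codon,
            CODON_TABLE[ref_codon], CODON_TABLE[mut_codon])


# Toy coding sequence (not taken from the database output).
print(apply_snp("ATGACAAGC", 6, "G"))
```

If the annotated position is counted from the start of the transcript rather than from the start codon, the offset of the coding sequence has to be subtracted first.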
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
== Comparison of mutations in HGMD and DBSNP == |
== Comparison of mutations in HGMD and DBSNP == |
||
− | For the comparison of the mutations in HGMD and DBSNP we made table with some |
+ | For the comparison of the mutations in HGMD and DBSNP we made a table with some important information. First, we extracted the DBSNP-id, which makes it easy to look up the SNP if necessary. Next comes the codon position, which shows where in the sequence the mutation takes place. Furthermore, we extracted the mutation position, which is the position within the triplet where the mutation takes place. The last two entries show the mutation at the amino acid level and at the codon level, i.e. which amino acid is replaced by which other one and how the codon is changed. |
− | Afterwards we |
+ | Afterwards, we created a descriptive representation of the mutations by showing the sequence and coloring each mutated position according to its kind of mutation. |
− | Following tables and descriptive |
+ | The following tables and descriptive visualizations cover the different cases, which are described below. |
− | Furthermore, a detailed summary of the different cases and a |
+ | Furthermore, a detailed summary of the different cases and a comparison of them can be seen below at [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Mapping_SNPs_HEXA#Statistical_comparison_of_HGMD_and_DBSNP statistical comparison of HGMD and DBSNP]]. |
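The split into the different cases can be sketched with plain set operations. This is a minimal sketch under assumptions: the function name `split_by_database` is hypothetical, and we assume here that a mutation is identified by the tuple (codon position, original codon, mutated codon); the entries below are toy values, not our real tables.

```python
def split_by_database(hgmd, dbsnp):
    """Split two sets of mutation keys into the three comparison cases."""
    return {
        "both": hgmd & dbsnp,          # annotated in both databases
        "only_hgmd": hgmd - dbsnp,     # annotated only in HGMD
        "only_dbsnp": dbsnp - hgmd,    # annotated only in DBSNP
    }


# Toy keys: (codon position, original codon, mutated codon).
hgmd_toy = {(10, "CGG", "CAG"), (20, "CGC", "CAC")}
dbsnp_toy = {(10, "CGG", "CAG"), (20, "CGC", "CAC"), (30, "ATA", "GTA")}

cases = split_by_database(hgmd_toy, dbsnp_toy)
print({name: sorted(keys) for name, keys in cases.items()})
```

Using the codon change as part of the key keeps two differently named mutations at the same codon position apart.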
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
=== Mutations annotated in both databases === |
=== Mutations annotated in both databases === |
||
− | First of all we compared the mutations which are annotated in both databases. These mutations are not silent and cause all the phenotype, which is known as Tay-Sachs disease. |
+ | First, we compared the mutations which are annotated in both databases. These mutations are not silent and all cause the phenotype which is known as Tay-Sachs disease. We can be fairly sure of this, because HGMD contains no silent mutations. |
− | Here we found 33 annotated mutations where some of them take place in different |
+ | Here we found 33 annotated mutations, some of which occur at different mutation positions in the same triplet. In addition, in some cases two differently named mutations occur at the same codon position. |
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
|SNP-DB Identifier |
|SNP-DB Identifier |
||
+ | |Codon position |
||
− | |Codonposition |
||
+ | |Mutation position |
||
− | |Mutationposition |
||
|Amino Acids |
|Amino Acids |
||
|Codons |
|Codons |
||
Line 71: | Line 75: | ||
|CGG -> CAG |
|CGG -> CAG |
||
|- |
|- |
||
− | |rowspan="2" | |
+ | |rowspan="2" | rs28941770 |
|rowspan="3" | 178 |
|rowspan="3" | 178 |
||
|2 |
|2 |
||
Line 181: | Line 185: | ||
|gCGA -> TGA |
|gCGA -> TGA |
||
|- |
|- |
||
− | |rowspan="2" | |
+ | |rowspan="2" | rs121907958 |
|rowspan="2" | 420 |
|rowspan="2" | 420 |
||
|3 |
|3 |
||
Line 197: | Line 201: | ||
|tCTG -> GTG |
|tCTG -> GTG |
||
|- |
|- |
||
− | |rowspan="2" | |
+ | |rowspan="2" | rs121907978 |
|rowspan="2" | 454 |
|rowspan="2" | 454 |
||
|2 |
|2 |
||
Line 247: | Line 251: | ||
|CGC -> CAC |
|CGC -> CAC |
||
|- |
|- |
||
− | |rowspan="2" | |
+ | |rowspan="2" | rs4777502 |
− | |rowspan="2" | |
+ | |rowspan="2" | 506 |
|3 |
|3 |
||
|Glu -> Asp |
|Glu -> Asp |
||
Line 268: | Line 272: | ||
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
||
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
− | <br> |
||
− | |||
=== Mutations annotated only in HGMD === |
=== Mutations annotated only in HGMD === |
||
− | We also looked for mutations which are annotated only in HGMD, but we did not |
+ | We also looked for mutations which are annotated only in HGMD, but we did not find any. |
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
=== Mutations annotated only in SNP-DB === |
=== Mutations annotated only in SNP-DB === |
||
Here we listed all mutations which are annotated only in DBSNP and which are not silent. Some of these mutations have a highly detailed NP annotation, while others are not annotated in such detail. Therefore, we had to map these mutations. The detailed list of the mutations which we mapped can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Mapping_SNPs_HEXA/DETAIL here]]. |
Here we listed all mutations which are annotated only in DBSNP and which are not silent. Some of these mutations have a highly detailed NP annotation, while others are not annotated in such detail. Therefore, we had to map these mutations. The detailed list of the mutations which we mapped can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Mapping_SNPs_HEXA/DETAIL here]]. |
||
− | Finally, 8 non-silent mutations could be found which have only one possible |
+ | Finally, 8 non-silent mutations could be found which have only one possible mutation position and codon position. |
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
|SNP-DB Identifier |
|SNP-DB Identifier |
||
+ | |Codon position |
||
− | |Codonposition |
||
+ | |Mutation position |
||
− | |Mutationposition |
||
|Amino Acids |
|Amino Acids |
||
|Codons |
|Codons |
||
Line 326: | Line 330: | ||
|436 |
|436 |
||
|1 |
|1 |
||
− | | |
+ | |Ile -> Val |
|ATA -> GTA |
|ATA -> GTA |
||
|- |
|- |
||
Line 339: | Line 343: | ||
'''Graphical representation:'''<br> |
'''Graphical representation:'''<br> |
||
− | The following |
+ | The following visualization displays the non-silent mutations which occur only in DBSNP. Sometimes the original amino acid in the mutation annotation did not match our sequence. This can be explained by the fact that several different sequences of this protein exist, depending on which sequence assembly is used. Therefore, if DBSNP uses a different sequence than we do, some amino acids in the sequence may differ. It is not possible to say which sequence is the right one. We therefore decided to color the mismatching amino acids blue, so that the reader can see the difference at a glance. <br><br> |
MTSSRLWFSLLLAAAFAGRATALWPWPQ<font color=red>N</font>FQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> MTSSRLWFSLLLAAAFAGRATALWPWPQ<font color=red>S</font>FQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD<br> <br> <br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSR<font color=red>H</font>YLPLS<br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSR<font color=red>D</font>YLPLS<br> <br> <br> SILDTLDVMAYNKLNVFHWHLVD<font color=blue>D</font>PSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYNKLNVFHWHLVD<font color=blue>D</font>PSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> AR<font color=red>L</font>RGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNP<font color=red>S</font>LNNTYEFMSTFFL<br> AR<font color=red>F</font>RGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNP<font color=red>I</font>LNNTYEFMSTFFL<br> <br> <br> EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPV<font color=red>N</font>YMKELELVTKAGFRALLSAPWYLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPV<font color=red>D</font>YMKELELVTKAGFRALLSAPWYLNRISYG<br> <br> <br> PDWKDFY<font color=red>I</font>VEPLAFEGTPEQKALVIGG<font color=blue>E</font>ACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> PDWKDFY<font color=red>V</font>VEPLAFEGTPEQKALVIGG<font color=blue>S</font>ACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> <br> <br> TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT |
MTSSRLWFSLLLAAAFAGRATALWPWPQ<font color=red>N</font>FQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> MTSSRLWFSLLLAAAFAGRATALWPWPQ<font color=red>S</font>FQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD<br> <br> <br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSR<font color=red>H</font>YLPLS<br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSR<font color=red>D</font>YLPLS<br> <br> <br> SILDTLDVMAYNKLNVFHWHLVD<font color=blue>D</font>PSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYNKLNVFHWHLVD<font color=blue>D</font>PSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> AR<font color=red>L</font>RGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNP<font color=red>S</font>LNNTYEFMSTFFL<br> AR<font color=red>F</font>RGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNP<font color=red>I</font>LNNTYEFMSTFFL<br> <br> <br> EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPV<font color=red>N</font>YMKELELVTKAGFRALLSAPWYLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPV<font color=red>D</font>YMKELELVTKAGFRALLSAPWYLNRISYG<br> <br> <br> PDWKDFY<font color=red>I</font>VEPLAFEGTPEQKALVIGG<font color=blue>E</font>ACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> PDWKDFY<font color=red>V</font>VEPLAFEGTPEQKALVIGG<font color=blue>S</font>ACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> <br> <br> TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT |
||
Line 346: | Line 350: | ||
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
||
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
=== Silent Mutations === |
=== Silent Mutations === |
||
− | HGMD contains no silent mutations which means that the silent mutations are extracted from DBSNP. These silent mutations do not cause change the amino acid. It is not totally clear, if these silent mutations do not cause any phenotypes, because there exist the hypothesis, that silent mutations change the folding of the protein. If one very common triplet is change by a codon which is rare it could be possible that the tRNA needs more time to bind on the ribosome, than a tRNA of a very common codon because of their frequent |
+ | HGMD contains no silent mutations, which means that all silent mutations are extracted from DBSNP. These silent mutations do not change the amino acid. It is not entirely clear whether they are free of any phenotypic effect, because there is a hypothesis that silent mutations can change the folding of the protein. If a very common triplet is replaced by a rare codon, the matching tRNA may need more time to bind to the ribosome than the tRNA of a very common codon, because it is less abundant (compare for example [http://www.sciencemag.org/content/324/5924/255.full Coding-Sequence Determinants of Gene Expression in Escherichia coli]). This could lead to changes in the folding of the protein, so it is possible that a silent mutation changes the phenotype because of misfolded proteins. Therefore, we think it is important to list the silent mutations, too. They are not annotated in the HGMD database and therefore probably do not change the phenotype. On the other hand, if the hypothesis about misfolded proteins caused by different codons is true, the silent mutations should be kept in mind, because they could explain a different phenotype although there is no amino acid exchange. This is the reason why we decided to list the silent mutations in this section. |
One problem with these mutations is that they are poorly annotated in DBSNP, which means we had to post-process the results. We first rotated the codons we found, because the original codon has to encode the amino acid which occurs in the protein sequence. |
One problem with these mutations is that they are poorly annotated in DBSNP, which means we had to post-process the results. We first rotated the codons we found, because the original codon has to encode the amino acid which occurs in the protein sequence. |
||
− | Therefore we used the codon with a mutation at position one, position two and position three. Next, we also reversed them and created the |
+ | To do so, we considered the codon with the mutation at position one, position two and position three. Next, we also reversed these codons and created the complementary sequence for both. |
The detailed result can be seen [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Mapping_SNPs_HEXA/DETAIL here]]<br> |
The detailed result can be seen [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Mapping_SNPs_HEXA/DETAIL here]]<br> |
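The rotation and the complementary reverse step can be sketched as follows. This is a sketch under assumptions, not the script we used: the function names are hypothetical, and we assume that two flanking bases on each side of the SNP are available and that the expected amino acid is given as a one-letter code. A candidate triplet is kept only if the original codon encodes the amino acid found in the protein sequence.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_codons(up2, ref, alt, down2, expected_aa):
    """Try the SNP at codon position 1, 2 and 3 on both strands."""
    strands = (
        ("forward", up2, ref, alt, down2),
        ("complementary reverse", reverse_complement(down2),
         reverse_complement(ref), reverse_complement(alt),
         reverse_complement(up2)),
    )
    hits = []
    for strand, u, r, a, d in strands:
        ref_window = u + r + d  # five bases, SNP at index 2
        alt_window = u + a + d
        for pos in (1, 2, 3):
            start = 3 - pos  # window index where the codon begins
            ref_codon = ref_window[start:start + 3]
            alt_codon = alt_window[start:start + 3]
            if CODON_TABLE[ref_codon] == expected_aa:
                hits.append((strand, pos, ref_codon, alt_codon,
                             CODON_TABLE[alt_codon]))
    return hits


# Toy flanking bases; only the GTT -> GTA change is taken from the table below.
print(candidate_codons("GT", "T", "A", "GA", "V"))
```

More than one hit means that the annotation alone does not determine the codon, which is the situation in which we listed all possible combinations.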
||
− | If we found more than one nucleotide combination that encodes the same amino acid in the protein sequence and if these are silent mutations as well, we listed them all in the following table. Otherwise, if there do not exist any other possible mutation for this |
+ | If we found more than one nucleotide combination that encodes the same amino acid in the protein sequence and if these are silent mutations as well, we listed them all in the following table. Otherwise, if there is no other possible mutation for this position, we also listed the mutations which are not silent. |
Here are the results, which display all combinations that are possible: |
Here are the results, which display all combinations that are possible: |
||
Line 360: | Line 366: | ||
|- |
|- |
||
|SNP-DB Identifier |
|SNP-DB Identifier |
||
+ | |Codon position |
||
− | |Codonposition |
||
+ | |Mutation position |
||
− | |Mutationposition |
||
|Amino Acids |
|Amino Acids |
||
|Codons |
|Codons |
||
Line 380: | Line 386: | ||
|forward |
|forward |
||
|- |
|- |
||
− | |rowspan="2" | |
+ | |rowspan="2" | rs28942072 |
− | |rowspan="2" | |
+ | |rowspan="2" | 324 |
|rowspan="2" | 3 |
|rowspan="2" | 3 |
||
|rowspan="2" | Val -> Val |
|rowspan="2" | Val -> Val |
||
|GTT -> GTA |
|GTT -> GTA |
||
− | |rowspan="2" | |
+ | |rowspan="2" | Forward |
|- |
|- |
||
|GTT -> GTC |
|GTT -> GTC |
||
Line 394: | Line 400: | ||
|Pro -> Pro |
|Pro -> Pro |
||
|CCT -> CCC |
|CCT -> CCC |
||
− | | |
+ | |complementary reverse |
|- |
|- |
||
|rs4777502 |
|rs4777502 |
||
Line 408: | Line 414: | ||
'''Graphical representation:'''<br> |
'''Graphical representation:'''<br> |
||
+ | Here we show all silent mutations which we found in the database, so that the reader can see the positions of the mutations at a glance. |
||
− | In this graphical representation the silent mutations which are only in DBSNP are displayed.<br><br> |
||
MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> <br> <br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS<br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS<br> <br> <br> SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL<br> ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL<br> <br> <br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG<br> <br> <br> PDWKDFYIVEPLAFEGT<font color=yellow>P</font>EQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> PDWKDFYIVEPLAFEGT<font color=yellow>P</font>EQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> <br> <br> TSDLTFAYERLSHFRC<font color=yellow>E</font>LLRRGVQAQPLNVGFCEQEFEQT |
MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> <br> <br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS<br> QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS<br> <br> <br> SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL<br> ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL<br> <br> <br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG<br> <br> <br> PDWKDFYIVEPLAFEGT<font color=yellow>P</font>EQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> PDWKDFYIVEPLAFEGT<font color=yellow>P</font>EQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL<br> <br> <br> TSDLTFAYERLSHFRC<font color=yellow>E</font>LLRRGVQAQPLNVGFCEQEFEQT |
||
Line 415: | Line 421: | ||
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
||
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
== Summary == |
== Summary == |
||
'''Graphical representation:''' |
'''Graphical representation:''' |
||
− | Here we combined all graphical representations of the different cases. This means this |
+ | Here we combined all graphical representations of the different cases. This means that this visualization displays all mutations which are found in HGMD and DBSNP. Therefore we can see non-silent mutations as well as silent mutations. Furthermore, the wrongly annotated amino acids are marked, too. |
MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWP<font color=red>W</font>PQ<font color=red>N</font>FQTSDQRYV<font color=red>L</font>YPNNFQFQYDVSSAAQPGCSVLD<br> MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWP<font color=red>!</font>PQ<font color=red>S</font>FQTSDQRYV<font color=red>R</font>YPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> <br> <br> QCL<font color=red>L</font>LSETVWGAL<font color=red>R</font>GLETFSQLVWKSAEGTFFINKTEIEDFPRFPH<font color=red>R</font>GLLLDTS<font color=red>R</font><font color=red>H</font><font color=red>Y</font>LPLS<br> QCL<font color=red>F</font>LSETVWGAL<font color=red>!</font>GLETFSQLVWKSAEGTFFINKTEIEDFPRFPH<font color=red>W</font>GLLLDTS<font color=red>H</font><font color=red>D</font><font color=red>!</font>LPLS<br> <br> <br> SILDTLDVMAYN<font color=red>K</font>LN<font color=red>V</font>FHW<font color=red>H</font>LVD<font color=blue>D</font>P<font color=red>S</font><font color=red>F</font>PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYN<font color=red>T</font>LN<font color=red>M</font>FHW<font color=red>R</font>LVD<font color=blue>D</font>P<font color=red>F</font><font color=red>S</font>PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> A<font color=red>R</font><font color=red>L</font>R<font color=red>G</font>IRVLAEF<font color=red>D</font>TPGHTLSWGP<font color=red>G</font>IPGLLTPCYSGSEPSGTFGPVNP<font color=red>S</font>LNNTYEF<font color=red>M</font>STFFL<br> A<font color=red>W</font><font color=red>F</font>R<font color=red>S</font>IRVLAEF<font color=red>H</font>TPGHTLSWGP<font color=red>S</font>IPGLLTPCYSGSEPSGTFGPVNP<font color=red>I</font>LNNTYEF<font color=red>R</font>STFFL<br> <br> <br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTC<font color=red>W</font>KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTC<font 
color=red>!</font>KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVW<font color=red>R</font>EDIPV<font color=red>N</font>YMKELELVTKAGFRALLSAP<font color=red>W</font>YLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVW<font color=red>!</font>EDIPV<font color=red>D</font>YMKELELVTKAGFRALLSAP<font color=red>C</font>YLNRISYG<br> <br> <br> PDWKDFY<font color=red>I</font>VEPLAFEGT<font color=yellow>P</font>EQKA<font color=red>L</font>VI<font color=red>G</font>G<font color=blue>E</font>ACMWGEYVDNTNLVPRL<font color=red>W</font>PRAGAVA<font color=red>E</font>RL<font color=red>W</font>SNKL<br> PDWKDFY<font color=red>V</font>VEPLAFEGT<font color=yellow>P</font>EQKA<font color=red>V</font>VI<font color=red>D</font>G<font color=blue>S</font>ACMWGEYVDNTNLVPRL<font color=red>C</font>PRAGAVA<font color=red>K</font>RL<font color=red>R</font>SNKL<br> <br> <br> TSDLTFAYE<font color=red>R</font>LSHF<font color=red>R</font>C<font color=yellow>E</font>LLRRGVQAQPLNVGFCEQEFEQT |
MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWP<font color=red>W</font>PQ<font color=red>N</font>FQTSDQRYV<font color=red>L</font>YPNNFQFQYDVSSAAQPGCSVLD<br> MT<font color=yellow>S</font>SRLWFSLLLAAAFAGRATALWP<font color=red>!</font>PQ<font color=red>S</font>FQTSDQRYV<font color=red>R</font>YPNNFQFQYDVSSAAQPGCSVLD<br> <br> <br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLP<font color=yellow>T</font>LESVENYTLTINDD<br> <br> <br> QCL<font color=red>L</font>LSETVWGAL<font color=red>R</font>GLETFSQLVWKSAEGTFFINKTEIEDFPRFPH<font color=red>R</font>GLLLDTS<font color=red>R</font><font color=red>H</font><font color=red>Y</font>LPLS<br> QCL<font color=red>F</font>LSETVWGAL<font color=red>!</font>GLETFSQLVWKSAEGTFFINKTEIEDFPRFPH<font color=red>W</font>GLLLDTS<font color=red>H</font><font color=red>D</font><font color=red>!</font>LPLS<br> <br> <br> SILDTLDVMAYN<font color=red>K</font>LN<font color=red>V</font>FHW<font color=red>H</font>LVD<font color=blue>D</font>P<font color=red>S</font><font color=red>F</font>PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> SILDTLDVMAYN<font color=red>T</font>LN<font color=red>M</font>FHW<font color=red>R</font>LVD<font color=blue>D</font>P<font color=red>F</font><font color=red>S</font>PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY<br> <br> <br> A<font color=red>R</font><font color=red>L</font>R<font color=red>G</font>IRVLAEF<font color=red>D</font>TPGHTLSWGP<font color=red>G</font>IPGLLTPCYSGSEPSGTFGPVNP<font color=red>S</font>LNNTYEF<font color=red>M</font>STFFL<br> A<font color=red>W</font><font color=red>F</font>R<font color=red>S</font>IRVLAEF<font color=red>H</font>TPGHTLSWGP<font color=red>S</font>IPGLLTPCYSGSEPSGTFGPVNP<font color=red>I</font>LNNTYEF<font color=red>R</font>STFFL<br> <br> <br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTC<font color=red>W</font>KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> EVSSVFPDFYLHLGGDE<font color=yellow>V</font>DFTC<font 
color=red>!</font>KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG<br> <br> <br> KGYVVWQEVFDNKVKIQPDTIIQVW<font color=red>R</font>EDIPV<font color=red>N</font>YMKELELVTKAGFRALLSAP<font color=red>W</font>YLNRISYG<br> KGYVVWQEVFDNKVKIQPDTIIQVW<font color=red>!</font>EDIPV<font color=red>D</font>YMKELELVTKAGFRALLSAP<font color=red>C</font>YLNRISYG<br> <br> <br> PDWKDFY<font color=red>I</font>VEPLAFEGT<font color=yellow>P</font>EQKA<font color=red>L</font>VI<font color=red>G</font>G<font color=blue>E</font>ACMWGEYVDNTNLVPRL<font color=red>W</font>PRAGAVA<font color=red>E</font>RL<font color=red>W</font>SNKL<br> PDWKDFY<font color=red>V</font>VEPLAFEGT<font color=yellow>P</font>EQKA<font color=red>V</font>VI<font color=red>D</font>G<font color=blue>S</font>ACMWGEYVDNTNLVPRL<font color=red>C</font>PRAGAVA<font color=red>K</font>RL<font color=red>R</font>SNKL<br> <br> <br> TSDLTFAYE<font color=red>R</font>LSHF<font color=red>R</font>C<font color=yellow>E</font>LLRRGVQAQPLNVGFCEQEFEQT |
||
Line 428: | Line 435: | ||
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
<font color=red>Non-silent mutation</font> <font color=yellow>Silent mutation</font> <font color=blue>Wrong AA in mutation annotation</font><br> |
||
− | + | We also decided to show the mutations in the 3D structure of the protein, so that it is possible to see whether a mutation lies in a secondary structure element (where it usually has a stronger effect) or in a loop region (which normally does not affect the structure of the protein that much). |
|
{| |
{| |
||
− | | [[Image:frontview_mut.png| |
+ | | [[Image:frontview_mut.png|thumb|400px|Figure 1: Graphical representation of the mutations in the 3D structure of HEXA (front view)<br>red: non-silent mutations<br>green: silent mutations]] |
− | | [[Image:backview_mut.png| |
+ | | [[Image:backview_mut.png|thumb|360px|Figure 2: Graphical representation of the mutations in the 3D structure of HEXA (back view)<br>red: non-silent mutations<br>green: silent mutations]] |
|} |
|} |
||
− | Red colored residues are non-silent mutations. Green colored residues show silent mutations. |
||
+ | As we can see in both pictures (Figure 1 and Figure 2), silent as well as non-silent mutations are located both in loops and in secondary structure elements. About half of the non-silent mutations are located in loops. Therefore they do not destroy a secondary structure element, and their damaging effect has to be explained in a different way. |
||
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
||
== Statistical comparison of HGMD and DBSNP == |
== Statistical comparison of HGMD and DBSNP == |
||
For the analysis of the two different database results we decided to do some statistical comparison. |
For the analysis of the two different database results we decided to do some statistical comparison. |
||
− | First of all we compared the different resulting tables (see above) according to their |
+ | First, we compared the different resulting tables (see above) according to the mutation position in the triplet. To this end, we created a bar plot which shows the relative frequency (in percent) of each mutation position (Figure 3). |
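The percentages behind such a bar plot can be computed as in the following sketch. The function name `position_percentages` is hypothetical and the input list holds toy values, not the counts from our tables.

```python
from collections import Counter


def position_percentages(positions):
    """Return the share (in percent) of mutations at codon position 1, 2 and 3."""
    if not positions:
        raise ValueError("at least one mutation position is required")
    counts = Counter(positions)
    total = len(positions)
    return {pos: 100.0 * counts.get(pos, 0) / total for pos in (1, 2, 3)}


# Toy input: one entry per mutation, giving its position within the triplet.
print(position_percentages([1, 1, 2, 3]))
```

Computing the percentages separately for each case (both databases, only DBSNP, silent, all) gives one group of bars per case.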
− | The first three bars show the case of the overlapping results of both databases. One can see that there are as much mutations at the first |
+ | The first three bars show the mutations which occur in both databases. One can see that there are as many mutations at the first position as at the second position. Mutations at the third position are much rarer. The reason for this is that HGMD contains no silent mutations, which means that the overlap of both databases does not contain them either. A substitution at the third position of a triplet is often silent, and only a few substitutions at the third position result in an amino acid change. This explains why the third position is that rare, while the other two positions are equally frequent. |
− | The second case displays the position frequency of the corresponding mutations which are only resulting in DBSNP. Here one can see that mutations at position one are more frequent than at position two and that there are no mutations at position three. The |
+ | The second case displays the position frequency of the corresponding mutations which are only resulting in DBSNP (Figure 3, bar 4,5). Here one can see that mutations at position one are more frequent than at position two and that there are no mutations at position three. The mutations in DBSNP are not silent which can explain why there is no mutation at position three: mutations at the third position of a triplet are very often silent. One explanation why the first position is more often mutated than the second one is, that there are some amino acids which do not change if the second position of the triplet is substituted. So in sum there are more mutations if you change the first position, because in this case the substitution always leads to a non-silent mutation. |
− | The third case shows the resulting silent mutations of DBSNP. Here the most frequent mutation position is the third one while the other ones are same common. This is the opposite |
+ | The third case shows the resulting silent mutations of DBSNP (Figure 3, bar 6,7,8). Here the most frequent mutation position is the third one while the other ones are same common. This is the opposite behavior comparing to the other cases and has a similar explanation: mutations at the third position of a triplet often result in a silent mutation where contrary silent mutations at the other position are very rare. |
− | The last case represents the total distribution of the |
+ | The last case represents the total distribution of the mutation positions (Figure 3, bar 9,10,11). Here the first position is the most common for a mutation followed by the second position which is almost same common. The third position is the least frequent one. This is the expected result corresponding to the other three cases: the first and the second position are almost always the most frequent ones and the third the rarest. In the third case there is an exception which is the reason why the difference is not so high in the complete distribution. |
− | In summary, the |
+ | In summary, the bar plot of the different tables/cases correspond to the expectation and can be all explained logically. |
− | [[Image:barplot.png|center|600px|Figure |
+ | [[Image:barplot.png|center|600px|thumb|Figure 3: Bar plot of the mutation positions for the different tables]] |
− | As a next step we looked up which amino acid mutates most often for each table. Therefore we create a |
+ | As a next step we looked up which amino acid mutates most often for each table. Therefore we create a bar plot where the frequency for a mutation of a certain amino acid is displayed for each table. The different colors correspond to the different tables and the total distribution. Furthermore we plotted not the absolute values, but the relative ratio of the amino acid exchange within one table (in percent). |
− | Looking at the |
+ | The result of this plot can be seen in Figure 4. Looking at the overlapping result of both database we can see that almost every amino acid mutation occurs. The onliest amino acids which do not mutate are Alanine, Asparagine, Cysteine, Glutamine and Threonine. Three of these amino acid (Ala, Cys, Gln) do not occur in the other tables as well. One possible reason is that these amino acids were encoded only by triplets which are very rare. This means, they probably occur less often. For asparagine this explanation is probably also right, which means that only few triplets encodes it. Threonine does probably never mutate by accident, because it can be encoded by an higher number of triplets and is a very special amino acid. |
The amino acids that mutate most common in the overlapping result of both databases are Arginine and Glycine. A possible reason for this is that Arginine can be encoded by many possible triplets as well as Glycine which has as result that they are probably more common amino acids. |
The amino acids that mutate most common in the overlapping result of both databases are Arginine and Glycine. A possible reason for this is that Arginine can be encoded by many possible triplets as well as Glycine which has as result that they are probably more common amino acids. |
||
− | Looking at the DBSNP |
+ | Looking at the DBSNP results which were not in HGMD we can see that there are many amino acids which do not mutate. A reason for this is that the number of found mutations which are only in SNPDB and which are not silent is very low and therefore not really significant. The most common mutated amino acid here is Asparagine. A possible reason can be that this amino acid is encoded by less triplets and in relation to the other tables it occurs here very often by chance. |
For the silent mutations we also extract the amino acid where a nucleotide exchange takes places. Most strikingly are Serine and Valine which have the highest percentage for silent mutations. This can be explained by the fact that these amino acids were encoded by a high number of different triplets. The reason is that a silent mutation is more usually when a amino acid is encoded by many triplets. |
For the silent mutations we also extract the amino acid where a nucleotide exchange takes places. Most strikingly are Serine and Valine which have the highest percentage for silent mutations. This can be explained by the fact that these amino acids were encoded by a high number of different triplets. The reason is that a silent mutation is more usually when a amino acid is encoded by many triplets. |
||
− | At last we |
+ | At last we plotted the total result of all tables together. In this case the most common amino acid exchange is for Arginine followed by Glycine, Serine and Tryptophan. This corresponds to the overlapping results of both databases, because it has highest number of results. The only exception is Serine which mutates a lot in the other two cases. Furthermore, these results agree also with the explanation that these amino acids, except Tryptophan, were encoded by many different triplets. This means that they occur probably more often in sequences. Tryptophan is probably occurring so often by chance. The rest of the amino acid mutation rates correspond mostly to the results of both databases. |
− | All in all, the |
+ | All in all, the bar plot result agrees often with the number of triplets that encodes a certain amino acid. The amino acids that are encoded by many triplet are probably more common amino acids while amino acids that are encoded only by few triplets are rare. However this is not really true in biology but it agrees partly. Furthermore the silent mutations are more common in amino acids that were encoded by many triplets which is logical and can be explained well. Besides the results are not really significant because they are very small. For a real statistic evidence, more data is necessary. However, this bar plot gives a good overview for the amino acid mutation rate for this special case. |
− | [[Image:barplot_aa.png|center| |
+ | [[Image:barplot_aa.png|center|800px|thumb|Figure 4: Bar plot for the frequency of a certain amino acid mutation for the different tables]] |
− | Afterwards we decided to create another graphical representation for the different amino acid mutations. Therefore, we included all possible mutations from DBSNP |
+ | Afterwards, we decided to create another graphical representation for the different amino acid mutations. Therefore, we included all possible mutations from DBSNP and HGMD that are not silent. On the y-axis are the original amino acids and on the x-axis are the advise mutated amino acids (Figure 5). This means it displays which amino acid mutates to a certain other amino acid. The color visualizes the frequency of such a certain amino acid exchange. White color means that there is no mutation and the darker the color the more common is a certain exchange. |
− | We can see here that the most common exchanges are Arginine to |
+ | We can see here that the most common exchanges are Arginine to Cysteine, Arginine to Histidine, Glycine to Aspartic acid, Glycine to Serine and Tryptophan to Cysteine. Furthermore, it is possible to see that there are a lot mutation possibilities that do not occur at all (white). |
− | [[Image:aa_mat.png|center|600px|Figure |
+ | [[Image:aa_mat.png|center|600px|thumb|Figure 5: Heatmap for the amino acid exchange for all non-silent mutations]] |
− | At last we build another heatmap for the nucleotide exchanges. Therefore, we also include all possible exchanges from DBSNP and HGMD (also with silent mutations). On the y-axis are the original nucleotides and on the x-axis are the exchanged nucleotides. In this case the colors also visualize the frequency of a certain exchange. White color means that no exchange took places and the darker the color the more exchanges proceed. |
+ | At last we build another heatmap for the nucleotide exchanges (Figure 6). Therefore, we also include all possible exchanges from DBSNP and HGMD (also with silent mutations). On the y-axis are the original nucleotides and on the x-axis are the exchanged nucleotides. In this case the colors also visualize the frequency of a certain exchange. White color means that no exchange took places and the darker the color the more exchanges proceed. |
+ | We can see here the most common nucleotide exchange is from Guanine to Adenine. Other common nucleotide exchanges are Cysteine to Thymine and Thymine to Cysteine. A possible explanation is that Guanine and Adenine are both Purines whereas Cytosine and Thymine are Pyrimidines. This means that this two groups have similar shapes and that possibly these makes a mutation between these bases easier and also less selective. Another point can be that the base excision repair recognizes mutations with the same shape harder. |
||
− | We can see here... |
||
+ | There are also two exchanges that never happen: Cytosine to Adenine and Adenine to Thymine. The other exchanges have an average frequency. |
||
+ | [[Image:bp_mat.png|center|400px|thumb|Figure 6: Heatmap for the nucleotide exchange for all mutations]] |
||
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs disease]]<br><br> |
Latest revision as of 14:45, 23 September 2011
Methods
First, we parsed the HGMD and DBSNP databases to extract the SNPs that are already known for the HEXA protein.
HGMD
We logged in, searched for Tay-Sachs disease and chose one of the HEXA entries. In our case there were two entries with identical content. We only looked at the missense/nonsense mutations, which gave us 68 annotated mutations in HGMD in total. We copied the web page into a text file and then wrote a short parser that extracts the codon change, the amino acid change and the codon number.
Back to [Tay-Sachs disease]
DBSNP
Parsing the DBSNP output was more complicated than parsing the HGMD output. We searched for HEXA in this database and kept only the SNPs that occur in human. We used the graphical output and again copied and pasted the page into a text file.
An entry in DBSNP has the following structure: first comes the name of the SNP, followed by the sequence and its graphical representation. The next line lists the allele origin, the clinical relevance and finally the annotations of the mutation.
We parsed the SNP ID. If the annotation line contained an NP entry, it gave a detailed description of which amino acid is changed at which position in the protein, and we used this annotation. If we could not find such an annotation, we used the NM or NR annotation instead. Both describe the mutation at the level of the nucleotide sequence. We divided the nucleotide position by 3 to obtain the codon position and the position of the nucleotide exchange within the codon (1, 2 or 3). We then used the sequence to get the triplet of this codon and mapped the original and the mutated triplet to their amino acids. In this way we could annotate the position, the codon position and the mutation at this position, which we wrote into our tables.
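The position arithmetic described above can be sketched as follows. This is only a minimal illustration, not the parser we actually used: it assumes that the nucleotide position is 1-based and counted from the first base of the coding sequence, and the names `locate`, `annotate` and `toy_cds` are hypothetical. The toy sequence is not the HEXA sequence.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon_number = (cds_pos - 1) // 3 + 1
    codon_offset = (cds_pos - 1) % 3 + 1
    return codon_number, codon_offset


def annotate(cds, cds_pos, new_base):
    """Return codon number, in-codon position, both codons and both amino acids."""
    codon_number, codon_offset = locate(cds_pos)
    start = 3 * (codon_number - 1)
    old_codon = cds[start:start + 3]
    # Replace only the affected base inside the triplet.
    new_codon = old_codon[:codon_offset - 1] + new_base + old_codon[codon_offset:]
    return (codon_number, codon_offset, old_codon, new_codon,
            CODON_TABLE[old_codon], CODON_TABLE[new_codon])


# Hypothetical toy coding sequence (Met-Arg-Gly), assumed for illustration only.
toy_cds = "ATGCGCGGT"
result = annotate(toy_cds, 5, "A")  # second base of the second codon
print(result)
```

A plain division by 3 only gives the right codon if the position is first shifted to a 0-based index, which is why the sketch subtracts 1 before dividing and adds 1 afterwards.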
Back to [Tay-Sachs disease]
Comparison of mutations in HGMD and DBSNP
For the comparison of the mutations in HGMD and DBSNP we created a table with the most important information. First we extracted the DBSNP ID, which makes it easy to look up the SNP if necessary. Next comes the codon position, which shows where in the sequence the mutation takes place. We also extracted the mutation position, which is the position within the triplet at which the mutation takes place. The last two columns show the mutation at the amino acid and at the codon level, that is, which amino acid is replaced by which other one and how the codon changes.
Afterwards, we created a descriptive representation of the mutations by showing the sequence and coloring each position according to the kind of mutation.
The following tables and visualizations cover the different cases, which are described below.
Furthermore, a detailed summary of the different cases and a comparison of them can be found below at [statistical comparison of HGMD and DBSNP].
Back to [Tay-Sachs disease]
Mutations annotated in both databases
First we looked at the mutations that are annotated in both databases. These mutations are not silent, and all of them cause the phenotype known as Tay-Sachs disease. We can be fairly sure of this, because HGMD contains no silent mutations. We found 33 annotated mutations here, some of which affect different mutation positions within the same triplet. In some cases, two differently named mutations occur at the same codon position.
SNP-DB Identifier | Codon position | Mutation position | Amino Acids | Codons |
rs121907964 | 26 | 3 | Trp -> TER | TGGc -> TGA |
rs121907979 | 39 | 2 | Leu -> Arg | CTT -> CGT |
rs121907975 | 127 | 1 | Leu -> Phe | aCTC -> TTC |
rs121907975 | 127 | 2 | Leu -> Arg | CTC -> CGC |
rs121907962 | 137 | 1 | Arg -> TER | cCGA -> TGA |
rs121907972 | 170 | 1 | Arg -> Trp | cCGG -> TGG |
rs121907957 | 170 | 2 | Arg -> Gln | CGG -> CAG |
rs28941770 | 178 | 2 | Arg -> His | CGC -> CAC |
rs28941770 | 178 | 2 | Arg -> Leu | CGC -> CTC |
rs121907953 | 178 | 1 | Arg -> Cys | tCGC -> TGC |
rs121907969 | 180 | 3 | Tyr -> TER | TACc -> TAG |
rs28941771 | 180 | 1 | Tyr -> His | tTAC -> CAC |
rs121907973 | 197 | 2 | Lys -> Thr | AAA -> ACA |
rs1800429 | 200 | 1 | Val -> Met | cGTG -> ATG |
rs121907976 | 204 | 2 | His -> Arg | CAT -> CGT |
rs121907961 | 210 | 2 | Ser -> Phe | TCC -> TTC |
rs121907974 | 211 | 2 | Phe -> Ser | TTC -> TCC |
rs121907970 | 247 | 1 | Arg -> Trp | aCGG -> TGG |
rs121907959 | 250 | 1 | Gly -> Ser | gGGT -> AGT |
rs121907959 | 250 | 2 | Gly -> Asp | GGT -> GAT |
rs121907959 | 250 | 2 | Gly -> Val | GGT -> GTT |
rs121907971 | 258 | 1 | Asp -> His | tGAC -> CAC |
rs121907954 | 269 | 1 | Gly -> Ser | aGGT -> AGT |
rs121907954 | 269 | 2 | Gly -> Asp | GGT -> GAT |
rs121907977 | 301 | 2 | Met -> Arg | ATG -> AGG |
rs121907967 | 329 | 2 | Trp -> TER | TGG -> TAG |
rs121907963 | 393 | 1 | Arg -> TER | gCGA -> TGA |
rs121907958 | 420 | 3 | Trp -> Cys | TGGt -> TGC |
rs121907958 | 420 | 3 | Trp -> Cys | TGGt -> TGT |
rs28940871 | 451 | 1 | Leu -> Val | tCTG -> GTG |
rs121907978 | 454 | 2 | Gly -> Asp | GGT -> GAT |
rs121907978 | 454 | 1 | Gly -> Ser | tGGT -> AGT |
rs121907981 | 474 | 3 | Trp -> Cys | TGGc -> TGC |
rs121907952 | 482 | 1 | Glu -> Lys | cGAA -> AAA |
rs121907968 | 485 | 1 | Trp -> Arg | gTGG -> CGG |
rs121907966 | 499 | 1 | Arg -> Cys | aCGT -> TGT |
rs121907956 | 499 | 2 | Arg -> His | CGT -> CAT |
rs28942071 | 504 | 1 | Arg -> Cys | cCGC -> TGC |
rs121907955 | 504 | 2 | Arg -> His | CGC -> CAC |
rs4777502 | 506 | 3 | Glu -> Asp | GAA -> GAC |
rs4777502 | 506 | 3 | Glu -> Asp | GAA -> GAT |
Graphical representation:
The graphical representation shows at which position each mutation takes place. Each pair of lines shows the original sequence followed by the mutated sequence. In this case only non-silent mutations are marked, because HGMD annotates only non-silent mutations and therefore the mutations found in both databases are non-silent as well. There are no wrongly annotated amino acids in this set.
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWP!PQNFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHH!LPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
AWLRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFRSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVNYMKELELVTKAGFRALLSAPCYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKAVVIDGEACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYECLSHFCCDLLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
Back to [Tay-Sachs disease]
Mutations annotated only in HGMD
We also looked for mutations which are annotated only in HGMD, but we did not find any of them.
Back to [Tay-Sachs disease]
Mutations annotated only in SNP-DB
Here we list all mutations that are annotated only in DBSNP and that are not silent. Some of these mutations have a detailed NP annotation, while others are annotated in less detail, so we had to map them ourselves. The detailed list of the mutations we mapped can be found [here]. In the end we found 8 non-silent mutations that have only one possible mutation position and codon position.
SNP-DB Identifier | Codon position | Mutation position | Amino Acids | Codons |
rs4777505 | 29 | 2 | Asn -> Ser | AAC -> AGC |
rs61731240 | 179 | 1 | His -> Asp | CAT -> GAT |
rs3743230 | 208 | 1 | Asn -> Asp | AAC -> GAC |
rs61747114 | 248 | 1 | Leu -> Phe | CTT -> TTT |
rs1054374 | 293 | 2 | Ser -> Ile | AGT -> ATT |
rs1800430 | 399 | 1 | Asn -> Asp | AAC -> GAC |
rs1800431 | 436 | 1 | Ile -> Val | ATA -> GTA |
rs121907982 | 456 | 2 | Tyr -> Ser | TAT -> TCT |
Graphical representation:
The following visualization displays the mutations that are found only in DBSNP and are non-silent. In some cases the original amino acid in the mutation annotation did not match our sequence. This can happen because several sequences exist for this protein, depending on which sequence assembly is used. If DBSNP uses a different sequence than we do, some amino acids in the sequence can differ. It is not possible to say which sequence is the right one. We therefore decided to color wrongly annotated amino acids blue, so that the reader can see the difference at a glance.
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQSFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRDYLPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARFRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFMSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVDYMKELELVTKAGFRALLSAPWYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYVVEPLAFEGTPEQKALVIGGSACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
Back to [Tay-Sachs disease]
Silent Mutations
HGMD contains no silent mutations, so all silent mutations were extracted from DBSNP. Silent mutations do not change the amino acid. It is not entirely clear whether they really cause no phenotype, because there is a hypothesis that silent mutations can change the folding of the protein. If a very common triplet is replaced by a rare codon, the matching tRNA is less abundant and may need more time to bind at the ribosome than the tRNA of a very common codon (compare for example Coding-Sequence Determinants of Gene Expression in Escherichia coli). This could change the folding of the protein, so a silent mutation could also change the phenotype through misfolded proteins. The silent mutations are not annotated in HGMD, so they probably do not change the phenotype. However, if the hypothesis about misfolded proteins caused by different codons is true, the silent mutations should be kept in mind, because they could explain a different phenotype although there is no amino acid exchange. This is the reason why we decided to list the silent mutations in this section.
One problem with these mutations is that they are poorly annotated in DBSNP, which means we had to post-process the results. The original codon has to encode the amino acid that occurs in the protein sequence, so we first shifted ("rotated") the reported triplets.
That is, we tried the codon with the mutation at position one, position two and position three. Next, we also reversed these triplets and built the complementary sequence for both the forward and the reversed version.
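The orientation check described above can be sketched like this. It is a minimal illustration under our own assumptions, not the script used for the tables: the function names are hypothetical, the triplet `AGG` is an invented example, and the check simply keeps every orientation whose translation matches the amino acid expected from the protein sequence.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def orientations(triplet):
    """Return the four readings of a triplet that are tried for the mapping."""
    complementary = "".join(COMPLEMENT[base] for base in triplet)
    return {
        "forward": triplet,
        "reverse": triplet[::-1],
        "complementary": complementary,
        "complementary reverse": complementary[::-1],
    }


def matching_orientations(triplet, expected_amino_acid):
    """Keep only the readings that encode the amino acid seen in the protein."""
    return {
        name: codon
        for name, codon in orientations(triplet).items()
        if CODON_TABLE[codon] == expected_amino_acid
    }


# Hypothetical example: the triplet AGG is reported, but the protein has Pro (P).
matches = matching_orientations("AGG", "P")
print(matches)
```

In this invented example only the complementary reverse reading (`CCT`) encodes proline, which corresponds to the kind of entry marked "complementary reverse" in the translation column.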
The detailed result can be seen [here]
If we found more than one nucleotide combination that encodes the same amino acid in the protein sequence, and if these combinations are silent mutations as well, we listed them all in the following table. If no other possible mutation exists for a position, we also listed the mutations that are not silent.
The following table shows all combinations that are possible:
SNP-DB Identifier | Codon position | Mutation position | Amino Acids | Codons | Translation |
rs1800428 | 3 | 3 | Ser -> Ser | AGC -> AGT | Forward |
rs11551324 | 109 | 3 | Thr -> Thr | ACC -> ACT | Forward |
rs28942072 | 324 | 3 | Val -> Val | GTT -> GTA | Forward |
rs28942072 | 324 | 3 | Val -> Val | GTT -> GTC | Forward |
rs34085965 | 446 | 1 | Pro -> Pro | CCT -> CCC | Complementary reverse |
rs4777502 | 506 | 3 | Glu -> Glu | GAG -> GAA | Forward |
Graphical representation:
Here we show all silent mutations that we found in the database, so that the reader can see the positions of these mutations at a glance.
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
Back to [Tay-Sachs disease]
Summary
Graphical representation:
Here we combined the graphical representations of the different cases. This visualization therefore displays all mutations that were found in HGMD and DBSNP, non-silent as well as silent ones. The wrongly annotated amino acids are marked, too.
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWP!PQSFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHD!LPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
AWFRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFRSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVDYMKELELVTKAGFRALLSAPCYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYVVEPLAFEGTPEQKAVVIDGSACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYECLSHFCCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
We also decided to show the mutations in the 3D structure of the protein. This makes it possible to see whether a mutation lies in a secondary structure element (where it usually has a stronger effect) or in a loop region (where it normally does not affect the structure of the protein that much).
As we can see in both pictures (Figure 1 and Figure 2), silent as well as non-silent mutations are located both in loops and in secondary structure elements. About half of the non-silent mutations are located in loops. These mutations do not destroy a secondary structure element, so their damaging effect has to be explained in a different way.
Back to [Tay-Sachs disease]
Statistical comparison of HGMD and DBSNP
To analyze the results from the two databases we carried out a statistical comparison.
First we compared the resulting tables (see above) according to the mutation position in the triplet. For this we created a bar plot that shows, for each table, the percentage of mutations at each mutation position (Figure 3).
The first three bars show the overlapping results of both databases. There are as many mutations at the first position as at the second position, whereas mutations at the third position are much rarer. The reason is that HGMD contains no silent mutations, so the overlap of both databases does not contain any either. A substitution at the third position of a triplet is often silent, and only a few substitutions at the third position result in an amino acid change. This explains why the third position is so rare while the other two positions are equally frequent.
The second case displays the position frequencies of the mutations that are found only in DBSNP (Figure 3, bars 4 and 5). Mutations at position one are more frequent than at position two, and there are no mutations at position three. This set contains only non-silent mutations, which can explain why position three is missing: mutations at the third position of a triplet are very often silent. The difference between the first and the second position is harder to explain. In the standard genetic code a substitution at the second position always changes the encoded amino acid (or swaps two stop codons), and a substitution at the first position almost always does, so both positions should be similarly likely to yield a non-silent mutation. Since this set contains only 8 mutations, the difference may simply be due to chance.
The third case shows the silent mutations from DBSNP (Figure 3, bars 6, 7 and 8). Here the third position is the most frequent mutation position, while the other two are equally common. This is the opposite of the other cases and has a similar explanation: mutations at the third position of a triplet often result in a silent mutation, whereas silent mutations at the other positions are very rare.
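The argument about silent mutations and codon positions can be checked directly against the standard genetic code. The following sketch is our own illustration and not part of the original analysis: it counts, for every sense codon and every codon position, how many of the three possible single-nucleotide substitutions leave the amino acid unchanged. The function name `silent_counts` is hypothetical.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_counts():
    """Count silent single-base substitutions per codon position (1, 2, 3)."""
    counts = {1: 0, 2: 0, 3: 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only start from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == amino_acid:
                    counts[pos + 1] += 1
    return counts


counts = silent_counts()
sense_codons = sum(1 for amino_acid in CODON_TABLE.values() if amino_acid != "*")
print(counts, "out of", 3 * sense_codons, "substitutions per position")
```

Under these assumptions, 126 of the 183 possible third-position substitutions are silent, compared with 8 at the first position and none at the second, which is consistent with silent mutations being concentrated at the third position.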
The last case represents the total distribution of the mutation positions (Figure 3, bars 9, 10 and 11). Here the first position is the most common mutation position, followed by the second position, which is almost equally common. The third position is the least frequent one. This is the result we expect from the other three cases: the first and the second position are almost always the most frequent ones and the third is the rarest. The third case (silent mutations) is the exception, which is why the difference is not as large in the complete distribution.
In summary, the bar plots of the different tables correspond to our expectations and can all be explained logically.
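The percentages shown in such a bar plot can be computed in a few lines. The sketch below is only an illustration with a hypothetical function name; it uses the mutation positions copied from the table of mutations annotated only in SNP-DB and does not reproduce the plot itself.

```python
from collections import Counter

# Mutation positions copied from the "Mutations annotated only in SNP-DB" table.
dbsnp_only_positions = [2, 1, 1, 1, 2, 1, 1, 2]


def position_percentages(positions):
    """Return the share (in percent) of mutations at codon position 1, 2 and 3."""
    counts = Counter(positions)
    total = len(positions)
    # A Counter returns 0 for positions that never occur, e.g. position 3 here.
    return {pos: 100.0 * counts[pos] / total for pos in (1, 2, 3)}


percentages = position_percentages(dbsnp_only_positions)
print(percentages)
```

For this table the first position accounts for 62.5% of the mutations, the second for 37.5% and the third for none, which matches the description of bars 4 and 5.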
As a next step we looked up which amino acid mutates most often for each table. Therefore we create a bar plot where the frequency for a mutation of a certain amino acid is displayed for each table. The different colors correspond to the different tables and the total distribution. Furthermore we plotted not the absolute values, but the relative ratio of the amino acid exchange within one table (in percent).
The result of this plot can be seen in Figure 4. Looking at the overlapping result of both database we can see that almost every amino acid mutation occurs. The onliest amino acids which do not mutate are Alanine, Asparagine, Cysteine, Glutamine and Threonine. Three of these amino acid (Ala, Cys, Gln) do not occur in the other tables as well. One possible reason is that these amino acids were encoded only by triplets which are very rare. This means, they probably occur less often. For asparagine this explanation is probably also right, which means that only few triplets encodes it. Threonine does probably never mutate by accident, because it can be encoded by an higher number of triplets and is a very special amino acid. The amino acids that mutate most common in the overlapping result of both databases are Arginine and Glycine. A possible reason for this is that Arginine can be encoded by many possible triplets as well as Glycine which has as result that they are probably more common amino acids.
Looking at the DBSNP results which were not in HGMD we can see that there are many amino acids which do not mutate. A reason for this is that the number of found mutations which are only in SNPDB and which are not silent is very low and therefore not really significant. The most common mutated amino acid here is Asparagine. A possible reason can be that this amino acid is encoded by less triplets and in relation to the other tables it occurs here very often by chance.
For the silent mutations we also extract the amino acid where a nucleotide exchange takes places. Most strikingly are Serine and Valine which have the highest percentage for silent mutations. This can be explained by the fact that these amino acids were encoded by a high number of different triplets. The reason is that a silent mutation is more usually when a amino acid is encoded by many triplets.
At last we plotted the total result of all tables together. In this case the most common amino acid exchange is for Arginine followed by Glycine, Serine and Tryptophan. This corresponds to the overlapping results of both databases, because it has highest number of results. The only exception is Serine which mutates a lot in the other two cases. Furthermore, these results agree also with the explanation that these amino acids, except Tryptophan, were encoded by many different triplets. This means that they occur probably more often in sequences. Tryptophan is probably occurring so often by chance. The rest of the amino acid mutation rates correspond mostly to the results of both databases.
All in all, the bar plot result often agrees with the number of triplets that encode a certain amino acid. The amino acids that are encoded by many triplets are probably more common, while amino acids that are encoded by only a few triplets are rare. In biology this relationship holds only partly, however. Furthermore, silent mutations are more common in amino acids that are encoded by many triplets, which is logical and can be explained well. Besides, the results are not really significant because the data set is very small. For real statistical evidence, more data is necessary. Nevertheless, this bar plot gives a good overview of the amino acid mutation rates for this special case.
Afterwards, we decided to create another graphical representation of the different amino acid mutations. For this, we included all mutations from DBSNP and HGMD that are not silent. The original amino acids are on the y-axis and the mutated amino acids are on the x-axis (Figure 5). The plot therefore displays which amino acid mutates to which other amino acid. The color visualizes the frequency of a certain amino acid exchange: white means that the exchange does not occur, and the darker the color, the more common the exchange.
We can see here that the most common exchanges are Arginine to Cysteine, Arginine to Histidine, Glycine to Aspartic acid, Glycine to Serine and Tryptophan to Cysteine. Furthermore, we can see that many possible exchanges do not occur at all (white).
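The counts behind such a heatmap can be built from a list of (original, mutated) pairs. The following is a minimal sketch and not the script we used; the function names and the example pairs are made up for illustration and do not reproduce our data.

```python
from collections import Counter


def substitution_counts(mutations):
    """Count each (original, mutated) pair, skipping silent entries
    where the amino acid does not change."""
    return Counter((orig, mut) for orig, mut in mutations if orig != mut)


def to_matrix(counts, alphabet):
    """Turn the pair counts into a square matrix: rows are the original
    amino acids (y-axis), columns the mutated ones (x-axis)."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in alphabet]
    for (orig, mut), n in counts.items():
        matrix[index[orig]][index[mut]] = n
    return matrix


if __name__ == "__main__":
    # Hypothetical example pairs, NOT taken from HGMD or DBSNP.
    example = [("Arg", "Cys"), ("Arg", "Cys"), ("Arg", "His"),
               ("Gly", "Ser"), ("Val", "Val")]
    alphabet = ["Arg", "Cys", "Gly", "His", "Ser", "Val"]
    for row in to_matrix(substitution_counts(example), alphabet):
        print(row)
```

A cell that stays at zero corresponds to a white field in the heatmap; the resulting matrix can be passed to any plotting tool that draws a color-coded grid.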
Finally, we built another heatmap for the nucleotide exchanges (Figure 6). For this, we included all exchanges from DBSNP and HGMD, this time including the silent mutations. The original nucleotides are on the y-axis and the exchanged nucleotides are on the x-axis. Here, too, the colors visualize the frequency of a certain exchange: white means that no exchange took place, and the darker the color, the more often the exchange occurs.
We can see here that the most common nucleotide exchange is from Guanine to Adenine. Other common nucleotide exchanges are Cytosine to Thymine and Thymine to Cytosine. A possible explanation is that Guanine and Adenine are both purines, whereas Cytosine and Thymine are both pyrimidines. The bases within each of these two groups have similar shapes, which possibly makes a mutation between them easier and also less strongly selected against. Another point may be that base excision repair has more difficulty recognizing an exchange between bases of the same shape. There are also two exchanges that never happen: Cytosine to Adenine and Adenine to Thymine. The other exchanges have an average frequency.
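The distinction between exchanges within a group and exchanges between the two groups can be expressed in a few lines. Exchanges within a group are commonly called transitions and exchanges between the groups transversions; these terms and the helper `classify_exchange` are introduced here for illustration only and were not part of our analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_exchange(original, mutated):
    """Classify a nucleotide exchange as 'transition' (purine to purine or
    pyrimidine to pyrimidine) or 'transversion' (between the two groups)."""
    original, mutated = original.upper(), mutated.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or mutated not in valid:
        raise ValueError("expected one of A, C, G, T")
    if original == mutated:
        raise ValueError("original and mutated base are identical")
    same_group = (original in PURINES) == (mutated in PURINES)
    return "transition" if same_group else "transversion"


if __name__ == "__main__":
    for pair in (("G", "A"), ("C", "T"), ("T", "C"), ("C", "A"), ("A", "T")):
        print(pair[0], "->", pair[1], classify_exchange(*pair))
```

With this classification, the three most common exchanges in Figure 6 are all transitions, while the two exchanges that never occur are both transversions.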
Back to [Tay-Sachs disease]