Mapping SNPs HEXA

Methods

First, we had to parse the HGMD and DBSNP databases.

HGMD

We logged in, searched for Tay-Sachs disease and chose one of the HEXA entries (in our case there were two entries with identical content). We only considered the missense/nonsense mutations, of which HGMD annotates 68. We copied the web page into a text file and then wrote a short parser that extracts the codon change, the amino acid change and the codon number.
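A minimal Python sketch of such a parser is shown below. It assumes that each copied row contains the codon change (e.g. `TGGc-TGA`, where a lower-case letter is a flanking base), the amino acid change (e.g. `Trp-Term`) and the codon number, separated by whitespace. This row layout, the regular expression and the names `parse_hgmd_line`, `parse_hgmd_text` and `HgmdMutation` are illustrative assumptions, not the original parser.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed row layout (hypothetical, adapt to the copied text):
#   <accession> <codon change> <amino acid change> <codon number> <phenotype ...>
# A lower-case letter in the original codon is a flanking base, e.g. "TGGc-TGA".
HGMD_ROW = re.compile(
    r"(?P<codon_from>[ACGTacgt]{3,4})-(?P<codon_to>[ACGT]{3})\s+"
    r"(?P<aa_from>[A-Za-z]{3,4})-(?P<aa_to>[A-Za-z]{3,4})\s+"
    r"(?P<codon_number>\d+)"
)


class HgmdMutation(NamedTuple):
    codon_change: Tuple[str, str]  # (original codon, mutated codon)
    aa_change: Tuple[str, str]     # (original amino acid, new amino acid)
    codon_number: int


def parse_hgmd_line(line: str) -> Optional[HgmdMutation]:
    """Parse one copied row; return None if it holds no mutation."""
    match = HGMD_ROW.search(line)
    if match is None:
        return None
    return HgmdMutation(
        codon_change=(match["codon_from"], match["codon_to"]),
        aa_change=(match["aa_from"], match["aa_to"]),
        codon_number=int(match["codon_number"]),
    )


def parse_hgmd_text(text: str) -> List[HgmdMutation]:
    """Parse all mutation rows of the copied page, skipping everything else."""
    parsed = (parse_hgmd_line(line) for line in text.splitlines())
    return [mutation for mutation in parsed if mutation is not None]
```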

DBSNP

Parsing the DBSNP output was more complicated than parsing the HGMD output. First, we searched for HEXA in this database and kept only the SNPs that occur in humans. We used the graphical output and again copied the page into a text file. A DBSNP entry has the following structure: first the name of the SNP, then the sequence and its graphical representation. The next line contains the allele origin, the clinical relevance and finally the annotations of the mutation. TBA...
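Since one DBSNP entry spans several lines, a natural first step is to split the copied page into one record per SNP. A minimal sketch, assuming (as a simplification) that each entry starts with a line whose first token is the SNP name (`rs` followed by digits) and that all following non-empty lines up to the next SNP name belong to that entry; `split_dbsnp_entries` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional

# Assumption: an entry starts with a line whose first token is the SNP name.
RS_HEADER = re.compile(r"^\s*(rs\d+)\b")


def split_dbsnp_entries(text: str) -> Dict[str, List[str]]:
    """Group the lines of a copied DBSNP page by SNP name.

    Lines before the first SNP name are ignored, as is the rest of the
    header line itself; empty lines are dropped.
    """
    entries: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = RS_HEADER.match(line)
        if header:
            current = header.group(1)
            entries.setdefault(current, [])
        elif current is not None and line.strip():
            entries[current].append(line.strip())
    return entries
```

Each record can then be searched for the allele origin, clinical relevance and annotation lines.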

Comparison of mutations in HGMD and DBSNP

To compare the mutations in HGMD and DBSNP, we built a table with the most important information. First, we extracted the DBSNP ID, which makes it easy to look up the SNP if necessary. Next, we looked up the codon position, which indicates which codon of the sequence is affected by the mutation. We also extracted the mutation position, i.e. the position within the triplet at which the mutation takes place. The next two columns show the amino acid change and the codon change, i.e. which amino acid is replaced by which and how the codon is altered. Afterwards, we created a descriptive representation of the mutations by showing the sequence and coloring each position according to the kind of mutation. The following tables and visualisations cover the different cases described below. A detailed summary and comparison of these cases can be found below under 'Statistical comparison of HGMD and DBSNP'.
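The mutation position can be derived from the codon change by finding the base that differs. A minimal sketch, assuming the notation used in the tables on this page, where a lower-case letter is a flanking base of the neighbouring codon (e.g. `tCGC -> TGC`) and exactly one base is substituted; the function name is hypothetical.

```python
def mutation_position(codon_from: str, codon_to: str) -> int:
    """Return the 1-based position (1, 2 or 3) at which two codons differ.

    Lower-case letters in `codon_from` are flanking bases of the
    neighbouring codon (e.g. "tCGC") and are ignored.
    """
    ref = "".join(base for base in codon_from if base.isupper())
    alt = codon_to.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError(f"not a codon pair: {codon_from!r} -> {codon_to!r}")
    diffs = [i + 1 for i, (a, b) in enumerate(zip(ref, alt)) if a != b]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one substitution, got {len(diffs)}")
    return diffs[0]
```

For example, `mutation_position("tCGC", "TGC")` returns 1 and `mutation_position("TGGc", "TGA")` returns 3.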

Mutations annotated in both databases

First, we compared the mutations that are annotated in both databases. All of them are non-silent and associated with a phenotype, because HGMD contains no silent mutations. We found 33 such mutations (distinct DBSNP identifiers). Some of them occur at different mutation positions within the same triplet, and in some cases two differently named mutations affect the same codon position.

SNP-DB Identifier	Codon position	Mutation position	Amino acids	Codons
rs121907964	26	3	Trp -> TER	TGGc -> TGA
rs121907979	39	2	Leu -> Arg	CTT -> CGT
rs121907975	127	1	Leu -> Phe	aCTC -> TTC
rs121907975	127	2	Leu -> Arg	CTC -> CGC
rs121907962	137	1	Arg -> TER	cCGA -> TGA
rs121907972	170	1	Arg -> Trp	cCGG -> TGG
rs121907957	170	2	Arg -> Gln	CGG -> CAG
rs28941770	178	2	Arg -> His	CGC -> CAC
rs28941770	178	2	Arg -> Leu	CGC -> CTC
rs121907953	178	1	Arg -> Cys	tCGC -> TGC
rs121907969	180	3	Tyr -> TER	TACc -> TAG
rs28941771	180	1	Tyr -> His	tTAC -> CAC
rs121907973	197	2	Lys -> Thr	AAA -> ACA
rs1800429	200	1	Val -> Met	cGTG -> ATG
rs121907976	204	2	His -> Arg	CAT -> CGT
rs121907961	210	2	Ser -> Phe	TCC -> TTC
rs121907974	211	2	Phe -> Ser	TTC -> TCC
rs121907970	247	1	Arg -> Trp	aCGG -> TGG
rs121907959	250	1	Gly -> Ser	gGGT -> AGT
	250	2	Gly -> Asp	GGT -> GAT
	250	2	Gly -> Val	GGT -> GTT
rs121907971	258	1	Asp -> His	tGAC -> CAC
rs121907954	269	1	Gly -> Ser	aGGT -> AGT
rs121907954	269	2	Gly -> Asp	GGT -> GAT
rs121907977	301	2	Met -> Arg	ATG -> AGG
rs121907967	329	2	Trp -> TER	TGG -> TAG
rs121907963	393	1	Arg -> TER	gCGA -> TGA
rs121907958	420	3	Trp -> Cys	TGGt -> TGC
rs121907958	420	3	Trp -> Cys	TGGt -> TGT
rs28940871	451	1	Leu -> Val	tCTG -> GTG
rs121907978	454	2	Gly -> Asp	GGT -> GAT
rs121907978	454	1	Gly -> Ser	tGGT -> AGT
rs121907981	474	3	Trp -> Cys	TGGc -> TGC
rs121907952	482	1	Glu -> Lys	cGAA -> AAA
rs121907968	485	1	Trp -> Arg	gTGG -> CGG
rs121907966	499	1	Arg -> Cys	aCGT -> TGT
rs121907956	499	2	Arg -> His	CGT -> CAT
rs28942071	504	1	Arg -> Cys	cCGC -> TGC
rs121907955	504	2	Arg -> His	CGC -> CAC
rs4777502	506	3	Glu -> Asp	GAA -> GAC
rs4777502	506	3	Glu -> Asp	GAA -> GAT
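Splitting the mutations into the cases of this page amounts to a set comparison. A minimal sketch, assuming that each mutation is reduced to a hypothetical key of codon position, original amino acid and new amino acid (the actual comparison may also have taken the codon change into account):

```python
from typing import Set, Tuple

# Hypothetical key: (codon position, original amino acid, new amino acid).
Key = Tuple[int, str, str]


def compare_databases(
    hgmd: Set[Key], dbsnp: Set[Key]
) -> Tuple[Set[Key], Set[Key], Set[Key]]:
    """Split mutations into (in both databases, only in HGMD, only in DBSNP)."""
    return hgmd & dbsnp, hgmd - dbsnp, dbsnp - hgmd
```

With HGMD containing (26, "Trp", "TER") and (39, "Leu", "Arg"), and DBSNP additionally containing (29, "Asn", "Ser"), the first two keys end up in the overlap and the third in the DBSNP-only set, while the HGMD-only set stays empty.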

Graphical representation:
The graphical representation shows the position at which each mutation takes place. In this case only non-silent mutations are marked: HGMD annotates only non-silent mutations, so the mutations shared by both databases are non-silent as well. Furthermore, there are no incorrectly annotated amino acids.

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 MTSSRLWFSLLLAAAFAGRATALWP!PQNFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD
 
 
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 
 
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
 QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHH!LPLS
 
 
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 
 
 ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
 AWLRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFRSTFFL
 
 
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 
 
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
 KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVNYMKELELVTKAGFRALLSAPCYLNRISYG
 
 
 PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 PDWKDFYIVEPLAFEGTPEQKAVVIDGEACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL
 
 
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
 TSDLTFAYECLSHFCCDLLRRGVQAQPLNVGFCEQEFEQT


 Legend: non-silent mutation, silent mutation, wrong AA in mutation annotation
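The second line of each alignment pair can be generated by copying the reference sequence and overwriting each mutated position with the new amino acid, using `!` for a stop codon as in the alignment above. A minimal sketch with hypothetical function names; the default line width of 60 is an assumption, and the colour coding of the original page is not reproduced.

```python
from typing import Dict, List


def mutated_sequence(reference: str, changes: Dict[int, str]) -> str:
    """Copy `reference` and overwrite the 1-based positions in `changes`.

    Values are one-letter amino acid codes; "!" marks a stop codon.
    """
    residues = list(reference)
    for position, new_aa in changes.items():
        if not 1 <= position <= len(residues):
            raise IndexError(f"position {position} outside the sequence")
        residues[position - 1] = new_aa
    return "".join(residues)


def alignment_lines(reference: str, changes: Dict[int, str], width: int = 60) -> List[str]:
    """Return reference/mutant line pairs, each pair followed by an empty line."""
    mutant = mutated_sequence(reference, changes)
    lines: List[str] = []
    for start in range(0, len(reference), width):
        lines += [reference[start:start + width], mutant[start:start + width], ""]
    return lines
```

For example, `mutated_sequence` applied to the first 40 residues of the reference line with `{26: "!", 39: "R"}` reproduces the first 40 characters of the mutated line above (W26 to stop, L39R).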

Mutations annotated only in HGMD

We also looked for mutations that are annotated only in HGMD, but did not find any.

Mutations annotated only in SNP-DB

Here we list all non-silent mutations that are annotated only in DBSNP. Some of these mutations have a detailed NP annotation, while others are annotated in less detail, so we had to map the latter ourselves. The detailed list of the mapped mutations can be found [here]. In the end, we found 8 non-silent mutations with a unique mutation position and codon position.

SNP-DB Identifier	Codon position	Mutation position	Amino acids	Codons
rs4777505	29	2	Asn -> Ser	AAC -> AGC
rs61731240	179	1	His -> Asp	CAT -> GAT
rs3743230	208	1	Asn -> Asp	AAC -> GAC
rs61747114	248	1	Leu -> Phe	CTT -> TTT
rs1054374	293	2	Ser -> Ile	AGT -> ATT
rs1800430	399	1	Asn -> Asp	AAC -> GAC
rs1800431	436	1	Ile -> Val	ATA -> GTA
rs121907982	456	2	Tyr -> Ser	TAT -> TCT
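When a SNP is annotated only with a nucleotide position, its codon position and mutation position follow from integer division. A minimal sketch, assuming the position is counted on the coding strand and `cds_start` (a hypothetical parameter) is the position of the A of the start codon ATG in the same coordinate system:

```python
from typing import Tuple


def to_codon(position: int, cds_start: int = 1) -> Tuple[int, int]:
    """Map a nucleotide position to (codon position, mutation position).

    `cds_start` is the position of the A of the start codon ATG in the
    same coordinate system; both results are 1-based.
    """
    offset = position - cds_start  # 0-based offset into the coding sequence
    if offset < 0:
        raise ValueError("position lies upstream of the start codon")
    return offset // 3 + 1, offset % 3 + 1
```

For example, `to_codon(86)` gives `(29, 2)`, i.e. the second base of codon 29.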

Graphical representation:
The following visualisation shows the non-silent mutations found only in DBSNP. Accordingly, no silent mutations are marked, but some incorrectly annotated amino acids occur.

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 MTSSRLWFSLLLAAAFAGRATALWPWPQSFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 
 
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 
 
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRDYLPLS
 
 
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 
 
 ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
 ARFRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFMSTFFL
 
 
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 
 
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVDYMKELELVTKAGFRALLSAPWYLNRISYG
 
 
 PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 PDWKDFYVVEPLAFEGTPEQKALVIGGSACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 
 
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT


 Legend: non-silent mutation, silent mutation, wrong AA in mutation annotation
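Incorrectly annotated amino acids can be detected by comparing the original amino acid of each annotation with the residue at the same codon position in the HEXA protein sequence. A minimal sketch; the three-letter to one-letter mapping is the standard one, and the function name is hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def annotation_matches(protein: str, codon_position: int, original_aa: str) -> bool:
    """True if the annotated original amino acid matches the protein sequence."""
    if not 1 <= codon_position <= len(protein):
        return False
    return protein[codon_position - 1] == THREE_TO_ONE[original_aa]
```

An annotation for which this returns False would be marked as "wrong AA in mutation annotation".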

Silent Mutations

HGMD contains no silent mutations, so all silent mutations were extracted from DBSNP. A silent mutation does not change the phenotype, because the nucleotide exchange results in the same amino acid.

One problem is that these mutations are poorly annotated in DBSNP, so we had to process the results further. The original codon has to encode the amino acid that occurs at this position in the protein sequence, so we first rotated the found codons, i.e. we tried the codon with the mutation at position one, position two and position three. Next, we also reversed these codons and created the complementary sequence for both. The detailed results can be found [here].
If more than one nucleotide combination encoded the amino acid in the protein sequence and also resulted in a silent mutation, we listed all of them in the following table. If no other mutation was possible at this position, we also listed the non-silent mutations.

The following table shows all possible combinations:

SNP-DB Identifier	Codon position	Mutation position	Amino acids	Codons	Translation
rs1800428	3	3	Ser -> Ser	AGC -> AGT	Forward
rs11551324	109	3	Thr -> Thr	ACC -> ACT	Forward
rs28942072	324	3	Val -> Val	GTT -> GGA	Forward
rs28942072	324	3	Val -> Val	GTC -> GTT	Forward
rs34085965	446	1	Pro -> Pro	CCT -> CCC	Complementary reverse
rs4777502	506	3	Glu -> Glu	GAG -> GAA	Forward
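The mapping of poorly annotated SNPs described above can be sketched as follows. This is a simplified variant, not necessarily the exact procedure used here: it assumes that the original codon is known from the coding sequence and that DBSNP reports the alleles either on the coding (forward) strand or on the opposite strand, in which case both alleles are complemented before being tried at each of the three codon positions. The genetic code table is the standard one; all names are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple

# Standard genetic code: codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Candidate(NamedTuple):
    position: int  # 1, 2 or 3 within the codon
    mutated: str   # mutated codon on the coding strand
    strand: str    # "forward" or "reverse" (alleles complemented)
    silent: bool   # True if the amino acid is unchanged


def candidate_codons(codon: str, allele_from: str, allele_to: str) -> List[Candidate]:
    """Try the SNP alleles at every codon position, on both strands."""
    codon = codon.upper()
    original_aa = CODON_TABLE[codon]
    orientations = [
        ("forward", allele_from, allele_to),
        ("reverse", COMPLEMENT[allele_from], COMPLEMENT[allele_to]),
    ]
    candidates = []
    for strand, ref_base, alt_base in orientations:
        for i, base in enumerate(codon):
            if base == ref_base:  # the reference allele must be present here
                mutated = codon[:i] + alt_base + codon[i + 1:]
                candidates.append(
                    Candidate(i + 1, mutated, strand, CODON_TABLE[mutated] == original_aa)
                )
    return candidates
```

For example, `candidate_codons("AGC", "C", "T")` yields a silent forward candidate AGT at position 3 (matching the AGC -> AGT row of the table) and a non-silent reverse candidate AAC (Asn) at position 2; keeping only the silent candidates corresponds to the filtering described above.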

Graphical representation:
This graphical representation shows the silent mutations, all of which come from DBSNP.

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 
 
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 
 
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
 
 
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 
 
 ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
 ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
 
 
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 
 
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
 
 
 PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 
 
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT  
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT


 Legend: non-silent mutation, silent mutation, wrong AA in mutation annotation

Summary

Graphical representation:

Here we combined the graphical representations of all cases, so this visualisation shows all mutations found in HGMD and DBSNP: non-silent mutations, silent mutations and incorrectly annotated amino acids.

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
 MTSSRLWFSLLLAAAFAGRATALWP!PQSFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD
 
 
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
 
 
 QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
 QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHD!LPLS
 
 
 SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
 
 
 ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
 AWFRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFRSTFFL
 
 
 EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
 
 
 KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
 KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVDYMKELELVTKAGFRALLSAPCYLNRISYG
 
 
 PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
 PDWKDFYVVEPLAFEGTPEQKAVVIDGSACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL
 
 
 TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
 TSDLTFAYECLSHFCCELLRRGVQAQPLNVGFCEQEFEQT


 Legend: non-silent mutation, silent mutation, wrong AA in mutation annotation

Statistical comparison of HGMD and DBSNP

To analyse the results from the two databases, we performed a simple statistical comparison.

First, we compared the result tables above with respect to the mutation position within the triplet. For this, we created a barplot showing, for each table, the percentage of mutations at each position.
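The bar heights are relative frequencies within each table. A minimal sketch of this computation, assuming the mutation positions of one table are given as a list of integers (the plotting itself is left out; `position_percentages` is a hypothetical name):

```python
from collections import Counter
from typing import Dict, Iterable


def position_percentages(positions: Iterable[int]) -> Dict[int, float]:
    """Percentage of mutations at triplet positions 1, 2 and 3."""
    counts = Counter(positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutation positions given")
    # Counter returns 0 for positions that never occur.
    return {position: 100.0 * counts[position] / total for position in (1, 2, 3)}
```

Applied to the mutation positions of the DBSNP-only table (2, 1, 1, 1, 2, 1, 1, 2), this gives 62.5 % for position 1, 37.5 % for position 2 and 0 % for position 3.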

The first case covers the mutations shared by both databases. There are as many mutations at the first position as at the second position, while mutations at the third position are much rarer. The reason is that HGMD contains no silent mutations, so the overlap of both databases does not contain any either. A change at the third position of a triplet often results in a silent mutation, so only a few third-position mutations change the amino acid. This explains why the third position is rare while the other two positions are equally frequent.

The second case shows the position frequencies of the mutations found only in DBSNP. Mutations at position one are more frequent than at position two, and there are no mutations at position three. The mutations found only in DBSNP are non-silent, which explains the absence of third-position mutations: mutations at the third position of a triplet are very often silent. There is probably no particular reason why the first position is more common than the second.

The third case shows the silent mutations from DBSNP. Here the third position is the most frequent mutation position, while the other two positions are rare. This is the opposite of the other cases and has a similar explanation: mutations at the third position of a triplet often result in a silent mutation, whereas silent mutations at the other positions are very rare.
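The explanation that third-position changes are mostly silent can be checked against the standard genetic code. The sketch below (our illustration, not part of the original analysis) counts, for every sense codon and every possible single-base substitution, how often the amino acid stays the same at each codon position.

```python
from itertools import product
from typing import Dict, Tuple

# Standard genetic code: codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_by_position() -> Dict[int, Tuple[int, int]]:
    """Map codon position -> (silent substitutions, all substitutions)."""
    silent = {1: 0, 2: 0, 3: 0}
    total = {1: 0, 2: 0, 3: 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from sense codons
        for i in range(3):
            for base in BASES:
                if base == codon[i]:
                    continue
                mutated = codon[:i] + base + codon[i + 1:]
                total[i + 1] += 1
                if CODON_TABLE[mutated] == aa:
                    silent[i + 1] += 1
    return {p: (silent[p], total[p]) for p in (1, 2, 3)}
```

Each position has 183 possible substitutions (61 sense codons times 3 alternative bases). At the second position none of them is silent and at the first position only 8 are (all in leucine and arginine codons), whereas 126 of the third-position substitutions are silent.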

The last case shows the overall distribution of mutation positions. The first position is the most common, closely followed by the second; the third position is the least frequent. This is expected from the other three cases: in the first two, the first and second positions are the most frequent and the third the rarest. The silent mutations (third case) are the exception, which is why the difference is smaller in the overall distribution.

In summary, the barplots of the different tables match our expectations and can all be explained logically.

Figure 1: Barplot of the mutation positions for the different tables

As a next step, we looked at which amino acids are mutated most often in each table. For this, we created a barplot showing how often each amino acid is mutated; the different colors correspond to the different tables and to the total distribution. Instead of absolute values, we plotted the relative share of each amino acid exchange within a table (in percent).
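The relative share per table can be computed by counting the original amino acids of the table and dividing by the table size. A minimal sketch (hypothetical function name, plotting left out):

```python
from collections import Counter
from typing import Dict, Iterable


def aa_percentages(original_aas: Iterable[str]) -> Dict[str, float]:
    """Share (in percent) of each mutated amino acid within one table,
    ordered from most to least frequent."""
    counts = Counter(original_aas)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}
```

For the DBSNP-only table, the original amino acids are Asn, His, Asn, Leu, Ser, Asn, Ile and Tyr, so asparagine accounts for 37.5 % and each of the others for 12.5 %.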

Looking at the mutations shared by both databases, we see that almost every amino acid is mutated. The only amino acids without a mutation are alanine, asparagine, cysteine, glutamine and threonine. Three of them (Ala, Cys, Gln) are not mutated in the other tables either. For cysteine and glutamine, a possible reason is that each is encoded by only two codons, so they probably occur less often; the same may apply to asparagine, which is also encoded by only two codons. This explanation does not hold for alanine and threonine, which are encoded by four codons each; their absence is probably due to chance. The most frequently mutated amino acids in the overlap are arginine and glycine. A possible reason is that both are encoded by many codons (six for arginine, four for glycine), so they are probably more common amino acids.
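The codon counts this argument relies on follow from the standard genetic code. A short sketch that derives them, using one-letter codes and the usual TCAG ordering of the codon table:

```python
from collections import Counter
from itertools import product

# Standard genetic code: codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid; "*" counts the stop codons.
CODONS_PER_AA = Counter(CODON_TABLE.values())
```

This gives 6 codons for arginine, 4 each for glycine, alanine and threonine, 2 each for asparagine, cysteine and glutamine, and 1 for tryptophan.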

Looking at the mutations found only in DBSNP, we see that many amino acids are not mutated at all. The reason is that the number of non-silent mutations found only in DBSNP is very low and therefore not statistically meaningful. The most frequently mutated amino acid here is asparagine. A possible reason is chance: although asparagine is encoded by only a few codons, it happens to occur very often in this small table compared with the other tables.

For the silent mutations, we also extracted the amino acid at which the nucleotide exchange takes place. Most striking are serine and valine, which have the highest percentage of silent mutations. This can be explained by the fact that these amino acids are encoded by many different codons (six for serine, four for valine): a silent mutation is more likely when an amino acid is encoded by many codons.
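How likely a random single-base substitution is to be silent for a given amino acid can be computed from the standard genetic code. A sketch (our illustration, not part of the original analysis; hypothetical function name):

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_fraction(aa: str) -> float:
    """Fraction of single-base substitutions in the codons of `aa` that are silent."""
    codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid {aa!r}")
    silent = total = 0
    for codon in codons:
        for i in range(3):
            for base in BASES:
                if base == codon[i]:
                    continue
                total += 1
                if CODON_TABLE[codon[:i] + base + codon[i + 1:]] == aa:
                    silent += 1
    return silent / total
```

For example, valine (four codons) gives 1/3, cysteine (two codons) 1/9 and tryptophan (one codon) 0, in line with the argument above.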

Finally, we plotted the combined result of all tables. Here the most frequent amino acid exchange affects arginine, followed by glycine, serine and tryptophan. This mirrors the overlap of both databases, because that table contains the most entries. The only exception is serine, which is mutated often in the other two cases. These results also agree with the explanation that these amino acids, except tryptophan, are encoded by many different codons and therefore probably occur more often in sequences. Tryptophan, which is encoded by a single codon, probably occurs this often by chance. The mutation rates of the remaining amino acids also mostly mirror the overlap of both databases.

All in all, the barplot results often agree with the number of codons that encode each amino acid. Amino acids encoded by many codons are probably more common, while amino acids encoded by only a few codons are rarer; in biology, however, this holds only partly. Furthermore, silent mutations are more common in amino acids encoded by many codons, which is expected and easy to explain. The results are not statistically significant, because the samples are very small; more data would be needed for real statistical evidence. Nevertheless, the barplot gives a good overview of the amino acid mutation rates for this particular case.
