Mapping SNPs HEXA
Methods
First, we parsed the HGMD database and the dbSNP database.
HGMD
We logged in, searched for Tay-Sachs disease, and chose one entry for HEXA. In our case there were two entries with identical content. We only looked at the missense/nonsense mutations, of which 68 are annotated in HGMD. We copied the web page into a text file and then wrote a short parser that extracts the codon change, the amino acid change, and the codon number.
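The parsing step could look roughly like the following minimal sketch. The exact layout of the copied HGMD page is not given here, so the row format (`<codon change> <amino acid change> <codon number>`, e.g. `TGGc-TGA Trp-Term 26`) is an assumption, and the names `Mutation`, `parse_line`, `parse_text`, and `mutation_position` are hypothetical. The sketch also derives the mutation position within the codon by ignoring the lower-case flanking base and comparing the two codons.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    codon_number: int
    ref_codon: str  # may carry one lower-case flanking base, e.g. "TGGc"
    alt_codon: str
    ref_aa: str
    alt_aa: str


# Assumed row layout (not confirmed by the source):
#   "<codon change> <amino acid change> <codon number>"
ROW = re.compile(
    r"\b([acgt]?[ACGT]{3}[acgt]?)-([ACGT]{3})\s+"
    r"([A-Z][a-z]{2,3})-([A-Z][a-z]{2,3})\s+(\d+)\b"
)


def parse_line(line: str) -> Optional[Mutation]:
    """Return the mutation described on one line, or None if nothing matches."""
    match = ROW.search(line)
    if match is None:
        return None
    ref_codon, alt_codon, ref_aa, alt_aa, number = match.groups()
    return Mutation(int(number), ref_codon, alt_codon, ref_aa, alt_aa)


def parse_text(text: str) -> list:
    """Parse every line of the copied page and keep the lines that match."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [mutation for mutation in parsed if mutation is not None]


def mutation_position(mutation: Mutation) -> int:
    """1-based position inside the codon where reference and mutant differ."""
    core = "".join(base for base in mutation.ref_codon if base.isupper())
    differing = [i + 1 for i in range(3) if core[i] != mutation.alt_codon[i]]
    if len(differing) != 1:
        raise ValueError(f"expected exactly one changed base in {mutation}")
    return differing[0]


if __name__ == "__main__":
    for row in parse_text("TGGc-TGA Trp-Term 26\naCTC-TTC Leu-Phe 127\n"):
        print(row.codon_number, mutation_position(row), row.ref_aa, row.alt_aa)
```

Lines that do not match the assumed pattern are skipped silently, which is convenient for a pasted web page but means the number of parsed rows should be checked against the 68 expected entries.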
dbSNP
Parsing the dbSNP output was more complicated than parsing the HGMD output. First, we searched for HEXA in this database and selected only the SNPs that occur in human. We used the graphical output and again copied and pasted the page into a text file. An entry in dbSNP has the following structure: first comes the name of the SNP, then the sequence and the graphical representation of the sequence. The next line contains the allele origin, the clinical relevance, and finally the annotations of the mutation. TBA...
Comparison of mutations in HGMD and dbSNP
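The comparison sorts every mutation into one of three groups: annotated in both databases, only in HGMD, or only in the SNP database. A minimal sketch of that grouping with set operations is shown here. It assumes each mutation is identified by the tuple (codon number, reference codon, mutated codon); the function name `compare` and this key choice are hypothetical, not taken from our parser.

```python
def compare(hgmd, snp_db):
    """Group mutation keys by the database(s) in which they are annotated."""
    hgmd, snp_db = set(hgmd), set(snp_db)
    return {
        "both": hgmd & snp_db,
        "only_hgmd": hgmd - snp_db,
        "only_snp_db": snp_db - hgmd,
    }


if __name__ == "__main__":
    # Keys are (codon number, reference codon, mutated codon).
    hgmd = [(26, "TGG", "TGA"), (39, "CTT", "CGT")]
    snp_db = [(26, "TGG", "TGA"), (39, "CTT", "CGT"), (293, "AGT", "ATT")]
    for group, keys in compare(hgmd, snp_db).items():
        print(group, sorted(keys))
```

Using the codons rather than the amino acids as part of the key keeps two different nucleotide changes at the same codon apart, even when they lead to the same amino acid exchange.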
- Mutations annotated in both databases:
Mutations that are not silent and cause a phenotype:
Codon position | Mutation position | Amino acids | Codons |
26 | 3 | Trp -> TER | TGGc -> TGA |
39 | 2 | Leu -> Arg | CTT -> CGT |
127 | 1 | Leu -> Phe | aCTC -> TTC |
127 | 2 | Leu -> Arg | CTC -> CGC |
137 | 1 | Arg -> TER | cCGA -> TGA |
170 | 1 | Arg -> Trp | cCGG -> TGG |
170 | 2 | Arg -> Gln | CGG -> CAG |
178 | 2 | Arg -> His | CGC -> CAC |
178 | 2 | Arg -> Leu | CGC -> CTC |
178 | 1 | Arg -> Cys | tCGC -> TGC |
180 | 3 | Tyr -> TER | TACc -> TAG |
180 | 1 | Tyr -> His | tTAC -> CAC |
197 | 2 | Lys -> Thr | AAA -> ACA |
200 | 1 | Val -> Met | cGTG -> ATG |
204 | 2 | His -> Arg | CAT -> CGT |
210 | 2 | Ser -> Phe | TCC -> TTC |
211 | 2 | Phe -> Ser | TTC -> TCC |
247 | 1 | Arg -> Trp | aCGG -> TGG |
250 | 1 | Gly -> Ser | gGGT -> AGT |
250 | 2 | Gly -> Asp | GGT -> GAT |
250 | 2 | Gly -> Val | GGT -> GTT |
258 | 1 | Asp -> His | tGAC -> CAC |
269 | 1 | Gly -> Ser | aGGT -> AGT |
269 | 2 | Gly -> Asp | GGT -> GAT |
301 | 2 | Met -> Arg | ATG -> AGG |
329 | 2 | Trp -> TER | TGG -> TAG |
393 | 1 | Arg -> TER | gCGA -> TGA |
420 | 3 | Trp -> Cys | TGGt -> TGC |
420 | 3 | Trp -> Cys | TGGt -> TGT |
451 | 1 | Leu -> Val | tCTG -> GTG |
454 | 2 | Gly -> Asp | GGT -> GAT |
454 | 1 | Gly -> Ser | tGGT -> AGT |
474 | 3 | Trp -> Cys | TGGc -> TGC |
482 | 1 | Glu -> Lys | cGAA -> AAA |
485 | 1 | Trp -> Arg | gTGG -> CGG |
499 | 1 | Arg -> Cys | aCGT -> TGT |
499 | 2 | Arg -> His | CGT -> CAT |
504 | 1 | Arg -> Cys | cCGC -> TGC |
504 | 2 | Arg -> His | CGC -> CAC |
Graphical representation:
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWP!PQNFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHH!LPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
AWLRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFRSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVNYMKELELVTKAGFRALLSAPCYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKAVVIDGEACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYECLSHFCCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
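In each pair of lines, the upper line is the reference protein sequence and the lower line is the sequence with the mutations applied. The non-silent mutations are colored in red. A "!" in the sequence marks a stop codon.
If more than one mutation is annotated at a position, the graphical representation always shows the first one.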
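The rules for the graphical representation (replace the residue at the codon position, write "!" for a stop codon, use only the first mutation per position) can be sketched as follows. This is an illustration under assumptions, not the script we used: the function name `mutated_sequence` and the input format, a list of (codon number, three-letter amino acid) pairs in table order, are hypothetical.

```python
# Three-letter to one-letter amino acid codes; "TER" (stop) is drawn as "!".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "TER": "!",
}


def mutated_sequence(protein, mutations):
    """Apply the first listed mutation per codon position to the protein."""
    residues = list(protein)
    seen = set()
    for codon_number, new_aa in mutations:
        if codon_number in seen:
            continue  # only the first mutation at a position is drawn
        seen.add(codon_number)
        residues[codon_number - 1] = THREE_TO_ONE[new_aa]  # 1-based positions
    return "".join(residues)


if __name__ == "__main__":
    # First 40 residues of the reference sequence shown above.
    start = "MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLY"
    print(mutated_sequence(start, [(26, "TER"), (39, "Arg")]))
    # prints MTSSRLWFSLLLAAAFAGRATALWP!PQNFQTSDQRYVRY
```

The printed line agrees with the beginning of the mutated sequence in the representation above, where codon 26 becomes a stop and codon 39 changes from Leu to Arg.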
Mutations annotated only in HGMD:
not found
Mutations annotated only in dbSNP:
Non-silent mutations annotated only in dbSNP (the position in the codon is annotated and certain):
Codon position | Mutation position | Amino acids | Codons |
293 | 2 | Ser -> Ile | AGT -> ATT |
399 | 1 | Asn -> Asp | AAC -> GAC |
436 | 1 | Ile -> Val | ATA -> GTA |
456 | 2 | Tyr -> Ser | TAT -> TCT |
506 | 3 | Glu -> Asp | GAA -> GAC |
506 | 3 | Glu -> Asp | GAA -> GAT |
Graphical representation:
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFMSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVDYMKELELVTKAGFRALLSAPWYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYVVEPLAFEGTPEQKALVIGGSACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCDLLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
Silent mutations annotated only in dbSNP (the position in the codon is annotated and certain):
Position | Amino Acids | Codons |
506 | Glu -> Glu | GAA -> GAG |
Graphical representation:
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation
Silent mutations (the position in the codon is unclear because it is not annotated anywhere):
These mutations are poorly annotated in dbSNP: the position of the mutation within the codon is not given. Because the original codon has to code for the amino acid that occurs in the protein sequence, we rotated the codons we found, placing the mutation at codon position one, position two, and position three in turn. We also reversed each sequence and built the complementary sequence of both. The detailed result can be seen [here]
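The rotation can be illustrated with a minimal sketch. It assumes that two flanking bases on each side of the SNP are known, tries the mutation at codon positions 1 to 3 in four orientations (as given, reversed, complement, reverse complement), and keeps the combinations whose original codon translates to the amino acid expected from the protein sequence. The function name `candidate_codons` and its signature are hypothetical; the standard genetic code is used for translation, with "*" for stop.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ORIENTATIONS = {
    "forward": lambda s: s,
    "reverse": lambda s: s[::-1],
    "complement": lambda s: s.translate(COMPLEMENT),
    "reverse complement": lambda s: s.translate(COMPLEMENT)[::-1],
}


def candidate_codons(left, ref, alt, right, expected_aa):
    """List (orientation, position, ref codon, alt codon, ref aa, alt aa).

    left/right are the flanking bases (at least two each), ref/alt the two
    alleles, expected_aa the one-letter residue in the protein sequence.
    """
    ref_window = left[-2:] + ref + right[:2]  # SNP sits at index 2
    alt_window = left[-2:] + alt + right[:2]
    hits = []
    for name, orient in ORIENTATIONS.items():
        ref_o, alt_o = orient(ref_window), orient(alt_window)
        for position in (1, 2, 3):
            start = 3 - position  # codon start so the SNP lands on `position`
            ref_codon = ref_o[start:start + 3]
            alt_codon = alt_o[start:start + 3]
            if CODON_TABLE[ref_codon] == expected_aa:
                hits.append((name, position, ref_codon, alt_codon,
                             CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]))
    return hits


if __name__ == "__main__":
    # Window AG[T/C]TC, reconstructed from the three rows listed for codon 3.
    for hit in candidate_codons("AG", "T", "C", "TC", "S"):
        print(hit)
```

For this window and the expected residue Ser at codon 3, the forward orientation with the mutation at position 3 gives AGT -> AGC (Ser -> Ser), which corresponds to the silent row for codon 3 in the results table. The plain complement also yields a Ser codon (TCA -> TCG), which shows why the candidates still have to be checked by hand.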
The following table lists the combinations that are possible for each SNP:
Codon position | Mutation position | Amino acids | Codons |
3 | 1 | Phe -> Leu | TTC -> CTC |
3 | 2 | Val -> Ala | GTT -> GCT |
3 | 3 | Ser -> Ser | AGT -> AGC |
29 | 1 | Phe -> Leu | TTC -> CTC |
29 | 2 | Val -> Ala | GTT -> GCT |
29 | 3 | Cys -> Cys | TGT -> TGC |
109 | 1 | Phe -> Leu | TTT -> CTT |
109 | 2 | Leu -> Pro | CTT -> CCT |
109 | 3 | Thr -> Thr | ACT -> ACC |
179 | 1 | Val -> Leu | GTA -> CTA |
179 | 2 | Ser -> Thr | AGT -> ACT |
179 | 3 | Asp -> Asp | GAG -> GAC |
203 | 1 | TER -> Arg | TGA -> CGA |
203 | 2 | Leu -> Pro | CTG -> CCG |
203 | 3 | Thr -> Thr | ACT -> ACC |
208 | 1 | Leu -> Leu | TTG -> CTG |
208 | 2 | Ile -> Thr | ATT -> ACT |
208 | 3 | Tyr -> Tyr | TAT -> TAC |
248 | 1 | Asp -> Lys | GAA -> AAA |
248 | 2 | Arg -> Glu | CGA -> CAA |
248 | 3 | Pro -> Pro | CCG -> CCA |
324 | 1 | TER -> Arg | TGA -> AGA |
324 | 1 | TER -> Arg | TGA -> CGA |
324 | 2 | Leu -> TER | TTG -> TAG |
324 | 2 | Leu -> Ser | TTG -> TCG |
324 | 3 | Val -> Val | GTT -> GTA |
324 | 3 | Val -> Val | GTT -> GTC |
446 | 1 | Gly -> Arg | GGG -> AGG |
446 | 2 | Arg -> Glu | CGG -> CAG |
446 | 3 | Ser -> Ser | TCG -> TCA |
476 | 1 | Val -> Ile | GTT -> ATT |
476 | 2 | Ser -> Asn | AGT -> AAT |
476 | 3 | Lys -> Lys | AAG -> AAA |
540 | 1 | TER -> Glu | TAG -> CAG |
540 | 2 | Ile -> Thr | ATA -> ACA |
540 | 3 | His -> His | CAT -> CAC |
Graphical representation:
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQXFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRXYLPLS
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHXHLVDXPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARXRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTXEQKALVIGGEACMWGEYVDNTNLVPRLWPXAGAVAERLWSNKL
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
Legend: Non-silent mutation | Silent mutation | Wrong AA in mutation annotation