Mapping SNPs HEXA

From Bioinformatikpedia
Revision as of 16:15, 16 June 2011 by Link (talk | contribs) (Comparison of mutations in HGMD and SNP-DB)

Methods

First, we had to parse the HGMD database and the DB-SNP database.

HGMD

We logged in, searched for Tay-Sachs disease, and chose one entry for HEXA. In our case there were two entries with identical content. We only looked at the missense/nonsense mutations, of which 68 are annotated in HGMD. We copied the web page into a text file and then wrote a short parser that extracts the codon change, the amino acid change and the codon number.
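A minimal sketch of such a parser, in Python. The exact layout of the copied HGMD page is not given here, so the line layout below (codon number, amino acid change, codon change) is a hypothetical one modelled on the tables further down this page. It also assumes that a lowercase letter in a codon such as "TGGc" is a flanking nucleotide and not part of the codon. The function names are illustrative only.

```python
import re

# Hypothetical line layout: "<codon number> <AA> -> <AA> <codon> -> <codon>"
LINE_RE = re.compile(r"^(\d+)\s+(\w+)\s*->\s*(\w+)\s+(\w+)\s*->\s*(\w+)$")


def parse_codon_change(change):
    """Turn 'TGGc -> TGA' into (ref_codon, alt_codon, position_in_codon)."""
    ref_raw, alt_raw = (part.strip() for part in change.split("->"))
    # Assumption: lowercase letters are flanking bases, so they are dropped.
    ref = re.sub(r"[a-z]", "", ref_raw)
    alt = re.sub(r"[a-z]", "", alt_raw)
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError(f"not a codon pair: {change!r}")
    # Positions (1-based) at which the two codons differ.
    diffs = [i + 1 for i in range(3) if ref[i] != alt[i]]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one changed base: {change!r}")
    return ref, alt, diffs[0]


def parse_line(line):
    """Parse one line of the (hypothetical) copied text file."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line: {line!r}")
    number, ref_aa, alt_aa, ref_raw, alt_raw = match.groups()
    ref, alt, position = parse_codon_change(f"{ref_raw} -> {alt_raw}")
    return {
        "codon_number": int(number),
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "ref_codon": ref,
        "alt_codon": alt,
        "position": position,
    }


record = parse_line("26 Trp -> TER TGGc -> TGA")
print(record)
```

Deriving the position in the codon from the two codons themselves means it does not have to be read from the page separately.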

DBSNP

Parsing the DBSNP output was more complicated than parsing the HGMD output. First, we searched for HEXA in this database and kept only the SNPs that occur in human. We used the graphical output and again copied and pasted the page into a text file. An entry in DBSNP has the following structure: first comes the name of the SNP, then the sequence and the graphical representation of the sequence. The next line contains the allele origin, the clinical relevance and, last, the annotations of the mutation. TBA...

Comparison of mutations in HGMD and SNP-DB

  • Mutations annotated in both databases:

mutations which are not silent and cause a phenotype

Codon position  Mutation position  Amino acids  Codons
26              3                  Trp -> TER   TGGc -> TGA
39              2                  Leu -> Arg   CTT -> CGT
127             1                  Leu -> Phe   aCTC -> TTC
127             2                  Leu -> Arg   CTC -> CGC
137             1                  Arg -> TER   cCGA -> TGA
170             1                  Arg -> Trp   cCGG -> TGG
170             2                  Arg -> Gln   CGG -> CAG
178             2                  Arg -> His   CGC -> CAC
178             2                  Arg -> Leu   CGC -> CTC
178             1                  Arg -> Cys   tCGC -> TGC
180             3                  Tyr -> TER   TACc -> TAG
180             1                  Tyr -> His   tTAC -> CAC
197             2                  Lys -> Thr   AAA -> ACA
200             1                  Val -> Met   cGTG -> ATG
204             2                  His -> Arg   CAT -> CGT
210             2                  Ser -> Phe   TCC -> TTC
211             2                  Phe -> Ser   TTC -> TCC
247             1                  Arg -> Trp   aCGG -> TGG
250             1                  Gly -> Ser   gGGT -> AGT
250             2                  Gly -> Asp   GGT -> GAT
250             2                  Gly -> Val   GGT -> GTT
258             1                  Asp -> His   tGAC -> CAC
269             1                  Gly -> Ser   aGGT -> AGT
269             2                  Gly -> Asp   GGT -> GAT
301             2                  Met -> Arg   ATG -> AGG
329             2                  Trp -> TER   TGG -> TAG
393             1                  Arg -> TER   gCGA -> TGA
420             3                  Trp -> Cys   TGGt -> TGC
420             3                  Trp -> Cys   TGGt -> TGT
451             1                  Leu -> Val   tCTG -> GTG
454             2                  Gly -> Asp   GGT -> GAT
454             1                  Gly -> Ser   tGGT -> AGT
474             3                  Trp -> Cys   TGGc -> TGC
482             1                  Glu -> Lys   cGAA -> AAA
485             1                  Trp -> Arg   gTGG -> CGG
499             1                  Arg -> Cys   aCGT -> TGT
499             2                  Arg -> His   CGT -> CAT
504             1                  Arg -> Cys   cCGC -> TGC
504             2                  Arg -> His   CGC -> CAC

Graphical representation:

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWP!PQNFQTSDQRYVRYPNNFQFQYDVSSAAQPGCSVLD


EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD


QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLFLSETVWGAL!GLETFSQLVWKSAEGTFFINKTEIEDFPRFPHWGLLLDTSHH!LPLS


SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNTLNMFHWRLVDDPFSPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY


ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
AWLRSIRVLAEFHTPGHTLSWGPSIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFRSTFFL


EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTC!KSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG


KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVW!EDIPVNYMKELELVTKAGFRALLSAPCYLNRISYG


PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKAVVIDGEACMWGEYVDNTNLVPRLCPRAGAVAKRLRSNKL


TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYECLSHFCCELLRRGVQAQPLNVGFCEQEFEQT

Legend: Non-silent mutation, Silent mutation, Wrong AA in mutation annotation


The non-silent mutations are colored in red. A "!" in the sequence marks a stop codon.
If there is more than one mutation at a position, we always used the first mutation in the graphical representation.
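The rules above (a "!" for a stop codon, only the first mutation per position) can be sketched in Python as follows. This is an illustration, not the script used for this page: the function name and the tuple layout of the mutations are assumptions, and the example uses only the first 62 residues of the sequence shown above.

```python
# Three-letter to one-letter amino acid codes; "TER" (stop) is drawn as "!".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "TER": "!",
}


def mutate_sequence(protein, mutations):
    """Apply (codon_number, ref_aa, alt_aa) tuples to a protein sequence.

    Codon numbers are 1-based. If a position occurs more than once,
    only the first mutation listed for it is used.
    """
    residues = list(protein)
    seen = set()
    for codon_number, ref_aa, alt_aa in mutations:
        if codon_number in seen:
            continue  # keep the first mutation at this position
        seen.add(codon_number)
        index = codon_number - 1
        # Sanity check: the annotated wild-type residue must match the sequence.
        if residues[index] != THREE_TO_ONE[ref_aa]:
            raise ValueError(f"position {codon_number}: expected {ref_aa}")
        residues[index] = THREE_TO_ONE[alt_aa]
    return "".join(residues)


# First 62 residues of the wild-type sequence shown above.
wild_type = "MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD"
mutated = mutate_sequence(
    wild_type,
    [(26, "Trp", "TER"), (39, "Leu", "Arg")],
)
print(wild_type)
print(mutated)
```

The sanity check against the wild-type residue is a design choice of this sketch; it makes annotations that do not fit the protein sequence fail early.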

Mutations annotated only in HGMD: not found
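The three categories used in this comparison (in both databases, only in HGMD, only in SNP-DB) amount to set operations once each mutation is reduced to a comparable key. The Python sketch below assumes the key is (codon number, original codon, mutated codon); that choice and the function name are assumptions made for illustration. The few example entries are taken from the tables on this page.

```python
def compare_mutations(hgmd, dbsnp):
    """Split two collections of mutation keys into the three categories."""
    hgmd_set, dbsnp_set = set(hgmd), set(dbsnp)
    return {
        "both": hgmd_set & dbsnp_set,
        "only_hgmd": hgmd_set - dbsnp_set,
        "only_dbsnp": dbsnp_set - hgmd_set,
    }


# Keys: (codon number, original codon, mutated codon)
hgmd = [(26, "TGG", "TGA"), (39, "CTT", "CGT")]
dbsnp = [(26, "TGG", "TGA"), (39, "CTT", "CGT"), (293, "AGT", "ATT")]

result = compare_mutations(hgmd, dbsnp)
for category, keys in result.items():
    print(category, sorted(keys))
```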

Mutations annotated only in SNP-DB:

mutations annotated only in SNP-DB and not silent (the position in the codon is annotated and certain):

Codon position  Mutation position  Amino acids  Codons
293             2                  Ser -> Ile   AGT -> ATT
399             1                  Asn -> Asp   AAC -> GAC
436             1                  Ile -> Val   ATA -> GTA
456             2                  Tyr -> Ser   TAT -> TCT
506             3                  Glu -> Asp   GAA -> GAC
506             3                  Glu -> Asp   GAA -> GAT

Graphical representation:

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD


EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD


QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS


SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY


ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPILNNTYEFMSTFFL


EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG


KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVDYMKELELVTKAGFRALLSAPWYLNRISYG


PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYVVEPLAFEGTPEQKALVIGGSACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL


TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCDLLRRGVQAQPLNVGFCEQEFEQT

Legend: Non-silent mutation, Silent mutation, Wrong AA in mutation annotation


mutations annotated only in SNP-DB and silent (the position in the codon is annotated and certain):

Position Amino Acids Codons
506 Glu -> Glu GAA -> GAG

Graphical representation:

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD


EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD


QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS


SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY


ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL


EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG


KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG


PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL


TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT

Legend: Non-silent mutation, Silent mutation, Wrong AA in mutation annotation


silent mutations (the position in the codon is unclear; it is not annotated anywhere):

These mutations are poorly annotated in the SNP-DB. We therefore rotated the codons we found, because the original codon has to code for the amino acid that occurs in the protein sequence. To do so, we built the codon with the mutation at position one, at position two and at position three. Next we also reversed the sequence and built the complementary sequence of both. The detailed result can be seen [here]
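The rotation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original script: it assumes the SNP is available as a five-nucleotide context (two bases on each side of the variable base) plus the alternative allele, both in uppercase. The names `Candidate` and `candidate_codons` are hypothetical. The example context "AGTTC" with allele "C" was chosen because it reproduces the three codon pairs listed for codon 3 in the table below.

```python
from typing import NamedTuple

# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Candidate(NamedTuple):
    strand: str
    position: int  # position of the mutation in the codon (1 to 3)
    ref_codon: str
    alt_codon: str
    ref_aa: str  # one-letter code, "*" for stop
    alt_aa: str


def candidate_codons(context, alt, expected_aa=None):
    """Enumerate codons with the SNP at codon position 1, 2 or 3.

    context: five bases, the variable base in the middle (assumption).
    expected_aa: if given, keep only candidates whose original codon
    codes for this amino acid (one-letter code).
    """
    if len(context) != 5 or len(alt) != 1:
        raise ValueError("need a 5-base context and a 1-base allele")
    alt_comp = alt.translate(COMPLEMENT)
    variants = {
        "forward": (context, alt),
        "reversed": (context[::-1], alt),
        "complement": (context.translate(COMPLEMENT), alt_comp),
        "reverse complement": (context[::-1].translate(COMPLEMENT), alt_comp),
    }
    found = []
    for strand, (seq, allele) in variants.items():
        for position in (1, 2, 3):
            start = 3 - position  # the middle base lands at `position`
            ref_codon = seq[start:start + 3]
            alt_codon = ref_codon[:position - 1] + allele + ref_codon[position:]
            ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
            if expected_aa is None or ref_aa == expected_aa:
                found.append(
                    Candidate(strand, position, ref_codon, alt_codon, ref_aa, alt_aa)
                )
    return found


# Residue 3 of the protein sequence is Ser ("S").
for candidate in candidate_codons("AGTTC", "C", expected_aa="S"):
    print(candidate)
```

Filtering on the amino acid found in the protein sequence is what narrows the rotated codons down to the combinations that are actually possible.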

Here is the result, which combinations are possible:

Codon position  Mutation position  Amino acids  Codons
3               1                  Phe -> Leu   TTC -> CTC
3               2                  Val -> Ala   GTT -> GCT
3               3                  Ser -> Ser   AGT -> AGC
29              1                  Phe -> Leu   TTC -> CTC
29              2                  Val -> Ala   GTT -> GCT
29              3                  Cys -> Cys   TGT -> TGC
109             1                  Phe -> Leu   TTT -> CTT
109             2                  Leu -> Pro   CTT -> CCT
109             3                  Thr -> Thr   ACT -> ACC
179             1                  Val -> Leu   GTA -> CTA
179             2                  Ser -> Thr   AGT -> ACT
179             3                  Asp -> Asp   GAG -> GAC
203             1                  STOP -> Arg  TGA -> CGA
203             2                  Leu -> Pro   CTG -> CCG
203             3                  Thr -> Thr   ACT -> ACC
208             1                  Leu -> Leu   TTG -> CTG
208             2                  Ile -> Thr   ATT -> ACT
208             3                  Tyr -> Tyr   TAT -> TAC
248             1                  Asp -> Lys   GAA -> AAA
248             2                  Arg -> Glu   CGA -> CAA
248             3                  Pro -> Pro   CCG -> CCA
324             1                  TER -> Arg   TGA -> AGA
324             1                  TER -> Arg   TGA -> CGA
324             2                  Leu -> TER   TTG -> TAG
324             2                  Leu -> Ser   TTG -> TCG
324             3                  Val -> Val   GTT -> GTA
324             3                  Val -> Val   GTT -> GTC
446             1                  Gly -> Arg   GGG -> AGG
446             2                  Arg -> Glu   CGG -> CAG
446             3                  Ser -> Ser   TCG -> TCA
476             1                  Val -> Ile   GTT -> ATT
476             2                  Ser -> Asn   AGT -> AAT
476             3                  Lys -> Lys   AAG -> AAA
540             1                  STOP -> Glu  TAG -> CAG
540             2                  Ile -> Thr   ATA -> ACA
540             3                  His -> His   CAT -> CAC

Graphical representation:

 MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD
MTSSRLWFSLLLAAAFAGRATALWPWPQXFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLD


EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD
EAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDD


QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLS
QCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRXYLPLS


SILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY
SILDTLDVMAYNKLNVFHXHLVDXPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEY


ARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL
ARXRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFL


EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG
EVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYG


KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG
KGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYG


PDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKL
PDWKDFYIVEPLAFEGTXEQKALVIGGEACMWGEYVDNTNLVPRLWPXAGAVAERLWSNKL


TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT
TSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT

Legend: Non-silent mutation, Silent mutation, Wrong AA in mutation annotation