Mapping mutations of ARS A
Revision as of 13:14, 11 August 2011
Mutations in general
Mutations are changes in the genomic nucleotide sequence of an organism. They arise accidentally, e.g. when incorrect bases are incorporated during DNA replication. The common types of mutations are insertions, deletions and nucleotide substitutions.
Depending on the type of mutation at the DNA level, it can affect the structure or function of a protein to different extents.
- Frameshift mutation: If the length of an inserted or deleted sequence is not divisible by 3, the reading frame downstream of the mutation is shifted by one or two nucleotides. The downstream part of the mRNA is then translated into an entirely different amino acid sequence, and if this region is long, the protein is likely to be dysfunctional. If the length of the insertion or deletion is divisible by 3, the reading frame is preserved, and the structural and functional effects on the protein may be mild.
- In a nonsense mutation, the codon of an amino acid is changed into a premature stop codon. Translation terminates there, so the downstream part of the protein sequence is missing. The protein is likely to be dysfunctional if the missing part is long.
- A missense mutation alters a codon such that a different amino acid is incorporated into the protein. Depending on the properties and location of the affected amino acid, the effect on structure and function can range from negligible to severe.
- In a silent mutation, the mutated codon still encodes the same amino acid. Thus the amino acid sequence of the protein is not changed and no structural or functional alteration should be observed.
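The classification above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a change confined to a single codon; `classify_substitution` and `classify_indel` are hypothetical helper names, not part of any tool used on this page.

```python
# Standard genetic code: codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a change of one codon as silent, nonsense or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == "*":
        raise ValueError("reference codon is a stop codon")
    if ref_aa == alt_aa:
        return "silent"      # same amino acid is encoded
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # a different amino acid is encoded


def classify_indel(length: int) -> str:
    """An insertion/deletion shifts the frame unless its length is a multiple of 3."""
    return "in-frame" if length % 3 == 0 else "frameshift"


print(classify_substitution("TGG", "TGA"))  # nonsense (Trp -> stop)
print(classify_substitution("CGG", "CAG"))  # missense (Arg -> Gln)
print(classify_indel(4))                    # frameshift
```

Enumerating the 64 codons from a 64-letter string keeps the table compact; spelling it out as a literal dictionary would work equally well.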
In the following, we map known nonsense, missense and silent (= synonymous) mutations from the databases dbSNP and HGMD onto the sequence and the structure of the lysosomal enzyme arylsulfatase A (ARSA).
HGMD
The Human Gene Mutation Database (HGMD) <ref> Krawczak M, Cooper DN (1996). "The human gene mutation database (HGMD)". Genome Digest 3: 7–8. </ref> provides a comprehensive collection of disease-associated mutations in human genes. We used the protocol as described here to retrieve all missense and nonsense mutations of ARSA. All mutations found are known to be associated with metachromatic leukodystrophy. The table of all 90 missense/nonsense mutations is given at the end of this section. We also mapped all 90 mutations onto the sequence of ARSA and colored them red to show how they are distributed (see below). Together with the sequence and the locations of the mutations, the illustration below marks important functional sites: "*" marks metal binding sites, "." substrate binding sites and ":" the active site. Each of these functional sites lies near a known mutation, so these mutations are likely to impair the function of the enzyme. Furthermore, the disease-causing mutations are distributed fairly uniformly along the protein sequence.
>sp|P15289|ARSA_HUMAN
**
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
*
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
. : .
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
.
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
**
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
.
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
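The sequence illustrations on this page show the protein in lines of 60 residues with markers placed at specific positions. A minimal sketch of how such a marker line can be generated, assuming 1-based positions as in the tables; `mark_positions` and the `^` marker are illustrative choices and not necessarily how the illustration above was produced.

```python
def mark_positions(sequence: str, positions, width: int = 60) -> str:
    """Print the sequence in fixed-width lines, each followed by a marker line
    with '^' under every residue whose 1-based position is in `positions`."""
    out_of_range = [p for p in positions if not 1 <= p <= len(sequence)]
    if out_of_range:
        raise ValueError(f"positions outside the sequence: {sorted(out_of_range)}")
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        # start + i + 1 converts the 0-based string index to a 1-based position
        marker = "".join(
            "^" if start + i + 1 in positions else " " for i in range(len(chunk))
        )
        lines.append(chunk)
        lines.append(marker.rstrip())
    return "\n".join(lines)


# First 60 residues of ARSA with the HGMD positions 29, 32 and 52
first_line = "MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT"
print(mark_positions(first_line, {29, 32, 52}))
```

Plain text cannot carry the red/green coloring used on this page, so a separate marker line is used instead; it also makes it easy to check that the residue above each marker matches the reference amino acid in the table (Asp29, Gly32, Leu52 here).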
Accession Number | Codon change | Amino acid change | Position |
---|---|---|---|
CM042298 | cGAC-AAC | Asp-Asn | 29 |
CM990171 | cGGC-AGC | Gly-Ser | 32 |
CM065974 | CTG-CCG | Leu-Pro | 52 |
CM990172 | CTG-CCG | Leu-Pro | 68 |
CM950092 | CCG-CTG | Pro-Leu | 82 |
CM940096 | CGG-CAG | Arg-Gln | 84 |
CM990173 | tCGG-TGG | Arg-Trp | 84 |
CM940097 | GGC-GAC | Gly-Asp | 86 |
CM990174 | gCCC-GCC | Pro-Ala | 94 |
CM970109 | AGC-AAC | Ser-Asn | 95 |
CM910049 | TCC-TTC | Ser-Phe | 96 |
CM910050 | GGC-GAC | Gly-Asp | 99 |
CM990175 | GGC-GTC | Gly-Val | 99 |
CM970110 | aGGA-AGA | Gly-Arg | 119 |
CM940098 | cGGC-AGC | Gly-Ser | 122 |
CM980118 | CTG-CCG | Leu-Pro | 135 |
CM940099 | CCC-CTC | Pro-Leu | 136 |
CM990176 | gCCC-TCC | Pro-Ser | 136 |
CM004461 | tCGA-GGA | Arg-Gly | 143 |
CM990177 | CCG-CTG | Pro-Leu | 148 |
CM970111 | cGAC-TAC | Asp-Tyr | 152 |
CM962419 | CAGg-CAC | Gln-His | 153 |
CM940100 | GGC-GAC | Gly-Asp | 154 |
CM940101 | CCC-CGC | Pro-Arg | 155 |
CM032834 | CCC-CTC | Pro-Leu | 155 |
CM042299 | cTGC-CGC | Cys-Arg | 156 |
CM940102 | CCT-CGT | Pro-Arg | 167 |
CM940103 | cGAC-AAC | Asp-Asn | 169 |
CM950093 | TGT-TAT | Cys-Tyr | 172 |
CM910051 | ATC-AGC | Ile-Ser | 179 |
CM032835 | CTG-CAG | Leu-Gln | 181 |
CM950094 | CAGc-CAC | Gln-His | 190 |
CM990178 | gCCC-ACC | Pro-Thr | 191 |
CM990179 | TGGc-TGA | Trp-Term | 193 |
CM950095 | TAC-TGC | Tyr-Cys | 201 |
CM930039 | GCC-GTC | Ala-Val | 212 |
CM050538 | cTTC-GTC | Phe-Val | 219 |
CM930040 | GCC-GTC | Ala-Val | 224 |
CM990180 | cCAC-TAC | His-Tyr | 227 |
CM940104 | cCCT-ACT | Pro-Thr | 231 |
CM940105 | cCGC-TGC | Arg-Cys | 244 |
CM970112 | CGC-CAC | Arg-His | 244 |
CM930041 | cGGG-AGG | Gly-Arg | 245 |
CM034715 | TTT-TCT | Phe-Ser | 247 |
CM970113 | TCC-TAC | Ser-Tyr | 250 |
CM024340 | gGAG-AAG | Glu-Lys | 253 |
CM960078 | gGAT-CAT | Asp-His | 255 |
CM930042 | ACG-ATG | Thr-Met | 274 |
CM074714 | ACT-ATT | Thr-Ile | 279 |
CM993444 | aGAC-TAC | Asp-Tyr | 281 |
CM023013 | gACC-CCC | Thr-Pro | 286 |
CM990181 | CGT-CAT | Arg-His | 288 |
CM940106 | gCGT-TGT | Arg-Cys | 288 |
CM042300 | cGGC-AGC | Gly-Ser | 293 |
CM044574 | GGC-GAC | Gly-Asp | 293 |
CM042301 | TGC-TAC | Cys-Tyr | 294 |
CM930043 | TCC-TAC | Ser-Tyr | 295 |
CM980119 | TTG-TCG | Leu-Ser | 298 |
CM990182 | TGT-TTT | Cys-Phe | 300 |
CM032836 | cTAC-CAC | Tyr-His | 306 |
HM060041 | cGAG-AAG | Glu-Lys | 307 |
CM990183 | GGC-GAC | Gly-Asp | 308 |
CM962420 | GGC-GTC | Gly-Val | 308 |
CM930044 | cGGT-AGT | Gly-Ser | 309 |
CM950096 | CGA-CAA | Arg-Gln | 311 |
CM001061 | GAGc-GAT | Glu-Asp | 312 |
CM970114 | tGCC-ACC | Ala-Thr | 314 |
CM004546 | TGGc-TGA | Trp-Term | 318 |
CM032837 | cGGC-AGC | Gly-Ser | 325 |
CM990184 | ACC-ATC | Thr-Ile | 327 |
CM065973 | cGAG-TAG | Glu-Term | 329 |
CM940107 | GAC-GTC | Asp-Val | 335 |
CM890013 | AAT-AGT | Asn-Ser | 350 |
CM970115 | AAGa-AAC | Lys-Asn | 367 |
CM940108 | CGG-CAG | Arg-Gln | 370 |
CM940109 | tCGG-TGG | Arg-Trp | 370 |
CM940110 | CCG-CTG | Pro-Leu | 377 |
CM930045 | cGAG-AAG | Glu-Lys | 382 |
CM970116 | cCGT-TGT | Arg-Cys | 384 |
CM980120 | CGG-CAG | Arg-Gln | 390 |
CM940111 | gCGG-TGG | Arg-Trp | 390 |
CM910052 | ACT-AGT | Thr-Ser | 391 |
CM980121 | tCAC-TAC | His-Tyr | 397 |
CM065972 | cAGT-GGT | Ser-Gly | 406 |
CM012065 | ACC-ATC | Thr-Ile | 408 |
CM940112 | ACT-ATT | Thr-Ile | 409 |
CM990185 | gCCC-ACC | Pro-Thr | 425 |
CM940113 | CCG-CTG | Pro-Leu | 426 |
CM970117 | CTC-CCC | Leu-Pro | 428 |
CM032838 | TAT-TCT | Tyr-Ser | 429 |
CM990186 | GCC-GTC | Ala-Val | 464 |
CM034716 | GCT-GGT | Ala-Gly | 469 |
CM940114 | gCAG-TAG | Gln-Term | 486 |
CM044573 | cTGT-GGT | Cys-Gly | 489 |
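Some entries in the codon-change column carry an extra lowercase base (e.g. cGAC-AAC or TGGc-TGA). We assume these are flanking bases from the neighboring codon shown for context, so only the uppercase triplet is the affected codon. Under that assumption, rows of the table can be parsed and grouped by position as sketched below; `parse_hgmd_row` and `group_by_position` are hypothetical helpers, and the header row has to be skipped.

```python
import re
from collections import defaultdict

# Matches rows like "CM990173 | tCGG-TGG | Arg-Trp | 84 |"
ROW_PATTERN = re.compile(
    r"^\s*(\w+)\s*\|\s*([A-Za-z]+)-([A-Za-z]+)\s*\|\s*(\w+)-(\w+)\s*\|\s*(\d+)"
)


def parse_hgmd_row(row: str) -> dict:
    match = ROW_PATTERN.match(row)
    if match is None:
        raise ValueError(f"unrecognised row: {row!r}")
    accession, ref, alt, ref_aa, alt_aa, position = match.groups()
    # Assumption: lowercase letters are flanking bases outside the codon,
    # so only the uppercase letters form the affected codon.
    return {
        "accession": accession,
        "ref_codon": "".join(c for c in ref if c.isupper()),
        "alt_codon": "".join(c for c in alt if c.isupper()),
        "change": (ref_aa, alt_aa),
        "position": int(position),
    }


def group_by_position(rows):
    """Collect all amino acid changes reported at each position."""
    groups = defaultdict(list)
    for row in rows:
        record = parse_hgmd_row(row)
        groups[record["position"]].append(record["change"])
    return dict(groups)


rows = [
    "CM940096 | CGG-CAG | Arg-Gln | 84 |",
    "CM990173 | tCGG-TGG | Arg-Trp | 84 |",
    "CM990179 | TGGc-TGA | Trp-Term | 193 |",
]
print(group_by_position(rows))
# {84: [('Arg', 'Gln'), ('Arg', 'Trp')], 193: [('Trp', 'Term')]}
```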
dbSNP
The Single Nucleotide Polymorphism Database (dbSNP) <ref> Wheeler DL, Barrett T, Benson DA, et al. (2007). "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 35 (Database issue): D5–12. </ref> is an archive of genetic variation within and across different species. Again, we used the protocol described here to search the database for known mutations of ARSA.
The "SNP" search for ARSA returned 123 known human mutations for the protein, whereas the Geneview report listed only 14.
We wondered why the first search yielded so many more results than the Geneview report. Examining the results of the "SNP" search in more detail, we noticed that they included entries from different isoforms and sequence versions of ARSA, as well as many insertions and deletions, which we did not want to consider. We therefore used the Geneview report for the isoform we had used so far (ID=NP_000478) and continued the analysis with it.
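The selection step described above (keep only variants annotated on NP_000478 and drop insertions and deletions) amounts to a simple filter. In this sketch the `Variant` record and its fields are hypothetical and do not reflect the actual dbSNP export format; the two non-matching records in the example are placeholders, not real dbSNP entries.

```python
from dataclasses import dataclass

# Variant classes kept for the analysis; insertions and deletions are dropped.
KEPT_TYPES = {"missense", "nonsense", "synonymous"}


@dataclass(frozen=True)
class Variant:
    rs_id: str
    protein: str   # protein accession the amino acid position refers to
    snp_type: str  # e.g. "missense", "synonymous", "deletion"


def select_variants(variants, protein="NP_000478"):
    """Keep substitutions annotated on one protein isoform, in input order."""
    return [v for v in variants if v.protein == protein and v.snp_type in KEPT_TYPES]


example = [
    Variant("rs6151428", "NP_000478", "missense"),
    Variant("rs6151425", "NP_000478", "synonymous"),
    Variant("rs_placeholder_1", "OTHER_ISOFORM", "missense"),  # hypothetical
    Variant("rs_placeholder_2", "NP_000478", "deletion"),      # hypothetical
]
print([v.rs_id for v in select_variants(example)])  # ['rs6151428', 'rs6151425']
```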
Again, we summarized the results graphically. Unlike in the HGMD section, we also included synonymous mutations from dbSNP (colored green). This time we obtained far fewer mutations than with HGMD, and, as the illustration shows, they do not necessarily lie near important functional sites of the protein. This is possibly because dbSNP does not specifically collect disease-associated mutations. A table containing all mutations is given at the end of this section.
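Whether mutations lie "near" functional sites can be made concrete by computing, for each site, the distance in residues to the closest mutation. A minimal sketch; `nearest_mutation_distance` is a hypothetical helper and the positions in the example are made up for illustration.

```python
def nearest_mutation_distance(sites, mutations):
    """Map each functional-site position to its sequence distance
    (in residues) to the closest mutated position."""
    mutations = list(mutations)
    if not mutations:
        raise ValueError("at least one mutation position is required")
    return {site: min(abs(site - m) for m in mutations) for site in sorted(sites)}


# Toy positions, for illustration only
print(nearest_mutation_distance({10, 50}, {12, 40}))  # {10: 2, 50: 10}
```

Sequence distance is only a proxy: residues far apart in the sequence can be close together in the folded protein, so distances measured on the 3D structure would be a stricter test.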
>sp|P15289|ARSA_HUMAN
**
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
*
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
. : .
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
.
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
**
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
.
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
SNP ID | SNP type | nucleotide (mutation) | amino acid (mutation) | nucleotide (reference) | amino acid (reference) | position |
---|---|---|---|---|---|---|
rs6151428 | missense | A | His [H] | G | Arg [R] | 496 |
rs117341984 | missense | A | Arg [R] | G | Gly [G] | 447 |
rs6151427 | missense | G | Ser [S] | A | Asn [N] | 440 |
rs6151425 | synonymous | T | Asp [D] | C | Asp [D] | 381 |
rs6151422 | missense | G | Val [V] | T | Phe [F] | 356 |
rs113990230 | synonymous | C | His [H] | T | His [H] | 206 |
rs62001867 | missense | A | Thr [T] | G | Ala [A] | 205 |
rs34457249 | synonymous | T | Pro [P] | C | Pro [P] | 195 |
rs6151415 | missense | T | Cys [C] | G | Trp [W] | 193 |
rs113209108 | synonymous | T | Ser [S] | C | Ser [S] | 186 |
rs6151412 | synonymous | T | His [H] | C | His [H] | 151 |
rs60504011 | missense | G | Ala [A] | C | Pro [P] | 136 |
rs6151411 | missense | G | Leu [L] | C | Pro [P] | 82 |
rs6151410 | missense | T | Gly [G] | C | Gly [G] | 79 |
Combining dbSNP and HGMD
We combined all 104 mutations (synonymous, missense and nonsense) from both databases. We did not need to align the sequences, because both databases use the same sequence version and their positions correspond exactly to our sequence of ARSA. The overlap between the two databases is very small; only 3 positions appear in both:
- Position 193: The mutations differ. In HGMD, the mutation introduces a premature stop codon (W -> Term), so most of the protein is truncated. In dbSNP, it is an amino acid substitution (W -> C).
- Position 136: The mutations are different amino acid substitutions. HGMD lists P -> L and P -> S, whereas dbSNP lists P -> A.
- Position 82: The mutation is identical in both databases and leads to the substitution P -> L.
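The overlap can be reproduced by intersecting the sets of positions transcribed from the HGMD and dbSNP tables on this page. This compares positions only; whether the amino acid changes agree has to be checked separately.

```python
# Unique amino acid positions from the HGMD table
HGMD_POSITIONS = {
    29, 32, 52, 68, 82, 84, 86, 94, 95, 96,
    99, 119, 122, 135, 136, 143, 148, 152, 153, 154,
    155, 156, 167, 169, 172, 179, 181, 190, 191, 193,
    201, 212, 219, 224, 227, 231, 244, 245, 247, 250,
    253, 255, 274, 279, 281, 286, 288, 293, 294, 295,
    298, 300, 306, 307, 308, 309, 311, 312, 314, 318,
    325, 327, 329, 335, 350, 367, 370, 377, 382, 384,
    390, 391, 397, 406, 408, 409, 425, 426, 428, 429,
    464, 469, 486, 489,
}
# Positions from the dbSNP table
DBSNP_POSITIONS = {496, 447, 440, 381, 356, 206, 205, 195, 193, 186, 151, 136, 82, 79}

shared = sorted(HGMD_POSITIONS & DBSNP_POSITIONS)
print(shared)  # [82, 136, 193]
```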
We again visualized the distribution of the mutations along the sequence. Synonymous substitutions are shown in green, missense and nonsense mutations in red:
>sp|P15289|ARSA_HUMAN
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
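Furthermore, we visualized the mutations on the three-dimensional structure of the protein:
[[File:Mut map 1auk.png | 400px | center | thumb | Structure of ARSA. Synonymous mutations are shown in green, missense/nonsense mutations in red. The active site is depicted in yellow.]]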
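A structure view like this can be scripted in PyMOL. The sketch below only builds PyMOL command strings (`fetch`, `hide`, `show`, `select`, `color`) to paste into the PyMOL command line. It assumes PDB entry 1AUK (suggested by the figure file name) and that the residue numbers in that entry match the UniProt positions used in the tables; `pymol_commands` is a hypothetical helper.

```python
def pymol_commands(object_name, missense, synonymous, active_site=()):
    """Build PyMOL commands that color mutated residues on a structure.

    Assumption: positions use the same numbering as the structure's residues.
    """
    groups = [
        ("missense_sites", missense, "red"),
        ("synonymous_sites", synonymous, "green"),
        ("active_site", active_site, "yellow"),
    ]
    commands = [
        f"fetch {object_name}",
        f"hide everything, {object_name}",
        f"show cartoon, {object_name}",
        f"color grey, {object_name}",
    ]
    for name, positions, color in groups:
        if not positions:
            continue  # an empty "resi" selection would be invalid
        resi = "+".join(str(p) for p in sorted(positions))
        commands.append(f"select {name}, {object_name} and resi {resi}")
        commands.append(f"show sticks, {name}")
        commands.append(f"color {color}, {name}")
    return commands


for line in pymol_commands("1auk", missense={82, 136, 193}, synonymous={151, 186}):
    print(line)
```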
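Summary Statistics
[[File:subst_bar.jpeg | 400px | center | thumb | Number of mutations at each amino acid of the reference sequence.]]
[[File:subst_matrix.jpeg | 400px | center | thumb | Number of observed mutations for each amino acid pair in the combined map (HGMD, dbSNP) generated above.]]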
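Counts like these can be tallied with Python's `collections.Counter`: per reference amino acid for a bar chart and per (reference, mutant) pair for a substitution matrix. A minimal sketch on four example changes from the HGMD table (positions 82, 84 and 136); `substitution_counts` and `format_matrix` are hypothetical helpers. Synonymous dbSNP entries would fall on the diagonal of the matrix.

```python
from collections import Counter


def substitution_counts(changes):
    """changes: iterable of (reference, mutant) amino acid pairs."""
    changes = list(changes)
    per_reference = Counter(ref for ref, _ in changes)
    per_pair = Counter(changes)
    return per_reference, per_pair


def format_matrix(per_pair, alphabet):
    """Text matrix: rows are reference, columns mutant amino acids."""
    width = max(len(a) for a in alphabet) + 1
    lines = [" " * width + "".join(f"{alt:>{width}}" for alt in alphabet)]
    for ref in alphabet:
        cells = "".join(f"{per_pair.get((ref, alt), 0):>{width}}" for alt in alphabet)
        lines.append(f"{ref:<{width}}" + cells)
    return "\n".join(lines)


changes = [("Arg", "Gln"), ("Arg", "Trp"), ("Pro", "Leu"), ("Pro", "Leu")]
per_reference, per_pair = substitution_counts(changes)
print(per_reference.most_common())  # [('Arg', 2), ('Pro', 2)]
print(format_matrix(per_pair, ["Arg", "Gln", "Leu", "Pro", "Trp"]))
```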
References
<references/>