Mapping mutations of ARS A
Latest revision as of 13:58, 29 March 2012
Mutations in general
Mutations are changes in the genomic nucleotide sequence of an organism. These changes are introduced accidentally, e.g. if wrong bases are incorporated during DNA replication. The common types of mutations are insertions into the DNA, deletions from it, and nucleotide substitutions.
Depending on the type of mutation at the DNA level, the structure or function of a protein may be affected to a different extent.
- Frameshift mutation: If a sequence is inserted or deleted and its length is not divisible by 3, the reading frame of the downstream coding sequence is shifted by one or two nucleotides. The mRNA downstream of the mutation is then translated into a completely different amino acid sequence, and if the downstream region is long the protein is likely to be dysfunctional. If the length of the insertion/deletion is divisible by 3, the reading frame is not disrupted and the structural and functional effects on the protein might not be severe.
- In a nonsense mutation the codon of an amino acid within the protein is changed such that a premature stop codon arises. This truncates the protein downstream of the mutation. The protein is likely to be dysfunctional if the truncated part is long.
- A missense mutation is an alteration of a codon such that the encoded amino acid is changed. Depending on the properties and location of the mutated amino acid, the effect on structure and function can be more or less dramatic.
- In a silent mutation, the mutated codon still encodes the same amino acid. Thus the amino acid sequence of the protein is not changed and no structural or functional alteration should be observed.
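The four categories above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original protocol: the names `CODON_TABLE`, `classify_substitution` and `classify_indel` are made up for this sketch, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, written in the usual TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, mut_codon: str) -> str:
    """Classify a codon substitution as silent, nonsense or missense.

    Assumes the reference codon encodes an amino acid; the loss of a
    stop codon is not treated as a separate category here.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    if ref_aa == mut_aa:
        return "silent"
    if mut_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(length: int) -> str:
    """An insertion/deletion shifts the reading frame unless its length is a multiple of 3."""
    return "in-frame" if length % 3 == 0 else "frameshift"


print(classify_substitution("CCG", "CTG"))  # Pro -> Leu
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
print(classify_indel(4))
```

The three calls at the end use codon changes of the kind listed in the HGMD table further down; any other pair of codons can be passed in the same way.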
In the following, we map known nonsense, missense and silent (= synonymous) mutations from the databases dbSNP and HGMD onto the sequence and the structure of the lysosomal enzyme ARS A.
HGMD
The Human Gene Mutation Database (HGMD) <ref> Krawczak M, Cooper DN: The human gene mutation database (HGMD). Genome Digest 3: 7-8, 1996. </ref> provides a comprehensive collection of mutations within human genes that are associated with diseases. We used the protocol as described here to get all missense and nonsense mutations of ARSA. All mutations found were known to be associated with Metachromatic Leukodystrophy. The table of all 90 missense/nonsense mutations is shown at the end of this section. Furthermore, we mapped all 90 mutations onto the sequence of ARSA and colored them in red to get an impression of their distribution (see below). Together with the sequence and the location of the mutations, we marked important binding sites in the illustration below: "*" marks metal binding sites, "." substrate binding sites and ":" the active site. One can see that each of these functional sites lies near a known mutation; such mutations are therefore likely to impair the function of the enzyme. Furthermore, the disease-causing mutations are distributed rather uniformly along the protein sequence.
>sp|P15289|ARSA_HUMAN
**
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
*
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
. : .
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
.
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
**
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
.
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
Accession number | Codon change | Amino acid change | Position |
---|---|---|---|
CM042298 | cGAC-AAC | Asp-Asn | 29 |
CM990171 | cGGC-AGC | Gly-Ser | 32 |
CM065974 | CTG-CCG | Leu-Pro | 52 |
CM990172 | CTG-CCG | Leu-Pro | 68 |
CM950092 | CCG-CTG | Pro-Leu | 82 |
CM940096 | CGG-CAG | Arg-Gln | 84 |
CM990173 | tCGG-TGG | Arg-Trp | 84 |
CM940097 | GGC-GAC | Gly-Asp | 86 |
CM990174 | gCCC-GCC | Pro-Ala | 94 |
CM970109 | AGC-AAC | Ser-Asn | 95 |
CM910049 | TCC-TTC | Ser-Phe | 96 |
CM910050 | GGC-GAC | Gly-Asp | 99 |
CM990175 | GGC-GTC | Gly-Val | 99 |
CM970110 | aGGA-AGA | Gly-Arg | 119 |
CM940098 | cGGC-AGC | Gly-Ser | 122 |
CM980118 | CTG-CCG | Leu-Pro | 135 |
CM940099 | CCC-CTC | Pro-Leu | 136 |
CM990176 | gCCC-TCC | Pro-Ser | 136 |
CM004461 | tCGA-GGA | Arg-Gly | 143 |
CM990177 | CCG-CTG | Pro-Leu | 148 |
CM970111 | cGAC-TAC | Asp-Tyr | 152 |
CM962419 | CAGg-CAC | Gln-His | 153 |
CM940100 | GGC-GAC | Gly-Asp | 154 |
CM940101 | CCC-CGC | Pro-Arg | 155 |
CM032834 | CCC-CTC | Pro-Leu | 155 |
CM042299 | cTGC-CGC | Cys-Arg | 156 |
CM940102 | CCT-CGT | Pro-Arg | 167 |
CM940103 | cGAC-AAC | Asp-Asn | 169 |
CM950093 | TGT-TAT | Cys-Tyr | 172 |
CM910051 | ATC-AGC | Ile-Ser | 179 |
CM032835 | CTG-CAG | Leu-Gln | 181 |
CM950094 | CAGc-CAC | Gln-His | 190 |
CM990178 | gCCC-ACC | Pro-Thr | 191 |
CM990179 | TGGc-TGA | Trp-Term | 193 |
CM950095 | TAC-TGC | Tyr-Cys | 201 |
CM930039 | GCC-GTC | Ala-Val | 212 |
CM050538 | cTTC-GTC | Phe-Val | 219 |
CM930040 | GCC-GTC | Ala-Val | 224 |
CM990180 | cCAC-TAC | His-Tyr | 227 |
CM940104 | cCCT-ACT | Pro-Thr | 231 |
CM940105 | cCGC-TGC | Arg-Cys | 244 |
CM970112 | CGC-CAC | Arg-His | 244 |
CM930041 | cGGG-AGG | Gly-Arg | 245 |
CM034715 | TTT-TCT | Phe-Ser | 247 |
CM970113 | TCC-TAC | Ser-Tyr | 250 |
CM024340 | gGAG-AAG | Glu-Lys | 253 |
CM960078 | gGAT-CAT | Asp-His | 255 |
CM930042 | ACG-ATG | Thr-Met | 274 |
CM074714 | ACT-ATT | Thr-Ile | 279 |
CM993444 | aGAC-TAC | Asp-Tyr | 281 |
CM023013 | gACC-CCC | Thr-Pro | 286 |
CM990181 | CGT-CAT | Arg-His | 288 |
CM940106 | gCGT-TGT | Arg-Cys | 288 |
CM042300 | cGGC-AGC | Gly-Ser | 293 |
CM044574 | GGC-GAC | Gly-Asp | 293 |
CM042301 | TGC-TAC | Cys-Tyr | 294 |
CM930043 | TCC-TAC | Ser-Tyr | 295 |
CM980119 | TTG-TCG | Leu-Ser | 298 |
CM990182 | TGT-TTT | Cys-Phe | 300 |
CM032836 | cTAC-CAC | Tyr-His | 306 |
HM060041 | cGAG-AAG | Glu-Lys | 307 |
CM990183 | GGC-GAC | Gly-Asp | 308 |
CM962420 | GGC-GTC | Gly-Val | 308 |
CM930044 | cGGT-AGT | Gly-Ser | 309 |
CM950096 | CGA-CAA | Arg-Gln | 311 |
CM001061 | GAGc-GAT | Glu-Asp | 312 |
CM970114 | tGCC-ACC | Ala-Thr | 314 |
CM004546 | TGGc-TGA | Trp-Term | 318 |
CM032837 | cGGC-AGC | Gly-Ser | 325 |
CM990184 | ACC-ATC | Thr-Ile | 327 |
CM065973 | cGAG-TAG | Glu-Term | 329 |
CM940107 | GAC-GTC | Asp-Val | 335 |
CM890013 | AAT-AGT | Asn-Ser | 350 |
CM970115 | AAGa-AAC | Lys-Asn | 367 |
CM940108 | CGG-CAG | Arg-Gln | 370 |
CM940109 | tCGG-TGG | Arg-Trp | 370 |
CM940110 | CCG-CTG | Pro-Leu | 377 |
CM930045 | cGAG-AAG | Glu-Lys | 382 |
CM970116 | cCGT-TGT | Arg-Cys | 384 |
CM980120 | CGG-CAG | Arg-Gln | 390 |
CM940111 | gCGG-TGG | Arg-Trp | 390 |
CM910052 | ACT-AGT | Thr-Ser | 391 |
CM980121 | tCAC-TAC | His-Tyr | 397 |
CM065972 | cAGT-GGT | Ser-Gly | 406 |
CM012065 | ACC-ATC | Thr-Ile | 408 |
CM940112 | ACT-ATT | Thr-Ile | 409 |
CM990185 | gCCC-ACC | Pro-Thr | 425 |
CM940113 | CCG-CTG | Pro-Leu | 426 |
CM970117 | CTC-CCC | Leu-Pro | 428 |
CM032838 | TAT-TCT | Tyr-Ser | 429 |
CM990186 | GCC-GTC | Ala-Val | 464 |
CM034716 | GCT-GGT | Ala-Gly | 469 |
CM940114 | gCAG-TAG | Gln-Term | 486 |
CM044573 | cTGT-GGT | Cys-Gly | 489 |
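The "Codon change" and "Amino acid change" columns can be cross-checked against each other. The sketch below assumes that uppercase letters in an entry such as `cGAC-AAC` form the codon and that lowercase letters are flanking context; this reading is consistent with the rows above but is an assumption, and the function names are illustrative only.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter codes; stop is written "Term" as in the table.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Term",
}


def parse_codon_change(entry: str) -> tuple[str, str]:
    """Split e.g. 'cGAC-AAC' into ('GAC', 'AAC'), dropping lowercase context."""
    ref, mut = entry.split("-")
    ref_codon = "".join(ch for ch in ref if ch.isupper())
    mut_codon = "".join(ch for ch in mut if ch.isupper())
    if len(ref_codon) != 3 or len(mut_codon) != 3:
        raise ValueError(f"cannot extract two codons from {entry!r}")
    return ref_codon, mut_codon


def amino_acid_change(entry: str) -> str:
    """Translate both codons and format the change like the table column."""
    ref_codon, mut_codon = parse_codon_change(entry)
    return f"{THREE_LETTER[CODON_TABLE[ref_codon]]}-{THREE_LETTER[CODON_TABLE[mut_codon]]}"


for entry in ("cGAC-AAC", "CAGg-CAC", "TGGc-TGA"):
    print(entry, "->", amino_acid_change(entry))
```

Running the same check over every row is a cheap way to catch transcription errors when a table like this is copied by hand.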
dbSNP
The Single Nucleotide Polymorphism Database (dbSNP) <ref> Wheeler DL, Barrett T, Benson DA, et al. (January 2007). "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 35 (Database issue): D5–12. </ref> is an archive for genetic variation within and across different species. Again we used the protocol described here to search the database for known mutations of ARSA.
The "SNP" search for ARSA yielded 123 known human mutations for the protein. When we searched via the Geneview report for mutations, only 14 mutations appeared.
We wondered why the first search yielded many more results than the Geneview report. Thus, we investigated the results from the "SNP" search in more detail and noticed that the "SNP" search yielded results from different isoforms and sequence versions of ARSA. There were also a lot of insertions and deletions, which we did not want to consider. Therefore we selected the Geneview report for the isoform that we have used so far (ID=NP_000478) and proceeded with the analysis.
Again, we summarized the results graphically. Unlike in the first section for HGMD, we also selected synonymous mutations from dbSNP (which are colored in green). This time we had far fewer mutations than in the analysis with HGMD, and as one can see the mutations are not necessarily near important functional sites in the protein. This may be because dbSNP does not specifically store disease-associated mutations. A table containing all mutations is shown at the end of this section.
>sp|P15289|ARSA_HUMAN
**
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
*
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
. : .
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
.
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
**
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
.
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
SNP ID | SNP type | nucleotide (mutation) | amino acid (mutation) | nucleotide (reference) | amino acid (reference) | position |
---|---|---|---|---|---|---|
rs6151428 | missense | A | His [H] | G | Arg [R] | 496 |
rs117341984 | missense | A | Arg [R] | G | Gly [G] | 447 |
rs6151427 | missense | G | Ser [S] | A | Asn [N] | 440 |
rs6151425 | synonymous | T | Asp [D] | C | Asp [D] | 381 |
rs6151422 | missense | G | Val [V] | T | Phe [F] | 356 |
rs113990230 | synonymous | C | His [H] | T | His [H] | 206 |
rs62001867 | missense | A | Thr [T] | G | Ala [A] | 205 |
rs34457249 | synonymous | T | Pro [P] | C | Pro [P] | 195 |
rs6151415 | missense | T | Cys [C] | G | Trp [W] | 193 |
rs113209108 | synonymous | T | Ser [S] | C | Ser [S] | 186 |
rs6151412 | synonymous | T | His [H] | C | His [H] | 151 |
rs60504011 | missense | G | Ala [A] | C | Pro [P] | 136 |
rs6151411 | missense | G | Leu [L] | C | Pro [P] | 82 |
rs6151410 | missense | T | Gly [G] | C | Gly [G] | 79 |
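A table like this can be sanity-checked against the reference sequence. The sketch below copies the ARSA sequence and the rows from this page; the helper `check_row` and its messages are illustrative assumptions. It tests two things per row: that the reference amino acid matches the residue at the given 1-based position, and that the stated SNP type agrees with the two amino acid columns.

```python
# ARSA reference sequence (sp|P15289) as printed above, 507 residues.
ARSA = (
    "MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT"
    "DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM"
    "AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP"
    "LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE"
    "RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC"
    "GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP"
    "LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL"
    "TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG"
    "EDPALQICCHPGCTPRPACCHCPDPHA"
)

# (SNP ID, SNP type, mutated residue, reference residue, position) from the table.
DBSNP_ROWS = [
    ("rs6151428", "missense", "H", "R", 496),
    ("rs117341984", "missense", "R", "G", 447),
    ("rs6151427", "missense", "S", "N", 440),
    ("rs6151425", "synonymous", "D", "D", 381),
    ("rs6151422", "missense", "V", "F", 356),
    ("rs113990230", "synonymous", "H", "H", 206),
    ("rs62001867", "missense", "T", "A", 205),
    ("rs34457249", "synonymous", "P", "P", 195),
    ("rs6151415", "missense", "C", "W", 193),
    ("rs113209108", "synonymous", "S", "S", 186),
    ("rs6151412", "synonymous", "H", "H", 151),
    ("rs60504011", "missense", "A", "P", 136),
    ("rs6151411", "missense", "L", "P", 82),
    ("rs6151410", "missense", "G", "G", 79),
]


def check_row(snp_id, snp_type, mut_aa, ref_aa, position, sequence=ARSA):
    """Return a list of inconsistencies found in one table row."""
    problems = []
    if sequence[position - 1] != ref_aa:  # table positions are 1-based
        problems.append("reference residue does not match the sequence")
    if mut_aa == ref_aa:
        implied = "synonymous"
    elif mut_aa == "*":
        implied = "nonsense"
    else:
        implied = "missense"
    if snp_type != implied:
        problems.append(f"type '{snp_type}' but residues imply '{implied}'")
    return problems


flagged = {}
for row in DBSNP_ROWS:
    problems = check_row(*row)
    if problems:
        flagged[row[0]] = problems
print(flagged)
```

A flagged row only means that the columns disagree with each other as printed; whether the type or one of the residue columns is off cannot be decided from this page alone.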
Combining dbSNP and HGMD
We combined all 104 mutations (synonymous, missense and nonsense) from both databases. We did not need to align the sequences, because both databases used the same sequence version and the positions corresponded perfectly to our sequence of ARSA. The overlap between the two databases is very low. Only 3 positions show up in both results:
- Position 193: The mutations are different. In HGMD, the mutation results in a premature stop codon, so that the major part of the protein is truncated. In dbSNP, there is an amino acid substitution (W -> C).
- Position 136: The mutations are different amino acid substitutions. P -> L is annotated in HGMD and P -> A is annotated in dbSNP.
- Position 82: The mutation is identical in both databases and leads to a substitution: P -> L.
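The overlap check amounts to intersecting the position sets of the two tables. A minimal sketch follows; the dictionaries hold only a handful of rows from the tables above, not the full data, and `compare_databases` is an illustrative name.

```python
# Position -> set of annotated changes. Only a small subset of each table
# is listed here to keep the example short.
HGMD = {
    29: {"D>N"},
    82: {"P>L"},
    136: {"P>L", "P>S"},
    193: {"W>*"},
}
DBSNP = {
    79: {"G>G"},
    82: {"P>L"},
    136: {"P>A"},
    193: {"W>C"},
}


def compare_databases(first: dict, second: dict) -> dict:
    """For every shared position, report whether any annotated change is identical."""
    shared = sorted(first.keys() & second.keys())
    return {
        pos: "identical" if first[pos] & second[pos] else "different"
        for pos in shared
    }


print(compare_databases(HGMD, DBSNP))
```

Keeping a set of changes per position matters because HGMD lists more than one substitution for some positions, such as 136.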
We again visualized the distribution of the mutations along the sequence. Synonymous substitutions are depicted in green, missense and nonsense mutations are depicted in red:
>sp|P15289|ARSA_HUMAN
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA
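Where colour is not available, mutated positions can also be marked in plain text. The following sketch is one possible alternative to the coloured map, not the method used for this page: it prints the sequence in lines of 60 residues and puts a marker character under every mutated position. The function name and the `^` marker are arbitrary choices.

```python
def annotate(sequence: str, positions: set, width: int = 60, mark: str = "^") -> list:
    """Return sequence lines, each followed by a line marking mutated positions.

    `positions` holds 1-based residue positions, as in the tables above.
    """
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        marks = "".join(
            mark if start + offset + 1 in positions else " "
            for offset in range(len(chunk))
        )
        lines.append(chunk)
        lines.append(marks.rstrip())
    return lines


# First ten residues of ARSA with two arbitrary positions marked.
for line in annotate("MGAPRSLLLA", {2, 5}, width=5):
    print(line)
```

Using different marker characters for synonymous and non-synonymous positions would reproduce the green/red distinction in text form.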
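Furthermore, we visualized the mutations on the 3-dimensional structure of the protein:
[Figure: Structure of ARSA. Synonymous mutations are shown in green, missense/nonsense mutations in red. The active site is depicted in yellow.]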
Summary Statistics
In this section we briefly analyse the mutation frequencies of the amino acids. First, we counted for each of the 20 amino acids how often it is mutated according to the mutation map generated above. The figure below shows the results.
[Figure: Number of mutations in the reference sequence for each amino acid.]
Gly, Pro and Arg show the highest mutation frequency. All of these amino acids have very distinct physico-chemical properties. We expect them to be overrepresented in our map, because we are mostly looking at disease-causing mutations.
- Glycine is the smallest amino acid. Replacing it by any larger amino acid might cause structural changes to the protein.
- Proline is unique due to its ring structure, which enables it to disrupt secondary structure elements. Gaining or losing a proline can therefore change the structure and might affect the function of the protein.
- Arginine is the most hydrophilic amino acid, with a hydropathy index of -4.5. <ref>Kyte J, Doolittle RF (1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology </ref> Here, an amino acid substitution changes the behaviour in an aqueous environment.
Next, we looked at the frequencies of all substitutions. To this end, we counted for each amino acid pair the number of observed mutations in our combined mutation map. The following figure visualizes these counts:
[Figure: Number of observed mutations for each amino acid pair in the combined map generated above (HGMD, dbSNP).]
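Both kinds of counts can be obtained with `collections.Counter`. The sketch below is illustrative only: `SAMPLE_CHANGES` contains just the first eight rows of the HGMD table above, so the numbers it prints are not the totals shown in the figures.

```python
from collections import Counter

# (reference, mutated) amino acid pairs; first eight rows of the HGMD table.
SAMPLE_CHANGES = [
    ("Asp", "Asn"),
    ("Gly", "Ser"),
    ("Leu", "Pro"),
    ("Leu", "Pro"),
    ("Pro", "Leu"),
    ("Arg", "Gln"),
    ("Arg", "Trp"),
    ("Gly", "Asp"),
]


def mutated_residue_counts(changes):
    """How often each reference amino acid is hit by a mutation."""
    return Counter(ref for ref, _ in changes)


def substitution_counts(changes):
    """How often each ordered (reference, mutated) pair occurs."""
    return Counter(changes)


print(mutated_residue_counts(SAMPLE_CHANGES).most_common(3))
print(substitution_counts(SAMPLE_CHANGES).most_common(1))
```

Because the pairs are ordered, Leu -> Pro and Pro -> Leu are counted separately, which is what a 20 x 20 substitution matrix needs.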
Now let's consider the two most frequent mutations in our map. As we are mostly looking at disease-causing mutations, we expect mutations between amino acids with very different physico-chemical behaviour to be most abundant.
- The most frequently observed mutation in the map is Leu -> Pro. Pro -> Leu is also very frequent. Leucine is a hydrophobic amino acid, whereas proline is rather hydrophilic. Furthermore, introducing proline into a structure or removing it might cause a severe structural change to the whole protein. This is because, due to its unique ring structure, proline is able to disrupt helical structures. Thus, introduction of proline might disrupt secondary structure, whereas its removal might allow new structural elements to form.
- Another frequent mutation is Gly -> Asp (five entries in the HGMD table above). Here the very small residue glycine is replaced by the bulkier aspartic acid. Furthermore, aspartic acid is hydrophilic because of its negatively charged side chain, whereas glycine has only a hydrogen atom as its side chain.
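The hydropathy argument can be made concrete with the Kyte-Doolittle scale cited above. The sketch below hard-codes the published index values and computes the change caused by a substitution; the function name `hydropathy_shift` is an illustrative choice, and a single index is of course only a rough proxy for physico-chemical difference.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def hydropathy_shift(ref: str, mut: str) -> float:
    """Hydropathy of the mutated residue minus that of the reference residue."""
    return round(KYTE_DOOLITTLE[mut] - KYTE_DOOLITTLE[ref], 1)


print("most hydrophilic:", min(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get))
print("Leu -> Pro:", hydropathy_shift("Leu", "Pro"))
print("Gly -> Asp:", hydropathy_shift("Gly", "Asp"))
```

A negative shift means the position becomes more hydrophilic. Size and backbone effects, which matter for glycine and proline, are not captured by this number.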
References
<references/>