Difference between revisions of "Mapping mutations of ARS A"

Latest revision as of 13:58, 29 March 2012

Mutations are changes in the genomic nucleotide sequence of an organism. These changes are introduced accidentally, e.g. when wrong bases are incorporated during DNA replication. The common types of mutations are insertions into the DNA, deletions from it, and nucleotide substitutions.
Depending on the type of mutation at the DNA level, the structure or function of the encoded protein may be affected to different extents.

Frameshift mutation: If a sequence is inserted or deleted and its length is not divisible by 3, the reading frame of the downstream coding sequence is shifted by one or two nucleotides. The downstream mRNA is then translated into a completely different amino acid sequence, and if the downstream region is long, the protein is likely to be dysfunctional. If the length of the insertion/deletion is divisible by 3, the reading frame is not disrupted and the structural and functional effects on the protein might not be severe.
In a nonsense mutation, the codon of an amino acid within the protein is changed such that a premature stop codon arises. This truncates the downstream protein sequence. The protein is likely to be dysfunctional if the truncated part is long.
A missense mutation alters a codon such that the amino acid in the protein is changed. Depending on the properties and location of the mutated amino acid, the effect on structure and function can be more or less dramatic.
In a silent mutation, the mutated codon still encodes the same amino acid. Thus the amino acid sequence of the protein is not changed and no structural or functional alteration should be observed.
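The classification above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and single-codon substitutions; the names `classify_substitution` and `is_frameshift` are hypothetical helpers introduced here for illustration and are not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, mut_codon):
    """Classify a codon substitution as silent, nonsense or missense.

    Assumes ref_codon encodes an amino acid (stop-loss changes are
    not treated separately in this sketch).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    if ref_aa == mut_aa:
        return "silent"
    if mut_aa == "*":
        return "nonsense"
    return "missense"


def is_frameshift(indel_length):
    """An insertion/deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    print(classify_substitution("CTG", "CCG"))  # Leu -> Pro
    print(classify_substitution("TGG", "TGA"))  # Trp -> stop
    print(classify_substitution("GAC", "GAT"))  # Asp -> Asp
    print(is_frameshift(2), is_frameshift(3))
```

The codon table is built from the compact 64-letter string rather than typed out by hand, which keeps the sketch short and makes typos in single codons less likely.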

In the following, we will map known nonsense, missense and silent (= synonymous) mutations from the databases dbSNP and HGMD on the sequence and the structure of the lysosomal enzyme ARS A.

HGMD

The Human Gene Mutation Database (HGMD) <ref> Krawczak M, Cooper DN: The human gene mutation database (HGMD). Genome Digest 3: 7-8, 1996. </ref> provides a comprehensive collection of mutations within human genes that are associated with diseases. We used the protocol described in Task 5 - Mapping SNPs to retrieve all missense and nonsense mutations of ARSA. All mutations found are known to be associated with Metachromatic Leukodystrophy. The table of all 90 missense/nonsense mutations is given at the end of this section. Furthermore, we mapped all 90 mutations onto the sequence of ARSA and colored them in red to get an impression of their distribution (see below). Together with the sequence and the location of the mutations, we marked important functional sites in the illustration below: "*" marks metal binding sites, "." marks substrate binding sites and ":" marks the active site. One can see that each of these functional sites lies near a known mutation; mutations in their vicinity are therefore likely to impair the function of the enzyme. Furthermore, the disease-causing mutations are rather uniformly distributed along the protein sequence.

>sp|P15289|ARSA_HUMAN

                            **                             

MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT

       *                                                    

DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM

  . :                        .                              

AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP

                                                .           

LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE

                                       **                   

RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
 
 .                                                         

GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP


LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL


TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG


EDPALQICCHPGCTPRPACCHCPDPHA

Accession Number	Codon change	Amino acid change	position
CM042298	cGAC-AAC	Asp-Asn	29
CM990171	cGGC-AGC	Gly-Ser	32
CM065974	CTG-CCG	Leu-Pro	52
CM990172	CTG-CCG	Leu-Pro	68
CM950092	CCG-CTG	Pro-Leu	82
CM940096	CGG-CAG	Arg-Gln	84
CM990173	tCGG-TGG	Arg-Trp	84
CM940097	GGC-GAC	Gly-Asp	86
CM990174	gCCC-GCC	Pro-Ala	94
CM970109	AGC-AAC	Ser-Asn	95
CM910049	TCC-TTC	Ser-Phe	96
CM910050	GGC-GAC	Gly-Asp	99
CM990175	GGC-GTC	Gly-Val	99
CM970110	aGGA-AGA	Gly-Arg	119
CM940098	cGGC-AGC	Gly-Ser	122
CM980118	CTG-CCG	Leu-Pro	135
CM940099	CCC-CTC	Pro-Leu	136
CM990176	gCCC-TCC	Pro-Ser	136
CM004461	tCGA-GGA	Arg-Gly	143
CM990177	CCG-CTG	Pro-Leu	148
CM970111	cGAC-TAC	Asp-Tyr	152
CM962419	CAGg-CAC	Gln-His	153
CM940100	GGC-GAC	Gly-Asp	154
CM940101	CCC-CGC	Pro-Arg	155
CM032834	CCC-CTC	Pro-Leu	155
CM042299	cTGC-CGC	Cys-Arg	156
CM940102	CCT-CGT	Pro-Arg	167
CM940103	cGAC-AAC	Asp-Asn	169
CM950093	TGT-TAT	Cys-Tyr	172
CM910051	ATC-AGC	Ile-Ser	179
CM032835	CTG-CAG	Leu-Gln	181
CM950094	CAGc-CAC	Gln-His	190
CM990178	gCCC-ACC	Pro-Thr	191
CM990179	TGGc-TGA	Trp-Term	193
CM950095	TAC-TGC	Tyr-Cys	201
CM930039	GCC-GTC	Ala-Val	212
CM050538	cTTC-GTC	Phe-Val	219
CM930040	GCC-GTC	Ala-Val	224
CM990180	cCAC-TAC	His-Tyr	227
CM940104	cCCT-ACT	Pro-Thr	231
CM940105	cCGC-TGC	Arg-Cys	244
CM970112	CGC-CAC	Arg-His	244
CM930041	cGGG-AGG	Gly-Arg	245
CM034715	TTT-TCT	Phe-Ser	247
CM970113	TCC-TAC	Ser-Tyr	250
CM024340	gGAG-AAG	Glu-Lys	253
CM960078	gGAT-CAT	Asp-His	255
CM930042	ACG-ATG	Thr-Met	274
CM074714	ACT-ATT	Thr-Ile	279
CM993444	aGAC-TAC	Asp-Tyr	281
CM023013	gACC-CCC	Thr-Pro	286
CM990181	CGT-CAT	Arg-His	288
CM940106	gCGT-TGT	Arg-Cys	288
CM042300	cGGC-AGC	Gly-Ser	293
CM044574	GGC-GAC	Gly-Asp	293
CM042301	TGC-TAC	Cys-Tyr	294
CM930043	TCC-TAC	Ser-Tyr	295
CM980119	TTG-TCG	Leu-Ser	298
CM990182	TGT-TTT	Cys-Phe	300
CM032836	cTAC-CAC	Tyr-His	306
HM060041	cGAG-AAG	Glu-Lys	307
CM990183	GGC-GAC	Gly-Asp	308
CM962420	GGC-GTC	Gly-Val	308
CM930044	cGGT-AGT	Gly-Ser	309
CM950096	CGA-CAA	Arg-Gln	311
CM001061	GAGc-GAT	Glu-Asp	312
CM970114	tGCC-ACC	Ala-Thr	314
CM004546	TGGc-TGA	Trp-Term	318
CM032837	cGGC-AGC	Gly-Ser	325
CM990184	ACC-ATC	Thr-Ile	327
CM065973	cGAG-TAG	Glu-Term	329
CM940107	GAC-GTC	Asp-Val	335
CM890013	AAT-AGT	Asn-Ser	350
CM970115	AAGa-AAC	Lys-Asn	367
CM940108	CGG-CAG	Arg-Gln	370
CM940109	tCGG-TGG	Arg-Trp	370
CM940110	CCG-CTG	Pro-Leu	377
CM930045	cGAG-AAG	Glu-Lys	382
CM970116	cCGT-TGT	Arg-Cys	384
CM980120	CGG-CAG	Arg-Gln	390
CM940111	gCGG-TGG	Arg-Trp	390
CM910052	ACT-AGT	Thr-Ser	391
CM980121	tCAC-TAC	His-Tyr	397
CM065972	cAGT-GGT	Ser-Gly	406
CM012065	ACC-ATC	Thr-Ile	408
CM940112	ACT-ATT	Thr-Ile	409
CM990185	gCCC-ACC	Pro-Thr	425
CM940113	CCG-CTG	Pro-Leu	426
CM970117	CTC-CCC	Leu-Pro	428
CM032838	TAT-TCT	Tyr-Ser	429
CM990186	GCC-GTC	Ala-Val	464
CM034716	GCT-GGT	Ala-Gly	469
CM940114	gCAG-TAG	Gln-Term	486
CM044573	cTGT-GGT	Cys-Gly	489
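A quick consistency check for such a table is to confirm that the reference amino acid of every row matches the residue at the stated position of the sequence. The following Python sketch does this for the first 60 residues of P15289 and the first three table rows only; the helper names `parse_rows` and `mismatched_rows` are hypothetical and were not part of the original workflow.

```python
# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# First 60 residues of sp|P15289|ARSA_HUMAN, copied from the listing above.
ARSA_PREFIX = "MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT"

# Excerpt of the tab-separated HGMD table (first three rows).
HGMD_ROWS = """\
CM042298\tcGAC-AAC\tAsp-Asn\t29
CM990171\tcGGC-AGC\tGly-Ser\t32
CM065974\tCTG-CCG\tLeu-Pro\t52
"""


def parse_rows(text):
    """Yield (accession, ref_aa, mut_aa, position) for each table row."""
    for line in text.strip().splitlines():
        accession, _codon_change, aa_change, position = line.split("\t")
        ref_aa, mut_aa = aa_change.split("-")
        yield accession, ref_aa, mut_aa, int(position)


def mismatched_rows(sequence, rows):
    """Return accessions whose reference residue disagrees with the sequence."""
    bad = []
    for accession, ref_aa, _mut_aa, position in rows:
        # Positions in the table are 1-based.
        if position > len(sequence) or sequence[position - 1] != THREE_TO_ONE[ref_aa]:
            bad.append(accession)
    return bad


if __name__ == "__main__":
    print(mismatched_rows(ARSA_PREFIX, parse_rows(HGMD_ROWS)))
```

An empty result means the table positions refer to the same numbering as the sequence, i.e. no offset (for example from a removed signal peptide) has to be corrected before mapping.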

dbSNP

The Single Nucleotide Polymorphism Database (dbSNP) <ref> Wheeler DL, Barrett T, Benson DA, et al. (January 2007). "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 35 (Database issue): D5–12. </ref> is an archive of genetic variation within and across different species. Again, we used the same protocol to search the database for known mutations of ARSA.
The "SNP" search for ARSA yielded 123 known human mutations for the protein, whereas the Geneview report listed only 14. To find out why the first search returned so many more results, we examined the "SNP" search results in more detail and noticed that they include entries from different isoforms and sequence versions of ARSA. They also contain many insertions and deletions, which we did not want to consider. We therefore selected the Geneview report for the isoform we had used so far (ID=NP_000478) and proceeded with the analysis.
Again, we summarized the results graphically. Unlike in the HGMD section, we also included synonymous mutations from dbSNP (colored in green). This time we had far fewer mutations than in the HGMD analysis and, as one can see, they are not necessarily near important functional sites of the protein. This may be because dbSNP does not specifically store disease-associated mutations. A table containing all mutations is given at the end of this section.

>sp|P15289|ARSA_HUMAN

                            **                              

MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT

       *                                                    

DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM

  . :                        .                              

AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP

                                                .           

LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE

                                       **                   

RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC

 .                                                         

GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP


LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL


TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG


EDPALQICCHPGCTPRPACCHCPDPHA

SNP ID	SNP type	nucleotide (mutation)	amino acid (mutation)	nucleotide (reference)	amino acid (reference)	position
rs6151428	missense	A	His [H]	G	Arg [R]	496
rs117341984	missense	A	Arg [R]	G	Gly [G]	447
rs6151427	missense	G	Ser [S]	A	Asn [N]	440
rs6151425	synonymous	T	Asp [D]	C	Asp [D]	381
rs6151422	missense	G	Val [V]	T	Phe [F]	356
rs113990230	synonymous	C	His [H]	T	His [H]	206
rs62001867	missense	A	Thr [T]	G	Ala [A]	205
rs34457249	synonymous	T	Pro [P]	C	Pro [P]	195
rs6151415	missense	T	Cys [C]	G	Trp [W]	193
rs113209108	synonymous	T	Ser [S]	C	Ser [S]	186
rs6151412	synonymous	T	His [H]	C	His [H]	151
rs60504011	missense	G	Ala [A]	C	Pro [P]	136
rs6151411	missense	G	Leu [L]	C	Pro [P]	82
rs6151410	missense	T	Gly [G]	C	Gly [G]	79

Combining dbSNP and HGMD

We combined all 104 mutations (synonymous, missense and nonsense) from both databases. We did not need to align the sequences, because both databases use the same sequence version and the positions correspond exactly to our sequence of ARSA. The overlap between the two databases is very low. Only 3 positions show up in both results:

Position 193: The mutations are different. In HGMD, the mutation results in a premature stop codon, so that most of the protein is truncated. In dbSNP, there is an amino acid substitution (W -> C).
Position 136: The mutations are different amino acid substitutions. P -> L and P -> S are annotated in HGMD, and P -> A is annotated in dbSNP.
Position 82: The mutation is identical in both databases and leads to the substitution P -> L.
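The overlap can be reproduced from the position columns of the two tables with a simple set intersection. The sketch below is only an illustration of that step; the position sets were copied from the HGMD and dbSNP tables above, and `shared_positions` is a hypothetical helper name.

```python
# Positions from the HGMD table (duplicates collapse in the set).
HGMD_POSITIONS = {
    29, 32, 52, 68, 82, 84, 86, 94, 95, 96, 99, 119, 122, 135, 136, 143,
    148, 152, 153, 154, 155, 156, 167, 169, 172, 179, 181, 190, 191, 193,
    201, 212, 219, 224, 227, 231, 244, 245, 247, 250, 253, 255, 274, 279,
    281, 286, 288, 293, 294, 295, 298, 300, 306, 307, 308, 309, 311, 312,
    314, 318, 325, 327, 329, 335, 350, 367, 370, 377, 382, 384, 390, 391,
    397, 406, 408, 409, 425, 426, 428, 429, 464, 469, 486, 489,
}

# Positions from the dbSNP Geneview table.
DBSNP_POSITIONS = {
    496, 447, 440, 381, 356, 206, 205, 195, 193, 186, 151, 136, 82, 79,
}


def shared_positions(first, second):
    """Return the sorted positions that occur in both collections."""
    return sorted(set(first) & set(second))


if __name__ == "__main__":
    print(shared_positions(HGMD_POSITIONS, DBSNP_POSITIONS))
```

Comparing positions only works here because both databases number residues on the same sequence version; with different versions the positions would have to be aligned first.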

We again visualized the distribution of the mutations along the sequence. Synonymous substitutions are depicted in green, missense and nonsense mutations are depicted in red:

>sp|P15289|ARSA_HUMAN

MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT

DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM

AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP

LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE

RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC

GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP

LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL

TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG

EDPALQICCHPGCTPRPACCHCPDPHA

Furthermore, we visualized the mutations on the 3-dimensional structure of the protein:

Structure of ARSA. Synonymous mutations are shown in green, missense/nonsense mutations in red. The active site is depicted in yellow.

Summary Statistics

In this section we briefly analyse the mutation frequencies of the amino acids. First, we counted for each of the 20 amino acids how often it is mutated in the mutation map generated above. The figure below shows the results.

The Figure shows the number of mutations in the reference sequence for each amino acid.

Gly, Pro and Arg show the highest mutation frequency. Each of these amino acids has very distinct physico-chemical properties. We expected them to be overrepresented in our map, because we are mostly looking at disease-causing mutations.

Glycine is the smallest amino acid. Replacing it with any larger amino acid might cause structural changes to the protein.
Proline is unique due to its ring structure, which enables it to disrupt secondary structure elements. Introducing or removing it can therefore cause structural changes and might affect the function of the protein.
Arginine is the most hydrophilic amino acid on the Kyte-Doolittle scale, with a hydropathy index of -4.5. <ref>Kyte J, Doolittle RF (1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology 157 (1): 105–132. </ref> Here, an amino acid substitution changes the behaviour in an aqueous environment.

Next, we looked at the frequencies of all substitutions. To do so, we counted for each amino acid pair the number of observed mutations in our combined mutation map. The following figure visualizes these counts:

The Figure shows for each amino acid pair, the number of observed mutations in the above generated combined map (HGMD, dbSNP).
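Both counts, per reference amino acid and per amino acid pair, can be obtained with a frequency counter. The following Python sketch shows the idea on a small excerpt of six HGMD rows (positions 52 to 86) rather than the full combined map, so the numbers it prints are not the ones shown in the figures; the function name `count_substitutions` is hypothetical.

```python
from collections import Counter

# Excerpt of (reference, mutated) amino acids from the HGMD table above.
SUBSTITUTIONS = [
    ("Leu", "Pro"),  # CM065974, position 52
    ("Leu", "Pro"),  # CM990172, position 68
    ("Pro", "Leu"),  # CM950092, position 82
    ("Arg", "Gln"),  # CM940096, position 84
    ("Arg", "Trp"),  # CM990173, position 84
    ("Gly", "Asp"),  # CM940097, position 86
]


def count_substitutions(substitutions):
    """Return (counts per reference residue, counts per ordered residue pair)."""
    reference_counts = Counter(ref for ref, _mut in substitutions)
    pair_counts = Counter(substitutions)
    return reference_counts, pair_counts


if __name__ == "__main__":
    reference_counts, pair_counts = count_substitutions(SUBSTITUTIONS)
    print(reference_counts.most_common())
    print(pair_counts.most_common())
```

The pairs are kept ordered, so Leu -> Pro and Pro -> Leu are counted separately, which matches the way the two directions are discussed in the text.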

We now consider the two most frequent mutations in our map. As we are mostly looking at disease-causing mutations, we expect mutations between amino acids with very different physico-chemical behaviour to be most abundant.

The most frequently observed mutation in the map is Leu -> Pro. Pro -> Leu is also very frequent. Leucine is a hydrophobic amino acid, whereas proline is rather hydrophilic. Furthermore, introducing proline into a structure or removing it might cause a severe structural change to the whole protein, because proline, due to its unique ring structure, is able to disrupt helical structures. Thus, introducing proline might disrupt secondary structure, while removing it might allow new structural elements to form.
Another frequent mutation is Gly -> Asp, which is listed five times in the HGMD table above. Here the very small residue glycine is replaced by the larger aspartate. Furthermore, aspartate is hydrophilic due to its charged, polar side chain, whereas glycine is nonpolar.
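The hydropathy argument can be made concrete with the Kyte-Doolittle scale cited above. The sketch below lists the published index values and computes the change for a substitution; the function `hydropathy_shift` is a hypothetical helper, and the index is only one of several physico-chemical properties that could be compared.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}


def hydropathy_shift(ref_aa, mut_aa):
    """Hydropathy of the mutated residue minus that of the reference residue.

    Negative values mean the substitution makes the position more hydrophilic.
    """
    return round(KYTE_DOOLITTLE[mut_aa] - KYTE_DOOLITTLE[ref_aa], 1)


if __name__ == "__main__":
    for ref_aa, mut_aa in [("Leu", "Pro"), ("Pro", "Leu"), ("Gly", "Asp")]:
        print(ref_aa, "->", mut_aa, hydropathy_shift(ref_aa, mut_aa))
    # Residue with the lowest index, i.e. the most hydrophilic one.
    print(min(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get))
```

A large absolute shift only indicates a change in hydropathy; effects such as the helix-breaking behaviour of proline are not captured by this number.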

References

@@ Line 4: / Line 4: @@
 Depending on the type mutation on the DNA level, it might influence the structure or function of a protein in different extent.
-* ''Frameshift mutation'': If an insertion or deletion of a sequence occurs - and if the length of this sequence is not divisible by 3 - the reading frame of the downstream protein sequence is shifted either by one or two nucleotides. This leads to a completely different translation of the mRNA into the protein and if the downstream regions is long the protein is likely to be dysfunctional.
+* ''Frameshift mutation'': If an insertion or deletion of a sequence occurs - and if the length of this sequence is not divisible by 3 - the reading frame of the downstream protein sequence is shifted either by one or two nucleotides. This leads to a completely different translation of the mRNA into the protein and if the downstream regions is long the protein is likely to be dysfunctional. If the insertion/deletion is divisible by 3, the reading frame is not disrupted and the structural and functional effects on the protein might not be severe.
-* In a ''nonsense mutation'' the codon of an amino acid within the protein is changed, such that a premature stop codon arises. This leads to trncation of the downstream protein sequence. The protein is likely to be dysfunctional if the truncated sequence is long.
+* In a ''nonsense mutation'' the codon of an amino acid within the protein is changed, such that a premature stop codon arises. This leads to truncation of the downstream protein sequence. The protein is likely to be dysfunctional if the truncated sequence is long.
 * A ''missense mutation'' describes an alteration of the codon, such that the amino acid in the protein is changed. Depending on the properties and location of the mutated amino acid, changes in structure and function can have a more or less dramatic effect.
 * In a ''silent mutation'', the mutated codon still encodes the same amino acid. Thus the amino acid sequence of the protein is not changed and no structural or functional alteration should be observed.
@@ Line 13: / Line 13: @@
 === HGMD ===
-The Human Gene Muation Database (HGMD) <ref> Krawczak M, Cooper DN: The human gene mutation database (HGMD). Genome Digest 3: 7-8, 1996. </ref> provides a comprehensive collection of mutations within human genes, that underly or are associated with diseases. We used the protocol as described [[Task 5 - Mapping SNPs | here]] to get all missense and nonsense mutations of ARSA. All mutations found were known to be associated with Metachromatic Leukodystrophy. The table of all 90 missense/nonsense mutations is depicted at the end of this section. Furthermore, we mapped all 90 mutations on the sequnece of ARSA and colored them in <span style="background:#ff0000">red</span> to get an impression of the distribution of the mutations (see below). Together with the sequence and the location of the mutations, we marked important binding sites in the graphical illustration below. "'''*'''" are metal binding sites, "'''.'''" are substrate binding sites and "''':'''" is the active site. One can see, that these important functional sites are always near a known mutation, which are therefore likely to cause a misfunction of the enzyme.
+The Human Gene Muation Database (HGMD) <ref> Krawczak M, Cooper DN: The human gene mutation database (HGMD). Genome Digest 3: 7-8, 1996. </ref> provides a comprehensive collection of mutations within human genes, that are associated with diseases. We used the protocol as described [[Task 5 - Mapping SNPs | here]] to get all missense and nonsense mutations of ARSA. All mutations found were known to be associated with Metachromatic Leukodystrophy. The table of all 90 missense/nonsense mutations is depicted at the end of this section. Furthermore, we mapped all 90 mutations on the sequnece of ARSA and colored them in <span style="background:#ff0000">red</span> to get an impression of the distribution of the mutations (see below). Together with the sequence and the location of the mutations, we marked important binding sites in the graphical illustration below. "'''*'''" are metal binding sites, "'''.'''" are substrate binding sites and "''':'''" is the active site. One can see, that these important functional sites are always near a known mutation, which are therefore likely to cause a misfunction of the enzyme. Furthermore, we can see that the disease causing mutations are rather uniformly distributed along the protein sequence.
 <code>
  >sp|P15289|ARSA_HUMAN<br>
-                              **                             <br>
+                             **                             <br>
  MGAPRSLLLALAAGLAVARPPNIVLIFA<span style="background:#ff0000">D</span>DL<span style="background:#ff0000">G</span>YGDLGCYGHPSSTTPNLDQ<span style="background:#ff0000">L</span>AAGGLRFT<br>
         *                                                    <br>
  DFYVPVS<span style="background:#ff0000">L</span>CTPSRAALLTGRL<span style="background:#ff0000">P</span>V<span style="background:#ff0000">R</span>M<span style="background:#ff0000">G</span>MYPGVLV<span style="background:#ff0000">P</span><span style="background:#ff0000">S</span><span style="background:#ff0000">S</span>RG<span style="background:#ff0000">G</span>LPLEEVTVAEVLAARGYLT<span style="background:#ff0000">G</span>M<br>
- . :                        .                                <br>
+   . :                        .                              <br>
  A<span style="background:#ff0000">G</span>KWHLGVGPEGAF<span style="background:#ff0000">L</span><span style="background:#ff0000">P</span>PHQGFH<span style="background:#ff0000">R</span>FLGI<span style="background:#ff0000">P</span>YSH<span style="background:#ff0000">D</span><span style="background:#ff0000">Q</span><span style="background:#ff0000">G</span><span style="background:#ff0000">P</span><span style="background:#ff0000">C</span>QNLTCFPPAT<span style="background:#ff0000">P</span>C<span style="background:#ff0000">D</span>GG<span style="background:#ff0000">C</span>DQGLVP<span style="background:#ff0000">I</span>P<br>
-                                             .              <br>
+                                                 .           <br>
  <span style="background:#ff0000">L</span>LANLSVEA<span style="background:#ff0000">Q</span><span style="background:#ff0000">P</span>PWLPGLEAR<span style="background:#ff0000">Y</span>MAFAHDLMAD<span style="background:#ff0000">A</span>QRQDRP<span style="background:#ff0000">F</span>FLYY<span style="background:#ff0000">A</span>SH<span style="background:#ff0000">H</span>THY<span style="background:#ff0000">P</span>QFSGQSFAE<br>
-                                     **                  .   <br>
+                                        **                   <br>
- RSG<span style="background:#ff0000">R</span><span style="background:#ff0000">G</span>P<span style="background:#ff0000">F</span>GD<span style="background:#ff0000">S</span>LM<span style="background:#ff0000">E</span>L<span style="background:#ff0000">D</span>AAVGTLMTAIGDLGLLEE<span style="background:#ff0000">T</span>LVIF<span style="background:#ff0000">T</span>A<span style="background:#ff0000">D</span>NGPE<span style="background:#ff0000">T</span>M<span style="background:#ff0000">R</span>MSRG<span style="background:#ff0000">G</span><span style="background:#ff0000">C</span><span style="background:#ff0000">S</span>GL<span style="background:#ff0000">L</span>R<span style="background:#ff0000">C</span><br> <br>
+ RSG<span style="background:#ff0000">R</span><span style="background:#ff0000">G</span>P<span style="background:#ff0000">F</span>GD<span style="background:#ff0000">S</span>LM<span style="background:#ff0000">E</span>L<span style="background:#ff0000">D</span>AAVGTLMTAIGDLGLLEE<span style="background:#ff0000">T</span>LVIF<span style="background:#ff0000">T</span>A<span style="background:#ff0000">D</span>NGPE<span style="background:#ff0000">T</span>M<span style="background:#ff0000">R</span>MSRG<span style="background:#ff0000">G</span><span style="background:#ff0000">C</span><span style="background:#ff0000">S</span>GL<span style="background:#ff0000">L</span>R<span style="background:#ff0000">C</span><br>
+  .                                                         <br>
  GKGTT<span style="background:#ff0000">Y</span><span style="background:#ff0000">E</span><span style="background:#ff0000">G</span><span style="background:#ff0000">G</span>V<span style="background:#ff0000">R</span><span style="background:#ff0000">E</span>P<span style="background:#ff0000">A</span>LAFWPGHIAP<span style="background:#ff0000">G</span>V<span style="background:#ff0000">T</span>HELASSL<span style="background:#ff0000">D</span>LLPTLAALAGAPLP<span style="background:#ff0000">N</span>VTLDGFDLSP<br><br>
  LLLGTG<span style="background:#ff0000">K</span>SP<span style="background:#ff0000">R</span>QSLFFY<span style="background:#ff0000">P</span>SYPD<span style="background:#ff0000">E</span>V<span style="background:#ff0000">R</span>GVFAV<span style="background:#ff0000">R</span><span style="background:#ff0000">T</span>GKYKA<span style="background:#ff0000">H</span>FFTQGSAH<span style="background:#ff0000">S</span>D<span style="background:#ff0000">T</span><span style="background:#ff0000">T</span>ADPACHASSSL<br><br>
@@ Line 228: / Line 229: @@
 === dbSNP ===
-The Single Nucleotide Polymorphism Database (dbSNP) <ref> dbSNP </ref>  is an archive for genetic variation within and across different species. Again we used the protocol described here to search the database for known mutations of ARSA. <br>
+The Single Nucleotide Polymorphism Database (dbSNP) <ref> Wheeler DL, Barrett T, Benson DA, et al. (January 2007). "Database resources of the National Center for Biotechnology Information". Nucleic Acids Res. 35 (Database issue): D5–12. </ref>  is an archive for genetic variation within and across different species. Again we used the protocol described here to search the database for known mutations of ARSA. <br>
 The "SNP" search for ARSA yielded 123 known human mutations for the protein. When we searched via the Geneview report for mutations, only 14 mutations appeared.
-We wondered why the first search yielded much more results than the Geneview report. Thus, we investigated the results from the "SNP" search in more deatil and noticed, that the "SNP" search yielded results from different isoforms and sequence versions of ARSA. Therefore we selected the Geneview report for our isoform, that we used so far (ID=NP_000478) and proceeded with the analysis. <br>
+We wondered why the first search yielded much more results than the Geneview report. Thus, we investigated the results from the "SNP" search in more deatil and noticed, that the "SNP" search yielded results from different isoforms and sequence versions of ARSA. Also there were a lot of insertions and deletions, that we did not want to consider. Therefore we selected the Geneview report for our isoform, that we used so far (ID=NP_000478) and proceeded with the analysis. <br>
-Again, we summarized hte results graphically. Unlike in the first section for HGMD, we also selected synonymous mutations from dbSNP (which are colored in <span style="background:#00FF00">green</span>). This time we had much less mutations than in the analysis with HGMD and as one can see the mutations are not necessarily near important functional sites in the protein. This is maybe  because dbSNP does not specifically stores disease-associated mutations. <br>
+Again, we summarized hte results graphically. Unlike in the first section for HGMD, we also selected synonymous mutations from dbSNP (which are colored in <span style="background:#00FF00">green</span>). This time we had much less mutations than in the analysis with HGMD and as one can see the mutations are not necessarily near important functional sites in the protein. This is maybe  because dbSNP does not specifically stores disease-associated mutations. A table, containing all mutations is depicted at the end of this section.
-A table, containing all mutations is depicted at the end of this section.
 <code>
@@ Line 240: / Line 240: @@
         *                                                    <br>
  DFYVPVSLCTPSRAALLT<span style="background:#ff0000">G</span>RL<span style="background:#ff0000">P</span>VRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM<br>
- . :                        .                                <br>
+   . :                        .                              <br>
  AGKWHLGVGPEGAFL<span style="background:#ff0000">P</span>PHQGFHRFLGIPYS<span style="background:#00FF00">H</span>DQGPCQNLTCFPPATPCDGGCDQGLVPIP<br>
-                                              .              <br>
+                                                 .           <br>
  LLANL<span style="background:#00FF00">S</span>VEAQPP<span style="background:#ff0000">W</span>L<span style="background:#00FF00">P</span>GLEARYMAF<span style="background:#ff0000">A</span><span style="background:#00FF00">H</span>DLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE<br>
+                                        **                   <br>
+ RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC<br>
-                                    **                  .   <br>
+  .                                                         <br>
- RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC<br><br>
  GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDG<span style="background:#ff0000">F</span>DLSP<br><br>
  LLLGTGKSPRQSLFFYPSYP<span style="background:#00FF00">D</span>EVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL<br><br>
@@ Line 292: / Line 292: @@
 |}
-=== The mutation map ===
+=== Comining dbSNP and HGMD ===
+We combined all 104 mutations (snynonymous, missense and nonsense) from both databases. We did not need to align the sequences, because both database used the same sequence version and positions perfectly corresponded to our sequence of ARSA. The overlap between both databases is very low. Only 3 positions show up in both results:
+* ''Position 193'': The mutations are different. In HGMD, the mutation results in a premature stop codon, thus the main part of the whole protein is truncated. In dbSNP, there is a amino acid substitution (W -> C).
+* ''Position 136'': The mutations are different amino acid substitutions. P -> L is annotated in HGMD and P -> A is annotated in dbSNP.
+* ''Position 82'': Is mutation is identical in both databases and leads to a substitution: P -> L.
+We again visualized the distribution of the mutation along the sequence. Synonymous substitutions are depicted in <span style="background:#00FF00">green</span>, missense and nonsense mutations are depicted in <span style="background:#ff0000">red</span>:
 <code>
  >sp|P15289|ARSA_HUMAN<br>
- MGAPRSLLLALAAGLAVARPPNIVLIFA<span style="background:#ff0000">D</span>DL<span style="background:#ff0000">G</span>GGDLGCYGHPSSTTPNLDQ<span style="background:#ff0000">L</span>LAGGLRFT<br>
+ MGAPRSLLLALAAGLAVARPPNIVLIFA<span style="background:#ff0000">D</span>DL<span style="background:#ff0000">G</span>YGDLGCYGHPSSTTPNLDQ<span style="background:#ff0000">L</span>AAGGLRFT<br>
- DFYVPVS<span style="background:#ff0000">L</span>LTPSRAALLT<span style="background:#ff0000">G</span>GL<span style="background:#ff0000">P</span><span style="background:#ff0000">P</span>P<span style="background:#ff0000">R</span>R<span style="background:#ff0000">G</span>GYPGVLV<span style="background:#ff0000">P</span>P<span style="background:#ff0000">S</span><span style="background:#ff0000">S</span>G<span style="background:#ff0000">G</span>GPLEEVTVAEVLAARGYLT<span style="background:#ff0000">G</span>G<br>
+ DFYVPVS<span style="background:#ff0000">L</span>CTPSRAALLT<span style="background:#ff0000">G</span>RL<span style="background:#ff0000">P</span>V<span style="background:#ff0000">R</span>M<span style="background:#ff0000">G</span>MYPGVLV<span style="background:#ff0000">P</span><span style="background:#ff0000">S</span><span style="background:#ff0000">S</span>RG<span style="background:#ff0000">G</span>LPLEEVTVAEVLAARGYLT<span style="background:#ff0000">G</span>M<br>
- A<span style="background:#ff0000">G</span>GWHLGVGPEGAF<span style="background:#ff0000">L</span>L<span style="background:#ff0000">P</span><span style="background:#ff0000">P</span>HQGFH<span style="background:#ff0000">R</span>RLGI<span style="background:#ff0000">P</span>PS<span style="background:#00FF00">H</span>H<span style="background:#ff0000">D</span><span style="background:#ff0000">Q</span><span style="background:#ff0000">G</span><span style="background:#ff0000">P</span><span style="background:#ff0000">C</span>NLTCFPPAT<span style="background:#ff0000">P</span>P<span style="background:#ff0000">D</span>DG<span style="background:#ff0000">C</span>CQGLVP<span style="background:#ff0000">I</span>I<br>
+ A<span style="background:#ff0000">G</span>KWHLGVGPEGAF<span style="background:#ff0000">L</span><span style="background:#ff0000">P</span>PHQGFH<span style="background:#ff0000">R</span>FLGI<span style="background:#ff0000">P</span>YS<span style="background:#00FF00">H</span><span style="background:#ff0000">D</span><span style="background:#ff0000">Q</span><span style="background:#ff0000">G</span><span style="background:#ff0000">P</span><span style="background:#ff0000">C</span>QNLTCFPPAT<span style="background:#ff0000">P</span>C<span style="background:#ff0000">D</span>GG<span style="background:#ff0000">C</span>DQGLVP<span style="background:#ff0000">I</span>P<br>
- <span style="background:#ff0000">L</span>LANL<span style="background:#00FF00">S</span>SEA<span style="background:#ff0000">Q</span>Q<span style="background:#ff0000">P</span><span style="background:#ff0000">W</span>W<span style="background:#00FF00">P</span>PLEAR<span style="background:#ff0000">Y</span>YAF<span style="background:#ff0000">A</span>A<span style="background:#00FF00">H</span>LMAD<span style="background:#ff0000">A</span>ARQDRP<span style="background:#ff0000">F</span>FLYY<span style="background:#ff0000">A</span>AH<span style="background:#ff0000">H</span>HHY<span style="background:#ff0000">P</span>PFSGQSFAE<br>
+ <span style="background:#ff0000">L</span>LANL<span style="background:#00FF00">S</span>VEA<span style="background:#ff0000">Q</span><span style="background:#ff0000">P</span>P<span style="background:#ff0000">W</span>L<span style="background:#00FF00">P</span>GLEAR<span style="background:#ff0000">Y</span>MAF<span style="background:#ff0000">A</span><span style="background:#00FF00">H</span>DLMAD<span style="background:#ff0000">A</span>QRQDRP<span style="background:#ff0000">F</span>FLYY<span style="background:#ff0000">A</span>SH<span style="background:#ff0000">H</span>THY<span style="background:#ff0000">P</span>QFSGQSFAE<br>
- RSG<span style="background:#ff0000">R</span>R<span style="background:#ff0000">G</span><span style="background:#ff0000">F</span>FD<span style="background:#ff0000">S</span>SM<span style="background:#ff0000">E</span>E<span style="background:#ff0000">D</span>DAVGTLMTAIGDLGLLEE<span style="background:#ff0000">T</span>TVIF<span style="background:#ff0000">T</span>T<span style="background:#ff0000">D</span>DGPE<span style="background:#ff0000">T</span>T<span style="background:#ff0000">R</span>RSRG<span style="background:#ff0000">G</span>G<span style="background:#ff0000">C</span><span style="background:#ff0000">S</span>L<span style="background:#ff0000">L</span>L<span style="background:#ff0000">C</span>C<br>
+ RSG<span style="background:#ff0000">R</span><span style="background:#ff0000">G</span>P<span style="background:#ff0000">F</span>GD<span style="background:#ff0000">S</span>LM<span style="background:#ff0000">E</span>L<span style="background:#ff0000">D</span>AAVGTLMTAIGDLGLLEE<span style="background:#ff0000">T</span>LVIF<span style="background:#ff0000">T</span>A<span style="background:#ff0000">D</span>NGPE<span style="background:#ff0000">T</span>M<span style="background:#ff0000">R</span>MSRG<span style="background:#ff0000">G</span><span style="background:#ff0000">C</span><span style="background:#ff0000">S</span>GL<span style="background:#ff0000">L</span>R<span style="background:#ff0000">C</span><br>
- KGTT<span style="background:#ff0000">Y</span>Y<span style="background:#ff0000">E</span><span style="background:#ff0000">G</span><span style="background:#ff0000">G</span><span style="background:#ff0000">R</span>R<span style="background:#ff0000">E</span><span style="background:#ff0000">A</span>AAFWPGHIAP<span style="background:#ff0000">G</span>G<span style="background:#ff0000">T</span>TELASSL<span style="background:#ff0000">D</span>DLPTLAALAGAPLP<span style="background:#ff0000">N</span>NTLDG<span style="background:#ff0000">F</span>FLSP<br>
+ GKGTT<span style="background:#ff0000">Y</span><span style="background:#ff0000">E</span><span style="background:#ff0000">G</span><span style="background:#ff0000">G</span>V<span style="background:#ff0000">R</span><span style="background:#ff0000">E</span>P<span style="background:#ff0000">A</span>LAFWPGHIAP<span style="background:#ff0000">G</span>V<span style="background:#ff0000">T</span>HELASSL<span style="background:#ff0000">D</span>LLPTLAALAGAPLP<span style="background:#ff0000">N</span>VTLDG<span style="background:#ff0000">F</span>DLSP<br>
- LLLGTG<span style="background:#ff0000">K</span>KP<span style="background:#ff0000">R</span>RSLFFY<span style="background:#ff0000">P</span>PYP<span style="background:#00FF00">D</span>D<span style="background:#ff0000">E</span><span style="background:#ff0000">R</span>RVFAV<span style="background:#ff0000">R</span>R<span style="background:#ff0000">T</span>KYKA<span style="background:#ff0000">H</span>HFTQGSAH<span style="background:#ff0000">S</span>S<span style="background:#ff0000">T</span>T<span style="background:#ff0000">T</span>DPACHASSSL<br>
+ LLLGTG<span style="background:#ff0000">K</span>SP<span style="background:#ff0000">R</span>QSLFFY<span style="background:#ff0000">P</span>SYP<span style="background:#00FF00">D</span><span style="background:#ff0000">E</span>V<span style="background:#ff0000">R</span>GVFAV<span style="background:#ff0000">R</span><span style="background:#ff0000">T</span>GKYKA<span style="background:#ff0000">H</span>FFTQGSAH<span style="background:#ff0000">S</span>D<span style="background:#ff0000">T</span><span style="background:#ff0000">T</span>ADPACHASSSL<br>
- TAHE<span style="background:#ff0000">P</span>P<span style="background:#ff0000">P</span><span style="background:#ff0000">L</span>L<span style="background:#ff0000">Y</span>LSKDPGENY<span style="background:#ff0000">N</span>NLGGVA<span style="background:#ff0000">G</span>GTPEVLQALKQLQLLK<span style="background:#ff0000">A</span>ALDA<span style="background:#ff0000">A</span>ATFGPSQVARG<br>
+ TAHE<span style="background:#ff0000">P</span><span style="background:#ff0000">P</span>L<span style="background:#ff0000">L</span><span style="background:#ff0000">Y</span>DLSKDPGENY<span style="background:#ff0000">N</span>LLGGVA<span style="background:#ff0000">G</span>ATPEVLQALKQLQLLK<span style="background:#ff0000">A</span>QLDA<span style="background:#ff0000">A</span>VTFGPSQVARG<br>
- EDPALQIC<span style="background:#ff0000">C</span>CPGCTP<span style="background:#ff0000">R</span>RACCHCPDPHA
+ EDPALQIC<span style="background:#ff0000">C</span>HPGCTP<span style="background:#ff0000">R</span>PACCHCPDPHA
 </code>
+Furthermore, we visualized the mutations on the 3-dimensional structure of the protein:
-There are 3 identical mutated residues, that are annotated in both databases are at position:
-* 193: The mutations are different. In HGMD, the mutation results in a premature stop codon, thus the main part of the whole protein is truncated. In dbSNP, there is a amino acid substitution (W -> C).
-* 136: The mutationas are different amino acid substitutions. P -> L is annotated in HGMD and P -> A is annotated in dbSNP.
-* 82: Is mutation is identical in both databases and leads to a substitution: P -> L.
 [[File:Mut map 1auk.png | 400px | center | thumb | Structure of ARSA. Synonymous mutations are shown in green, missense/nonsense mutations in red. The active site is depicted in yellow.]]
+=== Summary Statistics ===
+In this section we briefly analyse the mutation frequencies of the amino acids. First, we counted for each of the 20 amino acids how often it is mutated according to the mutation map generated above. The figure below shows the results.
+[[File:subst_bar.jpeg | 400px | center | thumb | The Figure shows the number of mutations in the reference sequence for each amino acid.]]
+Gly, Pro and Arg show the highest mutation frequency. Each of these amino acids has very distinct physico-chemical properties. We expect them to be overrepresented in our map, because we are mostly looking at disease-causing mutations.
+* Glycine is the smallest amino acid. Replacing it with any larger amino acid might cause structural changes to the protein. <br>
+* Proline is unique due to its ring structure, which allows it to disrupt secondary structure elements. Gaining or losing a proline can therefore change the structure and might affect the function of the protein. <br>
+* Arginine is the most hydrophilic amino acid on the Kyte-Doolittle scale, with a hydropathy index of -4.5. <ref>Kyte J, Doolittle RF (1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology </ref> Here, an amino acid substitution changes the behaviour in an aqueous environment. <br>
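The per-residue count described above can be sketched as follows. This is a minimal illustration, not the script used for the figure: the label format (`P82L`, with `*` for a stop codon), the function names and the five example labels are assumptions, and the example list covers only the three positions 82, 136 and 193 rather than the full mutation tables. Each mutated position is counted once, as on the map.

```python
from collections import Counter
import re

# Assumed label format: reference residue, position, new residue ("*" = stop).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_mutation(label):
    """Split a label such as 'P82L' into (reference, position, replacement)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def count_mutated_residues(labels):
    """Count mutated positions per reference amino acid.

    A position that carries several different mutations is counted once,
    because the mutation map marks positions, not individual variants.
    """
    sites = {(pos, ref) for ref, pos, _ in map(parse_mutation, labels)}
    return Counter(ref for _, ref in sites)


# Illustrative input only, not the complete HGMD/dbSNP data.
example = ["P82L", "P136L", "P136A", "W193*", "W193C"]
print(count_mutated_residues(example).most_common())
```

Counting every variant instead of every position would only require replacing the set with a list; which of the two the bar chart uses is not stated in the text.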
+Next, we looked at the frequencies of all substitutions. To do so, we counted for each amino acid pair the number of observed mutations in our combined mutation map. The following figure visualizes these counts:
+[[File:subst_matrix.jpeg | 400px | center | thumb | The Figure shows for each amino acid pair, the number of observed mutations in the above generated combined map (HGMD, dbSNP).]]
+Now let us consider the two most frequent mutations in our map. As we are mostly looking at disease-causing mutations, we expect mutations between amino acids with very different physico-chemical behaviour to be most abundant. <br>
+* The most frequently observed mutation in the map is Leu -> Pro. Pro -> Leu is also very frequent. Leucine is a hydrophobic amino acid, whereas proline is rather hydrophilic. Furthermore, introducing proline into a structure or removing it might cause a severe structural change to the whole protein, because its unique ring structure allows proline to disrupt helical structures. Thus, introducing proline might disrupt secondary structure, whereas removing it might introduce new structural elements.
+* Another frequent mutation is Asp -> Gly. Here the very small residue glycine replaces the larger aspartic acid. Furthermore, aspartic acid is negatively charged and therefore hydrophilic, whereas glycine has only a hydrogen atom as its side chain.
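The pair counts and the hydropathy argument above can be sketched as follows. This is a minimal illustration under stated assumptions: the label format (`L123P`, `*` for stop), the function names and the example labels are hypothetical and do not reproduce the full combined map. The hydropathy values are those of the Kyte-Doolittle scale cited above.

```python
from collections import Counter
import re

# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed label format: reference residue, position, new residue ("*" = stop).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def count_substitutions(labels):
    """Count how often each (reference, replacement) pair occurs."""
    pairs = Counter()
    for label in labels:
        match = MUTATION_RE.match(label)
        if match is None:
            raise ValueError(f"unrecognised mutation label: {label!r}")
        ref, _, alt = match.groups()
        pairs[(ref, alt)] += 1
    return pairs


def hydropathy_change(ref, alt):
    """Return hydropathy(alt) - hydropathy(ref), or None for a stop codon."""
    if alt == "*" or ref == "*":
        return None
    return KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref]


# Illustrative input only, not the complete HGMD/dbSNP data.
example = ["P82L", "P136L", "P136A", "W193*", "W193C"]
for (ref, alt), n in count_substitutions(example).most_common():
    print(ref, "->", alt, n, hydropathy_change(ref, alt))
```

On this scale Leu -> Pro lowers the hydropathy index from 3.8 to -1.6, and Asp -> Gly raises it from -3.5 to -0.4, which is consistent with the expectation that frequent disease-causing substitutions join residues with different properties.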
 === References ===
 <references/>
+[[Category : Metachromatic_Leukodystrophy 2011]]
