Fabry:Mapping point mutations

From Bioinformatikpedia



The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Database comparison

The databases we used to find information about SNPs in our gene differ in the amount of information they provide, in its quality, and in how up to date it is. <xr id="tab:database_comparison"/> gives a short overview of the differences.

<figtable id="tab:database_comparison"> Comparison of the databases used for the analysis

{| class="wikitable"
! Database name !! Information given !! Last updated !! Evidence
|-
| dbSNP
| Contains SNPs from various organisms (currently more than 300 million), but only 1/6 to 1/5 are validated.<ref name="dbsnp_summary">dbSNP Summary; last accessed on June 10, 2012</ref>
| NCBI dbSNP Build 135, built on Oct 13, 2011<ref name="dbsnp_summary">dbSNP Summary; last accessed on June 10, 2012</ref>
| Relies on submissions from authors and other database projects<ref name="dbsnp_origin">SNP FAQ Archive: dbSNP Data Origins; last accessed on June 10, 2012</ref>. These are not manually checked; the submitters are responsible for the correctness<ref name="dbsnp_quality">SNP FAQ Archive: dbSNP Data Quality Control; last accessed on June 10, 2012</ref>.
|-
| The Human Gene Mutation Database (HGMD)
| Provides human SNPs that are associated with a disease, i.e. they influence the phenotype, so silent mutations are excluded<ref name="hgmd">Human Gene Mutation Database; last accessed on June 10, 2012</ref>
| January 2012 (publicly available data is at least 3 years old)<ref name="hgmd_update">HGMD Disclaimer; last accessed on June 10, 2012</ref>
| Literature, manually curated<ref name="hgmd">Human Gene Mutation Database; last accessed on June 10, 2012</ref>
|-
| OMIM
| Overview of human genes and genetic phenotypes of all known Mendelian disorders and over 12,000 genes, with a focus on the relationship between phenotype and genotype.<ref name="OMIM"> About: OMIM® - Online Mendelian Inheritance in Man® http://omim.org/about; June 10th, 2012</ref>
| Daily (we used data from June 9th, 2012)
| Literature, manually curated
|-
| SNPdbe <ref>Schaefer C, Meier A, Rost B, Bromberg Y (2012) SNPdbe: Constructing an nsSNP functional impacts database, Bioinformatics; 28(4):601-602</ref>
| Joins information from several databases (Swissprot, dbSNP, PMD, OMIM, 1000 genomes) and predicts the functional effect
| March 2012 (we used data from June 10th, 2012)
| Depending on source
|-
| SNPedia
| Wiki investigating human genetics and listing information about the effects of variations in DNA <ref>SNPedia, http://www.snpedia.com/index.php/SNPedia; June 9th, 2012</ref>
| Constantly (we used data from June 9th, 2012)
| Peer-reviewed scientific publications
|}

</figtable>

HGMD

The Human Gene Mutation Database contains 514 mutations for the GLA gene, of which 354 are of the missense/nonsense type. These missense/nonsense mutations are listed in this table in detail. As already noted in <xr id="tab:database_comparison"/>, all of these mutations are disease-causing.

Generally, HGMD distinguishes between the following mutation types:

  • Missense/nonsense
Point mutations in coding regions
  • Splicing
These mutations have an effect on the mRNA splicing.
  • Regulatory
Mutations that lead to changes in the regulation of the gene
  • Small deletions
Deletions of 20 bp or less
  • Small insertions
Insertions of 20 bp or less
  • Small indels
Sequence parts of 20 bp or less that are replaced by a new sequence of not necessarily the same length
  • Gross deletions
Deletions of more than 20 bp, which can affect several kb
  • Gross insertions/duplications
Insertions or duplications of more than 20 bp, often several kb
  • Complex rearrangements
Deletions, insertions, duplications and inversions that were observed at a specific gene location
  • Repeat variations
The number of repetitions of a gene region differs from the wild type
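
The length-based categories in the list above can be expressed as a small rule. The following is only a sketch: the function name `classify_by_length` and its arguments are hypothetical and not part of any HGMD interface, and it covers pure deletions and insertions only. The 20 bp threshold is the one given in the list.

```python
SMALL_LIMIT_BP = 20  # "small" means 20 bp or less, as stated in the list above


def classify_by_length(kind: str, length_bp: int) -> str:
    """Return the length-based category for a pure deletion or insertion.

    `kind` is "deletion" or "insertion". Hypothetical helper for
    illustration; HGMD itself is not queried here.
    """
    if kind not in ("deletion", "insertion"):
        raise ValueError(f"unsupported kind: {kind!r}")
    if length_bp < 1:
        raise ValueError("length_bp must be at least 1")
    size = "small" if length_bp <= SMALL_LIMIT_BP else "gross"
    if kind == "insertion" and size == "gross":
        # The list groups large insertions together with duplications.
        return "gross insertion/duplication"
    return f"{size} {kind}"


for kind, length in [("deletion", 3), ("deletion", 21), ("insertion", 20)]:
    print(kind, length, "->", classify_by_length(kind, length))
```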


SNP distribution

<figure id="hgmd_distribution">

The number of SNPs of gene GLA from HGMD per protein sequence position. The start of the mature α-galactosidase A protein at position 32 and the active-site residues at positions 170 and 231 are marked.

</figure>

As depicted in <xr id="hgmd_distribution"/>, there are no real hotspots according to HGMD. Since Fabry disease is caused by a defective α-galactosidase A protein, the SNPs at position 0, in the initial methionine (the start codon), are to be expected. Apart from that, SNPs accumulate near the start of the mature protein around position 50, at the active-site residues 170 and 231, and, in a larger cluster, in the region from position 260 to 300.
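
A per-position count like the one plotted above can be derived from a list of amino acid substitutions. The sketch below assumes the mutations are available as labels of the form wild-type residue, position, variant residue. The label format, the function name `snps_per_position` and the example labels are assumptions for illustration and are not taken from HGMD. Positions are used exactly as written in the labels.

```python
import re
from collections import Counter

# Matches labels such as "A10V" or "G45*": wild type, position, variant.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def snps_per_position(labels):
    """Count how many listed substitutions fall on each sequence position."""
    counts = Counter()
    for label in labels:
        match = MUTATION_RE.match(label.strip())
        if match is None:
            raise ValueError(f"unrecognised mutation label: {label!r}")
        counts[int(match.group(2))] += 1
    return counts


# Made-up labels, only to show the input and output shapes.
example_labels = ["A10V", "A10T", "G45R"]
counts = snps_per_position(example_labels)
for position in sorted(counts):
    print(position, counts[position])
```

Plotting `counts` against the sequence position would give a bar chart comparable to the figure. The plotting step is left out here because the original commands are documented in the journal.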


dbSNP

<figure id="dbSNP_distribution">

The number of SNPs of gene GLA from dbSNP per protein sequence position. The start of the mature α-galactosidase A protein at position 32 and the active-site residues at positions 170 and 231 are marked.

</figure>

At the time we checked dbSNP, it listed eight silent point mutations. Their distribution across the α-galactosidase A protein sequence is shown in <xr id="dbSNP_distribution"/>. With only 8 SNPs examined, it is not surprising that there are no hotspots. The only notable, though small, accumulation is around position 40.

Single base-pair insertions and deletions, which also count as point mutations, can have a more severe impact on the protein because they introduce frameshifts. A frameshift changes the sequence, and therefore the protein structure, for the whole region downstream of the mutation. The effect can be less severe when several such mutations occur close to each other on the DNA sequence and, by chance, restore the original reading frame. In this case, only the amino acids between the first frameshift and the mutation that restores the original frame are affected.
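
The effect of a frameshift and of a compensating second mutation can be illustrated on a toy sequence. The sketch below uses the standard genetic code and an invented 24-nucleotide sequence that is not part of GLA. The helper names `translate` and `changed_positions` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons in frame 0; a trailing partial codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def changed_positions(protein_a, protein_b):
    """1-based positions that differ, compared over the shorter length."""
    return [i + 1 for i, (a, b) in enumerate(zip(protein_a, protein_b)) if a != b]


wild_type = "ATGGCCAAAGGTCTGGAACCGTGG"          # invented toy sequence
deleted = wild_type[:4] + wild_type[5:]         # single-base deletion
restored = deleted[:14] + "T" + deleted[14:]    # later single-base insertion

for name, dna in [("wild type", wild_type), ("deletion", deleted),
                  ("deletion + insertion", restored)]:
    print(f"{name:22s}", translate(dna))

print("changed after deletion:", changed_positions(translate(wild_type), translate(deleted)))
print("changed after repair:  ", changed_positions(translate(wild_type), translate(restored)))
```

With the deletion alone, every residue from the first altered codon onward differs from the wild type. Once the insertion restores the frame, only the residues between the two mutations differ.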


OMIM

<figure id="fig:OMIM_SNPs">

Distribution and number of SNPs along the sequence of GLA

</figure>

The OMIM database lists 62 allelic variants for the gene GLA (effectively 63, since one entry contains 2 SNPs). Of these, 6 are exon deletions, 10 are base-pair deletions, 4 are located in non-coding regions, 1 is an exon duplication and 2 are insertions (see OMIM data-Allelic variants). The remaining 40 variants are specified as SNPs and are listed in OMIM data-SNPs.

Of course, all of the mentioned mutations are disease-causing. <xr id="fig:OMIM_SNPs"/> shows the distribution of SNPs along the sequence of the α-galactosidase A coding gene. There seems to be no real "hotspot" of SNPs, but there is a long stretch of sequence without any point mutation (positions 66-112). Only one mutation is listed in the signal peptide, and there are none at the active-site amino acids (170 and 231) or in the binding site (203-207).
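
A mutation-free stretch such as the one described above can also be located programmatically instead of being read off the plot. The following is a minimal sketch, assuming the mutated positions are available as 1-based integers. The function name `longest_gap` and the example values are hypothetical and do not reproduce the OMIM data.

```python
def longest_gap(mutated_positions, sequence_length):
    """Return (start, end) of the longest run of positions without a mutation.

    Positions are 1-based and inclusive. Returns None when every position
    carries a mutation. On ties, the first such run is returned.
    """
    mutated = set(mutated_positions)
    best = None
    run_start = None
    # Iterate one position past the end so that a final run is closed as well.
    for pos in range(1, sequence_length + 2):
        free = pos <= sequence_length and pos not in mutated
        if free and run_start is None:
            run_start = pos
        elif not free and run_start is not None:
            run = (run_start, pos - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            run_start = None
    return best


# Made-up positions on a sequence of length 12.
print(longest_gap([3, 4, 10], 12))
```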


SNPedia

<figure id="fig:SNPedia_SNPs">

Distribution and number of SNPs found in SNPedia along the sequence of GLA

</figure>

SNPedia has a page for Fabry Disease, the disease caused by mutations in the gene we are examining, but not for the gene itself (GLA). We therefore performed a query with the search term "Gene=GLA", which returned 40 hits (see SNPedia_data). Of these, 32 are missense mutations and the remaining 8 are stop-gained.

Again, only one hit lies in the signal peptide part of the sequence and none of the important residues is mutated (see <xr id="fig:SNPedia_SNPs"/>). The gap without variation between positions 66 and 112 can be observed as well; apart from that, the distribution is rather even, without large peaks.
Even without a deeper look, simply comparing <xr id="fig:SNPedia_SNPs"/> and <xr id="fig:OMIM_SNPs"/> shows that the results from OMIM and SNPedia are very similar.

SNPdbe

<figure id="fig:SNPdbe_SNPs">

Distribution and number of SNPs found in SNPdbe along the sequence of GLA. Disease causing mutations are red, non-disease causing green

</figure> <figure id="fig:SNPdbe_SNPs2">

Disease and non-disease causing SNPs from SNPdbe

</figure>

A search in the SNPdbe database revealed 57 mutations (source), which are listed in SNPdbe data. SNPdbe lists 35 disease-causing and 22 non-disease-causing mutations (see <xr id="fig:SNPdbe_SNPs2"/>). The distribution of disease-causing SNPs again looks similar to the distributions of OMIM and SNPedia (see <xr id="fig:SNPedia_SNPs"/> and <xr id="fig:OMIM_SNPs"/>), although there are more SNPs in the signal peptide region (positions 1-31) and the gap is interrupted by one mutation at position 94.

Here, the distribution of SNPs that do not cause Fabry's Disease (or any other disease) appears to be homogeneous, except for a long stretch from residue 29 to 117 without such SNPs.

If we use only confirmed point mutations (see <xr id="fig:SNPdbe_SNPs_conf"/>), the plot looks very different, since only a few (7) non-disease-causing SNPs remain.

<figure id="fig:SNPdbe_SNPs_conf">

Confirmed SNPs from SNPdbe

</figure>


Mapping

<figure id="fig:posvssd_noHGMD">

Distribution of disease and non-disease causing SNPs along the sequence of GLA
(HGMD is not displayed)

</figure>

<figure id="fig:SNPdist_disease">

SNPs that are listed as disease-causing were mapped onto the sequence and their frequency displayed

</figure> <figure id="fig:SNPdist_non_disease">

SNPs that are listed as non-disease-causing were mapped onto the sequence and their frequency displayed

</figure>

<figure id="fig:hist_disease">

Histogram of SNPs that are listed as disease-causing; duplicates were counted only once

</figure> <figure id="fig:hist_non_disease">

Histogram of SNPs that are listed as non-disease-causing or silent; duplicates were counted only once

</figure>

All SNPs found are listed at Fabry SNPs. NP_000160.1 serves as the reference sequence for all databases; it is identical to the Swiss-Prot sequence P06280, which we already used in the previous tasks.

For a first overview of the number of SNPs at each position, we displayed the disease-causing and non-disease-causing SNPs along the sequence. <xr id="fig:SNPdist_disease"/> and <xr id="fig:SNPdist_non_disease"/> show that there are, as expected, far more disease-causing point mutations (red) than non-disease-causing and silent ones (green). Both groups appear to be distributed evenly along the sequence, but to verify this we created histograms. Note that <xr id="fig:SNPdist_disease"/> and <xr id="fig:SNPdist_non_disease"/> list all hits of all databases, whereas in the histograms (see <xr id="fig:hist_disease"/> and <xr id="fig:hist_non_disease"/>) redundant entries are filtered out. There is not much to say about the silent and non-disease-causing mutations (see here), because there are very few of them and they are fairly evenly distributed along the sequence, except for the aforementioned gap between 50 and 120. The histogram of the disease-causing mutations (see here) does not reveal an accumulation at one particular point of the sequence, but it does show several peaks (e.g. between 40 and 50). We can see again that point mutations in the signal peptide are sparse, and that in the region from 20 to 30 there is no SNP at all.
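
The difference between the per-database plots and the histograms, namely that redundant hits are counted only once, can be sketched as follows. The record layout `(database, position, variant)`, the function name `binned_counts` and the bin width of 10 are assumptions for illustration. The bin width actually used for the figures is not stated here.

```python
from collections import Counter


def binned_counts(records, bin_width=10):
    """Histogram of unique SNPs per sequence bin.

    `records` holds (database, position, variant) tuples. The same
    position/variant pair reported by several databases is counted once.
    Bins are returned as {(first_position, last_position): count}.
    """
    unique = {(position, variant) for _database, position, variant in records}
    bins = Counter((position - 1) // bin_width for position, _variant in unique)
    return {
        (index * bin_width + 1, (index + 1) * bin_width): count
        for index, count in sorted(bins.items())
    }


# Made-up records: the first two describe the same SNP in two databases.
records = [
    ("OMIM", 42, "C"),
    ("SNPedia", 42, "C"),
    ("SNPdbe", 47, "H"),
    ("OMIM", 112, "C"),
]
print(binned_counts(records))
```

Two different variants at the same position are treated as two SNPs here. Counting positions instead would only require dropping the variant from the key.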

To examine the overlap between the different databases, we displayed each SNP as a point along the sequence, indicating whether or not it causes Fabry Disease and which database the information was taken from. We tried to display all databases, but since there are too many HGMD hits and no real hotspots could be seen for this database (see Fabry_posvssd_all.png), we decided to display only the remaining four databases in <xr id="fig:posvssd_noHGMD"/>.

Here it becomes obvious that the disease-causing mutations overlap almost completely between the databases (pink, green and blue points almost never appear alone, but mostly in triplets), while the non-disease-causing point mutations do not group together.
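
The visual impression of "triplets" can be turned into a number by counting, for each SNP, how many databases report it. This is a rough sketch under the assumption that the hits are available as `(database, position, variant)` tuples. The function names and the example records are hypothetical.

```python
from collections import defaultdict


def database_support(records):
    """Map each (position, variant) pair to the set of databases reporting it."""
    support = defaultdict(set)
    for database, position, variant in records:
        support[(position, variant)].add(database)
    return dict(support)


def fraction_shared(records, min_databases=3):
    """Fraction of distinct SNPs reported by at least `min_databases` databases."""
    support = database_support(records)
    if not support:
        return 0.0
    shared = sum(1 for dbs in support.values() if len(dbs) >= min_databases)
    return shared / len(support)


# Made-up records: one SNP found in three databases, one found in a single one.
records = [
    ("OMIM", 42, "C"),
    ("SNPedia", 42, "C"),
    ("SNPdbe", 42, "C"),
    ("SNPdbe", 94, "T"),
]
print(fraction_shared(records))
```

Running this separately on the disease-causing and the non-disease-causing records would give one overlap value per class, which could then be compared with the impression from the figure.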

<figure id="fig:map_3dstruc_silent">

Silent SNPs (blue) mapped onto the structure of α-Galactosidase A

</figure>

<figure id="fig:map_3dstruc_non_disease">

Non-disease-causing SNPs (purple) mapped onto the structure of α-Galactosidase A

</figure>

<figure id="fig:map_3dstruc_disease">

Disease-causing SNPs (red) mapped onto the structure of α-Galactosidase A

</figure>

In <xr id="fig:map_3dstruc_silent"/>, <xr id="fig:map_3dstruc_non_disease"/> and <xr id="fig:map_3dstruc_disease"/>, each class of point mutations (silent, non-disease-causing and disease-causing) is mapped onto the structure of α-Galactosidase A. The location of a mutation (i.e. type of secondary structure, exposed/buried, ...) does not seem to determine whether the polymorphism is disease-causing or not.


References

<references/>