Task 5: Mapping point mutations
Revision as of 20:08, 16 June 2011
Task description
A detailed task description can be found here: Mapping point mutations
SNP databases
HGMD
- Searched HGMD for PAH
- HGMD Professional lists 429 missense/nonsense mutations for PAH
HGMD records the following mutation types for PAH:
- Missense/nonsense
- Splicing
- Regulatory
- Small deletions
- Small insertions
- Small indels
- Gross deletions
- Gross insertions/duplications
- Complex rearrangements
HGMD has one additional mutation category, for which no PAH mutations are recorded:
- Repeat variations
Reference Sequence
The reference sequence is given by the accession number NM_000277.1, whose entry contains the following amino acid sequence:
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV
NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW
FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM
EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA
QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE
KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR
IEVLDNTQQLKILADSINSEIGILCSALQKIK
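To map the AA-Position values of the results table onto this sequence, one has to know how they are counted. Checking a few rows suggests they are 0-based: counted from 0, positions 9, 65, 257 and 413 give G, E, L and Y, matching the Reference Residue column, whereas counting from 1 would put I at position 65. This is an inference from the table, not something the page states. The constant `PAH_SEQ` and helper `residue_at` below are hypothetical names for a minimal lookup sketch.

```python
# Human PAH protein sequence, copied from the reference entry above (452 residues).
PAH_SEQ = (
    "MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV"
    "NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW"
    "FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM"
    "EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF"
    "RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA"
    "QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE"
    "KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR"
    "IEVLDNTQQLKILADSINSEIGILCSALQKIK"
)


def residue_at(position: int, zero_based: bool = True) -> str:
    """Return the residue at `position` (0-based by default, as assumed for the table)."""
    index = position if zero_based else position - 1
    if not 0 <= index < len(PAH_SEQ):
        raise IndexError(f"position {position} is outside the sequence")
    return PAH_SEQ[index]


# (AA-Position, Reference Residue) pairs taken from the results table.
for pos, expected in [(9, "G"), (65, "E"), (257, "L"), (413, "Y")]:
    print(pos, residue_at(pos), residue_at(pos) == expected)
```

Passing `zero_based=False` switches to the 1-based numbering that is conventional for protein positions, which is worth keeping in mind when comparing against HGMD annotations.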
SNPs
dbSNP
Methodology
Retrieving Data
We searched dbSNP for silent mutations in coding regions, i.e. we only considered SNPs that alter the triplet but not the encoded amino acid.
To do so, we used NCBI's Entrez interface, which is accessible at:
- ://www.ncbi.nlm.nih.gov/sites/entrez?Db=snp
The advantage of this interface is that it lets us build arbitrarily complex queries to restrict the result set.
We constructed the following query to search for SNPs which are considered silent in the coding regions of the human PAH gene (see figure 1):
- "synonymous-codon"[Function_Class] AND PAH[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]
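Figure 1: The query and result of our silent mutation search
Results of this query can be accessed directly via the following URL:
- http://www.ncbi.nlm.nih.gov/sites/entrez?Db=snp&Cmd=DetailsSearch&Term=%22synonymous-codon%22%5BFunction_Class%5D+AND+PAH%5BGENE%5D+AND+%22human%22%5BORGN%5D+AND+%22snp%22%5BSNP_CLASS%5D
We downloaded the results as a FlatFile, since this seemed the simplest format to process and it contains almost all the information we need.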
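The direct-results link is the query above, URL-encoded into the `Term` parameter of an Entrez `DetailsSearch` URL. A minimal Python sketch, assuming the 2011-era `/sites/entrez` endpoint recorded on this page (it may no longer be served; NCBI's E-utilities `esearch` is the usual programmatic route today). The helper name `entrez_search_url` is hypothetical; `quote_plus` is from the Python standard library.

```python
from urllib.parse import quote_plus

# Entrez query for silent (synonymous-codon) SNPs in human PAH, as given above.
QUERY = ('"synonymous-codon"[Function_Class] AND PAH[GENE] '
         'AND "human"[ORGN] AND "snp"[SNP_CLASS]')

# Legacy Entrez endpoint, taken from the results URL recorded on this page.
BASE = "http://www.ncbi.nlm.nih.gov/sites/entrez"


def entrez_search_url(query: str, db: str = "snp") -> str:
    """Build a DetailsSearch URL; quote_plus encodes quotes, brackets and spaces."""
    return f"{BASE}?Db={db}&Cmd=DetailsSearch&Term={quote_plus(query)}"


print(entrez_search_url(QUERY))
```

Because `quote_plus` encodes `"` as `%22`, `[`/`]` as `%5B`/`%5D` and spaces as `+`, the printed URL should match the direct-results link character for character.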
Processing Data
Results
We found the following silent mutations in dbSNP:
Identifier | AA-Position | Reference Triplet | Mutated Triplet | Reference Allele | Mutated Allele | Frame | Reference Residue | Mutated Residue |
---|---|---|---|---|---|---|---|---|
rs117308669 | 65 | GAA | GAG | A | G | 3 | E | E |
rs75065106 | 257 | CTG | TTG | C | T | 1 | L | L |
rs62651567 | 322 | ACA | ACG | A | G | 3 | T | T |
rs62508648 | 366 | CTG | CTA | G | A | 3 | L | L |
rs61747292 | 320 | CTC | CTT | C | T | 3 | L | L |
rs59326968 | 425 | AAT | AAC | T | C | 3 | N | N |
rs17852374 | 35 | TCA | TCG | A | G | 3 | S | S |
rs1801152 | 413 | TAC | TAT | C | T | 3 | Y | Y |
rs1801151 | 399 | AGG | CGG | A | C | 1 | R | R |
rs1801150 | 398 | GTA | GTT | A | T | 3 | V | V |
rs1801147 | 202 | TGC | TGT | C | T | 3 | C | C |
rs1801146 | 136 | AGC | AGT | C | T | 3 | S | S |
rs1801145 | 9 | GGC | GGG | C | G | 3 | G | G |
rs1126758 | 231 | CAA | CAG | A | G | 3 | Q | Q |
rs1042503 | 244 | GTG | GTA | G | A | 3 | V | V |
rs772897 | 384 | CTG | CTC | G | C | 3 | L | L |
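The definition of a silent SNP used above (the triplet changes, the amino acid does not) can be checked row by row against this table. The sketch below is an illustration, not the processing code behind this page: it assumes the Frame column gives the position (1 to 3) of the substituted base within the codon, which is consistent with the rows shown, and it uses the standard genetic code. The helper names `mutate` and `is_silent` are hypothetical. Because `mutate` raises an error whenever the reference allele does not match the reference triplet at the given frame, it also works as a consistency check on the table.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def mutate(ref_triplet: str, frame: int, ref_allele: str, alt_allele: str) -> str:
    """Substitute one base; `frame` is assumed to be the codon position 1, 2 or 3."""
    if frame not in (1, 2, 3):
        raise ValueError(f"frame must be 1, 2 or 3, got {frame}")
    idx = frame - 1
    if ref_triplet[idx] != ref_allele:
        raise ValueError(
            f"reference allele {ref_allele} does not match position {frame} of {ref_triplet}"
        )
    return ref_triplet[:idx] + alt_allele + ref_triplet[idx + 1:]


def is_silent(ref_triplet: str, frame: int, ref_allele: str, alt_allele: str) -> bool:
    """True if the triplet changes but the encoded amino acid stays the same."""
    alt_triplet = mutate(ref_triplet, frame, ref_allele, alt_allele)
    return (alt_triplet != ref_triplet
            and CODON_TABLE[alt_triplet] == CODON_TABLE[ref_triplet])


# A few rows from the table: (identifier, reference triplet, frame, ref, alt).
rows = [
    ("rs117308669", "GAA", 3, "A", "G"),
    ("rs75065106", "CTG", 1, "C", "T"),
    ("rs1801151", "AGG", 1, "A", "C"),
]
for rs_id, triplet, frame, ref, alt in rows:
    print(rs_id, triplet, "->", mutate(triplet, frame, ref, alt),
          "silent:", is_silent(triplet, frame, ref, alt))
```

A substitution such as GAA to TAA (frame 1, G to T) would return `False`, since it turns glutamate into a stop codon.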
Comparing the annotation of HGMD and dbSNP
Alignment of the reference sequences
We decided to use the PAH sequence from UniProt (see UniProt):
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV
NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW
FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM
EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA
QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE
KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR
IEVLDNTQQLKILADSINSEIGILCSALQKIK
Alignment with the reference sequence used in HGMD
The resulting alignment shows 100% identity without any gaps, so the two reference sequences are identical and the alignment is effectively a "self-alignment".
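Because the alignment contains no gaps, its identity figure reduces to a position-by-position comparison of two equal-length sequences. A minimal sketch, assuming both sequences are plain one-letter strings; it is not the alignment program used for this comparison (the page does not name one), and sequences that differ in length would need a real aligner instead. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two ungapped sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("a gapless comparison needs sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where both sequences carry the same residue.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy example: the first ten residues of PAH against itself and a one-residue variant.
print(percent_identity("MSTAVLENPG", "MSTAVLENPG"))
print(percent_identity("MSTAVLENPG", "MSTAVLENPA"))
```

Applied to the UniProt sequence and the HGMD reference sequence, a result of 100.0 corresponds to the gapless, fully identical alignment described above.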