Difference between revisions of "Researching SNPs TSD Journal"

From Bioinformatikpedia
(Created page with "== HGMD == == dbSNP == == SNPdbe == == Mutation map ==")
 

Revision as of 22:08, 8 June 2012

HGMD

Sequence retrieval and comparison

Retrieve the protein sequence from HGMD by opening the sequence view and applying "switch view", as described in the task description. Download the page manually and save it to a file (cdna_new.php). Then execute the following command:

<source lang="bash">
grep -P --only-matching "<td><font face=\"courier new\" color=\"black\">\w{3}</font><br><font face=\"courier new\" color=\"#666666\">\w{3}</font>" ./cdna_new.php | cut -c99-101 | sed -n '$!p' | tr -d '\n' | tr '[:lower:]' '[:upper:]' |
./ThreeToOneLetterCode.py | tr -d '\n' > hgmdSequence_aa
</source>

Each match contains a codon followed by its three-letter amino acid code. The cut step keeps only the amino acid code (characters 99 to 101 of the match), and the sed step drops the last match. 'ThreeToOneLetterCode.py' is the following script:

<source lang="python">
#!/usr/bin/env python

# Note: newer Biopython releases may no longer export this name from Bio.PDB.
from Bio.PDB import to_one_letter_code as one_letter
import sys

# The whole sequence arrives on one line as concatenated, upper-case
# three-letter codes; strip a trailing newline if there is one.
seq = sys.stdin.readline().strip()

# Translate each three-letter code into its one-letter code.
for aaa in range(0, len(seq), 3):
    print(one_letter[seq[aaa:aaa+3]])
</source>

To check whether the protein sequence from HGMD is the same as the one in UniProt, perform the following operations:

<source lang="bash">
curl -s http://www.uniprot.org/uniprot/P06865.fasta | sed '1d' | tr -d '\n' > HEXA_HUMAN  # Get UniProt sequence without header
perl -p -e 's|(.)|$1\n|g' HEXA_HUMAN > temp1  # Insert a line break after every character in both sequences, for easy diffing
perl -p -e 's|(.)|$1\n|g' hgmdSequence_aa > temp2
diff temp1 temp2
</source>

The diff shows that the two sequences differ only at position 436, which is Ile in UniProt and Val in HGMD.
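The position-by-position comparison can also be sketched directly in Python, which reports the residue number without counting lines in the diff output. This is only a minimal sketch, not part of the original workflow: the function name `sequence_differences` and the toy sequences are made up for illustration, and positions are counted from 1 as in the text above.

```python
from itertools import zip_longest


def sequence_differences(seq_a, seq_b):
    """Return a list of (position, residue_a, residue_b) for every mismatch.

    Positions are 1-based. If one sequence is longer, the missing
    residues of the shorter one are reported as "-".
    """
    diffs = []
    pairs = zip_longest(seq_a, seq_b, fillvalue="-")
    for pos, (res_a, res_b) in enumerate(pairs, start=1):
        if res_a != res_b:
            diffs.append((pos, res_a, res_b))
    return diffs


# Toy sequences (invented, not the real HEXA sequence): one Ile/Val mismatch.
toy_uniprot = "ACDEFGHIK"
toy_hgmd = "ACDEFGHVK"

for pos, res_a, res_b in sequence_differences(toy_uniprot, toy_hgmd):
    print("Position %d: %s vs %s" % (pos, res_a, res_b))
```

To apply it to the real data, read the contents of the files HEXA_HUMAN and hgmdSequence_aa into two strings and pass them to the function in place of the toy sequences.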

dbSNP

SNPdbe

Mutation map