Sequence Search and Multiple Sequence Alignment (PKU)

From Bioinformatikpedia

Revision as of 17:20, 2 May 2012

Short Task Description

Perform database searches using different search tools with the PAH protein as query

Create and evaluate multiple sequence alignments


Reference Sequence of PAH

>sp|P00439|PH4H_HUMAN Phenylalanine-4-hydroxylase OS=Homo sapiens GN=PAH PE=1 SV=1
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDV
NLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPW
FPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYM
EEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFA
QFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSE
KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQR
IEVLDNTQQLKILADSINSEIGILCSALQKI

Database Searches

Blast

time blast2 -p blastp -d /mnt/project/pracstrucfunc12/data/big/big -i Dropbox/Phenylketonuria/Task1/PAH.fasta -o results_blast2_standard

real 1m47.401s user 1m25.290s sys 0m18.280s

time blast2 -p blastp -d /mnt/project/pracstrucfunc12/data/big/big -i Dropbox/Phenylketonuria/Task1/PAH.fasta -o results_blast2_e-10 -e 0.0000000001 -v 2000

real 1m35.454s user 1m21.700s sys 0m3.100s

HHBlits

time hhblits -i Dropbox/Phenylketonuria/Task1/PAH.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o results_hhblits_standard

real 6m10.059s user 3m15.640s sys 0m40.220s

hhblits -i Dropbox/Phenylketonuria/Task1/PAH.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o results_hhblits_n4_e-7 -n 4 -e 0.0000001

HHSearch

time hhsearch -i Dropbox/Phenylketonuria/Task1/PAH.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current_hhm_db

real 13m27.782s user 13m18.120s sys 0m8.480s
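
To compare the runtimes of the searches above, the `real` values printed by `time` can be converted to plain seconds. The following is only a minimal sketch: the helper name `to_seconds` is hypothetical, and it assumes the `XmY.YYYs` format shown in the timings above.

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert a `time` value such as
# "1m47.401s" into seconds with three decimal places.
to_seconds() {
    # Split on "m": $1 holds the minutes, $2 the seconds plus a trailing "s".
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") times reported above, in order:
# blast2 standard, blast2 with -e cutoff, hhblits standard, hhsearch.
for t in 1m47.401s 1m35.454s 6m10.059s 13m27.782s; do
    to_seconds "$t"
done
```

The loop prints one value per search, which makes it easier to sort or plot the runtimes side by side.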

PDB

Rather than mapping the big_80 hits to PDB to obtain structural information for the alignments, we ran an additional search against PDB at NCBI with the following parameters: scoring matrix = PAM70, gap open = 10, gap extend = 1, composition-based statistics, max. 1000 target sequences.

MSA

Datasets

ClustalW

clustalw -align -infile=NN.fasta -outfile=clustalW_NN.aln

Muscle

muscle -in NN.fasta -out muscle_NN.fasta

T-coffee

t_coffee NN.fasta
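
The three commands above are repeated once per dataset, with `NN` presumably standing for the dataset name. A minimal sketch of how this could be scripted follows. The function name `msa_commands` and the dataset names `set1` and `set2` are hypothetical and not taken from this page. The function only prints the commands (a dry run), so it can be checked before anything is executed.

```shell
#!/bin/sh
# msa_commands (hypothetical helper): print the ClustalW, Muscle and
# T-Coffee command lines for one dataset name, using the same flags
# as the commands listed above. Nothing is executed here.
msa_commands() {
    name=$1
    printf 'clustalw -align -infile=%s.fasta -outfile=clustalW_%s.aln\n' "$name" "$name"
    printf 'muscle -in %s.fasta -out muscle_%s.fasta\n' "$name" "$name"
    printf 't_coffee %s.fasta\n' "$name"
}

# Placeholder dataset names (assumption); replace with the real NN values.
for name in set1 set2; do
    msa_commands "$name"
done
```

To run the alignments rather than print them, the output can be piped to a shell, for example `msa_commands set1 | sh`, provided the three tools are installed and `set1.fasta` exists in the working directory.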