Task 2 - Alignments with PAH Reference
Sequence Searches
BLAST
Running
time sudo blastall -p blastp -d '/data/blast/nr/nr' -i ./reference.fasta -o './reference.blast' -b 500
real 11m30.762s
user 3m11.440s
sys 0m12.250s
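To compare this run time with the other tools, the values printed by `time` can be converted to plain seconds. The helper below is only a sketch; the function name `to_seconds` is our own and it assumes the `XmY.ZZZs` format shown above.

```shell
# Convert a duration as printed by bash's `time` (e.g. 11m30.762s) to seconds.
# Assumes the "<minutes>m<seconds>s" format; to_seconds is a hypothetical helper name.
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{
        sub(/s$/, "", $2)                  # drop the trailing "s"
        printf "%.3f\n", $1 * 60 + $2      # minutes * 60 + seconds
    }'
}

to_seconds 11m30.762s   # prints 690.762
```

The same call works for the `user` and `sys` values.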
Results
FASTA
Installation
- Used Virtual Box with Linux.
- Download fasta3.tar.gz from ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/
- Unzip the archive
- Build it with: make -f /apps/fasta/make/Makefile.linux64 all
Running
time ./fasta36 /home/student/reference.fasta /data/nr/nr
Answers given at the interactive prompts:
- Enter filename for results []: /home/student/reference.fasta_search
- How many scores do you want to see: 500
- More scores? 0
- Display alignments also? (y/n) [n] y
- number of alignments [500]? 500
real 10m13.878s
user 7m26.270s
sys 0m20.230s
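The interactive answers above can probably be passed on the command line instead, which makes the run repeatable. This is a sketch that assumes the fasta36 options `-q` (do not prompt), `-b` (number of scores) and `-d` (number of alignments) behave as described in the FASTA documentation; `run_fasta` and `FASTA_BIN` are names we made up, and the result is written by redirecting standard output.

```shell
# Path to the fasta36 binary; override FASTA_BIN if it lives elsewhere (assumption).
FASTA_BIN="${FASTA_BIN:-./fasta36}"

# run_fasta QUERY DATABASE OUTFILE [N]
# Runs a non-interactive search showing N scores and N alignments (default 500).
run_fasta() {
    fa_query="$1"
    fa_db="$2"
    fa_out="$3"
    fa_n="${4:-500}"
    "$FASTA_BIN" -q -b "$fa_n" -d "$fa_n" "$fa_query" "$fa_db" > "$fa_out"
}

# Example call, mirroring the interactive run above:
# time run_fasta /home/student/reference.fasta /data/nr/nr /home/student/reference.fasta_search 500
```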
Results
PSI-BLAST
Running
Parameterset 1
time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e10E-6_i3.blast' -h 10E-6 -j 3 -C './reference_i3_e10E-6.chk'
real 37m56.447s
user 14m27.620s
sys 0m54.620s
Parameterset 2
time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e005_i3.blast' -h 0.005 -j 3 -C './reference_i3_e005.chk'
real 37m41.487s
user 14m42.850s
sys 0m52.370s
Parameterset 3
time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e005_i5.blast' -h 0.005 -j 5 -C './reference_i5_e005.chk'
real 62m22.175s
user 26m25.410s
sys 1m20.700s
Parameterset 4
time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e10E-6_i5.blast' -h 10E-6 -j 5 -C './reference_i5_e10E-6.chk'
real 61m59.284s
user 25m55.920s
sys 1m21.620s
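The four parameter sets are the combinations of two E-value thresholds (`-h`) and two iteration counts (`-j`), so they can be generated in a loop. The sketch below reuses the commands and file names from above; `psiblast_grid` and the `RUN` variable are hypothetical names, and setting `RUN=echo` only prints the commands instead of running them.

```shell
# RUN is a hypothetical hook: leave it empty to execute, set RUN=echo for a dry run.
RUN="${RUN:-}"

# Run blastpgp for every combination of iterations (-j) and E-value threshold (-h).
psiblast_grid() {
    for j in 3 5; do
        for h in 10E-6 0.005; do
            tag="e${h#0.}"    # 0.005 -> e005, 10E-6 -> e10E-6 (naming used above)
            $RUN blastpgp -d /data/nr/nr -i ./reference.fasta \
                -o "./reference_psi_${tag}_i${j}.blast" \
                -h "$h" -j "$j" \
                -C "./reference_i${j}_${tag}.chk"
        done
    done
}

# Dry run: RUN=echo; psiblast_grid
```

The loop order differs from the numbering of the parameter sets above, but the commands are the same.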
Results
HHSearch
Installation
Preparing the HHM-Database
- Download pdb70_29May10.hhm.tar.gz from ftp://ftp.tuebingen.mpg.de/pub/protevo/HHsearch/databases/
- Unzip the archive
- Make a database: cat *.hhm >> pdb70.db
- Move the db to an appropriate directory: sudo mv pdb70.db ../pdb70.db
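Two caveats about the concatenation step: `>>` appends, so running it twice duplicates every model, and with a very large number of files the `*.hhm` glob could in principle exceed the shell's argument length limit. A more defensive variant might look like this sketch, where `build_hhm_db` is a name of our own and GNU `find`, `sort` and `xargs` are assumed.

```shell
# build_hhm_db SRC_DIR OUT_FILE
# Concatenate all *.hhm files in SRC_DIR into OUT_FILE, overwriting any old copy.
# Assumes GNU find/sort/xargs (for -print0, -z, -0 and -r).
build_hhm_db() {
    hhm_src="$1"
    hhm_out="$2"
    find "$hhm_src" -maxdepth 1 -type f -name '*.hhm' -print0 \
        | sort -z \
        | xargs -0 -r cat > "$hhm_out"
}

# Example call: build_hhm_db . ../pdb70.db
```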
Configure HHSearch-Tools
The HHSearch manual advises adding secondary structure information to the multiple alignment used as the query, so the addpsipred script that ships with HHSearch had to be run. The script was not configured in the virtual box, and several parameters had to be adjusted.
Changes in /apps/bin/addpsipred:
my $psipreddir="/apps/psipred_2.5";
my $ncbidir="/apps/blast_old/bin";
my $perl="/apps/bin";
my $dummydb="/home/student/tmp";
Copy /apps/bin/reformat to /apps/bin/reformat.pl
Running
Parameterset 1
time hhsearch -i reference.fasta -d /data/hmm/pdb70.db -b 500 -o reference_simple.hhsearch
real 8m33.171s
user 5m14.530s
sys 0m3.510s
Parameterset 2
alignblast reference_psi_e10E-6_i3.blast reference_psi_e10E-6_i3.a3m
addpsipred /home/student/workspace/reference_psi_e10E-6_i3.a3m
time hhsearch -i reference_psi_e10E-6_i3.a3m -d /data/hmm/pdb70.db -o reference_psi_e10E-6_i3.hhsearch
real 16m27.258s
user 7m47.220s
sys 0m6.290s
Parameterset 3
alignblast reference_psi_e005_i3.blast reference_psi_e005_i3.a3m
addpsipred /home/student/workspace/reference_psi_e005_i3.a3m
time hhsearch -i reference_psi_e005_i3.a3m -d /data/hmm/pdb70.db -o reference_psi_e005_i3.hhsearch
real 16m7.216s
user 7m41.840s
sys 0m5.570s
Parameterset 4
alignblast reference_psi_e005_i5.blast reference_psi_e005_i5.blast.a3m
addpsipred /home/student/workspace/reference_psi_e005_i5.blast.a3m
time hhsearch -i reference_psi_e005_i5.blast.a3m -d /data/hmm/pdb70.db -o reference_psi_e005_i5.blast.hhsearch
real 7m49.907s
user 7m15.310s
sys 0m4.320s
Parameterset 5
alignblast reference_psi_e10E-6_i5.blast reference_psi_e10E-6_i5.a3m
addpsipred /home/student/workspace/reference_psi_e10E-6_i5.a3m
time hhsearch -i reference_psi_e10E-6_i5.a3m -d /data/hmm/pdb70.db -o reference_psi_e10E-6_i5.hhsearch
real 8m10.730s
user 7m33.190s
sys 0m5.390s
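Parameter sets 2 to 5 repeat the same three steps (alignblast, addpsipred, hhsearch) for each PSI-BLAST output, so they could be wrapped in a loop. This is a sketch under assumptions: `hh_pipeline` and `RUN` are hypothetical names, the commands are taken from the runs above, and all files are assumed to sit in the current working directory.

```shell
# RUN is a hypothetical hook: leave it empty to execute, set RUN=echo for a dry run.
RUN="${RUN:-}"

# hh_pipeline BASENAME...
# For each BASENAME.blast: build the a3m alignment, add the PSIPRED
# secondary structure, then search the pdb70 HMM database.
hh_pipeline() {
    for base in "$@"; do
        $RUN alignblast "$base.blast" "$base.a3m"
        $RUN addpsipred "$base.a3m"
        $RUN hhsearch -i "$base.a3m" -d /data/hmm/pdb70.db -o "$base.hhsearch"
    done
}

# Example call:
# hh_pipeline reference_psi_e10E-6_i3 reference_psi_e005_i3 reference_psi_e005_i5 reference_psi_e10E-6_i5
```

Note that this names the output of the third basename `reference_psi_e005_i5.a3m`, whereas parameter set 4 above used `reference_psi_e005_i5.blast.a3m`.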
Comparing the Results
HSSP - Some Positives
The HSSP entry for PAH was retrieved from: http://mrs.cmbi.ru.nl/mrs-5/entry?db=hssp&id=2pah&q=phenylalanine%20hydroxylase
HSSP - More Positives
HHSearch was run against a PDB-based set, whereas nr was used for BLAST. nr contains entries from Swiss-Prot, RefSeq, PIR, PRF, PDB and GenBank CDS translations, while HSSP contains only Swiss-Prot entries. A mapping between the Swiss-Prot entries and the entries of the other databases is therefore necessary. For this purpose we created a Java tool: