Difference between revisions of "Task 2 - Alignments with PAH Reference"

From Bioinformatikpedia

Revision as of 11:27, 22 May 2011

Task 2 - Alignments with PAH Reference

Sequence Searches

BLAST

Running

time sudo blastall -p blastp -d '/data/blast/nr/nr' -i ./reference.fasta -o './reference.blast' -b 500

real 11m30.762s
user 3m11.440s
sys 0m12.250s

Results

FASTA

Installation

  • Used VirtualBox with Linux.

Running

time ./fasta36 /home/student/reference.fasta /data/nr/nr

interactive:

  • Enter filename for results []: /home/student/reference.fasta_search
  • How many scores do you want to see: 500
  • More scores? 0
  • Display alignments also? (y/n) [n] y
  • number of alignments [500]? 500

real 10m13.878s
user 7m26.270s
sys 0m20.230s

Results

PSI-BLAST

Running

Parameterset 1

time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e10E-6_i3.blast' -h 10E-6 -j 3 -C './reference_i3_e10E-6.chk'

real 37m56.447s
user 14m27.620s
sys 0m54.620s

Parameterset 2

time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e005_i3.blast' -h 0.005 -j 3 -C './reference_i3_e005.chk'

real 37m41.487s
user 14m42.850s
sys 0m52.370s

Parameterset 3

time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e005_i5.blast' -h 0.005 -j 5 -C './reference_i5_e005.chk'

real 62m22.175s
user 26m25.410s
sys 1m20.700s

Parameterset 4

time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_e10E-6_i5.blast' -h 10E-6 -j 5 -C './reference_i5_e10E-6.chk'

real 61m59.284s
user 25m55.920s
sys 1m21.620s
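
The four PSI-BLAST runs differ only in the E-value threshold (-h) and the number of iterations (-j). A minimal sketch of how the command lines could be generated instead of typed by hand is given below. The helper name psiblast_cmd is hypothetical and not part of the original workflow; the flags, paths, and file-name pattern are copied from the commands above. It only prints the commands and does not run blastpgp.

```shell
#!/bin/sh
# psiblast_cmd is a hypothetical helper (not from the original page).
# It prints, but does not run, the blastpgp command for one parameter set.
psiblast_cmd() (
    evalue=$1      # threshold passed to -h, e.g. 10E-6 or 0.005
    iters=$2       # number of iterations passed to -j
    # File-name label as used on this page: 0.005 -> e005, 10E-6 -> e10E-6
    case $evalue in
        0.005) label=e005 ;;
        *)     label=e$evalue ;;
    esac
    printf "time blastpgp -d '/data/nr/nr' -i './reference.fasta' -o './reference_psi_%s_i%s.blast' -h %s -j %s -C './reference_i%s_%s.chk'\n" \
        "$label" "$iters" "$evalue" "$iters" "$iters" "$label"
)

# The four parameter sets, in the order listed above.
for combo in "10E-6 3" "0.005 3" "0.005 5" "10E-6 5"; do
    # $combo is deliberately unquoted so it splits into E-value and iterations.
    psiblast_cmd $combo
done
```

Note that in scientific notation 10E-6 equals 1e-5, not 1e-6, which is worth keeping in mind when comparing the two thresholds.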

Results

HHSearch

Installation

Preparing the HHM-Database
Configure HHSearch-Tools

The HHSearch manual advises adding secondary-structure information to the multiple alignment used as the query. It was therefore necessary to run the addpsipred script shipped with HHSearch. This script was not configured in the VirtualBox image, so several parameters had to be adjusted.

  • Changes in /apps/bin/addpsipred:

my $psipreddir="/apps/psipred_2.5";
my $ncbidir="/apps/blast_old/bin";
my $perl="/apps/bin";
my $dummydb="/home/student/tmp";

  • Copy /apps/bin/reformat to /apps/bin/reformat.pl
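
Before running addpsipred it may help to confirm that the configured directories actually exist. The following is a minimal sketch; check_dirs is a hypothetical helper that is not part of HHSearch, and the paths are the ones listed in the settings above.

```shell
#!/bin/sh
# check_dirs is a hypothetical helper (not part of HHSearch): it prints every
# argument that is not an existing directory and returns 1 if any is missing.
check_dirs() {
    _missing=0
    for _d in "$@"; do
        if [ ! -d "$_d" ]; then
            echo "missing directory: $_d"
            _missing=1
        fi
    done
    return "$_missing"
}

# Paths taken from the addpsipred settings above.
check_dirs /apps/psipred_2.5 /apps/blast_old/bin /apps/bin /home/student/tmp \
    || echo "adjust the settings in /apps/bin/addpsipred before running it"
```

On a machine without these directories the sketch simply lists them as missing; it does not modify anything.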


Running

Parameterset 1

time hhsearch -i ./reference.fasta -d /data/hmm/pdb70.db -b 500 -o ./reference_simple.hhsearch

real 8m33.171s
user 5m14.530s
sys 0m3.510s

Parameterset 2

alignblast reference_psi_e10E-6_i3.blast reference_psi_e10E-6_i3.a3m
addpsipred /home/student/workspace/reference_psi_e10E-6_i3.a3m
time hhsearch -i reference_psi_e10E-6_i3.a3m -d /data/hmm/pdb70.db -o reference_psi_e10E-6_i3.hhsearch

real 16m27.258s
user 7m47.220s
sys 0m6.290s

To run HHSearch with the profile of a PSI-BLAST search, the PSI-BLAST output has to be converted to a multiple alignment with the script alignblast. The performance of HHSearch can be increased by adding secondary-structure information to the multiple alignment. This can be done with the script addpsipred, which uses the secondary structure prediction of PSIPRED.
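
The three steps described above (convert, annotate, search) are the same for every PSI-BLAST result, so they can be sketched as one function. The name hh_pipeline and its dry-run behaviour are assumptions made for illustration; the commands it prints follow the pattern used for Parameterset 2, with the workspace path defaulting to the one used on this page.

```shell
#!/bin/sh
# hh_pipeline is a hypothetical wrapper around the three steps described above.
# It only prints the commands (a dry run) so they can be reviewed first.
hh_pipeline() (
    prefix=$1                               # e.g. reference_psi_e10E-6_i3
    workdir=${2:-/home/student/workspace}   # directory holding the .a3m file
    # 1. Convert the PSI-BLAST output to a multiple alignment (a3m).
    echo "alignblast ${prefix}.blast ${prefix}.a3m"
    # 2. Add the PSIPRED secondary-structure prediction to the alignment.
    echo "addpsipred ${workdir}/${prefix}.a3m"
    # 3. Search the pdb70 HMM database with the annotated alignment.
    echo "time hhsearch -i ${prefix}.a3m -d /data/hmm/pdb70.db -o ${prefix}.hhsearch"
)

hh_pipeline reference_psi_e10E-6_i3
```

The second argument is optional and only changes the directory passed to addpsipred.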

Parameterset 3

alignblast reference_psi_e005_i3.blast reference_psi_e005_i3.a3m
addpsipred /home/student/workspace/reference_psi_e005_i3.a3m
time hhsearch -i reference_psi_e005_i3.a3m -d /data/hmm/pdb70.db -o reference_psi_e005_i3.hhsearch

real 16m7.216s
user 7m41.840s
sys 0m5.570s

Parameterset 4

alignblast reference_psi_e005_i5.blast reference_psi_e005_i5.blast.a3m
addpsipred /home/student/workspace/reference_psi_e005_i5.blast.a3m
time hhsearch -i reference_psi_e005_i5.blast.a3m -d /data/hmm/pdb70.db -o reference_psi_e005_i5.blast.hhsearch

real 7m49.907s
user 7m15.310s
sys 0m4.320s

Parameterset 5

alignblast reference_psi_e10E-6_i5.blast reference_psi_e10E-6_i5.a3m
addpsipred /home/student/workspace/reference_psi_e10E-6_i5.a3m
time hhsearch -i reference_psi_e10E-6_i5.a3m -d /data/hmm/pdb70.db -o reference_psi_e10E-6_i5.hhsearch

real 8m10.730s
user 7m33.190s
sys 0m5.390s
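
The timings above are reported in the minutes-and-seconds format printed by time, which is awkward to compare directly. A minimal sketch for converting them to plain seconds follows; to_seconds is a hypothetical helper, and the two example values are the real times of HHSearch parameter sets 1 and 2.

```shell
#!/bin/sh
# to_seconds is a hypothetical helper: it converts a value in the format
# printed by time (for example 16m27.258s) to seconds with three decimals.
to_seconds() {
    # Split at "m", strip the trailing "s", then compute minutes * 60 + seconds.
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock (real) times reported above for HHSearch parameter sets 1 and 2.
to_seconds 8m33.171s     # prints 513.171
to_seconds 16m27.258s    # prints 987.258
```

The same function works for the user and sys values, so all runs can be put on one scale.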

Comparing the Results

HSSP - Some Positives

Getting the entry of PAH from HSSP: http://mrs.cmbi.ru.nl/mrs-5/entry?db=hssp&id=2pah&q=phenylalanine%20hydroxylase

HSSP - More Positives

HHSearch was run against a PDB set (pdb70), whereas BLAST was run against nr. nr contains Swiss-Prot, RefSeq, PIR, PRF, PDB and GenBank CDS translation entries, while HSSP contains only Swiss-Prot entries. A mapping between the Swiss-Prot entries and the other databases is therefore necessary. For this purpose we created a Java tool: