Protocol search

From Bioinformatikpedia
Revision as of 15:51, 6 May 2012 by Angermue

Sources

The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search

Calling blastpgp

File:Run blastpgp.sh is the script we used to run several iterations of blastpgp with P04062.seq as the query sequence. It creates a BLAST output file and a BLAST checkpoint file, which stores the PSSM used in the last search iteration:

#!/bin/bash

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
NAME=`basename $SEQ`
NAME=${NAME%.*}
j=${1:-1}
h=${2:-2e-3}
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' $DIR $NAME `basename $DB` $j $h)

/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -d $DB -a 2 -e 10 -v 10000 -b 10000 -j $j -h $h \
  -o $DIRBASENAME.bla -C $DIRBASENAME.chk > $DIRBASENAME.out
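The output prefix depends on how bash parameter expansion and printf format the arguments. In particular, %g renders 2e-3 as 0.002, which is why the checkpoint files used further down carry names such as blastpgp_iP04062_dbig_80_j2_h0.002.chk. The following is a minimal sketch that reproduces only the naming step; the function name output_prefix and the example paths under /tmp, /data and /db are hypothetical and serve purely as illustration.

```shell
#!/bin/bash

# Hypothetical helper: rebuilds the output prefix the same way run_blastpgp.sh does.
# Arguments: output directory, query file, database path, iterations (-j), threshold (-h).
output_prefix() {
  local dir=$1 seq=$2 db=$3 j=$4 h=$5
  local name
  name=$(basename "$seq")   # e.g. P04062.seq
  name=${name%.*}           # strip the extension -> P04062
  printf '%s/blastpgp_i%s_d%s_j%d_h%g' "$dir" "$name" "$(basename "$db")" "$j" "$h"
}

# Example paths are placeholders, not the ones used in the practical.
output_prefix /tmp/search /data/P04062.seq /db/big_80 2 2e-3; echo
# -> /tmp/search/blastpgp_iP04062_dbig_80_j2_h0.002
output_prefix /tmp/search /data/P04062.seq /db/big_80 10 1e-10; echo
# -> /tmp/search/blastpgp_iP04062_dbig_80_j10_h1e-10
```

Because the threshold is reformatted by %g, the file name may differ from the literal argument typed on the command line, which matters when the checkpoint path is passed on to a second script.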

The resulting checkpoint file was then used to jumpstart blastpgp via File:Run blastpgp chk.sh:

#!/bin/bash

CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
DIRBASENAME=$(printf '%s_d%s' $CHK `basename $DB`)

/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -R $CHK -d $DB -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o $DIRBASENAME.bla > $DIRBASENAME.out
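The jumpstart script assumes that its first argument names an existing checkpoint file. A minimal sketch of a guard that could precede the blastpgp call is given here; the function name require_checkpoint and its messages are hypothetical and not part of the original script.

```shell
#!/bin/bash

# Hypothetical guard: succeed only if a non-empty checkpoint file was given.
require_checkpoint() {
  local chk=$1
  if [ -z "$chk" ]; then
    echo "usage: run_blastpgp_chk.sh <checkpoint.chk>" >&2
    return 1
  fi
  if [ ! -s "$chk" ]; then
    echo "checkpoint file missing or empty: $chk" >&2
    return 1
  fi
}
```

Inside the script it could be called as `require_checkpoint "$1" || exit 1` before CHK is used, so that a mistyped path fails immediately rather than after the database has been loaded.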

Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:

BLAST
 run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
PSI-BLAST_j2_h2e-3
 run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
PSI-BLAST_j2_h1e-10
 run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
PSI-BLAST_j10_h2e-3
 run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
PSI-BLAST_j10_h1e-10
 run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
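
The four PSI-BLAST runs differ only in the number of iterations and the inclusion threshold, so the command pairs above can also be generated in a loop. The following sketch prints the commands instead of executing them; the function name psiblast_commands is hypothetical, while the directory and database paths are taken from the listing above.

```shell
#!/bin/bash

DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
DB80=/mnt/project/pracstrucfunc12/data/big/big_80

# Hypothetical helper: print (do not run) the two commands for each
# combination of iterations (j) and inclusion threshold (h).
psiblast_commands() {
  local j h chk
  for j in 2 10; do
    for h in 2e-3 1e-10; do
      # %g reproduces the threshold formatting used in the file names (2e-3 -> 0.002).
      chk=$(printf '%s/blastpgp_iP04062_dbig_80_j%d_h%g.chk' "$DIR" "$j" "$h")
      echo "run_blastpgp.sh $j $h $DB80"
      echo "run_blastpgp_chk.sh $chk"
    done
  done
}

psiblast_commands
```

Printing the commands first makes it easy to compare them with the listing before anything is run; the output could then be piped to a shell, assuming both scripts are on the PATH.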

Calling HHblits

File:Run hhblits.sh is the caller script for HHblits. It takes the number of iterations and the E-value inclusion threshold as arguments:

#!/bin/bash

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
NAME=`basename $SEQ`
NAME=${NAME%.*}
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
n=${1:-2}
e=${2:-1e-3}
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' $DIR $NAME `basename $DB` $n $e)

/usr/bin/time -o $DIRBASENAME.time hhblits \
  -i $SEQ -d $DB -cpu 2 -E 10 -B 10000 -Z 10000 -n $n -e $e \
  -o $DIRBASENAME.hhr -oa3m $DIRBASENAME.a3m > $DIRBASENAME.out

The relevant HHblits results were obtained as follows:

HHblits
 run_hhblits.sh 2 1e-3