Protocol search

From Bioinformatikpedia
Revision as of 16:03, 6 May 2012 by Angermue (talk | contribs)

Sources

The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search

Calling blastpgp

File:Run blastpgp.sh is the script we used to run several iterations of blastpgp with P04062.seq as the query sequence. It takes the number of iterations, the inclusion threshold, and the database as input arguments. It creates a BLAST output file and a BLAST checkpoint file, which holds the PSSM used in the last search iteration:

#!/bin/bash

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
NAME=`basename $SEQ`
NAME=${NAME%.*}
j=${1:-1}
h=${2:-2e-3}
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' $DIR $NAME `basename $DB` $j $h)

/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -d $DB -a 2 -e 10 -v 10000 -b 10000 -j $j -h $h \
  -o $DIRBASENAME.bla -C $DIRBASENAME.chk > $DIRBASENAME.out
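The output prefix matters later, because the checkpoint file name is passed on to the jumpstart script. The following is a minimal sketch of the naming scheme only; out_basename is a hypothetical helper and the paths are placeholders, neither is part of the original scripts. It shows that printf's %g turns 2e-3 into 0.002 but leaves 1e-10 unchanged:

```shell
#!/bin/bash
# Sketch: rebuild the output prefix the same way run_blastpgp.sh does.
# out_basename is a hypothetical helper; the paths below are placeholders.
out_basename() {
  local dir=$1 seq=$2 db=$3 j=$4 h=$5
  local name
  name=$(basename "$seq")   # e.g. P04062.seq
  name=${name%.*}           # strip the extension -> P04062
  printf '%s/blastpgp_i%s_d%s_j%d_h%g\n' "$dir" "$name" "$(basename "$db")" "$j" "$h"
}

out_basename /tmp/search /data/P04062.seq /db/big_80 2 2e-3
out_basename /tmp/search /data/P04062.seq /db/big_80 10 1e-10
```

Note that the formatting of %g can depend on the numeric locale; the file names listed on this page correspond to a locale that uses a decimal point.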

The resulting checkpoint file was then used to jumpstart blastpgp via File:Run blastpgp chk.sh:

#!/bin/bash

CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
DIRBASENAME=$(printf '%s_d%s' $CHK `basename $DB`)

/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -R $CHK -d $DB -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o $DIRBASENAME.bla > $DIRBASENAME.out

Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:

BLAST
 run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
PSI-BLAST_j2_h2e-3
 run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
PSI-BLAST_j2_h1e-10
 run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
PSI-BLAST_j10_h2e-3
 run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
PSI-BLAST_j10_h1e-10
 run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
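The four PSI-BLAST runs follow one pattern: a search against big_80, then a jumpstart from the resulting checkpoint file. As a rough sketch, the command pairs could be generated in a loop. This is a dry run that only prints the commands, and psiblast_commands is a hypothetical helper, not one of the original scripts:

```shell
#!/bin/bash
# Dry-run sketch: print the two commands for each (iterations, threshold) pair.
# psiblast_commands is a hypothetical helper; nothing is executed here.
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
DB80=/mnt/project/pracstrucfunc12/data/big/big_80

psiblast_commands() {
  local j h chk
  for j in 2 10; do
    for h in 2e-3 1e-10; do
      # Checkpoint name as produced by run_blastpgp.sh (note the %g formatting).
      chk=$(printf '%s/blastpgp_iP04062_d%s_j%d_h%g.chk' \
        "$DIR" "$(basename "$DB80")" "$j" "$h")
      echo "run_blastpgp.sh $j $h $DB80"
      echo "run_blastpgp_chk.sh $chk"
    done
  done
}

psiblast_commands
```

The plain BLAST run is left out because it searches big directly and needs no checkpoint file.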

Calling HHblits

File:Run hhblits.sh is the caller script for HHblits. It takes the number of iterations and the E-value inclusion threshold as arguments:

#!/bin/bash

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
NAME=`basename $SEQ`
NAME=${NAME%.*}
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
n=${1:-2}
e=${2:-1e-3}
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' $DIR $NAME `basename $DB` $n $e)

/usr/bin/time -o $DIRBASENAME.time hhblits \
  -i $SEQ -d $DB -cpu 2 -E 10 -B 10000 -Z 10000 -n $n -e $e \
  -o $DIRBASENAME.hhr -oa3m $DIRBASENAME.a3m > $DIRBASENAME.out

The relevant HHblits results were obtained as follows:

HHblits
run_hhblits.sh 2 1e-3

Further tools

We used the following scripts to carry out the analysis:

File:Hits blastpgp.pl Lists the E-value and sequence identity for all hits in a blastpgp output file.
File:Hits hhblits.pl Lists the E-value and sequence identity for all hits in an HHblits output file.
File:Getids.sh Extracts the sequence identifiers of a hit list.
File:Filter evalue.sh Filters a hit list by E-value.
File:Filter id.sh Filters a hit list by sequence identity.
File:Uptr.sh Lists all >tr/>sp identifiers for a list of >UP20 identifiers.
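The scripts themselves are not reproduced on this page. To illustrate the kind of filtering involved, here is a minimal sketch of an E-value filter. It assumes a tab-separated hit list with the columns identifier, E-value and sequence identity; this layout, the function name filter_evalue and the sample hits are assumptions for illustration and may differ from what the linked scripts do:

```shell
#!/bin/bash
# Assumed hit-list layout (hypothetical): identifier<TAB>E-value<TAB>identity
filter_evalue() {
  # $1 = maximum E-value; the hit list is read from stdin.
  # Adding 0 forces a numeric comparison, so 1e-50 is not compared as a string.
  awk -F'\t' -v max="$1" '($2 + 0) <= (max + 0)'
}

# Made-up sample hits: only hitA and hitC pass the 1e-3 cutoff.
printf 'hitA\t1e-50\t98\nhitB\t0.5\t21\nhitC\t2e-4\t35\n' | filter_evalue 1e-3
```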