Protocol search
From Bioinformatikpedia
Revision as of 16:51, 6 May 2012 by Angermue
Sources
The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search
Calling blastpgp
File:Run blastpgp.sh is the script we used to run several iterations of blastpgp with P04062.seq as the query sequence. It creates a BLAST output file and a BLAST checkpoint file that contains the PSSM used in the last search iteration:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
# Sequence name without directory and file extension
NAME=$(basename "$SEQ")
NAME=${NAME%.*}
# Arguments: iterations (-j), inclusion E-value (-h), database (-d)
j=${1:-1}
h=${2:-2e-3}
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' "$DIR" "$NAME" "$(basename "$DB")" "$j" "$h")
/usr/bin/time -o "$DIRBASENAME.time" blastpgp \
  -i "$SEQ" -d "$DB" -a 2 -e 10 -v 10000 -b 10000 -j "$j" -h "$h" \
  -o "$DIRBASENAME.bla" -C "$DIRBASENAME.chk" > "$DIRBASENAME.out"
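The output prefix determines the name of the checkpoint file that is passed on later. The following is a minimal sketch of how that prefix is formed, assuming bash's built-in printf; the function name blastpgp_prefix and the /tmp/search, /data and /db paths are hypothetical and only serve the illustration. Note that %g prints 2e-3 as 0.002 but leaves 1e-10 in exponent notation.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): rebuild the
# output prefix in the same way as the printf call in run_blastpgp.sh.
blastpgp_prefix() {
  local dir=$1 seq=$2 db=$3 j=$4 h=$5
  local name
  name=$(basename "$seq")   # strip the directory
  name=${name%.*}           # strip the file extension
  # LC_ALL=C keeps the decimal point independent of the user's locale
  LC_ALL=C printf '%s/blastpgp_i%s_d%s_j%d_h%g' \
    "$dir" "$name" "$(basename "$db")" "$j" "$h"
}

# Example calls with made-up directories
blastpgp_prefix /tmp/search /data/P04062.seq /db/big_80 2 2e-3; echo
blastpgp_prefix /tmp/search /data/P04062.seq /db/big_80 10 1e-10; echo
```

Appending .chk to the printed prefix gives the checkpoint file name for the corresponding run.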
The resulting checkpoint file was then used to jumpstart blastpgp via File:Run blastpgp chk.sh:
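#!/bin/bash
# Argument: checkpoint file written by run_blastpgp.sh
CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
# Output prefix: checkpoint path plus the name of the searched database
DIRBASENAME=$(printf '%s_d%s' "$CHK" "$(basename "$DB")")
# Single iteration (-j 1), restarted from the checkpoint (-R)
/usr/bin/time -o "$DIRBASENAME.time" blastpgp \
  -i "$SEQ" -R "$CHK" -d "$DB" -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o "$DIRBASENAME.bla" > "$DIRBASENAME.out"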
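Since the jumpstart depends on a checkpoint file from an earlier run, a check before calling blastpgp can catch a wrong path early. This is only a sketch of such a guard; the function require_checkpoint is hypothetical and was not part of the scripts we used.

```shell
#!/bin/bash
# Hypothetical guard (not in the original run_blastpgp_chk.sh):
# succeed only if the checkpoint file exists and is not empty.
require_checkpoint() {
  local chk=$1
  if [ ! -s "$chk" ]; then
    echo "checkpoint file missing or empty: $chk" >&2
    return 1
  fi
}

# Usage example with a temporary file standing in for a real .chk file
tmp=$(mktemp)
printf 'dummy' > "$tmp"
require_checkpoint "$tmp" && echo "checkpoint ok"
rm -f "$tmp"
require_checkpoint "$tmp" 2>/dev/null || echo "checkpoint rejected"
```

In a caller script the guard would typically be used as require_checkpoint "$CHK" || exit 1 before the blastpgp call.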
Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:
- BLAST
run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
- PSI-BLAST_j2_h2e-3
run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
- PSI-BLAST_j2_h1e-10
run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
- PSI-BLAST_j10_h2e-3
run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
- PSI-BLAST_j10_h1e-10
run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
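The four PSI-BLAST configurations above follow one pattern: a profile-building run on big_80, then a jumpstart with the resulting checkpoint file. The sketch below prints these command pairs from a nested loop as a dry run and does not execute anything. The function name psiblast_commands is hypothetical, and the checkpoint names assume the printf pattern of run_blastpgp.sh.

```shell
#!/bin/bash
# Dry-run sketch: print the PSI-BLAST command pairs instead of running them.
BIG_80=/mnt/project/pracstrucfunc12/data/big/big_80
OUT=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp

psiblast_commands() {
  local j h chk
  for j in 2 10; do            # number of iterations
    for h in 2e-3 1e-10; do    # inclusion E-value threshold
      # Checkpoint name as produced by run_blastpgp.sh (%g formatting)
      chk=$(LC_ALL=C printf '%s/blastpgp_iP04062_dbig_80_j%d_h%g.chk' "$OUT" "$j" "$h")
      echo "run_blastpgp.sh $j $h $BIG_80"
      echo "run_blastpgp_chk.sh $chk"
    done
  done
}

psiblast_commands
```

Piping the output to bash would run the searches, provided both scripts are on the PATH.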
Calling HHblits
File:Run hhblits.sh is the caller script for HHblits. It takes the number of iterations and the E-value inclusion threshold as arguments:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
# Sequence name without directory and file extension
NAME=$(basename "$SEQ")
NAME=${NAME%.*}
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
# Arguments: iterations (-n) and inclusion E-value (-e)
n=${1:-2}
e=${2:-1e-3}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' "$DIR" "$NAME" "$(basename "$DB")" "$n" "$e")
/usr/bin/time -o "$DIRBASENAME.time" hhblits \
  -i "$SEQ" -d "$DB" -cpu 2 -E 10 -B 10000 -Z 10000 -n "$n" -e "$e" \
  -o "$DIRBASENAME.hhr" -oa3m "$DIRBASENAME.a3m" > "$DIRBASENAME.out"
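The two arguments are optional positional parameters with defaults. As a minimal sketch of that mechanism, the hypothetical function hhblits_args below maps the first argument to the iteration count and the second to the inclusion threshold, and prints the options that would be passed to hhblits.

```shell
#!/bin/bash
# Hypothetical function illustrating positional parameters with defaults.
hhblits_args() {
  local n=${1:-2}      # number of iterations, default 2
  local e=${2:-1e-3}   # E-value inclusion threshold, default 1e-3
  echo "-n $n -e $e"
}

hhblits_args          # no arguments: both defaults apply
hhblits_args 3        # only the iteration count is overridden
hhblits_args 3 1e-5   # both values given explicitly
```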
The relevant HHblits results were obtained as follows:
- HHblits
run_hhblits.sh 2 1e-3