Protocol search
Sources
The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search
Calling blastpgp
run_blastpgp.sh is the script we used to run several iterations of blastpgp with P04062.seq as the query sequence. It takes the number of iterations, the inclusion threshold, and the database as input arguments. It creates a BLAST output file and a BLAST checkpoint file, which holds the PSSM used in the last search iteration:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
# Query name without directory and file extension
NAME=`basename $SEQ`
NAME=${NAME%.*}
# Arguments: iterations, inclusion threshold, database (with defaults)
j=${1:-1}
h=${2:-2e-3}
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' $DIR $NAME `basename $DB` $j $h)
/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -d $DB -a 2 -e 10 -v 10000 -b 10000 -j $j -h $h \
  -o $DIRBASENAME.bla -C $DIRBASENAME.chk > $DIRBASENAME.out
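The %g conversion in the printf call decides how the inclusion threshold appears in the output file names, which matters when the checkpoint path is passed on to a later step. The following is a minimal sketch of that naming scheme only. The helper blast_basename is hypothetical and not part of the original scripts, and forcing the C locale is an assumption made to keep the decimal separator stable.

```shell
#!/bin/bash
# Sketch: rebuild the output base name the way run_blastpgp.sh does.
# blast_basename is a hypothetical helper, not part of the original scripts.
blast_basename() {
    local LC_ALL=C    # assumption: force '.' as the decimal separator for %g
    local seq="$1" db="$2" j="$3" h="$4"
    local name
    name=$(basename "$seq")
    name=${name%.*}   # strip the file extension: P04062.seq -> P04062
    printf 'blastpgp_i%s_d%s_j%d_h%g\n' "$name" "$(basename "$db")" "$j" "$h"
}

blast_basename P04062.seq big_80 2 2e-3     # prints blastpgp_iP04062_dbig_80_j2_h0.002
blast_basename P04062.seq big_80 10 1e-10   # prints blastpgp_iP04062_dbig_80_j10_h1e-10
```

Note that 2e-3 is rendered as 0.002 while 1e-10 keeps its exponent form, which is why the checkpoint files carry both notations.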
The resulting checkpoint file was then used to jumpstart blastpgp via run_blastpgp_chk.sh:
#!/bin/bash
# Argument: checkpoint file written by run_blastpgp.sh
CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
# Output prefix: checkpoint path plus the name of the searched database
DIRBASENAME=$(printf '%s_d%s' $CHK `basename $DB`)
/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -R $CHK -d $DB -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o $DIRBASENAME.bla > $DIRBASENAME.out
Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:
- BLAST
run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
- PSI-BLAST_j2_h2e-3
run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
- PSI-BLAST_j2_h1e-10
run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
- PSI-BLAST_j10_h2e-3
run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
- PSI-BLAST_j10_h1e-10
run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
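The four PSI-BLAST settings follow one pattern: build the PSSM on big_80, then jumpstart a single search on big from the checkpoint file. As a rough sketch, the command pairs can be generated in a loop. The function below only prints the commands (a dry run) and does not execute them. The name list_psiblast_runs is a hypothetical helper, and the script assumes the checkpoint naming shown in the commands above.

```shell
#!/bin/bash
# Dry-run sketch: print (do not execute) the PSI-BLAST command pairs.
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
DB80=/mnt/project/pracstrucfunc12/data/big/big_80

# list_psiblast_runs is a hypothetical helper, not one of the original scripts.
list_psiblast_runs() {
    local LC_ALL=C    # assumption: keep the %g output locale-independent
    local j h
    for j in 2 10; do
        for h in 2e-3 1e-10; do
            # Step 1: iterate on big_80 and write the checkpoint file
            printf 'run_blastpgp.sh %d %s %s\n' "$j" "$h" "$DB80"
            # Step 2: jumpstart the search on big from that checkpoint
            printf 'run_blastpgp_chk.sh %s/blastpgp_iP04062_d%s_j%d_h%g.chk\n' \
                "$DIR" "$(basename "$DB80")" "$j" "$h"
        done
    done
}

list_psiblast_runs
```

Piping the output to a shell would run the searches in sequence; printing first makes it easy to check the checkpoint paths before starting the long-running jobs.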
Calling HHblits
run_hhblits.sh is the wrapper script for HHblits. It takes the number of iterations and the E-value inclusion threshold as arguments:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
# Query name without directory and file extension
NAME=`basename $SEQ`
NAME=${NAME%.*}
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
# Arguments: iterations, E-value inclusion threshold (with defaults)
n=${1:-2}
e=${2:-1e-3}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' $DIR $NAME `basename $DB` $n $e)
/usr/bin/time -o $DIRBASENAME.time hhblits \
  -i $SEQ -d $DB -cpu 2 -E 10 -B 10000 -Z 10000 -n $n -e $e \
  -o $DIRBASENAME.hhr -oa3m $DIRBASENAME.a3m > $DIRBASENAME.out
The relevant HHblits results were obtained as follows:
- HHblits
run_hhblits.sh 2 1e-3
Further tools
We used the following scripts to carry out the analysis:
hits_blastpgp.pl | Lists the E-value and sequence identity for all hits in a blastpgp output file. |
hits_hhblits.pl | Lists the E-value and sequence identity for all hits in an HHblits output file. |
getids.sh | Extracts the sequence identifiers of a hit list. |
filter_evalue.sh | Filters a hit list by E-value. |
filter_id.sh | Filters a hit list by sequence identity. |
uptr.sh | Lists all >tr/>sp identifiers for a list of >UP20 identifiers. |
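The helper scripts themselves are not reproduced here. To illustrate how the filtering steps chain together, the sketch below assumes a hit list with one hit per line and whitespace-separated columns for identifier, E-value, and sequence identity in percent. That format, the function names, and the example hits are all assumptions for illustration, not the actual scripts or results.

```shell
#!/bin/bash
# Hypothetical hit-list format (an assumption): "<id> <E-value> <identity %>" per line.

# Keep hits whose E-value is at most $1; reads the hit list from stdin.
filter_by_evalue() {
    awk -v max="$1" '$2 + 0 <= max + 0'
}

# Keep hits whose sequence identity is at least $1 percent.
filter_by_identity() {
    awk -v min="$1" '$3 + 0 >= min + 0'
}

# Print only the identifier column.
get_ids() {
    awk '{ print $1 }'
}

# Example with made-up hits (not real results): prints hitA and hitB
printf '%s\n' 'hitA 1e-50 62' 'hitB 3e-4 28' 'hitC 0.5 19' |
    filter_by_evalue 1e-3 | filter_by_identity 25 | get_ids
```

Adding 0 to each field forces awk to compare numerically, so E-values in scientific notation such as 1e-50 are not compared as strings.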