Protocol search
Sources
The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search
Calling blastpgp
File:Run blastpgp.sh is the script we used to run one or more iterations of blastpgp with P04062.seq as the query sequence. It takes the number of iterations, the inclusion threshold, and the database as input arguments and creates a BLAST output file as well as a BLAST checkpoint file, which stores the PSSM used in the last search iteration:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
# Sequence name without directory and file extension
NAME=`basename $SEQ`
NAME=${NAME%.*}
# Arguments: iterations, inclusion threshold, database (with defaults)
j=${1:-1}
h=${2:-2e-3}
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' $DIR $NAME `basename $DB` $j $h)
/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -d $DB -a 2 -e 10 -v 10000 -b 10000 -j $j -h $h \
  -o $DIRBASENAME.bla -C $DIRBASENAME.chk > $DIRBASENAME.out
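The output prefix encodes every search parameter, which is why the checkpoint files referenced later carry names such as `..._j2_h0.002.chk`. The following minimal sketch reproduces only the naming step; `make_prefix` and the `/tmp` and `/db` paths are hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Sketch: rebuild the output prefix the way run_blastpgp.sh does.
# make_prefix is a hypothetical helper introduced for illustration.
make_prefix() {
    local dir=$1 seq=$2 db=$3 j=${4:-1} h=${5:-2e-3}
    local name
    name=$(basename "$seq")
    name=${name%.*}    # strip the file extension (.seq)
    # %g prints 2e-3 as 0.002 and keeps 1e-10 in exponent form
    printf '%s/blastpgp_i%s_d%s_j%d_h%g\n' "$dir" "$name" "$(basename "$db")" "$j" "$h"
}

make_prefix /tmp/search /data/P04062.seq /db/big_80 2 2e-3
make_prefix /tmp/search /data/P04062.seq /db/big_80 10 1e-10
```

Because `%g` normalizes the threshold, passing `2e-3` or `0.002` yields the same file name.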
The resulting checkpoint file was then used to jumpstart a single blastpgp iteration against the full big database via File:Run blastpgp chk.sh:
#!/bin/bash
# Argument: checkpoint file written by run_blastpgp.sh
CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
# Output prefix: checkpoint path plus the name of the searched database
DIRBASENAME=$(printf '%s_d%s' $CHK `basename $DB`)
/usr/bin/time -o $DIRBASENAME.time blastpgp \
  -i $SEQ -R $CHK -d $DB -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o $DIRBASENAME.bla > $DIRBASENAME.out
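The jumpstarted run writes its results next to the checkpoint file, with the database name appended. A small sketch of that naming rule, assuming a hypothetical helper `chk_prefix`:

```shell
#!/bin/bash
# Sketch: output prefix used by the jumpstarted search.
# chk_prefix is a hypothetical helper introduced for illustration.
chk_prefix() {
    local chk=$1 db=$2
    printf '%s_d%s\n' "$chk" "$(basename "$db")"
}

chk_prefix blastpgp_iP04062_dbig_80_j2_h0.002.chk /mnt/project/pracstrucfunc12/data/big/big
# prints blastpgp_iP04062_dbig_80_j2_h0.002.chk_dbig
```

The BLAST report of the final search is therefore the file ending in `.chk_dbig.bla`.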
Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:
- BLAST
run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
- PSI-BLAST_j2_h2e-3
run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
- PSI-BLAST_j2_h1e-10
run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
- PSI-BLAST_j10_h2e-3
run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
- PSI-BLAST_j10_h1e-10
run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
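The PSI-BLAST runs listed above all follow the same two-step pattern: build a PSSM on big_80, then jumpstart one iteration on big. As a hedged sketch, the command pairs could be generated with a loop over the iteration counts and inclusion thresholds. The function below only prints the commands (a dry run) and `print_psiblast_runs` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print the command pair for each PSI-BLAST parameter setting.
# Nothing is executed; the output can be inspected or piped to a shell.
SEARCH_DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
DB80=/mnt/project/pracstrucfunc12/data/big/big_80

print_psiblast_runs() {
    local j h chk
    for j in 2 10; do
        for h in 2e-3 1e-10; do
            # Checkpoint name as produced by run_blastpgp.sh
            chk=$(printf '%s/blastpgp_iP04062_d%s_j%d_h%g.chk' \
                "$SEARCH_DIR" "$(basename "$DB80")" "$j" "$h")
            echo "run_blastpgp.sh $j $h $DB80"
            echo "run_blastpgp_chk.sh $chk"
        done
    done
}

print_psiblast_runs
```

Generating the checkpoint path from the same parameters avoids mismatches between a run label and the file passed to the second step.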
Calling HHblits
File:Run hhblits.sh is the caller script for HHblits, which takes the number of iterations and the E-value inclusion threshold as arguments:
#!/bin/bash
# Query sequence and output directory
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
# Sequence name without directory and file extension
NAME=`basename $SEQ`
NAME=${NAME%.*}
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
# Arguments: iterations (first), E-value inclusion threshold (second)
n=${1:-2}
e=${2:-1e-3}
# Common prefix of all output files
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' $DIR $NAME `basename $DB` $n $e)
/usr/bin/time -o $DIRBASENAME.time hhblits \
  -i $SEQ -d $DB -cpu 2 -E 10 -B 10000 -Z 10000 -n $n -e $e \
  -o $DIRBASENAME.hhr -oa3m $DIRBASENAME.a3m > $DIRBASENAME.out
The relevant HHblits results were obtained as follows:
- HHblits
run_hhblits.sh 2 1e-3
Further tools
We used the following scripts for the analysis:
File:Hits blastpgp.pl | Lists the E-value and sequence identity for all hits in a blastpgp output file. |
File:Hits hhblits.pl | Lists the E-value and sequence identity for all hits in an hhblits output file. |
File:Getids.sh | Extracts the sequence identifiers of a hit list. |
File:Filter evalue.sh | Filters a hit list by E-value. |
File:Filter id.sh | Filters a hit list by sequence identity. |
File:Uptr.sh | Lists all >tr/>sp identifiers for a list of >UP20 identifiers. |
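To illustrate how such filters chain together, here is a minimal sketch. It does not reproduce the scripts listed above, whose interfaces are not shown here. It assumes a hypothetical tab-separated hit list with the columns identifier, E-value, and sequence identity in percent; the function names `filter_evalue`, `filter_identity`, and `get_ids` are likewise assumptions.

```shell
#!/bin/bash
# Sketch under an assumed hit-list format: <id> TAB <E-value> TAB <identity %>.
# All three functions read a hit list on stdin and write to stdout.

# Keep hits with an E-value at or below the given maximum.
filter_evalue() {
    awk -F'\t' -v max="$1" '($2 + 0) <= (max + 0)'
}

# Keep hits with a sequence identity at or above the given minimum.
filter_identity() {
    awk -F'\t' -v min="$1" '($3 + 0) >= (min + 0)'
}

# Print only the identifier column.
get_ids() {
    cut -f1
}

printf 'sp|A\t1e-50\t95\nsp|B\t0.5\t30\ntr|C\t1e-5\t45\n' \
    | filter_evalue 1e-3 | filter_identity 40 | get_ids
# prints sp|A and tr|C, one per line
```

Adding `+ 0` forces awk to compare the fields numerically, so E-values in exponent notation such as `1e-50` are ordered correctly.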