Protocol search

From Bioinformatikpedia
 

Latest revision as of 16:07, 6 May 2012

Sources

The data and scripts we used can be found in /mnt/home/student/angermue/mp/tasks/task02/search

Calling blastpgp

run_blastpgp.sh is the script we used to run several iterations of blastpgp with P04062.seq as the query sequence. It takes the number of iterations, the inclusion threshold, and the database as input arguments. It creates a BLAST output file and a BLAST checkpoint file, which stores the PSSM used in the last search iteration:

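#!/bin/bash
# Usage: run_blastpgp.sh [iterations] [inclusion threshold] [database]

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp
NAME=$(basename "$SEQ")
NAME=${NAME%.*}   # strip the file extension
j=${1:-1}         # number of iterations (default: 1)
h=${2:-2e-3}      # E-value inclusion threshold (default: 2e-3)
DB=${3:-/mnt/project/pracstrucfunc12/data/big/big_80}
# The output basename encodes query, database, iterations, and threshold
DIRBASENAME=$(printf '%s/blastpgp_i%s_d%s_j%d_h%g' "$DIR" "$NAME" "$(basename "$DB")" "$j" "$h")

/usr/bin/time -o "$DIRBASENAME.time" blastpgp \
  -i "$SEQ" -d "$DB" -a 2 -e 10 -v 10000 -b 10000 -j "$j" -h "$h" \
  -o "$DIRBASENAME.bla" -C "$DIRBASENAME.chk" > "$DIRBASENAME.out"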
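The naming scheme matters later, because the checkpoint file has to be passed by name to the second script. The following is a minimal sketch of how the basename is derived; blastpgp_basename is a hypothetical helper (not part of our scripts) and the paths in the example calls are placeholders:

```shell
#!/bin/bash
# Hypothetical helper that rebuilds the output basename the same way
# run_blastpgp.sh does. Arguments: output dir, query file, database,
# number of iterations, inclusion threshold.
blastpgp_basename() {
  local dir=$1 seq=$2 db=$3 j=$4 h=$5
  local name
  name=$(basename "$seq")
  name=${name%.*}   # drop the file extension, e.g. P04062.seq -> P04062
  # %g prints the threshold in its shortest form (2e-3 becomes 0.002)
  LC_ALL=C printf '%s/blastpgp_i%s_d%s_j%d_h%g\n' \
    "$dir" "$name" "$(basename "$db")" "$j" "$h"
}

# Placeholder paths, for illustration only
blastpgp_basename /tmp/out /data/P04062.seq /db/big_80 2 2e-3
blastpgp_basename /tmp/out /data/P04062.seq /db/big_80 10 1e-10
```

Because of the %g conversion, a threshold given as 2e-3 appears as h0.002 in the file name, while 1e-10 stays h1e-10.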

The resulting checkpoint file was then used to jumpstart blastpgp via run_blastpgp_chk.sh:

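#!/bin/bash
# Usage: run_blastpgp_chk.sh <checkpoint file>

CHK=$1
SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DB=/mnt/project/pracstrucfunc12/data/big/big
# Append the database name to the checkpoint path to form the output basename
DIRBASENAME=$(printf '%s_d%s' "$CHK" "$(basename "$DB")")

# -R restarts the search from the checkpoint (PSSM); -j 1 runs a single pass
/usr/bin/time -o "$DIRBASENAME.time" blastpgp \
  -i "$SEQ" -R "$CHK" -d "$DB" -a 2 -e 10 -v 10000 -b 10000 -j 1 \
  -o "$DIRBASENAME.bla" > "$DIRBASENAME.out"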

Altogether, the following commands were executed to obtain the relevant BLAST/PSI-BLAST results:

BLAST
 run_blastpgp.sh 1 2e-3 /mnt/project/pracstrucfunc12/data/big/big
PSI-BLAST_j2_h2e-3
 run_blastpgp.sh 2 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h0.002.chk
PSI-BLAST_j2_h1e-10
 run_blastpgp.sh 2 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j2_h1e-10.chk
PSI-BLAST_j10_h2e-3
 run_blastpgp.sh 10 2e-3 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h0.002.chk
PSI-BLAST_j10_h1e-10
 run_blastpgp.sh 10 1e-10 /mnt/project/pracstrucfunc12/data/big/big_80
 run_blastpgp_chk.sh /mnt/home/student/angermue/mp/tasks/task02/search/blastpgp/blastpgp_iP04062_dbig_80_j10_h1e-10.chk
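The four PSI-BLAST runs differ only in the number of iterations and the inclusion threshold, so the command pairs can also be generated in a loop. The sketch below is a dry run that only prints the commands; the function name print_psiblast_runs is an assumption made for illustration, and we did not actually run the searches this way:

```shell
#!/bin/bash
# Dry run: prints the PSI-BLAST command pairs instead of executing them.
DB80=/mnt/project/pracstrucfunc12/data/big/big_80
OUT=/mnt/home/student/angermue/mp/tasks/task02/search/blastpgp

print_psiblast_runs() {
  local j h chk
  for j in 2 10; do
    for h in 2e-3 1e-10; do
      # Checkpoint name as produced by run_blastpgp.sh (threshold via %g)
      chk=$(LC_ALL=C printf '%s/blastpgp_iP04062_d%s_j%d_h%g.chk' \
        "$OUT" "$(basename "$DB80")" "$j" "$h")
      echo "run_blastpgp.sh $j $h $DB80"
      echo "run_blastpgp_chk.sh $chk"
    done
  done
}

print_psiblast_runs
```

To execute the runs instead of printing them, the two echo lines would be replaced by the actual calls.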

Calling HHblits

run_hhblits.sh is the caller script for HHblits. It takes the number of iterations and the E-value inclusion threshold as arguments:

#!/bin/bash
# Usage: run_hhblits.sh [iterations] [inclusion threshold]

SEQ=/mnt/home/student/angermue/mp/data/P04062.seq
DIR=/mnt/home/student/angermue/mp/tasks/task02/search/hhblits
NAME=$(basename "$SEQ")
NAME=${NAME%.*}   # strip the file extension
DB=/mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current
n=${1:-2}         # number of iterations (default: 2)
e=${2:-1e-3}      # E-value inclusion threshold (default: 1e-3)
DIRBASENAME=$(printf '%s/hhblits_i%s_d%s_n%d_e%g' "$DIR" "$NAME" "$(basename "$DB")" "$n" "$e")

/usr/bin/time -o "$DIRBASENAME.time" hhblits \
  -i "$SEQ" -d "$DB" -cpu 2 -E 10 -B 10000 -Z 10000 -n "$n" -e "$e" \
  -o "$DIRBASENAME.hhr" -oa3m "$DIRBASENAME.a3m" > "$DIRBASENAME.out"

The relevant HHblits results were obtained as follows:

HHblits
run_hhblits.sh 2 1e-3
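The caller scripts read their parameters positionally and fall back to defaults when an argument is omitted. A minimal sketch of that idiom, with parse_args as a hypothetical function name used only for this illustration:

```shell
#!/bin/bash
# ${N:-default} expands to the N-th positional argument,
# or to the default if that argument is unset or empty.
parse_args() {
  local n=${1:-2}      # first argument: number of iterations
  local e=${2:-1e-3}   # second argument: inclusion threshold
  LC_ALL=C printf 'n=%d e=%g\n' "$n" "$e"
}

parse_args            # prints: n=2 e=0.001
parse_args 4          # prints: n=4 e=0.001
parse_args 4 1e-10    # prints: n=4 e=1e-10
```

Each parameter needs its own positional index; reading two parameters from the same index would silently make the second one a copy of the first.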

Further tools

We used the following scripts for the analysis:

hits_blastpgp.pl   Lists the E-value and sequence identity of every hit in a blastpgp output file.
hits_hhblits.pl    Lists the E-value and sequence identity of every hit in an HHblits output file.
getids.sh          Extracts the sequence identifiers from a hit list.
filter_evalue.sh   Filters a hit list by E-value.
filter_id.sh       Filters a hit list by sequence identity.
uptr.sh            Lists all >tr/>sp identifiers for a list of >UP20 identifiers.
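
The scripts themselves are not reproduced on this page. As an illustration of what an E-value filter might look like, here is a minimal sketch that assumes a tab-separated hit list with the columns identifier, E-value, and sequence identity. Both the column layout and the function name filter_by_evalue are assumptions, not the actual format or content of filter_evalue.sh:

```shell
#!/bin/bash
# Hypothetical E-value filter. Reads a tab-separated hit list
# (identifier, E-value, identity) on stdin and keeps the hits
# whose E-value is at most the given maximum.
filter_by_evalue() {
  local max=$1
  # Adding 0 forces a numeric comparison, so values such as 1e-50 work
  awk -F'\t' -v max="$max" '($2 + 0) <= (max + 0)'
}

# Made-up example hits, for illustration only
printf 'sp|A\t1e-50\t98\ntr|B\t0.5\t21\ntr|C\t2e-4\t35\n' | filter_by_evalue 1e-3
```

With the made-up input above, only the hits sp|A and tr|C pass the 1e-3 cutoff. A filter by sequence identity would follow the same pattern with the third column and a lower bound.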