Fabry:Sequence alignments (sequence searches and multiple alignments)/Journal

From Bioinformatikpedia

Revision as of 22:12, 6 May 2012

Please see Task 2 Results for our results on this topic. Please also see Task 2 Scripts for the scripts we used.


Sequence searches

Blast

We searched the "big80" database with Blast and processed the output with the following commands:

# Blast search against big80, reporting up to 700 one-line descriptions (-v) and alignments (-b)
blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P06280.fasta -m 0 -o blastsearch_default.out -v 700 -b 700
# Post-processing with our helper scripts (see Task 2 Scripts)
perl extract_ids_blast.pl blastsearch_default.out
perl ../download-annotation.pl blastsearch_default_ids.txt
perl ../compare_GO_terms.pl P06280 blastsearch_default_ids_GOterms.tsv
perl parse_blast.pl blastsearch_default.out
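extract_ids_blast.pl is only available via Task 2 Scripts. The following is a hypothetical awk sketch of what such an ID-extraction step can look like. It prints the first column (the sequence ID) of every line in the one-line description list of a single blastall -m 0 report, that is, between the "Sequences producing significant alignments" header and the first alignment starting with ">". The function name extract_blast_ids is made up for illustration, and the real script may handle more cases.

```shell
#!/bin/bash
# Hypothetical sketch of the ID-extraction step (extract_ids_blast.pl is only
# linked): print the first column of every line in the one-line description
# list of a blastall -m 0 report.
extract_blast_ids() {
    awk '
    /^Sequences producing significant alignments/ { inlist = 1; next }
    inlist && /^>/   { exit }       # the pairwise alignments start here
    inlist && NF > 0 { print $1 }   # skip the blank lines around the list
    ' "$@"
}
```

Usage: `extract_blast_ids blastsearch_default.out > blastsearch_default_ids.txt`. For multi-round PSI-BLAST reports, this sketch stops after the first round's list.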

Psi-Blast

The following command was used to run Psi-Blast with AGAL (P06280) as the query sequence against big80. We ran it with two and with ten iterations, each combined with an e-value cut-off of 2e-3 and of 1e-9, giving four runs in total.

$ bash run_psi_blast.sh > run_psi_blast.log
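run_psi_blast.sh itself is only linked, so the following is a hypothetical sketch of what such a script could contain. It assumes the legacy blastpgp program, where -j sets the number of iterations and -h the e-value threshold for inclusion in the profile. With BLAST+, the corresponding psiblast options would be -num_iterations and -inclusion_ethresh. The database path is taken from the Blast command. The output naming psi_results_<iterations>_<e-value>.txt is an assumption, chosen so that it matches the psi_results_*.txt glob of the parsing step.

```shell
#!/bin/bash
# Hypothetical sketch of run_psi_blast.sh (the original is only linked).
# Assumes legacy blastpgp: -j = number of iterations,
# -h = e-value threshold for inclusion in the PSI-BLAST profile.
DB=/mnt/project/pracstrucfunc12/data/big/big_80
QUERY=P06280.fasta

# Output file for one parameter combination, e.g. psi_results_2_2e-3.txt
psi_out_name() {
    echo "psi_results_${1}_${2}.txt"
}

# Run a single search: run_psi_blast <iterations> <e-value>
run_psi_blast() {
    blastpgp -d "$DB" -i "$QUERY" -j "$1" -h "$2" -o "$(psi_out_name "$1" "$2")"
}

# Run all four combinations and time each one. The bash `time` keyword
# reports on stderr, so 2>&1 sends the timings into the redirected stdout.
run_psi_blast_grid() {
    local iterations evalue
    for iterations in 2 10; do
        for evalue in 2e-3 1e-9; do
            echo "Iterations: $iterations  E-value: $evalue"
            { time run_psi_blast "$iterations" "$evalue"; } 2>&1
        done
    done
}
```

Usage: `run_psi_blast_grid > run_psi_blast.log`. Without the `2>&1` inside the loop, the log would only contain the parameter lines and the timings would go to the terminal.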

The log file contains the runtimes of the four Psi-Blast runs:

Iterations  E-value cut-off  Runtime
2           2e-3             3m4.616s
2           1e-9             3m9.002s
10          2e-3             15m20.813s
10          1e-9             15m37.960s
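The runtimes are given in the "XmY.YYYs" format printed by bash's `time`. To compare them, here is a small sketch that converts the values to seconds with awk and computes how much longer the 10-iteration runs took than the 2-iteration runs. The helper names to_seconds and ratio are hypothetical. The runtime values are the ones from the table.

```shell
#!/bin/bash
# Hypothetical helpers to compare the Psi-Blast runtimes from the table.
# to_seconds: convert a bash `time` value such as 15m20.813s to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        m = 0; s = t
        i = index(t, "m")
        if (i > 0) { m = substr(t, 1, i - 1); s = substr(t, i + 1) }
        sub(/s$/, "", s)
        printf "%.3f\n", m * 60 + s
    }'
}

# ratio <a> <b>: b / a with two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

# E-value cut-off, runtime with 2 iterations, runtime with 10 iterations
for row in "2e-3 3m4.616s 15m20.813s" "1e-9 3m9.002s 15m37.960s"; do
    read -r evalue two ten <<< "$row"
    echo "$evalue: $(ratio "$(to_seconds "$two")" "$(to_seconds "$ten")")x"
done
```

This prints `2e-3: 4.99x` and `1e-9: 4.96x`. In other words, the 10-iteration runs took roughly five times as long as the 2-iteration runs for both cut-offs.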

Afterwards, we parsed the Psi-Blast output to collect the following for every hit of the last iteration: the e-value, the sequence identity, the coverage of the longer of the two aligned sequences, and the alignment length. When a hit had more than one alignment, we used the first one, which is also the one listed in the short result output.

$ for i in psi_results_*.txt; do
    perl parse_psiblast.pl "$i" > "${i%.*}.stats"
  done
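parse_psiblast.pl is only linked, so the following hypothetical awk sketch shows one way to implement the parsing described above. It assumes the plain-text pairwise output of legacy blastpgp, which includes:

- "Results from round N" headers,
- "(N letters)" for the query length,
- "Length = N" for each hit,
- " Score = ..., Expect = E" and " Identities = a/b (p%)" for each alignment,
- "Query:" and "Sbjct:" lines.

Coverage is taken as the aligned span of the longer sequence divided by its length. The alignment length is the denominator of the Identities line. The function name and the output columns are our own choice, not those of the original script.

```shell
#!/bin/bash
# Hypothetical sketch of the parsing step (parse_psiblast.pl is only linked).
# Assumes the plain-text output of legacy blastpgp. Prints one tab-separated
# line per hit of the LAST round, using only the first alignment of each hit:
#   id  e-value  identity(%)  alignment_length  coverage_of_longer_sequence
parse_psiblast_sketch() {
    awk '
    function abs(x) { return x < 0 ? -x : x }
    function flush(   longer, span) {
        if (id == "") return
        # Coverage: aligned span of the longer sequence divided by its length.
        if (qlen >= slen) { longer = qlen; span = abs(qe - qs) + 1 }
        else              { longer = slen; span = abs(se - ss) + 1 }
        cov = (longer > 0) ? sprintf("%.3f", span / longer) : "NA"
        out[++n] = id "\t" ev "\t" pid "\t" alen "\t" cov
        id = ""; hsp = 0
    }
    # Query length from the report header, e.g. "(429 letters)".
    !qlen && /letters\)/ { s = $0; gsub(/[(),]/, "", s); split(s, f, " "); qlen = f[1] + 0 }
    # A new round starts: discard the hits collected for the previous one.
    /^Results from round/ { flush(); n = 0 }
    /^>/ {
        flush()
        id = substr($1, 2); slen = 0; hsp = 0
        ev = pid = alen = qs = qe = ss = se = ""
    }
    id != "" && !slen && /^ *Length *=/ { slen = $NF + 0 }
    # Only the first alignment (HSP) of a hit is used.
    id != "" && /^ *Score =/ {
        if (++hsp == 1 && match($0, /Expect[^=]*= */)) {
            ev = substr($0, RSTART + RLENGTH); sub(/,.*/, "", ev)
        }
    }
    hsp == 1 && /^ *Identities =/ {
        split($3, a, "/"); alen = a[2]          # alignment length = denominator
        pid = $4; gsub(/[(%),]/, "", pid)       # "(45%)," -> 45
    }
    hsp == 1 && /^Query:/ { if (qs == "") qs = $2; qe = $NF }
    hsp == 1 && /^Sbjct:/ { if (ss == "") ss = $2; se = $NF }
    END { flush(); for (i = 1; i <= n; i++) print out[i] }
    ' "$@"
}
```

Usage, in the same loop as above: `parse_psiblast_sketch "$i" > "${i%.*}.stats"`.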

HHblits / HHsearch

We searched the uniprot20 database (uniprot20_current) with HHblits twice: once with the default number of iterations and once with the maximum possible number of iterations (8). We used the following commands:

# Run 1: default number of iterations
time hhblits -i ../P06280.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 0.003 -o hhblits_default.out -E 0.003 -z 700
./extract_ids_hhblits.sh hhblits_default.out
perl ../download-annotation.pl hhblits_default_ids.txt
perl ../compare_GO_terms.pl P06280 hhblits_default_ids_GOterms.tsv
perl parse_hhblits.pl hhblits_default.out

# Run 2: maximum number of iterations (-n 8)
time hhblits -i ../P06280.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 0.003 -o hhblits_n8_neu.out -E 0.003 -n 8 -z 800 -b 800
./extract_ids_hhblits.sh hhblits_n8_neu.out
perl ../download-annotation.pl hhblits_n8_neu_ids.txt
perl ../compare_GO_terms.pl P06280 hhblits_n8_neu_ids_GOterms.tsv
perl parse_hhblits.pl hhblits_n8_neu.out

R CMD BATCH hist_hhblits.R
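extract_ids_hhblits.sh is not reproduced here. Below is a hypothetical sketch of the same step. It assumes the usual HHblits .hhr result layout: the hit list starts after the header line beginning with "No Hit" and ends at the first blank line, and the template name is in the second column. The function name extract_hhr_ids is made up for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of extract_ids_hhblits.sh: print the template name
# (2nd column) of every entry in the hit list of an HHblits .hhr file.
extract_hhr_ids() {
    awk '
    /^ *No Hit/       { inlist = 1; next }   # header line of the hit list
    inlist && NF == 0 { exit }               # the hit list ends at the first blank line
    inlist            { print $2 }
    ' "$@"
}
```

Usage: `extract_hhr_ids hhblits_default.out > hhblits_default_ids.txt`.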

Comparison

The Venn diagrams were created with VENNY: Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.

R CMD BATCH all_Evalues.R
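The region sizes shown in a two-set Venn diagram can also be computed directly from two ID lists, for example blastsearch_default_ids.txt and hhblits_default_ids.txt. Here is a minimal sketch with sort and comm. It assumes one ID per line and that both lists use the same ID naming. This is not guaranteed, because the Blast hits come from big80 and the HHblits hits from uniprot20, so some normalization may be needed first. The function name venn_counts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: sizes of the three Venn regions for two ID lists.
# Prints: <only in first> TAB <only in second> TAB <in both>
venn_counts() {
    local a=$1 b=$2 only_a only_b both
    # comm needs both inputs sorted with the same collation; -u drops duplicate IDs.
    only_a=$(LC_ALL=C comm -23 <(LC_ALL=C sort -u "$a") <(LC_ALL=C sort -u "$b") | wc -l)
    only_b=$(LC_ALL=C comm -13 <(LC_ALL=C sort -u "$a") <(LC_ALL=C sort -u "$b") | wc -l)
    both=$(LC_ALL=C comm -12 <(LC_ALL=C sort -u "$a") <(LC_ALL=C sort -u "$b") | wc -l)
    # $(( )) strips the padding some wc implementations add.
    printf '%d\t%d\t%d\n' "$((only_a))" "$((only_b))" "$((both))"
}
```

Usage: `venn_counts blastsearch_default_ids.txt hhblits_default_ids.txt`.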


Multiple sequence alignments

Results

We generated the multiple sequence alignments with the following commands from our bash script calculate_msas.sh. The pictures were created with Jalview.

$ clustalw -infile="<filename>.fasta" -outfile="msa/clustalw_<filename>.msa" &

$ muscle -in "<filename>.fasta" -out "msa/muscle_<filename>.msa" &

$ /mnt/opt/T-Coffee/bin/t_coffee -seq "<filename>.fasta" -outfile "msa/tcoffe_<filename>.msa" &

$ /mnt/opt/T-Coffee/bin/t_coffee -seq "<filename>.fasta" -method sap_pair -template_file "<filename>.pdb" \
    -outfile "msa/3Dcoffee_<filename>.msa" &
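calculate_msas.sh itself is not shown on this page. Here is a hypothetical sketch of how the four commands could be wrapped for several input files. The loop over files, the helper names msa_out and run_msas, and the assumption that each <filename>.fasta has a matching <filename>.pdb for 3D-Coffee are all ours. Because every aligner is started in the background with &, the script has to wait for all jobs before the alignments are used any further.

```shell
#!/bin/bash
# Hypothetical sketch of calculate_msas.sh; the four aligner calls follow the
# commands listed on this page, the loop over input files is an assumption.

# msa_out <method> <fasta>  ->  msa/<method>_<fasta name without .fasta>.msa
msa_out() {
    echo "msa/${1}_$(basename "$2" .fasta).msa"
}

run_msas() {
    local fasta
    mkdir -p msa
    for fasta in "$@"; do
        clustalw -infile="$fasta" -outfile="$(msa_out clustalw "$fasta")" &
        muscle -in "$fasta" -out "$(msa_out muscle "$fasta")" &
        /mnt/opt/T-Coffee/bin/t_coffee -seq "$fasta" -outfile "$(msa_out tcoffe "$fasta")" &
        # 3D-Coffee only if a structure file <name>.pdb exists next to the FASTA file
        if [[ -f "${fasta%.fasta}.pdb" ]]; then
            /mnt/opt/T-Coffee/bin/t_coffee -seq "$fasta" -method sap_pair \
                -template_file "${fasta%.fasta}.pdb" -outfile "$(msa_out 3Dcoffee "$fasta")" &
        fi
    done
    # All aligners run in the background (&): wait for every job so that
    # later steps such as countAllGaps.sh only see finished .msa files.
    wait
}
```

Usage, with hypothetical input files: `run_msas set1.fasta set2.fasta`. The file check before 3D-Coffee is a design choice of the sketch. It skips the structure-based alignment for inputs without a structure instead of letting T-Coffee fail.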


We counted the number of gaps and conserved columns with the Perl script countGaps.pl. A small wrapper script, countAllGaps.sh, runs countGaps.pl on all .msa files in the msa folder:

#!/bin/bash

for file in msa/*.msa; do
	perl countGaps.pl "$file" > "${file%.*}.counts"
done
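countGaps.pl itself is not shown. Below is a minimal sketch of such a count, with several assumptions of our own:

- The alignment is in FASTA format. MUSCLE writes FASTA by default, while ClustalW and T-Coffee write Clustal format unless told otherwise.
- A gap is a '-' character.
- A column counts as conserved when every sequence has the same residue there.

count_gaps_fasta is a hypothetical name, and these definitions may differ from the ones used in countGaps.pl.

```shell
#!/bin/bash
# Hypothetical sketch in the spirit of countGaps.pl, for FASTA alignments.
# Prints the total number of gap characters ('-') and the number of columns
# in which all sequences carry the same residue (columns with a gap never count).
count_gaps_fasta() {
    awk '
    /^>/ { n++; next }
    {
        line = toupper($0)
        gsub(/[ \t\r]/, "", line)       # drop whitespace and DOS line endings
        seq[n] = seq[n] line
    }
    END {
        gaps = 0; conserved = 0
        for (i = 1; i <= n; i++) gaps += gsub(/-/, "-", seq[i])   # gsub returns the match count
        len = length(seq[1])
        for (c = 1; c <= len; c++) {
            r = substr(seq[1], c, 1)
            same = (r != "-")
            for (i = 2; i <= n && same; i++)
                if (substr(seq[i], c, 1) != r) same = 0
            conserved += same
        }
        printf "gaps\t%d\nconserved_columns\t%d\n", gaps, conserved
    }' "$@"
}
```

Usage: `count_gaps_fasta msa/muscle_<filename>.msa`.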