Difference between revisions of "Sequence Alignments HEXA"

From Bioinformatikpedia

Revision as of 10:14, 23 May 2011

Sequence Alignments

Sequence Searches:

  • FASTA

/bin/fasta36 seq.fasta /data/blast/nr/nr > fasta_out.txt

  • BLAST

blastall -p blastp -d /data/blast/nr/nr -i mult_seq.fasta > blast_out.txt

  • PSIBLAST

blastpgp -i seq.fasta -j <#iterations> -h <e-value threshold> -d /data/blast/nr/nr > psiblast_out.txt

  • HHSearch

For HHSearch, we used the online server.


Result Statistics

We wrote a script that shows the distribution of the E-value and the identity, as well as the different aligned sequences. To analyse the overlap between the methods, we drew a Venn diagram (with http://bioinfogp.cnb.csic.es/tools/venny/index.html). We compared BLAST, FASTA and PsiBlast (PsiBlast with 3 and 5 iterations and an E-value cutoff of 10E-6).
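The script itself is not shown here. As a minimal sketch of how such a distribution could be tabulated, the following assumes that the hits have already been parsed into (id, E-value, identity) tuples. The function names and the example hits are hypothetical and only serve as illustration; E-values are binned by their power of ten.

```python
import math
from collections import Counter


def evalue_decade(evalue):
    """Return the power-of-ten bin of an E-value, or None for an E-value of 0."""
    if evalue <= 0.0:
        # Search tools print 0.0 for extremely small E-values; log10 is undefined there.
        return None
    return math.floor(math.log10(evalue))


def evalue_histogram(hits):
    """Count hits per E-value decade; hits are (id, evalue, identity) tuples."""
    return Counter(evalue_decade(evalue) for _, evalue, _ in hits)


def mean_identity(hits):
    """Average percent identity over all hits."""
    return sum(identity for _, _, identity in hits) / len(hits)


# Hypothetical hits, invented for illustration only (not real search results).
example_hits = [
    ("hit_a", 5e-120, 98.0),
    ("hit_b", 3e-45, 61.5),
    ("hit_c", 2e-45, 58.5),
    ("hit_d", 0.0, 100.0),
]

histogram = evalue_histogram(example_hits)
for decade, count in histogram.items():
    label = "0" if decade is None else f"1e{decade}"
    print(label, count)
print("mean identity:", mean_identity(example_hits))
```

Binning by decade keeps the table readable even when E-values span more than a hundred orders of magnitude.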


[[Image:comparison_ali.png|none|300px]]

FASTA found more than 1000 matches, whereas the BLAST methods returned fewer results. FASTA therefore aligns more sequences than BLAST, which suggests that many of the FASTA hits are likely to be false positives.
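The regions of such a three-way Venn diagram can also be computed directly from the sets of hit identifiers. The following is a minimal sketch, assuming the identifiers of each method's hits have been collected into Python sets. The function name, the region labels and the example identifiers are hypothetical.

```python
def venn3_regions(blast, fasta, psiblast):
    """Split three sets of hit identifiers into the seven regions of a Venn diagram."""
    blast, fasta, psiblast = set(blast), set(fasta), set(psiblast)
    return {
        "blast_only": blast - fasta - psiblast,
        "fasta_only": fasta - blast - psiblast,
        "psiblast_only": psiblast - blast - fasta,
        "blast_fasta_only": (blast & fasta) - psiblast,
        "blast_psiblast_only": (blast & psiblast) - fasta,
        "fasta_psiblast_only": (fasta & psiblast) - blast,
        "all_three": blast & fasta & psiblast,
    }


# Hypothetical identifiers, invented for illustration only.
blast_hits = {"id1", "id2", "id3"}
fasta_hits = {"id1", "id2", "id3", "id4", "id5"}
psiblast_hits = {"id1", "id2", "id6"}

regions = venn3_regions(blast_hits, fasta_hits, psiblast_hits)
for name, members in regions.items():
    print(name, len(members))
```

Hits that appear only in the FASTA region are the natural candidates to inspect when checking whether the additional FASTA matches are plausible.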

We also compared four PsiBlast runs: 3 iterations with E-value cutoffs of 0.005 and 10E-6, and 5 iterations with the same two cutoffs.

[[Image:comparison_psiblast|none|300px]]
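The four runs are the combinations of two iteration counts and two cutoffs. A minimal sketch of how the corresponding command lines could be generated is given below. It assumes the blastpgp options from the Sequence Searches section (-i, -j, -h, -d); the function name and the naming scheme of the output files are hypothetical.

```python
from itertools import product


def psiblast_commands(query="seq.fasta", database="/data/blast/nr/nr",
                      iterations=(3, 5), cutoffs=("0.005", "10E-6")):
    """Build one (argument list, output file) pair per iteration/cutoff combination."""
    commands = []
    for n_iter, cutoff in product(iterations, cutoffs):
        # Hypothetical output file name, chosen only to keep the runs apart.
        out_file = f"psiblast_it{n_iter}_e{cutoff}.txt"
        argv = ["blastpgp", "-i", query, "-j", str(n_iter),
                "-h", cutoff, "-d", database]
        commands.append((argv, out_file))
    return commands


runs = psiblast_commands()
for argv, out_file in runs:
    # Printed in the same form as the commands above; nothing is executed here.
    print(" ".join(argv), ">", out_file)
```

Each argument list could be passed to subprocess.run with stdout redirected to the given file; the sketch only prints the commands so that they can be checked before starting the long searches.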

Multiple Alignments

  • Cobalt

Download Cobalt from ftp://ftp.ncbi.nlm.nih.gov/pub/cobalt/executables/2.0.1/ (ncbi-cobalt-2.0.1-x64-linux.tar). Unpack the archive with tar xfz ncbi-cobalt-2.0.1-x64-linux.tar and change into the unpacked cobalt directory. Then call: ./cobalt -i mult_seq.fasta -norps T > cobalt_out.aln

  • ClustalW

clustalw -infile=mult_seq.fasta -outfile=clustalW_out.aln

  • Muscle

muscle -in mult_seq.fasta -out muscle_out.aln -clw

  • T-Coffee

t_coffee -seq mult_seq.fasta

  • T-Coffee (3D)

t_coffee -seq mult_seq.fasta -mode expresso