Difference between revisions of "Sequence Alignments TSD"

From Bioinformatikpedia
Revision as of 20:40, 3 May 2012

Sequence searches

Several alignment methods, provided by various initiatives, tackle the problem of sequence searches. Some of them are applied here to the Hex A protein and analyzed. Non-redundant protein databases are used for the searches. The outputs are brought into a common form and compared in order to determine the best results. A protocol containing the basic steps taken is available.

Blast

The first sequence similarity search with the Hex A protein was run with Blast. The default settings, which output 250 alignments, cover just a small fraction of the similar proteins: the last hit still has a very low e-value of 3e-48. This shows that the search can be continued and more sequences added safely. This is especially important because the multiple sequence alignment needs sequences with a comparably low sequence identity of 20%. Sequence identity correlates with the Blast hit rank, so the e-value is expected to increase overall as sequence identity decreases. To balance the quality deterioration that comes with a worse e-value against the need for low sequence identity, the output was limited to 1200 sequences. With this limit the e-value does not exceed 1e-4, so the quality of the alignments is still sufficient, while sequences with the required identity are also included.
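The selection rule described above (at most 1200 hits, e-value no worse than 1e-4, with low-identity sequences still present) can be sketched in Python. This is only an illustration and not the procedure actually used: it assumes the hits are available in the common 12-column tab-separated Blast layout (subject ID in column 2, percent identity in column 3, e-value in column 11). The function names and the three toy hits are invented for the example.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    identity: float  # percent identity
    evalue: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse tab-separated hit lines (assumed 12-column Blast layout)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[1], float(cols[2]), float(cols[10])))
    return hits


def select_hits(hits: List[Hit], max_evalue: float = 1e-4,
                max_hits: int = 1200) -> List[Hit]:
    """Keep hits within the e-value cutoff, best first, capped at max_hits."""
    kept = sorted((h for h in hits if h.evalue <= max_evalue),
                  key=lambda h: h.evalue)
    return kept[:max_hits]


def min_identity(hits: List[Hit]) -> float:
    """Lowest percent identity among the given hits."""
    return min(h.identity for h in hits)


# Toy data (invented): three hits with made-up subject IDs.
EXAMPLE = "\n".join([
    "\t".join(["HEXA", "subjA", "58.2", "520", "200", "5",
               "1", "520", "1", "515", "3e-48", "180"]),
    "\t".join(["HEXA", "subjB", "19.5", "480", "350", "20",
               "10", "500", "5", "470", "2e-5", "45"]),
    "\t".join(["HEXA", "subjC", "18.0", "300", "230", "15",
               "30", "320", "12", "300", "0.5", "30"]),
])

if __name__ == "__main__":
    kept = select_hits(parse_tabular_hits(EXAMPLE))
    for hit in kept:
        print(hit.subject_id, hit.identity, hit.evalue)
    print("lowest identity kept:", min_identity(kept))
```

In the toy data the third hit is dropped because its e-value exceeds the cutoff, while the second hit shows that a sequence with roughly 20% identity can still pass it.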

PSI-Blast

Iterations    2        2         10       10
E-value       0.002    10E-10    0.002    10E-10
BIG80         3m53     4m3       18m57    21m9
BIG           17m19    11m13     16m39    11m13

Table 1: PSI-Blast performance for different iteration counts and e-value cutoffs.
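To make the entries of Table 1 easier to compare, they can be converted to seconds. The sketch below assumes that a value such as 3m53 is a run time of 3 minutes and 53 seconds, which the table does not state explicitly. The helper name and the dictionary layout are chosen only for this illustration.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert a string like '3m53' to seconds (assumed minutes and seconds)."""
    match = re.fullmatch(r"(\d+)m(\d+)", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


# Values copied from Table 1; keys are (row label, iterations, e-value).
RUNS = {
    ("BIG80", 2, "0.002"): "3m53",
    ("BIG80", 2, "10E-10"): "4m3",
    ("BIG80", 10, "0.002"): "18m57",
    ("BIG80", 10, "10E-10"): "21m9",
    ("BIG", 2, "0.002"): "17m19",
    ("BIG", 2, "10E-10"): "11m13",
    ("BIG", 10, "0.002"): "16m39",
    ("BIG", 10, "10E-10"): "11m13",
}

# Same table with every entry expressed in seconds.
RUNS_SECONDS = {key: to_seconds(value) for key, value in RUNS.items()}

if __name__ == "__main__":
    for (label, iterations, evalue), secs in RUNS_SECONDS.items():
        print(f"{label:6} iterations={iterations:<3} e-value={evalue:7} {secs} s")
```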


HHBlits