Sequence Alignments TSD
Sequence searches
Several alignment methods, provided by various initiatives, address the problem of sequence searching. Some of them are applied here to the Hex A protein and analyzed. The searches use non-redundant protein databases. The outputs are brought into a comparable form and compared in order to determine the best results. A protocol containing the basic steps taken is available.
Blast
The first sequence similarity search with the Hex A protein was run with Blast. The default settings, which limit the output to 250 alignments, cover just a small fraction of the similar proteins: the last hit still has a very low e-value of 3e-48. This shows that the search can safely be extended to include more sequences. This is especially important because the multiple sequence alignment requires sequences with a comparatively low sequence identity of 20%. The sequence identity correlates with the hit rank in Blast, so the e-value is expected to increase overall as the sequence identity decreases. To balance the loss of quality that comes with higher e-values against the need for low sequence identity, the output was limited to 1200 sequences. With this limit the e-value does not exceed 1e-4, so the quality of the alignments is still sufficient, while sequences with the required identity are also included.
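The trade-off described above can be sketched as a small filter over Blast hits. This is only an illustration, not the procedure used for the actual search: it assumes the hits are available in the standard 12-column tabular Blast output (percent identity in column 3, e-value in column 11), and the names `parse_tabular`, `select_hits`, `low_identity` as well as the sample rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity of the pairwise alignment
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated Blast hits (assumed standard 12-column layout)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits


def select_hits(hits: List[Hit], max_evalue: float = 1e-4,
                max_hits: int = 1200) -> List[Hit]:
    """Keep hits up to the e-value cutoff, best first, capped at max_hits."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


def low_identity(hits: List[Hit], max_identity: float = 20.0) -> List[Hit]:
    """Return the hits at or below the given percent identity."""
    return [h for h in hits if h.identity <= max_identity]


# Made-up example rows, for illustration only (not real search results).
SAMPLE = [
    "HEXA\tsubj1\t100.0\t529\t0\t0\t1\t529\t1\t529\t0.0\t1100",
    "HEXA\tsubj2\t56.3\t520\t227\t4\t5\t524\t8\t530\t3e-48\t180",
    "HEXA\tsubj3\t19.5\t410\t330\t12\t40\t449\t33\t441\t1e-4\t45",
    "HEXA\tsubj4\t18.0\t200\t164\t6\t100\t299\t90\t288\t0.5\t30",
]

if __name__ == "__main__":
    kept = select_hits(parse_tabular(SAMPLE))
    print(len(kept), "hits kept,", len(low_identity(kept)), "with low identity")
```

Checking `low_identity` on the selected hits shows whether a given output limit already reaches the low-identity range needed for the multiple sequence alignment, or whether the limit has to be raised further.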
PSI-Blast
PSI-Blast was assessed with different combinations of e-value cutoff and number of profile iterations, see <xr id="tab:psiblast">.
<figtable id="tab:psiblast">
Iterations | 2 | 2 | 10 | 10 |
---|---|---|---|---|
E-value | 0.002 | 10E-10 | 0.002 | 10E-10 |
BIG80 | 3m53 | 4m3 | 18m57 | 21m9 |
BIG | 17m19 | 11m13 | 16m39 | 11m13 |
Table 1: Performance of PSI-Blast for different combinations of iteration number and e-value cutoff. </figtable>
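The parameter combinations in the table could be generated along the following lines. This is a sketch under several assumptions: the text does not state which PSI-Blast program or options were used, so the example assumes the BLAST+ `psiblast` command line and interprets the e-value cutoff as the profile inclusion threshold (`-inclusion_ethresh`). The query file name `hexa.fasta`, the output file names, and the use of `BIG80` and `BIG` as database names are placeholders.

```python
from itertools import product
from typing import List

ITERATIONS = [2, 10]
INCLUSION_EVALUES = ["0.002", "10E-10"]  # values as given in the table
DATABASES = ["BIG80", "BIG"]             # placeholder database names


def build_commands(query: str = "hexa.fasta") -> List[List[str]]:
    """Build one psiblast command per database/iteration/e-value combination."""
    commands = []
    for db, n_iter, evalue in product(DATABASES, ITERATIONS, INCLUSION_EVALUES):
        out = f"psiblast_{db}_it{n_iter}_e{evalue}.out"  # hypothetical file name
        commands.append([
            "psiblast",
            "-query", query,
            "-db", db,
            "-num_iterations", str(n_iter),
            "-inclusion_ethresh", evalue,  # assumption: cutoff = inclusion threshold
            "-out", out,
        ])
    return commands


if __name__ == "__main__":
    # Only print the commands; running them requires a local BLAST+ installation.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each printed line can be prefixed with the shell's `time` to obtain run times for comparison across the eight combinations.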