Difference between revisions of "Sequence Alignments TSD"

Revision as of 13:23, 4 May 2012

Sequence searches

Several alignment methods, provided by various initiatives, tackle the problem of sequence searches. Some of them are applied here to the Hex A protein and analyzed. The searches use non-redundant protein databases. The outputs are brought into a comparable form and compared in order to determine the best results. A protocol containing the basic steps taken is available.

Blast

The first sequence similarity search with the Hex A protein was run with Blast. The default settings, which output 250 alignments, cover just a small fraction of the similar proteins: the last hit still has a very low e-value of 3e-48. This shows that the search can safely be extended to include more sequences. This is especially important because sequences with a comparably low sequence identity of 20% are needed for the multiple sequence alignment. Sequence identity correlates with the Blast hit rank, so the e-value is expected to increase overall as the sequence identity decreases. To balance the loss of quality that comes with a higher e-value against the need for low sequence identity, the output was limited to 1200 sequences. With this limit the e-value does not exceed 1e-4, so the quality of the alignments is still sufficient, while sequences with the required low sequence identity are included.
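The trade-off between e-value and sequence identity can be checked directly on a hit list. The following is a minimal sketch, assuming the Blast hits were exported in the standard 12-column tabular format (percent identity in column 3, e-value in column 11). The function names and the sample rows are hypothetical and not taken from the original protocol; only the limit of 1200 hits and the e-value bound of 1e-4 come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity of the aligned region
    evalue: float


def parse_tabular(lines):
    """Parse 12-column tabular Blast output into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits


def summarize(hits, max_hits=1200, evalue_cutoff=1e-4):
    """Keep the first max_hits hits that satisfy the e-value cutoff and
    report the worst e-value and the lowest identity among them."""
    kept = [h for h in hits[:max_hits] if h.evalue <= evalue_cutoff]
    if not kept:
        return kept, None, None
    worst_evalue = max(h.evalue for h in kept)
    lowest_identity = min(h.identity for h in kept)
    return kept, worst_evalue, lowest_identity


# Made-up example rows (not real search results).
sample = [
    "query\thitA\t98.5\t529\t8\t0\t1\t529\t1\t529\t0.0\t1050",
    "query\thitB\t35.2\t480\t300\t5\t20\t500\t15\t490\t3e-48\t180",
    "query\thitC\t19.8\t400\t310\t9\t40\t440\t30\t430\t1e-5\t52.0",
    "query\thitD\t18.0\t150\t120\t3\t60\t210\t50\t200\t0.5\t30.1",
]
kept, worst_evalue, lowest_identity = summarize(parse_tabular(sample))
print(len(kept), worst_evalue, lowest_identity)
```

If the lowest identity among the kept hits is still well above 20%, the output limit could be raised; if the worst e-value approaches the cutoff, it should not be.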

The results from the BIG80 database contain only UniProt sequences, which can be explained by the clustering used for big_80 (based on CD-HIT), where long sequences are preferred. All hits are unique.

PSI-Blast

PSI-Blast was assessed with different combinations of e-value cutoff and number of iterations used to build the profiles. An appropriate output cutoff was also chosen to avoid irrelevant hits; as for the simple Blast search, it was set to 1200. First, PSI-Blast was used to compute the profiles from the BIG80 database. To extend the search space, these profiles were then used to run PSI-Blast against the larger BIG database.
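The two-stage procedure can be sketched as a small script that assembles the command lines for each parameter combination. This is only an illustration under several assumptions: it uses the BLAST+ `psiblast` program (the original runs may have used a different PSI-Blast executable), it maps the e-value cutoff to the profile inclusion threshold, and the file and database names (`hexa.fasta`, `big_80`, `big`) are hypothetical placeholders. The commands are only built and printed, not executed.

```python
from itertools import product


def profile_command(query, db, iterations, evalue, pssm_path, max_hits=1200):
    """Stage 1: iterate against the smaller database and save the profile."""
    return [
        "psiblast", "-query", query, "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(evalue),  # assumption: cutoff = inclusion threshold
        "-num_descriptions", str(max_hits),
        "-num_alignments", str(max_hits),
        "-out_pssm", pssm_path,
    ]


def search_command(pssm_path, db, max_hits=1200):
    """Stage 2: search the larger database starting from the saved profile."""
    return [
        "psiblast", "-in_pssm", pssm_path, "-db", db,
        "-num_descriptions", str(max_hits),
        "-num_alignments", str(max_hits),
    ]


# Parameter values as written in the performance table; passed as strings.
iterations_tested = [2, 10]
evalues_tested = ["0.002", "10E-10"]

plan = []
for iterations, evalue in product(iterations_tested, evalues_tested):
    pssm = f"hexa_it{iterations}_e{evalue}.pssm"  # hypothetical file name
    stage1 = profile_command("hexa.fasta", "big_80", iterations, evalue, pssm)
    stage2 = search_command(pssm, "big")
    plan.append((stage1, stage2))

for stage1, stage2 in plan:
    print(" ".join(stage1))
    print(" ".join(stage2))
```

Saving the profile to a file keeps the expensive iterations on the smaller database separate from the single search against the larger one.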

...

The performance of the different combinations of e-value and iterations for the searches in the BIG80 database and in the BIG database is shown in <xr id="tab:psiblast"/>.

<figtable id="tab:psiblast">

Iterations	2	2	10	10
E-value	0.002	10E-10	0.002	10E-10
BIG80	3m53	4m3	18m57	21m9
BIG	17m19	11m13	16m39	11m13

Table 1: Performance of PSI-Blast for the different combinations of iterations and e-value cutoff. </figtable>

Overlap of all PSI-Blast runs against the BIG80 database

</figure>
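One way such an overlap could be computed is with set operations on the hit identifiers of each run. The sketch below is an assumption about the approach, not the method used for the original figure; the run labels follow the parameter combinations of the table, and the hit identifiers are made up.

```python
from itertools import combinations


def overlap_report(runs):
    """Return the hits shared by all runs and the pairwise shared counts."""
    shared_by_all = set.intersection(*runs.values())
    pairwise = {
        (a, b): len(runs[a] & runs[b])
        for a, b in combinations(sorted(runs), 2)
    }
    return shared_by_all, pairwise


# Hypothetical hit identifiers per run (iterations, e-value cutoff).
runs = {
    "iter2_e0.002": {"P1", "P2", "P3", "P4"},
    "iter2_e10E-10": {"P1", "P2", "P3"},
    "iter10_e0.002": {"P1", "P2", "P4", "P5"},
    "iter10_e10E-10": {"P1", "P2", "P5"},
}
shared_by_all, pairwise = overlap_report(runs)
print(sorted(shared_by_all))
for pair, count in pairwise.items():
    print(pair, count)
```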

@@ Line 3: / Line 3: @@
 Here some of them are applied for the Hex A protein and analyzed. For the searches non redundant protein databases are used. The outputs are adapted to each other and put in comparison in order to determine the best results. A [[Sequence Alignments Protocol TSD|protocol]] containing the basic steps taken is available.
 ===Blast===
-The first sequence similarity search with the Hex A protein was run with Blast. Here the default settings which provide an output of 250 alignments cover a just a small fraction of similar proteins as the e-value of the last hit receives a significantly low e-value of 3e-48. This shows that the sequence search can be continued and more sequences added safely. This is especially important because there are sequences with a comparably low sequence identity of 20% needed for the multiple sequence alignment. The sequence identity correlates with the hit rank of blast, meaning that with a worse  sequence identity the e-value is overall expected to increase. To manage between quality deterioration with a worse e-value and on the other hand the need for low sequence identity a limitation of the output sequences was chosen of 1200. Here the e-value does not go beyond 1e-4 and thus the quality of the alignment is still sufficient but there are also sequences aligned with the required identity.
+The first sequence similarity search with the Hex A protein was run with Blast. Here the default settings which provide an output of 250 alignments cover a just a small fraction of similar proteins as the e-value of the last hit receives a significantly low e-value of 3e-48. This shows that the sequence search can be continued and more sequences added safely. This is especially important because there are sequences with a comparably low sequence identity of 20% needed for the multiple sequence alignment. The sequence identity correlates with the hit rank of blast, meaning that with a worse  sequence identity the e-value is overall expected to increase. To manage between quality deterioration with a worse e-value and on the other hand the need for low sequence identity a limitation of the output sequences was chosen of 1200. Here the e-value does not go beyond 1e-4 and thus the quality of the alignment is still sufficient but there are also sequences aligned with the required low sequence identity.
+The results from the BIG80 database contain only Uniprot sequences, which can be explained by the clustering used for big_80 (based on CD_hit), where long sequences are preferred. All hits are unique.
 ===PSI-Blast===
 The PSI-Blast alignment was assessed in different constellations of e-value cutoff and iteration number for the profiles. An appropriate output cutoff was also chosen to avoid irrelevant hits. This threshold was set to 1200 just like for the simple Blast search.
-For the runs in which the profiles are computed the BIG80 database was used.
+At first PSI-Blast was used to compute the profiles from the BIG80 database. To provide the alignments with an extended search space these profiles were used to run Psi-Blast against the larger BIG database.
 ...
 The performance for the different combinations of e-value and iterations for the search in the BIG80 database as well as the BIG database are shown in <xr id="tab:psiblast"/>.


Contents

Sequence searches

Blast

PSI-Blast

HHBlits

Evaluation

Multiple sequence alignment
