Difference between revisions of "ARS A Sequence alignments"

From Bioinformatikpedia

Revision as of 14:20, 12 April 2012

In this task, we explore the sequence space around human lysosomal arylsulfatase A (ARS A).

Sequence searches

We compare different methods to search a database of non-redundant proteins (for details see protocol).

Blast

BLAST result: distribution of percent sequence identity of unique matches
BLAST result: distribution of e-values of unique matches
  • This simple search finds 3763 sequence matches with an e-value better (smaller) than the default 10.
  • Of these, 3120 have an e-value matching the default criterion for inclusion in the iterative BLAST.
  • Some of the sequence matches occur twice with different alignments. Counting each matched sequence only once leaves 3513 unique sequence matches.
  • The distributions of percent sequence identity and e-values show that many sequence matches lie between 20 and 40 percent sequence identity and that most e-values are around 10^-6.
  • The BLAST search in big_80 finds only one matching PDB entry. However, this is partly due to the way big_80 was clustered (based on CD-HIT), where long sequences are preferred over shorter ones.
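The three counts reported above (all matches, matches meeting the inclusion criterion, unique matches) could be derived along the following lines. This is only a minimal sketch, not the procedure used for this page: it assumes the hits are available in the 12-column tabular BLAST output (subject ID in column 2, e-value in column 11), and the function name `summarize_hits`, the subject IDs and the demo rows are invented for illustration. The inclusion threshold of 0.002 is the value listed as h in the PSI-Blast tables.

```python
import csv
import io


def summarize_hits(handle, report_evalue=10.0, inclusion_evalue=0.002):
    """Count hits, hits passing the inclusion threshold, and unique subjects.

    Assumes tab-separated BLAST output with the subject ID in column 2
    and the e-value in column 11 (an assumption, not stated on this page).
    """
    total = 0
    included = 0
    best = {}  # subject ID -> smallest e-value seen for that subject
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, evalue = row[1], float(row[10])
        if evalue > report_evalue:
            continue  # worse than the reporting cutoff
        total += 1
        if evalue <= inclusion_evalue:
            included += 1
        # A subject hit several times with different alignments is kept once.
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return total, included, best


# Invented demo rows: subjA is matched twice with different alignments.
demo = (
    "query\tsubjA\t35.0\t400\t200\t10\t1\t400\t1\t400\t1e-50\t180\n"
    "query\tsubjA\t30.0\t100\t60\t2\t1\t100\t300\t400\t0.5\t30\n"
    "query\tsubjB\t25.0\t300\t200\t12\t1\t300\t1\t300\t3.0\t28\n"
)
total, included, best = summarize_hits(io.StringIO(demo))
print(total, included, len(best))  # prints: 3 1 2
```

Keeping the smallest e-value per subject is one possible choice for collapsing duplicate matches; the page does not state which alignment was retained.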

PSI-Blast

Runtimes depending on parameters
Runtime [s] j = 2 j = 10
h = 0.002 280 2111
h = 10E-10 280
  • Running iterative BLAST searches takes a while (see table "Runtimes depending on parameters"). The more iterations, the longer the run-time: for h = 0.002 it rises from 280 s at j = 2 to 2111 s at j = 10. Decreasing the inclusion threshold, however, speeds up the process, since fewer sequences are added to the profile.
Number of unique matches depending on parameters
# unique matches j = 2 j = 10
h = 0.002 6421 7554
h = 10E-10 7115
  • The number of unique matches increases with more iterations (see table "Number of unique matches depending on parameters"). Notably, it also increases when the inclusion threshold is decreased. A likely explanation is that a PSSM built with stricter inclusion is more specific and therefore produces more significant hits.
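To put the two tables side by side, the following short sketch works out the trade-off between run-time and additional matches for the h = 0.002 row. The numbers are copied from the tables above; the variable and function names are only illustrative.

```python
# Values copied from the two tables above (row h = 0.002 only).
runtime_s = {2: 280, 10: 2111}         # run-time in seconds, keyed by j
unique_matches = {2: 6421, 10: 7554}   # unique matches, keyed by j


def relative_change(before, after):
    """Return the change from before to after as a fraction of before."""
    return (after - before) / before


runtime_factor = runtime_s[10] / runtime_s[2]
extra_matches = unique_matches[10] - unique_matches[2]
match_gain = relative_change(unique_matches[2], unique_matches[10])

print(f"run-time factor, j = 10 vs j = 2: {runtime_factor:.1f}x")
print(f"additional unique matches: {extra_matches} ({match_gain:.1%})")
# prints:
# run-time factor, j = 10 vs j = 2: 7.5x
# additional unique matches: 1133 (17.6%)
```

In other words, going from 2 to 10 iterations costs roughly 7.5 times the run-time for about 18 percent more unique matches at this inclusion threshold.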

Comparison

Multiple sequence alignments