ARS A Sequence alignments

From Bioinformatikpedia
Revision as of 13:59, 12 April 2012 by Andrea (PSI-Blast)

In this task, we explore the sequence space around human lysosomal arylsulfatase A (ARS A).

Sequence searches

We compare different methods to search a database of non-redundant proteins (for details see protocol).

Blast

BLAST result: distribution of percent sequence identity of unique matches
BLAST result: distribution of e-values of unique matches
  • This simple search finds 3763 sequence matches with an e-value better (smaller) than the default 10.
  • Of these, 3120 have an e-value that meets the default criterion for inclusion in the iterative BLAST (PSI-BLAST).
  • Some of the sequence matches occur twice with different alignments. The number of unique sequence matches is 3513.
  • The distributions of percent sequence identity and e-values show that many sequence matches lie between 20 and 40 percent sequence identity and that most e-values are around 10^-6.
  • The BLAST search in big_80 finds only one matching PDB entry. This is partly due to the way big_80 was clustered (based on CD-HIT), where long sequences are preferred over shorter ones.
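The counts and distributions described in this list can be derived from tabular BLAST output. The following is a minimal sketch, assuming the hits are available in the standard 12-column tab-separated format (subject ID in column 2, percent identity in column 3, e-value in column 11). The function name `summarize_hits` and the inclusion cutoff of 0.002 are illustrative assumptions; the commands and settings actually used are described in the protocol.

```python
import math
from collections import Counter


def summarize_hits(lines, report_evalue=10.0, inclusion_evalue=0.002):
    """Summarize BLAST hits given as 12-column tab-separated lines.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    total = 0      # alignments at or below the reporting threshold
    included = 0   # alignments at or below the inclusion threshold
    best = {}      # subject ID -> (e-value, percent identity) of its best alignment

    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject = fields[1]
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue > report_evalue:
            continue
        total += 1
        if evalue <= inclusion_evalue:
            included += 1
        # A subject can be hit by several alignments; keep only the best one.
        if subject not in best or evalue < best[subject][0]:
            best[subject] = (evalue, pident)

    identity_bins = Counter()  # lower bound of a 10-percent identity bin -> count
    evalue_bins = Counter()    # floor(log10(e-value)) -> count
    for evalue, pident in best.values():
        identity_bins[int(pident // 10) * 10] += 1
        # An e-value of exactly 0 has no logarithm; it is counted under None.
        exponent = math.floor(math.log10(evalue)) if evalue > 0 else None
        evalue_bins[exponent] += 1

    return {
        "total": total,
        "included": included,
        "unique": len(best),
        "identity_bins": dict(identity_bins),
        "evalue_bins": dict(evalue_bins),
    }
```

The function accepts any iterable of lines, so an open file handle on a (hypothetical) hit table can be passed directly. Counting unique matches by subject ID is one possible reading of "unique"; the histograms use the best alignment per subject.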

PSI-Blast

Runtimes depending on parameters
Runtime [s]    j = 2    j = 10
h = 0.002      280      2111
h = 10E-10     -        -
  • Running iterative BLAST searches takes a while (see table). The more iterations, the longer the runtime: 280 s for j = 2 versus 2111 s for j = 10 at h = 0.002. However, decreasing the inclusion threshold speeds up the process, since fewer sequences are added to the profile.
Number of unique matches depending on parameters
# unique matches    j = 2    j = 10
h = 0.002           -        -
h = 10E-10          -        -
  • The number of unique matches increases with more iterations.
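The parameter combinations in the tables above could be run and timed with a small driver script. This is only a sketch: it assumes the legacy `blastpgp` program, whose `-j` (maximum number of passes) and `-h` (e-value threshold for inclusion) options correspond to the parameter names used here. The query file name and output file names are hypothetical placeholders, and reading "10E-10" as 1e-10 is an assumption.

```python
import itertools
import subprocess
import time


def blastpgp_command(query, database, iterations, inclusion_evalue, output):
    """Build a legacy blastpgp command line; nothing is executed here."""
    return [
        "blastpgp",
        "-i", query,                  # query sequence file
        "-d", database,               # database name
        "-j", str(iterations),        # maximum number of passes
        "-h", str(inclusion_evalue),  # e-value threshold for inclusion
        "-o", output,                 # report file
    ]


def time_command(argv):
    """Run a command and return its wall-clock runtime in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def parameter_grid(query, database, iteration_values, evalue_values):
    """Yield (j, h, argv) for every combination of the two parameters."""
    for h, j in itertools.product(evalue_values, iteration_values):
        output = f"psiblast_j{j}_h{h}.out"  # hypothetical naming scheme
        yield j, h, blastpgp_command(query, database, j, h, output)


if __name__ == "__main__":
    # "query.fasta" is a placeholder; "1e-10" is an assumed reading of "10E-10".
    grid = parameter_grid("query.fasta", "big_80", [2, 10], ["0.002", "1e-10"])
    for j, h, argv in grid:
        print(f"j = {j}, h = {h}: {' '.join(argv)}")
        # On a machine with blastpgp and the database installed:
        # runtime = time_command(argv)
```

Building the command separately from running it keeps the grid easy to inspect before starting searches that take minutes to hours. With BLAST+ instead of legacy BLAST, the corresponding `psiblast` options are `-num_iterations` and `-inclusion_ethresh`.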

Comparison

Multiple sequence alignments