ARS A Sequence alignments
From Bioinformatikpedia
In this task, we explore the sequence space around human lysosomal arylsulfatase A (ARS A).
Sequence searches
We compare different methods for searching a database of non-redundant protein sequences (for details, see the protocol).
BLAST
- A simple BLAST search finds 3763 sequence matches with an E-value better (smaller) than the default cutoff of 10.
- Of these, 3120 pass the default E-value threshold for inclusion in the iterative PSI-BLAST profile (0.002).
- Some database sequences are matched twice, with different alignments, so the number of unique sequence matches is lower: 3513.
- The distributions of percent sequence identity and E-values show that many sequence matches lie between 20 and 40 percent sequence identity and that most E-values are around 10^-6.
- The BLAST search against big_80 finds only one matching PDB entry. This is partly due to the clustering used to build big_80 (based on CD-HIT), which prefers long sequences over shorter ones as cluster representatives, so short PDB chains are often absorbed into clusters represented by longer sequences.
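The counts above (total hits, hits passing the inclusion threshold, unique matches, identity distribution) can be reproduced from BLAST's tabular output. The sketch below assumes the standard 12-column tabular format (`-m 8` in legacy BLAST, `-outfmt 6` in BLAST+) and an inclusion threshold of 0.002; the `SAMPLE` lines are invented toy data, not results from this search.

```python
from collections import Counter

# Standard 12-column BLAST tabular format (-m 8 legacy, -outfmt 6 BLAST+)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(lines):
    """Yield one dict per hit line; skip blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        yield hit


def summarize(hits, inclusion=0.002, bin_width=10):
    """Count hits (one per alignment), hits below the inclusion E-value,
    unique subject sequences, and a percent-identity histogram."""
    hits = list(hits)
    unique_subjects = {h["sseqid"] for h in hits}
    included = [h for h in hits if h["evalue"] < inclusion]
    hist = Counter(int(h["pident"] // bin_width) * bin_width for h in hits)
    return {
        "hits": len(hits),
        "included": len(included),
        "unique": len(unique_subjects),
        "identity_hist": dict(sorted(hist.items())),
    }


# Hypothetical toy output: hit_B is matched twice with different alignments
SAMPLE = """\
query\thit_A\t100.00\t507\t0\t0\t1\t507\t1\t507\t0.0\t1040
query\thit_B\t35.20\t480\t300\t10\t20\t490\t5\t470\t1e-60\t230
query\thit_B\t28.00\t100\t70\t2\t10\t110\t300\t400\t0.5\t30.4
query\thit_C\t22.50\t300\t220\t8\t30\t330\t12\t310\t3.2\t28.1
""".splitlines()

print(summarize(parse_tabular(SAMPLE)))
# → {'hits': 4, 'included': 2, 'unique': 3, 'identity_hist': {20: 2, 30: 1, 100: 1}}
```

Counting unique `sseqid` values is what separates the number of hits from the number of unique sequence matches when one database sequence produces several alignments.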
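Why a short PDB chain can disappear from big_80 is easier to see with a toy version of greedy, longest-first clustering in the spirit of CD-HIT. This is a simplified sketch, not CD-HIT itself: identity is approximated with `difflib.SequenceMatcher` (matched residues divided by the length of the shorter sequence), and the 0.8 threshold is an assumption read from the name big_80. The sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude stand-in for CD-HIT's identity: matched residues divided by
    the length of the shorter sequence (simplified assumption)."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.8):
    """Process sequences longest first; each joins the first representative
    it matches at >= threshold identity, otherwise it founds a new cluster."""
    clusters = {}  # representative name -> member names
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical sequences: the "PDB chain" is a fragment of a longer protein
PDB = "MKLVAGHTRESWQPLNDCYF"
TOY_SEQS = {
    "full_length": PDB + "IKPEWNLLRSTAV",
    "pdb_chain": PDB,
    "unrelated": "GGGGGGGGGGGG",
}

print(greedy_cluster(TOY_SEQS))
# → {'full_length': ['full_length', 'pdb_chain'], 'unrelated': ['unrelated']}
```

Because the longer sequence is processed first, it becomes the representative and the shorter PDB chain is only a cluster member, so a search against the representatives cannot report it directly.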
PSI-BLAST
PSI-BLAST runtime for j iterations and E-value inclusion threshold h:

Runtime [s] | j = 2 | j = 10 |
---|---|---|
h = 0.002 | 280 | 2111 |
h = 10E-10 | | |
- Running iterative BLAST searches takes a while (see table): the more iterations, the longer the runtime. Decreasing the inclusion threshold, however, speeds up the search, since fewer sequences are added to the profile.
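A runtime grid like the one in the table can be scripted. The sketch below assumes BLAST+ `psiblast` is on the PATH and maps the table's j and h (presumably the legacy `blastpgp` options `-j` and `-h`) to `-num_iterations` and `-inclusion_ethresh`; reading 10E-10 as 1e-10 is also an assumption. The file names are hypothetical. By default it only prints the commands (`dry_run=True`), so it can be checked without a database.

```python
import shlex
import subprocess
import time
from itertools import product


def psiblast_command(query, db, iterations, inclusion, out):
    """Build a BLAST+ psiblast command line for one (j, h) combination."""
    return ["psiblast", "-query", query, "-db", db,
            "-num_iterations", str(iterations),
            "-inclusion_ethresh", str(inclusion),
            "-outfmt", "6", "-out", out]


def timed_run(cmd, dry_run=True):
    """Return wall-clock seconds for cmd; with dry_run, only print it."""
    if dry_run:
        print(shlex.join(cmd))
        return 0.0
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def runtime_grid(query, db, iterations=(2, 10), thresholds=(0.002, 1e-10),
                 dry_run=True):
    """Time every (j, h) pair; keys mirror the rows and columns of the table."""
    return {
        (j, h): timed_run(
            psiblast_command(query, db, j, h, f"psi_j{j}_h{h:g}.tsv"), dry_run)
        for j, h in product(iterations, thresholds)
    }


grid = runtime_grid("arsa.fasta", "nr")  # hypothetical query file and database
```

Wall-clock timing with `time.perf_counter` includes database loading, so runs on a shared machine should be repeated before comparing numbers.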
Number of unique sequence matches for j iterations and E-value inclusion threshold h:

Unique matches | j = 2 | j = 10 |
---|---|---|
h = 0.002 | | |
h = 10E-10 | | |
- The number of unique matches increases with more iterations.
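One way to follow how unique matches accumulate over iterations is to compare the subject IDs reported in each round. This is a hypothetical sketch: it assumes the subject IDs of each PSI-BLAST round have already been extracted into separate lists, and it reports both the per-round count and the running union, since a sequence found in one round can drop out of a later one.

```python
def unique_matches_per_round(rounds):
    """rounds: one iterable of subject IDs per PSI-BLAST iteration.
    Returns (unique in this round, newly found, cumulative unique) per round."""
    seen = set()
    summary = []
    for ids in rounds:
        ids = set(ids)          # collapse multiple alignments per sequence
        new = ids - seen        # sequences not reported in earlier rounds
        seen |= ids
        summary.append((len(ids), len(new), len(seen)))
    return summary


# Hypothetical IDs for three rounds; "e" first appears in round 3
print(unique_matches_per_round([["a", "b", "b"], ["a", "b", "c", "d"], ["a", "c", "d", "e"]]))
# → [(2, 2, 2), (4, 2, 4), (4, 1, 5)]
```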