Difference between revisions of "ARS A Sequence alignments"
From Bioinformatikpedia
m (→Blast) |
m (→PSI-Blast) |
||
Line 17: | Line 17: | ||
=== PSI-Blast === |
=== PSI-Blast === |
||
− | # run PSI-Blast with different combinations of parameters |
||
− | > blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -m 8 -b 10000 -j 2 -i ARSA.fas -o ARSA.psiBlast.j2.hDefault |
||
− | 280.470u 30.420s 5:55.56 87.4% 0+0k 18005712+1880io 792pf+0w |
||
− | > blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -m 8 -b 10000 -j 10 -i ARSA.fas -o ARSA.psiBlast.j10.hDefault |
||
− | [blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048 |
||
− | 2111.750u 44.740s 36:55.25 97.3% 0+0k 12951072+11856io 1271pf+0w |
||
− | > blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -m 8 -b 10000 -j 2 -h 1e-10 -i ARSA.fas -o ARSA.psiBlast.j2.h1e-10 |
||
− | > blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -m 8 -b 10000 -j 10 -h 1e-10 -i ARSA.fas -o ARSA.psiBlast.j10.h1e-10 |
||
* Running iterative blasts takes a while (see table below). The more iterations the longer the run-time. However, decreasing the inclusion threshold speeds up the process. |
* Running iterative blasts takes a while (see table below). The more iterations the longer the run-time. However, decreasing the inclusion threshold speeds up the process. |
||
{| class="wikitable" |
{| class="wikitable" |
||
− | ! scope="col" align="left"| |
+ | ! scope="col" align="left"| Runtime [s] |
! scope="col"| j = 2 |
! scope="col"| j = 2 |
||
! scope="col"| j = 10 |
! scope="col"| j = 10 |
||
|- |
|- |
||
! scope="row" align="left" | h = 0.002 |
! scope="row" align="left" | h = 0.002 |
||
− | | align="right" | 280 |
+ | | align="right" | 280 |
− | | align="right" | 2111 |
+ | | align="right" | 2111 |
|- |
|- |
||
! scope="row" align="left" | h = 10E-10 |
! scope="row" align="left" | h = 10E-10 |
||
− | | align="right" | |
+ | | align="right" | |
| |
| |
||
|- |
|- |
||
|} |
|} |
||
+ | * The number of unique matches increases with more iterations. |
||
− | # To evaluate the final results, we have to dissect the output into separate files: Look for the first hit, find at what line numbers it occurs and then cut the files accordingly. |
||
+ | {| class="wikitable" |
||
− | grep -n G3IH84 ARSA.psiBlast.j2.hDefault |
||
+ | ! scope="col" align="left"| # unique matches |
||
− | tail -n +4679 ARSA.psiBlast.j2.hDefault > ARSA.psiBlast.j2.hDefault.lastIter |
||
+ | ! scope="col"| j = 2 |
||
+ | ! scope="col"| j = 10 |
||
+ | |- |
||
+ | ! scope="row" align="left" | h = 0.002 |
||
+ | | align="right" | |
||
+ | | align="right" | |
||
+ | |- |
||
+ | ! scope="row" align="left" | h = 10E-10 |
||
+ | | align="right" | |
||
+ | | |
||
+ | |- |
||
+ | |} |
||
=== Comparison === |
=== Comparison === |
Revision as of 13:43, 12 April 2012
In this task, we explore the sequence space around the Human lysosomal arylsulfatase A (ARS A).
Sequence searches
We compare different methods to search a database of non-redundant proteins (for details see protocol).
Blast
- This simple search finds 3763 sequence matches with an e-value better (smaller) than the default 10.
- Of these, 3120 have an e-value that meets the default inclusion threshold of the iterative BLAST (h = 0.002).
- Some sequences are matched twice with different alignments; the number of unique sequence matches is 3513.
- The distributions of percent sequence identity and e-values show that many sequence matches lie between 20 and 40 percent sequence identity and that the majority of e-values are around 10^-6.
- The BLAST search in big_80 finds only one matching PDB entry. However, this is partly due to the clustering used to build big_80 (based on CD-HIT), which prefers long sequences over shorter ones.
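The counts above can be derived from a tabular BLAST report. The following is a minimal sketch, assuming the search was written with `-m 8` (tab-separated, subject ID in column 2, e-value in column 11) and that e-values are written in a form awk parses as numbers (e.g. `2e-15`). The function names and the file name `ARSA.blast` are hypothetical and not taken from the protocol.

```shell
# Count alignments whose e-value (column 11 of -m 8 output) is at most a cutoff.
# Usage: count_hits_below FILE CUTOFF
count_hits_below() {
    awk -F'\t' -v max="$2" '($11 + 0) <= (max + 0) { n++ } END { print n + 0 }' "$1"
}

# Count distinct subject sequences (column 2), ignoring repeated alignments.
# Usage: count_unique_subjects FILE
count_unique_subjects() {
    cut -f2 "$1" | sort -u | awk 'END { print NR }'
}
```

For example, `count_hits_below ARSA.blast 0.002` would give the number of matches eligible for inclusion in the iterative search, and `count_unique_subjects ARSA.blast` the number of unique sequence matches.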
PSI-Blast
- Iterative BLAST runs take a while (see table below). Run-time grows with the number of iterations (j); decreasing the inclusion threshold (h), however, speeds up the process.
Runtime [s] | j = 2 | j = 10 |
---|---|---|
h = 0.002 | 280 | 2111 |
h = 1e-10 | | |
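The four parameter combinations behind this table can be generated with a small helper. This is a sketch only: the flags, database path and input file are the ones used in the blastpgp calls for this task, while the function names are hypothetical, the threshold is always passed explicitly via `-h` (0.002 being the default listed in the table), and the output files are therefore named `h0.002` rather than `hDefault`.

```shell
DB=/mnt/project/pracstrucfunc12/data/big/big_80   # database path as used in this task

# Print the blastpgp command for J iterations and inclusion threshold H.
# Usage: psiblast_cmd J H
psiblast_cmd() {
    printf 'blastpgp -d %s -m 8 -b 10000 -j %s -h %s -i ARSA.fas -o ARSA.psiBlast.j%s.h%s\n' \
        "$DB" "$1" "$2" "$1" "$2"
}

# Print all four parameter combinations, one command per line.
psiblast_grid() {
    for j in 2 10; do
        for h in 0.002 1e-10; do
            psiblast_cmd "$j" "$h"
        done
    done
}
```

`psiblast_grid` only prints the commands; each line can then be run under `time` to fill in the run-time table.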
- The number of unique matches increases with more iterations.
# unique matches | j = 2 | j = 10 |
---|---|---|
h = 0.002 | | |
h = 1e-10 | | |
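To count the unique matches of the final iteration, the concatenated `-m 8` report has to be cut at the start of the last round. The sketch below mirrors the manual grep/tail approach (look for the first hit, then cut from its last occurrence). It assumes that the top hit opens every round and that its rows within a round are consecutive; the function names are hypothetical.

```shell
# Print the rows of the final PSI-BLAST round from a concatenated -m 8 report.
# Assumes the top-hit ID marks the start of each round and that its rows
# within one round are consecutive.
# Usage: last_iteration FILE TOP_HIT_ID
last_iteration() (
    start=$(awk -F'\t' -v id="$2" '
        index($2, id) { if (!in_run) start = NR; in_run = 1; next }
        { in_run = 0 }
        END { print start + 0 }' "$1")
    [ "$start" -gt 0 ] || exit 1        # top hit not found in the report
    tail -n +"$start" "$1"
)

# Count distinct subject IDs (column 2) in the final round.
# Usage: count_last_unique FILE TOP_HIT_ID
count_last_unique() {
    last_iteration "$1" "$2" | cut -f2 | sort -u | awk 'END { print NR }'
}
```

With the first hit from this task, `last_iteration ARSA.psiBlast.j2.hDefault G3IH84 > ARSA.psiBlast.j2.hDefault.lastIter` would write the final round to its own file, and `count_last_unique ARSA.psiBlast.j2.hDefault G3IH84` would give the value for the corresponding table cell.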