Canavan Task 2 - Sequence alignments

From Bioinformatikpedia

Revision as of 16:40, 6 May 2012

Task 2: Canavan's Disease

Sequence

The native ASPA sequence that we used for the current task is shown below:

UniProt: P45381

>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH

BLASTP

First, we ran BLASTP on the student machines against the big_80 reference database.

Command:

blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out

We obtained 195 hits, of which the last 101 have E-values below 1E-15. Most of the hits are aspartoacylases from other species. Most of the hits with an E-value above 1E-15 are succinylglutamate desuccinylases, which catalyze a reaction similar to that of aspartoacylase.
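Counts such as the number of hits below 1E-15 can be read off the one-line summary table of the report. The following is a minimal Python sketch, assuming the default pairwise layout of a single-round legacy BLASTP report, in which each summary line ends with the bit score and the E-value; the function names are our own, and PSI-BLAST reports (which add per-round subheadings to the table) would need extra handling. It also accepts the legacy shorthand "e-142" for 1e-142.

```python
# Minimal sketch: pull E-values out of the one-line summary table of a
# single-round legacy (blastall) pairwise report. Assumes each summary
# line ends with "<bit score> <E-value>"; real reports may need more care.
SUMMARY_HEADER = "Sequences producing significant alignments:"


def parse_evalue(text: str) -> float:
    """Convert a BLAST E-value string to a float.

    Legacy BLAST may print 'e-142' for 1e-142; float() rejects that,
    so a leading '1' is added first.
    """
    text = text.strip()
    if text[:1] in ("e", "E"):
        text = "1" + text
    return float(text)


def summary_evalues(report: str) -> list[float]:
    """Return the E-values listed in the summary table, in report order."""
    evalues = []
    in_table = False
    for line in report.splitlines():
        if line.startswith(SUMMARY_HEADER):
            in_table = True
            continue
        if not in_table:
            continue
        if line.startswith(">"):  # first pairwise alignment: table is over
            break
        fields = line.split()
        if len(fields) >= 3:  # skip blank lines inside the table
            evalues.append(parse_evalue(fields[-1]))
    return evalues


def count_below(evalues: list[float], threshold: float) -> int:
    """Number of hits with an E-value strictly below the threshold."""
    return sum(1 for e in evalues if e < threshold)
```

For example, `count_below(summary_evalues(open("blastp_p45381_wt_big80.out").read()), 1e-15)` would count the hits below 1E-15 in the output file named above.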

PSIBLAST

PSI-BLAST was run in the same way as BLASTP, again against the big_80 database. Results:

  • 2 iterations, default inclusion E-value threshold (-h 0.002)
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_p45381_wt_big80.out -j 2
    • runtime: ~2 min 30 s
    • 500 results
    • best E-value: 1E-142
    • worst E-value: 3E-4
    • hits with the best E-values are mostly aspartoacylases
    • sequences not found previously are mostly succinylglutamate desuccinylases
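Statements like "sequences not found previously" come from comparing the hit lists of two runs. A minimal Python sketch, assuming the hit IDs and description lines of each run have already been extracted (for example from the summary tables); the keyword-based family labels are our own rough heuristic, not something BLAST provides:

```python
from collections import Counter


def new_hits(previous: set[str], current: set[str]) -> set[str]:
    """IDs reported by the current run but not by the previous one."""
    return current - previous


def classify(description: str) -> str:
    """Crude family label by keyword match on a hit description.

    Mixed annotations (e.g. '... desuccinylase/aspartoacylase family')
    get the first label that matches, so treat the result as a rough guide.
    """
    text = description.lower()
    if "succinylglutamate desuccinylase" in text:
        return "succinylglutamate desuccinylase"
    if "aspartoacylase" in text:
        return "aspartoacylase"
    return "other"


def family_counts(descriptions: dict[str, str], ids: set[str]) -> Counter:
    """Count family labels for the given IDs (descriptions maps ID -> text)."""
    return Counter(classify(descriptions[i]) for i in ids)
```

Applied to the BLASTP hit set and the two-iteration PSI-BLAST hit set, `family_counts(descriptions, new_hits(blastp_ids, psiblast_ids))` gives the family composition of the newly found sequences.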


  • 2 iterations, stricter inclusion E-value threshold of 10E-10 (i.e. 1E-9)
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e10_p45381_wt_big80.out -j 2 -h 10e-10
    • runtime: ~2 min 30 s
    • 93 results
    • best E-value: 1E-145
    • worst E-value: 2E-29
    • results are mainly aspartoacylases
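The two 2-iteration runs differ only in -h, the E-value a hit must beat to be included in the profile for the next round. The Python sketch below uses hypothetical hit IDs and a deliberately simplified model of inclusion to show the effect, and also that the value 10e-10 as written is 1e-9 rather than 1e-10:

```python
# PSI-BLAST's -h option is the E-value threshold for including a hit in the
# profile (PSSM) built for the next iteration; 0.002 is the blastpgp default.
DEFAULT_INCLUSION = 0.002

# Note the notation: "10e-10" means 10 x 10^-10, i.e. 1e-9, not 1e-10.
STRICT_INCLUSION = float("10e-10")


def included(hits: dict[str, float], threshold: float) -> set[str]:
    """IDs whose E-value is below the inclusion threshold (simplified model)."""
    return {hit_id for hit_id, evalue in hits.items() if evalue < threshold}
```

A hit at 5e-10 would therefore still enter the profile with `-h 10e-10`, but not with `-h 1e-10`.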


  • 5 iterations, default inclusion E-value threshold (-h 0.002)
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it5_p45381_wt_big80.out -j 5
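In a parameter sweep like this one, the output file name and the -j value are easy to let drift apart. A small helper that derives both from the same parameters avoids that. This is a hypothetical Python sketch (the function name and tag scheme are our own, modelled on the file names used on this page); it only builds the command string and does not run BLAST:

```python
import shlex
from typing import Optional

DB = "/mnt/project/pracstrucfunc12/data/big/big_80"  # database path used above


def psiblast_command(iterations: int, inclusion: Optional[str] = None,
                     query: str = "P45381_wt.fasta") -> str:
    """Build a blastpgp command line whose output name, -j and -h values
    all come from the same parameters, so they cannot disagree."""
    tag = f"it{iterations}"
    if inclusion is not None:
        tag += "_h" + inclusion.replace("-", "")  # "10e-10" -> "h10e10"
    out = f"psiblast_{tag}_p45381_wt_big80.out"
    args = ["blastpgp", "-d", DB, "-i", query, "-o", out, "-j", str(iterations)]
    if inclusion is not None:
        args += ["-h", inclusion]
    return shlex.join(args)  # quotes arguments only if they need it
```

For instance, `print(psiblast_command(10, "10e-10"))` prints the command for the 10-iteration run with the stricter threshold; to execute it from Python, pass `shlex.split(...)` of the string to `subprocess.run`.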


  • 10 iterations, default inclusion E-value threshold (-h 0.002)
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10
    • runtime: ~10 min
    • converged after 8 rounds
    • 500 results
    • best E-value: 5E-70
    • worst E-value: 8E-38
    • the most significant hits include more succinylglutamate desuccinylases than aspartoacylases
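The difference between "converged after 8 rounds" and running all 10 iterations comes from PSI-BLAST stopping early once a round adds nothing new. The following Python sketch models that loop under a simplifying assumption (convergence means the included set is unchanged between rounds); the search function is a hypothetical placeholder, not a BLAST API:

```python
from typing import Callable

# Hypothetical stand-in for one PSI-BLAST round: given the IDs included in
# the previous profile, return the IDs that pass the inclusion threshold now.
SearchRound = Callable[[frozenset[str]], frozenset[str]]


def iterate_until_converged(search: SearchRound,
                            max_iterations: int) -> tuple[frozenset[str], int]:
    """Run rounds until the included set stops changing (simplified criterion).

    Returns the final included set and the number of rounds actually run.
    """
    included: frozenset[str] = frozenset()
    for round_no in range(1, max_iterations + 1):
        current = search(included)
        if current == included:  # nothing new this round: converged
            return current, round_no
        included = current
    return included, max_iterations
```

With a stricter inclusion threshold, fewer sequences enter the profile per round, which is one reason the outcome (early convergence or not) can change between otherwise identical runs.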


  • 10 iterations, inclusion E-value threshold of 10E-10 (i.e. 1E-9)
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10
    • runtime: ~10 min
    • all 10 iterations were run (no early convergence)
    • 500 results
    • best E-value: 7E-70
    • worst E-value: 1E-38
    • aspartoacylases are slightly more frequent among the hits with the lowest E-values (< 1E-58), but overall the E-values of aspartoacylases and succinylglutamate desuccinylases do not differ clearly
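Comparing how the two families are distributed over E-values, as in the < 1E-58 observation, amounts to counting family labels on each side of a threshold. A minimal Python sketch, assuming each hit has already been labelled with a family (the labels and values used with it are hypothetical):

```python
from collections import Counter
from typing import Iterable


def family_split(hits: Iterable[tuple[str, float]],
                 threshold: float) -> tuple[Counter, Counter]:
    """Count families among hits below the threshold and at or above it.

    `hits` yields (family_label, evalue) pairs.
    """
    below, rest = Counter(), Counter()
    for family, evalue in hits:
        (below if evalue < threshold else rest)[family] += 1
    return below, rest
```

Calling `family_split(labelled_hits, 1e-58)` returns one counter per side of the threshold, which makes a "slightly more frequent" statement easy to check against actual numbers.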

HHBLITS

Finally, we ran HHblits on the student machines against the uniprot20 database.

  • hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_def.out
    • runtime: ~2 min 50 s
    • 274 results
    • best E-value: 2E-110
    • worst E-value: 0.0011
    • mixed results: both aspartoacylases and succinylglutamate desuccinylases
  • hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_n10.out

-n: number of iterations (default: 2)
-e: E-value cutoff (default: 0.001)
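Every run on this page is summarized by its number of hits and its best (smallest) and worst (largest) E-value. A small Python helper for that, assuming the E-values have already been extracted as strings; it accepts the notations seen in the outputs above (2E-110, 0.0011) as well as the legacy BLAST shorthand e-142 (all names are our own):

```python
from typing import Iterable, NamedTuple


class RunSummary(NamedTuple):
    hits: int
    best: float   # smallest E-value
    worst: float  # largest E-value


def to_float(evalue: str) -> float:
    """Accept '2E-110', '0.0011' and the legacy BLAST shorthand 'e-142'."""
    s = evalue.strip()
    if s[:1] in ("e", "E"):
        s = "1" + s
    return float(s)


def summarize(evalues: Iterable[str]) -> RunSummary:
    """Number of hits plus best and worst E-value of one run."""
    values = [to_float(e) for e in evalues]
    if not values:
        raise ValueError("no hits to summarize")
    return RunSummary(len(values), min(values), max(values))
```

Using the same helper for BLASTP, PSI-BLAST and HHblits keeps the reported best and worst E-values comparable across methods.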