Difference between revisions of "Canavan Task 2 - Sequence alignments"

From Bioinformatikpedia

Revision as of 11:59, 7 May 2012

Task 2: Canavan Disease

Sequence

The native ASPA sequence that we used for the current task is shown below:

UniProt: P45381

>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH



Search

BLASTP

We ran BLASTP on the student machines with big_80 as the reference database.

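<table>
<tr><td>Parameters</td><td>default E-Value = 10</td><td>E-Value 10e-10</td></tr>
<tr><td>results</td><td>196</td><td>94</td></tr>
<tr><td>best E-Value</td><td>1e-155</td><td>1e-155</td></tr>
<tr><td>worst E-Value</td><td>9.6</td><td>e-15</td></tr>
<tr><td>comment</td><td>Most of the resulting proteins are Aspartoacylases of other species. Most of the results with E-Value > e-15 are Succinylglutamate Desuccinylases, which catalyze a reaction similar to Aspartoacylase.</td><td>The results are the same as for the first run, just with an earlier cutoff.</td></tr>
</table>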

Command:

blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out

  • default E-Value cutoff (10)
    • 196 resulting sequences, the last 101 of which have E-Values worse (larger) than e-15.
    • best E-Value: e-155
    • worst E-Value: 9.6
    • Most of the resulting proteins are Aspartoacylases of other species. Most of the results with EValue > e-15 are Succinylglutamate Desuccinylases, which catalyze a reaction similar to Aspartoacylase.
  • stricter E-Value cutoff of 10e-10
    • 94 results
    • best E-Value: e-155
    • worst E-Value: e-15
    • The results are the same as for the first run, just with an earlier cutoff
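
The relation between the two runs can be sketched in Python: applying a stricter E-Value cutoff to a ranked hit list only truncates it, so the strict run is a prefix of the default run. The hit list below is made up for illustration (hypothetical IDs and E-Values, not our actual output), and `filter_by_evalue` is a helper name introduced here.

```python
def filter_by_evalue(hits, cutoff):
    """Keep hits whose E-Value is at most `cutoff`, preserving rank order."""
    return [(name, evalue) for name, evalue in hits if evalue <= cutoff]

# Hypothetical ranked hits (best first); IDs and values are illustrative only.
hits = [
    ("hit_A", 1e-155),
    ("hit_B", 1e-15),
    ("hit_C", 3e-4),
    ("hit_D", 9.6),
]

default_run = filter_by_evalue(hits, 10)     # default cutoff of 10
strict_run = filter_by_evalue(hits, 10e-10)  # note: 10e-10 is the same as 1e-9

# The strict run contains the same hits, just cut off earlier.
assert strict_run == default_run[:len(strict_run)]
print(len(default_run), len(strict_run))
```

Note that the cutoff written as 10e-10 is numerically 1e-9, which is why hits with an E-Value of e-15 still pass it.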

PSIBLAST

PSI-BLAST was run in the same fashion as BLAST, with big_80 as the background database. Results:

  • Running 2 iterations and default E-Value 0.002
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_p45381_wt_big80.out -j 2
    • time: ~2m30
    • 500 results
    • best E-Value: e-142
    • worst E-Value: 3e-4
    • Results with best EValues are mostly Aspartoacylases
    • Sequences previously not found are mostly Succinylglutamate Desuccinylases


  • 2 iterations, stricter E-Value cutoff of 10e-10
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e10_p45381_wt_big80.out -j 2 -h 10e-10
    • time ~2m30
    • 93 results
    • best E-Value: e-145
    • worst E-Value: 2e-29
    • results mainly Aspartoacylases


  • 10 iterations, default E-Value 0.002
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10
    • time: ~10m
    • converged after 8 rounds
    • 500 results
    • best E-Value: 5e-70
    • worst E-Value: 8e-38
    • most significant results include more Succinylglutamate Desuccinylases than Aspartoacylases


  • 10 iterations, E-Value cutoff 10e-10
    • blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10
    • time: ~10m
    • all 10 iterations were done (no early convergence)
    • 500 results
    • best E-Value: 7e-70
    • worst E-Value: 1e-38
    • Aspartoacylases are slightly more frequent among the lower E-Values (< e-58), but there is no significant difference in E-Values between Aspartoacylases and Succinylglutamate Desuccinylases
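
The four PSI-BLAST runs differ only in the number of iterations (-j) and the optional -h value, so the command lines can be generated rather than typed. The following is a minimal sketch, not the script we used: `blastpgp_command` is a helper name introduced here, and the output file names simply follow the pattern of the commands listed above.

```python
DATABASE = "/mnt/project/pracstrucfunc12/data/big/big_80"
QUERY = "P45381_wt.fasta"

def blastpgp_command(iterations, h_value=None):
    """Build the blastpgp argument list for one PSI-BLAST run."""
    tag = f"it{iterations}"
    if h_value is not None:
        # e.g. "10e-10" -> "h10e10", matching the output names used above
        tag += "_h" + h_value.replace("-", "")
    output = f"psiblast_{tag}_p45381_wt_big80.out"
    cmd = ["blastpgp", "-d", DATABASE, "-i", QUERY, "-o", output,
           "-j", str(iterations)]
    if h_value is not None:
        cmd += ["-h", h_value]
    return cmd

# 2 and 10 iterations, each with the default and the stricter -h value.
commands = [blastpgp_command(n, h) for n in (2, 10) for h in (None, "10e-10")]
for cmd in commands:
    print(" ".join(cmd))
    # To execute a run (needs blastpgp and the database to be available):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.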

HHBLITS

We ran HHblits on the student machines with the UniProt20 database.

  • hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_def.out
    • time: 2m50
    • 274 results
    • best E-Value: 2e-110
    • worst E-Value: 0.0011
    • mixed results with Aspartoacylases and Succinylglutamate Desuccinylases
  • hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -o hhblits_p45381_n10.out
    • time: 6m
    • 500 results
    • best E-Value: 2.9e-68
    • worst E-Value: 9.5e-09
    • highly varied results: Aspartoacylases, Succinylglutamate Desuccinylases, zinc proteins

The -n option sets the number of iterations (default: 2).


Summary and Comparison

As expected, PSI-BLAST finds more hits than a simple BLAST search.

In general, the sequence searches frequently identify two kinds of proteins:

  • Aspartoacylases
  • Succinylglutamate Desuccinylases
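
Sorting hits into these two groups can be approximated by a keyword match on the hit descriptions. The sketch below is only an illustration under that assumption: the descriptions are invented examples, `classify` is a helper name introduced here, and real database annotations may need more careful rules (for instance, family names that mention both enzymes).

```python
from collections import Counter

def classify(description):
    """Assign a hit to a protein group based on keywords in its description."""
    text = description.lower()
    if "succinylglutamate desuccinylase" in text:
        return "Succinylglutamate Desuccinylase"
    if "aspartoacylase" in text:
        return "Aspartoacylase"
    return "other"

# Invented example descriptions; not taken from our result files.
descriptions = [
    "Aspartoacylase OS=Example species A",
    "Succinylglutamate desuccinylase OS=Example species B",
    "Aspartoacylase OS=Example species C",
    "Uncharacterized protein OS=Example species D",
]

counts = Counter(classify(d) for d in descriptions)
for group, n in counts.most_common():
    print(group, n)
```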

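Increasing the number of iterations performed in a PSI-BLAST search obviously increases the running time (about 10 minutes for 10 iterations versus about 2m30 for 2). The best-ranked hits have less significant (larger) E-Values than the best hits of the runs with fewer iterations (5e-70 versus e-142 with the default settings). Yet more hits with good E-Values are found overall: the worst of the 500 reported hits improves from 3e-4 to 8e-38. This is not surprising, because the broader profile detects more homologues of the sequences it was built from.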

When restricting the E-Value cutoff for profile construction, we found that more hits are classified as Aspartoacylases than as Succinylglutamate Desuccinylases. The running time and the E-Values of the resulting hits did not change significantly.
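
To make the comparison easier to follow, the figures reported in the PSI-BLAST section can be collected in a small Python table. This is just a sketch over the numbers listed above; E-Values written without a mantissa (such as e-142) are assumed here to mean 1e-142.

```python
# (run, number of results, best E-Value, worst E-Value) as reported above.
# Assumption: "e-142" and "e-145" are read as 1e-142 and 1e-145.
runs = [
    ("2 iterations, default -h", 500, 1e-142, 3e-4),
    ("2 iterations, -h 10e-10", 93, 1e-145, 2e-29),
    ("10 iterations, default -h", 500, 5e-70, 8e-38),
    ("10 iterations, -h 10e-10", 500, 7e-70, 1e-38),
]

for label, n_results, best, worst in runs:
    print(f"{label:<26} results={n_results:>3}  best={best:.0e}  worst={worst:.0e}")

# Run whose top hit is most significant, and run whose weakest reported hit
# is still the most significant.
best_top_hit = min(runs, key=lambda run: run[2])[0]
best_weakest_hit = min(runs, key=lambda run: run[3])[0]
print(best_top_hit)
print(best_weakest_hit)
```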

Comparison of found sequences
