CD task2 protocol

Latest revision as of 16:52, 31 August 2012

Sequence

The native ASPA sequence (UniProt: P45381):

>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH

GO term enrichment

<source lang="java">
// Fragment of a larger class: seq_id, go_ids and count_go_prots are defined elsewhere.
// Requires java.io.BufferedReader, java.io.InputStreamReader, java.net.HttpURLConnection,
// java.net.URL, java.util.Arrays, java.util.HashSet, java.util.List and java.util.Set.
for (int i = 0; i < seq_id.length; i++) {

   // URL for annotations from QuickGO for one protein
   URL u = new URL("http://www.ebi.ac.uk/QuickGO/GAnnotation?protein=" + seq_id[i] + "&format=tsv");
   // Connect
   HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
   // Get data
   BufferedReader rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
   // The first line is the header; locate the columns we need
   List<String> columns = Arrays.asList(rd.readLine().split("\t"));
   int idIndex = columns.indexOf("GO ID");
   int nameIndex = columns.indexOf("GO Name");
   String line;
   // Count the protein if more data follows the header line
   if (rd.ready()) count_go_prots++;
   // GO names already counted for this protein
   Set<String> names = new HashSet<String>();
   while ((line = rd.readLine()) != null) {
       // Split them into fields
       String[] fields = line.split("\t");
       // Count each GO name at most once per protein
       if (!names.contains(fields[nameIndex])) {
           names.add(fields[nameIndex]);
           if (this.go_ids.containsKey(fields[idIndex])) {
               int count = Integer.parseInt(this.go_ids.get(fields[idIndex])[1]);
               this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], String.valueOf(count + 1)});
           } else {
               this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], "1"});
           }
       }
   }
   // close input when finished
   rd.close();
}
</source>
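
The tables further down list only terms "represented more than once", sorted by hit count. A minimal offline sketch of that counting and filtering step could look as follows. The class name `GoTermCounter`, its method names and the toy input are hypothetical and not part of the original pipeline. Unlike the snippet above, it keys the counts by GO name only and takes the annotations as plain lists instead of fetching them from QuickGO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the original code: counts GO terms over a set of hits.
public class GoTermCounter {

    /** For each GO term, counts how many proteins carry it (one count per protein and term). */
    static Map<String, Integer> countTerms(List<List<String>> termsPerProtein) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : termsPerProtein) {
            // A set drops repeated annotations of the same term within one protein
            Set<String> seen = new HashSet<>(terms);
            for (String term : seen) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps terms counted more than once, sorted by descending count, then by name. */
    static List<Map.Entry<String, Integer>> reportable(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                rows.add(e);
            }
        }
        rows.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }

    public static void main(String[] args) {
        // Toy input (made up for illustration): GO names of three hypothetical hits
        List<List<String>> hits = Arrays.asList(
            Arrays.asList("hydrolase activity", "hydrolase activity", "zinc ion binding"),
            Arrays.asList("hydrolase activity", "cytoplasm"),
            Arrays.asList("zinc ion binding", "hydrolase activity"));
        for (Map.Entry<String, Integer> row : reportable(countTerms(hits))) {
            System.out.println(row.getValue() + "\t" + row.getKey());
        }
    }
}
```

Counting by name rather than by GO ID is a simplification made for this sketch; the original snippet stores the counts under the GO ID and keeps the name alongside.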

BlastP

We ran BlastP on the student machines with big_80 as the reference database.

<source lang="bash"> blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out </source>

default E-Value 10 - GO Term Enrichment (hit more than once)

#hits   GO term
185	 metabolic process
184	 hydrolase activity, acting on ester bonds
133	 hydrolase activity
125	 metal ion binding
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60	 aspartoacylase activity
44	 zinc ion binding
24	 arginine metabolic process
24	 arginine catabolic process to glutamate
23	 succinylglutamate desuccinylase activity
8	 cytoplasm
5	 arginine catabolic process to succinate
4	 apical plasma membrane
4	 aminoacylase activity
4	 membrane
3	 plasma membrane
3	 identical protein binding
2	 nucleus
2	 intracellular
2	 exonuclease activity
2	 nucleotide binding
2	 nucleic acid binding
2	 oxidoreductase activity
2	 oxidation-reduction process

E-Value 10e-10 - GO Term Enrichment (hit more than once)

#hits   GO term
94	 hydrolase activity, acting on ester bonds
94	 metabolic process
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
66	 hydrolase activity
62	 metal ion binding
58	 aspartoacylase activity
19	 zinc ion binding
8	 cytoplasm
4	 apical plasma membrane
4	 aminoacylase activity
3	 plasma membrane
3	 identical protein binding
3	 membrane
2	 nucleus
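
One way to read the two BlastP tables side by side is to ask what fraction of the hits for each term survives the stricter cutoff. The sketch below does this for four terms whose counts are copied from the tables above. The class name `CutoffComparison` and the method `retained` are hypothetical. A term missing from the stricter table is treated as 0 here, although the table only omits terms hit fewer than twice, so such values are a lower bound.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original protocol.
public class CutoffComparison {

    /**
     * For each term in 'loose', returns the fraction of its hits still present in 'strict'.
     * Terms absent from 'strict' are treated as 0 hits (an assumption, see the lead-in).
     */
    static Map<String, Double> retained(Map<String, Integer> loose, Map<String, Integer> strict) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : loose.entrySet()) {
            int after = strict.getOrDefault(e.getKey(), 0);
            out.put(e.getKey(), (double) after / e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Counts taken from the BlastP table at the default E-value of 10
        Map<String, Integer> loose = new LinkedHashMap<>();
        loose.put("aspartoacylase activity", 60);
        loose.put("hydrolase activity", 133);
        loose.put("zinc ion binding", 44);
        loose.put("arginine metabolic process", 24);

        // Counts taken from the BlastP table at E-value 10e-10
        Map<String, Integer> strict = new LinkedHashMap<>();
        strict.put("aspartoacylase activity", 58);
        strict.put("hydrolase activity", 66);
        strict.put("zinc ion binding", 19);

        for (Map.Entry<String, Double> row : retained(loose, strict).entrySet()) {
            System.out.printf("%s: %.2f%n", row.getKey(), row.getValue());
        }
    }
}
```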

PsiBlast

PsiBlast was used in the same fashion as BlastP, with big_80 as the background database.

2 iterations and default E-value 0.002

<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i p45381_wt.fa -o psiblast_it2_h0002_1000.out -j 2 -h 0.002 -v 1000 -b 1000</source>

GO Term Enrichment (all terms represented more than once)

#hits GO terms
564	 hydrolase activity, acting on ester bonds
564	 metabolic process
443	 hydrolase activity
433	 metal ion binding
160	 zinc ion binding
114	 arginine metabolic process
114	 arginine catabolic process to glutamate
113	 succinylglutamate desuccinylase activity
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60	 aspartoacylase activity
28	 proteolysis
27	 metallocarboxypeptidase activity
21	 arginine catabolic process to succinate
12	 carboxypeptidase activity
8	 cytoplasm
4	 apical plasma membrane
4	 aminoacylase activity
3	 plasma membrane
3	 identical protein binding
3	 membrane
2	 nucleus
2	 arginine catabolic process
2	 transferase activity


2 iterations, stricter E-value cutoff of 10E-10

<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e-10_1000.out -j 2 -h 10e-10 -v 1000 -b 1000</source>

GO Term Enrichment (all terms represented more than once)

#hits  GO term
480	 hydrolase activity, acting on ester bonds
480	 metabolic process
374	 hydrolase activity
363	 metal ion binding
148	 zinc ion binding
108	 arginine metabolic process
108	 arginine catabolic process to glutamate
107	 succinylglutamate desuccinylase activity
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60	 aspartoacylase activity
22	 proteolysis
21	 metallocarboxypeptidase activity
19	 arginine catabolic process to succinate
9	 carboxypeptidase activity
8	 cytoplasm
4	 apical plasma membrane
4	 aminoacylase activity
3	 plasma membrane
3	 identical protein binding
3	 membrane
2	 nucleus
2	 transferase activity



10 iterations, default E-value 0.002

<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10 </source>


GO Term Enrichment (all terms represented more than once)


#hits   GO term
2352	 zinc ion binding
2217	 proteolysis
2214	 metallocarboxypeptidase activity
1438	 carboxypeptidase activity
953	 metabolic process
940	 hydrolase activity, acting on ester bonds
885	 hydrolase activity
783	 metal ion binding
118	 arginine catabolic process to glutamate
118	 arginine metabolic process
116	 succinylglutamate desuccinylase activity
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
85	 peptidase activity
60	 aspartoacylase activity
55	 metallopeptidase activity
33	 cell wall macromolecule catabolic process
32	 cytoplasm
28	 cell adhesion
23	 cytosol
22	 membrane
22	 arginine catabolic process to succinate
20	 nucleus
18	 cellular_component
17	 cellular cell wall organization
16	 vacuole
16	 tubulin binding
15	 extracellular region
13	 serine-type carboxypeptidase activity
11	 extracellular space
9	 protein side chain deglutamylation
9	 C-terminal protein deglutamylation
9	 plasma membrane
8	 protein deglutamylation
8	 biological_process
8	 transferase activity
7	 protein branching point deglutamylation
7	 molecular_function
6	 cell wall
6	 catalytic activity
6	 integral to membrane
6	 regulation of transcription, DNA-dependent
5	 mitochondrion organization
5	 mitochondrion
5	 ATP binding
4	 transcription corepressor activity
4	 regulation of root meristem growth
4	 aminoacylase activity
4	 apical plasma membrane
3	 methylation
3	 cerebellar Purkinje cell differentiation
3	 carbohydrate binding
3	 eye photoreceptor cell differentiation
3	 methyltransferase activity
3	 identical protein binding
3	 neuromuscular process
3	 olfactory bulb development
3	 DNA binding
3	 pathogenesis
2	 protein binding
2	 sequence-specific DNA binding transcription factor activity
2	 flagellum
2	 intracellular
2	 transferase activity, transferring phosphorus-containing groups
2	 carbohydrate metabolic process
2	 transferase activity, transferring glycosyl groups
2	 calcium ion binding
2	 regulation of angiotensin levels in blood
2	 fungal-type vacuole
2	 proteinaceous extracellular matrix
2	 cytoplasmic vesicle
2	 protein kinase activity
2	 response to chemical stimulus
2	 inorganic diphosphatase activity
2	 oxidation-reduction process
2	 purine-nucleoside phosphorylase activity
2	 fungal-type cell wall organization
2	 arginine catabolic process
2	 secretory granule
2	 extracellular matrix
2	 protein phosphorylation
2	 polysaccharide catabolic process


10 iterations, E-value cutoff 10E-10

<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10 </source>


GO Term Enrichment (all terms represented more than once)

#hits   GO term
965	 metabolic process 
964	 hydrolase activity, acting on ester bonds 
790	 hydrolase activity 
754	 metal ion binding 
639	 zinc ion binding 
504	 proteolysis 
502	 metallocarboxypeptidase activity 
337	 carboxypeptidase activity 
118	 arginine metabolic process 
118	 arginine catabolic process to glutamate 
116	 succinylglutamate desuccinylase activity 
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides 
60	 aspartoacylase activity 
22	 arginine catabolic process to succinate 
11	 peptidase activity 
11	 cytoplasm 
7	 nucleus 
7	 membrane 
6	 plasma membrane 
6	 metallopeptidase activity 
5	 serine-type carboxypeptidase activity 
5	 biological_process 
4	 transferase activity 
4	 regulation of root meristem growth 
4	 cell wall macromolecule catabolic process 
4	 apical plasma membrane 
4	 aminoacylase activity 
3	 vacuole 
3	 molecular_function 
3	 methyltransferase activity 
3	 methylation 
3	 integral to membrane 
3	 identical protein binding 
3	 cellular_component 
3	 cellular cell wall organization 
2	 transferase activity, transferring glycosyl groups 
2	 purine-nucleoside phosphorylase activity 
2	 polysaccharide catabolic process 
2	 fungal-type vacuole 
2	 fungal-type cell wall organization 
2	 catalytic activity 
2	 arginine catabolic process

HHBlits

We ran HHBlits on the student machines with the Uniprot20 database.

  • 2 iterations
 <source lang="bash"> time hhblits -i p45381_wt.fa -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_2it_0.002.out -e 0.002 -z 1000 -v 1000 </source>


  • 2 iterations, -e 10e-10
 <source lang="bash">  hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 10e-10 -o hhblits_p45381_def.out </source>
  • 8 iterations, -e 10e-10

<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -e 10e-10 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>

  • 8 iterations

<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>