Difference between revisions of "CD task2 protocol"

From Bioinformatikpedia

Revision as of 15:03, 29 August 2012

Sequence

The native ASPA sequence (UniProt: P45381):

>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH

GO term enrichment

<source lang="java">

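// Requires java.io.BufferedReader, java.io.InputStreamReader, java.net.HttpURLConnection,
// java.net.URL, java.util.Arrays, java.util.HashSet, java.util.List and java.util.Set.
// go_ids maps a GO ID to {GO name, number of proteins annotated with that term}.
for (int i = 0; i < seq_id.length; i++) {

   // URL for annotations from QuickGO for one protein
   URL u = new URL("http://www.ebi.ac.uk/QuickGO/GAnnotation?protein=" + seq_id[i] + "&format=tsv");
   // Connect
   HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
   // Get data
   BufferedReader rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
   // The first line is the header; use it to locate the columns we need
   List<String> columns = Arrays.asList(rd.readLine().split("\t"));
   int idIndex = columns.indexOf("GO ID");
   int nameIndex = columns.indexOf("GO Name");
   String line;
   // Count this protein as annotated if data follows the header
   if (rd.ready()) count_go_prots++;
   // GO names already seen for this protein, so each term is counted once per protein
   Set<String> names = new HashSet<String>();
   while ((line = rd.readLine()) != null) {
      // Split the line into fields
      String[] fields = line.split("\t");
      if (!names.contains(fields[nameIndex])) {
         names.add(fields[nameIndex]);
         if (this.go_ids.containsKey(fields[idIndex])) {
            // Term seen before in another protein: increment its count
            int count = Integer.parseInt(this.go_ids.get(fields[idIndex])[1]);
            this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], String.valueOf(count + 1)});
         } else {
            // First occurrence of this term
            this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], "1"});
         }
      }
   }
   // Close input when finished
   rd.close();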

} </source>
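
The tables on this page list only terms hit more than once, sorted by number of hits. The code for that step is not shown here, so the following is only a minimal sketch of how such a listing could be produced from a map laid out like go_ids. The class name GoTermReport, the method report, and the placeholder keys id1 to id3 are assumptions made for illustration, and the term with a single hit is made up to show the filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original protocol code.
class GoTermReport {

    // goIds maps a GO ID to {GO name, count as a string}, the layout used by go_ids.
    // Returns "count<TAB>name" lines for terms with at least minHits hits, highest count first.
    static List<String> report(Map<String, String[]> goIds, int minHits) {
        List<String[]> kept = new ArrayList<>();
        for (String[] entry : goIds.values()) {
            if (Integer.parseInt(entry[1]) >= minHits) {
                kept.add(entry);
            }
        }
        // Sort by count (descending); break ties by name so the output is deterministic
        kept.sort((a, b) -> {
            int byCount = Integer.compare(Integer.parseInt(b[1]), Integer.parseInt(a[1]));
            return byCount != 0 ? byCount : a[0].compareTo(b[0]);
        });
        List<String> lines = new ArrayList<>();
        for (String[] entry : kept) {
            lines.add(entry[1] + "\t" + entry[0]);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String[]> goIds = new HashMap<>();
        // Placeholder keys; real keys would be GO IDs returned by QuickGO
        goIds.put("id1", new String[]{"zinc ion binding", "44"});
        goIds.put("id2", new String[]{"aspartoacylase activity", "60"});
        goIds.put("id3", new String[]{"term seen only once", "1"});
        // Keep terms represented more than once
        for (String line : report(goIds, 2)) {
            System.out.println(line);
        }
    }
}
```

Sorting on the parsed integer rather than on the stored string matters here, because a string comparison would place "9" above "60".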

BlastP

We ran BlastP on the student machines with big_80 as the reference database.

<source lang="bash"> blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out </source>

Default E-value cutoff of 10: GO term enrichment (terms hit more than once)

#hits   GO term
185	 metabolic process
184	 hydrolase activity, acting on ester bonds
133	 hydrolase activity
125	 metal ion binding
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60	 aspartoacylase activity
44	 zinc ion binding
24	 arginine metabolic process
24	 arginine catabolic process to glutamate
23	 succinylglutamate desuccinylase activity
8	 cytoplasm
5	 arginine catabolic process to succinate
4	 apical plasma membrane
4	 aminoacylase activity
4	 membrane
3	 plasma membrane
3	 identical protein binding
2	 nucleus
2	 intracellular
2	 exonuclease activity
2	 nucleotide binding
2	 nucleic acid binding
2	 oxidoreductase activity
2	 oxidation-reduction process

E-value cutoff of 10e-10: GO term enrichment (terms hit more than once)

#hits   GO term
94	 hydrolase activity, acting on ester bonds
94	 metabolic process
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
66	 hydrolase activity
62	 metal ion binding
58	 aspartoacylase activity
19	 zinc ion binding
8	 cytoplasm
4	 apical plasma membrane
4	 aminoacylase activity
3	 plasma membrane
3	 identical protein binding
3	 membrane
2	 nucleus
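
A note on notation: the stricter cutoff is written on this page as 10e-10 (or 10E-10). Read as standard scientific notation, this is 10 × 10^-10, which is 1e-9 rather than 1e-10. The sketch below checks this with Java's standard Double.parseDouble. It assumes the search tools read the value the same way, and the class name EValueNotation is made up for illustration.

```java
// Hypothetical illustration, not part of the original protocol code.
class EValueNotation {

    // Parses an E-value string written in scientific notation
    static double parse(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double cutoff = parse("10e-10");
        System.out.println(cutoff);                   // prints 1.0E-9
        System.out.println(cutoff == parse("1e-9"));  // prints true
        System.out.println(cutoff == parse("1e-10")); // prints false
    }
}
```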

PsiBlast

PsiBlast was run in the same way as BlastP, with big_80 as the search database.

2 iterations with the default E-value threshold for profile inclusion (-h 0.002):
<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i p45381_wt.fa -o psiblast_it2_h0002_1000.out -j 2 -h 0.002 -v 1000 -b 1000</source>

GO Term Enrichment (all terms represented more than once)

#hits GO terms
754	 metabolic process	
752	 hydrolase activity, acting on ester bonds	
599	 hydrolase activity	
588	 metal ion binding	
256	 zinc ion binding	
121	 proteolysis	
118	 metallocarboxypeptidase activity	
117	 arginine catabolic process to glutamate	
117	 arginine metabolic process	
116	 succinylglutamate desuccinylase activity	
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides	
60	 aspartoacylase activity	
57	 carboxypeptidase activity	
22	 arginine catabolic process to succinate	
9	 cytoplasm	
6	 ATP binding	
5	 membrane	
5	 transferase activity	
4	 methyltransferase activity	
4	 methylation	
4	 apical plasma membrane	
4	 aminoacylase activity	
4	 regulation of transcription, DNA-dependent	
3	 identical protein binding	
3	 nucleotide binding	
3	 plasma membrane	
3	 nucleus	
3	 catalytic activity	
2	 sequence-specific DNA binding transcription factor activity	
2	 intracellular	
2	 kinase activity	
2	 phosphorylation	
2	 peptidase activity	
2	 molecular_function	
2	 signal transducer activity	
2	 arginine catabolic process	
2	 DNA binding	
2	 biological_process	
2	 signal transduction	
2	 integral to membrane	 


2 iterations with a stricter E-value threshold for profile inclusion (-h 10e-10):
<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e-10_1000.out -j 2 -h 10e-10 -v 1000 -b 1000</source>

GO Term Enrichment (all terms represented more than once)

#hits  GO term
480	 hydrolase activity, acting on ester bonds
480	 metabolic process
374	 hydrolase activity
363	 metal ion binding
148	 zinc ion binding
108	 arginine metabolic process
108	 arginine catabolic process to glutamate
107	 succinylglutamate desuccinylase activity
88	 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60	 aspartoacylase activity
22	 proteolysis
21	 metallocarboxypeptidase activity
19	 arginine catabolic process to succinate
9	 carboxypeptidase activity
8	 cytoplasm
4	 apical plasma membrane
4	 aminoacylase activity
3	 plasma membrane
3	 identical protein binding
3	 membrane
2	 nucleus
2	 transferase activity



10 iterations with the default E-value threshold for profile inclusion (0.002):
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10 </source>


10 iterations with a stricter E-value threshold for profile inclusion (-h 10e-10):
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10 </source>

HHBlits

We ran HHBlits on the student machines with the Uniprot20 database.

  • 2 iterations
 <source lang="bash">  hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_def.out </source>
  • 8 iterations

<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>

  • 2 iterations, -e 10e-10
 <source lang="bash">  hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 10e-10 -o hhblits_p45381_def.out </source>
  • 8 iterations, -e 10e-10

<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -e 10e-10 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>