CD task2 protocol
Sequence
The native ASPA sequence (UniProt: P45381):
>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH
GO term enrichment
<source lang="java">
// Fragment of a method; it needs java.io.BufferedReader, java.io.InputStreamReader,
// java.net.HttpURLConnection, java.net.URL, java.util.Arrays, java.util.HashSet,
// java.util.List and java.util.Set.
// go_ids maps a GO ID to {GO name, number of proteins annotated with that term}.
for (int i = 0; i < seq_id.length; i++) {
    // URL for the QuickGO annotations of one protein (TSV format)
    URL u = new URL("http://www.ebi.ac.uk/QuickGO/GAnnotation?protein=" + seq_id[i] + "&format=tsv");
    // Connect
    HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
    // Get data
    BufferedReader rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
    // The first line is the header; look up the columns we need
    List<String> columns = Arrays.asList(rd.readLine().split("\t"));
    int idIndex = columns.indexOf("GO ID");
    int nameIndex = columns.indexOf("GO Name");
    String line;
    // Count the proteins for which more data follows the header
    if (rd.ready()) count_go_prots++;
    // GO names already seen for this protein, so each term counts once per protein
    Set<String> names = new HashSet<String>();
    while ((line = rd.readLine()) != null) {
        // Split the line into fields
        String[] fields = line.split("\t");
        if (!names.contains(fields[nameIndex])) {
            names.add(fields[nameIndex]);
            if (this.go_ids.containsKey(fields[idIndex])) {
                int count = Integer.parseInt(this.go_ids.get(fields[idIndex])[1]);
                this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], String.valueOf(count + 1)});
            } else {
                this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], "1"});
            }
        }
    }
    // Close input when finished
    rd.close();
}
</source>
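The result lists on this page show only GO terms that were hit more than once, ordered by hit count. How the original program produced them is not shown; the following is a minimal sketch of one way to do it, assuming the counts are stored as in the loop above (GO ID mapped to a two-element array of name and count). The class `GoTermReport` and its method `report` are hypothetical names, and the entries in `main` are toy data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original program.
class GoTermReport {

    // Returns "count name" lines for every GO term counted at least
    // minCount times, sorted by descending count. Ties keep the
    // iteration order of the map because List.sort is stable.
    static List<String> report(Map<String, String[]> goIds, int minCount) {
        List<String[]> rows = new ArrayList<String[]>();
        for (String[] entry : goIds.values()) {
            if (Integer.parseInt(entry[1]) >= minCount) {
                rows.add(entry);
            }
        }
        rows.sort((a, b) -> Integer.compare(Integer.parseInt(b[1]), Integer.parseInt(a[1])));
        List<String> lines = new ArrayList<String>();
        for (String[] row : rows) {
            lines.add(row[1] + " " + row[0]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy data in the same shape as go_ids: GO ID -> {GO name, count}
        Map<String, String[]> goIds = new LinkedHashMap<String, String[]>();
        goIds.put("GO:0005634", new String[]{"nucleus", "2"});
        goIds.put("GO:0008152", new String[]{"metabolic process", "185"});
        goIds.put("GO:0005739", new String[]{"mitochondrion", "1"});
        System.out.println("#hits GO term");
        for (String line : report(goIds, 2)) {
            System.out.println(line);
        }
    }
}
```

Calling `report(goIds, 2)` drops every term that was seen for only one protein, which corresponds to the "hit more than once" condition used in the result lists.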
BlastP
We ran BlastP on the student machines with big_80 as the reference database.
<source lang="bash"> blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out </source>
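The GO lookup needs a list of hit identifiers, and the two result sets in this section differ only in the E-value cutoff. The page does not show how the hits were extracted from the BLAST output. The following is a minimal sketch under the assumption that the search is rerun with tabular output (`blastall -m 8`, twelve tab-separated columns with the subject ID in column 2 and the E-value in column 11), which is not what the command above produces. The class `BlastTabFilter` and the hit names are hypothetical, and mapping subject IDs to the accessions expected by QuickGO is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original pipeline.
class BlastTabFilter {

    // Collects the distinct subject IDs (column 2) of all hits whose
    // E-value (column 11) is at or below maxEvalue, in order of first appearance.
    static List<String> subjectIds(List<String> tabLines, double maxEvalue) {
        Set<String> ids = new LinkedHashSet<String>();
        for (String line : tabLines) {
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] f = line.split("\t");
            if (f.length < 12) continue; // not a tabular hit line
            // Guard for E-values written without a leading digit, e.g. "e-180"
            String e = f[10].startsWith("e") ? "1" + f[10] : f[10];
            if (Double.parseDouble(e) <= maxEvalue) {
                ids.add(f[1]);
            }
        }
        return new ArrayList<String>(ids);
    }

    public static void main(String[] args) {
        // Toy tabular lines; hitA and hitB are made-up identifiers
        List<String> lines = Arrays.asList(
            "query\thitA\t98.0\t313\t5\t0\t1\t313\t1\t313\t1e-180\t640",
            "query\thitB\t30.1\t120\t80\t4\t10\t129\t15\t130\t0.5\t31.2",
            "query\thitA\t45.0\t50\t27\t0\t200\t249\t1\t50\t3e-12\t60.1");
        System.out.println(subjectIds(lines, 10.0));
        // "10e-10" as typed on the command line is parsed as 1.0E-9
        System.out.println(subjectIds(lines, Double.parseDouble("10e-10")));
    }
}
```

With the toy lines, the cutoff of 10 keeps both hits and the stricter cutoff keeps only `hitA`. Note that `10e-10` is numerically 1e-9, not 1e-10.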
Default E-value of 10: GO term enrichment (terms hit more than once)
#hits GO term
185 metabolic process
184 hydrolase activity, acting on ester bonds
133 hydrolase activity
125 metal ion binding
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60 aspartoacylase activity
44 zinc ion binding
24 arginine metabolic process
24 arginine catabolic process to glutamate
23 succinylglutamate desuccinylase activity
8 cytoplasm
5 arginine catabolic process to succinate
4 apical plasma membrane
4 aminoacylase activity
4 membrane
3 plasma membrane
3 identical protein binding
2 nucleus
2 intracellular
2 exonuclease activity
2 nucleotide binding
2 nucleic acid binding
2 oxidoreductase activity
2 oxidation-reduction process
E-value cutoff of 10e-10: GO term enrichment (terms hit more than once)
#hits GO term
94 hydrolase activity, acting on ester bonds
94 metabolic process
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
66 hydrolase activity
62 metal ion binding
58 aspartoacylase activity
19 zinc ion binding
8 cytoplasm
4 apical plasma membrane
4 aminoacylase activity
3 plasma membrane
3 identical protein binding
3 membrane
2 nucleus
PsiBlast
PSI-BLAST was run in the same way as BlastP, with big_80 as the background database.
2 iterations, default E-value cutoff of 0.002
<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i p45381_wt.fa -o psiblast_it2_h0002_1000.out -j 2 -h 0.002 -v 1000 -b 1000 </source>
GO Term Enrichment (all terms represented more than once)
#hits GO term
564 hydrolase activity, acting on ester bonds
564 metabolic process
443 hydrolase activity
433 metal ion binding
160 zinc ion binding
114 arginine metabolic process
114 arginine catabolic process to glutamate
113 succinylglutamate desuccinylase activity
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60 aspartoacylase activity
28 proteolysis
27 metallocarboxypeptidase activity
21 arginine catabolic process to succinate
12 carboxypeptidase activity
8 cytoplasm
4 apical plasma membrane
4 aminoacylase activity
3 plasma membrane
3 identical protein binding
3 membrane
2 nucleus
2 arginine catabolic process
2 transferase activity
2 iterations, stricter E-value cutoff of 10e-10
<source lang="bash"> time blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e-10_1000.out -j 2 -h 10e-10 -v 1000 -b 1000 </source>
GO Term Enrichment (all terms represented more than once)
#hits GO term
480 hydrolase activity, acting on ester bonds
480 metabolic process
374 hydrolase activity
363 metal ion binding
148 zinc ion binding
108 arginine metabolic process
108 arginine catabolic process to glutamate
107 succinylglutamate desuccinylase activity
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60 aspartoacylase activity
22 proteolysis
21 metallocarboxypeptidase activity
19 arginine catabolic process to succinate
9 carboxypeptidase activity
8 cytoplasm
4 apical plasma membrane
4 aminoacylase activity
3 plasma membrane
3 identical protein binding
3 membrane
2 nucleus
2 transferase activity
10 iterations, default E-value cutoff of 0.002
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10 </source>
GO Term Enrichment (all terms represented more than once)
#hits GO term
2352 zinc ion binding
2217 proteolysis
2214 metallocarboxypeptidase activity
1438 carboxypeptidase activity
953 metabolic process
940 hydrolase activity, acting on ester bonds
885 hydrolase activity
783 metal ion binding
118 arginine catabolic process to glutamate
118 arginine metabolic process
116 succinylglutamate desuccinylase activity
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
85 peptidase activity
60 aspartoacylase activity
55 metallopeptidase activity
33 cell wall macromolecule catabolic process
32 cytoplasm
28 cell adhesion
23 cytosol
22 membrane
22 arginine catabolic process to succinate
20 nucleus
18 cellular_component
17 cellular cell wall organization
16 vacuole
16 tubulin binding
15 extracellular region
13 serine-type carboxypeptidase activity
11 extracellular space
9 protein side chain deglutamylation
9 C-terminal protein deglutamylation
9 plasma membrane
8 protein deglutamylation
8 biological_process
8 transferase activity
7 protein branching point deglutamylation
7 molecular_function
6 cell wall
6 catalytic activity
6 integral to membrane
6 regulation of transcription, DNA-dependent
5 mitochondrion organization
5 mitochondrion
5 ATP binding
4 transcription corepressor activity
4 regulation of root meristem growth
4 aminoacylase activity
4 apical plasma membrane
3 methylation
3 cerebellar Purkinje cell differentiation
3 carbohydrate binding
3 eye photoreceptor cell differentiation
3 methyltransferase activity
3 identical protein binding
3 neuromuscular process
3 olfactory bulb development
3 DNA binding
3 pathogenesis
2 protein binding
2 sequence-specific DNA binding transcription factor activity
2 flagellum
2 intracellular
2 transferase activity, transferring phosphorus-containing groups
2 carbohydrate metabolic process
2 transferase activity, transferring glycosyl groups
2 calcium ion binding
2 regulation of angiotensin levels in blood
2 fungal-type vacuole
2 proteinaceous extracellular matrix
2 cytoplasmic vesicle
2 protein kinase activity
2 response to chemical stimulus
2 inorganic diphosphatase activity
2 oxidation-reduction process
2 purine-nucleoside phosphorylase activity
2 fungal-type cell wall organization
2 arginine catabolic process
2 secretory granule
2 extracellular matrix
2 protein phosphorylation
2 polysaccharide catabolic process
10 iterations, E-value cutoff 10E-10
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10 </source>
GO Term Enrichment (all terms represented more than once)
#hits GO term
965 metabolic process
964 hydrolase activity, acting on ester bonds
790 hydrolase activity
754 metal ion binding
639 zinc ion binding
504 proteolysis
502 metallocarboxypeptidase activity
337 carboxypeptidase activity
118 arginine metabolic process
118 arginine catabolic process to glutamate
116 succinylglutamate desuccinylase activity
88 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
60 aspartoacylase activity
22 arginine catabolic process to succinate
11 peptidase activity
11 cytoplasm
7 nucleus
7 membrane
6 plasma membrane
6 metallopeptidase activity
5 serine-type carboxypeptidase activity
5 biological_process
4 transferase activity
4 regulation of root meristem growth
4 cell wall macromolecule catabolic process
4 apical plasma membrane
4 aminoacylase activity
3 vacuole
3 molecular_function
3 methyltransferase activity
3 methylation
3 integral to membrane
3 identical protein binding
3 cellular_component
3 cellular cell wall organization
2 transferase activity, transferring glycosyl groups
2 purine-nucleoside phosphorylase activity
2 polysaccharide catabolic process
2 fungal-type vacuole
2 fungal-type cell wall organization
2 catalytic activity
2 arginine catabolic process
HHBlits
We ran HHblits on the student machines with the UniProt20 database.
- 2 iterations
<source lang="bash"> time hhblits -i p45381_wt.fa -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_2it_0.002.out -e 0.002 -z 1000 -v 1000 </source>
- 2 iterations, -e 10e-10
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 10e-10 -o hhblits_p45381_def.out </source>
- 8 iterations, -e 10e-10
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -e 10e-10 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>
- 8 iterations
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>