CD task2 protocol
Revision as of 13:34, 29 August 2012
Sequence
The native ASPA sequence (UniProt: P45381):
>hsa:443 ASPA, ACY2, ASP; aspartoacylase; K01437 aspartoacylase [EC:3.5.1.15] (A)
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKPFITNPRAVKK
CTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLFGPKDSEDSYDIIFDLHNTTS
NMGCTLILEDSRNNFLIQMFHYIKTSLAPLPCYVYLIEHPSLKYATTRSIAKYPVGIEVG
PQPQGVLRADILDQMRKMIKHALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIA
AIIHPNLQDQDWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK
LTLNAKSIRCCLH
GO term enrichment
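<source lang="java">
// Fragment from a method of the enclosing class, which declares seq_id, go_ids (Map<String, String[]>) and count_go_prots.
// Needs java.io.BufferedReader, java.io.InputStreamReader, java.net.HttpURLConnection, java.net.URL,
// java.util.Arrays, java.util.HashSet, java.util.List and java.util.Set; the method must handle IOException.
for (int i = 0; i < seq_id.length; i++) {
    // URL for annotations from QuickGO for one protein (QuickGO GAnnotation service as used in 2012, TSV output)
    URL u = new URL("http://www.ebi.ac.uk/QuickGO/GAnnotation?protein=" + seq_id[i] + "&format=tsv");
    // Connect
    HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
    // Get data; try-with-resources closes the input when finished
    try (BufferedReader rd = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()))) {
        // The first line is the TSV header: locate the GO ID and GO Name columns
        String header = rd.readLine();
        if (header == null) continue;
        List<String> columns = Arrays.asList(header.split("\t"));
        int idIndex = columns.indexOf("GO ID");
        int nameIndex = columns.indexOf("GO Name");
        if (idIndex < 0 || nameIndex < 0) continue;
        // GO terms already counted for this protein, so each term counts at most once per protein
        Set<String> names = new HashSet<String>();
        boolean annotated = false;
        String line;
        while ((line = rd.readLine()) != null) {
            // Split them into fields
            String[] fields = line.split("\t");
            if (fields.length <= Math.max(idIndex, nameIndex)) continue;
            annotated = true;
            // Set.add returns false if this term was already seen for this protein
            if (names.add(fields[nameIndex])) {
                String[] entry = this.go_ids.get(fields[idIndex]);
                int count = (entry == null) ? 0 : Integer.parseInt(entry[1]);
                this.go_ids.put(fields[idIndex], new String[]{fields[nameIndex], String.valueOf(count + 1)});
            }
        }
        // Count proteins with at least one GO annotation (rd.ready() is not a reliable test for this)
        if (annotated) count_go_prots++;
    }
}
</source>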
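The tables below list each GO term with the number of proteins annotated with it, keeping only terms hit more than once. A minimal sketch of that ranking step, assuming `go_ids` has the shape used above (GO ID mapped to `{GO name, count}`); the class and method names are hypothetical:

<source lang="java">
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a go_ids-style map (GO ID -> {GO name, count})
// into "#hits<TAB>GO term" rows, most frequently hit terms first.
class GoTermTable {

    static List<String> rankTerms(Map<String, String[]> goIds, int minHits) {
        List<String[]> entries = new ArrayList<>(goIds.values());
        // Sort by count, highest first
        entries.sort((a, b) -> Integer.compare(Integer.parseInt(b[1]), Integer.parseInt(a[1])));
        List<String> rows = new ArrayList<>();
        for (String[] entry : entries) {
            int count = Integer.parseInt(entry[1]);
            // "Hit more than once" corresponds to minHits = 2
            if (count >= minHits) {
                rows.add(count + "\t" + entry[0]);
            }
        }
        return rows;
    }
}
</source>

Calling `rankTerms(this.go_ids, 2)` after the loop would give rows in the same order as the tables below; if `go_ids` is a `HashMap`, terms with equal counts come out in no particular order.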
BlastP
We ran BLASTP on the student machines with big_80 as the reference database.
<source lang="bash"> blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o blastp_p45381_wt_big80.out </source>
GO term enrichment at the default E-value of 10 (terms hit more than once):
{| class="wikitable"
! #hits !! GO term
|-
| 185 || metabolic process
|-
| 184 || hydrolase activity, acting on ester bonds
|-
| 133 || hydrolase activity
|-
| 125 || metal ion binding
|-
| 88 || hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
|-
| 60 || aspartoacylase activity
|-
| 44 || zinc ion binding
|-
| 24 || arginine metabolic process
|-
| 24 || arginine catabolic process to glutamate
|-
| 23 || succinylglutamate desuccinylase activity
|-
| 8 || cytoplasm
|-
| 5 || arginine catabolic process to succinate
|-
| 4 || apical plasma membrane
|-
| 4 || aminoacylase activity
|-
| 4 || membrane
|-
| 3 || plasma membrane
|-
| 3 || identical protein binding
|-
| 2 || nucleus
|-
| 2 || intracellular
|-
| 2 || exonuclease activity
|-
| 2 || nucleotide binding
|-
| 2 || nucleic acid binding
|-
| 2 || oxidoreductase activity
|-
| 2 || oxidation-reduction process
|}
GO term enrichment at an E-value cutoff of 10e-10 (terms hit more than once):
{| class="wikitable"
! #hits !! GO term
|-
| 94 || hydrolase activity, acting on ester bonds
|-
| 94 || metabolic process
|-
| 88 || hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
|-
| 66 || hydrolase activity
|-
| 62 || metal ion binding
|-
| 58 || aspartoacylase activity
|-
| 19 || zinc ion binding
|-
| 8 || cytoplasm
|-
| 4 || apical plasma membrane
|-
| 4 || aminoacylase activity
|-
| 3 || plasma membrane
|-
| 3 || identical protein binding
|-
| 3 || membrane
|-
| 2 || nucleus
|}
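The two tables differ only in the E-value cutoff. One way to derive both from a single blastall run is to filter the one-line hit summary afterwards. The following is a minimal sketch of that option (not necessarily how these tables were produced), assuming legacy blastall default (pairwise) output, where each summary line starts with the hit ID and ends with the E-value, and where legacy BLAST can print very small E-values without a leading 1 (for example `e-106`). The class and method names are hypothetical:

<source lang="java">
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps the IDs of hits whose E-value is at or below a cutoff,
// read from the one-line hit summary of a legacy blastall report (default pairwise output).
class BlastHitFilter {

    // Legacy BLAST can print E-values such as "e-106", meaning 1e-106
    static double parseEValue(String s) {
        return Double.parseDouble(s.startsWith("e") ? "1" + s : s);
    }

    static List<String> hitsBelow(List<String> reportLines, double maxEValue) {
        List<String> ids = new ArrayList<>();
        boolean inSummary = false;
        for (String line : reportLines) {
            if (line.trim().startsWith("Sequences producing significant alignments")) {
                inSummary = true;
                continue;
            }
            // The pairwise alignments, each starting with '>', follow the summary
            if (line.startsWith(">")) break;
            if (!inSummary || line.trim().isEmpty()) continue;
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 3) continue;
            try {
                // First token: hit ID; last token: E-value
                if (parseEValue(tokens[tokens.length - 1]) <= maxEValue) {
                    ids.add(tokens[0]);
                }
            } catch (NumberFormatException e) {
                // Not a hit line; skip it
            }
        }
        return ids;
    }
}
</source>

For example, `hitsBelow(Files.readAllLines(Paths.get("blastp_p45381_wt_big80.out")), 10e-10)` would return the hit IDs that pass the stricter cutoff from the run above.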
PsiBlast
PSI-BLAST (blastpgp) was run in the same way as BLASTP, again with big_80 as the database.
2 iterations, default inclusion E-value threshold (-h) of 0.002, one-line descriptions for up to 700 hits (-v 700):
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_p45381_wt_big80.out -j 2 -v 700 </source>
2 iterations, stricter inclusion E-value cutoff (-h) of 10e-10:
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it2_h10e10_p45381_wt_big80.out -j 2 -h 10e-10 </source>
GO term enrichment (all terms represented more than once):
{| class="wikitable"
! #hits !! GO term
|-
| 766 || hydrolase activity, acting on ester bonds
|-
| 665 || metabolic process
|-
| 507 || hydrolase activity
|-
| 488 || metal ion binding
|-
| 294 || zinc ion binding
|-
| 234 || arginine catabolic process to glutamate
|-
| 176 || hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides
|-
| 158 || aspartoacylase activity
|-
| 132 || arginine metabolic process
|-
| 132 || succinylglutamate desuccinylase activity
|-
| 30 || cytoplasm
|-
| 24 || arginine catabolic process to succinate
|-
| 23 || proteolysis
|-
| 22 || metallocarboxypeptidase activity
|-
| 12 || apical plasma membrane
|-
| 9 || carboxypeptidase activity
|-
| 8 || nucleus
|-
| 8 || aminoacylase activity
|-
| 8 || membrane
|-
| 6 || identical protein binding
|-
| 6 || oxidation-reduction process
|-
| 6 || plasma membrane
|-
| 5 || oxidoreductase activity
|-
| 2 || malate synthase activity
|-
| 2 || glyoxylate cycle
|-
| 2 || exonuclease activity
|-
| 2 || nucleotide binding
|-
| 2 || acetate metabolic process
|-
| 2 || arginine catabolic process
|-
| 2 || cell adhesion
|-
| 2 || virus-host interaction
|-
| 2 || integral to membrane
|-
| 2 || transferase activity
|}
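The stricter cutoff passed as `-h 10e-10` is, in standard scientific notation, 10 × 10^-10, i.e. 1e-9 rather than 1e-10. A minimal check with Java's `Double.parseDouble`, which reads the same notation (the class and method names are hypothetical):

<source lang="java">
// Hypothetical check of the E-value notation used in the commands
class EValueNotation {

    // Parses an E-value written in scientific notation, e.g. "10e-10"
    static double parse(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double cutoff = parse("10e-10");
        System.out.println(cutoff);           // prints 1.0E-9
        System.out.println(cutoff == 1e-9);   // prints true
        System.out.println(cutoff == 1e-10);  // prints false
    }
}
</source>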
10 iterations, default inclusion E-value threshold (-h) of 0.002:
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_p45381_wt_big80.out -j 10 </source>
10 iterations, inclusion E-value cutoff (-h) of 10e-10:
<source lang="bash"> blastpgp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P45381_wt.fasta -o psiblast_it10_h10e10_p45381_wt_big80.out -j 10 -h 10e-10 </source>
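With `-j 10`, blastpgp can report up to ten hit lists, one per iteration, and stops earlier if the search converges. A minimal sketch that counts the rounds and keeps only the part of the report belonging to the last one, assuming the legacy output layout in which each round starts with a line such as `Results from round 2`; the class and method names are hypothetical:

<source lang="java">
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for legacy blastpgp reports split into
// "Results from round N" sections.
class PsiBlastRounds {

    static boolean isRoundHeader(String line) {
        return line.trim().startsWith("Results from round");
    }

    // Lines from the last round header to the end of the report
    // (the whole report if no round header is found)
    static List<String> lastRound(List<String> reportLines) {
        int start = 0;
        for (int i = 0; i < reportLines.size(); i++) {
            if (isRoundHeader(reportLines.get(i))) {
                start = i;  // remember the most recent round header
            }
        }
        return new ArrayList<>(reportLines.subList(start, reportLines.size()));
    }

    // Number of rounds reported; fewer than -j if the search converged early
    static int roundCount(List<String> reportLines) {
        int rounds = 0;
        for (String line : reportLines) {
            if (isRoundHeader(line)) rounds++;
        }
        return rounds;
    }
}
</source>

The returned lines keep the one-line summary layout of a single-round report, so the final hit list can be read from them like a BLASTP result.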
HHBlits
We ran HHblits on the student machines against the uniprot20 database.
- 2 iterations (default)
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -o hhblits_p45381_def.out </source>
- 8 iterations
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>
- 2 iterations, -e 10e-10
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 10e-10 -o hhblits_p45381_def.out </source>
- 8 iterations, -e 10e-10
<source lang="bash"> hhblits -i P45381_wt.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -n 8 -e 10e-10 -z 1000 -v 1000 -o hhblits_p45381_n10.out </source>
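HHblits writes its result (`-o`) in .hhr format, where the hits are listed in a summary table whose header row starts with `No Hit`. A minimal sketch that collects the UniProt accessions of the listed hits, for example as `seq_id` input for the QuickGO loop in the GO term enrichment section. It assumes hit names of the form `db|ACCESSION|ENTRY_NAME` and that the table ends at the first blank line; the class and method names are hypothetical:

<source lang="java">
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts UniProt accessions from the hit list of an
// HHblits .hhr result (rows between the "No Hit" header and the first blank line).
class HhrHits {

    static List<String> accessions(List<String> hhrLines) {
        List<String> ids = new ArrayList<>();
        boolean inTable = false;
        for (String line : hhrLines) {
            String t = line.trim();
            if (!inTable) {
                // Skip everything up to and including the table header
                inTable = t.startsWith("No Hit");
                continue;
            }
            if (t.isEmpty()) break;  // end of the summary table
            String[] tokens = t.split("\\s+");
            if (tokens.length < 2) continue;
            // tokens[0] is the hit number, tokens[1] the hit name, e.g. sp|P45381|ACY2_HUMAN
            String[] parts = tokens[1].split("\\|");
            ids.add(parts.length >= 2 ? parts[1] : tokens[1]);
        }
        return ids;
    }
}
</source>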