Sequence-based predictions GLA

From Bioinformatikpedia
Revision as of 18:17, 31 May 2011 by Drexler (Prediction of GO terms)

by Benjamin Drexler and Fabian Grandke

Secondary structure prediction

GLA sequence

PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/

GLA Psipred.png

Jpred3

http://www.compbio.dundee.ac.uk/www-jpred/index.html

EBI Chain Description E-value
3hg5 B Alpha-galactosidase A 0.0
3hg5 A Alpha-galactosidase A 0.0
3hg4 B Alpha-galactosidase A 0.0
3hg4 A Alpha-galactosidase A 0.0
3hg2 B Alpha-galactosidase A 0.0
3hg2 A Alpha-galactosidase A 0.0
3gxt B Alpha-galactosidase A 0.0
3gxt A Alpha-galactosidase A 0.0
3gxp B Alpha-galactosidase A 0.0
3gxp A Alpha-galactosidase A 0.0
3gxn B Alpha-galactosidase A 0.0
3gxn A Alpha-galactosidase A 0.0
1r47 B Alpha-galactosidase A 0.0
1r47 A Alpha-galactosidase A 0.0
1r46 B Alpha-galactosidase A 0.0
1r46 A Alpha-galactosidase A 0.0
3hg3 B Alpha-galactosidase A 0.0
3hg3 A Alpha-galactosidase A 0.0
3lxc B Alpha-galactosidase A 0.0
3lxc A Alpha-galactosidase A 0.0
3lxb B Alpha-galactosidase A 0.0
3lxb A Alpha-galactosidase A 0.0
3lxa B Alpha-galactosidase A 0.0
3lxa A Alpha-galactosidase A 0.0
3lx9 B Alpha-galactosidase A 0.0
3lx9 A Alpha-galactosidase A 0.0
1ktc A alpha-N-acetylgalactosaminidase e-113
1ktb A alpha-N-acetylgalactosaminidase e-113
3igu B Alpha-N-acetylgalactosaminidase e-100
3igu A Alpha-N-acetylgalactosaminidase e-100
3h55 B Alpha-N-acetylgalactosaminidase e-100
3h55 A Alpha-N-acetylgalactosaminidase e-100
3h54 B Alpha-N-acetylgalactosaminidase e-100
3h54 A Alpha-N-acetylgalactosaminidase e-100
3h53 B Alpha-N-acetylgalactosaminidase e-100
3h53 A Alpha-N-acetylgalactosaminidase e-100

The protein highlighted in light blue was used as the query sequence.

Comparison with DSSP

http://swift.cmbi.ru.nl/servers/html/

GLA DSSP Comp.png

A PDF version of this image is available here: File:GLA DSSP Comp.pdf

Prediction of disordered regions

DISOPRED

http://bioinf.cs.ucl.ac.uk/disopred/

GLA Diso graph.png

POODLE

http://mbs.cbrc.jp/poodle/poodle.html

POODLE-S: Missing residues

GLA Poodle s missing.png


POODLE-S: High B-Factor residues

GLA Poodle s high b.png

IUPRED

http://iupred.enzim.hu/index.html

Short Disorder

GLA Iupred Short.png

Long Disorder

GLA Iupred Long.png

META-Disorder

http://www.predictprotein.org/

Hint: Registration is required. It is free of charge, but no more than 3 sequences can be submitted within the following 12 months.

https://www.rostlab.org/owiki/index.php/Metadisorder

GLA Meta disorder.png

PROFbval

https://rostlab.org/owiki/index.php/Profbval

NORSnet

https://www.rostlab.org/owiki/index.php/Norsnet

Ucon

https://www.rostlab.org/owiki/index.php/UCON

Prediction of transmembrane alpha-helices and signal peptides

Additional Proteins

TMHMM

GLA

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

Phobius and PolyPhobius

http://phobius.sbc.su.se/

GLA

Phobius

GLA Phob gla.png

SIGNAL 1 31

REGION 1 9 N-REGION.

REGION 10 22 H-REGION.

REGION 23 31 C-REGION.

TOPO_DOM 32 429 NON CYTOPLASMIC.


PolyPhobius

GLA Poly gla.png

SIGNAL 1 31

REGION 1 12 N-REGION.

REGION 13 26 H-REGION.

REGION 27 31 C-REGION.

TOPO_DOM 32 429 NON CYTOPLASMIC.

BACR_HALSA

Phobius

GLA Phob barc.png

TOPO_DOM 1 22 NON CYTOPLASMIC.

TRANSMEM 23 42

TOPO_DOM 43 53 CYTOPLASMIC.

TRANSMEM 54 76

TOPO_DOM 77 95 NON CYTOPLASMIC.

TRANSMEM 96 114

TOPO_DOM 115 120 CYTOPLASMIC.

TRANSMEM 121 142

TOPO_DOM 143 147 NON CYTOPLASMIC.

TRANSMEM 148 169

TOPO_DOM 170 189 CYTOPLASMIC.

TRANSMEM 190 212

TOPO_DOM 213 217 NON CYTOPLASMIC.

TRANSMEM 218 237

TOPO_DOM 238 262 CYTOPLASMIC.


PolyPhobius

GLA Poly barc.png

TOPO_DOM 1 21 NON CYTOPLASMIC.

TRANSMEM 22 43

TOPO_DOM 44 54 CYTOPLASMIC.

TRANSMEM 55 77

TOPO_DOM 78 94 NON CYTOPLASMIC.

TRANSMEM 95 114

TOPO_DOM 115 120 CYTOPLASMIC.

TRANSMEM 121 141

TOPO_DOM 142 147 NON CYTOPLASMIC.

TRANSMEM 148 166

TOPO_DOM 167 186 CYTOPLASMIC.

TRANSMEM 187 205

TOPO_DOM 206 215 NON CYTOPLASMIC.

TRANSMEM 216 237

TOPO_DOM 238 262 CYTOPLASMIC.

RET4_HUMAN

Phobius

GLA Phob ret4.png

SIGNAL 1 18

REGION 1 2 N-REGION.

REGION 3 13 H-REGION.

REGION 14 18 C-REGION.

TOPO_DOM 19 201 NON CYTOPLASMIC.


PolyPhobius

Poly ret4.png

SIGNAL 1 18

REGION 1 3 N-REGION.

REGION 4 13 H-REGION.

REGION 14 18 C-REGION.

TOPO_DOM 19 201 NON CYTOPLASMIC.

INSL5_HUMAN

Phobius

GLA Phob insl5.png

SIGNAL 1 22

REGION 1 5 N-REGION.

REGION 6 17 H-REGION.

REGION 18 22 C-REGION.

TOPO_DOM 23 135 NON CYTOPLASMIC.

PolyPhobius

Poly insl5.png

SIGNAL 1 22

REGION 1 4 N-REGION.

REGION 5 16 H-REGION.

REGION 17 22 C-REGION.

TOPO_DOM 23 135 NON CYTOPLASMIC.


LAMP1_HUMAN

Phobius

GLA Phob lamp1.png

SIGNAL 1 28

REGION 1 10 N-REGION.

REGION 11 22 H-REGION.

REGION 23 28 C-REGION.

TOPO_DOM 29 381 NON CYTOPLASMIC.

TRANSMEM 382 405

TOPO_DOM 406 417 CYTOPLASMIC.

PolyPhobius

GLA Poly lamp1.png

SIGNAL 1 28

REGION 1 9 N-REGION.

REGION 10 22 H-REGION.

REGION 23 28 C-REGION.

TOPO_DOM 29 381 NON CYTOPLASMIC.

TRANSMEM 382 405

TOPO_DOM 406 417 CYTOPLASMIC.


A4_HUMAN

Phobius

GLA Phob a4.png

SIGNAL 1 17

REGION 1 1 N-REGION.

REGION 2 12 H-REGION.

REGION 13 17 C-REGION.

TOPO_DOM 18 700 NON CYTOPLASMIC.

TRANSMEM 701 723

TOPO_DOM 724 770 CYTOPLASMIC.

PolyPhobius

GLA Poly a4.png

SIGNAL 1 17

REGION 1 3 N-REGION.

REGION 4 12 H-REGION.

REGION 13 17 C-REGION.

TOPO_DOM 18 700 NON CYTOPLASMIC.

TRANSMEM 701 723

TOPO_DOM 724 770 CYTOPLASMIC.
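
The Phobius and PolyPhobius listings above share a simple line layout (feature key, start, end, optional description), so the two predictions can be compared programmatically. The following is a minimal sketch that assumes exactly this layout; the names `Feature`, `parse_features` and `boundary_shifts` are illustrative and not part of Phobius.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    key: str    # e.g. SIGNAL, REGION, TOPO_DOM, TRANSMEM
    start: int  # first residue (1-based, inclusive)
    end: int    # last residue (inclusive)
    note: str   # optional description, e.g. "H-REGION"


def parse_features(text: str) -> List[Feature]:
    """Parse lines such as 'REGION 10 22 H-REGION.' into Feature tuples."""
    features = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip blank lines and anything that is not 'KEY start end [note]'.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        note = parts[3].strip().rstrip(".") if len(parts) > 3 else ""
        features.append(Feature(parts[0], int(parts[1]), int(parts[2]), note))
    return features


def boundary_shifts(a: List[Feature], b: List[Feature]):
    """Pair features by position in the list and report differing boundaries."""
    return [
        (fa.key, fa.note, fb.start - fa.start, fb.end - fa.end)
        for fa, fb in zip(a, b)
        if (fa.start, fa.end) != (fb.start, fb.end)
    ]


# GLA predictions copied from the listings above.
phobius_gla = """SIGNAL 1 31
REGION 1 9 N-REGION.
REGION 10 22 H-REGION.
REGION 23 31 C-REGION.
TOPO_DOM 32 429 NON CYTOPLASMIC."""

polyphobius_gla = """SIGNAL 1 31
REGION 1 12 N-REGION.
REGION 13 26 H-REGION.
REGION 27 31 C-REGION.
TOPO_DOM 32 429 NON CYTOPLASMIC."""

shifts = boundary_shifts(parse_features(phobius_gla),
                         parse_features(polyphobius_gla))
for key, note, d_start, d_end in shifts:
    print(f"{key} {note}: start {d_start:+d}, end {d_end:+d}")
```

For GLA only the internal N-, H- and C-region boundaries differ between the two methods; the signal peptide (1 to 31) and the non-cytoplasmic domain (32 to 429) are identical. Pairing by list position is only meaningful when both methods predict the same number of features, as is the case for all six proteins here.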

OCTOPUS and SPOCTOPUS

http://octopus.cbr.su.se/index.php

GLA

Octopus

GLA Octo gla.png

Spoctopus

GLA Spocto gla.png

BACR_HALSA

Octopus

GLA Octo bacr.png

Spoctopus

GLA Spocto bacr.png

RET4_HUMAN

Octopus

GLA Octo ret4.png

Spoctopus

GLA Spocto ret4.png

INSL5_HUMAN

Octopus

GLA Octo insl5.png

Spoctopus

GLA Spocto insl5.png

LAMP1_HUMAN

Octopus

GLA Octo lamp1.png

Spoctopus

GLA Spocto lamp1.png

A4_HUMAN

Octopus

Octo a4.png

Spoctopus

GLA Spocto a4.png

SignalP

GLA

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

TargetP

http://www.cbs.dtu.dk/services/TargetP/

 Name        Length  mTP    SP     other  Loc  RC
 GLA         429     0.041  0.860  0.141  S    2
 BACR_HALSA  262     0.019  0.897  0.562  S    4
 RET4_HUMAN  201     0.242  0.928  0.020  S    2
 INSL5_HUMA  135     0.074  0.899  0.037  S    1
 LAMP1_HUMA  417     0.043  0.953  0.017  S    1
 A4_HUMAN    770     0.035  0.937  0.084  S    1

http://www.cbs.dtu.dk/services/TargetP-1.1/output.php
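
The RC column is the reliability class of the TargetP prediction. The sketch below assumes the rule commonly given for TargetP 1.1: RC is derived from the difference between the highest and the second-highest score, with class 1 for a difference above 0.8 and one class lower for each further step of 0.2, down to class 5 below 0.2. The function name `reliability_class` is illustrative; the scores are copied from the table above.

```python
def reliability_class(scores):
    """Return the assumed TargetP reliability class (1 = most reliable)."""
    top, second = sorted(scores, reverse=True)[:2]
    diff = top - second
    # Assumed thresholds: >0.8 -> 1, >0.6 -> 2, >0.4 -> 3, >0.2 -> 4, else 5.
    for rc, lower_bound in enumerate((0.8, 0.6, 0.4, 0.2), start=1):
        if diff > lower_bound:
            return rc
    return 5


# name, (mTP, SP, other), RC as reported by TargetP
rows = [
    ("GLA",        (0.041, 0.860, 0.141), 2),
    ("BACR_HALSA", (0.019, 0.897, 0.562), 4),
    ("RET4_HUMAN", (0.242, 0.928, 0.020), 2),
    ("INSL5_HUMA", (0.074, 0.899, 0.037), 1),
    ("LAMP1_HUMA", (0.043, 0.953, 0.017), 1),
    ("A4_HUMAN",   (0.035, 0.937, 0.084), 1),
]

for name, scores, reported in rows:
    print(f"{name:<10}  computed RC {reliability_class(scores)}  reported RC {reported}")
```

Under this assumption the recomputed classes agree with the reported ones for all six proteins. The low reliability for BACR_HALSA (RC 4) reflects its high 'other' score of 0.562, which is close to the SP score.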

Prediction of GO terms

Programs

GOPET

http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar

We used the default settings (GO aspect: molecular function; maximum number of predictions: 20; confidence threshold: 60; GOPET model: June 2007, version 2.0; GOPET database: 2007). The results contain only GO ids from the aspect "molecular function", since the other two aspects (cellular component and biological process) were not available.

Pfam

Pfam is a database of protein domain families that is built with profile hidden Markov models (HMMs). Each protein domain family is represented by a multiple sequence alignment and a profile HMM. A protein sequence can be searched against Pfam to obtain all domains that the query sequence might contain.

The Pfam database consists of two parts, Pfam-A and Pfam-B, whose protein domain families differ in quality. In release 1.0 of Pfam, the protein entries in Pfam-A and Pfam-B came from Swiss-Prot (a few initial members of the seed alignments in Pfam-A came from several sources: Swiss-Prot, Prosite, ProDom etc.). In the current release of Pfam, the entries in Pfam-A and Pfam-B come from Pfamseq (UniProtKB) and ADDA, respectively.

Pfam-A contains the well-characterized, annotated entries. Each family starts from a seed alignment of a few selected representative sequences, which is checked manually. The HMM is then applied automatically to build the full alignment and to detect all possible members of the family. The families and domains in Pfam-A are of high quality and can be used as reliable annotation or classification evidence for the query sequence.

Pfam-B is created from sequence alignments of the ADDA entries by using HMMs. Entries that already exist in Pfam-A are excluded. The families in Pfam-B have no confirmed annotation and are not checked manually, so they may contain errors (e.g. the members of a family may be aligned merely by chance) and their overall quality is relatively low. Pfam-B can still be useful when no domain evidence for the query sequence is found in Pfam-A.

We used the "sequence search" feature of Pfam to determine potential domains or domain families of each protein. Afterwards we checked the corresponding page of each domain (family) for a GO annotation. The search was performed with the default settings (cut-off: E-value, threshold 1.0), but we also included Pfam-B. Only one hit was found in Pfam-B, and it has no GO annotation, so including Pfam-B provided no gain. Whether a hit counts as significant was decided by the Pfam search algorithm. The results are listed in the tables below.

ProtFun

http://www.cbs.dtu.dk/services/ProtFun/

The results of the Gene Ontology category assignment by ProtFun are listed below. 'Prob' is the probability calculated by ProtFun that the query belongs to the category; it depends on the prior probability of the category. 'Odds' gives the odds that the query belongs to the category and is not influenced by the prior probability.<ref name=ProtFun>Explanation of the ProtFun 2.2 output.</ref> The class with the highest information content and the class with the highest probability are marked bold. Additionally, for each query we provide a table with the categories that have the highest information content or the highest probability, together with their associated GO ids, which we looked up with the search feature of the Gene Ontology website.
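
The relationship between 'Prob' and 'Odds' can be illustrated with the numbers from the ProtFun tables on this page. The sketch below assumes that the odds equal the probability divided by the prior probability of the category; this is an assumption made here for illustration, not a statement taken from the ProtFun documentation. If it holds, Prob / Odds should give roughly the same value for every query within one category. The name `implied_priors` is illustrative.

```python
# (Prob, Odds) pairs for two categories, copied from the ProtFun tables
# of the six query proteins on this page.
protfun = {
    "Signal_transducer": {
        "GLA": (0.090, 0.419),
        "BACR_HALSA": (0.258, 1.205),
        "RET4_HUMAN": (0.202, 0.942),
        "INSL5_HUMAN": (0.374, 1.746),
        "LAMP1_HUMAN": (0.396, 1.849),
        "A4_HUMAN": (0.126, 0.586),
    },
    "Immune_response": {
        "GLA": (0.012, 0.136),
        "BACR_HALSA": (0.011, 0.128),
        "RET4_HUMAN": (0.239, 2.813),
        "INSL5_HUMAN": (0.178, 2.090),
        "LAMP1_HUMAN": (0.371, 4.368),
        "A4_HUMAN": (0.016, 0.183),
    },
}


def implied_priors(entries):
    """Return Prob / Odds per protein (the prior, if Odds = Prob / prior)."""
    return {protein: prob / odds for protein, (prob, odds) in entries.items()}


summary = {}
for category, entries in protfun.items():
    priors = implied_priors(entries)
    mean = sum(priors.values()) / len(priors)
    spread = max(priors.values()) - min(priors.values())
    summary[category] = (mean, spread)
    print(f"{category}: mean implied prior {mean:.3f}, spread {spread:.4f}")
```

For both categories the ratio is nearly constant across the six proteins (about 0.21 for Signal_transducer and about 0.09 for Immune_response), with the remaining spread attributable to the three-decimal rounding of the output. This would also explain why a category with a small prior, such as Hormone, can reach very high odds at a moderate probability.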

Proteins

GLA

GOPET

GOid Confidence GO term
GO:0016798 98% hydrolase activity, acting on glycosyl bonds
GO:0004553 98% hydrolase activity, hydrolyzing O-glycosyl compounds
GO:0016787 97% hydrolase activity
GO:0004557 96% alpha-galactosidase activity
GO:0008456 89% alpha-N-acetylgalactosaminidase activity

Pfam

 Source  Description  Entry type  Significant  GO aspect           GO description                                        GO id
 Pfam-A  Melibiase    Family      x            Molecular function  hydrolase activity, hydrolyzing O-glycosyl compounds  GO:0004553
 Pfam-A  Melibiase    Family      x            Biological process  carbohydrate metabolic process                        GO:0005975

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.090    0.419
 Receptor                             0.014    0.083
 Hormone                              0.002    0.318
 Structural_protein                   0.004    0.127
 Transporter                          0.024    0.222
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.003    0.127
 Cation_channel                       0.010    0.215
 Transcription                        0.047    0.367
 Transcription_regulation             0.026    0.204
 Stress_response                      0.049    0.552
 Immune_response                      0.012    0.136
 Growth_factor                        0.006    0.412
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest probability Signal transducer Molecular function GO:0004871

BACR_HALSA

GOPET

GOid Confidence GO term
GO:0005216 77% ion channel activity
GO:0008020 75% G-protein coupled photoreceptor activity
GO:0015078 60% hydrogen ion transmembrane transporter activity

Pfam

 Source  Description                       Entry type  Significant  GO aspect           GO description        GO id
 Pfam-A  Bacteriorhodopsin-like protein    Domain      x            Cellular component  membrane              GO:0016020
 Pfam-A  Bacteriorhodopsin-like protein    Domain      x            Molecular function  ion channel activity  GO:0005216
 Pfam-A  Bacteriorhodopsin-like protein    Domain      x            Biological process  ion transport         GO:0006811
 Pfam-A  Domain of unknown function DUF21  Family                   -                   -                     -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.258    1.205
 Receptor                             0.355    2.087
 Hormone                              0.001    0.206
 Structural_protein                   0.006    0.200
 Transporter                       => 0.440    4.036
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.004    0.172
 Cation_channel                       0.078    1.689
 Transcription                        0.026    0.205
 Transcription_regulation             0.028    0.226
 Stress_response                      0.012    0.139
 Immune_response                      0.011    0.128
 Growth_factor                        0.010    0.727
 Metal_ion_transport                  0.049    0.106

Type GO category GO aspect GO id
Highest information content / highest probability Transporter Molecular function GO:0005215

RET4_HUMAN

GOPET

GOid Confidence GO term
GO:0005488 90% binding
GO:0005501 81% retinoid binding
GO:0008289 80% lipid binding
GO:0019841 78% retinol binding
GO:0005215 78% transporter activity
GO:0016918 78% retinal binding
GO:0005319 69% lipid transporter activity
GO:0008035 60% high-density lipoprotein particle binding

Pfam

 Source  Description                                              Entry type  Significant  GO aspect           GO description  GO id
 Pfam-A  Lipocalin / cytosolic fatty-acid binding protein family  Domain      x            Molecular function  binding         GO:0005488
 Pfam-A  DspF/AvrF protein                                        Family                   -                   -               -
 Pfam-B  PB008544                                                 -           -            -                   -               -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.202    0.942
 Receptor                             0.147    0.862
 Hormone                              0.004    0.667
 Structural_protein                   0.002    0.058
 Transporter                          0.025    0.232
 Ion_channel                          0.016    0.288
 Voltage-gated_ion_channel            0.003    0.148
 Cation_channel                       0.010    0.215
 Transcription                        0.027    0.207
 Transcription_regulation             0.025    0.196
 Stress_response                      0.161    1.829
 Immune_response                   => 0.239    2.813
 Growth_factor                        0.023    1.617
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content / highest probability Immune response Biological process GO:0006955

INSL5_HUMAN

GOPET

GOid Confidence GO term
GO:0005179 80% hormone activity

Pfam

 Source  Description                 Entry type  Significant  GO aspect           GO description        GO id
 Pfam-A  Insulin/IGF/Relaxin family  Domain      x            Cellular component  extracellular region  GO:0005576
 Pfam-A  Insulin/IGF/Relaxin family  Domain      x            Molecular function  hormone activity      GO:0005179

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.374    1.746
 Receptor                             0.128    0.750
 Hormone                           => 0.247   37.936
 Structural_protein                   0.001    0.041
 Transporter                          0.025    0.228
 Ion_channel                          0.010    0.168
 Voltage-gated_ion_channel            0.003    0.131
 Cation_channel                       0.010    0.215
 Transcription                        0.054    0.425
 Transcription_regulation             0.091    0.724
 Stress_response                      0.099    1.128
 Immune_response                      0.178    2.090
 Growth_factor                        0.061    4.379
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Hormone Molecular function GO:0005179
Highest probability Signal transducer Molecular function GO:0004871

LAMP1_HUMAN

GOPET

GOid Confidence GO term
GO:0004812 60% aminoacyl-tRNA ligase activity
GO:0005524 60% ATP binding

Pfam

 Source  Description                                Entry type  Significant  GO aspect           GO description  GO id
 Pfam-A  Lysosome-associated membrane glycoprotein  Family      x            Cellular component  membrane        GO:0016020
 Pfam-A  Protein of unknown function DUF1180        Family                   -                   -               -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.396    1.849
 Receptor                             0.282    1.659
 Hormone                              0.001    0.206
 Structural_protein                   0.011    0.408
 Transporter                          0.024    0.222
 Ion_channel                          0.008    0.147
 Voltage-gated_ion_channel            0.002    0.111
 Cation_channel                       0.010    0.215
 Transcription                        0.032    0.247
 Transcription_regulation             0.018    0.142
 Stress_response                      0.246    2.795
 Immune_response                   => 0.371    4.368
 Growth_factor                        0.013    0.956
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Immune response Biological process GO:0006955
Highest probability Signal transducer Molecular function GO:0004871

A4_HUMAN

GOPET

GOid Confidence GO term
GO:0004866 87% endopeptidase inhibitor activity
GO:0004867 86% serine-type endopeptidase inhibitor activity
GO:0030568 83% plasmin inhibitor activity
GO:0030304 83% trypsin inhibitor activity
GO:0030414 82% peptidase inhibitor activity
GO:0005488 79% binding
GO:0005515 74% protein binding
GO:0046872 73% metal ion binding
GO:0003677 71% DNA binding
GO:0008201 70% heparin binding
GO:0008270 69% zinc ion binding
GO:0005507 69% copper ion binding
GO:0005506 67% iron ion binding

Pfam

 Source  Description                                        Entry type  Significant  GO aspect           GO description                                GO id
 Pfam-A  Amyloid A4 N-terminal heparin-binding              Domain      x            Cellular component  integral to membrane                          GO:0016021
 Pfam-A  Amyloid A4 N-terminal heparin-binding              Domain      x            Molecular function  binding                                       GO:0005488
 Pfam-A  Copper-binding of amyloid precursor, CuBD          Domain      x            -                   -                                             -
 Pfam-A  Kunitz/Bovine pancreatic trypsin inhibitor domain  Domain      x            Molecular function  serine-type endopeptidase inhibitor activity  GO:0004867
 Pfam-A  E2 domain of amyloid precursor protein             Domain      x            -                   -                                             -
 Pfam-A  Beta-amyloid peptide                               Family      x            Cellular component  integral to membrane                          GO:0016021
 Pfam-A  Beta-amyloid peptide                               Family      x            Molecular function  binding                                       GO:0005488
 Pfam-A  beta-amyloid precursor protein C-terminus          Family      x            -                   -                                             -
 Pfam-A  Exonuclease VII, large subunit                     Family                   -                   -                                             -
 Pfam-A  Transcriptional activator TraM                     Family                   -                   -                                             -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.126    0.586
 Receptor                             0.036    0.211
 Hormone                              0.001    0.206
 Structural_protein                => 0.034    1.205
 Transporter                          0.024    0.222
 Ion_channel                          0.009    0.162
 Voltage-gated_ion_channel            0.002    0.108
 Cation_channel                       0.010    0.215
 Transcription                        0.043    0.335
 Transcription_regulation             0.018    0.143
 Stress_response                      0.076    0.862
 Immune_response                      0.016    0.183
 Growth_factor                        0.005    0.372
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Structural protein Molecular function GO:0005198
Highest probability Signal transducer Molecular function GO:0004871
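
To compare the two annotation-based approaches, the GO ids reported by GOPET and those found on the Pfam family pages can be intersected per protein. The following sketch uses the GO ids from the GOPET and Pfam tables above; the dictionary layout and the name `shared_terms` are illustrative. It matches ids exactly and therefore does not recognise parent/child relations between GO terms.

```python
# GO ids copied from the GOPET tables on this page.
gopet = {
    "GLA": {"GO:0016798", "GO:0004553", "GO:0016787", "GO:0004557",
            "GO:0008456"},
    "BACR_HALSA": {"GO:0005216", "GO:0008020", "GO:0015078"},
    "RET4_HUMAN": {"GO:0005488", "GO:0005501", "GO:0008289", "GO:0019841",
                   "GO:0005215", "GO:0016918", "GO:0005319", "GO:0008035"},
    "INSL5_HUMAN": {"GO:0005179"},
    "LAMP1_HUMAN": {"GO:0004812", "GO:0005524"},
    "A4_HUMAN": {"GO:0004866", "GO:0004867", "GO:0030568", "GO:0030304",
                 "GO:0030414", "GO:0005488", "GO:0005515", "GO:0046872",
                 "GO:0003677", "GO:0008201", "GO:0008270", "GO:0005507",
                 "GO:0005506"},
}

# GO ids copied from the Pfam tables on this page.
pfam = {
    "GLA": {"GO:0004553", "GO:0005975"},
    "BACR_HALSA": {"GO:0016020", "GO:0005216", "GO:0006811"},
    "RET4_HUMAN": {"GO:0005488"},
    "INSL5_HUMAN": {"GO:0005576", "GO:0005179"},
    "LAMP1_HUMAN": {"GO:0016020"},
    "A4_HUMAN": {"GO:0016021", "GO:0005488", "GO:0004867"},
}


def shared_terms(a, b):
    """Return, per protein, the GO ids present in both annotation sets."""
    return {protein: a[protein] & b.get(protein, set()) for protein in a}


shared = shared_terms(gopet, pfam)
for protein, terms in shared.items():
    print(protein, ", ".join(sorted(terms)) if terms else "no shared GO id")
```

With exact matching, GOPET and Pfam share at least one GO id for every protein except LAMP1_HUMAN, where GOPET only returns two predictions at the 60% confidence threshold.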

References

<references />