Sequence-based predictions GLA

From Bioinformatikpedia
Revision as of 17:48, 5 June 2011 by Grandke

by Benjamin Drexler and Fabian Grandke

Proteins

GLA

GLA sequence

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

Secondary structure prediction

PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/

GLA Psipred.png

Jpred3

http://www.compbio.dundee.ac.uk/www-jpred/index.html

EBI Chain Description E-value
3hg5 B Alpha-galactosidase A 0.0
3hg5 A Alpha-galactosidase A 0.0
3hg4 B Alpha-galactosidase A 0.0
3hg4 A Alpha-galactosidase A 0.0
3hg2 B Alpha-galactosidase A 0.0
3hg2 A Alpha-galactosidase A 0.0
3gxt B Alpha-galactosidase A 0.0
3gxt A Alpha-galactosidase A 0.0
3gxp B Alpha-galactosidase A 0.0
3gxp A Alpha-galactosidase A 0.0
3gxn B Alpha-galactosidase A 0.0
3gxn A Alpha-galactosidase A 0.0
1r47 B Alpha-galactosidase A 0.0
1r47 A Alpha-galactosidase A 0.0
1r46 B Alpha-galactosidase A 0.0
1r46 A Alpha-galactosidase A 0.0
3hg3 B Alpha-galactosidase A 0.0
3hg3 A Alpha-galactosidase A 0.0
3lxc B Alpha-galactosidase A 0.0
3lxc A Alpha-galactosidase A 0.0
3lxb B Alpha-galactosidase A 0.0
3lxb A Alpha-galactosidase A 0.0
3lxa B Alpha-galactosidase A 0.0
3lxa A Alpha-galactosidase A 0.0
3lx9 B Alpha-galactosidase A 0.0
3lx9 A Alpha-galactosidase A 0.0
1ktc A alpha-N-acetylgalactosaminidase e-113
1ktb A alpha-N-acetylgalactosaminidase e-113
3igu B Alpha-N-acetylgalactosaminidase e-100
3igu A Alpha-N-acetylgalactosaminidase e-100
3h55 B Alpha-N-acetylgalactosaminidase e-100
3h55 A Alpha-N-acetylgalactosaminidase e-100
3h54 B Alpha-N-acetylgalactosaminidase e-100
3h54 A Alpha-N-acetylgalactosaminidase e-100
3h53 B Alpha-N-acetylgalactosaminidase e-100
3h53 A Alpha-N-acetylgalactosaminidase e-100

The protein highlighted in light blue was used as the query sequence.

Comparison with DSSP

http://swift.cmbi.ru.nl/servers/html/

GLA DSSP Comp.png

A PDF version of this image is available here: File:GLA DSSP Comp.pdf
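A comparison like the one in the figure can also be summarized as a single number, the fraction of residues whose predicted state matches the DSSP assignment (often called Q3). The sketch below is only an illustration: the eight-to-three state reduction is one common convention, the `GAP` marker for residues without coordinates is a hypothetical choice, and the two strings are toy data, not the real GLA assignment.

```python
# One common reduction of DSSP's eight states to three classes.
# Other reductions exist, so treat this mapping as an assumption.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
GAP = "?"  # hypothetical marker for residues missing from the structure


def reduce_dssp(dssp):
    """Collapse a DSSP string to helix (H), strand (E) and coil (C)."""
    return "".join(GAP if s == GAP else DSSP_TO_3.get(s, "C") for s in dssp)


def q3(predicted, observed):
    """Fraction of comparable positions where prediction and DSSP agree."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    pairs = [(p, o) for p, o in zip(predicted, observed) if o != GAP]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(p == o for p, o in pairs) / len(pairs)


# Toy strings for illustration only.
dssp_toy = "??THHHHGGG EEEEB S"
pred_toy = "CCCHHHHHHHCEEEECCC"
score = q3(pred_toy, reduce_dssp(dssp_toy))
print(f"Q3 = {score:.3f}")
```

Skipping gap positions keeps residues that are absent from the crystal structure from counting as wrong predictions.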

Prediction of disordered regions

DISOPRED

http://bioinf.cs.ucl.ac.uk/disopred/

GLA Diso graph.png

POODLE

http://mbs.cbrc.jp/poodle/poodle.html

POODLE-S: Missing residues

GLA Poodle s missing.png


POODLE-S: High B-Factor residues

GLA Poodle s high b.png

IUPRED

http://iupred.enzim.hu/index.html

Short Disorder

GLA Iupred Short.png

Long Disorder

GLA Iupred Long.png

META-Disorder

http://www.predictprotein.org/

Hint: Registration is required. It is free of charge, but you can submit at most 3 sequences within 12 months.

https://www.rostlab.org/owiki/index.php/Metadisorder

GLA Meta disorder.png

PROFbval

https://rostlab.org/owiki/index.php/Profbval

NORSnet

https://www.rostlab.org/owiki/index.php/Norsnet

Ucon

https://www.rostlab.org/owiki/index.php/UCON

Prediction of transmembrane alpha-helices and signal peptides

Programs

Description of the programs:

TMHMM

Phobius/Polyphobius

Octopus/Spoctopus

TargetP

SignalP

Proteins

GLA

Prediction Phobius Polyphobius
Signal peptide 1-31 1-31
N-Region 1-9 1-12
H-Region 10-22 13-26
C-Region 23-31 27-31
Non-cytoplasmic 32-429 32-429
Octopus Spoctopus
Inside 1-9 N-terminal 1-10
TM-Helix 10-30 Signal Peptide 11-31
Outside 31-429 Outside 32-429

BACR_HALSA

Prediction Phobius Polyphobius
Non-cytoplasmic 1-22 1-21
Transmembrane 23-42 22-43
Cytoplasmic 43-53 44-54
Transmembrane 54-76 55-77
Non-cytoplasmic 77-95 78-94
Transmembrane 96-114 95-114
Cytoplasmic 115-120 115-120
Transmembrane 121-142 121-141
Non-cytoplasmic 143-147 142-147
Transmembrane 148-169 148-166
Cytoplasmic 170-189 167-186
Transmembrane 190-212 187-205
Non-cytoplasmic 213-217 206-215
Transmembrane 218-237 216-237
Cytoplasmic 238-262 238-262
Prediction Octopus Spoctopus
Outside 1-22 1-22
TM-Helix 23-43 23-43
Outside 44-54 44-54
TM-Helix 55-75 55-75
Outside 76-95 76-95
TM-Helix 96-116 96-116
Outside 117-121 117-120
TM-Helix 122-142 121-141
Outside 143-147 142-147
TM-Helix 148-168 148-168
Outside 169-185 169-185
TM-Helix 186-206 186-206
Outside 207-216 207-216
TM-Helix 217-237 217-237
Outside 238-262 238-262
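To put a number on how closely two predictors agree, one option is to compare the residues they assign to transmembrane helices. The following sketch uses the Phobius and Polyphobius helices for BACR_HALSA from the table above; the Jaccard index as agreement measure and the helper names are choices made for this illustration, not part of either tool.

```python
def residues(segments):
    """Expand inclusive (start, end) segments into a set of positions."""
    return {pos for start, end in segments for pos in range(start, end + 1)}


def jaccard(a, b):
    """Shared residues divided by all residues covered by either list."""
    ra, rb = residues(a), residues(b)
    return len(ra & rb) / len(ra | rb)


# Transmembrane segments for BACR_HALSA, copied from the table above.
phobius = [(23, 42), (54, 76), (96, 114), (121, 142),
           (148, 169), (190, 212), (218, 237)]
polyphobius = [(22, 43), (55, 77), (95, 114), (121, 141),
               (148, 166), (187, 205), (216, 237)]

print(f"Jaccard index: {jaccard(phobius, polyphobius):.3f}")

# Pairing helices by order is only valid because both tools predict
# the same number of helices here.
for (s1, e1), (s2, e2) in zip(phobius, polyphobius):
    print(f"{s1}-{e1} vs {s2}-{e2}: start {s2 - s1:+d}, end {e2 - e1:+d}")
```

The same two functions can be applied to any pair of columns in these tables, for example Octopus against Spoctopus.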


RET4_HUMAN

Prediction Phobius Polyphobius
Signal peptide 1-18 1-18
N-Region 1-2 1-3
H-Region 3-13 4-13
C-Region 14-18 14-18
Non-cytoplasmic 19-201 19-201
Octopus Spoctopus
Inside 1-1 N-terminal 1-5
TM-Helix 2-23 Signal Peptide 6-19
Outside 24-201 Outside 20-201

INSL5_HUMAN

Prediction Phobius Polyphobius
Signal peptide 1-22 1-22
N-Region 1-5 1-4
H-Region 6-17 5-16
C-Region 18-22 17-22
Non-cytoplasmic 23-135 23-135
Octopus Spoctopus
Inside 1-1 N-terminal 1-5
TM-Helix 2-32 Signal Peptide 6-23
Outside 33-135 Outside 24-135

LAMP1_HUMAN

Prediction Phobius Polyphobius
Signal peptide 1-28 1-28
N-Region 1-10 1-9
H-Region 11-22 10-22
C-Region 23-28 23-28
Non-cytoplasmic 29-381 29-381
Transmembrane 382-405 382-405
Non-cytoplasmic 406-417 406-417
Octopus Spoctopus
Inside 1-10 N-terminal 1-11
TM-Helix 11-31 Signal Peptide 12-29
Outside 32-383 Outside 30-383
TM-Helix 384-404 TM-Helix 384-404
Inside 405-417 Inside 405-417

A4_HUMAN

Prediction Phobius Polyphobius
Signal peptide 1-17 1-17
N-Region 1-1 1-3
H-Region 2-12 4-12
C-Region 13-17 13-17
Non-cytoplasmic 18-700 18-700
Transmembrane 701-723 701-723
Cytoplasmic 724-770 724-770
Octopus Spoctopus
Outside 1-5 N-terminal 1-4
TM-Helix 6-11 Signal Peptide 5-18
Outside 12-701 Outside 19-701
TM-Helix 702-722 TM-Helix 702-722
Inside 723-770 Inside 723-770

TMHMM

GLA

Fabry disease TMHMM GLA.png

BACR_HALSA

Fabry disease TMHMM BACR HALSA.png

RET4_HUMAN

Fabry disease TMHMM RET4 HUMAN.png

INSL5_HUMAN

Fabry disease TMHMM INSL5 HUMAN.png

LAMP1_HUMAN

Fabry disease TMHMM LAMP1 HUMAN.png

A4_HUMAN

Fabry disease TMHMM A4 HUMAN.png

Phobius and PolyPhobius

http://phobius.sbc.su.se/

GLA

Phobius

GLA Phob gla.png


PolyPhobius

GLA Poly gla.png


BACR_HALSA

Phobius

GLA Phob barc.png


PolyPhobius

GLA Poly barc.png


RET4_HUMAN

Phobius

GLA Phob ret4.png


PolyPhobius

Poly ret4.png

INSL5_HUMAN

Phobius

GLA Phob insl5.png


PolyPhobius

Poly insl5.png


LAMP1_HUMAN

Phobius

GLA Phob lamp1.png

PolyPhobius

GLA Poly lamp1.png


A4_HUMAN

Phobius

GLA Phob a4.png

SIGNAL 1 17

REGION 1 1 N-REGION.

REGION 2 12 H-REGION.

REGION 13 17 C-REGION.

TOPO_DOM 18 700 NON CYTOPLASMIC.

TRANSMEM 701 723

TOPO_DOM 724 770 CYTOPLASMIC.

PolyPhobius

GLA Poly a4.png

SIGNAL 1 17

REGION 1 3 N-REGION.

REGION 4 12 H-REGION.

REGION 13 17 C-REGION.

TOPO_DOM 18 700 NON CYTOPLASMIC.

TRANSMEM 701 723

TOPO_DOM 724 770 CYTOPLASMIC.
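The two feature listings above differ only in a few boundaries, which is easier to see once the lines are parsed into records. The sketch below assumes the lines look exactly as shown on this page (feature key, start, end, optional note); the raw server output may carry extra prefixes, and the `Feature` type and `parse_features` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str
    start: int
    end: int
    note: str


def parse_features(text):
    """Parse lines such as 'TOPO_DOM 18 700 NON CYTOPLASMIC.' into records."""
    features = []
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # skip blank lines and anything that is not a feature
        note = parts[3].rstrip(".") if len(parts) > 3 else ""
        features.append(Feature(parts[0], int(parts[1]), int(parts[2]), note))
    return features


# A4_HUMAN features as listed above.
PHOBIUS = """SIGNAL 1 17
REGION 1 1 N-REGION.
REGION 2 12 H-REGION.
REGION 13 17 C-REGION.
TOPO_DOM 18 700 NON CYTOPLASMIC.
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC."""

POLYPHOBIUS = PHOBIUS.replace("REGION 1 1", "REGION 1 3").replace(
    "REGION 2 12", "REGION 4 12")

phobius_features = parse_features(PHOBIUS)
polyphobius_features = parse_features(POLYPHOBIUS)

# Records present in only one of the two predictions.
differences = set(phobius_features) ^ set(polyphobius_features)
for feature in sorted(differences, key=lambda f: (f.start, f.end)):
    print(feature)
```

For A4_HUMAN the only records that differ are the N-region and H-region of the signal peptide.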

OCTOPUS and SPOCTOPUS

http://octopus.cbr.su.se/index.php

GLA

Octopus

GLA Octo gla.png

Spoctopus

GLA Spocto gla.png

BACR_HALSA

Octopus

GLA Octo bacr.png

Spoctopus

GLA Spocto bacr.png

RET4_HUMAN

Octopus

GLA Octo ret4.png

Spoctopus

GLA Spocto ret4.png

INSL5_HUMAN

Octopus

GLA Octo insl5.png

Spoctopus

GLA Spocto insl5.png

LAMP1_HUMAN

Octopus

GLA Octo lamp1.png

Spoctopus

GLA Spocto lamp1.png

A4_HUMAN

Octopus

Octo a4.png

Spoctopus

GLA Spocto a4.png

SignalP

GLA

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

TargetP

http://www.cbs.dtu.dk/services/TargetP/

Name Length mTP SP other Loc RC
GLA 429 0.041 0.860 0.141 S 2
BACR_HALSA 262 0.019 0.897 0.562 S 4
RET4_HUMAN 201 0.242 0.928 0.020 S 2
INSL5_HUMA 135 0.074 0.899 0.037 S 1
LAMP1_HUMA 417 0.043 0.953 0.017 S 1
A4_HUMAN 770 0.035 0.937 0.084 S 1
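The Loc and RC columns can be recomputed from the three scores. The sketch below assumes the usual TargetP convention that the location is the class with the highest score and that the reliability class depends on the difference between the highest and second-highest score (above 0.8 gives 1, above 0.6 gives 2, above 0.4 gives 3, above 0.2 gives 4, otherwise 5). These thresholds are stated here from memory of the TargetP output description and should be checked against the server help page; they do reproduce the table above.

```python
# Scores copied from the TargetP table above: (mTP, SP, other).
SCORES = {
    "GLA": (0.041, 0.860, 0.141),
    "BACR_HALSA": (0.019, 0.897, 0.562),
    "RET4_HUMAN": (0.242, 0.928, 0.020),
    "INSL5_HUMAN": (0.074, 0.899, 0.037),
    "LAMP1_HUMAN": (0.043, 0.953, 0.017),
    "A4_HUMAN": (0.035, 0.937, 0.084),
}
LABELS = ("M", "S", "_")  # mitochondrion, secretory pathway, other


def localization(scores):
    """Label of the class with the highest score."""
    return LABELS[max(range(len(scores)), key=scores.__getitem__)]


def reliability_class(scores):
    """1 (most reliable) to 5, from the gap between the two best scores."""
    top, second = sorted(scores, reverse=True)[:2]
    diff = top - second
    for rc, threshold in enumerate((0.8, 0.6, 0.4, 0.2), start=1):
        if diff > threshold:  # assumed thresholds, see the text above
            return rc
    return 5


for name, scores in SCORES.items():
    print(name, localization(scores), reliability_class(scores))
```

BACR_HALSA stands out: its "other" score of 0.562 is close to the SP score, which is why it receives the weakest reliability class in the table.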

http://www.cbs.dtu.dk/services/TargetP-1.1/output.php

Prediction of GO terms

GOPET

GOPET stands for Gene Ontology term Prediction and Evaluation Tool and was developed by Vinayagam et al. in 2006<ref name=vinayagam>Vinayagam et al., "GOPET: a tool for automated predictions of Gene Ontology terms.", BMC Bioinformatics. 2006 Mar 20, PubMed</ref>. It is based on homology searches against GO-mapped protein databases and uses support vector machines to calculate the confidence values.

We used the GOPET web server with the default settings (GO aspect: molecular function, maximum number of predictions: 20, confidence threshold: 60, GOPET model 2007 June, version 2.0, GOPET database 2007) and the FASTA sequence of the protein as input. The results contain only GO ids of the GO aspect "molecular function", since the other two GO aspects (cellular component and biological process) were not available.

Pfam

Pfam is a database of protein domain families built with profile hidden Markov models (HMMs) and was first described by Sonnhammer et al. in 1997<ref name=sonnhammer>Sonnhammer et al., "Pfam: a comprehensive database of protein domain families based on seed alignments.", Proteins. 1997 Jul, PubMed</ref>. Each protein domain family is represented by a multiple sequence alignment and an HMM. A protein sequence can be searched against Pfam to obtain all the domains that the query sequence might contain.

The Pfam database consists of two parts, A and B, whose protein domain families differ in quality. In release 1.0 of Pfam, the protein entries in Pfam-A and Pfam-B came from Swissprot (a few initial members of the seed alignments in Pfam-A came from several sources: Swissprot, Prosite, ProDom etc.). In the current release of Pfam, the entries in Pfam-A and Pfam-B come from Pfamseq (UniProtKB) and ADDA, respectively.

Pfam-A contains the well-characterized, annotated entries. Building a family starts with a seed alignment of a few selected representative sequences, which is quality-checked manually. The HMM is then applied automatically to produce the full alignment and to detect all possible members of each initial family. The families/domains in Pfam-A are of high quality and can be used as reliable annotation/classification evidence for the query sequence.

Pfam-B is created from sequence alignments of the ADDA entries by using HMMs. Entries that already exist in Pfam-A are excluded. The families in Pfam-B have no confirmed annotation and are not quality-checked manually, so they may contain errors (e.g. the members of a family could be aligned merely by chance) and the overall quality is relatively low. However, Pfam-B can still be useful when no domain evidence for the query sequence is found in Pfam-A.

We used the "sequence search" feature of the Pfam website with the FASTA sequence of the protein to determine potential domains or domain families. Afterwards we checked the corresponding page of each domain (family) for a GO annotation. The search was performed with the default settings (cut-off: use E-value, threshold 1.0), but we also included Pfam-B in the search. Only one hit was found in Pfam-B; it does not have any GO annotation, so including Pfam-B brought no gain. The significance of each hit was classified by the Pfam search algorithm.

ProtFun

ProtFun tries to assign a function to the query protein. For this purpose, it uses predictions of several other features, such as post-translational modification sites or the localization of the protein. The prediction of these features is itself based on other programs like TMHMM, ... ProtFun was developed by Jensen et al. in 2002<ref name=jensen_1>Jensen et al., "Prediction of human protein function from post-translational modifications and localization features.", J Mol Biol. 2002 Jun 21, PubMed</ref>. The prediction of the Gene Ontology category was added in 2003<ref name=jensen_2>Jensen et al., "Prediction of human protein function according to Gene Ontology categories.", Bioinformatics. 2003 Mar 22, PubMed</ref>.

We used the ProtFun 2.2 web server with the default settings and the FASTA sequence of the protein as input. The output contains predictions of the functional category, enzyme/nonenzyme, enzyme class and the Gene Ontology category. In our case, only the last of these was relevant. 'Prob' is the probability calculated by ProtFun that the query belongs to the category. This probability depends on the prior probability of the category. 'Odds' gives the odds that the query belongs to the category and is not influenced by the prior probability.<ref name=ProtFun>Explanation of the ProtFun 2.2 output.</ref> The class with the highest information content and with the highest probability is marked bold. In addition, we provide a table for each query that contains the categories with the highest information content or probability, respectively, and their associated GO id. For this purpose, we used the search feature of the Gene Ontology website.
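Picking the top categories by hand is error-prone, so a small parser can help. The sketch below reads the Gene Ontology block of a ProtFun output and reports the category with the highest Prob and the one with the highest Odds. The values are the INSL5_HUMAN results listed further down this page. Treating the highest Odds as the "highest information content" category is an assumption that happens to match the `=>` marker in this output; `parse_protfun` is a hypothetical helper name.

```python
# Gene Ontology block of the ProtFun output for INSL5_HUMAN (from this page).
PROTFUN_OUTPUT = """\
 Signal_transducer                    0.374    1.746
 Receptor                             0.128    0.750
 Hormone                           => 0.247   37.936
 Structural_protein                   0.001    0.041
 Transporter                          0.025    0.228
 Ion_channel                          0.010    0.168
 Voltage-gated_ion_channel            0.003    0.131
 Cation_channel                       0.010    0.215
 Transcription                        0.054    0.425
 Transcription_regulation             0.091    0.724
 Stress_response                      0.099    1.128
 Immune_response                      0.178    2.090
 Growth_factor                        0.061    4.379
 Metal_ion_transport                  0.009    0.020
"""


def parse_protfun(text):
    """Return {category: (prob, odds)} for every data line in the block."""
    table = {}
    for line in text.splitlines():
        fields = line.replace("=>", " ").split()
        if len(fields) != 3:
            continue  # skips the header and blank lines
        name, prob, odds = fields
        table[name] = (float(prob), float(odds))
    return table


table = parse_protfun(PROTFUN_OUTPUT)
best_prob = max(table, key=lambda name: table[name][0])
best_odds = max(table, key=lambda name: table[name][1])
print("highest probability:", best_prob)
print("highest odds:", best_odds)
```

For INSL5_HUMAN the two criteria disagree: Signal_transducer has the highest probability, whereas Hormone has by far the highest odds.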

Evaluation of the Results

Proteins

GLA

GOPET

GOid Confidence GO term
GO:0016798 98% hydrolase activity acting on glycosyl bonds
GO:0004553 98% hydrolase activity hydrolyzing O-glycosyl compounds
GO:0016787 97% hydrolase activity
GO:0004557 96% alpha-galactosidase activity
GO:0008456 89% alpha-N-acetylgalactosaminidase activity

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Melibiase Family x Molecular function hydrolase activity, hydrolyzing O-glycosyl compounds GO:0004553
Pfam-A Melibiase Family x Biological process carbohydrate metabolic process GO:0005975

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.090    0.419
 Receptor                             0.014    0.083
 Hormone                              0.002    0.318
 Structural_protein                   0.004    0.127
 Transporter                          0.024    0.222
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.003    0.127
 Cation_channel                       0.010    0.215
 Transcription                        0.047    0.367
 Transcription_regulation             0.026    0.204
 Stress_response                      0.049    0.552
 Immune_response                      0.012    0.136
 Growth_factor                        0.006    0.412
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest probability Signal transducer Molecular function GO:0004871

Evaluation

Venn diagram of the GO ids for the protein GLA.
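The regions of such a diagram can be computed directly from the three result tables. The sketch below uses the GLA GO ids listed above and compares them by exact id only; the `venn_regions` helper is a hypothetical name introduced for this illustration.

```python
from itertools import combinations

# GO ids for GLA, copied from the GOPET, Pfam and ProtFun tables above.
go_ids = {
    "GOPET": {"GO:0016798", "GO:0004553", "GO:0016787",
              "GO:0004557", "GO:0008456"},
    "Pfam": {"GO:0004553", "GO:0005975"},
    "ProtFun": {"GO:0004871"},
}


def venn_regions(sets):
    """Map each combination of tools to the ids found by exactly those tools."""
    names = sorted(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(sets[n] for n in combo))
            others = set().union(*(sets[n] for n in names if n not in combo))
            exclusive = shared - others
            if exclusive:
                regions[combo] = exclusive
    return regions


regions = venn_regions(go_ids)
for combo, ids in regions.items():
    print(" & ".join(combo), sorted(ids))
```

Exact-id comparison ignores parent/child relations between GO terms, so two tools that predict closely related terms still end up in separate regions.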

BACR_HALSA

GOPET

GOid Confidence GO term
GO:0005216 77% ion channel activity
GO:0008020 75% G-protein coupled photoreceptor activity
GO:0015078 60% hydrogen ion transmembrane transporter activity

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Bacteriorhodopsin-like protein Domain x Cellular component membrane GO:0016020
Pfam-A Bacteriorhodopsin-like protein Domain x Molecular function ion channel activity GO:0005216
Pfam-A Bacteriorhodopsin-like protein Domain x Biological process ion transport GO:0006811
Pfam-A Domain of unknown function DUF21 Family - - -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.258    1.205
 Receptor                             0.355    2.087
 Hormone                              0.001    0.206
 Structural_protein                   0.006    0.200
 Transporter                       => 0.440    4.036
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.004    0.172
 Cation_channel                       0.078    1.689
 Transcription                        0.026    0.205
 Transcription_regulation             0.028    0.226
 Stress_response                      0.012    0.139
 Immune_response                      0.011    0.128
 Growth_factor                        0.010    0.727
 Metal_ion_transport                  0.049    0.106

Type GO category GO aspect GO id
Highest information content / highest probability Transporter Molecular function GO:0005215

Evaluation

Venn diagram of the GO ids for the protein BACR_HALSA.

RET4_HUMAN

GOPET

GOid Confidence GO term
GO:0005488 90% binding
GO:0005501 81% retinoid binding
GO:0008289 80% lipid binding
GO:0019841 78% retinol binding
GO:0005215 78% transporter activity
GO:0016918 78% retinal binding
GO:0005319 69% lipid transporter activity
GO:0008035 60% high-density lipoprotein particle binding

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Lipocalin / cytosolic fatty-acid binding protein family Domain x Molecular function binding GO:0005488
Pfam-A DspF/AvrF protein Family - - -
Pfam-B PB008544 - - - - -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.202    0.942
 Receptor                             0.147    0.862
 Hormone                              0.004    0.667
 Structural_protein                   0.002    0.058
 Transporter                          0.025    0.232
 Ion_channel                          0.016    0.288
 Voltage-gated_ion_channel            0.003    0.148
 Cation_channel                       0.010    0.215
 Transcription                        0.027    0.207
 Transcription_regulation             0.025    0.196
 Stress_response                      0.161    1.829
 Immune_response                   => 0.239    2.813
 Growth_factor                        0.023    1.617
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content / highest probability Immune response Biological process GO:0006955

Evaluation

Venn diagram of the GO ids for the protein RET4_HUMAN.

INSL5_HUMAN

GOPET

GOid Confidence GO term
GO:0005179 80% hormone activity

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Insulin/IGF/Relaxin family Domain x Cellular component extracellular region GO:0005576
Pfam-A Insulin/IGF/Relaxin family Domain x Molecular function hormone activity GO:0005179

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.374    1.746
 Receptor                             0.128    0.750
 Hormone                           => 0.247   37.936
 Structural_protein                   0.001    0.041
 Transporter                          0.025    0.228
 Ion_channel                          0.010    0.168
 Voltage-gated_ion_channel            0.003    0.131
 Cation_channel                       0.010    0.215
 Transcription                        0.054    0.425
 Transcription_regulation             0.091    0.724
 Stress_response                      0.099    1.128
 Immune_response                      0.178    2.090
 Growth_factor                        0.061    4.379
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Hormone Molecular function GO:0005179
Highest probability Signal transducer Molecular function GO:0004871

Evaluation

Venn diagram of the GO ids for the protein INSL5_HUMAN.

LAMP1_HUMAN

GOPET

GOid Confidence GO term
GO:0004812 60% aminoacyl-tRNA ligase activity
GO:0005524 60% ATP binding

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Lysosome-associated membrane glycoprotein Family x Cellular component membrane GO:0016020
Pfam-A Protein of unknown function DUF1180 Family - - -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.396    1.849
 Receptor                             0.282    1.659
 Hormone                              0.001    0.206
 Structural_protein                   0.011    0.408
 Transporter                          0.024    0.222
 Ion_channel                          0.008    0.147
 Voltage-gated_ion_channel            0.002    0.111
 Cation_channel                       0.010    0.215
 Transcription                        0.032    0.247
 Transcription_regulation             0.018    0.142
 Stress_response                      0.246    2.795
 Immune_response                   => 0.371    4.368
 Growth_factor                        0.013    0.956
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Immune response Biological process GO:0006955
Highest probability Signal transducer Molecular function GO:0004871

Evaluation

Venn diagram of the GO ids for the protein LAMP1_HUMAN.

A4_HUMAN

GOPET

GOid Confidence GO term
GO:0004866 87% endopeptidase inhibitor activity
GO:0004867 86% serine-type endopeptidase inhibitor activity
GO:0030568 83% plasmin inhibitor activity
GO:0030304 83% trypsin inhibitor activity
GO:0030414 82% peptidase inhibitor activity
GO:0005488 79% binding
GO:0005515 74% protein binding
GO:0046872 73% metal ion binding
GO:0003677 71% DNA binding
GO:0008201 70% heparin binding
GO:0008270 69% zinc ion binding
GO:0005507 69% copper ion binding
GO:0005506 67% iron ion binding

Pfam

Source Description Entry type Significant GO aspect GO description GO id
Pfam-A Amyloid A4 N-terminal heparin-binding Domain x Cellular component integral to membrane GO:0016021
Pfam-A Amyloid A4 N-terminal heparin-binding Domain x Molecular function binding GO:0005488
Pfam-A Copper-binding of amyloid precursor, CuBD Domain x - - -
Pfam-A Kunitz/Bovine pancreatic trypsin inhibitor domain Domain x Molecular function serine-type endopeptidase inhibitor activity GO:0004867
Pfam-A E2 domain of amyloid precursor protein Domain x - - -
Pfam-A Beta-amyloid peptide Family x Cellular component integral to membrane GO:0016021
Pfam-A Beta-amyloid peptide Family x Molecular function binding GO:0005488
Pfam-A beta-amyloid precursor protein C-terminus Family x - - -
Pfam-A Exonuclease VII, large subunit Family - - -
Pfam-A Transcriptional activator TraM Family - - -

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.126    0.586
 Receptor                             0.036    0.211
 Hormone                              0.001    0.206
 Structural_protein                => 0.034    1.205
 Transporter                          0.024    0.222
 Ion_channel                          0.009    0.162
 Voltage-gated_ion_channel            0.002    0.108
 Cation_channel                       0.010    0.215
 Transcription                        0.043    0.335
 Transcription_regulation             0.018    0.143
 Stress_response                      0.076    0.862
 Immune_response                      0.016    0.183
 Growth_factor                        0.005    0.372
 Metal_ion_transport                  0.009    0.020

Type GO category GO aspect GO id
Highest information content Structural protein Molecular function GO:0005198
Highest probability Signal transducer Molecular function GO:0004871

Evaluation

Venn diagram of the GO ids for the protein A4_HUMAN.

References

<references />