Sequence-based predictions GLA

From Bioinformatikpedia

Revision as of 17:10, 28 May 2011

by Benjamin Drexler and Fabian Grandke

Secondary structure prediction

GLA sequence

PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/

GLA Psipred.png

Jpred3

http://www.compbio.dundee.ac.uk/www-jpred/index.html

EBI Chain Description E-value
3hg5 B Alpha-galactosidase A 0.0
3hg5 A Alpha-galactosidase A 0.0
3hg4 B Alpha-galactosidase A 0.0
3hg4 A Alpha-galactosidase A 0.0
3hg2 B Alpha-galactosidase A 0.0
3hg2 A Alpha-galactosidase A 0.0
3gxt B Alpha-galactosidase A 0.0
3gxt A Alpha-galactosidase A 0.0
3gxp B Alpha-galactosidase A 0.0
3gxp A Alpha-galactosidase A 0.0
3gxn B Alpha-galactosidase A 0.0
3gxn A Alpha-galactosidase A 0.0
1r47 B Alpha-galactosidase A 0.0
1r47 A Alpha-galactosidase A 0.0
1r46 B Alpha-galactosidase A 0.0
1r46 A Alpha-galactosidase A 0.0
3hg3 B Alpha-galactosidase A 0.0
3hg3 A Alpha-galactosidase A 0.0
3lxc B Alpha-galactosidase A 0.0
3lxc A Alpha-galactosidase A 0.0
3lxb B Alpha-galactosidase A 0.0
3lxb A Alpha-galactosidase A 0.0
3lxa B Alpha-galactosidase A 0.0
3lxa A Alpha-galactosidase A 0.0
3lx9 B Alpha-galactosidase A 0.0
3lx9 A Alpha-galactosidase A 0.0
1ktc A alpha-N-acetylgalactosaminidase e-113
1ktb A alpha-N-acetylgalactosaminidase e-113
3igu B Alpha-N-acetylgalactosaminidase e-100
3igu A Alpha-N-acetylgalactosaminidase e-100
3h55 B Alpha-N-acetylgalactosaminidase e-100
3h55 A Alpha-N-acetylgalactosaminidase e-100
3h54 B Alpha-N-acetylgalactosaminidase e-100
3h54 A Alpha-N-acetylgalactosaminidase e-100
3h53 B Alpha-N-acetylgalactosaminidase e-100
3h53 A Alpha-N-acetylgalactosaminidase e-100

The light-blue protein is the one that was used as the query sequence.
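The E-values in the Jpred3 hit list use the abbreviated notation common in BLAST summaries. In this notation "e-113" stands for 1e-113, and "0.0" is printed for vanishingly small values. Below is a minimal sketch for converting such strings to numbers and ranking hits. It assumes that reading of the notation. The rows are copied from the table above, and the function names are our own.

```python
def parse_evalue(text: str) -> float:
    """Convert a BLAST-style E-value string such as "e-113" or "0.0" to a float."""
    text = text.strip()
    # An abbreviated value like "e-113" omits the mantissa 1 (assumed convention).
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def rank_hits(hits):
    """Sort (pdb_id, chain, description, evalue) tuples, most significant first."""
    return sorted(hits, key=lambda hit: parse_evalue(hit[3]))


# A few rows from the Jpred3 hit list above.
HITS = [
    ("3igu", "A", "Alpha-N-acetylgalactosaminidase", "e-100"),
    ("1ktc", "A", "alpha-N-acetylgalactosaminidase", "e-113"),
    ("3hg5", "A", "Alpha-galactosidase A", "0.0"),
]

for pdb_id, chain, description, evalue in rank_hits(HITS):
    print(pdb_id, chain, description, parse_evalue(evalue))
```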

Comparison with DSSP

http://swift.cmbi.ru.nl/servers/html/

GLA DSSP Comp.png

A PDF version of this image is available here: File:GLA DSSP Comp.pdf
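The comparison above is visual. A common way to quantify agreement between a predicted secondary structure and DSSP is the three-state accuracy Q3, the fraction of residues assigned the same state. Below is a minimal sketch built on two assumptions. The first is the reduction H, G, I to helix (H), E, B to strand (E) and everything else to coil (C); other reductions are also in use. The second is that the predicted and DSSP strings are already aligned residue by residue. The strings are hypothetical toy data, not the GLA results. A Jpred "-" would have to be mapped to "C" before calling `q3`.

```python
# Assumed 8-to-3 reduction of DSSP states; any other character counts as coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map each DSSP state to H, E or C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical example strings (17 residues each).
observed = reduce_dssp("  HHHHGGT EEEEBS ")
predicted = "CCCHHHHHCCEEEECCC"
print(f"Q3 = {q3(predicted, observed):.2f}")  # 15 of 17 residues agree
```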

Prediction of disordered regions

DISOPRED

http://bioinf.cs.ucl.ac.uk/disopred/

GLA Diso graph.png

POODLE

http://mbs.cbrc.jp/poodle/poodle.html

POODLE-S: Missing residues

GLA Poodle s missing.png


POODLE-S: High B-Factor residues

GLA Poodle s high b.png

IUPRED

http://iupred.enzim.hu/index.html

Short Disorder

GLA Iupred Short.png

Long Disorder

GLA Iupred Long.png
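IUPred reports a disorder score between 0 and 1 for every residue, and residues scoring above 0.5 are usually regarded as disordered. Below is a minimal sketch that turns such per-residue scores into contiguous disordered regions. It assumes the 0.5 cutoff and adds an optional minimum region length; the `min_length` parameter is our own addition. The scores are hypothetical, not the GLA output.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) runs of residues scoring above threshold."""
    regions = []
    start = None  # start of the current run, or None when outside a run
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Hypothetical per-residue scores for an 8-residue toy sequence.
scores = [0.70, 0.80, 0.30, 0.20, 0.60, 0.90, 0.55, 0.10]
print(disordered_regions(scores))                # [(1, 2), (5, 7)]
print(disordered_regions(scores, min_length=3))  # [(5, 7)]
```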

META-Disorder

http://www.predictprotein.org/

Hint: You have to register. Registration is free of charge, but you can submit at most 3 sequences within the next 12 months.

https://www.rostlab.org/owiki/index.php/Metadisorder

GLA Meta disorder.png

PROFbval

https://rostlab.org/owiki/index.php/Profbval

NORSnet

https://www.rostlab.org/owiki/index.php/Norsnet

Ucon

https://www.rostlab.org/owiki/index.php/UCON

Prediction of transmembrane alpha-helices and signal peptides

Additional Proteins

TMHMM

GLA

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

Phobius and PolyPhobius

http://phobius.sbc.su.se/

GLA

Phobius

GLA Phob gla.png

SIGNAL 1 31
REGION 1 9 N-REGION.
REGION 10 22 H-REGION.
REGION 23 31 C-REGION.
TOPO_DOM 32 429 NON CYTOPLASMIC.

PolyPhobius

GLA Poly gla.png

SIGNAL 1 31
REGION 1 12 N-REGION.
REGION 13 26 H-REGION.
REGION 27 31 C-REGION.
TOPO_DOM 32 429 NON CYTOPLASMIC.

BACR_HALSA

Phobius

GLA Phob barc.png

TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TOPO_DOM 43 53 CYTOPLASMIC.
TRANSMEM 54 76
TOPO_DOM 77 95 NON CYTOPLASMIC.
TRANSMEM 96 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 142
TOPO_DOM 143 147 NON CYTOPLASMIC.
TRANSMEM 148 169
TOPO_DOM 170 189 CYTOPLASMIC.
TRANSMEM 190 212
TOPO_DOM 213 217 NON CYTOPLASMIC.
TRANSMEM 218 237
TOPO_DOM 238 262 CYTOPLASMIC.

PolyPhobius

GLA Poly barc.png

TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TOPO_DOM 44 54 CYTOPLASMIC.
TRANSMEM 55 77
TOPO_DOM 78 94 NON CYTOPLASMIC.
TRANSMEM 95 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 141
TOPO_DOM 142 147 NON CYTOPLASMIC.
TRANSMEM 148 166
TOPO_DOM 167 186 CYTOPLASMIC.
TRANSMEM 187 205
TOPO_DOM 206 215 NON CYTOPLASMIC.
TRANSMEM 216 237
TOPO_DOM 238 262 CYTOPLASMIC.

RET4_HUMAN

Phobius

GLA Phob ret4.png

SIGNAL 1 18
REGION 1 2 N-REGION.
REGION 3 13 H-REGION.
REGION 14 18 C-REGION.
TOPO_DOM 19 201 NON CYTOPLASMIC.

PolyPhobius

Poly ret4.png

SIGNAL 1 18
REGION 1 3 N-REGION.
REGION 4 13 H-REGION.
REGION 14 18 C-REGION.
TOPO_DOM 19 201 NON CYTOPLASMIC.

INSL5_HUMAN

Phobius

GLA Phob insl5.png

SIGNAL 1 22
REGION 1 5 N-REGION.
REGION 6 17 H-REGION.
REGION 18 22 C-REGION.
TOPO_DOM 23 135 NON CYTOPLASMIC.

PolyPhobius

Poly insl5.png

SIGNAL 1 22
REGION 1 4 N-REGION.
REGION 5 16 H-REGION.
REGION 17 22 C-REGION.
TOPO_DOM 23 135 NON CYTOPLASMIC.

LAMP1_HUMAN

Phobius

GLA Phob lamp1.png

SIGNAL 1 28
REGION 1 10 N-REGION.
REGION 11 22 H-REGION.
REGION 23 28 C-REGION.
TOPO_DOM 29 381 NON CYTOPLASMIC.
TRANSMEM 382 405
TOPO_DOM 406 417 CYTOPLASMIC.

PolyPhobius

GLA Poly lamp1.png

SIGNAL 1 28
REGION 1 9 N-REGION.
REGION 10 22 H-REGION.
REGION 23 28 C-REGION.
TOPO_DOM 29 381 NON CYTOPLASMIC.
TRANSMEM 382 405
TOPO_DOM 406 417 CYTOPLASMIC.

A4_HUMAN

Phobius

GLA Phob a4.png

SIGNAL 1 17
REGION 1 1 N-REGION.
REGION 2 12 H-REGION.
REGION 13 17 C-REGION.
TOPO_DOM 18 700 NON CYTOPLASMIC.
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC.

PolyPhobius

GLA Poly a4.png

SIGNAL 1 17
REGION 1 3 N-REGION.
REGION 4 12 H-REGION.
REGION 13 17 C-REGION.
TOPO_DOM 18 700 NON CYTOPLASMIC.
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC.
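Phobius and PolyPhobius predict the same topology for all six proteins above, but some boundaries differ slightly, most visibly for the seven BACR_HALSA helices. Below is a minimal sketch that parses feature lines in the form shown on this page (keyword, start, end, optional note) and reports per-helix boundary shifts. The parser is our own and only assumes that line layout, not the complete Phobius output format. The input is the BACR_HALSA TOPO_DOM and TRANSMEM lines copied from above (first TOPO_DOM only).

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    kind: str   # e.g. SIGNAL, REGION, TOPO_DOM, TRANSMEM
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    note: str   # e.g. "NON CYTOPLASMIC."; empty if absent


def parse_features(text: str) -> List[Feature]:
    """Parse lines like 'TRANSMEM 23 42' or 'TOPO_DOM 1 22 NON CYTOPLASMIC.'."""
    features = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or not (tokens[1].isdigit() and tokens[2].isdigit()):
            continue  # skip blank lines and anything that is not a feature line
        features.append(Feature(tokens[0], int(tokens[1]), int(tokens[2]), " ".join(tokens[3:])))
    return features


def helices(features: List[Feature]) -> List[Tuple[int, int]]:
    """Return (start, end) of every transmembrane segment."""
    return [(f.start, f.end) for f in features if f.kind == "TRANSMEM"]


def boundary_shifts(a, b):
    """Per-helix (start, end) differences b - a for predictions with equal helix counts."""
    if len(a) != len(b):
        raise ValueError("predictions differ in the number of helices")
    return [(sb - sa, eb - ea) for (sa, ea), (sb, eb) in zip(a, b)]


PHOBIUS_BACR = """
TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TRANSMEM 54 76
TRANSMEM 96 114
TRANSMEM 121 142
TRANSMEM 148 169
TRANSMEM 190 212
TRANSMEM 218 237
"""

POLYPHOBIUS_BACR = """
TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TRANSMEM 55 77
TRANSMEM 95 114
TRANSMEM 121 141
TRANSMEM 148 166
TRANSMEM 187 205
TRANSMEM 216 237
"""

shifts = boundary_shifts(helices(parse_features(PHOBIUS_BACR)),
                         helices(parse_features(POLYPHOBIUS_BACR)))
for number, (d_start, d_end) in enumerate(shifts, start=1):
    print(f"helix {number}: start {d_start:+d}, end {d_end:+d}")
```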

OCTOPUS and SPOCTOPUS

http://octopus.cbr.su.se/index.php

GLA

Octopus

GLA Octo gla.png

Spoctopus

GLA Spocto gla.png

BACR_HALSA

Octopus

GLA Octo bacr.png

Spoctopus

GLA Spocto bacr.png

RET4_HUMAN

Octopus

GLA Octo ret4.png

Spoctopus

GLA Spocto ret4.png

INSL5_HUMAN

Octopus

GLA Octo insl5.png

Spoctopus

GLA Spocto insl5.png

LAMP1_HUMAN

Octopus

GLA Octo lamp1.png

Spoctopus

GLA Spocto lamp1.png

A4_HUMAN

Octopus

Octo a4.png

Spoctopus

GLA Spocto a4.png

SignalP

GLA

BACR_HALSA

RET4_HUMAN

INSL5_HUMAN

LAMP1_HUMAN

A4_HUMAN

TargetP

http://www.cbs.dtu.dk/services/TargetP/

Name Length mTP SP other Loc RC
GLA 429 0.041 0.860 0.141 S 2
BACR_HALSA 262 0.019 0.897 0.562 S 4
RET4_HUMAN 201 0.242 0.928 0.020 S 2
INSL5_HUMA 135 0.074 0.899 0.037 S 1
LAMP1_HUMA 417 0.043 0.953 0.017 S 1
A4_HUMAN 770 0.035 0.937 0.084 S 1

http://www.cbs.dtu.dk/services/TargetP-1.1/output.php
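TargetP predicts the location with the highest output score (here always S, the secretory pathway). It also assigns a reliability class (RC) from the gap between the highest and the second-highest score. RC 1 means a gap above 0.8, RC 2, 3 and 4 follow in steps of 0.2, and RC 5 means a gap of at most 0.2. Below is a minimal sketch of that rule, assuming the non-plant networks (no cTP column) and no score cutoffs. Applied to the rows in the table above, it reproduces the listed Loc and RC values. The function names are our own.

```python
# Location codes for non-plant TargetP output, in column order: mTP, SP, other.
LOCATIONS = ("M", "S", "_")


def reliability_class(scores) -> int:
    """RC from the gap between the highest and second-highest score (1 = most reliable)."""
    top, second = sorted(scores, reverse=True)[:2]
    diff = top - second
    for rc, lower_bound in ((1, 0.8), (2, 0.6), (3, 0.4), (4, 0.2)):
        if diff > lower_bound:
            return rc
    return 5


def predict(mtp: float, sp: float, other: float):
    """Return (location code, reliability class), assuming no score cutoffs."""
    scores = (mtp, sp, other)
    location = LOCATIONS[scores.index(max(scores))]
    return location, reliability_class(scores)


# Rows from the TargetP table above: name, mTP, SP, other, listed Loc, listed RC.
TABLE = [
    ("GLA", 0.041, 0.860, 0.141, "S", 2),
    ("BACR_HALSA", 0.019, 0.897, 0.562, "S", 4),
    ("RET4_HUMAN", 0.242, 0.928, 0.020, "S", 2),
    ("INSL5_HUMA", 0.074, 0.899, 0.037, "S", 1),
    ("LAMP1_HUMA", 0.043, 0.953, 0.017, "S", 1),
    ("A4_HUMAN", 0.035, 0.937, 0.084, "S", 1),
]

for name, mtp, sp, other, loc, rc in TABLE:
    print(name, predict(mtp, sp, other), "listed:", (loc, rc))
```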

Prediction of GO terms

GOPET

http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar

We used the default settings (GO aspect: molecular function; maximum number of predictions: 20; confidence threshold: 60%; GOPET model June 2007, version 2.0; GOPET database 2007). The results contain only GO IDs from the "molecular function" aspect, because the other two GO aspects (cellular component and biological process) were not available.

GLA

GOid Confidence GO term
GO:0016798 98% hydrolase activity, acting on glycosyl bonds
GO:0004553 98% hydrolase activity, hydrolyzing O-glycosyl compounds
GO:0016787 97% hydrolase activity
GO:0004557 96% alpha-galactosidase activity
GO:0008456 89% alpha-N-acetylgalactosaminidase activity

BACR_HALSA

GOid Confidence GO term
GO:0005216 77% ion channel activity
GO:0008020 75% G-protein coupled photoreceptor activity
GO:0015078 60% hydrogen ion transmembrane transporter activity

RET4_HUMAN

GOid Confidence GO term
GO:0005488 90% binding
GO:0005501 81% retinoid binding
GO:0008289 80% lipid binding
GO:0019841 78% retinol binding
GO:0005215 78% transporter activity
GO:0016918 78% retinal binding
GO:0005319 69% lipid transporter activity
GO:0008035 60% high-density lipoprotein particle binding

INSL5_HUMAN

GOid Confidence GO term
GO:0005179 80% hormone activity

LAMP1_HUMAN

GOid Confidence GO term
GO:0004812 60% aminoacyl-tRNA ligase activity
GO:0005524 60% ATP binding

A4_HUMAN

GOid Confidence GO term
GO:0004866 87% endopeptidase inhibitor activity
GO:0004867 86% serine-type endopeptidase inhibitor activity
GO:0030568 83% plasmin inhibitor activity
GO:0030304 83% trypsin inhibitor activity
GO:0030414 82% peptidase inhibitor activity
GO:0005488 79% binding
GO:0005515 74% protein binding
GO:0046872 73% metal ion binding
GO:0003677 71% DNA binding
GO:0008201 70% heparin binding
GO:0008270 69% zinc ion binding
GO:0005507 69% copper ion binding
GO:0005506 67% iron ion binding
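With the settings listed above, GOPET returns only predictions that reach the confidence threshold, up to the maximum number of predictions. Several predictions above have exactly 60% confidence, which suggests the threshold is inclusive. Below is a minimal sketch of that filtering step, assuming it is a simple threshold-then-truncate rule. The function name is our own, and the input is the A4_HUMAN list copied from above.

```python
def filter_predictions(predictions, threshold=60, max_predictions=20):
    """Keep (GO id, confidence %) pairs at or above threshold, best first, capped in number."""
    kept = [p for p in predictions if p[1] >= threshold]
    kept.sort(key=lambda p: p[1], reverse=True)  # stable sort: ties keep their input order
    return kept[:max_predictions]


# GOPET predictions for A4_HUMAN from the table above (GO id, confidence in %).
A4_PREDICTIONS = [
    ("GO:0004866", 87), ("GO:0004867", 86), ("GO:0030568", 83), ("GO:0030304", 83),
    ("GO:0030414", 82), ("GO:0005488", 79), ("GO:0005515", 74), ("GO:0046872", 73),
    ("GO:0003677", 71), ("GO:0008201", 70), ("GO:0008270", 69), ("GO:0005507", 69),
    ("GO:0005506", 67),
]

print(len(filter_predictions(A4_PREDICTIONS)))  # all 13 pass the default threshold of 60
print([go for go, _ in filter_predictions(A4_PREDICTIONS, threshold=80)])
```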

Pfam

http://pfam.sanger.ac.uk/

Pfam is a database of protein domain families built with profile hidden Markov models (HMMs). Each family is represented by a multiple sequence alignment and a profile HMM. Searching a protein sequence against Pfam returns all the domains that the query sequence may contain.

The Pfam database consists of two parts, Pfam-A and Pfam-B, which contain protein domain families of different quality. In Pfam release 1.0, the protein entries in Pfam-A and Pfam-B were taken from Swiss-Prot (a few initial members of the Pfam-A seed alignments came from several sources: Swiss-Prot, PROSITE, ProDom, etc.). In the current release, Pfam-A and Pfam-B entries are derived from Pfamseq (based on UniProtKB) and ADDA, respectively.

Pfam-A contains well-characterized, annotated entries. Each family starts from a seed alignment of a few selected representative sequences whose quality is checked manually. A profile HMM built from this seed is then applied automatically to produce the full alignment, with the aim of detecting all members of the family. Pfam-A families and domains are of high quality and can be used as reliable evidence for annotating or classifying a query sequence.
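Pfam-A families are modelled with profile HMMs, which have match, insert and delete states. The sketch below illustrates the underlying idea in a much simpler form: it scores sequences against a profile derived from a curated seed alignment, using a position-specific scoring matrix (PSSM). It is explicitly not an HMM and not the procedure used by Pfam or HMMER. It assumes an ungapped seed, a uniform background frequency of 1/20 and a pseudocount of 1, and the seed and query sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(seed, pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from an ungapped seed alignment.

    Assumes a uniform background frequency of 1/20 and additive pseudocounts.
    """
    width = len(seed[0])
    if any(len(seq) != width for seq in seed):
        raise ValueError("seed sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seed) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in range(width):
        counts = Counter(seq[column] for seq in seed)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Slide the profile along the sequence; return (best score in bits, 1-based start)."""
    width = len(pssm)
    best_score, best_start = -math.inf, 0
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids score 0 (an arbitrary choice).
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_start = score, start + 1
    return best_score, best_start


# Hypothetical four-residue seed alignment and query.
seed = ["HDEL", "HEEL", "HDEL", "KDEL"]
pssm = build_pssm(seed)
score, start = best_hit(pssm, "MAAHDELGG")
print(f"best window starts at residue {start} with score {score:.2f} bits")
```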

Pfam-B is built with HMMs from alignments of ADDA entries; entries already present in Pfam-A are excluded. Pfam-B families have no confirmed annotation and are not checked manually, so they may contain errors (for example, the members of a family may be aligned essentially at random), and their overall quality is relatively low. Pfam-B can nevertheless be useful when no domain evidence is found in Pfam-A for a query sequence.

We used the Pfam "sequence search" feature to identify potential domains or domain families in each protein and then checked the corresponding Pfam page of each domain (family) for GO annotations. The search used the default settings (cut-off: E-value, threshold 1.0) but also included Pfam-B. Only one Pfam-B hit was found (PB008544, for RET4_HUMAN), and it has no GO annotation. The results are listed in the tables below.

GLA

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Melibiase | Family | x | Molecular function | hydrolase activity, hydrolyzing O-glycosyl compounds | GO:0004553
Pfam-A | Melibiase | Family | x | Biological process | carbohydrate metabolic process | GO:0005975

BACR_HALSA

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Bacteriorhodopsin-like protein | Domain | x | Cellular component | membrane | GO:0016020
Pfam-A | Bacteriorhodopsin-like protein | Domain | x | Molecular function | ion channel activity | GO:0005216
Pfam-A | Bacteriorhodopsin-like protein | Domain | x | Biological process | ion transport | GO:0006811
Pfam-A | Domain of unknown function DUF21 | Family | | - | - | -

RET4_HUMAN

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Lipocalin / cytosolic fatty-acid binding protein family | Domain | x | Molecular function | binding | GO:0005488
Pfam-A | DspF/AvrF protein | Family | | - | - | -
Pfam-B | PB008544 | - | - | - | - | -

INSL5_HUMAN

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Insulin/IGF/Relaxin family | Domain | x | Cellular component | extracellular region | GO:0005576
Pfam-A | Insulin/IGF/Relaxin family | Domain | x | Molecular function | hormone activity | GO:0005179

LAMP1_HUMAN

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Lysosome-associated membrane glycoprotein | Family | x | Cellular component | membrane | GO:0016020
Pfam-A | Protein of unknown function DUF1180 | Family | | - | - | -

A4_HUMAN

Source | Description | Entry type | Significant | GO aspect | GO description | GO id
Pfam-A | Amyloid A4 N-terminal heparin-binding | Domain | x | Cellular component | integral to membrane | GO:0016021
Pfam-A | Amyloid A4 N-terminal heparin-binding | Domain | x | Molecular function | binding | GO:0005488
Pfam-A | Copper-binding of amyloid precursor, CuBD | Domain | x | - | - | -
Pfam-A | Kunitz/Bovine pancreatic trypsin inhibitor domain | Domain | x | Molecular function | serine-type endopeptidase inhibitor activity | GO:0004867
Pfam-A | E2 domain of amyloid precursor protein | Domain | x | - | - | -
Pfam-A | Beta-amyloid peptide | Family | x | Cellular component | integral to membrane | GO:0016021
Pfam-A | Beta-amyloid peptide | Family | x | Molecular function | binding | GO:0005488
Pfam-A | beta-amyloid precursor protein C-terminus | Family | x | - | - | -
Pfam-A | Exonuclease VII, large subunit | Family | | - | - | -
Pfam-A | Transcriptional activator TraM | Family | | - | - | -
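Several GO terms appear more than once in these tables. For example, "binding" (GO:0005488) is listed for two different A4_HUMAN hits. Below is a minimal sketch that collapses the rows of one table into distinct GO IDs per GO aspect, optionally counting only hits marked as significant. The tuple layout and function name are our own. The rows are copied from the A4_HUMAN table above, with the GO description column omitted and "-" or empty cells stored as None.

```python
from collections import defaultdict

# A4_HUMAN rows: (source, description, entry type, significant, GO aspect, GO id).
A4_ROWS = [
    ("Pfam-A", "Amyloid A4 N-terminal heparin-binding", "Domain", True, "Cellular component", "GO:0016021"),
    ("Pfam-A", "Amyloid A4 N-terminal heparin-binding", "Domain", True, "Molecular function", "GO:0005488"),
    ("Pfam-A", "Copper-binding of amyloid precursor, CuBD", "Domain", True, None, None),
    ("Pfam-A", "Kunitz/Bovine pancreatic trypsin inhibitor domain", "Domain", True, "Molecular function", "GO:0004867"),
    ("Pfam-A", "E2 domain of amyloid precursor protein", "Domain", True, None, None),
    ("Pfam-A", "Beta-amyloid peptide", "Family", True, "Cellular component", "GO:0016021"),
    ("Pfam-A", "Beta-amyloid peptide", "Family", True, "Molecular function", "GO:0005488"),
    ("Pfam-A", "beta-amyloid precursor protein C-terminus", "Family", True, None, None),
    ("Pfam-A", "Exonuclease VII, large subunit", "Family", False, None, None),
    ("Pfam-A", "Transcriptional activator TraM", "Family", False, None, None),
]


def go_terms_by_aspect(rows, significant_only=True):
    """Collect the distinct GO ids per aspect, optionally from significant hits only."""
    terms = defaultdict(set)
    for _source, _description, _entry_type, significant, aspect, go_id in rows:
        if significant_only and not significant:
            continue
        if go_id is not None:
            terms[aspect].add(go_id)
    return {aspect: sorted(ids) for aspect, ids in terms.items()}


print(go_terms_by_aspect(A4_ROWS))
```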

ProtFun 2.2

http://www.cbs.dtu.dk/services/ProtFun/
