Fabry:Sequence-based analyses

From Bioinformatikpedia
Revision as of 17:21, 14 May 2012 by Rackersederj (GO terms)




The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Secondary structure

Disorder

Transmembrane helices

Voltage-gated potassium channel (Q9YDF8)


Signal peptides

Prediction of the presence and location of signal peptide cleavage sites in amino acid sequences.


GO terms

QuickGO

Since we had already used QuickGO in Task 2 to download the GO terms, we decided to refine our GO analysis here. The QuickGO search returns 28 distinct GO terms (see <xr id="tab:QuickGO"/>).

<figtable id="tab:QuickGO"> Results of the QuickGO search

Code Name
GO:0052692 raffinose alpha-galactosidase activity
GO:0051001 negative regulation of nitric-oxide synthase activity
GO:0046479 glycosphingolipid catabolic process
GO:0046477 glycosylceramide catabolic process
GO:0045019 negative regulation of nitric oxide biosynthetic process
GO:0044281 small molecule metabolic process
GO:0043202 lysosomal lumen
GO:0043169 cation binding
GO:0042803 protein homodimerization activity
GO:0016936 galactoside binding
GO:0016798 hydrolase activity, acting on glycosyl bonds
GO:0016787 hydrolase activity
GO:0016139 glycoside catabolic process
GO:0009311 oligosaccharide metabolic process
GO:0008152 metabolic process
GO:0006687 glycosphingolipid metabolic process
GO:0006665 sphingolipid metabolic process
GO:0005975 carbohydrate metabolic process
GO:0005794 Golgi apparatus
GO:0005764 lysosome
GO:0005737 cytoplasm
GO:0005625 soluble fraction
GO:0005576 extracellular region
GO:0005515 protein binding
GO:0005102 receptor binding
GO:0004557 alpha-galactosidase activity
GO:0004553 hydrolase activity, hydrolyzing O-glycosyl compounds
GO:0003824 catalytic activity

</figtable>

GOPET

<figtable id="tab:GOPET"> Results of the GOPET search

Result for GOPET search
GOid Aspect Confidence GO term
GO:0016798 Molecular Function Ontology (F) 98% hydrolase activity acting on glycosyl bonds
GO:0004553 Molecular Function Ontology (F) 98% hydrolase activity hydrolyzing O-glycosyl compounds
GO:0016787 Molecular Function Ontology (F) 97% hydrolase activity
GO:0004557 Molecular Function Ontology (F) 96% alpha-galactosidase activity
GO:0008456 Molecular Function Ontology (F) 89% alpha-N-acetylgalactosaminidase activity

</figtable>

Searching GOPET with the AGAL_HUMAN sequence returned 5 GO IDs, which are displayed in <xr id="tab:GOPET"/>. Even raising the "maximum number of GO predictions to be displayed per sequence" and lowering the "confidence threshold for prediction" did not yield any additional GO terms.
At first glance, since we already know the name and function of the protein, it is somewhat surprising that "alpha-galactosidase activity" is only the fourth entry, with 96% confidence. From our earlier information gathering we knew that α-galactosidase A is a hydrolase, so the first three entries were not surprising. Since our enzyme is primarily a glycosidase, the two entries at the top of the list make perfect sense.
The last entry was again somewhat surprising. α-N-acetylgalactosaminidase is actually used for enzyme replacement therapy, which we mention on our main page. The two enzymes are structurally similar, but this still does not explain why this GO term is associated with the AGAL protein.
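The two GOPET settings mentioned above act as filters on a ranked list of predictions. A minimal sketch, assuming a simple confidence cutoff plus a cap on the number of reported terms, applied to the five predictions from the GOPET table; the function and parameter names are our own and hypothetical, not GOPET's actual interface.

```python
from typing import NamedTuple


class GoPrediction(NamedTuple):
    go_id: str
    confidence: int  # percent, as reported in the GOPET table
    term: str


# The five predictions GOPET returned for AGAL_HUMAN (copied from the table).
PREDICTIONS = [
    GoPrediction("GO:0016798", 98, "hydrolase activity acting on glycosyl bonds"),
    GoPrediction("GO:0004553", 98, "hydrolase activity hydrolyzing O-glycosyl compounds"),
    GoPrediction("GO:0016787", 97, "hydrolase activity"),
    GoPrediction("GO:0004557", 96, "alpha-galactosidase activity"),
    GoPrediction("GO:0008456", 89, "alpha-N-acetylgalactosaminidase activity"),
]


def filter_predictions(predictions, min_confidence, max_terms):
    """Hypothetical helper: keep terms at or above the cutoff, best first, capped at max_terms."""
    kept = [p for p in predictions if p.confidence >= min_confidence]
    # sort() is stable even with reverse=True, so ties keep their input order
    kept.sort(key=lambda p: p.confidence, reverse=True)
    return kept[:max_terms]


# In this sketch, loosening both settings can only return terms already in the list.
loose = filter_predictions(PREDICTIONS, min_confidence=0, max_terms=100)
print(len(loose))  # 5
strict = filter_predictions(PREDICTIONS, min_confidence=97, max_terms=100)
print([p.go_id for p in strict])  # ['GO:0016798', 'GO:0004553', 'GO:0016787']
```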


ProtFun2.2

In the ProtFun output file, "=>" indicates which subcategory of each category is predicted to be true for the submitted sequence. This prediction is based on the highest information content. In the pictures below, the probability, odds and information content are shown separately for each category. The left y-axis is assigned to the probability (blue) and the information content (red line); the right y-axis is assigned to the probability multiplied by 10 for better visibility.
According to the information content, the most likely functional category of the human α-galactosidase A protein is "Cell envelope". Based on our own research we would not come to this conclusion, but would rather assign a metabolic or regulatory class, since the GO terms known to us include, for example, "glycoside catabolic process", "negative regulation of nitric oxide biosynthetic process" and "small molecule metabolic process".
The prediction that the submitted sequence is an enzyme is correct.
In our opinion, deciding on the basis of the information content is not always the best evaluation method: the chosen enzyme class "Ligase" does have the highest information content, but not the highest probability, and according to the literature it is wrong. AGAL_HUMAN is clearly a hydrolase, which is also indicated by the high probability of this class. The EC number assigned to the galactosidase is EC 3.2.1.22.
ProtFun does not predict a Gene Ontology category, since the category with the highest information content has odds lower than 1. We had actually expected this category to be ambiguous, since the protein has very diverse cellular functions.
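Why a class with a lower probability can still be the one marked by "=>" is easiest to see with a prior-normalized score. The sketch below is an assumption on our part: it uses the odds (predicted probability divided by the prior probability of the class) as a stand-in for ProtFun's information content, picks the class with the highest odds, and reports no prediction if those odds are below 1, as happened for the Gene Ontology category. All probabilities and priors are invented toy numbers, not the ProtFun output for AGAL_HUMAN.

```python
def odds(probability, prior):
    """Odds as assumed here: predicted probability relative to the class prior."""
    return probability / prior


def pick_class(scores):
    """Return (best_by_probability, best_by_odds or None).

    `scores` maps class name -> (probability, prior). The odds-based pick is
    dropped (None) when its odds are below 1, mirroring the missing
    Gene Ontology prediction.
    """
    best_prob = max(scores, key=lambda c: scores[c][0])
    best_odds = max(scores, key=lambda c: odds(*scores[c]))
    if odds(*scores[best_odds]) < 1:
        best_odds = None
    return best_prob, best_odds


# Toy values (hypothetical): hydrolases are common, so their prior is high,
# while ligases are rare and a small probability already yields high odds.
toy = {
    "Hydrolase": (0.20, 0.30),
    "Ligase": (0.05, 0.02),
    "Transferase": (0.10, 0.25),
}
print(pick_class(toy))  # ('Hydrolase', 'Ligase')
```

With a high prior, even the most probable class can end up with odds below those of a rare class, which is the effect we observed for "Hydrolase" versus "Ligase".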

ProtFun draws on many other resources to predict the probabilities and odds of the categories. Some of their outputs are listed in <xr id="tab:ProtFun"/>. The α-galactosidase A protein is known to have a 31-residue signal peptide (positions 1 to 31) that is later cleaved off. Thus the SignalP 3.0 prediction of a cleavage site between positions 31 and 32 is correct.
We could not confirm the predicted propeptide cleavage site (position 196) for the human AGAL protein.
ProtFun inferred 22 phosphorylation sites at positions 62, 102, 201, 235, 238, 241, 276, 304, 364, 371, 405, 424, 366, 400, 86, 134, 151, 152, 173, 184, 207 and 216. PhosphoSite confirms phosphorylation at S23, Y134 and H186, of which only one (Y134) was predicted by NetPhos.
According to UniProt, positions 139, 192, 215 and 408 are N-linked glycosylation sites (the last one only a potential site). All four are predicted by NetNGlyc. In agreement with the NetOGlyc prediction, there are no (known) O-glycosylated sites.
See also sections Transmembrane helices and Signal peptides for further information on these two topics.
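Comparing the ProtFun feature predictions with the known annotations comes down to simple set operations. A minimal sketch with the positions copied from the text above, assuming that both the predictors and the databases number residues on the full-length precursor (including the signal peptide); the variable and function names are our own and purely illustrative.

```python
# Predicted sites (from the ProtFun feature outputs quoted above).
NETPHOS_SITES = {62, 102, 201, 235, 238, 241, 276, 304, 364, 371, 405, 424,
                 366, 400, 86, 134, 151, 152, 173, 184, 207, 216}
NETNGLYC_SITES = {139, 192, 215, 408}
SIGNALP_CLEAVAGE_AFTER = 31  # cleavage predicted between positions 31 and 32

# Annotated sites (PhosphoSite and UniProt, as quoted above).
PHOSPHOSITE_CONFIRMED = {"S23": 23, "Y134": 134, "H186": 186}
UNIPROT_NGLYC = {139, 192, 215, 408}
SIGNAL_PEPTIDE = range(1, 32)  # positions 1-31


def confirmed_and_predicted(confirmed, predicted):
    """Return the confirmed site labels whose residue number was also predicted."""
    return sorted(label for label, pos in confirmed.items() if pos in predicted)


print(len(NETPHOS_SITES))  # 22
print(confirmed_and_predicted(PHOSPHOSITE_CONFIRMED, NETPHOS_SITES))  # ['Y134']
print(UNIPROT_NGLYC <= NETNGLYC_SITES)  # True: all annotated N-sites predicted
print(SIGNALP_CLEAVAGE_AFTER == SIGNAL_PEPTIDE[-1])  # True
```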

<figtable id="tab:ProtFun"> Output rendered by the individual features used by ProtFun 2.2

Feature Output summary Details
SignalP 3.0 Most likely cleavage site between pos. 31 and 32: ARA-LD Using neural networks (NN) and hidden Markov models (HMM) trained on eukaryotes
ProP 1.0 1 propeptide cleavage site predicted at position: 196 Furin-type cleavage site prediction (Arginine/Lysine residues)
TargetP 1.1 No high confidence targeting prediction -
NetPhos 2.0 22 putative phosphorylation sites phosphorylation site prediction
NetOGlyc 3.1 No O-glycosylated sites predicted -
NetNGlyc 1.0 4 putative N-glycosylated sites at positions 139 192 215 408 -
TMHMM 2.0 No TM helices predicted -

</figtable>


Pfam

The Pfam sequence search revealed one significant Pfam-A match, which is shown in <xr id="fig:pfam_fam"/>.

<figtable id="fig:pfam_fam"> Pfam-A match

Family Description Entry type Clan Envelope Start Envelope End Alignment Start Alignment End HMM From HMM To Bit score E-value
Melibiase Melibiase Family CL0058 33 149 40 146 41 140 50.0 1.5e-13

</figtable>


<figure id="fig:AGAL">

The protein encoded by the Fabry-associated GLA gene: α-Galactosidase A

</figure>


According to Pfam, the alpha-galactosidase A protein belongs to the Melibiase family (see <xr id="tab:Pfam_Melibiose"/>). The Pfam family PF02065 itself is referred to as "Glycoside hydrolase family 27" and "Glycoside hydrolase family 36"; AGAL_HUMAN falls into the first category. The common characteristic of the members of this family is that they are glycoside hydrolases (EC 3.2.1.-). The AGAL enzyme catalyzes the hydrolysis of the disaccharide melibiose (D-Gal-α(1→6)-D-Glc) into its two components, galactose and glucose.

<figtable id="tab:Pfam_Melibiose"> Melibiase Identifiers

Melibiase Identifiers
Symbol Melibiase
Pfam PF02065
Pfam clan CL0058
InterPro IPR000111
SCOP 1ktc
SUPERFAMILY 1ktc
CAZy GH27

</figtable>

The alignment of positions 40-146 of the AGAL_HUMAN protein sequence with the matching HMM used in this prediction is shown in <xr id="fig:pfam_ali" />. According to the color code, which indicates the confidence of each aligned position, the Hidden Markov Model and the sequence agree very well overall, except for residues 17-46. Checking our background knowledge and the UniProt database, we could not find any particularly interesting or abnormal region here, apart from the signal peptide cleavage site and a beta strand at positions 42-46.
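As far as we understand, the color code in the Pfam alignment view reflects HMMER's per-residue posterior probability (PP) annotation, in which each aligned residue gets one character from '0' (lowest) through '9' to '*' (highest), and gap columns of the target get '.'. A minimal sketch, assuming the aligned target sequence and its PP line are available as equal-length strings, that reports residue ranges whose PP lies below a chosen cutoff; the function name, the cutoff '7' and the example strings are hypothetical and are not the real AGAL_HUMAN alignment.

```python
# Posterior probability characters, ordered from lowest to highest.
PP_LEVELS = "0123456789*"


def low_confidence_ranges(aligned_seq, pp_line, start, min_level="7"):
    """Return (first, last) residue ranges whose PP character is below min_level.

    aligned_seq: target sequence as aligned, with '-' for gap columns
    pp_line:     posterior probability characters, '.' where the target has a gap
    start:       residue number of the first aligned residue (e.g. 40)
    """
    if len(aligned_seq) != len(pp_line):
        raise ValueError("sequence and PP line must have the same length")
    cutoff = PP_LEVELS.index(min_level)
    ranges = []
    residue = start - 1
    for aa, pp in zip(aligned_seq, pp_line):
        if aa == "-":  # gap in the target: no residue consumed
            continue
        residue += 1
        if PP_LEVELS.index(pp) < cutoff:
            if ranges and ranges[-1][1] == residue - 1:
                ranges[-1] = (ranges[-1][0], residue)  # extend the current run
            else:
                ranges.append((residue, residue))  # start a new run
    return ranges


# Made-up toy alignment, not the real Pfam output.
print(low_confidence_ranges("ACD-EFGHIK", "**9.5432**", start=40))  # [(43, 46)]
```

Counting only non-gap characters keeps the reported numbers in sequence coordinates, so they can be compared directly with UniProt features such as the beta strand mentioned above.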
Our query protein belongs to a rather large clan, the Glyco_hydro_tim Clan (CL0058), which includes 4 CAZy clans (GH-A, GH-D, GH-H and GH-K). The main attribute shared by all included glycosyl hydrolase enzymes is a TIM barrel fold (eight α-helices and eight parallel β-strands that alternate along the peptide backbone). This fold (residues 31-324) is clearly visible in <xr id="fig:AGAL"/>, which depicts the 3D structure of α-Galactosidase A.
InterPro protein sequence analysis and classification assigns the enzyme to the family "IPR000111 Glycoside hydrolase, clan GH-D". As mentioned before, GH-D is part of the Glyco_hydro_tim Clan, so it is not surprising that the descriptions are almost identical. According to InterPro, there are 6 members of the IPR000111 family in humans.
The Structural Classification of Proteins (SCOP) identifies two domains: an "Amylase, catalytic domain" (c.1.8.1, residues 32-323) and an "alpha-Amylases, C-terminal beta-sheet domain" (b.71.1.1, residues 324-421). The catalytic domain belongs to the alpha and beta proteins (a/b) class with a TIM beta/alpha-barrel fold, which is consistent with its membership in the Glyco_hydro_tim Clan. The C-terminal beta-sheet domain belongs to the all-beta proteins class; its fold and superfamily are "Glycosyl hydrolase domain". This, too, is as we expected.
AGAL belongs to the CAZy family GH27 (Glycoside Hydrolase Family 27), together with α-N-acetylgalactosaminidase (EC 3.2.1.49; as mentioned before, used for enzyme replacement therapy), isomalto-dextranase (EC 3.2.1.94) and β-L-arabinopyranosidase (EC 3.2.1.88). See http://www.cazypedia.org/index.php/Glycoside_Hydrolase_Family_27.

<figure id="fig:pfam_ali">

Melibiase - alignment region residues 40 - 146

</figure>


Other programs and resources