Gaucher Disease: Task 03 - Sequence-based predictions

Secondary Structure

In this task, the secondary structure of a protein is predicted using ReProf and compared with the PsiPred prediction and the DSSP structure assignment.

Lab journal

Evaluation results

Evaluation results of ReProf against PsiPred and DSSP are summarized in <xr id="secondary structure results"/>. The ReProf run started from the Psi-BLAST PSSM obtained from a search against big_80 with 3 iterations and an E-value cutoff of 10E-10 (as described in the lab journal in the link above).

Query	Precision vs. PsiPred (%)				Precision vs. DSSP (%)
Query	E	H	L	Total	E	H	L	Total
P10775	47.3	99.4	59.6	72.6	42.1	96.0	62.2	78.5
Q9X0E6	86.1	97.2	75.9	87.1	83.0	97.3	77.0	87.8
Q08209	87.5	74.3	86.0	82.2	70.5	75.9	88.9	78.8
P04062	88.4	96.8	76.2	83.6	80.0	84.1	86.3	83.3

Precision (= number of matches / number of residues, in percent) of the ReProf prediction with respect to the PsiPred prediction and the DSSP assignment.

</figtable>
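The precision defined in the caption can be sketched in Python as follows. This is only a minimal illustration, not the script used for the table: the function name `agreement` is hypothetical, the per-state values are assumed to be computed over the positions where the reference has that state, and positions where DSSP shows "-" are assumed to be skipped.

```python
from collections import defaultdict


def agreement(pred: str, ref: str, skip: str = "-") -> dict:
    """Percentage of matching positions, per reference state and in total.

    Assumption: positions whose reference symbol is in `skip` are ignored.
    """
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have equal length")
    matches = defaultdict(int)
    totals = defaultdict(int)
    for p, r in zip(pred, ref):
        if r in skip:
            continue
        # Count the position once for its reference state and once for the total.
        for key in (r, "Total"):
            totals[key] += 1
            matches[key] += int(p == r)
    return {k: round(100.0 * matches[k] / totals[k], 1) for k in totals}


# Last alignment block of P04062 from the aligned view (ReProf vs. PsiPred).
reprof = "LLLLLLEEEEEEELLLEEEEEELLLLEEEEEEEELL"
psipred = "ELLLLLEEEEEELLLLLEEEEELLLLEEEEEEEELL"
print(agreement(reprof, psipred))
```

Applied to the full-length strings of a protein, such a function would give one row of the table, provided the assumptions above match the original evaluation.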

Human Glucosylceramidase (P04062)

Aligned view

An aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for the Gaucher's disease protein, P04062, is shown below.

Sequence:	MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHT
  ReProf:	LLLLLLHHHHLLHHLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLEEELLLLLEEEEEELLLLLLLLLLLLLLLLEEEEEEELLLLLEEEEELLLLLLLLL
 PsiPred:	LLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLL
    DSSP:	----------------------------------------E---EEE-LLLLEEEEEELL---E--------LLEEEEEEEELLL--LEEEEEE-ELL--
 UniProt:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEELEEEELLLLLLLLLLLLLLLLLEEEEEEEELLLLLEEEEEEELEEELL

Sequence:	GTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLI
  ReProf:	LLLEEEEEELLLLEEEEEEELEEHHHHHHHHHHLLLHHHHHHHHHHHLLLLLLLEEEEEEELLLLLLLLLLLEEELLLLLLLLLLLLLLHHHHHHHHHHH
 PsiPred:	LLLEEEEEELLLLLLEEEEEELLLLHHHHHHHHLLLHHHHHHHHHHLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLLHHLLLLLHHHH
    DSSP:	--LEEEEEEEEEEEEE--EEEEE--HHHHHHHLLL-HHHHHHHHHHHHLLLLL---EEEEEEL--LLLLL---L--LLL-LL-LL----HHHHHHHHHHH
 UniProt:	LLEEEEEEEEEEEEEELLEEEEELLHHHHHHHHLLLHHHHHHHHHHHHLLLLLLLLEEEEEEELLEEEEELLLLLLEEELLLLLLLLLLHHHHLLHHHHH

Sequence:	HRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA
  ReProf:	HHHHHHLLLLEEEEEELLLLLLEEEELLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLLLLLLLLLELLLHHHHHHHHH
 PsiPred:	HHHHHHLLLLLEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHLLLLLLEEEELLLLLLLLLLLLLLLLLLLLHHHHHHHHH
    DSSP:	HHHHHH-LL--EEEEEEL---HHHELL-LLLLL-EELL-LLLHHHHHHHHHHHHHHHHHHHLL---LEEEL-LLLLHHHLLL--L---E--HHHHHHHHH
 UniProt:	HHHHHHLLLLLEEEEEEELLLHHHEEELEEEEELEEEELLLLHHHHHHHHHHHHHHHHHHHLLLLLEEEEELLLHHHHHLLLLEEELLLLLHHHHHHHHH

Sequence:	RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGM
  ReProf:	HHHHHHHHHLLLLLEEEEEEELLLLLLHHHHHHHLLLHHHHHHHHHLEEEELLLLLLLLHHHHHHHHHHLLLLLEEEEEEEELLLLLLLLLLLLLHHHHH
 PsiPred:	HHHHHHHHLLLLLLEEEEEELLLLLLLLLLLHHHLLLHHHHHHLLEEEEELLLLLLLHHHHHHHHHHHHLLLLEEEEEELLLLLLLLLLLLLLLLHHHHH
    DSSP:	HHHHHHHHLLLLLLLEEEEEEEEHHHLLHHHHHHHLLHHHHLL--EEEEEEELLL---HHHHHHHHHHH-LLLEEEEEEEE----LLL-L--LL-HHHHH
 UniProt:	HLHHHHHHLLLLLLEEEEEEEEEHHHLLHHHHHHHLLHHHHLLLLEEEEEEELHHHLLHHHHHHHHHHHLLLEEEEEEEEELLLLEEELLLLLLLHHHHH

Sequence:	QYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVL
  ReProf:	HHHHHHHHHHHHHLHHEEEEEEEELLLLLLLLLLLLEEEEEEEELLLLEEEEELLEHHHHEEEELLLLLLEEEEEELLLLLLEEEEEEELLLLLEEEEEE
 PsiPred:	HHHHHHHHHHHLLLEEEEEELLLLLLLLLLLLLLLLLLLLEEEELLLLEEEELLLEEEEEHHLLLLLLLLEEEEEELLLLLLLEEEEEELLLLLEEEEEE
    DSSP:	HHHHHHHHHHHLLEEEEEEEEL-E-LLL---LL------LEEEEHHHLEEEE-HHHHHHHHHHLL--LL-EEEEEEELL--LEEEEEEE-LLL-EEEEEE
 UniProt:	HHHHHHHHHHHLLEEEEEEEEEEELEEELLLLLLLLLLLEEEEEHHHLEEEELHHHHHHHHHHLLLLLLLEEEEEEEEELLEEEEEEEELLLLLEEEEEE

Sequence:	NRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
  ReProf:	LLLLLLEEEEEEELLLEEEEEELLLLEEEEEEEELL
 PsiPred:	ELLLLLEEEEEELLLLLEEEEELLLLEEEEEEEELL
    DSSP:	E-LLL-EEEEEEELLLEEEEEEE-LLEEEEEEE---
 UniProt:	ELEEELEEEEEEELLLEEEEEEELLLEEEEEEELLL

Schematic representation of UniProt secondary structure for P04062 (source: [1])

</figure>

Secondary structure of the PDB structure 1OGS, visualized with VMD using "new cartoon" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta (source of the PDB structure: [2])

</figure>

Comparison to available knowledge

Here we compare the secondary structure predictions and DSSP assignment for the protein sequence P04062 to the available knowledge in UniProt and PDB.

UniProt
The UniProt secondary structure annotation assigns residues to one of three states: helix, strand or turn. The annotation might be unreliable if no experimental evidence is available for the protein. However, the existence of our protein, P04062, was verified on the protein level, so we can rely on the annotation to some extent. The UniProt secondary structure annotation for P04062 is shown in the image above. It is also included in the alignment in the previous section, where both turns and positions not in one of the three states (helix, strand or turn) are treated as loops. As the alignment shows, the main difference lies near the beginning of the sequence: ReProf and PsiPred both predict one long helix, and ReProf additionally predicts two short helices before it (with 4 and 2 residues), whereas UniProt annotates only loops there (and DSSP has no assignment there). Altogether, however, the secondary structures look very similar, apart from small disagreements in the exact position and length of segments and short segments that are not present in every source. The latter may be falsely predicted or assigned.

PDB
The PDB structure of our protein P04062, 1OGS, consists of two identical chains, A and B. In the cartoon representation colored by secondary structure, one can see that each chain contains many alternating helices and sheets connected by loops. A beta-barrel fold can be recognized, together with an extra beta-sheet ring on the side of each chain. This supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of the protein P04062.

Ribonuclease inhibitor (P10775)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for Ribonuclease inhibitor (P10775).

Sequence:	MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL
  ReProf:	LEELLLLLLLLHHHHHHHHHHHHLLLEEEELLLLLLHHHHHHHHHHHHLLLLLEEEELLLLLLLHHHHHHHHHHHHLLLHHEEEEELLLLLLLHHHHHHH
 PsiPred:	LEEELLLLLLLHHHHHHHHHHHHHLLEEELLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLHHHHHHHHHHLLLLLLLLLEEEEELLLLLHHHHLHH
    DSSP:	-E--EEL----HHHHHHHHHHHLL-LEEEEEL----HHHHHHHHHHHLL-LL--EEE--L---HHHHHHHHHHHHLLLL----EEE-LLL---HHHHHLH
 UniProt:	LLLLEEELLLLHHHHHHHHHHHLLLEEEEEELLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLHHHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHLH

Sequence:	PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR
  ReProf:	HHHHHHLLLLLHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHLLLLLHHHHHHHHHHHHLLLLLLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
 PsiPred:	HHHHLLLLLLLEEELLLLLLLLHHHHHHHHHHLLLLLLLLEEEELLLLLLHHLHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHHLLLLLLLLLEEE
    DSSP:	HHHHHH-LL--EEE--L---HHHHHHHHHHHHHLLL----EEE-LL---EHHHHHHHHHHHHH-L---EEE-LLLE-HHHHHHHHHHHHHL--L---EEE
 UniProt:	HHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHHLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHHLLLLLLLEEE

Sequence:	LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC
  ReProf:	HHLLLLLHHHHHHHHHHHHHLLLLLHHHHLLLLLLLHHHHHHHHHHHHHLHHHHHHLLLLLLLLHHHHHHHHHHHHHLLHHHHHHHHHHLLLHHHHHHHH
 PsiPred:	LLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHH
    DSSP:	-LLL---HHHHHHHHHHHHH-LL--EEE--LL--HHHHHHHHHHHHL-LL----EEE-LLL---HHHHHHHHHHHHH-LL--EEE-LLL--HHHHHHHHH
 UniProt:	LLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHH

Sequence:	ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
  ReProf:	HHHHHHLLLEEEEEEELLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHHHHLLLLLEEEEEELLLLLLHHHHHHHHHHHHHLLLLEEEEL
 PsiPred:	HHLLLLLLLLLEEEELLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHLLLLLLLLLEEEELLLLLLLHHHHHHHHHHHHLLLLLLEEEL
    DSSP:	HHHLLLL----EEE-LLL--EHHHHHHHHHHHHH-LL--EEE--LLE-HHHHHHHHHHHLLLLL----EEE-LLL---HHHHHHHHHHHHH--L--EEE-
 UniProt:	HHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLEEELHHHHHHHHHHHLLEEELLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEEL

Sequence:	SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
  ReProf:	LLLLLLHHHHHHHHHHHHHLLLLLEEEEELLLLLLHHHHHHHHHHHHHLLLLLELL
 PsiPred:	LLLLLLLHHHHHHHHHLLLLLLLLLEEELLLLLLLHHHHHHHHHHHHLLLLLEELL
    DSSP:	LLLL--HHHHHHHHHHHLLLL----EEE-LL----HHHHHHHHHHHHH-LL-EEE-
 UniProt:	LEEELLHHHHHHHHHHHLEEELLLLEEELLLLLLLHHHHHHHHHHHHHLLLLEEE

Schematic representation of UniProt secondary structure for P10775 (source: [3])

</figure>

Secondary structure of the PDB structure 2BNH, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta (source of the PDB structure: [4])

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence P10775 to the available knowledge in UniProt and PDB.

UniProt
The existence of the protein P10775 was verified on the protein level, too, so we can rely on the annotation to some extent. The UniProt secondary structure annotation for P10775 is shown in the image above. As before, it is also included in the secondary structure alignment in the previous section. The main differences occur in ReProf in the first half of the sequence: a helix is predicted where the other sources have a 3-residue strand, followed by a longer helix where the others have a shorter helix and a short strand. The first case occurs once more and the latter three more times. Overall, ReProf predicts more helices. In the PsiPred prediction, the DSSP assignment and the UniProt annotation, the secondary structures look very similar overall: a sequence of alternating helices and strands separated by loops, sometimes with two short consecutive strands.

PDB
From the visualization of the PDB structure of the protein P10775, 2BNH, one can see that it has the typical horseshoe fold, with helices on the outer side and sheets on the inner side, connected by loops on both sides. This fold supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of the protein P10775.

Divalent-cation tolerance protein CutA (Q9X0E6)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for the divalent-cation tolerance protein CutA (Q9X0E6).

Sequence:	MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV
  ReProf:	LEEEEEELLLHHHHHHHHHHHHHHLLEEEEELLLEEEEEEELLLLEEEEEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELHLLLHHHHHHHHHHL
 PsiPred:	LEEEEELLLLHHHHHHHHHHHHHLLLLLEEEEEEEEEEEEELLLEEEEEEEEEEEELLHHLHHHHHHHHHHHLLLLLLEEEEEELLLLLHHHHHHHHHHL
    DSSP:	-EEEEEEELLHHHHHHHHHHHHHLLL-LEEEEEEEEEEEEELLEEEEEEEEEEEEEEEHHHHHHHHHHHHHH-LLLL--EEEEE-L---HHHHHHHHHH-
 UniProt:	EEEEEEEEEEHHHHHHHHHHHHHLLLLEEEEEEEEEEEEEELLEEEEEEEEEEEEEEEHHHHHHHHHHHHHHLEEEELLEEEELLLLLLHHHHHHHHHH

Sequence:	L
  ReProf:	L
 PsiPred:	L
    DSSP:	-
 UniProt:

Schematic representation of UniProt secondary structure for Q9X0E6 (source: [5])

</figure>

Secondary structure of the PDB structure 1VHF, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta (source of the PDB structure: [6])

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence Q9X0E6 to the available knowledge in UniProt and PDB.

UniProt
The existence of the protein Q9X0E6 was also verified on the protein level, so we can rely on its UniProt annotation. The UniProt secondary structure annotation for Q9X0E6 is shown in the image above. As before, it is also included in the secondary structure alignment in the previous section. It is a short protein of only 101 amino acids, and all 4 sources agree almost entirely in the secondary structure prediction or assignment. The protein contains three helices (apart from a single loop residue predicted within the second helix by PsiPred) and 4 strands according to PsiPred and DSSP. ReProf splits the second strand by a 3-residue loop, whereas UniProt splits the last strand by 2 loop residues.

PDB
From the visualization of the PDB structure of the protein Q9X0E6, 1VHF, in Pymol we can see the number and order of its helices, loops and strands: lELHLELEHLELh (here a lower-case letter represents a single residue in this state and an upper-case letter multiple residues in this state). This is the same as in the DSSP assignment of the protein Q9X0E6.

Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for the serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209).

Sequence:	MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK
  ReProf:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEELLLLLLLLLLLLEHHHHHH
 PsiPred:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHLLLLEEELLLEEEELLLLLHHHHHHH
    DSSP:	----------------LLLLL-------E-HHHHE-LLL-E-HHHHHHHHHLL--E-HHHHHHHHHHHHHHHHLL-LEEEE-LLEEEE---LL-HHHHHH
 UniProt:	LLLLLLLLLLEEELLLLLLLLLLLLLLLLLHHHHLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLEEEEELEEEEEELLLLLLHHHHHH

Sequence:	LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG
  ReProf:	HHHHLLLLLLLEEEEELEELLLLLLLHHHHHHHHHHHHHLLLHEEEEELLLLLLLLLLLLLLHHHHHHHLHHHHHHHHHHHHHHHLLHHEELLEEEEEEL
 PsiPred:	HHHHLLLLLLLLLEELLLLLLLLLLHHHHHHHHHHHHHLLLLLEEEELLLLLLLLLLLLLLHHHHHHHHHLHHHHHHHHHHLLLLHHHHHHLLLEEEELL
    DSSP:	HHHHH--LLL--EEE-L--LLLLL-HHHHHHHHHHHHHHLLLLEEE---LLLLHHHHHHLLHHHHHHHHL-HHHHHHHHHHHLLL--EEEELLLEEEELL
 UniProt:	HHHHHLLLLLLLEEELLLLEEEELLHHHHHHHHHHHHHHLLLLEEELLLLLLLHHHHHHLLHHHHHHHHLLHHHHHHHHHHHHLLLLEEEELLLEEEEEE

Sequence:	GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP
  ReProf:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLLLLLLLLLLLELLLLLLLEEEELHHHHHHHHHHLLLEEEEEELLELLLLEEEEEELLLLLLL
 PsiPred:	LLLLLLLLHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEELLHHHHHHHHHHLLLLHHHHHLLLLLLLLLLEELLLLLLLL
    DSSP:	---LL--LHHHHHHL--LLL--LLLHHHHHHH-EE-LLLLL-LL---EEE-LLLLLLEEE-HHHHHHHHHHLL-LEEEE--L--LLLEEE--E-LLLLLE
 UniProt:	LLLLLEEELHHHHLLLLEEELLEEEHHHHHHHLLLLLLLLLLLLLLEEEELLLLEEEEEELHHHHHHHHHHLLLLEEEELLLLLLLEEEELLLLLLLEEE

Sequence:	SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI
  ReProf:	EEEEEEELLLLLLLLLLEEEEEEELLLLLEEEEEELLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
 PsiPred:	LEEEEELLLLLLLLLLLLEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLHHLHHHHHHHHHHHH
    DSSP:	LEEEE---LLHHHLL---EEEEEEELLEEEEEEE---------HHH--HHHHHHHHHHHHHHHHHHHHHLL-----------------------------
 UniProt:	EEEEEELLLLHHHLLLLLEEEEEEELLEEEEEEELLLLLLLLLHHHLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHH

Sequence:	RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN
  ReProf:	HHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLHHHLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLL
 PsiPred:	HHHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHLLLLLLLLLLHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLL
    DSSP:	--------------------------------------------------------------------HHHHHHHHHHHHL-------------------
 UniProt:	HHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH

Sequence:	KALTSETNGTDSNGSNSSNIQ
  ReProf:	LLLLLLLLLLLLLLLLLLLLL
 PsiPred:	LLLLLLLLLLLLLLLLLLLLL
    DSSP:	--------------------
 UniProt:

Schematic representation of UniProt secondary structure for Q08209 (source: [7])

</figure>

Secondary structure of the PDB structure 1AUI, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta (source of the PDB structure: [8])

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence Q08209 to the available knowledge in UniProt and PDB.

UniProt
The last protein we explore was also verified on the protein level, so we can trust the UniProt annotation to some extent. The UniProt secondary structure annotation for Q08209 is shown in the image above and is also included in the secondary structure alignment in the previous section. It is a long protein of over 500 residues, and some disagreements about its secondary structure assignment can be seen. According to all four sources, the protein contains many loops and helices and some strands. As in P04062, there are some disagreements in the exact position and length of segments, and some short segments are not present in every source. The main differences are:

A 6-residue helix is assigned by DSSP and UniProt, but not by the predictors.
ReProf does not find a helix of 4-6 residues that is present in the three other sources.
ReProf predicts a 5-residue strand instead of a 7-residue helix assigned by DSSP and UniProt.
PsiPred predicts only a loop instead of the latter helix.
PsiPred predicts a helix where a strand of 4-6 residues is assigned by the other sources.
Near the end of the protein (before the last "conserved" helix), PsiPred predicts a long helix and ReProf only a 3-residue helix, whereas only loops are assigned by DSSP and UniProt in that region.

PDB
From the visualization of the PDB structure of the protein Q08209, 1AUI, one can see that it contains many helices connected by loop regions and sometimes by short strands. There is also a beta-sheet region consisting of ten strands. This supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of the protein Q08209.

Conclusion

Using the secondary structure predictions of ReProf and PsiPred, the DSSP assignments and the UniProt annotations, we learned that the secondary structure of our protein mainly consists of helices and strands. It is a dimer, and each chain folds into a beta-barrel domain. We also learned about the secondary structures of the example proteins (discussed in the respective sections).

To sum up, ReProf and PsiPred are good secondary structure prediction tools. In most cases their predictions agree with the UniProt annotation of the secondary structure and with the DSSP assignment. The latter is typically used as a reference because it is derived from the experimentally resolved PDB structure of a protein, which could also be seen in the visualization of the structures.

Disorder

In this task we predict disordered and globular protein regions using IUPred and MetaDisorder.
Lab journal

IUPred

IUPred is a protein disorder predictor. The user can choose one of three options:

long for prediction of long disorders
short for prediction of short disorders
glob for prediction of structured, globular domains

IUPred prediction results for each protein are presented and described in the plots below. The disorder tendency ranges from 0 to 1 and is plotted for each residue of a protein sequence. Residues with a tendency above 0.5 are considered disordered.
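The 0.5 cutoff can be turned into a list of disordered regions with a few lines of Python. This is a minimal sketch: the function name `disordered_regions` and the example scores are made up for illustration and are not IUPred output.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based (start, end) regions whose score is above the threshold."""
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered stretch begins here
        elif start is not None:
            regions.append((start, pos - 1))  # the stretch ended one residue earlier
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # stretch runs to the last residue
    return regions


# Hypothetical per-residue disorder tendencies for an 8-residue sequence.
example_scores = [0.9, 0.8, 0.6, 0.2, 0.1, 0.5, 0.7, 0.7]
print(disordered_regions(example_scores))
```

A score of exactly 0.5 is treated as ordered here, following the wording "above 0.5".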

IUPred predictions of disordered protein regions with the different options: long, short and glob.
IUPred results for protein P04062. The "long" and "short" predictions are almost identical; "short" only predicts more disorder at the beginning and the end of the protein. According to "glob", almost the whole protein sequence, from position 4 to the end (536), lies in a globular domain. Only the first three residues are predicted to be disordered.
IUPred results for protein P10775. The "long" and "short" predictions are almost identical; "short" only predicts more disorder at the beginning and the end of the protein. According to "glob", the whole protein sequence (456 residues long) is predicted to be in a globular domain.
IUPred results for protein Q9X0E6. The "long" and "short" predictions are almost identical; "short" only predicts more disorder at the beginning and the end of the protein. According to "glob", the whole protein sequence (101 residues long) is predicted to be in a globular domain.
IUPred results for protein Q08209. There are only small deviations between the "long" and "short" predictions: "long" predicts more disorder at the end of the protein, whereas "short" predicts more disorder at the beginning and the very end of the protein. According to "glob", a major part of the protein sequence, from position 5 to 446 out of a total of 521 residues, lies in a globular domain. Therefore, two disordered regions are predicted: one consisting of only four residues at the beginning and one containing 75 amino acids at the end of the protein.
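The disordered regions quoted for the "glob" option are simply the parts of the sequence not covered by a globular domain. A minimal sketch of that bookkeeping, with a hypothetical helper `complement` and the domain boundaries taken from the captions above:

```python
def complement(regions, length):
    """Return the 1-based (start, end) stretches of 1..length not covered by regions."""
    gaps = []
    next_free = 1
    for start, end in sorted(regions):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return gaps


# Globular domains reported by "glob" (values from the captions above).
q08209_disordered = complement([(5, 446)], 521)
p04062_disordered = complement([(4, 536)], 536)
for start, end in q08209_disordered:
    print(start, end, end - start + 1)
```

For Q08209 this reproduces the two regions of four and 75 residues stated in the caption.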

MetaDisorder

MetaDisorder (MD) is a meta-predictor that combines several prediction methods:

NORSnet: prediction of unstructured loops
PROFbval: prediction of residue flexibility from sequence
Ucon: prediction of protein disorder using predicted internal contacts

In addition to the prediction scores of the three predictors, MD outputs the final decision on disorder as well as MDrel, the reliability of the final prediction, whose values range from 0 to 9 (9 being the strongest prediction). The raw prediction scores as well as the MD final score for each of the four proteins are visualized in the following plots.

MetaDisorder (MD) predictions of disordered protein regions. Raw scores of the used programs: NORSnet, PROFbval and Ucon, as well as the MD score are shown.
MetaDisorder results for protein P04062. The MD prediction (red line) looks very similar to the IUPred prediction: the very beginning of the protein is predicted to be disordered. The predictions of the individual programs look very different and fluctuate strongly. PROFbval (blue line) outputs higher scores (frequently over 0.5) than NORSnet (green line) and Ucon (purple line); however, Ucon has some high peaks over 0.5. Overall, the MD prediction seems more reliable than the predictions of the stand-alone methods.
MetaDisorder results for protein P10775. The MD results are very similar to the IUPred "short" results: the very beginning and end of the protein seem to be slightly disordered. However, the score rises only slightly above 0.5, which is too little for this prediction to be reliable. The PROFbval prediction is the most similar to MD because of the higher scores at the ends, although its scores are higher overall (often over 0.5). NORSnet and Ucon output lower scores, which are even lower at the ends; Ucon sometimes has high peaks. Again, MD seems to predict disordered regions better than the individual programs.
MetaDisorder results for protein Q9X0E6. Compared to the IUPred prediction, the MD, PROFbval and Ucon predictions look very different and fluctuate strongly, predicting several disordered regions that are not present in the IUPred results. Only NORSnet predicts the whole protein as lacking unstructured loops, like IUPred. The worse MD prediction may be caused by the short length of this protein.
MetaDisorder results for protein Q08209. MD predicts disorder at the beginning and the end of the protein, like IUPred. The NORSnet prediction is similar, but it predicts less disorder at the very beginning and more disorder after it (approx. positions 5-40). Interestingly, both MD and NORSnet predict slight peaks around positions 240 and 375. The two other methods, Ucon and PROFbval, give strongly fluctuating predictions with higher scores and many peaks. Here MD and NORSnet seem to make more reliable predictions.

DisProt

Human Glucosylceramidase (P04062)

DisProt Psi-BLAST search result for protein P04062.

For our protein there is no entry in DisProt. DisProt offers an option to search for similar sequences, using either the Smith-Waterman alignment method or Psi-BLAST. Using the Psi-BLAST search, only one sequence producing a significant alignment was found: DP00159. As only a short region was aligned (47 aligned columns) and the sequence identity of the alignment is only 27%, we cannot transfer the DisProt annotation of DP00159 to the protein P04062.

Ribonuclease inhibitor (P10775)

No DisProt entry was found for protein P10775 either. The Psi-BLAST search yielded seven matches; the one with the best score and E-value is the sequence DP00554. As the alignment is fairly good (40% identity and 196 aligned columns), we checked the annotation of the similar protein DP00554. According to the annotation, there is only one disordered region, 20 residues long, near the beginning of the sequence (residues 31 - 50). However, this region does not fall into the alignment (the aligned positions are 144 - 339 in the query and 787 - 982 in the target). Therefore, this DisProt annotation cannot be used for our protein of interest, P10775, either.
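The check of whether an annotated region falls into the alignment is a plain interval comparison. A minimal sketch with the positions quoted above; the helper names `overlap` and `span` are hypothetical:

```python
def overlap(a, b):
    """Return the shared (start, end) of two inclusive intervals, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def span(interval):
    """Number of positions covered by an inclusive interval."""
    return interval[1] - interval[0] + 1


disordered_in_target = (31, 50)   # DisProt annotation of DP00554
aligned_in_target = (787, 982)    # aligned positions of DP00554
aligned_in_query = (144, 339)     # aligned positions of P10775

print(overlap(disordered_in_target, aligned_in_target))
print(span(aligned_in_query), span(aligned_in_target))
```

Since the two intervals do not overlap, no part of the disordered region could be mapped onto P10775, whatever the alignment looks like in detail.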

Divalent-cation tolerance protein CutA (Q9X0E6)

The protein Q9X0E6 is not found in DisProt either. The Psi-BLAST search finds only insignificant and short matches, which cannot be considered for DisProt annotation transfer.

Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209)

It is the only one of the four proteins that could be found in DisProt with the ID. According to DisProt, there are five disordered regions in the protein, two of which overlap (see the figure on the right and <xr id="Q08209_disprot"/>). These regions are at the ends of the sequence: one at the N-terminus (positions 1 - 13) and four at the C-terminus (together covering positions 374 - 521). The sixth region is ordered, lies in the core of the sequence and is also the longest region (positions 14 - 373). (All regions map to the PDB structure 1AUI:A.) The predictions we made with IUPred and MD for the protein Q08209 yielded similar results for the disordered regions.

DisProt visualization of disordered and ordered regions for protein Q08209.

Region	Type	Name	Location	Length	Structural/functional type	Functional classes	Functional subclasses
1	Disordered - Extended		1 - 13	13	Relationship to function unknown	Unknown	Unknown
2	Disordered - Extended		374 - 468	95	Function arises via a disorder to order transition	Molecular assembly	Autoregulatory, Protein-protein binding
3	Disordered - Extended	CaM-binding domain	390 - 414	25	Function arises via a disorder to order transition	Molecular assembly	Protein-protein binding, Autoregulatory
4	Disordered - Extended	Autoinhibitory region	469 - 486	18	Function arises via an order to disorder transition and vice versa	Molecular assembly	Protein-protein binding, Autoregulatory
5	Disordered - Extended		487 - 521	35	Relationship to function unknown	Unknown	Unknown
6	Ordered		14 - 373	360

DisProt disordered and ordered regions for protein Q08209.

</figtable>
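One way to put a number on the similarity between the DisProt annotation and a prediction is a per-residue comparison. The sketch below is only an illustration under stated assumptions: it uses the disordered regions from the table above and, as a stand-in for the prediction, the two regions left uncovered by the IUPred "glob" domain (1 - 4 and 447 - 521). The helper name `residues` is hypothetical.

```python
def residues(regions):
    """Expand inclusive (start, end) regions into a set of residue positions."""
    return {pos for start, end in regions for pos in range(start, end + 1)}


length = 521
# Disordered regions 1-5 from the DisProt table (regions 2 and 3 overlap).
disprot = residues([(1, 13), (374, 468), (390, 414), (469, 486), (487, 521)])
# Assumption: regions outside the IUPred "glob" domain (5 - 446) count as disordered.
iupred = residues([(1, 4), (447, 521)])

both = len(disprot & iupred)          # disordered in both sources
only_disprot = len(disprot - iupred)  # disordered in DisProt only
only_iupred = len(iupred - disprot)   # disordered in the prediction only
agreement = (length - only_disprot - only_iupred) / length
print(both, only_disprot, only_iupred, round(agreement, 3))
```

Under these assumptions every predicted disordered residue is also annotated as disordered in DisProt, while DisProt marks a larger part of the C-terminus as disordered.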

Transmembrane Helices

Lab journal

Four proteins, including the protein causing Gaucher's disease, were analysed with respect to transmembrane (TM) helices. The prediction tools used differ in the features they analyse. While Polyphobius only distinguishes between residues that are part of a transmembrane helix (TMH) and residues that lie inside or outside of the cytoplasm, Memsat-SVM also predicts re-entrant helices and pore-lining helices. Since pore-lining helices are also transmembrane helices, this kind of helix is detected by both prediction tools. For re-entrant helices, the two programs differ. In general, a membrane helix crosses the membrane, so that the two ends of the helix lie on different sides of the membrane. In contrast, both ends of a re-entrant helix lie on the same side of the membrane. Memsat-SVM can predict re-entrant helices, but Polyphobius either treats such a helix as an ordinary helix that crosses the membrane (seen for Q9YDF8) or ignores it (seen for P47863). As a consequence, when a re-entrant helix is involved, the C-terminus or the N-terminus may be predicted on different membrane sides, and some helices may be predicted to run in a different direction within the membrane.
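The effect of a re-entrant helix on the predicted topology can be sketched as a simple side-switching rule: a transmembrane helix moves the chain to the other side of the membrane, a re-entrant helix does not. The function name `c_terminal_side` and the labels "TM" and "re-entrant" are hypothetical and chosen only for illustration; neither prediction tool exposes such a function.

```python
SIDES = ("cytoplasmic", "extracellular")


def c_terminal_side(n_terminal_side, helix_kinds):
    """Follow the chain from the N-terminus and return the side of the C-terminus."""
    if n_terminal_side not in SIDES:
        raise ValueError(f"unknown side: {n_terminal_side}")
    side = n_terminal_side
    for kind in helix_kinds:
        if kind == "TM":
            # A transmembrane helix crosses the membrane: switch sides.
            side = SIDES[1 - SIDES.index(side)]
        elif kind != "re-entrant":
            raise ValueError(f"unknown helix kind: {kind}")
        # A re-entrant helix returns to the side it started from: no switch.
    return side


# Same seven helices, once with the first one read as re-entrant, once as TM.
with_reentrant = c_terminal_side("cytoplasmic", ["re-entrant"] + ["TM"] * 6)
all_tm = c_terminal_side("extracellular", ["TM"] * 7)
print(with_reentrant, all_tm)
```

The example shows how two predictions can agree on the side of the C-terminus while placing the N-terminus on opposite sides, depending on whether one helix is read as re-entrant or as transmembrane.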

The OPM database does not directly state the localization of the N- and C-terminus or the type of the helices. Instead of distinguishing between transmembrane and re-entrant helices, OPM classifies all identified membrane helices (MH) as transmembrane. The localizations and the helix type (transmembrane or re-entrant) can be read from the visualisation of the protein, which is provided by OPM and also shown in the tables below.

The second database, PDBTM, only contains transmembrane proteins. For non-transmembrane proteins, no information is available.

Human Glucosylceramidase

This protein is not a membrane protein and is located on the extracellular side of the membrane, as documented in OPM. For the same reason there is no entry in PDBTM, as this database only contains membrane proteins. The Polyphobius prediction leads to the same result. In addition, Polyphobius predicts the signal peptide (including the N/H/C-regions). Memsat-SVM detected a false-positive transmembrane helix. As glucosylceramidase cleaves lipids of cell membranes, the active site of the enzyme may be mistaken for a transmembrane helix.

Comparison of membrane helices (MH) for Glucosylceramidase (P04062, human)
	Prediction		Assignment
	Memsat-SVM	Polyphobius	OPM	PDBTM
# of MH	1	-	-	-
MH Topology	456-471	-	-	-
N-terminal	extracellular	extracellular	extracellular	-
C-terminal	cytoplasmic	extracellular	extracellular	-
Signal peptide	1-34	1-40	-	-
Re-entrant Helix	-	-	-	-
Pore-lining Helix	1	-	-	-
Graphical position				-
more information	P04062		1OGS	1OGS is not in the PDBTM

Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the human glucocerebrosidase. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>

Aeropyrum pernix Voltage-gated potassium channel

For the protein of the archaeon Aeropyrum pernix, 4 different PDB IDs were found:

1ORQ: chain C
1ORS: chain C
2A0L: chain A/B
2KYH: chain A

As the PDB IDs represent structures of different, non-identical chains, it was difficult to choose one of them. In the end 1ORS was chosen, for two reasons. First, its X-ray structure has the highest resolution of all. Second, this structure represents a sensor domain and must therefore be important for the protein. The predictions give completely different results than the assignments. As the predictions are more similar to each other, they were compared with each other; the same was done for the two assignments. Both predictions have the same number of helices. Nevertheless, some helices deviate considerably in their position. Memsat-SVM predicted a re-entrant helix where Polyphobius detected a transmembrane helix. That is why the two programs predict the N-terminus on different sides.

The OPM assignment has one helix more, but only because it declares its helices differently than PDBTM. The third helix in PDBTM consists of two shorter consecutive helices. Together they form one larger helix that crosses the membrane once, and they are therefore seen as one helix in PDBTM. These two mini-helices, each of which would be too short to cross the membrane alone, are counted separately in OPM. Apart from a slight deviation of a few residues at the ends of the helices, the structure is the same in both databases.
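The relation between the two assignments can be checked with a short script: joining OPM helices that are separated by only a few residues should give the PDBTM helices, up to small boundary shifts. This is a sketch under assumptions: the helper `merge_close` and the gap limit of 2 residues are chosen for illustration, and the coordinates are the OPM and PDBTM values from the table below.

```python
def merge_close(helices, max_gap):
    """Join consecutive helices separated by at most max_gap residues."""
    merged = []
    for start, end in sorted(helices):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            # Close enough to the previous helix: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


opm = [(25, 46), (55, 78), (86, 97), (100, 107), (117, 148)]
pdbtm = [(27, 50), (55, 75), (88, 107), (118, 142)]

merged = merge_close(opm, max_gap=2)
# Boundary differences (OPM minus PDBTM) for start and end of each helix.
shifts = [(m[0] - p[0], m[1] - p[1]) for m, p in zip(merged, pdbtm)]
print(merged)
print(shifts)
```

After merging, both databases list four helices, and the remaining differences are boundary shifts of at most a few residues.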

Comparison of membrane helices (MH) for Voltage-gated potassium channel (Q9YDF8, Aeropyrum pernix)
	Prediction		Assignment
	Memsat SVM	Polyphobius	OPM	PDBTM
# of MHs	6	7	5	4
MH Topology	1. 43-59 2. 72-90 3. 101-118 4. 128-143 5. 163-184 6. 188-217 7. 221-245	1. 42-60 2. 68-88 3. 108-129 4. 137-157 5. 163-184 6. 196-213 7. 224-244	1. 25-46 2. 55-78 3. 86-97 4. 100-107 5. 117-148 6. - 7. -	1. 27-50 2. 55-75 3. 88-107 4. - 5. 118-142 6. - 7. -
N-terminal	cytoplasmic	extracellular	cytoplasmic	cytoplasmic
C-terminal	cytoplasmic	cytoplasmic	cytoplasmic	cytoplasmic
Signal peptide	-	-	-	-
Re-entrant Helix	1	-	-	-
Pore-lining Helix	1	-	-	-
Graphical position
more information	Q9YDF8		1ORS	1ORS

Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the Aeropyrum pernix voltage-gated potassium channel. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>
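The difference in helix counts between OPM and PDBTM can be reproduced with a small sketch that merges consecutive helix segments separated by only a few residues. The coordinates are taken from the table above; the function name `merge_close_segments` and the gap threshold of 3 residues are assumptions chosen for illustration and are not part of either database.

```python
def merge_close_segments(segments, max_gap=3):
    """Merge consecutive (start, end) segments whose gap is at most max_gap residues."""
    merged = []
    for start, end in sorted(segments):
        # Gap = number of residues strictly between the previous end and this start.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Helix ranges for Q9YDF8 (1ORS) as listed in the table.
opm_helices = [(25, 46), (55, 78), (86, 97), (100, 107), (117, 148)]
pdbtm_helices = [(27, 50), (55, 75), (88, 107), (118, 142)]

merged_opm = merge_close_segments(opm_helices)
print(merged_opm)
print(len(merged_opm) == len(pdbtm_helices))
```

With this (assumed) threshold only the two OPM mini-helices 86-97 and 100-107 are joined, so both databases end up with four membrane-spanning helices.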

Rat Aquaporin 4

Both predictions are similar to the assignments in OPM and PDBTM. The positions of all predicted transmembrane helices differ by only a few residues. The protein consists of 6 transmembrane helices and 2 re-entrant helices. Polyphobius does not predict the re-entrant helices but predicts the remaining membrane helices well. MemsatSVM predicts re-entrant helices at positions similar to those in the database entries. However, MemsatSVM gets the orientation in the membrane wrong: instead of placing the N- and C-termini in the cytoplasm, it places both ends in the extracellular region.

The two assignments are very similar. OPM does not explicitly label two of its helices as re-entrant, but both can be recognised as re-entrant in the OPM visualisation. In PDBTM the re-entrant helices are colored gold and stand out only slightly against the yellow transmembrane helices. All pictures can be found in the table below.

Comparison of membrane helices (MH) for Aquaporin 4 (P47863, rat)
	Prediction		Assignment
	Memsat SVM	Polyphobius	OPM	PDBTM
# of MH	8	6	8 (per chain)	8 (per chain)
MH Topology	1. 35-56 2. 71-89 3. 93-109 4. 113-136 5. 157-178 6. 190-205 7. 209-225 8. 232-252	1. 34-58 2. 70-91 3. - 4. 115-136 5. 156-177 6. 188-208 7. - 8. 231-252	1. 34-56 2. 70-88 3. 98-107 4. 112-136 5. 156-178 6. 189-203 7. 214-223 8. 231-252	1. 39-55 2. 72-89 3. 95-106 4. 116-133 5. 158-177 6. 188-205 7. 209-222 8. 231-248
N-terminal	extracellular	cytoplasmic	cytoplasmic	cytoplasmic
C-terminal	extracellular	cytoplasmic	cytoplasmic	cytoplasmic
Signal peptide	1-20	-	-	-
Re-entrant Helix	2	-	2	2
Pore-lining Helix	4	-	-	-
Graphical position
more information	P47863		2D57	2D57

Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for rat Aquaporin 4 (P47863). Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>

Human D3 dopamine receptor

The dopamine receptor is a transmembrane protein. The transmembrane helices from both predictors and both databases largely agree, with only a slight shift of a few residues. While Polyphobius predicts all 7 transmembrane helices that are documented in OPM and PDBTM, MemsatSVM identifies only 6. Because the missing helix is the last one, MemsatSVM places the C-terminus of the protein in the extracellular region instead of the cytoplasm. The program also classifies the 3rd helix as a pore-lining helix.
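The effect of a missing helix on the predicted C-terminus follows from a simple parity argument: every transmembrane helix crosses the membrane once, so an odd number of helices puts the two termini on opposite sides and an even number puts them on the same side. The sketch below illustrates this; the function name `c_terminal_side` is hypothetical, and re-entrant helices are assumed not to change the side.

```python
def c_terminal_side(n_terminal_side, n_transmembrane_helices):
    """Side of the C-terminus, assuming each transmembrane helix crosses
    the membrane exactly once (re-entrant helices are not counted)."""
    opposite = {"extracellular": "cytoplasmic", "cytoplasmic": "extracellular"}
    if n_transmembrane_helices % 2 == 0:
        return n_terminal_side
    return opposite[n_terminal_side]


# D3 dopamine receptor: both predictors start with an extracellular N-terminus.
print(c_terminal_side("extracellular", 7))  # seven helices
print(c_terminal_side("extracellular", 6))  # six helices, last one missing
```

With seven helices the C-terminus ends up in the cytoplasm, with six it stays extracellular, which matches the difference between the two predictions.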

Comparison of membrane helices (MH) for D3 dopamine receptor (P35462, human)
	Prediction		Assignment
	Memsat SVM	Polyphobius	OPM	PDBTM
# of MH	6	7	7	7
MH Topology	1. 32-55 2. 65-88 3. 101-129 4. 151-169 5. 188-209 6. 331-354 7. -	1. 30-55 2. 66-88 3. 105-126 4. 150-170 5. 188-212 6. 329-352 7. 367-386	1. 34-52 2. 67-91 3. 101-126 4. 150-170 5. 187-209 6. 330-351 7. 363-386	1. 35-52 2. 68-84 3. 109-123 4. 152-166 5. 191-206 6. 334-347 7. 368-382
N-terminal	extracellular	extracellular	extracellular	extracellular
C-terminal	extracellular	cytoplasmic	cytoplasmic	cytoplasmic
Signal peptide	1-29	-	-	-
Re-entrant Helix	-	-	-	-
Pore-lining Helix	1	-	-	-
Graphical position
more information	P35462		3PBL	3PBL

Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the human D3 dopamine receptor. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>
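The agreement between predicted and assigned helices can also be checked programmatically. The following sketch uses the D3 dopamine receptor helix ranges from the table above and reports which OPM helices have no predicted counterpart. The helper names and the minimum overlap of 5 residues are assumptions made for this illustration.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def unmatched_helices(reference, predicted, min_overlap=5):
    """Reference helices that no predicted helix overlaps by at least min_overlap residues."""
    return [ref for ref in reference
            if all(overlap(ref, pred) < min_overlap for pred in predicted)]


# Helix ranges for P35462 as listed in the table.
opm = [(34, 52), (67, 91), (101, 126), (150, 170), (187, 209), (330, 351), (363, 386)]
memsat_svm = [(32, 55), (65, 88), (101, 129), (151, 169), (188, 209), (331, 354)]
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170), (188, 212), (329, 352), (367, 386)]

missed_by_memsat = unmatched_helices(opm, memsat_svm)
missed_by_polyphobius = unmatched_helices(opm, polyphobius)
print(missed_by_memsat, missed_by_polyphobius)
```

Only the seventh OPM helix (363-386) remains unmatched for MemsatSVM, while every OPM helix has a Polyphobius counterpart.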

Signal Peptides

Lab journal

For the following proteins, the signal peptides and their cleavage sites were predicted with SignalP:

Glucosylceramidase (P04062, human)
Serum albumin (P02768, human)
Lysosome-associated membrane glycoprotein 1 (P11279, human)
Aquaporin 4 (P47863, rat)

The four eukaryotic proteins were also looked up in the Signal Peptide Database to compare the entry with the results of the prediction.

Glucosylceramidase (P04062)

For glucosylceramidase, the SignalP prediction differs from the database entry.

In the database the protein has a signal peptide of 39 residues. A signal peptide is characterized by high hydrophobicity in its core region, followed by the cleavage site [9]. In particular, the higher hydrophobicity of residues 18-23 and 27-34 points to a signal peptide (green area in <xr id="sp_gluco"/>, 2).

MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASG

However, SignalP predicts no signal peptide. In the visualisation of the different scores in <xr id="sp_gluco"/>, the green signal peptide score (S-score) indicates where a signal peptide is most likely. The green line is higher for the first 39 residues than for the later residues, but the calculated D-score of the detected peptide is 0.37, below the threshold of 0.5, so the peptide is rejected as a signal peptide. These residues are not only annotated as a signal peptide in the database but were also detected, with a slight deviation, by the transmembrane helix predictors MemsatSVM (residues 1-34) and Polyphobius (residues 1-40).

SignalP result for P04062: The green line represents the signal peptide score (S-score). The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-score.
Hydrophobicity of all residues of the signal peptide of P04062 in the Signal Peptide Database.

SignalP output and information of Signal Peptide Database of Glucosylceramidase. </figure>
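The hydrophobic core of the annotated signal peptide can be made visible with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale and a window of 7 residues; both are assumptions for illustration, since the scale used by the Signal Peptide Database is not stated here.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every window of the given length along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


# Annotated signal peptide of P04062 (39 residues).
signal_peptide = "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASG"
profile = hydropathy_profile(signal_peptide)

# 1-based start position of the most hydrophobic window.
best_start = max(range(len(profile)), key=profile.__getitem__) + 1
print(best_start, round(max(profile), 2))
```

The position of the maximum can be compared with the hydrophobic stretches 18-23 and 27-34 mentioned above.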

Serum albumin (P02768)

The signal peptide consists of residues 1-18; it is both predicted by SignalP and documented in the Signal Peptide Database:

MKWVTFISLLFLFSSAYS

<xr id="sp_ser"/> shows a clear prediction of the signal peptide: a high S-score in the signal peptide region and a D-score of 0.85, far above the threshold. The cleavage site is predicted between residues 18 and 19. The database shows high hydrophobicity for residues 6-14, which also marks the region as a signal peptide.

SignalP result for P02768: The green line represents the signal peptide score (S-score). The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-score.
Hydrophobicity of all residues of the signal peptide of P02768 in the Signal Peptide Database.

SignalP output and information of Signal Peptide Database of Serum albumin. </figure>

Lysosome-associated membrane glycoprotein 1 (P11279)

For Lysosome-associated membrane glycoprotein 1 the scores are even higher than for serum albumin. The signal peptide consists of 28 residues:

MAAPGSARRPLLLLLLLLLLGLMHCASA

The database shows a large hydrophobic region of 17 residues. Near the end of the protein, the database entry documents a transmembrane helix of 23 residues that ends in the cytoplasm. SignalP gives a D-score of 0.95 for the detected signal peptide. The cleavage site is predicted between residues 28 and 29 (ASA-AM), see <xr id="sp_aq"/>.

SignalP result for P11279: The green line represents the signal peptide score (S-score). The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-score.
Hydrophobicity of all residues of the signal peptide of P11279 in the Signal Peptide Database.

SignalP output and information of Signal Peptide Database of Lysosome-associated membrane glycoprotein 1 (P11279). </figure>
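The cleavage site notation used for P11279 (ASA-AM) can be derived directly from the sequence and the length of the signal peptide. In this sketch the N-terminal fragment is assembled from the 28-residue signal peptide plus the two following residues named in the text; the function name `cleavage_context` and the flank sizes are assumptions.

```python
def cleavage_context(sequence, signal_peptide_length, left=3, right=2):
    """Residues flanking the cleavage site, written as 'left-right'."""
    cut = signal_peptide_length
    return sequence[max(0, cut - left):cut] + "-" + sequence[cut:cut + right]


signal_peptide = "MAAPGSARRPLLLLLLLLLLGLMHCASA"
# Only the first two residues after the cleavage site are known from the text.
n_terminal_fragment = signal_peptide + "AM"

context = cleavage_context(n_terminal_fragment, len(signal_peptide))
mature_start = len(signal_peptide) + 1  # first residue of the mature protein (1-based)
print(context, mature_start)
```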

Aquaporin 4 (P47863)

The rat protein has no entry in the Signal Peptide Database, as no signal peptide is known for it. The visualised prediction results show at first glance that the protein does not have a signal peptide: all scores are lower than 0.21, which is far below the threshold for signal peptides.

SignalP result for P47863: The green line represents the signal peptide score (S-score). The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-score. The threshold is marked by a red dotted line.

SignalP output of Aquaporin 4 (P47863) </figure>
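The decisions reported in this section all follow from comparing the D-score with the threshold. A minimal sketch of that rule, using the D-scores and the threshold of 0.5 quoted in the text, is given below. The function name `has_signal_peptide` is hypothetical. P47863 is left out because only an upper bound of 0.21 for all its scores is reported, not its D-score.

```python
D_SCORE_THRESHOLD = 0.5  # threshold as quoted in the text

# D-scores reported in this section, keyed by UniProt accession.
d_scores = {"P04062": 0.37, "P02768": 0.85, "P11279": 0.95}


def has_signal_peptide(d_score, threshold=D_SCORE_THRESHOLD):
    """A signal peptide is called only if the D-score exceeds the threshold."""
    return d_score > threshold


calls = {accession: has_signal_peptide(score) for accession, score in d_scores.items()}
for accession, called in calls.items():
    print(accession, d_scores[accession], "signal peptide" if called else "no signal peptide")
```

For P04062 the rule rejects the peptide although the database annotates one, which is the discrepancy discussed above.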

GO Terms

Lab journal

Overall, the GO term predictions are not very good; their quality depends strongly on how much is already known about the protein.

GOPET

GOPET: GO Terms for Glucocerebrosidase
GO ID	Aspect	Confidence in %	GO Term
GO:0016787	F	98	hydrolase activity
GO:0004348	F	97	glucosylceramidase activity
GO:0016798	F	97	hydrolase activity acting on glycosyl bonds

GOPET mainly focuses on the activity of glucocerebrosidase. It predicts only a few GO terms, but they agree with the disease description in Task 1 as well as with the description in Pfam.

Predict Protein

Predict Protein: GO Terms for Glucocerebrosidase
	GO ID	GO Term	Reliability in %
Molecular Function	GO:0005515	protein binding	70
Cellular Component	-	-	-
Biological Process	GO:0007040 GO:0006665 GO:0005975 GO:0008219 GO:0005976	lysosome organization sphingolipid metabolic process carbohydrate metabolic process cell death polysaccharide metabolic process	49 36 35 19 8

The first three predicted Biological Process terms are confirmed by our knowledge about the protein. The protein has no direct influence on cell death (GO:0008219), but it is indirectly involved because it processes the cell membranes of dead blood cells. An involvement in the polysaccharide metabolic process is not known, as glucosylceramidase acts on glycolipids, not on polysaccharides. This GO term is presumably wrong.

ProtFun2.0

Functional category                 
                                      Prob     Odds
 Amino_acid_biosynthesis              0.035    1.593
 Biosynthesis_of_cofactors            0.182    2.528
 Cell_envelope                     => 0.504    8.262
 Cellular_processes                   0.032    0.438
 Central_intermediary_metabolism      0.382    6.063
 Energy_metabolism                    0.067    0.740
 Fatty_acid_metabolism                0.027    2.088
 Purines_and_pyrimidines              0.538    2.213
 Regulatory_functions                 0.031    0.191
 Replication_and_transcription        0.126    0.471
 Translation                          0.082    1.863
 Transport_and_binding                0.560    1.365

Enzyme/nonenzyme                     
                                      Prob     Odds
 Enzyme                            => 0.773    2.698
 Nonenzyme                            0.227    0.318

Enzyme class
                                      Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.083    0.399
 Transferase    (EC 2.-.-.-)          0.228    0.660
 Hydrolase      (EC 3.-.-.-)          0.272    0.859
 Lyase          (EC 4.-.-.-)          0.045    0.961
 Isomerase      (EC 5.-.-.-)          0.011    0.345
 Ligase         (EC 6.-.-.-)          0.017    0.332

Gene Ontology category
                                      Prob     Odds
 Signal_transducer                    0.054    0.251
 Receptor                             0.027    0.158
 Hormone                              0.001    0.206
 Structural_protein                   0.002    0.087
 Transporter                          0.024    0.222
 Ion_channel                          0.018    0.307
 Voltage-gated_ion_channel            0.004    0.195
 Cation_channel                       0.012    0.268
 Transcription                        0.070    0.550
 Transcription_regulation             0.030    0.237
 Stress_response                      0.085    0.962
 Immune_response                   => 0.153    1.804
 Growth_factor                        0.005    0.376
 Metal_ion_transport                  0.009    0.020

The ProtFun server classifies P04062 as an enzyme, with hydrolase receiving the highest probability among the enzyme classes. The predicted Gene Ontology category (immune response) points to where the protein acts, namely in macrophages (immune cells). The functional category "cell envelope" indicates that the protein interacts with the cell membrane. This is correct, as the protein not only interacts with membranes but also processes their lipids.
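In the ProtFun output above, the `=>` marker in the functional category block sits on the row with the highest odds, not the one with the highest probability. The sketch below parses that block and compares the two rankings. How ProtFun itself selects the marked row is not stated here, so this is only an observation about this particular output; the function name `parse_block` is hypothetical.

```python
def parse_block(lines):
    """Parse 'name [=>] prob odds' lines into (name, prob, odds, marked) tuples."""
    rows = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3:
            continue  # skip empty lines
        marked = "=>" in tokens
        name = " ".join(t for t in tokens[:-2] if t != "=>")
        rows.append((name, float(tokens[-2]), float(tokens[-1]), marked))
    return rows


# Data rows of the "Functional category" block, copied from the output above.
functional_category = """
 Amino_acid_biosynthesis              0.035    1.593
 Biosynthesis_of_cofactors            0.182    2.528
 Cell_envelope                     => 0.504    8.262
 Cellular_processes                   0.032    0.438
 Central_intermediary_metabolism      0.382    6.063
 Energy_metabolism                    0.067    0.740
 Fatty_acid_metabolism                0.027    2.088
 Purines_and_pyrimidines              0.538    2.213
 Regulatory_functions                 0.031    0.191
 Replication_and_transcription        0.126    0.471
 Translation                          0.082    1.863
 Transport_and_binding                0.560    1.365
""".splitlines()

rows = parse_block(functional_category)
best_by_odds = max(rows, key=lambda row: row[2])
best_by_prob = max(rows, key=lambda row: row[1])
marked_rows = [row[0] for row in rows if row[3]]
print(best_by_odds[0], best_by_prob[0], marked_rows)
```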

Pfam

For the protein P04062, the O-Glycosyl hydrolase family 30 was found at positions 40-533.

Identifiers
Symbol	Glyco_hydro_30
Pfam family	PF02055
Pfam clan	CL0058
InterPro	IPR001139
SCOP	1nof
SUPERFAMILY	1nof
OPM family	125
OPM protein	1ogs
CAZY	GH30

O-Glycosyl hydrolase family 30

This family belongs to the glycoside hydrolases, which are classified under EC 3.2.1. Glycoside hydrolases comprise a large number of enzymes that hydrolyse glycosidic bonds between carbohydrates, or between a carbohydrate and a non-carbohydrate moiety.

The family belongs to the clan CL0058, the Tim barrel glycosyl hydrolase superfamily.

The Pfam entry for the family PF02055 also mentions the human glucosylceramidase as the cause of Gaucher disease.

Discussion

Other available methods

Prediction of	Tool
secondary structure	APSSP: Advanced Protein Secondary Structure Prediction Server
	CFSSP: Chou & Fasman Secondary Structure Prediction Server
	GOR IV
	Jpred3
disorder	DISOPRED2
transmembrane helices	MEMSAT3
	TMHMM
	PredictProtein
	DAS
	HMMTOP
	TMpred
signal peptides	PrediSi
	Polyphobius
	MemsatSVM
	SIGCLEAVE
	ANTHEPROT
	Signal Find Server
	SPD
	SPEPlip
	SOSUIsignal
GO terms	AmiGO
	PDB
	UniProt

What else can be predicted from protein sequence alone

Fold recognition (profile based pGenTHREADER and rapid GenTHREADER)
Fold domain recognition (pDomTHREADER)
Protein domain prediction (DomPred)
Homology modelling (BioSerf v2.0)
Function prediction (eukaryotic function: FFPred v2.0)
Prediction of TM topology and helix packing (SVM-based MEMPACK)

http://bioinf.cs.ucl.ac.uk/psipred/

Cleavage site prediction
Ab initio structure prediction (not very successful: it is a combinatorial problem, computationally intensive and worse for longer sequences. Moreover, biological molecules are not necessarily in their lowest-energy conformation.)
Solvent accessibility
Metal binding sites, active sites
Protein protein interactions
SNPs effect prediction

Which predictions can be improved considerably by structure-based approaches

Solvent accessibility
