Gaucher Disease: Task 03 - Sequence-based predictions

From Bioinformatikpedia
Revision as of 01:14, 6 September 2013 by Kalemanovm

Secondary Structure

In this task, the secondary structure of a protein is predicted with ReProf and compared to the PsiPred prediction and the DSSP structure assignment.

Lab journal

Evaluation results

The evaluation results of ReProf against PsiPred and DSSP are summarized in <xr id="secondary structure results"/>. ReProf was run starting from the PSSM of a Psi-BLAST search against big_80 with 3 iterations and an E-value cutoff of 10E-10, as described in the lab journal linked above.

<figtable id="secondary structure results">

Query | PsiPred E | PsiPred H | PsiPred L | PsiPred Total | DSSP E | DSSP H | DSSP L | DSSP Total
P10775 | 47.3 | 99.4 | 59.6 | 72.6 | 42.1 | 96.0 | 62.2 | 78.5
Q9X0E6 | 86.1 | 97.2 | 75.9 | 87.1 | 83.0 | 97.3 | 77.0 | 87.8
Q08209 | 87.5 | 74.3 | 86.0 | 82.2 | 70.5 | 75.9 | 88.9 | 78.8
P04062 | 88.4 | 96.8 | 76.2 | 83.6 | 80.0 | 84.1 | 86.3 | 83.3
Precision in % (= number of matches / number of residues) of the ReProf prediction with respect to the PsiPred prediction and to the DSSP assignment, per state (E: strand, H: helix, L: loop) and in total.

</figtable>
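A per-state comparison of this kind can be sketched in Python. The sketch assumes that both strings are aligned position by position (as in the aligned views below), that the score for a state X is the percentage of residues labelled X by ReProf that the reference also labels X, that the total is the percentage of matching positions, and that positions without a reference assignment (`-` in the DSSP lines) are skipped. The exact counting behind the table may differ, and the function name `per_state_precision` is hypothetical.

```python
def per_state_precision(pred, ref, states="EHL", skip="-"):
    """Agreement (in %) of a predicted secondary structure string with a
    reference string, per state and in total.

    Assumptions: both strings are aligned position by position; for a
    state X the score is the share of residues predicted as X that the
    reference also labels X; reference positions equal to `skip`
    (e.g. '-' for "no DSSP assignment") are ignored.
    """
    if len(pred) != len(ref):
        raise ValueError("strings must be aligned and of equal length")
    pairs = [(p, r) for p, r in zip(pred, ref) if r != skip]
    scores = {}
    for state in states:
        ref_at_state = [r for p, r in pairs if p == state]
        if ref_at_state:  # only states that occur in the prediction
            hits = sum(r == state for r in ref_at_state)
            scores[state] = 100.0 * hits / len(ref_at_state)
    if pairs:
        scores["Total"] = 100.0 * sum(p == r for p, r in pairs) / len(pairs)
    return scores


print(per_state_precision("LLHHHEEL", "LLHHHEEE"))
# {'E': 100.0, 'H': 100.0, 'L': 66.66666666666667, 'Total': 87.5}
```

Skipping unassigned reference positions keeps regions missing from the PDB structure (such as the N-terminal region of P04062) from counting as mismatches.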

Human Glucosylceramidase (P04062)

Aligned view

The aligned view below shows the ReProf and PsiPred secondary structure predictions, the DSSP assignment and the UniProt annotation for the Gaucher disease protein P04062 (H: helix, E: strand, L: loop; "-" marks positions without a DSSP assignment).

Sequence:	MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHT
  ReProf:	LLLLLLHHHHLLHHLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLEEELLLLLEEEEEELLLLLLLLLLLLLLLLEEEEEEELLLLLEEEEELLLLLLLLL
 PsiPred:	LLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLL
    DSSP:	----------------------------------------E---EEE-LLLLEEEEEELL---E--------LLEEEEEEEELLL--LEEEEEE-ELL--
 UniProt:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEELEEEELLLLLLLLLLLLLLLLLEEEEEEEELLLLLEEEEEEELEEELL

Sequence:	GTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLI
  ReProf:	LLLEEEEEELLLLEEEEEEELEEHHHHHHHHHHLLLHHHHHHHHHHHLLLLLLLEEEEEEELLLLLLLLLLLEEELLLLLLLLLLLLLLHHHHHHHHHHH
 PsiPred:	LLLEEEEEELLLLLLEEEEEELLLLHHHHHHHHLLLHHHHHHHHHHLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLLHHLLLLLHHHH
    DSSP:	--LEEEEEEEEEEEEE--EEEEE--HHHHHHHLLL-HHHHHHHHHHHHLLLLL---EEEEEEL--LLLLL---L--LLL-LL-LL----HHHHHHHHHHH
 UniProt:	LLEEEEEEEEEEEEEELLEEEEELLHHHHHHHHLLLHHHHHHHHHHHHLLLLLLLLEEEEEEELLEEEEELLLLLLEEELLLLLLLLLLHHHHLLHHHHH

Sequence:	HRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA
  ReProf:	HHHHHHLLLLEEEEEELLLLLLEEEELLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLLLLLLLLLELLLHHHHHHHHH
 PsiPred:	HHHHHHLLLLLEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHLLLLLLEEEELLLLLLLLLLLLLLLLLLLLHHHHHHHHH
    DSSP:	HHHHHH-LL--EEEEEEL---HHHELL-LLLLL-EELL-LLLHHHHHHHHHHHHHHHHHHHLL---LEEEL-LLLLHHHLLL--L---E--HHHHHHHHH
 UniProt:	HHHHHHLLLLLEEEEEEELLLHHHEEELEEEEELEEEELLLLHHHHHHHHHHHHHHHHHHHLLLLLEEEEELLLHHHHHLLLLEEELLLLLHHHHHHHHH

Sequence:	RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGM
  ReProf:	HHHHHHHHHLLLLLEEEEEEELLLLLLHHHHHHHLLLHHHHHHHHHLEEEELLLLLLLLHHHHHHHHHHLLLLLEEEEEEEELLLLLLLLLLLLLHHHHH
 PsiPred:	HHHHHHHHLLLLLLEEEEEELLLLLLLLLLLHHHLLLHHHHHHLLEEEEELLLLLLLHHHHHHHHHHHHLLLLEEEEEELLLLLLLLLLLLLLLLHHHHH
    DSSP:	HHHHHHHHLLLLLLLEEEEEEEEHHHLLHHHHHHHLLHHHHLL--EEEEEEELLL---HHHHHHHHHHH-LLLEEEEEEEE----LLL-L--LL-HHHHH
 UniProt:	HLHHHHHHLLLLLLEEEEEEEEEHHHLLHHHHHHHLLHHHHLLLLEEEEEEELHHHLLHHHHHHHHHHHLLLEEEEEEEEELLLLEEELLLLLLLHHHHH

Sequence:	QYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVL
  ReProf:	HHHHHHHHHHHHHLHHEEEEEEEELLLLLLLLLLLLEEEEEEEELLLLEEEEELLEHHHHEEEELLLLLLEEEEEELLLLLLEEEEEEELLLLLEEEEEE
 PsiPred:	HHHHHHHHHHHLLLEEEEEELLLLLLLLLLLLLLLLLLLLEEEELLLLEEEELLLEEEEEHHLLLLLLLLEEEEEELLLLLLLEEEEEELLLLLEEEEEE
    DSSP:	HHHHHHHHHHHLLEEEEEEEEL-E-LLL---LL------LEEEEHHHLEEEE-HHHHHHHHHHLL--LL-EEEEEEELL--LEEEEEEE-LLL-EEEEEE
 UniProt:	HHHHHHHHHHHLLEEEEEEEEEEELEEELLLLLLLLLLLEEEEEHHHLEEEELHHHHHHHHHHLLLLLLLEEEEEEEEELLEEEEEEEELLLLLEEEEEE

Sequence:	NRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
  ReProf:	LLLLLLEEEEEEELLLEEEEEELLLLEEEEEEEELL
 PsiPred:	ELLLLLEEEEEELLLLLEEEEELLLLEEEEEEEELL
    DSSP:	E-LLL-EEEEEEELLLEEEEEEE-LLEEEEEEE---
 UniProt:	ELEEELEEEEEEELLLEEEEEEELLLEEEEEEELLL

<figure id="uniprot_sec_str_P04062" >

Schematic representation of UniProt secondary structure for P04062

</figure>

<figure id="1OGS_sec_str_vmd" >

Secondary structure of the PDB structure 1OGS, visualized with VMD using "new cartoon" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta

</figure>

Comparison to available knowledge

Here, we compare the secondary structure predictions and DSSP assignment for the protein sequence P04062 to the available knowledge in UniProt and PDB.

UniProt
UniProt secondary structure annotation assigns residues to one of three states: helix, strand or turn. The annotation might be unreliable if there is no experimental evidence for the protein. However, the existence of our protein, P04062, was verified at the protein level, so we can rely on the annotation to some extent. The UniProt secondary structure annotation for P04062 is shown in the image above. It is also included in the alignment in the previous section, where both turns and positions in none of the three states (helix, strand or turn) are shown as loops. As the alignment shows, ReProf and PsiPred both predict one long helix near the beginning of the sequence, and ReProf additionally predicts two short helices (of 4 and 2 residues) before it. However, UniProt annotates only loops there, and DSSP has no assignment in this region. Altogether, the secondary structures look very similar, apart from small disagreements in the exact position and length of segments and short segments that are not present in every source. The latter may be falsely predicted or assigned.
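Turning UniProt secondary structure features into the three-state string used in the alignment can be sketched as follows. The input format, a list of `(kind, start, end)` tuples with 1-based, inclusive positions, is a hypothetical representation chosen for illustration and not the format returned by any UniProt service; as described above, turns and unannotated positions become loops.

```python
def uniprot_to_three_state(length, features):
    """Build a three-state string (H = helix, E = strand, L = loop).

    `features` is a hypothetical list of (kind, start, end) tuples with
    1-based, inclusive positions, where kind is "helix", "strand" or "turn".
    Turns and positions without any feature are written as loops.
    """
    code = {"helix": "H", "strand": "E", "turn": "L"}
    states = ["L"] * length
    for kind, start, end in features:
        for i in range(start - 1, end):  # convert to 0-based indices
            states[i] = code[kind]
    return "".join(states)


print(uniprot_to_three_state(10, [("strand", 2, 4), ("turn", 5, 6), ("helix", 7, 9)]))
# LEEELLHHHL
```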

PDB
The PDB structure of our protein P04062, 1OGS, consists of two identical chains, A and B. The cartoon representation colored by secondary structure shows that each chain contains many alternating helices and strands connected by loops. A beta-barrel fold can be recognized, as well as an additional ring of beta sheets on the side of each chain. This supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of P04062.


Ribonuclease inhibitor (P10775)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for Ribonuclease inhibitor (P10775).

Sequence:	MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL
  ReProf:	LEELLLLLLLLHHHHHHHHHHHHLLLEEEELLLLLLHHHHHHHHHHHHLLLLLEEEELLLLLLLHHHHHHHHHHHHLLLHHEEEEELLLLLLLHHHHHHH
 PsiPred:	LEEELLLLLLLHHHHHHHHHHHHHLLEEELLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLHHHHHHHHHHLLLLLLLLLEEEEELLLLLHHHHLHH
    DSSP:	-E--EEL----HHHHHHHHHHHLL-LEEEEEL----HHHHHHHHHHHLL-LL--EEE--L---HHHHHHHHHHHHLLLL----EEE-LLL---HHHHHLH
 UniProt:	LLLLEEELLLLHHHHHHHHHHHLLLEEEEEELLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLHHHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHLH

Sequence:	PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR
  ReProf:	HHHHHHLLLLLHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHLLLLLHHHHHHHHHHHHLLLLLLLLHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
 PsiPred:	HHHHLLLLLLLEEELLLLLLLLHHHHHHHHHHLLLLLLLLEEEELLLLLLHHLHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHHLLLLLLLLLEEE
    DSSP:	HHHHHH-LL--EEE--L---HHHHHHHHHHHHHLLL----EEE-LL---EHHHHHHHHHHHHH-L---EEE-LLLE-HHHHHHHHHHHHHL--L---EEE
 UniProt:	HHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHHLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHHLLLLLLLEEE

Sequence:	LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC
  ReProf:	HHLLLLLHHHHHHHHHHHHHLLLLLHHHHLLLLLLLHHHHHHHHHHHHHLHHHHHHLLLLLLLLHHHHHHHHHHHHHLLHHHHHHHHHHLLLHHHHHHHH
 PsiPred:	LLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHH
    DSSP:	-LLL---HHHHHHHHHHHHH-LL--EEE--LL--HHHHHHHHHHHHL-LL----EEE-LLL---HHHHHHHHHHHHH-LL--EEE-LLL--HHHHHHHHH
 UniProt:	LLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHHHHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLLLLLHHHHHHHHH

Sequence:	ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
  ReProf:	HHHHHHLLLEEEEEEELLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHHHHLLLLLEEEEEELLLLLLHHHHHHHHHHHHHLLLLEEEEL
 PsiPred:	HHLLLLLLLLLEEEELLLLLLHHHHHHHHHHHLLLLLLLEEELLLLLLLLHHHHHHHHLLLLLLLLLEEEELLLLLLLHHHHHHHHHHHHLLLLLLEEEL
    DSSP:	HHHLLLL----EEE-LLL--EHHHHHHHHHHHHH-LL--EEE--LLE-HHHHHHHHHHHLLLLL----EEE-LLL---HHHHHHHHHHHHH--L--EEE-
 UniProt:	HHHLLLLLLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEELLEEELHHHHHHHHHHHLLEEELLLLEEELLLLLLLHHHHHHHHHHHHHLLLLLEEEL

Sequence:	SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
  ReProf:	LLLLLLHHHHHHHHHHHHHLLLLLEEEEELLLLLLHHHHHHHHHHHHHLLLLLELL
 PsiPred:	LLLLLLLHHHHHHHHHLLLLLLLLLEEELLLLLLLHHHHHHHHHHHHLLLLLEELL
    DSSP:	LLLL--HHHHHHHHHHHLLLL----EEE-LL----HHHHHHHHHHHHH-LL-EEE-
 UniProt:	LEEELLHHHHHHHHHHHLEEELLLLEEELLLLLLLHHHHHHHHHHHHHLLLLEEE

<figure id="uniprot_sec_str_P10775" >

Schematic representation of UniProt secondary structure for P10775

</figure>

<figure id="2BNH_sec_str_vmd" >

Secondary structure of the PDB structure 2BNH, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence P10775 to the available knowledge in UniProt and PDB.

UniProt
The existence of P10775 was also verified at the protein level, so we can rely on its annotation to some extent as well. The UniProt secondary structure annotation for P10775 is shown in the image above and, as before, is included in the alignment in the previous section. The main differences come from ReProf in the first half of the sequence: it predicts a helix where the other sources show a 3-residue strand, and it predicts a longer helix where the others show a shorter helix and a short strand. The first case occurs once more and the second three more times. Overall, ReProf predicts more helices than the other sources. The PsiPred prediction, the DSSP assignment and the UniProt annotation look very similar overall: a series of alternating helices and strands separated by loops, sometimes with two short consecutive strands.

PDB
The visualization of the PDB structure of P10775, 2BNH, shows the typical horseshoe fold, with helices on the outer side and beta sheets on the inner side, connected by loops on both sides. This fold supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of P10775.


Divalent-cation tolerance protein CutA (Q9X0E6)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for Divalent-cation tolerance protein CutA (Q9X0E6).

Sequence:	MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
  ReProf:	LEEEEEELLLHHHHHHHHHHHHHHLLEEEEELLLEEEEEEELLLLEEEEEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELHLLLHHHHHHHHHHLL
 PsiPred:	LEEEEELLLLHHHHHHHHHHHHHLLLLLEEEEEEEEEEEEELLLEEEEEEEEEEEELLHHLHHHHHHHHHHHLLLLLLEEEEEELLLLLHHHHHHHHHHLL
    DSSP:	-EEEEEEELLHHHHHHHHHHHHHLLL-LEEEEEEEEEEEEELLEEEEEEEEEEEEEEEHHHHHHHHHHHHHH-LLLL--EEEEE-L---HHHHHHHHHH--
 UniProt:	EEEEEEEEEEHHHHHHHHHHHHHLLLLEEEEEEEEEEEEEELLEEEEEEEEEEEEEEEHHHHHHHHHHHHHHLEEEELLEEEELLLLLLHHHHHHHHHH


<figure id="uniprot_sec_str_Q9X0E6" >

Schematic representation of UniProt secondary structure for Q9X0E6

</figure>

<figure id="1VHF_sec_str_vmd" >

Secondary structure of the PDB structure 1VHF, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence Q9X0E6 to the available knowledge in UniProt and PDB.

UniProt
The existence of Q9X0E6 was also verified at the protein level, so we can rely on its UniProt annotation. The UniProt secondary structure annotation for Q9X0E6 is shown in the image above and, as before, is included in the alignment in the previous section. It is a short protein of only 101 amino acids, and all four sources agree almost entirely on its secondary structure. According to PsiPred and DSSP, the protein contains three helices (apart from a single loop residue that PsiPred predicts within the second helix) and four strands. ReProf splits the second strand with a 3-residue loop, whereas UniProt splits the last strand with 2 loop residues.

PDB
From the visualization of the PDB structure of Q9X0E6, 1VHF, in PyMOL, we can see the number and order of its helices, loops and strands: lELHLELEHLELh (here a lower-case letter stands for a single residue in that state and an upper-case letter for several consecutive residues in that state). This agrees with the DSSP assignment of Q9X0E6.
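The compact segment notation above can be generated from any per-residue secondary structure string. The following Python sketch is hypothetical: it assumes that DSSP's `-` counts as loop, so for a DSSP line its output may differ at the ends from a notation read off a structure viewer.

```python
from itertools import groupby


def segment_notation(ss, loop_chars="-"):
    """Compress a per-residue secondary structure string into one letter
    per segment: upper case for segments of several residues, lower case
    for single-residue segments. Characters in `loop_chars` are treated
    as loop ('L'), which is an assumption for DSSP's '-'.
    """
    normalized = "".join("L" if c in loop_chars else c for c in ss)
    letters = []
    for state, run in groupby(normalized):
        length = sum(1 for _ in run)
        letters.append(state.upper() if length > 1 else state.lower())
    return "".join(letters)


print(segment_notation("-EEELLHHHL-EE"))  # lELHLE
```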


Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209)

Aligned view

This is the aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209).

Sequence:	MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK
  ReProf:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEELLLLLLLLLLLLEHHHHHH
 PsiPred:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHLLLLEEELLLEEEELLLLLHHHHHHH
    DSSP:	----------------LLLLL-------E-HHHHE-LLL-E-HHHHHHHHHLL--E-HHHHHHHHHHHHHHHHLL-LEEEE-LLEEEE---LL-HHHHHH
 UniProt:	LLLLLLLLLLEEELLLLLLLLLLLLLLLLLHHHHLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLEEEEELEEEEEELLLLLLHHHHHH

Sequence:	LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG
  ReProf:	HHHHLLLLLLLEEEEELEELLLLLLLHHHHHHHHHHHHHLLLHEEEEELLLLLLLLLLLLLLHHHHHHHLHHHHHHHHHHHHHHHLLHHEELLEEEEEEL
 PsiPred:	HHHHLLLLLLLLLEELLLLLLLLLLHHHHHHHHHHHHHLLLLLEEEELLLLLLLLLLLLLLHHHHHHHHHLHHHHHHHHHHLLLLHHHHHHLLLEEEELL
    DSSP:	HHHHH--LLL--EEE-L--LLLLL-HHHHHHHHHHHHHHLLLLEEE---LLLLHHHHHHLLHHHHHHHHL-HHHHHHHHHHHLLL--EEEELLLEEEELL
 UniProt:	HHHHHLLLLLLLEEELLLLEEEELLHHHHHHHHHHHHHHLLLLEEELLLLLLLHHHHHHLLHHHHHHHHLLHHHHHHHHHHHHLLLLEEEELLLEEEEEE

Sequence:	GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP
  ReProf:	LLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLLLLLLLLLLLELLLLLLLEEEELHHHHHHHHHHLLLEEEEEELLELLLLEEEEEELLLLLLL
 PsiPred:	LLLLLLLLHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEELLHHHHHHHHHHLLLLHHHHHLLLLLLLLLLEELLLLLLLL
    DSSP:	---LL--LHHHHHHL--LLL--LLLHHHHHHH-EE-LLLLL-LL---EEE-LLLLLLEEE-HHHHHHHHHHLL-LEEEE--L--LLLEEE--E-LLLLLE
 UniProt:	LLLLLEEELHHHHLLLLEEELLEEEHHHHHHHLLLLLLLLLLLLLLEEEELLLLEEEEEELHHHHHHHHHHLLLLEEEELLLLLLLEEEELLLLLLLEEE

Sequence:	SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI
  ReProf:	EEEEEEELLLLLLLLLLEEEEEEELLLLLEEEEEELLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
 PsiPred:	LEEEEELLLLLLLLLLLLEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLHHLHHHHHHHHHHHH
    DSSP:	LEEEE---LLHHHLL---EEEEEEELLEEEEEEE---------HHH--HHHHHHHHHHHHHHHHHHHHHLL-----------------------------
 UniProt:	EEEEEELLLLHHHLLLLLEEEEEEELLEEEEEEELLLLLLLLLHHHLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHH

Sequence:	RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN
  ReProf:	HHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLHHHLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHLLLLLLLLLLLLLLLLLLLL
 PsiPred:	HHHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHLLLLLLLLLLHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLL
    DSSP:	--------------------------------------------------------------------HHHHHHHHHHHHL-------------------
 UniProt:	HHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH

Sequence:	KALTSETNGTDSNGSNSSNIQ
  ReProf:	LLLLLLLLLLLLLLLLLLLLL
 PsiPred:	LLLLLLLLLLLLLLLLLLLLL
    DSSP:	--------------------
 UniProt:

<figure id="uniprot_sec_str_Q08209" >

Schematic representation of UniProt secondary structure for Q08209

</figure>

<figure id="1AUI_sec_str_vmd" >

Secondary structure of the PDB structure 1AUI, visualized with VMD using "licorice" representation. Helices: blue, beta strands and sheets: green, loops and turns: magenta

</figure>

Comparison to available knowledge

The following is the comparison of the secondary structure predictions and DSSP assignment for the protein sequence Q08209 to the available knowledge in UniProt and PDB.

UniProt
The last protein we explore was also verified at the protein level, so we can trust its UniProt annotation to some extent. The UniProt secondary structure annotation for Q08209 is shown in the image above and is also included in the alignment in the previous section. It is a long protein of over 500 residues, and the sources disagree on parts of its secondary structure. According to all four sources, the protein contains many loops and helices and some strands. As for P04062, there are some disagreements in the exact position and length of segments, and some short segments are not present in every source. The main differences are:

  • A 6-residue helix is assigned by DSSP and UniProt but not predicted by either predictor.
  • ReProf does not find a 4-6-residue helix that is present in the three other sources.
  • ReProf predicts a 5-residue strand instead of a 7-residue helix assigned by DSSP and UniProt.
  • PsiPred predicts only a loop instead of this helix.
  • PsiPred predicts a helix where the other sources assign a 4-6-residue strand.
  • Near the end of the protein (before the last "conserved" helix), PsiPred predicts a long helix and ReProf only a 3-residue helix, whereas DSSP and UniProt assign only loops in that region.

PDB
The visualization of the PDB structure of Q08209, 1AUI, shows many helices connected by loop regions and sometimes by short strands; there is also a beta-sheet region consisting of ten strands. This supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of Q08209.

Conclusion

Using the ReProf and PsiPred secondary structure predictions, the DSSP assignments and the UniProt annotations, we learned that the secondary structure of our protein consists mainly of helices and strands. The protein is a dimer, and each chain folds into a beta-barrel domain. We also learned about the secondary structures of the example proteins, as discussed in the respective sections.

To sum up, ReProf and PsiPred are good secondary structure prediction tools. In most cases, their predictions agree with the DSSP assignment and with the UniProt annotation of the secondary structure. DSSP is typically used as the reference because it is derived from the experimentally resolved PDB structure of a protein; the agreement could also be seen in the visualizations of the structures.


Disorder

In this task, we predict disordered and globular protein regions using IUPred and MetaDisorder.
Lab journal

IUPred

IUPred is a protein disorder predictor. The user can choose one of three prediction types:

  • long: prediction of long disordered regions
  • short: prediction of short disordered regions
  • glob: prediction of structured, globular domains

The IUPred prediction results for each protein are shown and described in the plots in <xr id="iupred"/>. The disorder tendency ranges from 0 to 1 and is plotted for each residue of a protein sequence. Residues with a tendency above 0.5 are considered disordered.

<figure id="iupred" >

IUPred predictions of disordered protein regions with the different options: long, short and glob. </figure>
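Turning per-residue IUPred tendencies into disordered regions with the 0.5 cutoff can be sketched as below. The sketch assumes the scores have already been read into a Python list (parsing the IUPred output is not shown), and the `min_length` filter is an optional, hypothetical addition rather than an IUPred option.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) regions in which every
    residue has a disorder tendency above `threshold`.

    `min_length` is a hypothetical extra filter, not an IUPred option.
    """
    regions, start = [], None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a disordered stretch begins
        elif start is not None:
            regions.append((start, i - 1))  # the stretch ended before i
            start = None
    if start is not None:  # stretch runs to the C-terminus
        regions.append((start, len(scores)))
    return [(a, b) for a, b in regions if b - a + 1 >= min_length]


print(disordered_regions([0.2, 0.6, 0.7, 0.4, 0.9, 0.3]))  # [(2, 3), (5, 5)]
```

A score of exactly 0.5 is treated as ordered here, following the "above 0.5" wording.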

MetaDisorder

MetaDisorder (MD) is a meta-predictor that combines several prediction methods:

  • NORSnet: prediction of unstructured loops
  • PROFbval: prediction of residue flexibility from sequence
  • Ucon: prediction of protein disorder using predicted internal contacts

In addition to the prediction scores of the three predictors, MD outputs a final decision on disorder and MDrel, the reliability of this final prediction, which ranges from 0 to 9 (9 being the strongest prediction). The raw prediction scores and the final MD score for each of the four proteins are visualized in <xr id="md"/>.

<figure id="md" >

MetaDisorder (MD) predictions of disordered protein regions. Raw scores of the used programs: NORSnet, PROFbval and Ucon, as well as the MD score are shown. </figure>
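Once the MD output has been parsed (not shown; the two per-residue lists below are an assumed input format, not MD's file layout), residues can be filtered by MDrel so that only confident disorder calls remain. The cutoff `min_rel=5` is an arbitrary illustrative choice, not a MetaDisorder default.

```python
def reliable_disorder(md_calls, md_rel, min_rel=5):
    """Return 1-based positions that MD calls disordered with an MDrel
    reliability of at least `min_rel` (MDrel ranges from 0 to 9).

    md_calls: one bool per residue (True = disordered), assumed pre-parsed.
    md_rel:   one int per residue, assumed pre-parsed.
    """
    if len(md_calls) != len(md_rel):
        raise ValueError("one call and one reliability value per residue expected")
    return [
        pos
        for pos, (disordered, rel) in enumerate(zip(md_calls, md_rel), start=1)
        if disordered and rel >= min_rel
    ]


print(reliable_disorder([True, True, False, True], [9, 3, 8, 5]))  # [1, 4]
```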

DisProt

Human Glucosylceramidase (P04062)

<figure id="P04062_disprot" >

DisProt Psi-BLAST search result for protein P04062.

</figure>

There is no DisProt entry for our protein. DisProt offers a search for similar sequences using either the Smith-Waterman alignment method or Psi-BLAST. The Psi-BLAST search found only one sequence producing a significant alignment: DP00159. As only a short region was aligned (47 aligned columns) and the sequence identity of the alignment is only 27%, the DisProt annotation of DP00159 cannot be transferred to P04062.

Ribonuclease inhibitor (P10775)

No DisProt entry was found for P10775 either. The Psi-BLAST search yielded seven matches; the one with the best score and E-value is DP00554. As this alignment is reasonably good (40% identity over 196 aligned columns), we checked the annotation of DP00554. According to it, there is only one disordered region, 20 residues long, near the beginning of the sequence (residues 31-50). However, this region lies outside the alignment (aligned positions of the query: 144-339; of the target: 787-982). Therefore, this DisProt annotation cannot be transferred to P10775 either.
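Checking whether an annotated region of the hit falls inside the aligned part, and where it would land in the query, can be sketched as a coordinate mapping. The sketch assumes an ungapped alignment block, so that a constant offset converts target to query positions; a real Psi-BLAST alignment with gaps would need a position-by-position mapping. The function name is hypothetical.

```python
def map_region_to_query(region, target_aln, query_aln):
    """Map a 1-based, inclusive target region onto query coordinates.

    Assumes an ungapped alignment block (constant offset). Returns None
    if the region is not fully inside the aligned target range.
    """
    (r_start, r_end), (t_start, t_end), (q_start, q_end) = region, target_aln, query_aln
    if t_end - t_start != q_end - q_start:
        raise ValueError("aligned ranges differ in length; gapped alignments are not handled")
    if r_start < t_start or r_end > t_end:
        return None
    offset = q_start - t_start
    return (r_start + offset, r_end + offset)


# DP00554 disordered region 31-50 vs. the aligned block (target 787-982, query 144-339)
print(map_region_to_query((31, 50), (787, 982), (144, 339)))    # None
print(map_region_to_query((800, 810), (787, 982), (144, 339)))  # (157, 167)
```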

Divalent-cation tolerance protein CutA (Q9X0E6)

Q9X0E6 is not in DisProt either. The Psi-BLAST search found only insignificant, short matches, which cannot be used to transfer a DisProt annotation.

Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform (Q08209)

Q08209 is the only one of the four proteins that has an entry in DisProt. According to DisProt, the protein has five disordered regions, two of which overlap (see the figure on the right and <xr id="Q08209_disprot"/>). These regions lie at the ends of the sequence: one at the N-terminus (positions 1-13) and four at the C-terminus (together covering positions 374-521). The sixth region is ordered, lies in the core of the sequence and is also the longest (positions 14-373). (All regions map to the PDB structure 1AUI:A.) Our IUPred and MD predictions for Q08209 gave similar results for the disordered regions.
<figure id="Q08209_disprot" >

DisProt visualization of disordered and ordered regions for protein Q08209.

</figure>

<figtable id="Q08209_disprot">

Region | Type | Name | Location | Length | Structural/functional type | Functional classes | Functional subclasses
1 | Disordered - Extended | | 1-13 | 13 | Relationship to function unknown | Unknown | Unknown
2 | Disordered - Extended | | 374-468 | 95 | Function arises via a disorder to order transition | Molecular assembly | Autoregulatory, Protein-protein binding
3 | Disordered - Extended | CaM-binding domain | 390-414 | 25 | Function arises via a disorder to order transition | Molecular assembly | Protein-protein binding, Autoregulatory
4 | Disordered - Extended | Autoinhibitory region | 469-486 | 18 | Function arises via an order to disorder transition and vice versa | Molecular assembly | Protein-protein binding, Autoregulatory
5 | Disordered - Extended | | 487-521 | 35 | Relationship to function unknown | Unknown | Unknown
6 | Ordered | | 14-373 | 360 | | |
DisProt disordered and ordered regions for protein Q08209.

</figtable>
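Combining the overlapping and adjacent DisProt regions into contiguous disordered stretches, as in the summary "altogether positions 374-521" above, amounts to interval merging. A minimal Python sketch, assuming 1-based, inclusive positions; joining directly adjacent regions (such as 468 and 469) is a choice made here, controlled by the hypothetical `join_adjacent` flag.

```python
def merge_regions(regions, join_adjacent=True):
    """Merge overlapping (and, optionally, directly adjacent) 1-based,
    inclusive (start, end) regions into contiguous stretches."""
    gap = 1 if join_adjacent else 0
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + gap:
            # extend the previous stretch instead of opening a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


disordered = [(1, 13), (374, 468), (390, 414), (469, 486), (487, 521)]
stretches = merge_regions(disordered)
print(stretches)                                # [(1, 13), (374, 521)]
print(sum(e - s + 1 for s, e in stretches))     # 161 disordered residues
```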

Transmembrane Helices

Lab journal


Four proteins, including the protein causing Gaucher disease, were analyzed with respect to transmembrane (TM) helices. The prediction tools used differ in what they can detect. Polyphobius only distinguishes between residues that are part of a transmembrane helix (TMH) and residues located inside or outside the cytoplasm, whereas Memsat-SVM additionally predicts re-entrant helices and pore-lining helices. Because pore-lining helices are also transmembrane helices, both tools detect this kind of helix. For re-entrant helices, however, the two programs differ. A transmembrane helix crosses the membrane, so its two ends lie on opposite sides of the membrane. In contrast, both ends of a re-entrant helix lie on the same side. Memsat-SVM can predict re-entrant helices, whereas Polyphobius either treats them as ordinary helices crossing the membrane (as seen for Q9YDF8) or ignores them (as seen for P47863). Consequently, when a re-entrant helix is involved, the two tools may place the N- or C-terminus on different sides of the membrane, and some helices may be predicted with opposite orientations within the membrane.
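The effect of a re-entrant helix on the predicted topology can be illustrated by walking along the chain. This Python sketch assumes a simplified model in which every transmembrane helix (`"TM"`) switches the membrane side and every re-entrant helix (`"RE"`) returns to the side it came from; `"in"` stands for cytoplasmic and `"out"` for extracellular. It illustrates the reasoning only and is not how Memsat-SVM or Polyphobius compute topology.

```python
def topology(n_term_side, helix_types):
    """Return the membrane side of the N-terminus, of each loop after a
    helix, and finally of the C-terminus (the last element).

    "TM" helices switch sides; "RE" (re-entrant) helices do not.
    """
    flip = {"in": "out", "out": "in"}
    sides = [n_term_side]
    for helix in helix_types:
        if helix == "TM":
            sides.append(flip[sides[-1]])
        elif helix == "RE":
            sides.append(sides[-1])
        else:
            raise ValueError(f"unknown helix type: {helix}")
    return sides


# The same three helices, once with a re-entrant helix and once read as all-TM:
print(topology("in", ["TM", "RE", "TM"]))  # ['in', 'out', 'out', 'in']
print(topology("in", ["TM", "TM", "TM"]))  # ['in', 'out', 'in', 'out']
```

Reading a single re-entrant helix as transmembrane is enough to move the C-terminus to the other side of the membrane.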

The OPM database does not directly provide the location of the N- and C-termini or the type of the helices. Instead of distinguishing between transmembrane and re-entrant helices, OPM classifies all identified membrane helices (MH) as transmembrane. The locations of the termini and the helix types (transmembrane or re-entrant) can, however, be inferred from the OPM visualization of the protein, which is also shown in the following tables (<xr id="gluco"/>, <xr id="aero"/>, <xr id="lyso"/>, <xr id="dopa"/>).

The second database, PDBTM, only contains transmembrane proteins, so it provides no information for non-transmembrane proteins.

Human Glucosylceramidase

This protein is not a membrane protein; as documented in OPM, it is located on the extracellular side of the membrane. For the same reason, there is no entry for it in the PDBTM, which only contains membrane proteins. Polyphobius arrives at the same result and additionally predicts the signal peptide (including its n-, h- and c-regions). Memsat-SVM, in contrast, predicts a false-positive transmembrane helix. As glucosylceramidase cleaves lipids of cell membranes, the active site of the enzyme may be mistaken for a transmembrane helix.

<figtable id="gluco">

Comparison of membrane helices (MH) for Glucosylceramidase (P04062, human)
Feature | Memsat-SVM (prediction) | Polyphobius (prediction) | OPM (assignment) | PDBTM (assignment)
Number of MH | 1 | - | - | -
MH 1 | 456-471 | - | - | -
N-terminus | extracellular | extracellular | extracellular | -
C-terminus | cytoplasmic | extracellular | extracellular | -
Signal peptide | 1-34 | 1-40 | - | -
Re-entrant helices | - | - | - | -
Pore-lining helices | 1 | - | - | -
Graphical position (images): Cartoon P04062.png, Graphik P04062.png, 1ogs.png
More information: P04062, 1OGS (1OGS is not in the PDBTM)
Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the human glucocerebrosidase. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>

Aeropyrum pernix Voltage-gated potassium channel

For this protein of the archaeon Aeropyrum pernix, four different PDB entries were found:

  • 1ORQ: chain C
  • 1ORS: chain C
  • 2A0L: chain A/B
  • 2KYH: chain A

As the PDB entries represent structures of different chains, it was difficult to choose one of them. In the end, 1ORS was chosen for two reasons: this X-ray structure has the highest resolution of the four, and it represents a sensor domain, which is likely important for the protein. The predictions differ completely from the assignments. As the two predictions are more similar to each other, they were compared with each other, and the two assignments were likewise compared with each other. Both predictions contain the same number of helices, but the positions of some helices deviate considerably. Memsat-SVM predicted a re-entrant helix where Polyphobius detected a transmembrane helix, which is why the two programs place the N-terminus differently.

OPM assigns one helix more than PDBTM, but only because the two databases delimit helices differently. The third PDBTM helix (88-107) corresponds to two shorter consecutive OPM helices (86-97 and 100-107). Together they form one longer helix that crosses the membrane once, so PDBTM counts them as a single helix, whereas OPM counts the two mini-helices separately even though each would be too short to cross the membrane alone. Apart from slight deviations of a few residues at the helix ends, the structure is the same in both databases.
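Finding which OPM helices correspond to a PDBTM helix reduces to an interval-overlap test. The sketch below uses the helix boundaries from the Q9YDF8 table; the helper name `overlapping` is hypothetical, and helices are 1-based, inclusive `(start, end)` tuples.

```python
def overlapping(helix, others):
    """Return the helices in `others` that share at least one residue
    with `helix`; all helices are 1-based, inclusive (start, end) tuples."""
    start, end = helix
    return [h for h in others if h[0] <= end and start <= h[1]]


pdbtm = [(27, 50), (55, 75), (88, 107), (118, 142)]
opm = [(25, 46), (55, 78), (86, 97), (100, 107), (117, 148)]
for helix in pdbtm:
    print(helix, "->", overlapping(helix, opm))
# (27, 50) -> [(25, 46)]
# (55, 75) -> [(55, 78)]
# (88, 107) -> [(86, 97), (100, 107)]
# (118, 142) -> [(117, 148)]
```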

<figtable id="aero">

Comparison of membrane helices (MH) for Voltage-gated potassium channel (Q9YDF8, Aeropyrum pernix)
Feature | Memsat-SVM (prediction) | Polyphobius (prediction) | OPM (assignment) | PDBTM (assignment)
Number of MH | 6 | 7 | 5 | 4
MH 1 | 43-59 | 42-60 | 25-46 | 27-50
MH 2 | 72-90 | 68-88 | 55-78 | 55-75
MH 3 | 101-118 | 108-129 | 86-97 | 88-107
MH 4 | 128-143 | 137-157 | 100-107 | -
MH 5 | 163-184 | 163-184 | 117-148 | 118-142
MH 6 | 188-217 | 196-213 | - | -
MH 7 | 221-245 | 224-244 | - | -
N-terminus | cytoplasmic | extracellular | cytoplasmic | cytoplasmic
C-terminus | cytoplasmic | cytoplasmic | cytoplasmic | cytoplasmic
Signal peptide | - | - | - | -
Re-entrant helices | 1 | - | - | -
Pore-lining helices | 1 | - | - | -
Graphical position (images): Cartoon Q9YDF8.png, Graphik Q9YDF8.png, 1ors-Q9YDF.png, 1ors lm.png
More information: Q9YDF8, 1ORS (OPM), 1ORS (PDBTM)
Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the Aeropyrum pernix voltage-gated potassium channel. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>

Human Lysosome-associated membrane glycoprotein 1

Both predictions are similar to the OPM and PDBTM assignments: all predicted transmembrane helices differ in position by only a few residues. The protein consists of 6 transmembrane helices and 2 re-entrant helices. Polyphobius does not predict the re-entrant helices but predicts the remaining membrane helices well. Memsat-SVM predicts the re-entrant helices at positions similar to those in the databases. However, Memsat-SVM gets the orientation in the membrane wrong: instead of placing the N- and C-termini in the cytoplasm, it places both ends in the extracellular region.

The two assignments are very similar. OPM does not explicitly mark two of its helices as re-entrant, but both can be recognized as re-entrant in the OPM visualization. In the PDBTM visualization, the re-entrant helices are colored gold and stand out slightly against the yellow transmembrane helices. All pictures can be found in <xr id="lyso"/>.

<figtable id="lyso">

Comparison of membrane helices (MH) for Lysosome-associated membrane glycoprotein 1 (P47863, human)
Prediction Assignment
Memsat SVM Polyphobius OPM PDMTM
# of MH 8 6 8 (per chain) 8 (per chain)
MH Topology 1. 35-56
2. 71-89
3. 93-109
4. 113-136
5. 157-178
6. 190-205
7. 209-225
8. 232-252
1. 34-58
2. 70-91
3. -
4. 115-136
5. 156-177
6. 188-208
7. -
8. 231-252
1. 34-56
2. 70-88
3. 98-107
4. 112-136
5. 156-178
6. 189-203
7. 214-223
8. 231-252
1. 39-55
2. 72-89
3. 95-106
4. 116-133
5. 158-177
6. 188-205
7. 209-222
8. 231-248
N-terminal extracellular cytoplasmic cytoplasmic cytoplasmic
C-terminal extracellular cytoplasmic cytoplasmic cytoplasmic
Signal peptide 1-20 - - -
Re-entrant Helix 2 - 2 2
Pore-lining Helix 4 - - -
Graphical position
Cartoon P47863.png
Graphik P47863.png
2d57.png
2d57 lm.png
more information P47863 2D57 2D57
Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for rat Aquaporin 4 (P47863). Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>
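To put a number on "differ in their position by only a few residues", the following Python sketch compares the Polyphobius segments for P47863 with the OPM segments, both copied from the table above. The metric (the fraction of OPM helix residues that Polyphobius also predicts as transmembrane) and the function names are our own choices for illustration, not a measure used by either tool or database.

```python
# Transmembrane segments for P47863 copied from the table above
# (1-based, inclusive); None marks a helix that Polyphobius does not predict.
POLYPHOBIUS = [(34, 58), (70, 91), None, (115, 136), (156, 177), (188, 208), None, (231, 252)]
OPM = [(34, 56), (70, 88), (98, 107), (112, 136), (156, 178), (189, 203), (214, 223), (231, 252)]


def residues(segment):
    """Return the set of residue numbers covered by an inclusive (start, end) segment."""
    if segment is None:
        return set()
    start, end = segment
    return set(range(start, end + 1))


def helix_coverage(pred, ref):
    """Fraction of the reference helix residues that the predicted helix also covers."""
    ref_res = residues(ref)
    return len(residues(pred) & ref_res) / len(ref_res)


def overall_coverage(pred_segments, ref_segments):
    """Fraction of all reference TM residues that any predicted segment covers."""
    pred_all = set().union(*(residues(s) for s in pred_segments))
    ref_all = set().union(*(residues(s) for s in ref_segments))
    return len(pred_all & ref_all) / len(ref_all)


for number, (pred, ref) in enumerate(zip(POLYPHOBIUS, OPM), start=1):
    print(f"helix {number}: {helix_coverage(pred, ref):.2f}")
print(f"overall: {overall_coverage(POLYPHOBIUS, OPM):.2f}")
```

With these segments, helices 1, 2, 6 and 8 are fully covered, helices 4 and 5 reach 0.88 and 0.96, and the two helices Polyphobius misses (3 and 7) score 0. Overall, 123 of the 147 OPM helix residues (about 0.84) are covered.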

Human D3 dopamine receptor

The dopamine receptor is a transmembrane protein. The transmembrane helices predicted by both tools and those assigned by the databases mostly agree, with shifts of only a few residues. While Polyphobius predicts all 7 transmembrane helices that are also documented in OPM and PDBTM, MemsatSVM identifies only 6. Because the missing helix is the last one, MemsatSVM places the C-terminus in the extracellular region instead of the cytoplasm. MemsatSVM also classifies the third helix as pore-lining.

<figtable id="dopa">

Comparison of membrane helices (MH) for D3 dopamine receptor (P35462, human)
Prediction Assignment
Memsat SVM Polyphobius OPM PDBTM
# of MH 6 7 7 7
MH Topology 1. 32-55
2. 65-88
3. 101-129
4. 151-169
5. 188-209
6. 331-354
7. -
1. 30-55
2. 66-88
3. 105-126
4. 150-170
5. 188-212
6. 329-352
7. 367-386
1. 34-52
2. 67-91
3. 101-126
4. 150-170
5. 187-209
6. 330-351
7. 363-386
1. 35-52
2. 68-84
3. 109-123
4. 152-166
5. 191-206
6. 334-347
7. 368-382
N-terminal extracellular extracellular extracellular extracellular
C-terminal extracellular cytoplasmic cytoplasmic cytoplasmic
Signal peptide 1-29 - - -
Re-entrant Helix - - - -
Pore-lining Helix 1 - - -
Graphical position
Cartoon P35462.png
Graphik P35462.png
3pbl.png
3pbl lm.png
more information P35462 3PBL 3PBL
Summary and comparison of different membrane predictors (Polyphobius, Memsat-SVM) and databases (OPM, PDBTM) for the human D3 dopamine receptor. Pore-lining helices are colored blue. Re-entrant helices are highlighted red.

</figtable>
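The reasoning that a missing last helix moves the C-terminus to the other side of the membrane follows from a simple parity rule: every helix that fully spans the membrane switches sides, while a re-entrant helix enters and leaves on the same side. A minimal Python sketch of that rule (the function name is our own; it ignores signal peptides and other complications):

```python
SIDES = ("extracellular", "cytoplasmic")


def c_terminus_side(n_terminus_side: str, n_spanning_helices: int) -> str:
    """Side of the membrane on which the C-terminus ends up.

    Each helix that fully spans the membrane switches sides; re-entrant
    helices enter and leave on the same side and must not be counted.
    """
    if n_terminus_side not in SIDES:
        raise ValueError(f"unknown side: {n_terminus_side!r}")
    if n_spanning_helices < 0:
        raise ValueError("number of helices must be non-negative")
    if n_spanning_helices % 2 == 0:
        return n_terminus_side  # even number of crossings: back on the starting side
    return SIDES[1 - SIDES.index(n_terminus_side)]  # odd number: opposite side


print(c_terminus_side("extracellular", 7))  # cytoplasmic
print(c_terminus_side("extracellular", 6))  # extracellular
```

For the D3 receptor, 7 spanning helices with an extracellular N-terminus give a cytoplasmic C-terminus (Polyphobius, OPM, PDBTM), whereas the 6 helices predicted by MemsatSVM give an extracellular one. The same rule explains why, for P47863, 6 spanning plus 2 re-entrant helices leave both termini on the same side.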

Signal Peptides

Lab journal


For the following proteins, the signal peptides as well as their cleavage sites were predicted with SignalP:

  • Glucosylceramidase (P04062, human)
  • Serum albumin (P02768, human)
  • Lysosome-associated membrane glycoprotein 1 (P11279, human)
  • Aquaporin 4 (P47863, rat)

The four eukaryotic proteins were also looked up in the Signal Peptide Database to compare the entries with the predictions.

Glucosylceramidase (P04062)

For glucosylceramidase, the SignalP prediction differs from the database entry.

In the database, the protein has a signal peptide of 39 residues. A signal peptide is characterized by a highly hydrophobic core region followed by the cleavage site [1]. In particular, the higher hydrophobicity of residues 18-23 and 27-34 points to a signal peptide (green area in <xr id="sp_gluco"/>, 2).

MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASG


However, SignalP predicts no signal peptide. In the score plot in <xr id="sp_gluco"/>, the green signal peptide score (S-score) indicates the most likely signal peptide region: it is higher for the first 39 residues than for the rest of the sequence. The D-score of the candidate peptide, however, is 0.37 and thus below the threshold of 0.5, so the peptide is rejected as a signal peptide. These residues are not only annotated as a signal peptide in the database, but were also detected, with slight deviations, by the transmembrane helix predictors MemsatSVM (residues 1-34) and Polyphobius (residues 1-40).


<figure id="sp_gluco">

SignalP output and information of Signal Peptide Database of Glucosylceramidase. </figure>

Serum albumin (P02768)

The signal peptide consists of residues 1-18; it is predicted by SignalP and also documented in the Signal Peptide Database:

MKWVTFISLLFLFSSAYS

<xr id="sp_ser"/> shows a clear signal peptide prediction: the S-score is high over the signal peptide region, and the D-score of 0.85 is far above the threshold. The cleavage site is predicted between residues 18 and 19. The database shows high hydrophobicity for residues 6-14, which also marks the region as a signal peptide.

<figure id="sp_ser" >

SignalP output and information of Signal Peptide Database of Serum albumin. </figure>
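The hydrophobic core mentioned above can be located with a sliding-window hydropathy scan. The Python sketch below uses the Kyte-Doolittle scale and a 9-residue window (the width of the region 6-14); both are assumptions on our side, since we do not know which scale and window the Signal Peptide Database uses for its plot, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values. Scale and window width are assumptions of
# this sketch, not necessarily what the Signal Peptide Database uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq: str, width: int = 9) -> tuple[int, int, float]:
    """Return (start, end, mean) of the window with the highest mean hydropathy.

    Positions are 1-based and inclusive, matching the residue numbering in the text.
    """
    if not 0 < width <= len(seq):
        raise ValueError("window width must be between 1 and the sequence length")
    best = None
    for i in range(len(seq) - width + 1):
        mean = sum(KD[aa] for aa in seq[i:i + width]) / width
        if best is None or mean > best[2]:
            best = (i + 1, i + width, mean)
    return best


signal_peptide = "MKWVTFISLLFLFSSAYS"  # residues 1-18 of serum albumin, from the text
print(most_hydrophobic_window(signal_peptide))
```

For the albumin signal peptide, the highest-scoring window is residues 4-12 (mean hydropathy about 2.69). It overlaps the region 6-14 reported by the database; a different scale or window width could explain the shift.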

Lysosome-associated membrane glycoprotein 1 (P11279)

For lysosome-associated membrane glycoprotein 1 (LAMP1), the scores are even higher than for serum albumin. The signal peptide consists of 28 residues:

MAAPGSARRPLLLLLLLLLLGLMHCASA

The database shows a large hydrophobic region of 17 residues. In addition, the database entry documents a transmembrane helix of 23 residues at the C-terminal end of the protein, ending in the cytoplasm. SignalP gives a D-score of 0.95 for the detected signal peptide, and the cleavage site is predicted between residues 28 and 29 (ASA-AM), see <xr id="sp_aq"/>.

<figure id="sp_aq" >

SignalP output and information of Signal Peptide Database of Lysosome-associated membrane glycoprotein 1. </figure>
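A cleavage site "between residues 28 and 29" simply splits the precursor into the signal peptide and the mature N-terminus. The helper below is a hypothetical illustration, not part of SignalP; it uses the 28-residue signal peptide from the text plus the two residues "AM" given in the site notation ASA-AM, and omits the rest of the precursor.

```python
def split_at_cleavage(seq: str, cleavage_after: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature N-terminal part).

    `cleavage_after` is the last residue (1-based) of the signal peptide,
    so a site "between residues 28 and 29" is passed as 28.
    """
    if not 0 < cleavage_after < len(seq):
        raise ValueError("cleavage position outside the sequence")
    return seq[:cleavage_after], seq[cleavage_after:]


# Signal peptide from the text plus the two residues after the site (ASA-AM);
# the remainder of the precursor is deliberately left out.
precursor_start = "MAAPGSARRPLLLLLLLLLLGLMHCASA" + "AM"
sp, mature = split_at_cleavage(precursor_start, 28)
print(f"{sp[-3:]}-{mature[:2]}")  # ASA-AM
```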

Aquaporin 4 (P47863)

The rat protein has no entry in the Signal Peptide Database, as it has no signal peptide. The plotted prediction scores show at first sight that the protein lacks a signal peptide: all scores are below 0.21, far below the signal peptide threshold.

<figure id="sp_ly" >

SignalP output of Aquaporin 4 </figure>
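In SignalP 4, the final decision is based on the D-score, which combines the mean S-score and the maximal Y-score, compared against a cutoff. The Python sketch below applies the 0.5 cutoff stated in this section to the D-scores reported for P04062, P02768 and P11279; treating a score exactly at the cutoff as positive is our assumption, and the function name is hypothetical. P47863 is left out because only an upper bound (all scores below 0.21) is reported.

```python
# D-scores reported in the text for the SignalP runs; cutoff as stated in the text.
D_SCORES = {"P04062": 0.37, "P02768": 0.85, "P11279": 0.95}
CUTOFF = 0.5


def has_signal_peptide(d_score: float, cutoff: float = CUTOFF) -> bool:
    """Call a signal peptide if the D-score reaches the cutoff (>= is an assumption)."""
    return d_score >= cutoff


for accession, d_score in D_SCORES.items():
    verdict = "signal peptide" if has_signal_peptide(d_score) else "no signal peptide"
    print(f"{accession}: D = {d_score:.2f} -> {verdict}")
```

This reproduces the outcome described above: P04062 is rejected despite its database annotation, while P02768 and P11279 clear the cutoff by a wide margin.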

GO Terms

Lab journal

The predictions are not very good and depend heavily on what is already known about the protein.

GOPET

GOPET: GO Terms for Glucocerebrosidase
GO ID Aspect Confidence in % GO Term
GO:0016787 F 98 hydrolase activity
GO:0004348 F 97 glucosylceramidase activity
GO:0016798 F 97 hydrolase activity acting on glycosyl bonds

GOPET mainly focuses on the molecular function of glucocerebrosidase. It predicts fewer GO terms, but these agree with the disease description in Task 1 as well as with the description in Pfam.

Predict Protein

Predict Protein: GO Terms for Glucocerebrosidase
Aspect GO ID GO Term Reliability in %
Molecular Function GO:0005515 protein binding 70
Cellular Component - - -
Biological Process GO:0007040 lysosome organization 49
Biological Process GO:0006665 sphingolipid metabolic process 36
Biological Process GO:0005975 carbohydrate metabolic process 35
Biological Process GO:0008219 cell death 19
Biological Process GO:0005976 polysaccharide metabolic process 8

The first three predicted biological process terms are consistent with our knowledge of the process. The protein has no direct influence on cell death (GO:0008219), but is indirectly involved, as it degrades membrane lipids of dead blood cells. A participation in polysaccharide metabolic processes (GO:0005976) is not known, as glucocerebrosidase acts on a glycosphingolipid (glucosylceramide), not on polysaccharides. This GO term is presumably wrong.
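The Predict Protein terms judged as doubtful here are also the ones with the lowest reliability. The Python sketch below filters the terms from the table with a reliability cutoff; the 30% threshold and the function name are hypothetical choices for illustration, not Predict Protein defaults.

```python
# Predict Protein GO terms for glucocerebrosidase, copied from the table above.
PREDICTIONS = [
    ("GO:0005515", "protein binding", 70),
    ("GO:0007040", "lysosome organization", 49),
    ("GO:0006665", "sphingolipid metabolic process", 36),
    ("GO:0005975", "carbohydrate metabolic process", 35),
    ("GO:0008219", "cell death", 19),
    ("GO:0005976", "polysaccharide metabolic process", 8),
]


def filter_by_reliability(predictions, min_reliability):
    """Keep (GO ID, term, reliability) tuples at or above the cutoff, best first."""
    kept = [p for p in predictions if p[2] >= min_reliability]
    return sorted(kept, key=lambda p: p[2], reverse=True)


for go_id, term, reliability in filter_by_reliability(PREDICTIONS, 30):  # hypothetical cutoff
    print(f"{go_id}  {term}  {reliability}%")
```

With a cutoff of 30%, protein binding and the three biological process terms lysosome organization, sphingolipid metabolic process and carbohydrate metabolic process remain, while cell death (19%) and polysaccharide metabolic process (8%) are dropped.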

ProtFun2.0

Functional category                 
                                      Prob     Odds
 Amino_acid_biosynthesis              0.035    1.593
 Biosynthesis_of_cofactors            0.182    2.528
 Cell_envelope                     => 0.504    8.262
 Cellular_processes                   0.032    0.438
 Central_intermediary_metabolism      0.382    6.063
 Energy_metabolism                    0.067    0.740
 Fatty_acid_metabolism                0.027    2.088
 Purines_and_pyrimidines              0.538    2.213
 Regulatory_functions                 0.031    0.191
 Replication_and_transcription        0.126    0.471
 Translation                          0.082    1.863
 Transport_and_binding                0.560    1.365
Enzyme/nonenzyme                     
                                      Prob     Odds
 Enzyme                            => 0.773    2.698
 Nonenzyme                            0.227    0.318
Enzyme class
                                      Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.083    0.399
 Transferase    (EC 2.-.-.-)          0.228    0.660
 Hydrolase      (EC 3.-.-.-)          0.272    0.859
 Lyase          (EC 4.-.-.-)          0.045    0.961
 Isomerase      (EC 5.-.-.-)          0.011    0.345
 Ligase         (EC 6.-.-.-)          0.017    0.332
Gene Ontology category
                                      Prob     Odds
 Signal_transducer                    0.054    0.251
 Receptor                             0.027    0.158
 Hormone                              0.001    0.206
 Structural_protein                   0.002    0.087
 Transporter                          0.024    0.222
 Ion_channel                          0.018    0.307
 Voltage-gated_ion_channel            0.004    0.195
 Cation_channel                       0.012    0.268
 Transcription                        0.070    0.550
 Transcription_regulation             0.030    0.237
 Stress_response                      0.085    0.962
 Immune_response                   => 0.153    1.804
 Growth_factor                        0.005    0.376
 Metal_ion_transport                  0.009    0.020

The ProtFun server classifies P04062 as an enzyme (probability 0.773). Among the enzyme classes, Hydrolase receives the highest probability (0.272), although ProtFun marks no enzyme class as its prediction. The predicted Gene Ontology category (immune response) fits the cellular context of the protein, which acts in macrophages (immune cells). The functional category "cell envelope" suggests that the protein interacts with the cell membrane. This is plausible, as the protein not only interacts with membranes but degrades glycosphingolipids derived from them.
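To check which categories ProtFun actually marks, the relevant part of the output can be parsed. The Python sketch below assumes that the "=>" arrow marks the category ProtFun reports as its prediction and compares it with the category of highest probability; the line format is inferred from the output above rather than taken from ProtFun documentation, and the names are our own.

```python
import re

# Two sections of the ProtFun 2.0 output for P04062, copied from above.
PROTFUN_OUTPUT = """\
Enzyme/nonenzyme
                                      Prob     Odds
 Enzyme                            => 0.773    2.698
 Nonenzyme                            0.227    0.318
Enzyme class
                                      Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.083    0.399
 Transferase    (EC 2.-.-.-)          0.228    0.660
 Hydrolase      (EC 3.-.-.-)          0.272    0.859
 Lyase          (EC 4.-.-.-)          0.045    0.961
 Isomerase      (EC 5.-.-.-)          0.011    0.345
 Ligase         (EC 6.-.-.-)          0.017    0.332
"""

# Score line (inferred format): one leading space, name, optional "=>", probability, odds.
ROW = re.compile(r"^ (?P<name>.+?)\s+(?P<arrow>=>)?\s*(?P<prob>\d\.\d+)\s+(?P<odds>\d+\.\d+)$")


def parse_protfun(text):
    """Return {section: [(name, prob, odds, marked), ...]} for ProtFun-style output."""
    sections, current = {}, None
    for line in text.splitlines():
        line = line.rstrip()
        match = ROW.match(line)
        if match and current is not None:
            sections[current].append((
                match["name"].strip(),
                float(match["prob"]),
                float(match["odds"]),
                match["arrow"] is not None,
            ))
        elif line and not line.startswith(" "):
            current = line  # unindented line: new section title
            sections[current] = []
    return sections


for section, rows in parse_protfun(PROTFUN_OUTPUT).items():
    marked = [name for name, _, _, is_marked in rows if is_marked]
    best = max(rows, key=lambda row: row[1])[0]
    print(f"{section}: marked={marked or 'none'}, highest probability={best}")
```

For the enzyme/nonenzyme section this reports Enzyme as marked; for the enzyme classes no category is marked, and Hydrolase is the class with the highest probability.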

Pfam

For the protein P04062, Pfam finds the O-glycosyl hydrolase family 30 domain at positions 40-533.

Identifiers
Symbol Glyco_hydro_30
Pfam family PF02055
Pfam clan CL0058
InterPro IPR001139
SCOP 1nof
SUPERFAMILY 1nof
OPM family 125
OPM protein 1ogs
CAZY GH30


O-Glycosyl hydrolase family 30


This family belongs to the glycoside hydrolases (EC 3.2.1.-). Glycoside hydrolases comprise a large number of enzymes that cleave glycosidic bonds between carbohydrates and other moieties.

The family belongs to the Pfam clan CL0058, the TIM barrel glycosyl hydrolase superfamily.

The Pfam entry for the family PF02055 also mentions human glucocerebrosidase as the cause of Gaucher disease.

Discussion

Other available methods

Prediction: Tools
secondary structure: APSSP (Advanced Protein Secondary Structure Prediction Server), CFSSP (Chou & Fasman Secondary Structure Prediction Server), GOR IV, Jpred3
disorder: DISOPRED2
transmembrane helices: MEMSAT3, TMHMM, PredictProtein, DAS, HMMTOP, TMpred
signal peptides: PrediSi, Polyphobius, MemsatSVM, SIGCLEAVE, ANTHEPROT, Signal Find Server, SPD, SPEPlip, SOSUIsignal
GO terms: AmiGO, PDB, UniProt

What else can be predicted from the protein sequence alone

  • Fold recognition (profile based pGenTHREADER and rapid GenTHREADER)
  • Fold domain recognition (pDomTHREADER)
  • Protein domain prediction (DomPred)
  • Homology modelling (BioSerf v2.0)
  • Function prediction (eukaryotic function: FFPred v2.0)
  • Prediction of TM topology and helix packing (SVM-based MEMPACK)

http://bioinf.cs.ucl.ac.uk/psipred/

  • Cleavage site prediction
  • Ab initio structure prediction (not very successful: a combinatorial problem, computationally intensive, and worse for longer sequences. Moreover, biological molecules are not necessarily in their lowest-energy conformation.)
  • Solvent accessibility
  • Metal binding sites, active sites
  • Protein-protein interactions
  • Prediction of SNP effects
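Why ab initio prediction is described as a combinatorial problem that gets worse for longer sequences can be illustrated with a back-of-the-envelope count in the spirit of Levinthal's argument. The assumption of exactly k = 3 backbone conformations per residue is a deliberate simplification, not a property of any specific method:

```latex
N_{\text{conf}} = k^{n}, \qquad
k = 3,\ n = 100 \;\Rightarrow\; N_{\text{conf}} = 3^{100} \approx 5.2 \times 10^{47}
```

Each additional residue multiplies the number of conformations by k, so the search space grows exponentially with sequence length.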

Which predictions can be improved considerably by structure-based approaches

  • Solvent accessibility