Difference between revisions of "Sequence-based analyses Gaucher Disease"

From Bioinformatikpedia

Revision as of 11:03, 21 May 2012

Secondary structure

Knowing the secondary structure of a protein can shed light on its function, since structure largely determines function. If the structure of a protein is known, secondary structure elements (helix, sheet, coil) can be assigned to its residues based on their hydrogen-bonding patterns. DSSP <ref name="DSSP">Kabsch W, Sander C (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers</ref> is the most common method for such secondary structure assignments. If the structure of a protein is unknown, secondary structure elements can be predicted by tools like PSIPRED <ref name="PSIPRED">Liam J. McGuffin, Kevin Bryson, and David T. Jones (2000). "The PSIPRED protein structure prediction server". Bioinformatics</ref> or Reprof<ref name="Reprof">B Rost, C Sander (1993). "Prediction of protein secondary structure at better than 70% accuracy". J. Mol. Biol.</ref>. The aim of this task was to analyse the secondary structure of different proteins and to compare the secondary structure predictions of PSIPRED and Reprof with the DSSP secondary structure assignments. The following sequences were taken into account: <figtable id="tab:ss_sequences">

{|
! Name !! UniProtKB !! PDB
|-
| Glucosylceramidase || P04062 || 1OGS
|-
| Ribonuclease inhibitor || P10775 || 1DFJ
|-
| Divalent-cation tolerance protein CutA || Q9X0E6 || 1KR4
|-
| Serine/threonine-protein phosphatase || Q08209 || 1AUI
|}

</figtable> Information about program calls and implementation details can be found in our protocol.

Predictions

To make the different output formats comparable, we mapped the secondary structure states of all three methods onto the three letters H (helix), E (sheet), and C (coil) according to <xr id="tab:ss_mapping"/>. Regions of the UniProt sequences that were not present in the PDB file, as well as regions where no DSSP assignment was possible, were ignored. <figtable id="tab:ss_mapping">

{|
! Method !! H !! E !! C
|-
| DSSP || H, G, I || E, B || T, S, ' '
|-
| PSIPRED || H || E || C
|-
| Reprof || H || E || L
|}

</figtable>
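The mapping in the table above can be sketched in a few lines of Python. This is only an illustration, not the code from our protocol: the names `STATE_MAP` and `to_three_states` are hypothetical, and the dictionary contains exactly the states listed in the table.

```python
# Native state alphabets mapped onto H/E/C, exactly as listed in the mapping table.
# STATE_MAP and to_three_states are illustrative names, not part of any tool.
STATE_MAP = {
    "DSSP": {"H": "H", "G": "H", "I": "H",
             "E": "E", "B": "E",
             "T": "C", "S": "C", " ": "C"},
    "PSIPRED": {"H": "H", "E": "E", "C": "C"},
    "Reprof": {"H": "H", "E": "E", "L": "C"},
}


def to_three_states(method, states):
    """Translate a per-residue state string of `method` into H/E/C."""
    mapping = STATE_MAP[method]
    try:
        return "".join(mapping[s] for s in states)
    except KeyError as err:
        # A state outside the table is reported instead of silently guessed.
        raise ValueError(f"unexpected {method} state {err.args[0]!r}") from None


print(to_three_states("DSSP", "HGIEBTS "))
print(to_three_states("Reprof", "HEL"))
```

Raising an error for unlisted states is a design choice of this sketch; it keeps unexpected output characters from being counted as coil by accident.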


P04062

Glucosylceramidase (P04062) is located at the membrane of lysosomes. It has two domains, which belong to (1) the glycosyl hydrolase domain fold and (2) the TIM beta/alpha-barrel fold. Both domains have hydrophobic beta sheets which anchor the protein in the membrane. <xr id="fig:ss_P04062"/> depicts the secondary structure elements of the corresponding crystal structure, which coincide with the DSSP assignments. The following section shows the secondary structure annotations of the different methods. The PSIPRED predictions agree better with the DSSP assignments than the Reprof predictions do: Reprof predicts sheets instead of helices in several regions. The residues of the beta-barrel sheets (the tube in the middle of <xr id="fig:ss_P04062"/>) are marked by asterisks.

</figure>

          40
          ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKG
Reprof    CCCCCCCCCCCEEEEEEECCEECCCCCCCCCCCCCEEEEEEECCCCCEEEEECCCEECCCCCCEEEEEECCCCEEEEEEC
PSIPRED   CCCCCCCCCCCCCEEEEECCHHCCCCCCCCCCCCCEEEEEEECCCCCCHHCCCCCCCCCCCCCCCEEEECCCCCCEEEEE
DSSP      CECCCEEECCCCCEEEEEECCCCCECCCCCCCCCCEEEEEEEECCCCCCEEEEEECECCCCCCCEEEEEEEEEEEEECCE

          120
          FGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPL
Reprof    CCCCCCHHHHHEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCCEEEEEEEECCCCCCCEEEEECCCCCCCCCEEEE
PSIPRED   EEEEHHHHHHHHHHHCCHHHHHHHHHHHCCCCCCEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHH
DSSP      EEEECCHHHHHHHCCCCHHHHHHHHHHHHCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHCCHHHH

          200
          IHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGL
Reprof    EEHHHHHCCCCCEEEECCCCCCCEEEECCCECCCEECCCCCCCCCCHHHHHHHHHHHHHCCCCEEEEEEEEECCCCCCCE
PSIPRED   HHHHHHHHCCCEEEEEEECCCCHHEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEEECCCCCCCC
DSSP      HHHHHHHCCCCCEEEEEECCCCHHHECCCCCCCCCEECCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCEEECCCCCCHHH
                      ******                                                  ***
          280
          LSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPA
Reprof    ECCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCEEEEEEECCCCEECCCEEEEEECCCCCCEEEEEEEEEEEEEECCCC
PSIPRED   CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHHHCCHHHHHHCCEEEEECCCCCCCCH
DSSP      CCCCCCCCCECCHHHHHHHHHHCHHHHHHCCCCCCCEEEEEEEEHHHCCHHHHHHHCCHHHHCCCCEEEEEEECCCCCCH
                                              ********                      ******* 
          360
          KATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDS
Reprof    CCECCCCCECCCCCEEEECCCCCCCEEEEEEEEECCCCCCCEEECEHHHHEEEEEEECCCCEEEECCCCCCEEEEECCCC
PSIPRED   HHHHHHHHHHCCCCEEEEECCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHEEEEEEEEECCCCCCCCCCCCCCC
DSSP      HHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCEEEEEEEECCECCCCCCCCCCCCCCC
                        ********                                ******** 
          440
          PIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFL
Reprof    CEEEEECCCCCCCCCEEEECCCEEEECCCCCEEEEEEEECCCCCEEEEEECCCCCEEEEEEECCCCCCEEEECCCCEEEE
PSIPRED   CEEEECCCCEEEECHHHHHHHHHHHHCCCCCEEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE
DSSP      CEEEEHHHCEEEECHHHHHHHHHHCCCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE

          520
          ETISPGYSIHTYLWRRQ
Reprof    EEECCCCEEEEEEEECC
PSIPRED   EEECCCCEEEEEEEECC
DSSP      EEEECCCEEEEEEECCC

<figure id="fig:ss_P04062">

Crystal structure of 1OGS (P04062). Red: alpha-helix, yellow: beta-sheet, green: coil.


P10775

<xr id="fig:ss_P10775"/> depicts the crystal structure 1DFJ, which corresponds to P10775. It has two domains: d1dfji_ is a repeat domain consisting of alternating alpha-helices and parallel beta-sheets, while d1dfje_ contains long curved antiparallel beta-sheets and three alpha-helices. The alternating HHH and EEE regions in the following secondary structure annotations agree well with the repetitive structure shown in <xr id="fig:ss_P10775"/>. Again, the PSIPRED predictions match the DSSP assignments better than the Reprof predictions do.

</figure>

          1
          MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTC
Reprof    CCCCECHHCCCCCHHHHHHHHHHHCCEEEECCCCCCHHHHHHHHHHHHCCCCHHHHHHHHCCCCCCCHEEEHCCCCCCCC
PSIPRED   CEEECCCCCCCHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHHHCCC
DSSP      CECCCECCCCCHHHHHHHHHHHCCCCEEECECCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCC

          81
          KIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASV
Reprof    EEEEECCCCCCCCHCCCCCCHHHHCHCHHHHHHCCCCCCCCHHHHHHHHHCCCCCHCCHHHHHHHHHHCCCCCCHHHHHH
PSIPRED   CCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH
DSSP      CCCEEECCCCCCCCCHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH

          161
          LRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA
Reprof    HHHHHHHHHHCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHCHCCHHHCCCCCCCCCHHHHH
PSIPRED   HHHCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
DSSP      HHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHH

          241
          ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSL
Reprof    HHCCCCCCCHHHHCHHEEEHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCHHHHHHHHCHH
PSIPRED   HHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCC
DSSP      HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCC

          321
          TAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
Reprof    HHHHHHHHHHHHHCCHHHHHHHCCCCCCCCHHHHHHHHHHCCCCCEEEEEEECCCCCCCCCHHHHHHHHHHHCCHHHHCC
PSIPRED   CHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEEC
DSSP      EHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEEC

          401
          SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Reprof    CCCCCCCHHHHHHHCCCCCCCCHHHHHHHCCCCCCHHHHHHHHHHHCCCCCCCECC
PSIPRED   CCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEECC
DSSP      CCCECCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC

<figure id="fig:ss_P10775">

Crystal structure of 1DFJ (P10775). Red: alpha-helix, yellow: beta-sheet, green: coil.

Q9X0E6

<xr id="fig:ss_Q9X0E6"/> depicts the d1kr4a_ domain of 1KR4, which consists of three alpha-helices interrupted by beta-sheets. Reprof predicts helices that are too long.

</figure>

          2
          ILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIF
Reprof    EEEEECCCCHHHHHHHHHHHHHHHHHHHHCHCHHHCCCEEECEEECCHHHHHHHCCCHHHHHHHHHHHHHCCCCCCCHHE
PSIPRED   EEEEEECCCHHHHHHHHHHHHHCCCEEEEEEEEEEEEEEECCCEEEEEEEEEEECCCHHHHHHHHHHHHHHCCCCCCEEE
DSSP      EEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEE

          82
          TLKVENVLTEYMNWLRESVL
Reprof    HHHHHHHHHHHHHHHHHHCC
PSIPRED   EEECCCCCHHHHHHHHHHCC
DSSP      EECCCCEEHHHHHHHHHHCC

<figure id="fig:ss_Q9X0E6">

Crystal structure of 1KR4 (Q9X0E6). Red: alpha-helix, yellow: beta-sheet, green: coil.

Q08209

Q08209 contains the domains d1auib_ and d1auia_, which consist mainly of alpha-helices (<xr id="fig:ss_Q08209"/>). PSIPRED predicts these alpha-helices considerably better than Reprof, which suggests beta-sheets in some regions.

</figure>

          14
          TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHG
Reprof    CCCEEEEECCCCCCCEEEEEEECCCCCCEEEEEHHHECCCCCCCEEEEEEEEECCCCEEECCCCCCCCCCCEEEEECCCC
PSIPRED   CCCCCCCCCCCCCCCCCHHHCCCCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHCCCEEEECCCEEEECCCCC
DSSP      CCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCC

          94
          QFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSER
Reprof    HHHHHHEEEEECCCCCCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCEEEEEECCCCCCEEEEEEEEEEEEEEEEECH
PSIPRED   HHHHHHHHHHHCCCCCCCCEEECCCCCCCCCCCHHHHHHHHHHHHCCCCCEEEECCCCHHHHHHCCCCHHHHHHHHCCHH
DSSP      CHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHCCHHHHHHHHCCHH

          174
          VYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTV
Reprof    HHHHHHHHCCCCCHHHHHCCCEEEEECCCCCCCCCHHHHHHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCCCECCCCCE
PSIPRED   HHHHHHHHHCCCHHHHHCCCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHCCCCCCCCCCCCCCCCCCCCCC
DSSP      HHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCC

          254
          RGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQ
Reprof    CCEEEEECCCCEEEEHCCCCHHHHEHHHCCCCCCEEEEEECCCCCCCEEEEEEECCCEEEEECCCEEEEEECCCEEEEEE
PSIPRED   CCCCCCCCHHHHHHHHHHCCCCEEEEHHHHHHHHHHHHHCCCCCCCCCEEEEEECCCCCCCCCCEEEEEEECCCCCEEEE
DSSP      CCCCEEECHHHHHHHHHHCCCCEEEECCCCCCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEE

          334
          FNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSSFEEAKGLDRINERMPPR
Reprof    ECCCCCCCCCCCCCEEEEEECCCCCHHHHHHHHHHHEECCEHHHHCCCCHHCCCCCCC
PSIPRED   ECCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHCCCCC
DSSP      ECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHCCCCC

<figure id="fig:ss_Q08209">

Crystal structure of 1AUI (Q08209). Red: alpha-helix, yellow: beta-sheet, green: coil.

Prediction accuracy/precision

We compared the prediction performance of PSIPRED and Reprof via the Q3 score and the precision of the three secondary structure states H, E, and C. The Q3 score is identical to the accuracy, i.e. the number of correctly predicted states divided by the number of compared residues. The precision of state X is the fraction of predictions of X that are correct, formally precision(X) = TP(X) / (TP(X) + FP(X)). <xr id="ss_acc"/> shows the results: PSIPRED clearly outperforms Reprof in all four cases. PSIPRED achieves an average accuracy of 87%, which is considerably higher than the 58% achieved by Reprof. <figtable id="ss_acc">

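{|
! Protein !! Method !! Q3 !! Precision H !! Precision E !! Precision C
|-
| P04062 || PSIPRED || 0.831 || 0.830 || 0.872 || 0.810
|-
| P04062 || Reprof || 0.553 || 1.000 || 0.455 || 0.592
|-
| P10775 || PSIPRED || 0.941 || 0.959 || 0.960 || 0.919
|-
| P10775 || Reprof || 0.603 || 0.589 || 0.417 || 0.644
|-
| Q9X0E6 || PSIPRED || 0.890 || 1.000 || 0.895 || 0.720
|-
| Q9X0E6 || Reprof || 0.580 || 0.562 || 0.917 || 0.458
|-
| Q08209 || PSIPRED || 0.833 || 0.842 || 0.902 || 0.812
|-
| Q08209 || Reprof || 0.579 || 0.762 || 0.293 || 0.743
|}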

</figtable>
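The two measures can be written down compactly. The following is a minimal sketch of the definitions given above, not the script from our protocol; the function names `q3` and `precision` are illustrative. The example strings are the final 17-residue block of the P04062 alignment (PSIPRED and DSSP rows), so the numbers apply to that fragment only, not to the whole protein.

```python
def q3(pred, ref):
    """Fraction of residues whose predicted state equals the reference state."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def precision(pred, ref, state):
    """TP(state) / (TP(state) + FP(state)); None if `state` is never predicted."""
    ref_at_predicted = [r for p, r in zip(pred, ref) if p == state]
    if not ref_at_predicted:
        return None
    return ref_at_predicted.count(state) / len(ref_at_predicted)


# Last alignment block of P04062 (residues ETISPGYSIHTYLWRRQ).
psipred = "EEECCCCEEEEEEEECC"
dssp = "EEEECCCEEEEEEECCC"

print(f"Q3 = {q3(psipred, dssp):.3f}")
for state in "HEC":
    print(state, precision(psipred, dssp, state))
```

Returning `None` when a state is never predicted avoids a division by zero and keeps "undefined" distinct from a precision of 0.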

Disorder

Disordered regions are regions that lack a stable three-dimensional structure. Nevertheless, such regions can be functionally highly important: regulation, signalling, and flexible ligand binding are only some examples. DisProt<ref name="DisProt">Vucetic S, Obradovic Z, Vacic V, et al. (2005). "DisProt: a database of protein disorder". Bioinformatics</ref> is a curated database of proteins with experimentally determined disordered regions. IUPred<ref name="IUPred">Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon (2005). "The Pairwise Energy Content Estimated from Amino Acid Composition Discriminates between Folded and Intrinsically Unstructured Proteins". J. Mol. Biol.</ref> is a method for predicting disordered regions ab initio, i.e. based solely on the protein sequence. We compared the predictions of IUPred with the annotations in the DisProt database for all four example proteins. IUPred was run in the mode for long, global disordered regions (see the protocol for details). Residues were classified as disordered if their predicted disorder probability was at least 50%. These residues were compared to the DisProt annotations: either those of the UniProt entry itself, if available, or those of a significant homolog (e-value < 1e-3) for which a DisProt entry existed. We measured the performance of IUPred via precision (TP/(TP+FP)), sensitivity (TP/(TP+FN)), and specificity (TN/(TN+FP)).

P04062

Neither glucosylceramidase (P04062) nor a homologous protein is annotated in DisProt. This might be due to missing experimental data or, more likely, to the absence of disordered regions. The latter assumption is supported by the highly structured protein complex 1OGS (<xr id="fig:ss_P04062"/>). However, IUPred predicts some disordered residues with a probability >= 50%.

<figtable id="disorder_P04062">

{|
! Method !! Disorder regions
|-
| IUPred || 2, 3, 6, 90-93, 229-231, 235, 236
|-
| DisProt ||
|-
| colspan="2" | Precision: 0%, Sensitivity: undefined, Specificity: 98%
|}

</figtable>

P10775

P10775 is not annotated in DisProt itself, but there is a significant homolog (DP00554) with a disordered region from residue 31 to 50. This region, however, is not covered by the pairwise alignment of P10775 and DP00554, so no disordered residues are expected for P10775. IUPred correctly did not predict any disordered region (specificity = 100%).

Q9X0E6

DisProt contains no entry that suggests a disordered region in Q9X0E6, nor does IUPred predict such a region.

Q08209

Five disordered regions are annotated in DisProt for Q08209. These regions mainly cover the C-terminal end, which exhibits several rather arbitrarily arranged alpha-helices (<xr id="fig:ss_Q08209"/>). All residues predicted by IUPred with a probability >= 50% are covered by DisProt annotations (precision 100%), but IUPred recovers only about half of all disordered residues (sensitivity 52%).

</figure>

<figtable id="disorder_Q08209">

{|
! Method !! Disorder regions
|-
| IUPred || 1-6, 8, 383, 384, 424, 425, 432, 434-439, 443, 445, 448, 449, 455, 458, 463-521
|-
| DisProt || 1-13, 390-414, 374-468, 469-486, 487-521
|-
| colspan="2" | Precision: 100%, Sensitivity: 52%, Specificity: 100%
|}

</figtable>
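The residue-level evaluation can be reproduced from the two region lists in the table above. The sketch below is an illustration under stated assumptions rather than the script from our protocol: the helper names `expand` and `disorder_metrics` are hypothetical, and the protein length is assumed to be 521, the last annotated residue in the table.

```python
def expand(spec):
    """Turn a region list such as '1-6, 8, 383' into a set of residue numbers."""
    residues = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = map(int, token.split("-"))
            residues.update(range(start, end + 1))
        else:
            residues.add(int(token))
    return residues


def ratio(numerator, denominator):
    """Return None (undefined) instead of dividing by zero."""
    return numerator / denominator if denominator else None


def disorder_metrics(predicted, annotated, length):
    """Per-residue precision, sensitivity and specificity."""
    tp = len(predicted & annotated)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    tn = length - tp - fp - fn
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


iupred = expand("1-6, 8, 383,384,424,425,432,434-439,443,445,448,449,455,458,463-521")
disprot = expand("1-13,390-414,374-468,469-486,487-521")

# Assumed sequence length: 521 (last residue annotated in the table).
metrics = disorder_metrics(iupred, disprot, length=521)
print({name: None if value is None else round(value, 2) for name, value in metrics.items()})
```

Working on sets of residue numbers means the overlapping DisProt regions (390-414 lies inside 374-468) are counted only once.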

<figure id="fig:disorder_Q08209">

Disordered and ordered regions of Q08209 using the DisProt entry DP00092.

Transmembrane helices

PolyPhobius was used to predict transmembrane helices for our protein P04062 and three other proteins: P35462, Q9YDF8 and P47863. The scripts used for the prediction can be found in the protocol. The PolyPhobius predictions were then compared with the membrane assignments of the structures of these proteins in OPM and/or PDBTM. For that purpose, a corresponding PDB structure was needed for each protein and was taken from UniProt:

<figtable id="tab:ss_sequences_trans">

{|
! Name !! UniProtKB !! PDB
|-
| Glucosylceramidase || P04062 || 1OGS
|-
| D(3) dopamine receptor || P35462 || 3PBL
|-
| Voltage-gated potassium channel || Q9YDF8 || 1ORQ, 1ORS
|-
| Aquaporin-4 || P47863 || 2D57
|}

</figtable>

P04062

Our protein glucosylceramidase is located in the lysosome and therefore contains no transmembrane regions. As expected, PolyPhobius did not report any transmembrane regions. Instead, it found a signal peptide and predicted the rest of the sequence to be non-cytoplasmic. This prediction agrees with the UniProt annotation:


<figtable id="tab:P04062_trans">

{|
! P04062 !! Signal peptide !! Others
|-
| PolyPhobius || 1-39 || 40-536 (non-cytoplasmic)
|-
| UniProt || 1-39 || 40-536 (chain)
|}

</figtable>



P35462

<figtable id="tab:P35462_trans">

{|
! P35462 !! TRANSMEM 1 !! TRANSMEM 2 !! TRANSMEM 3 !! TRANSMEM 4 !! TRANSMEM 5 !! TRANSMEM 6 !! TRANSMEM 7
|-
| PolyPhobius || 30-55 || 66-88 || 105-126 || 150-170 || 188-212 || 329-352 || 367-386
|-
| UniProt || 33-55 || 66-88 || 105-126 || 150-170 || 188-212 || 330-351 || 367-388
|-
| OPM || 34-52 || 67-91 || 101-126 || 150-170 || 187-209 || 330-351 || 363-386
|-
| PDBTM || 35-52 || 68-84 || 109-123 || 152-166 || 191-206 || 334-347 || 368-382
|}

</figtable>
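One simple way to quantify how well two rows of such a table agree is the number of residues each pair of helices shares. The following sketch is an illustration only and was not part of the original analysis; the function names are hypothetical, and the segments are the PolyPhobius and UniProt rows for P35462 from the table above.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive segments (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def best_matches(predicted, reference):
    """For each predicted helix, the reference helix with the largest overlap."""
    matches = []
    for helix in predicted:
        partner = max(reference, key=lambda ref: overlap(helix, ref))
        matches.append((helix, partner, overlap(helix, partner)))
    return matches


polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
uniprot = [(33, 55), (66, 88), (105, 126), (150, 170),
           (188, 212), (330, 351), (367, 388)]

for helix, partner, shared in best_matches(polyphobius, uniprot):
    print(f"{helix[0]}-{helix[1]} vs {partner[0]}-{partner[1]}: {shared} shared residues")
```

The same two functions can be applied to the OPM and PDBTM rows; helices that are missing in one source simply yield an overlap of 0.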


<figtable id="tab:P35462_trans_png">

Dopamine D3 receptor (P35462, 3PBL) as shown in OPM.
Dopamine D3 receptor (P35462, 3PBL) as shown in PDBTM.

</figtable>


Q9YDF8

<figtable id="tab:Q9YDF8_trans">

{|
! Q9YDF8 !! TRANSMEM_1 !! TRANSMEM_2 !! TRANSMEM_3 !! TRANSMEM_4 !! TRANSMEM_5 !! TRANSMEM_6 !! TRANSMEM_7 !! TRANSMEM_8
|-
| PolyPhobius || 42-60 || 68-88 || - || 108-129 || 137-157 || 163-184 || 196-213 || 224-244
|-
| UniProt || 39-63 || 68-92 || 97-105 (intramembrane) || 109-125 || 129-145 || 160-184 || 196-208 (intramembrane) || 222-253
|-
| OPM_1ORQ || - || - || - || - || - || 153-172 || 183-195 || 207-225
|-
| OPM_1ORS || 25-46 || 55-78 || 86-97 || 100-107 || 117-148 || - || - || -
|-
| PDBTM_1ORQ || 21-52 || 57-80 || - || - || 151-171 || - || - || 209-236
|-
| PDBTM_1ORS || 27-50 || 55-75 || 88-107 || 118-142 || - || - || - || -
|}

</figtable>


<figtable id="tab:Q9YDF8_trans_png">

Potassium channel KvAP (Q9YDF8, 1ORQ) as shown in OPM.
Potassium channel KvAP (Q9YDF8, 1ORS) as shown in OPM.
Potassium channel KvAP (Q9YDF8, 1ORQ) as shown in PDBTM.
Potassium channel KvAP (Q9YDF8, 1ORS) as shown in PDBTM.

</figtable>


P47863

<figtable id="tab:P47863_trans">

{|
! P47863 !! TRANSMEM_1 !! TRANSMEM_2 !! TRANSMEM_3 !! TRANSMEM_4 !! TRANSMEM_5 !! TRANSMEM_6 !! TRANSMEM_7 !! TRANSMEM_8
|-
| PolyPhobius || 34-58 || 70-91 || - || 115-136 || 156-177 || 188-208 || - || 231-252
|-
| UniProt || 37-57 || 65-85 || - || 116-136 || 156-176 || 185-205 || - || 232-252
|-
| OPM || 34-56 || 70-88 || 98-107 || 112-136 || 156-178 || 189-203 || 214-223 || 231-252
|-
| PDBTM || 39-55 || 72-89 || - || 116-133 || 158-177 || 188-205 || - || 231-248
|}

</figtable>


<figtable id="tab:P47863_trans_png">

Aquaporin-4 (P47863, 2D57) as shown in OPM.
Aquaporin-4 (P47863, 2D57) as shown in PDBTM.

</figtable>


Signal peptides

The SignalP 4.0 web server was used to predict signal peptides for our protein P04062 and three other proteins (P02768, P47863 and P11279):

<figtable id="tab:ss_sequences_signa">

{|
! Name !! UniProtKB !! PDB
|-
| Glucosylceramidase || P04062 || 1OGS
|-
| Serum albumin || P02768 || 1AO6
|-
| Aquaporin-4 || P47863 || 2D57
|-
| Lysosome-associated membrane glycoprotein 1 || P11279 || -
|}

</figtable>


<figtable id="tab:signa_png">

Signal peptide prediction for Glucosylceramidase (P04062) using SignalP.
Signal peptide prediction for Serum albumin (P02768) using SignalP.
Signal peptide prediction for Aquaporin-4 (P47863) using SignalP.
Signal peptide prediction for Lysosome-associated membrane glycoprotein 1 (P11279) using SignalP.

</figtable>


GO terms

GOPET

<figtable id="tab:ss_gopet">

{|
! GO ID !! Aspect !! Confidence !! GO term
|-
| GO:0016787 || F || 98% || hydrolase activity
|-
| GO:0004348 || F || 97% || glucosylceramidase activity
|-
| GO:0016798 || F || 97% || hydrolase activity, acting on glycosyl bonds
|}

</figtable>

ProtFun

<figtable id="tab:ss_protfun">

{|
! Functional category !! Prob !! Odds
|-
| Amino_acid_biosynthesis || 0.035 || 1.593
|-
| Biosynthesis_of_cofactors || 0.182 || 2.528
|-
| Cell_envelope || 0.504 || 8.262
|-
| Cellular_processes || 0.032 || 0.438
|-
| Central_intermediary_metabolism || 0.382 || 6.063
|-
| Energy_metabolism || 0.067 || 0.740
|-
| Fatty_acid_metabolism || 0.027 || 2.088
|-
| Purines_and_pyrimidines || 0.538 || 2.213
|-
| Regulatory_functions || 0.031 || 0.191
|-
| Replication_and_transcription || 0.126 || 0.471
|-
| Translation || 0.082 || 1.863
|-
| Transport_and_binding || 0.560 || 1.365
|-
! Enzyme/nonenzyme !! Prob !! Odds
|-
| Enzyme || 0.773 || 2.698
|-
| Nonenzyme || 0.227 || 0.318
|-
! Enzyme class !! Prob !! Odds
|-
| Oxidoreductase (EC 1.-.-.-) || 0.083 || 0.399
|-
| Transferase (EC 2.-.-.-) || 0.228 || 0.660
|-
| Hydrolase (EC 3.-.-.-) || 0.272 || 0.859
|-
| Lyase (EC 4.-.-.-) || 0.045 || 0.961
|-
| Isomerase (EC 5.-.-.-) || 0.011 || 0.345
|-
| Ligase (EC 6.-.-.-) || 0.017 || 0.332
|-
! Gene Ontology category !! Prob !! Odds
|-
| Signal_transducer || 0.054 || 0.251
|-
| Receptor || 0.027 || 0.158
|-
| Hormone || 0.001 || 0.206
|-
| Structural_protein || 0.002 || 0.087
|-
| Transporter || 0.024 || 0.222
|-
| Ion_channel || 0.018 || 0.307
|-
| Voltage-gated_ion_channel || 0.004 || 0.195
|-
| Cation_channel || 0.012 || 0.268
|-
| Transcription || 0.070 || 0.550
|-
| Transcription_regulation || 0.030 || 0.237
|-
| Stress_response || 0.085 || 0.962
|-
| Immune_response || 0.153 || 1.804
|-
| Growth_factor || 0.005 || 0.376
|-
| Metal_ion_transport || 0.009 || 0.020
|}

</figtable>
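The ProtFun table can be summarised group by group. The sketch below is one possible reading and not a description of how ProtFun itself selects its calls: it assumes that the category with the highest odds is taken as the call for its group, and that odds of 1 or less count as "no call". The names `PROTFUN` and `top_call` are hypothetical; the numbers are copied from the table above.

```python
# (category, probability, odds) per group, copied from the ProtFun table.
PROTFUN = {
    "Functional category": [
        ("Amino_acid_biosynthesis", 0.035, 1.593),
        ("Biosynthesis_of_cofactors", 0.182, 2.528),
        ("Cell_envelope", 0.504, 8.262),
        ("Cellular_processes", 0.032, 0.438),
        ("Central_intermediary_metabolism", 0.382, 6.063),
        ("Energy_metabolism", 0.067, 0.740),
        ("Fatty_acid_metabolism", 0.027, 2.088),
        ("Purines_and_pyrimidines", 0.538, 2.213),
        ("Regulatory_functions", 0.031, 0.191),
        ("Replication_and_transcription", 0.126, 0.471),
        ("Translation", 0.082, 1.863),
        ("Transport_and_binding", 0.560, 1.365),
    ],
    "Enzyme/nonenzyme": [("Enzyme", 0.773, 2.698), ("Nonenzyme", 0.227, 0.318)],
    "Enzyme class": [
        ("Oxidoreductase", 0.083, 0.399), ("Transferase", 0.228, 0.660),
        ("Hydrolase", 0.272, 0.859), ("Lyase", 0.045, 0.961),
        ("Isomerase", 0.011, 0.345), ("Ligase", 0.017, 0.332),
    ],
    "Gene Ontology category": [
        ("Signal_transducer", 0.054, 0.251), ("Receptor", 0.027, 0.158),
        ("Hormone", 0.001, 0.206), ("Structural_protein", 0.002, 0.087),
        ("Transporter", 0.024, 0.222), ("Ion_channel", 0.018, 0.307),
        ("Voltage-gated_ion_channel", 0.004, 0.195), ("Cation_channel", 0.012, 0.268),
        ("Transcription", 0.070, 0.550), ("Transcription_regulation", 0.030, 0.237),
        ("Stress_response", 0.085, 0.962), ("Immune_response", 0.153, 1.804),
        ("Growth_factor", 0.005, 0.376), ("Metal_ion_transport", 0.009, 0.020),
    ],
}


def top_call(rows, min_odds=1.0):
    """Highest-odds category, or None if its odds do not exceed min_odds (assumed rule)."""
    name, _prob, odds = max(rows, key=lambda row: row[2])
    return name if odds > min_odds else None


calls = {group: top_call(rows) for group, rows in PROTFUN.items()}
for group, call in calls.items():
    print(f"{group}: {call}")
```

Under this assumed rule no enzyme class is called, because even the best class (Lyase, odds 0.961) stays below 1. The hydrolase class expected for glucosylceramidase is therefore not recovered at this level, in contrast to the GOPET result.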

Pfam

References

<references/>