Difference between revisions of "Task 3: odba human Sequence-based predictions"

From Bioinformatikpedia
The default cutoff for SignalPv3 to classify a protein as carrying a signal peptide is 0.48 for the Mean-S score. According to the old version of SignalP, our protein would therefore be classified as having a signal peptide. With the D-score (a weighted average of the S-mean and the Y-max scores), which is supposed to give the best discrimination, ODBA_Human falls just below the cutoff (0.425/0.430) and is no longer classified as having a signal peptide. In all other cases the Mean-S and D-scores agreed. The HMM prediction produced quite high confidence values for all four proteins, even when the Mean-S and D values from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method has been deprecated.
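The effect of the two cutoffs can be illustrated in a few lines. This is only a sketch: the cutoff values and the two ODBA_Human scores are taken from this page, while the function names, the equal weighting in `d_score` and the use of `>=` at the cutoff are assumptions and not SignalP's documented behaviour.

```python
# Cutoffs quoted on this page for the SignalP v3 neural network.
MEAN_S_CUTOFF = 0.48
D_CUTOFF = 0.430


def d_score(s_mean, y_max, weight=0.5):
    """Weighted average of S-mean and Y-max.

    weight=0.5 (a plain average) is an assumption; the page only says
    that the D-score is a weighted average of the two scores.
    """
    return weight * s_mean + (1.0 - weight) * y_max


def is_signal_peptide(score, cutoff):
    """Return True if the score reaches the cutoff (>= is an assumption)."""
    return score >= cutoff


# ODBA_Human values reported on this page: Mean-S 0.561, D-score 0.425.
by_mean_s = is_signal_peptide(0.561, MEAN_S_CUTOFF)
by_d_score = is_signal_peptide(0.425, D_CUTOFF)
```

With the values from this page, `by_mean_s` and `by_d_score` disagree, which is exactly the borderline situation described for ODBA_Human.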
Checking the predictions with SignalPeptide.de proved difficult: for P02768, searching by UniProt ID (ALBU_HUMAN), accession number (P02768) or sequence retrieved no results; only a search for the trivial name "Serum albumin" returned any hits. We had the same problem with P11279. For P47863 and ODBA_Human we found no entries in SignalPeptide.de. UniProt, however, marks positions 1 to 45 of the ODBA_Human sequence as a transit peptide and notes its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, a mitochondrial location seems plausible for a protein that metabolizes amino acids. For P47863, a lookup in UniProt as well as the prediction from PolyPhobius identified the protein as a membrane protein. Transmembrane helices seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% with the HMM approach in PolyPhobiusv3.
   
 
= GO terms =
The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence above 90% are spot on and remarkably detailed. However, ODBA_human is not actually annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more specific term GO:0004739. GOPET predicts several mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does in fact have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to; narrowing the possible functions down to these few terms is nevertheless a considerable feat. When searching the sequence of our protein against Pfam, we found that it belongs to the E1_dh family, where "dh" stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase.
<table border=1>
<tr><td colspan=4 align=center><b>GOPET</b></td></tr>
<tr><td>GO:0046872</td><td>F</td><td>62%</td><td>metal ion binding</td></tr>
</table>
Unfortunately, applying ProtFun to determine the protein function did not help much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%).
   
 
############## ProtFun 2.2 predictions ##############

Latest revision as of 16:12, 3 December 2012

Secondary structure

To predict secondary structure we use the following tools and compare the results:

-reprof
-psipred
-DSSP_Server

Methods

reprof

To run reprof from the command line, the following command is used:

reprof -i seq.fasta

reprof then calculates the secondary structure prediction and writes it to the output file "seq.reprof". reprof can be run on a single FASTA file or on a BLAST/HHBlits PSSM file. We tried both variants, because the PSSM variant promises more accurate results, and used HHBlits PSSM files for this purpose. Result: (H = Helix, E = Extended/Sheet, L = Loop)

reprof with fasta

odba_human
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656
LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543
HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567
EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138
LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

898975011024431728777888989998887256688886799
LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337
LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411
HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777
HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025
HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

65766771365554034675421654433056661778999988407888851138
LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420
LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624
EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763
LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554
EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554
HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

331156787677886677889
HHHLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524
LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

reprof with HHBlits - PSSM (up20)

To retrieve PSSM files from hhblits, the tool hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:

hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"

reprof is then run using the created PSSMs. Results:

odba_human - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978
LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777
HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855
LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736
LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

674889999984266986689888988999872585477765789
LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 - PSSM UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778
LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888
HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888
EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783
HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

16768915799999887624877067785067688677999998898767846738
LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614
HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630
LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999
EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666
HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

554655567766555544559
LLLLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843
LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

psipred

The version from [1] was used to predict secondary structure with psipred. Results:

odba_human
confidence
sec-structure
AA-sequence
915554344652010125789986408888898888999867679889999999999943
CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE

345544347897889982787589997259999999999999999999799999999999
CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY

984258734444798615999998531399982418689312241328998999998626
HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG

978889988877778888776122353334681566759888767099958999818886
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA

670358878577674079759999579823236884113641143204567987321028
CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

209999999999999999089974998642117999999999999997788866514995
CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP

799999999779999999999999999999999999992999996677866421799789
HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL

9999999999998299899998789
HHHHHHHHHHHHHHCCCCCCCCCCC
RKQQESLARHLQTYGEHYPLDHFDK
P10775
confidence
sec-structure
AA-sequence

989828999999999999677137869971179999887999999643699967898647
CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT

999958999999760499984138982169999455648999723799858897979999
CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC
NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL

926999999884299986488981148899014899999871699979997869999968
CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH
GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA

999999750299985368971389999776999999882199979796999999918999
HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA

998530499984148970289999665999999860499969996889999907999999
HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH
ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

980499986458961189789888999999882499749896999999925899999851
HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL

899961119980268899666999999984399979885999999938999999840699
CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC
SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP

997678830589887899999999995599831219
CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC
GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
confidence
sec-structure
AA-sequence

999988999887776433454799998899321139899989999999996317799999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV

999999999997219880321498447446663057899999816999998630024543
HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC
ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV

689971899999999834099957772157432122131554799995408989999985
CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH
DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD

429830331043915485046798999911110268999999999741002589999889
HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC
AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG

976434567889874069747689999987549803433323114013201134467999
CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC
NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

469980378752345882379998376020786434899998898885444531367899
CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK

999999984067865568988888820389999999997775556788889999842100
HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV

000137999999886667723566666677776665430699998898548887401200
HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH
LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN

38999989998864522211113457789999999999999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ
Q9X0E6
confidence
sec-structure
AA-sequence

999999079999999999998635611237776532476544610112148789766711
CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE
 
19999999986599966418998366555778899875229
CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

DSSP_Server

To use DSSP_Server we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:

UniProt ID   PDB IDs
P12694       1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F
P10775       1DFJ, 2BNH
Q08209       1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8
Q9X0E6       1KR4, 1O5J, 1VHF

DSSP_Server is then run for each UniProt ID with the corresponding PDB ID that has the best resolution or the greatest span over the protein. Results:

P12694 - 2BFD - Position 46-445
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S    TT      SS        SS S EE SB TTS BS GGG     HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS     TT HHHHHHHHHTS 
AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD

TTSEEE  S  HHHHHHTT  HHHHHHHHHT TT TTTT S SS   BTTTTB    SSTTTHHHHHHHHHHHHHHHT    EEEEEETTGGGSHHHHHH
NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG

HHHHHHTT  EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT  EEEEEE           HHHHHHHHH
FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL

TTTT   HHHHHHHHHHHHHHHHHHHHHHHHS B  GGGGSTTSSSS  HHHHHHHHHHHHHHHHHGGGS GGGB         S EEEE HHHHHHHHH
LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL

HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE  S HHHHHHHHHHHHHTT  EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS  TTEE
DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT

EEEEES  SS GGGSS   HHHHHTSTT EEE  SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS  EEEESS     SS  EEEE  SSEEEEE TTHH
IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV

HHHHHHHHHHHHHH   EEEEE  EEES  HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS  EEEEE SS   STTHHHHS  HHH
HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK

HHHHHHHHHT 
CYDALRKMINY
P10775 - 2BNH - position 1-456
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S B  EES    HHHHHHHHHHHTT SEEEEET    HHHHHHHHHHHTT TT  EEE  S   HHHHHHHHHHHHSSTT    EEE TTS   GGGGGS
AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV

HHHHHHH TT  EEE  S   HHHHHHHHHHHHHSTT    EEE TT   BHHHHHHHHHHHHH S   EEE TTSB HHHHHHHHHHHHHT  S   EE
LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL

E TTS   HHHHHHHHHHHHH TT  EEE  SS  HHHHHHHHHHHHT TT    EEE TTS   HHHHHHHHHHHHH SS  EEE TTS  HHHHHHHH
RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL

HHHHTSTT    EEE TTS  BGGGHHHHHHHHHH SS  EEE  SSB HHHHHHHHHHHTTSSS    EEE TTS   HHHHHHHHHHHHH  S  EEE
CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD

 TTSS  HHHHHHHHHHHTSSS    EEE TT    HHHHHHHHHHHHH SS EEE 
LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - 1AUI - position 1-521
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S   SSTTS       B HHHHB TTS B HHHHHHHHHTT  B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE   TT HHHHHHHHHHH  TTT  
ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR

EEE S  SSSSS HHHHHHHHHHHHHHSTTTEEE   TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS  EEEETTTEEEESS   TT  SHHHH
YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI

HHS  SSS  SSSHHHHHHH EE TTTTS SS   EEE TTTTSSEEE HHHHHHHHHHTT SEEEE  S  TTSEEE  B TTTSSBSEEEE   SSGG
RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL

GTS   EEEEEEETTEEEEEEE         GGG  HHHHHHHHHHHHHHHHHHHHHTT    HHHHHHHHGGGGS             S  HHHHHHHH
DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG

HHHHHH TT  SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT  HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS
KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL

 HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG     
KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV
Q9X0E6 - 1KR4 - position 1-101
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S  EE   EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS  EEEE    EEHHH
AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY

HHHHHHHTS  
XNWLRESVLGS
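
The result blocks above are wrapped into chunks of up to 100 residues (reprof, DSSP) or 60 residues (psipred). Before predictions can be compared position by position, the chunks have to be joined again. The following is a minimal sketch, assuming that only the data lines are passed in (without the legend lines), that chunks are separated by one empty line and that every chunk has the same number of rows; `stitch` is a hypothetical helper name.

```python
def stitch(text, rows=3):
    """Join wrapped chunks into `rows` full-length strings.

    rows=3 fits the reprof/psipred blocks (score, structure, sequence),
    rows=2 fits the DSSP blocks (structure, sequence).
    """
    tracks = [""] * rows
    for chunk in text.strip("\n").split("\n\n"):
        lines = chunk.split("\n")
        if len(lines) != rows:
            raise ValueError(f"expected {rows} rows per chunk, got {len(lines)}")
        # DSSP uses blanks for "no assignment", so short lines are padded
        # to the chunk width instead of being stripped.
        width = max(len(line) for line in lines)
        for i, line in enumerate(lines):
            tracks[i] += line.ljust(width)
    return tracks


# Tiny made-up example in the three-row layout (not real prediction data).
scores, structure, sequence = stitch("91\nLH\nMA\n\n7\nH\nK")
```

Padding instead of stripping matters for the DSSP blocks, where a blank is a valid state and trailing blanks are easily lost when the text is copied.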

Comparison

Method         query        TPs against DSSP   Q3
reprof fasta   odba_human   215 / 378          56.87%
reprof PSSM    odba_human   286 / 378          75.66%
psipred        odba_human   237 / 378          62.69%
reprof fasta   P10775       279 / 456          61.18%
reprof PSSM    P10775       342 / 456          75%
psipred        P10775       268 / 456          58.77%
reprof fasta   Q08209       211 / 380          55.52%
reprof PSSM    Q08209       299 / 380          78.68%
psipred        Q08209       218 / 380          57.36%
reprof fasta   Q9X0E6       67 / 110           60.90%
reprof PSSM    Q9X0E6       92 / 110           83.63%
psipred        Q9X0E6       92 / 110           83.63%

This table shows the result of our comparison. Psipred performs better than reprof with a single FASTA file on three of the four targets; the exception is P10775, where reprof reaches 61.18% and psipred 58.77%. With an HHBlits PSSM as query, reprof outperforms psipred in three of the four cases and ties with it in the fourth (Q9X0E6). The true positives (TPs) are the positions at which the predicted secondary structure matches the DSSP assignment, counted over the range covered by the DSSP file; Q3 is the number of TPs divided by the length of that range.
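
The Q3 values in the table can be computed along the following lines. This is a sketch under stated assumptions, not the script that produced the numbers above. The reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to loop) is a common convention, but this page does not say which reduction was actually used. The names `reduce_dssp`, `normalise_prediction` and `q3` are illustrative, and chain breaks in the DSSP output (shown as "!" above) are not handled.

```python
# Assumed 8-state to 3-state reduction; any other DSSP symbol becomes loop (L).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp):
    """Map a DSSP string (blank = no assignment) to the states H, E and L."""
    return "".join(DSSP_TO_3.get(state, "L") for state in dssp)


def normalise_prediction(prediction):
    """psipred writes coil as C, reprof writes it as L; use L for both."""
    return prediction.replace("C", "L")


def q3(prediction, dssp, offset=0):
    """Return (TPs, compared length, Q3 in percent).

    `offset` is the 0-based position in the prediction at which the DSSP
    range starts, e.g. 45 if the structure begins at residue 46.
    """
    observed = reduce_dssp(dssp)
    if not observed:
        raise ValueError("empty DSSP string")
    predicted = normalise_prediction(prediction)[offset:offset + len(observed)]
    if len(predicted) != len(observed):
        raise ValueError("prediction does not cover the DSSP range")
    tps = sum(p == o for p, o in zip(predicted, observed))
    return tps, len(observed), 100.0 * tps / len(observed)
```

Counting only over the DSSP range means that residues missing from the crystal structure, such as the transit peptide of P12694, do not enter the score.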

Disorder

To predict disorder in our protein ODBA_HUMAN, IUPred is used and compared to the entries in DisProt. As only the protein Q08209 can be found directly in DisProt, the "search by sequence" feature has to be used for the other proteins to check whether reliable hits can be found. The following entries were chosen:

up-ID    DisProt-ID   identities   positives   gaps   e-value   direct hit
Q08209   DP00092      100%         100%        0      -         y
P12694   -            0%           0%          0      -         n
P10775   DP00554      40%          54%         0      5e-30     n
Q9X0E6   DP00175      32%          56%         0      4.3       n

Q08209

DisProt
Region   type                    location    length
1        Disordered - Extended   1-13        13
2        Disordered - Extended   374-468     95
3        Disordered - Extended   390-414     25
4        Disordered - Extended   469-486     18
5        Disordered - Extended   487-521     35
6        ordered                 14-373      360
IUPred
Region   type                    location    length
1        short disordered        1 - 11      10
2        short disordered        13 - 13     0
3        short disordered        18 - 19     1
4        short disordered        24 - 24     0
5        short disordered        32 - 35     3
6        short disordered        434 - 434   0
7        short disordered        437 - 437   0
8        short disordered        460 - 460   0
9        short disordered        463 - 466   3
10       short disordered        469 - 521   52
11       long disordered         1 - 11      10
12       long disordered         13 - 13     0
13       long disordered         18 - 19     1
14       long disordered         24 - 24     0
15       long disordered         32 - 35     3
16       long disordered         434 - 434   0
17       long disordered         437 - 437   0
18       long disordered         460 - 460   0
19       long disordered         463 - 466   3
20       long disordered         469 - 521   52
21       short ordered           12 - 12     0
22       short ordered           14 - 17     3
23       short ordered           20 - 23     3
24       short ordered           25 - 31     6
25       short ordered           36 - 433    397
26       short ordered           435 - 436   1
27       short ordered           438 - 459   21
28       short ordered           461 - 462   1
29       short ordered           467 - 468   1
30       long ordered            12 - 12     0
31       long ordered            14 - 17     3
32       long ordered            20 - 23     3
33       long ordered            25 - 31     6
34       long ordered            36 - 433    397
35       long ordered            435 - 436   1
36       long ordered            438 - 459   21
37       long ordered            461 - 462   1
38       long ordered            467 - 468   1
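
One way to quantify how well the IUPred regions agree with the DisProt annotation is to compare them residue by residue. The sketch below does this for the disordered regions of Q08209 listed above. It assumes that start and end positions are both inclusive; `residues` and `compare` are illustrative names and not part of IUPred or DisProt.

```python
def residues(regions):
    """Expand (start, end) regions, both inclusive, into a set of positions."""
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return covered


def compare(annotated, predicted):
    """Count residues disordered in both sources or in only one of them."""
    a, p = residues(annotated), residues(predicted)
    return {
        "both": len(a & p),
        "annotated_only": len(a - p),
        "predicted_only": len(p - a),
    }


# Disordered regions of Q08209 as listed in the tables on this page.
disprot = [(1, 13), (374, 468), (390, 414), (469, 486), (487, 521)]
iupred = [(1, 11), (13, 13), (18, 19), (24, 24), (32, 35),
          (434, 434), (437, 437), (460, 460), (463, 466), (469, 521)]

summary = compare(disprot, iupred)
```

Working on sets of positions makes overlapping annotations harmless: the DisProt region 390-414 lies inside 374-468 and is therefore not counted twice.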

P12694

DisProt
Region   type                    location    length
N/A      N/A                     N/A         N/A
IUPred
Region   type                    location    length
1        short disordered        1 - 1       0
2        short disordered        33 - 55     22
3        short disordered        92 - 93     1
4        short disordered        393 - 411   18
5        short disordered        415 - 415   0
6        short disordered        420 - 421   1
7        short disordered        423 - 425   2
8        short disordered        427 - 428   1
9        short disordered        433 - 433   0
10       short disordered        438 - 445   7
11       long disordered         1 - 1       0
12       long disordered         33 - 55     22
13       long disordered         92 - 93     1
14       long disordered         393 - 411   18
15       long disordered         415 - 415   0
16       long disordered         420 - 421   1
17       long disordered         423 - 425   2
18       long disordered         427 - 428   1
19       long disordered         433 - 433   0
20       long disordered         438 - 445   7
21       short ordered           2 - 32      30
22       short ordered           56 - 91     35
23       short ordered           94 - 392    298
24       short ordered           412 - 414   2
25       short ordered           416 - 419   3
26       short ordered           422 - 422   0
27       short ordered           426 - 426   0
28       short ordered           429 - 432   3
29       short ordered           434 - 437   3
30       long ordered            2 - 32      30
31       long ordered            56 - 91     35
32       long ordered            94 - 392    298
33       long ordered            412 - 414   2
34       long ordered            416 - 419   3
35       long ordered            422 - 422   0
36       long ordered            426 - 426   0
37       long ordered            429 - 432   3
38       long ordered            434 - 437   3

P10775

DisProt
Region   type                    location    length
1        Disordered              31 - 50     20
IUPred
Region   type                    location    length
1        short disordered        1 - 5       4
2        short disordered        452 - 456   4
3        long disordered         1 - 5       4
4        long disordered         452 - 456   4
5        short ordered           6 - 451     445
6        long ordered            6 - 451     445

Q9X0E6

DisProt
Region   type                    location    length
1        Disordered              1 - 56      56
IUPred
Region   type                    location    length
1        short ordered           1 - 101     100
2        long ordered            1 - 101     100

Transmembrane helices

For the prediction of transmembrane helices in our protein we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)) we also applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 (Voltage-gated potassium channel) and P47863 (Aquaporin-4).

PolyPhobius predicts ODBA_HUMAN to be a completely cytoplasmic protein without any transmembrane regions; we found no entry in any of the databases that indicates otherwise.

P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.

3PBL » Dopamine D3 receptor, image from OPM.
Region            PolyPhobius     UniProt         OPM             PDBTM
                  Start  Stop     Start  Stop     Start  Stop     Start  Stop
1.transmembrane   30     55       33     55       34     52       35     52
2.transmembrane   66     88       66     88       67     91       68     84
3.transmembrane   105    126      105    126      101    126      109    123
4.transmembrane   150    170      150    170      150    170      152    166
5.transmembrane   188    212      188    212      187    209      191    206
6.transmembrane   329    352      330    351      330    351      334    347
7.transmembrane   367    386      367    388      363    386      368    382

As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different sources, the exact start and stop positions differ somewhat. Depending on which database we choose as the standard of truth, we get slightly different results when evaluating the performance of the transmembrane region prediction.
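
How much the result depends on the chosen reference can be made concrete by measuring the overlap of each predicted region with its counterpart in one database. The sketch below does this for the PolyPhobius and UniProt columns of the P35462 table; the helper name `overlap` and the choice of shared residues divided by the union (Jaccard index) as the measure are assumptions made for illustration.

```python
def overlap(a, b):
    """Return (shared residues, shared / union) for two inclusive ranges."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    union = (a_end - a_start + 1) + (b_end - b_start + 1) - shared
    return shared, shared / union


# P35462: start/stop pairs copied from the table on this page.
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
uniprot = [(33, 55), (66, 88), (105, 126), (150, 170),
           (188, 212), (330, 351), (367, 388)]

per_region = [overlap(p, u) for p, u in zip(polyphobius, uniprot)]
```

Swapping `uniprot` for the OPM or PDBTM columns gives the corresponding values for the other references.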

For Q9YDF8 PolyPhobius again predicts seven transmembrane regions.

1ORQ » Potassium channel KvAP, image from OPM.
Region            PolyPhobius     UniProt         OPM             PDBTM
                  Start  Stop     Start  Stop     Start  Stop     Start  Stop
1.transmembrane   42     60       39     63       153    172      21     52
2.transmembrane   68     88       68     92       183    195      57     80
3.transmembrane   108    129      109    125      207    225      151    171
4.transmembrane   137    157      129    145      -      -        *184   *200
5.transmembrane   163    184      160    184      -      -        209    236
6.transmembrane   196    213      *196   *208     -      -        -      -
7.transmembrane   224    244      222    253      -      -        -      -

In this case, however, the number of transmembrane regions found per database varies greatly. UniProt notes six transmembrane regions and one intermembrane region (marked with *), which mostly overlap the prediction of PolyPhobius. The 1ORQ structure of the protein, in contrast, has four identical subunits that all have the same three transmembrane regions. According to PDBTM there are four transmembrane regions and one intermembrane region for each of the identical subunits. For this protein, identification of transmembrane regions seems to be quite difficult, as there is so little consensus across the different databases. The problem seems to be caused by the shallow angles at which most of the helices enter the membrane, and by the fact that only very few of them actually cross the membrane.

For P47863 PolyPhobius predicts six transmembrane regions.

2D57 » Aquaporin-4, image from OPM.
Region PolyPhobius Start Stop UniProt Start Stop OPM Start Stop PDBTM Start Stop
1.transmembrane 34 58 37 57 34 56 38 55
2.transmembrane 70 91 65 85 70 88 72 89
3.transmembrane 115 136 116 136 98 107 *94 *106
4.transmembrane 156 177 156 176 112 136 116 133
5.transmembrane 188 208 185 205 156 178 158 177
6.transmembrane 231 252 232 252 189 203 188 205
7.transmembrane - - - - 214 223 *209 *222
8.transmembrane - - - - 231 252 231 248

The third and seventh transmembrane regions are listed in OPM despite the fact that they are too short to span a membrane. Most likely these are intermembrane helices, as marked in PDBTM (as the illustration to the left shows, two helices of the yellow subunit do not span the membrane).
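The length argument can be checked with a few lines of code. The sketch below computes the length of each OPM segment from the table and flags those below a minimum length. The threshold of 15 residues is an illustrative assumption of ours, not a parameter of OPM, PDBTM or PolyPhobius.

```python
# OPM segments for P47863, copied from the table above as (start, stop).
OPM_SEGMENTS = [
    (34, 56), (70, 88), (98, 107), (112, 136),
    (156, 178), (189, 203), (214, 223), (231, 252),
]

# Assumption: illustrative minimum length for a membrane-spanning helix.
MIN_SPANNING_LENGTH = 15


def segment_length(segment):
    """Length in residues of an inclusive (start, stop) segment."""
    start, stop = segment
    return stop - start + 1


def too_short(segments, min_length=MIN_SPANNING_LENGTH):
    """Return the 1-based numbers of regions shorter than min_length."""
    return [
        number
        for number, segment in enumerate(segments, start=1)
        if segment_length(segment) < min_length
    ]


print(too_short(OPM_SEGMENTS))  # prints [3, 7]
```

Both flagged segments are 10 residues long, and they are the same two regions that PDBTM marks with *.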

From the analysis of these three proteins we conclude that the results produced by PolyPhobius seem reasonably good. While PolyPhobius seems to filter out segments that are too short to be a transmembrane helix, it does not seem to differentiate well between intermembrane and transmembrane helices, especially when the angle of entry is very shallow. This in turn can confuse the tool's sense of the inside and outside of a protein, which leads to a decrease in performance.

Other tools for predicting transmembrane regions include MEMSAT3, MINNOU, PHDhtm, TMHMM2, DAS, HMMTOP2, OCTOPUS, SVMtop, PONGO and BPROMPT.

Signal peptides

For the prediction of signal peptides we used SignalP in versions 3 (offline version) and 4 (web server). In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we also applied the method to P02768 (Serum albumin), P11279 (Lysosome-associated membrane glycoprotein 1) and P47863 (Aquaporin-4).

Protein SignalPv3 NN MeanS-Score Hmm-Confidence SignalPv4 NN MeanS-Score SignalPeptide.de
ODBA_Human 0.561 0.723 0.357 no entry
P02768 0.941 0.967 0.890 confirmed
P11279 0.961 1.000 0.962 confirmed
P47863 0.376 0.723 0.139 no entry

The default cutoff for SignalPv3 to consider a protein to carry a signal peptide is 0.48 for the Mean-S score. According to the old version of SignalP, our protein would therefore be classified as having a signal peptide. When using the D-score (a weighted average of the S-mean and the Y-max scores), which is supposed to show the best discrimination, ODBA_HUMAN falls just below the cutoff (0.425/0.430) and would no longer be classified as having a signal peptide. In all other cases there was no disagreement between the Mean-S and D scores. The HMM prediction produced quite high confidence values for all four proteins, even when the Mean-S and D values from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method was deprecated.

Checking the predictions with SignalPeptide.de proved difficult: for P02768, searching by UniProt ID (ALBU_HUMAN), by accession number (P02768) or by sequence did not retrieve any results; only searching for the trivial name Serum albumin returned a hit. We had the same problem with P11279. For P47863 and ODBA_HUMAN we found no entries in SignalPeptide.de. UniProt, however, marks positions 1 to 45 of the ODBA_HUMAN sequence as a transit peptide and notes its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, a mitochondrial location seems plausible for a protein that metabolizes amino acids. For P47863, a lookup in UniProt as well as the prediction from PolyPhobius identified the protein as a membrane protein. Transmembrane regions seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% from the HMM approach in SignalPv3.
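The cutoff logic for SignalPv3 can be written out as a short sketch. The Mean-S values come from the table above and the cutoffs from the text. Reading "(0.425/0.430)" as D-score 0.425 against a cutoff of 0.430 is our interpretation, and the strict "greater than" comparison is an assumption that does not affect any of the four calls.

```python
# Cutoffs for SignalPv3 as stated in the text.
MEAN_S_CUTOFF = 0.48
D_CUTOFF = 0.430  # assumption: second number in "(0.425/0.430)"

# SignalPv3 neural network Mean-S scores from the table above.
MEAN_S_V3 = {
    "ODBA_HUMAN": 0.561,
    "P02768": 0.941,
    "P11279": 0.961,
    "P47863": 0.376,
}

# D-score of ODBA_HUMAN, assumed to be the first number in "(0.425/0.430)".
ODBA_D_SCORE = 0.425


def exceeds_cutoff(score, cutoff):
    """True if the score is above the cutoff (strict comparison assumed)."""
    return score > cutoff


for protein, score in MEAN_S_V3.items():
    call = "signal peptide" if exceeds_cutoff(score, MEAN_S_CUTOFF) else "no signal peptide"
    print(f"{protein}: Mean-S {score:.3f} -> {call}")

d_call = "signal peptide" if exceeds_cutoff(ODBA_D_SCORE, D_CUTOFF) else "no signal peptide"
print(f"ODBA_HUMAN: D-score {ODBA_D_SCORE:.3f} -> {d_call}")
```

For ODBA_HUMAN the two calls differ, which is exactly the disagreement described above.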

GO terms

The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence of 90% or above are spot on and remarkably detailed. However, ODBA_HUMAN is not actually annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more specific term GO:0004739. GOPET predicts multiple mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to; reducing the possible functions to these few select terms is nevertheless a considerable feat. When searching the sequence of our protein against Pfam, we found that it belongs to the E1_dh family, where dh stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase.

GOPET
GOid Aspect Confidence GO-Term
GO:0003824 F 97% catalytic activity
GO:0016491 F 96% oxidoreductase activity
GO:0016624 F 95% oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863 F 90% 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739 F 89% pyruvate dehydrogenase acetyl-transferring activity
GO:0004738 F 78% pyruvate dehydrogenase activity
GO:0003826 F 77% alpha-ketoacid dehydrogenase activity
GO:0047101 F 75% 2-oxoisovalerate dehydrogenase acylating activity
GO:0008677 F 65% 2-dehydropantoate 2-reductase activity
GO:0019152 F 63% acetoin dehydrogenase activity
GO:0030955 F 63% potassium ion binding
GO:0016616 F 62% oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872 F 62% metal ion binding
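To work with these predictions programmatically, the rows can be split into their four fields and filtered by confidence. The following is a minimal sketch; the row layout (GO id, one-letter aspect, percentage, term name) is inferred from the table above rather than from a GOPET format specification, and the 90% threshold is the one discussed in the text.

```python
import re
from typing import NamedTuple, Optional

# Rows copied from the GOPET table; the pattern accepts them with or
# without spaces between the fields.
ROWS = """\
GO:0003824F97%catalytic activity
GO:0016491F96%oxidoreductase activity
GO:0016624F95%oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863F90%3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739F89%pyruvate dehydrogenase acetyl-transferring activity
GO:0004738F78%pyruvate dehydrogenase activity
GO:0003826F77%alpha-ketoacid dehydrogenase activity
GO:0047101F75%2-oxoisovalerate dehydrogenase acylating activity
GO:0008677F65%2-dehydropantoate 2-reductase activity
GO:0019152F63%acetoin dehydrogenase activity
GO:0030955F63%potassium ion binding
GO:0016616F62%oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872F62%metal ion binding
"""

ROW_PATTERN = re.compile(r"^(GO:\d{7})\s*([FPC])\s*(\d+)%\s*(.+)$")


class GoPrediction(NamedTuple):
    go_id: str
    aspect: str
    confidence: int  # percent
    term: str


def parse_row(line: str) -> Optional[GoPrediction]:
    """Parse one table row; return None for lines that do not match."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    go_id, aspect, confidence, term = match.groups()
    return GoPrediction(go_id, aspect, int(confidence), term)


terms = [t for t in map(parse_row, ROWS.splitlines()) if t is not None]
confident = [t for t in terms if t.confidence >= 90]

for t in confident:
    print(f"{t.go_id}  {t.confidence}%  {t.term}")

# The parent term is predicted with lower confidence than its child.
by_id = {t.go_id: t for t in terms}
print(by_id["GO:0004738"].confidence < by_id["GO:0004739"].confidence)
```

With a threshold of 90% the filter keeps the first four rows, ending with the 3-methyl-2-oxobutanoate dehydrogenase term.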


Unfortunately, applying ProtFun to determine the protein function did not help much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%).

############## ProtFun 2.2 predictions ##############

>sp_P12694_O

# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.187    8.520
 Biosynthesis_of_cofactors            0.246    3.413
 Cell_envelope                        0.035    0.581
 Cellular_processes                   0.041    0.560
 Central_intermediary_metabolism   => 0.321    5.096
 Energy_metabolism                    0.208    2.310
 Fatty_acid_metabolism                0.023    1.738
 Purines_and_pyrimidines              0.257    1.059
 Regulatory_functions                 0.031    0.194
 Replication_and_transcription        0.170    0.636
 Translation                          0.047    1.078
 Transport_and_binding                0.029    0.071

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.769    2.683
 Nonenzyme                            0.231    0.324

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.178    0.857
 Transferase    (EC 2.-.-.-)          0.238    0.690
 Hydrolase      (EC 3.-.-.-)          0.190    0.601
 Lyase          (EC 4.-.-.-)          0.076    1.614
 Isomerase      (EC 5.-.-.-)          0.010    0.321
 Ligase         (EC 6.-.-.-)       => 0.085    1.673

# Gene Ontology category               Prob     Odds
 Signal_transducer                    0.098    0.458
 Receptor                             0.006    0.038
 Hormone                              0.001    0.206
 Structural_protein                   0.005    0.170
 Transporter                          0.025    0.226
 Ion_channel                          0.009    0.163
 Voltage-gated_ion_channel            0.004    0.170
 Cation_channel                       0.010    0.215
 Transcription                        0.060    0.470
 Transcription_regulation             0.053    0.427
 Stress_response                      0.010    0.110
 Immune_response                      0.012    0.136
 Growth_factor                        0.009    0.609
 Metal_ion_transport                  0.012    0.025

//
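The 33% observation can be reproduced by parsing the ProtFun report and keeping every category whose probability exceeds the threshold. The sketch below uses a short excerpt of the output above. The parsing rules (skip lines starting with # or >, treat "=>" as an optional marker) are inferred from this one report and are not taken from a ProtFun format specification.

```python
import re

# Excerpt of the ProtFun 2.2 report shown above.
REPORT = """\
# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.187    8.520
 Central_intermediary_metabolism   => 0.321    5.096

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.769    2.683
 Nonenzyme                            0.231    0.324

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.178    0.857
 Ligase         (EC 6.-.-.-)       => 0.085    1.673
//
"""

# Category name, optional "=>" marker, probability, odds.
LINE_PATTERN = re.compile(
    r"^\s*(?P<name>\S.*?)\s+(?:=>\s+)?(?P<prob>\d+\.\d+)\s+(?P<odds>\d+\.\d+)\s*$"
)


def parse_report(text):
    """Return (name, probability, odds) for every data line of a report."""
    entries = []
    for line in text.splitlines():
        if line.startswith(("#", ">")):
            continue  # section headers and the sequence identifier
        match = LINE_PATTERN.match(line)
        if match:
            name = " ".join(match["name"].split())  # collapse padding
            entries.append((name, float(match["prob"]), float(match["odds"])))
    return entries


entries = parse_report(REPORT)
confident = [(name, prob) for name, prob, _ in entries if prob > 0.33]
print(confident)  # prints [('Enzyme', 0.769)]
```

Central_intermediary_metabolism, the best functional category, reaches only 0.321 and therefore stays just below the threshold.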