Latest revision as of 16:12, 3 December 2012

Secondary structure

To predict secondary structure we use the following tools and compare the results:

-reprof
-psipred
-DSSP_Server

Methods

reprof

To run reprof from the command line, the following command is used:

reprof -i seq.fasta

reprof then calculates the secondary structure prediction and writes it to the output file "seq.reprof". reprof can be run either on a single FASTA file or on a PSSM file generated by BLAST or HHblits. We tried both variants, because the PSSM input promises more accurate results, and used HHblits PSSM files for this purpose. Results (H = helix, E = extended/sheet, L = loop):

reprof with fasta

ODBA_HUMAN
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656
LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543
HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567
EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138
LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

898975011024431728777888989998887256688886799
LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK

P10775
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337
LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411
HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777
HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025
HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

65766771365554034675421654433056661778999988407888851138
LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

Q08209
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420
LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624
EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763
LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554
EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554
HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

331156787677886677889
HHHLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ

Q9X0E6
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524
LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

reprof with HHBlits - PSSM (up20)

To generate PSSM files with HHblits, the script hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:

hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"

reprof is then run on the resulting PSSM files. Results:

ODBA_HUMAN - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978
LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777
HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855
LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736
LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

674889999984266986689888988999872585477765789
LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK

P10775 - PSSM UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778
LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888
HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888
EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783
HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

16768915799999887624877067785067688677999998898767846738
LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

Q08209 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614
HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630
LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999
EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666
HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

554655567766555544559
LLLLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ

Q9X0E6 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843
LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

psipred

The version from [1] was used to predict secondary structure with psipred. Results (H = helix, E = strand, C = coil):

ODBA_HUMAN
confidence
sec-structure
AA-sequence
915554344652010125789986408888898888999867679889999999999943
CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE

345544347897889982787589997259999999999999999999799999999999
CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY

984258734444798615999998531399982418689312241328998999998626
HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG

978889988877778888776122353334681566759888767099958999818886
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA

670358878577674079759999579823236884113641143204567987321028
CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

209999999999999999089974998642117999999999999997788866514995
CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP

799999999779999999999999999999999999992999996677866421799789
HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL

9999999999998299899998789
HHHHHHHHHHHHHHCCCCCCCCCCC
RKQQESLARHLQTYGEHYPLDHFDK

P10775
confidence
sec-structure
AA-sequence

989828999999999999677137869971179999887999999643699967898647
CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT

999958999999760499984138982169999455648999723799858897979999
CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC
NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL

926999999884299986488981148899014899999871699979997869999968
CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH
GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA

999999750299985368971389999776999999882199979796999999918999
HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA

998530499984148970289999665999999860499969996889999907999999
HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH
ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

980499986458961189789888999999882499749896999999925899999851
HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL

899961119980268899666999999984399979885999999938999999840699
CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC
SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP

997678830589887899999999995599831219
CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC
GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

Q08209
confidence
sec-structure
AA-sequence

999988999887776433454799998899321139899989999999996317799999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV

999999999997219880321498447446663057899999816999998630024543
HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC
ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV

689971899999999834099957772157432122131554799995408989999985
CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH
DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD

429830331043915485046798999911110268999999999741002589999889
HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC
AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG

976434567889874069747689999987549803433323114013201134467999
CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC
NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

469980378752345882379998376020786434899998898885444531367899
CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK

999999984067865568988888820389999999997775556788889999842100
HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV

000137999999886667723566666677776665430699998898548887401200
HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH
LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN

38999989998864522211113457789999999999999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ

Q9X0E6
confidence
sec-structure
AA-sequence

999999079999999999998635611237776532476544610112148789766711
CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE
 
19999999986599966418998366555778899875229
CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

DSSP_Server

To use DSSP_Server, we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:

UniProt ID	PDB IDs
P12694	1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F
P10775	1DFJ, 2BNH
Q08209	1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8
Q9X0E6	1KR4, 1O5J, 1VHF

DSSP_Server is then run for each UniProt ID on the associated PDB entry with the best resolution or the greatest coverage of the protein. Results:

P12694 - 2BFD - Position 46-445
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S    TT      SS        SS S EE SB TTS BS GGG     HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS     TT HHHHHHHHHTS 
AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD

TTSEEE  S  HHHHHHTT  HHHHHHHHHT TT TTTT S SS   BTTTTB    SSTTTHHHHHHHHHHHHHHHT    EEEEEETTGGGSHHHHHH
NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG

HHHHHHTT  EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT  EEEEEE           HHHHHHHHH
FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL

TTTT   HHHHHHHHHHHHHHHHHHHHHHHHS B  GGGGSTTSSSS  HHHHHHHHHHHHHHHHHGGGS GGGB         S EEEE HHHHHHHHH
LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL

HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE  S HHHHHHHHHHHHHTT  EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS  TTEE
DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT

EEEEES  SS GGGSS   HHHHHTSTT EEE  SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS  EEEESS     SS  EEEE  SSEEEEE TTHH
IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV

HHHHHHHHHHHHHH   EEEEE  EEES  HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS  EEEEE SS   STTHHHHS  HHH
HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK

HHHHHHHHHT 
CYDALRKMINY

P10775 - 2BNH - position 1-456
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S B  EES    HHHHHHHHHHHTT SEEEEET    HHHHHHHHHHHTT TT  EEE  S   HHHHHHHHHHHHSSTT    EEE TTS   GGGGGS
AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV

HHHHHHH TT  EEE  S   HHHHHHHHHHHHHSTT    EEE TT   BHHHHHHHHHHHHH S   EEE TTSB HHHHHHHHHHHHHT  S   EE
LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL

E TTS   HHHHHHHHHHHHH TT  EEE  SS  HHHHHHHHHHHHT TT    EEE TTS   HHHHHHHHHHHHH SS  EEE TTS  HHHHHHHH
RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL

HHHHTSTT    EEE TTS  BGGGHHHHHHHHHH SS  EEE  SSB HHHHHHHHHHHTTSSS    EEE TTS   HHHHHHHHHHHHH  S  EEE
CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD

 TTSS  HHHHHHHHHHHTSSS    EEE TT    HHHHHHHHHHHHH SS EEE 
LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

Q08209 - 1AUI - position 1-521
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S   SSTTS       B HHHHB TTS B HHHHHHHHHTT  B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE   TT HHHHHHHHHHH  TTT  
ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR

EEE S  SSSSS HHHHHHHHHHHHHHSTTTEEE   TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS  EEEETTTEEEESS   TT  SHHHH
YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI

HHS  SSS  SSSHHHHHHH EE TTTTS SS   EEE TTTTSSEEE HHHHHHHHHHTT SEEEE  S  TTSEEE  B TTTSSBSEEEE   SSGG
RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL

GTS   EEEEEEETTEEEEEEE         GGG  HHHHHHHHHHHHHHHHHHHHHTT    HHHHHHHHGGGGS             S  HHHHHHHH
DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG

HHHHHH TT  SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT  HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS
KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL

 HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG     
KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV

Q9X0E6 - 1KR4 - position 1-101
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S  EE   EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS  EEEE    EEHHH
AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY

HHHHHHHTS  
XNWLRESVLGS

Comparison

Method	Query	TPs against DSSP	Q3
reprof fasta	ODBA_HUMAN	215 / 378	56.87%
reprof PSSM	ODBA_HUMAN	286 / 378	75.66%
psipred	ODBA_HUMAN	237 / 378	62.69%
reprof fasta	P10775	279 / 456	61.18%
reprof PSSM	P10775	342 / 456	75.00%
psipred	P10775	268 / 456	58.77%
reprof fasta	Q08209	211 / 380	55.52%
reprof PSSM	Q08209	299 / 380	78.68%
psipred	Q08209	218 / 380	57.36%
reprof fasta	Q9X0E6	67 / 110	60.90%
reprof PSSM	Q9X0E6	92 / 110	83.63%
psipred	Q9X0E6	92 / 110	83.63%

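This table shows the result of our comparison. With a single FASTA file as input, reprof is outperformed by psipred on three of the four targets (ODBA_HUMAN, Q08209 and Q9X0E6); only for P10775 does reprof score higher (61.18% versus 58.77%). With an HHblits PSSM as input, reprof outperforms psipred in three of four cases and ties with it on Q9X0E6. The true positives (TPs) are the positions at which the predicted secondary structure state matches the DSSP assignment, counted over the residue range covered by the DSSP file; Q3 is this count divided by the number of compared residues.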
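The Q3 values in the table can be recomputed along the following lines. This is a minimal sketch, not the script we used: it assumes the common reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else, including blanks, to loop), and it assumes that the prediction has already been trimmed to the residue range of the DSSP file. The function names and the short example strings are made up for illustration.

```python
# Assumed 8-to-3 state reduction: H/G/I -> H, E/B -> E, everything else -> L.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss):
    """Map a DSSP, reprof or psipred string to H/E/L.

    Blanks, T, S, C (psipred coil) and L (reprof loop) all become L.
    """
    return "".join(DSSP_TO_3.get(c, "L") for c in ss.upper())


def q3(predicted, observed):
    """Return (matches, compared residues, percentage) for two aligned strings."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be aligned and of equal length")
    pred3 = to_three_state(predicted)
    obs3 = to_three_state(observed)
    matches = sum(p == o for p, o in zip(pred3, obs3))
    return matches, len(obs3), 100.0 * matches / len(obs3)


# Hypothetical toy example, not taken from the results above.
pred = "CHHHHEEEELL"
dssp = " HHGGEEB TS"
matches, compared, percent = q3(pred, dssp)
print(f"{matches} / {compared} = {percent:.2f}%")
```

A different 8-to-3 reduction (for example counting G or B as loop) would shift the absolute numbers, so the mapping should be stated alongside any Q3 value.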

Disorder

To predict disorder in our protein ODBA_HUMAN, IUPred is used, and the results are compared to the entries in DisProt. Since only Q08209 can be found directly in DisProt, the "search by sequence" feature had to be used for the other proteins to check whether reliable hits exist. The following entries were chosen:

up-ID	DisProt-ID	identities	positives	gaps	e-value	direct hit
Q08209	DP00092	100%	100%	0	-	y
P12694	-	0%	0%	0	-	n
P10775	DP00554	40%	54%	0	5e-30	n
Q9X0E6	DP00175	32%	56%	0	4.3	n

Q08209

DisProt
Region	type	location	length
1	Disordered - Extended	1-13	13
2	Disordered - Extended	374-468	95
3	Disordered - Extended	390-414	25
4	Disordered - Extended	469-486	18
5	Disordered - Extended	487-521	35
6	ordered	14-373	360

IUPred
Region	type	location	length (stop - start)
1	short disordered	1 - 11	10
2	short disordered	13 - 13	0
3	short disordered	18 - 19	1
4	short disordered	24 - 24	0
5	short disordered	32 - 35	3
6	short disordered	434 - 434	0
7	short disordered	437 - 437	0
8	short disordered	460 - 460	0
9	short disordered	463 - 466	3
10	short disordered	469 - 521	52
11	long disordered	1 - 11	10
12	long disordered	13 - 13	0
13	long disordered	18 - 19	1
14	long disordered	24 - 24	0
15	long disordered	32 - 35	3
16	long disordered	434 - 434	0
17	long disordered	437 - 437	0
18	long disordered	460 - 460	0
19	long disordered	463 - 466	3
20	long disordered	469 - 521	52
21	short ordered	12 - 12	0
22	short ordered	14 - 17	3
23	short ordered	20 - 23	3
24	short ordered	25 - 31	6
25	short ordered	36 - 433	397
26	short ordered	435 - 436	1
27	short ordered	438 - 459	21
28	short ordered	461 - 462	1
29	short ordered	467 - 468	1
30	long ordered	12 - 12	0
31	long ordered	14 - 17	3
32	long ordered	20 - 23	3
33	long ordered	25 - 31	6
34	long ordered	36 - 433	397
35	long ordered	435 - 436	1
36	long ordered	438 - 459	21
37	long ordered	461 - 462	1
38	long ordered	467 - 468	1

P12694

DisProt
Region	type	location	length
N/A	N/A	N/A	N/A

IUPred
Region	type	location	length (stop - start)
1	short disordered	1 - 1	0
2	short disordered	33 - 55	22
3	short disordered	92 - 93	1
4	short disordered	393 - 411	18
5	short disordered	415 - 415	0
6	short disordered	420 - 421	1
7	short disordered	423 - 425	2
8	short disordered	427 - 428	1
9	short disordered	433 - 433	0
10	short disordered	438 - 445	7
11	long disordered	1 - 1	0
12	long disordered	33 - 55	22
13	long disordered	92 - 93	1
14	long disordered	393 - 411	18
15	long disordered	415 - 415	0
16	long disordered	420 - 421	1
17	long disordered	423 - 425	2
18	long disordered	427 - 428	1
19	long disordered	433 - 433	0
20	long disordered	438 - 445	7
21	short ordered	2 - 32	30
22	short ordered	56 - 91	35
23	short ordered	94 - 392	298
24	short ordered	412 - 414	2
25	short ordered	416 - 419	3
26	short ordered	422 - 422	0
27	short ordered	426 - 426	0
28	short ordered	429 - 432	3
29	short ordered	434 - 437	3
30	long ordered	2 - 32	30
31	long ordered	56 - 91	35
32	long ordered	94 - 392	298
33	long ordered	412 - 414	2
34	long ordered	416 - 419	3
35	long ordered	422 - 422	0
36	long ordered	426 - 426	0
37	long ordered	429 - 432	3
38	long ordered	434 - 437	3

P10775

DisProt
Region	type	location	length
1	Disordered	31 - 50	20

IUPred
Region	type	location	length (stop - start)
1	short disordered	1 - 5	4
2	short disordered	452 - 456	4
3	long disordered	1 - 5	4
4	long disordered	452 - 456	4
5	short ordered	6 - 451	445
6	long ordered	6 - 451	445

Q9X0E6

DisProt
Region	type	location	length
1	Disordered	1 - 56	56

IUPred
Region	type	location	length (stop - start)
1	short ordered	1 - 101	100
2	long ordered	1 - 101	100
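Region tables like the IUPred ones in this section can be derived from per-residue scores roughly as sketched below. This is an illustration under assumptions, not the procedure we ran: it assumes one IUPred score per residue and the conventional threshold of 0.5 for calling a residue disordered, and the scores in the example are invented. Note that the sketch counts region lengths inclusively (stop - start + 1), as in the DisProt tables, whereas the IUPred tables list stop - start.

```python
from itertools import groupby


def regions(scores, threshold=0.5):
    """Group per-residue scores into (state, start, stop, length) runs.

    Positions are 1-based and inclusive; length is the number of residues.
    """
    out = []
    position = 1
    for disordered, run in groupby(scores, key=lambda s: s > threshold):
        n = len(list(run))
        state = "disordered" if disordered else "ordered"
        out.append((state, position, position + n - 1, n))
        position += n
    return out


# Hypothetical scores for a six-residue toy sequence.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
for row in regions(toy_scores):
    print(row)
```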

Transmembrane helices

For the prediction of transmembrane helices we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 (voltage-gated potassium channel) and P47863 (Aquaporin-4).

PolyPhobius predicts ODBA_HUMAN to be an entirely cytosolic protein without any transmembrane regions; we found no entry in any of the databases that indicates otherwise.

P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.

3PBL » Dopamine D3 receptor, image from OPM.

Region	PolyPhobius Start	Stop	UniProt Start	Stop	OPM Start	Stop	PDBTM Start	Stop
1.transmembrane	30	55	33	55	34	52	35	52
2.transmembrane	66	88	66	88	67	91	68	84
3.transmembrane	105	126	105	126	101	126	109	123
4.transmembrane	150	170	150	170	150	170	152	166
5.transmembrane	188	212	188	212	187	209	191	206
6.transmembrane	329	352	330	351	330	351	334	347
7.transmembrane	367	386	367	388	363	386	368	382

As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different sources, there are some differences in the exact start and stop positions. Depending on which database we choose as the standard of truth, we therefore get slightly different results when evaluating the performance of the transmembrane region prediction.
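How much the boundaries differ can be quantified with a small sketch like the following, which uses the start and stop positions from the P35462 table. The function names are made up for illustration, and pairing the segments by index is an assumption that only holds when every source reports the same number of regions, as is the case for this protein.

```python
# Inclusive (start, stop) positions copied from the P35462 table.
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
references = {
    "UniProt": [(33, 55), (66, 88), (105, 126), (150, 170),
                (188, 212), (330, 351), (367, 388)],
    "OPM": [(34, 52), (67, 91), (101, 126), (150, 170),
            (187, 209), (330, 351), (363, 386)],
    "PDBTM": [(35, 52), (68, 84), (109, 123), (152, 166),
              (191, 206), (334, 347), (368, 382)],
}


def overlap(a, b):
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def boundary_offsets(predicted, reference):
    """Per segment: (start offset, stop offset, overlap), paired by index."""
    return [(p[0] - r[0], p[1] - r[1], overlap(p, r))
            for p, r in zip(predicted, reference)]


for name, segments in references.items():
    offsets = boundary_offsets(polyphobius, segments)
    mean_abs = sum(abs(s) + abs(e) for s, e, _ in offsets) / (2 * len(offsets))
    print(name, offsets, f"mean |offset| = {mean_abs:.2f}")
```

For proteins where the sources disagree on the number of regions, segments would have to be matched by overlap rather than by index.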

For Q9YDF8 PolyPhobius again predicts seven transmembrane regions.

1ORQ » Potassium channel KvAP, image from OPM.

Region	PolyPhobius Start	Stop	UniProt Start	Stop	OPM Start	Stop	PDBTM Start	Stop
1.transmembrane	42	60	39	63	153	172	21	52
2.transmembrane	68	88	68	92	183	195	57	80
3.transmembrane	108	129	109	125	207	225	151	171
4.transmembrane	137	157	129	145			*184	*200
5.transmembrane	163	184	160	184			209	236
6.transmembrane	196	213	*196	*208
7.transmembrane	224	244	222	253

In this case, however, the number of transmembrane regions differs considerably between the databases. UniProt lists six transmembrane regions and one intramembrane region (marked with *), which mostly overlap with the PolyPhobius prediction. The 1ORQ structure, in contrast, consists of four identical subunits, for each of which OPM lists the same three transmembrane regions. According to PDBTM there are four transmembrane regions and one intramembrane region for each of the identical subunits. For this protein, identifying the transmembrane regions seems to be quite difficult, as there is so little consensus across the databases. The problem seems to be caused by the shallow angles at which most of the helices enter the membrane, and by the fact that only very few of them actually cross it.

For P47863 PolyPhobius predicts six transmembrane regions.

2D57 » Aquaporin-4, image from OPM.

Region	PolyPhobius Start	Stop	UniProt Start	Stop	OPM Start	Stop	PDBTM Start	Stop
1.transmembrane	34	58	37	57	34	56	38	55
2.transmembrane	70	91	65	85	70	88	72	89
3.transmembrane	115	136	116	136	98	107	*94	*106
4.transmembrane	156	177	156	176	112	136	116	133
5.transmembrane	188	208	185	205	156	178	158	177
6.transmembrane	231	252	232	252	189	203	188	205
7.transmembrane					214	223	*209	*222
8.transmembrane					231	252	231	248

The third and seventh transmembrane regions are listed in OPM even though they are too short to span a membrane. Most likely these are intramembrane helices, as marked in PDBTM (as can be seen in the illustration, two helices of the yellow subunit do not span the membrane).

From the analysis of these three proteins we conclude that the results produced by PolyPhobius seem reasonably good. While PolyPhobius seems to filter out segments that are too short to be a transmembrane helix, it does not seem to differentiate well between intramembrane and transmembrane helices, especially when the angle of entry is very shallow. This in turn can confuse the tool's sense of which side of the membrane each part of the protein lies on, which leads to a decrease in performance.

Other tools for predicting transmembrane regions include MEMSAT3, MINNOU, PHDhtm, TMHMM2, DAS, HMMTOP2, OCTOPUS, SVMtop, PONGO and BPROMPT.

Signal peptides

For the prediction of signal peptides we used SignalP in versions 3 (offline version) and 4 (web server). In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we applied the method to P02768 (Serum albumin), P11279 (Lysosome-associated membrane glycoprotein 1) and P47863 (Aquaporin-4).

Protein	SignalPv3 NN MeanS-Score	Hmm-Confidence	SignalPv4 NN MeanS-Score	SignalPeptide.de
ODBA_HUMAN	0.561	0.723	0.357	no entry
P02768	0.941	0.967	0.890	confirmed
P11279	0.961	1.000	0.962	confirmed
P47863	0.376	0.723	0.139	no entry

The default Mean-S cutoff above which SignalPv3 predicts a signal peptide is 0.48, so according to the old version of SignalP our protein would be classified as carrying a signal peptide. When using the D-score (a weighted average of the S-mean and the Y-max scores), which is supposed to show the best discrimination, ODBA_HUMAN falls just below the cutoff (0.425 against a cutoff of 0.430) and is no longer classified as carrying a signal peptide. In all other cases there was no disagreement between Mean-S and D-score. The HMM prediction produced quite high confidence values for all four proteins, even when the Mean-S and D values from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method was deprecated.

Checking the predictions with SignalPeptide.de proved difficult: for P02768, neither a search by UniProt ID (ALBU_HUMAN), by accession number (P02768) nor by sequence retrieved any results; only a search for the trivial name "Serum albumin" returned hits. We had the same problem with P11279. For P47863 and ODBA_HUMAN we found no entries in SignalPeptide.de.

UniProt, however, annotates positions 1 to 45 of the ODBA_HUMAN sequence as a transit peptide and gives its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, a mitochondrial location seems likely for a protein that metabolizes amino acids. For P47863, both a lookup in UniProt and the prediction from PolyPhobius identified the protein as a membrane protein. The transmembrane helices of such proteins seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% from the HMM approach in SignalPv3.
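The cutoff logic described above can be sketched as follows. The scores are the SignalPv3 neural-network Mean-S values from the table, and the D-score and its cutoff for ODBA_HUMAN are the values quoted in the text; the helper name is made up for illustration, and treating a score strictly above the cutoff as a positive call is an assumption.

```python
MEAN_S_CUTOFF_V3 = 0.48  # default SignalPv3 Mean-S cutoff quoted in the text

# SignalPv3 NN Mean-S scores from the table.
mean_s_v3 = {
    "ODBA_HUMAN": 0.561,
    "P02768": 0.941,
    "P11279": 0.961,
    "P47863": 0.376,
}


def exceeds(score, cutoff):
    """True if the score lies above the cutoff (assumed strict comparison)."""
    return score > cutoff


calls = {protein: exceeds(score, MEAN_S_CUTOFF_V3)
         for protein, score in mean_s_v3.items()}
print(calls)

# D-score versus cutoff for ODBA_HUMAN, as reported in the text.
print("ODBA_HUMAN by D-score:", exceeds(0.425, 0.430))
```

The two calls for ODBA_HUMAN disagree, which is exactly the borderline case discussed above.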

GO terms

The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence of 90% or above are spot on and remarkably detailed. However, ODBA_Human is actually not annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more specific term GO:0004739. GOPET predicts multiple mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does in fact have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to; reducing the possible functions to these few select terms is, however, already a considerable feat. When searching the sequence of our protein against Pfam, we found that it belongs to the E1_dh family, where the dh stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase.

GOPET
GOid	Aspect	Confidence	GO-Term
GO:0003824	F	97%	catalytic activity
GO:0016491	F	96%	oxidoreductase activity
GO:0016624	F	95%	oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863	F	90%	3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739	F	89%	pyruvate dehydrogenase acetyl-transferring activity
GO:0004738	F	78%	pyruvate dehydrogenase activity
GO:0003826	F	77%	alpha-ketoacid dehydrogenase activity
GO:0047101	F	75%	2-oxoisovalerate dehydrogenase acylating activity
GO:0008677	F	65%	2-dehydropantoate 2-reductase activity
GO:0019152	F	63%	acetoin dehydrogenase activity
GO:0030955	F	63%	potassium ion binding
GO:0016616	F	62%	oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872	F	62%	metal ion binding
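
To make the confidence filtering explicit, the following Python sketch selects the GOPET predictions at or above a chosen confidence. The rows are copied from the table above; the function name and the inclusive threshold (>= 90) are our own assumptions, chosen so that the term predicted at exactly 90% is kept.

```python
# (GO id, confidence in percent, GO term), copied from the GOPET table above.
GOPET_ROWS = [
    ("GO:0003824", 97, "catalytic activity"),
    ("GO:0016491", 96, "oxidoreductase activity"),
    ("GO:0016624", 95, "oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor"),
    ("GO:0003863", 90, "3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity"),
    ("GO:0004739", 89, "pyruvate dehydrogenase acetyl-transferring activity"),
    ("GO:0004738", 78, "pyruvate dehydrogenase activity"),
    ("GO:0003826", 77, "alpha-ketoacid dehydrogenase activity"),
    ("GO:0047101", 75, "2-oxoisovalerate dehydrogenase acylating activity"),
    ("GO:0008677", 65, "2-dehydropantoate 2-reductase activity"),
    ("GO:0019152", 63, "acetoin dehydrogenase activity"),
    ("GO:0030955", 63, "potassium ion binding"),
    ("GO:0016616", 62, "oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor"),
    ("GO:0046872", 62, "metal ion binding"),
]


def filter_by_confidence(rows, min_percent):
    """Hypothetical helper: keep rows whose confidence is >= min_percent."""
    return [row for row in rows if row[1] >= min_percent]


# Assumption: the threshold is inclusive, so the 90% term is retained.
confident = filter_by_confidence(GOPET_ROWS, 90)
for go_id, percent, term in confident:
    print(f"{go_id}\t{percent}%\t{term}")
```

With this threshold, four terms remain, ranging from the general catalytic activity down to the specific 3-methyl-2-oxobutanoate dehydrogenase term.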

Unfortunately, applying ProtFun to determine the protein function did not help much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%).

############## ProtFun 2.2 predictions ##############

>sp_P12694_O

# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.187    8.520
 Biosynthesis_of_cofactors            0.246    3.413
 Cell_envelope                        0.035    0.581
 Cellular_processes                   0.041    0.560
 Central_intermediary_metabolism   => 0.321    5.096
 Energy_metabolism                    0.208    2.310
 Fatty_acid_metabolism                0.023    1.738
 Purines_and_pyrimidines              0.257    1.059
 Regulatory_functions                 0.031    0.194
 Replication_and_transcription        0.170    0.636
 Translation                          0.047    1.078
 Transport_and_binding                0.029    0.071

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.769    2.683
 Nonenzyme                            0.231    0.324

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.178    0.857
 Transferase    (EC 2.-.-.-)          0.238    0.690
 Hydrolase      (EC 3.-.-.-)          0.190    0.601
 Lyase          (EC 4.-.-.-)          0.076    1.614
 Isomerase      (EC 5.-.-.-)          0.010    0.321
 Ligase         (EC 6.-.-.-)       => 0.085    1.673

# Gene Ontology category               Prob     Odds
 Signal_transducer                    0.098    0.458
 Receptor                             0.006    0.038
 Hormone                              0.001    0.206
 Structural_protein                   0.005    0.170
 Transporter                          0.025    0.226
 Ion_channel                          0.009    0.163
 Voltage-gated_ion_channel            0.004    0.170
 Cation_channel                       0.010    0.215
 Transcription                        0.060    0.470
 Transcription_regulation             0.053    0.427
 Stress_response                      0.010    0.110
 Immune_response                      0.012    0.136
 Growth_factor                        0.009    0.609
 Metal_ion_transport                  0.012    0.025

//
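
The ProtFun report above is plain text, so the rows can be pulled out with a small parser and filtered by probability. The sketch below is an assumption about how one might do this, not part of ProtFun itself; it works on an excerpt of the output, and the regular expression and function name are our own.

```python
import re

# Excerpt of the ProtFun 2.2 output shown above (not the full report).
PROTFUN_EXCERPT = """\
# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.187    8.520
 Central_intermediary_metabolism   => 0.321    5.096
# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.769    2.683
 Nonenzyme                            0.231    0.324
# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.178    0.857
 Ligase         (EC 6.-.-.-)       => 0.085    1.673
"""

# A data row is: name, optional "=>" marker, probability, odds.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s+(?:=>\s+)?(?P<prob>\d+\.\d+)\s+(?P<odds>\d+\.\d+)\s*$"
)


def parse_protfun(text):
    """Hypothetical helper: return (name, prob, odds) for each data row."""
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):  # skip section headers
            continue
        match = ROW.match(line)
        if match:
            name = " ".join(match["name"].split())  # collapse padding
            rows.append((name, float(match["prob"]), float(match["odds"])))
    return rows


rows = parse_protfun(PROTFUN_EXCERPT)
# Keep only statements with a probability above 33%.
confident = [(name, prob) for name, prob, _ in rows if prob > 0.33]
print(confident)
```

On this excerpt only the Enzyme row passes the 33% filter, which matches the observation made in the text.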

Difference between revisions of "Task 3: odba human Sequence-based predictions"

Contents

secondary structure

Methods

reprof

reprof with fasta

reprof with HHBlits - PSSM (up20)

psipred

DSSP_Server

Comparison

Disorder

Q08209

P12694

P10775

Q9X0E6

Transmembrane helices

Signal peptides

GO terms

@@ Line 632: / Line 632: @@
 = Transmembrane helices =
-For the prediction of Transmembrane helices in our Protein we used [http://phobius.sbc.su.se/poly.html PolyPhobius]. In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we applied the method as well to [http://www.uniprot.org/uniprot/P35462 P35462](D(3) dopamine receptor), [http://www.uniprot.org/uniprot/Q9YDF8 Q9YDF8] and [http://www.uniprot.org/uniprot/P47863 P477863].
+For the prediction of transmembrane helices in our protein we used [http://phobius.sbc.su.se/poly.html PolyPhobius]. In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we also applied the method to [http://www.uniprot.org/uniprot/P35462 P35462] (D(3) dopamine receptor), [http://www.uniprot.org/uniprot/Q9YDF8 Q9YDF8] (Voltage-gated potassium channel) and [http://www.uniprot.org/uniprot/P47863 P47863] (Aquaporin-4).
-Polyphobius predicts ODBA_HUMAN to be a completly cytosomal protein without any transmembrane regions.
+PolyPhobius predicts ODBA_HUMAN to be a completely cytosolic protein without any transmembrane regions; we found no entry in any of the databases that indicates otherwise.
 P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.
+[[File:3pbl.png|200px|thumb|left|3PBL » Dopamine D3 receptor, image from OPM.]]
-{| style="text-align: right;"
+{|class="wikitable" border="1" style="border-spacing:0;text-align: right;"
 !Region
 !PolyPhobius Start
@@ Line 719: / Line 720: @@
 |-
 |}
-As can be seen in the table above, the number of transmembrane regions is the same of all three databases and for the prediction. While the transmembrane regions largely overlap between the different information sources, there are some differences regarding the exact start and stop positions of the transmembrane regions.
+As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different information sources, there are some differences regarding the exact start and stop positions of the transmembrane regions. Depending on which database we choose as the standard of truth, we get slightly different results when evaluating the performance of the transmembrane region prediction.
+For Q9YDF8 PolyPhobius again predicts seven transmembrane regions.
-{| style="text-align: right;"
+[[File:1orq.png|200px|thumb|left|1ORQ » Potassium channel KvAP, image from OPM.]]
-!Database
-!True Positives
+{|class="wikitable" border="1" style="border-spacing:0;text-align: right;"
-!False Positives
+!Region
-!True Negatives
+!PolyPhobius Start
-!False Negatives
+!Stop
+!UniProt Start
+!Stop
+!OPM Start
+!Stop
+!PDBTM Start
+!Stop
 |-
+|1.transmembrane
-|UniProt
-|149
+|42
-|5
+|60
+|39
+|63
+|153
+|172
+|21
+|52
+|-
+|2.transmembrane
+|68
+|88
+|68
+|92
+|183
+|195
+|57
+|80
+|-
+|3.transmembrane
+|108
+|129
+|109
+|125
+|207
+|225
+|151
+|171
+|-
+|4.transmembrane
+|137
+|157
+|129
+|145
 |
-|2
+|
+|*184
+|*200
 |-
+|5.transmembrane
-|OPM
-|141
+|163
+|184
-|7+1+3
+|160
+|184
+|
+|
+|209
+|236
+|-
+|6.transmembrane
+|196
+|213
+|*196
+|*208
+|
+|
+|
+|
+|-
+|7.transmembrane
+|224
+|244
+|222
+|253
+|
+|
+|
 |
-|2
 |-
 |}
+In this case, however, the number of transmembrane regions found per database varies greatly. While UniProt notes six transmembrane regions and one intermembrane region (marked with *) that mostly overlap the PolyPhobius prediction, the 1orq structure of the protein has, according to OPM, four identical subunits that all have the same three transmembrane regions. According to PDBTM there are four transmembrane regions and one intermembrane region for each of the identical subunits. For this protein, identification of transmembrane regions seems to be quite difficult, as there is so little consensus across the different databases. The problem seems to be caused by the shallow angles at which most of the helices enter the membrane, and by the fact that only very few of them actually cross the membrane.
+For P47863 PolyPhobius predicts six transmembrane regions.
+[[File:2d57.png|200px|thumb|left|2D57 » Aquaporin-4, image from OPM.]]
+{|class="wikitable" border="1" style="border-spacing:0;text-align: right;"
+!Region
+!PolyPhobius Start
+!Stop
+!UniProt Start
+!Stop
+!OPM Start
+!Stop
+!PDBTM Start
+!Stop
+|-
+|1.transmembrane
+|34
+|58
+|37
+|57
+|34
+|56
+|38
+|55
+|-
+|2.transmembrane
+|70
+|91
+|65
+|85
+|70
+|88
+|72
+|89
+|-
+|3.transmembrane
+|115
+|136
+|116
+|136
+|98
+|107
+|*94
+|*106
+|-
+|4.transmembrane
+|156
+|177
+|156
+|176
+|112
+|136
+|116
+|133
+|-
+|5.transmembrane
+|188
+|208
+|185
+|205
+|156
+|178
+|158
+|177
+|-
+|6.transmembrane
+|231
+|252
+|232
+|252
+|189
+|203
+|188
+|205
+|-
+|7.transmembrane
+|
+|
+|
+|
+|214
+|223
+|*209
+|*222
+|-
+|8.transmembrane
+|
+|
+|
+|
+|231
+|252
+|231
+|248
+|-
+|}
+The third and seventh transmembrane regions are listed in OPM despite the fact that they are actually too short to span a membrane. Most likely these are intermembrane helices, as marked in PDBTM (as can be seen in the illustration to the left, two helices of the yellow subunit do not actually span the membrane).
+From the analysis of the three proteins we conclude that the results produced by PolyPhobius seem reasonably good. While PolyPhobius seems to filter out results that are too short to actually be a transmembrane helix, it does not seem to differentiate well between intermembrane and transmembrane helices, especially when the angle of entry is very shallow. This in turn can confuse the tool's sense of the inside and outside of a protein, which leads to a decrease in performance.
+Some other tools to predict transmembrane regions are for example [http://bioinf.cs.ucl.ac.uk/software_downloads/memsat/ MEMSAT3], [http://minnou.cchmc.org/ MINNOU], [http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_htm.html PHDhtm], [http://www.cbs.dtu.dk/services/TMHMM/ TMHMM2], [http://www.sbc.su.se/~miklos/DAS/maindas.html DAS], [http://www.enzim.hu/hmmtop/ HMMTOP2], [http://octopus.cbr.su.se/index.php?about=OCTOPUS OCTOPUS], [http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop/tmp/index.php SVMtop], [http://pongo.biocomp.unibo.it/ PONGO] or [http://www.ddg-pharmfac.net/bprompt/BPROMPT/BPROMPT.html BPROMPT].
 = Signal peptides =
+For the prediction of signal peptides we used [http://www.cbs.dtu.dk/services/SignalP/ SignalP] in versions 3 (offline version) and 4 (web server). In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we also applied the method to [http://www.uniprot.org/uniprot/P02768 P02768] (Serum albumin), [http://www.uniprot.org/uniprot/P11279 P11279] (Lysosome-associated membrane glycoprotein 1) and [http://www.uniprot.org/uniprot/P47863 P47863] (Aquaporin-4).
+{|class="wikitable" border="1" style="border-spacing:0;text-align: right;"
+!Protein
+!SignalPv3 NN Mean-S Score
+!SignalPv3 HMM Confidence
+!SignalPv4 NN Mean-S Score
+!SignalPeptide.de
+|-
+|ODBA_Human
+|0.561
+|0.723
+|0.357
+|no entry
+|-
+|P02768
+|0.941
+|0.967
+|0.890
+|confirmed
+|-
+|P11279
+|0.961
+|1.000
+|0.962
+|confirmed
+|-
+|P47863
+|0.376
+|0.723
+|0.139
+|no entry
+|-
+|}
+The default SignalPv3 cutoff for the Mean-S score is 0.48, so according to the old version of SignalP our protein (Mean-S 0.561) would be classified as having a signal peptide. With the D-score (a weighted average of the S-mean and Y-max scores), which is supposed to discriminate best, ODBA_Human falls just below the cutoff (0.425/0.430) and would no longer be classified as having a signal peptide. In all other cases the Mean-S and D-scores agreed. The HMM prediction produced quite high confidence values for all four proteins, even where the Mean-S and D values from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method has been deprecated.
+Checking the predictions against SignalPeptide.de proved difficult: for P02768, searching by UniProt ID (ALBU_HUMAN), accession number (P02768) or sequence retrieved no results; only a search for the trivial name Serum albumin returned an entry. We had the same problem with P11279. For P47863 and ODBA_Human we found no entries in SignalPeptide.de. UniProt, however, marks positions 1 to 45 of the ODBA_Human sequence as a transit peptide and notes its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, a mitochondrial location seems plausible for a protein that metabolizes amino acids. For P47863, a lookup in UniProt as well as the prediction from PolyPhobius identified the protein as a membrane protein. Transmembrane helices seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% from the HMM approach in SignalPv3.
 = GO terms =
+The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence of 90% or above are spot on and remarkably detailed. However, ODBA_Human is actually not annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more specific term GO:0004739. GOPET predicts multiple mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does in fact have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to; reducing the possible functions to these few select terms is, however, already a considerable feat. When searching the sequence of our protein against Pfam, we found that it belongs to the E1_dh family, where the dh stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase.
 <table border=1>
 <tr><td colspan=4 align=center><b>GOPET</b></td></tr>
@@ Line 761: / Line 970: @@
 <tr><td>GO:0046872</td><td>F</td><td>62%</td><td>metal ion binding</td></tr>
 </table>
+Unfortunately, applying ProtFun to determine the protein function did not help much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%).
  ############## ProtFun 2.2 predictions ##############