Difference between revisions of "Task 3: odba human Sequence-based predictions"

From Bioinformatikpedia

Revision as of 18:59, 14 May 2012

Secondary structure

To predict secondary structure we use the following tools and compare the results:

-reprof
-psipred
-DSSP_Server

Methods

reprof

To run reprof from the command line, the following command is used:

reprof -i seq.fasta

reprof then computes the secondary structure prediction and writes it to the output file "seq.reprof". reprof can be run on a single FASTA file or on a BLAST/HHBlits PSSM file. We tried both variants because the PSSM variant promises more accurate results, and we used HHBlits PSSM files for this purpose. Results (H = helix, E = extended/sheet, L = loop):

reprof with fasta

odba_human
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656
LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543
HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567
EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138
LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

898975011024431728777888989998887256688886799
LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337
LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411
HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777
HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025
HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

65766771365554034675421654433056661778999988407888851138
LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420
LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624
EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763
LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554
EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554
HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

331156787677886677889
HHHLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524
LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

reprof with HHBlits - PSSM (up20)

To retrieve PSSM files from HHBlits, the tool hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:

hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"

reprof is then run on the generated PSSMs. Results:

odba_human - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978
LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE

8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777
HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER

6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855
LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736
LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK

674889999984266986689888988999872585477765789
LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 - PSSM UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778
LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL

9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888
HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR

5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888
EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783
HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL

16768915799999887624877067785067688677999998898767846738
LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK

9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614
HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG

4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630
LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999
EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI

9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666
HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN

554655567766555544559
LLLLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence

9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843
LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV

9
L
L

psipred

The version from [1] was used to predict secondary structure with psipred. Results:

odba_human
confidence
sec-structure
AA-sequence
915554344652010125789986408888898888999867679889999999999943
CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE

345544347897889982787589997259999999999999999999799999999999
CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY

984258734444798615999998531399982418689312241328998999998626
HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG

978889988877778888776122353334681566759888767099958999818886
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA

670358878577674079759999579823236884113641143204567987321028
CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG

209999999999999999089974998642117999999999999997788866514995
CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP

799999999779999999999999999999999999992999996677866421799789
HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL

9999999999998299899998789
HHHHHHHHHHHHHHCCCCCCCCCCC
RKQQESLARHLQTYGEHYPLDHFDK
P10775
confidence
sec-structure
AA-sequence

989828999999999999677137869971179999887999999643699967898647
CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT

999958999999760499984138982169999455648999723799858897979999
CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC
NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL

926999999884299986488981148899014899999871699979997869999968
CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH
GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA

999999750299985368971389999776999999882199979796999999918999
HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA

998530499984148970289999665999999860499969996889999907999999
HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH
ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC

980499986458961189789888999999882499749896999999925899999851
HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL

899961119980268899666999999984399979885999999938999999840699
CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC
SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP

997678830589887899999999995599831219
CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC
GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
confidence
sec-structure
AA-sequence

999988999887776433454799998899321139899989999999996317799999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV

999999999997219880321498447446663057899999816999998630024543
HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC
ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV

689971899999999834099957772157432122131554799995408989999985
CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH
DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD

429830331043915485046798999911110268999999999741002589999889
HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC
AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG

976434567889874069747689999987549803433323114013201134467999
CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC
NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP

469980378752345882379998376020786434899998898885444531367899
CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK

999999984067865568988888820389999999997775556788889999842100
HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV

000137999999886667723566666677776665430699998898548887401200
HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH
LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN

38999989998864522211113457789999999999999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ
Q9X0E6
confidence
sec-structure
AA-sequence

999999079999999999998635611237776532476544610112148789766711
CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE
 
19999999986599966418998366555778899875229
CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

DSSP_Server

To use DSSP_Server, we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:

UniProt ID  PDB IDs
P12694      1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F
P10775      1DFJ, 2BNH
Q08209      1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8
Q9X0E6      1KR4, 1O5J, 1VHF

DSSP_Server is then run for each UniProt ID with the corresponding PDB ID that has the best resolution or the greatest span over the protein. Results:

P12694 - 2BFD - Position 46-445
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S    TT      SS        SS S EE SB TTS BS GGG     HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS     TT HHHHHHHHHTS 
AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD

TTSEEE  S  HHHHHHTT  HHHHHHHHHT TT TTTT S SS   BTTTTB    SSTTTHHHHHHHHHHHHHHHT    EEEEEETTGGGSHHHHHH
NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG

HHHHHHTT  EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT  EEEEEE           HHHHHHHHH
FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL

TTTT   HHHHHHHHHHHHHHHHHHHHHHHHS B  GGGGSTTSSSS  HHHHHHHHHHHHHHHHHGGGS GGGB         S EEEE HHHHHHHHH
LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL

HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE  S HHHHHHHHHHHHHTT  EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS  TTEE
DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT

EEEEES  SS GGGSS   HHHHHTSTT EEE  SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS  EEEESS     SS  EEEE  SSEEEEE TTHH
IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV

HHHHHHHHHHHHHH   EEEEE  EEES  HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS  EEEEE SS   STTHHHHS  HHH
HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK

HHHHHHHHHT 
CYDALRKMINY
P10775 - 2BNH - position 1-456
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S B  EES    HHHHHHHHHHHTT SEEEEET    HHHHHHHHHHHTT TT  EEE  S   HHHHHHHHHHHHSSTT    EEE TTS   GGGGGS
AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV

HHHHHHH TT  EEE  S   HHHHHHHHHHHHHSTT    EEE TT   BHHHHHHHHHHHHH S   EEE TTSB HHHHHHHHHHHHHT  S   EE
LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL

E TTS   HHHHHHHHHHHHH TT  EEE  SS  HHHHHHHHHHHHT TT    EEE TTS   HHHHHHHHHHHHH SS  EEE TTS  HHHHHHHH
RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL

HHHHTSTT    EEE TTS  BGGGHHHHHHHHHH SS  EEE  SSB HHHHHHHHHHHTTSSS    EEE TTS   HHHHHHHHHHHHH  S  EEE
CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD

 TTSS  HHHHHHHHHHHTSSS    EEE TT    HHHHHHHHHHHHH SS EEE 
LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - 1AUI - position 1-521
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S   SSTTS       B HHHHB TTS B HHHHHHHHHTT  B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE   TT HHHHHHHHHHH  TTT  
ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR

EEE S  SSSSS HHHHHHHHHHHHHHSTTTEEE   TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS  EEEETTTEEEESS   TT  SHHHH
YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI

HHS  SSS  SSSHHHHHHH EE TTTTS SS   EEE TTTTSSEEE HHHHHHHHHHTT SEEEE  S  TTSEEE  B TTTSSBSEEEE   SSGG
RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL

GTS   EEEEEEETTEEEEEEE         GGG  HHHHHHHHHHHHHHHHHHHHHTT    HHHHHHHHGGGGS             S  HHHHHHHH
DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG

HHHHHH TT  SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT  HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS
KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL

 HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG     
KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV
Q9X0E6 - 1KR4 - position 1-101
sec-structure 
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence

S  EE   EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS  EEEE    EEHHH
AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY

HHHHHHHTS  
XNWLRESVLGS

Comparison

Method        Query       TPs against DSSP  Q3
reprof fasta  odba_human  215 / 378         56.87%
reprof PSSM   odba_human  286 / 378         75.66%
psipred       odba_human  237 / 378         62.69%
reprof fasta  P10775      279 / 456         61.18%
reprof PSSM   P10775      342 / 456         75%
psipred       P10775      268 / 456         58.77%
reprof fasta  Q08209      211 / 380         55.52%
reprof PSSM   Q08209      299 / 380         78.68%
psipred       Q08209      218 / 380         57.36%
reprof fasta  Q9X0E6      67 / 110          60.90%
reprof PSSM   Q9X0E6      92 / 110          83.63%
psipred       Q9X0E6      92 / 110          83.63%

This table shows the result of our comparison. The true positives (TPs) are the positions at which the predicted secondary structure matches the DSSP assignment, counted over the range covered by the DSSP file; Q3 is the percentage of such positions. psipred performs better than reprof with a single FASTA file on three of the four targets; the exception is P10775, where reprof reaches 61.18% and psipred 58.77%. With an HHBlits PSSM as input, reprof outperforms psipred in three of the four cases and ties with it in the fourth (Q9X0E6).
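The Q3 values can be recomputed along the following lines. This is a minimal sketch, not the script used for the table: the reduction of DSSP's eight states to three (H/G/I to H, E/B to E, everything else to L) is a common convention and an assumption here, and the function names are hypothetical.

```python
# Assumed 8-state -> 3-state reduction; any state not listed (T, S, blank) becomes L.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map a DSSP string (blank = no assignment) to the three states H, E, L."""
    return "".join(DSSP_TO_3.get(state, "L") for state in dssp)


def normalise_prediction(pred: str) -> str:
    """Treat psipred's 'C' (coil) like reprof's 'L' (loop)."""
    return pred.replace("C", "L")


def q3(pred: str, dssp: str) -> float:
    """Percentage of aligned positions where prediction and DSSP agree.

    Both strings must already cover the same residue range, e.g. the
    prediction cut down to the positions present in the DSSP file.
    """
    if not pred or len(pred) != len(dssp):
        raise ValueError("prediction and DSSP string must be non-empty and of equal length")
    observed = reduce_dssp(dssp)
    predicted = normalise_prediction(pred)
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

A different reduction (for example counting B as loop) changes the number of matches, so Q3 values are only comparable when the same mapping is used for all methods.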

Disorder

To predict disorder in our protein odba_human, IUPred is used and the result is compared to the entries in DisProt. As only the protein Q08209 can be found directly in DisProt, the feature "search by sequence" has to be used for the other proteins to check whether reliable hits can be found. The following entries were chosen:

UniProt ID  DisProt ID  Identities  Positives  Gaps  E-value  Direct hit
Q08209      DP00092     100%        100%       0     -        y
P12694      -           0%          0%         0     -        n
P10775      DP00554     40%         54%        0     5e-30    n
Q9X0E6      DP00175     32%         56%        0     4.3      n

Q08209

DisProt
Region  Type                   Location   Length
1       Disordered - Extended  1-13       13
2       Disordered - Extended  374-468    95
3       Disordered - Extended  390-414    25
4       Disordered - Extended  469-486    18
5       Disordered - Extended  487-521    35
6       ordered                14-373     360
IUPred
Region  Type              Location   Length
1       short disordered  1 - 11     10
2       short disordered  13 - 13    0
3       short disordered  18 - 19    1
4       short disordered  24 - 24    0
5       short disordered  32 - 35    3
6       short disordered  434 - 434  0
7       short disordered  437 - 437  0
8       short disordered  460 - 460  0
9       short disordered  463 - 466  3
10      short disordered  469 - 521  52
11      long disordered   1 - 11     10
12      long disordered   13 - 13    0
13      long disordered   18 - 19    1
14      long disordered   24 - 24    0
15      long disordered   32 - 35    3
16      long disordered   434 - 434  0
17      long disordered   437 - 437  0
18      long disordered   460 - 460  0
19      long disordered   463 - 466  3
20      long disordered   469 - 521  52
21      short ordered     12 - 12    0
22      short ordered     14 - 17    3
23      short ordered     20 - 23    3
24      short ordered     25 - 31    6
25      short ordered     36 - 433   397
26      short ordered     435 - 436  1
27      short ordered     438 - 459  21
28      short ordered     461 - 462  1
29      short ordered     467 - 468  1
30      long ordered      12 - 12    0
31      long ordered      14 - 17    3
32      long ordered      20 - 23    3
33      long ordered      25 - 31    6
34      long ordered      36 - 433   397
35      long ordered      435 - 436  1
36      long ordered      438 - 459  21
37      long ordered      461 - 462  1
38      long ordered      467 - 468  1
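Region tables like the IUPred ones can be derived from per-residue disorder scores by grouping consecutive residues on one side of a cut-off. The sketch below assumes a list of scores in sequence order and IUPred's usual cut-off of 0.5; the function names and the choice of ">=" at the boundary are assumptions, not taken from the original analysis.

```python
from typing import List, Tuple


def disordered_regions(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, stop) runs with score >= threshold."""
    regions = []
    start = None  # start of the run currently being extended, if any
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:  # close a run that reaches the last residue
        regions.append((start, len(scores)))
    return regions


def region_length(start: int, stop: int) -> int:
    """Number of residues in an inclusive region."""
    return stop - start + 1
```

Note that the IUPred tables list the length as stop minus start, so a single-residue region such as 13 - 13 appears with length 0, whereas the DisProt tables count inclusively (1-13 has length 13). `region_length` uses the inclusive count.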

P12694

DisProt
Region  Type  Location  Length
N/A     N/A   N/A       N/A
IUPred
Region  Type              Location   Length
1       short disordered  1 - 1      0
2       short disordered  33 - 55    22
3       short disordered  92 - 93    1
4       short disordered  393 - 411  18
5       short disordered  415 - 415  0
6       short disordered  420 - 421  1
7       short disordered  423 - 425  2
8       short disordered  427 - 428  1
9       short disordered  433 - 433  0
10      short disordered  438 - 445  7
11      long disordered   1 - 1      0
12      long disordered   33 - 55    22
13      long disordered   92 - 93    1
14      long disordered   393 - 411  18
15      long disordered   415 - 415  0
16      long disordered   420 - 421  1
17      long disordered   423 - 425  2
18      long disordered   427 - 428  1
19      long disordered   433 - 433  0
20      long disordered   438 - 445  7
21      short ordered     2 - 32     30
22      short ordered     56 - 91    35
23      short ordered     94 - 392   298
24      short ordered     412 - 414  2
25      short ordered     416 - 419  3
26      short ordered     422 - 422  0
27      short ordered     426 - 426  0
28      short ordered     429 - 432  3
29      short ordered     434 - 437  3
30      long ordered      2 - 32     30
31      long ordered      56 - 91    35
32      long ordered      94 - 392   298
33      long ordered      412 - 414  2
34      long ordered      416 - 419  3
35      long ordered      422 - 422  0
36      long ordered      426 - 426  0
37      long ordered      429 - 432  3
38      long ordered      434 - 437  3

P10775

DisProt
Region  Type        Location  Length
1       Disordered  31 - 50   20
IUPred
Region  Type              Location   Length
1       short disordered  1 - 5      4
2       short disordered  452 - 456  4
3       long disordered   1 - 5      4
4       long disordered   452 - 456  4
5       short ordered     6 - 451    445
6       long ordered      6 - 451    445

Q9X0E6

DisProt
Region  Type        Location  Length
1       Disordered  1 - 56    56
IUPred
Region  Type           Location  Length
1       short ordered  1 - 101   100
2       long ordered   1 - 101   100

Transmembrane helices

For the prediction of transmembrane helices in our protein we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we also applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 and P47863.

PolyPhobius predicts ODBA_HUMAN to be a completely cytosolic protein without any transmembrane regions.

P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.

Region           PolyPhobius   UniProt       OPM           PDBTM
                 Start  Stop   Start  Stop   Start  Stop   Start  Stop
1.transmembrane  30     55     33     55     34     52     35     52
2.transmembrane  66     88     66     88     67     91     68     84
3.transmembrane  105    126    105    126    101    126    109    123
4.transmembrane  150    170    150    170    150    170    152    166
5.transmembrane  188    212    188    212    187    209    191    206
6.transmembrane  329    352    330    351    330    351    334    347
7.transmembrane  367    386    367    388    363    386    368    382

As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different information sources, the exact start and stop positions of the regions differ somewhat.

Database  True Positives  False Positives  True Negatives  False Negatives
UniProt   149             5                -               2
OPM       141             7+1+3            -               2
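One way to obtain such counts is a per-residue comparison of the predicted segments with the annotated ones. The following is a minimal sketch under that assumption; the function names are hypothetical, the segment lists are copied from the PolyPhobius and UniProt columns above, and the counting convention is not necessarily the one used for the table.

```python
from typing import List, Optional, Set, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive (start, stop)


def residues(segments: List[Segment]) -> Set[int]:
    """Expand segments into the set of residue positions they cover."""
    covered: Set[int] = set()
    for start, stop in segments:
        covered.update(range(start, stop + 1))
    return covered


def confusion(predicted: List[Segment], annotated: List[Segment],
              length: Optional[int] = None):
    """Per-residue (TP, FP, TN, FN); TN is None when the sequence length is not given."""
    pred = residues(predicted)
    ref = residues(annotated)
    tp = len(pred & ref)   # predicted and annotated as transmembrane
    fp = len(pred - ref)   # predicted only
    fn = len(ref - pred)   # annotated only
    tn = None if length is None else length - len(pred | ref)
    return tp, fp, tn, fn


POLYPHOBIUS = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
UNIPROT = [(33, 55), (66, 88), (105, 126), (150, 170),
           (188, 212), (330, 351), (367, 388)]

print(confusion(POLYPHOBIUS, UNIPROT))
```

For the PolyPhobius and UniProt segments this counting gives 5 false positives and 2 false negatives; the true-positive count depends on the counting convention and may differ from the value in the table.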

Signal peptides

GO terms

GOPET
GO ID       Aspect  Confidence  GO term
GO:0003824  F       97%         catalytic activity
GO:0016491  F       96%         oxidoreductase activity
GO:0016624  F       95%         oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863  F       90%         3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739  F       89%         pyruvate dehydrogenase acetyl-transferring activity
GO:0004738  F       78%         pyruvate dehydrogenase activity
GO:0003826  F       77%         alpha-ketoacid dehydrogenase activity
GO:0047101  F       75%         2-oxoisovalerate dehydrogenase acylating activity
GO:0008677  F       65%         2-dehydropantoate 2-reductase activity
GO:0019152  F       63%         acetoin dehydrogenase activity
GO:0030955  F       63%         potassium ion binding
GO:0016616  F       62%         oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872  F       62%         metal ion binding
############## ProtFun 2.2 predictions ##############

>sp_P12694_O

# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.187    8.520
 Biosynthesis_of_cofactors            0.246    3.413
 Cell_envelope                        0.035    0.581
 Cellular_processes                   0.041    0.560
 Central_intermediary_metabolism   => 0.321    5.096
 Energy_metabolism                    0.208    2.310
 Fatty_acid_metabolism                0.023    1.738
 Purines_and_pyrimidines              0.257    1.059
 Regulatory_functions                 0.031    0.194
 Replication_and_transcription        0.170    0.636
 Translation                          0.047    1.078
 Transport_and_binding                0.029    0.071

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.769    2.683
 Nonenzyme                            0.231    0.324

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.178    0.857
 Transferase    (EC 2.-.-.-)          0.238    0.690
 Hydrolase      (EC 3.-.-.-)          0.190    0.601
 Lyase          (EC 4.-.-.-)          0.076    1.614
 Isomerase      (EC 5.-.-.-)          0.010    0.321
 Ligase         (EC 6.-.-.-)       => 0.085    1.673

# Gene Ontology category               Prob     Odds
 Signal_transducer                    0.098    0.458
 Receptor                             0.006    0.038
 Hormone                              0.001    0.206
 Structural_protein                   0.005    0.170
 Transporter                          0.025    0.226
 Ion_channel                          0.009    0.163
 Voltage-gated_ion_channel            0.004    0.170
 Cation_channel                       0.010    0.215
 Transcription                        0.060    0.470
 Transcription_regulation             0.053    0.427
 Stress_response                      0.010    0.110
 Immune_response                      0.012    0.136
 Growth_factor                        0.009    0.609
 Metal_ion_transport                  0.012    0.025

//
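To compare ProtFun results across several proteins, the text output above can be read into a small data structure. The sketch below is an assumption about how one might parse it: it relies only on the layout visible in the output shown here (section lines starting with "#" and ending in "Prob Odds", rows ending in two numbers, "=>" marking the category ProtFun selected), and the function name is hypothetical.

```python
def parse_protfun(text: str) -> dict:
    """Return {section title: [(name, prob, odds, flagged), ...]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the ">identifier" line and the closing "//".
        if not stripped or stripped.startswith(">") or stripped == "//":
            continue
        if stripped.startswith("#"):
            title = stripped.lstrip("#").strip()
            # Only "# <title>  Prob  Odds" lines open a section; the banner line is ignored.
            if title.endswith("Odds"):
                current = title.rsplit(None, 2)[0]
                sections[current] = []
            continue
        if current is None:
            continue
        body, prob, odds = stripped.rsplit(None, 2)
        flagged = body.endswith("=>")
        if flagged:
            body = body[:-2]
        name = " ".join(body.split())  # collapse the column padding inside names
        sections[current].append((name, float(prob), float(odds), flagged))
    return sections


def flagged_categories(sections: dict) -> dict:
    """Map each section to the names ProtFun marked with '=>'."""
    return {title: [name for name, _, _, flag in rows if flag]
            for title, rows in sections.items()}
```

Applied to the output above, `flagged_categories` would list the rows carrying the "=>" marker in each section, for example "Enzyme" under "Enzyme/nonenzyme".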