Secondary structure
To predict secondary structure we use the following tools and compare the results:
-reprof
-psipred
-DSSP_Server
Methods
reprof
To run reprof from the command line, the following command is used:
reprof -i seq.fasta
reprof then calculates the secondary structure prediction and provides an output file "seq.reprof".
Reprof can be run with a single fasta file, or with a BLAST/HHBlits PSSM file. We tried both variants, because the second one promises more accurate results. We used HHBlits PSSM files for this purpose.
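The single-sequence call can be scripted when several FASTA files have to be processed. The following is a minimal sketch, assuming that reprof is on the PATH and that it writes its output next to the input with the extension swapped (seq.fasta gives seq.reprof, as described above). The function names are hypothetical, and only the `-i` option shown above is used.

```python
import subprocess
from pathlib import Path


def build_reprof_command(fasta_path):
    """Return the argument list for the reprof call shown in the text."""
    return ["reprof", "-i", str(fasta_path)]


def expected_output(fasta_path):
    """Assumed output name: same path with the extension replaced by .reprof."""
    return Path(fasta_path).with_suffix(".reprof")


def run_reprof(fasta_paths, dry_run=True):
    """Run reprof once per FASTA file.

    With dry_run=True the commands are only collected, not executed.
    """
    commands = []
    for fasta in fasta_paths:
        cmd = build_reprof_command(fasta)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if reprof exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_reprof(["seq.fasta"]):
        print(" ".join(cmd))
```

Keeping `dry_run=True` as the default makes it possible to inspect the generated commands before anything is executed.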
Result: (H = Helix, E = Extended/Sheet, L = Loop)
reprof with fasta
ODBA_HUMAN
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656
LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543
HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567
EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138
LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
898975011024431728777888989998887256688886799
LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337
LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL
0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411
HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR
1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777
HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC
7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025
HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
65766771365554034675421654433056661778999988407888851138
LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420
LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK
1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624
EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG
6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763
LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP
2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554
EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI
4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554
HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN
331156787677886677889
HHHLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524
LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV
9
L
L
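The results above are listed in blocks of three rows (reliability, secondary structure, amino acid sequence), wrapped after 100 residues. To work with them programmatically, the blocks have to be joined back into full-length strings. The following is a minimal sketch, assuming the rows are supplied in exactly this order. The function name and the toy input are hypothetical.

```python
def join_blocks(lines):
    """Join wrapped (reliability, structure, sequence) triplets into full strings."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")

    reliability, structure, sequence = [], [], []
    for i in range(0, len(rows), 3):
        rel, sec, seq = rows[i:i + 3]
        # All three rows of a block must describe the same residues.
        if not (len(rel) == len(sec) == len(seq)):
            raise ValueError(f"rows of block {i // 3 + 1} differ in length")
        reliability.append(rel)
        structure.append(sec)
        sequence.append(seq)
    return "".join(reliability), "".join(structure), "".join(sequence)


if __name__ == "__main__":
    # Toy input: one block of two residues followed by one block of one residue.
    rel, sec, seq = join_blocks(["92", "LH", "MA", "9", "L", "V"])
    print(rel, sec, seq)
```

The length check catches copy-and-paste errors in which one row of a block was truncated.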
reprof with HHBlits - PSSM (up20)
To retrieve PSSM files from HHBlits, the tool hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:
hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"
Reprof is then run using the generated PSSMs. Results:
ODBA_HUMAN - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978
LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777
HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL
KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855
LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL
HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736
LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
674889999984266986689888988999872585477765789
LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL
PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 - PSSM UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778
LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL
9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888
HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE
PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR
5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888
EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH
LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC
6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783
HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
16768915799999887624877067785067688677999998898767846738
LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL
SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK
9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614
HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL
LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG
4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630
LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL
GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP
6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999
EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI
9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666
HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL
RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN
554655567766555544559
LLLLLLLLLLLLLLLLLLLLL
KALTSETNGTDSNGSNSSNIQ
Q9X0E6 - PSSM - UP20 HHBlits
Reliability( 0-9 (most reliable) )
sec-structure
AA-sequence
9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843
LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV
9
L
L
psipred
The version from [1] was used to predict secondary structure with psipred.
Results:
ODBA_HUMAN
confidence
sec-structure
AA-sequence
915554344652010125789986408888898888999867679889999999999943
CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE
345544347897889982787589997259999999999999999999799999999999
CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY
984258734444798615999998531399982418689312241328998999998626
HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG
978889988877778888776122353334681566759888767099958999818886
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA
670358878577674079759999579823236884113641143204567987321028
CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
209999999999999999089974998642117999999999999997788866514995
CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP
799999999779999999999999999999999999992999996677866421799789
HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL
9999999999998299899998789
HHHHHHHHHHHHHHCCCCCCCCCCC
RKQQESLARHLQTYGEHYPLDHFDK
P10775
confidence
sec-structure
AA-sequence
989828999999999999677137869971179999887999999643699967898647
CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC
MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT
999958999999760499984138982169999455648999723799858897979999
CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC
NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL
926999999884299986488981148899014899999871699979997869999968
CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH
GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA
999999750299985368971389999776999999882199979796999999918999
HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA
998530499984148970289999665999999860499969996889999907999999
HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH
ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC
980499986458961189789888999999882499749896999999925899999851
HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC
ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL
899961119980268899666999999984399979885999999938999999840699
CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC
SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP
997678830589887899999999995599831219
CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC
GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209
confidence
sec-structure
AA-sequence
999988999887776433454799998899321139899989999999996317799999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH
MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV
999999999997219880321498447446663057899999816999998630024543
HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC
ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV
689971899999999834099957772157432122131554799995408989999985
CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH
DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD
429830331043915485046798999911110268999999999741002589999889
HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC
AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG
976434567889874069747689999987549803433323114013201134467999
CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC
NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP
469980378752345882379998376020786434899998898885444531367899
CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH
SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK
999999984067865568988888820389999999997775556788889999842100
HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV
000137999999886667723566666677776665430699998898548887401200
HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH
LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN
38999989998864522211113457789999999999999
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ
Q9X0E6
confidence
sec-structure
AA-sequence
999999079999999999998635611237776532476544610112148789766711
CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE
19999999986599966418998366555778899875229
CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
DSSP_Server
To use DSSP_Server, we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:
UniProt ID | PDB IDs |
P12694 | 1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F |
P10775 | 1DFJ, 2BNH |
Q08209 | 1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8 |
Q9X0E6 | 1KR4, 1O5J, 1VHF |
DSSP_Server is then run for each UniProt ID with the corresponding PDB ID that has the best resolution or the greatest span over the protein.
Results:
P12694 - 2BFD - Position 46-445
sec-structure
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence
S TT SS SS S EE SB TTS BS GGG HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS TT HHHHHHHHHTS
AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD
TTSEEE S HHHHHHTT HHHHHHHHHT TT TTTT S SS BTTTTB SSTTTHHHHHHHHHHHHHHHT EEEEEETTGGGSHHHHHH
NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG
HHHHHHTT EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT EEEEEE HHHHHHHHH
FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL
TTTT HHHHHHHHHHHHHHHHHHHHHHHHS B GGGGSTTSSSS HHHHHHHHHHHHHHHHHGGGS GGGB S EEEE HHHHHHHHH
LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL
HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE S HHHHHHHHHHHHHTT EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS TTEE
DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT
EEEEES SS GGGSS HHHHHTSTT EEE SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS EEEESS SS EEEE SSEEEEE TTHH
IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV
HHHHHHHHHHHHHH EEEEE EEES HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS EEEEE SS STTHHHHS HHH
HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK
HHHHHHHHHT
CYDALRKMINY
P10775 - 2BNH - position 1-456
sec-structure
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence
S B EES HHHHHHHHHHHTT SEEEEET HHHHHHHHHHHTT TT EEE S HHHHHHHHHHHHSSTT EEE TTS GGGGGS
AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV
HHHHHHH TT EEE S HHHHHHHHHHHHHSTT EEE TT BHHHHHHHHHHHHH S EEE TTSB HHHHHHHHHHHHHT S EE
LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL
E TTS HHHHHHHHHHHHH TT EEE SS HHHHHHHHHHHHT TT EEE TTS HHHHHHHHHHHHH SS EEE TTS HHHHHHHH
RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL
HHHHTSTT EEE TTS BGGGHHHHHHHHHH SS EEE SSB HHHHHHHHHHHTTSSS EEE TTS HHHHHHHHHHHHH S EEE
CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD
TTSS HHHHHHHHHHHTSSS EEE TT HHHHHHHHHHHHH SS EEE
LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - 1AUI - position 1-521
sec-structure
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence
S SSTTS B HHHHB TTS B HHHHHHHHHTT B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE TT HHHHHHHHHHH TTT
ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR
EEE S SSSSS HHHHHHHHHHHHHHSTTTEEE TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS EEEETTTEEEESS TT SHHHH
YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI
HHS SSS SSSHHHHHHH EE TTTTS SS EEE TTTTSSEEE HHHHHHHHHHTT SEEEE S TTSEEE B TTTSSBSEEEE SSGG
RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL
GTS EEEEEEETTEEEEEEE GGG HHHHHHHHHHHHHHHHHHHHHTT HHHHHHHHGGGGS S HHHHHHHH
DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG
HHHHHH TT SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS
KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL
HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG
KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV
Q9X0E6 - 1KR4 - position 1-101
sec-structure
( H = alpha helix
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
T = hydrogen bonded turn
S = bend )
aa-sequence
S EE EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS EEEE EEHHH
AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY
HHHHHHHTS
XNWLRESVLGS
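DSSP assigns eight states (plus blank), whereas reprof and psipred predict three, so the DSSP strings have to be reduced before they can be compared. The following is a minimal sketch of one common reduction (H, G, I to helix; E, B to strand; everything else to loop). This mapping is an assumption, since the exact reduction used here is not stated. The sketch also drops the chain-break positions that DSSP marks with "!" in the sequence row.

```python
# Assumed 8-to-3 state reduction; other conventions exist (e.g. G or B -> loop).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states):
    """Map a DSSP state string to H/E/L; T, S and blank become L."""
    return "".join(DSSP_TO_3.get(s, "L") for s in states)


def strip_breaks(states, sequence):
    """Remove positions where the DSSP sequence row contains a '!' chain break."""
    if len(states) != len(sequence):
        raise ValueError("state and sequence rows must have the same length")
    kept = [(s, a) for s, a in zip(states, sequence) if a != "!"]
    return "".join(s for s, _ in kept), "".join(a for _, a in kept)


if __name__ == "__main__":
    states, seq = strip_breaks("HG B!TE", "AKLMV!SW"[:7])
    print(reduce_dssp(states), seq)
```

Note that this only works on DSSP rows in which blanks have been preserved, because a blank is itself a state (loop) and must stay aligned with its residue.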
Comparison
Method | Query | TPs against DSSP | Q3 |
reprof fasta | ODBA_HUMAN | 215 / 378 | 56.87% |
reprof PSSM | ODBA_HUMAN | 286 / 378 | 75.66% |
psipred | ODBA_HUMAN | 237 / 378 | 62.69% |
reprof fasta | P10775 | 279 / 456 | 61.18% |
reprof PSSM | P10775 | 342 / 456 | 75.00% |
psipred | P10775 | 268 / 456 | 58.77% |
reprof fasta | Q08209 | 211 / 380 | 55.52% |
reprof PSSM | Q08209 | 299 / 380 | 78.68% |
psipred | Q08209 | 218 / 380 | 57.36% |
reprof fasta | Q9X0E6 | 67 / 110 | 60.90% |
reprof PSSM | Q9X0E6 | 92 / 110 | 83.63% |
psipred | Q9X0E6 | 92 / 110 | 83.63% |
|
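This table shows the result of our comparison. Psipred outperforms reprof run on a single fasta file on three of the four targets (ODBA_HUMAN, Q08209 and Q9X0E6); only for P10775 does reprof score higher (61.18% vs. 58.77%). With an HHBlits PSSM as input, reprof outperforms psipred on three of the four targets and ties with it on the fourth (Q9X0E6). The true positives (TPs) are the residues whose predicted secondary structure state matches the DSSP assignment, counted only over the range covered by the DSSP file. Q3 is this count divided by the number of residues in that range.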
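The Q3 values in the table can be computed along the following lines. This is a minimal sketch, assuming that prediction and DSSP assignment have already been cut to the same range and reduced to the three states H, E and L. Psipred writes C for coil, which is mapped to L here. The function names are hypothetical.

```python
def to_three_state(prediction):
    """Normalise a prediction string: psipred's C (coil) becomes L (loop)."""
    return prediction.upper().replace("C", "L")


def q3(predicted, observed):
    """Return (number of matching residues, Q3 in percent)."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches, 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy example: three of four residues agree.
    matches, score = q3(to_three_state("HHEC"), "HHLL")
    print(f"{matches} / 4 = {score:.2f}%")
```

As a check against the table: 342 matching residues out of 456 for reprof PSSM on P10775 give 100 * 342 / 456 = 75.00%.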
|
Disorder
To predict disorder in our protein ODBA_HUMAN, IUPred is used and its results are compared to the entries in DisProt. Since only Q08209 can be found in DisProt directly, the "search by sequence" feature has to be used for the other proteins to check whether reliable hits can be found. The following entries were chosen:
UniProt ID | DisProt ID | identities | positives | gaps | e-value | direct hit |
Q08209 | DP00092 | 100% | 100% | 0 | - | y |
P12694 | - | 0% | 0% | 0 | - | n |
P10775 | DP00554 | 40% | 54% | 0 | 5e-30 | n |
Q9X0E6 | DP00175 | 32% | 56% | 0 | 4.3 | n |
Q08209
DisProt |
Region | type | location | length |
1 | Disordered - Extended | 1-13 | 13 |
2 | Disordered - Extended | 374-468 | 95 |
3 | Disordered - Extended | 390-414 | 25 |
4 | Disordered - Extended | 469-486 | 18 |
5 | Disordered - Extended | 487-521 | 35 |
6 | ordered | 14-373 | 360 |
|
IUPred |
Region | type | location | length |
1 | short disordered | 1 - 11 | 11 |
2 | short disordered | 13 - 13 | 1 |
3 | short disordered | 18 - 19 | 2 |
4 | short disordered | 24 - 24 | 1 |
5 | short disordered | 32 - 35 | 4 |
6 | short disordered | 434 - 434 | 1 |
7 | short disordered | 437 - 437 | 1 |
8 | short disordered | 460 - 460 | 1 |
9 | short disordered | 463 - 466 | 4 |
10 | short disordered | 469 - 521 | 53 |
11 | long disordered | 1 - 11 | 11 |
12 | long disordered | 13 - 13 | 1 |
13 | long disordered | 18 - 19 | 2 |
14 | long disordered | 24 - 24 | 1 |
15 | long disordered | 32 - 35 | 4 |
16 | long disordered | 434 - 434 | 1 |
17 | long disordered | 437 - 437 | 1 |
18 | long disordered | 460 - 460 | 1 |
19 | long disordered | 463 - 466 | 4 |
20 | long disordered | 469 - 521 | 53 |
21 | short ordered | 12 - 12 | 1 |
22 | short ordered | 14 - 17 | 4 |
23 | short ordered | 20 - 23 | 4 |
24 | short ordered | 25 - 31 | 7 |
25 | short ordered | 36 - 433 | 398 |
26 | short ordered | 435 - 436 | 2 |
27 | short ordered | 438 - 459 | 22 |
28 | short ordered | 461 - 462 | 2 |
29 | short ordered | 467 - 468 | 2 |
30 | long ordered | 12 - 12 | 1 |
31 | long ordered | 14 - 17 | 4 |
32 | long ordered | 20 - 23 | 4 |
33 | long ordered | 25 - 31 | 7 |
34 | long ordered | 36 - 433 | 398 |
35 | long ordered | 435 - 436 | 2 |
36 | long ordered | 438 - 459 | 22 |
37 | long ordered | 461 - 462 | 2 |
38 | long ordered | 467 - 468 | 2 |
|
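IUPred reports one score per residue, and region tables like the one above are obtained by grouping consecutive residues with the same state. The following is a minimal sketch, assuming the commonly used cutoff of 0.5 (scores above it count as disordered) and 1-based positions. Region lengths are counted inclusively, i.e. end - start + 1. The function name and the toy scores are hypothetical.

```python
from itertools import groupby


def regions(scores, threshold=0.5):
    """Group per-residue scores into (state, start, end, length) regions.

    Positions are 1-based and inclusive, so a single residue has length 1.
    """
    labelled = (
        ("disordered" if score > threshold else "ordered", pos)
        for pos, score in enumerate(scores, start=1)
    )
    result = []
    for state, group in groupby(labelled, key=lambda item: item[0]):
        positions = [pos for _, pos in group]
        result.append((state, positions[0], positions[-1], len(positions)))
    return result


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.1, 0.2, 0.7]
    for state, start, end, length in regions(toy_scores):
        print(f"{state} | {start} - {end} | {length}")
```

Summing the lengths of all regions should give the sequence length, which is a quick sanity check for a region table.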
P12694
DisProt |
Region | type | location | length |
N/A | N/A | N/A | N/A |
|
IUPred |
Region | type | location | length |
1 | short disordered | 1 - 1 | 1 |
2 | short disordered | 33 - 55 | 23 |
3 | short disordered | 92 - 93 | 2 |
4 | short disordered | 393 - 411 | 19 |
5 | short disordered | 415 - 415 | 1 |
6 | short disordered | 420 - 421 | 2 |
7 | short disordered | 423 - 425 | 3 |
8 | short disordered | 427 - 428 | 2 |
9 | short disordered | 433 - 433 | 1 |
10 | short disordered | 438 - 445 | 8 |
11 | long disordered | 1 - 1 | 1 |
12 | long disordered | 33 - 55 | 23 |
13 | long disordered | 92 - 93 | 2 |
14 | long disordered | 393 - 411 | 19 |
15 | long disordered | 415 - 415 | 1 |
16 | long disordered | 420 - 421 | 2 |
17 | long disordered | 423 - 425 | 3 |
18 | long disordered | 427 - 428 | 2 |
19 | long disordered | 433 - 433 | 1 |
20 | long disordered | 438 - 445 | 8 |
21 | short ordered | 2 - 32 | 31 |
22 | short ordered | 56 - 91 | 36 |
23 | short ordered | 94 - 392 | 299 |
24 | short ordered | 412 - 414 | 3 |
25 | short ordered | 416 - 419 | 4 |
26 | short ordered | 422 - 422 | 1 |
27 | short ordered | 426 - 426 | 1 |
28 | short ordered | 429 - 432 | 4 |
29 | short ordered | 434 - 437 | 4 |
30 | long ordered | 2 - 32 | 31 |
31 | long ordered | 56 - 91 | 36 |
32 | long ordered | 94 - 392 | 299 |
33 | long ordered | 412 - 414 | 3 |
34 | long ordered | 416 - 419 | 4 |
35 | long ordered | 422 - 422 | 1 |
36 | long ordered | 426 - 426 | 1 |
37 | long ordered | 429 - 432 | 4 |
38 | long ordered | 434 - 437 | 4 |
|
P10775
DisProt |
Region | type | location | length |
1 | Disordered | 31 - 50 | 20 |
|
IUPred |
Region | type | location | length |
1 | short disordered | 1 - 5 | 5 |
2 | short disordered | 452 - 456 | 5 |
3 | long disordered | 1 - 5 | 5 |
4 | long disordered | 452 - 456 | 5 |
5 | short ordered | 6 - 451 | 446 |
6 | long ordered | 6 - 451 | 446 |
|
Q9X0E6
DisProt |
Region | type | location | length |
1 | Disordered | 1 - 56 | 56 |
|
IUPred |
Region | type | location | length |
1 | short ordered | 1 - 101 | 101 |
2 | long ordered | 1 - 101 | 101 |
|
Transmembrane helices
For the prediction of transmembrane helices in our protein we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we also applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 and P477863.
PolyPhobius predicts ODBA_HUMAN to be a completely cytosolic protein without any transmembrane regions.
P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.
Region | PolyPhobius Start | PolyPhobius Stop | UniProt Start | UniProt Stop | OPM Start | OPM Stop | PDBTM Start | PDBTM Stop |
1.transmembrane | 30 | 55 | 33 | 55 | 34 | 52 | 35 | 52 |
2.transmembrane | 66 | 88 | 66 | 88 | 67 | 91 | 68 | 84 |
3.transmembrane | 105 | 126 | 105 | 126 | 101 | 126 | 109 | 123 |
4.transmembrane | 150 | 170 | 150 | 170 | 150 | 170 | 152 | 166 |
5.transmembrane | 188 | 212 | 188 | 212 | 187 | 209 | 191 | 206 |
6.transmembrane | 329 | 352 | 330 | 351 | 330 | 351 | 334 | 347 |
7.transmembrane | 367 | 386 | 367 | 388 | 363 | 386 | 368 | 382 |
|
As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different sources, the exact start and stop positions differ slightly.
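The differences in start and stop positions can be quantified per helix. The following is a minimal sketch that uses the boundaries from the table above for P35462 and reports, for each helix, how far the PolyPhobius start and stop lie from those of a reference source. The names `TM_REGIONS` and `boundary_offsets` are hypothetical.

```python
# (start, stop) per transmembrane helix, copied from the table above.
TM_REGIONS = {
    "PolyPhobius": [(30, 55), (66, 88), (105, 126), (150, 170), (188, 212), (329, 352), (367, 386)],
    "UniProt": [(33, 55), (66, 88), (105, 126), (150, 170), (188, 212), (330, 351), (367, 388)],
    "OPM": [(34, 52), (67, 91), (101, 126), (150, 170), (187, 209), (330, 351), (363, 386)],
    "PDBTM": [(35, 52), (68, 84), (109, 123), (152, 166), (191, 206), (334, 347), (368, 382)],
}


def boundary_offsets(predicted, reference):
    """Return (start offset, stop offset) per helix: predicted minus reference."""
    if len(predicted) != len(reference):
        raise ValueError("both sources must list the same number of helices")
    return [
        (p_start - r_start, p_stop - r_stop)
        for (p_start, p_stop), (r_start, r_stop) in zip(predicted, reference)
    ]


if __name__ == "__main__":
    for source in ("UniProt", "OPM", "PDBTM"):
        offsets = boundary_offsets(TM_REGIONS["PolyPhobius"], TM_REGIONS[source])
        print(source, offsets)
```

A negative start offset means that PolyPhobius begins the helix earlier than the reference, a positive stop offset that it ends the helix later.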
Database | True Positives | False Positives | True Negatives | False Negatives |
UniProt | 149 | 5 | | 2 |
OPM | 141 | 7+1+3 | | 2 |
|
Signal peptides
GO terms
GOPET |
GOid | Aspect | Confidence | GO-Term |
GO:0003824 | F | 97% | catalytic activity |
GO:0016491 | F | 96% | oxidoreductase activity |
GO:0016624 | F | 95% | oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor |
GO:0003863 | F | 90% | 3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) activity |
GO:0004739 | F | 89% | pyruvate dehydrogenase (acetyl-transferring) activity |
GO:0004738 | F | 78% | pyruvate dehydrogenase activity |
GO:0003826 | F | 77% | alpha-ketoacid dehydrogenase activity |
GO:0047101 | F | 75% | 2-oxoisovalerate dehydrogenase (acylating) activity |
GO:0008677 | F | 65% | 2-dehydropantoate 2-reductase activity |
GO:0019152 | F | 63% | acetoin dehydrogenase activity |
GO:0030955 | F | 63% | potassium ion binding |
GO:0016616 | F | 62% | oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor |
GO:0046872 | F | 62% | metal ion binding |
############## ProtFun 2.2 predictions ##############
>sp_P12694_O
# Functional category Prob Odds
Amino_acid_biosynthesis 0.187 8.520
Biosynthesis_of_cofactors 0.246 3.413
Cell_envelope 0.035 0.581
Cellular_processes 0.041 0.560
Central_intermediary_metabolism => 0.321 5.096
Energy_metabolism 0.208 2.310
Fatty_acid_metabolism 0.023 1.738
Purines_and_pyrimidines 0.257 1.059
Regulatory_functions 0.031 0.194
Replication_and_transcription 0.170 0.636
Translation 0.047 1.078
Transport_and_binding 0.029 0.071
# Enzyme/nonenzyme Prob Odds
Enzyme => 0.769 2.683
Nonenzyme 0.231 0.324
# Enzyme class Prob Odds
Oxidoreductase (EC 1.-.-.-) 0.178 0.857
Transferase (EC 2.-.-.-) 0.238 0.690
Hydrolase (EC 3.-.-.-) 0.190 0.601
Lyase (EC 4.-.-.-) 0.076 1.614
Isomerase (EC 5.-.-.-) 0.010 0.321
Ligase (EC 6.-.-.-) => 0.085 1.673
# Gene Ontology category Prob Odds
Signal_transducer 0.098 0.458
Receptor 0.006 0.038
Hormone 0.001 0.206
Structural_protein 0.005 0.170
Transporter 0.025 0.226
Ion_channel 0.009 0.163
Voltage-gated_ion_channel 0.004 0.170
Cation_channel 0.010 0.215
Transcription 0.060 0.470
Transcription_regulation 0.053 0.427
Stress_response 0.010 0.110
Immune_response 0.012 0.136
Growth_factor 0.009 0.609
Metal_ion_transport 0.012 0.025
//