Difference between revisions of "Task 3: odba human Sequence-based predictions"
Revision as of 20:25, 15 May 2012
secondary structure
To predict secondary structure, we use the following tools and compare the results:
- reprof
- psipred
- DSSP_Server
Methods
reprof
To run reprof from the command line, the following command is used:
reprof -i seq.fasta
reprof then calculates the secondary structure prediction and writes an output file "seq.reprof". reprof can be run with a single FASTA file or with a BLAST/HHBlits PSSM file. We tried both variants because the second promises more accurate results, and we used HHBlits PSSM files for this purpose. Result: (H = Helix, E = Extended/Sheet, L = Loop)
reprof with fasta
obda_human Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656 LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543 HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567 EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138 LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 898975011024431728777888989998887256688886799 LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337 LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL 0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411 HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR 1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777 HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025 HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL 65766771365554034675421654433056661778999988407888851138 LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420 LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK 1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624 EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG 6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763 LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554 EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI 4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554 HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN 331156787677886677889 HHHLLLLLLLLLLLLLLLLLL KALTSETNGTDSNGSNSSNIQ
Q9X0E6 Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524 LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV 9 L L
reprof with HHBlits - PSSM (up20)
To retrieve PSSM files from hhblits, the tool hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:
hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"
reprof is then run using the created PSSMs. Results:
obda_human - PSSM - UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978 LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777 HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855 LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736 LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 674889999984266986689888988999872585477765789 LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 - PSSM UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778 LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL 9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888 HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR 5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888 EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783 HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL 16768915799999887624877067785067688677999998898767846738 LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - PSSM - UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999 LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK 9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614 HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG 4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630 LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999 EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI 9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666 HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN 554655567766555544559 LLLLLLLLLLLLLLLLLLLLL KALTSETNGTDSNGSNSSNIQ
Q9X0E6 - PSSM - UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843 LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV 9 L L
psipred
The version from [1] was used to predict secondary structure with psipred. Results:
obda_human confidence sec-structure AA-sequence 915554344652010125789986408888898888999867679889999999999943 CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE 345544347897889982787589997259999999999999999999799999999999 CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY 984258734444798615999998531399982418689312241328998999998626 HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG 978889988877778888776122353334681566759888767099958999818886 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA 670358878577674079759999579823236884113641143204567987321028 CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 209999999999999999089974998642117999999999999997788866514995 CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP 799999999779999999999999999999999999992999996677866421799789 HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL 9999999999998299899998789 HHHHHHHHHHHHHHCCCCCCCCCCC RKQQESLARHLQTYGEHYPLDHFDK
P10775 confidence sec-structure AA-sequence 989828999999999999677137869971179999887999999643699967898647 CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT 999958999999760499984138982169999455648999723799858897979999 CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL 926999999884299986488981148899014899999871699979997869999968 CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA 999999750299985368971389999776999999882199979796999999918999 HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA 998530499984148970289999665999999860499969996889999907999999 HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 980499986458961189789888999999882499749896999999925899999851 HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL 899961119980268899666999999984399979885999999938999999840699 CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP 997678830589887899999999995599831219 CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 confidence sec-structure AA-sequence 999988999887776433454799998899321139899989999999996317799999 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV 999999999997219880321498447446663057899999816999998630024543 HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV 689971899999999834099957772157432122131554799995408989999985 CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD 429830331043915485046798999911110268999999999741002589999889 HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG 976434567889874069747689999987549803433323114013201134467999 CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 469980378752345882379998376020786434899998898885444531367899 CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK 999999984067865568988888820389999999997775556788889999842100 HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV 000137999999886667723566666677776665430699998898548887401200 HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN 38999989998864522211113457789999999999999 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ
Q9X0E6 confidence sec-structure AA-sequence 999999079999999999998635611237776532476544610112148789766711 CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE 19999999986599966418998366555778899875229 CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
DSSP_Server
To use DSSP_Server, we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:
UniProt ID | PDB IDs |
P12694 | 1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F |
P10775 | 1DFJ, 2BNH |
Q08209 | 1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8 |
Q9X0E6 | 1KR4, 1O5J, 1VHF |
DSSP_Server is then run for each UniProt ID with the corresponding PDB ID that has the best resolution or the greatest span over the protein. Results:
P12694 - 2BFD - Position 46-445 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S TT SS SS S EE SB TTS BS GGG HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS TT HHHHHHHHHTS AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD TTSEEE S HHHHHHTT HHHHHHHHHT TT TTTT S SS BTTTTB SSTTTHHHHHHHHHHHHHHHT EEEEEETTGGGSHHHHHH NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG HHHHHHTT EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT EEEEEE HHHHHHHHH FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL TTTT HHHHHHHHHHHHHHHHHHHHHHHHS B GGGGSTTSSSS HHHHHHHHHHHHHHHHHGGGS GGGB S EEEE HHHHHHHHH LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE S HHHHHHHHHHHHHTT EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS TTEE DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT EEEEES SS GGGSS HHHHHTSTT EEE SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS EEEESS SS EEEE SSEEEEE TTHH IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV HHHHHHHHHHHHHH EEEEE EEES HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS EEEEE SS STTHHHHS HHH HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK HHHHHHHHHT CYDALRKMINY
P10775 - 2BNH - position 1-456 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S B EES HHHHHHHHHHHTT SEEEEET HHHHHHHHHHHTT TT EEE S HHHHHHHHHHHHSSTT EEE TTS GGGGGS AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV HHHHHHH TT EEE S HHHHHHHHHHHHHSTT EEE TT BHHHHHHHHHHHHH S EEE TTSB HHHHHHHHHHHHHT S EE LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL E TTS HHHHHHHHHHHHH TT EEE SS HHHHHHHHHHHHT TT EEE TTS HHHHHHHHHHHHH SS EEE TTS HHHHHHHH RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL HHHHTSTT EEE TTS BGGGHHHHHHHHHH SS EEE SSB HHHHHHHHHHHTTSSS EEE TTS HHHHHHHHHHHHH S EEE CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD TTSS HHHHHHHHHHHTSSS EEE TT HHHHHHHHHHHHH SS EEE LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - 1AUI - position 1-521 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S SSTTS B HHHHB TTS B HHHHHHHHHTT B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE TT HHHHHHHHHHH TTT ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR EEE S SSSSS HHHHHHHHHHHHHHSTTTEEE TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS EEEETTTEEEESS TT SHHHH YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI HHS SSS SSSHHHHHHH EE TTTTS SS EEE TTTTSSEEE HHHHHHHHHHTT SEEEE S TTSEEE B TTTSSBSEEEE SSGG RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL GTS EEEEEEETTEEEEEEE GGG HHHHHHHHHHHHHHHHHHHHHTT HHHHHHHHGGGGS S HHHHHHHH DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG HHHHHH TT SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV
Q9X0E6 - 1KR4 - position 1-101 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S EE EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS EEEE EEHHH AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY HHHHHHHTS XNWLRESVLGS
Comparison
This table shows the result of our comparison. Psipred performs considerably better on all targets than reprof run with a single FASTA file, but reprof run with an HHBlits PSSM as query outperforms psipred in 3 of 4 cases; in the fourth case they perform equally. The true positives (TPs) are the secondary structure elements that match between the prediction method and DSSP, within the range covered by the DSSP file.
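The matching described above can be sketched as a per-position comparison of two aligned strings. This is only a minimal illustration: the reduction of DSSP's eight states to H/E/L (H, G, I to helix; E, B to strand; everything else to loop) is an assumption and not necessarily the mapping used for the table, and the function names and toy strings are hypothetical.

```python
# Reduce DSSP's 8-state alphabet to the three states H/E/L.
# ASSUMPTION: this particular mapping is a common convention, not taken from the page.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss):
    """Map DSSP, psipred or reprof symbols to H/E/L; any other symbol becomes L."""
    return "".join(DSSP_TO_3.get(c, "L") for c in ss)


def count_matches(predicted, observed):
    """Count aligned positions whose three-state classes agree."""
    if len(predicted) != len(observed):
        raise ValueError("both strings must cover the same residue range")
    pred3 = to_three_state(predicted)
    obs3 = to_three_state(observed)
    return sum(p == o for p, o in zip(pred3, obs3))


# Toy strings (hypothetical, not real output): a prediction and a DSSP assignment
predicted = "LLHHHHEEELL"
observed = " SHHGGEEBT "
matches = count_matches(predicted, observed)
```

Both strings must first be cut to the residue range covered by the DSSP file, since the PDB structures do not always span the full UniProt sequence.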
Disorder
To predict disorder in our protein ODBA_HUMAN, IUPred is used and compared with the entries in DisProt. Since only the protein Q08209 can be found directly in DisProt, the "search by sequence" feature has to be used to check whether reliable hits can be found. The following entries were chosen:
up-ID | DisProt-ID | identities | positives | gaps | e-value | direct hit |
Q08209 | DP00092 | 100% | 100% | 0 | - | y |
P12694 | - | 0% | 0% | 0 | - | n |
P10775 | DP00554 | 40% | 54% | 0 | 5e-30 | n |
Q9X0E6 | DP00175 | 32% | 56% | 0 | 4.3 | n |
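One way to make "reliable hit" concrete is an e-value cutoff. The sketch below applies such a cutoff to the two sequence-search hits from the table; the threshold of 1e-3 and all names are assumptions chosen for illustration, not values used on this page.

```python
# ASSUMPTION: 1e-3 is an illustrative cutoff, not a threshold stated on the page.
E_VALUE_CUTOFF = 1e-3

# Sequence-search hits copied from the table above
hits = [
    {"uniprot": "P10775", "disprot": "DP00554", "e_value": 5e-30},
    {"uniprot": "Q9X0E6", "disprot": "DP00175", "e_value": 4.3},
]


def is_reliable(hit, cutoff=E_VALUE_CUTOFF):
    """Treat a hit as reliable when its e-value does not exceed the cutoff."""
    return hit["e_value"] <= cutoff


reliable = [h["disprot"] for h in hits if is_reliable(h)]
```

Under this assumed cutoff only the P10775 hit would count as reliable, while an e-value of 4.3 is in the range expected by chance.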
Q08209
|
|
P12694
|
|
P10775
|
|
Q9X0E6
|
|
Transmembrane helices
For the prediction of transmembrane helices in our protein, we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)), we also applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 (voltage-gated potassium channel) and P477863.
PolyPhobius predicts ODBA_HUMAN to be a completely cytosolic protein without any transmembrane regions.
P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions.
Region | PolyPhobius Start | Stop | UniProt Start | Stop | OPM Start | Stop | PDBTM Start | Stop |
---|---|---|---|---|---|---|---|---|
1.transmembrane | 30 | 55 | 33 | 55 | 34 | 52 | 35 | 52 |
2.transmembrane | 66 | 88 | 66 | 88 | 67 | 91 | 68 | 84 |
3.transmembrane | 105 | 126 | 105 | 126 | 101 | 126 | 109 | 123 |
4.transmembrane | 150 | 170 | 150 | 170 | 150 | 170 | 152 | 166 |
5.transmembrane | 188 | 212 | 188 | 212 | 187 | 209 | 191 | 206 |
6.transmembrane | 329 | 352 | 330 | 351 | 330 | 351 | 334 | 347 |
7.transmembrane | 367 | 386 | 367 | 388 | 363 | 386 | 368 | 382 |
As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different sources, the exact start and stop positions differ somewhat. Depending on which database we choose as the standard of truth, we therefore get slightly different results when evaluating the performance of the transmembrane region prediction.
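The dependence on the chosen reference can be sketched by counting residues per class from the start and stop positions in the table. This is a minimal sketch that assumes per-residue counting with inclusive boundaries; the function names are hypothetical, and the counting convention is not necessarily the one behind the values reported below.

```python
def residues(regions):
    """Expand inclusive (start, stop) pairs into a set of residue positions."""
    return {pos for start, stop in regions for pos in range(start, stop + 1)}


def per_residue_counts(predicted, reference, length=None):
    """Count TP/FP/FN residues; TN is added only if the sequence length is given."""
    pred, ref = residues(predicted), residues(reference)
    counts = {
        "TP": len(pred & ref),  # predicted and annotated as transmembrane
        "FP": len(pred - ref),  # predicted only
        "FN": len(ref - pred),  # annotated only
    }
    if length is not None:
        counts["TN"] = length - len(pred | ref)
    return counts


# Regions for P35462 copied from the table above
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
uniprot = [(33, 55), (66, 88), (105, 126), (150, 170),
           (188, 212), (330, 351), (367, 388)]
opm = [(34, 52), (67, 91), (101, 126), (150, 170),
       (187, 209), (330, 351), (363, 386)]

vs_uniprot = per_residue_counts(polyphobius, uniprot)
vs_opm = per_residue_counts(polyphobius, opm)
```

Passing the sequence length as `length` also yields the true negatives, which cannot be derived from the region boundaries alone.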
Database | True Positives | False Positives | True Negatives | False Negatives |
---|---|---|---|---|
UniProt | 149 | 5 | 2 | |
OPM | 141 | 7+1+3 | 2 |
For Q9YDF8 PolyPhobius again predicts seven transmembrane regions.
Region | PolyPhobius Start | Stop | UniProt Start | Stop | OPM Start | Stop | PDBTM Start | Stop |
---|---|---|---|---|---|---|---|---|
1.transmembrane | 42 | 60 | 39 | 63 | | | | |
2.transmembrane | 68 | 88 | 68 | 92 | | | | |
3.transmembrane | 108 | 129 | 109 | 125 | | | | |
4.transmembrane | 137 | 157 | 129 | 145 | | | | |
5.transmembrane | 163 | 184 | 160 | 184 | | | | |
6.transmembrane | 196 | 213 | *196 | *208 | | | | |
7.transmembrane | 224 | 244 | 222 | 253 | | | | |
Signal peptides
GO terms
GOPET | |||
GOid | Aspect | Confidence | GO-Term |
GO:0003824 | F | 97% | catalytic activity |
GO:0016491 | F | 96% | oxidoreductase activity |
GO:0016624 | F | 95% | oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor |
GO:0003863 | F | 90% | 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity |
GO:0004739 | F | 89% | pyruvate dehydrogenase acetyl-transferring activity |
GO:0004738 | F | 78% | pyruvate dehydrogenase activity |
GO:0003826 | F | 77% | alpha-ketoacid dehydrogenase activity |
GO:0047101 | F | 75% | 2-oxoisovalerate dehydrogenase acylating activity |
GO:0008677 | F | 65% | 2-dehydropantoate 2-reductase activity |
GO:0019152 | F | 63% | acetoin dehydrogenase activity |
GO:0030955 | F | 63% | potassium ion binding |
GO:0016616 | F | 62% | oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor |
GO:0046872 | F | 62% | metal ion binding |
############## ProtFun 2.2 predictions ##############
>sp_P12694_O
# Functional category Prob Odds
Amino_acid_biosynthesis 0.187 8.520
Biosynthesis_of_cofactors 0.246 3.413
Cell_envelope 0.035 0.581
Cellular_processes 0.041 0.560
Central_intermediary_metabolism => 0.321 5.096
Energy_metabolism 0.208 2.310
Fatty_acid_metabolism 0.023 1.738
Purines_and_pyrimidines 0.257 1.059
Regulatory_functions 0.031 0.194
Replication_and_transcription 0.170 0.636
Translation 0.047 1.078
Transport_and_binding 0.029 0.071
# Enzyme/nonenzyme Prob Odds
Enzyme => 0.769 2.683
Nonenzyme 0.231 0.324
# Enzyme class Prob Odds
Oxidoreductase (EC 1.-.-.-) 0.178 0.857
Transferase (EC 2.-.-.-) 0.238 0.690
Hydrolase (EC 3.-.-.-) 0.190 0.601
Lyase (EC 4.-.-.-) 0.076 1.614
Isomerase (EC 5.-.-.-) 0.010 0.321
Ligase (EC 6.-.-.-) => 0.085 1.673
# Gene Ontology category Prob Odds
Signal_transducer 0.098 0.458
Receptor 0.006 0.038
Hormone 0.001 0.206
Structural_protein 0.005 0.170
Transporter 0.025 0.226
Ion_channel 0.009 0.163
Voltage-gated_ion_channel 0.004 0.170
Cation_channel 0.010 0.215
Transcription 0.060 0.470
Transcription_regulation 0.053 0.427
Stress_response 0.010 0.110
Immune_response 0.012 0.136
Growth_factor 0.009 0.609
Metal_ion_transport 0.012 0.025
//