Difference between revisions of "Task 3: odba human Sequence-based predictions"
(→Transmembrane helices) |
m (moved Task 3: obda human Sequence-based predictions to Task 3: odba human Sequence-based predictions) |
||
(33 intermediate revisions by 2 users not shown) | |||
Line 632: | Line 632: | ||
= Transmembrane helices = |
= Transmembrane helices = |
||
− | For the prediction of Transmembrane helices in our Protein we used [http://phobius.sbc.su.se/poly.html PolyPhobius]. In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we applied the method as well to [http://www.uniprot.org/uniprot/P35462 P35462](D(3) dopamine receptor), [http://www.uniprot.org/uniprot/Q9YDF8 Q9YDF8](Voltage-gated potassium channel) and [http://www.uniprot.org/uniprot/P47863 P477863]. |
+ | For the prediction of transmembrane helices in our protein we used [http://phobius.sbc.su.se/poly.html PolyPhobius]. In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we applied the method to [http://www.uniprot.org/uniprot/P35462 P35462] (D(3) dopamine receptor), [http://www.uniprot.org/uniprot/Q9YDF8 Q9YDF8] (Voltage-gated potassium channel) and [http://www.uniprot.org/uniprot/P47863 P47863] (Aquaporin-4). |
− | Polyphobius predicts ODBA_HUMAN to be a completly cytosomal protein without any transmembrane regions. |
+ | PolyPhobius predicts ODBA_HUMAN to be a completely cytosolic protein without any transmembrane regions, and we found no entry in any of the databases that indicates otherwise. |
P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions. |
P35462 on the other hand is predicted to be a transmembrane protein with seven transmembrane regions. |
||
− | [[File:3pbl.png]] |
+ | [[File:3pbl.png|200px|thumb|left|3PBL » Dopamine D3 receptor, image from OPM.]] |
{|class="wikitable" border="1" style="border-spacing:0;text-align: right;" |
{|class="wikitable" border="1" style="border-spacing:0;text-align: right;" |
||
!Region |
!Region |
||
Line 722: | Line 722: | ||
As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different information sources, there are some differences regarding the exact start and stop positions of the transmembrane regions. Depending on which database we choose as the standard of truth, we get slightly different results when evaluating the performance of the transmembrane region prediction. |
As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different information sources, there are some differences regarding the exact start and stop positions of the transmembrane regions. Depending on which database we choose as the standard of truth, we get slightly different results when evaluating the performance of the transmembrane region prediction. |
||
+ | For Q9YDF8 PolyPhobius again predicts seven transmembrane regions. |
||
− | {| style="text-align: right;" |
||
+ | [[File:1orq.png|200px|thumb|left|1ORQ » Potassium channel KvAP, image from OPM.]] |
||
− | !Database |
||
+ | |||
− | !True Positives |
||
+ | {|class="wikitable" border="1" style="border-spacing:0;text-align: right;" |
||
− | !False Positives |
||
+ | !Region |
||
− | !True Negatives |
||
+ | !PolyPhobius Start |
||
− | !False Negatives |
||
+ | !Stop |
||
+ | !UniProt Start |
||
+ | !Stop |
||
+ | !OPM Start |
||
+ | !Stop |
||
+ | !PDBTM Start |
||
+ | !Stop |
||
|- |
|- |
||
+ | |1.transmembrane |
||
− | |UniProt |
||
− | | |
+ | |42 |
− | | |
+ | |60 |
+ | |39 |
||
+ | |63 |
||
+ | |153 |
||
+ | |172 |
||
+ | |21 |
||
+ | |52 |
||
+ | |- |
||
+ | |2.transmembrane |
||
+ | |68 |
||
+ | |88 |
||
+ | |68 |
||
+ | |92 |
||
+ | |183 |
||
+ | |195 |
||
+ | |57 |
||
+ | |80 |
||
+ | |- |
||
+ | |3.transmembrane |
||
+ | |108 |
||
+ | |129 |
||
+ | |109 |
||
+ | |125 |
||
+ | |207 |
||
+ | |225 |
||
+ | |151 |
||
+ | |171 |
||
+ | |- |
||
+ | |4.transmembrane |
||
+ | |137 |
||
+ | |157 |
||
+ | |129 |
||
+ | |145 |
||
| |
| |
||
− | | |
+ | | |
+ | |*184 |
||
+ | |*200 |
||
|- |
|- |
||
+ | |5.transmembrane |
||
− | |OPM |
||
− | | |
+ | |163 |
+ | |184 |
||
− | |7+1+3 |
||
+ | |160 |
||
+ | |184 |
||
+ | | |
||
+ | | |
||
+ | |209 |
||
+ | |236 |
||
+ | |- |
||
+ | |6.transmembrane |
||
+ | |196 |
||
+ | |213 |
||
+ | |*196 |
||
+ | |*208 |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | |- |
||
+ | |7.transmembrane |
||
+ | |224 |
||
+ | |244 |
||
+ | |222 |
||
+ | |253 |
||
+ | | |
||
+ | | |
||
+ | | |
||
| |
| |
||
− | |2 |
||
|- |
|- |
||
|} |
|} |
||
+ | |||
+ | In this case, however, the number of transmembrane regions found per database varies greatly. UniProt notes six transmembrane regions and one intramembrane region (marked with *) that mostly overlap the PolyPhobius prediction, whereas the 1orq structure of the protein, as listed in OPM, has four identical subunits that each have the same three transmembrane regions. According to PDBTM there are four transmembrane regions and one intramembrane region for each of the identical subunits. For this protein the identification of transmembrane regions seems to be quite difficult, as there is so little consensus across the different databases. The problem seems to be caused by the shallow angles at which most of the helices enter the membrane, and by the fact that only very few of them actually cross the membrane. |
||
+ | |||
+ | For P47863 PolyPhobius predicts six transmembrane regions. |
||
+ | [[File:2d57.png|200px|thumb|left|2D57 » Aquaporin-4, image from OPM.]] |
||
+ | |||
+ | {|class="wikitable" border="1" style="border-spacing:0;text-align: right;" |
||
+ | !Region |
||
+ | !PolyPhobius Start |
||
+ | !Stop |
||
+ | !UniProt Start |
||
+ | !Stop |
||
+ | !OPM Start |
||
+ | !Stop |
||
+ | !PDBTM Start |
||
+ | !Stop |
||
+ | |- |
||
+ | |1.transmembrane |
||
+ | |34 |
||
+ | |58 |
||
+ | |37 |
||
+ | |57 |
||
+ | |34 |
||
+ | |56 |
||
+ | |38 |
||
+ | |55 |
||
+ | |- |
||
+ | |2.transmembrane |
||
+ | |70 |
||
+ | |91 |
||
+ | |65 |
||
+ | |85 |
||
+ | |70 |
||
+ | |88 |
||
+ | |72 |
||
+ | |89 |
||
+ | |- |
||
+ | |3.transmembrane |
||
+ | |115 |
||
+ | |136 |
||
+ | |116 |
||
+ | |136 |
||
+ | |98 |
||
+ | |107 |
||
+ | |*94 |
||
+ | |*106 |
||
+ | |- |
||
+ | |4.transmembrane |
||
+ | |156 |
||
+ | |177 |
||
+ | |156 |
||
+ | |176 |
||
+ | |112 |
||
+ | |136 |
||
+ | |116 |
||
+ | |133 |
||
+ | |- |
||
+ | |5.transmembrane |
||
+ | |188 |
||
+ | |208 |
||
+ | |185 |
||
+ | |205 |
||
+ | |156 |
||
+ | |178 |
||
+ | |158 |
||
+ | |177 |
||
+ | |- |
||
+ | |6.transmembrane |
||
+ | |231 |
||
+ | |252 |
||
+ | |232 |
||
+ | |252 |
||
+ | |189 |
||
+ | |203 |
||
+ | |188 |
||
+ | |205 |
||
+ | |- |
||
+ | |7.transmembrane |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | |214 |
||
+ | |223 |
||
+ | |*209 |
||
+ | |*222 |
||
+ | |- |
||
+ | |8.transmembrane |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | |231 |
||
+ | |252 |
||
+ | |231 |
||
+ | |248 |
||
+ | |- |
||
+ | |} |
||
+ | |||
+ | The third and seventh transmembrane regions are listed in OPM despite the fact that they are too short to span a membrane. Most likely these are intramembrane helices, as marked in PDBTM (as can be seen in the illustration to the left, two helices of the yellow subunit do not span the membrane). |
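The length argument can be checked directly from the table. The following is a minimal sketch that computes the length of each OPM segment for P47863 and flags the ones below a cutoff. The cutoff of 15 residues and all function names are our own assumptions for illustration; they are not values or routines used by OPM or PDBTM.

```python
# OPM segments for P47863 (2D57), copied from the table above:
# region number -> (start, stop), both inclusive.
OPM_SEGMENTS = {
    1: (34, 56),
    2: (70, 88),
    3: (98, 107),
    4: (112, 136),
    5: (156, 178),
    6: (189, 203),
    7: (214, 223),
    8: (231, 252),
}

# Assumed minimum length for a membrane-spanning helix.
# This threshold is illustrative only and does not come from OPM.
MIN_SPANNING_LENGTH = 15


def segment_length(start, stop):
    """Number of residues in an inclusive start..stop range."""
    return stop - start + 1


def too_short(segments, min_length=MIN_SPANNING_LENGTH):
    """Return the region numbers whose length is below min_length."""
    return sorted(
        region
        for region, (start, stop) in segments.items()
        if segment_length(start, stop) < min_length
    )


short_regions = too_short(OPM_SEGMENTS)
print("Regions too short to span the membrane:", short_regions)
```

With this cutoff only regions 3 and 7 (ten residues each) are flagged, which matches the two regions that PDBTM marks with *.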
||
+ | |||
+ | From the analysis of the three proteins we conclude that the results produced by PolyPhobius are reasonably good. While PolyPhobius seems to filter out regions that are too short to be a transmembrane helix, it does not seem to differentiate well between intramembrane and transmembrane helices, especially when the angle of entry is very shallow. This in turn can confuse the tool's sense of the inside/outside orientation of the protein, which leads to a decrease in performance. |
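A residue-level evaluation of this kind can be sketched as follows, using the P47863 rows of the table above with UniProt taken as the reference. The function names are ours, and choosing UniProt as the standard of truth is only one of the possible choices. True negatives are left out because they depend on the full sequence length, which is not given in the table.

```python
# Segments for P47863 copied from the table above (inclusive start, stop).
POLYPHOBIUS = [(34, 58), (70, 91), (115, 136), (156, 177), (188, 208), (231, 252)]
UNIPROT = [(37, 57), (65, 85), (116, 136), (156, 176), (185, 205), (232, 252)]


def residues(segments):
    """Expand a list of inclusive (start, stop) segments into a set of positions."""
    positions = set()
    for start, stop in segments:
        positions.update(range(start, stop + 1))
    return positions


def residue_counts(predicted, reference):
    """Count true positives, false positives and false negatives per residue."""
    pred = residues(predicted)
    ref = residues(reference)
    tp = len(pred & ref)   # predicted as membrane and annotated as membrane
    fp = len(pred - ref)   # predicted as membrane but not annotated
    fn = len(ref - pred)   # annotated as membrane but not predicted
    return tp, fp, fn


tp, fp, fn = residue_counts(POLYPHOBIUS, UNIPROT)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"TP={tp} FP={fp} FN={fn} precision={precision:.3f} recall={recall:.3f}")
```

Swapping `UNIPROT` for the OPM or PDBTM columns gives the database-dependent differences in the evaluation mentioned earlier.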
||
+ | |||
+ | Some other tools to predict transmembrane regions are for example [http://bioinf.cs.ucl.ac.uk/software_downloads/memsat/ MEMSAT3], [http://minnou.cchmc.org/ MINNOU], [http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_htm.html PHDhtm], [http://www.cbs.dtu.dk/services/TMHMM/ TMHMM2], [http://www.sbc.su.se/~miklos/DAS/maindas.html DAS], [http://www.enzim.hu/hmmtop/ HMMTOP2], [http://octopus.cbr.su.se/index.php?about=OCTOPUS OCTOPUS], [http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop/tmp/index.php SVMtop], [http://pongo.biocomp.unibo.it/ PONGO] or [http://www.ddg-pharmfac.net/bprompt/BPROMPT/BPROMPT.html BPROMPT]. |
||
= Signal peptides = |
= Signal peptides = |
||
+ | For the prediction of signal peptides we used [http://www.cbs.dtu.dk/services/SignalP/ SignalP] in versions 3 (offline version) and 4 (web server). In addition to our protein of interest (ODBA_HUMAN, see [[Reference sequence (uniprot)]]) we applied the method to [http://www.uniprot.org/uniprot/P02768 P02768] (Serum albumin), [http://www.uniprot.org/uniprot/P11279 P11279] (Lysosome-associated membrane glycoprotein 1) and [http://www.uniprot.org/uniprot/P47863 P47863] (Aquaporin-4). |
||
+ | {|class="wikitable" border="1" style="border-spacing:0;text-align: right;" |
||
+ | !Protein |
||
+ | !SignalPv3 NN MeanS-Score |
||
+ | !Hmm-Confidence |
||
+ | !SignalPv4 NN MeanS-Score |
||
+ | !SignalPeptide.de |
||
+ | |- |
||
+ | |ODBA_Human |
||
+ | |0.561 |
||
+ | |0.723 |
||
+ | |0.357 |
||
+ | |no entry |
||
+ | |- |
||
+ | |P02768 |
||
+ | |0.941 |
||
+ | |0.967 |
||
+ | |0.890 |
||
+ | |confirmed |
||
+ | |- |
||
+ | |P11279 |
||
+ | |0.961
||
+ | |1.000 |
||
+ | |0.962 |
||
+ | |confirmed |
||
+ | |- |
||
+ | |P47863 |
||
+ | |0.376 |
||
+ | |0.723 |
||
+ | |0.139 |
||
+ | |no entry |
||
+ | |- |
||
+ | |} |
||
+ | |||
+ | The default cutoff for SignalPv3 to consider a protein to contain a signal peptide is 0.48 for the mean S-score. So according to the old version of SignalP our protein would be classified as having a signal peptide. When using the D-score (a weighted average of the S-mean and the Y-max scores), which is supposed to show the best discrimination, ODBA_HUMAN falls just below the cutoff (0.425 versus 0.430) and would no longer be classified as having a signal peptide. In all other cases there was no disagreement between the mean S-score and the D-score. The HMM prediction produced quite high confidence values for all four proteins, even when the mean S-score and D-score from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method was deprecated. |
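The cutoff logic described above can be sketched as follows. The scores are the SignalPv3 mean S-scores from the table and the ODBA_HUMAN D-score quoted in the text. The function name and the use of a strict "greater than" comparison are our assumptions; none of the values lies exactly on a cutoff, so the choice does not affect the result here.

```python
# SignalPv3 neural-network mean S-scores, copied from the table above.
MEAN_S_SCORES = {
    "ODBA_HUMAN": 0.561,
    "P02768": 0.941,
    "P11279": 0.961,
    "P47863": 0.376,
}

MEAN_S_CUTOFF = 0.48  # default SignalPv3 cutoff for the mean S-score
D_CUTOFF = 0.43       # D-score cutoff quoted in the text
ODBA_D_SCORE = 0.425  # D-score of ODBA_HUMAN quoted in the text


def above_cutoff(score, cutoff):
    """Assumed decision rule: a signal peptide is called if score > cutoff."""
    return score > cutoff


mean_s_calls = {
    protein: above_cutoff(score, MEAN_S_CUTOFF)
    for protein, score in MEAN_S_SCORES.items()
}
odba_d_call = above_cutoff(ODBA_D_SCORE, D_CUTOFF)

for protein, call in mean_s_calls.items():
    print(f"{protein}: signal peptide by mean S-score = {call}")
print(f"ODBA_HUMAN: signal peptide by D-score = {odba_d_call}")
```

Only ODBA_HUMAN changes its classification between the two scores, which is the disagreement discussed in the paragraph above.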
||
+ | Checking the predictions with SignalPeptide.de proved difficult: for P02768, neither searching for the UniProt ID (ALBU_HUMAN) nor for the accession number (P02768) or the sequence retrieved any results; only searching for the trivial name Serum albumin returned any hits. We had the same problem with P11279. For P47863 and ODBA_HUMAN we found no entries in SignalPeptide.de. UniProt, however, marks the ODBA_HUMAN sequence from position 1 to 45 as a transit peptide and notes its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, it seems likely for a protein that metabolizes amino acids to be located in the mitochondria. For P47863, a lookup in UniProt as well as the prediction from PolyPhobius identified the protein as a membrane protein. Transmembrane regions seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% with the HMM approach in SignalPv3. |
||
+ | |||
= GO terms = |
= GO terms = |
||
+ | The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence above 90% are spot on and remarkably detailed. However, ODBA_HUMAN is not actually annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more detailed term GO:0004739. GOPET predicts multiple mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does in fact have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to; reducing the possible functions to these few select terms is, however, already a considerable feat. When searching the sequence of our protein against Pfam we found that it belongs to the E1_dh family, where dh stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase. |
||
+ | |||
<table border=1> |
<table border=1> |
||
<tr><td colspan=4 align=center><b>GOPET</b></td></tr> |
<tr><td colspan=4 align=center><b>GOPET</b></td></tr> |
||
Line 762: | Line 970: | ||
<tr><td>GO:0046872</td><td>F</td><td>62%</td><td>metal ion binding</td></tr> |
<tr><td>GO:0046872</td><td>F</td><td>62%</td><td>metal ion binding</td></tr> |
||
</table> |
</table> |
||
+ | |||
+ | |||
+ | Unfortunately, applying ProtFun to determine the protein function did not help much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%). |
||
############## ProtFun 2.2 predictions ############## |
############## ProtFun 2.2 predictions ############## |
Latest revision as of 16:12, 3 December 2012
Secondary structure
To predict secondary structure we use the following tools and compare the results:
- reprof
- psipred
- DSSP_Server
Methods
reprof
To run reprof from the command line, the following command is used:
reprof -i seq.fasta
reprof then calculates the secondary structure prediction and writes it to an output file "seq.reprof". reprof can be run with a single FASTA file or with a BLAST/HHBlits PSSM file. We tried both variants, because the second promises more accurate results, and used HHBlits PSSM files for this purpose. Results (H = helix, E = extended/sheet, L = loop):
reprof with fasta
obda_human Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9221124455554036207776653067862000247852012212357787787762666544200476501154066765467703167878778656 LLHHHHHHHHHHHHLLLLHHHHHHHLLLLLLLELLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHLLLLLLELLLLEEEEELLLLLEELLLLLLLLLHH MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 7776655320100000301100557547888740466664011100046751342012001024530573233245541430113666535300255543 HHHHHHHHHHHHHLHLHHEEELLLLLEEEEEEELLLLLLLLELLLLELLLLLEEEEEELLLLEEEELLLLHHHHHHHHHLLHLLLLLLLLLLLELLLLLL KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 2565305212002477767777653177627888842564565652123003342344178887067503105676210256402047640478887567 EEEEELLLHHHHLHHHHHHHHHHHHLLLLEEEEEEELLLLLLLLLLLLLLEEEEELLLLEEEEEELLLEEELLLLLLLELLLLEELLLLLEEEEEEEELL HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 7234432321577765553267212200001212125776522000114312346677733555545324788866888888643278889998876138 LLEEEEELLLHHHHHHHHHLLLLEEEEHEEEEELLLLLLLLLLHLLLLLLLLLLLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHHHHHHHHLL NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 898975011024431728777888989998887256688886799 LLLLLLEEELHHHHHLHHHHHHHHHHHHHHHHHHLLLLLLLLLLL PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9721000003610177776776314314516778775677778877514877115421004336431001011566864102210024543110024337 LLLLELHHLLLLLHHHHHHHHHHHLLEEEELLLLLLHHHHHHHHHHHHLLLLHHHHHHHHLLLLLLLHEEEHLLLLLLLLEEEEELLLLLLLLHLLLLLL MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL 0133000013330258887661566556631578202223423334201324622688888888877652057775225666664004000153444411 HHHHLHLHHHHHHLLLLLLLLHHHHHHHHHLLLLLHLLHHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLLLHHHHHHHHHLLLLLLHHHHHHHHH PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR 1047778532245655530000101103476775603366543044581121001011000456101257888875566677776613656705677777 HLLLLLLLLLHHHHHHHHHLHLLHHHLLLLLLLLLHHHHHHHLLLLLLLHHHHLHHEEEHLLLLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLHHHHHHH LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 7631777612221231100357767777776410311433210357677236888888842788636677614532246313688888875112001025 HHHLLLLLLHHHHHHHHLHHHHHHHHHHHHHHHLLHHHHHHHLLLLLLLLHHHHHHHHHHLLLLLEEEEEEELLLLLLLLLHHHHHHHHHHHLLHHHHLL ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL 65766771365554034675421654433056661778999988407888851138 LLLLLLLHHHHHHHLLLLLLLLHHHHHHHLLLLLLHHHHHHHHHHHLLLLLLLELL SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9888511664455661466520588761225577627887133322010123575211247777614640222244420133525663200001133420 LLLLLLLLLLLLLLLLEEEEELLLLLLLEEEEEEELLLLLLEEEEEHHHELLLLLLLEEEEEEEEELLLLEEELLLLLLLLLLLEEEEELLLLHHHHHHE MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK 1331577766237775323055315633101345543037640577623532202211441221344665000134433220422233320340467624 EEEELLLLLLLEEEEEEEELLLLEEEEEEEHHHHHHHLLLLLEEEEEELLLLLLEEEEEEEEEEEEEEEEELHHHHHHHHHLLLLLHHHHHLLLEEEEEL LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG 6474436343554310157887757611246217755467764332010220100255530521010002254110100100235413777501557763 LLLLLLLLHHHHHHHHLLLLLLLLLLLEEEEELLLLLLLLLLLLLLELLLLLELLEEEEELLLLEEEEHLLLLHHHHEHHHLLLLLLEEEEEELLLLLLL GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 2577760664021214631678750541456651137786315542201455324354045676554101135675567656767301776667777554 EEEEEEELLLEEEEELLLEEEEEELLLEEEEEEELLLLLLLLLLLLLEEEEEELLLLLHHHHHHHHHHHEELLLLLLLLLLLLLLLHHHHHHHHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI 4421010235433046662477622277652464221277133333243232000332155875402001230153121135788877678876434554 HHHLHHEEEEEEEELLLLLEEEEELLLLLLLLLLLEELLLLEEEEEEEEEEELLHHHHLLLLLLLEEEEHHHHLLLLHHLLLLLLLLLLLLLLLLHHHHH RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN 331156787677886677889 HHHLLLLLLLLLLLLLLLLLL KALTSETNGTDSNGSNSSNIQ
Q9X0E6
Rows per block: reliability (0-9, 9 = most reliable), sec-structure, AA-sequence
7687613771687888888888877787530020104624520132330256774264687898899987508777820100246778889988877524
LEEEEELLLLHHHHHHHHHHHHHHHHHHHHLHLHHHLLLEEELEEELLHHHHHHHLLLHHHHHHHHHHHHHLLLLLLLHHEHHHHHHHHHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV
9
L
L
reprof with HHBlits - PSSM (up20)
To retrieve PSSM files from hhblits, the tool hhblits_pssm.pl from the hhsuite is used (we used the version installed in "/opt/hhblits/hhblits/" on jobtest). It is started from the command line with the following command:
hhblits_pssm.pl --infile query.fasta --outfile query.pssm -h "/mnt/project/rost_db/data/hhblits/uniprot20_current"
Now reprof is run using the created PSSMs. Results:
obda_human - PSSM - UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9011122245644236115555543116766654453556544557765555458776453234143366656786728986778724477557888978 LLHLLLHHHHHHHLLLLLHHHHHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEEELLLLLELLLLLLLLLLHH MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 8888998898788888898885047875763577741478899998606897795021488998773998789899885003564567752011066777 HHHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLHHHHHHHHHHHLLLLLEEEELLHHHHHHHHLLLLHHHHHHHHHLLLLLLLLLLLLLEELLLLLL KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 6100101256557999999988553178816889881350116558999999774499889998848703362023556840288988637982799855 LLLLLELEHHHHHHHHHHHHHHHHHLLLLLEEEEEEELLLHLLLHHHHHHHHHHHLLLLEEEEEELLLEEEEEELLLLLLLLHHHHHHHHLLLLEEEEEL HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 6878889899999988752489768999973032777776755657888999986047859999988886799997888889889899889888888736 LLHHHHHHHHHHHHHHHHHLLLLEEEEEEEELLLLLLLLLLLLLLLLHHHHHHHHHLLLHHHHHHHHHHHLLLLLHHHHHHHHHHHHHHHHHHHHHHHHL NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 674889999984266986689888988999872585477765789 LLLLHHHHHHHHLLLLLHHHHHHHHHHHHHHHHLLLLLLLLLLLL PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
P10775 - PSSM UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9565456756800079887228884688850676780015788998731898358884146568446899988752215662788841464672215778 LEEELLLLLLLHHLHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLLLLHHHH MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVL 9887308884588830464683328888888862143407788542534755457889887128881688852465564557888877740054607888 HHHHHHLLLLEEEEEELLLLLLLLHHHHHHHHHHHLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEEELLLLLLLLHHHHHHHHHLHLLLEEEEE PSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLR 5235237654577642014523537877135656742178888877401756388886436667013788898873078807887046736732088888 EELLLLLLLLHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLHHHHH LENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 6630326755888851566674434568988711888168782056578645899987620337752788731675685427788988830888358783 HHHHLLLLLEEEEEEELLLLLLLLHHHHHHHHHLLLLLEEEEELLLLLLLLLHHHHHHHHLLLLLLEEEEEELLLLLLLLLHHHHHHHHHLLLLLEEEEE ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL 16768915799999887624877067785067688677999998898767846738 LLLLLLHHHHHHHHHHHHLLLLLEEEEEEELLLLLHHHHHHHHHHHHHLLLLEEEL SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - PSSM - UP20 HHBlits Reliability( 0-9 (most reliable) ) sec-structure AA-sequence 9877777751245775557766787777782220055787587899999852899898889999999999885489834700233453125510899999 LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHLELLLLLLLHHHHHHHHHLLLLLLHHHHHHHHHHHHHHHHLLLLEEEEELLEEEEEELLLLHHHHHH MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMK 9874046554058998856078976799999999864215760899860665688887742352656642488899989999871264064687267614 HHHHLLLLLLLEEEEEEEELLLLLLHHHHHHHHHHHHHHLLLLEEEEELLLLHHHHHHHHLLLHHHHHHHHHHHHHHHHHHHHHLHHHEEELLLEEEEEL LFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHG 4468877624101025657787543410023305421124444555301256567542128789999988769817898434133301220477656630 LLLLLLLLHHHLHHLLLLLLLLLLLLLHEEEELLLLLLLLLLLLLLLEELLLLLLLLLELLHHHHHHHHHHLLLLEEEEELLLLLLLEEELLLLLLLLLL GLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 6999984654465668357999981884238986037887888976653121332023457888998531456765455553223355057788999999 EEEEEEELLLLLLLLLLEEEEEEEELLLLEEEEEELLLLLLLLLLLLLLLLEELLLHHHHHHHHHHHHHHLLLLLLLLLLLLLLLLLLHHHHHHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKI 9999998788876551124787312477874554102476332344445542100135665766532426788752502146888766555656657666 HHHHHHHHHHHHHHHHLLLHHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHLLLLLLLLLLLLLLLHHHHHHLLLHLLLLLLLLLLLLLLLLLLLLL RAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSIN 554655567766555544559 LLLLLLLLLLLLLLLLLLLLL KALTSETNGTDSNGSNSSNIQ
Q9X0E6 - PSSM - UP20 HHBlits
Rows per block: reliability (0-9, 9 = most reliable), sec-structure, AA-sequence
9999996599688889999999956312676518237898878224320588999844887899899999729988980788864026878999999843
LEEEEEELLLHHHHHHHHHHHHHHLLEEEEEELLLEEEEEELLELLLLLEEEEEEEELHHHHHHHHHHHHHLLLLLLLLEEEEELLLLLHHHHHHHHHHL
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESV
9
L
L
psipred
The version from [1] was used to predict secondary structure with psipred. Results:
obda_human confidence sec-structure AA-sequence 915554344652010125789986408888898888999867679889999999999943 CHHHHHHHHHHHHCCHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE 345544347897889982787589997259999999999999999999799999999999 CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHH FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY 984258734444798615999998531399982418689312241328998999998626 HHHHCCCCCCCCCCCCCHHHHHHHHHHCCCCCEEECCCCCCHHHHHCCCCHHHHHHHHCC ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG 978889988877778888776122353334681566759888767099958999818886 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCC NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA 670358878577674079759999579823236884113641143204567987321028 CCHHHHHHHHHHHHHHCCCEEEEEECCCEEECCCCCCCCCCCHHHHHCCCCCCCCCEECC ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 209999999999999999089974998642117999999999999997788866514995 CCHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHCCCC NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP 799999999779999999999999999999999999992999996677866421799789 HHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHCCCCHHH ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL 9999999999998299899998789 HHHHHHHHHHHHHHCCCCCCCCCCC RKQQESLARHLQTYGEHYPLDHFDK
P10775 confidence sec-structure AA-sequence 989828999999999999677137869971179999887999999643699967898647 CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCC MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRT 999958999999760499984138982169999455648999723799858897979999 CCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCC NELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPL 926999999884299986488981148899014899999871699979997869999968 CHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHH GDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEA 999999750299985368971389999776999999882199979796999999918999 HHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH GARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA 998530499984148970289999665999999860499969996889999907999999 HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHH ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLC 980499986458961189789888999999882499749896999999925899999851 HHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHC ESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQAL 899961119980268899666999999984399979885999999938999999840699 CCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCC SQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQP 997678830589887899999999995599831219 CCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC GCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 confidence sec-structure AA-sequence 999988999887776433454799998899321139899989999999996317799999 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHH MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESV 999999999997219880321498447446663057899999816999998630024543 HHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCC ALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYV 689971899999999834099957772157432122131554799995408989999985 CCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHH DRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMD 429830331043915485046798999911110268999999999741002589999889 HCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCC AFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFG 976434567889874069747689999987549803433323114013201134467999 CCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCC NEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFP 469980378752345882379998376020786434899998898885444531367899 CEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHH SLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEK 999999984067865568988888820389999999997775556788889999842100 HHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH VTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESV 000137999999886667723566666677776665430699998898548887401200 HHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHH LTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRIN 38999989998864522211113457789999999999999 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ
Q9X0E6
Rows per block: confidence, sec-structure, AA-sequence
999999079999999999998635611237776532476544610112148789766711
CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCC
MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEE
19999999986599966418998366555778899875229
CHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
KEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
DSSP_Server
To use DSSP_Server we first had to determine which PDB IDs are associated with the UniProt IDs P12694, P10775, Q9X0E6 and Q08209:
UniProt ID | PDB IDs |
P12694 | 1DTW, 1OLS, 1OLU, 1OLX, 1U5B, 1V11, 1V16, 1V1M, 1V1R, 1WCI, 1X7W, 1X7Y, 1X7Z, 1X80, 2BEU, 2BEV, 2BEW, 2BFB, 2BFC, 2BFD, 2BFE, 2BFF, 2J9F |
P10775 | 1DFJ, 2BNH |
Q08209 | 1AUI, 1M63, 1MF8, 2JOG, 2JZI, 2P6B, 2R28, 2W73, 3LL8 |
Q9X0E6 | 1KR4, 1O5J, 1VHF |
Now DSSP_Server is run for each UniProt ID with the corresponding PDB ID that has the best resolution or the greatest span over the protein. Results:
P12694 - 2BFD - Position 46-445 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S TT SS SS S EE SB TTS BS GGG HHHHHHHHHHHHHHHHHHHHHHHHHHTTSSS TT HHHHHHHHHTS AKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALD TTSEEE S HHHHHHTT HHHHHHHHHT TT TTTT S SS BTTTTB SSTTTHHHHHHHHHHHHHHHT EEEEEETTGGGSHHHHHH NTDLVFGQAREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAG HHHHHHTT EEEEEEE SEETTEEGGGT SSSTTGGGTGGGT EEEEEETT HHHHHHHHHHHHHHHHHHT EEEEEE HHHHHHHHH FNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIG!ST!DHPISRLRHYL TTTT HHHHHHHHHHHHHHHHHHHHHHHHS B GGGGSTTSSSS HHHHHHHHHHHHHHHHHGGGS GGGB S EEEE HHHHHHHHH LSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK!AHF!EYGQTQKMNLFQSVTSAL HHHHHH TT EEEETTTTTT TTSTTTTHHHHH TTTEEE S HHHHHHHHHHHHHTT EEEE SSGGG GGGHHHHHTTGGGHHHHTTTSS TTEE DNSLAKDPTAVIFGEDVAFGGVFRCTVGLRDKYGKDRVFNTPLCEQGIVGFGIGIAVTGATAIAEIQFADYIFPAFDQIVNEAAKYRYRSGDLFNCGSLT EEEEES SS GGGSS HHHHHTSTT EEE SSHHHHHHHHHHHHHSSS EEEEEEGGGTTS EEEESS SS EEEE SSEEEEE TTHH IRSPWGCVGHGALYHSQSPEAFFAHCPGIKVVIPRSPFQAKGLLLSCIEDKNPCIFFEPKILYRAAAEEVPIEPYNIPLSQAEVIQEGSDVTLVAWGTQV HHHHHHHHHHHHHH EEEEE EEES HHHHHHHHHHHS EEEEEEEESTT HHHHHHHHHHHHHGGG SS EEEEE SS STTHHHHS HHH HVIREVASMAKEKLGVSCEVIDLRTIIPWDVDTICKSVIKTGRLLISHEAPLTGGFASEISSTVQEECFLNLEAPISRVCGYDTPFPHIFEPFYIPDKWK HHHHHHHHHT CYDALRKMINY
P10775 - 2BNH - position 1-456 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S B EES HHHHHHHHHHHTT SEEEEET HHHHHHHHHHHTT TT EEE S HHHHHHHHHHHHSSTT EEE TTS GGGGGS AMNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGV HHHHHHH TT EEE S HHHHHHHHHHHHHSTT EEE TT BHHHHHHHHHHHHH S EEE TTSB HHHHHHHHHHHHHT S EE LPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETL E TTS HHHHHHHHHHHHH TT EEE SS HHHHHHHHHHHHT TT EEE TTS HHHHHHHHHHHHH SS EEE TTS HHHHHHHH RLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLL HHHHTSTT EEE TTS BGGGHHHHHHHHHH SS EEE SSB HHHHHHHHHHHTTSSS EEE TTS HHHHHHHHHHHHH S EEE CESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELD TTSS HHHHHHHHHHHTSSS EEE TT HHHHHHHHHHHHH SS EEE LSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Q08209 - 1AUI - position 1-521 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S SSTTS B HHHHB TTS B HHHHHHHHHTT B HHHHHHHHHHHHHHHHTS SEEEE SSEEEE TT HHHHHHHHHHH TTT ATDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTR EEE S SSSSS HHHHHHHHHHHHHHSTTTEEE TTSSHHHHHHSSHHHHHHHHS HHHHHHHHHHHTTS EEEETTTEEEESS TT SHHHH YLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDI HHS SSS SSSHHHHHHH EE TTTTS SS EEE TTTTSSEEE HHHHHHHHHHTT SEEEE S TTSEEE B TTTSSBSEEEE SSGG RKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYL GTS EEEEEEETTEEEEEEE GGG HHHHHHHHHHHHHHHHHHHHHTT HHHHHHHHGGGGS S HHHHHHHH DVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS!SFEEAKGLDRINERMPPR!SYPLEMCSHFDADEIKRLG HHHHHH TT SEE HHHHTTSHHHHT TTHHHHHHHH TT SSSEEHHHHHHHHGGG TT HHHHHHHHHHHH TT SSEE HHHHHHHHHHHHTTSS KRFKKLDLDNSGSLSVEEFMSLPELQQNPLVQRVIDIFDTDGNGEVDFKEFIEGVSQFSVKGDKEQKLRFAFRIYDMDKDGYISNGELFQVLKMMVGNNL HHHHHHHHHHHHHHH TTSSSSEEHHHHHHHHGGG GGGG KDTQLQQIVDKTIINADKDGDGRISFEEFCAVVGGLDIHKKMVVDV
Q9X0E6 - 1KR4 - position 1-101 sec-structure ( H = alpha helix B = residue in isolated beta-bridge E = extended strand, participates in beta ladder G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend ) aa-sequence S EE EEEEEEEESSHHHHHHHHHHHHHTTS SEEEEEEEEEEEEETTEEEEEEEEEEEEEEEGGGHHHHHHHHHHH SSSS EEEE EEHHH AALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEY HHHHHHHTS XNWLRESVLGS
Comparison
|
This table shows the result of our comparison. With a single FASTA file as query, PSIPRED performs much better than ReProf on all targets. With an HHblits PSSM as query, ReProf outperforms PSIPRED in three of the four cases, and the two perform equally in the fourth. The true positives (TPs) are the secondary structure elements that match between the prediction and DSSP, counted over the range covered by the DSSP file. |
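The counting of true positives described above can be sketched as follows. This is a minimal illustration, not the script used for the table: it assumes a per-residue comparison, and the reduction of the eight DSSP states to three classes (H, G, I to helix; E, B to strand; everything else to coil) is a common convention that we assume here. The function names and the toy strings are hypothetical.

```python
# Assumed 8-to-3 state reduction; DSSP states not listed here become coil ("C").
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map an 8-state DSSP string to the three classes H, E and C."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)


def count_matches(predicted: str, dssp: str) -> tuple[int, int]:
    """Return (matching positions, compared positions) over the DSSP range."""
    reference = reduce_dssp(dssp)
    if len(predicted) != len(reference):
        raise ValueError("prediction and DSSP string must cover the same residues")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches, len(reference)


# Toy strings for illustration only, not taken from any of the four proteins.
tp, total = count_matches("CCHHHHHCCEEEC", " SHHHHGT EEEB")
print(f"{tp}/{total} positions agree ({tp / total:.0%})")
```

The prediction has to be trimmed to the residue range of the DSSP file before the comparison, which is why the sketch refuses strings of unequal length.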
Disorder
To predict disorder in our protein ODBA_HUMAN we used IUPred and compared the results to the entries in DisProt. Since only the protein Q08209 can be found directly in DisProt, we used the "search by sequence" feature for the other proteins and checked whether reliable hits could be found. The following entries were chosen:
UniProt ID | DisProt ID | identities | positives | gaps | e-value | direct hit |
---|---|---|---|---|---|---|
Q08209 | DP00092 | 100% | 100% | 0 | - | y |
P12694 | - | 0% | 0% | 0 | - | n |
P10775 | DP00554 | 40% | 54% | 0 | 5e-30 | n |
Q9X0E6 | DP00175 | 32% | 56% | 0 | 4.3 | n |
Q08209
|
|
P12694
|
|
P10775
|
|
Q9X0E6
|
|
Transmembrane helices
For the prediction of transmembrane helices in our protein we used PolyPhobius. In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)) we also applied the method to P35462 (D(3) dopamine receptor), Q9YDF8 (Voltage-gated potassium channel) and P47863 (Aquaporin-4).
PolyPhobius predicts ODBA_HUMAN to be a completely cytoplasmic protein without any transmembrane regions. We found no entry in any of the databases that indicates otherwise.
P35462, on the other hand, is predicted to be a transmembrane protein with seven transmembrane regions.
Region | PolyPhobius Start | Stop | UniProt Start | Stop | OPM Start | Stop | PDBTM Start | Stop |
---|---|---|---|---|---|---|---|---|
1.transmembrane | 30 | 55 | 33 | 55 | 34 | 52 | 35 | 52 |
2.transmembrane | 66 | 88 | 66 | 88 | 67 | 91 | 68 | 84 |
3.transmembrane | 105 | 126 | 105 | 126 | 101 | 126 | 109 | 123 |
4.transmembrane | 150 | 170 | 150 | 170 | 150 | 170 | 152 | 166 |
5.transmembrane | 188 | 212 | 188 | 212 | 187 | 209 | 191 | 206 |
6.transmembrane | 329 | 352 | 330 | 351 | 330 | 351 | 334 | 347 |
7.transmembrane | 367 | 386 | 367 | 388 | 363 | 386 | 368 | 382 |
As can be seen in the table above, the number of transmembrane regions is the same in all three databases and in the prediction. While the transmembrane regions largely overlap between the different sources, the exact start and stop positions differ somewhat. Depending on which database we choose as the standard of truth, we therefore get slightly different results when evaluating the performance of the transmembrane region prediction.
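How much the choice of reference changes the evaluation can be made concrete with a small sketch. It uses the start and stop positions from the P35462 table above and computes per-residue precision and recall of the PolyPhobius helices against each database. Pairing the helices by their order and scoring per residue are our assumptions for this illustration, and the function names are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, stop) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Start/stop positions for P35462 copied from the table above.
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170),
               (188, 212), (329, 352), (367, 386)]
references = {
    "UniProt": [(33, 55), (66, 88), (105, 126), (150, 170),
                (188, 212), (330, 351), (367, 388)],
    "OPM": [(34, 52), (67, 91), (101, 126), (150, 170),
            (187, 209), (330, 351), (363, 386)],
    "PDBTM": [(35, 52), (68, 84), (109, 123), (152, 166),
              (191, 206), (334, 347), (368, 382)],
}


def residue_scores(predicted, reference):
    """Per-residue (precision, recall), pairing helices by their order."""
    shared = sum(overlap(p, r) for p, r in zip(predicted, reference))
    n_pred = sum(stop - start + 1 for start, stop in predicted)
    n_ref = sum(stop - start + 1 for start, stop in reference)
    return shared / n_pred, shared / n_ref


for name, segments in references.items():
    precision, recall = residue_scores(polyphobius, segments)
    print(f"{name}: precision {precision:.2f}, recall {recall:.2f}")
```

Because the PDBTM segments are shorter and lie entirely within the predicted helices, PDBTM gives full recall but lower precision than the other two references.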
For Q9YDF8 PolyPhobius again predicts seven transmembrane regions.
Region | PolyPhobius Start | Stop | UniProt Start | Stop | OPM Start | Stop | PDBTM Start | Stop |
---|---|---|---|---|---|---|---|---|
1.transmembrane | 42 | 60 | 39 | 63 | 153 | 172 | 21 | 52 |
2.transmembrane | 68 | 88 | 68 | 92 | 183 | 195 | 57 | 80 |
3.transmembrane | 108 | 129 | 109 | 125 | 207 | 225 | 151 | 171 |
4.transmembrane | 137 | 157 | 129 | 145 |  |  | *184 | *200 |
5.transmembrane | 163 | 184 | 160 | 184 |  |  | 209 | 236 |
6.transmembrane | 196 | 213 | *196 | *208 |  |  |  |  |
7.transmembrane | 224 | 244 | 222 | 253 |  |  |  |  |
In this case, however, the number of transmembrane regions per database varies greatly. UniProt lists six transmembrane regions and one intermembrane region (marked with *), which mostly overlap with the PolyPhobius prediction. The 1ORQ structure of the protein, in contrast, has four identical subunits that each have the same three transmembrane regions. According to PDBTM there are four transmembrane regions and one intermembrane region in each of the identical subunits. For this protein the identification of transmembrane regions seems to be quite difficult, as there is very little consensus across the databases. The problem seems to be caused by the shallow angles at which most of the helices enter the membrane, and by the fact that only very few of them actually cross it.
For P47863 PolyPhobius predicts six transmembrane regions.
Region | PolyPhobius Start | Stop | UniProt Start | Stop | OPM Start | Stop | PDBTM Start | Stop |
---|---|---|---|---|---|---|---|---|
1.transmembrane | 34 | 58 | 37 | 57 | 34 | 56 | 38 | 55 |
2.transmembrane | 70 | 91 | 65 | 85 | 70 | 88 | 72 | 89 |
3.transmembrane | 115 | 136 | 116 | 136 | 98 | 107 | *94 | *106 |
4.transmembrane | 156 | 177 | 156 | 176 | 112 | 136 | 116 | 133 |
5.transmembrane | 188 | 208 | 185 | 205 | 156 | 178 | 158 | 177 |
6.transmembrane | 231 | 252 | 232 | 252 | 189 | 203 | 188 | 205 |
7.transmembrane |  |  |  |  | 214 | 223 | *209 | *222 |
8.transmembrane |  |  |  |  | 231 | 252 | 231 | 248 |
The third and seventh regions are listed in OPM even though they are too short to span a membrane. Most likely these are intermembrane helices, as marked in PDBTM (as can be seen in the illustration to the left, two helices of the yellow subunit do not span the membrane).
From the analysis of the three proteins we conclude that the results produced by PolyPhobius are reasonably good. While PolyPhobius seems to filter out regions that are too short to be a transmembrane helix, it does not seem to differentiate well between intermembrane and transmembrane helices, especially when the angle of entry is very shallow. This in turn can confuse the tool's sense of the interior and exterior of a protein, which decreases its performance.
Other tools for predicting transmembrane regions include MEMSAT3, MINNOU, PHDhtm, TMHMM2, DAS, HMMTOP2, OCTOPUS, SVMtop, PONGO and BPROMPT.
Signal peptides
For the prediction of signal peptides we used SignalP in versions 3 (offline version) and 4 (web server). In addition to our protein of interest (ODBA_HUMAN, see Reference sequence (uniprot)) we also applied the method to P02768 (Serum albumin), P11279 (Lysosome-associated membrane glycoprotein 1) and P47863 (Aquaporin-4).
Protein | SignalPv3 NN MeanS-Score | Hmm-Confidence | SignalPv4 NN MeanS-Score | SignalPeptide.de |
---|---|---|---|---|
ODBA_Human | 0.561 | 0.723 | 0.357 | no entry |
P02768 | 0.941 | 0.967 | 0.890 | confirmed |
P11279 | 0.961 | 1.000 | 0.962 | confirmed |
P47863 | 0.376 | 0.723 | 0.139 | no entry |
The default cutoff for SignalPv3 to consider a protein to contain a signal peptide is 0.48 for the Mean-S score. According to the old version of SignalP, our protein would therefore be classified as having a signal peptide. When using the D-score (a weighted average of the S-mean and the Y-max scores), which is supposed to show the best discrimination, ODBA_HUMAN falls just below the cutoff (0.425/0.430) and would no longer be classified as having a signal peptide. In all other cases there was no disagreement between the Mean-S and the D-score.
The HMM prediction produced quite high confidence values for all four proteins, even when the Mean-S and D values from the neural network prediction indicated otherwise. We were unable to obtain any HMM predictions from the web version of SignalPv4. We are not sure whether this is just a limitation of the web platform or whether this prediction method was deprecated.
Checking the predictions with SignalPeptide.de proved difficult. For P02768, neither searching for the UniProt ID (ALBU_HUMAN) nor for the accession number (P02768) or the sequence retrieved any results. Only searching for the trivial name Serum albumin returned any results. We had the same problem with P11279. For P47863 and ODBA_HUMAN we found no entries in SignalPeptide.de.
UniProt, however, marks positions 1 to 45 of the ODBA_HUMAN sequence as a transit peptide and notes its cellular location as mitochondrial. While this directly contradicts the SignalP prediction, it seems plausible for a protein that metabolizes amino acids to be located in the mitochondria.
For P47863, a lookup in UniProt as well as the prediction from PolyPhobius identified the protein as a membrane protein. Transmembrane regions seem to be quite similar to signal peptides, which might explain the predicted likelihood of 72.3% with the HMM approach in SignalPv3.
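The cutoff logic can be sketched in a few lines. The scores are the SignalPv3 NN Mean-S values from the table above and the D-score and cutoff for ODBA_HUMAN as reported in the text. Treating a score at or above the cutoff as a positive call is our assumption, and the helper name `passes` is hypothetical.

```python
MEAN_S_CUTOFF = 0.48  # SignalPv3 default for the Mean-S score, as quoted in the text

# SignalPv3 NN Mean-S scores copied from the table above.
mean_s = {"ODBA_HUMAN": 0.561, "P02768": 0.941, "P11279": 0.961, "P47863": 0.376}


def passes(score: float, cutoff: float) -> bool:
    """Assumed rule: a score at or above the cutoff counts as a positive call."""
    return score >= cutoff


calls = {protein: passes(score, MEAN_S_CUTOFF) for protein, score in mean_s.items()}
print("Mean-S calls:", calls)

# D-score and D-score cutoff for ODBA_HUMAN as reported in the text (0.425/0.430).
odba_d_call = passes(0.425, 0.430)
print("ODBA_HUMAN by D-score:", odba_d_call)
```

ODBA_HUMAN is the only protein for which the two scores disagree: it passes the Mean-S cutoff but not the D-score cutoff.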
GO terms
The GO terms predicted by GOPET give quite a good idea of the function of our protein. All predictions with a confidence above 90% are spot on and remarkably detailed. However, ODBA_HUMAN is actually not annotated with GO:0004739 (pyruvate dehydrogenase acetyl-transferring activity) or GO:0004738 (pyruvate dehydrogenase activity). It also seems odd that the hierarchically higher term GO:0004738 is predicted with a lower confidence than the more detailed term GO:0004739. GOPET predicts multiple mutually exclusive categories of dehydrogenases for our protein, so we would assume that the protein does in fact have some kind of dehydrogenase activity. Without further information about the protein it would be hard to decide which of the predicted dehydrogenase categories it actually belongs to. Reducing the possible functions to these few select terms is nevertheless a considerable feat. When searching the sequence of our protein against Pfam we found that it belongs to the E1_dh family, where dh stands for dehydrogenase. This further strengthens our assumption that our protein of interest is a dehydrogenase.
GOPET | |||
GOid | Aspect | Confidence | GO-Term |
GO:0003824 | F | 97% | catalytic activity |
GO:0016491 | F | 96% | oxidoreductase activity |
GO:0016624 | F | 95% | oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor |
GO:0003863 | F | 90% | 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity |
GO:0004739 | F | 89% | pyruvate dehydrogenase acetyl-transferring activity |
GO:0004738 | F | 78% | pyruvate dehydrogenase activity |
GO:0003826 | F | 77% | alpha-ketoacid dehydrogenase activity |
GO:0047101 | F | 75% | 2-oxoisovalerate dehydrogenase acylating activity |
GO:0008677 | F | 65% | 2-dehydropantoate 2-reductase activity |
GO:0019152 | F | 63% | acetoin dehydrogenase activity |
GO:0030955 | F | 63% | potassium ion binding |
GO:0016616 | F | 62% | oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor |
GO:0046872 | F | 62% | metal ion binding |
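The two checks made on this table, selecting the high-confidence terms and spotting a parent term that scores lower than its more specific child, can be sketched as below. The confidences are copied from the first eight rows of the table. The single parent relation is the one stated in the text and is not looked up in the ontology. The function names are hypothetical, and the 90% row is included by using "at least 90%" as the threshold.

```python
# GO id -> confidence in percent, copied from the GOPET table above (first eight rows).
gopet = {
    "GO:0003824": 97, "GO:0016491": 96, "GO:0016624": 95, "GO:0003863": 90,
    "GO:0004739": 89, "GO:0004738": 78, "GO:0003826": 77, "GO:0047101": 75,
}

# Child -> parent, as stated in the text; only this one pair is listed here.
parent_of = {"GO:0004739": "GO:0004738"}


def confident(predictions, threshold):
    """GO ids whose confidence is at least the threshold, in table order."""
    return [go for go, conf in predictions.items() if conf >= threshold]


def hierarchy_conflicts(predictions, parents):
    """(child, parent) pairs where the specific term outscores its parent."""
    return [
        (child, parent)
        for child, parent in parents.items()
        if child in predictions and parent in predictions
        and predictions[child] > predictions[parent]
    ]


print(confident(gopet, 90))
print(hierarchy_conflicts(gopet, parent_of))
```

A complete check would read the parent relations from the Gene Ontology itself instead of a hand-written dictionary.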
Unfortunately, applying ProtFun to determine the protein function did not help very much. Only one statement has a confidence value above 33%, namely that the protein is an enzyme (76.9%).
 ############## ProtFun 2.2 predictions ##############
 
 >sp_P12694_O
 
 # Functional category                  Prob     Odds
   Amino_acid_biosynthesis              0.187    8.520
   Biosynthesis_of_cofactors            0.246    3.413
   Cell_envelope                        0.035    0.581
   Cellular_processes                   0.041    0.560
   Central_intermediary_metabolism   => 0.321    5.096
   Energy_metabolism                    0.208    2.310
   Fatty_acid_metabolism                0.023    1.738
   Purines_and_pyrimidines              0.257    1.059
   Regulatory_functions                 0.031    0.194
   Replication_and_transcription        0.170    0.636
   Translation                          0.047    1.078
   Transport_and_binding                0.029    0.071
 
 # Enzyme/nonenzyme                     Prob     Odds
   Enzyme                            => 0.769    2.683
   Nonenzyme                            0.231    0.324
 
 # Enzyme class                         Prob     Odds
   Oxidoreductase (EC 1.-.-.-)          0.178    0.857
   Transferase    (EC 2.-.-.-)          0.238    0.690
   Hydrolase      (EC 3.-.-.-)          0.190    0.601
   Lyase          (EC 4.-.-.-)          0.076    1.614
   Isomerase      (EC 5.-.-.-)          0.010    0.321
   Ligase         (EC 6.-.-.-)       => 0.085    1.673
 
 # Gene Ontology category               Prob     Odds
   Signal_transducer                    0.098    0.458
   Receptor                             0.006    0.038
   Hormone                              0.001    0.206
   Structural_protein                   0.005    0.170
   Transporter                          0.025    0.226
   Ion_channel                          0.009    0.163
   Voltage-gated_ion_channel            0.004    0.170
   Cation_channel                       0.010    0.215
   Transcription                        0.060    0.470
   Transcription_regulation             0.053    0.427
   Stress_response                      0.010    0.110
   Immune_response                      0.012    0.136
   Growth_factor                        0.009    0.609
   Metal_ion_transport                  0.012    0.025
 
 //