Difference between revisions of "Sequence-Based Predictions Hemochromatosis"
Bernhoferm (talk | contribs) (→Comparison) |
Detailed description: [[Task_3_-_Sequence-based_predictions|Sequence-Based Predictions]]

In this part of the wiki, we present our results for several sequence-based prediction methods. These cover the prediction of secondary structure, disordered regions, transmembrane helices, signal peptides, and GO annotations.

== Protocol ==

A protocol describing the data acquisition, together with the other scripts used for this task, is available [[Task3 Hemochromatosis Protocol|here]].

== Secondary Structure ==

<br style="clear:both;">

In the following, the secondary structure predictions of PsiPred and ReProf were evaluated against the DSSP assignments. The DSSP data was parsed so that only the three states H (helix), E (strand/sheet), and C (coil) remain. Positions in the sequence used for DSSP that have no DSSP assignment were marked as "*" in the sequence and assigned coil (C).
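The reduction to three states can be sketched as follows. This is a minimal sketch, not the script used for this page. The mapping of the eight DSSP states onto H, E, and C is not stated above, so the common convention used here (H, G, I → H; E, B → E; everything else → C) is an assumption. The input format (a list of amino acid / DSSP state pairs, with `None` for positions without DSSP data) and the name `reduce_dssp` are also hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed reduction of the eight DSSP states to three (a common convention;
# the exact mapping used for this page is not documented).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(residues: List[Tuple[str, Optional[str]]]) -> Tuple[str, str]:
    """Return (sequence, three-state structure) for a list of residues.

    Each residue is an (amino_acid, dssp_state) pair; dssp_state is None
    for positions without a DSSP assignment (hypothetical input format).
    """
    sequence, structure = [], []
    for amino_acid, state in residues:
        if state is None:
            # No DSSP data: write "*" into the sequence and count it as coil.
            sequence.append("*")
            structure.append("C")
        else:
            sequence.append(amino_acid)
            # T, S, blank and any other state become coil.
            structure.append(EIGHT_TO_THREE.get(state, "C"))
    return "".join(sequence), "".join(structure)


if __name__ == "__main__":
    example = [("M", None), ("K", " "), ("L", "H"), ("A", "G"), ("V", "E"), ("S", "T")]
    print(reduce_dssp(example))  # ('*KLAVS', 'CCHHEC')
```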

Q3 and SOV scores were then computed. Q3 is the percentage of residues whose secondary structure state is predicted correctly. SOV (segment overlap) instead measures how well the individual secondary structure segments are reproduced. For example, in each of the two pairs below, the first line is the observed structure and the second line the prediction. The first pair

CCCCCHHHHHHHHHCCCCC

CCCCCHCHCHCHCHCCCCC

gets a much lower SOV than the second pair

CCCCCHHHHHHHHHCCCCC

CCCCCCCHHHHHCCCCCCC

although their Q3 scores do not differ (15 of 19 residues are correct in both cases). Like Q3, SOV has a maximum of 100%. It therefore gives additional insight into the quality of the predictions.
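The idea behind SOV can be made concrete with a compact implementation of the SOV'99 definition. This is a sketch for illustration only. It is not necessarily the SOV variant or program used for the scores on this page, and the function names are hypothetical. For each state, every observed segment is paired with each overlapping predicted segment of the same state and scored by (minov + δ) / maxov, weighted by the observed segment length. An observed segment without any overlapping prediction only adds to the normaliser.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (state, first index, last index), 0-based


def segments(ss: str) -> List[Segment]:
    """Split a structure string into maximal runs of one state."""
    runs = []
    start = 0
    for i in range(1, len(ss) + 1):
        if i == len(ss) or ss[i] != ss[start]:
            runs.append((ss[start], start, i - 1))
            start = i
    return runs


def sov99(observed: str, predicted: str, states: str = "HEC") -> float:
    """Segment overlap score in percent, following the SOV'99 definition."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    obs_segments = segments(observed)
    pred_segments = segments(predicted)
    score = 0.0
    normaliser = 0
    for state in states:
        for _, s1_start, s1_end in (s for s in obs_segments if s[0] == state):
            len1 = s1_end - s1_start + 1
            has_overlap = False
            for _, s2_start, s2_end in (s for s in pred_segments if s[0] == state):
                minov = min(s1_end, s2_end) - max(s1_start, s2_start) + 1
                if minov <= 0:
                    continue  # these two segments do not overlap
                has_overlap = True
                len2 = s2_end - s2_start + 1
                maxov = max(s1_end, s2_end) - min(s1_start, s2_start) + 1
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                score += (minov + delta) / maxov * len1
                normaliser += len1  # counted once per overlapping pair
            if not has_overlap:
                normaliser += len1  # observed segment without any match
    return 100.0 * score / normaliser if normaliser else 0.0


if __name__ == "__main__":
    observed = "CCCCCHHHHHHHHHCCCCC"
    print(round(sov99(observed, "CCCCCHCHCHCHCHCCCCC"), 1))  # 27.3
    print(round(sov99(observed, "CCCCCCCHHHHHCCCCCCC"), 1))  # 89.5
```

For the two example pairs above, this gives about 27.3 and 89.5. Q3 is 78.9 for both pairs.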

Q3E, Q3H, and Q3C denote the percentage of residues observed in state E, H, or C, respectively, that were predicted correctly.
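Q3 and the per-state scores can be computed as below. This is a minimal sketch with hypothetical function names, not the script used for the tables. In this sketch, the per-state score is the number of correctly predicted residues of a state divided by the number of residues observed in that state.

```python
from typing import Dict


def q3_scores(observed: str, predicted: str) -> Dict[str, float]:
    """Return Q3, Q3H, Q3E and Q3C in percent for two aligned H/E/C strings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pairs = list(zip(observed, predicted))
    scores = {"Q3": 100.0 * sum(o == p for o, p in pairs) / len(pairs)}
    for state in "HEC":
        n_observed = observed.count(state)
        n_correct = sum(o == p == state for o, p in pairs)
        # Per-state score: correctly predicted / observed residues of that state.
        # It is undefined (NaN) if the state never occurs in the observation.
        scores["Q3" + state] = 100.0 * n_correct / n_observed if n_observed else float("nan")
    return scores


if __name__ == "__main__":
    observed = "CCCCCHHHHHHHHHCCCCC"
    for predicted in ("CCCCCHCHCHCHCHCCCCC", "CCCCCCCHHHHHCCCCCCC"):
        print({name: round(value, 1) for name, value in q3_scores(observed, predicted).items()})
```

Both example predictions from the SOV illustration get Q3 = 78.9, Q3H = 55.6, and Q3C = 100.0. Q3E is undefined because no strand is observed. Per-residue scores alone therefore cannot tell the two predictions apart.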
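
The sequences used for this evaluation were the "aligned" secondary structure sequences. In the listings below, the suffix SQ marks amino acid sequences and SS marks secondary structure sequences. The prefixes denote DSSP (DSSP), PsiPred (Psi), and ReProf (RPRF). In the DSSP sequences, X stands for a non-standard residue, and lowercase letters denote cysteines that form disulfide bridges.
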
=== 1KR4 ===

DSSPSQ: ALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEYXNWLRESVLGS

PsiPSQ: MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

RPRFSQ: MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

DSSPSS: CCEECCCEEEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEEEECCCCEEHHHHHHHHHHCCCC

PsiPSS: CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCCCHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC

RPRFSS: CEEEEECCCCHHHHHHHHHHHHHHHHHHHHCHCHHHCCCEEECEEECCHHHHHHHCCCHHHHHHHHHHHHHCCCCCCCHHEHHHHHHHHHHHHHHHHHHCC

<figtable id="Q3SOV1KR4">
{| class="wikitable" style="border-collapse: collapse; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 0"
!style="width:60%;text-align:left;border-style: solid; border-width: 0 1px 1px 0"| Score (%)
!style="text-align:left;border-style: solid; border-width: 0 1px 1px 0"| PsiPred
!style="text-align:left;border-style: solid; border-width: 0 0px 1px 0"| ReProf
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 85.15
|style="text-align:center"| 57.43
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3E
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 76.19
|style="text-align:center"| 26.19
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3H
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 91.89
|style="text-align:center"| 97.30
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3C
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 90.91
|style="text-align:center"| 50.00
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| SOV
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 82.61
|style="text-align:center"| 60.37
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 1''': Q3 and SOV scores of the PsiPred and ReProf predictions (PDB ID: 1KR4).
|}
</figtable>

=== 1AUI ===

DSSPSQ: **************TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP

PsiPSQ: MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP

RPRFSQ: MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP

DSSPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCC

PsiPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCC

RPRFSS: CCCCCCCCCCCCCCCCEEEEECCCCCCCEEEEEEECCCCCCEEEEEHHHECCCCCCCEEEEEEEEECCCCEEECCCCCCCCCCCEEEEECCCCHHHHHHEEEEECCCCCCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCC

--------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA

PsiPSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA

RPRFSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA

DSSPSS: CCEEECCCCCCCHHHHHHCCHHHHHHHHCCHHHHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCCCCCCEEECHHHHHHHHHHCCCCEEEECCCC

PsiPSS: CCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHHHCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHH

RPRFSS: CEEEEEECCCCCCEEEEEEEEEEEEEEEEECHHHHHHHHHCCCCCHHHHHCCCEEEEECCCCCCCCCHHHHHHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCCCECCCCCECCEEEEECCCCEEEEHCCCCHHHHEHHHCC

----------------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS****************************************************

PsiPSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESVLTLKG

RPRFSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESVLTLKG

DSSPSS: CCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEEECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

PsiPSS: HHHCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCC

RPRFSS: CCCCEEEEEECCCCCCCEEEEEEECCCEEEEECCCEEEEEECCCEEEEEEECCCCCCCCCCCCCEEEEEECCCCCHHHHHHHHHHHEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCHHEEEEEEEECCCCCEEEEEC

----------------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: *******************************************SFEEAKGLDRINERMPPR

PsiPSQ: LTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ

RPRFSQ: LTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ

DSSPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCC

PsiPSS: CCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

RPRFSS: CCCCCCCCCCEECCCCEEEEEEEEEEECCHHHHCCCCCCCEEEEHHHHCCCCHHCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCC

<figtable id="Q3SOV1AUI">
{| class="wikitable" style="border-collapse: collapse; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 0"
!style="width:60%;text-align:left;border-style: solid; border-width: 0 1px 1px 0"| Score (%)
!style="text-align:left;border-style: solid; border-width: 0 1px 1px 0"| PsiPred
!style="text-align:left;border-style: solid; border-width: 0 0px 1px 0"| ReProf
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 72.84
|style="text-align:center"| 56.58
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3E
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 52.46
|style="text-align:center"| 67.21
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3H
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 77.30
|style="text-align:center"| 34.04
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3C
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 75.00
|style="text-align:center"| 65.49
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| SOV
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 51.99
|style="text-align:center"| 28.89
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 2''': Q3 and SOV scores of the PsiPred and ReProf predictions (PDB ID: 1AUI).
|}
</figtable>

=== 2BNH ===

DSSPSQ: *MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL

PsiPSQ: MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL

RPRFSQ: MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL

DSSPSS: CCECCEECCCCCHHHHHHHHHHHCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHCHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHH

PsiPSS: CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHH

RPRFSS: CCCCECHHCCCCCHHHHHHHHHHHCCEEEECCCCCCHHHHHHHHHHHHCCCCHHHHHHHHCCCCCCCHEEEHCCCCCCCCEEEEECCCCCCCCHCCCCCCHHHHCHCHHHHHHCCCCCCCCHHHHHHHHHCCCCCHCCHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHH

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC

PsiPSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC

RPRFSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC

DSSPSS: HHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHCCCCCCCCCEEE

PsiPSS: HHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHCCCCCCCEEEEE

RPRFSS: CCCCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHCHCCHHHCCCCCCCCCHHHHHHHCCCCCCCHHHHCHHEEEHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCHHHHHHHHCHHHHHHHHHHHHHHHCCHHHHHHHCCCCCCCCHHHHHHHHHHCCCCCEEEEEE

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

PsiPSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

RPRFSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

DSSPSS: CCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC

PsiPSS: CCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC

RPRFSS: ECCCCCCCCCHHHHHHHHHHHCCHHHHCCCCCCCCCHHHHHHHCCCCCCCCHHHHHHHCCCCCCHHHHHHHHHHHCCCCCCCECC

<figtable id="Q3SOV2BNH">
{| class="wikitable" style="border-collapse: collapse; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 0"
!style="width:60%;text-align:left;border-style: solid; border-width: 0 1px 1px 0"| Score (%)
!style="text-align:left;border-style: solid; border-width: 0 1px 1px 0"| PsiPred
!style="text-align:left;border-style: solid; border-width: 0 0px 1px 0"| ReProf
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 91.89
|style="text-align:center"| 60.96
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3E
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 85.96
|style="text-align:center"| 21.05
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3H
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 90.31
|style="text-align:center"| 71.94
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3C
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 95.07
|style="text-align:center"| 61.58
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| SOV
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 95.47
|style="text-align:center"| 66.03
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 3''': Q3 and SOV scores of the PsiPred and ReProf predictions (PDB ID: 2BNH).
|}
</figtable>

=== 1A6Z ===

DSSPSQ: ****RSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGaEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI

PsiPSQ: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI

RPRFSQ: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI

DSSPSS: CCCCCCEEEEEEEEEEECCCCCCECCEEEEEECCEEEEEEECCCCCEEECCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEEEEEEECCCCCEEEEEEEEECCEEEEEEEHHHCEEEECCHHHHHHHHHHHCCCH

PsiPSS: CCCCCHHHHHHHHHHHHHHHCCCCCCCCCCCEEEEEECCCCCCCCEEEEEEEECCEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEECCCEECCCCCCCCEEEECCCCCCCCCCCCCCCCEECCCCHHHHHHHHHHHHHH

RPRFSS: CCCCCCHHHHHHHHHHHHHHCCCEEHHCCCEEEEECCCCHCCCCHHHHHHCCCCCEEEEEECCCCCCCCCCCCCECCCCCHHHHHHHHHCCCCCCCEEEEEHEEEHCCCCCCCCCCEEEEEEEEEECCCCCCCCEEEECCCCCCCEEECCCCCCCCCCCCCCCCCCCCHHHHEE

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

DSSPSQ: RARQNRAYLERDaPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRbRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTbQVEHPGLDQPLIVIW

PsiPSQ: RARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

RPRFSQ: RARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

DSSPSS: HHHHHHHHHHCHHHHHHHHHHHHHCCCCCCCECCEEEEEEEECCCCEEEEEEEEEEECCCCEEEEEECCEECCHHHCCCCEEEECCCCCEEEEEEEEECCCHHHHEEEEEECCCCCCCEEEEC

PsiPSS: HHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCCCCCEEEECCCCCCCCEEEEEECCCCCCCCEEEEEECCCCCCCCCCCCCCCEECCCCCCEEEEEEEECCCCCCCEEEEEECCCCCCCEEEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCC

RPRFSS: EEECCCHCCCCCCHHHHHHHHHHCCCCCCCCCCCCEEEEEEECCCCEEEEEEEECCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCEEEEEEEECCCCCCEEEEEEEECCCCCCCEEEEEECCCCCEEEEEEHHHHHHHHHHHHHHHHHHHEECCCCCCCCCEEEEEECCC

<figtable id="Q3SOV1A6Z">
{| class="wikitable" style="border-collapse: collapse; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 0"
!style="width:60%;text-align:left;border-style: solid; border-width: 0 1px 1px 0"| Score (%)
!style="text-align:left;border-style: solid; border-width: 0 1px 1px 0"| PsiPred
!style="text-align:left;border-style: solid; border-width: 0 0px 1px 0"| ReProf
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 76.09
|style="text-align:center"| 61.59
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3E
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 61.47
|style="text-align:center"| 60.55
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3H
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 74.29
|style="text-align:center"| 31.43
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| Q3C
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 93.81
|style="text-align:center"| 84.54
|-
!style="text-align:left;border-style: solid; border-width: 0 1px 0 0"| SOV
|style="text-align:center;border-style: solid; border-width: 0 1px 0 0"| 71.38
|style="text-align:center"| 50.40
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 4''': Q3 and SOV scores of the PsiPred and ReProf predictions (PDB ID: 1A6Z).
|}
</figtable>

For these four proteins, the PsiPred Q3 ranges from 72 to 92% and the SOV from 52 to 95%. With only four proteins, this probably does not reflect the general performance of the method, but it still gives some insight. Judging from the "aligned" secondary structure sequences, the predictions look fairly good. This also holds for 1AUI, although its SOV is rather low, most likely because many short H and E segments are not predicted correctly. Another problem concerns the regions without DSSP data. Because these regions are probably disordered, the predictions there could be viewed together with a disorder prediction, which may give additional information on both secondary structure and disorder.

Overall, the PsiPred predictions seem reliable enough to gain insight into the secondary structure of these proteins.

The ReProf predictions are considerably worse: Q3 ranges from 57 to 62% and SOV from 29 to 66%. On these proteins, ReProf is therefore far less reliable than PsiPred, which should be preferred. Again, four proteins may not reflect the general performance, and ReProf may perform better on other proteins.
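The ranges quoted above can be recomputed from Tables 1 to 4. The small script below does this. Its values are copied from those tables (two decimals), and the function name `summarise` is only illustrative.

```python
from statistics import mean
from typing import Dict, Tuple

# (Q3, SOV) in percent per protein, copied from Tables 1-4.
SCORES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "PsiPred": {"1KR4": (85.15, 82.61), "1AUI": (72.84, 51.99),
                "2BNH": (91.89, 95.47), "1A6Z": (76.09, 71.38)},
    "ReProf": {"1KR4": (57.43, 60.37), "1AUI": (56.58, 28.89),
               "2BNH": (60.96, 66.03), "1A6Z": (61.59, 50.40)},
}


def summarise(scores):
    """Return (min, max, mean) of Q3 and SOV for every method."""
    summary = {}
    for method, per_protein in scores.items():
        q3 = [q for q, _ in per_protein.values()]
        sov = [s for _, s in per_protein.values()]
        summary[method] = {
            "Q3": (min(q3), max(q3), mean(q3)),
            "SOV": (min(sov), max(sov), mean(sov)),
        }
    return summary


if __name__ == "__main__":
    for method, result in summarise(SCORES).items():
        for name, (low, high, average) in result.items():
            print(f"{method} {name}: {low:.2f}-{high:.2f}, mean {average:.2f}")
```

The mean Q3 is about 81.5% for PsiPred and 59.1% for ReProf. The mean SOV is about 75.4% and 51.4%, respectively.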
== Disorder ==

| align="right" | [[File:Hemo_dis_q9x0e6.png|thumb|200px|left|Q9X0E6]]
|-
|+ style="caption-side: bottom; text-align: left"| <font size=1>'''Table 5:''' IUPred predictions for Q30201, P10775, Q08209, and Q9X0E6. The figures show the disorder probability predicted for each amino acid residue (green line) and the 50% threshold (red line).
|-
|}

Line 42: | Line 427: | ||
<figure id="map92"> |
<figure id="map92"> |
||
− | [[File:Hemo_dp00092.gif|thumb|480px|right|<font size=1>'''Figure |
+ | [[File:Hemo_dp00092.gif|thumb|480px|right|<font size=1>'''Figure 1:''' DisProt map with ordered (blue) and disordered (red) regions for Q08209 (DP00092).]] |
</figure> |
</figure> |
||
Line 52: | Line 437: | ||
For P10775, no disordered regions are predicted (upper right figure in <xr id="iupred"/>), and there is no entry in DisProt either. A PsiBlast search yields one significant hit (DP00554), but the alignment does not include the hit's disordered region (31-50).

DisProt does have an entry for Q08209 (DP00092). A PsiBlast search yields an additional significant hit (DP00365), but the alignment does not contain its disordered region (19-147), so the hit can be discarded. A comparison between the DisProt map (<xr id="map92"/>) and the IUPred prediction (lower left figure in <xr id="iupred"/>) shows that the predictions are generally correct. However, IUPred inserts a small ordered region near the end of the protein (about residues 374-425) that should be disordered. The disordered region from residue 374 to 486 is known to undergo a disorder-to-order transition, which might explain IUPred's ambiguous prediction in this section.

Neither IUPred (lower right figure in <xr id="iupred"/>) nor DisProt suggests any disordered regions for Q9X0E6.
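Disordered regions like those discussed above can be read off the per-residue IUPred scores by applying the 50% threshold shown in the plots. This is a minimal sketch. It assumes the scores have already been parsed into a list of floats (one per residue, in sequence order). The function name and the optional minimum region length are hypothetical and not part of IUPred.

```python
from typing import List, Sequence, Tuple


def disordered_regions(scores: Sequence[float], threshold: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs with score > threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold and start is None:
            start = i  # a disordered run begins
        elif score <= threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    print(disordered_regions([0.1, 0.6, 0.7, 0.2, 0.9]))  # [(2, 3), (5, 5)]
```

Setting `min_length` above 1 suppresses isolated residues that only briefly cross the threshold.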
Line 62: | Line 447: | ||
== Transmembrane Helices == |
== Transmembrane Helices == |
||
+ | |||
+ | Transmembrane helices were predicted with PolyPhobius for HFE ([http://www.uniprot.org/uniprot/Q30201 Q30201]), DRD3 ([http://www.uniprot.org/uniprot/P35462 P35462]), Aquaporin-4 ([http://www.uniprot.org/uniprot/P47863 P47863]), and KvAP ([http://www.uniprot.org/uniprot/Q9YDF8 Q9YDF8]). The results were compared to [http://opm.phar.umich.edu/ OPM], [http://pdbtm.enzim.hu/ PDBTM], and [http://www.uniprot.org/ UniProt]. The PDB IDs for OPM and PDBTM were chosen based on the following criteria: |
||
+ | * wildtype over mutant |
||
+ | * higher coverage |
||
+ | * better resolution |
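The selection criteria above can be expressed as a single sort key: wildtype before mutant, then higher coverage, then better (numerically lower) resolution. This is a hypothetical sketch. The `PdbEntry` fields, the treatment of the criteria as a strict priority order, and the example entries "A", "B", and "C" are all assumptions, not data from this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PdbEntry:
    pdb_id: str
    is_mutant: bool      # True if the structure contains engineered mutations
    coverage: float      # fraction of the UniProt sequence covered (0-1)
    resolution: float    # in Angstrom; lower is better


def choose_entry(entries: List[PdbEntry]) -> PdbEntry:
    """Pick the preferred entry: wildtype first, then coverage, then resolution."""
    if not entries:
        raise ValueError("no PDB entries to choose from")
    # False sorts before True, so wildtype entries win the first comparison.
    return min(entries, key=lambda e: (e.is_mutant, -e.coverage, e.resolution))


if __name__ == "__main__":
    candidates = [
        PdbEntry("A", is_mutant=True, coverage=0.95, resolution=1.8),
        PdbEntry("B", is_mutant=False, coverage=0.80, resolution=2.5),
        PdbEntry("C", is_mutant=False, coverage=0.80, resolution=2.1),
    ]
    print(choose_entry(candidates).pdb_id)  # C
```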
UniProt -> PDB mapping:
* P35462 -> 3PBL
* P47863 -> 2D57
* Q9YDF8 -> 1ORQ/1ORS

<br style="clear:both;">

=== Q30201 ===

PolyPhobius predicts only one transmembrane helix for Q30201 (see <xr id="tmh_q30201"/>). OPM and PDBTM have no entries for either of its PDB IDs, but UniProt lists a TMH that matches the predicted one almost exactly (shifted by one residue).

<figtable id="tmh_q30201">
| style="border-style: solid; border-width: 0 0 0 0" |no entry
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 6''': TMH predictions and annotations for Q30201. OPM and PDBTM have no entries for either of the two PDB IDs (1A6Z, 1DE4).
|}
</figtable>

<br style="clear:both;">

=== P35462 ===

For P35462, all methods list seven transmembrane helices (<xr id="tmh_p35462"/>), and their positions are consistent across all methods.

| style="border-style: solid; border-width: 0 0 0 0" |368-382
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 7''': TMH predictions and annotations for P35462 (PDB ID: 3PBL).
|}
</figtable>

<br style="clear:both;">

=== P47863 ===

PolyPhobius, UniProt, and PDBTM list six TMHs for P47863, whereas OPM lists two additional ones (see <xr id="tmh_p47863"/>). PDBTM lists these two regions as "Membrane Loop". Such loops inside the membrane might be the reason for the false TMH entries in OPM.

| style="border-style: solid; border-width: 0 0 0 0" |231-248
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 8''': TMH predictions and annotations for P47863 (PDB ID: 2D57). TMH3 and TMH7 (marked with *) are listed as "Membrane Loop" in PDBTM.
|}
</figtable>

<br style="clear:both;">

+ | === Q9YDF8 === |
||
+ | |||
+ | |||
+ | Q9YDF8 seems to be the hardest one to predict TMHs for (cf. <xr id="tmh_q9ydf8"/>). PolyPhobius predicts an additional TMH (compared to UniProt); OPM and PDBTM need two PDB IDs to identify all (and "false") TMHs. Both PDB entries were adjusted for an AA shift of 13 residues. It should also be noted that PsiBlast didn't find any hits for Q9YDF8, so no homology information could be used for PolyPhobius. |
||
+ | |||
+ | PolyPhobius predicted a region (TMH7), labeled as "Intramembrane - Pore-Forming" in UniProt, as a (false) TMH. OPM also included this region and an additional one labeled as "Intramembrane - Helical" in UniProt. PDBTM lists TMH7 as "Membrane Loop". |
||
Line 270: | Line 697: | ||
| style="border-style: solid; border-width: 0 0 0 0" |222-249 |
| style="border-style: solid; border-width: 0 0 0 0" |222-249 |
||
|- |
|- |
||
− | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table |
+ | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 9''': TMH predictions and annotations for Q9YDF8 (PDB IDs: 1ORQ, 1ORS). Residue positions are adjusted for the PDB sequence's 13AA shift. TMH3 is annotated as "Intramembrane, Helical" in UniProt, TMH7 as "Intramembrane, Pore-Forming". TMH7 is additionally marked as "Membrane Loop" in PDBTM. |
|} |
|} |
||
</figtable> |
</figtable> |
||
+ | |||
+ | <br style="clear:both;"> |
||
+ | |||
+ | === Comparison === |
||
+ | |||
+ | |||
+ | PolyPhobius predicts the transmembrane helices very well. With the exception of TMH7 in Q9YDF8 it never predicts a false TMH nor misses a true one. Compared to UniProt and OPM it tends to shift the TMHs to the right, while it encloses PDBTM's helices. PolyPhobius, UniProt, and OPM annotate transmembrane helices with an average length of about 21, PDBTM has shorter TMHs with a mean of 18. So it seems that PDBTM is a little bit more cautious to annotate TMHs, while OPM doesn't distinguish between transmembrane helices and other (intra)membrane structures such as membrane loops and intramembrane helices. PolyPhobius' strength, the use of homology information, can be seen in the case of Q9YDF8 where PsiBlast didn't provide any hits, as there the deviation from the other annotations is the biggest. |
||
+ | |||
+ | The results suggest that PDBTM would be the best one to use if you want the least false positives, but compared to PolyPhobius it is quite limited in that you have to provide a PDB entry in the first place. When only sequence information is available PolyPhobius should provide reliable predictions, especially if there is homology information. |
||
<br style="clear:both;"> |
<br style="clear:both;"> |
||
Line 288: | Line 724: | ||
| align="right" | [[File:hemo_sp_p02768.png|thumb|200px|P02768]] |
| align="right" | [[File:hemo_sp_p02768.png|thumb|200px|P02768]] |
||
|- |
|- |
||
− | |+ style="caption-side: bottom; text-align: left"| <font size=1>'''Table |
+ | |+ style="caption-side: bottom; text-align: left"| <font size=1>'''Table 10:''' SignalP predictions for Q30201, P47863, P11279, and P02768. Each figure shows the C-score, S-score, and Y-score per residue position for the corresponding protein. |
|- |
|- |
||
|} |
|} |
||
</figtable> |
</figtable> |
||
+ | [http://www.cbs.dtu.dk/services/SignalP/ SignalP (Webserver 4.0)] predictions were made for HFE ([http://www.uniprot.org/uniprot/Q30201 Q30201]), Aquaporin-4 ([http://www.uniprot.org/uniprot/P47863 P47863]), Lysosome-associated membrane glycoprotein 1 ([http://www.uniprot.org/uniprot/P11279 P11279]), and Serum albumin ([http://www.uniprot.org/uniprot/P02768 P02768]) in order to find signal peptides within these sequences. The results are shown in <xr id="signalp"/> and were compared to the corresponding entries in UniProt. A high S-score indicates that an AA is part of the signal peptide, a low score that it is part of the mature protein. A possible cleavage site is represented by a high C-score. The Y-score is a combination of the other scores and a better indicator for the cleavage site than the C-score alone. |
||
− | '''TODO:''' score description |
||
− | |||
− | [http://www.cbs.dtu.dk/services/SignalP/ SignalP (Webserver 4.0)] predictions were made for HFE ([http://www.uniprot.org/uniprot/Q30201 Q30201]), Aquaporin-4 ([http://www.uniprot.org/uniprot/P47863 P47863]), Lysosome-associated membrane glycoprotein 1 ([http://www.uniprot.org/uniprot/P11279 P11279]), and Serum albumin ([http://www.uniprot.org/uniprot/P02768 P02768]) in order to find signal peptides within these sequences. The results are shown in <xr id="signalp"/> and were compared to the corresponding entries in UniProt. |
||
According to UniProt all four predictions are 100% precise: |
According to UniProt all four predictions are 100% precise: |
||
Line 316: | Line 750: | ||
=== GOPET === |
=== GOPET === |
||
− | GOPET predicts only two GO terms for our protein (see <xr id="gopet"/>) and even they are somewhat redundant (both are receptor activity). |
+ | GOPET predicts only two GO terms for our protein (see <xr id="gopet"/>) and even they are somewhat redundant (both are receptor activity). Although they do not match the QuickGO terms for HFE, they are eligible in that HFE has kind of a receptor activity when in complex with transferrin receptor (TFR). |
Line 337: | Line 771: | ||
| style="border-style: solid; border-width: 0 0 0 0" |MHC class I receptor activity |
| style="border-style: solid; border-width: 0 0 0 0" |MHC class I receptor activity |
||
|- |
|- |
||
− | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table |
+ | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 11''': GO term prediction with GOPET for Q30201. |
|} |
|} |
||
</figtable> |
</figtable> |
||
Line 431: | Line 865: | ||
| style="border-style: solid; border-width: 0 0 0 0" |4.486* |
| style="border-style: solid; border-width: 0 0 0 0" |4.486* |
||
|- |
|- |
||
− | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table |
+ | |+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 12''': GO term prediction with ProtFun for Q30201. Entries marked with asterisks (*) had been deemed "true" by ProtFun. Results with a probability below 0.1 and odds below 1.0 are not shown. |
|} |
|} |
||
</figtable> |
</figtable> |
||
Line 488: | Line 922: | ||
=== Pfam === |
=== Pfam === |
||
+ | <figure id="pfam"> |
||
− | Pfam lists two significant results for Q30201: |
||
+ | [[File:Hemo_pfam_q30201.png|thumb|480px|right|<font size=1>'''Figure 2:''' Pfam map for Q30201 with the two Pfam domains, the signal peptide (yellow), and the transmembrane helix (red) at the end.]] |
||
− | * MHC_I - Class I Histocompatibility antigen, domains alpha 1 and 2 (E-value 3.5e-43) |
||
+ | </figure> |
||
− | * C1-set - Immunoglobulin C1-set domain (E-value 2.8e-18) |
||
+ | |||
+ | Pfam lists two significant results for Q30201 (cf. <xr id="pfam"/>): |
||
+ | * PF00129: MHC_I - Class I Histocompatibility antigen, domains alpha 1 and 2 (E-value 3.5e-43) |
||
+ | * PF07654: C1-set - Immunoglobulin C1-set domain (E-value 2.8e-18) |
||
− | MHC class I proteins are |
+ | MHC class I proteins are cell surface receptors and involved in immune responses. UniProt also lists HFE in the MHC class I family and its structure (three extracellular domains, transmembrane region, cytoplasmic tail) fits. |
C1-set domains are associated with MHC class I proteins and HFE indeed contains such a domain (residues 207-298) |
C1-set domains are associated with MHC class I proteins and HFE indeed contains such a domain (residues 207-298) |
||
Line 501: | Line 939: | ||
Compared to QuickGO which lists 27 unique GO terms for Q30201, GOPET predicts only two. Both of them not included in QuickGO's list. These two also seem to fit the HFE-TFR complex better than HFE alone, but at least the MHC class I tag shows specificity to HFE. |
Compared to QuickGO which lists 27 unique GO terms for Q30201, GOPET predicts only two. Both of them not included in QuickGO's list. These two also seem to fit the HFE-TFR complex better than HFE alone, but at least the MHC class I tag shows specificity to HFE. |
||
− | ProtFun's prediction seems more accurate as it successfully identifies HFE's location within the membrane and lists "Transport and binding" as a good second result. "Immune response" is also in accordance to QuickGO's |
+ | ProtFun's prediction seems more accurate as it successfully identifies HFE's location within the membrane and lists "Transport and binding" as a good second result. "Immune response" is also in accordance to QuickGO's terms. |
Pfam's two predicted families were both true positives and it was more informative that the other two methods. |
Pfam's two predicted families were both true positives and it was more informative that the other two methods. |
||
Line 508: | Line 946: | ||
<br style="clear:both;"> |
<br style="clear:both;"> |
||
+ | |||
+ | == Conclusion == |
||
+ | |||
+ | All these methods can be used to extract more information from just the sequence. As most of it is reliable, these methods are able to generate additional info for further experiments, more insight on the structure of the protein and more, in very few time (compared to experimentally generated data). |
Latest revision as of 08:29, 22 May 2012
Short Task Description
Detailed description: Sequence-Based Predictions
In this part of the wiki we present our results for different sequence-based prediction methods.
These cover the prediction of secondary structure, disordered regions, transmembrane helices, signal peptides and GO annotations.
Protocol
A protocol with a description of the data acquisition and other scripts used for this task is available here.
Secondary Structure
In the following, the secondary structure predictions were evaluated against the DSSP data. The DSSP data were parsed so that only the states H (helix), E (sheet) and C (coil) remain. Positions that exist in the sequence used for DSSP but were not analyzed by DSSP are denoted by "*" in the sequence and were assigned coil (C).
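The reduction step can be sketched in Python as below. The page only says that H, E and C remain; the mapping used here (H, G, I to H; E, B to E; everything else, including T, S and blank, to C) is a common convention and an assumption, as is the input layout: a DSSP state string aligned position by position with the sequence, with "*" marking residues DSSP did not analyze. The function name `reduce_dssp` is hypothetical.

```python
# Assumed 8-to-3 state reduction; the page only states that H, E and C remain.
DSSP_TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(sequence, dssp_states):
    """Reduce an aligned DSSP state string to H/E/C (hypothetical helper)."""
    if len(sequence) != len(dssp_states):
        raise ValueError("sequence and DSSP state string must be aligned")
    reduced = []
    for residue, state in zip(sequence, dssp_states):
        if residue == "*":
            # Residue present in the sequence but not analyzed by DSSP: treat as coil.
            reduced.append("C")
        else:
            # T, S, blank and any other DSSP code fall back to coil.
            reduced.append(DSSP_TO_THREE_STATE.get(state, "C"))
    return "".join(reduced)


print(reduce_dssp("*ACDEFGHI", " HGIEBTS "))  # CHHHEECCC
```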
Afterwards, Q3 and SOV scores were computed. Q3 is the percentage of residues whose secondary structure state is assigned correctly. The SOV (segment overlap) score instead measures how well individual secondary structure segments are reproduced. For example, the prediction in the (observed, predicted) pair
CCCCCHHHHHHHHHCCCCC CCCCCHCHCHCHCHCCCCC
gets a much lower SOV score than the one in
CCCCCHHHHHHHHHCCCCC CCCCCCCHHHHHCCCCCCC
although both have the same Q3 score (15 of 19 residues correct). As for Q3, the maximum SOV score is 100%, and SOV gives additional insight into the quality of the predictions.
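Both scores can be illustrated on the example pairs above. The sketch below is a minimal Python version that assumes the SOV'99 definition of Zemla et al.; the page does not state which SOV variant was used, and the function names are hypothetical. Segments are maximal runs of one state; every pair of overlapping observed and predicted segments of the same state contributes (minov + delta) / maxov × length of the observed segment.

```python
from itertools import groupby


def q3(observed, predicted):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("strings must be aligned")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def segments(states):
    """Split a state string into (state, start, end) runs; end is exclusive."""
    result, pos = [], 0
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        result.append((state, pos, pos + length))
        pos += length
    return result


def sov99(observed, predicted):
    """Segment overlap score in the SOV'99 form (assumed variant)."""
    if len(observed) != len(predicted):
        raise ValueError("strings must be aligned")
    pred_segs = segments(predicted)
    total, norm = 0.0, 0
    for state, o_start, o_end in segments(observed):
        len1 = o_end - o_start
        has_overlap = False
        for p_state, p_start, p_end in pred_segs:
            if p_state != state:
                continue
            minov = min(o_end, p_end) - max(o_start, p_start)
            if minov <= 0:
                continue  # same state, but the segments do not overlap
            has_overlap = True
            len2 = p_end - p_start
            maxov = max(o_end, p_end) - min(o_start, p_start)
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1  # each overlapping pair adds len(observed segment) to N
        if not has_overlap:
            norm += len1  # observed segment without a same-state partner
    return 100.0 * total / norm if norm else 0.0


observed = "CCCCCHHHHHHHHHCCCCC"
fragmented = "CCCCCHCHCHCHCHCCCCC"
shortened = "CCCCCCCHHHHHCCCCCCC"
print(round(q3(observed, fragmented), 2), round(sov99(observed, fragmented), 2))  # 78.95 27.27
print(round(q3(observed, shortened), 2), round(sov99(observed, shortened), 2))  # 78.95 89.47
```

Under these assumptions Q3 is identical for both predictions, while SOV strongly penalizes the fragmented helix.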
The Q3E, Q3H and Q3C scores denote the percentage of correctly predicted residues in state E, H and C, respectively.
The sequences used for this were the "aligned" secondary structure sequences.
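A per-state score can be sketched the same way. The page does not say how the per-state percentages are normalized; the hypothetical `per_state_q3` below assumes the common choice of dividing by the number of residues observed (in DSSP) in that state, and returns `None` when the state does not occur in the reference.

```python
def per_state_q3(observed, predicted, state):
    """Share (in %) of residues observed in `state` that are also predicted as `state`."""
    positions = [i for i, s in enumerate(observed) if s == state]
    if not positions:
        return None  # state does not occur in the reference, score undefined
    correct = sum(predicted[i] == state for i in positions)
    return 100.0 * correct / len(positions)


observed = "CCCCCHHHHHHHHHCCCCC"
predicted = "CCCCCHCHCHCHCHCCCCC"
for state in "EHC":
    score = per_state_q3(observed, predicted, state)
    print(f"Q3{state}:", "n/a" if score is None else round(score, 2))
```

For the fragmented example this gives n/a for E, 55.56 for H (5 of 9 helix residues) and 100.0 for C.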
1KR4
DSSPSQ: ALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEYXNWLRESVLGS
PsiPSQ: MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
RPRFSQ: MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL
DSSPSS: CCEECCCEEEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEEEECCCCEEHHHHHHHHHHCCCC
PsiPSS: CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCCCHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
RPRFSS: CEEEEECCCCHHHHHHHHHHHHHHHHHHHHCHCHHHCCCEEECEEECCHHHHHHHCCCHHHHHHHHHHHHHCCCCCCCHHEHHHHHHHHHHHHHHHHHHCC
<figtable id="Q3SOV1KR4">
Scoring method | PsiPred | ReProf |
---|---|---|
Q3 | 85.15 | 57.43 |
Q3E | 76.19 | 26.19 |
Q3H | 91.89 | 97.30 |
Q3C | 90.91 | 50.00 |
SOV | 82.61 | 60.37 |
</figtable>
1AUI
DSSPSQ: **************TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP PsiPSQ: MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP RPRFSQ: MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYP DSSPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCC PsiPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCC RPRFSS: CCCCCCCCCCCCCCCCEEEEECCCCCCCEEEEEEECCCCCCEEEEEHHHECCCCCCCEEEEEEEEECCCCEEECCCCCCCCCCCEEEEECCCCHHHHHHEEEEECCCCCCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCC -------------------------------------------------------------------------------------------------------------------------------------- DSSPSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA PSIPSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA RPRFSQ: KTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEA DSSPSS: CCEEECCCCCCCHHHHHHCCHHHHHHHHCCHHHHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCCCCCCEEECHHHHHHHHHHCCCCEEEECCCC PsiPSS: CCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHHHCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHH RPRFSS: CEEEEEECCCCCCEEEEEEEEEEEEEEEEECHHHHHHHHHCCCCCHHHHHCCCEEEEECCCCCCCCCHHHHHHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCCCECCCCCECCEEEEECCCCEEEEHCCCCHHHHEHHHCC 
---------------------------------------------------------------------------------------------------------------------------------------------- DSSPSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS**************************************************** PsiPSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESVLTLKG RPRFSQ: QDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESVLTLKG DSSPSS: CCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEEECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC PSIPSS: HHHCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCC RPRFSS: CCCCEEEEEECCCCCCCEEEEEEECCCEEEEECCCEEEEEECCCEEEEEEECCCCCCCCCCCCCEEEEEECCCCCHHHHHHHHHHHEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCHHEEEEEEEECCCCCEEEEEC ---------------------------------------------------------------------------------------------------------------------------------------------- DSSPSQ: *******************************************SFEEAKGLDRINERMPPR PsiPSQ: LTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ RPRFSQ: LTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ DSSPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCC PSIPSS: CCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC RPRFSS: CCCCCCCCCCEECCCCEEEEEEEEEEECCHHHHCCCCCCCEEEEHHHHCCCCHHCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCC
<figtable id="Q3SOV1AUI">
Scoring method | PsiPred | ReProf |
---|---|---|
Q3 | 72.84 | 56.58 |
Q3E | 52.46 | 67.21 |
Q3H | 77.30 | 34.04 |
Q3C | 75.00 | 65.49 |
SOV | 51.99 | 28.89 |
</figtable>
2BNH
DSSPSQ: *MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL PsiPSQ: MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL RPRFSQ: MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVL DSSPSS: CCECCEECCCCCHHHHHHHHHHHCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHCHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHH PsiPSS: CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHH RPRFSS: CCCCECHHCCCCCHHHHHHHHHHHCCEEEECCCCCCHHHHHHHHHHHHCCCCHHHHHHHHCCCCCCCHEEEHCCCCCCCCEEEEECCCCCCCCHCCCCCCHHHHCHCHHHHHHCCCCCCCCHHHHHHHHHCCCCCHCCHHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHH ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ DSSPSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC PsiPSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC RPRFSQ: GQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLC DSSPSS: 
HHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHCCCCCCCCCEEE PsiPSS: HHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHCCCCCCCEEEEE RPRFSS: CCCCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHCHCCHHHCCCCCCCCCHHHHHHHCCCCCCCHHHHCHHEEEHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCHHHHHHHHCHHHHHHHHHHHHHHHCCHHHHHHHCCCCCCCCHHHHHHHHHHCCCCCEEEEEE ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ DSSPSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS PsiPSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS RPRFSQ: LGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS DSSPSS: CCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC PsiPSS: CCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC RPRFSS: ECCCCCCCCCHHHHHHHHHHHCCHHHHCCCCCCCCCHHHHHHHCCCCCCCCHHHHHHHCCCCCCHHHHHHHHHHHCCCCCCCECC
<figtable id="Q3SOV2BNH">
Scoring method | PsiPred | ReProf |
---|---|---|
Q3 | 91.89 | 60.96 |
Q3E | 85.96 | 21.05 |
Q3H | 90.31 | 71.94 |
Q3C | 95.07 | 61.58 |
SOV | 95.47 | 66.03 |
</figtable>
1A6Z
DSSPSQ: ****RSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGaEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI PsiPSQ: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI RPRFSQ: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKI DSSPSS: CCCCCCEEEEEEEEEEECCCCCCECCEEEEEECCEEEEEEECCCCCEEECCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEEEEEEECCCCCEEEEEEEEECCEEEEEEEHHHCEEEECCHHHHHHHHHHHCCCH PsiPSS: CCCCCHHHHHHHHHHHHHHHCCCCCCCCCCCEEEEEECCCCCCCCEEEEEEEECCEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEECCCEECCCCCCCCEEEECCCCCCCCCCCCCCCCEECCCCHHHHHHHHHHHHHH RPRFSS: CCCCCCHHHHHHHHHHHHHHCCCEEHHCCCEEEEECCCCHCCCCHHHHHHCCCCCEEEEEECCCCCCCCCCCCCECCCCCHHHHHHHHHCCCCCCCEEEEEHEEEHCCCCCCCCCCEEEEEEEEEECCCCCCCCEEEECCCCCCCEEECCCCCCCCCCCCCCCCCCCCHHHHEE ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ DSSPSQ: RARQNRAYLERDaPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRbRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTbQVEHPGLDQPLIVIW PsiPSQ: RARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE RPRFSQ: RARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE DSSPSS: HHHHHHHHHHCHHHHHHHHHHHHHCCCCCCCECCEEEEEEEECCCCEEEEEEEEEEECCCCEEEEEECCEECCHHHCCCCEEEECCCCCEEEEEEEEECCCHHHHEEEEEECCCCCCCEEEEC PsiPSS: 
HHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCCCCCEEEECCCCCCCCEEEEEECCCCCCCCEEEEEECCCCCCCCCCCCCCCEECCCCCCEEEEEEEECCCCCCCEEEEEECCCCCCCEEEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCC RPRFSS: EEECCCHCCCCCCHHHHHHHHHHCCCCCCCCCCCCEEEEEEECCCCEEEEEEEECCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCEEEEEEEECCCCCCEEEEEEEECCCCCCCEEEEEECCCCCEEEEEEHHHHHHHHHHHHHHHHHHHEECCCCCCCCCEEEEEECCC
<figtable id="Q3SOV1A6Z">
Scoring method | PsiPred | ReProf |
---|---|---|
Q3 | 76.09 | 61.59 |
Q3E | 61.47 | 60.55 |
Q3H | 74.29 | 31.43 |
Q3C | 93.81 | 84.54 |
SOV | 71.38 | 50.40 |
</figtable>
The PsiPred Q3 scores for these proteins range from 72 to 92%, and the SOV scores from 52 to 96%. With only four proteins this probably does not reflect the general performance of the method, but it still gives some insight. Judging from the aligned secondary structure sequences, the predictions look fairly good. This also holds for 1AUI, although its SOV is quite low, very likely because many short H/E segments are not predicted correctly. Another problem concerns the regions without DSSP data. As these regions are likely disordered, the predictions there may be viewed alongside a disorder prediction, which could give additional information on both secondary structure and disorder.
Overall, the PsiPred predictions should therefore be reliable enough to gain insight into a protein's secondary structure.
The ReProf predictions perform considerably worse: the Q3 scores range from 56 to 62% and the SOV scores from 29 to 66%. ReProf is therefore far less reliable than PsiPred on these proteins, and PsiPred should be preferred. With only four proteins, however, this may not reflect ReProf's general performance, and it may perform better on other proteins.
Disorder
<figtable id="iupred">
</figtable>
<figure id="map92">
</figure>
IUPred was employed to find disordered regions within HFE (Q30201), RNH1 (P10775), PPP3CA (Q08209), and cutA (Q9X0E6). The results are shown in <xr id="iupred"/>. DisProt was used to validate the predictions.
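Turning IUPred's per-residue output into regions like the ones discussed below can be sketched as follows. IUPred documents residues scoring above 0.5 as likely disordered; the optional minimum region length and the residue-level comparison against annotated (e.g. DisProt) regions are illustrative assumptions, and the scores are assumed to have already been parsed into a list of floats. Both function names are hypothetical.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) runs of residues scoring above `threshold`."""
    regions, start = [], None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered run begins
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


def annotated_coverage(predicted, annotated):
    """Fraction of annotated disordered residues that are also predicted as disordered."""
    pred = {i for s, e in predicted for i in range(s, e + 1)}
    ann = {i for s, e in annotated for i in range(s, e + 1)}
    return len(pred & ann) / len(ann) if ann else 0.0


toy_scores = [0.2, 0.6, 0.7, 0.3, 0.8]  # toy values, not real IUPred output
print(disordered_regions(toy_scores))                # [(2, 3), (5, 5)]
print(disordered_regions(toy_scores, min_length=2))  # [(2, 3)]
```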
As shown in the upper left figure (<xr id="iupred"/>), Q30201 has two small regions (around residues 250 and 285) that might be disordered. There is no entry for Q30201 in DisProt that would support this, and a sequence search (PsiBlast) against DisProt did not yield any significant results.
For P10775 no disordered regions are predicted (upper right figure in <xr id="iupred"/>). There is also no entry in DisProt. A PsiBlast search results in one significant hit (DP00554), but the alignment does not include the hit's disordered region (31-50).
DisProt does have an entry for Q08209 (DP00092). A PsiBlast search yields an additional significant hit (DP00365), but the alignment does not contain that entry's disordered region (19-147), so it can be discarded. A comparison between the DisProt map (<xr id="map92"/>) and the IUPred prediction (lower left figure in <xr id="iupred"/>) shows that the prediction is correct overall, although IUPred places a short ordered region near the end of the protein (about residues 374-425) that should be disordered. The disordered region spanning residues 374-486 is known to undergo a disorder-to-order transition, which might explain IUPred's ambiguous prediction in this section.
Neither IUPred (lower right figure in <xr id="iupred"/>) nor DisProt suggests any disordered regions for Q9X0E6.
IUPred seems to be quite accurate for completely ordered proteins (P10775, Q9X0E6 and, apart from the two small peaks, Q30201), but it seems to have problems with disordered regions that undergo a disorder-to-order transition.
Transmembrane Helices
Transmembrane helices were predicted with PolyPhobius for HFE (Q30201), DRD3 (P35462), Aquaporin-4 (P47863), and KvAP (Q9YDF8). The results were compared to OPM, PDBTM, and UniProt. The PDB IDs for OPM and PDBTM were chosen based on the following criteria:
- wildtype over mutant
- higher coverage
- better resolution
UniProt -> PDB mapping:
- P35462 -> 3PBL
- P47863 -> 2D57
- Q9YDF8 -> 1ORQ/1ORS
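The selection criteria listed above can be expressed as a sort key. This sketch assumes that the criteria are applied strictly in the listed order (the page does not say so explicitly); the `PdbCandidate` record, its fields and the placeholder IDs are hypothetical and do not correspond to real PDB entries.

```python
from dataclasses import dataclass


@dataclass
class PdbCandidate:
    pdb_id: str        # placeholder IDs below are hypothetical
    is_mutant: bool
    coverage: float    # fraction of the UniProt sequence covered (0-1)
    resolution: float  # in Angstrom; lower is better


def selection_key(candidate):
    # Tuples compare element by element, so earlier criteria take precedence:
    # wildtype (False sorts before True), then higher coverage, then better resolution.
    return (candidate.is_mutant, -candidate.coverage, candidate.resolution)


def pick_structure(candidates):
    return min(candidates, key=selection_key)


candidates = [
    PdbCandidate("MUT1", is_mutant=True, coverage=0.95, resolution=1.8),
    PdbCandidate("WTA1", is_mutant=False, coverage=0.80, resolution=2.9),
    PdbCandidate("WTB1", is_mutant=False, coverage=0.80, resolution=2.1),
]
print(pick_structure(candidates).pdb_id)  # WTB1
```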
Q30201
PolyPhobius predicts only one transmembrane helix for Q30201 (see <xr id="tmh_q30201"/>). There is no entry in OPM or PDBTM for either of its PDB IDs, but UniProt lists a TMH that almost exactly matches the predicted one (shifted by one residue).
<figtable id="tmh_q30201">
Q30201 | TMH 1 |
---|---|
PolyPhobius | 306-329 |
UniProt | 307-330 |
OPM | no entry |
PDBTM | no entry |
</figtable>
P35462
For P35462, all methods list seven transmembrane helices (<xr id="tmh_p35462"/>), and their positions are consistent across all methods.
<figtable id="tmh_p35462">
P35462 (3PBL) | TMH 1 | TMH 2 | TMH 3 | TMH 4 | TMH 5 | TMH 6 | TMH 7 |
---|---|---|---|---|---|---|---|
PolyPhobius | 30-55 | 66-88 | 105-126 | 150-170 | 188-212 | 329-352 | 367-386 |
UniProt | 33-55 | 66-88 | 105-126 | 150-170 | 188-212 | 330-351 | 367-388 |
OPM | 34-52 | 67-91 | 101-126 | 150-170 | 187-209 | 330-351 | 363-386 |
PDBTM | 35-52 | 68-84 | 109-123 | 152-166 | 191-206 | 334-347 | 368-382 |
</figtable>
P47863
PolyPhobius, UniProt, and PDBTM list six TMHs for P47863, whereas OPM lists two additional ones (see <xr id="tmh_p47863"/>). PDBTM annotates these two regions as "Membrane Loop", which might explain why OPM falsely lists them as TMHs.
<figtable id="tmh_p47863">
P47863 (2D57) | TMH 1 | TMH 2 | TMH 3 | TMH 4 | TMH 5 | TMH 6 | TMH 7 | TMH 8 |
---|---|---|---|---|---|---|---|---|
PolyPhobius | 34-58 | 70-91 | | 115-136 | 156-177 | 188-208 | | 231-252 |
UniProt | 37-57 | 65-85 | | 116-136 | 156-176 | 185-205 | | 232-252 |
OPM | 34-56 | 70-88 | 98-107 | 112-136 | 156-178 | 189-203 | 214-223 | 231-252 |
PDBTM | 39-55 | 72-89 | 95-106* | 116-133 | 158-177 | 188-205 | 209-222* | 231-248 |
</figtable>
Q9YDF8
Q9YDF8 seems to be the hardest protein for TMH prediction (cf. <xr id="tmh_q9ydf8"/>). PolyPhobius predicts one TMH more than UniProt annotates, and OPM and PDBTM each need two PDB IDs to cover all (including "false") TMHs. Residue positions from both PDB entries were adjusted for a shift of 13 residues. Note also that PsiBlast did not find any hits for Q9YDF8, so PolyPhobius could not use any homology information.
PolyPhobius predicts a region (TMH7) that UniProt labels "Intramembrane - Pore-Forming" as a (false) TMH. OPM also includes this region, as well as an additional one (TMH3) that UniProt labels "Intramembrane - Helical". PDBTM lists TMH7 as "Membrane Loop".
<figtable id="tmh_q9ydf8">
Q9YDF8 (1ORQ/1ORS) | TMH 1 | TMH 2 | TMH 3 | TMH 4 | TMH 5 | TMH 6 | TMH 7 | TMH 8 |
---|---|---|---|---|---|---|---|---|
PolyPhobius | 42-60 | 68-88 | | 108-129 | 137-157 | 163-184 | 196-213 | 224-244 |
UniProt | 39-63 | 68-92 | 97-105* | 109-125 | 129-145 | 160-184 | 196-208* | 222-253 |
OPM (1ORS) | 38-59 | 68-91 | 99-110 | 113-120 | 130-161 | | | |
OPM (1ORQ) | | | | | | 166-185 | 196-208 | 220-238 |
PDBTM (1ORS) | 40-63 | 68-88 | | 101-120 | 131-155 | | | |
PDBTM (1ORQ) | 34-65 | 70-93 | | | | 164-184 | 197-213* | 222-249 |
</figtable>
Comparison
PolyPhobius predicts the transmembrane helices very well: with the exception of TMH7 in Q9YDF8, it neither predicts a false TMH nor misses a true one. Compared to UniProt and OPM, its helices tend to be shifted to the right (towards the C-terminus), while they enclose PDBTM's helices. PolyPhobius, UniProt, and OPM annotate transmembrane helices with an average length of about 21 residues, whereas PDBTM's TMHs are shorter, with a mean of 18. PDBTM thus seems somewhat more conservative in annotating TMHs, while OPM does not distinguish between transmembrane helices and other (intra)membrane structures such as membrane loops and intramembrane helices. The value of PolyPhobius' main strength, the use of homology information, shows in the case of Q9YDF8: PsiBlast did not provide any hits there, and this is where the deviation from the other annotations is largest.
The results suggest that PDBTM is the best choice if false positives are to be minimized, but compared to PolyPhobius it is quite limited, as it requires a PDB entry in the first place. When only sequence information is available, PolyPhobius should provide reliable predictions, especially if homology information is available.
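Statements such as "never misses a true TMH" or "encloses PDBTM's helices" can be checked with a small script like the one below. It is a sketch under assumptions: helices are inclusive (start, end) residue ranges, a predicted helix matches a reference helix if they share at least 5 residues (an arbitrary cutoff, not taken from this page), and matching is greedy and one-to-one. The helix positions are copied from the P35462 and P47863 tables; the function names are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def match_helices(predicted, reference, min_overlap=5):
    """Greedy one-to-one matching of predicted to reference helices.

    Returns (matched pairs, unmatched predicted, unmatched reference).
    """
    unused = list(predicted)
    pairs, missed = [], []
    for ref in reference:
        best = max(unused, key=lambda p: overlap(p, ref), default=None)
        if best is not None and overlap(best, ref) >= min_overlap:
            pairs.append((best, ref))
            unused.remove(best)
        else:
            missed.append(ref)
    return pairs, unused, missed


def encloses(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# P35462: PolyPhobius vs. PDBTM (3PBL)
polyphobius_p35462 = [(30, 55), (66, 88), (105, 126), (150, 170), (188, 212), (329, 352), (367, 386)]
pdbtm_p35462 = [(35, 52), (68, 84), (109, 123), (152, 166), (191, 206), (334, 347), (368, 382)]
# P47863: PolyPhobius vs. OPM (2D57)
polyphobius_p47863 = [(34, 58), (70, 91), (115, 136), (156, 177), (188, 208), (231, 252)]
opm_p47863 = [(34, 56), (70, 88), (98, 107), (112, 136), (156, 178), (189, 203), (214, 223), (231, 252)]

pairs, _, _ = match_helices(polyphobius_p35462, pdbtm_p35462)
print(len(pairs), all(encloses(p, r) for p, r in pairs))  # 7 True
_, extra_pred, opm_only = match_helices(polyphobius_p47863, opm_p47863)
print(extra_pred, opm_only)  # [] [(98, 107), (214, 223)]
```

With these assumptions all seven PolyPhobius helices of P35462 match and enclose the PDBTM helices, while for P47863 only the two OPM-specific regions remain unmatched.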
Signal Peptides
<figtable id="signalp">
</figtable>
SignalP (Webserver 4.0) predictions were made for HFE (Q30201), Aquaporin-4 (P47863), Lysosome-associated membrane glycoprotein 1 (P11279), and Serum albumin (P02768) in order to find signal peptides within these sequences. The results are shown in <xr id="signalp"/> and were compared to the corresponding entries in UniProt. A high S-score indicates that a residue is part of the signal peptide, a low S-score that it is part of the mature protein. A high C-score marks a possible cleavage site. The Y-score combines the C- and S-scores and is a better indicator of the cleavage site than the C-score alone.
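Why the Y-score locates the cleavage site better than the raw C-score can be sketched as below. SignalP's documentation describes the Y-score as a geometric average of the C-score and a smoothed derivative of the S-score; the simple window-mean difference used here for that derivative, the window size of 5 and the toy scores are assumptions for illustration, not SignalP's exact implementation. The function name `y_scores` is hypothetical.

```python
import math


def y_scores(c_scores, s_scores, window=5):
    """Geometric mean of the C-score and the local drop in S-score (sketch)."""
    ys = []
    for i in range(len(c_scores)):
        before = s_scores[max(0, i - window):i]
        after = s_scores[i:i + window]
        if not before or not after:
            ys.append(0.0)  # no S-score slope can be estimated at the ends
            continue
        # Positive when S drops from high (signal peptide) to low (mature protein).
        delta_s = sum(before) / len(before) - sum(after) / len(after)
        ys.append(math.sqrt(c_scores[i] * max(delta_s, 0.0)))
    return ys


# Toy data: signal peptide covers positions 0-9, mature protein starts at 10.
s = [0.9] * 10 + [0.1] * 10
c = [0.05] * 20
c[3] = 0.9   # spurious C-score peak inside the signal peptide
c[10] = 0.8  # true cleavage site (first residue of the mature protein)
y = y_scores(c, s)
print(max(range(len(c)), key=c.__getitem__))  # 3 (raw C-score picks the wrong site)
print(max(range(len(y)), key=y.__getitem__))  # 10
```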
All four predictions match the UniProt annotations exactly:
- Q30201: signal peptide 1-22
- P47863: no signal peptide
- P11279: signal peptide 1-28
- P02768: signal peptide 1-18
These results make SignalP an excellent choice for signal peptide prediction, at least for proteins like the four tested here.
GO Terms
For the last part of this task, we used GOPET and ProtFun to predict GO terms for the HFE protein (Q30201). We also searched for Pfam families. The results were then compared to UniProt and QuickGO.
GOPET
GOPET predicts only two GO terms for our protein (see <xr id="gopet"/>), and even these are somewhat redundant (both describe receptor activity). Although they do not match the QuickGO terms for HFE, they are plausible in that HFE has a kind of receptor activity when in complex with the transferrin receptor (TFR).
<figtable id="gopet">
GOid | Aspect | Confidence | GO term |
---|---|---|---|
GO:0004872 | F (Molecular Function Ontology) | 91% | receptor activity |
GO:0030106 | F (Molecular Function Ontology) | 88% | MHC class I receptor activity |
</figtable>
ProtFun
The results of the ProtFun prediction are shown in <xr id="protfun"/>. To keep the table short, predictions with both a probability below 0.1 and odds below 1.0 are not shown. ProtFun predicts "cell envelope" as the functional category. This is correct, as the HFE-TFR complex is located in the membrane. "Transport and binding" also has a high probability, which corresponds to HFE's role in iron transport within the body. HFE is categorized as "Nonenzyme", and no enzyme class was predicted. It is further predicted to be involved in "Immune response", as it is a major histocompatibility complex (MHC) class I protein.
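The filtering rule hides a row only if it falls below both cutoffs, which is why rows such as "Cellular processes" (probability 0.095, odds 1.297) still appear in the table. A minimal sketch of that rule, with a hypothetical function name:

```python
def is_shown(probability, odds, min_probability=0.1, min_odds=1.0):
    # A row is hidden only if it falls below BOTH cutoffs.
    return probability >= min_probability or odds >= min_odds


# (category, probability, odds) for two rows of the ProtFun table above
rows = [("Cellular processes", 0.095, 1.297), ("Enzyme", 0.208, 0.727)]
print([name for name, p, o in rows if is_shown(p, o)])  # both rows are kept
```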
<figtable id="protfun">
Functional category | Probability | Odds |
---|---|---|
Biosynthesis of cofactors | 0.105 | 1.452 |
Cell envelope* | 0.633* | 10.377* |
Cellular processes | 0.095 | 1.297 |
Central intermediary metabolism | 0.231 | 3.663 |
Fatty acid metabolism | 0.016 | 1.265 |
Purines and pyrimidines | 0.583 | 2.400 |
Translation | 0.079 | 1.801 |
Transport and binding | 0.732 | 1.785 |
Enzyme/nonenzyme | ||
Enzyme | 0.208 | 0.727 |
Nonenzyme* | 0.792* | 1.110* |
Enzyme class | ||
Hydrolase | 0.135 | 0.425 |
Lyase | 0.049 | 1.054 |
Gene Ontology category | ||
Signal transducer | 0.201 | 0.939 |
Receptor | 0.353 | 2.076 |
Stress response | 0.274 | 3.108 |
Immune response* | 0.381* | 4.486* |
</figtable>
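The display filter and the starred entries can be reproduced from the table values with a short Python sketch. Only the numbers come from the ProtFun output above. The row layout and the function names are hypothetical, and the "highest odds" rule is simply what the starred rows in this table show, not a documented ProtFun rule.

```python
from typing import Iterable, List, Tuple

# (group, category, probability, odds) as listed in the ProtFun table above.
Row = Tuple[str, str, float, float]
ROWS: List[Row] = [
    ("Functional category", "Biosynthesis of cofactors", 0.105, 1.452),
    ("Functional category", "Cell envelope", 0.633, 10.377),
    ("Functional category", "Cellular processes", 0.095, 1.297),
    ("Functional category", "Central intermediary metabolism", 0.231, 3.663),
    ("Functional category", "Fatty acid metabolism", 0.016, 1.265),
    ("Functional category", "Purines and pyrimidines", 0.583, 2.400),
    ("Functional category", "Translation", 0.079, 1.801),
    ("Functional category", "Transport and binding", 0.732, 1.785),
    ("Enzyme/nonenzyme", "Enzyme", 0.208, 0.727),
    ("Enzyme/nonenzyme", "Nonenzyme", 0.792, 1.110),
    ("Enzyme class", "Hydrolase", 0.135, 0.425),
    ("Enzyme class", "Lyase", 0.049, 1.054),
    ("Gene Ontology category", "Signal transducer", 0.201, 0.939),
    ("Gene Ontology category", "Receptor", 0.353, 2.076),
    ("Gene Ontology category", "Stress response", 0.274, 3.108),
    ("Gene Ontology category", "Immune response", 0.381, 4.486),
]


def is_shown(row: Row, min_prob: float = 0.1, min_odds: float = 1.0) -> bool:
    """A row is hidden only if it falls below BOTH thresholds."""
    _, _, prob, odds = row
    return prob >= min_prob or odds >= min_odds


def best(rows: Iterable[Row], group: str, by_odds: bool = True) -> str:
    """Category with the highest odds (or probability) within one group."""
    key = 3 if by_odds else 2
    return max((r for r in rows if r[0] == group), key=lambda r: r[key])[1]


shown = [r for r in ROWS if is_shown(r)]
print(best(ROWS, "Functional category"))                 # Cell envelope
print(best(ROWS, "Functional category", by_odds=False))  # Transport and binding
```

For "Functional category", "Enzyme/nonenzyme" and "Gene Ontology category", the starred entries coincide with the highest odds. "Enzyme class" is not treated this way, because no enzyme class was predicted for HFE. Ranking by probability instead of odds would make "Transport and binding" the top functional category, which is why it appears above as a strong runner-up.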
Pfam
<figure id="pfam">
</figure>
Pfam lists two significant results for Q30201 (cf. <xr id="pfam"/>):
- PF00129: MHC_I - Class I Histocompatibility antigen, domains alpha 1 and 2 (E-value 3.5e-43)
- PF07654: C1-set - Immunoglobulin C1-set domain (E-value 2.8e-18)
MHC class I proteins are cell surface receptors involved in immune responses. UniProt also assigns HFE to the MHC class I family, and its domain architecture (three extracellular domains, a transmembrane region and a cytoplasmic tail) fits this family. C1-set domains are associated with MHC class I proteins, and HFE indeed contains such a domain (residues 207-298).
Comparison
QuickGO lists 27 unique GO terms for Q30201, whereas GOPET predicts only two, neither of which is included in QuickGO's list. These two terms also seem to fit the HFE-TFR complex better than HFE alone, although the MHC class I term at least shows some specificity to HFE.
ProtFun's prediction seems more accurate, as it correctly places HFE in the membrane and lists "Transport and binding" as a strong second result. "Immune response" is also in accordance with QuickGO's terms.
Pfam's two predicted families were both true positives, and Pfam was more informative than the other two methods.
Overall, none of the three methods explicitly identified HFE's role in iron transport.
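The GOPET-versus-QuickGO comparison is essentially a set overlap, sketched below in Python. The QuickGO list of 27 terms is not reproduced on this page, so the reference set uses hypothetical placeholder strings that must be replaced with the real GO IDs. Using precision and recall as the summary is our own choice.

```python
from typing import Dict, Set, Union

# GO terms predicted by GOPET (table above).
GOPET: Set[str] = {"GO:0004872", "GO:0030106"}

# Hypothetical placeholders for the 27 QuickGO terms of Q30201, which are not
# listed here; replace them with the real GO IDs before use.
QUICKGO: Set[str] = {f"QUICKGO_TERM_{i}" for i in range(1, 28)}


def term_overlap(predicted: Set[str],
                 reference: Set[str]) -> Dict[str, Union[list, float]]:
    """Shared terms plus precision (shared/predicted) and recall (shared/reference)."""
    shared = predicted & reference
    return {
        "shared": sorted(shared),
        "precision": len(shared) / len(predicted) if predicted else 0.0,
        "recall": len(shared) / len(reference) if reference else 0.0,
    }


print(term_overlap(GOPET, QUICKGO))
```

With the result reported above (no GOPET term in QuickGO's list), both precision and recall are 0. Exact ID matching also ignores related terms in the GO hierarchy, which a more lenient comparison could take into account.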
Conclusion
All of these methods extract additional information from the sequence alone. Since most of their predictions proved reliable, they can provide a basis for further experiments and more insight into the structure and function of the protein in very little time compared to generating experimental data.