Difference between revisions of "Sequence-Based Predictions Hemochromatosis"

Revision as of 10:32, 20 May 2012

Hemochromatosis>>Task 3: Sequence-based predictions

Qc terxepssrw evi jypp sj aiewipw. Mrjsvq xli Uyiir, ws xlex wli qmklx wlss xliq eaec. Livi ai ks 'vsyrh xli qypfivvc fywl. Ks qsroic KS!

Aqw vjkpm K co etcba, dwv vjga ycpv aqw vq vjkpm vjcv. K mpqy ugetgvu. Mggr vjg rcpvcnqqpu. Cnycau mggr vjg rcpvcnqqpu.

Don't google it... but a hint: Caesar would solve it ;)

Short Task Description

Detailed description: Sequence-Based Predictions

In this part of the wiki, we present our results for different sequence-based prediction methods.

These cover the prediction of secondary structure, disordered regions, transmembrane helices, signal peptides, and GO annotations.

TODO: Table numbers (once all tables are finished)

Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.

Secondary Structure

In the following, the secondary structure predictions were evaluated against the DSSP data. The DSSP data was parsed so that only the three states H (helix), E (sheet), and C (coil) remain. Positions that exist in the sequence used for DSSP but were not analyzed are denoted as "*" in the sequence and were assigned coil (C) by us.
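The reduction to three states can be sketched as follows. This is a minimal illustration, not the parser used for this task: the mapping (H, G, I to helix; E, B to sheet; everything else to coil) is a commonly used convention and an assumption here, and the names `DSSP_TO_THREE` and `reduce_dssp` are hypothetical.

```python
# Assumed reduction of the eight DSSP states to three classes.
# The actual mapping used for this task is not documented here.
DSSP_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha-, 3-10- and pi-helix -> helix
    "E": "E", "B": "E",            # extended strand and isolated bridge -> sheet
}


def reduce_dssp(dssp_states):
    """Map a DSSP state string to H/E/C; any other symbol becomes coil."""
    return "".join(DSSP_TO_THREE.get(state, "C") for state in dssp_states)


# Turns (T), bends (S), blanks and gaps all end up as coil.
print(reduce_dssp("HGIEBTS -"))
```

Because unknown symbols default to coil, positions without DSSP output are treated the same way as described above.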

Afterwards, Q3 and SOV scores were calculated. Q3 denotes the percentage of residues whose secondary structure was assigned correctly. SOV (segment overlap) measures how well individual secondary structure segments are approximated. This means that

CCCCCHHHHHHHHHCCCCC
CCCCCHCHCHCHCHCCCCC

gets a much lower score than

CCCCCHHHHHHHHHCCCCC
CCCCCCCHHHHHCCCCCCC

although their Q3 scores do not differ. The maximum SOV score is also 100%. This gives some more insight into the predictions.
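The difference between the two measures can be illustrated with a minimal Python sketch. It is not the script used for this task: the function names `q3` and `segments` are our own choice for illustration, and the segment listing only shows why a segment-based score such as SOV penalizes the fragmented prediction. It does not compute SOV itself.

```python
from itertools import groupby


def q3(observed, predicted):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def segments(states):
    """Split a state string into runs as (state, start, end), 0-based inclusive."""
    result = []
    position = 0
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, position, position + length - 1))
        position += length
    return result


observed = "CCCCCHHHHHHHHHCCCCC"
fragmented = "CCCCCHCHCHCHCHCCCCC"
shortened = "CCCCCCCHHHHHCCCCCCC"

for name, predicted in (("fragmented", fragmented), ("shortened", shortened)):
    helices = [s for s in segments(predicted) if s[0] == "H"]
    print(name, round(q3(observed, predicted), 2), len(helices), "helix segment(s)")
```

Both predictions get the same Q3 (15 of 19 positions are correct), but the first one splits the single observed helix into five one-residue helices, whereas the second one keeps it as one shorter segment.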

The Q3E, Q3H, and Q3C scores denote the percentage of correctly predicted residues in state E, H, and C, respectively.
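A per-state score can be sketched in the same way. We assume here that the denominator is the number of residues observed in the respective state (in the DSSP assignment); the function name `q3_per_state` is hypothetical and the sketch is not the original evaluation script.

```python
def q3_per_state(observed, predicted, states="HEC"):
    """Per-state accuracy: correctly predicted residues / observed residues in that state.

    Returns None for a state that does not occur in the observed sequence.
    """
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    scores = {}
    for state in states:
        positions = [i for i, o in enumerate(observed) if o == state]
        if not positions:
            scores[state] = None
            continue
        correct = sum(predicted[i] == state for i in positions)
        scores[state] = 100.0 * correct / len(positions)
    return scores


# Toy example from the Q3/SOV comparison: 5 of 9 helix residues are recovered.
scores = q3_per_state("CCCCCHHHHHHHHHCCCCC", "CCCCCCCHHHHHCCCCCCC")
print(scores)
```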

The sequences used for this were the "aligned" secondary structure sequences.

1KR4

DSSPSQ: ALYFXGHXILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENILTEYXNWLRESVLGS
DSSPSS: CCEECCCEEEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEEEECCCCEEHHHHHHHHHHCCCC
PsiPSS:       CEEEEECCCCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEECCCCCHHHHHHHHHHHCCCCCCEEEEEECCCCCHHHHHHHHHHCC
PsiPSQ:       MILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIFTLKVENVLTEYMNWLRESVL

Q3 Score: 85.14851485148515

Q3EScore: 76.19047619047619

Q3HScore: 91.89189189189189

Q3CScore: 90.9090909090909

SOV: 82.61494252873563

1AUI

DSSPSQ: **************TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICS***********************************************************************************************SFEEAKGLDRINERMPPR
DSSPSS: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHCCHHHHHHHHCCHHHHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCCCCCCEEECHHHHHHHHHHCCCCEEEECCCCCCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEEECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCC
PsiPSS:  CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHCCCCEEECCCEEEECCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCHHHHHHHHCCHHHHHHHHHHCCCCHHHHHCCCCEEEEECCCCCCCCCHHHHCCCCCCCCCCCCCCCCHHCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEEECCCCEEEEEECCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
PsiPSQ:  MSEPKAIDPKLSTTDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHGQFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSERVYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTVRGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQFNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSDDELGSEEDGFDGATAAARKEVIRNKIRAIGKMARVFSVLREESESVLTLKGLTPTGMLPSGVLSGGKQTLQSATVEAIEADEAIKGFSPQHKITSFEEAKGLDRINERMPPRRDAMPSDANLNSINKALTSETNGTDSNGSNSSNIQ

Q3 Score: 72.8395061728395

Q3EScore: 52.459016393442624

Q3HScore: 77.30496453900709

Q3CScore: 75.0

SOV: 51.9899480602258

2BNH

DSSPSQ: *MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
DSSPSS: CCECCEECCCCCHHHHHHHHHHHCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHCHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHCCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC
PsiPSS:  CEEECCCCCCCHHHHHHHHHHHCCCCEEEECCCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHCCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEEECCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCCHHHHHHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCEECC
PsiPSQ:  MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTCKIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASVLRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIAELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSLTAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDLSNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS

Q3 Score: 91.8859649122807

Q3EScore: 85.96491228070175

Q3HScore: 90.3061224489796

Q3CScore: 95.07389162561576

SOV: 95.47415121428278

1A6Z

DSSPSQ:                      ****RSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGaEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDaPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRbRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTbQVEHPGLDQPLIVIW
DSSPSS:                      CCCCCCEEEEEEEEEEECCCCCCECCEEEEEECCEEEEEEECCCCCEEECCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEEEEEEECCCCCEEEEEEEEECCEEEEEEEHHHCEEEECCHHHHHHHHHHHCCCHHHHHHHHHHHCHHHHHHHHHHHHHCCCCCCCECCEEEEEEEECCCCEEEEEEEEEEECCCCEEEEEECCEECCHHHCCCCEEEECCCCCEEEEEEEEECCCHHHHEEEEEECCCCCCCEEEEC
PsiPSS: CCCCCHHHHHHHHHHHHHHHCCCCCCCCCCCEEEEEECCCCCCCCEEEEEEEECCEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCEEEEECCCEECCCCCCCCEEEECCCCCCCCCCCCCCCCEECCCCHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCCCCCEEEECCCCCCCCEEEEEECCCCCCCCEEEEEECCCCCCCCCCCCCCCEECCCCCCEEEEEEEECCCCCCCEEEEEECCCCCCCEEEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCC
PsiPSQ: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

Q3 Score: 76.08695652173913

Q3EScore: 61.46788990825688

Q3HScore: 74.28571428571429

Q3CScore: 93.81443298969072

SOV: 71.38081469812238

The Q3 score for these proteins ranges from about 73% to 92%, and the SOV score from about 52% to 95%. As there are only four proteins, this probably does not reflect the general performance of the prediction, but it still gives some insight. Judging from the annotated "aligned" secondary structure sequences alone, the predictions look fairly good. This also holds for 1AUI, although its SOV is quite low, very likely because many short H/E segments are not predicted correctly. Another problem occurs in the regions without DSSP data. Because these regions are presumably disordered, the results may be viewed together with a disorder prediction, which could give additional information on both secondary structure and disorder.

This means the predictions should be reliable enough to gain more insight into the proteins' secondary structure.

Disorder

**Table XXX:** IUPred predictions for Q30201, P10775, Q08209, and Q9X0E6. The figures show the disorder probability predicted for each amino acid residue (green line) and the 50% threshold (red line).
	Q30201	P10775
	Q08209	Q9X0E6

Figure XXX: DisProt map with ordered (blue) and disordered (red) regions for Q08209 (DP00092).

IUPred was employed to find disordered regions within HFE (Q30201), RNH1 (P10775), PPP3CA (Q08209), and cutA (Q9X0E6). The results are shown in <xr id="iupred"/>. DisProt was used to validate the predictions.

As shown in the upper left figure (<xr id="iupred"/>), Q30201 has two small regions (around residues 250 and 285) that might be disordered. There is no entry for Q30201 in DisProt that would support this, and a sequence search (PsiBlast) against DisProt did not yield any significant results.

For P10775 no disordered regions are predicted (upper right figure in <xr id="iupred"/>). There is also no entry in DisProt. A PsiBlast search results in one significant hit (DP00554), but the alignment does not include the hit's disordered region (31-50).

DisProt does have an entry for Q08209 (DP00092). A PsiBlast search also results in an additional significant hit (DP00365), but the alignment does not contain the disordered region (19-147), so it can be discarded. A comparison between the DisProt map (<xr id="map92"/>) and the IUPred prediction (lower left figure in <xr id="iupred"/>) shows that the predictions are correct in general, although IUPred inserts a small ordered region at the end of the protein (which should be disordered). The disordered region spanning residues 374-486 is known to undergo a disorder-to-order transition, which might cause IUPred's vague prediction within this section.

Neither IUPred (lower right figure in <xr id="iupred"/>) nor DisProt suggests any disordered regions for Q9X0E6.

IUPred seems to be quite accurate in predicting completely ordered proteins (P10775, Q9X0E6, and, with the exception of the small peaks, Q30201), but it seems to have problems with disordered regions where a disorder-to-order transition occurs.
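The 50% threshold shown in the IUPred plots can also be applied programmatically to turn per-residue scores into disordered regions. The following is a minimal sketch under our own assumptions: the scores are toy values (not IUPred output for any of the proteins above), the function name `disordered_regions` and the `min_length` parameter are hypothetical, and a residue is counted as disordered when its score lies strictly above the threshold.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) ranges whose scores exceed the threshold."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position  # a new disordered stretch begins here
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy scores for illustration only.
toy_scores = [0.1, 0.2, 0.6, 0.7, 0.4, 0.3, 0.55, 0.9, 0.8]
print(disordered_regions(toy_scores))
print(disordered_regions(toy_scores, min_length=3))
```

A minimum length helps to ignore isolated peaks such as the two small ones predicted for Q30201.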

Transmembrane Helices

Transmembrane helices were predicted with PolyPhobius for HFE (Q30201), DRD3 (P35462), Aquaporin-4 (P47863), and KvAP (Q9YDF8). The results were compared to OPM, PDBTM, and UniProt. The PDB IDs for OPM and PDBTM were chosen based on the following criteria:

wildtype over mutant
higher coverage
better resolution

UniProt -> PDB mapping:

P35462 -> 3PBL
P47863 -> 2D57
Q9YDF8 -> 1ORQ/1ORS

Q30201

PolyPhobius predicts only one transmembrane helix for Q30201 (see <xr id="tmh_q30201"/>). There is no entry in OPM or PDBTM for either of its PDB IDs, but UniProt lists a TMH which almost exactly matches the predicted one (shifted by one residue).

**Table XXX**: TMH predictions and annotations for Q30201. There were no entries for either of the two PDB IDs (1A6Z, 1DE4) in OPM or PDBTM.
Q30201	TMH 1
PolyPhobius	306-329
UniProt	307-330
OPM	no entry
PDBTM	no entry

P35462

For P35462 all methods list 7 transmembrane helices (<xr id="tmh_p35462"/>) which are consistent (regarding their positions) throughout all methods.

**Table XXX**: TMH predictions and annotations for P35462 (PDB ID: 3PBL).
P35462 (3PBL)	TMH 1	TMH 2	TMH 3	TMH 4	TMH 5	TMH 6	TMH 7
PolyPhobius	30-55	66-88	105-126	150-170	188-212	329-352	367-386
UniProt	33-55	66-88	105-126	150-170	188-212	330-351	367-388
OPM	34-52	67-91	101-126	150-170	187-209	330-351	363-386
PDBTM	35-52	68-84	109-123	152-166	191-206	334-347	368-382

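How consistent two sets of helices are can be quantified by their boundary shifts and overlaps. The following minimal sketch uses the PolyPhobius and UniProt rows of the P35462 table; the function names `overlap` and `boundary_shifts` are hypothetical, and pairing the helices by their order is an assumption that only works when both methods list the same number of helices.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def boundary_shifts(predicted, reference):
    """Pair helices by order and return (start_shift, end_shift) for each pair."""
    if len(predicted) != len(reference):
        raise ValueError("both lists must contain the same number of helices")
    return [(p[0] - r[0], p[1] - r[1]) for p, r in zip(predicted, reference)]


# Rows taken from the P35462 (3PBL) table.
polyphobius = [(30, 55), (66, 88), (105, 126), (150, 170), (188, 212), (329, 352), (367, 386)]
uniprot = [(33, 55), (66, 88), (105, 126), (150, 170), (188, 212), (330, 351), (367, 388)]

shifts = boundary_shifts(polyphobius, uniprot)
mean_abs_start = sum(abs(start) for start, _ in shifts) / len(shifts)
mean_abs_end = sum(abs(end) for _, end in shifts) / len(shifts)
overlaps = [overlap(p, r) for p, r in zip(polyphobius, uniprot)]

print(shifts)
print(round(mean_abs_start, 2), round(mean_abs_end, 2))
print(overlaps)
```

For proteins where the methods disagree on the number of helices (P47863, Q9YDF8), the helices would first have to be matched, for example by maximum overlap.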

P47863

PolyPhobius, UniProt, and PDBTM list 6 TMHs for P47863, whereas OPM lists two additional TMHs (see <xr id="tmh_p47863"/>). These two regions are listed as "Membrane Loop" in PDBTM, which might explain the false entries in OPM.

**Table XXX**: TMH predictions and annotations for P47863 (PDB ID: 2D57). TMH3 and TMH7 (marked with *) are listed as "Membrane Loop" in PDBTM.
P47863 (2D57)	TMH 1	TMH 2	TMH 3	TMH 4	TMH 5	TMH 6	TMH 7	TMH 8
PolyPhobius	34-58	70-91		115-136	156-177	188-208		231-252
UniProt	37-57	65-85		116-136	156-176	185-205		232-252
OPM	34-56	70-88	98-107	112-136	156-178	189-203	214-223	231-252
PDBTM	39-55	72-89	95-106*	116-133	158-177	188-205	209-222*	231-248

Q9YDF8

Q9YDF8 seems to be the hardest protein to predict TMHs for (cf. <xr id="tmh_q9ydf8"/>). PolyPhobius predicts an additional TMH (compared to UniProt); OPM and PDBTM need two PDB IDs to cover all TMHs (including the "false" ones). The residue positions of both PDB entries were adjusted for a shift of 13 residues.

PolyPhobius predicted a region (TMH7), labeled as "Intramembrane - Pore-Forming" in UniProt, as a (false) TMH. OPM also included this region and an additional one labeled as "Intramembrane - Helical" in UniProt. PDBTM lists TMH7 as "Membrane Loop".

**Table XXX**: TMH predictions and annotations for Q9YDF8 (PDB IDs: 1ORQ, 1ORS). Residue positions are adjusted for the PDB sequence's 13AA shift. TMH3 is annotated as "Intramembrane, Helical" in UniProt, TMH7 as "Intramembrane, Pore-Forming". TMH7 is additionally marked as "Membrane Loop" in PDBTM.
Q9YDF8 (1ORQ/1ORS)	TMH 1	TMH 2	TMH 3	TMH 4	TMH 5	TMH 6	TMH 7	TMH 8
PolyPhobius	42-60	68-88		108-129	137-157	163-184	196-213	224-244
UniProt	39-63	68-92	97-105*	109-125	129-145	160-184	196-208*	222-253
OPM (1ORS)	38-59	68-91	99-110	113-120	130-161
OPM (1ORQ)						166-185	196-208	220-238
PDBTM (1ORS)	40-63	68-88		101-120	131-155
PDBTM (1ORQ)	34-65	70-93				164-184	197-213*	222-249

Comparison

TODO: mean shifts, false/true positives, length (probably finished by sunday)

Signal Peptides

**Table XXX:** SignalP predictions for Q30201, P47863, P11279, and P02768. Each figure shows the C-score, S-score, and Y-score per residue position for the corresponding protein.
	Q30201	P47863
	P11279	P02768

TODO: score description

SignalP (Webserver 4.0) predictions were made for HFE (Q30201), Aquaporin-4 (P47863), Lysosome-associated membrane glycoprotein 1 (P11279), and Serum albumin (P02768) in order to find signal peptides within these sequences. The results are shown in <xr id="signalp"/> and were compared to the corresponding entries in UniProt.

According to UniProt, all four predictions are exactly correct:

Q30201: signal peptide 1-22
P47863: no signal peptide
P11279: signal peptide 1-28
P02768: signal peptide 1-18

Based on these four proteins, SignalP appears to be an excellent candidate for signal peptide predictions.

GO Terms

For the last part of this task, we used GOPET and ProtFun to make a GO term prediction for the HFE protein (Q30201). We also searched for Pfam families. The results were then compared to UniProt and QuickGO.

GOPET

GOPET predicts only two GO terms for our protein (see <xr id="gopet"/>), and even these are somewhat redundant (both are receptor activities). The results are at least plausible in that HFE has a kind of receptor activity: it binds to the transferrin receptor (TFR).

**Table XXX**: GO term prediction with GOPET for Q30201.
GO ID	Aspect	Confidence	GO term
GO:0004872	F (Molecular Function Ontology)	91%	receptor activity
GO:0030106	F (Molecular Function Ontology)	88%	MHC class I receptor activity

ProtFun

The results of the ProtFun prediction are shown in <xr id="protfun"/>. To reduce the size of the table, predictions with both a probability below 0.1 and odds below 1.0 are not shown. ProtFun predicts "Cell envelope" as the functional category. This fits, as the HFE-TFR complex is located in the membrane. "Transport and binding" also has a high probability, which corresponds to HFE's role in iron transport within the body. HFE is categorized as "Nonenzyme", and no enzyme class was predicted. It is further predicted to be involved in "Immune response", as it is a protein of the major histocompatibility complex (MHC) class I.

**Table XXX**: GO term prediction with ProtFun for Q30201. Entries marked with asterisks (*) were deemed "true" by ProtFun. Results with both a probability below 0.1 and odds below 1.0 are not shown.
Functional category	Probability	Odds
Biosynthesis of cofactors	0.105	1.452
Cell envelope*	0.633*	10.377*
Cellular processes	0.095	1.297
Central intermediary metabolism	0.231	3.663
Fatty acid metabolism	0.016	1.265
Purines and pyrimidines	0.583	2.400
Translation	0.079	1.801
Transport and binding	0.732	1.785
Enzyme/nonenzyme
Enzyme	0.208	0.727
Nonenzyme*	0.792*	1.110*
Enzyme class
Hydrolase	0.135	0.425
Lyase	0.049	1.054
Gene Ontology category
Signal transducer	0.201	0.939
Receptor	0.353	2.076
Stress response	0.274	3.108
Immune response*	0.381*	4.486*

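The row filter applied to the ProtFun table can be written down explicitly. In this minimal sketch we read the rule as "hide a row only if its probability is below 0.1 and its odds are below 1.0 at the same time", which is an interpretation that is consistent with every row shown in the table. The function name `is_shown` and the extra low-scoring row are hypothetical and only serve as an illustration.

```python
def is_shown(probability, odds, min_probability=0.1, min_odds=1.0):
    """A row is hidden only if it fails both cut-offs at once."""
    return not (probability < min_probability and odds < min_odds)


# (category, probability, odds) as listed in the ProtFun table for Q30201.
rows = [
    ("Biosynthesis of cofactors", 0.105, 1.452),
    ("Cell envelope", 0.633, 10.377),
    ("Cellular processes", 0.095, 1.297),
    ("Central intermediary metabolism", 0.231, 3.663),
    ("Fatty acid metabolism", 0.016, 1.265),
    ("Purines and pyrimidines", 0.583, 2.400),
    ("Translation", 0.079, 1.801),
    ("Transport and binding", 0.732, 1.785),
    ("Enzyme", 0.208, 0.727),
    ("Nonenzyme", 0.792, 1.110),
    ("Hydrolase", 0.135, 0.425),
    ("Lyase", 0.049, 1.054),
    ("Signal transducer", 0.201, 0.939),
    ("Receptor", 0.353, 2.076),
    ("Stress response", 0.274, 3.108),
    ("Immune response", 0.381, 4.486),
]

# Hypothetical row that fails both cut-offs; it is not part of the real output.
hypothetical = ("Hypothetical low-scoring category", 0.05, 0.5)

shown = [row for row in rows + [hypothetical] if is_shown(row[1], row[2])]
print(len(shown), "of", len(rows) + 1, "rows are shown")
```

Rows such as "Fatty acid metabolism" (low probability, odds above 1.0) and "Hydrolase" (odds below 1.0, probability above 0.1) stay in the table under this reading.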

Pfam

Pfam lists two significant results for Q30201:

MHC_I - Class I Histocompatibility antigen, domains alpha 1 and 2 (E-value 3.5e-43)
C1-set - Immunoglobulin C1-set domain (E-value 2.8e-18)

MHC class I proteins are strongly involved in immune responses. UniProt also lists HFE in the MHC class I family, and its structure (three extracellular domains, transmembrane region, cytoplasmic tail) fits. C1-set domains are associated with MHC class I proteins, and HFE indeed contains such a domain (residues 207-298).

Comparison

Compared to QuickGO, which lists 27 unique GO terms for Q30201, GOPET predicts only two, neither of which is included in QuickGO's list. These two also seem to fit the HFE-TFR complex better than HFE alone, but at least the MHC class I tag shows specificity to HFE.

ProtFun's prediction seems more accurate, as it successfully identifies HFE's location within the membrane and lists "Transport and binding" as a good second result. "Immune response" is also in accordance with QuickGO's terms.

Pfam's two predicted families were both true positives, and it was more informative than the other two methods.

Overall, none of them identified HFE's role in iron transport.

Conclusion

All of these methods can be used to extract more information from the sequence alone. As most of it is reliable, they can provide additional information for further experiments and more insight into the structure of the protein in very little time (compared to experimentally generated data).

@@ Line 12: / Line 12: @@
 Detailed description: [[Task_3_-_Sequence-based_predictions|Sequence-Based Predictions]]
+In this part of the wiki we present our results on different sequence based prediction methods.
+These cover the prediction of secondary structure, disordered regions, transmembrane helices, signal peptides and GO annotations.
-* '''TODO:''' Task description
 * '''TODO:''' Table numbers (once all tables are finished)
