Sequence-based predictions
Secondary structure prediction
PSIPRED
PSIPRED HFORMAT (PSIPRED V3.0)
Conf: 999851589999999877513567886245556456636899750389988756755687
Pred: CCCCCHHHHHHHHHHHHHHHCCCCCCCEEEEEEEEEEECCCCCCCEEEEEEEECCEEEEE
AA: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
10 20 30 40 50 60
Conf: 318998225536664688990669998865311211002358577441156788603899
Pred: ECCCCCCEEECCCCCCCCCCHHHHHHHHHHHHCCCCCHHHHHHHHHHHCCCCCCCCEEEE
AA: YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
70 80 90 100 110 120
Conf: 987799319835459889765910588728988756689786135787788899999876
Pred: EEEEEEECCCEEEEEEEEEECCCEEEEECCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHH
AA: ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
130 140 150 160 170 180
Conf: 310271499889888616322000378810000468999601699981450765189996
Pred: HHHCCCHHHHHHHHHHCCCCCCCCCCCCCEEEECCCCCCCEEEEEEEEEECCCCEEEEEE
AA: AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
190 200 210 220 230 240
Conf: 288106667520025355899875899999965999872169986699998826885259
Pred: ECCEECCCCCCCCCCCEECCCCCEEEEEEEEECCCCCCCEEEEEECCCCCCCEEEEEECC
AA: KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
250 260 270 280 290 300
Conf: 999711124320001367777622367764115889887620212359
Pred: CCCCCEEEEEEEEEEEEEEEEEEEEEEEEEECCCCCCCCCCEEECCCC
AA: PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
310 320 330 340
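To turn the per-residue Pred string into a list of helix and strand segments, runs of identical states can be collapsed. The following is a minimal sketch and not part of PSIPRED; the function name `ss_segments` and the `min_len` parameter are assumptions introduced here for illustration.

```python
from itertools import groupby


def ss_segments(pred, min_len=1):
    """Collapse a per-residue H/E/C string into (state, start, end) tuples.

    Positions are 1-based and inclusive. Coil (C) runs are skipped.
    """
    segments = []
    pos = 1
    for state, run in groupby(pred):
        length = len(list(run))
        if state != "C" and length >= min_len:
            segments.append((state, pos, pos + length - 1))
        pos += length
    return segments


# First Pred line of the PSIPRED output shown above (residues 1-60).
first_block = "CCCCCHHHHHHHHHHHHHHHCCCCCCCEEEEEEEEEEECCCCCCCEEEEEEEECCEEEEE"
for state, start, end in ss_segments(first_block):
    print(state, start, end)
```

For the full protein, the Pred lines of all blocks would be concatenated before calling the function, so that the positions match the sequence numbering.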
Jpred3
Seq: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDD
SS:  ------HHHHHHHHHHHHH---------EEEEEEEEE-------EEEEEEEEE--
Seq: QLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHN
SS:  EEEEEE-----EEEE----------HHHHHHHHHHHHHHHHHHHHHHHHHH----
Seq: HSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPT
SS:  -----EEEEEEEEEE------EEEEEEE-----EEEEEE----EEE-------HH
Seq: KLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSV
SS:  HHHHH--HHHHHHHHHH------HHHHHHHHHH-H-------EEEEE--------
Seq: TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPG
SS:  -EEEEEEE------EEEEEEE----------EE----------EEEEEEEEE---
Seq: EEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILR
SS:  ---EEEEEEEE------EEEEE---------HHHHHHHHHHHHHHHHHHHHHHHH
Seq: KRQGSRGAMGHYVLAERE
SS:  HH----------------
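PSIPRED writes coil as C while Jpred3 writes it as -, so the two predictions have to be normalized before they can be compared position by position. Below is a minimal sketch of such a comparison, assuming both strings cover the same residues; `agreement` is a hypothetical helper, and the toy strings are not taken from the outputs above.

```python
def agreement(pred_a, pred_b, coil_symbols="-C"):
    """Fraction of positions where two secondary structure strings agree.

    Every symbol listed in coil_symbols is treated as coil, so that
    PSIPRED-style 'C' and Jpred-style '-' count as the same state.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must have the same length")
    if not pred_a:
        raise ValueError("predictions must not be empty")

    def normalize(symbol):
        return "C" if symbol in coil_symbols else symbol

    matches = sum(normalize(a) == normalize(b) for a, b in zip(pred_a, pred_b))
    return matches / len(pred_a)


# Toy example: four of five positions agree.
print(agreement("CCHHE", "--HHH"))
```

The same function can be used against a DSSP assignment once the DSSP states have been reduced to the three classes H, E and C.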
Comparison with DSSP
Prediction of disordered regions
DISOPRED
AA: target sequence
pred: residue disorder prediction; (.) = ordered residue, (*) = disordered residue
conf: 997600000000000000000000000000000000000000000000000000000000
pred: **..........................................................
AA: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
10 20 30 40 50 60
conf: 000120011000000000000000000000000000000000000000000000000000
pred: ............................................................
AA: YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
70 80 90 100 110 120
conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
AA: ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
130 140 150 160 170 180
conf: 000000000000000000000002456777878777766530000000000000000000
pred: ..............................*.*...........................
AA: AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
190 200 210 220 230 240
conf: 000035555545543000000000000000000000000000000000000001354667
pred: ............................................................
AA: KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
250 260 270 280 290 300
conf: 777766643300000000000000047889999999999999898999
pred: ...........................*********************
AA: PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
310 320 330 340
DISOPRED predictions for a false positive rate threshold of: 2%
POODLE
POODLE stands for Prediction Of Order and Disorder by machine LEarning.
POODLE provides three different predictions:
- POODLE-S: prediction of short disordered regions
- POODLE-L: prediction of long disordered regions (longer than 40 residues)
- prediction of unfolded proteins
Most POODLE variants predicted a disordered region at the end of the protein, which contains the transmembrane region (positions 307-330). This is evidence for a disordered region at the C-terminus. Most variants also predicted a short disordered region at the beginning of the sequence, which is part of the signal peptide (positions 1-22).
POODLE-I
POODLE-I (series only) predicted 4 disordered regions within the protein sequence.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
**************----------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
-------**********---******------*---------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
---------------------***************------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
----*********----------------------------------------*******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
************************************************
POODLE-S
POODLE-S (using missing residues) predicted 6 short disordered regions within the protein sequence.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
-**************---------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
-------**********---******----------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
---------------------***************------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
----*********----------------------------------------*******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
*--------------------------------********-------
POODLE-S (using high B-factor residues) predicted 2 short disordered regions within the protein sequence.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
-*-***------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------******------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
------------------------------------------------------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
------------------------------------------------
POODLE-L
POODLE-L predicted a disordered region from position 296 to the end of the sequence.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
------------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
------------------------------------------------------******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
************************************************
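The region boundaries quoted for the POODLE variants can be read off the mask lines programmatically instead of counting characters by hand. The sketch below assumes the mask rows are given in sequence order and use * for disordered residues, as in the outputs above; `disorder_regions` is a hypothetical helper name and the toy rows are made up for illustration.

```python
import re


def disorder_regions(mask_rows, marker="*"):
    """Return 1-based inclusive (start, end) ranges of disordered residues.

    mask_rows is a list of mask strings, one per output row, in sequence
    order. The rows are joined so that positions refer to the full protein.
    """
    mask = "".join(mask_rows)
    pattern = re.escape(marker) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, mask)]


# Toy mask with two rows of ten residues each (not real POODLE output).
rows = ["-*-***----", "------****"]
print(disorder_regions(rows))  # [(2, 2), (4, 6), (17, 20)]
```

Applied to the six mask rows of one POODLE variant, this gives the regions in the numbering of the HFE sequence, which makes the comparison with the signal peptide and transmembrane positions straightforward.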
IUPRED
The short-disorder prediction found 5 short disordered regions, as well as disordered residues at the beginning of the sequence, in the signal peptide.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
***---------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
---------********----------***--------*-****----------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
---------------------------------------------***
The long-disorder prediction found 7 disordered residues, but only one short region.
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
------------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
---------******-------------------------*-------------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
------------------------------------------------
The prediction of structured regions yields one globular domain spanning residues 1-348, which means that the whole protein is predicted to be structured. This contradicts the POODLE predictions. Given the weak evidence for disorder from the other IUPRED methods, however, it does not really contradict the other IUPRED results.
META-Disorder
Prediction of transmembrane alpha-helices and signal peptides
TMHMM
Phobius and PolyPhobius
Phobius
The Phobius prediction is very accurate, as seen below. The transmembrane region is predicted just 1-2 residues upstream of the annotated region, and the same holds for the topological domains before and after the transmembrane region. The signal peptide is also predicted almost exactly (1-21 versus the annotated 1-20).
PREDICTED | ANNOTATION
ID sp|Q30201|HFE_HUMAN
FT SIGNAL 1 21 | 1-20
FT REGION 1 7 N-REGION.
FT REGION 8 16 H-REGION.
FT REGION 17 21 C-REGION.
FT TOPO_DOM 22 304 NON CYTOPLASMIC. | 23-306
FT TRANSMEM 305 329 | 307-330
FT TOPO_DOM 330 348 CYTOPLASMIC. | 331-348
PolyPhobius
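The PolyPhobius prediction is also very accurate, but in our case not as accurate as that of Phobius: the predicted signal peptide (1-23) deviates further from the annotation (1-20), while the predicted transmembrane region is the same.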
PREDICTED | ANNOTATION
ID sp|Q30201|HFE_HUMAN
FT SIGNAL 1 23 | 1-20
FT REGION 1 5 N-REGION.
FT REGION 6 19 H-REGION.
FT REGION 20 23 C-REGION.
FT TOPO_DOM 24 304 NON CYTOPLASMIC. | 23-306
FT TRANSMEM 305 329 | 307-330
FT TOPO_DOM 330 348 CYTOPLASMIC. | 331-348
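The statement that Phobius is closer to the annotation than PolyPhobius can be checked by comparing the start and end of each feature. The following sketch uses the positions listed in the two outputs above; the dictionary keys (`TOPO_DOM_OUT`, `TOPO_DOM_IN`) and the function names are assumptions made for this illustration, and summing absolute boundary offsets is only one of several possible ways to score the agreement.

```python
# Feature boundaries (start, end) taken from the outputs above.
ANNOTATION = {
    "SIGNAL": (1, 20),
    "TOPO_DOM_OUT": (23, 306),
    "TRANSMEM": (307, 330),
    "TOPO_DOM_IN": (331, 348),
}
PHOBIUS = {
    "SIGNAL": (1, 21),
    "TOPO_DOM_OUT": (22, 304),
    "TRANSMEM": (305, 329),
    "TOPO_DOM_IN": (330, 348),
}
POLYPHOBIUS = {
    "SIGNAL": (1, 23),
    "TOPO_DOM_OUT": (24, 304),
    "TRANSMEM": (305, 329),
    "TOPO_DOM_IN": (330, 348),
}


def boundary_offsets(predicted, annotated):
    """Per feature: (predicted start - annotated start, predicted end - annotated end)."""
    return {
        name: (predicted[name][0] - start, predicted[name][1] - end)
        for name, (start, end) in annotated.items()
    }


def total_offset(predicted, annotated):
    """Sum of absolute boundary offsets over all features."""
    offsets = boundary_offsets(predicted, annotated)
    return sum(abs(d) for pair in offsets.values() for d in pair)


print(boundary_offsets(PHOBIUS, ANNOTATION)["TRANSMEM"])  # (-2, -1)
print(total_offset(PHOBIUS, ANNOTATION))      # 8
print(total_offset(POLYPHOBIUS, ANNOTATION))  # 10
```

Negative offsets mean that the predicted boundary lies upstream of the annotated one, which matches the 1-2 residue shift of the transmembrane region described for Phobius.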
OCTOPUS and SPOCTOPUS
Both OCTOPUS and SPOCTOPUS predicted the signal peptide and the transmembrane region correctly.
SignalP
TargetP
Prediction of GO terms
General
HFE is annotated with the following 27 different GO terms <ref>http://www.ebi.ac.uk/QuickGO/GProtein?ac=Q30201</ref>:
GOID | GO Term | Aspect |
---|---|---|
GO:0002474 | antigen processing and presentation of peptide antigen via MHC class I | Process |
GO:0005515 | protein binding | Function |
GO:0005737 | cytoplasm | Component |
GO:0005769 | early endosome | Component |
GO:0005886 | plasma membrane | Component |
GO:0005887 | integral to plasma membrane | Component |
GO:0006461 | protein complex assembly | Process |
GO:0006810 | transport | Process |
GO:0006811 | ion transport | Process |
GO:0006826 | iron ion transport | Process |
GO:0006879 | cellular iron ion homeostasis | Process |
GO:0006898 | receptor-mediated endocytosis | Process |
GO:0006955 | immune response | Process |
GO:0007565 | female pregnancy | Process |
GO:0010106 | cellular response to iron ion starvation | Process |
GO:0016020 | membrane | Component |
GO:0016021 | integral to membrane | Component |
GO:0019882 | antigen processing and presentation | Process |
GO:0031410 | cytoplasmic vesicle | Component |
GO:0042446 | hormone biosynthetic process | Process |
GO:0042612 | MHC class I protein complex | Component |
GO:0045177 | apical part of cell | Component |
GO:0045178 | basal part of cell | Component |
GO:0048471 | perinuclear region of cytoplasm | Component |
GO:0055037 | recycling endosome | Component |
GO:0055072 | iron ion homeostasis | Process |
GO:0060586 | multicellular organismal iron ion homeostasis | Process |
GOPET
GOPET predicted 2 GO terms, neither of which overlaps with the annotation.
GOID | Aspect | Confidence | GO Term |
---|---|---|---|
GO:0004872 | Molecular Function | 91% | receptor activity |
GO:0030106 | Molecular Function | 88% | MHC class I receptor activity |
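The missing overlap can be verified with a simple set intersection over the GO identifiers from the two tables above. This is a minimal sketch; the variable names are chosen here for illustration, and the comparison is by exact identifier only, so it does not take parent/child relationships in the GO hierarchy into account.

```python
# GO identifiers from the annotation table above (27 terms).
ANNOTATED = {
    "GO:0002474", "GO:0005515", "GO:0005737", "GO:0005769", "GO:0005886",
    "GO:0005887", "GO:0006461", "GO:0006810", "GO:0006811", "GO:0006826",
    "GO:0006879", "GO:0006898", "GO:0006955", "GO:0007565", "GO:0010106",
    "GO:0016020", "GO:0016021", "GO:0019882", "GO:0031410", "GO:0042446",
    "GO:0042612", "GO:0045177", "GO:0045178", "GO:0048471", "GO:0055037",
    "GO:0055072", "GO:0060586",
}

# GO identifiers predicted by GOPET (table above).
GOPET = {"GO:0004872", "GO:0030106"}

overlap = ANNOTATED & GOPET
print(len(ANNOTATED), len(GOPET), sorted(overlap))  # 27 2 []
```

An empty exact overlap does not necessarily mean the predictions are unrelated to the annotation, since a predicted term may still be an ancestor or descendant of an annotated one.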
Pfam
ProtFun 2.2
Functional category Prob Odds
Amino_acid_biosynthesis 0.011 0.484
Biosynthesis_of_cofactors 0.105 1.452
Cell_envelope => 0.633 10.377
Cellular_processes 0.095 1.297
Central_intermediary_metabolism 0.231 3.663
Energy_metabolism 0.059 0.659
Fatty_acid_metabolism 0.016 1.265
Purines_and_pyrimidines 0.583 2.400
Regulatory_functions 0.013 0.079
Replication_and_transcription 0.019 0.073
Translation 0.079 1.801
Transport_and_binding 0.732 1.785
Enzyme/nonenzyme Prob Odds
Enzyme 0.208 0.727
Nonenzyme => 0.792 1.110
Enzyme class Prob Odds
Oxidoreductase (EC 1.-.-.-) 0.084 0.404
Transferase (EC 2.-.-.-) 0.062 0.179
Hydrolase (EC 3.-.-.-) 0.135 0.425
Lyase (EC 4.-.-.-) 0.049 1.054
Isomerase (EC 5.-.-.-) 0.010 0.321
Ligase (EC 6.-.-.-) 0.042 0.827
Gene Ontology category Prob Odds
Signal_transducer 0.201 0.939
Receptor 0.353 2.076
Hormone 0.002 0.365
Structural_protein 0.005 0.190
Transporter 0.024 0.219
Ion_channel 0.008 0.147
Voltage-gated_ion_channel 0.002 0.085
Cation_channel 0.010 0.221
Transcription 0.036 0.283
Transcription_regulation 0.018 0.147
Stress_response 0.274 3.108
Immune_response => 0.381 4.486
Growth_factor 0.013 0.943
Metal_ion_transport 0.009 0.02
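In the output above, the `=>` marker sits on the row with the highest odds value in each block that carries a marker, not on the row with the highest probability. The sketch below parses one block and shows the difference for the functional category block; `parse_protfun_block` is a hypothetical helper written for this page, not part of ProtFun, and it assumes that each row ends with the probability and the odds.

```python
def parse_protfun_block(text):
    """Parse ProtFun rows into (name, prob, odds, flagged) tuples.

    Each row is expected to end with two numbers (probability and odds).
    A '=>' token anywhere in the row sets flagged to True.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        flagged = "=>" in fields
        fields = [f for f in fields if f != "=>"]
        name = " ".join(fields[:-2])
        rows.append((name, float(fields[-2]), float(fields[-1]), flagged))
    return rows


# Functional category block copied from the output above.
FUNCTIONAL_CATEGORY = """
Amino_acid_biosynthesis 0.011 0.484
Biosynthesis_of_cofactors 0.105 1.452
Cell_envelope => 0.633 10.377
Cellular_processes 0.095 1.297
Central_intermediary_metabolism 0.231 3.663
Energy_metabolism 0.059 0.659
Fatty_acid_metabolism 0.016 1.265
Purines_and_pyrimidines 0.583 2.400
Regulatory_functions 0.013 0.079
Replication_and_transcription 0.019 0.073
Translation 0.079 1.801
Transport_and_binding 0.732 1.785
"""

rows = parse_protfun_block(FUNCTIONAL_CATEGORY)
top_by_odds = max(rows, key=lambda r: r[2])
top_by_prob = max(rows, key=lambda r: r[1])
print(top_by_odds[0])  # Cell_envelope
print(top_by_prob[0])  # Transport_and_binding
```

For HFE this means that Transport_and_binding has the highest raw probability, while Cell_envelope is the category that stands out most relative to its odds.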