Sequence-based predictions

From Bioinformatikpedia

Revision as of 16:41, 4 June 2011

Secondary structure prediction

PSIPRED

Secondary Structure predicted by PSIPRED
PSIPRED HFORMAT (PSIPRED V3.0)
Conf: 999851589999999877513567886245556456636899750389988756755687
Pred: CCCCCHHHHHHHHHHHHHHHCCCCCCCEEEEEEEEEEECCCCCCCEEEEEEEECCEEEEE
  AA: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
              10        20        30        40        50        60
Conf: 318998225536664688990669998865311211002358577441156788603899
Pred: ECCCCCCEEECCCCCCCCCCHHHHHHHHHHHHCCCCCHHHHHHHHHHHCCCCCCCCEEEE
  AA: YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
              70        80        90       100       110       120
Conf: 987799319835459889765910588728988756689786135787788899999876
Pred: EEEEEEECCCEEEEEEEEEECCCEEEEECCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHH
  AA: ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
             130       140       150       160       170       180
Conf: 310271499889888616322000378810000468999601699981450765189996
Pred: HHHCCCHHHHHHHHHHCCCCCCCCCCCCCEEEECCCCCCCEEEEEEEEEECCCCEEEEEE
  AA: AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
             190       200       210       220       230       240
Conf: 288106667520025355899875899999965999872169986699998826885259
Pred: ECCEECCCCCCCCCCCEECCCCCEEEEEEEEECCCCCCCEEEEEECCCCCCCEEEEEECC
  AA: KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
             250       260       270       280       290       300
Conf: 999711124320001367777622367764115889887620212359
Pred: CCCCCEEEEEEEEEEEEEEEEEEEEEEEEEECCCCCCCCCCEEECCCC
  AA: PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
             310       320       330       340
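The PSIPRED output above splits the prediction into 60-residue rows. As a minimal sketch (the helper names are hypothetical, not part of PSIPRED), the rows can be joined back into full-length strings and summarized as helix/strand/coil fractions, assuming only the `Conf:`/`Pred:`/`AA:` row prefixes shown above:

```python
from collections import Counter


def parse_psipred_hformat(text):
    """Join the Conf/Pred/AA rows of PSIPRED horizontal output into full-length strings."""
    rows = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Header and ruler lines have no "key:" prefix and are skipped.
        if sep and key in rows:
            rows[key].append(value.strip())
    return "".join(rows["Conf"]), "".join(rows["Pred"]), "".join(rows["AA"])


def ss_composition(pred):
    """Fraction of residues predicted as helix (H), strand (E) and coil (C)."""
    counts = Counter(pred)
    return {state: counts[state] / len(pred) for state in "HEC"}


# Toy two-row example in the same layout (not the HFE prediction).
example = """PSIPRED HFORMAT (PSIPRED V3.0)
Conf: 9988
Pred: CHHE
  AA: MGPR
Conf: 77
Pred: EC
  AA: AL
"""
conf, pred, seq = parse_psipred_hformat(example)
print(seq, pred, ss_composition(pred))
```

Applied to the full output above, all three strings should come out 348 characters long, which is a quick check that no row was lost.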

Jpred3

Seq: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDD
 SS: ------HHHHHHHHHHHHH---------EEEEEEEEE-------EEEEEEEEE--

Seq: QLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHN
 SS: EEEEEE-----EEEE----------HHHHHHHHHHHHHHHHHHHHHHHHHH----

Seq: HSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPT
 SS: -----EEEEEEEEEE------EEEEEEE-----EEEEEE----EEE-------HH

Seq: KLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSV
 SS: HHHHH--HHHHHHHHHH------HHHHHHHHHH-H-------EEEEE--------

Seq: TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPG
 SS: -EEEEEEE------EEEEEEE----------EE----------EEEEEEEEE---

Seq: EEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILR
 SS: ---EEEEEEEE------EEEEE---------HHHHHHHHHHHHHHHHHHHHHHHH

Seq: KRQGSRGAMGHYVLAERE
 SS: HH----------------
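Jpred3 writes coil as `-` and wraps its rows at 55 residues, whereas PSIPRED uses `C` and 60-residue rows, so the two predictions are easiest to compare after joining the rows and mapping both onto one three-state alphabet. A minimal sketch of a per-residue agreement score (the fraction of residues assigned the same state); treating every symbol other than `H` and `E` as coil is an assumption that fits the two alphabets shown here, and the function names are hypothetical:

```python
def to_three_state(ss):
    """Keep H and E, map every other symbol (e.g. Jpred3 '-') to coil 'C'."""
    return "".join(s if s in "HE" else "C" for s in ss)


def agreement(ss_a, ss_b):
    """Fraction of residues assigned the same state by two secondary-structure predictions."""
    if len(ss_a) != len(ss_b):
        raise ValueError("both predictions must cover the same residues")
    a, b = to_three_state(ss_a), to_three_state(ss_b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy strings, not the HFE predictions from this page.
print(agreement("CCHHHEE", "--HHH-E"))
```

For DSSP assignments this mapping would be too crude: it does not follow the usual reduction of the eight DSSP states and would, for example, turn G and B into coil.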

Comparison with DSSP

Prediction of disordered regions

DISOPRED

  AA: Target sequence
pred: Residue disorder prediction (. = ordered residue, * = disordered residue)
conf:997600000000000000000000000000000000000000000000000000000000
pred:**..........................................................
  AA:MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
             10        20        30        40        50        60
conf:000120011000000000000000000000000000000000000000000000000000
pred:............................................................
  AA:YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
             70        80        90       100       110       120
conf:000000000000000000000000000000000000000000000000000000000000
pred:............................................................
  AA:ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
            130       140       150       160       170       180
conf:000000000000000000000002456777878777766530000000000000000000
pred:..............................*.*...........................
  AA:AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
            190       200       210       220       230       240
conf:000035555545543000000000000000000000000000000000000001354667
pred:............................................................
  AA:KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
            250       260       270       280       290       300
conf:777766643300000000000000047889999999999999898999
pred:...........................*********************
  AA:PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
            310       320       330       340
DISOPRED predictions for a false positive rate threshold of: 2%
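In the DISOPRED rows above, every residue marked `*` in the `pred` row is predicted to be disordered at the 2% false positive rate threshold. A minimal sketch that lists those residues with their one-letter code and sequence position for one 60-residue row; `first_position` is the number of the row's first residue, and the helper name is hypothetical. The `pred` string below reproduces the pattern of the 181-240 row:

```python
def disordered_residues(pred_row, aa_row, first_position=1):
    """Return labels such as 'K211' for each residue marked '*' in a DISOPRED pred row."""
    if len(pred_row) != len(aa_row):
        raise ValueError("pred and AA rows must have the same length")
    return [
        f"{aa}{first_position + offset}"
        for offset, (mark, aa) in enumerate(zip(pred_row, aa_row))
        if mark == "*"
    ]


pred = "." * 30 + "*.*" + "." * 27
aa = "AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL"
print(disordered_residues(pred, aa, first_position=181))  # ['K211', 'T213']
```

For that row this yields K211 and T213, the two isolated disordered residues in the middle of the sequence.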

POODLE

POODLE stands for Prediction Of Order and Disorder by machine LEarning.

POODLE provides three different predictions:

  • POODLE-S: short disorder regions prediction
  • POODLE-L: long disorder regions prediction (longer than 40 residues)
  • unfolded protein prediction



POODLE-I and POODLE-L predicted a disordered region at the end of the protein that contains the transmembrane region (positions 307-330), which is evidence for a disordered C-terminus. However, POODLE-I and both POODLE-S variants also predicted a short disordered region at the beginning of the sequence, which is part of the signal peptide (positions 1-22).
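The comparison above comes down to interval overlap: each run of `*` in a disorder string becomes a residue interval, and one counts how many residues of an annotated feature it covers. A minimal sketch, assuming 1-based inclusive positions and the `*`/`-` strings used by POODLE below; the feature intervals are the ones quoted in the text, and the helper names are hypothetical:

```python
def disordered_runs(mask, symbol="*"):
    """1-based inclusive (start, end) intervals of consecutive disorder symbols."""
    runs, start = [], None
    for position, char in enumerate(mask, start=1):
        if char == symbol and start is None:
            start = position
        elif char != symbol and start is not None:
            runs.append((start, position - 1))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def overlap(a, b):
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def feature_coverage(mask, features):
    """For each annotated feature, count residues inside predicted disordered runs."""
    runs = disordered_runs(mask)
    return {name: sum(overlap(run, span) for run in runs) for name, span in features.items()}


FEATURES = {"signal peptide": (1, 22), "transmembrane region": (307, 330)}

# Toy 348-residue mask with disorder from residue 301 to the end.
toy_mask = "-" * 300 + "*" * 48
print(disordered_runs(toy_mask), feature_coverage(toy_mask, FEATURES))
```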

POODLE-I

POODLE-I (series only) predicted 4 disordered regions within the protein sequence.

Distribution of disordered regions over the amino acid sequence predicted by POODLE-I
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
**************----------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
-------**********---******------*---------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
---------------------***************------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
----*********----------------------------------------*******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
************************************************

POODLE-S

POODLE-S (using missing residues) predicted 6 short disordered regions within the protein sequence.

Distribution of disordered regions over the amino acid sequence predicted by POODLE-S (missing residues)
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
-**************---------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
-------**********---******----------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
---------------------***************------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
----*********----------------------------------------*******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
*--------------------------------********-------

POODLE-S (using high B-factor residues) predicted 2 short disordered regions within the protein sequence.

Distribution of disordered regions over the amino acid sequence predicted by POODLE-S (high B-factor residues)
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
-*-***------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
------------------------------------------------******------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
------------------------------------------------------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
------------------------------------------------

POODLE-L

POODLE-L predicted a disordered region from residue 296 to the end of the sequence.

Distribution of disordered regions over the amino acid sequence predicted by POODLE-L
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF 
------------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV 
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR 
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL 
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS 
------------------------------------------------------******
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
************************************************

IUPRED

The short-disorder prediction found 5 short disordered regions, plus disordered residues at the beginning of the sequence, within the signal peptide.

IUPRED prediction of short regions
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF 
***---------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV 
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR 
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL 
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS 
---------********----------***--------*-****----------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
---------------------------------------------***


The long-disorder prediction found 7 disordered residues, but only one short region.

IUPRED prediction of long regions
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF 
------------------------------------------------------------
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV 
------------------------------------------------------------
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR 
------------------------------------------------------------
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL 
------------------------------------------------------------
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS 
---------******-------------------------*-------------------
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
------------------------------------------------
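The two IUPRED results above show that the number of disordered residues and the number of disordered regions are different measures: the long-disorder prediction has seven disordered residues but only one region. A minimal sketch, assuming per-residue IUPRED scores in [0, 1] and the commonly used cutoff of 0.5 (scores above it count as disordered); `min_length` is a hypothetical parameter for ignoring isolated residues, not an IUPRED option:

```python
def disorder_string(scores, cutoff=0.5):
    """'*' for residues scoring above the cutoff, '-' otherwise."""
    return "".join("*" if score > cutoff else "-" for score in scores)


def count_regions(mask, min_length=1):
    """Count runs of '*' that are at least min_length residues long."""
    regions, run = 0, 0
    for char in mask + "-":  # trailing sentinel closes a run that reaches the end
        if char == "*":
            run += 1
        else:
            if run and run >= min_length:
                regions += 1
            run = 0
    return regions


scores = [0.7, 0.6, 0.2, 0.51, 0.5, 0.9]  # toy values, not HFE scores
mask = disorder_string(scores)
print(mask, mask.count("*"), count_regions(mask), count_regions(mask, min_length=2))
```

With the default `min_length=1`, every stray residue counts as its own region; raising it is one way to report only contiguous stretches.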
IUPRED prediction of structured regions



The prediction of structured regions reports one globular domain spanning residues 1-348, meaning that the whole protein is predicted to be structured. This contradicts the POODLE predictions, but given the weak evidence for disorder from the other IUPRED prediction types, it is not a real contradiction to the other IUPRED results.
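When predictors disagree, as POODLE and IUPRED do here, one simple way to summarize them is a per-residue majority vote. This illustrates that idea and is not something done on this page; it assumes all predictions have been converted to strings of equal length in which `*` marks disorder (DISOPRED's `.` and POODLE's `-` both count as ordered), and the function name is hypothetical:

```python
def consensus(masks, min_votes=None):
    """Mark a residue '*' when at least min_votes predictors call it disordered."""
    if not masks:
        raise ValueError("need at least one prediction")
    length = len(masks[0])
    if any(len(mask) != length for mask in masks):
        raise ValueError("all predictions must cover the same sequence")
    if min_votes is None:
        min_votes = len(masks) // 2 + 1  # strict majority
    return "".join(
        "*" if sum(mask[i] == "*" for mask in masks) >= min_votes else "-"
        for i in range(length)
    )


votes = ["**--", "*---", "-**-"]  # toy predictions
print(consensus(votes), consensus(votes, min_votes=1))
```

Lowering `min_votes` to 1 turns the vote into a union of all predictions, which is the most permissive reading.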

META-Disorder

Prediction of transmembrane alpha-helices and signal peptides

TMHMM

Phobius and PolyPhobius

Phobius

Regions predicted by Phobius

Phobius predicted very accurately, as the comparison below shows. The transmembrane region (305-329) is predicted just 1-2 residues upstream of the annotated region (307-330). The same holds for the topological domains before and after the transmembrane region. The signal peptide is also predicted correctly.

PREDICTED                                                     ANNOTATION
ID   sp|Q30201|HFE_HUMAN
FT   SIGNAL        1     21                             |  1-20
FT   REGION        1      7       N-REGION.              
FT   REGION        8     16       H-REGION.
FT   REGION       17     21       C-REGION.
FT   TOPO_DOM     22    304       NON CYTOPLASMIC.      |  23-306
FT   TRANSMEM    305    329                             |  307-330
FT   TOPO_DOM    330    348       CYTOPLASMIC.          |  331-348
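The boundary comparison in the table can be made explicit by subtracting the annotated start and end positions from the predicted ones; negative values mean the prediction lies upstream. A minimal sketch using the positions from the table above (the dictionary keys are descriptive labels chosen here):

```python
PREDICTED = {
    "signal peptide": (1, 21),
    "non-cytoplasmic domain": (22, 304),
    "transmembrane region": (305, 329),
    "cytoplasmic domain": (330, 348),
}
ANNOTATED = {
    "signal peptide": (1, 20),
    "non-cytoplasmic domain": (23, 306),
    "transmembrane region": (307, 330),
    "cytoplasmic domain": (331, 348),
}


def boundary_offsets(predicted, annotated):
    """(start, end) differences, predicted minus annotated, for each shared feature."""
    return {
        name: (predicted[name][0] - annotated[name][0], predicted[name][1] - annotated[name][1])
        for name in annotated
        if name in predicted
    }


for name, (d_start, d_end) in boundary_offsets(PREDICTED, ANNOTATED).items():
    print(f"{name}: start {d_start:+d}, end {d_end:+d}")
```

For the transmembrane region this gives -2 and -1, the 1-2 residue upstream shift described above, and the signal peptide end differs by a single residue.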

OCTOPUS and SPOCTOPUS

SignalP

TargetP

Prediction of GO terms

General

HFE is annotated with the following 27 GO terms (source: QuickGO, http://www.ebi.ac.uk/QuickGO/GProtein?ac=Q30201):

GOID GO Term Aspect
GO:0002474 antigen processing and presentation of peptide antigen via MHC class I Process
GO:0005515 protein binding Function
GO:0005737 cytoplasm Component
GO:0005769 early endosome Component
GO:0005886 plasma membrane Component
GO:0005887 integral to plasma membrane Component
GO:0006461 protein complex assembly Process
GO:0006810 transport Process
GO:0006811 ion transport Process
GO:0006826 iron ion transport Process
GO:0006879 cellular iron ion homeostasis Process
GO:0006898 receptor-mediated endocytosis Process
GO:0006955 immune response Process
GO:0007565 female pregnancy Process
GO:0010106 cellular response to iron ion starvation Process
GO:0016020 membrane Component
GO:0016021 integral to membrane Component
GO:0019882 antigen processing and presentation Process
GO:0031410 cytoplasmic vesicle Component
GO:0042446 hormone biosynthetic process Process
GO:0042612 MHC class I protein complex Component
GO:0045177 apical part of cell Component
GO:0045178 basal part of cell Component
GO:0048471 perinuclear region of cytoplasm Component
GO:0055037 recycling endosome Component
GO:0055072 iron ion homeostasis Process
GO:0060586 multicellular organismal iron ion homeostasis Process

GOPET

GOPET predicted 2 GO terms, neither of which overlaps with the annotation.

GOID Aspect Confidence GO Term
GO:0004872 Molecular Function 91% receptor activity
GO:0030106 Molecular Function 88% MHC class I receptor activity
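The statement that the GOPET terms do not overlap with the annotation is a set intersection on GO identifiers. A minimal sketch with the 27 annotated identifiers listed above and the two GOPET predictions; it matches identifiers exactly and ignores the GO hierarchy, so a predicted term that is a parent or child of an annotated term would still count as a miss (a hierarchy-aware comparison would need the GO graph, which is not used here):

```python
ANNOTATED = {
    "GO:" + number
    for number in (
        "0002474 0005515 0005737 0005769 0005886 0005887 0006461 0006810 0006811 "
        "0006826 0006879 0006898 0006955 0007565 0010106 0016020 0016021 0019882 "
        "0031410 0042446 0042612 0045177 0045178 0048471 0055037 0055072 0060586"
    ).split()
}
GOPET = {"GO:0004872": 0.91, "GO:0030106": 0.88}  # GO ID -> GOPET confidence


def compare_go(predicted, annotated):
    """Exact-match overlap, precision and recall of predicted GO IDs against annotated ones."""
    hits = set(predicted) & set(annotated)
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(annotated) if annotated else 0.0
    return hits, precision, recall


hits, precision, recall = compare_go(GOPET, ANNOTATED)
print(len(ANNOTATED), hits, precision, recall)
```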

Pfam

ProtFun 2.2

 Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.011    0.484
 Biosynthesis_of_cofactors            0.105    1.452
 Cell_envelope                     => 0.633   10.377
 Cellular_processes                   0.095    1.297
 Central_intermediary_metabolism      0.231    3.663
 Energy_metabolism                    0.059    0.659
 Fatty_acid_metabolism                0.016    1.265
 Purines_and_pyrimidines              0.583    2.400
 Regulatory_functions                 0.013    0.079
 Replication_and_transcription        0.019    0.073
 Translation                          0.079    1.801
 Transport_and_binding                0.732    1.785

 Enzyme/nonenzyme                     Prob     Odds
 Enzyme                               0.208    0.727
 Nonenzyme                         => 0.792    1.110

 Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.084    0.404
 Transferase    (EC 2.-.-.-)          0.062    0.179
 Hydrolase      (EC 3.-.-.-)          0.135    0.425
 Lyase          (EC 4.-.-.-)          0.049    1.054
 Isomerase      (EC 5.-.-.-)          0.010    0.321
 Ligase         (EC 6.-.-.-)          0.042    0.827

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.201    0.939
 Receptor                             0.353    2.076
 Hormone                              0.002    0.365
 Structural_protein                   0.005    0.190
 Transporter                          0.024    0.219
 Ion_channel                          0.008    0.147
 Voltage-gated_ion_channel            0.002    0.085
 Cation_channel                       0.010    0.221
 Transcription                        0.036    0.283
 Transcription_regulation             0.018    0.147
 Stress_response                      0.274    3.108
 Immune_response                   => 0.381    4.486
 Growth_factor                        0.013    0.943
 Metal_ion_transport                  0.009    0.02
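In the functional-category block, the `=>` marker sits on Cell_envelope, which has the highest odds (10.377) but not the highest probability (Transport_and_binding, 0.732). Reading `=>` as "highest odds within the block" is an inference from this one output rather than documented ProtFun behaviour. A minimal sketch that parses the rows copied from the table above and reports both choices; the function name is hypothetical:

```python
FUNCTIONAL_CATEGORIES = """\
Amino_acid_biosynthesis              0.011    0.484
Biosynthesis_of_cofactors            0.105    1.452
Cell_envelope                     => 0.633   10.377
Cellular_processes                   0.095    1.297
Central_intermediary_metabolism      0.231    3.663
Energy_metabolism                    0.059    0.659
Fatty_acid_metabolism                0.016    1.265
Purines_and_pyrimidines              0.583    2.400
Regulatory_functions                 0.013    0.079
Replication_and_transcription        0.019    0.073
Translation                          0.079    1.801
Transport_and_binding                0.732    1.785
"""


def parse_protfun_block(block):
    """Map category name -> (probability, odds), ignoring the '=>' marker."""
    rows = {}
    for line in block.splitlines():
        fields = line.replace("=>", " ").split()
        if len(fields) == 3:
            name, prob, odds = fields
            rows[name] = (float(prob), float(odds))
    return rows


rows = parse_protfun_block(FUNCTIONAL_CATEGORIES)
best_by_odds = max(rows, key=lambda name: rows[name][1])
best_by_prob = max(rows, key=lambda name: rows[name][0])
print(best_by_odds, best_by_prob)
```

In the other three blocks the same category wins on both measures (Nonenzyme, Immune_response), so only the functional-category block distinguishes the two readings.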