Difference between revisions of "Secondary Structure Prediction BCKDHA"

From Bioinformatikpedia

Revision as of 22:00, 5 June 2011

1. Secondary structure prediction

General Information

The secondary structure of a protein is based on its primary structure and consists of alpha-helices, beta-sheets and coils.

alpha-helices

alpha-helix

Alpha-helices are built by H-bonds between the NH-group of an amino acid and the CO-group of the amino acid located four residues earlier (i+4 to i). This form of helix is the most common one. There are two other helix types which are very rare. One is the 3-10 helix, in which the H-bond connects the NH-group with the CO-group three residues earlier (i+3 to i). The other one is the pi-helix, in which the H-bond connects the NH-group with the CO-group five residues earlier (i+5 to i). The different locations of the CO-group influence the width and the height of the helices.
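
The three H-bond patterns described above can be summarised in a few lines of code. This is only an illustrative sketch: the function name and the dictionary are hypothetical and not part of any prediction tool used on this page.

```python
# Sequence offset n of a backbone H-bond CO(i) ... NH(i+n) and the helix type it defines.
HELIX_BY_OFFSET = {3: "3-10 helix", 4: "alpha-helix", 5: "pi-helix"}


def helix_type(co_residue: int, nh_residue: int) -> str:
    """Return the helix type implied by an H-bond between CO(co_residue) and NH(nh_residue)."""
    offset = nh_residue - co_residue
    return HELIX_BY_OFFSET.get(offset, "no helical H-bond pattern")


# The NH-group of residue 14 bonded to the CO-group of residue 10 is an i+4 bond.
print(helix_type(10, 14))
```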

beta-sheets

beta-sheet

The H-bonds between the CO-group and the NH-group which build a beta-sheet can connect residues that are far away from each other in the sequence.
There are two different kinds of beta-sheets: the parallel one, where all strands point in the same direction, and the anti-parallel one, where neighbouring strands point in alternating directions.

coils

Coils are irregularly formed elements like turns.

PSIPRED

Basic information

author: David T. Jones (University College London)
year: 1998
version: 2

PSIPRED uses feed-forward neural networks with a single hidden layer, trained by back-propagation, to predict the secondary structure. To run PSIPRED locally, the output of PSI-BLAST (Position-Specific Iterated BLAST) is required as input data.
For the online prediction on the server it is enough to enter an amino acid sequence. Evaluated with a very stringent cross-validation method, PSIPRED reaches an average Q3 score of 80.7%, which means that 80.7% of the residues are assigned the correct one of the three states (helix, strand, coil).
The prediction is split into three steps. In the first step a sequence profile is generated: the position-specific scoring matrix from PSI-BLAST serves as input for the neural network. In the next step the secondary structure is predicted. In the last step the output of the secondary structure prediction is filtered.
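
The Q3 score mentioned above is simply the percentage of residues whose three-state label (H, E or C) matches the reference. A minimal sketch of the calculation follows; the function name and the two short example strings are hypothetical and are not taken from the BCKDHA prediction.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("both strings must have the same length")
    if not predicted:
        raise ValueError("empty prediction")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Hypothetical example: 7 of 8 residues agree.
print(q3_score("CHHHHEEC", "CHHHCEEC"))
```

Note that a reference such as the UniProt line on this page leaves coil positions blank, so blanks would have to be converted to "C" before the two strings are compared.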

There are three different options:
- Mask low complexity regions
- Mask transmembrane helices
- Mask coiled-coil regions

References

[PSIPRED Server]
[Overview of prediction methods]
[History of the PSIPRED]

Prediction

Prediction of PSIPRED
Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred      CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC
UniProt                                                     

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred      CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHH
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCHHHHHHHHHHCCCC
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred      CCEEECCCCHHHHHHHCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCC
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred      CCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHH
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHHHHCCCEEEEEECCCCCCCCCCCHHCCCCHHHHHCCCCCCCCCEECC
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred      HHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHH
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHHCCCCHHHHHHHHHHCCCCC HHHHHHHHHHHHHHHHHHHHHHHC
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred      CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH   


PSIPRED predicted 23 coils, 16 alpha-helices and 6 beta-sheets.
The alpha-helices are predicted quite well by PSIPRED. It also predicts many beta-sheets, but most of them are false positives.
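
Counts of predicted elements like the ones above can be obtained by counting uninterrupted runs of the same letter in the Pred line. The following is a sketch under that assumption; the function name is hypothetical, and only the first block of the PSIPRED prediction is used as input.

```python
from collections import Counter
from itertools import groupby


def count_segments(prediction: str) -> Counter:
    """Count uninterrupted runs of each state (H, E, C) in a prediction string."""
    # groupby yields one group per run of identical characters.
    return Counter(state for state, _run in groupby(prediction))


# First 50 residues of the PSIPRED prediction shown above.
first_block = "CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC"
print(count_segments(first_block))
```

To count over the whole protein, the Pred lines of all blocks would be concatenated first, because an element can continue across a block boundary.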

Jpred3

Basic information

author: Cole C, Barber JD & Barton GJ (Bioinformatics and Computational Biology Research, University of Dundee)
year: 1998
version: 3


Jpred uses a neural network to make its predictions. To predict the secondary structure of a protein sequence or of a multiple alignment of protein sequences, the Jnet algorithm is used. The prediction accuracy for secondary structures lies above 81%. Additionally, Jpred predicts the solvent accessibility.
Jpred3 needs a protein sequence or a multiple alignment of protein sequences as input.
The target sequence must be the first sequence in the multiple alignment, since the alignment is modified so that the first sequence does not have any gaps. The alignment has to be in MSF or BLC format.

References

Jpred3 Server
About Jpred
FAQ


Prediction

While predicting the secondary structure of BCKDHA, Jpred found many hits with very good E-values in other proteins.

e-value=0.0
2bew, 2bev, 2beu, 1x80, 1wci, 1u5b, 1olx, 1ols, 1dtw, 1x7y, 1x7z, 1x7x, 1x7w, 2j9f, 2bff, 1v1r, 1olu, 1v16, 1v11, 2bfc, 2bfb, 1v1m, 2bfd, 2bfe

e-value=6e-58
1umd, 1umc, 1umb, 1um9

e-value=1e-57
2bp7, 1qs0, 1w85, 3dva, 1w88


With these hits Jpred ran the prediction:

Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred        HHHHHHHHHHHHHH                 EEE              
Conf      10090009999980000000323546777770000303566666777777
UniProt                                                     

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred                                 EEEEE                HH
Conf      77777777777777654567777777308885377740467787776368
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHH     E      HHHHHHHHHHH
Conf      99999999999999999999875045000001677517899999885278
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred        EEEE    HHHHHHHH  HHHHHHHHH
Conf      84465157745788885065689988740677754577777545677777
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred                   HHHHHHHHHHHH     EEEEEE      HHHHHHHH
Conf      64132147888770367889998750688558887407887468999999
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHH     EEEEEEE                 HHHHHHH   EEEEE
Conf      87500888606888703677777777777764067777005725774078
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred        HHHHHHHHHHHHHHHHH    EEEEEEEEEE              HHH
Conf      74689999999999988507985588886354067777777765553688
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH
Conf      99998468758999999986068866899999999999999999988606
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred          HHHHHHH      HHHHHHHHHHHHHHHH
Conf      887368777523688756899999999999875267777777888
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH   


Comparing the Jpred prediction with the secondary structure of BCKDHA in UniProt shows that the prediction differs considerably from UniProt at the beginning of the sequence but agrees much better in the middle and at the end. Jpred predicts more helices and fewer beta-sheets than the UniProt secondary structure contains.




0 129 small.png

130 257 small.png

257 385 small.png

386 Ende small.png

The first line shows the secondary structure prediction. The red parts stand for the alpha-helices and the green parts for the beta-sheets. Below this line, black bars symbolize the confidence of the prediction: the higher the bar, the higher the confidence of the prediction at this position. The last line shows the confidence again as numbers ranging from 0 to 9, where 0 means that the prediction is very uncertain and 9 means that the prediction is quite certain.
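
The numeric confidence line can be used to pick out the reliably predicted positions. The sketch below assumes a cut-off of 7, which is an illustrative choice and not a Jpred recommendation; the function name is hypothetical.

```python
def confident_positions(conf: str, threshold: int = 7) -> list[int]:
    """Return the 1-based positions whose confidence digit is at least `threshold`."""
    return [
        position
        for position, digit in enumerate(conf, start=1)
        if digit.isdigit() and int(digit) >= threshold
    ]


# Conf line of the first Jpred block shown above (residues 1-50).
conf_block = "10090009999980000000323546777770000303566666777777"
print(confident_positions(conf_block))
```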

DSSP

Basic information

author: Wolfgang Kabsch and Chris Sander (Max-Planck-Institut für medizinische Forschung, Heidelberg)
year: 1983
whole name: Define Secondary Structure of Proteins

Based on atomic coordinates in Protein Data Bank format, DSSP defines the secondary structure of a protein.
With this method the secondary structure is not predicted but determined from the 3D coordinates.
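
DSSP makes its assignments from backbone hydrogen bonds, which it detects with a simple electrostatic energy between a C=O group and an N-H group (Kabsch and Sander, 1983): E = 0.084 * 332 * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol, with an H-bond assumed when E is below -0.5 kcal/mol. The sketch below only illustrates this formula; the function names are hypothetical and the distances in the example are made-up values in Ångström, not measurements from a BCKDHA structure.

```python
def hbond_energy(r_on: float, r_ch: float, r_oh: float, r_cn: float) -> float:
    """Electrostatic H-bond energy in kcal/mol from four interatomic distances (Angstrom)."""
    # 0.084 = 0.42 e * 0.20 e (partial charges), 332 converts to kcal/mol.
    return 0.084 * 332.0 * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(r_on: float, r_ch: float, r_oh: float, r_cn: float) -> bool:
    """Apply the DSSP cut-off of -0.5 kcal/mol."""
    return hbond_energy(r_on, r_ch, r_oh, r_cn) < -0.5


# Made-up geometry resembling a good backbone H-bond.
print(round(hbond_energy(2.9, 3.2, 1.9, 4.1), 2), is_hbond(2.9, 3.2, 1.9, 4.1))
```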


References

[Introduction]
[Explanation]


Prediction

The plot up to position 391 covers only a subpart of the whole sequence: the first 50 amino acids are not shown.
Seq     KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMT
Pred        TT       T        TT T    T  TTT  T 333     HHHHHHHHHHHH
UniProt                            EEEEE                HH

Seq     LLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYP
Pred    HHHHHHHHHHHHHHTTTTT     TT HHHHHHHHHTT TTTSSS  TT HHHHHHTT
UniProt HHHHHHHHHHHHHH     E      HHHHHHHHHHH     EEEE    HHHHHHHH  

Seq     LELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANR
Pred    HHHHHHHHHT TT TTTT T TT    TTTT     TTTTTHHHHHHHHHHHHHHTT
UniProt HHHHHHHHH                                  HHHHHHHHHHHH

Seq     VVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPG
Pred     SSSSSSTT333THHHHHHHHHHHHTT  SSSSSSS TSSTTSS333T TTTTT333T33
UniProt EEEEEE      HHHHHHHHHHHH     EEEEEEE                 HHHHHHH

Seq     YGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYR
Pred    3T SSSSSSTT HHHHHHHHHHHHHHHHHHT  SSSSSS    T TTTT  333T
UniProt    EEEEE    HHHHHHHHHHHHHHHHH    EEEEEEEEEE             

Seq     VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFS
Pred     HHHHHHT HHHHHHHHHHHHTT  HHHHHHHHHHHHHHHHHHHHHHHHT    3333TT
UniProt HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH     HHHHHH
 
Seq     DVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred    TTTTT  HHHHHHHHHHHHHHHHH333T 333
UniProt H      HHHHHHHHHHHHHHHH


1. line: sequence
2. line: structural elements
3. line: if a residue is involved in symmetry contacts it is labeled with a star
4. line: if a residue is solvent accessible it is labeled with an "A"

Letter code for the secondary structure elements:

- H (blue): alpha-helix
- 3 (yellow): residue in isolated beta-bridge
- T (red): hydrogen bonded turn
- S (green): bend



2. Prediction of disordered regions

General information

Disordered regions are long regions which do not have a regular secondary structure. They are dynamically flexible and adopt a regular structure only when they bind to a substrate or another protein. In these regions polar and charged amino acids, and especially proline, are overrepresented. Disordered regions are conserved and occur mainly in regions which have a regulatory function. Since disordered regions have no clear secondary structure, they also have no stable tertiary structure.


DISOPRED

Basic information

author: Jonathan J. Ward, Liam J. McGuffin, Kevin Bryson, Bernard F. Buxton and David T. Jones (University College London)
year: 2004
version: 2

DISOPRED2 defines disordered residues as those which appear in the sequence records but have no coordinates in the electron density map. This is a very simple definition of disorder, and an imperfect one, because the absence of coordinates can also be explained by artifacts of the crystallization process.
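
The "missing coordinates" idea can be illustrated as a comparison between the positions of the full sequence and the positions that actually have coordinates. This is only a sketch of the definition, not DISOPRED code; the function name and the example numbers are hypothetical.

```python
def missing_residues(sequence_length: int, observed_positions: set[int]) -> list[int]:
    """Positions (1-based) that are in the sequence record but have no coordinates."""
    return [
        position
        for position in range(1, sequence_length + 1)
        if position not in observed_positions
    ]


# Hypothetical 10-residue protein in which only residues 3-8 were resolved.
print(missing_residues(10, set(range(3, 9))))
```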

References

Publication
DISOPRED server
Information

Prediction

DisopredOutseq.png
Disopredplot.png

The first line gives the confidence of the prediction which is shown in the second line. A residue predicted to be disordered is marked with an asterisk (*). All of the disordered regions are predicted with a very high confidence.
DISOPRED predicts disordered regions at the beginning and at the end of BCKDHA, as the red fields in the left picture show.
The plot on the right side also shows that the disordered regions lie at the beginning and at the end, since the highest peaks are at these two ends.

POODLE

Basic information

author:
- POODLE-L S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi
- POODLE-S K. Shimizu, Y. Muraoka, S. Hirose, and T. Noguchi
- POODLE-W K. Shimizu, Y. Muraoka, S. Hirose, K. Tomii and T. Noguchi
- POODLE-I S. Hirose, K. Shimizu, N. Inoue, S. Kanai and T. Noguchi

year:
- POODLE-L 2007
- POODLE-S 2007
- POODLE-W 2007
- POODLE-I 2008

POODLE uses machine learning approaches to predict the disordered regions of an amino acid sequence.
There are several different options which can be chosen:

POODLE-L: This tool searches for disordered regions which are longer than 40 consecutive amino acids.
POODLE-S: Here the focus lies on predicting short disordered regions. There are two different subtools: "Missing residues" and "High B-factor residues".
POODLE-W: With this option proteins which are mostly disordered can be found.
POODLE-I: This tool combines the other three tools. POODLE-I also uses structural information to predict disordered regions and is based on a work-flow approach.


References

[POODLE-L]
[POODLE-S]
[POODLE-W]
[POODLE-I]
[POODLE server]
[Help]


Prediction

POODLE-S

POODLE-S (Missing residues)
POODLE-S (High B-factor residues)


POODLE-S (Missing residues):


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
M A V A I A A A R V W R L N R G L S Q A A L L L L R Q P G A R G L A R S H P P R Q Q Q Q F S S L D D K P Q F P G
0.585 0.601 0.624 0.69 0.753 0.809 0.798 0.748 0.679 0.595 0.526 0.55 0.59 0.604 0.679 0.754 0.783 0.817 0.849 0.826 0.799 0.779 0.782 0.763 0.748 0.722 0.714 0.668 0.661 0.691 0.724 0.754 0.799 0.841 0.862 0.88 0.885 0.892 0.89 0.892 0.897 0.892 0.91 0.926 0.913 0.908 0.888 0.829 0.77 0.715 0.691 0.652 0.616 0.586 0.577 0.512

341 342 343 344 345
D S S A Y
0.562 0.6 0.615 0.597 0.501

420 421 422 423
L R K Q
0.565 0.594 0.557 0.525


POODLE-S (which predicts short disordered regions) with the option "Missing residues" predicted disordered regions at the positions 1-56, 341-345 and 420-423. This is also shown in the plot above.
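
Region boundaries such as these can be derived from per-residue scores by keeping every run of residues at or above a cut-off. The sketch below assumes a cut-off of 0.5, which matches the listed scores but is stated here as an assumption; the function name is hypothetical, and the two flanking scores below 0.5 in the example are invented to show where the region ends.

```python
def disordered_regions(scores, first_position=1, threshold=0.5):
    """Return (start, end) pairs of consecutive positions with score >= threshold."""
    regions = []
    start = None
    for offset, score in enumerate(scores):
        position = first_position + offset
        if score >= threshold:
            if start is None:
                start = position          # a new region opens here
        elif start is not None:
            regions.append((start, position - 1))  # region closed by a low score
            start = None
    if start is not None:                 # region runs to the end of the input
        regions.append((start, first_position + len(scores) - 1))
    return regions


# Scores for positions 341-345 from the table above, flanked by two invented low scores.
scores_340_346 = [0.41, 0.562, 0.6, 0.615, 0.597, 0.501, 0.48]
print(disordered_regions(scores_340_346, first_position=340))
```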

POODLE-S (High B-Factor residues):
6 7 8 9
A A A R
0.618 0.664 0.634 0.609

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
R G L S Q A A L L L L R Q P G A R G L A R S H P P R Q Q Q Q F S S L D D K P Q F P G A
0.544 0.647 0.669 0.716 0.762 0.791 0.777 0.801 0.8 0.799 0.786 0.782 0.744 0.738 0.753 0.797 0.812 0.875 0.898 0.907 0.907 0.889 0.865 0.849 0.816 0.811 0.843 0.867 0.889 0.916 0.909 0.894 0.858 0.805 0.745 0.689 0.634 0.619 0.583 0.594 0.588 0.552 0.525

93
E
0.529

95 96
P H
0.542 0.549

340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
D D S S A Y R S V D E V N Y W
0.501 0.607 0.663 0.73 0.764 0.746 0.763 0.768 0.769 0.746 0.731 0.711 0.66 0.594 0.549

379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402
E K A W R K Q S R R K V M E A F E Q A E R K P K
0.546 0.577 0.559 0.571 0.63 0.601 0.502 0.517 0.536 0.518 0.504 0.577 0.572 0.568 0.574 0.607 0.622 0.658 0.719 0.74 0.706 0.668 0.642 0.548


POODLE-S (which predicts short disordered regions) with the option "High B-Factor residues" predicted disordered regions at the positions 6-9, 15-57, 93, 95-96, 340-354 and 379-402. This is also shown in the plot above.


POODLE-L

POODLE L BCKDHA.png

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
M A V A I A A A R V W R L N R G L S Q A A L L L L R Q P G A R G L A R S H P P R Q Q Q Q F S S L
0.516 0.518 0.517 0.521 0.526 0.538 0.543 0.55 0.562 0.574 0.58 0.587 0.594 0.606 0.613 0.618 0.622 0.626 0.632 0.642 0.652 0.666 0.674 0.68 0.682 0.684 0.685 0.683 0.679 0.675 0.672 0.668 0.663 0.657 0.648 0.642 0.637 0.634 0.628 0.619 0.61 0.601 0.588 0.575 0.558 0.542 0.521 0.598

369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428
L S Q G W W D E E Q E K A W R K Q S R R K V M E A F E Q A E R K P K P N P N L L F S D V Y Q E M P A Q L R K Q Q E S L A
0.365 0.549 0.572 0.591 0.615 0.637 0.656 0.671 0.685 0.698 0.711 0.725 0.737 0.746 0.753 0.756 0.757 0.76 0.763 0.764 0.764 0.763 0.761 0.761 0.762 0.763 0.762 0.759 0.754 0.75 0.747 0.745 0.742 0.738 0.733 0.723 0.712 0.698 0.687 0.676 0.67 0.666 0.669 0.672 0.67 0.665 0.656 0.65 0.64 0.63 0.619 0.614 0.61 0.605 0.592 0.576 0.558 0.54 0.521 0.436


POODLE-L predicts two disordered regions which are longer than 40 amino acids. They are located at the positions 1-48 and 369-428.

POODLE-W


Regions which might be disordered, but for which POODLE is not certain, are bordered by blue squares. The disordered regions are bordered by red squares.

0=ordered regions
5=perhaps disordered regions
9=disordered regions


POODLE-I

POODLE I BCKDHA.png


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
M A V A I A A A R V W R L N R G L S Q A A L L L L R Q P G A R G L A R S H P P R Q Q Q Q F S S L D D K P Q F P G
0.516 0.518 0.517 0.521 0.526 0.538 0.543 0.55 0.562 0.574 0.58 0.587 0.594 0.606 0.613 0.618 0.622 0.626 0.632 0.642 0.652 0.666 0.674 0.68 0.682 0.684 0.685 0.683 0.679 0.675 0.672 0.668 0.663 0.657 0.648 0.642 0.637 0.634 0.628 0.619 0.61 0.601 0.588 0.575 0.558 0.542 0.521 0.598 0.661 0.725 0.686 0.637 0.602 0.577 0.57 0.534

341 342 343 344 345
D S S A Y
0.544 0.592 0.604 0.571 0.503

370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427
S Q G W W D E E Q E K A W R K Q S R R K V M E A F E Q A E R K P K P N P N L L F S D V Y Q E M P A Q L R K Q Q E S L
0.549 0.572 0.591 0.615 0.637 0.656 0.671 0.685 0.698 0.711 0.725 0.737 0.746 0.753 0.756 0.757 0.76 0.763 0.764 0.764 0.763 0.761 0.761 0.762 0.763 0.762 0.759 0.754 0.75 0.747 0.745 0.742 0.738 0.733 0.723 0.712 0.698 0.687 0.676 0.67 0.666 0.669 0.672 0.67 0.665 0.656 0.65 0.64 0.63 0.619 0.614 0.61 0.605 0.592 0.576 0.558 0.54 0.521

443 444 445
F D K
0.606 0.742 0.881


POODLE-I predicted the disordered regions between the positions 1-56, 341-345, 370-427 and 443-445.


Comparison

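{|border="1" style="text-align:left; border-spacing:0;"
!POODLE-S (Missing residues)
!POODLE-S (High B-factor residues)
!POODLE-L
!POODLE-W
!POODLE-I
|-
|1-56||6-9||1-48||325-345||1-56
|-
|341-345||15-57||369-428|| ||341-345
|-
|420-423||93|| || ||370-427
|-
| ||95-96|| || ||443-445
|-
| ||340-354|| || ||
|-
| ||379-402|| || ||
|}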
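
One way to read this comparison is to ask which positions are called disordered by several POODLE variants at once. The sketch below counts, for every position, how many of the five result sets cover it and merges the positions that reach a minimum number of votes. The function name and the choice of four votes are illustrative assumptions; the regions are the ones listed in the comparison.

```python
from collections import Counter

PREDICTIONS = {
    "POODLE-S (Missing residues)": [(1, 56), (341, 345), (420, 423)],
    "POODLE-S (High B-factor residues)": [(6, 9), (15, 57), (93, 93), (95, 96), (340, 354), (379, 402)],
    "POODLE-L": [(1, 48), (369, 428)],
    "POODLE-W": [(325, 345)],
    "POODLE-I": [(1, 56), (341, 345), (370, 427), (443, 445)],
}


def consensus_regions(predictions, min_tools):
    """Merge all positions predicted as disordered by at least `min_tools` tools."""
    votes = Counter()
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        votes.update(covered)  # each tool votes at most once per position
    positions = sorted(p for p, n in votes.items() if n >= min_tools)
    merged = []
    for p in positions:
        if merged and p == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], p)   # extend the current region
        else:
            merged.append((p, p))             # start a new region
    return merged


print(consensus_regions(PREDICTIONS, min_tools=4))
```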




IUPred

Basic information

author: Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon
year: 2005


IUPred predicts disordered regions by estimating the capacity of polypeptides to form stabilizing contacts. The potential to form these contacts depends on the surrounding sequence and on the chemical properties. This approach is based on the idea that disordered regions have no capacity to form sufficient interresidue interactions so that there is no stabilizing energy.


There are three different prediction types which can be chosen:
- long disorder
- short disorder
- structured regions


References

[IUPred server]
[Theory]


Prediction

Prediction type: long disorder

Long.png

33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
L A R S H P P R Q Q Q Q F S S L D D
0.6043 0.5758 0.6851 0.7881 0.6851 0.6906 0.6661 0.6661 0.7415 0.7505 0.6136 0.7629 0.7982 0.7595 0.7595 0.7163 0.6948 0.5211

89 90 91 92 93
I N P S E
0.5254 0.6427 0.5493 0.5382 0.5951

385 386 387 388
Q S R R
0.5456 0.5176 0.5176 0.5017

390 391 392 393 394 395 396 397
V M E A F E Q A
0.5017 0.5017 0.5533 0.7209 0.7547 0.7755 0.6851 0.5992

399 400 401
R K P
0.5017 0.5176 0.5211

404 405 406 407 408 409 410 411 412 413
N P N L L F S D V Y
0.5055 0.5807 0.6089 0.5707 0.6136 0.5176 0.5176 0.5176 0.5017 0.5176

420 421 422
L R K
0.5098 0.5254 0.5176

424 425 426 427 428
Q E S L A
0.5951 0.5854 0.5807 0.5296 0.5296

431
L
0.5533


The long disorder tool of IUPred predicts several disordered regions. They are located at the positions 33-50, 89-93, 385-388, 390-397, 399-401, 404-413, 420-422, 424-428 and 431.


Detailed sequence with disordered region probability: File:LongSeqOut.pdf

Prediction type: short disorder

Short.png

1
M
0.5623

33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
L A R S H P P R Q Q Q Q F S S L D D K P Q F P
0.5846 0.6756 0.7605 0.7688 0.7688 0.7688 0.6756 0.6827 0.7275 0.7232 0.7501 0.8311 0.7869 0.8158 0.8200 0.7817 0.7458 0.6789 0.6827 0.6035 0.5173 0.5253 0.5008

92 93
S E
0.5711 0.5473

393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411
A F E Q A E R K P K P N P N L L F S D
0.5514 0.5900 0.5992 0.6174 0.6293 0.5941 0.5667 0.5084 0.6124 0.5549 0.5008 0.5667 0.5802 0.5296 0.5802 0.5623 0.5846 0.5253

415
E
0.5008

420 421
L R
0.5296 0.5253

423 424 425
Q Q E
0.5126 0.5711 0.5008

427 428
L A
0.5374 0.5126

433
T
0.5084

438 439 440 441 442 443 444 445
Y P L D H F D K
0.5374 0.6035 0.6442 0.6827 0.7951 0.8158 0.8556 0.9257


The short disorder tool of IUPred predicts several disordered regions. They are located at the positions 1, 33-55, 92-93, 393-411, 415, 420-421, 423-425, 427-428, 433 and 438-445.


Detailed sequence with disordered region probability: File:ShortSeqOut.pdf

Prediction type: structured regions

Structural.png


With the option "structured regions" no disordered regions were predicted.
Only the line "Unknown globular domains: 1-445" appeared.


3. Prediction of transmembrane alpha-helices and signal peptides

General

Transmembrane Topology

The prediction of the membrane topology of proteins aims at discovering which portions of the protein lie within the lipid bilayer of a membrane and which portions protrude from the membrane into the aqueous environment. Membrane-spanning polypeptides usually form helices of about 20 amino acids in length. As the interior of the membrane is hydrophobic, the membrane-spanning part of the protein consists of hydrophobic amino acids as well. This information can be used for the prediction of transmembrane helices, which subsequently enables the prediction of the membrane topology. <ref> http://en.wikipedia.org/wiki/Membrane_topology</ref><ref>http://en.wikipedia.org/wiki/Transmembrane_domain</ref>
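
The idea that a stretch of about 20 hydrophobic residues indicates a membrane-spanning helix can be illustrated with a sliding-window hydropathy scan. This is not the method used by TMHMM, OCTOPUS or SPOCTOPUS; it is a much simpler sketch based on the Kyte-Doolittle hydropathy scale. The window of 19 residues and the cut-off of 1.6 are commonly quoted values for this scale and are assumptions here, as are the function name and the artificial test sequence.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (start, end, mean hydropathy) for every window at or above the threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Unknown letters (e.g. B, Z, X) are scored as 0.0 in this sketch.
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            hits.append((start + 1, start + window, round(mean, 2)))
    return hits


# Artificial sequence: 19 leucines embedded in aspartates.
toy_sequence = "D" * 10 + "L" * 19 + "D" * 10
print(hydrophobic_windows(toy_sequence)[:3])
```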

Prediction tools: TMHMM, OCTOPUS and SPOCTOPUS

Signal Peptides

Signal peptides are N-terminal sequence motifs directing proteins to their cellular destination, such as the secretory pathway, mitochondria or chloroplasts. One example of a signal peptide is the secretory signal peptide (SP), which is an N-terminal peptide that is typically 15-30 amino acids long. A signal peptide has three regions: an N-terminal region (n-region) which often consists of positively charged residues, a hydrophobic region (h-region) of at least six residues in the middle, and a C-terminal region (c-region) of polar uncharged residues. In eukaryotes the SP directs proteins across the membrane of the endoplasmic reticulum, in prokaryotes across the plasma membrane. The SP is cleaved off when the protein crosses the membrane.
Furthermore, there are chloroplast transit peptides (cTP), which are also N-terminal and are cleaved off when the protein enters the chloroplast. The most conserved site in cTPs is an alanine directly after the N-terminal methionine... <ref>O. Emanuelsson, S. Brunak, G. von Heijne, H. Nielsen, "Locating proteins in the cell using TargetP, SignalP and related tools", Nature Protocols, 2007</ref>

Prediction tools: SignalP, TargetP

Combined transmembrane and signal peptide prediction

As the hydrophobic regions of a transmembrane helix and a signal peptide are highly similar, the two types of prediction are easily confused with each other: a signal peptide can be predicted as a transmembrane helix and vice versa. <ref>http://www.ebi.ac.uk/Tools/phobius/help.html</ref>

Prediction tools: Phobius and Polyphobius

In the following section different tools for predicting transmembrane helices and signal peptides are tested. As the BCKDHA protein isn't a transmembrane protein, additional proteins were used for the transmembrane and signal peptide analysis:

{|border="1" style="text-align:left; border-spacing:0;"
!name
!organism
!location
!transmembrane protein
!signal peptide
!function
!reference
|-
|A4_HUMAN||Human||Cell membrane||yes||yes||Protease Inhibitor||P05067
|-
|BACR_HALSA||Halobacterium salinarium||Cell membrane||yes||no||ion transport||P02945
|-
|INSL5_HUMAN||Human||extracellular region||no||yes||hormone||Q9Y5Q6
|-
|LAMP1_HUMAN||Human||Cell membrane, Lysosome membrane, Endosome membrane||yes||yes||Presents carbohydrate ligands to selectins||P11279
|-
|RET4_HUMAN||Human||extracellular space||no||yes||Transport||P02753
|}

TMHMM

Method

  • TMHMM was developed by Sonnhammer, von Heijne and Krogh in 1998 <ref> E.L. Sonnhammer, G. von Heijne and A. Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences, Proc Int Conf Intell Syst Mol Biol.(1998)</ref>
  • TMHMM predicts transmembrane helices in proteins.
  • TMHMM is a membrane topology prediction method based on a hidden Markov model.

Execution

Before we could execute TMHMM we had to change all occurrences of "/usr/local/bin/" to "/usr/bin" in the following files: tmhmm, tmhmm.ORIG and tmhmmformat.pl

To execute the program we used these commands:

  • tmhmm P05067.fasta > tmhmm_out_P05067.txt
  • tmhmm P02945.fasta > tmhmm_out_P02945.txt
  • tmhmm Q9Y5Q6.fasta > tmhmm_out_Q9Y5Q6.txt
  • tmhmm P11279.fasta > tmhmm_out_P11279.txt
  • tmhmm P02753.fasta > tmhmm_out_P02753.txt
  • tmhmm P12694.fasta > tmhmm_out_P12694.txt
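
The six calls can also be issued in a loop. The sketch below assumes that the `tmhmm` executable is on the PATH, that the FASTA files are named after their accession numbers, and that the output files follow the pattern tmhmm_out_<accession>.txt; the function names are hypothetical.

```python
import subprocess

ACCESSIONS = ["P05067", "P02945", "Q9Y5Q6", "P11279", "P02753", "P12694"]


def tmhmm_job(accession):
    """Return the command and the output file name for one protein."""
    return ["tmhmm", f"{accession}.fasta"], f"tmhmm_out_{accession}.txt"


def run_all(accessions=ACCESSIONS):
    """Run tmhmm once per accession and redirect its standard output to a file."""
    for accession in accessions:
        command, out_path = tmhmm_job(accession)
        with open(out_path, "w") as out_file:
            subprocess.run(command, stdout=out_file, check=True)


# Call run_all() in the directory that contains the FASTA files.
```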

Results

BCKDHA

Position Membrane topology
1-445 outside

TMHMM predicted no membrane spanning region for the BCKDHA protein, which corresponds to the information provided in Uniprot.

Membrane topology of A4_HUMAN (source: Uniprot)

A4_HUMAN

Position Membrane topology
1-700 outside
701-723 TMhelix
724-770 inside

TMHMM predicted one transmembrane helix for A4_HUMAN, which agrees with the Uniprot annotation. The predicted transmembrane helix begins at position 701 in the protein, whereas Uniprot states that the transmembrane region spans positions 700-723. The extracellular region reported by Uniprot begins only at position 18 in the sequence, because of the signal peptide. TMHMM does not include a signal peptide prediction and therefore predicted the extracellular region for positions 1-700.

Membrane topology of BACR_HALSA (source: Uniprot)

BACR_HALSA

Position Membrane topology
1-22 outside
23-42 TMhelix
43-54 inside
55-77 TMhelix
78-91 outside
92-114 TMhelix
115-120 inside
121-143 TMhelix
144-147 outside
148-170 TMhelix
171-189 inside
190-212 TMhelix
213-262 outside

The TMHMM prediction differs slightly from the information provided in Uniprot. TMHMM predicted only 13 different domains of the protein, six of them transmembrane helices (the end of the protein is predicted to be in the extracellular space), whereas Uniprot reports 15 domains (the protein ends in the cytoplasm).

INSL5_HUMAN

Membrane topology of INSL5_HUMAN (source: Uniprot)
Position Membrane topology
1-135 outside

The TMHMM prediction agrees with the fact that INSL5_HUMAN is a hormone and is therefore secreted into the extracellular region.

LAMP1_HUMAN

Membrane topology of LAMP1_HUMAN (source: Uniprot)
Position Membrane topology
1-10 inside
11-33 TMhelix
34-383 outside
384-406 TMhelix
407-417 inside

The prediction for LAMP1_HUMAN made by TMHMM agrees only partially with the Uniprot annotation. The parts of the sequence which form the signal peptide and the lumenal domain are predicted to be an additional transmembrane helix and an extracellular domain. The second transmembrane helix is predicted correctly.

RET4_HUMAN

Position Membrane topology
1-201 outside

The TMHMM prediction for RET4_HUMAN is correct, as RET4_HUMAN is a secreted protein and does not span any membrane.

Phobius and Polyphobius

Methods

  • Phobius was developed by Käll et al <ref>Käll et al., "A Combined Transmembrane Topology and Signal Peptide Prediction Method", Journal of Mol. Biology,338(5):1027-1036, 2004 </ref>
  • combined prediction of transmembrane regions and signal peptids
  • Required input information: only sequence in FASTA-Format (20 amino acids and B, Z, X are recognized)
  • As transmembrane topology and signal peptides are likely to be conserved during evolution, Polyphobius was established <ref>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information", Bioinformatics, 21 (Suppl 1):i251-i257, 2005</ref>, which includes information from homologous sequences to the query.
  • Required input (two options): a query sequence in FASTA format, which is then searched with BLAST against uniprot_trembl, or an uploaded alignment in FASTA format that provides information about homologs.

Results

A4_HUMAN
Phobius Polyphobius
BCKDHA Phobius A4 HUMAN.png
sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 1 N-REGION
REGION 2 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 3 N-REGION
REGION 4 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

BCKDHA Polyphobius A4 HUMAN.png

Phobius and Polyphobius predicted the signal peptide and membrane topology for A4_HUMAN correctly.

BACR_HALSA
Phobius Polyphobius
BCKDHA Phobius BACR HALSA.png
sp|P02945|BACR_HALSA
TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TOPO_DOM 43 53 CYTOPLASMIC.
TRANSMEM 54 76
TOPO_DOM 77 95 NON CYTOPLASMIC.
TRANSMEM 96 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 142
TOPO_DOM 143 147 NON CYTOPLASMIC.
TRANSMEM 148 169
TOPO_DOM 170 189 CYTOPLASMIC.
TRANSMEM 190 212
TOPO_DOM 213 217 NON CYTOPLASMIC.
TRANSMEM 218 237
TOPO_DOM 238 262 CYTOPLASMIC.

sp|P02945|BACR_HALSA
TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TOPO_DOM 44 54 CYTOPLASMIC.
TRANSMEM 55 77
TOPO_DOM 78 94 NON CYTOPLASMIC.
TRANSMEM 95 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 141
TOPO_DOM 142 147 NON CYTOPLASMIC.
TRANSMEM 148 166
TOPO_DOM 167 186 CYTOPLASMIC.
TRANSMEM 187 205
TOPO_DOM 206 215 NON CYTOPLASMIC.
TRANSMEM 216 237
TOPO_DOM 238 262 CYTOPLASMIC.

BCKDHA Polyphobius BACR HALSA.png

The predictions of Phobius and Polyphobius differ only slightly in the lengths of the individual domains. Both predictions of the membrane topology are correct.
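To quantify how far the two predictions differ, the helix boundaries can be compared pairwise. The following is a minimal sketch under the assumption that both tools report the same number of helices in the same order; the helper name `transmem_segments` is illustrative, and the input lines are copied from the two BACR_HALSA outputs above.

```python
PHOBIUS = """\
TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TOPO_DOM 43 53 CYTOPLASMIC.
TRANSMEM 54 76
TOPO_DOM 77 95 NON CYTOPLASMIC.
TRANSMEM 96 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 142
TOPO_DOM 143 147 NON CYTOPLASMIC.
TRANSMEM 148 169
TOPO_DOM 170 189 CYTOPLASMIC.
TRANSMEM 190 212
TOPO_DOM 213 217 NON CYTOPLASMIC.
TRANSMEM 218 237
TOPO_DOM 238 262 CYTOPLASMIC.
"""

POLYPHOBIUS = """\
TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TOPO_DOM 44 54 CYTOPLASMIC.
TRANSMEM 55 77
TOPO_DOM 78 94 NON CYTOPLASMIC.
TRANSMEM 95 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 141
TOPO_DOM 142 147 NON CYTOPLASMIC.
TRANSMEM 148 166
TOPO_DOM 167 186 CYTOPLASMIC.
TRANSMEM 187 205
TOPO_DOM 206 215 NON CYTOPLASMIC.
TRANSMEM 216 237
TOPO_DOM 238 262 CYTOPLASMIC.
"""


def transmem_segments(text):
    """Collect (start, end) for every TRANSMEM line."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TRANSMEM":
            segments.append((int(fields[1]), int(fields[2])))
    return segments


pho = transmem_segments(PHOBIUS)
poly = transmem_segments(POLYPHOBIUS)

# Boundary shift per helix: (start difference, end difference).
shifts = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pho, poly)]
max_shift = max(max(abs(ds), abs(de)) for ds, de in shifts)

for number, (ds, de) in enumerate(shifts, start=1):
    print(f"helix {number}: start {ds:+d}, end {de:+d}")
print("largest boundary shift:", max_shift, "residues")
```

Both tools report seven helices. Most boundaries move by at most three residues; the largest shift is at the end of the sixth helix (212 versus 205).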


INSL5_HUMAN
Phobius Polyphobius
BCKDHA Phobius INSL5 HUMAN.png
sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 5 N-REGION
REGION 6 17 H-REGION
REGION 18 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 4 N-REGION
REGION 5 16 H-REGION
REGION 17 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

BCKDHA Polyphobius INSL5 HUMAN.png

The Phobius and Polyphobius predictions for INSL5_HUMAN agree with the information given on Uniprot. Both correctly predicted a signal peptide followed by a single extracellular region.

LAMP1_HUMAN
Phobius Polyphobius
BCKDHA Phobius LAMP1 HUMAN.png
sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 10 N-REGION
REGION 11 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 9 N-REGION
REGION 10 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

BCKDHA Polyphobius LAMP1 HUMAN.png

The signal peptide and membrane topology predictions made by Phobius and Polyphobius for LAMP1_HUMAN are correct.

RET4_HUMAN
Phobius Polyphobius
BCKDHA Phobius RET4 HUMAN.png
sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 2 N-REGION
REGION 3 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 3 N-REGION
REGION 4 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

BCKDHA Polyphobius RET4 HUMAN.png

The signal peptide of RET4_HUMAN was predicted correctly by Phobius and Polyphobius, as well as the one extracellular region of the protein.

For the BCKDHA protein, Phobius predicted a signal peptide at the beginning of the sequence with about 90% probability. The predicted signal peptide is 34 amino acids long. This is broadly consistent with Uniprot, which lists a 45-amino-acid signal peptide for transfer into the mitochondrion, although the predicted peptide is 11 residues shorter. The rest of the sequence is predicted to be non-cytoplasmic. No part of the protein is predicted to span a membrane, which is consistent with Uniprot's annotation of BCKDHA as a protein located in the mitochondrial matrix.

BCKDHA
Phobius Polyphobius
Phobius BCKDHA.png
sp|P12694|ODBA_HUMAN (BCKDHA)
Signal 1 34
Region 1 16 N-Region
Region 17 25 H-Region
Region 26 34 C-Region
TOPO_DOM 35 445 non cytoplasmic

ODBA_HUMAN (BCKDHA)
TOPO_DOM 1 445 Non cytoplasmic

BCKDHA Polyphobius BCKDHA.png

Considering the information given on Uniprot, Polyphobius performed worse than Phobius on the BCKDHA protein sequence. It predicted no signal sequence at the beginning of the protein. There is a low probability that positions 1-45 form a signal sequence, but overall the whole sequence is predicted to be non-cytoplasmic.

OCTOPUS and SPOCTOPUS

Methods

  • OCTOPUS was developed by Viklund and Elofsson in 2008 <ref>Håkan Viklund and Arne Elofsson, "Improving topology prediction by two-track ANN-based preference scores and an extended topological grammar", Bioinformatics (2008)</ref>
  • OCTOPUS (obtainer of correct topologies for uncharacterized sequences) uses a combination of hidden Markov models and artificial neural networks.
  • It creates a sequence profile by running a BLAST search to obtain homologous sequences. The profile is used as input for a neural network that predicts, for each residue, the probability of being located in a transmembrane (M), interface (I), close loop (L), or globular loop (G) environment, as well as the preference for being inside (i) or outside (o) of the membrane. A hidden Markov model is then used to calculate the most likely protein topology.
  • Required input: Protein Sequence in FASTA-Format
  • SPOCTOPUS (Viklund et al., 2008<ref>Viklund et al., "A combined predictor of signal peptides and membrane protein topology", Bioinformatics (2008)</ref>) is an extension of OCTOPUS which also predicts signal peptides. A neural network is used to predict a signal peptide preference score. The signal peptide's location is determined by a hidden Markov model. The output contains the information retrieved by OCTOPUS as well as the probability that a residue is located N-terminal of a signal peptide (n) or within a signal peptide (S).
  • Required input information: Protein sequence in FASTA-Format
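The idea of decoding a topology from per-residue preference scores can be illustrated with a toy Viterbi search. This is only a sketch and not the OCTOPUS model: the four-state grammar (`i`, `Mio`, `o`, `Moi`), the uniform transitions, and the example scores are all assumptions made for illustration. The grammar only enforces that a membrane segment entered from the inside must exit to the outside and vice versa.

```python
import math

# Hypothetical grammar: allowed successor states for each state.
# "Mio" is a helix crossing from inside to outside, "Moi" the reverse.
NEXT = {"i": ("i", "Mio"), "Mio": ("Mio", "o"), "o": ("o", "Moi"), "Moi": ("Moi", "i")}
LABEL = {"i": "i", "Mio": "M", "o": "o", "Moi": "M"}


def viterbi(scores):
    """Return the best label string for per-residue scores.

    scores: list of dicts with positive values for the keys 'i', 'M', 'o'.
    """
    states = list(NEXT)
    best = {s: math.log(scores[0][LABEL[s]]) for s in states}
    back = []
    for row in scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor among the states that may precede s.
            prev = max((p for p in states if s in NEXT[p]), key=lambda p: best[p])
            new_best[s] = best[prev] + math.log(row[LABEL[s]])
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))


def row(i, m, o):
    return {"i": i, "M": m, "o": o}


# Made-up scores: one residue inside the helix weakly prefers 'i'.
toy = ([row(0.8, 0.1, 0.1)] * 3 + [row(0.1, 0.8, 0.1)] * 2 + [row(0.5, 0.4, 0.1)]
       + [row(0.1, 0.8, 0.1)] * 2 + [row(0.1, 0.1, 0.8)] * 3)

greedy = "".join(max(r, key=r.get) for r in toy)
decoded = viterbi(toy)
print("greedy :", greedy)
print("viterbi:", decoded)
```

Labelling each residue independently breaks the helix at the noisy position, whereas the grammar-constrained decoding keeps one continuous membrane segment. This is the reason a hidden Markov model is placed on top of the network output.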

Results

A4_HUMAN
OCTOPUS BCKDHA Octopus A4 HUMAN small.png
SPOCTOPUS BCKDHA Spoctopus A4 HUMAN small.png

OCTOPUS and SPOCTOPUS both predicted the membrane topology for A4_HUMAN. SPOCTOPUS also detected the signal peptide.

OCTOPUS BCKDHA Octopus BACR HALSA small.png
SPOCTOPUS BCKDHA Spoctopus BACR HALSA small.png

The predictions made by OCTOPUS and SPOCTOPUS for BACR_HALSA are identical and correct. No signal peptide was predicted, which agrees with the information given in Uniprot.

OCTOPUS BCKDHA Octopus INSL5 HUMAN small.png
SPOCTOPUS BCKDHA Spoctopus INSL5 HUMAN small.png

The signal peptide of INSL5_HUMAN was predicted correctly by SPOCTOPUS. For the same part of the sequence, OCTOPUS instead predicted a cytoplasmic and a transmembrane domain.

OCTOPUS BCKDHA Octopus LAMP1 HUMAN small.png
SPOCTOPUS BCKDHA Spoctopus LAMP1 HUMAN small.png

The SPOCTOPUS prediction for LAMP1_HUMAN agrees with the Uniprot annotation. OCTOPUS had difficulties at the beginning of the sequence and predicted an additional inside region and transmembrane helix where the sequence contains a signal peptide.

OCTOPUS BCKDHA Octopus RET4 HUMAN small.png
SPOCTOPUS BCKDHA Spoctopus RET4 HUMAN small.png

As with the LAMP1_HUMAN prediction, OCTOPUS made a wrong prediction for the part of the sequence which contains a signal peptide. The SPOCTOPUS prediction is correct.

OCTOPUS BCKDHA Octopus BCKDHA small.png
SPOCTOPUS BCKDHA Spoctopus BCKDHA small.png

The OCTOPUS and SPOCTOPUS predictions for the BCKDHA protein are completely contrary with respect to the intracellular and extracellular regions. However, both predictions are wrong, as BCKDHA is not a membrane protein. Furthermore, SPOCTOPUS missed the 45-amino-acid signal peptide at the beginning of the sequence.

SignalP

Method

  • SignalP was established by Nielsen et al. in 1997<ref>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", Protein Engineering, 10:1-6, 1997</ref>
  • SignalP is neural network based. It identifies signal peptides and cleavage sites.


Execution

To run the command-line SignalP tool, the path in the SignalP file had to be adapted to /apps/signalp-3.0.

The following commands were used to execute SignalP:

  • signalp -t euk P05067.fasta > signalp_out_P05067.txt
  • signalp -t gram- P02945.fasta > signalp_out_P02945.txt
  • signalp -t euk Q9Y5Q6.fasta > signalp_out_Q9Y5Q6.txt
  • signalp -t euk P11279.fasta > signalp_out_P11279.txt
  • signalp -t euk P02753.fasta > signalp_out_P02753.txt
  • signalp -t euk P12694.fasta > signalp_out_P12694.txt
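The six calls above follow one pattern, so they can be generated from a table of accessions. The following is a minimal sketch: the helper names `signalp_argv`, `output_name`, and `run_all` are illustrative assumptions, and only the `-t` option and file names shown in the commands above are used. `run_all` is defined but not called here, because it requires `signalp` to be installed and on the PATH.

```python
import subprocess

# Accession -> value of the -t option, copied from the commands listed above.
RUNS = {
    "P05067": "euk",
    "P02945": "gram-",
    "Q9Y5Q6": "euk",
    "P11279": "euk",
    "P02753": "euk",
    "P12694": "euk",
}


def signalp_argv(accession, org_type):
    """Argument list for one SignalP call (without the output redirection)."""
    return ["signalp", "-t", org_type, f"{accession}.fasta"]


def output_name(accession):
    """Name of the file that receives the SignalP output."""
    return f"signalp_out_{accession}.txt"


def run_all(runs=RUNS):
    """Run SignalP once per accession and write stdout to the output file."""
    for accession, org_type in runs.items():
        with open(output_name(accession), "w") as handle:
            subprocess.run(signalp_argv(accession, org_type), stdout=handle, check=True)


# Print the equivalent shell commands without executing anything.
for accession, org_type in RUNS.items():
    print(" ".join(signalp_argv(accession, org_type)), ">", output_name(accession))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the loop if one of the calls fails.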


Results

BCKDHA

Both methods (NN and HMM) predicted the most likely cleavage site between positions 32 and 33 (ARG_LA).
This prediction does not agree with Uniprot, where a signal peptide from position 1-45 is listed.

A4_HUMAN

SignalP predicted with both methods a cleavage site between positions 17 and 18 with a high probability for a signal peptide.
SignalP predicted the cleavage site for A4_HUMAN correctly.

BACR_HALSA

Both methods (NN and HMM) predicted no cleavage site, and therefore no signal peptide, in the BACR_HALSA sequence.
This is also true according to Uniprot, where no signal peptide is stated.

INSL5_HUMAN

For the INSL5_HUMAN protein, SignalP detected a cleavage site between positions 22 and 23, which follows a predicted signal peptide at the beginning of the sequence.
The signal peptidase I cleavage site was predicted correctly, as Uniprot states a signal peptide from positions 1-22.


LAMP1_HUMAN

SignalP predicted with both methods a cleavage site between positions 28 and 29, as there is a signal peptide detected.
The cleavage site prediction made by SignalP for LAMP1_HUMAN is correct. Uniprot shows a signal peptide for this protein which ranges from 1-28 in the sequence.

RET4_HUMAN

SignalP predicted a cleavage site with high probability between positions 18 and 19 in both the NN and the HMM method. This cleavage site is predicted to be after a signal peptide.
This prediction is correct according to Uniprot.
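The SignalP results can be summarised by comparing the last residue of each predicted signal peptide with the Uniprot annotation. The following is a minimal sketch; the dictionary names are illustrative. The SignalP values are taken from the text above. The Uniprot values for BCKDHA, INSL5_HUMAN, and LAMP1_HUMAN are stated explicitly, while those for A4_HUMAN and RET4_HUMAN are inferred from the statement that the predictions are correct.

```python
# Last residue of the signal peptide (the cleavage site follows it).
# None means that no signal peptide is predicted or annotated.
SIGNALP = {
    "BCKDHA": 32,
    "A4_HUMAN": 17,
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "RET4_HUMAN": 18,
}
UNIPROT = {
    "BCKDHA": 45,
    "A4_HUMAN": 17,       # inferred from the text
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "RET4_HUMAN": 18,     # inferred from the text
}

mismatches = [name for name in SIGNALP if SIGNALP[name] != UNIPROT[name]]

for name in SIGNALP:
    status = "differs" if name in mismatches else "agrees"
    print(f"{name}: SignalP {SIGNALP[name]}, Uniprot {UNIPROT[name]} ({status})")
```

Only BCKDHA differs: SignalP places the cleavage site after position 32, whereas Uniprot lists positions 1-45.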

TargetP

Method

  • TargetP was developed by Emanuelsson et al. in 2000 <ref> Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence", J. Mol. Biol., 300: 1005-1016, 2000</ref>
  • TargetP predicts the subcellular location of eukaryotic proteins and additionally provides cleavage site predictions.
  • This method is neural network based. The prediction is based on the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP), or secretory pathway signal peptide (SP).
  • Required input information: Sequence(s) in FASTA format, organism group

Results

The TargetP prediction results can be seen in the following table:
BCKDHA TargetP.PNG

ODBA_HUMAN (BCKDHA) is predicted to be located in the mitochondrion, which agrees with Uniprot. All other tested proteins are predicted to enter the secretory pathway and therefore to have a signal peptide. These predictions are correct except for BACR_HALSA, which has no signal peptide. For this protein, however, TargetP returns a reliability index of four, which indicates an unreliable prediction.



4. Prediction of GO terms

The following section deals with GO term prediction tools. In order to verify the predictions, first the real GO annotations are presented: (P: Process, F: Function, C: Component)

BCKDHA

GO Term Name GO identifier Aspect
Process
metabolic process 0008152 P
branched chain family amino acid catabolic process 0009083 P
cellular nitrogen compound metabolic process 0034641 P
oxidation-reduction process 0055114 P
Function
alpha-ketoacid dehydrogenase activity 0003826 F
3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) activity 0003863 F
protein binding 0005515 F
oxidoreductase activity 0016491 F
oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor 0016624 F
carboxy-lyase activity 0016831 F
metal ion binding 0046872 F
Component
mitochondrion 0005739 C
mitochondrial matrix 0005759 C
mitochondrial alpha-ketoglutarate dehydrogenase complex 0005947 C

A4_HUMAN

GO Term Name GO identifier Aspect
Process
G2 phase of mitotic cell cycle 0000085 P
suckling behaviour 0001967 P
platelet degranulation 0002576 P
mRNA polyadenylation 0006378 P
regulation of translation 0006417 P
protein phosphorylation 0006468 P
cellular copper ion homeostasis 0006878 P
endocytosis 0006897 P
apoptosis 0006915 P
induction of apoptosis 0006917 P
cell adhesion 0007155 P
regulation of epidermal growth factor receptor activity 0007176 P
Notch signaling pathway 0007219 P
axonogenesis 0007409 P
blood coagulation 0007596 P
mating behavior 0007617 P
locomotory behavior 0007626 P
axon cargo transport 0008088 P
cell death 0008219 P
adult locomotory behavior 0008344 P
visual learning 0008542 P
negative regulation of peptidase activity 0010466 P
positive regulation of peptidase activity 0010951 P
axon midline choice point recognition 0016199 P
neuron remodeling 0016322 P
dendrite development 0016358 P
platelet activation 0030168 P
extracellular matrix organization 0030198 P
forebrain development 0030900 P
neuron projection development 0031175 P
ionotropic glutamate receptor signaling pathway 0035235 P
regulation of multicellular organism growth 0040014 P
innate immune response 0045087 P
negative regulation of neuron differentiation 0045665 P
positive regulation of mitotic cell cycle 0045931 P
positive regulation of transcription from RNA polymerase II promoter 0045944 P
collateral sprouting in absence of injury 0048699 P
regulation of synapse structure and activity 0050803 P
neuromuscular process controlling balance 0050885 P
synaptic growth at neuromuscular junction 0051124 P
neuron apoptosis 0051402 P
smooth endoplasmic reticulum calcium ion homeostasis 0051563 P
Function
DNA binding 0003677 F
serine-type endopeptidase inhibitor activity 0004867 F
receptor binding 0005102 F
binding 0005488 F
protein binding 0005515 F
heparin binding 0008201 F
peptidase activator activity 0016504 F
peptidase inhibitor activity 0030414 F
acetylcholine receptor binding 0033130 F
identical protein binding 0042802 F
metal ion binding 0046872 F
PTB domain binding 0051425 F
Component
extracellular region 0005576 C
membrane fraction 0005624 C
cytoplasm 0005737 C
Golgi apparatus 0005794 C
plasma membrane 0005886 C
integral to plasma membrane 0005887 C
coated pit 0005905 C
cell surface 0009986 C
membrane 0016020 C
integral to membrane 0016021 C
synaptosome 0019717 C
axon 0030424 C
platelet alpha granule lumen 0031093 C
cytoplasmic vesicle 0031410 C
neuromuscular junction 0031594 C
ciliary rootlet 0035253 C
neuron projection 0043005 C
dendritic spine 0043197 C
dendritic shaft 0043198 C
intracellular membrane-bounded organelle 0043231 C
apical part of cell 0045177 C
synapse 0045202 C
perinuclear region of cytoplasm 0048471 C
spindle midzone 0051233 C


BACR_HALSA

INSL5_HUMAN

LAMP1_HUMAN

RET4_HUMAN

GO Term Name GO identifier Aspect
Process
eye development 0001654 P
gluconeogenesis 0006094 P
transport 0006810 P
spermatogenesis 0007283 P
heart development 0007507 P
visual perception 0007601 P
male gonad development 0008584 P
embryo development 0009790 P
maintenance of gastrointestinal epithelium 0030277 P
lung development 0030324 P
positive regulation of insulin secretion 0032024 P
response to retinoic acid 0032526 P
response to insulin stimulus 0032868 P
retinol transport 0034633 P
retinol metabolic process 0042572 P
retinal metabolic process 0042574 P
glucose homeostasis 0042593 P
response to ethanol 0045471 P
embryonic organ morphogenesis 0048562 P
embryonic skeletal system development 0048706 P
cardiac muscle tissue development 0048738 P
female genitalia morphogenesis 0048807 P
response to stimulus 0050896 P
detection of light stimulus involved in visual perception 0050908 P
positive regulation of immunoglobulin secretion 0051024 P
retina development in camera-type eye 0060041 P
negative regulation of cardiac muscle cell proliferation 0060044 P
embryonic retina morphogenesis in camera-type eye 0060059 P
uterus development 0060065 P
vagina development 0060068 P
urinary bladder development 0060157 P
heart trabecula formation 0060347 P
Function
transporter activity 0005215 F
binding 0005488 F
retinoid binding 0005501 F
protein binding 0005515 F
retinal binding 0016918 F
retinol binding 0019841 F
retinol transporter activity 0034632 F
Component
extracellular region 0005576 C
extracellular space 0005615 C

GOPET

Method

  • GOPET (Gene Ontology Term Prediction and Evaluation Tool) was described by Vinayagam et al.<ref> Arunachalam Vinayagam, Coral Del Val, Falk Schubert, Roland Eils, Karl-Heinz Glatting, Sándor Suhai, Rainer König, "GOPET: A tool for automated predictions of Gene Ontology terms", BMC Bioinformatics (2006), Volume: 7, Issue: 161, Publisher: BioMed Central, Pages: 161</ref>
  • GOPET is a fully automated tool for assigning molecular function terms to a given sequence.
  • Required input information: cDNA or protein sequence
  • Gene Ontology is used for annotation terms, GO-mapped protein databases for performing homology searches and Support Vector Machines for the prediction and the assignment of confidence values.
  • The prediction is organism independent.

Results

BCKDHA

GOid Aspect Confidence GOTerm
GO:0003824 F 97% catalytic activity
GO:0016491 F 96% oxidoreductase activity
GO:0016624 F 95% oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863 F 90% 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739 F 89% pyruvate dehydrogenase acetyl-transferring activity
GO:0004738 F 78% pyruvate dehydrogenase activity
GO:0003826 F 77% alpha-ketoacid dehydrogenase activity
GO:0047101 F 75% 2-oxoisovalerate dehydrogenase acylating activity
GO:0008677 F 65% 2-dehydropantoate 2-reductase activity
GO:0019152 F 63% acetoin dehydrogenase activity
GO:0030955 F 63% potassium ion binding
GO:0016616 F 62% oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872 F 62% metal ion binding
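The agreement between the GOPET output and the annotated molecular function terms of BCKDHA can be expressed as precision and recall. The following is a minimal sketch: the variable names are illustrative, the identifiers are copied from the two BCKDHA tables in this section, and only exact identifier matches are counted, which is an assumption of this sketch rather than the evaluation scheme of GOPET.

```python
# Molecular function (F) terms annotated for BCKDHA (table in section 4).
annotated = {
    "0003826", "0003863", "0005515", "0016491",
    "0016624", "0016831", "0046872",
}

# Terms predicted by GOPET for BCKDHA (table above).
predicted = {
    "0003824", "0016491", "0016624", "0003863", "0004739",
    "0004738", "0003826", "0047101", "0008677", "0019152",
    "0030955", "0016616", "0046872",
}

hits = annotated & predicted
precision = len(hits) / len(predicted)
recall = len(hits) / len(annotated)

print("exact matches:", sorted(hits))
print(f"precision: {precision:.2f}")
print(f"recall:    {recall:.2f}")
```

Five of the seven annotated terms are recovered. Exact matching is strict: a general term such as catalytic activity (GO:0003824) counts as a miss here even though it is an ancestor of annotated terms in the GO hierarchy.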


A4_HUMAN

GOid Aspect Confidence GOTerm
GO:0004866 F 87% endopeptidase inhibitor activity
GO:0004867 F 86% serine-type endopeptidase inhibitor activity
GO:0030568 F 83% plasmin inhibitor activity
GO:0030304 F 83% trypsin inhibitor activity
GO:0030414 F 82% peptidase inhibitor activity
GO:0005488 F 79% binding
GO:0005515 F 74% protein binding
GO:0046872 F 73% metal ion binding
GO:0003677 F 71% DNA binding
GO:0008201 F 70% heparin binding
GO:0008270 F 69% zinc ion binding
GO:0005507 F 69% copper ion binding
GO:0005506 F 67% iron ion binding


BACR_HALSA

GOid Aspect Confidence GOterm
GO:0005216 F 77% ion channel activity
GO:0008020 F 75% G-protein coupled photoreceptor activity
GO:0015078 F 60% hydrogen ion transmembrane transporter activity


INSL5_HUMAN

GOid Aspect Confidence GOterm
GO:0005179 F 80% hormone activity


LAMP1_HUMAN

GOid Aspect Confidence GOterm
GO:0004812 F 60% aminoacyl-tRNA ligase activity
GO:0005524 F 60% ATP binding


RET4_HUMAN

GOid Aspect Confidence GOterm
GO:0005488 F 90% binding
GO:0005501 F 81% retinoid binding
GO:0008289 F 80% lipid binding
GO:0019841 F 78% retinol binding
GO:0005215 F 78% transporter activity
GO:0016918 F 78% retinal binding
GO:0005319 F 69% lipid transporter activity
GO:0008035 F 60% high-density lipoprotein particle binding

Pfam

Method

  • Pfam is described by Finn et al. (2008)<ref>Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database". Nucleic Acids Res 36 (Database issue): D281–8</ref>

Results

{|border="1" style="text-align:left; border-spacing:0;"
!Query
!Cellular Component
!Molecular Function
!Biological Process
|-
|BCKDHA|| ||GO:0016624 (oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor)||GO:0008152 (metabolic process)
|-
|A4_HUMAN||GO:0016021 (integral to membrane)||GO:0005488 (binding)|| 
|-
|BACR_HALSA||GO:0016020 (membrane)||GO:0005216 (ion channel activity)||GO:0006811 (ion transport)
|-
|INSL5_HUMAN||GO:0005576 (extracellular region)||GO:0005179 (hormone activity)|| 
|-
|LAMP1_HUMAN||GO:0016020 (membrane)|| || 
|-
|RET4_HUMAN|| ||GO:0005488 (binding)|| 
|}

ProtFun 2.2

Method

  • ProtFun is described by Jensen et al.<ref>L. Juhl Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Stærfeldt, K. Rapacki, C. Workman, C. A. F. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak, "Prediction of human protein function from post-translational modifications and localization features", J. Mol. Biol., 319:1257-1265, 2002</ref>

  • ProtFun is an ab initio server for predicting protein function from sequence. Various servers are queried and the information they provide is integrated into the final prediction.

Results

BCKDHA BCKDHA ProtFun BCKDHA.png

A4_HUMAN BCKDHA ProtFun A4 Human.png

BACR_HALSA BCKDHA ProtFun BACR HALSA.png

INSL5_HUMAN BCKDHA ProtFun INSL5 Human.png


LAMP1_HUMAN BCKDHA ProtFun LAMP1 Human.png


RET4_HUMAN BCKDHA ProtFun RET4 Human.png

References

<references />

