Difference between revisions of "Secondary Structure Prediction BCKDHA"

From Bioinformatikpedia

Revision as of 16:08, 24 August 2011

1. Secondary structure prediction

General Information

The secondary structure of a protein is determined by its primary structure and consists of alpha-helices, beta-sheets and coils.

alpha-helices

Figure1: alpha-helix

Alpha-helices (Figure 1) are formed by H-bonds between the NH-group of an amino acid and the CO-group of the amino acid located four residues earlier (i+4). This is the most common type of helix. There are two other helix types, which are much rarer. One is the 3,10-helix, in which the H-bond is between the NH-group and the CO-group three residues earlier (i+3). The other is the pi-helix, in which the H-bond is between the NH-group and the CO-group five residues earlier (i+5). The different positions of the CO-group influence the width and the height of the helices.

beta-sheets

Figure2: beta-sheet

The H-bonds (Figure 2) between the CO-group and the NH-group which form a beta-sheet can connect residues that are far apart in the sequence.
There are two kinds of beta-sheets: parallel sheets, in which all strands point in the same direction, and anti-parallel sheets, in which neighbouring strands point in alternating directions.

coils

Coils are irregularly formed elements such as turns.

PSIPRED

Basic information

author: David T. Jones (University College London)
year: 1998
version: 2

PSIPRED uses neural networks which have a single hidden layer and a feed-forward back-propagation architecture to predict the secondary structure. To run PSIPRED locally, it requires the output of PSI-BLAST (Position Specific Iterated BLAST) as input data.
For the online prediction on the server it is enough to enter an amino acid sequence. Evaluated with a very stringent cross-validation method, PSIPRED reaches an average Q3 score of 80.7%.
The prediction is split into three steps. In the first step sequence profiles are generated, using a position-specific scoring matrix from PSI-BLAST as input for the neural network. In the next step the secondary structure is predicted. In the last step the output of the secondary structure prediction is filtered.
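
The Q3 score mentioned above is the percentage of residues whose three-state label (H, E or C) is predicted correctly. A minimal sketch of this calculation follows; the function name `q3_score` and the toy strings are assumptions made for illustration and are not taken from PSIPRED or from the BCKDHA results.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where both three-state strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Toy strings for illustration only (H = helix, E = strand, C = coil).
example_pred = "CHHHHHCCEEEC"
example_obs = "CCHHHHCCEEEE"
print(f"Q3 = {q3_score(example_pred, example_obs):.1f}%")
```

To apply this to the alignments on this page, blanks in the UniProt line would first have to be mapped to a state (for example C), which is a modelling decision and not something the source specifies.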

There are three different options:
- Mask low complexity regions
- Mask transmembrane helices
- Mask coiled-coil regions

References

[PSIPRED Server]
[Overview of prediction methods]
[History of the PSIPRED]

Prediction

Figure3: Visualization of the prediction of PSIPRED
Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred      CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC
UniProt                                                     

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred      CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHH
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCHHHHHHHHHHCCCC
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred      CCEEECCCCHHHHHHHCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCC
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred      CCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHH
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHHHHCCCEEEEEECCCCCCCCCCCHHCCCCHHHHHCCCCCCCCCEECC
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred      HHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHH
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHHCCCCHHHHHHHHHHCCCCC HHHHHHHHHHHHHHHHHHHHHHHC
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred      CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH  


PSIPRED predicted 23 coils, 16 alpha helices and 6 beta sheets, as shown in the alignment above. In Figure 3 these predictions are visualized by pink bars, which stand for the alpha helices, and yellow arrows, which symbolize the beta sheets. PSIPRED does not mark coils with a special symbol, so wherever there is neither a bar nor an arrow there is a coil.
As the alignment of the prediction with the real secondary structure from UniProt shows, the prediction is completely wrong in the beginning. In the middle part it becomes better, but there are still many mistakes. PSIPRED seems to have more problems with beta sheets than with alpha helices: it predicts beta sheets which do not exist, or misses existing ones, more often than it does for alpha helices. In most cases it predicts the alpha helices quite well. The comparison with the UniProt structure shows that especially the long alpha helices are predicted correctly, except for one long region in the middle of the sequence which should be a long beta sheet but is predicted as an alpha helix.

Jpred3

Basic information

author: Cole C, Barber JD & Barton GJ (Bioinformatics and Computational Biology Research, University of Dundee)
year: 1998
version: 3


Jpred uses a neural network to make its predictions. To predict the secondary structure of a protein sequence or of a multiple alignment of protein sequences, the algorithm Jnet is used. The prediction accuracy for secondary structures lies above 81%. Additionally, Jpred makes predictions about the solvent accessibility.
Jpred3 needs a protein sequence or a multiple alignment of protein sequences as input.
It is important that the target sequence is the first sequence in the multiple alignment, since the alignment is modified so that the first sequence does not have any gaps. The alignment has to be in the MSF or the BLC format.

References

Jpred3 Server
About Jpred
FAQ


Prediction

When predicting the secondary structure of BCKDHA, JPred found many hits with very good e-values in other proteins.

e-value=0.0
2bew, 2bev, 2beu, 1x80, 1wci, 1u5b, 1olx, 1ols, 1dtw, 1x7y, 1x7z, 1x7x, 1x7w, 2j9f, 2bff, 1v1r, 1olu, 1v16, 1v11, 2bfc, 2bfb, 1v1m, 2bfd, 2bfe

e-value=6e-58
1umd, 1umc, 1umb, 1um9

e-value=1e-57
2bp7, 1qs0, 1w85, 3dva, 1w88


Figure4: Visualization of the prediction of JPred (alpha helices: red bars; beta sheets: green arrows)
first line: prediction; second and third line: confidence of the prediction

With these hits JPred ran the prediction:

Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred        HHHHHHHHHHHHHH                 EEE              
Conf      10090009999980000000323546777770000303566666777777
UniProt

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred                                 EEEEE                HH
Conf      77777777777777654567777777308885377740467787776368
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHH     E      HHHHHHHHHHH
Conf      99999999999999999999875045000001677517899999885278
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred        EEEE    HHHHHHHH  HHHHHHHHH
Conf      84465157745788885065689988740677754577777545677777
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred                   HHHHHHHHHHHH     EEEEEE      HHHHHHHH
Conf      64132147888770367889998750688558887407887468999999
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHH     EEEEEEE                 HHHHHHH   EEEEE
Conf      87500888606888703677777777777764067777005725774078
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred        HHHHHHHHHHHHHHHHH    EEEEEEEEEE              HHH
Conf      74689999999999988507985588886354067777777765553688
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH
Conf      99998468758999999986068866899999999999999999988606
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred          HHHHHHH      HHHHHHHHHHHHHHHH
Conf      887368777523688756899999999999875267777777888
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH   


Comparing the secondary structure predicted by Jpred with the secondary structure of BCKDHA in UniProt, as done in the alignment above, it is remarkable that the prediction differs a lot from UniProt in the beginning but becomes much better in the middle and at the end. Jpred predicts more helices and fewer beta sheets than there are in the UniProt secondary structure. It is interesting that, although there are no alpha helices in the beginning, Jpred predicts them with quite high confidence. This high confidence can also be seen very well in the visualization of the prediction (Figure 4), where it is displayed by black bars. There is one part in the middle of the sequence where Jpred predicts a very long alpha helix although it should be a beta sheet. It is interesting that PSIPRED also had problems with this beta sheet. In the rest of the middle part the prediction of Jpred is quite correct except for a few positions. Figure 4 underlines that the protein mainly consists of alpha helices, since mainly red bars are shown.

DSSP

Basic information

author: Wolfgang Kabsch and Chris Sander (Max-Planck-Institut für medizinische Forschung, Heidelberg)
year: 1983
whole name: Define Secondary Structure of Proteins

Based on atomic coordinates in Protein Data Bank format, DSSP defines the secondary structure of a protein.
With this method the secondary structure is not predicted but determined from the 3D coordinates.
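
DSSP bases its assignment on backbone hydrogen bonds, which it detects with an electrostatic energy criterion (Kabsch and Sander, 1983). The sketch below shows only that single test, not the full assignment of helices and sheets. The function names and the toy coordinates are assumptions made for illustration; the constants are the ones given in the DSSP publication.

```python
import math

Q1 = 0.42            # partial charge on the C=O group, in units of e
Q2 = 0.20            # partial charge on the N-H group, in units of e
F = 332.0            # dimensional factor, Angstrom * kcal / (mol * e^2)
HBOND_CUTOFF = -0.5  # kcal/mol; lower energies count as an H-bond


def hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group (kcal/mol).

    Each argument is an (x, y, z) coordinate in Angstrom.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h):
    """True if the energy is below the cut-off."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Toy linear geometry (not taken from a PDB file): C=O ... H-N on the x-axis.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

Real PDB files usually lack hydrogen coordinates, so a full implementation has to place the amide hydrogen first; that step is omitted here.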


References

[Introduction]
[Explanation]


Prediction

Figure5: Visualization of the prediction of DSSP.
Seq     KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMT
Pred        TT       T        TT T    T  TTT  T 333     HHHHHHHHHHHH
UniProt                            EEEEE                HH

Seq     LLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYP
Pred    HHHHHHHHHHHHHHTTTTT     TT HHHHHHHHHTT TTTSSS  TT HHHHHHTT
UniProt HHHHHHHHHHHHHH     E      HHHHHHHHHHH     EEEE    HHHHHHHH  

Seq     LELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANR
Pred    HHHHHHHHHT TT TTTT T TT    TTTT     TTTTTHHHHHHHHHHHHHHTT
UniProt HHHHHHHHH                                  HHHHHHHHHHHH

Seq     VVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPG
Pred     SSSSSSTT333THHHHHHHHHHHHTT  SSSSSSS TSSTTSS333T TTTTT333T33
UniProt EEEEEE      HHHHHHHHHHHH     EEEEEEE                 HHHHHHH

Seq     YGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYR
Pred    3T SSSSSSTT HHHHHHHHHHHHHHHHHHT  SSSSSS    T TTTT  333T
UniProt    EEEEE    HHHHHHHHHHHHHHHHH    EEEEEEEEEE             

Seq     VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFS
Pred     HHHHHHT HHHHHHHHHHHHTT  HHHHHHHHHHHHHHHHHHHHHHHHT    3333TT
UniProt HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH     HHHHHH
 
Seq     DVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred    TTTTT  HHHHHHHHHHHHHHHHH333T 333
UniProt H      HHHHHHHHHHHHHHHH


Description of the visualization of the prediction
Note that the first 50 amino acids of the sequence are not shown and that the part relevant for our protein ends at position 391.
1. line: sequence
2. line: structural elements
3. line: if a residue is involved in symmetry contacts, it is labeled with a star
4. line: if a residue is solvent accessible, it is labeled with an "A"

Letter code for the secondary structure elements:

  • H (blue): alpha helix
  • 3 (yellow): residue in isolated beta-bridge
  • T (red): hydrogen bonded turn
  • S (green): bend

The comparison of the assigned structure with the UniProt structure of BCKDHA shows that they match to a large extent. Especially the alpha helices are mainly assigned correctly. As the blue regions in Figure 5 show, the protein mainly consists of alpha helices, so most of the assignment is exact. The comparison with the UniProt structure also shows that DSSP has some problems assigning beta sheets.
DSSP offers much more information than the two other tools, since it reports not only alpha helices, beta sheets and turns but also symmetry contacts and solvent accessibility.

2. Prediction of disordered regions

General information

Disordered regions are long regions which do not have a regular secondary structure. They are dynamically flexible and adopt a regular structure only when they bind to a substrate or another protein. In these regions polar and charged amino acids, and especially proline, are overrepresented. Disordered regions are conserved and occur mainly in regions which have a regulatory function. Since disordered regions have no clear secondary structure, they also have no stable tertiary structure.


DISOPRED

Basic information

author: Jonathan J. Ward, Liam J. McGuffin, Kevin Bryson, Bernard F. Buxton and David T. Jones (University College London)
year: 2004
version: 2

DISOPRED2 defines disordered residues as those which appear in the sequence records but have no coordinates in the electron density map. This is a simple but imperfect definition, because the absence of coordinates can also be an artifact of the crystallization process.

References

Publication
DISOPRED server
Information

Prediction

Figure6: Prediction of the disordered regions
  
Figure7: Profile plot of the disordered regions

The first line denotes the confidence of the prediction, which is shown in the second line. A residue predicted as disordered is marked with an asterisk (*). All of the disordered regions are predicted with very high confidence.
DISOPRED predicts disordered regions mainly in the beginning and a few at the end of BCKDHA, as shown by the red fields in Figure 6.
Figure 7 on the right side also shows that the disordered regions are in the beginning and at the end, since the highest peaks are located at these two ends.

POODLE

Basic information

POODLE uses machine learning approaches to predict the disordered regions of an amino acid sequence.

author:
- POODLE-L S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi
- POODLE-S K. Shimizu, Y. Muraoka, S. Hirose, and T. Noguchi
- POODLE-W K. Shimizu, Y. Muraoka, S. Hirose, K. Tomii and T. Noguchi
- POODLE-I S.Hirose, K.Shimizu, N.Inoue, S.Kanai and T.Noguchi

year:
- POODLE-L 2007
- POODLE-S 2007
- POODLE-W 2007
- POODLE-I 2008

options:
POODLE-L: This tool searches for disordered regions which are longer than 40 consecutive amino acids.
POODLE-S: Here the focus lies on predicting short disordered regions. There are two different subtools: "Missing residues" and "High B-factor residues"
POODLE-W: With this option the proteins which are mostly disordered can be found.
POODLE-I: This tool combines the other three tools. POODLE-I also uses structural information to predict disordered regions. It is based on a work-flow approach.


References

[POODLE-L]
[POODLE-S]
[POODLE-W]
[POODLE-I]
[POODLE server]
[Help]


Prediction

POODLE-S

Figure8: POODLE-S (Missing residues): disordered regions prediction
Figure9: POODLE-S (High B-factor residues): disordered regions prediction


POODLE-S (Missing residues)
disordered region    1-56   341-345   420-423
average confidence   0.75   0.58      0.56

POODLE-S (High B-factor residues)
disordered region    6-9    15-57   93     95-96   340-354   379-402
average confidence   0.63   0.77    0.53   0.55    0.67      0.59


POODLE-S (which predicts short disordered regions) with the option "Missing residues" predicted disordered regions at the positions 1-56, 341-345 and 420-423, as shown in Figure 8. The peaks in the green region, above the cut-off value of 0.5, stand for the disordered regions. In the beginning there is a very high and very long peak, so the tool predicts with high confidence that there is a long region without fixed structure in the beginning of the protein. Its average confidence of 0.75 can also be seen in the table under the figures. The other two values in this table show that the two disordered regions predicted at the end of the protein do not have a very high confidence. We also ran POODLE-S with the option "High B-factor residues". Here disordered regions were predicted at the positions 6-9, 15-57, 93, 95-96, 340-354 and 379-402, as shown in Figure 9. This option predicts more regions without fixed structure but, as with the option "Missing residues", they are mainly in the beginning and at the end of the protein. Comparing Figure 9 with Figure 8 shows that the predictions at the end are made with more confidence in the second run with "High B-factor residues": the peaks are much higher and also wider, which shows that the predicted disordered regions are longer.
In both runs the POODLE-S score varies a lot in the middle part of the protein between the peaks. There are many small peaks, but they are not high enough to exceed the cut-off value.
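
The way regions and average confidences follow from a per-residue score and a cut-off can be sketched as follows. This is only an illustration under stated assumptions: the function name `disordered_regions` is hypothetical, residues are counted as disordered when their score is strictly above the cut-off, the "average confidence" is taken to be the mean score of the region, and the scores are toy values, not POODLE output.

```python
from typing import List, Tuple


def disordered_regions(scores: List[float], cutoff: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean score) for every run of residues above the cut-off.

    Positions are 1-based and inclusive, as in the tables on this page.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos  # a new run begins here
        elif start is not None:
            run = scores[start - 1:pos - 1]
            regions.append((start, pos - 1, sum(run) / len(run)))
            start = None
    if start is not None:  # the run reaches the end of the sequence
        run = scores[start - 1:]
        regions.append((start, len(scores), sum(run) / len(run)))
    return regions


# Toy per-residue scores for illustration only.
toy_scores = [0.8, 0.7, 0.6, 0.3, 0.2, 0.55, 0.4, 0.9, 0.9]
for start, end, mean in disordered_regions(toy_scores):
    print(f"{start}-{end}\t{mean:.2f}")
```

Small peaks that stay below the cut-off, like the ones described for the middle of the protein, produce no region in this scheme.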

POODLE-L

Figure10: POODLE-L : prediction of disordered regions


disordered region 1-48 369-428
average confidence 0.6 0.67


POODLE-L predicts two disordered regions which are longer than 40 amino acids. They are located at the positions 1-48 and 369-428. Figure 10 shows that the predictions are in the beginning and at the end of the protein. Both predictions only have low peaks, so POODLE-L is not completely confident about them. This observation is supported by the average confidence values of 0.6 and 0.67. A possible explanation is that POODLE-L searches for long disordered regions and the two regions, at 48 and 60 residues, may be too close to the 40-residue minimum to be a very good match.
Since POODLE-L only looks for long disordered regions, it reports no disordered regions in the rest of the protein. This is supported by Figure 10, where there are no small peaks in the middle of the plot.

POODLE-W

Figure11: POODLE-W: prediction of disordered regions



In Figure 11, regions which could be disordered but for which POODLE is not sure are bordered by blue squares, and the disordered regions are bordered by red squares.

0=ordered regions
5=perhaps disordered regions
9=disordered regions

In this case there is no predicted disordered region in the beginning of the protein, which is completely different from the other two POODLE tools we already used. Instead, the prediction of the disordered region at the end is very good: the confidence is high and the region predicted to be disordered is very long and reaches to the end of the protein. The first part of the disordered region does not have a high confidence, but the major part is assigned the highest possible confidence of 9, which can be seen in Figure 11 by the red box.

POODLE-I

Figure12: POODLE-I: prediction of disordered regions


disordered region 1-56 341-345 370-427 443-445
average confidence 0.6 0.56 0.67 0.74


POODLE-I predicted four disordered regions at the positions 1-56, 341-345, 370-427 and 443-445. These predictions are shown in Figure 12, where we can see that they are in the beginning and at the end of the protein. The peak in the beginning is quite long, but in its middle the score falls so low that it is nearly under the cut-off value. That is why the average value is also low. However, the plot (Figure 12) shows two maxima for this peak, both around 0.7, which underlines that the prediction is fairly reliable. The next peak is very short and has a low average confidence of 0.56, so POODLE-I does not seem to be sure about this prediction. The third peak is longer than the other peaks and also has a good average confidence value of 0.67. The peak directly at the end of the protein has the highest value, but that is understandable since the structure is generally less well defined at the end of a protein. So we have to be careful with this hit because it can also be wrong.
Between the predicted regions there are also many small peaks which are not high enough to exceed the threshold.

Comparison

POODLE-S (Missing residues) | POODLE-S (High B-factor residues) | POODLE-L | POODLE-W | POODLE-I
1-56 | 6-9 | 1-48 | 325-445 | 1-56
341-345 | 15-57 | 369-428 | - | 341-345
420-423 | 93 | - | - | 370-427
- | 95-96 | - | - | 443-445
- | 340-354 | - | - | -
- | 379-402 | - | - | -

Comparing the different POODLE tools, we can summarize that the disordered regions are mainly in the beginning or at the end of the protein. Only POODLE-S predicts some in the middle of the protein, but these regions are so short and their confidence is so low that it is not certain whether they really are disordered regions. The predicted disordered regions lie mainly at the positions 1-56 and 341-445. That the disordered regions are found in the beginning and at the end of the protein is not surprising, since the structure is usually not very well defined in these regions. Such a hit can therefore also be wrong, simply because of the poor definition of the secondary structure.
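
One way to make this comparison more systematic is to count, for every residue, how many of the five POODLE runs mark it as disordered. The sketch below does this with the region boundaries from the comparison table above. The voting scheme and the function names are assumptions made for illustration and are not part of POODLE.

```python
from collections import Counter

# Region boundaries copied from the comparison table (1-based, inclusive).
predictions = {
    "POODLE-S (Missing residues)": [(1, 56), (341, 345), (420, 423)],
    "POODLE-S (High B-factor residues)": [(6, 9), (15, 57), (93, 93), (95, 96),
                                          (340, 354), (379, 402)],
    "POODLE-L": [(1, 48), (369, 428)],
    "POODLE-W": [(325, 445)],
    "POODLE-I": [(1, 56), (341, 345), (370, 427), (443, 445)],
}


def votes_per_residue(preds):
    """Count how many tools predict each position as disordered."""
    votes = Counter()
    for regions in preds.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        votes.update(covered)  # each tool votes at most once per position
    return votes


def consensus_regions(votes, min_votes):
    """Merge consecutive positions with at least min_votes into regions."""
    positions = sorted(p for p, v in votes.items() if v >= min_votes)
    regions = []
    for p in positions:
        if regions and p == regions[-1][1] + 1:
            regions[-1][1] = p
        else:
            regions.append([p, p])
    return [tuple(r) for r in regions]


votes = votes_per_residue(predictions)
for start, end in consensus_regions(votes, min_votes=4):
    print(f"{start}-{end}")
```

With a threshold of four out of five runs, only positions near the beginning and the end of the protein remain, which agrees with the summary above.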


IUPred

Basic information

author: Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon
year: 2005

IUPred predicts disordered regions by estimating the capacity of polypeptides to form stabilizing contacts. The potential to form these contacts depends on the surrounding sequence and on the chemical properties. This approach is based on the idea that disordered regions have no capacity to form sufficient interresidue interactions so that there is no stabilizing energy.

There are three different prediction types which can be chosen:

  • long disorder: predicts context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder
  • short disorder: predicts short, probably context-dependent, disordered regions, such as missing residues in the X-ray structure of an otherwise globular protein
  • structured regions: takes the energy profile and finds continuous regions that are confidently predicted to be ordered

References

[IUPred server]
[Theory]

Prediction

Prediction type: long disorder
Figure13: Prediction of disordered regions with IUPred(long disorder)


disordered region 33-50 89-93 385-388 390-397 399-401 404-413 420-422 424-428 431
average confidence 0.69 0.57 0.52 0.64 0.51 0.55 0.52 0.56 0.55


The long disorder option of IUPred predicts several disordered regions. They are located at the positions 33-50, 89-93, 385-388, 390-397, 399-401, 404-413, 420-422, 424-428 and at position 431. Although there are many different regions, they are all located in the beginning or at the end of the protein. Figure 13 shows that mainly the peak in the beginning has a high confidence. Since this hit is quite long, it also contains positions without a high confidence, which is why the average value is only 0.69; this is still quite good. The second peak is very short and does not have a very high confidence, so it is not certain whether this is a real hit. At the end there are many predicted disordered regions, but with one exception their prediction is quite uncertain, since the confidence value is only slightly above the cut-off value.
Between all these predicted disordered regions there are many peaks which are only slightly below the threshold. Judging from Figure 13, the whole protein except for the middle part could be part of a disordered region.

Prediction type: short disorder
Figure14: Prediction of disordered regions with IUPred (short disorder)


disordered region 1 33-55 92-93 393-411 415 420-421 423-425 427-428 433 438-445
average confidence 0.56 0.7 0.56 0.57 0.5 0.53 0.53 0.53 0.51 0.73



The short disorder option of IUPred predicts several disordered regions. They are located at the positions 1, 33-55, 92-93, 393-411, 415, 420-421, 423-425, 427-428, 433 and 438-445. The hit at position 1 can be neglected because it is just one residue long and its confidence value is only 0.56. The next predicted disordered region seems to be important because it is about 20 residues long and its average confidence value is 0.7. Figure 14 shows that the average is 0.7 because the peak is so long. The maximum confidence value of this region is about 0.8, which signals the high confidence of this disordered region. The next hit is only two residues long and has a maximum value of 0.57. Since it is so short, it can also be neglected. After these predictions in the beginning of the protein there are many very short regions at the end of the protein. Apart from the region 393-411 and the last predicted region, all of them are only one to three residues long. There are two possible interpretations of these short regions. Either they are too short to be true disordered regions, which is supported by the low confidence values, or they have to be combined into one long disordered region. The second interpretation is supported by the fact that all these short regions are next to each other. Since all the other programs also predicted a disordered region at the end of the protein, we chose the second interpretation. The last hit is the most significant one. It is only eight residues long, but its average confidence value is 0.73 and its maximum value is higher than 0.9. Such a clear prediction of a disordered region at the end of the protein is to be expected, because this part of a protein normally has no defined fixed structure. The absence of a defined secondary structure does not, however, mean that the region has no function.

Prediction type: structured regions
Figure15: Prediction of disordered regions with IUPred(structured regions)


When analyzing the sequence with the option "structured regions", the program could not find any disordered regions in the whole protein; its only output is the information "Unkown globular domains: 1-445" and Figure 15.

META-Disorder

To run META-Disorder we used the tool of PredictProtein Server. <ref> https://www.predictprotein.org/ </ref>

Prediction
Figure 15: The picture shows the output of PredictProtein. The red box contains the prediction of the disordered regions. Disordered regions are red and non-disordered regions are green. The disordered regions are in the beginning and at the end of the sequence.


disordered region 1-9 394-400
average confidence 0.63 0.57

With META-Disorder we only got two regions as possible disordered regions. This can be seen in Figure 15, where only the beginning and the end of the strand are red and the rest is green. The table also shows that the regions lie entirely in the beginning and at the end. Since there are no other possible disordered regions in this prediction and the green part seems to be clearly not disordered, these two hits could be wrong. This is not certain, but in general these regions of a protein do not have a very well defined structure although they have a function.

Comparison

Comparing the results of all disordered region prediction tools, we can see that all of them predicted disordered regions in the beginning and at the end of the protein. We have to be careful with these results because the structure of a protein is usually not very well defined in these regions, so the hits can arise from the poor definition of the secondary structure there. On the other hand, all of the programs predicted these regions, and most of them with high confidence. It should therefore be considered that the beginning and the end of BCKDHA may be disordered regions.

3. Prediction of transmembrane alpha-helices and signal peptides

General

Transmembrane Topology

The prediction of the membrane topology of proteins aims at discovering which portions of the protein lie within the lipid bilayer of a membrane and which portions protrude from the membrane into the aqueous environment. Membrane-spanning polypeptides usually form helices of about 20 amino acids in length. As the interior of the membrane is hydrophobic, the membrane-spanning part of the protein consists of hydrophobic amino acids as well. This information can be used for the prediction of transmembrane helices, which subsequently enables the prediction of the membrane topology. <ref> http://en.wikipedia.org/wiki/Membrane_topology</ref><ref>http://en.wikipedia.org/wiki/Transmembrane_domain</ref>

Prediction tools: TMHMM, OCTOPUS and SPOCTOPUS
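
The idea that a stretch of about 20 hydrophobic residues hints at a membrane-spanning helix can be illustrated with a simple sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale with a window of 19 residues and a threshold of 1.6. It is a classical heuristic shown only as an example and is not how TMHMM, OCTOPUS or SPOCTOPUS work; the function names and the toy sequence are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window of the given length along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]  # KeyError on non-standard letters
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]


# Toy sequence: a hydrophobic stretch flanked by charged residues.
toy_sequence = "DDDDD" + "L" * 20 + "KKKKK"
print(candidate_tm_windows(toy_sequence))
```

A profile of this kind cannot distinguish a transmembrane helix from the hydrophobic core of a signal peptide, which is the kind of confusion the combined predictors described below try to resolve.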

Signal Peptides

Signal peptides are N-terminal sequence motifs directing proteins to their cellular destination, such as the secretory pathway, mitochondria and chloroplasts. One example is the secretory signal peptide (SP), an N-terminal peptide that is typically 15-30 amino acids long. A signal peptide has three regions: an N-terminal region (n-region), which often contains positively charged residues, a hydrophobic region (h-region) of at least six residues in the middle, and a C-terminal region (c-region) of polar uncharged residues. In eukaryotes the SP targets proteins across the membrane of the endoplasmic reticulum, in prokaryotes across the plasma membrane. The SP is cleaved when the protein crosses the membrane.
Furthermore, there exist chloroplast transit peptides (cTP), which are also N-terminal and are cleaved when the protein enters the chloroplast. The most conserved site in cTPs is an alanine directly after the N-terminal methionine... <ref>O. Emanuelsson, S. Brunak, G. von Heijne, H. Nielsen, "Locating proteins in the cell using TargetP, SignalP and related tools", Nature Protocols, 2007</ref>

Prediction tools: SignalP, TargetP

Combined transmembrane and signal peptide prediction

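The hydrophobic regions of a transmembrane helix and of a signal peptide are highly similar. Methods that predict only one of the two features therefore tend to confuse them, for example by reporting a signal peptide as a transmembrane helix. Combined predictors model both features at once in order to reduce such cross-predictions. <ref>http://www.ebi.ac.uk/Tools/phobius/help.html</ref>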

Prediction tools: Phobius and Polyphobius

In the following section different tools for predicting transmembrane helices and signal peptides are tested. As the BCKDHA protein isn't a transmembrane protein, additional proteins were used for the transmembrane and signal peptide analysis:

Name | Organism | Location | Transmembrane protein | Signal peptide | Function | UniProt accession
A4_HUMAN | Human | Cell membrane | yes | yes | Protease inhibitor | P05067
BACR_HALSA | Halobacterium salinarium | Cell membrane | yes | no | Ion transport | P02945
INSL5_HUMAN | Human | Extracellular region | no | yes | Hormone | Q9Y5Q6
LAMP1_HUMAN | Human | Cell membrane, lysosome membrane, endosome membrane | yes | yes | Presents carbohydrate ligands to selectins | P11279
RET4_HUMAN | Human | Extracellular space | no | yes | Transport | P02753


TMHMM

Method

  • Was developed by Sonnhammer, von Heijne and Krogh in 1998 <ref>E.L. Sonnhammer, G. von Heijne and A. Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences, Proc Int Conf Intell Syst Mol Biol. (1998)</ref>
  • Predicts the transmembrane topology of membrane-spanning proteins
  • Is based on a hidden Markov model with an architecture of 7 types of states
  • Required input: protein sequence in FASTA format
  • Can also be run on the TMHMM server

Execution

Before we could execute TMHMM, we had to change all occurrences of "/usr/local/bin/" to "/usr/bin/" in the following files: tmhmm, tmhmm.ORIG and tmhmmformat.pl.

To execute the program we used these commands:

  • tmhmm P05067.fasta > tmhmm_out_P05067.txt
  • tmhmm P02945.fasta > tmhmm_out_P02945.txt
  • tmhmm Q9Y5Q6.fasta > tmhmm_out_Q9Y5Q6.txt
  • tmhmm P11279.fasta > tmhmm_out_P11279.txt
  • tmhmm P02753.fasta > tmhmm_out_P02753.txt
  • tmhmm P12694.fasta > tmhmm_out_P12694.txt
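
Instead of typing one command per protein, the same calls can be generated in a loop. The following is a minimal sketch which assumes that the tmhmm executable is on the PATH and that the FASTA files are named after their UniProt accession, as in the commands above; the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# UniProt accessions of the six proteins analysed in this section.
ACCESSIONS = ["P05067", "P02945", "Q9Y5Q6", "P11279", "P02753", "P12694"]


def tmhmm_jobs(accessions, workdir="."):
    """Return (argv, output_path) pairs mirroring the commands listed above."""
    workdir = Path(workdir)
    return [
        (["tmhmm", str(workdir / f"{acc}.fasta")],
         workdir / f"tmhmm_out_{acc}.txt")
        for acc in accessions
    ]


def run_jobs(jobs):
    """Run each job and write its standard output to the output file,
    which is what '>' does in the shell."""
    for argv, out_path in jobs:
        with open(out_path, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)
```

`run_jobs(tmhmm_jobs(ACCESSIONS))` would then produce the six output files. Separating the construction of the commands from their execution makes it possible to inspect the commands before anything is run.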

Results

BCKDHA

Position Membrane topology
1-445 outside

TMHMM predicted no membrane spanning region for the BCKDHA protein, which corresponds to the information provided in Uniprot.

A4_HUMAN

Figure 16: Membrane topology of A4_HUMAN (source: UniProt)
Position Membrane topology
1-700 outside
701-723 TMhelix
724-770 inside


TMHMM predicted one transmembrane helix for A4_HUMAN, which agrees with the UniProt annotation. The predicted helix spans positions 701-723, whereas UniProt lists the transmembrane region at positions 700-723, as can be seen in Figure 16. The extracellular region reported by UniProt begins at position 18, because the first 17 residues form a signal peptide. TMHMM does not predict signal peptides and therefore assigns the whole stretch from position 1 to 700 to the extracellular region.


BACR_HALSA

Figure 17: Membrane topology of BACR_HALSA (source: UniProt)
Position Membrane topology
1-22 outside
23-42 TMhelix
43-54 inside
55-77 TMhelix
78-91 outside
92-114 TMhelix
115-120 inside
121-143 TMhelix
144-147 outside
148-170 TMhelix
171-189 inside
190-212 TMhelix
213-262 outside

The TMHMM prediction differs slightly from the information provided in UniProt, as can be seen in Figure 17. TMHMM predicted only 13 segments, i.e. six transmembrane helices, and places the C-terminus of the protein in the extracellular space. UniProt reports 15 segments, i.e. seven transmembrane helices, with the C-terminus in the cytoplasm.
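
Counting segments by hand is error-prone, so the position table can also be parsed programmatically. The sketch below assumes the simplified "start-end label" layout used in the tables of this section (not the raw TMHMM output format); the function names are hypothetical.

```python
def parse_segments(text):
    """Parse lines such as '23-42 TMhelix' into (start, end, label) tuples."""
    segments = []
    for line in text.strip().splitlines():
        span, label = line.split()
        start, end = (int(value) for value in span.split("-"))
        segments.append((start, end, label))
    return segments


def is_contiguous(segments):
    """True if every segment starts right after the previous one ends."""
    return all(b[0] == a[1] + 1 for a, b in zip(segments, segments[1:]))


def count_helices(segments):
    """Number of segments labelled as transmembrane helix."""
    return sum(1 for _, _, label in segments if label == "TMhelix")


# TMHMM prediction for BACR_HALSA, copied from the table above.
BACR_HALSA_TMHMM = """\
1-22 outside
23-42 TMhelix
43-54 inside
55-77 TMhelix
78-91 outside
92-114 TMhelix
115-120 inside
121-143 TMhelix
144-147 outside
148-170 TMhelix
171-189 inside
190-212 TMhelix
213-262 outside
"""
```

`count_helices(parse_segments(BACR_HALSA_TMHMM))` gives the number of predicted helices, and `is_contiguous` is a quick sanity check that no residue was left out when the table was copied.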

INSL5_HUMAN

Figure 18: Membrane topology of INSL5_HUMAN (source: UniProt)
Position Membrane topology
1-135 outside

The TMHMM prediction agrees with the fact that INSL5_HUMAN is a hormone and is therefore secreted into the extracellular region. This information is provided by UniProt and can be seen in Figure 18.

LAMP1_HUMAN

Figure 19: Membrane topology of LAMP1_HUMAN (source: UniProt)
Position Membrane topology
1-10 inside
11-33 TMhelix
34-383 outside
384-406 TMhelix
407-417 inside

The TMHMM prediction for LAMP1_HUMAN agrees only partially with the UniProt annotation shown in Figure 19. The signal peptide at the N-terminus is predicted as an additional transmembrane helix, and the lumenal domain is predicted as an extracellular domain. The second transmembrane helix is predicted correctly.

RET4_HUMAN

Position Membrane topology
1-201 outside

The TMHMM prediction for RET4_HUMAN is correct, as RET4_HUMAN is a secreted protein and does not span any membrane.

Phobius and Polyphobius

Methods

  • Phobius was developed by Käll et al. <ref>Käll et al., "A Combined Transmembrane Topology and Signal Peptide Prediction Method", Journal of Mol. Biology, 338(5):1027-1036, 2004</ref>
  • Combined prediction of transmembrane regions and signal peptides
  • Required input: only the sequence in FASTA format (the 20 amino acids and B, Z, X are recognized)
  • As transmembrane topology and signal peptides are likely to be conserved during evolution, Polyphobius was established <ref>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information", Bioinformatics, 21 (Suppl 1):i251-i257, 2005</ref>, which includes information from sequences homologous to the query.
  • Required input, two options: a query sequence in FASTA format, which is then searched with BLAST against uniprot_trembl, or an uploaded alignment in FASTA format which provides information about homologs.

Results

A4_HUMAN
Phobius:
Figure 20: Prediction of Phobius
sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 1 N-REGION
REGION 2 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

Polyphobius:
sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 3 N-REGION
REGION 4 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

Figure 21: Prediction of Polyphobius

Phobius and Polyphobius make nearly identical predictions for A4_HUMAN: the feature tables above differ only in the boundary between the n-region and the h-region of the signal peptide, and the plots in Figure 20 and Figure 21 are nearly the same as well. Both tools predicted the signal peptide and the membrane topology of A4_HUMAN correctly. The annotated signal peptide and membrane topology can be found in Figure 16.
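
The feature lines printed by Phobius and Polyphobius above follow a simple "key start end description" layout, which makes them easy to process automatically. The following sketch parses this layout as it is shown on this page; the function names are hypothetical, and the parser is only an illustration, not part of the Phobius distribution.

```python
def parse_features(text):
    """Parse Phobius-style feature lines into (kind, start, end, note) tuples."""
    features = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Feature lines look like 'KIND start end [description]'; anything
        # else (for example the sequence identifier) is skipped.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        note = " ".join(parts[3:]).rstrip(".").upper()
        features.append((parts[0].upper(), int(parts[1]), int(parts[2]), note))
    return features


def helices(features):
    """Return the (start, end) pairs of all predicted transmembrane helices."""
    return [(start, end) for kind, start, end, _ in features if kind == "TRANSMEM"]


def signal_peptide(features):
    """Return the (start, end) pair of the signal peptide, or None."""
    for kind, start, end, _ in features:
        if kind == "SIGNAL":
            return (start, end)
    return None


# Phobius prediction for A4_HUMAN, copied from the output above.
A4_HUMAN_PHOBIUS = """\
sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 1 N-REGION
REGION 2 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC
"""
```

With the features parsed, `signal_peptide(...)` and `helices(...)` give the two pieces of information that are compared with UniProt throughout this section.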

BACR_HALSA
Phobius:
Figure 22: Prediction of Phobius
sp|P02945|BACR_HALSA
TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TOPO_DOM 43 53 CYTOPLASMIC.
TRANSMEM 54 76
TOPO_DOM 77 95 NON CYTOPLASMIC.
TRANSMEM 96 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 142
TOPO_DOM 143 147 NON CYTOPLASMIC.
TRANSMEM 148 169
TOPO_DOM 170 189 CYTOPLASMIC.
TRANSMEM 190 212
TOPO_DOM 213 217 NON CYTOPLASMIC.
TRANSMEM 218 237
TOPO_DOM 238 262 CYTOPLASMIC.

Polyphobius:
sp|P02945|BACR_HALSA
TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TOPO_DOM 44 54 CYTOPLASMIC.
TRANSMEM 55 77
TOPO_DOM 78 94 NON CYTOPLASMIC.
TRANSMEM 95 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 141
TOPO_DOM 142 147 NON CYTOPLASMIC.
TRANSMEM 148 166
TOPO_DOM 167 186 CYTOPLASMIC.
TRANSMEM 187 205
TOPO_DOM 206 215 NON CYTOPLASMIC.
TRANSMEM 216 237
TOPO_DOM 238 262 CYTOPLASMIC.

Figure 23: Prediction of Polyphobius

The predictions of Phobius and Polyphobius differ only slightly in the boundaries of the individual segments, as can be seen in the two tables above. A comparison of Figure 22 with Figure 23 also shows that they are largely the same and differ only a little in the posterior label probabilities of the cytoplasmic and non-cytoplasmic regions. Both membrane topology predictions are correct, as a comparison with Figure 17 shows.
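
How small the differences between the two predictions are can be quantified per helix. The sketch below pairs the helices of both tools in sequence order and reports, for each pair, the number of shared residues and the largest shift of a helix boundary. Pairing by order is an assumption that only holds when both tools predict the same number of helices, as they do here; the function names are hypothetical.

```python
# Transmembrane helices for BACR_HALSA, copied from the TRANSMEM lines above.
PHOBIUS = [(23, 42), (54, 76), (96, 114), (121, 142),
           (148, 169), (190, 212), (218, 237)]
POLYPHOBIUS = [(22, 43), (55, 77), (95, 114), (121, 141),
               (148, 166), (187, 205), (216, 237)]


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare(first, second):
    """Pair helices by order; return (overlap, largest boundary shift) per pair."""
    if len(first) != len(second):
        raise ValueError("both predictions must contain the same number of helices")
    return [
        (overlap(a, b), max(abs(a[0] - b[0]), abs(a[1] - b[1])))
        for a, b in zip(first, second)
    ]
```

`compare(PHOBIUS, POLYPHOBIUS)` returns one pair of numbers per helix. All seven helix pairs overlap, and the largest boundary shift is seven residues (end of the sixth helix).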


INSL5_HUMAN
Phobius:
Figure 24: Prediction of Phobius
sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 5 N-REGION
REGION 6 17 H-REGION
REGION 18 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

Polyphobius:
sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 4 N-REGION
REGION 5 16 H-REGION
REGION 17 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

Figure 25: Prediction of Polyphobius

The Phobius and Polyphobius predictions for INSL5_HUMAN agree with the information given in UniProt (Figure 18). The tables above and a comparison of Figure 24 with Figure 25 show that both tools correctly predicted a signal peptide followed by a single extracellular region.

LAMP1_HUMAN
Phobius:
Figure 26: Prediction of Phobius
sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 10 N-REGION
REGION 11 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

Polyphobius:
sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 9 N-REGION
REGION 10 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

Figure 27: Prediction of Polyphobius

The results of Phobius and Polyphobius listed in the tables above and shown in Figure 26 and Figure 27 are practically identical. A comparison with the information provided by UniProt (Figure 19) shows that the signal peptide and membrane topology predictions of both tools for LAMP1_HUMAN are correct.

RET4_HUMAN
Phobius:
Figure 28: Prediction of Phobius
sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 2 N-REGION
REGION 3 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

Polyphobius:
sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 3 N-REGION
REGION 4 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

Figure 29: Prediction of Polyphobius

Both tools made nearly the same prediction, as can be seen from the tables above and from the visualization of the two predictions (Figure 28, Figure 29). Both predict the signal peptide of RET4_HUMAN correctly, as well as the single extracellular region of the protein.

For the BCKDHA protein, Phobius predicted a signal peptide at the beginning of the sequence with a probability of about 90%. The predicted signal peptide is 34 amino acids long. This roughly matches the information given in UniProt, which states that BCKDHA contains a 45 amino acid long signal peptide for the transfer into the mitochondrion. The rest of the sequence is predicted to be non-cytoplasmic. No part of the protein is predicted to span a membrane, which is correct, as BCKDHA is located in the mitochondrial matrix according to UniProt.

BCKDHA
Phobius:
Figure 30: Prediction of Phobius
sp|P12694|ODBA_HUMAN (BCKDHA)
Signal 1 34
Region 1 16 N-Region
Region 17 25 H-Region
Region 26 34 C-Region
TOPO_DOM 35 445 non cytoplasmic

Polyphobius:
ODBA_HUMAN (BCKDHA)
TOPO_DOM 1 445 Non cytoplasmic

Figure 31: Prediction of Polyphobius

Considering the information given in UniProt, Polyphobius performed worse than Phobius on the BCKDHA sequence. It predicted no signal sequence at the beginning of the protein. There is a low probability for the amino acids at positions 1-45 to be a signal sequence, but overall the whole sequence is predicted to be non-cytoplasmic, as shown in Figure 31. In contrast, Phobius predicted the signal sequence at positions 1-34 with a very high probability, which is clearly visible in Figure 30.

OCTOPUS and SPOCTOPUS

Methods

  • OCTOPUS was developed by Viklund and Elofsson in 2008 <ref>Håkan Viklund and Arne Elofsson, "Improving topology prediction by two-track ANN-based preference scores and an extended topological grammar", Bioinformatics (2008)</ref>
  • OCTOPUS (obtainer of correct topologies for uncharacterized sequences) uses a combination of hidden Markov models and artificial neural networks.
  • It creates a sequence profile by doing a BLAST search to obtain homologous sequences. The profile is used as input for a neural network that predicts the probability for each residue to be located in a transmembrane (M), interface (I), close loop (L) or globular loop (G) environment as well as its preference to be on the inside (i) or outside (o) of the membrane. A hidden Markov model is then used to calculate the most likely protein topology.
  • Required input: Protein Sequence in FASTA-Format
  • SPOCTOPUS (Viklund et al., 2008<ref>Viklund et al., "A combined predictor of signal peptides and membrane protein topology", Bioinformatics (2008)</ref>) is an extension of OCTOPUS which also predicts signal peptides. A neural network is used to predict a signal peptide preference score. The location of the signal peptide is determined by a hidden Markov model. The output contains the information retrieved by OCTOPUS as well as the probability that a residue lies N-terminal of a signal peptide (n) or within a signal peptide (S).
  • Required input information: Protein sequence in FASTA-Format

Results

A4_HUMAN

Figure 32: Prediction of OCTOPUS and SPOCTOPUS for A4_HUMAN

Both OCTOPUS and SPOCTOPUS predict the membrane topology of A4_HUMAN. The output is visualized in Figure 32; the brown line shows that the protein lies mainly in the non-cytoplasmic region. SPOCTOPUS additionally detected the signal peptide. A comparison with the information provided by UniProt shows that the predictions of both tools are correct.

BACR_HALSA

Figure 33: Prediction of OCTOPUS and SPOCTOPUS for BACR_HALSA

The predictions made by OCTOPUS and SPOCTOPUS for BACR_HALSA are identical and correct. The results are visualized in Figure 33. The red bars show that most of the protein lies within the membrane. In addition, the alternating brown and green lines indicate that the protein alternates between the non-cytoplasmic and the cytoplasmic region. SPOCTOPUS predicted no signal peptide, which agrees with the information given in UniProt.

INSL5_HUMAN

Figure 34: Prediction of OCTOPUS and SPOCTOPUS for INSL5_HUMAN

Both OCTOPUS and SPOCTOPUS predict that the protein lies in a non-cytoplasmic region from position 22 or 23 onwards. This is supported by the brown line in Figure 34. The figure also shows that the two tools made different predictions for the first part of the protein: SPOCTOPUS predicted the signal peptide of INSL5_HUMAN, while OCTOPUS predicted a transmembrane domain for the same part of the sequence. A comparison with the information in UniProt shows that the signal peptide predicted by SPOCTOPUS is correct.

LAMP1_HUMAN

Figure 35: Prediction of OCTOPUS and SPOCTOPUS for LAMP1_HUMAN

The visualization of the results (Figure 35) shows that the two tools made largely the same predictions, which differ only at the beginning of the protein. While SPOCTOPUS predicted a signal peptide there, OCTOPUS assigned an additional inside region and a transmembrane helix to the region that contains the signal peptide. As we know from UniProt, the prediction of SPOCTOPUS is the right one, because LAMP1_HUMAN has a signal peptide at the beginning of the protein.

RET4_HUMAN

Figure 36: Prediction of OCTOPUS and SPOCTOPUS for RET4_HUMAN

Again the two tools made nearly the same predictions, which differ only at the beginning of the protein. As can be seen in Figure 36, both predict the protein to lie mainly in a non-cytoplasmic region, but while SPOCTOPUS predicts a signal peptide at the beginning, OCTOPUS assigns a transmembrane helix to this region. According to UniProt there is a signal peptide at the beginning of the protein, so the SPOCTOPUS prediction is the correct one.

BCKDHA

Figure 37: Prediction of OCTOPUS and SPOCTOPUS for BCKDHA

The OCTOPUS and SPOCTOPUS predictions for BCKDHA are completely contrary with respect to the intracellular and extracellular regions, as Figure 37 clearly shows. Both predictions are wrong, however, as BCKDHA is not a membrane protein. Furthermore, SPOCTOPUS missed the 45 amino acid long signal peptide at the beginning of the sequence.

SignalP

Method

  • SignalP was established by Nielsen et al. in 1997 <ref>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", Protein Engineering, 10:1-6, 1997</ref>
  • Is based on neural networks (NN) as well as hidden Markov models (HMM)
  • Uses three different scores for the neural network prediction:
    • S-score (score for the signal peptide)
    • C-score (score for the cleavage site)
    • Y-score (combination of the C-score and the slope of the S-score, which locates the cleavage site more precisely)
  • Identifies signal peptides and cleavage sites
  • Makes predictions for three different organism groups:
    • eukaryotes
    • Gram-negative bacteria
    • Gram-positive bacteria
  • Can also be run on the SignalP server

Execution

To run the command line SignalP tool, the path in the SignalP file had to be adapted to /apps/signalp-3.0

The following commands were used to execute SignalP:

  • signalp -t euk P05067.fasta > signalp_out_P05067.txt
  • signalp -t gram- P02945.fasta > signalp_out_P02945.txt
  • signalp -t euk Q9Y5Q6.fasta > signalp_out_Q9Y5Q6.txt
  • signalp -t euk P11279.fasta > signalp_out_P11279.txt
  • signalp -t euk P02753.fasta > signalp_out_P02753.txt
  • signalp -t euk P12694.fasta > signalp_out_P12694.txt


Results

Figure 38: Prediction by SignalP for BCKDHA using HMMs

BCKDHA

Both methods (NN and HMM) predicted the most likely cleavage site between positions 32 and 33 (ARG_LA). This is clearly visible from the red lines in Figure 38.
This prediction does not agree with UniProt, which lists a signal peptide at positions 1-45.

A4_HUMAN

With both methods SignalP predicted a cleavage site between positions 17 and 18, with a high probability for a signal peptide.
SignalP thus predicted the cleavage site for A4_HUMAN correctly.

BACR_HALSA

Both methods (NN and HMM) predicted no cleavage site, and therefore no signal peptide, in the BACR_HALSA sequence.
This agrees with UniProt, which lists no signal peptide for this protein.

INSL5_HUMAN

For the INSL5_HUMAN protein, SignalP detected a cleavage site between positions 22 and 23, which follows a predicted signal peptide at the beginning of the sequence.
The signal peptidase I cleavage site was predicted correctly, as Uniprot states a signal peptide from positions 1-22.


LAMP1_HUMAN

SignalP predicted with both methods a cleavage site between positions 28 and 29, as there is a signal peptide detected.
The cleavage site prediction made by SignalP for LAMP1_HUMAN is correct. Uniprot shows a signal peptide for this protein which ranges from 1-28 in the sequence.

RET4_HUMAN

SignalP predicted a cleavage site with high probability between positions 18 and 19 in both the NN and the HMM method. This cleavage site is predicted to be after a signal peptide.
This prediction is correct according to Uniprot.
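
The SignalP results above can be summarized by comparing the predicted length of the signal peptide with the UniProt annotation for each protein. The values in the sketch below are taken from the text of this page. The annotated length of 18 for RET4_HUMAN is an assumption derived from the statement that the prediction is correct; the function name is hypothetical.

```python
# (predicted, annotated) length of the N-terminal signal peptide;
# None means that no signal peptide is predicted or annotated.
SIGNALP_VS_UNIPROT = {
    "BCKDHA": (32, 45),
    "A4_HUMAN": (17, 17),
    "BACR_HALSA": (None, None),
    "INSL5_HUMAN": (22, 22),
    "LAMP1_HUMAN": (28, 28),
    "RET4_HUMAN": (18, 18),  # annotated value assumed, see text above
}


def disagreements(results):
    """Names of all proteins whose prediction differs from the annotation."""
    return sorted(
        name for name, (predicted, annotated) in results.items()
        if predicted != annotated
    )
```

`disagreements(SIGNALP_VS_UNIPROT)` lists only BCKDHA, the one protein in this set whose N-terminal peptide targets it to the mitochondrion rather than to the secretory pathway.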

TargetP

Method

  • TargetP was developed by Emanuelsson et al. in 2000 <ref>Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence", J. Mol. Biol., 300: 1005-1016, 2000</ref>
  • TargetP predicts the subcellular location of eukaryotic proteins
  • Additionally it can make cleavage site predictions
  • The method is based on neural networks. The prediction relies on the N-terminal presequences:
    • chloroplast transit peptide (cTP)
    • mitochondrial targeting peptide (mTP)
    • secretory pathway signal peptide (SP)
  • Required input information: sequence(s) in FASTA format, organism group
  • The prediction can also be run on the TargetP server

Results

Figure 39: Prediction results by TargetP
All TargetP predictions are shown in the table in Figure 39.

ODBA_HUMAN (BCKDHA) is predicted to be located in the mitochondrion, which is correct according to UniProt. All other tested proteins are predicted to enter the secretory pathway and therefore to have a signal peptide. These predictions are correct except for BACR_HALSA, which has no signal peptide. For this protein, however, TargetP returns a reliability index of four, which indicates an unreliable prediction.
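
The reliability index mentioned above (called reliability class, RC, in the TargetP output) reflects how far the winning score is ahead of the runner-up. The sketch below shows how such a class can be derived from the output scores. The thresholds of 0.8, 0.6, 0.4 and 0.2 are given here as we recall them from the TargetP output description and should be checked against the server documentation; the function name is hypothetical.

```python
def reliability_class(scores):
    """Map the gap between the two highest scores to a class from 1 to 5.

    Class 1 is the most reliable prediction, class 5 the least reliable.
    The thresholds are assumptions, see the text above.
    """
    ordered = sorted(scores, reverse=True)
    diff = ordered[0] - ordered[1]
    if diff > 0.8:
        return 1
    if diff > 0.6:
        return 2
    if diff > 0.4:
        return 3
    if diff > 0.2:
        return 4
    return 5
```

For example, scores of 0.5, 0.2 and 0.1 for the three possible locations give a gap of 0.3 and therefore class 4, the same class that TargetP reported for BACR_HALSA.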

4. Prediction of GO terms

The following section deals with GO term prediction tools. In order to verify the predictions, the real GO annotations are presented first (as listed in UniProt<ref>http://www.uniprot.org</ref>):
(P: Process, F: Function, C: Component)

BCKDHA

GO Term Name GO identifier Aspect
Process
metabolic process 0008152 P
branched chain family amino acid catabolic process 0009083 P
cellular nitrogen compound metabolic process 0034641 P
oxidation-reduction process 0055114 P
Function
alpha-ketoacid dehydrogenase activity 0003826 F
3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) activity 0003863 F
protein binding 0005515 F
oxidoreductase activity 0016491 F
oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor 0016624 F
carboxy-lyase activity 0016831 F
metal ion binding 0046872 F
Component
mitochondrion 0005739 C
mitochondrial matrix 0005759 C
mitochondrial alpha-ketoglutarate dehydrogenase complex 0005947 C

A4_HUMAN

GO Term Name GO identifier Aspect
Process
G2 phase of mitotic cell cycle 0000085 P
suckling behavior 0001967 P
platelet degranulation 0002576 P
mRNA polyadenylation 0006378 P
regulation of translation 0006417 P
protein phosphorylation 0006468 P
cellular copper ion homeostasis 0006878 P
endocytosis 0006897 P
apoptosis 0006915 P
induction of apoptosis 0006917 P
cell adhesion 0007155 P
regulation of epidermal growth factor receptor activity 0007176 P
Notch signaling pathway 0007219 P
axonogenesis 0007409 P
blood coagulation 0007596 P
mating behavior 0007617 P
locomotory behavior 0007626 P
axon cargo transport 0008088 P
cell death 0008219 P
adult locomotory behavior 0008344 P
visual learning 0008542 P
negative regulation of peptidase activity 0010466 P
positive regulation of peptidase activity 0010951 P
axon midline choice point recognition 0016199 P
neuron remodeling 0016322 P
dendrite development 0016358 P
platelet activation 0030168 P
extracellular matrix organization 0030198 P
forebrain development 0030900 P
neuron projection development 0031175 P
ionotropic glutamate receptor signaling pathway 0035235 P
regulation of multicellular organism growth 0040014 P
innate immune response 0045087 P
negative regulation of neuron differentiation 0045665 P
positive regulation of mitotic cell cycle 0045931 P
positive regulation of transcription from RNA polymerase II promoter 0045944 P
collateral sprouting in absence of injury 0048699 P
regulation of synapse structure and activity 0050803 P
neuromuscular process controlling balance 0050885 P
synaptic growth at neuromuscular junction 0051124 P
neuron apoptosis 0051402 P
smooth endoplasmic reticulum calcium ion homeostasis 0051563 P
Function
DNA binding 0003677 F
serine-type endopeptidase inhibitor activity 0004867 F
receptor binding 0005102 F
binding 0005488 F
protein binding 0005515 F
heparin binding 0008201 F
peptidase activator activity 0016504 F
peptidase inhibitor activity 0030414 F
acetylcholine receptor binding 0033130 F
identical protein binding 0042802 F
metal ion binding 0046872 F
PTB domain binding 0051425 F
Component
extracellular region 0005576 C
membrane fraction 0005624 C
cytoplasm 0005737 C
Golgi apparatus 0005794 C
plasma membrane 0005886 C
integral to plasma membrane 0005887 C
coated pit 0005905 C
cell surface 0009986 C
membrane 0016020 C
integral to membrane 0016021 C
synaptosome 0019717 C
axon 0030424 C
platelet alpha granule lumen 0031093 C
cytoplasmic vesicle 0031410 C
neuromuscular junction 0031594 C
ciliary rootlet 0035253 C
neuron projection 0043005 C
dendritic spine 0043197 C
dendritic shaft 0043198 C
intracellular membrane-bounded organelle 0043231 C
apical part of cell 0045177 C
synapse 0045202 C
perinuclear region of cytoplasm 0048471 C
spindle midzone 0051233 C


BACR_HALSA

GO Term Name GO identifier Aspect
Process
transport 0006810 P
ion transport 0006811 P
phototransduction 0007602 P
photon transport 0015992 P
protein-chromophore linkage 0018298 P
response to stimulus 0050896 P
Function
receptor activity 0004872 F
ion channel activity 0005216 F
photoreceptor activity 0009881 F
Component
plasma membrane 0005886 C
membrane 0016020 C
integral to membrane 0016021 C

INSL5_HUMAN

GO Term Name GO identifier Aspect
Process
biological_process 0008150 P
Function
hormone activity 0005179 F
Component
cellular_component 0005575 C
extracellular region 0005576 C

LAMP1_HUMAN

GO Term Name GO identifier Aspect
Process
autophagy 0006914 P
Component
membrane fraction 0005624 C
lysosome 0005764 C
lysosomal membrane 0005765 C
endosome 0005768 C
late endosome 0005770 C
multivesicular body 0005771 C
plasma membrane 0005886 C
integral to plasma membrane 0005887 C
external side of plasma membrane 0009897 C
cell surface 0009986 C
endosome membrane 0010008 C
membrane 0016020 C
integral to membrane 0016021 C
vesicle 0031982 C
sarcolemma 0042383 C
melanosome 0042470 C

RET4_HUMAN

GO Term Name GO identifier Aspect
Process
eye development 0001654 P
gluconeogenesis 0006094 P
transport 0006810 P
spermatogenesis 0007283 P
heart development 0007507 P
visual perception 0007601 P
male gonad development 0008584 P
embryo development 0009790 P
maintenance of gastrointestinal epithelium 0030277 P
lung development 0030324 P
positive regulation of insulin secretion 0033024 P
response to retinoic acid 0032526 P
response to insulin stimulus 0032868 P
retinol transport 0034633 P
retinol metabolic process 0042572 P
retinal metabolic process 0042574 P
glucose homeostasis 0042593 P
response to ethanol 0045471 P
embryonic organ morphogenesis 0048562 P
embryonic skeletal system development 0048706 P
cardiac muscle tissue development 0048738 P
female genitalia morphogenesis 0048807 P
response to stimulus 0050896 P
detection of light stimulus involved in visual perception 0050908 P
positive regulation of immunoglobulin secretion 0051024 P
retina development in camera-type eye 0060041 P
negative regulation of cardiac muscle cell proliferation 0060044 P
embryonic retina morphogenesis in camera-type eye 0060059 P
uterus development 0060065 P
vagina development 0060068 P
urinary bladder development 0060157 P
heart trabecula formation 0060347 P
Function
transporter activity 0005215 F
binding 0005488 F
retinoid binding 0005501 F
protein binding 0005515 F
retinal binding 0016918 F
retinol binding 0019841 F
retinol transporter activity 0034632 F
Component
extracellular region 0005576 C
extracellular space 0005615 C

GOPET

Method

  • GOPET (Gene Ontology Term Prediction and Evaluation Tool) was described by Vinayagam et al.<ref> Arunachalam Vinayagam, Coral Del Val, Falk Schubert, Roland Eils, Karl-Heinz Glatting, Sándor Suhai, Rainer König, "GOPET: A tool for automated predictions of Gene Ontology terms", BMC Bioinformatics (2006), Volume: 7, Issue: 161, Publisher: BioMed Central, Pages: 161</ref>
  • GOPET is a fully automated tool for assigning molecular function terms to a given sequence.
  • Is based on homology searches and Support Vector Machines
  • Required input information: cDNA or protein sequence
  • Gene Ontology is used for annotation terms, GO-mapped protein databases for performing homology searches and Support Vector Machines for the prediction and the assignment of confidence values.
  • The prediction is organism independent.

Results

BCKDHA

GOid Aspect Confidence GOTerm
GO:0003824 F 97% catalytic activity
GO:0016491 F 96% oxidoreductase activity
GO:0016624 F 95% oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863 F 90% 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739 F 89% pyruvate dehydrogenase acetyl-transferring activity
GO:0004738 F 78% pyruvate dehydrogenase activity
GO:0003826 F 77% alpha-ketoacid dehydrogenase activity
GO:0047101 F 75% 2-oxoisovalerate dehydrogenase acylating activity
GO:0008677 F 65% 2-dehydropantoate 2-reductase activity
GO:0019152 F 63% acetoin dehydrogenase activity
GO:0030955 F 63% potassium ion binding
GO:0016616 F 62% oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872 F 62% metal ion binding

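The GOPET predictions for BCKDHA are mostly correct. Of the GO terms predicted with a confidence of at least 90%, oxidoreductase activity, oxidoreductase activity acting on the aldehyde or oxo group of donors, and 3-methyl-2-oxobutanoate dehydrogenase activity are listed in the UniProt entry for BCKDHA. Catalytic activity is not listed explicitly, but it is a parent term of oxidoreductase activity. The alpha-ketoacid dehydrogenase activity and the metal ion binding function are confirmed by UniProt as well.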
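
The comparison between predicted and annotated GO terms can be automated with a simple lookup. The following sketch uses the molecular function identifiers for BCKDHA from the tables on this page and splits the GOPET predictions into confirmed and unconfirmed terms; the function name is hypothetical.

```python
# Molecular function terms annotated for BCKDHA (UniProt table above).
UNIPROT_FUNCTION = {
    "GO:0003826", "GO:0003863", "GO:0005515", "GO:0016491",
    "GO:0016624", "GO:0016831", "GO:0046872",
}

# GOPET predictions for BCKDHA as (GO identifier, confidence in percent).
GOPET = [
    ("GO:0003824", 97), ("GO:0016491", 96), ("GO:0016624", 95),
    ("GO:0003863", 90), ("GO:0004739", 89), ("GO:0004738", 78),
    ("GO:0003826", 77), ("GO:0047101", 75), ("GO:0008677", 65),
    ("GO:0019152", 63), ("GO:0030955", 63), ("GO:0016616", 62),
    ("GO:0046872", 62),
]


def split_predictions(predictions, annotated, min_confidence=0):
    """Return (confirmed, unconfirmed) GO identifiers among all predictions
    with at least the given confidence."""
    selected = [go for go, conf in predictions if conf >= min_confidence]
    confirmed = [go for go in selected if go in annotated]
    unconfirmed = [go for go in selected if go not in annotated]
    return confirmed, unconfirmed
```

`split_predictions(GOPET, UNIPROT_FUNCTION, 90)` restricts the comparison to the most confident predictions. Note that such an exact comparison of identifiers gives no credit for general parent terms like catalytic activity, which are implied by more specific annotations; a fairer evaluation would have to take the GO hierarchy into account.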


A4_HUMAN

GOid Aspect Confidence GOTerm
GO:0004866 F 87% endopeptidase inhibitor activity
GO:0004867 F 86% serine-type endopeptidase inhibitor activity
GO:0030568 F 83% plasmin inhibitor activity
GO:0030304 F 83% trypsin inhibitor activity
GO:0030414 F 82% peptidase inhibitor activity
GO:0005488 F 79% binding
GO:0005515 F 74% protein binding
GO:0046872 F 73% metal ion binding
GO:0003677 F 71% DNA binding
GO:0008201 F 70% heparin binding
GO:0008270 F 69% zinc ion binding
GO:0005507 F 69% copper ion binding
GO:0005506 F 67% iron ion binding

The GOPET results for A4_HUMAN match the UniProt annotation quite well. The predicted endopeptidase inhibitor, plasmin inhibitor and trypsin inhibitor activities are not listed in UniProt. However, they are closely related to the annotated serine-type endopeptidase inhibitor activity, which GOPET also predicted: endopeptidase inhibitor activity is its more general parent term, and plasmin and trypsin are both serine-type endopeptidases. The same holds for the zinc, copper and iron ion binding functions: all three are metals, and the protein is annotated with a metal ion binding function.


BACR_HALSA

GOid Aspect Confidence GOterm
GO:0005216 F 77% ion channel activity
GO:0008020 F 75% G-protein coupled photoreceptor activity
GO:0015078 F 60% hydrogen ion transmembrane transporter activity

GOPET predicted the ion channel activity and the photoreceptor activity correctly. The hydrogen ion transmembrane transporter activity is not among the UniProt annotations listed above.

INSL5_HUMAN

GOid Aspect Confidence GOterm
GO:0005179 F 80% hormone activity

The INSL5_HUMAN protein is correctly predicted to be a hormone.


LAMP1_HUMAN

GOid Aspect Confidence GOterm
GO:0004812 F 60% aminoacyl-tRNA ligase activity
GO:0005524 F 60% ATP binding

For the LAMP1_HUMAN protein, no functional GO annotation is listed in UniProt, so the two low-confidence predictions cannot be confirmed.


RET4_HUMAN

GOid Aspect Confidence GOterm
GO:0005488 F 90% binding
GO:0005501 F 81% retinoid binding
GO:0008289 F 80% lipid binding
GO:0019841 F 78% retinol binding
GO:0005215 F 78% transporter activity
GO:0016918 F 78% retinal binding
GO:0005319 F 69% lipid transporter activity
GO:0008035 F 60% high-density lipoprotein particle binding

The GOPET predictions for RET4_HUMAN are correct except for the lipid-linked activities.

Pfam

Method

  • Pfam is described by Finn et al. (2008) <ref>Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database.". Nucleic Acids Res 36 (Database issue): D281–8</ref>
  • Pfam is a database of protein families and domains
  • The database consists of two parts: Pfam-A and Pfam-B
    • Pfam-A: manually curated and therefore more accurate
    • Pfam-B: generated automatically, so the data are of lower quality than in Pfam-A
  • Webserver: Pfam


Results

BCKDHA

Figure 40: Prediction of Pfam-A for BCKDHA

Pfam found one significant match in the database which is visualized in Figure 40.

  • Molecular function
    • GO:0016624 (oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor)
  • Biological Process
    • GO:0008152 (metabolic process)


A4_HUMAN

Figure 41: Prediction of Pfam-A for A4_HUMAN

Pfam found six significant matches in the database which are visualized in Figure 41.

  • Cellular Component
    • GO:0016021 (integral to membrane)
  • Molecular function
    • GO:0005488 (binding)
    • GO:0004867 (serine-type endopeptidase inhibitor activity)
  • No GO-ID
    • E2 domain of amyloid precursor protein
    • beta-amyloid precursor protein C-terminus



BACR_HALSA

Figure 42: Prediction of Pfam-A for BACR_HALSA

Pfam found one significant match in the database which is visualized in Figure 42.

  • Cellular Component
    • GO:0016020 (membrane)
  • Molecular function
    • GO:0005216 (ion channel activity)
  • Biological Process
    • GO:0006811 (ion transport)



INSL5_HUMAN

Figure 43: Prediction of Pfam-A for INSL5_HUMAN

Pfam found one significant match in the database which is visualized in Figure 43.

  • Cellular Component
    • GO:0005576 (extracellular region)
  • Molecular function
    • GO:0005179 (hormone activity)


LAMP1_HUMAN

Figure 44: Prediction of Pfam-A for LAMP1_HUMAN

Pfam found one significant match in the database which is visualized in Figure 44.

  • Cellular Component
    • GO:0016020 (membrane)



RET4_HUMAN

Figure 45: Prediction of Pfam-A for RET4_HUMAN

Pfam found one significant match in the database which is visualized in Figure 45.

  • Molecular function
    • GO:0005488 (binding)


Comparing the Pfam annotations with the known GO terms of each protein shows that the results for all analysed proteins are correct but far from exhaustive.
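
"Correct but not exhaustive" corresponds to high precision and low recall when the predicted GO terms are compared with a reference annotation as sets. The sketch below illustrates this. The function name `compare_go_sets` is an assumption, and the reference set is a hypothetical placeholder, not the real UniProt annotation of INSL5_HUMAN.

```python
def compare_go_sets(predicted: set[str], reference: set[str]) -> dict[str, float]:
    """Return precision and recall of a predicted GO term set against a reference set."""
    true_positives = predicted & reference
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(reference) if reference else 0.0
    return {"precision": precision, "recall": recall}


# Pfam-derived GO terms for INSL5_HUMAN, taken from the list above.
pfam_insl5 = {"GO:0005576", "GO:0005179"}

# HYPOTHETICAL reference set, for illustration only: the two Pfam terms plus two
# made-up placeholders standing in for annotations that Pfam did not report.
reference_insl5 = pfam_insl5 | {"placeholder_term_1", "placeholder_term_2"}

result = compare_go_sets(pfam_insl5, reference_insl5)
print(result)
```

With this placeholder reference, every predicted term is in the reference (precision 1.0), but only half of the reference is recovered (recall 0.5).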

ProtFun 2.2

Method

  • ProtFun is described by Jensen et al. <ref>L. Juhl Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Stærfeldt, K. Rapacki, C. Workman, C. A. F. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak. Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol., 319:1257-1265, 2002</ref>

  • ProtFun is an ab initio server that predicts protein function from sequence. It queries various other servers and integrates their output into the final prediction.
  • ProtFun reports only probabilities and odds scores; it does not make a yes/no call on whether a protein has a specific function.
  • The arrow (=>) indicates the line with the highest information content

Results

BCKDHA

Figure 46: Prediction of GO-terms for BCKDHA by ProtFun 2.2
  • Functional category
    • Central_intermediary_metabolism (Prob: 0.321, Odds: 5.096) (=>)
    • Amino_acid_biosynthesis (Prob: 0.187, Odds: 8.520)
    • Purines_and_pyrimidines (Prob: 0.257, Odds: 1.059)
    • Biosynthesis_of_cofactors (Prob: 0.246, Odds: 3.413)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.769, Odds: 2.683)
  • Enzyme class
    • Ligase (Prob: 0.085, Odds: 1.673) (=>)
    • Lyase (Prob: 0.076, Odds: 1.614)
  • Gene Ontology category
    • Growth_factor (Prob: 0.009, Odds: 0.609)
    • Signal_transducer (Prob: 0.098, Odds: 0.458)

The ProtFun 2.2 predictions for BCKDHA are shown in Figure 46; the list above summarizes the most significant results. The program predicts that BCKDHA functions mainly in metabolism (central intermediary metabolism). The second entry under "Functional category", amino acid biosynthesis, has a very good odds score of 8.520, so we also regard it as a reliable prediction. The other two entries have the next-best probability or odds score, but their odds scores are much lower than those of the first two, so we take the first two entries as the best predictions of ProtFun 2.2. Comparison with UniProt shows that they are correct. For the "Gene Ontology category" there was no reliable prediction: we list the two best results above, but their probabilities and odds scores show that they are not significant.
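
The comparison of probabilities and odds scores in the paragraph above can be reproduced with a short script. The following is a minimal sketch that parses entries in the summarized "Name (Prob: x, Odds: y)" layout used on this page (not the raw ProtFun output format) and ranks them by odds score. The names `ProtFunEntry`, `LINE_RE` and `parse_entries` are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class ProtFunEntry(NamedTuple):
    name: str
    prob: float
    odds: float
    marked: bool  # True if the line carries the "=>" marker


# Matches lines such as "Enzyme (Prob: 0.769, Odds: 2.683) (=>)".
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\(Prob:\s*(?P<prob>[\d.]+),\s*Odds:\s*(?P<odds>[\d.]+)\)"
    r"(?P<mark>\s*\(=>\))?\s*$"
)


def parse_entries(text: str) -> list[ProtFunEntry]:
    """Parse summarized ProtFun entries; lines in any other format are skipped."""
    entries = []
    for line in text.strip().splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        entries.append(
            ProtFunEntry(m["name"], float(m["prob"]), float(m["odds"]), m["mark"] is not None)
        )
    return entries


# Functional category entries for BCKDHA, copied from the list above.
BCKDHA_FUNCTIONAL = """
Central_intermediary_metabolism (Prob: 0.321, Odds: 5.096) (=>)
Amino_acid_biosynthesis (Prob: 0.187, Odds: 8.520)
Purines_and_pyrimidines (Prob: 0.257, Odds: 1.059)
Biosynthesis_of_cofactors (Prob: 0.246, Odds: 3.413)
"""

entries = parse_entries(BCKDHA_FUNCTIONAL)
by_odds = sorted(entries, key=lambda e: e.odds, reverse=True)
for e in by_odds:
    flag = "=>" if e.marked else ""
    print(f"{e.name:35s} prob={e.prob:.3f} odds={e.odds:.3f} {flag}")
```

Sorting by odds score puts Amino_acid_biosynthesis first, although the "=>" marker sits on Central_intermediary_metabolism. The marker therefore does not simply follow the highest odds score, which is why probability and odds score are both considered in the discussion.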

A4_HUMAN

Figure 47: Prediction of GO-terms for A4_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Transport_and_binding (Prob: 0.827, Odds: 2.016)
    • Biosynthesis_of_cofactors (Prob: 0.261, Odds: 3.623)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.392, Odds: 1.368)
  • Enzyme class
    • Ligase (Prob: 0.048, Odds: 0.946)
    • Transferase (Prob: 0.208, Odds: 0.603)
    • Hydrolase (Prob: 0.190, Odds: 0.600)
  • Gene Ontology category
    • Structural_protein (Prob: 0.034, Odds: 1.205) (=>)
    • Stress_response (Prob: 0.076, Odds: 0.862)
    • Signal_transducer (Prob: 0.126, Odds: 0.586)

The ProtFun 2.2 predictions for A4_HUMAN are shown in Figure 47; the most significant results are listed above. A4_HUMAN is assigned to the functional category "Cell_envelope" with an odds score of 13.186, which indicates a very confident prediction, and UniProt confirms it. The next entry, transport and binding, is also correct although its odds score is much lower than the first one, as is the claim that A4_HUMAN is involved in the biosynthesis of cofactors. In the "Gene Ontology category", ProtFun 2.2 predicts a structural protein, which agrees with the information given at the beginning of this section. The two other listed GO terms are correct as well. Comparing the whole list of predicted GO terms in Figure 47 with the known annotations shows that all of the predictions are right.

BACR_HALSA

Figure 48: Prediction of GO-terms for BACR_HALSA by ProtFun 2.2
  • Functional category
    • Transport_and_binding (Prob: 0.791, Odds: 1.929) (=>)
    • Biosynthesis_of_cofactors (Prob: 0.186, Odds: 2.589)
    • Purines_and_pyrimidines (Prob: 0.302, Odds: 1.244)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.801, Odds: 1.122)
  • Enzyme class
    • none
  • Gene Ontology category
    • Transporter (Prob: 0.400, Odds: 4.036) (=>)
    • Receptor (Prob: 0.355, Odds: 2.087)

The ProtFun 2.2 predictions for BACR_HALSA are shown in Figure 48; the most significant results are listed above. Both the "Functional category" and the "Gene Ontology category" identify BACR_HALSA mainly as a transporter, so this prediction can be considered reliable. The UniProt GO annotations include ion transport and proton transport, as well as transport itself, so the prediction is correct. The prediction that BACR_HALSA is involved in the biosynthesis of cofactors, however, is wrong, even though its odds score is fairly good. Its probability is very low, which shows that probability and odds score need to be considered together. Finally, ProtFun 2.2 predicts receptor function. This is also correct, because UniProt lists receptor activity for this protein.

INSL5_HUMAN

Figure 49: Prediction of GO-terms for INSL5_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.756, Odds: 12.393) (=>)
    • Transport_and_binding (Prob: 0.834, Odds: 2.033)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.791, Odds: 1.109)
  • Enzyme class
    • none
  • Gene Ontology category
    • Hormone (Prob: 0.247, Odds: 37.936) (=>)
    • Growth_factor (Prob: 0.061, Odds: 4.379)

The ProtFun 2.2 predictions for INSL5_HUMAN are shown in Figure 49; the most significant results are listed above. The assignment of INSL5_HUMAN to the functional category "Cell_envelope" has a very high odds score of 12.393, which suggests that the prediction is correct, and UniProt confirms it. The prediction that INSL5_HUMAN is involved in transport and binding is also correct. ProtFun also predicted the hormone activity of INSL5_HUMAN correctly, as the comparison with UniProt shows. Since hormone activity is the only GO term listed for this protein, the prediction of "growth factor" has to be wrong.

LAMP1_HUMAN

Figure 50: Prediction of GO-terms for LAMP1_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Transport_and_binding (Prob: 0.834, Odds: 2.033)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.724, Odds: 1.014)
  • Enzyme class
    • none
  • Gene Ontology category
    • Immune_response (Prob: 0.371, Odds: 4.368) (=>)
    • Stress_response (Prob: 0.246, Odds: 2.795)

The ProtFun 2.2 predictions for LAMP1_HUMAN are shown in Figure 50; the most significant results are listed above. The protein is assigned to the cell envelope with a high probability and a high odds score. As expected, this result is correct, since it also appears among the GO terms in UniProt. In contrast, the prediction of transport and binding, which has a good probability but only a low odds score, is wrong. The GO category "Immune_response" is not wrong, because autophagy is often triggered by the immune system in response to foreign substances. Since autophagy is listed in UniProt, the prediction of stress response can also be regarded as correct.

RET4_HUMAN

Figure 51: Prediction of GO-terms for RET4_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Central_intermediary_metabolism (Prob: 0.197, Odds: 3.128)
    • Transport_and_binding (Prob: 0.800, Odds: 1.951)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.544, Odds: 1.900)
  • Enzyme class
    • Lyase (Prob: 0.059, Odds: 1.264) (=>)
    • Hydrolase (Prob: 0.235, Odds: 0.742)
  • Gene Ontology category
    • Immune_response (Prob: 0.239, Odds: 2.813) (=>)
    • Stress_response (Prob: 0.616, Odds: 1.829)

The ProtFun 2.2 predictions for RET4_HUMAN are shown in Figure 51; the most significant results are listed above. ProtFun 2.2 assigns RET4_HUMAN to "Cell_envelope" with a very high probability and a significant odds score, and the comparison with UniProt shows that this prediction is right. The prediction that the protein is involved in metabolism has a low probability and a much lower odds score than the first hit, but it is nevertheless correct. The last of the three listed functional categories is also predicted accurately. For the predicted immune response, we could not find any support in the UniProt GO terms, whereas the prediction of stress response is correct.

References

<references />

