Difference between revisions of "Secondary Structure Prediction BCKDHA"

From Bioinformatikpedia
 
== 1. Secondary structure prediction ==
 
=== General Information ===

The secondary structure of a protein is based on its primary structure and consists of alpha-helices, beta-sheets and coils.<br>
  +
==== alpha-helices ====
  +
[[File:Alphahelix.jpg|right|120px|thumb|Figure 1: alpha-helix]]
Alpha-helices ([[:File:Alphahelix.jpg |Figure 1]]) are formed by hydrogen bonds between the NH-group of an amino acid and the CO-group of the amino acid four residues earlier (i+4). This is the most common type of helix. There are two other helix types, which are very rare. One is the 3<sub>10</sub>-helix, in which the hydrogen bond connects the NH-group with the CO-group three residues earlier (i+3). The other is the pi-helix, in which the hydrogen bond connects the NH-group with the CO-group five residues earlier (i+5). The different positions of the bonded CO-group influence the width and the height of the helices.<br>
  +
  +
==== beta-sheets ====
  +
  +
[[File:BetaSheet.jpg|120px|right|thumb|Figure 2: beta-sheet]]
The hydrogen bonds between the CO-group and the NH-group which form a beta-sheet ([[:File:BetaSheet.jpg |Figure 2]]) can connect residues that are far away from each other in the sequence. <br>
There are two kinds of beta-sheets: parallel sheets, in which all strands point in the same direction, and anti-parallel sheets, in which neighbouring strands point in alternating directions.<br>
  +
  +
==== coils ====
  +
Coils are irregularly formed elements such as turns. <br><br>
   
 
=== PSIPRED ===

==== Basic information ====

author: David T. Jones (University College London)<br>
year: 1998 <br>
version: 2 <br>
<br>
===== References =====

[http://bioinf.cs.ucl.ac.uk/psipred/ PSIPRED Server] <br>
[http://bioinf.cs.ucl.ac.uk/index.php?id=779 Overview of prediction methods] <br>
[http://cms.cs.ucl.ac.uk/typo3/fileadmin/bioinf/PSIPRED/psipred_history.html History of PSIPRED] <br>
 
 
===== Theory =====

PSIPRED uses neural networks with a single hidden layer and a feed-forward architecture trained by back-propagation.

For the online prediction on the server it is enough to enter an amino acid sequence.

In a very stringent cross-validation, PSIPRED reaches an average Q3 score of 80.7%.
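
The Q3 score is the percentage of residues whose three-state label (H, E or C) is predicted correctly. The following minimal Python sketch illustrates the calculation; the function name and the two toy strings are hypothetical and are not PSIPRED output.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(predicted)


# Toy data (hypothetical): the strings differ at positions 2 and 6,
# so 8 of 10 residues agree.
toy_q3 = q3_score("CHHHHEEECC", "CCHHHHEECC")
```

Residues without an observed label would have to be removed from both strings before the comparison.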
 
 
===== Algorithm =====

The prediction is split into three steps. In the first step a sequence profile is generated: the position-specific scoring matrix from PSI-BLAST serves as input for the neural network. In the second step the secondary structure is predicted. In the last step the output of the secondary structure prediction is filtered.<br>
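
The first step can be pictured as cutting the profile into one fixed-size window per residue, which is then handed to the network. The sketch below only illustrates this idea: the window size of 15, the zero padding at the sequence ends and the function name are assumptions, not details taken from this page.

```python
from typing import List


def profile_windows(pssm: List[List[float]], window: int = 15) -> List[List[float]]:
    """Build one flattened input vector per residue from a sequence profile.

    pssm   -- one row of scores per residue (for example 20 values per row)
    window -- odd number of residues per window, centred on the residue (assumed value)
    Positions beyond the sequence ends are padded with zeros (a simplification).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not pssm:
        return []
    width = len(pssm[0])
    half = window // 2
    pad = [0.0] * width
    inputs = []
    for centre in range(len(pssm)):
        vector: List[float] = []
        for pos in range(centre - half, centre + half + 1):
            row = pssm[pos] if 0 <= pos < len(pssm) else pad
            vector.extend(row)
        inputs.append(vector)
    return inputs


# Toy profile (hypothetical): 3 residues with 2 scores each, window of 3 residues.
toy_inputs = profile_windows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], window=3)
```

Each residue yields a vector of window × row-width values, so a real profile with 20 columns and a window of 15 would give 300 inputs per residue.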
 
   
===== What is predicted =====

PSIPRED predicts the secondary structure. <br>
   
===== Features =====

There are three different options: <br>
- Mask low complexity regions <br>
- Mask transmembrane helices <br>
- Mask coiled-coil regions <br><br>
It is also possible to get an email with the results of PSIPRED.
   
===== Required information =====

To run locally, PSIPRED requires the output of PSI-BLAST (Position-Specific Iterated BLAST) as input data. <br>
<br><br>
 
 
==== Prediction ====

[[File:PSIPREDbild.png |right| thumb|230px |Figure 3: Visualization of the prediction of PSIPRED]]

Seq MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
 
  +
Pred CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC
  +
UniProt
  +
  +
Seq KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
  +
Pred CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHH
  +
UniProt EEEE HHH HH
  +
  +
Seq KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
  +
Pred HHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCHHHHHHHHHHCCCC
  +
UniProt HHHHHHHHHHHHHHHHHHHHHHHH EEE HHHHHHHH
  +
  +
Seq TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
  +
Pred CCEEECCCCHHHHHHHCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCC
  +
UniProt EEE HHHHHH HHHHHHHHH CCCC CCC
  +
  +
Seq HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
  +
Pred CCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHH
  +
UniProt C CCCHHHHHHHHHHHHHHH EEEEEE HHH HHHHHHH
  +
  +
Seq NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
  +
Pred HHHHHHCCCEEEEEECCCCCCCCCCCHHCCCCHHHHHCCCCCCCCCEECC
  +
UniProt HHHHH EEEEEEE EEE HHH EEE HHH HHH EEEEEE
  +
  +
Seq NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
  +
Pred HHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHH
  +
UniProt EEEEEEEEEEEEEEEEEE EEEEEE
  +
  +
Seq VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
  +
Pred HHHHHHCCCCHHHHHHHHHHCCCCC HHHHHHHHHHHHHHHHHHHHHHHC
  +
UniProt HHHHHHHHHCCCC HHHHHHHHHHHHHHHHHHHHHHHH
  +
  +
Seq PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
  +
Pred CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
  +
UniProt HHHH EEEE HHHHHHHHHHHHHHHHHHHH HHH
   
{| border="1" style="text-align:center; border-spacing:0;"
 
!start
 
!end
 
!structural element
 
|-
 
| 1 || 1 || C
 
|-
 
| 2 || 16 || H
 
|-
 
| 17 || 17 || C
 
|-
 
|18 || 25 || H
 
|-
 
|26 || 77 || C
 
|-
 
|78 || 82 || E
 
|-
 
|83 || 98 || C
 
|-
 
|99 || 124 || H
 
|-
 
| 125 || 136 ||C
 
|-
 
|137 || 146 || H
 
|-
 
| 147 || 152 || C
 
|-
 
|153 || 155 || E
 
|-
 
| 156 || 159|| C
 
|-
 
|160 || 166 || H
 
|-
 
|167 || 170 || C
 
|-
 
|171 || 178 || H
 
|-
 
|179 || 212 || C
 
|-
 
|213 || 225 || H

|-

|226 || 230 || C
 
|-
 
|231 || 236 || E
 
|-
 
|237 || 242 || C
 
|-
 
|243 || 256 || H
 
|-
 
|257 || 259 || C
 
|-
 
|260 || 265 || E
 
|-
 
|266 || 276 || C
 
|-
 
|277 || 278 || H
 
|-
 
|279 || 282 || C
 
|-
 
|283 || 287 || H
 
|-
 
|288 || 296 || C
 
|-
 
|297 || 298 || E
 
|-
 
|299 ||300 || C
 
|-
 
|301 || 318 || H
 
|-
 
|319 || 323 || C
 
|-
 
|324 || 329 || E
 
|-
 
|330 || 347 || C
 
|-
 
|348 || 356 || H
 
|-
 
|357 || 360 || C
 
|-
 
|361 || 370 || H
 
|-
 
|371 || 375 || C
 
|-
 
|377 || 399 || H
 
|-
 
|400 || 404 || C
 
|-
 
|405 || 413 || H
 
|-
 
|414 || 417 || C
 
|-
 
|418 || 434 || H
 
|-
 
|435 || 445 || C
 
|}
 
legend: H=alpha helix, E=beta strand, C=coil
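
The start/end table above can be derived from the per-residue prediction string by collapsing runs of identical letters. A minimal Python sketch follows; the function names are illustrative, and the string is the first 50-residue block of the PSIPRED prediction shown above.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple


def segments(states: str) -> List[Tuple[int, int, str]]:
    """Collapse a per-residue state string into (start, end, state) tuples.

    Positions are 1-based and inclusive, as in the table.
    """
    result = []
    position = 1
    for state, run in groupby(states):
        length = len(list(run))
        result.append((position, position + length - 1, state))
        position += length
    return result


def segment_counts(states: str) -> Counter:
    """Count how many segments of each state occur."""
    return Counter(state for _, _, state in segments(states))


# First block of the PSIPRED prediction (residues 1-50).
first_block = "CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC"
for start, end, state in segments(first_block):
    print(start, end, state)
```

Applied to the complete prediction string, `segment_counts` gives the number of coils, helices and strands per tool.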
 
   
  +
PSIPRED predicted 23 coils, 16 alpha helices and 6 beta sheets, as shown in the alignment above. In [[:File:PSIPREDbild.png |Figure 3]] these predictions are visualized by pink bars, which stand for the alpha helices, and yellow arrows, which symbolize the beta sheets. PSIPRED does not mark coils with a special symbol, so wherever there is neither a bar nor an arrow, there is a coil.
<br>
As the alignment of the predicted secondary structure with the secondary structure from UniProt shows, the prediction is completely wrong at the beginning. In the middle part it becomes better, but there are still many mistakes. PSIPRED seems to have more problems with beta sheets than with alpha helices: it more often predicts beta sheets which do not exist, or misses existing beta sheets, than it does for alpha helices. In most cases it predicts the alpha helices quite well. The comparison with the UniProt structure shows that especially the long alpha helices are predicted correctly, except for one long region in the middle of the sequence which should be a long beta sheet but is predicted as an alpha helix.
   
 
=== Jpred3 ===

year: 1998 <br>
version: 3 <br>
<br>
  +
Jpred uses a neural network to make its predictions. The algorithm Jnet predicts the secondary structure of a protein sequence or of a multiple alignment of protein sequences. The prediction accuracy for secondary structures lies above 81%. Additionally, Jpred predicts the solvent accessibility. <br>
Jpred3 needs a protein sequence or a multiple alignment of protein sequences as input.<br>
It is important that the target sequence is the first sequence in the multiple alignment, since the alignment is modified so that the first sequence does not have any gaps. The alignment has to be in MSF or BLC format. <br>
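
One plausible reading of this modification is that every alignment column in which the target sequence has a gap is removed. The Python sketch below illustrates that reading only; how Jpred3 implements the step internally is not described here, and the function name and the gap character "-" are assumptions (some alignment formats use "." instead).

```python
from typing import List


def drop_target_gap_columns(alignment: List[str], gap: str = "-") -> List[str]:
    """Remove every column in which the first (target) sequence has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i, residue in enumerate(alignment[0]) if residue != gap]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy alignment (hypothetical): the target has a gap in the third column.
toy_alignment = drop_target_gap_columns(["MA-VA", "MAKVA", "M--VA"])
```

After the call the target sequence is gap-free and all other rows keep their alignment to it.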
  +
<br>
 
===== References =====

[http://www.compbio.dundee.ac.uk/www-jpred/faq.html FAQ]<br>
<br>
===== Theory =====

Jpred uses the neural network Jnet to predict the secondary structure of a protein sequence or of a multiple alignment of protein sequences. The prediction accuracy for secondary structures lies above 81%. Additionally, Jpred makes predictions on solvent accessibility and coiled-coil regions. It predicts whether a residue is buried or exposed by using several cut-off values.

===== What is predicted =====

Jpred3 predicts the secondary structure from the sequence or the multiple alignment.<br>
It also predicts the relative solvent accessibility.

===== Features =====

Jpred3 has two different modes:
- single sequence
- multiple alignment

==== Prediction ====

When predicting the secondary structure of BCKDHA, JPred found many hits with very good e-values in other proteins. <br><br>
'''e-value=0.0<br>'''
2bew, 2bev, 2beu, 1x80, 1wci, 1u5b, 1olx, 1ols, 1dtw, 1x7y, 1x7z, 1x7x, 1x7w, 2j9f, 2bff, 1v1r, 1olu, 1v16, 1v11, 2bfc, 2bfb, 1v1m, 2bfd, 2bfe
<br><br>
'''e-value=6e-58<br>'''
1umd, 1umc, 1umb, 1um9
<br><br>
'''e-value=1e-57<br>'''
2bp7, 1qs0, 1w85, 3dva, 1w88
<br>
<br><br>
[[File:All_JPred_BCKDHA.png |thumb| 550px |Figure 4: Visualization of the prediction of JPred (alpha helices: red bars; beta sheets: green arrows)<br> first line: prediction; second and third line: confidence of the prediction]]
With these hits JPred ran the prediction:

Seq MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred HHHHHHHHHHHHHH EEE
Conf 10090009999980000000323546777770000303566666777777
UniProt

  +
  +
Seq KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
  +
Pred EEEEE HH
  +
Conf 77777777777777654567777777308885377740467787776368
  +
UniProt EEEE HHH HH
  +
  +
Seq KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
  +
Pred HHHHHHHHHHHHHHHHHHHHHHHH E HHHHHHHHHHH
  +
Conf 99999999999999999999875045000001677517899999885278
  +
UniProt HHHHHHHHHHHHHHHHHHHHHHHH EEE HHHHHHHH
  +
  +
Seq TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
  +
Pred EEEE HHHHHHHH HHHHHHHHH
  +
Conf 84465157745788885065689988740677754577777545677777
  +
UniProt EEE HHHHHH HHHHHHHHH CCCC CCC
  +
  +
Seq HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
  +
Pred HHHHHHHHHHHH EEEEEE HHHHHHHH
  +
Conf 64132147888770367889998750688558887407887468999999
  +
UniProt C CCCHHHHHHHHHHHHHHH EEEEEE HHH HHHHHHH
  +
  +
Seq NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
  +
Pred HHHH EEEEEEE HHHHHHH EEEEE
  +
Conf 87500888606888703677777777777764067777005725774078
  +
UniProt HHHHH EEEEEEE EEE HHH EEE HHH HHH EEEEEE
  +
  +
Seq NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
  +
Pred HHHHHHHHHHHHHHHHH EEEEEEEEEE HHH
  +
Conf 74689999999999988507985588886354067777777765553688
  +
UniProt EEEEEEEEEEEEEEEEEE EEEEEE
  +
  +
Seq VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
  +
Pred HHHHHH HHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHHH
  +
Conf 99998468758999999986068866899999999999999999988606
  +
UniProt HHHHHHHHHCCCC HHHHHHHHHHHHHHHHHHHHHHHH
  +
  +
Seq PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
  +
Pred HHHHHHH HHHHHHHHHHHHHHHH
  +
Conf 887368777523688756899999999999875267777777888
  +
UniProt HHHH EEEE HHHHHHHHHHHHHHHHHHHH HHH
   
<br>
Comparing the Jpred prediction with the secondary structure of BCKDHA in UniProt, as done in the alignment above, shows that the prediction differs a lot from UniProt at the beginning but becomes much better in the middle and at the end. Jpred predicts more helices and fewer beta sheets than the UniProt secondary structure contains. It is interesting that Jpred predicts alpha helices at the beginning with quite high confidence, although there are none. This high confidence is also clearly visible in the visualization of the prediction ([[:File:All_JPred_BCKDHA.png |Figure 4]]), where it is displayed by black bars. In one part in the middle of the sequence Jpred predicts a very long alpha helix where there should be a beta sheet. It is interesting that PSIPRED also had problems with this beta sheet. In the rest of the middle part the prediction of Jpred is quite correct except for a few positions. [[:File:All_JPred_BCKDHA.png |Figure 4]] underlines that the protein mainly consists of alpha helices, since mainly red bars are shown.

Secondary structure elements of four of the hits:

{| border="1" style="text-align:center; border-spacing:0;"
!colspan="2" |1u5b (e-value:0)
!width="60" |
!colspan="2" |1umd (e-value:6e-58)
!width="60" |
!colspan="2" |1qs0 (e-value:1e-57)
!width="60" |
!colspan="2" |3dv0 (e-value:2e-51)
|-
!colspan="2" |[http://www.ebi.ac.uk/pdbsum/1u5b EMBL-EBI]
!
!colspan="2"|[http://www.ebi.ac.uk/pdbsum/1umd EMBL-EBI]
!
!colspan="2"|[http://www.ebi.ac.uk/pdbsum/1qs0 EMBL-EBI]
!
!colspan="2"|[http://www.ebi.ac.uk/pdbsum/3dv0 EMBL-EBI]
|-
!colspan="2" |[http://www.uniprot.org/uniprot/P12694 UniProt]
!
!colspan="2"|[http://www.uniprot.org/uniprot/Q5SLR4 UniProt]
!
!colspan="2"|[http://www.uniprot.org/uniprot/P09060 UniProt]
!
!colspan="2"|[http://www.uniprot.org/uniprot/P21873 UniProt]
|-
!position
!structural element
!
!position
!structural element
!
!position
!structural element
!
!position
!structural element
|-
| || || || || || || 10–19 || alpha helix|| || ||
|-
| || || || || || || 24-26|| alpha helix || || ||
|-
| || || ||35-60 || alpha helix || ||36–38 || alpha helix || || ||
|-
|61-64 || beta strand || || || || ||44–47|| alpha helix || || 44–69 || alpha helix
|-
| || || || || || ||48–51|| alpha helix || || ||
|-
| || || || || || ||67–69|| alpha helix || || ||
|-
| || || ||74-83 || alpha helix || ||74–99|| alpha helix|| || ||
|-
| || || || || || || || || || 83–91 || alpha helix
|-
|91-93|| alpha helix || || 89-92 || beta strand|| || || || || ||
|-
|99-124 || alpha helix || || 98-104 || alpha helix|| ||102–104|| beta strand |||| 98–100 || beta strand
|-
| || || ||108-116 || alpha helix ||||110–112 ||turn|| ||106–111 || alpha helix
|-
| || || || 122-125 || turn || ||113–122||alpha helix|| || 116–124 || alpha helix
|-
|127-129 || beta strand || || || || ||127–130|| beta strand|| ||127–130 || alpha helix
|-
| || || || 135-138 || turn|| || || || || ||
|-
|138-146 || alpha helix || || 144-147 || beta strand|| ||136–141|| alpha helix|| ||||
|-
|152-154 || beta strand || ||150-162 || alpha helix|| ||146–154||alpha helix||||147 – 161|| alpha helix
|-
|161-166 || alpha helix || || || || ||160–163|| turn|||| ||
|-
|171-179 || alpha helix || || 169-173 || beta strand || ||173–175 ||alpha helix|| ||168–173 ||beta strand
|-
| || || || 176-179 || alpha helix || || 175–178 || alpha helix || || ||
|-
|185-188 || turn || || 181-193 || alpha helix || ||182–185|| beta strand || ||180–191 || alpha helix
|-
|198-201 || turn || || 196-204 || beta strand || ||186–200|| alpha helix || ||196–202 || beta strand
|-
| |||| |||| ||||207–212|| beta strand|| || 204–206||beta strand
|-
|209-211 || turn || || || || ||211–213 || alpha helix || || ||
|-
|212-226 || alpha helix || || 222-227 || alpha helix|| ||214–217|| alpha helix|| ||221–226 || alpha helix
|-
|232-237 || beta strand || || 232-236 || beta strand || ||219–231|| alpha helix|| ||231–235||beta strand
|-
|240-242 || alpha helix || || 240-255 || alpha helix|| ||235–241 || beta strand|| ||239–254|| alpha helix
|-
|244-255 || alpha helix || || || || ||243–245|| beta strand|| || ||
|-
| || || || || || ||250–253 || alpha helix|| || ||
|-
| || || || || || ||254–257|| turn || || ||
|-
|260-266 || beta strand || || 261-267 || beta strand|| ||262–266||alpha helix || ||260–265|| beta strand
|-
|268-270 || beta strand || || || || || || || || ||
|-
| || || || || || ||270–275|| beta strand || || ||
|-
|275-277 || alpha helix || || || || || || || || ||
|-
|280-282 || beta strand || || || || || 279 – 294|| alpha helix || || ||
|-
|285-287 || alpha helix || || || ||||285–292|| alpha helix || || ||
|-
|289-291 || alpha helix || || || || || || || || ||
|-
|294-299 || beta strand |||| 296-305 || alpha helix || ||300–305|| beta strand ||||296–306|| alpha helix
|-
|303-320 || beta strand || ||306-308 || turn|| ||318–320|| alpha helix || || ||
|-
|324-329 || beta strand || || 312-334 || alpha helix || ||326–329 || alpha helix|| ||312–334 || alpha helix
|-
| || || || 341-345 || alpha helix|| || 335–345||alpha helix || ||341–346|| alpha helix
|-
| || || || 348-351 || beta strand|| || || || || ||
|-
|360-368 || alpha helix|| || 354-366 || alpha helix|| ||351–373|| alpha helix || || 354–366 || alpha helix
|-
|369-372 || turn || || || || || || || || ||
|-
|376-399 || alpha helix|| || || || ||378–380 || beta strand|| || ||
|-
| || || || || || ||388–390|| alpha helix || || ||
|-
| || || || || || ||391–396 || beta strand|| || ||
|-
|405-408 || alpha helix|| || || || ||399–406|| alpha helix || || ||
|-
|412-415 || beta strand|| || || || || || || || ||
|-
|418-434 || alpha helix|| || || || || || || || ||
|-
|435-437 || alpha helix|| || || || || || || || ||
|-
|440-442 || alpha helix|| || || || || || || || ||
|}

{| border="0" style="text-align:center; border-spacing:0;"
!1u5b
!1umd
!1qs0
!3dv0
|-
|e-value:0 || e-value:6e-58 || e-value:1e-57|| e-value:2e-51
|-
|[[File:1u5bStructurePicture.png]] ||[[File:1umdStructurePicture.png]] ||[[File:1qs0StructurePicture.png]]||[[File:3dv0StructurePicture.png]]
|}

===== Required information =====

Jpred3 needs a protein sequence or a multiple alignment of protein sequences as input.

Multiple sequence: The first sequence has to be the target sequence, since the alignment is modified so that the first sequence does not have any gaps. The alignment has to be in MSF or BLC format.

Single sequence: For the sequence a multiple alignment is constructed with PSI-BLAST (3 iterations).
<br>

=== DSSP ===

==== Basic information ====

author: Wolfgang Kabsch and Chris Sander (Max-Planck-Institut für medizinische Forschung, Heidelberg)<br>
year: 1983 <br>
whole name: '''D'''efine '''S'''econdary '''S'''tructure of '''P'''roteins
<br><br>
Based on atomic coordinates in Protein Data Bank format, DSSP defines the secondary structure of a protein.<br>
With this method the secondary structure is not predicted but determined from the 3D coordinates.
<br>

===== References =====

[http://swift.cmbi.ru.nl/gv/dssp/ Introduction]<br>
[http://swift.cmbi.ru.nl/gv/dssp/DSSP_3.html Explanation]

==== Prediction ====

[[File:DSSPOutputColored.png | right | thumb|250px | Figure 5: Visualization of the prediction of DSSP.]]

Seq KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMT
 
  +
Pred TT T TT T T TTT T 333 HHHHHHHHHHHH
  +
UniProt EEEEE HH
  +
  +
Seq LLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYP
  +
Pred HHHHHHHHHHHHHHTTTTT TT HHHHHHHHHTT TTTSSS TT HHHHHHTT
  +
UniProt HHHHHHHHHHHHHH E HHHHHHHHHHH EEEE HHHHHHHH
  +
  +
Seq LELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANR
  +
Pred HHHHHHHHHT TT TTTT T TT TTTT TTTTTHHHHHHHHHHHHHHTT
  +
UniProt HHHHHHHHH HHHHHHHHHHHH
  +
  +
Seq VVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPG
  +
Pred SSSSSSTT333THHHHHHHHHHHHTT SSSSSSS TSSTTSS333T TTTTT333T33
  +
UniProt EEEEEE HHHHHHHHHHHH EEEEEEE HHHHHHH
  +
  +
Seq YGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYR
  +
Pred 3T SSSSSSTT HHHHHHHHHHHHHHHHHHT SSSSSS T TTTT 333T
  +
UniProt EEEEE HHHHHHHHHHHHHHHHH EEEEEEEEEE
  +
  +
Seq VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFS
  +
Pred HHHHHHT HHHHHHHHHHHHTT HHHHHHHHHHHHHHHHHHHHHHHHT 3333TT
  +
UniProt HHHHHH HHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHHH HHHHHH
  +
  +
Seq DVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
  +
Pred TTTTT HHHHHHHHHHHHHHHHH333T 333
  +
UniProt H HHHHHHHHHHHHHHHH
   
<br><br>
<b> Description of the visualization of the prediction</b><br>
It is important to know that the first 50 amino acids of the sequence are not shown, and that the part relevant for our protein ends at position 391.<br>
1. line: sequence <br>
2. line: structural elements <br>
3. line: if a residue is involved in symmetry contacts it is labeled with a star <br>
4. line: if a residue is solvent accessible it is labeled with an "A"

===== Letter code for the secondary structure elements =====

* H (blue): alpha helix<br>
* 3 (yellow): residue in isolated beta-bridge <br>
* T (red): hydrogen bonded turn <br>
* S (green): bend

The comparison of the DSSP assignment with the structure of BCKDHA in UniProt shows that they match to a large extent. Especially the alpha helices are mostly assigned correctly. As the blue regions in [[:File:DSSPOutputColored.png |Figure 5]] show, the protein mainly consists of alpha helices, so most of the assignment is exact. The comparison with the UniProt structure also shows that DSSP has some problems assigning beta sheets.<br>
DSSP offers much more information than the two other tools, since it assigns not only alpha helices, beta sheets and turns but also symmetry contacts and solvent accessibility.
<br><br>
[[File:Output.pdf]] <br>
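
This page does not spell out how DSSP gets from coordinates to structural elements. In the method published by Kabsch and Sander, the basic building block is an electrostatic hydrogen-bond energy between a backbone C=O group and an N-H group; a bond is assumed when the energy is below -0.5 kcal/mol. The Python sketch below shows only this energy term. The function names and the toy coordinates are hypothetical; the constants are the published ones (0.084 is the product of the partial charges 0.42 and 0.20, and 332 is the conversion factor).

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def hbond_energy(c: Point, o: Point, n: Point, h: Point) -> float:
    """Electrostatic hydrogen-bond energy in kcal/mol between C=O and N-H.

    All coordinates are given in Angstrom.
    """
    return 0.084 * 332.0 * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(c: Point, o: Point, n: Point, h: Point, cutoff: float = -0.5) -> bool:
    """A hydrogen bond is assumed when the energy lies below the cut-off."""
    return hbond_energy(c, o, n, h) < cutoff


# Toy geometry (hypothetical): all four atoms on one line, O...H distance 1.9 Angstrom.
c_atom, o_atom = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h_atom, n_atom = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
toy_energy = hbond_energy(c_atom, o_atom, n_atom, h_atom)
```

Helices and sheets are then recognised from recurring patterns of such bonds, for example bonds between residues i and i+4 for alpha helices.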
 
   
 
== 2. Prediction of disordered regions ==
 
=== General information ===

Disordered regions are long regions which do not have a regular secondary structure. They are dynamically flexible and adopt a regular structure only when they bind to a substrate or another protein. In these regions polar and charged amino acids, and especially proline, are overrepresented. Disordered regions are conserved and occur mainly in regions which have a regulatory function.
Since disordered regions have no clear secondary structure they also have no tertiary structure.
  +
  +
   
 
===DISOPRED===
 
==== Basic information ====

author: Jonathan J. Ward, Liam J. McGuffin, Kevin Bryson, Bernard F. Buxton and David T. Jones (University College London) <br>
year: 2004 <br>
version: 2 <br>
<br>
DISOPRED2 defines disordered residues as those which appear in the sequence records but have no coordinates in the electron density map. This is a very simple definition of disorder, and it is not perfect, because the absence of coordinates can also be caused by artifacts of the crystallization process.
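
The definition above can be illustrated by comparing the full sequence with the residues that actually have coordinates. The following Python sketch shows only this definition; it is not the DISOPRED2 predictor itself, and the function name and toy numbers are hypothetical.

```python
from typing import Iterable, List, Tuple


def missing_regions(sequence_length: int, observed: Iterable[int]) -> List[Tuple[int, int]]:
    """Return 1-based inclusive ranges of residues that have no coordinates.

    sequence_length -- number of residues in the sequence record
    observed        -- residue positions for which coordinates exist
    """
    seen = set(observed)
    regions = []
    start = None
    for position in range(1, sequence_length + 1):
        if position not in seen:
            if start is None:
                start = position
        elif start is not None:
            regions.append((start, position - 1))
            start = None
    if start is not None:
        regions.append((start, sequence_length))
    return regions


# Toy data (hypothetical): 10 residues, coordinates only for 3-6 and 8.
toy_missing = missing_regions(10, [3, 4, 5, 6, 8])
```

For a real structure the observed positions would have to be mapped from the coordinate file numbering to the sequence numbering first.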
  +
  +
===== References =====
  +
[http://bioinformatics.oxfordjournals.org/content/early/2004/03/25/bioinformatics.bth195.full.pdf Publication] <br>
  +
[http://bioinf.cs.ucl.ac.uk/disopred/ DISOPRED server] <br>
  +
[http://bioinf.cs.ucl.ac.uk/index.php?id=806 Information] <br>
  +
<br>
  +
  +
==== Prediction ====
  +
 
{|
|[[File:DisopredOutseq.png | right| 550px|thumb|Figure 6: Prediction of the disordered regions]]|| &nbsp;&nbsp;||[[File:Disopredplot.png | thumb|400px|Figure 7: Profile plot of the disordered regions]]
|}

The first line gives the confidence of the prediction, which is shown in the second line. A predicted disordered residue is marked with an asterisk (*). All of the disordered regions are predicted with very high confidence. <br>
DISOPRED predicts disordered regions mainly at the beginning and a few at the end of BCKDHA, as shown by the red fields in [[:File:DisopredOutseq.png |Figure 6]]. <br>
[[:File:Disopredplot.png |Figure 7]] also shows that the disordered regions lie at the beginning and at the end, since the highest peaks are found at these two ends.
   
 
===POODLE===
 
==== Basic information ====

POODLE uses machine learning approaches to predict the disordered regions of an amino acid sequence. <br>

author: <br>
- POODLE-L: S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi <br>
- POODLE-S: K. Shimizu, Y. Muraoka, S. Hirose and T. Noguchi <br>
- POODLE-W: K. Shimizu, Y. Muraoka, S. Hirose, K. Tomii and T. Noguchi <br>
- POODLE-I: S. Hirose, K. Shimizu, N. Inoue, S. Kanai and T. Noguchi <br>

year: <br>
- POODLE-L 2007 <br>
- POODLE-S 2007 <br>
- POODLE-W 2007 <br>
- POODLE-I 2008 <br><br>

options: <br>
POODLE-L: This tool searches for disordered regions which are longer than 40 consecutive amino acids.<br>
POODLE-S: Here the focus lies on predicting short disordered regions. There are two different subtools: "Missing residues" and "High B-factor residues"<br>
POODLE-W: With this option proteins which are mostly disordered can be found. <br>
POODLE-I: This tool combines the other three tools. POODLE-I also uses structural information to predict disordered regions. It is based on a work-flow approach.
<br>

==== References ====

[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17545177&ordinalpos=8&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-L]<br>
[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17599940&ordinalpos=7&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-S]<br>
[http://www.biomedcentral.com/1471-2105/8/78/abstract POODLE-W]<br>
[http://mbs.cbrc.jp/poodle/help-i.html POODLE-I]<br>
[http://mbs.cbrc.jp/poodle/poodle.html POODLE server]<br>
[http://mbs.cbrc.jp/poodle/help.html Help]<br>

=== Prediction ===

==== POODLE-S ====

{|align="center"
!POODLE-S<br> Missing residues
!POODLE-S<br> High B-factor residues
|-
|[[File:S_Missing_Residues.png| 400px | thumb | Figure 8: POODLE-S (Missing residues): disordered region prediction]] || [[File:S_BFactor.png | 400px |thumb | Figure 9: POODLE-S (High B-factor residues): disordered region prediction]]
|}
<br>
{|border="1" align="center"
!
! colspan="3"|POODLE-S<br> Missing residues
! colspan="6"|POODLE-S<br> High B-factor residues
|-
|disordered region||1-56||341-345||420-423||6-9||15-57||93||95-96||340-354||379-402
|-
|average confidence||0.75||0.58||0.56||0.63||0.77||0.53||0.55||0.67||0.59
|}

Detailed sequences with disordered region probability: [[File:PoodleSMissingResiduesOut.pdf]] (Missing residues), [[File:PoodleSFactorBOut.pdf]] (High B-factor residues)<br>
The probability ranges from 0 to 1, where 0 means that there is no disordered region and 1 that there is a disordered region.

POODLE-S (which predicts short disordered regions) with the option "Missing residues" predicted disordered regions at the positions 1-56, 341-345 and 420-423. This is also shown in [[:File:S_Missing_Residues.png |Figure 8]]. The peaks above the cut-off value of 0.5, in the green region, stand for the disordered regions. At the beginning there is a very high and also very long peak. This shows that the tool predicts a long region without fixed structure at the beginning of the protein with high confidence. The average confidence of 0.75 can also be seen in the table under the figures. The other two values in this table (0.58 and 0.56) show that the predictions of the two disordered regions at the end of the protein do not have a very high confidence.

We also ran POODLE-S with the option "High B-factor residues". Here disordered regions were predicted at the positions 6-9, 15-57, 93, 95-96, 340-354 and 379-402. This is also shown in [[:File:S_BFactor.png |Figure 9]]. This option predicts more regions without fixed structure but, as with the option "Missing residues", they lie at the beginning and at the end of the protein. Comparing [[:File:S_BFactor.png |Figure 9]] with [[:File:S_Missing_Residues.png |Figure 8]] shows that the predictions at the end are made with more confidence in the second run with "High B-factor residues". The peaks are much higher and also longer, which shows that the predicted disordered regions are longer.<br>
In both runs POODLE-S shows much variation in the middle part of the protein between the peaks. There are always small peaks, but they are not high enough to exceed the cut-off value.
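
The region boundaries and average confidences in the table can be reproduced from the per-residue probabilities by collecting runs of values above the cut-off. A minimal Python sketch, assuming that "above the cut-off" means strictly greater than 0.5; the function name and the toy probabilities are hypothetical.

```python
from typing import List, Tuple


def disordered_regions(probabilities: List[float], cutoff: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean probability) for each run of residues above the cut-off.

    Positions are 1-based and inclusive.
    """
    regions = []
    start = None
    for index, value in enumerate(probabilities, start=1):
        if value > cutoff:
            if start is None:
                start = index
        elif start is not None:
            run = probabilities[start - 1:index - 1]
            regions.append((start, index - 1, sum(run) / len(run)))
            start = None
    if start is not None:
        run = probabilities[start - 1:]
        regions.append((start, len(probabilities), sum(run) / len(run)))
    return regions


# Toy probabilities (hypothetical), not taken from the POODLE output.
toy_regions = disordered_regions([0.75, 0.75, 0.25, 0.25, 1.0])
```

Filtering the result for regions longer than 40 residues would mimic the length criterion of POODLE-L.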
  +
  +
==== POODLE-L ====
  +
[[File:POODLE_L_BCKDHA.png | 500px|thumb|center|Figure10: POODLE-L : prediction of disordered regions]]
  +
  +
<br>
  +
{|border="1" align="center"
  +
|disordered region||1-48||369-428
  +
|-
  +
|average confidence||0.6||0.67
  +
|}
  +
  +
<br>
  +
POODLE-L predicts two disordered regions which are longer than 40 amino acids. They are located at the positions 1-48 and 369-428. In [[:File:POODLE_L_BCKDHA.png |Figure 10]] we can see that the predictions are at the beginning and at the end of the protein. Both predictions only have low peaks, so POODLE-L is not completely confident about them. This observation is supported by the average confidence values of 0.6 and 0.67. A possible explanation is that POODLE-L searches for long disordered regions and that the two regions, with 48 and 60 amino acids, are too short to be a very good match. <br>

Since POODLE-L only looks for long disordered regions, it reports no disordered regions in the rest of the protein. This is supported by [[:File:POODLE_L_BCKDHA.png |Figure 10]], which shows no small peaks in the middle of the plot.
  +
<br><br>
   
  +
==== POODLE-W ====

[[File:PoodleW.png |thumb|280px|center|Figure11: POODLE-W: prediction of disordered regions]]<br>

In [[:File:PoodleW.png |Figure 11]] the regions which could be disordered, but for which POODLE-W is not sure, are bordered by blue squares, and the regions predicted as disordered with certainty are bordered by red squares.

Detailed sequence with disordered region:
[[File:PoodleWDOSeq.pdf]]<br>

0=ordered regions<br>
5=perhaps disordered regions<br>
9=disordered regions<br>
  +
  +
In this case there is no predicted disordered region at the beginning of the protein, which is completely different from the other two POODLE tools we already used. Instead, the disordered region at the end is predicted very well: the confidence is high, and the region predicted to be disordered is very long and reaches to the end of the protein. The first part of this region is predicted with low confidence, but the major part is assigned the highest possible confidence of 9, which can be seen in [[:File:PoodleW.png |Figure 11]] by the red box.
  +
  +
==== POODLE-I ====
  +
  +
[[File:POODLE_I_BCKDHA.png | thumb|500px|center|Figure12: POODLE-I: prediction of disordered regions]]
  +
  +
  +
{|border="1" align="center"
  +
|disordered region||1-56||341-345||370-427||443-445
  +
|-
  +
|average confidence||0.6||0.56||0.67||0.74
  +
|}
  +
  +
<br>
  +
POODLE-I predicted four disordered regions at the positions 1-56, 341-345, 370-427 and 443-445. These predictions are shown in [[:File:POODLE_I_BCKDHA.png |Figure 12]], where we can see that they are at the beginning and at the end of the protein. The peak at the beginning is quite long, but in its middle the score falls so low that it is nearly under the cut-off value. That is why the average value is also low. The plot ([[:File:POODLE_I_BCKDHA.png |Figure 12]]) nevertheless shows two confidence maxima for this peak, both around 0.7, which underlines that the prediction is fairly reliable. The next peak is very short and also has a poor average confidence of 0.56, so POODLE-I does not seem to be sure about this prediction. The third peak is longer than the other peaks and additionally has a good average confidence value of 0.67. The peak directly at the end of the protein has the highest value, but this is plausible since the structure is usually less well defined at the end of a protein. So we have to be careful with this hit because it can also be wrong.<br>

Between the predicted regions there are also many small peaks which are not high enough to exceed the threshold.
  +
  +
==== Comparison ====
  +
  +
{| border="1" align="center"
  +
!POODLE-S(Missing residues)
  +
!POODLE-S(High B-factor residues)
  +
!POODLE-L
  +
!POODLE-W
  +
!POODLE-I
  +
|-
  +
|1-56||6-9||1-48 ||325-445|| 1-56
  +
|-
  +
|341-345||15-57||369-428|| ||341-345
  +
|-
  +
|420-423||93|| || ||370-427
  +
|-
  +
| ||95-96|| || ||443-445
  +
|-
  +
| ||340-354|| || ||
  +
|-
  +
| ||379-402 || || ||
  +
|}
  +
  +
  +
Comparing the different POODLE tools, we can summarize that the disordered regions are mainly at the beginning or at the end of the protein. Only POODLE-S predicts them in the middle of the protein, but there the regions are so short and the confidence is so low that it is uncertain whether they are really disordered regions. The predicted disordered regions lie mainly at the positions 1-56 and 341-445. That the disordered regions are found at the beginning and at the end of the protein is not surprising, since the structure in these regions is usually not very well defined. Such a hit can therefore also be a false positive that arises only from the poorly defined secondary structure.
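The comparison can also be made per residue. The following Python sketch counts, for every position, how many of the five POODLE runs call it disordered and then lists the stretches supported by at least three runs. The regions are copied from the comparison table above; the function names and the threshold of three votes are our own choices for illustration and are not part of POODLE.

```python
from collections import Counter

# Predicted disordered regions (1-based, inclusive), copied from the comparison table
predictions = {
    "POODLE-S (missing residues)": [(1, 56), (341, 345), (420, 423)],
    "POODLE-S (high B-factor)": [(6, 9), (15, 57), (93, 93), (95, 96), (340, 354), (379, 402)],
    "POODLE-L": [(1, 48), (369, 428)],
    "POODLE-W": [(325, 445)],
    "POODLE-I": [(1, 56), (341, 345), (370, 427), (443, 445)],
}


def vote_counts(preds):
    """Count for each position how many tools predict it as disordered."""
    votes = Counter()
    for regions in preds.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        votes.update(covered)  # each tool contributes at most one vote per position
    return votes


def consensus(preds, min_votes):
    """Return the stretches of positions supported by at least min_votes tools."""
    votes = vote_counts(preds)
    positions = sorted(p for p, v in votes.items() if v >= min_votes)
    ranges = []
    for p in positions:
        if ranges and p == ranges[-1][1] + 1:
            ranges[-1][1] = p  # extend the current stretch
        else:
            ranges.append([p, p])  # start a new stretch
    return [tuple(r) for r in ranges]


print(consensus(predictions, min_votes=3))
```

With this threshold the short POODLE-S hits in the middle of the protein (93 and 95-96) drop out, because only one run supports them.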
  +
<br><br><br>
   
 
===IUPred===
 
   
  +
  +
==== Basic information ====
  +
  +
author: Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon <br>
  +
year: 2005 <br>
  +
  +
IUPred predicts disordered regions by estimating the capacity of polypeptides to form stabilizing contacts. The potential to form these contacts depends on the surrounding sequence and on the chemical properties. This approach is based on the idea that disordered regions have no capacity to form sufficient interresidue interactions so that there is no stabilizing energy.
  +
  +
There are three different prediction types which can be chosen:<br>
  +
* long disorder: predicts context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder <br>
  +
* short disorder: predicting short, probably context-dependent, disordered regions, such as missing residues in the X-ray structure of an otherwise globular protein <br>
  +
* structured regions: takes the energy profile and finds continuous regions confidently predicted ordered <br>
  +
  +
===== References =====
  +
[http://iupred.enzim.hu/index.html IUPred server]<br>

[http://iupred.enzim.hu/Theory.html Theory]
  +
<br>
  +
  +
==== Prediction ====
 
=====Prediction type: long disorder =====

[[File:Long.png|thumb|center|600px|Figure13: Prediction of disordered regions with IUPred (long disorder)]]

Detailed sequence with disordered region probability: [[File:LongSeqOut.pdf]]

{|border="1" align="center"
 
  +
|disordered region||33-50||89-93||385-388||390-397||399-401||404-413||420-422||424-428||431
  +
|-
  +
|average confidence||0.69||0.57||0.52||0.64||0.51||0.55||0.52||0.56||0.55
  +
|}
   
  +
<br>
  +
With the long disorder option, IUPred predicts several disordered regions. They are located at the positions 33-50, 89-93, 385-388, 390-397, 399-401, 404-413, 420-422, 424-428 and 431. Although there are many different regions, they are all located at the beginning or at the end of the protein. [[:File:Long.png |Figure 13]] shows that mainly the peak at the beginning has a high confidence. Since this hit is quite long, it also contains residues without high confidence, which is why the average value is only 0.69; this is still quite good. The second peak is very short and additionally has a weak confidence, so it is not certain whether this is a real hit. At the end there are many predicted disordered regions, but for all except one the prediction is quite uncertain, since the confidence value is only slightly above the cut-off value.<br>

Between these predicted disordered regions there are many peaks which are only slightly below the threshold. Judging from [[:File:Long.png |Figure 13]], the whole protein except for the middle part could be part of a disordered region.
   
 
=====Prediction type: short disorder =====

[[File:Short.png|thumb|600px|center|Figure14: Prediction of disordered regions with IUPred (short disorder)]]

Detailed sequence with disordered region probability: [[File:ShortSeqOut.pdf]]

{|border="1" align="center"
 
  +
|disordered region||1||33-55||92-93||393-411||415||420-421||423-425||427-428||433||438-445
  +
|-
  +
|average confidence||0.56||0.7||0.56||0.57||0.5||0.53||0.53||0.53||0.51||0.73
  +
|}
   
  +
<br><br>
  +
With the short disorder option, IUPred also predicts several disordered regions. They are located at the positions 1, 33-55, 92-93, 393-411, 415, 420-421, 423-425, 427-428, 433 and 438-445. The hit at position 1 can be neglected because it is just one residue long and its confidence value is only 0.56. The next predicted disordered region seems to be important because it is about 20 residues long and its average confidence value is 0.7. [[:File:Short.png |Figure 14]] shows that the average is only 0.7 because the peak is so long; the maximum confidence value of this region is about 0.8, which signals the high confidence of this disordered region. The next hit is only two residues long and has a maximum value of 0.57. Since it is so short, it can also be neglected.

After these predictions at the beginning of the protein there are many very short regions at the end of the protein. Most of them are only one to three residues long. There are two possible interpretations of these short regions. Either they are too short to be true disordered regions, which is supported by the low confidence values, or they have to be combined into one long disordered region. The second possibility is supported by the fact that all these short regions lie next to each other. Since all the other programs also predicted a disordered region at the end of the protein, we choose the second possibility. The last hit is the most significant one. It is only eight residues long, but its average confidence value is 0.73 and its maximum value is higher than 0.9. Such a clear prediction of a disordered region at the end of the protein is plausible because this part of a protein normally has no well-defined fixed structure. However, the absence of a defined secondary structure does not imply that the region has no function.
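The second interpretation, combining neighbouring short hits into one longer region, can be sketched as follows. The regions are taken from the table above. The function <code>merge_regions</code> and the maximum gap of four residues are our own assumptions for this illustration; IUPred does not perform such a merging step itself.

```python
def merge_regions(regions, max_gap):
    """Merge regions that are separated by at most max_gap residues.

    Regions are (start, end) tuples with 1-based, inclusive positions.
    """
    merged = []
    for start, end in sorted(regions):
        # number of residues lying between the previous region and this one
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# IUPred short disorder hits from the table above
iupred_short = [(1, 1), (33, 55), (92, 93), (393, 411), (415, 415),
                (420, 421), (423, 425), (427, 428), (433, 433), (438, 445)]

print(merge_regions(iupred_short, max_gap=4))
```

With a gap of at most four residues the hits at the end of the protein collapse into a single region, while the three hits at the beginning stay separate because they are more than 30 residues apart.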
   
 
=====Prediction type: structured regions =====

[[File:Structural.png|thumb|600px|center|Figure15: Prediction of disordered regions with IUPred (structured regions)]]

With the option "structured regions" the program did not find any disordered regions in the whole protein. Its only output was the line "Unkown globular domains: 1-445" and [[:File:Structural.png |Figure 15]].
 
   
  +
=== META-Disorder ===
 
  +
  +
To run META-Disorder we used the PredictProtein server. <ref> https://www.predictprotein.org/ </ref> <br>

This server does not only provide the prediction of disordered regions but also many other features, such as the effects of amino acid substitutions and protein-protein interaction sites. A complete list can be found on the PredictProtein wiki page <ref> https://rostlab.org/owiki/index.php/PredictProtein-Machine_Image#What_protein_features_are_predicted_by_methods_included_on_the_PredictProtein_Machine_Image.3F </ref>. It is also possible to download the PredictProtein Machine Image (PPVMI), which contains a fully functional Debian system, the prediction methods and the supporting databases, so that it is possible to work locally. A very good description of how to use it is given in the same wiki. One very important point is that PredictProtein stores all predictions, which makes it possible to get a very fast answer for jobs that were already computed. At the moment there are 4,405,120 annotated proteins in the PredictProtein cache.
  +
  +
=====Prediction=====
  +
[[File:MD.png | thumb | 650px|center |Figure 15: Output of PredictProtein. The red box marks the prediction of the disordered regions. Disordered regions are red and non-disordered regions are green.]]
  +
  +
  +
{|border="1" align="center"
  +
|disordered region||1-9||394-400
  +
|-
  +
|average confidence||0.63||0.57
  +
|}
  +
  +
  +
With META-Disorder we only obtained two possible disordered regions. This can be seen in [[:File:MD.png |Figure 15]], where only the beginning and the end of the strand are red and the rest is green. The table also shows that the regions lie directly at the beginning and at the end. Since there are no other candidate disordered regions in this prediction and the green part seems to be clearly not disordered, these two hits could be wrong. In general, however, these regions of a protein often have no well-defined structure although they have a function.
  +
<br><br>
   
  +
=== Comparison ===
  +
Comparing the results of all disordered region prediction tools, we can see that all of them predicted disordered regions at the beginning and at the end of the protein. We have to be careful with these results because the structure of a protein is usually not very well defined in these regions, so the hits can arise from the poorly defined secondary structure alone. On the other hand, all of the programs predicted these regions, and most of them with high confidence. Because of this it seems quite likely that the beginning and the end of BCKDHA are disordered regions.
   
 
== 3. Prediction of transmembrane alpha-helices and signal peptides ==
 
  +
===General===
   
  +
''' Transmembrane Topology'''
Transmembrane topology and signal peptides are features that are likely to be conserved during evolution.
 
  +
  +
The prediction of the membrane topology of proteins aims at discovering which portions of the protein lie within the lipid bilayer of a membrane and which portions protrude from the membrane into the watery environment. Membrane-spanning polypeptides usually form helices of about 20 amino acids in length. As the surrounding membrane is hydrophobic, the membrane-spanning part of the protein consists of hydrophobic amino acids as well. This information can be used for the prediction of transmembrane helices, which subsequently enables the prediction of the membrane topology. <ref> http://en.wikipedia.org/wiki/Membrane_topology</ref><ref>http://en.wikipedia.org/wiki/Transmembrane_domain</ref>
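The idea that hydrophobic stretches of about 20 residues hint at transmembrane helices can be illustrated with a classic sliding-window hydropathy profile. The Python sketch below uses the Kyte-Doolittle hydropathy scale with a window of 19 residues and a threshold of 1.6, which are values commonly used with this scale. It is only meant to illustrate the principle and is not the method used by TMHMM or the other tools in this section, which are based on hidden Markov models. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of the given length (raises KeyError on non-standard residues)."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(seq) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, h in enumerate(profile) if h >= threshold]


# Toy sequence: a stretch of 19 leucines flanked by charged residues
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(toy))
```

Such a profile cannot tell a transmembrane helix from the hydrophobic core of a signal peptide, which is one reason why dedicated tools are used below.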
  +
  +
Prediction tools: [[#TMHMM | TMHMM]], [[#OCTOPUS and SPOCTOPUS| OCTOPUS and SPOCTOPUS]]
  +
  +
'''Signal Peptides'''
  +
  +
Signal peptides are N-terminal sequence motifs directing proteins to their cellular destination, like secretory pathway, mitochondria and chloroplast.
  +
One example of a signal peptide is the secretory signal peptide (SP), which is an N-terminal peptide that is typically 15-30 amino acids long. A signal peptide has three regions: an N-terminal region (n-region), which often consists of positively charged residues, a hydrophobic region (h-region) of at least six residues in the middle and a C-terminal region (c-region) of polar uncharged residues. In eukaryotes the SP targets proteins across the membrane of the endoplasmic reticulum, in prokaryotes across the plasma membrane. The SP is cleaved when the protein crosses the membrane.<br>
  +
Furthermore there exist chloroplast transit peptides (cTP) which are also N-terminal and are cleaved when the protein enters the chloroplast. The most conserved site in cTPs is an Alanine directly after the N-terminal methionine...
  +
<ref>O. Emanuelsson, S. Brunak, G. von Heijne, H. Nielsen, "Locating proteins in the cell using TargetP, SignalP and related tools", Nature Protocols, 2007</ref>
  +
Prediction tools: [[#SignalP | SignalP]], [[#TargetP| TargetP]]
  +
  +
'''Combined transmembrane and signal peptide prediction'''
  +
  +
As the hydrophobic regions of a transmembrane helix and a signal peptide are highly similar, this leads to cross reaction between these two types of prediction. <ref>http://www.ebi.ac.uk/Tools/phobius/help.html</ref>
  +
  +
Prediction tools: [[#Phobius and Polyphobius |Phobius and Polyphobius]]
  +
  +
In the following section different tools for predicting transmembrane helices and signal peptides are tested. As the BCKDHA protein isn't a transmembrane protein, additional proteins were used for the transmembrane and signal peptide analysis:
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
|-
  +
!name|| organism || location||transmembrane protein|| signal peptide || function||reference
  +
|-
  +
|A4_HUMAN || Human || Cell membrane || yes || yes || Protease Inhibitor || [http://www.uniprot.org/uniprot/P05067 P05067]
  +
|-
  +
|BACR_HALSA || Halobacterium salinarium || Cell membrane|| yes || no || ion transport|| [http://www.uniprot.org/uniprot/P02945 P02945]
  +
|-
  +
|INSL5_HUMAN || Human || extracellular region || no || yes ||hormone|| [http://www.uniprot.org/uniprot/Q9Y5Q6 Q9Y5Q6]
  +
|-
  +
|LAMP1_HUMAN || Human|| Cell membrane, Lysosome membrane, Endosome membrane || yes || yes || Presents carbohydrate ligands to selectins || [http://www.uniprot.org/uniprot/P11279 P11279]
  +
|-
  +
|RET4_HUMAN || Human || extracellular space || no || yes ||Transport || [http://www.uniprot.org/uniprot/P02753 P02753]
  +
|}
  +
<br>
   
 
=== TMHMM ===

==== Method ====

* Was developed by Sonnhammer, von Heijne and Krogh in 1998 <ref> E.L. Sonnhammer, G. von Heijne and A. Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences, Proc Int Conf Intell Syst Mol Biol. (1998)</ref>
* Predicts transmembrane helices and the transmembrane topology of membrane-spanning proteins
* Is a membrane topology prediction method based on a hidden Markov model with an architecture of 7 types of states
* Required input: protein sequence in FASTA format
* Can also be run on the [http://www.cbs.dtu.dk/services/TMHMM/ TMHMM] server
  +
  +
==== Execution ====
  +
  +
Before we could execute TMHMM we had to change all occurrences of "/usr/local/bin/" to "/usr/bin" in the following files: tmhmm, tmhmm.ORIG and tmhmmformat.pl
  +
  +
To execute the program we used these commands:
  +
*tmhmm P05067.fasta > tmhmm_out_P05067.txt
*tmhmm P02945.fasta > tmhmm_out_P02945.txt
*tmhmm Q9Y5Q6.fasta > tmhmm_out_Q9Y5Q6.txt
*tmhmm P11279.fasta > tmhmm_out_P11279.txt
*tmhmm P02753.fasta > tmhmm_out_P02753.txt
*tmhmm P12694.fasta > tmhmm_out_P12694.txt
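The six calls above follow the same pattern, so they can also be generated in a loop. The following Python sketch is one possible way to do this. It assumes that the <code>tmhmm</code> executable is on the PATH and that the FASTA files are named after the UniProt accessions as in the commands above. The helper names <code>tmhmm_jobs</code> and <code>run_jobs</code> are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# UniProt accessions of the proteins analysed in this section
ACCESSIONS = ["P05067", "P02945", "Q9Y5Q6", "P11279", "P02753", "P12694"]


def tmhmm_jobs(accessions, workdir="."):
    """Build (command, output file) pairs equivalent to the shell calls listed above."""
    base = Path(workdir)
    return [(["tmhmm", str(base / f"{acc}.fasta")], base / f"tmhmm_out_{acc}.txt")
            for acc in accessions]


def run_jobs(jobs):
    """Run every command and write its standard output to the matching file."""
    for argv, out_path in jobs:
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    if shutil.which("tmhmm") is None:
        print("tmhmm not found on PATH, nothing was run")
    else:
        run_jobs(tmhmm_jobs(ACCESSIONS))
```

Because <code>check=True</code> is set, the loop stops with an error as soon as one of the TMHMM runs fails instead of silently writing an empty output file.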
  +
  +
==== Results ====
  +
  +
'''BCKDHA'''
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-445||outside
  +
|}
  +
  +
TMHMM predicted no membrane spanning region for the BCKDHA protein, which corresponds to the information provided in Uniprot.
  +
<br><br>
  +
'''A4_HUMAN'''
  +
  +
[[File:BCKDHA_A4_regions.PNG|400px|thumb|Figure16: Membrane topology of A4_HUMAN (source: Uniprot)]]
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-700||outside
  +
|-
  +
|701-723|| TMhelix
  +
|-
  +
|724-770||inside
  +
|}
  +
  +
  +
TMHMM predicted one transmembrane helix for A4_HUMAN. This agrees with the UniProt annotation. The predicted transmembrane helix begins at position 701, whereas UniProt states that the transmembrane region spans the positions 700-723, which can be seen in [[:File:BCKDHA_A4_regions.PNG |Figure 16]]. The extracellular region reported by UniProt begins at position 18 because of a signal peptide at the beginning of the protein. TMHMM doesn't include a signal peptide prediction and therefore predicted the extracellular region at the positions 1-700.
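Such a comparison between predicted and annotated helices can be automated by measuring their overlap. The Python sketch below uses the A4_HUMAN positions given above. The function names and the minimum overlap of five residues are our own assumptions for illustration and are not taken from TMHMM or UniProt.

```python
def overlap(a, b):
    """Number of positions shared by two (start, end) regions, 1-based and inclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def match_helices(predicted, annotated, min_overlap=5):
    """Map each annotated helix to the first predicted helix overlapping it sufficiently, else None."""
    return {
        ann: next((pred for pred in predicted if overlap(pred, ann) >= min_overlap), None)
        for ann in annotated
    }


tmhmm_a4 = [(701, 723)]    # TMhelix predicted by TMHMM
uniprot_a4 = [(700, 723)]  # transmembrane region according to UniProt

print(overlap(tmhmm_a4[0], uniprot_a4[0]))
print(match_helices(tmhmm_a4, uniprot_a4))
```

An annotated helix that is mapped to <code>None</code> would indicate a helix the tool missed.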
  +
<br>
  +
  +
  +
'''BACR_HALSA'''
  +
[[File:BCDKHA_BACR_HALSA_region.PNG|380px|thumb|Figure17: Membrane topology of BACR_HALSA (source: Uniprot)]]
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-22||outside
  +
|-
  +
|23-42||TMhelix
  +
|-
  +
|43-54||inside
  +
|-
  +
|55-77||TMhelix
  +
|-
  +
|78-91||outside
  +
|-
  +
|92-114||TMhelix
  +
|-
  +
|115-120||inside
  +
|-
  +
|121-143||TMhelix
  +
|-
  +
|144-147||outside
  +
|-
  +
|148-170||TMhelix
  +
|-
  +
|171-189||inside
  +
|-
  +
|190-212||TMhelix
  +
|-
  +
|213-262||outside
  +
|}
  +
  +
The TMHMM prediction differs slightly from the information provided in UniProt, as can be seen in [[:File:BCDKHA_BACR_HALSA_region.PNG |Figure 17]]. TMHMM predicted only 13 domains, six of them transmembrane helices, with the end of the protein in the extracellular space, whereas UniProt reports 15 domains with the protein ending in the cytoplasm.
  +
<br><br>
  +
'''INSL5_HUMAN'''
  +
  +
[[File:BCKDHA_INSL5_region.PNG|400px|thumb|Figure18: Membrane topology of INSL5_HUMAN (source: Uniprot)]]
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-135||outside
  +
|}
  +
  +
The TMHMM prediction agrees with the fact that INSL5_HUMAN is a hormone and is therefore secreted into the extracellular region. The information about these properties is provided by UniProt and can be seen in [[:File:BCKDHA_INSL5_region.PNG |Figure 18]].
  +
<br><br>
  +
'''LAMP1_HUMAN'''
  +
  +
[[File:LAMP1_regions.PNG|400px|thumb|Figure19: Membrane topology of LAMP1_HUMAN (source: Uniprot)]]
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-10||inside
  +
|-
  +
|11-33||TMhelix
  +
|-
  +
|34-383||outside
  +
|-
  +
|384-406||TMhelix
  +
|-
  +
|407-417||inside
  +
|}
  +
  +
The TMHMM prediction for LAMP1_HUMAN agrees only partially with the UniProt annotation, as we can see by comparing the TMHMM results with the UniProt information shown in [[:File:LAMP1_regions.PNG |Figure 19]]. The signal peptide and the lumenal domain are predicted as an additional transmembrane helix and an extracellular domain. The second transmembrane helix is predicted correctly.
  +
<br><br>
  +
'''RET4_HUMAN'''
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!Position
  +
!Membrane topology
  +
|-
  +
|1-201||outside
  +
|}
  +
  +
The TMHMM prediction for RET4_HUMAN is correct, as RET4_HUMAN is a secreted protein and does not span any membrane.
   
 
=== Phobius and Polyphobius===

==== Methods ====

* '''Phobius''' was developed by Käll et al. <ref>Käll et al., "A Combined Transmembrane Topology and Signal Peptide Prediction Method", Journal of Mol. Biology, 338(5):1027-1036, 2004 </ref>
* Combined prediction of transmembrane regions and signal peptides
* Required input: only the sequence in FASTA format (the 20 amino acids and B, Z, X are recognized)
* As transmembrane topology and signal peptides are likely to be conserved during evolution, '''Polyphobius''' was established <ref>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information", Bioinformatics, 21 (Suppl 1):i251-i257, 2005</ref>, which includes information from sequences homologous to the query.
* Required input for Polyphobius, two options:
** Query sequence in FASTA format, which is then blasted against uniprot_trembl
** Or upload of an alignment in FASTA format which provides information about homologs
  +
  +
==== Results ====
  +
  +
{| class = "centered"
  +
!colspan="4"| A4_HUMAN
  +
|-
  +
!||Phobius||Polyphobius
  +
|-
  +
|[[File:BCKDHA_Phobius_A4_HUMAN.png|right|thumb|200px|Figure20: Prediction of Phobius ]]
  +
|<tt>
  +
{|border="1"
  +
!colspan="4"|sp|P05067|A4_HUMAN
  +
|-
  +
|SIGNAL||1||17||
  +
|-
  +
|REGION||1||1||N-REGION
  +
|-
  +
|REGION||2||12||H-REGION
  +
|-
  +
|REGION||13||17||C-REGION
  +
|-
  +
|TOPO_DOM||18||700||NON CYTOPLASMIC
  +
|-
  +
|TRANSMEM||701||723||
  +
|-
  +
|TOPO_DOM||724||770||CYTOPLASMIC
  +
|}
  +
</tt>
  +
|<tt>
  +
{|border="1"
  +
!colspan="4"|sp|P05067|A4_HUMAN
  +
|-
  +
|SIGNAL||1||17||
  +
|-
  +
|REGION||1||3||N-REGION
  +
|-
  +
|REGION||4||12||H-REGION
  +
|-
  +
|REGION||13||17||C-REGION
  +
|-
  +
|TOPO_DOM||18||700||NON CYTOPLASMIC
  +
|-
  +
|TRANSMEM||701||723||
  +
|-
  +
|TOPO_DOM||724||770||CYTOPLASMIC
  +
|}
  +
</tt>
  +
|[[File:BCKDHA_Polyphobius_A4_HUMAN.png|right|200px|thumb|Figure21: Prediction of Polyphobius]]
  +
|}
  +
  +
Comparing the results of Phobius and Polyphobius, we can see that their predictions are nearly the same. The two plots in [[:File:BCKDHA_Phobius_A4_HUMAN.png |Figure 20]] and [[:File:BCKDHA_Polyphobius_A4_HUMAN.png |Figure 21]] are also nearly identical.

Phobius and Polyphobius predicted the signal peptide and membrane topology for A4_HUMAN correctly. The signal peptide and membrane topology of A4_HUMAN according to UniProt can be found in [[:File:BCKDHA_A4_regions.PNG |Figure 16]].
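Where exactly the two predictions differ can be read from the two feature tables above, or computed as a set difference. In the following Python sketch the features are typed in by hand from the tables, and the function name <code>differing_features</code> is hypothetical.

```python
# Features for A4_HUMAN, copied from the two tables above as (type, start, end, description)
phobius = [
    ("SIGNAL", 1, 17, ""),
    ("REGION", 1, 1, "N-REGION"),
    ("REGION", 2, 12, "H-REGION"),
    ("REGION", 13, 17, "C-REGION"),
    ("TOPO_DOM", 18, 700, "NON CYTOPLASMIC"),
    ("TRANSMEM", 701, 723, ""),
    ("TOPO_DOM", 724, 770, "CYTOPLASMIC"),
]
polyphobius = [
    ("SIGNAL", 1, 17, ""),
    ("REGION", 1, 3, "N-REGION"),
    ("REGION", 4, 12, "H-REGION"),
    ("REGION", 13, 17, "C-REGION"),
    ("TOPO_DOM", 18, 700, "NON CYTOPLASMIC"),
    ("TRANSMEM", 701, 723, ""),
    ("TOPO_DOM", 724, 770, "CYTOPLASMIC"),
]


def differing_features(a, b):
    """Features that occur in exactly one of the two predictions."""
    return sorted(set(a) ^ set(b))


for feature in differing_features(phobius, polyphobius):
    print(feature)
```

For A4_HUMAN only the border between the n-region and the h-region of the signal peptide differs. The signal peptide as a whole, the transmembrane helix and the topology domains are identical.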
   
 
{| class = "centered"
!||Phobius||Polyphobius
|-
|[[File:BCKDHA_Phobius_BACR_HALSA.png|right|200px|thumb|Figure22: Prediction of Phobius]]
|<tt>
{|border="1"
|}
</tt>
|[[File:BCKDHA_Polyphobius_BACR_HALSA.png|right|200px|thumb|Figure23: Prediction of Polyphobius]]
|}
  +
  +
The predictions of Phobius and Polyphobius differ only slightly in the length of the single domains, which can be seen from the results in the two tables above. Additionally, the comparison of [[:File:BCKDHA_Phobius_BACR_HALSA.png |Figure 22]] with [[:File:BCKDHA_Polyphobius_BACR_HALSA.png |Figure 23]] shows that they are mainly the same and only differ a bit in the posterior label probability of cytoplasmic and non-cytoplasmic regions. Both predictions of the membrane topology are correct, which can be seen by comparing the results with [[:File:BCDKHA_BACR_HALSA_region.PNG |Figure 17]].
   
   
!||Phobius||Polyphobius
|-
|[[File:BCKDHA_Phobius_INSL5_HUMAN.png|right|200px|thumb|Figure24: Prediction of Phobius]]
|<tt>
{|border="1"
|}
</tt>
|[[File:BCKDHA_Polyphobius_INSL5_HUMAN.png|right|200px|thumb|Figure25: Prediction of Polyphobius]]
|}
  +
  +
The Phobius and Polyphobius predictions for INSL5_HUMAN agree with the information given on UniProt ([[:File:BCKDHA_INSL5_region.PNG |Figure 18]]). By comparing the results in the table above and [[:File:BCKDHA_Phobius_INSL5_HUMAN.png |Figure 24]] with [[:File:BCKDHA_Polyphobius_INSL5_HUMAN.png |Figure 25]] we can see that both tools correctly predicted a signal peptide and only one extracellular region of the protein.
   
 
{| class = "centered"
!||Phobius||Polyphobius
|-
|[[File:BCKDHA_Phobius_LAMP1_HUMAN.png|right|200px|thumb|Figure26: Prediction of Phobius]]
|<tt>
{|border="1"
|}
</tt>
|[[File:BCKDHA_Polyphobius_LAMP1_HUMAN.png|right|200px|thumb|Figure27: Prediction of Polyphobius]]
|}
   
  +
Comparing the results of Phobius and Polyphobius listed in the table above and shown in [[:File:BCKDHA_Phobius_LAMP1_HUMAN.png |Figure 26]] and [[:File:BCKDHA_Polyphobius_LAMP1_HUMAN.png |Figure 27]], we can see that the two tools made the same predictions. To find out whether these results are correct, we compared them to the information provided by UniProt ([[:File:LAMP1_regions.PNG |Figure 19]]) and can conclude that the signal peptide and membrane topology predictions made by Phobius and Polyphobius for LAMP1_HUMAN are correct.
   
 
{| class = "centered"
!||Phobius||Polyphobius
|-
|[[File:BCKDHA_Phobius_RET4_HUMAN.png|right|200px|thumb|Figure28: Prediction of Phobius]]
|<tt>
{|border="1"
|}
</tt>
|[[File:BCKDHA_Polyphobius_RET4_HUMAN.png|right|200px|thumb|Figure29: Prediction of Polyphobius]]
|}
   
  +
Both tools made nearly the same prediction, which can be seen from the table above and from the visualization of the two predictions ([[:File:BCKDHA_Phobius_RET4_HUMAN.png |Figure 28]], [[:File:BCKDHA_Polyphobius_RET4_HUMAN.png |Figure 29]]). Both predict the signal peptide of RET4_HUMAN correctly, as well as the one extracellular region of the protein.
   
  +
For the BCKDHA protein Phobius predicted a signal peptide with about 90% probability at the beginning of the sequence. The predicted signal peptide is 34 amino acids long. This is broadly consistent with the information given on UniProt, which says that BCKDHA contains a 45 amino acid long signal peptide for the transfer into the mitochondrion.

The rest of the sequence is predicted to be non-cytoplasmic. No part of the protein is predicted to span a membrane. This is also correct, as BCKDHA is located in the mitochondrial matrix according to UniProt.
 
   
 
{| class = "centered"
 
{| class = "centered"
 
!||Phobius||Polyphobius
 
!||Phobius||Polyphobius
 
|-
 
|-
|[[File:Phobius_BCKDHA.png|right|200px|thumb|Figure30: Prediction of Phobius]]
 
|<tt>
 
|<tt>
 
{|border="1"
 
{|border="1"
 
|}
 
|}
 
</tt>
 
</tt>
|[[File:BCKDHA_Polyphobius_BCKDHA.png|right|200px|thumb|Figure31: Prediction of Polyphobius]]
 
|}
 
|}
Considering the information given on UniProt, Polyphobius performed worse than Phobius on the BCKDHA protein sequence. It predicted no signal sequence at the beginning of the protein sequence. There is a low probability for the amino acids at positions 1-45 to be a signal sequence, but overall the whole sequence is predicted to be non-cytoplasmic. This is also shown in [[:File:BCKDHA_Polyphobius_BCKDHA.png |Figure 31]]. In contrast, Phobius predicted the signal sequence between positions 1 and 34 with a very high probability, which is clearly visible in [[:File:Phobius_BCKDHA.png |Figure 30]].
   
 
=== OCTOPUS and SPOCTOPUS ===
 
  +
  +
==== Methods ====
 
* OCTOPUS was developed by Viklund and Elofsson in 2008 <ref>Håkan Viklund and Arne Elofsson, "Improving topology prediction by two-track ANN-based preference scores and an extended topological grammar", Bioinformatics (2008)</ref>
* OCTOPUS (obtainer of correct topologies for uncharacterized sequences) uses a combination of hidden Markov models and artificial neural networks.
* Required input information: Protein sequence in FASTA-Format
   
  +
==== Results ====
 
  +
  +
'''A4_HUMAN'''
  +
[[File:A4_human_BCKDHA.png|thumb|center| 650px| Figure32: Prediction for Octopus and Spoctopus for A4_HUMAN]]
  +
<br>
  +
Comparing the results of OCTOPUS and SPOCTOPUS shows that both tools predicted the membrane topology for A4_HUMAN. The output is visualized in [[:File:A4_human_BCKDHA.png |Figure 32]]; the brown line shows that the protein lies mainly in the non-cytoplasmic region. SPOCTOPUS also detected the signal peptide. Comparing the predictions with the information offered by UniProt shows that the predictions of both tools are correct.
  +
<br>
  +
  +
'''BACR_HALSA'''
  +
[[File:BACR_HALSA_BCKDHA.png|thumb|center| 650px| Figure33: Prediction for Octopus and Spoctopus for BACR_HALSA]]
  +
<br>
  +
The predictions made by OCTOPUS and SPOCTOPUS for BACR_HALSA are identical and correct. The results are visualized in [[:File:BACR_HALSA_BCKDHA.png |Figure 33]]. The red bars show that the protein lies mainly in the transmembrane region. Additionally, the alternating brown and green lines indicate that the protein alternates between the non-cytoplasmic and the cytoplasmic region. SPOCTOPUS predicted no signal peptide, which agrees with the information given in UniProt.
  +
<br>
  +
  +
'''INSL5_HUMAN'''
  +
[[File:INSL5_HUMAN_BCKDHA.png|thumb|center| 650px| Figure34: Prediction for Octopus and Spoctopus for INSL5_HUMAN]]
  +
<br>
  +
Comparing the predictions of OCTOPUS and SPOCTOPUS shows that both of them predicted the protein to be in a non-cytoplasmic region after position 22 or 23. This conclusion is supported by the brown line in [[:File:INSL5_HUMAN_BCKDHA.png |Figure 34]].
  +
The figure also shows that the two tools made different predictions for the first part of the protein. SPOCTOPUS predicted the signal peptide of INSL5_HUMAN, while OCTOPUS predicted a transmembrane domain for the same part of the sequence. Comparing the results with the information in UniProt shows that the signal peptide is correctly predicted.
  +
<br>
  +
  +
'''LAMP1_HUMAN'''
  +
[[File:LAMP1_HUMAN_BCKDHA.png|thumb|center| 650px| Figure35: Prediction for Octopus and Spoctopus for LAMP1_HUMAN]]
  +
The visualization of the results ([[:File:LAMP1_HUMAN_BCKDHA.png |Figure 35]]) shows that the two tools made mostly the same predictions, but they differ at the beginning of the protein. While SPOCTOPUS predicted the beginning to be a signal peptide, OCTOPUS assigned this region as an additional inside region and a transmembrane helix where the sequence actually contains a signal peptide. As we know from UniProt, the prediction of SPOCTOPUS is the right one because LAMP1_HUMAN has a signal peptide at the beginning of the protein.
  +
<br>
  +
  +
'''RET4_HUMAN'''
  +
[[File:RET4_HUMAN_BCKDHA.png|thumb|center| 650px| Figure36: Prediction for Octopus and Spoctopus for RET4_HUMAN]]
  +
Again the two tools made nearly the same predictions and differ only at the beginning of the protein. As [[:File:RET4_HUMAN_BCKDHA.png|Figure 36]] shows, both of them predicted the protein to be mainly in a non-cytoplasmic region, but while SPOCTOPUS predicted the beginning to be a signal peptide, OCTOPUS assigned this region to be a transmembrane helix. Comparing the two predictions with the information offered by UniProt shows that there is a signal peptide at the beginning of the protein.
  +
<br>
  +
  +
'''BCKDHA'''
  +
[[File:BCKDHA.png|thumb|center| 650px| Figure37: Prediction for Octopus and Spoctopus for BCKDHA]]
  +
  +
The OCTOPUS and SPOCTOPUS predictions for the BCKDHA protein are completely contrary in terms of the intracellular and extracellular regions, as [[:File:BCKDHA.png|Figure 37]] clearly shows. But both predictions are wrong, as BCKDHA is not a membrane protein. Furthermore, SPOCTOPUS missed the 45 amino acid long signal peptide at the beginning of the sequence.
   
 
=== SignalP ===
 
  +
  +
==== Method ====
 
* SignalP was established by Nielsen et al. in 1997<ref>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", Protein Engineering, 10:1-6, 1997</ref>
* Based on neural networks as well as hidden Markov models
* Uses three different scores for the neural network prediction:
** S-score (score for the signal peptide)
** C-score (score for the cleavage site)
** Y-score (combination of the S-score and the C-score that locates the cleavage site more precisely)
* Identifies signal peptides and cleavage sites
* Makes predictions for three different organism groups:
** eukaryotes
** Gram-negative bacteria
** Gram-positive bacteria
* Can also be run on the [http://www.cbs.dtu.dk/services/SignalP/ SignalP] server
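
The relation between the three scores can be illustrated with a small sketch. It assumes the Y-score definition from the SignalP publication as we understand it: Y_i = sqrt(C_i * Δ_d S_i), where Δ_d S_i is the difference between the mean S-score of the d positions before position i and the mean S-score of the d positions starting at i. The function name <tt>y_scores</tt>, the window size <tt>d=3</tt> and the toy score values are illustrative assumptions and not SignalP's actual code, parameters or output.

```python
import math


def y_scores(c, s, d=3):
    """Combine per-residue C- and S-scores into Y-scores (hypothetical helper).

    c, s: lists of equal length with one score per sequence position.
    d:    window size used to average the S-score (assumed value).
    """
    y = []
    for i in range(len(c)):
        before = s[max(0, i - d):i]   # S-scores upstream of position i
        after = s[i:i + d]            # S-scores from position i onwards
        if not before or not after:
            y.append(0.0)
            continue
        delta = sum(before) / len(before) - sum(after) / len(after)
        # A drop in the S-score (delta > 0) combined with a high C-score
        # marks a likely cleavage site; negative deltas are clipped to 0.
        y.append(math.sqrt(c[i] * max(delta, 0.0)))
    return y


# Toy data: high S-score for five residues, then a sharp drop,
# and a single C-score peak directly at the drop.
s_toy = [0.9] * 5 + [0.1] * 5
c_toy = [0.0] * 5 + [0.8] + [0.0] * 4
y_toy = y_scores(c_toy, s_toy)
best = max(range(len(y_toy)), key=y_toy.__getitem__)
print("highest Y-score at index", best)
```

In this toy example the Y-score is non-zero only where the C-score peak coincides with the drop of the S-score, which is the behaviour the Y-score is meant to capture.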
  +
  +
==== Execution ====
  +
  +
To run the command line SignalP tool, the path in the SignalP file had to be adapted to /apps/signalp-3.0
  +
  +
The following commands were used to execute SignalP:<br>
  +
*signalp -t euk P05067.fasta > signalp_out_P05067.txt
  +
*signalp -t gram- P02945.fasta > signalp_out_P02945.txt
  +
*signalp -t euk Q9Y5Q6.fasta > signalp_out_Q9Y5Q6.txt
  +
*signalp -t euk P11279.fasta > signalp_out_P11279.txt
  +
*signalp -t euk P02753.fasta > signalp_out_P02753.txt
  +
*signalp -t euk P12694.fasta > signalp_out_P12694.txt
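
The six calls above follow one pattern, so they can also be generated from a small table. The following is only a sketch: the helper name <tt>signalp_command</tt> is hypothetical, the accession numbers and <tt>-t</tt> values are taken from the commands above, and the script only prints the command lines without executing SignalP.

```python
# Accession number and organism type for each SignalP run,
# copied from the commands listed above.
RUNS = [
    ("P05067", "euk"),
    ("P02945", "gram-"),
    ("Q9Y5Q6", "euk"),
    ("P11279", "euk"),
    ("P02753", "euk"),
    ("P12694", "euk"),
]


def signalp_command(accession, organism):
    """Build one SignalP command line (hypothetical helper, nothing is executed)."""
    return f"signalp -t {organism} {accession}.fasta > signalp_out_{accession}.txt"


commands = [signalp_command(acc, org) for acc, org in RUNS]
for cmd in commands:
    print(cmd)
```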
  +
  +
  +
==== Results ====
  +
[[File:CleavageSite_BCKDHA_BY_HMM.png|thumb|200px|Figure38: Prediction by SignalP for BCKDHA using HMMs]]
  +
  +
'''BCKDHA'''
  +
  +
Both methods (NN and HMM) predicted the most likely cleavage site between positions 32 and 33 (ARG_LA). This is clearly visible from the red lines in [[:File:CleavageSite_BCKDHA_BY_HMM.png|Figure 38]].<br>
  +
This prediction does not agree with UniProt, where a signal peptide from position 1-45 is listed.
  +
  +
'''A4_HUMAN'''
  +
  +
With both methods, SignalP predicted a cleavage site between positions 17 and 18 with a high probability for a signal peptide.<br>
SignalP predicted the cleavage site for A4_HUMAN correctly.
  +
  +
'''BACR_HALSA'''
  +
  +
Both methods (NN and HMM) predicted no cleavage site, and therefore no signal peptide, in the BACR_HALSA sequence.<br>
  +
This is also true according to UniProt, where no signal peptide is stated.
  +
  +
'''INSL5_HUMAN'''
  +
  +
For the INSL5_HUMAN protein, SignalP detected a cleavage site between positions 22 and 23, following a predicted signal peptide at the beginning of the sequence.<br>
  +
The signal peptidase I cleavage site was predicted correctly, as UniProt states a signal peptide from positions 1-22.
  +
  +
  +
'''LAMP1_HUMAN'''
  +
  +
With both methods, SignalP predicted a cleavage site between positions 28 and 29, following a detected signal peptide.<br>
  +
The cleavage site prediction made by SignalP for LAMP1_HUMAN is correct. UniProt shows a signal peptide for this protein which ranges from 1-28 in the sequence.
  +
  +
'''RET4_HUMAN'''
  +
  +
SignalP predicted a cleavage site with high probability between positions 18 and 19 in both the NN and the HMM method. This cleavage site is predicted to be after a signal peptide.<br>
  +
This prediction is correct according to UniProt.
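
The comparison with UniProt can be written down as a small check. This sketch only covers the four proteins for which the text above states the UniProt signal peptide range explicitly; the dictionary and function names are our own, and <tt>None</tt> stands for "no signal peptide".

```python
# Last residue of the signal peptide as predicted by SignalP
# (position before the predicted cleavage site), taken from the text above.
predicted_end = {
    "BCKDHA": 32,
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
}

# Last residue of the signal peptide according to UniProt, as quoted above.
uniprot_end = {
    "BCKDHA": 45,
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
}


def agrees(pred, ref):
    """True if prediction and annotation give the same end position (or both none)."""
    return pred == ref


agreement = {name: agrees(predicted_end[name], uniprot_end[name])
             for name in predicted_end}
for name, ok in agreement.items():
    print(name, "agrees with UniProt" if ok else "differs from UniProt")
```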
   
 
=== TargetP ===
 
  +
  +
==== Method ====
 
* TargetP was developed by Emanuelsson et al. in 2000 <ref> Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence", J. Mol. Biol., 300: 1005-1016, 2000</ref>
* TargetP predicts the subcellular location of eukaryotic proteins
* Additionally it can make cleavage site predictions
* This method is neural network based. The prediction is based on the N-terminal presequences:
** chloroplast transit peptide (cTP)
** mitochondrial targeting peptide (mTP)
** secretory pathway signal peptide (SP)
* Required input information: Sequence(s) in FASTA format, organism group
* The prediction can also be run on the [http://www.cbs.dtu.dk/services/TargetP/ TargetP] server
   
  +
==== Results ====
 
  +
{|
  +
|[[File:BCKDHA_TargetP.PNG|thumb| 500px|left| Figure39: prediction results by TargetP]]
  +
|-
  +
|All the results of the TargetP prediction are shown in the table in [[:File:BCKDHA_TargetP.PNG |Figure 39]]. <br>
ODBA_HUMAN (BCKDHA) is predicted to be located in the mitochondrion, which is true according to UniProt.
All other tested proteins are predicted to be located in the secretory pathway and therefore to have a signal peptide. These predictions are true except for BACR_HALSA, which has no signal peptide. Here, however, TargetP returns a reliability class of four, which indicates an unreliable prediction.
  +
|}
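
To illustrate why a reliability class of four is a weak prediction, the following sketch derives the class from the difference between the highest and the second highest output score. The thresholds (0.8, 0.6, 0.4, 0.2) are the ones we recall from the TargetP documentation and should be treated as an assumption; the function name and the example scores are invented for illustration and are not taken from Figure 39.

```python
def reliability_class(scores):
    """Return a reliability class from 1 (most reliable) to 5 (least reliable).

    scores: the per-location output scores for one protein.
    The thresholds below are assumed, see the note above.
    """
    top, second = sorted(scores, reverse=True)[:2]
    diff = top - second
    if diff > 0.8:
        return 1
    if diff > 0.6:
        return 2
    if diff > 0.4:
        return 3
    if diff > 0.2:
        return 4
    return 5


# Invented example scores (mTP, SP, other):
clear_case = reliability_class([0.95, 0.05, 0.02])    # one score dominates
unclear_case = reliability_class([0.30, 0.55, 0.10])  # two scores are close
print(clear_case, unclear_case)
```

A class of four therefore means that the winning score is only slightly higher than the runner-up.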
   
 
== 4. Prediction of GO terms ==
 
  +
  +
The following section deals with GO term prediction tools. In order to verify the predictions, the real GO annotations are presented first (as they are listed in UniProt<ref>http://www.uniprot.org</ref>):<br>
  +
(P: Process, F: Function, C: Component)
  +
  +
''' BCKDHA'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|metabolic process||0008152||P
  +
|-
  +
|branched chain family amino acid catabolic process||0009083||P
  +
|-
  +
|cellular nitrogen compound metabolic process||0034641||P
  +
|-
  +
|oxidation-reduction process||0055114||P
  +
|-
  +
|colspan="3" style="text-align:center"| '''Function'''
  +
|-
  +
|alpha-ketoacid dehydrogenase activity||0003826||F
  +
|-
  +
|3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) activity||0003863||F
  +
|-
  +
|protein binding||0005515||F
  +
|-
  +
|oxidoreductase activity||0016491||F
  +
|-
  +
|oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor||0016624||F
  +
|-
  +
|carboxy-lyase activity||0016831||F
  +
|-
  +
|metal ion binding||0046872||F
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|mitochondrion||0005739||C
  +
|-
  +
|mitochondrial matrix||0005759||C
  +
|-
  +
|mitochondrial alpha-ketoglutarate dehydrogenase complex||0005947||C
  +
|}
  +
  +
'''A4_HUMAN'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|G2 phase of mitotic cell cycle||0000085||P
  +
|-
  +
|suckling behaviour||0001967||P
  +
|-
  +
|platelet degranulation||0002576||P
  +
|-
  +
|mRNA polyadenylation||0006378||P
  +
|-
  +
|regulation of translation||0006417||P
  +
|-
  +
|protein phosphorylation||0006468||P
  +
|-
  +
|cellular copper ion homeostasis||0006878||P
  +
|-
  +
|endocytosis||0006897||P
  +
|-
  +
|apoptosis||0006915||P
  +
|-
  +
|induction of apoptosis||0006917||P
  +
|-
  +
|cell adhesion||0007155||P
  +
|-
  +
|regulation of epidermal growth factor receptor activity||0007176||P
  +
|-
  +
|Notch signaling pathway||0007219||P
  +
|-
  +
|axonogenesis||0007409||P
  +
|-
  +
|blood coagulation||0007596||P
  +
|-
  +
|mating behavior||0007617||P
  +
|-
  +
|locomotory behavior||0007626||P
  +
|-
  +
|axon cargo transport||0008088||P
  +
|-
  +
|cell death||0008219||P
  +
|-
  +
|adult locomotory behavior||0008344||P
  +
|-
  +
|visual learning||0008542||P
  +
|-
  +
|negative regulation of peptidase activity||0010466||P
  +
|-
  +
|positive regulation of peptidase activity||0010951||P
  +
|-
  +
|axon midline choice point recognition||0016199||P
  +
|-
  +
|neuron remodeling||0016322||P
  +
|-
  +
|dendrite development||0016358||P
  +
|-
  +
|platelet activation||0030168||P
  +
|-
  +
|extracellular matrix organization||0030198||P
  +
|-
  +
|forebrain development||0030900||P
  +
|-
  +
|neuron projection development||0031175||P
  +
|-
  +
|ionotropic glutamate receptor signaling pathway||0035235||P
  +
|-
  +
|regulation of multicellular organism growth||0040014||P
  +
|-
  +
|innate immune response||0045087||P
  +
|-
  +
|negative regulation of neuron differentiation||0045665||P
  +
|-
  +
|positive regulation of mitotic cell cycle||0045931||P
  +
|-
  +
|positive regulation of transcription from RNA polymerase II promoter||0045944||P
  +
|-
  +
|collateral sprouting in absence of injury||0048669||P
  +
|-
  +
|regulation of synapse structure and activity||0050803||P
  +
|-
  +
|neuromuscular process controlling balance||0050885||P
  +
|-
  +
|synaptic growth at neuromuscular junction||0051124||P
  +
|-
  +
|neuron apoptosis||0051402||P
  +
|-
  +
|smooth endoplasmic reticulum calcium ion homeostasis||0051563||P
  +
|-
  +
|colspan="3" style="text-align:center"|'''Function'''
  +
|-
  +
|DNA binding||0003677||F
  +
|-
  +
|serine-type endopeptidase inhibitor activity||0004867||F
  +
|-
  +
|receptor binding||0005102||F
  +
|-
  +
|binding||0005488||F
  +
|-
  +
|protein binding||0005515||F
  +
|-
  +
|heparin binding||0008201||F
  +
|-
  +
|peptidase activator activity||0016504||F
  +
|-
  +
|peptidase inhibitor activity||0030414||F
  +
|-
  +
|acetylcholine receptor binding||0033130||F
  +
|-
  +
|identical protein binding||0042802||F
  +
|-
  +
|metal ion binding||0046872||F
  +
|-
  +
|PTB domain binding||0051425||F
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|extracellular region||0005576||C
  +
|-
  +
|membrane fraction||0005624||C
  +
|-
  +
|cytoplasm||0005737||C
  +
|-
  +
|Golgi apparatus||0005794||C
  +
|-
  +
|plasma membrane||0005886||C
  +
|-
  +
|integral to plasma membrane||0005887||C
  +
|-
  +
|coated pit||0005905||C
  +
|-
  +
|cell surface||0009986||C
  +
|-
  +
|membrane||0016020||C
  +
|-
  +
|integral to membrane||0016021||C
  +
|-
  +
|synaptosome||0019717||C
  +
|-
  +
|axon||0030424||C
  +
|-
  +
|platelet alpha granule lumen||0031093||C
  +
|-
  +
|cytoplasmic vesicle||0031410||C
  +
|-
  +
|neuromuscular junction||0031594||C
  +
|-
  +
|ciliary rootlet||0035253||C
  +
|-
  +
|neuron projection||0043005||C
  +
|-
  +
|dendritic spine||0043197||C
  +
|-
  +
|dendritic shaft||0043198||C
  +
|-
  +
|intracellular membrane-bounded organelle||0043231||C
  +
|-
  +
|apical part of cell||0045177||C
  +
|-
  +
|synapse||0045202||C
  +
|-
  +
|perinuclear region of cytoplasm||0048471||C
  +
|-
  +
|spindle midzone||0051233||C
  +
|}
  +
  +
  +
'''BACR_HALSA'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|transport||0006810||P
  +
|-
  +
|ion transport||0006811||P
  +
|-
  +
|phototransduction||0007602||P
  +
|-
  +
|proton transport||0015992||P
  +
|-
  +
|protein-chromophore linkage||0018298||P
  +
|-
  +
|response to stimulus||0050896||P
  +
|-
  +
|colspan="3" style="text-align:center"|'''Function'''
  +
|-
  +
|receptor activity||0004872||F
  +
|-
  +
|ion channel activity||0005216||F
  +
|-
  +
|photoreceptor activity||0009881||F
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|plasma membrane||0005886||C
  +
|-
  +
|membrane||0016020||C
  +
|-
  +
|integral to membrane||0016021||C
  +
|}
  +
  +
'''INSL5_HUMAN'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|biological_process||0008150||P
  +
|-
  +
|colspan="3" style="text-align:center"|'''Function'''
  +
|-
  +
|hormone activity||0005179||F
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|cellular_component||0005575||C
  +
|-
  +
|extracellular region ||0005576||C
  +
|}
  +
  +
'''LAMP1_HUMAN'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|autophagy||0006914||P
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|membrane fraction||0005624||C
  +
|-
  +
|lysosome||0005764||C
  +
|-
  +
|lysosomal membrane||0005765||C
  +
|-
  +
|endosome||0005768||C
  +
|-
  +
|late endosome||0005770||C
  +
|-
  +
|multivesicular body||0005771||C
  +
|-
  +
|plasma membrane||0005886||C
  +
|-
  +
|integral to plasma membrane||0005887||C
  +
|-
  +
|external side of plasma membrane||0009897||C
  +
|-
  +
|cell surface||0009986||C
  +
|-
  +
|endosome membrane||0010008||C
  +
|-
  +
|membrane||0016020||C
  +
|-
  +
|integral to membrane||0016021||C
  +
|-
  +
|vesicle||0031982||C
  +
|-
  +
|sarcolemma||0042383||C
  +
|-
  +
|melanosome||0042470||C
  +
|}
  +
  +
'''RET4_HUMAN'''
  +
{|border="1" style="text-align:left; border-spacing:0;"
  +
!style="width:80%" | GO Term Name
  +
!style="width:13%"| GO identifier
  +
!style="width:7%"| Aspect
  +
|-
  +
|colspan="3" style="text-align:center"|'''Process'''
  +
|-
  +
|eye development||0001654||P
  +
|-
  +
|gluconeogenesis||0006094||P
  +
|-
  +
|transport||0006810||P
  +
|-
  +
|spermatogenesis||0007283||P
  +
|-
  +
|heart development||0007507||P
  +
|-
  +
|visual perception||0007601||P
  +
|-
  +
|male gonad development||0008584||P
  +
|-
  +
|embryo development||0009790||P
  +
|-
  +
|maintenance of gastrointestinal epithelium||0030277||P
  +
|-
  +
|lung development||0030324||P
  +
|-
  +
|positive regulation of insulin secretion||0032024||P
  +
|-
  +
|response to retinoic acid||0032526||P
  +
|-
  +
|response to insulin stimulus||0032868||P
  +
|-
  +
|retinol transport||0034633||P
  +
|-
  +
|retinol metabolic process||0042572||P
  +
|-
  +
|retinal metabolic process||0042574||P
  +
|-
  +
|glucose homeostasis||0042593||P
  +
|-
  +
|response to ethanol||0045471||P
  +
|-
  +
|embryonic organ morphogenesis||0048562||P
  +
|-
  +
|embryonic skeletal system development||0048706||P
  +
|-
  +
|cardiac muscle tissue development||0048738||P
  +
|-
  +
|female genitalia morphogenesis||0048807||P
  +
|-
  +
|response to stimulus||0050896||P
  +
|-
  +
|detection of light stimulus involved in visual perception||0050908||P
  +
|-
  +
|positive regulation of immunoglobulin secretion||0051024||P
  +
|-
  +
|retina development in camera-type eye||0060041||P
  +
|-
  +
|negative regulation of cardiac muscle cell proliferation||0060044||P
  +
|-
  +
|embryonic retina morphogenesis in camera-type eye||0060059||P
  +
|-
  +
|uterus development||0060065||P
  +
|-
  +
|vagina development||0060068||P
  +
|-
  +
|urinary bladder development||0060157||P
  +
|-
  +
|heart trabecula formation||0060347||P
  +
|-
  +
|colspan="3" style="text-align:center"|'''Function'''
  +
|-
  +
|transporter activity||0005215||F
  +
|-
  +
|binding||0005488||F
  +
|-
  +
|retinoid binding||0005501||F
  +
|-
  +
|protein binding||0005515||F
  +
|-
  +
|retinal binding||0016918||F
  +
|-
  +
|retinol binding||0019841||F
  +
|-
  +
|retinol transporter activity||0034632||F
  +
|-
  +
|colspan="3" style="text-align:center"|'''Component'''
  +
|-
  +
|extracellular region||0005576||C
  +
|-
  +
|extracellular space||0005615||C
  +
|}
   
 
=== GOPET ===
 
  +
 
  +
==== Method ====
  +
* GOPET (Gene Ontology Term Prediction and Evaluation Tool) was described by Vinayagam et al.<ref> Arunachalam Vinayagam, Coral Del Val, Falk Schubert, Roland Eils, Karl-Heinz Glatting, Sándor Suhai, Rainer König, "GOPET: A tool for automated predictions of Gene Ontology terms", BMC Bioinformatics (2006), Volume: 7, Issue: 161, Publisher: BioMed Central, Pages: 161</ref>
  +
* GOPET is a fully automated tool for assigning molecular function terms to a given sequence.
* Based on homology searches and Support Vector Machines
  +
* Required input information: cDNA or protein sequence
  +
* Gene Ontology is used for annotation terms, GO-mapped protein databases for performing homology searches and Support Vector Machines for the prediction and the assignment of confidence values.
  +
* The prediction is organism independent.
  +
  +
==== Results ====
  +
  +
'''BCKDHA'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|GO:0046872||F||62%||metal ion binding
 
|GO:0046872||F||62%||metal ion binding
 
|}
 
|}
  +
  +
The GOPET predictions for BCKDHA are mostly correct. The GO terms that this tool predicted with a confidence >90% are all listed in the UniProt entry for BCKDHA, and so is the metal ion binding function.
   
   
+
'''A4_HUMAN'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|}
 
|}
   
  +
The GOPET results for A4_HUMAN match the UniProt annotation quite well. The predicted trypsin inhibitor activity and plasmin inhibitor activity are not present in UniProt, and neither are the peptidase inhibitor activity or the endopeptidase activity. But since the predicted serine-type endopeptidase inhibitor activity, which is a true function of the A4_HUMAN protein, can be seen as a subcategory of these functions, the predictions are reasonable. The same is true for the zinc, copper and iron ion binding functions: all three are metals, and the protein has a metal ion binding function.
   
  +
 
  +
'''BACR_HALSA'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|}
 
|}
   
  +
GOPET predicted the ion channel activity and the photoreceptor activity correctly. The hydrogen ion transmembrane transporter activity does not agree with the UniProt annotations.
   
+
'''INSL5_HUMAN'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|}
 
|}
   
  +
The INSL5_HUMAN protein is correctly predicted to be a hormone.
   
  +
 
  +
'''LAMP1_HUMAN'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|}
 
|}
   
  +
For the LAMP1_HUMAN protein, no functional GO annotation is listed in UniProt.
   
  +
 
  +
'''RET4_HUMAN'''
 
{|border="1"
 
{|border="1"
 
!GOid
 
!GOid
 
|GO:0008035||F||60%||high-density lipoprotein particle binding
 
|GO:0008035||F||60%||high-density lipoprotein particle binding
 
|}
 
|}
  +
  +
The GOPET predictions for RET4_HUMAN are correct except for the lipid-linked activities.
   
 
=== Pfam ===
 
   
  +
==== Method ====

* Pfam was established by Finn et al. in 2008. It is described in <ref>Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database.". Nucleic Acids Res 36 (Database issue): D281–8</ref>
* Pfam is a database which contains protein families and domains
* The database consists of two different parts: Pfam-A and Pfam-B
** Pfam-A: manually curated and therefore more accurate
** Pfam-B: generated automatically, so the data are of lower quality than in Pfam-A
* Webserver: [http://pfam.sanger.ac.uk/search Pfam]
<br>

==== Results ====

'''BCKDHA'''
[[File:PfamA_BCKDHA.png|thumb|200px|Figure40: prediction of Pfam-A for BCKDHA]]
Pfam found one significant match in the database which is visualized in [[:File:PfamA_BCKDHA.png |Figure 40]].

* Molecular function
** GO:0016624 (oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor)

* Biological Process
** GO:0008152 (metabolic process)



'''A4_HUMAN'''
[[File:A4_HUMAN_BCKDHA.png|thumb|200px|Figure41: prediction of Pfam-A for A4_HUMAN]]
Pfam found six significant matches in the database which are visualized in [[:File:A4_HUMAN_BCKDHA.png |Figure 41]].

* Cellular Component
** GO:0016021 (integral to membrane)

* Molecular function
** GO:0005488 (binding)
** GO:0004867 (serine-type endopeptidase inhibitor activity)

* No GO-ID
** E2 domain of amyloid precursor protein
** beta-amyloid precursor protein C-terminus
  +
  +
  +
  +
  +
'''BACR_HALSA'''
  +
[[File:Pfam_BACR_HALSA_BCKDHA.png|thumb|200px|Figure42: prediction of Pfam-A for BACR_HALSA]]
  +
Pfam found one significant match in the database which is visualized in [[:File:Pfam_BACR_HALSA_BCKDHA.png |Figure 42]].
  +
  +
* Cellular Component
  +
** GO:0016020 (membrane)
  +
  +
* Molecular function
  +
** GO:0005216 (ion channel activity)
  +
  +
* Biological Process
  +
** GO:0006811 (ion transport)
  +
  +
  +
  +
  +
'''INSL5_HUMAN'''
  +
[[File:INSL_HUMAN_BCKDHA_Pfam.png|thumb|200px|Figure43: prediction of Pfam-A for INSL5_HUMAN]]
  +
Pfam found one significant match in the database which is visualized in [[:File:INSL_HUMAN_BCKDHA_Pfam.png |Figure 43]].
  +
  +
* Cellular Component
  +
** GO:0005576 (extracellular region)
  +
  +
* Molecular function
  +
** GO:0005179 (hormone activity)
  +
  +
  +
  +
'''LAMP1_HUMAN'''
  +
[[File:LAMP1_BCKDHA_Pfam.png|thumb|200px|Figure44: prediction of Pfam-A for LAMP1_HUMAN]]
  +
Pfam found one significant match in the database which is visualized in [[:File:LAMP1_BCKDHA_Pfam.png |Figure 44]].
  +
  +
* Cellular Component
  +
** GO:0016020 (membrane)
  +
  +
  +
  +
  +
'''RET4_HUMAN'''
  +
[[File:RET4_HUMAN_BCKDHA_Pfam.png|thumb|200px|Figure45: prediction of Pfam-A for RET4_HUMAN]]
  +
Pfam found one significant match in the database which is visualized in [[:File:RET4_HUMAN_BCKDHA_Pfam.png |Figure 45]].
  +
  +
* Molecular function
  +
** GO:0005488 (binding)
  +
  +
  +
  +
Comparing the Pfam annotations with the already known GO terms for the different proteins shows that the results for all analysed proteins are correct, but far from exhaustive.
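
"Correct but not exhaustive" can be expressed as high precision and low recall. The sketch below does this for BCKDHA, using the process and function terms from the UniProt table at the beginning of this section and the two GO terms reported by Pfam; the component terms are left out because Pfam returned none for this protein. The function name <tt>precision_recall</tt> is our own.

```python
# Process and function GO terms of BCKDHA as listed in the UniProt table above.
uniprot_terms = {
    "GO:0008152", "GO:0009083", "GO:0034641", "GO:0055114",   # process
    "GO:0003826", "GO:0003863", "GO:0005515", "GO:0016491",   # function
    "GO:0016624", "GO:0016831", "GO:0046872",
}

# GO terms attached to the Pfam-A match for BCKDHA.
pfam_terms = {"GO:0016624", "GO:0008152"}


def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted term set against a reference set."""
    hits = predicted & reference
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


precision, recall = precision_recall(pfam_terms, uniprot_terms)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Every term predicted by Pfam is present in UniProt, but only 2 of the 11 reference terms are recovered.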
  +
  +
=== ProtFun 2.2 ===
  +
  +
==== Method ====
  +
  +
* ProtFun is described in Jensen et al.<ref>L. Juhl Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Stærfeldt, K. Rapacki, C. Workman, C. A. F. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak, "Prediction of human protein function from post-translational modifications and localization features", J. Mol. Biol., 319:1257-1265, 2002</ref>
  +
* ProtFun is an ab initio prediction server of protein function from sequence. Various servers are queried and the provided information is integrated into the final prediction.
  +
* The results of ProtFun are only probabilities and odds scores, not a definite prediction of whether a protein has a specific function.
  +
* The arrow (=>) indicates which line includes the highest information content

==== Results ====

'''BCKDHA'''

[[File:BCKDHA_ProtFun_BCKDHA.png|500px|thumb|Figure46: Prediction of GO-terms for BCKDHA by ProtFun 2.2]]

* <b>Functional category</b>
** Central_intermediary_metabolism (Prob: 0.321, Odds: 5.096) (=>)
** Amino_acid_biosynthesis (Prob: 0.187, Odds: 8.520)
** Purines_and_pyrimidines (Prob: 0.257, Odds: 1.059)
** Biosynthesis_of_cofactors (Prob: 0.246, Odds: 3.413)
* <b>Enzyme/nonenzyme</b>
** Enzyme (Prob: 0.769, Odds: 2.683)
* <b>Enzyme class</b>
** Ligase (Prob: 0.085, Odds: 1.673) (=>)
** Lyase (Prob: 0.076, Odds: 1.614)
* <b>Gene Ontology category</b>
** Growth_factor (Prob: 0.009, Odds: 0.609)
** Signal_transducer (Prob: 0.098, Odds: 0.458)

The results of ProtFun2.2 for the prediction of the GO-terms in BCKDHA are listed in [[:File:BCKDHA_ProtFun_BCKDHA.png |Figure 46]].

The enumeration above summarizes the most significant results. The program predicted that BCKDHA mainly has a function in metabolic processes. The second entry under "Functional category" also has a very good odds score, so we consider it a confident prediction as well. The other two entries are the ones with the next best probability or odds score, but in both cases the odds score is much lower than for the first two results. We therefore take the first two entries as the best predictions of ProtFun2.2. Comparing these assertions with the information in UniProt shows that they are correct. There was no confident prediction for the "Gene Ontology category": we listed the two best results above, but their probabilities and odds scores show that they are not significant.

<br><br>

'''A4_HUMAN'''

[[File:BCKDHA_ProtFun_A4_Human.png|500px|thumb|Figure47: Prediction of GO-terms for A4_HUMAN by ProtFun 2.2]]

* <b>Functional category</b>
** Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
** Transport_and_binding (Prob: 0.827, Odds: 2.016)
** Biosynthesis_of_cofactors (Prob: 0.261, Odds: 3.623)
* <b>Enzyme/nonenzyme</b>
** Enzyme (Prob: 0.392, Odds: 1.368)
* <b>Enzyme class</b>
** Ligase (Prob: 0.048, Odds: 0.946)
** Transferase (Prob: 0.208, Odds: 0.603)
** Hydrolase (Prob: 0.190, Odds: 0.600)
* <b>Gene Ontology category</b>
** Structural_protein (Prob: 0.034, Odds: 1.205) (=>)
** Stress_response (Prob: 0.076, Odds: 0.862)
** Signal_transducer (Prob: 0.126, Odds: 0.586)

The results of ProtFun2.2 for the prediction of the GO-terms in A4_HUMAN are listed in [[:File:BCKDHA_ProtFun_A4_Human.png|Figure 47]].

The most significant results are shown in the listing above. A4_HUMAN is assigned to the category "Cell_envelope" with an odds score of 13.186, which indicates a very confident prediction, and according to UniProt it is right. A4_HUMAN also has a function in transport and binding, the next entry in the list, so although its odds score is much lower than the first one, this prediction is indeed correct. The same holds for the claim that A4_HUMAN is involved in the biosynthesis of cofactors. ProtFun2.2 also assumed that this protein is a structural protein, which is again correct according to the information at the beginning of this section. It is not the only correctly predicted GO-term for A4_HUMAN, as the two other listed GO-terms show. Comparing the whole list of predicted GO-terms in [[:File:BCKDHA_ProtFun_A4_Human.png|Figure 47]] with the given information shows that all of the predictions are right.

<br><br>

'''BACR_HALSA'''

[[File:BCKDHA_ProtFun_BACR_HALSA.png|500px|thumb|Figure48: Prediction of GO-terms for BACR_HALSA by ProtFun 2.2]]

* <b>Functional category</b>
** Transport_and_binding (Prob: 0.791, Odds: 1.929) (=>)
** Biosynthesis_of_cofactors (Prob: 0.186, Odds: 2.589)
** Purines_and_pyrimidines (Prob: 0.302, Odds: 1.244)
* <b>Enzyme/nonenzyme</b>
** Nonenzyme (Prob: 0.801, Odds: 1.122)
* <b>Enzyme class</b>
** none
* <b>Gene Ontology category</b>
** Transporter (Prob: 0.400, Odds: 4.036) (=>)
** Receptor (Prob: 0.355, Odds: 2.087)

The results of ProtFun2.2 for the prediction of the GO-terms in BACR_HALSA are listed in [[:File:BCKDHA_ProtFun_BACR_HALSA.png|Figure 48]].

The most significant results are shown in the list above. Since the predictions in both the "Functional category" and the "Gene Ontology category" declare BACR_HALSA to be mainly a transporter, this prediction can be regarded as very reliable. The UniProt GO annotations include ion transport and proton transport as well as transport itself, so the prediction is correct. The assumption that BACR_HALSA is involved in the biosynthesis of cofactors is wrong, although it has a fairly good odds score. Its probability, however, is very low, which shows that both values have to be considered together. The last entry in the list shows that ProtFun2.2 predicted receptor functionality. This is also correct, because UniProt lists receptor activity for this protein.

<br><br>

'''INSL5_HUMAN'''

[[File:BCKDHA_ProtFun_INSL5_Human.png|500px|thumb|Figure49: Prediction of GO-terms for INSL5_HUMAN by ProtFun 2.2]]

* <b>Functional category</b>
** Cell_envelope (Prob: 0.756, Odds: 12.393) (=>)
** Transport_and_binding (Prob: 0.834, Odds: 2.033)
* <b>Enzyme/nonenzyme</b>
** Nonenzyme (Prob: 0.791, Odds: 1.109)
* <b>Enzyme class</b>
** none
* <b>Gene Ontology category</b>
** Hormone (Prob: 0.247, Odds: 37.936) (=>)
** Growth_factor (Prob: 0.061, Odds: 4.379)

The results of ProtFun2.2 for the prediction of the GO-terms in INSL5_HUMAN are listed in [[:File:BCKDHA_ProtFun_INSL5_Human.png|Figure 49]].

The most significant results are listed above. The first prediction of ProtFun2.2, which places INSL5_HUMAN in the functional category "Cell_envelope", has a very high odds score of 12.393. This suggests that the prediction is correct, which is confirmed by the information in UniProt. The prediction that INSL5_HUMAN is involved in transport and binding is also correct. In addition, ProtFun predicted the hormone activity of INSL5_HUMAN, and the comparison with UniProt shows that this is correct. However, hormone activity is the only GO-term for this protein, which means that the prediction of "Growth_factor" has to be wrong.

<br><br>

'''LAMP1_HUMAN'''

[[File:BCKDHA_ProtFun_LAMP1_Human.png|500px|thumb|Figure50: Prediction of GO-terms for LAMP1_HUMAN by ProtFun 2.2]]

* <b>Functional category</b>
** Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
** Transport_and_binding (Prob: 0.834, Odds: 2.033)
* <b>Enzyme/nonenzyme</b>
** Nonenzyme (Prob: 0.724, Odds: 1.014)
* <b>Enzyme class</b>
** none
* <b>Gene Ontology category</b>
** Immune_response (Prob: 0.371, Odds: 4.368) (=>)
** Stress_response (Prob: 0.246, Odds: 2.795)

The results of ProtFun2.2 for the prediction of the GO-terms in LAMP1_HUMAN are listed in [[:File:BCKDHA_ProtFun_LAMP1_Human.png|Figure 50]].

The most significant results can be found in the list above. This protein is predicted to be important for the cell envelope with a very significant probability and odds score. As expected, this result is correct, since it also occurs in the GO-terms in UniProt. In contrast, the prediction of transport and binding, which has a good probability but no high odds score, is wrong. The GO category Immune_response predicted by ProtFun for LAMP1_HUMAN is not false, as autophagy is a process often triggered by the immune system in response to foreign substances. Since autophagy is listed in UniProt, the prediction of stress response is also reasonably correct.

<br><br>

'''RET4_HUMAN'''

[[File:BCKDHA_ProtFun_RET4_Human.png|500px|thumb|Figure51: Prediction of GO-terms for RET4_HUMAN by ProtFun 2.2]]

* <b>Functional category</b>
** Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
** Central_intermediary_metabolism (Prob: 0.197, Odds: 3.128)
** Transport_and_binding (Prob: 0.800, Odds: 1.951)
* <b>Enzyme/nonenzyme</b>
** Enzyme (Prob: 0.544, Odds: 1.900)
* <b>Enzyme class</b>
** Lyase (Prob: 0.059, Odds: 1.264) (=>)
** Hydrolase (Prob: 0.235, Odds: 0.742)
* <b>Gene Ontology category</b>
** Immune_response (Prob: 0.239, Odds: 2.813) (=>)
** Stress_response (Prob: 0.616, Odds: 1.829)

The results of ProtFun2.2 for the prediction of the GO-terms in RET4_HUMAN are listed in [[:File:BCKDHA_ProtFun_RET4_Human.png|Figure 51]].

The most significant results can be found in the list above. ProtFun2.2 assigns RET4_HUMAN to Cell_envelope with a very high probability and a significant odds score, and the comparison with UniProt shows that this prediction is right. The result that the protein is involved in metabolism has a very low probability and a much lower odds score than the first hit, but it is nevertheless correct. The last of the three listed functional categories is also predicted accurately. For the prediction of immune response we cannot find any support in the UniProt GO-terms for RET4_HUMAN, whereas the prediction of stress response is correct.
   
 
== References ==
 
 
back to [[Maple syrup urine disease]] main page
 
go to [[Sequence Alignments BCKDHA]] (Task 2)
go to [[Homology_based_structure_predictions_BCKDHA]] (Task 4)

Latest revision as of 20:50, 25 August 2011

1. Secondary structure prediction

General Information

The secondary structure of a protein is based on its primary structure and consists of alpha-helices, beta-sheets and coils.

alpha-helices

Figure1: alpha-helix

Alpha-helices (Figure 1) are built by H-bonds between the NH-group of an amino acid and the CO-group of the amino acid which is placed four residues earlier (i+4 → i). This form of helix is the most common one. There are two other helix types which are very rare. One is called the 3-10 helix, because the H-bond is between the NH-group and the CO-group three residues earlier (i+3 → i). The other one is the pi-helix, where the H-bond is between the NH-group and the CO-group five residues earlier (i+5 → i). The different locations of the CO-group influence the width and the height of the helices.

beta-sheets

Figure2: beta-sheet

The H-bonds (Figure 2) between the CO-group and the NH-group which build a beta-sheet can connect residues that are located far away from each other in the sequence.
There are two different kinds of beta-sheets: the parallel one, where the strands all point in the same direction, and the anti-parallel one, where the strands point alternately in opposite directions.

coils

Coils are irregularly formed elements such as turns.

PSIPRED

Basic information

author: David T. Jones (University College London)
year: 1998
version: 2

PSIPRED uses neural networks which have a single hidden layer and a feed-forward back-propagation architecture to predict the secondary structure. To run PSIPRED locally, it requires the output of PSI-BLAST (Position Specific Iterated - BLAST) as input data.
For the online prediction on the server it is enough to enter an amino acid sequence. Evaluated with a very stringent cross-validation method, PSIPRED reaches an average Q3 score of 80.7%.
The prediction is split into three steps. In the first step sequence profiles are generated by using a position specific scoring matrix from PSI-BLAST as input for the neural network. In the next step the secondary structure is predicted. In the last step the output of the secondary structure prediction is filtered.
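
The Q3 score mentioned above is the percentage of residues whose three-state label (H, E or C) is predicted correctly. A minimal sketch of the calculation follows; the function name q3_score and the toy strings are illustrative and not part of PSIPRED, and positions without a reference label are simply skipped, which is an assumption of this sketch.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues with a correctly predicted H/E/C state.

    Positions where the observed string has no H/E/C label are ignored.
    """
    if len(predicted) != len(observed):
        raise ValueError("both strings must have the same length")
    # Keep only positions that carry a reference label.
    scored = [(p, o) for p, o in zip(predicted, observed) if o in "HEC"]
    if not scored:
        raise ValueError("no labelled residues to score")
    correct = sum(1 for p, o in scored if p == o)
    return 100.0 * correct / len(scored)


if __name__ == "__main__":
    # Toy example: 8 of 10 residues agree.
    print(q3_score("CHHHHCCEEE", "CHHHCCCEEC"))  # 80.0
```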

There are three different options:
- Mask low complexity regions
- Mask transmembrane helices
- Mask coiled-coil regions

References

[PSIPRED Server]
[Overview of prediction methods]
[History of the PSIPRED]

Prediction

Figure3: Visualization of the prediction of PSIPRED
Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred      CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC
UniProt                                                     

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred      CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCHH
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCHHHHHHHHHHCCCC
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred      CCEEECCCCHHHHHHHCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCC
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred      CCCCCCCCCCCCHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHH
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHHHHCCCEEEEEECCCCCCCCCCCHHCCCCHHHHHCCCCCCCCCEECC
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred      HHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCHHHH
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHHCCCCHHHHHHHHHHCCCCC HHHHHHHHHHHHHHHHHHHHHHHC
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred      CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHCCCCCCCCCCC
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH  


PSIPRED has predicted 23 coils, 16 alpha helices and 6 beta sheets, as shown in the alignment above. In Figure 3 these predictions are visualized by pink bars, which stand for the alpha helices, and yellow arrows, which symbolize the beta sheets. PSIPRED does not mark coils with a special symbol, so wherever there is neither a bar nor an arrow there is a coil.
As the alignment of the predicted secondary structure with the real one from UniProt shows, the prediction is completely wrong at the beginning. In the middle part it becomes better, but there are still many mistakes. PSIPRED seems to have more problems with beta sheets than with alpha helices: it predicts non-existent beta sheets or misses existing ones more often than it does for alpha helices. In most cases it predicts the alpha helices quite well. The comparison with the UniProt structure shows that the long alpha helices in particular are predicted correctly, except for one long region in the middle of the sequence which should be a long beta sheet but is predicted as an alpha helix.
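
Segment counts like the ones quoted above can be obtained by counting runs of identical letters in the prediction string. The sketch below shows the idea; count_segments is a name chosen for this illustration. The rows of the alignment have to be concatenated before counting, otherwise a segment that continues across a row break is counted twice.

```python
from itertools import groupby


def count_segments(states: str) -> dict:
    """Count runs of identical state letters (e.g. H, E, C) in a string."""
    counts = {}
    for state, _run in groupby(states):
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    # First row of the PSIPRED prediction shown above.
    first_row = "CHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCC"
    print(count_segments(first_row))  # {'C': 3, 'H': 2}
```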

Jpred3

Basic information

author: Cole C, Barber JD & Barton GJ (Bioinformatics and Computational Biology Research, University of Dundee)
year: 1998
version: 3


Jpred uses a neural network to make its predictions. To predict the secondary structure of a protein sequence or of a multiple alignment of protein sequences, the algorithm Jnet is used. The prediction accuracy for secondary structures lies above 81%. Additionally, Jpred makes predictions about the solvent accessibility.
Jpred3 needs a protein sequence or a multiple alignment of protein sequences as input.
It is important that the target sequence is the first sequence in the multiple alignment, since the alignment is modified so that the first sequence does not have any gaps. The alignment has to be in the MSF or in the BLC format.

References

Jpred3 Server
About Jpred
FAQ


Prediction

While predicting the secondary structure of BCKDHA, JPred found many hits with very good e-values among other proteins.

e-value=0.0
2bew, 2bev, 2beu, 1x80, 1wci, 1u5b, 1olx, 1ols, 1dtw, 1x7y, 1x7z, 1x7x, 1x7w, 2j9f, 2bff, 1v1r, 1olu, 1v16, 1v11, 2bfc, 2bfb, 1v1m, 2bfd, 2bfe

e-value=6e-58
1umd, 1umc, 1umb, 1um9

e-value=1e-57
2bp7, 1qs0, 1w85, 3dva, 1w88


Figure4: Visualization of the prediction of JPred (alpha helices: red bars; beta sheets: green arrows)
first line: prediction; second and third line: confidence of the prediction

With these hits JPred ran the prediction:

Seq       MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD
Pred        HHHHHHHHHHHHHH                 EEE              
Conf      10090009999980000000323546777770000303566666777777
UniProt

Seq       KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE
Pred                                 EEEEE                HH
Conf      77777777777777654567777777308885377740467787776368
UniProt             EEEE                          HHH     HH

Seq       KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN
Pred      HHHHHHHHHHHHHHHHHHHHHHHH     E      HHHHHHHHHHH
Conf      99999999999999999999875045000001677517899999885278
UniProt   HHHHHHHHHHHHHHHHHHHHHHHH  EEE        HHHHHHHH     

Seq       TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER
Pred        EEEE    HHHHHHHH  HHHHHHHHH
Conf      84465157745788885065689988740677754577777545677777
UniProt    EEE      HHHHHH    HHHHHHHHH     CCCC         CCC

Seq       HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF
Pred                   HHHHHHHHHHHH     EEEEEE      HHHHHHHH
Conf      64132147888770367889998750688558887407887468999999
UniProt   C       CCCHHHHHHHHHHHHHHH     EEEEEE  HHH HHHHHHH

Seq       NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
Pred      HHHH     EEEEEEE                 HHHHHHH   EEEEE
Conf      87500888606888703677777777777764067777005725774078
UniProt   HHHHH    EEEEEEE EEE    HHH  EEE  HHH HHH  EEEEEE 

Seq       NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE
Pred        HHHHHHHHHHHHHHHHH    EEEEEEEEEE              HHH
Conf      74689999999999988507985588886354067777777765553688
UniProt     EEEEEEEEEEEEEEEEEE   EEEEEE                     

Seq       VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK
Pred      HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH
Conf      99998468758999999986068866899999999999999999988606
UniProt            HHHHHHHHHCCCC   HHHHHHHHHHHHHHHHHHHHHHHH 

Seq       PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred          HHHHHHH      HHHHHHHHHHHHHHHH
Conf      887368777523688756899999999999875267777777888
UniProt       HHHH   EEEE  HHHHHHHHHHHHHHHHHHHH  HHH   


Comparing the secondary structure predicted by Jpred with the secondary structure of BCKDHA in UniProt, as done in the alignment above, shows that the prediction differs a lot from UniProt at the beginning but becomes much better in the middle and at the end. Jpred predicts more helices and fewer beta sheets than there are in the UniProt secondary structure. Interestingly, although there are no alpha helices at the beginning, Jpred predicts them with quite a high confidence. This high confidence can also be seen clearly in the visualization of the prediction (Figure 4), where it is displayed by black bars. There is one part in the middle of the sequence where Jpred predicts a very long alpha helix although it should be a beta sheet. It is interesting that PSIPRED also had problems with this beta sheet. In the rest of the middle part the prediction of Jpred is quite correct except for a few positions. Figure 4 underlines that the protein mainly consists of alpha helices, since mainly red bars are shown.
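
The confidence digits (0 to 9) in the Conf lines above can be summarized per predicted state. The sketch below assumes the layout used on this page: a blank in the Pred line means coil, and a Pred line may be shorter than its Conf line because trailing blanks were trimmed. The function name mean_confidence_by_state and the toy strings are illustrative only.

```python
from collections import defaultdict


def mean_confidence_by_state(pred: str, conf: str) -> dict:
    """Average the 0-9 confidence digits separately for H, E and coil (C)."""
    if len(pred) > len(conf):
        raise ValueError("prediction is longer than the confidence string")
    # Restore trailing blanks (coil) that were trimmed from the Pred line.
    pred = pred.ljust(len(conf))
    totals = defaultdict(int)
    counts = defaultdict(int)
    for state, digit in zip(pred, conf):
        key = "C" if state == " " else state
        totals[key] += int(digit)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Toy example, not taken from the BCKDHA prediction.
    print(mean_confidence_by_state(" HHHH  EE", "0899820672"))
```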

DSSP

Basic information

author: Wolfgang Kabsch and Chris Sander (Max-Planck-Institut für medizinische Forschung, Heidelberg)
year: 1983
whole name: Define Secondary Structure of Proteins

Based on atomic coordinates in Protein Data Bank format, DSSP defines the secondary structure of a protein.
With this method the secondary structure is not predicted but determined from the 3D coordinates.
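
In the original DSSP publication (Kabsch and Sander, 1983) the presence of a backbone hydrogen bond is decided from an electrostatic energy computed from four interatomic distances between a C=O group and an N-H group; a bond is assumed when the energy is below -0.5 kcal/mol. The sketch below illustrates this criterion only. The function names are made up for this example, and the full program additionally derives turns, helices, bridges and sheets from the resulting bond pattern.

```python
def dssp_hbond_energy(r_on: float, r_ch: float, r_oh: float, r_cn: float) -> float:
    """Electrostatic H-bond energy in kcal/mol from distances in Angstrom.

    E = q1 * q2 * f * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN)
    with partial charges q1 = 0.42 e, q2 = 0.20 e and f = 332.
    """
    q1, q2, f = 0.42, 0.20, 332.0
    return q1 * q2 * f * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(r_on: float, r_ch: float, r_oh: float, r_cn: float) -> bool:
    """A hydrogen bond is assigned when the energy is below -0.5 kcal/mol."""
    return dssp_hbond_energy(r_on, r_ch, r_oh, r_cn) < -0.5


if __name__ == "__main__":
    # Idealised linear C=O...H-N arrangement with an O...H distance of 2.0 A.
    print(is_hbond(r_on=3.0, r_ch=3.23, r_oh=2.0, r_cn=4.23))
    # Same groups pulled 10 A apart: no hydrogen bond.
    print(is_hbond(r_on=11.0, r_ch=11.23, r_oh=10.0, r_cn=12.23))
```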


References

[Introduction]
[Explanation ]


Prediction

Figure5: Visualization of the prediction of DSSP.
Seq     KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMT
Pred        TT       T        TT T    T  TTT  T 333     HHHHHHHHHHHH
UniProt                            EEEEE                HH

Seq     LLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYP
Pred    HHHHHHHHHHHHHHTTTTT     TT HHHHHHHHHTT TTTSSS  TT HHHHHHTT
UniProt HHHHHHHHHHHHHH     E      HHHHHHHHHHH     EEEE    HHHHHHHH  

Seq     LELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANR
Pred    HHHHHHHHHT TT TTTT T TT    TTTT     TTTTTHHHHHHHHHHHHHHTT
UniProt HHHHHHHHH                                  HHHHHHHHHHHH

Seq     VVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPG
Pred     SSSSSSTT333THHHHHHHHHHHHTT  SSSSSSS TSSTTSS333T TTTTT333T33
UniProt EEEEEE      HHHHHHHHHHHH     EEEEEEE                 HHHHHHH

Seq     YGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYR
Pred    3T SSSSSSTT HHHHHHHHHHHHHHHHHHT  SSSSSS    T TTTT  333T
UniProt    EEEEE    HHHHHHHHHHHHHHHHH    EEEEEEEEEE             

Seq     VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFS
Pred     HHHHHHT HHHHHHHHHHHHTT  HHHHHHHHHHHHHHHHHHHHHHHHT    3333TT
UniProt HHHHHH   HHHHHHHHHHH     HHHHHHHHHHHHHHHHHHHHHHHH     HHHHHH
 
Seq     DVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred    TTTTT  HHHHHHHHHHHHHHHHH333T 333
UniProt H      HHHHHHHHHHHHHHHH


Description of the visualization of the prediction
Note that the first 50 amino acids of the sequence are not shown, and that the part relevant for our protein ends at position 391.
1. line: Sequence
2. line: structural elements
3. line: if a residue is involved in symmetry contacts it is labeled with a star
4. line: if a residue is solvent accessible it is labeled with an "A"

Letter code for the secondary structure elements:

  • H (blue): alpha helix
  • 3 (yellow): residue in isolated beta-bridge
  • T (red): hydrogen bonded turn
  • S (green): bend

The comparison of the assigned structure with the structure of BCKDHA in UniProt shows that they match to a large extent. Especially the alpha helices are assigned mostly correctly. As the blue regions in Figure 5 show, the protein mainly consists of alpha helices, so most of the assignment is exact. The comparison with the UniProt structure also shows that DSSP has some problems assigning beta sheets.
DSSP offers much more information than the two other tools, since it does not only assign alpha helices, beta sheets and turns but also reports symmetry contacts and solvent accessibility.

2. Prediction of disordered regions

General information

Disordered regions are long regions which do not have a regular secondary structure. They are dynamically flexible and adopt a regular structure only when they bind to another substrate or protein. In these regions polar and charged amino acids, and especially proline, are overrepresented. Disordered regions are conserved and occur mainly in regions which have a regulatory function. Since disordered regions have no clear secondary structure, they also have no tertiary structure.


DISOPRED

Basic information

author: Jonathan J. Ward, Liam J. McGuffin, Kevin Bryson, Bernard F. Buxton and David T. Jones (University College London)
year: 2004
version: 2

DISOPRED2 identifies disordered regions by searching for residues which appear in the sequence records but have no coordinates in the electron density map. This is a very simple criterion for disorder, and it is imperfect, because the absence of coordinates can also be caused by artifacts of the crystallization process.

References

Publication
DISOPRED server
Information

Prediction

Figure6: Prediction of the disordered regions
  
Figure7: Profile plot of the disordered regions

The first line denotes the confidence of the prediction, which is shown in the second line. A predicted disordered residue is marked with an asterisk (*). All of the disordered regions are predicted with a very high confidence.
DISOPRED predicts disordered regions mainly at the beginning and a few at the end of BCKDHA, as shown by the red fields in Figure 6.
Figure 7 on the right side also shows that the disordered regions are at the beginning and at the end, since the highest peaks are located there.

POODLE

Basic information

POODLE uses machine learning approaches to predict the disordered regions of an amino acid sequence.

author:
- POODLE-L S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi
- POODLE-S K. Shimizu, Y. Muraoka, S. Hirose, and T. Noguchi
- POODLE-W K. Shimizu, Y. Muraoka, S. Hirose, K. Tomii and T. Noguchi
- POODLE-I S.Hirose, K.Shimizu, N.Inoue, S.Kanai and T.Noguchi

year:
- POODLE-L 2007
- POODLE-S 2007
- POODLE-W 2007
- POODLE-I 2008

options:
POODLE-L: This tool searches for disordered regions which are longer than 40 consecutive amino acids.
POODLE-S: Here the focus lies on predicting short disordered regions. There are two different subtools: "Missing residues" and "High B-factor residues"
POODLE-W: With this option the proteins which are mostly disordered can be found.
POODLE-I: In this tool the other three tools are combined. POODLE-I also uses structural information to predict disordered regions. It is based on a work-flow approach.


References

[POODLE-L]
[POODLE-S]
[POODLE-W]
[POODLE-I]
[POODLE server]
[Help]


Prediction

POODLE-S

Figure8: POODLE-S (Missing residues): disordered region prediction
Figure9: POODLE-S (High B-factor residues): disordered region prediction

POODLE-S option            disordered region   average confidence
Missing residues           1-56                0.75
Missing residues           341-345             0.58
Missing residues           420-423             0.56
High B-factor residues     6-9                 0.63
High B-factor residues     15-57               0.77
High B-factor residues     93                  0.53
High B-factor residues     95-96               0.55
High B-factor residues     340-354             0.67
High B-factor residues     379-402             0.59


POODLE-S (which predicts short disordered regions) with the option "Missing residues" predicted disordered regions at the positions 1-56, 341-345 and 420-423. This is also shown in Figure 8. The peaks in the green region which are above the cut-off value of 0.5 stand for the disordered regions. At the beginning there is a very high and also very long peak, so the tool predicts with a very high confidence that there is a long region with no fixed structure at the beginning of the protein. Its average confidence of 0.75 can also be seen in the table under the figures. The other two values in this table show that the predictions of the two disordered regions at the end of the protein do not have a very high confidence. We also ran the prediction with POODLE-S with the option "High B-factor residues". Here the predicted disordered regions are at the positions 6-9, 15-57, 93, 95-96, 340-354 and 379-402. This is also shown in Figure 9. This option predicts more regions with no fixed structure, but as with the option "Missing residues" they are mostly at the beginning and at the end of the protein. Comparing Figure 9 with Figure 8 shows that the predictions at the end are made with more confidence in the second run with "High B-factor residues". The peaks are much higher and also longer, which shows that the predicted disordered regions are longer.
In both runs POODLE-S shows much variation in the middle part of the protein between the peaks. There are always small peaks, but they are not high enough to exceed the cut-off value.
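
The step from a per-residue score profile to the regions and average confidences in the table can be sketched as follows. The cut-off of 0.5 is the one mentioned above; the function name disordered_regions and the toy profile are illustrative, and the actual POODLE post-processing may differ.

```python
def disordered_regions(scores, cutoff=0.5):
    """Return (start, end, mean score) for every run of scores above cutoff.

    Positions are 1-based and inclusive; the mean is rounded to 2 decimals.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos            # a new region begins here
        elif start is not None:
            run = scores[start - 1:pos - 1]
            regions.append((start, pos - 1, round(sum(run) / len(run), 2)))
            start = None
    if start is not None:              # region reaches the end of the protein
        run = scores[start - 1:]
        regions.append((start, len(scores), round(sum(run) / len(run), 2)))
    return regions


if __name__ == "__main__":
    # Toy profile, not the BCKDHA scores.
    profile = [0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.9, 0.9]
    print(disordered_regions(profile))
    # [(1, 2, 0.75), (5, 5, 0.6), (7, 8, 0.9)]
```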

POODLE-L

Figure10: POODLE-L : prediction of disordered regions


disordered region   average confidence
1-48                0.6
369-428             0.67


POODLE-L predicts two disordered regions which are longer than 40 amino acids. They are located at the positions 1-48 and 369-428. Figure 10 shows that the predictions are at the beginning and at the end of the protein. Both predictions only have low peaks, so POODLE-L is not completely confident about them. This observation is supported by the average confidence values of 0.6 and 0.67. A possible explanation is that POODLE-L searches for long disordered regions, and the two regions, being not much longer than 40 amino acids, are perhaps too short to be very good matches.
Since POODLE-L only looks for long disordered regions, it is confident that the rest of the protein does not contain any. This observation is supported by Figure 10, because there are no small peaks in the middle of the plot.

POODLE-W

Figure11: POODLE-W: prediction of disordered regions



In Figure 11, regions which could be disordered but about which POODLE is not sure are bordered by blue squares, and the confidently predicted disordered regions are bordered by red squares.

0=ordered regions
5=perhaps disordered regions
9=disordered regions

In this case there is no predicted disordered region at the beginning of the protein, which is completely different from the other two POODLE tools we already used. Instead, the prediction of the disordered region at the end is very good: the confidence is high, and the stretch predicted to be disordered is very long and reaches to the end of the protein. The first part of the disordered region does not have a high confidence, but the major part of the match is assigned the highest possible confidence of 9, which can be seen in Figure 11 by the red box.

POODLE-I

Figure12: POODLE-I: prediction of disordered regions


disordered region   average confidence
1-56                0.6
341-345             0.56
370-427             0.67
443-445             0.74


POODLE-I predicted four disordered regions at the positions 1-56, 341-345, 370-427 and 443-445. These predictions are shown in Figure 12, where we can see that they are at the beginning and at the end of the protein. The peak at the beginning is quite long, but in the middle it drops so low that it is nearly under the cut-off value. That is why its average value is also low. The plot (Figure 12) shows, however, that this peak has two maxima, both around 0.7, which underlines that the prediction is fairly reliable. The next peak is very short and also has a low average confidence of 0.56, so POODLE-I does not seem to be sure about this prediction. The third peak is longer than the other peaks and additionally has a good average confidence value of 0.67. The peak directly at the end of the protein has the highest value, but that is understandable since the structure is always less well defined at the end of a protein. We therefore have to be careful with this hit, because it may also be wrong.
Between the predicted regions there are also many small peaks which are not high enough to exceed the threshold.

Comparison

POODLE-S (Missing residues)   POODLE-S (High B-factor residues)   POODLE-L   POODLE-W   POODLE-I
1-56                          6-9                                 1-48       325-445    1-56
341-345                       15-57                               369-428               341-345
420-423                       93                                                        370-427
                              95-96                                                     443-445
                              340-354
                              379-402


Comparing all the POODLE tools, we can summarize that the disordered regions are mainly at the beginning or at the end of the protein. Only POODLE-S predicts them in the middle of the protein, but there the regions are so short and the confidence is so low that it is not certain whether they really are disordered regions. The predicted disordered regions lie mainly between the positions 1-56 and 341-445. That the disordered regions are found at the beginning and at the end of the protein is not surprising, since the structure in these regions is usually not very well defined. Such a hit can therefore also be a false positive, caused only by the poor definition of the secondary structure.
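
The agreement between the tools can be made explicit by counting, for every position, how many predictors call it disordered. The sketch below uses the regions from the comparison table and a protein length of 445 residues; the names PREDICTIONS, vote_counts and consensus_regions are chosen for this illustration, and treating the two POODLE-S options as separate predictors is an assumption of the sketch.

```python
PROTEIN_LENGTH = 445

# Regions copied from the comparison table (1-based, inclusive).
PREDICTIONS = {
    "POODLE-S (Missing residues)": [(1, 56), (341, 345), (420, 423)],
    "POODLE-S (High B-factor residues)": [(6, 9), (15, 57), (93, 93),
                                          (95, 96), (340, 354), (379, 402)],
    "POODLE-L": [(1, 48), (369, 428)],
    "POODLE-W": [(325, 445)],
    "POODLE-I": [(1, 56), (341, 345), (370, 427), (443, 445)],
}


def vote_counts(predictions, length):
    """Number of predictors that mark each position as disordered."""
    votes = [0] * (length + 1)          # index 0 is unused
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        for pos in covered:
            votes[pos] += 1
    return votes


def consensus_regions(predictions, length, min_votes):
    """Merge consecutive positions supported by at least min_votes tools."""
    votes = vote_counts(predictions, length)
    regions, start = [], None
    for pos in range(1, length + 1):
        if votes[pos] >= min_votes:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, length))
    return regions


if __name__ == "__main__":
    print(consensus_regions(PREDICTIONS, PROTEIN_LENGTH, min_votes=4))
```

With these inputs, at least four of the five predictors agree on the positions 6-9, 15-48, 341-345, 379-402 and 420-423, and no position is supported by all five, because POODLE-W predicts nothing at the beginning of the protein.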


IUPred

Basic information

author: Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon
year: 2005

IUPred predicts disordered regions by estimating the capacity of polypeptides to form stabilizing contacts. The potential to form these contacts depends on the surrounding sequence and on the chemical properties. This approach is based on the idea that disordered regions have no capacity to form sufficient interresidue interactions so that there is no stabilizing energy.

There are three different prediction types which can be chosen:

  • long disorder: predicts context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder
  • short disorder: predicts short, probably context-dependent, disordered regions, such as missing residues in the X-ray structure of an otherwise globular protein
  • structured regions: takes the energy profile and finds continuous regions that are confidently predicted to be ordered
References

[IUPred server]
[Theory]

Prediction

Prediction type: long disorder
Figure13: Prediction of disordered regions with IUPred(long disorder)


disordered region 33-50 89-93 385-388 390-397 399-401 404-413 420-422 424-428 431
average confidence 0.69 0.57 0.52 0.64 0.51 0.55 0.52 0.56 0.55


The long disorder option of IUPred predicts several disordered regions. They are located at positions 33-50, 89-93, 385-388, 390-397, 399-401, 404-413, 420-422, 424-428 and at position 431. Although there are many separate regions, they all lie at the beginning or at the end of the protein. Figure 13 shows that mainly the peak at the beginning has a high confidence. Since this hit is quite long, it also contains residues with lower confidence, which is why the average value is only 0.69, which is still quite good. The second peak is very short and has a weak confidence, so it is not clear whether it is a real hit. At the end of the protein there are many predicted disordered regions, but for all except one the prediction is uncertain because the confidence value is only slightly above the cut-off.
Between these predicted disordered regions there are many peaks that lie only slightly below the threshold. Judging from Figure 13, the whole protein except for the middle part could belong to a disordered region.
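The region tables in this section can be derived from a per-residue score profile by thresholding. The following is only a minimal sketch of that step: the cut-off of 0.5 and the toy scores are assumptions for illustration, not values taken from the IUPred output for BCKDHA.

```python
from statistics import mean


def disordered_regions(scores, cutoff=0.5):
    """Return (start, end, average score) for each run of residues at or above the cutoff.

    Positions are 1-based and inclusive, as in the tables of this section.
    The cutoff of 0.5 is an assumption.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score >= cutoff and start is None:
            start = pos  # a new region begins here
        elif score < cutoff and start is not None:
            # the region ended at the previous residue
            regions.append((start, pos - 1, round(mean(scores[start - 1:pos - 1]), 2)))
            start = None
    if start is not None:  # region runs to the end of the sequence
        regions.append((start, len(scores), round(mean(scores[start - 1:]), 2)))
    return regions


# Toy profile, not real IUPred output
toy_scores = [0.2, 0.6, 0.8, 0.7, 0.3, 0.1, 0.55, 0.4]
print(disordered_regions(toy_scores))
```

A long region with a dip in the middle gets a lower average than its maxima suggest, which is the effect described for the first peak.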

Prediction type: short disorder
Figure14: Prediction of disordered regions with IUPred (short disorder)


disordered region 1 33-55 92-93 393-411 415 420-421 423-425 427-428 433 438-445
average confidence 0.56 0.7 0.56 0.57 0.5 0.53 0.53 0.53 0.51 0.73



The short disorder option of IUPred predicts several disordered regions. They are located at positions 1, 33-55, 92-93, 393-411, 415, 420-421, 423-425, 427-428, 433 and 438-445. The hit at position 1 can be neglected because it is only one residue long and its confidence value is only 0.56. The next predicted region seems to be important because it is about 20 residues long and has an average confidence of 0.7. Figure 14 shows that the average is not higher only because the peak is so long. The maximum confidence of this region is about 0.8, which signals a confident prediction. The next hit is only two residues long and has a maximum value of 0.57. Since it is so short, it can also be neglected.

After these predictions at the beginning of the protein there are many very short regions at the end of the protein. All of them except the last one are only a few residues long. There are two possible interpretations of these short regions. Either they are too short to be true disordered regions, which is supported by their low confidence values, or they have to be combined into one long disordered region. The second interpretation is supported by the fact that all these short regions lie next to each other. Since all the other programs also predicted a disordered region at the end of the protein, we chose the second interpretation.

The last hit is the most significant one. It is only eight residues long, but its average confidence is 0.73 and its maximum is higher than 0.9. Such a clear prediction at the end of the protein is expected, because this part of a protein normally has no well-defined fixed structure. The lack of a defined secondary structure does not, however, mean that the region has no function.
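The second interpretation, combining neighbouring short hits into one long region, can be sketched as an interval merge. The regions are the ones listed in the table for the short disorder option. The gap tolerance `max_gap` is a hypothetical parameter chosen for illustration and is not part of IUPred.

```python
def merge_close_regions(regions, max_gap):
    """Merge 1-based inclusive regions separated by at most max_gap residues."""
    merged = []
    for start, end in sorted(regions):
        # number of residues between the previous region and this one
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


# Regions predicted by IUPred (short disorder), taken from the table above
iupred_short = [(1, 1), (33, 55), (92, 93), (393, 411), (415, 415),
                (420, 421), (423, 425), (427, 428), (433, 433), (438, 445)]

print(merge_close_regions(iupred_short, max_gap=4))
# [(1, 1), (33, 55), (92, 93), (393, 445)]
```

With a tolerance of four residues all C-terminal hits collapse into a single region 393-445, while the N-terminal hits stay separate.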

Prediction type: structured regions
Figure15: Prediction of disordered regions with IUPred(structured regions)


When the sequence is analyzed with the option "structured regions", the program does not find any disordered region in the whole protein. Its only output is the message "Unkown globular domains: 1-445" and Figure 15.

META-Disorder

To run META-Disorder we used the PredictProtein server. <ref> https://www.predictprotein.org/ </ref>
This server provides not only the prediction of disordered regions but also many other features, such as the effects of amino acid substitutions and protein-protein interaction sites. A complete list can be found on the wiki page <ref> https://rostlab.org/owiki/index.php/PredictProtein-Machine_Image#What_protein_features_are_predicted_by_methods_included_on_the_PredictProtein_Machine_Image.3F </ref>. It is also possible to download the PredictProtein Machine Image (PPVMI), which contains a fully functional Debian system, the prediction methods and the supporting databases, so that the predictions can be run locally. A very good description of how to use it is given in the wiki mentioned above. One very important point is that PredictProtein stores all predictions, which makes it possible to get a very fast answer for jobs that have been run before. At the moment there are 4,405,120 annotated proteins in the PredictProtein cache.

Prediction
Figure 15: in the picture the output of PredictProtein is shown. In the red box the prediction of the disordered regions is shown. Disordered regions are red and non-disordered regions are green.


disordered region 1-9 394-400
average confidence 0.63 0.57


META-Disorder predicted only two possible disordered regions. This can be seen in Figure 15, where only the beginning and the end of the strand are red and the rest is green. The table also shows that the regions lie entirely at the beginning and at the end of the protein. Since there are no other candidate regions in this prediction and the green part is clearly predicted as not disordered, these two hits could be wrong. This is not certain, but in general the ends of a protein have no well-defined structure even if they have a function.

Comparison

Comparing the results of all disorder prediction tools, we can see that all of them predicted disordered regions at the beginning and at the end of the protein. We have to be careful with these results because the structure of a protein is usually not well defined in these regions, so a hit can arise simply from the poorly defined secondary structure there. On the other hand, all of the programs predicted these regions, and most of them with high confidence. It therefore seems quite likely that the beginning and the end of BCKDHA are disordered regions.
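The agreement between the tools can also be counted per residue. The sketch below uses the regions reported above for POODLE-I, IUPred (long and short disorder) and META-Disorder; the other POODLE variants are left out. The simple voting scheme is our own illustration and not a feature of any of these tools.

```python
from collections import Counter

# Predicted disordered regions (1-based, inclusive) as reported in this section
PREDICTIONS = {
    "POODLE-I": [(1, 56), (341, 345), (370, 427), (443, 445)],
    "IUPred long": [(33, 50), (89, 93), (385, 388), (390, 397), (399, 401),
                    (404, 413), (420, 422), (424, 428), (431, 431)],
    "IUPred short": [(1, 1), (33, 55), (92, 93), (393, 411), (415, 415),
                     (420, 421), (423, 425), (427, 428), (433, 433), (438, 445)],
    "META-Disorder": [(1, 9), (394, 400)],
}


def votes_per_residue(predictions):
    """Count for every residue how many tools call it disordered."""
    votes = Counter()
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        votes.update(covered)  # each tool votes at most once per residue
    return votes


def consensus_regions(predictions, min_tools):
    """Return contiguous regions supported by at least min_tools tools."""
    votes = votes_per_residue(predictions)
    regions = []
    for pos in sorted(p for p, n in votes.items() if n >= min_tools):
        if regions and pos == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], pos)  # extend the current region
        else:
            regions.append((pos, pos))
    return regions


print(consensus_regions(PREDICTIONS, min_tools=3))
```

Lowering `min_tools` gives a more permissive consensus; requiring all four tools leaves only a few residues near the C-terminus.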

3. Prediction of transmembrane alpha-helices and signal peptides

General

Transmembrane Topology

The prediction of the membrane topology of proteins aims at discovering which portions of the protein lie within the lipid bilayer of a membrane and which portions protrude from the membrane into the aqueous environment. Membrane-spanning polypeptides usually form helices about 20 amino acids long. As the interior of the membrane is hydrophobic, the membrane-spanning part of the protein consists mainly of hydrophobic amino acids as well. This information can be used for the prediction of transmembrane helices, which subsequently enables the prediction of the membrane topology. <ref> http://en.wikipedia.org/wiki/Membrane_topology</ref><ref>http://en.wikipedia.org/wiki/Transmembrane_domain</ref>

Prediction tools: TMHMM, OCTOPUS and SPOCTOPUS
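The idea that a stretch of about 20 hydrophobic residues hints at a transmembrane helix can be illustrated with a simple sliding-window hydropathy scan. Note that this is a classical Kyte-Doolittle style sketch for illustration only and not the method used by TMHMM, OCTOPUS or SPOCTOPUS, which rely on hidden Markov models and neural networks. The window length of 19 and the threshold of 1.6 are conventional choices that we assume here, and the toy sequence is invented.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (start, end, mean hydropathy) for every window above the threshold.

    Positions are 1-based and inclusive. Only standard amino acids are handled.
    """
    hits = []
    for i in range(len(sequence) - window + 1):
        segment = sequence[i:i + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score > threshold:
            hits.append((i + 1, i + window, round(score, 2)))
    return hits


# Invented sequence: polar ends around a 20-residue hydrophobic core
toy_sequence = "MKDDKRSE" + "LIVALLAVIGLLVAFLLIAV" + "KRDEKSDN"
for start, end, score in hydrophobic_windows(toy_sequence):
    print(start, end, score)
```

Such a scan cannot tell a transmembrane helix from the hydrophobic core of a signal peptide, which is one reason why dedicated predictors are used in the following sections.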

Signal Peptides

Signal peptides are N-terminal sequence motifs that direct proteins to their cellular destination, such as the secretory pathway, mitochondria or chloroplasts. One example is the secretory signal peptide (SP), an N-terminal peptide that is typically 15-30 amino acids long. A signal peptide has three regions: an N-terminal region (n-region), which often contains positively charged residues, a hydrophobic region (h-region) of at least six residues in the middle, and a C-terminal region (c-region) of polar uncharged residues. In eukaryotes the SP targets proteins across the membrane of the endoplasmic reticulum, in prokaryotes across the plasma membrane. The SP is cleaved when the protein crosses the membrane.
Furthermore, there are chloroplast transit peptides (cTP), which are also N-terminal and are cleaved when the protein enters the chloroplast. The most conserved site in cTPs is an alanine directly after the N-terminal methionine... <ref>O. Emanuelsson, S. Brunak, G. von Heijne, H. Nielsen, "Locating proteins in the cell using TargetP, SignalP and related tools", Nature Protocols, 2007</ref>

Prediction tools: SignalP, TargetP

Combined transmembrane and signal peptide prediction

As the hydrophobic regions of a transmembrane helix and a signal peptide are highly similar, this leads to cross reaction between these two types of prediction. <ref>http://www.ebi.ac.uk/Tools/phobius/help.html</ref>

Prediction tools: Phobius and Polyphobius

In the following section different tools for predicting transmembrane helices and signal peptides are tested. As the BCKDHA protein isn't a transmembrane protein, additional proteins were used for the transmembrane and signal peptide analysis:

name organism location transmembrane protein signal peptide function reference
A4_HUMAN Human Cell membrane yes yes Protease Inhibitor P05067
BACR_HALSA Halobacterium salinarium Cell membrane yes no ion transport P02945
INSL5_HUMAN Human extracellular region no yes hormone Q9Y5Q6
LAMP1_HUMAN Human Cell membrane, Lysosome membrane, Endosome membrane yes yes Presents carbohydrate ligands to selectins P11279
RET4_HUMAN Human extracellular space no yes Transport P02753


TMHMM

Method

  • Was developed by Sonnhammer, von Heijne and Krogh in 1998 <ref> E.L. Sonnhammer, G. von Heijne and A. Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences, Proc Int Conf Intell Syst Mol Biol.(1998)</ref>
  • Predicts transmembrane topology of membrane-spanning proteins
  • Is a membrane topology prediction method based on a hidden Markov model with an architecture of 7 types of states
  • Required Input: protein sequence in fasta format
  • Can also be run on the TMHMM server

Execution

Before we could execute TMHMM we had to change all occurrences of "/usr/local/bin/" to "/usr/bin" in the following files: tmhmm, tmhmm.ORIG and tmhmmformat.pl

To execute the program we used these commands:

  • tmhmm P05067.fasta > tmhmm_out_P05067.txt
  • tmhmm P02945.fasta > tmhmm_out_P02945.txt
  • tmhmm Q9Y5Q6.fasta > tmhmm_out_Q9Y5Q6.txt
  • tmhmm P11279.fasta > tmhmm_out_P11279.txt
  • tmhmm P02753.fasta > tmhmm_out_P02753.txt
  • tmhmm P12694.fasta > tmhmm_out_P12694.txt
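The six calls can also be scripted. The following sketch assumes that the `tmhmm` executable is on the PATH and that the FASTA files are named after their UniProt accession, as in the commands above. The helper names `output_name` and `run_tmhmm` are our own and not part of TMHMM.

```python
import subprocess
from pathlib import Path

# UniProt accessions of the six proteins analysed in this section
ACCESSIONS = ["P05067", "P02945", "Q9Y5Q6", "P11279", "P02753", "P12694"]


def output_name(accession):
    """Name of the result file, following the naming used in the commands above."""
    return f"tmhmm_out_{accession}.txt"


def run_tmhmm(accession, workdir="."):
    """Run tmhmm on <accession>.fasta and write its standard output to a file."""
    fasta = Path(workdir) / f"{accession}.fasta"
    out = Path(workdir) / output_name(accession)
    with out.open("w") as handle:
        # equivalent to: tmhmm <accession>.fasta > tmhmm_out_<accession>.txt
        subprocess.run(["tmhmm", str(fasta)], stdout=handle, check=True)
    return out
```

Calling `run_tmhmm(acc)` for every entry of `ACCESSIONS` reproduces the six commands; `check=True` makes the script stop if one of the runs fails.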

Results

BCKDHA

Position Membrane topology
1-445 outside

TMHMM predicted no membrane spanning region for the BCKDHA protein, which corresponds to the information provided in Uniprot.

A4_HUMAN

Figure16: Membrane topology of A4_HUMAN (source: Uniprot)
Position Membrane topology
1-700 outside
701-723 TMhelix
724-770 inside


TMHMM predicted one transmembrane helix for A4_HUMAN, which agrees with the UniProt annotation. The predicted transmembrane helix begins at position 701, whereas UniProt states that the transmembrane region spans positions 700-723, as can be seen in Figure 16. The extracellular region reported by UniProt begins at position 18 because of a signal peptide at the beginning of the protein. TMHMM does not include a signal peptide prediction and therefore predicted the extracellular region from position 1 to 700.


BACR_HALSA

Figure17: Membrane topology of BACR_HALSA (source: Uniprot)
Position Membrane topology
1-22 outside
23-42 TMhelix
43-54 inside
55-77 TMhelix
78-91 outside
92-114 TMhelix
115-120 inside
121-143 TMhelix
144-147 outside
148-170 TMhelix
171-189 inside
190-212 TMhelix
213-262 outside

The TMHMM prediction differs slightly from the information provided in UniProt, as can be seen in Figure 17. TMHMM predicted only 13 domains (with the end of the protein in the extracellular space), whereas UniProt reports 15 domains (with the protein ending in the cytoplasm).
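The domain count can be checked directly from the TMHMM table. The sketch below parses the table for BACR_HALSA as it is given above; the function name `parse_topology` is our own.

```python
# TMHMM result for BACR_HALSA, copied from the table above
TMHMM_BACR_HALSA = """\
1-22 outside
23-42 TMhelix
43-54 inside
55-77 TMhelix
78-91 outside
92-114 TMhelix
115-120 inside
121-143 TMhelix
144-147 outside
148-170 TMhelix
171-189 inside
190-212 TMhelix
213-262 outside
"""


def parse_topology(table):
    """Turn 'start-end label' lines into (start, end, label) tuples."""
    segments = []
    for line in table.strip().splitlines():
        span, label = line.split()
        start, end = (int(x) for x in span.split("-"))
        segments.append((start, end, label))
    return segments


segments = parse_topology(TMHMM_BACR_HALSA)
helices = [s for s in segments if s[2] == "TMhelix"]
print(len(segments), "domains,", len(helices), "helices, C-terminus:", segments[-1][2])
# 13 domains, 6 helices, C-terminus: outside
```

Six helices mean that one of the helices implied by the 15 domains reported in UniProt is missing, which also flips the predicted side of the C-terminus.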

INSL5_HUMAN

Figure18: Membrane topology of INSL5_HUMAN (source: Uniprot)
Position Membrane topology
1-135 outside

The TMHMM prediction agrees with the fact that INSL5_HUMAN is a hormone and is therefore secreted into the extracellular region. This information is provided by UniProt and can be seen in Figure 18.

LAMP1_HUMAN

Figure19: Membrane topology of LAMP1_HUMAN (source: Uniprot)
Position Membrane topology
1-10 inside
11-33 TMhelix
34-383 outside
384-406 TMhelix
407-417 inside

The TMHMM prediction for LAMP1_HUMAN agrees only partially with the UniProt annotation shown in Figure 19. The signal peptide and the lumenal domain are predicted as an additional transmembrane helix followed by an extracellular domain. The second transmembrane helix is predicted correctly.

RET4_HUMAN

Position Membrane topology
1-201 outside

The TMHMM prediction for RET4_HUMAN is correct, as RET4_HUMAN is a secreted protein and does not span any membrane.

Phobius and Polyphobius

Methods

  • Phobius was developed by Käll et al. <ref>Käll et al., "A Combined Transmembrane Topology and Signal Peptide Prediction Method", Journal of Mol. Biology,338(5):1027-1036, 2004 </ref>
  • Combined prediction of transmembrane regions and signal peptides
  • Required input information: only sequence in FASTA-Format (20 amino acids and B, Z, X are recognized)
  • As transmembrane topology and signal peptides are likely to be conserved during evolution, Polyphobius was established <ref>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information", Bioinformatics, 21 (Suppl 1):i251-i257, 2005</ref>, which includes information from homologous sequences to the query.
  • Required input:
    • Query sequence in FASTA-Format, which is then blasted against uniprot_trembl
    • Or upload of an alignment in FASTA-Format which provides information about homologs

Results

A4_HUMAN
Phobius Polyphobius
Figure20: Prediction of Phobius
sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 1 N-REGION
REGION 2 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

sp|P05067|A4_HUMAN
SIGNAL 1 17
REGION 1 3 N-REGION
REGION 4 12 H-REGION
REGION 13 17 C-REGION
TOPO_DOM 18 700 NON CYTOPLASMIC
TRANSMEM 701 723
TOPO_DOM 724 770 CYTOPLASMIC

Figure21: Prediction of Polyphobius

Phobius and Polyphobius make almost the same prediction: the tables above differ only in the boundary between the n-region and the h-region of the signal peptide, and Figure 20 and Figure 21 are also nearly identical. Both tools predicted the signal peptide and the membrane topology of A4_HUMAN correctly. The annotated signal peptide and membrane topology of A4_HUMAN can be found in Figure 16.

BACR_HALSA
Phobius Polyphobius
Figure22: Prediction of Phobius
sp|P02945|BACR_HALSA
TOPO_DOM 1 22 NON CYTOPLASMIC.
TRANSMEM 23 42
TOPO_DOM 43 53 CYTOPLASMIC.
TRANSMEM 54 76
TOPO_DOM 77 95 NON CYTOPLASMIC.
TRANSMEM 96 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 142
TOPO_DOM 143 147 NON CYTOPLASMIC.
TRANSMEM 148 169
TOPO_DOM 170 189 CYTOPLASMIC.
TRANSMEM 190 212
TOPO_DOM 213 217 NON CYTOPLASMIC.
TRANSMEM 218 237
TOPO_DOM 238 262 CYTOPLASMIC.

sp|P02945|BACR_HALSA
TOPO_DOM 1 21 NON CYTOPLASMIC.
TRANSMEM 22 43
TOPO_DOM 44 54 CYTOPLASMIC.
TRANSMEM 55 77
TOPO_DOM 78 94 NON CYTOPLASMIC.
TRANSMEM 95 114
TOPO_DOM 115 120 CYTOPLASMIC.
TRANSMEM 121 141
TOPO_DOM 142 147 NON CYTOPLASMIC.
TRANSMEM 148 166
TOPO_DOM 167 186 CYTOPLASMIC.
TRANSMEM 187 205
TOPO_DOM 206 215 NON CYTOPLASMIC.
TRANSMEM 216 237
TOPO_DOM 238 262 CYTOPLASMIC.

Figure23: Prediction of Polyphobius

The predictions of Phobius and Polyphobius differ only slightly in the length of the individual domains, as can be seen in the two tables above. The comparison of Figure 22 with Figure 23 also shows that they are nearly the same and differ only a little in the posterior label probability of the cytoplasmic and non-cytoplasmic regions. Both predictions of the membrane topology are correct, as can be seen by comparing the results with Figure 17.
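The size of these differences can be quantified by comparing the TRANSMEM segments of both outputs helix by helix. The coordinates below are copied from the two tables above; the helper functions are our own illustration.

```python
# TRANSMEM segments for BACR_HALSA, copied from the two tables above
PHOBIUS = [(23, 42), (54, 76), (96, 114), (121, 142),
           (148, 169), (190, 212), (218, 237)]
POLYPHOBIUS = [(22, 43), (55, 77), (95, 114), (121, 141),
               (148, 166), (187, 205), (216, 237)]


def boundary_shifts(first, second):
    """Return (start shift, end shift) of each helix in second relative to first."""
    if len(first) != len(second):
        raise ValueError("both predictions must contain the same number of helices")
    return [(s2 - s1, e2 - e1) for (s1, e1), (s2, e2) in zip(first, second)]


def overlap(a, b):
    """Number of residues shared by two 1-based inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


for number, (shift, pair) in enumerate(
        zip(boundary_shifts(PHOBIUS, POLYPHOBIUS), zip(PHOBIUS, POLYPHOBIUS)), start=1):
    print(f"helix {number}: shift {shift}, overlap {overlap(*pair)} residues")
```

Both tools predict seven helices, and every pair of corresponding helices overlaps; the largest boundary shift is seven residues at the end of the sixth helix.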


INSL5_HUMAN
Phobius Polyphobius
Figure24: Prediction of Phobius
sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 5 N-REGION
REGION 6 17 H-REGION
REGION 18 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

sp|Q9Y5Q6|INSL5_HUMAN
SIGNAL 1 22
REGION 1 4 N-REGION
REGION 5 16 H-REGION
REGION 17 22 C-REGION
TOPO_DOM 23 135 NON CYTOPLASMIC

Figure25: Prediction of Polyphobius

The Phobius and Polyphobius predictions for INSL5_HUMAN agree with the information given on UniProt (Figure 18). By comparing the results in the table above and Figure 24 with Figure 25 we can see that both predicted correctly a signal peptide and only one extracellular region of the protein.

LAMP1_HUMAN
Phobius Polyphobius
Figure26: Prediction of Phobius
sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 10 N-REGION
REGION 11 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

sp|P11279|LAMP1_HUMAN
SIGNAL 1 28
REGION 1 9 N-REGION
REGION 10 22 H-REGION
REGION 23 28 C-REGION
TOPO_DOM 29 381 NON CYTOPLASMIC
TRANSMEM 382 405
TOPO_DOM 405 417 CYTOPLASMIC

Figure27: Prediction of Polyphobius

Comparing the results of Phobius and Polyphobius listed in the table above and shown in Figure 26 and Figure 27, we can see that the two tools made almost the same predictions. To find out whether these results are correct we compared them with the information provided by UniProt (Figure 19) and conclude that the signal peptide and membrane topology predictions made by Phobius and Polyphobius for LAMP1_HUMAN are correct.

RET4_HUMAN
Phobius Polyphobius
Figure28: Prediction of Phobius
sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 2 N-REGION
REGION 3 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

sp|P02753|RET4_HUMAN
SIGNAL 1 18
REGION 1 3 N-REGION
REGION 4 13 H-REGION
REGION 14 18 C-REGION
TOPO_DOM 19 201 NON CYTOPLASMIC

Figure29: Prediction of Polyphobius

Both tools made nearly the same prediction, as can be seen from the table above and from the visualization of the two predictions (Figure 28, Figure 29). Both predict the signal peptide of RET4_HUMAN correctly, as well as the single extracellular region of the protein.

For the BCKDHA protein Phobius predicted a signal peptide with about 90% probability at the beginning of the sequence. The predicted signal peptide is 34 amino acids long. This is broadly consistent with UniProt, which states that BCKDHA contains a 45 amino acid long signal peptide for the transfer into the mitochondrion, although the predicted length differs. The rest of the sequence is predicted to be non-cytoplasmic. No part of the protein is predicted to span a membrane. This is also correct, as BCKDHA is located in the mitochondrial matrix according to UniProt.

BCKDHA
Phobius Polyphobius
Figure30: Prediction of Phobius
sp|P12694|ODBA_HUMAN (BCKDHA)
Signal 1 34
Region 1 16 N-Region
Region 17 25 H-Region
Region 26 34 C-Region
TOPO_DOM 35 445 non cytoplasmic

ODBA_HUMAN (BCKDHA)
TOPO_DOM 1 445 Non cytoplasmic

Figure31: Prediction of Polyphobius

Considering the information given on UniProt, Polyphobius performed worse than Phobius on the BCKDHA protein sequence. It predicted no signal sequence at the beginning of the protein. There is a low probability for the amino acids between positions 1 and 45 to be a signal sequence, but overall the whole sequence is predicted to be non-cytoplasmic. This is also shown in Figure 31. In contrast, Phobius predicted the signal sequence between positions 1 and 34 with a very high probability, which is clearly visible in Figure 30.

OCTOPUS and SPOCTOPUS

Methods

  • OCTOPUS was developed by Viklund and Elofsson in 2008 <ref>Håkan Viklund and Arne Elofsson, "Improving topology prediction by two-track ANN-based preference scores and an extended topological grammar", Bioinformatics (2008)</ref>
  • OCTOPUS (obtainer of correct topologies for uncharacterized sequences) uses a combination of hidden Markov models and artificial neural networks.
  • It creates a sequence profile by doing a BLAST search to obtain homologous sequences. The profile is used as input for a neural network that predicts the probability for each residue to be located in a transmembrane(M), interface (I), close loop (L), or globular loop (G) environment as well as the preference to be inside (i) or outside (o) of the membrane. A hidden Markov model is used to calculate the most likely Protein Topology.
  • Required input: Protein Sequence in FASTA-Format
  • SPOCTOPUS (Viklund et al., 2008<ref>Viklund et al., "A combined predictor of signal peptides and membrane protein topology", Bioinformatics (2008)</ref>) is an extension of OCTOPUS which also predicts signal peptides. A neural network is used to predict a signal peptide preference score. The signal peptide's location is determined by a hidden Markov model. The output contains the information retrieved by OCTOPUS as well as the probability that a residue is N-terminal of a signal peptide (n) or in a signal peptide (S).
  • Required input information: Protein sequence in FASTA-Format

Results

A4_HUMAN

Figure32: Prediction for Octopus and Spoctopus for A4_HUMAN


When we compare the results of OCTOPUS and SPOCTOPUS, we can see that both tools predicted the membrane topology of A4_HUMAN. The output is visualized in Figure 32, where the brown line shows that the protein lies mainly in the non-cytoplasmic region. SPOCTOPUS also detected the signal peptide. Comparing the predictions with the information provided by UniProt, we can see that the predictions of both tools are correct.

BACR_HALSA

Figure33: Prediction for Octopus and Spoctopus for BACR_HALSA


The predictions made by OCTOPUS and SPOCTOPUS for BACR_HALSA are identical and correct. The results are visualized in Figure 33. The red bars show that most of the protein lies in transmembrane regions. In addition, the alternating brown and green lines indicate that the loops alternate between the non-cytoplasmic and the cytoplasmic side. SPOCTOPUS predicted no signal peptide, which agrees with the information given in UniProt.

INSL5_HUMAN

Figure34: Prediction for Octopus and Spoctopus for INSL5_HUMAN


When we compare the predictions of OCTOPUS and SPOCTOPUS, we can see that both of them predicted the protein to be in a non-cytoplasmic region after position 22 or 23. This is supported by the brown line in Figure 34. The figure also shows that the two tools made different predictions for the first part of the protein. SPOCTOPUS predicted the signal peptide of INSL5_HUMAN, while OCTOPUS predicted a transmembrane domain for the same part of the sequence. Comparing the results with the information in UniProt, we can see that the signal peptide is predicted correctly.

LAMP1_HUMAN

Figure35: Prediction for Octopus and Spoctopus for LAMP1_HUMAN

The visualization of the results (Figure 35) shows that the two tools made largely the same predictions, but they differ at the beginning of the protein. While SPOCTOPUS predicted the beginning to be a signal peptide, OCTOPUS assigned an additional inside region and a transmembrane helix to the part of the sequence that contains the signal peptide. As we know from UniProt, the prediction of SPOCTOPUS is the right one, because LAMP1_HUMAN has a signal peptide at the beginning of the protein.

RET4_HUMAN

Figure36: Prediction for Octopus and Spoctopus for RET4_HUMAN

Again the two tools made nearly the same predictions and differ only at the beginning of the protein. As we can see in Figure 36, both of them predicted the protein to be mainly in a non-cytoplasmic region, but while SPOCTOPUS predicted the beginning to be a signal peptide, OCTOPUS assigned this region to a transmembrane helix. According to UniProt there is a signal peptide at the beginning of the protein, so the prediction of SPOCTOPUS is the correct one.

BCKDHA

Figure37: Prediction for Octopus and Spoctopus for BCKDHA

The OCTOPUS and SPOCTOPUS predictions for the BCKDHA protein are completely contrary in terms of the intracellular and extracellular regions, which is clearly visible in Figure 37. Both predictions are wrong, however, as BCKDHA is not a membrane protein. Furthermore, SPOCTOPUS missed the 45 amino acid long signal peptide at the beginning of the sequence.

SignalP

Method

  • SignalP was established by Nielsen et al. in 1997<ref>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites", Protein Engineering, 10:1-6, 1997</ref>
  • Based on neural networks as well as hidden Markov models
  • Uses three different scores for the prediction with HMMs:
    • S-score (score for the signal peptide)
    • C-score (score for the cleavage site)
    • Y-score (combination of the S-score and the C-score but more precise)
  • Identifies signal peptides and cleavage sites
  • Makes predictions for three different organism groups:
    • eukaryotes
    • Gram-negative bacteria
    • Gram-positive bacteria
  • Can also be run on the SignalP server

Execution

To run the command line SignalP tool, the path in the SignalP file had to be adapted to /apps/signalp-3.0

The following commands were used to execute SignalP:

  • signalp -t euk P05067.fasta > signalp_out_P05067.txt
  • signalp -t gram- P02945.fasta > signalp_out_P02945.txt
  • signalp -t euk Q9Y5Q6.fasta > signalp_out_Q9Y5Q6.txt
  • signalp -t euk P11279.fasta > signalp_out_P11279.txt
  • signalp -t euk P02753.fasta > signalp_out_P02753.txt
  • signalp -t euk P12694.fasta > signalp_out_P12694.txt


Results

Figure38: Prediction by SignalP for BCKDHA using HMMs

BCKDHA

Both methods (NN and HMM) predicted the most likely cleavage site between positions 32 and 33 (ARG_LA). This is clearly visualized by the red lines in Figure 38.
This prediction does not agree with UniProt, where a signal peptide from position 1-45 is listed.

A4_HUMAN

SignalP predicted with both methods a cleavage site between positions 17 and 18 with a high probability for a signal peptide.
SignalP predicted the cleavage site for A4_HUMAN correctly.

BACR_HALSA

Both methods (NN and HMM) predicted no cleavage site, and therefore no signal peptide, in the BACR_HALSA sequence.
This is also true according to UniProt, where no signal peptide is stated.

INSL5_HUMAN

For the INSL5_HUMAN protein SignalP detected a cleavage site between positions 22 and 23, which is due to a predicted signal peptide at the beginning of the sequence.
The signal peptidase I cleavage site was predicted correctly, as UniProt states a signal peptide from positions 1-22.


LAMP1_HUMAN

SignalP predicted with both methods a cleavage site between positions 28 and 29, as there is a signal peptide detected.
The cleavage site prediction made by SignalP for LAMP1_HUMAN is correct. UniProt shows a signal peptide for this protein which ranges from 1-28 in the sequence.

RET4_HUMAN

SignalP predicted a cleavage site with high probability between positions 18 and 19 in both the NN and the HMM method. This cleavage site is predicted to be after a signal peptide.
This prediction is correct according to UniProt.
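The SignalP results can be summarized by comparing the last residue before the predicted cleavage site with the end of the signal peptide listed in UniProt. The values below are taken from the text of this section. For RET4_HUMAN the UniProt value is inferred from the statement that the prediction is correct, which is an assumption on our side.

```python
# Last residue before the cleavage site predicted by SignalP (None = no signal peptide)
SIGNALP_PEPTIDE_END = {
    "BCKDHA": 32,
    "A4_HUMAN": 17,
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "RET4_HUMAN": 18,
}

# End of the signal peptide according to UniProt, as quoted in this section
UNIPROT_PEPTIDE_END = {
    "BCKDHA": 45,
    "A4_HUMAN": 17,
    "BACR_HALSA": None,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "RET4_HUMAN": 18,  # inferred, see the note above
}


def compare_cleavage_sites(predicted, annotated):
    """Report for each protein whether prediction and annotation agree."""
    report = {}
    for name, pred in predicted.items():
        ref = annotated[name]
        if pred == ref:
            report[name] = "agree"
        else:
            report[name] = f"differ (predicted {pred}, annotated {ref})"
    return report


for name, verdict in compare_cleavage_sites(SIGNALP_PEPTIDE_END, UNIPROT_PEPTIDE_END).items():
    print(name, verdict)
```

Only BCKDHA differs: SignalP places the cleavage site after residue 32, whereas UniProt lists residues 1-45.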

TargetP

Method

  • TargetP was developed by Emanuelsson et al. in 2000 <ref> Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence", J. Mol. Biol., 300: 1005-1016, 2000</ref>
  • TargetP predicts the subcellular location of eukaryotic proteins
  • Additionally it can make cleavage site predictions
  • This method is neural network based. The prediction is based on the N-terminal presequences:
    • chloroplast transit peptide(cTP)
    • mitochondrial targeting peptide (mTP)
    • secretory pathway signal peptide (SP)
  • Required input information: Sequence(s) in FASTA format, organism group
  • The prediction can also be run on the TargetP server

Results

Figure39: prediction results by TargetP
All the results of the prediction of TargetP are shown in the table in Figure 39.

ODBA_HUMAN (BCKDHA) is predicted to be located in the mitochondrion, which is correct according to UniProt. All other tested proteins are predicted to be located in the secretory pathway and therefore to have a signal peptide. These predictions are correct except for BACR_HALSA, which has no signal peptide. Here, however, TargetP returns a reliability index of four, which indicates an unreliable prediction.

4. Prediction of GO terms

The following section deals with GO term prediction tools. In order to verify the predictions, the real GO annotations are presented first (as listed in UniProt <ref>http://www.uniprot.org</ref>):
(P: Process, F: Function, C: Component)

BCKDHA

GO Term Name GO identifier Aspect
Process
metabolic process 0008152 P
branched chain family amino acid catabolic process 0009083 P
cellular nitrogen compound metabolic process 0034641 P
oxidation-reduction process 0055114 P
Function
alpha-ketoacid dehydrogenase activity 0003826 F
3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) activity 0003863 F
protein binding 0005515 F
oxidoreductase activity 0016491 F
oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptors 0016624 F
carboxy-lyase activity 0016831 F
metal ion binding 0046872 F
Component
mitochondrion 0005739 C
mitochondrial matrix 0005759 C
mitochondrial alpha-ketoglutarate dehydrogenase complex 0005947 C
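To verify a prediction against these annotations, the predicted and annotated GO identifiers can be compared as sets. The sketch below uses the Function terms of BCKDHA from the table above. The predicted set is purely hypothetical and only serves to show the calculation; it is not the output of any tool.

```python
# Annotated Function terms of BCKDHA, taken from the table above
BCKDHA_FUNCTION = {
    "GO:0003826", "GO:0003863", "GO:0005515", "GO:0016491",
    "GO:0016624", "GO:0016831", "GO:0046872",
}

# Hypothetical prediction, invented for illustration only
HYPOTHETICAL_PREDICTION = {"GO:0016491", "GO:0046872", "GO:0003824"}


def precision_recall(predicted, annotated):
    """Return (precision, recall) of a predicted GO term set against the annotation."""
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


precision, recall = precision_recall(HYPOTHETICAL_PREDICTION, BCKDHA_FUNCTION)
print(f"precision {precision:.2f}, recall {recall:.2f}")
```

This exact-match comparison ignores the GO hierarchy, so a predicted parent term of an annotated term counts as wrong; a more careful evaluation would also consider ancestor terms.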

A4_HUMAN

GO Term Name GO identifier Aspect
Process
G2 phase of mitotic cell cycle 0000085 P
suckling behaviour 0001967 P
platelet degranulation 0002576 P
mRNA polyadenylation 0006378 P
regulation of translation 0006417 P
protein phosphorylation 0006468 P
cellular copper ion homeostasis 0006878 P
endocytosis 0006897 P
apoptosis 0006915 P
induction of apoptosis 0006917 P
cell adhesion 0007155 P
regulation of epidermal growth factor receptor activity 0007176 P
Notch signaling pathway 0007219 P
axonogenesis 0007409 P
blood coagulation 0007596 P
mating behavior 0007617 P
locomotory behavior 0007626 P
axon cargo transport 0008088 P
cell death 0008219 P
adult locomotory behavior 0008344 P
visual learning 0008542 P
negative regulation of peptidase activity 0010466 P
positive regulation of peptidase activity 0010951 P
axon midline choice point recognition 0016199 P
neuron remodeling 0016322 P
dendrite development 0016358 P
platelet activation 0030168 P
extracellular matrix organization 0030198 P
forebrain development 0030900 P
neuron projection development 0031175 P
ionotropic glutamate receptor signaling pathway 0035235 P
regulation of multicellular organism growth 0040014 P
innate immune response 0045087 P
negative regulation of neuron differentiation 0045665 P
positive regulation of mitotic cell cycle 0045931 P
positive regulation of transcription from RNA polymerase II promoter 0045944 P
collateral sprouting in absence of injury 0048699 P
regulation of synapse structure and activity 0050803 P
neuromuscular process controlling balance 0050885 P
synaptic growth at neuromuscular junction 0051124 P
neuron apoptosis 0051402 P
smooth endoplasmic reticulum calcium ion homeostasis 0051563 P
Function
DNA binding 0003677 F
serine-type endopeptidase inhibitor activity 0004867 F
receptor binding 0005102 F
binding 0005488 F
protein binding 0005515 F
heparin binding 0008201 F
peptidase activator activity 0016504 F
peptidase inhibitor activity 0030414 F
acetylcholine receptor binding 0033130 F
identical protein binding 0042802 F
metal ion binding 0046872 F
PTB domain binding 0051425 F
Component
extracellular region 0005576 C
membrane fraction 0005624 C
cytoplasm 0005737 C
Golgi apparatus 0005794 C
plasma membrane 0005886 C
integral to plasma membrane 0005887 C
coated pit 0005905 C
cell surface 0009986 C
membrane 0016020 C
integral to membrane 0016021 C
synaptosome 0019717 C
axon 0030424 C
platelet alpha granule lumen 0031093 C
cytoplasmic vesicle 0031410 C
neuromuscular junction 0031594 C
ciliary rootlet 0035253 C
neuron projection 0042005 C
dendritic spine 0043197 C
dendritic shaft 0043198 C
intracellular membrane-bounded organelle 0043231 C
apical part of cell 0045177 C
synapse 0045202 C
perinuclear region of cytoplasm 0048471 C
spindle midzone 0051233 C


BACR_HALSA

GO Term Name GO identifier Aspect
Process
transport 0006810 P
ion transport 0006811 P
phototransduction 0007602 P
proton transport 0015992 P
protein-chromophore linkage 0018298 P
response to stimulus 0050896 P
Function
receptor activity 0004872 F
ion channel activity 0005216 F
photoreceptor activity 0009881 F
Component
plasma membrane 0005886 C
membrane 0016020 C
integral to membrane 0016021 C

INSL5_HUMAN

GO Term Name GO identifier Aspect
Process
biological_process 0008150 P
Function
hormone activity 0005179 F
Component
cellular_component 0005575 C
extracellular region 0005576 C

LAMP1_HUMAN

GO Term Name GO identifier Aspect
Process
autophagy 0006914 P
Component
membrane fraction 0005624 C
lysosome 0005764 C
lysosomal membrane 0005765 C
endosome 0005768 C
late endosome 0005770 C
multivesicular body 0005771 C
plasma membrane 0005886 C
integral to plasma membrane 0005887 C
external side of plasma membrane 0009897 C
cell surface 0009986 C
endosome membrane 0010008 C
membrane 0016020 C
integral to membrane 0016021 C
vesicle 0031982 C
sarcolemma 0042383 C
melanosome 0042470 C

RET4_HUMAN

GO Term Name GO identifier Aspect
Process
eye development 0001654 P
gluconeogenesis 0006094 P
transport 0006810 P
spermatogenesis 0007283 P
heart development 0007507 P
visual perception 0007601 P
male gonad development 0008584 P
embryo development 0009790 P
maintenance of gastrointestinal epithelium 0030277 P
lung development 0030324 P
positive regulation of insulin secretion 0033024 P
response to retinoic acid 0032526 P
response to insulin stimulus 0032868 P
retinol transport 0034633 P
retinol metabolic process 0042572 P
retinal metabolic process 0042574 P
glucose homeostasis 0042593 P
response to ethanol 0045471 P
embryonic organ morphogenesis 0048562 P
embryonic skeletal system development 0048706 P
cardiac muscle tissue development 0048738 P
female genitalia morphogenesis 0048807 P
response to stimulus 0050896 P
detection of light stimulus involved in visual perception 0050908 P
positive regulation of immunoglobulin secretion 0051024 P
retina development in camera-type eye 0060041 P
negative regulation of cardiac muscle cell proliferation 0060044 P
embryonic retina morphogenesis in camera-type eye 0060059 P
uterus development 0060065 P
vagina development 0060068 P
urinary bladder development 0060157 P
heart trabecula formation 0060347 P
Function
transporter activity 0005215 F
binding 0005488 F
retinoid binding 0005501 F
protein binding 0005515 F
retinal binding 0016918 F
retinol binding 0019841 F
retinol transporter activity 0034632 F
Component
extracellular region 0005576 C
extracellular space 0005615 C

GOPET

Method

  • GOPET (Gene Ontology Term Prediction and Evaluation Tool) was described by Vinayagam et al.<ref> Arunachalam Vinayagam, Coral Del Val, Falk Schubert, Roland Eils, Karl-Heinz Glatting, Sándor Suhai, Rainer König, "GOPET: A tool for automated predictions of Gene Ontology terms", BMC Bioinformatics (2006), Volume: 7, Issue: 161, Publisher: BioMed Central, Pages: 161</ref>
  • GOPET is a fully automated tool for assigning molecular function terms to a given sequence.
  • It is based on homology searches and Support Vector Machines.
  • Required input information: cDNA or protein sequence
  • Gene Ontology is used for annotation terms, GO-mapped protein databases for performing homology searches and Support Vector Machines for the prediction and the assignment of confidence values.
  • The prediction is organism independent.

Results

BCKDHA

GOid Aspect Confidence GOTerm
GO:0003824 F 97% catalytic activity
GO:0016491 F 96% oxidoreductase activity
GO:0016624 F 95% oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863 F 90% 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739 F 89% pyruvate dehydrogenase acetyl-transferring activity
GO:0004738 F 78% pyruvate dehydrogenase activity
GO:0003826 F 77% alpha-ketoacid dehydrogenase activity
GO:0047101 F 75% 2-oxoisovalerate dehydrogenase acylating activity
GO:0008677 F 65% 2-dehydropantoate 2-reductase activity
GO:0019152 F 63% acetoin dehydrogenase activity
GO:0030955 F 63% potassium ion binding
GO:0016616 F 62% oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872 F 62% metal ion binding

The GOPET predictions for BCKDHA are mostly correct. The GO terms that this tool predicted with a confidence above 90% are all listed in the UniProt entry for BCKDHA, as is the metal ion binding function.


A4_HUMAN

GOid Aspect Confidence GOTerm
GO:0004866 F 87% endopeptidase inhibitor activity
GO:0004867 F 86% serine-type endopeptidase inhibitor activity
GO:0030568 F 83% plasmin inhibitor activity
GO:0030304 F 83% trypsin inhibitor activity
GO:0030414 F 82% peptidase inhibitor activity
GO:0005488 F 79% binding
GO:0005515 F 74% protein binding
GO:0046872 F 73% metal ion binding
GO:0003677 F 71% DNA binding
GO:0008201 F 70% heparin binding
GO:0008270 F 69% zinc ion binding
GO:0005507 F 69% copper ion binding
GO:0005506 F 67% iron ion binding

The GOPET results for A4_HUMAN match the UniProt annotation quite well. The predicted trypsin inhibitor activity, plasmin inhibitor activity and endopeptidase inhibitor activity are not present in UniProt. However, the predicted serine-type endopeptidase inhibitor activity is a true function of the A4_HUMAN protein and belongs to the same group of inhibitor functions, so these predictions are not far off. The same is true for the zinc, copper and iron ion binding functions: all three are metals, and the protein has a metal ion binding function.
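The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used by the authors: the GO identifiers and confidence values are copied from the A4_HUMAN lists on this page, while the function name `compare` and the parameter `min_confidence` are hypothetical.

```python
# UniProt molecular function GO ids for A4_HUMAN (copied from the list on this page)
annotated = {
    "0003677", "0004867", "0005102", "0005488", "0005515", "0008201",
    "0016504", "0030414", "0033130", "0042802", "0046872", "0051425",
}

# GOPET predictions for A4_HUMAN: GO id -> confidence in percent
predicted = {
    "0004866": 87, "0004867": 86, "0030568": 83, "0030304": 83,
    "0030414": 82, "0005488": 79, "0005515": 74, "0046872": 73,
    "0003677": 71, "0008201": 70, "0008270": 69, "0005507": 69,
    "0005506": 67,
}


def compare(predicted, annotated, min_confidence=0):
    """Split predictions into ids that are / are not in the annotation.

    Only exact GO id matches count; parent/child relations in the
    GO hierarchy are deliberately ignored in this sketch.
    """
    kept = {go for go, conf in predicted.items() if conf >= min_confidence}
    return kept & annotated, kept - annotated


confirmed, unconfirmed = compare(predicted, annotated)
print("confirmed:  ", sorted(confirmed))
print("unconfirmed:", sorted(unconfirmed))
print(f"{len(confirmed)} of {len(predicted)} predictions are annotated in UniProt")
```

Because only exact identifiers are matched, terms such as zinc ion binding count as unconfirmed even though the protein is annotated with the more general metal ion binding. A stricter evaluation would have to take the GO hierarchy into account.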


BACR_HALSA

GOid Aspect Confidence GOterm
GO:0005216 F 77% ion channel activity
GO:0008020 F 75% G-protein coupled photoreceptor activity
GO:0015078 F 60% hydrogen ion transmembrane transporter activity

GOPET predicted the ion channel activity and the photoreceptor activity correctly. The hydrogen ion transmembrane transporter activity does not agree with the UniProt annotations.

INSL5_HUMAN

GOid Aspect Confidence GOterm
GO:0005179 F 80% hormone activity

The INSL5_HUMAN protein is correctly predicted to be a hormone.


LAMP1_HUMAN

GOid Aspect Confidence GOterm
GO:0004812 F 60% aminoacyl-tRNA ligase activity
GO:0005524 F 60% ATP binding

For the LAMP1_HUMAN protein, no functional GO annotation is listed in UniProt.


RET4_HUMAN

GOid Aspect Confidence GOterm
GO:0005488 F 90% binding
GO:0005501 F 81% retinoid binding
GO:0008289 F 80% lipid binding
GO:0019841 F 78% retinol binding
GO:0005215 F 78% transporter activity
GO:0016918 F 78% retinal binding
GO:0005319 F 69% lipid transporter activity
GO:0008035 F 60% high-density lipoprotein particle binding

The GOPET predictions for RET4_HUMAN are correct except for the lipid-linked activities.

Pfam

Method

  • Pfam is described by Finn et al.<ref>Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A (2008). "The Pfam protein families database.". Nucleic Acids Res 36 (Database issue): D281–8</ref>
  • Pfam is a database which contains protein families and domains
  • The database consists of two different parts: Pfam-A and Pfam-B
    • Pfam-A: manually curated and therefore more accurate
    • Pfam-B: generated automatically, so the data are of lower quality than in Pfam-A
  • Webserver: Pfam


Results

BCKDHA

Figure40: prediction of Pfam-A for BCKDHA

Pfam found one significant match in the database which is visualized in Figure 40.

  • Molecular function
    • GO:0016624 (oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor)
  • Biological Process
    • GO:0008152 (metabolic process)


A4_HUMAN

Figure41: prediction of Pfam-A for A4_HUMAN

Pfam found six significant matches in the database which are visualized in Figure 41.

  • Cellular Component
    • GO:0016021 (integral to membrane)
  • Molecular function
    • GO:0005488 (binding)
    • GO:0004867 (serine-type endopeptidase inhibitor activity)
  • No GO-ID
    • E2 domain of amyloid precursor protein
    • beta-amyloid precursor protein C-terminus



BACR_HALSA

Figure42: prediction of Pfam-A for BACR_HALSA

Pfam found one significant match in the database which is visualized in Figure 42.

  • Cellular Component
    • GO:0016020 (membrane)
  • Molecular function
    • GO:0005216 (ion channel activity)
  • Biological Process
    • GO:0006811 (ion transport)



INSL5_HUMAN

Figure43: prediction of Pfam-A for INSL5_HUMAN

Pfam found one significant match in the database which is visualized in Figure 43.

  • Cellular Component
    • GO:0005576 (extracellular region)
  • Molecular function
    • GO:0005179 (hormone activity)


LAMP1_HUMAN

Figure44: prediction of Pfam-A for LAMP1_HUMAN

Pfam found one significant match in the database which is visualized in Figure 44.

  • Cellular Component
    • GO:0016020 (membrane)



RET4_HUMAN

Figure45: prediction of Pfam-A for RET4_HUMAN

Pfam found one significant match in the database which is visualized in Figure 45.

  • Molecular function
    • GO:0005488 (binding)


Comparing the Pfam annotations with the already known GO terms for the different proteins shows that the results for all analysed proteins are correct, but far from exhaustive.

ProtFun 2.2

Method

  • ProtFun is described by Jensen et al.<ref>L. Juhl Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Stærfeldt, K. Rapacki, C. Workman, C. A. F. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak, "Prediction of human protein function from post-translational modifications and localization features", J. Mol. Biol., 319:1257-1265, 2002</ref>

  • ProtFun is an ab initio prediction server of protein function from sequence. Various servers are queried and the provided information is integrated into the final prediction.
  • ProtFun reports only probabilities and odds scores; it does not make a definite statement on whether a protein has a specific function.
  • The arrow (=>) indicates which line has the highest information content

Results

BCKDHA

Figure46: Prediction of GO-terms for BCKDHA by ProtFun 2.2
  • Functional category
    • Central_intermediary_metabolism (Prob: 0.321, Odds: 5.096) (=>)
    • Amino_acid_biosynthesis (Prob: 0.187, Odds: 8.520)
    • Purines_and_pyrimidines (Prob: 0.257, Odds: 1.059)
    • Biosynthesis_of_cofactors (Prob: 0.246, Odds: 3.413)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.769, Odds: 2.683)
  • Enzyme class
    • Ligase (Prob: 0.085, Odds: 1.673) (=>)
    • Lyase (Prob: 0.076, Odds: 1.614)
  • Gene Ontology category
    • Growth_factor (Prob: 0.009, Odds: 0.609)
    • Signal_transducer (Prob: 0.098, Odds: 0.458)

The results of ProtFun 2.2 for the prediction of the GO terms of BCKDHA are shown in Figure 46. The list above summarizes the most significant results. The program predicted that BCKDHA functions mainly in metabolic processes (central intermediary metabolism). The second entry under "Functional category" (amino acid biosynthesis) has a very good odds score, so we also consider it a reliable prediction. The other two entries are those with the next best probability or odds score, but in both cases the odds score is much lower than for the first two results. We therefore take the first two entries as the best predictions of ProtFun 2.2. A comparison with the information in UniProt shows that they are correct. There was no reliable prediction for the "Gene Ontology category". We listed the two best results above, but their probabilities and odds scores show that they are not significant.
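Since probability and odds score can rank the categories differently, it may help to sort the ProtFun output by both values. The following Python sketch does this for the four functional categories listed above for BCKDHA. The values are copied from that list; the variable names are hypothetical and the code is not part of ProtFun itself.

```python
# (category, probability, odds score) for BCKDHA, copied from the list above
functional_categories = [
    ("Central_intermediary_metabolism", 0.321, 5.096),
    ("Amino_acid_biosynthesis", 0.187, 8.520),
    ("Purines_and_pyrimidines", 0.257, 1.059),
    ("Biosynthesis_of_cofactors", 0.246, 3.413),
]

# Rank the same entries once by probability and once by odds score
by_probability = sorted(functional_categories, key=lambda row: row[1], reverse=True)
by_odds = sorted(functional_categories, key=lambda row: row[2], reverse=True)

print("ranked by probability:")
for name, prob, odds in by_probability:
    print(f"  {name:35s} prob={prob:.3f} odds={odds:.3f}")

print("ranked by odds score:")
for name, prob, odds in by_odds:
    print(f"  {name:35s} prob={prob:.3f} odds={odds:.3f}")
```

For these four entries the two rankings disagree on the top hit: central intermediary metabolism has the highest probability, whereas amino acid biosynthesis has the highest odds score. This is why both values are considered in the discussion.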

A4_HUMAN

Figure47: Prediction of GO-terms for A4_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Transport_and_Binding (Prob: 0.827, Odds: 2.016)
    • Biosynthesis_of_cofactors (Prob: 0.261, Odds: 3.623)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.392, Odds: 1.368)
  • Enzyme class
    • Ligase (Prob: 0.048, Odds: 0.946)
    • Transferase (Prob: 0.208, Odds: 0.603)
    • Hydrolase (Prob: 0.190, Odds: 0.600)
  • Gene Ontology category
    • Structural_protein (Prob: 0.034, Odds: 1.205) (=>)
    • Stress_response (Prob: 0.076, Odds: 0.862)
    • Signal_transducer (Prob: 0.126, Odds: 0.586)

The results of ProtFun 2.2 for the prediction of the GO terms of A4_HUMAN are shown in Figure 47. The list above shows the most significant results. A4_HUMAN is assigned to the category "Cell_envelope" with an odds score of 13.186, which indicates a very confident prediction, and according to UniProt it is right. A4_HUMAN also has a transport and binding function, which is the next entry in the list. Thus, although its odds score is much lower than that of the first entry, this prediction is correct as well, as is the claim that A4_HUMAN is involved in the biosynthesis of cofactors. ProtFun 2.2 also predicted that this protein is a structural protein, which is again correct according to the information at the beginning of this section. The two other listed GO categories show that this is not the only correctly predicted GO category for A4_HUMAN. Comparing the whole list of predicted GO terms in Figure 47 with the known annotation shows that all of the predictions are right.

BACR_HALSA

Figure48: Prediction of GO-terms for BACR_HALSA by ProtFun 2.2
  • Functional category
    • Transport_and_Binding (Prob: 0.791, Odds: 1.929) (=>)
    • Biosynthesis_of_cofactors (Prob: 0.186, Odds: 2.589)
    • Purines_and_pyrimidines (Prob: 0.302, Odds: 1.244)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.801, Odds: 1.122)
  • Enzyme class
    • none
  • Gene Ontology category
    • Transporter (Prob: 0.400, Odds: 4.036) (=>)
    • Receptor (Prob: 0.355, Odds: 2.087)

The results of ProtFun 2.2 for the prediction of the GO terms of BACR_HALSA are shown in Figure 48. The list above shows the most significant results. Since the predictions in both the "Functional category" and the "Gene Ontology category" declare BACR_HALSA to be mainly a transporter, this prediction can be considered reliable. The UniProt GO annotations include ion transport and proton transport, as well as transport itself, so this prediction is correct. The assumption that BACR_HALSA is involved in the biosynthesis of cofactors, however, is wrong, although it has a quite good odds score. Its probability is very low, which shows that both values have to be considered. The last entry in the list shows that ProtFun 2.2 predicted receptor functionality. This prediction is also correct, because UniProt lists receptor activity for this protein.

INSL5_HUMAN

Figure49: Prediction of GO-terms for INSL5_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.756, Odds: 12.393) (=>)
    • Transport_and_binding (Prob: 0.834, Odds: 2.033)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.791, Odds: 1.109)
  • Enzyme class
    • none
  • Gene Ontology category
    • Hormone (Prob: 0.247, Odds: 37.936) (=>)
    • Growth_factor (Prob: 0.061, Odds: 4.379)

The results of ProtFun 2.2 for the prediction of the GO terms of INSL5_HUMAN are shown in Figure 49. The most significant results are listed above. The first prediction of ProtFun 2.2, which places INSL5_HUMAN in the functional category "Cell_envelope", has a very high odds score of 12.393. This suggests that the prediction is correct, which is confirmed by the information in UniProt. The involvement of INSL5_HUMAN in transport and binding is also predicted correctly. In addition, ProtFun predicted the hormone activity of INSL5_HUMAN correctly, as a comparison with UniProt shows. However, hormone activity is the only GO function term listed for this protein, which means that the prediction of "growth factor" has to be wrong.

LAMP1_HUMAN

Figure50: Prediction of GO-terms for LAMP1_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Transport_and_binding (Prob: 0.834, Odds: 2.033)
  • Enzyme/nonenzyme
    • Nonenzyme (Prob: 0.724, Odds: 1.014)
  • Enzyme class
    • none
  • Gene Ontology category
    • Immune_response (Prob: 0.371, Odds: 4.368) (=>)
    • Stress_response (Prob: 0.246, Odds: 2.795)

The results of ProtFun 2.2 for the prediction of the GO terms of LAMP1_HUMAN are shown in Figure 50. The most significant results can be found in the list above. This protein is predicted to be important for the cell envelope with a very significant probability and odds score. As expected, this result is correct, since it also occurs in the GO terms in UniProt. In contrast, the prediction of transport and binding, which has a good probability but no high odds score, is wrong. The GO category "Immune_response" predicted by ProtFun for LAMP1_HUMAN is not false, as autophagy is a process often triggered by the immune system as a response to foreign substances. Since autophagy is listed in UniProt, the prediction of stress response is also reasonably correct.

RET4_HUMAN

Figure51: Prediction of GO-terms for RET4_HUMAN by ProtFun 2.2
  • Functional category
    • Cell_envelope (Prob: 0.804, Odds: 13.186) (=>)
    • Central_intermediary_metabolism (Prob: 0.197, Odds: 3.128)
    • Transport_and_binding (Prob: 0.800, Odds: 1.951)
  • Enzyme/nonenzyme
    • Enzyme (Prob: 0.544, Odds: 1.900)
  • Enzyme class
    • Lyase (Prob: 0.059, Odds: 1.264) (=>)
    • Hydrolase (Prob: 0.235, Odds: 0.742)
  • Gene Ontology category
    • Immune_response (Prob: 0.239, Odds: 2.813) (=>)
    • Stress_response (Prob: 0.616, Odds: 1.829)

The results of ProtFun 2.2 for the prediction of the GO terms of RET4_HUMAN are shown in Figure 51. The most significant results can be found in the list above. ProtFun 2.2 assigns RET4_HUMAN to "Cell_envelope" with a very high probability and a significant odds score, and the comparison with UniProt shows that this prediction is right. The prediction that the protein is involved in metabolism has a very low probability and a much lower odds score than the first hit, but it is nevertheless correct. The last of the three listed functional categories is also predicted accurately. For the predicted immune response of RET4_HUMAN, we could not find any support in the GO terms in UniProt, whereas the prediction of stress response was correct.

References

<references />

