Difference between revisions of "Task 3 (MSUD)"

 
(100 intermediate revisions by 2 users not shown)
Line 4: Line 4:
 
=== Result ===
 
   
  +
==== Approach for predicting secondary structure with ReProf ====
The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:
 
   
  +
For P10775, ReProf was run with the protein sequence FASTA file and with position-specific scoring matrices (PSSMs) derived from big_80 and SwissProt as input. The following tables compare the prediction results with the secondary structure assignment of DSSP. The f-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.
<code>
 
/mnt/home/student/schillerl/MasterPractical/task3/reprof/
 
 
/mnt/home/student/schillerl/MasterPractical/task3/psipred/
 
 
/mnt/home/student/schillerl/MasterPractical/task3/dssp/
 
</code>
 
 
 
For P10775, ReProf was run with the protein sequence fasta file and the PSSMs (see <code>/mnt/home/student/schillerl/MasterPractical/task3/pssm/</code>) derived from big_80 and SwissProt as input. The following tables show the comparison of the prediction results to the secondary structure assignment of DSSP.
 
   
   
Line 57: Line 48:
   
   
  +
Predictions that use a PSSM instead of the plain sequence are of considerably better quality. All methods predict helices better than loops, and loops better than beta sheets. The run with the big_80 PSSM gives better results for E and L, and only slightly worse results for H, than the run with the SwissProt PSSM.
The percentages of correctly identified secondary structure (H, E or L) for the three methods are 61 %, 86 % and 82 %. So for the remaining sequences, the method with the best performance (usage of PSSM derived from big_80 as input for ReProf) is used.
 
  +
  +
The percentages of correctly identified secondary structure states (H, E or L) for the three inputs (fasta, big_80 PSSM and SwissProt PSSM) are 61 %, 86 % and 82 %, respectively. For the remaining sequences, the best-performing method (ReProf with the PSSM derived from big_80 as input) was therefore used.
  +
  +
  +
==== Comparison of ReProf to PsiPred and DSSP ====
  +
  +
The following tables show the percentages of agreement for secondary structure between ReProf and PsiPred or DSSP.
  +
  +
  +
P12694
  +
  +
{| class="wikitable" border="1" style="width:500px"
  +
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
  +
|-
  +
|H || 0.804 || 0.812
  +
|-
  +
|E || 0.400 || 0.585
  +
|-
  +
|L || 0.876 || 0.782
  +
|-
  +
|all || 0.849 || 0.816
  +
|}
  +
  +
  +
P10775
  +
  +
{| class="wikitable" border="1" style="width:500px"
  +
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
  +
|-
  +
|H || 0.798 || 0.889
  +
|-
  +
|E || 0.691 || 0.649
  +
|-
  +
|L || 0.779 || 0.828
  +
|-
  +
|all || 0.849 || 0.855
  +
|}
  +
  +
  +
Q08209
  +
  +
{| class="wikitable" border="1" style="width:500px"
  +
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
  +
|-
  +
|H || 0.794 || 0.816
  +
|-
  +
|E || 0.487 || 0.615
  +
|-
  +
|L || 0.830 || 0.807
  +
|-
  +
|all || 0.827 || 0.807
  +
|}
  +
  +
  +
Q9X0E6
  +
  +
{| class="wikitable" border="1" style="width:500px"
  +
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
  +
|-
  +
|H || 0.897 || 0.923
  +
|-
  +
|E || 0.694 || 0.643
  +
|-
  +
|L || 0.636 || 0.545
  +
|-
  +
|all || 0.802 || 0.802
  +
|}
  +
  +
  +
Altogether, ReProf agrees with PsiPred and DSSP for 80-85 % of the residues. In most cases the agreement for H and L is higher than for E.
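The agreement values in the tables above could be computed along the following lines. This is only a minimal sketch: the function name <code>agreement</code> is hypothetical, and the denominator for the per-class values (residues that carry the state in the second method) is an assumption, because the page does not state how the percentages were derived.

```python
def agreement(first, second, classes="HEL"):
    """Fraction of residues on which two secondary structure strings agree.

    Per class c: among residues labelled c in `second`, the fraction that
    `first` also labels c (assumed definition). "all": overall identity.
    """
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    result = {}
    for c in classes:
        positions = [i for i, state in enumerate(second) if state == c]
        if positions:
            result[c] = sum(first[i] == c for i in positions) / len(positions)
        else:
            result[c] = float("nan")  # class does not occur in `second`
    result["all"] = sum(a == b for a, b in zip(first, second)) / len(first)
    return result


# Toy strings, not real ReProf or DSSP output.
toy = agreement("HHHEELLL", "HHEEELLH")
```

With real data, <code>first</code> would be the ReProf prediction and <code>second</code> the PsiPred prediction or the DSSP assignment reduced to the three states H, E and L.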
  +
  +
==== Information from UniProt and PDB ====
  +
  +
A summary of interesting features for the proteins taken from [http://www.uniprot.org UniProt] and [http://www.pdb.org PDB]:
  +
  +
  +
'''P12694, 2BFD:'''
  +
* name: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial
  +
* EC: 1.2.4.4
  +
* gene: BCKDHA
  +
* organism: ''Homo sapiens'' (Human)
  +
* sequence length: 445 AA
  +
* subunit structure: heterotetramer of alpha and beta chains
  +
* subcellular location: mitochondrion matrix
  +
* secondary structure: 42% helical, 10% beta sheet
  +
* 3D similarity: pyruvate dehydrogenase E1
  +
* ligands: chloride ion, glycerol, potassium ion, manganese (II) ion, (4S)-2-methyl-2,4-pentanediol, thiamin diphosphate
  +
  +
  +
'''P10775, 2BNH:'''
  +
* name: ribonuclease inhibitor
  +
* gene: RNH1
  +
* organism: ''Sus scrofa'' (Pig)
  +
* sequence length: 456 AA
  +
* subcellular location: cytoplasm
  +
* sequence similarities: contains 15 LRR (leucine-rich) repeats
  +
* secondary structure: alternating helix and strand, 42% helical, 12% beta sheet
  +
  +
  +
'''Q08209, 1AUI:'''
  +
* name: serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform
  +
* EC: 3.1.3.16
  +
* gene: PPP3CA
  +
* organism: ''Homo sapiens'' (Human)
  +
* sequence length: 521 AA
  +
* subunit structure: heterodimer of alpha and beta chain (human calcineurin heterodimer)
  +
* subcellular location: nucleus
  +
* secondary structure: 27% helical, 11% beta sheet
  +
* ligands: calcium ion, Fe (III) ion, zinc ion
  +
  +
  +
'''Q9X0E6, 1KR4:'''
  +
* name: divalent-cation tolerance protein CutA
  +
* gene: cutA
  +
* organism: ''Thermotoga maritima'' (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
  +
* sequence length: 101 AA
  +
* subunit structure: homotrimer
  +
* subcellular location: cytoplasm
  +
* secondary structure: large fraction of strands, 29% helical, 35% beta sheet
   
 
=== Discussion ===
 
  +
  +
In the first step, the ReProf results for P10775 were evaluated against the DSSP assignment. Here, DSSP was treated as the standard of truth, because it assigns secondary structure from the experimentally determined 3D structure by examining angles and hydrogen bonds between atoms.
  +
  +
The prediction of secondary structure is much better if a PSSM is used instead of the sequence alone. The reason is that a PSSM describes the requirements for each position better than the amino acid sequence does, because it contains evolutionary information: for each position, it identifies alternatives to the residue in the primary sequence that do not alter the overall structure of the protein. The difference between using big_80 and SwissProt to generate the PSSM is less clear, but we chose big_80 for the remaining proteins because it performed slightly better in our test with the example protein P10775.
  +
  +
For all proteins, these ReProf results were compared to PsiPred and DSSP to see how well the methods agree in their secondary structure assignment. The methods agree for most residues. The largest discrepancies are observed for beta strands, which suggests that these are harder to predict than alpha helices or loops. One reason could be that beta strands are less frequent than alpha helices in most proteins, as can be seen in the information taken from UniProt and PDB.
  +
  +
Other methods for the prediction of secondary structure are [http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html GOR], [http://www.compbio.dundee.ac.uk/www-jpred/ Jpred] and [http://www.predictprotein.org/ PHDsec]. Another secondary structure assignment tool is [http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py STRIDE].
   
 
== Disordered protein ==
 
Line 65: Line 183:
   
 
=== Result ===
 
  +
==== IUPred ====
  +
IUPred predicts intrinsically disordered regions of proteins from estimated pairwise interaction energies, which are derived from proteins with known structures. IUPred offers three prediction types: long (long disordered regions), short (short disordered regions) and globular (regions of the protein where favorable pairwise interactions between residues are more likely).
  +
  +
For the prediction of disordered regions (prediction modes '''long''' and '''short'''), a position-specific score is assigned to each residue of the query sequence. The score indicates the tendency of the corresponding residue to belong to a disordered region. The following profiles of disorder tendency were generated with the [http://iupred.enzim.hu/ IUPred] web tool:
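To turn such a per-residue score profile into a list of disordered regions, the scores can be compared with a cutoff. The following is a minimal sketch under stated assumptions: the cutoff of 0.5 is the value commonly used for IUPred scores, the function name <code>disordered_regions</code> and the <code>min_length</code> parameter are hypothetical, and the example scores are invented.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based (start, end) ranges where the score is >= threshold.

    The default threshold of 0.5 is an assumed cutoff, not taken from the page.
    """
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position  # a new region opens here
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Invented example profile, not IUPred output for any of the proteins.
example_scores = [0.8, 0.7, 0.2, 0.1, 0.6, 0.9, 0.95]
example_regions = disordered_regions(example_scores)
```

Raising <code>min_length</code> suppresses very short stretches, which may be useful when only long disordered regions are of interest.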
  +
  +
===== BCKDHA =====
  +
<gallery perrow=2 widths=400px heights=140px caption="Profiles for disorders and globular regions of BCKDHA">
  +
File:Profile BCKDHA both.png|Profile of long range and short range disorders
  +
File:Profile BCKDHA diff.png|Difference between scores of long range disorders and short range disorders
  +
File:BCKDHA_globular.png|Prediction of globular regions
  +
</gallery>
  +
  +
===== P10775 =====
  +
<gallery perrow=2 widths=400px heights=140px caption="Profiles for disorders and globular regions of P10775">
  +
File:Profile P10775 both.png|Profile of long range and short range disorders
  +
File:Profile P10775 diff.png|Difference between scores of long range disorders and short range disorders
  +
File:P10775_globular.png|Prediction of globular regions
  +
</gallery>
  +
  +
===== Q9X0E6 =====
  +
<gallery perrow=2 widths=400px heights=140px caption="Profiles for disorders and globular regions of Q9X0E6">
  +
File:Profile Q9X0E6 both.png|Profile of long range and short range disorders
  +
File:Profile Q9X0E6 diff.png|Difference between scores of long range disorders and short range disorders
  +
File:Q9X0E6_globular.png|Prediction of globular regions
  +
</gallery>
  +
  +
===== Q08209 =====
  +
<gallery perrow=2 widths=400px heights=140px caption="Profiles for disorders and globular regions of Q08209">
  +
File:Profile Q08209 both.png|Profile of long range and short range disorders
  +
File:Profile Q08209 diff.png|Difference between scores of long range disorders and short range disorders
  +
File:Q08209_globular.png|Prediction of globular regions
  +
</gallery>
  +
  +
===== Statistics =====
  +
<gallery perrow=2 widths=400px heights=256px caption="Relations between profiles of long and short disorders">
  +
File:RMSD all.png
  +
File:Correlation all.png
  +
</gallery>
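The page does not state how the RMSD and correlation values in these plots were calculated. A minimal sketch, assuming that both measures are computed per protein over the per-residue long and short disorder scores and that the correlation is the Pearson coefficient (the function names and the example values are hypothetical):

```python
import math


def rmsd(x, y):
    """Root mean square deviation between two equal-length score profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient between two score profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


# Invented profiles: the second is the first shifted by 0.1.
long_scores = [0.1, 0.2, 0.3, 0.4]
short_scores = [0.2, 0.3, 0.4, 0.5]
example_rmsd = rmsd(long_scores, short_scores)
example_corr = pearson(long_scores, short_scores)
```

The example illustrates why both measures are informative: a constant offset between the two profiles leaves the correlation at 1 but produces a non-zero RMSD.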
  +
  +
==== MetaDisorder ====
  +
<gallery perrow=2 widths=400px heights=140px caption="Profile of raw scores and predicted globular regions">
  +
File:Metadisorder profile BCKDHA.png|Predicted disordered and globular regions of BCKDHA
  +
File:Metadisorder profile P10775.png|Predicted disordered and globular regions of P10775
  +
File:Metadisorder profile Q9X0E6.png|Predicted disordered and globular regions of Q9X0E6
  +
File:Metadisorder profile Q08209.png|Predicted disordered and globular regions of Q08209
  +
</gallery>
  +
  +
==== DisProt ====
  +
Because DisProt does not contain all UniProt sequences, we searched DisProt for sequences homologous to our protein sequences.
  +
<gallery perrow=2 widths=450px heights=45px caption="Disordered regions found in DisProt (yellow: whole protein, red:disordered regions, blue:ordered regions)">
  +
File:DP00386 C001.gif|Homologous sequence for BCKDHA in DisProt (E-value: 0.19). Homologous region is 47-127 (in BCKDHA 351-432).
  +
File:PkudisprotP10775.gif|Homologous sequence for P10775 in DisProt (E-value: 1.4e-24). Homologous region is 780-985 (in P10775 248-456).
  +
File:DP00465.gif|Homologous sequence for Q9X0E6 in DisProt (E-value: 0.34). Homologous region is 10-90 (in Q9X0E6 12-101).
  +
File:PkudisprotQ08209.gif|There is an entry for Q08209 in DisProt.
  +
</gallery>
   
 
=== Discussion ===
 
  +
  +
==== IUPred ====
  +
[[File:Q9X0E6 1KR4 structure.png|thumb|right|320px|Structural stability of Q9X0E6. Red parts have symmetry-related crystal contacts (within 5 &Aring;). Thickness of backbone represents variation of B-values.]]
  +
* In general, the long and short disorder profiles are similar; they are highly correlated overall (except for protein Q9X0E6).
* At both ends of each protein, the short disorder scores are much higher. This reflects the fact that residues at the ends of proteins have higher spatial flexibility and are structurally less stable.
* The high short disorder scores of Q9X0E6 can be explained by its structure.
** In its X-ray structure, the thick backbone representation at both ends indicates high B-values, which means that the conformation of these residues is either very flexible or even undefined.
** While the red parts of the protein have symmetry-related crystal contacts (within 5 &Aring;), the remaining parts have barely any.
** The overall lower long disorder score may be explained by the fact that the protein still folds into a definite 3D structure despite its local flexibility.
* Q9X0E6 also falls into a different category than the other 3 proteins.
** According to [http://www.ebi.ac.uk/interpro/entry/IPR004323 InterPro], its function is not clear, but it is thought to play a role in signal transduction.
** The other 3 proteins are either subunits or inhibitors of enzymes.
  +
  +
==== MetaDisorder ====
  +
* Compared to IUPred, MetaDisorder seems to be more sensitive to the input data. It finds more short structured regions that lie within disordered regions.
** For sequence Q08209, the MetaDisorder prediction shows almost the same disordered regions as those annotated in DisProt.
* In general, IUPred and MetaDisorder give very similar results.
** The predictions differ most for sequence Q9X0E6. The annotation of the homologous sequence in DisProt does not provide meaningful information for a comparison with the prediction methods, because the E-value of the local alignment is too high.
  +
  +
==== Comparison between Prediction and Annotation ====
  +
<gallery perrow=2 widths=500px caption="Comparison between prediction results of IUPred, MetaDisorder and annotation in Disprot">
  +
File:P50224 DP00011.gif|Annotation of disordered regions in protein P50224. (Yellow: whole protein, Red: disordered regions, Blue: ordered regions)
  +
File:P50224 globular.png|Prediction result from IUPred for protein P50224.
  +
File:Metadisorder profile P50224.png|Prediction result from MetaDisorder for protein P50224.
  +
</gallery>
  +
  +
The prediction results for human sulfotransferase 1A3/1A4 (P50224) differ from the annotations in DisProt. Structural features such as B-factors and symmetry-related crystal contacts seem to give important clues to intrinsically disordered regions. As the PDB structure of P50224 shows, a large range of amino acids has high B-factors.
  +
[[File:P50224 pdbe 1cjm.png|thumb|right|400px|X-ray crystallographic structure of P50224 (Source: PDBe)]]
   
 
== Transmembrane helices ==
 
Line 72: Line 276:
   
 
=== Result ===
 
  +
  +
==== PolyPhobius ====
  +
Except for BCKDHA, all proteins were predicted to have transmembrane helices. Although the signal for a transmembrane region is weak, protein P35462 is predicted to have 6 transmembrane helices. Protein P47863 is also predicted to have 6 transmembrane helices, and protein Q9YDF8 is predicted to have 7.
  +
<gallery perrow=2 widths=400px heights=300px caption="Prediction results of PolyPhobius">
  +
File:BCKDHA polyphobius.png|Residue localization of '''BCKDHA''' predicted by PolyPhobius
  +
File:P35462.png|Residue localization of '''P35462''' predicted by PolyPhobius. 6 transmembrane helices.
  +
File:P47863.png|Residue localization of '''P47863''' predicted by PolyPhobius. 6 transmembrane helices.
  +
File:Q9YDF8.png|Residue localization of '''Q9YDF8''' predicted by PolyPhobius. 7 transmembrane helices.
  +
</gallery>
  +
  +
==== MEMSAT-SVM ====
  +
BCKDHA is predicted to have 1 transmembrane helix. All the other proteins are predicted to have 6 transmembrane helices.
  +
<gallery perrow=2 widths=400px heights=250px caption="Predicted topology of transmembrane helices by MEMSAT-SVM">
  +
File:645af569-d96c-46c5-971e-6639ad7149df.seqjob cartoon memsat svm.png|Prediction result for protein '''BCKDHA'''
  +
File:6780f954-761b-4320-ab74-d89db6051915.seqjob cartoon memsat svm.png|Prediction result for protein '''P35462'''
  +
File:E4168d71-9342-42de-bf8a-d88420353ad2.seqjob cartoon memsat svm.png|Prediction result for protein '''P47863'''
  +
File:517c3604-cfd1-4c94-bdd8-725f34389e50.seqjob cartoon memsat svm.png|Prediction result for protein '''Q9YDF8'''
  +
</gallery>
  +
  +
==== Annotations in OPM and PDBTM ====
  +
* '''BCKDHA''': no annotation was found in OPM and PDBTM.
  +
* '''P35462'''(3PBL):
  +
<gallery perrow=2 widths=300px heights=300px caption="Orientation of transmembrane helices in OPM and PDBTM">
  +
File:Opm 3pbl.png|Structural annotation of P35462 in OPM. There are 7 transmembrane helices in total.
  +
File:3pbl lm.png|Structural annotation of P35462 in PDBTM. There are 7 transmembrane helices in total.
  +
</gallery>
  +
  +
* '''P47863''' (2D57): The PDB structure is a homotetramer. Protein '''P47863''' is one of the 4 identical chains.
  +
<gallery perrow=2 widths=300px heights=300px caption="Orientation of transmembrane helices in OPM and PDBTM">
  +
File:Opm 2d57.png|Structural annotation of P47863 in OPM. There are 32 transmembrane helices in total. Each chain has 8 transmembrane helices.
  +
File:2d57 lm.png|Structural annotation of P47863 in PDBTM. There are 32 transmembrane helices in total. Each chain has 6 transmembrane helices.
  +
</gallery>
  +
  +
* '''Q9YDF8'''(2KYH):
  +
<gallery perrow=2 widths=300px heights=300px caption="Orientation of transmembrane helices in OPM and PDBTM">
  +
File:2kyh.png|Structural annotation of Q9YDF8 in OPM. There are 5 transmembrane helices in total.
  +
File:2kyh lm.png|Structural annotation of Q9YDF8 in PDBTM. There are 4 transmembrane helices in total.
  +
</gallery>
   
 
=== Discussion ===
 
  +
* Comparison between OPM and PDBTM:
** PDBTM seems to define a narrower transmembrane region than OPM.
** For the same structure, PDBTM tends to assign fewer transmembrane helices than OPM.
* As we already know, BCKDHA is an intra-mitochondrial protein. The prediction result of MEMSAT-SVM is therefore wrong.
* In general, the prediction results of PolyPhobius and MEMSAT-SVM are similar to the annotations in OPM and PDBTM.
   
 
== Signal peptides ==
 
Line 79: Line 330:
   
 
=== Result ===
 
  +
  +
==== SignalP predictions ====
  +
  +
The following diagrams show the C- (cleavage site), S- (signal peptide) and Y- (combined cleavage site) scores for the three proteins according to SignalP version 4.1. The combined cleavage site score combines the cleavage score with the slope of the signal peptide score to optimize the recognition of cleavage sites.
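How a combined score of this kind can be formed is sketched below. The formula follows the description published for early SignalP versions (geometric mean of the C-score and the drop in the mean S-score across the position); whether version 4.1 computes the Y-score in exactly this way is not stated on this page. The window size <code>d</code>, the clamping of negative slopes to zero and the function name <code>y_scores</code> are assumptions made for illustration.

```python
import math


def y_scores(c_scores, s_scores, d=3):
    """Combine C-scores with the slope of the S-score (assumed formulation).

    For position i, the slope is the mean S-score of the d positions before i
    minus the mean S-score of the d positions starting at i. Positions too
    close to either end for a full window keep a Y-score of 0.
    """
    if len(c_scores) != len(s_scores):
        raise ValueError("score lists must have the same length")
    n = len(c_scores)
    y = [0.0] * n
    for i in range(d, n - d + 1):
        before = sum(s_scores[i - d:i]) / d
        after = sum(s_scores[i:i + d]) / d
        slope = max(before - after, 0.0)  # assumed: ignore rising S-scores
        y[i] = math.sqrt(c_scores[i] * slope)
    return y


# Invented scores: the S-score drops after index 4, the C-score peaks at index 5.
s_example = [0.9] * 5 + [0.1] * 5
c_example = [0.0] * 10
c_example[5] = 0.8
y_example = y_scores(c_example, s_example)
best_index = max(range(len(y_example)), key=y_example.__getitem__)
```

In this toy case the highest Y-score falls on the position where a high C-score coincides with the steepest drop of the S-score, which is the behaviour the combined score is meant to capture.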
  +
  +
[[File:P02768_signalp.png]]
  +
  +
  +
[[File:P47863_signalp.png]]
  +
  +
  +
[[File:P11279_signalp.png]]
  +
  +
  +
For P02768 and P11279, signal peptides are predicted, with the cleavage site between positions 18 and 19 for P02768 and between positions 28 and 29 for P11279. For P47863, no signal peptide is predicted.
  +
  +
SignalP version 3.0 gives the same result for P02768 and P11279. For P47863, however, it predicts a signal peptide with a cleavage site between positions 54 and 55 (neural network) or 56 and 57 (hidden Markov model), although only with a probability of 0.723, compared to nearly 1 for the other two proteins.
  +
  +
==== Known signal peptides ====
  +
  +
On the [http://www.signalpeptide.de/index.php Signal Peptide Website] there are entries for P02768 and P11279 but not for P47863:
  +
  +
  +
{| class="wikitable" border="1" style="width:1000px"
  +
!Accession Number !! Entry Name !! Protein Name !! Organism !! Length !! Status !! Signal Sequence
  +
|-
  +
|P02768 || ALBU_HUMAN || Serum albumin || ''Homo sapiens'' || 18 || confirmed || MKWVTFISLLFLFSSAYS
  +
|-
  +
|P11279 || LAMP1_HUMAN || Lysosome-associated membrane glycoprotein 1 || ''Homo sapiens'' || 28 || confirmed || MAAPGSARRPLLLLLLLLLLGLMHCASA
  +
|}
   
 
=== Discussion ===
 
  +
  +
The predictions of the newest version of SignalP agree with the confirmed signal peptides. The older version predicted a signal peptide in P47863, where there is none. According to UniProt, P47863 has transmembrane helices; the old version may have mistaken one of these for a signal peptide, because transmembrane helices and signal peptides resemble each other.
  +
  +
Other methods for signal peptide prediction are [http://phobius.sbc.su.se/ Phobius], [http://www.predisi.de/ PrediSi] and
  +
[http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/ Signal-3L].
   
 
== GO terms ==
 
Line 86: Line 372:
   
 
=== Result ===
 
  +
  +
==== GOPET ====
  +
  +
{| class="wikitable" border="1" style="width:750px"
  +
!GOid !! Aspect !! Confidence !! GO term
  +
|-
  +
| GO:0003824 || F || 97% || catalytic activity
  +
|-
  +
| GO:0016491 || F || 96% || oxidoreductase activity
  +
|-
  +
| GO:0016624 || F || 95% || oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
  +
|-
  +
| GO:0003863 || F || 90% || 3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
  +
|-
  +
| GO:0004739 || F || 89% || pyruvate dehydrogenase acetyl-transferring activity
  +
|-
  +
| GO:0004738 || F || 78% || pyruvate dehydrogenase activity
  +
|-
  +
| GO:0003826 || F || 77% || alpha-ketoacid dehydrogenase activity
  +
|-
  +
| GO:0047101 || F || 75% || 2-oxoisovalerate dehydrogenase acylating activity
  +
|-
  +
| GO:0008677 || F || 65% || 2-dehydropantoate 2-reductase activity
  +
|-
  +
| GO:0019152 || F || 63% || acetoin dehydrogenase activity
  +
|-
  +
| GO:0030955 || F || 63% || potassium ion binding
  +
|-
  +
| GO:0016616 || F || 62% || oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
  +
|-
  +
| GO:0046872 || F || 62% || metal ion binding
  +
|}
  +
  +
  +
Most terms concerning the catalytic activity (GO:0003824, GO:0016491, GO:0016624, GO:0003863, GO:0003826, GO:0047101) are consistent with the knowledge about the enzyme activity of the 2-oxoisovalerate dehydrogenase (see [[Maple Syrup Urine Disease#Biochemical disease mechanism|biochemical description of MSUD]]). Also the terms about metal binding (GO:0030955, GO:0046872) correspond to the [[#Information from UniProt and PDB|characterization in PDB]].
  +
  +
==== ProtFun ====
  +
  +
<pre>
  +
############## ProtFun 2.2 predictions ##############
  +
  +
>gi_11386135
  +
  +
# Functional category Prob Odds
  +
Amino_acid_biosynthesis 0.187 8.520
  +
Biosynthesis_of_cofactors 0.246 3.413
  +
Cell_envelope 0.035 0.581
  +
Cellular_processes 0.041 0.560
  +
Central_intermediary_metabolism => 0.321 5.096
  +
Energy_metabolism 0.208 2.310
  +
Fatty_acid_metabolism 0.023 1.738
  +
Purines_and_pyrimidines 0.257 1.059
  +
Regulatory_functions 0.031 0.194
  +
Replication_and_transcription 0.170 0.636
  +
Translation 0.047 1.078
  +
Transport_and_binding 0.029 0.071
  +
  +
# Enzyme/nonenzyme Prob Odds
  +
Enzyme => 0.769 2.683
  +
Nonenzyme 0.231 0.324
  +
  +
# Enzyme class Prob Odds
  +
Oxidoreductase (EC 1.-.-.-) 0.178 0.857
  +
Transferase (EC 2.-.-.-) 0.238 0.690
  +
Hydrolase (EC 3.-.-.-) 0.190 0.601
  +
Lyase (EC 4.-.-.-) 0.076 1.614
  +
Isomerase (EC 5.-.-.-) 0.010 0.321
  +
Ligase (EC 6.-.-.-) => 0.085 1.673
  +
  +
# Gene Ontology category Prob Odds
  +
Signal_transducer 0.098 0.458
  +
Receptor 0.006 0.038
  +
Hormone 0.001 0.206
  +
Structural_protein 0.005 0.170
  +
Transporter 0.025 0.226
  +
Ion_channel 0.009 0.163
  +
Voltage-gated_ion_channel 0.004 0.170
  +
Cation_channel 0.010 0.215
  +
Transcription 0.060 0.470
  +
Transcription_regulation 0.053 0.427
  +
Stress_response 0.010 0.110
  +
Immune_response 0.012 0.136
  +
Growth_factor 0.009 0.609
  +
Metal_ion_transport 0.012 0.025
  +
  +
//
  +
</pre>
  +
  +
The protein is predicted to have a function in the central intermediary metabolism, to be an enzyme and to belong to the enzyme class EC 6 (ligase). The last prediction is not correct, since the protein is an oxidoreductase (compare [[#Information from UniProt and PDB|Information from UniProt and PDB]]).
  +
  +
  +
==== Pfam ====
  +
  +
Pfam sequence search gives one significant Pfam-A match:
  +
  +
* E1_dh (dehydrogenase E1 component, [http://pfam.sanger.ac.uk/family/PF00676.15 PF00676])
  +
** use thiamine pyrophosphate as cofactor
  +
** includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase
  +
** members of multienzyme complex
  +
** interactions: E1_dh, Transketolase_C (transketolase, C-terminal domain), Transket_pyr (transketolase, pyrimidine binding domain)
  +
** 9023 sequences
  +
** belongs to clan THDP-binding ([http://pfam.sanger.ac.uk/clan/CL0254 CL0254]): thiamin diphosphate-binding superfamily
   
 
=== Discussion ===
 
  +
  +
The GOPET results give a good overview of the catalytic and binding activities of our protein, most of which are consistent with current knowledge. Whether the enzyme can really accept the other substrates pyruvate, 2-dehydropantoate and acetoin, as the prediction suggests, is not clear. It is possible that these activities were only predicted because of the high sequence identity to pyruvate dehydrogenase and other dehydrogenases. The fact that the confidences for these predictions are not as high as for some of the others supports this interpretation.
  +
  +
That ProtFun predicted the wrong enzyme class for our protein shows that this prediction is not always easy. In defense of the method, it has to be said that the probability and odds values for the different enzyme classes are close to each other. The values for the "right" enzyme class (EC 1) are also among the higher ones.
  +
  +
Pfam helps to find related proteins, which are clustered into families that share common domains. Families are grouped together in clans, so one can also learn more about distant relationships between proteins. Pfam found a family of different dehydrogenases from diverse organisms that are homologous to our protein. What they have in common is that they form the first component of a large enzyme complex. So the concept of oxidative decarboxylation has been adapted to different substrates and different organisms during evolution, but still uses the same basic principle.
  +
  +
Other methods for GO term prediction are [http://kinaz.fen.bilkent.edu.tr/gopred/ GOPred], [http://www.blast2go.com/b2ghome Blast2GO] and [http://eagl.unige.ch/GOCat/ GOCat].
  +
  +
  +
  +
From protein sequence alone, additional features can be predicted:
  +
* solvent accessibility
  +
* posttranslational modifications
  +
* localization
  +
* metal binding sites
  +
* active sites
  +
* disulfide bridges
  +
* SNP effects

Latest revision as of 15:45, 28 August 2013

Secondary structure

Lab journal

Result

Approach for predicting secondary structure with ReProf

For P10775, ReProf was run with the protein sequence FASTA file and with position-specific scoring matrices (PSSMs) derived from big_80 and SwissProt as input. The following tables compare the prediction results with the secondary structure assignment of DSSP. The f-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.
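The three measures can be computed per secondary structure state as in the following minimal sketch. The function names <code>f_measure</code> and <code>per_class_scores</code> are hypothetical, and the two short strings are invented toy data rather than real ReProf or DSSP output.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (0.0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def per_class_scores(predicted, reference, classes="HEL"):
    """Return {state: (recall, precision, f_measure)} for two state strings."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(predicted, reference))
    scores = {}
    for c in classes:
        tp = sum(p == c and r == c for p, r in pairs)  # correctly predicted c
        fp = sum(p == c and r != c for p, r in pairs)  # predicted c, reference differs
        fn = sum(p != c and r == c for p, r in pairs)  # reference c, missed by prediction
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        scores[c] = (recall, precision, f_measure(recall, precision))
    return scores


# Toy example with invented strings.
toy_scores = per_class_scores("HHHEELLL", "HHEEELLH")

# Check against the first table row below (fasta input, H): recall 0.719, precision 0.585.
f_helix_fasta = round(f_measure(0.719, 0.585), 3)
```

Applying <code>f_measure</code> to the recall and precision of a table row reproduces the f-measure reported in that row, for example 0.645 for H with fasta input.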


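Comparison of ReProf prediction (fasta input) to DSSP assignment
{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! recall !! precision !! f-measure
|-
|H || 0.719 || 0.585 || 0.645
|-
|E || 0.211 || 0.500 || 0.296
|-
|L || 0.616 || 0.654 || 0.635
|}


Comparison of ReProf prediction (big_80 PSSM input) to DSSP assignment
{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! recall !! precision !! f-measure
|-
|H || 0.944 || 0.889 || 0.916
|-
|E || 0.649 || 0.685 || 0.667
|-
|L || 0.826 || 0.866 || 0.846
|}


Comparison of ReProf prediction (SwissProt PSSM input) to DSSP assignment
{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! recall !! precision !! f-measure
|-
|H || 0.923 || 0.914 || 0.919
|-
|E || 0.807 || 0.523 || 0.634
|-
|L || 0.719 || 0.859 || 0.782
|}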


Predictions that use a PSSM instead of the plain sequence are of considerably better quality. All methods predict helices better than loops, and loops better than beta sheets. The run with the big_80 PSSM gives better results for E and L, and only slightly worse results for H, than the run with the SwissProt PSSM.

The percentages of correctly identified secondary structure states (H, E or L) for the three inputs (fasta, big_80 PSSM and SwissProt PSSM) are 61 %, 86 % and 82 %, respectively. For the remaining sequences, the best-performing method (ReProf with the PSSM derived from big_80 as input) was therefore used.


Comparison of ReProf to PsiPred and DSSP

The following tables show the percentages of agreement for secondary structure between ReProf and PsiPred or DSSP.


P12694

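{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
|-
|H || 0.804 || 0.812
|-
|E || 0.400 || 0.585
|-
|L || 0.876 || 0.782
|-
|all || 0.849 || 0.816
|}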


P10775

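{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
|-
|H || 0.798 || 0.889
|-
|E || 0.691 || 0.649
|-
|L || 0.779 || 0.828
|-
|all || 0.849 || 0.855
|}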


Q08209

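{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
|-
|H || 0.794 || 0.816
|-
|E || 0.487 || 0.615
|-
|L || 0.830 || 0.807
|-
|all || 0.827 || 0.807
|}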


Q9X0E6

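{| class="wikitable" border="1" style="width:500px"
!secondary structure element !! ReProf vs. PsiPred !! ReProf vs. DSSP
|-
|H || 0.897 || 0.923
|-
|E || 0.694 || 0.643
|-
|L || 0.636 || 0.545
|-
|all || 0.802 || 0.802
|}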


Altogether, ReProf agrees with PsiPred and DSSP for 80-85 % of the residues. In most cases the agreement for H and L is higher than for E.

Information from UniProt and PDB

A summary of interesting features for the proteins taken from UniProt and PDB:


P12694, 2BFD:

  • name: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial
  • EC: 1.2.4.4
  • gene: BCKDHA
  • organism: Homo sapiens (Human)
  • sequence length: 445 AA
  • subunit structure: heterotetramer of alpha and beta chains
  • subcellular location: mitochondrion matrix
  • secondary structure: 42% helical, 10% beta sheet
  • 3D similarity: pyruvate dehydrogenase E1
  • ligands: chloride ion, glycerol, potassium ion, manganese (II) ion, (4S)-2-methyl-2,4-pentanediol, thiamin diphosphate


P10775, 2BNH:

  • name: ribonuclease inhibitor
  • gene: RNH1
  • organism: Sus scrofa (Pig)
  • sequence length: 456 AA
  • subcellular location: cytoplasm
  • sequence similarities: contains 15 LRR (leucine-rich) repeats
  • secondary structure: alternating helix and strand, 42% helical, 12% beta sheet


Q08209, 1AUI:

  • name: serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform
  • EC: 3.1.3.16
  • gene: PPP3CA
  • organism: Homo sapiens (Human)
  • sequence length: 521 AA
  • subunit structure: heterodimer of alpha and beta chain (human calcineurin heterodimer)
  • subcellular location: nucleus
  • secondary structure: 27% helical, 11% beta sheet
  • ligands: calcium ion, Fe (III) ion, zinc ion


Q9X0E6, 1KR4:

  • name: divalent-cation tolerance protein CutA
  • gene: cutA
  • organism: Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
  • sequence length: 101 AA
  • subunit structure: homotrimer
  • subcellular location: cytoplasm
  • secondary structure: large fraction of strands, 29% helical, 35% beta sheet

Discussion

In the first step, the ReProf results for P10775 were evaluated against the DSSP assignment. Here, DSSP was treated as the standard of truth, because it assigns secondary structure from the experimentally determined 3D structure by examining angles and hydrogen bonds between atoms.

The prediction of secondary structure is much better if a PSSM is used instead of the sequence alone. The reason is that a PSSM describes the requirements for each position better than the amino acid sequence does, because it contains evolutionary information: for each position, it identifies alternatives to the residue in the primary sequence that do not alter the overall structure of the protein. The difference between using big_80 and SwissProt to generate the PSSM is less clear, but we chose big_80 for the remaining proteins because it performed slightly better in our test with the example protein P10775.

For all proteins, these ReProf results were compared to PsiPred and DSSP to see how well the methods agree in their secondary structure assignment. The methods agree for most residues. The largest discrepancies are observed for beta strands, which suggests that these are harder to predict than alpha helices or loops. One reason could be that beta strands are less frequent than alpha helices in most proteins, as can be seen in the information taken from UniProt and PDB.

Other methods for predicting secondary structure are GOR, Jpred and PHDsec. Another secondary structure assignment tool is STRIDE.

Disordered protein

Lab journal

Result

IUPred

IUPred predicts intrinsically disordered regions of proteins by estimating the pairwise energy content, based on observations from proteins with known structures. IUPred has three prediction modes: long (long disordered regions), short (short disordered regions) and globular (regions of the protein where pairwise interactions between residues are more favorable).

For the prediction of disordered regions (prediction modes long and short), a position-specific score is assigned to each residue of the query sequence. The score indicates the tendency of the corresponding residue to belong to a disordered region. The following profiles of disorder tendency were generated with the IUPred web tool:
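
To turn such a per-residue score profile into a list of disordered regions, the scores can be thresholded. The sketch below assumes the commonly used cutoff of 0.5 (residues scoring above it are regarded as disordered). The function name `disordered_regions`, the `min_length` parameter and the score values are hypothetical and serve only as an illustration.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) ranges whose scores exceed the threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a new region begins here
        elif start is not None:
            # the region covers start..i-1, i.e. i - start residues
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # close a region that runs to the end of the sequence
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up scores for a 10-residue sequence
scores = [0.81, 0.74, 0.55, 0.32, 0.21, 0.18, 0.47, 0.52, 0.66, 0.90]
print(disordered_regions(scores))
```

With these made-up scores, both termini are reported as disordered, which is the pattern discussed for the short disorder profiles.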

BCKDHA
P10775
Q9X0E6
Q08209
Statistics

Metadisorder

DisProt

Because DisProt does not contain all UniProt sequences, we searched DisProt for sequences homologous to our protein sequences.

Discussion

IUPred

Structural stability of Q9X0E6. Red parts have symmetry-related crystal contacts (within 5 Å). Thickness of backbone represents variation of B-values.
  • In general, the long and short disorder profiles are similar and highly correlated (except for protein Q9X0E6).
  • At both ends of each protein, the short disorder scores are much higher. This reflects the fact that residues at the termini have higher spatial flexibility and are structurally less stable.
  • The high short disorder scores of Q9X0E6 can be explained by its structure.
    • In its X-ray structure, the thick backbone representation at both ends indicates high B-values, which mean that the conformation of these residues is either very flexible or even undefined.
    • While the red parts of the protein show symmetry-related crystal contacts (within 5 Å), the remaining parts have hardly any.
    • The overall lower long disorder score may be explained by the fact that the protein still folds into a definite 3D structure despite local flexibility.
  • Q9X0E6 also falls into a different category than the other 3 proteins.
    • According to InterPro, its function is not clear, but it is thought to have a role in signal transduction.
    • The other 3 proteins are either subunits or inhibitors of enzymes.

MetaDisorder

  • Compared to IUPred, MetaDisorder seems to be more sensitive to the input data. It finds more short structured regions located within disordered regions.
    • For sequence Q08209, MetaDisorder predicts almost the same disordered regions as are annotated in DisProt.
  • In general, IUPred and MetaDisorder give very similar results.
    • For sequence Q9X0E6, the predictions differ most. The DisProt annotation of the homologous sequence does not provide meaningful information for comparison with the prediction methods, because the E-value of the local alignment is too high.

Comparison between Prediction and Annotation

The prediction results for human sulfotransferase 1A3/1A4 (P50224) differ from the annotations in DisProt. Structural features such as B-factors and symmetry-related crystal contacts seem to give important clues about intrinsically disordered regions. As the PDB structure of P50224 shows, a large range of amino acids has high B-factors.

X-ray crystallographic structure of P50224 (Source: PDBe)

Transmembrane helices

Lab journal

Result

PolyPhobius

All proteins except BCKDHA were predicted to have transmembrane helices. Although the signal for a transmembrane region is weak, the protein P35462 is predicted to have 6 transmembrane helices. The protein P47863 is also predicted to have 6 transmembrane helices, and the protein Q9YDF8 is predicted to have 7.

MEMSAT-SVM

BCKDHA is predicted to have 1 transmembrane helix. All the other proteins are predicted to have 6 transmembrane helices.

Annotations in OPM and PDBTM

  • BCKDHA: no annotation was found in OPM and PDBTM.
  • P35462(3PBL):
  • P47863(2D57): The PDB structure is a homotetramer. Protein P47863 is one of the 4 identical chains.
  • Q9YDF8(2KYH):

Discussion

  • Comparison between OPM and PDBTM:
    • PDBTM seems to assign narrower transmembrane regions than OPM.
    • For the same structure, PDBTM tends to assign fewer transmembrane helices than OPM.
  • As we already know, BCKDHA is an intra-mitochondrial protein, so the MEMSAT-SVM prediction of a transmembrane helix is wrong.
  • In general, the prediction results of PolyPhobius and MEMSAT-SVM are similar to the annotations in OPM and PDBTM.
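
One way to make such a comparison between predicted and annotated transmembrane helices more concrete is to count how many annotated helices are matched by a predicted helix with sufficient residue overlap. The following is a minimal sketch under our own assumptions: the segment coordinates are hypothetical and are not the values from OPM, PDBTM, PolyPhobius or MEMSAT-SVM, and the minimum overlap of 5 residues is an arbitrary choice.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def match_helices(predicted, annotated, min_overlap=5):
    """Pair each annotated helix with the first unused predicted helix that overlaps enough.

    Returns a list of (annotated, predicted_or_None) pairs.
    """
    pairs = []
    used = set()
    for ann in annotated:
        hit = None
        for idx, pred in enumerate(predicted):
            if idx not in used and overlap(ann, pred) >= min_overlap:
                hit = pred
                used.add(idx)
                break
        pairs.append((ann, hit))
    return pairs


# Hypothetical segments, for illustration only
annotated = [(10, 30), (45, 65), (80, 100)]
predicted = [(12, 33), (78, 97)]
pairs = match_helices(predicted, annotated)
n_found = sum(hit is not None for _, hit in pairs)
print(f"{n_found} of {len(annotated)} annotated helices recovered")
```

The same function can be used to compare two annotation sources, for example by passing the PDBTM segments as `predicted` and the OPM segments as `annotated`.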

Signal peptides

Lab journal

Result

SignalP predictions

The following diagrams show the C- (cleavage site), S- (signal peptide) and Y- (combined cleavage site) scores for the three proteins according to SignalP version 4.1. The combined cleavage site score combines the cleavage score with the slope of the signal peptide score to optimize the recognition of cleavage sites.
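
The combination of the two scores can be sketched as follows. The sketch follows the formulation given in the original SignalP description, where the Y-score at position i is the geometric mean of the C-score and the drop in S-score around that position, Y_i = sqrt(C_i · Δ_d S_i), with Δ_d S_i being the difference between the mean S-score of the d positions before i and the mean of the d positions starting at i. The window size d = 3, the clamping of negative slopes to zero and all score values are assumptions made for illustration. This is not the implementation used by SignalP 4.1.

```python
import math


def y_scores(c, s, d=3):
    """Combine C-scores with the local drop in S-score (lists indexed from 0)."""
    if len(c) != len(s):
        raise ValueError("score lists must have the same length")
    n = len(c)
    y = [0.0] * n  # positions without a full window on both sides stay at 0
    for i in range(d, n - d + 1):
        before = sum(s[i - d:i]) / d      # mean S-score of the d positions before i
        after = sum(s[i:i + d]) / d       # mean S-score of position i and the d - 1 after it
        slope = max(0.0, before - after)  # assumption: a rising S-score contributes nothing
        y[i] = math.sqrt(c[i] * slope)
    return y


# Made-up scores: the S-score drops sharply after index 3, the C-score peaks at index 4
s = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
c = [0.0, 0.1, 0.1, 0.2, 0.8, 0.3, 0.1, 0.0]
y = y_scores(c, s)
best = max(range(len(y)), key=y.__getitem__)
print("highest Y-score at index", best)
```

Because the Y-score is only high where a C-score peak coincides with a steep drop of the S-score, isolated C-score peaks elsewhere in the sequence are suppressed.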

P02768 signalp.png


P47863 signalp.png


P11279 signalp.png


For P02768 and P11279, signal peptides are predicted, with the cleavage site between positions 18 and 19 for P02768 and between positions 28 and 29 for P11279. For P47863, no signal peptide is predicted.

SignalP version 3.0 came to the same result for P02768 and P11279. For P47863, however, it predicts a signal peptide with a cleavage site between positions 54 and 55 (neural network) or 56 and 57 (hidden Markov model), although only with a probability of 0.723, compared to nearly 1 for the other two proteins.

Known signal peptides

On the Signal Peptide Website there are entries for P02768 and P11279 but not for P47863:


Accession Number  Entry Name   Protein Name                                 Organism      Length  Status     Signal Sequence
P02768            ALBU_HUMAN   Serum albumin                                Homo sapiens  18      confirmed  MKWVTFISLLFLFSSAYS
P11279            LAMP1_HUMAN  Lysosome-associated membrane glycoprotein 1  Homo sapiens  28      confirmed  MAAPGSARRPLLLLLLLLLLGLMHCASA

Discussion

The predictions of the newest SignalP version (4.1) agree with the confirmed signal peptides. The older version (3.0) predicted a signal peptide in P47863, where there is none. According to UniProt, P47863 has transmembrane helices, and the old version might have mistaken these for a signal peptide because the two resemble each other.

Other methods for signal peptide prediction are Phobius, PrediSi and Signal-3L.

GO terms

Lab journal

Result

GOPET

GOid        Aspect  Confidence  GO term
GO:0003824  F       97%         catalytic activity
GO:0016491  F       96%         oxidoreductase activity
GO:0016624  F       95%         oxidoreductase activity acting on the aldehyde or oxo group of donors disulfide as acceptor
GO:0003863  F       90%         3-methyl-2-oxobutanoate dehydrogenase 2-methylpropanoyl-transferring activity
GO:0004739  F       89%         pyruvate dehydrogenase acetyl-transferring activity
GO:0004738  F       78%         pyruvate dehydrogenase activity
GO:0003826  F       77%         alpha-ketoacid dehydrogenase activity
GO:0047101  F       75%         2-oxoisovalerate dehydrogenase acylating activity
GO:0008677  F       65%         2-dehydropantoate 2-reductase activity
GO:0019152  F       63%         acetoin dehydrogenase activity
GO:0030955  F       63%         potassium ion binding
GO:0016616  F       62%         oxidoreductase activity acting on the CH-OH group of donors NAD or NADP as acceptor
GO:0046872  F       62%         metal ion binding


Most terms concerning the catalytic activity (GO:0003824, GO:0016491, GO:0016624, GO:0003863, GO:0003826, GO:0047101) are consistent with what is known about the enzyme activity of 2-oxoisovalerate dehydrogenase (see biochemical description of MSUD). The terms about metal binding (GO:0030955, GO:0046872) also correspond to the characterization in PDB.

ProtFun

############## ProtFun 2.2 predictions ##############

>gi_11386135

# Functional category                  Prob     Odds
  Amino_acid_biosynthesis              0.187    8.520
  Biosynthesis_of_cofactors            0.246    3.413
  Cell_envelope                        0.035    0.581
  Cellular_processes                   0.041    0.560
  Central_intermediary_metabolism   => 0.321    5.096
  Energy_metabolism                    0.208    2.310
  Fatty_acid_metabolism                0.023    1.738
  Purines_and_pyrimidines              0.257    1.059
  Regulatory_functions                 0.031    0.194
  Replication_and_transcription        0.170    0.636
  Translation                          0.047    1.078
  Transport_and_binding                0.029    0.071

# Enzyme/nonenzyme                     Prob     Odds
  Enzyme                            => 0.769    2.683
  Nonenzyme                            0.231    0.324

# Enzyme class                         Prob     Odds
  Oxidoreductase (EC 1.-.-.-)          0.178    0.857
  Transferase    (EC 2.-.-.-)          0.238    0.690
  Hydrolase      (EC 3.-.-.-)          0.190    0.601
  Lyase          (EC 4.-.-.-)          0.076    1.614
  Isomerase      (EC 5.-.-.-)          0.010    0.321
  Ligase         (EC 6.-.-.-)       => 0.085    1.673

# Gene Ontology category               Prob     Odds
  Signal_transducer                    0.098    0.458
  Receptor                             0.006    0.038
  Hormone                              0.001    0.206
  Structural_protein                   0.005    0.170
  Transporter                          0.025    0.226
  Ion_channel                          0.009    0.163
  Voltage-gated_ion_channel            0.004    0.170
  Cation_channel                       0.010    0.215
  Transcription                        0.060    0.470
  Transcription_regulation             0.053    0.427
  Stress_response                      0.010    0.110
  Immune_response                      0.012    0.136
  Growth_factor                        0.009    0.609
  Metal_ion_transport                  0.012    0.025

//

The protein is predicted to have a function in the central intermediary metabolism, to be an enzyme and to belong to the enzyme class EC 6 (ligase). The last prediction is not correct, since the protein is an oxidoreductase (compare Information from UniProt and PDB).


Pfam

Pfam sequence search gives one significant Pfam-A match:

  • E1_dh (dehydrogenase E1 component, PF00676)
    • use thiamine pyrophosphate as cofactor
    • includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase
    • members of multienzyme complex
    • interactions: E1_dh, Transketolase_C (transketolase, C-terminal domain), Transket_pyr (transketolase, pyrimidine binding domain)
    • 9023 sequences
    • belongs to clan THDP-binding (CL0254): thiamin diphosphate-binding superfamily

Discussion

The GOPET results give a good overview of the catalytic and binding activities of our protein, most of which are consistent with current knowledge. Whether the enzyme can really accept the other substrates pyruvate, 2-dehydropantoate and acetoin, as the prediction suggests, is not clear. It is possible that these activities were only predicted because of the high sequence identity to pyruvate dehydrogenase and other dehydrogenases. The fact that the confidences for these predictions are not as high as for some of the others supports this interpretation.

The fact that ProtFun predicted the wrong enzyme class for our protein shows that this prediction is not always easy. In defense of the method, it has to be stated that the probability and odds values for the different enzyme classes are close to each other. In addition, the values for the "right" enzyme class (EC 1) are among the higher ones.
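
The position of the correct class among the six enzyme classes can be checked directly from the ProtFun 2.2 output shown in the Result section. The short sketch below copies the probability and odds values from that output and ranks the classes. The helper `rank_of` is our own and not part of ProtFun.

```python
# Enzyme class values (probability, odds) copied from the ProtFun 2.2 output
enzyme_classes = {
    "Oxidoreductase": (0.178, 0.857),
    "Transferase": (0.238, 0.690),
    "Hydrolase": (0.190, 0.601),
    "Lyase": (0.076, 1.614),
    "Isomerase": (0.010, 0.321),
    "Ligase": (0.085, 1.673),
}


def rank_of(name, column):
    """1-based rank of a class when sorting by probability (0) or odds (1), descending."""
    ordered = sorted(
        enzyme_classes,
        key=lambda k: enzyme_classes[k][column],
        reverse=True,
    )
    return ordered.index(name) + 1


print("rank by probability:", rank_of("Oxidoreductase", 0))
print("rank by odds:", rank_of("Oxidoreductase", 1))
```

By both criteria, oxidoreductase is ranked third of the six classes, while ligase, the predicted class, has the highest odds but only the fourth highest probability.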

Pfam helps to find related proteins, which are clustered into families that share common domains. Families are grouped together in clans, so one can also find out more about distant relationships between proteins. Pfam found a family of different dehydrogenases from diverse organisms that are homologous to our protein. What they have in common is that each forms the first component of a large enzyme complex. So the concept of oxidative decarboxylation has been adapted to different substrates and different organisms during evolution, but still uses the same basic principle.

Other methods for GO term prediction are GOPred, Blast2GO and GOCat.


From protein sequence alone, additional features can be predicted:

  • solvent accessibility
  • posttranslational modifications
  • localization
  • metal binding sites
  • active sites
  • disulfide bridges
  • SNP effects