Difference between revisions of "Task 3 (MSUD)"

From Bioinformatikpedia
Revision as of 21:18, 16 May 2013

Secondary structure

Lab journal

Result

The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:

/mnt/home/student/schillerl/MasterPractical/task3/reprof/

/mnt/home/student/schillerl/MasterPractical/task3/psipred/

/mnt/home/student/schillerl/MasterPractical/task3/dssp/


Position-specific scoring matrices (PSSMs) used as input for ReProf are located at:

/mnt/home/student/schillerl/MasterPractical/task3/pssm/


Approach for predicting secondary structure with ReProf

For P10775, ReProf was run on the protein sequence FASTA file and on position-specific scoring matrices (PSSMs) derived from big_80 and SwissProt. The following tables compare the prediction results to the secondary structure assignment of DSSP. The f-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.
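The per-class scores reported in the tables can be computed along the following lines. This is a minimal sketch, not the script that was actually used: the function names and the toy strings are illustrative assumptions, and both inputs are assumed to be equal-length strings over the three states H, E and L.

```python
from collections import Counter


def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0.0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def per_class_scores(predicted: str, observed: str, classes: str = "HEL") -> dict:
    """Return {state: (recall, precision, f_measure)} for two state strings.

    'observed' is treated as the reference (e.g. the DSSP assignment),
    'predicted' as the prediction (e.g. ReProf output).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    true_pos = Counter(p for p, o in zip(predicted, observed) if p == o)
    pred_count = Counter(predicted)
    obs_count = Counter(observed)
    scores = {}
    for c in classes:
        recall = true_pos[c] / obs_count[c] if obs_count[c] else 0.0
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        scores[c] = (recall, precision, f_measure(recall, precision))
    return scores


# Toy example (hypothetical strings, not data from this task)
scores = per_class_scores(predicted="HHHLEELL", observed="HHHHEELL")
```

As a consistency check, a recall of 0.719 and a precision of 0.585 give `f_measure(0.719, 0.585)`, which rounds to 0.645.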


Comparison of ReProf prediction (fasta input) to DSSP assignment
secondary structure element    recall    precision    f-measure
H                              0.719     0.585        0.645
E                              0.211     0.500        0.296
L                              0.616     0.654        0.635


Comparison of ReProf prediction (big_80 PSSM input) to DSSP assignment
secondary structure element    recall    precision    f-measure
H                              0.944     0.889        0.916
E                              0.649     0.685        0.667
L                              0.826     0.866        0.846


Comparison of ReProf prediction (SwissProt PSSM input) to DSSP assignment
secondary structure element    recall    precision    f-measure
H                              0.923     0.914        0.919
E                              0.807     0.523        0.634
L                              0.719     0.859        0.782


Predictions based on a PSSM are considerably better than predictions based on the sequence alone. All three runs predict helices better than loops, and loops better than beta sheets. By f-measure, the run with the big_80 PSSM is better for E and L and only slightly worse for H than the run with the SwissProt PSSM.

The percentages of correctly identified secondary structure states (H, E or L) for the three runs (fasta, big_80 PSSM and SwissProt PSSM input) are 61 %, 86 % and 82 %, respectively. For the remaining sequences, the best-performing method (ReProf with the PSSM derived from big_80 as input) was therefore used.
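The overall percentage of correctly identified states and the choice of the best run can be sketched as follows. The function names are illustrative assumptions; the three accuracy values are the ones reported above.

```python
def overall_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def best_run(accuracies: dict) -> str:
    """Return the name of the run with the highest accuracy."""
    return max(accuracies, key=accuracies.get)


# Accuracies reported for P10775 (as fractions)
reported = {"fasta": 0.61, "big_80 PSSM": 0.86, "SwissProt PSSM": 0.82}
chosen = best_run(reported)  # "big_80 PSSM"
```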


Comparison of ReProf to PsiPred and DSSP

The following tables show the agreement in secondary structure between ReProf and PsiPred or DSSP, given as fractions.

P12694
secondary structure element    PsiPred    DSSP
H                              0.804      0.812
E                              0.400      0.585
L                              0.876      0.782
all                            0.849      0.816


P10775
secondary structure element    PsiPred    DSSP
H                              0.798      0.889
E                              0.691      0.649
L                              0.779      0.828
all                            0.849      0.855


Q08209
secondary structure element    PsiPred    DSSP
H                              0.794      0.816
E                              0.487      0.615
L                              0.830      0.807
all                            0.827      0.807


Q9X0E6
secondary structure element    PsiPred    DSSP
H                              0.897      0.923
E                              0.694      0.643
L                              0.636      0.545
all                            0.802      0.802


Information from UniProt and PDB

A summary of interesting features for the proteins:

P12694, 2BFD
  • name: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial
  • EC: 1.2.4.4
  • gene: BCKDHA
  • organism: Homo sapiens (Human)
  • sequence length: 445 AA
  • subunit structure: heterotetramer of alpha and beta chains
  • subcellular location: mitochondrion matrix
  • secondary structure: 42% helical, 10% beta sheet
  • 3D similarity: pyruvate dehydrogenase E1
  • ligands: chloride ion, glycerol, potassium ion, manganese (II) ion, (4S)-2-methyl-2,4-pentanediol, thiamin diphosphate
P10775, 2BNH
  • name: ribonuclease inhibitor
  • gene: RNH1
  • organism: Sus scrofa (Pig)
  • sequence length: 456 AA
  • subcellular location: cytoplasm
  • sequence similarities: contains 15 LRR (leucine-rich) repeats
  • secondary structure: alternating helix and strand, 42% helical, 12% beta sheet
Q08209, 1AUI
  • name: serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform
  • EC: 3.1.3.16
  • gene: PPP3CA
  • organism: Homo sapiens (Human)
  • sequence length: 521 AA
  • subunit structure: heterodimer of alpha and beta chain (human calcineurin heterodimer)
  • subcellular location: nucleus
  • secondary structure: 27% helical, 11% beta sheet
  • ligands: calcium ion, Fe (III) ion, zinc ion
Q9X0E6, 1KR4
  • name: divalent-cation tolerance protein CutA
  • gene: cutA
  • organism: Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
  • sequence length: 101 AA
  • subunit structure: homotrimer
  • subcellular location: cytoplasm
  • secondary structure: great fraction of strands, 29% helical, 35% beta sheet

Discussion

The prediction of secondary structure is much better if a PSSM is used instead of the sequence alone. The reason is that a PSSM describes the requirements for each position better than the amino acid sequence does, because it incorporates evolutionary information: for each position, it identifies alternatives to the residue in the primary sequence that do not alter the overall structure of the protein. The difference between using big_80 and SwissProt to generate the PSSM is less pronounced, but we chose big_80 because it performed slightly better in our test with the example protein P10775.
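How a PSSM captures tolerated alternatives per position can be illustrated with a toy calculation. This is a deliberately simplified sketch, assuming a gap-free alignment, a uniform background frequency and a flat pseudocount; PSI-BLAST, which is typically used to build such matrices, additionally weights sequences and uses empirical background frequencies. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append(
            {
                aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                for aa in AMINO_ACIDS
            }
        )
    return pssm


# Toy alignment: column 0 tolerates L, I and V, which the single sequence "LKV" cannot show
pssm = toy_pssm(["LKV", "IKV", "LRV", "VKI"])
```

In this example, column 0 scores L highest, I and V positive, and unobserved residues such as W negative, whereas the query sequence alone would only show L at that position.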

Disordered protein

Lab journal

Result

Discussion

Transmembrane helices

Lab journal

Result

Discussion

Signal peptides

Lab journal

Result

Discussion

GO terms

Lab journal

Result

Discussion