Difference between revisions of "Task 3 (MSUD)"

Revision as of 10:30, 17 May 2013

Secondary structure

Lab journal

Result

The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:

/mnt/home/student/schillerl/MasterPractical/task3/reprof/

/mnt/home/student/schillerl/MasterPractical/task3/psipred/

/mnt/home/student/schillerl/MasterPractical/task3/dssp/

Position-specific scoring matrices (PSSMs) used as input for ReProf are located at:

/mnt/home/student/schillerl/MasterPractical/task3/pssm/

Approach for predicting secondary structure with ReProf

For P10775, ReProf was run with three different inputs: the protein sequence as a FASTA file, a position-specific scoring matrix (PSSM) derived from big_80, and a PSSM derived from SwissProt. The following tables compare the prediction results to the secondary structure assignment of DSSP. The f-measure is the harmonic mean of recall and precision; it gives a good single indication of the quality of a classifier.
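A minimal Python sketch of these three measures, assuming the DSSP assignment and the ReProf prediction are available as equally long H/E/L strings. The function names and toy strings are hypothetical and not part of the pipeline used here. Recomputing the f-measure from the rounded H values of the FASTA run reproduces the 0.645 in the first table.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def per_class_scores(observed: str, predicted: str, states: str = "HEL") -> dict:
    """Return {state: (recall, precision, f-measure)} for each 3-state class.

    observed  -- reference string, e.g. the DSSP assignment reduced to H/E/L
    predicted -- prediction of the same length, e.g. the ReProf output
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    scores = {}
    for s in states:
        # residues in state s according to both the reference and the prediction
        tp = sum(1 for o, p in zip(observed, predicted) if o == s and p == s)
        n_obs = observed.count(s)    # residues in state s according to the reference
        n_pred = predicted.count(s)  # residues predicted to be in state s
        recall = tp / n_obs if n_obs else 0.0
        precision = tp / n_pred if n_pred else 0.0
        scores[s] = (recall, precision, f_measure(recall, precision))
    return scores


# Toy strings (not from the P10775 run), aligned residue by residue.
print(per_class_scores("HHHHLLEEEL", "HHHLLLEELL"))
# Rounded H values from the ReProf (fasta input) table:
print(round(f_measure(0.719, 0.585), 3))  # 0.645
```

Recall is computed relative to the reference (DSSP) and precision relative to the prediction, so swapping the two arguments swaps recall and precision but leaves the f-measure unchanged.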

Comparison of ReProf prediction (fasta input) to DSSP assignment
secondary structure element	recall	precision	f-measure
H	0.719	0.585	0.645
E	0.211	0.500	0.296
L	0.616	0.654	0.635

Comparison of ReProf prediction (big_80 PSSM input) to DSSP assignment
secondary structure element	recall	precision	f-measure
H	0.944	0.889	0.916
E	0.649	0.685	0.667
L	0.826	0.866	0.846

Comparison of ReProf prediction (SwissProt PSSM input) to DSSP assignment
secondary structure element	recall	precision	f-measure
H	0.923	0.914	0.919
E	0.807	0.523	0.634
L	0.719	0.859	0.782

Predictions using a PSSM instead of the plain sequence are of considerably better quality. All three runs predict helices better than loops, and loops better than beta strands. The run with the big_80 PSSM achieves higher f-measures for E and L and only a slightly lower one for H than the run with the SwissProt PSSM.

The percentages of correctly predicted secondary structure states (H, E or L) for the three runs are 61 %, 86 % and 82 %. Therefore, the best-performing setup (ReProf with a PSSM derived from big_80 as input) was used for the remaining proteins.
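This percentage of correctly predicted residues is commonly called Q3. A short sketch, assuming the prediction and the DSSP assignment are equally long H/E/L strings; `q3` is a hypothetical helper name.

```python
def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose H/E/L state is predicted correctly (Q3)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty strings of equal length")
    # count positions where reference and prediction carry the same state
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


# Toy example: 8 of 10 residues agree.
print(q3("HHHHLLEEEL", "HHHLLLEELL"))  # 0.8
```

Applied to two predictions instead of a prediction and DSSP, the same function gives an overall per-residue agreement; presumably the "all" rows of the comparison tables are defined this way.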

Comparison of ReProf to PsiPred and DSSP

The following tables show the fraction of residues for which ReProf agrees with PsiPred or with DSSP, for each secondary structure element and over all residues.

P12694

secondary structure element	PsiPred	DSSP
H	0.804	0.812
E	0.400	0.585
L	0.876	0.782
all	0.849	0.816

P10775

secondary structure element	PsiPred	DSSP
H	0.798	0.889
E	0.691	0.649
L	0.779	0.828
all	0.849	0.855

Q08209

secondary structure element	PsiPred	DSSP
H	0.794	0.816
E	0.487	0.615
L	0.830	0.807
all	0.827	0.807

Q9X0E6

secondary structure element	PsiPred	DSSP
H	0.897	0.923
E	0.694	0.643
L	0.636	0.545
all	0.802	0.802

Altogether, ReProf agrees with PsiPred and DSSP for about 80-86 % of the residues. For three of the four proteins the agreement for H and L is higher than for E; the exception is Q9X0E6, where the agreement is lowest for L.

Information from UniProt and PDB

A summary of interesting features for the proteins:

P12694, 2BFD:

name: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial
EC: 1.2.4.4
gene: BCKDHA
organism: Homo sapiens (Human)
sequence length: 445 AA
subunit structure: heterotetramer of alpha and beta chains
subcellular location: mitochondrion matrix
secondary structure: 42% helical, 10% beta sheet
3D similarity: pyruvate dehydrogenase E1
ligands: chloride ion, glycerol, potassium ion, manganese (II) ion, (4S)-2-methyl-2,4-pentanediol, thiamin diphosphate

P10775, 2BNH:

name: ribonuclease inhibitor
gene: RNH1
organism: Sus scrofa (Pig)
sequence length: 456 AA
subcellular location: cytoplasm
sequence similarities: contains 15 LRR (leucine-rich) repeats
secondary structure: alternating helix and strand, 42% helical, 12% beta sheet

Q08209, 1AUI:

name: serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform
EC: 3.1.3.16
gene: PPP3CA
organism: Homo sapiens (Human)
sequence length: 521 AA
subunit structure: heterodimer of alpha and beta chain (human calcineurin heterodimer)
subcellular location: nucleus
secondary structure: 27% helical, 11% beta sheet
ligands: calcium ion, Fe (III) ion, zinc ion

Q9X0E6, 1KR4:

name: divalent-cation tolerance protein CutA
gene: cutA
organism: Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
sequence length: 101 AA
subunit structure: homotrimer
subcellular location: cytoplasm
secondary structure: large fraction of strands, 29% helical, 35% beta sheet

Discussion

In the first step, the ReProf results for P10775 were evaluated against the DSSP assignment. Here, DSSP was treated as "the truth", because it does not predict secondary structure but assigns it from the experimentally determined 3D structure, based on the hydrogen bonds and the geometry of the backbone.
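DSSP itself distinguishes eight states (H, G, I, E, B, T, S and coil), while the tables on this page use only three classes (H, E, L). The page does not state which reduction was applied, so the sketch below assumes one common convention (all helix types to H, strand and isolated bridge to E, everything else to L); the mapping table and function name are hypothetical.

```python
# Assumed mapping of the eight DSSP states to H/E/L. Another common choice
# maps only H -> H and E -> E and everything else to L.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helix
    "E": "E", "B": "E",            # extended strand, isolated beta bridge
}


def reduce_dssp(dssp_states: str) -> str:
    """Map a DSSP state string to H/E/L; all other states (T, S, blank, ...) become L."""
    return "".join(DSSP_TO_3.get(state, "L") for state in dssp_states)


print(reduce_dssp("HHGGTEEBS "))  # HHHHLEEELL
```

The choice of mapping changes the reference strings and can therefore shift recall and precision somewhat, so it should be kept fixed when comparing runs.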

The prediction of secondary structure is much better if a PSSM is used instead of the plain sequence. A PSSM describes the requirements at each position better than the amino acid sequence alone because it contains evolutionary information: for each position, it shows which alternative residues occur in related sequences, i.e. which substitutions are tolerated without altering the overall structure of the protein. The difference between using big_80 and SwissProt to generate the PSSM is less pronounced, but we chose big_80 for the remaining proteins because it performed slightly better in our test with the example protein P10775.
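To illustrate how a PSSM encodes tolerated substitutions, here is a strongly simplified sketch that builds log-odds scores from a toy multiple alignment. It assumes a uniform background frequency (1/20) and a flat pseudocount; PSI-BLAST, which generates the real PSSMs, additionally weights sequences and uses substitution-matrix-based pseudocounts. The function name and alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Return one {amino acid: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # simplification: uniform background
    matrix = []
    for col in range(length):
        # observed residues in this column; gaps and unknown letters are ignored
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


m = toy_pssm(["MKVL", "MRVL", "MKIL", "MKVF"])
# Column 2: K is conserved, R is a tolerated alternative, D is never observed.
print(round(m[1]["K"], 2), round(m[1]["R"], 2), round(m[1]["D"], 2))  # 1.74 0.74 -0.26
```

The tolerated substitution K to R still receives a positive score, while residues never seen at that position are penalised; a plain sequence carries none of this information.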

For all proteins, these ReProf results were compared to PsiPred and DSSP to see how well the methods agree in their secondary structure assignments. The methods agree for most residues. The largest discrepancies occur for beta strands, which suggests that these are harder to predict than alpha helices or loops. One reason could be that beta strands are less frequent than alpha helices in most proteins, as the information taken from UniProt and PDB indicates: three of the four proteins contain clearly more helix than strand.
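The helix and strand fractions quoted from UniProt and PDB can be compared with the class composition of a DSSP-derived or predicted H/E/L string. A minimal sketch with a hypothetical helper name and a toy string; note that UniProt/PDB may count states slightly differently than a reduced 3-state string.

```python
def composition(states: str) -> dict:
    """Fraction of residues in each of the three classes H, E and L."""
    if not states:
        raise ValueError("empty state string")
    return {s: states.count(s) / len(states) for s in "HEL"}


print(composition("HHHHLLEEEL"))  # {'H': 0.4, 'E': 0.3, 'L': 0.3}
```

A small E fraction means that few strand residues are available, both in the training data of a predictor and in the evaluation, which is one plausible reason for the lower and noisier E scores.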
