Difference between revisions of "Task 3 (MSUD)"
Revision as of 00:08, 20 May 2013
Secondary structure
Result
The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:
/mnt/home/student/schillerl/MasterPractical/task3/reprof/
/mnt/home/student/schillerl/MasterPractical/task3/psipred/
/mnt/home/student/schillerl/MasterPractical/task3/dssp/
The position-specific scoring matrices (PSSMs) used as input for ReProf are located at:
/mnt/home/student/schillerl/MasterPractical/task3/pssm/
Approach for predicting secondary structure with ReProf
For P10775, ReProf was run with the protein sequence FASTA file alone and with position-specific scoring matrices (PSSMs) derived from big_80 and SwissProt as input. The following tables compare the prediction results with the secondary structure assignment of DSSP. The F-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.
ReProf with sequence only (no PSSM)
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.719 | 0.585 | 0.645 |
E | 0.211 | 0.500 | 0.296 |
L | 0.616 | 0.654 | 0.635 |
ReProf with PSSM derived from big_80
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.944 | 0.889 | 0.916 |
E | 0.649 | 0.685 | 0.667 |
L | 0.826 | 0.866 | 0.846 |
ReProf with PSSM derived from SwissProt
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.923 | 0.914 | 0.919 |
E | 0.807 | 0.523 | 0.634 |
L | 0.719 | 0.859 | 0.782 |
Predictions using a PSSM instead of the plain sequence are of considerably better quality. All three runs predict helices better than loops, and loops better than beta strands. The results of the run with the big_80 PSSM are better for E and L and only slightly worse for H than those using the SwissProt PSSM.
The percentages of correctly identified secondary structure (H, E or L) for the three runs are 61%, 86% and 82%. For the remaining sequences, the best-performing setup (ReProf with a PSSM derived from big_80 as input) was therefore used.
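The per-state measures in the tables above can be reproduced along the following lines. This is a minimal sketch, not the script used for this task: it assumes the prediction and the DSSP assignment are already available as aligned three-state strings (H, E, L), and the function names `per_state_scores` and `q3` as well as the toy strings are hypothetical.

```python
from collections import Counter


def per_state_scores(predicted, reference, states="HEL"):
    """Return {state: (recall, precision, f_measure)} for two aligned strings."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    n_pred = Counter(predicted)   # how often each state was predicted
    n_ref = Counter(reference)    # how often each state occurs in the reference
    true_pos = Counter(p for p, r in zip(predicted, reference) if p == r)
    scores = {}
    for state in states:
        recall = true_pos[state] / n_ref[state] if n_ref[state] else 0.0
        precision = true_pos[state] / n_pred[state] if n_pred[state] else 0.0
        total = recall + precision
        # F-measure: harmonic mean of recall and precision
        f_measure = 2 * recall * precision / total if total else 0.0
        scores[state] = (recall, precision, f_measure)
    return scores


def q3(predicted, reference):
    """Fraction of residues whose state is identical in both strings."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Toy data (hypothetical, not taken from P10775)
reference = "HHHHLLEEEELL"
predicted = "HHHLLLEEELLH"
scores = per_state_scores(predicted, reference)
overall = q3(predicted, reference)
```

For the toy strings, 9 of 12 residues agree, so `overall` is 0.75; the E state has recall 0.75 and precision 1.0.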
Comparison of ReProf to PsiPred and DSSP
The following tables show the percentages of agreement for secondary structure between ReProf and PsiPred or DSSP.
P12694
secondary structure element | PsiPred | DSSP |
---|---|---|
H | 0.804 | 0.812 |
E | 0.400 | 0.585 |
L | 0.876 | 0.782 |
all | 0.849 | 0.816 |
P10775
secondary structure element | PsiPred | DSSP |
---|---|---|
H | 0.798 | 0.889 |
E | 0.691 | 0.649 |
L | 0.779 | 0.828 |
all | 0.849 | 0.855 |
Q08209
secondary structure element | PsiPred | DSSP |
---|---|---|
H | 0.794 | 0.816 |
E | 0.487 | 0.615 |
L | 0.830 | 0.807 |
all | 0.827 | 0.807 |
Q9X0E6
secondary structure element | PsiPred | DSSP |
---|---|---|
H | 0.897 | 0.923 |
E | 0.694 | 0.643 |
L | 0.636 | 0.545 |
all | 0.802 | 0.802 |
Altogether, ReProf agrees in 80-85% of the predictions with PsiPred and DSSP. In most cases the agreement for H and L is higher than for E.
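A comparison of this kind could be computed as sketched below. Two assumptions are made that the text does not confirm: the eight DSSP codes are reduced to three states with the common mapping H, G, I to H and E, B to E (everything else to L), and the per-state agreement is counted over the residues to which ReProf assigns that state. The names `reduce_dssp` and `agreement` and the toy strings are hypothetical.

```python
# Assumed reduction of DSSP codes to three states; all other codes become L.
DSSP_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp):
    """Map a DSSP string to the three states H, E and L."""
    return "".join(DSSP_TO_THREE.get(code, "L") for code in dssp)


def agreement(reprof, other, states="HEL"):
    """Fraction of agreeing positions, per ReProf state and overall."""
    if len(reprof) != len(other):
        raise ValueError("sequences must have the same length")
    result = {}
    for state in states:
        positions = [i for i, s in enumerate(reprof) if s == state]
        # None if ReProf never predicts this state
        result[state] = (
            sum(other[i] == state for i in positions) / len(positions)
            if positions else None
        )
    result["all"] = sum(a == b for a, b in zip(reprof, other)) / len(reprof)
    return result


# Toy data (hypothetical)
reprof_states = "HHHHEELLLL"
dssp_states = reduce_dssp("HHGTEE-STH")
table = agreement(reprof_states, dssp_states)
```

With the toy strings, the reduced DSSP string is `HHHLEELLLH` and the overall agreement is 0.8.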
Information from UniProt and PDB
A summary of interesting features for the proteins:
P12694, 2BFD:
- name: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial
- EC: 1.2.4.4
- gene: BCKDHA
- organism: Homo sapiens (Human)
- sequence length: 445 AA
- subunit structure: heterotetramer of alpha and beta chains
- subcellular location: mitochondrion matrix
- secondary structure: 42% helical, 10% beta sheet
- 3D similarity: pyruvate dehydrogenase E1
- ligands: chloride ion, glycerol, potassium ion, manganese (II) ion, (4S)-2-methyl-2,4-pentanediol, thiamin diphosphate
P10775, 2BNH:
- name: ribonuclease inhibitor
- gene: RNH1
- organism: Sus scrofa (Pig)
- sequence length: 456 AA
- subcellular location: cytoplasm
- sequence similarities: contains 15 LRR (leucine-rich) repeats
- secondary structure: alternating helix and strand, 42% helical, 12% beta sheet
Q08209, 1AUI:
- name: serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform
- EC: 3.1.3.16
- gene: PPP3CA
- organism: Homo sapiens (Human)
- sequence length: 521 AA
- subunit structure: heterodimer of alpha and beta chain (human calcineurin heterodimer)
- subcellular location: nucleus
- secondary structure: 27% helical, 11% beta sheet
- ligands: calcium ion, Fe (III) ion, zinc ion
Q9X0E6, 1KR4:
- name: divalent-cation tolerance protein CutA
- gene: cutA
- organism: Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)
- sequence length: 101 AA
- subunit structure: homotrimer
- subcellular location: cytoplasm
- secondary structure: great fraction of strands, 29% helical, 35% beta sheet
Discussion
In the first step, the ReProf results for P10775 were evaluated against the DSSP assignment. Here, DSSP was viewed as "the truth", because it assigns secondary structure based on the measured 3D structure by examining angles and H bonds between atoms.
The prediction of secondary structure is much better if a PSSM is used instead of the sequence alone. The reason is that a PSSM describes the requirements at each position better than the amino acid sequence does, because it contains evolutionary information: for each position it identifies alternative residues that do not alter the overall structure of the protein. The difference between using big_80 or SwissProt to generate the PSSM is less pronounced, but we decided to use big_80 for the remaining proteins because it performed slightly better in our test with the example protein P10775.
For all proteins, these ReProf results were compared to PsiPred and DSSP to see how far the methods agree in their secondary structure assignment. The methods agree for most residues. The largest discrepancies are observed for beta strands, which suggests that these are harder to predict than alpha helices or loops. One reason could be that beta strands are less frequent than alpha helices in most proteins, as can be seen in the information taken from UniProt and PDB.
Disordered protein
Result
IUPred
IUPred predicts intrinsically disordered regions of proteins by estimating the pairwise energy content, based on observations from proteins with known structures. The results of IUPred fall into three categories: long (long disordered regions), short (short disordered regions) and globular (regions of the protein where favourable pairwise interactions between residues are more likely).
For the prediction of disordered regions (prediction modes long and short), a position-specific score is assigned to each residue of the query sequence. The score indicates the tendency of the corresponding residue to belong to a disordered region. The following profiles of disorder tendency were generated with the IUPred web tool:
BCKDHA
P10775
Q9X0E6
Q08209
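The per-residue scores described above can be turned into a list of disordered segments. The sketch below assumes the scores have already been parsed into a list of floats and uses a cutoff of 0.5, the threshold commonly applied to IUPred scores. The function name `disordered_segments`, the `min_length` parameter and the toy profile are hypothetical.

```python
def disordered_segments(scores, cutoff=0.5, min_length=1):
    """Return 1-based inclusive (start, end) ranges with score > cutoff."""
    segments = []
    start = None  # start position of the segment currently being extended
    for position, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = position
        elif start is not None:
            # the segment ended at the previous residue
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # close a segment that runs to the end of the sequence
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


# Toy profile (hypothetical values, not an actual IUPred output)
profile = [0.9, 0.8, 0.3, 0.2, 0.6, 0.7, 0.75, 0.1]
all_segments = disordered_segments(profile)
long_segments = disordered_segments(profile, min_length=3)
```

For the toy profile this yields the segments (1, 2) and (5, 7); with `min_length=3` only (5, 7) remains.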
Statistics
Discussion
Transmembrane helices
Result
Discussion
Signal peptides
Result
SignalP predictions
The following diagrams show the C- (cleavage site), S- (signal peptide) and Y- (combined cleavage site) scores for the three proteins according to SignalP version 4.1. The combined cleavage site score combines the cleavage score with the slope of the signal peptide score to optimize the recognition of cleavage sites.
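How the combination of cleavage score and slope might work can be sketched as follows. The sketch assumes the form given in the original SignalP publication, the geometric mean of the C-score and a windowed difference of the S-score before and after each position. It is not SignalP's own code; the window size `d`, the clamping of negative slopes to zero and the toy scores are assumptions made for illustration.

```python
import math


def y_scores(c_scores, s_scores, d=3):
    """Combine C-scores with the local drop of the S-score (sketch)."""
    if len(c_scores) != len(s_scores):
        raise ValueError("score lists must have the same length")
    combined = []
    for i, c in enumerate(c_scores):
        before = s_scores[max(0, i - d):i]   # up to d positions before i
        after = s_scores[i:i + d]            # position i and the following ones
        slope = (sum(before) - sum(after)) / d
        # a rising S-score cannot indicate a cleavage site, so clamp at zero
        combined.append(math.sqrt(c * max(slope, 0.0)))
    return combined


# Toy scores (hypothetical): the S-score drops after five residues,
# and the C-score peaks at index 5.
s = [0.9] * 5 + [0.1] * 5
c = [0.0] * 5 + [0.8] + [0.0] * 4
y = y_scores(c, s)
best = max(range(len(y)), key=y.__getitem__)
```

In the toy example the combined score peaks at index 5, where a high C-score coincides with the steepest drop of the S-score.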
For P02768 and P11279, signal peptides are predicted, with the cleavage site between positions 18 and 19 for P02768 and between positions 28 and 29 for P11279. For P47863, no signal peptide is predicted.
SignalP version 3.0 (results in /mnt/home/student/schillerl/MasterPractical/task3/signalp/) came to the same result for P02768 and P11279. For P47863, however, it predicts a signal peptide with a cleavage site between positions 54 and 55 (neural network) or 56 and 57 (hidden Markov model), although only with a probability of 0.723, compared to nearly 1 for the other two proteins.
Known signal peptides
On the Signal Peptide Website there are entries for P02768 and P11279 but not for P47863:
Accession Number | Entry Name | Protein Name | Organism | Length | Status | Signal Sequence |
---|---|---|---|---|---|---|
P02768 | ALBU_HUMAN | Serum albumin | Homo sapiens | 18 | confirmed | MKWVTFISLLFLFSSAYS |
P11279 | LAMP1_HUMAN | Lysosome-associated membrane glycoprotein 1 | Homo sapiens | 28 | confirmed | MAAPGSARRPLLLLLLLLLLGLMHCASA |
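The SignalP 4.1 predictions can be checked against the table above with a few lines of code. The sketch below uses the cleavage sites and signal peptide lengths reported on this page; the dictionary layout and the function name `consistent` are hypothetical.

```python
# SignalP 4.1 cleavage sites as (last signal residue, first mature residue);
# None means that no signal peptide was predicted.
predicted_cleavage = {
    "P02768": (18, 19),
    "P11279": (28, 29),
    "P47863": None,
}

# Lengths of the confirmed signal peptides; P47863 has no entry.
known_signal_length = {"P02768": 18, "P11279": 28}


def consistent(accession):
    """True if the prediction matches the known signal peptide (or its absence)."""
    site = predicted_cleavage.get(accession)
    known = known_signal_length.get(accession)
    if site is None or known is None:
        return site is None and known is None
    # the signal peptide ends at the residue before the cleavage site
    return site[0] == known


report = {acc: consistent(acc) for acc in predicted_cleavage}
```

All three entries of `report` are `True`: a cleavage site between positions 18 and 19 corresponds to a signal peptide of length 18, and likewise for 28 and 29.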
Discussion
The predictions of the newest version of SignalP agree with the confirmed signal peptides. The older version predicted a signal peptide in P47863, where there is none. According to UniProt, P47863 has transmembrane helices, which the old version might mistake for a signal peptide because the two resemble each other.