Task 3 (MSUD)
Secondary structure
Result
The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:
/mnt/home/student/schillerl/MasterPractical/task3/reprof/
/mnt/home/student/schillerl/MasterPractical/task3/psipred/
/mnt/home/student/schillerl/MasterPractical/task3/dssp/
For P10775, ReProf was run with three different inputs: the protein sequence FASTA file alone, a position-specific scoring matrix (PSSM) derived from big_80, and a PSSM derived from SwissProt (for the PSSMs see /mnt/home/student/schillerl/MasterPractical/task3/pssm/). The following tables compare the prediction results with the secondary structure assignment of DSSP, in the order sequence only, big_80 PSSM, SwissProt PSSM. The f-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.719 | 0.585 | 0.645 |
E | 0.211 | 0.500 | 0.296 |
L | 0.616 | 0.654 | 0.635 |
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.944 | 0.889 | 0.916 |
E | 0.649 | 0.685 | 0.667 |
L | 0.826 | 0.866 | 0.846 |
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.923 | 0.914 | 0.919 |
E | 0.807 | 0.523 | 0.634 |
L | 0.719 | 0.859 | 0.782 |
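The f-measure column can be checked by recomputing the harmonic mean from the recall and precision values. The snippet below is a minimal sketch. The function name `f_measure` is an illustrative choice and not part of ReProf, PsiPred or DSSP.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# H row of the first table: recall 0.719, precision 0.585
print(round(f_measure(0.719, 0.585), 3))  # 0.645
```

Because the tabulated recall and precision values are already rounded to three decimals, recomputing some rows this way can differ from the table in the last digit.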
Predictions using a PSSM instead of the plain sequence are of considerably better quality. Judged by the f-measure, all three runs predict helices better than loops, and loops better than beta sheets. The results of the run with the big_80 PSSM are better for E and L and only slightly worse for H than those of the run with the SwissProt PSSM.
The percentages of correctly identified secondary structure states (H, E or L) for the three runs are 61 %, 86 % and 82 %, respectively. For the remaining sequences, the setup with the best performance (ReProf with the PSSM derived from big_80 as input) is therefore used.
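The per-class recall and precision and the overall percentage of correctly identified states can be derived from two aligned strings, one predicted and one assigned by DSSP. The following is only a sketch under stated assumptions: both strings are assumed to be already reduced to the three states H, E and L and to have equal length, and the function name `evaluate` and the toy strings are hypothetical, not data from this task.

```python
from collections import Counter


def evaluate(predicted: str, observed: str, classes: str = "HEL"):
    """Return (fraction correct, {state: (recall, precision)})."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")

    # Count every (predicted state, observed state) pair per residue.
    pairs = Counter(zip(predicted, observed))

    correct = sum(n for (p, o), n in pairs.items() if p == o)
    fraction_correct = correct / len(observed)

    per_class = {}
    for c in classes:
        true_pos = pairs[(c, c)]
        n_pred = sum(n for (p, _), n in pairs.items() if p == c)
        n_obs = sum(n for (_, o), n in pairs.items() if o == c)
        recall = true_pos / n_obs if n_obs else 0.0
        precision = true_pos / n_pred if n_pred else 0.0
        per_class[c] = (recall, precision)
    return fraction_correct, per_class


# Toy example (hypothetical strings, not the P10775 data)
q3, stats = evaluate("HHHHLLEEEL", "HHHLLLEELL")
print(q3)          # 0.8
print(stats["H"])  # (1.0, 0.75)
```

Counting the state pairs once in a `Counter` keeps the per-class sums simple and avoids rescanning the strings for every state.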