Task 3 (MSUD)
Revision as of 13:46, 16 May 2013
Secondary structure
Result
The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:
/mnt/home/student/schillerl/MasterPractical/task3/reprof/
/mnt/home/student/schillerl/MasterPractical/task3/psipred/
/mnt/home/student/schillerl/MasterPractical/task3/dssp/
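For P10775, ReProf was run three times with different inputs: the protein sequence FASTA file, a position-specific scoring matrix (PSSM) derived from big_80, and a PSSM derived from SwissProt (see /mnt/home/student/schillerl/MasterPractical/task3/pssm/). The following tables compare each prediction with the secondary structure assignment of DSSP. Recall and precision are defined as recall = TP / (TP + FN) and precision = TP / (TP + FP), where TP, FP and FN are the numbers of true positives, false positives and false negatives. The f-measure is the harmonic mean of recall and precision, f-measure = 2 * recall * precision / (recall + precision). It is a good single indicator of a classifier's quality.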
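The per-class values in the tables can be reproduced from a prediction and the DSSP assignment. The following is a minimal sketch, not the script actually used for this task. It assumes both have already been reduced to equally long strings over the three states H, E and L and aligned residue by residue. The function name `per_class_scores` is hypothetical, and the example strings are made up, not data from P10775.

```python
STATES = ("H", "E", "L")


def per_class_scores(predicted: str, observed: str) -> dict:
    """Return {state: (recall, precision, f_measure)} for H, E and L.

    Assumption: `predicted` (e.g. ReProf) and `observed` (DSSP) are
    equally long H/E/L strings aligned residue by residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction and DSSP assignment must have the same length")
    scores = {}
    for s in STATES:
        pairs = list(zip(predicted, observed))
        tp = sum(1 for p, o in pairs if p == s and o == s)  # predicted s, DSSP says s
        fp = sum(1 for p, o in pairs if p == s and o != s)  # predicted s, DSSP says other
        fn = sum(1 for p, o in pairs if p != s and o == s)  # missed an s residue
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        # harmonic mean of recall and precision
        f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        scores[s] = (recall, precision, f)
    return scores


if __name__ == "__main__":
    # Toy strings for illustration only
    for state, (r, p, f) in per_class_scores("HHHLLEEL", "HHLLLEEE").items():
        print(f"{state}: recall={r:.3f} precision={p:.3f} f-measure={f:.3f}")
```

For the toy input, H gets recall 1.000, precision 0.667 and f-measure 0.800. Classes that never occur are scored 0.0 rather than raising a division error.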
ReProf with the protein sequence (FASTA file) as input:
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.719 | 0.585 | 0.645 |
E | 0.211 | 0.500 | 0.296 |
L | 0.616 | 0.654 | 0.635 |
ReProf with the PSSM derived from big_80 as input:
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.944 | 0.889 | 0.916 |
E | 0.649 | 0.685 | 0.667 |
L | 0.826 | 0.866 | 0.846 |
ReProf with the PSSM derived from SwissProt as input:
secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.923 | 0.914 | 0.919 |
E | 0.807 | 0.523 | 0.634 |
L | 0.719 | 0.859 | 0.782 |
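As a worked check of the f-measure formula, consider the helix row of the second table (recall 0.944, precision 0.889). The second line repeats the calculation for the E row of the third table (recall 0.807, precision 0.523). It shows that the harmonic mean lies closer to the smaller of the two values than the arithmetic mean (0.665) does:

$$
F_H = \frac{2 \cdot 0.944 \cdot 0.889}{0.944 + 0.889} = \frac{1.678}{1.833} \approx 0.916,
\qquad
F_E = \frac{2 \cdot 0.807 \cdot 0.523}{0.807 + 0.523} = \frac{0.844}{1.330} \approx 0.634
$$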
Predictions using a PSSM instead of the plain sequence are considerably better. Judged by the f-measure, all three runs predict helices (H) better than loops (L), and loops better than beta strands (E). The run with the big_80 PSSM is better than the run with the SwissProt PSSM for E and L, and only slightly worse for H.
The percentages of residues whose secondary structure state (H, E or L) is predicted correctly (Q3) are 61 %, 86 % and 82 % for the runs with the plain sequence, the big_80 PSSM and the SwissProt PSSM, respectively. Therefore, the best-performing method, ReProf with the PSSM derived from big_80 as input, is used for the remaining sequences.
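The overall accuracy and the choice of method can be sketched in a few lines. This assumes the same aligned H/E/L strings as for the per-class scores; `q3` is a hypothetical helper name. The run labels in `reported_q3` are made-up identifiers, while the percentages are the values reported above for P10775.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted H/E/L state matches the DSSP state.

    Assumption: both strings are non-empty, equally long and aligned.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need non-empty, equally long state strings")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Q3 values (in %) reported for P10775; the labels are illustrative only
reported_q3 = {
    "ReProf, FASTA sequence": 61,
    "ReProf, big_80 PSSM": 86,
    "ReProf, SwissProt PSSM": 82,
}

# Pick the run with the highest overall accuracy
best = max(reported_q3, key=reported_q3.get)

if __name__ == "__main__":
    print(q3("HHHLLEEL", "HHLLLEEE"))  # toy strings, not real data
    print(best)
```

Q3 only counts matching residues, so it can hide a weak class such as E. That is why the per-class f-measures are worth checking alongside it.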