# Difference between revisions of "Task 3 (MSUD)"

## Revision as of 14:46, 16 May 2013

## Secondary structure

### Result

The results for ReProf and PsiPred predictions and the DSSP assignments are in the following folders:

```
/mnt/home/student/schillerl/MasterPractical/task3/reprof/
```

```
/mnt/home/student/schillerl/MasterPractical/task3/psipred/
```

```
/mnt/home/student/schillerl/MasterPractical/task3/dssp/
```

For P10775, ReProf was run with the protein sequence FASTA file and with position-specific scoring matrices (PSSMs) derived from big_80 and SwissProt (see `/mnt/home/student/schillerl/MasterPractical/task3/pssm/`) as input. The following tables compare the prediction results to the secondary structure assignment of DSSP. The f-measure is the harmonic mean of recall and precision and gives a good indication of the quality of a classifier.

ReProf with the sequence only:

secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.719 | 0.585 | 0.645 |
E | 0.211 | 0.500 | 0.296 |
L | 0.616 | 0.654 | 0.635 |

ReProf with the PSSM derived from big_80:

secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.944 | 0.889 | 0.916 |
E | 0.649 | 0.685 | 0.667 |
L | 0.826 | 0.866 | 0.846 |

ReProf with the PSSM derived from SwissProt:

secondary structure element | recall | precision | f-measure |
---|---|---|---|
H | 0.923 | 0.914 | 0.919 |
E | 0.807 | 0.523 | 0.634 |
L | 0.719 | 0.859 | 0.782 |
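The per-class values in the tables above can be reproduced from a predicted and a reference state string. The following is a minimal sketch, not the script actually used here: the function name `per_class_metrics` is hypothetical, and it assumes both strings are already aligned residue by residue and reduced to the three-state alphabet H, E, L (DSSP's finer-grained states would have to be mapped first). It uses the usual definitions recall = TP / (TP + FN), precision = TP / (TP + FP) and f-measure = 2 * recall * precision / (recall + precision).

```python
from collections import Counter


def per_class_metrics(predicted, reference, classes="HEL"):
    """Return recall, precision and f-measure for each state in `classes`.

    Assumption: `predicted` and `reference` are equal-length strings over
    the three-state alphabet H/E/L, aligned position by position.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")

    # Count every (predicted state, reference state) pair once.
    pairs = Counter(zip(predicted, reference))

    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (p, r), n in pairs.items() if p == c and r != c)
        fn = sum(n for (p, r), n in pairs.items() if p != c and r == c)

        # Guard against empty denominators (state never predicted/observed).
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        if recall + precision:
            f_measure = 2 * recall * precision / (recall + precision)
        else:
            f_measure = 0.0

        metrics[c] = {
            "recall": recall,
            "precision": precision,
            "f_measure": f_measure,
        }
    return metrics


if __name__ == "__main__":
    # Toy strings for illustration only, not data from P10775.
    for state, values in per_class_metrics("HHHHEELL", "HHHEEELL").items():
        print(state, values)
```

Counting the (predicted, reference) pairs once and deriving TP, FP and FN from that table keeps the three states consistent with each other.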

Predictions that use a PSSM instead of the plain sequence are of considerably better quality. All methods predict helices (H) better than loops (L), and loops better than beta sheets (E). The run with the big_80 PSSM gives better results for E and L, and only slightly worse results for H, than the run with the SwissProt PSSM.

The percentages of correctly identified secondary structure states (H, E or L) for the three methods (sequence only, big_80 PSSM and SwissProt PSSM) are 61 %, 86 % and 82 %, respectively. For the remaining sequences, the best-performing method (ReProf with the PSSM derived from big_80 as input) is therefore used.
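The overall percentage of correctly identified states (often called Q3 accuracy) is the fraction of positions where prediction and reference agree. A minimal sketch, assuming aligned, equal-length three-state strings; the function name `q3_accuracy` is hypothetical and not part of ReProf or DSSP:

```python
def q3_accuracy(predicted, reference):
    """Return the fraction of positions where both state strings agree.

    Assumption: both inputs are equal-length strings over H/E/L.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if not reference:
        raise ValueError("sequences must not be empty")

    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from P10775.
    print(f"{100 * q3_accuracy('HHHHEELL', 'HHHEEELL'):.1f} %")
```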