Canavan Disease: Task 03 - Sequence-based Predictions
Secondary Structure
To decide which prediction approach to follow, we examined the proposed run combinations for ReProf: prediction from the FASTA sequence alone vs. prediction from a position-specific scoring matrix (PSSM) generated by PSI-BLAST. The PSSM-based ReProf runs were further divided into PSSMs generated against big_80 and PSSMs generated against SwissProt. For comparison, a secondary structure prediction with PSI-Pred and a secondary structure assignment with DSSP were also obtained. Since DSSP assigns secondary structure from the atom coordinates stored in the PDB, we treat the DSSP assignment as the "true" secondary structure and use it as the reference against which the prediction methods are evaluated. The results for Aspartoacylase (P45381|ACY2_HUMAN) are shown in Table 1.

PSI-Pred achieved the highest overall value (0.815), but running it via the official web server takes considerably longer than running ReProf locally in the student lab, so we decided to continue with ReProf. Among the ReProf runs, the PSSM derived from big_80 gave the highest overall value (0.782, vs. 0.777 for SwissProt and 0.689 for the FASTA sequence alone), so this variant was chosen (PSSM created with PSI-BLAST, E-value cut-off 1e-10, 3 iterations). Out of curiosity, PSI-Pred predictions were nevertheless run for the remaining proteins as well. Precision, recall and F-measure were calculated in the same way as for choosing the prediction method. An overview of the prediction statistics, with the DSSP assignment as reference, is given in Table 2.
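For reference, the usual per-class definitions behind such statistics are given below, assuming residues are counted as true positives (TP), false positives (FP) and false negatives (FN) separately for each state c in {H, E, L}, with N the number of residues compared. The page itself does not spell out the formulas, so this is a sketch of the standard definitions rather than a record of the exact calculation used.

```latex
\begin{align}
P_c &= \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_c = \frac{2\,P_c R_c}{P_c + R_c}, \qquad c \in \{H, E, L\} \\
P_{\text{all}} &= \frac{\sum_c TP_c}{\sum_c (TP_c + FP_c)}
 = \frac{\sum_c TP_c}{N}
 = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)} = R_{\text{all}} = F_{\text{all}} = Q_3
\end{align}
```

Every misassigned residue counts once as a false positive (for its predicted state) and once as a false negative (for its reference state), so pooling over the three states gives the same denominator N for precision and recall. The pooled ("micro-averaged") values therefore all equal Q3, the fraction of correctly predicted residues. This is consistent with the identical values in the "all" columns, although the page does not state how those were computed.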
<figtable id="ACY_2 statistics">
Table 1: Secondary structure prediction statistics for ACY_2 (P45381), with the DSSP assignment as reference (H = helix, E = strand, L = loop).

Method | Precision H | Precision E | Precision L | Precision all | Recall H | Recall E | Recall L | Recall all | F-measure H | F-measure E | F-measure L | F-measure all
---|---|---|---|---|---|---|---|---|---|---|---|---
ReProf (FASTA) | 0.773 | 0.822 | 0.562 | 0.689 | 0.829 | 0.446 | 0.808 | 0.689 | 0.800 | 0.578 | 0.663 | 0.689
ReProf (big_80) | 0.878 | 0.889 | 0.644 | 0.782 | 0.793 | 0.675 | 0.890 | 0.782 | 0.833 | 0.767 | 0.747 | 0.782
ReProf (SwissProt) | 0.853 | 0.937 | 0.620 | 0.777 | 0.780 | 0.711 | 0.849 | 0.777 | 0.815 | 0.809 | 0.717 | 0.777
PSI-Pred | 0.914 | 0.970 | 0.647 | 0.815 | 0.780 | 0.771 | 0.904 | 0.815 | 0.842 | 0.859 | 0.754 | 0.815
</figtable>
<figtable id="Add Sec Struc statistics">
Table 2: Secondary structure prediction statistics for P10775, Q08209 and Q9X0E6 (ReProf with big_80 PSSM and PSI-Pred), with the DSSP assignment as reference (H = helix, E = strand, L = loop).

Protein | Method | Precision H | Precision E | Precision L | Precision all | Recall H | Recall E | Recall L | Recall all | F-measure H | F-measure E | F-measure L | F-measure all
---|---|---|---|---|---|---|---|---|---|---|---|---|---
P10775 | ReProf | 0.974 | 0.959 | 0.793 | 0.922 | 0.945 | 0.855 | 0.912 | 0.922 | 0.959 | 0.904 | 0.848 | 0.922
P10775 | PSI-Pred | 0.976 | 0.980 | 0.630 | 0.853 | 0.814 | 0.873 | 0.938 | 0.853 | 0.888 | 0.923 | 0.754 | 0.853
Q08209 | ReProf | 0.957 | 0.842 | 0.658 | 0.812 | 0.780 | 0.787 | 0.878 | 0.812 | 0.859 | 0.814 | 0.752 | 0.812
Q08209 | PSI-Pred | 0.895 | 0.971 | 0.594 | 0.757 | 0.723 | 0.557 | 0.944 | 0.757 | 0.800 | 0.708 | 0.729 | 0.757
Q9X0E6 | ReProf | 0.973 | 0.971 | 0.526 | 0.879 | 0.947 | 0.829 | 0.833 | 0.879 | 0.960 | 0.894 | 0.645 | 0.879
Q9X0E6 | PSI-Pred | 1.000 | 1.000 | 0.600 | 0.912 | 0.947 | 0.854 | 1.000 | 0.912 | 0.973 | 0.921 | 0.750 | 0.912
</figtable>
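The per-class values in the tables above can be computed from a predicted and a reference secondary structure string. The following is a minimal sketch, assuming both strings are already aligned residue by residue and reduced to the three states H, E and L (how the DSSP states were reduced is not described on this page). The function names `f_measure` and `ss_statistics` are hypothetical and not part of ReProf, PSI-Pred or DSSP, and the "all" entry assumes a micro-average, which the identical "all" values in the tables suggest but the page does not confirm.

```python
# Minimal sketch: per-class and overall precision, recall and F-measure for
# aligned 3-state secondary structure strings. Function names are hypothetical.

STATES = ("H", "E", "L")  # helix, strand, loop/other


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ss_statistics(predicted: str, reference: str) -> dict[str, tuple[float, float, float]]:
    """Return {state: (precision, recall, F-measure)} plus an 'all' entry.

    Assumes both strings are aligned residue by residue and use only H/E/L;
    the reference would be the DSSP assignment reduced to three states.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("strings must be non-empty and of equal length")
    pairs = list(zip(predicted, reference))
    stats = {}
    for state in STATES:
        tp = sum(1 for p, r in pairs if p == state and r == state)
        n_pred = sum(1 for p, _ in pairs if p == state)  # TP + FP
        n_ref = sum(1 for _, r in pairs if r == state)   # TP + FN
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_ref if n_ref else 0.0
        stats[state] = (precision, recall, f_measure(precision, recall))
    # Assumed micro-average: each residue has exactly one predicted and one
    # reference label, so pooled precision, recall and F-measure all equal
    # the fraction of correctly predicted residues (Q3).
    q3 = sum(1 for p, r in pairs if p == r) / len(pairs)
    stats["all"] = (q3, q3, q3)
    return stats


if __name__ == "__main__":
    # Toy example with made-up strings (not data from this project).
    pred = "LHHHLEEL"
    dssp = "LHHHHEEE"
    for state, (p, r, f) in ss_statistics(pred, dssp).items():
        print(f"{state:>3}  P={p:.3f}  R={r:.3f}  F={f:.3f}")
    # Cross-check against Table 1, ReProf (FASTA), helix: P=0.773, R=0.829.
    print(round(f_measure(0.773, 0.829), 3))  # 0.8
```

The cross-check reproduces the helix F-measure of 0.800 reported for ReProf (FASTA) in Table 1, which suggests the F-measure column is the plain harmonic mean of the precision and recall columns.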