Canavan Disease: Task 03 - Sequence-based Predictions

From Bioinformatikpedia
Revision as of 21:23, 25 May 2013 by Mahlich (talk | contribs)

Secondary Structure

To decide which approach to follow, we examined the proposed run combinations for ReProf: prediction from the FASTA sequence alone versus prediction from a PSSM generated by PSI-Blast. The PSSM-based ReProf runs were further divided into PSSMs generated against big_80 and PSSMs generated against SwissProt. For further comparison, we also ran a secondary structure prediction with PSI-Pred and a secondary structure assignment with DSSP. Because DSSP assigns the secondary structure from the atom coordinates stored in the PDB, we assume that the DSSP assignment can serve as the "true secondary structure", and we measure the performance of the prediction methods against it as reference. The results for Aspartoacylase (P45381|ACY2_HUMAN) are shown in Table 1.

Because PSI-Pred predictions run via the official webserver take much longer than ReProf runs executed locally in the student lab, we decided to continue with ReProf, specifically ReProf with a position-specific scoring matrix derived from big_80 (PSSM created with PSI-Blast, cut-off e-10 and 3 iterations). Out of curiosity, we nevertheless also ran PSI-Pred predictions for the remaining proteins. Precision, recall and F-measure were calculated in the same manner as for the choice of the preferred prediction method. An overview of the prediction statistics, with the DSSP assignment as reference, is given in Table 2.
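The per-state comparison against DSSP described above can be sketched as follows. This is a minimal illustration, not the script used for this task: the function name `per_state_scores` and the toy strings are hypothetical, and it assumes that both the prediction and the DSSP assignment have already been reduced to the three states H, E and L and aligned to the same residues (the reduction of DSSP states is not described on this page).

```python
def per_state_scores(predicted, reference, states="HEL"):
    """Return {state: (precision, recall, f_measure)} for two equal-length
    three-state strings (H = helix, E = strand, L = loop).

    `reference` plays the role of the DSSP assignment, `predicted` the
    role of a ReProf or PSI-Pred prediction.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    scores = {}
    for state in states:
        pairs = list(zip(predicted, reference))
        tp = sum(1 for p, r in pairs if p == state and r == state)
        fp = sum(1 for p, r in pairs if p == state and r != state)
        fn = sum(1 for p, r in pairs if p != state and r == state)

        # Guard against division by zero when a state never occurs.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Balanced F-measure: harmonic mean of precision and recall.
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        scores[state] = (precision, recall, f_measure)
    return scores


if __name__ == "__main__":
    # Made-up toy strings for illustration only, not real data.
    toy_prediction = "HHHHLLEEEL"
    toy_reference = "HHHLLLEEEE"
    for state, (p, r, f) in per_state_scores(toy_prediction, toy_reference).items():
        print(f"{state}: precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
    # H: precision=0.750 recall=1.000 f_measure=0.857
    # E: precision=1.000 recall=0.750 f_measure=0.857
    # L: precision=0.667 recall=0.667 f_measure=0.667
```

Each state is scored one-versus-rest, which is why every method gets three precision, three recall and three F-measure values in the tables.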

<figtable id="ACY_2 statistics">

Table 1: Secondary Structure Prediction Statistics for ACY_2

                     Precision                Recall                   F-Measure
Method               H      E      L          H      E      L          H      E      L
ReProf (FASTA)       0.773  0.822  0.562      0.829  0.446  0.808      0.800  0.578  0.663
ReProf (big_80)      0.878  0.889  0.644      0.793  0.675  0.890      0.833  0.767  0.747
ReProf (SwissProt)   0.853  0.937  0.620      0.780  0.711  0.849      0.815  0.809  0.717
PSI-Pred             0.914  0.970  0.647      0.780  0.771  0.904      0.842  0.859  0.754

Statistical overview of precision, recall and F-measure for the prediction tools used, with DSSP as reference. H = Helix, E = Beta-Strand, L = Loop. PSI-Pred shows the best performance for ACY_2. ReProf with a PSSM created by PSI-Blast using big_80 as database performs second best in terms of helix and loop F-measure, but is much faster than PSI-Pred (timings not shown; ReProf was run locally, PSI-Pred on the official webserver).

</figtable>
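The F-measure values in Table 1 are consistent with the balanced F-measure, the harmonic mean of precision P and recall R. This formula is inferred from the tabulated numbers, since the page does not state it explicitly. As a worked check, using the helix column of ReProf (big_80):

```latex
F = \frac{2 \, P \, R}{P + R}
  = \frac{2 \cdot 0.878 \cdot 0.793}{0.878 + 0.793}
  = \frac{1.392508}{1.671}
  \approx 0.833
```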

<figtable id="ACY_2 statistics">

Table 2: Secondary Structure Prediction Statistics for P10775, Q08209, Q9X0E6

                     Precision                Recall                   F-Measure
Protein  Method      H      E      L          H      E      L          H      E      L
P10775   ReProf      0.974  0.959  0.793      0.945  0.855  0.912      0.959  0.904  0.848
P10775   PSI-Pred    0.976  0.980  0.630      0.814  0.873  0.938      0.888  0.923  0.754
Q08209   ReProf      0.957  0.842  0.658      0.780  0.787  0.878      0.859  0.814  0.752
Q08209   PSI-Pred    0.895  0.971  0.594      0.723  0.557  0.944      0.800  0.708  0.729
Q9X0E6   ReProf      0.973  0.971  0.526      0.947  0.829  0.833      0.960  0.894  0.645
Q9X0E6   PSI-Pred    1.000  1.000  0.600      0.947  0.854  1.000      0.973  0.921  0.750

Statistical overview of precision, recall and F-measure for the prediction tools used, with DSSP as reference. H = Helix, E = Beta-Strand, L = Loop. ReProf has the higher F-measure for helix and loop in P10775 and for all three states in Q08209. PSI-Pred has the higher F-measure for all three states in Q9X0E6.

</figtable>


Disorder

Transmembrane Helices

Signal Peptides

GO-Terms