Canavan Disease
Secondary Structure
To decide which approach to follow, we examined the proposed run combinations for ReProf: prediction from the FASTA sequence alone versus prediction from a PSSM generated by PSI-Blast. The PSSM-based ReProf prediction was further divided into a PSSM generated with big_80 and a PSSM generated with SwissProt. For further comparison, we ran a secondary structure prediction with PSI-Pred and a secondary structure assignment with DSSP. Because DSSP assigns the secondary structure from the atom coordinates stored in the PDB, we treat the DSSP assignment as the "true secondary structure" and measure the performance of the prediction methods against it as reference.
The evaluation raised some problems that had to be dealt with. First, the PDB entry of ACY2 describes the protein as a homodimer, although it exists in that form only when crystallized. To compute statistics between the prediction methods and DSSP, the DSSP output therefore had to be double-checked, and only one part of the assignment (the monomer) could be used. In addition, the beginning and the end of the DSSP assignment had to be padded with "no secondary structure assigned" symbols to stretch the assignment to the full length of the protein. The final statistics for the secondary structure prediction of Aspartoacylase (P45381|ACY2_HUMAN) are shown in <xr id="ACY2_statistics"> Table </xr>.
<figtable id="ACY2_statistics">
Secondary Structure Prediction Statistics for ACY2 | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Precision H | Precision E | Precision L | Recall H | Recall E | Recall L | F-Measure H | F-Measure E | F-Measure L |
ReProf (FASTA) | 0.773 | 0.822 | 0.562 | 0.829 | 0.446 | 0.808 | 0.800 | 0.578 | 0.663 |
ReProf (big_80) | 0.878 | 0.889 | 0.644 | 0.793 | 0.675 | 0.890 | 0.833 | 0.767 | 0.747 |
ReProf (SwissProt) | 0.853 | 0.937 | 0.620 | 0.780 | 0.711 | 0.849 | 0.815 | 0.809 | 0.717 |
PSI-Pred | 0.914 | 0.970 | 0.647 | 0.780 | 0.771 | 0.904 | 0.842 | 0.859 | 0.754 |
</figtable>
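The per-class statistics in the table can be reproduced with a few lines of code. The following is a minimal sketch, assuming the prediction and the DSSP reference are already available as equal-length strings over the three states H, E and L. The function name `per_class_scores` and the toy strings are illustrative and not taken from the original analysis.

```python
def per_class_scores(predicted, reference, classes="HEL"):
    """Return {state: (precision, recall, f_measure)} for two aligned strings."""
    if len(predicted) != len(reference):
        raise ValueError("prediction and reference must have the same length")
    scores = {}
    for c in classes:
        pairs = list(zip(predicted, reference))
        tp = sum(p == c and r == c for p, r in pairs)  # predicted c, truly c
        fp = sum(p == c and r != c for p, r in pairs)  # predicted c, truly other
        fn = sum(p != c and r == c for p, r in pairs)  # missed c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f_measure)
    return scores


# Toy example (made-up strings, not ACY2 data): one helix residue is
# predicted as loop.
toy = per_class_scores(predicted="HHHLEELL", reference="HHHHEELL")
for state, (p, r, f) in toy.items():
    print(f"{state}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

States that never occur in the prediction or the reference get a score of 0.0 here instead of raising a division error; that convention is a choice of this sketch.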
Because PSI-Pred predictions run via the official web server take much longer than running ReProf locally in the student lab, we decided to continue with ReProf, specifically ReProf with a position-specific scoring matrix derived from big_80 (PSSM created with PSI-Blast, cut-off e-10 and 3 iterations). Out of curiosity, PSI-Pred predictions were nevertheless run for the remaining proteins in addition to the ReProf predictions.
The mapping from UniProt ID to PDB ID caused some complications, because not all structures that were found contain the full sequence of the translated gene. The structures used for the DSSP assignment were chosen manually to ensure that the whole sequence is contained in the structure, at least as part of the whole PDB entry. Some modifications were again needed to ensure that the DSSP assignment has the same length as the predictions by ReProf and PSI-Pred. For example, Q08209 maps to 1AUI chain A, which covers most of the translated gene; however, parts of 1AUI could not be resolved in the crystal structure, and their atom coordinates are missing from the PDB file (374 - 468). As a result, those positions are absent from the DSSP assignment as well and had to be filled with "no predicted structure". After dealing with these complications, precision, recall and F-measure were calculated in the same manner as when deciding on the preferred prediction method. An overview of the prediction statistics with the DSSP assignment as reference is given in <xr id="additional_statistics"> Table </xr>.
<figtable id="additional_statistics">
Secondary Structure Prediction Statistics for P10775, Q08209, Q9X0E6 | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Protein | Method | Precision H | Precision E | Precision L | Recall H | Recall E | Recall L | F-Measure H | F-Measure E | F-Measure L |
P10775 (1DFJ_I) | ReProf | 0.974 | 0.959 | 0.793 | 0.945 | 0.855 | 0.912 | 0.959 | 0.904 | 0.848 |
P10775 (1DFJ_I) | PSI-Pred | 0.976 | 0.980 | 0.630 | 0.814 | 0.873 | 0.938 | 0.888 | 0.923 | 0.754 |
Q08209 (1AUI_A) | ReProf | 0.957 | 0.842 | 0.658 | 0.780 | 0.787 | 0.878 | 0.859 | 0.814 | 0.752 |
Q08209 (1AUI_A) | PSI-Pred | 0.895 | 0.971 | 0.594 | 0.723 | 0.557 | 0.944 | 0.800 | 0.708 | 0.729 |
Q9X0E6 (1O5J) | ReProf | 0.973 | 0.971 | 0.526 | 0.947 | 0.829 | 0.833 | 0.960 | 0.894 | 0.645 |
Q9X0E6 (1O5J) | PSI-Pred | 1.000 | 1.000 | 0.600 | 0.947 | 0.854 | 1.000 | 0.973 | 0.921 | 0.750 |
</figtable>
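Stretching a DSSP assignment to the full sequence length, as was necessary for the unresolved region of 1AUI, can be sketched as follows. This is only an illustration under stated assumptions: the DSSP codes are assumed to be available as a dictionary keyed by 1-based sequence position, the 8-to-3 state reduction (H, G, I to H; E, B to E; everything else to L) is a common convention that the report does not explicitly name, and missing positions are filled with L. The names `DSSP_TO_3` and `pad_dssp` are hypothetical.

```python
# Common 8-to-3 state reduction (assumption: the report does not state
# which mapping was actually used).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def pad_dssp(assigned, full_length, fill="L"):
    """Build a full-length 3-state string from {position: dssp_code}.

    Positions are 1-based sequence positions. Positions missing from
    `assigned` (unresolved termini or internal gaps) receive `fill`.
    """
    states = []
    for pos in range(1, full_length + 1):
        code = assigned.get(pos)
        if code is None:
            states.append(fill)  # residue has no coordinates in the PDB file
        else:
            states.append(DSSP_TO_3.get(code, "L"))  # T, S, blank, ... -> L
    return "".join(states)


# Toy example (made-up assignment): residues 1-2 and 9-10 are unresolved.
toy_assignment = {3: "H", 4: "H", 5: "G", 6: "T", 7: "E", 8: "B"}
print(pad_dssp(toy_assignment, full_length=10))  # LLHHHLEELL
```

Note that the residue numbering inside a PDB file does not always coincide with the UniProt sequence numbering, so a real workflow would first have to map one onto the other.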
Disorder
P45381
Both IUPred and metadisorder predict the protein to be completely globular. No information about disorder could be found in UniProt, PDB or DisProt. Because DisProt in particular contains no entry for ACY2, a sequence search was initiated. The sequence search did not yield reasonable results either. Both Smith-Waterman and PSI-Blast returned hits with e-values so high that it is fairly safe to assume the results are not relevant. A closer look supports this assumption: the best hits of both search methods are associated with cAMP-related chemical reactions, whereas the enzymatic reaction catalyzed by P45381 takes place without any involvement of cAMP. Furthermore, the overlap between the aligned sequences is rather short. Taken together, using one of the best hits to represent the disorder information for the protein of interest would most probably lead to false assumptions.
P10775
Q9X0E6
Searching DisProt with the two different sequence search approaches resulted in different hits. The hits delivered by PSI-Blast can quickly be discarded. First, all proteins found have a length of 500 to approximately 1500 residues, while Q9X0E6 has a length of only 100 amino acids. Second, the three best hits all originate from viruses (Example: best hit via PSI-Blast). Finally, all hits have an e-value above 3.7, and the alignment itself spans a region of only 20 amino acids.
Using Smith-Waterman, the best hit (e-value 0.36) was an uncategorized protein (Q57696 - Y246_METJA) that is disordered over its complete length. The protein originates from Methanocaldococcus jannaschii, and a comparison of the secondary structure information shows that its secondary structure is completely different from that of the query: Q57696 is an assumed all-alpha protein, whereas Q9X0E6 is a mixed alpha and beta protein. In addition, the Pfam associations show that the two proteins belong to two distinct protein families (Q57696 -> PF01817 vs. Q9X0E6 -> PF03091) with two completely different functions. With this information in mind, it can safely be assumed that the hit found in DisProt via Smith-Waterman is a false hit, and therefore no related protein for Q9X0E6 can be found in the DisProt database.
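The criteria used above to discard hits (e-value, length of the aligned region, and the length mismatch between query and hit) can be expressed as a simple filter. The sketch below is illustrative only: the function name `is_plausible_hit` and all three thresholds are assumptions chosen for demonstration, not values used in the original analysis.

```python
def is_plausible_hit(evalue, aligned_length, query_length, hit_length,
                     max_evalue=1e-3, min_coverage=0.5, max_length_ratio=2.0):
    """Heuristic check whether a database hit is worth a closer look.

    All thresholds are hypothetical defaults:
      max_evalue       - largest e-value still considered significant
      min_coverage     - minimal fraction of the query covered by the alignment
      max_length_ratio - maximal ratio between the longer and shorter sequence
    """
    coverage = aligned_length / query_length
    length_ratio = max(query_length, hit_length) / min(query_length, hit_length)
    return (evalue <= max_evalue
            and coverage >= min_coverage
            and length_ratio <= max_length_ratio)


# Values as described for the PSI-Blast hits of Q9X0E6: query of about
# 100 residues, hit of at least 500 residues, e-value 3.7, 20 aligned residues.
print(is_plausible_hit(evalue=3.7, aligned_length=20,
                       query_length=100, hit_length=500))  # False
```

Such a filter only automates the first screening step; the comparison of secondary structure and Pfam family described above still requires manual inspection.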
Q08209
Transmembrane Helices
Following the task, the transmembrane helices and topology of the three given proteins plus ACY2 were predicted with PolyPhobius and MEMSAT-SVM. Because running MEMSAT-SVM automatically returned the MEMSAT-3 prediction as well, these results were included in the comparison.
P45381
ACY2 (P45381) is located in the cytoplasm and is not bound to the cell membrane, so none of the prediction methods should predict a transmembrane helix. However, PolyPhobius was the only method that indeed predicted none. MEMSAT-3 predicted a helix from amino acid position 60 to 78, even though the score is negative. MEMSAT-SVM predicted a helix ranging from amino acid 114 to 129, again with a negative score. MEMSAT seems to test all possible helix counts from 1 to n without testing the possibility of 0, so it can be hypothesized that MEMSAT always returns a transmembrane helix prediction, even if the score is negative.
P35462
P35462 (PDB:3PBL), a human dopamine receptor, is a 7-helical transmembrane protein. The transmembrane helices were predicted with MEMSAT (SVM and 3) and PolyPhobius. Interestingly, MEMSAT-SVM did not predict the correct number of helices, stopping after the sixth one. MEMSAT-3 correctly predicted seven helices despite being claimed to have lower prediction power. PolyPhobius achieved the best prediction for this protein: it correctly predicted all seven helices and predicted the helix borders more precisely than MEMSAT. The exact numbers can be found in <xr id="P35462_tmhs"> Table </xr>.
<figtable id="P35462_tmhs">
Predicted Transmembrane Helices for P35462 | |||||||
---|---|---|---|---|---|---|---|
Helix Positions | |||||||
Method | #1 | #2 | #3 | #4 | #5 | #6 | #7 |
UniProt | 33-55 | 66-88 | 105-126 | 150-170 | 188-212 | 330-351 | 367-388 |
PolyPhobius | 30-55 | 66-88 | 105-126 | 150-170 | 188-212 | 329-352 | 367-386 |
MEMSAT-SVM | 32-55 | 65-88 | 101-129 | 151-169 | 188-209 | 331-354 | no prediction |
MEMSAT-3 | 31-55 | 67-91 | 102-126 | 148-167 | 189-213 | 327-350 | 365-383 |
</figtable>
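The statement that one tool predicts helix borders "more precisely" than another can be quantified, for example as the mean absolute offset between annotated and predicted helix borders. The following sketch uses the positions from the table for P35462. The metric itself, the matching of each annotated helix to the predicted helix with the largest overlap, and the function names are assumptions of this example and not necessarily the comparison performed in the original analysis.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def mean_border_deviation(annotated, predicted):
    """Return (mean absolute border offset, number of matched helices).

    Each annotated helix is matched to the predicted helix with the
    largest overlap; annotated helices without any overlap are skipped.
    """
    offsets = []
    matched = 0
    for ref in annotated:
        best = max(predicted, key=lambda p: overlap(ref, p), default=None)
        if best is None or overlap(ref, best) == 0:
            continue
        matched += 1
        offsets += [abs(best[0] - ref[0]), abs(best[1] - ref[1])]
    mean = sum(offsets) / len(offsets) if offsets else None
    return mean, matched


# Helix positions for P35462 as listed in the table.
uniprot = [(33, 55), (66, 88), (105, 126), (150, 170),
           (188, 212), (330, 351), (367, 388)]
predictions = {
    "PolyPhobius": [(30, 55), (66, 88), (105, 126), (150, 170),
                    (188, 212), (329, 352), (367, 386)],
    "MEMSAT-SVM": [(32, 55), (65, 88), (101, 129), (151, 169),
                   (188, 209), (331, 354)],
    "MEMSAT-3": [(31, 55), (67, 91), (102, 126), (148, 167),
                 (189, 213), (327, 350), (365, 383)],
}
for method, helices in predictions.items():
    mean, matched = mean_border_deviation(uniprot, helices)
    print(f"{method}: {mean:.2f} residues over {matched} matched helices")
```

With this metric PolyPhobius deviates by 0.5 residues per border on average, MEMSAT-SVM by 1.5 (over its six matched helices) and MEMSAT-3 by about 1.93, which agrees with the ranking described in the text.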
Q9YDF8
Q9YDF8 (PDB:1ORQ/1ORS/2A0L/2KYH), a crucial component of potassium channels, is a 7-helical transmembrane protein. The transmembrane helices were predicted with MEMSAT (SVM and 3) and PolyPhobius. In this case only PolyPhobius correctly predicted the number of helices; both MEMSAT-3 and MEMSAT-SVM predicted only six. In addition, all three tools had great problems predicting the correct borders. PolyPhobius seems to have skipped the third helix annotated in SwissProt, completely mispredicted the borders of the fifth helix (the fourth predicted helix) and predicted a (sixth) helix where the actual protein has an intramembrane element at amino acid positions 196 to 208. MEMSAT-SVM and MEMSAT-3, although wrongly predicting six transmembrane helices, are closer to the SwissProt annotation in terms of helix borders, except for the third helix, where MEMSAT seems to have fused the third and fourth annotated helices. The exact numbers can be found in <xr id="Q9YDF8_tmhs"> Table </xr>.
<figtable id="Q9YDF8_tmhs">
Predicted Transmembrane Helices for Q9YDF8 | |||||||
---|---|---|---|---|---|---|---|
Helix Positions | |||||||
Method | #1 | #2 | #3 | #4 | #5 | #6 | #7 |
UniProt | 39-63 | 68-92 | 97-105 | 109-125 | 129-145 | 160-184 | 222-253 |
PolyPhobius | 42-60 | 68-88 | 108-129 | 137-157 | 163-184 | 196-213 | 224-244 |
MEMSAT-SVM | 43-59 | 72-90 | 101-118 | 128-143 | 163-184 | 221-245 | no prediction |
MEMSAT-3 | 38-60 | 66-90 | 100-119 | 122-141 | 161-184 | 218-242 | no prediction |
</figtable>
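The skipped, additional and fused helices described for Q9YDF8 can be detected automatically by checking which annotated and predicted segments overlap. The sketch below uses the positions from the table. The minimum overlap of 5 residues and the function names are assumptions made for this illustration; a different threshold can change the classification of borderline cases.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_topology(annotated, predicted, min_overlap=5):
    """Return (missed, extra, fused) helices.

    missed: annotated helices without a sufficiently overlapping prediction
    extra:  predicted helices without a sufficiently overlapping annotation
    fused:  predicted helices that cover more than one annotated helix
    """
    missed = [ref for ref in annotated
              if all(overlap(ref, p) < min_overlap for p in predicted)]
    extra = [p for p in predicted
             if all(overlap(ref, p) < min_overlap for ref in annotated)]
    fused = [p for p in predicted
             if sum(overlap(ref, p) >= min_overlap for ref in annotated) > 1]
    return missed, extra, fused


# Helix positions for Q9YDF8 as listed in the table.
uniprot = [(39, 63), (68, 92), (97, 105), (109, 125),
           (129, 145), (160, 184), (222, 253)]
polyphobius = [(42, 60), (68, 88), (108, 129), (137, 157),
               (163, 184), (196, 213), (224, 244)]
memsat_svm = [(43, 59), (72, 90), (101, 118), (128, 143),
              (163, 184), (221, 245)]
memsat_3 = [(38, 60), (66, 90), (100, 119), (122, 141),
            (161, 184), (218, 242)]

print(compare_topology(uniprot, polyphobius))
# ([(97, 105)], [(196, 213)], [])
print(compare_topology(uniprot, memsat_svm))
# ([], [], [(101, 118)])
print(compare_topology(uniprot, memsat_3))
# ([], [], [(100, 119)])
```

The output mirrors the observations in the text: PolyPhobius misses the annotated helix at 97-105 and adds a helix at 196-213, while both MEMSAT variants merge the third and fourth annotated helices into one prediction.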
Additional information:
- UniProt entry: Q9YDF8
- OPM entry: not clear
- PDBTM entry: see OPM
P47863
P47863 (PDB:2D57), an aquaporin from rat, is a 6-helical transmembrane protein. The transmembrane helices were predicted with MEMSAT (SVM and 3) and PolyPhobius. In this case every prediction tool correctly predicted the number of helices. PolyPhobius and MEMSAT-SVM were slightly off in predicting the helix borders, whereas the claimed inferiority of MEMSAT-3 compared to MEMSAT-SVM is clearly visible here in its less precise border prediction. The exact numbers can be found in <xr id="P47863_tmhs"> Table </xr>.
<figtable id="P47863_tmhs">
Predicted Transmembrane Helices for P47863 | ||||||
---|---|---|---|---|---|---|
Helix Positions | ||||||
Method | #1 | #2 | #3 | #4 | #5 | #6 |
UniProt | 37-57 | 65-85 | 116-136 | 156-176 | 185-205 | 232-252 |
PolyPhobius | 34-58 | 70-91 | 115-136 | 156-177 | 188-208 | 231-252 |
MEMSAT-SVM | 35-56 | 71-89 | 113-136 | 157-178 | 190-205 | 232-252 |
MEMSAT-3 | 35-59 | 71-95 | 117-141 | 157-180 | 187-206 | 240-264 |
</figtable>
Signal Peptides
For the prediction of signal peptides SignalP version 4.1 (webserver) was used.
P02768
Serum albumin (P02768) is one of the main components of blood plasma. Because it has to be secreted into the blood vessels, P02768 can be expected to carry motifs that are crucial for delivery down the secretory pathway and therefore to contain a signal peptide sequence. This is exactly what the SignalP prediction shows: SignalP predicts that P02768 has a signal peptide sequence and that a cleavage site exists between amino acid positions 18 and 19. In the plot created by SignalP v4.1 <xr id="P02768_signalp"> (see Figure</xr>), this clear signal at position 19 (0.710) can be observed.
<figure id="P02768_signalp">
</figure>
Additional information:
- UniProt entry: P02768
- Signal Peptide Database Entry: ALBU_HUMAN
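A predicted cleavage site "between positions 18 and 19" means that residues 1 to 18 form the signal peptide and the processed chain starts at residue 19. The short sketch below shows this index arithmetic. The function name `split_signal_peptide` is hypothetical, and the sequence in the example is a made-up placeholder rather than the real P02768 sequence.

```python
def split_signal_peptide(sequence, last_signal_residue):
    """Split a precursor sequence at a cleavage site.

    `last_signal_residue` is the 1-based position of the last residue of
    the signal peptide (18 for a cleavage site between 18 and 19).
    Returns (signal_peptide, remaining_chain).
    """
    if not 0 < last_signal_residue < len(sequence):
        raise ValueError("cleavage position must lie inside the sequence")
    return sequence[:last_signal_residue], sequence[last_signal_residue:]


# Placeholder sequence (not real data): 18 signal residues + 7 further residues.
placeholder = "M" + "L" * 17 + "ADEKTGH"
signal, rest = split_signal_peptide(placeholder, 18)
print(len(signal), rest)  # 18 ADEKTGH
```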
P47863
As established in the transmembrane helix prediction task, P47863 is an aquaporin located within the membrane. The SignalP prediction shows that neither a signal peptide sequence nor a cleavage site can be detected. Detailed graphical output can be seen in <xr id="P47863_signalp">Figure</xr>.
<figure id="P47863_signalp">
</figure>
Additional information:
- UniProt entry: P47863
P11279
LAMP-1 (Lysosome-associated membrane glycoprotein 1 | P11279) is a membrane protein. It plays an important role in the autophagy process and is associated with tumor metastasis. It has one transmembrane helix, which could serve as some sort of protein anchor. The signal peptide prediction by SignalP indicates that LAMP-1 has a signal peptide sequence and a cleavage site between amino acids 28 and 29. This is consistent with the information stored in the Signal Peptide Database [3]. A detailed graphical output of the SignalP prediction is displayed in <xr id="P11279_signalp"> see Figure</xr>.
<figure id="P11279_signalp">
</figure>
Additional information:
- UniProt entry: P11279
- Signal Peptide Database Entry: LAMP1_HUMAN
GO-Terms
GO-Pet & Prot-Fun
The GO-Term prediction for Aspartoacylase by GO-Pet (see <xr id="P11279_signalp"> Table</xr>) is very accurate. Comparing it with the known enzymatic activity of ACY2 shows that the predicted terms exactly reflect the chemical reaction that takes place.
<figtable id="P45381_gopet">
Predicted GOTerms for P45381 by GO-Pet | ||||||
---|---|---|---|---|---|---|
GO-ID | GO-Term / Description | Confidence | ||||
GO:0016787 | hydrolase activity | 96% | ||||
GO:0004046 | aminoacylase activity | 82% | ||||
GO:0019807 | aspartoacylase activity | 82% | ||||
GO:0016788 | hydrolase activity acting on ester bonds | 81% |
</figtable>
Interestingly, Prot-Fun correctly predicts that ACY2 is an enzyme but mispredicts it as an isomerase (Prob:Odds 0.084:2637 vs 0.115:0.363 for Hydrolase). In addition, Prot-Fun cannot decide on a Gene Ontology category and sorts ACY2 into the functional category "central intermediary metabolism".
Pfam
The Pfam sequence search with P45381 (ACY2_HUMAN) directed us to the Succinylglutamate desuccinylase / Aspartoacylase family PF04952. The InterPro information referenced in Pfam further states that the family has the molecular function "hydrolase activity, acting on ester bonds" (GO:0016788) and is assigned to the biological process "metabolic process" (GO:0008152). In addition, the family belongs to the clan Peptidase_MH (CL0035). The family contains 2822 sequences, 1568 species and 43 known structures.