Difference between revisions of "Sequence-based predictions HEXA"
(→OCTOPUS and SPOCTOPUS) |
(→Prediction of transmembrane alpha-helices and signal peptides) |
||
(261 intermediate revisions by 2 users not shown) | |||
Line 3: | Line 3: | ||
=== Secondary Structure Prediction === |
=== Secondary Structure Prediction === |
||
+ | To analyse the secondary structure of our protein we used PSIPRED, Jpred3 and DSSP. In the analysis section of this page we compare these three methods to see whether they give similar results or differ substantially. |
||
− | === Prediction of disordered regions === |
||
+ | [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/secstr_general Here]] you can find some general information about these methods. |
||
− | * DISOPRED |
||
+ | <br><br> |
||
− | Authors: Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
+ | ---- |
||
+ | === Prediction of disordered regions === |
||
− | Year: 2004 |
||
+ | After analysing the secondary structure, we also want to have a look at disordered regions in this protein. For this we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar, and we want to decide which method seems to be the best one for our purpose. |
||
− | Source: [[http://www.ncbi.nlm.nih.gov/pubmed/15019783 Prediction and functional analysis of native disorder in proteins from the three kingdoms of life.]] |
||
+ | To get more insight into the methods and the theory behind them we also offer you a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/disorder_general general information page]]. |
||
− | |||
+ | <br><br> |
||
− | Description: |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
− | |||
+ | ---- |
||
− | This method is based on a neural network which was trained on high-resolution X-ray structures from the PDB. Disordered regions are regions which appear in the sequence record, but whose electrons are missing from the electron density map. This approach can also fail, because missing electrons can also arise from the crystallization process. |
||
− | The method first runs a PSI-BLAST search against a filtered sequence database. Next, a profile for each residue is calculated and classified using the trained neural network. |
||
− | |||
− | |||
− | Prediction: |
||
− | |||
− | As a prediction result you get a file with the predicted disordered regions, the precision and the recall. Furthermore, you can get a more detailed output. There you see the sequence, the predictions and also numbers above the sequence (from 0 to 9, which show you how likely your prediction is). |
||
− | |||
− | |||
− | Input: |
||
− | |||
− | If you run DISOPRED on the console, you have to define the location of your database. As input, the program needs your sequence in a file in FASTA format. |
||
− | |||
− | |||
− | |||
− | *POODLE |
||
− | Prediction of order and disorder by machine-learning |
||
− | |||
− | Authors: S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi |
||
− | |||
− | Year: 2007 |
||
− | |||
− | There exist three different variants of POODLE. |
||
− | |||
− | The first variant is called POODLE-L, which mainly predicts long disordered regions with a length of more than 40 residues. |
||
− | |||
− | |||
− | Source: [[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17545177&ordinalpos=8&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions.]] |
||
− | |||
− | The next variant is called POODLE-S, which mainly predicts short disordered regions. |
||
− | |||
− | |||
− | Source: [[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17599940&ordinalpos=7&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.]] |
||
− | |||
− | The last variant is called POODLE-I, which integrates structural information predictors. |
||
− | |||
− | |||
− | Source: [[http://www.bioinfo.de/isb/2010/10/0015/ POODLE-I: Disordered region prediction by integrating POODLE series and structural information predictors based on a workflow approach]] |
||
− | |||
− | There also exists another variant called POODLE-W, which compares different sequences and predicts which sequence is the most disordered one, but this method was not used in our analysis. |
||
− | |||
− | |||
− | Description: |
||
− | |||
− | POODLE is also a machine-learning-based method. It is based on a 2-level SVM (Support Vector Machine). |
||
− | |||
− | We describe here the POODLE-L in detail, but all POODLE variants use the same principle. |
||
− | The method was trained on disordered proteins and proteins with no disordered regions. On the first level, the SVM predicts the probability of a 40-residue sequence segment to be disordered. If the algorithm finds such a disordered region, the second level of the SVM uses the output from the first level and predicts the probability of being disordered for each amino acid. |
||
− | |||
− | |||
− | Output: |
||
− | |||
− | The result of this method is a file with the single amino acids, the prediction whether each one is ordered or not, and the probability for that state. Furthermore, you get a graphical view of the result. |
||
− | |||
− | |||
− | Input: |
||
− | |||
− | We used the POODLE webserver for our analysis. We pasted our sequence in FASTA format into the input window and chose the POODLE variant. |
||
=== Prediction of transmembrane helices and signal peptides === |
=== Prediction of transmembrane helices and signal peptides === |
||
+ | The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged both predictions into one section, because several prediction methods can predict both. |
||
− | * TMHMM (transmembrane helices hidden markov model) |
||
+ | We used several methods: some which only predict transmembrane helices, some which only predict signal peptides, and some combined methods. |
||
− | Authors: E. L.L. Sonnhammer, G. von Heijne, and A. Krogh <br> |
||
− | Year: 1998 <br> |
||
− | Source: A hidden Markov model for predicting transmembrane helices in protein sequences. <br> |
||
+ | To have a closer look at the different methods we again provide an [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/transmembrane_signal_peptide_general information page.]] |
||
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
+ | ---- |
||
+ | === Prediction of GO Terms === |
||
− | Description:<br> |
||
− | TMHMM is a hidden Markov model-based prediction method for transmembrane helices in proteins. The HMM consists of three different main locations (core, cap, loop) and seven different states (cytoplasmic loop, cytoplasmic cap, helix core, non-cytoplasmic cap, short non-cytoplasmic loop, long non-cytoplasmic loop and globular domain). |
||
+ | The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other. |
||
+ | Again, we also provide a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_terms_general general information page]] about the GO Term methods we used in our analysis. |
||
− | Prediction: <br> |
||
+ | <br><br> |
||
− | For a given protein sequence in FASTA format, this method searches for the best path through the hidden Markov model. There are two output formats, a short one and a long one. The long output format gives additional statistical information (e.g. the expected number of amino acids in transmembrane helices). |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
− | |||
− | |||
− | Input: <br> |
||
− | The method only needs the protein sequence in FASTA-format for the prediction. |
||
− | |||
− | * Phobius and PolyPhobius |
||
− | |||
− | Phobius:<br> |
||
− | Authors: Lukas Käll, Anders Krogh and Erik L. L. Sonnhammer<br> |
||
− | Year: 2004<br> |
||
− | Source: A Combined Transmembrane Topology and Signal Peptide Prediction Method. <br><br> |
||
− | |||
− | PolyPhobius:<br> |
||
− | Authors: Lukas Käll, Anders Krogh and Erik Sonnhammer<br> |
||
− | Year: 2005<br> |
||
− | Source: An HMM posterior decoder for sequence feature prediction that includes homology information. <br> |
||
− | |||
− | Description:<br> |
||
− | Phobius and PolyPhobius are combined methods, which predict transmembrane helices and signal peptides. Both methods are based on a hidden Markov model and combine the methods of TMHMM and SignalP. The basis of these methods is the HMM from TMHMM with an additional start state for signal peptides. The difference between Phobius and PolyPhobius is that PolyPhobius also uses homology information for the prediction.<br> |
||
− | |||
− | Input:<br> |
||
− | We used the Webserver for Phobius and PolyPhobius and there it was only necessary to paste the protein sequence in fasta format.<br> |
||
− | |||
− | Output:<br> |
||
− | The server outputs a text file with the predicted position of the signal peptide, the type of the signal peptide and the positions of the transmembrane helices. Furthermore, it outputs a detailed file with the probabilities for each residue to be located in a transmembrane helix or signal peptide. Additionally, the server outputs a picture of the prediction. |
||
− | |||
− | === Prediction of GO Terms === |
||
== Secondary Structure prediction == |
== Secondary Structure prediction == |
||
+ | === Results === |
||
− | == Prediction of disordered regions == |
||
+ | The detailed output of the different prediction methods can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Secondary_Structure_Prediction here]] |
||
− | Before we started to analyse the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the DisProt database and did not find it, so we assume that our protein does not have disordered regions. Another way to find out whether the protein has disordered regions is to check whether its UniProt entry contains a cross-reference to DisProt. |
||
+ | Here we only present a short summary of the output of the different methods. |
||
+ | * Predicted Helices |
||
− | * Disopred |
||
− | Disopred predicts two disordered regions in our protein. The first region is at the beginning of the protein (first two residues) and the second region is at the end (last three residues). This prediction is wrong, because it is normal that the electrons of the first and the last amino acids are missing from the electron density map. So, our protein Hexosaminidase A has no disordered regions. |
||
− | [[Image:disopred_result.png|center|thumb|Result of the Disopred prediction. * marks an amino acid that belongs to a disordered region, whereas . marks a non-disordered residue.]] |
||
− | |||
− | |||
− | * POODLE |
||
− | We decided to test several POODLE variants and to compare the results. |
||
− | |||
− | POODLE-I |
||
− | |||
− | POODLE-I predicted five disordered regions: |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |method |
||
+ | |#helices |
||
|- |
|- |
||
+ | |PSIPRED |
||
− | |start position |
||
+ | |14 |
||
− | |end position |
||
− | |length |
||
− | |- |
||
− | |1 |
||
− | |2 |
||
− | |2 |
||
|- |
|- |
||
+ | |Jpred3 |
||
|14 |
|14 |
||
− | |19 |
||
− | |6 |
||
|- |
|- |
||
+ | |DSSP |
||
− | |83 |
||
− | | |
+ | |16 |
− | |7 |
||
|- |
|- |
||
− | | |
+ | |} |
+ | |||
− | |109 |
||
+ | * Predicted Beta-Sheets |
||
− | |5 |
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |method |
||
+ | |#sheets |
||
|- |
|- |
||
+ | |PSIPRED |
||
− | |527 |
||
− | | |
+ | |15 |
− | | |
+ | |- |
+ | |Jpred3 |
||
+ | |15 |
||
+ | |- |
||
+ | |DSSP |
||
+ | |0 |
||
|- |
|- |
||
|} |
|} |
||
+ | === Comparison of the different methods === |
||
+ | |||
+ | To determine how successful our secondary structure predictions with PSIPRED and Jpred3 were, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some beta-sheets. Therefore, the main comparison in this case refers to the alpha-helices. |
||
+ | |||
+ | For PSIPRED the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP, and three helices assigned by DSSP were not predicted by PSIPRED. Most of the helices which appear in only one of the two outputs are very short. |
||
+ | |||
+ | For Jpred3 the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. Only two helices predicted by Jpred3 are not assigned by DSSP. Conversely, three small helices assigned by DSSP are not predicted by Jpred3. In one further case, DSSP assigns two helices separated by a turn, where Jpred3 predicts a single long helix. |
||
+ | |||
+ | All in all, the prediction of the helices is good, because the predicted helices mostly correspond to the assignment of DSSP. The only negative aspect is that both prediction methods predict a lot of sheets which were not assigned by DSSP at all. |
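The helix-by-helix comparison described above could be reproduced roughly as follows. This is only a minimal sketch, not the procedure we actually ran: it assumes that both the prediction and the DSSP assignment are available as per-residue strings of equal length using the letters H (helix), E (strand) and C (other), and the two toy strings and all function names are made up for illustration.

```python
from itertools import groupby


def segments(ss, state):
    """Return 1-based, inclusive (start, end) runs of `state` in the string `ss`."""
    runs = []
    position = 1
    for symbol, group in groupby(ss):
        length = sum(1 for _ in group)
        if symbol == state:
            runs.append((position, position + length - 1))
        position += length
    return runs


def overlaps(a, b):
    """True if the two inclusive intervals share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def unmatched(query, reference):
    """Segments of `query` that overlap no segment of `reference`."""
    return [q for q in query if not any(overlaps(q, r) for r in reference)]


# Hypothetical toy strings, NOT the real HEXA prediction or DSSP assignment.
predicted = "CC" + "H" * 5 + "CCC" + "E" * 4 + "CC" + "H" * 4 + "CC"
assigned = "CC" + "H" * 4 + "C" * 14 + "HH"

predicted_helices = segments(predicted, "H")
assigned_helices = segments(assigned, "H")

# Helices that appear in only one of the two outputs.
only_predicted = unmatched(predicted_helices, assigned_helices)
only_assigned = unmatched(assigned_helices, predicted_helices)

print("helices only in the prediction:", only_predicted)
print("helices only in the assignment:", only_assigned)
print("predicted strands:", segments(predicted, "E"))
```

Counting overlapping segments instead of identical residues tolerates small shifts at the helix ends, which is the same criterion we applied by eye in the comparison.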
||
+ | <br><br> |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
+ | |||
+ | == Prediction of disordered regions == |
||
+ | Before we started with the analysis of the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the [[http://www.disprot.org/ DisProt database]] and did not find it, so we assume that our protein does not have any disordered regions. Another way to find out whether the protein has disordered regions is to check whether its [[http://www.uniprot.org/ UniProt]] entry contains a cross-reference to [[http://www.disprot.org DisProt]]. |
||
− | POODLE-L |
||
+ | === Results === |
||
− | POODLE-L found no disordered regions. Therefore, there is no disordered region with a length of more than 40 residues in our protein. |
||
+ | The detailed results of the different methods can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_Disordered_Regions here]] |
||
+ | In this section, we only want to give a summary of the output of the different methods. |
||
− | POODLE-S (High B-factor residues) |
||
− | This POODLE-S variant searches for high B-factor values in the crystal structure, which imply uncertainty in the assignment of the atom positions. |
||
− | POODLE-S predicted five disordered regions: |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |method |
||
+ | |#disordered regions in the protein |
||
+ | |#disordered regions at the termini |
||
|- |
|- |
||
+ | |Disopred |
||
− | |start position |
||
− | |end position |
||
− | |length |
||
− | |- |
||
|0 |
|0 |
||
|2 |
|2 |
||
+ | |- |
||
+ | |POODLE-I |
||
+ | |3 |
||
|2 |
|2 |
||
|- |
|- |
||
+ | |POODLE-L |
||
− | |13 |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
|- |
|- |
||
+ | |POODLE-S (B-factors) |
||
− | |83 |
||
− | | |
+ | |3 |
− | | |
+ | |2 |
|- |
|- |
||
+ | |POODLE-S (missing residues) |
||
− | |105 |
||
− | |109 |
||
− | |5 |
||
− | |- |
||
− | |526 |
||
− | |529 |
||
|4 |
|4 |
||
+ | |2 |
||
|- |
|- |
||
+ | |IUPred (short) |
||
− | |} |
||
+ | |0 |
||
− | |||
− | |||
− | POODLE-S (missing residues) |
||
− | |||
− | POODLE-S (missing residues) predicts regions as disordered if there is an amino acid in the sequence record but not in the electron density map. |
||
− | |||
− | Poodle-S found 6 disordered regions. |
||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
− | |- |
||
− | |start position |
||
− | |end position |
||
− | |length |
||
− | |- |
||
− | |17 |
||
− | |18 |
||
|2 |
|2 |
||
|- |
|- |
||
+ | |IUPred (long) |
||
− | |53 |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
|- |
|- |
||
+ | |IUPred (structural information) |
||
− | |78 |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
|- |
|- |
||
+ | |Meta-Disorder |
||
− | |153 |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
|- |
|- |
||
− | |280 |
||
− | |280 |
||
− | |1 |
||
− | |- |
||
− | |345 |
||
− | |345 |
||
− | |1 |
||
− | |- |
||
− | |} |
||
− | |||
− | |||
− | Graphical Output: |
||
− | {| |
||
− | | [[Image:POODLE_S_B.png|thumb|Prediction of POODLE-S (High B-factor residues)]] |
||
− | | [[Image:POODLE_S_M.png|thumb|Prediction of POODLE-S (missing residues)]] |
||
− | | [[Image:POODLE_I.png|thumb|center|Prediction of POODLE-I]] |
||
− | | [[Image:POODLE_L.png |thumb|Prediction of POODLE-L]] |
||
|} |
|} |
||
− | Comparison of the different POODLE variants |
+ | === Comparison of the different POODLE variants === |
− | POODLE-L |
+ | POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions. |
− | Both POODLE-S variants found several short disordered regions, which is a false result. |
+ | Both POODLE-S variants found several short disordered regions, which is a false positive result. Interestingly, there seem to be more missing electrons in the electron density map than residues with high B-factor values. |
POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor). |
POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor). |
||
Line 258: | Line 154: | ||
Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct. |
Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct. |
||
− | In general, these predictions are used, if nothing |
+ | In general, these predictions are used if nothing is known about the protein. Normally, therefore, we would not know that the prediction is wrong. For this reason, we treat the result as if it were trustworthy and check whether the disordered regions overlap with the functionally important residues, because disordered regions seem to be functionally very important. |
− | We check this for POODLE-S with missing residues and POODLE-I, because POODLE-S with high B-factor values |
+ | We check this for POODLE-S with missing residues and POODLE-I, because POODLE-S with high B-factor values shows the same result as POODLE-I. |
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
Line 333: | Line 229: | ||
|} |
|} |
||
− | As you can see in the table above, only |
+ | As you can see in the table above, only one disulfide bond is located in a disordered region; all other functionally important residues are located in ordered regions. This is a further hint that the predictions are wrong. |
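The overlap check between predicted disordered regions and functionally important residues can be sketched as below. This is an illustrative sketch only: the region boundaries, the residue positions and the names `in_any_region` and `overlap_report` are hypothetical and do not reproduce the values from our table.

```python
def in_any_region(position, regions):
    """True if the 1-based `position` lies inside one of the inclusive regions."""
    return any(start <= position <= end for start, end in regions)


def overlap_report(sites, regions):
    """Map each site type to the positions that fall into a predicted region."""
    return {
        name: [p for p in positions if in_any_region(p, regions)]
        for name, positions in sites.items()
    }


# Hypothetical example data, NOT the real HEXA annotation or prediction.
disordered_regions = [(10, 15), (40, 42)]
functional_sites = {
    "disulfide bond": [12, 60],
    "active site": [30],
}

report = overlap_report(functional_sites, disordered_regions)
for site_type, hits in report.items():
    print(site_type, "->", hits if hits else "no overlap")
```

In practice the regions would be taken from the POODLE output and the positions from the annotated features of the protein.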
+ | |||
+ | === Comparison of the different methods === |
||
+ | |||
+ | We decided to compare the results of the different methods. For this, we count how many residues each method predicts as disordered; in our case every such prediction is wrong. |
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |rowspan="2" | |
||
+ | |colspan="9" | methods |
||
+ | |- |
||
+ | |Disopred |
||
+ | |POODLE-I |
||
+ | |POODLE-L |
||
+ | |POODLE-S (missing) |
||
+ | |POODLE-S (B-factor) |
||
+ | |IUPred (short) |
||
+ | |IUPred (long) |
||
+ | |IUPred (structure) |
||
+ | |Meta-Disorder |
||
+ | |- |
||
+ | | #wrongly predicted residues |
||
+ | |5 |
||
+ | |23 |
||
+ | |0 |
||
+ | |47 |
||
+ | |24 |
||
+ | |3 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |} |
||
+ | <br><br> |
||
+ | POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict no disordered residues, which is correct. |
||
+ | The worst prediction result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).<br><br> |
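The residue counts in the table above can be obtained from either a list of predicted regions or a per-residue output line. The following is a minimal sketch under two assumptions: regions are given as 1-based inclusive (start, end) pairs, and the per-residue string marks disordered residues with * and ordered residues with . as in the Disopred output. The function names are hypothetical, and the example regions only illustrate "the first two and the last three residues" of a 529-residue protein.

```python
def count_from_regions(regions):
    """Number of residues covered by inclusive, non-overlapping (start, end) regions."""
    return sum(end - start + 1 for start, end in regions)


def count_from_string(per_residue, disordered_symbol="*"):
    """Number of residues marked as disordered in a per-residue output line."""
    return per_residue.count(disordered_symbol)


# Illustrative input: two terminal regions of a 529-residue protein.
terminal_regions = [(1, 2), (527, 529)]
wrong_from_regions = count_from_regions(terminal_regions)

# The same situation written as a per-residue string.
per_residue = "**" + "." * 524 + "***"
wrong_from_string = count_from_string(per_residue)

print(wrong_from_regions, wrong_from_string)
```

Because the reference here is "no disordered residue at all", every residue counted this way is a wrongly predicted residue.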
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
== Prediction of transmembrane alpha-helices and signal peptides == |
== Prediction of transmembrane alpha-helices and signal peptides == |
||
− | Because most of the proteins we used in this |
+ | Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.<br> |
Additional proteins: |
Additional proteins: |
||
Line 380: | Line 311: | ||
|} |
|} |
||
+ | The detailed output for the different proteins and the different prediction methods can be found here: |
||
− | === TMHMM === |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_HEXA_HUMAN HEXA_HUMAN]] |
||
− | We analysed the six sequences with TMHMM. |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_BACR_HALSA BACR_HALSA]] |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_RET4_HUMAN RET4_HUMAN]] |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_INSL5_HUMAN INSL5_HUMAN]] |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_LAMP1_HUMAN LAMP1_HUMAN]] |
||
+ | * [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Prediction_of_transmembrane_alpha-helices_and_signal_peptides_A4_HUMAN A4_HUMAN]] |
||
+ | === Results === |
||
− | *Hexosaminidase A |
||
+ | ==== Transmembrane Helices ==== |
||
− | TODO |
||
− | |||
− | * BACR_HALSA |
||
− | |||
− | [[Image:bacr_halsa_tmhmm.png|thumb|Prediction of TMHMM for the transmembrane helices of BACR_HALSA]] |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
|- |
|- |
||
+ | | |
||
+ | |colspan="3" | TMHMM |
||
+ | |colspan="3" | Phobius |
||
+ | |colspan="3" | PolyPhobius |
||
+ | |colspan="3" | OCTOPUS |
||
+ | |colspan="3" | SPOCTOPUS |
||
+ | |- |
||
+ | |protein |
||
+ | |start position |
||
+ | |end position |
||
+ | |location |
||
+ | |start position |
||
+ | |end position |
||
+ | |location |
||
+ | |start position |
||
+ | |end position |
||
+ | |location |
||
+ | |start position |
||
+ | |end position |
||
+ | |location |
||
|start position |
|start position |
||
|end position |
|end position |
||
|location |
|location |
||
|- |
|- |
||
+ | |rowspan="3" | HEXA HUMAN |
||
+ | |1 |
||
+ | |529 |
||
+ | |outside |
||
+ | |23 |
||
+ | |529 |
||
+ | |outside |
||
+ | |20 |
||
+ | |520 |
||
+ | |outside |
||
+ | |1 |
||
+ | |2 |
||
+ | |inside |
||
+ | |22 |
||
+ | |529 |
||
+ | |outside |
||
+ | |- |
||
+ | |colspan="9" | |
||
+ | |3 |
||
+ | |23 |
||
+ | |TM helix |
||
+ | |colspan="3" | |
||
+ | |- |
||
+ | |colspan="9" | |
||
+ | |24 |
||
+ | |529 |
||
+ | |outside |
||
+ | |colspan="3" | |
||
+ | |- |
||
+ | |rowspan="15" | BACR HALSA |
||
+ | |1 |
||
+ | |22 |
||
+ | |outside |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | |1 |
||
+ | |22 |
||
+ | |outside |
||
|1 |
|1 |
||
|22 |
|22 |
||
Line 405: | Line 399: | ||
|42 |
|42 |
||
|TM Helix |
|TM Helix |
||
+ | |23 |
||
+ | |42 |
||
+ | |TM helix |
||
+ | |22 |
||
+ | |43 |
||
+ | |TM helix |
||
+ | |23 |
||
+ | |43 |
||
+ | |TM helix |
||
+ | |23 |
||
+ | |43 |
||
+ | |TM helix |
||
|- |
|- |
||
|43 |
|43 |
||
+ | |54 |
||
+ | |inside |
||
+ | |43 |
||
+ | |53 |
||
+ | |inside |
||
+ | |44 |
||
+ | |54 |
||
+ | |inside |
||
+ | |44 |
||
+ | |54 |
||
+ | |inside |
||
+ | |44 |
||
|54 |
|54 |
||
|inside |
|inside |
||
Line 413: | Line 431: | ||
|77 |
|77 |
||
|TM Helix |
|TM Helix |
||
+ | |54 |
||
+ | |76 |
||
+ | |TM helix |
||
+ | |55 |
||
+ | |77 |
||
+ | |TM helix |
||
+ | |55 |
||
+ | |75 |
||
+ | |TM helix |
||
+ | |55 |
||
+ | |75 |
||
+ | |TM helix |
||
|- |
|- |
||
|78 |
|78 |
||
|91 |
|91 |
||
+ | |outside |
||
+ | |77 |
||
+ | |95 |
||
+ | |outside |
||
+ | |78 |
||
+ | |94 |
||
+ | |outside |
||
+ | |76 |
||
+ | |95 |
||
+ | |outside |
||
+ | |76 |
||
+ | |95 |
||
|outside |
|outside |
||
|- |
|- |
||
Line 421: | Line 463: | ||
|114 |
|114 |
||
|TM Helix |
|TM Helix |
||
+ | |96 |
||
+ | |114 |
||
+ | |TM helix |
||
+ | |95 |
||
+ | |114 |
||
+ | |TM helix |
||
+ | |96 |
||
+ | |116 |
||
+ | |TM helix |
||
+ | |96 |
||
+ | |116 |
||
+ | |TM helix |
||
|- |
|- |
||
|115 |
|115 |
||
+ | |120 |
||
+ | |inside |
||
+ | |115 |
||
+ | |120 |
||
+ | |inside |
||
+ | |115 |
||
+ | |120 |
||
+ | |inside |
||
+ | |117 |
||
+ | |121 |
||
+ | |inside |
||
+ | |117 |
||
|120 |
|120 |
||
|inside |
|inside |
||
Line 429: | Line 495: | ||
|143 |
|143 |
||
|TM Helix |
|TM Helix |
||
+ | |121 |
||
+ | |142 |
||
+ | |TM helix |
||
+ | |121 |
||
+ | |141 |
||
+ | |TM helix |
||
+ | |122 |
||
+ | |142 |
||
+ | |TM helix |
||
+ | |121 |
||
+ | |141 |
||
+ | |TM helix |
||
|- |
|- |
||
|144 |
|144 |
||
+ | |147 |
||
+ | |outside |
||
+ | |143 |
||
+ | |147 |
||
+ | |outside |
||
+ | |142 |
||
+ | |147 |
||
+ | |outside |
||
+ | |143 |
||
+ | |147 |
||
+ | |outside |
||
+ | |142 |
||
|147 |
|147 |
||
|outside |
|outside |
||
Line 437: | Line 527: | ||
|170 |
|170 |
||
|TM Helix |
|TM Helix |
||
+ | |148 |
||
+ | |169 |
||
+ | |TM helix |
||
+ | |148 |
||
+ | |166 |
||
+ | |TM helix |
||
+ | |148 |
||
+ | |168 |
||
+ | |TM helix |
||
+ | |148 |
||
+ | |168 |
||
+ | |TM helix |
||
|- |
|- |
||
|171 |
|171 |
||
|189 |
|189 |
||
+ | |inside |
||
+ | |170 |
||
+ | |189 |
||
+ | |inside |
||
+ | |167 |
||
+ | |186 |
||
+ | |inside |
||
+ | |169 |
||
+ | |185 |
||
+ | |inside |
||
+ | |169 |
||
+ | |185 |
||
|inside |
|inside |
||
|- |
|- |
||
Line 445: | Line 559: | ||
|212 |
|212 |
||
|TM Helix |
|TM Helix |
||
+ | |190 |
||
+ | |212 |
||
+ | |TM helix |
||
+ | |187 |
||
+ | |205 |
||
+ | |TM helix |
||
+ | |186 |
||
+ | |206 |
||
+ | |TM helix |
||
+ | |186 |
||
+ | |206 |
||
+ | |TM helix |
||
|- |
|- |
||
|213 |
|213 |
||
|262 |
|262 |
||
+ | |outside |
||
+ | |213 |
||
+ | |217 |
||
+ | |outside |
||
+ | |206 |
||
+ | |215 |
||
+ | |outside |
||
+ | |207 |
||
+ | |216 |
||
+ | |outside |
||
+ | |207 |
||
+ | |216 |
||
|outside |
|outside |
||
|- |
|- |
||
+ | |colspan="3" | |
||
− | |} |
||
+ | |218 |
||
− | |||
+ | |237 |
||
− | TMHMM predicts six transmembrane helices for BACR_HALSA. We decided to compare the TMHMM prediction with the actual transmembrane helices in BACR_HALSA: |
||
+ | |TM helix |
||
− | |||
+ | |216 |
||
− | [[Image:h_tmhmm_vs_real.png|center|thumb|Comparison between the actual transmembrane helices and the TMHMM result.]] |
||
+ | |237 |
||
− | Especially at the beginning of the protein, the prediction is very good: there is almost 100% overlap between predicted and actual helices. Only at the end of the protein is one transmembrane helix missing from the TMHMM prediction. In reality there are 7 transmembrane helices, whereas TMHMM only predicts 6. This is a serious error, because it makes a difference for the function whether there are 6 or 7 helices, but in general the prediction of TMHMM was quite good. |
||
+ | |TM helix |
||
− | |||
+ | |217 |
||
− | |||
+ | |237 |
||
− | * RET4_HUMAN |
||
+ | |TM helix |
||
− | |||
+ | |217 |
||
− | [[Image:ret4_human_tmhmm.png|thumb|Prediction of TMHMM for the transmembrane helices of RET4_HUMAN]] |
||
+ | |237 |
||
− | |||
+ | |TM helix |
||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
|- |
|- |
||
+ | |colspan="3" | |
||
− | |start position |
||
+ | |238 |
||
− | |end position |
||
+ | |262 |
||
− | |location |
||
+ | |inside |
||
+ | |238 |
||
+ | |262 |
||
+ | |inside |
||
+ | |238 |
||
+ | |262 |
||
+ | |inside |
||
+ | |238 |
||
+ | |262 |
||
+ | |inside |
||
+ | |- |
||
+ | |rowspan="3" | RET4 HUMAN |
||
+ | |colspan="9" | |
||
+ | |1 |
||
+ | |1 |
||
+ | |inside |
||
+ | |colspan="3" | |
||
+ | |- |
||
+ | |colspan="9" | |
||
+ | |2 |
||
+ | |23 |
||
+ | |TM helix |
||
+ | |colspan="3" | |
||
|- |
|- |
||
|1 |
|1 |
||
+ | |201 |
||
+ | |outside |
||
+ | |19 |
||
+ | |201 |
||
+ | |outside |
||
+ | |19 |
||
+ | |201 |
||
+ | |outside |
||
+ | |24 |
||
+ | |201 |
||
+ | |outside |
||
+ | |20 |
||
|201 |
|201 |
||
|outside |
|outside |
||
|- |
|- |
||
+ | |rowspan="3" | INSL5 HUMAN |
||
− | |} |
||
+ | |colspan="9" | |
||
− | |||
+ | |1 |
||
− | TMHMM predicts no transmembrane helices. The whole protein is located in the extracellular space. |
||
+ | |1 |
||
− | |||
+ | |inside |
||
− | |||
+ | |colspan="3" | |
||
− | [[Image:r_human_tmhmm_vs_real.png|center|thumb|Comparison between the actual transmembrane helices and the TMHMM result.]] |
||
− | The TMHMM prediction is completely right. This shows that TMHMM can also predict that a protein is not a transmembrane protein. |
||
− | |||
− | |||
− | * INSL5_HUMAN |
||
− | |||
− | [[Image:insl5_human_tmhmm.png|thumb|Prediction of TMHMM for the transmembrane helices of INSL5_HUMAN]] |
||
− | |||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
|- |
|- |
||
+ | |colspan="9" | |
||
− | |start position |
||
+ | |2 |
||
− | |end position |
||
+ | |32 |
||
− | |location |
||
+ | |TM helix |
||
+ | |colspan="3" | |
||
|- |
|- |
||
|1 |
|1 |
||
+ | |135 |
||
+ | |outside |
||
+ | |23 |
||
+ | |135 |
||
+ | |outside |
||
+ | |23 |
||
+ | |135 |
||
+ | |outside |
||
+ | |33 |
||
+ | |135 |
||
+ | |outside |
||
+ | |24 |
||
|135 |
|135 |
||
|outside |
|outside |
||
|- |
|- |
||
+ | |rowspan="5" | LAMP1 HUMAN |
||
− | |} |
||
− | |||
− | TMHMM predicts no transmembrane helices. The whole protein is located in the extracellular space. |
||
− | |||
− | |||
− | [[Image:insl5_human_tmhmm_vs_real.png|center|thumb|Comparison between the actual transmembrane helices and the TMHMM result.]] |
||
− | The TMHMM prediction is again completely right. |
||
− | |||
− | * LAMP1_HUMAN |
||
− | |||
− | [[Image:lamp1_human_tmhmm.png|thumb|Prediction of TMHMM for the transmembrane helices of LAMP1_HUMAN]] |
||
− | |||
− | |||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
− | |- |
||
− | |start position |
||
− | |end position |
||
− | |location |
||
− | |- |
||
|1 |
|1 |
||
|10 |
|10 |
||
|inside |
|inside |
||
+ | |colspan="6" | |
||
+ | |1 |
||
+ | |10 |
||
+ | |inside |
||
+ | |colspan="3" | |
||
|- |
|- |
||
|11 |
|11 |
||
|33 |
|33 |
||
|TM Helix |
|TM Helix |
||
+ | |colspan="6" | |
||
+ | |11 |
||
+ | |31 |
||
+ | |TM helix |
||
+ | |colspan="3" | |
||
|- |
|- |
||
|34 |
|34 |
||
+ | |383 |
||
+ | |outside |
||
+ | |29 |
||
+ | |381 |
||
+ | |outside |
||
+ | |29 |
||
+ | |381 |
||
+ | |outside |
||
+ | |32 |
||
+ | |383 |
||
+ | |outside |
||
+ | |30 |
||
|383 |
|383 |
||
|outside |
|outside |
||
Line 529: | Line 712: | ||
|406 |
|406 |
||
|TM Helix |
|TM Helix |
||
+ | |382 |
||
+ | |405 |
||
+ | |TM helix |
||
+ | |382 |
||
+ | |405 |
||
+ | |TM helix |
||
+ | |384 |
||
+ | |404 |
||
+ | |TM helix |
||
+ | |384 |
||
+ | |404 |
||
+ | |TM helix |
||
|- |
|- |
||
|407 |
|407 |
||
|417 |
|417 |
||
|inside |
|inside |
||
+ | |406 |
||
+ | |417 |
||
+ | |outside |
||
+ | |406 |
||
+ | |417 |
||
+ | |outside |
||
+ | |405 |
||
+ | |417 |
||
+ | |outside |
||
+ | |405 |
||
+ | |417 |
||
+ | |outside |
||
|- |
|- |
||
+ | |rowspan="5" | A4 HUMAN |
||
− | |} |
||
+ | |colspan="9" | |
||
− | |||
+ | |1 |
||
− | TMHMM predicts two transmembrane helices, separated by a very long loop located in the extracellular space. |
||
+ | |5 |
||
− | |||
+ | |outside |
||
− | [[Image:lamp1_human_tmhmm_vs_real.png|center|thumb|Comparison between the real transmembrane helices and the TMHMM result.]] |
||
+ | |colspan="3" | |
||
− | |||
− | The prediction of TMHMM is quite good. Only at the beginning of the protein TMHMM predicts one wrong transmembrane helix, but the rest of the prediction is correct. |
||
− | |||
− | * A4_HUMAN |
||
− | |||
− | |||
− | [[Image:a4_human_tmhmm.png|thumb|Prediction of TMHMM for the transmembrane helices of A4_HUMAN]] |
||
− | |||
− | |||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
|- |
|- |
||
+ | |colspan="9" | |
||
− | |start position |
||
+ | |6 |
||
− | |end position |
||
+ | |11 |
||
− | |location |
||
+ | |R |
||
+ | |colspan="3" | |
||
|- |
|- |
||
|1 |
|1 |
||
|700 |
|700 |
||
+ | |outside |
||
+ | |18 |
||
+ | |700 |
||
+ | |outside |
||
+ | |18 |
||
+ | |700 |
||
+ | |outside |
||
+ | |12 |
||
+ | |701 |
||
+ | |outside |
||
+ | |19 |
||
+ | |701 |
||
|outside |
|outside |
||
|- |
|- |
||
Line 561: | Line 773: | ||
|723 |
|723 |
||
|TM Helix |
|TM Helix |
||
+ | |701 |
||
+ | |723 |
||
+ | |TM helix |
||
+ | |701 |
||
+ | |723 |
||
+ | |TM helix |
||
+ | |702 |
||
+ | |722 |
||
+ | |TM helix |
||
+ | |702 |
||
+ | |722 |
||
+ | |TM helix |
||
|- |
|- |
||
|724 |
|724 |
||
+ | |770 |
||
+ | |inside |
||
+ | |724 |
||
+ | |770 |
||
+ | |inside |
||
+ | |724 |
||
+ | |770 |
||
+ | |inside |
||
+ | |723 |
||
+ | |770 |
||
+ | |inside |
||
+ | |723 |
||
|770 |
|770 |
||
|inside |
|inside |
||
|- |
|- |
||
|} |
|} |
||
+ | <br><br> |
||
+ | The table above summarises the results of the different transmembrane helix prediction methods. OCTOPUS often predicts a transmembrane helix where none of the other methods does. Phobius, PolyPhobius and SPOCTOPUS always give very similar results, whereas TMHMM and OCTOPUS differ from them.<br><br> |
||
+ | ==== Signal Peptide ==== |
||
− | TMHMM predicts one transmembrane helix at the end of the protein. As we already know, A4_HUMAN is a single-spanning transmembrane protein, so the number of transmembrane helices is predicted correctly. |
||
− | As the next step, we compared the position of the transmembrane helix. |
||
− | |||
− | [[Image:a4_human_tmhmm_vs_real.png|center|thumb|Comparison between the real transmembrane helices and the TMHMM result.]] |
||
− | |||
− | The TMHMM prediction is quite good. Except for the first residues at the beginning and the exact start position of the transmembrane helix, the prediction is correct. |
||
− | |||
− | |||
− | === Phobius and PolyPhobius === |
||
− | |||
− | * Hexosaminidase A |
||
− | |||
− | [[Image:phobius.png|thumb|Prediction of Phobius for the transmembrane helices and signal peptides of HEXA_HUMAN]] |
||
− | [[Image:polyphobius.png|thumb|Prediction of PolyPhobius for the transmembrane helices and signal peptides of HEXA_HUMAN]] |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | | |
||
− | !colspan="3"|'''Phobius''' |
||
− | |colspan=" |
+ | |colspan="2" | Phobius |
+ | |colspan="2" | PolyPhobius |
||
+ | |colspan="2" | SPOCTOPUS |
||
+ | |colspan="1" | TargetP |
||
+ | |colspan="2" | SignalP |
||
|- |
|- |
||
+ | |protein |
||
|start position |
|start position |
||
|end position |
|end position |
||
− | |prediction |
||
|start position |
|start position |
||
|end position |
|end position |
||
+ | |start position |
||
− | |prediction |
||
+ | |end position |
||
+ | |location |
||
+ | |start position |
||
+ | |end position |
||
|- |
|- |
||
+ | |HEXA HUMAN |
||
− | !colspan="6" | Signal peptide prediction |
||
+ | |1 |
||
+ | |22 |
||
+ | |1 |
||
+ | |19 |
||
+ | |7 |
||
+ | |21 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |22 |
||
|- |
|- |
||
+ | |BACR HALSA |
||
+ | |colspan="6" | no prediction available |
||
+ | |secretory pathway |
||
|1 |
|1 |
||
− | | |
+ | |38 |
− | | |
+ | |- |
+ | |RET4 HUMAN |
||
|1 |
|1 |
||
− | | |
+ | |18 |
+ | |1 |
||
− | |N-Region |
||
+ | |18 |
||
+ | |6 |
||
+ | |19 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |18 |
||
|- |
|- |
||
+ | |INSL5 HUMAN |
||
+ | |1 |
||
+ | |22 |
||
+ | |1 |
||
+ | |22 |
||
|6 |
|6 |
||
+ | |23 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |22 |
||
+ | |- |
||
+ | |LAMP1 HUMAN |
||
+ | |1 |
||
+ | |28 |
||
+ | |1 |
||
+ | |28 |
||
+ | |12 |
||
+ | |29 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |28 |
||
+ | |- |
||
+ | |A4 HUMAN |
||
+ | |1 |
||
|17 |
|17 |
||
+ | |1 |
||
− | |H-Region |
||
− | | |
+ | |17 |
+ | |5 |
||
+ | |18 |
||
+ | |secretory pathway |
||
+ | |1 |
||
|15 |
|15 |
||
− | |H-Region |
||
|- |
|- |
||
− | | |
+ | |} |
+ | <br> |
||
− | |22 |
||
+ | The last table lists the signal peptide predictions of the different methods. At first glance, every method that returns a prediction finds a signal peptide, although the predicted stop positions differ. Phobius, PolyPhobius and SPOCTOPUS gave no signal peptide prediction for BACR_HALSA. Furthermore, TargetP does not predict the position of the signal peptide; it only predicts the location of the protein.<br><br> |
||
− | |C-Region |
||
+ | |||
+ | === Comparison of the different methods === |
||
+ | <br><br> |
||
+ | We decided to split the comparison of the methods, because it is unfair to directly compare a method that cannot predict signal peptides with one that can. Therefore, we split it into one comparison for transmembrane helices, one for signal peptides and one for the combination of both. |
||
+ | <br><br> |
||
+ | * Comparison of transmembrane helix prediction |
||
+ | <br><br> |
||
+ | Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which are signal peptides, because all transmembrane-only prediction methods predicted these regions as transmembrane helices, which is wrong. |
||
+ | <br> |
||
+ | For this comparison we counted the residues wrongly predicted as transmembrane, the residues wrongly predicted as outside and the residues wrongly predicted as inside. |
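The counting described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment coordinates below are invented toy data (not one of the six proteins), the helper names `expand` and `count_wrong` are hypothetical, and a residue is counted under the label that was wrongly predicted for it. The exact counting rule used for the table is not stated on this page.

```python
from collections import Counter

def expand(segments):
    """Turn (start, end, label) segments (1-based, inclusive) into {position: label}."""
    labels = {}
    for start, end, label in segments:
        for pos in range(start, end + 1):
            labels[pos] = label
    return labels

def count_wrong(real, predicted, skip_until=0):
    """Count mismatching residues, keyed by the (wrong) predicted label.

    Residues up to and including `skip_until` are ignored, which mirrors
    skipping the signal peptide at the start of the sequence.
    """
    real_labels = expand(real)
    pred_labels = expand(predicted)
    wrong = Counter()
    for pos, true_label in real_labels.items():
        if pos <= skip_until:
            continue
        pred = pred_labels.get(pos)
        if pred is not None and pred != true_label:
            wrong[pred] += 1
    return wrong

# Hypothetical toy protein: signal peptide 1-20, one TM helix at 101-120.
real = [(1, 20, "signal"), (21, 100, "outside"), (101, 120, "TM"), (121, 130, "inside")]
# Hypothetical output of a TM-only predictor that mistakes the signal peptide for a helix.
pred = [(1, 22, "TM"), (23, 98, "outside"), (99, 118, "TM"), (119, 130, "inside")]

wrong = count_wrong(real, pred, skip_until=20)
wrong_sum = sum(wrong.values())
counted = 130 - 20  # residues that took part in the comparison
percent_wrong = 100 * wrong_sum / counted
print(dict(wrong), wrong_sum, round(percent_wrong, 1))
```

In this toy case four residues are wrongly predicted as transmembrane and two as inside, so the "#wrong sum" row would read 6.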
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |rowspan="2" | |
||
+ | |rowspan="2" | |
||
+ | |colspan="5" | methods |
||
+ | |rowspan="1" | |
||
+ | |- |
||
+ | |TMHMM |
||
+ | |Phobius |
||
+ | |PolyPhobius |
||
+ | |OCTOPUS |
||
+ | |SPOCTOPUS |
||
+ | |Transmembrane protein |
||
+ | |- |
||
+ | |rowspan="5" | HEXA_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | BACR_HALSA |
||
+ | |#wrong transmembrane |
||
+ | |24 |
||
+ | |20 |
||
+ | |12 |
||
|16 |
|16 |
||
+ | |11 |
||
+ | |rowspan="5" | yes (7 transmembrane helices) |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |46 |
||
+ | |5 |
||
+ | |3 |
||
+ | |4 |
||
+ | |6 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |4 |
||
+ | |4 |
||
+ | |2 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |74 |
||
+ | |29 |
||
+ | |17 |
||
+ | |20 |
||
+ | |17 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |29% |
||
+ | |11% |
||
+ | |6% |
||
+ | |8% |
||
+ | |6% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | RET4_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |5 |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |5 |
||
+ | |0 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |2% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | INSL5_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |10 |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |10 |
||
+ | |0 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |8% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | LAMP1_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |5 |
||
+ | |3 |
||
+ | |4 |
||
+ | |3 |
||
+ | |1 |
||
+ | |rowspan="5" | yes (single-spanning) |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |2 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |7 |
||
+ | |3 |
||
+ | |4 |
||
+ | |5 |
||
+ | |3 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |2% |
||
+ | |0% |
||
+ | |1% |
||
+ | |1% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | A4_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |rowspan="5" | yes (single-spanning) |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |2 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |2 |
||
+ | |3 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | Average number of wrong predicted residues |
||
+ | |- |
||
+ | | |
||
+ | | |
||
+ | |13.6 |
||
+ | |5.5 |
||
+ | |3.6 |
||
+ | |7 |
||
+ | |3.8 |
||
+ | | |
||
+ | |} |
||
+ | |||
+ | TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, because TMHMM is the only prediction method that does not recognize the 7 transmembrane helices. |
||
+ | SPOCTOPUS and PolyPhobius are the best prediction methods.<br><br> |
||
+ | In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein. |
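The averages in the last row of the table can be recomputed from the "#wrong sum" rows. The following Python sketch copies those sums from the table (in the order HEXA_HUMAN, BACR_HALSA, RET4_HUMAN, INSL5_HUMAN, LAMP1_HUMAN, A4_HUMAN). The helper name `truncated_mean` is hypothetical, and cutting after one decimal instead of rounding is an assumption, made because it appears to match the published values (for example 82/6 = 13.67 is listed as 13.6).

```python
# "#wrong sum" per protein, copied from the table above.
wrong_sum = {
    "TMHMM":       [0, 74, 0, 0, 7, 1],
    "Phobius":     [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS":     [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS":   [0, 17, 0, 0, 3, 3],
}

def truncated_mean(values, digits=1):
    """Mean of integer counts, cut off (not rounded) after `digits` decimals."""
    scale = 10 ** digits
    return (sum(values) * scale // len(values)) / scale

averages = {method: truncated_mean(v) for method, v in wrong_sum.items()}
ranking = sorted(averages, key=averages.get)  # best (fewest wrong residues) first

for method in ranking:
    print(f"{method:12s} {averages[method]}")
```

Sorted this way, PolyPhobius and SPOCTOPUS come first and TMHMM last, which agrees with the conclusion drawn in the text.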
||
+ | <br><br> |
||
+ | * Comparison of signal peptide prediction |
||
+ | <br><br> |
||
+ | Now we compared TargetP and SignalP which only predict signal peptides. Furthermore, we compared SPOCTOPUS, Phobius and PolyPhobius. |
||
+ | TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein. |
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |rowspan="2" | |
||
+ | |rowspan="2" | |
||
+ | |colspan="6" | methods |
||
+ | |- |
||
+ | |real position |
||
+ | |Phobius |
||
+ | |PolyPhobius |
||
+ | |SPOCTOPUS |
||
+ | |TargetP |
||
+ | |SignalP |
||
+ | |- |
||
+ | |rowspan="3" | HEXA_HUMAN |
||
+ | |stop position |
||
+ | |22 |
||
+ | |22 |
||
|19 |
|19 |
||
+ | |21 |
||
− | |C-Region |
||
+ | |no prediction |
||
+ | |22 |
||
|- |
|- |
||
+ | |#wrong residues |
||
− | !colspan="6" | Summary signal peptide |
||
+ | | |
||
+ | |0 |
||
+ | |3 |
||
+ | |3 |
||
+ | |no prediction |
||
+ | |0 |
||
|- |
|- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | BACR_HALSA |
||
+ | |stop position |
||
+ | |not available |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no consensus prediction |
||
+ | |- |
||
+ | |#wrong predicted |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |no prediction |
||
+ | |not available |
||
+ | |- |
||
+ | |location |
||
+ | |membrane |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |secretory pathway |
||
+ | |non-signal peptide |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | RET4_HUMAN |
||
+ | |stop position |
||
+ | |18 |
||
+ | |18 |
||
+ | |18 |
||
+ | |19 |
||
+ | |no prediction |
||
+ | |18 |
||
+ | |- |
||
+ | |#wrong predicted |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
|1 |
|1 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | |||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | INSL5_HUMAN |
||
+ | |stop position |
||
|22 |
|22 |
||
+ | |22 |
||
− | |Signal Peptide |
||
+ | |22 |
||
+ | |22 |
||
+ | |no prediction |
||
+ | |22 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | LAMP1_HUMAN |
||
+ | |stop position |
||
+ | |28 |
||
+ | |28 |
||
+ | |28 |
||
+ | |29 |
||
+ | |no prediction |
||
+ | |28 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
|1 |
|1 |
||
+ | |no prediction |
||
− | |19 |
||
+ | |0 |
||
− | |Signal Peptide |
||
|- |
|- |
||
+ | |location |
||
− | !colspan="6" | Transmembrane helices prediction |
||
+ | |transmembrane helix |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
|- |
|- |
||
+ | !colspan="8" | |
||
− | |23 |
||
− | | |
+ | |- |
+ | |rowspan="3" | A4_HUMAN |
||
− | |outside |
||
+ | |stop position |
||
− | |20 |
||
− | | |
+ | |17 |
+ | |17 |
||
− | |outside |
||
+ | |17 |
||
+ | |18 |
||
+ | |no prediction |
||
+ | |17 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |transmembrane helix |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |- |
||
+ | !colspan="8" | Average number of wrong prediction |
||
+ | |- |
||
+ | |rowspan="2" | |
||
+ | |sum of wrong predicted residues |
||
+ | | |
||
+ | |0 |
||
+ | |3 |
||
+ | |2 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |#right predicted locations / #predicted locations |
||
+ | | |
||
+ | |3/5 |
||
+ | |3/5 |
||
+ | |no prediction |
||
+ | |3/5 |
||
+ | |no prediction |
||
|} |
|} |
||
+ | SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. Furthermore, SignalP predicts whether a signal peptide is present or not. |
||
− | Neither method predicts a transmembrane helix, which is correct, because HEXA_HUMAN is located in the lysosome. |
||
+ | In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.<br> |
||
− | We compared the results of Phobius and PolyPhobius with the real protein. |
||
+ | Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more versatile than the other prediction methods, because they predict both the location and the position. On average they predict the location and the position as well as the other prediction methods. None of the methods recognized the transmembrane proteins; all methods assigned them to the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods. Furthermore, both methods can also predict transmembrane helices. |
||
+ | The results of Phobius were a little bit better than the results of PolyPhobius.<br> |
||
+ | We also want to mention that SignalP offers a choice between prediction models for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three models on this protein and all three failed. BACR_HALSA does not possess a signal peptide, but every model predicts one. Only the eukaryotic model recognized a signal anchor for BACR_HALSA, whereas the other two could not give a prediction of the location.<br><br> |
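One simple way to score the signal peptide predictions is the distance between the predicted and the real stop position. The Python sketch below uses the stop positions from the comparison table above (BACR_HALSA is left out because no real position is available, and TargetP because it gives no positions). The rule "error = absolute difference of the stop positions" is an assumption; the "#wrong residues" rows in the table were apparently counted with a slightly different rule, so individual cells need not agree with this sketch.

```python
# Real and predicted stop positions, copied from the comparison table.
proteins = ["HEXA_HUMAN", "RET4_HUMAN", "INSL5_HUMAN", "LAMP1_HUMAN", "A4_HUMAN"]
real_stop = [22, 18, 22, 28, 17]
predicted_stop = {
    "Phobius":     [22, 18, 22, 28, 17],
    "PolyPhobius": [19, 18, 22, 28, 17],
    "SPOCTOPUS":   [21, 19, 22, 29, 18],
    "SignalP":     [22, 18, 22, 28, 17],
}

def stop_offsets(predicted, real):
    """Absolute distance between predicted and real stop position, per protein."""
    return [abs(p - r) for p, r in zip(predicted, real)]

total_offset = {
    method: sum(stop_offsets(stops, real_stop))
    for method, stops in predicted_stop.items()
}

for method, total in total_offset.items():
    print(f"{method:12s} total stop offset: {total}")
```

Under this definition Phobius and SignalP hit every stop position exactly, while PolyPhobius and SPOCTOPUS are off by a few residues in total.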
||
+ | <br><br> |
||
+ | * Comparison of the combined methods |
||
+ | <br><br> |
||
+ | The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons. |
||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
− | {| |
||
+ | |rowspan="2" | |
||
− | | [[Image:hexa_phobius_vs_real.png|thumb|Comparison between the prediction of Phobius and the real protein]] |
||
+ | |rowspan="2" | |
||
− | | [[Image:hexa_poly_vs_real.png|thumb|Comparison between the prediction of PolyPhobius and the real protein]] |
||
+ | |colspan="3" | methods |
||
+ | |- |
||
+ | |Phobius |
||
+ | |PolyPhobius |
||
+ | |SPOCTOPUS |
||
+ | |- |
||
+ | |rowspan="3" | HEXA_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |3 |
||
+ | |2 |
||
+ | |- |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | BACR_HALSA |
||
+ | |#wrong predicted residues (TM) |
||
+ | |29 |
||
+ | |17 |
||
+ | |17 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |n.a. |
||
+ | |n.a. |
||
+ | |n.a. |
||
+ | |- |
||
+ | |location |
||
+ | |n.a |
||
+ | |n.a |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | RET4_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | INSL5_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |- |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | LAMP1_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |3 |
||
+ | |4 |
||
+ | |3 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |wrong |
||
+ | |wrong |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | A4_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |1 |
||
+ | |1 |
||
+ | |3 |
||
+ | |- |
||
+ | |location |
||
+ | |wrong |
||
+ | |wrong |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | Average |
||
+ | |- |
||
+ | |rowspan="3" | |
||
+ | |avg(#wrong predicted residues (TM)) |
||
+ | |5.3 |
||
+ | |3.5 |
||
+ | |3.3 |
||
+ | |- |
||
+ | |avg(#wrong predicted residues (SP)) |
||
+ | |0.1 |
||
+ | |0.6 |
||
+ | |1 |
||
+ | |- |
||
+ | |#location (right predicted) / #location(predicted) |
||
+ | |3/5 |
||
+ | |3/5 |
||
+ | |no prediction |
||
+ | |- |
||
|} |
|} |
||
+ | In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position a little worse than Phobius, its transmembrane prediction is clearly better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.<br> |
||
− | The prediction of Phobius is a little better than the PolyPhobius prediction, because Phobius predicts the beginning and the end of the signal peptide correctly, whereas PolyPhobius cuts two residues off the signal peptide. |
||
+ | Therefore, PolyPhobius seems a good choice; on average it is the best method for combined transmembrane and signal peptide prediction.<br><br> |
||
− | |||
+ | <br><br> |
||
− | |||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
− | * BACR_HALSA |
||
+ | ==== Signal Peptide ==== |
||
− | [[Image:bacr_halsa_phobius.png|thumb|Prediction of Phobius for the transmembrane helices and signal peptides of BACR_HALSA]] |
||
− | [[Image:bacr_halsa_polyphobius.png|thumb|Prediction of PolyPhobius for the transmembrane helices and signal peptides of BACR_HALSA]] |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | | |
||
− | !colspan="3"|'''Phobius''' |
||
− | |colspan=" |
+ | |colspan="2" | Phobius |
+ | |colspan="2" | PolyPhobius |
||
+ | |colspan="2" | SPOCTOPUS |
||
+ | |colspan="1" | TargetP |
||
+ | |colspan="2" | SignalP |
||
|- |
|- |
||
+ | |protein |
||
|start position |
|start position |
||
|end position |
|end position |
||
− | |prediction |
||
|start position |
|start position |
||
|end position |
|end position |
||
+ | |start position |
||
− | |prediction |
||
+ | |end position |
||
+ | |location |
||
+ | |start position |
||
+ | |end position |
||
|- |
|- |
||
+ | |HEXA HUMAN |
||
− | !colspan="6" | Signal peptide prediction |
||
+ | |1 |
||
+ | |22 |
||
+ | |1 |
||
+ | |19 |
||
+ | |7 |
||
+ | |21 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |22 |
||
|- |
|- |
||
+ | |BACR HALSA |
||
− | |colspan="6" | No prediction available |
||
+ | |colspan="6" | no prediction available |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |38 |
||
|- |
|- |
||
+ | |RET4 HUMAN |
||
− | !colspan="6" | Transmembrane helices prediction |
||
+ | |1 |
||
+ | |18 |
||
+ | |1 |
||
+ | |18 |
||
+ | |6 |
||
+ | |19 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |18 |
||
|- |
|- |
||
+ | |INSL5 HUMAN |
||
+ | |1 |
||
+ | |22 |
||
+ | |1 |
||
+ | |22 |
||
+ | |6 |
||
|23 |
|23 |
||
+ | |secretory pathway |
||
− | |42 |
||
+ | |1 |
||
− | |TM helix |
||
|22 |
|22 |
||
− | |43 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |LAMP1 HUMAN |
||
− | |43 |
||
− | | |
+ | |1 |
+ | |28 |
||
− | |inside |
||
− | | |
+ | |1 |
− | | |
+ | |28 |
+ | |12 |
||
− | |inside |
||
+ | |29 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |28 |
||
|- |
|- |
||
+ | |A4 HUMAN |
||
− | |54 |
||
− | | |
+ | |1 |
+ | |17 |
||
− | |TM helix |
||
− | | |
+ | |1 |
− | | |
+ | |17 |
+ | |5 |
||
− | |TM helix |
||
+ | |18 |
||
+ | |secretory pathway |
||
+ | |1 |
||
+ | |15 |
||
|- |
|- |
||
− | | |
+ | |} |
+ | <br> |
||
− | |95 |
||
+ | The last table lists the signal peptide predictions of the different methods.<br><br> |
||
− | |outside |
||
+ | |||
− | |78 |
||
+ | === Comparison of the different methods === |
||
− | |94 |
||
+ | <br><br> |
||
− | |outside |
||
+ | We decided to split the comparison of the methods, because it is unfair to directly compare a method that cannot predict signal peptides with one that can. Therefore, we split it into one comparison for transmembrane helices, one for signal peptides and one for the combination of both. |
||
+ | <br><br> |
||
+ | * Comparison of transmembrane helix prediction |
||
+ | <br><br> |
||
+ | Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which are signal peptides, because all transmembrane-only prediction methods predicted these regions as transmembrane helices, which is wrong. |
||
+ | <br> |
||
+ | For this comparison we counted the residues wrongly predicted as transmembrane, the residues wrongly predicted as outside and the residues wrongly predicted as inside. |
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |rowspan="2" | |
||
+ | |rowspan="2" | |
||
+ | |colspan="5" | methods |
||
+ | |rowspan="1" | |
||
|- |
|- |
||
+ | |TMHMM |
||
− | |96 |
||
+ | |Phobius |
||
− | |114 |
||
+ | |PolyPhobius |
||
− | |TM helix |
||
+ | |OCTOPUS |
||
− | |95 |
||
+ | |SPOCTOPUS |
||
− | |114 |
||
+ | |Transmembrane protein |
||
− | |TM helix |
||
|- |
|- |
||
+ | |rowspan="5" | HEXA_HUMAN |
||
− | |115 |
||
+ | |#wrong transmembrane |
||
− | |120 |
||
+ | |0 |
||
− | |inside |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
+ | |0 |
||
− | |inside |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
|- |
|- |
||
+ | |#wrong outside |
||
− | |121 |
||
− | | |
+ | |0 |
+ | |0 |
||
− | |TM helix |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
− | | |
+ | |0 |
|- |
|- |
||
+ | |#wrong inside |
||
− | |143 |
||
− | | |
+ | |0 |
+ | |0 |
||
− | |outside |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
+ | |0 |
||
− | |outside |
||
|- |
|- |
||
+ | |#wrong sum |
||
− | |148 |
||
− | | |
+ | |0 |
+ | |0 |
||
− | |TM helix |
||
− | | |
+ | |0 |
− | | |
+ | |0 |
+ | |0 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |%wrong predicted |
||
− | |170 |
||
− | | |
+ | |0% |
+ | |0% |
||
− | |inside |
||
− | | |
+ | |0% |
− | | |
+ | |0% |
+ | |0% |
||
− | |inside |
||
+ | |- |
||
+ | !colspan="8" | |
||
|- |
|- |
||
+ | |rowspan="5" | BACR_HALSA |
||
− | |190 |
||
+ | |#wrong transmembrane |
||
− | |212 |
||
+ | |24 |
||
− | |TM helix |
||
− | | |
+ | |20 |
− | | |
+ | |12 |
+ | |16 |
||
− | |TM helix |
||
+ | |11 |
||
+ | |rowspan="5" | yes (7 transmembrane helices) |
||
|- |
|- |
||
+ | |#wrong outside |
||
− | |213 |
||
− | | |
+ | |46 |
+ | |5 |
||
− | |outside |
||
− | | |
+ | |3 |
− | | |
+ | |4 |
+ | |6 |
||
− | |outside |
||
|- |
|- |
||
+ | |#wrong inside |
||
− | |218 |
||
− | | |
+ | |4 |
+ | |4 |
||
− | |TM helix |
||
− | | |
+ | |2 |
− | | |
+ | |0 |
+ | |0 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |#wrong sum |
||
− | |238 |
||
− | | |
+ | |74 |
+ | |29 |
||
− | |inside |
||
− | | |
+ | |17 |
− | | |
+ | |20 |
+ | |17 |
||
− | |inside |
||
|- |
|- |
||
+ | |%wrong predicted |
||
+ | |29% |
||
+ | |11% |
||
+ | |6% |
||
+ | |8% |
||
+ | |6% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | RET4_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |5 |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |5 |
||
+ | |0 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |2% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | INSL5_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |10 |
||
+ | |0 |
||
+ | |rowspan="5" | no |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |10 |
||
+ | |0 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |8% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | LAMP1_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |5 |
||
+ | |3 |
||
+ | |4 |
||
+ | |3 |
||
+ | |1 |
||
+ | |rowspan="5" | yes (single-spanning) |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |2 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |7 |
||
+ | |3 |
||
+ | |4 |
||
+ | |5 |
||
+ | |3 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |2% |
||
+ | |0% |
||
+ | |1% |
||
+ | |1% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="5" | A4_HUMAN |
||
+ | |#wrong transmembrane |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |rowspan="5" | yes (single-spanning) |
||
+ | |- |
||
+ | |#wrong outside |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |2 |
||
+ | |- |
||
+ | |#wrong inside |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |1 |
||
+ | |- |
||
+ | |#wrong sum |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |2 |
||
+ | |3 |
||
+ | |- |
||
+ | |%wrong predicted |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |0% |
||
+ | |- |
||
+ | !colspan="8" | Average number of wrong predicted residues |
||
+ | |- |
||
+ | | |
||
+ | | |
||
+ | |13.6 |
||
+ | |5.5 |
||
+ | |3.6 |
||
+ | |7 |
||
+ | |3.8 |
||
+ | | |
||
|} |
|} |
||
+ | TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, because TMHMM is the only prediction method that does not recognize the 7 transmembrane helices. |
||
− | Neither method predicts a signal peptide, but both recognize that this protein is a transmembrane protein with seven helices. The predictions differ only at the start and end positions of the helices, by about 1 to 3 residues. |
||
+ | SPOCTOPUS and PolyPhobius are the best prediction methods.<br><br> |
||
+ | In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein. |
||
+ | <br><br> |
||
+ | * Comparison of signal peptide prediction |
||
+ | <br><br> |
||
+ | Now we compared TargetP and SignalP which can only predict signal peptides. Furthermore we compared SPOCTOPUS, Phobius and PolyPhobius. |
||
+ | TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein. |
||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
− | To evaluate the predictions, we compared them with the real transmembrane helices. |
||
+ | |rowspan="2" | |
||
+ | |rowspan="2" | |
||
+ | |colspan="6" | methods |
||
+ | |- |
||
+ | |real position |
||
+ | |Phobius |
||
+ | |PolyPhobius |
||
+ | |SPOCTOPUS |
||
+ | |TargetP |
||
+ | |SignalP |
||
+ | |- |
||
+ | |rowspan="3" | HEXA_HUMAN |
||
+ | |stop position |
||
+ | |22 |
||
+ | |22 |
||
+ | |19 |
||
+ | |21 |
||
+ | |no prediction |
||
+ | |22 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |3 |
||
+ | |3 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | BACR_HALSA |
||
+ | |stop position |
||
+ | |not available |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no prediction |
||
+ | |no consensus prediction |
||
+ | |- |
||
+ | |#wrong predicted |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |no prediction |
||
+ | |not available |
||
+ | |- |
||
+ | |location |
||
+ | |membrane |
||
+ | |not available |
||
+ | |not available |
||
+ | |not available |
||
+ | |secretory pathway |
||
+ | |non-signal peptide |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | RET4_HUMAN |
||
+ | |stop position |
||
+ | |18 |
||
+ | |18 |
||
+ | |18 |
||
+ | |19 |
||
+ | |no prediction |
||
+ | |18 |
||
+ | |- |
||
+ | |#wrong predicted |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
− | {| |
||
+ | |- |
||
− | | [[Image:bacr_halsa_phobius_vs_real.png|thumb|Comparison between the prediction of Phobius and the real protein]] |
||
+ | |rowspan="3" | INSL5_HUMAN |
||
− | | [[Image:bacr_halsa_poly_vs_real.png|thumb|Comparison between the prediction of PolyPhobius and the real protein]] |
||
+ | |stop position |
||
+ | |22 |
||
+ | |22 |
||
+ | |22 |
||
+ | |22 |
||
+ | |no prediction |
||
+ | |22 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | LAMP1_HUMAN |
||
+ | |stop position |
||
+ | |28 |
||
+ | |28 |
||
+ | |28 |
||
+ | |29 |
||
+ | |no prediction |
||
+ | |28 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |transmembrane helix |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="8" | |
||
+ | |- |
||
+ | |rowspan="3" | A4_HUMAN |
||
+ | |stop position |
||
+ | |17 |
||
+ | |17 |
||
+ | |17 |
||
+ | |18 |
||
+ | |no prediction |
||
+ | |17 |
||
+ | |- |
||
+ | |#wrong residues |
||
+ | | |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |transmembrane helix |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |no prediction |
||
+ | |secretory pathway |
||
+ | |secretory pathway |
||
+ | |- |
||
+ | !colspan="8" | Average number of wrong prediction |
||
+ | |- |
||
+ | |rowspan="2" | |
||
+ | |sum of wrong predicted residues |
||
+ | | |
||
+ | |0 |
||
+ | |3 |
||
+ | |2 |
||
+ | |no prediction |
||
+ | |0 |
||
+ | |- |
||
+ | |#right predicted locations / #predicted locations |
||
+ | | |
||
+ | |3/5 |
||
+ | |3/5 |
||
+ | |no prediction |
||
+ | |3/5 |
||
+ | |no prediction |
||
|} |
|} |
||
+ | SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether the sequence contains a signal peptide at all. |
||
− | |||
+ | In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.<br> |
||
− | === OCTOPUS and SPOCTOPUS === |
||
+ | Therefore, it is difficult to compare the different methods. Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both the location and the position. On average, they predict the location and the position as well as the other prediction methods. None of the methods recognized the transmembrane proteins as such; all methods assigned them to the secretory pathway. Overall, it is useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices. |
||
− | |||
+ | The results of Phobius were a little better than the results of PolyPhobius.<br> |
||
− | *HEXA_HUMAN |
||
+ | We also want to mention that SignalP offers a choice between prediction modes for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three prediction modes for this protein, and all three failed: BACR_HALSA does not possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognized a signal anchor for BACR_HALSA, whereas the other two modes could not give a prediction of the location.<br><br> |
||
− | [[Image:hexa_human_octopus.png|thumb|Prediction of OCTOPUS for the transmembrane helices of HEXA_HUMAN]] |
||
+ | <br><br> |
||
− | [[Image:hexa_human_spoctopus.png|thumb|Prediction of SPOCTOPUS for the transmembrane helices of HEXA_HUMAN]] |
||
+ | * Comparison of the combined methods |
||
+ | <br><br> |
||
+ | Finally, we compared the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons. |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |rowspan="2" | |
||
− | !colspan="3"|'''OCTOPUS''' |
||
+ | |rowspan="2" | |
||
− | |colspan="3"|'''SPOCTOPUS''' |
||
+ | |colspan="3" | methods |
||
|- |
|- |
||
+ | |Phobius |
||
− | |start position |
||
+ | |PolyPhobius |
||
− | |end position |
||
+ | |SPOCTOPUS |
||
− | |prediction |
||
− | |start position |
||
− | |end position |
||
− | |prediction |
||
|- |
|- |
||
+ | |rowspan="3" | HEXA_HUMAN |
||
− | |1 |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |3 |
||
|2 |
|2 |
||
+ | |- |
||
− | |inside |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | BACR_HALSA |
||
+ | |#wrong predicted residues (TM) |
||
+ | |29 |
||
+ | |17 |
||
+ | |17 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |n.a. |
||
+ | |n.a. |
||
+ | |n.a. |
||
+ | |- |
||
+ | |location |
||
+ | |n.a |
||
+ | |n.a |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | RET4_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | INSL5_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |0 |
||
+ | |0 |
||
|1 |
|1 |
||
− | |6 |
||
− | |N-terminal of a signal peptide |
||
|- |
|- |
||
+ | |location |
||
+ | |right |
||
+ | |right |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | LAMP1_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |3 |
||
+ | |4 |
||
|3 |
|3 |
||
− | |23 |
||
− | |TM helix |
||
− | |7 |
||
− | |21 |
||
− | |signal peptide |
||
|- |
|- |
||
+ | |#wrong predicted residues (SP) |
||
− | |24 |
||
− | | |
+ | |0 |
+ | |0 |
||
− | |outside |
||
− | | |
+ | |0 |
− | | |
+ | |- |
+ | |location |
||
− | |outside |
||
+ | |wrong |
||
+ | |wrong |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | |
||
+ | |- |
||
+ | |rowspan="3" | A4_HUMAN |
||
+ | |#wrong predicted residues (TM) |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |- |
||
+ | |#wrong predicted residues (SP) |
||
+ | |1 |
||
+ | |1 |
||
+ | |3 |
||
+ | |- |
||
+ | |location |
||
+ | |wrong |
||
+ | |wrong |
||
+ | |no prediction |
||
+ | |- |
||
+ | !colspan="5" | Average |
||
+ | |- |
||
+ | |rowspan="3" | |
||
+ | |avg(#wrong predicted residues (TM)) |
||
+ | |5.3 |
||
+ | |3.5 |
||
+ | |3.3 |
||
+ | |- |
||
+ | |avg(#wrong predicted residues (SP)) |
||
+ | |0.1 |
||
+ | |0.6 |
||
+ | |1 |
||
+ | |- |
||
+ | |#location (right predicted) / #location(predicted) |
||
+ | |3/5 |
||
+ | |3/5 |
||
+ | |no prediction |
||
|- |
|- |
||
|} |
|} |
||
+ | In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but SPOCTOPUS does not predict the location of the protein.<br> |
||
− | The results of these two predictions differ. |
||
+ | Therefore, PolyPhobius seems a good choice; on average it is the best method for combined transmembrane and signal peptide prediction.<br><br> |
||
− | OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts at the same location a signale peptide. |
||
− | <br> |
||
− | To check which methods predicted right, we compared the protein and the prediction. |
||
+ | == Prediction of GO terms == |
||
+ | Before we start with our analysis, we decided to check the GO annotations for the six sequences, which can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_annotation_of_the_proteins here]]: |
||
− | {| |
||
− | | [[Image:hexa_human_octopus_vs_real.png|thumb|Comparison between the prediction of OCTOPUS and the real protein]] |
||
− | | [[Image:hexa_human_spoctopus_vs_real.png|thumb|Comparison between the prediction of SPOCTOPUS and the real protein]] |
||
− | |} |
||
+ | A detailed list of the GO annotation terms of each protein can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Go_annotations_here here]]. |
||
− | SPOCTOPUS gave us the better result, because SPOCTOPUS recognices the signal peptide, whereas OCTOPUS predicts instaed a transmembrane helix. |
||
+ | |||
− | <br> |
||
+ | === Results === |
||
+ | |||
+ | We created a separate result page for each protein. Unfortunately, the results cannot be summarized briefly, so please see the individual result pages for the detailed output. |
||
+ | |||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_HEXA_HUMAN HEXA HUMAN]] |
||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_BACR_HALSA BACR HALSA]] |
||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_RET4_HUMAN RET4 HUMAN]] |
||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_INSL5_HUMAN INSL5 HUMAN]] |
||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_LAMP1_HUMAN LAMP1 HUMAN]] |
||
+ | *[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_A4_HUMAN A4 HUMAN]] |
||
+ | <br><br> |
||
+ | === Comparison of the different methods === |
||
− | * BACR_HALSA |
||
+ | <br><br> |
||
− | [[Image:bacr_halsa_octopus.png|thumb|Prediction of OCTOPUS for the transmembrane helices of BACR_HALSA]] |
||
+ | It is difficult to compare these methods. First, two of the methods are homology-based, whereas ProtFun is an ab initio method, so it is to be expected that the results differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO numbers; the other two methods only predict approximate functions and processes.<br> |
||
− | [[Image:bacr_halsa_spoctopus.png|thumb|Prediction of SPOCTOPUS for the transmembrane helices of BACR_HALSA]] |
||
+ | Therefore, to compare the results, we calculated the fraction of correct predictions among all predictions and the ratio between correct predictions and annotated GO terms.<br><br> |
||
{| border="1" style="text-align:center; border-spacing:0;" |
{| border="1" style="text-align:center; border-spacing:0;" |
||
+ | | |
||
− | !colspan="3"|'''OCTOPUS''' |
||
+ | | |
||
− | |colspan="3"|'''SPOCTOPUS''' |
||
+ | |colspan="4" | methods |
||
|- |
|- |
||
+ | | |
||
− | |start position |
||
+ | | |
||
− | |end position |
||
+ | |GOPET terms |
||
− | |prediction |
||
+ | |GOPET GOids |
||
− | |start position |
||
+ | |Pfam |
||
− | |end position |
||
+ | |ProtFun |
||
− | |prediction |
||
|- |
|- |
||
+ | |rowspan="6" | HEXA_HUMAN |
||
+ | |#true positive |
||
+ | |7 |
||
+ | |7 |
||
+ | |2 |
||
+ | |31 |
||
+ | |- |
||
+ | |#false positive |
||
|1 |
|1 |
||
− | |22 |
||
− | |outside |
||
|1 |
|1 |
||
− | | |
+ | |0 |
+ | |3 |
||
− | |outside |
||
|- |
|- |
||
+ | |#predictions |
||
− | |23 |
||
− | | |
+ | |8 |
+ | |8 |
||
− | |TM helix |
||
− | | |
+ | |2 |
− | | |
+ | |34 |
− | |TM helix |
||
|- |
|- |
||
+ | |#GO terms |
||
− | |44 |
||
+ | |colspan="4" | 25 |
||
− | |54 |
||
− | |inside |
||
− | |44 |
||
− | |54 |
||
− | |inside |
||
|- |
|- |
||
+ | |true positive (in %) |
||
− | |55 |
||
+ | |0.87 |
||
− | |75 |
||
+ | |0.87 |
||
− | |TM helix |
||
− | | |
+ | |1 |
+ | |0.91 |
||
− | |75 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |ratio true positive/annotated GO terms |
||
− | |76 |
||
+ | |0.28 |
||
− | |95 |
||
+ | |0.28 |
||
− | |outside |
||
+ | |0.08 |
||
− | |76 |
||
+ | |not possible |
||
− | |95 |
||
− | |outside |
||
|- |
|- |
||
+ | |rowspan="6" | BACR_HALSA |
||
− | |96 |
||
+ | |#true positive |
||
− | |116 |
||
+ | |2 |
||
− | |TM helix |
||
− | | |
+ | |1 |
− | | |
+ | |1 |
+ | |30 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |#false positive |
||
− | |117 |
||
− | | |
+ | |1 |
+ | |2 |
||
− | |inside |
||
− | | |
+ | |0 |
− | | |
+ | |4 |
− | |inside |
||
|- |
|- |
||
+ | |#predictions |
||
− | |122 |
||
− | | |
+ | |3 |
+ | |3 |
||
− | |TM helix |
||
− | | |
+ | |1 |
− | | |
+ | |34 |
− | |TM helix |
||
|- |
|- |
||
+ | |#GO terms |
||
− | |143 |
||
+ | |colspan="4" | 12 |
||
− | |147 |
||
− | |outside |
||
− | |142 |
||
− | |147 |
||
− | |outside |
||
|- |
|- |
||
+ | |true positive (in %) |
||
− | |148 |
||
+ | |0.66 |
||
− | |168 |
||
+ | |0.33 |
||
− | |TM helix |
||
− | | |
+ | |1 |
+ | |0.88 |
||
− | |168 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |ratio true positive/annotated GO terms |
||
− | |169 |
||
+ | |0.16 |
||
− | |185 |
||
+ | |0.08 |
||
− | |inside |
||
+ | |0.08 |
||
− | |169 |
||
+ | |not possible |
||
− | |185 |
||
− | |inside |
||
|- |
|- |
||
+ | |rowspan="6" | RET4_HUMAN |
||
− | |186 |
||
+ | |#true positive |
||
− | |206 |
||
+ | |5 |
||
− | |TM helix |
||
− | | |
+ | |5 |
− | | |
+ | |1 |
+ | |30 |
||
− | |TM helix |
||
|- |
|- |
||
+ | |#false positive |
||
− | |207 |
||
− | | |
+ | |3 |
+ | |3 |
||
− | |outside |
||
− | | |
+ | |0 |
− | | |
+ | |4 |
− | |outside |
||
|- |
|- |
||
+ | |#predictions |
||
− | |217 |
||
− | | |
+ | |8 |
+ | |8 |
||
− | |TM helix |
||
− | | |
+ | |1 |
− | | |
+ | |34 |
− | |TM helix |
||
|- |
|- |
||
+ | |#GO terms |
||
− | |238 |
||
+ | |colspan="4" | 41 |
||
− | |262 |
||
− | |inside |
||
− | |238 |
||
− | |262 |
||
− | |inside |
||
|- |
|- |
||
+ | |true positive (in %) |
||
+ | |0.62 |
||
+ | |0.62 |
||
+ | |1 |
||
+ | |0.88 |
||
+ | |- |
||
+ | |ratio true positive/annotated GO terms |
||
+ | |0.12 |
||
+ | |0.12 |
||
+ | |0.02 |
||
+ | |not possible |
||
+ | |- |
||
+ | |rowspan="6" | INSL5_HUMAN |
||
+ | |#true positive |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |32 |
||
+ | |- |
||
+ | |#false positive |
||
+ | |0 |
||
+ | |0 |
||
+ | |0 |
||
+ | |2 |
||
+ | |- |
||
+ | |#predictions |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |34 |
||
+ | |- |
||
+ | |#GO terms |
||
+ | |colspan="4" | 4 |
||
+ | |- |
||
+ | |true positive (in %) |
||
+ | |1 |
||
+ | |1 |
||
+ | |1 |
||
+ | |0.94 |
||
+ | |- |
||
+ | |ratio true positive/annotated GO terms |
||
+ | |0.25 |
||
+ | |0.25 |
||
+ | |0.25 |
||
+ | |not possible |
||
+ | |- |
||
+ | |rowspan="6" | LAMP1_HUMAN |
||
+ | |#true positive |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |33 |
||
+ | |- |
||
+ | |#false positive |
||
+ | |2 |
||
+ | |2 |
||
+ | |0 |
||
+ | |1 |
||
+ | |- |
||
+ | |#predictions |
||
+ | |2 |
||
+ | |2 |
||
+ | |1 |
||
+ | |34 |
||
+ | |- |
||
+ | |#GO terms |
||
+ | |colspan="4" | 17 |
||
+ | |- |
||
+ | |true positive (in %) |
||
+ | |0 |
||
+ | |0 |
||
+ | |1 |
||
+ | |0.97 |
||
+ | |- |
||
+ | |ratio true positive/annotated GO terms |
||
+ | |0 |
||
+ | |0 |
||
+ | |0.05 |
||
+ | |not possible |
||
+ | |- |
||
+ | |rowspan="6" | A4_HUMAN |
||
+ | |#true positive |
||
+ | |7 |
||
+ | |7 |
||
+ | |6 |
||
+ | |33 |
||
+ | |- |
||
+ | |#false positive |
||
+ | |6 |
||
+ | |6 |
||
+ | |0 |
||
+ | |1 |
||
+ | |- |
||
+ | |#predictions |
||
+ | |13 |
||
+ | |13 |
||
+ | |6 |
||
+ | |34 |
||
+ | |- |
||
+ | |#GO terms |
||
+ | |colspan="4" | 78 |
||
+ | |- |
||
+ | |true positive (in %) |
||
+ | |0.53 |
||
+ | |0.53 |
||
+ | |1 |
||
+ | |0.97 |
||
+ | |- |
||
+ | |ratio true positive/annotated GO terms |
||
+ | |0.08 |
||
+ | |0.08 |
||
+ | |0.07 |
||
+ | |not possible |
||
|} |
|} |
||
+ | As you can see in the table above, each method predicts only a small subset of the annotated GO terms. In general, GOPET seems to be the best method, because it is the only method that predicts actual GO terms and, overall, it mostly has the best ratio of true positives to annotated GO terms. Furthermore, it also predicts more GO terms than the other methods.<br> |
||
− | Both methods have a very very similar result, which is identical with the exception of some residues. Both predicted the seven transmembrane helices, which is a very good result. |
||
+ | It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method uses a fixed set of predefined categories and only predicts the probability that the protein belongs to each of them. <br> |
||
− | <br> |
||
+ | In general, GO term prediction does not work very well, and the prediction results only give hints about the function and localization of the protein.<br><br> |
||
− | Next we analysed the overlap between the real occuring transmembrane helices and the prediction results. |
||
+ | Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br> |
||
− | |||
− | {| |
||
− | | [[Image:bacr_halsa_octopus_vs_real.png|thumb|Comparison between the prediction of OCTOPUS and the real protein]] |
||
− | | [[Image:bacr_halsa_spoctopus_vs_real.png|thumb|Comparison between the prediction of SPOCTOPUS and the real protein]] |
||
− | |} |
||
− | |||
− | == Prediction of GO terms == |
Latest revision as of 22:27, 30 August 2011
General Information
Secondary Structure Prediction
To analyse the secondary structure of our protein we used three methods: PSIPRED, Jpred3 and DSSP. In the analysis section of this page we compare these methods to see whether they give similar results or differ substantially.
[Here] you can find some general information about these methods.
Back to [Tay-Sachs Disease]
Prediction of disordered regions
After analysing the secondary structure, we also looked at disordered regions in this protein. For this we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar, and we decide which method seems to be the best one for our purpose.
To give more insight into the methods and the theory behind them, we also offer a [general information page].
Back to [Tay-Sachs Disease]
Prediction of transmembrane helices and signal peptides
The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged the two into one section, because several prediction methods can predict both.
We used several methods: some that predict only transmembrane helices, some that predict only signal peptides, and some combined methods.
To have a closer look at the different methods we again provide an [information page.]
Back to [Tay-Sachs Disease]
Prediction of GO Terms
The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.
Again, we also provide a [general information page] about the GO term methods we used in our analysis.
Back to [Tay-Sachs Disease]
Secondary Structure prediction
Results
The detailed output of the different prediction methods can be found [here]
Here we only present a short summary of the output of the different methods.
- Predicted Helices
method | #helices |
PSIPRED | 14 |
Jpred3 | 14 |
DSSP | 16 |
- Predicted Beta-Sheets
method | #sheets |
PSIPRED | 15 |
Jpred3 | 15 |
DSSP | 0 |
Comparison of the different methods
To determine how successful our secondary structure predictions with PSIPRED and Jpred3 were, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some beta-sheets. Therefore, the main comparison in this case refers to the alpha-helices.
For PSIPRED, the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. There is only one helix predicted by PSIPRED that is not assigned as a helix by DSSP. Furthermore, there are three helices assigned by DSSP that were not predicted by PSIPRED. Most of the helices that appear in only one of the two outputs are very small.
For Jpred3, the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. There are only two helices predicted by Jpred3 that are not assigned by DSSP. Conversely, there are three small helices assigned by DSSP that are not predicted by Jpred3. In one further case, DSSP assigns two helices separated by a turn, where Jpred3 predicts a single long helix.
All in all, the prediction of the helices is good, because the predicted helices mostly correspond to the DSSP assignment. The only negative aspect is that both prediction methods predict many sheets that were not assigned by DSSP at all.
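The helix-by-helix comparison described above can be sketched in code. This is only a minimal illustration, assuming that the output of each method is available as a per-residue string in which `H` marks a helix residue. The function names and the toy strings are hypothetical and are not taken from the actual PSIPRED, Jpred3 or DSSP outputs. A helix counts as matched here if it overlaps any helix in the other output, which is our assumption about how "corresponding" helices could be defined.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 0-based, inclusive (start, end)


def helix_segments(ss: str, helix_char: str = "H") -> List[Segment]:
    """Return the maximal runs of helix_char in a per-residue string."""
    segments: List[Segment] = []
    start = None
    for i, state in enumerate(ss):
        if state == helix_char and start is None:
            start = i
        elif state != helix_char and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(ss) - 1))
    return segments


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two inclusive segments share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def unmatched(query: List[Segment], reference: List[Segment]) -> List[Segment]:
    """Segments of query that overlap no segment of reference."""
    return [q for q in query if not any(overlaps(q, r) for r in reference)]


# Toy strings (hypothetical, not the real HEXA_HUMAN data).
predicted = "CC" + "H" * 5 + "C" * 4 + "H" * 4 + "C" * 8
assigned = "CC" + "H" * 4 + "C" * 11 + "H" * 4 + "CC"

predicted_helices = helix_segments(predicted)
assigned_helices = helix_segments(assigned)

only_predicted = unmatched(predicted_helices, assigned_helices)
only_assigned = unmatched(assigned_helices, predicted_helices)
print("predicted but not assigned:", only_predicted)
print("assigned but not predicted:", only_assigned)
```

Counting the entries of `only_predicted` and `only_assigned` gives the kind of numbers quoted in the text (helices present in only one of the two outputs).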
Back to [Tay-Sachs Disease]
Prediction of disordered regions
Before analysing the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the [DisProt database] and did not find it, so we conclude that our protein does not have any disordered regions. Another way to find out whether a protein has disordered regions is to check whether its [UniProt] entry contains a cross-reference to [DisProt].
Results
The detailed results of the different methods can be found [here]
In this section, we only want to give a summary of the output of the different methods.
method | #disordered regions in the protein | #disordered regions on the brink |
Disopred | 0 | 2 |
POODLE-I | 3 | 2 |
POODLE-L | 0 | 0 |
POODLE-S (B-factors) | 3 | 2 |
POODLE-S (missing residues) | 4 | 2 |
IUPred (short) | 0 | 2 |
IUPred (long) | 0 | 0 |
IUPred (structural information) | 0 | 0 |
Meta-Disorder | 0 | 0 |
Comparison of the different POODLE variants
POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.
Both POODLE-S variants found several short disordered regions, which are false positives. Interestingly, more residues are predicted as missing from the electron density map than as having a high B-factor value.
POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).
Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct.
In general, these predictions are used when nothing is known about the protein, so normally we would not know that the prediction is wrong. We therefore took the results at face value and checked whether the predicted disordered regions overlap with the functionally important residues, because disordered regions seem to be functionally very important. We checked this for POODLE-S (missing residues) and POODLE-I, because POODLE-S (high B-factor) shows the same result as POODLE-I.
functional residues | disordered | |||
---|---|---|---|---|
residue position | amino acid | function | POODLE-S (missing) | POODLE-I |
323 | E | active site | ordered | ordered |
115 | N | Glycosylation | ordered | ordered |
157 | N | Glycosylation | ordered | ordered |
259 | N | Glycosylation | ordered | ordered |
58 (connected with 104) | C | Disulfide bond | disordered | ordered |
104 (connected with 58) | C | Disulfide bond | disordered | ordered |
277 (connected with 328) | C | Disulfide bond | ordered | ordered |
328 (connected with 277) | C | Disulfide bond | ordered | ordered |
505 (connected with 522) | C | Disulfide bond | ordered | ordered |
522 (connected with 505) | C | Disulfide bond | ordered | ordered |
As you can see in the table above, only one disulfide bond is located in a predicted disordered region; all other functionally important residues are located in ordered regions. This is a further indication that the predictions are wrong.
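The overlap check between functional residues and predicted disordered regions could be sketched as follows. The residue positions and functions are taken from the table above. The predicted regions are hypothetical placeholders chosen only for illustration, since the real region boundaries are listed on the detailed results page and not here. The function and variable names are our own.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end)


def is_disordered(position: int, regions: List[Region]) -> bool:
    """True if the residue position lies inside any predicted region."""
    return any(start <= position <= end for start, end in regions)


# Functionally important residues of HEXA_HUMAN (from the table above).
functional_residues: Dict[int, str] = {
    323: "active site",
    115: "glycosylation",
    157: "glycosylation",
    259: "glycosylation",
    58: "disulfide bond",
    104: "disulfide bond",
    277: "disulfide bond",
    328: "disulfide bond",
    505: "disulfide bond",
    522: "disulfide bond",
}

# HYPOTHETICAL predicted disordered regions, for illustration only.
predicted_regions: List[Region] = [(56, 60), (102, 106)]

status = {
    position: "disordered" if is_disordered(position, predicted_regions) else "ordered"
    for position in functional_residues
}

for position in sorted(status):
    print(position, functional_residues[position], status[position])
```

With the real region boundaries of a method substituted for `predicted_regions`, the `status` dictionary corresponds to one of the two "disordered" columns of the table.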
Comparison of the different methods
We decided to compare the results of the different methods. To do so, we counted how many residues are predicted as disordered, which is wrong in our case.
methods | |||||||||
Disopred | POODLE-I | POODLE-L | POODLE-S (missing) | POODLE-S (B-factor) | IUPred (short) | IUPred (long) | IUPred (structure) | Meta-Disorder | |
#wrong predicted residues | 5 | 23 | 0 | 47 | 24 | 3 | 0 | 0 | 0 |
POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder correctly predict no disordered residues.
The worst prediction result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).
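Because the protein is assumed to have no disordered regions, every residue predicted as disordered counts as an error. A minimal sketch of this count is shown below. The regions are toy values, not the output of any of the methods above, and the function name is hypothetical. Overlapping regions are counted only once.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end)


def count_disordered_residues(regions: List[Region]) -> int:
    """Number of distinct residue positions covered by the regions."""
    positions = set()
    for start, end in regions:
        positions.update(range(start, end + 1))
    return len(positions)


# Toy regions (hypothetical); the first two overlap on residues 3 and 4.
toy_regions = [(1, 4), (3, 6), (20, 22)]
wrong_residues = count_disordered_residues(toy_regions)
print("wrongly predicted residues:", wrong_residues)
```

For a protein with known disordered regions, this count would have to be replaced by a per-residue comparison against the reference annotation.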
Back to [Tay-Sachs Disease]
Prediction of transmembrane alpha-helices and signal peptides
Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.
Additional proteins:
name | organism | location | transmembrane protein | sequence |
BACR_HALSA | Halobacterium salinarium (Archaea) | Cell membrane | Multi-pass membrane protein | [P02945.fasta] |
RET4_HUMAN | Human (Homo sapiens) | extracellular space | No | [P02753.fasta] |
INSL5_HUMAN | Human (Homo sapiens) | extracellular region | No | [Q9Y5Q6.fasta] |
LAMP1_HUMAN | Human (Homo sapiens) | Cell membrane | Single-pass membrane protein | [P11279.fasta] |
A4_HUMAN | Human (Homo sapiens) | Cell membrane | Single-pass membrane protein | [P05067.fasta] |
The detailed output for the different proteins and the different prediction methods can be found here:
- [HEXA_HUMAN]
- [BACR_HALSA]
- [RET4_HUMAN]
- [INSL5_HUMAN]
- [LAMP1_HUMAN]
- [A4_HUMAN]
Results
Transmembrane Helices
TMHMM | Phobius | PolyPhobius | OCTOPUS | SPOCTOPUS | |||||||||||
protein | start position | end position | location | start position | end position | location | start position | end position | location | start position | end position | location | start position | end position | location |
HEXA HUMAN | 1 | 529 | outside | 23 | 529 | outside | 20 | 520 | outside | 1 | 2 | inside | 22 | 529 | outside |
3 | 23 | TM helix | |||||||||||||
24 | 529 | outside | |||||||||||||
BACR HALSA | 1 | 22 | outside | 1 | 22 | outside | 1 | 22 | outside | ||||||
23 | 42 | TM Helix | 23 | 42 | TM helix | 22 | 43 | TM helix | 23 | 43 | TM helix | 23 | 43 | TM helix | |
43 | 54 | inside | 43 | 53 | inside | 44 | 54 | inside | 44 | 54 | inside | 44 | 54 | inside | |
55 | 77 | TM Helix | 54 | 76 | TM helix | 55 | 77 | TM helix | 55 | 75 | TM helix | 55 | 75 | TM helix | |
78 | 91 | outside | 77 | 95 | outside | 78 | 94 | outside | 76 | 95 | outside | 76 | 95 | outside | |
92 | 114 | TM Helix | 96 | 114 | TM helix | 95 | 114 | TM helix | 96 | 116 | TM helix | 96 | 116 | TM helix | |
115 | 120 | inside | 115 | 120 | inside | 115 | 120 | inside | 117 | 121 | inside | 117 | 120 | inside | |
121 | 143 | TM Helix | 121 | 142 | TM helix | 121 | 141 | TM helix | 122 | 142 | TM helix | 121 | 141 | TM helix | |
144 | 147 | outside | 143 | 147 | outside | 142 | 147 | outside | 143 | 147 | outside | 142 | 147 | outside | |
148 | 170 | TM Helix | 148 | 169 | TM helix | 148 | 166 | TM helix | 148 | 168 | TM helix | 148 | 168 | TM helix | |
171 | 189 | inside | 170 | 189 | inside | 167 | 186 | inside | 169 | 185 | inside | 169 | 185 | inside | |
190 | 212 | TM Helix | 190 | 212 | TM helix | 187 | 205 | TM helix | 186 | 206 | TM helix | 186 | 206 | TM helix | |
213 | 262 | outside | 213 | 217 | outside | 206 | 215 | outside | 207 | 216 | outside | 207 | 216 | outside | |
218 | 237 | TM helix | 216 | 237 | TM helix | 217 | 237 | TM helix | 217 | 237 | TM helix | ||||
238 | 262 | inside | 238 | 262 | inside | 238 | 262 | inside | 238 | 262 | inside | ||||
RET4 HUMAN | 1 | 1 | inside | ||||||||||||
2 | 23 | TM helix | |||||||||||||
1 | 201 | outside | 19 | 201 | outside | 19 | 201 | outside | 24 | 201 | outside | 20 | 201 | outside | |
INSL5 HUMAN | 1 | 1 | inside | ||||||||||||
2 | 32 | TM helix | |||||||||||||
1 | 135 | outside | 23 | 135 | outside | 23 | 135 | outside | 33 | 135 | outside | 24 | 135 | outside | |
LAMP1 HUMAN | 1 | 10 | inside | 1 | 10 | inside | |||||||||
11 | 33 | TM Helix | 11 | 31 | TM helix | ||||||||||
34 | 383 | outside | 29 | 381 | outside | 29 | 381 | outside | 32 | 383 | outside | 30 | 383 | outside | |
384 | 406 | TM Helix | 382 | 405 | TM helix | 382 | 405 | TM helix | 384 | 404 | TM helix | 384 | 404 | TM helix | |
407 | 417 | inside | 406 | 417 | outside | 406 | 417 | outside | 405 | 417 | outside | 405 | 417 | outside | |
A4 HUMAN | 1 | 5 | outside | ||||||||||||
6 | 11 | R | |||||||||||||
1 | 700 | outside | 18 | 700 | outside | 18 | 700 | outside | 12 | 701 | outside | 19 | 701 | outside | |
701 | 723 | TM Helix | 701 | 723 | TM helix | 701 | 723 | TM helix | 702 | 722 | TM helix | 702 | 722 | TM helix | |
724 | 770 | inside | 724 | 770 | inside | 724 | 770 | inside | 723 | 770 | inside | 723 | 770 | inside |
The table above summarizes the results of the different methods that predict transmembrane helices. As you can see, OCTOPUS often predicts a transmembrane helix where most of the other methods do not predict one. Phobius, PolyPhobius and SPOCTOPUS always show very similar results, whereas TMHMM and OCTOPUS differ from these results.
Signal Peptide
Phobius | PolyPhobius | SPOCTOPUS | TargetP | SignalP | |||||
protein | start position | end position | start position | end position | start position | end position | location | start position | end position |
HEXA HUMAN | 1 | 22 | 1 | 19 | 7 | 21 | secretory pathway | 1 | 22 |
BACR HALSA | no prediction available | secretory pathway | 1 | 38 | |||||
RET4 HUMAN | 1 | 18 | 1 | 18 | 6 | 19 | secretory pathway | 1 | 18 |
INSL5 HUMAN | 1 | 22 | 1 | 22 | 6 | 23 | secretory pathway | 1 | 22 |
LAMP1 HUMAN | 1 | 28 | 1 | 28 | 12 | 29 | secretory pathway | 1 | 28 |
A4 HUMAN | 1 | 17 | 1 | 17 | 5 | 18 | secretory pathway | 1 | 15 |
The last table lists the signal peptide predictions of the different methods. At first glance, the methods almost always predict a signal peptide, although the predicted end positions differ. Phobius, PolyPhobius and SPOCTOPUS gave no signal peptide prediction for BACR_HALSA. Furthermore, TargetP does not predict the position of the signal peptide; it only predicts the location of the protein.
Comparison of the different methods
We split the comparison of the methods, because it would be unfair to directly compare a method that cannot predict signal peptides with one that can. We therefore made three comparisons: one for transmembrane helices, one for signal peptides and one for the combination of both.
- Comparison of transmembrane helix prediction
Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. We skipped the N-terminal residues that belong to the signal peptide, because all methods that only predict transmembrane helices wrongly predicted these regions as transmembrane helices.
For this comparison we counted the residues wrongly predicted as transmembrane, as outside and as inside.
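The counting described above can be sketched in Python as follows. This is only a minimal illustration, not the procedure actually used for the table: the helper names `expand` and `count_wrong`, the `(start, end, label)` segment format and the choice to group errors by the predicted label are assumptions, and the example topology is invented.

```python
from collections import Counter

def expand(segments, length):
    """Turn (start, end, label) segments (1-based, inclusive) into one label per residue."""
    labels = [None] * length
    for start, end, label in segments:
        for pos in range(start, end + 1):
            labels[pos - 1] = label
    return labels

def count_wrong(predicted, reference, length, skip=0):
    """Count mismatching residues, grouped by the (wrong) predicted label.

    skip: number of leading residues to ignore, e.g. a signal peptide.
    """
    pred = expand(predicted, length)
    ref = expand(reference, length)
    wrong = Counter()
    for p, r in zip(pred[skip:], ref[skip:]):
        if p != r:
            wrong[p] += 1
    return wrong

# Invented 30-residue example: the predicted helix starts two residues too early.
reference = [(1, 10, "outside"), (11, 20, "TM"), (21, 30, "inside")]
predicted = [(1, 8, "outside"), (9, 20, "TM"), (21, 30, "inside")]

wrong = count_wrong(predicted, reference, length=30)
print(dict(wrong))  # residues 9 and 10 are wrongly predicted as TM
```

The `skip` argument mirrors the decision to leave out the signal peptide residues before counting.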
methods | |||||||
TMHMM | Phobius | PolyPhobius | OCTOPUS | SPOCTOPUS | Transmembrane protein | ||
HEXA_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 0 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 0 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 0% | 0% | ||
BACR_HALSA | #wrong transmembrane | 24 | 20 | 12 | 16 | 11 | yes (7 transmembrane helices) |
#wrong outside | 46 | 5 | 3 | 4 | 6 | ||
#wrong inside | 4 | 4 | 2 | 0 | 0 | ||
#wrong sum | 74 | 29 | 17 | 20 | 17 | ||
%wrong predicted | 29% | 11% | 6% | 8% | 6% | ||
RET4_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 5 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 5 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 2% | 0% | ||
INSL5_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 10 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 10 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 8% | 0% | ||
LAMP1_HUMAN | #wrong transmembrane | 5 | 3 | 4 | 3 | 1 | yes (single-spanning) |
#wrong outside | 2 | 0 | 0 | 1 | 1 | ||
#wrong inside | 0 | 0 | 0 | 1 | 1 | ||
#wrong sum | 7 | 3 | 4 | 5 | 3 | ||
%wrong predicted | 2% | 0% | 1% | 1% | 0% | ||
A4_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 0 | 0 | yes (single-spanning) |
#wrong outside | 1 | 1 | 1 | 1 | 2 | ||
#wrong inside | 0 | 0 | 0 | 1 | 1 | ||
#wrong sum | 1 | 1 | 1 | 2 | 3 | ||
%wrong predicted | 0% | 0% | 0% | 0% | 0% | ||
Average number of wrongly predicted residues | |||||||
13.6 | 5.5 | 3.6 | 7 | 3.8 |
TMHMM is the worst prediction method, with on average 13.6 wrongly predicted residues per protein. This is most visible for BACR_HALSA, where TMHMM is the only method that does not recognize the 7 transmembrane helices.
SPOCTOPUS and PolyPhobius are the best prediction methods, with on average 3.8 and 3.6 wrongly predicted residues.
In general, the prediction of transmembrane helices works quite well, and almost all predictions are very close to the real topology of the protein.
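As a quick check of the averages, the "#wrong sum" rows of the table above can be averaged over the six proteins. The snippet below is a small sketch; the dictionary layout and variable names are assumptions, while the numbers are copied from the table.

```python
# "#wrong sum" per protein, in table order:
# HEXA, BACR, RET4, INSL5, LAMP1, A4
wrong_sum = {
    "TMHMM":       [0, 74, 0, 0, 7, 1],
    "Phobius":     [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS":     [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS":   [0, 17, 0, 0, 3, 3],
}

# Mean number of wrongly predicted residues per protein for each method.
averages = {method: sum(v) / len(v) for method, v in wrong_sum.items()}

# Print the methods from best (fewest errors) to worst.
for method, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{method}: {avg:.2f}")
```

The computed means (3.67, 3.83, 5.50, 7.00 and 13.67) agree with the table if its values are read as cut off after one decimal place.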
- Comparison of signal peptide prediction
Next we compared TargetP and SignalP, which only predict signal peptides, together with the signal peptide predictions of SPOCTOPUS, Phobius and PolyPhobius.
TargetP does not predict the start and end position of the signal peptide; it only predicts the location of the protein.
methods | |||||||
real position | Phobius | PolyPhobius | SPOCTOPUS | TargetP | SignalP | ||
HEXA_HUMAN | stop position | 22 | 22 | 19 | 21 | no prediction | 22 |
#wrong residues | 0 | 3 | 3 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
BACR_HALSA | stop position | not available | no prediction | no prediction | no prediction | no prediction | no consensus prediction |
#wrong predicted | not available | not available | not available | not available | no prediction | not available | |
location | membrane | not available | not available | not available | secretory pathway | non-signal peptide | |
RET4_HUMAN | stop position | 18 | 18 | 18 | 19 | no prediction | 18 |
#wrong predicted | 0 | 0 | 1 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
INSL5_HUMAN | stop position | 22 | 22 | 22 | 22 | no prediction | 22 |
#wrong residues | 0 | 0 | 0 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
LAMP1_HUMAN | stop position | 28 | 28 | 28 | 29 | no prediction | 28 |
#wrong residues | 0 | 0 | 1 | no prediction | 0 | ||
location | transmembrane helix | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
A4_HUMAN | stop position | 17 | 17 | 17 | 18 | no prediction | 17 |
#wrong residues | 0 | 0 | 1 | no prediction | 0 | ||
location | transmembrane helix | secretory pathway | secretory pathway | no prediction | secretory pathway | secretory pathway | |
Average number of wrong prediction | |||||||
sum of wrong predicted residues | 0 | 3 | 2 | no prediction | 0 | ||
#right predicted locations / #predicted locations | 3/5 | 3/5 | no prediction | 3/5 | no prediction |
SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether the sequence contains a signal peptide at all.
In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
This makes the methods difficult to compare. Phobius and PolyPhobius are more powerful than the other methods, because they predict both. On average they predict the location and the position as well as the other methods. None of the methods recognized the transmembrane proteins; all of them assigned these proteins to the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices.
The results of Phobius were slightly better than those of PolyPhobius.
We also want to mention that SignalP lets you choose between prediction models for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis included BACR_HALSA, which is an archaeal protein. We tested all three models on this protein and all three failed: BACR_HALSA does not possess a signal peptide, but every model predicts one. Only the eukaryotic model recognized a signal anchor for BACR_HALSA, whereas the other two models could not give a prediction of the location.
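The two scores of the signal peptide table can be illustrated for Phobius and PolyPhobius with a short sketch. It assumes that the number of wrong residues is the absolute difference between predicted and real stop position, which is not stated explicitly; the function names are hypothetical, and the values are copied from the table (BACR_HALSA is left out because no positions are available).

```python
real_stop = {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
             "LAMP1_HUMAN": 28, "A4_HUMAN": 17}
predicted_stop = {
    "Phobius":     {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                    "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
    "PolyPhobius": {"HEXA_HUMAN": 19, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                    "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
}
real_location = {"HEXA_HUMAN": "secretory pathway",
                 "RET4_HUMAN": "secretory pathway",
                 "INSL5_HUMAN": "secretory pathway",
                 "LAMP1_HUMAN": "transmembrane helix",
                 "A4_HUMAN": "transmembrane helix"}
# Phobius assigns all five proteins to the secretory pathway.
phobius_location = {protein: "secretory pathway" for protein in real_location}

def stop_errors(predicted, real):
    """Assumed error measure: |predicted stop - real stop| per protein."""
    return {protein: abs(predicted[protein] - real[protein]) for protein in real}

def location_score(predicted, real):
    """Return (#correctly predicted locations, #predicted locations)."""
    right = sum(predicted[p] == real[p] for p in predicted)
    return right, len(predicted)

for method, stops in predicted_stop.items():
    print(method, sum(stop_errors(stops, real_stop).values()))
print(location_score(phobius_location, real_location))
```

Under this assumption the sums are 0 for Phobius and 3 for PolyPhobius, and the location score for Phobius is 3 out of 5, as in the table.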
- Comparison of the combined methods
Finally, we compared the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. We therefore combined our two previous comparisons.
methods | ||||
Phobius | PolyPhobius | SPOCTOPUS | ||
HEXA_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 3 | 2 | |
location | right | right | no prediction | |
BACR_HALSA | #wrong predicted residues (TM) | 29 | 17 | 17 |
#wrong predicted residues (SP) | n.a. | n.a. | n.a. | |
location | n.a | n.a | no prediction | |
RET4_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 0 | 0 | |
location | right | right | no prediction | |
INSL5_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 0 | 1 | |
location | right | right | no prediction | |
LAMP1_HUMAN | #wrong predicted residues (TM) | 3 | 4 | 3 |
#wrong predicted residues (SP) | 0 | 0 | 0 | |
location | wrong | wrong | no prediction | |
A4_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 1 | 1 | 3 | |
location | wrong | wrong | no prediction | |
Average | ||||
avg(#wrong predicted residues (TM)) | 5.3 | 3.5 | 3.3 | |
avg(#wrong predicted residues (SP)) | 0.1 | 0.6 | 1 | |
#location (right predicted) / #location(predicted) | 3/5 | 3/5 | no prediction |
In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems a good choice, as it is on average the best method for combined transmembrane and signal peptide prediction.
Back to [Tay-Sachs Disease]
Prediction of GO terms
Before starting our analysis, we checked the GO annotations of the six sequences, which can be found [here].
A detailed list of the GO annotation terms of each protein can be found [here].
Results
We created a separate result page for each protein. Unfortunately, the results cannot be summarized briefly, so please see the individual result pages for the detailed output.
- [HEXA HUMAN]
- [BACR HALSA]
- [RET4 HUMAN]
- [INSL5 HUMAN]
- [LAMP1 HUMAN]
- [A4 HUMAN]
Comparison of the different methods
It is difficult to compare these methods. First, two of the methods (GOPET and Pfam) are homology-based, whereas ProtFun is an ab initio method, so it is not surprising that the results differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO identifiers; the other two methods only predict approximate functions and processes.
Therefore, to compare the results, we calculated the fraction of correct predictions among all predictions of a method, and the ratio between correct predictions and annotated GO terms.
methods | |||||
GOPET terms | GOPET GOids | Pfam | ProtFun | ||
HEXA_HUMAN | #true positive | 7 | 7 | 2 | 31 |
#false positive | 1 | 1 | 0 | 3 | |
#predictions | 8 | 8 | 2 | 34 | |
#GO terms | 25 | ||||
true positive (fraction of predictions) | 0.87 | 0.87 | 1 | 0.91 | |
ratio true positive/annotated GO terms | 0.28 | 0.28 | 0.08 | not possible | |
BACR_HALSA | #true positive | 2 | 1 | 1 | 30 |
#false positive | 1 | 2 | 0 | 4 | |
#predictions | 3 | 3 | 1 | 34 | |
#GO terms | 12 | ||||
true positive (fraction of predictions) | 0.66 | 0.33 | 1 | 0.88 | |
ratio true positive/annotated GO terms | 0.16 | 0.08 | 0.08 | not possible | |
RET4_HUMAN | #true positive | 5 | 5 | 1 | 30 |
#false positive | 3 | 3 | 0 | 4 | |
#predictions | 8 | 8 | 1 | 34 | |
#GO terms | 41 | ||||
true positive (fraction of predictions) | 0.62 | 0.62 | 1 | 0.88 | |
ratio true positive/annotated GO terms | 0.12 | 0.12 | 0.02 | not possible | |
INSL5_HUMAN | #true positive | 1 | 1 | 1 | 32 |
#false positive | 0 | 0 | 0 | 2 | |
#predictions | 1 | 1 | 1 | 34 | |
#GO terms | 4 | ||||
true positive (fraction of predictions) | 1 | 1 | 1 | 0.94 | |
ratio true positive/annotated GO terms | 0.25 | 0.25 | 0.25 | not possible | |
LAMP1_HUMAN | #true positive | 0 | 0 | 1 | 33 |
#false positive | 2 | 2 | 0 | 1 | |
#predictions | 2 | 2 | 1 | 34 | |
#GO terms | 17 | ||||
true positive (fraction of predictions) | 0 | 0 | 1 | 0.97 | |
ratio true positive/annotated GO terms | 0 | 0 | 0.05 | not possible | |
A4_HUMAN | #true positive | 7 | 7 | 6 | 33 |
#false positive | 6 | 6 | 0 | 1 | |
#predictions | 13 | 13 | 6 | 34 | |
#GO terms | 78 | ||||
true positive (fraction of predictions) | 0.53 | 0.53 | 1 | 0.97 | |
ratio true positive/annotated GO terms | 0.08 | 0.08 | 0.07 | not possible |
As the table above shows, each method predicts only a small subset of the annotated GO terms. Overall, GOPET seems to be the best method: it is the only one that predicts actual GO terms, it mostly has the best ratio of true positives to annotated GO terms, and it predicts more GO terms than the other methods.
The ratio between true positives and annotated GO terms could not be calculated for ProtFun, because this method uses its own predefined categories and only predicts the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well, and the prediction results only give hints about the function and localization of the protein.
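The two ratios used in the comparison correspond to what is commonly called precision (correct predictions divided by all predictions) and recall (correct predictions divided by annotated GO terms). These names and the helper function below are not part of the original analysis; the snippet is only a sketch that recomputes the GOPET values for HEXA_HUMAN from the table.

```python
def precision_recall(true_positives, predictions, annotated):
    """Return (fraction of correct predictions, fraction of annotated terms found)."""
    precision = true_positives / predictions if predictions else 0.0
    recall = true_positives / annotated if annotated else 0.0
    return precision, recall

# GOPET on HEXA_HUMAN: 7 correct out of 8 predictions, 25 annotated GO terms.
precision, recall = precision_recall(true_positives=7, predictions=8, annotated=25)
print(f"precision = {precision:.3f}, recall = {recall:.2f}")
```

This gives 0.875 and 0.28, which matches the table entries 0.87 and 0.28 if the first value is cut off after two decimal places.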
Back to [Tay-Sachs Disease]