Difference between revisions of "Sequence-based predictions HEXA"
(→General Information) |
(→Prediction of disordered regions) |
||
Line 14: | Line 14: | ||
== Prediction of disordered regions == |
== Prediction of disordered regions == |
||
+ | * Disopred |
||
− | * DISOPRED |
||
+ | Disopred predicts two disordered regions in our protein. The first region is at the beginning of the protein (first two residues) and the second region is at the end (last three residues). This prediction is probably not meaningful, because the first and last amino acids of a protein are commonly missing from the electron density map. We therefore conclude that our protein, Hexosaminidase A, has no disordered regions. |
||
− | Authors: Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. |
||
+ | [[Image:disopred_result.png|center|thumb|Result of the Disopred prediction. An asterisk (*) marks an amino acid that belongs to a disordered region, whereas a dot (.) marks an amino acid in a non-disordered region.]] |
||
− | Year: 2004 |
||
− | Source: [[http://www.ncbi.nlm.nih.gov/pubmed/15019783 Prediction and functional analysis of native disorder in proteins from the three kingdoms of life.]] |
||
+ | * POODLE |
||
− | Description: |
||
+ | We decided to test several POODLE variants and to compare the results. |
||
+ | POODLE-I |
||
− | This method is based on a neural network that was trained on high-resolution X-ray structures from the PDB. Disordered regions are regions that appear in the sequence record but whose electrons are missing from the electron density map. This approach can also fail, because missing electron density can also be caused by the crystallization process. |
||
− | The method first runs a PSI-BLAST search against a filtered sequence database. Next, a profile for each residue is calculated and classified using the trained neural network. |
||
+ | POODLE-I predicted five disordered regions: |
||
− | Prediction: |
||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |- |
||
+ | |start position |
||
+ | |end position |
||
+ | |length |
||
+ | |- |
||
+ | |1 |
||
+ | |2 |
||
+ | |2 |
||
+ | |- |
||
+ | |14 |
||
+ | |19 |
||
+ | |6 |
||
+ | |- |
||
+ | |83 |
||
+ | |89 |
||
+ | |7 |
||
+ | |- |
||
+ | |105 |
||
+ | |109 |
||
+ | |5 |
||
+ | |- |
||
+ | |527 |
||
+ | |529 |
||
+ | |3 |
||
+ | |- |
||
+ | |} |
||
− | As a prediction result you get a file with the predicted disordered regions, the precision and the recall. Furthermore, you can get a more detailed output, which shows the sequence, the predictions, and numbers above the sequence (from 0 to 9) that indicate how likely each prediction is. |
||
+ | POODLE-L |
||
− | Input: |
||
− | |||
− | If you run DISOPRED on the console, you have to define the location of your database. The program needs your sequence as input, in a file in FASTA format. |
||
+ | POODLE-L found no disordered regions. Therefore, our protein contains no predicted disordered region longer than 40 aa. |
||
− | *POODLE |
||
− | Prediction of order and disorder by machine-learning |
||
+ | POODLE-S (High B-factor residues) |
||
− | Authors: S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi |
||
+ | TODO |
||
− | Year: 2007 |
||
+ | POODLE-S predicted five disordered regions: |
||
− | There exist three different variants of POODLE. |
||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |- |
||
+ | |start position |
||
+ | |end position |
||
+ | |length |
||
+ | |- |
||
+ | |0 |
||
+ | |2 |
||
+ | |2 |
||
+ | |- |
||
+ | |13 |
||
+ | |19 |
||
+ | |7 |
||
+ | |- |
||
+ | |83 |
||
+ | |88 |
||
+ | |6 |
||
+ | |- |
||
+ | |105 |
||
+ | |109 |
||
+ | |5 |
||
+ | |- |
||
+ | |526 |
||
+ | |529 |
||
+ | |4 |
||
+ | |- |
||
+ | |} |
||
− | The first variant is called POODLE-L, which mainly predicts long disordered regions with a length of more than 40 residues. |
||
+ | POODLE-S (missing residues) |
||
− | Source: [[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17545177&ordinalpos=8&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions.]] |
||
+ | POODLE-S (missing residues) predicts regions as disordered if their amino acids are present in the sequence record but missing from the electron density map. |
||
− | The next variant is called POODLE-S, which mainly predicts short disordered regions. |
||
+ | POODLE-S (missing residues) found six disordered regions: |
||
− | Source: [[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17599940&ordinalpos=7&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.]] |
||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |- |
||
+ | |start position |
||
+ | |end position |
||
+ | |length |
||
+ | |- |
||
+ | |17 |
||
+ | |18 |
||
+ | |2 |
||
+ | |- |
||
+ | |53 |
||
+ | |61 |
||
+ | |9 |
||
+ | |- |
||
+ | |78 |
||
+ | |109 |
||
+ | |33 |
||
+ | |- |
||
+ | |153 |
||
+ | |153 |
||
+ | |1 |
||
+ | |- |
||
+ | |280 |
||
+ | |280 |
||
+ | |1 |
||
+ | |- |
||
+ | |345 |
||
+ | |345 |
||
+ | |1 |
||
+ | |- |
||
+ | |} |
||
− | The last variant is called POODLE-I, which integrates structural information predictors. |
||
+ | Graphical Output: |
||
− | Source: [[http://www.bioinfo.de/isb/2010/10/0015/ POODLE-I: Disordered region prediction by integrating POODLE series and structural information predictors based on a workflow approach]] |
||
+ | {| |
||
+ | | [[Image:POODLE_I.png|thumb|center|Prediction of POODLE-I]] |
||
+ | | [[Image:POODLE_L.png |thumb|Prediction of POODLE-L]] |
||
+ | |} |
||
+ | {| |
||
+ | | [[Image:POODLE_S_B.png|thumb|Prediction of POODLE-S (High B-factor residues)]] |
||
+ | | [[Image:POODLE_S_M.png|thumb|Prediction of POODLE-S (missing residues)]] |
||
+ | |} |
||
− | There also exists another variant called POODLE-W, which compares different sequences and predicts which sequence is the most disordered one, but this method was not used in our analysis. |
||
+ | TODO Comparison!!! |
||
− | Description: |
||
− | |||
− | POODLE is also a machine-learning-based method. It is based on a two-level SVM (Support Vector Machine). |
||
− | |||
− | We describe here the POODLE-L in detail, but all POODLE variants use the same principle. |
||
− | The method was trained on disordered proteins and on proteins without disordered regions. On the first level, the SVM predicts the probability that a 40-residue sequence segment is disordered. If the algorithm finds such a disordered region, the second level of the SVM uses the output of the first level to predict, for each amino acid, the probability of being disordered. |
||
− | |||
− | Output: |
||
− | |||
− | The result of this method is a file listing each amino acid, the prediction of whether it is ordered or disordered, and the probability of that state. Furthermore, you get a graphical view of the result. |
||
− | |||
− | Input: |
||
− | |||
− | We used the POODLE webserver for our analysis. We pasted our sequence in FASTA format into the input window and chose the POODLE variant. |
||
== Prediction of transmembrane alpha-helices and signal peptides == |
== Prediction of transmembrane alpha-helices and signal peptides == |
Revision as of 15:28, 27 May 2011
General Information
Secondary Structure Prediction
Prediction of disordered regions
Prediction of transmembrane alpha-helices and signal peptides
Prediction of GO terms
Secondary Structure prediction
Prediction of disordered regions
- Disopred
Disopred predicts two disordered regions in our protein. The first region is at the beginning of the protein (first two residues) and the second region is at the end (last three residues). This prediction is probably not meaningful, because the first and last amino acids of a protein are commonly missing from the electron density map. We therefore conclude that our protein, Hexosaminidase A, has no disordered regions.
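In the Disopred result, an asterisk (*) marks a residue predicted as disordered and a dot (.) marks an ordered one. The following is a minimal sketch of how such a per-residue mask could be turned into a list of regions with 1-based start, end and length. The function name `mask_to_regions` and the example mask are hypothetical and are not part of Disopred itself.

```python
def mask_to_regions(mask):
    """Convert a per-residue mask ('*' = disordered, '.' = ordered)
    into a list of (start, end, length) tuples with 1-based positions."""
    regions = []
    start = None  # start of the region currently being read, if any
    for pos, symbol in enumerate(mask, start=1):
        if symbol == "*" and start is None:
            start = pos  # a new disordered region begins here
        elif symbol != "*" and start is not None:
            # the region ended at the previous residue
            regions.append((start, pos - 1, pos - start))
            start = None
    if start is not None:
        # the mask ends inside a disordered region
        regions.append((start, len(mask), len(mask) - start + 1))
    return regions


# Hypothetical 9-residue mask: disorder at both termini only.
print(mask_to_regions("**....***"))
```

A mask with disorder only at the two termini, as in the example, produces exactly two regions. This is the pattern described above for our protein.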
- POODLE
We decided to test several POODLE variants and to compare the results.
POODLE-I
POODLE-I predicted five disordered regions:
start position | end position | length |
1 | 2 | 2 |
14 | 19 | 6 |
83 | 89 | 7 |
105 | 109 | 5 |
527 | 529 | 3 |
POODLE-L
POODLE-L found no disordered regions. Therefore, our protein contains no predicted disordered region longer than 40 aa.
POODLE-S (High B-factor residues)
TODO
POODLE-S predicted five disordered regions:
start position | end position | length |
0 | 2 | 2 |
13 | 19 | 7 |
83 | 88 | 6 |
105 | 109 | 5 |
526 | 529 | 4 |
POODLE-S (missing residues)
POODLE-S (missing residues) predicts regions as disordered if their amino acids are present in the sequence record but missing from the electron density map.
POODLE-S (missing residues) found six disordered regions:
start position | end position | length |
17 | 18 | 2 |
53 | 61 | 9 |
78 | 109 | 33 |
153 | 153 | 1 |
280 | 280 | 1 |
345 | 345 | 1 |
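The three POODLE tables above give a start position, an end position and a length for each region. For inclusive positions the length should equal end - start + 1. The sketch below copies the rows as listed and reports those where the two disagree. The names `POODLE_TABLES` and `inconsistent_rows` are made up for this illustration.

```python
# Region tables copied from the text as (start, end, listed_length).
POODLE_TABLES = {
    "POODLE-I": [
        (1, 2, 2), (14, 19, 6), (83, 89, 7), (105, 109, 5), (527, 529, 3),
    ],
    "POODLE-S (high B-factor residues)": [
        (0, 2, 2), (13, 19, 7), (83, 88, 6), (105, 109, 5), (526, 529, 4),
    ],
    "POODLE-S (missing residues)": [
        (17, 18, 2), (53, 61, 9), (78, 109, 33),
        (153, 153, 1), (280, 280, 1), (345, 345, 1),
    ],
}


def inconsistent_rows(rows):
    """Return the rows whose listed length differs from end - start + 1."""
    return [(s, e, n) for s, e, n in rows if e - s + 1 != n]


for name, rows in POODLE_TABLES.items():
    for start, end, listed in inconsistent_rows(rows):
        span = end - start + 1
        print(f"{name}: {start}-{end} spans {span} residues, table lists {listed}")
```

Run as written, this flags two rows: 0-2 (listed length 2) in the high B-factor table and 78-109 (listed length 33) in the missing-residues table. The tables alone do not show whether the positions or the lengths are wrong, so the values are left as reported.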
Graphical Output:
TODO Comparison!!!
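One possible way to approach the outstanding comparison is to expand each table into a set of residue positions and measure the overlap between variants. The sketch below does this with the Jaccard index (shared residues divided by all residues predicted by either variant). The choice of metric and all names are assumptions made for illustration, not the method used by the authors. The positions are taken exactly as listed in the tables, including the start position 0.

```python
from itertools import combinations

# Predicted regions as (start, end), inclusive, copied from the tables.
PREDICTIONS = {
    "POODLE-I": [(1, 2), (14, 19), (83, 89), (105, 109), (527, 529)],
    "POODLE-L": [],  # no disordered regions found
    "POODLE-S (B-factor)": [(0, 2), (13, 19), (83, 88), (105, 109), (526, 529)],
    "POODLE-S (missing)": [
        (17, 18), (53, 61), (78, 109), (153, 153), (280, 280), (345, 345),
    ],
}


def residues(regions):
    """Expand inclusive (start, end) regions into a set of positions."""
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return covered


def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


sets = {name: residues(regions) for name, regions in PREDICTIONS.items()}

# Pairwise overlap between all variants.
for first, second in combinations(sets, 2):
    score = jaccard(sets[first], sets[second])
    print(f"{first} vs {second}: Jaccard = {score:.2f}")

# Residues predicted by every variant that reported any disorder.
consensus = set.intersection(*(s for s in sets.values() if s))
print("Consensus residues:", sorted(consensus))
```

With the tables as listed, the residues shared by all three variants that report disorder are 17-18, 83-88 and 105-109. POODLE-L contributes nothing to the overlap, because it predicts no disordered residues.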