Difference between revisions of "Gaucher Disease: Task 03 - Lab Journal"

From Bioinformatikpedia

Revision as of 18:40, 3 September 2013

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56; for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>


Secondary structure

ReProf predicts from a FASTA sequence or a PSI-BLAST PSSM, PsiPred from a FASTA sequence, and the DSSP server needs a PDB file because it assigns secondary structure from the 3D coordinates of the atoms. The predictions were made for four proteins, including the protein that causes Gaucher's disease (<xr id="secondary structure proteins"/>). If several PDB structures were available, the one covering the largest part of the UniProt sequence with the highest similarity was chosen. For glucosylceramidase the structure 1OGS was used (as in task 2).

<figtable id="secondary structure proteins">

{| class="colBasic2"
! colspan="4" | UniProt
! colspan="5" | PDB
|-
! Entry !! Protein name !! Origin !! Length !! Entry !! Method !! Resolution (Å) !! Chain !! Positions
|-
| P10775 || Ribonuclease inhibitor || pig || 456 || 2BNH || X-ray || 2.30 || A || 1-456
|-
| Q9X0E6 || Divalent-cation tolerance protein CutA || bacterium Thermotoga maritima || 101 || 1VHF || X-ray || 1.54 || A || 2-101
|-
| Q08209 || Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform, EC=3.1.3.16 || human || 521 || 1AUI || X-ray || 2.10 || A || 1-521
|-
| P04062 || Glucosylceramidase/acid-beta-glucosidase, EC=3.2.1.45 || human || 536 || 1OGS || X-ray || 2.00 || A/B || 40-536
|}
The four UniProt protein sequences and corresponding PDB structures selected for the secondary structure prediction with ReProf and PsiPred and assignment with DSSP.

</figtable>

The Phenylketonuria group's script filter_secStruc.pl was used to extract the secondary structures in a three-state code: E, H and L (<xr id="secondary structure code"/>). In DSSP, irregular regions are encoded as "-". Precision was then calculated for each of the three secondary structure states and in total, ignoring positions without a secondary structure ("-"), using the second script of the Phenylketonuria group, SecStrucComparison.jar, as follows:

precision = number of matches / number of residues

<figtable id="secondary structure code">

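{| class="colBasic2"
! colspan="4" | Secondary Structure Code
|-
! Secondary structure !! ReProf !! PsiPred !! DSSP
|-
| Helix (alpha) || H || H || G, H, I
|-
| Extended strand (beta) || E || E || B, E
|-
| Loops/Turns || L || C || S, T
|}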
The different types to represent secondary structure in ReProf, PsiPred and DSSP.

</figtable>
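The mapping in the table and the precision formula can be sketched in Python as follows. This is only an illustration, not the Phenylketonuria group's SecStrucComparison.jar: the function names are hypothetical, and the per-state variant assumes that the positions counted are those where the reference carries that state, which the original script may define differently.

```python
# Alphabets taken from the "Secondary Structure Code" table.
# Assumption: any symbol not listed (e.g. "-" in DSSP) has no state and is skipped.
TO_THREE_STATE = {
    "reprof": {"H": "H", "E": "E", "L": "L"},
    "psipred": {"H": "H", "E": "E", "C": "L"},
    "dssp": {"G": "H", "H": "H", "I": "H",
             "B": "E", "E": "E",
             "S": "L", "T": "L"},
}


def to_three_state(states, tool):
    """Translate a tool-specific string into H/E/L; unknown symbols become None."""
    table = TO_THREE_STATE[tool]
    return [table.get(symbol) for symbol in states]


def precision(predicted, reference, state=None):
    """Number of matches / number of residues.

    Positions where either side has no state (None) are ignored.
    If `state` is given, only positions whose reference state equals it
    are counted (an assumption about the per-state definition).
    """
    matches = 0
    residues = 0
    for pred, ref in zip(predicted, reference):
        if pred is None or ref is None:
            continue
        if state is not None and ref != state:
            continue
        residues += 1
        if pred == ref:
            matches += 1
    return matches / residues if residues else None


# Toy strings, not real predictions for any of the four proteins.
pred = to_three_state("HHHEELLL", "reprof")
ref = to_three_state("GHH-EBST", "dssp")
total = precision(pred, ref)
per_state = {s: precision(pred, ref, s) for s in "HEL"}
print(total, per_state)
```

In the toy example the fourth position is skipped because DSSP reports "-", so seven residues are compared.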

First, PSSMs from different PSI-BLAST runs (all combinations of database big_80/swissprot, 2/3 iterations and E-value 2E-3/10E-10/10E-20) were tested on the shortest protein, Q9X0E6. The run parameters yielding the best precision compared to PsiPred and DSSP were then chosen: big_80, 3 iterations and an E-value cutoff of 10E-10. These parameters were applied to create the PSSMs for the other proteins. (The table summarizing the results for all parameters can be found here: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure/reprof_out/parsedSecStr/README.Q9X0E6.psiblast_param.precision.)
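The twelve parameter combinations described above could be enumerated as in the following sketch. It assumes the BLAST+ `psiblast` executable and that the E-value was passed with `-evalue`; the journal does not state which PSI-BLAST binary or option was actually used, and the query and output file names are hypothetical.

```python
from itertools import product

# Parameter values as listed in the text.
DATABASES = ["big_80", "swissprot"]
ITERATIONS = [2, 3]
EVALUES = ["2E-3", "10E-10", "10E-20"]


def psiblast_command(query, db, iterations, evalue):
    """Build one psiblast call (BLAST+ option names; assumed, not confirmed by the journal)."""
    tag = f"{db}_it{iterations}_e{evalue}"
    return [
        "psiblast",
        "-query", query,
        "-db", db,
        "-num_iterations", str(iterations),
        "-evalue", evalue,
        "-out_ascii_pssm", f"{tag}.pssm",  # hypothetical output name
    ]


# "Q9X0E6.fasta" is a hypothetical file name for the shortest protein.
commands = [
    psiblast_command("Q9X0E6.fasta", db, it, ev)
    for db, it, ev in product(DATABASES, ITERATIONS, EVALUES)
]

for cmd in commands:
    print(" ".join(cmd))
```

Each command list could be handed to `subprocess.run` on a machine where the databases are installed; the sketch only prints the calls.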

Other used scripts and created data can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure.

Disorder

IUPred

We ran IUPred in the "long", "short" and "glob" modes on the same four proteins (P10775, Q08209, Q9X0E6, P04062). The IUPred server can generate graphical output. However, for a clear view and interpretation of the results of the three modes, we combined them in a single plot with an R script.


MD (MetaDisorder)

The output files ending in ".md.out.mdisorder" have the following columns (example for P10775):

Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw   MD_rel  MD2st
    1   M       0.19    -       0.99    D       0.15    -       0.551   1       D
    2   N       0.11    -       0.78    D       0.21    -       0.475   1       -
    3   L       0.08    -       0.65    D       0.26    -       0.455   2       -
    4   D       0.13    -       0.68    D       0.20    -       0.414   3       -
    5   I       0.09    -       0.40    -       0.22    -       0.364   5       -
    6   H       0.14    -       0.48    D       0.29    -       0.384   4       -
    7   C       0.58    D       0.57    D       0.26    -       0.384   4       -

Key for output: Number - residue number; Residue - amino-acid type; NORSnet - raw score by NORSnet (prediction of unstructured loops); NORS2st - two-state prediction by NORSnet, D=disordered; PROFbval - raw score by PROFbval (prediction of residue flexibility from sequence); bval2st - two-state prediction by PROFbval; Ucon - raw score by Ucon (prediction of protein disorder using predicted internal contacts); Ucon2st - two-state prediction by Ucon; MD_raw - raw score by MD (prediction of protein disorder using orthogonal sources); MD_rel - reliability of the prediction by MD, values range from 0 to 9 (9 = strong prediction); MD2st - two-state prediction by MD.

The NORSnet, PROFbval and Ucon raw scores are the predicted tendencies of those methods, and MD_raw is the final raw score by MD. In all two-state scores, a residue is predicted to be disordered (D) if the raw score is higher than 0.5. We wrote an R script to plot all the raw scores for each protein.
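As an illustration of the column layout, the following Python sketch reads the MD output into typed records. It is not the R script mentioned above; the field and function names are hypothetical, and it assumes whitespace-separated columns with exactly eleven fields per residue line, as in the example for P10775.

```python
from typing import NamedTuple


class MDRow(NamedTuple):
    number: int      # residue number
    residue: str     # amino-acid type
    norsnet: float
    nors2st: str     # "D" or "-"
    profbval: float
    bval2st: str
    ucon: float
    ucon2st: str
    md_raw: float
    md_rel: int      # reliability, 0 to 9
    md2st: str


def parse_md(text):
    """Parse residue lines; the header and any other lines are skipped."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 11 or not f[0].isdigit():
            continue
        rows.append(MDRow(int(f[0]), f[1], float(f[2]), f[3], float(f[4]),
                          f[5], float(f[6]), f[7], float(f[8]), int(f[9]), f[10]))
    return rows


# First three residue lines of the P10775 example above.
SAMPLE = """\
Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw MD_rel MD2st
    1   M       0.19    -       0.99    D       0.15    -       0.551   1       D
    2   N       0.11    -       0.78    D       0.21    -       0.475   1       -
    3   L       0.08    -       0.65    D       0.26    -       0.455   2       -
"""

rows = parse_md(SAMPLE)
# Residue numbers that MD itself labels as disordered (MD2st column).
disordered = [r.number for r in rows if r.md2st == "D"]
print(disordered)
```

Reading the two-state columns directly, rather than re-applying a cutoff to the raw scores, keeps the sketch independent of how each method derives its label.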

All the used scripts and created data can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/Disorder.

Transmembrane Helices

The MEMSAT-SVM predictions were run on the PSIPRED web server. To get the membrane helix prediction I chose the option MEMSAT3 & MEMSAT-SVM instead of the default option.

Polyphobius was run with the script provided in the subsection Software. It can also be found on the rostlab server at /mnt/home/student/gerkej/gaucher/task3/tmh/polyphobius.pl. It automatically runs Blast and kalign.

Signal Peptides

To find out more about signal peptides the following webservers were used:

In the Signal Peptide Database we also looked at the transmembrane helices for the three proteins analysed in this exercise. Glucocerebrosidase is listed there without a documented membrane helix; the other three proteins of the "Membrane Helices" exercise are not listed in the Signal Peptide Database.

GO Terms

The GO terms were retrieved using the corresponding web servers of