Gaucher Disease: Task 03 - Lab Journal

From Bioinformatikpedia
Revision as of 00:19, 11 August 2013 by Kalemanovm (talk | contribs) (Disorder)

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>


Secondary structure

ReProf predicts from a FASTA sequence or a PSI-BLAST PSSM, PsiPred from a FASTA sequence, and the DSSP server needs a PDB file because it assigns secondary structure from the 3D coordinates of the atoms. Predictions were made for four proteins, including the protein that causes Gaucher's disease (<xr id="secondary structure proteins"/>). Where several PDB structures were available, the one covering the largest part of the UniProt sequence with the highest similarity was chosen. For glucosylceramidase, the structure 1OGS was used (as in Task 2).

<figtable id="secondary structure proteins">

{| class="colBasic2"
! colspan="4" | UniProt
! colspan="5" | PDB
|-
! Entry !! Protein name !! Origin !! Length !! Entry !! Method !! Resolution (Å) !! Chain !! Positions
|-
| P10775 || Ribonuclease inhibitor || pig || 456 || 2BNH || X-ray || 2.30 || A || 1-456
|-
| Q9X0E6 || Divalent-cation tolerance protein CutA || bacterium Thermotoga maritima || 101 || 1VHF || X-ray || 1.54 || A || 2-101
|-
| Q08209 || Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform, EC=3.1.3.16 || human || 521 || 1AUI || X-ray || 2.10 || A || 1-521
|-
| P04062 || Glucosylceramidase/acid-beta-glucosidase, EC=3.2.1.45 || human || 536 || 1OGS || X-ray || 2.00 || A/B || 40-536
|}
The four UniProt protein sequences and corresponding PDB structures selected for the secondary structure prediction with ReProf and PsiPred and assignment with DSSP.

</figtable>

The Phenylketonuria group's script filter_secStruc.pl was used to reduce the output of each tool to a three-state code: E, H and L (<xr id="secondary structure code"/>). DSSP encodes irregular regions as "-". The precision was then calculated for each of the three states and in total, ignoring positions without a secondary structure ("-"), with the Phenylketonuria group's second script, SecStrucComparison.jar, as follows:

precision = number of matches / number of residues

<figtable id="secondary structure code">

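{| class="colBasic2"
! colspan="4" | Secondary Structure Code
|-
! Secondary structure !! ReProf !! PsiPred !! DSSP
|-
| Helix (alpha) || H || H || G, H, I
|-
| Extended strand (beta) || E || E || B, E
|-
| Loops/Turns || L || C || S, T
|}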
The symbols used to represent secondary structure in ReProf, PsiPred and DSSP.

</figtable>
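The mapping in the table and the precision formula can be sketched in a few lines of Python. This is only an illustration, not the code of filter_secStruc.pl or SecStrucComparison.jar: the function names are hypothetical, the input strings are toy data, and the per-state denominator is assumed to be the number of reference residues in that state, which the formula above does not specify.

```python
# Map each tool's alphabet onto the shared three-state code from the table.
TO_THREE_STATE = {
    "reprof": {"H": "H", "E": "E", "L": "L"},
    "psipred": {"H": "H", "E": "E", "C": "L"},
    "dssp": {"G": "H", "H": "H", "I": "H", "B": "E", "E": "E", "S": "L", "T": "L"},
}


def to_three_state(raw, tool):
    """Translate a raw state string; unmapped symbols (e.g. DSSP '-') become '-'."""
    table = TO_THREE_STATE[tool]
    return "".join(table.get(symbol, "-") for symbol in raw)


def precision(predicted, reference):
    """Return matches / residues per state and in total, skipping '-' positions."""
    if len(predicted) != len(reference):
        raise ValueError("state strings must have equal length")
    matches = {"H": 0, "E": 0, "L": 0}
    residues = {"H": 0, "E": 0, "L": 0}
    for pred, ref in zip(predicted, reference):
        if pred == "-" or ref == "-":
            continue  # no assigned state in at least one of the two strings
        residues[ref] += 1  # assumption: denominator counts reference residues
        if pred == ref:
            matches[ref] += 1
    result = {s: (matches[s] / residues[s] if residues[s] else None) for s in residues}
    total = sum(residues.values())
    result["total"] = sum(matches.values()) / total if total else None
    return result


# Toy strings, not real output for any of the four proteins.
predicted = to_three_state("CCCHHHEEEC", "psipred")
reference = to_three_state("-SGHHHEEB-", "dssp")
result = precision(predicted, reference)
print(result)
```

The two positions where DSSP reports "-" are skipped, so the total is computed over the remaining eight residues.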

First, PSSMs from different PSI-BLAST runs were tested on the shortest protein, Q9X0E6. All combinations of database (big_80 or swissprot), number of iterations (2 or 3) and E-value cutoff (2E-3, 10E-10 or 10E-20) were tried, and the run parameters yielding the best ReProf precision compared to PsiPred and DSSP were chosen. The best parameters were big_80, 3 iterations and an E-value cutoff of 10E-10; they were then used to create the PSSMs for the other proteins. (The table summarizing the results for all parameters can be found here: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure/reprof_out/parsedSecStr/README.Q9X0E6.psiblast_param.precision.)
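The parameter grid described above (2 databases x 2 iteration counts x 3 E-value cutoffs = 12 runs) could be enumerated as in the following sketch. The commands are only built, not executed. Using the BLAST+ `psiblast` program, passing the cutoff as the inclusion threshold, and the file naming scheme are all assumptions; the text does not say which PSI-BLAST binary or which E-value option was used.

```python
from itertools import product

DATABASES = ["big_80", "swissprot"]
ITERATIONS = [2, 3]
# E-value cutoffs are kept as the strings written in the text.
EVALUES = ["2E-3", "10E-10", "10E-20"]


def psiblast_command(query, database, iterations, evalue):
    """Build one PSI-BLAST call that writes an ASCII PSSM (assumed BLAST+ flags)."""
    pssm = f"{query}.{database}.it{iterations}.e{evalue}.pssm"  # hypothetical name
    return [
        "psiblast",
        "-query", f"{query}.fasta",
        "-db", database,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", evalue,  # assumption: cutoff = inclusion threshold
        "-out_ascii_pssm", pssm,
    ]


# Every combination of database, iteration count and E-value cutoff.
grid = list(product(DATABASES, ITERATIONS, EVALUES))
commands = [psiblast_command("Q9X0E6", *params) for params in grid]

for command in commands:
    print(" ".join(command))
```

Each resulting PSSM would then be given to ReProf and scored against PsiPred and DSSP to pick the best combination.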

The other scripts used and the data created can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure.

Disorder

IUPred

We ran IUPred in the "long", "short" and "glob" modes on the same four proteins (P10775, Q08209, Q9X0E6, P04062). The IUPred server can generate graphical output, but for a clear view and interpretation of the results of the three modes we combined them in a single plot with an R script.
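Before plotting, the per-residue scores of the individual runs have to be brought into one table. The following Python sketch shows one way to do that; it is not the R script mentioned above. It assumes that the "long" and "short" outputs consist of comment lines starting with "#" followed by lines with position, residue and score. The "glob" mode is left out because its output layout is not described here, and the scores in the example are made up.

```python
def parse_iupred(text):
    """Parse per-residue output; assumed layout: position, residue, score."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        position, residue, score = line.split()[:3]
        rows.append((int(position), residue, float(score)))
    return rows


def combine(long_rows, short_rows):
    """Join both modes on residue position: (position, residue, long, short)."""
    short_by_pos = {pos: score for pos, _, score in short_rows}
    return [(pos, aa, score, short_by_pos.get(pos)) for pos, aa, score in long_rows]


# Made-up scores for illustration only.
long_out = "# IUPred long\n    1 M     0.12\n    2 N     0.30\n"
short_out = "# IUPred short\n    1 M     0.45\n    2 N     0.28\n"

table = combine(parse_iupred(long_out), parse_iupred(short_out))
for row in table:
    print(row)
```

A combined table of this shape can be handed directly to any plotting tool, with one curve per mode over the residue position.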


MD (MetaDisorder)

The output files ending in ".md.out.mdisorder" have the following columns (example for P10775):

Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw   MD_rel  MD2st
    1   M       0.19    -       0.99    D       0.15    -       0.551   1       D
    2   N       0.11    -       0.78    D       0.21    -       0.475   1       -
    3   L       0.08    -       0.65    D       0.26    -       0.455   2       -
    4   D       0.13    -       0.68    D       0.20    -       0.414   3       -
    5   I       0.09    -       0.40    -       0.22    -       0.364   5       -
    6   H       0.14    -       0.48    D       0.29    -       0.384   4       -
    7   C       0.58    D       0.57    D       0.26    -       0.384   4       -

Key for output:

Number - residue number
Residue - amino-acid type
NORSnet - raw score by NORSnet (prediction of unstructured loops)
NORS2st - two-state prediction by NORSnet; D=disordered
PROFbval - raw score by PROFbval (prediction of residue flexibility from sequence)
bval2st - two-state prediction by PROFbval
Ucon - raw score by Ucon (prediction of protein disorder using predicted internal contacts)
Ucon2st - two-state prediction by Ucon
MD_raw - raw score by MD (prediction of protein disorder using orthogonal sources)
MD_rel - reliability of the prediction by MD; values range from 0 to 9, 9=strong prediction
MD2st - two-state prediction by MD

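The NORSnet, PROFbval and Ucon raw scores are the tendencies predicted by those methods, and MD_raw is the final raw score by MD. In the two-state columns, a residue is predicted to be disordered (D) if the raw score is higher than 0.5. (In the excerpt above, PROFbval nevertheless marks residue 6 as D at a raw score of 0.48, so this cutoff does not seem to hold strictly for every method.) We wrote an R script to plot all the raw scores for each protein.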
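The column layout and the 0.5 cutoff can be checked against the P10775 excerpt with a short Python sketch. The function names are hypothetical and this is not the R plotting script; it only parses the whitespace-separated columns and recomputes the two-state call from the raw score.

```python
EXCERPT = """\
Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw MD_rel MD2st
1 M 0.19 - 0.99 D 0.15 - 0.551 1 D
2 N 0.11 - 0.78 D 0.21 - 0.475 1 -
3 L 0.08 - 0.65 D 0.26 - 0.455 2 -
4 D 0.13 - 0.68 D 0.20 - 0.414 3 -
5 I 0.09 - 0.40 - 0.22 - 0.364 5 -
6 H 0.14 - 0.48 D 0.29 - 0.384 4 -
7 C 0.58 D 0.57 D 0.26 - 0.384 4 -
"""


def parse_md(text):
    """Turn the whitespace-separated table into one dict per residue."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def two_state(raw_score, cutoff=0.5):
    """Return 'D' (disordered) above the cutoff, '-' otherwise."""
    return "D" if raw_score > cutoff else "-"


rows = parse_md(EXCERPT)
for row in rows:
    called = two_state(float(row["MD_raw"]))
    agrees = called == row["MD2st"]
    print(row["Number"], row["Residue"], row["MD_raw"], called, agrees)
```

For the seven residues shown, the recomputed call agrees with the MD2st column in every row; the same check on the PROFbval columns disagrees only for residue 6.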

All the scripts used and the data created can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/Disorder.