Gaucher Disease: Task 03 - Lab Journal
<css>
table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }
.colBasic2 th,td { padding: 3px; border: 1px solid black; }
.colBasic2 td { text-align:left; }
/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff
maria's style blue: #adceff grey: #efefef
*/
.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}
</css>
Secondary structure
ReProf predicts from a FASTA sequence or a PSI-BLAST PSSM and PsiPred from a FASTA sequence, while the DSSP server needs a PDB file because it assigns the secondary structure from the 3D coordinates of the atoms. The predictions were made for four proteins, including the protein that causes Gaucher's disease (<xr id="secondary structure proteins"/>). Where several PDB structures were available, the one that covers the largest part of the UniProt sequence and matches it most closely was chosen. For glucosylceramidase the structure 1OGS was used (as in task 2).
<figtable id="secondary structure proteins">
UniProt entry | Protein name | Origin | Length | PDB entry | Method | Resolution (Å) | Chain | Positions |
---|---|---|---|---|---|---|---|---|
P10775 | Ribonuclease inhibitor | pig | 456 | 2BNH | X-ray | 2.30 | A | 1-456 |
Q9X0E6 | Divalent-cation tolerance protein CutA | bacterium Thermotoga maritima | 101 | 1VHF | X-ray | 1.54 | A | 2-101 |
Q08209 | Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform, EC=3.1.3.16 | human | 521 | 1AUI | X-ray | 2.10 | A | 1-521 |
P04062 | Glucosylceramidase/acid-beta-glucosidase, EC=3.2.1.45 | human | 536 | 1OGS | X-ray | 2.00 | A/B | 40-536 |
</figtable>
The filter_secStruc.pl script of the Phenylketonuria group was used to reduce the secondary structures to the three-state code E, H and L (<xr id="secondary structure code"/>). DSSP encodes irregular regions as "-". Precision was then calculated for each of the three secondary structure states and in total, ignoring positions without a secondary structure ("-"), with the second script of the Phenylketonuria group, SecStrucComparison.jar, as follows:
precision = number of matches / number of residues
<figtable id="secondary structure code">
"Secondary Structure Code" | |||
---|---|---|---|
Secondary structure | ReProf | PsiPred | DSSP |
Helix (alpha) | H | H | GHI |
Extended strand (beta) | E | E | BE |
Loops/Turns | L | C | ST |
</figtable>
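The reduction to three states and the precision formula can be illustrated with a minimal Python sketch. It is not the filter_secStruc.pl or SecStrucComparison.jar code of the Phenylketonuria group. The function names are hypothetical, the input strings are toy data, and the per-state denominator (residues of that state in the reference) is an assumption, since the text only gives the general formula.

```python
# Mapping of method-specific codes to the three-state code (taken from the table).
TO_THREE_STATE = {
    "G": "H", "H": "H", "I": "H",  # helix
    "B": "E", "E": "E",            # extended strand
    "S": "L", "T": "L",            # DSSP loops/turns
    "L": "L", "C": "L",            # ReProf / PsiPred loops
}


def reduce_states(raw):
    """Map a raw secondary structure string to E/H/L; anything else becomes '-'."""
    return "".join(TO_THREE_STATE.get(ch, "-") for ch in raw)


def precision(predicted, reference):
    """Return total and per-state precision, skipping positions with '-'."""
    if len(predicted) != len(reference):
        raise ValueError("strings must be aligned and equally long")
    counts = {state: [0, 0] for state in "EHL"}  # state -> [matches, residues]
    for p, r in zip(predicted, reference):
        if p == "-" or r == "-":
            continue
        counts[r][1] += 1  # assumption: residues are counted per reference state
        if p == r:
            counts[r][0] += 1
    matches = sum(m for m, _ in counts.values())
    residues = sum(n for _, n in counts.values())
    result = {"total": matches / residues if residues else float("nan")}
    for state, (m, n) in counts.items():
        result[state] = m / n if n else float("nan")
    return result


# Toy strings for illustration only, not real predictions.
dssp = reduce_states("-HHGGEEBTTS-")
pred = reduce_states("LHHHHEELLLCC")
scores = precision(pred, dssp)
```

Here the first and last positions are skipped because the DSSP string has "-" there, so the total is computed over the remaining ten residues.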
First, PSSMs from different PSI-BLAST runs (all combinations of database big_80/swissprot, 2/3 iterations and E-value cutoff 2E-3/10E-10/10E-20) were tested on the shortest protein, Q9X0E6. The run parameters that yielded the best precision compared to PsiPred and DSSP were then chosen: big_80, 3 iterations and an E-value cutoff of 10E-10. These parameters were applied to create the PSSMs for the other proteins. (The table summarizing the results for all parameter combinations can be found here: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure/reprof_out/parsedSecStr/README.Q9X0E6.psiblast_param.precision)
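The parameter search can be pictured as a small grid over database, iterations and E-value cutoff. The following Python sketch only enumerates the combinations and picks the best one for a given scoring function. The function names and the `score` callable are hypothetical; in practice `score` would have to run PSI-BLAST and ReProf and return the resulting precision, which is not shown here.

```python
from itertools import product

DATABASES = ["big_80", "swissprot"]
ITERATIONS = [2, 3]
EVALUES = ["2E-3", "10E-10", "10E-20"]  # written as in the text


def parameter_grid():
    """Return every (database, iterations, evalue) combination."""
    return list(product(DATABASES, ITERATIONS, EVALUES))


def best_parameters(score):
    """Pick the combination with the highest score.

    `score` is a hypothetical callable (database, iterations, evalue) -> precision.
    """
    return max(parameter_grid(), key=lambda params: score(*params))
```

With two databases, two iteration counts and three cutoffs, the grid contains 12 combinations.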
The other scripts and the generated data can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure.
Disorder
IUPred
We ran IUPred in the "long", "short" and "glob" modes on the same four proteins (P10775, Q08209, Q9X0E6, P04062). The IUPred server can generate graphical output. However, for a clearer view and easier interpretation of the three modes, we combined their results in a single plot with an R script.
MD (MetaDisorder)
The output files ending in ".md.out.mdisorder" have the following columns (example for P10775):
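Number | Residue | NORSnet | NORS2st | PROFbval | bval2st | Ucon | Ucon2st | MD_raw | MD_rel | MD2st |
---|---|---|---|---|---|---|---|---|---|---|
1 | M | 0.19 | - | 0.99 | D | 0.15 | - | 0.551 | 1 | D |
2 | N | 0.11 | - | 0.78 | D | 0.21 | - | 0.475 | 1 | - |
3 | L | 0.08 | - | 0.65 | D | 0.26 | - | 0.455 | 2 | - |
4 | D | 0.13 | - | 0.68 | D | 0.20 | - | 0.414 | 3 | - |
5 | I | 0.09 | - | 0.40 | - | 0.22 | - | 0.364 | 5 | - |
6 | H | 0.14 | - | 0.48 | D | 0.29 | - | 0.384 | 4 | - |
7 | C | 0.58 | D | 0.57 | D | 0.26 | - | 0.384 | 4 | - |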
Key for output:
Number - residue number
Residue - amino-acid type
NORSnet - raw score by NORSnet (prediction of unstructured loops)
NORS2st - two-state prediction by NORSnet; D=disordered
PROFbval - raw score by PROFbval (prediction of residue flexibility from sequence)
bval2st - two-state prediction by PROFbval
Ucon - raw score by Ucon (prediction of protein disorder using predicted internal contacts)
Ucon2st - two-state prediction by Ucon
MD_raw - raw score by MD (prediction of protein disorder using orthogonal sources)
MD_rel - reliability of the prediction by MD; values range from 0-9, 9=strong prediction
MD2st - two-state prediction by MD
The NORSnet, PROFbval and Ucon raw scores are the predicted tendencies of those methods, and MD_raw is the final raw score by MD. In all two-state columns, a residue is predicted to be disordered (D) if the raw score is higher than 0.5. We wrote an R script to plot all the raw scores for each protein.
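As a rough illustration of how such a file can be read and how the 0.5 rule relates to the two-state columns, here is a minimal Python sketch. It is not the R script mentioned above. The function names are hypothetical, and the parser assumes whitespace-separated columns in the order listed in the key, skipping every line that does not start with a residue number.

```python
COLUMNS = ["Number", "Residue", "NORSnet", "NORS2st", "PROFbval", "bval2st",
           "Ucon", "Ucon2st", "MD_raw", "MD_rel", "MD2st"]
# Raw score column -> matching two-state column.
RAW_TO_STATE = {"NORSnet": "NORS2st", "PROFbval": "bval2st",
                "Ucon": "Ucon2st", "MD_raw": "MD2st"}


def parse_mdisorder(text):
    """Parse the residue rows of an .md.out.mdisorder file into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue  # header, blank or other non-residue lines
        row = dict(zip(COLUMNS, fields))
        row["Number"] = int(row["Number"])
        row["MD_rel"] = int(row["MD_rel"])
        for col in RAW_TO_STATE:
            row[col] = float(row[col])
        rows.append(row)
    return rows


def two_state(score, threshold=0.5):
    """Apply the cutoff from the text: 'D' if the raw score is above 0.5."""
    return "D" if score > threshold else "-"


def disagreements(rows):
    """List (residue number, method) pairs where the cutoff and the file differ."""
    return [(row["Number"], raw)
            for row in rows
            for raw, state in RAW_TO_STATE.items()
            if two_state(row[raw]) != row[state]]


# The seven example rows for P10775 shown above.
SAMPLE = """\
Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw MD_rel MD2st
1 M 0.19 - 0.99 D 0.15 - 0.551 1 D
2 N 0.11 - 0.78 D 0.21 - 0.475 1 -
3 L 0.08 - 0.65 D 0.26 - 0.455 2 -
4 D 0.13 - 0.68 D 0.20 - 0.414 3 -
5 I 0.09 - 0.40 - 0.22 - 0.364 5 -
6 H 0.14 - 0.48 D 0.29 - 0.384 4 -
7 C 0.58 D 0.57 D 0.26 - 0.384 4 -
"""
rows = parse_mdisorder(SAMPLE)
mismatches = disagreements(rows)
```

On these seven rows the cutoff reproduces the NORS2st, Ucon2st and MD2st columns, but not bval2st at residue 6 (raw score 0.48, reported as D), so the cutoff used by PROFbval may not be exactly 0.5.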
All scripts and the generated data can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/Disorder.