Difference between revisions of "Gaucher Disease: Task 03 - Lab Journal"

From Bioinformatikpedia

Revision as of 21:46, 7 August 2013

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>


Secondary structure

ReProf predicts from a FASTA sequence or a PSI-BLAST PSSM, PsiPred from a FASTA sequence, and the DSSP server needs a PDB file because it assigns secondary structure from the 3D coordinates of the atoms. The predictions were made for the four proteins, including the protein causing Gaucher's disease (<xr id="secondary structure proteins"/>). If several PDB structures were available, the one covering the largest part of the UniProt sequence with the highest similarity was chosen. For glucosylceramidase, the structure 1OGS was used (as in task 2).

<figtable id="secondary structure proteins">

{| class="colBasic2"
! colspan="4" | UniProt
! colspan="5" | PDB
|-
! Entry !! Protein name !! Origin !! Length !! Entry !! Method !! Resolution (Å) !! Chain !! Positions
|-
| P10775 || Ribonuclease inhibitor || pig || 456 || 2BNH || X-ray || 2.30 || A || 1-456
|-
| Q9X0E6 || Divalent-cation tolerance protein CutA || bacterium Thermotoga maritima || 101 || 1VHF || X-ray || 1.54 || A || 2-101
|-
| Q08209 || Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform, EC=3.1.3.16 || human || 521 || 1AUI || X-ray || 2.10 || A || 1-521
|-
| P04062 || Glucosylceramidase/acid-beta-glucosidase, EC=3.2.1.45 || human || 536 || 1OGS || X-ray || 2.00 || A/B || 40-536
|}
The four UniProt protein sequences and corresponding PDB structures selected for the secondary structure prediction with ReProf and PsiPred and assignment with DSSP.

</figtable>

The script [[Phenylketonuria/Task3/Scripts | filter_secStruc.pl]] of the Phenylketonuria group was used to extract the secondary structures in the three-letter code E, H and L (<xr id="secondary structure code"/>). In the DSSP output, irregular regions are encoded as "-". Then the precision was calculated for each of the three secondary structure types and in total, ignoring positions without a secondary structure ("-"), using the second script of the Phenylketonuria group, [[Phenylketonuria/Task3/Scripts | SecStrucComparison.jar]], as follows:

precision = number of matches / number of residues

<figtable id="secondary structure code">

"Secondary Structure Code"
{| class="colBasic2"
! Secondary structure !! ReProf !! PsiPred !! DSSP
|-
| Helix (alpha) || H || H || G, H, I
|-
| Extended strand (beta) || E || E || B, E
|-
| Loops/Turns || L || C || S, T
|}
The codes used to represent secondary structure in ReProf, PsiPred and DSSP.

</figtable>
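The comparison described above can be sketched in Python. This is not the code of filter_secStruc.pl or SecStrucComparison.jar; it is a minimal illustration under stated assumptions. The function names `reduce_states` and `precision` are hypothetical. The per-state denominator is assumed to be the number of compared positions that the first sequence assigns to that state, which the original scripts may count differently. Positions where either sequence has no state ("-") are skipped.

```python
# Mapping of each tool's alphabet onto the three states H, E and L,
# following the secondary structure code table.
TO_THREE_STATE = {
    "H": "H", "G": "H", "I": "H",             # helices
    "E": "E", "B": "E",                       # extended strands
    "L": "L", "C": "L", "S": "L", "T": "L",   # loops/turns
}


def reduce_states(ss):
    """Map a raw secondary structure string to H/E/L; anything else becomes '-'."""
    return "".join(TO_THREE_STATE.get(c, "-") for c in ss)


def precision(predicted, reference):
    """Return matches / residues in total and per state (assumed definition)."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    pred = reduce_states(predicted)
    ref = reduce_states(reference)
    # Ignore positions without a secondary structure in either sequence.
    pairs = [(p, r) for p, r in zip(pred, ref) if p != "-" and r != "-"]
    result = {}
    total = len(pairs)
    result["total"] = sum(p == r for p, r in pairs) / total if total else None
    for state in "HEL":
        # Assumption: the denominator counts positions predicted as this state.
        n = sum(1 for p, _ in pairs if p == state)
        hits = sum(1 for p, r in pairs if p == state and r == state)
        result[state] = hits / n if n else None
    return result
```

A call such as `precision(psipred_string, dssp_string)` returns a dictionary with the keys "total", "H", "E" and "L"; a value of `None` means that no position was available for that state.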

First, PSSMs from PSI-BLAST runs with different parameters (all combinations of database big_80 or swissprot, 2 or 3 iterations, and E-value cutoff 2E-3, 10E-10 or 10E-20) were tested on the shortest protein, Q9X0E6. The run parameters yielding the best precision compared to PsiPred and DSSP were then chosen. The best parameters were big_80, 3 iterations and an E-value cutoff of 10E-10; they were then used to create the PSSMs for the other proteins. (The table summarizing the results for all parameters can be found here: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure/reprof_out/parsedSecStr/README.Q9X0E6.psiblast_param.precision.)
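The parameter grid gives 2 x 2 x 3 = 12 runs. The following sketch enumerates it and builds one command line per run. It assumes the BLAST+ `psiblast` program with its `-query`, `-db`, `-num_iterations`, `-evalue` and `-out_ascii_pssm` options. Which PSI-BLAST program and which E-value option were actually used is not stated here. The function name `psiblast_grid`, the output file naming and the database names passed to `-db` are hypothetical.

```python
from itertools import product

# Parameter values taken from the text.
DATABASES = ["big_80", "swissprot"]
ITERATIONS = [2, 3]
EVALUES = ["2E-3", "10E-10", "10E-20"]


def psiblast_grid(query_fasta):
    """Build one psiblast command (as an argument list) per parameter combination."""
    commands = []
    for db, iters, evalue in product(DATABASES, ITERATIONS, EVALUES):
        # Hypothetical naming scheme for the resulting PSSM file.
        pssm = f"{query_fasta}.{db}.it{iters}.e{evalue}.pssm"
        commands.append([
            "psiblast",
            "-query", query_fasta,
            "-db", db,                      # assumed to resolve to the local database
            "-num_iterations", str(iters),
            "-evalue", evalue,              # assumption: cutoff passed as -evalue
            "-out_ascii_pssm", pssm,
        ])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where `psiblast` and the databases are installed.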

The other scripts used and the data created can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure.

Disorder

IUPred: We ran IUPred in the "long", "short" and "glob" modes on the same four proteins (P10775, Q08209, Q9X0E6, P04062). Furthermore, we generated graphical output with the IUPred server.

TODO: combine the three modes into one plot for each protein.


MD (MetaDisorder): The output files ending in ".md.out.mdisorder" have the following columns (example for P10775):

Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw   MD_rel  MD2st
    1   M       0.19    -       0.99    D       0.15    -       0.551   1       D
    2   N       0.11    -       0.78    D       0.21    -       0.475   1       -
    3   L       0.08    -       0.65    D       0.26    -       0.455   2       -
    4   D       0.13    -       0.68    D       0.20    -       0.414   3       -
    5   I       0.09    -       0.40    -       0.22    -       0.364   5       -
    6   H       0.14    -       0.48    D       0.29    -       0.384   4       -
    7   C       0.58    D       0.57    D       0.26    -       0.384   4       -
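As a starting point for the plots, the columns shown above could be read with a small parser. This is a minimal sketch, assuming the file is whitespace-separated with exactly the eleven columns of the example and that any other line (header, comments) can be skipped. The function name `parse_mdisorder` is hypothetical.

```python
COLUMNS = ["Number", "Residue", "NORSnet", "NORS2st", "PROFbval", "bval2st",
           "Ucon", "Ucon2st", "MD_raw", "MD_rel", "MD2st"]
FLOAT_COLUMNS = {"NORSnet", "PROFbval", "Ucon", "MD_raw"}
INT_COLUMNS = {"Number", "MD_rel"}


def parse_mdisorder(lines):
    """Parse data rows of an .mdisorder file into a list of dictionaries."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip the header and anything that is not an 11-column data row.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        row = {}
        for name, value in zip(COLUMNS, fields):
            if name in FLOAT_COLUMNS:
                row[name] = float(value)
            elif name in INT_COLUMNS:
                row[name] = int(value)
            else:
                row[name] = value  # residue letter or two-state call ("D" / "-")
        rows.append(row)
    return rows
```

With a real file, `parse_mdisorder(open(path))` returns one dictionary per residue, from which the PROFbval, Ucon and MD_raw values can be collected for plotting.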

TODO: plot P10775, PROFbval, Ucon and MD_raw (MD_rel?).


All scripts used and data created can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/Disorder.