Difference between revisions of "Gaucher Disease: Task 03 - Lab Journal"

From Bioinformatikpedia

Revision as of 21:12, 7 August 2013

Secondary structure

ReProf takes a FASTA sequence or a PSI-BLAST PSSM as input, PsiPred takes a FASTA sequence, and the DSSP server needs a PDB file because it works on the 3D coordinates of the atoms. The predictions were made for the four proteins listed below, including the protein that causes Gaucher's disease. Where several PDB structures were available, the one that covered the largest part of the UniProt sequence with the highest similarity was chosen. For glucosylceramidase, the structure 1OGS was used (as in task 2).

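{| class="colBasic2"
|-
! colspan="4" style="background:#32CD32;" | UniProt
! colspan="5" style="background:#32CD32;" | PDB
|-
! style="background:#90EE90;" align="center" | Entry
! style="background:#90EE90;" align="center" | Protein name
! style="background:#90EE90;" align="center" | Origin
! style="background:#90EE90;" align="center" | Length
! style="background:#90EE90;" align="center" | Entry
! style="background:#90EE90;" align="center" | Method
! style="background:#90EE90;" align="center" | Resolution (Å)
! style="background:#90EE90;" align="center" | Chain
! style="background:#90EE90;" align="center" | Positions
|-
| P10775 || Ribonuclease inhibitor || pig || 456 || 2BNH || X-ray || 2.30 || A || 1-456
|-
| Q9X0E6 || Divalent-cation tolerance protein CutA || bacterium Thermotoga maritima || 101 || 1VHF || X-ray || 1.54 || A || 2-101
|-
| Q08209 || Serine/threonine-protein phosphatase 2B catalytic subunit alpha isoform, EC=3.1.3.16 || human || 521 || 1AUI || X-ray || 2.10 || A || 1-521
|-
| P04062 || Glucosylceramidase/acid-beta-glucosidase, EC=3.2.1.45 || human || 536 || 1OGS || X-ray || 2.00 || A/B || 40-536
|-
|}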
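
The coverage part of the selection criterion can be checked with a few lines of Python. This is only a sketch: it computes the fraction of the UniProt sequence spanned by the listed PDB positions and ignores the similarity part of the criterion. The function name <code>coverage</code> is made up for this example; the numbers are taken from the table.

```python
# (UniProt entry, UniProt length, PDB entry, first position, last position),
# all values copied from the table of proteins.
structures = [
    ("P10775", 456, "2BNH", 1, 456),
    ("Q9X0E6", 101, "1VHF", 2, 101),
    ("Q08209", 521, "1AUI", 1, 521),
    ("P04062", 536, "1OGS", 40, 536),
]


def coverage(length, first, last):
    """Fraction of the UniProt sequence spanned by the PDB positions."""
    return (last - first + 1) / length


for uniprot, length, pdb, first, last in structures:
    print(f"{uniprot} {pdb} {coverage(length, first, last):.3f}")
```

When several candidate structures exist for one entry, the same function could be applied to each of them and the results compared.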


The Phenylketonuria group's script [[Phenylketonuria/Task3/Scripts | filter_secStruc.pl]] was used to reduce the secondary structures to the three-state code E, H and L, as described in the table (<xr id="secondary structure code"/>). In the DSSP output, irregular regions are encoded as "-". The precision between two of the resulting secondary structure strings was then calculated with the group's second script, [[Phenylketonuria/Task3/Scripts | SecStrucComparison.jar]].

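<figtable id="secondary structure code">
{| class="colBasic2"
|-
! colspan="8" style="background:#32CD32;" | "Secondary Structure"
|-
! style="background:#90EE90;" align="center" | Type
! style="background:#90EE90;" align="center" | ReProf
! style="background:#90EE90;" align="center" | PsiPred
! style="background:#90EE90;" align="center" | DSSP
|-
| Helix (alpha) || H || H || GHI
|-
| Extended strand (beta) || E || E || BE
|-
| Loops/Turns || L || C || ST
|-
|}
<center><small>'''<caption>''' The different types to represent secondary structure in ReProf, PsiPred and DSSP.</caption></small></center>
</figtable>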
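
The reduction to three states and the comparison of two strings can be sketched in Python as follows. This is not the code of filter_secStruc.pl or SecStrucComparison.jar, whose internals are not described here. The mapping follows the table above, and two points are assumptions of the sketch: "-" and any unlisted symbol are treated as loops, and "precision" is taken to mean the fraction of positions at which the two reduced strings agree. The names <code>reduce_states</code> and <code>agreement</code> are made up for this example.

```python
# Three-state reduction following the table above. Treating "-" and any
# unlisted symbol as a loop is an assumption of this sketch.
TO_THREE_STATE = {
    "H": "H", "G": "H", "I": "H",                      # helices
    "E": "E", "B": "E",                                # strands
    "L": "L", "C": "L", "S": "L", "T": "L", "-": "L",  # loops, turns, irregular
}


def reduce_states(ss):
    """Map a raw secondary structure string to the alphabet H/E/L."""
    return "".join(TO_THREE_STATE.get(symbol, "L") for symbol in ss.upper())


def agreement(predicted, reference):
    """Fraction of positions with identical three-state labels."""
    if len(predicted) != len(reference):
        raise ValueError("strings must have the same length")
    if not predicted:
        raise ValueError("strings must not be empty")
    a, b = reduce_states(predicted), reduce_states(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up toy strings, not taken from any of the four proteins.
dssp_toy = "-GGGHHHHTTEEBS"
psipred_toy = "CCHHHHHHCCEEEC"
print(reduce_states(dssp_toy))
print(agreement(psipred_toy, dssp_toy))
```

Both strings must refer to the same residues, so a structure that covers only part of the UniProt sequence (for example positions 40-536 of P04062) has to be aligned to the prediction before the comparison.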

First, PSSMs from different PSI-BLAST runs were tested on the shortest protein, Q9X0E6. All combinations of database (big_80 or swissprot), number of iterations (2 or 3) and E-value (2E-3, 10E-10 or 10E-20) were tried, and the run parameters that gave the best precision compared to PsiPred and DSSP were chosen. The best parameters were: '''big_80, 3 iterations and E-value cutoff 10E-10'''. They were then applied to create the PSSMs for the other proteins. (The table summarizing the results for all parameters can be found here: <code>/mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure/reprof_out/parsedSecStr/README.Q9X0E6.psiblast_param.precision</code>.)
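
The parameter grid described above has 2 x 2 x 3 = 12 combinations, which can be enumerated as in the following sketch. It only builds the command lines and does not run them. Several things are assumptions: the journal does not say whether BLAST+ <code>psiblast</code> or the legacy <code>blastpgp</code> was used, nor whether the E-value was the report threshold (<code>-evalue</code>, used here) or the inclusion threshold (<code>-inclusion_ethresh</code>). The query file name, the database names used as paths and the PSSM naming scheme are placeholders.

```python
from itertools import product

databases = ["big_80", "swissprot"]     # database names as given in the text
iterations = [2, 3]
evalues = ["2E-3", "10E-10", "10E-20"]  # copied verbatim from the text


def psiblast_command(query, db, n_iter, evalue):
    """Build (but do not run) one PSI-BLAST call that writes an ASCII PSSM."""
    pssm = f"{query}.{db}.it{n_iter}.e{evalue}.pssm"  # hypothetical naming scheme
    return [
        "psiblast",
        "-query", query,
        "-db", db,
        "-num_iterations", str(n_iter),
        "-evalue", evalue,
        "-out_ascii_pssm", pssm,
    ]


commands = [
    psiblast_command("Q9X0E6.fasta", db, n_iter, evalue)
    for db, n_iter, evalue in product(databases, iterations, evalues)
]

print(len(commands))
for cmd in commands:
    print(" ".join(cmd))
```

Each resulting PSSM would then be passed to ReProf, and the prediction scored against PsiPred and DSSP to pick the best combination.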

The other scripts used and the data created can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/SecondaryStructure</code>

Disorder

IUPred: We ran IUPred in the "long", "short" and "glob" modes on the same four proteins (P10775, Q08209, Q9X0E6, P04062). In addition, we generated graphical output with the IUPred server.

TODO: combine the three modes into one plot for each protein.


MD (MetaDisorder): The output files ending in ".md.out.mdisorder" have the following columns (example for P10775):

 Number  Residue  NORSnet  NORS2st  PROFbval  bval2st  Ucon  Ucon2st  MD_raw  MD_rel  MD2st
 1       M        0.19     -        0.99      D        0.15  -        0.551   1       D
 2       N        0.11     -        0.78      D        0.21  -        0.475   1       -
 3       L        0.08     -        0.65      D        0.26  -        0.455   2       -
 4       D        0.13     -        0.68      D        0.20  -        0.414   3       -
 5       I        0.09     -        0.40      -        0.22  -        0.364   5       -
 6       H        0.14     -        0.48      D        0.29  -        0.384   4       -
 7       C        0.58     D        0.57      D        0.26  -        0.384   4       -
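
As a starting point for the plots, the columns shown above can be read with a small parser such as the following sketch. It assumes that the file is whitespace-separated with exactly the eleven columns of the example, that data lines start with the residue number, and that MD_rel is always an integer as in the rows shown. The function name <code>parse_mdisorder</code> is made up for this example, and the sample lines are the seven rows from the excerpt.

```python
COLUMNS = ["Number", "Residue", "NORSnet", "NORS2st", "PROFbval", "bval2st",
           "Ucon", "Ucon2st", "MD_raw", "MD_rel", "MD2st"]

# Columns converted to numbers; the remaining ones stay strings ("D" or "-").
NUMERIC = {"Number": int, "NORSnet": float, "PROFbval": float,
           "Ucon": float, "MD_raw": float, "MD_rel": int}


def parse_mdisorder(lines):
    """Return one dict per residue; the header and other lines are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        row = dict(zip(COLUMNS, fields))
        for name, cast in NUMERIC.items():
            row[name] = cast(row[name])
        rows.append(row)
    return rows


sample = """\
Number Residue NORSnet NORS2st PROFbval bval2st Ucon Ucon2st MD_raw MD_rel MD2st
1 M 0.19 - 0.99 D 0.15 - 0.551 1 D
2 N 0.11 - 0.78 D 0.21 - 0.475 1 -
3 L 0.08 - 0.65 D 0.26 - 0.455 2 -
4 D 0.13 - 0.68 D 0.20 - 0.414 3 -
5 I 0.09 - 0.40 - 0.22 - 0.364 5 -
6 H 0.14 - 0.48 D 0.29 - 0.384 4 -
7 C 0.58 D 0.57 D 0.26 - 0.384 4 -
"""

rows = parse_mdisorder(sample.splitlines())
md_raw = [row["MD_raw"] for row in rows]
disordered = [row["Number"] for row in rows if row["MD2st"] == "D"]
print(md_raw)
print(disordered)
```

The lists of per-residue scores (for example <code>md_raw</code>) can be handed to any plotting tool, with the residue number on the x-axis.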

TODO: plot P10775, PROFbval, Ucon and MD_raw (MD_rel?).


All scripts used and all data created can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment3_SequenceBasedPredictions/Disorder</code>.