Canavan Disease: Task 03 - Journal

From Bioinformatikpedia
Revision as of 10:36, 28 August 2013 by Boehma (talk | contribs)

Link back to Task 03: Sequence-based Predictions

Task 3 Working Log

Secondary structure prediction

  • creation of PSSM files via PSI-BLAST
  • blastpgp -i /mnt/home/student/.../data/P45381.fasta -o /mnt/home/student/.../aspa_big80.out 
    -d /mnt/project/pracstrucfunc13/data/big/big_80 -C /mnt/home/student/.../aspa_big80.chk -Q /mnt/home/student/.../aspa_big80.pssm 
    -h 10e-10 -j 3
    
       where:
    -i input file (query sequence in FASTA format)
    -o output file
    -d database to search against
    -C checkpoint file
    -Q PSSM output file (ASCII)
    -h E-value cutoff for including hits in the profile
    -j number of iterations
    
  • PSSMs from PSI-BLAST were generated with 3 iterations and an E-value cutoff of 10e-10
  • combinations:
    • PSSM from big_80
    • PSSM from SwissProt
    • without PSSM
  • DSSP output and PSIPRED output were generated via the respective websites
  • DSSP poses the following problem for ASPA: 2I3C/P45381 is present as a homodimer in the crystal structure, so the PDB file used to generate the DSSP output contains the FASTA sequence twice. In addition, DSSP starts assigning secondary structure at position 10 of the amino acid sequence and stops at position 301. The DSSP output therefore has to be aligned properly with the output of the other tools.
  • statistics were generated via sec_struc_pred_statistics.py
  • the statistics for aspartoacylase showed:
    • DSSP is taken as the "truth", because DSSP assigns secondary structure from the atomic coordinates rather than predicting it
    • the precision for each prediction method was calculated
    • the results show that PSIPRED has the best precision
    • among the ReProf runs, however, ReProf with the sequence profile from Big_80 gives the best result
    • ReProf with the Big_80 PSSM is used for further predictions; PSIPRED runs are made as well for comparison
  • P10775 -> 1DFJ (attention: chain I only): the DSSP assignment comes from the PDB file of 1DFJ, but only the assignment for chain I may be used for the comparison
  • Q08209 -> 1AUI (attention: chain A only): the DSSP assignment comes from the PDB file of 1AUI, but only the assignment for chain A may be used for the comparison. In addition, parts of the sequence are not resolved in the crystal structure, so several positions had to be inserted manually and are marked with '*'
  • Q9X0E6 -> 1O5J: the DSSP assignment from the PDB file of 1O5J spans the complete protein, so the assignment should be fine (except for the first 3 residues)
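
The alignment and precision steps described above can be sketched as follows. This is a minimal illustration and not the contents of sec_struc_pred_statistics.py: the function names pad_assignment and precision_per_state are hypothetical, both strings are assumed to be already reduced to the three states H/E/C, and '*' marks positions without a DSSP assignment (as in the 1AUI note).

```python
def pad_assignment(assignment, first_pos, seq_len, gap="*"):
    """Place a DSSP string that starts at 1-based residue `first_pos`
    into a string of length `seq_len`; uncovered positions become `gap`."""
    n_left = first_pos - 1
    n_right = seq_len - n_left - len(assignment)
    if first_pos < 1 or n_right < 0:
        raise ValueError("assignment does not fit into the sequence")
    return gap * n_left + assignment + gap * n_right


def precision_per_state(predicted, observed, states="HEC", gap="*"):
    """Precision per state: of all residues predicted as state s (and
    covered by DSSP), the fraction that DSSP also assigns to s.
    Returns None for a state that was never predicted."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be aligned to the same length")
    result = {}
    for state in states:
        # Observed states at all covered positions predicted as `state`.
        hits = [o for p, o in zip(predicted, observed)
                if p == state and o != gap]
        result[state] = (hits.count(state) / len(hits)) if hits else None
    return result


# Toy example (made-up strings, not real ASPA data):
# DSSP covers residues 3..7 of an 8-residue sequence.
observed = pad_assignment("HHHEE", first_pos=3, seq_len=8)  # "**HHHEE*"
predicted = "CHHECEEC"
print(precision_per_state(predicted, observed))
```

For ASPA the same padding would be applied with first_pos=10, since DSSP only covers positions 10 to 301; for the dimeric PDB entries only one chain's assignment would be passed in.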

Disorder

  • creation of the IUPred predictions via run_iupred.sh
  • creation of the analysis via disorder_statistics.py
  • the analysis script is applicable to both IUPred and MetaDisorder output
  • to find the right match for P10775 in DisProt, a sequence search had to be run (Smith-Waterman and PSI-Pred on the DisProt website)
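
A per-residue disorder analysis of this kind might look like the sketch below. It is not the actual disorder_statistics.py. It assumes IUPred-style text output with '#' comment lines followed by whitespace-separated "position residue score" lines, and it assumes the commonly used cutoff of 0.5 for calling a residue disordered; the function names are hypothetical.

```python
def parse_scores(text):
    """Read per-residue scores from 'position residue score' lines,
    skipping blank lines and '#' comment lines (assumed format)."""
    scores = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        scores.append(float(fields[2]))
    return scores


def disordered_regions(scores, threshold=0.5):
    """Return 1-based, inclusive (start, end) ranges of consecutive
    residues whose score is at or above `threshold`."""
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


# Made-up example input, not real prediction output.
example = """# example header
1 M 0.80
2 A 0.60
3 K 0.20
4 L 0.55
"""
scores = parse_scores(example)
print(disordered_regions(scores))  # [(1, 2), (4, 4)]
```

The resulting ranges could then be compared against the disordered regions annotated in DisProt for the matching entry.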

TMH prediction

  • PolyPhobius
    • blastget index file creation
    • /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -create /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot.fasta
      
    • blastget for the individual files
    • /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -db /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot -ix swiss_p.idx ../data/query_seqs/TMH/P45381.fasta >> P45381.blastget.out
      
    • kalign for the individual files
    • /mnt/opt/T-Coffee/bin/kalign -i P45381.blastget.out -o P45381.kalign.out
      
    • PolyPhobius
    • /mnt/project/pracstrucfunc13/polyphobius/jphobius -poly P45381.kalign.out >> P45381.polyph.out
      
  • MEMSAT
    • P45381: job ID cc3e3788-c45a-11e2-add6-00163e110593
    • P35462: job ID 4cf4b7b0-c3c7-11e2-840f-00163e110593
    • P47863: job ID 6799d884-c3c7-11e2-8b61-00163e110593
    • Q9YDF8: job ID 81e7ddda-c3c7-11e2-8b61-00163e110593
    • result page for Q9YDF8: http://bioinf.cs.ucl.ac.uk/psipred/result/81e7ddda-c3c7-11e2-8b61-00163e110593
    • it was difficult to find the right protein to compare Q9YDF8 against (in OPM/PDBTM) -> how this was achieved is explained in the wiki
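
The three per-query PolyPhobius steps logged above (blastget, kalign, jphobius) could be chained for each protein roughly as sketched here. The paths and flags are copied from the commands in this log; the helper names polyphobius_steps and run_steps are hypothetical, the blastget index swiss_p.idx is assumed to exist already, and the -ix option is passed only once.

```python
import subprocess

POLY_DIR = "/mnt/project/pracstrucfunc13/polyphobius"
SWISSPROT_DB = "/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot"
KALIGN = "/mnt/opt/T-Coffee/bin/kalign"


def polyphobius_steps(accession, query_dir="../data/query_seqs/TMH",
                      index="swiss_p.idx"):
    """Return the three steps as (argv, stdout_file) pairs.
    stdout_file is None when the tool writes its own output file."""
    hits = f"{accession}.blastget.out"
    alignment = f"{accession}.kalign.out"
    prediction = f"{accession}.polyph.out"
    return [
        ([f"{POLY_DIR}/blastget", "-ix", index, "-db", SWISSPROT_DB,
          f"{query_dir}/{accession}.fasta"], hits),
        ([KALIGN, "-i", hits, "-o", alignment], None),
        ([f"{POLY_DIR}/jphobius", "-poly", alignment], prediction),
    ]


def run_steps(steps):
    """Run the steps in order; stop at the first failing command."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            # Mode "a" mirrors the '>>' redirection used in the log.
            with open(out_path, "a") as out:
                subprocess.run(argv, stdout=out, check=True)


# Inspect the commands for one query without executing anything.
for argv, out_path in polyphobius_steps("P45381"):
    print(" ".join(argv), ">>" if out_path else "", out_path or "")
```

Calling run_steps(polyphobius_steps(acc)) in a loop over the accessions of interest would repeat the pipeline per protein; because the output files are opened in append mode, stale output files from earlier runs should be removed first.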

SignalP

  • the SignalP output files were created with SignalP version 4.1 (web server)

GOterms: