Difference between revisions of "Canavan Disease: Task 03 - Journal"

From Bioinformatikpedia
(Created page with "<h1 id="task-3-working-log">Task 3 Working Log</h1> <h2 id="secondary-structure-prediction">Secondary structure prediction</h2> <ul> <li>creation of pssm files via psi-blast (b…")
 
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
Link back to Task 03: [[Canavan_Disease:_Task_03_-_Sequence-based_Predictions|Sequence-based Predictions]]
  +
 
<h1 id="task-3-working-log">Task 3 Working Log</h1>
 
<h1 id="task-3-working-log">Task 3 Working Log</h1>
   
Line 4: Line 6:
   
 
<ul>
 
<ul>
  +
<li>creation of pssm files via psi-blast </li>
<li>creation of pssm files via psi-blast (blastpgp -i /mnt/home/student/mahlich/mapra/data/query_seqs/P45381.fasta -o /mnt/home/student/mahlich/mapra/sec_struc_pred/outfiles/aspa_big80.out -d /mnt/project/pracstrucfunc13/data/big/big_80 -C /mnt/home/student/mahlich/mapra/aspa_big80.chk -Q /mnt/home/student/mahlich/mapra/aspa_big80.pssm -h 10e-10 -j 3)</li>
 
  +
<nowiki>blastpgp -i /mnt/home/student/.../data/P45381.fasta -o /mnt/home/student/.../aspa_big80.out
  +
-d /mnt/project/pracstrucfunc13/data/big/big_80 -C /mnt/home/student/.../aspa_big80.chk -Q /mnt/home/student/.../aspa_big80.pssm
  +
-h 10e-10 -j 3)
  +
  +
where:
  +
-i input file
  +
-o outfile
  +
-d database to search against
  +
-C checkfile
  +
-Q pssm file
  +
-h eVaule cutoff
  +
-j number of iterations</nowiki>
  +
 
<li>PSSMs from PSI-Blast generated with 3 iterations and an e-Value-cutoff of 10e-10</li>
 
<li>PSSMs from PSI-Blast generated with 3 iterations and an e-Value-cutoff of 10e-10</li>
 
<li>combinations:
 
<li>combinations:
Line 13: Line 28:
 
</ul></li>
 
</ul></li>
 
<li>dssp-output and psipred-output generated via websites</li>
 
<li>dssp-output and psipred-output generated via websites</li>
<li>dssp shows following problem for ASPA: 2I3C/P45381 is present as a homodimer in crystal structure. As a result of this the PDB file that is used to genereate the dssp output contains "the fasta sequence" twice. Additional DSSP starts assigning secondary structure at position 10 of the Aminoacid sequence and stops at position 301. Therefore the DSSP-output and the output of the remaining tools has to be properly alligend.</li>
+
<li>dssp shows following problem for ASPA: 2I3C/P45381 is present as a homo dimer in crystal structure. As a result of this the PDB file that is used to generate the dssp output contains "the fasta sequence" twice. Additional DSSP starts assigning secondary structure at position 10 of the amino acid sequence and stops at position 301. Therefore the DSSP-output and the output of the remaining tools has to be properly aligned.</li>
<li>statistics where generated via sec<em>struc</em>pred_statistics.py</li>
+
<li>statistics where generated via sec_struc_pred_statistics.py</li>
<li>the statistics showed for the asparto acylase:
+
<li>the statistics showed for the asparto-acylase:
 
<ul>
 
<ul>
 
<li>DSSP taken as "truth" as DSSP assigns secondary structure (does not predict) from the atomic coordinates</li>
 
<li>DSSP taken as "truth" as DSSP assigns secondary structure (does not predict) from the atomic coordinates</li>
 
<li>the precision for each <em>prediction method</em> was calculated</li>
 
<li>the precision for each <em>prediction method</em> was calculated</li>
 
<li>the results show that PSI-Pred shows the best precision</li>
 
<li>the results show that PSI-Pred shows the best precision</li>
<li>how ever within ReProf ReProf with sequence profile from Big<em>80 shows the best result</li>
+
<li>how ever within ReProf ReProf with sequence profile from Big_80 shows the best result</li>
<li>Reprof with Big</em>80 PSSM used for further predictions, however for comparision PSI-PRED runs are made as well</li>
+
<li>Reprof with Big_80 PSSM used for further predictions, however for comparison PSI-PRED runs are made as well</li>
 
</ul></li>
 
</ul></li>
<li>P10775 -> 1DFJ (attention only Chain I): DSSP assignment from PDB-file of 1DFJ however for comparison only the assignment of chain I in the PDB-file is aceptable</li>
+
<li>P10775 -> 1DFJ (attention only Chain I): DSSP assignment from PDB-file of 1DFJ however for comparison only the assignment of chain I in the PDB-file is acceptable</li>
<li>Q08209 -> 1AUI (attention only Chain A): DSSP assignment from PDB-file of 1AUI however for comparison only the assignment of chain A in the PDB-file is aceptable, additional to that parst of the sequnce are not crystalized in the PDB-file therefore a couple of positions had to be manually inserted marked with '*'</li>
+
<li>Q08209 -> 1AUI (attention only Chain A): DSSP assignment from PDB-file of 1AUI however for comparison only the assignment of chain A in the PDB-file is acceptable, additional to that parts of the sequence are not crystallized in the PDB-file therefore a couple of positions had to be manually inserted marked with '*'</li>
 
<li>Q9X0E6 -> 1O5J: DSSP assignment from PDB-file of 1O5J spans the complete protein therefore the DSSP assignment should be okay (except the frist 3 residues)</li>
 
<li>Q9X0E6 -> 1O5J: DSSP assignment from PDB-file of 1O5J spans the complete protein therefore the DSSP assignment should be okay (except the frist 3 residues)</li>
 
</ul>
 
</ul>
   
<h2>Disorder</h2>
+
<h2> Disorder</h2>
   
 
<ul>
 
<ul>
 
<li>creation of the IUPred predictions via run_iupred.sh</li>
 
<li>creation of the IUPred predictions via run_iupred.sh</li>
<li>ceration of analysis via foo.py</li>
+
<li>creation of analysis via disorder_statistics.py</li>
<li>aplicable for both UiPred and metadisorder</li>
+
<li>applicable for both IUPred and Metadisorder</li>
  +
<li>finding the right match for P10775 in disprot a sequence search had to be initiated (swiss-Waterman and PSI-Pred on the disprot-website)</li>
 
</ul>
 
</ul>
   
Line 41: Line 57:
 
<li>Polyphobius
 
<li>Polyphobius
 
<ul>
 
<ul>
<li>blastget index file creation /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -create /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot.fasta</li>
+
<li>blastget index file creation </li>
<li>blast get for the single files /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -db /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot -ix swiss_p.idx ../data/query_seqs/TMH/P45381.fasta &gt;&gt; P45381.blastget.out</li>
+
<nowiki>/mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -create /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot.fasta</nowiki>
<li>kaling for the single files /mnt/opt/T-Coffee/bin/kalign -i P45381.blastget.out -o P45381.kalign.out</li>
+
<li>blast get for the single files </li>
<li>polyphobius /mnt/project/pracstrucfunc13/polyphobius/jphobius -poly P45381.kalign.out &gt;&gt; P45381.polyph.out</li>
+
<nowiki>/mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -db /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot -ix swiss_p.idx ../data/query_seqs/TMH/P45381.fasta &gt;&gt; P45381.blastget.out</nowiki>
  +
<li>kaling for the single files </li>
  +
<nowiki>/mnt/opt/T-Coffee/bin/kalign -i P45381.blastget.out -o P45381.kalign.out</nowiki>
  +
<li>polyphobius </li>
  +
<nowiki>/mnt/project/pracstrucfunc13/polyphobius/jphobius -poly P45381.kalign.out &gt;&gt; P45381.polyph.out</nowiki>
 
</ul></li>
 
</ul></li>
<li>MEMSAT</li>
+
<li>MEMSAT
  +
<ul>
 
<li>Your job is in the queue under the name: P45381 with the job ID: cc3e3788-c45a-11e2-add6-00163e110593</li>
 
<li>Your job is in the queue under the name: P45381 with the job ID: cc3e3788-c45a-11e2-add6-00163e110593</li>
 
<li>Your job is in the queue under the name: P35462 with the job ID: 4cf4b7b0-c3c7-11e2-840f-00163e110593</li>
 
<li>Your job is in the queue under the name: P35462 with the job ID: 4cf4b7b0-c3c7-11e2-840f-00163e110593</li>
 
<li>Your job is in the queue under the name: P47863 with the job ID: 6799d884-c3c7-11e2-8b61-00163e110593</li>
 
<li>Your job is in the queue under the name: P47863 with the job ID: 6799d884-c3c7-11e2-8b61-00163e110593</li>
<li>Your job is in the queue under the name: Q9YDF8 with the job ID: 81e7ddda-c3c7-11e2-8b61-00163e110593
+
<li>Your job is in the queue under the name: Q9YDF8 with the job ID: 81e7ddda-c3c7-11e2-8b61-00163e110593</li>
http://bioinf.cs.ucl.ac.uk/psipred/result/81e7ddda-c3c7-11e2-8b61-00163e110593</li>
+
<li>http://bioinf.cs.ucl.ac.uk/psipred/result/81e7ddda-c3c7-11e2-8b61-00163e110593</li>
  +
<li>Difficulty to find the right protein to compare Q9YDF8 to (in OMP/PDBTM) -> how that was achieved is explained in the wiki</li>
  +
</ul></li>
 
</ul>
 
</ul>
   
Line 57: Line 80:
   
 
<ul>
 
<ul>
<li>at the moment the singalP website seems to be down</li>
+
<li>for creation of the signalP outfiles SginalP version 4.1 was used (the web server)</li>
 
</ul>
 
</ul>
   
Line 64: Line 87:
 
<ul>
 
<ul>
 
<li>GoPet see xml file</li>
 
<li>GoPet see xml file</li>
<li>Protfun see signalP same problem</li>
+
<li>Protfun used via the webserver</li>
 
<li>PFam see: http://pfam.sanger.ac.uk/protein/P45381</li>
 
<li>PFam see: http://pfam.sanger.ac.uk/protein/P45381</li>
 
</ul>
 
</ul>

Latest revision as of 10:36, 28 August 2013

Link back to Task 03: Sequence-based Predictions

Task 3 Working Log

Secondary structure prediction

  • creation of pssm files via psi-blast
  • blastpgp -i /mnt/home/student/.../data/P45381.fasta -o /mnt/home/student/.../aspa_big80.out 
    -d /mnt/project/pracstrucfunc13/data/big/big_80 -C /mnt/home/student/.../aspa_big80.chk -Q /mnt/home/student/.../aspa_big80.pssm 
    -h 10e-10 -j 3
    
       where:
    -i input file
    -o outfile
    -d database to search against
    -C checkfile
    -Q pssm file
    -h E-value cutoff
    -j number of iterations
    
  • PSSMs from PSI-BLAST were generated with 3 iterations and an E-value cutoff of 10e-10
  • combinations:
    • pssm from big 80
    • pssm from swissprot
    • without pssm
  • dssp-output and psipred-output generated via websites
  • DSSP shows the following problem for ASPA: 2I3C/P45381 is present as a homodimer in the crystal structure. As a result, the PDB file used to generate the DSSP output contains "the FASTA sequence" twice. In addition, DSSP starts assigning secondary structure at position 10 of the amino acid sequence and stops at position 301. Therefore the DSSP output and the output of the remaining tools have to be properly aligned.
  • statistics were generated via sec_struc_pred_statistics.py
  • the statistics showed for aspartoacylase:
    • DSSP taken as "truth" as DSSP assigns secondary structure (does not predict) from the atomic coordinates
    • the precision for each prediction method was calculated
    • the results show that PSIPRED has the best precision
    • however, among the ReProf runs, ReProf with the sequence profile from Big_80 shows the best result
    • ReProf with the Big_80 PSSM is used for further predictions; for comparison, PSIPRED runs are made as well
  • P10775 -> 1DFJ (attention: only chain I): DSSP assignment from the PDB file of 1DFJ; for comparison, only the assignment of chain I in the PDB file is acceptable
  • Q08209 -> 1AUI (attention: only chain A): DSSP assignment from the PDB file of 1AUI; for comparison, only the assignment of chain A in the PDB file is acceptable. In addition, parts of the sequence are not crystallized in the PDB file, therefore a couple of positions had to be inserted manually, marked with '*'
  • Q9X0E6 -> 1O5J: DSSP assignment from the PDB file of 1O5J spans the complete protein, therefore the DSSP assignment should be okay (except the first 3 residues)
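
The offset handling and the precision calculation noted above can be sketched in Python as follows. This is a minimal sketch and not the contents of sec_struc_pred_statistics.py, which is not shown in this log. The function names, the three-state alphabet (H, E, L), the reduction of DSSP's eight states to three, and the use of '*' for every position without a DSSP assignment are all assumptions made for illustration.

```python
# Hypothetical helpers; this is NOT the original sec_struc_pred_statistics.py.

# Assumed reduction of DSSP states to three classes
# (H = helix, E = strand, everything else = L).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def pad_dssp(dssp_states, first_pos, seq_len, gap="*"):
    """Place a DSSP string that starts at 1-based position first_pos into a
    string of length seq_len; positions without an assignment get gap."""
    padded = [gap] * seq_len
    for offset, state in enumerate(dssp_states):
        index = first_pos - 1 + offset
        if index < seq_len:
            padded[index] = DSSP_TO_3STATE.get(state, "L")
    return "".join(padded)


def precision_per_state(predicted, observed, gap="*"):
    """Precision per predicted state: correct predictions of a state divided
    by all predictions of that state, skipping positions marked with gap."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and DSSP string must have equal length")
    correct, total = {}, {}
    for pred, obs in zip(predicted, observed):
        if obs == gap:
            continue
        total[pred] = total.get(pred, 0) + 1
        if pred == obs:
            correct[pred] = correct.get(pred, 0) + 1
    return {state: correct.get(state, 0) / total[state] for state in total}
```

For ASPA, one would pass the DSSP string of a single chain of 2I3C with first_pos=10, so that the padded string lines up with the ReProf and PSIPRED output for the full sequence.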

 Disorder

  • creation of the IUPred predictions via run_iupred.sh
  • creation of analysis via disorder_statistics.py
  • applicable for both IUPred and Metadisorder
  • to find the right match for P10775 in DisProt, a sequence search had to be run (Smith-Waterman and PSI-Pred on the DisProt website)

TMH prediction

  • PolyPhobius
    • blastget index file creation
    • /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -create /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot.fasta
      
    • blastget for the single files
    • /mnt/project/pracstrucfunc13/polyphobius/blastget -ix swiss_p.idx -db /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot -ix swiss_p.idx ../data/query_seqs/TMH/P45381.fasta >> P45381.blastget.out
      
    • kalign for the single files
    • /mnt/opt/T-Coffee/bin/kalign -i P45381.blastget.out -o P45381.kalign.out
      
    • PolyPhobius
    • /mnt/project/pracstrucfunc13/polyphobius/jphobius -poly P45381.kalign.out >> P45381.polyph.out
      
  • MEMSAT
    • Your job is in the queue under the name: P45381 with the job ID: cc3e3788-c45a-11e2-add6-00163e110593
    • Your job is in the queue under the name: P35462 with the job ID: 4cf4b7b0-c3c7-11e2-840f-00163e110593
    • Your job is in the queue under the name: P47863 with the job ID: 6799d884-c3c7-11e2-8b61-00163e110593
    • Your job is in the queue under the name: Q9YDF8 with the job ID: 81e7ddda-c3c7-11e2-8b61-00163e110593
    • http://bioinf.cs.ucl.ac.uk/psipred/result/81e7ddda-c3c7-11e2-8b61-00163e110593
    • Difficulty finding the right protein to compare Q9YDF8 to (in OPM/PDBTM) -> how that was achieved is explained in the wiki
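
The three PolyPhobius steps above (blastget, kalign, jphobius) can be chained per protein with a small wrapper. The following is a sketch under assumptions: the function names and the wrapper itself are hypothetical, only the paths and flags that appear in the log are used, and the index file swiss_p.idx is assumed to exist already.

```python
# Hypothetical wrapper; only paths and flags listed in the log are used.
import subprocess

POLYPHOBIUS_DIR = "/mnt/project/pracstrucfunc13/polyphobius"
SWISSPROT = "/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot"
KALIGN = "/mnt/opt/T-Coffee/bin/kalign"


def polyphobius_steps(accession, fasta_dir="../data/query_seqs/TMH",
                      index="swiss_p.idx"):
    """Return (argv, stdout_file) pairs for one protein; stdout_file is None
    when the tool writes its own output file."""
    blast_out = f"{accession}.blastget.out"
    kalign_out = f"{accession}.kalign.out"
    return [
        ([f"{POLYPHOBIUS_DIR}/blastget", "-ix", index, "-db", SWISSPROT,
          f"{fasta_dir}/{accession}.fasta"], blast_out),
        ([KALIGN, "-i", blast_out, "-o", kalign_out], None),
        ([f"{POLYPHOBIUS_DIR}/jphobius", "-poly", kalign_out],
         f"{accession}.polyph.out"),
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            # Append to the file, mirroring the '>>' redirection in the log.
            with open(stdout_file, "a") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling run_steps(polyphobius_steps("P45381")) would reproduce the commands logged for ASPA; the same call could be repeated for the other accessions submitted to MEMSAT.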

SignalP

  • for creation of the SignalP outfiles, SignalP version 4.1 was used (the web server)

GOterms: