Lab journal task 2

From Bioinformatikpedia
Revision as of 12:45, 29 August 2013 by Betza

Sequence Searches

All searches were run with an increased number of output lines (10000) for both the summary hit list and the reported alignments, so that every hit found was displayed. All analyses were conducted on the complete set of displayed hits. No specific E-value cutoff was set.

The HFE protein sequence has the Uniprot ID Q30201, NCBI ID 1890180 and PDB ID 1A6Z_A.

All Blast, Psiblast and HHblits output files that were analysed were first parsed using the Perl script parse_output.pl. For example:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt  


Blast

A Blast search in big_80 was executed using the standard parameter settings:

blastall -p blastp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -o /mnt/home/student/betza/task2/blast/res_blast.txt -v 10000 -b 10000

Psiblast

Example call for Psiblast with 10 iterations and an E-value cutoff of 10E-10:

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -j 10 -h 10E-10 -v 10000 -b 10000 -o /mnt/home/student/betza/task2/psiblast/new/j1h1/j1hi.txt -Q /mnt/home/student/betza/task2/psiblast/new/j1h1/j1h1.pssm -C /mnt/home/student/betza/task2/psiblast/new/j1h1/j1h1.chk
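
Because the call above is one example out of several iteration and E-value settings, the following Python sketch shows one way such calls could be generated. It only builds argument lists from the flags already used in this journal (-i, -d, -j, -h, -v, -b, -o, -Q, -C) and does not run anything. The function names, the output directory and the label scheme "j<iterations>_h<evalue>" are assumptions made for illustration and do not reproduce the directory names used here.

```python
import shlex
from itertools import product

QUERY = "/mnt/home/student/betza/data/hfe.fasta"
DB = "/mnt/project/pracstrucfunc13/data/big/big_80"


def blastpgp_args(iterations, evalue, out_prefix):
    """Return the blastpgp argument list for one parameter combination."""
    return [
        "blastpgp",
        "-i", QUERY,
        "-d", DB,
        "-j", str(iterations),      # number of iterations
        "-h", str(evalue),          # E-value cutoff for inclusion in the profile
        "-v", "10000", "-b", "10000",
        "-o", out_prefix + ".txt",
        "-Q", out_prefix + ".pssm",
        "-C", out_prefix + ".chk",  # checkpoint file, reusable later with -R
    ]


def parameter_grid(iteration_counts, evalues, out_dir):
    """Yield (label, args) for every iterations x E-value combination."""
    for j, h in product(iteration_counts, evalues):
        label = "j{}_h{}".format(j, h)  # hypothetical naming scheme
        yield label, blastpgp_args(j, h, "{}/{}".format(out_dir, label))


if __name__ == "__main__":
    # The values below are the two settings mentioned in this journal.
    for label, args in parameter_grid([2, 10], ["0.002", "10E-10"], "psiblast_out"):
        print(" ".join(shlex.quote(a) for a in args))
```

Each printed line can be inspected before it is run by hand or passed to subprocess.run.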

The Perl script devide_psiblast_out.pl was then used to split the output into the individual Psiblast iterations, so that the results of the last iteration could be analysed on their own.

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/devide_psiblast_out.pl /mnt/home/student/betza/task2/psiblast/new/j1h1/j1hi.txt
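
The splitting step can be sketched in Python as follows. This is not the devide_psiblast_out.pl script itself, whose implementation is not shown here. It is a minimal illustration that assumes each iteration in the blastpgp text report begins with a line of the form "Results from round N"; the function names are hypothetical.

```python
import re

# Assumed marker line that opens each iteration in the text report.
ROUND_HEADER = re.compile(r"^Results from round (\d+)")


def split_rounds(text):
    """Return {round_number: report_text} for each iteration in the report."""
    rounds = {}
    current = None
    for line in text.splitlines(keepends=True):
        match = ROUND_HEADER.match(line)
        if match:
            current = int(match.group(1))
            rounds[current] = []
        # Lines before the first round header (program banner) are skipped.
        if current is not None:
            rounds[current].append(line)
    return {number: "".join(lines) for number, lines in rounds.items()}


def last_round(text):
    """Return the text of the highest-numbered iteration."""
    rounds = split_rounds(text)
    if not rounds:
        raise ValueError("no 'Results from round N' header found")
    return rounds[max(rounds)]
```

Calling last_round on the contents of a report file returns only the final iteration, which is the part analysed further.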

The database /mnt/home/rost/kloppmann/data/blast_db/pdb_seqres was then searched by reloading the checkpoint files created earlier with the -R flag. Example using the checkpoint file from the run with 2 iterations and an E-value cutoff of 2E-3:

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/home/rost/kloppmann/data/blast_db/pdb_seqres -j 1 -h 0.002 -v 10000 -b 10000 -m $OF -o /mnt/home/student/betza/task2/psiblast/pdb/new/j2h2/j2h2.$FE -Q /mnt/home/student/betza/task2/psiblast/pdb/new/j2h2/j2h2.pssm -R /mnt/home/student/betza/task2/psiblast/new/j2h2/j2h2.chk

HHblits

HHblits command-line call:

hhblits -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/rost_db/data/hhblits/uniprot20_02Sep11 -o /mnt/home/student/betza/task2/hhblits/hfe.hhr -oa3m /mnt/home/student/betza/task2/hhblits/hfe.a3m -oalis /mnt/home/student/betza/task2/hhblits/hfe -ohhm /mnt/home/student/betza/task2/hfe.hhm -Z 10000 -B 10000


The output files were analysed using the script parse_output.pl:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt --query 1a6z_A --sot L30 --out_h /mnt/home/student/betza/task2/hhblits/

Evaluation

CATH
The Python script compareCath.py was written to check how far the CATH classification of the query protein overlaps with the classifications of the hits. Example call:

python /mnt/home/student/betza/scripts/compareCath.py -i /mnt/home/student/betza/task2/blast/res_blast.txt_results -q 1a6zA > /mnt/home/student/betza/task2/blast/res_blast.txt/blast_cath
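
The kind of comparison described above can be sketched as follows. This is not the code of compareCath.py, which is not reproduced here. It is a minimal illustration that assumes each protein is annotated with one or more four-level CATH codes (Class.Architecture.Topology.Homologous superfamily) and counts, for each hit, how many leading levels it shares with the query. All function names are hypothetical.

```python
def shared_cath_depth(code_a, code_b):
    """Number of leading CATH levels (C, A, T, H) that two codes share."""
    depth = 0
    for a, b in zip(code_a.split(".")[:4], code_b.split(".")[:4]):
        if a != b:
            break
        depth += 1
    return depth


def best_depth(query_codes, hit_codes):
    """Deepest agreement between any query domain and any hit domain."""
    return max(
        (shared_cath_depth(q, h) for q in query_codes for h in hit_codes),
        default=0,  # hit without any CATH annotation
    )


def summarise(query_codes, hits):
    """Count hits by deepest shared level; hits maps hit ID to CATH codes."""
    counts = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}
    for codes in hits.values():
        counts[best_depth(query_codes, codes)] += 1
    return counts
```

A count under key 4 means the hit shares the full classification with at least one query domain, while key 0 collects hits with a different class or no annotation.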

COPS
These analyses were also done using the script parse_output.pl:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt --query 1a6z_A --sot L30