Lab journal task 2

From Bioinformatikpedia

Sequence Searches

All searches were run with an increased number of output lines (10000) for both the one-line summary hit list and the reported alignments, so that all hits found were displayed. All analyses used every displayed hit. We set no specific E-value cutoff for reporting hits.

The HFE protein sequence has the UniProt ID Q30201, NCBI ID 1890180 and PDB ID 1A6Z_A.

All Blast, Psiblast and HHblits output files that were analysed were first parsed using the Perl script parse_output.pl. For example:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt  


Blast

A Blast search in big_80 was executed using the standard parameter settings, apart from the increased output limits (-v, -b):

blastall -p blastp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -o /mnt/home/student/betza/task2/blast/res_blast.txt -v 10000 -b 10000
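When the same call is repeated for several inputs, it can help to build it programmatically. The following is a minimal sketch, assuming Python 3.8+ (for shlex.join) and the legacy NCBI blastall binary on the PATH; the helper names blastall_cmd and run are hypothetical and only reproduce the flags used in the call above.

```python
import shlex
import subprocess


def blastall_cmd(query, db, out, n_lines=10000):
    """Build a legacy blastall (blastp) call that reports up to n_lines
    one-line descriptions (-v) and alignments (-b)."""
    return [
        "blastall", "-p", "blastp",
        "-i", query, "-d", db, "-o", out,
        "-v", str(n_lines), "-b", str(n_lines),
    ]


def run(cmd):
    """Execute the call; raises CalledProcessError if blastall exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = blastall_cmd(
    "/mnt/home/student/betza/data/hfe.fasta",
    "/mnt/project/pracstrucfunc13/data/big/big_80",
    "/mnt/home/student/betza/task2/blast/res_blast.txt",
)
print(shlex.join(cmd))  # inspect the call before running it with run(cmd)
```

Passing an argument list instead of a single shell string avoids quoting problems with paths.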

Psiblast

Example call for Psiblast with 10 iterations (-j) and an E-value inclusion cutoff (-h) of 10E-10:

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -j 10 -h 10E-10 -v 10000 -b 10000 -o /mnt/home/student/betza/task2/psiblast/new/j1h1/j1hi.txt -Q /mnt/home/student/betza/task2/psiblast/new/j1h1/j1h1.pssm -C /mnt/home/student/betza/task2/psiblast/new/j1h1/j1h1.chk
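Several Psiblast runs with different numbers of iterations and inclusion cutoffs can be generated from one template. This is a sketch only: the values in ITERATIONS and INCLUSION_EVALUES, the output root .../psiblast/grid and the jX_hY directory naming are hypothetical examples, not the settings used above. The flags (-j, -h, -v, -b, -o, -Q, -C) are the legacy blastpgp options from the example call.

```python
import itertools
import shlex
from pathlib import PurePosixPath

QUERY = "/mnt/home/student/betza/data/hfe.fasta"
DB = "/mnt/project/pracstrucfunc13/data/big/big_80"
OUT_ROOT = PurePosixPath("/mnt/home/student/betza/task2/psiblast/grid")  # hypothetical

ITERATIONS = [2, 10]                     # example values for -j
INCLUSION_EVALUES = ["0.002", "10E-10"]  # example values for -h


def psiblast_cmd(iterations, evalue, run_dir):
    """One blastpgp call writing its report, ASCII PSSM (-Q) and checkpoint (-C)."""
    stem = run_dir / run_dir.name
    return [
        "blastpgp", "-i", QUERY, "-d", DB,
        "-j", str(iterations), "-h", evalue,
        "-v", "10000", "-b", "10000",
        "-o", f"{stem}.txt",
        "-Q", f"{stem}.pssm",
        "-C", f"{stem}.chk",
    ]


# One command per (iterations, cutoff) combination.
commands = [
    psiblast_cmd(j, h, OUT_ROOT / f"j{j}_h{h}")  # hypothetical directory naming
    for j, h in itertools.product(ITERATIONS, INCLUSION_EVALUES)
]
for cmd in commands:
    print(shlex.join(cmd))
```

The run directories must exist before the commands are executed, because blastpgp does not create them.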

The Perl script devide_psiblast_out.pl was then used to split the Psiblast output by iteration, so that the results of the last iteration could be analysed on their own.

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/devide_psiblast_out.pl /mnt/home/student/betza/task2/psiblast/new/j1h1/j1hi.txt
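For comparison, the splitting step might look like this in Python. This is not a port of devide_psiblast_out.pl, whose internals are not described here. It is a minimal sketch that assumes each iteration in the legacy blastpgp text report begins with a line starting "Results from round N". The function names are hypothetical.

```python
import re

# Assumption: legacy blastpgp starts each iteration with a line "Results from round N".
ROUND_HEADER = re.compile(r"^Results from round (\d+)", re.MULTILINE)


def split_rounds(text):
    """Map round number -> report text of that round (header before round 1 is dropped)."""
    matches = list(ROUND_HEADER.finditer(text))
    rounds = {}
    for k, m in enumerate(matches):
        # Each round runs until the next round header, or to the end of the report.
        end = matches[k + 1].start() if k + 1 < len(matches) else len(text)
        rounds[int(m.group(1))] = text[m.start():end]
    return rounds


def last_round(text):
    """Report text of the final iteration, or the whole text if no round headers are found."""
    rounds = split_rounds(text)
    return rounds[max(rounds)] if rounds else text
```

Usage: `last_round(open(path).read())`, where path points at a report such as j1hi.txt. The trailing database statistics end up in the last round's text.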

The /mnt/home/rost/kloppmann/data/blast_db/pdb_seqres database was then searched by reloading the checkpoint files created earlier with the -R flag. Example using the checkpoint of the run with 2 iterations and E-value cutoff 2E-3 (one further iteration, -j 1, against pdb_seqres):

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/home/rost/kloppmann/data/blast_db/pdb_seqres -j 1 -h 0.002 -v 10000 -b 10000 -m $OF -o /mnt/home/student/betza/task2/psiblast/pdb/new/j2h2/j2h2.$FE -Q /mnt/home/student/betza/task2/psiblast/pdb/new/j2h2/j2h2.pssm -R /mnt/home/student/betza/task2/psiblast/new/j2h2/j2h2.chk
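The two steps form a simple pipeline. First a profile is built on big_80 and saved with -C. Then that checkpoint is restarted with -R for a single pass over pdb_seqres. The sketch below assumes legacy blastpgp. The relative file names j2h2.chk and j2h2_pdb.txt are hypothetical placeholders, and the output-format variables ($OF, $FE) of the original call are left out.

```python
import shlex

QUERY = "/mnt/home/student/betza/data/hfe.fasta"
BIG80 = "/mnt/project/pracstrucfunc13/data/big/big_80"
PDB = "/mnt/home/rost/kloppmann/data/blast_db/pdb_seqres"


def build_profile(iterations, evalue, chk):
    """Step 1: iterate on big_80 and save the final profile as a checkpoint (-C)."""
    return ["blastpgp", "-i", QUERY, "-d", BIG80,
            "-j", str(iterations), "-h", evalue, "-C", chk]


def search_pdb(evalue, chk, out):
    """Step 2: one pass against pdb_seqres, restarting from the checkpoint (-R).
    The same query (-i) must be given as in step 1."""
    return ["blastpgp", "-i", QUERY, "-d", PDB,
            "-j", "1", "-h", evalue, "-R", chk,
            "-v", "10000", "-b", "10000", "-o", out]


chk = "j2h2.chk"  # hypothetical path
steps = [build_profile(2, "0.002", chk), search_pdb("0.002", chk, "j2h2_pdb.txt")]
for cmd in steps:
    print(shlex.join(cmd))
```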

HHblits

HHblits command-line call:

hhblits -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/rost_db/data/hhblits/uniprot20_02Sep11 -o /mnt/home/student/betza/task2/hhblits/hfe.hhr -oa3m /mnt/home/student/betza/task2/hhblits/hfe.a3m -oalis /mnt/home/student/betza/task2/hhblits/hfe -ohhm /mnt/home/student/betza/task2/hfe.hhm -Z 10000 -B 10000
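The hit list at the top of the .hhr file (hfe.hhr) can also be read directly. This is a minimal sketch, assuming the usual summary table that starts with a " No Hit ... Prob E-value P-value Score SS Cols Query HMM Template HMM" header and ends at the first blank line. The regular expression reads the numeric columns from the right, because the free-text description can contain spaces. parse_hit_list is a hypothetical helper, not part of HH-suite.

```python
import re

# Numeric columns are matched from the end of the line; the description is skipped.
HIT_LINE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>\S+).*?\s+"
    r"(?P<prob>\d+\.\d+)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+"
    r"(?P<score>-?\d+\.\d+)\s+(?P<ss>-?\d+\.\d+)\s+(?P<cols>\d+)\s+"
    r"(?P<q_start>\d+)-(?P<q_end>\d+)\s+(?P<t_start>\d+)-(?P<t_end>\d+)\s*"
    r"\((?P<t_len>\d+)\)\s*$"
)


def parse_hit_list(text):
    """Return the summary hits of an .hhr report as a list of dicts."""
    hits = []
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("No Hit"):
            in_table = True
            continue
        if in_table:
            if not line.strip():  # blank line ends the summary table
                break
            m = HIT_LINE.match(line)
            if m:
                hits.append({
                    "no": int(m["no"]),
                    "hit": m["hit"],
                    "prob": float(m["prob"]),
                    "evalue": float(m["evalue"]),
                    "cols": int(m["cols"]),
                })
    return hits
```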


The output files were analysed using the script parse_output.pl:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt --query 1a6z_A --sot L30 --out_h /mnt/home/student/betza/task2/hhblits/

Evaluation

CATH
The Python script compareCath.py was written to check the overlap between the CATH fold classes of the query protein and those of the hits. Example call:

python /mnt/home/student/betza/scripts/compareCath.py -i /mnt/home/student/betza/task2/blast/res_blast.txt_results -q 1a6zA > /mnt/home/student/betza/task2/blast/res_blast.txt/blast_cath
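Comparing CATH assignments level by level can be sketched as follows. This is not the code of compareCath.py. It assumes that each domain is given as a dotted CATH code (class.architecture.topology.homologous superfamily) and that a protein with several domains is represented by a list of such codes. The function names are hypothetical.

```python
LEVELS = ("class", "architecture", "topology", "homologous superfamily")


def shared_cath_depth(query_code, hit_code):
    """Number of leading CATH levels (C, A, T, H) two domain codes have in common."""
    depth = 0
    for q, h in zip(query_code.split("."), hit_code.split(".")):
        if q != h:
            break
        depth += 1
    return min(depth, len(LEVELS))


def best_shared_level(query_codes, hit_codes):
    """Deepest level shared by any query/hit domain pair, or None if not even the class matches."""
    best = max((shared_cath_depth(q, h) for q in query_codes for h in hit_codes), default=0)
    return LEVELS[best - 1] if best else None
```

Taking the best pair covers multi-domain proteins, where a hit may match only one of the query's domains.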

COPS
These analyses were also done using the script parse_output.pl:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/parse_output.pl --out_p /mnt/home/student/betza/task2/blast/res_blast.txt --query 1a6z_A --sot L30

Multiple Alignments

The sequences used were selected from the Psiblast run with 2 iterations and an E-value cutoff of 10E-3.

ClustalW

  • Version 2.1

ClustalW was executed on the student computers with standard parameters using:

clustalw -INFILE=<fastaFile>


MAFFT

  • Version 7

For MAFFT, the web server was used with the following parameters:

Parameter                                  Value
Alignment strategy                         auto
Scoring matrix                             BLOSUM62
Gap opening penalty                        1.53
Offset value                               0
Number of homologs for profile building    50
E-value threshold                          1e-10
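For reproducing the alignment locally rather than on the web server, the first four table rows correspond to MAFFT command-line options (--auto, --bl, --op, --ep). The sketch below shows only those four settings. The homolog-based profile options (50 homologs, E-value 1e-10) are web-server settings that are not mapped here. The input and output file names are hypothetical.

```python
import shlex


def mafft_cmd(fasta_in):
    """Local MAFFT call approximating the alignment settings of the web-server run."""
    return [
        "mafft",
        "--auto",          # alignment strategy: auto
        "--bl", "62",      # BLOSUM62 scoring matrix
        "--op", "1.53",    # gap opening penalty
        "--ep", "0",       # offset value
        fasta_in,
    ]


cmd = mafft_cmd("hfe_homologs.fasta")  # hypothetical input file
# MAFFT writes the alignment to stdout, so it is redirected to a file.
print(shlex.join(cmd) + " > hfe_homologs_mafft.fasta")
```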

T-Coffee and Expresso

  • T-Coffee Version 8.99
  • Expresso Version 9.03

For T-Coffee and Expresso, the T-Coffee web server was used with standard parameters. Expresso automatically finds the PDB structures of the sequences in the alignment and thus does not need additional input.
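Both web-server runs also have command-line counterparts: plain t_coffee with default settings, and Expresso through the -mode expresso option. This is a sketch under the assumption of a local T-Coffee installation. Running Expresso locally additionally requires T-Coffee to be configured for BLAST/PDB access, which the web server provides. The file name and helper are hypothetical.

```python
import shlex


def tcoffee_cmds(fasta_in):
    """Local counterparts of the two web-server runs: default T-Coffee and Expresso."""
    return {
        "t_coffee": ["t_coffee", fasta_in],
        "expresso": ["t_coffee", fasta_in, "-mode", "expresso"],
    }


for name, cmd in tcoffee_cmds("hfe_homologs.fasta").items():  # hypothetical input
    print(f"{name}: {shlex.join(cmd)}")
```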


Jalview was used for visualisation of the MSAs and to load the secondary structure assignments from UniProt.