Gaucher Disease - Task 06 - Lab Journal
1. Multiple sequence alignment
For HRas, we downloaded the full MSA in FASTA format (=a2m?) from Pfam: Ras (PF00071). It contains 21,243 sequences. For the calculation of correlated mutations using freecontact</code, the MSA (
ras.txt
) had to be reformatted (to ras.aln
) with /usr/share/freecontact/a2m2aln
like this:>
/usr/share/freecontact/a2m2aln --query '^RASH_HUMAN/(\d+)' --quiet < ras.txt > ras.aln
The alignments can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/pfam_Ras_ali/
For our protein, the Pfam alignment of the only family found, Glyco_hydro_30 (PF02055), contained only 1151 sequences, which is not enough for freecontact
. Therefore, we used own alignments from task 2, as HHblits found after 10 iterations 17,538 hit with default E-value cutoff and 3,189 hits with E-value cutoff of 10E-1. We took both alignment to compare the results, as the search with default E-value on the one hand has generated more alignmnets, which is an advantage for freecontact
, but on the other hand some of those hits could have been false positives, as discussed. HHblits can optionally generate MSA in different formats, including a2m, with the option -oa2m
, which we used like this:
cd ~/master_practical/Assignment2_Alignments/output/task1
uniprot20="/mnt/project/rost_db/data/hhblits/uniprot20_02Sep11"
#HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, default E-value cutoff:
hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20.hhr -n 10 -oa2m hhblits-10iter-uniprot20.a2m
#HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, E-value cutoff 10E-10:
hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20-10E-10.hhr -n 10 -e 10E-10 -oa2m hhblits-10iter-uniprot20-10E-10.a2m
The alignments can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/P04062_ali/