Difference between revisions of "Gaucher Disease - Task 06 - Lab Journal"

From Bioinformatikpedia
(Created page with "'''1. Multiple sequence alignment''' For HRas, we downloaded the full MSA in FASTA format (=a2m?) from Pfam: [http://pfam.sanger.ac.uk/family/Ras#tabview=tab3 Ras (PF00071)]. It …")
 
Line 1: Line 1:
 
'''1. Multiple sequence alignment'''
 
'''1. Multiple sequence alignment'''
For HRas, we downloaded the full MSA in FASTA format (=a2m?) from Pfam: [http://pfam.sanger.ac.uk/family/Ras#tabview=tab3 Ras (PF00071)]. It contains 21,243 sequences. For the calculation of correlated mutations using <code>freecontact</code, the MSA (<code>ras.txt</code>) had to be reformatted (to <code>ras.aln</code>) with <code>/usr/share/freecontact/a2m2aln</code> like this:>
+
For HRas, we downloaded the full MSA in FASTA format (=a2m?) from Pfam: [http://pfam.sanger.ac.uk/family/Ras#tabview=tab3 Ras (PF00071)]. It contains 21,243 sequences. For the calculation of correlated mutations using <code>freecontact</code, the MSA (<code>ras.txt</code>) had to be reformatted (to <code>ras.aln</code>) with <code>/usr/share/freecontact/a2m2aln</code> like this:
  +
 
<nowiki>
 
<nowiki>
 
/usr/share/freecontact/a2m2aln --query '^RASH_HUMAN/(\d+)' --quiet < ras.txt > ras.aln
 
/usr/share/freecontact/a2m2aln --query '^RASH_HUMAN/(\d+)' --quiet < ras.txt > ras.aln
 
</nowiki>
 
</nowiki>
  +
 
The alignments can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/pfam_Ras_ali/</code>
 
The alignments can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/pfam_Ras_ali/</code>
   
   
 
For our protein, the Pfam alignment of the only family found, [http://pfam.sanger.ac.uk/family/PF02055.11#tabview=tab3 Glyco_hydro_30 (PF02055)], contained only 1151 sequences, which is not enough for <code>freecontact</code>. Therefore, we used own alignments from task 2, as HHblits found after 10 iterations 17,538 hit with default E-value cutoff and 3,189 hits with E-value cutoff of 10E-1. We took both alignment to compare the results, as the search with default E-value on the one hand has generated more alignmnets, which is an advantage for <code>freecontact</code>, but on the other hand some of those hits could have been false positives, as discussed. HHblits can optionally generate MSA in different formats, including a2m, with the option <code>-oa2m</code>, which we used like this:
 
For our protein, the Pfam alignment of the only family found, [http://pfam.sanger.ac.uk/family/PF02055.11#tabview=tab3 Glyco_hydro_30 (PF02055)], contained only 1151 sequences, which is not enough for <code>freecontact</code>. Therefore, we used own alignments from task 2, as HHblits found after 10 iterations 17,538 hit with default E-value cutoff and 3,189 hits with E-value cutoff of 10E-1. We took both alignment to compare the results, as the search with default E-value on the one hand has generated more alignmnets, which is an advantage for <code>freecontact</code>, but on the other hand some of those hits could have been false positives, as discussed. HHblits can optionally generate MSA in different formats, including a2m, with the option <code>-oa2m</code>, which we used like this:
  +
 
<nowiki>
 
<nowiki>
 
cd ~/master_practical/Assignment2_Alignments/output/task1
 
cd ~/master_practical/Assignment2_Alignments/output/task1
Line 17: Line 20:
 
#HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, E-value cutoff 10E-10:
 
#HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, E-value cutoff 10E-10:
 
hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20-10E-10.hhr -n 10 -e 10E-10 -oa2m hhblits-10iter-uniprot20-10E-10.a2m
 
hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20-10E-10.hhr -n 10 -e 10E-10 -oa2m hhblits-10iter-uniprot20-10E-10.a2m
  +
 
</nowiki>
 
</nowiki>
  +
 
The alignments can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/P04062_ali/</code>
 
The alignments can be found in: <code>/mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/P04062_ali/</code>

Revision as of 00:51, 14 June 2013

1. Multiple sequence alignment For HRas, we downloaded the full MSA in FASTA format (=a2m?) from Pfam: Ras (PF00071). It contains 21,243 sequences. For the calculation of correlated mutations using freecontact</code, the MSA (ras.txt) had to be reformatted (to ras.aln) with /usr/share/freecontact/a2m2aln like this:

/usr/share/freecontact/a2m2aln --query '^RASH_HUMAN/(\d+)' --quiet < ras.txt > ras.aln

The alignments can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/pfam_Ras_ali/


For our protein, the Pfam alignment of the only family found, Glyco_hydro_30 (PF02055), contained only 1151 sequences, which is not enough for freecontact. Therefore, we used own alignments from task 2, as HHblits found after 10 iterations 17,538 hit with default E-value cutoff and 3,189 hits with E-value cutoff of 10E-1. We took both alignment to compare the results, as the search with default E-value on the one hand has generated more alignmnets, which is an advantage for freecontact, but on the other hand some of those hits could have been false positives, as discussed. HHblits can optionally generate MSA in different formats, including a2m, with the option -oa2m, which we used like this:

cd ~/master_practical/Assignment2_Alignments/output/task1 uniprot20="/mnt/project/rost_db/data/hhblits/uniprot20_02Sep11" #HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, default E-value cutoff: hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20.hhr -n 10 -oa2m hhblits-10iter-uniprot20.a2m #HHblits, maximum 10 iterations against uniprot20 and one iteration against pdb_full, E-value cutoff 10E-10: hhblits -i ../../gaucher_data/P04062.fasta -d $uniprot20 -o hhblits-10iter-uniprot20-10E-10.hhr -n 10 -e 10E-10 -oa2m hhblits-10iter-uniprot20-10E-10.a2m

The alignments can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/P04062_ali/