Difference between revisions of "Gaucher Disease - Task 06 - Lab Journal"

From Bioinformatikpedia

Revision as of 11:02, 17 June 2013

1. Multiple sequence alignment

For HRas, we downloaded the full MSA from Pfam (Ras, PF00071) in FASTA format, which we assume corresponds to a2m. It contains 21,243 sequences. To calculate correlated mutations with freecontact, the MSA (ras.txt) had to be reformatted (to ras.aln) with /usr/share/freecontact/a2m2aln like this:

/usr/share/freecontact/a2m2aln --query '^RASH_HUMAN/(\d+)' --quiet < ras.txt > ras.aln

The alignments can be found in: /mnt/home/student/kalemanovm/master_practical/Assignment6_Evolutionary_sequence_variation/pfam_Ras_ali/

For our protein, the Pfam alignment of the only family found, Glyco_hydro_30 (PF02055), contained only 1,151 sequences, which is not enough for freecontact. Therefore, we used our own alignments from task 2: HHblits found 17,538 hits with the default E-value cutoff and 3,189 hits with an E-value cutoff of 10E-1 after 10 iterations (actually after 5). We took both alignments to compare the results. The search with the default E-value produced more aligned sequences, which is an advantage for freecontact, but some of those hits could be false positives, as discussed.
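Since the choice of alignment depends on the number of sequences it contains, a quick count can be useful before running freecontact. The following is a minimal sketch, assuming the alignment is FASTA/a2m text in which every record starts with a ">" header line; the function name count_sequences and the example records are hypothetical and not part of our scripts.

```python
def count_sequences(fasta_text):
    """Count records in FASTA/a2m text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


# Made-up toy alignment with three records (one sequence spans two lines).
example = (
    ">seq1\nMTEYKLVVVG\n"
    ">seq2\nMTEYKLVVVG\nAGGVGKS\n"
    ">seq3\nM-EYKLV\n"
)

n_sequences = count_sequences(example)
print(n_sequences)
```

For a real file, the text would be read with open(path).read() and passed to the same function.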

2. Calculate and analyze correlated mutations

With the reformatted alignments, the residue contacts were predicted with freecontact:

 freecontact --parprof evfold < ras.aln >  ras_contacts.out

After that, the contact.out file was parsed with calc_hotspot.py to calculate the hotspot residues. The script can be found in the directory /mnt/home/student/gerkej/gaucher/task6/, and the contact.out files are stored in the subdirectories pfam_Ras/ and P04062/.
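The parsing step can be illustrated with a short sketch. This is not the actual calc_hotspot.py; it only shows one plausible way to derive hotspot residues. It assumes that each line of the contact file has six whitespace-separated columns (position i, residue i, position j, residue j, MI score, CN score) and that a hotspot is a residue that appears often among the top-scoring pairs with a minimum sequence separation. The function names, the separation threshold, the top_n cutoff and the sample data are all hypothetical.

```python
from collections import Counter


def parse_contacts(text):
    """Parse lines assumed to look like 'i aa_i j aa_j MI CN'."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip empty or malformed lines
        i, j = int(fields[0]), int(fields[2])
        cn = float(fields[5])  # assumption: last column is the CN score
        pairs.append((i, j, cn))
    return pairs


def hotspot_counts(pairs, min_separation=5, top_n=3):
    """Count how often each position occurs in the top_n scoring pairs
    whose sequence separation exceeds min_separation."""
    far = [p for p in pairs if abs(p[0] - p[1]) > min_separation]
    far.sort(key=lambda p: p[2], reverse=True)
    counts = Counter()
    for i, j, _ in far[:top_n]:
        counts[i] += 1
        counts[j] += 1
    return counts


# Made-up sample in the assumed six-column layout.
sample = """1 M 2 T 0.10 0.90
1 M 10 G 0.20 0.80
3 E 10 G 0.30 0.70
5 K 20 A 0.05 0.60
7 V 30 L 0.01 -0.10
"""

pairs = parse_contacts(sample)
counts = hotspot_counts(pairs, min_separation=5, top_n=3)
print(counts.most_common())
```

In this toy input, position 10 takes part in two of the three selected pairs and would therefore rank first. The neighbouring pair (1, 2) is ignored despite its high score because of the separation filter.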