Task 6 Lab Journal (MSUD)
Multiple sequence alignments
Using the web form on the Pfam website, we downloaded the multiple sequence alignments (MSAs) of all 21,243 sequences related to the H-Ras protein and all 9,023 sequences related to the BCKDHA protein (Pfam entries: PF00071, PF00676).
With the help of the program /usr/share/freecontact/a2m2aln (by Thomas), we reformatted the MSAs from FASTA format to the simple alignment format. The following commands were used for the reformatting:
<source lang='bash'>
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' < PF00071_full.txt > PF00071_full.aln
/usr/share/freecontact/a2m2aln -q '^ODBA_HUMAN/(\d+)' < PF00676_full.txt > PF00676_full.aln
</source>
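A quick sanity check on the reformatted files can catch a failed conversion before freecontact is run. The following is only a sketch, assuming that the simple alignment format holds one aligned sequence per line with no header lines; the helper name aln_summary is hypothetical and not part of freecontact.

```bash
# Hypothetical helper: summarise a simple-alignment file.
# Assumption: one aligned sequence per line, no header lines.
aln_summary() {
    awk '{ n++; len[length($0)]++ }
         END {
             k = 0
             for (l in len) k++          # count distinct line lengths
             printf "%d sequences, %d distinct line length(s)\n", n, k
         }' "$1"
}

# Usage (file names as in the journal):
#   aln_summary PF00071_full.aln
#   aln_summary PF00676_full.aln
```

More than one distinct line length would suggest that the sequences are not aligned to the same set of columns.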
The reformatted alignments were used as input for freecontact, to predict contacts between residues:
<source lang='bash'>
freecontact -o evfold < PF00071_full.aln > PF00071.evfold
freecontact -o evfold < PF00676_full.aln > PF00676.evfold
</source>
The freecontact output was analyzed with the R script located at /mnt/home/student/schillerl/MasterPractical/task6/analyse_correlated_mutations.r.
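For a first look at the predictions without R, the strongest couplings can be listed directly from the shell. This is a minimal sketch and not the method of the R script above. It assumes that each line of the evfold output has six whitespace-separated columns (i, aa_i, j, aa_j, MI score, CN score) and that the sixth column is the score to rank by. The helper name top_contacts and the default sequence separation of 6 are hypothetical choices.

```bash
# Hypothetical helper: print the n highest-scoring residue pairs.
# Assumed columns: i aa_i j aa_j MI CN (ranking by column 6).
# Pairs closer than min_sep positions in sequence are skipped.
top_contacts() {
    file=$1
    n=${2:-10}
    min_sep=${3:-6}
    awk -v sep="$min_sep" '{
            d = $3 - $1
            if (d < 0) d = -d
            # print the score in fixed notation so that sort -n orders it correctly
            if (d >= sep) printf "%d %d %.6f\n", $1, $3, $6
        }' "$file" |
        sort -k3,3nr |
        head -n "$n"
}

# Usage (file name as in the journal):
#   top_contacts PF00071.evfold 20
```

Each output line gives the two residue positions and the score, highest score first.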
EVcouplings was run on the evolutionary couplings server. For BCKDHA, the alignment was restricted to residues 106-405 (the residues that are present in the Pfam alignment), because otherwise there were not enough matching columns.
Structure modelling
EVfold was run on the evolutionary couplings server.
For BCKDHA, the alignment was restricted to residues 106-405 (as above), and the numbers of constraints to use for structure prediction were set to "195 120 300" (which corresponds to 65, 40 and 100 % of the alignment length of 300 residues).
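The constraint numbers follow from the length of the restricted region (405 - 106 + 1 = 300 residues). The arithmetic can be reproduced as below; the function name constraint_counts is hypothetical, and the integer division rounds down, which makes no difference for the values used here.

```bash
# Hypothetical helper: constraint counts as percentages of the
# alignment length for an inclusive residue range.
constraint_counts() {
    start=$1
    end=$2
    shift 2
    len=$(( end - start + 1 ))           # inclusive range length
    out=""
    for pct in "$@"; do
        n=$(( len * pct / 100 ))         # integer division, rounds down
        out="${out:+$out }$n"
    done
    printf '%s\n' "$out"
}

constraint_counts 106 405 65 40 100      # prints: 195 120 300
```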