Lab Journal - Task 6 (PAH)

From Bioinformatikpedia
Revision as of 12:04, 15 June 2013 by Waldraffs (talk | contribs) (Calculate and analyze correlated mutations)

Multiple Sequence Alignment

The multiple alignments are downloaded from the Pfam server and converted into a FreeContact-readable format using a2m2aln.

  1. Protein RASH:
    /usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
  2. Our protein, PAH, has two domains. Because damage to the biopterin domain is reported to cause PKU, we used the Pfam alignment of this domain:
    /usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln

Calculate and analyze correlated mutations

  1. FreeContact is used to calculate CN scores for the multiple alignments:
    freecontact -o evfold < '<PFAM-ID>.aln' > <PFAM-ID>.evfold
  2. extract_pairs.pl extracts all residue pairs with a sequence distance greater than 5 (|i - j| > 5).
  3. The results are sorted by CN score in descending order, both for all residue pairs and for the extracted pairs:
    sort -k 6 -g -r <PFAM-ID>.evfold >sort_<PFAM-ID>.txt
  4. CN_dist.R plots histograms and multiple histograms of the CN score distribution. It also calculates the top L-Score (L = protein length) for each residue i that occurs in the top L pairs:
    top L-Score(i) = (sum of CN scores for residue i) / mean(CN scores of top L)
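
Steps 2 to 4 can be sketched in Python as follows. This is only an illustrative sketch, not the code of extract_pairs.pl or CN_dist.R. It assumes that each line of the evfold output has six whitespace-separated columns (i, aa_i, j, aa_j, MI score, CN score), that "distance" means sequence separation |i - j|, and that the sum in the top L-Score runs over the top L pairs that contain residue i. All function names are hypothetical.

```python
from collections import defaultdict


def parse_evfold(lines):
    """Parse lines assumed to look like: i aa_i j aa_j MI CN."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        i, j, cn = int(fields[0]), int(fields[2]), float(fields[5])
        pairs.append((i, j, cn))
    return pairs


def filter_by_separation(pairs, min_sep=5):
    """Keep pairs whose sequence separation |i - j| is greater than min_sep."""
    return [p for p in pairs if abs(p[0] - p[1]) > min_sep]


def sort_by_cn(pairs):
    """Sort pairs by CN score, highest first."""
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def top_l_scores(pairs, length):
    """Top L-Score per residue: summed CN over its top-L pairs / mean CN of the top L."""
    top = sort_by_cn(pairs)[:length]
    if not top:
        return {}
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    sums = defaultdict(float)
    for i, j, cn in top:
        # each pair contributes its CN score to both of its residues
        sums[i] += cn
        sums[j] += cn
    return {res: total / mean_cn for res, total in sums.items()}


if __name__ == "__main__":
    # Toy data, not taken from a real alignment.
    toy = [
        "1 M 2 T 0.1 0.9",
        "1 M 10 K 0.2 0.5",
        "3 E 20 L 0.3 1.5",
        "4 Y 10 K 0.1 1.0",
    ]
    filtered = filter_by_separation(parse_evfold(toy))
    for i, j, cn in sort_by_cn(filtered):
        print(i, j, cn)
    print(top_l_scores(filtered, length=2))
```

In the toy example the pair (1, 2) is dropped because its separation is only 1; the remaining pairs are ranked by CN score before the per-residue scores are computed. In practice L would be the protein length rather than 2.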


Evcouplings

The reference structure for Ras is 121p. ...

Calculate structural model

The Pfam alignment of Ras has a length of 160; we therefore use the following numbers of contacts: 64, 104, and 160.
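
The three contact counts match fixed fractions of the alignment length (64/160 = 0.4, 104/160 = 0.65, 160/160 = 1.0). The short sketch below reproduces them under the assumption that these fractions were the intended rule; the function name and the default fractions are illustrative and not taken from the original workflow.

```python
def contact_counts(length, fractions=(0.4, 0.65, 1.0)):
    """Return the number of contacts for each fraction of the alignment length.

    The default fractions are an assumption inferred from the counts
    64, 104 and 160 for an alignment of length 160.
    """
    return [round(length * f) for f in fractions]


if __name__ == "__main__":
    print(contact_counts(160))  # [64, 104, 160]
```

The same function can be called with the length of another alignment if the same rule is wanted there.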