Difference between revisions of "Lab Journal - Task 6 (PAH)"

From Bioinformatikpedia

Revision as of 12:05, 15 June 2013

Multiple Sequence Alignment

The multiple alignments were downloaded from the PFAM server and converted into a freecontact-readable format using a2m2aln.

  1. Protein RASH:
    /usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
  2. Our protein PAH has two domains. Because damage to the Biopterin domain is reported to cause PKU, we used the PFAM alignment of this domain:
    /usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln
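The two conversion commands above differ only in the query sequence ID and the Pfam accession. As a minimal sketch, the command line could be generated from those two values; the helper name `a2m2aln_command` is hypothetical, and only the flags and file-name pattern already shown above are used.

```python
import shlex

# Path to a2m2aln as used in the commands above.
A2M2ALN = "/usr/share/freecontact/a2m2aln"


def a2m2aln_command(query_id, pfam_id):
    """Build the shell command line for one alignment (hypothetical helper).

    Assumes the input file is named <pfam_id>_full.txt and the output
    <pfam_id>.aln, as in the two commands above.
    """
    # The query regex captures the start position after the sequence ID.
    query_regex = f"^{query_id}/(\\d+)"
    return (
        f"{A2M2ALN} -q {shlex.quote(query_regex)} --quiet "
        f"< {pfam_id}_full.txt > {pfam_id}.aln"
    )


for query_id, pfam_id in [("RASH_HUMAN", "PF00071"), ("PH4H_HUMAN", "PF00351")]:
    print(a2m2aln_command(query_id, pfam_id))
```

The printed lines can be pasted into a shell; the sketch only builds the command text and does not run a2m2aln itself.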

Calculate and analyze correlated mutations

  1. Freecontact is used to calculate the CN-scores for the multiple alignments:
    freecontact -o evfold < '<PFAM-ID>.aln' > <PFAM-ID>.evfold
  2. extract_pairs.pl extracts all residue pairs with a distance >5.
  3. The results are sorted by CN-score (descending), both for all residue pairs and for the extracted pairs:
    sort -k 6 -g -r <PFAM-ID>.evfold >sort_<PFAM-ID>.txt
  4. CN_dist.R plots histograms and multiple histograms of the CN-score distribution. It also calculates the top L-Score (L = protein length) for each residue i that occurs in the top L pairs:
    top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
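Steps 2 to 4 can be illustrated with a small Python sketch. It is not the original extract_pairs.pl or CN_dist.R; the function names are hypothetical. It assumes that each evfold line has six whitespace-separated columns (i, residue i, j, residue j, MI score, CN-score), that "distance >5" means a sequence separation |i - j| > 5, and that the sum in the formula runs over the top L pairs only. The four input lines are made-up toy values.

```python
from collections import defaultdict


def parse_evfold(lines):
    """Parse lines assumed to look like: i aa_i j aa_j MI CN."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        pairs.append((int(fields[0]), int(fields[2]), float(fields[5])))
    return pairs


def top_l_scores(pairs, length, min_separation=5):
    """Return {residue: top L-Score} for residues occurring in the top L pairs."""
    # Assumption: "distance >5" is the sequence separation |i - j|.
    kept = [p for p in pairs if abs(p[0] - p[1]) > min_separation]
    # Sort by CN-score, descending (same order as `sort -k 6 -g -r`).
    kept.sort(key=lambda p: p[2], reverse=True)
    top = kept[:length]
    if not top:
        return {}
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    # Sum the CN-scores of every top-L pair a residue takes part in.
    sums = defaultdict(float)
    for i, j, cn in top:
        sums[i] += cn
        sums[j] += cn
    return {res: total / mean_cn for res, total in sums.items()}


# Toy input, not real freecontact output.
example = [
    "1 M 10 K 0.1 0.8",
    "1 M 20 L 0.1 0.4",
    "2 T 3 E 0.1 0.9",
    "10 K 30 G 0.1 0.2",
]
scores = top_l_scores(parse_evfold(example), length=2)
for residue in sorted(scores):
    print(residue, round(scores[residue], 3))
```

In the toy input the pair (2, 3) is dropped by the separation filter even though it has the highest CN-score, so residue 1, which appears in both remaining top pairs, gets the highest top L-Score.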


Evcouplings

The reference structure for Ras is 121p. ...

Calculate structural model

The Pfam alignment of Ras has a length of 160; we therefore use the following numbers of contacts: 64, 104, 160.
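The three contact counts match 0.4, 0.65 and 1.0 times the alignment length. The text does not state these fractions, so they are an inference from the numbers; under that assumption the counts can be reproduced as follows (the function name is hypothetical).

```python
def contact_counts(length, fractions=(0.4, 0.65, 1.0)):
    """Number of contacts for each fraction of the alignment length.

    The fractions are assumed; they are chosen because they reproduce
    the counts 64, 104 and 160 for a length of 160.
    """
    return [round(length * f) for f in fractions]


counts = contact_counts(160)
print(counts)
```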