Difference between revisions of "Lab Journal - Task 6 (PAH)"

From Bioinformatikpedia

Revision as of 11:54, 15 June 2013

Multiple Sequence Alignment

The multiple sequence alignments were downloaded from the Pfam server and converted into a FreeContact-readable format using a2m2aln.

  1. Protein RASH:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
  2. Our protein PAH has two domains. Because damage to the biopterin domain is reported to cause PKU, we used the Pfam alignment of this domain:
/usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln

Calculate and analyze correlated mutations

extract_pairs.pl extracts all residue pairs with distance >5.

freecontact -o evfold < 'PF00071.aln' > PF00071.evfold
sort -k 6 -g -r PF00071.evfold >sort_PF00071.txt
sort -k 6 -g -r PF00071_extract.evfold >sort_PF00071_extract.txt
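
The filtering and sorting steps can be sketched in Python as follows. This is not the actual extract_pairs.pl; it is a minimal stand-in under two assumptions: that each evfold line has six whitespace-separated columns (i, residue i, j, residue j, MI score, CN score), which matches sorting on column 6 above, and that "distance" means sequence separation |i - j|. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed evfold column layout: i, aa_i, j, aa_j, MI score, CN score
Pair = Tuple[int, str, int, str, float, float]


def parse_evfold(lines: Iterable[str]) -> List[Pair]:
    """Parse evfold-style lines, skipping blank or malformed ones."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue
        i, aa_i, j, aa_j, mi, cn = fields
        pairs.append((int(i), aa_i, int(j), aa_j, float(mi), float(cn)))
    return pairs


def filter_by_separation(pairs: List[Pair], min_sep: int = 5) -> List[Pair]:
    """Keep pairs whose sequence separation |i - j| is greater than min_sep.

    Assumption: 'distance' in the journal means sequence separation.
    """
    return [p for p in pairs if abs(p[2] - p[0]) > min_sep]


def sort_by_cn(pairs: List[Pair]) -> List[Pair]:
    """Sort by CN score, highest first (like 'sort -k 6 -g -r')."""
    return sorted(pairs, key=lambda p: p[5], reverse=True)


# Made-up example lines, for illustration only
example = [
    "1 M 2 T 0.10 0.50",
    "1 M 10 G 0.20 1.75",
    "3 Y 20 K 0.30 2.25",
]
filtered = sort_by_cn(filter_by_separation(parse_evfold(example)))
```

With the made-up lines above, the neighbouring pair (1, 2) is dropped and the remaining pairs are ordered by descending CN score.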

The reference structure for Ras is PDB entry 121P.

For our protein PAH:

freecontact -o evfold < 'PF00351.aln' > PF00351.evfold
sort -k 6 -g -r PF00351.evfold >sort_PF00351.txt
sort -k 6 -g -r PF00351_extract.evfold >sort_PF00351_extract.txt

CN_dist.R plots single and combined histograms of the CN score distribution. It also calculates the top-L score (L = protein length) for each residue i that occurs in the top L scoring pairs:

top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
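
A small Python sketch of this formula is given below. It is not the CN_dist.R code; it assumes that the sum for residue i runs only over pairs within the top L, and that a pair contributes to both of its residues. The function name top_l_scores and the (i, j, cn) tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def top_l_scores(pairs, length):
    """Compute the top-L score for each residue in the top `length` pairs.

    pairs:  list of (i, j, cn) tuples
    length: protein length L, used as the number of top pairs to keep
    Assumption: the per-residue sum covers only pairs within the top L.
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:length]
    if not top:
        return {}
    mean_cn = mean(cn for _, _, cn in top)
    sums = defaultdict(float)
    for i, j, cn in top:
        # each pair counts towards both of its residues
        sums[i] += cn
        sums[j] += cn
    return {res: total / mean_cn for res, total in sums.items()}


# Made-up scores, for illustration only
scores = top_l_scores([(1, 10, 2.0), (1, 20, 1.0), (5, 30, 0.5)], length=2)
```

In this toy input the top 2 pairs have a mean CN score of 1.5, and residue 1 occurs in both of them, so its summed score of 3.0 is divided by 1.5. The sketch does not handle a mean of zero.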

Calculate structural model

The Pfam alignment of Ras has length L = 160; we therefore use the following numbers of contacts: 64, 104 and 160, which correspond to 0.4 L, 0.65 L and L.
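
The three contact counts equal 0.4, 0.65 and 1.0 times the alignment length of 160. The sketch below reproduces that arithmetic and picks the corresponding number of top-ranked pairs; the fractions are inferred from the numbers above rather than stated in the journal, and both function names are hypothetical.

```python
def contact_counts(length, fractions=(0.4, 0.65, 1.0)):
    """Return the number of contacts for each fraction of the length.

    Assumption: the fractions are inferred from 64, 104 and 160 at L = 160.
    """
    return [round(length * f) for f in fractions]


def top_contacts(sorted_pairs, n):
    """Take the first n pairs from a list already sorted by descending score."""
    return sorted_pairs[:n]


counts = contact_counts(160)
```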