Lab Journal - Task 6 (PAH)

From Bioinformatikpedia

Revision as of 11:57, 15 June 2013

Multiple Sequence Alignment

The multiple sequence alignments were downloaded from the Pfam server and converted into a FreeContact-readable format using a2m2aln.

  1. Protein RASH:
    /usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
  2. Our protein PAH has two domains. Since damage to the biopterin-dependent domain is reported to cause PKU, we used the Pfam alignment of this domain (PF00351):
    /usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln
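To illustrate what this conversion step does, here is a minimal Python sketch of an A2M-to-plain-alignment conversion. It is not the a2m2aln source. It assumes the usual A2M conventions (uppercase letters and '-' are match columns, lowercase letters and '.' are insert states), that the output has one aligned sequence per line with the query first, and that the regex capture group (as in '^RASH_HUMAN/(\d+)') holds the query's start residue. The function names are hypothetical.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) tuples from FASTA/A2M text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def a2m_to_aln(text, query_pattern):
    """Return (aligned match-column sequences with the query first, query start).

    Assumption: match columns are uppercase letters and '-'; lowercase
    letters and '.' are insert states and are dropped.
    """
    records = list(parse_fasta(text))
    query_re = re.compile(query_pattern)
    query_idx, query_start = None, None
    for idx, (header, _) in enumerate(records):
        match = query_re.search(header)
        if match:
            query_idx, query_start = idx, int(match.group(1))
            break
    if query_idx is None:
        raise ValueError("query sequence not found")
    # Put the query first, keep the remaining records in their original order.
    ordered = [records[query_idx]] + [r for k, r in enumerate(records) if k != query_idx]
    lines = ["".join(c for c in seq if c.isupper() or c == "-") for _, seq in ordered]
    return lines, query_start
```

For example, `a2m_to_aln(open("PF00071_full.txt").read(), r"^RASH_HUMAN/(\d+)")` would return the RASH_HUMAN row first plus its start residue, under the assumptions above.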

Calculate and analyze correlated mutations

extract_pairs.pl extracts all residue pairs with distance >5.
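A minimal sketch of this filter, assuming that "distance" means sequence separation |i − j| and that each FreeContact evfold line has six whitespace-separated columns (i, aa_i, j, aa_j, a placeholder, CN score), which is the layout the `sort -k 6` commands rely on. extract_pairs.pl itself is a Perl script; this Python version with hypothetical function names is only an illustration.

```python
def parse_evfold_line(line):
    """Parse 'i aa_i j aa_j placeholder cn' into (i, aa_i, j, aa_j, cn)."""
    fields = line.split()
    return int(fields[0]), fields[1], int(fields[2]), fields[3], float(fields[5])


def extract_pairs(lines, min_sep=5):
    """Keep evfold lines whose two residues are more than min_sep apart in sequence."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        i, _, j, _, _ = parse_evfold_line(line)
        if abs(i - j) > min_sep:
            kept.append(line)
    return kept
```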

  1. RASH
    freecontact -o evfold < 'PF00071.aln' > PF00071.evfold
    sort -k 6 -g -r PF00071.evfold >sort_PF00071.txt
    sort -k 6 -g -r PF00071_extract.evfold >sort_PF00071_extract.txt
  2. PAH
    freecontact -o evfold < 'PF00351.aln' > PF00351.evfold
    sort -k 6 -g -r PF00351.evfold >sort_PF00351.txt
    sort -k 6 -g -r PF00351_extract.evfold >sort_PF00351_extract.txt
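The ranking produced by `sort -k 6 -g -r` (general numeric sort on field 6, highest first) can be reproduced in Python as a sketch, assuming the same six-column evfold layout. Pairs with equal CN scores may end up in a different order than with GNU sort, which breaks ties by comparing whole lines. The function names are hypothetical.

```python
def cn_score(line):
    """Column 6 of an evfold line, read as a float (handles e.g. 1.2e-1 like sort -g)."""
    return float(line.split()[5])


def sort_by_cn(lines):
    """Sort non-empty evfold lines by CN score, highest first."""
    rows = [line for line in lines if line.strip()]
    return sorted(rows, key=cn_score, reverse=True)


def sort_evfold_file(in_path, out_path):
    """File-to-file equivalent of: sort -k 6 -g -r in_path > out_path."""
    with open(in_path) as src:
        ranked = sort_by_cn(src.read().splitlines())
    with open(out_path, "w") as dst:
        for line in ranked:
            dst.write(line + "\n")
```

For example, `sort_evfold_file("PF00071_extract.evfold", "sort_PF00071_extract.txt")` mirrors the last RASH command above.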

CN_dist.R plots single and multiple histograms of the CN-score distribution. Furthermore, it calculates the top-L score (L = protein length) for each residue i that occurs in the top L pairs:

top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
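The top-L score can be sketched in Python as follows. We assume that the top L pairs are the L pairs with the highest CN scores and that the sum for residue i runs only over those top-L pairs containing i; CN_dist.R is not shown here, so this is an illustration rather than its implementation. `cn_histogram` is a hypothetical text stand-in for the histogram plots (equal-width bin counts).

```python
from collections import defaultdict


def top_l_scores(pairs, L):
    """pairs: list of (i, j, cn). Returns {residue: top-L score}.

    top-L score(i) = (sum of CN over top-L pairs containing i)
                     / (mean CN of the top-L pairs)
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:L]
    if not top:
        return {}
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    if mean_cn == 0:
        raise ValueError("mean CN score of the top L pairs is zero")
    sums = defaultdict(float)
    for i, j, cn in top:
        sums[i] += cn  # each pair counts for both of its residues
        sums[j] += cn
    return {res: total / mean_cn for res, total in sums.items()}


def cn_histogram(scores, n_bins=10):
    """Count CN scores in n_bins equal-width bins between min and max."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all scores are equal
    counts = [0] * n_bins
    for s in scores:
        k = min(int((s - lo) / width), n_bins - 1)  # max value goes in the last bin
        counts[k] += 1
    return counts
```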



Reference structure for Ras is 121p (http://www.rcsb.org/pdb/explore/explore.do?structureId=121p).

Calculate structural model

The Pfam alignment of Ras has length L = 160; we therefore use the following numbers of contacts: 64, 104 and 160 (0.4 L, 0.65 L and 1.0 L).
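The contact numbers follow from multiplying L by a fraction (64 = 0.4 × 160, 104 = 0.65 × 160, 160 = 1.0 × 160). A small sketch, assuming the contacts are taken from the top of a CN-sorted evfold list such as sort_PF00071_extract.txt; the fractions are inferred from the numbers above and the function names are hypothetical.

```python
def contact_counts(L, fractions=(0.4, 0.65, 1.0)):
    """Number of top-ranked contacts to use for each fraction of the length L."""
    return [round(f * L) for f in fractions]


def top_contacts(sorted_lines, n):
    """(i, j) residue pairs of the n highest-ranked lines of a CN-sorted evfold list."""
    pairs = []
    for line in sorted_lines[:n]:
        fields = line.split()
        pairs.append((int(fields[0]), int(fields[2])))
    return pairs
```

With `ranked` holding the lines of sort_PF00071_extract.txt, `[top_contacts(ranked, n) for n in contact_counts(160)]` would give the three contact sets.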