Difference between revisions of "Lab Journal - Task 6 (PAH)"
From Bioinformatikpedia
Revision as of 11:57, 15 June 2013
Multiple Sequence Alignment
The multiple sequence alignments were downloaded from the Pfam server and converted into a freecontact-readable format using a2m2aln.
- Protein RASH:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
- Our protein PAH has two domains. Because damage to the Biopterin domain is said to cause PKU, we used the Pfam alignment of this domain:
/usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln
extract_pairs.pl extracts all residue pairs with distance >5.
- RASH
freecontact -o evfold < 'PF00071.aln' > PF00071.evfold
sort -k 6 -g -r PF00071.evfold >sort_PF00071.txt
sort -k 6 -g -r PF00071_extract.evfold >sort_PF00071_extract.txt
- PAH
freecontact -o evfold < 'PF00351.aln' > PF00351.evfold
sort -k 6 -g -r PF00351.evfold >sort_PF00351.txt
sort -k 6 -g -r PF00351_extract.evfold >sort_PF00351_extract.txt
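The filtering and ranking steps above can be sketched in Python. This is only an illustration, not the actual extract_pairs.pl: it assumes that "distance >5" means sequence separation |i - j| > 5, and that each evfold row has the six columns i, aa_i, j, aa_j, MI, CN (consistent with sorting on column 6). The function names and the toy rows are made up for the example.

```python
def parse_evfold(lines):
    """Parse rows assumed to be laid out as: i aa_i j aa_j MI CN."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        i, j = int(fields[0]), int(fields[2])
        cn = float(fields[5])
        pairs.append((i, fields[1], j, fields[3], cn))
    return pairs


def filter_and_rank(pairs, threshold=5):
    """Keep pairs with |i - j| > threshold (assumed meaning of 'distance >5'),
    then sort by CN score in descending order, like `sort -k 6 -g -r`."""
    kept = [p for p in pairs if abs(p[2] - p[0]) > threshold]
    return sorted(kept, key=lambda p: p[4], reverse=True)


# Toy rows, invented for illustration only (not real freecontact output).
toy_rows = [
    "1 M 3 E 0.10 0.90",
    "1 M 12 K 0.20 0.35",
    "3 E 40 G 0.30 0.70",
    "10 A 15 L 0.05 0.50",
]

for pair in filter_and_rank(parse_evfold(toy_rows)):
    print(pair)
```

With these toy rows, the pairs (1, 3) and (10, 15) are dropped because their separation is not greater than 5, and the remaining pairs are listed with the highest CN score first.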
CN_dist.R plots single and multiple histograms of the CN-score distribution. It also calculates the top L-Score (L = protein length) for each residue i that occurs in the L highest-scoring pairs:
top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
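The formula above can be illustrated with a short Python sketch. It is not the code of CN_dist.R; it assumes that "top L" means the L pairs with the highest CN score and that the sum for residue i runs only over those top L pairs. The function name and the toy input are hypothetical.

```python
from collections import defaultdict


def top_l_scores(pairs, length):
    """Compute the top L-Score per residue.

    pairs:  iterable of (i, j, cn) tuples
    length: protein length L; only the L highest-scoring pairs are used
            (assumption: the per-residue sum is restricted to these pairs).
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:length]
    if not top:
        return {}
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    sums = defaultdict(float)
    for i, j, cn in top:
        # each pair contributes its CN score to both of its residues
        sums[i] += cn
        sums[j] += cn
    return {residue: total / mean_cn for residue, total in sums.items()}


# Toy input, invented for illustration: (i, j, CN score)
toy_pairs = [(1, 10, 0.8), (1, 20, 0.6), (5, 15, 0.4), (2, 30, 0.1)]
scores = top_l_scores(toy_pairs, length=3)
for residue in sorted(scores):
    print(residue, round(scores[residue], 3))
```

In the toy input the mean CN score of the top three pairs is 0.6, so residue 1, which appears in two of them (0.8 + 0.6), gets a score of about 2.33.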
The reference structure for Ras is PDB entry 121p (http://www.rcsb.org/pdb/explore/explore.do?structureId=121p).
Calculate structural model
The Pfam alignment of Ras has a length of 160; we therefore use the following numbers of contacts: 64, 104 and 160.
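The three contact numbers equal 0.4, 0.65 and 1.0 times the alignment length of 160. A minimal sketch of that arithmetic follows; the assumption that the counts were chosen as fixed fractions of L is inferred from the numbers and not stated in the journal, and the function name is hypothetical.

```python
def contact_counts(length, fractions=(0.4, 0.65, 1.0)):
    """Return the number of contacts for each fraction of the alignment length.

    The default fractions are inferred from 64, 104 and 160 at length 160
    (an assumption, not a documented rule).
    """
    return [round(length * f) for f in fractions]


print(contact_counts(160))  # [64, 104, 160]
```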