Lab Journal - Task 6 (PAH)
From Bioinformatikpedia
Multiple Sequence Alignment
The multiple alignments are downloaded from the Pfam server and converted into a FreeContact-readable format with a2m2aln.
- Protein H-RAS:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
- Our protein PAH has two domains. Since damage to the biopterin domain is reported to cause PKU, we used the Pfam alignment of this domain:
/usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln
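The conversion step can be illustrated with a minimal Python sketch. It is not the implementation of a2m2aln. It assumes that the downloaded alignment follows A2M conventions (uppercase letters and '-' are match columns, lowercase letters and '.' are insert columns) and that the aln format read by FreeContact is one aligned sequence per line with the query first. The function names `parse_fasta` and `a2m_to_aln` are hypothetical, and the residue number captured by `(\d+)` is ignored here.

```python
import re


def parse_fasta(text):
    """Split FASTA/A2M text into (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def a2m_to_aln(text, query_pattern):
    """Hypothetical A2M -> aln conversion: match columns only, query first.

    Assumption: A2M conventions, i.e. uppercase letters and '-' are match
    columns, lowercase letters and '.' are insert columns (dropped here).
    """
    query_re = re.compile(query_pattern)
    rows = [(h, re.sub(r"[a-z.]", "", s)) for h, s in parse_fasta(text)]
    hits = [k for k, (h, _) in enumerate(rows) if query_re.search(h)]
    if not hits:
        raise ValueError("query sequence not found: " + query_pattern)
    q = hits[0]
    # Query first, then all remaining sequences in their original order
    ordered = [rows[q]] + rows[:q] + rows[q + 1:]
    return "".join(seq + "\n" for _, seq in ordered)


example = ">OTHER/1-5\nAC-dEF\n>RASH_HUMAN/1-5\nACG.EF\n"
print(a2m_to_aln(example, r"^RASH_HUMAN/(\d+)"), end="")  # ACGEF, then AC-EF
```

Dropping the insert columns is what makes all rows the same length, which a column-based coupling analysis needs.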
- FreeContact is used to calculate the CN scores for the multiple alignments:
freecontact -o evfold < '<PFAM-ID>.aln' > <PFAM-ID>.evfold
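FreeContact reimplements EVfold-mfDCA, and as we understand it the CN score of a column pair (i, j) is the norm of the inferred coupling matrix corrected with the average product correction (APC): CN(i, j) = FN(i, j) - mean_i * mean_j / mean_all. The sketch below shows only this APC step on a precomputed symmetric matrix of raw scores; inferring the couplings themselves (sequence weighting, pseudocounts, inverting the covariance matrix) is left to FreeContact. The function name `apc_correct` is hypothetical.

```python
def apc_correct(fn):
    """Average product correction (APC) of a symmetric score matrix.

    fn is a list of lists with raw scores (e.g. Frobenius norms) for each
    pair of alignment columns; the diagonal is ignored. Row means and the
    overall mean are taken over off-diagonal entries only.
    """
    n = len(fn)
    if n < 2:
        raise ValueError("need at least two columns")
    row_mean = [sum(fn[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    total_mean = sum(row_mean) / n  # equals the mean over all off-diagonal entries
    if total_mean == 0:
        raise ValueError("overall mean is zero; APC undefined")
    return [
        [0.0 if i == j else fn[i][j] - row_mean[i] * row_mean[j] / total_mean for j in range(n)]
        for i in range(n)
    ]


raw = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
corrected = apc_correct(raw)
print(corrected[0][1], corrected[0][2], corrected[1][2])  # -0.5 0.125 0.5
```

The correction subtracts the part of a pair score that is explained by each column's overall tendency to score high (e.g. through phylogeny or entropy), so that the remaining signal is more specific to the pair.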
- extract_pairs.pl extracts all residue pairs with a distance > 5.
- The results are sorted by CN score (descending), both for all residue pairs and for the extracted ones:
sort -k 6,6 -g -r <PFAM-ID>.evfold > sort_<PFAM-ID>.txt
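The filtering and sorting steps can be sketched in Python as follows. This does not reproduce extract_pairs.pl; it assumes that "distance" means the sequence separation |i - j| of the two residues and that the CN score is in column 6 of the evfold output (i, aa_i, j, aa_j, 0, CN score), as the `sort -k 6` call implies. The function names are hypothetical.

```python
def read_evfold(lines):
    """Parse FreeContact evfold lines into (i, j, cn) tuples.

    Assumed layout (consistent with `sort -k 6`): i aa_i j aa_j 0 CN-score.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip empty or malformed lines
        pairs.append((int(fields[0]), int(fields[2]), float(fields[5])))
    return pairs


def extract_and_sort(pairs, min_distance=5):
    """Keep pairs with |i - j| > min_distance, sorted by CN score (descending).

    Assumption: 'distance' is the sequence separation of the two residues.
    """
    kept = [p for p in pairs if abs(p[0] - p[1]) > min_distance]
    return sorted(kept, key=lambda p: p[2], reverse=True)


example = ["1 M 2 T 0 0.50", "3 E 10 K 0 0.20", "4 Y 12 V 0 0.90", "5 K 8 L 0 0.70"]
print(extract_and_sort(read_evfold(example)))  # [(4, 12, 0.9), (3, 10, 0.2)]
```

Sorting all pairs instead of the extracted ones corresponds to `sorted(read_evfold(lines), key=lambda p: p[2], reverse=True)`.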
- CN_dist.R plots histograms (single and multiple) of the CN score distribution. It also calculates the top-L score (L = protein length) for each residue i that occurs in the top L pairs:
top-L score(i) = (sum of the CN scores of residue i) / mean(CN scores of the top L pairs)
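A minimal Python sketch of the top-L score, assuming that the top L pairs are the L pairs with the highest CN score and that the numerator sums the CN scores of those top-L pairs that contain residue i. CN_dist.R itself is not shown here, so this is an illustration of the formula under these assumptions, not its code; the function name `top_l_scores` is hypothetical.

```python
from collections import defaultdict


def top_l_scores(pairs, length):
    """Top-L score per residue from (i, j, cn) pairs.

    Assumptions: the top L pairs are the L pairs with the highest CN score;
    the numerator sums the CN scores of the top-L pairs containing residue i;
    the denominator is the mean CN score of the top-L pairs.
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:length]
    if not top:
        return {}
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    if mean_cn == 0:
        raise ValueError("mean CN score of the top L pairs is zero")
    sums = defaultdict(float)
    for i, j, cn in top:
        sums[i] += cn
        sums[j] += cn
    return {res: s / mean_cn for res, s in sorted(sums.items())}


pairs = [(1, 10, 0.8), (2, 20, 0.4), (1, 30, 0.6), (5, 9, 0.1)]
print(top_l_scores(pairs, 3))  # residue 1: (0.8 + 0.6) / 0.6, about 2.33
```

Residues that take part in several strong pairs get a score above 1, so the score highlights positions that are hubs in the predicted contact network.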
- EVcouplings:
The reference structure for Ras is PDB entry 121p.
For the biopterin family, the start position has to be set to 106 to obtain a multiple alignment.
Calculate structural model
The Pfam alignment of Ras has a length of 160, so we use the following numbers of contacts: 64, 104 and 160.
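The three contact numbers correspond to 0.4L, 0.65L and 1.0L for L = 160. The fractions are read off from these numbers rather than stated in the task, and the helper name `contact_numbers` is hypothetical; the sketch only reproduces the arithmetic.

```python
def contact_numbers(length, fractions=(0.4, 0.65, 1.0)):
    """Numbers of contacts for the given fractions of the alignment length L."""
    return [int(round(f * length)) for f in fractions]


print(contact_numbers(160))  # [64, 104, 160]
```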