Difference between revisions of "Lab Journal - Task 6 (PAH)"
From Bioinformatikpedia
Revision as of 17:37, 18 June 2013
Multiple Sequence Alignment
The multiple alignments are downloaded from the Pfam server and converted into a FreeContact-readable format using a2m2aln.
- Protein H-RAS:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
- Our protein PAH has two domains. As damage to the biopterin domain is said to cause PKU, we used the Pfam alignment of this domain:
/usr/share/freecontact/a2m2aln -q '^PH4H_HUMAN/(\d+)' --quiet < PF00351_full.txt > PF00351.aln
- FreeContact is used to calculate the CN-scores for the multiple alignments:
freecontact -o evfold < '<PFAM-ID>.aln' > <PFAM-ID>.evfold
- contact_map.pl extracts all residue pairs with a minimum atom distance of less than 5 Ångstrom:
perl contact_map.pl -pdb <pdb-file> -out <output-file>
- extract_pairs.pl extracts all residue pairs with distance >5; if such a pair is also included in the output of contact_map.pl, it is marked 'TP' (true positive), otherwise 'FP' (false positive):
perl extract_pairs.pl -inp <PFAM-ID>.evfold -map <contact_map.pl output-file> -out <output-file>
- The results are sorted by CN-score in descending order, both for all residue pairs and for the extracted ones:
sort -k 6 -g -r <PFAM-ID>.evfold >sort_<PFAM-ID>.txt
- CN_dist2.R plots histograms of the CN-score distribution (for all and for the extracted pairs). Furthermore, it calculates the top L-Score (L = protein length) for each residue i that belongs to the top L:
top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
- contact_map.R creates a contact map from the output files of the two Perl scripts above (pdb = reference structure, extracted = predicted).
- Evcouplings
Reference structure for Ras is 121p.
For the biopterin family we have to set the starting position to 106 to get a multiple alignment.
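The TP/FP marking described for extract_pairs.pl can be sketched in a few lines of awk. This is not the actual script, only a minimal illustration under several assumptions: the function name mark_pairs is hypothetical, the evfold file is assumed to have six whitespace-separated columns (i, aa_i, j, aa_j, MI, CN), the contact file is assumed to hold one "i j" pair per line, and "distance >5" is read here as sequence separation |i - j| > 5.

```shell
# Hypothetical stand-in for extract_pairs.pl (all file formats are assumptions).
# Usage: mark_pairs <evfold-file> <contact-file>
mark_pairs() {
  LC_ALL=C awk '
    # First file (contacts): remember every pair in both orientations.
    FILENAME == ARGV[1] { contact[$1 " " $2] = 1; contact[$2 " " $1] = 1; next }
    # Second file (evfold): compute the sequence separation of the pair.
    { d = $3 - $1; if (d < 0) d = -d }
    # Keep pairs more than 5 positions apart and label them TP or FP.
    d > 5 { print $1, $3, $6, ((($1 " " $3) in contact) ? "TP" : "FP") }
  ' "$2" "$1"
}
```

Each output line then holds i, j, the CN-score and the TP/FP label, so it can be sorted on the third column in the same way as the full evfold file.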
The Perl and R scripts can be found in /mnt/home/student/waldraffs/masterpractical/Task6.
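For illustration, the top L-Score formula given for CN_dist2.R can also be reproduced on the command line. The sketch below is not the R script itself. It assumes the evfold file has six whitespace-separated columns with the CN-score in column 6 (as in the sort command above), and that the sum for residue i runs only over the top L pairs; the function name top_l_scores is hypothetical.

```shell
# Hypothetical helper, not part of the original scripts.
# Usage: top_l_scores <evfold-file> <L>
top_l_scores() {
  # Sort by CN-score (column 6) descending and keep the top L pairs.
  LC_ALL=C sort -k6,6 -g -r "$1" | head -n "$2" | LC_ALL=C awk '
    # Add the CN-score of each pair to both of its residues.
    { sum[$1] += $6; sum[$3] += $6; total += $6; n++ }
    END {
      if (n == 0) exit
      mean = total / n            # mean CN-score of the top L pairs
      for (r in sum) printf "%s %.3f\n", r, sum[r] / mean
    }' | LC_ALL=C sort -k1,1 -n   # order the output by residue number
}
```

A call such as top_l_scores sort_PF00071.txt 160 would then print one line per residue with its number and top L-Score.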
Calculate structural model
The Pfam alignment of H-Ras has a length of 160, so we use the following numbers of contacts: 64, 104 and 160.
For biopterin the protein length is 346, as we only make an alignment with amino acids 106 to 452. So we use 138, 225 and 346 as numbers of contacts.
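The contact numbers above are consistent with 0.4·L, 0.65·L and 1.0·L rounded to the nearest integer. These fractions are inferred from the numbers and are not stated explicitly in this journal, so the following is only a sketch under that assumption; the function name contact_counts is hypothetical.

```shell
# Hypothetical helper: print 0.4*L, 0.65*L and 1.0*L, rounded to the nearest integer.
# The fractions are an assumption inferred from the contact numbers used above.
contact_counts() {
  LC_ALL=C awk -v L="$1" 'BEGIN {
    # %d truncates, so adding 0.5 rounds positive values to the nearest integer.
    printf "%d %d %d\n", L * 0.4 + 0.5, L * 0.65 + 0.5, L
  }'
}

contact_counts 160   # prints: 64 104 160
contact_counts 346   # prints: 138 225 346
```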