Phenylketonuria/Task6 Scripts
contact_map.pl
contact_map.pl is a perl script that extracts all residue pairs with less than 5 Ångstrom minimum atom distance:perl contact_map.pl -pdb <pdb-file> -out <output-file>
<source lang="perl">
- !/usr/bin/perl
my $pdb; my $out;
- Reads the command line parameters
for ($i=0; $i<=@ARGV;$i++){ if($ARGV[$i] eq "-pdb"){ $pdb=$ARGV[$i+1]; } if($ARGV[$i] eq "-out"){ $out=$ARGV[$i+1]; } }
if(!$pdb && !$out){ print "Usage: contact_map.pl -pdb <pdb-file> -out <output-file>." }
- reads pdb file and filters for the 'ATOM' lines and saves the residues with their 3D coordinates.
open (FILE,"$pdb") || die $!; my @res; my @x; my @y; my @z; my @res;
while (my $line = <FILE>){ $line =~ s/\s+/ /g; if($line=~ m/^ATOM/){ @splitLine = split(/ /,$line); $acres=$splitLine[5]; push(@{$$acres{aa}},$splitLine[3]); push(@{$$acres{x}}, $splitLine[6]); push(@{$$acres{y}},$splitLine[7]); push(@{$$acres{z}},$splitLine[8]); if((grep /^$acres/, @res)==0){ push(@res,$acres); } } } close FILE;
- writes output file
open(OUTFILE,">$out") || die $!;
- The residue pairs with a distance of 5 Angstrom are filtered.
my @res_res; $rlength=@res; for($res1=$res[0];$res1<=$res[$rlength-1];$res1++){ for($res2=$res1;$res2<=$res[$rlength-1];$res2++){ $stop=0; $aa1= @{$$res1{aa}}[0]; $aa2= @{$$res2{aa}}[0]; $res_res=$res1."_".$res2; $aa=$aa1."_".$aa2; for($c1=0;$c1<@{$$res1{x}};$c1++){ for($c2=0;$c2<@{$$res2{x}};$c2++){ #if the atoms of a residue pair already has a distance smaller than 5 the distances of the other atomes do not have to be calculated. if($stop==0){ #distance x-coordinate, y-coordinate and z-coordinate $xc = (@{$$res1{x}}[$c1]-@{$$res2{x}}[$c2])*(@{$$res1{x}}[$c1]-@{$$res2{x}}[$c2]); $yc = (@{$$res1{y}}[$c1]-@{$$res2{y}}[$c2])*(@{$$res1{y}}[$c1]-@{$$res2{y}}[$c2]); $zc = (@{$$res1{z}}[$c1]-@{$$res2{z}}[$c2])*(@{$$res1{z}}[$c1]-@{$$res2{z}}[$c2]); #complete residue distance: $dist = sqrt ($xc + $yc + $zc); if($dist<5){ #writes the first and the second residue as well as the combinated residue pair and their amino acids in the output-file print OUTFILE "$res1\t$res2\t$res_res\t$aa\n"; $stop=1; } } } } } } close OUTFILE; </source>
extract_pairs.pl
extract_pairs.pl is a perl script that extracts all residue pairs with distance >5. If such a pair also is included in the output of contact_map.pl it is marked with 'TP' (true positive) else with 'FP' (false positive):perl extract_pairs.pl -inp <FILE>.evfold -map <contact_map.pl output-file> -out <output-file>
<source lang="perl">
- !/usr/bin/perl
my $inp; my $map; my $out;
- Reads the command line parameters
for ($i=0; $i<=@ARGV;$i++){ if($ARGV[$i] eq "-inp"){ $inp=$ARGV[$i+1]; }if($ARGV[$i] eq "-map"){ $map=$ARGV[$i+1]; } if($ARGV[$i] eq "-out"){ $out=$ARGV[$i+1]; } }
if(!$inp && !$out && !$map){ print "Usage: extract_pairs.pl -inp <input-file> -map <pdb-map> -out <output-file>" }
- reads the contact_map of the pdb structure created with contact_map.pl
open (MAP,"$map") || die $!; while(my $line = <MAP>){ @splitLine = split(/\t/,$line); push(@res_res, $splitLine[2]); } close MAP;
- reads input file (evfold) and filters the residue pairs with distances bigger than 5 residues.
open (FILE,"$inp") || die $!;
- writes output file
open(OUTFILE,">$out") || die $!;
while (my $line = <FILE>){ chomp($line); @splitLine = split(/ /,$line); $res1=$splitLine[0]; $res2=$splitLine[2]; $distance=$res2-$res1; if($distance > 5){ $res_res=$res1."_".$res2; #if the residue pair is not found in the pdb map it is called FP (false positive) else TP (true positive) if((grep /^$res_res/, @res_res)==0){ print OUTFILE $line." FP\n"; }else{ print OUTFILE $line." TP\n"; } } }
close FILE; close OUTFILE; </source>
CN_dist2.R
CN_dist2.R is a R script that makes histograms for the CN-Score distribution (for all and extracted pairs). Furthermore it calculates the top L-Score (L = protein length) for each residue i that belongs to the top L
CN_dist2.R script-call:
R CMD BATCH --slave '--args infile1=<FILE1> infile2=<FILE2> png_file=<OUTFILE1> output=<OUTFILE>' contact_map.R /dev/tty
-infile1 The evfold file with path. -infile2 The extracted evfold file with path. -png_file PNG-file (.png) and the path, where the image of the multiple histogram with the CN-score frequencies for all and extracted residues should be saved. -output File-name and path, where the L-Scores should be stored.
contact_map.R
contact_map.R is a R script that creates a contact map with the output-files of the two perl scripts above (pdb = reference structure, extracted = predicted).
contact_map.R script-call:
R CMD BATCH '--args infile1=<FILE1> infile2=<FILE2> tophits=<#pairs> output=<OUTFILE>' contact_map.R
-infile1 The sorted and extracted evfold file with path. -infile2 The pdb-contact file (output of contact_map.pl) with path. -tophits number, how many of the best residue pairs should be represented in the contact map. -output File-name of the map and the path, where it should be stored. File must be a PNG-File (.png).