Difference between revisions of "Phenylketonuria/Task6 Scripts"
(→Path) |
(→Path) |
||
Line 1: | Line 1: | ||
==Path== |
==Path== |
||
The perl and R scripts described below can be found in <code>/mnt/home/student/waldraffs/masterpractical/Task06</code>. |
The perl and R scripts described below can be found in <code>/mnt/home/student/waldraffs/masterpractical/Task06</code>. |
||
+ | Back to [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Lab_Journal_-_Task_6_%28PAH%29#Multiple_Sequence_Alignment Lab Journal] |
||
==contact_map.pl== |
==contact_map.pl== |
Revision as of 15:06, 17 August 2013
Path
The perl and R scripts described below can be found in /mnt/home/student/waldraffs/masterpractical/Task06
.
Back to Lab Journal
contact_map.pl
contact_map.pl is a perl script that extracts all residue pairs with less than 5 Ångstrom minimum atom distance:perl contact_map.pl -pdb <pdb-file> -out <output-file>
Source code: <source lang="perl">
- !/usr/bin/perl
my $pdb; my $out;
- Reads the command line parameters
for ($i=0; $i<=@ARGV;$i++){ if($ARGV[$i] eq "-pdb"){ $pdb=$ARGV[$i+1]; } if($ARGV[$i] eq "-out"){ $out=$ARGV[$i+1]; } }
if(!$pdb && !$out){ print "Usage: contact_map.pl -pdb <pdb-file> -out <output-file>." }
- reads pdb file and filters for the 'ATOM' lines and saves the residues with their 3D coordinates.
open (FILE,"$pdb") || die $!; my @res; my @x; my @y; my @z; my @res;
while (my $line = <FILE>){ $line =~ s/\s+/ /g; if($line=~ m/^ATOM/){ @splitLine = split(/ /,$line); $acres=$splitLine[5]; push(@{$$acres{aa}},$splitLine[3]); push(@{$$acres{x}}, $splitLine[6]); push(@{$$acres{y}},$splitLine[7]); push(@{$$acres{z}},$splitLine[8]); if((grep /^$acres/, @res)==0){ push(@res,$acres); } } } close FILE;
- writes output file
open(OUTFILE,">$out") || die $!;
- The residue pairs with a distance of 5 Angstrom are filtered.
my @res_res; $rlength=@res; for($res1=$res[0];$res1<=$res[$rlength-1];$res1++){ for($res2=$res1;$res2<=$res[$rlength-1];$res2++){ $stop=0; $aa1= @{$$res1{aa}}[0]; $aa2= @{$$res2{aa}}[0]; $res_res=$res1."_".$res2; $aa=$aa1."_".$aa2; for($c1=0;$c1<@{$$res1{x}};$c1++){ for($c2=0;$c2<@{$$res2{x}};$c2++){ #if the atoms of a residue pair already has a distance smaller than 5 the distances of the other atomes do not have to be calculated. if($stop==0){ #distance x-coordinate, y-coordinate and z-coordinate $xc = (@{$$res1{x}}[$c1]-@{$$res2{x}}[$c2])*(@{$$res1{x}}[$c1]-@{$$res2{x}}[$c2]); $yc = (@{$$res1{y}}[$c1]-@{$$res2{y}}[$c2])*(@{$$res1{y}}[$c1]-@{$$res2{y}}[$c2]); $zc = (@{$$res1{z}}[$c1]-@{$$res2{z}}[$c2])*(@{$$res1{z}}[$c1]-@{$$res2{z}}[$c2]); #complete residue distance: $dist = sqrt ($xc + $yc + $zc); if($dist<5){ #writes the first and the second residue as well as the combinated residue pair and their amino acids in the output-file print OUTFILE "$res1\t$res2\t$res_res\t$aa\n"; $stop=1; } } } } } } close OUTFILE; </source>
extract_pairs.pl
extract_pairs.pl is a perl script that extracts all residue pairs with distance >5. If such a pair also is included in the output of contact_map.pl it is marked with 'TP' (true positive) else with 'FP' (false positive):perl extract_pairs.pl -inp <FILE>.evfold -map <contact_map.pl output-file> -out <output-file>
Source code: <source lang="perl">
- !/usr/bin/perl
my $inp; my $map; my $out;
- Reads the command line parameters
for ($i=0; $i<=@ARGV;$i++){ if($ARGV[$i] eq "-inp"){ $inp=$ARGV[$i+1]; }if($ARGV[$i] eq "-map"){ $map=$ARGV[$i+1]; } if($ARGV[$i] eq "-out"){ $out=$ARGV[$i+1]; } }
if(!$inp && !$out && !$map){ print "Usage: extract_pairs.pl -inp <input-file> -map <pdb-map> -out <output-file>" }
- reads the contact_map of the pdb structure created with contact_map.pl
open (MAP,"$map") || die $!; while(my $line = <MAP>){ @splitLine = split(/\t/,$line); push(@res_res, $splitLine[2]); } close MAP;
- reads input file (evfold) and filters the residue pairs with distances bigger than 5 residues.
open (FILE,"$inp") || die $!;
- writes output file
open(OUTFILE,">$out") || die $!;
while (my $line = <FILE>){ chomp($line); @splitLine = split(/ /,$line); $res1=$splitLine[0]; $res2=$splitLine[2]; $distance=$res2-$res1; if($distance > 5){ $res_res=$res1."_".$res2; #if the residue pair is not found in the pdb map it is called FP (false positive) else TP (true positive) if((grep /^$res_res/, @res_res)==0){ print OUTFILE $line." FP\n"; }else{ print OUTFILE $line." TP\n"; } } }
close FILE; close OUTFILE; </source>
CN_dist2.R
CN_dist2.R is a R script that makes histograms for the CN-Score distribution (for all and extracted pairs). Furthermore it calculates the top L-Score (L = protein length) for each residue i that belongs to the top L
CN_dist2.R script-call:
R CMD BATCH --slave '--args infile1=<FILE1> infile2=<FILE2> png_file=<OUTFILE1> output=<OUTFILE>' contact_map.R /dev/tty
-infile1 The evfold file with path. -infile2 The extracted evfold file with path. -png_file PNG-file (.png) and the path, where the image of the multiple histogram with the CN-score frequencies for all and extracted residues should be saved. -output File-name and path, where the L-Scores should be stored.
Source code: <source lang="php">
- reads the input parameters
args=(commandArgs(trailingOnly=TRUE))
- go through the parameters
if(length(args)==0){
cat("\nScript-call:\nR CMD BATCH --slave '--args infile1=<FILE1> infile2=<FILE2> png_file=<OUTFILE1> output=<OUTFILE>' contact_map.R /dev/tty
\n\ninfile1\t\tThe evfold file with path. \ninfile2\t\tThe extracted evfold file with path. \npng_file\t\tPNG-file (.png) and the path, where the image of the multiple histogram with the CN-score frequencies for all and extracted residues should be saved. \noutput\t\tFile-name and path, where the L-Scores should be stored.\n\n") }else{
for(i in 1:length(args)){ eval(parse(text=argsi)) }
evfold<-read.table(infile1, sep=" ") evfold_extract<-read.table(infile2, sep=" ") out<-output pg<-png_file
difference <- evfold[,3]-evfold[,1] cn<-evfold[,6]
difference2 <- evfold_extract[,3]-evfold_extract[,1] cn2<-evfold_extract[,6] png(pg) par(mfrow=c(1,2)) hist(cn,col=("forestgreen"),xlab="CN",breaks = seq(-1.5,6.5, by = 0.1), main ="All pairs") hist(cn2,col=("purple"),xlab="CN",breaks = seq(-1.5,6.5, by = 0.1), main ="Extracted pairs") dev.off() #multiples Histogram with both CN score for all and for extracted residues
- library(plotrix)
- par(mfrow=c(1,1))
- l<-list(cn,cn2)
- png(pg)
- multhist(l,col=c("forestgreen","purple"),breaks=35,xlab="CN",ylab="frequency", main = "CN score frequency for all and extracted residue pairs")
- legend("topright",legend=c("ras","extracted"), fill=c("forestgreen","purple"))
- dev.off()
#L-Score for extracted pairs res1<-evfold_extract[,1] res2<-evfold_extract[,3] plength <-166 res <-1:plength sum <- data.frame(residue=res,score=0) #For all top-couplings sum the scores. Both residues must be considered. for(i in 1:plength){ sum[res1[i],2] <- sum[res1[i],2] + cn2[i] sum[res2[i],2] <- sum[res2[i],2] + cn2[i] } sum$score<-round(sum$score/mean(cn2[1:plength]),digits=2) sum <- sum[order(sum$score, decreasing = T),] write.table(sum[1:plength,], out , quote = F, row.names = F) } </source>
contact_map.R
Source code:
contact_map.R is a R script that creates a contact map with the output-files of the two perl scripts above (pdb = reference structure, extracted = predicted).
contact_map.R script-call:
R CMD BATCH '--args infile1=<FILE1> infile2=<FILE2> tophits=<#pairs> output=<OUTFILE>' contact_map.R
-infile1 The sorted and extracted evfold file with path. -infile2 The pdb-contact file (output of contact_map.pl) with path. -tophits number, how many of the best residue pairs should be represented in the contact map. -output File-name of the map and the path, where it should be stored. File must be a PNG-File (.png).
<source lang="php">
- reads the input parameters
args=(commandArgs(trailingOnly=TRUE))
- go through the parameters
if(length(args)==0){
cat("\nScript-call:\nR CMD BATCH '--args infile1=<FILE1> infile2=<FILE2> tophits=<#pairs> output=<OUTFILE>' contact_map.R
\n\ninfile1\t\tThe sorted and extracted evfold file with path. \ninfile2\t\tThe pdb-contact file (output of contact_map.pl) with path. \ntophits\t\t number, how many of the best residue pairs should be represented in the contact map. \noutput\t\tFile-name of the map and the path, where it should be stored. File must be a PNG-File (.png).\n\n") }else{
for(i in 1:length(args)){ eval(parse(text=argsi)) }
in1<-read.table(infile1, sep=" ") pdb<-read.table(infile2, sep="\t") npairs<-tophits out<-output
cres1<-c() cres2<-c() #only take the first n entries, where n is the number of residue pairs for(i in 1:npairs){ cres1[i]<-c(in1[i,1]) cres2[i]<-c(in1[i,3]) } #residue pairs of the evfold prediction res1 <- c(cres1,cres2) res2 <- c(cres2,cres1)
#residue pairs of the pdb structure l1<-c(pdb[,1],pdb[,2]) l2<-c(pdb[,2],pdb[,1])
png(out) plot(l1,l2, col="lightgrey",pch =16,ylim=rev(range(l1)),xlab="residue",ylab="residue",main=paste("Contact Map for",npairs, "pairs",sep=" ")) lines(res1,res2,type="p",col="red2",pch =16,ylim=rev(range(res1))) dev.off() } </source>