Lab Journal - Task 6 (PAH)

All files are stored in /mnt/home/student/waldraffs/masterpractical/Task6

/ras: contains all files for H-RAS
/pah: contains all files for PAH

Multiple Sequence Alignment

The multiple sequence alignments are downloaded from the Pfam server and converted into a FreeContact-readable format with a2m2aln.

Protein H-RAS:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
Our protein PAH has two domains (ACT: PF01842, Biopterin: PF00351), so we use the hhblits result from Task 2 instead. The .a3m file is converted into Stockholm format with
perl /usr/share/hhsuite/scripts/reformat.pl a3m sto PAH_2000.a3m PAH_2000.stockholm

After that, the header is changed into # query=" and all positions that have a gap in the query sequence are removed. The result is PAH.aln.
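The gap-removal step can be sketched as follows. This is a minimal illustration, not the procedure actually used for PAH.aln: the function name drop_query_gap_columns, the gap characters and the toy sequences are assumptions made for the example.

```python
def drop_query_gap_columns(query, others, gap_chars="-."):
    """Keep only the alignment columns in which the query has a residue.

    query  : aligned query sequence (first row of the alignment)
    others : list of aligned sequences of the same length
    Both the function name and the gap characters are illustrative assumptions.
    """
    keep = [k for k, ch in enumerate(query) if ch not in gap_chars]

    def strip(seq):
        return "".join(seq[k] for k in keep)

    return strip(query), [strip(s) for s in others]


# Toy alignment (hypothetical sequences, not taken from PAH_2000.a3m)
query = "MK-LV.A"
others = ["MRALVGA", "M--LI-A"]
new_query, new_others = drop_query_gap_columns(query, others)
print(new_query)
print(new_others)
```

After this step every remaining column corresponds to exactly one residue of the query, so column indices can be mapped directly to query positions.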

Calculate and analyze correlated mutations

FreeContact is used to calculate the CN-scores for the multiple alignments:
freecontact -o evfold < '<FILE>.aln' > <FILE>.evfold
contact_map.pl extracts all residue pairs with a minimum atom distance of less than 5 Ångström:
perl contact_map.pl -pdb <pdb-file> -out <output-file>
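The contact criterion (minimum atom distance below 5 Ångström) can be illustrated with a short Python sketch. It is not a reimplementation of contact_map.pl: the function name, the dictionary layout and the toy coordinates are assumptions, and whether the Perl script additionally skips neighbouring residues is not known here.

```python
import math
from itertools import combinations


def residue_contacts(atoms_by_residue, cutoff=5.0):
    """Return residue pairs whose closest atoms are less than `cutoff` apart.

    atoms_by_residue maps a residue number to a list of (x, y, z) atom
    coordinates. This layout is an assumption made for the example.
    """
    contacts = set()
    for (i, atoms_i), (j, atoms_j) in combinations(sorted(atoms_by_residue.items()), 2):
        dmin = min(math.dist(a, b) for a in atoms_i for b in atoms_j)
        if dmin < cutoff:
            contacts.add((i, j))
    return contacts


# Toy coordinates (hypothetical, not taken from a PDB file)
toy = {
    1: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    2: [(4.0, 0.0, 0.0)],
    3: [(12.0, 0.0, 0.0)],
}
contacts = residue_contacts(toy)
print(contacts)
```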
extract_pairs.pl extracts all residue pairs with a distance >5. If such a pair is also contained in the output of contact_map.pl, it is marked 'TP' (true positive), otherwise 'FP' (false positive):
perl extract_pairs.pl -inp <FILE>.evfold -map <contact_map.pl output-file> -out <output-file>
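The TP/FP labelling can be sketched as a lookup of each predicted pair in the set of reference contacts. The names label_pairs and precision, the tuple layout and the toy values are assumptions for illustration and do not reproduce extract_pairs.pl.

```python
def label_pairs(predicted, contacts):
    """Tag each predicted pair as 'TP' if it is a reference contact, else 'FP'.

    predicted : list of (i, j, cn_score) tuples
    contacts  : set of (i, j) tuples with i < j
    """
    labelled = []
    for i, j, cn in predicted:
        key = (min(i, j), max(i, j))  # make the lookup order-independent
        labelled.append((i, j, cn, "TP" if key in contacts else "FP"))
    return labelled


def precision(labelled):
    """Fraction of labelled pairs that are true positives."""
    if not labelled:
        return 0.0
    tp = sum(1 for *_, tag in labelled if tag == "TP")
    return tp / len(labelled)


# Toy data (hypothetical values)
predicted = [(10, 25, 1.8), (40, 3, 1.2), (12, 30, 0.9)]
reference = {(10, 25), (12, 30)}
labelled = label_pairs(predicted, reference)
print(labelled)
print(precision(labelled))
```

The fraction of TP pairs among the top-ranked predictions gives a quick measure of how well the CN-scores recover contacts of the reference structure.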
The results are sorted by CN-score in descending order, for both the complete and the extracted set of residue pairs:
sort -k 6 -g -r <FILE> >sort_<FILE>
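For downstream analysis, the same ordering can be obtained in Python. The sketch assumes that each evfold line has six whitespace-separated columns (position i, residue i, position j, residue j, MI score, CN score), which matches sorting on column 6 above; the function names and the toy lines are illustrative.

```python
def parse_evfold(lines):
    """Parse lines of the assumed form 'i aa_i j aa_j MI CN' into (i, j, cn)."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip empty or malformed lines
        records.append((int(fields[0]), int(fields[2]), float(fields[5])))
    return records


def sort_by_cn(records):
    """Sort (i, j, cn) records by CN-score, highest first."""
    return sorted(records, key=lambda r: r[2], reverse=True)


# Toy lines (hypothetical values)
toy_lines = [
    "1 M 8 L 0.12 0.35",
    "2 K 20 V 0.40 1.75",
    "5 A 30 G 0.05 -0.20",
]
ranked = sort_by_cn(parse_evfold(toy_lines))
print(ranked)
```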
CN_dist2.R plots histograms of the CN-score distribution (for all and for extracted pairs). In addition, it calculates the top L-score (L = protein length) for each residue i that occurs in the top L pairs:
top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
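The formula can be sketched in Python as follows. The sketch assumes that the sum for residue i runs over the top L pairs in which i takes part; CN_dist2.R may define the sum differently. The function name and the toy values are illustrative.

```python
from collections import defaultdict


def top_l_scores(pairs, L):
    """Compute the top L-score per residue from (i, j, cn) records.

    Assumption: the per-residue sum is taken over the top L pairs only.
    """
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:L]
    mean_cn = sum(cn for _, _, cn in top) / len(top)
    sums = defaultdict(float)
    for i, j, cn in top:
        sums[i] += cn  # each pair contributes to both of its residues
        sums[j] += cn
    return {res: s / mean_cn for res, s in sums.items()}


# Toy data (hypothetical values); with L = 3 the last pair is ignored
pairs = [(1, 5, 3.0), (1, 9, 2.0), (3, 7, 1.0), (2, 8, 0.1)]
scores = top_l_scores(pairs, L=3)
print(scores)
```

Under this reading, a residue that takes part in several high-scoring pairs gets a score well above 1.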
contact_map.R creates a contact map from the output files of the two Perl scripts above (pdb = reference structure, extracted = predicted).
Evcouplings
The reference structure for Ras is 121p.
For the biopterin family we have to set the starting position to 106 to get a multiple alignment.

The Perl and R scripts can be found in /mnt/home/student/waldraffs/masterpractical/Task6.

CN_dist2.R script-call:
R CMD BATCH --slave '--args infile1=<FILE1> infile2=<FILE2> png_file=<OUTFILE1> output=<OUTFILE>' CN_dist2.R /dev/tty

-infile1         The evfold file with path.
-infile2         The extracted evfold file with path.
-png_file        Path of the PNG file (.png) in which the combined histogram of the CN-score frequencies 
                 for all and for extracted residue pairs is saved.
-output          Path of the file in which the L-scores are stored.

contact_map.R script-call:
R CMD BATCH '--args infile1=<FILE1> infile2=<FILE2> tophits=<#pairs> output=<OUTFILE>' contact_map.R

-infile1         The sorted and extracted evfold file with path.
-infile2         The pdb-contact file (output of contact_map.pl) with path.
-tophits         Number of top-scoring residue pairs to show in the contact map.
-output          Path of the file in which the contact map is stored. Must be a PNG file (.png).

Calculate structural model

The Pfam alignment of H-Ras has a length of 160, so we use the following numbers of contacts: 64, 104 and 160.
For biopterin the protein length is 346, as we only align amino acids 106 to 452. We therefore use 138, 225 and 346 contacts.
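The contact numbers above are consistent with taking 40 %, 65 % and 100 % of the length L and rounding to the nearest integer. These fractions are inferred from the numbers and are not stated explicitly in the journal; the function name is illustrative.

```python
def contact_counts(length, fractions=(0.4, 0.65, 1.0)):
    """Return the number of contacts for each fraction of the length L.

    The default fractions are an assumption inferred from the values
    64/104/160 and 138/225/346.
    """
    return [round(length * f) for f in fractions]


print(contact_counts(160))  # H-Ras Pfam alignment
print(contact_counts(346))  # biopterin domain, residues 106 to 452
```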
