Difference between revisions of "Lab Journal - Task 6 (PAH)"

Revision as of 14:45, 17 August 2013

All files are stored in /mnt/home/student/waldraffs/masterpractical/Task6

/ras: contains all files for H-RAS
/pah: contains all files for PAH

Multiple Sequence Alignment

The multiple alignments are downloaded from the Pfam server and converted into a format readable by freecontact using a2m2aln.

Protein H-RAS:
/usr/share/freecontact/a2m2aln -q '^RASH_HUMAN/(\d+)' --quiet < PF00071_full.txt > PF00071.aln
Our protein PAH has two domains (ACT: PF01842, Biopterin: PF00351), so we used the hhblits result from Task 2 instead. The .a3m file is converted into Stockholm format using
perl /usr/share/hhsuite/scripts/reformat.pl a3m sto PAH_2000.a3m PAH_2000.stockholm

After that, the header is changed into # query=" and all positions that have a gap in the query sequence are removed. The result is PAH.aln.
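The gap-removal step can be illustrated with a minimal Python sketch. It assumes the alignment is already loaded as a list of equal-length strings with the query as the first entry; the function name `drop_query_gap_columns` is hypothetical and this is not the script that was actually used.

```python
def drop_query_gap_columns(sequences):
    """Keep only the alignment columns in which the query has a residue.

    Assumption: sequences[0] is the query and all rows have equal length.
    """
    query = sequences[0]
    # Column indices where the query is not a gap ('-' or '.').
    keep = [idx for idx, ch in enumerate(query) if ch not in "-."]
    return ["".join(seq[idx] for idx in keep) for seq in sequences]


# Toy alignment: the third column is a gap in the query and is dropped.
aligned = ["MK-TA", "MR-SA", "-KLTA"]
print(drop_query_gap_columns(aligned))  # ['MKTA', 'MRSA', '-KTA']
```

Gaps in the other sequences are kept, so every remaining column maps to exactly one query position.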

Calculate and analyze correlated mutations

Freecontact is used to calculate the CN-scores for the multiple alignments:
freecontact -o evfold < '<FILE>.aln' > <FILE>.evfold
contact_map.pl extracts all residue pairs whose minimum atom distance is less than 5 Ångström:
perl contact_map.pl -pdb <pdb-file> -out <output-file>
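The contact criterion can be sketched in Python as follows. This is only an illustration of "minimum atom distance below 5 Ångström" on toy coordinates; the data layout (a dict from residue number to a list of atom coordinates) and the function name `contacts` are assumptions and do not reflect the internals of contact_map.pl.

```python
import math
from itertools import combinations


def contacts(residue_atoms, cutoff=5.0):
    """Return residue pairs whose closest atoms are less than `cutoff` apart.

    residue_atoms: dict mapping residue number -> list of (x, y, z) tuples.
    """
    pairs = []
    for (i, atoms_i), (j, atoms_j) in combinations(sorted(residue_atoms.items()), 2):
        # Minimum over all atom-atom distances between the two residues.
        min_dist = min(math.dist(a, b) for a in atoms_i for b in atoms_j)
        if min_dist < cutoff:
            pairs.append((i, j))
    return pairs


toy = {
    1: [(0.0, 0.0, 0.0)],
    2: [(3.0, 0.0, 0.0)],
    3: [(10.0, 0.0, 0.0), (7.5, 0.0, 0.0)],
}
print(contacts(toy))  # [(1, 2), (2, 3)]
```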
extract_pairs.pl extracts all residue pairs with distance > 5. If such a pair is also included in the output of contact_map.pl, it is marked 'TP' (true positive), otherwise 'FP' (false positive):
perl extract_pairs.pl -inp <FILE>.evfold -map <contact_map.pl output-file> -out <output-file>
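The labelling logic can be sketched in Python. Two assumptions are made here that the text does not state explicitly: "distance > 5" is read as sequence separation |i - j| > 5, and a predicted pair is represented as a tuple (i, j, CN-score). The function name `label_pairs` is hypothetical.

```python
def label_pairs(predicted, reference_contacts, min_separation=5):
    """Filter predicted pairs and mark each as 'TP' or 'FP'.

    predicted: iterable of (i, j, cn_score).
    reference_contacts: iterable of (i, j) pairs from the reference structure.
    Assumption: pairs are kept only if |i - j| > min_separation.
    """
    # Store reference pairs order-independently.
    reference = {tuple(sorted(pair)) for pair in reference_contacts}
    labelled = []
    for i, j, score in predicted:
        if abs(i - j) <= min_separation:
            continue
        label = "TP" if tuple(sorted((i, j))) in reference else "FP"
        labelled.append((i, j, score, label))
    return labelled


predicted = [(1, 10, 0.9), (2, 4, 0.8), (3, 20, 0.5)]
print(label_pairs(predicted, [(10, 1)]))
# [(1, 10, 0.9, 'TP'), (3, 20, 0.5, 'FP')]
```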
The results are sorted by CN-score in descending order, both for all pairs and for the extracted pairs:
sort -k 6 -g -r <FILE> >sort_<FILE>
CN_dist2.R plots histograms of the CN-score distribution (for all pairs and for the extracted pairs). It also calculates the top L-Score (L = protein length) for each residue i that occurs in the top L pairs:
top L-Score(i) = (sum of CN scores for residue i)/mean(CN-Scores of top L)
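The formula above can be sketched in Python. This is an illustration rather than the code of CN_dist2.R: it assumes that the sum for residue i runs over the top L pairs only, and that pairs are given as tuples (i, j, CN-score). The function name `top_l_scores` is hypothetical.

```python
from collections import defaultdict


def top_l_scores(pairs, length):
    """Compute the top L-Score for every residue occurring in the top L pairs.

    pairs: iterable of (i, j, cn_score); length: protein length L.
    """
    # Take the L pairs with the highest CN-score.
    top = sorted(pairs, key=lambda p: p[2], reverse=True)[:length]
    if not top:
        return {}
    mean_cn = sum(score for _, _, score in top) / len(top)
    sums = defaultdict(float)
    for i, j, score in top:
        # Each pair contributes its CN-score to both of its residues.
        sums[i] += score
        sums[j] += score
    return {residue: total / mean_cn for residue, total in sums.items()}


toy_pairs = [(1, 8, 4.0), (1, 12, 2.0), (3, 10, 1.0)]
scores = top_l_scores(toy_pairs, length=2)
print(scores[1])  # 2.0
```

In the toy example the top two pairs have a mean CN-score of 3.0, and residue 1 takes part in both of them, giving (4.0 + 2.0) / 3.0 = 2.0.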
contact_map.R creates a contact map from the output files of the two Perl scripts above (pdb = reference structure, extracted = predicted).

Evcouplings

The reference structure for Ras is 121p.
For the biopterin family we have to set the starting position to 106 to get a multiple alignment.

The perl and R scripts can be found in /mnt/home/student/waldraffs/masterpractical/Task6.

Calculate structural model

The Pfam alignment of H-Ras has a length of 160, so we use the following numbers of contacts: 64, 104 and 160.
For biopterin the protein length is 346, as we only align amino acids 106 to 452. We therefore use 138, 225 and 346 contacts.
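The contact numbers above are consistent with 40 %, 65 % and 100 % of the alignment length, rounded to the nearest integer. The text does not name these fractions, so they are an inference from the numbers, and the function name `contact_counts` is hypothetical. A minimal Python sketch under that assumption:

```python
def contact_counts(length, percents=(40, 65, 100)):
    """Number of contacts for each percentage of the alignment length.

    Assumption: the fractions 40/65/100 % are inferred from the numbers
    in the text, not stated there.
    """
    # Integer arithmetic with round-half-up avoids floating-point surprises.
    return [(length * pct + 50) // 100 for pct in percents]


print(contact_counts(160))  # [64, 104, 160]   H-Ras
print(contact_counts(346))  # [138, 225, 346]  biopterin
```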

@@ Line 18: / Line 18: @@
 The perl and R scripts can be found in <code>/mnt/home/student/waldraffs/masterpractical/Task6</code>.
-CN_dist2.R script-call:<br>
-R CMD BATCH --slave '--args infile1=<FILE1> infile2=<FILE2> png_file=<OUTFILE1> output=<OUTFILE>' CN_dist2.R /dev/tty
- -infile1         The evfold file with path.
- -infile2         The extracted evfold file with path.
- -png_file        PNG-file (.png) and the path, where the image of the multiple histogram with the CN-score frequencies
-                  for all and extracted residues should be saved.
- -output          File-name and path, where the L-Scores should be stored.
-contact_map.R script-call:<br>
-R CMD BATCH '--args infile1=<FILE1> infile2=<FILE2> tophits=<#pairs> output=<OUTFILE>' contact_map.R
- -infile1         The sorted and extracted evfold file with path.
- -infile2         The pdb-contact file (output of contact_map.pl) with path.
- -tophits         number, how many of the best residue pairs should be represented in the contact map.
- -output          File-name of the map and the path, where it should be stored. File must be a PNG-File (.png).
 == Calculate structural model ==
