Difference between revisions of "Lab Journal - Task 9 (PAH)"

From Bioinformatikpedia

Revision as of 09:29, 27 August 2013

Structure selection

The resolution, the chain and the covered positions in PAH can be found in the UniProt entry P00439 (http://www.uniprot.org/uniprot/P00439) itself. The R-factors of the structures can be found in the PDBsum entries, and the pH values in the methods section of the PDB entries (e.g. http://www.rcsb.org/pdb/explore/materialsAndMethods.do?structureId=1DMW). In the following, we show the procedure for the example PDB ID 1DMW. All other PDB IDs analysed in Table 1 were treated the same way.
To check whether a structure contains a gap, we first downloaded the PDB file in text format from the PDB website and then used the following Unix shell commands:

grep "^ATOM" 1DMW.pdb > 1DMW.txt
cut -c 23-27 1DMW.txt | uniq > 1DMW_res.txt
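
As a rough sketch, the same extraction could also be done in Python. It assumes the fixed-column PDB format, in which the residue sequence number of an ATOM record occupies columns 23-26. Unlike `cut -c 23-27`, it ignores the insertion code in column 27. The function name `residue_numbers` is only illustrative.

```python
def residue_numbers(pdb_lines):
    """Collect the residue numbers of all ATOM records.

    Adjacent repeats are collapsed, like `uniq` does in the shell version.
    """
    numbers = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # columns 23-26 (0-based slice 22:26) hold the residue sequence number
        number = int(line[22:26])
        if not numbers or numbers[-1] != number:
            numbers.append(number)
    return numbers
```

With a downloaded file, one could call it as `residue_numbers(open("1DMW.pdb"))`.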

Then we have to check whether the residue numbers are consecutive. For this, we wrote the following Python script:
<source lang=python>
# Script to check whether the residue numbers are consecutive.
# If it runs without printing anything to the command line,
# there is no gap; otherwise each gap is printed.

with open(".../1DMW_res.txt") as handle:
    # strip whitespace and skip empty lines
    data = [line.strip() for line in handle if line.strip()]

good = []
for i in range(len(data) - 1):
    if int(data[i]) == int(data[i + 1]) - 1:
        good.append(data[i])
    else:
        print("found\t" + data[i] + "\t" + data[i + 1])
</source>
The coverage in per cent was calculated as shown in the example below:

  • P00439 has a sequence length of 452 aa
  • 1DMW has 424 - 117 = 307 residues
  • coverage: 307 / (452 / 100) = 67.92%

=> 1DMW covers 67.92% of the PAH (P00439) sequence.
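
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation. The function name `coverage_percent` is an illustrative choice, and the numbers are the ones from the 1DMW example.

```python
def coverage_percent(covered_residues, sequence_length):
    """Share of the full-length sequence covered by the structure, in per cent."""
    return 100.0 * covered_residues / sequence_length


covered = 424 - 117  # residues in 1DMW, as in the example above
print("{:.2f}%".format(coverage_percent(covered, 452)))  # prints 67.92%
```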

SCWRL

Before generating the mutations with SCWRL, we first had to extract the sequence from the PDB file. For this, we used the repairPDB script with the following command:

/opt/SS12-Practical/scripts/repairPDB 1J8U.pdb -seq > 1J8U_seq.txt

Afterwards, we had to convert the upper-case letters to lower case, except at the positions to be mutated. For this purpose, we wrote a Python script:
<source lang=python>
data = open(".../1J8U_seq.txt")
seq = ""
for line in data:
    # convert everything to lower case letters
    seq = seq + line.lower()
data.close()

# print(seq)
# find the mutation and change this letter into an upper case one
# s.upper() <-- for upper case letters

for i in range(len(seq)):
    mut = ""
    # 0-based index of the residue to mutate: its residue number minus the
    # number of the first residue in the structure (must not be negative)
    if i == (118 - 103):
        print(seq[i])
        mut = "S"
        seq = seq[0:i] + mut + seq[i+1:len(seq)]

out = open(".../1J8U_seq_mut.txt", "w")
out.write(seq)
out.close()
</source>
Then we can generate a mutation with SCWRL for each sequence file as shown below:

/opt/SS12-Practical/scwrl4/Scwrl4 -i 1J8U.pdb -s 1J8U_seq_mut.txt -o 1J8U_mut.pdb
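
The preparation of the sequence file described above can be wrapped into a small helper so that it can be reused for every mutation. This is a minimal sketch, not the script we actually ran. The function name `scwrl_sequence` and its parameters are hypothetical. It assumes that line breaks in the sequence file may be dropped and that `first_residue` is the number of the first residue present in the structure.

```python
def scwrl_sequence(sequence, residue_number, new_residue, first_residue):
    """Lower-case the whole sequence and put the mutated residue in upper case.

    residue_number and first_residue use the numbering of the structure.
    """
    # assumption: whitespace and line breaks are not part of the sequence
    lowered = "".join(sequence.split()).lower()
    index = residue_number - first_residue  # 0-based position in the string
    if not 0 <= index < len(lowered):
        raise ValueError("residue number lies outside the structure")
    return lowered[:index] + new_residue.upper() + lowered[index + 1:]
```

For example, `scwrl_sequence("ACDEF", 105, "S", 103)` returns `"acSef"`.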

foldX

For foldX, we used the files from the example on the foldX web server for the approach "Multiple mutations using individual list". We did not change the run.txt file, but adapted the following files:

list.txt:

1J8U.pdb

A list of all pdb files to use for the mutation, in our case only the 1J8U pdb file.

individual_list.txt:

QA172H;
AA259V;
TA266A;
FA392S;
PA416Q;

An individual list with all mutations to use. Every line stands for one run. We wanted only one mutation per run, so we wrote a single mutation in each line, but it would be possible to write more than one point mutation per line, separated by commas. The first letter of every mutation stands for the amino acid in the unmutated structure, the second one for the chain (always A), the number gives the position in the structure and the last letter is the amino acid after the mutation. Every line has to end with a semicolon.
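
The line format described above can be generated with a few lines of Python instead of being typed by hand. This is only a sketch. The helper names `foldx_mutation` and `individual_list` are illustrative, and the mutation list is the one shown above.

```python
def foldx_mutation(wild_type, chain, position, mutant):
    """One entry: wildtype residue, chain, position, mutant residue, semicolon."""
    return "{}{}{}{};".format(wild_type, chain, position, mutant)


def individual_list(mutations, chain="A"):
    """Build the file content with one point mutation (one run) per line."""
    lines = [foldx_mutation(wt, chain, pos, mut) for wt, pos, mut in mutations]
    return "\n".join(lines) + "\n"


MUTATIONS = [("Q", 172, "H"), ("A", 259, "V"), ("T", 266, "A"),
             ("F", 392, "S"), ("P", 416, "Q")]

print(individual_list(MUTATIONS), end="")
```

The returned string can then be written to individual_list.txt.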

After generating all the files needed, we wanted to run foldX on the biolab computers, but the licence had expired. So we downloaded a version of foldX into our own directory and generated the mutation files with the following command:

/mnt/home/student/worfk/Masterpractical/Task09/foldX/FoldX.linux64 -runfile run.txt

The list.txt, individual_list.txt, run.txt and 1J8U.pdb files have to be in the same directory, together with the rotabase.txt file.

To calculate the energy of the wildtype, we followed the "Energy of the molecule" example on the FoldX webpage.

minimise

Before minimization, we had to remove hydrogens and waters (protein only) with the repairPDB script:

/opt/SS12-Practical/scripts/repairPDB input.pdb -noh -nohoh > output.pdb

For the wildtype, -nohoh has to be changed to -jprot to remove the included ligands as well. Then we can minimise with the following command:

/opt/SS12-Practical/minimise/minimise input.pdb output.pdb

To minimise five times, the output of each run has to be used as the input for the next run.
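
The chaining of runs can be sketched as follows. The helper only builds the command lines and does not run anything itself. The function name `chained_minimise_commands` and the output naming scheme (`min_1.pdb`, `min_2.pdb`, ...) are assumptions made for illustration.

```python
def chained_minimise_commands(minimise_bin, start_pdb, rounds=5, stem="min"):
    """Build one command per round; each round reads the previous output."""
    commands = []
    current = start_pdb
    for round_number in range(1, rounds + 1):
        # hypothetical naming scheme for the intermediate files
        output = "{}_{}.pdb".format(stem, round_number)
        commands.append([minimise_bin, current, output])
        current = output  # the output becomes the input of the next round
    return commands


for command in chained_minimise_commands(
        "/opt/SS12-Practical/minimise/minimise", "input.pdb"):
    print(" ".join(command))
```

Each command list could then be executed in order, for example with `subprocess.run(command, check=True)`.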

For reasons we could not determine, the minimisation does not work with the SCWRL outputs.