Lab Journal - Task 9 (PAH)
Latest revision as of 09:32, 27 August 2013
Structure selection
The resolution, the chain and the covered positions in PAH can be found on the UniProt entry P00439 itself. The R-factor of each structure can be found on the PDBsum entries, and the pH values in the methods section of the pdb entries. In the following, we show the procedure for the example pdb ID 1DMW. All other pdb IDs that are analysed in Table 1 were treated the same way.
To check whether a structure contains a gap, we first downloaded the pdb file in text format from the pdb website and then used the following unix shell commands:
grep "^ATOM" 1DMW.pdb > 1DMW.txt
cut -c 23-27 1DMW.txt | uniq > 1DMW_res.txt
Then, we have to check whether the residues are consecutive. For this, we wrote the following Python script:
<source lang=python>
# Script to check whether the residue numbers are consecutive.
# If it runs without printing anything to the command line,
# there is no gap; otherwise each gap is printed.
with open(".../1DMW_res.txt") as handle:
    data = handle.readlines()

good = []
for i in range(len(data) - 1):
    # int() ignores the surrounding whitespace and the trailing newline
    if int(data[i]) == int(data[i + 1]) - 1:
        good.append(data[i])
    else:
        print("found\t" + data[i].strip() + "\t" + data[i + 1].strip())
</source>
The coverage in per cent was calculated as shown in the example below:
- P00439 has a sequence length of 452AA
- 1DMW has 424 - 117 = 307 residues
- coverage: 307 / (452 / 100) = 67.92%
=> 1DMW has a coverage of 67.92% of the PAH (P00439) sequence!
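The same arithmetic can be written as a small Python function. This is only a sketch: the function name is hypothetical, and the first residue number 118 is an assumption derived from the "424 - 117" in the example above. It also assumes that the structure contains no gap between the first and the last residue.

```python
def coverage_percent(first_residue, last_residue, sequence_length):
    """Percentage of the full-length sequence covered by a gap-free structure."""
    n_residues = last_residue - first_residue + 1
    return 100.0 * n_residues / sequence_length


# 1DMW: residues 118 to 424 (assumed), P00439 has 452 AA
print(f"{coverage_percent(118, 424, 452):.2f}%")  # prints 67.92%
```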
SCWRL
Before generating the mutations with SCWRL, we first had to extract the sequence from the pdb file. For this, we used the repairPDB script with the following command:
/opt/SS12_Practical/scripts/repairPDB 1J8U.pdb -seq > 1J8U_seq.txt
Afterwards, we had to change the upper-case letters to lower-case ones, except for the mutations of interest. For this purpose, we wrote a Python script:
<source lang=python>
# Read the sequence and convert everything to lower-case letters
seq = ""
with open(".../1J8U_seq.txt") as data:
    for line in data:
        seq = seq + line.lower()
#print(seq)

# Find the mutation and change this letter into an upper-case one
# (s.upper() returns s in upper-case letters)
for i in range(len(seq)):
    # NOTE: 103 - 118 evaluates to -15, so this condition never matches a
    # loop index as written; the index of the mutated residue has to be
    # non-negative (position in the sequence string, counted from 0).
    if i == (103 - 118):
        print(seq[i])
        mut = "S"
        seq = seq[0:i] + mut + seq[i+1:len(seq)]

with open(".../1J8U_seq_mut.txt", "w") as out:
    out.write(seq)
</source>
Then, we can generate a mutation with SCWRL for each sequence file as shown below:
/opt/SS12-Practical/scwrl4/Scwrl4 -i 1J8U.pdb -s 1J8U_seq_mut.txt -o 1J8U_mut.pdb
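The preparation of the sequence file can also be sketched with explicit residue numbering. The function below is not the script we used; its name and parameters are hypothetical, and it assumes that the sequence string starts at residue number `first_residue` of the structure and contains no gaps.

```python
def mark_mutation(sequence, residue_number, first_residue, new_residue):
    """Lower-case the sequence and insert one upper-case mutated residue."""
    # Remove line breaks and other whitespace before indexing
    seq = "".join(sequence.split()).lower()
    index = residue_number - first_residue
    if not 0 <= index < len(seq):
        raise ValueError("residue number lies outside the sequence")
    return seq[:index] + new_residue.upper() + seq[index + 1:]


# Toy example: the sequence starts at residue 118, residue 120 becomes S
print(mark_mutation("ACDEF", 120, 118, "S"))  # prints acSef
```

Passing the first residue number explicitly avoids hard-coding an index into the loop, and the range check reports a position that does not exist in the structure instead of silently writing an unchanged sequence.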
FoldX
For FoldX, we used the files from the example shown on the FoldX webserver with the approach Multiple mutations using individual list. We did not change the run.txt file, but adapted the following files:
list.txt:
1J8U.pdb
A list of all pdb files to use for the mutation, in our case only the 1J8U pdb file.
individual_list.txt:
QA172H;
AA259V;
TA266A;
FA392S;
PA416Q;
An individual list with all mutations to use. Every line stands for one run. We wanted to do only one mutation per run, so we only had to write one mutation in each line, but it would be possible to write more than one point mutation per line, separated by commas. The first letter of every mutation stands for the amino acid in the unmutated structure, the second one for the chain (always A), the number gives the position in the structure and the last letter is the amino acid introduced by the mutation. Every line has to be finished with a semicolon.
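The line format described above can be generated with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and the file was written by hand in our case.

```python
def format_mutation(wild_type, chain, position, mutant):
    """Build one mutation string, e.g. ('F', 'A', 392, 'S') -> 'FA392S'."""
    return f"{wild_type}{chain}{position}{mutant}"


def individual_list_lines(runs):
    """One line per run: mutations joined by commas, closed by a semicolon."""
    return [",".join(format_mutation(*m) for m in run) + ";" for run in runs]


# One point mutation per run, as in our individual_list.txt
runs = [
    [("Q", "A", 172, "H")],
    [("A", "A", 259, "V")],
    [("T", "A", 266, "A")],
    [("F", "A", 392, "S")],
    [("P", "A", 416, "Q")],
]
print("\n".join(individual_list_lines(runs)))
```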
After generating all needed files, we wanted to run FoldX on the biolab computers, but the licence had expired. So, we downloaded a version of FoldX into our own directory and generated the mutation files via the following command:
/mnt/home/student/worfk/Masterpractical/Task09/foldX/FoldX.linux64 -runfile run.txt
The list.txt, individual_list.txt, run.txt and 1J8U.pdb files have to be in the same directory, as does the rotabase.txt file.
For the calculation of the energy given for the wildtype, we followed the Energy of the molecule example on the FoldX webpage.
Minimise
Before minimization, we had to remove hydrogens and waters (protein only) with the repairPDB script:
/opt/SS12-Practical/scripts/repairPDB input.pdb -noh -nohoh > output.pdb
For the wildtype, -nohoh has to be changed to -jprot to remove the included ligands as well. Then, we can minimise via the following command:
/opt/SS12-Practical/minimise/minimise input.pdb output.pdb
To minimise 5 times in a row, one has to take the output of each run as the input for the next run.
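The chaining of the runs can be sketched in Python as follows. The path of the minimise program is taken from the command above; the function names and the output file names (min_1.pdb, min_2.pdb, ...) are hypothetical and only serve as an illustration.

```python
import subprocess

MINIMISE = "/opt/SS12-Practical/minimise/minimise"  # path as used above


def chained_runs(input_pdb, n_runs, prefix="min"):
    """Return (input, output) file pairs; each output feeds the next run."""
    pairs = []
    current = input_pdb
    for run in range(1, n_runs + 1):
        output = f"{prefix}_{run}.pdb"  # hypothetical naming scheme
        pairs.append((current, output))
        current = output
    return pairs


def minimise_repeatedly(input_pdb, n_runs=5):
    """Call minimise n_runs times; stop if one run returns an error."""
    for source, target in chained_runs(input_pdb, n_runs):
        subprocess.run([MINIMISE, source, target], check=True)
```

Calling `minimise_repeatedly("output.pdb")` would then run the five minimisations one after another, with min_5.pdb as the final result.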
Unfortunately, the minimization does not work with the SCWRL outputs, but we do not know why!