Difference between revisions of "Gaucher Disease: Task 09 - Lab Journal"

From Bioinformatikpedia

Latest revision as of 19:11, 5 September 2013

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff
maria's style blue: #adceff grey: #efefef
*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>

Preparation

1. Choice of a structure to work with

In the previous tasks, we worked with the reference structure 1OGS because it has no gaps, apart from an offset of 39 residues at the N-terminus that all structures referenced in UniProt for our protein P04062 share, and because it has a good resolution of 2.0 Å. However, five other structures have an equal or better resolution (all solved by X-ray diffraction). We compared 1OGS with those five structures in terms of resolution, coverage and gaps, R-factor, R-free and the pH at which the structure was determined, in order to choose the structure best suited to this task (<xr id="structure_choice"/>).

<figtable id="structure_choice">

{| class="colBasic2"
! PDB-ID !! Resolution (Å) !! Chain !! Covered residues (UniProt seq.) !! Missing residues (ATOM seq.) !! Covered residues (ATOM seq.) !! R-value (obs.) !! R-free !! pH !! Temperature (K)
|-
| 2NT0 || 1.79 || A/B/C/D || 40-536 (92.7%) || - || 1-497 || 0.181 || 0.215 || 4.5 || 100
|-
| 3GXI || 1.84 || A/B/C/D || 40-536 (92.7%) || - || 1-497 || 0.193 || 0.231 || 5.5 || NULL
|-
| 2V3F || 1.95 || A/B || 40-536 (92.7%) || A: 29-31, (499-503), B: (-1-0), 27-32, (498-503) || A: -1-28, 32-498, B: 1-26, 33-497 || 0.154 || 0.196 || 6.5 || 100
|-
| 2V3D || 1.96 || A/B || 40-536 (92.7%) || A: 28-31, (499-503), B: (-1-0), (498-503) || A: -1-27, 32-498, B: 1-497 || 0.157 || 0.208 || 6.5 || 100
|-
| 2V3E || 2.0 || A/B || 40-536 (92.7%) || A: 31, (498-503), B: (-1), (498-503) || A: -1-30, 32-497, B: 0-497 || 0.163 || 0.220 || 7.5 || 100
|-
| 1OGS || 2.0 || A/B || 40-536 (92.7%) || - || 1-497 || 0.195 || 0.230 || 4.6 || 100
|}
Comparison of the six highest-resolution PDB structures according to further criteria.

</figtable>

In all the investigated structures, the residues numbered 1-497 are native and correspond to positions 40-536 in the UniProt sequence P04062. In some of the structures, such as 2V3E, two residues were added at the beginning (EF, positions -1 and 0) and six residues at the end (LLVDTM, positions 498-503). If these "auxiliary" residues are missing in the resolved structure, we can ignore this, which is why they are shown in brackets in the table. To be sure, we verified with a simple Python script that the PDB annotations of the missing residues are correct.
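The check described above could look roughly like the following sketch. It is not the script we actually used; the function names are made up for illustration, and it assumes standard fixed-column ATOM records and REMARK 465 lines that end in residue name, chain and residue number (insertion codes are not handled).

```python
def parse_remark_465(pdb_text):
    """Return {chain: set of residue numbers} annotated as missing in REMARK 465."""
    missing = {}
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        tokens = line.split()
        if len(tokens) < 5:
            continue
        res_name, chain, res_num = tokens[-3:]
        try:
            number = int(res_num)
        except ValueError:
            continue  # explanatory header lines of the remark block
        if len(res_name) == 3 and len(chain) == 1:
            missing.setdefault(chain, set()).add(number)
    return missing


def observed_residues(pdb_text, chain):
    """Return the residue numbers that have at least one ATOM record in the chain."""
    seen = set()
    for line in pdb_text.splitlines():
        # PDB fixed columns: chain ID in column 22, residue number in columns 23-26
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            seen.add(int(line[22:26]))
    return seen


def gaps_in_range(pdb_text, chain, first, last):
    """Return the residue numbers in [first, last] without any ATOM record."""
    expected = set(range(first, last + 1))
    return sorted(expected - observed_residues(pdb_text, chain))
```

For 2V3E, chain B, one would expect `gaps_in_range(text, "B", 1, 497)` to be empty and the REMARK 465 entries for chain B to lie outside the native range 1-497.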

The structures 2NT0, 3GXI and our familiar reference structure 1OGS have no missing residues, and the first two have the best resolutions (1.79 Å and 1.84 Å, respectively). However, the pH values at which these structures were determined are too low (4.5, 5.5 and 4.6, respectively).

The structure 2V3F seems to be a good choice, because it offers a compromise between the pH (6.5, which is close to the physiological value of 7.4) and a high resolution (1.95 Å). Moreover, it has the lowest R-value (0.154) and R-free (0.196). However, some native residues are missing (29-31 in chain A and 27-32 in chain B).

The candidate 2V3D has similar values, but it also has a gap in chain A (28-31). We could have used chain B, though.

A candidate with the same resolution as 1OGS and a neutral pH of 7.5 is 2V3E. It has only one missing residue in chain A (31), and no native residues are missing in chain B. Moreover, its R-value (0.163) and R-free (0.220) are relatively low and lower than those of 2NT0, 3GXI and 1OGS. As 2V3E is the highest-resolution structure determined at a physiological pH and its chain B has no gaps, we chose 2V3E, chain B, to work with in this task.

2. Visualization of the mutations to work with

Using PyMol, we prepared a PDB structure of 2V3E, chain B, without the auxiliary residue number 0, and visualized the residues to be mutated on this structure.

3. Creation of mutated structures

We ran SCWRL as follows:

/opt/SS12-Practical/scwrl4/Scwrl4 
-i <backbone_IN.pdb> input PDB file of the backbone (of 2V3E_B without the residue number 0, created in PyMol)
-o <model_OUT.pdb> output PDB file of the model
-s <sequence file> file with ATOM sequence of the model file in lower case with one mutated residue we want to predict in uppercase
-0 normal priority mode
> <log file> to save the output energies

We prepared the sequence files as follows. We downloaded the FASTA sequence of 2V3E from the PDB and copied the sequence of chain B, deleting the header and the newlines. Then, we deleted the first two and the last six additional residues mentioned before. The result is the sequence of the structure that was actually resolved. Next, we changed all letters to lower case (with the Vim editor). Finally, for each mutation, we wrote the new amino acid as a capital letter at the mutated position.
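The manual editing steps above can also be scripted. The following is a minimal sketch, not the procedure we actually followed (we edited the files by hand); all function names are hypothetical, and positions are assumed to be given in ATOM numbering (1-497), so a UniProt position would first have to be reduced by 39.

```python
def read_fasta_sequence(fasta_text):
    """Drop header lines (starting with '>') and join the sequence lines."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def trim_construct(sequence, n_start=2, n_end=6):
    """Remove the added residues (EF at the start, LLVDTM at the end)."""
    return sequence[n_start:len(sequence) - n_end]


def scwrl_sequence(native, position=None, new_residue=None, expected_wt=None):
    """Return the lower-case sequence, with the mutated residue in upper case.

    position is 1-based in ATOM numbering; without a position the plain
    lower-case wild-type sequence is returned.
    """
    seq = native.lower()
    if position is None:
        return seq
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError("position outside the sequence")
    if expected_wt is not None and seq[index] != expected_wt.lower():
        raise ValueError("wild-type residue does not match the sequence")
    return seq[:index] + new_residue.upper() + seq[index + 1:]
```

Passing `expected_wt` is optional but helps to catch numbering mistakes, since the function then refuses to mutate a position whose wild-type residue differs from the expected one.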

For comparison with the wild-type (WT) energy, we ran SCWRL once with the all-lower-case sequence. Then, the "Total minimal energy of the graph" of the WT was subtracted from that of each mutant.
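The subtraction can be sketched as follows. The exact layout of the SCWRL log line is an assumption here (the label followed by an optional "=" or ":" and a number); the function names are hypothetical.

```python
import re

# Assumed log line layout: "Total minimal energy of the graph = <number>"
ENERGY_PATTERN = re.compile(
    r"Total minimal energy of the graph\s*[=:]?\s*"
    r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def total_energy(log_text):
    """Return the last reported total minimal energy in a SCWRL log."""
    matches = ENERGY_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no total energy line found in the log")
    return float(matches[-1])


def energy_difference(mutant_log, wild_type_log):
    """Mutant energy minus wild-type energy, as described in the text."""
    return total_energy(mutant_log) - total_energy(wild_type_log)
```

A positive difference means that the mutant model has a higher SCWRL energy than the wild-type model.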

Energy comparisons

foldX

We used the approach "Multiple mutations using individual list" with option 1, "-manual", as described on the examples page and in the foldX manual. Version 3.0 Beta 5.1 was used.

Produced output files:

  • PDB-File: one for each mutant and corresponding wild-type
  • Raw-File: energy for each created PDB
  • Dif_BuildModel.fxout-File: difference in energy between each mutant and the corresponding wild-type (positive values = lower stability, i.e. the mutation has an effect)
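To list the destabilising mutants from the difference file, a sketch like the following could be used. The file layout is an assumption, not taken from the foldX documentation: a tab-separated table whose header row starts with "Pdb", with the model name in the first column and the total energy difference in the second. The function names are hypothetical.

```python
def read_total_ddg(fxout_text):
    """Return {model name: total energy difference} from a Dif file.

    Assumes a tab-separated table whose header row starts with 'Pdb';
    everything before that row is skipped.
    """
    ddg = {}
    in_table = False
    for line in fxout_text.splitlines():
        fields = line.split("\t")
        if fields[0].strip() == "Pdb":
            in_table = True
            continue
        if in_table and len(fields) >= 2 and fields[0].strip():
            ddg[fields[0].strip()] = float(fields[1])
    return ddg


def destabilising(ddg):
    """Names of models with a positive difference (lower stability)."""
    return sorted(name for name, value in ddg.items() if value > 0)
```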

Minimise

Before running minimise, the hydrogens and waters (and hetero atoms, so that only protein atoms remain) were removed using repairPDB with the following command:

/opt/SS12-Practical/scripts/repairPDB <model.pdb> -noh -jprot > <model_repairPDB.pdb>

For the SCWRL-mutated models, minimise did not work after this step. Deleting the hydrogens with Open Babel solved the problem:

obabel -ipdb <model_repairPDB.pdb> -opdb -O <model_obabel.pdb> -d

Minimise can be executed with the following command:

/opt/SS12-Practical/minimise/minimise <input.pdb> <output.pdb> > <log file>

We documented the whole process and all commands needed to produce the minimise runs in run_minimise.sh.
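For illustration, the three preparation steps can be chained as in the sketch below. This is not the content of run_minimise.sh; the function names and the output file naming scheme are hypothetical, and the Open Babel step is written with the `-O` output-file option instead of a shell redirect.

```python
import subprocess
from pathlib import Path

REPAIR_PDB = "/opt/SS12-Practical/scripts/repairPDB"
MINIMISE = "/opt/SS12-Practical/minimise/minimise"


def minimise_steps(model_path):
    """Return (argv, stdout file or None) for each step of the pipeline."""
    model = Path(model_path)
    repaired = str(model.with_name(model.stem + "_repairPDB.pdb"))
    no_h = str(model.with_name(model.stem + "_obabel.pdb"))
    minimised = str(model.with_name(model.stem + "_minimised.pdb"))
    log = str(model.with_name(model.stem + "_minimise.log"))
    return [
        # repairPDB writes the cleaned structure to standard output
        ([REPAIR_PDB, str(model), "-noh", "-jprot"], repaired),
        # -d deletes hydrogens; -O names the output file
        (["obabel", "-ipdb", repaired, "-opdb", "-O", no_h, "-d"], None),
        # minimise prints its energies to standard output
        ([MINIMISE, no_h, minimised], log),
    ]


def run_steps(steps):
    """Run each step in order, redirecting standard output where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling `run_steps(minimise_steps("mutant.pdb"))` would then run repairPDB, Open Babel and minimise in order, provided the tools are installed at the paths given above.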

Sources

PDB: R-value and R-free
SCWRL: A. A. Canutescu, A. A. Shelenkov, and R. L. Dunbrack, Jr. A graph theory algorithm for protein side-chain prediction. Protein Science 12, 2001-2014 (2003).
foldX:

  1. Francois Stricher, Tom Lenaerts, Joost Schymkowitz, Frederic Rousseau and Luis Serrano (2008). FoldX 3.0. “In preparation”
  2. Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. (2005). The FoldX web server: an online force field. Nucleic Acids Research, vol 33, p W382-8.
  3. Schymkowitz J. W., Rousseau F., Martins I. C., Ferkinghoff-Borg J., Stricher F., Serrano L. (2005). Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc Natl Acad Sci USA, vol 102, p 10147-52.
  4. Guerois R., Nielsen J. E., Serrano L. (2002). Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, vol 320, p 369-87.