Difference between revisions of "Molecular Dynamics Simulations Gaucher Disease"

From Bioinformatikpedia
(Created page with "In this task, we carried out molecular dynamics simulations on the LRZ supercomputer which were analysed in the subsequent task. Simulating the the motions of proteins under diff…")
 
(Validation)
 
(6 intermediate revisions by the same user not shown)
Line 2: Line 2:
   
 
== Input structures ==
 
== Input structures ==
We applied molecular dynamics simulations for three structures of glycosylceramidase: the wildtype 2nt0_A which served as reference and the two mutants W209R and L470P which were generated by FoldX. '''W209P''' is a disease-causing mutation which might severely change the protein structure since the site is part of a alpha-helix which is broken by the insertion of proline. '''L470P''' is not listed in the HGMD and is therefore not disease-causing. However, both the [[Sequence-based mutation analysis Gaucher Disease|sequence-based mutation analysis]] and the [[Structure-based mutation analysis Gaucher Disease|structure-based mutation analysis]] suggested that this mutation might be deleterious. By simulating structural changes caused by this mutations, we hoped to get a better about its impact on the phenotype.
+
We ran molecular dynamics simulations for three structures of glucosylceramidase: the wildtype 2nt0_A, which served as reference, and the two mutants W209R and L470P, which were generated by FoldX. '''W209R''' is a disease-causing mutation which might severely change the protein structure, since the site is part of an alpha-helix and the positively charged arginine is likely to interfere with the hydrophobic environment. '''L470P''' is not listed in the HGMD and is therefore not known to be disease-causing. However, both the [[Sequence-based mutation analysis Gaucher Disease|sequence-based mutation analysis]] and the [[Structure-based mutation analysis Gaucher Disease|structure-based mutation analysis]] suggested that this mutation might be deleterious. By simulating the structural changes caused by these mutations, we hope to gain a better understanding of their impact on the phenotype.
   
 
== Preparation ==
 
== Preparation ==
Line 15: Line 15:
 
# We submitted the scripts via <tt>sbatch SCRIPT-FILE</tt> on host <tt>lxa1</tt> which runs the [http://www.lrz.de/services/compute/linux-cluster/batch_parallel/ SLURM scheduler]
 
# We submitted the scripts via <tt>sbatch SCRIPT-FILE</tt> on host <tt>lxa1</tt> which runs the [http://www.lrz.de/services/compute/linux-cluster/batch_parallel/ SLURM scheduler]
   
AGroS is a script for automizing molecular dynamics simulations via GROMACS and performs following steps:
+
Each job performs a molecular dynamics simulation via AGroS.
# Repairing breaks, missing atoms of the input structure.
 
# Calling SCWRL for optimizing side chains.
 
# Adding water, 0.1 NaCl, and neutralization protein intrinsic charge.
 
# Minimization of the solvent.
 
# Short NVT (constant moles, volume, temperature) simulation.
 
# Short NVP ( constant moles, pressure, temperature) simulation.
 
# Long MD simulation.
 
   
  +
== The AGroS molecular dynamics simulation pipeline ==
We called AgroS with the default parameters which resulted in 50 ps long NVT and NVP simulation and a 10 ns long MD simulation.
 
  +
AGroS is a program that controls the three major steps of a molecular dynamics simulation via GROMACS:
  +
# Preparation
  +
#* Cleaning the PDB
  +
#* Adding missing side-chains
  +
#* Adding the solvent: H2O + NaCl
  +
# Equilibration
  +
#* Minimizing the potential energy by varying
  +
#** the solvent
  +
#** the solvent and side-chains
  +
#* Short NVT run (50 ps)
  +
#* Short NPT run (50 ps)
  +
# Production MD simulation
  +
#* The main MD simulation (10 ns)
  +
  +
In the preparation steps, the structure is modified such that it is accepted by GROMACS. In the equilibration stage, the potential energy is minimized by slightly changing the positions of solvent molecules and side-chains. The short NVT and NPT runs bring the system into a state which meets certain input conditions, e.g. the starting temperature. In essence, equilibrating the system should prevent strong forces from acting on particles at the beginning of the simulation. The equilibrated system is then used to carry out the main simulation.
  +
  +
== Intermediate PDB files ==
  +
AGroS creates several intermediate PDB files in the following order:
  +
{|width="1000px"
  +
| 1. || NAME.pdb || The input structure
  +
|-
  +
| 2. || NAME_repair.pdb || NAME.pdb with common PDB problems repaired by repairPDB
  +
|-
  +
| 3. || NAME_br.pdb || Just the protein in NAME.pdb
  +
|-
  +
| 4. || NAME_water.pdb || Just the water in NAME.pdb
  +
|-
  +
| 5. || NAME_dna.pdb || Just the DNA in NAME.pdb
  +
|-
  +
| 6. || NAME_sc.pdb || NAME.pdb with side-chains optimized by SCWRL
  +
|-
  +
| 7. || NAME_nh.pdb || NAME_sc.pdb without hydrogen atoms with the water as well as DNA molecules of NAME.pdb included
  +
|-
  +
| 8. || NAME_solv.pdb || NAME_nh.pdb with the solvent H2O and 0.1 NaCl included
  +
|-
  +
| 9. || NAME_solv_min.pdb || NAME_solv.pdb with the potential energy of the solvent minimized
  +
|-
  +
| 10. || NAME_solv_min2.pdb || NAME_solv_min.pdb with the potential energy of the solvent and side-chains minimized
  +
|-
  +
| 11. || NAME_solv_min3.pdb || NAME_solv_min2.pdb with the potential energy of the solvent and side-chains minimized again
  +
|}
  +
  +
Our input structure 2nt0 was not modified by repairPDB, so 2nt0.pdb and 2nt0_repair.pdb were identical. Since 2nt0 contained neither water nor DNA, 2nt0_water.pdb and 2nt0_dna.pdb were empty. The side-chain optimization by SCWRL led to slightly different secondary structure assignments (cf. <xr id="fig:pdb_repair_sc"/>). <xr id="fig:pdb_solv"/> depicts 2nt0_solv.pdb and the added Na+ and Cl- ions without water. The minimization of the solvent changed the positions of the Na+ and Cl- ions but not the protein (cf. <xr id="fig:pdb_solv_min"/>). The subsequent second minimization of the solvent and the side-chains did not alter the positions of the Na+ and Cl- ions any more. However, minimal deviations between the side-chains of 2nt0_solv_min.pdb and 2nt0_solv_min2.pdb could be observed (cf. <xr id="fig:pdb_min_min2"/>). <xr id="fig:pdb_min2_min3"/> shows that the third minimization run marginally changed some side-chains.
  +
  +
{|
  +
|<figure id="fig:pdb_repair_sc">[[File:pdb_repair_sc.png|thumb|150px|<caption>2nt0_repair.pdb (green) vs. 2nt0_sc.pdb (blue).</caption>]]</figure>
  +
|<figure id="fig:pdb_solv">[[File:pdb_solv.png|thumb|150px|<caption>2nt0_solv.pdb (yellow) along with Na+ (blue) and Cl- (cyan).</caption>]]</figure>
  +
|<figure id="fig:pdb_solv_min">[[File:pdb_solv_min.png|thumb|150px|<caption>2nt0_solv.pdb (yellow) vs. 2nt0_solv_min.pdb (orange).</caption>]]</figure>
  +
|<figure id="fig:pdb_min_min2">[[File:pdb_min_min2.png|thumb|150px|<caption>2nt0_solv_min.pdb (orange) vs. 2nt0_solv_min2.pdb (red).</caption>]]</figure>
  +
|<figure id="fig:pdb_min2_min3">[[File:pdb_min2_min3.png|thumb|150px|<caption>2nt0_solv_min2.pdb (red) vs. 2nt0_solv_min3.pdb (purple).</caption>]]</figure>
  +
|}
  +
  +
== Runtime ==
  +
The jobs were queued for about 48h. The actual runtime is listed in <xr id="tab:runtime"/>.
  +
<figtable id="tab:runtime">
  +
{|width="200px"
  +
| '''Wildtype''' || 16h 51min
  +
|-
  +
| '''L470P''' || 9h 36min
  +
|-
  +
| '''W209R''' || 9h 35min
  +
|}
  +
<caption>Runtime of the MD simulations.</caption>
  +
</figtable>
  +
  +
== Validation ==
  +
We performed a coarse validation by inspecting the output files and calling the tool <tt>gmxcheck -f 2nt0_A_md.xtc</tt>. According to the output, all simulations terminated successfully and each produced 2001 frames.

Latest revision as of 08:14, 3 July 2012

In this task, we carried out molecular dynamics simulations on the LRZ supercomputer, which were analysed in the subsequent task. Simulating the motions of proteins under different conditions can give insights into their functions and is used to study diseases which are caused by misfolded proteins. Here, we employed molecular dynamics simulations to investigate the impact of two point mutations in glucosylceramidase.

Input structures

We ran molecular dynamics simulations for three structures of glucosylceramidase: the wildtype 2nt0_A, which served as reference, and the two mutants W209R and L470P, which were generated by FoldX. W209R is a disease-causing mutation which might severely change the protein structure, since the site is part of an alpha-helix and the positively charged arginine is likely to interfere with the hydrophobic environment. L470P is not listed in the HGMD and is therefore not known to be disease-causing. However, both the sequence-based mutation analysis and the structure-based mutation analysis suggested that this mutation might be deleterious. By simulating the structural changes caused by these mutations, we hope to gain a better understanding of their impact on the phenotype.

Preparation

The input structures and output files are stored in a repository which can be checked out by:

git clone /mnt/home/student/angermue/mp/tasks/task08

We followed this description for preparing the molecular dynamics simulation:

  1. We checked out AGroS from https://github.com/offmarc/AGroS.
  2. We downloaded Scwrl4 from http://dunbrack.fccc.edu/scwrl4 and installed it into the same directory as AGroS.
  3. We prepared the job scripts.
  4. We submitted the scripts via sbatch SCRIPT-FILE on host lxa1 which runs the SLURM scheduler

Each job performs a molecular dynamics simulation via AGroS.
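The job scripts themselves are not reproduced here. As a rough sketch only, one script per structure could be generated along the following lines. The wrapper name run_agros.sh, the wall-time value and the output file pattern are illustrative assumptions and not taken from our setup; only the #SBATCH options shown (--job-name, --output, --time) are standard sbatch options.

```python
def render_job_script(job_name: str, command: str, walltime: str = "24:00:00") -> str:
    """Return the text of a minimal SLURM batch script.

    `command` stands in for the actual AGroS invocation, which is not
    given in this write-up; `walltime` is an arbitrary example value.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={job_name}.%j.out",  # %j expands to the SLURM job ID
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# One script per simulated structure. "./run_agros.sh" is a hypothetical
# wrapper around the AGroS call, used here only as a placeholder.
STRUCTURES = ["2nt0_A", "W209R", "L470P"]
scripts = {
    name: render_job_script(name, f"./run_agros.sh {name}.pdb")
    for name in STRUCTURES
}
```

Each generated text would be written to a file and then passed to sbatch as described in step 4.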

The AGroS molecular dynamics simulation pipeline

AGroS is a program that controls the three major steps of a molecular dynamics simulation via GROMACS:

  1. Preparation
    • Cleaning the PDB
    • Adding missing side-chains
    • Adding the solvent: H2O + NaCl
  2. Equilibration
    • Minimizing the potential energy by varying
      • the solvent
      • the solvent and side-chains
    • Short NVT run (50 ps)
    • Short NPT run (50 ps)
  3. Production MD simulation
    • The main MD simulation (10 ns)

In the preparation steps, the structure is modified such that it is accepted by GROMACS. In the equilibration stage, the potential energy is minimized by slightly changing the positions of solvent molecules and side-chains. The short NVT and NPT runs bring the system into a state which meets certain input conditions, e.g. the starting temperature. In essence, equilibrating the system should prevent strong forces from acting on particles at the beginning of the simulation. The equilibrated system is then used to carry out the main simulation.
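To give a feeling for the size of the three runs, the following sketch converts the run lengths from the list above into integration steps. The time step of 2 fs is an assumption (a common choice in GROMACS setups); the value actually used by AGroS is not stated here.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def n_steps(duration_fs: int, timestep_fs: int = 2) -> int:
    """Number of integration steps needed to cover `duration_fs`.

    The default time step of 2 fs is an assumed value, not one
    confirmed for AGroS.
    """
    if duration_fs % timestep_fs != 0:
        raise ValueError("duration is not a multiple of the time step")
    return duration_fs // timestep_fs


# Run lengths as given in the pipeline description above.
phases_fs = {
    "NVT": 50 * FS_PER_PS,
    "NPT": 50 * FS_PER_PS,
    "production": 10 * FS_PER_NS,
}
steps = {phase: n_steps(duration) for phase, duration in phases_fs.items()}
```

Working in integer femtoseconds avoids rounding problems that can occur when dividing floating-point picosecond values.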

Intermediate PDB files

AGroS creates several intermediate PDB files in the following order:

1. NAME.pdb: The input structure
2. NAME_repair.pdb: NAME.pdb with common PDB problems repaired by repairPDB
3. NAME_br.pdb: Just the protein in NAME.pdb
4. NAME_water.pdb: Just the water in NAME.pdb
5. NAME_dna.pdb: Just the DNA in NAME.pdb
6. NAME_sc.pdb: NAME.pdb with side-chains optimized by SCWRL
7. NAME_nh.pdb: NAME_sc.pdb without hydrogen atoms, with the water and DNA molecules of NAME.pdb added back
8. NAME_solv.pdb: NAME_nh.pdb with the solvent H2O and 0.1 NaCl included
9. NAME_solv_min.pdb: NAME_solv.pdb with the potential energy of the solvent minimized
10. NAME_solv_min2.pdb: NAME_solv_min.pdb with the potential energy of the solvent and side-chains minimized
11. NAME_solv_min3.pdb: NAME_solv_min2.pdb with the potential energy of the solvent and side-chains minimized again
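The naming scheme above can be used to check whether a run produced all intermediate structures. The following is a small helper sketch, not part of AGroS; the function names are our own, and it only checks that each file exists, not that it is non-empty or valid.

```python
from pathlib import Path

# Suffixes in the order in which AGroS creates the files (see list above).
SUFFIXES = [
    "", "_repair", "_br", "_water", "_dna", "_sc", "_nh",
    "_solv", "_solv_min", "_solv_min2", "_solv_min3",
]


def intermediate_files(name: str) -> list[str]:
    """Expected intermediate PDB file names for the input NAME.pdb."""
    return [f"{name}{suffix}.pdb" for suffix in SUFFIXES]


def missing_files(name: str, directory: str = ".") -> list[str]:
    """Expected files that are not present in `directory`."""
    base = Path(directory)
    return [f for f in intermediate_files(name) if not (base / f).is_file()]
```

For example, missing_files("2nt0", "path/to/run") returns an empty list if all eleven files are present in that (hypothetical) directory.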

Our input structure 2nt0 was not modified by repairPDB, so 2nt0.pdb and 2nt0_repair.pdb were identical. Since 2nt0 contained neither water nor DNA, 2nt0_water.pdb and 2nt0_dna.pdb were empty. The side-chain optimization by SCWRL led to slightly different secondary structure assignments (cf. <xr id="fig:pdb_repair_sc"/>). <xr id="fig:pdb_solv"/> depicts 2nt0_solv.pdb and the added Na+ and Cl- ions without water. The minimization of the solvent changed the positions of the Na+ and Cl- ions but not the protein (cf. <xr id="fig:pdb_solv_min"/>). The subsequent second minimization of the solvent and the side-chains did not alter the positions of the Na+ and Cl- ions any more. However, minimal deviations between the side-chains of 2nt0_solv_min.pdb and 2nt0_solv_min2.pdb could be observed (cf. <xr id="fig:pdb_min_min2"/>). <xr id="fig:pdb_min2_min3"/> shows that the third minimization run marginally changed some side-chains.

<figure id="fig:pdb_repair_sc">
2nt0_repair.pdb (green) vs. 2nt0_sc.pdb (blue).
</figure>
<figure id="fig:pdb_solv">
2nt0_solv.pdb (yellow) along with Na+ (blue) and Cl- (cyan).
</figure>
<figure id="fig:pdb_solv_min">
2nt0_solv.pdb (yellow) vs. 2nt0_solv_min.pdb (orange).
</figure>
<figure id="fig:pdb_min_min2">
2nt0_solv_min.pdb (orange) vs. 2nt0_solv_min2.pdb (red).
</figure>
<figure id="fig:pdb_min2_min3">
2nt0_solv_min2.pdb (red) vs. 2nt0_solv_min3.pdb (purple).
</figure>

Runtime

The jobs were queued for about 48 h. The actual runtime is listed in <xr id="tab:runtime"/>.
<figtable id="tab:runtime">

Wildtype: 16 h 51 min
L470P: 9 h 36 min
W209R: 9 h 35 min

Runtime of the MD simulations.
</figtable>
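The runtimes can be turned into a rough throughput figure. The sketch below assumes that each listed runtime is dominated by the 10 ns production run; since preparation and equilibration are presumably included in the job time, the resulting ns/day values should be read as lower bounds.

```python
# Runtimes from the table above as (hours, minutes).
RUNTIMES = {
    "Wildtype": (16, 51),
    "L470P": (9, 36),
    "W209R": (9, 35),
}
PRODUCTION_NS = 10          # length of the production run
MINUTES_PER_DAY = 24 * 60


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an (hours, minutes) pair into total minutes."""
    return hours * 60 + minutes


def ns_per_day(hours: int, minutes: int) -> float:
    """Simulated nanoseconds per day of wall-clock time (lower bound)."""
    return PRODUCTION_NS * MINUTES_PER_DAY / to_minutes(hours, minutes)


for name, (h, m) in RUNTIMES.items():
    print(f"{name}: {to_minutes(h, m)} min, about {ns_per_day(h, m):.1f} ns/day")
```

The wildtype job took noticeably longer than the two mutant jobs although all three simulate the same amount of time; the numbers alone do not tell us why.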

Validation

We performed a coarse validation by inspecting the output files and calling the tool gmxcheck -f 2nt0_A_md.xtc. According to the output, all simulations terminated successfully and each produced 2001 frames.
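The frame count can be cross-checked against the simulation length. The following sketch assumes that the first frame is written at t = 0 and that frames are written at a constant interval; under these assumptions, 2001 frames over 10 ns correspond to one frame every 5 ps. The interval is inferred from these two numbers and was not read from the AGroS configuration.

```python
def frame_interval_ps(total_ps: int, n_frames: int) -> float:
    """Time between two frames, assuming a frame at t = 0 and a constant interval."""
    return total_ps / (n_frames - 1)


def expected_frames(total_ps: int, interval_ps: int) -> int:
    """Number of frames for a run of `total_ps`, including the frame at t = 0."""
    return total_ps // interval_ps + 1


TOTAL_PS = 10_000  # 10 ns production run
interval = frame_interval_ps(TOTAL_PS, 2001)
frames = expected_frames(TOTAL_PS, 5)
```

A trajectory with fewer frames than expected would point to a run that stopped early.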