Task 8 - Molecular Dynamics Simulations 2011

From Bioinformatikpedia
Revision as of 15:18, 4 July 2011 by Offman (talk | contribs) (Created page with "== Intro == In this section we will simulate the wildtype protein and two interesting mutants with MD e.g. the gromacs package. In the following we supply you with the pipeline …")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Intro

In this section we will simulate the wildtype protein and two interesting mutants with MD e.g. the gromacs package. In the following we supply you with the pipeline to simulate the proteins. As this will take a while, we will post the analysis part at a later point.

A good tutorial can be found here: http://md.chem.rug.nl/~mdcourse/index.html#introduction[1]

Preparation

First we have to prepair our PDB structure. For this we will make use of repairPDB and SCWRL. Select the structure from the PDB or use your model where the missing piece has been added. A new version of repairPDB can be found in the Dropbox dir.

  • use repairPDB to extract crystal water below 15 Å and save into a temporary file. <-ssw 15>
  • use repairPDB and extract just protein. <nosol>
  • now use SCWRL with the extracted protein and give it the PDB sequence in all small letter to complete missing sidechains
  • remove the hydrogens from the scwrl output with repairPDB
  • concatenate the protein and the water into one file

Create important GROMACS files

Next we start to prepare the files for GROMACS:

  • we have to create a TOP and a PAR file with the following command:

pdb2gmx -f <input.pdb> -o <output.gro> -p <output.top> -water tip3p -ff amber03 -vsite hydrogens

As a model for water we choose TIP3P, as a forcefield we use AMBER03. The water is changed to virtualsites to speed up the simulation. Don't forget the ending of the input file needs to be .pdb .

  • Next we prepare to create a box around the protein and fill it with water:

The following commands is used to add the box: editconf -f <file.gro> -o <file_box.gro> -bt dodecahedron -d 1.0

The -bt flag stands for boxtype - dodecahedron is a special geometrical form. -d stands for the minimum distance of the molecules to the box for any surface residues in nanometers.

genbox -cp <file_box.gro> -cs spc216.gro -p <file.top> -o <file_water.gro>

cs tells the program which parameters/topology to use for the water - in this case the ones that are compatible with tip3p

  • In the next step we have to add the ions. To do so we need to used the grompp command, which is always used to prepare a system for more complicated steps. This command needs an mdp file where we can define several additional parameters.

The file will include the following info and should have the ending .mdp:

integrator = steep emtol = 1.0 nsteps = 500 nstenergy = 1 energygrps = System coulombtype = PME rcoulomb = 0.9 rvdw = 0.9 rlist = 0.9 fourierspacing = 0.12 pme_order = 4 ewald_rtol = 1e-5

For details of the settings please check the gromacs manual found in the dropbox literature folder. We call grommp the following way: **grompp -v -f <file.mdp> -c <file_water.gro> -p <file.top> -o <file_water.tpr>** The tpr file is what we need in the next step where we use the command genion. genion will add the solvent to the system and neutralize and charge that is part of the system because of charged amino acids.

    • genion -s <file_water.tpr> -o <file_solv.pdb> -conc 0.1 -neutral -pname NA+ -nname CL-**

As the selection we will use number **13**, which means that we replace the water molecules with the ions. -conc defines the concentration of the ions which is 0.1 molar. -neutral tells the command to neutralize the charges. -pname is the name of the positively charged ions to add, -nname the negatively charged.

Now we will have to do some manual work. From the output of the last command you should note how many CL- and NA+ ions have been added and how many solvent molecules are now given. This need to be changed in the **top file**; copy the original and edit the copy. At the end of the file there are entries that begin with SOL. From the larger one you subtract the number of ions added. Then add the entry NA and the number of added NA+ ions and CL and the number of added CL- ions.

To make sure our initial crystal water does not overlap with the added water we use repairPDB again.

    • repairPDB <file_solv.pdb> -cleansol > <file_solv2.pdb>**

In the last line there is a REMARK tag with the number of removed water molecules. This has to be changed in the TOP file if larger than 0.

Before we are able to start with the minimization procedures we need to extract **restraints** from the structures. These restraints are useful to reduce the simulation time by disallowing very fast vibrations such as seen for hydrogens. The command we use is called **genrestr**.

    • genrestr -f <file_solv2.pdb> -o <file.itp>**

In the command line menu we choose number 1 for protein.

Minimization solvent

Now we can start our first minimization - minimizing the solvent only and restraining the protein. To prepare the system we once again need to use **grompp** and therefore create a **mdp file**.

The MDP file looks like this:

  • define = -DPOSRES

integrator = steep emtol = 1.0 nsteps = 500 nstenergy = 1 energygrps = System coulombtype = PME rcoulomb = 0.9 rvdw = 0.9 rlist = 0.9 fourierspacing = 0.12 pme_order = 4 ewald_rtol = 1e-5 pbc = xyz*

We call **grompp** the following way:

    • grompp -v -f <file.mdp> -c <file_solv2.pdb> -p <file.top> -o <file_solv_min.tpr>**

Now we do our first run with the simulation command **mdrun**.

If you have multiple processors you do the following:

    • mpirun -np <number of processors to use> mdrun_mpi -v -deffnm <file_solv_min> -c <file_solv_min.pdb>**

If you only run it on one processor it looks like this:

    • mdrun -v -deffnm <file_solv_min> -c <file_solv_min.pdb>**

Minimization system

Now we have a file where the solvent has been minimized. Next we need to minimize the solvent and the protein sidechains. We have to recreate the position restraints files. Choose option 4 which only restraints the backbone of the protein.

    • genrestr -f <file_solv_min.pdb> -o <file.itp>**

The MDP file from the previous minimization can be reused; therefore, we only have to use grompp to prepare our tpr files.

    • grompp -v -f <file.mdp> -c <file_solv_min.pdb> -p <file.top> -o <file_solv_min2.tpr>**

If you have multiple processors you do the following:

    • mpirun -np <number of processors to use> mdrun_mpi -v -deffnm <file_solv_min2> -c <file_solv_min2.pdb>**

If you only run it on one processor it looks like this:

    • mdrun -v -deffnm <file_solv_min2> -c <file_solv_min2.pdb>**


Equilibration of system

Now that our system is minimized we do have to start equilibrating it. The following procedures will take a while but should still be calculated locally and not using the LRZ account.

In the next step we will apply temperature coupling. In this step we will slowly heat up our system to the wished temperature. In this simulation we have a constant particle number, volume and finally temperature.

You will have to create a new mdp file called **nvt.mdp**

  • define = -DPOSRES

integrator = md dt = 0.005 nsteps = 10000 nstxout = 0 nstvout = 0 nstfout = 0 nstlog = 1000 nstxtcout = 0 nstenergy = 5 energygrps = Protein Non-Protein nstcalcenergy = 5 nstlist = 10 ns-type = Grid pbc = xyz rlist = 0.9 coulombtype = PME rcoulomb = 0.9 rvdw = 0.9 fourierspacing = 0.12 pme_order = 4 ewald_rtol = 1e-5 gen_vel = yes gen_temp = 200.0 gen_seed = 9999 constraints = all-bonds tcoupl = V-rescale tc-grps = Protein Non-Protein tau_t = 0.1 0.1 ref_t = 298 298 pcoupl = no*

For details regarding each setting check the gromacs manual. We use once again grompp to prepare the tpr file for mdrun.

    • grompp -v -f <nvt.mdp> -c <file_solv_min2.pdb> -p <file.top> -o <file_nvt.tpr>**

Then we call mdrun as above:

    • ... mdrun -v -deffnm file_nvt**

Next we create a mdp file for pressure coupling **NPT**.

  • define = -DPOSRES

integrator = md dt = 0.005 nsteps = 10000 nstxout = 0 nstvout = 0 nstfout = 0 nstlog = 1000 nstxtcout = 0 nstenergy = 5 xtc_precision = 1000 xtc-grps = System energygrps = Protein Non-Protein nstcalcenergy = 5 nstlist = 5 ns-type = Grid pbc = xyz rlist = 0.9 coulombtype = PME rcoulomb = 0.9 rvdw = 0.9 fourierspacing = 0.12 pme_order = 4 ewald_rtol = 1e-5 tcoupl = V-rescale tc-grps = Protein Non-Protein tau_t = 0.1 0.1 ref_t = 298 298 pcoupl = Berendsen Pcoupltype = Isotropic tau_p = 1.0 compressibility = 4.5e-5 ref_p = 1.0 gen_vel = no constraints = all-bonds*

We use once again grompp to prepare the tpr file for mdrun.

    • grompp -v -f <npt.mdp> -c <file_nvt.gro> -p <file.top> -o <file_npt.tpr>**

Then we call mdrun as above:

    • ... mdrun -v -deffnm file_npt**


Production run

Finally we come to the production run. Again we do everything as above. Only the final step, the **MD simulation** we will submit to the LRZ linux cluster. Below I will suggest one possible way to do this - of course you can do this in a different way. Also, be aware that this might take a while, as the queues are really full.

This is our md.mdp file:

  • integrator = md

tinit = 0 dt = $stepsize nsteps = $rlMD # Default is 10ns nstxout = 50000 nstvout = 50000 nstfout = 0 nstlog = 1000 nstxtcout = 1000 nstenergy = 1000 energygrps = Protein Non-Protein nstcalcenergy = 5 nstlist = 5 ns-type = Grid pbc = xyz rlist = 0.9 coulombtype = PME rcoulomb = 0.9 rvdw = 0.9 fourierspacing = 0.12 pme_order = 4 ewald_rtol = 1e-5 tcoupl = V-rescale tc-grps = Protein Non-Protein tau_t = 0.1 0.1 ref_t = 298 298 pcoupl = Berendsen Pcoupltype = Isotropic tau_p = 2.0 compressibility = 4.5e-5 ref_p = 1.0 gen_vel = no constraints = all-bonds constraint-algorithm = Lincs unconstrained-start = yes lincs-order = 4 lincs-iter = 1 lincs-warnangle = 30 comm_mode = linear*


We use once again grompp to prepare the tpr file for mdrun.

    • grompp -v -f <md.mdp> -c <file_npt.gro> -p <file.top> -o <file_md.tpr>**

Try the **mdrun** just to see whether all is ok:

    • ... mdrun -v -deffnm file_md**


LRZ

From now on we have to deal with the LRZ. First we have to copy our files to the system. This can be achieved using **scp**. You need the mdp, gro top and tpr files used/produced in the last **grompp** call.

To login to the LRZ use the following nodes: - **ssh -XY username@lx64ia2.lrz.de** or **lx64ia3.lrz.de**

More info at http://www.lrz.de/services/compute/linux-cluster/batch/#batchSGE

- There you have to create a configuration file (bla.cmd):

    1. !/bin/bash
  1. $-o $HOME/DIR/mdrun.WT_1.out -j y
  2. $-N PROJECTNAME(no number at beginning)
  3. $-S /bin/bash
  4. $-M YOUR@EMAIL.COM
  5. $-l h_rt=32:00:00
  6. $-l march=x86_64
  7. $-pe mpi_32 32

. /etc/profile cd $HOME/DIR $HOME/DIR/mpirun -np 32 mdrun_mpi -v -deffnm $HOME/DIR/file_md*

Edit the /etc/profile and add this to the end in order to load gromacs:

    • module load gromacs/4.6**

Finally to submit the job type **qsub bla.cmd** To check if everything is ok type **qstat** and look for your job...