= Task 8 - Molecular Dynamics Simulations =

== Intro ==
In this section we will simulate the wildtype protein and two interesting mutants with MD, using the GROMACS package. For this we will use an automatic pipeline. As the final simulations will take a while, we will post the analysis part at a later point. The pipeline is available as a git repository. All the work needs to be done on the LRZ now.

The slides of the task: [[File:MD talk.pdf]]

A good tutorial can be found here: http://md.chem.rug.nl/~mdcourse/index.html#introduction

The related talk can be found here: [[File:Molecular_dynamics_talk.pdf]]
== Preparation ==

First we have to prepare our PDB structure. For this we will make use of repairPDB and SCWRL. Select the structure from the PDB or use your model where the missing residues have been added. A new version of repairPDB can be found in the Dropbox dir. The steps are the following (a possible command sequence is sketched below):

* use repairPDB to extract the crystal water below 15 Å resolution and save it into a temporary file (<code>-ssw 15</code>)
* use repairPDB to extract just the protein (<code>-nosol</code>)
* now run SCWRL on the extracted protein and give it the PDB sequence in all lowercase letters to complete the missing sidechains
* remove the hydrogen atoms from the SCWRL output with repairPDB
* concatenate the protein and the crystal water into one file
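The exact calls depend on your local installation, so the following is only a minimal sketch of the five steps above. The flags <code>-ssw 15</code> and <code>-nosol</code> are taken from the text; the hydrogen-removal flag of repairPDB (written here as <code>-noh</code>), the way repairPDB names its output file, the SCWRL binary name and its <code>-i/-s/-o</code> options, and all file names are assumptions that you should adapt.

<code>
# 1. extract the crystal water into a temporary file
repairPDB protein.pdb -ssw 15 crystal_water.pdb

# 2. extract just the protein (no solvent)
repairPDB protein.pdb -nosol protein_only.pdb

# 3. complete missing sidechains with SCWRL; the sequence file is the PDB sequence in lowercase
Scwrl4 -i protein_only.pdb -s sequence_lowercase.txt -o protein_scwrl.pdb

# 4. remove the hydrogen atoms added by SCWRL (flag name assumed)
repairPDB protein_scwrl.pdb -noh protein_noh.pdb

# 5. concatenate protein and crystal water into one file
cat protein_noh.pdb crystal_water.pdb > protein_prepared.pdb
</code>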
== Create important GROMACS files ==

Next we start to prepare the files for GROMACS:

* we have to create a coordinate ('''.gro''') and a topology ('''.top''') file with the following command:

<code>pdb2gmx -f <input.pdb> -o <output.gro> -p <output.top> -water tip3p -ff amber03 -vsite hydrogens</code>

As a model for water we choose '''TIP3P''', as a forcefield we use '''AMBER03'''. The hydrogens are converted to virtual sites, which allows a longer time step and therefore speeds up the simulation. Don't forget that the file extension of the input file needs to be '''.pdb'''.
* Next we create a '''box around the protein''' and fill it with water.

The following command adds the box:

<code>editconf -f <file.gro> -o <file_box.gro> -bt dodecahedron -d 1.0</code>

The -bt flag sets the box type; the (rhombic) dodecahedron is a space-filling shape that needs less solvent than a cubic box. -d sets the minimum distance, in nanometers, between any surface atom of the protein and the box boundary.

<code>genbox -cp <file_box.gro> -cs spc216.gro -p <file.top> -o <file_water.gro></code>

-cs tells the program which pre-equilibrated solvent configuration to use for filling the box - in this case the standard three-point water box shipped with GROMACS, which is compatible with TIP3P.
* In the next step we have to '''add the ions'''. To do so we need to use the '''grompp''' command, which is always used to prepare a system for the more complicated steps. This command needs an '''mdp file''' where we can define several additional parameters.

The file will include the following info and should have the file extension '''.mdp''':

<code>
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
coulombtype = PME
rcoulomb = 0.9
rvdw = 0.9
rlist = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
</code>

For details of the settings please check the gromacs manual, e.g. found in the Dropbox literature folder.

* We call '''grompp''' the following way:

<code>grompp -v -f <file.mdp> -c <file_water.gro> -p <file.top> -o <file_water.tpr></code>

The '''.tpr file''' is what we need in the next step, where we use the command '''genion'''.

* Genion will add ions to the system and neutralize the net charge that the system carries due to charged amino acids.

<code>genion -s <file_water.tpr> -o <file_solv.pdb> -conc 0.1 -neutral -pname NA+ -nname CL-</code>

As the selection we will use number '''13''' (the solvent group), which means that we replace water molecules with the ions. -conc defines the concentration of the ions, which is 0.1 molar. -neutral tells the command to neutralize the charges. -pname is the name of the positively charged ions to add, -nname that of the negatively charged ones.
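genion asks interactively which group should be replaced by ions. If you want to script this step, the answer can be piped in; the group number (13 in the text, usually the SOL group) may differ for your system, so check the list that genion prints. The same pattern works for the interactive genrestr calls further below.

<code>
# non-interactive selection of the solvent group (group number taken from the text; verify for your system)
echo 13 | genion -s file_water.tpr -o file_solv.pdb -conc 0.1 -neutral -pname NA+ -nname CL-
</code>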
* Now we have to do some manual work. From the output of the last command, note how many CL- and NA+ ions have been added and how many solvent molecules are left. These values need to be changed in the '''top file'''; copy the original and edit the copy. At the end of the file, in the <code>[ molecules ]</code> section, there are entries that begin with SOL. From the larger one you subtract the number of ions added. Then add an entry NA with the number of added NA+ ions and an entry CL with the number of added CL- ions (an example is sketched below).
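For illustration, assume genion reported that 18 NA+ and 12 CL- ions were added, replacing 30 water molecules in total. The numbers and the protein molecule name are made up; use the values from your own genion output and topology. The end of the edited top file would then change roughly like this:

<code>
; before editing
[ molecules ]
Protein_chain_A     1
SOL              9541

; after editing (9541 - 30 = 9511 water molecules left)
[ molecules ]
Protein_chain_A     1
SOL              9511
NA                 18
CL                 12
</code>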
* To make sure our initial crystal water does not overlap with the added water, we use repairPDB again:

<code>repairPDB <file_solv.pdb> -cleansol <file_solv2.pdb></code>

In the last line of the output there is a '''REMARK''' tag with the number of removed water molecules. If this number is larger than 0, subtract it from the SOL entry in the '''TOP''' file as well.

* Before we are able to start with the minimization procedures we need to generate '''position restraints''' for the structure. During the first minimization and equilibration steps these restraints keep the protein (or its backbone) close to the starting coordinates while the surroundings relax. The command we use is called '''genrestr'''.

<code>genrestr -f <file_solv2.pdb> -o <file.itp></code>

In the command line menu we choose group number 1 for the protein.
== Minimization of the solvent ==

Now we can start our first minimization - minimizing the solvent only while restraining the protein.

* To prepare the system we once again need to use '''grompp''' and therefore create an '''mdp file'''.

The MDP file looks like this:

<code>
define = -DPOSRES
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
coulombtype = PME
rcoulomb = 0.9
rvdw = 0.9
rlist = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
pbc = xyz
</code>

* We call '''grompp''' the following way:

<code>grompp -v -f <file.mdp> -c <file_solv2.pdb> -p <file.top> -o <file_solv_min.tpr></code>

* Now we do our first run with the simulation command '''mdrun'''.

If you have multiple processors you do the following:

<code>mpirun -np <number of processors to use> mdrun_mpi -v -deffnm <file_solv_min> -c <file_solv_min.pdb></code>

If you only run it on one processor it looks like this:

<code>mdrun -v -deffnm <file_solv_min> -c <file_solv_min.pdb></code>
== Minimization of the system ==

Now we have a file where the solvent has been minimized.

* Next we need to minimize the solvent and the protein sidechains.

We have to recreate the position restraints file. Choose option '''4''', which restrains only the backbone of the protein.

<code>genrestr -f <file_solv_min.pdb> -o <file.itp></code>

* The '''mdp file''' from the previous minimization can be reused; therefore, we only have to use '''grompp''' to prepare our tpr file.

<code>grompp -v -f <file.mdp> -c <file_solv_min.pdb> -p <file.top> -o <file_solv_min2.tpr></code>

* If you have multiple processors you do the following:

<code>mpirun -np <number of processors to use> mdrun_mpi -v -deffnm <file_solv_min2> -c <file_solv_min2.pdb></code>

* If you only run it on one processor it looks like this:

<code>mdrun -v -deffnm <file_solv_min2> -c <file_solv_min2.pdb></code>
== Equilibration of the system ==

Now that our system is minimized we have to start equilibrating it. The following procedures will take a while but should still be calculated locally and '''not''' on the '''LRZ account'''.

* In the next step we apply temperature coupling: we slowly heat up the system to the desired temperature. In this simulation the particle number, the volume and (once equilibrated) the temperature are constant (NVT ensemble).

You will have to create a new mdp file called '''nvt.mdp''':

<code>
define = -DPOSRES
integrator = md
dt = 0.005
nsteps = 10000
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstxtcout = 0
nstenergy = 5
energygrps = Protein Non-Protein
nstcalcenergy = 5
nstlist = 10
ns-type = Grid
pbc = xyz
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
rvdw = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
gen_vel = yes
gen_temp = 200.0
gen_seed = 9999
constraints = all-bonds
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 298 298
pcoupl = no
</code>

For details regarding each setting check the gromacs manual.

* We use once again grompp to prepare the '''tpr file''' for '''mdrun'''.

<code>grompp -v -f <nvt.mdp> -c <file_solv_min2.pdb> -p <file.top> -o <file_nvt.tpr></code>

* Then we call mdrun as above, but without the <code>-c</code> option:

<code>... mdrun -v -deffnm file_nvt</code>
* Next we create an .mdp file for pressure coupling ('''NPT'''):

<code>
define = -DPOSRES
integrator = md
dt = 0.005
nsteps = 10000
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstxtcout = 0
nstenergy = 5
xtc_precision = 1000
xtc-grps = System
energygrps = Protein Non-Protein
nstcalcenergy = 5
nstlist = 5
ns-type = Grid
pbc = xyz
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
rvdw = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 298 298
pcoupl = Berendsen
Pcoupltype = Isotropic
tau_p = 1.0
compressibility = 4.5e-5
ref_p = 1.0
gen_vel = no
constraints = all-bonds
</code>

* We use once again '''grompp''' to prepare the '''tpr file''' for '''mdrun'''.

<code>grompp -v -f <npt.mdp> -c <file_nvt.gro> -p <file.top> -o <file_npt.tpr></code>

* Then we call '''mdrun''' as above:

<code>... mdrun -v -deffnm file_npt</code>
== Production run ==

Finally, we come to the production run. Again we do everything as above; only the final step, the '''MD simulation''' itself, will be submitted to the '''LRZ linux cluster'''. Below I will suggest one possible way to do this - of course you can do it differently. Also, be aware that this might take a while, as the queues are generally rather full.

* This is our md.mdp file:

<code>
integrator = md
tinit = 0
dt = 0.005
nsteps = 2000000
nstxout = 50000
nstvout = 50000
nstfout = 0
nstlog = 1000
nstxtcout = 1000
nstenergy = 1000
energygrps = Protein Non-Protein
nstcalcenergy = 5
nstlist = 5
ns-type = Grid
pbc = xyz
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
rvdw = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 298 298
pcoupl = Berendsen
Pcoupltype = Isotropic
tau_p = 2.0
compressibility = 4.5e-5
ref_p = 1.0
gen_vel = no
constraints = all-bonds
constraint-algorithm = Lincs
unconstrained-start = yes
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
comm_mode = linear
</code>

* We use once again '''grompp''' to prepare the '''.tpr file''' for '''mdrun'''.

<code>grompp -v -f <md.mdp> -c <file_npt.gro> -p <file.top> -o <file_md.tpr></code>

Try the '''mdrun''' locally, just to see whether everything is ok:

<code>... mdrun -v -deffnm file_md</code>
== LRZ ==

From now on we have to deal with the LRZ.

=== Prepare Environment ===

* Login to the LRZ: <code>ssh -XY username@lx64ia2.lrz.de</code> or <code>ssh -XY username@lx64ia3.lrz.de</code>
* In order to use git you have to load the software module first: http://www.lrz.de/services/compute/supermuc/software/
* Go to a designated directory and clone the repository from https://github.com/offmarc/AGroS
* Include all the scripts in the PATH environment variable
* Get a license for SCWRL4 and install it into the same directory where the scripts are: http://dunbrack.fccc.edu/scwrl4/
* Finally copy the WT and the two mutant structures to the LRZ with scp (a possible command sequence for these steps is sketched below)
* IMPORTANT: Before you continue you should have a look at the scripts and check what they do!
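A minimal sketch of the environment setup on the LRZ login node. The module name for git, the directory names, and the structure file names are assumptions; the repository URL, the PATH extension, and the use of scp are taken from the list above.

<code>
# on the LRZ: load git (module name assumed) and clone the pipeline
module load git
mkdir -p $HOME/test && cd $HOME/test
git clone https://github.com/offmarc/AGroS

# make the pipeline scripts (and your SCWRL4 installation) available in PATH
export PATH="$HOME/test/AGroS:$PATH"

# from your local machine: copy the wildtype and the two mutant structures to the LRZ
scp wildtype.pdb mutant1.pdb mutant2.pdb username@lx64ia2.lrz.de:~/test/
</code>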
=== Prepare Job Scripts ===

General info about preparing the job scripts can be found at http://www.lrz.de/services/compute/linux-cluster/batch_parallel/

Submission can only be done from lxia4-1 or lxia4-2.

For each of the three structures you will have to create a separate job script. Here is an example that, together with the info on the LRZ page stated above, should give you an idea how to do it:

<code>
#!/bin/bash
#SBATCH -o /home/hpc/pr32fi/lu32xul/test/info.out
#SBATCH -D /home/hpc/pr32fi/lu32xul/test/
#SBATCH -J 1whz_MD
#SBATCH --partition=mpp1_inter
#SBATCH --get-user-env
#SBATCH --ntasks=32
#SBATCH --mail-type=end
#SBATCH --mail-user=offman@lrz.de
#SBATCH --export=NONE
#SBATCH --time=02:00:00
source /etc/profile.d/modules.sh
module load gromacs
export PATH="$HOME/test/AGroS:$PATH"
export PATH="$HOME/apps/bin/:$PATH"
AGroS 1whz_new.pdb -dir /home/hpc/pr32fi/lu32xul/test -threads 32
</code>

In this script we do not use the standard cluster (<code>--clusters=mpp1</code>) but a test queue, to get a quicker answer whether the simulation works at all.
=== Submit Job ===

Submission is done with the following command: <code>sbatch job.script</code>

If the test simulation fails due to a gromacs problem, try to use only 16 cores and change that also in the command-line call of AGroS.

In the real script you choose the standard cluster and, instead of the 2-hour limit of the test queue, you set something like 16-32 hours, depending on the size of your protein (see the sketch below).
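As a rough sketch, these are the header lines that would differ between the test script above and the real production script. The exact wall-clock limit (24 hours here) and the replacement of the partition line by a clusters line are assumptions based on the description above and the LRZ documentation linked earlier; the rest of the script stays the same.

<code>
# test run (as in the example script above)
#SBATCH --partition=mpp1_inter
#SBATCH --time=02:00:00

# real run: standard cluster and a longer wall-clock limit (16-32 h depending on protein size; value assumed)
#SBATCH --clusters=mpp1
#SBATCH --time=24:00:00
</code>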
=== Waiting ===

The state of the job, and whether it really sits in the queue, can be checked with the command <code>squeue -u <username> <queue></code>, where the queue can either be <code>--clusters=mpp1</code> or <code>--partition=mpp1_inter</code>.

Once this all worked you have to wait and write a bit about the different steps of the simulation etc.

We also want you to look at the intermediate PDB files created in the workflow, visualize them and explain what is special or different about them and why we need them.

'''HAVE FUN !!!'''