Latest revision as of 16:21, 27 June 2012

Intro

In this section we will run molecular dynamics (MD) simulations of the wild-type protein and two interesting mutants using the GROMACS package. For this we will use an automatic pipeline, which is available as a git repository. Because the final simulations take a while, we will post the analysis part at a later point. All the work now has to be done on the LRZ.

The slides of the task: File:MD talk.pdf

LRZ

Prepare Environment

Login to the LRZ: ssh -XY username@lx64ia2.lrz.de or ssh -XY username@lx64ia3.lrz.de
To use git, first load its software module (see http://www.lrz.de/services/compute/supermuc/software/)
Go to a designated directory and clone the repository from https://github.com/offmarc/AGroS
Add the directory containing the scripts to the PATH environment variable
Get a license for SCWRL4 and install it into the same directory as the scripts: http://dunbrack.fccc.edu/scwrl4/
Finally, copy the PDB files of the wild type (WT) and the two mutants to the LRZ with scp
IMPORTANT: Before you continue you should have a look at the scripts and check what they do!
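
The environment steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the module name "git" and the working directory $HOME/md are guesses, and the function names prepend_path and setup_agros are hypothetical. The SCWRL4 installation and the scp transfer of the PDB files remain manual steps.

```shell
#!/bin/bash
# Sketch of the environment preparation on an LRZ login node.
# Assumptions: the git module is called "git"; $HOME/md is the working directory.

WORKDIR="${WORKDIR:-$HOME/md}"

# Prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Clone the pipeline and make its scripts callable.
# Defined only; call it yourself after logging in.
setup_agros() {
    module load git                  # assumed module name
    mkdir -p "$WORKDIR" && cd "$WORKDIR" || return 1
    git clone https://github.com/offmarc/AGroS
    prepend_path "$WORKDIR/AGroS"
}
```

After sourcing this file on the login node, calling setup_agros once clones the repository and extends PATH for the current session. To keep the PATH change, the prepend_path call would also have to go into your shell startup file.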

Prepare Job Scripts

General info about preparing the Job Scripts can be found at http://www.lrz.de/services/compute/linux-cluster/batch_parallel/

Jobs can only be submitted from lxia4-1 or lxia4-2.

For each of the three structures you will have to create a separate job script.

The following example, together with the information on the LRZ page linked above, should give you an idea of how to write such a script.

#!/bin/bash
#SBATCH -o /home/hpc/pr32fi/lu32xul/test/info.out
#SBATCH -D /home/hpc/pr32fi/lu32xul/test/
#SBATCH -J 1whz_MD
#SBATCH --partition=mpp1_inter
#SBATCH --get-user-env
#SBATCH --ntasks=32
#SBATCH --mail-type=end
#SBATCH --mail-user=offman@lrz.de
#SBATCH --export=NONE
#SBATCH --time=02:00:00
source /etc/profile.d/modules.sh
module load gromacs
export PATH="$HOME/test/AGroS:$PATH"
export PATH="$HOME/apps/bin/:$PATH"
AGroS 1whz_new.pdb -dir /home/hpc/pr32fi/lu32xul/test -threads 32

This script does not use the standard cluster (--clusters=mpp1) but a test queue (--partition=mpp1_inter), so that you find out quickly whether the simulation works at all.
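
Since each of the three structures needs its own job script, the example can be turned into a small generator. The following is a minimal sketch, not part of the AGroS pipeline: the function name write_job_script, the output file names, the mail address and the AGroS location under $HOME/test are placeholders, while the SBATCH lines are copied from the example script.

```shell
#!/bin/bash
# Sketch: write one Slurm test-queue job script per structure.
# Assumptions: file names, mail address and AGroS location are placeholders.

write_job_script() {
    local pdb="$1" workdir="$2" ntasks="${3:-32}"
    local name="${pdb%.pdb}"          # job name derived from the PDB file name
    cat > "$workdir/$name.job" <<EOF
#!/bin/bash
#SBATCH -o $workdir/$name.out
#SBATCH -D $workdir
#SBATCH -J ${name}_MD
#SBATCH --partition=mpp1_inter
#SBATCH --get-user-env
#SBATCH --ntasks=$ntasks
#SBATCH --mail-type=end
#SBATCH --mail-user=username@example.org
#SBATCH --export=NONE
#SBATCH --time=02:00:00
source /etc/profile.d/modules.sh
module load gromacs
export PATH="\$HOME/test/AGroS:\$PATH"
AGroS $pdb -dir $workdir -threads $ntasks
EOF
}
```

A call such as write_job_script wt.pdb "$PWD" writes wt.job into the current directory; repeating it for the two mutant files gives the three scripts. The third argument keeps --ntasks and -threads in sync, which is why both are filled from the same variable.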

Submit Job

Submit the job with the command sbatch job.script

If the test simulation fails because of a GROMACS problem, try using only 16 cores: set --ntasks=16 and change the -threads value in the AGroS command line to 16 as well.

In the real script, choose the standard cluster (--clusters=mpp1) and replace the 2-hour limit of the test queue with something like 16 to 32 hours, depending on the size of your protein.
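
The two adjustments described here (fewer cores for a failing test run, and the switch to the standard cluster with a longer time limit) can be applied to an existing test script with sed. This is a hedged sketch: the function names set_cores and make_production_script and the default of 24 hours are illustrative choices, and the patterns assume a script laid out like the example above.

```shell
#!/bin/bash
# Sketch: derive variants of a test-queue job script with sed.
# Assumption: the input script uses the same SBATCH lines as the example.

# Use N cores in both the Slurm request and the AGroS call.
set_cores() {
    local in="$1" out="$2" n="$3"
    sed -e "s/--ntasks=[0-9]*/--ntasks=$n/" \
        -e "s/-threads [0-9]*/-threads $n/" \
        "$in" > "$out"
}

# Switch from the test queue to the standard cluster and raise the time limit.
make_production_script() {
    local in="$1" out="$2" hours="${3:-24}"
    sed -e 's/^#SBATCH --partition=mpp1_inter$/#SBATCH --clusters=mpp1/' \
        -e "s/^#SBATCH --time=.*/#SBATCH --time=${hours}:00:00/" \
        "$in" > "$out"
}
```

For example, make_production_script job.script job_prod.script 32 writes a copy that requests the standard cluster for 32 hours and leaves the original test script untouched.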

Waiting

To check the state of the job, and whether it really sits in the queue, use the command squeue -u <username> <queue>, where <queue> is either --clusters=mpp1 or --partition=mpp1_inter.

Once all of this has worked, you have to wait. Use the time to write a short description of the different steps of the simulation.

We also want you to look at the intermediate PDB files created in the workflow, visualize them, and explain what is special or different about each of them and why we need them.
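
As a convenience, the two squeue variants can be wrapped in one function. This is a minimal sketch: the function name show_jobs and its test/prod keywords are made up for illustration, and it assumes that $USER holds your LRZ username.

```shell
#!/bin/bash
# Sketch: list your own jobs in the test queue or on the standard cluster.
# Assumption: $USER is your LRZ username.

show_jobs() {
    case "$1" in
        test) squeue -u "$USER" --partition=mpp1_inter ;;
        prod) squeue -u "$USER" --clusters=mpp1 ;;
        *)    echo "usage: show_jobs test|prod" >&2; return 2 ;;
    esac
}
```

Calling show_jobs test right after sbatch shows whether the test job was accepted; show_jobs prod does the same for the long run on the standard cluster.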

Difference between revisions of "Task 8 - Molecular Dynamics Simulations"

@@ Line 1: / Line 1: @@
-== Task description ==
+== Intro ==
+In this section we will simulate the wildtype protein and two interesting mutants with MD, e.g. the gromacs package. For this we will use an automatic pipeline. As the final simulations will take a while, we will post the analysis part at a later point.
-A detailed task description can be found [[Task_7_-_Structure-based_mutation_analysis| here]].
+The pipeline is available as a git repository. All the work needs to be done on the LRZ now.
-== Intro and selection of two mutants ==
-The description given here was applied to our wild type (1J8U). The steps for our two mutants, R408W and P281L, are the same.
-We selected R408W because it is the most abundant PAH mutation associated with phenylketonuria. It has a total frequency of 6.67% (see [http://www.pahdb.mcgill.ca/cgi-bin/pahdb/mutation_statistics-1.cgi]).
-The second mutation, P281L, was selected because it is the mutation closest to the binding site (HIS 285), and we hope to see effects here that explain why this mutation is disease-causing.
-== Preparation ==
-Before we can start to generate all necessary files for our MD simulation we have to prepare our structure first. We employed the following steps:
-'''1. Extracting the crystal water below 15 Å'''
-<code>
- repairPDB 1J8U.pdb -ssw 15 > water_below_15.out
-</code>
-We received the following output:
-<code>
- HETATM 2559  O   HOH A1008       2.996   9.738  16.094  1.00 14.91           O
- HETATM 2561  O   HOH A1010      -3.667  12.788   8.538  1.00 14.80           O
- HETATM 2572  O   HOH A1021       3.557  27.911  10.913  1.00 14.69           O
- HETATM 2879  O   HOH A1328      -5.335  24.271  14.490  1.00 14.88           O
- TER
-</code>
-'''2. Extract only the protein'''
-<code>
- repairPDB 1J8U.pdb -nosol > 1J8U_nosol.pdb
-</code>
-'''3. Extract the amino acid sequence and convert it to lower-case letters to use it as input for SCWRL'''
-<code>
- repairPDB 1J8U.pdb -seq > 1J8U_seq.txt
- vim 1J8U_seq.txt
- :%s/.*/\L&/g
-</code>
-'''4. Use SCWRL to complete the side chains of all amino acids'''
-<code>
- /apps/scwrl4/Scwrl4 -i 1J8U_nosol.pdb -s 1J8U_seq.txt -o 1J8U_nosol_after_SCWRL.pdb | tee scwrl_1J8U_nosol_after_scwrl.out
-</code>
-'''5. Remove the hydrogen atoms from the SCWRL output'''
-<code>
- repairPDB 1J8U_nosol_after_SCWRL.pdb -noh > 1J8U_nosol_after_SCWRL_no_h.pdb
-</code>
-'''6. concatenate the protein and the crystal water into one file'''
-We just added the output of step 1 to the end of the PDB file 1J8U_nosol_after_SCWRL_no_h.pdb and named this file ''1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.pdb''
-== Create important GROMACS files ==
-'''1. We have to create a TOP (topology) and a PAR (parameter) file with the following command:'''
-<code>
- pdb2gmx -f 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.pdb -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.gro -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -water tip3p -ff amber03 -vsite hydrogens | tee pdb2gmx.out
-</code>
-'''2. Next we create a box around the protein and fill it with water'''
-Create the box:
-<code>
- editconf -f 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.gro -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_box.gro -bt dodecahedron -d 1.0 | tee editconf.out
-</code>
-Fill it with water:
-<code>
- genbox -cp 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_box.gro -cs spc216.gro -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_water.gro
-</code>
-'''3. In the next step we have to add the ions. To do so we need to use the grompp command, which is always used to prepare a system for more complicated steps. This command needs an mdp file where we can define several additional parameters.'''
-Create parameter file addions.mdp:
-<code>
- vim addions.mdp
-</code>
-And fill it with the following content:
-<code>
- integrator = steep
- emtol = 1.0
- nsteps = 500
- nstenergy = 1
- energygrps = System
- coulombtype = PME
- rcoulomb = 0.9
- rvdw	 = 0.9
- rlist = 0.9
- fourierspacing = 0.12
- pme_order = 4
- ewald_rtol = 1e-5
-</code>
-Now, we execute it with this command:
-<code>
- grompp -v -f addions.mdp  -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_water.gro -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_water.tpr | tee grommp.out
-</code>
-'''4. Now we add the ions to the system and neutralize the net charge that the system carries due to charged amino acids. To do so we use genion:'''
-<code>
- genion -s 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_water.tpr -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv.pdb -conc 0.1 -neutral -pname NA+ -nname CL- | tee genion.out
-</code>
-During the call we were asked for a number; we selected 13.
-'''5. Next we have to adjust the SOL value and add the number of NA+ and CL- in our 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top'''
-The last step generated 25 NA+ and 21 CL- ions, so we had to reduce the number in the SOL field to 10170 - 46 = '''10124'''. In the end the .top file had the following values:
-<code>
- [ molecules ]
- ; Compound        #mols
- Protein_chain_A     1
- SOL                 4
- SOL             10124
- NA                 25
- CL                 21
-</code>
-'''6. To make sure our initial crystal water does not overlap with the added water we use repairPDB again.'''
-<code>
- repairPDB 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv.pdb -cleansol > 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2.pdb
-</code>
-'''7. In the last line there is a REMARK tag with the number of removed water molecules. This has to be changed in the TOP file if larger than 0.'''
-<code>
- REMARK RM 0
-</code>
-Value is 0 so we did not have to change anything.
-'''8. We need to extract restraints from the structures. These restraints are useful to reduce the simulation time by disallowing very fast vibrations such as seen for hydrogen atoms. The command we use is called genrestr.'''
-<code>
- genrestr -f 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2.pdb -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2.itp | tee genrestr.out
-</code>
-In the command line menu we choose number 1 for protein.
-== Minimization solvent ==
-'''1. Creating a .mdp file for grompp to prepare the system for the solvent minimization'''
-We created a file with the name minimize_solvent.mdp by using vim:
-<code>
- vim minimize_solvent.mdp
-</code>
-The content of the file is as follows:
-<code>
- define = -DPOSRES
- integrator = steep
- emtol = 1.0
- nsteps = 500
- nstenergy = 1
- energygrps = System
- coulombtype = PME
- rcoulomb	 = 0.9
- rvdw = 0.9
- rlist = 0.9
- fourierspacing = 0.12
- pme_order = 4
- ewald_rtol = 1e-5
- pbc = xyz
-</code>
-'''2. execution of grompp to prepare the system for the MD run'''
-<code>
- grompp -v -f minimize_solvent.mdp -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2.pdb -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min.tpr | tee grompp_minimize_solvent.out
-</code>
-'''3. Executing mdrun to minimize the solvent'''
-<code>
- mdrun -v -deffnm 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min.pdb | tee mdrun_minimize_solvent.out
-</code>
-== Minimization system ==
-In this part we minimize the solvent AND the side chains of the protein.
-'''1. Creating the position restraint file.'''
-<code>
- genrestr -f 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min.pdb -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min.itp | tee genrestr_minimize_system.out
-</code>
-We selected option 4 which says that we only put restraints on the backbone of the protein.
-'''2. Create an .mdp file for the grompp call'''
-We use the same .mdp from the solvent minimization step
-'''3. Calling grompp to prepare the tpr file for our mdrun step'''
-<code>
- grompp -v -f minimize_solvent.mdp -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min.pdb -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min2.tpr | tee grompp_minimize_system.out
-</code>
-'''4. Calling mdrun to minimize the system'''
-<code>
- mdrun -v -deffnm 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min2 -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min2.pdb | tee mdrun_minimize_system.out
-</code>
-== Equilibration of system ==
-=== Heating up the System with constant volume and particle number (NVT) ===
-Here we will heat up our system to a certain temperature. We will keep our particle number and volume constant, the pressure is changing.
-'''1. Creating a .mdp configuration file for grompp'''
-We created an .mdp file with the name nvt.mdp:
-<code>
- vim nvt.mdp
-</code>
-The content is as follows:
-<code>
- define = -DPOSRES
- integrator = md
- dt = 0.005
- nsteps = 10000
- nstxout = 0
- nstvout = 0
- nstfout = 0
- nstlog = 1000
- nstxtcout	 = 0
- nstenergy = 5
- energygrps = Protein Non-Protein
- nstcalcenergy = 5
- nstlist = 10
- ns-type = Grid
- pbc = xyz
- rlist = 0.9
- coulombtype = PME
- rcoulomb = 0.9
- rvdw = 0.9
- fourierspacing = 0.12
- pme_order = 4
- ewald_rtol = 1e-5
- gen_vel = yes
- gen_temp = 200.0
- gen_seed = 9999
- constraints = all-bonds
- tcoupl = V-rescale
- tc-grps = Protein  Non-Protein
- tau_t = 0.1	0.1
- ref_t = 298	298
- pcoupl = no
-</code>
-'''2. Creating .tpr file for our MD run by employing grompp'''
-<code>
- grompp -v -f nvt.mdp -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_solv2_min2.pdb -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_nvt.tpr | tee grompp_equi_system_nvt.out
-</code>
-'''3. MD run'''
-<code>
- mdrun -v -deffnm 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_nvt | tee mdrun_equi_system_nvt.out
-</code>
-=== Heating up the System with constant pressure and particle number (NPT) ===
-Here we will heat up our system to a certain temperature. We will keep our particle number and pressure constant, the volume is changing.
-'''1. Creating an .mdp file for pressure coupling NPT'''
-We create a .mdp file with the following command:
-<code>
- vim npt.mdp
-</code>
-And the content:
-<code>
- define = -DPOSRES
- integrator = md
- dt = 0.005
- nsteps = 10000
- nstxout = 0
- nstvout = 0
- nstfout = 0
- nstlog = 1000
- nstxtcout = 0
- nstenergy = 5
- xtc_precision = 1000
- xtc-grps = System
- energygrps = Protein Non-Protein
- nstcalcenergy = 5
- nstlist = 5
- ns-type = Grid
- pbc = xyz
- rlist = 0.9
- coulombtype = PME
- rcoulomb = 0.9
- rvdw = 0.9
- fourierspacing = 0.12
- pme_order = 4
- ewald_rtol = 1e-5
- tcoupl = V-rescale
- tc-grps = Protein Non-Protein
- tau_t = 0.1 0.1
- ref_t = 298 298
- pcoupl = Berendsen
- Pcoupltype = Isotropic
- tau_p = 1.0
- compressibility = 4.5e-5
- ref_p = 1.0
- gen_vel = no
- constraints = all-bonds
-</code>
-'''2. Creating .tpr file for our MD run by employing grompp'''
-<code>
- grompp -v -f npt.mdp -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_nvt.gro -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_npt.tpr | tee grompp_equi_system_npt.out
-</code>
-'''3. MD run'''
-<code>
- mdrun -v -deffnm 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_npt | tee mdrun_equi_system_npt.out
-</code>
-== Production run ==
-In this part we prepare the files for our final production run on the LRZ cluster.
-'''1. Creating a .mdp parameter file for grompp'''
-We created a file with the following command:
-<code>
- vim md.mdp
-</code>
-And this content:
-<code>
- integrator = md
- tinit = 0
- dt = 0.005
- nsteps = 2000000
- nstxout = 50000
- nstvout = 50000
- nstfout = 0
- nstlog = 1000
- nstxtcout = 1000
- nstenergy = 1000
- energygrps = Protein Non-Protein
- nstcalcenergy = 5
- nstlist = 5
- ns-type = Grid
- pbc = xyz
- rlist = 0.9
- coulombtype = PME
- rcoulomb = 0.9
- rvdw = 0.9
- fourierspacing = 0.12
- pme_order = 4
- ewald_rtol = 1e-5
- tcoupl = V-rescale
- tc-grps = Protein Non-Protein
- tau_t = 0.1 0.1
- ref_t = 298 298
- pcoupl = Berendsen
- Pcoupltype = Isotropic
- tau_p = 2.0
- compressibility = 4.5e-5
- ref_p = 1.0
- gen_vel = no
- constraints = all-bonds
- constraint-algorithm = Lincs
- unconstrained-start = yes
- lincs-order = 4
- lincs-iter = 1
- lincs-warnangle = 30
- comm_mode = linear
-</code>
-'''2. Creating a .tpr file for our MD run'''
-<code>
- grompp -v -f md.mdp -c 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_npt.gro -p 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top -o 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_md.tpr | tee grompp_prduction_run_md.out
-</code>
+The slides of the task: [[File:MD talk.pdf]]
 == LRZ ==
+=Prepare Environment=
+* Login to the LRZ: <code>ssh -XY username@lx64ia2.lrz.de</code> or <code>ssh -XY username@lx64ia3.lrz.de</code>
+* In order to use git you have to load the software module first. http://www.lrz.de/services/compute/supermuc/software/
+* Go to a designated directory and clone the repository from https://github.com/offmarc/AGroS
+* Include all the scripts in the PATH environment variable
+* Get a license for SCWRL4 and install it into the same dir where the scripts are: http://dunbrack.fccc.edu/scwrl4/
+* Finally copy the WT and two mutants to the LRZ (scp)
+* IMPORTANT: Before you continue you should have a look at the scripts and check what they do!
+=Prepare Job Scripts=
-=== Locally ===
+General info about preparing the Job Scripts can be found at http://www.lrz.de/services/compute/linux-cluster/batch_parallel/
+Submission can only be done from lxia4-1, lxia4-2.
-'''1. Copy files to LRZ'''
+For each of the three structures you will have to create a separate job script.
-We copied the following files to the LRZ:
-* md.mdp
-* 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water.top
-* 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_npt.gro
-* 1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_md.tpr
-By using the following command:
+Here is an example that together with the info on the above stated LRZ page should give you an idea how to do it.
 <code>
+ #!/bin/bash
- scp -r . di69duj@lx64ia3.lrz.de:/home/cluster/pr58ni/di69duj
+ #SBATCH -o /home/hpc/pr32fi/lu32xul/test/info.out
+ #SBATCH -D /home/hpc/pr32fi/lu32xul/test/
+ #SBATCH -J 1whz_MD
+ #SBATCH --partition=mpp1_inter
+ #SBATCH --get-user-env
+ #SBATCH --ntasks=32
+ #SBATCH --mail-type=end
+ #SBATCH --mail-user=offman@lrz.de
+ #SBATCH --export=NONE
+ #SBATCH --time=02:00:00
+ source /etc/profile.d/modules.sh
+ module load gromacs
+ export PATH="$HOME/test/AGroS:$PATH"
+ export PATH="$HOME/apps/bin/:$PATH"
+ AGroS 1whz_new.pdb -dir /home/hpc/pr32fi/lu32xul/test -threads 32
 </code>
+In this script we do not use the standard cluster <code>--clusters=mpp1</code> but a test queue to get a quicker answer whether the simulation works at all.
+=Submit Job=
-=== On the LRZ ===
+Submission is done using the following command <code>sbatch job.script</code>
-'''1. Connecting to the LRZ'''
+If the test simulation fails due to a gromacs problem try to use only 16 cores and change that also for the commandline call of AGroS.
-<code>
- ssh di69duj@lx64ia3.lrz.de
-</code>
+In the real script you choose the standard cluster and instead of only 2 hours (limit) you set something like 16-32 hours depending on the size of your protein.
+=Waiting=
-'''2. Creating a working directory'''
+The state of the job and whether it really sits in the queue can be checked with the command <code>squeue -u <username> <queue></code> where the queue can either be <code>--clusters=mpp1</code> or <code>--partition=mpp1_inter</code>.
-<code>
- mkdir md_wt
-</code>
-'''3. Copy all files to this working directory'''
-<code>
- cp *.* md_wt
-</code>
-And finally change to that directory:
-<code>
- cd md_wt
-</code>
-'''4. Creating a runfile with the name md_run.cmd'''
-Creating the file with vim:
-<code>
- vim md_run.cmd
-</code>
-With the following content:
-<code>
- #!/bin/bash
- #$-o $HOME/md_wt/mdrun.WT_1.out -j y
- #$-N MD_WT
- #$-S /bin/bash
- #$-M erik.pfeiffenberger@gmail.com
- #$-l h_rt=32:00:00
- #$-l march=x86_64
- #$-pe mpi_32 32
- . /etc/profile
- cd $HOME/md_wt
- $HOME/md_wt/mpirun -np 32 mdrun_mpi -v -deffnm $HOME/md_wt/1J8U_nosol_after_SCWRL_no_h_merged_crystal_water_md
-</code>
-'''5. Submitting job'''
-<code>
- qsub md_run.cmd
-</code>
+Once this all worked you have to wait and write a bit about the different steps of the simulation etc.
-'''6. Waiting...'''
+We also want you to look at the intermediate PDB files created in the workflow, visualize them and explain what is special, different about them and why we need them.
-While waiting, we recommend a cup of earl grey tea with one quarter milk and two tea spoons of sugar. Enjoy :)