Task 9 Structure-based mutation analysis

From Bioinformatikpedia

Latest revision as of 15:07, 2 July 2013

Intro

This time we make full use of the protein’s known crystal structures and try to predict energetic changes caused by mutations. We use different methods to generate structures and predict energies. For the analysis, we check how the predicted structures differ and how the predicted energy changes differ. Do any of the analyses lead to a plausible prediction of the effect of the mutation?

The slides for the talk can be found here: File:Structure based mutation analysis Karo.pdf

Where to run the analyses

  • You have to use the student computer pool: i12k-biolab0n.informatik.tu-muenchen.de, where n goes from 1 to 9 (or more?). The file server does not have blast etc. installed!
  • The software and scripts to use can be found in the dir /opt/SS12-Practical

Preparation

Choose a structure to work with

Before we start, you will have to choose one of the available structures (if several can be used). In the previous sections you already had to use structures as references, so you could stick to your choice. However, in this part of the practical there are some additional constraints to observe.

  • The structure should have a high resolution (small Å value).
  • The R-factor should be as small as possible, and the coverage as high as possible.
  • Check the pH at which the structure was solved; ideally it is close to physiological pH (7.4).
  • Before you decide on a structure, make sure it does not contain any gaps (missing residues); a gap shows up as two consecutive residues whose numbering is not consecutive.
  • If there is no structure without missing residues, try to create a composite structure.
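
The gap check described above can be automated. The following is a minimal sketch, assuming a standard fixed-column PDB file; the function name find_numbering_gaps is hypothetical and not part of the course scripts. It only reports jumps in the residue numbering of one chain, so a reported jump should still be confirmed by hand (for example against the REMARK 465 records, which list missing residues).

```python
def find_numbering_gaps(pdb_lines, chain_id):
    """Return (previous, next) residue-number pairs where the numbering jumps.

    Hypothetical helper: it looks only at C-alpha ATOM records of one chain
    and relies on the fixed column layout of the PDB format.
    """
    residue_numbers = []
    for line in pdb_lines:
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21] == chain_id:
            number = int(line[22:26])
            # Skip repeated numbers (alternate locations, insertion codes).
            if not residue_numbers or residue_numbers[-1] != number:
                residue_numbers.append(number)
    pairs = zip(residue_numbers, residue_numbers[1:])
    return [(a, b) for a, b in pairs if b - a > 1]


# Usage (file name and chain are placeholders):
# with open("structure.pdb") as handle:
#     print(find_numbering_gaps(handle, "A"))
```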

Visualise the mutations you want to work with

Map 5 mutations of your choice from the previously selected mutations onto the crystal structure:

  • Color the mutated residues differently from the rest of the protein and create a snapshot for the wiki.
  • If applicable, find out whether the mutations are close to the active site, a binding interface or other important functional sites. Visualize this and describe it properly.
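
One way to colour the mutated residues is to generate a short PyMOL command script. This is a minimal sketch: the function name, the selection name "mutations" and the default colour are arbitrary choices, and the residue numbers must be those used in the PDB file. It relies only on the PyMOL commands select, color and show.

```python
def pymol_highlight_script(chain, residue_numbers, colour="red"):
    """Build PyMOL commands that select, colour and show the mutated residues."""
    resi = "+".join(str(number) for number in residue_numbers)
    commands = [
        "select mutations, chain %s and resi %s" % (chain, resi),
        "color %s, mutations" % colour,
        "show sticks, mutations",
    ]
    return "\n".join(commands) + "\n"


# Usage (chain and positions are placeholders): write the text to a .pml file
# and load it in PyMOL after loading the structure.
# print(pymol_highlight_script("A", [12, 45, 101]))
```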

Create mutated structures

Next we use SCWRL to create our mutations. Make sure you only change one side chain for each mutated structure. To do this, give SCWRL the mutated sequence: extract the sequence with repairPDB, change all letters to lower case, and then write the new amino acid (the mutation) as a capital letter at the mutated position in the sequence file. SCWRL reads this sequence file via the -s flag. Afterwards, check that only the mutated side chain has been changed.
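
The sequence editing for SCWRL can be sketched as follows. The function name is hypothetical, and the position is assumed to be a 1-based index into the extracted sequence, which is not necessarily the PDB residue number if the structure does not start at residue 1.

```python
def scwrl_mutant_sequence(wt_sequence, position, new_residue):
    """Lower-case the whole sequence and upper-case only the mutated residue.

    position is a 1-based index into wt_sequence (an assumption of this
    sketch); convert from PDB numbering first if the two differ.
    """
    if not 1 <= position <= len(wt_sequence):
        raise ValueError("position lies outside the sequence")
    lowered = wt_sequence.lower()
    return lowered[:position - 1] + new_residue.upper() + lowered[position:]


# Usage (sequence and mutation are placeholders):
# print(scwrl_mutant_sequence("MKTAY", 3, "G"))
```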

Energy comparisons

In the following, compare wild type (WT) and mutant structures.

Investigate the local hydrogen-bonding network using PyMOL (http://www.pymolwiki.org/index.php/Main_Page), and also check for potential clashes (side chains that are too close to each other). Are you introducing hydrophilic residues into the core or hydrophobic residues onto the protein surface? Do the mutations introduce any holes into the protein?

Now that you should have a clear idea of the WT and mutant proteins, we will try to calculate some energies. Always calculate the energy for the wild type and for the mutants, then subtract and compare.
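
The subtraction step can be sketched as below. The function and the mutation labels are hypothetical; the only assumptions are that all energies come from the same method and are in the same units. With "mutant minus wild type", a positive value means the method assigns the mutant a higher energy than the WT.

```python
def energy_differences(wt_energy, mutant_energies):
    """Return mutant minus wild-type energy for every mutant."""
    return {name: energy - wt_energy for name, energy in mutant_energies.items()}


# Usage (numbers are made up for illustration only):
# print(energy_differences(-12.5, {"R273H": -10.0, "G12V": -13.0}))
```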


Use the following approaches:

  • foldX

Before running minimise and gromacs, remove the hydrogens and waters with repairPDB so that only the protein remains.

  • minimise
  • gromacs

foldX

Examples of how to use foldX can be found here:[2] We want to use the approach "Multiple mutations using individual list".

To run foldx you need a symbolic link to the file rotabase.txt in your working directory: ln -sf /opt/SS12-Practical/foldx/rotabase.txt .

FoldX also creates a new structure for each of the mutations. Note down all of the energies, and use these structures in the next steps as well.
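
The individual list can be written by hand or generated. This is a minimal sketch, assuming the usual FoldX individual-list notation of one mutant per line in the form wild-type residue, chain, residue number, mutant residue, terminated by a semicolon. Please check it against the foldX examples linked above; the function name is hypothetical.

```python
def foldx_individual_list(mutations, chain):
    """Format single-point mutations, one mutant per line.

    mutations is a list of (wild_type, residue_number, mutant) tuples using
    one-letter amino acid codes and the numbering of the PDB file.
    """
    lines = []
    for wild_type, number, mutant in mutations:
        lines.append("%s%s%d%s;" % (wild_type.upper(), chain, number, mutant.upper()))
    return "\n".join(lines) + "\n"


# Usage (mutations and chain are placeholders):
# with open("individual_list.txt", "w") as handle:
#     handle.write(foldx_individual_list([("R", 273, "H")], "A"))
```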

Compare the scwrl and foldx structures in Pymol and superimpose them. What are the differences?

In the next step you will use both the scwrl and the foldX structure as input to minimise.

Minimise

Here we call minimise with the input structure filename as the first argument and the output filename as the second.

Apply this for all mutant structures (produced by scwrl and foldx) and the WT.

Also, use the output of one minimisation as the input for another run. Do this 5 times for each structure. What happens to the energy? For these recursive runs, please only look at the energy. The structures should only be compared for the second minimise run.
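
The recursive runs can be scripted. The sketch below relies only on what is stated above, namely that minimise takes the input file as its first argument and the output file as its second. The output naming scheme and both function names are assumptions, and how the energy is reported is not handled here.

```python
import subprocess


def minimise_chain(first_input, rounds=5):
    """Return (input, output) file pairs for a chain of minimise runs."""
    stem = first_input[:-4] if first_input.endswith(".pdb") else first_input
    pairs = []
    current = first_input
    for round_number in range(1, rounds + 1):
        output = "%s_min%d.pdb" % (stem, round_number)
        pairs.append((current, output))
        current = output  # the next round starts from this output
    return pairs


def run_chain(pairs, program="minimise"):
    """Run the program once per pair; stops at the first failing run."""
    for source, target in pairs:
        subprocess.run([program, source, target], check=True)


# Usage (file name is a placeholder):
# run_chain(minimise_chain("wt.pdb"))
```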

Gromacs (optional task for those who love MD!)

Gromacs is a powerful molecular dynamics package that can be used to simulate nearly any atomic system at different levels of accuracy. Here we will get a basic introduction. First we will get all the necessary files and then we will minimize our protein in vacuum. Finally we will analyze the energies during the minimization.

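The Gromacs manual can be found here: http://www.gromacs.org/@api/deki/files/152/=manual-4.5.4.pdf; tutorials are available at this site: http://www.gromacs.org/Documentation/Tutorials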

Input

  • For the mutations we will ONLY use the AMBER03 forcefield!
  • For the mutation input choose either the scwrl or foldx structures. Base your decisions on the previous results and explain.

Steps

  1. Use repairPDB to clean the PDB and extract the protein only. Describe which options you chose and which other options are available. Make sure you choose the right chain. Run SCWRL with the lowercase protein sequence to make sure there are no missing side chains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.
  2. Use the gromacs command "pdb2gmx": the option -f defines the input structure, -o the gromacs output file (.gro) and -p the topology output file (.top). IMPORTANT: the input file name must end with "pdb". Choose a forcefield and, for water, the TIP3P model.
  3. Create an MDP file with the following content (for a description of the different keywords used in this file see the gromacs manual)
    title = PBSA minimization in vacuum
    cpp = /usr/bin/cpp
    define = -DFLEXIBLE -DPOSRES
    implicit_solvent = GBSA
    integrator = steep
    emtol = 1.0
    nsteps = 500
    nstenergy = 1
    energygrps = System
    ns_type = grid
    coulombtype = cut-off
    rcoulomb = 1.0
    rvdw = 1.0
    constraints = none
    pbc = no
  4. Use grompp to prepare the system for gromacs (producing the file FILE.tpr, which we use in the next step): grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr
  5. Now we minimize the system: mdrun -v -deffnm FILE
  6. Analyze the minimization of the system (Bond, Angle and Potential) with the following command: g_energy -f FILE.edr -o energy_1.xvg

The xvg graphs can be viewed with xmgrace. To obtain a PDF, choose eps output in the print settings, print, and then convert the result to pdf.
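
If you prefer to process the numbers yourself rather than use xmgrace, the xvg file is plain text and easy to parse. This is a minimal sketch with a hypothetical function name. It assumes the usual layout, in which lines starting with "#" are comments, lines starting with "@" are plot directives (including the series legends), and all other lines are whitespace-separated numeric columns.

```python
def read_xvg(lines):
    """Return (legends, rows) from the lines of an xvg file."""
    legends = []
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            parts = line.split()
            # Series legends look like: @ s0 legend "Bond"
            if len(parts) >= 3 and parts[2] == "legend" and '"' in line:
                legends.append(line.split('"')[1])
            continue
        rows.append([float(value) for value in line.split()])
    return legends, rows


# Usage: the first column of each row is the step (or time), and the
# remaining columns follow the order of the legends.
# with open("energy_1.xvg") as handle:
#     legends, rows = read_xvg(handle)
```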

Further experiments (only for the WT!)

  • Repeat steps 3 to 6 with different settings for "nsteps" and measure the run time of mdrun; create a plot of nsteps versus time.
  • Do the whole analysis for two other forcefields, chosen in step 2, in addition to AMBER03, which must be included in any case.
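
The nsteps experiment can be scripted. The sketch below uses two hypothetical helpers: one rewrites the nsteps line of the MDP text and one measures the wall-clock time of an external command. The grompp and mdrun calls in the usage comment are the ones given in the steps above; the file names and the list of nsteps values are placeholders.

```python
import re
import subprocess
import time


def set_nsteps(mdp_text, nsteps):
    """Return mdp_text with the value of the nsteps keyword replaced."""
    pattern = r"(?m)^(\s*nsteps\s*=\s*)\S+"
    new_text, count = re.subn(pattern, r"\g<1>%d" % nsteps, mdp_text)
    if count != 1:
        raise ValueError("expected exactly one nsteps line, found %d" % count)
    return new_text


def time_command(command):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


# Usage (placeholders throughout):
# template = open("FILE.mdp").read()
# for nsteps in (50, 100, 500, 1000):
#     with open("run.mdp", "w") as handle:
#         handle.write(set_nsteps(template, nsteps))
#     subprocess.run(["grompp", "-v", "-f", "run.mdp", "-c", "FILE.gro",
#                     "-p", "FILE.top", "-o", "run.tpr"], check=True)
#     print(nsteps, time_command(["mdrun", "-v", "-deffnm", "run"]))
```

Note that the minimization may stop before nsteps is reached if the emtol criterion is met, so the measured time need not keep growing with nsteps.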