Structure-based mutation analysis ARSA

As can be seen, the mutations are spread throughout the protein. Some lie near functional sites, others are far from them. The table below again shows PyMOL visualizations of all mutations, this time each one separately. With this, we want to investigate how the location of a mutation relative to the functional sites correlates with its effect on protein function.

Nr.	mutation	position	Pymol image	Description	Effect on function
1	Asp-Asn	29		The mutation is located at the position of a metal-binding site.	harmful
2	Pro-Ala	136		The mutation is located near the active site and a substrate binding site in sequence as well as in structure.	harmful
3	Gln-His	153		The mutation is located near the active site and a substrate binding site in sequence as well as in structure.	harmful
4	Trp-Cys	193		The mutation is at moderate distance to all important functional sites of the protein.	neutral
5	Thr-Met	274		The mutation is located very distant from the active site and a substrate binding site in sequence as well as in structure. It is located within a beta sheet.	harmful
6	Phe-Val	356		The mutation is at moderate distance to all important functional sites of the protein.	neutral
7	Thr-Ile	409		The mutation is not close to important functional sites.	harmful
8	Asn-Ser	440		The mutation is very distant from all functional sites.	neutral
9	Cys-Gly	489		The mutation is very distant from all functional sites.	harmful
10	Arg-His	496		The mutation is very distant from all functional sites.	harmful

The visualizations above indicate that mutations near functionally important sites of the protein are likely to be harmful. For distant mutations, however, no trend can be observed.

SCWRL

First, we extracted the amino acid sequence from our PDB file and converted it to lower case.


repairPDB arsa_model.pdb -seq > arsa.model.seq
tr '[:upper:]' '[:lower:]' < arsa.model.seq > arsa.model.lower.seq

Next, we wrote each mutation as a capital letter into a separate sequence file and executed SCWRL with the following command:


scwrl cmd
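The preparation of the per-mutation sequence files described above can be sketched in Python. This is only an illustration, not the procedure we actually used: the function name `make_scwrl_sequence` and its parameters are hypothetical, and it assumes that residue numbering is contiguous, so that a residue number can be converted to a sequence index by subtracting the number of the first residue.

```python
def make_scwrl_sequence(sequence, position, new_aa, expected_wt=None, first_residue=1):
    """Return the sequence in lower case with one mutated residue in upper case.

    position      -- residue number of the mutation
    first_residue -- residue number of the first letter in `sequence`
                     (assumption: numbering is contiguous, no gaps)
    expected_wt   -- optional one-letter code of the wild-type residue,
                     used as a sanity check on the numbering
    """
    seq = sequence.strip().lower()
    index = position - first_residue
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} lies outside the sequence")
    if expected_wt is not None and seq[index] != expected_wt.lower():
        raise ValueError(
            f"expected {expected_wt} at position {position}, found {seq[index].upper()}"
        )
    # Everything stays lower case except the single mutated residue.
    return seq[:index] + new_aa.upper() + seq[index + 1:]


# Toy sequence for illustration only (not the ARSA sequence).
print(make_scwrl_sequence("MDAK", 2, "N", expected_wt="D"))  # prints mNak
```

Passing `expected_wt` is a cheap guard against off-by-one errors between the PDB residue numbering and the extracted sequence.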

We also ran SCWRL on the wild type (wt) structure in order to make it comparable to energy predictions by other programs. The minimal energy of the graph for the wt is 415.134.

Nr.	mutation	position	Reference amino acid	mutated amino acid	both (without H-bonds)	Minimal energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29				419.996	1.011712
2	Pro-Ala	136				415.133	0.9999976
3	Gln-His	153				420.494	1.012911
4	Trp-Cys	193				416.252	1.002693
5	Thr-Met	274				434.014	1.045479
6	Phe-Val	356				415.134	1
7	Thr-Ile	409				414.481	0.998427
8	Asn-Ser	440				418.863	1.008983
9	Cys-Gly	489				415.136	1.000005
10	Arg-His	496				421.011	1.014157

FoldX

wt: 510.88

Nr.	mutation	position	Minimal energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29	496.88	0.9725963
2	Pro-Ala	136	493.81	0.966587
3	Gln-His	153	493.90	0.9667632
4	Trp-Cys	193	496.23	0.971324
5	Thr-Met	274	503.34	0.9852412
6	Phe-Val	356	495.39	0.9696798
7	Thr-Ile	409	495.45	0.9697972
8	Asn-Ser	440	496.79	0.9724201
9	Cys-Gly	489	495.87	0.9706193
10	Arg-His	496	498.75	0.9762567

Minimise

wt: -3839.492677

Nr.	mutation	position	Free energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29	-4174.487222	1.087250
2	Pro-Ala	136	-4164.523707	1.084655
3	Gln-His	153	-4109.12924	1.070227
4	Trp-Cys	193	-4169.617285	1.085981
5	Thr-Met	274	-4065.730562	1.058924
6	Phe-Val	356	-4109.021939	1.070199
7	Thr-Ile	409	-4121.130656	1.073353
8	Asn-Ser	440	-4120.275589	1.073130
9	Cys-Gly	489	-4127.969116	1.075134
10	Arg-His	496	-4120.151911	1.073098
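The Energy(mutant)/Energy(wt) columns in the three tables above can be recomputed with a few lines of Python. The sketch below uses the wild-type energies and the values for mutation 1 from the tables; the helper `compare` is hypothetical. It also reports the plain difference, because a ratio is hard to read when both energies are negative: for Minimise, a ratio above 1 means that the mutant energy is lower (more negative) than the wild-type energy, whereas for SCWRL and FoldX a ratio above 1 means that it is higher.

```python
# Energies copied from the tables above (wild type and mutation 1, Asp-Asn 29).
WILD_TYPE = {"SCWRL": 415.134, "FoldX": 510.88, "Minimise": -3839.492677}
MUTATION_1 = {"SCWRL": 419.996, "FoldX": 496.88, "Minimise": -4174.487222}


def compare(mutant, wt):
    """Return (ratio, difference) of a mutant energy relative to the wild type."""
    return mutant / wt, mutant - wt


for program, wt in WILD_TYPE.items():
    ratio, delta = compare(MUTATION_1[program], wt)
    direction = "higher" if delta > 0 else "lower"
    print(f"{program:9s} ratio = {ratio:.6f}  difference = {delta:+.3f} ({direction} than wt)")
```

The sign of the difference is unambiguous for all three programs, which makes it a useful companion to the ratio when the programs are compared.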

Gromacs

Gromacs originally stood for 'GROningen MAchine for Chemical Simulations' and is a package to perform molecular dynamics. It is Free Software, available under the GNU General Public License (versions from 4.6 onward use the GNU Lesser General Public License).

Workflow

1. Use fetchpdb to get the pdb structure – look at the script, what does it do?

-> We did not use fetchpdb, as we already had the required PDB file. The script simply downloads the PDB file for a given PDB ID from the PDB website.

2. Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.

-> TODO

3. Run SCWRL with the lowercase protein sequence to make sure there are no missing sidechains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.

-> TODO

4. Use the gromacs command “pdb2gmx”: the option -f defines the input structure, -o the gromacs output file (.gro) and -p the topology output file (.top). IMPORTANT: the name of the input file must end with “pdb”. Choose a force field and, for water, the TIP3P model.

-> We chose the AMBER03 force field, as it is optimized for use with proteins.

5. Create a MDP file with the following content

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no

Give a brief description of the different keywords used in this file - for this you should use the gromacs manual.

-> -DFLEXIBLE: water in the topology is treated as flexible
-> -DPOSRES: position restraints, defined in posre.itp, are included in the topology
-> implicit_solvent = GBSA: defines the implicit solvent model; here the Generalized Born formalism is used
-> integrator = steep: defines the algorithm for energy minimization; here the steepest descent algorithm is used
-> emtol: convergence criterion; the minimization is stopped when the maximum force is smaller than this value (in kJ mol^-1 nm^-1)
-> nsteps: maximum number of steps in the energy minimization
-> nstenergy: interval (in steps) at which energies are written to the energy file
-> energygrps: group(s) for which energies are written to the energy file
-> ns_type: type of neighbor searching; here a grid is built and only atoms in neighboring grid cells are checked when a new neighbor list is constructed
-> coulombtype = cut-off: twin-range cut-off with neighbor-list cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist
-> rcoulomb: distance for the Coulomb cut-off (in nm)
-> rvdw: distance for the LJ or Buckingham cut-off (in nm)
-> constraints = none: no constraints are applied besides the ones defined explicitly in the topology file
-> pbc = no: use no periodic boundary conditions and ignore the box
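To check that every keyword of the MDP file above has a description in this list, the file can be parsed into key/value pairs. The following Python sketch is an illustration under simple assumptions: one `key = value` pair per line, `;` starting a comment, and a hand-written set `DESCRIBED` that mirrors the list above. The names `parse_mdp` and `DESCRIBED` are hypothetical.

```python
MDP_TEXT = """\
title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no
"""

# Keywords that are explained in the list above (define via -DFLEXIBLE/-DPOSRES).
DESCRIBED = {
    "define", "implicit_solvent", "integrator", "emtol", "nsteps", "nstenergy",
    "energygrps", "ns_type", "coulombtype", "rcoulomb", "rvdw", "constraints", "pbc",
}


def parse_mdp(text):
    """Parse 'key = value' lines into a dict, ignoring ';' comments and blank lines."""
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {line!r}")
        params[key.strip()] = value.strip()
    return params


params = parse_mdp(MDP_TEXT)
undocumented = [key for key in params if key not in DESCRIBED]
print("keywords without a description:", undocumented)
```

For the file above, the only keywords not covered by the list are `title` and `cpp`.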

6. Use grompp to prepare the system for gromacs: grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr. FILE.tpr is the system file to be created, which we use in the next step.

-> commandline: grompp -v -f gromacs_AMBER03.mdp -c gromacs_AMBER03.gro -p gromacs_AMBER03.top -o gromacs_AMBER03.tpr

7. Now we minimize the system: mdrun -v -deffnm FILE

-> commandline: mdrun -v -deffnm gromacs_AMBER03
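Since steps 6 and 7 have to be repeated for every structure, the two command lines can be generated from a common base name. The Python sketch below only builds and prints the commands, using the options given in the task descriptions of steps 6 and 7; the function name `minimization_commands` is hypothetical, and actually running the commands (for example with `subprocess.run(cmd, check=True)`) assumes that grompp and mdrun are installed and that the input files exist.

```python
import shlex


def minimization_commands(basename):
    """Return the grompp and mdrun argument lists for one structure."""
    grompp = [
        "grompp", "-v",
        "-f", f"{basename}.mdp",   # minimization parameters
        "-c", f"{basename}.gro",   # structure written by pdb2gmx
        "-p", f"{basename}.top",   # topology written by pdb2gmx
        "-o", f"{basename}.tpr",   # system file used by mdrun
    ]
    mdrun = ["mdrun", "-v", "-deffnm", basename]
    return [grompp, mdrun]


for cmd in minimization_commands("gromacs_AMBER03"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids quoting problems if a file name ever contains spaces.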

8. Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace; in the print settings you can choose eps output, then print and convert to PDF.
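As an alternative to inspecting the graphs in xmgrace, the xvg files can be read directly. The Python sketch below assumes the usual layout of these files: lines starting with `#` are comments, lines starting with `@` are xmgrace directives (including the legends), and all remaining lines are whitespace-separated numeric columns. The function `parse_xvg` is hypothetical and the sample data is made up for illustration; it is not output from our runs.

```python
import re

# Matches xmgrace legend directives such as: @ s0 legend "Potential"
LEGEND = re.compile(r'^@\s*s(\d+)\s+legend\s+"(.*)"')


def parse_xvg(text):
    """Return (legends, rows) from the text of an xvg file."""
    legends, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # comment or blank line
        if line.startswith("@"):
            match = LEGEND.match(line)
            if match:
                legends.append(match.group(2))
            continue                      # other plot directives are ignored
        rows.append([float(value) for value in line.split()])
    return legends, rows


# Made-up sample, NOT real output of g_energy for ARSA.
SAMPLE = """\
# made-up example
@    title "Gromacs Energies"
@ s0 legend "Potential"
    0.000000  -1000.0
    1.000000  -1500.5
"""

legends, rows = parse_xvg(SAMPLE)
change = rows[-1][1] - rows[0][1]
print(legends[0], "changed by", change, "over", len(rows), "frames")
```

The first column is the step or time, and each following column corresponds to one legend entry, so the change of Bond, Angle and Potential over the minimization can be read off from the first and last row.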

Results

Mutation 1

The mutation is located at the metal-binding site of the protein. From this information alone, we would already suggest that the mutation is non-neutral. However, we also want to take the other available information into account. The H-bond pattern changes. The energy predictions disagree: SCWRL reports a slightly higher minimal energy for the mutant than for the wild type (ratio 1.012), whereas FoldX (ratio 0.973) and Minimise (ratio 1.087 between two negative energies) report a lower energy for the mutant. Taking a lower energy as an indication of higher stability, SCWRL thus points to a slight destabilization and the other two programs to a stabilization. In summary, the location of the mutation and the changed H-bond pattern suggest that this is a harmful mutation. The fold changes in energy are small and thus give no striking evidence either way.
The mutation is indeed harmful.

Mutation 2

Mutation 2 is located near the active site and a substrate binding site, in sequence as well as in structure. According to the PyMOL mutagenesis analysis, the H-bond pattern does not change. Again, the energy fold changes are small and inconsistent: SCWRL reports a virtually unchanged energy (ratio 0.9999976), whereas FoldX (ratio 0.967) and Minimise (ratio 1.085 between two negative energies) report a lower energy for the mutant. As for mutation 1, we are left with only the location of the mutation and the H-bond pattern. In this case we cannot make a reliable guess based on this information, although the location of the mutation could be an indicator that it is non-neutral.
HGMD assigns a harmful effect to the mutation.

Mutation 3

Mutation 4

Mutation 5

Mutation 6

Mutation 7

Mutation 8

Mutation 9

Mutation 10
