by Benjamin Drexler and Fabian Grandke

Introduction

In this task we analyse the structure of our protein to find out which effects the point mutations have. To this end, we created mutated structures and compared them to the wildtype protein, using several tools that are based on different methods. We used the mutations chosen in the previous task.

Methods

In the first step of this task we had to find the available structures for our protein and decide which one is best suited for a detailed analysis. We set several cutoffs to exclude unsuitable structures. The following tools were used to perform the energy calculations, as described in the task description.

SCWRL

SCWRL was initially developed by Dunbrack et al. in 1997. We use SCWRL4<ref name=dunb>G. G. Krivov, M. V. Shapovalov, and R. L. Dunbrack, Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins (2009)</ref>, which was published in 2009. The program takes a PDB file and a sequence file as input. Using a rotamer library, collision detection and a residue interaction graph, it calculates the optimal side-chain conformation based on the backbone and the mutated sequence given in the input files. The output is a PDB file containing the conformation; the total minimal energy of the graph is written to STDOUT.
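The mutated sequence for the sequence file can be derived from the wildtype sequence by swapping a single residue. The following is a minimal sketch, not the script we used: the function name and the toy sequence are hypothetical, positions are assumed to be 1-based indices into the given sequence, and any offset between PDB residue numbering and sequence index has to be handled separately.

```python
def apply_point_mutation(sequence, position, wild_type, mutant):
    """Return `sequence` with the residue at 1-based `position` replaced by `mutant`."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[index]
    # Guard against off-by-one errors: the expected wildtype residue must be present.
    if found.upper() != wild_type.upper():
        raise ValueError(f"expected {wild_type} at position {position}, found {found}")
    return sequence[:index] + mutant.upper() + sequence[index + 1:]


# Hypothetical toy sequence, NOT the sequence of our protein.
toy_sequence = "MKTAYIAK"
print(apply_point_mutation(toy_sequence, 1, "M", "T"))  # TKTAYIAK
```

Checking the wildtype residue before replacing it catches numbering mistakes early, before a wrong structure is modelled.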

FoldX

FoldX was developed by Serrano et al. in 2002<ref name=serr>Guerois R, Nielsen JE, Serrano L., Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutation. Journal of Molecular Biology (2002)</ref>. We used FoldX 3.0 beta 4. The program calculates the energy effects of point mutations. It provides different run modes, but basically it takes a PDB file as input, calculates several individual energy terms (e.g. van der Waals, electrostatics, ...) and returns them together with the total energy as output.

Minimise

Before using this tool from the virtual box, we had to remove the hydrogens and waters from the PDB file with the script repairPDB. Afterwards we were able to compare the energy differences between the wildtype and the mutated protein.

GROMACS

GROMACS is mainly a package for molecular dynamics simulations, but it also provides energy calculations. For the mutations we used the force field AMBER03, and for the wildtype AMBER03, AMBERGS and CHARMM27. In addition to the energy calculation, we did a runtime analysis with values from nsteps=10 to nsteps=1500. The outcome is shown in the results section of this task. According to the task description we created an MDP file with the following content:

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no

Keyword	Description<ref name=manual>Gromacs Manual</ref>
General
title	Name of Project
cpp	Location of c-preprocessor
Preprocessing
define	Defines to pass to the preprocessor; -DFLEXIBLE: include flexible water instead of rigid water in your topology; -DPOSRES: include posre.itp in your topology, used for position restraints
Implicit Solvent
implicit_solvent	Simulation with implicit solvent using the Generalized Born formalism
Run Control
integrator	Steepest descent algorithm for energy minimization
nsteps	Maximum number of steps to integrate or minimize
Energy minimization
emtol	The minimization is converged when the maximum force is smaller than this value
Output
nstenergy	Frequency to write energies to energy file
Tables
energygrps	Group(s) to write to energy file
Neighbor searching
ns_type	Type of neighbor searching
pbc	Periodic boundary conditions; no: use no periodic boundary conditions
Electrostatics
coulombtype	Type of coulomb energy
rcoulomb	Distance for the Coulomb cut-off
VDW
rvdw	Distance for the LJ or Buckingham cut-off
Bonds
constraints	Which constraints should be used
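For the runtime analysis, one MDP file per nsteps value is needed. The following is a minimal sketch of how such files could be generated from the parameters listed above. The function names, the file naming scheme and the intermediate nsteps values are assumptions for illustration; only the end points 10 and 1500 are taken from our analysis.

```python
from pathlib import Path

# Parameters as listed in the MDP file above; nsteps is overridden per run.
BASE_PARAMETERS = {
    "title": "PBSA minimization in vacuum",
    "cpp": "/usr/bin/cpp",
    "define": "-DFLEXIBLE -DPOSRES",
    "implicit_solvent": "GBSA",
    "integrator": "steep",
    "emtol": "1.0",
    "nsteps": "500",
    "nstenergy": "1",
    "energygrps": "System",
    "ns_type": "grid",
    "coulombtype": "cut-off",
    "rcoulomb": "1.0",
    "rvdw": "1.0",
    "constraints": "none",
    "pbc": "no",
}


def render_mdp(parameters, nsteps):
    """Return the MDP file content with `nsteps` replaced by the given value."""
    settings = dict(parameters, nsteps=nsteps)
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


def write_scan(directory, step_values):
    """Write one MDP file per nsteps value and return the created paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for nsteps in step_values:
        path = directory / f"em_nsteps_{nsteps}.mdp"  # hypothetical naming scheme
        path.write_text(render_mdp(BASE_PARAMETERS, nsteps))
        paths.append(path)
    return paths


# Assumed scan values; only 10 and 1500 are given in the text.
SCAN_VALUES = [10, 100, 500, 1000, 1500]
print(render_mdp(BASE_PARAMETERS, SCAN_VALUES[0]))
```

Each generated file can then be preprocessed and run separately, so that the elapsed time can be recorded per nsteps value.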

Within the GROMACS work step we used the script fetchpdb. It checks if the given input is a valid PDB entry. If the check was successful it downloads the PDB file, extracts it and removes the packed version.
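The behaviour of fetchpdb described above can be sketched as follows. This is not the original script: the download URL pattern, the format check and all function names are assumptions made for illustration. In this sketch, an ID that is well-formed but does not exist is only detected when the download fails.

```python
import gzip
import re
import shutil
import urllib.request
from pathlib import Path

# Assumption: RCSB download URL pattern; the original script may use another source.
URL_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb.gz"
# A classic PDB ID consists of a digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Check only the format of the ID, not whether the entry exists."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def unpack_and_remove(packed_path):
    """Extract a .gz file next to itself and delete the packed version."""
    packed_path = Path(packed_path)
    target = packed_path.with_suffix("")  # drops the trailing ".gz"
    with gzip.open(packed_path, "rb") as source, open(target, "wb") as sink:
        shutil.copyfileobj(source, sink)
    packed_path.unlink()
    return target


def fetch_pdb(pdb_id, directory="."):
    """Download, extract and clean up a PDB entry; raises on invalid input."""
    if not looks_like_pdb_id(pdb_id):
        raise ValueError(f"{pdb_id!r} is not a valid PDB ID")
    packed = Path(directory) / f"{pdb_id.upper()}.pdb.gz"
    # Raises urllib.error.HTTPError if the entry does not exist.
    urllib.request.urlretrieve(URL_TEMPLATE.format(pdb_id=pdb_id.upper()), str(packed))
    return unpack_and_remove(packed)
```

A call such as `fetch_pdb("3HG3")` would then leave only the unpacked file `3HG3.pdb` in the working directory.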

Results

Structure Selection

There are several structure files available for our protein:

PDB ID	Resolution [Å]	pH	R-Factor	Coverage [%]	Missing Residues
1R46	3.25	8.0	0.262	99.7	422-429
1R47	3.45	8.0	0.285	99.5	422-429
3GXN	3.01	NULL	0.239	88.08	422-429
3GXP	2.20	NULL	0.204	81.9	422-429
3GXT	2.70	NULL	0.245	97.29	422-429
3HG2	2.30	4.6	0.178	97.32	422-429
3HG3	1.90	6.5	0.167	98.64	427-435
3HG4	2.30	4.6	0.166	99.86	422-429
3HG5	2.30	4.6	0.192	100	422-429
3LX9	2.04	6.5	0.178	98.92	423-435
3LXA	3.04	6.5	0.216	99.52	427-435
3LXB	2.85	6.5	0.227	99.3	427-435
3LXC	2.35	6.5	0.186	98.31	423-435

We set two cutoffs to decide which structures are excluded:

pH: < 6.5
resolution: > 2.7 Å

After applying the cutoffs to our set of structures, three were left: 3HG3, 3LX9 and 3LXC (exclusion factors are colored red in the table). 3HG3 was slightly better than the other two, so we decided to use it (worse values are colored gray in the table). Additionally, 3HG3 has the best resolution of all structures and the lowest R-factor of the three remaining ones (colored green). As the missing residues are very similar for all structures, they are not taken into account further.
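The selection can be reproduced mechanically from the table. The following is a minimal sketch, assuming that entries without a reported pH (NULL) count as excluded, which is the reading that leaves exactly three structures; the variable and function names are hypothetical.

```python
# (PDB ID, resolution in Å, pH or None for NULL, R-factor), taken from the table above.
STRUCTURES = [
    ("1R46", 3.25, 8.0, 0.262),
    ("1R47", 3.45, 8.0, 0.285),
    ("3GXN", 3.01, None, 0.239),
    ("3GXP", 2.20, None, 0.204),
    ("3GXT", 2.70, None, 0.245),
    ("3HG2", 2.30, 4.6, 0.178),
    ("3HG3", 1.90, 6.5, 0.167),
    ("3HG4", 2.30, 4.6, 0.166),
    ("3HG5", 2.30, 4.6, 0.192),
    ("3LX9", 2.04, 6.5, 0.178),
    ("3LXA", 3.04, 6.5, 0.216),
    ("3LXB", 2.85, 6.5, 0.227),
    ("3LXC", 2.35, 6.5, 0.186),
]

MIN_PH = 6.5          # structures with pH < 6.5 are excluded
MAX_RESOLUTION = 2.7  # structures with resolution > 2.7 Å are excluded


def passes_cutoffs(resolution, ph):
    # Assumption: a missing pH value (NULL) leads to exclusion.
    return ph is not None and ph >= MIN_PH and resolution <= MAX_RESOLUTION


remaining = [s for s in STRUCTURES if passes_cutoffs(s[1], s[2])]
# Rank the remaining structures by resolution, then by R-factor (lower is better).
best = min(remaining, key=lambda s: (s[1], s[3]))

print([s[0] for s in remaining])  # ['3HG3', '3LX9', '3LXC']
print(best[0])                    # 3HG3
```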

Visual Examination of the Mutations

Figure 1 shows the protein α-galactosidase A and the residues which will be mutated. In the following sections, we compare the side chain conformation of the mutated residues and discuss the influence of the mutation. Aspects will be, inter alia, loss of polar interactions and clashes with other residues.

Figure 1: Representation of the protein α-galactosidase A. The residues which will be mutated are colored red. Asp170 and Asp231 are part of the active site and colored blue. The ligand is colored green.

M42T (Mutation 1)

Close-up of the wildtype residue number 42 of GLA.

Close-up of the mutated residue number 42 of GLA.

S65T (Mutation 2)

Close-up of the wildtype residue number 65 of GLA.

Close-up of the mutated residue number 65 of GLA.

I117S (Mutation 3)

Close-up of the wildtype residue number 117 of GLA.

Close-up of the mutated residue number 117 of GLA.

A143T (Mutation 4)

Close-up of the wildtype residue number 143 of GLA.

Close-up of the mutated residue number 143 of GLA.

H186R (Mutation 5)

Close-up of the wildtype residue number 186 of GLA.

Close-up of the mutated residue number 186 of GLA.

P205T (Mutation 6)

Close-up of the wildtype residue number 205 of GLA.

Close-up of the mutated residue number 205 of GLA.

D244H (Mutation 7)

Close-up of the wildtype residue number 244 of GLA.

Close-up of the mutated residue number 244 of GLA.

Q283P (Mutation 8)

Close-up of the wildtype residue number 283 of GLA.

Close-up of the mutated residue number 283 of GLA.

Q321E (Mutation 9)

Close-up of the wildtype residue number 321 of GLA.

Close-up of the mutated residue number 321 of GLA.

R363C (Mutation 10)

Close-up of the wildtype residue number 363 of GLA.

Close-up of the mutated residue number 363 of GLA.

Energy Comparison

The results of the energy comparison are presented in the table below. The Diff columns give the wildtype energy minus the energy of the mutant. Because the Minimise result of the first run for the eighth mutation clearly differed from all other results, the run was repeated with the output of the first run as input; this second run is listed as number 8.2. The observation suggests that Minimise is less tolerant of clashes than the other tools. The SCWRL4 and FoldX results for the eighth mutation do not stand out and do not seem to be affected by the fact that a proline was inserted into a helix. Apart from some variance, their results are almost equal across all mutations. Only the comparison of the FoldX results of the mutants with the wildtype shows that the inserted mutations have a large influence on the energy of the protein.

Number	AA-Position	Codon change	Amino acid change	SCWRL4	FoldX	FoldX - Diff	Minimise	Minimise - Diff
WT				-	-20.93	-	-20481.23	-
1	42	ATG-ACG	Met -> Thr	343.25	157.29	-178.22	-20324.41	-156.82
2	65	AGT-ACG	Ser -> Thr	327.798	152.87	-173.8	-20339.34	-141.89
3	117	ATT-AGT	Ile -> Ser	333.027	157.97	-178.9	-20353.47	-127.76
4	143	cGCA-ACA	Ala -> Thr	333.944	154.40	-175.33	-20339.32	-141.91
5	186	CAC-CGC	His -> Arg	323.717	154.57	-175.5	-20321.32	-159.91
6	205	gCCT-ACT	Pro -> Thr	340.619	155.96	-176.89	-20345.87	-135.36
7	244	gGAC-CAC	Asp -> His	333.594	152.08	-173.01	-20393.12	-88.11
8	283	CAG-CCG	Gln -> Pro	332.631	159.91	-180.84	-8027.71	-12453.52
8.2	-	-	-	-	-	-	-19134.48	-1346.95
9	321	tCAG-TAG	Gln -> Glu	332.853	160.95	-181.88	-20246.98	-234.25
10	363	TATa-TAA	Arg -> Cys	330.56	150.50	-171.43	-20295.77	-185.46
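The Minimise - Diff column and the conspicuous first run of mutation 8 can be checked with a few lines of code. The following is a minimal sketch using the first-run Minimise energies from the table; the outlier rule (absolute difference larger than five times the median absolute difference) is an arbitrary assumption chosen for illustration and not part of our protocol.

```python
from statistics import median

WILD_TYPE_MINIMISE = -20481.23

# Minimise energies of the first run per mutation, taken from the table above.
MINIMISE = {
    1: -20324.41,
    2: -20339.34,
    3: -20353.47,
    4: -20339.32,
    5: -20321.32,
    6: -20345.87,
    7: -20393.12,
    8: -8027.71,
    9: -20246.98,
    10: -20295.77,
}


def energy_differences(wild_type, mutants):
    """Diff as used in the table: wildtype energy minus mutant energy."""
    return {number: round(wild_type - energy, 2) for number, energy in mutants.items()}


def flag_outliers(differences, factor=5.0):
    """Return mutations whose |diff| exceeds `factor` times the median |diff|."""
    typical = median(abs(diff) for diff in differences.values())
    return [number for number, diff in differences.items() if abs(diff) > factor * typical]


differences = energy_differences(WILD_TYPE_MINIMISE, MINIMISE)
print(differences[1])              # -156.82
print(flag_outliers(differences))  # [8]
```

With these values only the first run of mutation 8 is flagged, which matches the run that was repeated as 8.2.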

Gromacs

Figure 11: nsteps vs. elapsed time in GROMACS.

Wildtype

Force Field	Average	Error Estimate	RMSD	Tot-Drift (kJ/mol)
Bond
AMBERGS	1826.99	420	4409.39	-2499.37
AMBER03	1639.74	410	4358.68	-2424.42
CHARMM27	2908.14	350	4779.8	-2033.44
Angle
AMBERGS	5496.47	74	476.18	408.548
AMBER03	5324.13	72	469.75	369.24
CHARMM27	7975.2	86	798.12	432.901
Potential
AMBERGS	-114713	1200	5648.79	-7915.46
AMBER03	-91307.7	1200	5559.82	-7839.05
CHARMM27	136.699	32	64.3892	227.896

Mutations

Mutation	Average	Error Estimate	RMSD	Tot-Drift (kJ/mol)
Bond
1	1815.39	570	5166.85	-3384.48
2	1862.77	610	5331.85	-3618.04
3	1773.13	520	4937.34	-3068.93
4	1828.63	580	5229.18	-3479.09
5	1870.95	610	5361.67	-3713.22
6	1816.6	550	5091.81	-3303.34
7	1819.7	570	5173.34	-3397.07
8	2992.15	1700	-nan	-10631.8
9	2083.16	830	-nan	-4913.82
10	1867.42	620	5390.82	-3693.03
Angle
1	5183.95	85	360.959	550.303
2	5195.33	80	364.473	515.645
3	5196.5	89	353.256	586.473
4	5175.59	85	364.496	547.465
5	5113.99	81	365.511	526.244
6	5200.44	85	356.964	553.934
7	5261.77	87	365.202	565.196
8	5178.73	76	-nan	215.036
9	5201.95	76	-nan	442.141
10	5174.48	88	375.775	555.294
Potential
1	-90528.4	1600	7234.09	-10149.1
2	-90481.9	1600	7442.03	-10340
3	-90654	1500	6928.73	-9614.54
4	-90541	1600	7311.04	-10343.7
5	-91011.7	1600	7484.45	-10592.5
6	-90782.2	1600	7226.99	-10188.5
7	-90232.9	1600	7236.24	-10198
8	-87316	3600	-nan	-23670.3
9	-90090.3	1900	-nan	-12335.3
10	-89721.8	1700	7523.88	-10750.1

Mutation	Plot
1	Gromacs energy calculation for mutation 1.
2	Gromacs energy calculation for mutation 2.
3	Gromacs energy calculation for mutation 3.
4	Gromacs energy calculation for mutation 4.
5	Gromacs energy calculation for mutation 5.
6	Gromacs energy calculation for mutation 6.
7	Gromacs energy calculation for mutation 7.
8	Gromacs energy calculation for mutation 8.
9	Gromacs energy calculation for mutation 9.
10	Gromacs energy calculation for mutation 10.