Structure-based mutation analysis ARSA

The structures 2AIJ and 2AIK can be eliminated immediately, as they resolve only a very small part of the enzyme (residues 69-73 and 68-74, respectively). Structures 1E1Z, 1E2S, 1E33, 1E3C, 1N2K and 1N2L have resolutions between 2.35 and 3.20 Å and low R-factors (0.174 to 0.202). However, these are all mutant structures and therefore not applicable to our analysis, as we need the wild type structure.
The only structure left is 1AUK, which we also used in previous tasks. This structure has a good resolution (2.10 Å) and a sufficiently low R-factor (0.248). As we already noticed in previous tasks, however, this structure lacks six residues in the middle of the protein, so we could not use the original structure for the subsequent analyses. The missing residues were therefore first modelled with programs written by Marc Offman. We use the resulting gap-free model for the subsequent analyses.
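
The selection described above can be written down as a small filter. This is only a sketch in Python: the numbers are copied from the structure table of this page, the `is_mutant` flags restate our text (they are not read from the PDB), and the `min_length` threshold of 400 residues is an arbitrary assumption.

```python
# (PDB ID, resolution in Å, first residue, last residue, R-factor, is_mutant)
# is_mutant restates the text; for 2AIJ/2AIK it is not stated and set to False,
# which does not matter because both fail the length filter anyway.
CANDIDATES = [
    ("1AUK", 2.10, 19, 507, 0.248, False),
    ("1E1Z", 2.40, 19, 507, 0.196, True),
    ("1E2S", 2.35, 19, 507, 0.194, True),
    ("1E33", 2.50, 19, 507, 0.187, True),
    ("1E3C", 2.65, 19, 507, 0.174, True),
    ("1N2K", 2.75, 19, 507, 0.202, True),
    ("1N2L", 3.20, 19, 507, 0.182, True),
    ("2AIJ", 1.55, 69, 73, 0.146, False),
    ("2AIK", 1.73, 68, 74, 0.142, False),
]


def usable_structures(candidates, min_length=400):
    """Keep wild-type structures that resolve at least min_length residues."""
    selected = []
    for pdb_id, resolution, first, last, r_factor, is_mutant in candidates:
        length = last - first + 1
        if length >= min_length and not is_mutant:
            selected.append(pdb_id)
    return selected


print(usable_structures(CANDIDATES))  # ['1AUK']
```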

Visualization with PyMOL

The following image shows a PyMOL visualization of Arylsulfatase A, together with all known active and binding sites.

PyMOL visualization of ARSA (with closed gaps). The active site is depicted in yellow, the metal-binding site in blue, substrate binding sites in green and missense mutations in red.

One can see that the mutations are spread throughout the protein. Some lie near functional sites, others are far from them. The table below shows PyMOL visualizations of each mutation separately. With these, we investigate how the location of a mutation relative to the functional sites correlates with its effect on protein function.

Nr.	mutation	position	Pymol image	Description	Effect on function
1	Asp-Asn	29		The mutation is located at the position of a metal-binding site.	harmful
2	Pro-Ala	136		The mutation is located near the active site and a substrate binding site in sequence as well as in structure.	harmful
3	Gln-His	153		The mutation is located near the active site and a substrate binding site in sequence as well as in structure.	harmful
4	Trp-Cys	193		The mutation is at moderate distance to all important functional sites of the protein.	neutral
5	Thr-Met	274		The mutation is located very distant from the active site and a substrate binding site in sequence as well as in structure. It is located within a beta sheet.	harmful
6	Phe-Val	356		The mutation is at moderate distance to all important functional sites of the protein.	neutral
7	Thr-Ile	409		The mutation is not close to important functional sites.	harmful
8	Asn-Ser	440		The mutation is very distant from all functional sites.	neutral
9	Cys-Gly	489		The mutation is very distant from all functional sites.	harmful
10	Arg-His	496		The mutation is very distant from all functional sites.	harmful

The visualizations above indicate that mutations near functionally important sites of the protein are likely to be harmful. For distant mutations, however, no trend can be observed.
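
Proximity to a functional site can also be quantified instead of judged by eye. A minimal Python sketch, assuming that C-alpha distances are an acceptable proxy: it reads ATOM records in the fixed-column PDB layout and returns the smallest distance between a mutated residue and a set of site residues. The helper `toy_atom`, all coordinates, and the use of residue 69 as a site residue are invented for illustration; in practice the lines would come from the gap-closed model.

```python
import math


def toy_atom(serial, res_name, res_num, x, y, z):
    """Build one C-alpha ATOM record in fixed-column PDB layout (illustration only)."""
    return (f"ATOM  {serial:5d}  CA  {res_name} A{res_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def ca_coordinates(pdb_lines):
    """Map residue number -> (x, y, z) of its C-alpha atom."""
    coords = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])
            coords[res_num] = (float(line[30:38]),
                               float(line[38:46]),
                               float(line[46:54]))
    return coords


def min_distance(coords, residue, site_residues):
    """Smallest C-alpha distance (in Å) from `residue` to any residue of the site."""
    return min(math.dist(coords[residue], coords[s]) for s in site_residues)


# Invented coordinates, not taken from 1AUK.
toy_pdb = [
    toy_atom(1, "ASP", 29, 0.0, 0.0, 0.0),
    toy_atom(2, "CYS", 69, 0.0, 0.0, 5.0),
    toy_atom(3, "PRO", 136, 3.0, 4.0, 0.0),
]
coords = ca_coordinates(toy_pdb)
print(min_distance(coords, 136, [29, 69]))
```

A distance cut-off for "near" would still have to be chosen; we did not define one in this analysis.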

SCWRL

First, we extracted the amino acid sequence from our pdb file and converted it to lower case.


repairPDB arsa_model.pdb -seq > arsa.model.seq
tr '[:upper:]' '[:lower:]' < arsa.model.seq > arsa.model.lower.seq

Next, we included the individual mutations as capital letters in separate files and executed SCWRL with the following command:


scwrl cmd
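
Preparing the ten sequence files by hand is error-prone, so the step can be sketched in Python. The function below is an assumption about how one might do it, not the procedure we actually used: it takes the lower-case wild type sequence, checks that the expected residue sits at the given position, and writes the substituted residue as a capital letter. It assumes a gap-free sequence whose first letter corresponds to residue `first_residue` (19 for our model); the five-letter sequence in the example is invented.

```python
def scwrl_mutant_sequence(wt_lower, position, wt_aa, mut_aa, first_residue=19):
    """Return the lower-case sequence with the mutated residue as a capital letter.

    Assumes the sequence has no gaps and starts at residue `first_residue`.
    """
    index = position - first_residue
    if not 0 <= index < len(wt_lower):
        raise ValueError(f"position {position} is outside the sequence")
    if wt_lower[index] != wt_aa.lower():
        raise ValueError(
            f"expected {wt_aa} at position {position}, "
            f"found {wt_lower[index].upper()}"
        )
    return wt_lower[:index] + mut_aa.upper() + wt_lower[index + 1:]


# Invented toy sequence that starts at residue 27; Asp29 -> Asn as in mutation 1.
toy_wt = "gadlk"
print(scwrl_mutant_sequence(toy_wt, 29, "D", "N", first_residue=27))  # gaNlk
```

The check against the wild type residue catches numbering offsets early, which matters here because the model starts at residue 19 rather than 1.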

We also ran SCWRL on the wild type (wt) structure in order to make it comparable to energy predictions by other programs. The minimal energy of the graph for the wt is 415.134.

Nr.	mutation	position	Reference amino acid	mutated amino acid	both (without H-bonds)	Minimal energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29				419.996	1.011712
2	Pro-Ala	136				415.133	0.9999976
3	Gln-His	153				420.494	1.012911
4	Trp-Cys	193				416.252	1.002693
5	Thr-Met	274				434.014	1.045479
6	Phe-Val	356				415.134	1
7	Thr-Ile	409				414.481	0.998427
8	Asn-Ser	440				418.863	1.008983
9	Cys-Gly	489				415.136	1.000005
10	Arg-His	496				421.011	1.014157
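
The last column of the table is simply the mutant energy divided by the wild type energy. A short Python sketch that reproduces it from the SCWRL values above (the dictionary keys are our own shorthand for the mutations):

```python
WT_ENERGY = 415.134  # minimal energy of the graph for the wild type

SCWRL_ENERGIES = {
    "Asp29Asn": 419.996,
    "Pro136Ala": 415.133,
    "Gln153His": 420.494,
    "Trp193Cys": 416.252,
    "Thr274Met": 434.014,
    "Phe356Val": 415.134,
    "Thr409Ile": 414.481,
    "Asn440Ser": 418.863,
    "Cys489Gly": 415.136,
    "Arg496His": 421.011,
}


def energy_ratio(mutant_energy, wt_energy):
    """Energy(mutant) / Energy(wt), as used in the tables."""
    return mutant_energy / wt_energy


for name, energy in SCWRL_ENERGIES.items():
    print(f"{name}\t{energy:.3f}\t{energy_ratio(energy, WT_ENERGY):.6f}")
```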

FoldX

wt: 510.88

Nr.	mutation	position	Minimal energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29	496.88	0.9725963
2	Pro-Ala	136	493.81	0.966587
3	Gln-His	153	493.90	0.9667632
4	Trp-Cys	193	496.23	0.971324
5	Thr-Met	274	503.34	0.9852412
6	Phe-Val	356	495.39	0.9696798
7	Thr-Ile	409	495.45	0.9697972
8	Asn-Ser	440	496.79	0.9724201
9	Cys-Gly	489	495.87	0.9706193
10	Arg-His	496	498.75	0.9762567

Minimise

wt: -3839.492677

Nr.	mutation	position	Free energy	Energy(mutant)/Energy(wt)
1	Asp-Asn	29	-4174.487222	1.087250
2	Pro-Ala	136	-4164.523707	1.084655
3	Gln-His	153	-4109.12924	1.070227
4	Trp-Cys	193	-4169.617285	1.085981
5	Thr-Met	274	-4065.730562	1.058924
6	Phe-Val	356	-4109.021939	1.070199
7	Thr-Ile	409	-4121.130656	1.073353
8	Asn-Ser	440	-4120.275589	1.073130
9	Cys-Gly	489	-4127.969116	1.075134
10	Arg-His	496	-4120.151911	1.073098
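
One caveat when reading the ratio columns: a ratio above 1 means a higher energy when the wild type value is positive (SCWRL, FoldX), but a lower, more negative energy when the wild type value is negative (minimise). The Python sketch below therefore compares signed differences instead of ratios. The values are copied from the three tables above; whether the scores of the three programs are comparable at all, and how they map to stability, is not settled by this.

```python
WT = {"SCWRL": 415.134, "FoldX": 510.88, "minimise": -3839.492677}

# Mutant energies in table order (mutations 1 to 10).
MUTANTS = {
    "SCWRL": [419.996, 415.133, 420.494, 416.252, 434.014,
              415.134, 414.481, 418.863, 415.136, 421.011],
    "FoldX": [496.88, 493.81, 493.90, 496.23, 503.34,
              495.39, 495.45, 496.79, 495.87, 498.75],
    "minimise": [-4174.487222, -4164.523707, -4109.12924, -4169.617285,
                 -4065.730562, -4109.021939, -4121.130656, -4120.275589,
                 -4127.969116, -4120.151911],
}


def direction(mutant_energy, wt_energy):
    """Say whether the mutant energy is higher, lower or equal to the wild type."""
    diff = mutant_energy - wt_energy
    if diff > 0:
        return "higher"
    if diff < 0:
        return "lower"
    return "equal"


for number in range(10):
    row = [direction(MUTANTS[m][number], WT[m]) for m in ("SCWRL", "FoldX", "minimise")]
    print(number + 1, *row, sep="\t")
```

With the tabulated values, FoldX and minimise both report a lower energy than the wild type for all ten mutants, while SCWRL reports a higher energy for seven of them.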

Gromacs

GROMACS (originally short for 'GROningen MAchine for Chemical Simulations') is a package to perform molecular dynamics. It is free software, available under the GNU General Public License.

Workflow

1. Use fetchpdb to get the pdb structure – look at the script, what does it do?

-> We did not use fetchpdb, as we already had the required PDB file. The script simply downloads the PDB file for a given PDB ID from the PDB website.

2. Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.

-> The following options are available:
 -offset value     offset the residue numbering
 -chain char       change Chain ID
 -ratom            renumber Atoms
 -rres             renumber Residues
 -noh              remove hydrogens
 -het              do not change HETATM to ATOM for AA
 -seq              protein sequence from AA
 -seqrs            protein sequence from SEQRES entries
 -nosol            just Protein OR
 -ssw cutoff       print only waters with B-value below cutoff OR
 -cleansol         remove overlapping solvent for GROMACS

We ran repairPDB with the following command:

repairPDB ARSA.pdb -noh -nosol > ARSA_clean.pdb

3. Run SCWRL with the lowercase protein sequence to make sure there are no missing sidechains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.

We ran SCWRL with the following command:

scwrl -i ARSA.pdb -s extractedPDB.seq -o ARSA_scwrl.pdb

SCWRL returned a PDB file that includes HETATM records. These solvent atoms had to be removed before continuing.
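
Removing these records amounts to dropping every line that starts with HETATM. A minimal Python sketch of that filter, as one possible way to do it (the example lines are invented and shortened, and no file names are assumed):

```python
def strip_hetatm(pdb_lines):
    """Return the PDB lines without HETATM records."""
    return [line for line in pdb_lines if not line.startswith("HETATM")]


# Invented, shortened example lines.
example = [
    "ATOM      1  N   ARG A  19",
    "ATOM      2  CA  ARG A  19",
    "HETATM 3001  O   HOH A 601",
    "END",
]
for line in strip_hetatm(example):
    print(line)
```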

4. Use the gromacs command “pdb2gmx”: the option -f defines the input structure, -o the gromacs output file (.gro) and -p the topology (.top) output file. IMPORTANT: the input must end with “pdb”. Choose a force field and, for water, the TIP3P model.

-> We chose the AMBER03 force field, as it is optimized for use with proteins.

5. Create a MDP file with the following content

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw	 = 1.0
constraints = none
pbc = no

Give a brief description of the different keywords used in this file - for this you should use the gromacs manual.

-> DFLEXIBLE: water in the topology is flexible
-> DPOSRES: position restraints, defined in posre.itp, are included in the topology
-> implicit_solvent = GBSA: defines the implicit solvent model, here the Generalized Born formalism is used
-> integrator = steep: defines the algorithm for energy minimization, here the steepest descent algorithm is used
-> emtol: defines when convergence is assumed: minimization is stopped when the maximum force is smaller than this value
-> nsteps: defines the maximum number of steps in the energy minimization
-> nstenergy: number of steps between two writes of the energies to the energy file
-> energygrps: group(s) to write to the energy file
-> ns_type: defines the type of neighbor searching, here a grid is built and only atoms in neighboring grid cells are checked when a new neighbor list is constructed
-> coulombtype = cut-off: twin-range cut-offs with neighbor-list cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist
-> rcoulomb: distance for the Coulomb cut-off
-> rvdw: distance for the LJ or Buckingham cut-off
-> constraints: defines additional constraints besides the ones defined in the topology file
-> pbc: Use no periodic boundary conditions, ignore the box.
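
To check the settings programmatically before running grompp, the MDP file can be read as plain `key = value` pairs. The Python sketch below is our own illustration, not part of GROMACS: it treats everything after a `;` as a comment and does not handle any other MDP syntax.

```python
MDP_TEXT = """\
title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no
"""


def parse_mdp(text):
    """Parse 'key = value' lines into a dict; ';' starts a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


params = parse_mdp(MDP_TEXT)
print(params["integrator"], params["nsteps"], params["define"].split())
```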

6. Use grompp to prepare the system for gromacs: grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr. FILE.tpr is the system file to be created, which we use in the next step.

-> commandline: grompp -v -f gromacs_AMBER03.mdp -c gromacs_AMBER03.gro -p gromacs_AMBER03.top -o gromacs_AMBER03.tpr

7. Now we minimize the system: mdrun -v -deffnm FILE

-> commandline: mdrun -v -deffnm gromacs_AMBER03

8. Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace; in the print settings you can choose eps output, then print and convert to pdf.
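
Besides viewing them in xmgrace, the xvg files can be read directly: lines starting with `#` are comments, lines starting with `@` are xmgrace directives, and the remaining lines hold whitespace-separated numeric columns. A minimal Python sketch under that assumption; the numbers in the example are invented and are not results of our minimization.

```python
def read_xvg(text):
    """Return the numeric rows of an xvg file, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# Invented example content.
EXAMPLE_XVG = """\
# example file
@    title "GROMACS Energies"
@ s0 legend "Potential"
0.000000  -1000.0
1.000000  -1500.0
2.000000  -1750.0
"""

rows = read_xvg(EXAMPLE_XVG)
first_step, first_energy = rows[0]
last_step, last_energy = rows[-1]
print(f"energy changed by {last_energy - first_energy:.1f} over {len(rows)} frames")
```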

Results

In this section, we give a short summary of the analyses performed above and, based on these results, conclude for each mutation whether we can assign a neutral or non-neutral effect.

Mutation 1

The mutation is located at the metal-binding site of the protein. Based on this information alone, we would already suggest that the mutation is non-neutral. However, we also want to integrate the other information we have. The H-bond pattern changes and, for SCWRL and FoldX, the mutated protein is more stable - i.e. has more free energy - than the wild type protein. For minimise, the wild type has more free energy. In summary, the location of the mutation and the changing H-bond pattern suggest that this is a harmful mutation. The fold changes in free energy are very small and thus give no striking evidence for either case.
The mutation is indeed harmful.

Mutation 2

Mutation 2 is located near the active site and a substrate binding site in sequence as well as in structure. According to the SCWRL mutagenesis analysis, the H-bond pattern does not change. Again, the energy fold changes are very small and contradictory: SCWRL and FoldX assign a destabilizing effect, whereas minimise assigns a stabilizing effect to the protein. As for mutation 1, we are left to consider only the location of the mutation and the H-bond pattern. In this case we cannot make a reliable guess based on this information, although the location of the mutation could be an indicator that it is non-neutral.
HGMD assigns a harmful effect to the mutation.

Mutation 3

The mutation is located near the active site and a substrate binding site in sequence as well as in structure. SCWRL and minimise again predict a slight stabilizing effect, whereas FoldX predicts less free energy for the mutated protein. The SCWRL analysis predicts that the H-bond pattern changes. Here, too, the changes in free energy predicted by the methods give no clear hint on the effect of the mutation. Again, we cannot make a reliable guess based on this information, although the location of the mutation could be an indicator that it is non-neutral.
HGMD assigns a harmful effect to the mutation.

Mutation 4

The mutation is at moderate distance to all important functional sites of the protein. The H-bond pattern changes; SCWRL and minimise predict a stabilizing effect and FoldX again a destabilizing one. As the H-bond pattern changes, it is likely that the local structure of the protein is altered. However, the mutation is not very close to any functional site and thus might be harmless. There is no striking evidence either way.
This mutation is taken from dbSNP and is neutral.

Mutation 5

The mutation is located very distant from the active site and a substrate binding site in sequence as well as in structure. It is located within a beta sheet, and the SCWRL analysis shows that the H-bond pattern changes in the mutated protein. As this mutation is located within a secondary structure element and changes the H-bond pattern, it could disrupt the beta sheet and thus alter the structure of the whole protein.
The free energy predictions are again contradictory. SCWRL and minimise predict a stabilizing effect, while FoldX predicts a destabilizing effect. As stated above, the mutation could disrupt the beta sheet structure and therefore have a high impact on the overall structure. Thus the mutation might be harmful.
HGMD assigns a deleterious effect to the mutation.

Mutation 6

The mutation is at moderate distance to all important functional sites of the protein. The H-bond pattern changes, and SCWRL assigns exactly the same free energy to the mutated protein and to the wild type. minimise assigns a destabilizing effect and FoldX a stabilizing one. Again this information is not very useful; the only slightly informative facts are the location and the changing H-bond pattern, which give no evidence for neutrality or non-neutrality.
The mutation is taken from dbSNP and has a neutral effect.

Mutation 7

The mutation is not close to important functional sites, and SCWRL predicts that the H-bond pattern is altered in the mutated protein. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again we have no clear hint on whether this mutation could be harmful or not.
The mutation is taken from HGMD and is disease causing.

Mutation 8

The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. This information gives no clear hint on whether this mutation could be harmful or not.
The mutation is taken from dbSNP and is neutral.

Mutation 9

The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern does not change. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).

Mutation 10

The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).

Summary

Compared to the sequence-based mutation analysis, we had much more difficulty discriminating neutral from non-neutral mutations using only structural features of the protein. In most cases we were left to guess, because no reliable information gave clear evidence. The most informative results of this structure-based mutation analysis were the H-bond patterns predicted by SCWRL and the location of the mutation with respect to functionally important sites.
Surprisingly, the analysis of the free energies of the mutated proteins compared to the wild type revealed only very slight fold changes, and we did not know how to interpret their impact on the structure and thus on the function. Moreover, these fold changes were not consistent in predicting (de-)stabilizing effects across all methods: minimise gave ratios above 1 for all mutations, which we read as destabilizing, whereas FoldX gave ratios below 1 for all mutations, which we read as stabilizing. This made these analyses of little use to us.
However, some information from the structure-based analysis is useful for separating neutral from non-neutral mutations if it is combined with additional methods, e.g. from the sequence-based mutation analysis.

ID	exp. method	resolution in Å	positions	R-factor
1AUK	X-ray	2.10	19-507	0.248
1E1Z	X-ray	2.40	19-507	0.196
1E2S	X-ray	2.35	19-507	0.194
1E33	X-ray	2.50	19-507	0.187
1E3C	X-ray	2.65	19-507	0.174
1N2K	X-ray	2.75	19-507	0.202
1N2L	X-ray	3.20	19-507	0.182
2AIJ	X-ray	1.55	69-73	0.146
2AIK	X-ray	1.73	68-74	0.142
