Structure-based mutation analysis ARSA
Contents
Preparation
Visualization with Pymol
The following image shows a pymol visualization of Arylsulfatase A, together with all known active and binding sites.
One can see, that the mutations are spread out through the protein. Some lie near functional sites, other are very distant from them. The table below shows again Pymol visualizations of all mutations, but each seperately. With this, we want to try to derive investigate the correlation of location of the mutation with respect to functional sites and its effect on the protein function.
Above visualizations indicate, that mutations near functional important sites of the protein are likely to cause a harmful effect. However, for distant mutations no trend can be observed.
SCRWL
First, we extracted the amino acid sequence from our pdb file and converted it to lower case.
repairPDB arsa_model.pdb -seq > arsa.model.seq
tr '[:upper:]' '[:lower:]' < arsa.model.seq > arsa.model.lower.seq
Next, we included the individual mutations as capital letters in seperate files and executed scwrl with the following command:
scwrl cmd
We also ran SCWRL on the wild type (wt) structure in order to make it comparable to energy predictions by other programs. The minimal energy of the graph for the wt is 415.134.
FoldX
wt: 510.88
Nr. | mutation | position | Minimal energy | Energy(mutant)/Energy(wt) | |
1 | Asp-Asn | 29 | 496.88 | 0.9725963 | |
2 | Pro - Ala | 136 | 493.81 | 0.966587 | |
3 | Gln-His | 153 | 493.90 | 0.9667632 | |
4 | Trp-Cys | 193 | 496.23 | 0.971324 | |
5 | Thr-Met | 274 | 503.34 | 0.9852412 | |
6 | Phe -Val | 356 | 495.39 | 0.9696798 | |
7 | Thr-Ile | 409 | 495.45 | 0.9697972 | |
8 | Asn-Ser | 440 | 496.79 | 0.9724201 | |
9 | Cys-Gly | 489 | 495.87 | 0.9706193 | |
10 | Arg-His | 496 | 498.75 | 0.9762567 |
Minimise
wt: -3839.492677
Nr. | mutation | position | Free energy | Energy(mutant)/Energy(wt) |
1 | Asp-Asn | 29 | -4174.487222 | 1.087250 |
2 | Pro - Ala | 136 | -4164.523707 | 1.084655 |
3 | Gln-His | 153 | -4109.12924 | 1.070227 |
4 | Trp-Cys | 193 | -4169.617285 | 1.085981 |
5 | Thr-Met | 274 | -4065.730562 | 1.058924 |
6 | Phe -Val | 356 | -4109.021939 | 1.070199 |
7 | Thr-Ile | 409 | -4121.130656 | 1.073353 |
8 | Asn-Ser | 440 | -4120.275589 | 1.073130 |
9 | Cys-Gly | 489 | -4127.969116 | 1.075134 |
10 | Arg-His | 496 | -4120.151911 | 1.073098 |
Gromacs
Gromacs is an abbreviation for 'GROningen Mixture of Alchemy and Childrens' Stories' and is a package to perform molecular dynamics. It is Free Software, available under the GNU General Public License.
Workflow
1. Use fetchpdb to get the pdb structure – look at the script, what does it do?
-> we did not use fetchpdb, as we already had the needed pdb-file. The script simply downloads a pdb-file for a given PDB-ID from the PDB-webpage.
2. Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.
-> TODO
3. Run SCWRL with the lowercase protein sequence to make sure there are no missing sidechains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.
-> TODO
4. Use the gromacs command “pdb2gmx” the option –f defines the input structure, -o the gromacs outputfile (.gro) and –p the topology (.top) output file. IMPORTANT the input must end with “pdb”. Choose a forcefield and for water the TIP3P model.
-> we chose the AMBER03 forcefield as it is optimized for the use with proteins
5. Create a MDP file with the following content
title = PBSA minimization in vacuum cpp = /usr/bin/cpp define = -DFLEXIBLE -DPOSRES implicit_solvent = GBSA integrator = steep emtol = 1.0 nsteps = 500 nstenergy = 1 energygrps = System ns_type = grid coulombtype = cut-off rcoulomb = 1.0 rvdw = 1.0 constraints = none pbc = no
Give a brief description of the different keywords used in this file - for this you should use the gromacs manual.
-> DFLEXIBLE: water in the the topology is flexible -> DPOSRES: position restraints, defined in posre.itp, are included in the topology -> implicit_solvent = GBSA: defines the implicit solvent model, here the Generalized Born formalism is used -> integrator = steep: defines the algorithm for energy minimization, here the steepest descent algorithm is used -> emtol: defines, when convergence is assumed: minimization is stoppen when maximum force is smaller than this value -> nsteps: defines the maximum number of steps in the energy minimization -> nstenergy: frequency in which energies are written to the energy file -> energygrps: group(s) to write to the energy file -> ns_type: defines type of Neighbor searching, here a grid is built and only atoms in neighboring grid cells are checked when a new neighbor list is constructed -> coulombtype: Twin range cut-off’s with neighborlist cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist. -> rcoulomb: distance for the Coulomb cut-off -> rvdw: distance for the LJ or Buckingham cut-off -> constraints: defines additional constraints besides the ones defined in the topology file -> pbc: Use no periodic boundary conditions, ignore the box.
6. Use grompp to prepare the system for gromacs: grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr FILE.tpr is the system file to create which we use in the next step
-> commandline: grompp -v -f gromacs_AMBER03.mdp -c gromacs_AMBER03.gro -p gromacs_AMBER03.top -o gromacs_AMBER03.tpr
7. Now we minimize the system: mdrun -v -deffnm FILE
-> commandline: mdrun -f -deffnm gromacs_AMBER03
8. Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace and in the print settings you can choose eps output, the print and convert to pdf.
Results
Mutation 1
The mutation is located at the metal-binding site of the protein. Only regarding this information, we would already suggests, that the mutation is non-neutral. However, we also want to integrate the other informations we have. The H-bond pattern changes and for SCWRL and FoldX, the mutated protein is more stable - i.e. has more free energy - than the wild type protein. For minimise, the wild type has more free energy. Summarizing, the location of the mutation and the changing H-bond pattern suggest, that this is a harmful mutation. The fold-changes in free energy are very low and thus give no striking evidence for one of the cases.
The mutation is indeed harmful.
Mutation 2
Mutation 2 is located near the active site and a substrate binding site in sequence as well as in structure. The H-bond pattern does not change, regarding the SCWRL mutagenesis analysis. Again the energy fold changes are very low and contradictory. SCWRL and FoldX assign a destabilizing effect, wehereas minimise assigns a stabilizing effect to the protein. As for mutation 1, we are again left to only consider ther location of the mutation and the H-bond pattern. However, in this case we cannot make a reliable guess, based on this information, although we would say that the location of the mutation could be an indicator, that it is non-neutral.
HGMD assigns a harmful effect to the mutation.
Mutation 3
The mutation is located near the active site and a substrate binding site in sequence as well as in structure. SCWRL and minimise predict again a slight stabilizing effect, whereas minimise predicts less free energy for the mutated protein. The SCWRL analysis predicts, that the H-bond pattern changes. Also here the changes in free energy predicted by the methods give no clear hint on the effect of the mutation. Again, we cannot make a reliable guess, based on this information, although we would say that the location of the mutation could be an indicator, that it is non-neutral.
HGMD assigns a harmful effect to the mutation.
Mutation 4
The mutation is at moderate distance to all important functional sites of the protein. The H-bond pattern changes and SCWRL and minimise predict a stabilizing effect and FoldX again a destabilizing. As the H-bond pattern changes it is likely that the local structure of the protein is altered. However, the mutation is not very near to any functional sites and thus this mutation might be harmless. But there is no striking evidence.
This mutation is taken from dbSNP and neutral.