Structure-based mutation analysis ARSA
For the upcoming analyses in this TASK, a structure fulfilling the following requirements should be selected:
- The resolution (in Å) should be sufficiently high, i.e. have a small value
- The structure should ideally have been resolved at physiological pH (7.4)
- It should have a small R-factor. The R-factor is a measure to determine the reliability of a crystal structure.
PDB contains the following nine entries of the structure of ARSA:
|ID||exp. method||resolution in Å||positions||R-factor|
The structures 2AIJ and 2AIK can be eliminated immediately, as they resolve only a very small part of the enzyme. Structures 1E1Z, 1E2S, 1E33, 1E3C, 1N2K and 1N2L show high resolutions and very low R-factors. However, these are all mutant structures and are therefore not applicable to our analysis, which requires the wild-type structure.
The only structure left is 1AUK, which we also used in previous TASKs. This structure has a good resolution and a sufficiently low R-factor. However, as we already noticed in previous TASKs, it unfortunately has six missing residues in the middle of the protein, so we could not use the original structure for the subsequent analyses. It first had to be modified with programs written by Marc Offman to model the missing residues. We use this modelled structure for the subsequent analyses.
Visualization with PyMOL
The following image shows a PyMOL visualization of Arylsulfatase A, together with all known active and binding sites.
One can see that the mutations are spread throughout the protein. Some lie near functional sites, others are far from them. The table below again shows PyMOL visualizations of all mutations, but each one separately. With these, we want to investigate how the location of a mutation relative to the functional sites correlates with its effect on protein function.
The visualizations above indicate that mutations near functionally important sites of the protein are likely to be harmful. For distant mutations, however, no trend can be observed.
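The notion of "near" and "distant" used above can be made quantitative by measuring the distance between the C-alpha atoms of a mutated residue and a functional-site residue. The following is a minimal sketch, not the procedure used for the figures: the function name `ca_distance` is hypothetical, and it assumes a single-chain PDB file with standard fixed-column ATOM records.

```shell
# Sketch: CA-CA distance (in Angstrom) between two residues of a PDB file.
# Assumption: single chain, standard fixed-column ATOM records.
ca_distance() {
  # usage: ca_distance FILE RESNUM_A RESNUM_B
  awk -v a="$2" -v b="$3" '
    substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
      r = substr($0, 23, 4) + 0            # residue number, columns 23-26
      x = substr($0, 31, 8) + 0            # x, y, z in columns 31-54
      y = substr($0, 39, 8) + 0
      z = substr($0, 47, 8) + 0
      if (r == a + 0) { ax = x; ay = y; az = z; fa = 1 }
      if (r == b + 0) { bx = x; by = y; bz = z; fb = 1 }
    }
    END {
      if (!(fa && fb)) exit 1              # one of the residues was not found
      dx = ax - bx; dy = ay - by; dz = az - bz
      printf "%.2f\n", sqrt(dx * dx + dy * dy + dz * dz)
    }' "$1"
}

# Example call (SITE_RES is a placeholder for a functional-site residue number):
#   ca_distance arsa_model.pdb 136 "$SITE_RES"
```

A C-alpha distance ignores side-chain orientation, so it is only a rough proxy for what the PyMOL images show.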
First, we extracted the amino acid sequence from our pdb file and converted it to lower case.
repairPDB arsa_model.pdb -seq > arsa.model.seq
tr '[:upper:]' '[:lower:]' < arsa.model.seq > arsa.model.lower.seq
Next, we included the individual mutations as capital letters in separate files and executed scwrl with the following command:
We also ran SCWRL on the wild type (wt) structure in order to make it comparable to energy predictions by other programs. The minimal energy of the graph for the wt is 415.134.
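Writing one sequence file per mutation, with the mutated residue as a capital letter, can be scripted. The sketch below is one possible way to do this and is not taken from the original workflow: the function name `mutate_seq` is hypothetical, and the position is assumed to be an index into the extracted sequence, which may differ from the PDB residue numbering by an offset.

```shell
# Sketch: replace one residue of a lowercase sequence with an uppercase letter.
# Assumption: POSITION is a 1-based index into the extracted sequence.
mutate_seq() {
  # usage: mutate_seq POSITION NEW_RESIDUE < lower.seq
  awk -v pos="$1" -v aa="$2" '
    { seq = seq $0 }                       # join lines in case the sequence is wrapped
    END {
      if (pos < 1 || pos > length(seq)) exit 1
      print substr(seq, 1, pos - 1) toupper(aa) substr(seq, pos + 1)
    }'
}

# Example call (IDX is a placeholder for the index of the mutated residue):
#   mutate_seq "$IDX" a < arsa.model.lower.seq > arsa.mut.seq
```

The function exits with a non-zero status when the position lies outside the sequence, which helps catch numbering offsets early.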
|2||Pro - Ala||136||493.81||0.966587|
|2||Pro - Ala||136||-4164.523707||1.084655|
GROMACS is an abbreviation for 'GROningen MAchine for Chemical Simulations' and is a package for performing molecular dynamics simulations. It is free software, available under the GNU General Public License.
1. Use fetchpdb to get the pdb structure – look at the script, what does it do?
-> We did not use fetchpdb, as we already had the required PDB file. The script simply downloads the PDB file for a given PDB ID from the PDB website.
2. Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.
-> The following options are available:
-offset value: offset the residue numbering
-chain char: change Chain ID
-ratom: renumber Atoms
-rres: renumber Residues
-noh: remove hydrogens
-het: do not change HETATM to ATOM for AA
-seq: protein sequence from AA
-seqrs: protein sequence from SEQRES entries
-nosol: just Protein
OR -ssw cutoff: print only waters with B-value below cutoff
OR -cleansol: remove overlapping solvent for GROMACS
We ran repairPDB with the following command:
repairPDB ARSA.pdb -noh -nosol > ARSA_clean.pdb
3. Run SCWRL with the lowercase protein sequence to make sure there are no missing sidechains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.
We ran SCWRL with the following command:
scwrl -i ARSA.pdb -s extractedPDB.seq -o ARSA_scwrl.pdb
SCWRL returned a PDB file that includes HETATM records. These solvent atoms had to be removed before continuing.
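Removing the HETATM records can be done with a simple line filter. The text does not say which tool was used for this step, so the following is only one possible way; the function name `strip_hetatm` and the file names are illustrative. Note that this drops every HETATM record, including any ligands or metal ions, not only solvent.

```shell
# Sketch: drop all HETATM records from a PDB file, keeping everything else.
# Assumption: no hetero groups need to be retained for the later steps.
strip_hetatm() {
  # usage: strip_hetatm INPUT.pdb > OUTPUT.pdb
  grep -v '^HETATM' "$1"
}

# Example call (file names are illustrative):
#   strip_hetatm ARSA_scwrl.pdb > ARSA_scwrl_nohet.pdb
```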
4. Use the GROMACS command “pdb2gmx”: the option -f defines the input structure, -o the GROMACS output file (.gro) and -p the topology output file (.top). IMPORTANT: the name of the input file must end with “pdb”. Choose a force field and, for water, the TIP3P model.
-> We chose the AMBER03 force field, as it is optimized for use with proteins.
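The pdb2gmx call itself is not shown above. A call along the following lines would match the description; this is a sketch under assumptions, not the command that was actually run. The helper name `build_pdb2gmx_cmd` and the file names are illustrative, and the non-interactive `-ff amber03 -water tip3p` selection assumes a GROMACS 4.5-style pdb2gmx (otherwise both are chosen from interactive menus). The helper only prints the command after checking the file extension.

```shell
# Sketch: assemble a pdb2gmx command line for step 4 (printed, not executed).
# Assumption: GROMACS 4.5-style flags; file names are illustrative.
build_pdb2gmx_cmd() {
  # usage: build_pdb2gmx_cmd INPUT.pdb OUTPUT_BASENAME
  case "$1" in
    *.pdb) ;;                                # pdb2gmx input must end with "pdb"
    *) echo "input must end with .pdb: $1" >&2; return 1 ;;
  esac
  echo "pdb2gmx -f $1 -o $2.gro -p $2.top -ff amber03 -water tip3p"
}

build_pdb2gmx_cmd ARSA_scwrl_nohet.pdb gromacs_AMBER03
```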
5. Create a MDP file with the following content
title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no
Give a brief description of the different keywords used in this file - for this you should use the gromacs manual.
-> DFLEXIBLE: water in the topology is flexible
-> DPOSRES: position restraints, defined in posre.itp, are included in the topology
-> implicit_solvent = GBSA: defines the implicit solvent model; here the Generalized Born formalism is used
-> integrator = steep: defines the algorithm for energy minimization; here the steepest descent algorithm is used
-> emtol: defines when convergence is assumed; minimization is stopped when the maximum force is smaller than this value
-> nsteps: defines the maximum number of steps in the energy minimization
-> nstenergy: frequency with which energies are written to the energy file
-> energygrps: group(s) to write to the energy file
-> ns_type: defines the type of neighbor searching; here a grid is built and only atoms in neighboring grid cells are checked when a new neighbor list is constructed
-> coulombtype: twin-range cut-offs with neighbor-list cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist
-> rcoulomb: distance for the Coulomb cut-off
-> rvdw: distance for the LJ or Buckingham cut-off
-> constraints: defines additional constraints besides the ones defined in the topology file
-> pbc: use no periodic boundary conditions, ignore the box
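When checking a parameter file against the manual, it can help to read individual values back out of it. The sketch below is a small convenience helper and not part of GROMACS: the function name `mdp_get` is hypothetical, and it assumes the simple `key = value` layout used here, with `;` starting a comment.

```shell
# Sketch: print the value assigned to one keyword in an MDP file.
# Assumption: one "key = value" pair per line, ";" starts a comment.
mdp_get() {
  # usage: mdp_get KEY FILE
  awk -v key="$1" '
    /^[ \t]*;/ { next }                      # skip comment lines
    index($0, "=") {
      k = substr($0, 1, index($0, "=") - 1)  # text before the first "="
      v = substr($0, index($0, "=") + 1)     # text after it
      gsub(/[ \t]/, "", k)
      sub(/;.*$/, "", v)                     # drop a trailing comment
      gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim surrounding blanks
      if (k == key) print v
    }' "$2"
}

# Example call (file name as used in the grompp step):
#   mdp_get integrator gromacs_AMBER03.mdp
```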
6. Use grompp to prepare the system for GROMACS:
grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr
FILE.tpr is the system file to be created, which we use in the next step.
-> commandline: grompp -v -f gromacs_AMBER03.mdp -c gromacs_AMBER03.gro -p gromacs_AMBER03.top -o gromacs_AMBER03.tpr
7. Now we minimize the system: mdrun -v -deffnm FILE
-> commandline: mdrun -v -deffnm gromacs_AMBER03
8. Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace; in the print settings you can choose eps output, then print and convert to pdf.
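Besides plotting, the final value of an energy term can be read directly from the xvg file, which is plain text with header lines starting with `#` or `@`. The following sketch assumes a two-column xvg file (step, value); the function name `xvg_last_value` and the output file name are illustrative. g_energy normally asks interactively which term to extract, and piping the term name to it, as in the comment, is a common way to avoid the prompt.

```shell
# Sketch: print the value (second column) of the last data line of an xvg file.
# Assumption: two-column output, header lines start with "#" or "@".
xvg_last_value() {
  # usage: xvg_last_value FILE.xvg
  awk '!/^[#@]/ && NF >= 2 { v = $2 } END { if (v != "") print v }' "$1"
}

# Example workflow (output file name is illustrative):
#   echo "Potential" | g_energy -f gromacs_AMBER03.edr -o potential.xvg
#   xvg_last_value potential.xvg
```

Comparing the first and the last data line in the same way shows how far the energy dropped during the minimization.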
In this section, we give a short summary of the analyses performed above and, based on these results, conclude for each mutation whether we can assign a neutral or non-neutral effect.
The mutation is located at the metal-binding site of the protein. From this information alone, we would already suggest that the mutation is non-neutral. However, we also want to take the other available information into account. The H-bond pattern changes, and for SCWRL and FoldX the mutated protein is more stable (i.e. has more free energy) than the wild-type protein. For minimise, the wild type has more free energy. In summary, the location of the mutation and the changed H-bond pattern suggest that this is a harmful mutation. The fold changes in free energy are very small and thus give no striking evidence either way.
The mutation is indeed harmful.
Mutation 2 is located near the active site and a substrate binding site, in sequence as well as in structure. According to the SCWRL mutagenesis analysis, the H-bond pattern does not change. Again, the energy fold changes are very small and contradictory: SCWRL and FoldX assign a destabilizing effect, whereas minimise assigns a stabilizing effect to the protein. As for mutation 1, we are left to consider only the location of the mutation and the H-bond pattern. In this case, however, we cannot make a reliable guess based on this information, although the location of the mutation could be an indicator that it is non-neutral.
HGMD assigns a harmful effect to the mutation.
The mutation is located near the active site and a substrate binding site, in sequence as well as in structure. SCWRL and minimise again predict a slight stabilizing effect, whereas minimise predicts less free energy for the mutated protein. The SCWRL analysis predicts that the H-bond pattern changes. Here, too, the changes in free energy predicted by the methods give no clear hint about the effect of the mutation. Again, we cannot make a reliable guess based on this information, although the location of the mutation could be an indicator that it is non-neutral.
HGMD assigns a harmful effect to the mutation.
The mutation is at a moderate distance from all important functional sites of the protein. The H-bond pattern changes; SCWRL and minimise predict a stabilizing effect, whereas FoldX again predicts a destabilizing one. As the H-bond pattern changes, it is likely that the local structure of the protein is altered. However, the mutation is not very close to any functional site, so it might be harmless. There is no striking evidence either way.
This mutation is taken from dbSNP and neutral.
The mutation is located far from the active site and the substrate binding site, in sequence as well as in structure. It lies within a beta sheet, and the SCWRL analysis shows that the H-bond pattern changes in the mutated protein. As this mutation is located within a secondary structure element and changes the H-bond pattern, it could disrupt the beta sheet and thus alter the structure of the whole protein.
The free energy predictions are again contradictory. SCWRL and minimise predict a stabilizing effect, while FoldX predicts a destabilizing effect. As stated above, the mutation could disrupt the beta sheet structure and therefore have a high impact on the overall structure. Thus the mutation might be harmful.
HGMD assigns a deleterious effect to the mutation.
The mutation is at a moderate distance from all important functional sites of the protein. The H-bond pattern changes, and SCWRL assigns exactly the same free energy to the mutated protein and to the wild type, whereas minimise assigns a destabilizing effect and FoldX a stabilizing one. Again, this information is not very useful; the only slightly informative facts are the location and the changed H-bond pattern, which give no evidence for neutrality or non-neutrality.
The mutation is taken from dbSNP and has a neutral effect.
The mutation is not close to important functional sites, and SCWRL predicts that the H-bond pattern is altered in the mutated protein. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again, we have no clear hint as to whether this mutation could be harmful or not.
The mutation is taken from HGMD and is disease causing.
The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. This information gives no clear hint as to whether this mutation could be harmful or not.
The mutation is taken from dbSNP and is neutral.
The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern does not change. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).
The mutation is very distant from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL and FoldX predict a stabilizing effect, while minimise again predicts a destabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).
Compared to the sequence-based mutation analysis, we had much more difficulty discriminating neutral from non-neutral mutations using only structural features of the protein. In most cases we were left to guess blindly, for lack of reliable information giving clear evidence. The most informative facts for this structure-based mutation analysis were the H-bond patterns predicted by SCWRL and the location of the mutation relative to functionally important sites.
Surprisingly, the analysis of the free energies of the mutated proteins compared to the wild type revealed only very slight fold changes, and we did not know how to interpret their impact on the structure and thus on the function. Moreover, these fold changes were not consistent in predicting (de-)stabilizing effects across all methods. Whereas minimise predicted destabilizing effects, FoldX predicted stabilizing effects for all mutations. This made these analyses of little use to us.
However, some of the information from the structure-based analysis is informative and useful for distinguishing neutral from non-neutral mutations when it is combined with additional methods, e.g. from the sequence-based mutation analysis.
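For reference, a fold change between a mutant and the wild-type energy can be computed as a simple ratio. The text does not define the term, so the sketch below assumes it means mutant energy divided by wild-type energy; the function name `fold_change` is hypothetical, and the mutant value in the example is a placeholder, not a result of this analysis.

```shell
# Sketch: fold change of a mutant energy relative to the wild-type energy.
# Assumption: fold change = E_mut / E_wt (not defined in the text).
fold_change() {
  # usage: fold_change E_MUT E_WT
  awk -v m="$1" -v w="$2" 'BEGIN {
    if (w == 0) exit 1                       # ratio undefined for E_wt = 0
    printf "%.4f\n", m / w
  }'
}

# Example call: 415.134 is the SCWRL wild-type energy reported above,
# E_MUT is a placeholder for the energy of one mutant:
#   fold_change "$E_MUT" 415.134
```

Values close to 1 correspond to the "very slight" changes described above. Note that a ratio is harder to interpret when the two energies differ in sign or lie close to zero.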