Structure-based mutation analysis ARSA

From Bioinformatikpedia
Revision as of 09:46, 23 August 2011 by Zacher

Preparation

For the upcoming analyses in this task, a structure fulfilling the following requirements should be selected:

  • The resolution should be high, i.e. its value in Å should be small.
  • The structure should ideally have been resolved at physiological pH (7.4).
  • It should have a small R-factor. The R-factor measures how well the refined model agrees with the experimental diffraction data; the lower it is, the more reliable the crystal structure.

The PDB contains the following nine entries for ARSA:

PDB ID | Method | Resolution (Å) | Residues covered | R-factor
1AUK | X-ray | 2.10 | 19-507 | 0.248
1E1Z | X-ray | 2.40 | 19-507 | 0.196
1E2S | X-ray | 2.35 | 19-507 | 0.194
1E33 | X-ray | 2.50 | 19-507 | 0.187
1E3C | X-ray | 2.65 | 19-507 | 0.174
1N2K | X-ray | 2.75 | 19-507 | 0.202
1N2L | X-ray | 3.20 | 19-507 | 0.182
2AIJ | X-ray | 1.55 | 69-73 | 0.146
2AIK | X-ray | 1.73 | 68-74 | 0.142
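The selection criteria above can be applied to this table with standard shell tools. The following is a minimal sketch, assuming each row is supplied as five whitespace-separated fields in the order ID, method, resolution, residue range, R-factor; the function name rank_structures and the default minimum of 100 covered residues are hypothetical choices, and pH is ignored because it is not part of the table.

```sh
# Hypothetical helper: rank PDB entries by resolution, then by R-factor.
# Input (stdin): "ID method resolution first-last rfactor", one entry per line.
# $1: minimum number of covered residues (default 100), used to drop short fragments.
rank_structures() {
  awk -v min_len="${1:-100}" '
    NF >= 5 {
      split($4, range, "-")                 # e.g. "19-507" -> 19, 507
      if (range[2] - range[1] + 1 >= min_len) print
    }' | LC_ALL=C sort -k3,3n -k5,5n        # C locale so "2.10" sorts as a decimal number
}

# Example:
# printf '%s\n' '1AUK X-ray 2.10 19-507 0.248' '2AIJ X-ray 1.55 69-73 0.146' | rank_structures 100
```

Sorting by resolution first and R-factor second is only one possible weighting. Applied to the nine entries above with the default threshold, it drops the short fragments 2AIJ and 2AIK and ranks 1AUK first, although 1AUK has the highest R-factor of the remaining entries.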

The following table presents the PDB structures for BCKDHA to date. For each structure the resolution in [Å], the R-factor, coverage and pH-value are listed as well.

As all of our structures had missing residues in the middle of the protein, we could not use an unmodified structure for the subsequent analyses.

Furthermore, we could not use any of the PDB structures for BCKDHA directly, because all of them had gaps in the structure, i.e. some residues were missing. We therefore took the structure with the fewest gaps, 1U5B:

   * resolution: 1.83 Å
   * R-factor: 0.156
   * pH: 5.8

This structure had to be modified with additional programs to close the gaps. In addition, the first residues of BCKDHA are missing in 1U5B, which is why the structure starts at position 6 of the BCKDHA PDB sequence.
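Whether an entry has gaps, and where its residue numbering starts, can be read directly from the coordinate records (PDB files additionally list unobserved residues in REMARK 465 records). A minimal sketch, assuming a standard fixed-column PDB file: it walks over the C-alpha atoms of one chain and reports the first residue number and every jump in the numbering. The function name find_gaps is hypothetical, and insertion codes (column 27) are ignored, so entries that use them may be misreported.

```sh
# Hypothetical helper: report where the residue numbering of one chain starts and
# where it jumps, based on the C-alpha atoms of a fixed-column PDB file.
# $1: PDB file, $2: chain identifier (column 22)
find_gaps() {
  awk -v chain="$2" '
    substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " && substr($0, 22, 1) == chain {
      res = substr($0, 23, 4) + 0           # residue sequence number, columns 23-26
      if (!seen) printf "first residue: %d\n", res
      else if (res > prev + 1) printf "gap after residue %d: %d residue(s) missing\n", prev, res - prev - 1
      prev = res; seen = 1
    }' "$1"
}

# Example (hypothetical chain identifier):
# find_gaps 1U5B.pdb A
```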

Visualization with PyMOL

The following image shows a PyMOL visualization of arylsulfatase A (ARSA), together with all known active and binding sites.

PyMOL visualization of ARSA (with closed gaps). The active site is shown in yellow, the metal-binding site in blue, the substrate-binding sites in green and the missense mutations in red.

The mutations are spread throughout the protein: some lie near functional sites, others are far from them. The table below shows a separate PyMOL visualization for each mutation. With these, we want to investigate whether the location of a mutation relative to the functional sites correlates with its effect on protein function.
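Images of this kind can be scripted instead of clicked together. The sketch below, a hypothetical helper named pymol_highlight, prints PyMOL commands (show sticks and color on a resi selection) for a list of residue numbers. It assumes the model was loaded from arsa_model.pdb, so that PyMOL names the object arsa_model, and that the model uses the same residue numbering as the mutation positions. The residue numbers of the active, metal-binding and substrate-binding sites are not listed on this page, so only the mutation positions are filled in.

```sh
# Hypothetical helper: print PyMOL commands that show and color a set of residues.
# $1: PyMOL object name, $2: color name, remaining arguments: residue numbers
pymol_highlight() {
  obj=$1; color=$2; shift 2
  sel=$(printf '%s+' "$@")      # 29 136 153 -> 29+136+153+
  sel=${sel%+}                  # drop the trailing "+"
  printf 'show sticks, %s and resi %s\n' "$obj" "$sel"
  printf 'color %s, %s and resi %s\n' "$color" "$obj" "$sel"
}

# Example: mutation positions in red, then open the model together with the script
# pymol_highlight arsa_model red 29 136 153 193 274 356 409 440 489 496 > mutations.pml
# pymol arsa_model.pdb mutations.pml
```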

Nr. | Mutation | Position | PyMOL image | Description | Effect on function
1 | Asp-Asn | 29 | Arsa29.png | The mutation is located at a metal-binding site. | harmful
2 | Pro-Ala | 136 | Arsa136.png | The mutation is located near the active site and a substrate-binding site, both in sequence and in structure. | harmful
3 | Gln-His | 153 | Arsa153.png | The mutation is located near the active site and a substrate-binding site, both in sequence and in structure. | harmful
4 | Trp-Cys | 193 | Arsa193.png | The mutation is at a moderate distance from all important functional sites of the protein. | neutral
5 | Thr-Met | 274 | Arsa274.png | The mutation is located far from the active site and a substrate-binding site, both in sequence and in structure. It lies within a beta sheet. | harmful
6 | Phe-Val | 356 | Arsa356.png | The mutation is at a moderate distance from all important functional sites of the protein. | neutral
7 | Thr-Ile | 409 | Arsa409.png | The mutation is not close to important functional sites. | harmful
8 | Asn-Ser | 440 | Arsa440.png | The mutation is very far from all functional sites. | neutral
9 | Cys-Gly | 489 | Arsa489.png | The mutation is very far from all functional sites. | harmful
10 | Arg-His | 496 | Arsa496.png | The mutation is very far from all functional sites. | harmful

The visualizations above indicate that mutations near functionally important sites of the protein are likely to be harmful. For distant mutations, however, no trend can be observed.
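The impression of "near" and "distant" can be backed by a number. A minimal sketch, assuming a fixed-column PDB file and using the distance between C-alpha atoms as a rough proxy (side chains can come considerably closer than their C-alpha atoms); the function name ca_distance is hypothetical, and the residue numbers must follow the numbering used in the model file.

```sh
# Hypothetical helper: distance (in Å) between the C-alpha atoms of two residues.
# $1: PDB file, $2: chain identifier, $3/$4: residue numbers as used in the file
ca_distance() {
  awk -v chain="$2" -v r1="$3" -v r2="$4" '
    substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " && substr($0, 22, 1) == chain {
      res = substr($0, 23, 4) + 0
      if (res == r1 || res == r2) {
        # x, y, z coordinates: columns 31-38, 39-46, 47-54
        x[res] = substr($0, 31, 8); y[res] = substr($0, 39, 8); z[res] = substr($0, 47, 8)
      }
    }
    END {
      if (!(r1 in x) || !(r2 in x)) { print "residue not found" > "/dev/stderr"; exit 1 }
      dx = x[r1] - x[r2]; dy = y[r1] - y[r2]; dz = z[r1] - z[r2]
      printf "%.2f\n", sqrt(dx * dx + dy * dy + dz * dz)
    }' "$1"
}

# Example (hypothetical chain identifier):
# ca_distance arsa_model.pdb A 29 136
```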

SCWRL

First, we extracted the amino acid sequence from our PDB file and converted it to lower case:


repairPDB arsa_model.pdb -seq > arsa.model.seq
tr '[:upper:]' '[:lower:]' < arsa.model.seq > arsa.model.lower.seq

Next, we wrote each mutation into a separate copy of this sequence, with the mutated residue as a capital letter, and ran SCWRL with the following command:


scwrl cmd
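The per-mutation sequence files can be generated from arsa.model.lower.seq with a few lines of shell. A minimal sketch, assuming the file contains only the sequence (no FASTA header) and that the position passed in is the 1-based index into that string; because the model does not necessarily start at residue 1, protein positions such as 29 may first need an offset. The function name make_mutant_seq is hypothetical; it also checks the wild-type residue at that index, which catches numbering offsets early.

```sh
# Hypothetical helper: write the SCWRL sequence for one point mutation to stdout.
# $1: lower-case sequence file, $2: 1-based index into that sequence,
# $3: expected wild-type residue (one-letter code), $4: new residue (one-letter code)
make_mutant_seq() {
  tr -d '\n' < "$1" | awk -v pos="$2" -v wt="$3" -v mut="$4" '
    {
      if (pos < 1 || pos > length($0)) {
        print "index " pos " is outside 1-" length($0) > "/dev/stderr"; exit 1
      }
      found = substr($0, pos, 1)
      if (tolower(found) != tolower(wt)) {           # catches numbering offsets early
        print "expected " wt " at index " pos ", found " found > "/dev/stderr"; exit 1
      }
      # everything stays lower case except the substituted residue
      print substr($0, 1, pos - 1) toupper(mut) substr($0, pos + 1)
    }'
}

# Example (hypothetical index: residue 29 is index 11 if the model starts at residue 19 without gaps):
# make_mutant_seq arsa.model.lower.seq 11 d n > arsa_D29N.seq
```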

We also ran SCWRL on the wild type (wt) structure in order to make it comparable to energy predictions by other programs. The minimal energy of the graph for the wt is 415.134.

Nr. | Mutation | Position | Reference residue (image) | Mutated residue (image) | Both, without H-bonds (image) | Minimal energy | Energy(mutant)/Energy(wt)
1 | Asp-Asn | 29 | Arsa hydro29.png | Arsa scwrl hydro29.png | Arsa both29.png | 419.996 | 1.011712
2 | Pro-Ala | 136 | Arsa hydro136.png | Arsa scwrl hydro136.png | Arsa both136.png | 415.133 | 0.9999976
3 | Gln-His | 153 | Arsa hydro153.png | Arsa scwrl hydro153.png | Arsa both153.png | 420.494 | 1.012911
4 | Trp-Cys | 193 | Arsa hydro193.png | Arsa scwrl hydro193.png | Arsa both193.png | 416.252 | 1.002693
5 | Thr-Met | 274 | Arsa hydro274.png | Arsa scwrl hydro274.png | Arsa both274.png | 434.014 | 1.045479
6 | Phe-Val | 356 | Arsa hydro356.png | Arsa scwrl hydro356.png | Arsa both356.png | 415.134 | 1
7 | Thr-Ile | 409 | Arsa hydro409.png | Arsa scwrl hydro409.png | Arsa both409.png | 414.481 | 0.998427
8 | Asn-Ser | 440 | Arsa hydro440.png | Arsa scwrl hydro440.png | Arsa both440.png | 418.863 | 1.008983
9 | Cys-Gly | 489 | Arsa hydro489.png | Arsa scwrl hydro489.png | Arsa both489.png | 415.136 | 1.000005
10 | Arg-His | 496 | Arsa hydro496.png | Arsa scwrl hydro496.png | Arsa both496.png | 421.011 | 1.014157

FoldX

wt: 510.88

Nr. | Mutation | Position | Minimal energy | Energy(mutant)/Energy(wt)
1 | Asp-Asn | 29 | 496.88 | 0.9725963
2 | Pro-Ala | 136 | 493.81 | 0.966587
3 | Gln-His | 153 | 493.90 | 0.9667632
4 | Trp-Cys | 193 | 496.23 | 0.971324
5 | Thr-Met | 274 | 503.34 | 0.9852412
6 | Phe-Val | 356 | 495.39 | 0.9696798
7 | Thr-Ile | 409 | 495.45 | 0.9697972
8 | Asn-Ser | 440 | 496.79 | 0.9724201
9 | Cys-Gly | 489 | 495.87 | 0.9706193
10 | Arg-His | 496 | 498.75 | 0.9762567

Minimise

wt: -3839.492677

Nr. | Mutation | Position | Free energy | Energy(mutant)/Energy(wt)
1 | Asp-Asn | 29 | -4174.487222 | 1.087250
2 | Pro-Ala | 136 | -4164.523707 | 1.084655
3 | Gln-His | 153 | -4109.12924 | 1.070227
4 | Trp-Cys | 193 | -4169.617285 | 1.085981
5 | Thr-Met | 274 | -4065.730562 | 1.058924
6 | Phe-Val | 356 | -4109.021939 | 1.070199
7 | Thr-Ile | 409 | -4121.130656 | 1.073353
8 | Asn-Ser | 440 | -4120.275589 | 1.073130
9 | Cys-Gly | 489 | -4127.969116 | 1.075134
10 | Arg-His | 496 | -4120.151911 | 1.073098
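The tables above compare each mutant with the wild type via the ratio Energy(mutant)/Energy(wt). Since the Minimise energies are negative, a ratio above 1 there means a more negative, i.e. lower, energy than the wild type, whereas in the SCWRL and FoldX tables a ratio above 1 means a higher energy. The difference ΔE = E(mutant) − E(wt) avoids this sign trap. A minimal sketch, assuming that for all three programs a lower energy corresponds to a more stable structure; the function name compare_energies and the labels it prints are hypothetical.

```sh
# Hypothetical helper: compare mutant energies with the wild-type energy.
# $1: wild-type energy; stdin: "<label> <mutant energy>" per line.
# Assumes lower (more negative) energy = more stable, for every program.
compare_energies() {
  awk -v wt="$1" '
    NF >= 2 {
      delta = $2 - wt                 # sign is meaningful even for negative energies
      ratio = $2 / wt                 # the ratio used in the tables, for reference
      if (delta < 0)      effect = "stabilizing"
      else if (delta > 0) effect = "destabilizing"
      else                effect = "unchanged"
      printf "%s\tdelta=%.3f\tratio=%.6f\t%s\n", $1, delta, ratio, effect
    }'
}
```

For example, printf 'D29N -4174.487222\n' | compare_energies -3839.492677 reports delta=-334.995 and ratio=1.087250, labelled stabilizing, although the ratio is above 1.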


Gromacs

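The name Gromacs originally stood for 'GROningen MAchine for Chemical Simulations' ('GROningen Mixture of Alchemy and Childrens' Stories' is one of the joke expansions printed by the programs). It is a package for performing molecular dynamics simulations and is free software, available under the GNU General Public License.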

Workflow

1. Use fetchpdb to get the PDB structure. Look at the script: what does it do?

-> We did not use fetchpdb, as we already had the required PDB file. The script simply downloads the PDB file for a given PDB ID from the PDB web page.
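fetchpdb itself is not reproduced on this page. As a hedged stand-in that does the same job, the functions below build the RCSB download URL for an ID and fetch the file with curl. The function names are hypothetical, and the URL pattern (https://files.rcsb.org/download/ID.pdb) is the current RCSB download address, not necessarily the one fetchpdb used.

```sh
# Hypothetical stand-in for fetchpdb (not the course script itself).
# pdb_url: print the RCSB download URL for a 4-character PDB ID.
pdb_url() {
  id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case $id in
    [1-9][[:alnum:]][[:alnum:]][[:alnum:]]) printf 'https://files.rcsb.org/download/%s.pdb\n' "$id" ;;
    *) echo "not a PDB ID: $1" >&2; return 1 ;;
  esac
}

# fetch_pdb: download ID.pdb into the current directory.
fetch_pdb() {
  url=$(pdb_url "$1") || return 1
  # -f: fail on HTTP errors, -sS: quiet except for errors, -L: follow redirects
  curl -fsSL -o "${url##*/}" "$url"
}

# Example:
# fetch_pdb 1auk    # writes 1AUK.pdb
```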


2. Use repairPDB to clean the PDB file and extract the protein only. Describe which options you chose and which other options are available. Make sure you choose the right chain.

-> TODO
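The options of repairPDB are not documented on this page. Purely to illustrate what "extract the protein only" and "choose the right chain" mean at the level of PDB records, a hypothetical stand-in (extract_chain, not part of repairPDB) could keep only the ATOM records of one chain, dropping HETATM records such as water and ligands, and keep only atoms without an alternate location indicator or with indicator A. Modified residues stored as HETATM records would be dropped as well, which can open new gaps.

```sh
# Hypothetical stand-in, NOT repairPDB: keep the protein atoms of one chain.
# $1: PDB file, $2: chain identifier (column 22)
extract_chain() {
  awk -v chain="$2" '
    substr($0, 1, 6) == "ATOM  " && substr($0, 22, 1) == chain {
      alt = substr($0, 17, 1)             # alternate location indicator
      if (alt == " " || alt == "A") print
    }
    END { print "END" }' "$1"
}

# Example (hypothetical chain identifier):
# extract_chain arsa.pdb A > arsa_chainA.pdb
```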


3. Run SCWRL with the lowercase protein sequence to make sure there are no missing sidechains. The sequence can be extracted using repairPDB. After SCWRL you will have to remove the hydrogens again.

-> TODO
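As the task notes, the SCWRL output contains hydrogens that have to be removed again. One hedged way to do this is to filter the PDB file by element, as in the hypothetical helper strip_hydrogens below: it uses the element column (77-78) when present and otherwise falls back to the first letter of the atom name after removing digits. Alternatively, pdb2gmx's -ignh option ignores hydrogen atoms in its input.

```sh
# Hypothetical helper: drop hydrogen atoms from a PDB file (all other lines are kept).
strip_hydrogens() {
  awk '
    /^(ATOM  |HETATM)/ {
      elem = substr($0, 77, 2); gsub(/ /, "", elem)   # element symbol, columns 77-78
      if (elem == "") {                               # no element column: use the atom name
        name = substr($0, 13, 4); gsub(/[ 0-9]/, "", name)
        elem = substr(name, 1, 1)
      }
      if (toupper(elem) == "H" || toupper(elem) == "D") next
    }
    { print }' "$1"
}
```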


4. Use the gromacs command pdb2gmx: the option -f defines the input structure, -o the gromacs output file (.gro) and -p the topology (.top) output file. IMPORTANT: the input file name must end with ".pdb". Choose a force field and, for water, the TIP3P model.

-> We chose the AMBER03 force field, as it is optimized for use with proteins.
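The exact pdb2gmx call is not recorded on this page. A hedged sketch for GROMACS 4.5-style tools (in GROMACS 5 and later the same tool is invoked as gmx pdb2gmx): the -ff and -water options select the force field and water model without the interactive prompts. The wrapper name run_pdb2gmx and the input file name in the example are hypothetical; the wrapper also enforces the ".pdb" requirement stated in the task.

```sh
# Hypothetical wrapper around pdb2gmx.
# $1: input structure (must end in .pdb), $2: base name for the .gro and .top output
run_pdb2gmx() {
  case $1 in
    *.pdb) ;;
    *) echo "pdb2gmx input must end in .pdb: $1" >&2; return 1 ;;
  esac
  # -ff/-water choose AMBER03 and TIP3P non-interactively
  pdb2gmx -f "$1" -o "$2.gro" -p "$2.top" -ff amber03 -water tip3p
}

# Example (output names as in the grompp step below):
# run_pdb2gmx arsa_model.pdb gromacs_AMBER03
```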


5. Create an MDP file with the following content:

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw	 = 1.0
constraints = none
pbc = no

Give a brief description of the different keywords used in this file - for this you should use the gromacs manual.

-> DFLEXIBLE: water in the topology is flexible
-> DPOSRES: position restraints, defined in posre.itp, are included in the topology
-> implicit_solvent = GBSA: defines the implicit solvent model, here the Generalized Born formalism is used
-> integrator = steep: defines the algorithm for energy minimization, here the steepest descent algorithm is used
-> emtol: defines when convergence is assumed: the minimization is stopped when the maximum force is smaller than this value (in kJ mol^-1 nm^-1)
-> nsteps: defines the maximum number of steps in the energy minimization
-> nstenergy: number of steps between writing the energies to the energy file
-> energygrps: group(s) whose energies are written to the energy file
-> ns_type: defines the type of neighbor searching, here a grid is built and only atoms in neighboring grid cells are checked when a new neighbor list is constructed
-> coulombtype = cut-off: twin-range cut-off with neighbor-list cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist
-> rcoulomb: distance (in nm) for the Coulomb cut-off
-> rvdw: distance (in nm) for the LJ or Buckingham cut-off
-> constraints = none: no constraints apart from those defined explicitly in the topology, i.e. bonds are represented by harmonic (or other) potentials
-> pbc = no: use no periodic boundary conditions and ignore the box


6. Use grompp to prepare the system for gromacs: grompp -v -f FILE.mdp -c FILE.gro -p FILE.top -o FILE.tpr. FILE.tpr is the run input file created here, which is used in the next step.

-> commandline: grompp -v -f gromacs_AMBER03.mdp -c gromacs_AMBER03.gro -p gromacs_AMBER03.top -o gromacs_AMBER03.tpr


7. Now we minimize the system: mdrun -v -deffnm FILE

-> commandline: mdrun -v -deffnm gromacs_AMBER03


8. Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace; in the print settings you can choose EPS output, then print and convert to PDF.
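Besides viewing the plots in xmgrace, the start and end values of a minimization can be read off the .xvg file directly. A minimal sketch, assuming the file was written by g_energy as in step 8: lines starting with # or @ are comments and plot settings (the @ legend lines name the selected terms), and each remaining line holds the x value followed by one column per selected term. The function name xvg_summary is hypothetical.

```sh
# Hypothetical helper: summarize one data column of an .xvg file written by g_energy.
# $1: .xvg file, $2: column number (default 2, i.e. the first selected energy term)
xvg_summary() {
  awk -v col="${2:-2}" '
    /^[#@]/ { next }                      # skip comments and xmgrace settings
    NF >= col {
      v = $col + 0
      if (n == 0) { first = v; min = v }
      if (v < min) min = v
      last = v; n++
    }
    END {
      if (n == 0) { print "no data rows" > "/dev/stderr"; exit 1 }
      printf "rows=%d first=%g last=%g min=%g change=%g\n", n, first, last, min, last - first
    }' "$1"
}

# Example:
# xvg_summary energy_1.xvg 2
```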

Results

In this section, we summarize the analyses performed above and, based on these results, decide for each mutation whether we can assign a neutral or non-neutral effect.

Mutation 1

The mutation is located at the metal-binding site of the protein. Based on this information alone, we would already suggest that the mutation is non-neutral. However, we also want to integrate the other information we have. The H-bond pattern changes. SCWRL assigns a higher energy to the mutated protein than to the wild type (a destabilizing effect), whereas FoldX and Minimise assign it a lower energy (a stabilizing effect). In summary, the location of the mutation and the changing H-bond pattern suggest that this is a harmful mutation. The fold changes in energy are very small and thus give no clear evidence for either case.
The mutation is indeed harmful.

Mutation 2

Mutation 2 is located near the active site and a substrate-binding site, both in sequence and in structure. According to the SCWRL mutagenesis analysis, the H-bond pattern does not change. Again, the energy fold changes are very small: SCWRL assigns practically the same energy to the mutant as to the wild type (415.133 vs. 415.134), whereas FoldX and Minimise assign it a lower energy, i.e. a stabilizing effect. As for mutation 1, we are left to consider only the location of the mutation and the H-bond pattern. In this case, however, we cannot make a reliable guess based on this information, although the location of the mutation could indicate that it is non-neutral.
HGMD assigns a harmful effect to the mutation.

Mutation 3

The mutation is located near the active site and a substrate-binding site, both in sequence and in structure. SCWRL predicts a slightly higher energy for the mutated protein (a destabilizing effect), whereas FoldX and Minimise predict a lower energy (a stabilizing effect). The SCWRL analysis predicts that the H-bond pattern changes. Here, too, the predicted changes in energy give no clear hint on the effect of the mutation. Again, we cannot make a reliable guess based on this information, although the location of the mutation could indicate that it is non-neutral.
HGMD assigns a harmful effect to the mutation.

Mutation 4

The mutation is at a moderate distance from all important functional sites of the protein. The H-bond pattern changes; SCWRL predicts a slightly destabilizing effect, whereas FoldX and Minimise predict a stabilizing effect. As the H-bond pattern changes, the local structure of the protein is likely altered. However, the mutation is not very close to any functional site, so it might be harmless. There is no clear evidence either way.
This mutation is taken from dbSNP and is neutral.

Mutation 5

The mutation is located far from the active site and a substrate-binding site, both in sequence and in structure. It lies within a beta sheet, and the SCWRL analysis shows that the H-bond pattern changes in the mutated protein. As the mutation lies within a secondary structure element and changes the H-bond pattern, it could disrupt the beta sheet and thus alter the structure of the whole protein.
The energy predictions are again contradictory: SCWRL predicts a destabilizing effect, while FoldX and Minimise predict a stabilizing effect. As stated above, the mutation could disrupt the beta sheet and therefore have a large impact on the overall structure. Thus, the mutation might be harmful.
HGMD assigns a deleterious effect to the mutation.

Mutation 6

The mutation is at a moderate distance from all important functional sites of the protein. The H-bond pattern changes, and SCWRL assigns exactly the same energy to the mutated protein as to the wild type, whereas FoldX and Minimise predict a stabilizing effect. Again, this information is not very useful; the only somewhat informative facts are the location and the changing H-bond pattern, which give evidence neither for nor against neutrality.
The mutation is taken from dbSNP and has a neutral effect.

Mutation 7

The mutation is not close to important functional sites, and SCWRL predicts that the H-bond pattern is altered in the mutated protein. SCWRL, FoldX and Minimise all assign the mutant a lower energy than the wild type, i.e. a stabilizing effect. Again, this gives no clear hint on whether the mutation could be harmful or not.
The mutation is taken from HGMD and is disease-causing.

Mutation 8

The mutation is very far from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL predicts a destabilizing effect, while FoldX and Minimise predict a stabilizing effect. This information gives no clear hint on whether the mutation could be harmful or not.
The mutation is taken from dbSNP and is neutral.

Mutation 9

The mutation is very far from all functional sites. SCWRL predicts that the H-bond pattern does not change and assigns practically the same energy as to the wild type (415.136 vs. 415.134), while FoldX and Minimise predict a stabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).

Mutation 10

The mutation is very far from all functional sites. SCWRL predicts that the H-bond pattern changes. SCWRL predicts a destabilizing effect, while FoldX and Minimise predict a stabilizing effect. Again, we are left to guess whether the mutation is neutral or not.
The mutation is non-neutral (HGMD).


Summary

Compared to the sequence-based mutation analysis, we had much more difficulty discriminating neutral from non-neutral mutations using only structural features of the protein. In most cases we were left to guess blindly, because no reliable information gave clear evidence. The most informative features for this structure-based mutation analysis were the H-bond patterns predicted by SCWRL and the location of the mutation with respect to functionally important sites.
Surprisingly, the energies of the mutated proteins differed only slightly from those of the wild type, and we did not know how to interpret the impact of these differences on the structure and thus on the function. Moreover, the predicted (de)stabilizing effects were not consistent across the methods: FoldX and Minimise assigned a lower energy than the wild type to every mutant, whereas SCWRL assigned a higher energy to six of the ten mutants. This made these analyses of little use to us.
However, some of the structural information is useful for separating neutral from non-neutral mutations if it is combined with additional methods, e.g. those of the sequence-based mutation analysis.