Structure-based mutation analysis BCKDHA

From Bioinformatikpedia

Structure selection

In order to perform a structure based mutation analysis an appropriate protein structure is needed. Therefore one has to check whether the resolution of the structure is high enough, that means the Å-value should be small. The R-factor (reliability factor or residual factor) is another measure to access the quality of a structure. The R-factor rates how well the crystallographic model predicts the observed data from experimental X-ray diffraction. The smaller the R-factor the better. The coverage is the percentage of UniProt sequence present in the sample structure. This value was obtained from [[1]]. The coverage should be quite high. The pH-value indicates the pH-value at which the structure was resolved. Ideally it should be resolved at physiological pH (7.4).

The following table presents the PDB structures for BCKDHA to date. For each structure the resolution in [Å], the R-factor, coverage and pH-value are listed as well.

PDB id resolution [Å] R-factor % of UniProt sequence present in the sample %residues observed ph-value
1DTW 2.70 0.224 90% 97% 7.5*
1OLS 1.85 0.172 90% 96% 5.5
1OLU 1.90 0.161 90% 91% 5.5
1OLX 2.25 0.161 90% 97% 5.5
1U5B 1.83 0.156 90% 97% 5.8
1V11 1.95 0.139* 90% 92% 5.5
1V16 1.90 0.132* 90% 92% 5.5
1V1M 2.00 0.130* 90% 93% 5.5
1V1R 1.80 0.158 90% 88% 5.5
1WCI 1.84 0.149 90% 97% 5.5
1X7W 1.73 0.148 90% 92% 5.8
1X7X 2.10 0.149 90% 92% 5.8
1X7Y 1.57 0.150 90% 92% 5.8
1X7Z 1.72 0.154 90% 97% 5.8
1X80 2.00 0.161 90% 92% 5.8
2BEU 1.89 0.171 90% 97% 5.5
2BEV 1.80 0.139 90% 97% 5.5
2BEW 1.79 0.147 90% 97% 5.5
2BFB 1.77 0.145 90% 92% 5.5
2BFC 1.64 0.144 90% 92% 5.5
2BFD 1.39* 0.150 90% 93% 5.5
2BFE 1.69 0.150 90% 93% 5.5
2BFF 1.46 0.150 90% 98% 5.5
2J9F 1.88 0.171 90% 95% 5.5

The asteriks-marked values indicate that these structures were resolved with the asked experimental quality. As one can see, none of the structures fulfills all conditions.

Furthermode, we could not use any of the PDB structures for BCKDHA because all of them had gaps in the secondary structure which means that some residues were missing. So we took the structure which has the less gaps: 1U5B

  • resultion: 1.83
  • R-factor: 0.156
  • ph-value: 5.8

This structure has to be modified with some programms to close the gaps. Additionally the first residues which are in BCKDHA misses in 1U5B thats why the start position corresponds to position 6 of the BCKDHA -PDB sequence.

As we can see none of the values corresponds to the demands because it was asked for a structure which has a very small R-factor, a pH of 7.4 and a high resolution.

Mapping of the mutations on the crystal structure

Figure 1 and 2 shows the structure of BCKDHA with mapped mutations.

Figure 1: Structure of BCKDHA. Violet: mutations, orange: thiamine pyrophosphate binding sites, yellow: metal binding sites.
Figure 2: Surface of BCKDHA. Violet: mutations, orange: thiamine pyrophosphate binding sites, yellow: metal binding sites.

As shown in Figure 2 there are some mutations on the surface of the protein, which might have an effect on the stability of the protein. Furthermore it is possible to recognize a mutation which is quite near to the active center. These facts are studied in more detail in the mutation analysis section.

Comparison energies


Before we could use SCWRL we first had to get the sequence of our model:

repairPDB bckdha.pdb  -seq >> bckdha.seq

When we have the sequence we have to make one file for each mutation. In these files we copied the bckdha.seq and changed the sequence to lower case letters. Then we add the mutation in an upper case letter.

To run SCWRL we used the command:

scwrl -i bckdha.pdb -s mutation1.seq -o mutation1Model.pdb

Total minimal energy of the graph

Position Energy
M82L 642.213
Q125E 616.85
Y166N 616.293
G249S 633.378
C264W 805.257
R265W 710.647
I326T 619.424
F409C 617.305
Y438N 615.951


To use foldX we first build a runscript. It is important to change values of <Temperature> and <pH> to the values of the used protein. These values can be found on the pdb side . Additionally we had to create one file with all PDB Ids each in a new line (list.txt). We used the command Foldx -runfile run.txt > Stout.txt to run the programm.

total energy difference
wildtype 401.00 0
M82L 437.88 -36.88
Q125E 431.77 -30.77
Y166N 432.24 -31.24
G249S 432.22 -31.22
C264W 488.43 -87.43
R265W 460.43 -59.43
I326T 432.94 -31.94
F409C 433.33 -32.33
Y438N 431.56 -30.56

After using foldx we have the total energy for the wiltype protein and for each mutation. The value of the wildtype protein is 401.00 which is already a high value. This means that the protein is quite instabile. To find out which mutation has a high influence on the protein we look at the energies and especially on the difference between the energy of the mutated protein and the wildtype protein. All of the mutated proteins have a much higher energy than the unmutated protein which means that these proteins are less stable. We can see in the table that the proteins can be divided into two groups. The first group has an energy difference of about 31 and the other group has a much higher difference.


It is important to remove the hydrogens and water before using the programm. For this we used the new version of repairPDB of the virtualbox. The programm can be started with the command: repairPDB bckdha.pdb -nosol out.pdb > Stout.txt
It is also possible to use the old version but then the command is: repairPDB bckdha.pdb -nosol -noh out.pdb > Stout.txt
It is useful to save the output in a file because it includes the energy.

total energy difference
wildtype -2485.452755 0
M82L -4253.174790 1767.722015
Q125E -4080.989512 1595.536757
Y166N -4354.495238 1869.042483
G249S -4280.043000 1794.590245
C264W -3745.313620 1259.860865
R265W -3989.790625 1504.33787
I326T -4317.105618 1831.652863
F409C -4358.528143 1873.075388
Y438N -4339.778964 1854.326209

Minimise calculates the energy for a mutation by building a new model for each mutation. Then it calculates the energy for the whole mutated model. The aim by comparing the mutated models with the wildtype is to find out if there is a structural change caused by a mutation. The calculated energies are discussed in more detail in the mutation analysis section (below).


The first part describes general background information for gromacs as well as how to run those programs. The second part contains the result description and analysis.


1. fetchpdb

The fetch-pdb script first checks, if it was called with an valid PDB-id. If the entered PDB code has 4letters, the script tries to download the pdb-file from the server. The successfully downloaded folder gets unzipped and everything except the actual pdb file is removed.

2. repairPDB

For repairPDB the following options are available:

-offset value offset the residue numbering
-chain char change Chain ID
-ratom renumber Atoms
-rres renumber Residues
-noh remove hydrogens
-het no change of HETATM to ATOM for AA
-seq returns protein sequence from AA in pdb file
-seqrs protein sequence from SEQRES entries
-nosol just Protein, no solvent OR
-ssw cutoff print only waters with B-value below cutoff OR
-cleansol remove overlapping solvent for GROMACS

We run repairPDB using the following command:

repairPDB bckdha_mod.pdb -noh -nosol > bckdha_clean.pdb

Using this command we removed hydrogens and solvent from our pdb to get just the protein.


SCWRL was executed using the following command:

scwrl -i bckdha_mod.pdb -s extractedPDB.seq -o bckdha_scwrl.pdb

SCWRL returned a pdb including HETATOMS. These solvent atoms needed to be removed before continuing.


(use clean pdb without HEATOMS!)

pdb2gmx -f bckdha_clean.pdb -o bckdha.gro -p -water tip3p -ff amber03

5. MDP

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw	 = 1.0
constraints = none
pbc = no

adjust nsteps for the time vs steps analysis

integrator a steepest descent algorithm for energy minimization.
emtol tolerance for steep integrator:the minimization is converged when the maximum force is smaller than this value
nsteps maximum number of steps to integrate or minimize, -1 is no maximum
nstenergy frequency to write energies to energy file (last energies are always written)
energygrps groups to write to energy files
ns_type Which atoms to check when construction a new neighbor list
coulombtype which coulomb-type to use, e.g. Cut-off, PME, Ewald
rcoulomb distance for the Coulomb cut-off
rvdw Van-der-Waals cut-of (distance for the LJ or Buckingham cut-off)
constraints constraints for bonds, e.g bonds are presented by a potention, all bonds are converted to constraints,...
pbc Remove the periodicity (make molecule whole again)

6. grompp

grompp -v -f bckdha.mdp -c bckdha.gro -p -o bckdha.tpr

7. System Minimization

mdrun -v -deffnm bckdha 2> mdrun_out.txt

8. Analyzation

g_energy -f bckdha.edr -o energy_1.xvg


Wildtype analysis: nsteps vs time

The table below shoes the running time for mdrun depending on different values for nsteps. It also lists the real number of steps carried out to calculate the energy.

steps time (real) [s] time (user) [s] time (sys) [s] performed steps
50 5.453 4.730 0.120 50
100 10.393 9.210 0.240 100
500 36.419 30.660 0.780 338
1000 5.261 4.390 0.130 47
2000 10.564 8.500 0.290 93
3000 10.661 8.840 0.230 96
4000 2.620 2.010 0.140 21
5000 3.693 3.300 0.100 35

Figure 3 shows the correlation between nsteps and the running time for mdrun

Figure 3: nsteps vs running time for mdrun

Interestingly, the running time is not dependent on the number of nsteps, but just on the number of really performed steps. There is a linear dependency between the calculation time and the number of performed steps. The number of performed steps however is not correlating with the value for nsteps. It is not obvious why the number of performed steps varies so extremely given a certain value for nsteps.

Wildtype analysis: force fields

The different force fields chosen for this task were:

  • AMBER03
Figure 4: GROMACS Energy for the AMBER03 forcefield using the wildtype bckdha structure.
  • CHARMM27
Figure 5: GROMACS Energy for the CHARMM27 forcefield using the wildtype bckdha structure.
Figure 6: GROMACS Energy for the OPLS-AA forcefield using the wildtype bckdha structure.

Bond Analysis

Force Field Average Err. Est. RMSD Tot-Drift (kJ/mol)
AMBER03 3072.83 2200 -nan -13100.2
CHARMM25 3180.46 1700 7382.72 -9958.05
OPLS 2780.55 2100 -nan -11542.6

Angle Analysis

Force Field Average Err. Est. RMSD Tot-Drift (kJ/mol)
AMBER03 3616.97 230 -nan -1295.57
CHARMM25 5018.38 490 1646.81 -2783.35
OPLS 3271.23 340 -nan -1889.98

Potential Analysis

Force Field Average Err. Est. RMSD Tot-Drift (kJ/mol)
AMBER03 2.67001e+07 2.6e+07 -nan -1.60382e+08
CHARMM25 487.479 97 199.742 673.043
OPLS 2.38353e+07 2.4e+07 -nan -1.39932e+08

Comparing the results for the different force fields it is noticeable that only Charmm27 produces values for the RMSD. The values calculated by AMBER03 and OPLS are mostly similar, only the CHARMM27 values vary totally.

Figure 4 shows the energy calculated by Gromacs using the AMBER03 forcefield applied to the wildtype protein structure. Figure 5 and 6 are showing the energy for the Charmm25 and the OPLS-AA force fields, respectively.

Mutation analysis

In order to discuss the effect of different mutations more in detail, one page for each mutation was created:


The following table summarizes the calculated energies for the different mutations.

 FoldX  Minimise  Gromacs
Mutation energy value difference energy value difference energy value difference
wildtype 401.00 0 -2485.452755 0 2.67001e+07 0
M82L 437.88 36.88 -4253.174790 -1767.722015 5.16e+06 -21540100
Q125E 431.77 30.77 -4080.989512 -1595.536757 5.23e+06 -21470100
Y166N 432.24 31.24 -4354.495238 -1869.042483 7.95e+06 -18750100
G249S 432.22 31.22 -4280.043000 -1794.590245 5.96e+06 -20740100
C264W 488.43 87.43 -3745.313620 -1259.860865 3.41e+07 7399900
R265W 460.43 59.43 -3989.790625 -1504.33787 5.36e+06 -21340100
I326T 432.94 31.94 -4317.105618 -1831.652863 7.29e+06 -19410100
F409C 433.33 32.33 -4358.528143 -1873.075388 4.68e+06 -22020100
Y438N 431.56 30.56 -4339.778964 1854.326209 8.33e+06 -18370100

Interestingly, foldX calculates for all mutations a higher energy than for the wildtype, whereas the energy of the mutated protein calculated by Minimise and Gromacs is (almost) always lower than the wildtype energy. The difference in energies is also very varying between these tools. While foldX returns usually very small energy differences between wildtype and mutated structures, the differences are bigger for minimise and even higher for Gromacs. The most interesting observation from these energies is the mutation C264W, where foldX calculates a very different energy for the mutated protein and also Gromacs returns for the only time an energy that is higher for the mutated structure than for the wildtype structure. See detailed mutation analysis for more information.


go back to Maple syrup urine disease main page

go back to Task 6 Sequence based mutation analysis

go to Task 8 Molecular Dynamics Simulations

go to Reference Sequence BCKDHA