Task 7: MSUD - Structure-based mutation analysis

From Bioinformatikpedia
Revision as of 16:05, 25 June 2012 by Wagnerr (talk | contribs) (Minimise)

Task description

preparation

First an x-Ray pdb structure<ref>http://www.uniprot.org/uniprot/P12694</ref> file had to be chosen for our protein.

It is important that the structure has a high resolution (small Å value);
furthermore the R-factor should be as small as possible, and the higher the coverage the better.
Also, check at which pH-value the structure was resolved; ideally you want physiological pH (7.4).
EntryMethodResolution(Å)ChainPositions
1DTWX-ray2.70A46-445
1OLSX-ray1.85A46-445
1OLUX-ray1.90A46-445
1OLXX-ray2.25A46-445
1U5BX-ray1.83A46-445
1V11X-ray1.95A46-445
1V16X-ray1.90A46-445
1V1MX-ray2.00A46-445
1V1RX-ray1.80A46-445
1WCIX-ray1.84A46-445
1X7WX-ray1.73A46-445
1X7XX-ray2.10A46-445
1X7YX-ray1.57A46-445
1X7ZX-ray1.72A46-445
1X80X-ray2.00A46-445
2BEUX-ray1.89A46-445
2BEVX-ray1.80A46-445
2BEWX-ray1.79A46-445
2BFBX-ray1.77A46-445
2BFCX-ray1.64A46-445
2BFDX-ray1.39A46-445
2BFEX-ray1.69A46-445
2BFFX-ray1.46A46-445
2J9FX-ray1.88A/C46-445

We first looked at the structure with the best resolution: 2BFD. It contains chain A and B of the protein. For us only chain A is of interest, here the file covers 400 residues( 90% of the uniprot sequence ). The structure was generated the 12th of april in 2004, using a PH-value of 5.5. Although these values would be ok, we compared them to the next best structures( looking at the resolution ):

pdbresolution(A°)phR-value
2BFD1.395.50.150
2BFF1.465.50.150
1X7Y1.575.80.150
2BFC1.645.50.144
2BFE1.695.50.150
1U5B1.835.80.156

A new Problem turned out here, all of the x-Ray structures contain gaps, which some of the tools can't handle. As the ph-values as well as the R-values do not differ too much, we have decided to use the structure with the least gaps: 1U5B. The next step was, to prepare our list of SNP's, and substitute those, that are not contained in the pdb-sequence. All pdb-files cover position 46-445. This means we had to substitute the SNP L17F. L17F is not listed in the HGMD, and for this seen as neutral, what means we had to chose another neutral SNP for it. We've chosen A71G. The new list of SNP's then is:

N71S, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H
The picture shows the 10 SNP's colored in blue on the chain A of obda_human in green. The structure is shown with pymol using the modified PDB file.

obda_human has 4 annotated sites in the uniprot entry P12694:

posfunctionsnp
157-159Thiamine pyrophosphate binding -
206metal binding -
211metal binding -
212metal binding -

tools

SCWRL

It is possible to give SCWRL the mutated sequence.
This can be done by extracting the sequence with repairPDB.
Then you change all letters to lower case. Next you introduce the new amino acid letter (mutation) in capital letters to the sequence file.
This sequence file can be read in by SCWRL using the –s flag. Check if only the mutation side chain has been changed.

the extraction of the sequence and a mapping of positions between snps and pdb-sequence can be found in the journal

foldX

The first thing we observed when looking at the energy comparison between the wild type and the respective mutants was that introducing a mutation at a specific position did not cause a homogeneous increase or decrease in all energy aspects. The C258Y Mutation for example decreases both the Van der Waals and Solvation hydrophobic energies, while strongly increasing the energy due to Van der Waals clashes. As different energy aspects mights be more or less important for a specific position inside a protein in order to conserve function, in an ideal case these should be selected to determine the effect of a mutation. While this might be possible when expert knowledge about the protein is available, it most likely is not when applying the method to a large dataset, or when a fast classification is required. In this case the easiest option would be to just look at the total energy score. We decided to go with this approach and selected an increase of 1.0 in total energy as our classification margin for impaired function. According to this we consider the following mutations as deleterious: Q125E, I213T, C258Y, T310R, A328T.

Pdb total energy Backbone Hbond Sidechain Hbond Van der Waals Electrostatics Solvation Polar Solvation Hydrophobic Van der Waals clashes entropy sidechain entropy mainchain sloop_entropy mloop_entropy cis_bond torsional clash backbone clash helix dipole water bridge disulfide electrostatic kon partial covalent bonds energy Ionisation Entropy Complex
1u5b_A_N71S.pdb 0.47 0.13 0.13 0.12 0.00 -0.16 0.21 0.00 -0.16 0.20 0.00 0.00 0.00 -0.00 -0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_M82I.pdb -0.12 -0.00 0.00 0.33 -0.03 -0.15 0.24 0.06 -0.31 -0.39 0.00 0.00 0.00 0.12 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_Q125E.pdb 1.50 -0.01 3.18 -0.05 -0.70 -0.04 -0.05 0.15 -1.58 0.07 0.00 0.00 0.00 -0.03 -0.03 0.55 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_I213T.pdb 2.21 -0.38 -0.49 1.04 0.00 0.44 2.54 -0.08 -0.49 -0.38 0.00 0.00 0.00 0.01 -0.44 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_C258Y.pdb 4.55 0.01 0.17 -1.78 0.07 1.11 -2.84 7.22 0.31 0.01 0.00 0.00 0.00 0.27 0.90 -0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_T310R.pdb 8.51 0.02 0.28 -1.67 0.13 3.01 -1.29 4.89 1.30 0.04 0.00 0.00 0.00 2.22 0.11 -0.42 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_A328T.pdb 2.77 -0.40 -0.41 -0.63 -0.01 1.14 -0.63 2.76 0.51 -0.92 0.00 0.00 0.00 1.36 -0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_I361V.pdb 0.50 0.01 -0.01 0.48 0.00 -0.28 0.86 -0.04 -0.55 0.02 0.00 0.00 0.00 0.01 -0.07 0.00 0.00 0.00 0.00 0.00 -0.01 0.00
1u5b_A_N404S.pdb 0.20 0.01 0.20 0.37 0.00 -0.48 0.50 -0.02 -0.25 -0.22 0.00 0.00 0.00 0.09 -0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1u5b_A_R429H.pdb -0.08 -0.14 -0.19 0.37 0.14 -0.50 0.40 0.03 -0.43 0.26 0.00 0.00 0.00 -0.19 -0.20 -0.07 0.00 0.00 0.00 0.00 0.25 0.00

Minimise

Protein run 1 run 2 run 3 run 4 run 5
Original Protein -166.145856 -8144.2007 -8775.354687 -8764.146148 -8740.855895
FoldX_N71S -254.098857 -8154.570333 -8779.05239 -8773.230437 -8736.978125
SCWRL_N71S -682.78865 -4603.39178 -6239.51939 -6556.667909 -6798.138952
FoldX_M82I -227.589089 -8173.810999 -8802.978465 -8771.965885 -8744.953646
SCWRL_M82I -380.259836 -4582.016239 -6239.561724 -6600.661303 -6785.40363
FoldX_Q125E -152.379919 -8161.499581 -8816.728299 -8796.535143 -8772.86389
SCWRL_Q125E -369.726206 -4576.418424 -6216.02576 -6578.592414 -6771.381762
FoldX_I213T -163.439225 -8137.431406 -8767.629161 -8765.877114 -8734.197951
SCWRL_I213T -608.199626 -4492.098614 -6220.769096 -6558.093174 -6747.567668
FoldX_C258Y -8090.541777 -8747.079816 -8730.572103 -8693.773636 -8569.396593
SCWRL_C258Y -185.681057 -4294.940201 -6084.80686 -6484.813027 -6723.555507
FoldX_310R -8052.52998 -8669.880924 -8671.597471 -8560.7029 -8425.777732
SCWRL_T310R -327.616106 -4578.192299 -6240.169932 -6600.503779 -6783.873166
FoldX_7.log -107.432097 -8150.500105 -8783.702419 -8772.925438 -8751.201765
FoldX_8.log -248.788986 -8152.466336 -8784.533626 -8758.586197 -8721.483728
FoldX_9.log -162.623248 -8143.163685 -8772.865324 -8761.882377 -8744.860863
FoldX_10.log -220.978199 -8218.048082 -8852.269333 -8845.305536 -8826.735282
SCWRL_A328T -356.274445 -4547.835117 -6211.960089 -6569.427074 -6774.99665
SCWRL_I361V.log -372.510334 -4579.105914 -6219.76695 -6586.177201 -6779.197462
SCWRL_N404S.log -370.681577 -3462.384312 -5731.034623 -6197.2512 -6739.421858
SCWRL_R429H.log -448.084645 -4645.512218 -6325.30034 -6686.569986 -6873.313312

Gromacs

Step 1 fetch-pdb

1.Use fetchpdb to get the pdb structure – look at the script, what does it do?

This script in meant to fetch pdb-files from the server. After a check was passed if the given pdb-file-name is valid, the archive is downloaded, extracted and everything but the pdb-file is removed. In our case fetch-pdb was not used, because we had to use our prepared pdb-file without the gaps in it.

repair pdb

2.Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available.  Make sure you chose the right chain.

To extract the protein without side chains and solvent the options -noh and -nosol from repair pdb were used.

repairPDB 1u5b.pdb -noh -cleansol > 1u5b_noside.pdb

A complete list of parameters of repairPDB can be found in the journal.

SCWRL

For SCWRL the sequence had to be parsed out of the new pdb-file:

repairPDB 1u5b_noside.pdb -seq > 1u5b_sequence

All aminoacids were turned into small letters, and then SCWRL was run:

/opt/SS12-Practical/scwrl4/Scwrl4 -i 1u5b_clean.pdb -s 1u5b-sequence -o 1u5b_scwrl.pdb

After SCWRL, the hydrogens had to be removed again:

repairPDB 1u5b_scwrl.pdb -noh -cleansol > 1u5b_scwrl_noside.pdb

pdb2gmx

We have chosen to use the model amber03 for the first forcefield.

/opt/SS12-Practical/gromacs/bin/pdb2gmx -f 1u5b_scwrl_noside.pdb -o 1u5b_gromacs.out -p 1u5b_gromacs.top -water tip3p -ff amber03

MDP - file creation

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw	 = 1.0
constraints = none
pbc = no
integrator - steepA steepest descent algorithm for energy minimization. The maximum step size is emstep [nm], the tolerance is emtol [kJ mol−1 nm−1 ]
emtol - 1.0the minimization is converged when the maximum force is smaller than this value
nsteps - 10-5000maximum number of steps to integrate or minimize, -1 is no maximum
nstenergy - 1frequency to write energies to energy file, the last energies are always written, should be a multiple of nstcalcenergy
energygrps - systemgroup(s) to write to energy file
ns_type - gridMake a grid in the box and only check atoms in neighboring grid cells when construct-

ing a new neighbor list every nstlist steps. In large systems grid search is much

faster than simple search
coulombtype cut-offTwin range cut-off’s with neighborlist cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb≥rlist
rcoulomb - 1.0distance for the Coulomb cut-off
rvdwdistance for the LJ or Buckingham cut-off
constraints = noneNo constraints except for those defined explicitly in the topology, i.e. bonds are rep-

resented by a harmonic (or other) potential or a Morse potential (depending on the

setting of morse) and angles by a harmonic (or other) potential
pbc - noRemove the periodicity (make molecule whole again)

grompp

/opt/SS12-Practical/gromacs/bin/grompp -v -f config.mdp -c 1u5b_gromacs.out.gro -p 1u5b_gromacs.top -o 1u5b.tpr

mdrun

/opt/SS12-Practical/gromacs/bin/mdrun -v -deffnm 1u5b.tpr

analysis

/opt/SS12-Practical/gromacs/bin/g_energy -f 1u5b.tpr.edr -o energy1.xvg

nsteps vs time

nstepsperformed stepsreal timeuser timesys-time
10100m 1.886s0m 2.290s0. 0.930s
50500m 6.516s0m 9.800s0m 2.680s
1001000m 12.319s0m 19.030s0m 4.980s
2002000m 23.972s0m 37.650s0m 9.510s
5003450m 40.979s1m 4.630s0m 16.160s
10003450m 40.579s1m 4.680s0m 15.860s
20003450m 40.680s1m 4.540s0m 16.220s
50003450m 40.581s1m 4.030s0m 16.520s
The plot shows how much time(y-axis) was consumed to perform a certain number of steps(x-axis) in mdrun. The maximum number of steps for our pdb-file was reached at 345 steps.
The plot shows how much time was consumed to perform one run with a certain amount of maximum steps of mdrun. y-axis->time, x-axis->step-threshold

Bond Angle Potential

Bond
force-fieldAverageErr.Est. RMSDTot-Drift
AMBER031568.328304604.24-4990.8 (kJ/mol)
OPLSAA1462.75760nan-4456.78 (kJ/mol)
charmm272645.8212006332.19-7315.03 (kJ/mol)
Potential
force-fieldAverageErr.Est. RMSDTot-Drift
AMBER0325037.674000960847-460948 (kJ/mol)
OPLSAA-20932.965000nan-388340 (kJ/mol)
charmm2725785.282000952822-518882 (kJ/mol)
Angle
force-fieldAverageErr.Est. RMSDTot-Drift
AMBER033412.0899356.099-538.273 (kJ/mol)
OPLSAA3038.18150nan-876.033 (kJ/mol)
charmm27 - - - -

References

<references />