Task 7: MSUD - Structure-based mutation analysis
Contents
Task description
preparation
First an x-Ray pdb structure<ref>http://www.uniprot.org/uniprot/P12694</ref> file had to be chosen for our protein.
It is important that the structure has a high resolution (small Å value); furthermore the R-factor should be as small as possible, and the higher the coverage the better. Also, check at which pH-value the structure was resolved; ideally you want physiological pH (7.4).
Entry | Method | Resolution(Å) | Chain | Positions |
1DTW | X-ray | 2.70 | A | 46-445 |
1OLS | X-ray | 1.85 | A | 46-445 |
1OLU | X-ray | 1.90 | A | 46-445 |
1OLX | X-ray | 2.25 | A | 46-445 |
1U5B | X-ray | 1.83 | A | 46-445 |
1V11 | X-ray | 1.95 | A | 46-445 |
1V16 | X-ray | 1.90 | A | 46-445 |
1V1M | X-ray | 2.00 | A | 46-445 |
1V1R | X-ray | 1.80 | A | 46-445 |
1WCI | X-ray | 1.84 | A | 46-445 |
1X7W | X-ray | 1.73 | A | 46-445 |
1X7X | X-ray | 2.10 | A | 46-445 |
1X7Y | X-ray | 1.57 | A | 46-445 |
1X7Z | X-ray | 1.72 | A | 46-445 |
1X80 | X-ray | 2.00 | A | 46-445 |
2BEU | X-ray | 1.89 | A | 46-445 |
2BEV | X-ray | 1.80 | A | 46-445 |
2BEW | X-ray | 1.79 | A | 46-445 |
2BFB | X-ray | 1.77 | A | 46-445 |
2BFC | X-ray | 1.64 | A | 46-445 |
2BFD | X-ray | 1.39 | A | 46-445 |
2BFE | X-ray | 1.69 | A | 46-445 |
2BFF | X-ray | 1.46 | A | 46-445 |
2J9F | X-ray | 1.88 | A/C | 46-445 |
We first looked at the structure with the best resolution: 2BFD. It contains chain A and B of the protein. For us only chain A is of interest, here the file covers 400 residues( 90% of the uniprot sequence ). The structure was generated the 12th of april in 2004, using a PH-value of 5.5. Although these values would be ok, we compared them to the next best structures( looking at the resolution ):
pdb | resolution(A°) | ph | R-value |
2BFD | 1.39 | 5.5 | 0.150 |
2BFF | 1.46 | 5.5 | 0.150 |
1X7Y | 1.57 | 5.8 | 0.150 |
2BFC | 1.64 | 5.5 | 0.144 |
2BFE | 1.69 | 5.5 | 0.150 |
1U5B | 1.83 | 5.8 | 0.156 |
A new Problem turned out here, all of the x-Ray structures contain gaps, which some of the tools can't handle. As the ph-values as well as the R-values do not differ too much, we have decided to use the structure with the least gaps: 1U5B. The next step was, to prepare our list of SNP's, and substitute those, that are not contained in the pdb-sequence. All pdb-files cover position 46-445. This means we had to substitute the SNP L17F. L17F is not listed in the HGMD, and for this seen as neutral, what means we had to chose another neutral SNP for it. We've chosen A71G. The new list of SNP's then is:
N71S, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H
obda_human has 4 annotated sites in the uniprot entry P12694:
pos | function | snp |
157-159 | Thiamine pyrophosphate binding | - |
206 | metal binding | - |
211 | metal binding | - |
212 | metal binding | - |
tools
SCWRL
It is possible to give SCWRL the mutated sequence. This can be done by extracting the sequence with repairPDB. Then you change all letters to lower case. Next you introduce the new amino acid letter (mutation) in capital letters to the sequence file. This sequence file can be read in by SCWRL using the –s flag. Check if only the mutation side chain has been changed.
the extraction of the sequence and a mapping of positions between snps and pdb-sequence can be found in the journal
foldX
The first thing we observed when looking at the energy comparison between the wild type and the respective mutants was that introducing a mutation at a specific position did not cause a homogeneous increase or decrease in all energy aspects. The C258Y Mutation for example decreases both the Van der Waals and Solvation hydrophobic energies, while strongly increasing the energy due to Van der Waals clashes. As different energy aspects mights be more or less important for a specific position inside a protein in order to conserve function, in an ideal case these should be selected to determine the effect of a mutation. While this might be possible when expert knowledge about the protein is available, it most likely is not when applying the method to a large dataset, or when a fast classification is required. In this case the easiest option would be to just look at the total energy score. We decided to go with this approach and selected an increase of 1.0 in total energy as our classification margin for impaired function. According to this we consider the following mutations as deleterious: Q125E, I213T, C258Y, T310R, A328T.
Pdb | total energy | Backbone Hbond | Sidechain Hbond | Van der Waals | Electrostatics | Solvation Polar | Solvation Hydrophobic | Van der Waals clashes | entropy sidechain | entropy mainchain | sloop_entropy | mloop_entropy | cis_bond | torsional clash | backbone clash | helix dipole | water bridge | disulfide | electrostatic kon | partial covalent bonds | energy Ionisation | Entropy Complex |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1u5b_A_N71S.pdb | 0.47 | 0.13 | 0.13 | 0.12 | 0.00 | -0.16 | 0.21 | 0.00 | -0.16 | 0.20 | 0.00 | 0.00 | 0.00 | -0.00 | -0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_M82I.pdb | -0.12 | -0.00 | 0.00 | 0.33 | -0.03 | -0.15 | 0.24 | 0.06 | -0.31 | -0.39 | 0.00 | 0.00 | 0.00 | 0.12 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_Q125E.pdb | 1.50 | -0.01 | 3.18 | -0.05 | -0.70 | -0.04 | -0.05 | 0.15 | -1.58 | 0.07 | 0.00 | 0.00 | 0.00 | -0.03 | -0.03 | 0.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_I213T.pdb | 2.21 | -0.38 | -0.49 | 1.04 | 0.00 | 0.44 | 2.54 | -0.08 | -0.49 | -0.38 | 0.00 | 0.00 | 0.00 | 0.01 | -0.44 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_C258Y.pdb | 4.55 | 0.01 | 0.17 | -1.78 | 0.07 | 1.11 | -2.84 | 7.22 | 0.31 | 0.01 | 0.00 | 0.00 | 0.00 | 0.27 | 0.90 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_T310R.pdb | 8.51 | 0.02 | 0.28 | -1.67 | 0.13 | 3.01 | -1.29 | 4.89 | 1.30 | 0.04 | 0.00 | 0.00 | 0.00 | 2.22 | 0.11 | -0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_A328T.pdb | 2.77 | -0.40 | -0.41 | -0.63 | -0.01 | 1.14 | -0.63 | 2.76 | 0.51 | -0.92 | 0.00 | 0.00 | 0.00 | 1.36 | -0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_I361V.pdb | 0.50 | 0.01 | -0.01 | 0.48 | 0.00 | -0.28 | 0.86 | -0.04 | -0.55 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | -0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 | 0.00 |
1u5b_A_N404S.pdb | 0.20 | 0.01 | 0.20 | 0.37 | 0.00 | -0.48 | 0.50 | -0.02 | -0.25 | -0.22 | 0.00 | 0.00 | 0.00 | 0.09 | -0.11 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1u5b_A_R429H.pdb | -0.08 | -0.14 | -0.19 | 0.37 | 0.14 | -0.50 | 0.40 | 0.03 | -0.43 | 0.26 | 0.00 | 0.00 | 0.00 | -0.19 | -0.20 | -0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.00 |
Minimise
Protein | run 1 | run 2 | run 3 | run 4 | run 5 |
---|---|---|---|---|---|
Original Protein | -166.145856 | -8144.2007 | -8775.354687 | -8764.146148 | -8740.855895 |
FoldX_N71S | -254.098857 | -8154.570333 | -8779.05239 | -8773.230437 | -8736.978125 |
SCWRL_N71S | -682.78865 | -4603.39178 | -6239.51939 | -6556.667909 | -6798.138952 |
FoldX_M82I | -227.589089 | -8173.810999 | -8802.978465 | -8771.965885 | -8744.953646 |
SCWRL_M82I | -380.259836 | -4582.016239 | -6239.561724 | -6600.661303 | -6785.40363 |
FoldX_Q125E | -152.379919 | -8161.499581 | -8816.728299 | -8796.535143 | -8772.86389 |
SCWRL_Q125E | -369.726206 | -4576.418424 | -6216.02576 | -6578.592414 | -6771.381762 |
FoldX_I213T | -163.439225 | -8137.431406 | -8767.629161 | -8765.877114 | -8734.197951 |
SCWRL_I213T | -608.199626 | -4492.098614 | -6220.769096 | -6558.093174 | -6747.567668 |
FoldX_C258Y | -8090.541777 | -8747.079816 | -8730.572103 | -8693.773636 | -8569.396593 |
SCWRL_C258Y | -185.681057 | -4294.940201 | -6084.80686 | -6484.813027 | -6723.555507 |
FoldX_310R | -8052.52998 | -8669.880924 | -8671.597471 | -8560.7029 | -8425.777732 |
SCWRL_T310R | -327.616106 | -4578.192299 | -6240.169932 | -6600.503779 | -6783.873166 |
FoldX_A328T | -107.432097 | -8150.500105 | -8783.702419 | -8772.925438 | -8751.201765 |
SCWRL_A328T | -356.274445 | -4547.835117 | -6211.960089 | -6569.427074 | -6774.99665 |
FoldX_I361V | -248.788986 | -8152.466336 | -8784.533626 | -8758.586197 | -8721.483728 |
SCWRL_I361V | -372.510334 | -4579.105914 | -6219.76695 | -6586.177201 | -6779.197462 |
FoldX_N404S | -162.623248 | -8143.163685 | -8772.865324 | -8761.882377 | -8744.860863 |
SCWRL_N404S | -370.681577 | -3462.384312 | -5731.034623 | -6197.2512 | -6739.421858 |
FoldX_R429H | -220.978199 | -8218.048082 | -8852.269333 | -8845.305536 | -8826.735282 |
SCWRL_R429H | -448.084645 | -4645.512218 | -6325.30034 | -6686.569986 | -6873.313312 |
Protein | run 1 | run 2 | run 3 | run 4 | run 5 |
---|---|---|---|---|---|
1u5b_A.log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1u5b_A_1.log | 87.953001 | 10.369633 | 3.697703 | 9.084289 | -3.87777 |
1u5b_A_10.log | 54.832343 | 73.847382 | 76.914646 | 81.159388 | 85.879387 |
1u5b_A_2.log | 61.443233 | 29.610299 | 27.623778 | 7.819737 | 4.097751 |
1u5b_A_3.log | -13.765937 | 17.298881 | 41.373612 | 32.388995 | 32.007995 |
1u5b_A_4.log | -2.706631 | -6.769294 | -7.725526 | 1.730966 | -6.657944 |
1u5b_A_5.log | 7924.395921 | 602.879116 | -44.782584 | -70.372512 | -171.459302 |
1u5b_A_6.log | 7886.384124 | 525.680224 | -103.757216 | -203.443248 | -315.078163 |
1u5b_A_7.log | -58.713759 | 6.299405 | 8.347732 | 8.77929 | 10.34587 |
1u5b_A_8.log | 82.64313 | 8.265636 | 9.178939 | -5.559951 | -19.372167 |
1u5b_A_9.log | -3.522608 | -1.037015 | -2.489363 | -2.263771 | 4.004968 |
1u5b_A_A328T.log | 190.128589 | -3596.365583 | -2563.394598 | -2194.719074 | -1965.859245 |
1u5b_A_C258Y.log | 19.535201 | -3849.260499 | -2690.547827 | -2279.333121 | -2017.300388 |
1u5b_A_I213T.log | 442.05377 | -3652.102086 | -2554.585591 | -2206.052974 | -1993.288227 |
1u5b_A_I361V.log | 206.364478 | -3565.094786 | -2555.587737 | -2177.968947 | -1961.658433 |
1u5b_A_M82I.log | 214.11398 | -3562.184461 | -2535.792963 | -2163.484845 | -1955.452265 |
1u5b_A_N404S.log | 204.535721 | -4681.816388 | -3044.320064 | -2566.894948 | -2001.434037 |
1u5b_A_N71S.log | 516.642794 | -3540.80892 | -2535.835297 | -2207.478239 | -1942.716943 |
1u5b_A_Q125E.log | 203.58035 | -3567.782276 | -2559.328927 | -2185.553734 | -1969.474133 |
1u5b_A_R429H.log | 281.938789 | -3498.688482 | -2450.054347 | -2077.576162 | -1867.542583 |
1u5b_A_T310R.log | 161.47025 | -3566.008401 | -2535.184755 | -2163.642369 | -1956.982729 |
Gromacs
Gromacs is a powerful molecular dynamics package that can be used to simulate nearly any atomic system at different levels of accuracy. Here we will get a basic introduction, which will also be useful in the MD task later on. First we will get all the necessary files and then we will minimize our protein in vacuum. Finally we will analyze the energies during the minimization.
Step 1 fetch-pdb
1.Use fetchpdb to get the pdb structure – look at the script, what does it do?
This script in meant to fetch pdb-files from the server. After a check was passed if the given pdb-file-name is valid, the archive is downloaded, extracted and everything but the pdb-file is removed. In our case fetch-pdb was not used, because we had to use our prepared pdb-file without the gaps in it.
repair pdb
2.Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.
To extract the protein without side chains and solvent the options -noh and -nosol from repair pdb were used.
repairPDB 1u5b.pdb -noh -cleansol > 1u5b_noside.pdb
A complete list of parameters of repairPDB can be found in the journal.
SCWRL
For SCWRL the sequence had to be parsed out of the new pdb-file:
repairPDB 1u5b_noside.pdb -seq > 1u5b_sequence
All aminoacids were turned into small letters, and then SCWRL was run:
/opt/SS12-Practical/scwrl4/Scwrl4 -i 1u5b_clean.pdb -s 1u5b-sequence -o 1u5b_scwrl.pdb
After SCWRL, the hydrogens had to be removed again:
repairPDB 1u5b_scwrl.pdb -noh -cleansol > 1u5b_scwrl_noside.pdb
pdb2gmx
We have chosen to use the model amber03 for the first forcefield.
/opt/SS12-Practical/gromacs/bin/pdb2gmx -f 1u5b_scwrl_noside.pdb -o 1u5b_gromacs.out -p 1u5b_gromacs.top -water tip3p -ff amber03
MDP - file creation
title = PBSA minimization in vacuum cpp = /usr/bin/cpp define = -DFLEXIBLE -DPOSRES implicit_solvent = GBSA integrator = steep emtol = 1.0 nsteps = 500 nstenergy = 1 energygrps = System ns_type = grid coulombtype = cut-off rcoulomb = 1.0 rvdw = 1.0 constraints = none pbc = no
integrator - steep | A steepest descent algorithm for energy minimization. The maximum step size is emstep [nm], the tolerance is emtol [kJ mol−1 nm−1 ] |
emtol - 1.0 | the minimization is converged when the maximum force is smaller than this value |
nsteps - 10-5000 | maximum number of steps to integrate or minimize, -1 is no maximum |
nstenergy - 1 | frequency to write energies to energy file, the last energies are always written, should be a multiple of nstcalcenergy |
energygrps - system | group(s) to write to energy file |
ns_type - grid | Make a grid in the box and only check atoms in neighboring grid cells when construct-
ing a new neighbor list every nstlist steps. In large systems grid search is much faster than simple search |
coulombtype cut-off | Twin range cut-off’s with neighborlist cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb≥rlist |
rcoulomb - 1.0 | distance for the Coulomb cut-off |
rvdw | distance for the LJ or Buckingham cut-off |
constraints = none | No constraints except for those defined explicitly in the topology, i.e. bonds are rep-
resented by a harmonic (or other) potential or a Morse potential (depending on the setting of morse) and angles by a harmonic (or other) potential |
pbc - no | Remove the periodicity (make molecule whole again) |
grompp
/opt/SS12-Practical/gromacs/bin/grompp -v -f config.mdp -c 1u5b_gromacs.out.gro -p 1u5b_gromacs.top -o 1u5b.tpr
mdrun
/opt/SS12-Practical/gromacs/bin/mdrun -v -deffnm 1u5b.tpr
analysis
/opt/SS12-Practical/gromacs/bin/g_energy -f 1u5b.tpr.edr -o energy1.xvg
nsteps vs time
nsteps | performed steps | real time | user time | sys-time |
10 | 10 | 0m 1.886s | 0m 2.290s | 0. 0.930s |
50 | 50 | 0m 6.516s | 0m 9.800s | 0m 2.680s |
100 | 100 | 0m 12.319s | 0m 19.030s | 0m 4.980s |
200 | 200 | 0m 23.972s | 0m 37.650s | 0m 9.510s |
500 | 345 | 0m 40.979s | 1m 4.630s | 0m 16.160s |
1000 | 345 | 0m 40.579s | 1m 4.680s | 0m 15.860s |
2000 | 345 | 0m 40.680s | 1m 4.540s | 0m 16.220s |
5000 | 345 | 0m 40.581s | 1m 4.030s | 0m 16.520s |
Bond Angle Potential
Bond
force-field | Average | Err.Est. | RMSD | Tot-Drift |
AMBER03 | 1568.32 | 830 | 4604.24 | -4990.8 (kJ/mol) |
OPLSAA | 1462.75 | 760 | nan | -4456.78 (kJ/mol) |
charmm27 | 2645.82 | 1200 | 6332.19 | -7315.03 (kJ/mol) |
Potential
force-field | Average | Err.Est. | RMSD | Tot-Drift |
AMBER03 | 25037.6 | 74000 | 960847 | -460948 (kJ/mol) |
OPLSAA | -20932.9 | 65000 | nan | -388340 (kJ/mol) |
charmm27 | 25785.2 | 82000 | 952822 | -518882 (kJ/mol) |
Angle
force-field | Average | Err.Est. | RMSD | Tot-Drift |
AMBER03 | 3412.08 | 99 | 356.099 | -538.273 (kJ/mol) |
OPLSAA | 3038.18 | 150 | nan | -876.033 (kJ/mol) |
charmm27 | - | - | - | - |
Comparison of wt - mut
The conmparison of wt and mut was made using the force-field AMBER03 only.
mutation | Bond( AVG ) | Angle( AVG ) | Potential | Potential delta |
wt | 1568.32 | 3412.08 | 25037.6 | - |
N71S | 1605.55 | 3414.94 | 28654.3 | -3616.7 |
M82I | 1535.03 | 3414.41 | 21683.7 | 3353.9 |
Q125E | 1589.75 | 3415.99 | 27132 | -2094.4 |
I213T | 1478 | 3401.81 | 16715.4 | 8322.2 |
C258Y | 1372.57 | 3427.22 | 7933.73 | 17103.87 |
T310R | 1335.64 | 3420.59 | 10829.8 | 14207.8 |
A328T | 1924.29 | 3465.46 | 56637.7 | -31600.1 |
I361V | 1532.34 | 3405.29 | 22019.1 | 3018.5 |
N404S | 1290.27 | 3381.86 | -757.702 | 25795.302 |
R429H | 1546.64 | 3485.38 | 23932.6 | 1105 |
The change of the potential energy of the protein, caused by the mutations shall give a hint about the effect of the SNP on the function of the protein. Whereas the mutations N71S, M82I, Q125E, I213T, I361V, and R429H only have a small change of potential energy, and for this won't affect the protein's function too much, the rest (C258Y, T310R, A328T, N404S) are predicted to be damaging, with a changing potential energy of > 8000 kj. As there is no certain cutoff for this effect, this is a personal interpretion of the values.
Conclusion Table
Mutation | prediction SCWRL | prediction FOLDX | prediction GRPMACS | concluding prediction |
N17S | prediction SCWRL | Benign | Benign | concluding prediction |
M82I | prediction SCWRL | Benign | Benign | concluding prediction |
Q125E | prediction SCWRL | Deleterious | Benign | concluding prediction |
I213T | prediction SCWRL | Deleterious | Benign | concluding prediction |
C258Y | prediction SCWRL | Deleterious | Deleterious | concluding prediction |
T310R | prediction SCWRL | Deleterious | Deleterious | concluding prediction |
A328T | prediction SCWRL | Deleterious | Deleterious | concluding prediction |
I361V | prediction SCWRL | Benign | Benign | concluding prediction |
N404S | prediction SCWRL | Benign | Deleterious | concluding prediction |
R429H | prediction SCWRL | Benign | Benign | concluding prediction |
References
<references />