Task description

This week's journal can be found here.

preparation

First, an X-ray PDB structure file<ref>http://www.uniprot.org/uniprot/P12694</ref> had to be chosen for our protein.

It is important that the structure has a high resolution (small Å value);
furthermore the R-factor should be as small as possible, and the higher the coverage the better.
Also, check at which pH-value the structure was resolved; ideally you want physiological pH (7.4).

Entry	Method	Resolution(Å)	Chain	Positions
1DTW	X-ray	2.70	A	46-445
1OLS	X-ray	1.85	A	46-445
1OLU	X-ray	1.90	A	46-445
1OLX	X-ray	2.25	A	46-445
1U5B	X-ray	1.83	A	46-445
1V11	X-ray	1.95	A	46-445
1V16	X-ray	1.90	A	46-445
1V1M	X-ray	2.00	A	46-445
1V1R	X-ray	1.80	A	46-445
1WCI	X-ray	1.84	A	46-445
1X7W	X-ray	1.73	A	46-445
1X7X	X-ray	2.10	A	46-445
1X7Y	X-ray	1.57	A	46-445
1X7Z	X-ray	1.72	A	46-445
1X80	X-ray	2.00	A	46-445
2BEU	X-ray	1.89	A	46-445
2BEV	X-ray	1.80	A	46-445
2BEW	X-ray	1.79	A	46-445
2BFB	X-ray	1.77	A	46-445
2BFC	X-ray	1.64	A	46-445
2BFD	X-ray	1.39	A	46-445
2BFE	X-ray	1.69	A	46-445
2BFF	X-ray	1.46	A	46-445
2J9F	X-ray	1.88	A/C	46-445

We first looked at the structure with the best resolution, 2BFD. It contains chains A and B of the protein. Only chain A is of interest to us; here the file covers 400 residues (90% of the UniProt sequence). The structure was generated on 12 April 2004 at a pH of 5.5. Although these values would be acceptable, we compared them with the next-best structures by resolution:

PDB	resolution (Å)	pH	R-value
2BFD	1.39	5.5	0.150
2BFF	1.46	5.5	0.150
1X7Y	1.57	5.8	0.150
2BFC	1.64	5.5	0.144
2BFE	1.69	5.5	0.150
1U5B	1.83	5.8	0.156

A new problem turned up here: all of the X-ray structures contain gaps, which some of the tools cannot handle. As neither the pH values nor the R-values differ much, we decided to use the structure with the fewest gaps: 1U5B.

The next step was to prepare our list of SNPs and to substitute those that are not contained in the PDB sequence. All PDB files cover positions 46-445, so we had to replace the SNP L17F. L17F is not listed in HGMD and is therefore regarded as neutral, which means we had to choose another neutral SNP in its place. We chose A71G. The new list of SNPs is then:

N71S, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H
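The coverage check described above can be sketched in Python. This is only an illustration, not the script we used: the function name `split_by_coverage` is hypothetical, the input is a small sample of SNPs, and the check looks only at the residue range 46-445, ignoring the gaps inside the structure.

```python
import re

# Residue range covered by the PDB entries (from the table above).
PDB_START, PDB_END = 46, 445

def split_by_coverage(snps, start=PDB_START, end=PDB_END):
    """Split SNPs such as 'L17F' into (covered, not_covered) by residue number."""
    covered, not_covered = [], []
    for snp in snps:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", snp)
        if match is None:
            raise ValueError(f"unexpected SNP format: {snp}")
        position = int(match.group(2))
        if start <= position <= end:
            covered.append(snp)
        else:
            not_covered.append(snp)
    return covered, not_covered

# Sample input for illustration: L17F lies before the first covered residue.
covered, not_covered = split_by_coverage(["L17F", "N71S", "M82I", "R429H"])
print(covered)
print(not_covered)
```

SNPs that end up in `not_covered` are the ones that have to be replaced before the structure-based tools can be run.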

The picture shows the 10 SNPs colored in blue on chain A of ODBA_HUMAN, which is shown in green. The structure is displayed with PyMOL using the modified PDB file.

ODBA_HUMAN has 4 annotated sites in the UniProt entry P12694:

pos	function	snp
157-159	Thiamine pyrophosphate binding	-
206	metal binding	-
211	metal binding	-
212	metal binding	-

tools

SCWRL

It is possible to give SCWRL the mutated sequence.
This can be done by extracting the sequence with repairPDB.
Then you change all letters to lower case. Next you introduce the new amino acid letter (mutation) in capital letters to the sequence file.
This sequence file can be read by SCWRL using the -s flag. Check that only the side chain of the mutated residue has been changed.

The extraction of the sequence and a mapping of positions between the SNPs and the PDB sequence can be found in the journal.
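The preparation of the SCWRL sequence file (everything in lower case, the mutated residue in upper case) can be sketched as follows. The function name `scwrl_sequence` is hypothetical, and the index is assumed to be the 0-based position within the extracted PDB sequence; mapping a UniProt residue number to that index (including the gaps) has to be done separately.

```python
def scwrl_sequence(sequence, index, new_residue):
    """Return the sequence in lower case with one upper-case substitution.

    index is the 0-based position in the extracted PDB sequence,
    not the UniProt residue number.
    """
    if not 0 <= index < len(sequence):
        raise IndexError("mutation index outside the sequence")
    lowered = sequence.lower()
    return lowered[:index] + new_residue.upper() + lowered[index + 1:]

# Toy sequence for illustration only; this is not the 1U5B sequence.
example = scwrl_sequence("MKNALT", 2, "s")
print(example)  # mkSalt
```

The resulting string is what would be written to the sequence file that is passed to SCWRL.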

For none of the SNPs were we able to recognize any change in structure with PyMOL. Even after aligning every predicted structure to the underlying 1U5B structure, the structures did not differ except for the mutated residues.

Predicted structures, with the mutated position marked in blue:

[Image gallery: for each mutation (N71S, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H), two views of the structure predicted by SCWRL4: the predicted structure alone, and the predicted structure with the mutated residue in blue and the base structure in red.]

foldX

The first thing we observed when comparing the energies of the wild type and the respective mutants was that introducing a mutation at a specific position does not cause a uniform increase or decrease in all energy terms. The C258Y mutation, for example, decreases both the Van der Waals and the hydrophobic solvation energies while strongly increasing the energy due to Van der Waals clashes. Because different energy terms might be more or less important for conserving function at a specific position in a protein, ideally the relevant terms would be selected individually to determine the effect of a mutation. While this might be possible when expert knowledge about the protein is available, it most likely is not when the method is applied to a large dataset or when a fast classification is required.

In that case the easiest option is to look only at the total energy score. We decided to take this approach and selected an increase of 1.0 in total energy as our classification margin for impaired function. Accordingly, we consider the following mutations deleterious: Q125E, I213T, C258Y, T310R, A328T.

Mutation	total energy	Backbone Hbond	Sidechain Hbond	Van der Waals	Electrostatics	Solvation Polar	Solvation Hydrophobic	Van der Waals clashes	entropy sidechain	entropy mainchain	torsional clash	backbone clash	helix dipole	energy Ionisation
N71S	0.47	0.13	0.13	0.12	0.00	-0.16	0.21	0.00	-0.16	0.20	-0.00	-0.02	0.00	0.00
M82I	-0.12	-0.00	0.00	0.33	-0.03	-0.15	0.24	0.06	-0.31	-0.39	0.12	0.04	0.00	0.00
Q125E	1.50	-0.01	3.18	-0.05	-0.70	-0.04	-0.05	0.15	-1.58	0.07	-0.03	-0.03	0.55	0.00
I213T	2.21	-0.38	-0.49	1.04	0.00	0.44	2.54	-0.08	-0.49	-0.38	0.01	-0.44	0.00	0.00
C258Y	4.55	0.01	0.17	-1.78	0.07	1.11	-2.84	7.22	0.31	0.01	0.27	0.90	-0.00	0.00
T310R	8.51	0.02	0.28	-1.67	0.13	3.01	-1.29	4.89	1.30	0.04	2.22	0.11	-0.42	0.00
A328T	2.77	-0.40	-0.41	-0.63	-0.01	1.14	-0.63	2.76	0.51	-0.92	1.36	-0.01	0.00	0.00
I361V	0.50	0.01	-0.01	0.48	0.00	-0.28	0.86	-0.04	-0.55	0.02	0.01	-0.07	0.00	-0.01
N404S	0.20	0.01	0.20	0.37	0.00	-0.48	0.50	-0.02	-0.25	-0.22	0.09	-0.11	0.00	0.00
R429H	-0.08	-0.14	-0.19	0.37	0.14	-0.50	0.40	0.03	-0.43	0.26	-0.19	-0.20	-0.07	0.25

We removed the terms sloop_entropy, mloop_entropy, cis_bond, water bridge, disulfide, electrostatic kon, partial covalent bonds and Entropy Complex, because they contain zeros for all entries. For the complete output, please see here.
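The classification by total energy can be reproduced with a few lines of Python. This is a minimal sketch: the values are copied from the "total energy" column of the table above, while the names `TOTAL_ENERGY`, `MARGIN` and `classify` are chosen here for illustration.

```python
# Total energy change per mutation, copied from the FoldX table above.
TOTAL_ENERGY = {
    "N71S": 0.47, "M82I": -0.12, "Q125E": 1.50, "I213T": 2.21,
    "C258Y": 4.55, "T310R": 8.51, "A328T": 2.77, "I361V": 0.50,
    "N404S": 0.20, "R429H": -0.08,
}

# Classification margin chosen in the text.
MARGIN = 1.0

def classify(total_energy, margin=MARGIN):
    """Label a mutation by comparing its total energy change to the margin."""
    return "Deleterious" if total_energy > margin else "Benign"

deleterious = [m for m, e in TOTAL_ENERGY.items() if classify(e) == "Deleterious"]
print(deleterious)
```

Changing `MARGIN` shows how sensitive the classification is to the chosen cutoff; Q125E, for example, would no longer be flagged with a margin of 2.0.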

Minimise

For the minimise energy calculation we observed that the structures produced by FoldX scored much better than the SCWRL structures. While the FoldX results are roughly on the same scale as the wild-type results, the SCWRL results for all SNPs receive, from the second run onwards, an energy score that is roughly 1900 to 4700 points higher (Table 2). However, the FoldX results seem to reach their best score after three runs and deteriorate after that, while the SCWRL predictions keep improving with each run (see Diagram 1). We are not sure why the minimise scores for SCWRL and FoldX do not seem to be comparable. For the purpose of determining the deleterious mutations we will use only the FoldX energies until we find further information about this. As the energetically best FoldX conformations seem to be reached after three runs, we will use those.

In the third run only four SNPs are scored worse than the original protein structure: FoldX_I213T (-7.725526), FoldX_C258Y (-44.782584), FoldX_310R (-103.757216) and FoldX_N404S (-2.489363). As in the FoldX evaluation, we do not consider results that are only slightly above or below zero, and thus consider only FoldX_C258Y and FoldX_310R to be deleterious.

Diagram 1: Graphical representation of minimise energy scores. Green the original protein, red the SCWRL results, blue the FoldX results. Number of runs on the X-Axis. Energy scores on the Y-axis.

Table 1: Absolute calculated energy values from minimise (smaller is better).

Protein	run 1	run 2	run 3	run 4	run 5
Original Protein	-166.145856	-8144.2007	-8775.354687	-8764.146148	-8740.855895
FoldX_N71S	-254.098857	-8154.570333	-8779.05239	-8773.230437	-8736.978125
SCWRL_N71S	-682.78865	-4603.39178	-6239.51939	-6556.667909	-6798.138952
FoldX_M82I	-227.589089	-8173.810999	-8802.978465	-8771.965885	-8744.953646
SCWRL_M82I	-380.259836	-4582.016239	-6239.561724	-6600.661303	-6785.40363
FoldX_Q125E	-152.379919	-8161.499581	-8816.728299	-8796.535143	-8772.86389
SCWRL_Q125E	-369.726206	-4576.418424	-6216.02576	-6578.592414	-6771.381762
FoldX_I213T	-163.439225	-8137.431406	-8767.629161	-8765.877114	-8734.197951
SCWRL_I213T	-608.199626	-4492.098614	-6220.769096	-6558.093174	-6747.567668
FoldX_C258Y	-8090.541777	-8747.079816	-8730.572103	-8693.773636	-8569.396593
SCWRL_C258Y	-185.681057	-4294.940201	-6084.80686	-6484.813027	-6723.555507
FoldX_310R	-8052.52998	-8669.880924	-8671.597471	-8560.7029	-8425.777732
SCWRL_T310R	-327.616106	-4578.192299	-6240.169932	-6600.503779	-6783.873166
FoldX_A328T	-107.432097	-8150.500105	-8783.702419	-8772.925438	-8751.201765
SCWRL_A328T	-356.274445	-4547.835117	-6211.960089	-6569.427074	-6774.99665
FoldX_I361V	-248.788986	-8152.466336	-8784.533626	-8758.586197	-8721.483728
SCWRL_I361V	-372.510334	-4579.105914	-6219.76695	-6586.177201	-6779.197462
FoldX_N404S	-162.623248	-8143.163685	-8772.865324	-8761.882377	-8744.860863
SCWRL_N404S	-370.681577	-3462.384312	-5731.034623	-6197.2512	-6739.421858
FoldX_R429H	-220.978199	-8218.048082	-8852.269333	-8845.305536	-8826.735282
SCWRL_R429H	-448.084645	-4645.512218	-6325.30034	-6686.569986	-6873.313312

Table 2: Energy Values relative to the WT Protein (larger is better).

Protein	run 1	run 2	run 3	run 4	run 5
Original Protein	0.0	0.0	0.0	0.0	0.0
FoldX_N71S	87.953001	10.369633	3.697703	9.084289	-3.87777
SCWRL_N71S	516.642794	-3540.80892	-2535.835297	-2207.478239	-1942.716943
FoldX_M82I	61.443233	29.610299	27.623778	7.819737	4.097751
SCWRL_M82I	214.11398	-3562.184461	-2535.792963	-2163.484845	-1955.452265
FoldX_Q125E	-13.765937	17.298881	41.373612	32.388995	32.007995
SCWRL_Q125E	203.58035	-3567.782276	-2559.328927	-2185.553734	-1969.474133
FoldX_I213T	-2.706631	-6.769294	-7.725526	1.730966	-6.657944
SCWRL_I213T	442.05377	-3652.102086	-2554.585591	-2206.052974	-1993.288227
FoldX_C258Y	7924.395921	602.879116	-44.782584	-70.372512	-171.459302
SCWRL_C258Y	19.535201	-3849.260499	-2690.547827	-2279.333121	-2017.300388
FoldX_310R	7886.384124	525.680224	-103.757216	-203.443248	-315.078163
SCWRL_T310R	161.47025	-3566.008401	-2535.184755	-2163.642369	-1956.982729
FoldX_A328T	-58.713759	6.299405	8.347732	8.77929	10.34587
SCWRL_A328T	190.128589	-3596.365583	-2563.394598	-2194.719074	-1965.859245
FoldX_I361V	82.64313	8.265636	9.178939	-5.559951	-19.372167
SCWRL_I361V	206.364478	-3565.094786	-2555.587737	-2177.968947	-1961.658433
FoldX_N404S	-3.522608	-1.037015	-2.489363	-2.263771	4.004968
SCWRL_N404S	204.535721	-4681.816388	-3044.320064	-2566.894948	-2001.434037
FoldX_R429H	54.832343	73.847382	76.914646	81.159388	85.879387
SCWRL_R429H	281.938789	-3498.688482	-2450.054347	-2077.576162	-1867.542583
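The relative values in Table 2 are the wild-type energy minus the mutant energy of the same run. The following sketch recomputes the FoldX values of run 3 from Table 1 and applies a tolerance around zero. The tolerance of 10 is an assumption made for this example; the text only says that results slightly above or below zero are ignored.

```python
# Run 3 energies from Table 1 (FoldX structures only).
WT_RUN3 = -8775.354687
FOLDX_RUN3 = {
    "N71S": -8779.05239, "M82I": -8802.978465, "Q125E": -8816.728299,
    "I213T": -8767.629161, "C258Y": -8730.572103, "T310R": -8671.597471,
    "A328T": -8783.702419, "I361V": -8784.533626, "N404S": -8772.865324,
    "R429H": -8852.269333,
}

# Hypothetical tolerance: differences smaller than this are treated as noise.
TOLERANCE = 10.0

def relative_energy(mutant, wild_type=WT_RUN3):
    """Wild-type energy minus mutant energy; negative means the mutant is worse."""
    return wild_type - mutant

relative = {m: relative_energy(e) for m, e in FOLDX_RUN3.items()}
deleterious = [m for m, d in relative.items() if d < -TOLERANCE]
print(deleterious)
```

With this tolerance, I213T and N404S fall inside the band around zero and only C258Y and T310R remain.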

Gromacs

Gromacs is a powerful molecular dynamics package that can be used to simulate nearly any atomic system at different levels of accuracy. 
Here we will get a basic introduction, which will also be useful in the MD task later on. 
First we will get all the necessary files and then we will minimize our protein in vacuum. 
Finally we will analyze the energies during the minimization.

Step 1 fetch-pdb

1.Use fetchpdb to get the pdb structure – look at the script, what does it do?

This script is meant to fetch PDB files from the server. After checking that the given PDB file name is valid, it downloads the archive, extracts it and removes everything but the PDB file. In our case fetchpdb was not used, because we had to use our prepared PDB file without the gaps.

repair pdb

2.Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available.  Make sure you chose the right chain.

To extract the protein without hydrogens and solvent, the repairPDB options -noh and -cleansol were used:

repairPDB 1u5b.pdb -noh -cleansol > 1u5b_noside.pdb

A complete list of parameters of repairPDB can be found in the journal.

SCWRL

For SCWRL the sequence had to be parsed out of the new pdb-file:

repairPDB 1u5b_noside.pdb -seq > 1u5b_sequence

All amino acid letters were converted to lower case, and then SCWRL was run:

/opt/SS12-Practical/scwrl4/Scwrl4 -i 1u5b_clean.pdb -s 1u5b_sequence -o 1u5b_scwrl.pdb

After SCWRL, the hydrogens had to be removed again:

repairPDB 1u5b_scwrl.pdb -noh -cleansol > 1u5b_scwrl_noside.pdb

pdb2gmx

We chose AMBER03 as the first force field.

/opt/SS12-Practical/gromacs/bin/pdb2gmx -f 1u5b_scwrl_noside.pdb -o 1u5b_gromacs.out -p 1u5b_gromacs.top -water tip3p -ff amber03

MDP - file creation

title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no

integrator - steep	A steepest descent algorithm for energy minimization. The maximum step size is emstep [nm], the tolerance is emtol [kJ mol⁻¹ nm⁻¹]
emtol - 1.0	the minimization is converged when the maximum force is smaller than this value
nsteps - 10-5000	maximum number of steps to integrate or minimize, -1 is no maximum
nstenergy - 1	frequency to write energies to the energy file; the last energies are always written; should be a multiple of nstcalcenergy
energygrps - System	group(s) to write to the energy file
ns_type - grid	Make a grid in the box and only check atoms in neighboring grid cells when constructing a new neighbor list every nstlist steps. In large systems grid search is much faster than simple search
coulombtype - cut-off	Twin-range cut-offs with neighbor list cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb ≥ rlist
rcoulomb - 1.0	distance for the Coulomb cut-off
rvdw - 1.0	distance for the LJ or Buckingham cut-off
constraints - none	No constraints except for those defined explicitly in the topology, i.e. bonds are represented by a harmonic (or other) potential or a Morse potential (depending on the setting of morse) and angles by a harmonic (or other) potential
pbc - no	Use no periodic boundary conditions and ignore the box

grompp

/opt/SS12-Practical/gromacs/bin/grompp -v -f config.mdp -c 1u5b_gromacs.out.gro -p 1u5b_gromacs.top -o 1u5b.tpr

mdrun

/opt/SS12-Practical/gromacs/bin/mdrun -v -deffnm 1u5b.tpr

analysis

/opt/SS12-Practical/gromacs/bin/g_energy -f 1u5b.tpr.edr -o energy1.xvg
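g_energy writes the selected energy terms to an .xvg file such as energy1.xvg. The following sketch shows one way such a file could be read in Python for further analysis. The function names are hypothetical and the sample data are made up for illustration; they are not values from our run.

```python
def parse_xvg(text):
    """Return the numeric rows of an .xvg file as lists of floats."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # '#' lines are comments, '@' lines are plot directives for Grace.
        if not stripped or stripped[0] in "#@":
            continue
        rows.append([float(value) for value in stripped.split()])
    return rows

def column_mean(rows, column):
    """Average of one column over all rows."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)

# Made-up sample in .xvg layout (step, energy); not taken from our output.
sample = """# This file was created for illustration
@    title "Energies"
@ s0 legend "Potential"
0.000000  100.0
1.000000  50.0
2.000000  30.0
"""

rows = parse_xvg(sample)
print(column_mean(rows, 1))  # 60.0
```

For a real file, the text would be read with `open("energy1.xvg").read()` and passed to `parse_xvg` in the same way.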

nsteps vs time

nsteps	performed steps	real time	user time	sys time
10	10	0m 1.886s	0m 2.290s	0m 0.930s
50	50	0m 6.516s	0m 9.800s	0m 2.680s
100	100	0m 12.319s	0m 19.030s	0m 4.980s
200	200	0m 23.972s	0m 37.650s	0m 9.510s
500	345	0m 40.979s	1m 4.630s	0m 16.160s
1000	345	0m 40.579s	1m 4.680s	0m 15.860s
2000	345	0m 40.680s	1m 4.540s	0m 16.220s
5000	345	0m 40.581s	1m 4.030s	0m 16.520s

The plot shows how much time (y-axis) was consumed to perform a certain number of steps (x-axis) in mdrun. The minimization of our structure stopped after 345 steps, so all runs with nsteps of 500 or more took about the same time.

The plot shows how much time one mdrun run consumed for a given maximum number of steps (y-axis: time, x-axis: step threshold).

Bond Angle Potential

Bond

force-field	Average	Err.Est.	RMSD	Tot-Drift
AMBER03	1568.32	830	4604.24	-4990.8 (kJ/mol)
OPLSAA	1462.75	760	nan	-4456.78 (kJ/mol)
charmm27	2645.82	1200	6332.19	-7315.03 (kJ/mol)

Potential

force-field	Average	Err.Est.	RMSD	Tot-Drift
AMBER03	25037.6	74000	960847	-460948 (kJ/mol)
OPLSAA	-20932.9	65000	nan	-388340 (kJ/mol)
charmm27	25785.2	82000	952822	-518882 (kJ/mol)

Angle

force-field	Average	Err.Est.	RMSD	Tot-Drift
AMBER03	3412.08	99	356.099	-538.273 (kJ/mol)
OPLSAA	3038.18	150	nan	-876.033 (kJ/mol)
charmm27	-	-	-	-

Comparison of wt - mut

The comparison of wt and mut was made using the AMBER03 force field only.

mutation	Bond (AVG)	Angle (AVG)	Potential	Potential delta
wt	1568.32	3412.08	25037.6	-
N71S	1605.55	3414.94	28654.3	-3616.7
M82I	1535.03	3414.41	21683.7	3353.9
Q125E	1589.75	3415.99	27132	-2094.4
I213T	1478	3401.81	16715.4	8322.2
C258Y	1372.57	3427.22	7933.73	17103.87
T310R	1335.64	3420.59	10829.8	14207.8
A328T	1924.29	3465.46	56637.7	-31600.1
I361V	1532.34	3405.29	22019.1	3018.5
N404S	1290.27	3381.86	-757.702	25795.302
R429H	1546.64	3485.38	23932.6	1105

The change in the potential energy of the protein caused by a mutation should give a hint about the effect of the SNP on the function of the protein. The mutations N71S, M82I, Q125E, I213T, I361V and R429H change the potential energy only slightly (by at most 8322.2 kJ/mol in absolute value) and are therefore not expected to affect the protein's function much. The rest (C258Y, T310R, A328T, N404S) are predicted to be damaging, each with a change in potential energy of more than 14000 kJ/mol in absolute value. As there is no established cutoff for this effect, this is our own interpretation of the values.
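The "Potential delta" column is the wild-type potential minus the mutant potential, and the grouping is made on its absolute value. A minimal sketch of this calculation follows. The cutoff of 10000 kJ/mol is an assumption for this example, chosen so that it separates the two groups named in the text; the text itself does not fix an exact value.

```python
# Average potential energies (kJ/mol) from the table above, AMBER03 only.
WT_POTENTIAL = 25037.6
MUTANT_POTENTIAL = {
    "N71S": 28654.3, "M82I": 21683.7, "Q125E": 27132.0, "I213T": 16715.4,
    "C258Y": 7933.73, "T310R": 10829.8, "A328T": 56637.7, "I361V": 22019.1,
    "N404S": -757.702, "R429H": 23932.6,
}

# Hypothetical cutoff in kJ/mol; not stated in this form in the text.
CUTOFF = 10000.0

def potential_delta(mutant, wild_type=WT_POTENTIAL):
    """Wild-type potential minus mutant potential, as in the table."""
    return wild_type - mutant

damaging = [
    m for m, p in MUTANT_POTENTIAL.items()
    if abs(potential_delta(p)) > CUTOFF
]
print(damaging)
```

Because the absolute value is used, a large increase (A328T) and a large decrease (N404S) in potential energy are both treated as damaging.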

Conclusion

To reach our final conclusion, we first identify the cases in which all applied methods agree. Because the structures calculated by SCWRL showed hardly any recognizable differences, we leave SCWRL out of this rating. The SNPs N71S, M82I, I361V and R429H were classified as benign by all methods, while C258Y and T310R were considered deleterious by all methods.

For the remaining four mutations, Q125E, I213T, A328T and N404S, the methods disagree.

Mutation	prediction SCWRL	prediction FOLDX	prediction GROMACS	prediction minimise	concluding prediction	listed in HGMD
N71S	-	Benign	Benign	Benign	Benign	no
M82I	-	Benign	Benign	Benign	Benign	no
Q125E	-	Deleterious	Benign	Benign	Benign	yes
I213T	-	Deleterious	Benign	Benign	Benign	yes
C258Y	-	Deleterious	Deleterious	Deleterious	Deleterious	yes
T310R	-	Deleterious	Deleterious	Deleterious	Deleterious	yes
A328T	-	Deleterious	Deleterious	Benign	Deleterious	yes
I361V	-	Benign	Benign	Benign	Benign	no
N404S	-	Benign	Deleterious	Benign	Benign	no
R429H	-	Benign	Benign	Benign	Benign	no

According to this result table, we predicted 3/5 deleterious SNPs and 5/5 benign SNPs correctly. This gives an accuracy of 80%, which is a remarkably high value for a structure-based prediction, especially given that we achieved the same accuracy with the sequence-based prediction.
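The concluding predictions and the accuracy can be recomputed from the result table. The sketch below assumes that the concluding prediction is a majority vote over FoldX, GROMACS and minimise; the table is consistent with this rule, but the text does not state it explicitly. As elsewhere in this report, a SNP listed in HGMD is taken as deleterious and an unlisted one as benign.

```python
# Per-method predictions from the table: (FoldX, GROMACS, minimise).
# "D" = deleterious, "B" = benign. SCWRL is left out, as in the text.
PREDICTIONS = {
    "N71S": ("B", "B", "B"), "M82I": ("B", "B", "B"),
    "Q125E": ("D", "B", "B"), "I213T": ("D", "B", "B"),
    "C258Y": ("D", "D", "D"), "T310R": ("D", "D", "D"),
    "A328T": ("D", "D", "B"), "I361V": ("B", "B", "B"),
    "N404S": ("B", "D", "B"), "R429H": ("B", "B", "B"),
}

# SNPs listed in HGMD according to the table.
HGMD = {"Q125E", "I213T", "C258Y", "T310R", "A328T"}

def majority(votes):
    """Assumed consensus rule: deleterious if more than half the methods say so."""
    return "D" if votes.count("D") * 2 > len(votes) else "B"

consensus = {m: majority(v) for m, v in PREDICTIONS.items()}
correct = sum((consensus[m] == "D") == (m in HGMD) for m in consensus)
accuracy = correct / len(consensus)
print(consensus)
print(accuracy)  # 0.8
```

The two misclassified SNPs are Q125E and I213T, which are listed in HGMD but flagged only by FoldX.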

References

Task 7: MSUD - Structure-based mutation analysis
