Task 7: MSUD - Structure-based mutation analysis
This week's journal can be found here.
First, an X-ray PDB structure file<ref>http://www.uniprot.org/uniprot/P12694</ref> had to be chosen for our protein.
It is important that the structure has a high resolution (small Å value); furthermore the R-factor should be as small as possible, and the higher the coverage the better. Also, check at which pH-value the structure was resolved; ideally you want physiological pH (7.4).
We first looked at the structure with the best resolution: 2BFD. It contains chains A and B of the protein. Only chain A is of interest to us; for this chain the file covers 400 residues (90% of the UniProt sequence). The structure was generated on 12 April 2004 at a pH of 5.5. Although these values would be acceptable, we compared them to the next best structures (ranked by resolution):
A new problem turned up here: all of the X-ray structures contain gaps, which some of the tools cannot handle. As neither the pH values nor the R-values differ much, we decided to use the structure with the fewest gaps: 1U5B. The next step was to prepare our list of SNPs and to replace those that are not covered by the PDB sequence. All PDB files cover positions 46-445, which means we had to replace the SNP L17F. L17F is not listed in HGMD and is therefore regarded as neutral, so we had to choose another neutral SNP in its place. We've chosen A71G. The new list of SNPs is then:
N71S, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, N404S, R429H
ODBA_HUMAN has four annotated sites in the UniProt entry P12694:
|157-159||Thiamine pyrophosphate binding||-|
It is possible to give SCWRL the mutated sequence. This can be done by extracting the sequence with repairPDB. Then you change all letters to lower case. Next you write the new amino acid letter (the mutation) into the sequence file as a capital letter. This sequence file can be read in by SCWRL using the -s flag. Check that only the side chain of the mutated residue has changed.
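The sequence preparation described above can be sketched in a few lines of Python. This is only an illustration: the function name `scwrl_sequence`, the toy sequence and the `first_residue` offset are assumptions made for the example. Because the real structure contains gaps, a simple offset is not enough for 1U5B, and the position mapping from the journal has to be used instead.

```python
def scwrl_sequence(sequence: str, mutation: str, first_residue: int = 1) -> str:
    """Return the sequence in lower case with only the mutated residue in upper case.

    mutation uses the usual notation, e.g. "A4G" (wild type, position, mutant).
    first_residue is the number of the first residue in the sequence
    (an assumption: it only works for a sequence without gaps).
    """
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - first_residue
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index].upper() != wild_type.upper():
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    lowered = sequence.lower()
    return lowered[:index] + mutant.upper() + lowered[index + 1:]


# Toy example (not the real 1U5B sequence)
print(scwrl_sequence("MKTAYIAKQR", "A4G"))  # mktGyiakqr
```

The wild-type check is there to catch mapping errors early: if the residue found at the mapped position does not match the SNP, the offset or the mapping is wrong.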
The extraction of the sequence and the mapping of positions between the SNPs and the PDB sequence can be found in the journal.
With PyMOL we were not able to recognize any structural changes for any of the SNPs. Even after aligning every structure to the underlying 1U5B, there were no differences between the structures apart from the mutated residues.
Predicted structures, with the mutated position marked in blue:
The first thing we observed when comparing the energies of the wild type and the respective mutants was that introducing a mutation at a specific position did not cause a uniform increase or decrease in all energy terms. The C258Y mutation, for example, decreases both the Van der Waals and the Solvation Hydrophobic energies, while strongly increasing the energy due to Van der Waals clashes. As different energy terms might be more or less important for conserving function at a specific position inside a protein, ideally the relevant terms would be selected for each position to determine the effect of a mutation. While this might be possible when expert knowledge about the protein is available, it most likely is not when applying the method to a large dataset, or when a fast classification is required. In this case the easiest option is to look only at the total energy score. We decided to go with this approach and selected an increase of 1.0 in total energy as our classification margin for impaired function. According to this criterion we consider the following mutations as deleterious: Q125E, I213T, C258Y, T310R, A328T.
|Mutation||total energy||Backbone Hbond||Sidechain Hbond||Van der Waals||Electrostatics||Solvation Polar||Solvation Hydrophobic||Van der Waals clashes||entropy sidechain||entropy mainchain||torsional clash||backbone clash||helix dipole||energy Ionisation|
We removed the rows sloop_entropy, mloop_entropy, cis_bond, water bridge, disulfide, electrostatic kon, partial covalent bonds and Entropy Complex. These rows contained zeroes for all entries. For the complete output please see here.
For the minimise energy calculation we observed that the results based on the FoldX calculations scored a lot better than the SCWRL results. While the FoldX results are roughly on the same scale as the results for the wild type, the SCWRL results receive, throughout all runs and SNPs, an energy score that is about 2000 points higher. However, the FoldX results seem to reach their best value after three runs and actually decrease in quality after that, while the SCWRL predictions keep improving the energy score with each run (as can be seen in Diagram 1). We are not sure why the minimise scores for SCWRL and FoldX do not seem to be comparable. For the purpose of determining the deleterious mutations we will only use the FoldX energies until we find further information about this. As the energetically best conformations for FoldX seem to be reached after three runs, we will use those.
In the third run there are only four SNPs that are scored worse than the original protein structure. These are FoldX_I213T (-7.725526), FoldX_C258Y (-44.782584), FoldX_T310R (-103.757216) and FoldX_N404S (-2.489363). As with the FoldX evaluation, we will not consider results that are only slightly above or below zero, and thus only consider FoldX_C258Y and FoldX_T310R to be deleterious.
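The selection above can be written down as a small Python sketch. The four relative energies are the values quoted in the text; the margin of -10.0 is an assumption chosen only for illustration, since no explicit cutoff was fixed, and the names `relative_energy`, `MARGIN` and `deleterious` are hypothetical.

```python
# Energies relative to the wild type after the third minimise run
# (values quoted in the text; larger is better)
relative_energy = {
    "I213T": -7.725526,
    "C258Y": -44.782584,
    "T310R": -103.757216,
    "N404S": -2.489363,
}

# Assumption: everything above this margin counts as "only slightly" worse
MARGIN = -10.0


def deleterious(relative, margin=MARGIN):
    """Return the mutations whose relative energy lies below the margin."""
    return sorted(name for name, value in relative.items() if value < margin)


print(deleterious(relative_energy))  # ['C258Y', 'T310R']
```

Any margin between -44.8 and -7.7 gives the same two mutations, so the result does not depend on the exact value assumed here.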
Table 1: Absolute calculated energy values from minimise (smaller is better).
Table 2: Energy Values relative to the WT Protein (larger is better).
Gromacs is a powerful molecular dynamics package that can be used to simulate nearly any atomic system at different levels of accuracy. Here we will get a basic introduction, which will also be useful in the MD task later on. First we will get all the necessary files and then we will minimize our protein in vacuum. Finally we will analyze the energies during the minimization.
Step 1 fetch-pdb
1. Use fetchpdb to get the pdb structure – look at the script, what does it do?
This script is meant to fetch PDB files from the server. After checking that the given PDB file name is valid, it downloads the archive, extracts it and removes everything but the PDB file. In our case fetch-pdb was not used, because we had to use our prepared PDB file without the gaps in it.
2. Use repairPDB to clean the PDB and extract the protein only - describe what options you chose and what other options are available. Make sure you chose the right chain.
To extract the protein without hydrogens and solvent, the repairPDB options -noh and -cleansol were used.
repairPDB 1u5b.pdb -noh -cleansol > 1u5b_noside.pdb
A complete list of parameters of repairPDB can be found in the journal.
For SCWRL, the sequence had to be extracted from the new PDB file:
repairPDB 1u5b_noside.pdb -seq > 1u5b_sequence
All amino acids were converted to lower case, and then SCWRL was run:
/opt/SS12-Practical/scwrl4/Scwrl4 -i 1u5b_clean.pdb -s 1u5b_sequence -o 1u5b_scwrl.pdb
After SCWRL, the hydrogens had to be removed again:
repairPDB 1u5b_scwrl.pdb -noh -cleansol > 1u5b_scwrl_noside.pdb
We chose AMBER03 as the first force field.
/opt/SS12-Practical/gromacs/bin/pdb2gmx -f 1u5b_scwrl_noside.pdb -o 1u5b_gromacs.out -p 1u5b_gromacs.top -water tip3p -ff amber03
MDP - file creation
title = PBSA minimization in vacuum
cpp = /usr/bin/cpp
define = -DFLEXIBLE -DPOSRES
implicit_solvent = GBSA
integrator = steep
emtol = 1.0
nsteps = 500
nstenergy = 1
energygrps = System
ns_type = grid
coulombtype = cut-off
rcoulomb = 1.0
rvdw = 1.0
constraints = none
pbc = no
|integrator - steep||A steepest descent algorithm for energy minimization. The maximum step size is emstep [nm], the tolerance is emtol [kJ mol−1 nm−1 ]|
|emtol - 1.0||the minimization is converged when the maximum force is smaller than this value|
|nsteps - 10-5000||maximum number of steps to integrate or minimize, -1 is no maximum|
|nstenergy - 1||frequency to write energies to energy file, the last energies are always written, should be a multiple of nstcalcenergy|
|energygrps - system||group(s) to write to energy file|
|ns_type - grid||Make a grid in the box and only check atoms in neighboring grid cells when constructing a new neighbor list every nstlist steps. In large systems grid search is much faster than simple search|
|coulombtype - cut-off||Twin range cut-offs with neighborlist cut-off rlist and Coulomb cut-off rcoulomb, where rcoulomb≥rlist|
|rcoulomb - 1.0||distance for the Coulomb cut-off|
|rvdw - 1.0||distance for the LJ or Buckingham cut-off|
|constraints - none||No constraints except for those defined explicitly in the topology, i.e. bonds are represented by a harmonic (or other) potential or a Morse potential (depending on the setting of morse) and angles by a harmonic (or other) potential|
|pbc - no||Use no periodic boundary conditions and ignore the box|
/opt/SS12-Practical/gromacs/bin/grompp -v -f config.mdp -c 1u5b_gromacs.out.gro -p 1u5b_gromacs.top -o 1u5b.tpr
/opt/SS12-Practical/gromacs/bin/mdrun -v -deffnm 1u5b.tpr
/opt/SS12-Practical/gromacs/bin/g_energy -f 1u5b.tpr.edr -o energy1.xvg
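The .xvg file written by g_energy is a plain text file: lines starting with # are comments, lines starting with @ are plot directives for xmgrace, and the remaining lines hold whitespace-separated numbers. A minimal Python sketch for reading such a file could look as follows. The function name `parse_xvg` and the sample text are made up for illustration, and the numbers in the sample are not our results.

```python
def parse_xvg(text):
    """Return the numeric rows of an .xvg file as lists of floats.

    Comment lines (#) and xmgrace directives (@) are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(field) for field in line.split()])
    return rows


# Made-up sample in the layout of a g_energy output (step, energy)
sample = """\
# This file was created by g_energy
@    title "Gromacs Energies"
@ s0 legend "Potential"
    0.000000  -1000.500000
    1.000000  -1200.250000
"""

rows = parse_xvg(sample)
print(rows)  # [[0.0, -1000.5], [1.0, -1200.25]]
```

For a real file the text can be read with `open("energy1.xvg").read()` and passed to the same function; the first column is then the step and the following columns are the energy terms selected in g_energy.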
nsteps vs time
|nsteps||performed steps||real time||user time||sys-time|
|10||10||0m 1.886s||0m 2.290s||0m 0.930s|
|50||50||0m 6.516s||0m 9.800s||0m 2.680s|
|100||100||0m 12.319s||0m 19.030s||0m 4.980s|
|200||200||0m 23.972s||0m 37.650s||0m 9.510s|
|500||345||0m 40.979s||1m 4.630s||0m 16.160s|
|1000||345||0m 40.579s||1m 4.680s||0m 15.860s|
|2000||345||0m 40.680s||1m 4.540s||0m 16.220s|
|5000||345||0m 40.581s||1m 4.030s||0m 16.520s|
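The table shows that the minimization never performs more than 345 steps, so raising nsteps beyond that does not change the run time. The following Python sketch recomputes this from the real times in the table; the names `runs`, `seconds_per_step` and `convergence_step` are introduced here only for illustration.

```python
# (nsteps requested, steps performed, real time in seconds), taken from the table
runs = [
    (10, 10, 1.886),
    (50, 50, 6.516),
    (100, 100, 12.319),
    (200, 200, 23.972),
    (500, 345, 40.979),
    (1000, 345, 40.579),
    (2000, 345, 40.680),
    (5000, 345, 40.581),
]


def seconds_per_step(runs):
    """Map each requested nsteps to the real time per performed step."""
    return {requested: real / performed for requested, performed, real in runs}


def convergence_step(runs):
    """Return the step count at which runs stopped before reaching nsteps, or None."""
    stopped_early = [performed for requested, performed, _ in runs if performed < requested]
    return min(stopped_early) if stopped_early else None


for requested, cost in seconds_per_step(runs).items():
    print(f"nsteps={requested:>4}: {cost:.3f} s per performed step")
print("minimization stopped after", convergence_step(runs), "steps")
```

The cost per step is highest for the shortest run, which is what one would expect if a fixed start-up time is spread over fewer steps.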
Bond Angle Potential
Comparison of wt - mut
The comparison of wt and mut was made using the force field AMBER03 only.
|mutation||Bond( AVG )||Angle( AVG )||Potential||Potential delta|
The change in the potential energy of the protein caused by a mutation should give a hint about the effect of the SNP on the function of the protein. The mutations N71S, M82I, Q125E, I213T, I361V and R429H cause only a small change in potential energy and are therefore not expected to affect the protein's function much, whereas the rest (C258Y, T310R, A328T, N404S) are predicted to be damaging, with a change in potential energy of more than 8000 kJ/mol. As there is no established cutoff for this effect, this is our personal interpretation of the values.
To come to our final conclusion, we first sort out the cases in which all applied methods agree. Because the structures calculated by SCWRL show hardly any recognizable differences, we leave this rating out. The SNPs N71S, M82I, I361V and R429H were all classified as benign, while C258Y and T310R were both considered deleterious by all methods used.
For the remaining four mutations, Q125E, I213T, A328T and N404S, the methods disagree.
|Mutation||prediction SCWRL||prediction FOLDX||prediction GROMACS||prediction minimise||concluding prediction||listed in HGMD|
According to this result table, we predicted 3 of the 5 deleterious SNPs and all 5 benign SNPs correctly. This gives an accuracy of 80%, which is a remarkably high value for a structure-based prediction, especially given that we achieved the same accuracy with the sequence-based prediction.
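The accuracy quoted above follows directly from the four counts. As a short check in Python, with the HGMD listing taken as the reference (the function name `summary` is introduced here only for illustration):

```python
def summary(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity for a two-class prediction."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of deleterious SNPs found
        "specificity": tn / (tn + fp),  # share of benign SNPs recognized
    }


# 3 of 5 deleterious and 5 of 5 benign SNPs predicted correctly
result = summary(tp=3, fn=2, tn=5, fp=0)
print(result)
```

Besides the accuracy of 0.8, this gives a sensitivity of 0.6 and a specificity of 1.0, so all errors are deleterious SNPs that were predicted as benign.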