Structure-based mutation analysis GLA

From Bioinformatikpedia
Revision as of 14:44, 14 August 2011 by Grandke

by Benjamin Drexler and Fabian Grandke

Introduction

In this task we analyse the structure of our protein to determine what effects the point mutations have. To do so, we created mutated structures and compared them to the wild-type protein, using several tools based on different methods. We used the mutations that we chose in the previous task.

Methods

In the first step of this task we had to find the available structures for our protein and decide which one is best suited for our detailed analysis. We set several cut-offs to exclude unsuitable structures.

SCWRL

SCWRL was initially developed by Dunbrack et al. in 1997. We use SCWRL4<ref name=dunb>G. G. Krivov, M. V. Shapovalov, and R. L. Dunbrack, Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins (2009)</ref>, which was published in 2009. The program takes a PDB file and a sequence file as input. Using a rotamer library, collision detection, and a residue interaction graph, it calculates the optimal side-chain conformation for the backbone and the mutated sequence given in the input files. The output is a PDB file containing the conformation; the total minimal energy of the graph is written to STDOUT.
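The sequence file for a point mutation can be prepared with a few lines of Python. This is only a minimal sketch, not the script we used: the function name `scwrl_sequence` and the toy sequence are hypothetical, and the case convention (lower case keeps a side chain from the input structure, upper case lets the program rebuild it) is how we understand the SCWRL4 documentation. It also assumes that positions count residues in the order in which they appear in the PDB file.

```python
from typing import Optional


def scwrl_sequence(wild_type: str, position: int, new_residue: str,
                   expected: Optional[str] = None) -> str:
    """Build a sequence string that carries one point mutation.

    position is 1-based and counts residues as they appear in the PDB file
    (an assumption: gaps or numbering offsets in the file shift the index).
    """
    index = position - 1
    if not 0 <= index < len(wild_type):
        raise ValueError(f"position {position} is outside the sequence")
    if expected is not None and wild_type[index].upper() != expected.upper():
        raise ValueError(
            f"expected {expected} at {position}, found {wild_type[index]}")
    # Lower case: keep the side chain from the input structure.
    # Upper case: let the program rebuild this side chain.
    kept = wild_type.lower()
    return kept[:index] + new_residue.upper() + kept[index + 1:]


# Hypothetical toy sequence, not the real GLA sequence.
toy = "MKTAYIAK"
mutated = scwrl_sequence(toy, position=3, new_residue="S", expected="T")
print(mutated)  # mkSayiak
```

The optional `expected` argument guards against off-by-one errors: if the wild-type residue at the given position is not the one named in the mutation, the function raises instead of silently mutating the wrong residue.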

FoldX

FoldX was developed by Serrano et al. in 2002<ref name=serr>Guerois R, Nielsen JE, Serrano L., Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations. Journal of Molecular Biology (2002)</ref>. We used FoldX 3.0 beta 4. The program calculates the energy effects of point mutations. It provides different run modes, but basically it takes a PDB file as input, calculates several individual energy terms (e.g. van der Waals, electrostatics, ...), and returns them together with the total energy as output.

Minimise

Before this tool from the virtual box could be used, we had to remove the hydrogens and waters from the PDB file with the script repairPDB. Afterwards we were able to compare the energy differences between the wild-type and the mutated protein.
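To illustrate what this preprocessing step does, the following Python sketch removes hydrogen atoms and water molecules from the lines of a PDB file. It is not the repairPDB script and does not claim to reproduce its behaviour; the function names are hypothetical. It assumes the standard fixed-column PDB layout (atom name in columns 13-16, residue name in columns 18-20, element symbol in columns 77-78) and falls back to the atom name when the element column is empty, which is a heuristic.

```python
WATER_NAMES = {"HOH", "WAT"}  # assumed residue names for water


def is_water(line: str) -> bool:
    """True if the residue name field (columns 18-20) is a water name."""
    return line[17:20].strip() in WATER_NAMES


def is_hydrogen(line: str) -> bool:
    """True if the atom is hydrogen (or deuterium)."""
    element = line[76:78].strip().upper()
    if element:
        return element in {"H", "D"}
    # Heuristic fallback: names such as "H", "HA" or "1HB" are hydrogens.
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def strip_hydrogens_and_waters(lines):
    """Return the lines without hydrogen atoms and water molecules.

    Only ATOM and HETATM records are filtered; all other records are kept
    unchanged, and atom serial numbers are not renumbered.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            if is_water(line) or is_hydrogen(line):
                continue
        kept.append(line)
    return kept
```

A typical call would read the file with `open(path).read().splitlines()`, pass the list to `strip_hydrogens_and_waters`, and write the result to a new file.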

GROMACS

Results

Structure Selection

There are several structure files available for our protein:

PDB ID  Resolution [Å]  pH    R-factor  Coverage [%]  Missing residues
1R46    3.25            8.0   0.262     99.7          422-429
1R47    3.45            8.0   0.285     99.5          422-429
3GXN    3.01            NULL  0.239     88.08         422-429
3GXP    2.20            NULL  0.204     81.9          422-429
3GXT    2.70            NULL  0.245     97.29         422-429
3HG2    2.30            4.6   0.178     97.32         422-429
3HG3    1.90            6.5   0.167     98.64         427-435
3HG4    2.30            4.6   0.166     99.86         422-429
3HG5    2.30            4.6   0.192     100           422-429
3LX9    2.04            6.5   0.178     98.92         423-435
3LXA    3.04            6.5   0.216     99.52         427-435
3LXB    2.85            6.5   0.227     99.3          427-435
3LXC    2.35            6.5   0.186     98.31         423-435
3LXC 2.35 6.5 0.186 98.31 423-435

We set two cutoffs; a structure is excluded if it meets either of them:

  • pH: < 6.5
  • resolution: > 2.7 Å

After we applied the cutoffs to our set of structures, three were left (exclusion factors are colored red in the table). One of them was slightly better than the others, so we decided to use 3HG3 (worse values are colored gray in the table). Additionally, 3HG3 has the best resolution of all structures and the best R-factor among the three remaining ones (colored green). As the missing residues are very similar for all structures, they are not taken into account any further.
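The selection can be reproduced with a short Python sketch. The values are copied from the table above. Two points are assumptions on our side and not stated in the text: structures with a NULL pH are treated as excluded (otherwise more than three structures would remain), and the remaining structures are ranked by resolution first and R-factor second.

```python
# (PDB ID, resolution in Å, pH or None for NULL, R-factor), from the table.
structures = [
    ("1R46", 3.25, 8.0, 0.262),
    ("1R47", 3.45, 8.0, 0.285),
    ("3GXN", 3.01, None, 0.239),
    ("3GXP", 2.20, None, 0.204),
    ("3GXT", 2.70, None, 0.245),
    ("3HG2", 2.30, 4.6, 0.178),
    ("3HG3", 1.90, 6.5, 0.167),
    ("3HG4", 2.30, 4.6, 0.166),
    ("3HG5", 2.30, 4.6, 0.192),
    ("3LX9", 2.04, 6.5, 0.178),
    ("3LXA", 3.04, 6.5, 0.216),
    ("3LXB", 2.85, 6.5, 0.227),
    ("3LXC", 2.35, 6.5, 0.186),
]

MIN_PH = 6.5          # structures with pH < 6.5 are excluded
MAX_RESOLUTION = 2.7  # structures with resolution > 2.7 Å are excluded


def passes(resolution, ph):
    """Apply both cutoffs; a missing pH counts as excluded (assumption)."""
    return ph is not None and ph >= MIN_PH and resolution <= MAX_RESOLUTION


remaining = [s for s in structures if passes(s[1], s[2])]
# Assumed ranking: lower resolution value first, then lower R-factor.
remaining.sort(key=lambda s: (s[1], s[3]))
remaining_ids = [s[0] for s in remaining]
best = remaining_ids[0]

print(remaining_ids)  # ['3HG3', '3LX9', '3LXC']
print(best)           # 3HG3
```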

Mutation Mapping

Energy Comparison

Number  AA-Position  Codon change  Amino acid change  SCWRL4   FoldX   FoldX - Diff  Minimise   Minimise - Diff
WT      -            -             -                  -        -20.93  -             -20481.23  -
1       42           ATG-ACG       Met -> Thr         343.25   157.29  -178.22       -20324.41  -156.82
2       65           AGT-ACG       Ser -> Thr         327.798  152.87  -173.8        -20339.34  -141.89
3       117          ATT-AGT       Ile -> Ser         333.027  157.97  -178.9        -20353.47  -127.76
4       143          cGCA-ACA      Ala -> Thr         333.944  154.40  -175.33       -20339.32  -141.91
5       186          CAC-CGC       His -> Arg         323.717  154.57  -175.5        -20321.32  -159.91
6       205          gCCT-ACT      Pro -> Thr         340.619  155.96  -176.89       -20345.87  -135.36
7       244          gGAC-CAC      Asp -> His         333.594  152.08  -173.01       -20393.12  -88.11
8       283          CAG-CCG       Gln -> Pro         332.631  159.91  -180.84       -8027.71   -12453.52
8.2     -            -             -                  -        -       -             -19134.48  -1346.95
9       321          tCAG-TAG      Gln -> Glu         332.853  160.95  -181.88       -20246.98  -234.25
10      363          TATa-TAA      Arg -> Cys         330.56   150.50  -171.43       -20295.77  -185.46
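The two difference columns can be recomputed from the energies with a short Python sketch. The sign convention (wild-type energy minus mutant energy) is inferred from the table values and is not stated explicitly in the text; the names `energy_diff`, `MUTANTS` and `outlier` are only illustrative. The additional row 8.2 is left out.

```python
WT_FOLDX = -20.93
WT_MINIMISE = -20481.23

# mutation number -> (FoldX energy, Minimise energy), copied from the table
MUTANTS = {
    1: (157.29, -20324.41),
    2: (152.87, -20339.34),
    3: (157.97, -20353.47),
    4: (154.40, -20339.32),
    5: (154.57, -20321.32),
    6: (155.96, -20345.87),
    7: (152.08, -20393.12),
    8: (159.91, -8027.71),
    9: (160.95, -20246.98),
    10: (150.50, -20295.77),
}


def energy_diff(wild_type, mutant):
    """Wild-type energy minus mutant energy, rounded to two decimals."""
    return round(wild_type - mutant, 2)


diffs = {
    number: (energy_diff(WT_FOLDX, foldx), energy_diff(WT_MINIMISE, minimise))
    for number, (foldx, minimise) in MUTANTS.items()
}

# Mutation with the most negative Minimise difference.
outlier = min(diffs, key=lambda number: diffs[number][1])

for number, (foldx_diff, minimise_diff) in diffs.items():
    print(number, foldx_diff, minimise_diff)
print("largest Minimise difference:", outlier)
```

Recomputing the columns in this way makes it easy to spot rows that deviate strongly from the rest, such as mutation 8 in the Minimise column.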

GROMACS

Figure 11: nstep vs. elapsed time in GROMACS.

Wildtype

Force Field  Average   Error Estimate  RMSD     Tot-Drift (kJ/mol)
Bond
AMBERGS      1826.99   420             4409.39  -2499.37
AMBER03      1639.74   410             4358.68  -2424.42
CHARMM27     2908.14   350             4779.8   -2033.44
Angle
AMBERGS      5496.47   74              476.18   408.548
AMBER03      5324.13   72              469.75   369.24
CHARMM27     7975.2    86              798.12   432.901
Potential
AMBERGS      -114713   1200            5648.79  -7915.46
AMBER03      -91307.7  1200            5559.82  -7839.05
CHARMM27     136.699   32              64.3892  227.896


Mutations

Mutation  Average   Error Estimate  RMSD     Tot-Drift (kJ/mol)
Bond
1         1815.39   570             5166.85  -3384.48
2         1862.77   610             5331.85  -3618.04
3         1773.13   520             4937.34  -3068.93
4         1828.63   580             5229.18  -3479.09
5         1870.95   610             5361.67  -3713.22
6         1816.6    550             5091.81  -3303.34
7         1819.7    570             5173.34  -3397.07
8         2992.15   1700            -nan     -10631.8
9         2083.16   830             -nan     -4913.82
10        1867.42   620             5390.82  -3693.03
Angle
1         5183.95   85              360.959  550.303
2         5195.33   80              364.473  515.645
3         5196.5    89              353.256  586.473
4         5175.59   85              364.496  547.465
5         5113.99   81              365.511  526.244
6         5200.44   85              356.964  553.934
7         5261.77   87              365.202  565.196
8         5178.73   76              -nan     215.036
9         5201.95   76              -nan     442.141
10        5174.48   88              375.775  555.294
Potential
1         -90528.4  1600            7234.09  -10149.1
2         -90481.9  1600            7442.03  -10340
3         -90654    1500            6928.73  -9614.54
4         -90541    1600            7311.04  -10343.7
5         -91011.7  1600            7484.45  -10592.5
6         -90782.2  1600            7226.99  -10188.5
7         -90232.9  1600            7236.24  -10198
8         -87316    3600            -nan     -23670.3
9         -90090.3  1900            -nan     -12335.3
10        -89721.8  1700            7523.88  -10750.1

References

<references />