Structure-based mutation analysis TSD
No citations, I don't read THAT many books :P - Have some music instead ;)
The journal of this task can be found here.
Two structures have been resolved for the HEXA_HUMAN reference sequence used throughout this practical. The structure of this UniProt entry is found in the alpha chains of the PDB IDs 2gjx and 2gk1, both from the same publication <ref name="hexa_pdb_ref">Lemieux, M., Mark, B., & Cherney, M. (2006). Crystallographic Structure of Human beta-Hexosaminidase A: Interpretation of Tay-Sachs Mutations and Loss of GM2 Ganglioside Hydrolysis. Journal of molecular biology, 359(4), 913-29. doi:10.1016/j.jmb.2006.04.004</ref>. Unfortunately, a 14-residue stretch towards the N-terminus of the protein (residues 75 to 88) is unresolved in both structures. However, as previously shown, the alpha subunit of Hex A consists of two domains. The N-terminal domain, Glyco_hydro_20b, is not involved in catalysis. Therefore, to avoid problems with the missing backbone in the course of this task, the structure was truncated to contain only the C-terminal, catalytic domain, Glyco_hydro_20. In accordance with previous tasks and based on the experimental data for both structures, shown in <xr id="tbl:struct_comp"/>, 2gjx was chosen as the reference structure. Details on how the structure was altered according to the measures described above can be found in the journal.
TODO mention that this unfortunately does not include the (pseudo-)ligand and we therefore have to make assumptions on the hydrogen bonds...
As only disease-causing SNPs had been assigned the week before, some mutations had to be replaced. Additionally, the reference PDB structure limited the possibilities, as only the second domain was retained for the following analysis. Thus an almost entirely new set of SNPs was chosen: R178H, R178C, P182L, D207E, S293I, F434L, L451V, E482K, L484Q, E506D. Their positions in the 3D structure of HEXA are shown in <xr id="fig:snpsOnstr"/>.
The chosen mutations are displayed in the context of the remaining domain, the active site and the important residues. For a detailed description of the important residues, see the introduction. Five mutations occur in loop regions and five in helix elements. Two positions in the set coincide with important residues: position 207, where D is mutated to E, and position 178, for which two mutants were chosen. Mutations of the important residues are expected to have a severe effect on the protein function.
For the new mutation set the biochemical properties are listed in <xr id="tab:biochem"/>. They were assembled in the same way as in the sequence-based mutation analysis.
|R178H||-4.5 (polar)||173.4 (bulky)||positive||-3.2 (polar)||153.2 (bulky)||neutral||29|
|R178C||-4.5 (polar)||173.4 (bulky)||positive||2.5 (polar)||108.5 (small)||neutral||180|
|P182L||-1.6 (nonpolar)||112.7 (small)||neutral||3.8 (nonpolar)||166.7 (bulky)||neutral||98|
|D207E||-3.5 (polar)||111.1 (small)||negative||-3.5 (polar)||138.4 (bulky)||negative||45|
|S293I||-0.8 (polar)||89.0 (tiny)||neutral||4.5 (nonpolar)||166.7 (bulky)||neutral||142|
|F434L||2.8 (nonpolar)||189.9 (bulky)||neutral||3.8 (nonpolar)||166.7 (bulky)||neutral||22|
|L451V||3.8 (nonpolar)||166.7 (bulky)||neutral||4.2 (nonpolar)||140.0 (small)||neutral||32|
|E482K||-3.5 (polar)||138.4 (bulky)||negative||-3.9 (polar)||168.6 (bulky)||positive||56|
|L484Q||3.8 (nonpolar)||166.7 (bulky)||neutral||-3.5 (polar)||143.8 (bulky)||neutral||113|
|E506D||-3.5 (polar)||138.4 (bulky)||negative||-3.5 (polar)||111.1 (small)||negative||45|
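The property changes per mutation can also be derived programmatically from the table. The following is a minimal sketch: the per-residue values are copied from the rows above, while the function name `property_change` and the returned keys are purely illustrative and not part of the original analysis.

```python
import re

# (hydropathy, volume, charge) per residue, copied from the table above.
PROPERTIES = {
    "R": (-4.5, 173.4, "positive"),
    "H": (-3.2, 153.2, "neutral"),
    "C": (2.5, 108.5, "neutral"),
    "P": (-1.6, 112.7, "neutral"),
    "L": (3.8, 166.7, "neutral"),
    "D": (-3.5, 111.1, "negative"),
    "E": (-3.5, 138.4, "negative"),
    "S": (-0.8, 89.0, "neutral"),
    "I": (4.5, 166.7, "neutral"),
    "F": (2.8, 189.9, "neutral"),
    "V": (4.2, 140.0, "neutral"),
    "K": (-3.9, 168.6, "positive"),
    "Q": (-3.5, 143.8, "neutral"),
}


def property_change(mutation):
    """Parse a mutation such as 'E482K' and return the property differences
    (mutant minus wildtype). Hypothetical helper for illustration only."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a valid mutation string: {mutation!r}")
    wildtype, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    wt_hydro, wt_vol, wt_charge = PROPERTIES[wildtype]
    mt_hydro, mt_vol, mt_charge = PROPERTIES[mutant]
    return {
        "wildtype": wildtype,
        "position": position,
        "mutant": mutant,
        "delta_hydropathy": round(mt_hydro - wt_hydro, 1),
        "delta_volume": round(mt_vol - wt_vol, 1),
        "charge_changed": wt_charge != mt_charge,
    }


for snp in ["R178H", "R178C", "P182L", "D207E", "S293I",
            "F434L", "L451V", "E482K", "L484Q", "E506D"]:
    print(snp, property_change(snp))
```

Sorting the results by `delta_volume` or filtering on `charge_changed` gives a quick first impression of which substitutions are the most drastic before the structures are inspected.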
<xr id="tbl:pymolstruc"/> displays the wildtype together with the respective mutant structures generated by PyMOL. Since these are only side chain conformations chosen manually from the ones offered by PyMOL, most of the analysis is found in the sections below and is based on the predictions by SCWRL and FoldX.
SCWRL and FoldX
The SCWRL and FoldX mutant conformations are shown in <xr id="tbl:scwrlfoldx"/>.
R178 is one of the important residues. As can be seen in the corresponding figure in <xr id="tbl:pymolstruc"/>, it could form hydrogen bonds with the neighbouring residues D207 and E462. In addition, it seems reasonable that these two negatively charged neighbouring residues coordinate the positive charge placed between them. More importantly, R178 has already been shown to form a hydrogen bond with the ligand.
The mutant amino acid histidine can, depending on the pH value, also carry a positive charge. The reported optimum pH value for Hex A is around 4.4; under these conditions histidine is expected to carry a positive charge <ref name="phforhexa">Grebner,E.E. et al. (1986) Two abnormalities of hexosaminidase A in clinically normal individuals. American journal of human genetics, 38, 505-14.</ref><ref name="histcharge">when is his pos charged, alberts TODO</ref>. However, the larger problem is that under these conditions the formation of a hydrogen bond with the ligand is very unlikely. This is reinforced by the recessed position of the residue relative to the active site. Even in the SCWRL model (purple), where the hydrogen is included (and the ring is flipped compared to FoldX), the distance could already be too long for coordination of the ligand. Both the SCWRL and the FoldX structures clash slightly with D175; however, these clashes are very minor, fixable and clearly not the main problem here.
For the mutation to cysteine, the problem is much more pronounced. Because of the increased distance, the hydrogen bond to the ligand can certainly no longer be formed. There are no clashes, which is unsurprising given how small the side chain is.
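The statement about the charge of histidine at the pH optimum can be checked with the Henderson-Hasselbalch relation. The sketch below assumes a side chain pKa of about 6.0, an approximate textbook value for free histidine that is not taken from the cited sources; the pKa of a residue inside a protein can deviate considerably from it.

```python
def protonated_fraction(ph, pka):
    """Fraction of a basic group that carries its proton at the given pH,
    following the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


HIS_SIDE_CHAIN_PKA = 6.0  # assumption: approximate value for free histidine
HEX_A_OPTIMUM_PH = 4.4    # pH optimum quoted in the text

fraction = protonated_fraction(HEX_A_OPTIMUM_PH, HIS_SIDE_CHAIN_PKA)
print(f"Protonated (positively charged) fraction at pH {HEX_A_OPTIMUM_PH}: {fraction:.1%}")
```

Under this assumption roughly 97 to 98 percent of the histidine side chains would be protonated at pH 4.4, which supports the expectation of a positive charge.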
In conclusion the modelling of both mutation side chains is mostly as good as possible and shows only minor differences between the two methods. The orientation of the histidine ring is however more convincing in the SCWRL result.
</figure> P182 is on the surface of the Hex A alpha subunit, and proline acts as a helix breaker. Breaking the helix is mandatory in this case, since the ensuing loop continues straight towards the active site, orthogonal to the direction of the alpha-helix. If mutating the proline led to an elongation of the alpha-helix, this would very likely affect the protein's function, since R178, found in the following loop, could no longer be placed accurately. However, the mutation to leucine does not necessarily have this consequence and as such is not necessarily disease-causing.
Another consideration concerns the changed protein surface, which is slightly larger with leucine at position 182. This is the side of the alpha subunit that interacts with the beta subunit of Hex A and is therefore important to consider. However, as <xr id="fig:p182surface"/> shows, there is a cavity at this position, so the change to leucine should not affect the interaction of the subunits.
There are differences between the side chain placements of FoldX and SCWRL; however, given the surroundings and the short side chains, neither can be considered better than the other. The only notable difference is that the small clashes seem to be handled better in the placement by SCWRL.
</figure> D207 is another residue considered important, since it forms a hydrogen bond with the "substitute ligand" NGT (cf. important residues); however, this is the same atom that is also reached by R178. The original publication of the crystal structure suggests that R178 forms the hydrogen bond with the ligand, while D207 interacts with the neighbouring residue H262, which acts as proton donor.
The mutant amino acid glutamate has chemical features very similar to those of the wildtype, which means that the interaction with histidine is in theory still possible. However, the side chain is longer, which in this particular case is very problematic. The side chain placement of FoldX is basically an elongation of the wildtype. While this keeps the charged group close to H262, there is clearly a clash with R178 that is not easily resolved. SCWRL places the side chain further away, which avoids the clashes better; however, to resolve them fully, a further tilt of the charged group would bring it into too close proximity to another neighbouring histidine. Additionally, this placement is too far away from H262 to allow any interaction. As can be seen in <xr id="fig:d207ishard"/>, the space usage in the wildtype is near-perfect, and there seems to be no functional placement of the mutant without introducing larger changes in the backbone that might strongly affect the surrounding residues. Given that this is in the active site, the effects would likely be disease-causing.
S293 is found in a loop far away from the active site. It lies on the surface but does not interact with the beta subunit of Hex A. The -OH group of serine can form hydrogen bonds with a neighbouring glutamate as well as an asparagine, neither of which is possible any more with the mutant amino acid. However, neither of these bonds seems to be directly essential. The increased length of isoleucine compared to serine leads to a clash with the glutamate, but tilting the Glu side chain slightly upwards should solve this problem. This would slightly change the protein's surface but should not affect the binding to the beta subunit either. The placements by SCWRL and FoldX are nearly identical, and from the structural observations this mutation would have to be considered non-disease-causing.
F434 is found relatively close to the active site, but should not be able to exert an effect on it. In addition, there are no other aromatic rings nearby that could play a role in pi-stacking. The residue lies on the surface of the protein but does not interact with the beta subunit; therefore the small depression formed by the mutation to leucine should not affect the binding. The side chain conformations by SCWRL and FoldX are almost identical and create no clashes. All in all, there is no indication that the mutation would lead to a non-functional protein, and it should therefore be regarded as non-disease-causing.
L451 is found at the surface of the protein, opposite the beta subunit binding site and the catalytic site. The side chain orientations of the two methods are similar, differing slightly in the tilt of the two -CH3 groups. In both cases there are no clashes with other atoms, and the surface at this position becomes slightly recessed compared to L451. However, there is no indication that this change has an effect, and the mutation is likely not disease-causing.
E482 is not near the active site but at the interface to the non-catalytic domain that had to be cut out at the beginning of the task. Since neither of the two methods has this information at its disposal, both choose conformations that, although not identical, clash severely with side chains of the cut-out domain. However, inspection of the local environment suggests that even with information about the atoms of the other domain, there is no way to arrange the very long lysine side chain in the pocket, which shows close packing and several hydrogen bonds between side chains of the two domains. One of these is formed by E482 and would therefore be missing in the mutant as well. Taking all these problems together, although the residue is far away from the catalytic site, it seems possible that this mutation disturbs the folding, and especially the association of the two domains, so strongly that it is disease-causing.
L484 is buried inside the protein, not close to the cut-out domain, the active site or the binding site of the beta subunit. Introducing a polar amino acid into this environment could pose a problem, especially during folding. In addition, the increased length of the side chain leads to clashes with neighbouring residues in the conformation chosen by SCWRL. The one given by FoldX is more convincing and should actually fit into the pocket. However, this still means introducing a polar amide group (glutamine is polar but uncharged) into the protein interior where, in this case, hydrophobic residues like Trp, Phe or Gly dominate the environment. Therefore, although not immediately apparent, the mutation should be considered potentially disease-causing.
E506 is found at the surface of the protein, facing the cut-out domain. However, at this position there is no direct interaction between the domains. This mutation shows the largest disagreement in side chain placement between FoldX and SCWRL: both place the C-beta atom almost exactly at the position of the wildtype glutamate's C-beta atom. FoldX also places the C-gamma atom at the position of the wildtype one, whereas SCWRL places it in almost the opposite direction. E506 could form hydrogen bonds with two arginines in the vicinity, and the placement of E506D chosen by FoldX would still allow at least one of those, while the distance to the other arginine is likely too large due to the missing C-delta in aspartate. However, these two arginines can also form hydrogen bonds to residues in the cut-out domain, and their orientation suggests that this is the arrangement found in reality. With the conformation chosen by SCWRL, on the other hand, no interactions are possible and no clashes occur either. The slight increase in surface at this position is not a problem, since the cut-out domain is far away here. Nonetheless, since the conformation of FoldX is closer to the original residue and would in theory allow hydrogen bonding, FoldX is more convincing in this case. A disease-causing effect is unlikely for this mutation, since there are apparently no important interactions, the other domain is too far away, and the charge, should it serve a purpose at this position, remains unchanged.
It is hard to decide between FoldX and SCWRL based on the previous analyses. In many cases the methods mostly agree; in several they do not, but the difference does not seem to matter anyway. In those cases where an effect would be expected, even manual inspection often does not suggest any solution at all. In the end there remain very few cases, some of which favour FoldX and some SCWRL. In the following, SCWRL will be used because of its slightly better performance on the residues that are important for catalysis; however, a general statement that SCWRL is better than FoldX cannot be made here due to the limited amount of data.
Based on the observations made so far, the predictions for the effect of the mutations would be as shown in <xr id="tab:foldxscwrlpred"/>.
For each of the mutations a new structure will also be created. Note down all of the energies, but also use these structures in the next steps.
What happens regarding the energy?
The Gromacs analysis was conducted with the SCWRL results, as these were judged slightly better than the FoldX models. Gromacs consists of a variety of programs, which were run consecutively (pdb2gmx, grompp, mdrun, g_energy). The explanation of the MDP file needed to prepare the Gromacs run can be found in the protocol.
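The order of the four programs can be sketched as a small command builder. This is not the command set used in the protocol: the file names, the prefix, the name of the MDP file and the water model are placeholders chosen for illustration, and the tool names assume a Gromacs 4.x installation in which `g_energy` exists under that name.

```python
import shlex


def minimisation_commands(pdb_file, force_field, prefix="em",
                          mdp_file="minim.mdp", water="tip3p"):
    """Build the four consecutive Gromacs calls for one energy minimisation.
    All file names and the water model are illustrative assumptions."""
    return [
        # 1. Convert the PDB file into a Gromacs structure and topology.
        ["pdb2gmx", "-f", pdb_file, "-o", f"{prefix}.gro",
         "-p", f"{prefix}.top", "-ff", force_field, "-water", water],
        # 2. Combine structure, topology and MDP parameters into a run input file.
        ["grompp", "-f", mdp_file, "-c", f"{prefix}.gro",
         "-p", f"{prefix}.top", "-o", f"{prefix}.tpr"],
        # 3. Run the minimisation; -deffnm sets the common name of all in/output files.
        ["mdrun", "-deffnm", prefix],
        # 4. Extract energy terms; g_energy asks interactively which terms to write.
        ["g_energy", "-f", f"{prefix}.edr", "-o", f"{prefix}_energy.xvg"],
    ]


for force_field in ["amber03", "amber99sb-ildn", "charmm27"]:
    for command in minimisation_commands("mutant.pdb", force_field,
                                         prefix=f"em_{force_field}"):
        print(shlex.join(command))
```

The sketch only prints the commands instead of executing them, so that they can be checked against the protocol before anything is run.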
Gromacs was run on the SCWRL mutant models as well as on the wildtype, in this case the second domain of 2gjx chain A. <figure id="fig:runtime">
The wildtype analysis was used to get a feeling for how the run time depends on the number of steps used for the calculation, see <xr id="fig:runtime"/>. The number of steps actually performed converges at around 300, which is in agreement with the run time. The run time never exceeds 30 seconds. From this it can be concluded that the number of steps has an upper bound beyond which no further improvement is calculated.
The bond, angle and potential energies were calculated with the force fields amber03, amber99sb-ildn and charmm27. They are depicted in <xr id="tbl:gromacsenergies"/>. To show how similar the bond, angle and potential energies are across the force fields, they are displayed in one figure but in different shades. The force fields behave very similarly in the calculation of the bond and angle energies. Only the potential differs: here amber99sb-ildn yields results around 50,000 and amber03 around 40,000. charmm27 generally shows a trend similar to amber03. Besides the differences in potential, it also becomes evident that the force fields need different numbers of steps.
Whereas there are some small variations within one figure, there are very few between the figures apart from the number of steps. The wildtype calculations show the same trends as the mutations, and thus not much can be inferred from the energy minimisation.
To take a closer look at the minimal energies of the mutants, their average is displayed as well as their difference to the wildtype, see <xr id="tab:avg"/> and <xr id="tab:delta"/>. The difference between wildtype and mutant energy is denoted as Δ. As the energy calculations of all force fields behave very similarly, the numbers are only displayed for the amber03 force field.
The average energies show that there is very little variance within all results. The wildtype does not always receive the lower energy; where it does not, the mutant would be the more favourable structure.
The difference between wildtype and mutant would be expected to be high for harmful mutations, as it would reflect the conformational change of the mutant region. A positive difference indicates that the mutant is less stable, and a negative difference means that the mutant is more stable. The changes between wildtype and mutant energies are comparatively small: the biggest change is 565 for the potential, 23 for the bond and 113 for the angle energy.
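The sign convention can be made explicit with a short calculation. The sketch assumes that Δ is taken as mutant minus wildtype, which is the reading consistent with the interpretation above; the energy values and the names `mutant_a` and `mutant_b` are invented placeholders, not results of this analysis.

```python
def energy_deltas(wildtype, mutants):
    """Return mutant minus wildtype for every energy term of every mutant."""
    return {
        name: {term: value - wildtype[term] for term, value in terms.items()}
        for name, terms in mutants.items()
    }


def stability(delta):
    """Interpret the sign of a delta: positive means the mutant is less stable."""
    if delta > 0:
        return "less stable than wildtype"
    if delta < 0:
        return "more stable than wildtype"
    return "unchanged"


# Placeholder values for illustration only (not measured data).
wildtype = {"bond": 1000.0, "angle": 3000.0, "potential": -40000.0}
mutants = {
    "mutant_a": {"bond": 1010.0, "angle": 3050.0, "potential": -39800.0},
    "mutant_b": {"bond": 995.0, "angle": 2990.0, "potential": -40100.0},
}

deltas = energy_deltas(wildtype, mutants)
for name, terms in deltas.items():
    print(name, terms, stability(terms["potential"]))
```

If the opposite convention (wildtype minus mutant) was used for the table, the interpretation of the signs simply flips.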