Structure-based mutation analysis TSD



The journal of this task can be found here.

Structure preparation

Two structures have been resolved for the HEXA_HUMAN reference sequence used throughout this practical. The structure of this UniProt entry is found in the alpha chains of the PDB entries 2gjx and 2gk1, both from the same publication <ref name="hexa_pdb_ref">Lemieux, M., Mark, B., & Cherney, M. (2006). Crystallographic Structure of Human beta-Hexosaminidase A: Interpretation of Tay-Sachs Mutations and Loss of GM2 Ganglioside Hydrolysis. Journal of molecular biology, 359(4), 913-29. doi:10.1016/j.jmb.2006.04.004</ref>. Unfortunately, a 14-residue stretch towards the N-terminus of the protein (residues 75 to 88) is unresolved in both structures. However, as previously shown, the alpha subunit of Hex A consists of two domains. The N-terminal domain, Glyco_hydro_20b, is not involved in catalysis. Therefore, to avoid problems with the missing backbone in the course of this task, the structure was truncated to contain only the C-terminal, catalytic domain, Glyco_hydro_20. In accordance with previous tasks and based on the experimental data for both structures, shown in <xr id="tbl:struct_comp"/>, 2gjx was chosen as the reference structure. Details on how the structure was altered as described above can be found in the journal.
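The truncation step can be illustrated with a short Python sketch. It is not the procedure documented in the journal: the function name `truncate_pdb` and the parameter `first_residue` are hypothetical, and the actual domain boundary is not stated here, so it has to be supplied by the user. The sketch relies only on the fixed-column layout of PDB ATOM records (chain identifier in column 22, residue number in columns 23 to 26).

```python
def truncate_pdb(pdb_lines, chain_id, first_residue):
    """Return the ATOM records of one chain from first_residue onwards.

    pdb_lines     : iterable of lines from a PDB file
    chain_id      : single-letter chain identifier, e.g. "A"
    first_residue : first residue number to keep (hypothetical parameter;
                    the real domain boundary must come from the journal)
    """
    kept = []
    for line in pdb_lines:
        # Only coordinate records of standard residues are considered here.
        if not line.startswith("ATOM"):
            continue
        # Column 22 (index 21) holds the chain identifier.
        if line[21] != chain_id:
            continue
        # Columns 23-26 (indices 22-25) hold the residue sequence number.
        if int(line[22:26]) >= first_residue:
            kept.append(line)
    return kept


# Usage sketch (file name and boundary are placeholders):
# with open("2gjx.pdb") as handle:
#     domain = truncate_pdb(handle, "A", first_residue=DOMAIN_START)
```

Heteroatoms, waters and the other chains are dropped by this sketch; whether that matches the prepared structure is an assumption.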


<figtable id="tbl:struct_comp">

Table 1: Comparison of experimental parameters for the two resolved structures of UniProt entry P06865. The parameters are important for choosing the structure used for the structure-based mutation analysis. Coverage is the range of sequence residues covered by the structure according to UniProtKB. This, however, is based on the SEQRES data. It should be noted that there is an unresolved region (residues 75 to 88) in both structures.

PDB-ID:Chain Coverage Resolution (Å) R-value R-free pH
2gjx:A 23-529 2.8 0.270 0.288 5.5
2gk1:A 23-529 3.25 0.277 0.322 5.5

</figtable>


Mutations

As only disease-causing SNPs had been assigned the week before, some mutations had to be replaced. In addition, the reference PDB structure limited the choice, as only the second domain was retained for the following analysis. Thus an almost entirely new set of SNPs was chosen: R178H, R178C, P182L, D207E, S293I, F434L, L451V, E482K, L484Q, E506D. Their positions in the 3D structure of HEXA are shown in <xr id="fig:snpsOnstr"/>.
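As a small consistency check on this list, the mutation codes can be parsed into wildtype residue, position and mutant residue. The following Python sketch (all names are illustrative, not part of the original workflow) groups the SNPs by position and confirms that none of them falls into the unresolved stretch 75 to 88.

```python
import re
from collections import defaultdict, namedtuple

Mutation = namedtuple("Mutation", "wildtype position mutant")

# One-letter wildtype residue, sequence position, one-letter mutant residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code such as 'R178H' into (wildtype, position, mutant)."""
    match = MUTATION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid mutation code: {code!r}")
    wildtype, position, mutant = match.groups()
    return Mutation(wildtype, int(position), mutant)


SNPS = ["R178H", "R178C", "P182L", "D207E", "S293I",
        "F434L", "L451V", "E482K", "L484Q", "E506D"]
UNRESOLVED = range(75, 89)  # residues 75 to 88 are missing in both structures

mutations = [parse_mutation(code) for code in SNPS]

# Group the mutants by sequence position.
by_position = defaultdict(list)
for mutation in mutations:
    by_position[mutation.position].append(mutation.mutant)

# Positions for which more than one mutant was chosen.
multiply_mutated = {pos: muts for pos, muts in by_position.items() if len(muts) > 1}

# Mutations that would fall into the unresolved region.
in_unresolved = [m for m in mutations if m.position in UNRESOLVED]
```

A strict pattern like this also catches transcription slips, for example a digit `1` typed in place of the letter `I`.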


<figure id="fig:snpsOnstr">

Figure 1: SNPs highlighted on the HexA subunit structure. The mutations are displayed in red and the cut out domain is displayed in grey. The active site is displayed in orange and the important residues in yellow.

</figure>

The chosen mutations are displayed in the context of the remaining domain, the active site and the important residues. For a detailed description of the important residues, see the introduction. Five mutations occur in loop regions and five in helices. Two of the mutated positions coincide with important residues: position 207, where D is mutated to E, and position 178, for which two mutants were included in the SNP set. Mutations of the important residues are expected to have a severe effect on protein function.


For the new mutation set the biochemical properties are listed in <xr id="tab:biochem"/>. They have been assembled according to the sequence based mutation analysis.

<figtable id="tab:biochem">

Table 2: Biochemical properties.

Mutation Wildtype hydrophobicity Wildtype volume Wildtype charge Mutant hydrophobicity Mutant volume Mutant charge Grantham score
R178H -4.5 (polar) 173.4 (bulky) positive -3.2 (polar) 153.2 (bulky) neutral 29
R178C -4.5 (polar) 173.4 (bulky) positive 2.5 (polar) 108.5 (small) neutral 180
P182L -1.6 (nonpolar) 112.7 (small) neutral 3.8 (nonpolar) 166.7 (bulky) neutral 98
D207E -3.5 (polar) 111.1 (small) negative -3.5 (polar) 138.4 (bulky) negative 45
S293I -0.8 (polar) 89.0 (tiny) neutral 4.5 (nonpolar) 166.7 (bulky) neutral 142
F434L 2.8 (nonpolar) 189.9 (bulky) neutral 3.8 (nonpolar) 166.7 (bulky) neutral 22
L451V 3.8 (nonpolar) 166.7 (bulky) neutral 4.2 (nonpolar) 140.0 (small) neutral 32
E482K -3.5 (polar) 138.4 (bulky) negative -3.9 (polar) 168.6 (bulky) positive 56
L484Q 3.8 (nonpolar) 166.7 (bulky) neutral -3.5 (polar) 143.8 (bulky) neutral 113
E506D -3.5 (polar) 138.4 (bulky) negative -3.5 (polar) 111.1 (small) negative 45

</figtable>
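To make the property changes in the table easier to compare, the differences between mutant and wildtype can be computed directly. The Python sketch below uses only the hydrophobicity, volume and charge values listed above; the function name `property_change` and the output keys are illustrative.

```python
# Per-residue values exactly as listed in the table:
# (hydrophobicity, volume, charge)
PROPERTIES = {
    "R": (-4.5, 173.4, "positive"),
    "H": (-3.2, 153.2, "neutral"),
    "C": (2.5, 108.5, "neutral"),
    "P": (-1.6, 112.7, "neutral"),
    "L": (3.8, 166.7, "neutral"),
    "D": (-3.5, 111.1, "negative"),
    "E": (-3.5, 138.4, "negative"),
    "S": (-0.8, 89.0, "neutral"),
    "I": (4.5, 166.7, "neutral"),
    "F": (2.8, 189.9, "neutral"),
    "V": (4.2, 140.0, "neutral"),
    "K": (-3.9, 168.6, "positive"),
    "Q": (-3.5, 143.8, "neutral"),
}


def property_change(code):
    """Return mutant-minus-wildtype differences for a code like 'R178C'."""
    wildtype, mutant = code[0], code[-1]
    h_wt, v_wt, c_wt = PROPERTIES[wildtype]
    h_mut, v_mut, c_mut = PROPERTIES[mutant]
    return {
        "d_hydrophobicity": round(h_mut - h_wt, 1),
        "d_volume": round(v_mut - v_wt, 1),
        "charge_changed": c_wt != c_mut,
    }


SNPS = ["R178H", "R178C", "P182L", "D207E", "S293I",
        "F434L", "L451V", "E482K", "L484Q", "E506D"]
changes = {code: property_change(code) for code in SNPS}
```

For R178C, for instance, this gives a hydrophobicity change of 7.0 together with the loss of the positive charge, in line with the high Grantham score of 180.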

Molecular Mechanics

<xr id="tbl:pymolstruc"/> displays the wildtype together with the respective mutant structures generated by PyMOL. Since these are only manually chosen side chain conformations, selected from the ones offered by PyMOL, most of the analysis can be found in the sections below, based on the predictions by SCWRL and FoldX. Only a few general observations are made here:

The mutations of arginine at position 178 do not result in any clashes, but both are disruptive because the hydrogen bonding is lost. Since proline, the helix breaker, is eliminated, the mutant leucine at position 182 could be disease-causing. In the mutation D207E the wildtype and mutant are very similar in structure and chemical properties, so at first sight this mutation seems to have little impact on the protein structure. S293I is a mutation on the protein surface that again disrupts hydrogen bonding. The mutations F434L and L451V do not cause any clashes or disrupt any bonds. E482K, in contrast, interferes with a hydrogen bond, and in addition the long side chain of the mutant seems hard to place without interfering with nearby amino acids. The last two mutations, L484Q and E506D, seem to fit well into their surroundings.


<figtable id="tbl:pymolstruc">

R178H
R178C
P182L
D207E
S293I
F434L
L451V
E482K
L484Q
E506D
Table 3: Overview of all SNPs in PyMOL: comparison of the wildtype and the manually chosen mutant structure. The wildtype is coloured light green (or yellow if it is an important residue), the mutant red, and polar contacts are displayed in blue.

</figtable>



SCWRL and FoldX

The SCWRL and FoldX mutant conformations are shown in <xr id="tbl:scwrlfoldx"/>.

<figtable id="tbl:scwrlfoldx">

R178H
R178C
P182L
D207E
S293I
F434L
L451V
E482K
L484Q
E506D
Table 4: Overview of all SNPs in the 3D structure. Compared are the wildtype and the SCWRL and FoldX results for each mutation. The wildtype is coloured light green (or yellow if it is an important residue), the SCWRL model purple and the FoldX model cyan. Polar contacts are displayed in blue.

</figtable>

R178H/C

R178 is one of the important residues. As can be seen in the corresponding figure in <xr id="tbl:pymolstruc"/>, it could form hydrogen bonds with the neighbouring residues D207 and E462. In addition, it seems reasonable that these two negatively charged neighbours coordinate the positive charge placed between them. More importantly, R178 has already been shown to form a hydrogen bond with the ligand.

The mutant amino acid histidine can, depending on the pH, also carry a positive charge. The reported optimum pH for Hex A is around 4.4; under these conditions histidine is almost certainly positively charged <ref name="phforhexa">Grebner,E.E. et al. (1986) Two abnormalities of hexosaminidase A in clinically normal individuals. American journal of human genetics, 38, 505-14.</ref><ref name="histcharge">when is his pos charged, alberts TODO</ref>. However, the larger problem is that under these conditions the formation of a hydrogen bond with the ligand is very unlikely. This is reinforced by the recessed position of the residue relative to the active site. Even in the case of SCWRL (purple), where the hydrogen is included (and the ring is flipped compared to FoldX), the distance could already be too long for coordination of the ligand. There are slight clashes of both the SCWRL and the FoldX structures with D175; however, these are very minor, fixable and clearly not the main problem here.
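The statement about the histidine charge can be made quantitative with the Henderson-Hasselbalch relation. The sketch below assumes a side chain pKa of about 6.0 for histidine, a common textbook value for the free amino acid; the actual pKa of H178 inside the protein may be shifted by its environment, so the numbers are only indicative.

```python
def protonated_fraction(ph, pka):
    """Fraction of a titratable group in its protonated form at a given pH.

    Follows from the Henderson-Hasselbalch equation:
    pH = pKa + log10([deprotonated] / [protonated]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: textbook pKa of the histidine imidazole side chain.
HIS_SIDE_CHAIN_PKA = 6.0

HEX_A_OPTIMUM_PH = 4.4  # optimum pH reported for Hex A (see text)
CRYSTAL_PH = 5.5        # pH of both crystal structures (Table 1)

charged_at_optimum = protonated_fraction(HEX_A_OPTIMUM_PH, HIS_SIDE_CHAIN_PKA)
charged_in_crystal = protonated_fraction(CRYSTAL_PH, HIS_SIDE_CHAIN_PKA)
```

Under this assumption roughly 98 % of the histidine side chains are protonated, and thus positively charged, at pH 4.4, compared with about 76 % at the crystallisation pH of 5.5.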

For the mutation to cysteine, the problem is much more pronounced. There is no doubt that formation of the hydrogen bond to the ligand is not possible anymore due to the increase in distance. Clashes do not exist, but would actually be hard to create in any case.

In conclusion, the modelling of both mutant side chains is about as good as possible and shows only minor differences between the two methods. The orientation of the histidine ring is, however, more convincing in the SCWRL result.

P182L

<figure id="fig:p182surface">

Figure 2: Surface conditions in the reference structure near wildtype residue P182. The Hex A alpha subunit (2gjx:A) is shown in green, the beta subunit (2gjx:B) in blue. P182 is highlighted in orange. As can be seen, there is no interaction between the two subunits at the position of interest, and the mutation to leucine should not have an effect on dimer formation.

</figure> P182 is on the surface of the Hex A alpha subunit, and proline acts as a helix breaker. Breaking the helix is, however, mandatory in this case, since the ensuing loop continues straight towards the active site, which is orthogonal to the direction of the alpha-helix. If a mutation of the proline led to an elongation of the alpha-helix, this would very likely affect the protein's function, since R178, found in the following loop, could no longer be placed accurately. However, the mutation to leucine need not necessarily lead to this and as such is not necessarily disease-causing.

Another consideration is the changed protein surface, which is slightly larger with leucine at position 182. This is the side of the alpha subunit that interacts with the beta subunit of Hex A and is therefore important to consider. However, as <xr id="fig:p182surface"/> shows, there is a cavity at this position, and the change to leucine should not affect the interaction of the subunits.

There are differences between the side chain placements by FoldX and SCWRL; however, under the given surrounding conditions and due to the short side chains, neither of the two can be considered better than the other. The only notable difference is that the small clashes seem to be handled better in the placement by SCWRL.

D207E

<figure id="fig:d207ishard">

Figure 3: Local packing in the reference structure near wildtype residue D207. Shown as spheres are, from left to right, H262, D207 and R178. The mutation D207E elongates the side chain to a degree where it can no longer be placed anywhere without creating clashes or losing H262 as a proton donor.

</figure> D207 is another residue considered important, since it forms a hydrogen bond with the "substitute ligand" NGT (cf. important residues); however, this is the same atom that is also reached by R178. The original publication of the crystal structure suggests that R178 forms the hydrogen bond with the ligand, while D207 interacts with the neighbouring residue H262, which acts as a proton donor.

The mutant amino acid glutamate has chemical features very similar to those of the wildtype, which means that the interaction with histidine is in theory still possible. However, the side chain is longer, which in this particular case is very problematic. The side chain placement by FoldX is basically an elongation of the wildtype. While this keeps the charged group close to H262, there is clearly a clash with R178 that is not easily resolved. SCWRL places the side chain further away, which avoids the clashes better; however, to resolve them fully, a further tilt of the charged group would bring it into too close proximity to another neighbouring histidine. Additionally, this placement is too far away from H262 to allow any interaction. As can be seen in <xr id="fig:d207ishard"/>, the space usage in the wildtype is near-perfect, and there seems to be no functional placement of the mutant that would not introduce larger changes in the backbone, which might have strong effects on the surrounding residues. Given that this is in the active site, the effects would likely be disease-causing.

S293I

S293 is found in a loop far away from the active site. It lies on the surface but does not interact with the beta subunit of Hex A. The -OH group of serine can form hydrogen bonds with a neighbouring glutamate as well as an asparagine, neither of which is possible any more with the mutant amino acid. However, neither of these bonds seems to be directly essential. The increased length of isoleucine compared to serine leads to a clash with the glutamate; however, tilting the Glu side chain slightly upwards should solve this problem. This would slightly change the protein's surface but should not affect the binding to the beta subunit either. The placements by SCWRL and FoldX are nearly identical, and from the structural observations this mutation would have to be considered non disease-causing.

F434L

F434 is found relatively close to the active site but should not be able to exert an effect on it. In addition, there are no other aromatic rings nearby that could play a role in pi-stacking. The residue lies on the surface of the protein but does not interact with the beta subunit; therefore the formation of a small hole by the mutation to leucine should not have an effect on the binding. The side chain conformations by SCWRL and FoldX are almost identical and no clashes are created. All in all, there is no indication that the mutation would lead to a non-functional protein, and it should therefore be regarded as non disease-causing.

L451V

L451 is found at the surface of the protein, opposite the beta-subunit binding site and the catalytic site. The side chain orientations chosen by the two methods are similar, differing slightly in the tilt of the two -CH3 groups. In both cases there are no clashes with other atoms, and the surface at this position becomes slightly recessed compared to L451. However, there is no indication that this change has an effect, and the mutation is likely not disease-causing.

E482K

E482 is not near the active site but at the interface with the non-catalytic domain that had to be cut out at the beginning of the task. Since neither of the two methods has this information at its disposal, both choose conformations that, although not identical, clash severely with side chains of the cut-out domain. However, inspection of the local environment suggests that even with information about the atoms in the other domain there is no way to arrange the very long lysine side chain in the pocket, which shows close packing and several hydrogen bonds between side chains of the two domains. One of these is formed by E482 and would therefore be missing in the mutant as well. Taking all these problems together, although the residue is far away from the catalytic site, it seems possible that this mutation would disturb the folding, and especially the association between the two domains, so strongly that it is disease-causing.

L484Q

L484 is buried inside the protein, not close to the cut-out domain, the active site or the binding site for the beta subunit. Introducing a polar amino acid into this environment could pose a problem, especially during folding. In addition, the increased length of the side chain leads to clashes with neighbouring residues in the conformation chosen by SCWRL. The one given by FoldX is more convincing and should actually fit into the pocket. However, this still means introducing a strongly polar group inside the protein where, in this case, hydrophobic residues like Trp, Phe or Gly dominate the environment. Therefore, although not immediately apparent, the mutation should be considered potentially disease-causing.

E506D

E506 is found at the surface of the protein, facing the cut-out domain. However, at this position there is no direct interaction between the domains. This mutation shows the largest disagreement in side chain placement between FoldX and SCWRL: the C-beta atoms are placed almost identically, at the position of the wildtype glutamate's C-beta atom. The C-gamma atom, however, is placed at the position of the wildtype one by FoldX, while SCWRL places it in almost the opposite direction. E506 could form hydrogen bonds with two arginines in the vicinity, and the placement for E506D chosen by FoldX would allow at least one of those to occur as well, while the distance to the other arginine is likely too large due to the missing C-delta in aspartate. However, these two arginines can also form hydrogen bonds to residues in the cut-out domain, and their orientation suggests that this is the arrangement found in reality. With the conformation chosen by SCWRL, on the other hand, no interactions are possible and no clashes occur either. The slight increase in surface at this position is not a problem, since the cut-out domain is far away here. Nonetheless, since the FoldX conformation is closer to the original residue and would in theory allow hydrogen bonding, FoldX is more convincing in this case. A disease-causing effect is unlikely for this mutation, since there are apparently no important interactions, the other domain is too far away and the charge remains unchanged, in case the charge at this position serves a purpose that is not apparent here.

Conclusion

<figtable id="tab:foldxscwrlpred">

Table 5: Predicted effect of the mutations, based only on the observations made while analysing the side chain placements by FoldX and SCWRL. Uncertain predictions are given in brackets.

Mutation Effect?
R178H yes
R178C yes
P182L (no)
D207E yes
S293I no
F434L no
L451V no
E482K yes
L484Q (yes)
E506D no

</figtable>

It is hard to decide between FoldX and SCWRL based on the previous analyses. In many cases the methods largely agree; in several they do not, but there the difference does not seem to matter. In those cases where an effect would be expected, even manual inspection often does not suggest any solution at all. In the end there remain very few cases, in some of which FoldX seems to perform better and some of which speak for SCWRL. In the following, SCWRL will be used because of its slightly better performance on the residues that are important for catalysis; however, a general statement that SCWRL is better than FoldX cannot be made here due to the limited amount of data.

Based on the observations made so far, the predictions for the effect of the mutations would be as shown in <xr id="tab:foldxscwrlpred"/>.

For each of the mutations a new structure will also be created. Note down all of the energies, but also use these structures in the next steps.

Minimise

<xr id="tbl:minimize"/> shows the results from Minimize. For every mutation, the energy minimization of the SCWRL and the FoldX structure is displayed together with the wildtype for comparison. At first sight the results for the different mutations are very similar to each other. The FoldX mutant conformations reach energy values very comparable to those of the wildtype during minimization. It is striking, however, that in all cases the energy rises again after the second minimization step at the latest. The SCWRL conformations show an even more curious behaviour. While very similar to each other, they stand in contrast to the FoldX and wildtype minimizations, as the energy is higher overall and additionally increases up to the 4th minimization step. This seems rather contradictory to the program's purpose. It is also noticeable that the SCWRL structures already start at higher energy values than the supposedly optimal wildtype.

The energy range stays the same: approximately -7700 to -7300 for FoldX and -7500 to -7000 for SCWRL. Mutation R178H shows the greatest energy difference between SCWRL and FoldX. Mutation S293I shows the opposite behaviour, as here the curves even overlap at some points. This is why these two mutations were chosen for closer investigation.

<figtable id="tbl:minimize">

R178H
R178C
P182L
D207E
S293I
F434L
L451V
E482K
L484Q
E506D
Table 6: Minimized energies of the SCWRL and FoldX structures. For every mutation, the energy curve over the 5 minimization steps is displayed for SCWRL and FoldX, together with the wildtype for comparison.

</figtable>


<figure id="fig:178">

Figure 4: Close-up of the R178H mutation during minimization. The wildtype structure is displayed in green, FoldX in cyan and SCWRL in magenta.

</figure> <xr id="fig:178"/> shows a close-up of the R178H mutation over the 5 minimization steps. Although the energies differ the most for this mutation, the changes around the mutation site are very small; only tiny shifts can be perceived. It seems that the energy difference results from some other part of the structure. As discussed above, positioning this mutant should not pose great difficulties, so it is even more surprising that Minimize seemingly changes the structure somewhere other than at the mutation site.


<figure id="fig:293">

Figure 5: Enlarged animation of the steps during minimization for the S293I mutation. The wildtype structure is displayed in green, FoldX in cyan and SCWRL in magenta.

</figure>

The behaviour of mutation S293I during minimization is displayed in <xr id="fig:293"/>. More motion is visible than in <xr id="fig:178"/>; in particular, the SCWRL side chain is placed differently during the minimization. Altogether the changes are still comparatively small, and no direct improvement or deterioration of the positioning is recognizable.



Gromacs

The Gromacs analysis was conducted with the SCWRL results, as these were judged slightly better than the FoldX models. Gromacs consists of a variety of programs, which were run consecutively (pdb2gmx, grompp, mdrun, g_energy). The explanation of the MDP file needed to prepare the Gromacs run can be found in the protocol.
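The order of the four programs can be sketched as a list of command lines. This is a hedged illustration, not the protocol actually used: the tool names are those given above, but the file names, the water model (`tip3p`) and the omission of any box or solvation step are assumptions, and the real parameters are documented in the protocol. The function only builds the commands and does not execute them.

```python
def build_minimisation_commands(pdb_path, force_field,
                                mdp_path="em.mdp", water="tip3p", prefix="em"):
    """Return the four command lines of one energy-minimisation run.

    All file names, the water model and the prefix are illustrative
    assumptions; only the order of the tools is taken from the text.
    """
    return [
        # 1. Build topology and coordinate file for the chosen force field.
        ["pdb2gmx", "-f", pdb_path, "-o", f"{prefix}.gro",
         "-p", f"{prefix}.top", "-ff", force_field, "-water", water],
        # 2. Combine MDP parameters, coordinates and topology into a run input.
        ["grompp", "-f", mdp_path, "-c", f"{prefix}.gro",
         "-p", f"{prefix}.top", "-o", f"{prefix}.tpr"],
        # 3. Run the minimisation; -deffnm uses the prefix for all file names.
        ["mdrun", "-deffnm", prefix],
        # 4. Extract energy terms; g_energy asks for the terms interactively.
        ["g_energy", "-f", f"{prefix}.edr", "-o", f"{prefix}_energy.xvg"],
    ]


# The three force fields named in this section.
FORCE_FIELDS = ["amber03", "amber99sb-ildn", "charmm27"]

# Hypothetical input file name for one SCWRL model.
commands = {ff: build_minimisation_commands("R178H_scwrl.pdb", ff, prefix=f"R178H_{ff}")
            for ff in FORCE_FIELDS}
```

Each command list could be passed to `subprocess.run`; keeping construction separate from execution makes it easy to repeat the same run for every mutant and force field.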

Gromacs was run on the SCWRL mutant models as well as on the wildtype, in this case the second domain of 2gjx chain A. <figure id="fig:runtime">

Figure 6: Run time of Gromacs for different maximum numbers of steps.

</figure>


The wildtype analysis was used to get a feeling for how the run time depends on the number of steps used for the calculation, see <xr id="fig:runtime"/>. The number of steps actually performed converges at around 300, which agrees with the plateau in run time. The run time never exceeds 30 seconds. From this it can be concluded that there is an upper bound on the number of steps beyond which no further calculation is performed.


The bond, angle and potential energies were calculated with the force fields amber03, amber99sb-ildn and charmm27. They are depicted in <xr id="tbl:gromacsenergies"/>. To show how similar the bond, angle and potential energies are across the force fields, they are displayed in one figure but in different shades. The force fields show very similar behaviour for the bond and angle energies. Only the potential differs: here amber99sb-ildn yields results around 50,000 and amber03 around 40,000. charmm27 generally shows a trend similar to amber03. Besides the differences in potential, it also becomes evident that different numbers of steps are necessary for the different force fields.
While there are some small variations within one figure, there is very little variation between figures apart from the number of steps. The wildtype calculations show the same trends as the mutants, and thus not much can be inferred from the energy minimisation.
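The energy curves are written by g_energy as .xvg files, in which lines starting with `#` or `@` carry comments and plot metadata and all other lines hold whitespace-separated numeric columns. A minimal Python sketch for reading such a file and summarising one column could look as follows; the function names are illustrative, and which column corresponds to which energy term depends on the terms selected when g_energy was run.

```python
def parse_xvg(text):
    """Parse the numeric rows of an .xvg file given as a string.

    Lines starting with '#' (comments) or '@' (plot metadata) are skipped;
    every remaining non-empty line becomes a list of floats.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        rows.append([float(value) for value in stripped.split()])
    return rows


def column_mean(rows, column):
    """Average of one column over all parsed rows."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)


def final_value(rows, column):
    """Value of one column in the last row, i.e. after the last step."""
    return rows[-1][column]


# Usage sketch (file name is a placeholder):
# with open("em_energy.xvg") as handle:
#     rows = parse_xvg(handle.read())
# average_first_term = column_mean(rows, 1)  # column 0 is the step or time
```

Comparing `final_value` rather than `column_mean` between force fields would emphasise the minimised state over the path taken to reach it.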

<figtable id="tbl:gromacsenergies">

Table 7: Overview of all energies for angle, bond and potential. Each figure contains all 3 force fields employed. The darkest shades are amber03, the middle shades amber99sb-ildn and the light colours charmm27.

Wildtype
R178H
R178C
P182L
D207E
S293I
F434L
L451V
E482K
L484Q
E506D

</figtable>

To take a closer look at the mutant minimal energies, their averages are displayed as well as their differences to the wildtype, see <xr id="tab:avg"/> and <xr id="tab:delta"/>. The difference between wildtype and mutant energy is denoted as Δ. As the energy calculations of all force fields show very similar behaviour, the numbers are only displayed for the amber03 force field.
The average energies show that there is very little variance within the results. The wildtype does not always receive the lower energy; where it does not, the mutant would be the more favourable structure.
The difference between wildtype and mutant would be expected to be large for harmful mutations, as it would reflect the conformational change in the mutated region. A positive difference indicates lower stability of the mutant, and a negative difference means that the mutant is more stable. The changes between wildtype and mutant energies are comparatively small: the biggest change is 565 for potential, 23 for bond and 113 for angle.


<figtable id="tab:avg">

Table 8: Average energies of wildtype and mutants.

Wildtype R178H R178C P182L D207E S293I F434L L451V E482K L484Q E506D
Avg. potential -39121 -39737 -38545 -39528 -38897 -38124 -39028 -38515 -38511 -39179 -39388
Avg. bond 856 902 837 798 942 1199 1057 1075 899 884 788
Avg. angle 2646 2784 2648 2671 2662 2663 2697 2646 2616 2621 2682

</figtable>

<figtable id="tab:delta">

Table 9: Energy differences between wildtype and mutants.

R178H R178C P182L D207E S293I F434L L451V E482K L484Q E506D
Δ potential 428 565 -321 -59 226 -130 238 429 -148 -52
Δ bond 21 4 23 3 -15 8 -16 -13 3 21
Δ angle 113 10 23 4 -16 20 -25 -24 -29 29

</figtable>
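The largest changes quoted in the text can be read off Table 9 programmatically. The Python sketch below uses only the Δ values from the table and the sign convention stated above (positive Δ means the mutant is less stable); the function and variable names are illustrative.

```python
MUTATIONS = ["R178H", "R178C", "P182L", "D207E", "S293I",
             "F434L", "L451V", "E482K", "L484Q", "E506D"]

# Delta values copied from Table 9, in the order of MUTATIONS.
DELTAS = {
    "potential": [428, 565, -321, -59, 226, -130, 238, 429, -148, -52],
    "bond": [21, 4, 23, 3, -15, 8, -16, -13, 3, 21],
    "angle": [113, 10, 23, 4, -16, 20, -25, -24, -29, 29],
}


def largest_change(term):
    """Return (mutation, delta) with the largest absolute delta for a term."""
    pairs = zip(MUTATIONS, DELTAS[term])
    return max(pairs, key=lambda pair: abs(pair[1]))


# Mutations with a positive potential difference, i.e. less stable mutants
# according to the sign convention used in the text.
less_stable = [mutation
               for mutation, delta in zip(MUTATIONS, DELTAS["potential"])
               if delta > 0]

summary = {term: largest_change(term) for term in DELTAS}
```

This reproduces the maxima named in the text: 565 (R178C) for the potential, 23 (P182L) for bond and 113 (R178H) for angle. Half of the mutants (R178H, R178C, S293I, L451V and E482K) have a positive potential difference.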

References

<references/>