Structure-based mutation analysis TSD

From Bioinformatikpedia

Revision as of 13:08, 26 June 2012

No citations, I don't read THAT many books :P - Have some music instead ;)

The journal of this task can be found here.

Structure preparation

<figtable id="tbl:struct_comp">

Table 1: Comparison of experimental parameters for the two resolved structures of UniProt entry P06865. The parameters are important for choosing the structure used for the structure-based mutation analysis. Coverage gives the sequence residues covered by the structure according to UniProtKB; this, however, is based on the SEQRES data. It should be noted that there is an unresolved region (residues 75 to 88) in both structures.
PDB-ID:Chain Coverage Resolution (Å) R-value R-free pH
2gjx:A 23-529 2.8 0.270 0.288 5.5
2gk1:A 23-529 3.25 0.277 0.322 5.5


There are two resolved structures for the HEXA_HUMAN reference sequence used in the course of this practical. The structure of this UniProt entry is found in the alpha chains of the PDB entries 2gjx and 2gk1, both from the same publication <ref name="hexa_pdb_ref">Lemieux, M., Mark, B., & Cherney, M. (2006). Crystallographic Structure of Human beta-Hexosaminidase A: Interpretation of Tay-Sachs Mutations and Loss of GM2 Ganglioside Hydrolysis. Journal of molecular biology, 359(4), 913-29. doi:10.1016/j.jmb.2006.04.004</ref>. Unfortunately, a 14-residue stretch towards the N-terminus of the protein (residues 75 to 88) is unresolved in both structures. However, as previously shown, the alpha subunit of Hex A consists of two domains. The N-terminal domain, Glyco_hydro_20b, is not involved in catalysis. Therefore, to avoid problems with the missing backbone in the course of this task, the structure was truncated to contain only the C-terminal, catalytic domain, Glyco_hydro_20. In accordance with previous tasks and based on the experimental data for both structures shown in <xr id="tbl:struct_comp"/> (2gjx has the better resolution, 2.8 Å vs. 3.25 Å, and the lower R-free, 0.288 vs. 0.322), 2gjx was chosen as the reference structure. Details on the alteration of the structure according to the measures described above can be found in the journal.
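Truncating the structure to one domain amounts to filtering the coordinate records by chain and residue number. A minimal sketch, assuming a plain PDB file with the standard fixed-column ATOM records; `truncate_pdb` is a hypothetical helper, and since the domain boundary is not stated here, `start` has to be taken from the journal rather than from this example:

```python
def truncate_pdb(lines, chain, start, end, keep_hetatm=False):
    """Keep coordinate records of `chain` with residue numbers in [start, end].

    Hypothetical helper relying on the fixed PDB column layout:
    chain ID in column 22, residue number in columns 23-26.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record != "ATOM" and not (keep_hetatm and record == "HETATM"):
            continue  # drop headers, TER/END and, by default, ligands and waters
        if len(line) < 26 or line[21] != chain:
            continue  # too short to be a coordinate record, or another chain
        if start <= int(line[22:26]) <= end:
            kept.append(line)
    return kept
```

For example, `truncate_pdb(open("2gjx.pdb").readlines(), "A", start, 529)` would keep chain A from the (journal-defined) `start` up to residue 529, the end of the coverage in Table 1. Note that ligands such as the pseudo-ligand are dropped unless `keep_hetatm=True`.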

TODO: mention that this unfortunately does not include the (pseudo-)ligand and we therefore have to make assumptions on the hydrogen bonds...


As only disease-causing SNPs had been selected the week before, some mutations had to be replaced. Additionally, the reference PDB structure limited the possibilities, as only the second domain was retained for the following analysis. Thus an almost entirely new set of SNPs was chosen: R178H, R178C, P182L, D207E, S293I, F434L, L451V, E482K, L484Q, E506D. Their positions in the 3D structure of HEXA are shown in <xr id="fig:snpsOnstr"/>.
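To highlight the chosen SNPs in PyMOL, the mutation codes can be turned into a residue selection. A small sketch, assuming the truncated structure keeps the original chain A numbering; `parse_mutation` and `pymol_selection` are illustrative names, not part of the original workflow:

```python
import re

# Point mutation in single-letter notation: wildtype residue, position, mutant residue.
MUTATION_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")

SNPS = ["R178H", "R178C", "P182L", "D207E", "S293I",
        "F434L", "L451V", "E482K", "L484Q", "E506D"]


def parse_mutation(code):
    """'R178H' -> ('R', 178, 'H'); raises ValueError for malformed codes."""
    match = MUTATION_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a point mutation: {code!r}")
    wildtype, position, mutant = match.groups()
    return wildtype, int(position), mutant


def pymol_selection(mutations, chain="A"):
    """PyMOL selection string covering every mutated position once."""
    positions = sorted({parse_mutation(code)[1] for code in mutations})
    return f"chain {chain} and resi " + "+".join(map(str, positions))
```

Inside PyMOL, the result could be used as `cmd.select("snps", pymol_selection(SNPS))` followed by `cmd.color("red", "snps")`. The strict pattern also rejects typos such as a digit `1` in place of the letter `I`.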

<figure id="fig:snpsOnstr">

SNPs highlighted on the HexA subunit structure. The mutations are displayed in red and the cut out domain is displayed in grey. The active site is displayed in orange and the important residues in yellow.


The chosen mutations are displayed in the context of the remaining domain, the active site and the important residues. For a detailed description of the important residues, see the introduction. Five mutations occur in loop regions and five in helix elements. The mutations overlap with the important residues at two positions: at position 207, aspartate is replaced by glutamate (D207E), and for position 178, two mutants (R178H and R178C) were chosen for the SNP set. Mutations of the important residues are expected to have a severe effect on protein function.
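The overlap between the SNP set and the important residues is a simple set intersection on sequence positions. A sketch in which `IMPORTANT_POSITIONS` is a hypothetical, partial list (R178 and D207 as named above, plus H262 from the D207 discussion further below); the full list from the introduction would have to be substituted:

```python
# HYPOTHETICAL partial list of important positions; replace with the full
# list from the introduction before drawing any conclusions.
IMPORTANT_POSITIONS = {178, 207, 262}

SNPS = ["R178H", "R178C", "P182L", "D207E", "S293I",
        "F434L", "L451V", "E482K", "L484Q", "E506D"]


def position(code):
    """'R178H' -> 178 (drop the wildtype and mutant letters)."""
    return int(code[1:-1])


def hits_on_important(mutations, important):
    """Group the mutations by the important position they affect."""
    hits = {}
    for code in mutations:
        pos = position(code)
        if pos in important:
            hits.setdefault(pos, []).append(code)
    return hits
```

With this list, `hits_on_important(SNPS, IMPORTANT_POSITIONS)` groups R178H and R178C under 178 and D207E under 207, matching the two overlaps described above.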

For the new mutation set, the biochemical properties are listed in <xr id="tab:biochem"/>. They have been assembled in the same way as in the sequence-based mutation analysis.
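The property changes behind Table 2 follow directly from the per-residue values. A minimal sketch that copies the values from Table 2 (the hydrophobicities match the Kyte-Doolittle scale) and the charges listed there, with histidine counted as neutral as in the table; `PROPS` and `property_changes` are illustrative names:

```python
# Per-residue values from Table 2: (hydrophobicity, volume in Å^3, charge).
PROPS = {
    "R": (-4.5, 173.4, +1), "H": (-3.2, 153.2, 0), "C": (2.5, 108.5, 0),
    "P": (-1.6, 112.7, 0), "L": (3.8, 166.7, 0), "D": (-3.5, 111.1, -1),
    "E": (-3.5, 138.4, -1), "S": (-0.8, 89.0, 0), "I": (4.5, 166.7, 0),
    "F": (2.8, 189.9, 0), "V": (4.2, 140.0, 0), "K": (-3.9, 168.6, +1),
    "Q": (-3.5, 143.8, 0),
}


def property_changes(code):
    """Mutant minus wildtype for a code like 'R178H'."""
    h_wt, v_wt, q_wt = PROPS[code[0]]
    h_mut, v_mut, q_mut = PROPS[code[-1]]
    return {
        "d_hydrophobicity": round(h_mut - h_wt, 1),
        "d_volume": round(v_mut - v_wt, 1),  # Å^3
        "d_charge": q_mut - q_wt,
    }
```

For instance, E482K flips the charge by +2, while L484Q keeps it unchanged but drops the hydrophobicity by 7.3 units.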

<figtable id="tab:biochem">

Table 2: Biochemical properties of the wildtype and mutant residues (hydrophobicity on the Kyte-Doolittle scale, volume in Å³) and Grantham score of each substitution
Mutation | Wildtype hydrophobicity | Wildtype volume (Å³) | Wildtype charge | Mutant hydrophobicity | Mutant volume (Å³) | Mutant charge | Grantham score
R178H | -4.5 (polar) | 173.4 (bulky) | positive | -3.2 (polar) | 153.2 (bulky) | neutral | 29
R178C | -4.5 (polar) | 173.4 (bulky) | positive | 2.5 (polar) | 108.5 (small) | neutral | 180
P182L | -1.6 (nonpolar) | 112.7 (small) | neutral | 3.8 (nonpolar) | 166.7 (bulky) | neutral | 98
D207E | -3.5 (polar) | 111.1 (small) | negative | -3.5 (polar) | 138.4 (bulky) | negative | 45
S293I | -0.8 (polar) | 89.0 (tiny) | neutral | 4.5 (nonpolar) | 166.7 (bulky) | neutral | 142
F434L | 2.8 (nonpolar) | 189.9 (bulky) | neutral | 3.8 (nonpolar) | 166.7 (bulky) | neutral | 22
L451V | 3.8 (nonpolar) | 166.7 (bulky) | neutral | 4.2 (nonpolar) | 140.0 (small) | neutral | 32
E482K | -3.5 (polar) | 138.4 (bulky) | negative | -3.9 (polar) | 168.6 (bulky) | positive | 56
L484Q | 3.8 (nonpolar) | 166.7 (bulky) | neutral | -3.5 (polar) | 143.8 (bulky) | neutral | 113
E506D | -3.5 (polar) | 138.4 (bulky) | negative | -3.5 (polar) | 111.1 (small) | negative | 45


Map all the 10 mutations onto the crystal structure. Color the mutants differently than the rest of the protein and create a snapshot for the wiki. If applicable find out whether the mutations are close to the active site, a binding interface or other important functional sites. Visualize this and describe it properly.

Molecular Mechanics

<figtable id="tbl:pymolstruc">

Overview of all SNPs: comparison of wildtype and mutant structure. The wildtype is colored light green (or yellow if it is an important residue), the mutant red, and polar contacts are displayed in blue.

</figtable> Since these are only manually chosen side chain conformations, selected from the ones offered by PyMOL, most of the analysis can be found in the sections below, based on the predictions by SCWRL and FoldX. Here are a few general things:


In the following, compare wild type (WT) and mutant structures.
Now that you should have a clear idea of the WT and mutant proteins, we will try to calculate some energies. Always calculate the energy for the wild type and mutants, then subtract/compare.

SCWRL and FoldX

<figtable id="tbl:pymolstruc">

Table TODO: Overview of all SNPs in the 3D structure. Compared are the wildtype as well as the SCWRL and FoldX models of the mutation. The wildtype is colored light green (or yellow if it is an important residue), the SCWRL model purple and the FoldX model cyan. Polar contacts are displayed in blue.



R178 is one of the important residues. As can be seen in the corresponding figure in <xr id="tbl:pymolstruc"/>, it could form hydrogen bonds with the neighbouring residues D207 and E462. In addition, it seems reasonable that these two negatively charged neighbours coordinate the positive charge placed between them. More importantly, R178 has already been shown to form a hydrogen bond with the ligand.

The mutant amino acid histidine can, depending on the pH, also carry a positive charge. The reported pH optimum of Hex A is around 4.4; under these conditions, histidine (side-chain pKa of about 6) is almost certainly positively charged <ref name="phforhexa">Grebner,E.E. et al. (1986) Two abnormalities of hexosaminidase A in clinically normal individuals. American journal of human genetics, 38, 505-14.</ref><ref name="histcharge">when is his pos charged, alberts TODO</ref>. However, the larger problem is that, under these conditions, formation of a hydrogen bond with the ligand is very unlikely. This is reinforced by the recessed position of the residue relative to the active site. Even for SCWRL (purple), where the hydrogen is included (and the ring is flipped compared to FoldX), the distance could already be too long for coordination of the ligand. Both the SCWRL and the FoldX structures clash slightly with D175; however, these clashes are very minor, fixable and clearly not the main problem here.
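The statement about the charge of histidine follows from the Henderson-Hasselbalch equation. A short sketch, assuming a side-chain pKa of about 6.0 for free histidine; the actual pKa of H178 inside Hex A is not known here and may be shifted by its environment:

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a basic group carrying its proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


HIS_PKA = 6.0  # approximate side-chain pKa of free histidine (assumption for Hex A)

for ph in (4.4, 5.5):  # reported Hex A pH optimum; crystallization pH from Table 1
    print(f"pH {ph}: {fraction_protonated(ph, HIS_PKA):.0%} protonated")
```

This prints about 98% protonated at pH 4.4 and about 76% at the crystallization pH of 5.5, so the positive charge is essentially retained at the pH optimum.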

For the mutation to cysteine, the problem is much more pronounced: due to the increased distance, there is no doubt that the hydrogen bond to the ligand can no longer form. There are no clashes, which would in any case be hard to create with the short cysteine side chain.
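Whether a hydrogen bond to the ligand is still possible is judged above mainly by distance. A sketch of such a distance-only check, assuming a donor-acceptor cutoff of 3.5 Å as a common rule of thumb; the atom labels and coordinates in the test are made up, and real ones would be read from the SCWRL or FoldX models:

```python
import math

HBOND_CUTOFF = 3.5  # Å, donor-acceptor distance; rule-of-thumb value (assumption)


def hbond_partners(donor_xyz, acceptors, cutoff=HBOND_CUTOFF):
    """Labels of acceptor atoms within `cutoff` of the donor atom.

    Distance-only criterion: it ignores angles and hydrogen positions, so it
    can rule hydrogen bonds out but cannot prove that they form.
    """
    return [label for label, xyz in acceptors if math.dist(donor_xyz, xyz) <= cutoff]
```

Because angles are ignored, an empty result is the informative case here: if no ligand acceptor lies within the cutoff of the mutant side chain, no hydrogen bond can form.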

In conclusion the modelling of both mutation side chains is mostly as good as possible and shows only minor differences between the two methods. The orientation of the histidine ring is however more convincing in the SCWRL result.


<figure id="fig:p182surface">

Surface conditions in the reference structure near wildtype residue P182. The Hex A alpha subunit (2gjx:A) is shown in green, the beta subunit (2gjx:B) in blue. P182 is highlighted in orange. As can be seen, there is no interaction between the two subunits at the position of interest, and the mutation to leucine should not have an effect on dimer formation.

</figure> P182 is on the surface of the Hex A alpha subunit, and proline acts as a helix breaker. Breaking the helix, however, is mandatory in this case, since the ensuing loop continues straight towards the active site, orthogonal to the direction of the alpha helix. If a mutation of proline led to an elongation of the alpha helix, this would very likely affect the protein's function, since R178, found in the following loop, could no longer be placed accurately. However, the mutation to leucine does not necessarily lead to this and is therefore not necessarily disease-causing.

Another consideration concerns the changed protein surface, which is slightly larger with leucine at position 182. This is the side of the alpha subunit that interacts with the beta subunit of Hex A and is therefore important to consider. However, as <xr id="fig:p182surface"/> shows, there is a cavity at this position, and the change to leucine should not affect the interaction of the subunits.

There are differences between the side chain placements by FoldX and SCWRL; however, given the surrounding conditions and the short side chains, neither can be considered better than the other. The only notable difference is that SCWRL seems to handle the small clashes better.


<figure id="fig:d207ishard">

Local packing in the reference structure near wildtype residue D207. Shown as spheres are, from left to right, H262, D207 and R178. The mutation D207E elongates the side chain to a degree where it cannot be placed anywhere without creating clashes or losing H262 as a proton donor.

</figure> D207 is another residue considered important, since it forms a hydrogen bond with the "substitute ligand" NGT (cf. important residues); however, this is the same atom that is also reached by R178. The original publication of the crystal structure suggests that R178 forms a hydrogen bond with the ligand, while D207 interacts with the neighbouring residue H262, which acts as proton donor.

The mutant amino acid glutamate is chemically very similar to the wildtype aspartate, which means that the interaction with the histidine is in theory still possible. However, the side chain is longer, which in this particular case is very problematic. The side chain placement by FoldX is basically an elongation of the wildtype. While this keeps the charged group close to H262, there is clearly a clash with R178 that is not easily resolved. SCWRL places the side chain further away, which better avoids the clashes; however, resolving them fully would require a further tilt of the charged group, which would bring it into too close proximity to another neighbouring histidine. Additionally, this placement is too far away from H262 to allow any interaction. As can be seen in <xr id="fig:d207ishard"/>, the space usage in the wildtype is near-perfect, and there seems to be no functional placement of the mutant without larger changes in the backbone that might strongly affect the surrounding residues. Given that this is in the active site, the effects would likely be disease-causing.
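The clashes discussed here can be made explicit with a simple overlap test on heavy atoms. A sketch using Bondi van der Waals radii and treating an overlap of more than 0.4 Å as a clash (the threshold is an assumption, not the criterion used by PyMOL, SCWRL or FoldX); labels and coordinates in the test are hypothetical:

```python
import math

# Bondi van der Waals radii (Å), heavy atoms only
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
ALLOWED_OVERLAP = 0.4  # Å of overlap tolerated before a pair counts as a clash (assumption)


def find_clashes(side_chain, environment, allowed_overlap=ALLOWED_OVERLAP):
    """Pairs of (mutant side-chain atom, environment atom) that overlap too much.

    Atoms are (label, element, (x, y, z)); `environment` should exclude the
    mutant residue itself and atoms covalently bonded to it.
    """
    clashes = []
    for label_a, elem_a, xyz_a in side_chain:
        for label_b, elem_b, xyz_b in environment:
            d = math.dist(xyz_a, xyz_b)
            if d < VDW_RADII[elem_a] + VDW_RADII[elem_b] - allowed_overlap:
                clashes.append((label_a, label_b, round(d, 2)))
    return clashes
```

With these radii an O...N pair is only flagged below 2.67 Å, so typical hydrogen-bonding distances of about 2.8 Å and more are not reported as clashes.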


S293 is found in a loop far away from the active site. It lies on the surface, but does not interact with the beta subunit of Hex A. The -OH group of serine can form hydrogen bonds with a neighbouring glutamate as well as an asparagine, neither of which is possible with the mutant amino acid. However, none of these seem to be directly essential. The larger side chain of isoleucine leads to a clash with the glutamate; however, tilting the Glu side chain slightly upwards should solve this problem. This would slightly change the protein's surface but should not affect the binding to the beta subunit either. The placements by SCWRL and FoldX are nearly identical, and based on the structural observations this mutation would have to be considered not disease-causing.


F434 is found relatively close to the active site, but should not be able to exert an effect on it. In addition, there are no other aromatic rings that could play a role in pi-stacking. The residue is found on the surface of the protein but does not interact with the beta subunit; therefore, the small hole created by the mutation to the smaller leucine should not affect the binding. The side chain conformations by SCWRL and FoldX are almost identical, no clashes are created, and overall there is no indication that the mutation would lead to a non-functional protein; it should therefore be regarded as not disease-causing.


L451 is found at the surface of the protein, opposite the beta-subunit binding site and the catalytic site. The side chain orientations of the two methods are similar, differing slightly in the tilt of the two -CH3 groups. In both cases there are no clashes with other atoms, and the surface at this position becomes slightly recessed compared to the wildtype leucine. However, there is no indication that this change has an effect, and the mutation is likely not disease-causing.


E482 is not near the active site but at the interface with the non-catalytic domain that had to be cut out at the beginning of the task. Since neither of the two methods has this information at its disposal, both choose conformations that, although different, clash severely with side chains of the cut-out domain. However, the local environment suggests that even with information about the atoms of the second domain, there is no way to arrange the very long lysine side chain in the pocket, which shows close packing and several hydrogen bonds between side chains of the two domains. One of these is formed by E482 and would therefore also be missing in the mutant. Taking all these problems together, although the residue is far away from the catalytic site, it seems possible that this mutation disturbs folding, and especially the association between the two domains, so strongly that it is disease-causing.


L484 is buried inside the protein, not close to the cut-out domain, the active site or the binding site of the beta subunit. Introducing a polar amino acid into this environment could pose a problem, especially during folding. In addition, the longer side chain leads to clashes with neighbouring residues in the conformation chosen by SCWRL. The one given by FoldX is more convincing and should actually fit into the pocket. However, this still means introducing a polar amide group inside the protein where, in this case, hydrophobic residues like Trp, Phe or Gly dominate the environment. Therefore, although not immediately apparent, the mutation should be considered potentially disease-causing.


E506 is found at the surface of the protein, facing the cut-out domain. However, at this position there is no direct interaction between the domains. This mutation shows the largest disagreement in side chain placement between FoldX and SCWRL: both place the C-beta atom almost exactly at the position of the wildtype glutamate's C-beta atom. FoldX also places the C-gamma atom at the position of the wildtype's C-gamma, while SCWRL places it in almost the opposite direction. E506 could form hydrogen bonds with two arginines in the vicinity, and the placement for E506D chosen by FoldX would allow at least one of those as well, while the distance to the other arginine is likely too large because aspartate lacks the C-delta atom. However, these two arginines can also form hydrogen bonds with residues in the cut-out domain, and their orientation suggests that this is the arrangement found in reality. With the conformation chosen by SCWRL, on the other hand, no interactions are possible, but no clashes occur either. The slight increase in surface at this position is not a problem, since the cut-out domain is far away here. Nonetheless, since the FoldX conformation is closer to the original residue and would in theory allow hydrogen bonding, FoldX is more convincing in this case. A disease-causing effect is unlikely for this mutation, since there are apparently no important interactions, the other domain is too far away, and the charge, should it play a deeper role, remains unchanged.
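The disagreement between FoldX and SCWRL can be quantified as an RMSD over the side-chain atoms present in both models. A sketch assuming that both tools leave the backbone coordinates unchanged (SCWRL repacks side chains on a fixed backbone), so that no superposition is needed; the coordinates in the test are made up:

```python
import math


def sidechain_rmsd(model_a, model_b):
    """RMSD over atoms present in both models, without superposition.

    model_a, model_b: dicts mapping atom name (e.g. 'CB', 'CG') -> (x, y, z),
    taken from the same residue in two models that share the backbone frame.
    """
    common = sorted(set(model_a) & set(model_b))
    if not common:
        raise ValueError("no atoms in common")
    squared = sum(math.dist(model_a[name], model_b[name]) ** 2 for name in common)
    return math.sqrt(squared / len(common))
```

For E506D, the identical C-beta and the opposite C-gamma placements would show up as a large RMSD driven by a single atom, which is why looking at per-atom distances alongside the RMSD is useful.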


<figtable id="tab:foldxscwrlpred">

Table TODO: Predicted effect of the mutations, based only on the observations made while analysing the side chain placements by FoldX and SCWRL. Uncertain predictions are given in brackets.
Mutation Effect?
R178H yes
R178C yes
P182L (no)
D207E yes
S293I no
F434L no
L451V no
E482K yes
L484Q (yes)
E506D no


It is hard to decide between FoldX and SCWRL based on the previous analyses. In many cases the methods largely agree; in several others they do not, but the differences do not seem to matter. Where an effect would be expected, even manual inspection often does not suggest any solution at all. In the end, very few informative cases remain, some of which favour FoldX and some SCWRL. In the following, SCWRL will be used because of its slightly better performance on the residues that are important for catalysis; however, due to the limited amount of data, no general statement that SCWRL is better than FoldX can be drawn here.

Based on the observations made so far, the predictions for the effect of the mutations would be as shown in <xr id="tab:foldxscwrlpred"/>.

For each of the mutations, a new structure will also be created. Note down all of the energies, but also use these structures in the next steps.


 What happens regarding the energy? 


The Gromacs analysis was conducted with the SCWRL models, as these were judged slightly better than the FoldX models. Gromacs consists of a variety of programs that are run consecutively (pdb2gmx, grompp, mdrun, g_energy). The MDP file needed to prepare the Gromacs run is explained in the protocol.
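The four programs are chained through their output files. A sketch that only assembles the GROMACS 4.x command lines for one model; the file names (including `minim.mdp`) are hypothetical, pdb2gmx asks interactively for force field and water model unless these are passed as options, and g_energy asks which energy terms to extract:

```python
import subprocess


def gromacs_pipeline(name, mdp="minim.mdp"):
    """GROMACS 4.x command lines for minimizing `name`.pdb (file names hypothetical)."""
    return [
        # topology and GROMACS coordinates; prompts for force field and water model
        ["pdb2gmx", "-f", f"{name}.pdb", "-o", f"{name}.gro", "-p", f"{name}.top"],
        # combine parameters (.mdp), coordinates and topology into a run input file
        ["grompp", "-f", mdp, "-c", f"{name}.gro", "-p", f"{name}.top", "-o", f"{name}.tpr"],
        # run the minimization; -deffnm names all files <name>.*, including <name>.edr
        ["mdrun", "-deffnm", name],
        # extract energy terms; prompts for the terms (e.g. Bond, Angle, Potential)
        ["g_energy", "-f", f"{name}.edr", "-o", f"{name}_energy.xvg"],
    ]


def run_pipeline(name, mdp="minim.mdp"):
    """Run the steps in order; stops at the first command that fails."""
    for cmd in gromacs_pipeline(name, mdp):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from execution makes it easy to loop over the wildtype and all SCWRL mutants with identical settings.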

<figure id="fig:runtime">

Runtime of Gromacs for different maximal numbers of steps.


Gromacs was run on the SCWRL mutants and on the wildtype, in this case the second domain of 2gjx chain A. The wildtype analysis was used to get a feeling for the run time as a function of the number of steps used for the calculation, see <xr id="fig:runtime"/>. The number of steps actually performed converges at around 300, which agrees with the run time; the run time never exceeds 30 seconds. From this it can be concluded that the number of steps used has an upper bound beyond which no further calculation is performed.
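A run-time scan like this needs one MDP file per maximal step count. A sketch with placeholder values for everything except `nsteps` (the parameters actually used are described in the protocol); `minimization_mdp` and `write_step_scan` are hypothetical helpers. One plausible explanation for the plateau is that a steepest-descent minimization in GROMACS stops as soon as the maximum force drops below `emtol`, regardless of `nsteps`:

```python
def minimization_mdp(nsteps, emtol=1000.0, emstep=0.01):
    """Steepest-descent settings; emtol and emstep are placeholder values."""
    lines = [
        "integrator = steep",      # steepest-descent energy minimization
        f"emtol      = {emtol}",   # stop when the maximum force (kJ/mol/nm) is below this
        f"emstep     = {emstep}",  # initial step size (nm)
        f"nsteps     = {nsteps}",  # upper limit on minimization steps
    ]
    return "\n".join(lines) + "\n"


def write_step_scan(step_counts, prefix="minim"):
    """Write one .mdp file per maximal step count; returns the file names."""
    paths = []
    for n in step_counts:
        path = f"{prefix}_{n}.mdp"
        with open(path, "w") as handle:
            handle.write(minimization_mdp(n))
        paths.append(path)
    return paths
```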

Analyze the minimization of the system with the following command: g_energy -f FILE.edr -o energy_1.xvg. Do the analysis for Bond, Angle and Potential. The xvg graphs can be viewed with xmgrace; in the print settings you can choose EPS output, then print and convert to PDF.
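Besides viewing the graphs in xmgrace, the xvg files can be read directly, since data lines are plain numbers while `#` lines are comments and `@` lines are xmgrace directives. A sketch that parses such a file and reports the first x value at which the energy changes by less than a tolerance from the previous point; the tolerance and the sample data in the test are assumptions, not results from this analysis:

```python
def read_xvg(text):
    """Parse xvg data: skip '#' comment and '@' xmgrace lines, return rows of floats."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def first_converged_x(rows, column=1, tol=1.0):
    """First x value where `column` differs from the previous point by less than tol."""
    for prev, cur in zip(rows, rows[1:]):
        if abs(cur[column] - prev[column]) < tol:
            return cur[0]
    return None  # no converged point within the data
```

For example, `first_converged_x(read_xvg(open("energy_1.xvg").read()))` would apply this to the first energy term written by g_energy.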