# Wildtype

## A brief check of results

#### How many frames are in the trajectory file and what is the time resolution?

• frames: 2001
• time resolution: 5ps

#### How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

• real time: 9h27:35
• simulation speed: 25.370 ns/day
• time to simulate one second: 107991 years
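The numbers above are linked by simple arithmetic. 2001 frames at 5 ps cover (2001 - 1) × 5 ps = 10000 ps = 10 ns, and one simulated second is 10⁹ ns. The Python sketch below reproduces the figures under two assumptions: the wall-clock strings `9h27:35` and `1d03h11:10` are read as (days)d, hours h, minutes:seconds, and a year has 365 days. All function names are hypothetical helpers, not part of GROMACS.

```python
import re

# Hypothetical helpers for illustration; not part of GROMACS.

def simulated_length_ps(n_frames, dt_ps):
    """Time covered by a trajectory whose first frame is written at t = 0."""
    return (n_frames - 1) * dt_ps

def parse_walltime(text):
    """Convert '9h27:35' or '1d03h11:10' to seconds.

    Assumed format: optional days ('d'), hours ('h'), then minutes:seconds.
    """
    m = re.fullmatch(r"(?:(\d+)d)?(\d+)h(\d+):(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognised wall time: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def ns_per_day(simulated_ns, wall_seconds):
    return simulated_ns * 86400.0 / wall_seconds

def years_per_simulated_second(speed_ns_per_day):
    # 1 s = 1e9 ns; a year is taken as 365 days
    return 1e9 / speed_ns_per_day / 365.0

length_ps = simulated_length_ps(2001, 5.0)
speed = ns_per_day(length_ps / 1000.0, parse_walltime("9h27:35"))
print(f"simulated length: {length_ps:.0f} ps")
print(f"speed: {speed:.2f} ns/day")
print(f"one second would take {years_per_simulated_second(25.370):.0f} years")
```

Under these assumptions the wildtype wall time gives about 25.37 ns/day, in line with the reported 25.370 ns/day, and `parse_walltime("1d03h11:10")` with the same 10 ns gives about 8.83 ns/day for the mutant.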

#### Which contribution to the potential energy accounts for most of the calculations?

• potential energy (average): -1.24431e+06 kJ/mol

## Visualization of results

Figure1a: MD simulation of the movement of BCKDHA
Figure1b: Visualisation of the simulated protein with ngmx

Figure 1a shows the motion of the protein, especially of the side chains. The blue-colored part on the right side of the protein moves the most. Figure 1b shows another visualisation of the protein, produced with ngmx.

## Quality assurance

Quality assurance is a step to find out whether the system reached an equilibrium. To this end, tests are performed that examine the convergence of thermodynamic parameters (temperature, pressure, potential and kinetic energy). The following section shows the results of the quality assurance analysis for the wildtype protein.

### Energy calculations

#### Pressure

Figure2: Plot of the pressure during the MD simulation

| Energy | Average (bar) | Err.Est. (bar) | RMSD (bar) | Tot-Drift (bar) |
|---|---|---|---|---|
| Pressure | 1.01601 | 0.015 | 71.2152 | -0.0706383 |

Figure 2 shows the pressure of the molecular dynamics system. The average value is 1.016 bar, as shown in the table above, but the pressure ranges from about -250 bar to 250 bar. Because of this large range of 500 bar, we would say that the pressure of this simulation does not converge to a specific value. The average itself, however, is well defined: its error estimate is only 0.015 bar and the total drift over the simulation is only -0.07 bar.
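The Average, RMSD and Tot-Drift columns in the table are printed by gmx energy. To make their meaning concrete, here is a minimal NumPy sketch that recomputes them from a time series: RMSD as the root-mean-square fluctuation around the average, and Tot-Drift as the slope of a least-squares straight line multiplied by the length of the run. We assume this matches how gmx energy defines the two columns; its printed numbers may differ in detail. The loader assumes the usual .xvg layout, in which header lines start with `#` or `@`; the file name in the usage comment is hypothetical.

```python
import numpy as np

def load_xvg(source):
    """Load numeric columns from .xvg data (a path or a list of lines).

    Header and plot-formatting lines starting with '#' or '@' are skipped.
    """
    return np.loadtxt(source, comments=("#", "@"), ndmin=2)

def energy_summary(t, y):
    """Average, RMS fluctuation and total drift of one energy term."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    average = y.mean()
    rmsd = np.sqrt(np.mean((y - average) ** 2))  # fluctuation around the average
    slope, _intercept = np.polyfit(t, y, 1)      # least-squares straight line
    drift = slope * (t[-1] - t[0])               # drift over the whole run
    return average, rmsd, drift

# Hypothetical usage with a file written by gmx energy:
# data = load_xvg("pressure.xvg")
# print(energy_summary(data[:, 0], data[:, 1]))
```

The contrast between a large RMSD (71 bar) and a tiny drift (-0.07 bar) is exactly what separates "large instantaneous fluctuations" from "a systematic change over time".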

#### Temperature

Figure3: Plot of the temperature during the MD simulation

| Energy | Average (K) | Err.Est. (K) | RMSD (K) | Tot-Drift (K) |
|---|---|---|---|---|
| Temperature | 297.941 | 0.0047 | 0.954498 | 0.00557078 |

Figure 3 shows the temperature during the MD simulation. It ranges between 294 K and 302 K, so it deviates only slightly from the average value of 297.9 K. Since the fluctuation is so small, the temperature in the system is quite stable, which means that it reached an equilibrium.
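The Err.Est. column (0.0047 K) is much smaller than the RMSD (0.95 K) because it measures the uncertainty of the average, not the size of the fluctuations. GROMACS describes its estimate as being based on block averages; the NumPy sketch below shows the basic block-averaging idea with five equal, contiguous blocks (our assumption). It illustrates the concept and is not guaranteed to reproduce the gmx energy value exactly.

```python
import numpy as np

def block_average_error(y, n_blocks=5):
    """Standard error of the mean from n_blocks contiguous block averages.

    Illustrative sketch; not the exact gmx energy implementation.
    """
    y = np.asarray(y, dtype=float)
    usable = len(y) - len(y) % n_blocks      # drop the remainder so blocks are equal
    block_means = y[:usable].reshape(n_blocks, -1).mean(axis=1)
    return block_means.std(ddof=1) / np.sqrt(n_blocks)
```

Blocks only give an honest error if each block is much longer than the correlation time of the data; for a 10 ns run with 2001 frames, five blocks of 2 ns each are a reasonable starting point.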

#### Potential

Figure4: Plot of the potential during the MD simulation

| Energy | Average (kJ/mol) | Err.Est. (kJ/mol) | RMSD (kJ/mol) | Tot-Drift (kJ/mol) |
|---|---|---|---|---|
| Potential | -1.24431e+06 | 66 | 1041.57 | -463.992 |

Figure 4 shows the potential energy of the MD system. It ranges from -1.24e+06 kJ/mol to -1.25e+06 kJ/mol. Although this range of 10,000 kJ/mol seems very large, it amounts to less than 1% of the average value, and overall the potential energy is very low. This low potential indicates that the protein is quite stable. Since the structure of a protein determines its function, this stability is important for the function of the protein.

#### Total Energy

Figure5: Plot of the total energy during the MD simulation

| Energy | Average (kJ/mol) | Err.Est. (kJ/mol) | RMSD (kJ/mol) | Tot-Drift (kJ/mol) |
|---|---|---|---|---|
| Total Energy | -1.02119e+06 | 65 | 1279.76 | -459.819 |

The low potential energy already indicated that the total energy of the system would also be quite low. Figure 5 shows that the total energy is somewhat higher than the potential energy (the difference is the kinetic energy), but it is still very low. In addition, there is less variation in the total energy, which ranges between -1.017e+06 kJ/mol and -1.025e+06 kJ/mol. Again, such a low energy points to a stable protein, which indicates that the simulation ran correctly.
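Because the total energy is the sum of potential and kinetic energy, the gap between the two averages in the tables is the average kinetic energy. As a rough consistency check, the equipartition theorem E_kin = (N_df / 2) R T links this value to the average temperature. The sketch below uses the wildtype averages from the tables and the molar gas constant R in kJ mol⁻¹ K⁻¹; the resulting number of degrees of freedom N_df is only an estimate derived from averaged energies, and the helper names are hypothetical.

```python
# Molar gas constant in kJ/(mol K); gmx energy reports energies in kJ/mol
R = 0.0083144626

def kinetic_energy(total, potential):
    """Average kinetic energy as the difference of the averaged energies."""
    return total - potential

def degrees_of_freedom(e_kin, temperature):
    # Equipartition: E_kin = (N_df / 2) * R * T for one mole of systems
    return 2.0 * e_kin / (R * temperature)

e_kin = kinetic_energy(-1.02119e6, -1.24431e6)   # wildtype averages from the tables
print(f"E_kin = {e_kin:.0f} kJ/mol")
print(f"N_df is roughly {degrees_of_freedom(e_kin, 297.941):.2g}")
```

The kinetic energy comes out at 223120 kJ/mol, and the implied N_df of about 1.8 × 10⁵ is the order of magnitude expected for a solvated protein with tens of thousands of atoms.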

### Minimum distance between periodic boundary cells

Determining the minimum distance between periodic boundary cells is a crucial part of the quality assurance of an MD simulation. In this step one has to verify that there were no direct interactions between periodic images, because interactions between atoms of the same molecule across the periodic boundary would disturb the native behaviour of the protein and lead to invalid MD results. Therefore we have to check that the minimum distance is greater than 0.9 nm.

Figure6: Minimum distance between periodic boundary cells

The shortest periodic distance is 1.40945 nm at time 6090 ps, between atoms 25 and 6490. Since this is greater than 0.9 nm, there were no direct interactions between periodic images. As Figure 6 shows, the values range between 1 nm and 4 nm during the whole simulation. Interestingly, most of the variation occurs in the middle part of the simulation. Since the fluctuation of the values corresponds to the movement and flexibility of the protein, we can say that the protein is very flexible between 4000 ps and 7000 ps. During the rest of the simulation it is also flexible, but less so than in this period.
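The check described above reduces to finding the smallest value in the minimum-distance time series and comparing it with the 0.9 nm criterion. A minimal NumPy sketch, assuming the time (ps) and minimum-distance (nm) columns of the `gmx mindist -pi` output have already been loaded as arrays; the function name is a hypothetical helper.

```python
import numpy as np

def check_periodic_images(time_ps, mindist_nm, cutoff_nm=0.9):
    """Return (shortest distance, time of occurrence, criterion satisfied?).

    Hypothetical helper; the 0.9 nm default is the criterion used in this report.
    """
    time_ps = np.asarray(time_ps, dtype=float)
    mindist_nm = np.asarray(mindist_nm, dtype=float)
    i = int(np.argmin(mindist_nm))           # frame with the closest periodic contact
    shortest = float(mindist_nm[i])
    return shortest, float(time_ps[i]), shortest > cutoff_nm
```

Applied to the wildtype series, this should return the 1.40945 nm at 6090 ps quoted above together with `True`.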

### Root mean square fluctuations

Figure7: RMSF for protein
Figure8: RMSF for C-alpha

Figure 7 shows the RMSF for the whole protein. The beginning of the protein shows the most fluctuation, which indicates that this region is the most flexible one. In the middle part of the protein there are almost no peaks, which means that this part is very stable. At the end there is a peak that is slightly higher than the RMSF of the rest of the protein, which suggests that this terminal region is also somewhat flexible.
In Figure 8 we performed the RMSF calculation not for the whole protein but only for the C-alpha atoms. These are the central carbon atoms of each amino acid, so this calculation considers only the backbone of the protein. Comparing the two plots shows that in both cases the beginning of the protein clearly has the highest fluctuation. So not only the side chains deviate from the average structure but also the backbone, which indicates strong flexibility.
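The RMSF of an atom is the root-mean-square deviation of its position from its time-averaged position. A minimal NumPy sketch, assuming the frames have already been superimposed on a reference structure so that overall rotation and translation are removed; restricting the atom axis to the C-alpha indices gives the backbone curve of Figure 8.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF from an (n_frames, n_atoms, 3) array of fitted coordinates."""
    coords = np.asarray(coords, dtype=float)
    mean_structure = coords.mean(axis=0)                   # the "average structure"
    sq_dev = ((coords - mean_structure) ** 2).sum(axis=2)  # squared displacement per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))                    # average over frames, then root
```

The same `mean_structure` is what ends up in average.pdb, which is why the RMSF calculation and the average structure belong together.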

### Pymol analysis of average and bfactors

The average.pdb file was produced automatically during the calculation of the RMSF because it is needed for comparisons. It contains the average structure of the protein. Because of the option -oq, the file bfactor.pdb was produced as well. In this file the temperature factors (bfactors) calculated from the RMSF are written into the B-factor column of a reference structure, so that the structure can be colored by these values. Normally the most flexible parts of a protein also have the highest bfactors. To find out whether this is the case for our protein, we used PyMOL to analyse the average structure and the bfactor structure. Additionally, we compared the average structure with the original structure of our protein.
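The link between RMSF and B-factor is the standard isotropic relation below, where ⟨·⟩ denotes the time average. We assume this is the conversion `gmx rmsf -oq` applies; the tool may additionally convert units (nm² versus Å²).

```latex
B_i = \frac{8\pi^2}{3}\,\mathrm{RMSF}_i^{2},
\qquad
\mathrm{RMSF}_i = \sqrt{\left\langle \lVert \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle \rVert^{2} \right\rangle_t}
```

Because B grows with the square of the RMSF, the most flexible regions stand out even more strongly in the bfactor coloring than in the RMSF plot.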

#### Protein

Figure9: Alignment of the experimental structure with the average structure
Figure10: Alignment of the experimental structure with the structure containing the bfactors
Figure11: Alignment of the average structure with the structure containing the bfactors

| Alignment | Figure | RMSD |
|---|---|---|
| 1u5b / average | Figure 9 | 1.169 |
| 1u5b / bfactors | Figure 10 | 0.377 |
| bfactors / average | Figure 11 | 1.422 |

To find out how accurate the calculated average structure is, we aligned it with the experimental structure of our protein (1u5b). Figure 9 and the RMSD value of 1.169 show that the two structures do not superpose perfectly. The middle part of the protein is aligned quite well; the most deviating parts are the two ends of the structures on the left and right side of the picture. According to the RMSF analysis above, these are the most flexible regions, so the two structures may simply represent two different states of the movement.

Next, we compared the structure of 1u5b with the structure containing the bfactors. Figure 10 shows that the reference structure to which the bfactors were added is the structure of 1u5b; this is evident because the two structures superpose nearly perfectly (RMSD 0.377). There is a minimal shift in the alignment, but since it affects the whole structure, we consider it an artefact of the superposition tool.

Finally, Figure 11 shows the alignment of the structure containing the bfactors with the average structure. Again, the two differ slightly because they represent different states of the movement, but the most interesting observation in this figure is the bfactor coloring. Only the part at one end of the protein (left side of the picture) is colored, indicating that only this part of the protein is flexible. The coloring ranges from yellow to red, where yellow stands for low and red for high flexibility. This flexibility according to the bfactors is consistent with the RMSF values above. The other end of the protein is not colored but only slightly shifted. We first thought that this shift could result from movement, but since this end is not colored, this explanation may be wrong. However, the RMSF values above show only a very small fluctuation of the atoms in this region, so perhaps this part is slightly flexible but not flexible enough to be marked by the bfactors.
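The RMSD values above come from superimposing two structures. The core of such a superposition is the Kabsch algorithm: remove the translation, find the optimal rotation from a singular value decomposition, then compute the RMSD. The NumPy sketch below implements this for two coordinate sets with matching atom order. PyMOL's `align` additionally performs a sequence alignment and iteratively rejects outlier atoms, so its reported RMSD is not directly comparable with this plain all-atom value.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P_c = P - P.mean(axis=0)                  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                           # 3x3 covariance matrix
    U, _S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against an improper rotation (reflection)
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                        # rotation mapping P_c onto Q_c
    diff = (R @ P_c.T).T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the fit removes all rigid-body motion, the remaining RMSD reflects only differences in internal conformation, such as the displaced terminal regions discussed above.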

#### C-alpha

Figure12: Alignment of the experimental structure with the C-alpha atoms of the average structure
Figure13: Alignment of the experimental structure with the C-alpha atoms of the structure which contains the bfactors
Figure14: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors

| Alignment | Figure | RMSD |
|---|---|---|
| 1u5b / average | Figure 12 | 0.955 |
| 1u5b / bfactors | Figure 13 | 0.300 |
| bfactors / average | Figure 14 | 0.993 |

To analyse the run in which we considered only the C-alpha atoms, we again first checked how well the calculated average structure fits the experimental structure. Figure 12 again shows considerable deviation between the average structure and the structure of 1u5b. However, the RMSD value (0.955) is smaller than the RMSD of the average structure of the whole protein (1.169). Since we assume that the deviation is caused by the different states of movement of the structures, the lower RMSD suggests that the backbone is slightly less flexible. Again, the largest deviation is in the two terminal regions of the protein.

Next, we compared the experimental structure with the C-alpha atoms of the structure containing the bfactors. Figure 13 and the very low RMSD value (0.300) show that the two structures superpose very closely. There may be a little variation between the two structures, or this is the same case as in Figure 10, where we assumed an artefact of the program because the whole alignment was shifted; the spheres make this hard to see here.

The last analysis is the detailed one of the structure containing the bfactors (Figure 14). Again, the part with the highest temperature factors is colored, where red means high and green low flexibility. Only one end of the protein (right side) is colored, so only this part of the protein shows strong flexibility. This agrees with the RMSF, which also identifies the beginning of the protein as flexible.

### Radius of gyration

The radius of gyration is a measure of the compactness of the protein; its course over time reflects how the structure and shape of the protein change during the simulation.

Figure15: Radius of gyration during the MD simulation

According to the black line in Figure 15, the radius of gyration ranges between 2.22 and 2.4 nm during the whole simulation. The black line describes the overall change in the shape of the protein. A closer look at the plot reveals a trend: at the beginning the radius has its maximum value of about 2.4 nm, and it then decreases until about 6300 ps, when the decline stops. From then on the value stays between 2.23 and 2.27 nm, so the fluctuation is very small. The fact that there is still some variation until the end of the simulation shows that the protein keeps moving, indicating its flexibility. The changes in the profile of the protein along the individual axes are shown by the red (x axis), green (y axis) and blue (z axis) lines.
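The radius of gyration is the mass-weighted root-mean-square distance of the atoms from the centre of mass, R_g = sqrt(Σ m_i |r_i - r_cm|² / Σ m_i). A minimal NumPy sketch for a single frame, assuming coordinates (in nm) and atomic masses are already available, for example from the topology; applying it to every frame gives a curve like the black line in Figure 15.

```python
import numpy as np

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for one frame: coords (n, 3), masses (n,)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = (masses[:, None] * coords).sum(axis=0) / masses.sum()  # centre of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)                  # squared distance to it
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))
```

A decreasing R_g, as seen up to about 6300 ps, means that the mass of the protein moves closer to its centre, i.e. the structure becomes more compact.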

## Structural analysis

### Solvent accessible surface area

The solvent accessible surface area (SASA) of a protein is the part of its surface that can be reached by a solvent. The SASA can be divided into two parts: the hydrophobic SASA, contributed by atoms classified as hydrophobic, and the hydrophilic SASA, contributed by atoms classified as hydrophilic. This division shows the chemical character of the surface that is exposed to the solvent.

Figure16: Plot of the average solvent accessible surface over time per residue
Figure17: Plot of the average solvent accessible surface over time per atom
Figure18: Plot of the solvent accessible surface of the protein during the MD simulation

Figure 16 shows the average SAS of each residue during the simulation. There is much variation: the solvent accessible areas of the residues range between 0 nm² and 2.3 nm². Since some regions have an SAS of 0 nm², there are parts that are not accessible to solvent, but most regions are accessible. The most accessible region is at the very beginning of the protein, since this peak is clearly the highest one. In addition, there are two high peaks in the middle of the protein that stand out from the neighbouring peaks, which are all quite low. This shows that only a few parts in the centre of the protein are accessible to solvent, but at these positions the accessibility is very good. Figure 17 shows the average SAS per atom over time. Again there is a lot of variation in the SAS, which ranges from 0 nm² to 0.55 nm². The last plot (Figure 18) shows the SAS of the whole protein during the simulation. The red line describes the hydrophilic surface area and the black line the hydrophobic surface area. The hydrophobic surface area is slightly larger, but not by much. The green line, which hardly fluctuates, shows the total SAS of the protein, indicating that the SAS stays almost constant throughout the simulation.
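Reading Figure 16 comes down to separating buried residues (SAS close to 0 nm²) from strongly exposed ones. A small NumPy sketch of that step, assuming the per-residue averages have already been loaded as two arrays; the 0.05 nm² threshold for "buried" and the number of top residues are arbitrary illustrative choices, not GROMACS defaults, and the function is a hypothetical helper.

```python
import numpy as np

def summarize_residue_sasa(residue_ids, mean_sasa_nm2, buried_below=0.05, top_k=3):
    """Split residues into buried ones and the most exposed ones.

    buried_below (nm^2) and top_k are illustrative choices, not GROMACS defaults.
    """
    residue_ids = np.asarray(residue_ids)
    mean_sasa_nm2 = np.asarray(mean_sasa_nm2, dtype=float)
    buried = residue_ids[mean_sasa_nm2 < buried_below]
    order = np.argsort(mean_sasa_nm2)[::-1]          # largest SAS first
    most_exposed = residue_ids[order[:top_k]]
    return buried.tolist(), most_exposed.tolist()
```

Listing residue numbers this way makes it easier to check whether the exposed peaks coincide with the flexible regions found in the RMSF and bfactor analyses.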

### Hydrogen bonds

Figure19a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure19b: Internal hydrogen bonds during the simulation
Figure20a: Hydrogen bonds with the surrounding solvents and pairs within 0.35 nm during the simulation
Figure20b: Hydrogen bonds with the surrounding solvents during the simulation

| | Donors | Acceptors | avg. # of H-bonds | possible # of H-bonds |
|---|---|---|---|---|
| protein-protein | 594 | 1158 | 308.847 | 343926 |
| protein-water | 29470 | 30034 | 806.073 | 4.42551e+08 |

Figure 19a (left) shows the number of internal hydrogen bonds during the simulation. According to the black line, which describes the hydrogen bonds, there are about 300 bonds, which agrees with the average of 308.8 in the table above. Since the black line shows almost no variation, the number of hydrogen bonds seems to stay constant. However, Figure 19b shows that the number of hydrogen bonds does change, since the curve fluctuates considerably. Although the number of hydrogen bonds varies between 280 and 335, a trend can be seen: at the beginning the average is about 310, then it drops to about 300 and rises again to 320. So the number of hydrogen bonds first declines slightly, but after one third of the simulation it rises again. The trend in Figure 20b is exactly the opposite: first the number of hydrogen bonds is low, then it rises a bit, and after about one third of the time it falls again. The number of external hydrogen bonds is always much higher than the number of internal ones, ranging from 740 to 860, but it is interesting that the two trends are opposed. This is plausible, because when the shape of the protein changes, internal hydrogen bonds that break can be replaced by hydrogen bonds to water, and vice versa. Since this movement is reflected in the alternating hydrogen bonds, we can say that the protein is very flexible during the simulation. The red lines in Figure 19a and Figure 20a show the pairs within 0.35 nm. There are many more pairs within this distance inside the protein (1400-1500 pairs) than with the surrounding solvent (1000-1200 pairs). In addition, the number of pairs with the solvent varies much more during the simulation than the number inside the protein.
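Whether the internal and protein-water hydrogen-bond counts really move in opposite directions can be quantified with a Pearson correlation coefficient: a value close to -1 means that a rise in one series coincides with a fall in the other. The sketch assumes both time series have been loaded as arrays of equal length (one value per frame); the function name is a hypothetical helper.

```python
import numpy as np

def hbond_anticorrelation(internal_counts, water_counts):
    """Pearson correlation of two H-bond time series (negative = opposite trends)."""
    internal_counts = np.asarray(internal_counts, dtype=float)
    water_counts = np.asarray(water_counts, dtype=float)
    return float(np.corrcoef(internal_counts, water_counts)[0, 1])
```

Values between 0 and -1 indicate a partial anticorrelation. Because the counts fluctuate strongly from frame to frame, correlating smoothed (running-average) series may show the opposing trends more clearly.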

### Ramachandran plot

Figure21: Ramachandran plot of our protein
Figure22: General Ramachandran plot (<ref>http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg</ref>)

Figure 21 shows the Ramachandran plot of the protein from the MD simulation. The regions for beta sheets and alpha helices are densely populated (black), as is the region for left-handed helices. In addition, the other three corners are black. Compared with the general Ramachandran plot (Figure 22), the plot of the simulation contains many more black areas. This shows that the angles are not concentrated in a few positions but vary a lot. Since some regions are completely white, certain angle combinations do not occur in the simulated protein. The fact that so many different angle combinations occur, and not only those of the general Ramachandran plot, could indicate that this protein is flexible.
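Each point in a Ramachandran plot is a pair of backbone dihedral angles of one residue: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The NumPy sketch below computes a dihedral angle from four atom positions in the range -180° to 180°; extracting the backbone atoms from the trajectory is assumed to have been done already.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)              # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1          # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# phi = dihedral_deg(C_prev, N, CA, C); psi = dihedral_deg(N, CA, C, N_next)
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, which is what separates, for example, right-handed from left-handed helical regions in the plot.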

## Analysis of dynamics and time-averaged properties

### RMSD matrix

Figure23: RMSD matrix of the structures of our protein during the simulation

Figure 23 shows the pairwise RMSD between the structures of our protein during the simulation. As expected, the diagonal is turquoise and blue, since the RMSD between two identical structures is zero. Apart from the diagonal, there is only one other turquoise part in the matrix, at the end of the simulation between 6000 ps and 10000 ps. This shows that these structures are all quite similar and that the simulation reaches an equilibrium. In addition, there are red regions where structures from 6500 ps to 9000 ps are compared with structures from 1000 ps to 2500 ps. This shows that the group of structures that are similar at the end differs considerably from the structures in the first part of the simulation. It is also very interesting that the structures at the very beginning of the simulation seem to be very different from all other structures, since the border of the matrix is red and only the bottom left corner is green. This indicates that the structures change a lot at the beginning of the MD simulation, as no energetically favourable structure had been found yet.
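A matrix like Figure 23 contains the RMSD between every pair of frames. The NumPy sketch below computes it in one vectorised step. As a simplification, it assumes all frames have already been superimposed on one common reference and performs no pairwise fit; a tool that fits each pair separately can only give equal or smaller values.

```python
import numpy as np

def rmsd_matrix(frames):
    """Pairwise RMSD between frames of an (n_frames, n_atoms, 3) array.

    Simplification: frames are assumed to be pre-aligned to a common
    reference, so no pairwise fit is done here.
    """
    X = np.asarray(frames, dtype=float)
    n_frames, n_atoms, _ = X.shape
    flat = X.reshape(n_frames, -1)
    sq_norms = (flat ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    np.maximum(sq, 0.0, out=sq)          # clip tiny negative round-off
    return np.sqrt(sq / n_atoms)
```

For 2001 frames the matrix has about four million entries, which is still small enough to hold in memory.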

### Cluster analysis

Figure24: Visualisation of the clusters of structure groups
Figure25: Plot of the RMSD values of the clusters

The program found 542 clusters. Figure 24 visualises the clustered structures of the protein, and Figure 25 shows the RMSD values of the clusters, which range from 0.07 nm to 0.57 nm. The fact that these values are quite low indicates that the groups of structures are not completely different from each other. Most of the clusters have an RMSD value of about 0.35 nm, which shows that most groups of structures have some similarity to other groups. A small number of groups have a value of 0.57 nm, which shows that these groups of structures are only slightly similar to the others. Since the peaks between 0.1 nm and 0.2 nm are very small, only a few groups of structures show a very high similarity during the simulation.
Furthermore, we compared two of the clusters, cluster 1 and cluster 2, by comparing their structures. The RMSD value of 0.709 shows that the clusters are not completely different and that there are still similarities between the structures in the two clusters.
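The report does not state which clustering method was used. One option offered by gmx cluster is the GROMOS method: the structure with the most neighbours within an RMSD cutoff becomes a cluster centre and is removed together with its neighbours, and this is repeated until every structure is assigned. The NumPy sketch below implements that idea on a precomputed RMSD matrix; the cutoff is an input, and nothing here implies it is the setting used in our run.

```python
import numpy as np

def gromos_clusters(rmsd, cutoff):
    """GROMOS-style clustering of a square RMSD matrix.

    Returns a list of (centre_index, member_indices) tuples.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    remaining = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while remaining.any():
        # neighbours among the structures that are still unassigned
        neighbours = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        counts = neighbours.sum(axis=1)
        centre = int(np.argmax(counts))   # only unassigned rows have counts > 0
        members = np.flatnonzero(neighbours[centre])
        clusters.append((centre, members.tolist()))
        remaining[members] = False        # remove the centre and its neighbours
    return clusters
```

The number of clusters depends strongly on the cutoff: a small cutoff relative to the typical pairwise RMSD, as with the values around 0.35 nm reported above, produces many small clusters.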

### Internal RMSD

Figure26: Internal RMSD of our protein during the simulation

The internal RMSD ranges from 0.1 nm to 0.45 nm, so there is considerable fluctuation during the simulation. As Figure 26 shows, the values are very low at the beginning but rise quickly until 1500 ps. After this point they only range from 0.3 nm to 0.4 nm, which is not a large variation. After about 5000 ps they rise again slightly, so that the average value for the remaining time is 0.4 nm. Towards the end of the simulation at 10000 ps, the RMSD seems to converge to 0.4 nm.
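We assume that the internal RMSD here is the distance-based RMSD as computed by gmx rmsdist, that is, the root-mean-square deviation of all interatomic distances from those of a reference structure. Because distances do not change under rotation or translation, no fitting is needed. A minimal NumPy sketch; it builds the full distance matrix, so it is only practical for a subset of atoms such as the C-alpha atoms.

```python
import numpy as np

def pair_distances(coords):
    """All unique interatomic distances (i < j) for one frame of shape (n, 3)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(coords), k=1)
    return d[i, j]

def distance_rmsd(frame, reference):
    """RMS deviation of interatomic distances; needs no superposition."""
    delta = pair_distances(frame) - pair_distances(reference)
    return float(np.sqrt(np.mean(delta ** 2)))
```

Because no fit is involved, a rising distance RMSD directly reflects internal rearrangements of the protein rather than imperfect superposition.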

# Mutation M82L

## A brief check of results

#### How many frames are in the trajectory file and what is the time resolution?

• frames: 2001
• time resolution: 5 ps

#### How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

• real time: 1d03h11:10
• simulation speed: 8.828 ns/day
• time to simulate one second: 310388 years

#### Which contribution to the potential energy accounts for most of the calculations?

• potential energy (average): -1.24452e+06 kJ/mol

## Visualization of results

Figure27a: MD simulation of the movement of the mutated BCKDHA
Figure27b: Visualisation of the simulated protein with ngmx

Figure 27a shows the movement of the whole protein, especially of the side chains. The blue-colored part at the bottom left of the protein moves the most. In addition, the red part on the right side also seems to move, but not as much as the blue part. Figure 27b shows another visualisation of the protein, produced with ngmx.

## Quality assurance

### Energy calculations

#### Pressure

Figure28: Plot of the pressure during the MD simulation

| Energy | Average (bar) | Err.Est. (bar) | RMSD (bar) | Tot-Drift (bar) |
|---|---|---|---|---|
| Pressure | 0.998885 | 0.018 | 71.3509 | 0.00251495 |

Figure 28 shows the pressure of the molecular dynamics system for the mutation M82L. According to the figure and the table, the average value is 0.999 bar. However, the pressure ranges from about -240 bar to 230 bar, so it varies a lot. Because of this, we are not sure whether this value of 0.999 bar is the equilibrium value that should be reached or only the arithmetic average of all values.

#### Temperature

Figure29: Plot of the temperature during the MD simulation

| Energy | Average (K) | Err.Est. (K) | RMSD (K) | Tot-Drift (K) |
|---|---|---|---|---|
| Temperature | 297.939 | 0.0044 | 0.959135 | 0.00200358 |

The temperature shown in Figure 29 ranges between 294.5 K and 301 K. There is a lot of fluctuation around the average value of 297.939 K, but the range itself is not very large. Therefore the temperature in the system is quite stable, which means that it reached an equilibrium.

#### Potential

Figure30: Plot of the potential during the MD simulation

| Energy | Average (kJ/mol) | Err.Est. (kJ/mol) | RMSD (kJ/mol) | Tot-Drift (kJ/mol) |
|---|---|---|---|---|
| Potential | -1.24452e+06 | 98 | 1059.66 | -676.063 |

Figure 30 shows the potential energy of the system, which fluctuates considerably during the simulation. The values range from -1.249e+06 kJ/mol to -1.24e+06 kJ/mol during the whole simulation, indicating that the potential did not reach an equilibrium. However, a trend can be recognised, since the values decline over the whole simulation (total drift -676 kJ/mol). A low potential indicates that a protein is stable, so in our case the protein became more stable during the simulation. This is favourable, since the shape of a protein is important for its function, and a functional protein therefore needs to be stable.

#### Total Energy

Figure31: Plot of the total energy during the MD simulation

| Energy | Average (kJ/mol) | Err.Est. (kJ/mol) | RMSD (kJ/mol) | Tot-Drift (kJ/mol) |
|---|---|---|---|---|
| Total Energy | -1.02137e+06 | 98 | 1298.13 | -674.564 |

The total energy (Figure 31) is again very low, with an average value of -1.02137e+06 kJ/mol, but it is not as low as the potential energy. Apart from a few peaks, the fluctuation is quite regular, indicating that the protein never reaches a high energy. As already mentioned in the analysis of the potential, this low total energy is important for the stability and therefore also for the function of the protein.

### Minimum distance between periodic boundary cells

Figure32: Minimum distance between periodic boundary cells

The shortest periodic distance is 1.91319 nm at time 630 ps, between atoms 25 and 5578. Since this is greater than 0.9 nm, there were no direct interactions between periodic images. The distance ranges between 2 nm and 5 nm, which shows that the protein is flexible during the whole simulation. Figure 32 shows a trend in the minimum distance: it starts quite low and fluctuates around 2.5 nm, and after 4700 ps the values rise steadily. After 7000 ps they fluctuate around an average value of 4 nm. So the minimum distance increases during the simulation.
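Trends such as "fluctuates around 2.5 nm, then rises after 4700 ps" are easier to read from a smoothed curve. A minimal moving-average sketch with NumPy, assuming the time and value columns are already loaded; the window length is a free choice (for example 101 frames, i.e. about 500 ps at 5 ps per frame) and is not taken from the original analysis.

```python
import numpy as np

def running_average(t, y, window):
    """Moving average of y; returns (times, smoothed values).

    For an odd window, each smoothed value is stamped with the time at the
    centre of its window.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 1 <= window <= len(y):
        raise ValueError("window must be between 1 and len(y)")
    smoothed = np.convolve(y, np.ones(window) / window, mode="valid")
    start = (window - 1) // 2
    return t[start:start + len(smoothed)], smoothed
```

The smoothed curve is shorter than the original by `window - 1` points, so the first and last half-window of the simulation are not represented.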

### Root mean square fluctuations

Figure33: RMSF for protein
Figure34: RMSF for C-alpha

Figure 33 shows the RMSF for each residue of the protein. The large peak at the first residues shows that the beginning of the protein deviates most from the average structure. This indicates that this region is very flexible, because it changes its shape a lot. The rest of the protein is quite rigid, since there are no remarkably high peaks. Only the region at the end contains a peak that is slightly higher than the others, so this region may be somewhat flexible. Since Figure 34, which shows the RMSF for the C-alpha atoms, has the same trend, not only the residues at the beginning are flexible but also the backbone. There is also a small peak at the end, indicating some flexibility at the end of the backbone.

### Pymol analysis of average and bfactors

#### Protein

Figure35: Alignment of the mutated structure with the average structure
Figure36: Alignment of the mutated structure with the structure containing the bfactors
Figure37: Alignment of the average structure with the structure containing the bfactors

| Alignment | Figure | RMSD |
|---|---|---|
| 1u5b / average | Figure 35 | 1.106 |
| 1u5b / bfactors | Figure 36 | 0.377 |
| bfactors / average | Figure 37 | 1.294 |

First we aligned the calculated average structure with the experimental structure of the protein to see whether the mutation causes large changes in the structure (Figure 35). There are big differences, but they were already present without the mutation (Figure 9) and can be explained by the two structures being in different stages of the movement. Next (Figure 36), we compared the structure of 1u5b with the structure containing the bfactors. Since this structure is the reference structure, the two structures should be very similar or even identical, and indeed they superpose nearly perfectly. There only seems to be a shift of the whole structure, which could be an artefact of the program. Figure 37 clearly shows that the beginning of the protein is the only region colored in a color other than blue or turquoise, so it is the only region with higher temperature factors. As some parts are completely red, the beginning of the protein is very flexible.

#### C-alpha

Figure38: Alignment of the mutated original structure with the C-alpha atoms of the average structure
Figure39: Alignment of the mutated original structure with the C-alpha atoms of the structure which contains the bfactors
Figure40: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors

| Alignment | Figure | RMSD |
|---|---|---|
| 1u5b / average | Figure 38 | 0.886 |
| 1u5b / bfactors | Figure 39 | 0.289 |
| bfactors / average | Figure 40 | 0.930 |

To find out whether the backbone is flexible as well, and not only the side chains, we also analysed the C-alpha atoms. In Figure 38 we first aligned the calculated C-alpha atoms with the original structure. There are large differences between the two structures, especially at the beginning and the end of the protein. These differences are not caused by the mutation, since they also appear in Figure 12; the variation only arises because the two structures are in different stages of the movement. To compare the bfactor structure with the original structure, we aligned them as well (Figure 39). They superpose nearly perfectly, which is supported by the very low RMSD value of 0.289. This leads us to assume that the mutation has no influence on the structure. In Figure 40 the C-alpha atoms are colored by their bfactors, and only the beginning of the protein is colored in colors other than blue or turquoise. This indicates that only the beginning of the protein is flexible. Since only the outermost part of this region is completely red rather than orange, only this part is very flexible, while the rest is flexible to a lesser degree.

### Radius of gyration

Figure41: Radius of gyration during the MD simulation

The radius of gyration, shown in Figure 41, ranges from 2.25 to 2.4 nm. According to the black line, which describes the overall change in the shape of the protein, the radius decreases during the simulation, except in the middle of the simulation, where the values rise again slightly to 2.35 nm. At the end, the radius of gyration is only about 2.25 nm. Since there is only a very small fluctuation at the end of the simulation, we assume that the protein has only little flexibility.

## Structural analysis

### Solvent accessible surface area

Figure42: Plot of the average solvent accessible surface over time per residue
Figure43: Plot of the average solvent accessible surface over time per atom
Figure44: Plot of the SAS of the protein during the MD simulation

In the left figure (Figure 42) the average solvent accessible surface (SAS) of each residue during the simulation is shown. The values range from 0 nm2 to 2 nm2, so the protein has many regions that are accessible to solvent to different extents. Many regions in the middle of the protein have very low values, indicating that these parts can hardly be reached by solvent. Nevertheless there are three very high peaks at residues 150, 180 and 230, so some amino acids in this middle part are exposed to solvent as well. The region with the highest accessibility, however, lies at the beginning of the protein around residue 10, since the peak at this position is the highest in the whole plot. Overall, the plot shows more high values at the beginning and at the end of the protein and fewer in the middle part. This agrees with the results of the B-factor analysis. Figure 43 shows the average SAS of each atom during the simulation. Again there is a lot of variation between different parts of the protein, with values ranging from 0 nm2 to 0.5 nm2. The parts with very low values largely coincide with those in Figure 42 and are located in the middle of the protein. Interestingly, the highest value is not at the beginning but at the end of the protein. Additionally, many peaks in the middle part are as high as the peaks at the beginning. This indicates that the first part of the protein is not more accessible to solvent than many other regions. The accessibility at the end of the protein varies strongly: although there is one very high peak, the neighbouring peaks are very low, so only a small region there is well accessible to solvent while its surroundings are poorly accessible. In Figure 44 the SAS is divided into a hydrophilic (red) and a hydrophobic (black) part.
Both lines lie at about 110 nm2 to 120 nm2 and change little during the simulation. Since the black line is slightly higher than the red one, the exposed hydrophobic surface is slightly larger than the hydrophilic surface, but the difference is small.
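Values like those in Figures 42 to 44 are usually obtained by placing test points on a sphere around each atom (atomic radius plus a solvent probe radius) and keeping only the points that are not buried inside neighbouring atoms. The sketch below illustrates this idea with the simple Shrake-Rupley scheme. It is not necessarily the algorithm of the tool we used, and the radii and the 0.14 nm probe radius are assumptions for illustration.

```python
import numpy as np


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))


def shrake_rupley_sas(coords, radii, probe=0.14, n_points=960):
    """Per-atom solvent accessible surface in nm^2 (Shrake-Rupley style).

    coords: (N, 3) positions in nm; radii: atomic radii in nm (assumed values);
    probe: solvent probe radius in nm (0.14 nm is a common choice for water).
    """
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe
    unit = sphere_points(n_points)
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        surface = coords[i] + expanded[i] * unit  # test points on the expanded sphere
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j == i or np.linalg.norm(coords[j] - coords[i]) >= expanded[i] + expanded[j]:
                continue  # same atom, or the expanded spheres do not overlap
            # A test point is buried if it lies inside a neighbour's expanded sphere.
            accessible &= np.linalg.norm(surface - coords[j], axis=1) >= expanded[j]
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * accessible.mean()
    return areas
```

Summing the per-atom areas over the atoms of each residue gives a per-residue profile like Figure 42. Summing them separately over atoms classified as hydrophobic or hydrophilic gives curves of the kind shown in Figure 44. This sketch does not provide that classification.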

### Hydrogen bonds

protein and protein protein and water
Figure45a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure46a: Hydrogen bonds and pairs within 0.35 nm with the surrounding solvents during the simulation
Figure45b: Internal hydrogen bonds during the simulation
Figure46b: Hydrogen bonds with the surrounding solvents during the simulation

| | Donors | Acceptors | Avg. # of H-bonds | Possible # of H-bonds |
|---|---|---|---|---|
| protein-protein | 594 | 1158 | 304.417 | 343926 |
| protein-water | 29474 | 30038 | 817.490 | 4.4267e+08 |

Comparing Figure 45a, which shows the internal hydrogen bonds, with Figure 46a shows that there are more hydrogen bonds with the surrounding solvent than in the interior of the protein. This is expected, since there are many more possibilities to form bonds with the surrounding solvent. Interestingly, however, there are more pairs within 0.35 nm inside the protein than with the surrounding solvent. Additionally, the numbers of these pairs vary a lot, indicating that there is much movement in the protein, which means that the protein is flexible throughout the simulation. The black lines, which describe the hydrogen bonds, seem to barely fluctuate, but Figure 45b and Figure 46b show that they do fluctuate. The number of hydrogen bonds inside the protein ranges from 280 to 330, and the number of hydrogen bonds with the surrounding solvent lies between 750 and 870. Both rise and fall steadily over the whole simulation, much like a sine wave. Remarkably, the two curves are inversely related: when the number of internal hydrogen bonds rises, the number of hydrogen bonds with the solvent falls. This, together with the constant fluctuation, indicates a lot of variation in the shape of the protein, which means that it is very flexible.
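The difference between "hydrogen bonds" and "pairs within 0.35 nm" comes from the geometric criterion used to count them. Assume the default criterion of gmx hbond: a donor-acceptor distance of at most 0.35 nm and a hydrogen-donor-acceptor angle of at most 30°. Under that assumption, the following sketch classifies a single donor-hydrogen-acceptor triple. A pair that passes only the distance test counts as a pair within 0.35 nm but not as a hydrogen bond.

```python
import numpy as np


def hbond_geometry(donor, hydrogen, acceptor, r_cut=0.35, angle_cut=30.0):
    """Return (within_distance, is_hbond) for one donor-H...acceptor triple.

    Distances in nm, angle in degrees. The angle is measured at the donor
    between the donor->hydrogen and donor->acceptor vectors.
    """
    d = np.asarray(donor, dtype=float)
    h = np.asarray(hydrogen, dtype=float)
    a = np.asarray(acceptor, dtype=float)
    da = a - d
    dh = h - d
    dist = np.linalg.norm(da)
    cos_angle = np.dot(dh, da) / (np.linalg.norm(dh) * dist)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    within = dist <= r_cut  # "pair within 0.35 nm" only needs this test
    return bool(within), bool(within and angle <= angle_cut)
```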

### Ramachandran plot

Ramachandran plot of our simulation general Ramachandran plot
Figure47: Ramachandran plot of our protein
Figure48: General Ramachandran plot (<ref>http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg</ref>)

Comparing the Ramachandran plot of our protein (Figure 47) with the general Ramachandran plot (Figure 48) reveals large differences. The regions corresponding to alpha helices and beta sheets are the same, but our protein contains many more left-handed alpha helices. Additionally, the upper right and the lower right corners are black, which means that these angle combinations are also very common in our protein.
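Each point of a Ramachandran plot is the pair of backbone dihedral angles phi and psi of one residue. A minimal sketch of how a dihedral angle follows from four atom positions (plain NumPy, not the tool used for Figure 47):

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

For residue i, phi = dihedral(C[i-1], N[i], CA[i], C[i]) and psi = dihedral(N[i], CA[i], C[i], N[i+1]). Here C, N and CA are hypothetical arrays of backbone coordinates.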

## Analysis of dynamics and time-averaged properties

### RMSD matrix

Figure49: RMSD matrix of the structures of our protein during the simulation

Figure 49 shows the pairwise RMSD between the structures of our protein during the simulation. The diagonal is necessarily turquoise and blue, since every structure compared with itself has an RMSD of zero. Next to the diagonal there are three further turquoise regions. Based on this coloring, the structures between 0 ps and 4000 ps are quite similar to each other. There are some green and yellow parts in this square, so some of the structures are not very similar, but overall they agree well. The next region lies between 4000 ps and 6000 ps, where the structures are very similar. In the last part of the simulation, between 6500 ps and 10000 ps, the structures agree best, since the coloring is partly blue. Because the similar regions form blocks around the diagonal, the structure changes only a little within each block, and all structures of one block resemble each other. Between the blocks, however, the structure jumps to a different conformation. After 6500 ps there seems to be almost no further change, since the structures in this period are very similar. The jumps between the blocks must be large, because the structures from the beginning and from the end of the simulation do not agree at all. This is shown by the red coloring between 200 ps - 600 ps and 6000 ps - 10000 ps, which corresponds to an RMSD value of 0.911 nm.
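Each entry of such a matrix is the RMSD between two frames after one frame has been optimally superimposed on the other. The sketch below assumes the frames are given as a NumPy array of shape (n_frames, n_atoms, 3). It uses the Kabsch superposition, which is not necessarily the exact fitting procedure of the tool used for Figure 49.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    u, s, vt = np.linalg.svd(a.T @ b)
    s[-1] *= np.sign(np.linalg.det(u @ vt))  # forbid reflections
    # Minimal sum of squared deviations: |a|^2 + |b|^2 - 2 * (sum of corrected singular values)
    e = (a * a).sum() + (b * b).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(e, 0.0) / len(a)))


def rmsd_matrix(frames):
    """Symmetric pairwise RMSD matrix for frames of shape (n_frames, N, 3)."""
    n = len(frames)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(frames[i], frames[j])
    return m
```

For 2001 frames this double loop performs about two million fits. In practice one would therefore restrict it to a subset of atoms (e.g. C-alpha) or use a compiled tool.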

### Cluster analysis

 Figure50: Visualisation of the cluster of structure groups Figure51: Plot of the RMSD values of the clusters

The program found 525 clusters. Figure 50 visualises the clustered structures of the protein. The plot (Figure 51) shows the distribution of the pairwise RMSD values between the structures, which range from 0.07 nm to 0.9 nm. Such high values show that some structures are very different from each other. Conversely, there are also pairs with a very low RMSD of 0.07 nm, which must be very similar. The highest peak (about 19000 counts) lies at an RMSD of 0.58 nm. Since only 525 clusters were found, these counts cannot refer to clusters; they count pairs of structures. This peak indicates that most structures are not very similar but still agree reasonably well. There is another high peak with about 10000 counts at an RMSD of 0.2 nm, a very low value indicating that many structures are very similar to each other.
Additionally we compared clusters one and two with each other. They have an RMSD of 0.822 nm. This is interesting because the distribution contains RMSD values higher than this one, so we can assume that a few structures are very different from all other structures.
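The report does not state which clustering method was used. One common choice, available as the gromos method of gmx cluster, works as follows. It repeatedly takes the structure with the most neighbours within an RMSD cutoff, puts it together with those neighbours into one cluster, and removes them from the pool. A minimal sketch on a precomputed RMSD matrix, with the cutoff as an assumed input:

```python
import numpy as np


def gromos_cluster(rmsd, cutoff):
    """Gromos-style clustering of a symmetric pairwise RMSD matrix.

    Returns a list of clusters (lists of structure indices); the first index
    of each cluster is its central structure.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        idx = sorted(remaining)
        neighbours = rmsd[np.ix_(idx, idx)] <= cutoff  # includes each structure itself
        center_pos = int(np.argmax(neighbours.sum(axis=1)))  # ties: lowest index
        members = [idx[k] for k in np.flatnonzero(neighbours[center_pos])]
        center = idx[center_pos]
        clusters.append([center] + [m for m in members if m != center])
        remaining -= set(members)
    return clusters
```

A distribution like Figure 51 can be obtained by histogramming the off-diagonal entries of the same matrix, for example with `np.histogram(rmsd[np.triu_indices(len(rmsd), k=1)], bins=100)`.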

### Internal RMSD

Figure52: Internal RMSD of our protein during the simulation

The internal RMSD plotted in Figure 52 changes a lot during the simulation. At the beginning it lies at 0.1 nm, which means that the interatomic distances hardly differ from those of the reference structure. Between 0 ps and 1000 ps, however, there are two large jumps, after which the RMSD reaches 0.35 nm. It keeps rising until the end of the simulation, where it is about 0.45 nm. Since there is little fluctuation at the end, the value seems to converge to 0.45 nm. There is still some variation, though, indicating that the protein stays flexible.
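Assume the internal RMSD is defined as in gmx rmsdist, i.e. as the RMSD of all interatomic distances relative to a reference structure. It then needs no superposition, because distances do not change under rotation or translation. A minimal sketch:

```python
import numpy as np


def pair_distances(coords):
    """All unique interatomic distances (i < j) of an (N, 3) array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(coords), k=1)
    return dist[i, j]


def internal_rmsd(frame, reference):
    """RMSD of interatomic distances between a frame and a reference."""
    d = pair_distances(frame) - pair_distances(reference)
    return float(np.sqrt(np.mean(d ** 2)))
```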

# Mutation C264W

## A brief check of results

#### How many frames are in the trajectory file and what is the time resolution?

• frames: 2001
• time resolution: 5ps

#### How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second?

• real time: 1d03h22:33
• simulation speed: 8.767 ns/day
• time needed to simulate one second: 312557 years
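The last value follows from the speed by simple arithmetic. One second is 10^9 ns, so at v ns/day the simulation needs 10^9 / v days. A small sketch, assuming 365-day years:

```python
def years_per_simulated_second(ns_per_day: float, days_per_year: float = 365.0) -> float:
    """Wall-clock years needed to simulate 1 s at a given speed in ns/day."""
    ns_per_second = 1e9
    return ns_per_second / ns_per_day / days_per_year
```

With 25.370 ns/day this reproduces the 107991 years of the wildtype. With 8.767 ns/day it gives about 312504 years. The listed 312557 years corresponds to 2.738 hours/ns (24/8.767 rounded to three decimals), so the difference is only rounding.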

#### Which contribution to the potential energy accounts for most of the calculations?

• potential energy: -1.24420e+06 kJ/mol

## Visualization of results

 Figure53a: MD simulation of the movement of BCKDHA Figure53b: Visualisation of the simulation with ngmx

As Figure 53a shows, the blue part at the bottom left of the protein moves a lot and is therefore very flexible. The red part on the right side also moves, but less than the blue part. Figure 53b shows the simulated protein again, this time visualised with ngmx.

## Quality assurance

### Energy calculations

#### Pressure

Figure54: Plot of the pressure during the MD simulation
| Energy | Average | Err. Est. | RMSD | Tot-Drift |
|---|---|---|---|---|
| Pressure (bar) | 1.00041 | 0.0069 | 71.2651 | -0.0331535 |

Figure 54 shows the pressure of the molecular dynamics system. According to the figure and the table, the average value is 1.00041 bar. However, the pressure ranges from about -250 bar to 250 bar, so the values are spread over a wide range (RMSD 71.3 bar). Because of this, we are not sure whether the value of 1.00041 bar is the equilibrium value that should be reached or only the arithmetic mean of all values. The small total drift of -0.033 bar does indicate, though, that the pressure does not trend in either direction over the simulation.
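The table columns can be reproduced from the raw time series. The sketch below computes three of them: the average, the RMSD (standard deviation around the average), and the total drift (slope of a least-squares straight line times the simulated time span). This is our understanding of how gmx energy defines these columns. Its error estimate, which is based on block averaging, is left out.

```python
import numpy as np


def energy_stats(time_ps, values):
    """Average, RMSD (standard deviation) and total drift of an energy term."""
    t = np.asarray(time_ps, dtype=float)
    y = np.asarray(values, dtype=float)
    average = y.mean()
    rmsd = np.sqrt(np.mean((y - average) ** 2))
    slope, _intercept = np.polyfit(t, y, 1)  # least-squares straight line
    drift = slope * (t[-1] - t[0])           # change predicted over the whole run
    return average, rmsd, drift
```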

#### Temperature

Figure55: Plot of the temperature during the MD simulation
| Energy | Average | Err. Est. | RMSD | Tot-Drift |
|---|---|---|---|---|
| Temperature (K) | 297.94 | 0.0064 | 0.956382 | -8.60686e-05 |

Figure 55 shows the temperature during the MD simulation. It ranges between 294 K and 300.5 K, so it deviates only slightly from the average value of 297.94 K. Since the fluctuation is so small, the temperature of the system is quite stable, which means that it has reached an equilibrium.

#### Potential

Figure56: Plot of the potential during the MD simulation
| Energy | Average | Err. Est. | RMSD | Tot-Drift |
|---|---|---|---|---|
| Potential (kJ/mol) | -1.2442e+06 | 75 | 1052.26 | -477.246 |

Figure 56 shows the potential energy of the MD system. It ranges from -1.248e+06 kJ/mol to -1.24e+06 kJ/mol. Although this range of about 8000 kJ/mol seems large, it is less than 1 % of the average value, and the potential energy stays large and negative throughout. Since the potential energy includes the solvent, this indicates that the simulated system as a whole is stable. Such stability matters because the structure of a protein determines its function.

#### Total Energy

Figure57: Plot of the total energy during the MD simulation
| Energy | Average | Err. Est. | RMSD | Tot-Drift |
|---|---|---|---|---|
| Total Energy (kJ/mol) | -1.0211e+06 | 75 | 1292.04 | -477.313 |

The low potential energy already suggests that the total energy of the system is also low. Figure 57 shows that the total energy is higher than the potential energy by about 2.23e+05 kJ/mol, which is the kinetic energy, but it is still strongly negative. It ranges between -1.025e+06 kJ/mol and -1.017e+06 kJ/mol. That spread of about 8000 kJ/mol is similar to that of the potential energy, and the RMSD is even slightly larger (1292 kJ/mol compared to 1052 kJ/mol). Again, such a stable energy indicates a stable system, which suggests that the simulation ran correctly.
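The total energy is the sum of potential and kinetic energy, so the difference between the two averages in the tables is the average kinetic energy. As a rough consistency check (a sketch, not part of the original analysis), the equipartition theorem E_kin = (N_df / 2) R T can be solved for the number of degrees of freedom N_df using the average temperature:

```python
R_KJ_PER_MOL_K = 0.0083144626  # molar gas constant in kJ/(mol K)


def kinetic_energy(total_kj_mol: float, potential_kj_mol: float) -> float:
    """Kinetic energy as the difference between total and potential energy."""
    return total_kj_mol - potential_kj_mol


def degrees_of_freedom(kinetic_kj_mol: float, temperature_k: float) -> float:
    """Equipartition E_kin = (N_df / 2) * R * T, solved for N_df."""
    return 2.0 * kinetic_kj_mol / (R_KJ_PER_MOL_K * temperature_k)


ekin = kinetic_energy(-1.0211e6, -1.2442e6)  # averages from the tables
ndf = degrees_of_freedom(ekin, 297.94)
```

This gives E_kin = 223100 kJ/mol and N_df of roughly 1.8e+05. Because the averages are rounded, the estimate is only approximate.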

### Minimum distance between periodic boundary cells

Figure58: Minimum distance between periodic boundary cells

The shortest periodic distance is 2.01518 nm, at 1590 ps, between atoms 166 and 6569. In Figure 58 the minimum distance is shown by the black line. At the beginning the minimum distance is 0.25 nm, but during the following 4500 ps it rises to 4 nm. After 4500 ps it fluctuates around this value. From this we conclude that there is much more variation, and thus flexibility, in the protein at the beginning. Towards the end the minimum distance changes less but still fluctuates. This indicates that the protein remains flexible at the end of the simulation but no longer changes its shape as much as at the beginning.
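The periodic distance is the shortest distance between any atom of the protein and any atom of one of its periodic images. If it became smaller than the interaction cut-off, the protein could interact with its own image. A minimal sketch checks the 26 neighbouring images. It assumes a rectangular box and a whole molecule; gmx mindist -pi also handles triclinic boxes.

```python
import itertools

import numpy as np


def min_periodic_distance(coords, box):
    """Shortest distance between a molecule and its periodic images.

    Assumes a rectangular box (Lx, Ly, Lz) in nm and coordinates of a single
    molecule that is whole, i.e. not split across the box boundary.
    """
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the molecule itself
        image = coords + np.array(shift) * box
        diff = coords[:, None, :] - image[None, :, :]
        best = min(best, float(np.linalg.norm(diff, axis=-1).min()))
    return best
```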

### Root mean square fluctuations

RMSF for protein RMSF for C-alpha
Figure59: RMSF for protein
Figure60: RMSF for C-alpha

Figure 59 shows the RMSF for the whole protein. The beginning of the protein clearly shows the largest fluctuations, which indicates that this region is the most flexible. The middle part of the protein has no significantly high peaks, which means that it is rigid. Towards the end there is a peak that is a bit higher than the rest, which suggests that this region is also somewhat flexible. Figure 60 shows the same trend, but the peaks are not as high as in Figure 59. Since Figure 60 contains only the C-alpha atoms, the lower peaks suggest that part of the fluctuation in Figure 59 comes from the side chains, while the backbone is flexible in the same regions.
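The RMSF of an atom is the root mean square deviation of its position from its time-averaged position, computed after the trajectory has been fitted to a reference. A minimal NumPy sketch, assuming an already fitted trajectory array of shape (n_frames, n_atoms, 3):

```python
import numpy as np


def rmsf(traj):
    """Per-atom root mean square fluctuation of a fitted trajectory."""
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # time-averaged structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=-1)  # |x_i(t) - <x_i>|^2 per frame
    return np.sqrt(sq_dev.mean(axis=0))
```

Passing only the C-alpha columns, e.g. `rmsf(traj[:, ca_indices])` with a hypothetical index array `ca_indices`, gives a backbone profile like Figure 60.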

### Pymol analysis of average and bfactors

#### Protein

Figure61: Alignment of the mutated structure with the average structure
Figure62: Alignment of the mutated structure with the structure containing the bfactors
Figure63: Alignment of the average structure with the structure containing the bfactors

| | 1u5b/average (Figure61) | 1u5b/bfactors (Figure62) | bfactors/average (Figure63) |
|---|---|---|---|
| RMSD | 1.319 | 0.385 | 1.463 |

First we wanted to find out whether the mutation changes the structure, so we aligned the average structure with the original structure (Figure 61). The structures cannot be superposed very well, and at the beginning they seem to be completely different. A comparison with Figure 9, however, shows that this poor superposition is not caused by the mutation. Only the beginning is interesting. Here the original structure and the mutated structure point in two different directions, which is not the case for the unmutated average structure. In Figure 62 we superposed the original structure (1u5b) with the structure containing the B-factors. The superposition seems almost perfect, which is supported by the low RMSD value. In Figure 63 the B-factors are visualised. The beginning of the protein is colored red-orange and turquoise, so this part is the flexible one. Remarkably, this time the red part does not start at the very first residue but a few residues later, unlike in the other two cases. In addition, the other (right) side of the protein is colored light blue, indicating that this region is also somewhat flexible.
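The structure "containing the bfactors" stores one B-factor per atom, which can be derived from the fluctuations of the simulation. Assume the standard isotropic relation B = (8π²/3)·RMSF², with the RMSF converted from nm to Å. The conversion is then a one-liner. Whether our B-factor structure was produced exactly this way is an assumption.

```python
import math


def bfactor_from_rmsf(rmsf_nm: float) -> float:
    """Isotropic B-factor in Angstrom^2 from an RMSF given in nm.

    B = (8 * pi^2 / 3) * <u^2>, where <u^2> = RMSF^2 is the mean square
    fluctuation of the atom in three dimensions.
    """
    rmsf_angstrom = 10.0 * rmsf_nm  # 1 nm = 10 Angstrom
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2
```

An RMSF of 0.1 nm thus corresponds to a B-factor of about 26 Å². Because of the square, regions with larger fluctuations stand out strongly in a B-factor coloring.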

#### C-alpha

Figure64: Alignment of the mutated structure with the C-alpha atoms of the average structure
Figure65: Alignment of the mutated structure with the C-alpha atoms of the structure which contains the bfactors
Figure66: Alignment of the C-alpha atoms of the average structure with the C-alpha atoms of the structure containing the bfactors

| | 1u5b/average (Figure64) | 1u5b/bfactors (Figure65) | bfactors/average (Figure66) |
|---|---|---|---|
| RMSD | 1.106 | 0.301 | 1.112 |

In Figure 64 we first aligned the C-alpha atoms of the average structure with the original structure. There are large differences between the two structures, especially at the beginning and at the end of the protein. These differences are not caused only by the mutation, since there are also differences in Figure 12. At the beginning, however, the structure differs from the one in Figure 12, so we assume that this change is caused by the mutation. In Figure 65 the original structure is aligned with the reference structure containing the B-factors. The two structures are superposed very well and there seem to be no differences between them. The B-factors shown in Figure 66 indicate that the beginning of the protein (left side) has a high B-factor (temperature factor), so this part is flexible. Interestingly, some C-alpha atoms on the right side of the picture are colored light blue, meaning that this region of the protein is also somewhat flexible.

### Radius of gyration

Figure67: Radius of gyration during the MD simulation

Figure 67 shows the radius of gyration of our protein. It ranges from 2.25 nm to 2.4 nm. At the beginning it lies at about 2.4 nm, but then it decreases steadily until 6500 ps, where it reaches 2.25 nm. At 6500 ps it rises slightly to about 2.3 nm and fluctuates around this value for the rest of the simulation. Since the radius varies more in the first part of the simulation, the protein is more flexible there. Towards the end the radius of gyration changes less, but it still fluctuates a bit, indicating that the protein remains flexible.
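The radius of gyration is the mass-weighted root mean square distance of the atoms from the protein's centre of mass, so a smaller value means a more compact structure. A minimal sketch for a single frame, with coordinates and masses supplied by the caller:

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = (masses[:, None] * coords).sum(axis=0) / masses.sum()  # centre of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))
```

Applying it to every frame gives a time series like Figure 67.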

## Structural analysis

### Solvent accessible surface area

SAS over time per residue SAS over time per atom Solvent accessible surface
Figure68: Plot of the average solvent accessible surface over time per residue
Figure69: Plot of the average solvent accessible surface over time per atom
Figure70: Plot of the solvent accessible surface of the protein during the MD simulation

The first figure (Figure 68) shows the average SAS of each residue during the simulation. It ranges from 0.01 nm2 to 2 nm2, which is a large range. The most accessible region is at the beginning, since this peak is the highest one. After this peak the accessibility declines with each following residue until residue 100, where the surface seems to be inaccessible to solvent. According to the plot, further regions, especially in the center of the protein, are also inaccessible to solvent. Despite the many low values in the middle part, three high peaks show that some regions there are easily reachable. The end of the protein has no peaks as high as the one at the beginning, but many fairly high peaks, indicating that this region is again accessible to solvent. Figure 69 shows the same tendency: there are many more low values in the middle part of the protein and more high peaks at both ends. Interestingly, some peaks in the middle are as high as the one at the beginning of the protein, meaning that those regions are as accessible to solvent as the beginning. At the end of the protein, between atoms 6400 and 6500, two peaks are even higher than the one at the beginning, so some parts at the end are even more accessible. In the last figure (Figure 70) the SAS is divided into a hydrophilic (red) and a hydrophobic (black) part. The black line is slightly higher than the red one for nearly the whole simulation, so the exposed hydrophobic surface is larger than the hydrophilic one. Only at the end do both have the same value.

### Hydrogen bonds

protein and protein protein and water
Figure71a: Internal hydrogen bonds and pairs within 0.35 nm during the simulation
Figure72a: Hydrogen bonds and pairs within 0.35 nm with the surrounding solvents during the simulation
Figure71b: Internal hydrogen bonds during the simulation
Figure72b: Hydrogen bonds with the surrounding solvents during the simulation

| | Donors | Acceptors | Avg. # of H-bonds | Possible # of H-bonds |
|---|---|---|---|---|
| protein-protein | 595 | 1159 | 302.744 | 344802 |
| protein-water | 29467 | 30031 | 807.996 | 4.42462e+08 |

Figure 71a and Figure 72a show the hydrogen bonds and the pairs within 0.35 nm. As the figures and the table above show, there are many more hydrogen bonds between the protein and the solvent (~800) than inside the protein (~300). Interestingly, there are more pairs within 0.35 nm inside the protein than with the solvent. Since the number of pairs inside the protein changes less during the simulation, we assume that the core of the protein is much more stable than the surface. Although the hydrogen bonds seem not to fluctuate in these figures, Figure 71b and Figure 72b show that they vary a lot. The number of hydrogen bonds inside the protein ranges from 275 to 330, and the number of hydrogen bonds with the surrounding solvent ranges from 750 to 870. Both fluctuate strongly, but overall the number of internal hydrogen bonds first declines until 4000 ps and then rises again. The hydrogen bonds with the solvent behave in the opposite way: their number rises until 4000 ps and then declines. Since the hydrogen bonds vary so much, we assume that the protein is flexible during the whole simulation.

### Ramachandran plot

 Figure73: Ramachandran plot of our protein Figure74: General Ramachandran plot (http://en.wikipedia.org/wiki/File:Ramachandran_plot_general_100K.jpg)

Figure 73 shows the Ramachandran plot of our protein. Compared with the general Ramachandran plot (Figure 74), there are some similarities, although our plot has many more black regions. However, in our Ramachandran plot only the regions bordered in red in the general plot are filled with black dots, except for the middle of the right border. This means that our protein contains the usual angle combinations, just in much higher numbers. According to the plots, only the combination of Psi around 0 and Phi of 150-200 occurs exclusively in our protein.

## Analysis of dynamics and time-averaged properties

### RMSD matrix

Figure75: RMSD matrix of the structures of our protein during the simulation

Figure 75 shows the pairwise RMSD between the structures of our protein during the simulation. The diagonal is necessarily turquoise and blue, since every structure compared with itself has an RMSD of zero. In addition, many regions next to the diagonal are also turquoise. Since nearly the whole neighbourhood of the diagonal is turquoise, each simulated structure is very similar to the structures directly before and after it, so the structure changes only gradually during the simulation. Only around 6500 ps does a larger change seem to occur. After that, the square between 6500 ps and 10000 ps is partly blue, i.e. these structures are very similar to each other. Moreover, the structures between 0 ps - 800 ps and 6800 ps - 10000 ps are completely different from each other, since these regions are red.

### Cluster analysis

 Figure76: Visualisation of the cluster of structure groups Figure77: Plot of the RMSD values of the clusters

The program found 411 clusters. Figure 76 visualises the clustered structures of the protein. The plot in Figure 77 shows the distribution of the pairwise RMSD values, which range from 0.08 nm to 0.75 nm. An RMSD of 0.08 nm is very low, indicating very similar structures. Since the peak at this position is very small, only a few pairs of structures are this similar. Conversely, structures with an RMSD of 0.75 nm are not very similar, but again there are only a few such pairs, as the peak is very low. Most values lie at 0.13 nm and 0.45 nm, where the highest peaks are. These values are fairly low, indicating that most of the structures are similar, so there are no large changes in the structure during the simulation. Additionally we compared the first cluster with the second one. Aligning them gave an RMSD of 0.553 nm. This is interesting because higher RMSD values occur in the distribution, so the structures in clusters one and two must be quite similar.

### Internal RMSD

Figure78: Internal RMSD of our protein during the simulation

The internal RMSD values range from 0.05 nm to 0.48 nm, so there is a lot of fluctuation during the simulation. As Figure 78 shows, the values are very low at the beginning but then rise very quickly until 5800 ps, where the internal RMSD is 0.48 nm. Until the end of the simulation it declines only slightly and mainly fluctuates around 0.45 nm. Since there is much more variation before 5800 ps, the protein is much more flexible at the beginning of the simulation. There is still fluctuation after this point, however, so the protein stays flexible until the end. The value is also much higher at the end than at the beginning, so the interatomic distances deviate more and more from those of the reference structure.

# Discussion

## Mutation M82L

To find out whether the mutation at position 82 from methionine to leucine influences the structure or the function of the protein we compared the two simulations (wildtype and mutation).
By looking at the energy calculations we could not find any differences so the energies in the system are not changed because of the mutation.
Next we compared the minimum distance between periodic boundary cells, where the first two differences occur. Overall the two plots (original, mutation) look very similar, apart from the peak at 500 ps and the decline at 6000 ps in the plot of the mutated protein. At these two time points the unmutated protein behaves in the opposite way: at 500 ps its minimum distance declines, and at 6000 ps it still rises.
The RMSF values of the two proteins show the same phenomenon. Both follow the same trend, since in both cases the highest peak is at the beginning and there is also a small one at the end. However, the peak at the beginning has a height of 1.75 nm in the mutated protein and only 1.2 nm in the unmutated protein, so the mutation seems to increase the flexibility of the beginning of the protein.
The structures analysed in the next step show no large differences between the two simulated proteins. The B-factors, in contrast, are different: the end of the mutated protein is colored red at more positions (original, mutation), which means that it is more flexible. This again shows that the mutation increases flexibility but has no influence on the structure.
The radius of gyration is very similar for the two proteins except at two time points. The first is at about 500 ps, where the radius of the original structure declines whereas the mutated structure shows a peak. The second is at 6000 ps, where the unmutated protein shows a peak while the radius of the mutated protein declines. The minimum distance between periodic boundary cells already differs at these two time points, which shows that the mutation influences the movement of the protein at these two moments.
The average SAS per residue and per atom also shows that the mutation slightly changes the behaviour of the beginning of the protein. In both cases the beginning of the protein is much more accessible to solvent in the unmutated protein than in the mutated one.
In addition to the SAS, we compared the hydrogen bonds of the two proteins. The two plots (original, mutation) show that the trends in the number of internal hydrogen bonds are completely different. In the unmutated protein the number first declines until 2000 ps and then rises until the end of the simulation. In the mutated protein, in contrast, the number of hydrogen bonds fluctuates all the time. The same is true for the hydrogen bonds with the surrounding solvent. This shows that the mutation has a large influence on the movement of the protein.
To find out whether the mutation also has a large effect on the structure of the protein, we compared the Ramachandran plots. Most parts are very similar, except for the upper right and lower right corners, which are darker in the mutated protein. This may simply be caused by the different movement of the protein rather than by a fundamental change in the structure.
The comparison of the two RMSD matrices (original, mutation) was also very interesting, as it shows how much the structures change during the two simulations. We have to be careful, since the RMSD ranges of the two matrices differ, which makes it difficult to say in which matrix the structures are more similar. What we can say is that the turquoise regions look alike in both matrices, so the change between time steps seems to be the same. The only really interesting and very obvious point is that the structures between 200 ps - 600 ps and 6000 ps - 10000 ps do not agree at all, since this region is red and has an RMSD value of 0.911 nm.
Since there are two kinds of structures that are so different, there must be a high number of clusters containing similar structures with a low RMSD value. Additionally, there are clusters with a very high RMSD value of up to 0.9 nm. This is completely different from the clustered structures of the unmutated protein, where most of the clusters have an RMSD value of 0.3 nm - 0.4 nm, indicating that the structures are all quite similar.
The last point we compare is the internal RMSD (original, mutation). Both proteins have the same RMSD value at the beginning of the simulation, but the values of the unmutated protein rise much faster than those of the mutated one. This indicates that the unmutated protein changes much faster at the beginning of the simulation. The RMSD of the unmutated protein then keeps rising continuously to about 0.45 nm, whereas the RMSD of the mutated protein reaches only about 0.4 nm. This different trend again shows the influence of the mutation.
All in all, there are many differences between the mutated and the unmutated protein, indicating that the mutation especially affects the flexibility of the beginning of the protein. The general trend, however, is nearly always the same, so the structure and flexibility of the rest of the protein should be unchanged. We would say that a change in flexibility at the beginning of the protein alone is not serious, since the active site is in the middle part of the protein. The whole protein consists of two domains (protein), so this terminal part could be important for binding to the other domain. But since only the flexibility changes and not the structure, this binding should still be possible. We would therefore say that the mutation is not damaging.

## Mutation C264W

To find out whether the mutation at position 264 from cysteine to tryptophan influences the structure or the function of the protein we compared the simulation of the wildtype protein with the mutated protein.
By looking at the energy calculations we could not find any differences so the energies in the system are not changed because of the mutation.
Next we analysed the minimum distance between periodic boundary cells. The two plots (original, mutation) are very different. Both distances have the same value at the beginning. The distance in the unmutated protein then only fluctuates a bit around 0.3 nm, while the distance in the mutated protein immediately declines to 0.2 nm. After 900 ps the values slowly rise again to 4 nm, and then the distance only fluctuates around this value for the rest of the simulation. This is completely different from the unmutated protein, whose curve has two minima at 5000 ps and 6000 ps and declines again at the end of the simulation. Since the two curves are so different, the movements of the two proteins must differ as well.
The root mean square fluctuations (original, mutation) also differ a lot. In the unmutated protein the highest peak, about 1.25 nm, is at the very beginning of the protein. In the mutated protein the initial value is about 1 nm and then rises to the maximum of 1.25 nm at residue 20. So the maximal flexibility lies slightly further along the sequence in the mutated protein than in the unmutated one. The middle parts of the two proteins are very similar, since all peaks there are low, so this part shows no significant flexibility. The end is also quite similar; the peak is a bit higher in the mutated protein, so it is slightly more flexible, but not by much.
This different flexibility of the end of the protein can also be seen on the right side of the PyMOL analysis (original, mutation). The alpha helix at the end of the mutated protein is almost completely light blue, whereas the alpha helix of the unmutated protein has only a few light blue parts, indicating that the end of the mutated protein is more flexible. The different flexibility at the beginning of the protein is visible as well: the unmutated protein is completely red at the beginning, while the mutated one becomes red only after a few residues. As the pictures show, the two proteins differ not only in flexibility but also in structure. In the unmutated protein the beginning points in the same direction as in the reference protein 1u5b. In the mutated protein the beginning is more curved, so that it runs horizontally while the reference structure points upwards.

The next property we analysed is the radius of gyration. Both plots show the same trend except for one peak between 5000 ps and 6000 ps in the unmutated protein. In addition, the fluctuation in the unmutated protein is stronger, which shows that this protein changes its radius of gyration more often during the simulation than the mutated one.
The mutation also influences the solvent accessible surface (SAS) at the beginning of the protein. This is shown in the plots (original, mutation): the highest peak of the unmutated protein reaches 2.3 nm², whereas the highest peak of the mutated protein reaches only 2.0 nm². The SAS of the rest of the protein is nearly the same. Thus, because of the mutation at position 264, the beginning of the protein is less accessible to the solvent. The SAS per atom also differs between the two proteins. Overall the trend is the same, but the SAS values of the unmutated protein are higher most of the time, again indicating a higher solvent accessibility of the unmutated protein. The average SAS over the simulation seems to be very similar in both proteins, so the difference at the beginning has no large influence on the total SAS. In both proteins, the SAS does not change significantly during the simulation.
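To illustrate how a per-atom SAS can be estimated, the sketch below uses the Shrake-Rupley idea: each atom's van der Waals sphere is enlarged by a probe radius, test points are placed on that sphere, and the fraction of points not buried inside any neighbouring enlarged sphere gives the accessible area. This is a swapped-in textbook method; the tool used for the plots may use a different numerical scheme, so absolute values need not match. The probe radius of 0.14 nm (a common choice for water), the 200 test points and all function names are assumptions.

```python
import math

PROBE_RADIUS = 0.14  # nm, assumed water-probe radius

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-spiral method)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # z evenly spaced inside (-1, 1)
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def sasa_per_atom(coords, radii, n_points=200, probe=PROBE_RADIUS):
    """Shrake-Rupley estimate of the solvent accessible surface per atom (nm^2)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe  # sphere traced by the probe centre around atom i
        accessible = 0
        for p in unit:
            # test point on the enlarged sphere of atom i
            pt = [ci[k] + big_r * p[k] for k in range(3)]
            buried = False
            for j, (cj, rj) in enumerate(zip(coords, radii)):
                if j == i:
                    continue
                d2 = sum((pt[k] - cj[k]) ** 2 for k in range(3))
                if d2 < (rj + probe) ** 2:
                    buried = True
                    break
            if not buried:
                accessible += 1
        # accessible fraction times the area of the enlarged sphere
        areas.append(4.0 * math.pi * big_r ** 2 * accessible / n_points)
    return areas
```

An isolated atom is fully accessible, whereas a close neighbour hides part of its surface, which is the same effect that lowers the SAS peak at the beginning of the mutated protein.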
The average number of internal hydrogen bonds is nearly the same in both proteins, but their time course shows that the proteins differ. In the unmutated protein, the number of hydrogen bonds starts at about 310 and declines until 2000 ps, where only 290 hydrogen bonds exist. After this point, the number rises again, reaching 320 at the end of the simulation. The mutated protein starts with the same number of hydrogen bonds, but during the simulation the number fluctuates much more than in the unmutated protein, and at the end it has only 310 hydrogen bonds. The same differences occur for the hydrogen bonds with the surrounding solvent. This is expected: the internal hydrogen bonds change because the protein moves, and this movement also affects the hydrogen bonds to the solvent. That the hydrogen bonds evolve so differently in the two proteins demonstrates that the mutation changes the behaviour of the protein during the simulation.
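Hydrogen bonds in MD trajectories are usually counted with a purely geometric criterion. The sketch below assumes a donor-acceptor distance cutoff of 0.35 nm and a hydrogen-donor-acceptor angle cutoff of 30°, which we believe correspond to the defaults of `gmx hbond`; the analysis behind our plots may have used other settings. The function names are hypothetical, and real tools also handle the search for donor/hydrogen/acceptor triplets, which is omitted here.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def is_hbond(donor, hydrogen, acceptor, r_cut=0.35, angle_cut_deg=30.0):
    """Geometric H-bond test using the D-A distance and the H-D-A angle (nm)."""
    d_a = _sub(acceptor, donor)  # vector donor -> acceptor
    d_h = _sub(hydrogen, donor)  # vector donor -> hydrogen
    dist = _norm(d_a)
    if dist > r_cut:
        return False
    cos_angle = sum(x * y for x, y in zip(d_a, d_h)) / (dist * _norm(d_h))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle)) <= angle_cut_deg

def count_hbonds(triplets):
    """Number of (donor, hydrogen, acceptor) triplets that satisfy the criterion."""
    return sum(1 for d, h, a in triplets if is_hbond(d, h, a))
```

Evaluating `count_hbonds` for every frame yields a hydrogen-bond-versus-time curve; small structural rearrangements can push a triplet just across one of the cutoffs, which is why the counts fluctuate.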
The Ramachandran plots, which show the backbone dihedral angles (φ, ψ) and thus visualise the secondary structure elements of the proteins, are largely similar. However, on the right side of the plot, many angle combinations, and therefore structures, that occur in the unmutated protein are missing in the mutated one. We therefore conclude that the mutation influences not only the flexibility of the protein but also its structure.
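Each point in a Ramachandran plot is one residue's pair of backbone dihedral angles: φ is defined by the atoms C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The minimal sketch below computes a dihedral angle from four positions with the standard atan2 formula; the function names are our own and hypothetical.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from the surrounding backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Computing `phi_psi` for every residue in every frame and plotting ψ against φ gives the Ramachandran plot; regions that are never visited during a simulation show up as empty areas.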
To analyse the structure of the proteins more closely, we compare the RMSD matrices (original, mutation). Again we have to be careful because the two plots use different color scales, but the mutated protein clearly shows three large blocks of very similar structures during the simulation. This is different in the unmutated protein: there are also correlated structures during its simulation, but they are not partitioned as clearly.
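An RMSD matrix compares every frame of the trajectory with every other frame, so blocks of low values along the diagonal correspond to periods in which the structure stays similar. The minimal sketch below builds such a matrix. It assumes all frames are already superimposed; a full analysis would fit each pair of structures before computing the RMSD, which this sketch omits. The function names are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two frames with identical atom order (no fitting)."""
    sq = sum(sum((p[k] - q[k]) ** 2 for k in range(3)) for p, q in zip(a, b))
    return math.sqrt(sq / len(a))

def rmsd_matrix(frames):
    """Symmetric matrix with m[i][j] = RMSD between frame i and frame j."""
    n = len(frames)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rmsd(frames[i], frames[j])
    return m
```

Because the matrix is symmetric with zeros on the diagonal, only the upper triangle has to be computed.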
Consistent with this, the mutated protein has many clusters of structures with low RMSD values between 0.15 nm and 0.45 nm. In the unmutated protein, by contrast, most clusters have an RMSD value of 0.3 nm and only a few have an RMSD value below 0.2 nm. This shows that the structures of the unmutated protein develop continuously, while those of the mutated protein develop erratically.
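The clustering method behind these cluster RMSD values is not stated here. As an assumption, the sketch below implements the GROMOS clustering algorithm, which is one of the methods offered by `gmx cluster`: the frame with the most neighbours within an RMSD cutoff becomes the centre of a cluster, that cluster is removed, and the procedure repeats on the remaining frames. It works on a precomputed RMSD matrix; the cutoff value and all function names are hypothetical.

```python
def gromos_clusters(matrix, cutoff):
    """GROMOS-style clustering on a precomputed, symmetric RMSD matrix.

    Returns a list of clusters; each cluster lists frame indices with the
    central structure first.
    """
    remaining = set(range(len(matrix)))
    clusters = []
    while remaining:
        # neighbours of each remaining frame (including itself) within the cutoff
        neighbours = {
            i: [j for j in remaining if matrix[i][j] <= cutoff]
            for i in remaining
        }
        # central structure = frame with the most neighbours (lowest index on ties)
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = [centre] + sorted(j for j in neighbours[centre] if j != centre)
        clusters.append(members)
        remaining -= set(members)
    return clusters

def mean_intra_rmsd(matrix, members):
    """Average pairwise RMSD within one cluster (0.0 for a single frame)."""
    pairs = [(i, j) for idx, i in enumerate(members) for j in members[idx + 1:]]
    if not pairs:
        return 0.0
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)
```

With this kind of algorithm, a trajectory that stays in a few distinct conformations (like the three blocks of the mutated protein) yields several tight clusters with low internal RMSD, whereas a continuously drifting trajectory spreads its frames over clusters with larger internal RMSD.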
The last property we compared between the two proteins is the internal RMSD (original, mutation). Both RMSD values are 0.1 nm at the beginning. In the mutated protein, the value first rises continuously and then fluctuates around 0.45 nm, whereas the unmutated protein shows much more variation: its RMSD declines twice, first after 800 ps and again after 4000 ps. At the end it fluctuates around 4.0 nm. This again shows that the two proteins move differently during the simulation.
Taking all results into account, we can say that this mutation influences both the flexibility and the structure of the protein. Although the mutation is located at position 264, the main affected region is the beginning of the protein. The mutated protein is slightly more flexible than the unmutated one, but this flexibility appears a few residues after the very beginning rather than at the beginning itself. In addition to this change in flexibility, the end of the mutated protein is also more flexible than that of the unmutated protein. Because of the changes in these two regions, and especially because of the change in structure, we consider that this mutation could be deleterious. The terminal region of BCKDHA could be important for binding the beta subunit (BCKDHB) of the protein complex. Since the structure of this part has changed slightly, this binding might no longer occur, which would affect the function of the complex and lead to maple syrup urine disease.