Homology Modelling GLA
by Benjamin Drexler and Fabian Grandke
Introduction
In this task, we performed homology modelling of the protein α-galactosidase A with the programs MODELLER, SWISS-MODEL, iTasser and 3D-JIGSAW. Homology modelling relies on two assumptions. First, the structure of a protein is determined by its amino acid sequence. Second, the structure of a protein is more conserved than its amino acid sequence. Usually, homology modelling is applied to a protein whose structure is not known. In this case, several PDB structures of α-galactosidase A are available, so we are able to evaluate the models produced by the programs afterwards.
General
Template Selection
The following table lists the ten best hits of the HHpred search from Task 1. We used 3HG3 (97% identity), 1KTB (53%) and 3CC1 (26%) as templates for the modelling process. This selection covers a wide range of sequence identity, which allows us to evaluate how sequence identity influences the quality of the models.
PDB-ID | Name | Probability | E-value | P-value | Identity | Template |
---|---|---|---|---|---|---|
> 60% sequence identity | ||||||
3hg3_A | Alpha-galactosidase A | 1.0 | 0 | 0 | 97% | x |
> 40% sequence identity | ||||||
1ktb_A | Alpha-N-acetylgalactosaminidase | 1.0 | 0 | 0 | 53% | x |
< 40% sequence identity | ||||||
1uas_A | Alpha-galactosidase | 1.0 | 0 | 0 | 39% | |
3lrk_A | Alpha-galactosidase 1 | 1.0 | 0 | 0 | 32% | |
3a5v_A | Alpha-galactosidase | 1.0 | 0 | 0 | 35% | |
1szn_A | Alpha-galactosidase | 1.0 | 0 | 0 | 34% | |
3a21_A | Putative secreted alpha-galactosidase | 1.0 | 0 | 0 | 34% | |
3cc1_A | BH1870 protein | 1.0 | 0 | 0 | 26% | x |
3a24_A | Alpha-galactosidase | 1.0 | 0 | 0 | 14% | |
1zy9_A | Alpha-galactosidase | 1.0 | 2.2E-37 | 8.8E-42 | 14% |
Evaluation
The evaluation of the models consists of two parts: a visual comparison with an experimental structure and a numeric evaluation. The PDB structures 1R46 and 1R47 were used for the evaluation. 1R46 is a structure of human α-galactosidase A without galactose (apo), whereas 1R47 contains galactose (complexed). Both the visual comparison and the numeric evaluation were done with chain A of the structures only, because the modelling programs also modelled a single chain.
The differences between 1R46 and 1R47 are marginal (see figure 1), so we did the visual comparison with one structure only, namely 1R47.
The numeric evaluation involves the calculation of several scores.
RMSD
The root mean square deviation (RMSD) value between the model and the reference structure was calculated by the command line tool sap.
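For orientation, the RMSD over N paired atoms is the square root of the mean squared distance between the pairs. The following is a minimal sketch of that formula only. It assumes the two coordinate lists are already superposed and paired residue by residue. It is not a reimplementation of sap, which performs its own structural alignment and superposition, and the function name `rmsd` is chosen here for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumption: the coordinates are already superposed and paired,
    i.e. coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

The result is in the same unit as the input coordinates, which is Å for PDB files.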
The RMSD of the catalytic site was calculated with PyMOL. We used the annotation of the UniProt entry to determine the active-site residues, which are Asp170 and Asp231. We applied the following workflow:
- Import the reference structure, e.g. 1R47
- Select the residues of the active site, i.e. Asp170 and Asp231
- Expand the selection with modify -> expand -> by 6 Å, residues
- Rename this selection to "selection_ref"
- Import the model
- Align the model to the reference structure (align -> to molecule -> 1R47)
- Select the active-site residues of 1R47 again
- Expand the selection once again, but afterwards exclude the residues of 1R47 (modify -> exclude -> object -> 1R47), so that only residues of the model remain
- Rename this selection to "selection_model"
- Align "selection_model" to "selection_ref" with align -> to selection and retrieve the RMSD
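The "expand by 6 Å, residues" step of the workflow above can be approximated outside PyMOL as follows. This is a sketch under stated assumptions, not the PyMOL implementation: it reads the fixed-column ATOM records of a PDB file, considers one chain only, and returns every residue that has at least one atom within the cutoff of any atom of the given centre residues. The function names `parse_atoms` and `residues_within` are hypothetical helpers introduced for this example.

```python
import math


def parse_atoms(pdb_lines, chain="A"):
    """Return (residue number, (x, y, z)) for every ATOM record of one chain.

    Uses the fixed columns of the PDB format: chain ID in column 22,
    residue number in columns 23-26, coordinates in columns 31-54.
    """
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 54 and line[21] == chain:
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((resnum, xyz))
    return atoms


def residues_within(atoms, centre_residues, cutoff=6.0):
    """Residue numbers with at least one atom within `cutoff` Å of a centre residue."""
    centre_xyz = [xyz for resnum, xyz in atoms if resnum in centre_residues]
    selected = set()
    for resnum, xyz in atoms:
        if any(math.dist(xyz, centre) <= cutoff for centre in centre_xyz):
            selected.add(resnum)
    return sorted(selected)
```

Calling `residues_within(parse_atoms(lines), {170, 231})` on the lines of a structure file would give the residue numbers of a neighbourhood comparable to "selection_ref". The centre residues themselves are always part of the result.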
TMS
We used two resources to calculate the template modelling score (TMS): the command line tool TMS and the web server of the Zhang Lab. We report both scores because they differ from each other.
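As background, the TM-score of Zhang and Skolnick sums 1 / (1 + (d_i / d0)^2) over the aligned residue pairs and divides by the length of the target, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below evaluates this formula for one given superposition. The published score is the maximum over all superpositions, so this function is not a replacement for either tool used here. The name `tm_score` and the lower bound of 0.5 Å on d0 for very short chains are assumptions of this example.

```python
def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distances in Å between aligned residue pairs (usually C-alpha atoms).
    target_length: number of residues of the target (reference) structure.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula is only defined for chains longer than 15 residues")
    # Length-dependent distance scale; the 0.5 Å floor is a guard for short chains.
    d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the target length and not by the number of aligned pairs, unaligned residues lower the score. A perfect match over the full length gives 1.0.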
Calculation of Models
MODELLER
MODELLER is a program that produces three-dimensional protein structures by homology (comparative) modelling. The user has to provide the sequence of the protein to be modelled as well as the structure and sequence of at least one related protein, which is used as the template. MODELLER uses all atoms of the template protein except the hydrogen atoms. We used MODELLER as described in the tutorial Using Modeller for TASK 4. To do so, we had to align both sequences and convert the alignment into PIR format. This alignment is given as input together with the template PDB file. Unfortunately, the input has to be provided as a Python file. In addition to the pairwise approach, we used a multiple sequence alignment as the basis for the model. For this, we created an alignment of the sequences listed in the Multiple_Sequence_Alignments section of this page, then added the target sequence and inspected the alignment manually. The inspection showed that the sequences aligned very well in general, with the exception of 3LRK_A and 3CC1_A. These two sequences were therefore removed and the remaining sequences were realigned. Both the supervised and the unsupervised alignment were used as input for MODELLER.<ref name=modeller>http://salilab.org/modeller/</ref>
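The conversion into PIR format can be sketched as follows. Each entry consists of a `>P1;code` line, a header line with ten colon-separated fields, and the aligned sequence terminated by `*`. This is an illustration based on our reading of the MODELLER alignment format, not the script used for this task: the helper name `pir_entry` is hypothetical, the optional header fields (protein name, source, resolution, R-factor) are left empty, and the residue range of the template has to be supplied by the user.

```python
def pir_entry(code, aligned_sequence, structure=False,
              first_res="", chain="", last_res="", width=75):
    """Format one entry of a PIR alignment file.

    code: identifier of the entry, e.g. the PDB ID of the template.
    aligned_sequence: sequence including gap characters ('-').
    structure: True for a template with known structure, False for the target.
    first_res, last_res, chain: residue range of the template in its PDB file.
    """
    kind = "structureX" if structure else "sequence"
    # Ten fields: type, code, first residue, chain, last residue, chain,
    # then four optional fields that this sketch leaves empty.
    fields = [kind, code, str(first_res), chain, str(last_res), chain, "", "", "", ""]
    body = aligned_sequence + "*"
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;" + code, ":".join(fields)] + wrapped) + "\n"
```

A template entry and a target entry written one after the other into the same file form the pairwise alignment. The sequences must have the same length including gaps.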
Pairwise Alignments
In this section, we used a pairwise alignment between the template (i.e. 3HG3, 1KTB or 3CC1) and the target as the input for MODELLER. All three models match the structure of 1R47 fairly well (see figure 2). The model based on 3HG3 seems to be the best (see figure 2A), closely followed by the model based on 1KTB (see figure 2B). The largest deviations with respect to the reference structure can be observed in the model based on 3CC1 (see figure 2C), especially in the coil regions of the protein.
The numeric evaluation confirms these observations. The RMSD values and the TMS of the 3HG3 and 1KTB models are very close in all columns. The TMS of the 3CC1 model does not differ much either, but the corresponding RMSD values are significantly higher. Interestingly, the difference of about 40 percentage points in sequence identity between 3HG3 and 1KTB hardly affects the quality of the model. In contrast, the smaller additional drop in sequence identity between 1KTB and 3CC1 leads to a clearly observable loss in model quality. It is also remarkable that, for 3HG3 and 1KTB, the RMSD values of the catalytic site are lower than the overall RMSD values. This suggests that these models are more accurate in the active site than in the rest of the structure.
Template | Apo (1R46): TMS (command line) | Apo (1R46): TMS (webserver) | Apo (1R46): RMSD (Å) | Apo (1R46): RMSD catalytic site (Å) | Complexed (1R47): TMS (command line) | Complexed (1R47): TMS (webserver) | Complexed (1R47): RMSD (Å) | Complexed (1R47): RMSD catalytic site (Å) |
---|---|---|---|---|---|---|---|---|
3HG3 | 0.141 | 0.2743 | 0.498 | 0.326 | 0.1413 | 0.2746 | 0.512 | 0.366 |
1KTB | 0.140 | 0.2793 | 0.901 | 0.439 | 0.1413 | 0.2739 | 0.888 | 0.437 |
3CC1 | 0.1397 | 0.2670 | 2.864 | 3.436 | 0.1397 | 0.2665 | 2.853 | 3.405 |
Multiple Sequence Alignments
PDB-ID | Unsupervised | Supervised | Identity | Comment |
---|---|---|---|---|
3LX9_A | x | x | 99% | |
3GXP_A | x | x | 99% | |
3H53_A | x | x | 99% | |
3HG3_A | x | x | 97% | |
3IGU_A | x | x | 54% | |
1KTB_A | x | x | 53% | |
1UAS_A | x | x | 39% | |
3LRK_A | x | | 34% | Removed from the supervised alignment due to low sequence identity; caused large gaps in the alignment. |
3CC1_A | x | | 28% | Removed from the supervised alignment due to low sequence identity; caused large gaps in the alignment. |
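The identity values depend on how identity is defined. As an illustration, the sketch below counts identical residues over the alignment columns in which neither sequence has a gap. This is one common convention and an assumption of this example. The tools used for this page may normalise differently, for example by the length of the shorter sequence, so the numbers above need not be reproduced exactly. The function name `percent_identity` is chosen here for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns in which either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Applied to every template row of a multiple alignment against the target row, this gives a quick check of which sequences fall far below the others.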
iTasser
Figure 1 shows that iTasser takes an amino acid sequence as input and tries to retrieve template proteins from the PDB. In the next step, fragments from the templates are reassembled into a complete model. In the last step, the model is reassembled again, taking energy calculations into account. Additionally, the biological function is predicted, but this was not of interest for this task.<ref name=itasser1>http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html</ref>
We used the iTasser-server in two different ways:
- Standard parameters: the protein sequence is given as input and the program searches the PDB for templates. The proteins found are used as templates to predict the structure.
- PDB-ID as input: a template PDB-ID is given as input together with the amino acid sequence. The program takes all available information into account and uses it to calculate the structure.
Because the iTasser server has very limited capacity and accepts only one job submission at a time, the results of the second approach are not yet available. The standalone version is not an option, because it has a size of about 10 GB and does not work properly.
SWISS-MODEL
We used the SWISS-MODEL server with two different options:
- Automated Mode: A template sequence is given as input. As no further information is given, the model is created directly from the amino acid sequence. This method should only be used if the sequence identity between target and template is greater than 50%.
- Aligned Mode: A pairwise alignment of the template and target sequences is given as input. We created our alignments with the online version of ClustalW2 from EBI.
The following templates were selected:
Template | Identity | Z-score (Automated Mode) | Z-score (Aligned Mode) |
---|---|---|---|
3hg3_A | 97% | 0 | -0.415 |
1ktb_A | 53% | -2.742 | -12.996 |
3cc1_A | 26% | Error¹ | -14.046 |
¹The sequences are too different to create a useful model (26% identity). For the automated mode, a sequence identity of at least 50% is recommended.
Evaluation of Models
MODELLER
Numeric Evaluation
Template | Apo (1R46): TMS | Apo (1R46): RMSD (Å) | Apo (1R46): RMSD catalytic site (Å) | Complexed (1R47): TMS | Complexed (1R47): RMSD (Å) | Complexed (1R47): RMSD catalytic site (Å) |
---|---|---|---|---|---|---|
3HG3 | 0.141 | 0.498 | 0.326 | 0.1413 | 0.512 | 0.366 |
1KTB | 0.140 | 0.901 | 0.439 | 0.1413 | 0.888 | 0.437 |
3CC1 | 0.1397 | 2.864 | 3.436 | 0.1397 | 2.853 | 3.405 |
Comparison to Experimental Structure
iTasser
Numeric Evaluation
Comparison to Experimental Structure
SWISS-MODEL
Template 3HG3
Numeric Evaluation
Mode | Apo (1R46): TMS (command line) | Apo (1R46): TMS (webserver) | Apo (1R46): RMSD (Å) | Apo (1R46): RMSD catalytic site (Å) | Complexed (1R47): TMS (command line) | Complexed (1R47): TMS (webserver) | Complexed (1R47): RMSD (Å) | Complexed (1R47): RMSD catalytic site (Å) |
---|---|---|---|---|---|---|---|---|
Aligned | 0.1411 | 0.2729 | 0.485 | 0.279 | 0.1412 | 0.2731 | 0.489 | 0.290 |
Automated | 0.1411 | 0.2729 | 0.485 | 0.277 | 0.1412 | 0.2731 | 0.489 | 0.291 |
Comparison to Experimental Structure
Template 1KTB
Numeric Evaluation
Mode | Apo (1R46): TMS (command line) | Apo (1R46): TMS (webserver) | Apo (1R46): RMSD (Å) | Apo (1R46): RMSD catalytic site (Å) | Complexed (1R47): TMS (command line) | Complexed (1R47): TMS (webserver) | Complexed (1R47): RMSD (Å) | Complexed (1R47): RMSD catalytic site (Å) |
---|---|---|---|---|---|---|---|---|
Aligned | 0.1598 | 0.2638 | 0.943 | 5.073 | 0.1606 | 0.2636 | 0.932 | 6.409 |
Automated | 0.1361 | 0.2669 | 0.981 | 0.417 | 0.1368 | 0.2672 | 0.974 | 0.404 |
Comparison to Experimental Structure
Template 3CC1
Numeric Evaluation
Mode | Apo (1R46): TMS (command line) | Apo (1R46): TMS (webserver) | Apo (1R46): RMSD (Å) | Apo (1R46): RMSD catalytic site (Å) | Complexed (1R47): TMS (command line) | Complexed (1R47): TMS (webserver) | Complexed (1R47): RMSD (Å) | Complexed (1R47): RMSD catalytic site (Å) |
---|---|---|---|---|---|---|---|---|
Aligned | 0.1302 | 0.2436 | 3.279 | 7.107 | 0.1300 | 0.2442 | 3.802 | 7.357 |
Automated | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Comparison to Experimental Structure
Discussion
References
<references />