Difference between revisions of "Task 5: Homology Modeling"

Revision as of 21:45, 27 August 2013

1A6Z chain A was used as the modeling target for all three methods.

Modeller

We used Modeller to create models based on a single template and also multiple templates.

Single template


**Table 1:** Template structures and their sequence identity to the target, as computed by BLAST. The DOPE score, RMSD and GDT score are given as quality measures. The models were created from pairwise sequence alignments computed with dynamic programming (std alignment), pairwise sequence alignments with additional secondary structure information (2d alignment) and manually curated alignments (curated alignment).
Template	Seq. identity	std alignment			2d alignment			curated alignment
		DOPE score	RMSD	GDT score	DOPE score	RMSD	GDT score	DOPE score	RMSD	GDT score
1QVO_A	39%	-27772	3.647	0.6241	-27169	4.994	0.5653
1S7X_A	29%	-19941	15.806	0.3355	-18667	18.099	0.2509
1CD1_A	21%	-19034	18.066	0.3640	-24213	5.640	0.4697

</figtable>

<xr id="Modeller single"/> lists the selected templates and the Modeller results for the different template structures and alignment methods. In addition to the standard pairwise sequence alignment based on dynamic programming, we used Modeller's align2d() method, which improves the alignment by including secondary structure information of the template. The method tries to place gaps outside secondary structure segments. This is especially useful for alignments between more distantly related sequences, because those alignments usually contain more gaps than alignments between closely related sequences. We also tried to improve the alignments manually.

As Modeller quality score, we chose the DOPE score, a statistical potential that was optimized for the assessment of model quality. The DOPE score has an arbitrary scale, but scores for structures of the same protein are comparable and can be used to select the best model from a collection of structures. The lower the score, the better the model. In addition to the DOPE score, we computed the RMSD and the GDT score. The RMSD is the root-mean-square deviation between all pairs of corresponding atoms in two superimposed structures; the lower the RMSD, the better. For the GDT score, the fraction of target residues that lie within a distance cutoff of their counterpart in the model is computed for four cutoffs and then averaged. Normally, 1, 2, 4 and 8 Å are used as distance thresholds. The GDT score ranges between 0 and 1 (higher is better), with random superpositions of unrelated structures having a score of 0.1 to 0.2.
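As a rough illustration of the two measures, the following Python sketch computes the RMSD and a simplified GDT score from paired coordinates. It assumes that model and target are already superimposed and provide one coordinate per residue (for example the C-alpha atom). The function names and the toy coordinates are made up for this example. Real GDT implementations additionally search over many superpositions to maximise the coverage at each cutoff, which is not done here.

```python
import math


def distances(model, target):
    """Per-residue distances between paired, already superimposed coordinates."""
    if len(model) != len(target):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(model, target)]


def rmsd(dists):
    """Root-mean-square deviation over all paired distances."""
    return math.sqrt(sum(d * d for d in dists) / len(dists))


def gdt(dists, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues within each cutoff (simplified GDT, 0 to 1)."""
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Hypothetical toy data: four residues, deviations of 0, 0.5, 3 and 10 Angstrom.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 3.0, 0.0), (3.0, 0.0, 10.0)]

dists = distances(model, target)
print(f"RMSD = {rmsd(dists):.3f}, GDT = {gdt(dists):.3f}")
```

In this toy case a single residue that is 10 Å off dominates the RMSD, while the GDT score only registers it as one residue outside all four cutoffs.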

**Figure 1:** Superposition of the target 1A6Z_A (green), the template 1QVO_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
a) classical pairwise sequence alignment	b) inclusion of secondary structure information in the alignment

</figtable>

<xr id="pymol modeller 1QVO"/> shows a visualisation of the two models (purple) created from the template 1QVO_A (red), the closest homolog of the target (green). The first model a) is clearly better than the second: its secondary structure elements match the target quite well, whereas the positions of the alpha helices in the second model deviate more from the target. This agrees with the RMSD values in Table 1 (3.647 vs. 4.994).

**Figure 2:** Superposition of the target 1A6Z_A (green), the template 1S7X_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
a) classical pairwise sequence alignment	b) inclusion of secondary structure information in the alignment

</figtable>

The models created from the template 1S7X_A are worse than those from the more closely related 1QVO_A. The 3D representation in <xr id="pymol modeller 1S7X"/> shows several regions in both models where the secondary structure elements could not be superimposed correctly onto the reference.

**Figure 3:** Superposition of the target 1A6Z_A (green), the template 1CD1_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
a) classical pairwise sequence alignment	b) inclusion of secondary structure information in the alignment

</figtable>

Including the secondary structure information in the alignment improved only the model of the most distant homolog, 1CD1_A. <xr id="pymol modeller 1CD1"/> shows that the second model b) is comparable to the models created from 1QVO_A, whereas one end of model a) (upper left) sticks out from the protein. The alignments between the target sequence and the two more closely related template sequences, 1QVO_A and 1S7X_A, are probably already quite good, so that including secondary structure information could not improve them.

The DOPE score agrees well with both the RMSD and the GDT score: models with a lower DOPE score tend to have a lower RMSD (positive correlation) and a higher GDT score (negative correlation). The scores differ a bit in some cases, but all in all, they agree that the model created from 1QVO_A and the standard alignment is the best. This is not surprising, since the 3D visualisations show that 1QVO_A is already very similar to the reference, whereas the other two templates differ much more.
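The relationship between the three measures can be checked directly on the six single-template models from Table 1. The sketch below computes the Pearson correlation coefficient by hand; the helper name `pearson` is our own choice. With only six models, the result should be read as a rough indication, not as a statistical test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Values from Table 1, in the order:
# 1QVO_A std, 1QVO_A 2d, 1S7X_A std, 1S7X_A 2d, 1CD1_A std, 1CD1_A 2d
dope = [-27772, -27169, -19941, -18667, -19034, -24213]
rmsd = [3.647, 4.994, 15.806, 18.099, 18.066, 5.640]
gdt = [0.6241, 0.5653, 0.3355, 0.2509, 0.3640, 0.4697]

r_dope_rmsd = pearson(dope, rmsd)  # expected sign: positive (both lower-is-better)
r_dope_gdt = pearson(dope, gdt)    # expected sign: negative (GDT is higher-is-better)
print(f"DOPE vs RMSD: {r_dope_rmsd:+.2f}, DOPE vs GDT: {r_dope_gdt:+.2f}")
```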

Multiple templates

We also created models using more than one template in a single modeling step. For this, we defined three sets of template structures: one with close homologs, one with distant homologs and one mixed set.

**Table 2:** The three different sets used as templates for Modeller: two sets of close and distant homologs and a mixed set.
close homology		distant homology		mixed
Template	Seq. identity	Template	Seq. identity	Template	Seq. identity
1QVO_A	39%	3HUJ_C	23%	1QVO_A	39%
1ZAG_A	36%	1CD1_A	21%	1CD1_A	21%
1RJZ_D	34%	1VZY_A	14%

</figtable>

<xr id="multiple sets"/> specifies the three sets.

**Table 3:** Results of the modeling with multiple templates. We computed models using two structures as templates and also using three structures.
	close homology		distant homology		mixed homology
Template	1QVO_A, 1ZAG_A	1QVO_A, 1ZAG_A, 1RJZ_D	3HUJ_C, 1CD1_A	3HUJ_C, 1CD1_A, 1VZY_A	1QVO_A, 1CD1_A
DOPE score	-28073	-27460	-25967	-20588	-25894
RMSD	3.432	2.431	4.130	7.741	3.974
GDT score	0.6553	0.7638	0.5607	0.3814	0.5846
Pymol visualisation	Visualisation of the target (green) and the model created from 1QVO_A and 1ZAG_A (purple).	Visualisation of the target (green) and the model created from 1QVO_A, 1ZAG_A and 1RJZ_D (purple).	Visualisation of the target (green) and the model created from 3HUJ_C and 1CD1_A (purple).	Visualisation of the target (green) and the model created from 3HUJ_C, 1CD1_A and 1VZY_A (purple).	Visualisation of the target (green) and the model created from 1QVO_A and 1CD1_A (purple).

</figtable>

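Using more than one template improves the quality of the models compared to the single-template models. However, more templates are not always better. Judged by the DOPE score, we got the best results with two templates, because adding a third template led to a worse DOPE score in both the close and the distant set. The RMSD and GDT score confirm this for the distant set, but for the close set they favour the model built from three templates (RMSD 2.431, GDT score 0.7638). Surprisingly, two templates with low sequence identity to the target led to a good model with an RMSD of 4.130, which is nearly as good as the model created with the mixed set (RMSD 3.974).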
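To see which template set each quality measure prefers, the values from Table 3 can be ranked per measure. The following sketch does only that; the dictionary layout and the variable names are our own and not part of any Modeller output.

```python
# Values copied from Table 3 (multiple-template models).
models = {
    "1QVO_A + 1ZAG_A": {"dope": -28073, "rmsd": 3.432, "gdt": 0.6553},
    "1QVO_A + 1ZAG_A + 1RJZ_D": {"dope": -27460, "rmsd": 2.431, "gdt": 0.7638},
    "3HUJ_C + 1CD1_A": {"dope": -25967, "rmsd": 4.130, "gdt": 0.5607},
    "3HUJ_C + 1CD1_A + 1VZY_A": {"dope": -20588, "rmsd": 7.741, "gdt": 0.3814},
    "1QVO_A + 1CD1_A": {"dope": -25894, "rmsd": 3.974, "gdt": 0.5846},
}

# DOPE and RMSD: lower is better. GDT: higher is better.
best_by_dope = min(models, key=lambda name: models[name]["dope"])
best_by_rmsd = min(models, key=lambda name: models[name]["rmsd"])
best_by_gdt = max(models, key=lambda name: models[name]["gdt"])

for measure, name in [("DOPE", best_by_dope), ("RMSD", best_by_rmsd), ("GDT", best_by_gdt)]:
    print(f"best by {measure}: {name}")
```

The DOPE score selects the two-template close set, while RMSD and GDT score select the three-template close set, so the choice of the "best" model depends on the measure used.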

Swiss-Model

We used Swiss-Model to create models using 1QVO_A, 1S7X_A and 1CD1_A as templates.

**Table 4:** Overview of the Swiss-Model results for the three different templates.
	1QVO_A	1S7X_A	1CD1_A
Seq. identity	39%	29%	21%
Z-score	-1.977	-2.005	-2.707
RMSD	2.847	2.757	3.604
GDT score	0.6774	0.7086	0.6121
Pymol visualisation	Visualisation of the target (green), the template 1QVO_A and the model (purple).	Visualisation of the target (green), the template 1S7X_A and the model (purple).	Visualisation of the target (green), the template 1CD1_A and the model (purple).
Anolea and Gromos energy

</figtable>

Swiss-Model outputs a raw score and also a Z-score that represents an absolute measure of the model quality. The Z-score relates the model's raw score to the scores obtained by high-resolution X-ray structures and thus gives an estimate of how likely it is that the model has a quality comparable to an experimental structure. A low-quality model is indicated by a strongly negative Z-score, which means that the raw score is several standard deviations lower than the scores of experimental structures of similar size (see Swiss-Model help).
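The idea behind such a Z-score can be sketched in a few lines. The function below only shows the generic normalisation against a reference distribution; the reference mean and standard deviation are hypothetical numbers chosen for illustration, and the way Swiss-Model actually derives its reference values from experimental structures is not reproduced here.

```python
def z_score(raw_score, reference_mean, reference_std):
    """Number of standard deviations the raw score lies from the reference mean."""
    if reference_std <= 0:
        raise ValueError("reference_std must be positive")
    return (raw_score - reference_mean) / reference_std


# Hypothetical example: experimental structures of similar size score
# 0.75 +/- 0.10 on average, and the model has a raw score of 0.55.
z = z_score(0.55, reference_mean=0.75, reference_std=0.10)
print(f"Z-score = {z:.2f}")
```

A model whose raw score lies two standard deviations below the reference mean, as in this made-up case, would be in the same range as the Z-scores reported in Table 4.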

Swiss-Model also provides plots that help to analyse the local energy of the model. For this, the atomic empirical mean force potential (Anolea) and the Gromos simulation package are used. Both are used to calculate the energy of each amino acid in the sequence. The two plots show the protein sequence on the x-axis and the calculated energy of each residue on the y-axis. A negative energy corresponds to a favorable energy environment for an amino acid and a positive energy represents an unfavorable energy environment.

I-TASSER

I-TASSER was used to create models from two different templates. Because of I-TASSER's very long runtime of over 60 h for one protein, and because we were only allowed to run one job at a time, we only created two models.

	1QVO_A	1CD1_A
Seq. identity	39%	21%
C-score	1.73
RMSD	3.062
GDT score	0.6719
Pymol visualisation	Visualisation of the target (green), the template 1QVO_A and the model (purple).	Visualisation of the target (green), the template 1CD1_A and the model (purple).

</figtable>

I-TASSER uses threading in the first step to search for several template structures with high secondary structure similarity to the target, in addition to the user-specified template(s). Fragments of those structures are then reassembled to create several models for the target. The models are clustered and the lowest-energy models are reported. We only selected the first and best model, to keep the results comparable to those from Modeller and Swiss-Model, which both report only a single model.

I-TASSER computes a confidence score (C-score) as a quality measure for the created models. It ranges between -5 and 2, with a higher score indicating higher confidence (see cscore.txt).

Discussion

The GDT score correlates with the RMSD for most models, but there are some exceptions where the GDT score is much worse than the RMSD would suggest.
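One reason why the two measures can disagree is that the RMSD averages squared distances, whereas the GDT score counts residues within fixed cutoffs. The sketch below uses two synthetic per-residue distance profiles (not taken from our models) and a simplified GDT without superposition search to show how the ranking can flip.

```python
import math


def rmsd(dists):
    """Root-mean-square deviation of per-residue distances."""
    return math.sqrt(sum(d * d for d in dists) / len(dists))


def gdt(dists, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT: mean fraction of residues within each cutoff."""
    return sum(sum(d <= c for d in dists) / len(dists) for c in cutoffs) / len(cutoffs)


# Synthetic profile A: every one of 100 residues is uniformly 3 Angstrom off.
profile_a = [3.0] * 100
# Synthetic profile B: 90 residues fit well, 10 residues (e.g. a loose terminus) are far off.
profile_b = [0.5] * 90 + [20.0] * 10

for name, profile in [("A", profile_a), ("B", profile_b)]:
    print(f"profile {name}: RMSD = {rmsd(profile):.3f}, GDT = {gdt(profile):.3f}")
```

Profile A has the lower RMSD but also the lower GDT score, because none of its residues falls within the 1 Å and 2 Å cutoffs; profile B is penalised heavily by the RMSD for a few outliers although most of it fits well.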

