Difference between revisions of "Homology Based Structure Predictions Hemochromatosis"

From Bioinformatikpedia
(1k5nA)
(Riddle of the task: riddle removed)
 
(37 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
[[Hemochromatosis|Hemochromatosis]]>>[[Homology Based Structure Predictions Hemochromatosis|Task 4: Homology based structure predictions]]
 
 
== Riddle of the task ==
 
 
After endless battles and deadly traps you have finally reached the tomb's final chamber. As you enter it you notice that there is no sign of the treasures that were promised by the old map you found months ago. Suddenly you hear a loud noise behind you and a solid wall of stone blocks the only entrance into the room. You are trapped! You look around and notice something on the walls. On the left wall are four runes in an ancient language. Luckily it's the same language as the notes on the map you deciphered, and they translate into four single letters:
 
* C N O I
 
 
On the opposite wall you can see four simple symbols:
 
* a triangle
 
* a square
 
* a circle
 
* and a diamond (German: Raute)
 
 
After further investigation you notice that the four symbols can be pushed into the wall, but you don't know what would happen and which one(s) to push.
 
 
What do you do?
 
 
 
 
Some hints:
 
*<font style="color: black; background-color: black">You only have to push one button and only once.</font>
 
   
 
== Short Task Description ==
 
== Short Task Description ==
Line 189: Line 169:
 
=== 1k5nA ===
 
=== 1k5nA ===
   
<figure id="1K5NbasedModels">
+
<figure id="1K5NSimple">
[[File:hemochromatosis_modeller_Simple1K5N.gif|200px|thumb|left|model built by using 1K5N, simple modeller alignment, superimposed with 1A6Z]][[File:hemochromatosis_modeller_2d-1K5N.gif|200px|thumb|center|model built by using 1K5N, 2d modeller alignment, superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_Simple1K5N.gif|200px|thumb|left|'''Figure 1:''' model built by using 1K5N, simple modeller alignment, superimposed with 1A6Z]]</figure><figure id="1K5N2d">[[File:hemochromatosis_modeller_2d-1K5N.gif|200px|thumb|center|'''Figure 2:''' model built by using 1K5N, 2d modeller alignment, superimposed with 1A6Z]]</figure>
</figure>
 
 
<br>
 
<br>
 
Both models are nearly identical and superimpose well with chain A of 1A6Z. The model generated from the 2d alignment of Modeller does not improve the TM-score, TMAlign score and RMSD together. The reason is that the alignment method that incorporates structure generates nearly the same alignment as the "normal" one from Modeller. In addition, the align2d() call aligns the first occurring amino acid far ahead of the following residues. This might be an error, which we tried to avoid by manual correction.
Line 199: Line 178:
   
 
<br style="clear:both;">
 
<br style="clear:both;">
<figure id="1ZS8basedModels">
+
<figure id="1ZS8Simple">
[[File:hemochromatosis_modeller_Simple-1ZS8.gif|200px|thumb|left|model built by using 1ZS8, simple modeller alignment, superimposed with 1A6Z]][[File:hemochromatosis_modeller_2d-1ZS8.gif|200px|thumb|center|model built by using 1ZS8, 2d modeller alignment, superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_Simple-1ZS8.gif|200px|thumb|left|'''Figure 3:''' model built by using 1ZS8, simple modeller alignment, superimposed with 1A6Z]]</figure><figure id="1ZS82d">[[File:hemochromatosis_modeller_2d-1ZS8.gif|200px|thumb|center|'''Figure 4:''' model built by using 1ZS8, 2d modeller alignment, superimposed with 1A6Z]]
 
</figure>
 
</figure>
 
<br>
 
<br>
At first glance both structures look the same, but as Table TODO shows, the model based on the 2d alignment has a higher TM-Score, TMAlign score and RMSD score. The reason is that the 2d alignment (which incorporates structure) introduces more gaps but aligns the structure a bit better, which improves the model compared to the one based on the normal Modeller alignment.
+
At first glance both structures look the same, but as <xr id="modeller_scores_native"/>/<xr id="modeller_scores_complex"/> show, the model based on the 2d alignment has a higher TM-Score, TMAlign score and RMSD score. The reason is that the 2d alignment (which incorporates structure) introduces more gaps but aligns the structure a bit better, which improves the model compared to the one based on the normal Modeller alignment.
   
 
=== 2iadB ===
 
=== 2iadB ===
<figure id="2IADbasedModels">
+
<figure id="2IADSimple">
[[File:hemochromatosis_modeller_Simple-2IAD.gif|200px|thumb|left|model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z]][[File:hemochromatosis_modeller_2d-2IAD.gif|200px|thumb|center|model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_Simple-2IAD.gif|200px|thumb|left|'''Figure 5:''' model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z]]</figure><figure id="2IAD2d">[[File:hemochromatosis_modeller_2d-2IAD.gif|200px|thumb|center|'''Figure 6:''' model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z]]</figure>
  +
<figure id="2IADSimpleNoBars">[[File:hemochromatosis_modeller_Simple-2IADNoALNBars.gif|200px|thumb|left|'''Figure 7:''' model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z]]</figure><figure id="2IAD2dNoBars">[[File:hemochromatosis_modeller_2d-2IAD_noALNBars.gif|200px|thumb|center|'''Figure 8:''' model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z]]
<br>
 
[[File:hemochromatosis_modeller_Simple-2IADNoALNBars.gif|200px|thumb|left|model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z]][[File:hemochromatosis_modeller_2d-2IAD_noALNBars.gif|200px|thumb|center|model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z]]
 
 
</figure>
 
</figure>
  +
<br style="clear:both;">
<br>
 
 
As indicated by the alignment lines, and also visible in the figures without them, these models are not predicted as well as the ones presented before. This is also reflected in the low TMAlign and high RMSD scores (compared to the other models). It is nevertheless noticeable that the beta sheets and the two helices have nearly the right orientation. The TM-score of 0.5 that is commonly taken to indicate homology is therefore nearly reached.
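The 0.5 threshold refers to the TM-score. As a rough sketch, the score for one fixed superposition can be computed from the distances between aligned residue pairs and the length of the target. The function names `tm_d0` and `tm_score` and the input format are assumptions made for this illustration; the actual TM-Score and TM-Align programs additionally search for the superposition that maximises the score.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstrom) of the TM-score."""
    if target_length < 22:
        # For very short chains d0 becomes tiny or undefined;
        # this sketch simply does not handle that case.
        raise ValueError("target_length must be at least 22 for this sketch")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE given superposition.

    distances     -- distances (angstrom) between aligned residue pairs
    target_length -- number of residues in the target (normalisation length)
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical input: 272 aligned residues, all exactly d0 apart.
    length = 272
    print(tm_score([tm_d0(length)] * length, length))
```

Residues that are not aligned contribute nothing to the sum, so a short alignment (as for 2IAD) lowers the score even if the aligned part superimposes well.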
   
Line 220: Line 198:
   
 
<br style="clear:both;">
 
<br style="clear:both;">
<figure id="3DBXbasedModels">
+
<figure id="3DBXSimple">
[[File:hemochromatosis_modeller_Simple-3DBX.gif|200px|thumb|left|model built by using 3DBX, simple modeller alignment, superimposed with 1A6Z]][[File:hemochromatosis_modeller_2d_3DBX.gif|200px|thumb|center|model built by using 3DBX, 2d modeller alignment, superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_Simple-3DBX.gif|200px|thumb|left|'''Figure 9:''' model built by using 3DBX, simple modeller alignment, superimposed with 1A6Z]]</figure><figure id="3DBX2d">[[File:hemochromatosis_modeller_2d_3DBX.gif|200px|thumb|center|'''Figure 10:''' model built by using 3DBX, 2d modeller alignment, superimposed with 1A6Z]]
 
</figure>
 
</figure>
  +
<br>
 
 
Both of the 3DBX based models largely resemble the original HFE protein, which is also indicated by their TMAlign scores. However, in this case the 2d alignment does not improve but worsens the prediction (compared to the simple alignment). The small shift of the first residues in the 2d alignment (relative to the normal alignment) seems to cause this.
   
Line 229: Line 207:
   
 
<figure id="MSA1basedModel">
 
<figure id="MSA1basedModel">
[[File:hemochromatosis_modeller_MSA1.gif|200px|thumb|left|model built by using MSA1 superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_MSA1.gif|200px|thumb|left|'''Figure 11:''' model built by using MSA1 superimposed with 1A6Z]]
 
</figure>
 
</figure>
<br>
 
 
The model resembles the original protein, and the MSA shows a common gap at positions 19-42 of the target sequence. This is the same gap as in the simple alignment for 3DBX, on which the best model (based on TMAlign score) has been built.
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 239: Line 216:
   
 
<figure id="MSA2basedModel">
 
<figure id="MSA2basedModel">
[[File:hemochromatosis_modeller_MSA2.gif|200px|thumb|left|model built by using MSA2 superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_MSA2.gif|200px|thumb|left|'''Figure 12:''' model built by using MSA2 superimposed with 1A6Z]]
 
</figure>
 
</figure>
<br>
 
 
This model is worse than the MSA1 based one, although the 3DBX and 1ZS8 alignments resulted in good models. This means that MSAs do not always improve the built models, but they can (for example if one would otherwise choose a "bad" template such as 2IAD).
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 249: Line 225:
 
<br style="clear:both;">
 
<br style="clear:both;">
 
<figure id="MSA3basedModel">
 
<figure id="MSA3basedModel">
[[File:hemochromatosis_modeller_MSA3.gif|200px|thumb|left|model built by using MSA3 superimposed with 1A6Z]][[File:hemochromatosis_modeller_MSA3NoALNBars.gif|200px|thumb|center|model built by using MSA3 superimposed with 1A6Z]]
+
[[File:hemochromatosis_modeller_MSA3.gif|200px|thumb|left|'''Figure 13:''' model built by using MSA3 superimposed with 1A6Z]]</figure><figure id="MSA3basedModelNoBars">[[File:hemochromatosis_modeller_MSA3NoALNBars.gif|200px|thumb|center|'''Figure 14:''' model built by using MSA3 superimposed with 1A6Z]]
 
</figure>
 
</figure>
 
<br>
 
<br>
Line 356: Line 332:
 
| style="border-style: solid; border-width: 0 0 0 0" |3.679 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |3.679 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 3:''' Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 459: Line 435:
 
| style="border-style: solid; border-width: 0 0 0 0" |3.637 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |3.637 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 4:''' Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 487: Line 463:
 
| align="right" | [[File:hemo_swiss_1k5nA_zscore.png|thumb|200px|QMEAN statistics for SwissModel 1k5nA.]]
 
| align="right" | [[File:hemo_swiss_1k5nA_zscore.png|thumb|200px|QMEAN statistics for SwissModel 1k5nA.]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''TODO''' Summarized output from SwissModel for 1k5nA.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 5:''' Summarized output from SwissModel for 1k5nA.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 493: Line 469:
 
The alignment from SwissModel for 1k5nA spans residues 26 to 299 of HFE. It has a sequence identity of 38.71% and contains only three single gaps in 1k5nA's sequence, none of which breaks up a secondary structure. The estimated QMEAN Z-score is -1.92. <xr id="swiss_1k5nA_stats"/> shows the summary of the quality assessment provided by SwissModel. The error rates per residue fluctuate rapidly, with peaks of over 4 angstrom. The anolea estimation shows two long regions with unfavorable scores, from residue 69 to 110 and from 128 to 206.
   
The three major helices of the MHC I domain are well modelled and aligned. The same is true for the sheets of the same domain. The sheets of the C1 domain are not as good: some are missing or too short, and they are also not as well aligned.
+
The three major helices of the MHC I domain (front of the figure) are well modeled and aligned. The same is true for the sheets of the same domain. The sheets of the C1 domain (background of the figure) are not as good: some are missing or too short, and they are also not as well aligned.
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 510: Line 486:
 
| align="right" | [[File:hemo_swiss_1zs8A_zscore.png|thumb|200px|QMEAN statistics for SwissModel 1zs8A.]]
 
| align="right" | [[File:hemo_swiss_1zs8A_zscore.png|thumb|200px|QMEAN statistics for SwissModel 1zs8A.]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''TODO''' Summarized output from SwissModel for 1zs8A.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 6:''' Summarized output from SwissModel for 1zs8A.
 
|}
 
|}
 
</figtable>
 
</figtable>
   
Similar to 1k5nA, the alignment spans from 27 to 298, with a sequence identity of 29.56%. It contains three single gaps and one gap of 3 residues within 1zs8A that don't split a secondary structure, but also one bigger gap of 7 residues that breaks up a helix. The QMEAN Z-score of -2.88 is worse than that for 1k5nA, which isn't surprising given the drop of about 9 percentage points in sequence identity. The error rates are also worse, with a maximum of 8 angstrom, and they almost never go below 1 angstrom. The anolea graph exhibits the same unfavorable regions as in 1k5nA, though the first region is even worse and the second a bit shorter.
+
The results for 1zs8A are summarized in <xr id="swiss_1zs8A_stats"/>. Similar to 1k5nA, the alignment spans from 27 to 298, with a sequence identity of 29.56%. It contains three single gaps and one gap of 3 residues within 1zs8A that don't split a secondary structure, but also one bigger gap of 7 residues that breaks up a helix. The QMEAN Z-score of -2.88 is worse than that for 1k5nA, which isn't surprising given the drop of about 9 percentage points in sequence identity. The error rates are also worse, with a maximum of 8 angstrom, and they almost never go below 1 angstrom. The anolea graph exhibits the same unfavorable regions as in 1k5nA, though the first region is even worse and the second a bit shorter.
  +
  +
This time SwissModel seems to have had more problems with the MHC I domain. The sheets are not as well aligned as in the 1k5nA model. The first helix (lower front) is quite well aligned, but the second one (left background) is completely missing and the third is a bit too short. The C1 domain is missing one sheet, but is better aligned than in the previous model (1k5nA).
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 531: Line 509:
 
| align="right" | [[File:hemo_swiss_2iadB_zscore.png|thumb|200px|QMEAN statistics for SwissModel 2iadB.]]
 
| align="right" | [[File:hemo_swiss_2iadB_zscore.png|thumb|200px|QMEAN statistics for SwissModel 2iadB.]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''TODO''' Summarized output from SwissModel for 2iadB.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 7:''' Summarized output from SwissModel for 2iadB.
 
|}
 
|}
 
</figtable>
 
</figtable>
   
In contrast to the other three templates, the alignment for 2iadB is rather short (111 to 299). It contains one single gap and two 2-residue gaps, none of which breaks a secondary structure in 2iadB. The short alignment and the sequence identity of only 21.76% might be the cause of the very bad QMEAN Z-score of -3.42. The error rates don't go as high as for 1zs8A, but they never drop below 2 angstrom for the first half of the alignment and improve only slightly in the second half. This is also reflected in the anolea distribution, as there are almost no favorable regions during the first half. The second half (starting around 222) gets much better, which correlates with the regions from the other templates.
+
In contrast to the other three templates, the alignment for 2iadB is rather short (111 to 299). It contains one single gap and two 2-residue gaps, none of which breaks a secondary structure in 2iadB. The short alignment and the sequence identity of only 21.76% might be the cause of the very bad QMEAN Z-score of -3.42 (see <xr id="swiss_2iadB_stats"/> for a summary of the results). The error rates don't go as high as for 1zs8A, but they never drop below 2 angstrom for the first half of the alignment and improve only slightly in the second half. This is also reflected in the anolea distribution, as there are almost no favorable regions during the first half. The second half (starting around 222) gets much better, which correlates with the regions from the other templates.
  +
  +
Due to the short alignment the MHC I domain starts halfway through and is quite bad. The sheets, while modeled correctly, are quite misaligned. Both helices are too short and not as well aligned as in the other models. The C1 domain's sheets are now all modeled, but the alignment is even worse.
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 552: Line 532:
 
| align="right" | [[File:hemo_swiss_3dbxA_zscore.png|thumb|200px|QMEAN statistics for SwissModel 3dbxA.]]
 
| align="right" | [[File:hemo_swiss_3dbxA_zscore.png|thumb|200px|QMEAN statistics for SwissModel 3dbxA.]]
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''TODO''' Summarized output from SwissModel for 3dbxA.
+
|+ style="caption-side: bottom; text-align: left" |<font size=1>'''Table 8:''' Summarized output from SwissModel for 3dbxA.
 
|}
 
|}
 
</figtable>
 
</figtable>
   
3dbxA has the longest alignment of all templates with a length of 275 residues (24 to 298 in HFE), but it also has the lowest sequence identity with only 18.93 percent. The alignment has only two single gaps in 3dbxA's sequence, but one of them splits a sheet and the other one a helix. Nevertheless it has a QMEAN Z-score of -2.8, which is even slightly better than for 1zs8A. The error rates average around 3 angstrom for the first half of the alignment and slightly improve for the second. The anolea graph again shows the two unfavorable regions from 1k5nA and 1zs8A, though this time the second one is almost neutral and the first is slightly longer.
+
3dbxA (SwissModel output shown in <xr id="swiss_3dbxA_stats"/>) has the longest alignment of all templates with a length of 275 residues (24 to 298 in HFE), but it also has the lowest sequence identity with only 18.93 percent. The alignment has only two single gaps in 3dbxA's sequence, but one of them splits a sheet and the other one a helix. Nevertheless it has a QMEAN Z-score of -2.8, which is even slightly better than for 1zs8A. The error rates average around 3 angstrom for the first half of the alignment and slightly improve for the second. The anolea graph again shows the two unfavorable regions from 1k5nA and 1zs8A, though this time the second one is almost neutral and the first is slightly longer.
  +
  +
The sheets of the MHC I domain, while sometimes a bit too long, are very well aligned. The first helix is a bit too short, contains a break, and is quite misaligned (especially in the first half). The second and third helices are quite well aligned, though the third one got a bit too long. The C1 domain is also quite good, but one of the sheets got split into two shorter ones which are also quite misaligned to their "parent sheet".
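To make the relation between sequence identity and QMEAN Z-score across the four templates easier to see, the values quoted in the paragraphs above can be put side by side. The following is only a small sketch: the dictionary layout and the helper name `rank_by` are assumptions for this illustration, while the numbers are the ones reported by SwissModel above.

```python
# Sequence identity (%) and QMEAN Z-score per template, as quoted in the text.
templates = {
    "1k5nA": {"identity": 38.71, "qmean_z": -1.92},
    "1zs8A": {"identity": 29.56, "qmean_z": -2.88},
    "2iadB": {"identity": 21.76, "qmean_z": -3.42},
    "3dbxA": {"identity": 18.93, "qmean_z": -2.80},
}


def rank_by(metric):
    """Template names sorted from best to worst for the given metric.

    For both metrics a higher value is better
    (a QMEAN Z-score closer to zero is more favorable).
    """
    return sorted(templates, key=lambda name: templates[name][metric], reverse=True)


by_identity = rank_by("identity")
by_qmean = rank_by("qmean_z")

if __name__ == "__main__":
    print("by identity:", by_identity)
    print("by QMEAN Z: ", by_qmean)
```

The two rankings agree on 1k5nA as the best template, but 3dbxA moves from last place by identity to second place by QMEAN Z-score, which matches the observation that it scores slightly better than 1zs8A despite its lower identity.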
   
 
<br style="clear:both;">
 
<br style="clear:both;">
Line 563: Line 545:
   
   
TM-Score again seems to have problems correctly aligning both sequences and therefore provides no meaningful data (TM-Score, GDT-TS, and GDT-HA in <xr id="swiss_scores_native"/>). When comparing the TM-Score from TM-Align to the corresponding scores for the Modeller results (cf. <xr id="modeller_scores_native"/> and <xr id="modeller_scores_complex"/>), you can see that SwissModel performs neither better nor worse than Modeller on a general basis.
+
TM-Score again seems to have problems correctly aligning both sequences and therefore provides no meaningful data (TM-Score, GDT-TS, and GDT-HA in <xr id="swiss_scores_native"/> and <xr id="swiss_scores_complex"/>). When comparing the TM-Score from TM-Align to the corresponding scores for the Modeller results (cf. <xr id="modeller_scores_native"/> and <xr id="modeller_scores_complex"/>), you can see that SwissModel performs neither better nor worse than Modeller on a general basis.
   
 
Surprisingly, 3dbxA provides the best model (TM-Align 0.89) despite its low sequence identity and outperforms the models from Modeller. The weighted RMSD of 1.1 in particular is quite good compared to the other models.
Line 613: Line 595:
 
| style="border-style: solid; border-width: 0 0 0 0" |1.111 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |1.111 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 9:''' Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 660: Line 642:
 
| style="border-style: solid; border-width: 0 0 0 0" |1.203 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |1.203 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 10:''' Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
  +
|}
  +
</figtable>
  +
  +
<br style="clear:both;">
  +
  +
=== Amino acid modification sites analysis ===
  +
  +
  +
The annotated amino acid modification sites for HFE show similar conservation results (cf. <xr id="swiss_align_conserved"/>) as previously in [[Sequence Alignments Hemochromatosis#Conservation_of_important_positions|Task 2]]. With the exception of 3dbxA both disulfide bonds are conserved as well as the first glycosylation site. The latter two glycosylation sites are substituted. The alignment with 2iadB does not include the first glycosylation site.
  +
  +
An examination with PyMol shows that all disulfide bonds, with the exception of the first bond in 3dbxA, can be established. This is in accordance with the conservation in the alignments. A quick check of the PDB structure of 3dbxA revealed that there is no substitute disulfide bond in the original protein. This suggests that this bond is not that important for the formation of the MHC I domain in HFE.<br>
  +
The glycosylation sites in UniProt are annotated as "potential", which means that there is no experimental evidence for their importance or even existence. Nevertheless the first one (residue 110) seems to be highly conserved within HFE and its homologs. The surface accessibility and orientation of this residue are also quite well conserved in 1k5nA and 3dbxA. In 1zs8A the position and orientation are quite shifted. The second glycosylation site (residue 130) is modeled and aligned very well in 1k5nA, 1zs8A, and 3dbxA, but is a bit off in the 2iadB model. Only the 1k5nA model aligns the third one (residue 234) about right. In the other three models it's way off, with 3dbxA taking second place.
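"Potential" N-linked glycosylation sites are commonly recognised from the sequence motif N-X-S/T, where X is any residue except proline. As a hedged illustration of how the substitutions listed in the table below could be checked for such a motif, the following sketch scans a sequence for it. The function name `find_sequons` and the example sequence are made up for this illustration; the sequence is not taken from HFE or any of the templates.

```python
import re

# N followed by any residue except P, followed by S or T.
# The lookahead keeps overlapping motifs (e.g. "NNTS") from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of asparagines inside an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence: N at 2 and 9 match, N at 5 is followed by P.
    print(find_sequons("MNATNPSANGT"))
```

A substituted asparagine, as for the second and third sites in all four templates, removes the motif at that position, so a template residue there cannot act as an N-linked glycosylation site regardless of how well it is aligned.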
  +
  +
<figtable id="swiss_align_conserved">
  +
{| class="wikitable" style="width: 50%; margin: 1em 1em 1em 0; border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000"
  +
|-
  +
! style="text-align:left; border-style: solid; border-width: 0 0 2px 0" |Site
  +
! style="text-align:left; border-style: solid; border-width: 0 0 2px 0" |1k5nA
  +
! style="text-align:left; border-style: solid; border-width: 0 0 2px 0" |1zs8A
  +
! style="text-align:left; border-style: solid; border-width: 0 0 2px 0" |2iadB
  +
! style="text-align:left; border-style: solid; border-width: 0 0 2px 0" |3dbxA
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Glycosylation 1 (N at 110)
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |---
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (A)
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Glycosylation 2 (N at 130)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (G)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (G)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (T)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (G)
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Glycosylation 3 (N at 234)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (E)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (D)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (K)
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (P)
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Disulfide bond 1 (C at 124)
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Disulfide bond 1 (C at 187)
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |subst. (F)
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Disulfide bond 2 (C at 225)
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
|-
  +
| style="border-style: solid; border-width: 0 0 0 0" |Disulfide bond 2 (C at 282)
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
| style="border-style: solid; border-width: 0 0 0 0" |conserved
  +
|-
  +
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 11:''' Conservation of HFE's amino acid modification sites for the different SwissModel alignments.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 677: Line 724:
   
 
<figure id="itasser_2iadB_1">
 
<figure id="itasser_2iadB_1">
[[File:hemo_model_itasser_2iadB_1.png|thumb|200px|I-Tasser 2iadB (red) superimposed on 1a6zA (green).]]
+
[[File:hemo_model_itasser_2iadB_1.png|thumb|200px|'''Figure 15:''' I-Tasser 2iadB (red) superimposed on 1a6zA (green).]]
 
</figure>
 
</figure>
   
 
<figure id="itasser_swiss_2iadB_1">
 
<figure id="itasser_swiss_2iadB_1">
[[File:Hemo_model_itasser_swiss_2iadB_1.png|thumb|200px|I-Tasser 2iadB (red) and SwissModel 2iadB (pink) superimposed on 1a6zA (green).]]
+
[[File:Hemo_model_itasser_swiss_2iadB_1.png|thumb|200px|'''Figure 16:''' I-Tasser 2iadB (red) and SwissModel 2iadB (pink) superimposed on 1a6zA (green).]]
 
</figure>
 
</figure>
   
 
I-Tasser provided 5 models for 2iadB. The best one with a C-score of -1.46 and an estimated TM-Score of 0.53±0.15 was selected for the evaluation.
 
I-Tasser provided 5 models for 2iadB. The best one with a C-score of -1.46 and an estimated TM-Score of 0.53±0.15 was selected for the evaluation.
   
Compared to all other models this one '''looks''' the worst (see <xr id="itasser_2iadB_1"/>). A whole helix (upper front in the figure) is not predicted. This is especially weird as this helix is quite good modelled in the SwissModel for 2iadB (<xr id="itasser_swiss_2iadB_1"/>). Additionally the majority of the sheets is missing.<br>
+
Compared to all other models this one '''looks''' the worst (see <xr id="itasser_2iadB_1"/>). A whole helix (upper front in the figure) is not predicted. This is especially weird as this helix is modeled in the SwissModel for 2iadB (cf. <xr id="itasser_swiss_2iadB_1"/>). Additionally the majority of the sheets is missing.<br>
 
On the other hand, I-Tasser correctly predicted a part of the transmembrane helix around residues 307 to 330 (lower right corner in the figure), which is not included in the PDB file for 1a6z. I-Tasser also predicted a helix at the beginning of HFE in the signal peptide region. This region is not contained in the PDB file either, nor is it annotated as a helix in UniProt, but it was also predicted as a helix by PsiPred and ReProf in [[Sequence-Based Predictions Hemochromatosis|Task 3]].
   
Line 695: Line 742:
   
 
<figure id="itasser_1k5nA_1">
 
<figure id="itasser_1k5nA_1">
[[File:hemo_model_itasser_1k5nA_1.png|thumb|200px|I-Tasser 1k5nA (red) superimposed on 1a6zA (green).]]
+
[[File:hemo_model_itasser_1k5nA_1.png|thumb|200px|'''Figure 17:''' I-Tasser 1k5nA (red) superimposed on 1a6zA (green).]]
 
</figure>
 
</figure>
   
Line 707: Line 754:
   
   
The scores for both models are shown in <xr id="itasser_scores_native"/> and <xr id="itasser_scores_complex"/>. TM-Align calculates a TM-Score of around 0.79 for the best model for 2iadB. Although this is much higher than the other methods achieved for 2iadB it shows that 2iadB seems to be a quite bad template for HFE as even the incorporation of the pdb structures of HFE (1a6z, 1de4) in the modelling process didn't raise the score above many of the other models based on low identity templates. The same is true for 1k5nA. Overall the I-Tasser results are pretty underwhelming given that it is considered one of the best methods out there.
+
The scores for both models are shown in <xr id="itasser_scores_native"/> and <xr id="itasser_scores_complex"/>. TM-Align calculates a TM-Score of around 0.79 for the best model for 2iadB. Although this is much higher than the other methods achieved for 2iadB it shows that 2iadB seems to be a quite bad template for HFE as even the incorporation of the pdb structures of HFE (1a6z, 1de4) in the modelling process didn't raise the score above many of the other models based on low identity templates. The same is true for 1k5nA. In addition both models failed to establish the first disulfide bond (see <xr id="swiss_align_conserved"/> for positions), but both succeeded with the second bond. Overall the I-Tasser results are pretty underwhelming given that it is considered one of the best methods out there.
   
 
<figtable id="itasser_scores_native">
 
<figtable id="itasser_scores_native">
Line 736: Line 783:
 
| style="border-style: solid; border-width: 0 0 0 0" |2.333 (over 270 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |2.333 (over 270 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 12:''' Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 767: Line 814:
 
| style="border-style: solid; border-width: 0 0 0 0" |2.161 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |2.161 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 13:''' Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 793: Line 840:
   
 
<figure id="3dJigsawAll">
 
<figure id="3dJigsawAll">
[[File:hemo_model_jigsaw_all.png|200px|thumb|Jigsaw models superimposed on 1a6zA. Model1 (red), Model2 (blue), Model3 (yellow), Model4 (pink), and Model5 (cyan).]]
+
[[File:hemo_model_jigsaw_all.png|200px|thumb|'''Figure 18:''' Jigsaw models superimposed on 1a6zA. Model1 (red), Model2 (blue), Model3 (yellow), Model4 (pink), and Model5 (cyan).]]
 
</figure>
 
</figure>
   
Line 803: Line 850:
   
   
It is not surprising that all 5 models have about the same scores (cf. <xr id="jigsaw_scores_native"/> and <xr id="jigsaw_scores_complex"/>) as they almost look alike. Although they can compete with the best models from the other methods none of them are really astonishing. The TM-Score (TM-Align) and weighted RMSD are both lower than several of the provided models (e.g. 3dbxA).
+
It is not surprising that all 5 models have about the same scores (cf. <xr id="jigsaw_scores_native"/> and <xr id="jigsaw_scores_complex"/>) as they almost look alike. Although they can compete with the best models from the other methods none of them are really astonishing. The TM-Score (TM-Align) and weighted RMSD are both lower than several of the provided models (e.g. 3dbxA). What's more, none of the models could establish either of the two disulfide bonds mentioned before in <xr id="swiss_align_conserved"/>.
   
 
<figtable id="jigsaw_scores_native">
 
<figtable id="jigsaw_scores_native">
Line 856: Line 903:
 
| style="border-style: solid; border-width: 0 0 0 0" |2.350 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |2.350 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 14:''' Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 911: Line 958:
 
| style="border-style: solid; border-width: 0 0 0 0" |2.157 (over 272 atoms)
 
| style="border-style: solid; border-width: 0 0 0 0" |2.157 (over 272 atoms)
 
|-
 
|-
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''TODO''': Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP..
+
|+ style="caption-side: bottom; text-align: left" | <font size=1>'''Table 15:''' Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.
 
|}
 
|}
 
</figtable>
 
</figtable>
Line 920: Line 967:
   
   
  +
Our results show that a low sequence identity does not necessarily make a bad template. In fact our best model was based on 3dbxA, which had by far the lowest sequence identity (16% to 19%, depending on the alignment). This suggests that the structure of HFE is far more conserved than its sequence, as even distant relatives provide good templates. With such a low identity, however, the alignment method becomes even more important, as pairing residues is no longer trivial.
TODO: make text :P
 
  +
  +
When looking at the scores (TM-Scores, GDT scores, RMSD) it seems that, with a few exceptions, the TM-Score, GDT-TS score, and GDT-HA score correlate quite well, although the GDT-HA score deviates from this most often. The RMSD seems to be more independent of the other three. Keep in mind, though, that these observations are based on the failed TM-Score alignments, since TM-Align only outputs a TM-Score (no GDT scores).
   
  +
In the end SwissModel seems to have had the best performance for our protein and accomplished good results with low identity templates.
* HFE structure more conserved than sequence
 
* low identity =/= bad model
 
* alignment important
 
* correlation
 

Latest revision as of 09:17, 5 June 2012

Short Task Description

Detailed description: Homology based structure predictions

In this task we want to assess the quality of 3D models built with homology information. For this we have employed several model building methods:

  • Modeller
  • SwissModel
  • I-Tasser
  • 3D-Jigsaw

The models were then evaluated by eye, TM-Score, TM-Align, and SAP.


Regarding the extra task:

  • Extra diligence task: define a radius of 6 Angstrom around the catalytic centre and calculate the all atom RMSD in that region

As there is no defined active site for our protein, we couldn't do it.
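
For reference, had an active site been annotated, the calculation could be sketched roughly as follows. This is only an illustration: the function name `local_rmsd`, the toy coordinates, and the assumption that model and reference are already superimposed with equivalent atoms listed in the same order are ours and not part of the task protocol.

```python
import math

def local_rmsd(ref_atoms, model_atoms, centre, radius=6.0):
    """All-atom RMSD over reference atoms within `radius` angstrom of `centre`.

    Assumes both structures are already superimposed and that the two
    lists contain equivalent atoms in the same order (hypothetical input).
    """
    sq_sum, n = 0.0, 0
    for ref, mod in zip(ref_atoms, model_atoms):
        if math.dist(ref, centre) <= radius:
            sq_sum += math.dist(ref, mod) ** 2
            n += 1
    if n == 0:
        raise ValueError("no atoms within the given radius")
    return math.sqrt(sq_sum / n)

# Toy coordinates (made up): two atoms near the centre, one far away.
ref = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
mod = [(0.0, 1.0, 0.0), (3.0, 0.0, 1.0), (25.0, 0.0, 0.0)]
print(round(local_rmsd(ref, mod, centre=(0.0, 0.0, 0.0)), 3))
```

The third atom lies outside the 6 angstrom sphere and is therefore ignored, no matter how far it deviates in the model.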

Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.

PDB templates

In order to find templates for our models we performed several searches for homologs with COMA and HHPred. We also reused the sequences from Task 2. However none of these methods yielded homologs with a sequence identity above 40% (except 1a6z which is HFE itself) that could be mapped to a PDB structure. The best results for COMA and HHPred are listed in <xr id="coma_t"/> and <xr id="hhpred_t"/> respectively. Therefore we could not generate models with a sequence identity above 80% and the 40%-80% range was limited to its lower bound. In addition to those shown below, we also used 2iad_B (P01921, 21.10% identity) from task 2.

<figtable id="coma_t">

PDB ID e-Value Identities Positives
1a6z_A 1.00E-63 100% 100%
1t7v_A 1.20E-63 34% 62%
3nwm_A 1.60E-57 30% 57%
1frt_A 4.90E-65 28% 66%
2wy3_A 2.80E-63 26% 66%
3ov6_A 2.70E-55 20% 67%
1u58_A 1.00E-54 19% 59%
3d2u_A 7.50E-60 16% 70%
3dbx_A 9.70E-59 16% 71%
3it8_D 3.40E-52 15% 65%
Table 1: Top 10 results (based on e-Value) from the COMA search sorted by Identities.

</figtable>

<figtable id="hhpred_t">

PDB ID e-Value Identities Similarity
1a6z_A 1.80E-69 100% 1.623
1k5n_A 2.80E-68 40% 0.725
1s7q_A 5.80E-78 37% 0.655
1t7v_A 7.40E-68 36% 0.702
3p73_A 1.10E-69 35% 0.638
3bev_A 1.00E-69 34% 0.684
2yf1_A 2.40E-74 32% 0.617
1zs8_A 7.30E-68 30% 0.553
2wy3_A 3.30E-68 29% 0.496
1cd1_A 1.60E-67 21% 0.394
Table 2: Top 10 results (based on e-Value) from the HHPred search sorted by Identities.

</figtable>
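
The identity ranges mentioned above (below 40%, 40%-80%, above 80%) can be illustrated with a small sketch. The identities are copied from the HHPred table, plus 3dbx_A from the COMA table and 2iad_B from Task 2; the function name and the choice to count exactly 40% as part of the 40%-80% range are assumptions made for this example.

```python
# Sequence identities (%) to HFE: HHPred table, 3dbx_A from COMA, 2iad_B from Task 2.
identities = {
    "1k5n_A": 40, "1s7q_A": 37, "1t7v_A": 36, "3p73_A": 35, "3bev_A": 34,
    "2yf1_A": 32, "1zs8_A": 30, "2wy3_A": 29, "1cd1_A": 21,
    "2iad_B": 21.10, "3dbx_A": 16,
}

def identity_bin(identity):
    """Assign an identity value to one of the three ranges (assumed bounds)."""
    if identity > 80:
        return ">80%"
    if identity >= 40:
        return "40-80%"
    return "<40%"

bins = {">80%": [], "40-80%": [], "<40%": []}
for pdb_id, identity in identities.items():
    bins[identity_bin(identity)].append(pdb_id)

for label, members in bins.items():
    print(f"{label:7s} {len(members):2d} {', '.join(members)}")
```

With these values only 1k5n_A reaches the 40%-80% range, and only at its lower bound, which matches the limitation described above.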


Models

For our models we selected 1k5nA, 1zs8A, 2iadB, and 3dbxA as templates. We chose them to have a wide variety of sequence identities. The MSA models were created with a subset of these 4 templates: MSA1 contains all four templates, MSA2 the lower three (1zs8A, 2iadB, and 3dbxA), and MSA3 only 2iadB and 3dbxA. For the Modeller models we used both alignment methods (simple and 2d) to create the single template models. For the evaluation we compared the models to the "native" (complexed with beta-2-microglobulin only) and complex (complexed with beta-2-microglobulin and transferrin receptor) structure of HFE, 1a6zA and 1de4A respectively. Both structures contain only 275 of HFE's 348 residues (namely 23-297), thus excluding the signal peptide and the transmembrane and cytoplasmic region.


Modeller


This section evaluates the models that were built with Modeller. Model building was based on different alignments created with Modeller itself, and the models were evaluated against our HFE protein (1A6Z and 1DE4). The alignments for each model can be found here.

The pictures show the resulting models (green) superimposed on the 1A6Z protein (red). The yellow lines indicate which positions PyMol aligned in order to superimpose them.

1k5nA

<figure id="1K5NSimple">

Figure 1: model built by using 1K5N, simple modeller alignment, superimposed with 1A6Z

</figure><figure id="1K5N2d">

Figure 2: model built by using 1K5N, 2d modeller alignment, superimposed with 1A6Z

</figure>


Both models are nearly the same and superimpose well with chain A of 1A6Z. The model generated from Modeller's 2d alignment does not improve TM-Score, TM-Align score, and RMSD all at once. This is because the alignment method that incorporates structure produces nearly the same alignment as the "normal" one from Modeller. In addition, the align2d() call places the first amino acid far ahead of the following residues. This might be an error, which we tried to avoid by manual correction.

1zs8A


<figure id="1ZS8Simple">

Figure 3: model built by using 1ZS8, simple modeller alignment, superimposed with 1A6Z

</figure><figure id="1ZS82d">

Figure 4: model built by using 1ZS8, 2d modeller alignment, superimposed with 1A6Z

</figure>
At first glance both structures look the same, but as <xr id="modeller_scores_native"/> and <xr id="modeller_scores_complex"/> show, the model based on the 2d alignment has a higher TM-Score and TM-Align score and a lower weighted RMSD. The 2d alignment (which incorporates structure) introduces more gaps but aligns the structure a bit better, which improves the model compared to the one based on the normal Modeller alignment.

2iadB

<figure id="2IADSimple">

Figure 5: model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z

</figure><figure id="2IAD2d">

Figure 6: model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z

</figure> <figure id="2IADSimpleNoBars">

Figure 7: model built by using 2IAD, simple modeller alignment, superimposed with 1A6Z

</figure><figure id="2IAD2dNoBars">

Figure 8: model built by using 2IAD, 2d modeller alignment, superimposed with 1A6Z

</figure>
As the alignment lines indicate, and as is also visible in the figures without them, these models are not predicted as well as the ones presented before. This is also reflected in the low TM-Align scores and high RMSD (compared to the other models). It is nevertheless noticeable that the beta sheets and the two helices have nearly the right orientation. The threshold for a similar fold (TM-Score of 0.5) is therefore nearly reached.

The 2d alignment introduces more gaps, but thereby yields a more accurate model (TM-Align score of 0.49 versus 0.40).
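
To give some context for the 0.5 threshold mentioned above, the following sketch shows the published TM-score formula with its length-dependent distance scale d0. It evaluates the score for one fixed superposition only, whereas the TM-Score and TM-Align programs also search for the superposition that maximises it. The function names are ours, and the length of 275 is the number of HFE residues in the reference structures.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstrom) of the TM-score."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` are the distances (angstrom) between aligned residue
    pairs; unaligned residues simply contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

length = 275  # residues of HFE contained in 1a6zA and 1de4A
print(f"d0 for {length} residues: {tm_d0(length):.2f} angstrom")
# A residue pair exactly d0 apart contributes 0.5 to the sum.
print(tm_score([tm_d0(length)] * length, length))
```

Because every term is divided by the target length, a model that covers only part of the target (such as a short alignment) cannot reach a high score even if the covered part fits well.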

3dbxA


<figure id="3DBXSimple">

Figure 9: model built by using 3DBX, simple modeller alignment, superimposed with 1A6Z

</figure><figure id="3DBX2d">

Figure 10: model built by using 3DBX, 2d modeller alignment, superimposed with 1A6Z

</figure>
Both 3DBX-based models closely resemble the original HFE protein, as their TM-Align scores also indicate. In this case, however, the 2d alignment does not improve but worsens the prediction compared to the simple alignment. The small shift of the first residues in the 2d alignment (relative to the normal alignment) seems to cause this.

MSA1

<figure id="MSA1basedModel">

Figure 11: model built by using MSA1 superimposed with 1A6Z

</figure> The model resembles the original protein, and the MSA shows a common gap at positions 19-42 of the target sequence. This is the same gap as in the simple alignment for 3DBX, on which the best model (by TM-Align score) was built.

MSA2

<figure id="MSA2basedModel">

Figure 12: model built by using MSA2 superimposed with 1A6Z

</figure> This model is worse than the MSA1 based one, although the 3DBX and 1ZS8 alignments resulted in good models. This means that MSAs do not always improve the built models and can even worsen them if a "bad" template such as 2IAD is included.

MSA3


<figure id="MSA3basedModel">

Figure 13: model built by using MSA3 superimposed with 1A6Z

</figure><figure id="MSA3basedModelNoBars">

Figure 14: model built by using MSA3 superimposed with 1A6Z

</figure>
The last model built with Modeller used MSA3. As one can tell from the length of the alignment lines in the picture, this model is not very good. Even though 3DBX was used (which on its own resulted in a very good model), the resulting model was no better by TM-Align score (about 0.40) than the simple-alignment model based on 2IAD alone, and worse than the 2d-alignment one (0.49).

Evaluation

<figtable id="modeller_scores_native">

Model Common residues TM-Score GDT-TS GDT-HA TM-Align Weighted RMSD
1K5N_2d 272 0.1686 0.0846 0.0487 0.83298 2.200 (over 272 atoms)
1K5N_simple 272 0.1649 0.0846 0.0506 0.83358 2.325 (over 272 atoms)
1ZS8_2d 272 0.1698 0.0827 0.0432 0.85494 1.642 (over 272 atoms)
1ZS8_simple 272 0.1550 0.0790 0.0423 0.79841 2.302 (over 271 atoms)
2IAD_2d 272 0.1725 0.0836 0.0460 0.49103 2.166 (over 269 atoms)
2IAD_simple 272 0.1337 0.0607 0.0349 0.40162 3.705 (over 272 atoms)
3DBX_2d 272 0.1742 0.0892 0.0496 0.81698 2.374 (over 267 atoms)
3DBX_simple 272 0.1684 0.0882 0.0496 0.86512 1.524 (over 267 atoms)
MSA1 272 0.1680 0.0855 0.0496 0.83014 2.366 (over 270 atoms)
MSA2 272 0.3218 0.1811 0.0855 0.72682 1.889 (over 270 atoms)
MSA3 272 0.1530 0.0708 0.0358 0.40265 3.679 (over 272 atoms)
Table 3: Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>

<figtable id="modeller_scores_complex">

Model Common residues TM-Score GDT-TS GDT-HA TM-Align Weighted RMSD
1K5N_2d 272 0.1674 0.0800 0.0469 0.85238 1.986 (over 272 atoms)
1K5N_simple 272 0.1638 0.0800 0.0478 0.83836 2.142 (over 272 atoms)
1ZS8_2d 272 0.1715 0.0827 0.0450 0.86026 1.720 (over 272 atoms)
1ZS8_simple 272 0.1550 0.0790 0.0414 0.82124 2.059 (over 271 atoms)
2IAD_2d 272 0.1706 0.0836 0.0441 0.48549 2.279 (over 269 atoms)
2IAD_simple 272 0.1318 0.0653 0.0358 0.40088 3.655 (over 272 atoms)
3DBX_2d 272 0.1737 0.0873 0.0487 0.83662 2.088 (over 269 atoms)
3DBX_simple 272 0.1687 0.0901 0.0515 0.86877 1.492 (over 269 atoms)
MSA1 272 0.1698 0.0873 0.0524 0.84313 2.150 (over 270 atoms)
MSA2 272 0.3223 0.1783 0.0846 0.70465 2.011 (over 270 atoms)
MSA3 272 0.1574 0.0754 0.0377 0.40026 3.637 (over 272 atoms)
Table 4: Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>


Both the figures and the tables show that all models built with Modeller (except those based on 2IAD) are fairly good. It also seems that using multiple templates does not necessarily improve the resulting model. This is visible, for example, in the TM-Align scores of MSA2 and MSA3 and in the predicted structures of their models. It is plausible that the aligned sequence of 2IAD disrupts the model positions in the case of MSA3, whereas in MSA2 or MSA1 this single template is no longer enough to alter the resulting model greatly. Unfortunately the sequences we used overlap heavily in the MSA; it would be informative to see how modelling performs when different fragments are modelled from different templates.
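
The comparison above can be reproduced with a few lines of Python. The values are copied from Table 3 (TM-Align score and weighted RMSD against 1a6z); the variable names and the use of 0.5 as a flagging threshold are choices made for this sketch.

```python
# (TM-Align score, weighted RMSD in angstrom) against 1a6z, copied from Table 3.
modeller_native = {
    "1K5N_2d": (0.83298, 2.200),
    "1K5N_simple": (0.83358, 2.325),
    "1ZS8_2d": (0.85494, 1.642),
    "1ZS8_simple": (0.79841, 2.302),
    "2IAD_2d": (0.49103, 2.166),
    "2IAD_simple": (0.40162, 3.705),
    "3DBX_2d": (0.81698, 2.374),
    "3DBX_simple": (0.86512, 1.524),
    "MSA1": (0.83014, 2.366),
    "MSA2": (0.72682, 1.889),
    "MSA3": (0.40265, 3.679),
}
FOLD_THRESHOLD = 0.5  # TM-Score usually taken to indicate a similar fold

# Sort by TM-Align score, best model first.
ranked = sorted(modeller_native.items(), key=lambda kv: kv[1][0], reverse=True)
below = [name for name, (tm, _) in ranked if tm < FOLD_THRESHOLD]

for name, (tm, rmsd) in ranked:
    flag = "" if tm >= FOLD_THRESHOLD else "  (below 0.5)"
    print(f"{name:12s} TM-Align={tm:.3f}  RMSD={rmsd:.3f}{flag}")
```

Sorting by TM-Align rather than by the TM-Score column is deliberate here, as the TM-Score program's values in this table are not meaningful.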

SwissModel

For SwissModel we only used the four single templates 1k5nA, 1zs8A, 2iadB, and 3dbxA. The alignments for each model can be found here.


1k5nA

<figtable id="swiss_1k5nA_stats">

SwissModel 1k5nA (red) superimposed on 1a6zA (green).
QMEAN and anolea distribution for SwissModel 1k5nA.
Per residue error rate for SwissModel 1k5nA.
QMEAN statistics for SwissModel 1k5nA.
Table 5: Summarized output from SwissModel for 1k5nA.

</figtable>

The alignment from SwissModel for 1k5nA spans residues 26 to 299 of HFE. It has a sequence identity of 38.71% and contains only three single gaps in 1k5nA's sequence, none of which break up a secondary structure. The estimated QMEAN Z-score is -1.92. <xr id="swiss_1k5nA_stats"/> shows the summary of the quality assessment provided by SwissModel. The per-residue error rates fluctuate rapidly, with peaks of over 4 angstrom. The anolea estimation shows two long regions with unfavorable scores, from residue 69 to 110 and from 128 to 206.

The three major helices of the MHC I domain (front of the figure) are well modeled and aligned. The same is true for the sheets of the same domain. The C1 domain's (background of the figure) sheets are not as good. Some are missing or too short and they are also not that well aligned.


1zs8A

<figtable id="swiss_1zs8A_stats">

SwissModel 1zs8A (red) superimposed on 1a6zA (green).
QMEAN and anolea distribution for SwissModel 1zs8A.
Per residue error rate for SwissModel 1zs8A.
QMEAN statistics for SwissModel 1zs8A.
Table 6: Summarized output from SwissModel for 1zs8A.

</figtable>

The results for 1zs8A are summarized in <xr id="swiss_1zs8A_stats"/>. Similar to 1k5nA the alignment spans residues 27 to 298, with a sequence identity of 29.56%. It contains three single gaps and one gap of 3 residues within 1zs8A that don't split a secondary structure, but also one bigger gap of 7 residues that breaks up a helix. The QMEAN Z-score of -2.88 is worse than that for 1k5nA, which isn't surprising given the drop of about 9 percentage points in sequence identity. The error rates are also worse, with a maximum of 8 angstrom, and they almost never go below 1 angstrom. The anolea graph exhibits the same unfavorable regions as in 1k5nA, though the first region is even worse and the second a bit shorter.

This time SwissModel seems to have had more problems with the MHC I domain. The sheets are not as well aligned as in the 1k5nA model. The first helix (lower front) is quite well aligned, but the second one (left background) is completely missing and the third is a bit too short. The C1 domain is missing one sheet, but is better aligned than in the previous model (1k5nA).


2iadB

<figtable id="swiss_2iadB_stats">

SwissModel 2iadB (red) superimposed on 1a6zA (green).
QMEAN and anolea distribution for SwissModel 2iadB.
Per residue error rate for SwissModel 2iadB.
QMEAN statistics for SwissModel 2iadB.
Table 7: Summarized output from SwissModel for 2iadB.

</figtable>

In contrast to the other three templates, the alignment for 2iadB is rather short (111 to 299). It contains one single gap and two 2-residue gaps, none of them breaking a secondary structure in 2iadB. The short alignment and the sequence identity of only 21.76% might be the cause of the very bad QMEAN Z-score of -3.42 (see <xr id="swiss_2iadB_stats"/> for a summary of the results). The error rates don't go as high as for 1zs8A, but they never drop below 2 angstrom for the first half of the alignment and improve only slightly in the second half. This is also reflected in the anolea distribution, as there are almost no favorable regions during the first half. The second half (starting around 222) gets much better, which correlates with the regions from the other templates.

Due to the short alignment the MHC I domain starts halfway through and is quite bad. The sheets, while modeled correctly, are quite misaligned. Both helices are too short and not that well aligned as in the other models. The C1 domain's sheets are now all modeled, but the alignment is even worse.


3dbxA

<figtable id="swiss_3dbxA_stats">

SwissModel 3dbxA (red) superimposed on 1a6zA (green).
QMEAN and anolea distribution for SwissModel 3dbxA.
Per residue error rate for SwissModel 3dbxA.
QMEAN statistics for SwissModel 3dbxA.
Table 8: Summarized output from SwissModel for 3dbxA.

</figtable>

3dbxA (SwissModel output shown in <xr id="swiss_3dbxA_stats"/>) has the longest alignment of all templates with a length of 275 residues (24 to 298 in HFE), but it also has the lowest sequence identity with only 18.93%. The alignment has only two single gaps in 3dbxA's sequence, but one of them splits a sheet and the other one a helix. Nevertheless it has a QMEAN Z-score of -2.8, which is even slightly better than for 1zs8A. The error rates average around 3 angstrom for the first half of the alignment and slightly improve for the second. The anolea graph again shows the two unfavorable regions from 1k5nA and 1zs8A, though this time the second one is almost neutral and the first is slightly longer.

The sheets of the MHC I domain, while sometimes a bit too long, are very well aligned. The first helix is a bit too short, contains a break, and is quite misaligned (especially in the first half). The second and third helices are quite well aligned, though the third one is a bit too long. The C1 domain is also quite good, but one of the sheets is split into two shorter ones which are also quite misaligned relative to their "parent sheet".


Evaluation

TM-Score again seems to have problems correctly aligning both sequences and therefore provides no meaningful data (TM-Score, GDT-TS, and GDT-HA in <xr id="swiss_scores_native"/> and <xr id="swiss_scores_complex"/>). Comparing the TM-Score from TM-Align to the corresponding scores for the Modeller results (cf. <xr id="modeller_scores_native"/> and <xr id="modeller_scores_complex"/>) shows that SwissModel performs neither generally better nor generally worse than Modeller.

Surprisingly 3dbxA provides the best model (TM-Align 0.89) despite its low sequence identity and outperforms those from Modeller. Especially the weighted RMSD of 1.1 is quite good compared to the other models. In the case of 1zs8A SwissModel performs better than the simple alignment based Modeller model, but worse than the 2d alignment one. Regarding the weighted RMSD SwissModel outperforms Modeller for both alignment methods. 2iadB again performs worst (cf. Modeller results) as it suffers from the short alignment with HFE. Even with TM-Align the TM-Score barely reaches 0.5 which is the minimum threshold for similar folds. Despite its high sequence identity 1k5nA performs about as well as 1zs8A regarding the TM-Score, but has a much worse weighted RMSD.

<figtable id="swiss_scores_native">

Model Common residues TM-Score GDT-TS GDT-HA TM-Align Weighted RMSD
1k5nA 250 0.1626 0.0809 0.0478 0.84456 2.121 (over 272 atoms)
1zs8A 249 0.1449 0.0662 0.0377 0.83755 1.514 (over 271 atoms)
2iadB 165 0.1218 0.0680 0.0450 0.50849 2.805 (over 187 atoms)
3dbxA 252 0.1684 0.0836 0.0478 0.89308 1.111 (over 272 atoms)
Table 9: Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>

<figtable id="swiss_scores_complex">

Model Common residues TM-Score GDT-TS GDT-HA TM-Align Weighted RMSD
1k5nA 250 0.1611 0.0781 0.0469 0.85087 2.009 (over 272 atoms)
1zs8A 249 0.1450 0.0689 0.0377 0.83904 1.501 (over 271 atoms)
2iadB 165 0.1201 0.0671 0.0432 0.49068 3.172 (over 187 atoms)
3dbxA 252 0.1679 0.0818 0.0460 0.88762 1.203 (over 272 atoms)
Table 10: Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>
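
For readers unfamiliar with the GDT-TS and GDT-HA columns in the tables above, the following sketch shows how the two scores are defined from per-residue distances. It is a simplification under stated assumptions: the distances are made up, a single fixed superposition is used (the real programs search for the best superposition per cutoff), and the result is given as a fraction rather than a percentage.

```python
def gdt(distances, total_residues, cutoffs):
    """Mean fraction of residues lying within each distance cutoff."""
    fractions = [
        sum(1 for d in distances if d <= cutoff) / total_residues
        for cutoff in cutoffs
    ]
    return sum(fractions) / len(cutoffs)

def gdt_ts(distances, total_residues):
    """GDT total score: cutoffs of 1, 2, 4 and 8 angstrom."""
    return gdt(distances, total_residues, (1.0, 2.0, 4.0, 8.0))

def gdt_ha(distances, total_residues):
    """GDT high accuracy: stricter cutoffs of 0.5, 1, 2 and 4 angstrom."""
    return gdt(distances, total_residues, (0.5, 1.0, 2.0, 4.0))

# Hypothetical C-alpha distances (angstrom) for a six-residue toy target.
toy_distances = [0.4, 0.9, 1.5, 3.0, 7.0, 12.0]
print(f"GDT-TS = {gdt_ts(toy_distances, 6):.3f}")
print(f"GDT-HA = {gdt_ha(toy_distances, 6):.3f}")
```

Since GDT-HA uses the stricter cutoffs, it can never exceed GDT-TS for the same set of distances.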


Amino acid modification sites analysis

The annotated amino acid modification sites for HFE show similar conservation results (cf. <xr id="swiss_align_conserved"/>) as previously in Task 2. With the exception of 3dbxA both disulfide bonds are conserved as well as the first glycosylation site. The latter two glycosylation sites are substituted. The alignment with 2iadB does not include the first glycosylation site.

An examination with PyMol shows that all disulfide bonds, with the exception of the first bond in 3dbxA, can be established. This is in accordance with the conservation in the alignments. A quick check with the pdb structure of 3dbxA revealed that there is no substitute disulfide bond in the original protein. This suggests that this bond is not that important for the forming of the MHC I domain in HFE.
The glycosylation sites in uniprot are annotated as "potential", which means that there is no experimental evidence for their importance or even existence. Nevertheless the first one (residue 110) seems to be highly conserved within HFE and its homologs. The surface accessibility and orientation of this residue are also quite well conserved in 1k5nA and 3dbxA. In 1zs8A the position and orientation are noticeably shifted. The second glycosylation site (residue 130) is modeled and aligned very well in 1k5nA, 1zs8A, and 3dbxA, but is a bit off in the 2iadB model. Only the 1k5nA model aligns the third one (residue 234) about right. In the other three models it is way off, with 3dbxA taking second place.

<figtable id="swiss_align_conserved">

Site 1k5nA 1zs8A 2iadB 3dbxA
Glycosylation 1 (N at 110) conserved conserved --- subst. (A)
Glycosylation 2 (N at 130) subst. (G) subst. (G) subst. (T) subst. (G)
Glycosylation 3 (N at 234) subst. (E) subst. (D) subst. (K) subst. (P)
Disulfide bond 1 (C at 124) conserved conserved conserved conserved
Disulfide bond 1 (C at 187) conserved conserved conserved subst. (F)
Disulfide bond 2 (C at 225) conserved conserved conserved conserved
Disulfide bond 2 (C at 282) conserved conserved conserved conserved
Table 11: Conservation of HFE's amino acid modification sites for the different SwissModel alignments.

</figtable>
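
The disulfide check described above was done by eye in PyMol. As a rough scripted alternative, one could measure the distance between the SG atoms of the two cysteines in a model file. The sketch below is not the method used here: the 2.5 angstrom cutoff (a typical S-S bond is about 2.05 angstrom long), the helper names, and the toy coordinates are all assumptions, and the residue numbering of a real model file may differ from the HFE positions in Table 11.

```python
import math

def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a minimal fixed-column PDB ATOM record (helper for the toy data)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def sg_coordinates(pdb_lines, chain="A"):
    """Map residue number -> SG coordinates for all cysteines of one chain."""
    coords = {}
    for line in pdb_lines:
        if (line.startswith("ATOM") and line[12:16].strip() == "SG"
                and line[17:20] == "CYS" and line[21] == chain):
            coords[int(line[22:26])] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords

def has_disulfide(coords, res_a, res_b, max_dist=2.5):
    """True if both SG atoms exist and lie within max_dist angstrom."""
    if res_a not in coords or res_b not in coords:
        return False
    return math.dist(coords[res_a], coords[res_b]) <= max_dist

# Made-up coordinates: 124/187 are 2.05 angstrom apart, 225/282 are 6 angstrom apart.
toy_model = [
    atom_line(1, "SG", "CYS", "A", 124, 10.000, 10.000, 10.000),
    atom_line(2, "SG", "CYS", "A", 187, 12.050, 10.000, 10.000),
    atom_line(3, "SG", "CYS", "A", 225, 30.000, 5.000, 5.000),
    atom_line(4, "SG", "CYS", "A", 282, 36.000, 5.000, 5.000),
]
sg = sg_coordinates(toy_model)
for pair in [(124, 187), (225, 282)]:
    print(pair, has_disulfide(sg, *pair))
```

A substituted cysteine (such as the F at position 187 in 3dbxA) has no SG atom, so the pair is reported as not bonded.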


I-Tasser

For the I-Tasser models we used the option to provide a template. So far we have used 1k5nA and 2iadB as templates. The runtimes were extremely long, at 4 (2iadB) to 5 (1k5nA) days. As there is a limit of one submission per account, and considering the long runtime, we won't be able to process more templates until Tuesday. It should also be noted that I-Tasser incorporated 1a6z and 1de4 into the modelling process, both of which are pdb entries for HFE.


2iadB

<figure id="itasser_2iadB_1">

Figure 15: I-Tasser 2iadB (red) superimposed on 1a6zA (green).

</figure>

<figure id="itasser_swiss_2iadB_1">

Figure 16: I-Tasser 2iadB (red) and SwissModel 2iadB (pink) superimposed on 1a6zA (green).

</figure>

I-Tasser provided 5 models for 2iadB. The best one with a C-score of -1.46 and an estimated TM-Score of 0.53±0.15 was selected for the evaluation.

Compared to all other models this one looks the worst (see <xr id="itasser_2iadB_1"/>). A whole helix (upper front in the figure) is not predicted. This is especially odd as this helix is modeled in the SwissModel for 2iadB (cf. <xr id="itasser_swiss_2iadB_1"/>). Additionally the majority of the sheets are missing.
On the other hand I-Tasser correctly predicted a part of the transmembrane helix around residues 307 to 330 (lower right corner in the figure), which is not included in the pdb file for 1a6z. I-Tasser also predicted a helix at the beginning of HFE in the signal peptide region. This region is not contained in the pdb file either, nor is it specified as a helix in uniprot, but it was also predicted as a helix by PsiPred and ReProf in Task 3.


1k5nA

<figure id="itasser_1k5nA_1">

Figure 17: I-Tasser 1k5nA (red) superimposed on 1a6zA (green).

</figure>

The best model for 1k5nA provided by I-Tasser has a C-score of -0.75 and an estimation of 0.62±0.14 for the TM-Score.

The model fits the reference structure (1a6zA) quite well (see <xr id="itasser_1k5nA_1"/>). The long helices are well aligned, but the sheets seem to be quite shifted. I-Tasser also predicted a helix in the signal peptide region (background, center), just like in the previous 2iadB model.


Evaluation

The scores for both models are shown in <xr id="itasser_scores_native"/> and <xr id="itasser_scores_complex"/>. TM-Align calculates a TM-Score of around 0.79 for the best model for 2iadB. Although this is much higher than what the other methods achieved for 2iadB, it suggests that 2iadB is quite a bad template for HFE: even the incorporation of the pdb structures of HFE (1a6z, 1de4) in the modelling process did not raise the score above that of many other models based on low identity templates. The same is true for 1k5nA. In addition both models failed to establish the first disulfide bond (see <xr id="swiss_align_conserved"/> for positions), but both succeeded with the second bond. Overall the I-Tasser results are rather underwhelming given that it is considered one of the best methods available.

<figtable id="itasser_scores_native">

Model    Common residues  TM-Score  GDT-TS  GDT-HA  TM-Align  Weighted RMSD
1k5nA_1  272              0.1682    0.0836  0.0478  0.84911   2.305 (over 272 atoms)
2iadB_1  272              0.1699    0.0873  0.0506  0.78679   2.333 (over 270 atoms)
Table 12: Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>

<figtable id="itasser_scores_complex">

Model    Common residues  TM-Score  GDT-TS  GDT-HA  TM-Align  Weighted RMSD
1k5nA_1  272              0.1666    0.0836  0.0496  0.85045   2.217 (over 272 atoms)
2iadB_1  272              0.1692    0.0855  0.0478  0.79875   2.161 (over 272 atoms)
Table 13: Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>
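For reference, the scores in these tables can be sketched as follows. Given the distances d_i (in Å) between aligned Cα atoms after superposition and the length L of the reference structure, the TM-score is (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L - 15)^(1/3) - 1.8. GDT-TS is the mean fraction of residues within 1, 2, 4 and 8 Å, and GDT-HA uses 0.5, 1, 2 and 4 Å. This is a simplified sketch that evaluates one fixed superposition, whereas the actual programs search for the superposition that maximizes the score. The function names are illustrative.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 in Angstrom.

    Only meaningful for chains well above 15 residues.
    """
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, l_target):
    """TM-score for one fixed superposition, normalized by the reference length."""
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


def gdt(distances, l_target, cutoffs):
    """Mean fraction of reference residues within each distance cutoff."""
    fractions = [sum(1 for d in distances if d <= c) / l_target for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, l_target):
    return gdt(distances, l_target, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances, l_target):
    return gdt(distances, l_target, (0.5, 1.0, 2.0, 4.0))
```

Residues that are not aligned contribute nothing to the sums but still count in L, so a model that is paired with the wrong residues of the reference scores low even if its fold is correct.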


3D-Jigsaw

For the first attempt, 3D-Jigsaw was given the following models as input, as they seemed to be among the best:

  • 1zs8A (Modeller, 2d align)
  • 3dbxA (Modeller, simple align)
  • 3dbxA (SwissModel)
  • MSA1 (Modeller)
  • MSA2 (Modeller)

3D-Jigsaw failed to generate new models with these.

The second attempt contained all 15 models from Modeller and SwissModel combined. This time 3D-Jigsaw was able to generate 5 new models.


Models

<figure id="3dJigsawAll">

Figure 18: Jigsaw models superimposed on 1a6zA. Model1 (red), Model2 (blue), Model3 (yellow), Model4 (pink), and Model5 (cyan).

</figure>

All models appear to be almost identical (see <xr id="3dJigsawAll"/>). They also match the pdb structure (1a6zA) quite well. The models differ only in the coiled regions not included in the pdb structure (residues 1-22 and 298-348). This is no surprise, as all previous models had the same problem: the templates lack these regions.


Evaluation

It is not surprising that all 5 models have about the same scores (cf. <xr id="jigsaw_scores_native"/> and <xr id="jigsaw_scores_complex"/>), as they look almost alike. Although they can compete with the best models from the other methods, none of them stands out. Both the TM-Score (TM-Align) and the weighted RMSD are lower than those of several of the input models (e.g. 3dbxA). What's more, none of the models established either of the two disulfide bonds mentioned before in <xr id="swiss_align_conserved"/>.

<figtable id="jigsaw_scores_native">

Model    Common residues  TM-Score  GDT-TS  GDT-HA  TM-Align  Weighted RMSD
model_1  272              0.1647    0.0864  0.0533  0.83038   2.359 (over 272 atoms)
model_2  272              0.1648    0.0873  0.0542  0.83125   2.351 (over 272 atoms)
model_3  272              0.1648    0.0873  0.0542  0.83127   2.350 (over 272 atoms)
model_4  272              0.1648    0.0873  0.0542  0.83127   2.350 (over 272 atoms)
model_5  272              0.1648    0.0873  0.0542  0.83127   2.350 (over 272 atoms)
Table 14: Scoring results for the models against 1a6z (native). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>

<figtable id="jigsaw_scores_complex">

Model    Common residues  TM-Score  GDT-TS  GDT-HA  TM-Align  Weighted RMSD
model_1  272              0.1637    0.0836  0.0506  0.83550   2.166 (over 272 atoms)
model_2  272              0.1637    0.0836  0.0506  0.83622   2.158 (over 272 atoms)
model_3  272              0.1637    0.0836  0.0506  0.83627   2.157 (over 272 atoms)
model_4  272              0.1637    0.0836  0.0506  0.83625   2.158 (over 272 atoms)
model_5  272              0.1637    0.0836  0.0506  0.83626   2.157 (over 272 atoms)
Table 15: Scoring results for the models against 1de4 (complex). Common residues, TM-Score, GDT-TS, and GDT-HA are calculated by TM-Score. TM-Align is the TM-Score based on TM-Align. Weighted RMSD calculated with SAP.

</figtable>


Conclusion

Our results show that a low sequence identity does not necessarily make for a bad template. In fact, our best model was based on 3dbxA, which had by far the lowest sequence identity (16% to 19%, depending on the alignment). This suggests that the structure of HFE is far more conserved than its sequence, as even distant relatives provide good templates. With such a low identity, however, the alignment method becomes even more important, as pairing residues is no longer trivial.
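That the sequence identity depends on the alignment can be illustrated with a small sketch. It computes the percentage of identical residues from two gapped sequences of equal length. The function name, the choice of denominator (aligned columns without gaps) and the toy sequences are assumptions for illustration, not the procedure or data used in this task.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over all aligned (non-gap) columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Made-up toy sequences: the same pair under two different alignments.
print(percent_identity("ACDEFG-HIK", "ACDQFGWHLK"))
print(percent_identity("ACDEFGHIK-", "ACDQFGWHLK"))
```

Moving a single gap changes which residues are paired and therefore the identity. Other denominators (e.g. the length of the shorter sequence) are also in use, which is a further reason why reported identities differ between tools.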

Looking at the scores (TM-Scores, GDT scores, RMSD), the TM-Score, GDT-TS, and GDT-HA scores correlate quite well with a few exceptions, with GDT-HA showing the most outliers. The RMSD seems to be more independent of the other three. Keep in mind, however, that these observations are based on failed TM-Score alignments, and that TM-Align only outputs a TM-Score (no GDT scores).
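The relation between two scores could be quantified with a Pearson correlation coefficient. The sketch below uses the TM-Align and weighted RMSD columns of the seven I-Tasser and 3D-Jigsaw models scored against 1a6z (Tables 12 and 14). With so few and so similar models the resulting value is only illustrative, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)


# Values copied from Tables 12 and 14 (models scored against 1a6z).
tm_align = [0.84911, 0.78679, 0.83038, 0.83125, 0.83127, 0.83127, 0.83127]
weighted_rmsd = [2.305, 2.333, 2.359, 2.351, 2.350, 2.350, 2.350]

print(pearson(tm_align, weighted_rmsd))
```

A value close to 0 would support the impression that the RMSD behaves independently of the TM-Score, while a value close to -1 would indicate that the two measures agree (a high TM-Score going along with a low RMSD).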

In the end, SwissModel seems to have performed best for our protein and achieved good results with low identity templates.