Difference between revisions of "Homology based structure predictions"
(→I-Tasser) |
(→TODO) |
||
(202 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | <sup>by [[User:Greil|Robert Greil]] and [[User:Landerer|Cedric Landerer]]</sup> |
||
+ | |||
== Homologous == |
== Homologous == |
||
− | Because we found no homologous structures in Task 2, we extended our list by using HHSearch. |
||
+ | [[Image:1bii.PNG|thumb|right|Figure 1: 1BII, the template structure]] |
||
− | HHSearch found just sequences with an indentity below 40% therefore we will use the 12 proteins shown below for creating a multiple alignment for homologous modeling. We choose sequences to cover the whole protein and we pay specific attention on the transmembrane region. |
||
+ | |||
+ | Because we found no homologous structures in Task 2, we extended our list by using [http://toolkit.tuebingen.mpg.de/hhpred HHSearch]. |
||
+ | |||
+ | HHSearch found only sequences with an identity below 40%. We therefore use the 12 proteins shown below to create a multiple alignment for homology modeling. We chose sequences that cover the whole protein and paid particular attention to the transmembrane region. In the cases where we can use only one template, we use 1BII (Figure 1). |
||
Line 10: | Line 15: | ||
|'''Description''' |
|'''Description''' |
||
|- |
|- |
||
− | | |
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1S79 1S79] || 37% || human La protein |
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2WY3 2WY3] || 29% || HCMV UL16-MICB complex |
||
− | | 3p73 || 28% || classical MHC class I molecule |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=3P73 3P73] || 28% || classical MHC class I molecule |
||
− | | 1kcg || 22% || NKG2D |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=3JTS 3JTS] || 25% || Mamu A*2 |
||
− | | 1jfm || 14% || MURINE NK CELL LIGAND RAE-1 BETA |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1KCG 1KCG] || 22% || NKG2D |
||
− | | 1bii || 22% || H-2DD MHC CLASS I |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1BII 1BII] || 22% || H-2DD MHC CLASS I |
||
− | | 2p24 || 21% || alphabeta TCR |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1OW0 1OW0] || 22% || human FcaRI |
||
− | | 1cd1 || 21% || MHC-like fold with a large hydrophobic binding groove |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2P24 2P24] || 21% || alphabeta TCR |
||
− | | 2wy3 || 29% || HCMV UL16-MICB complex |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1CD1 1CD1] || 21% || MHC-like fold with a large hydrophobic binding groove |
||
− | | 1lqv || 14% || Endothelial protein C receptor |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1HXM 1HXM] || 18% || Human Vgamma9/Vdelta2 T Cell Receptor |
||
− | | 3jts || 25% || Mamu A*2 |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1LQV 1LQV] || 14% || Endothelial protein C receptor |
||
− | | 1ow0 || 22% || human FcaRI |
||
|- |
|- |
||
+ | | [http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1JFM 1JFM] || 14% || MURINE NK CELL LIGAND RAE-1 BETA |
||
− | | 1hxm || 18% || Human Vgamma9/Vdelta2 T Cell Receptor |
||
|- |
|- |
||
|} |
|} |
||
− | With these |
+ | With these sequences and the HFE protein (Q30201), we created a multiple sequence alignment with T-Coffee (Expresso). This multiple sequence alignment is later used as a raw alignment in the Alignment Mode of SwissModel and in Modeller. Later on, we will try to obtain better models by editing the alignment so that functional regions are kept together. |
DSSP --EEEEEEEEEEB-SS-SSB--EEEEEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH |
DSSP --EEEEEEEEEEB-SS-SSB--EEEEEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH |
||
Line 114: | Line 119: | ||
− | Based on the secondary structure for the HFE-Gen assigned by DSSP from the PDB structure (1a6z) the |
+ | Judged against the secondary structure that DSSP assigns to the HFE protein from the PDB structure (1a6z), the multiple sequence alignment conserves most of the secondary structure elements.<br> |
+ | <br> |
||
+ | As HHSearch found only weak homologs, we searched CATH for structural homologs. The BLAST search in CATH found sequence homologs with identities ranging from 49% down to 22%. The HFE protein is classified as a two-domain protein (Alpha Beta, Mainly Beta)<ref>http://www.cathdb.info/domain/1a6zA01</ref>. We found both domains with a sequence similarity of 100%. We then used BLAST to spot-check the results with further searches against CATH for randomly chosen proteins, and found the same sequence identity distribution for several of them. From this BLAST search we conclude that HFE is a protein with highly conserved structural elements but only weak sequence conservation. We would therefore recommend an acceptance range of about 20% to 40% sequence similarity for this protein. |
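The identity percentages quoted above depend on how identity is counted. The following is a minimal sketch, assuming identity is defined as the number of identical residue pairs divided by the number of aligned (non-gap) columns. Other tools may divide by the length of the shorter sequence or of the whole alignment instead. The function name and the toy sequences are hypothetical and are not taken from HHSearch or CATH.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy example: five gap-free columns, four of them identical.
print(percent_identity("ACDE-F", "ACDQGF"))
```

With this definition, a template would fall into the suggested acceptance range if the returned value lies between 20 and 40.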
||
== I-Tasser == |
== I-Tasser == |
||
− | [[Image:Casp789_web.gif|thumb|right| perfomance of I-TASSER at CASP<br> Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html]] |
+ | [[Image:Casp789_web.gif|thumb|right|Figure 2: performance of I-TASSER at CASP<br> Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html]] |
− | [[Image:Tasser_workflow.PNG|thumb|right|Workflow of the I-Tasser server<br> Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html]] |
+ | [[Image:Tasser_workflow.PNG|thumb|right|Figure 3: Workflow of the I-Tasser server<br> Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html]] |
− | I-Tasser is a webservice for protein structure prediction provided and published by Ambrish Roy, Alper Kucukural and Yang Zhang at http://zhanglab.ccmb.med.umich.edu/I-TASSER/ for the CASP competition with outstanding achievement. |
+ | I-Tasser<ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins</ref> is a web service for protein structure prediction provided by Ambrish Roy, Alper Kucukural and Yang Zhang at http://zhanglab.ccmb.med.umich.edu/I-TASSER/. It achieved outstanding results in the CASP competition (Figure 2). <br> |
+ | <br> |
||
+ | The I-Tasser protocol consists of the following steps:<br> |
||
+ | * Thread the sequence onto different structures to create an initial template. |
||
+ | * Break the template into fragments that match the structure, leaving out the parts of the structure to which no sequence is assigned. |
||
+ | * Assemble the structure and cluster the results. |
||
+ | * Use the cluster centroid for structure reassembly. |
||
+ | * Search for the structure with the lowest energy and apply REMO H-bond optimization to obtain the final model. |
||
+ | <br> |
||
+ | A graphical workflow is shown in Figure 3. |
||
+ | |||
+ | In addition, I-Tasser predicts GO terms and binding sites. To do so, it uses the final model to search for global and local matches in the PDB. |
||
+ | <br> |
||
+ | |||
+ | A problem for us is that I-Tasser only generates complete models, whereas the PDB structure of our protein is not complete. We therefore compared the predicted secondary structure with the one from UniProt. |
||
+ | |||
+ | '''Compare secondary structure''' of the model and the structure assigned in UniProt:<br> |
||
+ | Seq: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMEN |
||
+ | Pred: CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCEEEEECCCCCCCCCCEEEEEEECCCEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHH |
||
+ | UniP: ---------------------------EEEEEEEEEEE----EEE--EEEEEE--EEEEEEEEEE--EEE--------TTTHHHHHHHHHHHHHHHHHHHHHHHHHHT |
||
+ | |||
+ | Seq: HNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHV |
||
+ | Pred: HCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCEEEECCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHCCCCCCCCCCCCC |
||
+ | UniP: TT-EEE--EEEEEEEEEE-----EEEEEEEEE--EEEEEEEHHH-EEEEEE---HHHHHHHH---HHHHHHHHHHH-HHHHHHHHHHHHHTTT-------EEEEEEEE |
||
+ | |||
+ | Seq: TSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGI |
||
+ | Pred: CCCHHHHCHHHHCCCCCCEEEEEEECCCCCCCCCCEEEECCCCCCCCCCCEEEEECCCCCCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCCCCCHHHHHHHCCHHH |
||
+ | UniP: ----EEEEEEEEEEEEE--EEEEEE------HHH----EEEE-----EEEEEEEEE---HHHHEEEEEE---EEE-EEEE---------------------------- |
||
+ | |||
+ | Seq: LFIILRKRQGSRGAMGHYVLAERE |
||
+ | Pred: HHHHHHCCCCCCCCCCCCCHCCCC |
||
+ | UniP: ------------------------ |
||
+ | For a better overview we replaced the I-Tasser S for Sheet by an E like in the UniProt secondary structure.<br> |
||
+ | <br> |
||
+ | As we can see, the secondary structure predicted by I-Tasser is mostly correct. In some places the structure is slightly shifted, and in others the secondary structure elements do not have the correct length. As this model is also based on a self hit, a good result like this one is not surprising. |
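The visual comparison above can also be expressed as a number. The following is a minimal sketch, assuming that the eight DSSP-style states are reduced to three (helix, strand, coil) and that positions marked `-` in the reference row carry no annotation and are skipped. Both choices are assumptions and not part of the I-Tasser or UniProt output. The function names and the toy strings are hypothetical.

```python
# Reduction of DSSP-style states to three classes (an assumed, common convention).
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(symbol: str) -> str:
    """Map a secondary structure symbol to H (helix), E (strand) or C (coil)."""
    return THREE_STATE.get(symbol.upper(), "C")


def agreement(predicted: str, reference: str, skip: str = "-") -> float:
    """Fraction of annotated reference positions where both rows agree."""
    if len(predicted) != len(reference):
        raise ValueError("both rows must have the same length")
    compared = 0
    matches = 0
    for pred, ref in zip(predicted, reference):
        if ref == skip:
            continue  # no annotation at this position in the reference
        compared += 1
        if to_three_state(pred) == to_three_state(ref):
            matches += 1
    return matches / compared if compared else 0.0


# Toy rows: the prediction must already use E for strands, as in the rows above.
print(agreement("CCHHHEEC", "--HHTEEE"))
```

Skipping unannotated positions keeps the unresolved N- and C-terminal regions from counting as disagreements.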
||
'''Predicted Secondary Structure by I-Tasser'''<br> |
'''Predicted Secondary Structure by I-Tasser'''<br> |
||
Line 138: | Line 179: | ||
Prediction: CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC |
Prediction: CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC |
||
Conf-Score: 010211112222100246665443013678898651020169 |
Conf-Score: 010211112222100246665443013678898651020169 |
||
− | Secondary structure elements are shown as H for Alpha helix,S for Beta sheet & C for Coil |
+ | Secondary structure elements are shown as H for Alpha helix, S for Beta sheet & C for Coil |
'''Predicted Solvent Accessibility by I-Tasser''' |
'''Predicted Solvent Accessibility by I-Tasser''' |
||
Line 154: | Line 195: | ||
Values range from 0 (buried residue) to 9 (highly exposed residue) |
Values range from 0 (buried residue) to 9 (highly exposed residue) |
||
+ | '''I-Tasser predicted five Models''' with a C-Score from -0.557 to -3.298. They are ranked from one to five as seen below. As cutoff for the C-Score, we use -1.5 as recommended by the Zhang group<ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010). [http://zhanglab.ccmb.med.umich.edu/papers/2010_3.pdf Zhang et al.]</ref> that is proposed to give a false-positive and a false-negative rate of about 0.1. That means that more than 90% of the quality predictions are correct. Therefore we just use Model1 for the comparison with the other methods. All resulting models are shown below in Figure 4 to Figure 8. |
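The selection rule described above can be sketched as a simple filter. The C-Scores are the five values reported in this section and the cutoff is the -1.5 named above. The function and variable names are hypothetical.

```python
# C-Scores of the five I-Tasser models as reported in this section.
C_SCORES = {
    "Model1": -0.557,
    "Model2": -2.539,
    "Model3": -2.266,
    "Model4": -2.772,
    "Model5": -3.298,
}
CUTOFF = -1.5  # cutoff used in the text


def models_above_cutoff(scores, cutoff=CUTOFF):
    """Return the names of models whose C-Score exceeds the cutoff, best first."""
    kept = [name for name, score in scores.items() if score > cutoff]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


print(models_above_cutoff(C_SCORES))
```

Only Model1 lies above the cutoff, which matches the decision to use it alone for the comparison.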
||
− | '''I-Tasser predicted five Models''' with a C-Score from -0.557 to -3.298. They are ranked from one to five as seen below. |
||
{| class="centered" |
{| class="centered" |
||
− | |[[Image:Model1_HFE.gif|thumb| Model 1 with a C-Score of -0.557]] |
+ | |[[Image:Model1_HFE.gif|thumb|Figure 4: Model 1 with a C-Score of -0.557]] |
− | |[[Image:Model2_HFE.gif|thumb| Model 2 with a C-Score of -2.539]] |
+ | |[[Image:Model2_HFE.gif|thumb|Figure 5: Model 2 with a C-Score of -2.539]] |
− | |[[Image:Model3_HFE.gif|thumb| Model 3 with a C-Score of -2.266]] |
+ | |[[Image:Model3_HFE.gif|thumb|Figure 6: Model 3 with a C-Score of -2.266]] |
− | |[[Image:Model4_HFE.gif|thumb| Model 4 with a C-Score of -2.772]] |
+ | |[[Image:Model4_HFE.gif|thumb|Figure 7: Model 4 with a C-Score of -2.772]] |
− | |[[Image:Model5_HFE.gif|thumb| Model 5 with a C-Score of -3.298]] |
+ | |[[Image:Model5_HFE.gif|thumb|Figure 8: Model 5 with a C-Score of -3.298]] |
|} |
|} |
||
− | Model1 has a TM-Score of about 0.64 and a RMSD of 7.7Å. For the prediction, I-Tasser used 1a6zA, 1s7qA, 1i4fA, 1de4A, 2vabA and 2bckA as templates. The templates have an identity of about 40% except for the self hit 1a6z. Because of the self hit, we run I-Tasser a second time with the constrain to exclude all templates with a sequence identity > 80%. |
+ | Model1 has a TM-Score of about 0.64 and an RMSD of 7.7Å. For the prediction, I-Tasser used 1a6zA, 1s7qA, 1i4fA, 1de4A, 2vabA and 2bckA as templates. The templates have an identity of about 40% except for the self hit 1a6z. A special case is 1de4, the transferrin receptor in complex with the HFE protein (chain A), which is a self hit as well. The sequence in this case is also identical, but we cannot draw any conclusion about the 3D structure of the protein when bound to the receptor. Because of the self hit, we ran I-Tasser a second time with the constraint to exclude all templates with a sequence identity > 80%. |
− | '''I-Tasser using templates with a sequence identity below 80%''' to avoid self hits. |
+ | '''I-Tasser using templates with a sequence identity below 80%''' to avoid self hits. <br> |
+ | To our surprise, the second run produced the same results, again based on the same self hit. At this point we do not know what went wrong, but because the self hit is just one out of five templates used to create the model, we decided to keep the best model (Model1) for the comparison with the other methods. |
||
== SwissModel == |
== SwissModel == |
||
− | + | '''SwissModel'''<ref>Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation.</ref> is a server based tool provided by the SIB. It combines tools like PSI-PRED and DISOPRED for secondary structure and disordered region prediction.<br> |
|
+ | '''The SwissModel workspace''' is a web-based service dedicated to protein structure homology modeling. It provides a personal working environment in which several projects can be calculated in parallel. The environment also provides tools for template selection, model building and structure quality evaluation. To find suitable templates for a given target protein, a library of experimental protein structures is searched<ref>http://bioinformatics.oxfordjournals.org/cgi/content/short/22/2/195</ref>. <br> |
||
+ | '''The SwissModel repository''' is a database of annotated 3d protein structure models. The database consists of more than 3.4 million structures<ref>http://nar.oxfordjournals.org/content/37/suppl_1/D387.full</ref>. |
||
+ | All models were generated from the UniProt database with the SwissModel pipeline. From the SwissModel repository, the density of the QMEAN-Score is estimated to give an indication of the quality of the predicted model. |
||
<br> |
<br> |
||
− | The model created by SwissModel is based on a self hit, but we had no chance to exclude the protein itself from the prediction. |
+ | The model created by SwissModel is based on a self hit, and we had no way to exclude the protein itself from the prediction. We could only set a specific template, so we also ran SwissModel in Alignment Mode, which gave us the chance to influence the alignment. As one can see, the QMEAN-Score density plots of the Automated Mode and the Alignment Mode are the same. Therefore the target (1a6z) and the template (1bii) are part of the same reference set. We take this as an indicator of a good template choice, because the template is in the same set as the target, which is also used as a template in the Alignment mode. We also rated this as evidence for the high diversity of the MHC 1 family. |
===Automated Mode=== |
===Automated Mode=== |
||
− | [[Image:16az.jpg|thumb| predicted model]] |
+ | [[Image:16az.jpg|thumb|Figure 9: predicted model]] |
Line 189: | Line 234: | ||
{| class="centered" |
{| class="centered" |
||
− | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png|thumb| Estimated absolute model quality]] |
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png|thumb|Figure 10: Estimated absolute model quality]] |
− | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_density_plot.png|thumb|Estimated density of model quality]] |
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_density_plot.png|thumb|Figure 11: Estimated density of model quality]] |
− | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_slider.png|thumb| Z-Score by category]] |
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_slider.png|thumb|Figure 12: Z-Score by category]] |
− | |[[Image:QMEAN_plots_energy_profile_plots_Batch.1.short.pdb_local_energy_profile_QMEANlocal.png|thumb|predicted error]] |
+ | |[[Image:QMEAN_plots_energy_profile_plots_Batch.1.short.pdb_local_energy_profile_QMEANlocal.png|thumb|Figure 13: predicted error for each position.]] |
|} |
|} |
||
− | Even though the model is based on a self hit, the Z-Score is about -1, which means that the model is one standard deviation from the mean. The model is not quite unlikely but also not the most probable one. |
+ | Even though the model is based on a self hit, the Z-Score is about -1, which means that the model is one standard deviation below the mean. The model is therefore not unlikely, but also not the most probable one. Figure 9 shows the predicted structure based on the template. The typical two parallel helices on a beta-sheet are clearly observable. Figure 10 shows the QMEAN4-score distribution over the protein size. Figure 11 shows the density plot of the reference set. The set contains the scores of structures with a similar size. Figure 12 shows the different scores which are used to calculate the final QMEAN score. Here we can see that the torsion angles caused the most issues, which leads to a lower QMEAN score. Figure 13 shows the predicted error for each residue on an arbitrary scale. We see a higher error at the beginning, but more or less the same pattern (pattern size of about 100 aa) of error values over the whole protein. |
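To make the phrase "one standard deviation from the mean" concrete, the following minimal sketch computes a Z-Score as the distance of a score from the mean of a reference set, measured in standard deviations. The reference values are hypothetical and only illustrate the arithmetic. They are not the scores of the reference set that QMEAN actually uses, and the function name is an assumption.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Distance of value from the reference mean in sample standard deviations."""
    return (value - mean(reference)) / stdev(reference)


# Hypothetical reference scores of structures with a similar size.
reference_scores = [0.7, 0.8, 0.9]

# A model scoring 0.7 lies one standard deviation below this reference mean.
print(round(z_score(0.7, reference_scores), 3))
```

A negative value means the model scores below the reference mean, and a value near zero means it is indistinguishable from a typical reference structure.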
===Alignment Mode=== |
===Alignment Mode=== |
||
+ | [[Image:Swissmodel_aliMode_model.jpg|thumb|Figure 14: predicted model]] |
||
+ | '''Model information:'''<br> |
||
+ | Modelled residue range: 1 to 272<br> |
||
+ | Based on template: 1bii_A |
||
+ | |||
+ | '''Quality information:'''<br> |
||
+ | QMEAN Z-Score: -2.065 |
||
+ | |||
− | == Modeller == |
||
+ | {| class="centered" |
||
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot_aliM.png|thumb|Figure 15: Estimated absolute model quality]] |
||
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_density_plot_aliM.png|thumb|Figure 16: Estimated density of model quality]] |
||
+ | |[[Image:QMEAN_plots_Batch.1.short.pdb_plot.png_slider_aliM.png|thumb|Figure 17: Z-Score by category]] |
||
+ | |[[Image:QMEAN_plots_energy_profile_plots_Batch.1.short.pdb_local_energy_profile_QMEANlocal_aliM.png|thumb|Figure 18: predicted error for each position]] |
||
+ | |} |
||
+ | |||
+ | TARGET 26 RSH SLHYLFMGAS EQDLGLSLFE |
||
+ | 1biiA 1 gsh slryfvtavs rpgfgeprym |
||
+ | TARGET sss ssssssssss sss |
||
+ | 1biiA sss ssssssssss sss |
||
+ | TARGET 49 ALGYVDDQLF VFYDHES--R RVEPRTPWVS SRISSQMWLQ LSQSLKGWDH |
||
+ | 1biiA 24 evgyvdntef vrfdsdaenp ryeprarwie -qegpeywer etrrakgneq |
||
+ | TARGET ssssss sss sssss sss hhh hh hhhhh hhhhhhhhhh |
||
+ | 1biiA ssssss sss sssss sss hh hhhhh hhhhhhhhhh |
||
+ | TARGET 97 MFTVDFWTIM ENH-NHSKES HTLQVILGCE MQEDNST-EG YWKYGYDGQD |
||
+ | 1biiA 73 sfrvdlrtal ryynqsaggs htlqwmagcd vesdgrllrg ywqfaydgcd |
||
+ | TARGET hhhhhhhhhh hhh ssssssssss sss sss ss sssssss ss |
||
+ | 1biiA hhhhhhhhhh hhh ssssssssss sss sssss sssssss ss |
||
+ | TARGET 145 HLEFCPDTLD WRAAEPRAWP TKLEWERHKI RARQNRAYLE RDCPAQLQQL |
||
+ | 1biiA 123 yialnedlkt wtaadmaaqi trrkweqa-g aaerdrayle gecvewlrry |
||
+ | TARGET sssss s ss hh hhhhh hhhhhhhhh hhhhhhhhhh |
||
+ | 1biiA sssss s ss hhh hhhhhhh hhhhhhhhh hhhhhhhhhh |
||
+ | TARGET 195 LELGRGVLDQ QVPPLVKVTH HVTS-SVTTL RCRALNYYPQ NITMKWLKDK |
||
+ | 1biiA 172 lkngnatllr tdppkahvth hrrpegdvtl rcwalgfypa ditltwqln- |
||
+ | TARGET hhh ssssss sss ssss ssssss sssssss |
||
+ | 1biiA hhh ssssss sss ssss ssssss sssss |
||
+ | TARGET 244 QPMDAKEFEP KDVLPNGDGT YQGWITLAVP PGEEQRYTCQ VEHPGLDQPL |
||
+ | 1biiA 221 geeltqemel vetrpagdgt fqkwasvvvp lgkeqkytch veheglpepl |
||
+ | TARGET ss s sss s sssssssss sssss ss s |
||
+ | 1biiA ss s sss s sssssssss sss ss s |
||
+ | TARGET 294 IVIW |
||
+ | 1biiA 271 tlrw- |
||
+ | TARGET ss |
||
+ | 1biiA ss |
||
+ | |||
+ | As one can see, the alignment shows a very similar secondary structure, and the 3D structure is very similar as well. The RMSD for the model is about 2.9. This is quite a good result, but only the superimposed residues are used for the calculation, so the missing beta-sheet is not part of it. In general, however, we see results comparable to the self hit model. Figure 15 and Figure 16 again show the score distribution compared to other models of the same size. In Figure 17 we can see that the torsion angles cause the main issues, as in the self hit model. This could mean that the torsion angles in this protein are not straightforward to model. The predicted error in Figure 18 shows a pattern comparable to that of the self hit model in Figure 13, but the high peak at the beginning is missing. |
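The remark that only superimposed residues enter the RMSD can be illustrated with a minimal sketch. It assumes that both structures are already superimposed and that each is given as a mapping from residue number to one coordinate triple (for example the C-alpha atom). Residues present in only one structure are ignored, which is why a missing beta-sheet does not worsen the value. The function name and the coordinates are hypothetical, and no superposition is performed here.

```python
import math


def rmsd_over_shared_residues(model, reference):
    """Return (RMSD, number of residues used) over residues present in both."""
    shared = sorted(set(model) & set(reference))
    if not shared:
        raise ValueError("the two structures share no residues")
    total = 0.0
    for residue in shared:
        # Squared distance between the two coordinate triples of this residue.
        total += sum((a - b) ** 2 for a, b in zip(model[residue], reference[residue]))
    return math.sqrt(total / len(shared)), len(shared)


# Toy coordinates: residue 3 exists only in the model and is therefore ignored.
model_coords = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (5.0, 5.0, 5.0)}
reference_coords = {1: (0.0, 0.0, 1.0), 2: (1.0, 0.0, 1.0)}
print(rmsd_over_shared_residues(model_coords, reference_coords))
```

Reporting the number of residues used alongside the RMSD makes it easier to judge how much of the structure the value actually covers.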
||
+ | |||
+ | == MODELLER == |
||
+ | MODELLER<ref>Eswar N. et al. Comparative protein structure modeling using MODELLER.</ref> is a standalone application used for protein structure modeling by satisfaction of spatial restraints. These restraints derive from different types of information, so the model does not have to be based only on the target-template alignment (although it can be). MODELLER is capable of pairwise and multiple alignment, fold assignment and loop modeling. |
||
+ | |||
+ | We downloaded and installed Modeller locally on our Windows PC and used the examples given in the [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Workflow_homology_modelling_glucocerebrosidase#MODELLER Workflow homology modeling glucocerebrosidase]. |
||
+ | |||
+ | Our target has been set to the FASTA sequence of [http://www.uniprot.org/uniprot/Q30201.fasta HFE_HUMAN]. |
||
+ | Our standard template for the single template-target alignment is chain A of '''1BII''', because it covers the whole sequence of HFE_HUMAN. For the multiple sequence alignment we used the protein structures '''1S79''' and '''3P73''' in addition to 1BII. 1S79 was chosen because of its relatively high sequence identity of about 37%, and 3P73 because it is a classical MHC class I molecule with a function similar to that of the HFE_HUMAN protein. |
||
+ | |||
+ | === Single template-target === |
||
+ | |||
+ | ==== Scripts ==== |
||
+ | |||
+ | '''script_pairwise-alignment-template-target.py''' |
||
+ | from modeller import * |
||
+ | |||
+ | env = environ() |
||
+ | aln = alignment(env) |
||
+ | mdl = model(env, file='1BII.pdb', model_segment=('FIRST:A', 'END:A')) |
||
+ | aln.append_model(mdl, align_codes='1BII', atom_files='1BII.pdb') |
||
+ | aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN') |
||
+ | aln.align2d() |
||
+ | aln.check() |
||
+ | aln.write(file='pairwise-2d.ali', alignment_format='PIR') |
||
+ | aln.align() |
||
+ | aln.check() |
||
+ | aln.write(file='pairwise.ali', alignment_format='PIR') |
||
+ | |||
+ | '''script_pairwise-to-model.py''' |
||
+ | from modeller import * |
||
+ | from modeller.automodel import * |
||
+ | |||
+ | env = environ() |
||
+ | a = automodel(env, |
||
+ | alnfile = 'pairwise.ali', #file:pir:alignment |
||
+ | knowns = '1BII', #file:pdb:template |
||
+ | sequence = 'HFE_HUMAN', #id:target |
||
+ | assess_methods=(assess.DOPE, assess.GA341)) |
||
+ | a.starting_model= 1 |
||
+ | a.ending_model = 1 |
||
+ | a.make() |
||
+ | b = automodel(env, |
||
+ | alnfile = 'pairwise-2d.ali', #file:pir:alignment |
||
+ | knowns = '1BII', #file:pdb:template |
||
+ | sequence = 'HFE_HUMAN', #id:target |
||
+ | assess_methods=(assess.DOPE, assess.GA341)) |
||
+ | b.starting_model= 2 |
||
+ | b.ending_model = 2 |
||
+ | b.make() |
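Since both runs request the DOPE assessment, the resulting models can be ranked by that score. The following is a minimal sketch, assuming the results are available as a list of dictionaries with the keys `name`, `failure` and `DOPE score`, which is how the Modeller documentation describes the `outputs` attribute of `automodel` after `make()`. To keep the sketch runnable without Modeller, it operates on plain dictionaries. The file names and score values are hypothetical.

```python
def best_by_dope(outputs):
    """Return the successfully built model with the lowest (best) DOPE score."""
    ok_models = [entry for entry in outputs if entry["failure"] is None]
    if not ok_models:
        raise ValueError("no model was built successfully")
    # Lower DOPE scores indicate more favorable models.
    return min(ok_models, key=lambda entry: entry["DOPE score"])


# Hypothetical results in the assumed layout.
example_outputs = [
    {"name": "model_a.pdb", "failure": None, "DOPE score": -30000.0},
    {"name": "model_b.pdb", "failure": None, "DOPE score": -31000.0},
    {"name": "model_c.pdb", "failure": "build failed", "DOPE score": None},
]
print(best_by_dope(example_outputs)["name"])
```

In the scripts above, the same function could be applied to the combined results of `a` and `b`, provided they have the layout assumed here.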
||
+ | |||
+ | ==== Alignments ==== |
||
+ | |||
+ | We used two different alignments for Modeller, one without use of structural information at the template side: |
||
+ | |||
+ | '''pairwise.ali''' |
||
+ | >P1;1BII |
||
+ | structureX:1BII.pdb: 1 :A:+383 :P:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; |
||
+ | ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; |
||
+ | EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON |
||
+ | HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28 |
||
+ | -------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR |
||
+ | WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI |
||
+ | ALNEDLKTWTAADMAAQITRRKWEQAGAAER-DRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD |
||
+ | VTLRCWALGFYPADITLTWQLNGEEL-TQEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL |
||
+ | RW/IQKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTEFTP |
||
+ | TETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI* |
||
+ | |||
+ | >P1;HFE_HUMAN |
||
+ | sequence:reference: : : : :::-1.00:-1.00 |
||
+ | MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDH--ESRRVEPRTP |
||
+ | WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNST-EGYWKYGYDGQDHL |
||
+ | EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV |
||
+ | TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV |
||
+ | IW-------------EPSPSGTLVI---------------------GVISGIAVFVVILFIGILFIILRKRQGSR |
||
+ | GAMGHYV-------LAERE-------------------* |
||
+ | |||
+ | And one with the use of structural information at the template side: |
||
+ | |||
+ | '''pairwise-2d.ali''' |
||
+ | |||
+ | >P1;1BII |
||
+ | structureX:1BII.pdb: 1 :A:+383 :P:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; |
||
+ | ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; |
||
+ | EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON |
||
+ | HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28 |
||
+ | ---------------------G----SHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR |
||
+ | WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI |
||
+ | ALNEDLKTWTAADMAAQITRRKWE-QAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD |
||
+ | VTLRCWALGFYPADITLTWQLNGEELT-QEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL |
||
+ | RW/I---QKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTE |
||
+ | FTPTETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI* |
||
+ | |||
+ | >P1;HFE_HUMAN |
||
+ | sequence:reference: : : : :::-1.00:-1.00 |
||
+ | MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYD--HESRRVEPRTP |
||
+ | WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSK-ESHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHL |
||
+ | EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV |
||
+ | TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV |
||
+ | IW-EPSPSGTLVIGVIS---------GIAVFVVILF--IGILFIILRK-RQGSRGAMGH---------YVLAERE |
||
+ | -----------------------------------------* |
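To compare the two alignments programmatically, the PIR files can be read back into aligned sequences. The following is a minimal sketch, assuming each record consists of a `>P1;code` header, exactly one description line, and sequence lines terminated by `*`. The long description shown above is assumed to be a single line in the actual file. The function name and the toy records are hypothetical.

```python
def parse_pir(text):
    """Return a dict mapping each PIR code to its aligned sequence (gaps kept)."""
    records = {}
    code = None
    expect_description = False
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">P1;"):
            code = line[4:]
            expect_description = True
            chunks = []
            continue
        if code is None:
            continue  # ignore anything outside a record
        if expect_description:
            expect_description = False  # skip the single description line
            continue
        if line.endswith("*"):
            chunks.append(line[:-1])
            records[code] = "".join(chunks)
            code = None  # the record is complete
        else:
            chunks.append(line)
    return records


# Toy alignment in the assumed layout.
example = """>P1;tmplA
structureX:tmpl:1:A:5:A::::
AC-DE
FG*

>P1;target
sequence:target:::::::
ACQDE
FG*
"""
print(parse_pir(example))
```

Because gaps are kept, two sequences from the same file have equal length and can be compared column by column.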
||
+ | |||
+ | The models will be presented under [[Task_4#Model_comparison|model comparison]], but surprisingly the model built with the structural information is worse than the model built without it. We think Modeller has difficulty threading the sequence of HFE_HUMAN into the given structure of 1BII. We therefore suspect that 1a6z, which has a very similar structure to 1bii, has a different amino acid composition for this type of structure. At the moment we have no way to test and prove this. |
||
+ | |||
+ | === Alignment: multiple template-target === |
||
+ | |||
+ | ==== Scripts ==== |
||
+ | |||
+ | '''script_msa-align-templates.py''' |
||
+ | from modeller import * |
||
+ | |||
+ | env = environ() |
||
+ | aln = alignment(env) |
||
+ | for (code, chain) in (('1BII', 'A'), ('1S79', 'A'), ('3P73', 'A')): |
||
+ |     mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain)) |
||
+ |     aln.append_model(mdl, atom_files=code, align_codes=code+chain) |
||
+ | aln.salign() |
||
+ | aln.check() |
||
+ | aln.write(file='MSA.ali', alignment_format='PIR') |
||
+ | |||
+ | '''script_msa-align-target-to-msa.py''' |
||
+ | from modeller import * |
||
+ | |||
+ | env = environ() |
||
+ | aln = alignment(env) |
||
+ | aln.append(file='MSA.ali', align_codes='all') |
||
+ | aln_block = len(aln) |
||
+ | aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN') |
||
+ | aln.salign() |
||
+ | aln.check() |
||
+ | aln.write(file='MSA.ali', alignment_format='PIR') |
||
+ | |||
+ | '''script_msa-to-model.py''' |
||
+ | from modeller import * |
||
+ | from modeller.automodel import * |
||
+ | |||
+ | env = environ() |
||
+ | a = automodel(env, |
||
+ | alnfile = 'MSA.ali', #file:pir:alignment |
||
+ | knowns = ('1BIIA', '1S79A', '3P73A'), #file:pdb:template |
||
+ | sequence = 'HFE_HUMAN', #id:target |
||
+ | assess_methods=(assess.DOPE, assess.GA341)) |
||
+ | a.starting_model = 1 |
||
+ | a.ending_model = 1 |
||
+ | a.make() |
||
+ | |||
+ | ==== Alignment ==== |
||
+ | |||
+ | The MSA used by Modeller is: |
||
+ | |||
+ | >P1;1BIIA |
||
+ | structureX:1BII:1 :A:+274 :A:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; |
||
+ | ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; |
||
+ | EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE |
||
+ | MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28 |
||
+ | -------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR |
||
+ | WIEQEGPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYIA |
||
+ | LNEDLKTWTAADMAAQITRRKWEQAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGDVT |
||
+ | LRCWALGFYPADITLTWQLNGEELTQ-EMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTLRW |
||
+ | ---------------------------------------------------* |
||
+ | |||
+ | >P1;1S79A |
||
+ | structureX:1S79:100 :A:+103 :A:MOL_ID 1; MOLECULE LUPUS LA PROTEIN; CHAIN A; FRAGMENT CENTRAL RRM; SYNONYM SJOGREN SYNDROME TYPE B ANTIGEN, SS-B, LA RIBONUCLEOPROTEIN, LA AUTOANTIGEN; ENGINEERED YES:MOL_ID |
||
+ | 1; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE SSB; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3)PLYSS; |
||
+ | EXPRESSION_SYSTEM_VECTOR PET28:-1.00:-1.00 |
||
+ | -------------------------GRWILKNDVKNRSVYIKGFPTDATLDDIK--------------------- |
||
+ | --------------------------------------------------------------------------- |
||
+ | ----------------------------------------EWLEDKGQVLNIQMRRT------------------ |
||
+ | --------------LHKAFKGSIFVV-FDSIESAKKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE--- |
||
+ | ---------------------------------------------------* |
||
+ | |||
+ | >P1;3P73A |
||
+ | structureX:3P73:-1 :A:+275 :A:MOL_ID 1; MOLECULE MHC RFP-Y CLASS I ALPHA CHAIN; CHAIN A; FRAGMENT UNP RESIDUES 20-294; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2-MICROGLOBULIN; CHAIN B; ENGINEERED |
||
+ | YES:MOL_ID 1; ORGANISM_SCIENTIFIC GALLUS GALLUS; ORGANISM_COMMON BANTAM,CHICKENS; ORGANISM_TAXID 9031; GENE YFV; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN |
||
+ | TB1; EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PMAL-P4X; MOL_ID 2; ORGANISM_SCIENTIFIC GALLUS GALLUS; ORGANISM_COMMON BANTAM,CHICKENS; ORGANISM_TAXID 9031; GENE B2M; EXPRESSION_SYSTEM |
||
+ | ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN TB1; EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PMAL-P4X: 1.32: 0.16 |
||
+ | -----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKSRTA--QPIVE |
||
+ | MLPQEDQEHWDTQTQKAQGGERDFDWNLNRLPERYNKSKG-SHTMQMMFGCDILEDGS-IRGYDQYAFDGRDFLA |
||
+ | FDMDTMTFTAADPVAEITKRRWETEGTYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEADGILT |
||
+ | LSCHAHGFYPRPITISWMKDGMVRDQ-ETRWGGIVPNSDGTYHASAAIDVLPEDGDKYWCRVEHASLPQPGLFSW |
||
+ | EP------------------------------------------------Q* |
||
+ | |||
+ | >P1;HFE_HUMAN |
||
+ | sequence:reference: : : : :::-1.00:-1.00 |
||
+ | MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVE-PRTPW |
||
+ | VSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHLE |
||
+ | FCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTT |
||
+ | LRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW |
||
+ | EPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE* |
||
+ | |||
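To check how much of the target each template actually covers in an alignment like the one above, the PIR blocks can be parsed with a few lines of Python. This is only a minimal sketch and not part of our original workflow: the function names `parse_pir`, `coverage` and `identity` are hypothetical, the parser assumes that the description fits on the single line following each `>P1;` line (as in a real PIR file, not as wrapped on this page), and the toy alignment is invented for illustration.

```python
def parse_pir(text):
    """Parse PIR alignment text into {code: aligned sequence without '*'}."""
    records = {}
    code, skip_description, chunks = None, False, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">P1;"):
            # New entry: the next non-empty line is the description line.
            code, skip_description, chunks = line[4:], True, []
        elif skip_description:
            skip_description = False
        elif code is not None:
            chunks.append(line.rstrip("*"))
            if line.endswith("*"):  # '*' terminates the sequence
                records[code] = "".join(chunks)
                code = None
    return records


def coverage(target, template):
    """Fraction of target residues aligned to a template residue (not a gap)."""
    pairs = [(t, s) for t, s in zip(target, template) if t != "-"]
    return sum(1 for _, s in pairs if s != "-") / len(pairs)


def identity(target, template):
    """Fraction of identical residues among columns without gaps."""
    both = [(t, s) for t, s in zip(target, template) if t != "-" and s != "-"]
    return sum(1 for t, s in both if t == s) / len(both) if both else 0.0


if __name__ == "__main__":
    # Invented toy alignment, only to show the calling convention.
    toy = """>P1;tmpl
structureX:tmpl:1:A:5:A::::
AC-E
FG*

>P1;targ
sequence:targ::::::::
ACDE
-G*
"""
    aln = parse_pir(toy)
    print(coverage(aln["targ"], aln["tmpl"]), identity(aln["targ"], aln["tmpl"]))
```

Applied to the alignment above, `coverage` would make the difference between 1BII (covers most of HFE_HUMAN) and 1S79 (covers only short stretches) explicit.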
+ | ==== Model editing ==== |
||
+ | We tried to edit the single-template model of MODELLER, because it is one of our best models. When we inspected the alignment with Jalview 2.6 (Figure 19), we noticed that it is already very well defined and that changes would only lead to worse results. The average conservation score is about 7 to 8 and the quality score about 5 to 6. |
||
+ | |||
+ | [[Image:Model_modeller_pw.png|thumb|450px|Figure 19: visualization of the single-template model of MODELLER done by Jalview]] |
||
+ | |||
+ | The hydrophobic groups are also very well aligned, so we decided to leave that model as it is, because there is nothing to edit. Only the end of the alignment contains many gaps, but shifting them would break the conserved block in the middle of the alignment. |
||
+ | |||
+ | The only difference between the sse-supported model and the single-template model is the set of differently aligned residues (Figure 20). These result from the secondary structure information of the template that was incorporated into the model, and thus we will not edit them. |
||
+ | |||
+ | [[Image:Model_modeller_pw-2d.png|thumb|450px|Figure 20: visualization of the sse-supported model of MODELLER done by Jalview]] |
||
+ | |||
+ | The MSA model is hard to edit because all sequences are aligned to each other at the same time. We tried moving some aligned groups to different positions inside the sequence alignment, but were not able to keep the corresponding alignment of the other sequences consistent. After some unsuccessful attempts we decided to leave that alignment as it is, too (Figure 21). |
||
+ | |||
+ | [[Image:Model_modeller_msa.png|thumb|450px|Figure 21: visualization of the multi-template model of MODELLER done by Jalview]] |
||
+ | |||
+ | |||
+ | In summary, we did not successfully edit any alignment, because either there was nothing to edit or the editing was too complicated and introduced too many errors. |
||
==Model comparison== |
||
+ | ===3D-Jigsaw=== |
||
+ | We had several issues with the execution of 3D-Jigsaw<ref>Bates, P.A., Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2001) Enhancement of Protein Modelling by Human Intervention in Applying the Automatic Programs 3D-JIGSAW and 3D-PSSM.</ref>, such as strange error messages and rejection of our input. Finally we got it to work with the following settings: |
||
+ | |||
+ | * Server: http://bmm.cancerresearchuk.org/~populus/populus_submit.html |
||
+ | * Mode: upload |
||
+ | * sequence box: FASTA sequence of [http://www.pdb.org/pdb/files/fasta.txt?structureIdList=1A6Z 1A6Z_A] |
||
+ | * own models: one PDB file containing all of our models (except the SwissModel self-hit), separated by 'TER' records |
||
+ | * predicted runtime: 18.33 hours |
||
+ | |||
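The combined upload file described above can be assembled with a short script. The following is a minimal sketch under our own assumptions, not a tool provided by 3D-Jigsaw: the function name `merge_models` is hypothetical, only `ATOM` and `HETATM` records are kept, and each model is closed with a `TER` record. Whether the server needs any further records is not something we verified.

```python
def merge_models(pdb_texts):
    """Concatenate the coordinate records of several PDB files.

    Each model contributes its ATOM/HETATM lines followed by one TER
    record; header and remark lines are dropped.
    """
    merged = []
    for text in pdb_texts:
        coords = [line for line in text.splitlines()
                  if line.startswith(("ATOM", "HETATM"))]
        if not coords:
            continue  # skip empty or header-only inputs
        merged.extend(coords)
        merged.append("TER")
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # Two invented single-atom "models", only to show the output layout.
    model_a = "REMARK model a\nATOM      1  CA  ARG A   1       0.000   0.000   0.000\nEND\n"
    model_b = "ATOM      1  CA  ARG A   1       1.000   0.000   0.000\n"
    print(merge_models([model_a, model_b]), end="")
```

In practice one would read the model files from disk and write the returned string to a single `.pdb` file for the upload.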
+ | ==== Result ==== |
||
+ | |||
+ | {| border="1" |
||
+ | !Name |
||
+ | !Data |
||
+ | |- |
||
+ | | Length || <font face="Courier New">_________10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270___ </font> |
||
+ | |- |
||
+ | | AA || <font face="Courier New">RLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW </font> |
||
+ | |- |
||
+ | | Prediction || <font face="Courier New"> CCCC<font color = blue>EEEEEEEEEE</font>CCCCCCCCCC<font color = blue>EEEEEEE</font>CCCC<font color = blue>EEEE</font>CCCCCCCCC<font color= red>HHHHHH</font>CCCCC<font color= red>HHHHHHHHHHHHH</font>CC<font color= red>HHHHHHHHHHHHH</font>CCCCCC<font color = blue>EEEE</font>CCCCC<font color = blue>EE</font>CCCCCCCC<font color = blue>EEE</font>CCCCCC<font color = blue>EEEEE</font>C<font color= red>HHHHHH</font>CCCCC<font color= red>HHH</font>CC<font color= red>HHHH</font>CCC<font color= red>HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH</font>CCCCCC<font color = blue>EEEEEEEEE</font>CC<font color = blue>EEEEEEEE</font>CCCCCCC<font color = blue>EEEEEEE</font>CC<font color = blue>EE</font>CC<font color= red>HHH</font><font color = blue>EEEEEEE</font>CCCCCCCCC<font color = blue>EEEEEE</font>CCCCCCC<font color = blue>EEEEEEE</font>CCCCCC<font color = blue>EEEE</font>C </font> |
||
+ | |- |
||
+ | | Confidence || <font face="Courier New">93303453556763258999987358999894885499848988867403652146670222210276553000136609989986528798168852425544699858524402146712899971352101652001125776224548999999999998999999999987887332589869999951389499999762611761499996677566832279987550889983003699826988533699999504888767859 </font> |
||
+ | |- |
||
+ | | Disorder || <font face="Courier New"><font color = orange>DDDD</font><font color = green>OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO</font><font color = orange>D</font><font color = green>OOOOOOOOOO</font><font color = orange>DDDDDDDDDD</font><font color = green>OOOO</font><font color = orange>DDD</font><font color = green>OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO</font><font color = orange>DD</font><font color = green>OOO</font><font color = orange>DDDDD</font><font color = green>OOOOOOOOOOOOOOOOOOOOOOOOOOOO</font><font color = orange>DD</font><font color = green>OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO</font><font color = orange>DDD</font><font color = green>OOOOOOOOOOOO</font><font color = orange>D</font><font color = green>OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO</font><font color = orange>D</font><font color = green>OOO</font><font color = orange>DD</font> </font> |
||
+ | |- |
||
+ | |} |
||
+ | |||
+ | 3D-Jigsaw also gave us information about the predicted secondary structure and the ordered and disordered regions. It used that information to optimize all of our submitted models. All optimized models have an energy of about -340 and a coverage of 0.99, which is really good. Their data and pictures are shown in the table below. |
||
+ | |||
+ | ===Model evaluation=== |
||
+ | |||
+ | After trying several tools (RasWin, JMol, SwissPDB-Viewer), we decided to use PyMol for superimposing and displaying the model-target alignment of the proteins. We removed chains C and D from the original HFE_HUMAN structure (PDB ID: 1a6z) and used only chains A and B for displaying. The original HFE_HUMAN is always shown in green and the model in red (see table below). |
||
+ | |||
+ | We created the pictures of our models using PyMol as follows: |
||
+ | * load '1A6Z_AB.pdb' into PyMol (alternatively: command 'fetch 1A6Z' and then hide chain C and D) |
||
+ | * hide everything |
||
+ | * show cartoon |
||
+ | * color 1A6Z_AB green |
||
+ | * load 'model.pdb' into PyMol |
||
+ | * hide everything |
||
+ | * show cartoon |
||
+ | * color model red |
||
+ | * align 'model.pdb' to '1A6Z_AB.pdb' |
||
+ | * command 'ray' (nicer output!) |
||
+ | * save the image |
||
+ | |||
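The manual steps above can also be written into a PyMOL script so that every model is rendered in the same way. This is a minimal sketch, assuming the same colour convention as in our pictures (reference green, model red); the function name `superposition_script`, the object names `reference` and `model` and the file names are our own choices for illustration. The generated commands (`load`, `hide`, `show`, `color`, `align`, `ray`, `png`) are ordinary PyMOL commands.

```python
def superposition_script(reference_pdb, model_pdb, image_png):
    """Return PyMOL commands that superimpose a model onto a reference."""
    commands = [
        f"load {reference_pdb}, reference",
        f"load {model_pdb}, model",
        "hide everything",
        "show cartoon",
        "color green, reference",
        "color red, model",
        "align model, reference",  # the model is moved onto the reference
        "ray",
        f"png {image_png}",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # File names are placeholders; write the text to e.g. 'superimpose.pml'.
    print(superposition_script("1A6Z_AB.pdb", "model.pdb", "model_vs_1a6z.png"), end="")
```

The resulting file can be run without the graphical interface, for example with `pymol -cq superimpose.pml`.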
+ | |||
+ | For evaluating our models with RMSD and TM-score we used TMalign. We were advised to use SAP for the RMSD and TMScore for the TM-score, but TMScore failed because our target is the HFE_HUMAN sequence from UniProt and is therefore longer than the '1BII' template. TMScore needs PDB files of the same length, and thus its superposition does not work properly in our case. |
||
+ | |||
+ | TMalign can handle PDB files of different length, and the scores are normalized by the second structure. We use '1A6Z' as the second structure to obtain comparable scores for all our models. The modeling of HFE_HUMAN is very difficult because it is a multi-domain protein, and none of the methods supports multi-domain modeling. |
||
+ | |||
+ | [http://zhanglab.ccmb.med.umich.edu/TM-align/ TMalign] can be found at the website of the Zhang-Lab. |
||
+ | |||
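To make clear why normalizing by the same second structure gives comparable scores, the two measures can be written down for a fixed superposition. This is a minimal sketch and not the TMalign implementation: TMalign additionally searches for the alignment and superposition that maximize the score, whereas the hypothetical helpers below (`d0`, `tm_score`, `rmsd`) only evaluate the published TM-score formula and the RMSD for a given list of distances between aligned residue pairs (in Ångström).

```python
import math


def d0(length):
    """Length-dependent distance scale of the TM-score."""
    if length <= 18:
        raise ValueError("sketch only valid for chains longer than 18 residues")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, norm_length):
    """TM-score of aligned residue pairs, normalized by norm_length.

    norm_length is the length of the structure used for normalization
    (in our case always 1A6Z), not the number of aligned pairs.
    """
    scale = d0(norm_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / norm_length


def rmsd(distances):
    """Root mean square deviation over the aligned residue pairs only."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


if __name__ == "__main__":
    # Invented distances: 50 perfectly placed pairs in a 100-residue reference.
    pairs = [0.0] * 50
    print(tm_score(pairs, 100), rmsd(pairs))
```

Because unaligned residues contribute nothing to the sum but still count in `norm_length`, a model that covers only part of the reference is penalized in the TM-score even if its RMSD over the aligned part is small.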
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | |'''Picture''' |
||
+ | |'''Model''' |
||
+ | |'''RMSD''' |
||
+ | |'''TM-Score''' |
||
+ | |'''Optimized picture''' |
||
+ | |'''Optimized RMSD''' |
||
+ | |'''Optimized TM-Score''' |
||
+ | |'''3D-JigSaw energy calculation''' |
||
+ | |- |
||
+ | | [[Image:Modeller_1a6z(green,AB)_model(red).png|thumb| MODELLER: superimposed, green:1a6z, red:model(1BII)]]|| MODELLER: superimpose, template:1BII || 2.58 || 0.86468 || [[Image:3dj_m1_pw_1a6z(green,AB)_model(red).png|thumb| Optimized MODELLER pw-model by 3D-Jigsaw]]|| 1.70 || 0.95082 || -341.87 |
||
+ | |- |
||
+ | | [[Image:Modeller_sse-support_1a6z(green,AB)_model(red).png|thumb| MODELLER: sse-support, superimposed, green:1a6z, red:model(1BII)]] || MODELLER: superimpose, sse-support, template:1BII || 3.42 || 0.59586 || [[Image:3dj_m2_pw2d_1a6z(green,AB)_model(red).png|thumb| Optimized MODELLER sse-model by 3D-Jigsaw]] || 0.98 || 0.96990 || -341.41 |
||
+ | |- |
||
+ | | [[Image:Modeller_msa_1a6z(green,AB)_model(red,1BII,1S79,3P73).png|thumb| MODELLER: msa, superimposed, green:1a6z, red:model(1BII,1S79,3P73)]] || MODELLER: superimpose, msa, template:1BII,1S79,3P73 || 2.05 || 0.89042 || [[Image:3dj_m3_msa_1a6z(green,AB)_model(red).png|thumb| Optimized MODELLER msa-model by 3D-Jigsaw]] || 1.70 || 0.95087 || -341.23 |
||
+ | |- |
||
+ | | [[Image:I_tasser_1a6z_superimpose.PNG|thumb| I_Tasser: superimposed, green:1a6z, red:model]] || I-Tasser || 1.61 || 0.93760 || [[Image:3dj_m4_itas_1a6z(green,AB)_model(red).png|thumb| Optimized I_Tasser model by 3D-Jigsaw]] || 2.48 || 0.87855 || -339.33 |
||
+ | |- |
||
+ | | [[Image:Swissmodel_1a6z_superimposed.PNG|thumb| SwissModel: superimposed, green:1a6z, red:model]] || SwissModel || 2.67 || 0.85048 || [[Image:3dj_m5_sm_1a6z(green,AB)_model(red).png|thumb| Optimized SwissModel model by 3D-Jigsaw]] || 2.48 || 0.87851 || -339.17 |
||
+ | |- |
||
+ | | [[Image:I_tasser_swiss_self_superimpose.PNG|thumb| SwissModel: self-hit, superimposed, green:1a6z, red:model]] || SwissModel self || 0.08 || 0.99984 |
||
+ | |- |
||
+ | |} |
||
+ | |||
+ | As one can clearly see, the I-Tasser model is the best with a TM-score of ~0.94, followed by the MSA model of MODELLER with a TM-score of ~0.89, the single-template model of MODELLER with ~0.86 and the SwissModel model with ~0.85. |
||
+ | |||
+ | The worst model is the MODELLER model that uses the secondary structure information of the template (sse-supported), with a TM-score of ~0.6. We are sure that the low sequence identity and secondary structure similarity of only 22% affected this model negatively, because the standard model is based on the same template and achieves a significantly higher TM-score. |
||
+ | |||
+ | All of our models are really good, except for the sse-supported model of MODELLER. |
||
+ | |||
+ | After optimization by 3D-Jigsaw all MODELLER models are much better, because 3D-Jigsaw cut off the clearly wrongly modeled stretches of amino acids. Surprisingly, the previously worst model, the sse-supported one, is now the best of all MODELLER models and even better than the previous best model from I-Tasser. It is not surprising that 3D-Jigsaw was also able to optimize the SwissModel model but failed at the I-Tasser model, because it incorporated the information of the less accurate models. Nevertheless, we did not expect the RMSD of the I-Tasser model to get worse after the 3D-Jigsaw optimization. |
||
+ | |||
+ | Our models are still all very good, but the best one is now the sse-supported model of MODELLER with an RMSD below 1 and a TM-score of almost 1; it is now an almost perfect model. The second and third best models are the standard pairwise model and the MSA model of MODELLER, which are now very similar according to RMSD and TM-score. The I-Tasser and SwissModel models are now very similar to each other, too. |
||
+ | |||
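The rankings discussed above can be reproduced directly from the TM-scores in the table. The following is a small sketch; the scores are copied from the table, while the dictionary layout and the helper names `ranked` and `change` are our own choices for illustration.

```python
# (TM-score before, TM-score after 3D-Jigsaw), copied from the table above.
TM_SCORES = {
    "MODELLER pairwise (1BII)": (0.86468, 0.95082),
    "MODELLER sse-support (1BII)": (0.59586, 0.96990),
    "MODELLER msa (1BII, 1S79, 3P73)": (0.89042, 0.95087),
    "I-Tasser": (0.93760, 0.87855),
    "SwissModel": (0.85048, 0.87851),
}


def ranked(column):
    """Model names sorted from best to worst; column 0 = before, 1 = after."""
    return sorted(TM_SCORES, key=lambda name: TM_SCORES[name][column], reverse=True)


def change(name):
    """Change of the TM-score caused by the 3D-Jigsaw optimization."""
    before, after = TM_SCORES[name]
    return after - before


if __name__ == "__main__":
    for label, column in (("before", 0), ("after", 1)):
        print(label, ranked(column))
    for name in TM_SCORES:
        print(f"{name}: {change(name):+.5f}")
```

Sorting by the second column shows the switch described in the text: the sse-supported model moves from last to first place, while I-Tasser is the only model whose TM-score decreases.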
+ | ==Discussion== |
||
+ | |||
+ | For the I-Tasser protocol it is not possible to choose a specific template, so we ran I-Tasser twice: first with standard parameters, and once with a sequence identity threshold of 80%. In the second case we again got a model based on a self hit, so we repeated the prediction a third time, with the same result. We were not able to find out why the given threshold was ignored. |
||
+ | |||
+ | Our attempts to get homologs in all given categories (>60%, >40%, >20%) were not successful, because HHSearch did not list matching ones. A BLAST search against the NR database also failed to provide acceptable results and returned only proteins with 40% or less sequence identity. Thus we conclude that the HFE family has a very high sequence diversity together with a high structural conservation. This theory was supported when we aligned the structural homologs listed in [http://www.cathdb.info/ CATH]. |
||
+ | |||
+ | The templates which we had chosen from the HHSearch results were meant to cover the whole protein sequence and to give special coverage of the transmembrane region. But as we saw later, the tools do not support multiple sequence alignments. Therefore we decided to use '1BII' as template for SwissModel and Modeller, because it covers the sequence completely, and with a sequence identity of 22% it is in the lower midrange of the HHSearch results. '1S79' has a higher sequence identity of 37% but a much worse conservation with HFE_HUMAN. We decided to rank coverage of the whole sequence higher than sequence identity. |
||
+ | |||
+ | After this task, we would suggest using SwissModel first to get a quick overview and a first idea of the protein structure. We would also recommend I-Tasser because of its good usability. The Modeller approach we would recommend only to experts who are really interested in a specific alignment, as its usability is awful for laymen. |
||
+ | |||
+ | ==== Extra diligence task ==== |
||
+ | We were not able to perform the task of calculating the RMSD of all atoms within a 6 Ångström threshold of the catalytic core, because no catalytic core is defined at [http://www.uniprot.org/uniprot/Q30201#section_features UniProt:Q30201(HFE_HUMAN)] or at [http://www.pdb.org/pdb/explore/explore.do?structureId=1A6Z PDB:1A6Z]. |
||
== References == |
||
<references /> |
||
+ | |||
+ | [[Category : Hemochromatosis]] |
Latest revision as of 14:59, 30 August 2011
by Robert Greil and Cedric Landerer
Homologous
Because we found no homologous structures in Task 2, we extended our list by using HHSearch.
HHSearch found only sequences with an identity below 40%; therefore we use the 12 proteins shown below to create a multiple alignment for homology modeling. We chose sequences that cover the whole protein and paid specific attention to the transmembrane region. In the cases where we can use only one template, we use 1BII (Figure 1).
PDB-ID | Identity | Description |
1S79 | 37% | human La protein |
2WY3 | 29% | HCMV UL16-MICB complex |
3P73 | 28% | classical MHC class I molecule |
3JTS | 25% | Mamu A*2 |
1KCG | 22% | NKG2D |
1BII | 22% | H-2DD MHC CLASS I |
1OW0 | 22% | human FcaRI |
2P24 | 21% | alphabeta TCR |
1CD1 | 21% | MHC-like fold with a large hydrophobic binding groove |
1HXM | 18% | Human Vgamma9/Vdelta2 T Cell Receptor |
1LQV | 14% | Endothelial protein C receptor |
1JFM | 14% | MURINE NK CELL LIGAND RAE-1 BETA |
With these sequences, including the HFE gene product (Q30201), we created a multiple sequence alignment with T-Coffee (EXPRESSO). This multiple sequence alignment is later used as a raw alignment in the Alignment Mode of SwissModel and in Modeller. Later on, we will try to obtain better models by editing the alignment so that functional regions are kept together.
DSSP --EEEEEEEEEEB-SS-SSB--EEEEEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH Q30201 MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHES--RRVE-PRTPWVSSRISSQMWLQLSQSLKGWDHM 1S79_A --------------------------------------------------------------------GRW-IL-KNDVKNRSVYIKGFPTDATLDDIKE 3P73_A -----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKS--RTAQ-PIVEML-PQEDQEHWDTQTQKAQGGERD 1KCG_C -------------------------DAHSLWYNFTIIHLPRHGQQWCEVQSQVDQKNFLSYDCGS--DKVLSMGHL-EEQLYATDAWGKQLEMLREVGQR 1JFM_A -------------------------DAHSLRCNLTIKDPTPADPLWYEAKCFVGEILILHLSNIN--KTMT-SG-DPGETANATEVKKCLTQPLKNLCQK 1BII_A -MGAMAPRTLLLLLAAALGPTQTRAGSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYE-PRARWIE-QEGPEYWERETRRAKGNEQS 2P24_A ----------------------------------------------------------------------------M----AIMAPRTLVLLLSGALALT 1CD1_A -----------------------QQKNYTFRCLQMSSFANR-SWSRTDSVVWLGDLQTHRWSNDS--ATIS-FTKPWSQGKLSNQQWEKLQHMFQVYRVS 2WY3_A ------------------------MEPHSLRYNLMVLSQDESVQSGFLAEGHLDGQPFLRYDRQK--RRAK-PQGQWAEDVLGAETWDTETEDLTENGQD 1LQV_A -------------------SQDASDGLQRLHMLQISYFR-DPYHVWYQGNASLGGHLTHVLEGPDTNTTII-QLQPL----QEPESWARTQSGLQSYLLQ 3JTS_A -------------------------GSHSMRYFYTSMSRPGRWEPRFIAVGYVDDTQFVRFDSDAASQRME-PRAPWVE-QEGPEYWDRETRNMKAETQN 1OW0_A ---------------------------------------------------------------------------------------------------- 1HXM_A -------------------------------------------------------------------------------------AIELVPEHQTVPVSI DSSP HHHHHHHHTTT-SSS--E--------EEEEEE-EEE-TTS-E-EEE-E------------EEEETTEE----------------EEEEEGGGTEEEES-- Q30201 FTVDFWTIMENHN-HSKE--------SHTLQV-ILGCEMQED-NST-E------------GYWKYGYD----------------GQDHLEFCPDTLDW-- 1S79_A WLEDKGQV-LNIQMRRTL--------HKAFKG-SIFVVFDSI-ESA-KKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE--------------- 3P73_A FDWNLNRLPERYN-KSKG--------SHTMQM-MFGCDILED-GSI-R------------GYDQYAFD----------------GRDFLAFDMDTMTF-- 1KCG_C 
LRLELADT---------ELEDFTPSGPLTLQV-RMSCECEAD-GYI-R------------GSWQFSFD----------------GRKFLLFDSNNRKW-- 1JFM_A LRNKVSNT-KVDTHKTNG--------YPHLQV-TMIYPQSQG-RTP-S------------ATWEFNIS----------------DSYFFTFYTENMSW-- 1BII_A FRVDLRTALRYYNQSAGG--------SHTLQW-MAGCDVESD-GRLLR------------GYWQFAYD----------------GCDYIALNEDLKTW-- 2P24_A QTWAGSHSRGEDD--IEA--------DHVGSYGIVVYQSP----GD-I------------GQYTFEFD----------------GDELFYVDLDKKET-- 1CD1_A FTRDIQELVKMMSPKEDY--------PIEIQL-SAGCEMYPG-NAS-E------------SFLHVAFQ----------------GKYVVRFWG--TSWQT 2WY3_A LRRTLTHI----KDQKGG--------LHSLQE-IRVCEIHED-SST-R------------GSRHFYYN----------------GELFLSQNLETQES-- 1LQV_A FHGLVRLVHQERT--LAF--------PLTIRC-FLGCELPPEGSRA-H------------VFFEVAVN----------------GSSFVSFRPERALW-- 3JTS_A APVNLRNLRGYYNQSEAG--------SHTIQR-MYGCDLGPD-GRLLR------------GYHQSAYD----------------GKDYIALNEDLRSW-- 1OW0_A -----ACHPRLSLHRPAL--------EDLLLG-SEANLTCTL-TGLRD------------ASGVTFTW----------------TPSSGKSAV--QGPPE 1HXM_A GVPATLRCSMKGEAIGNY--------YINWYR-KTQGNTMTF-IYRE-------------KDIYGPGF----------------KDNFQGDIDIAKNL-- DSSP SGG-G----HHH-HHHHHSSTHHH--HHHHHHHHTHHHHHHHHHHHHHTTTSS--B--EEEEEEEE-SS-----E-EEEEEEEEEBSS--EEEEEETTEE Q30201 RAA-E----PRA-WPTKLEWERHK--IRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVT----S-SVTTLRCRALNYYPQNITMKWLKD 1S79_A ---------------------------------------------------------------------------------------------------- 3P73_A TAA-D----PVA-EITKRRWETEG--TYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEA----D-GILTLSCHAHGFYPRPITISWMKD 1KCG_C TVV-H----AGA-RRMKEKWEKDS--GLTTFFKMVSMRDCKSWLRDFLMHRKKRLE-------------------------------------------- 1JFM_A RSA-N----DES-GVIMNKWKDDG--EFVKQLKFLI-HECSQKMDEFLKQSKEK---------------------------------------------- 1BII_A TAA-D----MAA-QITRRKWEQA---GAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRR----PEGDVTLRCWALGFYPADITLTWQLN 2P24_A IWM-------------LPEFAQLR--SFDPQGGLQNIATGKHNLGVLTKRSNSTPATNEAPQATVFPKSP--VLLGQPNTLICFVDNIFPPVINITWLRN 1CD1_A 
VPGAP----SWL-DLPIKVLNADQ--GTSATVQMLLNDTCPLFVRGLLEAGKSDLEKQEKPVAWLSSVP---SSAHGHRQLVCHVSGFYPKPVWVMWMRG 2WY3_A TVP-QSSRAQTLAMNVTNFW-KEDAMKTKTHYRAMQ-ADCLQKLQRYLKSGVAIRRTVPPMVNVTCSEVS----EGNITVTCRASSFYPRNITLTWRQDG 1LQV_A QAD-TQVTSGVV-TFTLQQLNAYN--RTRYELREFLEDTCVQYVQKHISAENTKGSQTSRSYTS------------------------------------ 3JTS_A TAA-D----MAA-QNTQRKWEAA---GEAEQHRTYLEGECLEWLRRYLENGKETLQRADPPKTHVTHHPV----SDQEATLRCWALGFYPAEITLTWQRD 1OW0_A R--DL----CGC-YSVSSVLPGCA--EPWNHGKTFTCTAAYPESKTPLTATLSKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQG 1HXM_A AVL-K----ILA-PSERDEGSYYC--ACDTLGMGGEYTDKLIFGKGTRVTVEPRSQPHTKPSVFVMKNG---------TNVACLVKEFYPKDIRINLVSS DSSP --GGGS---EEEE-TTS-E----EEEEEEEE-TTGGGGEE---EEEE-TTSSS-EEE-E- Q30201 K-QPMDAKEFEPKDVLPNG----DGTYQGWITLAVPPGEE---QRYTCQVEHPGLDQ-PLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQ 1S79_A ---------------------------------------------------------------------------------------------------- 3P73_A --GMVRDQETRWGGIVPNS----DGTYHASAAIDVLPEDG---DKYWCRVEHASLPQ-PGLFSWEPQ--------------------------------- 1KCG_C ---------------------------------------------------------------------------------------------------- 1JFM_A ---------------------------------------------------------------------------------------------------- 1BII_A --GEELTQEMELVETRPAG----DGTFQKWASVVVPLGKE---QKYTCHVEHEGLPE-PLTLRWGKEEPPSSTKTNTVIIAVPVVLGAVVILGAVMAFVM 2P24_A --SKSVADGVYETSFFVNR----DYSFHKLSYLTFIPSDD---DIYDCKVEHWGLEE-PVLKHWEPEIPAPMSELTETSGSRLEVLFQ------------ 1CD1_A --DQ-EQQGTHRGDFLPNA----DETWYLQATLDVEAGEE---AGLACRVKHSSLGG-QDIILYWDARQAPVGLIVFIVLIMLVVVGAVVYYIWRRRSAY 2WY3_A --VSLSHNTQQWGDVLPDG----NGTYQTWVATRIRQGEE---QRFTCYMEHSGNHG-THPVPSGKVLVLQSQRTDFPYVSAAMPCFVIIIILCVPCCKK 1LQV_A ---------------------------------------------------------------------------------------------------- 3JTS_A --GEDQTQDTELVETRPAG----DGTFQKWAAVVVPSGKE---QRYTCHVQHEGLRE-PLTLRWEP---------------------------------- 1OW0_A 
SQEL-PREKYLTW-ASRQEPSQGTTTFAVTSILRVAAEDWKKGDTFSCMVGHEALPLAFTQKTIDRLAGK------------------------------ 1HXM_A -----KKITEFDPAIVISP----SGKYNAVKLGKYE--DS---NSVTCSVQHDNK---TVHSTDFEVKTDSTDHVKPKETENTKQPSKS----------- DSSP Q30201 GSRGAMGHYVLAERE---------------- 1S79_A ------------------------------- 3P73_A ------------------------------- 1KCG_C ------------------------------- 1JFM_A ------------------------------- 1BII_A KRRRNTGGKGGDYALAPGSQSSDMSLPDCKV 2P24_A ------------------------------- 1CD1_A QDIR--------------------------- 2WY3_A KTSAAEGP----------------------- 1LQV_A ------------------------------- 3JTS_A ------------------------------- 1OW0_A ------------------------------- 1HXM_A -------------------------------
Compared with the secondary structure that DSSP assigns to the PDB structure of HFE (1a6z), the multiple sequence alignment conserves most of the secondary structure elements.
As HHSearch found only weak homologs, we searched CATH for structural homologs. The BLAST search in CATH found sequence homologs with identities ranging from 49% down to 22%. The HFE protein is classified as a two-domain protein (Alpha Beta, Mainly Beta)<ref>http://www.cathdb.info/domain/1a6zA01</ref>. We found both domains with a sequence similarity of 100%. We then used BLAST to spot-check the results with another search against CATH, and found the same sequence identity distribution for several proteins. After this BLAST search we are confident that HFE is a protein with highly conserved structural elements but very weak sequence conservation. Therefore we would recommend a new acceptance range of about 20% to 40% sequence similarity for this protein.
I-Tasser
I-Tasser<ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins</ref> is a web service for protein structure prediction provided and published by Ambrish Roy, Alper Kucukural and Yang Zhang at http://zhanglab.ccmb.med.umich.edu/I-TASSER/. It took part in the CASP competition with outstanding results (Figure 2).
The I-Tasser protocol consists of the following steps:
- Thread the sequence through different structures to create an initial template.
- Break the template apart into fragments which match the structure (leaving out the parts of the structure to which no sequence is assigned).
- Assemble and cluster the structures.
- Use the cluster centroid for structure reassembly.
- Search for the structure with the lowest energy and do REMO H-bond optimization to get the final model.
A graphical workflow is shown in Figure 3.
In addition, I-Tasser predicts GO terms and binding sites. To do so, it uses the final model to search for global and local matches in the PDB.
A problem for us is that I-Tasser only generates complete models, while the PDB structure of our protein is not complete. Therefore we compared the predicted secondary structure with the one from UniProt.
Comparison of the secondary structure of the model with the structure assigned in UniProt:
Seq: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMEN Pred: CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCEEEEECCCCCCCCCCEEEEEEECCCEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHH UniP: ---------------------------EEEEEEEEEEE----EEE--EEEEEE--EEEEEEEEEE--EEE--------TTTHHHHHHHHHHHHHHHHHHHHHHHHHHT Seq: HNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHV Pred: HCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCEEEECCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHCCCCCCCCCCCCC UniP: TT-EEE--EEEEEEEEEE-----EEEEEEEEE--EEEEEEEHHH-EEEEEE---HHHHHHHH---HHHHHHHHHHH-HHHHHHHHHHHHHTTT-------EEEEEEEE Seq: TSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGI Pred: CCCHHHHCHHHHCCCCCCEEEEEEECCCCCCCCCCEEEECCCCCCCCCCCEEEEECCCCCCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCCCCCHHHHHHHCCHHH UniP: ----EEEEEEEEEEEEE--EEEEEE------HHH----EEEE-----EEEEEEEEE---HHHHEEEEEE---EEE-EEEE---------------------------- Seq: LFIILRKRQGSRGAMGHYVLAERE Pred: HHHHHHCCCCCCCCCCCCCHCCCC UniP: ------------------------
For a better overview, we replaced the S that I-Tasser uses for sheets with an E, as in the UniProt secondary structure.
As we can see, the secondary structure predicted by I-Tasser is mostly correct. Sometimes the elements are slightly shifted, and sometimes they do not have the correct length. As this model is also based on a self hit, it is no surprise to see a good result like this one.
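The visual comparison above can be turned into a number by counting the positions at which prediction and annotation agree. This is a minimal sketch under our own assumptions: the function name `agreement` is hypothetical, positions without a UniProt assignment ('-') are ignored, the I-Tasser symbol S is read as E, and the UniProt turn symbol T is counted as coil. The strings in the example are invented and much shorter than the real ones.

```python
def agreement(predicted, reference, unassigned="-"):
    """Fraction of annotated positions where prediction and reference agree.

    predicted: I-Tasser style string (H, S or E for strand, C)
    reference: UniProt style string (H, E, T, '-' for no assignment)
    """
    pairs = []
    for p, r in zip(predicted, reference):
        if r == unassigned:
            continue  # no annotation to compare against
        p = "E" if p == "S" else p  # I-Tasser writes S for strand
        r = "C" if r == "T" else r  # treat turns as coil
        pairs.append((p, r))
    if not pairs:
        raise ValueError("reference contains no assigned positions")
    return sum(1 for p, r in pairs if p == r) / len(pairs)


if __name__ == "__main__":
    # Invented toy strings, only to show the calling convention.
    print(agreement("CCSSHHHC", "--EEHHTT"))
```

Ignoring unassigned positions is a design choice: it avoids counting the signal peptide and the transmembrane region, for which UniProt lists no secondary structure, as errors.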
Predicted Secondary Structure by I-Tasser
Sequence: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF Predicted: CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCSSSSSCCCCCCCCCCSSSSSSSCCCSSSSCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH Conf-Score: 985028899999999899875122045421036641367999985269985643743686068998778788540145583478888887676654315558 Sequence: WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ Predicted: HHHHHHHCCCCCCSSSSSSSCCCCCCCCCCCCCCCCCCCCCCSSSSCCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHC Conf-Score: 888755315777644463525565898763541000558873365263022202455666677878887004598888767064299999999747666642 Sequence: QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV Predicted: CCCCCCCCCCCCCCCHHHHCHHHHCCCCCCSSSSSSSCCCCCCCCCCSSSSCCCCCCCCCCCSSSSSCCCCCCCCSSSSCCCCCCCCCSSSSCCCCCCCCCC Conf-Score: 599877567699854442101541541332479864358754456553541024888652112699807986310267512589998726840688766531 Sequence: IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE Prediction: CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC Conf-Score: 010211112222100246665443013678898651020169
Secondary structure elements are shown as H for Alpha helix, S for Beta sheet & C for Coil
Predicted Solvent Accessibility by I-Tasser
Sequence: MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF Prediction: 723312000000000101112222011200120120023333331200000102322003123724434241311436413610352044144313323230 Sequence: WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ Prediction: 220132133351310001010021136231211333023032003016303403102321432433044143404422010333005103400630351154 Sequence: QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV Prediction: 342353313321443300000100101014010203346564435434135233334221320000000347533120214264144202020214542200 Sequence: IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE Prediction: 000001100000011100000001334446443132333438
Values range from 0 (buried residue) to 9 (highly exposed residue)
I-Tasser predicted five models with C-scores from -0.557 to -3.298. They are ranked from one to five as seen below. As cutoff for the C-score we use -1.5, as recommended by the Zhang group<ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010).</ref>, which is reported to give false-positive and false-negative rates of about 0.1. That means that more than 90% of the quality predictions are correct. Therefore we use only Model1 for the comparison with the other methods. All resulting models are shown below in Figure 4 to Figure 8.
Model1 has a TM-score of about 0.64 and an RMSD of 7.7 Å. For the prediction, I-Tasser used 1a6zA, 1s7qA, 1i4fA, 1de4A, 2vabA and 2bckA as templates. The templates have an identity of about 40%, except for the self hit 1a6z. A special case is 1de4, which is the transferrin receptor in complex with the HFE protein (chain A) and is therefore a self hit as well. The sequence in this case is also identical, but we cannot draw any conclusion about the 3D structure of the protein when it is bound to the receptor. Because of the self hit, we ran I-Tasser a second time with the constraint to exclude all templates with a sequence identity > 80%.
I-Tasser using templates with a sequence identity below 80% to avoid self hits.
To our surprise, the second run gave the same results, again based on the same self hit. At this point we have no idea what went wrong, but because the self hit is just one out of five templates used to create the model, we decided to keep the best model (Model1) for the comparison with the other methods.
SwissModel
SwissModel<ref>Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation.</ref> is a server-based tool provided by the SIB. It combines tools such as PSI-PRED and DISOPRED for the prediction of secondary structure and disordered regions.
The SwissModel workspace is a web-based service dedicated to protein structure homology modeling. It provides a personal working environment in which several projects can be calculated in parallel. The environment also provides tools for template selection, model building and structure quality evaluation. To find suitable templates for a given target protein, a library of experimental protein structures is searched<ref>http://bioinformatics.oxfordjournals.org/cgi/content/short/22/2/195</ref>.
The SwissModel repository is a database of annotated 3D protein structure models. The database consists of more than 3.4 million structures<ref>http://nar.oxfordjournals.org/content/37/suppl_1/D387.full</ref>.
All models were generated from the UniProt database with the SwissModel pipeline. From the SwissModel repository, the density of the QMEAN score is estimated to give an indication of the quality of the predicted model.
The model created by SwissModel is based on a self hit, and we had no way to exclude the protein itself from the prediction. We could only set a specific template, so we also ran SwissModel in Alignment mode, which gave us the chance to influence the alignment. As one can see, the density of the QMEAN score is the same in Automated mode and in Alignment mode. Therefore the target (1a6z) and the template (1bii) are part of the same reference set. We take this as an indicator of a good template choice, because the template used in Alignment mode is in the same set as the target. We also regard this as evidence for the high diversity of the MHC 1 family.
Automated Mode
Model information:
Modelled residue range: 26 to 297
Based on template: 1a6zC (2.60 Å)
Sequence Identity [%]: 100
Evalue: 7.66e-163
Quality information:
QMEAN Z-Score: -1.035
Even though the model is based on a self hit, the QMEAN Z-score is about -1, which means that the model scores about one standard deviation below the mean of the reference set. The model is therefore not unlikely, but also not the most probable one. Figure 9 shows the predicted structure based on the template. The typical two parallel helices on a beta-sheet are clearly observable. Figure 10 shows the QMEAN4 score distribution over the protein size. Figure 11 shows the density plot of the reference set, which contains the scores of structures of similar size. Figure 12 shows the different scores which are used to calculate the final QMEAN score. Here we can see that the torsion angles caused the most issues, which leads to a lower QMEAN score. Figure 13 shows the predicted error for each residue on an arbitrary scale. We see a higher error at the beginning, but more or less the same pattern of error values (pattern size of about 100 aa) over the whole protein.
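To put a Z-score of about -1 into perspective, the following sketch converts a Z-score into the fraction of a reference distribution that scores lower. It assumes that the reference scores are normally distributed, which is a simplification made for this example and not something the QMEAN server guarantees; the function name is our own.

```python
import math


def fraction_scoring_lower(z_score):
    """Fraction of a normal reference distribution that lies below z_score."""
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))


# QMEAN Z-score of the Automated mode model, taken from the text above.
z = -1.035
print(f"Z = {z:+.3f}: about {fraction_scoring_lower(z):.1%} of the reference scores are lower")
```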
Alignment Mode
Model information:
Modelled residue range: 1 to 272
Based on template: 1bii_A
Quality information:
QMEAN Z-Score: -2.065
TARGET 26 RSH SLHYLFMGAS EQDLGLSLFE
1biiA 1 gsh slryfvtavs rpgfgeprym
TARGET sss ssssssssss sss
1biiA sss ssssssssss sss

TARGET 49 ALGYVDDQLF VFYDHES--R RVEPRTPWVS SRISSQMWLQ LSQSLKGWDH
1biiA 24 evgyvdntef vrfdsdaenp ryeprarwie -qegpeywer etrrakgneq
TARGET ssssss sss sssss sss hhh hh hhhhh hhhhhhhhhh
1biiA ssssss sss sssss sss hh hhhhh hhhhhhhhhh

TARGET 97 MFTVDFWTIM ENH-NHSKES HTLQVILGCE MQEDNST-EG YWKYGYDGQD
1biiA 73 sfrvdlrtal ryynqsaggs htlqwmagcd vesdgrllrg ywqfaydgcd
TARGET hhhhhhhhhh hhh ssssssssss sss sss ss sssssss ss
1biiA hhhhhhhhhh hhh ssssssssss sss sssss sssssss ss

TARGET 145 HLEFCPDTLD WRAAEPRAWP TKLEWERHKI RARQNRAYLE RDCPAQLQQL
1biiA 123 yialnedlkt wtaadmaaqi trrkweqa-g aaerdrayle gecvewlrry
TARGET sssss s ss hh hhhhh hhhhhhhhh hhhhhhhhhh
1biiA sssss s ss hhh hhhhhhh hhhhhhhhh hhhhhhhhhh

TARGET 195 LELGRGVLDQ QVPPLVKVTH HVTS-SVTTL RCRALNYYPQ NITMKWLKDK
1biiA 172 lkngnatllr tdppkahvth hrrpegdvtl rcwalgfypa ditltwqln-
TARGET hhh ssssss sss ssss ssssss sssssss
1biiA hhh ssssss sss ssss ssssss sssss

TARGET 244 QPMDAKEFEP KDVLPNGDGT YQGWITLAVP PGEEQRYTCQ VEHPGLDQPL
1biiA 221 geeltqemel vetrpagdgt fqkwasvvvp lgkeqkytch veheglpepl
TARGET ss s sss s sssssssss sssss ss s
1biiA ss s sss s sssssssss sss ss s

TARGET 294 IVIW
1biiA 271 tlrw-
TARGET ss
1biiA ss

As one can see, the alignment shows a very similar secondary structure, and the 3D structures are very similar as well. The RMSD of the model is about 2.9 Å. This is a quite good result, but only the superimposed residues are used for the calculation, so the missing beta-sheet is not part of it. In general, however, the results are comparable to the self-hit model. Figure 15 and Figure 16 show the score distribution compared to other models of the same size. In Figure 17 we can see that the torsion angles again cause the main issues, as in the self-hit model. This could mean that the torsion angles in this protein are not straightforward to model. The predicted error in Figure 18 shows a pattern comparable to that of the self-hit model in Figure 13, but the high peak at the beginning is missing.
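The remark that only superimposed residues enter the RMSD can be illustrated with a minimal sketch. The coordinates below are toy values invented for this example, not taken from our models; residues without a partner in the target are marked with `None` and simply do not contribute.

```python
import math


def rmsd(pairs):
    """RMSD over already superimposed coordinate pairs [((x, y, z), (x, y, z)), ...]."""
    if not pairs:
        raise ValueError("no aligned residue pairs")
    total = 0.0
    for a, b in pairs:
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(pairs))


# Toy C-alpha coordinates (hypothetical). The last two model residues have no
# counterpart in the target, comparable to the missing beta-sheet.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), None, None]

aligned = [(m, t) for m, t in zip(model, target) if t is not None]
print(f"{len(aligned)} of {len(model)} residues aligned, RMSD = {rmsd(aligned):.2f}")
```

However badly the unaligned part of a model is built, the value does not change, which is why a low RMSD has to be read together with the coverage.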
MODELLER
MODELLER<ref>Eswar N. et al. Comparative protein structure modeling using MODELLER.</ref> is a standalone application used for protein structure modeling by satisfaction of spatial restraints. These restraints derive from different types of information, so the model does not have to be based only on the target-template alignment (although it can be). MODELLER is capable of pairwise and multiple alignment, fold assignment and loop modeling.
We downloaded Modeller, installed it locally on our Windows PC, and followed the examples given at the Workflow homology modeling glucocerebrosidase.
Our target was set to the FASTA sequence of HFE_HUMAN. Our standard template for the single template-target alignment was set to chain A of 1BII, because it covers the whole sequence of HFE_HUMAN. For the multiple sequence alignment we used the protein structures 1S79 and 3P73 in addition to 1BII. 1S79 was chosen because of its relatively high sequence identity of about 37%, and 3P73 because it is a classical MHC class I molecule with a function similar to that of the HFE_HUMAN protein.
Single template-target
Scripts
script_pairwise-alignment-template-target.py
from modeller import *

env = environ()
aln = alignment(env)

# Read the template structure 1BII, requesting chain A as segment
mdl = model(env, file='1BII.pdb', model_segment=('FIRST:A', 'END:A'))
aln.append_model(mdl, align_codes='1BII', atom_files='1BII.pdb')

# Add the target sequence
aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN')

# Alignment that uses the structural information of the template
aln.align2d()
aln.check()
aln.write(file='pairwise-2d.ali', alignment_format='PIR')

# Alignment without structural information
aln.align()
aln.check()
aln.write(file='pairwise.ali', alignment_format='PIR')
script_pairwise-to-model.py
from modeller import *
from modeller.automodel import *

env = environ()

# Model 1: based on the alignment without structural information
a = automodel(env,
              alnfile='pairwise.ali',       # file:pir:alignment
              knowns='1BII',                # file:pdb:template
              sequence='HFE_HUMAN',         # id:target
              assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model = 1
a.make()

# Model 2: based on the alignment with structural information (align2d)
b = automodel(env,
              alnfile='pairwise-2d.ali',    # file:pir:alignment
              knowns='1BII',                # file:pdb:template
              sequence='HFE_HUMAN',         # id:target
              assess_methods=(assess.DOPE, assess.GA341))
b.starting_model = 2
b.ending_model = 2
b.make()
Alignments
We used two different alignments for Modeller, one that does not use the structural information of the template:
pairwise.ali
>P1;1BII
structureX:1BII.pdb: 1 :A:+383 :P:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28
-------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI
ALNEDLKTWTAADMAAQITRRKWEQAGAAER-DRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD
VTLRCWALGFYPADITLTWQLNGEEL-TQEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL
RW/IQKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTEFTP
TETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI*
>P1;HFE_HUMAN
sequence:reference: : : : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDH--ESRRVEPRTP
WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNST-EGYWKYGYDGQDHL
EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV
TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV
IW-------------EPSPSGTLVI---------------------GVISGIAVFVVILFIGILFIILRKRQGSR
GAMGHYV-------LAERE-------------------*
And one that uses the structural information of the template:
pairwise-2d.ali
>P1;1BII
structureX:1BII.pdb: 1 :A:+383 :P:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28
---------------------G----SHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI
ALNEDLKTWTAADMAAQITRRKWE-QAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD
VTLRCWALGFYPADITLTWQLNGEELT-QEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL
RW/I---QKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTE
FTPTETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI*
>P1;HFE_HUMAN
sequence:reference: : : : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYD--HESRRVEPRTP
WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSK-ESHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHL
EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV
TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV
IW-EPSPSGTLVIGVIS---------GIAVFVVILF--IGILFIILRK-RQGSRGAMGH---------YVLAERE
-----------------------------------------*
The models are presented under model comparison, but surprisingly the model built with the structural information is worse than the model built without it. We think Modeller has some difficulty threading the sequence of HFE_HUMAN into the given structure of 1BII. From this we derive the possibility that 1a6z, which has a structure very similar to 1bii, has a different amino acid composition for this type of structure. At the moment, however, we have no way to test and prove this.
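To make statements about identity in such alignments checkable, the sketch below counts identical residues over the aligned, gap-free columns of two alignment rows. The two strings are the first 23 columns of the second sequence line of each entry in pairwise.ali; the function name and the choice to ignore gap columns are assumptions of this example, and tools such as HHSearch may normalize identity differently.

```python
def count_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned) over all columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    identical = aligned = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are not counted as aligned
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


# Excerpt from pairwise.ali (columns 1-23 of the second sequence line).
template = "WIEQE-GPEYWERETRRAKGNEQ"   # 1BII
target = "WVSSRISSQMWLQLSQSLKGWDH"     # HFE_HUMAN

identical, aligned = count_identity(template, target)
print(f"{identical} identical residues in {aligned} aligned columns "
      f"({100.0 * identical / aligned:.0f}% identity in this excerpt)")
```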
Alignment: multiple template-target
Scripts
script_msa-align-templates.py
from modeller import *

env = environ()
aln = alignment(env)

# Read chain A of every template and add it to the alignment
for (code, chain) in (('1BII', 'A'), ('1S79', 'A'), ('3P73', 'A')):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

# Align the templates with each other
aln.salign()
aln.check()
aln.write(file='MSA.ali', alignment_format='PIR')
script_msa-align-target-to-msa.py
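from modeller import *

env = environ()
aln = alignment(env)

# Load the template alignment written by the previous script
aln.append(file='MSA.ali', align_codes='all')
aln_block = len(aln)  # number of templates; not passed on to salign() here

# Add the target sequence and align it
aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN')
aln.salign()
aln.check()
aln.write(file='MSA.ali', alignment_format='PIR')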
script_msa-to-model.py
from modeller import *
from modeller.automodel import *

env = environ()

# Build one model from the multiple template-target alignment
a = automodel(env,
              alnfile='MSA.ali',                      # file:pir:alignment
              knowns=('1BIIA', '1S79A', '3P73A'),     # file:pdb:template
              sequence='HFE_HUMAN',                   # id:target
              assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model = 1
a.make()
Alignment
The MSA used by Modeller is:
>P1;1BIIA
structureX:1BII:1 :A:+274 :A:MOL_ID 1; MOLECULE MHC CLASS I H-2DD; CHAIN A; FRAGMENT HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM DD; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2 MICROGLOBULIN; CHAIN B; ENGINEERED YES; MOL_ID 3; MOLECULE DECAMERIC PEPTIDE; CHAIN P; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-3A; MOL_ID 2; ORGANISM_SCIENTIFIC MUS MUSCULUS; ORGANISM_COMMON HOUSE MOUSE; ORGANISM_TAXID 10090; CELL_LINE BL21; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID PET-8C; MOL_ID 3: 2.40: 0.28
-------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQEGPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYIA
LNEDLKTWTAADMAAQITRRKWEQAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGDVT
LRCWALGFYPADITLTWQLNGEELTQ-EMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTLRW
---------------------------------------------------*
>P1;1S79A
structureX:1S79:100 :A:+103 :A:MOL_ID 1; MOLECULE LUPUS LA PROTEIN; CHAIN A; FRAGMENT CENTRAL RRM; SYNONYM SJOGREN SYNDROME TYPE B ANTIGEN, SS-B, LA RIBONUCLEOPROTEIN, LA AUTOANTIGEN; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC HOMO SAPIENS; ORGANISM_COMMON HUMAN; ORGANISM_TAXID 9606; GENE SSB; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN BL21(DE3)PLYSS; EXPRESSION_SYSTEM_VECTOR PET28:-1.00:-1.00
-------------------------GRWILKNDVKNRSVYIKGFPTDATLDDIK---------------------
---------------------------------------------------------------------------
----------------------------------------EWLEDKGQVLNIQMRRT------------------
--------------LHKAFKGSIFVV-FDSIESAKKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE---
---------------------------------------------------*
>P1;3P73A
structureX:3P73:-1 :A:+275 :A:MOL_ID 1; MOLECULE MHC RFP-Y CLASS I ALPHA CHAIN; CHAIN A; FRAGMENT UNP RESIDUES 20-294; ENGINEERED YES; MOL_ID 2; MOLECULE BETA-2-MICROGLOBULIN; CHAIN B; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC GALLUS GALLUS; ORGANISM_COMMON BANTAM,CHICKENS; ORGANISM_TAXID 9031; GENE YFV; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN TB1; EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PMAL-P4X; MOL_ID 2; ORGANISM_SCIENTIFIC GALLUS GALLUS; ORGANISM_COMMON BANTAM,CHICKENS; ORGANISM_TAXID 9031; GENE B2M; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; EXPRESSION_SYSTEM_STRAIN TB1; EXPRESSION_SYSTEM_VECTOR_TYPE PLASMID; EXPRESSION_SYSTEM_PLASMID PMAL-P4X: 1.32: 0.16
-----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKSRTA--QPIVE
MLPQEDQEHWDTQTQKAQGGERDFDWNLNRLPERYNKSKG-SHTMQMMFGCDILEDGS-IRGYDQYAFDGRDFLA
FDMDTMTFTAADPVAEITKRRWETEGTYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEADGILT
LSCHAHGFYPRPITISWMKDGMVRDQ-ETRWGGIVPNSDGTYHASAAIDVLPEDGDKYWCRVEHASLPQPGLFSW
EP------------------------------------------------Q*
>P1;HFE_HUMAN
sequence:reference: : : : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVE-PRTPW
VSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHLE
FCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTT
LRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW
EPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE*
Model editing
We tried to edit the single-template model of MODELLER, because it is one of our best models. When we looked at our alignment with Jalview 2.6 (Figure 19), we noticed that the alignment is already very well defined and that changes would only lead to worse results. The average conservation is about 7 to 8 and the quality around 5 to 6.
The hydrophobic groups are also very well aligned, so we decided to leave that model as it is, because there is nothing to edit. Only the end of the alignment has many gaps, but shifting the gaps would break the conserved block in the middle of the alignment.
The only difference between the sse-supported model and the single-template model is that some residues are aligned differently (Figure 20). These differences result from the information about the secondary structure of the template that is incorporated into the model, and thus we will not edit them.
The msa model is hard to edit because of the multiple alignments between the different sequences. We tried moving some aligned groups to different positions inside the sequence alignment, but were not able to adjust the corresponding alignment of the other sequences. After some unfruitful attempts we decided to leave that alignment as it is, too (Figure 21).
In summary, we did not edit any alignment successfully, because either there was nothing to edit or editing was too complicated and introduced too many errors.
Model comparison
3D-Jigsaw
We had several issues with running 3D-Jigsaw<ref>Bates, P.A., Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2001) Enhancement of Protein Modelling by Human Intervention in Applying the Automatic Programs 3D-JIGSAW and 3D-PSSM.</ref>, such as strange error messages and our input not being accepted. Finally we got it to work with the following settings:
- Server: http://bmm.cancerresearchuk.org/~populus/populus_submit.html
- Mode: upload
- sequence box: FASTA sequence of [1A6Z_A]
- own models: one pdb file containing all of our models (except the SwissModel selfhit) separated by the 'TER' command
- predicted runtime: 18.33 hours
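The combined upload file mentioned in the list above could be assembled as in the following sketch. It is not the procedure we actually used and it is not an official 3D-Jigsaw requirement; it only shows one way to join several PDB texts so that each model is closed by a 'TER' record. The function name and the two one-atom toy models are hypothetical.

```python
def merge_models(pdb_texts):
    """Join several PDB-format models into one text, each closed by a TER record."""
    merged = []
    for text in pdb_texts:
        # Keep only coordinate records; REMARK, END etc. of the single files are dropped.
        atoms = [line for line in text.splitlines()
                 if line.startswith(("ATOM", "HETATM"))]
        if atoms:
            merged.extend(atoms)
            merged.append("TER")
    merged.append("END")
    return "\n".join(merged) + "\n"


# Two toy "models" with a single C-alpha atom each (hypothetical coordinates).
model_a = "ATOM      1  CA  ARG A   1       0.000   0.000   0.000  1.00  0.00\nEND\n"
model_b = ("REMARK toy model\n"
           "ATOM      1  CA  ARG A   1       1.000   0.000   0.000  1.00  0.00\n")

combined = merge_models([model_a, model_b])
print(combined)
```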
Result
Name | Data |
---|---|
Length | _________10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270___ |
AA | RLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW |
Prediction | CCCCEEEEEEEEEECCCCCCCCCCEEEEEEECCCCEEEECCCCCCCCCHHHHHHCCCCCHHHHHHHHHHHHHCCHHHHHHHHHHHHHCCCCCCEEEECCCCCEECCCCCCCCEEECCCCCCEEEEECHHHHHHCCCCCHHHCCHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCEEEEEEEEECCEEEEEEEECCCCCCCEEEEEEECCEECCHHHEEEEEEECCCCCCCCCEEEEEECCCCCCCEEEEEEECCCCCCEEEEC |
Confidence | 93303453556763258999987358999894885499848988867403652146670222210276553000136609989986528798168852425544699858524402146712899971352101652001125776224548999999999998999999999987887332589869999951389499999762611761499996677566832279987550889983003699826988533699999504888767859 |
Disorder | DDDDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODOOOOOOOOOODDDDDDDDDDOOOODDDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODDOOODDDDDOOOOOOOOOOOOOOOOOOOOOOOOOOOODDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODDDOOOOOOOOOOOODOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODOOODD |
3D-Jigsaw also gave us information about the predicted secondary structure and the ordered and disordered regions. It used that information to successfully optimize all of our submitted models. All optimized models have an energy of about -340 and a coverage of 0.99, which is really good. Their data and pictures are shown in the table below.
Model evaluation
After trying several tools (RasWin, JMol, SwissPDB-Viewer), we decided to use PyMol for superimposing and displaying the model-target alignment of the proteins. We truncated the original HFE_HUMAN protein (pdbid: 1a6z) at chain C, so only chains A and B are used for displaying. The original HFE_HUMAN is always shown in green and the model in red (see table below).
We created the pictures of our models using PyMol as follows:
- load '1A6Z_AB.pdb' into PyMol (alternatively: command 'fetch 1A6Z' and then hide chain C and D)
- hide everything
- show cartoon
- color red
- load 'model.pdb' into PyMol
- hide everything
- show cartoon
- color green
- align 'model.pdb' to '1A6Z_AB.pdb'
- command 'ray' (nicer output!)
- save the image
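The steps above can also be written into a PyMol command script. The sketch below only generates the script text; the object names `target` and `model`, the output file name and the helper function are our own choices for this example, and the colors follow the list above.

```python
def pymol_overlay_script(target_pdb, model_pdb, image_png):
    """Return PyMol commands that superimpose a model onto the target and save a picture."""
    commands = [
        f"load {target_pdb}, target",
        f"load {model_pdb}, model",
        "hide everything",
        "show cartoon",
        "color red, target",
        "color green, model",
        "align model, target",
        "ray",
        f"png {image_png}",
    ]
    return "\n".join(commands) + "\n"


script = pymol_overlay_script("1A6Z_AB.pdb", "model.pdb", "overlay.png")
print(script)
```

Saved as a .pml file, such a script can be run inside PyMol with the `@` command or in batch mode (for example `pymol -cq overlay.pml`).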
For evaluating our models with the RMSD and the TM-score we used TMalign. We were advised to use SAP for the RMSD and TMScore for the TM-score, but TMScore failed because our target is the sequence of HFE_HUMAN from UniProt and therefore longer than the '1BII' template. This causes a problem with TMScore because it needs pdbs of the same length, and thus the superimposing of TMScore does not really work.
TMalign is able to use pdbs of different lengths, and the scores are normalized by the second structure. We use '1A6Z' as the second structure to obtain comparable scores for all our models. The modeling of HFE_HUMAN is very difficult because it is a multi-domain protein, and none of the methods supports multi-domain modeling.
TMalign can be found at the website of the Zhang-Lab.
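For readers unfamiliar with the score, the following sketch evaluates the published TM-score formula for one given superposition: the sum of 1 / (1 + (d_i / d0)^2) over all aligned residue pairs, divided by the length of the structure used for normalization, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. TMalign additionally searches for the alignment and superposition that maximize this value, which is not part of the sketch; the 0.5 Å floor for very short chains and the function names are assumptions of this example.

```python
def d0(target_length):
    """Length-dependent distance scale of the TM-score, in Angstrom."""
    if target_length <= 15:
        return 0.5  # assumption of this sketch: floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distances (in Angstrom) between the aligned residue pairs
    target_length: length of the structure used for normalization
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


# Toy example: 100-residue target, only 50 residues aligned, all perfectly superimposed.
print(tm_score([0.0] * 50, 100))
```

Because the sum is divided by the length of the normalizing structure, residues that are not aligned lower the score, in contrast to the RMSD, which ignores them.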
Model | RMSD (Å) | TM-Score | Optimized RMSD (Å) | Optimized TM-Score | 3D-Jigsaw energy calculation |
---|---|---|---|---|---|
MODELLER: superimpose, template:1BII | 2.58 | 0.86468 | 1.70 | 0.95082 | -341.87 |
MODELLER: superimpose, sse-support, template:1BII | 3.42 | 0.59586 | 0.98 | 0.96990 | -341.41 |
MODELLER: superimpose, msa, template:1BII,1S79,3P73 | 2.05 | 0.89042 | 1.70 | 0.95087 | -341.23 |
I-Tasser | 1.61 | 0.93760 | 2.48 | 0.87855 | -339.33 |
SwissModel | 2.67 | 0.85048 | 2.48 | 0.87851 | -339.17 |
SwissModel self | 0.08 | 0.99984 | | | |
As one can clearly see, the I-Tasser model is the best with a TM-Score of ~0.94, followed by the MSA model of MODELLER with a TM-Score of ~0.89, the single-template model of MODELLER with a TM-Score of ~0.86 and the SwissModel model with a TM-Score of ~0.85.
The worst model is the MODELLER model built with secondary structure information of the template, with a TM-Score of ~0.6. We are sure that the low sequence identity and secondary structure similarity of only 22% affected this model badly, because the normal model is based on the same template and achieves a significantly higher TM-Score.
All of our models are really good, except for the sse-supported model of MODELLER.
After optimization by 3D-Jigsaw all MODELLER models are much better, because 3D-Jigsaw cut off the clearly wrongly modeled strands of useless amino acids. Surprisingly, the formerly worst sse-supported model is now the best of all MODELLER models and even better than the previous best model of I-Tasser. It is not surprising that 3D-Jigsaw was able to optimize the SwissModel model but failed at the I-Tasser model, because it incorporated the information of the models that were not so well done. It is surprising, however, that the RMSD of the I-Tasser model got worse after the 3D-Jigsaw optimization.
Our models are still all very good, but the best one is now the sse-supported model of MODELLER with an RMSD below 1 and a TM-Score of almost 1; it is now an almost perfect model. The second and third best models are the standard pairwise model and the msa model of MODELLER, which are now very similar according to RMSD and TM-Score. The I-Tasser model and the SwissModel model are now very similar, too.
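The rankings discussed above can be reproduced directly from the TM-Scores in the table. The sketch below uses the values as listed there (without the SwissModel self hit, which has no optimized scores); the short model labels and the helper function are our own.

```python
# (model, TM-Score, TM-Score after 3D-Jigsaw optimization), values from the table above
models = [
    ("MODELLER pairwise (1BII)", 0.86468, 0.95082),
    ("MODELLER sse-supported (1BII)", 0.59586, 0.96990),
    ("MODELLER msa (1BII, 1S79, 3P73)", 0.89042, 0.95087),
    ("I-Tasser", 0.93760, 0.87855),
    ("SwissModel", 0.85048, 0.87851),
]


def ranked(rows, column):
    """Return the model names sorted by the given score column, best first."""
    return [row[0] for row in sorted(rows, key=lambda row: row[column], reverse=True)]


before = ranked(models, 1)
after = ranked(models, 2)

print("Before optimization:", " > ".join(before))
print("After optimization: ", " > ".join(after))
```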
Discussion
With the I-Tasser protocol it is not possible to choose a specific template, so we ran I-Tasser twice, first with standard parameters and once with a similarity threshold of 80%. In the second case, we again got a model based on a self hit, so we repeated the prediction a third time, with the same result. We were not able to find out why the given threshold was ignored.
Our attempts to get homologs in all given categories (>60%, >40%, >20%) were not successful, because HHSearch was not able to list matching ones. A Blast search against the NR database also failed to provide acceptable results and returned only proteins with 40% or less sequence identity. Thus we conclude that the HFE family must combine a very high sequence diversity with a high structural conservation. This theory was supported when we aligned the structural homologs listed in CATH.
The templates which we had chosen from the HHSearch results were meant to cover the whole protein sequence and to give special coverage of the transmembrane region. But as we saw later, the tools do not support multiple sequence alignments. Therefore we decided to use '1BII' as template for SwissModel and Modeller, because it covers the sequence completely and, with a sequence similarity of 22%, it is in the lower midrange of the HHSearch results. '1S79' has a higher sequence identity of 37%, but a much worse conservation with HFE_HUMAN. We decided to rank coverage of the whole sequence higher than sequence identity.
After this task, we would suggest using SwissModel first to get a quick overview and a first idea of the protein structure. We would also recommend I-Tasser because of its good usability. The Modeller approach we would recommend only to experts who are really interested in a specific alignment, as its usability is awful for laymen.
Extra diligence task
We were not able to perform the task of calculating the RMSD of all atoms within a 6 Ångström threshold of the catalytic core, because no catalytic core is defined at UniProt:Q30201 (HFE_HUMAN) or at PDB:1A6Z.
References
<references />