Difference between revisions of "Homology based structure predictions"

Revision as of 17:38, 20 June 2011

Homologous

1bii, the template structure

Because we found no homologous structures in Task 2, we extended our list by using HHSearch.

HHSearch found only sequences with an identity below 40%. We therefore use the 12 proteins shown below to create a multiple alignment for homology modelling. We chose the sequences so that they cover the whole protein, and we paid particular attention to the transmembrane region.

PDB-ID	Identity	Description
1S79	37%	human La protein
2WY3	29%	HCMV UL16-MICB complex
3P73	28%	classical MHC class I molecule
3JTS	25%	Mamu A*2
1KCG	22%	NKG2D
1BII	22%	H-2DD MHC CLASS I
1OW0	22%	human FcaRI
2P24	21%	alphabeta TCR
1CD1	21%	MHC-like fold with a large hydrophobic binding groove
1HXM	18%	Human Vgamma9/Vdelta2 T Cell Receptor
1LQV	14%	Endothelial protein C receptor
1JFM	14%	MURINE NK CELL LIGAND RAE-1 BETA
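
The selection criterion can be checked with a few lines of Python. This is a minimal sketch: the identities are copied from the table above, while the helper name `below_threshold` and the 40% default are only illustrative.

```python
# Template hits copied from the table above: (PDB ID, percent identity to HFE).
hits = [
    ("1S79", 37), ("2WY3", 29), ("3P73", 28), ("3JTS", 25),
    ("1KCG", 22), ("1BII", 22), ("1OW0", 22), ("2P24", 21),
    ("1CD1", 21), ("1HXM", 18), ("1LQV", 14), ("1JFM", 14),
]


def below_threshold(hit_list, max_identity=40):
    """Return the hits whose identity is strictly below max_identity (hypothetical helper)."""
    return [(pdb_id, ident) for pdb_id, ident in hit_list if ident < max_identity]


selected = below_threshold(hits)
# Sort from the most similar to the least similar template.
selected.sort(key=lambda hit: hit[1], reverse=True)
for pdb_id, ident in selected:
    print(f"{pdb_id}\t{ident}%")
```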

With these sequences plus the HFE protein (Q30201), we computed a multiple sequence alignment with T-Coffee (Expresso mode). This multiple sequence alignment is later used as a raw alignment in the Alignment Mode of SwissModel and in Modeller. Later on, we will try to obtain better models by editing the alignment so that functional regions are kept together.

  DSSP                                   --EEEEEEEEEEB-SS-SSB--EEEEEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH
Q30201          MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHES--RRVE-PRTPWVSSRISSQMWLQLSQSLKGWDHM
1S79_A          --------------------------------------------------------------------GRW-IL-KNDVKNRSVYIKGFPTDATLDDIKE
3P73_A          -----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKS--RTAQ-PIVEML-PQEDQEHWDTQTQKAQGGERD
1KCG_C          -------------------------DAHSLWYNFTIIHLPRHGQQWCEVQSQVDQKNFLSYDCGS--DKVLSMGHL-EEQLYATDAWGKQLEMLREVGQR
1JFM_A          -------------------------DAHSLRCNLTIKDPTPADPLWYEAKCFVGEILILHLSNIN--KTMT-SG-DPGETANATEVKKCLTQPLKNLCQK
1BII_A          -MGAMAPRTLLLLLAAALGPTQTRAGSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYE-PRARWIE-QEGPEYWERETRRAKGNEQS
2P24_A          ----------------------------------------------------------------------------M----AIMAPRTLVLLLSGALALT
1CD1_A          -----------------------QQKNYTFRCLQMSSFANR-SWSRTDSVVWLGDLQTHRWSNDS--ATIS-FTKPWSQGKLSNQQWEKLQHMFQVYRVS
2WY3_A          ------------------------MEPHSLRYNLMVLSQDESVQSGFLAEGHLDGQPFLRYDRQK--RRAK-PQGQWAEDVLGAETWDTETEDLTENGQD
1LQV_A          -------------------SQDASDGLQRLHMLQISYFR-DPYHVWYQGNASLGGHLTHVLEGPDTNTTII-QLQPL----QEPESWARTQSGLQSYLLQ
3JTS_A          -------------------------GSHSMRYFYTSMSRPGRWEPRFIAVGYVDDTQFVRFDSDAASQRME-PRAPWVE-QEGPEYWDRETRNMKAETQN
1OW0_A          ----------------------------------------------------------------------------------------------------
1HXM_A          -------------------------------------------------------------------------------------AIELVPEHQTVPVSI
                                                                 
  DSSP          HHHHHHHHTTT-SSS--E--------EEEEEE-EEE-TTS-E-EEE-E------------EEEETTEE----------------EEEEEGGGTEEEES--
Q30201          FTVDFWTIMENHN-HSKE--------SHTLQV-ILGCEMQED-NST-E------------GYWKYGYD----------------GQDHLEFCPDTLDW--
1S79_A          WLEDKGQV-LNIQMRRTL--------HKAFKG-SIFVVFDSI-ESA-KKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE---------------
3P73_A          FDWNLNRLPERYN-KSKG--------SHTMQM-MFGCDILED-GSI-R------------GYDQYAFD----------------GRDFLAFDMDTMTF--
1KCG_C          LRLELADT---------ELEDFTPSGPLTLQV-RMSCECEAD-GYI-R------------GSWQFSFD----------------GRKFLLFDSNNRKW--
1JFM_A          LRNKVSNT-KVDTHKTNG--------YPHLQV-TMIYPQSQG-RTP-S------------ATWEFNIS----------------DSYFFTFYTENMSW--
1BII_A          FRVDLRTALRYYNQSAGG--------SHTLQW-MAGCDVESD-GRLLR------------GYWQFAYD----------------GCDYIALNEDLKTW--
2P24_A          QTWAGSHSRGEDD--IEA--------DHVGSYGIVVYQSP----GD-I------------GQYTFEFD----------------GDELFYVDLDKKET--
1CD1_A          FTRDIQELVKMMSPKEDY--------PIEIQL-SAGCEMYPG-NAS-E------------SFLHVAFQ----------------GKYVVRFWG--TSWQT
2WY3_A          LRRTLTHI----KDQKGG--------LHSLQE-IRVCEIHED-SST-R------------GSRHFYYN----------------GELFLSQNLETQES--
1LQV_A          FHGLVRLVHQERT--LAF--------PLTIRC-FLGCELPPEGSRA-H------------VFFEVAVN----------------GSSFVSFRPERALW--
3JTS_A          APVNLRNLRGYYNQSEAG--------SHTIQR-MYGCDLGPD-GRLLR------------GYHQSAYD----------------GKDYIALNEDLRSW--
1OW0_A          -----ACHPRLSLHRPAL--------EDLLLG-SEANLTCTL-TGLRD------------ASGVTFTW----------------TPSSGKSAV--QGPPE
1HXM_A          GVPATLRCSMKGEAIGNY--------YINWYR-KTQGNTMTF-IYRE-------------KDIYGPGF----------------KDNFQGDIDIAKNL--
                                                                 
  DSSP          SGG-G----HHH-HHHHHSSTHHH--HHHHHHHHTHHHHHHHHHHHHHTTTSS--B--EEEEEEEE-SS-----E-EEEEEEEEEBSS--EEEEEETTEE
Q30201          RAA-E----PRA-WPTKLEWERHK--IRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVT----S-SVTTLRCRALNYYPQNITMKWLKD
1S79_A          ----------------------------------------------------------------------------------------------------
3P73_A          TAA-D----PVA-EITKRRWETEG--TYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEA----D-GILTLSCHAHGFYPRPITISWMKD
1KCG_C          TVV-H----AGA-RRMKEKWEKDS--GLTTFFKMVSMRDCKSWLRDFLMHRKKRLE--------------------------------------------
1JFM_A          RSA-N----DES-GVIMNKWKDDG--EFVKQLKFLI-HECSQKMDEFLKQSKEK----------------------------------------------
1BII_A          TAA-D----MAA-QITRRKWEQA---GAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRR----PEGDVTLRCWALGFYPADITLTWQLN
2P24_A          IWM-------------LPEFAQLR--SFDPQGGLQNIATGKHNLGVLTKRSNSTPATNEAPQATVFPKSP--VLLGQPNTLICFVDNIFPPVINITWLRN
1CD1_A          VPGAP----SWL-DLPIKVLNADQ--GTSATVQMLLNDTCPLFVRGLLEAGKSDLEKQEKPVAWLSSVP---SSAHGHRQLVCHVSGFYPKPVWVMWMRG
2WY3_A          TVP-QSSRAQTLAMNVTNFW-KEDAMKTKTHYRAMQ-ADCLQKLQRYLKSGVAIRRTVPPMVNVTCSEVS----EGNITVTCRASSFYPRNITLTWRQDG
1LQV_A          QAD-TQVTSGVV-TFTLQQLNAYN--RTRYELREFLEDTCVQYVQKHISAENTKGSQTSRSYTS------------------------------------
3JTS_A          TAA-D----MAA-QNTQRKWEAA---GEAEQHRTYLEGECLEWLRRYLENGKETLQRADPPKTHVTHHPV----SDQEATLRCWALGFYPAEITLTWQRD
1OW0_A          R--DL----CGC-YSVSSVLPGCA--EPWNHGKTFTCTAAYPESKTPLTATLSKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQG
1HXM_A          AVL-K----ILA-PSERDEGSYYC--ACDTLGMGGEYTDKLIFGKGTRVTVEPRSQPHTKPSVFVMKNG---------TNVACLVKEFYPKDIRINLVSS
                                                                  
  DSSP          --GGGS---EEEE-TTS-E----EEEEEEEE-TTGGGGEE---EEEE-TTSSS-EEE-E-
Q30201          K-QPMDAKEFEPKDVLPNG----DGTYQGWITLAVPPGEE---QRYTCQVEHPGLDQ-PLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQ
1S79_A          ----------------------------------------------------------------------------------------------------
3P73_A          --GMVRDQETRWGGIVPNS----DGTYHASAAIDVLPEDG---DKYWCRVEHASLPQ-PGLFSWEPQ---------------------------------
1KCG_C          ----------------------------------------------------------------------------------------------------
1JFM_A          ----------------------------------------------------------------------------------------------------
1BII_A          --GEELTQEMELVETRPAG----DGTFQKWASVVVPLGKE---QKYTCHVEHEGLPE-PLTLRWGKEEPPSSTKTNTVIIAVPVVLGAVVILGAVMAFVM
2P24_A          --SKSVADGVYETSFFVNR----DYSFHKLSYLTFIPSDD---DIYDCKVEHWGLEE-PVLKHWEPEIPAPMSELTETSGSRLEVLFQ------------
1CD1_A          --DQ-EQQGTHRGDFLPNA----DETWYLQATLDVEAGEE---AGLACRVKHSSLGG-QDIILYWDARQAPVGLIVFIVLIMLVVVGAVVYYIWRRRSAY
2WY3_A          --VSLSHNTQQWGDVLPDG----NGTYQTWVATRIRQGEE---QRFTCYMEHSGNHG-THPVPSGKVLVLQSQRTDFPYVSAAMPCFVIIIILCVPCCKK
1LQV_A          ----------------------------------------------------------------------------------------------------
3JTS_A          --GEDQTQDTELVETRPAG----DGTFQKWAAVVVPSGKE---QRYTCHVQHEGLRE-PLTLRWEP----------------------------------
1OW0_A          SQEL-PREKYLTW-ASRQEPSQGTTTFAVTSILRVAAEDWKKGDTFSCMVGHEALPLAFTQKTIDRLAGK------------------------------
1HXM_A          -----KKITEFDPAIVISP----SGKYNAVKLGKYE--DS---NSVTCSVQHDNK---TVHSTDFEVKTDSTDHVKPKETENTKQPSKS-----------
                                                                  
  DSSP
Q30201          GSRGAMGHYVLAERE----------------
1S79_A          -------------------------------
3P73_A          -------------------------------
1KCG_C          -------------------------------
1JFM_A          -------------------------------
1BII_A          KRRRNTGGKGGDYALAPGSQSSDMSLPDCKV
2P24_A          -------------------------------
1CD1_A          QDIR---------------------------
2WY3_A          KTSAAEGP-----------------------
1LQV_A          -------------------------------
3JTS_A          -------------------------------
1OW0_A          -------------------------------
1HXM_A          -------------------------------

Compared with the secondary structure that DSSP assigns to HFE from the PDB structure (1a6z), the multiple sequence alignment conserves most of the secondary structure elements.

As HHSearch found only weak homologues, we searched CATH for structural homologues. The BLAST search in CATH found sequence homologues with identities between 22% and 49%. The HFE protein is classified as a two-domain protein (Alpha Beta, Mainly Beta)<ref>http://www.cathdb.info/domain/1a6zA01</ref>. We found both domains with a sequence similarity of 100%. We then used BLAST to spot-check the results with another search against CATH, and for several proteins we found the same distribution of sequence identities. With this BLAST search, we are now confident that HFE is a protein whose structure elements are highly conserved while its sequence is only weakly conserved. We would therefore recommend a new acceptance range of about 20% to 40% sequence similarity for this protein.

I-Tasser

Performance of I-TASSER at CASP
Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html

Workflow of the I-Tasser server
Source: http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html

I-Tasser is a web service for protein structure prediction provided and published by Ambrish Roy, Alper Kucukural and Yang Zhang at http://zhanglab.ccmb.med.umich.edu/I-TASSER/. It has taken part in the CASP competition with outstanding results.

The I-Tasser protocol consists of the following steps:

Thread the sequence onto different structures to create an initial template.
Break the template apart into fragments that match the structure (leaving out the parts of the structure to which no sequence is assigned).
Assemble the structure and cluster the results.
Use the cluster centroid for structure reassembly.
Search for the structure with the lowest energy and apply REMO H-bond optimization to obtain the final model.

In addition, I-Tasser predicts GO terms and binding sites. To do so, it uses the final model to search the PDB for global and local matches, from which these annotations are predicted.

A problem for us is that I-Tasser only generates complete models, whereas the PDB structure of our protein is not complete. We therefore compared the predicted secondary structure with the one from UniProt.

Compare secondary structure of the model and the structure assigned in UniProt:

Seq:  MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMEN
Pred: CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCEEEEECCCCCCCCCCEEEEEEECCCEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHH
UniP: ---------------------------EEEEEEEEEEE----EEE--EEEEEE--EEEEEEEEEE--EEE--------TTTHHHHHHHHHHHHHHHHHHHHHHHHHHT

Seq:  HNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHV
Pred: HCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCEEEECCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHCCCCCCCCCCCCC
UniP: TT-EEE--EEEEEEEEEE-----EEEEEEEEE--EEEEEEEHHH-EEEEEE---HHHHHHHH---HHHHHHHHHHH-HHHHHHHHHHHHHTTT-------EEEEEEEE

Seq:  TSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGI
Pred: CCCHHHHCHHHHCCCCCCEEEEEEECCCCCCCCCCEEEECCCCCCCCCCCEEEEECCCCCCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCCCCCHHHHHHHCCHHH
UniP: ----EEEEEEEEEEEEE--EEEEEE------HHH----EEEE-----EEEEEEEEE---HHHHEEEEEE---EEE-EEEE----------------------------

Seq:  LFIILRKRQGSRGAMGHYVLAERE
Pred: HHHHHHCCCCCCCCCCCCCHCCCC
UniP: ------------------------

For a better overview, we replaced the S that I-Tasser uses for sheet with an E, as in the UniProt secondary structure.

As we can see, the secondary structure predicted by I-Tasser is mostly correct. Sometimes the elements are slightly shifted, and sometimes they do not have the correct length. As this model is also based on a self hit, it is no surprise to see a good result like this one.
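
The visual comparison above can be quantified with a short sketch. It counts the positions at which the prediction agrees with the UniProt annotation and skips positions that UniProt leaves unannotated (`-`). Treating turns (`T`) as coil (`C`) is an assumption made for this illustration, and `ss_agreement` is a hypothetical helper name. The strings in the example are toy data, not the HFE sequences.

```python
def ss_agreement(pred, ref, unassigned="-"):
    """Fraction of annotated reference positions where pred agrees with ref.

    pred uses the three states C/H/E; ref may also contain 'T' (turn) and
    the unassigned symbol. Turns are mapped to coil, which is an assumption.
    """
    if len(pred) != len(ref):
        raise ValueError("both strings must have the same length")
    to_three_state = {"H": "H", "E": "E", "T": "C"}
    compared = 0
    matched = 0
    for p, r in zip(pred, ref):
        if r == unassigned:
            continue  # no reference annotation at this position
        compared += 1
        if p == to_three_state.get(r, "C"):
            matched += 1
    return matched / compared if compared else 0.0


# Toy example: 7 annotated positions, 6 of them agree.
toy_pred = "CCEEEEHHHH"
toy_ref = "--EEE-HHHT"
toy_score = ss_agreement(toy_pred, toy_ref)
print(f"agreement: {toy_score:.3f}")
```

Applied to the Pred and UniP lines above, block by block, this gives a single number per block instead of a visual impression. Because unannotated positions are skipped, the signal peptide and the transmembrane region do not enter the score.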

Predicted Secondary Structure by I-Tasser

Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Predicted:  CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCSSSSSCCCCCCCCCCSSSSSSSCCCSSSSCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH
Conf-Score: 985028899999999899875122045421036641367999985269985643743686068998778788540145583478888887676654315558

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Predicted:  HHHHHHHCCCCCCSSSSSSSCCCCCCCCCCCCCCCCCCCCCCSSSSCCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHC
Conf-Score: 888755315777644463525565898763541000558873365263022202455666677878887004598888767064299999999747666642

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Predicted:  CCCCCCCCCCCCCCCHHHHCHHHHCCCCCCSSSSSSSCCCCCCCCCCSSSSCCCCCCCCCCCSSSSSCCCCCCCCSSSSCCCCCCCCCSSSSCCCCCCCCCC
Conf-Score: 599877567699854442101541541332479864358754456553541024888652112699807986310267512589998726840688766531

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Prediction: CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC
Conf-Score: 010211112222100246665443013678898651020169

Secondary structure elements are shown as H for alpha helix, S for beta sheet and C for coil.

Predicted Solvent Accessibility by I-Tasser

Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Prediction: 723312000000000101112222011200120120023333331200000102322003123724434241311436413610352044144313323230

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Prediction: 220132133351310001010021136231211333023032003016303403102321432433044143404422010333005103400630351154

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Prediction: 342353313321443300000100101014010203346564435434135233334221320000000347533120214264144202020214542200

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Prediction: 000001100000011100000001334446443132333438

Values range from 0 (buried residue) to 9 (highly exposed residue)

I-Tasser predicted five models with C-scores between -3.298 and -0.557. They are ranked from one to five as seen below. As the cutoff for the C-score we use -1.5, as recommended by the Zhang group<ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010). Zhang et al.</ref>. This cutoff is reported to give false-positive and false-negative rates of about 0.1, which means that more than 90% of the quality predictions are correct. We therefore use only Model1 for the comparison with the other methods.

Model 1 with a C-Score of -0.557

Model 2 with a C-Score of -2.539

Model 3 with a C-Score of -2.266

Model 4 with a C-Score of -2.772

Model 5 with a C-Score of -3.298
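
The cutoff can be applied mechanically, as in the following sketch. The C-scores are the ones listed above, and the variable names are illustrative only.

```python
# C-scores of the five I-Tasser models as listed above.
c_scores = {
    "Model1": -0.557,
    "Model2": -2.539,
    "Model3": -2.266,
    "Model4": -2.772,
    "Model5": -3.298,
}

# Cutoff used on this page: models with a C-score above -1.5 are kept.
C_SCORE_CUTOFF = -1.5

confident = [name for name, score in c_scores.items() if score > C_SCORE_CUTOFF]
print(confident)
```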

Model1 has a TM-Score of about 0.64 and an RMSD of 7.7 Å. For the prediction, I-Tasser used 1a6zA, 1s7qA, 1i4fA, 1de4A, 2vabA and 2bckA as templates. The templates have an identity of about 40%, except for the self hit 1a6z. A special case is 1de4, the transferrin receptor in complex with the HFE protein (chain A), which is therefore a self hit as well. The sequence in this case is also identical, but we cannot draw any conclusion about the 3D structure of the protein when it is bound to the receptor. Because of the self hit, we ran I-Tasser a second time with the constraint to exclude all templates with a sequence identity > 80%.

I-Tasser using templates with a sequence identity below 80% to avoid self hits.
To our surprise, the second run produced the same results, again based on the same self hit. At this point we have no idea what went wrong, but because the self hit is just one of the templates used to create the model, we decided to keep the best model (Model1) for the comparison with the other methods.

SwissModel

SwissModel is a server-based tool provided by the SIB. It combines tools like PSIPRED and DISOPRED for the prediction of secondary structure and disordered regions.
The SwissModel workspace is a web-based service dedicated to protein structure homology modelling. It provides a personal working environment in which several projects can be computed in parallel. The environment also provides tools for template selection, model building and structure quality evaluation. To find suitable templates for a given target protein, a library of experimental protein structures is searched<ref>http://bioinformatics.oxfordjournals.org/cgi/content/short/22/2/195</ref>.
The SwissModel repository is a database of annotated 3D protein structure models. The database consists of more than 3.4 million structures<ref>http://nar.oxfordjournals.org/content/37/suppl_1/D387.full</ref>. All models were generated from the UniProt database with the SwissModel pipeline. From the SwissModel repository, the density of the QMEAN score is estimated to give an indication of the quality of the predicted model.

The model created by SwissModel in Automated mode is based on a self hit, and we had no way to exclude the protein itself from the prediction. We could only specify a particular template, so we also ran SwissModel in Alignment mode, which additionally let us influence the alignment. As one can see, the estimated density of the QMEAN score is the same for the Automated mode and the Alignment mode. The target (1a6z) and the template (1bii) are therefore part of the same reference set. We take this as an indicator of a good template choice, because the template used in the Alignment mode lies in the same set as the target, which itself served as the template in the Automated mode. We also rate this as evidence for the high diversity of the MHC class I family.

Automated Mode

predicted model

Model information: Modelled residue range: 26 to 297
Based on template: 1a6zC (2.60 Å)
Sequence Identity [%]: 100
Evalue: 7.66e-163

Quality information: QMEAN Z-Score: -1.035

Estimated absolute model quality

Estimated density of model quality

Z-Score by category

predicted error

Even though the model is based on a self hit, the QMEAN Z-score is about -1, which means that the model scores about one standard deviation below the mean of the reference structures. The model is therefore not implausible, but it is also not among the most probable ones.
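
To put a Z-score into perspective, the sketch below converts it into the fraction of reference structures expected to score lower. It assumes that the reference scores are approximately normally distributed. This is a simplifying assumption made here, not something SwissModel states, and `fraction_below` is a hypothetical helper name.

```python
import math


def fraction_below(z):
    """Standard normal CDF: expected fraction of reference scores below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# QMEAN Z-scores reported on this page for the two SwissModel runs.
z_automated = -1.035
z_alignment = -2.065

for label, z in (("Automated mode", z_automated), ("Alignment mode", z_alignment)):
    print(f"{label}: Z = {z}, fraction below = {fraction_below(z):.3f}")
```

Under this assumption, a Z-score near -1 still leaves a noticeable share of reference structures with lower scores, whereas a Z-score near -2 leaves only a few percent.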

Alignment Mode

predicted model

Model information:
Modelled residue range: 1 to 272
Based on template: 1bii_A

Quality information:
QMEAN Z-Score: -2.065

Estimated absolute model quality

Estimated density of model quality

Z-Score by category

predicted error

TARGET    26                                 RSH SLHYLFMGAS EQDLGLSLFE
1biiA     1                                  gsh slryfvtavs rpgfgeprym                                                                     
TARGET                                       sss ssssssssss        sss
1biiA                                        sss ssssssssss        sss
TARGET    49    ALGYVDDQLF VFYDHES--R RVEPRTPWVS SRISSQMWLQ LSQSLKGWDH
1biiA     24    evgyvdntef vrfdsdaenp ryeprarwie -qegpeywer etrrakgneq                                                                      
TARGET          ssssss sss sssss        sss  hhh hh   hhhhh hhhhhhhhhh
1biiA           ssssss sss sssss        sss  hh       hhhhh hhhhhhhhhh
TARGET    97    MFTVDFWTIM ENH-NHSKES HTLQVILGCE MQEDNST-EG YWKYGYDGQD
1biiA     73    sfrvdlrtal ryynqsaggs htlqwmagcd vesdgrllrg ywqfaydgcd                                                                     
TARGET          hhhhhhhhhh hhh        ssssssssss sss sss ss sssssss ss
1biiA           hhhhhhhhhh hhh        ssssssssss sss  sssss sssssss ss
TARGET    145   HLEFCPDTLD WRAAEPRAWP TKLEWERHKI RARQNRAYLE RDCPAQLQQL
1biiA     123   yialnedlkt wtaadmaaqi trrkweqa-g aaerdrayle gecvewlrry                                                                     
TARGET          sssss    s ss      hh hhhhh       hhhhhhhhh hhhhhhhhhh
1biiA           sssss    s ss     hhh hhhhhhh     hhhhhhhhh hhhhhhhhhh
TARGET    195   LELGRGVLDQ QVPPLVKVTH HVTS-SVTTL RCRALNYYPQ NITMKWLKDK
1biiA     172   lkngnatllr tdppkahvth hrrpegdvtl rcwalgfypa ditltwqln-                                                                     
TARGET          hhh            ssssss sss   ssss ssssss       sssssss 
1biiA           hhh            ssssss sss   ssss ssssss       sssss   
TARGET    244   QPMDAKEFEP KDVLPNGDGT YQGWITLAVP PGEEQRYTCQ VEHPGLDQPL
1biiA     221   geeltqemel vetrpagdgt fqkwasvvvp lgkeqkytch veheglpepl                                                                     
TARGET                  ss s  sss   s sssssssss       sssss ss       s
1biiA                   ss s  sss   s sssssssss         sss ss       s
TARGET    294   IVIW                                                  
1biiA     271   tlrw-                                                                                                                       
TARGET          ss                                                    
1biiA           ss

As one can see, the alignment shows a very similar secondary structure, and the 3D structures are very similar as well. However, one beta-sheet that is not connected to the rest of the protein (chain B) is not part of the SwissModel model. The RMSD for the model is about 2.9. This is quite a good result, but only the superimposed residues enter the calculation, so the missing beta-sheet is not part of it.
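
The effect of restricting the RMSD to superimposed residues can be illustrated with a small sketch. It assumes that both coordinate lists are already superimposed and residue-matched. `None` marks a residue without a partner, and `rmsd` is a hypothetical helper. No superposition is performed here, and the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD over paired (x, y, z) coordinates, ignoring pairs that contain None."""
    pairs = [(a, b) for a, b in zip(coords_a, coords_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no superimposed residue pairs")
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in pairs)
    return math.sqrt(total / len(pairs))


# Toy data: the third residue is missing in the model, so it is ignored.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), None]
toy_reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (50.0, 50.0, 50.0)]
toy_rmsd = rmsd(toy_model, toy_reference)
print(f"RMSD over paired residues: {toy_rmsd:.2f}")
```

However far away the unpaired residue is, it does not change the result. In the same way, the missing beta-sheet cannot worsen the reported RMSD.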

MODELLER

MODELLER is a standalone application for protein structure modelling by satisfaction of spatial restraints. These restraints derive from different types of information, so the model does not have to be based on the target-template alignment alone (although it can be). MODELLER is capable of pairwise and multiple alignment, fold assignment and loop modelling.

We downloaded and installed Modeller locally on our Windows PC and followed the examples given on the page "Workflow homology modelling glucocerebrosidase".

Our target was set to the FASTA sequence of HFE_HUMAN. Our standard template for the single template-target alignment was set to chain A of 1BII, because it covers the whole sequence of HFE_HUMAN. For the multiple sequence alignment we used the protein structures 1S79 and 3P73 in addition to 1BII. 1S79 was chosen because of its relatively high sequence identity of about 37%, and 3P73 because it is a classical MHC class I molecule with a function similar to that of the HFE_HUMAN protein.

Single template-target

Scripts

script_pairwise-alignment-template-target.py

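from modeller import *

env = environ()
aln = alignment(env)
# Read the template structure. Note: with 'END:A' the segment ran through all
# chains of 1BII.pdb (see the "+383 :P" range in the alignment headers below);
# 'LAST:A', as used in the multiple-template script, gave chain A only.
mdl = model(env, file='1BII.pdb', model_segment=('FIRST:A', 'END:A'))
aln.append_model(mdl, align_codes='1BII', atom_files='1BII.pdb')
# Add the target sequence from its PIR file.
aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN')
# Alignment that uses structural information from the template.
aln.align2d()
aln.check()
aln.write(file='pairwise-2d.ali', alignment_format='PIR')
# Purely sequence-based alignment, computed on the same alignment object.
aln.align()
aln.check()
aln.write(file='pairwise.ali', alignment_format='PIR')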

script_pairwise-to-model.py

from modeller import *
from modeller.automodel import *

env = environ()
# Model 1: built from the sequence-based alignment.
a = automodel(env,
            alnfile  = 'pairwise.ali',       #file:pir:alignment
            knowns   = '1BII',               #file:pdb:template
            sequence = 'HFE_HUMAN',          #id:target
            assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model   = 1
a.make()
# Model 2: built from the structure-aware (align2d) alignment.
# Index 2 is used so that the output of the first run is not overwritten.
b = automodel(env,
            alnfile  = 'pairwise-2d.ali',    #file:pir:alignment
            knowns   = '1BII',               #file:pdb:template
            sequence = 'HFE_HUMAN',          #id:target
            assess_methods=(assess.DOPE, assess.GA341))
b.starting_model = 2
b.ending_model   = 2
b.make()
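
Since both runs request DOPE and GA341 assessment, the resulting models can be compared by their DOPE scores (lower is better). The sketch below works on a list of dictionaries shaped like the summary that automodel keeps in its `outputs` attribute after `make()`. The key names `name`, `failure` and `DOPE score` are assumed to match the Modeller tutorials and should be checked against the installed version. The file names and scores are hypothetical, and `best_by_dope` is an illustrative helper, not part of Modeller.

```python
def best_by_dope(outputs):
    """Return the successfully built model with the lowest DOPE score."""
    ok_models = [m for m in outputs if m.get("failure") is None]
    if not ok_models:
        raise ValueError("no model was built successfully")
    return min(ok_models, key=lambda m: m["DOPE score"])


# Hypothetical summaries standing in for a.outputs + b.outputs.
example_outputs = [
    {"name": "HFE_HUMAN.B99990001.pdb", "failure": None, "DOPE score": -30000.0},
    {"name": "HFE_HUMAN.B99990002.pdb", "failure": None, "DOPE score": -28000.0},
]

best = best_by_dope(example_outputs)
print(best["name"], best["DOPE score"])
```

In the script above, the same function could be called as `best_by_dope(a.outputs + b.outputs)`, provided the key names match.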

Alignments

We used two different alignments for Modeller. The first uses no structural information from the template:

pairwise.ali

>P1;1BII
structureX:1BII.pdb:   1 :A:+383 :P:MOL_ID  1; MOLECULE  MHC CLASS I H-2DD; CHAIN  A; FRAGMENT  HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM  DD; ENGINEERED  YES; MOL_ID  2; MOLECULE  BETA-2 MICROGLOBULIN; CHAIN  B;
ENGINEERED  YES; MOL_ID  3; MOLECULE  DECAMERIC PEPTIDE; CHAIN  P; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON  HOUSE MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; 
EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-3A; MOL_ID  2; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON 
HOUSE   MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-8C; MOL_ID  3: 2.40: 0.28
-------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI
ALNEDLKTWTAADMAAQITRRKWEQAGAAER-DRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD
VTLRCWALGFYPADITLTWQLNGEEL-TQEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL
RW/IQKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTEFTP
TETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI*

>P1;HFE_HUMAN
sequence:reference:     : :     : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDH--ESRRVEPRTP
WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNST-EGYWKYGYDGQDHL
EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV
TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV
IW-------------EPSPSGTLVI---------------------GVISGIAVFVVILFIGILFIILRKRQGSR
GAMGHYV-------LAERE-------------------*

The second uses structural information from the template:

pairwise-2d.ali

>P1;1BII
structureX:1BII.pdb:   1 :A:+383 :P:MOL_ID  1; MOLECULE  MHC CLASS I H-2DD; CHAIN  A; FRAGMENT  HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM  DD; ENGINEERED  YES; MOL_ID  2; MOLECULE  BETA-2 MICROGLOBULIN; CHAIN  B;
ENGINEERED  YES; MOL_ID  3; MOLECULE  DECAMERIC PEPTIDE; CHAIN  P; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON  HOUSE MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; 
EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-3A; MOL_ID  2; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON 
HOUSE MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-8C; MOL_ID  3: 2.40: 0.28
---------------------G----SHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQE-GPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYI
ALNEDLKTWTAADMAAQITRRKWE-QAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGD
VTLRCWALGFYPADITLTWQLNGEELT-QEMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTL
RW/I---QKTPQIQVYSRHPPENGKPNILNCYVTQFHPPHIEIQMLKNGKKIPKVEMSDMSFSKDWSFYILAHTE
FTPTETDTYACRVKHDSMAEPKTVYWDRDM/RGPGRAFVTI*

>P1;HFE_HUMAN
sequence:reference:     : :     : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYD--HESRRVEPRTP
WVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSK-ESHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHL
EFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTS-SV
TTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIV
IW-EPSPSGTLVIGVIS---------GIAVFVVILF--IGILFIILRK-RQGSRGAMGH---------YVLAERE
-----------------------------------------*
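
The two alignments can also be compared numerically. The sketch below parses PIR text and reports, for a template-target pair, the sequence identity over aligned columns and the fraction of target residues that are aligned to a template residue. It assumes that each entry has a single description line (the long headers above appear wrapped only for display) and treats the chain-break symbol `/` like a gap. `read_pir` and `identity_and_coverage` are hypothetical helpers, and the example entries are toy data.

```python
def read_pir(text):
    """Parse PIR text into {code: aligned sequence}; assumes one description line per entry."""
    records = {}
    code = None
    expect_description = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">P1;"):
            code = line[4:]
            records[code] = []
            expect_description = True
        elif expect_description:
            expect_description = False  # skip the description line
        elif code is not None:
            records[code].append(line.rstrip("*"))
    return {c: "".join(parts) for c, parts in records.items()}


def identity_and_coverage(template, target, skip=("-", "/")):
    """Return (identity over aligned columns, fraction of target residues aligned)."""
    if len(template) != len(target):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for t, q in zip(template, target):
        if t in skip or q in skip:
            continue
        aligned += 1
        if t == q:
            identical += 1
    target_len = sum(1 for q in target if q not in skip)
    identity = identical / aligned if aligned else 0.0
    coverage = aligned / target_len if target_len else 0.0
    return identity, coverage


# Toy PIR text (hypothetical entries, not the files shown above).
toy_pir = """
>P1;TMPL
structureX:tmpl.pdb:1:A:7:A::::
--GSHSLRY*

>P1;TARG
sequence:TARG::::::::
LLRSHSLHY*
"""

toy_records = read_pir(toy_pir)
toy_identity, toy_coverage = identity_and_coverage(toy_records["TMPL"], toy_records["TARG"])
print(f"identity {toy_identity:.2f}, coverage {toy_coverage:.2f}")
```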

The models are presented under model comparison. Surprisingly, the model built with the structural information is worse than the one built without it. We think Modeller has difficulty threading the sequence of HFE_HUMAN onto the given structure of 1BII. We therefore suspect that 1a6z, whose structure is very similar to that of 1bii, achieves this type of structure with a different amino acid composition. At the moment, however, we have no means to test this.

Alignment: multiple template-target

Scripts

script_msa-align-templates.py

from modeller import *

env = environ()
aln = alignment(env)
# Read chain A of each template and add it to the alignment.
for (code, chain) in (('1BII', 'A'), ('1S79', 'A'), ('3P73', 'A')):
  mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
  aln.append_model(mdl, atom_files=code, align_codes=code+chain)
# Align the three template structures with SALIGN (default settings).
aln.salign()
aln.check()
aln.write(file='MSA.ali', alignment_format='PIR')

script_msa-align-target-to-msa.py

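from modeller import *

env = environ()
aln = alignment(env)
# Load the template alignment written by the previous script.
aln.append(file='MSA.ali', align_codes='all')
# Number of template sequences; not passed to salign() in this script.
aln_block = len(aln)
# Add the target sequence and align it together with the templates.
aln.append(file='hfe_human.pir', align_codes='HFE_HUMAN')
aln.salign()
aln.check()
# Note: this overwrites the template-only MSA.ali.
aln.write(file='MSA.ali', alignment_format='PIR')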

script_msa-to-model.py

from modeller import *
from modeller.automodel import *

env = environ()
# Build one model from the multiple-template alignment.
a = automodel(env,
            alnfile  = 'MSA.ali',            #file:pir:alignment
            knowns   = ('1BIIA', '1S79A', '3P73A'),  #file:pdb:templates
            sequence = 'HFE_HUMAN',          #id:target
            assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model = 1
a.make()

Alignment

The MSA used by Modeller is:

>P1;1BIIA
structureX:1BII:1    :A:+274 :A:MOL_ID  1; MOLECULE  MHC CLASS I H-2DD; CHAIN  A; FRAGMENT  HEAVY CHAIN, EXTRACELLULAR DOMAINS; SYNONYM  DD; ENGINEERED  YES; MOL_ID  2; MOLECULE  BETA-2 MICROGLOBULIN; CHAIN  B; 
ENGINEERED  YES; MOL_ID  3; MOLECULE  DECAMERIC PEPTIDE; CHAIN  P; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON  HOUSE MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; 
EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-3A; MOL_ID  2; ORGANISM_SCIENTIFIC  MUS MUSCULUS; ORGANISM_COMMON  HOUSE 
MOUSE; ORGANISM_TAXID  10090; CELL_LINE  BL21; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21 (DE3) PLYSS; EXPRESSION_SYSTEM_PLASMID  PET-8C; MOL_ID  3: 2.40: 0.28
-------------------------GSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRAR
WIEQEGPEYWERETRRAKGNEQSFRVDLRTALRYYNQSAGGSHTLQWMAGCDVESDGRLLRGYWQFAYDGCDYIA
LNEDLKTWTAADMAAQITRRKWEQAGAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRRPEGDVT
LRCWALGFYPADITLTWQLNGEELTQ-EMELVETRPAGDGTFQKWASVVVPLGKEQKYTCHVEHEGLPEPLTLRW
---------------------------------------------------*

>P1;1S79A
structureX:1S79:100  :A:+103 :A:MOL_ID  1; MOLECULE  LUPUS LA PROTEIN; CHAIN  A; FRAGMENT  CENTRAL RRM; SYNONYM  SJOGREN SYNDROME TYPE B ANTIGEN, SS-B, LA RIBONUCLEOPROTEIN, LA AUTOANTIGEN; ENGINEERED  YES:MOL_ID
1; ORGANISM_SCIENTIFIC  HOMO SAPIENS; ORGANISM_COMMON  HUMAN; ORGANISM_TAXID  9606; GENE  SSB; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  BL21(DE3)PLYSS; 
EXPRESSION_SYSTEM_VECTOR  PET28:-1.00:-1.00
-------------------------GRWILKNDVKNRSVYIKGFPTDATLDDIK---------------------
---------------------------------------------------------------------------
----------------------------------------EWLEDKGQVLNIQMRRT------------------
--------------LHKAFKGSIFVV-FDSIESAKKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE---
---------------------------------------------------*

>P1;3P73A
structureX:3P73:-1   :A:+275 :A:MOL_ID  1; MOLECULE  MHC RFP-Y CLASS I ALPHA CHAIN; CHAIN  A; FRAGMENT  UNP RESIDUES 20-294; ENGINEERED  YES; MOL_ID  2; MOLECULE  BETA-2-MICROGLOBULIN; CHAIN  B; ENGINEERED 
YES:MOL_ID  1; ORGANISM_SCIENTIFIC  GALLUS GALLUS; ORGANISM_COMMON  BANTAM,CHICKENS; ORGANISM_TAXID  9031; GENE  YFV; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  
TB1; EXPRESSION_SYSTEM_VECTOR_TYPE  PLASMID; EXPRESSION_SYSTEM_PLASMID  PMAL-P4X; MOL_ID  2; ORGANISM_SCIENTIFIC  GALLUS GALLUS; ORGANISM_COMMON  BANTAM,CHICKENS; ORGANISM_TAXID  9031; GENE  B2M; EXPRESSION_SYSTEM 
ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; EXPRESSION_SYSTEM_STRAIN  TB1; EXPRESSION_SYSTEM_VECTOR_TYPE  PLASMID; EXPRESSION_SYSTEM_PLASMID  PMAL-P4X: 1.32: 0.16
-----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKSRTA--QPIVE
MLPQEDQEHWDTQTQKAQGGERDFDWNLNRLPERYNKSKG-SHTMQMMFGCDILEDGS-IRGYDQYAFDGRDFLA
FDMDTMTFTAADPVAEITKRRWETEGTYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEADGILT
LSCHAHGFYPRPITISWMKDGMVRDQ-ETRWGGIVPNSDGTYHASAAIDVLPEDGDKYWCRVEHASLPQPGLFSW
EP------------------------------------------------Q*

>P1;HFE_HUMAN
sequence:reference:     : :     : :::-1.00:-1.00
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVE-PRTPW
VSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKE-SHTLQVILGCEMQEDNS-TEGYWKYGYDGQDHLE
FCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTT
LRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW
EPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE*
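
The identity between two rows of such an alignment can be checked with a few lines of Python. This is a minimal sketch, not part of the original workflow: the function name `percent_identity` is hypothetical, and it assumes identity is counted only over columns where both rows carry a residue (other tools may use a different denominator). The example strings are alignment columns 26 to 35 of HFE_HUMAN and 1BII from the block above.

```python
def percent_identity(target_row, template_row):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(target_row, template_row)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Columns 26-35 of HFE_HUMAN and 1BII in the alignment above.
print(percent_identity("RSHSLHYLFM", "GSHSLRYFVT"))
```

Applied to the full rows, the same function gives a quick plausibility check of the identities reported by HHSearch, although the values can differ because HHSearch uses its own alignment.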

Model editing

We tried to edit the alignment behind the single-template model of MODELLER, because it is one of our best models. Looking at the alignment in Jalview 2.6 (picture below), we noticed that it is already very well defined and that changes would only lead to worse results. The conservation score averages around 7 to 8 and the quality score around 5 to 6.

visualization of the single-template model of MODELLER done by Jalview

The hydrophobic groups are also very well aligned, so we decided to leave that model as it is, because there is nothing to edit. Only the end of the alignment has many gaps, but shifting them would break the conserved block in the middle of the alignment.

The only difference between the sse-supported model and the single-template model is that some residues are aligned differently. These differences result from the secondary structure information of the template being incorporated into the model, so we will not edit them.

visualization of the sse-supported model of MODELLER done by Jalview

The MSA model is hard to edit because several sequences are aligned to each other at once. We tried moving some aligned groups to different positions in the sequence alignment, but could not keep the other sequences consistently aligned. After some unfruitful attempts we decided to leave that alignment as it is, too.

visualization of the multi-template model of MODELLER done by Jalview

In summary, we did not successfully edit any alignment, because either there was nothing to edit or the editing was too complicated and introduced too many errors.

Model comparison

3D-Jigsaw

We had several issues running 3D-Jigsaw, such as strange error messages and input that was not accepted. We finally got it to work with the following settings:

Server: http://bmm.cancerresearchuk.org/~populus/populus_submit.html
Mode: upload
sequence box: FASTA sequence of [1A6Z_A]
own models: one pdb file containing all of our models (except the SwissModel self-hit) separated by the 'TER' command
predicted runtime: 18.33 hours
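
Combining several models into one upload file can be sketched in Python as below. This is an illustration under assumptions, not the script we used: the function name `merge_models` is hypothetical, only ATOM/HETATM records are kept, and each model is closed with a `TER` line as described in the settings above. Whether the server requires anything beyond that is not known to us.

```python
from typing import List


def merge_models(model_texts: List[str]) -> str:
    """Concatenate PDB texts, closing each model with a TER record."""
    merged = []
    for text in model_texts:
        # Keep coordinate records only; headers and END lines are dropped.
        atoms = [line for line in text.splitlines()
                 if line.startswith(("ATOM", "HETATM"))]
        if atoms:
            merged.extend(atoms)
            merged.append("TER")
    return "\n".join(merged) + "\n"


# Two tiny stand-in "models" (one atom each) to show the layout.
model_a = "HEADER demo\nATOM      1  CA  MET A   1       0.000   0.000   0.000\nEND\n"
model_b = "ATOM      1  CA  GLY A   1       1.000   1.000   1.000\n"
print(merge_models([model_a, model_b]), end="")
```

In practice the strings would be read from the individual model files before merging.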

Result

Name	Data
Length	_________10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270___
AA	RLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIW
Prediction	CCCCEEEEEEEEEECCCCCCCCCCEEEEEEECCCCEEEECCCCCCCCCHHHHHHCCCCCHHHHHHHHHHHHHCCHHHHHHHHHHHHHCCCCCCEEEECCCCCEECCCCCCCCEEECCCCCCEEEEECHHHHHHCCCCCHHHCCHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCEEEEEEEEECCEEEEEEEECCCCCCCEEEEEEECCEECCHHHEEEEEEECCCCCCCCCEEEEEECCCCCCCEEEEEEECCCCCCEEEEC
Confidence	93303453556763258999987358999894885499848988867403652146670222210276553000136609989986528798168852425544699858524402146712899971352101652001125776224548999999999998999999999987887332589869999951389499999762611761499996677566832279987550889983003699826988533699999504888767859
Disorder	DDDDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODOOOOOOOOOODDDDDDDDDDOOOODDDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODDOOODDDDDOOOOOOOOOOOOOOOOOOOOOOOOOOOODDOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODDDOOOOOOOOOOOODOOOOOOOOOOOOOOOOOOOOOOOOOOOOOODOOODD

3D-Jigsaw also gave us information about the predicted secondary structure and the ordered and disordered regions. It used that information to optimize all of our submitted models. All optimized models have an energy of about -340 and a coverage of 0.99, which is really good. Their data and pictures are shown in the table below.

Model evaluation

After trying several tools (RasWin, JMol, SwissPDB-Viewer), we decided to use PyMol for superimposing and displaying the model-target alignment of the proteins. We removed chains C and D from the original HFE_HUMAN structure (pdbid: 1a6z) and used only chains A and B for display. The original HFE_HUMAN is always shown in green and the model in red.

We created the images of our models in PyMol as follows:

load '1A6Z_AB.pdb' into PyMol (alternatively: command 'fetch 1A6Z' and then hide chain C and D)
hide everything
show cartoon
color red
load 'model.pdb' into PyMol
hide everything
show cartoon
color green
align 'model.pdb' to '1A6Z_AB.pdb'
command 'ray' (nicer output!)
save the image
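
The steps above can also be written as a PyMOL command script instead of being clicked through for every model. The following Python sketch only generates the script text; the function name `build_overlay_script`, the object names `target` and `model`, and the output file name are assumptions for illustration. The colors are parameters and default to the ones in the step list. The commands themselves (`load`, `hide`, `show`, `color`, `align`, `ray`, `png`) are standard PyMOL commands.

```python
def build_overlay_script(target_pdb, model_pdb, image_png,
                         target_color="red", model_color="green"):
    """Return a PyMOL command script that superimposes a model on a target."""
    commands = [
        f"load {target_pdb}, target",      # reference structure (chains A and B)
        f"load {model_pdb}, model",        # predicted model
        "hide everything",
        "show cartoon",
        f"color {target_color}, target",
        f"color {model_color}, model",
        "align model, target",             # superimpose the model onto the target
        "ray",                             # ray-trace for nicer output
        f"png {image_png}",                # save the image
    ]
    return "\n".join(commands) + "\n"


print(build_overlay_script("1A6Z_AB.pdb", "model.pdb", "overlay.png"), end="")
```

Saved to a file such as `overlay.pml`, the script can be run without the GUI via `pymol -cq overlay.pml`.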

To evaluate our models by RMSD and TM-score we used TMalign. We were advised to use SAP for the RMSD and TMScore for the TM-score, but TMScore failed because our target is the HFE_HUMAN sequence from UniProt and is therefore longer than the '1BII' template. TMScore needs pdb files of the same length, so its superposition does not really work in our case.

TMalign can handle pdb files of different lengths, and its scores are normalized by the second structure. We use '1A6Z' as the second structure so that the scores of all our models are comparable. Modeling HFE_HUMAN is very difficult because it is a multi-domain protein, and none of the methods supports multi-domain modeling.

TMalign can be found at the website of the Zhang-Lab.
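
To make the normalization by the second structure concrete, the TM-score of a fixed superposition can be written out in Python. This is a sketch of the published formula, TM = (1/L) * sum over aligned pairs of 1 / (1 + (d_i/d0)^2) with d0 = 1.24 * (L - 15)^(1/3) - 1.8, and not a replacement for TMalign, which additionally searches for the best superposition. The function names and the example numbers are assumptions; the lower bound of 0.5 on d0 is a guard used in this sketch for very short chains.

```python
def tm_d0(norm_length):
    """Distance scale d0 for a normalizing chain length (in residues)."""
    if norm_length <= 15:
        raise ValueError("formula is only defined for more than 15 residues")
    d0 = 1.24 * (norm_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # guard for very short chains (sketch assumption)


def tm_score(distances, norm_length):
    """TM-score of one superposition.

    distances: CA-CA distances (Angstrom) of the aligned residue pairs.
    norm_length: length of the structure used for normalization.
    """
    d0 = tm_d0(norm_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / norm_length


# Illustrative numbers only: 200 aligned pairs, each 2 Angstrom apart.
pairs = [2.0] * 200
print(tm_score(pairs, 200))  # normalized by the shorter structure
print(tm_score(pairs, 275))  # normalized by a longer reference structure
```

Normalizing every model by the same reference length is what makes the scores in the table below comparable with each other.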

Picture	Model	RMSD	TM-Score
Modeller: superimposed, green:1a6z, red:model(1BII)	Modeller: superimpose, template:1BII	2.58	0.86468
Modeller: sse-support, superimposed, green:1a6z, red:model(1BII)	Modeller: superimpose, sse-support, template:1BII	3.42	0.59586
Modeller: msa, superimposed, green:1a6z, red:model(1BII,1S79,3P73)	Modeller: superimpose, msa, template:1BII,1S79,3P73	2.05	0.89042
I_Tasser: superimposed, green:1a6z, red:model	I-Tasser	1.61	0.93760
SwissModel: superimposed, green:1a6z, red:model	SwissModel	2.67	0.85048
SwissModel: self-hit, superimposed, green:1a6z, red:model	SwissModel self	0.08	0.99984

As one can clearly see, apart from the SwissModel self-hit the I-Tasser model is the best with a TM-Score of ~0.94, followed by the MSA model of MODELLER with a TM-Score of ~0.89, the single-template model of MODELLER with ~0.86 and the SwissModel model with ~0.85.

The worst model is the MODELLER model that uses secondary structure information of the template, with a TM-Score of ~0.6. We are sure that the low sequence identity and secondary structure similarity of only 22% affected this model badly, because the normal single-template model is based on the same template and achieves a significantly higher TM-Score.

All of our models are really good, except for the sse-supported model of MODELLER.

Discussion

For the I-Tasser protocol, it is not possible to choose a specific template, so we ran I-Tasser twice: first with standard parameters and then with a similarity threshold of 80%. In the second case, we again got a model based on a self-hit. We repeated the prediction a third time with the same result. We could not find out why the given threshold was ignored.

Our attempts to find homologs in all given categories (>60%, >40%, >20%) were not successful, because HHSearch did not list any matching ones. A Blast search against the NR database failed too and returned only proteins with 40% or less sequence identity. We therefore conclude that the HFE family must be highly diverse in sequence.

All of our models were built from chain A only and never from chain B. We cannot fix that problem, because according to the PDB there is no contact between chain A and chain B (this is also clearly visible if you look closely at the pictures above and search on the left for the plain beta sheets in red).

The templates which we had chosen from the HHSearch results were meant to cover the whole protein sequence and to give special coverage of the transmembrane region. But as we saw later, the tools do not support multiple sequence alignments. Therefore we decided to use '1BII' as template for SwissModel and Modeller, because it covers the sequence completely and, with a sequence identity of 22%, lies in the lower midrange of the HHSearch results. '1S79' has a higher sequence identity of 37% but a much worse conservation with HFE_HUMAN. We decided to rank coverage of the whole sequence higher than sequence identity.

Extra diligence task

We were not able to calculate the RMSD of all atoms within a 6 Ångström threshold of the catalytic core, because no catalytic core is defined in UniProt:Q30201 (HFE_HUMAN) or in PDB:1A6Z.
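
For a protein with an annotated site, the calculation itself would be short. The sketch below shows one way it could look and rests on several assumptions: the function name `rmsd_near_site` is hypothetical, the model and reference atoms are already paired one-to-one and superimposed, and the site is given as a list of coordinates. The coordinates in the example are made up to show the selection and do not come from 1A6Z.

```python
import math


def rmsd_near_site(model_atoms, reference_atoms, site_atoms, cutoff=6.0):
    """RMSD over paired atoms whose reference position lies within
    `cutoff` Angstrom of any site atom. All inputs are (x, y, z) tuples."""
    if len(model_atoms) != len(reference_atoms):
        raise ValueError("model and reference atoms must be paired")
    selected = [
        (m, r) for m, r in zip(model_atoms, reference_atoms)
        if any(math.dist(r, s) <= cutoff for s in site_atoms)
    ]
    if not selected:
        raise ValueError("no atoms within the cutoff of the site")
    squared = sum(math.dist(m, r) ** 2 for m, r in selected)
    return math.sqrt(squared / len(selected))


# Made-up coordinates: only the first atom lies within 6 Angstrom of the site.
reference = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (10.0, 5.0, 0.0)]
site = [(0.0, 0.0, 0.0)]
print(rmsd_near_site(model, reference, site))
```

The second atom deviates by 5 Å but is ignored because it lies 10 Å away from the site, which is the point of restricting the RMSD to the neighborhood of the site.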

References
