Homology based structure predictions

From Bioinformatikpedia
Revision as of 13:19, 9 June 2011 by Landerer

Homologous

Because we found no homologous structures in Task 2, we extended our list by using HHSearch.

We use 13 proteins to build a multiple alignment for homology modeling. We chose the sequences so that together they cover the whole protein, and we paid particular attention to the transmembrane region.


PDB-ID   Sequence identity   Description
1s79     37%                 TODO
3p73     28%                 TODO
1kcg     22%                 TODO
1jfm     14%                 TODO
1bii     22%                 TODO
2p24     21%                 TODO
1cd1     21%                 TODO
2wy3     29%                 TODO
1lqv     14%                 TODO
3jts     25%                 TODO
1ow0     22%                 TODO
1hxm     18%                 TODO
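
To get a quick overview of the candidates, the table can be ranked by sequence identity. This is a minimal sketch using the values from the table; the 25% cutoff in the example is an illustrative assumption, not a selection criterion stated on this page.

```python
# Sequence identities (%) of the candidate templates, copied from the table.
TEMPLATES = {
    "1s79": 37, "3p73": 28, "1kcg": 22, "1jfm": 14, "1bii": 22, "2p24": 21,
    "1cd1": 21, "2wy3": 29, "1lqv": 14, "3jts": 25, "1ow0": 22, "1hxm": 18,
}


def rank_templates(identities, min_identity=0):
    """Return PDB IDs sorted by decreasing identity, keeping those >= min_identity.

    Templates with equal identity keep their table order, because Python's
    sort is stable even with reverse=True.
    """
    ranked = sorted(identities.items(), key=lambda item: item[1], reverse=True)
    return [pdb_id for pdb_id, identity in ranked if identity >= min_identity]


print(rank_templates(TEMPLATES, min_identity=25))  # hypothetical cutoff of 25%
# ['1s79', '2wy3', '3p73', '3jts']
```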

From these sequences, including 1a6z, we computed a multiple alignment with T-Coffee in Expresso mode. This multiple alignment is later used in the alignment mode of SwissModel and Modeller.
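
An Expresso alignment can also be started from a script. The following is a sketch under the assumption of a local T-Coffee installation on the PATH; the FASTA file name is hypothetical, and only the `-mode expresso` option of `t_coffee` is used.

```python
import shutil
import subprocess


def expresso_command(fasta_path):
    """Build the T-Coffee command line for a structure-based Expresso alignment."""
    return ["t_coffee", fasta_path, "-mode", "expresso"]


def run_expresso(fasta_path):
    """Run T-Coffee if it is installed; return the CompletedProcess or None."""
    if shutil.which("t_coffee") is None:
        return None  # T-Coffee is not on PATH, so nothing is run
    return subprocess.run(
        expresso_command(fasta_path), check=True, capture_output=True, text=True
    )


# "hfe_and_templates.fasta" is a hypothetical file holding the sequences to align.
print(" ".join(expresso_command("hfe_and_templates.fasta")))
```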

DSSP                             --EEEEEEEEEEB-SS-SSB--EEE
Q30201  MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEAL
1S79_A  --------------------------------------------------
3P73_A  -----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIV
1KCG_C  -------------------------DAHSLWYNFTIIHLPRHGQQWCEVQ
1JFM_A  -------------------------DAHSLRCNLTIKDPTPADPLWYEAK
1BII_A  -MGAMAPRTLLLLLAAALGPTQTRAGSHSLRYFVTAVSRPGFGEPRYMEV
2P24_A  --------------------------------------------------
1CD1_A  -----------------------QQKNYTFRCLQMSSFANR-SWSRTDSV
2WY3_A  ------------------------MEPHSLRYNLMVLSQDESVQSGFLAE
1LQV_A  -------------------SQDASDGLQRLHMLQISYFR-DPYHVWYQGN
3JTS_A  -------------------------GSHSMRYFYTSMSRPGRWEPRFIAV
1OW0_A  --------------------------------------------------
1HXM_A  --------------------------------------------------

DSSP    EEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH
Q30201  GYVDDQLFVFYDHES--RRVE-PRTPWVSSRISSQMWLQLSQSLKGWDHM
1S79_A  ------------------GRW-IL-KNDVKNRSVYIKGFPTDATLDDIKE
3P73_A  GYVDDKIFGTYNSKS--RTAQ-PIVEML-PQEDQEHWDTQTQKAQGGERD
1KCG_C  SQVDQKNFLSYDCGS--DKVLSMGHL-EEQLYATDAWGKQLEMLREVGQR
1JFM_A  CFVGEILILHLSNIN--KTMT-SG-DPGETANATEVKKCLTQPLKNLCQK
1BII_A  GYVDNTEFVRFDSDAENPRYE-PRARWIE-QEGPEYWERETRRAKGNEQS
2P24_A  --------------------------M----AIMAPRTLVLLLSGALALT
1CD1_A  VWLGDLQTHRWSNDS--ATIS-FTKPWSQGKLSNQQWEKLQHMFQVYRVS
2WY3_A  GHLDGQPFLRYDRQK--RRAK-PQGQWAEDVLGAETWDTETEDLTENGQD
1LQV_A  ASLGGHLTHVLEGPDTNTTII-QLQPL----QEPESWARTQSGLQSYLLQ
3JTS_A  GYVDDTQFVRFDSDAASQRME-PRAPWVE-QEGPEYWDRETRNMKAETQN
1OW0_A  --------------------------------------------------
1HXM_A  -----------------------------------AIELVPEHQTVPVSI

DSSP    HHHHHHHHTTT-SSS--E--------EEEEEE-EEE-TTS-E-EEE-E--
Q30201  FTVDFWTIMENHN-HSKE--------SHTLQV-ILGCEMQED-NST-E--
1S79_A  WLEDKGQV-LNIQMRRTL--------HKAFKG-SIFVVFDSI-ESA-KKF
3P73_A  FDWNLNRLPERYN-KSKG--------SHTMQM-MFGCDILED-GSI-R--
1KCG_C  LRLELADT---------ELEDFTPSGPLTLQV-RMSCECEAD-GYI-R--
1JFM_A  LRNKVSNT-KVDTHKTNG--------YPHLQV-TMIYPQSQG-RTP-S--
1BII_A  FRVDLRTALRYYNQSAGG--------SHTLQW-MAGCDVESD-GRLLR--
2P24_A  QTWAGSHSRGEDD--IEA--------DHVGSYGIVVYQSP----GD-I--
1CD1_A  FTRDIQELVKMMSPKEDY--------PIEIQL-SAGCEMYPG-NAS-E--
2WY3_A  LRRTLTHI----KDQKGG--------LHSLQE-IRVCEIHED-SST-R--
1LQV_A  FHGLVRLVHQERT--LAF--------PLTIRC-FLGCELPPEGSRA-H--
3JTS_A  APVNLRNLRGYYNQSEAG--------SHTIQR-MYGCDLGPD-GRLLR--
1OW0_A  -----ACHPRLSLHRPAL--------EDLLLG-SEANLTCTL-TGLRD--
1HXM_A  GVPATLRCSMKGEAIGNY--------YINWYR-KTQGNTMTF-IYRE---

DSSP    ----------EEEETTEE----------------EEEEEGGGTEEEES--
Q30201  ----------GYWKYGYD----------------GQDHLEFCPDTLDW--
1S79_A  VETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE---------------
3P73_A  ----------GYDQYAFD----------------GRDFLAFDMDTMTF--
1KCG_C  ----------GSWQFSFD----------------GRKFLLFDSNNRKW--
1JFM_A  ----------ATWEFNIS----------------DSYFFTFYTENMSW--
1BII_A  ----------GYWQFAYD----------------GCDYIALNEDLKTW--
2P24_A  ----------GQYTFEFD----------------GDELFYVDLDKKET--
1CD1_A  ----------SFLHVAFQ----------------GKYVVRFWG--TSWQT
2WY3_A  ----------GSRHFYYN----------------GELFLSQNLETQES--
1LQV_A  ----------VFFEVAVN----------------GSSFVSFRPERALW--
3JTS_A  ----------GYHQSAYD----------------GKDYIALNEDLRSW--
1OW0_A  ----------ASGVTFTW----------------TPSSGKSAV--QGPPE
1HXM_A  ----------KDIYGPGF----------------KDNFQGDIDIAKNL--

DSSP    SGG-G----HHH-HHHHHSSTHHH--HHHHHHHHTHHHHHHHHHHHHHTT
Q30201  RAA-E----PRA-WPTKLEWERHK--IRARQNRAYLERDCPAQLQQLLEL
1S79_A  --------------------------------------------------
3P73_A  TAA-D----PVA-EITKRRWETEG--TYAERWKHELGTVCVQNLRRYLEH
1KCG_C  TVV-H----AGA-RRMKEKWEKDS--GLTTFFKMVSMRDCKSWLRDFLMH
1JFM_A  RSA-N----DES-GVIMNKWKDDG--EFVKQLKFLI-HECSQKMDEFLKQ
1BII_A  TAA-D----MAA-QITRRKWEQA---GAAERDRAYLEGECVEWLRRYLKN
2P24_A  IWM-------------LPEFAQLR--SFDPQGGLQNIATGKHNLGVLTKR
1CD1_A  VPGAP----SWL-DLPIKVLNADQ--GTSATVQMLLNDTCPLFVRGLLEA
2WY3_A  TVP-QSSRAQTLAMNVTNFW-KEDAMKTKTHYRAMQ-ADCLQKLQRYLKS
1LQV_A  QAD-TQVTSGVV-TFTLQQLNAYN--RTRYELREFLEDTCVQYVQKHISA
3JTS_A  TAA-D----MAA-QNTQRKWEAA---GEAEQHRTYLEGECLEWLRRYLEN
1OW0_A  R--DL----CGC-YSVSSVLPGCA--EPWNHGKTFTCTAAYPESKTPLTA
1HXM_A  AVL-K----ILA-PSERDEGSYYC--ACDTLGMGGEYTDKLIFGKGTRVT

DSSP    TSS--B--EEEEEEEE-SS-----E-EEEEEEEEEBSS--EEEEEETTEE
Q30201  GRGVLDQQVPPLVKVTHHVT----S-SVTTLRCRALNYYPQNITMKWLKD
1S79_A  --------------------------------------------------
3P73_A  GKAALKRRVQPEVRVWGKEA----D-GILTLSCHAHGFYPRPITISWMKD
1KCG_C  RKKRLE--------------------------------------------
1JFM_A  SKEK----------------------------------------------
1BII_A  GNATLLRTDPPKAHVTHHRR----PEGDVTLRCWALGFYPADITLTWQLN
2P24_A  SNSTPATNEAPQATVFPKSP--VLLGQPNTLICFVDNIFPPVINITWLRN
1CD1_A  GKSDLEKQEKPVAWLSSVP---SSAHGHRQLVCHVSGFYPKPVWVMWMRG
2WY3_A  GVAIRRTVPPMVNVTCSEVS----EGNITVTCRASSFYPRNITLTWRQDG
1LQV_A  ENTKGSQTSRSYTS------------------------------------
3JTS_A  GKETLQRADPPKTHVTHHPV----SDQEATLRCWALGFYPAEITLTWQRD
1OW0_A  TLSKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQG
1HXM_A  VEPRSQPHTKPSVFVMKNG---------TNVACLVKEFYPKDIRINLVSS

DSSP    --GGGS---EEEE-TTS-E----EEEEEEEE-TTGGGGEE---EEEE-TT
Q30201  K-QPMDAKEFEPKDVLPNG----DGTYQGWITLAVPPGEE---QRYTCQV
1S79_A  --------------------------------------------------
3P73_A  --GMVRDQETRWGGIVPNS----DGTYHASAAIDVLPEDG---DKYWCRV
1KCG_C  --------------------------------------------------
1JFM_A  --------------------------------------------------
1BII_A  --GEELTQEMELVETRPAG----DGTFQKWASVVVPLGKE---QKYTCHV
2P24_A  --SKSVADGVYETSFFVNR----DYSFHKLSYLTFIPSDD---DIYDCKV
1CD1_A  --DQ-EQQGTHRGDFLPNA----DETWYLQATLDVEAGEE---AGLACRV
2WY3_A  --VSLSHNTQQWGDVLPDG----NGTYQTWVATRIRQGEE---QRFTCYM
1LQV_A  --------------------------------------------------
3JTS_A  --GEDQTQDTELVETRPAG----DGTFQKWAAVVVPSGKE---QRYTCHV
1OW0_A  SQEL-PREKYLTW-ASRQEPSQGTTTFAVTSILRVAAEDWKKGDTFSCMV
1HXM_A  -----KKITEFDPAIVISP----SGKYNAVKLGKYE--DS---NSVTCSV

DSSP    SSS-EEE-E-
Q30201  EHPGLDQ-PLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQ
1S79_A  --------------------------------------------------
3P73_A  EHASLPQ-PGLFSWEPQ---------------------------------
1KCG_C  --------------------------------------------------
1JFM_A  --------------------------------------------------
1BII_A  EHEGLPE-PLTLRWGKEEPPSSTKTNTVIIAVPVVLGAVVILGAVMAFVM
2P24_A  EHWGLEE-PVLKHWEPEIPAPMSELTETSGSRLEVLFQ------------
1CD1_A  KHSSLGG-QDIILYWDARQAPVGLIVFIVLIMLVVVGAVVYYIWRRRSAY
2WY3_A  EHSGNHG-THPVPSGKVLVLQSQRTDFPYVSAAMPCFVIIIILCVPCCKK
1LQV_A  --------------------------------------------------
3JTS_A  QHEGLRE-PLTLRWEP----------------------------------
1OW0_A  GHEALPLAFTQKTIDRLAGK------------------------------
1HXM_A  QHDNK---TVHSTDFEVKTDSTDHVKPKETENTKQPSKS-----------

DSSP
Q30201  GSRGAMGHYVLAERE----------------
1S79_A  -------------------------------
3P73_A  -------------------------------
1KCG_C  -------------------------------
1JFM_A  -------------------------------
1BII_A  KRRRNTGGKGGDYALAPGSQSSDMSLPDCKV
2P24_A  -------------------------------
1CD1_A  QDIR---------------------------
2WY3_A  KTSAAEGP-----------------------
1LQV_A  -------------------------------
3JTS_A  -------------------------------
1OW0_A  -------------------------------
1HXM_A  -------------------------------
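
Identity and coverage of each template relative to the target can be read directly off alignment rows like the ones above. The sketch below computes identity over columns where both rows carry a residue, which is only one of several common definitions, so its numbers need not match the HHSearch identities in the table. The function name and the toy rows are our own.

```python
def pairwise_stats(target_row, template_row, gap="-"):
    """Identity and coverage of one template row against the target row.

    identity: % identical residues among columns where both rows have a residue.
    coverage: % of target residues that are aligned to some template residue.
    """
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    aligned = identical = target_residues = 0
    for a, b in zip(target_row, template_row):
        if a != gap:
            target_residues += 1
        if a != gap and b != gap:
            aligned += 1
            if a == b:
                identical += 1
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / target_residues if target_residues else 0.0
    return identity, coverage


# Toy rows (not taken from the alignment): 4 of 5 aligned columns are identical.
print(pairwise_stats("MKV-LA", "MRVQLA"))  # (80.0, 100.0)
```

The same function can be applied to any pair of rows from the alignment, e.g. the Q30201 row against the 1BII_A row of one block.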

I-Tasser

Predicted Secondary Structure by I-Tasser

Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Predicted:  CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCSSSSSCCCCCCCCCCSSSSSSSCCCSSSSCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH
Conf-Score: 985028899999999899875122045421036641367999985269985643743686068998778788540145583478888887676654315558

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Predicted:  HHHHHHHCCCCCCSSSSSSSCCCCCCCCCCCCCCCCCCCCCCSSSSCCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHC
Conf-Score: 888755315777644463525565898763541000558873365263022202455666677878887004598888767064299999999747666642

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Predicted:  CCCCCCCCCCCCCCCHHHHCHHHHCCCCCCSSSSSSSCCCCCCCCCCSSSSCCCCCCCCCCCSSSSSCCCCCCCCSSSSCCCCCCCCCSSSSCCCCCCCCCC
Conf-Score: 599877567699854442101541541332479864358754456553541024888652112699807986310267512589998726840688766531

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Predicted:  CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC
Conf-Score: 010211112222100246665443013678898651020169

Secondary structure elements are shown as H for alpha helix, S for beta sheet and C for coil.
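
To locate the predicted helices, for example near the C-terminal transmembrane region, the H runs can be extracted from the prediction string. A minimal sketch applied to the last output line above: that line starts at residue 307 because each of the three preceding lines holds 102 residues. The `min_len=5` filter, which drops isolated H states, is our own arbitrary choice.

```python
from itertools import groupby

# Last line of the I-Tasser secondary structure output (residues 307-348).
SEQ = "IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE"
PRED = "CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC"


def helix_segments(sequence, prediction, start=1, min_len=1):
    """Return (first, last, residues) for each run of 'H' of at least min_len.

    Positions are residue numbers, counted from `start` for the first character.
    """
    if len(sequence) != len(prediction):
        raise ValueError("sequence and prediction must have equal length")
    segments = []
    pos = 0
    for state, run in groupby(prediction):  # consecutive identical states
        length = len(list(run))
        if state == "H" and length >= min_len:
            first = start + pos
            segments.append((first, first + length - 1, sequence[pos:pos + length]))
        pos += length
    return segments


for segment in helix_segments(SEQ, PRED, start=307, min_len=5):
    print(segment)
# (313, 319, 'IAVFVVI')
# (322, 330, 'IGILFIILR')
```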

Predicted Solvent Accessibility by I-Tasser

Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Prediction: 723312000000000101112222011200120120023333331200000102322003123724434241311436413610352044144313323230

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Prediction: 220132133351310001010021136231211333023032003016303403102321432433044143404422010333005103400630351154

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Prediction: 342353313321443300000100101014010203346564435434135233334221320000000347533120214264144202020214542200

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Prediction: 000001100000011100000001334446443132333438

Values range from 0 (buried residue) to 9 (highly exposed residue).

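I-Tasser predicted five models with C-scores between -3.298 and -0.557. They are ranked from one to five as listed below. Note that the C-score does not decrease strictly with rank: Model 3 has a higher C-score than Model 2.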

Model 1 with a C-Score of -0.557
Model 2 with a C-Score of -2.539
Model 3 with a C-Score of -2.266
Model 4 with a C-Score of -2.772
Model 5 with a C-Score of -3.298
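
For Model 1, I-Tasser also reports an estimated TM-score and RMSD. Both measures compare a model with a reference structure over aligned residue pairs. To show how they differ, here is a minimal sketch that computes both from per-residue distances after superposition. It assumes the standard TM-score distance scale d0(L) = 1.24·(L−15)^(1/3) − 1.8 Å with a lower bound of 0.5 Å. Unlike the real TM-score, it scores one given superposition rather than searching for the best one. The function names are our own.

```python
import math


def tm_d0(length):
    """Distance scale d0 (in Å) of the TM-score for a target of `length` residues."""
    if length <= 15:
        raise ValueError("the d0 formula needs a target longer than 15 residues")
    return max(1.24 * (length - 15) ** (1 / 3) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (no search over superpositions).

    distances: distances in Å between aligned residue pairs after superposition.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue model: 90 residues 1 Å off, 10 residues 20 Å off.
dist = [1.0] * 90 + [20.0] * 10
print(f"TM-score {tm_score(dist, 100):.2f}, RMSD {rmsd(dist):.1f} Å")
```

In this toy case the TM-score stays around 0.84 while the RMSD rises above 6 Å, because a few badly placed residues dominate the RMSD but contribute almost nothing to the TM-score. A moderate TM-score together with a large RMSD, as for Model 1, is therefore not contradictory.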

Model 1 has an estimated TM-score of about 0.64 and an estimated RMSD of 7.7 Å. For the prediction, I-Tasser used 10 templates found in the PDB, which are:

SwissModel

SwissModel is a server-based tool provided by the SIB (Swiss Institute of Bioinformatics). It combines tools such as PSIPRED and DISOPRED for secondary structure and disordered region prediction.


The model created by SwissModel in automated mode is based on a self hit (template 1a6z, the structure of the target itself), and we had no way to exclude the target protein from the template search. Therefore we also ran SwissModel in alignment mode (TODO).
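
When a template list is available locally, removing self hits before modeling is a simple filter. A minimal sketch, assuming hits are given as (PDB chain, identity) pairs. The 95% identity cutoff is a hypothetical choice that also catches near-identical copies of the target under other PDB IDs. The example hits use 1a6zC with 100% identity and the chain A identities of 1s79 and 3p73 from the table.

```python
def exclude_self_hits(hits, target_pdb_ids, max_identity=95.0):
    """Drop templates that are (or are nearly identical to) the target itself.

    hits: list of (pdb_chain, identity_percent) tuples, e.g. ("1a6zC", 100.0).
    A hit is removed if its 4-letter PDB ID is in target_pdb_ids or if its
    identity exceeds max_identity (a hypothetical cutoff).
    """
    targets = {pdb_id.lower() for pdb_id in target_pdb_ids}
    return [
        (chain, identity)
        for chain, identity in hits
        if chain[:4].lower() not in targets and identity <= max_identity
    ]


hits = [("1a6zC", 100.0), ("1s79A", 37.0), ("3p73A", 28.0)]
print(exclude_self_hits(hits, ["1A6Z"]))
# [('1s79A', 37.0), ('3p73A', 28.0)]
```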

Automated Mode

predicted model


Model information:
Modelled residue range: 26 to 297
Based on template: 1a6zC (2.60 Å)
Sequence identity [%]: 100
E-value: 7.66e-163

Quality information:
QMEAN Z-score: -1.035


Figures: estimated absolute model quality; estimated density of model quality; Z-score by category; predicted error.

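Even though the model is based on a self hit, its QMEAN Z-score is about -1. This means that the model's QMEAN score lies about one standard deviation below the mean score of high-resolution experimental structures of similar size. The model is therefore plausible, but its estimated quality is somewhat lower than that of a typical experimental structure.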
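
Written as a standard z-score, the reported value can be read as follows. Here S_model is the model's QMEAN score, and μ_ref and σ_ref are the mean and standard deviation of the reference structures' scores (our notation):

```latex
Z_{\mathrm{QMEAN}} = \frac{S_{\mathrm{model}} - \mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}
\quad\Longrightarrow\quad
S_{\mathrm{model}} = \mu_{\mathrm{ref}} + Z_{\mathrm{QMEAN}}\,\sigma_{\mathrm{ref}}
                   = \mu_{\mathrm{ref}} - 1.035\,\sigma_{\mathrm{ref}}
```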

Alignment Mode

Modeller
