Difference between revisions of "Homology based structure predictions"
Revision as of 13:46, 9 June 2011
Homologous
Because we found no homologous structures in Task 2, we extended our list by using HHSearch.
HHSearch found only sequences with an identity below 40%; therefore we use the 12 proteins shown below to create a multiple alignment for homology modeling. We chose sequences that together cover the whole protein, and we paid specific attention to the transmembrane region.
PDB-ID | Identity | Description |
1s79 | 37% | Kram |
3p73 | 28% | Kram |
1kcg | 22% | Kram |
1jfm | 14% | Kram |
1bii | 22% | Kram |
2p24 | 21% | Kram |
1cd1 | 21% | Kram |
2wy3 | 29% | Kram |
1lqv | 14% | Kram |
3jts | 25% | Kram |
1ow0 | 22% | Kram |
1hxm | 18% | Kram |
With these sequences plus the HFE gene product (Q30201), we built a multiple sequence alignment with T-Coffee (Expresso). This multiple sequence alignment is later used in the alignment mode of SwissModel and Modeller.
DSSP   --EEEEEEEEEEB-SS-SSB--EEEEEETTEEEEEEESSS--EEE--STTS-SSTTTTHHHHHHHHHHHHHHHHH
Q30201 MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHES--RRVE-PRTPWVSSRISSQMWLQLSQSLKGWDHM
1S79_A --------------------------------------------------------------------GRW-IL-KNDVKNRSVYIKGFPTDATLDDIKE
3P73_A -----------------------EFGSHSLRYFLTGMTDPGPGMPRFVIVGYVDDKIFGTYNSKS--RTAQ-PIVEML-PQEDQEHWDTQTQKAQGGERD
1KCG_C -------------------------DAHSLWYNFTIIHLPRHGQQWCEVQSQVDQKNFLSYDCGS--DKVLSMGHL-EEQLYATDAWGKQLEMLREVGQR
1JFM_A -------------------------DAHSLRCNLTIKDPTPADPLWYEAKCFVGEILILHLSNIN--KTMT-SG-DPGETANATEVKKCLTQPLKNLCQK
1BII_A -MGAMAPRTLLLLLAAALGPTQTRAGSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYE-PRARWIE-QEGPEYWERETRRAKGNEQS
2P24_A ----------------------------------------------------------------------------M----AIMAPRTLVLLLSGALALT
1CD1_A -----------------------QQKNYTFRCLQMSSFANR-SWSRTDSVVWLGDLQTHRWSNDS--ATIS-FTKPWSQGKLSNQQWEKLQHMFQVYRVS
2WY3_A ------------------------MEPHSLRYNLMVLSQDESVQSGFLAEGHLDGQPFLRYDRQK--RRAK-PQGQWAEDVLGAETWDTETEDLTENGQD
1LQV_A -------------------SQDASDGLQRLHMLQISYFR-DPYHVWYQGNASLGGHLTHVLEGPDTNTTII-QLQPL----QEPESWARTQSGLQSYLLQ
3JTS_A -------------------------GSHSMRYFYTSMSRPGRWEPRFIAVGYVDDTQFVRFDSDAASQRME-PRAPWVE-QEGPEYWDRETRNMKAETQN
1OW0_A ----------------------------------------------------------------------------------------------------
1HXM_A -------------------------------------------------------------------------------------AIELVPEHQTVPVSI

DSSP   HHHHHHHHTTT-SSS--E--------EEEEEE-EEE-TTS-E-EEE-E------------EEEETTEE----------------EEEEEGGGTEEEES--
Q30201 FTVDFWTIMENHN-HSKE--------SHTLQV-ILGCEMQED-NST-E------------GYWKYGYD----------------GQDHLEFCPDTLDW--
1S79_A WLEDKGQV-LNIQMRRTL--------HKAFKG-SIFVVFDSI-ESA-KKFVETPGQKYKETDLLILFKDDYFAKKNEERKQNKVE---------------
3P73_A FDWNLNRLPERYN-KSKG--------SHTMQM-MFGCDILED-GSI-R------------GYDQYAFD----------------GRDFLAFDMDTMTF--
1KCG_C LRLELADT---------ELEDFTPSGPLTLQV-RMSCECEAD-GYI-R------------GSWQFSFD----------------GRKFLLFDSNNRKW--
1JFM_A LRNKVSNT-KVDTHKTNG--------YPHLQV-TMIYPQSQG-RTP-S------------ATWEFNIS----------------DSYFFTFYTENMSW--
1BII_A FRVDLRTALRYYNQSAGG--------SHTLQW-MAGCDVESD-GRLLR------------GYWQFAYD----------------GCDYIALNEDLKTW--
2P24_A QTWAGSHSRGEDD--IEA--------DHVGSYGIVVYQSP----GD-I------------GQYTFEFD----------------GDELFYVDLDKKET--
1CD1_A FTRDIQELVKMMSPKEDY--------PIEIQL-SAGCEMYPG-NAS-E------------SFLHVAFQ----------------GKYVVRFWG--TSWQT
2WY3_A LRRTLTHI----KDQKGG--------LHSLQE-IRVCEIHED-SST-R------------GSRHFYYN----------------GELFLSQNLETQES--
1LQV_A FHGLVRLVHQERT--LAF--------PLTIRC-FLGCELPPEGSRA-H------------VFFEVAVN----------------GSSFVSFRPERALW--
3JTS_A APVNLRNLRGYYNQSEAG--------SHTIQR-MYGCDLGPD-GRLLR------------GYHQSAYD----------------GKDYIALNEDLRSW--
1OW0_A -----ACHPRLSLHRPAL--------EDLLLG-SEANLTCTL-TGLRD------------ASGVTFTW----------------TPSSGKSAV--QGPPE
1HXM_A GVPATLRCSMKGEAIGNY--------YINWYR-KTQGNTMTF-IYRE-------------KDIYGPGF----------------KDNFQGDIDIAKNL--

DSSP   SGG-G----HHH-HHHHHSSTHHH--HHHHHHHHTHHHHHHHHHHHHHTTTSS--B--EEEEEEEE-SS-----E-EEEEEEEEEBSS--EEEEEETTEE
Q30201 RAA-E----PRA-WPTKLEWERHK--IRARQNRAYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVT----S-SVTTLRCRALNYYPQNITMKWLKD
1S79_A ----------------------------------------------------------------------------------------------------
3P73_A TAA-D----PVA-EITKRRWETEG--TYAERWKHELGTVCVQNLRRYLEHGKAALKRRVQPEVRVWGKEA----D-GILTLSCHAHGFYPRPITISWMKD
1KCG_C TVV-H----AGA-RRMKEKWEKDS--GLTTFFKMVSMRDCKSWLRDFLMHRKKRLE--------------------------------------------
1JFM_A RSA-N----DES-GVIMNKWKDDG--EFVKQLKFLI-HECSQKMDEFLKQSKEK----------------------------------------------
1BII_A TAA-D----MAA-QITRRKWEQA---GAAERDRAYLEGECVEWLRRYLKNGNATLLRTDPPKAHVTHHRR----PEGDVTLRCWALGFYPADITLTWQLN
2P24_A IWM-------------LPEFAQLR--SFDPQGGLQNIATGKHNLGVLTKRSNSTPATNEAPQATVFPKSP--VLLGQPNTLICFVDNIFPPVINITWLRN
1CD1_A VPGAP----SWL-DLPIKVLNADQ--GTSATVQMLLNDTCPLFVRGLLEAGKSDLEKQEKPVAWLSSVP---SSAHGHRQLVCHVSGFYPKPVWVMWMRG
2WY3_A TVP-QSSRAQTLAMNVTNFW-KEDAMKTKTHYRAMQ-ADCLQKLQRYLKSGVAIRRTVPPMVNVTCSEVS----EGNITVTCRASSFYPRNITLTWRQDG
1LQV_A QAD-TQVTSGVV-TFTLQQLNAYN--RTRYELREFLEDTCVQYVQKHISAENTKGSQTSRSYTS------------------------------------
3JTS_A TAA-D----MAA-QNTQRKWEAA---GEAEQHRTYLEGECLEWLRRYLENGKETLQRADPPKTHVTHHPV----SDQEATLRCWALGFYPAEITLTWQRD
1OW0_A R--DL----CGC-YSVSSVLPGCA--EPWNHGKTFTCTAAYPESKTPLTATLSKSGNTFRPEVHLLPPPSEELALNELVTLTCLARGFSPKDVLVRWLQG
1HXM_A AVL-K----ILA-PSERDEGSYYC--ACDTLGMGGEYTDKLIFGKGTRVTVEPRSQPHTKPSVFVMKNG---------TNVACLVKEFYPKDIRINLVSS

DSSP   --GGGS---EEEE-TTS-E----EEEEEEEE-TTGGGGEE---EEEE-TTSSS-EEE-E-
Q30201 K-QPMDAKEFEPKDVLPNG----DGTYQGWITLAVPPGEE---QRYTCQVEHPGLDQ-PLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQ
1S79_A ----------------------------------------------------------------------------------------------------
3P73_A --GMVRDQETRWGGIVPNS----DGTYHASAAIDVLPEDG---DKYWCRVEHASLPQ-PGLFSWEPQ---------------------------------
1KCG_C ----------------------------------------------------------------------------------------------------
1JFM_A ----------------------------------------------------------------------------------------------------
1BII_A --GEELTQEMELVETRPAG----DGTFQKWASVVVPLGKE---QKYTCHVEHEGLPE-PLTLRWGKEEPPSSTKTNTVIIAVPVVLGAVVILGAVMAFVM
2P24_A --SKSVADGVYETSFFVNR----DYSFHKLSYLTFIPSDD---DIYDCKVEHWGLEE-PVLKHWEPEIPAPMSELTETSGSRLEVLFQ------------
1CD1_A --DQ-EQQGTHRGDFLPNA----DETWYLQATLDVEAGEE---AGLACRVKHSSLGG-QDIILYWDARQAPVGLIVFIVLIMLVVVGAVVYYIWRRRSAY
2WY3_A --VSLSHNTQQWGDVLPDG----NGTYQTWVATRIRQGEE---QRFTCYMEHSGNHG-THPVPSGKVLVLQSQRTDFPYVSAAMPCFVIIIILCVPCCKK
1LQV_A ----------------------------------------------------------------------------------------------------
3JTS_A --GEDQTQDTELVETRPAG----DGTFQKWAAVVVPSGKE---QRYTCHVQHEGLRE-PLTLRWEP----------------------------------
1OW0_A SQEL-PREKYLTW-ASRQEPSQGTTTFAVTSILRVAAEDWKKGDTFSCMVGHEALPLAFTQKTIDRLAGK------------------------------
1HXM_A -----KKITEFDPAIVISP----SGKYNAVKLGKYE--DS---NSVTCSVQHDNK---TVHSTDFEVKTDSTDHVKPKETENTKQPSKS-----------

DSSP
Q30201 GSRGAMGHYVLAERE----------------
1S79_A -------------------------------
3P73_A -------------------------------
1KCG_C -------------------------------
1JFM_A -------------------------------
1BII_A KRRRNTGGKGGDYALAPGSQSSDMSLPDCKV
2P24_A -------------------------------
1CD1_A QDIR---------------------------
2WY3_A KTSAAEGP-----------------------
1LQV_A -------------------------------
3JTS_A -------------------------------
1OW0_A -------------------------------
1HXM_A -------------------------------
Judged against the secondary structure that DSSP assigns to HFE from its PDB structure (1a6z), the multiple sequence alignment conserves the secondary structure well.
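As a rough cross-check of how close an aligned template is to the target, the pairwise identity can be computed directly from two rows of the alignment. The following is only a minimal sketch: the function name `pairwise_identity` is ours, and normalising by the columns in which both rows carry a residue is an assumption. HHSearch may normalise differently, so the result need not match the identities in the table above.

```python
def pairwise_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where both rows are ungapped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both sequences have a residue.
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return matches / len(aligned)


# Rows taken from the last block of the alignment (Q30201 vs. 1BII_A).
q30201_tail = "GSRGAMGHYVLAERE----------------"
bii_tail = "KRRRNTGGKGGDYALAPGSQSSDMSLPDCKV"
print(f"identity over shared columns: {pairwise_identity(q30201_tail, bii_tail):.2f}")
```

For a whole-protein value, the rows of all blocks would first have to be concatenated per sequence.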
ITasser
Predicted Secondary Structure by I-Tasser
Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Predicted:  CCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCSSSSSCCCCCCCCCCSSSSSSSCCCSSSSCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH
Conf-Score: 985028899999999899875122045421036641367999985269985643743686068998778788540145583478888887676654315558

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Predicted:  HHHHHHHCCCCCCSSSSSSSCCCCCCCCCCCCCCCCCCCCCCSSSSCCCHHHCHHHHHHHHHHHHHHHHCCCHHHHHHHHHCCCCHHHHHHHHHCCHHHHHC
Conf-Score: 888755315777644463525565898763541000558873365263022202455666677878887004598888767064299999999747666642

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Predicted:  CCCCCCCCCCCCCCCHHHHCHHHHCCCCCCSSSSSSSCCCCCCCCCCSSSSCCCCCCCCCCCSSSSSCCCCCCCCSSSSCCCCCCCCCSSSSCCCCCCCCCC
Conf-Score: 599877567699854442101541541332479864358754456553541024888652112699807986310267512589998726840688766531

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Prediction: CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC
Conf-Score: 010211112222100246665443013678898651020169

>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

Secondary structure elements are shown as H for alpha helix, S for beta strand and C for coil.
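To compare the predicted elements with the DSSP assignment, it helps to turn the per-residue string into segments. The sketch below assumes only the H/S/C alphabet described above; the helper name `ss_segments` is ours and is not part of I-Tasser. Positions are 1-based and relative to the string that is passed in.

```python
from itertools import groupby


def ss_segments(prediction: str):
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive, relative to the given string.
    """
    segments = []
    pos = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        segments.append((state, pos, pos + length - 1))
        pos += length
    return segments


# Prediction for the last 42 residues (transmembrane region and C-terminus).
c_terminal = "CCCCCCHHHHHHHCCHHHHHHHHHCCCCCCCCCCCCCHCCCC"
helices = [seg for seg in ss_segments(c_terminal) if seg[0] == "H"]
for state, start, end in helices:
    print(f"helix from {start} to {end} ({end - start + 1} residues)")
```

To obtain positions in the full protein, add the offset of the block (306 for the last block, since the three preceding blocks hold 102 residues each).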
Predicted Solvent Accessibility by I-Tasser
Sequence:   MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDF
Prediction: 723312000000000101112222011200120120023333331200000102322003123724434241311436413610352044144313323230

Sequence:   WTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERDCPAQLQQLLELGRGVLDQ
Prediction: 220132133351310001010021136231211333023032003016303403102321432433044143404422010333005103400630351154

Sequence:   QVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPSPSGTLV
Prediction: 342353313321443300000100101014010203346564435434135233334221320000000347533120214264144202020214542200

Sequence:   IGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Prediction: 000001100000011100000001334446443132333438
Values range from 0 (buried residue) to 9 (highly exposed residue)
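The accessibility digits can be summarised per region, for example to see that the predicted transmembrane helix is far more buried than the cytoplasmic tail. This is a minimal sketch: the function name `buried_fraction` and the cutoff of 1 for calling a residue "buried" are our assumptions, not I-Tasser definitions, and the split after residue 24 of the block is only chosen by eye from the prediction above.

```python
def buried_fraction(accessibility: str, threshold: int = 1) -> float:
    """Fraction of residues whose accessibility digit is <= threshold.

    The threshold is an illustrative choice, not an I-Tasser constant.
    """
    values = [int(ch) for ch in accessibility]
    if not values:
        raise ValueError("empty accessibility string")
    return sum(v <= threshold for v in values) / len(values)


# Solvent accessibility prediction for the last 42 residues.
last_block = "000001100000011100000001334446443132333438"
membrane_part = last_block[:24]  # assumed extent of the buried stretch
tail_part = last_block[24:]
print(f"buried in first 24 residues: {buried_fraction(membrane_part):.2f}")
print(f"buried in remaining tail:    {buried_fraction(tail_part):.2f}")
```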
I-Tasser predicted five models with C-scores between -0.557 and -3.298. They are ranked from one to five as seen below.
Model1 has a TM-score of about 0.64 and an RMSD of 7.7 Å. For the prediction, I-Tasser used 1a6zA, 1s7qA, 1i4fA, 1de4A, 2vabA and 2bckA as templates. The templates have an identity of about 40%, except for the self hit 1a6z. Because of the self hit, we ran I-Tasser a second time with the constraint to exclude all templates with a sequence identity > 80%.
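For orientation, the TM-score weights each aligned residue pair by 1 / (1 + (d/d0)^2), where d is the distance after superposition and d0 = 1.24 * (L - 15)^(1/3) - 1.8 depends on the target length L. The sketch below evaluates this formula for a given list of distances; it does not search for the optimal superposition that the real TM-score maximises over, and the function names are ours. The values quoted above are the ones I-Tasser reports, not results of this code.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score."""
    if target_length <= 15:
        # The cube root term is not meaningful for such short targets.
        raise ValueError("target length must be greater than 15")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: per-residue distances (in Angstrom) of the aligned pairs.
    Unaligned target residues contribute nothing but still count in the
    normalisation by target_length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# HFE has 348 residues; d0 is then a bit below 7 Angstrom.
print(f"d0 for L=348: {tm_d0(348):.2f} A")
```

Because of the 1 / (1 + (d/d0)^2) weighting, a few badly placed residues lower the TM-score much less than they raise the RMSD, which is one reason a model can combine a moderate TM-score with a fairly large RMSD.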
SwissModel
SwissModel is a server-based tool provided by the SIB. It combines tools like PSI-PRED and DISOPRED for secondary structure and disordered region prediction.
The model created by SwissModel is based on a self hit, but we had no way to exclude the protein itself from the prediction. Therefore we also ran SwissModel in alignment mode. (TODO)
Automated Mode
Model information:
Modelled residue range: 26 to 297
Based on template: 1a6zC (2.60 Å)
Sequence Identity [%]: 100
Evalue: 7.66e-163
Quality information:
QMEAN Z-Score: -1.035
Even though the model is based on a self hit, the QMEAN Z-score is about -1, which means that the model scores roughly one standard deviation below the mean of the reference structures it is compared against. The model is therefore not implausible, but it is also not among the most native-like ones.
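The Z-score itself is just the distance of a value from a reference mean, measured in standard deviations. The sketch below shows that arithmetic only; the reference numbers in the example are invented for illustration and are not QMEAN's actual reference set, and the function name `z_score` is ours.

```python
from statistics import mean, pstdev


def z_score(value: float, reference) -> float:
    """(value - mean) / standard deviation of the reference values."""
    mu = mean(reference)
    sigma = pstdev(reference)
    if sigma == 0:
        raise ValueError("reference values have zero spread")
    return (value - mu) / sigma


# Hypothetical reference scores, chosen so that mean = 5 and std = 1.
hypothetical_reference = [4.0, 6.0]
print(z_score(4.0, hypothetical_reference))  # one standard deviation below the mean
```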