Glucocerebrosidase sequence based prediction
Secondary structure prediction
General
The secondary structure of a protein describes the three-dimensional form of local segments of the polypeptide chain, in contrast to the tertiary structure, which describes the overall fold. Depending on the hydrogen bonds between backbone groups and on the values of the backbone dihedral angles φ and ψ, these segments form different structures. The main types are α-helices and parallel and antiparallel β-sheets; rarer structures are π-helices and 3₁₀-helices. Irregular segments that belong to none of these types are called coils.
A protein consists of several secondary structure elements, which together build the tertiary structure. <ref>http://en.wikipedia.org/wiki/Biomolecular_structure#Secondary_structure</ref>
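To make the link between the φ/ψ angles and the structure types concrete, the sketch below labels residues from their backbone dihedral angles. The rectangular regions are rough, hypothetical cutoffs drawn around the ideal values (α-helix about φ = -57°, ψ = -47°; antiparallel β-sheet about φ = -139°, ψ = +135°). Real assignment methods such as DSSP rely on hydrogen-bond patterns, so this is only an illustration.

```python
from typing import Iterable, Tuple


def classify_phi_psi(phi: float, psi: float) -> str:
    """Return 'H', 'E' or 'C' for one residue, using illustrative angle boxes (degrees)."""
    # Hypothetical box around the right-handed alpha-helix region
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -10.0:
        return "H"
    # Hypothetical box around the beta-sheet region
    if -180.0 <= phi <= -45.0 and 90.0 <= psi <= 180.0:
        return "E"
    # Everything outside both boxes is treated as coil
    return "C"


def classify_backbone(angles: Iterable[Tuple[float, float]]) -> str:
    """Map a sequence of (phi, psi) pairs to a secondary structure string."""
    return "".join(classify_phi_psi(phi, psi) for phi, psi in angles)


if __name__ == "__main__":
    print(classify_backbone([(-57, -47), (-60, -45), (-120, 130), (-139, 135), (60, 40)]))
    # prints "HHEEC"
```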
PSIPRED
PSIPRED is a method by David T. Jones, published in 1999 in the Journal of Molecular Biology (JMB) as "Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices". PSIPRED predicts secondary structure with a two-stage neural network whose input is the position-specific scoring matrix (PSSM) generated by a preceding PSI-BLAST search.<ref>David T. Jones, Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices, JMB, 1999</ref>
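The two-stage architecture can be sketched as follows. This is a simplified illustration, not PSIPRED's code: the window of 15 residues, the zero padding at the termini and the `stage1`/`stage2` callables (stand-ins for the trained networks) are assumptions. The first stage maps a window of PSSM rows to helix/strand/coil scores per residue; the second stage reads a window of those scores and returns the final three-state prediction.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Network = Callable[[List[float]], Sequence[float]]  # stand-in for a trained network

STATES = "HEC"  # order of the three scores: helix, strand, coil


def windows(rows: Sequence[Row], size: int = 15) -> List[List[float]]:
    """Return one flattened window per residue, centred on that residue.

    Positions beyond either terminus are filled with zeros (an assumption of this sketch).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    if not rows:
        return []
    half = size // 2
    width = len(rows[0])
    result = []
    for i in range(len(rows)):
        features: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(rows):
                features.extend(rows[j])
            else:
                features.extend([0.0] * width)
        result.append(features)
    return result


def two_stage_predict(pssm: Sequence[Row], stage1: Network, stage2: Network, size: int = 15) -> str:
    """Stage 1: PSSM window -> 3 scores; stage 2: window of stage-1 scores -> final 3 scores."""
    first = [list(stage1(w)) for w in windows(pssm, size)]
    second = [stage2(w) for w in windows(first, size)]
    # Choose the state with the highest final score at each position
    return "".join(STATES[max(range(3), key=lambda k: s[k])] for s in second)


def toy_stage1(window: List[float]) -> List[float]:
    """Hypothetical stage 1: helix if column 0 of the centre residue (20 columns wide) is positive."""
    centre_col0 = window[len(window) // 2 - 10]
    return [1.0, 0.0, 0.0] if centre_col0 > 0 else [0.0, 0.0, 1.0]


def central_scores(window: List[float]) -> List[float]:
    """Hypothetical stage 2: pass the centre residue's three scores through unchanged."""
    mid = len(window) // 2
    return window[mid - 1 : mid + 2]


if __name__ == "__main__":
    # Toy 5-residue PSSM: column 0 is high for residues 1-3, purely for illustration
    pssm = [[5.0 if 1 <= i <= 3 and c == 0 else 0.0 for c in range(20)] for i in range(5)]
    print(two_stage_predict(pssm, toy_stage1, central_scores, size=3))  # prints "CHHHC"
```

In a real predictor the toy functions would be replaced by trained networks; the windowing is the part the two stages share.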
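Only the protein sequence is needed as input.
We ran both the online and the local version of PSIPRED and obtained different results. The following table compares both predictions with the secondary structure annotated in UniProt<ref>http://www.uniprot.org/uniprot/P04062</ref>. In the prediction rows, H denotes helix, E strand and C coil; "-" marks residues without a UniProt annotation. Each Conf row gives PSIPRED's confidence (0 = low, 9 = high) for the prediction row that follows it.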
Conf:    | 988898954488887622315999999999998641038968865325999649995388
online:  | CCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCEEEEECCC
Conf:    | 987898955489988742200466888998986410038977877777863169974474
local:   | CCCCCCCCCCCCCCCCCEEEEHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCEEEEEECCC
uniprot: | ------------------------------------------------EEEE-EEEEEE-
AA:      | MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNAT

Conf:    | 558889998889992599996377885421237645688875108995378301079247
online:  | CCCCCCCCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCEEEECCCCCEEEEEEE
Conf:    | 148899998788875431100345640022100111177897107840966454557422
local:   | CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCCCEEEECCCCCCCEEEEE
uniprot: | -------------EEEEEEEE-----EEEEEEE-EEE----EEEEEEEEEEEEEE--EEE
AA:      | YCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGF

Conf:    | 300233899997249999999999860597882105999750588999986666899999
online:  | EECCCHHHHHHHHCCCHHHHHHHHHHCCCCCCCEEEEEEEEECCCCCCCCCCCCCCCCCC
Conf:    | 011335889987508927898998851396893001358621344677653324799999
local:   | CCCCCHHHHHHHHHCCHHHHHHHHHHHCCCCCCEEEEEEEECCCCCCCCCCCCCCCCCCC
uniprot: | EE--HHHHHHH----HHHHHHHHHHHH-CCCC---EEEEEEE--EEEEE------EEE--
AA:      | GGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDD

Conf:    | 689999994100245289999999971999389971377785612147247999889999
online:  | CCCCCCCCCHHCCCCCHHHHHHHHHHCCCCCEEEECCCCCCCCCEECCCCCCCCCCCCCC
Conf:    | 721111368543220024799998733999689957899974220056347854325899
local:   | CCCCCCCCCCCCCCCHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCC
uniprot: | --------HHHH--HHHHHHHHHHH-----EEEEEEE---HHH----EEEEE-EEEE---
AA:      | FQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQP

Conf:    | 922699999999999999975490786872012579899999999986349999999999
online:  | CCHHHHHHHHHHHHHHHHHHHCCEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHH
Conf:    | 971468799999999967663395143898112789787678873222114422121122
local:   | CCHHHHHHHHHHHHHHHHHHHCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCHHHH
uniprot: | -HHHHHHHHHHHHHHHHHHH-----EEEEE-----HHH------------HHHHHHHHHH
AA:      | GDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA

Conf:    | 955799851689972999944888873334664149955640224689831699998033
online:  | HHHHHHHHCCCCCCEEEEEECCCCCCHHHHHHHHCCCHHHHCCCCEEEEEECCCCCCHHH
Conf:    | 111332310577410134212544556520222238976651151878702212236320
local:   | HHHHHHHHCCCCCCCEEEEECCCCCCCCCCHHHHCCCHHHHHCCEEEEEECCCCCCCCCC
uniprot: | -HHHHHH--CCCCEEEEEEEEEHHH--HHHHHHH--HHHH----EEEEEEE------HHH
AA:      | RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAK

Conf:    | 412688750999509994343699998866567831444255999999996402335772
online:  | HHHHHHHHCCCCCEEEEECCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCCEEEEEE
Conf:    | 011111126898001101210389653344579861210143212566552001100000
local:   | CCCCCCCCCCCCCCEEHHHHCCCCCCCCCCCCCCCHHHHCCCCHHHHHHHHHHHHHHEEE
uniprot: | HHHHHHHH---EEEEEEEEE--------------HHHHHHHHHHHHHHHH--EEEEEEEE
AA:      | ATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDW

Conf:    | 000169999986689878535895679769986202333102244469939999542389
online:  | EECCCCCCCCCCCCCCCCCCEEEECCCCEEEECCHHHHHHHHCCCCCCCCEEEEEEECCC
Conf:    | 023699999860001325228998208703226821232123444679927984435078
local:   | CCCCCCCCCCCCEECCCCCCEEEEECCCCEEECCCEEEECCCCCCCCCCCEEEEEEEECC
uniprot: | E-----------------EEEEEHHH-EEEE-HHHHHHHHHH-------EEEEEEEEE--
AA:      | NLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQK

Conf:    | 99028999928998999999099993779999099304998518951899999309
online:  | CCCEEEEEECCCCCEEEEEECCCCCCEEEEEEECCCCEEEEECCCCEEEEEEEEEC
Conf:    | 99538995649997799999146898214741998642000389842568774139
local:   | CCCCEEEEECCCCCEEEEEEECCCCCEEEEECCCCCCCCCCCCCCCEEEEEEEECC
uniprot: | EEEEEEEE-----EEEEEEE-EEE-EEEEEEECCCEEEEEEE---EEEEEEE----
AA:      | NDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
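Because each track is split over several blocks, comparing the predictions position by position first requires joining the rows. The sketch below assumes the `label | sequence` layout used here and that each `Conf` row belongs to the prediction row that follows it; `parse_prediction_table` is a hypothetical helper, not part of PSIPRED.

```python
from typing import Dict, Optional


def parse_prediction_table(text: str) -> Dict[str, str]:
    """Join 'label | sequence' rows that are split over several blocks into one string per label.

    A 'Conf' row is stored under '<label>_conf' of the prediction row that follows it.
    """
    tracks: Dict[str, str] = {}
    pending_conf: Optional[str] = None
    for line in text.splitlines():
        label, sep, seq = line.partition("|")
        label, seq = label.strip().rstrip(":"), seq.strip()
        if not sep or not label or not seq:
            continue  # skip blank lines and spacer rows such as a lone '|'
        if label == "Conf":
            pending_conf = seq  # attach to the next prediction row
            continue
        tracks[label] = tracks.get(label, "") + seq
        if pending_conf is not None:
            key = label + "_conf"
            tracks[key] = tracks.get(key, "") + pending_conf
            pending_conf = None
    return tracks
```

Pasting the whole table as a string yields, for example, `tracks["online"]`, `tracks["online_conf"]` and `tracks["uniprot"]` as full-length strings.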
|
The two predictions differ considerably; the online version of PSIPRED apparently runs with different parameters than the local version. Compared with the secondary structure annotated in UniProt, many regions are predicted incorrectly.
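One way to put a number on these differences is a per-residue, three-state agreement score similar to Q3. The sketch below is a hypothetical helper, not a PSIPRED feature. Because UniProt leaves many residues unannotated ("-"), it can either count them as coil or skip them; which choice is fairer is itself an assumption.

```python
def to_three_state(ch: str) -> str:
    """Map a symbol to H, E or C; anything that is not H or E counts as coil."""
    return ch if ch in ("H", "E") else "C"


def agreement(pred: str, ref: str, skip_unannotated: bool = False) -> float:
    """Fraction of positions where pred and ref agree in three-state terms.

    With skip_unannotated=True, positions marked '-' in ref are ignored instead of counted as coil.
    """
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same length")
    pairs = [(p, r) for p, r in zip(pred, ref) if not (skip_unannotated and r == "-")]
    if not pairs:
        raise ValueError("no positions left to compare")
    same = sum(to_three_state(p) == to_three_state(r) for p, r in pairs)
    return same / len(pairs)


if __name__ == "__main__":
    print(agreement("HHHCCEEE", "HH--CEEE"))                         # prints 0.875
    print(agreement("HHHCCEEE", "HH--CEEE", skip_unannotated=True))  # prints 1.0
```

Applied to the full online and local rows, the two scores give a rough way to compare both versions against UniProt.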
Jpred3
Jpred3 was published in 2008 by Christian Cole, Jonathan D. Barber and Geoffrey J. Barton as "The Jpred 3 secondary structure prediction server" in Nucleic Acids Research. Its Jnet algorithm predicts secondary structure and solvent accessibility from alignment profiles: a position-specific scoring matrix (PSSM) generated by PSI-BLAST and a profile hidden Markov model. These profiles are the input of neural networks, which make the prediction. In the table, jhmm and jpssm are the predictions based on the HMM profile and on the PSSM alone, and Jnet is the combined prediction.<ref>http://nar.oxfordjournals.org/content/36/suppl_2/W197.full</ref>
Only the protein sequence is needed as input; alternatively, a multiple sequence alignment can be submitted.
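As a rough illustration of how several predictions can be merged into one, the sketch below takes a per-position majority vote over equally long prediction strings such as the Jnet, jhmm and jpssm rows. This is a deliberately simple stand-in and not how Jnet combines its networks; `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(*tracks: str) -> str:
    """Per-position majority vote over equally long secondary structure strings.

    If no symbol reaches a strict majority, the first track's symbol is kept.
    """
    if not tracks:
        raise ValueError("need at least one track")
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("all tracks must have the same length")
    result = []
    for column in zip(*tracks):
        symbol, count = Counter(column).most_common(1)[0]
        result.append(symbol if count > len(column) / 2 else column[0])
    return "".join(result)


if __name__ == "__main__":
    print(majority_consensus("HHE-", "H-E-", "-HEE"))  # prints "HHE-"
```

Keeping the first track on ties means the order of the arguments matters; passing the final Jnet row first makes it the fallback.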
OrigSeq | MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANH
Jnet    | -----------------HHHHHHHHHHHHHHHHHHHH---------------EEEEE-----------------EEEEEEE------------------
jhmm    | ----------------HHHHHHHHHHHHHHHHHHHHH---------------EEEEE-----------------EEEEEEE------------------
jpssm   | ------------------HHHHHHHHHHHHHHHHHH---------------EEEEEE-----------------EEEEEEE------------------
uniprot | ------------------------------------------------EEEE-EEEEEE--------------EEEEEEEE-----EEEEEEE-EEE--

OrigSeq | TGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIP
Jnet    | ----EEEEEE----EEEEEEEEEEHHHHHHHHH----HHHHHHHHHHH-------EEEEEE----------------------------------HHHH
jhmm    | ----EEEEE-----EEEEEEEEEEEHHHHHHHH----HHHHHHHHHHH-------EEEEEE----------------------------------HHHH
jpssm   | ----EEEEEE----EEEEEEEE-HHHHHHHHHH----HHHHHHHHHHH------EEEEEEEE--------------------------------HHHHH
uniprot | --EEEEEEEEEEEEEE--EEEEE--HHHHHHH----HHHHHHHHHHHH-CCCC---EEEEEEE--EEEEE------EEE----------HHHH--HHHH

OrigSeq | LIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRD
Jnet    | HHHHHHHH----EEEEE--------EE-------------------HHHHHHHHHHHHHHHHH----EEEEE---------------------HHHHHH
jhmm    | HHHHHHHH----EEEEE-----------------------------HHHHHHHHHHHHHHHHH----EEEEE---------------------HHHHHH
jpssm   | HHHHHHHHH---EEEEE--------EEE-----------------HHHHHHHHHHHHHHHHHH----EEEEEE--------------------HHHHHH
uniprot | HHHHHHH-----EEEEEEE---HHH----EEEEE-EEEE----HHHHHHHHHHHHHHHHHHH-----EEEEE-----HHH------------HHHHHHH

OrigSeq | FIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSW
Jnet    | HHHHHHHHHHHH-----EEEEEE--------HHHHHHH--HHHHHHHHEEEE----------HHHHHHHHHH-----EEEEEEEE--------------
jhmm    | HHHHHHHHHHHH------EEEEE-------HHHHHHHH--H-HHHHHHEEEE----------HHHHHHHHHH-----EEEEEEEE--------------
jpssm   | HHHHHHHHHHHH-----EEEEEEE-------HHHHHH----HHHHHH--EE-----------HHHHHHHHHH-----EEEEEEE--------------H
uniprot | HHH-HHHHHH--CCCCEEEEEEEEEHHH--HHHHHHH--HHHH----EEEEEEE------HHHHHHHHHHH---EEEEEEEEE--------------HH

OrigSeq | DRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSA
Jnet    | HHHHHHHHHHHHHHHHHHHHHHHHHHH----------------EEEEE----EEEE---HHHHHHHH-------EEEEE-------EEEEEEE-----E
jhmm    | -HHHHHHHHHHHHHHHHHHHHHHHHHH----------------EEEEE----EEEE---HHHHHHHH-------EEEE--------EEEEEEE-----E
jpssm   | HHHHHHHHHHHHHHHHHHHHHHHHHHE----------------EEEEE----EEEE--HHHHHHHH--------EEEEEE------EEEEEEEE----E
uniprot | HHHHHHHHHHHHHH--EEEEEEEEE-----------------EEEEEHHH-EEEE-HHHHHHHHHH-------EEEEEEEEE--EEEEEEEE-----EE

OrigSeq | VVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Jnet    | EEEEEE-----EEEEEEE---EEEEEEE----EEEEEEE--
jhmm    | EEEEEE-----EEEEEEE---EEEEEEE----EEEEEEE--
jpssm   | EEEEEE----EEEEEEEE---EEEEEEE---EEEEEEE---
uniprot | EEEEE-EEE-EEEEEEECCCEEEEEEE---EEEEEEE----
|
Jpred3 predicts most of the protein correctly. Some regions that are helices or β-strands in UniProt are not predicted at all, and only in one region did it predict strands instead of helices.
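Statements such as "some helices or strands are not predicted" can be checked at the level of whole elements rather than single residues. The sketch below lists reference segments (runs of H or E in the UniProt row) that no predicted segment of the same type overlaps. It is a simplified, hypothetical check, not the standard segment overlap (SOV) score; the minimum element length of 3 residues is an arbitrary assumption.

```python
import re
from typing import List, Tuple

Segment = Tuple[int, int]  # 0-based, half-open (start, end)


def segments(ss: str, state: str) -> List[Segment]:
    """Return maximal runs of `state` in ss as (start, end) intervals."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(state) + "+", ss)]


def missed_elements(pred: str, ref: str, state: str, min_len: int = 3) -> List[Segment]:
    """Reference segments of `state` (at least min_len long) that no predicted segment overlaps."""
    predicted = segments(pred, state)
    missed = []
    for start, end in segments(ref, state):
        if end - start < min_len:
            continue  # ignore very short reference elements
        if not any(p_start < end and start < p_end for p_start, p_end in predicted):
            missed.append((start, end))
    return missed


if __name__ == "__main__":
    ref = "--HHHH--EEEE--HHH"
    pred = "--HH----------EE-"
    print(missed_elements(pred, ref, "H"))  # prints [(14, 17)]
    print(missed_elements(pred, ref, "E"))  # prints [(8, 12)]
```

The intervals are 0-based and half-open, so (14, 17) corresponds to residues 15 to 17 in the usual 1-based numbering.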
For the prediction, Jpred3 used many BLAST hits with an E-value of 0 and one hit with an E-value of 2e-52:
2wkl, 3keh, 3ke0, 3gxm, 3gxi, 3gxf, 3gxd, 2wcg, 2vt0, 2v3f, 2v3e, 2v3d, 2nt1, 2nt0, 2nsx, 2j25, 2f61, 1y7v, 1ogs, 2wnw.
All of these except 2wnw are structures of human (Homo sapiens) glucosylceramidase, i.e. glucocerebrosidase; 1ogs is exactly the protein we use. 2wnw is a hydrolase from Salmonella typhimurium that is activated by a transcription factor; it also seems to be a glucosylceramidase. Some examples of the proteins used are shown in the following pictures.
Comparison with DSSP
DSSP (Define Secondary Structure of Proteins) was developed by Wolfgang Kabsch and Chris Sander, who published it in 1983 in Biopolymers under the title "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features".<ref>http://swift.cmbi.ru.nl/gv/dssp/</ref>
The DSSP algorithm assigns the secondary structure of a protein from its known atomic coordinates. It identifies backbone hydrogen bonds with an electrostatic energy criterion, and each pattern of hydrogen bonds corresponds to one of eight secondary structure types. DSSP is therefore not a prediction tool.<ref>http://en.wikipedia.org/wiki/DSSP_%28protein%29</ref>
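DSSP's hydrogen-bond criterion places partial charges of ±0.42e on the carbonyl C and O and ±0.20e on the amide N and H, giving E = 0.42 · 0.20 · 332 · (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) kcal/mol with distances in Å; a hydrogen bond is assigned when E < -0.5 kcal/mol. The sketch below evaluates this for one donor/acceptor pair and also shows one common convention (an assumption, other mappings exist) for reducing DSSP's eight states to the three states used by PSIPRED and Jpred: H, G, I to H; E, B to E; everything else to C. The coordinates in the example are made-up test values.

```python
import math
from typing import Sequence

Point = Sequence[float]

Q1, Q2 = 0.42, 0.20  # partial charges (units of e) on C/O and N/H
F = 332.0            # dimensional factor giving kcal/mol for distances in Angstrom
CUTOFF = -0.5        # kcal/mol; lower energies count as a hydrogen bond


def hbond_energy(n: Point, h: Point, c: Point, o: Point) -> float:
    """Electrostatic energy between donor N-H and acceptor C=O (coordinates in Angstrom)."""
    return Q1 * Q2 * F * (
        1.0 / math.dist(o, n) + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h) - 1.0 / math.dist(c, n)
    )


def is_hbond(n: Point, h: Point, c: Point, o: Point) -> bool:
    return hbond_energy(n, h, c, o) < CUTOFF


# One common 8-to-3 state reduction; unlisted symbols (T, S, blank) become coil
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(dssp: str) -> str:
    return "".join(EIGHT_TO_THREE.get(ch, "C") for ch in dssp)


if __name__ == "__main__":
    # Collinear N-H...O=C with an H...O distance of 1.9 Angstrom (made-up geometry)
    n, h, o, c = (0, 0, 0), (1.0, 0, 0), (2.9, 0, 0), (4.13, 0, 0)
    print(round(hbond_energy(n, h, c, o), 2), is_hbond(n, h, c, o))  # prints -2.9 True
    print(reduce_to_three("HGIEBTS "))  # prints HHHEECCC
```

`math.dist` requires Python 3.8 or later.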
Prediction of disordered regions
General
DISOPRED
POODLE
IUPRED
Prediction of transmembrane alpha-helices and signal peptides
General
Transmembrane topology
Signal peptides
Combined transmembrane and signal peptide prediction
Because the hydrophobic region of a signal peptide closely resembles a transmembrane helix, conventional transmembrane topology and signal peptide predictors such as TMHMM and SignalP often cross-predict: a signal peptide is reported as a transmembrane helix, or a transmembrane helix as a signal peptide. Predictors that combine submodels for both features make fewer cross-prediction errors and help to discriminate against false positives. Furthermore, a predicted signal peptide indicates that the N-terminus of the protein is non-cytoplasmic, which helps to assign the orientation of the protein. <ref>Käll, L., Krogh, A., & Sonnhammer, E. L. (2007) Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. Nucleic Acids Res., Vol. 35, Web Server issue, pp. 429-432</ref>
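Why the two features are easily confused can be seen with a plain hydropathy scan: a hydrophobic signal peptide core and a transmembrane helix both raise the window average. The sketch below computes a sliding-window average of the Kyte-Doolittle scale; the window of 19 residues and the threshold of 1.6 are values often used for transmembrane-helix screening and are taken here as assumptions. The predictors named above are model-based (hidden Markov models, neural networks) and do not work this way.

```python
from typing import List

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    # Unknown residue codes count as 0.0 (an assumption of this sketch)
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> List[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, v in enumerate(hydropathy_profile(seq, window)) if v > threshold]


if __name__ == "__main__":
    # N-terminal segment of the query sequence from the tables above
    n_term = "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPC"
    profile = hydropathy_profile(n_term)
    best = max(range(len(profile)), key=profile.__getitem__)
    print(f"most hydrophobic window starts at residue {best + 1}, mean {profile[best]:.2f}")
```

A peak in such a profile only says that a segment is hydrophobic; deciding between signal peptide and transmembrane helix needs the additional context that combined predictors model.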
TMHMM
Phobius and PolyPhobius
Phobius
PolyPhobius
OCTOPUS and SPOCTOPUS
OCTOPUS
SPOCTOPUS
SignalP
TargetP
Prediction of GO terms
General
GOPET
Pfam
ProtFun 2.2
References
<references />