Glucocerebrosidase sequence based prediction

From Bioinformatikpedia
Revision as of 11:04, 27 May 2011 by Brunners

Secondary structure prediction

General

The secondary structure of a protein is the local three-dimensional conformation of its polypeptide backbone. The tertiary structure describes the fold of the whole chain; the secondary structure describes only local segments. These segments adopt regular structures for two reasons: they are stabilized by weak interactions, mainly hydrogen bonds between backbone groups, and their backbone dihedral angles φ and ψ take characteristic values. The main types are α-helices and parallel and antiparallel β-sheets. Rarer types are π-helices and 3₁₀-helices. Segments without a regular structure are called coils.
A protein consists of several secondary structure elements that together form its tertiary structure. <ref>http://en.wikipedia.org/wiki/Biomolecular_structure#Secondary_structure</ref>

β-sheets and an α-helix as examples of secondary structure <ref>http://www.nature.com/horizon/proteinfolding/background/images/importance_f3.gif</ref>
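The following Python sketch illustrates how the backbone angles φ and ψ relate to the secondary structure types. It assigns a residue to the nearest of two approximate regions of the Ramachandran plot. The region centres (about φ = -60°, ψ = -45° for α-helices and φ = -120°, ψ = +130° for β-strands) and the 50° cut-off are rough values assumed for illustration. Real assignment programs such as DSSP rely on hydrogen-bond patterns, not on the angles of single residues.

```python
# Approximate (phi, psi) centres of the main secondary structure types, in degrees.
# These are rough values assumed for illustration only.
REGION_CENTRES = {
    "H": (-60.0, -45.0),   # right-handed alpha-helix
    "E": (-120.0, 130.0),  # beta-strand
}


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (wraps at +/-180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify(phi: float, psi: float, max_dist: float = 50.0) -> str:
    """Return 'H' or 'E' for the nearest region centre within max_dist, else 'C' (coil)."""
    best_label, best_dist = "C", max_dist
    for label, (c_phi, c_psi) in REGION_CENTRES.items():
        dist = (angular_diff(phi, c_phi) ** 2 + angular_diff(psi, c_psi) ** 2) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    for angles in [(-57.0, -47.0), (-119.0, 113.0), (60.0, 45.0)]:
        print(angles, classify(*angles))
```

A residue in the left-handed region (φ = 60°, ψ = 45°) falls outside both assumed regions and is therefore reported as coil.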

PSIPRED

PSIPRED is a method developed by David T. Jones and published in 1999 in the Journal of Molecular Biology (JMB) as "Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices". PSIPRED predicts secondary structure with two consecutive neural networks. The first network predicts the secondary structure from the position-specific scoring matrix (PSSM) generated by a preceding PSI-BLAST search. The second network filters and refines these raw predictions.<ref>David T. Jones, Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices, JMB, 1999</ref>
Only the protein sequence is needed as input.
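The two-stage data flow can be sketched as follows. This is a simplified illustration, not PSIPRED's implementation:

- each stage is a single linear layer with softmax instead of a trained network with hidden layers;
- the weights are random placeholders;
- the window size of 15 residues is an assumption.

The sketch does show the data flow. The first stage sees a window of PSSM rows around each residue and outputs three scores (H, E, C). The second stage sees a window of these first-stage outputs.

```python
import math
import random


def sliding_windows(profile, window):
    """For each residue, concatenate the rows of `profile` in a window centred on it.
    Positions beyond the sequence ends are zero-padded."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    width = len(profile[0])
    features = []
    for i in range(len(profile)):
        row = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(profile):
                row.extend(profile[j])
            else:
                row.extend([0.0] * width)
        features.append(row)
    return features


def dense_softmax(x, weights, bias):
    """One linear layer followed by softmax over the three classes (H, E, C)."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights (hypothetical); a real predictor learns them."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_stage_predict(pssm, window=15, seed=0):
    """Stage 1: PSSM windows -> (H, E, C) scores. Stage 2: windows of stage-1 scores."""
    rng = random.Random(seed)
    w1, b1 = random_layer(window * len(pssm[0]), 3, rng)
    stage1 = [dense_softmax(x, w1, b1) for x in sliding_windows(pssm, window)]
    w2, b2 = random_layer(window * 3, 3, rng)
    stage2 = [dense_softmax(x, w2, b2) for x in sliding_windows(stage1, window)]
    # Report the class with the highest second-stage score for every residue.
    return "".join("HEC"[p.index(max(p))] for p in stage2)


if __name__ == "__main__":
    toy_rng = random.Random(1)
    toy_pssm = [[toy_rng.uniform(-5, 5) for _ in range(20)] for _ in range(30)]
    print(two_stage_predict(toy_pssm))
```

Because the weights are random, the printed string only demonstrates the shape of the output (one H/E/C letter per residue), not a meaningful prediction.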

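We ran both the online and the local version of PSIPRED and obtained different results. Below, both predictions are compared with the secondary structure annotated in UniProt<ref>http://www.uniprot.org/uniprot/P04062</ref>. In the prediction rows, H denotes helix, E strand and C coil. The Conf row above each prediction gives PSIPRED's confidence for each residue, from 0 (low) to 9 (high). In the UniProt row, "-" marks residues without an annotated secondary structure element. The AA row is the amino acid sequence.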

Conf:    988898954488887622315999999999998641038968865325999649995388
online:  CCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCEEEEECCC
Conf:    987898955489988742200466888998986410038977877777863169974474
local:   CCCCCCCCCCCCCCCCCEEEEHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCEEEEEECCC
uniprot: ------------------------------------------------EEEE-EEEEEE-
AA:      MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNAT

Conf:    558889998889992599996377885421237645688875108995378301079247
online:  CCCCCCCCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCEEEECCCCCEEEEEEE
Conf:    148899998788875431100345640022100111177897107840966454557422
local:   CCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCCCEEEECCCCCCCEEEEE
uniprot: -------------EEEEEEEE-----EEEEEEE-EEE----EEEEEEEEEEEEEE--EEE
AA:      YCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGF

Conf:    300233899997249999999999860597882105999750588999986666899999
online:  EECCCHHHHHHHHCCCHHHHHHHHHHCCCCCCCEEEEEEEEECCCCCCCCCCCCCCCCCC
Conf:    011335889987508927898998851396893001358621344677653324799999
local:   CCCCCHHHHHHHHHCCHHHHHHHHHHHCCCCCCEEEEEEEECCCCCCCCCCCCCCCCCCC
uniprot: EE--HHHHHHH----HHHHHHHHHHHH-CCCC---EEEEEEE--EEEEE------EEE--
AA:      GGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDD

Conf:    689999994100245289999999971999389971377785612147247999889999
online:  CCCCCCCCCHHCCCCCHHHHHHHHHHCCCCCEEEECCCCCCCCCEECCCCCCCCCCCCCC
Conf:    721111368543220024799998733999689957899974220056347854325899
local:   CCCCCCCCCCCCCCCHHHHHHHHHHHCCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCC
uniprot: --------HHHH--HHHHHHHHHHH-----EEEEEEE---HHH----EEEEE-EEEE---
AA:      FQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQP

Conf:    922699999999999999975490786872012579899999999986349999999999
online:  CCHHHHHHHHHHHHHHHHHHHCCEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHHH
Conf:    971468799999999967663395143898112789787678873222114422121122
local:   CCHHHHHHHHHHHHHHHHHHHCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCHHHH
uniprot: -HHHHHHHHHHHHHHHHHHH-----EEEEE-----HHH------------HHHHHHHHHH
AA:      GDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA

Conf:    955799851689972999944888873334664149955640224689831699998033
online:  HHHHHHHHCCCCCCEEEEEECCCCCCHHHHHHHHCCCHHHHCCCCEEEEEECCCCCCHHH
Conf:    111332310577410134212544556520222238976651151878702212236320
local:   HHHHHHHHCCCCCCCEEEEECCCCCCCCCCHHHHCCCHHHHHCCEEEEEECCCCCCCCCC
uniprot: -HHHHHH--CCCCEEEEEEEEEHHH--HHHHHHH--HHHH----EEEEEEE------HHH
AA:      RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAK

Conf:    412688750999509994343699998866567831444255999999996402335772
online:  HHHHHHHHCCCCCEEEEECCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCCEEEEEE
Conf:    011111126898001101210389653344579861210143212566552001100000
local:   CCCCCCCCCCCCCCEEHHHHCCCCCCCCCCCCCCCHHHHCCCCHHHHHHHHHHHHHHEEE
uniprot: HHHHHHHH---EEEEEEEEE--------------HHHHHHHHHHHHHHHH--EEEEEEEE
AA:      ATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDW

Conf:    000169999986689878535895679769986202333102244469939999542389
online:  EECCCCCCCCCCCCCCCCCCEEEECCCCEEEECCHHHHHHHHCCCCCCCCEEEEEEECCC
Conf:    023699999860001325228998208703226821232123444679927984435078
local:   CCCCCCCCCCCCEECCCCCCEEEEECCCCEEECCCEEEECCCCCCCCCCCEEEEEEEECC
uniprot: E-----------------EEEEEHHH-EEEE-HHHHHHHHHH-------EEEEEEEEE--
AA:      NLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQK

Conf:    99028999928998999999099993779999099304998518951899999309
online:  CCCEEEEEECCCCCEEEEEECCCCCCEEEEEEECCCCEEEEECCCCEEEEEEEEEC
Conf:    99538995649997799999146898214741998642000389842568774139
local:   CCCCEEEEECCCCCEEEEEEECCCCCEEEEECCCCCCCCCCCCCCCEEEEEEEECC
uniprot: EEEEEEEE-----EEEEEEE-EEE-EEEEEEECCCEEEEEEE---EEEEEEE----
AA:      NDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

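The two predictions differ considerably. The online version of PSIPRED seems to run with different parameters than the local version, and possibly with a different sequence database for the PSI-BLAST search, which would explain the differences. Both versions also disagree with the UniProt annotation, which is derived from experimentally determined structures, in many regions. For example, the β-strand annotated in UniProt at residues 87–93 is missed by the online prediction but largely recovered by the local one.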
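The agreement between the predictions and the UniProt annotation can be quantified per residue. The Python sketch below is our own illustration, not part of PSIPRED. It works as follows:

- it scores only positions where UniProt annotates a helix (H) or strand (E);
- it optionally skips residues whose PSIPRED confidence digit is below a chosen threshold;
- it also computes the identity between two predictions, such as the online and local runs.

Concatenating the corresponding rows of all blocks of the table gives the full-length strings to pass in.

```python
def agreement(pred, ref, pred_conf=None, min_conf=0):
    """Count matches between a prediction and a UniProt-style reference string.

    Only positions where `ref` is 'H' or 'E' are scored; '-' and any other symbol
    are skipped. If `pred_conf` (a string of confidence digits 0-9) is given,
    positions with confidence below `min_conf` are skipped as well.
    Returns (matches, scored).
    """
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have equal length")
    if pred_conf is not None and len(pred_conf) != len(pred):
        raise ValueError("confidence string must match the prediction length")
    matches = scored = 0
    for i, (p, r) in enumerate(zip(pred, ref)):
        if r not in "HE":
            continue
        if pred_conf is not None and int(pred_conf[i]) < min_conf:
            continue
        scored += 1
        if p == r:
            matches += 1
    return matches, scored


def identity(a, b):
    """Fraction of positions at which two equal-length predictions agree."""
    if len(a) != len(b):
        raise ValueError("predictions must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy strings; replace with the concatenated rows from the table.
    print(agreement("HHEC", "H-EE"))
    print(agreement("HHEC", "H-EE", pred_conf="9919", min_conf=5))
    print(identity("HHC", "HEC"))
```

Restricting the score to high-confidence residues shows whether the disagreements are concentrated in regions where PSIPRED itself is uncertain.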

Jpred3

Jpred3 was published in 2008 by Christian Cole, Jonathan D. Barber and Geoffrey J. Barton as "The Jpred 3 secondary structure prediction server" in Nucleic Acids Research. It uses the Jnet algorithm, which predicts secondary structure and solvent accessibility from alignment profiles. These profiles are a position-specific scoring matrix (PSSM) from PSI-BLAST and a hidden Markov model profile. The prediction itself is made by neural networks.
Only the protein sequence is needed as input; alternatively, a multiple sequence alignment can be submitted.<ref>http://nar.oxfordjournals.org/content/36/suppl_2/W197.full</ref>
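Jnet's inputs are profiles derived from a sequence alignment. The following sketch turns a multiple sequence alignment into a position-specific log-odds profile, as a simplified illustration of what such a profile is. It is not Jpred's actual procedure, which relies on PSI-BLAST and hidden Markov model profiles. The uniform background frequency of 1/20 and the pseudocount of 1 are assumptions chosen for simplicity. Real tools use empirical background frequencies and sequence weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_profile(msa, pseudocount=1.0):
    """Build one {amino acid: log2(q / p)} dict per alignment column.

    q is the pseudocount-smoothed frequency of the amino acid in the column
    (gaps and non-standard letters are ignored), p = 1/20 is an assumed
    uniform background frequency.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


if __name__ == "__main__":
    toy_msa = ["ACD", "ACE", "GCD"]  # hypothetical three-sequence alignment
    for i, column in enumerate(log_odds_profile(toy_msa)):
        best = max(column, key=column.get)
        print(i, best, round(column[best], 2))
```

Positive scores mark amino acids that occur more often in a column than expected by chance. The fully conserved cysteine in the second column of the toy alignment therefore receives a positive score, and all other amino acids there receive negative scores.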

Comparison with DSSP

Prediction of disordered regions

General

DISOPRED

POODLE

IUPred

Prediction of transmembrane alpha-helices and signal peptides

General

Why is the prediction of transmembrane helices and signal peptides grouped together here?

Signal peptides

TMHMM

Phobius and PolyPhobius

Phobius

PolyPhobius

OCTOPUS and SPOCTOPUS

OCTOPUS

SPOCTOPUS

SignalP

TargetP

Prediction of GO terms

General

GOPET

Pfam

ProtFun 2.2

References

<references />