Difference between revisions of "Sequence-based analyses Gaucher Disease"

Revision as of 16:31, 17 May 2012

Secondary structure

Knowing the secondary structure of a protein can shed light on its function, since structure implies function. If the structure of a protein is known, secondary structure elements (helix, sheet, coil) can be assigned to residues based on their hydrogen-bonding patterns. DSSP is the most common method to perform such secondary structure assignments. If the structure of a protein is unknown, secondary structure elements can be predicted by tools like PSIPRED or Reprof. The aim of this task was to analyse the secondary structure of different proteins and to compare the secondary structure predictions of PSIPRED and Reprof with the DSSP secondary structure assignments. The following sequences were taken into account:

NAME	UniProtKB	PDB
Glucosylceramidase	P04062	1OGS
Ribonuclease inhibitor	P10775	1DFJ
Divalent-cation tolerance protein CutA	Q9X0E6	1KR4
Serine/threonine-protein phosphatase	Q08209	1AUI

Information about program calls and implementation details can be found in our protocol.

Predictions

Method	H	E	C
DSSP	H,G,I	E,B	T,S,' '
PSIPRED	H	E	C
Reprof	H	E	L

</figtable> To make the different output formats easier to compare, we mapped the secondary structure definitions of all three methods onto the three letters H (helix), E (sheet), and C (coil) according to table <xr id="tab:ss_mapping"/>. Regions of the UniProt sequences that were not present in the PDB file, as well as regions where no DSSP assignment was possible, were ignored.
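The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the table, not the script from our protocol; the names `TO_THREE_STATE` and `to_three_state` are hypothetical.

```python
# Three-state mapping copied from the table above.
# DSSP writes a blank (' ') for residues without a specific assignment.
TO_THREE_STATE = {
    "DSSP": {"H": "H", "G": "H", "I": "H",
             "E": "E", "B": "E",
             "T": "C", "S": "C", " ": "C"},
    "PSIPRED": {"H": "H", "E": "E", "C": "C"},
    "Reprof": {"H": "H", "E": "E", "L": "C"},
}


def to_three_state(method, states):
    """Translate a per-residue state string of `method` into H/E/C."""
    table = TO_THREE_STATE[method]
    # A state letter that is not in the table raises KeyError on purpose,
    # so unexpected output is noticed instead of silently mapped.
    return "".join(table[s] for s in states)


print(to_three_state("DSSP", "HGIEBTS "))
print(to_three_state("Reprof", "LLHHEEL"))
```

After this step all three methods use the same alphabet, so the strings can be compared position by position.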

P04062

Crystal structure of 1OGS (P04062). Red: alpha-helix, yellow: beta-sheet, green: coil.

</figure> Glucosylceramidase (P04062) is located at the membrane of lysosomes. It exhibits two domains which belong to (1) the glycosyl hydrolase domain fold and (2) the TIM beta/alpha-barrel fold. Both domains have hydrophobic beta sheets which anchor the protein in the membrane. <xr id="fig:ss_P04062"/> depicts the secondary structure elements of the corresponding crystal structure, which coincide with the DSSP assignments. The following section shows the secondary structure annotations of the different methods. The predictions of PSIPRED match the DSSP assignments better than those of Reprof. Reprof predicts sheets instead of helices in several regions. The residues of the beta-barrel sheets are marked by asterisks.

          40
          ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKG
Reprof    CCCCCCCCCCCEEEEEEECCEECCCCCCCCCCCCCEEEEEEECCCCCEEEEECCCEECCCCCCEEEEEECCCCEEEEEEC
PSIPRED   CCCCCCCCCCCCCEEEEECCHHCCCCCCCCCCCCCEEEEEEECCCCCCHHCCCCCCCCCCCCCCCEEEECCCCCCEEEEE
DSSP      CECCCEEECCCCCEEEEEECCCCCECCCCCCCCCCEEEEEEEECCCCCCEEEEEECECCCCCCCEEEEEEEEEEEEECCE

          120
          FGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPL
Reprof    CCCCCCHHHHHEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCCEEEEEEEECCCCCCCEEEEECCCCCCCCCEEEE
PSIPRED   EEEEHHHHHHHHHHHCCHHHHHHHHHHHCCCCCCEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHH
DSSP      EEEECCHHHHHHHCCCCHHHHHHHHHHHHCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHCCHHHH

          200
          IHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGL
Reprof    EEHHHHHCCCCCEEEECCCCCCCEEEECCCECCCEECCCCCCCCCCHHHHHHHHHHHHHCCCCEEEEEEEEECCCCCCCE
PSIPRED   HHHHHHHHCCCEEEEEEECCCCHHEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEEECCCCCCCC
DSSP      HHHHHHHCCCCCEEEEEECCCCHHHECCCCCCCCCEECCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCEEECCCCCCHHH
                      ******                                                  ***
          280
          LSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPA
Reprof    ECCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCEEEEEEECCCCEECCCEEEEEECCCCCCEEEEEEEEEEEEEECCCC
PSIPRED   CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHHHCCHHHHHHCCEEEEECCCCCCCCH
DSSP      CCCCCCCCCECCHHHHHHHHHHCHHHHHHCCCCCCCEEEEEEEEHHHCCHHHHHHHCCHHHHCCCCEEEEEEECCCCCCH
                                              ********                      ******* 
          360
          KATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDS
Reprof    CCECCCCCECCCCCEEEECCCCCCCEEEEEEEEECCCCCCCEEECEHHHHEEEEEEECCCCEEEECCCCCCEEEEECCCC
PSIPRED   HHHHHHHHHHCCCCEEEEECCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHEEEEEEEEECCCCCCCCCCCCCCC
DSSP      HHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCEEEEEEEECCECCCCCCCCCCCCCCC
                        ********                                ******** 
          440
          PIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFL
Reprof    CEEEEECCCCCCCCCEEEECCCEEEECCCCCEEEEEEEECCCCCEEEEEECCCCCEEEEEEECCCCCCEEEECCCCEEEE
PSIPRED   CEEEECCCCEEEECHHHHHHHHHHHHCCCCCEEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE
DSSP      CEEEEHHHCEEEECHHHHHHHHHHCCCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE

          520
          ETISPGYSIHTYLWRRQ
Reprof    EEECCCCEEEEEEEECC
PSIPRED   EEECCCCEEEEEEEECC
DSSP      EEEECCCEEEEEEECCC

P10775

Crystal structure of 1DFJ (P10775). Red: alpha-helix, yellow: beta-sheet, green: coil.

</figure> <xr id="fig:ss_P10775"/> depicts the crystal structure 1DFJ, which corresponds to P10775. It has two domains: d1dfji_ is a repeat domain consisting of alternating alpha-helices and parallel beta-sheets. d1dfje_ contains long curved antiparallel beta-sheets and three alpha-helices. The alternating HHH and EEE regions in the following secondary structure annotations fit well with the repetitive structure shown in <xr id="fig:ss_P10775"/>. Again, the PSIPRED predictions match the DSSP assignments better than those of Reprof.

          1
          MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTC
Reprof    CCCCECHHCCCCCHHHHHHHHHHHCCEEEECCCCCCHHHHHHHHHHHHCCCCHHHHHHHHCCCCCCCHEEEHCCCCCCCC
PSIPRED   CEEECCCCCCCHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHHHCCC
DSSP      CECCCECCCCCHHHHHHHHHHHCCCCEEECECCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCC

          81
          KIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASV
Reprof    EEEEECCCCCCCCHCCCCCCHHHHCHCHHHHHHCCCCCCCCHHHHHHHHHCCCCCHCCHHHHHHHHHHCCCCCCHHHHHH
PSIPRED   CCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH
DSSP      CCCEEECCCCCCCCCHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH

          161
          LRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA
Reprof    HHHHHHHHHHCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHCHCCHHHCCCCCCCCCHHHHH
PSIPRED   HHHCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
DSSP      HHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHH

          241
          ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSL
Reprof    HHCCCCCCCHHHHCHHEEEHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCHHHHHHHHCHH
PSIPRED   HHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCC
DSSP      HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCC

          321
          TAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
Reprof    HHHHHHHHHHHHHCCHHHHHHHCCCCCCCCHHHHHHHHHHCCCCCEEEEEEECCCCCCCCCHHHHHHHHHHHCCHHHHCC
PSIPRED   CHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEEC
DSSP      EHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEEC

          401
          SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Reprof    CCCCCCCHHHHHHHCCCCCCCCHHHHHHHCCCCCCHHHHHHHHHHHCCCCCCCECC
PSIPRED   CCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEECC
DSSP      CCCECCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC

Q9X0E6

Crystal structure of 1KR4 (Q9X0E6). Red: alpha-helix, yellow: beta-sheet, green: coil.

</figure> <xr id="fig:ss_Q9X0E6"/> depicts the d1kr4a_ domain of 1KR4, which is made of three alpha-helices interrupted by beta-sheets. Reprof predicts helices that are too long.

          2
          ILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIF
Reprof    EEEEECCCCHHHHHHHHHHHHHHHHHHHHCHCHHHCCCEEECEEECCHHHHHHHCCCHHHHHHHHHHHHHCCCCCCCHHE
PSIPRED   EEEEEECCCHHHHHHHHHHHHHCCCEEEEEEEEEEEEEEECCCEEEEEEEEEEECCCHHHHHHHHHHHHHHCCCCCCEEE
DSSP      EEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEE

          82
          TLKVENVLTEYMNWLRESVL
Reprof    HHHHHHHHHHHHHHHHHHCC
PSIPRED   EEECCCCCHHHHHHHHHHCC
DSSP      EECCCCEEHHHHHHHHHHCC

Q08209

Crystal structure of 1AUI (Q08209). Red: alpha-helix, yellow: beta-sheet, green: coil.

</figure> Q08209 contains the domains d1auib_ and d1auia_, which are mainly composed of alpha-helices (<xr id="fig:ss_Q08209"/>). PSIPRED predicts these alpha-helices considerably better than Reprof, which suggests beta-sheets in some regions.

          14
          TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHG
Reprof    CCCEEEEECCCCCCCEEEEEEECCCCCCEEEEEHHHECCCCCCCEEEEEEEEECCCCEEECCCCCCCCCCCEEEEECCCC
PSIPRED   CCCCCCCCCCCCCCCCCHHHCCCCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHCCCEEEECCCEEEECCCCC
DSSP      CCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCC

          94
          QFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSER
Reprof    HHHHHHEEEEECCCCCCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCEEEEEECCCCCCEEEEEEEEEEEEEEEEECH
PSIPRED   HHHHHHHHHHHCCCCCCCCEEECCCCCCCCCCCHHHHHHHHHHHHCCCCCEEEECCCCHHHHHHCCCCHHHHHHHHCCHH
DSSP      CHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHCCHHHHHHHHCCHH

          174
          VYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTV
Reprof    HHHHHHHHCCCCCHHHHHCCCEEEEECCCCCCCCCHHHHHHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCCCECCCCCE
PSIPRED   HHHHHHHHHCCCHHHHHCCCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHCCCCCCCCCCCCCCCCCCCCCC
DSSP      HHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCC

          254
          RGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQ
Reprof    CCEEEEECCCCEEEEHCCCCHHHHEHHHCCCCCCEEEEEECCCCCCCEEEEEEECCCEEEEECCCEEEEEECCCEEEEEE
PSIPRED   CCCCCCCCHHHHHHHHHHCCCCEEEEHHHHHHHHHHHHHCCCCCCCCCEEEEEECCCCCCCCCCEEEEEEECCCCCEEEE
DSSP      CCCCEEECHHHHHHHHHHCCCCEEEECCCCCCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEE

          334
          FNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSSFEEAKGLDRINERMPPR
Reprof    ECCCCCCCCCCCCCEEEEEECCCCCHHHHHHHHHHHEECCEHHHHCCCCHHCCCCCCC
PSIPRED   ECCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHCCCCC
DSSP      ECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHCCCCC

Prediction accuracy/precision

We compared the prediction performance of PSIPRED and Reprof via the Q3 score and the precision of the three secondary structure states H, E, and C. The Q3 score is identical to the accuracy, i.e. the number of correctly predicted states divided by the length of the protein. The precision of state X is the fraction of residues predicted as X that are also assigned X by DSSP, formally Precision(X) = TP(X) / (TP(X) + FP(X)). Table <xr id="ss_acc"/> shows the results: PSIPRED clearly outperforms Reprof in all four cases. PSIPRED achieves an average accuracy of 87%, which is considerably higher than the 58% achieved by Reprof. <figtable id="ss_acc">

Method	Q3	Precision H	Precision E	Precision C
P04062
PSIPRED	0.831	0.830	0.872	0.810
Reprof	0.553	1.000	0.455	0.592
P10775
PSIPRED	0.941	0.959	0.960	0.919
Reprof	0.603	0.589	0.417	0.644
Q9X0E6
PSIPRED	0.890	1.000	0.895	0.720
Reprof	0.580	0.562	0.917	0.458
Q08209
PSIPRED	0.833	0.842	0.902	0.812
Reprof	0.579	0.762	0.293	0.743

</figtable>
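The two measures defined above can be written down as a minimal Python sketch. The function names `q3` and `precision` are hypothetical and this is not the script from our protocol. The example strings are the PSIPRED and DSSP lines of the last 20-residue block of Q9X0E6 shown above, so the values refer to that fragment only and not to the whole-protein figures in the table.

```python
def q3(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def precision(predicted, observed, state):
    """TP / (TP + FP) for one state; None if the state was never predicted."""
    tp = sum(p == state and o == state for p, o in zip(predicted, observed))
    fp = sum(p == state and o != state for p, o in zip(predicted, observed))
    if tp + fp == 0:
        return None
    return tp / (tp + fp)


# Last block of Q9X0E6 (residues 82 onwards), already in H/E/C notation.
psipred = "EEECCCCCHHHHHHHHHHCC"
dssp = "EECCCCEEHHHHHHHHHHCC"

print("Q3:", q3(psipred, dssp))
for state in "HEC":
    print("Precision", state, precision(psipred, dssp, state))
```

Returning `None` when a state is never predicted avoids a division by zero and keeps "undefined" distinct from a precision of 0.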

Disorder

P04062

Method	Disorder regions
IUPRED	2, 3, 6, 90-93, 229-231, 235, 236
DisProt
Precision: 0%	Sensitivity: undef	Specificity: 98%

</figtable>

P10775

Homolog: DP00554 <figure id="fig:disorder_P10775">

Disordered and ordered regions of P10775 using DisProt entry DP00554.

</figure>

Method	Disorder regions
IUPRED
DisProt	31-50
Precision: undef	Sensitivity: 0%	Specificity: 100%

</figtable>

Q9X0E6

Q08209

Disordered and ordered regions of Q08209 using the DisProt entry DP00092.

</figure> DP00092 <figtable id="disorder_Q08209">

Method	Disorder regions
IUPRED	1-6, 8, 383,384,424,425,432,434-439,443,445,448,449,455,458,463-521
DisProt	1-13,390-414,374-468,469-486,487-521
Precision: 100%	Sensitivity: 52%	Specificity: 100%

</figtable>
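As a rough sketch of how the precision, sensitivity and specificity values in the disorder tables can be obtained, the following Python code compares the two region lists residue by residue. It assumes that the DisProt regions serve as the reference and that the protein length is 521, the highest residue number listed for Q08209; both are assumptions of this example, and the helper names are hypothetical.

```python
def parse_regions(text):
    """Turn a string like '1-6, 8, 383,384' into a set of residue numbers."""
    residues = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            residues.update(range(int(start), int(end) + 1))
        else:
            residues.add(int(part))
    return residues


def disorder_scores(predicted, reference, length):
    """Residue-level scores; a value is None when its denominator is zero."""
    all_residues = set(range(1, length + 1))
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(all_residues - predicted - reference)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Region lists copied from the Q08209 table above.
iupred = parse_regions(
    "1-6, 8, 383,384,424,425,432,434-439,443,445,448,449,455,458,463-521")
disprot = parse_regions("1-13,390-414,374-468,469-486,487-521")

scores = disorder_scores(iupred, disprot, length=521)  # length is an assumption
print(scores)
```

An empty reference or an empty prediction yields `None` for the corresponding measure, which matches the "undef" entries in the tables.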

@@ Line 81: / Line 81: @@
 [[File:ss_P10775.png|thumb|150px|right|<caption>Crsytal structure of 1DFJ (P10775). Red: alpha-helix, yellow: beta-sheet, green: coiled.</caption>]]
 </figure>
-<xr id="fig:ss_P10775"/> depicts the crystal structure 1DFJ which refers to P10775. It has two domains: [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bb.b.b.b.html d1dfji_] is a repeat domain consisting of altering alpha-helices and parallel beta-sheets. [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.f.b.b.b.html d1dfje_] contains long curved antiparallel beta-sheets and three alpha-helices. The alternating <tt>HHH</t> and <tt>EEE</tt> regions in the following secondary structure annotations suit well with repetitive structure shown in <xr id="fig:ss_P10775"/>. Again, the PSIPRED predictions better match the DSSP assignments than Reprof.
+<xr id="fig:ss_P10775"/> depicts the crystal structure 1DFJ which refers to P10775. It has two domains: [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bb.b.b.b.html d1dfji_] is a repeat domain consisting of altering alpha-helices and parallel beta-sheets. [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.f.b.b.b.html d1dfje_] contains long curved antiparallel beta-sheets and three alpha-helices. The alternating <tt>HHH</tt> and <tt>EEE</tt> regions in the following secondary structure annotations suit well with repetitive structure shown in <xr id="fig:ss_P10775"/>. Again, the PSIPRED predictions better match the DSSP assignments than Reprof.
 <pre>
@@ Line 120: / Line 120: @@
 </pre>
-Q9X0E6:
+==== Q9X0E6 ====
+<figure id="fig:ss_Q9X0E6">
+[[File:ss_Q9X0E6.png|thumb|150px|right|<caption>Crsytal structure of 1KR4 (Q9X0E6). Red: alpha-helix, yellow: beta-sheet, green: coiled.</caption>]]
+</figure>
+<xr id="fig:ss_Q9X0E6"> depicts the [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.bcj.f.c.f.html d1kr4a_] domain of 1KR4 which is made of three alpha-helices interrupted by beta-sheets. Reprof predicts too long helices.
 <pre>
@@ Line 136: / Line 140: @@
 ==== Q08209 ====
+<figure id="fig:ss_Q08209">
+[[File:ss_Q08209.png|thumb|150px|right|<caption>Crsytal structure of 1AUI (Q08209). Red: alpha-helix, yellow: beta-sheet, green: coiled.</caption>]]
+</figure>
+Q08209 contains the domain [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.b.ff.b.g.di.html d1auib_] and [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.dbg.b.e.g.html d1auia_] which are mainly assembled of alpha-helices (<xr id="fig:ss_Q08209"/>). PSIPRED predicts these alpha-helices considerably better than Reprof which suggests beta-sheets in some regions.
 <pre>
@@ Line 169: / Line 177: @@
 === Prediction accuracy/precision ===
+We compared the prediction performance of PSIPRED and Reprof via the Q3 score and the precision of the three secondary structure states H,E, and C. The Q3 score is identical to the accuracy, i.e. the number of correctly predicted states divided by the length of the protein. The precision of state X is the fraction of correct predictions of X, formally Precision(X)=TP(X)/(TP(X)+FP(X)). Table <xr id="ss_acc"/> shows the results: PSIPRED clearly outperforms Reprof in all for cases. PSIPRED achieves an average accuracy of 87% which is significantly higher than 58% in case of Reprof.
 <figtable id="ss_acc">
 {|border=1 style="border-collapse: separate; border-spacing: 0; text-align: center"
