Difference between revisions of "Sequence-based analyses Gaucher Disease"
Revision as of 16:03, 17 May 2012
Secondary structure
Knowing the secondary structure of a protein can shed light on its function, since structure implies function. If the structure of a protein is known, secondary structure elements (helix, sheet, coiled) can be assigned to residues based on their hydrogen-bonding patterns. DSSP is the most common method for such secondary structure assignments. If the structure of a protein is unknown, secondary structure elements can be predicted by tools like PSIPRED or Reprof. The aim of this task was to analyse the secondary structure of different proteins and to compare the secondary structure predictions of PSIPRED and Reprof with the DSSP secondary structure assignments. The following sequences were taken into account:
NAME | UniProtKB | PDB |
---|---|---|
Glucosylceramidase | P04062 | 1OGS |
Ribonuclease inhibitor | P10775 | 1DFJ |
Divalent-cation tolerance protein CutA | Q9X0E6 | 1KR4 |
Serine/threonine-protein phosphatase | Q08209 | 1AUI |
Information about program calls and implementation details can be found in our protocol.
Predictions
<figtable id="tab:ss_mapping">
Method | H | E | C |
---|---|---|---|
DSSP | H,G,I | E,B | T,S,' ' |
PSIPRED | H | E | C |
Reprof | H | E | L |
</figtable> To make the different output formats easier to compare, we mapped the secondary structure definitions of all three methods onto the three letters H (helix), E (sheet), and C (coiled) according to <xr id="tab:ss_mapping"/>. Regions of the UniProt sequences that were not present in the PDB file, as well as regions where no DSSP assignment was possible, were ignored.
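The mapping in the table can be sketched in a few lines of Python. This is a minimal illustration, not the code from the protocol: the dictionary and function names are hypothetical, and the sketch assumes each method's output has already been reduced to one state character per residue.

```python
# Three-state mapping taken from the table above.
# All names here are illustrative, not taken from the protocol.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helix types
    "E": "E", "B": "E",             # strand and bridge
    "T": "C", "S": "C", " ": "C",   # turn, bend, unassigned
}
PSIPRED_TO_3 = {"H": "H", "E": "E", "C": "C"}
REPROF_TO_3 = {"H": "H", "E": "E", "L": "C"}


def to_three_state(states, mapping):
    """Translate a per-residue state string into H/E/C.

    Raises KeyError for a character that the mapping does not cover.
    """
    return "".join(mapping[s] for s in states)


print(to_three_state("HGIEBTS ", DSSP_TO_3))  # HHHEECCC
print(to_three_state("HEL", REPROF_TO_3))     # HEC
```

Failing loudly on an unknown character is a deliberate choice here, so that an unexpected output format is noticed rather than silently counted as coil.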
P04062
<figure id="fig:ss_P04062">
[[File:ss_P04062.png|thumb|150px|right|<caption>Crystal structure of 1OGS (P04062). Red: alpha-helix, yellow: beta-sheet, green: coiled.</caption>]]
</figure> Glucosylceramidase (P04062) is located in the membrane of lysosomes. It has two domains, which belong to (1) the [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.c.bcb.b.c.b.html glycosyl hydrolase domain] fold and (2) the [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.b.j.c.ef.html TIM beta/alpha-barrel] fold. Both domains have hydrophobic beta-sheets which anchor the protein in the membrane. <xr id="fig:ss_P04062"/> depicts the secondary structure elements of the corresponding crystal structure, which coincide with the DSSP assignments. The following section shows the secondary structure annotations of the different methods. The PSIPRED predictions match the DSSP assignments better than those of Reprof, which predicts sheets instead of helices in several regions. The residues of the beta-barrel sheets are marked by asterisks.
<pre>
40      ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKG
Reprof  CCCCCCCCCCCEEEEEEECCEECCCCCCCCCCCCCEEEEEEECCCCCEEEEECCCEECCCCCCEEEEEECCCCEEEEEEC
PSIPRED CCCCCCCCCCCCCEEEEECCHHCCCCCCCCCCCCCEEEEEEECCCCCCHHCCCCCCCCCCCCCCCEEEECCCCCCEEEEE
DSSP    CECCCEEECCCCCEEEEEECCCCCECCCCCCCCCCEEEEEEEECCCCCCEEEEEECECCCCCCCEEEEEEEEEEEEECCE

120     FGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPL
Reprof  CCCCCCHHHHHEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCCEEEEEEEECCCCCCCEEEEECCCCCCCCCEEEE
PSIPRED EEEEHHHHHHHHHHHCCHHHHHHHHHHHCCCCCCEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHH
DSSP    EEEECCHHHHHHHCCCCHHHHHHHHHHHHCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHCCHHHH

200     IHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGL
Reprof  EEHHHHHCCCCCEEEECCCCCCCEEEECCCECCCEECCCCCCCCCCHHHHHHHHHHHHHCCCCEEEEEEEEECCCCCCCE
PSIPRED HHHHHHHHCCCEEEEEEECCCCHHEEECCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHCCCEEEEEEECCCCCCCC
DSSP    HHHHHHHCCCCCEEEEEECCCCHHHECCCCCCCCCEECCCCCCHHHHHHHHHHHHHHHHHHHCCCCCCEEECCCCCCHHH
                    ******                                                  ***

280     LSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPA
Reprof  ECCCCEEEECCCCCCCCCEEEECCCCCCCCCCCCEEEEEEECCCCEECCCEEEEEECCCCCCEEEEEEEEEEEEEECCCC
PSIPRED CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCEEEEEECCCCCCHHHHHHHHHCCHHHHHHCCEEEEECCCCCCCCH
DSSP    CCCCCCCCCECCHHHHHHHHHHCHHHHHHCCCCCCCEEEEEEEEHHHCCHHHHHHHCCHHHHCCCCEEEEEEECCCCCCH
                                            ********                      *******

360     KATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDS
Reprof  CCECCCCCECCCCCEEEECCCCCCCEEEEEEEEECCCCCCCEEECEHHHHEEEEEEECCCCEEEECCCCCCEEEEECCCC
PSIPRED HHHHHHHHHHCCCCEEEEECCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHEEEEEEEEECCCCCCCCCCCCCCC
DSSP    HHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCEEEEEEEECCECCCCCCCCCCCCCCC
                      ********                                ********

440     PIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFL
Reprof  CEEEEECCCCCCCCCEEEECCCEEEECCCCCEEEEEEEECCCCCEEEEEECCCCCEEEEEEECCCCCCEEEECCCCEEEE
PSIPRED CEEEECCCCEEEECHHHHHHHHHHHHCCCCCEEEEEECCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE
DSSP    CEEEEHHHCEEEECHHHHHHHHHHCCCCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCCCEEEEEEECCCEEE

520     ETISPGYSIHTYLWRRQ
Reprof  EEECCCCEEEEEEEECC
PSIPRED EEECCCCEEEEEEEECC
DSSP    EEEECCCEEEEEEECCC
</pre>
P10775
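<figure id="fig:ss_P10775">
[[File:ss_P10775.png|thumb|150px|right|<caption>Crystal structure of 1DFJ (P10775). Red: alpha-helix, yellow: beta-sheet, green: coiled.</caption>]]
</figure> <xr id="fig:ss_P10775"/> depicts the crystal structure 1DFJ, which corresponds to P10775. It has two domains: [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bb.b.b.b.html d1dfji_] is a repeat domain consisting of alternating alpha-helices and parallel beta-sheets. [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.e.f.b.b.b.html d1dfje_] contains long curved antiparallel beta-sheets and three alpha-helices. The alternating <tt>HHH</tt> and <tt>EEE</tt> regions in the following secondary structure annotations fit well with the repetitive structure shown in <xr id="fig:ss_P10775"/>. Again, the PSIPRED predictions match the DSSP assignments better than those of Reprof.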
<pre>
1       MNLDIHCEQLSDARWTELLPLLQQYEVVRLDDCGLTEEHCKDIGSALRANPSLTELCLRTNELGDAGVHLVLQGLQSPTC
Reprof  CCCCECHHCCCCCHHHHHHHHHHHCCEEEECCCCCCHHHHHHHHHHHHCCCCHHHHHHHHCCCCCCCHEEEHCCCCCCCC
PSIPRED CEEECCCCCCCHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHHHCCC
DSSP    CECCCECCCCCHHHHHHHHHHHCCCCEEECECCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCHHHHHHHHHHHHCCCCC

81      KIQKLSLQNCSLTEAGCGVLPSTLRSLPTLRELHLSDNPLGDAGLRLLCEGLLDPQCHLEKLQLEYCRLTAASCEPLASV
Reprof  EEEEECCCCCCCCHCCCCCCHHHHCHCHHHHHHCCCCCCCCHHHHHHHHHCCCCCHCCHHHHHHHHHHCCCCCCHHHHHH
PSIPRED CCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH
DSSP    CCCEEECCCCCCCCCHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHH

161     LRATRALKELTVSNNDIGEAGARVLGQGLADSACQLETLRLENCGLTPANCKDLCGIVASQASLRELDLGSNGLGDAGIA
Reprof  HHHHHHHHHHCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHCHCCHHHCCCCCCCCCHHHHH
PSIPRED HHHCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHH
DSSP    HHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHH

241     ELCPGLLSPASRLKTLWLWECDITASGCRDLCRVLQAKETLKELSLAGNKLGDEGARLLCESLLQPGCQLESLWVKSCSL
Reprof  HHCCCCCCCHHHHCHHEEEHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCHHHHHHHHHHCCCCCCHHHHHHHHCHH
PSIPRED HHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCEEECCCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCC
DSSP    HHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEECCCCCCHHHHHHHHHHHHHCCCCCCCEEECCCCCC

321     TAACCQHVSLMLTQNKHLLELQLSSNKLGDSGIQELCQALSQPGTTLRVLCLGDCEVTNSGCSSLASLLLANRSLRELDL
Reprof  HHHHHHHHHHHHHCCHHHHHHHCCCCCCCCHHHHHHHHHHCCCCCEEEEEEECCCCCCCCCHHHHHHHHHHHCCHHHHCC
PSIPRED CHHHHHHHHHHHHHCCCCCEEECCCCCCCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCCEEEC
DSSP    EHHHHHHHHHHHHHCCCCCEEECCCCECHHHHHHHHHHHHHCCCCCCCEEECCCCCCEHHHHHHHHHHHHHCCCCCEEEC

401     SNNCVGDPGVLQLLGSLEQPGCALEQLVLYDTYWTEEVEDRLQALEGSKPGLRVIS
Reprof  CCCCCCCHHHHHHHCCCCCCCCHHHHHHHCCCCCCHHHHHHHHHHHCCCCCCCECC
PSIPRED CCCCCCHHHHHHHHHHHHCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEECC
DSSP    CCCECCHHHHHHHHHHHCCCCCCCCEEECCCCCCCHHHHHHHHHHHHHCCCCEEEC
</pre>
Q9X0E6
<pre>
2       ILVYSTFPNEEKALEIGRKLLEKRLIACFNAFEIRSGYWWKGEIVQDKEWAAIFKTTEEKEKELYEELRKLHPYETPAIF
Reprof  EEEEECCCCHHHHHHHHHHHHHHHHHHHHCHCHHHCCCEEECEEECCHHHHHHHCCCHHHHHHHHHHHHHCCCCCCCHHE
PSIPRED EEEEEECCCHHHHHHHHHHHHHCCCEEEEEEEEEEEEEEECCCEEEEEEEEEEECCCHHHHHHHHHHHHHHCCCCCCEEE
DSSP    EEEEEEECCHHHHHHHHHHHHHCCCCCEEEEEEEEEEEEECCEEEEEEEEEEEEEEEHHHHHHHHHHHHHHCCCCCCCEE

82      TLKVENVLTEYMNWLRESVL
Reprof  HHHHHHHHHHHHHHHHHHCC
PSIPRED EEECCCCCHHHHHHHHHHCC
DSSP    EECCCCEEHHHHHHHHHHCC
</pre>
Q08209
<pre>
14      TDRVVKAVPFPPSHRLTAKEVFDNDGKPRVDILKAHLMKEGRLEESVALRIITEGASILRQEKNLLDIDAPVTVCGDIHG
Reprof  CCCEEEEECCCCCCCEEEEEEECCCCCCEEEEEHHHECCCCCCCEEEEEEEEECCCCEEECCCCCCCCCCCEEEEECCCC
PSIPRED CCCCCCCCCCCCCCCCCHHHCCCCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHHCCCEEEECCCEEEECCCCC
DSSP    CCCCCCCCCCCCCCCECHHHHECCCCCECHHHHHHHHHCCCCECHHHHHHHHHHHHHHHHCCCCEEEECCCEEEECCCCC

94      QFFDLMKLFEVGGSPANTRYLFLGDYVDRGYFSIECVLYLWALKILYPKTLFLLRGNHECRHLTEYFTFKQECKIKYSER
Reprof  HHHHHHEEEEECCCCCCCEEEEEEEECCCCEEEEEEEHHHHHHHCCCCCEEEEEECCCCCCEEEEEEEEEEEEEEEEECH
PSIPRED HHHHHHHHHHHCCCCCCCCEEECCCCCCCCCCCHHHHHHHHHHHHCCCCCEEEECCCCHHHHHHCCCCHHHHHHHHCCHH
DSSP    CHHHHHHHHHHHCCCCCCCEEECCCCCCCCCCHHHHHHHHHHHHHHCCCCEEECCCCCCCHHHHHHCCHHHHHHHHCCHH

174     VYDACMDAFDCLPLAALMNQQFLCVHGGLSPEINTLDDIRKLDRFKEPPAYGPMCDILWSDPLEDFGNEKTQEHFTHNTV
Reprof  HHHHHHHHCCCCCHHHHHCCCEEEEECCCCCCCCCHHHHHHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCCCECCCCCE
PSIPRED HHHHHHHHHCCCHHHHHCCCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHCCCCCCCCCCCCCCCCCCCCCC
DSSP    HHHHHHHHHCCCCCEEEECCCEEEECCCCCCCCCCHHHHHHCCCCCCCCCCCHHHHHHHCEECCCCCCCCCCCCEEECCC

254     RGCSYFYSYPAVCEFLQHNNLLSILRAHEAQDAGYRMYRKSQTTGFPSLITIFSAPNYLDVYNNKAAVLKYENNVMNIRQ
Reprof  CCEEEEECCCCEEEEHCCCCHHHHEHHHCCCCCCEEEEEECCCCCCCEEEEEEECCCEEEEECCCEEEEEECCCEEEEEE
PSIPRED CCCCCCCCHHHHHHHHHHCCCCEEEEHHHHHHHHHHHHHCCCCCCCCCEEEEEECCCCCCCCCCEEEEEEECCCCCEEEE
DSSP    CCCCEEECHHHHHHHHHHCCCCEEEECCCCCCCCEEECCECCCCCCECEEEECCCCCHHHCCCCCEEEEEEECCEEEEEE

334     FNCSPHPYWLPNFMDVFTWSLPFVGEKVTEMLVNVLNICSSFEEAKGLDRINERMPPR
Reprof  ECCCCCCCCCCCCCEEEEEECCCCCHHHHHHHHHHHEECCEHHHHCCCCHHCCCCCCC
PSIPRED ECCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHCCCCC
DSSP    ECCCCCCCCCHHHCCHHHHHHHHHHHHHHHHHHHHHCCCCCHHHHHHHHHHHHCCCCC
</pre>
Prediction accuracy/precision
<figtable id="ss_acc">
Method | Q3 | Precision H | Precision E | Precision C |
---|---|---|---|---|
P04062 | ||||
PSIPRED | 0.831 | 0.830 | 0.872 | 0.810 |
Reprof | 0.553 | 1.000 | 0.455 | 0.592 |
P10775 | ||||
PSIPRED | 0.941 | 0.959 | 0.960 | 0.919 |
Reprof | 0.603 | 0.589 | 0.417 | 0.644 |
Q9X0E6 | ||||
PSIPRED | 0.890 | 1.000 | 0.895 | 0.720 |
Reprof | 0.580 | 0.562 | 0.917 | 0.458 |
Q08209 | ||||
PSIPRED | 0.833 | 0.842 | 0.902 | 0.812 |
Reprof | 0.579 | 0.762 | 0.293 | 0.743 |
</figtable>
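The page does not state how Q3 and the per-state precision were computed. The following is a minimal sketch, assuming the usual definitions: Q3 is the fraction of residues whose predicted three-state label equals the DSSP label, and the precision for a state is the fraction of residues predicted in that state that DSSP also assigns to it. The function names are illustrative, and both strings are assumed to be already mapped to H/E/C and restricted to positions with a DSSP assignment.

```python
def q3(predicted, observed):
    """Fraction of positions where prediction and DSSP assignment agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def precision(predicted, observed, state):
    """Share of residues predicted as `state` that DSSP also labels `state`.

    Returns None when the state is never predicted (undefined precision).
    """
    hits = [o for p, o in zip(predicted, observed) if p == state]
    if not hits:
        return None
    return sum(o == state for o in hits) / len(hits)


# Toy example, not data from the tables above.
pred = "HHEEC"
dssp = "HHECC"
print(q3(pred, dssp))              # 0.8
print(precision(pred, dssp, "E"))  # 0.5
```

Under these definitions a method can reach a precision of 1.000 for a state it rarely predicts, so precision is best read together with Q3.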
Disorder
P04062
<figtable id="disorder_P04062">
Method | Disorder regions | |
---|---|---|
IUPRED | 2, 3, 6, 90-93, 229-231, 235, 236 | |
DisProt | ||
Precision: 0% | Sensitivity: undefined | Specificity: 98% |
</figtable>
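The three values in the last row can be reproduced per residue from the two region lists. The sketch below is an illustration under stated assumptions: the metrics are taken to be counted per residue, the helper names are hypothetical, and the sequence length of 536 for P04062 is an assumption that is not given on this page.

```python
def parse_regions(text):
    """Expand a list such as '2, 3, 90-93' into a set of 1-based positions."""
    positions = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            positions.update(range(int(start), int(end) + 1))
        else:
            positions.add(int(part))
    return positions


def ratio(numerator, denominator):
    """Return None instead of dividing by zero (an undefined metric)."""
    return numerator / denominator if denominator else None


def disorder_metrics(predicted, annotated, length):
    """Per-residue precision, sensitivity and specificity."""
    tp = len(predicted & annotated)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    tn = length - tp - fp - fn
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


iupred = parse_regions("2, 3, 6, 90-93, 229-231, 235, 236")
disprot = parse_regions("")  # no annotated disorder for P04062
# 536 is an assumed sequence length, not a value from this page.
print(disorder_metrics(iupred, disprot, 536))
```

With no annotated disordered residues the sensitivity has a zero denominator, which is why it is reported as undefined rather than 0%.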
P10775
Homolog: DP00554 <figure id="fig:disorder_P10775">
</figure>
<figtable id="disorder_P10775">
Method | Disorder regions | |
---|---|---|
IUPRED | ||
DisProt | 31-50 | |
Precision: undefined | Sensitivity: 0% | Specificity: 100% |
</figtable>
Q9X0E6
Q08209
<figure id="fig:disorder_Q08209">
</figure> DP00092 <figtable id="disorder_Q08209">
Method | Disorder regions | |
---|---|---|
IUPRED | 1-6, 8, 383, 384, 424, 425, 432, 434-439, 443, 445, 448, 449, 455, 458, 463-521 | |
DisProt | 1-13, 390-414, 374-468, 469-486, 487-521 | |
Precision: 100% | Sensitivity: 52% | Specificity: 100% |
</figtable>