Difference between revisions of "Sequence-based analyses of ARS A"

From Bioinformatikpedia

Revision as of 12:12, 3 June 2011

Additional Proteins

The following proteins are additionally used for the prediction of transmembrane alpha-helices and signal peptides and for the prediction of GO terms:

BACR

BACR_HALSA is a bacterial membrane protein...

Type                Position    Description
Topological domain  14 – 23     Extracellular
Transmembrane       24 – 42     Helical; Name=Helix A
Topological domain  43 – 56     Cytoplasmic
Transmembrane       57 – 75     Helical; Name=Helix B
Topological domain  76 – 91     Extracellular
Transmembrane       92 – 109    Helical; Name=Helix C
Topological domain  110 – 120   Cytoplasmic
Transmembrane       121 – 140   Helical; Name=Helix D
Topological domain  141 – 147   Extracellular
Transmembrane       148 – 167   Helical; Name=Helix E
Topological domain  168 – 185   Cytoplasmic
Transmembrane       186 – 204   Helical; Name=Helix F
Topological domain  205 – 216   Extracellular
Transmembrane       217 – 236   Helical; Name=Helix G
Topological domain  237 – 262   Cytoplasmic


RET 4

  • RET4_HUMAN is a human retinol-binding protein. It delivers retinol from the liver stores to the peripheral tissues. Defects can cause night vision problems.

no regions available


INSL 5

  • INSL5_HUMAN is a human insulin-like peptide. It consists of two chains and may have a role in gut contractility or in thymic development and regulation.

no regions available


LAMP 1

  • LAMP1_HUMAN is a human membrane glycoprotein. It presents carbohydrate ligands to selectins.

Type                Position    Description
Topological domain  29 - 382    Lumenal
Transmembrane       383 - 405   Helical
Topological domain  406 - 417   Cytoplasmic
Region              29 - 194    First lumenal domain
Region              195 - 227   Hinge
Region              228 - 382   Second lumenal domain


A 4

  • A4_HUMAN is a human cell surface receptor involved in neurite growth, neuronal adhesion and axonogenesis. It can be involved in Alzheimer disease and Amyloidosis.

Type                Position    Description
Topological domain  18 - 699    Extracellular
Transmembrane       700 - 723   Helical
Topological domain  724 - 770   Cytoplasmic
Domain              291 - 341   BPTI / Kunitz inhibitor
Region              96 - 110    Heparin-binding
Region              181 - 188   Zinc-binding
Region              391 - 423   Heparin-binding
Region              491 - 522   Heparin-binding
Region              523 - 540   Collagen-binding
Region              732 - 751   Interaction with G(o)-alpha
Motif               724 - 734   Basolateral sorting signal
Motif               759 - 762   NPXY motif; contains endocytosis signal
Compositional bias  230 - 260   Asp/Glu-rich (acidic)
Compositional bias  274 - 280   Poly-Thr

Secondary Structure Prediction

Structure comparison.jpeg

Program #TP #FP accuracy
PSI-PRED 374 133 0.74
Jpred 359 148 0.71
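
The accuracy column is consistent with the fraction of correctly predicted residues, TP / (TP + FP); both rows sum to 507 residues. The following is a minimal sketch of that calculation, assuming this is how the values were derived. The function names are illustrative, and how positions marked `m` in the DSSP string were counted is not stated here.

```python
from typing import Tuple


def accuracy(tp: int, fp: int) -> float:
    """Fraction of residues whose predicted state matches the reference."""
    return tp / (tp + fp)


def count_matches(reference: str, predicted: str) -> Tuple[int, int]:
    """Count matching and non-matching positions of two aligned state strings."""
    if len(reference) != len(predicted):
        raise ValueError("state strings must have the same length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches, len(reference) - matches


# Counts taken from the table above.
psipred_acc = accuracy(374, 133)
jpred_acc = accuracy(359, 148)
print(round(psipred_acc, 2), round(jpred_acc, 2))
```

`count_matches` can be applied to any pair of equally long lines from the alignment below, for example one DSSP line and the corresponding PSI-PRED line.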

mmmmmmmmmmmmmmmmmmCCCEEEEEEECCCCCCCCHHHCCCCCCCHHHHHHHHCCEEECCEECCCCCHHHHHHHHHHCCCHHHHCC (DSSP)
CCHHHHHHHHHHHCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCHHHHHHHHCCCEECCCCCCCCCCHHHHHHHHHCCCCCCCCC (JPRED)
CCHHHHHHHHHHHHCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCCCCCCCCHHHHHHHHHCCCCCCCCC (PSI-PRED)

CCCCCCCCECCECCCCCCCHHHHHHCCCCEEEEEECCCCECCHHHCCCHHHHCCCEEEECCCCCCCCECCCCEEECCCEECCCCECC (DSSP)
CCCCCCCCCCCCCCCCCCHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (JPRED)
CCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (PSI-PRED)

CCCCCCEEECCEEEEECCCHHHHHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHH (DSSP)
CCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH (JPRED)
CCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHH (PSI-PRED)

HHHHHHCCCHHHEEEEEEECCCCCHHHHHHCCCCCCCCCCCCCCCHHHHECCCEEECCCCCCCEEECCCEEHHHHHHHHHHHHCCCC (DSSP)
HHHHHHCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEECCCCCCCCCECCCCCCCCHHHHHHHHHCCCC (JPRED)
HHHHHHCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEECCCCCCCCEECCCHHHHHHHHHHHHHHCCCC (PSI-PRED)

CCCCCCCCCCHHHHHCCCCCCCCEEEECCCCCCCCCCCCEEEECCEEEEEEECCCHHHCCCCCHHHCCCCCCEEEEEEEEEECCCCC (DSSP)
CCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCEEEEECCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCCC (JPRED)
CCCCCCCCCCHHHHCCCCCCCCCEEEECCCCCCCCCCEEEEEECCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCEEEECCCCC (PSI-PRED)

CCCCCCCCCmmmCCHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHCECHHHCCCCCCCCCCCCCCCCECmmmm (DSSP)
CCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (JPRED)
CCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (PSI-PRED)

DSSP

Prediction of Disordered Regions

DISOPRED

POODLE

IUPRED

Meta-Disorder

Prediction of transmembrane alpha-helices and signal peptides

SignalP

ARS A A4 RET4 INSL5 LAMP1 BACR
ARSA.jpeg A4.jpeg RET4.jpeg INSL5.jpeg LAMP1.jpeg BACR.jpeg


TMHMM

TMHMM predicts transmembrane helices (TMH) using a Hidden Markov Model (HMM). The protein model used by TMHMM essentially consists of seven different states. Globular domains can occur on the cytoplasmic and the non-cytoplasmic side. On the cytoplasmic side, globular domains are linked to loops, which are in turn linked to cytoplasmic caps. These caps are followed by the helix core, and there is another cap on the non-cytoplasmic side. The non-cytoplasmic caps are linked to globular domains by either short or long non-cytoplasmic loops.
TMHMM outputs the most likely structure of the protein according to the above model. It also includes the orientation (cytoplasmic or non-cytoplasmic side) of the N-terminus. The output consists of a plot, which shows the different states along the protein graphically, and some additional statistics <ref> http://www.cbs.dtu.dk/services/TMHMM-2.0/TMHMM2.0.guide.html#output </ref>:

  • The number of predicted transmembrane helices.
  • The expected number of amino acids in transmembrane helices. If this number is larger than 18, the protein is very likely to be a transmembrane protein (or to have a signal peptide).
  • The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If this number is more than a few, a predicted transmembrane helix at the N-terminus could in fact be a signal peptide.
  • The total probability that the N-term is on the cytoplasmic side of the membrane.


Protein      #predicted TMHs  #expected AAs in TMHs  #expected AAs in TMHs in first 60 positions  P(N-term on cytoplasmic side)  Graphical output
ARS A        0                2.65106                2.63079                                      0.12149                        Sp P15289 ARSA HUMAN.gif
A4_HUMAN     1                22.72525               0.0027                                       0.00015                        Sp P05067 A4 HUMAN.gif
INSL5_HUMAN  0                0.50415                0.50415                                      0.03772                        Sp Q9Y5Q6 INSL5 HUMAN.gif
LAMP1_HUMAN  2                44.89582               22.24286                                     0.99287                        Sp P11279 LAMP1 HUMAN.gif
RET4_HUMAN   0                0.01196                0.01179                                      0.01909                        Sp P02753 RET4 HUMAN.gif
BACR         6                140.4032               26.1196                                      0.01887                        Sp P02945 BACR HALSA.gif
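
The two rules of thumb from the TMHMM guide can be applied to the values in the table. The sketch below is one way to do this; the cutoff of 18 comes from the guide, while the cutoff of 10 for the first 60 residues is an assumption standing in for "more than a few", and all names are illustrative.

```python
# Values copied from the TMHMM table above:
# (expected AAs in TMHs, expected AAs in TMHs within the first 60 residues).
TMHMM_STATS = {
    "ARS A": (2.65106, 2.63079),
    "A4_HUMAN": (22.72525, 0.0027),
    "INSL5_HUMAN": (0.50415, 0.50415),
    "LAMP1_HUMAN": (44.89582, 22.24286),
    "RET4_HUMAN": (0.01196, 0.01179),
    "BACR": (140.4032, 26.1196),
}

EXP_AA_CUTOFF = 18.0   # stated in the TMHMM guide
FIRST60_CUTOFF = 10.0  # assumption: our reading of "more than a few"


def interpret(exp_aa: float, exp_first60: float) -> dict:
    """Apply both rules of thumb to one protein."""
    return {
        "likely_tm_or_signal_peptide": exp_aa > EXP_AA_CUTOFF,
        "possible_n_term_signal": exp_first60 > FIRST60_CUTOFF,
    }


flags = {name: interpret(*values) for name, values in TMHMM_STATS.items()}
for name, result in flags.items():
    print(name, result)
```

With these cutoffs, only LAMP1_HUMAN and BACR are flagged for a possible N-terminal signal sequence, which agrees with the warnings in the TMHMM output listed in the discussion.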


Discussion

  • ARS A: outside 1-507 (all residues)
  • A4_HUMAN: the predicted topology is given below

Description  Position
outside      1-700
TMhelix      701-723
inside       724-770

  • INSL5_HUMAN: outside 1-135 (all residues)
  • LAMP1_HUMAN: POSSIBLE N-term signal sequence

Description  Position
inside       1-10
TMhelix      11-33
outside      34-383
TMhelix      384-406
inside       407-417

  • RET4_HUMAN: outside 1-201 (all residues)
  • BACR (sp_P02945_BACR_HALSA): POSSIBLE N-term signal sequence

Description  Position
outside      1-22
TMhelix      23-42
inside       43-54
TMhelix      55-77
outside      78-91
TMhelix      92-114
inside       115-120
TMhelix      121-143
outside      144-147
TMhelix      148-170
inside       171-189
TMhelix      190-212
outside      213-262
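
The predicted helices can be compared with the annotated transmembrane segments listed under "Additional Proteins". The sketch below computes, for each annotated BACR helix, the largest residue overlap with any TMHMM helix. The coordinates are copied from this page; the function names and the choice of overlap as the comparison measure are illustrative assumptions.

```python
# Annotated BACR helices (from the BACR table) and TMHMM predictions
# (from the discussion above), as inclusive (start, end) positions.
ANNOTATED = {
    "A": (24, 42), "B": (57, 75), "C": (92, 109), "D": (121, 140),
    "E": (148, 167), "F": (186, 204), "G": (217, 236),
}
PREDICTED = [(23, 42), (55, 77), (92, 114), (121, 143), (148, 170), (190, 212)]


def overlap(a, b):
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def best_overlaps(annotated, predicted):
    """Largest overlap of each annotated helix with any predicted helix."""
    return {
        name: max((overlap(segment, p) for p in predicted), default=0)
        for name, segment in annotated.items()
    }


best = best_overlaps(ANNOTATED, PREDICTED)
missed = [name for name, residues in best.items() if residues == 0]
print(best)
print("not predicted:", missed)
```

The same `overlap` function can be used for the single helix of A4_HUMAN (annotated 700 - 723, predicted 701-723).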

Phobius and Polyphobius

OCTOPUS and SPOCTOPUS

OCTOPUS uses a combination of a Hidden Markov Model and a neural network to predict the topology of a transmembrane protein. It uses BLAST to create a sequence profile, which is then used by the neural network to predict the preference of each amino acid to be located within a transmembrane (M), interface (I), close loop (L) or globular loop (G) region, and on the inside (i) or outside (o) of the membrane. These scores are then passed to the HMM, which predicts the final states.
SPOCTOPUS extends the OCTOPUS algorithm with a preprocessing step. OCTOPUS does not predict signal peptides. N-terminal targeting sequences mainly consist of hydrophobic residues, so their properties strongly resemble those of transmembrane helices. Ignoring signal peptides in the prediction therefore often leads to a false prediction of a transmembrane helix at the N-terminus. SPOCTOPUS addresses this by predicting signal peptide preference scores within the first 70 amino acids of the protein. The exact location of a potential signal peptide is then predicted by an HMM in OCTOPUS.


Protein OCTOPUS SPOCTOPUS
ARS A Octopus arsa leuko.png Spoctopus arsa leuko.png
A4 Octopus a4 leuko.png Spoctopus a4 leuko.png
RET4 Octopus ret4 leuko.png Spoctopus ret4 leuko.png
INSL5 Octopus insl5 leuko.png Spoctopus insl5 leuko.png
LAMP1 Octopus lamp1 leuko.png Spoctopus lamp1 leuko.png
BACR Octopus bacr leuko.png Spoctopus bacr leuko.png


TargetP

TargetP is used to predict the cellular localization of a protein. It combines the two methods ChloroP and SignalP. The following targeting sequences can be identified:

  • chloroplast transit peptide (cTP)
  • mitochondrial targeting peptide (mTP)
  • secretory pathway signal peptide (SP)

TargetP uses a neural network to calculate scores for each of the above subcellular targets and predicts the location with the highest score. In our case all proteins are predicted to be targeted to the secretory pathway (S). Results are shown below. Note that cTP is not included in our predictions, as we only considered eukaryotic and bacterial proteins. Also note that TargetP is trained on eukaryotic proteins, so the prediction for the bacterial protein "BACR" is not meaningful: the pathways of localization and secretion differ completely between eukaryotes and bacteria (e.g. bacteria do not have an endoplasmic reticulum, Golgi apparatus or lysosomes). Nevertheless, we included it in our analysis to see whether TargetP finds any localization sequence in it or predicts "-" (= no localization signal found).

Protein           mTP    SP     other  prediction
ARS A             0.070  0.926  0.054  S
A4_HUMAN          0.035  0.937  0.084  S
INSL5_HUMAN       0.074  0.899  0.037  S
LAMP1_HUMAN       0.043  0.953  0.017  S
RET4_HUMAN        0.242  0.928  0.020  S
BACR (bacterial)  0.019  0.897  0.562  S
Discussion

All proteins are assigned to the secretory pathway.

  • Arylsulfatase A is a lysosomal enzyme. Therefore, the prediction is correct, as lysosomal proteins are guided there by the secretory pathway, via the endoplasmic reticulum and the Golgi apparatus.
  • coming
  • coming
  • coming
  • coming
  • As described above, BACR is a bacterial protein. TargetP also assigns this protein to the secretory pathway, which makes no sense, as bacterial protein secretion differs from eukaryotic secretion. However, the prediction is much less clear-cut in this case than for the other proteins. The "other" class, meaning that no targeting sequence is found in the protein, receives a considerably high score (0.562), so the assignment to S is more questionable here.
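
The decision rule described for TargetP (choose the class with the highest score) and the observation that the BACR call is less clear-cut can be illustrated with the scores from the table. This is a minimal sketch and not TargetP's own code: the margin between the best and second-best score is used here as an illustrative measure of how clear a prediction is, and the label "-" for the "other" class follows the text above.

```python
# Scores copied from the TargetP table above.
SCORES = {
    "ARS A": {"mTP": 0.070, "SP": 0.926, "other": 0.054},
    "A4_HUMAN": {"mTP": 0.035, "SP": 0.937, "other": 0.084},
    "INSL5_HUMAN": {"mTP": 0.074, "SP": 0.899, "other": 0.037},
    "LAMP1_HUMAN": {"mTP": 0.043, "SP": 0.953, "other": 0.017},
    "RET4_HUMAN": {"mTP": 0.242, "SP": 0.928, "other": 0.020},
    "BACR": {"mTP": 0.019, "SP": 0.897, "other": 0.562},
}
LABELS = {"mTP": "M", "SP": "S", "other": "-"}


def predict(scores: dict) -> tuple:
    """Return (label of the top class, margin to the runner-up score)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_class, top_score), (_, second_score) = ranked[0], ranked[1]
    return LABELS[top_class], top_score - second_score


results = {name: predict(scores) for name, scores in SCORES.items()}
least_clear = min(results, key=lambda name: results[name][1])
for name, (label, margin) in results.items():
    print(f"{name}: {label} (margin {margin:.3f})")
print("smallest margin:", least_clear)
```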

SignalP

sudo /apps/signalp-3.0/signalp -t gram- ../BACR.fasta > BACR.signalp
sudo /apps/signalp-3.0/signalp -t euk ../ARSA.fasta > ARSA.signalp
sudo /apps/signalp-3.0/signalp -t euk ../A4.fasta > A4.signalp
sudo /apps/signalp-3.0/signalp -t euk ../LAMP1.fasta > LAMP1.signalp
sudo /apps/signalp-3.0/signalp -t euk ../INSL5.fasta > INSL5.signalp
sudo /apps/signalp-3.0/signalp -t euk ../RET4.fasta > RET4.signalp
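
The six calls above differ only in the organism type and the file name, so they could also be generated in a loop. The sketch below builds the same commands from a small table; the installation path, the `-t` option and the file names are taken from the commands above, while the function names are illustrative. It omits `sudo`, assuming the binary is executable by the current user, and `run_all` is defined but not called here.

```python
import subprocess

SIGNALP = "/apps/signalp-3.0/signalp"  # path as used in the commands above

# Organism type passed to -t for each protein, as in the commands above.
ORGANISM_TYPE = {
    "BACR": "gram-",
    "ARSA": "euk",
    "A4": "euk",
    "LAMP1": "euk",
    "INSL5": "euk",
    "RET4": "euk",
}


def signalp_command(protein: str, organism_type: str) -> list:
    """Build the argument list for one SignalP run."""
    return [SIGNALP, "-t", organism_type, f"../{protein}.fasta"]


def run_all() -> None:
    """Run SignalP for every protein and write <protein>.signalp."""
    for protein, organism_type in ORGANISM_TYPE.items():
        with open(f"{protein}.signalp", "w") as out:
            subprocess.run(signalp_command(protein, organism_type),
                           stdout=out, check=True)


commands = [signalp_command(p, t) for p, t in ORGANISM_TYPE.items()]
for command in commands:
    print(" ".join(command))
```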

Prediction of GO Terms

GOPET

Pfam

ProtFun 2.2