Difference between revisions of "Sequence-based analyses of ARS A"

Revision as of 15:38, 3 June 2011

Additional Proteins

The following proteins are additionally used for the prediction of transmembrane alpha-helices and signal peptides and for the prediction of GO terms:

BACR

BACR_HALSA is a bacterial membrane protein...

Type	Position	Description
Topological domain	14 – 23	Extracellular
Transmembrane	24 – 42	Helical; Name=Helix A
Topological domain	43 – 56	Cytoplasmic
Transmembrane	57 – 75	Helical; Name=Helix B
Topological domain	76 – 91	Extracellular
Transmembrane	92 – 109	Helical; Name=Helix C
Topological domain	110 – 120	Cytoplasmic
Transmembrane	121 – 140	Helical; Name=Helix D
Topological domain	141 – 147	Extracellular
Transmembrane	148 – 167	Helical; Name=Helix E
Topological domain	168 – 185	Cytoplasmic
Transmembrane	186 – 204	Helical; Name=Helix F
Topological domain	205 – 216	Extracellular
Transmembrane	217 – 236	Helical; Name=Helix G
Topological domain	237 – 262	Cytoplasmic

RET 4

RET4_HUMAN is a human retinol-binding protein. It delivers retinol from the liver stores to the peripheral tissues. Defects can cause night vision problems.

no regions available

INSL 5

INSL5_HUMAN is a human insulin-like peptide. It consists of two chains and may have a role in gut contractility or in thymic development and regulation.

no regions available

LAMP 1

LAMP1_HUMAN is a human membrane glycoprotein. It presents carbohydrate ligands to selectins.

Type	Position	Description
Topological Domain	29 - 382	Lumenal
Transmembrane	383 - 405	Helical
Topological Domain	406 - 417	Cytoplasmic
Region	29 - 194	First lumenal domain
Region	195 - 227	Hinge
Region	228 - 382	Second lumenal domain

A 4

A4_HUMAN is a human cell surface receptor involved in neurite growth, neuronal adhesion and axonogenesis. It can be involved in Alzheimer disease and Amyloidosis.

Type	Position	Description
Topological domain	18 - 699	Extracellular
Transmembrane	700 - 723	Helical
Topological domain	724 - 770	Cytoplasmic
Domain	291 - 341	BPTI / Kunitz inhibitor
Region	96 - 110	Heparin-binding
Region	181 - 188	Zinc-binding
Region	391 - 423	Heparin-binding
Region	491 - 522	Heparin-binding
Region	523 - 540	Collagen-binding
Region	732 - 751	Interaction with G(o)-alpha
Motif	724 - 734	Basolateral sorting signal
Motif	759 - 762	NPXY motif; contains endocytosis signal
Compositional bias	230 - 260	Asp/Glu-rich (acidic)
Compositional bias	274 - 280	Poly-Thr

Secondary Structure Prediction

PSI-PRED

PSI-PRED creates a profile from a PSI-BLAST search, which is fed into a feed-forward neural network. The output of this network then serves as input to a second network, which yields the final prediction. The average Q3 score reached by PSI-PRED is 80.3 %. <ref name="psipred">Jones, D. T. "Protein secondary structure prediction based on position-specific scoring matrices." J Mol Biol, 1999</ref>

Jpred

Jpred also uses a neural network to predict secondary structure. The prediction relies on the Jnet algorithm, which takes either a multiple sequence alignment or a single sequence as input. If a single sequence is passed to the program, Jpred also uses sequence profiles derived from a PSI-BLAST search. It reaches an average Q3 score of up to 81.5 %. <ref name="jpred">Cole, C., Barber, J. D. and Barton, G. J. "The Jpred 3 secondary structure prediction server." Nucleic Acids Res, 2008</ref>

DSSP

DSSP is a database of protein secondary structure assignments for all proteins in the PDB. It is based on a method that takes the 3D coordinates of a protein and assigns a hierarchical definition of secondary structure elements to it. <ref name="dssp">Kabsch, W. and Sander, C. "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features." Biopolymers, 1983</ref>

DSSP file of ARS A

Results and Discussion

We predicted the secondary structure of Arylsulfatase A with PSI-PRED and Jpred3 using their web server interfaces. We also downloaded the DSSP secondary structure assignment. DSSP assigns a hierarchical definition of secondary structure, so the assignment contains more structural classes than the 3-class prediction (H=helix, E=sheet, C=coil) of PSI-PRED and Jpred. To compare the predictions with the DSSP assignment, we converted the DSSP output classes to the three-letter classification using a Perl script. The following table lists the DSSP classes, their descriptions and the 3-letter class each was converted to.

DSSP class	Description	3-letter class
H	Helix	H
G	3-10 Helix	H
I	Pi-Helix	H
B	single bridge	E
E	beta sheet	E
T	turn	C
S	bend	C
\s (blank)	coil	C
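
The conversion itself was done with a Perl script that is not shown here. As an illustration only, the mapping in the table can be sketched in Python as follows; the function name `reduce_dssp` and the handling of the missing-residue marker "m" are assumptions made for this sketch, not a transcription of the original script.

```python
# Mapping taken from the table above; a blank (space) in the DSSP
# output stands for coil.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # helix classes
    "B": "E", "E": "E",             # bridge and sheet
    "T": "C", "S": "C", " ": "C",   # turn, bend, blank
}


def reduce_dssp(dssp_string, missing="m"):
    """Convert a DSSP class string to the 3-letter classification.

    Residues without a DSSP assignment (marked with `missing`,
    an assumption of this sketch) are passed through unchanged.
    """
    reduced = []
    for symbol in dssp_string:
        if symbol == missing:
            reduced.append(missing)
        else:
            reduced.append(DSSP_TO_3STATE[symbol])
    return "".join(reduced)


print(reduce_dssp("mHGI BETS"))  # prints mHHHCEECC
```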

Both methods yield similar predictions. The following figure shows a schematic representation of the predictions. It also marks true positives, i.e. positions where the class predicted by the method equals the class assigned by DSSP, in green.

The actual predictions and the DSSP assignment are listed below. Missing residues in the DSSP output are marked by an "m".

mmmmmmmmmmmmmmmmmmCCCEEEEEEECCCCCCCCHHHCCCCCCCHHHHHHHHCCEEECCEECCCCCHHHHHHHHHHCCCHHHHCC (DSSP) CCHHHHHHHHHHHCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCHHHHHHHHCCCEECCCCCCCCCCHHHHHHHHHCCCCCCCCC (JPRED) CCHHHHHHHHHHHHCCCCCCCCEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCCCCCCCCHHHHHHHHHCCCCCCCCC (PSI-PRED) CCCCCCCCECCECCCCCCCHHHHHHCCCCEEEEEECCCCECCHHHCCCHHHHCCCEEEECCCCCCCCECCCCEEECCCEECCCCECC (DSSP) CCCCCCCCCCCCCCCCCCHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (JPRED) CCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (PSI-PRED) CCCCCCEEECCEEEEECCCHHHHHHHHHHHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHH (DSSP) CCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHH (JPRED) CCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHH (PSI-PRED) HHHHHHCCCHHHEEEEEEECCCCCHHHHHHCCCCCCCCCCCCCCCHHHHECCCEEECCCCCCCEEECCCEEHHHHHHHHHHHHCCCC (DSSP) HHHHHHCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEECCCCCCCCCECCCCCCCCHHHHHHHHHCCCC (JPRED) HHHHHHCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEECCCCCCCCEECCCHHHHHHHHHHHHHHCCCC (PSI-PRED) CCCCCCCCCCHHHHHCCCCCCCCEEEECCCCCCCCCCCCEEEECCEEEEEEECCCHHHCCCCCHHHCCCCCCEEEEEEEEEECCCCC (DSSP) CCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCEEEEECCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCCC (JPRED) CCCCCCCCCCHHHHCCCCCCCCCEEEECCCCCCCCCCEEEEEECCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCEEEECCCCC (PSI-PRED) CCCCCCCCCmmmCCHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHCECHHHCCCCCCCCCCCCCCCCECmmmm (DSSP) CCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (JPRED) CCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC (PSI-PRED)

Both methods perform well on most of the protein, with an overall accuracy of 74 % for PSI-PRED and 71 % for Jpred3. The accuracy (Q3) of this prediction is thus roughly 6 to 10 percentage points lower than the average Q3 scores reported in the original publications of PSI-PRED and Jpred. Both methods predict the wrong secondary structure for the region around positions 110-200. DSSP assigns very short helices and beta sheets in this region; perhaps these are too short to be predicted properly. It is also remarkable that the confidence scores within this incorrectly predicted region are as high as for the rest of the protein sequence.

Prediction and confidence scores for PSI-PRED.

Confidence scores of the Jpred prediction.

Program	#TP	#FP	accuracy
PSI-PRED	374	133	0.74
Jpred	359	148	0.71
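
The accuracy column is the fraction of residues whose predicted class equals the DSSP-derived class. A minimal Python sketch of this calculation is given below. The function name `q3` is hypothetical, and counting residues without a DSSP assignment ("m") as mismatches is an assumption; it is at least consistent with #TP + #FP = 507, the full sequence length, in the table above.

```python
def q3(predicted, assigned):
    """Return (matches, mismatches, accuracy) for two 3-state strings.

    Positions where `assigned` holds a missing-residue marker never
    equal H, E or C, so they count as mismatches (assumption).
    """
    if len(predicted) != len(assigned):
        raise ValueError("strings must have the same length")
    matches = sum(p == a for p, a in zip(predicted, assigned))
    mismatches = len(assigned) - matches
    return matches, mismatches, matches / len(assigned)


# Toy example: 3 of 5 positions agree.
print(q3("CCHHE", "mCHHC"))  # prints (3, 2, 0.6)

# The accuracies in the table follow directly from the counts.
print(round(374 / 507, 2), round(359 / 507, 2))  # prints 0.74 0.71
```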

Prediction of Disordered Regions

Three different servers were challenged to predict disordered regions in ARSA, but no region was found that is consistent between the three methods.

DISOPRED

DISOPRED Server

Output of Disopred showing the probability of being disordered along the sequence

DISOPRED predictions for a false positive rate threshold of: 2%

conf: 930000000000012210000000000000000000000000000000000000000000
pred: *...........................................................
  AA: MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
              10        20        30        40        50        60

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
              70        80        90       100       110       120

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
             130       140       150       160       170       180

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
             190       200       210       220       230       240

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
             250       260       270       280       290       300

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
             310       320       330       340       350       360

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
             370       380       390       400       410       420

conf: 000000000000000000000000000000000000000000000000000000000000
pred: ............................................................
  AA: TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
             430       440       450       460       470       480

conf: 000000000000000002571699999
pred: ......................*****
  AA: EDPALQICCHPGCTPRPACCHCPDPHA
             490       500

Asterisks (*) represent disorder predictions and dots (.) 
prediction of order. The confidence estimates give a rough
indication of the probability that each residue is disordered.

Only the first residue and the last five residues are predicted to be disordered. For almost all other residues the confidence for disorder is zero; notable uncertainty appears only in the last ten residues.
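
For longer outputs it is convenient to extract the disordered regions programmatically from the concatenated "pred" lines. The following Python sketch illustrates this; the function name `disordered_regions` is hypothetical, and the example string is rebuilt from the counts in the output above (one asterisk, 501 dots, five asterisks) rather than read from a file.

```python
import re


def disordered_regions(pred):
    """Return 1-based inclusive (start, end) runs of '*' in a pred string."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"\*+", pred)]


# pred string for the 507 residues, rebuilt from the output above
pred = "*" + "." * 501 + "*" * 5
print(len(pred))                 # prints 507
print(disordered_regions(pred))  # prints [(1, 1), (503, 507)]
```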

POODLE

POODLE Server

plot of POODLE-output showing the probability of being disordered along the sequence

POODLE predicts many disordered residues. Depending on the threshold, one can identify 6 or more disordered regions.

IUPred

IUPRED Server

The three different options of prediction were tried and are illustrated below. In general, IUPred did not predict any disordered region with a "Disorder tendency" above 0.6 except one very short region around residue 415 with the "long disorder"-option.

long disorder

The main profile of our server is to predict context-independent global disorder that encompasses at least 30 consecutive residues of predicted disorder. For this application the sequential neighbourhood of 100 residues is considered. <ref name="IUPred"> http://iupred.enzim.hu/Help.html</ref>

IUPred-output showing the probability of being disordered along the sequence with "long disorder"-option

short disorder

It uses a parameter set suited for predicting short, probably context-dependent, disordered regions, such as missing residues in the X-ray structure of an otherwise globular protein. For this application the sequential neighbourhood of 25 residues is considered. As chain termini of globular proteins are often disordered in X-ray structures, this is taken into account by an end-adjustment parameter which favors disorder prediction at the ends. <ref name="IUPred"> http://iupred.enzim.hu/Help.html</ref>

IUPred-output showing the probability of being disordered along the sequence with "short disorder"-option

structured domains

The dependable identification of ordered regions is a crucial step in target selection for structural studies and structural genomics projects. Finding putative structured domains suitable for structure determination is another potential application of this server. In this case the algorithm takes the energy profile and finds continuous regions confidently predicted ordered. Neighbouring regions close to each other are merged, while regions shorter than the minimal domain size of at least 30 residues are ignored. When this prediction type is selected, the region(s) predicted to correspond to structured/globular domains are returned. <ref name="IUPred"> http://iupred.enzim.hu/Help.html</ref>
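
The three steps described in the help text (find ordered runs, merge neighbouring runs, discard short regions) can be illustrated with the Python sketch below. This is not IUPred's implementation: the function name, the score threshold of 0.5 and the merge distance `max_gap` are assumptions chosen for illustration; only the minimal domain size of 30 residues comes from the text.

```python
def ordered_domains(scores, threshold=0.5, max_gap=5, min_size=30):
    """Return 1-based inclusive (start, end) regions predicted ordered.

    scores    -- per-residue disorder tendency
    threshold -- residues below this value count as ordered (assumed)
    max_gap   -- runs separated by at most this many residues are
                 merged (assumed value)
    min_size  -- minimal domain size, 30 residues as in the help text
    """
    # Step 1: continuous ordered runs as 0-based half-open intervals
    runs = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(scores)])

    # Step 2: merge runs that lie close to each other
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # Step 3: drop regions shorter than the minimal domain size
    return [(s + 1, e) for s, e in merged if e - s >= min_size]


# Artificial profile: two ordered runs separated by 3 disordered
# residues are merged; a later 10-residue ordered run is dropped.
profile = [0.1] * 20 + [0.9] * 3 + [0.1] * 20 + [0.9] * 20 + [0.1] * 10
print(ordered_domains(profile))  # prints [(1, 43)]
```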

IUPred-output showing the probability of being disordered along the sequence with "structured domains"-option

Meta-Disorder

PredictProtein Server

PredictProtein requires registration, which we attempted, but it did not work: "username does not exist!"

Prediction of transmembrane alpha-helices and signal peptides

The prediction of membrane proteins and their topology is very important, because the experimental structure determination of these proteins is quite challenging. It is difficult because membrane-mimetic environments may induce non-native structures and thus lead to a wrong structural model of the protein. <ref>Cross, Timothy, Mukesh Sharma, Myunggi Yi, Huan-Xiang Zhou (2010). "Influence of Solubilizing Environments on Membrane Protein Structures"</ref>

SignalP

ARS A	A4	RET4	INSL5	LAMP1	BACR

TMHMM

TMHMM predicts transmembrane helices (TMH) using a Hidden Markov Model (HMM). The TMHMM model of a protein essentially consists of seven different states. Globular domains can occur on the cytoplasmic and the non-cytoplasmic side. On the cytoplasmic side, globular domains are linked to loops, which are in turn linked to cytoplasmic caps. These caps are followed by the helix core, and there is again a cap on the non-cytoplasmic side. These caps are linked to globular domains by either short or long non-cytoplasmic loops.
TMHMM outputs the most likely structure of the protein according to the above model. It also includes the orientation (cytoplasmic or non-cytoplasmic side) of the N-terminus. The output consists of a plot, which graphically shows the different states along the protein, and some additional statistics <ref> http://www.cbs.dtu.dk/services/TMHMM-2.0/TMHMM2.0.guide.html#output </ref>:

The number of predicted transmembrane helices.
The expected number of amino acids in transmembrane helices. If this number is larger than 18 it is very likely to be a transmembrane protein (OR have a signal peptide).
The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If this number is more than a few, you should be warned that a predicted transmembrane helix in the N-term could be a signal peptide.
The total probability that the N-term is on the cytoplasmic side of the membrane.

Protein	#predicted TMHs	#expected AAs in TMHs	#expected AAs in TMHs in first 60 positions	probability that the N-term is on the cytoplasmic side	Graphical output
ARS A	0	2.65106	2.63079	0.12149
A4_HUMAN	1	22.72525	0.0027	0.00015
INSL5_HUMAN	0	0.50415	0.50415	0.03772
LAMP1_HUMAN	2	44.89582	22.24286	0.99287
RET4_HUMAN	0	0.01196	0.01179	0.01909
BACR	6	140.4032	26.1196	0.01887
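
The rules of thumb from the TMHMM guide can be applied to the table in a few lines of Python. The sketch below is illustrative only: the function name `interpret` is hypothetical, and the cutoff of 10 expected residues in the first 60 positions is an assumption standing in for the guide's "more than a few".

```python
# (predicted TMHs, expected AAs in TMHs, expected AAs in first 60)
TMHMM_STATS = {
    "ARS A": (0, 2.65106, 2.63079),
    "A4_HUMAN": (1, 22.72525, 0.0027),
    "INSL5_HUMAN": (0, 0.50415, 0.50415),
    "LAMP1_HUMAN": (2, 44.89582, 22.24286),
    "RET4_HUMAN": (0, 0.01196, 0.01179),
    "BACR": (6, 140.4032, 26.1196),
}


def interpret(exp_aa, exp_first60, first60_cutoff=10.0):
    """Apply the two rules of thumb from the TMHMM guide."""
    notes = []
    if exp_aa > 18:
        notes.append("likely transmembrane protein or signal peptide")
    if exp_first60 > first60_cutoff:  # assumed cutoff
        notes.append("possible N-terminal signal peptide")
    return notes


for name, (n_tmh, exp_aa, exp_first60) in TMHMM_STATS.items():
    print(name, n_tmh, interpret(exp_aa, exp_first60))
```

With these settings only LAMP1_HUMAN and BACR receive the N-terminal warning, which agrees with the "POSSIBLE N-term signal sequence" remarks that TMHMM printed for these two proteins.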

Discussion

ARS A: outside 1-507 (all residues)

A4_HUMAN: The topology is given below

Description	Position
outside	1-700
TMhelix	701-723
inside	724-770

INSL5_HUMAN: outside 1-135 (all residues)

LAMP1_HUMAN POSSIBLE N-term signal sequence

Description	Position
inside	1-10
TMhelix	11-33
outside	34-383
TMhelix	384-406
inside	407-417

RET4_HUMAN: outside 1-201 (all residues)

BACR:

sp_P02945_BACR_HALSA POSSIBLE N-term signal sequence

Description	Position
outside	1-22
TMhelix	23-42
inside	43-54
TMhelix	55-77
outside	78-91
TMhelix	92-114
inside	115-120
TMhelix	121-143
outside	144-147
TMhelix	148-170
inside	171-189
TMhelix	190-212
outside	213-262
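
To compare this prediction with the annotated helices A to G listed for BACR in the "Additional Proteins" section, one can count the residues shared between each annotated and each predicted helix. The Python sketch below does this; the function names and the minimum overlap of 10 residues are assumptions made for illustration.

```python
# Annotated transmembrane helices of BACR (from "Additional Proteins")
ANNOTATED = {
    "A": (24, 42), "B": (57, 75), "C": (92, 109), "D": (121, 140),
    "E": (148, 167), "F": (186, 204), "G": (217, 236),
}
# TMhelix segments predicted by TMHMM (table above)
PREDICTED = [(23, 42), (55, 77), (92, 114), (121, 143), (148, 170), (190, 212)]


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def missed_helices(annotated, predicted, min_overlap=10):
    """Names of annotated helices not covered by any predicted helix."""
    return [
        name
        for name, segment in annotated.items()
        if all(overlap(segment, p) < min_overlap for p in predicted)
    ]


print(missed_helices(ANNOTATED, PREDICTED))  # prints ['G']
```

Under these assumptions, TMHMM recovers helices A to F, while helix G (217-236) falls into the region that TMHMM labels "outside".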

Phobius and Polyphobius

Protein	Phobius - graphical	Phobius - textual	Polyphobius - graphical	Polyphobius - textual
A4		FT SIGNAL 1 17 FT REGION 1 1 N-REGION. FT REGION 2 12 H-REGION. FT REGION 13 17 C-REGION. FT TOPO_DOM 18 700 NON CYTOPLASMIC. FT TRANSMEM 701 723 FT TOPO_DOM 724 770 CYTOPLASMIC.		FT SIGNAL 1 17 FT REGION 1 3 N-REGION. FT REGION 4 12 H-REGION. FT REGION 13 17 C-REGION. FT TOPO_DOM 18 700 NON CYTOPLASMIC. FT TRANSMEM 701 723 FT TOPO_DOM 724 770 CYTOPLASMIC.
ARS A		FT SIGNAL 1 28 FT REGION 1 6 N-REGION. FT REGION 7 18 H-REGION. FT REGION 19 28 C-REGION. FT TOPO_DOM 29 507 NON CYTOPLASMIC.		FT SIGNAL 1 16 FT REGION 1 4 N-REGION. FT REGION 5 12 H-REGION. FT REGION 13 16 C-REGION. FT TOPO_DOM 17 507 NON CYTOPLASMIC.
BACR		FT TOPO_DOM 1 22 NON CYTOPLASMIC. FT TRANSMEM 23 42 FT TOPO_DOM 43 53 CYTOPLASMIC. FT TRANSMEM 54 76 FT TOPO_DOM 77 95 NON CYTOPLASMIC. FT TRANSMEM 96 114 FT TOPO_DOM 115 120 CYTOPLASMIC. FT TRANSMEM 121 142 FT TOPO_DOM 143 147 NON CYTOPLASMIC. FT TRANSMEM 148 169 FT TOPO_DOM 170 189 CYTOPLASMIC. FT TRANSMEM 190 212 FT TOPO_DOM 213 217 NON CYTOPLASMIC. FT TRANSMEM 218 237 FT TOPO_DOM 238 262 CYTOPLASMIC.		FT TOPO_DOM 1 21 NON CYTOPLASMIC. FT TRANSMEM 22 43 FT TOPO_DOM 44 54 CYTOPLASMIC. FT TRANSMEM 55 77 FT TOPO_DOM 78 94 NON CYTOPLASMIC. FT TRANSMEM 95 114 FT TOPO_DOM 115 120 CYTOPLASMIC. FT TRANSMEM 121 141 FT TOPO_DOM 142 147 NON CYTOPLASMIC. FT TRANSMEM 148 166 FT TOPO_DOM 167 186 CYTOPLASMIC. FT TRANSMEM 187 205 FT TOPO_DOM 206 215 NON CYTOPLASMIC. FT TRANSMEM 216 237 FT TOPO_DOM 238 262 CYTOPLASMIC.
INSL5		FT SIGNAL 1 22 FT REGION 1 5 N-REGION. FT REGION 6 17 H-REGION. FT REGION 18 22 C-REGION. FT TOPO_DOM 23 135 NON CYTOPLASMIC.		FT SIGNAL 1 22 FT REGION 1 4 N-REGION. FT REGION 5 16 H-REGION. FT REGION 17 22 C-REGION. FT TOPO_DOM 23 135 NON CYTOPLASMIC.
LAMP1		FT SIGNAL 1 28 FT REGION 1 10 N-REGION. FT REGION 11 22 H-REGION. FT REGION 23 28 C-REGION. FT TOPO_DOM 29 381 NON CYTOPLASMIC. FT TRANSMEM 382 405 FT TOPO_DOM 406 417 CYTOPLASMIC.		FT SIGNAL 1 28 FT REGION 1 9 N-REGION. FT REGION 10 22 H-REGION. FT REGION 23 28 C-REGION. FT TOPO_DOM 29 381 NON CYTOPLASMIC. FT TRANSMEM 382 405 FT TOPO_DOM 406 417 CYTOPLASMIC.
RET4		FT SIGNAL 1 18 FT REGION 1 2 N-REGION. FT REGION 3 13 H-REGION. FT REGION 14 18 C-REGION. FT TOPO_DOM 19 201 NON CYTOPLASMIC.		FT SIGNAL 1 18 FT REGION 1 3 N-REGION. FT REGION 4 13 H-REGION. FT REGION 14 18 C-REGION. FT TOPO_DOM 19 201 NON CYTOPLASMIC.
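
The textual output in the table is a flattened list of "FT" feature lines. For further processing it can be split back into individual features, as in the following Python sketch; the function name `parse_phobius` and the tuple layout are choices made for this illustration.

```python
import re


def parse_phobius(text):
    """Split flattened Phobius 'FT' output into feature tuples.

    Returns a list of (key, start, end, description) tuples.
    """
    features = []
    for chunk in re.split(r"\bFT\s+", text):
        parts = chunk.split()
        if not parts:
            continue  # empty piece before the first 'FT'
        key, start, end = parts[0], int(parts[1]), int(parts[2])
        description = " ".join(parts[3:]).rstrip(".")
        features.append((key, start, end, description))
    return features


# Phobius output for ARS A, copied from the table above
arsa = ("FT SIGNAL 1 28 FT REGION 1 6 N-REGION. FT REGION 7 18 H-REGION. "
        "FT REGION 19 28 C-REGION. FT TOPO_DOM 29 507 NON CYTOPLASMIC.")
for feature in parse_phobius(arsa):
    print(feature)
```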

OCTOPUS and SPOCTOPUS

OCTOPUS uses a combination of a Hidden Markov Model and a neural network to predict the topology of a transmembrane protein. It uses BLAST to create a sequence profile, which is then used by the neural network to predict the preference of each amino acid to be located within a transmembrane (M), interface (I), close loop (L) or globular loop (G) region, and to be inside (i) or outside (o). These scores are then passed to the HMM, which predicts the final states.
SPOCTOPUS extends the OCTOPUS algorithm with a preprocessing step. OCTOPUS does not predict signal peptides. The N-terminal targeting sequences mainly consist of hydrophobic residues, and thus their properties strongly resemble those of transmembrane helices. Not considering signal peptides in the prediction often leads to a false prediction of a transmembrane helix at the N-terminus. Therefore SPOCTOPUS extends the OCTOPUS algorithm with the prediction of signal peptide preference scores within the first 70 amino acids of the protein. The exact location of a potential signal peptide is then predicted by an HMM, as in OCTOPUS.

Protein	OCTOPUS	SPOCTOPUS
ARS A
A4
RET4
INSL5
LAMP1
BACR

TargetP

TargetP is used to predict the cellular localization of a protein. It combines the two methods ChloroP and SignalP. The following targeting sequences can be identified:

chloroplast transit peptide (cTP)
mitochondrial targeting peptide (mTP)
secretory pathway signal peptide (SP)

TargetP uses a neural network to calculate a score for each of the above subcellular targets and predicts the location with the highest score. In our case all proteins are predicted to be targeted to the secretory pathway (S). Results are shown below. Note that cTP is not included in our predictions, as we only considered eukaryotic and bacterial proteins. Also note that TargetP is trained on eukaryotic proteins, and hence the prediction for the bacterial protein "BACR" does not make sense, because the pathways of localization and secretion are completely different in eukaryotes and bacteria (e.g. bacteria do not have an endoplasmic reticulum, Golgi apparatus or lysosomes). Nevertheless, we included it in our analysis to see whether TargetP finds any localization sequence in it or predicts "-" (= no localization signal found).

Protein	mTP	SP	other	prediction
ARS A	0.070	0.926	0.054	S
A4_HUMAN	0.035	0.937	0.084	S
INSL5_HUMAN	0.074	0.899	0.037	S
LAMP1_HUMAN	0.043	0.953	0.017	S
RET4_HUMAN	0.242	0.928	0.020	S
BACR (bacterial)	0.019	0.897	0.562	S
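
The decision rule (take the class with the highest score) and the margin to the runner-up can be reproduced from the table with the Python sketch below. The function name `predict_location` is hypothetical, and the label "-" for the "other" class follows the notation used in the text above.

```python
# Scores copied from the table above
TARGETP_SCORES = {
    "ARS A": {"mTP": 0.070, "SP": 0.926, "other": 0.054},
    "A4_HUMAN": {"mTP": 0.035, "SP": 0.937, "other": 0.084},
    "INSL5_HUMAN": {"mTP": 0.074, "SP": 0.899, "other": 0.037},
    "LAMP1_HUMAN": {"mTP": 0.043, "SP": 0.953, "other": 0.017},
    "RET4_HUMAN": {"mTP": 0.242, "SP": 0.928, "other": 0.020},
    "BACR": {"mTP": 0.019, "SP": 0.897, "other": 0.562},
}
LABELS = {"mTP": "M", "SP": "S", "other": "-"}


def predict_location(scores):
    """Return (label of the top class, margin to the second-best class)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return LABELS[best], round(best_score - second_score, 3)


for protein, scores in TARGETP_SCORES.items():
    print(protein, predict_location(scores))
```

All six proteins are assigned to "S", but BACR has by far the smallest margin between the best and the second-best class.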

Discussion

All proteins are assigned to the secretory pathway.

Arylsulfatase A is a lysosomal enzyme. Therefore, the prediction is correct, as lysosomal proteins are guided there by the secretory pathway, via the endoplasmic reticulum and the Golgi apparatus.
coming
coming
coming
coming
As described above, BACR is a bacterial protein. TargetP also assigns this protein to the secretory pathway, which makes no sense, as bacterial protein secretion differs from eukaryotic secretion. However, the prediction is much less clear-cut in this case than for the other proteins. The "other" class, meaning that no targeting sequence was found in the protein, gets a considerably high score in this prediction, hence the assignment to S is more questionable here.

SignalP

sudo /apps/signalp-3.0/signalp -t gram- ../BACR.fasta > BACR.signalp
sudo /apps/signalp-3.0/signalp -t euk ../ARSA.fasta > ARSA.signalp
sudo /apps/signalp-3.0/signalp -t euk ../A4.fasta > A4.signalp
sudo /apps/signalp-3.0/signalp -t euk ../LAMP1.fasta > LAMP1.signalp
sudo /apps/signalp-3.0/signalp -t euk ../INSL5.fasta > INSL5.signalp
sudo /apps/signalp-3.0/signalp -t euk ../RET4.fasta > RET4.signalp

Prediction of GO Terms

GOPET

GOPET Server

GO terms for 6 different proteins were predicted. The results are shown below. Bold entries are GO terms that are actually annotated for the protein. <ref>http://www.ebi.ac.uk/QuickGO/</ref>

A4

GOid	Confidence	GO term
GO:0004866	87%	endopeptidase inhibitor activity
GO:0004867	86%	serine-type endopeptidase inhibitor activity
GO:0030568	83%	plasmin inhibitor activity
GO:0030304	83%	trypsin inhibitor activity
GO:0030414	82%	peptidase inhibitor activity
GO:0005488	79%	binding
GO:0005515	74%	protein binding
GO:0046872	73%	metal ion binding
GO:0003677	71%	DNA binding
GO:0008201	70%	heparin binding
GO:0008270	69%	zinc ion binding
GO:0005507	69%	copper ion binding
GO:0005506	67%	iron ion binding

ARS A

GOid	Confidence	GO term
GO:0003824	97%	catalytic activity
GO:0016787	96%	hydrolase activity
GO:0008484	95%	sulfuric ester hydrolase activity
GO:0004065	92%	arylsulfatase activity
GO:0004098	89%	cerebroside-sulfatase activity
GO:0003943	83%	N-acetylgalactosamine-4-sulfatase activity
GO:0004773	82%	steryl-sulfatase activity
GO:0004423	82%	iduronate-2-sulfatase activity
GO:0008449	82%	N-acetylglucosamine-6-sulfatase activity
GO:0047753	82%	choline-sulfatase activity
GO:0018741	81%	alkyl sulfatase activity
GO:0046872	63%	metal ion binding
GO:0016250	61%	N-sulfoglucosamine sulfohydrolase activity

BACR_HALSA

GOid	Confidence	GO term
GO:0005216	77%	ion channel activity
GO:0008020	75%	G-protein coupled photoreceptor activity
GO:0015078	60%	hydrogen ion transmembrane transporter activity

INSL 5

GOid	Confidence	GO term
GO:0005179	80%	hormone activity

LAMP 1

GOid	Confidence	GO term
GO:0004812	60%	aminoacyl-tRNA ligase activity
GO:0005524	60%	ATP binding

RET 4

GOid	Confidence	GO term
GO:0005488	90%	binding
GO:0005501	81%	retinoid binding
GO:0008289	80%	lipid binding
GO:0019841	78%	retinol binding
GO:0005215	78%	transporter activity
GO:0016918	78%	retinal binding
GO:0005319	69%	lipid transporter activity
GO:0008035	60%	high-density lipoprotein particle binding

Pfam

Pfam Server

ProtFun 2.2

ProtFun 2.2 Server

References

@@ Line 949: / Line 949: @@
 | ''' GO term '''
 |-
-| [http://amigo.geneontology.org/cgi-bin/amigo/term_details?term=GO:0005216 GO:0005216]
+| '''[http://amigo.geneontology.org/cgi-bin/amigo/term_details?term=GO:0005216 GO:0005216]'''
-| 77%
+| '''77%'''
-| ion channel activity
+| '''ion channel activity'''
 |-
 | [http://amigo.geneontology.org/cgi-bin/amigo/term_details?term=GO:0008020 GO:0008020]
@@ Line 962: / Line 962: @@
 |-
 |}
 ===== INSL 5 =====
