Difference between revisions of "Sequence-based predictions TSD"

Revision as of 16:25, 15 May 2012

Thor: He's my brother

Natasha Romanoff: He killed 80 people in 2 days

Thor: ...He's adopted

Unless noted otherwise, all predictions use the HEXA reference sequence (UniProt P06865). A protocol for this task can be found here.

Secondary structure

Proteins: Ribonuclease inhibitor P10775, CutA Q9X0E6, CAM-PRP catalytic subunit Q08209
Ribonuclease inhibitor and CutA are located in the cytoplasm, whereas the CAM-PRP catalytic subunit is located in the nucleus.

DSSP and handling of differing sequences

DSSP is based on 3D structures, so a PDB entry has to be selected for every given UniProt entry. The chosen mapping is 2bnh for P10775, 1kr4 for Q9X0E6, 1aui for Q08209 and 2gjx for P06865. This creates an additional problem: all other resources base their predictions on the UniProt sequence, whereas the sequence used by DSSP is inferred from the PDB file and may differ significantly from it because of changes introduced by the experimentalists who solved the structure. There are automated ways to resolve this. PDB's newer mmCIF files provide a residue-level mapping between the sequence inferred from the ATOM records and the SEQRES sequence, and SIFTS provides a residue-level mapping between SEQRES and UniProt. However, both tools are automated, and while they were surely developed with great care, inspecting the sequences directly is arguably preferable and also points out interesting regions. Therefore, manual alignments were performed, and special cases are noted in the text where applicable.
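The idea behind such an alignment can be illustrated with a minimal Python sketch of a global (Needleman–Wunsch) alignment that turns a PDB-derived sequence and a UniProt sequence into a residue-level position mapping. This is only an illustration, not the procedure used here (the alignments on this page were done manually). It assumes a simple match/mismatch score with a linear gap cost; the scores and the function name `align_map` are hypothetical, and a real alignment would typically use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def align_map(pdb_seq: str, uniprot_seq: str,
              match: int = 2, mismatch: int = -1, gap: int = -2) -> dict[int, int]:
    """Needleman-Wunsch global alignment with a linear gap cost (hypothetical scores).

    Returns a 1-based mapping {PDB-sequence position: UniProt position}
    for every aligned residue pair; residues aligned to gaps are left out.
    """
    n, m = len(pdb_seq), len(uniprot_seq)
    # score[i][j] = best score for aligning pdb_seq[:i] with uniprot_seq[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if pdb_seq[i - 1] == uniprot_seq[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,   # PDB residue opposite a gap
                              score[i][j - 1] + gap)   # UniProt residue opposite a gap

    # Trace back from the bottom-right cell, recording aligned residue pairs.
    mapping: dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            mapping[i] = j
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping


if __name__ == "__main__":
    # Hypothetical example: a PDB construct with an extra N-terminal "GS" tag.
    # PDB positions 3-12 map to UniProt 1-10; tag residues 1-2 stay unmapped.
    print(align_map("GSMKTAYIAKQR", "MKTAYIAKQR"))
```

With a linear gap cost, terminal and internal gaps cost the same, so a short fragment can have several equally scoring placements in a longer sequence; this kind of ambiguity is one reason to look at the alignments by eye.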

Disorder

Transmembrane helices

Proteins: Dopamine D3 receptor P35462, KvAP Q9YDF8, AQP-4 P47863
Dopamine D3 receptor, KvAP and AQP-4 are multi-pass membrane proteins. <figtable id="tab:gopetgo">

Positions of transmembrane helices
Protein	TM1	TM2	TM3	TM4	TM5	TM6	TM7
Drd3	33–55	66–88	105–126	150–170	188–212	330–351	367–388
KvAP	39–63	68–92	109–125	129–145	160–184	222–253
AQP-4	37–57	65–85	116–136	156–176	185–205	232–252

Table: Transmembrane regions assigned in UniProt. </figtable>
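To compare predicted helices against the UniProt assignment in the table, the ranges can be parsed and compared residue by residue. The following is a minimal Python sketch, assuming the tab-separated range format used in the table; the function names are hypothetical, and `residue_recall` is a simple per-residue coverage measure, not a segment-overlap score such as SOV. The example uses the AQP-4 row; in a real comparison the predicted segments would come from a transmembrane predictor.

```python
import re

Segment = tuple[int, int]


def parse_segments(row: str) -> list[Segment]:
    """Parse ranges such as '37–57  65–85' (en dash or hyphen) into (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\s*[–-]\s*(\d+)", row)]


def helix_and_loop_lengths(segments: list[Segment]) -> tuple[list[int], list[int]]:
    """Helix lengths (inclusive ranges) and residue counts between consecutive helices."""
    helices = [end - start + 1 for start, end in segments]
    loops = [nxt[0] - prev[1] - 1 for prev, nxt in zip(segments, segments[1:])]
    return helices, loops


def residue_recall(predicted: list[Segment], reference: list[Segment]) -> float:
    """Fraction of reference helix residues that also lie inside a predicted helix."""
    ref = {i for start, end in reference for i in range(start, end + 1)}
    pred = {i for start, end in predicted for i in range(start, end + 1)}
    return len(ref & pred) / len(ref)


if __name__ == "__main__":
    aqp4 = parse_segments("37–57\t65–85\t116–136\t156–176\t185–205\t232–252")
    helices, loops = helix_and_loop_lengths(aqp4)
    print(helices)  # [21, 21, 21, 21, 21, 21]
    print(loops)    # [7, 30, 19, 8, 26]
```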

Signal peptides

Proteins: Serum albumin P02768, LAMP-1 P11279, AQP-4 P47863
HEXA, LAMP-1 and serum albumin contain a signal peptide. UniProt assigns it to positions 1–22 in HEXA, 1–28 in LAMP-1 and 1–18 in serum albumin.
LAMP-1 is a single-pass membrane protein. Serum albumin, the main protein of blood plasma, is a secreted extracellular protein. AQP-4 is a multi-pass membrane protein that forms a water-specific channel and functions in transport.

The predictions shown were made with SignalP 4.0.
SignalP uses three main scores for the prediction of signal peptides: C, S and Y. The S-score is the actual signal peptide prediction: high scores indicate that the corresponding amino acid is part of a signal peptide, low scores that it is part of the mature protein. The C-score is the cleavage site score, which is significantly high at the most likely cleavage site. (When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein.) The Y-score is a derivative of the C-score combined with the S-score; its maximum, Y-max, gives a better cleavage site prediction than the raw C-score alone, because a sequence can contain several high C-score peaks of which only one is the true cleavage site.
SignalP additionally reports two scores, the S-mean and the D-score. The S-mean is the average S-score from the N-terminal amino acid to the position with the highest Y-score (Y-max). The D-score is a weighted average of the S-mean and Y-max and is used to discriminate signal peptides from non-signal peptides.
For non-secretory proteins, all scores should be very low.
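How the per-residue scores feed into Y-max, S-mean and the D-score can be sketched in Python. This is a minimal sketch under stated assumptions, not SignalP's implementation: the per-residue C- and S-scores would come from SignalP's neural networks and are given here as hypothetical toy values; the Y-score is assumed to be the geometric mean of the C-score and the drop in mean S-score across position i, with a hypothetical window `d`; the S-mean is taken over the residues before the Y-max position (the predicted signal peptide); the D-score weight `w` is a hypothetical placeholder (SignalP's actual weights are network-specific); and a decision cutoff of 0.45 is assumed.

```python
from math import sqrt
from typing import Sequence


def y_scores(c: Sequence[float], s: Sequence[float], d: int = 2) -> list[float]:
    """Assumed Y_i = sqrt(C_i * max(0, dS_i)), where dS_i is the mean S-score of the
    d residues before i minus the mean S-score of the d residues starting at i."""
    ys = []
    for i in range(len(c)):
        before, after = s[max(0, i - d):i], s[i:i + d]
        if not before or not after:
            ys.append(0.0)  # no slope can be computed at the sequence ends
            continue
        slope = sum(before) / len(before) - sum(after) / len(after)
        ys.append(sqrt(c[i] * max(0.0, slope)))
    return ys


def summarize(c: Sequence[float], s: Sequence[float],
              w: float = 0.5, d_cutoff: float = 0.45, d: int = 2) -> dict:
    ys = y_scores(c, s, d)
    i_max = max(range(len(ys)), key=ys.__getitem__)  # 0-based index of Y-max
    # Y-max marks the first residue of the mature protein, so the predicted
    # signal peptide is everything before it (fall back to residue 1 if empty).
    sp = list(s[:i_max]) or list(s[:1])
    s_mean = sum(sp) / len(sp)
    d_score = w * s_mean + (1 - w) * ys[i_max]  # hypothetical weighting
    return {"cleavage_site": i_max + 1, "Y_max": ys[i_max], "S_mean": s_mean,
            "D": d_score, "signal_peptide": d_score > d_cutoff}


if __name__ == "__main__":
    # Hypothetical toy scores: high S-score for 5 residues, C-score peak at residue 6.
    S = [0.9] * 5 + [0.1] * 5
    C = [0.05] * 5 + [0.8] + [0.05] * 4
    print(summarize(C, S))
```

For a protein without a signal peptide (uniformly low S- and C-scores), every Y-score in this sketch is zero and the D-score stays below the cutoff, matching the expectation that all scores are very low for non-secretory proteins.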



Table: Signal peptide predictions.

</figtable>

<xr id="tbl:signalp"/> displays the results of the SignalP predictions. The additional scores can be viewed here. HEXA, LAMP-1 and serum albumin are correctly predicted to carry one signal peptide at the N-terminus, and AQP-4 is correctly predicted to have none.

GO terms

GOpet

<xr id="tab:gopetgo"/> shows the GOpet prediction results for HexA. Six of the eight predictions have a confidence of 96–97%; the remaining two have 77% and 61%. Of the 35 GO terms associated with HexA, GOpet identified 6 correctly, and 2 of its predicted terms are false.

GO-Term ID	Type	Confidence	GO-Term description	Validation
GO:0003824	Molecular function	97%	catalytic activity	true
GO:0004563	Molecular function	96%	beta-N-acetylhexosaminidase activity	true
GO:0015929	Molecular function	96%	hexosaminidase activity	false
GO:0016787	Molecular function	96%	hydrolase activity	true
GO:0016798	Molecular function	96%	hydrolase activity, acting on glycosyl bonds	true
GO:0004553	Molecular function	96%	hydrolase activity, hydrolyzing O-glycosyl compounds	true
GO:0016799	Molecular function	77%	hydrolase activity, hydrolyzing N-glycosyl compounds	false
GO:0046982	Molecular function	61%	protein heterodimerization activity	true

Table: GO term prediction from GOpet. </figtable>
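The counts above translate directly into precision and recall. A short Python sketch, assuming that the 35 GO terms associated with HexA serve as the reference set and that "true" in the Validation column means the predicted term is among them:

```python
# Validation column of the GOpet table (True = predicted term is annotated for HexA)
GOPET = {
    "GO:0003824": True,
    "GO:0004563": True,
    "GO:0015929": False,
    "GO:0016787": True,
    "GO:0016798": True,
    "GO:0004553": True,
    "GO:0016799": False,
    "GO:0046982": True,
}
N_REFERENCE = 35  # GO terms associated with HexA, as stated in the text


def precision_recall(validated: dict[str, bool], n_reference: int) -> tuple[float, float]:
    """Precision = correct / predicted; recall = correct / reference terms."""
    correct = sum(validated.values())
    return correct / len(validated), correct / n_reference


if __name__ == "__main__":
    p, r = precision_recall(GOPET, N_REFERENCE)
    print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.75, recall = 0.17
```

High precision combined with low recall fits the observation that GOpet's predictions are mostly right but cover only a small part of HexA's annotation.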

ProtFun2.0

ProtFun2.0 combines various tools for protein function prediction: it queries a number of feature prediction servers, such as SignalP, and integrates their outputs into final predictions of the cellular role, enzyme class and selected Gene Ontology categories of the submitted sequence.
The Gene Ontology category predictions are displayed in <xr id="tab:gopetgo"/>. Only one category, Receptor (10.5%), lies just above 10%, and no prediction is high enough to attribute HexA to any of these GO categories. A closer examination of these categories shows that none of them matches the function of our protein.
ProtFun2.0 further predicts HexA to be an enzyme, more specifically a ligase (EC 6.-.-.-), which contradicts the hydrolase activity correctly predicted by GOpet.
The functional category "Cell_envelope" is assigned with a probability of over 80%. This appears to be the most accurate of the ProtFun2.0 predictions, although it is not apparent what this classification is based on or how it could be validated. The GO category predictions can be disregarded for HexA.

Gene Ontology category	Probability
Signal_transducer	8.3%
Receptor	10.5%
Hormone	0.1%
Structural_protein	1.0%
Transporter	2.4%
Ion_channel	1.8%
Voltage-gated_ion_channel	0.2%
Cation_channel	1.0%
Transcription	5.8%
Transcription_regulation	2.6%
Stress_response	4.4%
Immune_response	1.4%
Growth_factor	0.5%
Metal_ion_transport	0.9%

Table: GO category prediction from ProtFun2.0. </figtable>

Pfam
