Sequence-based predictions TSD

From Bioinformatikpedia
Revision as of 15:09, 15 May 2012 by Meiera (talk | contribs) (GOpet)

Thor: He's my brother

Natasha Romanoff: He killed 80 people in 2 days

Thor: ...He's adopted

Unless noted otherwise, the sequence used for all predictions is the HEXA reference sequence (UniProt P06865). A protocol for this task can be found here.

Secondary structure

Proteins: Ribonuclease inhibitor P10775, CutA Q9X0E6, CAM-PRP catalytic subunit Q08209
Ribonuclease inhibitor and CutA are located in the cytoplasm whereas the CAM-PRP catalytic subunit is located in the nucleus.

Disorder

Transmembrane helices

Proteins: Dopamine D3 receptor P35462, KvAP Q9YDF8, AQP-4 P47863
Dopamine D3 receptor, KvAP and AQP-4 are multi-pass membrane proteins. <figtable id="tab:gopetgo">

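{| class="wikitable"
! Protein !! colspan="7" | Positions of transmembrane helices
|-
| Drd3 || 33–55 || 66–88 || 105–126 || 150–170 || 188–212 || 330–351 || 367–388
|-
| KvAP || 39–63 || 68–92 || 109–125 || 129–145 || 160–184 || 222–253 ||
|-
| AQP-4 || 37–57 || 65–85 || 116–136 || 156–176 || 185–205 || 232–252 ||
|}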

Table TODO: Transmembrane regions as assigned in UniProt </figtable>
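A quick plausibility check for position ranges like those in the table is to compute the length of each segment, since a helix needs roughly 20 residues to span a lipid bilayer. The following is a minimal sketch and not part of the original protocol; the function name `helix_lengths` and the input format (whitespace-separated `start–end` ranges) are assumptions made for illustration. The example uses the KvAP row.

```python
def helix_lengths(ranges):
    """Return the residue count of each inclusive 'start-end' range."""
    lengths = []
    for item in ranges.split():
        # Accept both the en dash used in the table and a plain hyphen.
        start, end = item.replace("–", "-").split("-")
        # Ranges are inclusive on both ends, hence the +1.
        lengths.append(int(end) - int(start) + 1)
    return lengths


# KvAP row, copied from the table of assigned transmembrane regions.
kvap = "39–63 68–92 109–125 129–145 160–184 222–253"
kvap_lengths = helix_lengths(kvap)
print(kvap_lengths)
```

Segments far shorter than the others in the same protein are worth re-checking against the database entry.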


Signal peptides

Proteins: Serum albumin P02768, LAMP-1 P11279, AQP-4 P47863
HEXA, LAMP-1 and serum albumin contain a signal peptide. The assigned signal peptide spans positions 1–22 in HEXA, 1–28 in LAMP-1 and 1–18 in serum albumin.
LAMP-1 is a single-pass membrane protein, crossing the membrane with one helix. Serum albumin, the main protein of plasma, is a secreted extracellular protein. AQP-4 is a multi-pass membrane protein that forms a water-specific channel and functions in transport.


The predictions shown here were made with SignalP version 4.0.
SignalP reports three main scores for the prediction of signal peptides: C, S and Y. The S-score is the actual signal peptide prediction: a high score indicates that the corresponding amino acid is part of a signal peptide, and a low score indicates that it is part of the mature protein. The C-score is the cleavage site score, which is significantly high at the most likely cleavage site. (When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein.) The Y-score is derived from the C-score combined with the S-score; its maximum, Y-max, gives a better cleavage site prediction than the raw C-score alone.
Two additional scores are reported in the SignalP output: S-mean and the D-score. S-mean is the average of the S-score over the residues from the N-terminal amino acid to the amino acid with the highest Y-score (Y-max). The D-score is a weighted average of S-mean and Y-max.
For non-secretory proteins, all scores are expected to be very low.
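The relationship between these scores can be sketched in a few lines of Python. This is an illustration only, not SignalP's implementation: the function name `summarize_signalp`, the example score values and the weight `w` are assumptions (the weights SignalP 4.0 actually uses are not given here). S-mean is taken over the residues before the Y-max position, on the assumption that this position is the first residue of the mature protein.

```python
def summarize_signalp(s_scores, y_scores, w=0.5):
    """Derive Y-max, S-mean and a D-score from per-residue scores.

    s_scores, y_scores: per-residue S- and Y-scores, N-terminus first.
    w: hypothetical weight of S-mean in the D-score (not SignalP's value).
    """
    # 0-based index of the highest Y-score, i.e. the predicted cleavage site.
    y_max_index = max(range(len(y_scores)), key=y_scores.__getitem__)
    y_max = y_scores[y_max_index]
    # Average S-score over the putative signal peptide: all residues
    # before the Y-max position.
    peptide = s_scores[:y_max_index]
    s_mean = sum(peptide) / len(peptide) if peptide else 0.0
    # D-score as a weighted average of S-mean and Y-max.
    d_score = w * s_mean + (1 - w) * y_max
    return {
        "cleavage_pos": y_max_index + 1,  # 1-based, first mature residue
        "y_max": y_max,
        "s_mean": s_mean,
        "d_score": d_score,
    }


# Made-up scores for a five-residue toy sequence.
result = summarize_signalp([0.9, 0.8, 0.7, 0.1, 0.1],
                           [0.1, 0.2, 0.3, 0.6, 0.1])
print(result)
```

In this toy input the S-score drops right where the Y-score peaks, which is the pattern expected at a cleavage site. A sequence without a signal peptide would give low values for all four quantities.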

<figtable id="tbl:signalp">

Sp P06865 HEXA HUMAN.png
Sp P47863 AQP4 RAT TSD.png
Sp P11279 LAMP1 HUMAN TSD.png
Sp P02768 ALBU HUMAN TSD.png
Table : Signal peptide predictions.

</figtable>

<xr id="tbl:signalp"/> shows the results of the SignalP predictions. The additional scores can be viewed here. For HEXA, LAMP-1 and serum albumin, a signal peptide is correctly predicted at the beginning of the sequence, and AQP-4 is correctly identified as a mature protein without a signal peptide.


GO terms

GOpet

<figtable id="tab:gopetgo">

{| class="wikitable"
! GO-Term ID !! Type !! Confidence !! GO-Term description !! Validation
|-
| GO:0003824 || Molecular function || 97% || catalytic activity || true
|-
| GO:0004563 || Molecular function || 96% || beta-N-acetylhexosaminidase activity || true
|-
| GO:0015929 || Molecular function || 96% || hexosaminidase activity || false
|-
| GO:0016787 || Molecular function || 96% || hydrolase activity || true
|-
| GO:0016798 || Molecular function || 96% || hydrolase activity, acting on glycosyl bonds || true
|-
| GO:0004553 || Molecular function || 96% || hydrolase activity, hydrolyzing O-glycosyl compounds || true
|-
| GO:0016799 || Molecular function || 77% || hydrolase activity, hydrolyzing N-glycosyl compounds || false
|-
| GO:0046982 || Molecular function || 61% || protein heterodimerization activity || true
|}

Table TODO: GO term prediction. </figtable>