Sequence-based predictions TSD
Thor: He's my brother
Natasha Romanoff: He killed 80 people in 2 days
Thor: ...He's adopted
Unless noted otherwise, the sequence used for all predictions is the HEXA reference sequence (Uniprot P06865). A protocol for this task can be found here.
Secondary structure
Proteins: Ribonuclease inhibitor P10775, CutA Q9X0E6, CAM-PRP catalytic subunit Q08209
Ribonuclease inhibitor and CutA are located in the cytoplasm whereas the CAM-PRP catalytic subunit is located in the nucleus.
DSSP assigns secondary structure from 3D structures, so a PDB entry has to be selected for every given Uniprot entry. The chosen mapping is 2bnh for P10775, 1kr4 for Q9X0E6, 1aui for Q08209 and 2gjx for P06865.
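The mapping above can be kept as a small lookup table so that every DSSP run refers to the same structure. This is a minimal sketch; the names `UNIPROT_TO_PDB` and `pdb_for` are illustrative and not part of the protocol.

```python
# Uniprot accession -> PDB entry, as chosen in the text above.
UNIPROT_TO_PDB = {
    "P10775": "2bnh",  # Ribonuclease inhibitor
    "Q9X0E6": "1kr4",  # CutA
    "Q08209": "1aui",  # CAM-PRP catalytic subunit
    "P06865": "2gjx",  # HEXA
}


def pdb_for(uniprot_id: str) -> str:
    """Return the PDB entry selected for a Uniprot accession (hypothetical helper)."""
    try:
        return UNIPROT_TO_PDB[uniprot_id]
    except KeyError:
        raise KeyError(f"no PDB entry selected for {uniprot_id}") from None
```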
Disorder
Transmembrane helices
Proteins: Dopamine D3 receptor P35462, KvAP Q9YDF8, AQP-4 P47863
Dopamine D3 receptor, KvAP and AQP-4 are multi-pass membrane proteins.
<figtable id="tab:gopetgo">
Positions of transmembrane helices | |||||||
---|---|---|---|---|---|---|---|
Drd3 | 33–55 | 66–88 | 105–126 | 150–170 | 188–212 | 330–351 | 367–388 |
KvAP | 39–63 | 68–92 | 109–125 | 129–145 | 160–184 | 222–253 | |
AQP-4 | 37–57 | 65–85 | 116–136 | 156–176 | 185–205 | 232–252 | |
Table TODO: Assigned transmembrane regions in Uniprot </figtable>
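To compare predicted helices with the assigned ones, the regions can be stored as inclusive, 1-based intervals. The sketch below uses the AQP-4 row of the table; the helper names `helix_lengths` and `in_helix` are assumptions made for illustration.

```python
# Assigned transmembrane helices of AQP-4 (P47863), copied from the table above.
AQP4_HELICES = [(37, 57), (65, 85), (116, 136), (156, 176), (185, 205), (232, 252)]


def helix_lengths(helices):
    """Length of each helix; positions are 1-based and inclusive."""
    return [end - start + 1 for start, end in helices]


def in_helix(position, helices):
    """True if a 1-based residue position falls inside any listed helix."""
    return any(start <= position <= end for start, end in helices)
```

With intervals in this form, a per-residue comparison against a predictor's output only needs one `in_helix` call per position.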
Signal peptides
Proteins: Serum albumin P02768, LAMP-1 P11279, AQP-4 P47863
HEXA, LAMP-1 and Serum albumin each contain a signal peptide. The assigned signal peptide spans positions 1 to 22 in HEXA, 1 to 28 in LAMP-1 and 1 to 18 in Serum albumin.
LAMP-1 is a membrane protein that crosses the membrane with a single helix. Serum albumin, the main protein of plasma, is a secreted extracellular protein.
AQP-4 is a multi-pass membrane protein which forms a water-specific channel and functions in transport.
The prediction of the displayed results was performed with SignalP version 4.0.
SignalP employs three main scores for the prediction of signal peptides: C, S and Y. The S-score is the actual signal peptide prediction: a high score indicates that the corresponding amino acid is part of a signal peptide, and a low score indicates that it is part of the mature protein.
The C-score is the cleavage score, which marks the best cleavage site where it is significantly high. (When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein.)
Y-max is a derivative of the C-score combined with the S-score, calculated to give a better cleavage site prediction than the raw C-score alone.
There are two additional scores reported in the SignalP output, namely the S-mean and the D-score. The S-mean is the average of the S-score, ranging from the N-terminal amino acid to the amino acid assigned with the highest Y-max score. The D-score is implemented as a weighted average of the S-mean and the Y-max scores.
For non-secretory proteins, all scores are expected to be very low.
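The relationship between the S-mean, Y-max and D-score can be sketched as follows. This is only an illustration of the description above, not SignalP's implementation: the weight of 0.5 is a placeholder (the actual weights are not given here), the averaging window includes the Y-max position as the text describes, and the per-residue values are made up.

```python
from statistics import fmean


def s_mean(s_scores, y_scores):
    """Average S-score from the N-terminal residue up to the residue with the highest Y-score."""
    y_max_index = max(range(len(y_scores)), key=y_scores.__getitem__)
    return fmean(s_scores[: y_max_index + 1])


def d_score(s_mean_value, y_max, weight=0.5):
    """Weighted average of S-mean and Y-max; `weight` is a hypothetical parameter."""
    return weight * s_mean_value + (1.0 - weight) * y_max


# Made-up per-residue scores for a five-residue toy sequence.
toy_s = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_y = [0.1, 0.2, 0.6, 0.3, 0.1]

toy_s_mean = s_mean(toy_s, toy_y)
toy_d = d_score(toy_s_mean, max(toy_y))
```

In the toy data the Y-score peaks at the third residue, so the S-mean is taken over the first three S-scores only.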
<figtable id="tbl:signalp">
Table : Signal peptide predictions. |
</figtable>
<xr id="tbl:signalp"/> displays the results of the SignalP predictions. The additional scores can be viewed here. HEXA, LAMP-1 and Serum albumin are each correctly predicted to have one signal peptide at the beginning of the sequence, and AQP-4 is identified as a mature protein without a signal peptide.
GO terms
GOpet
<xr id="tab:gopetgo"/> shows the GOpet prediction results for the HEXA protein. The accuracy is 0.75: six of the eight predicted terms were validated as true.
<figtable id="tab:gopetgo">
GO-Term ID | Type | Confidence | GO-Term description | Validation |
---|---|---|---|---|
GO:0003824 | Molecular function | 97% | catalytic activity | true |
GO:0004563 | Molecular function | 96% | beta-N-acetylhexosaminidase activity | true |
GO:0015929 | Molecular function | 96% | hexosaminidase activity | false |
GO:0016787 | Molecular function | 96% | hydrolase activity | true |
GO:0016798 | Molecular function | 96% | hydrolase activity acting on glycosyl bonds | true |
GO:0004553 | Molecular function | 96% | hydrolase activity hydrolyzing O-glycosyl compounds | true |
GO:0016799 | Molecular function | 77% | hydrolase activity hydrolyzing N-glycosyl compounds | false |
GO:0046982 | Molecular function | 61% | protein heterodimerization activity | true |
Table TODO: GO term prediction from GOpet. </figtable>
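The reported accuracy of 0.75 matches the fraction of predicted terms marked true in the Validation column, assuming that is how the value was derived. A minimal sketch that recomputes it from the table (the names `GOPET_VALIDATION` and `accuracy` are illustrative):

```python
# Validation column of the GOpet table above: GO-Term ID -> validated as true?
GOPET_VALIDATION = {
    "GO:0003824": True,
    "GO:0004563": True,
    "GO:0015929": False,
    "GO:0016787": True,
    "GO:0016798": True,
    "GO:0004553": True,
    "GO:0016799": False,
    "GO:0046982": True,
}


def accuracy(validation):
    """Fraction of predicted GO terms that were validated as true."""
    return sum(validation.values()) / len(validation)
```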
ProtFun2.0
<xr id="tab:protfun"/> lists the prediction results for the HEXA protein from ProtFun2.0.
<figtable id="tab:protfun">
Gene Ontology category | Probability |
---|---|
Signal_transducer | 8.3% |
Receptor | 10.5% |
Hormone | 0.1% |
Structural_protein | 1.0% |
Transporter | 2.4% |
Ion_channel | 1.8% |
Voltage-gated_ion_channel | 0.2% |
Cation_channel | 1.0% |
Transcription | 5.8% |
Transcription_regulation | 2.6% |
Stress_response | 4.4% |
Immune_response | 1.4% |
Growth_factor | 0.5% |
Metal_ion_transport | 0.9% |
Table TODO: GO term prediction from ProtFun2.0. </figtable>
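To see which categories ProtFun2.0 favours, the probabilities can be sorted in descending order. The sketch below copies the values from the table; `PROTFUN` and `ranked` are illustrative names only.

```python
# Gene Ontology category -> probability in percent, copied from the table above.
PROTFUN = {
    "Signal_transducer": 8.3,
    "Receptor": 10.5,
    "Hormone": 0.1,
    "Structural_protein": 1.0,
    "Transporter": 2.4,
    "Ion_channel": 1.8,
    "Voltage-gated_ion_channel": 0.2,
    "Cation_channel": 1.0,
    "Transcription": 5.8,
    "Transcription_regulation": 2.6,
    "Stress_response": 4.4,
    "Immune_response": 1.4,
    "Growth_factor": 0.5,
    "Metal_ion_transport": 0.9,
}


def ranked(probabilities):
    """Categories sorted from highest to lowest probability."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
```

Calling `ranked(PROTFUN)[:3]` returns the three highest-scoring categories as (category, probability) pairs.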