Task 3: Sequence-based predictions
- 1 Task description
- 2 Task 3.1: Secondary structure prediction
- 3 Task 3.2: Prediction of disordered regions
- 4 Task 3.3: Prediction of transmembrane alpha-helices and signal peptides
- 4.1 Annotated sequence features
- 4.2 General Questions to prediction of transmembrane alpha-helices and signal peptides
- 4.3 TMHMM
- 4.4 Phobius and PolyPhobius
- 4.5 OCTOPUS and SPOCTOPUS
- 4.6 SignalP
- 4.7 TargetP
- 5 Task 3.4: Prediction of GO terms
The full description of this task can be found here.
Task 3.1: Secondary structure prediction
Task 3.2: Prediction of disordered regions
Task 3.3: Prediction of transmembrane alpha-helices and signal peptides
Annotated sequence features
The phenylalanine-4-hydroxylase has no annotated signal peptide or transmembrane helices.
The bacteriorhodopsin has the following annotated propeptide, topological domains and transmembrane helices:
|1 – 13||Propeptide|
|14 – 23||Topological domain||Extracellular|
|24 – 42||Transmembrane||Helical; Name=Helix A|
|43 – 56||Topological domain||Cytoplasmic|
|57 – 75||Transmembrane||Helical; Name=Helix B|
|76 – 91||Topological domain||Extracellular|
|92 – 109||Transmembrane||Helical; Name=Helix C|
|110 – 120||Topological domain||Cytoplasmic|
|121 – 140||Transmembrane||Helical; Name=Helix D|
|141 – 147||Topological domain||Extracellular|
|148 – 167||Transmembrane||Helical; Name=Helix E|
|168 – 185||Topological domain||Cytoplasmic|
|186 – 204||Transmembrane||Helical; Name=Helix F|
|205 – 216||Topological domain||Extracellular|
|217 – 236||Transmembrane||Helical; Name=Helix G|
|237 – 262||Topological domain||Cytoplasmic|
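To compare such an annotation with a predicted topology later on, it can help to expand the feature table into one label per residue. The following is a minimal sketch using the bacteriorhodopsin features listed above. The names `FEATURES`, `SYMBOL` and `topology_string`, as well as the single-letter labels (`p`, `o`, `i`, `M`), are chosen here for illustration only and are not part of any annotation standard.

```python
# Annotated bacteriorhodopsin features (1-based, inclusive), copied from the table above.
FEATURES = [
    (1, 13, "propeptide"),
    (14, 23, "extracellular"),
    (24, 42, "transmembrane"),
    (43, 56, "cytoplasmic"),
    (57, 75, "transmembrane"),
    (76, 91, "extracellular"),
    (92, 109, "transmembrane"),
    (110, 120, "cytoplasmic"),
    (121, 140, "transmembrane"),
    (141, 147, "extracellular"),
    (148, 167, "transmembrane"),
    (168, 185, "cytoplasmic"),
    (186, 204, "transmembrane"),
    (205, 216, "extracellular"),
    (217, 236, "transmembrane"),
    (237, 262, "cytoplasmic"),
]

# Illustrative one-letter labels (an assumption of this sketch).
SYMBOL = {
    "propeptide": "p",
    "extracellular": "o",
    "cytoplasmic": "i",
    "transmembrane": "M",
}


def topology_string(features):
    """Expand (start, end, kind) features into one label per residue."""
    length = max(end for _, end, _ in features)
    labels = ["?"] * length  # "?" marks residues not covered by any feature
    for start, end, kind in features:
        for pos in range(start, end + 1):
            labels[pos - 1] = SYMBOL[kind]
    return "".join(labels)


if __name__ == "__main__":
    topo = topology_string(FEATURES)
    print(topo)
    print("residues in transmembrane helices:", topo.count("M"))
```

A predicted topology written in the same per-residue form can then be compared with this string position by position.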
The retinol-binding protein 4 has the following annotated signal peptide (no transmembrane helices are annotated):
|1 – 18||Signal peptide|
The Insulin-like peptide INSL5 has the following annotated signal peptide (no transmembrane helices are annotated):
|1 – 22||Signal peptide|
The lysosome-associated membrane glycoprotein 1 has the following annotated signal peptide and transmembrane helices:
|1 – 28||Signal peptide|
|29 – 382||Topological domain||Lumenal|
|383 – 405||Transmembrane||Helical|
|406 – 417||Topological domain||Cytoplasmic|
The Amyloid beta A4 protein has the following annotated signal peptide and transmembrane helices:
|1 – 17||Signal peptide|
|18 – 699||Topological domain||Extracellular|
|700 – 723||Transmembrane||Helical|
|724 – 770||Topological domain||Cytoplasmic|
General Questions to prediction of transmembrane alpha-helices and signal peptides
Why is the prediction of transmembrane helices and signal peptides grouped together here?
Methods that predict only transmembrane helices often misclassify signal peptides as transmembrane helices. The reason is that both transmembrane helices and signal peptides consist mainly of hydrophobic residues. Such false predictions lead to an incorrect topology and can thus lead to a wrongly annotated protein function. To avoid this, most recent methods couple their transmembrane helix prediction with a signal peptide prediction.
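Why a purely hydrophobicity-based method confuses the two can be illustrated with a simple sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which is not mentioned in this protocol and is brought in here only as an example. The function names, the default window of 19 residues and the threshold of 1.6 are illustrative choices (values often quoted for this scale), and the test sequence is made up rather than taken from a real protein.

```python
# Kyte-Doolittle hydropathy values (assumption: used here only for illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full window of the given length."""
    if window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the threshold."""
    means = window_hydropathy(seq, window)
    return [i for i, mean in enumerate(means) if mean >= threshold]


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic stretch flanked by lysines.
    toy = "KKKKLLLLLLLLLKKKK"
    print(hydrophobic_windows(toy, window=9))
```

Such a profile only reports that a stretch is hydrophobic. It cannot tell whether that stretch is the hydrophobic core of an N-terminal signal peptide or a membrane-spanning helix, which is exactly the ambiguity described above.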
Description of different signal peptides
Signal peptides for the import to the endoplasmic reticulum (ER)
Import into the ER is usually required for the secretory pathway (to export proteins out of a cell). The import can occur either co-translationally (the nascent protein chain is translocated while still attached to the ribosome) or post-translationally (only the fully synthesized protein is transported to the ER). In both cases, the SEC pathway is mostly used.
The co-translational transport to the ER is mediated by the signal recognition particle (SRP). This particle recognizes the N-terminal signal sequence of the nascent polypeptide chain and transports it to the ER membrane, where the complex consisting of SRP, polypeptide chain and ribosome is recognized by the ER membrane-bound signal recognition particle receptor (SR). After this recognition, the polypeptide chain is imported into the ER lumen via the SEC channel in an ATP-dependent process.
The post-translational import into the ER lumen is mediated by chaperones, which guide the polypeptide chain to the SEC channel; the chain is then imported in an ATP-dependent process.
However, proteins can be imported not only into the ER lumen but also into the ER membrane. So far, five different types of import into the ER membrane are known.
Type 1 requires an N-terminal signal sequence and an internal stop-transfer anchor sequence, which is the part that is inserted into the membrane.
Types 2 and 3 do not require an N-terminal signal sequence; only an internal signal-anchor sequence is required. The difference between them is that type 2 has positively charged residues before the signal-anchor sequence (on the N-terminal side), whereas type 3 has positively charged residues after the signal-anchor sequence (on the C-terminal side). These charged residues of a transmembrane protein always remain in the cytosol. Thus, type 2 proteins have their N-terminus in the cytosol, whereas type 3 proteins have their C-terminus in the cytosol.
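The rule for distinguishing type 2 from type 3 can be sketched in a few lines: count the positively charged residues (K and R) on either side of the signal-anchor sequence and assign the side with more of them to the cytosol. This is a deliberately simplified illustration, not a published predictor. The function names, the flank length of 15 residues and the toy sequences are assumptions made for this example.

```python
def count_positive(segment):
    """Number of lysine and arginine residues in a sequence segment."""
    return sum(1 for aa in segment if aa in "KR")


def guess_anchor_type(seq, anchor_start, anchor_end, flank=15):
    """Guess the insertion type of a single signal-anchor sequence.

    anchor_start and anchor_end are 1-based, inclusive positions of the
    signal-anchor sequence. The flank length is an arbitrary illustrative choice.
    """
    n_flank = seq[max(0, anchor_start - 1 - flank):anchor_start - 1]
    c_flank = seq[anchor_end:anchor_end + flank]
    n_pos = count_positive(n_flank)
    c_pos = count_positive(c_flank)
    if n_pos > c_pos:
        return "type 2"  # positive charges N-terminal: N-terminus in the cytosol
    if c_pos > n_pos:
        return "type 3"  # positive charges C-terminal: C-terminus in the cytosol
    return "undetermined"


if __name__ == "__main__":
    # Made-up sequences: 20 leucines stand in for the signal-anchor sequence.
    print(guess_anchor_type("MKRKR" + "L" * 20 + "DDSSG", 6, 25))
    print(guess_anchor_type("MDDSS" + "L" * 20 + "KRKRK", 6, 25))
```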
Type 4-A and 4-B insertion is also known as multipass membrane insertion. These proteins do not have a single transmembrane helix like the proteins imported via types 1, 2 and 3; instead, they have several transmembrane helices. Hence, they contain multiple internal stop-transfer anchor sequences and internal signal-anchor sequences. The difference between the two is that in type 4-A both the N-terminus and the C-terminus are located in the cytosol, whereas type 4-B import results in an N-terminus residing in the ER lumen and a C-terminus residing in the cytosol.
In addition to the N-terminal import of transmembrane proteins, there is also the possibility of a C-terminal import. Because the targeting signal only emerges once translation is complete, these proteins are necessarily imported post-translationally.
Signal peptides for the import to the mitochondrion
There are several targets for import into the mitochondrion: proteins can be translocated to the matrix, the outer membrane, the inner membrane and the intermembrane space.
Proteins destined for the mitochondrial matrix have an N-terminal matrix-targeting sequence. Import into the matrix is assisted by chaperones (Hsc70), which guide the protein to the import pore complex of the mitochondrion. Translocation across the outer membrane is carried out by the TOM complex, and the subsequent translocation across the inner membrane is carried out by the TIM complex. After successful import into the matrix, the signal sequence is cleaved off by proteases.
Import into the inner membrane can occur in three ways. The first is the TIM22 pathway; proteins using this pathway need internal targeting sequences. The second is stop-transfer import, for which proteins need a stop-transfer sequence and an N-terminal matrix-targeting sequence. The third is called conservative sorting; proteins using this pathway also have an N-terminal targeting sequence and, in addition, internal Oxa1-targeting sequences, which are recognized by Oxa1 proteins that insert the protein into the membrane.
Proteins imported to the outer membrane of a mitochondrion usually have PORTA domains which are recognized by the TOB/SAM complex.
Signal peptides for the import to the chloroplast
Proteins heading to the chloroplast can target different parts of it, for example the stroma, the inner and outer membrane, the thylakoid membrane or the thylakoid lumen.
Usually these proteins have an N-terminal targeting sequence.
Signal peptides for the import to the peroxisome
Peroxisomal proteins can be imported into the lumen or into the membrane. Proteins imported into the lumen have either a peroxisomal targeting signal at the C-terminus (known as PTS1) or a targeting sequence close to the N-terminus (known as PTS2). Proteins imported into the membrane can have an internal membrane peroxisomal targeting signal (mPTS). However, not all peroxisomal membrane proteins have an mPTS. Those without one are imported into the ER and bud off from there together with the maturing peroxisome.
Signal peptides for the import to the nucleus and the export from the nucleus
Proteins which are imported to the nucleus require a nuclear localisation signal (NLS) which is recognized by importin. The NLS containing protein is then imported via the nuclear pore complex (NPC) to the nucleoplasm.
Proteins which are exported from the nucleus require a nuclear export signal (NES), which is recognized by exportin, a protein that binds to the NES of the cargo protein. In addition to exportin, a second component, Ran-GTP, is required to mediate the export through the NPC.
TMHMM
Details of the method
Authors: Sonnhammer, von Heijne & Krogh
This method is based on a hidden Markov model (HMM). The authors tried to model the 'grammar' of transmembrane proteins in order to predict their topology more accurately than methods that rely only on, for example, propensity values and do not consider the topological constraints of this class of proteins.
For each feature, TMHMM defines one or more HMM states that represent it. For example, the transmembrane helix is modeled by three sub-models: one for the helix core, one for the cap on the cytoplasmic side of the membrane, and one for the cap on the non-cytoplasmic side. In addition to this helix model, the authors created sub-models for the cytoplasmic loop and the non-cytoplasmic loop, as well as a sub-model for the globular region. Each sub-model corresponds to one or more states in the HMM. For example, the globular sub-model consists of only one HMM state, whereas the helix core and the caps are modeled by multiple HMM states.
The 'grammar' is incorporated into the HMM by defining the possible transitions from one sub-model to another. For example, a cytoplasmic loop region can only be followed by a cytoplasmic cap region and then the helix core, after which the path continues, via the non-cytoplasmic cap, to either a short or a long non-cytoplasmic loop, and so on.
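The idea of encoding the grammar as allowed transitions can be illustrated with a small transition table. The sketch below is a strongly simplified toy version and not the actual TMHMM architecture: the caps are folded into the helix, the helix is split into two directed states so that every helix has to cross the membrane, and the globular region is left out. All state names and the function `is_valid_topology` are invented for this example.

```python
# Allowed transitions between regions (simplified toy grammar, not the real TMHMM model).
ALLOWED = {
    "cyt_loop": {"helix_outward"},
    "helix_outward": {"noncyt_short_loop", "noncyt_long_loop"},
    "noncyt_short_loop": {"helix_inward"},
    "noncyt_long_loop": {"helix_inward"},
    "helix_inward": {"cyt_loop"},
}


def is_valid_topology(path):
    """Check that a sequence of regions only uses allowed transitions."""
    if not path:
        return False
    if any(state not in ALLOWED for state in path):
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    # Two helices with a short outside loop between them: allowed.
    print(is_valid_topology(
        ["cyt_loop", "helix_outward", "noncyt_short_loop", "helix_inward", "cyt_loop"]
    ))
    # A helix that starts and ends on the cytoplasmic side: forbidden.
    print(is_valid_topology(["cyt_loop", "helix_outward", "cyt_loop"]))
```

In a real HMM these transitions carry probabilities instead of being simply allowed or forbidden, but a forbidden transition corresponds to a probability of zero, so the decoded path can never violate the grammar.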
This method predicts the transmembrane helices and whether the regions between them lie in the cytoplasm (in) or outside of it (out).
Required information for the prediction
Users only need the amino acid sequence of their query protein. The transition and emission probabilities are derived from 160 transmembrane protein sequences.