Sequence-based predictions (PKU)

From Bioinformatikpedia
Revision as of 15:16, 14 May 2012 by Boidolj (Signalpeptides)

Short Task Description

Go read..

and while you're at it, read some more

Our task is to use the primary sequence of our protein (and of some other example proteins) to predict, with different tools, comparatively simple features such as secondary structure, signal peptides and transmembrane regions, as well as more advanced ones such as GO terms and similar functional annotations. The commands and programs used are listed in the relevant sections (if short and interesting enough) or linked on their own pages.

Secondary Structure Prediction

Test several secondary structure (SS) predictors and compare the results to the 'gold standard' of DSSP.

ReProfSeq

reprof -i <uniprotID>.fasta
egrep -v "^#|^No" <uniprotID>.reprof |awk '{print $3}'|tr -d '\n' > <uniprotID>_reprof.secstruc
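The same extraction can be sketched in Python, which may be handier when many files have to be post-processed. The sketch mirrors the shell pipeline under the same assumptions: comment lines start with `#`, the column header line starts with `No`, and the predicted state is the third whitespace-separated column. The function name `reprof_secstruc` and the toy input are made up for illustration.

```python
def reprof_secstruc(lines):
    """Collect the third column of every data line, like the egrep/awk/tr pipeline."""
    states = []
    for line in lines:
        # Skip comments and the column header, as `egrep -v "^#|^No"` does.
        if line.startswith("#") or line.startswith("No"):
            continue
        fields = line.split()
        # Lines with fewer than three columns contribute nothing (awk prints an empty field).
        if len(fields) >= 3:
            states.append(fields[2])
    return "".join(states)


# Toy input (NOT real reprof output), only to show the call.
toy_lines = ["# comment", "No AA PHEL", "1 M L", "2 K H", "3 A H"]
print(reprof_secstruc(toy_lines))
```

For a real file, passing an open file handle instead of the toy list should work the same way, since the function only iterates over lines.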

PsiPred

Webserver here

grep Pred: <uniprotID>.psipred_out |cut -d " " -f2|tr -d '\n' > <uniprotID>_psipred.secstruc
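A Python counterpart of this one-liner could look like the sketch below. It assumes, as the command does, that each prediction line contains `Pred:` followed by a single space and the predicted states. The function name `psipred_secstruc` and the toy lines are illustrative only.

```python
def psipred_secstruc(lines):
    """Join the second space-separated field of every line containing 'Pred:'."""
    chunks = []
    for line in lines:
        if "Pred:" in line:
            parts = line.rstrip("\n").split(" ")
            # Like `cut -d " " -f2`: take the field right after the first space.
            if len(parts) > 1:
                chunks.append(parts[1])
    return "".join(chunks)


# Toy input (NOT real PsiPred output), only to show the call.
toy_lines = ["Conf: 9987\n", "Pred: CCHH\n", "  AA: MKAL\n", "\n", "Pred: EC\n"]
print(psipred_secstruc(toy_lines))
```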

DSSP

Download dssp here and get PDB files matching the Uniprot entries

dssp -i <PDBID>.pdb > <PDBID>.dssp
tail -n+29 <PDBID>.dssp |cut -c17|tr ' ' '-'|tr -d '\n' > <PDBID>_dssp.secstruc
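The `tail`/`cut`/`tr` pipeline can also be written as a small Python function. This is a sketch under the same assumptions as the command: the first 28 lines are header, and the secondary structure code sits in character column 17 (index 16 in Python). The header length is exposed as the parameter `header_lines` because the hard-coded value is inherited from the command and may not hold for every DSSP file. The function name is hypothetical.

```python
def dssp_secstruc(lines, header_lines=28):
    """Mirror `tail -n+29 | cut -c17 | tr ' ' '-' | tr -d '\\n'`."""
    states = []
    for line in list(lines)[header_lines:]:
        line = line.rstrip("\n")
        # Lines shorter than 17 characters carry no state and are skipped.
        if len(line) >= 17:
            state = line[16]
            # DSSP leaves the column blank for "no assigned structure".
            states.append("-" if state == " " else state)
    return "".join(states)


# Toy input: 28 dummy header lines plus three fake residue lines.
toy = ["header"] * 28 + [" " * 16 + "H" + " rest", " " * 17 + " rest", " " * 16 + "E"]
print(dssp_secstruc(toy))
```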

Note that a PDB file may contain only part of the structure, or more than one chain, e.g. only residues 117-424 for PAH.

The structure was therefore aligned to the sequence manually.
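Once the covered residue range is known, the full-length prediction has to be cut down to that range before it can be compared with the DSSP string. A minimal sketch of that step follows. It assumes the structure covers one contiguous stretch without internal gaps and that the residue numbers refer to positions in the UniProt sequence; neither is guaranteed, which is why a manual check remains advisable. The function name is made up.

```python
def restrict_to_structure(prediction, first_residue, last_residue):
    """Return the slice of a per-residue string covered by the structure.

    Residue numbers are 1-based and inclusive, as in the 117-424 example.
    """
    if not 1 <= first_residue <= last_residue <= len(prediction):
        raise ValueError("residue range lies outside the sequence")
    return prediction[first_residue - 1:last_residue]


# Toy prediction string, only to show the indexing.
print(restrict_to_structure("CCHHHHEEEC", 3, 6))
```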

Script to calculate Q3 and SOV: ss_score.py
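For orientation, Q3 is the percentage of residues whose predicted three-state class matches the observed one. The sketch below is not the contents of ss_score.py; it is a minimal illustration that assumes a common 8-to-3 state reduction (H, G, I to helix; E, B to strand; everything else to coil) and two strings of equal length that are already aligned. SOV is left out because it needs segment overlaps rather than per-residue matches.

```python
# Assumed reduction of DSSP states to three classes (a common convention,
# not confirmed by the source). Any other symbol, e.g. L, T, S or '-', becomes C.
TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(states):
    """Map a string of secondary structure symbols to H/E/C."""
    return "".join(TO_THREE_STATE.get(symbol, "C") for symbol in states)


def q3(predicted, observed):
    """Percentage of positions where the three-state classes agree."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("strings must not be empty")
    pred3 = to_three_state(predicted)
    obs3 = to_three_state(observed)
    matches = sum(p == o for p, o in zip(pred3, obs3))
    return 100.0 * matches / len(obs3)


# Toy strings, only to show the call.
print(q3("HHHLLEEL", "HGH-TEB-"))
```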

Disordered Regions

Transmembrane Helices

Signal Peptides

P47863 is from rat and the remaining proteins are human, so the organism type is set to eukaryotes (-t euk). Command:

signalp -trunc 70 -t euk <ID>.fasta > <ID>.sigP_out
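To run the command for all four proteins, the command lines can be generated rather than typed. The sketch below only builds and prints them, using the flags shown above and nothing else. The helper name `signalp_command` is hypothetical, and the ID list is taken from the result table on this page.

```python
import shlex


def signalp_command(uniprot_id, organism_type="euk", trunc=70):
    """Build the argument list for one SignalP run with the flags used above."""
    return ["signalp", "-trunc", str(trunc), "-t", organism_type, f"{uniprot_id}.fasta"]


for uid in ["P00439", "P02768", "P11279", "P47863"]:
    # The redirect target follows the <ID>.sigP_out naming used above.
    print(shlex.join(signalp_command(uid)), ">", f"{uid}.sigP_out")
```

The printed lines can be pasted into a shell. Passing the list to `subprocess.run` with `stdout` pointed at an open file would be another option.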
SignalP
UniprotID | SigP-HMM | SigP_NN summary | prediction | experimental
P00439 | 0% signal peptide or anchor | 5 times No | no signal peptide or anchor | no evidence for peptide or anchor
P02768 | 100% signal peptide, 0% signal anchor | 5 times Yes | signal peptide | confirmed signal peptide
P11279 | 100% signal peptide, 0% signal anchor | 5 times Yes | signal peptide | confirmed signal peptide
P47863 | 52.6% signal peptide, 45.7% signal anchor | 4 No, 1 Yes | signal peptide | no evidence for peptide or anchor
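The agreement between prediction and experimental annotation in the table can be tallied as below. The booleans are transcribed from the "prediction" and "experimental" columns; treating "no evidence" as a negative is an assumption made for this sketch, not a statement from the source.

```python
# (predicted signal peptide, experimentally supported signal peptide)
SIGNALP_RESULTS = {
    "P00439": (False, False),
    "P02768": (True, True),
    "P11279": (True, True),
    "P47863": (True, False),  # predicted, but no experimental evidence
}


def confusion_counts(results):
    """Count true/false positives and negatives over all proteins."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for predicted, observed in results.values():
        if predicted and observed:
            counts["tp"] += 1
        elif predicted and not observed:
            counts["fp"] += 1
        elif not predicted and not observed:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


print(confusion_counts(SIGNALP_RESULTS))
```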

GO Terms

GOPet

GOPet predicted GO terms
GO ID | Aspect | Confidence | GO Term | True/False
GO:0003824 | F | 94% | catalytic activity | true
GO:0016491 | F | 88% | oxidoreductase activity | true
GO:0004497 | F | 87% | monooxygenase activity | true
GO:0004505 | F | 84% | phenylalanine 4-monooxygenase activity | true
GO:0004510 | F | 80% | tryptophan 5-monooxygenase activity | false
GO:0004511 | F | 79% | tyrosine 3-monooxygenase activity | false
GO:0046872 | F | 78% | metal ion binding | true
GO:0005506 | F | 78% | iron ion binding | true
GO:0008199 | F | 72% | ferric iron binding | false
GO:0008198 | F | 72% | ferrous iron binding | false
GO:0016597 | F | 71% | amino acid binding | true
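To summarise the table, one can compute the fraction of predicted terms marked true, optionally restricted to terms above a confidence cutoff. The data below are transcribed from the table; the function name and the cutoff parameter are illustrative choices.

```python
# (GO ID, confidence in percent, marked true in the table)
GOPET_PREDICTIONS = [
    ("GO:0003824", 94, True),
    ("GO:0016491", 88, True),
    ("GO:0004497", 87, True),
    ("GO:0004505", 84, True),
    ("GO:0004510", 80, False),
    ("GO:0004511", 79, False),
    ("GO:0046872", 78, True),
    ("GO:0005506", 78, True),
    ("GO:0008199", 72, False),
    ("GO:0008198", 72, False),
    ("GO:0016597", 71, True),
]


def precision(predictions, min_confidence=0):
    """Fraction of terms with confidence >= min_confidence that are marked true."""
    kept = [is_true for _, conf, is_true in predictions if conf >= min_confidence]
    if not kept:
        raise ValueError("no predictions at or above the cutoff")
    return sum(kept) / len(kept)


print(f"all terms: {precision(GOPET_PREDICTIONS):.2f}")
print(f"confidence >= 80%: {precision(GOPET_PREDICTIONS, 80):.2f}")
```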

ProtFun

# Functional category                  Prob     Odds
 Amino_acid_biosynthesis           => 0.210    9.530
 Biosynthesis_of_cofactors            0.229    3.180
 Cell_envelope                        0.034    0.563
 Cellular_processes                   0.063    0.867
 Central_intermediary_metabolism      0.061    0.970
 Energy_metabolism                    0.343    3.815
 Fatty_acid_metabolism                0.025    1.889
 Purines_and_pyrimidines              0.392    1.615
 Regulatory_functions                 0.020    0.125
 Replication_and_transcription        0.118    0.438
 Translation                          0.204    4.630
 Transport_and_binding                0.024    0.060

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.724    2.527
 Nonenzyme                            0.276    0.387

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.154    0.738
 Transferase    (EC 2.-.-.-)          0.271    0.785
 Hydrolase      (EC 3.-.-.-)          0.083    0.261
 Lyase          (EC 4.-.-.-)          0.047    1.002
 Isomerase      (EC 5.-.-.-)       => 0.100    3.138
 Ligase         (EC 6.-.-.-)          0.019    0.370

# Gene Ontology category               Prob     Odds
 Signal_transducer                    0.075    0.350
 Receptor                             0.003    0.016
 Hormone                              0.001    0.206
 Structural_protein                   0.005    0.166
 Transporter                          0.025    0.229
 Ion_channel                          0.010    0.168
 Voltage-gated_ion_channel            0.005    0.232
 Cation_channel                       0.010    0.215
 Transcription                        0.043    0.334
 Transcription_regulation             0.032    0.255
 Stress_response                      0.010    0.118
 Immune_response                      0.012    0.140
 Growth_factor                        0.006    0.407
 Metal_ion_transport                  0.009    0.020
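The ProtFun output above is regular enough to parse: each section starts with a `#` header ending in `Prob Odds`, each data line ends with two numbers, and `=>` marks the category ProtFun selects (in the output shown, the marked line is also the one with the highest odds in its section). The parser below is a sketch based only on that layout as it appears here; the function names are made up.

```python
def parse_protfun(text):
    """Return {section title: [(category, prob, odds, marked), ...]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            words = line[1:].split()
            if words[-2:] == ["Prob", "Odds"]:
                words = words[:-2]
            current = " ".join(words)
            sections[current] = []
            continue
        parts = line.rsplit(None, 2)  # name (may contain spaces), prob, odds
        if current is None or len(parts) != 3:
            continue
        name, prob, odds = parts
        marked = name.endswith("=>")
        if marked:
            name = name[:-2]
        name = " ".join(name.split())  # collapse the padding inside names
        try:
            sections[current].append((name, float(prob), float(odds), marked))
        except ValueError:
            continue  # not a data line
    return sections


def marked_predictions(sections):
    """Map each section to the category flagged with '=>', if any."""
    return {title: name
            for title, rows in sections.items()
            for name, _, _, marked in rows if marked}


sample = """# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.724    2.527
 Nonenzyme                            0.276    0.387
"""
print(marked_predictions(parse_protfun(sample)))
```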

Additional Predictions:

Feature | Output summary
SignalP 3.0 | No signal peptide cleavage site predicted
ProP 1.0 | 1 propeptide cleavage site predicted at position: 74
TargetP 1.1 | No high confidence targeting predition
NetPhos 2.0 | 22 putative phosphorylation sites at positions 16 23 40 70 110 196 250 303 339 391 411 22 105 189 278 418 24 77 198 268 277 317
NetOGlyc 3.1 | No O-glycosylated sites predicted
NetNGlyc 1.0 | 2 putative N-glycosylated sites at positions 61 376
TMHMM 2.0 | No TM helices predicted