Secondary structure prediction

From Protein Prediction 1 Summer Semester 2016 For Informaticians

This page is organized as follows:

Keywords are terms that you need to understand to follow the lecture. To test your knowledge, try to define and explain these keywords (in a few sentences). If you cannot think of anything to say about a keyword, read up on that topic.

Under Sources you will find suggested literature (textbooks, web pages, articles) that will help you understand the topic. Use it to fill gaps in your knowledge of the keywords, to answer the questions, and to solve the tasks. None of it is required reading, but it covers the topic in more detail and complements the lecture. You are free to use any other source you like.

In the section Exercise we provide Questions and Hands-on tasks that allow you to test and further your knowledge of the given topic. During the exercise session you can ask questions pertaining to the topic (keywords and exercises).


Keywords

  • Protein secondary structure
  • Alpha-helix
  • Beta-strand, (parallel and antiparallel) beta-sheet
  • Loop, coil
  • Protein disorder
  • Solvent accessible surface
  • Homology information
  • Features: amino acid distribution, hydrophobicity, (disorder), solvent accessibility
  • Machine learning
  • Training set, test set, cross-validation
  • DSSP
  • Ramachandran plot




Questions

  • Can we predict secondary structure from protein sequence?
  • What information do we obtain when predicting protein secondary structure? What features are predicted?
  • How can we estimate the performance of secondary structure prediction methods?
  • Most often, secondary structure prediction refers to the prediction of alpha-helices, beta-sheets and random coils. What other features of protein structure can be considered secondary structure and be predicted?
  • List two secondary structure prediction methods.
  • Initially, prediction methods often focused on alpha-helices or underpredicted beta-sheets. What makes beta-sheets difficult to recognize with a window-based prediction method?
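
One common way to approach the performance question above is the per-residue three-state accuracy (Q3): the fraction of residues whose predicted state (H, E or C) matches the state assigned from the experimental structure, for example by DSSP. The following is a minimal sketch, not a reference implementation. The function names `reduce_dssp` and `q3` are hypothetical, the 8-to-3 state mapping (H, G, I to H; E, B to E; everything else to C) is one common convention among several, and the strings in the usage lines are toy data, not real assignments.

```python
# One common 8-state to 3-state reduction (an assumption; other conventions exist).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Collapse a DSSP 8-state string to the three states H, E and C."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)


def q3(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy example: a made-up DSSP string and a made-up prediction.
observed = reduce_dssp("-HHHHGGT-EEEB-")
predicted = "CHHHHHCCCEEECC"
print(observed, round(q3(predicted, observed), 3))
```

Q3 alone hides how well whole helices and strands are recovered, so it is usually reported on a test set that shares no close homologs with the training set and complemented by per-state or per-segment measures.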

Hands-on tasks

  • Test the following secondary structure prediction methods and describe their output: RePROF and PROFsec (available via PredictProtein) and PsiPred
    • Use these example proteins: UniProt IDs P10775 and Q9X0E6
    • Download the sequences of these proteins from UniProt and use them as input for the secondary structure predictions
    • Compare the predictions
    • Look up the proteins in the PDB; experimental structures are available. You can search the PDB by UniProt ID, or find the PDB ID in the UniProt entry and follow it to the PDB. Do the predictions match the 3D protein structure?
  • Do the number and the length of secondary structure elements differ, on average, between real protein sequences and randomly shuffled sequences? To answer this question, you need to:
    • Write your own program to generate random amino acid sequences
    • The composition of amino acids in random sequences should be on average the same as in real protein sequences
    • Predict secondary structure for your random sequences
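
The random-sequence task above could be started along the following lines. This is a sketch under stated assumptions rather than the expected solution. It shows two ways to obtain random sequences: shuffling a real sequence, which preserves its composition exactly, and sampling residues independently from background frequencies estimated on a set of real sequences, which matches the composition only on average. It also includes a small helper that counts segments and their mean length in a three-state prediction string. All function names are hypothetical, and the sequences in the usage lines are toy placeholders, not real proteins.

```python
import random
from collections import Counter
from itertools import groupby


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq (composition is preserved exactly)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def background_frequencies(seqs) -> dict:
    """Estimate relative amino acid frequencies from a collection of sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def sample_sequence(freqs: dict, length: int, rng: random.Random) -> str:
    """Draw residues independently according to the given frequencies."""
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def segment_summary(states: str) -> dict:
    """Map each state to (number of segments, mean segment length)."""
    lengths = {}
    for state, run in groupby(states):
        lengths.setdefault(state, []).append(len(list(run)))
    return {s: (len(v), sum(v) / len(v)) for s, v in lengths.items()}


# Toy placeholders; replace with real sequences downloaded from UniProt.
rng = random.Random(0)  # fixed seed so the run is reproducible
real = ["MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVDAFKLLGE"]
shuffled = shuffle_sequence(real[0], rng)
sampled = sample_sequence(background_frequencies(real), len(real[0]), rng)
print(shuffled, sampled)
print(segment_summary("CCHHHHCEE"))
```

Running `segment_summary` on the predictions for real and random sequences gives the per-state segment counts and mean lengths that the comparison asks for.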