Psipred


PSIPRED

Basic Information

Author: David T. Jones
Year: 1999
Reference: PubMed 10493868
ML Method: Neural Network
Architecture: feed-forward, back-propagation, single hidden layer

The idea of this method is to use information from evolutionarily related proteins to predict the secondary structure of a new amino acid sequence. PSI-BLAST is used to find related sequences and to build a position-specific scoring matrix. This matrix is processed by a neural network (a machine learning method), which was constructed and trained to predict the secondary structure of the input sequence.

Details

The algorithm is divided into three stages: generation of a sequence profile, prediction of an initial secondary structure, and filtering of the predicted structure.

The generation of a sequence profile is done by PSI-BLAST. This profile is normalized by PSIPRED.
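The normalization step can be illustrated with a short sketch. It assumes that the raw profile scores are scaled into the range 0 to 1 with the standard logistic function, as described in the original publication; the function names and the toy profile are hypothetical and not taken from the PSIPRED source code.

```python
import math

def logistic(score):
    """Map a raw profile score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def normalize_profile(pssm):
    """Apply the logistic function to every score of a profile.

    `pssm` is a list of rows, one row of 20 scores per sequence position.
    """
    return [[logistic(score) for score in row] for row in pssm]

# Hypothetical two-position profile, made up for illustration only.
raw = [[0] * 20, [-3] * 10 + [5] * 10]
scaled = normalize_profile(raw)
```

A score of 0 maps to 0.5, negative scores map below 0.5 and positive scores above it, so all input values of the network lie in the same bounded range.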

The initial secondary structure is predicted by a neural network. For each amino acid in the sequence, the network is fed a window of 15 residues centered on that amino acid. Each window position carries 20 profile values plus one additional unit, which indicates whether the window extends beyond the N or C terminus of the chain. This results in an input layer of 315 units, divided into 15 groups of 21 units. The network has a single hidden layer of 75 units and 3 output nodes (one for each secondary structure element: helix, sheet, coil).
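How the 315 input units arise from the window can be sketched as follows. This is a minimal illustration, not the PSIPRED implementation: the function name is hypothetical, and the exact encoding of positions beyond the chain ends (all profile values zero, terminus unit set to 1) is an assumption.

```python
WINDOW = 15          # window width given in the text
PROFILE_VALUES = 20  # one normalized profile value per amino acid type

def encode_window(profile, center):
    """Return the 15 * 21 = 315 input values for the residue at `center`.

    `profile` is a list of rows with 20 normalized values each.
    Assumption: window positions that fall outside the chain are encoded
    as 20 zeros followed by a terminus-indicator unit set to 1.
    """
    half = WINDOW // 2
    inputs = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            inputs.extend(profile[pos])
            inputs.append(0.0)   # inside the chain
        else:
            inputs.extend([0.0] * PROFILE_VALUES)
            inputs.append(1.0)   # beyond the N or C terminus
    return inputs

# Hypothetical profile of a 30-residue sequence.
profile = [[0.5] * PROFILE_VALUES for _ in range(30)]
first = encode_window(profile, 0)    # window overlaps the N terminus
middle = encode_window(profile, 15)  # window lies fully inside the chain
```

For the first residue, seven of the fifteen window positions lie before the start of the chain, so seven terminus units are set; for a residue in the middle of the chain none are.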

A second neural network filters the structure predicted by the first network. It is also fed a window of 15 positions, each carrying the three output scores of the first network plus the indicator of whether the window extends beyond a chain terminus. This results in 60 input units, divided into 15 groups of four. The network has a single hidden layer of 60 units and three output nodes (one for each secondary structure element: helix, sheet, coil).
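The layer sizes of the second network (60 inputs, 60 hidden units, 3 outputs) can be made concrete with a small forward-pass sketch. The weights here are random placeholders, the sigmoid activation is an assumption, and all names are hypothetical; the trained PSIPRED weights are shipped in its data directory and are not reproduced here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    """Random placeholder weights: each row holds n_in weights plus a bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layer, inputs):
    """One fully connected layer with sigmoid activation (assumed)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in layer]

rng = random.Random(0)
hidden_layer = make_layer(60, 60, rng)  # 60 inputs -> 60 hidden units
output_layer = make_layer(60, 3, rng)   # 60 hidden units -> 3 outputs

# Placeholder input: 15 window positions with four values each.
window_input = [0.0] * (15 * 4)
hidden = forward(hidden_layer, window_input)
outputs = forward(output_layer, hidden)  # scores for helix, sheet, coil
```

The first network has the same shape with different sizes (315 inputs, 75 hidden units, 3 outputs), so the same two functions would cover it.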

The three final output nodes deliver a score for each secondary structure element for the central position of the window. PSIPRED assigns the element with the highest score to that position.
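Choosing the element with the highest score amounts to a simple argmax per position. The sketch below assumes the score order helix, sheet, coil and uses the one-letter codes H, E and C as labels; both the order and the example scores are assumptions for illustration.

```python
# Assumed order of the three output nodes: helix, sheet, coil.
STATES = ("H", "E", "C")

def predict_state(scores):
    """Return the label of the highest of the three scores."""
    best = max(range(len(STATES)), key=lambda i: scores[i])
    return STATES[best]

def predict_sequence(all_scores):
    """Apply predict_state to the scores of every sequence position."""
    return "".join(predict_state(scores) for scores in all_scores)

# Hypothetical scores for three consecutive positions.
example = [(0.1, 0.2, 0.7), (0.8, 0.1, 0.1), (0.2, 0.6, 0.2)]
prediction = predict_sequence(example)  # "CHE"
```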

FAQ

How to run it locally?

Change the following lines in /apps/psipred_3.2/runpsipred:

# The name of the BLAST data bank
set dbname = /data/blast/swiss/uniprot_sprot.fasta

# Where the NCBI programs have been installed
set ncbidir = /apps/blast_old/bin

# Where the PSIPRED V2 programs have been installed
set execdir = /apps/psipred_3.2/src

# Where the PSIPRED V2 data files have been installed
set datadir = /apps/psipred_3.2/data

You should then be able to run PSIPRED from /apps/psipred_3.2/:

sudo ./runpsipred <path to your fasta-file>

The results can then be found in the directory /apps/psipred_3.2/ itself, which is not ideal because they end up next to the program files.