|Author||David T. Jones|
|ML Method||Neural Network|
|Architecture||feed-forward back-propagation, single hidden layer|
The idea of this method is to use information from evolutionarily related proteins to predict the secondary structure of a new amino acid sequence. PSI-BLAST is used to find related sequences and to build a position-specific scoring matrix (PSSM). This matrix is then processed by a neural network, a machine learning model that was constructed and trained to predict the secondary structure of the input sequence.
The algorithm is divided into three stages: generation of a sequence profile, prediction of an initial secondary structure, and filtering of the predicted structure.
The sequence profile is generated by PSI-BLAST and then normalized by PSIPRED.
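The normalization step can be illustrated with a minimal sketch. The text above does not say which function PSIPRED applies, so the logistic squashing of each raw score into the range 0 to 1 used here is an assumption, and the names `logistic` and `normalize_profile` as well as the toy score values are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_profile(pssm):
    """Scale every raw profile score independently (assumed normalization)."""
    return [[logistic(score) for score in row] for row in pssm]


# Toy profile: 2 positions x 3 columns (a real profile has 20 columns).
pssm = [[-3, 0, 5], [2, -1, 0]]
scaled = normalize_profile(pssm)
```

A raw score of 0 maps to exactly 0.5, strongly negative scores approach 0, and strongly positive scores approach 1, so all inputs to the network share one bounded range.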
The initial secondary structure is predicted by a neural network. For each residue in the sequence, the network is fed a window of 15 residues centered on that residue. Each window position contributes its 20 profile values plus one additional unit, which indicates whether the window extends beyond the N or C terminus of the chain. This results in an input layer of 315 units, divided into 15 groups of 21 units. The network has a single hidden layer of 75 units and 3 output nodes (one for each secondary structure element: helix, sheet, coil).
A second neural network filters the structure predicted by the first network. This network is also fed a window of 15 positions. Each position contributes the three output scores of the first network, and the indicator for a window position beyond a chain terminus is passed on as well. This results in 60 input units, divided into 15 groups of four. The network has a single hidden layer of 60 units and three output nodes (one for each secondary structure element: helix, sheet, coil).
The three final output nodes deliver a score for each secondary structure element for the residue at the central position of the window. PSIPRED predicts the secondary structure element with the highest score.
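The input encoding of the first network can be sketched as follows. This is an illustration of the arithmetic (15 positions x 21 units = 315), not PSIPRED's own code. The function name `encode_windows`, the zero padding for positions beyond the chain, and the toy profile are assumptions.

```python
WINDOW = 15          # residues per window
HALF = WINDOW // 2   # 7 residues on each side of the central residue
N_AA = 20            # profile values per residue


def encode_windows(profile):
    """Build one 315-unit input vector per residue.

    profile: list of rows, each holding 20 normalized profile values.
    Each window position contributes 20 values plus 1 terminus flag.
    """
    n = len(profile)
    inputs = []
    for center in range(n):
        vec = []
        for pos in range(center - HALF, center + HALF + 1):
            if 0 <= pos < n:
                vec.extend(profile[pos])
                vec.append(0.0)            # position lies inside the chain
            else:
                vec.extend([0.0] * N_AA)   # assumed padding beyond a terminus
                vec.append(1.0)            # flag: window spans a terminus
        inputs.append(vec)
    return inputs


# Toy profile for a 30-residue chain.
profile = [[0.5] * N_AA for _ in range(30)]
inputs = encode_windows(profile)
```

For the first residue, the seven leading window positions fall before the N terminus, so their flag units are set while their profile units stay at zero.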
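A rough sketch of the filtering stage, under stated assumptions: the weights are random placeholders rather than trained PSIPRED parameters, the sigmoid activation is assumed, and the names `encode_stage2`, `layer`, and `random_layer` are hypothetical. It only shows how the shapes fit together (15 x 4 = 60 inputs, 60 hidden units, 3 outputs).

```python
import math
import random

WINDOW = 15
HALF = WINDOW // 2


def encode_stage2(scores):
    """scores: one (helix, sheet, coil) triple per residue from the first network."""
    n = len(scores)
    inputs = []
    for center in range(n):
        vec = []
        for pos in range(center - HALF, center + HALF + 1):
            if 0 <= pos < n:
                vec.extend(scores[pos])
                vec.append(0.0)                # inside the chain
            else:
                vec.extend([0.0, 0.0, 0.0])    # assumed padding
                vec.append(1.0)                # terminus indicator
        inputs.append(vec)
    return inputs


def layer(vec, weights, biases):
    """One fully connected layer with an (assumed) sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, vec)) + b)))
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Placeholder weights; a real network would load trained values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w_hidden, b_hidden = random_layer(60, 60, rng)   # 60 inputs -> 60 hidden units
w_out, b_out = random_layer(60, 3, rng)          # 60 hidden units -> 3 outputs

stage1_scores = [(0.8, 0.1, 0.1)] * 20           # toy first-stage output
stage2_inputs = encode_stage2(stage1_scores)
filtered = [layer(layer(x, w_hidden, b_hidden), w_out, b_out) for x in stage2_inputs]
```

Because the second network sees the first-stage scores of neighboring residues, it can smooth out isolated predictions that are unlikely in context.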
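The final decision is an arg-max over the three scores, which can be written in a few lines. The output order (helix, sheet, coil) and the one-letter codes H, E, and C are assumptions made for this illustration, and `call_states` is a hypothetical name.

```python
STATES = ("H", "E", "C")  # assumed order: helix, sheet (strand), coil


def call_states(scores):
    """Pick, for every residue, the state whose output score is highest."""
    calls = []
    for row in scores:
        best = max(range(len(STATES)), key=lambda i: row[i])
        calls.append(STATES[best])
    return "".join(calls)


# Toy scores for three residues.
scores = [(0.9, 0.05, 0.05), (0.2, 0.7, 0.1), (0.1, 0.2, 0.7)]
prediction = call_states(scores)  # "HEC"
```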
How to run it locally?
Change the following lines in /apps/psipred_3.2/runpsipred:
# The name of the BLAST data bank
set dbname = /data/blast/swiss/uniprot_sprot.fasta
# Where the NCBI programs have been installed
set ncbidir = /apps/blast_old/bin
# Where the PSIPRED V2 programs have been installed
set execdir = /apps/psipred_3.2/src
# Where the PSIPRED V2 data files have been installed
set datadir = /apps/psipred_3.2/data
You should then be able to run PSIPRED from /apps/psipred_3.2/:
sudo ./runpsipred <path to your fasta-file>
The results are then written to the directory /apps/psipred_3.2/ itself, alongside the program files, which is not ideal.