Difference between revisions of "Sequence-based predictions Protocol TSD"
Revision as of 14:14, 20 May 2012
Secondary Structure
Get the sequences <source lang="bash">
#!/bin/bash
cd ../input
wget http://www.uniprot.org/uniprot/P10775.fasta
wget http://www.uniprot.org/uniprot/Q9X0E6.fasta
wget http://www.uniprot.org/uniprot/Q08209.fasta
wget http://www.uniprot.org/uniprot/P06865.fasta
</source>
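The same downloads can also be written as a loop over the accession numbers, which makes the list easier to extend. This is only a sketch: the function name `fetch_uniprot` is made up for illustration, while the URL pattern is the one used above.

```bash
#!/bin/bash
# Sketch: download one FASTA file per UniProt accession.
# fetch_uniprot is a hypothetical helper name, not part of the original protocol.
fetch_uniprot() {
    for id in "$@"; do
        wget "http://www.uniprot.org/uniprot/${id}.fasta"
    done
}

# Usage (run from the input directory):
# fetch_uniprot P10775 Q9X0E6 Q08209 P06865
```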
For DSSP, first get the executable <source lang="bash">
wget ftp://ftp.cmbi.ru.nl/pub/software/dssp/dssp-2.0.4-linux-amd64
chmod +x dssp-2.0.4-linux-amd64
</source>
Get the PDB files for the corresponding UniProt entries <source lang="bash">
#!/bin/bash
cd ../input
wget http://www.pdb.org/pdb/files/2BNH.pdb
wget http://www.pdb.org/pdb/files/1KR4.pdb
wget http://www.pdb.org/pdb/files/1AUI.pdb
wget http://www.pdb.org/pdb/files/2GJX.pdb
</source>
Start the predictions <source lang="bash">
#!/bin/bash
cd ../input/
# Run ReProf on every FASTA file
for file in *.fasta ; do
    reprof -i "$file" -o ../prediction/
done
# Run DSSP on every PDB file
for file in *.pdb ; do
    ./../bin/dssp-2.0.4-linux-amd64 -i "$file" -o "../prediction/$file.dssp"
done
</source>
For PSIPred use the webserver
Make ReProf output more readable
<source lang="bash">
#!/bin/bash
cd ../prediction/
for file in *.reprof ; do
    grep -v -P "^(#|No)" "$file" | cut -f 2 | tr -d '\n' > "$file.parsed"
    echo "" >> "$file.parsed"
    grep -v -P "^(#|No)" "$file" | cut -f 3 | tr -d '\n' >> "$file.parsed"
done
</source>
Make DSSP output more readable <source lang="bash">
#!/bin/bash
cd ../prediction/
for file in *.dssp ; do
    tail -n+29 "$file" | cut -c14 | tr -d '\n' > "$file.parsed" #Thanks to Jonathan
    echo "" >> "$file.parsed"
    tail -n+29 "$file" | cut -c17 | tr ' ' '-' | tr -d '\n' >> "$file.parsed"
done
</source>
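The `tail -n+29` offset assumes that the DSSP header is exactly 28 lines long. The following sketch avoids that assumption by starting at the column header line, which begins with `  #  RESIDUE` in DSSP output. The function name `dssp_to_parsed` is made up for illustration; the column positions 14 and 17 are taken from the script above.

```bash
#!/bin/bash
# Sketch: extract sequence and secondary structure from a DSSP file without
# relying on a fixed header length. dssp_to_parsed is a hypothetical name.
dssp_to_parsed() {
    # $1: DSSP file; prints two lines: amino acids, then secondary structure
    awk '
        index($0, "  #  RESIDUE") == 1 { in_data = 1; next }  # column header marks start of residue records
        in_data {
            aa = aa substr($0, 14, 1)   # column 14: amino acid
            s = substr($0, 17, 1)       # column 17: secondary structure
            if (s == " ") s = "-"       # blank means no assigned structure
            ss = ss s
        }
        END { print aa; print ss }
    ' "$1"
}

# Usage: dssp_to_parsed 2BNH.pdb.dssp > 2BNH.pdb.dssp.parsed
```

Unlike the loop above, this variant ends the second line with a newline.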
Make PsiPred output more readable <source lang="bash">
#!/bin/bash
cd ../prediction/
for file in `ls *.psipred | grep -v "pdf"` ; do
    grep "AA:" $file | sed -r 's/\s+AA: //' | tr -d '\n' > $file.parsed
    echo "" >> $file.parsed
    grep "Pred:" $file | sed 's/Pred: //' | tr -d '\n' >> $file.parsed
done
</source>
Disorder
Get the required sequences <source lang="bash">
#!/bin/bash
cd ../input
wget http://www.uniprot.org/uniprot/P10775.fasta
wget http://www.uniprot.org/uniprot/Q9X0E6.fasta
wget http://www.uniprot.org/uniprot/Q08209.fasta
wget http://www.uniprot.org/uniprot/P06865.fasta
</source>
Start the predictions <source lang="bash">
#!/bin/bash
cd /opt/iupred/
END=.iupred
for file in `ls /mnt/home/student/reeb/3_SeqBasedPred/2_DISO/input | grep .fasta` ; do
    # Split the file name at "." so that ${array[0]} holds the name without its extension
    IFS="."
    array=($file)
    unset IFS
    ./iupred /mnt/home/student/reeb/3_SeqBasedPred/2_DISO/input/$file long > /mnt/home/student/reeb/3_SeqBasedPred/2_DISO/predictions/${array[0]}$END
done
</source>
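The `IFS` splitting in the loop only serves to strip the file extension. A minimal sketch of the same step using shell parameter expansion, which leaves `IFS` untouched; `stem_of` is a hypothetical helper name and not part of the original script.

```bash
#!/bin/bash
# Sketch: derive the output name from a FASTA file name without changing IFS.
# stem_of is a hypothetical helper; it keeps everything before the first dot,
# which is what ${array[0]} holds after splitting the name on ".".
stem_of() {
    name=${1##*/}        # drop any leading directory
    echo "${name%%.*}"   # drop everything from the first dot onward
}

# Example: stem_of /some/dir/P10775.fasta prints P10775
```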
Transmembrane helices
Get the required sequences and our reference sequence <source lang="bash">
cd ../input/
wget http://www.uniprot.org/uniprot/P35462.fasta
wget http://www.uniprot.org/uniprot/Q9YDF8.fasta
wget http://www.uniprot.org/uniprot/P47863.fasta
wget http://www.uniprot.org/uniprot/P06865.fasta
wget http://www.uniprot.org/uniprot/P02768.fasta
wget http://www.uniprot.org/uniprot/P47863.fasta
wget http://www.uniprot.org/uniprot/P11279.fasta
</source>
Script for running PolyPhobius and creating everything it needs in advance <source lang="bash">
#!/bin/bash
#$ -S /bin/sh
BLASTDB=$1 #/mnt/project/pracstrucfunc12/data/swissprot/uniprot_sprot
BLASTINDEX=$2 #/mnt/project/pracstrucfunc12/data/index_pp/uniprot_sprot.idx
WD=$3
OUT=$4
EXEC=/mnt/project/pracstrucfunc12/polyphobius/jphobius
EXECBG=/mnt/project/pracstrucfunc12/polyphobius/blastget
EXECKA=/mnt/opt/T-Coffee/bin/kalign
END=.pred
ENDBG=.bg
ENDKA=.msa
PARAMS=-poly
PARAMSKA="-f fasta"
PARAMSBG="-db $BLASTDB -ix $BLASTINDEX"
PATH=$PATH:/mnt/project/pracstrucfunc12/polyphobius/
export PATH
mkdir -p $OUT
cd $WD
pwd
rm $OUT/log &> /dev/null
for file in `ls | grep ".fasta"`; do
    echo "Processing $file" &>> $OUT/log
    # Split the file name at "." so that ${array[0]} holds the name without its extension
    IFS="."
    array=($file)
    unset IFS
    # Run blastget on the query sequence
    perl $EXECBG $PARAMSBG $file > $OUT/${array[0]}$ENDBG
    wait
    if [ `grep "^>" $OUT/${array[0]}$ENDBG | wc -l` -gt 1 ]; then
        # More than one sequence: align with kalign and predict on the alignment
        $EXECKA $PARAMSKA -input $OUT/${array[0]}$ENDBG -output $OUT/${array[0]}$ENDKA
        wait
        perl $EXEC $PARAMS $OUT/${array[0]}$ENDKA &> $OUT/${array[0]}$END
        wait
    else
        # Only one sequence: predict directly on the blastget output
        perl $EXEC $PARAMS $OUT/${array[0]}$ENDBG &> $OUT/${array[0]}$END
    fi
done
</source>
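The script only builds a multiple sequence alignment when the blastget output contains more than one sequence; otherwise that file is passed to jphobius directly. The decision can be sketched in isolation as follows. The function names `count_fasta_records` and `needs_alignment` are made up for illustration and do not appear in the original script.

```bash
#!/bin/bash
# Sketch: decide whether a FASTA file holds enough sequences for an alignment.
# Both function names are hypothetical helpers.
count_fasta_records() {
    # Every FASTA record starts with a ">" header line.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^>' "$1" || true
}

needs_alignment() {
    # True if the file contains more than one sequence
    [ "$(count_fasta_records "$1")" -gt 1 ]
}

# Usage:
# if needs_alignment "$OUT/P35462.bg"; then echo "align first"; fi
```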
Start the predictions
<source lang="bash"> ./callPolyPhobius.sh /mnt/project/pracstrucfunc12/data/swissprot/uniprot_sprot /mnt/project/pracstrucfunc12/data/index_pp/uniprot_sprot.idx ../input/ ../prediction/sp/ </source>
Signal peptides
<source lang="bash">
#!/bin/bash
for file in /mnt/home/student/reeb/3_SeqBasedPred/4_SIGP/input/*fasta; do
    prot=${file##*/}
    protein=${prot%.*}
    signalp -t euk -graphics gif -d /mnt/home/student/reeb/3_SeqBasedPred/4_SIGP/prediction_v3/gif_$protein -trunc 70 $file > /mnt/home/student/reeb/3_SeqBasedPred/4_SIGP/prediction_v3/$protein.out
done
</source>
GO terms
Start the predictions for the methods by going to their web servers. For GOPet, the most recent model, program version and database were used. We also raised the number of reported GO terms to the maximum of 100.