Lab Journal - Task 3 (PAH)
Latest revision as of 13:41, 17 August 2013
Secondary structure
For P10775, ReProf was run three times. The first run takes a FASTA file as input, whereas the other two take PSSMs: one created with PSI-BLAST against the big80 database, the other against SwissProt. PSI-BLAST was run with the same parameters as in Task 2, i.e. two iterations (-j 2) and an E-value cutoff of 10e-10 (-h 10e-10). For big80:
blastpgp -i /mnt/home/student/waldraffs/Masterpraktikum/Task3/secondary_structure/<UniprotID>.fasta -d /mnt/project/rost_db/data/big/big_80 -j 2 -h 10e-10 -b 2000 -v 2000 -o check_out_files/<UniprotID>.out -Q swiss_matrix_<UniprotID>.pssm
For SwissProt only the database is changed:
-d /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot
ReProf was then called as follows:
reprof -i <query>.fasta
reprof -i <query>.pssm
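The three ReProf runs and the two PSI-BLAST runs that feed them can be laid out as a small dry-run script. This is only a sketch: it prints the commands instead of executing them, and the output file names (`<ID>_big80.pssm`, `<ID>_sprot.pssm`) and the helper functions `psiblast_cmd` and `reprof_plan` are hypothetical, chosen here so that the two PSSMs do not overwrite each other. The flags and database paths are the ones quoted above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not execute them.
# Database paths are taken from the text above.
BIG80=/mnt/project/rost_db/data/big/big_80
SPROT=/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot

# Print one blastpgp call.
# $1 = UniProt ID, $2 = database path, $3 = label used in file names (hypothetical)
psiblast_cmd() {
    printf 'blastpgp -i %s.fasta -d %s -j 2 -h 10e-10 -b 2000 -v 2000 -o %s_%s.out -Q %s_%s.pssm\n' \
        "$1" "$2" "$1" "$3" "$1" "$3"
}

# Print the two PSI-BLAST calls and the three ReProf calls for one query.
reprof_plan() {
    psiblast_cmd "$1" "$BIG80" big80
    psiblast_cmd "$1" "$SPROT" sprot
    printf 'reprof -i %s.fasta\n' "$1"
    printf 'reprof -i %s_big80.pssm\n' "$1"
    printf 'reprof -i %s_sprot.pssm\n' "$1"
}

reprof_plan P10775
```

Once the printed commands look right, the output can be piped into `sh` to run them.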
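Scripts created for this task:
- filter_secStruc.pl (https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Phenylketonuria/Task3/Scripts#filter_secStruc.pl)
- SecStrucComparison.jar (https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Phenylketonuria/Task3/Scripts#SecStrucComparison.jar)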
Disorder
IUPred
Before this tool could be used for the prediction, IUPred had to be compiled with the following command:
cc /opt/iupred/iupred.c -o /mnt/home/student/.../iupred
Afterwards the program can be invoked as shown here, with exactly one of long, short or glob given as the prediction type:
iupred sequence.fasta long|short|glob > output.txt
Since IUPred writes its results only to standard output, we redirected the output into a file.
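To obtain all three prediction types for one sequence, the call can be wrapped in a loop. The following is a minimal sketch under some assumptions: the function name `run_iupred_all` and the output naming scheme `<name>_iupred_<mode>.txt` are hypothetical, and `IUPRED` should point to the binary compiled above.

```shell
#!/bin/sh
# Path to the compiled IUPred binary; override via the environment if needed.
IUPRED=${IUPRED:-iupred}

# Run IUPred once per prediction type and store each result in its own file.
# $1 = FASTA file, e.g. sequence.fasta
run_iupred_all() {
    fasta=$1
    base=${fasta%.fasta}
    for mode in long short glob; do
        # IUPred prints to standard output, so redirect it into a file.
        "$IUPRED" "$fasta" "$mode" > "${base}_iupred_${mode}.txt" || return 1
    done
}

# Usage: run_iupred_all sequence.fasta
```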
MD (MetaDisorder)
The program can be invoked with the following command:
predictprotein --seqfile sequence.fasta --target metadisorder -p output_name -o output-directory
Transmembrane helices
PolyPhobius
PolyPhobius requires a multiple sequence alignment as input, so some preparation was necessary:
- We generated a FASTA file containing all sequences homologous to the query sequence. For this we used the blastget Perl script with the SwissProt database as follows:
/mnt/project/pracstrucfunc13/polyphobius/blastget -db /mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot -ix /mnt/project/pracstrucfunc13/data/index_pp/uniprot_sprot.idx sequence.fasta > sequence-blast.fasta
- Afterwards, we used kalign to generate the MSA as shown here:
/mnt/opt/T-Coffee/bin/kalign -i sequence-blast.fasta -o sequence-kalign.fasta -f fasta
- Finally, PolyPhobius can be run with the following command:
/mnt/project/pracstrucfunc13/polyphobius/jphobius -poly sequence-kalign.fasta > sequence-polyphobius.txt
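The three steps above can be chained in one function so that each step only runs if the previous one succeeded. This is a sketch, not the script we actually used: the function name `polyphobius_pipeline` is hypothetical, while the tool paths, options and file naming are taken from the commands above.

```shell
#!/bin/sh
# Tool and database locations as quoted above; override via the environment if needed.
BLASTGET=${BLASTGET:-/mnt/project/pracstrucfunc13/polyphobius/blastget}
KALIGN=${KALIGN:-/mnt/opt/T-Coffee/bin/kalign}
JPHOBIUS=${JPHOBIUS:-/mnt/project/pracstrucfunc13/polyphobius/jphobius}
DB=/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot
IDX=/mnt/project/pracstrucfunc13/data/index_pp/uniprot_sprot.idx

# $1 = query FASTA file, e.g. sequence.fasta
polyphobius_pipeline() {
    base=${1%.fasta}
    # 1. collect homologous sequences, 2. align them, 3. predict on the alignment
    "$BLASTGET" -db "$DB" -ix "$IDX" "$1" > "${base}-blast.fasta" &&
    "$KALIGN" -i "${base}-blast.fasta" -o "${base}-kalign.fasta" -f fasta &&
    "$JPHOBIUS" -poly "${base}-kalign.fasta" > "${base}-polyphobius.txt"
}

# Usage: polyphobius_pipeline sequence.fasta
```

Chaining with `&&` stops the pipeline at the first failing step, which avoids running PolyPhobius on an empty or missing alignment.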
Signal peptides
We tried two different parameter settings for our predictions:
First we simply ran SignalP without any constraints. The only option that has to be set is -t euk, since all four sequences are eukaryotic; the other organism types SignalP accepts are gram+ and gram-.
The output can be written to a file either with the -o option or by redirecting standard output with '>'.
signalp -t euk <UniprotID>.fasta > <UniprotID>_output.out
In the second run we restricted the prediction to the first 70 N-terminal residues, as recommended in the SignalP manual page to avoid false positives.
signalp -trunc 70 -t euk <UniprotID>.fasta > <UniprotID>_trunc.out
In our case there are only a few differences between the runs on the whole sequence and on the N-terminal region only. For example, for the whole sequence the NN result of P47863 also gives a YES for C and not only for max. S. Table 15 shows the results of the N-terminal run only.
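Both SignalP runs can be repeated for several sequences with a small wrapper. The sketch below assumes one `<UniprotID>.fasta` file per sequence in the current directory; the function name `run_signalp_both` is hypothetical, and the options and output names are those of the two commands above.

```shell
#!/bin/sh
# SignalP executable; override via the environment if needed.
SIGNALP=${SIGNALP:-signalp}

# Run SignalP on the full sequence and on the first 70 residues.
# Arguments: one or more UniProt IDs
run_signalp_both() {
    for id in "$@"; do
        "$SIGNALP" -t euk "$id.fasta" > "${id}_output.out" || return 1
        "$SIGNALP" -trunc 70 -t euk "$id.fasta" > "${id}_trunc.out" || return 1
    done
}

# Usage: run_signalp_both P47863
```

Afterwards, `diff P47863_output.out P47863_trunc.out` gives a quick view of where the two runs disagree.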