Difference between revisions of "Resource software"

Latest revision as of 11:15, 13 August 2013

Here, we collect descriptions of the software used in the practical. This can be software used in online portals or software installed locally on your own computers or the lab resources. In each case, please describe how to access the software and where to find manuals. Also use this site to collect scripts or HOW_TOs that could be useful for others.

Your own scripts

If you have produced a script that could be useful for others, please "publish" it here, e.g. by creating a page for your tool that describes where to find the software (path on the student cluster, git repository, ...) and how to use it. For users: if you use a script produced by another group, please document that (e.g. in the "lab book" part of your wiki section), and if you find bugs, please help the other group fix them.

Task 2

Executing blastp, blastpgp and hhblits

The script run.pl executes blastp, blastpgp and hhblits with different options (databases, number of iterations and E-value cutoffs). It also uses checkfiles for blastpgp and outputs PSSMs.

Convert hhr to parseable tsv format

A C program that extracts the statistics from hhblits output (hhr format) into a TSV (tab-separated values) file: hhr2tsv.git

Install:

./configure --prefix=$HOME
make install
make clean

Usage:

$HOME/bin/hhr2tsv <input_hhr_file> <output_tsv_file>

Parser of (Psi-)BLAST and HHblits hhr output files

The script parse_output.pl parses alignment information from (Psi-)BLAST and HHblits hhr output files into a tab-separated format suitable for plotting. It also calculates the number of hits and the overlap of hits with the same ID between the (Psi-)BLAST and HHblits outputs. In addition, there is an option to evaluate PDB hits against COPS and create files for plotting.

Plotting of TPR and precision

The script tpr_precision.pl builds on the output files of parse_output.pl and makes an R plot of TPR and precision as a function of the E-value of the hits.
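
To make the two quantities concrete, here is a minimal Python sketch (not the actual tpr_precision.pl, which produces an R plot). It assumes every hit has already been labelled as true or false positive, e.g. by the COPS evaluation, and that the total number of true relatives is known. The function name and the input layout are made up for illustration.

```python
def tpr_precision_curve(hits, total_positives, thresholds):
    """Compute TPR and precision for each E-value threshold.

    hits: list of (evalue, is_true_positive) tuples (hypothetical layout).
    total_positives: number of true relatives that could be found at all.
    Returns a list of (threshold, tpr, precision); precision is None
    when no hit passes the threshold.
    """
    curve = []
    for threshold in thresholds:
        # Keep only the hits at or below the current E-value cutoff.
        selected = [is_tp for evalue, is_tp in hits if evalue <= threshold]
        true_positives = sum(selected)
        tpr = true_positives / total_positives if total_positives else 0.0
        precision = true_positives / len(selected) if selected else None
        curve.append((threshold, tpr, precision))
    return curve
```

The resulting list can be written to a tab-separated file and plotted with any tool.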

Script for comparing the CATH fold classes of the query and the PDB hits

The script compareCath.py reads the output of parse_output.pl (see above), compares the fold classes of the query domains with the fold classes of the hits, and writes a histogram to stdout.

Script for finding GO annotations

This script finds GO annotations for a given protein and creates an output file.
The script can be found here
A typical command would be: python goAnnotation.py B2JCG3 /Desktop/result.out

Task 3

Script to filter the secondary structure of reprof output files

This script reads the output of a ReProf, PsiPred or DSSP run and extracts the secondary structure: filter_secStruc.pl

Calculating precision between two secondary structure sequences

This script reads two sequences in the format produced by filter_secStruc.pl and calculates the precision between them: SecStrucComparison.jar
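
The page does not say how SecStrucComparison.jar defines "precision", so the following Python sketch is only one possible reading: per-state precision, i.e. of all positions predicted as a given state, the fraction that have the same state in the reference. It assumes both sequences are equally long and use the same state alphabet (H, E, C here; pass another alphabet if your files use e.g. L for loops). The function name is hypothetical.

```python
def per_state_precision(predicted, observed, states="HEC"):
    """Per-state precision of a predicted secondary structure string.

    Assumption: both strings are aligned position by position and use the
    same alphabet. Returns {state: precision}, with None for states that
    were never predicted.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    result = {}
    for state in states:
        predicted_count = sum(1 for p in predicted if p == state)
        correct = sum(
            1 for p, o in zip(predicted, observed) if p == state and o == state
        )
        result[state] = correct / predicted_count if predicted_count else None
    return result
```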

Automatically run Polyphobius

This script chains blastget, kalign and jphobius, which should make running Polyphobius easier. You can find the code here (polyphobius.pl).

Task 5

Wrapper program for modeller

This program helps you run Modeller automatically. Both single-template and multi-template modelling are supported. You can find the program here: Modeller.py

Usage:

Modeller.py \
	--template template.pdb[,template2.pdb[,template3.pdb]] \
	--chain chain[,chain2[,chain3]] \
	--target target.pir \
	--align alignment-file.ali [--has-align]

It has the following parameters:

--template template protein structures (multiple structures are separated by commas)
--chain selected chains from the template protein structures (multiple chains are separated by commas)
--target target sequence in PIR format
--align file to store sequence alignments
--has-align OPTIONAL: indicates that the alignment file already exists

Convert table to wiki format

R script for formatting a table (e.g. in CSV format) as a wiki table: format_to_wiki_table.r
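
For those who prefer Python, the same idea can be sketched as follows. This is not a port of format_to_wiki_table.r, just a minimal illustration that assumes the first CSV row is the header. The function name is hypothetical; the output uses standard MediaWiki table markup.

```python
import csv
import io


def csv_to_wikitable(csv_text, delimiter=","):
    """Convert CSV text into MediaWiki table markup (first row = header)."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        raise ValueError("empty table")
    lines = ['{| class="wikitable"']
    # Header cells are introduced by "!" and separated by "!!".
    lines.append("! " + " !! ".join(rows[0]))
    for row in rows[1:]:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)
```

Note that cells containing "|" would need escaping, which this sketch does not handle.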

Task 8

Calculator for residue conservation in MSA

This Python script calculates the conservation of residues in an MSA in FASTA format. The first entry in the FASTA file should be the query sequence.

Usage: python msa-conservation.py <MSA.fasta> residue_pos1[,residue_pos2[,...]]
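
How msa-conservation.py defines conservation is not documented here. As a rough sketch of one simple definition (an assumption, not necessarily what the script does): the fraction of the other sequences that carry the query's residue in the alignment column belonging to a given query position. Positions are taken as 1-based in the ungapped query; the function names are made up, and reading the FASTA file is left out.

```python
def query_column(query_aligned, residue_pos):
    """Map a 1-based position in the ungapped query to a 0-based MSA column."""
    seen = 0
    for column, char in enumerate(query_aligned):
        if char not in "-.":  # skip gap characters
            seen += 1
            if seen == residue_pos:
                return column
    raise ValueError("query has fewer than %d residues" % residue_pos)


def conservation(aligned_seqs, residue_pos):
    """Fraction of non-query sequences sharing the query residue.

    aligned_seqs: aligned sequences of equal length, query first.
    """
    query, others = aligned_seqs[0], aligned_seqs[1:]
    if not others:
        raise ValueError("need at least one sequence besides the query")
    column = query_column(query, residue_pos)
    target = query[column].upper()
    matches = sum(1 for seq in others if seq[column].upper() == target)
    return matches / len(others)
```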

Molecular visualization

To look at protein structures you can use any molecular visualization program. Here are a few options:

PyMOL -> installed on the i12k-biolab computers
Jmol, e.g. via PDB
VMD
the SRS 3D server -- unfortunately not working any more. Maybe Aquaria will become publicly available within this practical.

Changing Blast output

By default, BLAST lists 500 search hits and 250 alignments. This can be changed (see the BLAST manual for details):

You can get tabular output with "-m 8" (see "-help" or this hint on how to parse BLAST output).
You can use "-b" to set the number of alignments to be shown; "-b 20000" is the maximum.
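
The "-m 8" table has twelve tab-separated columns (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score). A minimal Python sketch for reading it could look like this; the function and column names are made up for illustration, and lines starting with "#" (as written by "-m 9") are skipped.

```python
M8_COLUMNS = [
    "query", "subject", "pct_identity", "aln_length", "mismatches",
    "gap_opens", "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]
INT_COLUMNS = {
    "aln_length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end",
}


def parse_m8(lines):
    """Parse BLAST tabular ("-m 8") lines into a list of dicts."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or "-m 9" comment line
        fields = line.split("\t")
        if len(fields) != len(M8_COLUMNS):
            raise ValueError("expected 12 tab-separated fields: %r" % line)
        hit = dict(zip(M8_COLUMNS, fields))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        hit["pct_identity"] = float(hit["pct_identity"])
        hit["bit_score"] = float(hit["bit_score"])
        # E-values may be printed without mantissa, e.g. "e-110".
        evalue = hit["evalue"].strip()
        if evalue[:1] in ("e", "E"):
            evalue = "1" + evalue
        hit["evalue"] = float(evalue)
        hits.append(hit)
    return hits
```

Typical use would be `parse_m8(open("blast.m8"))`, where the file name is just a placeholder.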

Modeller

Troubleshooting

A very common error from Modeller is the following: "Sequence difference between alignment and pdb". This usually means that the experimentally solved template structure in the PDB has missing residues, which can result from technical problems with the X-ray diffraction data. You therefore need to make sure that the target-template alignment uses the sequence implied by the ATOM records, not the SEQRES records. To locate the error you could, for example, generate a FASTA sequence from the coordinates in the PDB file, align it with the FASTA sequence from the SEQRES records and check for missing residues (gaps within the alignment). If residues are missing, regenerate your target-template alignment based on the new FASTA sequence made from the coordinates.
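
The check described above can be sketched in plain Python. This is only an illustration under a few assumptions: the file follows the fixed-column PDB format, only the 20 standard amino acids are translated (everything else becomes X), HETATM records such as selenomethionine are ignored, and only the first model is read. The comparison uses difflib from the standard library, which is a generic string matcher and not a real sequence aligner, so for difficult cases a proper alignment tool is still needed. All function names are made up.

```python
import difflib

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequence(pdb_lines, chain):
    """One-letter sequence of a chain according to the SEQRES records."""
    residues = []
    for line in pdb_lines:
        if line.startswith("SEQRES") and line[11:12] == chain:
            residues.extend(line[19:].split())
    return "".join(THREE_TO_ONE.get(name, "X") for name in residues)


def atom_sequence(pdb_lines, chain):
    """One-letter sequence of the residues that actually have coordinates."""
    sequence, seen = [], set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model
        if line.startswith("ATOM") and line[21:22] == chain:
            residue_id = (line[22:26], line[26:27])  # number + insertion code
            if residue_id not in seen:
                seen.add(residue_id)
                sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)


def missing_segments(seqres, atoms):
    """0-based (start, end) ranges of seqres that have no match in atoms."""
    matcher = difflib.SequenceMatcher(None, seqres, atoms, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    ]
```

If missing_segments returns anything, the output of atom_sequence is the sequence to use for the template in the alignment.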

SNAP

There is a very brief explanation about SNAP available here --> Media:SNAP.pdf.

Energy Minimization

There is a script to automatically run energy minimizations with Gromacs here --> MutEn.pl.

R

Error in hist.default(a$V2, main = "evals") : 'x' must be numeric

BLAST uses non-standard scientific notation and omits the leading 1 for E-values like 'e-190'. Change it to '1e-190' and R will stop complaining.
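
If you pre-process the BLAST table with a script before loading it into R, the fix can be automated. A minimal Python sketch (the function name is hypothetical):

```python
import re

# Matches a bare exponent such as "e-190" or "E-05" with no mantissa in front.
_BARE_EXPONENT = re.compile(r"^[eE][+-]?\d+$")


def normalize_evalue(text):
    """Return a BLAST E-value string as float, adding the missing leading 1."""
    text = text.strip()
    if _BARE_EXPONENT.match(text):
        text = "1" + text
    return float(text)
```

Values that already have a mantissa, such as '3e-05' or '0.0', pass through unchanged.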

@@ Line 1: / Line 1: @@
 Here, we collect descriptions of the software used in the practical. This can be software used in online portals or software installed locally on your own computers or the lab resources. In each case, please describe how to access the software and where to find manuals. Also use this site to collect scripts or HOW_TOs that could be useful for others.
+== Your own scripts ==
+If you have produced a script that does something that could be useful for others, please "publish" it here. E.g. create a page for your tool where you provide information where to find the software (path on the student cluster, git repository, ...) and how to use it. -- For users: If you use a script produced by another group, please document that (e.g. in the "lab book" part of your wiki section). And if you find bugs, please help the other group improve.
+=== Task 2 ===
+==== Executing blastp, blastpgp and hhblits ====
+A script [[run.pl]] executes blastp; blastpgp and hhblits with different options: databases, number of iterations and E-value cutoffs. Also uses checkfiles for blastpgp and outputs PSSMs.
+==== Convert hhr to parseable tsv format ====
+A C program for extraction of statistics results from hhblits output (hhr format) to a tsv (tab separated values) file:
+[https://github.com/uheeschen/hhr2tsv.git hhr2tsv.git]
+* Install:
+ <nowiki>
+./configure --prefix=$HOME
+make install
+make clean</nowiki>
+* Usage:
+ <nowiki>$HOME/bin/hhr2tsv <input_hhr_file> <output_tsv_file></nowiki>
+==== Parser of (Psi-)BLAST and HHblits hhr output files ====
+A script [[parse_output.pl]] parses alignments information from (Psi-)BLAST and HHblits hhr output files into tab-separated format, suitable for plotting, calculates the number of hits and overlap of hits with same ID between (Psi-)BLAST and HHblits outputs. Moreover, there is an option to evaluate PDB hits against COPS and create files for plotting.
+==== Plotting of TPR and precision ====
+The script [[tpr_precision.pl]] bases on the output files of [[parse_output.pl]] and it makes an R-plot of TPR and precision as a function of E-value of the hits.
+==== Script for comparing the CATH fold classes of the query and the PDB hits ====
+The script [[compareCath.py]] reads the output from parse_output.pl (see above) and compares the fold classes of the query domains with the fold classes of the hits and writes a histogram to stdout.
+==== Script for finding GOAnnotations ====
+This Script finds GOAnnotations for a given Protein and creates an outfile.<br>
+The script can be found [https://dl.dropboxusercontent.com/u/9441182/goAnnotation.py here]<br>
+A typical command would be: python goAnnotation.py B2JCG3 /Desktop/result.out
+=== Task 3 ===
+==== Script to filter the secondary structure of reprof output files ====
+This script reads the output of a ReProf, a PsiPred or a DSSP run and filters for the secondary structure: [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Phenylketonuria/Task3/Scripts#filter_secStruc.pl filter_secStruc.pl]
+==== Calculation precision between two secondary structure sequences ====
+This script reads two sequences in the format given by the output of filter_secStruc.pl and calculates the precision between those: [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Phenylketonuria/Task3/Scripts#SecStrucComparison.jar SecStrucComparison.jar]
+==== Automatically run Polyphobius ====
+This script combines <tt>blastget</tt>, <tt>kalign</tt> and <tt>jphobius</tt> together. Run polyphobius should be easier for us. You can find the code [[polyphobius.pl|here]] ([[polyphobius.pl]]).
+=== Task 5 ===
+==== Wrapper program for modeller ====
+This program intends to help you to run modeller automatically. Both single template modelling and multi-template modelling are supported.
+You can find the program here: [[Modeller.py]]
+Usage:
+ <nowiki>Usage: Modeller.py \
+	--template template.pdb[,template2.pdb[,template3.pdb]] \
+	--chain chain[,chain2,[chain3]]	--target target.pir \
+	--align alignment-file.ali [--has-align]</nowiki>
+It has following parameters:
+* <tt>--template</tt>  template protein structures (multiple structures are separated by commas)
+* <tt>--chain</tt>     selected chains from template protein structures (multiple templates are separated by commas)
+* <tt>--target</tt>    target sequence in PIR
+* <tt>--align</tt>     file to store sequence alignments
+* <tt>--has-align</tt> OPTIONAL: whether we have already have alignment file
+=== Convert table to wiki format ===
+R script for formatting a table (e. g. in csv format) to wiki table format: [[format_to_wiki_table.r]]
+=== Task 8 ===
+==== Calculator for residue conservation in MSA ====
+This [[msa-conservation.py|python script]] helps you to calculate conservation of residues in a MSA in FASTA format. First entry in the FASTA file should be the query sequence.
+Usage: <nowiki>python msa-conservation.py <MSA.fasta> residue_pos1,[residue_pos2,[...]]</nowiki>
 == Molecular visualization ==
 To look at protein structures you can use any molecular visualization programm. Here are a few options:
+* [http://www.pymol.org/ PyMOL] -> installed on the i12k-biolab computers
 * [http://jmol.sourceforge.net/ Jmol], e.g. via [http://www.pdb.org/ PDB]
-* the [http://srs3d.org/ SRS 3D server]
+* [http://www.ks.uiuc.edu/Research/vmd/ VMD]
+* the [http://srs3d.org/ SRS 3D server] -- unfortunately not working any more. Maybe Aquaria will become publicly available within this practical.
-* [http://www.pymol.org/ PyMOL] -> will be installed on the i12k-biolab computers soon.
 == Changing Blast output ==
@@ Line 19: / Line 93: @@
 Troubleshooting
-A very common error from Modeller is the following: ''' "Sequence difference between alignment and pdb" '''. This usually means the structure of the template available in PDB (which was experimentally solved) has missing residues, what could be a result of technical problems with the X-ray diffraction data (more frequently). To overcome this error, the first step should be to identify its source. For that, a fasta sequence can be generated from the PDB file containing the coordinates (it should be a simple script, or even a couple of command lines, but if any help needed, please write an email to bitar@rostlab.org). After this, align this sequence with the fasta sequence for the same protein and check for missing residues (gaps within the alignment). If residues are missing, simple remove those from the original sequence and generate a new alignment between template and target.
+A very common error from Modeller is the following: ''' "Sequence difference between alignment and pdb" '''. This usually means that the experimentally solved template structure in the PDB has missing residues, which can result from technical problems with the X-ray diffraction data. You therefore need to make sure that the target-template alignment uses the sequence implied by the ATOM records, not the SEQRES records. To locate the error you could, for example, generate a FASTA sequence from the coordinates in the PDB file, align it with the FASTA sequence from the SEQRES records and check for missing residues (gaps within the alignment). If residues are missing, regenerate your target-template alignment based on the new FASTA sequence made from the coordinates.
 == SNAP ==
 There is a very brief explanation about SNAP available here --> [[Media:SNAP.pdf]].
 == Energy Minimization ==