Resource software

From Bioinformatikpedia

Revision as of 19:30, 3 July 2011

Here, we collect descriptions of the software used in the practical. This can be software used in online portals or software installed locally on your own computers or on the lab resources. In each case, please describe how to access the software and where to find manuals. Also use this page to collect scripts or HOWTOs that could be useful for others.

Mapping sequence identifiers

Protein sequence databases

Here are a number of suggestions on how to map identifiers between RefSeq and the other identifiers contained in NR and UniProt. Be careful: RefSeq identifiers have version numbers (.n). Not all mapping tools take these into account or can cope with the attached version number, so if you use a mapping tool, check its documentation for the expected input and output formats.

  • Use the mapping tool at UniProt (see also the UniProt FAQ). Be careful to keep the different types of identifiers (GenBank, RefSeq, ...) separate.
  • Use CRONOS on the web or as a web service.
  • Use SRS (Sequence Retrieval System) on the web (e.g. at EMBL, EBI, or any other installation) to query a list of sequences. You can also do this programmatically, but then you need access to the web services and have to learn how to use them.
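If a mapping tool cannot cope with the version suffix, one option is to strip it before submitting the list and keep the versioned identifiers for your own records. The following is a minimal Python sketch, assuming RefSeq protein accessions of the form two capital letters, an underscore and digits (NP_, XP_, ...); other RefSeq record types may need a different pattern, and the helper names are hypothetical.

```python
import re

# Assumed RefSeq protein accession pattern: two capital letters, "_", digits,
# and an optional ".n" version suffix (e.g. NP_123456.2).
REFSEQ = re.compile(r"^([A-Z]{2}_\d+)(?:\.(\d+))?$")


def split_refseq_version(identifier):
    """Return (accession without version, version number or None)."""
    match = REFSEQ.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a RefSeq accession: {identifier!r}")
    accession, version = match.groups()
    return accession, int(version) if version else None


def strip_versions(identifiers):
    """Unversioned accessions in input order, each listed only once."""
    result = []
    for identifier in identifiers:
        accession, _ = split_refseq_version(identifier)
        if accession not in result:
            result.append(accession)
    return result
```

For example, strip_versions(["NP_123456.2", "NP_123456.3"]) returns ["NP_123456"] (made-up accessions), so two versions of the same record collapse into one query; handle them separately if the version matters for your analysis.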


Changing Blast output

By default, Blast lists 500 one-line hit descriptions and 250 alignments. This can be changed (see the Blast manual for details):

  • Use "-m 8" to get the output as a table (see "-help", or this hint on how to parse Blast output).
  • Use "-b" to set the number of alignments shown; "-b 20000" is the maximum. The number of one-line descriptions is set with "-v".
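The "-m 8" table has twelve tab-separated columns per hit: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value and bit score. A minimal Python parser could look like this; the class and function names are hypothetical, and lines starting with "#" (as written by "-m 9") are simply skipped.

```python
import csv
from dataclasses import dataclass


@dataclass
class BlastHit:
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


# Converters for the 12 columns of "-m 8" output, in column order.
CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_m8(lines):
    """Yield one BlastHit per data line; blank and '#' comment lines are skipped."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(CONVERTERS):
            raise ValueError(f"expected 12 columns, got {len(row)}: {row}")
        # strip() removes the padding Blast sometimes puts before numbers
        yield BlastHit(*(convert(field.strip()) for convert, field in zip(CONVERTERS, row)))


def best_hit_per_query(hits):
    """Keep the hit with the lowest e-value for each query."""
    best = {}
    for hit in hits:
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    return best
```

Usage: with open("blast_output.m8") as handle: hits = list(parse_m8(handle)), where the file name is only an example.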

Python

The Linux version of the virtual machine seems to have a misconfigured python. You have probably already noticed that the interactive python session starts with several errors. If you import sys in this interactive session and look at sys.path, you will see that it lists paths that do not exist.
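To see exactly which entries are affected, you can run the following with the interpreter in question. It is a small sketch; the function name is our own.

```python
import os
import sys


def missing_path_entries(paths=None):
    """Return the entries of sys.path (or of `paths`) that do not exist on disk."""
    if paths is None:
        paths = sys.path
    # An empty entry stands for the current directory, so it is never reported.
    return [entry for entry in paths if entry and not os.path.exists(entry)]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for entry in missing_path_entries():
        print("missing:", entry)
```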

By running

which python

you can see that the misconfigured python is /apps/bin/python, which comes first in your PATH. A working version should also exist at /usr/bin/python, which is a symlink to /usr/bin/python2.7.
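"which python" only reports the first match on your PATH. To list every python on the PATH and where each symlink points, a sketch like the following (mimicking "which -a"; the helper name is hypothetical) can help.

```python
import os


def which_all(program, path=None):
    """Every executable called `program` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    # The first entry is what "which python" reports; realpath resolves symlinks.
    for executable in which_all("python"):
        print(executable, "->", os.path.realpath(executable))
```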

Delete the broken one with: sudo rm /apps/bin/python

After you open a new console, a normal user should be able to start a normal python session.

The drawback of this procedure is that you have to reinstall the modules that are now missing (e.g. the Modeller modules).
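To find out which modules the remaining interpreter cannot import, you can check them with importlib before reinstalling. The module list in this sketch is only an example; "modeller" is the name under which Modeller's Python module is imported.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Add any other modules your own scripts import.
    for name in missing_modules(["modeller"]):
        print("not importable, reinstall:", name)
```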

Modeller

(Written for the Linux virtual machine.) You may notice that after the Python fix above (and also before it) python cannot load the Modeller modules. It therefore seems to be necessary to reinstall Modeller after the Python fix.

  • First, save a copy of the licence key of the already installed Modeller version, which you can view with:
    less /apps/modeller9.9/modlib/modeller/config.py
  • After installing the new Modeller, add the licence key to the new installation in:
    /usr/lib/modeller9.9/modlib/modeller/config.py
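Instead of copying the key by hand, a small script can transfer it. This sketch assumes that config.py stores the key on a single line of the form license = r'...' (check your file with less first); the function names are hypothetical, and writing to /usr/lib usually requires root.

```python
import re

# Assumption: the key sits on one line such as   license = r'XXXX'
LICENSE_LINE = re.compile(r"""^license\s*=\s*r?(['"])(.*)\1[ \t]*$""", re.MULTILINE)


def read_license(config_text):
    """Extract the licence key from the text of a Modeller config.py."""
    match = LICENSE_LINE.search(config_text)
    if match is None:
        raise ValueError("no 'license = ...' line found")
    return match.group(2)


def set_license(config_text, key):
    """Return config_text with its licence line set to `key` (appended if missing)."""
    new_line = f"license = r'{key}'"
    if LICENSE_LINE.search(config_text):
        # A function replacement keeps backslashes in `key` from being interpreted.
        return LICENSE_LINE.sub(lambda _match: new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


def copy_license(old_config, new_config):
    """Copy the key from the old installation's config.py into the new one."""
    with open(old_config) as source:
        key = read_license(source.read())
    with open(new_config) as target:
        updated = set_license(target.read(), key)
    with open(new_config, "w") as target:
        target.write(updated)
    return key
```

Usage with the two paths from above: copy_license("/apps/modeller9.9/modlib/modeller/config.py", "/usr/lib/modeller9.9/modlib/modeller/config.py").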

Modeller should now run.



Troubleshooting

A very common error from Modeller is the following: "Sequence difference between alignment and pdb". This usually means that the template structure available in the PDB (which was solved experimentally) has missing residues, which is most frequently a result of technical problems with the X-ray diffraction data. To overcome this error, the first step is to identify its source. To do so, generate a FASTA sequence from the PDB file containing the coordinates (a simple script, or even a couple of command lines, is enough; if you need any help, please write an email to bitar@rostlab.org). Then align this sequence with the complete FASTA sequence of the same protein and check for missing residues (gaps within the alignment). If residues are missing, simply remove them from the original sequence and generate a new alignment between template and target.
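The following Python sketch covers both steps. It extracts the sequence of the residues that actually have coordinates (CA atoms in ATOM/HETATM records) for one chain, formats it as FASTA, and compares it with the complete sequence. Note that it swaps in difflib.SequenceMatcher for a proper pairwise alignment: this works when the structure sequence is simply the complete sequence with stretches missing, but for anything more complicated use a real alignment tool. The default chain ID and all function names are assumptions.

```python
import difflib

# Three-letter to one-letter codes; MSE (selenomethionine, common in X-ray
# structures) is read as M. Residues not in this table (water, ligands) are skipped.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "MSE": "M",
}


def pdb_atom_sequence(pdb_lines, chain_id="A"):
    """Sequence of the residues that have a CA atom in the first model of one chain."""
    residues, seen = [], set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only the first model of an NMR ensemble
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[21:22] != chain_id or line[12:16].strip() != "CA":
            continue
        residue_id = line[22:27]  # residue number plus insertion code
        name = line[17:20].strip()
        if residue_id in seen or name not in THREE_TO_ONE:
            continue  # alternate locations or non-amino-acid records
        seen.add(residue_id)
        residues.append(THREE_TO_ONE[name])
    return "".join(residues)


def to_fasta(header, sequence, width=60):
    """Format a sequence as FASTA with `width` residues per line."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def missing_segments(full_sequence, structure_sequence):
    """0-based, half-open ranges of full_sequence with no counterpart in the structure."""
    # autojunk=False matters: otherwise, for sequences of 200+ residues,
    # frequent amino acids are treated as junk and ignored.
    matcher = difflib.SequenceMatcher(None, full_sequence, structure_sequence, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag == "delete"]


def remove_segments(sequence, segments):
    """Drop the given (start, end) ranges from a sequence."""
    kept, previous_end = [], 0
    for start, end in segments:
        kept.append(sequence[previous_end:start])
        previous_end = end
    kept.append(sequence[previous_end:])
    return "".join(kept)
```

remove_segments(full_sequence, missing_segments(full_sequence, structure_sequence)) then gives the trimmed template sequence for the new alignment.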


SNAP

A very brief explanation of SNAP is available here: Media:SNAP.pdf. I am currently writing a script for automatic SNAP runs and structural comparison. It will only require an alignment between two sequences (in our case, one is the 'wild type' protein sequence and the other is the SNP-containing sequence). I am not sure I will finish today (23.6); probably tomorrow. Meanwhile, I have other scripts that may help, so feel free to write to me (bitar@rostlab.org).
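Listing the substitutions from such an alignment is straightforward. The helper below is hypothetical, not part of SNAP or of the script mentioned above: it assumes both sequences come from the same alignment (equal length, "-" as gap) and writes each substitution as wild-type residue, position and mutant residue (e.g. K2R), numbered along the wild-type sequence. Check the SNAP documentation for the exact mutation format it expects.

```python
def point_mutations(wild_type_aln, mutant_aln, gap="-"):
    """Substitutions as <wild-type residue><position><mutant residue>, 1-based."""
    if len(wild_type_aln) != len(mutant_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    position = 0  # position in the ungapped wild-type sequence
    for wt, mut in zip(wild_type_aln, mutant_aln):
        if wt != gap:
            position += 1
        if wt == gap or mut == gap:
            continue  # insertions and deletions are not point substitutions
        if wt != mut:
            mutations.append(f"{wt}{position}{mut}")
    return mutations
```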


Energy Minimization

I created a script that automatically runs all the steps listed below for energy minimization with Gromacs. A screenshot of this script is available here: Media:MutEn.png, and the script itself is here: MutEn.pl. The script includes the following steps:

1. Runs SCWRL to make sure there are no missing side chains.

2. Runs repairPDB to clean the PDB file and extract only the protein.

3. Runs the Gromacs programs for energy minimization, in the following order:

  • PDB2GMX
  • Create MDP file
  • GROMPP
  • MDRUN
  • Create file for analysis
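MutEn.pl itself is the reference for these steps. For orientation only, the Gromacs part could be driven from Python roughly as below. This sketch assumes the cleaned PDB file from step 2, GROMACS 4.x command names (newer versions prefix each with gmx, hence the prefix parameter), and example values for the force field, water model and MDP parameters. It does not reproduce the SCWRL, repairPDB or analysis steps, and depending on your input further preparation (e.g. defining a box) may be needed before grompp. All function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Minimal steepest-descent settings (assumed values; tune them for your system).
EM_MDP = """\
integrator = steep
emtol      = 1000.0
emstep     = 0.01
nsteps     = 50000
"""


def gromacs_em_commands(pdb_file, name="em", force_field="oplsaa", water="spce", prefix=()):
    """Command lines for pdb2gmx, grompp and mdrun; pass prefix=("gmx",) for GROMACS 5+."""
    prefix = list(prefix)
    return [
        prefix + ["pdb2gmx", "-f", pdb_file, "-o", f"{name}_start.gro",
                  "-p", f"{name}.top", "-ff", force_field, "-water", water],
        prefix + ["grompp", "-f", f"{name}.mdp", "-c", f"{name}_start.gro",
                  "-p", f"{name}.top", "-o", f"{name}.tpr"],
        prefix + ["mdrun", "-v", "-deffnm", name],
    ]


def run_energy_minimization(pdb_file, workdir=".", name="em", dry_run=True, **options):
    """Write the MDP file and run (or only print) the commands inside workdir.

    pdb_file is resolved relative to workdir, because the commands run there.
    """
    workdir = Path(workdir)
    pdb2gmx, grompp, mdrun = gromacs_em_commands(pdb_file, name, **options)

    def run(command):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=workdir, check=True)

    run(pdb2gmx)
    (workdir / f"{name}.mdp").write_text(EM_MDP)  # the "Create MDP file" step
    run(grompp)
    run(mdrun)
```

With dry_run=True (the default) the function still writes em.mdp but only prints the three commands, so you can inspect them before running anything.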

If you have any questions, please write me an email (bitar@rostlab.org).