Resource software

From Bioinformatikpedia
Revision as of 18:22, 4 July 2011 by Bitar

Here, we collect descriptions of the software used in the practical. This can be software used in online portals or software installed locally on your own computers or the lab resources. In each case, please describe how to access the software and where to find manuals. Also use this site to collect scripts or HOW_TOs that could be useful for others.

Mapping sequence identifiers

Protein sequence databases

Here are some suggestions for mapping between Refseq identifiers and the other identifiers contained in NR and Uniprot. Be careful: Refseq identifiers carry version numbers (.n), and not all mapping tools take these into account or can cope with the attached version number. If you use a mapping tool, check its documentation for the expected input and output.

  • Use the mapping tool at Uniprot (see also Uniprot FAQ). Be careful to separate out the different types of identifiers (Genbank, Refseq, ...).
  • Use CRONOS on the web or as a web service.
  • Use SRS (sequence retrieval system) on the web (e.g. at EMBL, EBI, or any other installation) to query for a list of sequences. You can also do this programmatically, but then you need access to the web services and have to learn how to use them.
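
Because some mapping tools cannot handle the version suffix, it can help to split it off before submitting a list of identifiers. The following is a minimal sketch, not a complete validator: the function name split_refseq is made up for this example, the accessions are illustrative only, and the pattern assumes the common form of two letters, an underscore and digits (e.g. NP_, XP_), so other Refseq accession styles would need a wider pattern.

```python
import re

# Assumed pattern: two-letter prefix, underscore, digits, optional ".<version>".
# This is a simplification and does not cover every Refseq accession style.
_REFSEQ_RE = re.compile(r"^([A-Z]{2}_\d+)(?:\.(\d+))?$")


def split_refseq(accession):
    """Return (base_accession, version) where version is an int or None."""
    match = _REFSEQ_RE.match(accession.strip())
    if match is None:
        raise ValueError("not a Refseq-style accession: %r" % accession)
    base, version = match.groups()
    return base, (int(version) if version is not None else None)


# Illustrative identifiers, with and without a version number
ids = ["NP_000537.3", "XP_011541469.1", "NP_001119584"]
unversioned = [split_refseq(i)[0] for i in ids]
print(unversioned)
```

Keeping the version as a separate value rather than discarding it makes it possible to reattach it after the mapping step if the target database expects it.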


Changing Blast output

By default, Blast lists 500 search hits and shows 250 alignments. Both limits can be changed (see the Blast manual for details):

  • Use "-m 8" to get the output as a table (see "-help" or this hint on how to parse Blast output).
  • Use "-b" to set the number of alignments to be shown; "-b 20000" is the maximum.
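
The tabular output produced with "-m 8" has twelve tab-separated columns per hit, which makes it easy to parse. The sketch below shows one way to read it in Python; the field names in FIELDS and the function name parse_blast_tab are chosen for this example, and the sample line is invented for illustration.

```python
import csv

# Column order of the tabular ("-m 8") format; the names are our own labels.
FIELDS = ["query", "subject", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]
INT_FIELDS = ("aln_length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")


def parse_blast_tab(lines):
    """Yield one dict per hit from an iterable of tab-separated lines."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit


# Invented example line; with a real report, pass an open file handle instead
example = "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
hits = list(parse_blast_tab(example.splitlines()))
print("%s %g" % (hits[0]["subject"], hits[0]["evalue"]))
```

With a real report, `parse_blast_tab(open("hits.tab"))` would work the same way, where hits.tab is a placeholder file name.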

Python

The Linux version of the virtual machine seems to contain a misconfigured Python. You have probably already noticed that the interactive Python session starts with several errors. If you import sys in this session and inspect sys.path, you will see that it lists non-existing paths.
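
The sys.path check can also be scripted. This is a small sketch; the helper name missing_path_entries is made up for this example. It prints which interpreter is running and which entries of sys.path do not exist on disk.

```python
import os
import sys


def missing_path_entries(paths):
    """Return the non-empty entries that do not exist on disk."""
    return [p for p in paths if p and not os.path.exists(p)]


# Show which interpreter is actually running, then any broken path entries
print("interpreter: %s" % sys.executable)
for entry in missing_path_entries(sys.path):
    print("missing: %s" % entry)
```

An empty string in sys.path stands for the current directory, so it is skipped rather than reported as missing.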

By using

which python

you can see that the misconfigured Python is /apps/bin/python. However, there should also be a working version at /usr/bin/python, which is a symlink to /usr/bin/python2.7.

Delete the misconfigured one with: sudo rm /apps/bin/python

If you open a new console, the normal user should then be able to run a normal Python session.

The drawback of this procedure is that you have to reinstall missing modules (e.g. the Modeller modules).

Modeller

(Written for the Linux virtual machine) Some of you might notice that Python cannot load the Modeller modules, both before and after the proposed Python fix. It therefore seems necessary to reinstall Modeller after the Python fix.

  • First, note down the licence key of the already installed Modeller version:
    less /apps/modeller9.9/modlib/modeller/config.py
  • Once you have installed the new Modeller, add the licence key to the new installation in:
    /usr/lib/modeller9.9/modlib/modeller/config.py

Modeller should now run.
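
Instead of copying the key by hand, it can be read out of the old config.py with a few lines of Python. This sketch assumes that config.py contains an assignment of the form license = r'...'; check your own file, since that layout is an assumption here. The function name extract_license and the placeholder key are made up for the example.

```python
import re


def extract_license(config_text):
    """Return the value assigned to `license` in config.py text, or None.

    Assumes a line such as: license = r'XXXXXXXX'
    """
    match = re.search(r"""^\s*license\s*=\s*r?(['"])(.*?)\1""",
                      config_text, re.MULTILINE)
    return match.group(2) if match else None


# Placeholder content standing in for the old config.py
sample = "install_dir = r'/apps/modeller9.9'\nlicense = r'XXXXXXXX'\n"
print(extract_license(sample))
```

For the real file, pass the result of `open("/apps/modeller9.9/modlib/modeller/config.py").read()` to the function before removing the old installation.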



Troubleshooting

A very common Modeller error is: "Sequence difference between alignment and pdb". This usually means that the experimentally solved template structure in the PDB has missing residues, most often because of technical problems with the X-ray diffraction data. The first step is to identify the source of the error. Generate a fasta sequence from the PDB file containing the coordinates (a simple script or even a couple of command lines will do; if you need help, please write an email to bitar@rostlab.org). Then align this sequence with the fasta sequence of the same protein and check for missing residues (gaps within the alignment). If residues are missing, simply remove them from the original sequence and generate a new alignment between template and target.
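
A possible script for the first step is sketched below. It reads the CA atoms of one chain from the ATOM records, using the fixed column positions of the PDB format, and reports jumps in the residue numbering as a quick hint of where residues are missing. The function names and the three example records are made up for illustration. Modified residues stored as HETATM are ignored, and the numbering check does not replace the alignment described above.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_chain_sequence(pdb_lines, chain_id):
    """Return (sequence, residue_numbers) from the CA atoms of one chain."""
    sequence, numbers, seen = [], [], set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use only the first model of multi-model files
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21:22] != chain_id:
            continue
        key = (line[22:26], line[26:27])  # residue number + insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
        numbers.append(int(line[22:26]))
    return "".join(sequence), numbers


def numbering_gaps(numbers):
    """Return (before, after) pairs where the numbering jumps by more than 1."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


# Invented CA records in PDB column layout: residues 1, 2 and 5 of chain A
template = "ATOM  %5d  CA  %3s %s%4d    "
example = [template % (1, "MET", "A", 1),
           template % (2, "ALA", "A", 2),
           template % (3, "GLY", "A", 5)]
sequence, numbers = pdb_chain_sequence(example, "A")
print(">template_chain_A")
print(sequence)
print(numbering_gaps(numbers))
```

For a real structure, pass an open file handle such as `open("template.pdb")` in place of the example list, where template.pdb is a placeholder name.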


SNAP

A very brief explanation of SNAP is available here: Media:SNAP.pdf.