Difference between revisions of "Task alignments 2011"

From Bioinformatikpedia

Revision as of 13:41, 3 May 2011

Most prediction methods are based on comparisons to related proteins. Therefore, the search for related sequences and the alignment to other proteins is a prerequisite for most of the analyses in this practical. Hence we will investigate the recall and alignment quality of different alignment methods.

The introductory talks shall give an overview of

  • pairwise alignments and high-throughput profile searches (e.g. Fasta, Blast, PSI-Blast, HHsearch)
  • multiple alignments (e.g. ClustalW, Probcons, Mafft, Muscle, T-Coffee, Cobalt) and MSA editors (e.g. Jalview)

with special attention to advantages and limitations of these methods.

Subsequently, for the native protein sequence of every disease, the students shall employ different tools for database searching and multiple sequence alignment. The methods to employ (minimally) are:

  • Searches of the non-redundant sequence database:
    • Fasta
    • Blast
    • PSI-Blast using standard parameters with all combinations of
      • 3 or 5 iterations
      • E-value cutoff at the default (0.005) or at 10E-6
    • HHsearch
  • Multiple sequence alignments of 20 sequences from your database search, including sequences from these ranges:
    • 99 - 90% sequence identity
    • 89 - 60% sequence identity
    • 59 - 40% sequence identity
    • 39 - 20% sequence identity
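
Sorting search hits into the identity ranges above can be sketched in a few lines of Python. This is only an illustration: the input format (a list of `(hit_id, percent_identity)` pairs in search-rank order), the function name `pick_sequences` and the `per_range` parameter are assumptions, not part of the task. The ranges are treated as half-open intervals so that fractional values such as 89.5% land in a bin.

```python
# Identity ranges from the task, written as half-open intervals [low, high)
# in percent. Assumption: a fractional value such as 89.5 belongs to the
# lower range (89 - 60), and hits with 100% identity are not selected.
RANGES = {
    "99-90": (90, 100),
    "89-60": (60, 90),
    "59-40": (40, 60),
    "39-20": (20, 40),
}


def pick_sequences(hits, per_range=5):
    """Group hits by identity range and keep the first `per_range` of each.

    `hits` is a list of (hit_id, percent_identity) pairs, assumed to be
    ordered by search rank; that order is preserved within each range.
    """
    selection = {label: [] for label in RANGES}
    for hit_id, identity in hits:
        for label, (low, high) in RANGES.items():
            if low <= identity < high and len(selection[label]) < per_range:
                selection[label].append(hit_id)
                break
    return selection


if __name__ == "__main__":
    example_hits = [("hitA", 97.2), ("hitB", 71.0), ("hitC", 44.5), ("hitD", 23.1)]
    for label, ids in pick_sequences(example_hits).items():
        print(label, ids)
```

The sketch does not check whether a range contains a sequence with a known structure; that has to be verified separately.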

Ideally there should be 5 sequences from each range, with at least one PDB structure in each range. The alignment methods to use are:

  • Cobalt
  • ClustalW
  • Muscle
  • T-Coffee with
    • default parameters ("t_coffee your_sequences.fasta")
    • use of 3D-Coffee
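
The PSI-Blast parameter combinations requested in the search list can be enumerated with a short Python sketch. It assumes the BLAST+ `psiblast` program and maps the "E-value cutoff" to its `-inclusion_ethresh` option; both are assumptions, since the task names neither the BLAST release nor the option, and the legacy `blastpgp` program uses different flags. The file names and the database name `nr` are placeholders. The cutoff values are copied verbatim from the task and passed through as strings. The script only builds the command lines and does not run them.

```python
from itertools import product

# Values copied verbatim from the task list.
ITERATIONS = [3, 5]
EVALUE_CUTOFFS = ["0.005", "10E-6"]


def psiblast_commands(query, db="nr"):
    """Return one psiblast argument list per (iterations, cutoff) pair."""
    commands = []
    for n_iter, cutoff in product(ITERATIONS, EVALUE_CUTOFFS):
        out_file = f"psiblast_it{n_iter}_e{cutoff}.out"  # placeholder naming scheme
        commands.append([
            "psiblast",
            "-query", query,
            "-db", db,
            "-num_iterations", str(n_iter),
            # Assumption: the task's cutoff means the profile inclusion threshold.
            "-inclusion_ethresh", cutoff,
            "-out", out_file,
        ])
    return commands


if __name__ == "__main__":
    for command in psiblast_commands("query.fasta"):
        print(" ".join(command))
```

Each argument list can be handed to `subprocess.run` once the query file and database are in place.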