Difference between revisions of "Task alignments 2012"

Revision as of 11:53, 10 April 2012

Most prediction methods are based on comparisons to related proteins. Therefore, the search for related sequences and their alignment to the query protein are prerequisites for most of the analyses in this practical. Hence we will investigate the recall of different search methods and the alignment quality of different alignment methods.

Theoretical background talks

The introductory talks should give an overview of

pairwise alignments and high-throughput database and profile searches (e.g. Fasta, Blast, PSI-Blast, HHsearch)
multiple alignments (e.g. ClustalW, Probcons, Mafft, Muscle, T-Coffee, Cobalt) and MSA editors (e.g. Jalview)

with special attention to the advantages and limitations of these methods.

Sequence searches

Subsequently, for each native protein sequence of each disease, the students shall use different tools to search the "big80" database and to build multiple sequence alignments. The methods to use (at minimum) are:

Searches of the non-redundant sequence database big80:
- Blast
- PSI-Blast with standard parameters, in all four combinations of
  - number of iterations: 3 or 10
  - E-value inclusion cutoff: default (0.005) or 10^-6
- HHblits / HHsearch
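The four PSI-Blast runs can be scripted. The sketch below only builds the command lines; it assumes the BLAST+ `psiblast` program, a database formatted under the name `big80`, and that both cutoffs refer to the profile inclusion threshold (`-inclusion_ethresh`), reading the stricter one as 10^-6. The query file name, the output file names and the hit limit of 500 are hypothetical placeholders; replace the limit with the value you settled on for the plain Blast search.

```python
from itertools import product
from typing import List


def psiblast_commands(query: str, db: str = "big80",
                      max_target_seqs: int = 500) -> List[List[str]]:
    """Build one psiblast command per (iterations, inclusion E-value) pair.

    max_target_seqs=500 is a placeholder, not a recommended value.
    """
    iterations = [3, 10]
    inclusion_evalues = [0.005, 1e-6]  # default and stricter cutoff (assumed 10^-6)
    commands = []
    for n_iter, evalue in product(iterations, inclusion_evalues):
        # Hypothetical naming scheme so the four result files do not collide
        out = f"{query}.psiblast.it{n_iter}.e{evalue:g}.tsv"
        commands.append([
            "psiblast",
            "-query", query,
            "-db", db,
            "-num_iterations", str(n_iter),
            "-inclusion_ethresh", f"{evalue:g}",
            "-max_target_seqs", str(max_target_seqs),
            "-outfmt", "6",  # tabular output, easy to compare later
            "-out", out,
        ])
    return commands


if __name__ == "__main__":
    for cmd in psiblast_commands("protein.fasta"):
        print(" ".join(cmd))
```

Each command can then be run with `subprocess.run(cmd, check=True)`. Tabular output keeps the result lists easy to compare afterwards.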

Note: Check the outcome of your simple Blast search. If there are many significant hits, increase the maximum number of reported hits (-b or -max_target_seqs, depending on the Blast version) until no more relevant hits are found. Use that setting also for the PSI-Blast searches, and use a comparable setting for HHblits / HHsearch. (Think about why we ask you to do this.)
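One way to read this note as a procedure: keep doubling the hit limit while every reported hit is still significant, because a list made up entirely of significant hits may have been cut off. In this minimal sketch, `run_search` is a hypothetical wrapper around your Blast call that returns the E-values of the reported hits, and the E-value of 1e-3 used as "significant" is an assumed placeholder; whether the additional hits are actually relevant still has to be judged by you.

```python
from typing import Callable, List


def find_hit_limit(run_search: Callable[[int], List[float]],
                   start: int = 500,
                   evalue_cutoff: float = 1e-3,
                   max_limit: int = 100_000) -> int:
    """Double the limit on reported hits until significant hits no longer fill it.

    run_search(limit) is a hypothetical wrapper that runs the search with the
    given hit limit and returns the E-values of the reported hits.
    evalue_cutoff is a placeholder for what you consider "significant".
    """
    limit = start
    while limit < max_limit:
        evalues = run_search(limit)
        significant = sum(1 for e in evalues if e <= evalue_cutoff)
        if significant < limit:
            # Some reported slots are non-significant (or unused), so the
            # limit did not cut off any significant hit.
            return limit
        # Every reported hit is significant: the list may be truncated.
        limit *= 2
    return max_limit
```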

To evaluate the differences between the search methods:

compare the result lists (overlap, distributions of % identity and of scores)
validate the result lists (e.g. using COPS / GO)
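The comparison of result lists can be sketched as follows, assuming every search result is stored in BLAST tabular format (`-outfmt 6`, standard 12 columns, % identity in column 3). For PSI-Blast, which can report the hits of several iterations in one file, it assumes you first keep only the iteration you want to compare (e.g. the last one); HHblits / HHsearch results would need their own parser. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def read_hits(lines: Iterable[str]) -> Dict[str, float]:
    """Map subject ID -> highest % identity from BLAST tabular (-outfmt 6) lines."""
    hits: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, blank lines and status lines such as "Search has CONVERGED!"
        if line.startswith("#") or len(fields) < 12:
            continue
        subject, pident = fields[1], float(fields[2])
        hits[subject] = max(pident, hits.get(subject, 0.0))
    return hits


def compare(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Overlap between two hit lists, matched by subject ID."""
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    return {
        "only_a": len(a.keys() - b.keys()),
        "only_b": len(b.keys() - a.keys()),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


def identity_histogram(hits: Dict[str, float], bin_width: int = 10) -> Dict[int, int]:
    """Number of hits per % identity bin (bin label = lower bound)."""
    counts = Counter(int(p // bin_width) * bin_width for p in hits.values())
    return dict(sorted(counts.items()))
```

The score distribution can be binned the same way using the bit score (column 12) instead of the % identity.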

Multiple sequence alignments

Build multiple sequence alignments of 20 sequences from the database search results, including sequences from each of these identity ranges:

99 - 90% sequence identity
89 - 60% sequence identity
59 - 40% sequence identity
39 - 20% sequence identity

Ideally, there should be 5 sequences from each range, with at least one sequence of known PDB structure in each range.
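Choosing the 20 sequences can be sketched as binning the hits by % identity. The sketch treats the ranges as half-open intervals (so 100% identical hits, such as the query itself, are excluded) and takes `with_pdb`, a set of hit IDs known to have a PDB structure, as given; how you obtain that set is up to you. Both choices are assumptions, not part of the task description.

```python
from typing import Dict, List, Set, Tuple

# Identity ranges from the task, treated here as half-open intervals [low, high)
BINS: List[Tuple[float, float]] = [(90.0, 100.0), (60.0, 90.0), (40.0, 60.0), (20.0, 40.0)]


def pick_sequences(hits: Dict[str, float], with_pdb: Set[str],
                   per_bin: int = 5) -> Dict[Tuple[float, float], List[str]]:
    """Pick up to per_bin hit IDs per identity range, preferring those with a PDB structure."""
    selection: Dict[Tuple[float, float], List[str]] = {}
    for low, high in BINS:
        in_bin = [s for s, pid in hits.items() if low <= pid < high]
        # Hits with a known structure first, then by decreasing identity
        in_bin.sort(key=lambda s: (s not in with_pdb, -hits[s]))
        selection[(low, high)] = in_bin[:per_bin]
    return selection


def bins_without_structure(selection: Dict[Tuple[float, float], List[str]],
                           with_pdb: Set[str]) -> List[Tuple[float, float]]:
    """Identity ranges whose selected sequences include no PDB entry."""
    return [b for b, ids in selection.items() if not any(i in with_pdb for i in ids)]
```

Sorting by decreasing identity puts the chosen sequences near the top of each range; if you prefer to spread them across the range, change the sort key.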

The alignment methods to use are:

Cobalt
ClustalW
Muscle
T-Coffee with
- default parameters ("t_coffee your_sequences.fasta")
- 3D-Coffee (the T-Coffee mode that also uses PDB structures)

Comparison of the alignments:

How many conserved columns?
Are functionally important residues conserved?
How many gaps?
Are there gaps in secondary structure elements?
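The first, third and fourth questions can be partly quantified. The sketch below assumes each alignment was saved as aligned FASTA with `-` as the gap character (other tools may use `.`), and for the secondary structure check it assumes a per-residue state string for one reference sequence of known structure, e.g. DSSP-style `H` (helix) and `E` (strand). It only counts gaps in the other sequences at reference helix/strand positions, not insertions relative to the reference. Checking functionally important residues requires knowing which residues those are, so it is not covered here.

```python
from typing import Dict, List, Optional

GAP = "-"


def read_fasta(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {name: aligned sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def count_gap_openings(row: str) -> int:
    """Number of separate gap runs in one aligned sequence."""
    return sum(1 for i, c in enumerate(row) if c == GAP and (i == 0 or row[i - 1] != GAP))


def alignment_stats(aln: Dict[str, str]) -> Dict[str, int]:
    """Conserved columns, gap characters and gap openings of an alignment."""
    rows = list(aln.values())
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned sequences differ in length")
    # A column counts as conserved if it has no gap and a single residue type
    conserved = sum(
        1 for col in zip(*rows)
        if GAP not in col and len({c.upper() for c in col}) == 1
    )
    return {
        "columns": length,
        "conserved": conserved,
        "gap_chars": sum(r.count(GAP) for r in rows),
        "gap_openings": sum(count_gap_openings(r) for r in rows),
    }


def gaps_in_sse(aln: Dict[str, str], ref: str, ss: str, sse_states: str = "HE") -> int:
    """Count gaps in non-reference rows at columns where the reference residue
    lies in a helix/strand (ss: one state per ungapped reference residue)."""
    ref_row = aln[ref]
    if len(ss) != len(ref_row.replace(GAP, "")):
        raise ValueError("ss must have one state per residue of the reference")
    col_state: List[Optional[str]] = []
    k = 0
    for c in ref_row:
        if c == GAP:
            col_state.append(None)  # column is an insertion relative to the reference
        else:
            col_state.append(ss[k])
            k += 1
    return sum(
        1
        for name, row in aln.items() if name != ref
        for c, state in zip(row, col_state)
        if c == GAP and state is not None and state in sse_states
    )
```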
