Gaucher Disease: Task 02 - Alignments

Alignments allow a comparison of strings. In bioinformatics, sequence alignments show the relation between two or more sequences.

Theoretical Background

Description

Pairwise or multiple alignments often contain an additional line below the alignment proper. This line describes the relation of the aligned residues above in more detail: the symbols show whether a column is a match between identical amino acids or whether the residues are only similar.

Symbols for describing sequence alignments
Symbol	Example	Meaning
*	blub	identical residues
:		similar residues
.
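The annotation line can be derived column by column. The following is a minimal sketch, not the implementation of any particular alignment program. It assumes the Clustal convention, in which "." marks weakly similar residues, and it assumes the Clustal "strong" and "weak" residue groups listed in the code; neither the groups nor the function names are taken from this page.

```python
# Residue groups following the Clustal convention (assumption, not from this page).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def annotate_column(residues):
    """Return the conservation symbol for one alignment column."""
    column = {r.upper() for r in residues}
    if "-" in column:
        return " "          # a gap in the column: no symbol
    if len(column) == 1:
        return "*"          # identical residues
    if any(column <= set(group) for group in STRONG_GROUPS):
        return ":"          # similar residues (strong group)
    if any(column <= set(group) for group in WEAK_GROUPS):
        return "."          # weakly similar residues (assumed meaning)
    return " "


def annotation_line(sequences):
    """Build the symbol line for a list of equally long aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must have the same length")
    return "".join(annotate_column(col) for col in zip(*sequences))


if __name__ == "__main__":
    top, bottom = "MKTA", "MRSA"
    print(top)
    print(bottom)
    print(annotation_line([top, bottom]))
```

The same function works for pairwise and multiple alignments, because each column is reduced to the set of residues it contains.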

Sequence Searches

Sequence searches with our query protein sequence, P04062.fasta, were done with the following programs:

BLAST
- using standard parameters
- against big_80
- against big

Psi-BLAST
- with number of shown hits and alignments set to 10000 (-b, -v options), so that all the hits will be shown.
- with all combinations of:
  - 2 iterations: 1 iteration against big_80 followed by 1 iteration against big
  - 10 iterations: 9 iterations against big_80 followed by 1 iteration against big
  - default E-value cutoff (0.002)
  - E-value cutoff 10E-10
- other options left at their defaults

HHblits
- with number of shown hits and alignments set to 10000 (-Z, -B options), as in Psi-BLAST
- with all combinations of:
  - 2 iterations against uniprot_20
  - 10 iterations against uniprot_20
  - default E-value cutoff (0.002)
  - E-value cutoff 10E-10
- other options left at their defaults

The script run.pl was written and used for the runs. Profile and result files (a3m and hhr for HHblits; chk ("checkpoint") and PSSM for Psi-BLAST) were created so that a search could be continued against another database: from big_80 to big for Psi-BLAST, and later against a PDB database for the evaluation.

For Psi-BLAST, a search against big_80 was done first in order to build a good profile; a final iteration against big was then run with this profile. The idea was to get as many hits as possible, so that the results are comparable with HHblits, where the runs were made against the clustered HMM database. All uniprot_20 cluster members were counted in the following comparison and evaluation.
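The parameter combinations described above can be enumerated programmatically. The following is a rough sketch in Python, not a reproduction of run.pl. The database paths, output file names and function names are hypothetical. Only -b, -v, -Z and -B are named on this page; the remaining flags are the ones I would expect from the legacy blastpgp and HHblits command lines and should be checked against the installed versions. The E-value strings are copied as written above.

```python
from itertools import product

QUERY = "P04062.fasta"
# Hypothetical database locations; adjust to the local installation.
DB_BIG_80 = "/path/to/big_80"
DB_BIG = "/path/to/big"
DB_UNIPROT_20 = "/path/to/uniprot_20"

ITERATIONS = [2, 10]
EVALUES = ["0.002", "10E-10"]   # written exactly as on this page
MAX_HITS = "10000"


def psiblast_stages(iterations, evalue):
    """Two-stage Psi-BLAST run: build the profile on big_80, then search big."""
    tag = f"psiblast_n{iterations}_e{evalue}"
    common = ["blastpgp", "-i", QUERY, "-h", evalue, "-b", MAX_HITS, "-v", MAX_HITS]
    # All but the last iteration run against big_80 and write a checkpoint.
    profile_run = common + ["-d", DB_BIG_80, "-j", str(iterations - 1),
                            "-C", f"{tag}.chk", "-o", f"{tag}.big_80.out"]
    # The last iteration restarts from the checkpoint against big.
    final_run = common + ["-d", DB_BIG, "-j", "1", "-R", f"{tag}.chk",
                          "-Q", f"{tag}.pssm", "-o", f"{tag}.big.out"]
    return [profile_run, final_run]


def hhblits_command(iterations, evalue):
    """Single HHblits run against uniprot_20, writing hhr and a3m files."""
    tag = f"hhblits_n{iterations}_e{evalue}"
    return ["hhblits", "-i", QUERY, "-d", DB_UNIPROT_20, "-n", str(iterations),
            "-e", evalue, "-Z", MAX_HITS, "-B", MAX_HITS,
            "-o", f"{tag}.hhr", "-oa3m", f"{tag}.a3m"]


def all_runs():
    """Return every command for all iteration/E-value combinations."""
    runs = []
    for iterations, evalue in product(ITERATIONS, EVALUES):
        runs.extend(psiblast_stages(iterations, evalue))
        runs.append(hhblits_command(iterations, evalue))
    return runs


if __name__ == "__main__":
    for command in all_runs():
        print(" ".join(command))
```

The commands are only printed here; passing each list to subprocess.run would execute them.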

Comparison

Overlap of hits

Percentage identity distribution

E-value distribution

Evaluation

The hits were validated against the COPS L30 - L60 groups.

Multiple Sequence Alignment

For the multiple sequence alignments, three sets were created. To this end, the results of the previous task were grouped according to their sequence identity to the native protein sequence of glucocerebrosidase. The native protein sequence is also included in each set to keep the alignments in relation to the protein that causes Gaucher's disease.

Set 1: sequence identity >60%
ID	Identity in %
P04062	100
A9UD35	84.1
D1L2S0	83.0
3gxi_A	99.8
2nt1_A	99.8
F6WDY8	90.7
Q2KHZ8	89.2
F5CB27	81.8
F5H241	98.2
B7Z6S9	99.8

Set 2: sequence identity <30%
ID	Identity in %
P04062	100
H6CEV7	26.5
Q21GD0	24.3
I1WBF3	23.7
I9HH59	29.4
D0TN48	25.4
B1VPJ0	27.5
E2LY19	24.8
K9HBW2	25.9
B5QQZ8	27.1
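The grouping by sequence identity can be sketched as follows. This is an illustration under assumptions, not the procedure actually used. Identity is computed here over columns in which neither sequence has a gap, which is only one of several common definitions. The function names are hypothetical. Only the two thresholds given above (>60% and <30%) are used; hits in between are left unassigned.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def assign_set(identity):
    """Map an identity in percent to the sets listed above."""
    if identity > 60:
        return "Set 1"
    if identity < 30:
        return "Set 2"
    return None   # neither threshold applies


if __name__ == "__main__":
    # Identities taken from the tables above.
    for hit_id, identity in [("A9UD35", 84.1), ("H6CEV7", 26.5)]:
        print(hit_id, assign_set(identity))
```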
