Task 2: Alignments

From Bioinformatikpedia
Revision as of 23:29, 29 August 2013 by Betza

Multiple sequence alignments

< 30% identity        > 60% identity        Whole range of identity
AC number  Identity   AC number  Identity   AC number  Identity   AC number  Identity
F6JYA9     0.29       B4DDZ1     0.83       B4DDZ1     0.83       F6WCX4     0.34
D9J389     0.28       Q5EEZ1     0.76       Q5EEZ1     0.76       F6JYA9     0.29
D5MSB3     0.27       G3THV5     0.75       G1MBW1     0.73       D9J389     0.28
B3FRK2     0.20       G1MBW1     0.73       H0VAR7     0.72       D5MSB3     0.27
3ov6_A     0.22       H0VAR7     0.72       F1PX48     0.71       2zok_E     0.24
1p7k_L     0.21       F1PX48     0.71       G1T7D7     0.70       H0Y1D0     0.20
H0Y1D0     0.20       G1T7D7     0.70       G5BQE5     0.67       Q8SNJ4     0.22
B3FRK3     0.18       O35799     0.68       Q95IT9     0.38       B3FRK3     0.18
Q8HWL2     0.17       G5BQE5     0.67       2qrt_A     0.38       Q8HWL2     0.17
                                            Q8HX83     0.36
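Table 5: All sequences used in the multiple sequence alignments, grouped by their sequence identity to human HFE (given as a fraction). The whole-range group occupies the last two column pairs.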

To assess how multiple sequence alignments differ between close and distant homologs, three groups of sequences were selected: one with sequences sharing more than 60% sequence identity with human HFE, one with sequences below 30% identity, and one with sequences covering the whole range of sequence identity. The selected sequences are listed in Table 5.
To ensure that sequences with known structures were also represented in the alignments, the sequences of the following PDB structures (given as PDB ID and chain) were included:

  • 1p7k_L
  • 3ov6_A
  • 2qrt_A
  • 2zok_E
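The selection criteria above can be sketched in Python. This is a minimal illustration, not the procedure actually used for this task: the helper names `fraction_identity` and `group_by_identity` are hypothetical, and identity is assumed here to be computed over the columns of a pairwise alignment in which neither sequence has a gap (other common definitions, e.g. normalising by the length of the shorter sequence, give different values). Table 5 lists identities as fractions, so the thresholds become 0.30 and 0.60. The whole-range group was a hand-picked sample and is therefore not derived from a threshold here.

```python
from typing import Dict, List


def fraction_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumption: this is only one of several common definitions of sequence identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return identical / len(pairs)


def group_by_identity(identities: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign accessions to the below-30% and above-60% groups (hypothetical helper).

    Sequences between the two thresholds are not assigned to either group.
    """
    groups: Dict[str, List[str]] = {"below_30": [], "above_60": []}
    for accession, identity in identities.items():
        if identity < 0.30:
            groups["below_30"].append(accession)
        elif identity > 0.60:
            groups["above_60"].append(accession)
    return groups
```

With values from Table 5, `group_by_identity({"F6JYA9": 0.29, "B4DDZ1": 0.83, "Q95IT9": 0.38})` puts F6JYA9 in the below-30% group and B4DDZ1 in the above-60% group, while Q95IT9 (0.38) falls into neither and appears only in the whole-range group.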


ClustalW

<figtable id="clustalW">

ClustalW alignment of sequences from the below 30% identity group.
ClustalW alignment of sequences from the above 60% identity group.
ClustalW alignment of sequences from the whole-range identity group.

</figtable>

MAFFT

<figtable id="mafft">

MAFFT alignment of sequences from the below 30% identity group.
MAFFT alignment of sequences from the above 60% identity group.
MAFFT alignment of sequences from the whole-range identity group.

</figtable>

T-Coffee

<figtable id="tcoffee">

T-Coffee alignment of sequences from the below 30% identity group.
T-Coffee alignment of sequences from the above 60% identity group.
T-Coffee alignment of sequences from the whole-range identity group.

</figtable>


Comparison

In the above 60% sequence identity group, both the residues and the gap positions are well conserved, especially in the first third of the sequences. In the remaining two thirds, sequence conservation drops slightly but stays high overall. In the below 30% identity group, the number of gaps within the sequences is not much higher than in the above 60% group, but overall residue conservation is considerably lower. Nevertheless, some residues in this group are very well conserved and might be functionally important. In the whole-range identity group, the gap positions are less consistent than in the other two groups, and residue conservation is even lower than in the below 30% identity group. This effect is, however, probably amplified by the larger number of sequences in this group.
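The conservation of residues and gaps described above was judged from the alignment figures. One simple way to put numbers on it, sketched here under our own assumptions (the function name `column_profile` and the scoring scheme are hypothetical and not taken from any of the three programs), is to compute for every alignment column the frequency of its most common residue among the non-gap characters, together with the fraction of sequences that have a gap there.

```python
from collections import Counter
from typing import List, Tuple


def column_profile(alignment: List[str]) -> List[Tuple[float, float]]:
    """Return (residue_conservation, gap_fraction) for each alignment column.

    residue_conservation: frequency of the most common residue among the
    non-gap characters of the column (0.0 for an all-gap column).
    gap_fraction: fraction of sequences with a gap in the column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    profile = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        residues = [c for c in column if c != "-"]
        gap_fraction = (n - len(residues)) / n
        if residues:
            # Count of the single most frequent residue in this column.
            top_count = Counter(residues).most_common(1)[0][1]
            conservation = top_count / len(residues)
        else:
            conservation = 0.0
        profile.append((conservation, gap_fraction))
    return profile
```

For the toy alignment `["MKV-L", "MKI-L", "MRV-L"]`, the first column scores `(1.0, 0.0)`, the second about `(0.67, 0.0)` and the all-gap fourth column `(0.0, 1.0)`. Averaging the first value over a window of columns would allow, for example, the first third of an alignment to be compared with the rest.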

The three alignment programs yield comparable results, i.e. none of them is considerably better or worse than the other two. Nevertheless, some differences can be observed. Most notably, MAFFT yields far fewer conserved columns for the low sequence identity group than the other programs. In addition, blocks of consecutive gap columns are placed at different positions by the different programs.
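Both differences can be checked programmatically. The sketch below uses hypothetical helper names and assumes that a "conserved column" is one in which every sequence has the same residue and none has a gap (the columns ClustalW marks with `*` in its conservation line). The counts behind the comparison above may have been based on a different criterion. Gap blocks are reported in ungapped sequence coordinates (number of residues preceding the block, block length). This lets the same sequence be compared across two programs even though their alignments have different lengths.

```python
from typing import List, Tuple


def count_conserved_columns(alignment: List[str]) -> int:
    """Count columns where all sequences share one residue and none has a gap."""
    count = 0
    for column in zip(*alignment):
        residues = {c.upper() for c in column}
        if len(residues) == 1 and "-" not in residues:
            count += 1
    return count


def gap_blocks(aligned_seq: str) -> List[Tuple[int, int]]:
    """Return runs of consecutive gaps as (residues_before_gap, gap_length).

    Positions refer to the ungapped sequence, so outputs for the same
    sequence from different alignment programs can be compared directly.
    """
    blocks = []
    residues_seen = 0
    run = 0
    for ch in aligned_seq:
        if ch == "-":
            run += 1
        else:
            if run:
                # A gap run just ended before this residue.
                blocks.append((residues_seen, run))
                run = 0
            residues_seen += 1
    if run:
        # Trailing gap run at the end of the aligned sequence.
        blocks.append((residues_seen, run))
    return blocks
```

For example, `count_conserved_columns(["MKV-L", "MKI-L", "MRV-L"])` returns 2. `gap_blocks("MK--VL-")` returns `[(2, 2), (4, 1)]`, whereas `gap_blocks("M--KVL-")` returns `[(1, 2), (4, 1)]`: the same two-residue gap has been placed one residue earlier.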