Task 2: Multiple Sequence Alignment

From Bioinformatikpedia

Sorry, we're behind schedule; the page will be filled with content as soon as possible.


We researched the protein sequence of the branched-chain alpha-keto acid dehydrogenase complex subunit alpha (BCKDHA) with the following original sequence:

  • BCKDHA
>sp|P12694|ODBA_HUMAN 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial OS=Homo sapiens GN=BCKDHA PE=1 SV=2
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL
RKQQESLARHLQTYGEHYPLDHFDK
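
The header line of the FASTA record above packs several fields into one string. As a rough illustration of how it can be taken apart, the following sketch splits a UniProt-style header into database, accession, entry name, protein name and the two-letter tags (OS, GN, PE, SV). The function name `parse_uniprot_header` is our own and not part of any library.

```python
import re

def parse_uniprot_header(header: str) -> dict:
    """Split a UniProt-style FASTA header into its main fields."""
    line = header.lstrip(">").strip()
    ident, _, rest = line.partition(" ")
    db, accession, entry_name = ident.split("|")
    # The free-text protein name ends where the first XX= tag begins
    first_tag = re.search(r"\s[A-Z]{2}=", rest)
    protein_name = rest[: first_tag.start()] if first_tag else rest
    # Collect tags such as OS=..., GN=..., PE=..., SV=...
    tags = dict(re.findall(r"([A-Z]{2})=(.+?)(?=\s[A-Z]{2}=|$)", rest))
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        **tags,
    }

header = (">sp|P12694|ODBA_HUMAN 2-oxoisovalerate dehydrogenase subunit alpha, "
          "mitochondrial OS=Homo sapiens GN=BCKDHA PE=1 SV=2")
info = parse_uniprot_header(header)
print(info["accession"], info["entry_name"], info["GN"])
```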


Blast

To calculate the sequence alignments we used the blast and psiblast binaries from NCBI (version 2.2.26+). Because the standard BLAST run hit the limit of 250 matches, and all of them still appeared highly significant (E-value < 1e-60), we increased the maximum number of target hits to 2000 and set an E-value threshold of 0.002. With these settings we found about 1550 matching alignments.
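
A search with these settings could be assembled roughly as follows. This is only a sketch: the text does not state which options were actually passed, so the use of `-max_target_seqs` (rather than, for example, `-num_alignments`) is an assumption based on the wording above, and the file names are hypothetical. The command is only built and printed, not executed.

```python
def blastp_command(query_fasta, database, out_file,
                   evalue=0.002, max_target_seqs=2000):
    """Build a blastp argument list (BLAST+ style options)."""
    return [
        "blastp",
        "-query", query_fasta,                     # hypothetical query file
        "-db", database,                           # database name or path
        "-evalue", str(evalue),                    # E-value threshold from the text
        "-max_target_seqs", str(max_target_seqs),  # assumed option for "max target hits"
        "-out", out_file,                          # hypothetical output file
    ]

cmd = blastp_command("BCKDHA.fasta", "big80", "BCKDHA_blast.out")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and the database are installed.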

As can be seen in the figure, the sequence alignments mainly have a similarity between 15% and 40%.

[Figure: Distribution of sequence similarity for the BCKDHA BLAST query against the big80 database.]

[Figure: Distribution of E-values in BLAST.]
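
A distribution like the one described can be obtained by binning the per-hit similarity values. The sketch below shows one possible binning into 5% steps; the input values are made up purely for illustration and are not taken from our BLAST run, and the function name is our own.

```python
from collections import Counter

def identity_histogram(identities, bin_width=5):
    """Count values per bin; each key is the lower bin edge in percent."""
    counts = Counter()
    for pid in identities:
        lower = int(pid // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Made-up values for illustration only, not results from the search
example = [17.2, 22.5, 24.9, 31.0, 38.4, 61.3]
hist = identity_histogram(example)
for lower, n in hist.items():
    print(f"{lower:3d}-{lower + 5:3d}%  {'#' * n}")
```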

Psi-Blast

HHBlits

Multiple Sequence Alignment (MSA)

In this task we produce MSAs from our database search results. The first step is to create representative datasets, followed by building MSAs with different tools, and finally reviewing the alignments in Jalview and comparing the tools against each other.

Dataset creation

We chose the following sequences from the Psi-Blast run with E-value E-10 and 10 iterations, trying to fit the scheme given on the task page:

ref seq

| Identifier | Identity | Organism | Description |
|---|---|---|---|
| P12694 | 100% | human | ODBA_HUMAN 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial |

high identity ( >60% )

| Identifier | Identity | Organism | Description |
|---|---|---|---|
| B4DP47 | 90% | human | B4DP47_HUMAN Uncharacterized protein |
| H2L9X9 | 80% | Oryzias latipes | H2L9X9_ORYLA Uncharacterized protein |
| H2NYX7 | 90% | Pongo abelii | H2NYX7_PONAB Uncharacterized protein |
| G3RDZ3 | 99% | Gorilla gorilla gorilla | G3RDZ3_GORGO Uncharacterized protein |
| G7PXN6 | 92% | Macaca fascicularis | G7PXN6_MACFA Putative uncharacterized protein |
| C1BZX0 | 61% | Caligus clemensi | C1BZX0_9MAXI 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial |
| E9C2J8 | 62% | Capsaspora owczarzaki | E9C2J8_CAPO3 Branched chain keto acid dehydrogenase E1 |
| A8XMR6 | 63% | Caenorhabditis briggsae | A8XMR6_CAEBR Putative uncharacterized protein |
| F1L131 | 61% | Ascaris suum | F1L131_ASCSU 2-oxoisovalerate dehydrogenase subunit alpha |
| B3S5B4 | 66% | Trichoplax adhaerens | B3S5B4_TRIAD Putative uncharacterized protein |

moderate identity ( >40% )

| Identifier | Identity | Organism | Description |
|---|---|---|---|
| F7NS20 | 50% | Rheinheimera sp. A13L | F7NS20_9GAMM Pyruvate/2-oxoglutarate dehydrogenase complex, dehydrogenase component alpha subunit |
| A3JES2 | 50% | Marinobacter sp. ELB17 | A3JES2_9ALTE 2-oxoglutarate dehydrogenase complex, dehydrogenase (E1) component, eukaryotic type, alpha subunit |
| F2GCR2 | 49% | Alteromonas macleodii (strain DSM 17117 / Deep ecotype) | F2GCR2_ALTMD Alpha keto acid dehydrogenase complex, E1 component, alpha subunit |
| E1VBY4 | 48% | Halomonas elongata | E1VBY4_HALED 3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring) |
| A9TD44 | 47% | Physcomitrella patens subsp. patens | A9TD44_PHYPA Predicted protein |
| E1ZBL6 | 51% | Chlorella variabilis | E1ZBL6_CHLVA Putative uncharacterized protein |
| F2Q107 | 51% | Trichophyton equinum | F2Q107_TRIEC 2-oxoisovalerate dehydrogenase subunit alpha |
| B6GYK7 | 52% | Penicillium chrysogenum | B6GYK7_PENCW Pc12g08790 protein |
| G2QH91 | 53% | Thielavia heterothallica | G2QH91_THIHA Dehydrogenase-like protein |
| G0S8V2 | 51% | Chaetomium thermophilum | G0S8V2_CHATD Alpha subunit-like protein |

whole range identity ( 0-100% )

| Identifier | Identity | Organism | Description |
|---|---|---|---|
| F6EVQ9 | 39% | Sphingobium chlorophenolicum | F6EVQ9_SPHCR 3-methyl-2-oxobutanoate dehydrogenase |
| B1YII6 | 39% | Exiguobacterium sibiricum | B1YII6_EXIS2 Pyruvate dehydrogenase |
| Q9HN77 | 34% | Halobacterium salinarium | Q9HN77_HALSA Pyruvate dehydrogenase |
| G2QH91 | 53% | Thielavia heterothallica | G2QH91_THIHA Dehydrogenase-like protein |
| A9TD44 | 47% | Physcomitrella patens subsp. patens | A9TD44_PHYPA Predicted protein |
| G0S8V2 | 51% | Chaetomium thermophilum | G0S8V2_CHATD Alpha subunit-like protein |
| B4DP47 | 90% | human | B4DP47_HUMAN Uncharacterized protein |
| H2L9X9 | 80% | Oryzias latipes | H2L9X9_ORYLA Uncharacterized protein |
| H2NYX7 | 90% | Pongo abelii | H2NYX7_PONAB Uncharacterized protein |
| C1BZX0 | 61% | Caligus clemensi | C1BZX0_9MAXI 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial |
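
The grouping used for the datasets can be expressed as a simple threshold rule. The following sketch sorts hits into the identity classes by their percent identity to the reference; the thresholds are the ones named in the group labels, while the label "low identity" for everything at or below 40% is our own naming for this illustration.

```python
def identity_group(identity_percent: float) -> str:
    """Assign a hit to an identity class by its percent identity to the query."""
    if identity_percent > 60:
        return "high identity ( >60% )"
    if identity_percent > 40:
        return "moderate identity ( >40% )"
    return "low identity"  # label assumed here; such hits only occur in the whole range set

# A few identifiers and identities taken from the dataset listing
hits = {"B4DP47": 90, "C1BZX0": 61, "F7NS20": 50, "Q9HN77": 34}
groups = {acc: identity_group(pid) for acc, pid in hits.items()}
for acc, group in groups.items():
    print(acc, group)
```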

ClustalW

T-Coffee

Muscle

Results

Jalview representations of the alignments:

>60% fraction
ClustalW MSA >60%
T-Coffee MSA >60%
Muscle MSA >60%
>40% fraction
ClustalW MSA >40%
T-Coffee MSA >40%
Muscle MSA >40%
whole range fraction
ClustalW MSA 0-100%
T-Coffee MSA 0-100%
Muscle MSA 0-100%

clustalw >60
ID                        gaps
tr|C1BZX0|C1BZX0_9MAXI    84
sp|P12694|ODBA_HUMAN      70
tr|B3S5B4|B3S5B4_TRIAD    174
tr|H2L9X9|H2L9X9_ORYLA    66
tr|H2NYX7|H2NYX7_PONAB    52
tr|F1L131|F1L131_ASCSU    74
tr|B4DP47|B4DP47_HUMAN    67
tr|G7PXN6|G7PXN6_MACFA    32
tr|G3RDZ3|G3RDZ3_GORGO    90
tr|E9C2J8|E9C2J8_CAPO3    78
tr|A8XMR6|A8XMR6_CAEBR    83
conserved >8              235
conserved >10             130

t-coffee >60
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  80
UniProt/Swiss-Prot|C1BZX0|C1BZX0_9MAXI/1-431              94
UniProt/Swiss-Prot|A8XMR6|A8XMR6_CAEBR/1-432              93
UniProt/Swiss-Prot|B4DP47|E7EW46|B4DP47_HUMAN/1-448       77
UniProt/Swiss-Prot|H2L9X9|H2L9X9_ORYLA/1-449              76
UniProt/Swiss-Prot|F1L131|F1L131_ASCSU/1-441              84
UniProt/Swiss-Prot|G3RDZ3|G3RDZ3_GORGO/1-425              100
UniProt/Swiss-Prot|B3S5B4|B3S5B4_TRIAD/1-341              184
UniProt/Swiss-Prot|G7PXN6|G7PXN6_MACFA/1-483              42
UniProt/Swiss-Prot|H2NYX7|H2NYX7_PONAB/1-463              62
UniProt/Swiss-Prot|E9C2J8|E9C2J8_CAPO3/1-437              88
conserved >8                                              235
conserved >10                                             129
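
The gap counts can be sanity-checked against the residue ranges in the identifiers: within one alignment, sequence length plus gap count must give the same total alignment length for every row. The sketch below does this for three rows of the T-Coffee >60% table, taking the reference row as range 1-445 with 80 gaps; the helper name is our own.

```python
def alignment_length(id_with_range: str, gaps: int) -> int:
    """Residue count from the '/start-end' suffix plus the number of gaps."""
    start, end = id_with_range.rsplit("/", 1)[1].split("-")
    return int(end) - int(start) + 1 + gaps

# Rows from the T-Coffee >60% table (identifier with range, gaps)
rows = [
    ("UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445", 80),
    ("UniProt/Swiss-Prot|C1BZX0|C1BZX0_9MAXI/1-431", 94),
    ("UniProt/Swiss-Prot|B3S5B4|B3S5B4_TRIAD/1-341", 184),
]
lengths = {alignment_length(ident, gaps) for ident, gaps in rows}
print(lengths)  # {525}
```

All three rows give the same alignment length, which is what one would expect if the counts were read from a single alignment.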

muscle >60
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  75
UniProt/Swiss-Prot|C1BZX0|C1BZX0_9MAXI/1-431              89
UniProt/Swiss-Prot|A8XMR6|A8XMR6_CAEBR/1-432              88
UniProt/Swiss-Prot|B4DP47|E7EW46|B4DP47_HUMAN/1-448       72
UniProt/Swiss-Prot|H2L9X9|H2L9X9_ORYLA/1-449              71
UniProt/Swiss-Prot|F1L131|F1L131_ASCSU/1-441              79
UniProt/Swiss-Prot|G3RDZ3|G3RDZ3_GORGO/1-425              95
UniProt/Swiss-Prot|B3S5B4|B3S5B4_TRIAD/1-341              179
UniProt/Swiss-Prot|G7PXN6|G7PXN6_MACFA/1-483              37
UniProt/Swiss-Prot|H2NYX7|H2NYX7_PONAB/1-463              57
UniProt/Swiss-Prot|E9C2J8|E9C2J8_CAPO3/1-437              83
conserved >8                                              236
conserved >10                                             130

clustalw >40
ID                        gaps
tr|G0S8V2|G0S8V2_CHATD    29
sp|P12694|ODBA_HUMAN      92
tr|E1ZBL6|E1ZBL6_CHLVA    141
tr|A9TD44|A9TD44_PHYPA    70
tr|A3JES2|A3JES2_9ALTE    133
tr|F2Q107|F2Q107_TRIEC    90
tr|B6GYK7|B6GYK7_PENCW    89
tr|F2GCR2|F2GCR2_ALTMD    142
tr|G2QH91|G2QH91_THIHA    66
tr|F7NS20|F7NS20_9GAMM    143
tr|E1VBY4|E1VBY4_HALED    130
conserved >8              139
conserved >10             99

t-coffee >40
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  109
UniProt/Swiss-Prot|G2QH91|G2QH91_THIHA/1-471              83
UniProt/Swiss-Prot|F7NS20|F7NS20_9GAMM/1-394              160
UniProt/Swiss-Prot|A3JES2|A3JES2_9ALTE/1-404              150
UniProt/Swiss-Prot|B6GYK7|B6GYK7_PENCW/1-448              106
UniProt/Swiss-Prot|E1ZBL6|E1ZBL6_CHLVA/1-396              158
UniProt/Swiss-Prot|F2GCR2|F2GCR2_ALTMD/1-395              159
UniProt/Swiss-Prot|F2Q107|F2Q107_TRIEC/1-447              107
UniProt/Swiss-Prot|A9TD44|A9TD44_PHYPA/1-467              87
UniProt/Swiss-Prot|E1VBY4|E1VBY4_HALED/1-407              147
UniProt/Swiss-Prot|G0S8V2|G0S8V2_CHATD/1-508              46
conserved >8                                              149
conserved >10                                             101

muscle >40
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  93
UniProt/Swiss-Prot|G2QH91|G2QH91_THIHA/1-471              67
UniProt/Swiss-Prot|F7NS20|F7NS20_9GAMM/1-394              144
UniProt/Swiss-Prot|A3JES2|A3JES2_9ALTE/1-404              134
UniProt/Swiss-Prot|B6GYK7|B6GYK7_PENCW/1-448              90
UniProt/Swiss-Prot|E1ZBL6|E1ZBL6_CHLVA/1-396              142
UniProt/Swiss-Prot|F2GCR2|F2GCR2_ALTMD/1-395              143
UniProt/Swiss-Prot|F2Q107|F2Q107_TRIEC/1-447              91
UniProt/Swiss-Prot|A9TD44|A9TD44_PHYPA/1-467              71
UniProt/Swiss-Prot|E1VBY4|E1VBY4_HALED/1-407              131
UniProt/Swiss-Prot|G0S8V2|G0S8V2_CHATD/1-508              30
conserved >8                                              149
conserved >10                                             100

clustalw whole range
ID                        gaps
sp|P12694|ODBA_HUMAN      133
tr|C1BZX0|C1BZX0_9MAXI    147
tr|G0S8V2|G0S8V2_CHATD    70
tr|H2L9X9|H2L9X9_ORYLA    129
tr|A9TD44|A9TD44_PHYPA    111
tr|H2NYX7|H2NYX7_PONAB    115
tr|B1YII6|B1YII6_EXIS2    228
tr|B4DP47|B4DP47_HUMAN    130
tr|F6EVQ9|F6EVQ9_SPHCR    143
tr|Q9HN77|Q9HN77_HALSA    159
tr|G2QH91|G2QH91_THIHA    107
conserved >8              132
conserved >10             58

t-coffee whole range
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  170
UniProt/Swiss-Prot|G2QH91|G2QH91_THIHA/1-471              144
UniProt/Swiss-Prot|B1YII6|B1YII6_EXIS2/1-350              265
UniProt/Swiss-Prot|C1BZX0|C1BZX0_9MAXI/1-431              184
UniProt/Swiss-Prot|B4DP47|E7EW46|B4DP47_HUMAN/1-448       167
UniProt/Swiss-Prot|H2L9X9|H2L9X9_ORYLA/1-449              166
UniProt/Swiss-Prot|F6EVQ9|F6EVQ9_SPHCR/1-435              180
UniProt/Swiss-Prot|Q9HN77|Q9HN77_HALSA/1-419              196
UniProt/Swiss-Prot|G0S8V2|G0S8V2_CHATD/1-508              107
UniProt/Swiss-Prot|H2NYX7|H2NYX7_PONAB/1-463              152
UniProt/Swiss-Prot|A9TD44|A9TD44_PHYPA/1-467              148
conserved >8                                              138
conserved >10                                             60

muscle whole range
ID                                                        gaps
UniProt/Swiss-Prot|P12694|Q16034|Q16472|ODBA_HUMAN/1-445  135
UniProt/Swiss-Prot|G2QH91|G2QH91_THIHA/1-471              109
UniProt/Swiss-Prot|B1YII6|B1YII6_EXIS2/1-350              230
UniProt/Swiss-Prot|C1BZX0|C1BZX0_9MAXI/1-431              149
UniProt/Swiss-Prot|B4DP47|E7EW46|B4DP47_HUMAN/1-448       132
UniProt/Swiss-Prot|H2L9X9|H2L9X9_ORYLA/1-449              131
UniProt/Swiss-Prot|F6EVQ9|F6EVQ9_SPHCR/1-435              145
UniProt/Swiss-Prot|Q9HN77|Q9HN77_HALSA/1-419              161
UniProt/Swiss-Prot|G0S8V2|G0S8V2_CHATD/1-508              72
UniProt/Swiss-Prot|H2NYX7|H2NYX7_PONAB/1-463              117
UniProt/Swiss-Prot|A9TD44|A9TD44_PHYPA/1-467              113
conserved >8                                              136
conserved >10                                             58
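
For readers who want to reproduce per-sequence gap counts from an alignment file, a minimal sketch is given below. It counts `-` characters per aligned sequence. The second helper counts columns in which all sequences carry the same residue; this is a much simpler stand-in and not the conservation score presumably underlying the "conserved" rows above. The toy alignment and both function names are made up for illustration.

```python
def gaps_per_sequence(aligned: dict) -> dict:
    """Number of gap characters ('-') in each aligned sequence."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return {name: seq.count("-") for name, seq in aligned.items()}

def identical_columns(aligned: dict) -> int:
    """Columns without gaps in which every sequence has the same residue.

    Simplified stand-in, not a graded conservation score.
    """
    columns = zip(*aligned.values())
    return sum(1 for col in columns if "-" not in col and len(set(col)) == 1)

# Toy alignment, invented for this example
toy = {"seq_a": "MAV-AIA", "seq_b": "MAVQA-A", "seq_c": "M-VQAIA"}
gap_counts = gaps_per_sequence(toy)
n_identical = identical_columns(toy)
print(gap_counts, n_identical)
```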