Difference between revisions of "Fabry:Sequence alignments (sequence searches and multiple alignments)"

From Bioinformatikpedia
 
(66 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
[[Fabry Disease]] » Sequence alignments (sequence searches and multiple alignments)
  +
<hr>
  +
  +
  +
This page contains our results and discussions. The lab journal can be found [[Fabry:Sequence alignments (sequence searches and multiple alignments):Journal|here]].
  +
  +
 
== Reference sequence ==
 
== Reference sequence ==
   
Line 22: Line 29:
 
For the comparison of the GO terms, we obtained the set of terms for each hit and counted how many of them are in common with the GO terms of the search protein α-Galactosidase A. We divided the number of common terms by the number of GO terms of P06280 (49). Since these proportions are very small, we thought it would also make sense to explore the fraction of each hit's GO terms that is shared with the reference terms. Thus we divided the number of common terms by the number of terms of the hit. The histogram of the second ratio shows that on average over 80% of the GO terms of each hit are shared with those of AGAL. The low average agreement of the α-Galactosidase A terms with the hit terms may be due to the fact that humans are a lot more complex than the species the homologous hits belong to. The protein therefore has to fulfill more functions in a more complex organism and thus has more GO terms assigned.

For the comparison of the GO terms, we obtained the set of terms for each hit and counted how many of them are in common with the GO terms of the search protein α-Galactosidase A. We divided the number of common terms by the number of GO terms of P06280 (49). Since these proportions are very small, we thought it would also make sense to explore the fraction of each hit's GO terms that is shared with the reference terms. Thus we divided the number of common terms by the number of terms of the hit. The histogram of the second ratio shows that on average over 80% of the GO terms of each hit are shared with those of AGAL. The low average agreement of the α-Galactosidase A terms with the hit terms may be due to the fact that humans are a lot more complex than the species the homologous hits belong to. The protein therefore has to fulfill more functions in a more complex organism and thus has more GO terms assigned.
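The two ratios described above can be sketched as follows. This is only a minimal illustration, not the script we actually used: the function name <code>shared_go_fractions</code> and the GO identifiers in the toy example are made up for demonstration.

```python
def shared_go_fractions(reference_terms, hit_terms):
    """Return two fractions for one hit:
    (common / number of reference terms, common / number of hit terms)."""
    reference = set(reference_terms)
    hit = set(hit_terms)
    common = reference & hit
    # Guard against empty annotation sets to avoid division by zero.
    frac_of_reference = len(common) / len(reference) if reference else 0.0
    frac_of_hit = len(common) / len(hit) if hit else 0.0
    return frac_of_reference, frac_of_hit


# Toy example with invented GO identifiers (NOT the real annotation of P06280).
reference = {"GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"}
hit = {"GO:0000001", "GO:0000002", "GO:0000009"}
ref_frac, hit_frac = shared_go_fractions(reference, hit)
print(f"shared / reference terms: {ref_frac:.2f}")
print(f"shared / hit terms:       {hit_frac:.2f}")
```

A hit with few GO terms, most of them shared, gives a low first ratio and a high second ratio, which is the pattern described in the text.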
   
The average length fits the length of the α-Galactosidase A protein very well. This can be seen in the left picture [[Media:blastsearch_default_v700_Lengths.png | Histogram of the length of the BLAST hits for P06280]].
+
The average length fits the length of the α-Galactosidase A protein very well. This can be seen in the picture below ([[Media:blastsearch_default_v700_Lengths.png | Histogram of the length of the BLAST hits for P06280]]).
 
On average over 51% of the residues are positive and almost 36% are identical hits. Thus on average 87% of the residues in each alignment are similar to the protein sequence of AGAL.
 
On average over 51% of the residues are positive and almost 36% are identical hits. Thus on average 87% of the residues in each alignment are similar to the protein sequence of AGAL.
   
Line 42: Line 49:
 
=== Psi-Blast ===
 
=== Psi-Blast ===
   
  +
<figure id="fig:psi_overlaps">[[File:Fabry psi blast run overlaps.png|200px|thumb|<caption>Overview of the Psi-Blast result sets</caption>]]</figure>
  +
  +
We also used Psi-Blast, the profile-based BLAST program, to search against the big80 database. The one-line description output parameter (-v) was increased to show a maximum of 4000 hits, and the same goes for the number of alignments to show (-b). We performed four runs, combining 2 or 10 iterations with an E-value cut-off of 2e-3 or 1e-9. All other options were left at their default values. The exact command-line calls can be obtained from the [[Fabry:Sequence alignments (sequence searches and multiple alignments):Journal#Psi-Blast|journal]].
  +
  +
{| style="border-spacing: 0em; text-align: center; margin: 2em auto;"
  +
|-
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 1em;" | Iterations
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 1em; border-left: 2px solid #000;" | E-value cut-off
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 1em; border-left: 2px solid #000;" | Number of Hits
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 2em; border-left: 2px solid #000;" | Runtime
  +
|-
  +
| 2
  +
| style="border-left: 2px solid #000;" | 2e-3
  +
| style="border-left: 2px solid #000;" | 1129
  +
| style="border-left: 2px solid #000;" | 3m0.814s
  +
|-
  +
| 2
  +
| style="border-left: 2px solid #000;" | 1e-9
  +
| style="border-left: 2px solid #000;" | 683
  +
| style="border-left: 2px solid #000;" | 3m9.422s
  +
|-
  +
| 10
  +
| style="border-left: 2px solid #000;" | 2e-3
  +
| style="border-left: 2px solid #000;" | 3181
  +
| style="border-left: 2px solid #000;" | 14m29.179s
  +
|-
  +
| 10
  +
| style="border-left: 2px solid #000;" | 1e-9
  +
| style="border-left: 2px solid #000;" | 1491
  +
| style="border-left: 2px solid #000;" | 15m39.251s
  +
|-
  +
|}
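To compare the four runs more easily, the runtimes in the table can be converted to seconds. The following is a small sketch under the assumption that the runtimes are in the <code>XmY.ZZZs</code> format shown above; the helper name <code>parse_time</code> is our own choice for this example.

```python
import re


def parse_time(text):
    """Convert a runtime such as '3m0.814s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# (iterations, E-value cut-off, number of hits, runtime) taken from the table above.
runs = [
    (2, "2e-3", 1129, "3m0.814s"),
    (2, "1e-9", 683, "3m9.422s"),
    (10, "2e-3", 3181, "14m29.179s"),
    (10, "1e-9", 1491, "15m39.251s"),
]

for iterations, cutoff, hits, runtime in runs:
    seconds = parse_time(runtime)
    print(f"{iterations:>2} iterations, cut-off {cutoff}: "
          f"{hits} hits in {seconds:.1f} s ({hits / seconds:.1f} hits/s)")
```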
  +
  +
<xr id="fig:psi_overlaps"/> shows the hit set overlaps of the psi-blast runs.
  +
<br style="clear:both;">
  +
  +
<gallery caption="Psi-Blast run with 2 iterations and an E-value threshold of 2e-3" style="float: left; margin: 10px auto">
  +
File:Fabry psi results 2its eVal 2e-3 coverage.png | Coverage
  +
File:Fabry psi results 2its eVal 2e-3 eValue.png | E-value
  +
File:Fabry psi results 2its eVal 2e-3 identity.png | Sequence identity
  +
</gallery>
  +
  +
<gallery caption="Psi-Blast run with 2 iterations and an E-value threshold of 1e-9" style="float: right; margin: 10px auto">
  +
File:Fabry psi results 2its eVal 1e-9 coverage.png | Coverage
  +
File:Fabry psi results 2its eVal 1e-9 eValue.png | E-value
  +
File:Fabry psi results 2its eVal 1e-9 identity.png | Sequence identity
  +
</gallery>
  +
  +
<gallery caption="Psi-Blast run with 10 iterations and an E-value threshold of 2e-3" style="float: left; margin: 10px auto">
  +
File:Fabry psi results 10its eVal 2e-3 coverage.png | Coverage
  +
File:Fabry psi results 10its eVal 2e-3 eValue.png | E-value
  +
File:Fabry psi results 10its eVal 2e-3 identity.png | Sequence identity
  +
</gallery>
  +
  +
<gallery caption="Psi-Blast run with 10 iterations and an E-value threshold of 1e-9" style="float: right; margin: 10px auto">
  +
File:Fabry psi results 10its eVal 1e-9 coverage.png | Coverage
  +
File:Fabry psi results 10its eVal 1e-9 eValue.png | E-value
  +
File:Fabry psi results 10its eVal 1e-9 identity.png | Sequence identity
  +
</gallery>
  +
  +
<figure id="fig:psi_go_comparison_10its_1e-9">
  +
[[File:Fabry ids psiblast 10its eVal 1e-9 GOterms comparison.png|thumb|250px|<caption>GO terms of P06280 and each Psi-BLAST hit (with Evalue ≤ 1e-9) compared.
  +
Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to the number of each hit</caption>]]
  +
</figure>
  +
  +
<br style="clear:both;">
   
 
=== HHblits ===
 
=== HHblits ===
 
We searched the "big80" database with HHblits using the default settings and also with the maximum number of possible iterations (8).
 
We searched the "big80" database with HHblits using the default settings and also with the maximum number of possible iterations (8).
  +
 
<figtable id="blastidev">
 
<figtable id="blastidev">
{| class="wikitable" style="float: left; border: 2px solid darkgray;" cellpadding="2"
+
{| class="wikitable" style="border: 2px solid darkgray; float: left; margin: auto;" cellpadding="2"
   
 
! scope="row" align="left" |
 
! scope="row" align="left" |
 
| align="right" | '''2 iterations - default'''
 
| align="right" | '''2 iterations - default'''
 
|-
 
|-
  +
 
! scope="row" align="left" |
 
! scope="row" align="left" |
| align="right" | [[File:hhblits_default_ids_GOterms_comparison.png|thumb|250px| GO terms of P06280 and each HHblits hit (with Evalue < 0.003) compared. Percentageterms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the secon picture in relation to number of each hit]]
+
| align="right" | [[File:hhblits_default_ids_GOterms_comparison.png|thumb|250px| GO terms of P06280 and the first identifier in each HHblits cluster (with Evalue < 0.003) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit]]
| align="right" | [[File:hhblits_default.out_Evalues.png|thumb|250px| Histogram of the logarithmic E-values of the HHblits hits for P06280]]
+
| align="right" | [[File:hhblits_default.out_cluster_ids_only_GOterms_comparison.png|thumb|250px| GO terms of P06280 and each identifier in each HHblits cluster (with Evalue < 0.003) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit]]
|-
 
! scope="row" align="left" |
 
| align="right" | [[File:hhblits_default_Similarity.png|thumb|250px| Histogram of the similarity of the HHblits hits to P06280]]
 
| align="right" | [[File:hhblits_default_Identities.png|thumb|250px| Histogram of the identical amino acids of the pairwise alignments of the HHblits hits for P06280]]
 
 
|-
 
|-
 
|}
 
|}
 
</figtable>
 
</figtable>
  +
The HHBlits search was performed with the maximum E-value in the summary and alignment list set to 0.003 (-E) and the minimum number of lines in the summary hit list set to 700 (-z). From this search we obtained only 325 significant clusters.
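As an illustration of the parameters named above, the following sketch assembles an HHblits call as an argument list. Only <code>-E</code> and <code>-z</code> are taken from the text; the options <code>-i</code>, <code>-d</code>, <code>-o</code> and <code>-n</code> are our understanding of the HHblits command line, and all file and database paths are hypothetical placeholders. The exact calls we used are in the lab journal.

```python
def hhblits_command(query, database, output, iterations=2,
                    max_evalue=0.003, min_hits=700):
    """Build an HHblits argument list (a sketch, not the exact call we ran)."""
    return [
        "hhblits",
        "-i", query,             # query sequence file (hypothetical path)
        "-d", database,          # search database (hypothetical path)
        "-o", output,            # result file
        "-n", str(iterations),   # number of iterations (2 is the default)
        "-E", str(max_evalue),   # maximum E-value in summary and alignment list
        "-z", str(min_hits),     # minimum number of lines in summary hit list
    ]


default_run = hhblits_command("P06280.fasta", "/path/to/big80_db",
                              "hhblits_default.out")
eight_iterations = hhblits_command("P06280.fasta", "/path/to/big80_db",
                                   "hhblits_n8.out", iterations=8)
print(" ".join(default_run))
print(" ".join(eight_iterations))
```

Such a list could be passed to <code>subprocess.run</code> on a machine where HHblits and the database are installed.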
   
  +
We also compared the GO terms in a similar manner as in the [[Fabry:Sequence_alignments_(sequence_searches_and_multiple_alignments):Results#Blast| BLAST section]]. Here we discovered that on average only 14% of the AGAL_HUMAN protein's GO terms are included in the hits' terms. The "reverse" calculation revealed that around 70% of the hits' GO classes are in common with the search protein. This is rather low in comparison to the BLAST results.<br>
<br>
 
  +
The pictures on the left show that it made only a slight difference to the distribution of the shared GO terms whether only the first identifier of each cluster was included or all of them were analyzed.
The HHBlits search was performed with the maximum E-value in the summary and alignment list set to 0.003 (-E) and the minimum number of lines in the summary hit list had to be 700 (-z). From this search we obtained only 326 significant hits.
 
   
  +
The mean E-value in contrast is almost equal to the average E-value of the BLAST search. The same applies to the number of identical amino acids. The number of E-values and fraction of identical residues is comparable to the BLAST values, since there is only one E-value, %Identical and %Similar for each cluster.
We also compared the GO terms in a similar manner as in the [[Fabry:Sequence_alignments_(sequence_searches_and_multiple_alignments):Results#Blast| BLAST section]]. Here we discovered that on average only 14% of the AGAL_HUMAN protein's GO terms are included in the hits' terms. The "reverse" calculation revealed that around 70% of the hits' GO classes are in common with the search protein. This is rather low in comparison to the BLAST results.
 
   
The mean E-value in contrast is almost equal to the average E-value of the BLAST search. The same applies to the number of identical amino acids.
 
   
  +
<figtable id="blastidev">
  +
{| class="wikitable" style="border: 2px solid darkgray; float: left; margin: 2em auto;" cellpadding="2"
  +
  +
! scope="row" align="left" |
  +
| align="right" | [[File:hhblits_default.out_Evalues.png|thumb|250px| Histogram of the logarithmic E-values of the HHblits hits for P06280]]
  +
| align="right" | [[File:hhblits_default_Similarity.png|thumb|250px| Histogram of the similarity of the HHblits hits to P06280]]
  +
| align="right" | [[File:hhblits_default_Identities.png|thumb|250px| Histogram of the identical amino acids of the pairwise alignments of the HHblits hits for P06280]]
  +
|-
  +
|}
  +
</figtable>
   
 
<br style="clear:both;">
 
<br style="clear:both;">
   
 
<figtable id="blastidev">
 
<figtable id="blastidev">
{| class="wikitable" style="float: left; border: 2px solid darkgray;" cellpadding="2"
+
{| class="wikitable" style="border: 2px solid darkgray; float: left; margin: auto;" cellpadding="2"
   
 
! scope="row" align="left" |
 
! scope="row" align="left" |
Line 79: Line 160:
 
|-
 
|-
 
! scope="row" align="left" |
 
! scope="row" align="left" |
| align="right" | [[File:hhblits_n8_neu_ids_GOterms_comparison.png|thumb|250px| GO terms of P06280 and each HHblits hit (with Evalue < 0.003 and 8 iterations) compared.
+
| align="right" | [[File:hhblits_n8_neu_ids_GOterms_comparison.png|thumb|250px| GO terms of P06280 and the first identifier in each HHblits cluster (with Evalue < 0.003 and 8 iterations) compared.
 
Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the secon picture in relation to number of each hit]]
 
Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the secon picture in relation to number of each hit]]
| align="right" | [[File:hhblits_n8_neu.out_Evalues.png|thumb|250px| Histogram of the logarithmic E-values of the HHblits search with 8 iterations for P06280]]
 
|-
 
 
! scope="row" align="left" |
 
! scope="row" align="left" |
| align="right" | [[File:hhblits_n8_neu_Similarity.png|thumb|250px| Histogram of the similarity of the BLAST hits (search with 8 iterations) to P06280]]
+
| align="right" | [[File:hhblits_n8_neu.out_cluster_ids_only_GOterms_comparison.png|thumb|250px| GO terms of P06280 and each identifier in each HHblits cluster (with Evalue < 0.003 and 8 iterations) compared.
  +
Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit]]
| align="right" | [[File:hhblits_n8_neu_Identities.png|thumb|250px| Histogram of the identical amino acids of the pairwise alignments of the BLAST hits (search with 8 iterations) for P06280]]
 
 
|-
 
|-
 
|}
 
|}
 
</figtable>
 
</figtable>
  +
Since we thought that the number of significant hits was too low, we performed another HHBlits search with 8 iterations. Doing so, we obtained 729 clusters with an E-value smaller than or equal to 0.003.
<br>
 
Since we thought that the number of significant hits was too low, we performed another HHBlits search with 8 iterations. Doing so, we gained 729 hits with E-value smaller or equal to 0.003.
 
   
The similarity in GO terms got better, but all other comparative values, like average E-value, similarity and identical residues got worse.
+
Considering only the first identifier in each cluster, the similarity in GO terms improved, but all other comparative values, such as average E-value, similarity and identical residues, got worse. If all identifiers are taken into account, the similarity in GO terms also becomes worse, but the distribution is again similar. This leads to the conclusion that the proteins in one cluster have almost exactly the same GO terms. This is consistent with how HHBlits works, since each hit represents a cluster of similar sequences.
   
 
Thus increasing the number of iterations may help to find more homologous proteins, but since the similarity is lower, the conservation may also not be as high as for proteins detected with fewer iterations.

Thus increasing the number of iterations may help to find more homologous proteins, but since the similarity is lower, the conservation may also not be as high as for proteins detected with fewer iterations.
   
  +
<figtable id="blastidev">
  +
{| class="wikitable" style="border: 2px solid darkgray; float: left; margin: 2em auto;" cellpadding="2"
   
  +
! scope="row" align="left" |
<br style="clear:both;">
 
  +
| align="right" | [[File:hhblits_n8_neu.out_Evalues.png|thumb|250px| Histogram of the logarithmic E-values of the HHblits search with 8 iterations for P06280]]
  +
| align="right" | [[File:hhblits_n8_neu_Similarity.png|thumb|250px| Histogram of the similarity of the HHblits hits (search with 8 iterations) to P06280]]
  +
| align="right" | [[File:hhblits_n8_neu_Identities.png|thumb|250px| Histogram of the identical amino acids of the pairwise alignments of the HHblits hits (search with 8 iterations) for P06280]]
  +
|-
  +
|}
  +
</figtable>
   
 
The first HHblits run took about 2.5 minutes, the second one about 16 minutes (see section [[Sequence_alignments_(sequence_searches_and_multiple_alignments)#Time | Time]]).
 
The first HHblits run took about 2.5 minutes, the second one about 16 minutes (see section [[Sequence_alignments_(sequence_searches_and_multiple_alignments)#Time | Time]]).
  +
  +
<br style="clear:both;">
   
 
== Comparison sequence searches ==
 
== Comparison sequence searches ==
Line 111: Line 199:
   
 
Venn diagrams created with [http://bioinfogp.cnb.csic.es/tools/venny/index.html Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.]
 
Venn diagrams created with [http://bioinfogp.cnb.csic.es/tools/venny/index.html Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.]
  +
  +
For the Venn diagrams above, only the first identifier of each HHBlits cluster is used. This is useful only in the sense that it makes the number of proteins comparable.
 
In the Venn diagrams one can see that only a small portion of the hits is shared by all three methods. Each method seems to find a largely unique set of hits. The biggest overlap is between the BLAST and Psi-BLAST hits, which matches our expectations, since these two use similar approaches, while HHBlits searches by iterative HMM-HMM comparison. This becomes most obvious in the last picture, where only the 100 best hits of all three methods are compared. Only 6 hits are common to all methods. Of the remaining 94, about half are shared by BLAST and Psi-BLAST, and the other half are unique to either BLAST or Psi-BLAST. HHBlits has 84 unique hits and shares 5 hits solely with each of the BLAST algorithms. The comparison of all hits with an E-value smaller than or equal to 0.03 in all methods looks similar. It is noteworthy that here a small number of hits is shared only by HHBlits and BLAST (52), as well as only by Psi-BLAST and HHBlits (2).

In the Venn diagrams one can see that only a small portion of the hits is shared by all three methods. Each method seems to find a largely unique set of hits. The biggest overlap is between the BLAST and Psi-BLAST hits, which matches our expectations, since these two use similar approaches, while HHBlits searches by iterative HMM-HMM comparison. This becomes most obvious in the last picture, where only the 100 best hits of all three methods are compared. Only 6 hits are common to all methods. Of the remaining 94, about half are shared by BLAST and Psi-BLAST, and the other half are unique to either BLAST or Psi-BLAST. HHBlits has 84 unique hits and shares 5 hits solely with each of the BLAST algorithms. The comparison of all hits with an E-value smaller than or equal to 0.03 in all methods looks similar. It is noteworthy that here a small number of hits is shared only by HHBlits and BLAST (52), as well as only by Psi-BLAST and HHBlits (2).
 
The hits of the two HHBlits searches with 2 and 8 iterations also show a large overlap.

The hits of the two HHBlits searches with 2 and 8 iterations also show a large overlap.
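The region sizes of such a three-set Venn diagram can also be computed directly from the identifier lists. The following is a minimal sketch with plain set operations, independent of the VENNY tool we used; the function name <code>venn_regions</code> and the toy identifiers are invented for this example.

```python
def venn_regions(blast, psiblast, hhblits):
    """Split three identifier lists into the seven regions of a Venn diagram."""
    a, b, c = set(blast), set(psiblast), set(hhblits)
    return {
        "all three": a & b & c,
        "BLAST and Psi-BLAST only": (a & b) - c,
        "BLAST and HHblits only": (a & c) - b,
        "Psi-BLAST and HHblits only": (b & c) - a,
        "BLAST only": a - b - c,
        "Psi-BLAST only": b - a - c,
        "HHblits only": c - a - b,
    }


# Toy identifiers, not real search results.
blast_ids = ["A", "B", "C", "D"]
psi_ids = ["B", "C", "D", "E"]
hh_ids = ["D", "F"]

regions = venn_regions(blast_ids, psi_ids, hh_ids)
counts = {name: len(ids) for name, ids in regions.items()}
for name, count in counts.items():
    print(f"{name}: {count}")
```

To compare only the best hits, the lists can be truncated to the first 100 entries before calling the function.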
   
  +
In the picture below, all identifiers in the HHBlits clusters were used. In this case many more identifiers are shared among all three methods. This holds not with respect to the total number of ids in the HHBlits clusters (8463), but with respect to the number of identifiers that the two BLAST methods do not share with HHBlits. Here only 193 ids are not shared with HHBlits at all.
=== Comparing the Evalues ===
 
  +
The comparison of the first 100 hits of BLAST and Psi-BLAST was performed with the first 100 clusters in the HHBlits output. Again a larger number of ids was shared by all three methods (39), and only a few hits were unique to the BLAST searches (29 in total).
   
{| class="centered"
+
{| class="centered"
  +
| [[File:venn_blast_psi_hhblits_allHHBLITSRESULTS.png|thumb| Venn diagram of proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations. In this picture '''all''' identifiers in each cluster are used for the HHBlits result]]
[[File:Fabry_animation.gif]]
 
  +
| [[File:venn_blast_psi_hhblits_allHHBLITSRESULTS_first100.png|thumb| Venn diagram of the first 100 proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations. In this picture '''all''' identifiers in each cluster are used for the HHBlits result]]
 
|}
 
|}
  +
Above you can see an animated histogram of the distribution of the E-values, for the search performed with different methods.
 
  +
=== Comparing the Evalues ===
  +
  +
[[File:Fabry_animation.gif|frame]]
  +
On the right, you can see an animated histogram of the distribution of the E-values, for the search performed with different methods.
 
The R Script is based on Andrea's [[ARSA_search_protocol | R Script psiBlast.evalueHist.Rscript]]
 
The R Script is based on Andrea's [[ARSA_search_protocol | R Script psiBlast.evalueHist.Rscript]]
   
 
The most obvious observation is that the E-value distribution of the Psi-BLAST hits is very different from that of the other two methods. The Psi-BLAST histogram has its maximum around -60, while the histograms of the BLAST and the HHBlits search do not, but rather tend towards zero. Comparing the BLAST and Psi-BLAST results in particular, the advantage of refining steps and more iterations becomes clear, since the quality in terms of the E-value increases. Thus, with respect to the E-values, I would prefer using Psi-Blast.

The most obvious observation is that the E-value distribution of the Psi-BLAST hits is very different from that of the other two methods. The Psi-BLAST histogram has its maximum around -60, while the histograms of the BLAST and the HHBlits search do not, but rather tend towards zero. Comparing the BLAST and Psi-BLAST results in particular, the advantage of refining steps and more iterations becomes clear, since the quality in terms of the E-value increases. Thus, with respect to the E-values, I would prefer using Psi-Blast.
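The binning behind such a histogram can be sketched in a few lines. Note that this is a Python illustration and not the R script mentioned above; the bin width, the floor used for E-values of 0 and the function name <code>log_evalue_histogram</code> are assumptions made for this example.

```python
import math
from collections import Counter


def log_evalue_histogram(evalues, bin_width=10, floor=1e-200):
    """Count E-values per bin of log10(E-value).

    E-values of 0 are replaced by `floor` so that the logarithm is defined.
    Each key is the lower edge of a bin of width `bin_width`.
    """
    histogram = Counter()
    for evalue in evalues:
        log_value = math.log10(max(evalue, floor))
        bin_start = math.floor(log_value / bin_width) * bin_width
        histogram[bin_start] += 1
    return dict(sorted(histogram.items()))


# A few E-values taken from the dataset listing on this page.
example = [5e-63, 2e-93, 3e-80, 7e-68, 7e-89]
for bin_start, count in log_evalue_histogram(example).items():
    print(f"[{bin_start}, {bin_start + 10}): {count}")
```

A histogram with its maximum around -60 therefore means that most hits have E-values in the range of 1e-60.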
  +
  +
<br style="clear: both;">
   
 
=== Time ===
 
=== Time ===
Line 150: Line 248:
 
== Multiple sequence alignments ==
 
== Multiple sequence alignments ==
 
=== Dataset ===
 
=== Dataset ===
We used the following 30 proteins to create multiple sequence alignments with the different methods. Since we only no sequences with sequence identity of more than 90%, we created three datasets. One with 10 sequences spanning the whole range of sequence identity, one with sequences having an sequence identity <40% and the last one with sequence identity >60%.
+
The dataset was generated from the result set of the Psi-Blast run with 10 iterations and an E-value cut-off of 1e-9. We used the following 30 proteins to create multiple sequence alignments with the different methods. Since there were no sequences with a sequence identity of more than 90%, we created three datasets: one with 10 sequences spanning the whole range of sequence identity, one with sequences having a sequence identity <40%, and one with sequences having a sequence identity >60%.
   
 
id eVal identity coverage alignment_length
 
id eVal identity coverage alignment_length
 
 
#whole range
+
# whole range = Set100
 
tr|C7PCU7|C7PCU7_CHIPD 5e-63 21 0.9933 474
 
tr|C7PCU7|C7PCU7_CHIPD 5e-63 21 0.9933 474
 
tr|B3RSE1|B3RSE1_TRIAD 2e-93 49 0.8415 362
 
tr|B3RSE1|B3RSE1_TRIAD 2e-93 49 0.8415 362
Line 166: Line 264:
 
tr|F8FLU8|F8FLU8_PAEMK 1e-76 10 0.6091 474
 
tr|F8FLU8|F8FLU8_PAEMK 1e-76 10 0.6091 474
 
 
#<40% sequence identity
+
# <40% sequence identity = Set40
 
tr|B8P149|B8P149_POSPM 3e-80 28 0.9425 432
 
tr|B8P149|B8P149_POSPM 3e-80 28 0.9425 432
 
tr|G2TQE8|G2TQE8_BACCO 7e-68 8 0.5795 452
 
tr|G2TQE8|G2TQE8_BACCO 7e-68 8 0.5795 452
Line 178: Line 276:
 
tr|F2USV1|F2USV1_SALS5 1e-88 35 1 467
 
tr|F2USV1|F2USV1_SALS5 1e-88 35 1 467
 
 
#>60%
+
# >60% sequence identity = Set60
 
tr|G1P280|G1P280_MYOLU 1e-108 78 0.9699 420
 
tr|G1P280|G1P280_MYOLU 1e-108 78 0.9699 420
 
tr|Q4RTE7|Q4RTE7_TETNG 7e-89 71 0.7319 314
 
tr|Q4RTE7|Q4RTE7_TETNG 7e-89 71 0.7319 314
Line 189: Line 287:
 
tr|G3WK18|G3WK18_SARHA 1e-108 72 0.9388 414
 
tr|G3WK18|G3WK18_SARHA 1e-108 72 0.9388 414
 
tr|H2L5H7|H2L5H7_ORYLA 1e-100 61 0.9534 411
 
tr|H2L5H7|H2L5H7_ORYLA 1e-100 61 0.9534 411
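The identity thresholds of Set40 and Set60 can be illustrated with a short sketch that parses rows in the format shown above and filters them by sequence identity. It only demonstrates the thresholds on the rows visible in this listing; it does not reproduce how we actually picked the 10 sequences per set, and the names <code>parse_hits</code>, <code>low_identity</code> and <code>high_identity</code> are invented for this example.

```python
ROWS = """\
tr|C7PCU7|C7PCU7_CHIPD 5e-63 21 0.9933 474
tr|B3RSE1|B3RSE1_TRIAD 2e-93 49 0.8415 362
tr|F8FLU8|F8FLU8_PAEMK 1e-76 10 0.6091 474
tr|B8P149|B8P149_POSPM 3e-80 28 0.9425 432
tr|G2TQE8|G2TQE8_BACCO 7e-68 8 0.5795 452
tr|F2USV1|F2USV1_SALS5 1e-88 35 1 467
tr|G1P280|G1P280_MYOLU 1e-108 78 0.9699 420
tr|Q4RTE7|Q4RTE7_TETNG 7e-89 71 0.7319 314
tr|G3WK18|G3WK18_SARHA 1e-108 72 0.9388 414
tr|H2L5H7|H2L5H7_ORYLA 1e-100 61 0.9534 411
"""


def parse_hits(text):
    """Parse 'id eVal identity coverage alignment_length' rows into dicts."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and set headers such as '# whole range'
        identifier, evalue, identity, coverage, length = line.split()
        hits.append({
            "id": identifier,
            "evalue": float(evalue),
            "identity": float(identity),
            "coverage": float(coverage),
            "alignment_length": int(length),
        })
    return hits


hits = parse_hits(ROWS)
low_identity = [h["id"] for h in hits if h["identity"] < 40]
high_identity = [h["id"] for h in hits if h["identity"] > 60]
print("candidates for Set40:", low_identity)
print("candidates for Set60:", high_identity)
```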
  +
  +
=== Methods ===
  +
Each of the alignment tools ClustalW, Muscle, T-Coffee and 3D-Coffee was run on the three datasets to produce a multiple sequence alignment. The exact command-line calls are listed in our [[Fabry:Sequence alignments (sequence searches and multiple alignments):Journal#Multiple_sequence_alignments |lab journal]]. Since 3D-Coffee needs a structure template and there were no suitable hits with a pdb entry assigned, we searched for a structure of a homologous sequence in pdb. We used the following pdb templates for the three datasets:
  +
* Set 100: [http://www.uniprot.org/uniprot/Q90744 1KTB]
  +
* Set 40: [http://www.uniprot.org/uniprot/Q92456 1SZN]
  +
* Set 60: [http://www.uniprot.org/uniprot/P06280 1R46] (which is also the structure of our reference sequence)
  +
  +
However, the structures do not seem to add much valuable information to the alignments, since the 3D-Coffee alignments were all identical to those that T-Coffee generated.
   
 
=== Results ===
 
=== Results ===
TODO: Add pictures of MSA and find a way to present them, since they are _very_ wide --[[User:Rackersederj|Rackersederj]] 07:06, 5 May 2012 (UTC)<br>
 
Maybe only interesting parts or the active site...? --[[User:Rackersederj|Rackersederj]] 13:07, 5 May 2012 (UTC) ... Active site (D179 and D231) and partly the surrounding parts are highly conserved! Functional sites... maybe also Glycosylation site (139,192,215,408) and Disulfide bonds (52 ↔ 94, 56 ↔ 63, 142 ↔ 172, 202 ↔ 223, 378 ↔ 382)
 
   
==== ClustalW ====
+
==== Set 100 ====
  +
===== ClustalW =====
msa/clustalw_fabry_dataset_0.msa
 
  +
[[File:Fabry clustalw 0.png|800px|thumb|The MSA calculated by ClustalW]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 202: Line 307:
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 144
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 133
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 383
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 372
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 389
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 108
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 97
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 104
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 93
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 463
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 452
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 433
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 422
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 416
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 89
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 464
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 453
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 100
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 405
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 145
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 134
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
Line 239: Line 347:
 
|}
 
|}
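The gap counts and the number of conserved columns reported in such a table can be derived from an alignment file with a short script. The following is a minimal sketch that assumes the alignment is available in FASTA format (the tools may write other formats by default, which would need a different parser); the function names and the toy alignment are invented for this example.

```python
def read_fasta_alignment(text):
    """Read an aligned FASTA text into a dict of name -> gapped sequence."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def gap_counts(alignment):
    """Number of gap characters ('-') per aligned sequence."""
    return {name: seq.count("-") for name, seq in alignment.items()}


def conserved_columns(alignment):
    """Number of columns without gaps in which all residues are identical."""
    rows = list(alignment.values())
    return sum(1 for column in zip(*rows)
               if "-" not in column and len(set(column)) == 1)


# Toy alignment, not one of our real MSAs.
toy = """>seq1
MK-LV
>seq2
MKALV
>seq3
M--LV
"""
alignment = read_fasta_alignment(toy)
gaps = gap_counts(alignment)
conserved = conserved_columns(alignment)
print(gaps)
print("conserved columns:", conserved)
```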
   
  +
===== Muscle =====
msa/clustalw_fabry_dataset_40.msa
 
  +
[[File:Fabry muscle 0.png|800px|thumb|The MSA calculated by Muscle]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 245: Line 354:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 187
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 217
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 164
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 213
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 270
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 496
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 153
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 221
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 169
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 513
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 151
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 577
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 230
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 257
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 459
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 258
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 151
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 576
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 431
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 546
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 529
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 2
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 10
 
|-
 
|-
 
|}
 
|}
   
  +
===== T-Coffee =====
msa/clustalw_fabry_dataset_61.msa
 
  +
[[File:Fabry tcoffe 0.png|800px|thumb|The MSA calculated by T-Coffee]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 289: Line 402:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 23
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 516
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 33
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 755
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 48
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 772
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 25
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 476
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 16
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 835
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 18
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 480
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 80
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 805
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 29
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 472
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 27
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 788
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 49
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 836
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 517
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 154
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 9
 
|-
 
|-
 
|}
 
|}
   
  +
===== 3D-Coffee =====
 
  +
[[File:Fabry 3Dcoffee 0.png|800px|thumb|The MSA calculated by 3D-Coffee]]
==== Muscle ====
 
msa/muscle_fabry_dataset_0.msa
 
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 335: Line 450:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 187
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 516
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 183
 
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 466
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 755
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 191
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 772
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 547
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 476
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 227
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 835
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 228
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 480
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 546
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 805
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 516
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 472
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 499
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 788
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 836
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 517
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 6
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 9
 
|-
 
|-
 
|}
 
|}
   
  +
==== Set 40 ====
msa/muscle_fabry_dataset_40.msa
 
  +
===== ClustalW =====
  +
[[File:Fabry clustalw 40.png|800px|thumb|The MSA calculated by ClustalW]]
  +
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 305
 
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 222
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 290
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 494
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 267
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 466
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 373
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 265
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 256
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 568
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 204
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 272
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 199
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 254
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 188
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 333
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 562
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 186
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 254
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 186
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 534
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 2
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 4
 
|-
 
|-
 
|}
 
|}
   
  +
===== Muscle =====
msa/muscle_fabry_dataset_61.msa
 
  +
[[File:Fabry muscle 40.png|800px|thumb|The MSA calculated by Muscle]]
  +
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 39
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 485
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 59
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 402
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 58
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 674
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 90
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 680
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 33
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 646
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 43
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 445
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 26
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 384
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 28
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 379
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 37
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 368
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 35
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 366
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 366
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 157
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 8
 
|-
 
|-
 
|}
 
|}
   
==== T-Coffee ====
+
===== T-Coffee =====
  +
[[File:Fabry tcoffe 40.png|800px|thumb|The MSA calculated by T-Coffee]]
 
msa/tcoffe_fabry_dataset_0.msa
 
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 467: Line 595:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 491
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 541
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 730
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 518
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 451
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 624
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 810
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 507
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 455
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 819
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 780
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 523
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 447
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 505
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 763
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 584
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 811
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 813
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 492
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 505
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 785
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 8
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 12
 
|-
 
|-
 
|}
 
|}
   
  +
===== 3D-Coffee =====
msa/tcoffe_fabry_dataset_40.msa
 
  +
[[File:Fabry 3Dcoffee 40.png|800px|thumb|The MSA calculated by 3D-Coffee]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 3px 3px 0" | Sequence ID
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 540
 
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 497
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 541
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 474
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 518
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 580
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 624
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 769
 
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 463
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 507
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 819
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
 
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 479
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 523
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 461
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 505
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 584
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 813
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 461
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 505
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 741
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 785
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 11
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 12
 
|-
 
|-
 
|}
 
|}
   
  +
==== Set 60 ====
msa/tcoffe_fabry_dataset_61.msa
 
  +
===== ClustalW =====
  +
[[File:Fabry clustalw 61.png|800px|thumb|The MSA calculated by ClustalW]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 556: Line 693:
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 44
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 24
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 54
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 34
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 69
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 49
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 46
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 26
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 29
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 37
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 17
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 101
 
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 39
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 19
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 81
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 50
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 30
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 48
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 28
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 70
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 50
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 156
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 154
 
|-
 
|-
 
|}
 
|}
   
  +
===== Muscle =====
 
  +
[[File:Fabry muscle 61.png|800px|thumb|The MSA calculated by Muscle]]
==== 3D-Coffee ====
 
msa/3Dcoffee_fabry_dataset_0.msa
 
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 601: Line 740:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2PG26|G2PG26_STRVO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 491
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 29
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C7PCU7|C7PCU7_CHIPD
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 730
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 49
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 451
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 48
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1ZHK5|E1ZHK5_CHLVA
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 810
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 80
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G8NYA7|G8NYA7_GRAMM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 455
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 23
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q8RX86|Q8RX86_ARATH
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 780
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 16
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F8FLU8|F8FLU8_PAEMK
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 447
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 33
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F5BFS9|F5BFS9_TOBAC
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 763
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 18
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3RSE1|B3RSE1_TRIAD
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 811
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 25
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H1Q7I8|H1Q7I8_9ACTO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 492
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 28
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 27
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 8
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 156
 
|-
 
|-
 
|}
 
|}
   
  +
===== T-Coffee =====
msa/3Dcoffee_fabry_dataset_40.msa
 
  +
[[File:Fabry tcoffe 61.png|800px|thumb|The MSA calculated by T-Coffee]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 645: Line 788:
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
! style="border-style: solid; border-width: 0 0 3px 0; " align="center"| Number of gaps
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B3CFN7|B3CFN7_9BACE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2U095|H2U095_TAKRU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 540
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 44
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C5AKH4|C5AKH4_BURGB
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H0WQ54|H0WQ54_OTOGA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 497
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 54
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G2TQE8|G2TQE8_BACCO
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F1Q5G5|F1Q5G5_DANRE
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 474
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 69
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2JN17|H2JN17_STRHJ
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 580
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 46
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|B8P149|B8P149_POSPM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 769
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 49
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F9HJT9|F9HJT9_9STRE
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 463
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 37
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|Q0CEF5|AGALG_ASPTN
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|Q4RTE7|Q4RTE7_TETNG
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 479
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 101
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4W2N5|D4W2N5_9FIRM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|E1B725|E1B725_BOVIN
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 461
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 39
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|D4KDQ2|D4KDQ2_9FIRM
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|H2L5H7|H2L5H7_ORYLA
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 461
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 50
 
|-
 
|-
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|F2USV1|F2USV1_SALS5
+
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1T044|G1T044_RABIT
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 741
+
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 48
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|C0HA45|C0HA45_SALSA
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 70
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 11
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 155
 
|-
 
|-
 
|}
 
|}
   
  +
===== 3D-Coffee =====
msa/3Dcoffee_fabry_dataset_61.msa
 
  +
[[File:Fabry 3Dcoffee 61.png|800px|thumb|The MSA calculated by 3D-Coffee]]
   
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
 
{| style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 4"
Line 700: Line 847:
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G1P280|G1P280_MYOLU
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 46
 
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 46
  +
|-
  +
| style="border-style: solid; border-width: 0 3px 3px 0" |sp|P06280|AGAL_HUMAN
  +
| style="border-style: solid; border-width: 0 0 3px 0; " align="center" | 49
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
 
| style="border-style: solid; border-width: 0 3px 3px 0" |tr|G3WK18|G3WK18_SARHA
Line 720: Line 870:
 
|-
 
|-
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
 
| style="border-style: solid; border-width: 0 3px 3px 0" |
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
+
| style="border-style: solid; border-width: 0 0 3px 0; align:center"|
 
|-
 
|-
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
 
! style="border-style: solid; border-width: 0 3px 3px 0" |conserved
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 156
+
! style="border-style: solid; border-width: 0 0 3px 0; align:center"| 155
 
|-
 
|-
 
|}
 
|}
  +
  +
  +
=== Discussion ===
  +
In comparison to the other methods, ClustalW seems to find fewer conserved columns in all three datasets.
  +
  +
[[Category: Fabry Disease 2012]]

Latest revision as of 17:05, 9 May 2012

Fabry Disease » Sequence alignments (sequence searches and multiple alignments)



This page contains our results and discussions. The lab journal can be found here.


Reference sequence

The reference sequence of α-Galactosidase A that will be used in this task was obtained from Swissprot P06280.

>gi|4504009|ref|NP_000160.1| alpha-galactosidase A precursor [Homo sapiens]
MQLRNPELHLGCALALRFLALVSWDIPGARALDNGLARTPTMGWLHWERFMCNLDCQEEPDSCISEKLFM
EMAELMVSEGWKDAGYEYLCIDDCWMAPQRDSEGRLQADPQRFPHGIRQLANYVHSKGLKLGIYADVGNK
TCAGFPGSFGYYDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRSIVYSCEWPLYMWP
FQKPNYTEIRQYCNHWRNFADIDDSWKSIKSILDWTSFNQERIVDVAGPGGWNDPDMLVIGNFGLSWNQQ
VTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQLRQGDNFEVWERPLSGLAWA
VAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVKRKLGFYEWTSRLRSHINPTGTVLLQLENT
MQMSLKDLL

Sequence searches

Blast

GO terms of P06280 and each BLAST hit (with E-value <= 0.003) compared. Percentage of terms shared, in relation to the number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, and in relation to the number of GO terms of each hit in the second picture.

First we performed a BLAST search with the default parameters. Since all hits were significant, we raised the number of one-line descriptions shown (-v) as well as the number of database sequences to show alignments for (-b). This led to 663 hits with an E-value less than or equal to 0.003, which we declared as significant in our search. For these proteins we extracted the E-value, the number of positive and of identical amino acids in the pairwise alignments, and the length of each hit. You can see a histogram of each of these features on the left.

For the comparison of the GO terms, we obtained the set of terms for each hit and counted how many of them are shared with the GO terms of the search protein α-Galactosidase A. We first divided the number of common terms by the number of GO terms of P06280 (49). Since these proportions are very small, we also explored the fraction of each hit's GO terms that is shared with the reference terms, i.e. we divided the number of common terms by the number of terms of the hit. The histogram of this second ratio shows that, on average, over 80% of the GO terms of each hit are shared with those of AGAL. The low average agreement in the other direction may be due to the fact that humans are a lot more complex than the species the homologous hits belong to: in a more complex organism the protein has to fulfill more needs and thus has more GO terms assigned.
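The two ratios described above can be sketched as follows. This is only a minimal illustration, not the script we used: the function name `go_overlap` and the toy GO identifiers are assumptions made for the example.

```python
def go_overlap(reference_terms, hit_terms):
    """Return (shared / |reference|, shared / |hit|) for two collections of GO term IDs."""
    reference = set(reference_terms)
    hit = set(hit_terms)
    shared = reference & hit
    # Guard against empty annotation sets to avoid a division by zero.
    frac_of_reference = len(shared) / len(reference) if reference else 0.0
    frac_of_hit = len(shared) / len(hit) if hit else 0.0
    return frac_of_reference, frac_of_hit


# Toy identifiers chosen only for illustration (not the real annotation of P06280).
reference = {"GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"}
hit = {"GO:0000001", "GO:0000002", "GO:0000099"}
print(go_overlap(reference, hit))
```

The first value corresponds to the upper histogram (relative to the reference protein), the second to the lower one (relative to the hit).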

The average length of the hits fits the length of the α-Galactosidase A protein very well, as can be seen in the picture below (Histogram of the length of the BLAST hits for P06280). On average, over 51% of the residues are positives and almost 36% are identical. Thus, on average, 87% of the residues in each alignment are similar to the protein sequence of AGAL.

<figtable id="blastidev">

Histogram of the logarithmic E-values of the BLAST hits for P06280
Histogram of the positive amino acids of the pairwise alignments of the BLAST hits for P06280
Histogram of the identical amino acids of the pairwise alignments of the BLAST hits for P06280
Histogram of the length of the BLAST hits for P06280

</figtable>


Psi-Blast

<figure id="fig:psi_overlaps">

Overview of the Psi-Blast result sets

</figure>

We also used Psi-Blast, the profile-based BLAST program, to search against the big80 database. The one-line description output parameter (-v) was increased to show a maximum of 4000 hits, and the same goes for the number of alignments to show (-b). There were four runs, with the iterations set to 2 and 10 and the E-value cut-off set to 2e-3 and 1e-9, respectively. All other options were left at their default values. The exact command-line calls can be obtained from the journal.

Iterations E-value cut-off Number of Hits Runtime
2 2e-3 1129 3m0.814s
2 1e-9 683 3m9.422s
10 2e-3 3181 14m29.179s
10 1e-9 1491 15m39.251s

<xr id="fig:psi_overlaps"/> shows the hit set overlaps of the psi-blast runs.

<figure id="fig:psi_go_comparison_10its_1e-9">

GO terms of P06280 and each Psi-BLAST hit (with Evalue ≤ 1e-9) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to the number of each hit

</figure>


HHblits

We searched the "big80" database with HHblits using the default settings and also with the maximum number of possible iterations (8).

<figtable id="blastidev">

2 iterations - default
GO terms of P06280 and the first identifier in each HHblits cluster (with Evalue < 0.003) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit
GO terms of P06280 and each identifier in each HHblits cluster (with Evalue < 0.003) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit

</figtable> The HHblits search was performed with the maximum E-value in the summary and alignment list set to 0.003 (-E), and the minimum number of lines in the summary hit list set to 700 (-z). From this search we obtained only 325 significant clusters.

We also compared the GO terms in a similar manner as in the BLAST section. Here we discovered that, on average, only 14% of the AGAL_HUMAN protein's GO terms are included in the hits' terms. The "reverse" calculation revealed that around 70% of the hits' GO classes are shared with the search protein. This is rather low in comparison to the BLAST results.
In the pictures on the left, one can see that it made only a slight difference to the distribution of the shared GO terms whether only the first identifier of each cluster was included or all of them were analyzed.

The mean E-value, in contrast, is almost equal to the average E-value of the BLAST search. The same applies to the fraction of identical amino acids. These values can be compared to the BLAST values because HHblits reports only one E-value, %Identical and %Similar for each cluster.


<figtable id="blastidev">

Histogram of the logarithmic E-values of the HHblits hits for P06280
Histogram of the similarity of the HHblits hits to P06280
Histogram of the identical amino acids of the pairwise alignments of the HHblits hits for P06280

</figtable>


<figtable id="blastidev">

8 iterations
GO terms of P06280 and the first identifier in each HHblits cluster (with Evalue < 0.003 and 8 iterations) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit
GO terms of P06280 and each identifier in each HHblits cluster (with Evalue < 0.003 and 8 iterations) compared. Percentage terms shared, in relation to number of GO terms of P06280 (AGAL_HUMAN) in the upper picture, in the second picture in relation to number of each hit

</figtable> Since we thought that the number of significant hits was too low, we performed another HHblits search with 8 iterations. Doing so, we obtained 729 clusters with an E-value less than or equal to 0.003.

Considering only the first identifier in each cluster, the similarity in GO terms improved, but all other comparative values, such as the average E-value, similarity and identical residues, got worse. If all identifiers are taken into account, the similarity in GO terms also becomes worse, but the distribution is again similar. This leads to the conclusion that the proteins in one cluster do have almost completely the same GO terms, which can be explained by the way HHblits clusters its database.

Thus, increasing the number of iterations might be better for obtaining more homologous proteins, but since the similarity is lower, the conservation might also not be as high as for proteins detected with fewer iterations.

<figtable id="blastidev">

Histogram of the logarithmic E-values of the HHblits search with 8 iterations for P06280
Histogram of the similarity of the HHblits hits (search with 8 iterations) to P06280
Histogram of the identical amino acids of the pairwise alignments of the HHblits hits (search with 8 iterations) for P06280

</figtable>

The first HHblits run took about 2.5 minutes, the second one about 16 minutes (see section Time).


Comparison sequence searches

Comparing the hits

Venn diagram of proteins found by BLAST, HHBlits and HHBlits with 8 iterations
Venn diagram of the proteins found by BLAST, Psi-BLAST (10 iterations and E-value cutoff 1e-9) and HHBlits with 8 iterations
Venn diagram of the first 100 proteins found by BLAST, HHBlits and HHBlits with 8 iterations
Venn diagram of the first 100 proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations

Venn diagrams created with Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.

For the Venn diagrams above, only the first identifier of each HHblits cluster is used. This is only useful in the sense that the numbers of proteins become comparable. The Venn diagrams show that only a small portion of the hits is shared by all three methods; each method seems to have a largely unique set of findings. The biggest overlap is between the BLAST and Psi-BLAST hits, which matches our expectations, since these two use similar approaches, while HHblits searches by iterative HMM-HMM comparison. This becomes most obvious in the last picture, where only the 100 best hits of the three methods are compared. Only 6 hits are common to all methods. Of the remaining 94, about half are shared by BLAST and Psi-BLAST and the other half are unique to either BLAST or Psi-BLAST. HHblits has 84 unique hits and shares 5 hits solely with each of the BLAST algorithms. The comparison of all hits with an E-value less than or equal to 0.03 in all methods looks similar. It is noteworthy that here a small number of hits is shared only by HHblits and BLAST (52), or only by Psi-BLAST and HHblits (2). The hit sets of the two HHblits searches with 2 and 8 iterations also overlap to a large extent.

In the picture below, all identifiers in the HHblits clusters were used. In this case, a lot more identifiers are shared among all three methods. This holds not in relation to the total number of ids in the HHblits clusters (8463), but in relation to the number of identifiers that the two BLAST methods do not share with HHblits: here only 193 ids are not shared with HHblits at all. The comparison of the first 100 hits of BLAST and Psi-BLAST was performed against the first 100 clusters in the HHblits output. Again, a larger number of ids was shared by all three methods (39), and only a few hits were unique to the BLAST searches (29 in total).

Venn diagram of proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations. In this picture all identifiers in each cluster are used for the HHBlits result
Venn diagram of the first 100 proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations. In this picture all identifiers in each cluster are used for the HHBlits result
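The region sizes shown in such Venn diagrams can also be computed directly from the identifier lists. The following is a minimal sketch under the assumption that each method's hits are available as a collection of identifiers; the function name `venn3_regions` and the toy identifiers are made up for illustration (the diagrams themselves were created with VENNY).

```python
def venn3_regions(a, b, c):
    """Count the identifiers in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Toy identifiers, not real search results.
blast = {"P1", "P2", "P3", "P4"}
psiblast = {"P2", "P3", "P4", "P5"}
hhblits = {"P4", "P5", "P6"}
regions = venn3_regions(blast, psiblast, hhblits)
for name, size in regions.items():
    print(name, size)
```

To restrict the comparison to the best 100 hits, the lists would be truncated before being passed in; for HHblits either the first identifier of each cluster or all identifiers can be used, as discussed above.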

Comparing the Evalues

Fabry animation.gif

On the right, you can see an animated histogram of the distribution of the E-values, for the search performed with different methods. The R Script is based on Andrea's R Script psiBlast.evalueHist.Rscript

The most obvious observation is that the E-value distribution of the Psi-BLAST hits is very different from that of the other two methods. The Psi-BLAST histogram has its maximum around -60 on the logarithmic scale, while the histograms of the BLAST and HHblits searches are concentrated closer to zero. Comparing the BLAST and Psi-BLAST results in particular, the advantage of the refinement steps and additional iterations becomes clear, since the quality of the hits, in terms of E-value, increases. Thus, with respect to the E-values, we would prefer Psi-Blast.
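The binning behind such a histogram of logarithmic E-values can be sketched as follows. This is not the R script mentioned above but a small Python illustration; the bin width of 10 and the floor used for E-values reported as 0 are assumptions.

```python
import math
from collections import Counter


def log_evalue_histogram(evalues, bin_width=10, floor=1e-200):
    """Count E-values per bin of log10(E-value); each key is the lower edge of a bin."""
    counts = Counter()
    for evalue in evalues:
        # E-values reported as 0 have no logarithm, so they are clamped to the floor.
        log_e = math.log10(max(evalue, floor))
        bin_start = math.floor(log_e / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# A few E-values in the range seen in our searches, plus one reported as 0.
histogram = log_evalue_histogram([1e-63, 5e-63, 2e-3, 0.0, 1e-9])
print(histogram)
```

A distribution whose largest bin lies far to the left, as for Psi-BLAST, indicates that most hits have very small E-values.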


Time

We measured the runtime of the programs with the command "time".


Method Parameter Time
Blast v = 700 b = 700, v = 700 1m53.944s
HHBlits default 2m19.519s
HHBlits n = 8 16m7.754s
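To compare the runtimes numerically, the output of "time" can be converted to seconds. The following is a small sketch; the function name `parse_real_time` is an assumption, and the format is taken to be the `<minutes>m<seconds>s` form shown in the table above.

```python
import re


def parse_real_time(text):
    """Convert a string such as '2m19.519s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Runtimes taken from the table above.
default_run = parse_real_time("2m19.519s")
eight_iterations = parse_real_time("16m7.754s")
print(f"8 iterations took {eight_iterations / default_run:.1f} times as long as the default run")
```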


Multiple sequence alignments

Dataset

The dataset was generated from the result set of the Psi-Blast run with 10 iterations and an E-value cut-off of 1e-9. We used the following 30 proteins to create multiple sequence alignments with the different methods. Since there were no sequences with a sequence identity of more than 90%, we created three datasets: one with 10 sequences spanning the whole range of sequence identity, one with sequences having a sequence identity <40%, and one with sequences having a sequence identity >60%.

id                     eVal  identity coverage alignment_length

# whole range = Set100
tr|C7PCU7|C7PCU7_CHIPD	5e-63	21	0.9933	474
tr|B3RSE1|B3RSE1_TRIAD	2e-93	49	0.8415	362
tr|G2PG26|G2PG26_STRVO	1e-105	33	0.5854	427
tr|Q8RX86|Q8RX86_ARATH	1e-105	35	0.9814	422
tr|G8NYA7|G8NYA7_GRAMM	4e-61	22	0.6186	470
tr|H1Q7I8|H1Q7I8_9ACTO	1e-97	37	0.5409	396
tr|E1ZHK5|E1ZHK5_CHLVA	8e-80	38	0.8485	368
sp|Q0CEF5|AGALG_ASPTN	4e-63	12	0.611	478
tr|F5BFS9|F5BFS9_TOBAC	1e-106	36	0.9534	410
tr|F8FLU8|F8FLU8_PAEMK	1e-76	10	0.6091	474

# <40% sequence identity = Set40
tr|B8P149|B8P149_POSPM	3e-80	28	0.9425	432
tr|G2TQE8|G2TQE8_BACCO	7e-68	8	0.5795	452
tr|F9HJT9|F9HJT9_9STRE	3e-70	11	0.5709	452
tr|H2JN17|H2JN17_STRHJ	3e-69	23	0.774	504
tr|C5AKH4|C5AKH4_BURGB	2e-67	26	0.5488	403
sp|Q0CEF5|AGALG_ASPTN	4e-63	12	0.611	478
tr|B3CFN7|B3CFN7_9BACE	1e-78	26	0.5828	412
tr|D4KDQ2|D4KDQ2_9FIRM	8e-76	10	0.611	483
tr|D4W2N5|D4W2N5_9FIRM	1e-67	10	0.5478	435
tr|F2USV1|F2USV1_SALS5	1e-88	35	1	467
 
# >60% sequence identity = Set60
tr|G1P280|G1P280_MYOLU	1e-108	78	0.9699	420
tr|Q4RTE7|Q4RTE7_TETNG	7e-89	71	0.7319	314
tr|F1Q5G5|F1Q5G5_DANRE	1e-106	67	0.9138	392
tr|E1B725|E1B725_BOVIN	1e-111	76	0.9727	428
tr|H2U095|H2U095_TAKRU	1e-101	65	0.9424	412
tr|G1T044|G1T044_RABIT	1e-109	82	0.9698	417
tr|C0HA45|C0HA45_SALSA	1e-102	63	0.9534	409
tr|H0WQ54|H0WQ54_OTOGA	4e-87	71	0.9953	428
tr|G3WK18|G3WK18_SARHA	1e-108	72	0.9388	414
tr|H2L5H7|H2L5H7_ORYLA	1e-100	61	0.9534	411
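The split into the low- and high-identity groups listed above can be sketched as follows. How the 10 sequences per set were picked from each group is not reproduced here; the function name `split_by_identity` and the tuple layout are assumptions for illustration.

```python
def split_by_identity(hits, low=40, high=60):
    """Split (identifier, identity in %) pairs into the <low and >high groups."""
    below = [hit for hit in hits if hit[1] < low]
    above = [hit for hit in hits if hit[1] > high]
    return below, above


# Three rows taken from the listing above (identifier, sequence identity in %).
hits = [
    ("tr|B8P149|B8P149_POSPM", 28),
    ("tr|B3RSE1|B3RSE1_TRIAD", 49),
    ("tr|G1P280|G1P280_MYOLU", 78),
]
below_40, above_60 = split_by_identity(hits)
print(below_40)
print(above_60)
```

Hits with an identity between the two thresholds, such as B3RSE1_TRIAD, fall into neither group and can only appear in the whole-range set.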

Methods

Each of the alignment tools ClustalW, Muscle, T-Coffee and 3D-Coffee was run on the three datasets to produce a multiple sequence alignment. The exact command-line calls are listed in our lab journal. Since 3D-Coffee needs a structure template and there were no suitable hits with a PDB entry assigned, we searched the PDB for a structure of a homologous sequence. We used the following PDB templates for the three datasets:

  • Set 100: 1KTB
  • Set 40: 1SZN
  • Set 60: 1R46 (which is also the structure of our reference sequence)

However, the structures do not seem to add much valuable information to the alignments: the 3D-Coffee alignments were all identical to the ones generated by T-Coffee.
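The two quantities reported for each alignment in the results, the number of gaps per sequence and the number of conserved columns, can be computed along the following lines. This is a minimal sketch under two assumptions: the alignment is available in aligned FASTA format, and a column counts as conserved when it contains no gap and the same residue in every sequence. The function names are hypothetical.

```python
def read_fasta_alignment(text):
    """Parse aligned FASTA text into a dict mapping sequence name to aligned sequence."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def gap_counts(alignment):
    """Number of gap characters ('-') in each aligned sequence."""
    return {name: seq.count("-") for name, seq in alignment.items()}


def conserved_columns(alignment):
    """Number of gap-free columns in which all sequences carry the same residue."""
    rows = list(alignment.values())
    return sum(
        1 for column in zip(*rows)
        if "-" not in column and len(set(column)) == 1
    )


# Toy alignment, not one of our datasets.
toy = """>seq1
MK-LV
>seq2
MKALV
>seq3
MR-LV
"""
alignment = read_fasta_alignment(toy)
print(gap_counts(alignment))
print(conserved_columns(alignment))
```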

Results

Set 100

ClustalW
The MSA calculated by ClustalW
Sequence ID Number of gaps
tr|G2PG26|G2PG26_STRVO 133
tr|C7PCU7|C7PCU7_CHIPD 372
sp|P06280|AGAL_HUMAN 389
tr|G8NYA7|G8NYA7_GRAMM 97
sp|Q0CEF5|AGALG_ASPTN 93
tr|E1ZHK5|E1ZHK5_CHLVA 452
tr|Q8RX86|Q8RX86_ARATH 422
tr|F8FLU8|F8FLU8_PAEMK 89
tr|B3RSE1|B3RSE1_TRIAD 453
tr|F5BFS9|F5BFS9_TOBAC 405
tr|H1Q7I8|H1Q7I8_9ACTO 134
conserved 2
Muscle
The MSA calculated by Muscle
Sequence ID Number of gaps
sp|Q0CEF5|AGALG_ASPTN 217
tr|F8FLU8|F8FLU8_PAEMK 213
tr|C7PCU7|C7PCU7_CHIPD 496
tr|G8NYA7|G8NYA7_GRAMM 221
sp|P06280|AGAL_HUMAN 513
tr|B3RSE1|B3RSE1_TRIAD 577
tr|G2PG26|G2PG26_STRVO 257
tr|H1Q7I8|H1Q7I8_9ACTO 258
tr|E1ZHK5|E1ZHK5_CHLVA 576
tr|Q8RX86|Q8RX86_ARATH 546
tr|F5BFS9|F5BFS9_TOBAC 529
conserved 10
T-Coffee
The MSA calculated by T-Coffee
Sequence ID Number of gaps
tr|G2PG26|G2PG26_STRVO 516
tr|C7PCU7|C7PCU7_CHIPD 755
sp|P06280|AGAL_HUMAN 772
sp|Q0CEF5|AGALG_ASPTN 476
tr|E1ZHK5|E1ZHK5_CHLVA 835
tr|G8NYA7|G8NYA7_GRAMM 480
tr|Q8RX86|Q8RX86_ARATH 805
tr|F8FLU8|F8FLU8_PAEMK 472
tr|F5BFS9|F5BFS9_TOBAC 788
tr|B3RSE1|B3RSE1_TRIAD 836
tr|H1Q7I8|H1Q7I8_9ACTO 517
conserved 9
3D-Coffee
The MSA calculated by 3D-Coffee
Sequence ID Number of gaps
tr|G2PG26|G2PG26_STRVO 516
tr|C7PCU7|C7PCU7_CHIPD 755
sp|P06280|AGAL_HUMAN 772
sp|Q0CEF5|AGALG_ASPTN 476
tr|E1ZHK5|E1ZHK5_CHLVA 835
tr|G8NYA7|G8NYA7_GRAMM 480
tr|Q8RX86|Q8RX86_ARATH 805
tr|F8FLU8|F8FLU8_PAEMK 472
tr|F5BFS9|F5BFS9_TOBAC 788
tr|B3RSE1|B3RSE1_TRIAD 836
tr|H1Q7I8|H1Q7I8_9ACTO 517
conserved 9

Set 40

ClustalW
The MSA calculated by ClustalW
Sequence ID Number of gaps
tr|C5AKH4|C5AKH4_BURGB 290
tr|G2TQE8|G2TQE8_BACCO 267
tr|H2JN17|H2JN17_STRHJ 373
tr|F9HJT9|F9HJT9_9STRE 256
sp|P06280|AGAL_HUMAN 568
sp|Q0CEF5|AGALG_ASPTN 272
tr|D4W2N5|D4W2N5_9FIRM 254
tr|B3CFN7|B3CFN7_9BACE 333
tr|B8P149|B8P149_POSPM 562
tr|D4KDQ2|D4KDQ2_9FIRM 254
tr|F2USV1|F2USV1_SALS5 534
conserved 4
Muscle
The MSA calculated by Muscle
Sequence ID Number of gaps
tr|H2JN17|H2JN17_STRHJ 485
tr|C5AKH4|C5AKH4_BURGB 402
tr|B8P149|B8P149_POSPM 674
sp|P06280|AGAL_HUMAN 680
tr|F2USV1|F2USV1_SALS5 646
tr|B3CFN7|B3CFN7_9BACE 445
sp|Q0CEF5|AGALG_ASPTN 384
tr|G2TQE8|G2TQE8_BACCO 379
tr|F9HJT9|F9HJT9_9STRE 368
tr|D4KDQ2|D4KDQ2_9FIRM 366
tr|D4W2N5|D4W2N5_9FIRM 366
conserved 8
T-Coffee
The MSA calculated by T-Coffee
Sequence ID Number of gaps
tr|C5AKH4|C5AKH4_BURGB 541
tr|G2TQE8|G2TQE8_BACCO 518
tr|H2JN17|H2JN17_STRHJ 624
tr|F9HJT9|F9HJT9_9STRE 507
sp|P06280|AGAL_HUMAN 819
sp|Q0CEF5|AGALG_ASPTN 523
tr|D4W2N5|D4W2N5_9FIRM 505
tr|B3CFN7|B3CFN7_9BACE 584
tr|B8P149|B8P149_POSPM 813
tr|D4KDQ2|D4KDQ2_9FIRM 505
tr|F2USV1|F2USV1_SALS5 785
conserved 12
3D-Coffee
The MSA calculated by 3D-Coffee
Sequence ID Number of gaps
tr|C5AKH4|C5AKH4_BURGB 541
tr|G2TQE8|G2TQE8_BACCO 518
tr|H2JN17|H2JN17_STRHJ 624
tr|F9HJT9|F9HJT9_9STRE 507
sp|P06280|AGAL_HUMAN 819
sp|Q0CEF5|AGALG_ASPTN 523
tr|D4W2N5|D4W2N5_9FIRM 505
tr|B3CFN7|B3CFN7_9BACE 584
tr|B8P149|B8P149_POSPM 813
tr|D4KDQ2|D4KDQ2_9FIRM 505
tr|F2USV1|F2USV1_SALS5 785
conserved 12

Set 60

ClustalW
The MSA calculated by ClustalW
Sequence ID Number of gaps
tr|H2U095|H2U095_TAKRU 24
tr|H0WQ54|H0WQ54_OTOGA 34
tr|F1Q5G5|F1Q5G5_DANRE 49
tr|G1P280|G1P280_MYOLU 26
sp|P06280|AGAL_HUMAN 29
tr|G3WK18|G3WK18_SARHA 17
tr|E1B725|E1B725_BOVIN 19
tr|Q4RTE7|Q4RTE7_TETNG 81
tr|H2L5H7|H2L5H7_ORYLA 30
tr|G1T044|G1T044_RABIT 28
tr|C0HA45|C0HA45_SALSA 50
conserved 154
Muscle
The MSA calculated by Muscle
Sequence ID Number of gaps
tr|H2L5H7|H2L5H7_ORYLA 29
tr|C0HA45|C0HA45_SALSA 49
tr|F1Q5G5|F1Q5G5_DANRE 48
tr|Q4RTE7|Q4RTE7_TETNG 80
tr|H2U095|H2U095_TAKRU 23
tr|G3WK18|G3WK18_SARHA 16
tr|H0WQ54|H0WQ54_OTOGA 33
tr|E1B725|E1B725_BOVIN 18
tr|G1P280|G1P280_MYOLU 25
sp|P06280|AGAL_HUMAN 28
tr|G1T044|G1T044_RABIT 27
conserved 156
T-Coffee
The MSA calculated by T-Coffee
Sequence ID Number of gaps
tr|H2U095|H2U095_TAKRU 44
tr|H0WQ54|H0WQ54_OTOGA 54
tr|F1Q5G5|F1Q5G5_DANRE 69
tr|G1P280|G1P280_MYOLU 46
sp|P06280|AGAL_HUMAN 49
tr|G3WK18|G3WK18_SARHA 37
tr|Q4RTE7|Q4RTE7_TETNG 101
tr|E1B725|E1B725_BOVIN 39
tr|H2L5H7|H2L5H7_ORYLA 50
tr|G1T044|G1T044_RABIT 48
tr|C0HA45|C0HA45_SALSA 70
conserved 155
3D-Coffee
The MSA calculated by 3D-Coffee
Sequence ID Number of gaps
tr|H2U095|H2U095_TAKRU 44
tr|H0WQ54|H0WQ54_OTOGA 54
tr|F1Q5G5|F1Q5G5_DANRE 69
tr|G1P280|G1P280_MYOLU 46
sp|P06280|AGAL_HUMAN 49
tr|G3WK18|G3WK18_SARHA 37
tr|Q4RTE7|Q4RTE7_TETNG 101
tr|E1B725|E1B725_BOVIN 39
tr|H2L5H7|H2L5H7_ORYLA 50
tr|G1T044|G1T044_RABIT 48
tr|C0HA45|C0HA45_SALSA 70
conserved 155


Discussion

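Compared to the other methods, ClustalW finds the fewest conserved columns in all three datasets: 2 in Set 100 (Muscle: 10, T-Coffee: 9), 4 in Set 40 (Muscle: 8, T-Coffee: 12) and 154 in Set 60 (Muscle: 156, T-Coffee: 155).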
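The comparison can be checked directly against the "conserved" rows of the result tables. The short sketch below only restates those numbers and reports, for each dataset, which tool yields the fewest conserved columns; the variable and function names are made up for illustration.

```python
# Number of conserved columns per dataset and tool, copied from the
# "conserved" rows of the result tables above.
conserved = {
    "Set 100": {"ClustalW": 2, "Muscle": 10, "T-Coffee": 9, "3D-Coffee": 9},
    "Set 40": {"ClustalW": 4, "Muscle": 8, "T-Coffee": 12, "3D-Coffee": 12},
    "Set 60": {"ClustalW": 154, "Muscle": 156, "T-Coffee": 155, "3D-Coffee": 155},
}


def tool_with_fewest(counts):
    """Return the tool name with the smallest number of conserved columns."""
    return min(counts, key=counts.get)


fewest = {dataset: tool_with_fewest(counts) for dataset, counts in conserved.items()}
for dataset, tool in fewest.items():
    print(f"{dataset}: {tool} ({conserved[dataset][tool]} conserved columns)")
```

In Set 60 the tools differ by at most two conserved columns, so the difference between the methods is mainly visible in the more divergent Set 100 and Set 40.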