Difference between revisions of "Fabry:Sequence alignments (sequence searches and multiple alignments):Results"

From Bioinformatikpedia
Please see [[Fabry:Sequence_alignments_(sequence_searches_and_multiple_alignments) | Task 2 ]] for our scripts and line of action on this topic.
 
 
== Reference sequence ==
 
 
The reference sequence of [[Alpha-galactosidase|α-Galactosidase A]] that will be used in this task was obtained from Swissprot [http://www.uniprot.org/uniprot/P06280 P06280].
 
 
>gi|4504009|ref|NP_000160.1| alpha-galactosidase A precursor [Homo sapiens]
 
MQLRNPELHLGCALALRFLALVSWDIPGARALDNGLARTPTMGWLHWERFMCNLDCQEEPDSCISEKLFM
 
EMAELMVSEGWKDAGYEYLCIDDCWMAPQRDSEGRLQADPQRFPHGIRQLANYVHSKGLKLGIYADVGNK
 
TCAGFPGSFGYYDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRSIVYSCEWPLYMWP
 
FQKPNYTEIRQYCNHWRNFADIDDSWKSIKSILDWTSFNQERIVDVAGPGGWNDPDMLVIGNFGLSWNQQ
 
VTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQLRQGDNFEVWERPLSGLAWA
 
VAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVKRKLGFYEWTSRLRSHINPTGTVLLQLENT
 
MQMSLKDLL
 
 
== Sequence searches ==
 
=== Blast ===
 
{| class="centered"
 
| [[File:Blastsearch_default_v700_ids_GOterms_comparison.png|thumb| GO terms of P06280 and each BLAST hit (with Evalue <= 0.003) compared.
 
Percentage of shared terms, relative to the number of GO terms of P06280 (AGAL_HUMAN) in the upper picture and relative to the number of GO terms of each hit in the second picture]]
 
| [[File:blastsearch_default_v700_Evalues.png|thumb| Histogram of the logarithmic E-values of the BLAST hits for P06280]]
 
|}
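
The two percentages in the GO term comparison can be read as simple set overlaps. The following Python sketch is only an illustration of that idea, not the script we used: the function name and the GO identifiers are placeholders and do not represent the actual annotations of P06280 or of any hit.

```python
def shared_go_percentages(query_terms, hit_terms):
    """Return the share of common GO terms relative to the query and to the hit."""
    query, hit = set(query_terms), set(hit_terms)
    shared = query & hit
    # Guard against empty annotation sets to avoid division by zero
    rel_query = 100.0 * len(shared) / len(query) if query else 0.0
    rel_hit = 100.0 * len(shared) / len(hit) if hit else 0.0
    return rel_query, rel_hit


# Placeholder GO term sets, for illustration only
query_go = {"GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"}
hit_go = {"GO:0000002", "GO:0000003", "GO:0000099"}

rel_query, rel_hit = shared_go_percentages(query_go, hit_go)
print(f"relative to query: {rel_query:.1f}%, relative to hit: {rel_hit:.1f}%")
```

A hit with few annotations can reach a high percentage relative to its own term count while still covering only a small part of the query's terms, which is why both views are shown.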
 
{| class="centered"
 
| [[File:blastsearch_default_v700_Positives.png|thumb| Histogram of the positive amino acids of the pairwise alignments of the BLAST hits for P06280]]
 
| [[File:blastsearch_default_v700_Identities.png|thumb| Histogram of the identical amino acids of the pairwise alignments of the BLAST hits for P06280]]
 
| [[File:blastsearch_default_v700_Lengths.png|thumb| Histogram of the length of the BLAST hits for P06280]]
 
|}
 
 
Number of hits with Evalue < 0.003: 663
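
Counting the hits below the cutoff could look roughly like the sketch below. It assumes a simplified, hypothetical input with one tab-separated line per alignment (hit identifier, then E-value); the real BLAST output has to be parsed into this form first, and the function name is our own.

```python
def count_significant_hits(lines, cutoff=0.003):
    """Count distinct hits whose best E-value is strictly below the cutoff."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        hit_id, evalue_text = line.split("\t")[:2]
        evalue = float(evalue_text)
        # A hit may appear with several alignments; keep the best E-value
        if hit_id not in best or evalue < best[hit_id]:
            best[hit_id] = evalue
    return sum(1 for evalue in best.values() if evalue < cutoff)


# Made-up example lines, not taken from our BLAST run
example = [
    "hitA\t1e-120",
    "hitB\t0.002",
    "hitB\t0.5",
    "hitC\t0.003",
    "hitD\t4.1",
]
n_significant = count_significant_hits(example)
print(n_significant)
```

Note that the comparison is strict, so a hit with an E-value of exactly 0.003 is not counted in this sketch.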
 
 
 
The run took about 2 minutes (see section [[Sequence_alignments_(sequence_searches_and_multiple_alignments)#Time | Time]]).
 
 
=== Psi-Blast ===
 
 
 
=== HHblits ===
 
We searched the "big80" database with HHblits using the default settings and also with the maximum number of possible iterations (8).
 
==== 2 iterations - default ====
 
{| class="centered"
 
| [[File:hhblits_default_ids_GOterms_comparison.png|thumb| GO terms of P06280 and each HHblits hit (with Evalue < 0.003) compared.
 
Percentage of shared terms, relative to the number of GO terms of P06280 (AGAL_HUMAN) in the upper picture and relative to the number of GO terms of each hit in the second picture]]
 
| [[File:hhblits_default.out_Evalues.png|thumb| Histogram of the logarithmic E-values of the HHblits hits for P06280]]
 
|}
 
 
Number of hits with Evalue < 0.003: 326
 
{| class="centered"
 
| [[File:hhblits_default_Similarity.png|thumb| Histogram of the similarity of the HHblits hits to P06280]]
 
| [[File:hhblits_default_Identities.png|thumb| Histogram of the identical amino acids of the pairwise alignments of the HHblits hits for P06280]]
 
|}
 
 
==== 8 iterations ====
 
{| class="centered"
 
| [[File:hhblits_n8_neu_ids_GOterms_comparison.png|thumb| GO terms of P06280 and each HHblits hit (with Evalue < 0.003 and 8 iterations) compared.
 
Percentage of shared terms, relative to the number of GO terms of P06280 (AGAL_HUMAN) in the upper picture and relative to the number of GO terms of each hit in the second picture]]
 
| [[File:hhblits_n8_neu.out_Evalues.png|thumb| Histogram of the logarithmic E-values of the HHblits search with 8 iterations for P06280]]
 
|}
 
 
Number of hits with Evalue < 0.003: 729
 
{| class="centered"
 
| [[File:hhblits_n8_neu_Similarity.png|thumb| Histogram of the similarity of the HHblits hits (search with 8 iterations) to P06280]]
 
| [[File:hhblits_n8_neu_Identities.png|thumb| Histogram of the identical amino acids of the pairwise alignments of the HHblits hits (search with 8 iterations) for P06280]]
 
|}
 
 
 
The first HHblits run took about 2 minutes 20 seconds, the second one about 16 minutes (see section [[Sequence_alignments_(sequence_searches_and_multiple_alignments)#Time | Time]]).
 
 
== Comparison sequence searches ==
 
=== Comparing the hits ===
 
{| class="centered"
 
| [[File:venn_blast_hhblits.png|thumb| Venn diagram of proteins found by BLAST, HHBlits and HHBlits with 8 iterations]]
 
| [[File:venn_blast_psi_hhblits.png|thumb| Venn diagram of the proteins found by BLAST, Psi-BLAST (10 iterations and E-value cutoff 10e-10) and HHBlits with 8 iterations]]
 
| [[File:venn_blast_hhblits_best100.png|thumb| Venn diagram of the first 100 proteins found by BLAST, HHBlits and HHBlits with 8 iterations]]
 
| [[File:venn_blast_psi_hhblits_best100.png|thumb| Venn diagram of the first 100 proteins found by BLAST, Psi-BLAST and HHBlits with 8 iterations]]
 
|}
 
 
Venn diagrams created with [http://bioinfogp.cnb.csic.es/tools/venny/index.html Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.]
 
 
The Venn diagrams show that only a small portion of the hits is shared by all three methods; each method seems to find a largely unique set of proteins. The biggest overlap is between the BLAST and Psi-BLAST hits, which matches our expectations, since these two use similar approaches, while HHBlits searches by iterative HMM-HMM comparison. This becomes most obvious in the last picture, where only the 100 best hits of each of the three methods are compared. Only eleven hits are common to all methods. Of the remaining 89 hits per method, those of BLAST and Psi-BLAST are shared between these two, while those of HHBlits are unique to the HHBlits search. The comparison of all hits with an E-value smaller than or equal to 0.03 in all methods looks similar. It is noteworthy that here a small number of hits is shared only by HHBlits and BLAST (52), and only by Psi-BLAST and HHBlits (2).
 
The two HHBlits searches with 2 and 8 iterations also overlap to a large extent.
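
The region sizes of such a three-way Venn diagram can also be computed directly from the hit lists. The sketch below uses Python sets; the function name and the identifiers P1 to P6 are made up for illustration and are not actual hits from our searches (the diagrams themselves were created with VENNY).

```python
def venn_regions(a, b, c):
    """Split three collections of identifiers into the seven Venn regions."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        "a_and_b_only": (a & b) - c,
        "a_and_c_only": (a & c) - b,
        "b_and_c_only": (b & c) - a,
        "all_three": a & b & c,
    }


# Hypothetical hit lists standing in for BLAST, Psi-BLAST and HHBlits
blast = {"P1", "P2", "P3", "P4"}
psiblast = {"P2", "P3", "P4", "P5"}
hhblits = {"P4", "P6"}

regions = venn_regions(blast, psiblast, hhblits)
sizes = {name: len(ids) for name, ids in regions.items()}
print(sizes)
```

To restrict the comparison to the best 100 hits, each list would be sorted by E-value and truncated before being passed in.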
 
 
=== Comparing the Evalues ===
 
 
{| class="centered"
 
[[File:Fabry_animation.gif]]
 
|}
 
Above you can see histograms of the E-value distributions for the searches performed with the different methods.
 
The R Script is based on Andrea's [[ARSA_search_protocol | R Script psiBlast.evalueHist.Rscript]]
 
 
As one can clearly see, the number of significant hits in the Psi-Blast search far exceeds the number of hits in either of the other two searches. Its histogram also looks more like a normal distribution with a mean of -80, whereas the histograms of the BLAST and the HHBlits searches do not and instead pile up towards zero. The fewest hits are generated by the "ordinary" BLAST search (663); the Psi-BLAST search finds roughly ten times as many (6868). Thus, with respect to the E-values, I would prefer using Psi-Blast.
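
For readers without R, the binning behind a histogram of logarithmic E-values can be sketched in Python as follows. This is an independent illustration and not a port of the R script mentioned above; the function name, the bin width of 10 and the example values are assumptions.

```python
import math
from collections import Counter


def log_evalue_histogram(evalues, bin_width=10):
    """Count E-values per bin of log10(E-value); keys are the lower bin edges."""
    counts = Counter()
    for evalue in evalues:
        if evalue <= 0:
            # log10 is undefined here; E-values reported as 0.0 are skipped in this sketch
            continue
        log_e = math.log10(evalue)
        bin_start = math.floor(log_e / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up E-values, not taken from our searches
evalues = [1e-85, 3e-82, 1e-79, 2e-5, 0.001, 0.0]
histogram = log_evalue_histogram(evalues)
for bin_start, count in histogram.items():
    print(f"[{bin_start}, {bin_start + 10}): {count}")
```

How E-values of exactly zero are treated changes the leftmost part of the histogram, so that choice should be stated when comparing methods.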
 
 
=== Time ===
 
We measured the runtime of the programs with the command "time".
 
 
 
{|style="border-collapse: separate; border-spacing: 0; border-width: 1px; border-style: solid; border-color: #000; padding: 0; align: center; " width="85%"
 
! style="border-style: solid; border-width: 0 1px 1px 0"| Method
 
! style="border-style: solid; border-width: 0 1px 1px 0"| Parameter
 
! style="border-style: solid; border-width: 0 1px 1px 0"| Time
 
|-
 
| style="border-style: solid; border-width: 0 1px 1px 0"| Blast v = 700
 
| style="border-style: solid; border-width: 0 1px 1px 0"| b = 700, v = 700
 
| style="border-style: solid; border-width: 0 1px 1px 0"| 1m53.944s
 
|-
 
| style="border-style: solid; border-width: 0 1px 1px 0"| HHBlits
 
| style="border-style: solid; border-width: 0 1px 1px 0"| default
 
| style="border-style: solid; border-width: 0 1px 1px 0"| 2m19.519s
 
|-
 
| style="border-style: solid; border-width: 0 1px 1px 0"| HHBlits
 
| style="border-style: solid; border-width: 0 1px 1px 0"| n = 8
 
| style="border-style: solid; border-width: 0 1px 1px 0"| 16m7.754s
 
|}
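
To compare the runtimes numerically, the values from the table can be converted to seconds. The following Python sketch assumes the "XmY.YYYs" format shown above; the function name is our own.

```python
import re


def time_to_seconds(text):
    """Convert a string such as '2m19.519s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Runtimes copied from the table above
runtimes = {
    "Blast": "1m53.944s",
    "HHBlits default": "2m19.519s",
    "HHBlits n = 8": "16m7.754s",
}
seconds = {method: time_to_seconds(value) for method, value in runtimes.items()}

# Factor by which 8 iterations are slower than the default HHBlits run
slowdown = seconds["HHBlits n = 8"] / seconds["HHBlits default"]
print(seconds, round(slowdown, 1))
```

With these values the HHBlits search with 8 iterations takes a little under seven times as long as the default run.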
 
 
 
 
== Multiple sequence alignments ==
 
TODO: Add pictures of MSA and find a way to present them, since they are _very_ wide --[[User:Rackersederj|Rackersederj]] 07:06, 5 May 2012 (UTC)
 

Latest revision as of 21:00, 6 May 2012