Fabry:Sequence alignments (sequence searches and multiple alignments):Results
Please see Task 2 for our scripts and line of action on this topic.
Reference sequence
The reference sequence of α-Galactosidase A that will be used in this task was obtained from Swissprot P06280.
>gi|4504009|ref|NP_000160.1| alpha-galactosidase A precursor [Homo sapiens]
MQLRNPELHLGCALALRFLALVSWDIPGARALDNGLARTPTMGWLHWERFMCNLDCQEEPDSCISEKLFM
EMAELMVSEGWKDAGYEYLCIDDCWMAPQRDSEGRLQADPQRFPHGIRQLANYVHSKGLKLGIYADVGNK
TCAGFPGSFGYYDIDAQTFADWGVDLLKFDGCYCDSLENLADGYKHMSLALNRTGRSIVYSCEWPLYMWP
FQKPNYTEIRQYCNHWRNFADIDDSWKSIKSILDWTSFNQERIVDVAGPGGWNDPDMLVIGNFGLSWNQQ
VTQMALWAIMAAPLFMSNDLRHISPQAKALLQDKDVIAINQDPLGKQGYQLRQGDNFEVWERPLSGLAWA
VAMINRQEIGGPRSYTIAVASLGKGVACNPACFITQLLPVKRKLGFYEWTSRLRSHINPTGTVLLQLENT
MQMSLKDLL
Sequence searches
Blast
Number of hits with Evalue < 0.003: 663
The run took about 2 minutes (see section Time).
Psi-Blast
HHblits
We searched the "big80" database with HHblits using the default settings and also with the maximum number of possible iterations (8).
2 iterations - default
Number of hits with Evalue < 0.003: 326
8 iterations
Number of hits with Evalue < 0.003: 729
The first HHblits run took about 2.5 minutes, the second one about 16 minutes (see section Time).
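The hit counts reported for each search depend on the E-value cut-off of 0.003. As a minimal sketch of that counting step, assuming the hits have already been parsed into (identifier, E-value) pairs, the filter could look as follows. The function name and the example identifiers are hypothetical and are not taken from our scripts.

```python
def count_significant(hits, threshold=0.003):
    """Count hits whose E-value lies strictly below the threshold."""
    return sum(1 for _hit_id, evalue in hits if evalue < threshold)


# Hypothetical (identifier, E-value) pairs, for illustration only.
example_hits = [
    ("hit_a", 1e-120),
    ("hit_b", 2e-5),
    ("hit_c", 0.003),  # equal to the cut-off, so not counted
    ("hit_d", 0.5),
]

print(count_significant(example_hits))
```

Note that the comparison is strict ("<"), so a hit with an E-value of exactly 0.003 is not counted.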
Comparison sequence searches
Comparing the hits
Venn diagrams created with Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.
The Venn diagrams show that only a small portion of the hits is shared by all three methods; each method finds a largely unique set of hits. The biggest overlap is between the BLAST and Psi-BLAST hits, which matches our expectations, since these two use similar approaches, while HHblits searches by iterative HMM-HMM comparison. This becomes most obvious in the last picture, where only the 100 best hits of each method are compared: only eleven hits are common to all three methods. The remaining 89 hits of BLAST and Psi-BLAST are shared between these two methods, while the remaining 89 HHblits hits are unique to the HHblits search. The comparison of all hits with an E-value smaller than or equal to 0.03 looks similar. It is noteworthy that here a small number of hits is shared only by HHblits and BLAST (52), and only by Psi-BLAST and HHblits (2). The two HHblits searches with 2 and 8 iterations also overlap to a large extent.
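The regions of such a three-way Venn diagram can also be computed directly from the hit lists. The following is only a sketch with plain Python sets, not the way VENNY works internally; the function name and the example identifiers are hypothetical.

```python
def venn3(a, b, c):
    """Split three collections of hit identifiers into the seven Venn regions."""
    a, b, c = set(a), set(b), set(c)
    return {
        "all_three": a & b & c,
        "a_and_b_only": (a & b) - c,
        "a_and_c_only": (a & c) - b,
        "b_and_c_only": (b & c) - a,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }


# Hypothetical identifiers, for illustration only.
blast = ["id1", "id2", "id3"]
psiblast = ["id1", "id2", "id4"]
hhblits = ["id1", "id5"]

regions = venn3(blast, psiblast, hhblits)
for name, members in regions.items():
    print(name, len(members))
```

The seven regions are disjoint, so their sizes add up to the number of distinct identifiers over all three lists.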
Comparing the Evalues
The histogram above shows the distribution of the E-values for the searches performed with the different methods. The R script is based on Andrea's R script psiBlast.evalueHist.Rscript.
As the histogram shows, the number of significant hits in the Psi-BLAST search exceeds the number of hits in either of the other two searches by far. Its histogram also looks more like a normal distribution with mean -80, while the histograms of the BLAST and the HHblits searches do not, but rather tend towards the zero point. The "ordinary" BLAST search generates 663 hits, while the Psi-BLAST search finds roughly ten times as many (6868). Thus, with respect to the E-values, we would prefer Psi-BLAST.
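A mean of -80 suggests that the histogram is drawn over log-transformed E-values; the page does not say so explicitly, so this is an assumption. Under that assumption, the binning step could be sketched in Python as below. This is not the R script mentioned above; the function name, the bin width and the example values are hypothetical.

```python
import math
from collections import Counter


def log_evalue_histogram(evalues, bin_width=10):
    """Bin log10(E-value) into bins of the given width; keys are lower bin edges."""
    counts = Counter()
    for evalue in evalues:
        if evalue <= 0:
            # log10 is undefined here; E-values reported as 0 need separate handling.
            continue
        log_e = math.log10(evalue)
        bin_start = math.floor(log_e / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical E-values, for illustration only.
example_evalues = [3e-85, 2e-78, 5e-72, 2e-3, 0.5, 0.0]
histogram = log_evalue_histogram(example_evalues)
print(histogram)
```

Each key is the lower edge of a bin, so the key -80 collects all values with -80 <= log10(E-value) < -70.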
Time
We measured the run time of the programs with the command "time".
Method | Parameter | Time |
---|---|---|
Blast | b = 700, v = 700 | 1m53.944s |
HHBlits | default | 2m19.519s |
HHBlits | n = 8 | 16m7.754s |
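To compare the run times in the table, the values printed by "time" can be converted to seconds. The following sketch does this for the three measurements above; the helper name `to_seconds` is hypothetical, and the pattern only covers the "XmY.YYYs" format seen in the table.

```python
import re


def to_seconds(stamp):
    """Convert a duration such as '1m53.944s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the table above.
timings = {
    "Blast": "1m53.944s",
    "HHBlits default": "2m19.519s",
    "HHBlits n = 8": "16m7.754s",
}
seconds = {method: to_seconds(stamp) for method, stamp in timings.items()}

# How much longer the 8-iteration run took than the default run.
ratio = seconds["HHBlits n = 8"] / seconds["HHBlits default"]
print(seconds)
print(round(ratio, 1))
```

With these values, the HHblits run with 8 iterations took roughly seven times as long as the default run with 2 iterations.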
Multiple sequence alignments
Dataset
#99 - 90% sequence identity
>tr|B4DLT5|B4DLT5_HUMAN cDNA FLJ56739, highly similar to Alpha-galactosidase A (EC 3.2.1.22) OS=Homo sapiens | Identities = 183/183 (100%)
>tr|G3SI81|G3SI81_GORGO Uncharacterized protein OS=Gorilla gorilla|Identities = 390/432 (90%)
>UP20|LOZLIBUBA|1|64 Alpha-galactosidase A (Fragment). [Ateles belzebuth chamek]|O97898|Identities=97%

#89 - 60% sequence identity
>tr|G1P280|G1P280_MYOLU Uncharacterized protein OS=Myotis lucifugus|Identities = 341/410 (83%)
>tr|G1T044|G1T044_RABIT Uncharacterized protein OS=Oryctolagus|Identities = 348/420 (82%)
>tr|E1B725|E1B725_BOVIN Uncharacterized protein OS=Bos taurus GN=GLA|Identities = 334/413 (80%)
>tr|D3ZJF9|D3ZJF9_RAT Galactosidase, alpha (Mapped), isoform CRA_a|Identities = 331/418 (79%)
>tr|H2L5H7|H2L5H7_ORYLA Uncharacterized protein (Fragment)|Identities = 255/402 (63%)
>tr|E1BT44|E1BT44_CHICK Uncharacterized protein OS=Gallus gallus|Identities = 276/385 (71%)

#59 - 40% sequence identity
>tr|E2B637|E2B637_HARSA Alpha-N-acetylgalactosaminidase|Identities = 204/411 (49%)
>tr|F4WJD6|F4WJD6_ACREC Alpha-N-acetylgalactosaminidase|Identities = 195/394 (49%)
>tr|D2T1A8|D2T1A8_CHOPA Putative alpha-N-acetylgalactosaminidase|Identities = 202/413 (48%)
>tr|G5BPR2|G5BPR2_HETGA Alpha-N-acetylgalactosaminidase|Identities = 168/292 (57%)
>tr|F5BFS9|F5BFS9_TOBAC Alpha-galactosidase OS=Nicotiana tabacum|Identities = 145/329 (44%)
>tr|Q3UZX5|Q3UZX5_MOUSE Putative uncharacterized protein OS=Mus|Identities = 140/239 (58%)

#39 - 20% sequence identity
>tr|F1T8Q0|F1T8Q0_9CLOT Alpha-galactosidase (Precursor)|Identities = 164/416 (39%)
>tr|G6DHZ5|G6DHZ5_DANPL Alpha-N-acetylgalactosaminidase OS=Danaus|Identities = 110/294 (37%)
>tr|C7G8V8|C7G8V8_9FIRM Alpha-galactosidase OS=Roseburia|Identities = 134/347 (38%)
>tr|F8EMB8|F8EMB8_RUNSL Alpha-galactosidase (Precursor) OS=Runella|Identities = 106/366 (28%)
>tr|G0FV12|G0FV12_AMYMD Melibiase OS=Amycolatopsis mediterranei S699|Identities = 104/411 (25%)
>tr|B9TQP6|B9TQP6_RICCO Alpha-galactosidase/alpha-n-acetylgalactosaminidase|Identities = 31/93 (33%)
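The grouping of the dataset by sequence identity can be reproduced from the "Identities = x/y" annotation in each header. The sketch below is only an illustration of that step: the function names are hypothetical, the bin boundaries are our reading of the group labels above, and headers without an x/y fraction (such as "Identities=97%") are deliberately left unparsed.

```python
import re


def identity_percent(header):
    """Return the identity in percent from 'Identities = x/y', or None if absent."""
    match = re.search(r"Identities\s*=\s*(\d+)/(\d+)", header)
    if match is None:
        return None
    matched, aligned = int(match.group(1)), int(match.group(2))
    return 100.0 * matched / aligned


def identity_bin(percent):
    """Assign an identity percentage to one of the dataset groups (assumed boundaries)."""
    if percent >= 90:
        return "99 - 90%"
    if percent >= 60:
        return "89 - 60%"
    if percent >= 40:
        return "59 - 40%"
    if percent >= 20:
        return "39 - 20%"
    return "below 20%"


# Headers copied from the dataset above.
headers = [
    ">tr|G3SI81|G3SI81_GORGO Uncharacterized protein OS=Gorilla gorilla|Identities = 390/432 (90%)",
    ">tr|H2L5H7|H2L5H7_ORYLA Uncharacterized protein (Fragment)|Identities = 255/402 (63%)",
    ">tr|B9TQP6|B9TQP6_RICCO Alpha-galactosidase/alpha-n-acetylgalactosaminidase|Identities = 31/93 (33%)",
]
for header in headers:
    percent = identity_percent(header)
    print(header.split("|")[1], round(percent, 1), identity_bin(percent))
```

The identity is computed over the aligned region only (y is the alignment length), so a short fragment can reach a high identity without covering the whole reference sequence.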
TODO: Add pictures of MSA and find a way to present them, since they are _very_ wide --Rackersederj 07:06, 5 May 2012 (UTC)