Glucocerebrosidase sequence alignments
Revision as of 16:57, 21 May 2011
Sequence searches
Several different tools were used to search the non-redundant (nr) sequence database for sequences related to glucocerebrosidase.
- FASTA
- As FASTA was not installed initially, it was downloaded from the EBI FTP site <ref>ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/fasta36/</ref>.
- Command:
../bin/fasta36 gbaseq.fasta /data/blast/nr/nr > fasta_gba_search.out
- BLAST
- Command:
blastall -p blastp -d /data/blast/nr/nr -i gbaseq.fasta -o blast.out
- PSI-BLAST
- This tool was run four times, once for each combination of 3 or 5 iterations (x) and an E-value cut-off (y) of 0.005 or 10e-6.
- Command:
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_x_y.out -j x -h y
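The four PSI-BLAST runs are the Cartesian product of the two iteration counts (`-j`) and the two E-value cut-offs (`-h`). The following is a minimal Python sketch that builds the same four `blastpgp` command lines, assuming the file names used in the commands on this page. The helper `psi_blast_commands` is hypothetical and only prints the commands. To execute them, pass each list to `subprocess.run`.

```python
from itertools import product

# Parameter values taken from the PSI-BLAST commands on this page.
ITERATIONS = [3, 5]                  # -j: maximum number of passes
EVALUE_CUTOFFS = ["0.005", "10E-6"]  # -h: E-value threshold for inclusion in the profile
# Note: in standard E notation, "10E-6" denotes 1e-5.


def psi_blast_commands(query="gbaseq.fasta", db="/data/blast/nr/nr"):
    """Build one blastpgp argument list per (iterations, cut-off) combination (hypothetical helper)."""
    commands = []
    for j, h in product(ITERATIONS, EVALUE_CUTOFFS):
        out = f"psi_blast_{j}_{h}.out"
        commands.append(["blastpgp", "-d", db, "-i", query, "-o", out,
                         "-j", str(j), "-h", h])
    return commands


if __name__ == "__main__":
    for cmd in psi_blast_commands():
        # Printed only; use subprocess.run(cmd, check=True) to actually run a search.
        print(" ".join(cmd))
```

The cut-offs are kept as strings so that the output file names match the ones used on this page exactly.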
Furthermore, the online version of HHSearch (the HHpred server) <ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> was used to search the pdb70 database of May 14th.
Results
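The FASTA search returned 520 sequences. BLAST, as well as each of the PSI-BLAST runs, returned 500 sequences. The identity tables below cover 250 of these hits per run, which matches the default limits of blastall and blastpgp (500 one-line descriptions, -v, and 250 alignments, -b). The HHSearch search against the pdb70 database returned only 100 sequences.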
Overlap
The overlaps between the results of the different tools applied to the non-redundant sequence database are visualized using Venn diagrams<ref>http://bioinformatics.psb.ugent.be/webtools/Venn/</ref>. The HHSearch results are not included because they were obtained against a different database (pdb70).
[Figure: Overlap of the different results]
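The region sizes shown in a Venn diagram can also be computed directly from the sets of hit identifiers. This is a minimal sketch, assuming each tool's hits have already been reduced to a set of accession strings. `venn_regions` is a hypothetical helper, and the identifiers in the usage example are toy values, not results from this page.

```python
def venn_regions(hits):
    """Count identifiers that occur in exactly each combination of tools.

    hits maps a tool name to the set of hit identifiers it returned.
    Returns {frozenset(tool names): count} for every non-empty region.
    """
    names = list(hits)
    all_ids = set().union(*hits.values())
    regions = {}
    for seq_id in all_ids:
        # The region of an identifier is the set of tools that found it.
        key = frozenset(n for n in names if seq_id in hits[n])
        regions[key] = regions.get(key, 0) + 1
    return regions


if __name__ == "__main__":
    # Toy identifiers only; real sets would be parsed from the search outputs.
    toy = {
        "FASTA": {"a", "b", "c"},
        "BLAST": {"b", "c", "d"},
        "PSI-BLAST": {"c", "d", "e"},
    }
    regions = venn_regions(toy)
    for tools in sorted(regions, key=lambda k: (len(k), sorted(k))):
        print(" & ".join(sorted(tools)), regions[tools])
```

Each identifier is assigned to exactly one region, so the region counts add up to the size of the union of all hit sets.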
FASTA
../bin/fasta36 gbaseq.fasta /data/blast/nr/nr > fasta_gba_search.out
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.462 | 240 |
30-39% | 0.308 | 160 |
40-49% | 0.085 | 44 |
50-59% | 0.012 | 6 |
60-69% | 0.006 | 3 |
70-79% | 0.012 | 6 |
80-89% | 0.052 | 27 |
90-100% | 0.065 | 34 |
Total: 520 |
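Each fraction in the table is the bin count divided by the total number of hits (for example, 240/520 ≈ 0.462). The following is a minimal sketch of the binning, assuming 10% bins with 100% identity counted in the top bin. The helpers `identity_bins` and `bin_table` are hypothetical. The usage example reproduces the FASTA fractions from the counts above.

```python
def identity_bins(identities):
    """Count hits per 10% identity bin: 0-9, 10-19, ..., 80-89, 90-100."""
    counts = [0] * 10
    for pid in identities:
        if not 0 <= pid <= 100:
            raise ValueError(f"identity out of range: {pid}")
        # 100% identity falls into the last bin together with 90-99%.
        counts[min(int(pid // 10), 9)] += 1
    return counts


def bin_table(counts):
    """Return (label, fraction, count) rows; fractions are rounded to 3 decimals."""
    total = sum(counts)
    rows = []
    for i, n in enumerate(counts):
        label = "90-100%" if i == 9 else f"{10 * i}-{10 * i + 9}%"
        rows.append((label, round(n / total, 3) if total else 0.0, n))
    return rows


if __name__ == "__main__":
    # Counts copied from the FASTA table above.
    fasta_counts = [0, 0, 240, 160, 44, 6, 3, 6, 27, 34]
    for label, frac, n in bin_table(fasta_counts):
        print(f"{label} | {frac} | {n} |")
    print(f"Total: {sum(fasta_counts)} |")
```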
BLAST
blastall -p blastp -d /data/blast/nr/nr -i gbaseq.fasta -o blast.out
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.172 | 43 |
30-39% | 0.508 | 127 |
40-49% | 0.112 | 28 |
50-59% | 0.024 | 6 |
60-69% | 0.012 | 3 |
70-79% | 0.004 | 1 |
80-89% | 0.06 | 15 |
90-100% | 0.108 | 27 |
Total: 250 |
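The identity values for the BLAST and PSI-BLAST tables come from the pairwise text reports. The following is a minimal sketch of extracting one value per hit, assuming the default pairwise output of blastall/blastpgp, in which each subject starts with a line beginning with `>` and each HSP has an `Identities = n/m (p%)` line. Only the first HSP of each subject is counted. For multi-round blastpgp reports, it also assumes that each round starts with a `Results from round` header and keeps only the last round. `first_hsp_identities` is a hypothetical helper. FASTA output uses a different layout and would need its own pattern.

```python
import re

# Matches the per-HSP summary line of legacy BLAST pairwise output, e.g.
# " Identities = 150/480 (31%), Positives = 250/480 (52%)"
IDENTITY_RE = re.compile(r"Identities = (\d+)/(\d+) \((\d+)%\)")


def first_hsp_identities(report_text):
    """Return one identity percentage per subject, taken from its first HSP."""
    marker = "Results from round"
    if marker in report_text:
        # Keep only the last PSI-BLAST round (assumed report layout).
        report_text = report_text[report_text.rindex(marker):]
    identities = []
    awaiting_first_hsp = False
    for line in report_text.splitlines():
        if line.startswith(">"):
            awaiting_first_hsp = True
        elif awaiting_first_hsp:
            match = IDENTITY_RE.search(line)
            if match:
                identities.append(int(match.group(3)))
                awaiting_first_hsp = False  # skip further HSPs of this subject
    return identities


if __name__ == "__main__":
    # A hypothetical, heavily shortened excerpt in the blastall pairwise layout.
    toy_report = """>subject_A hypothetical entry
 Identities = 497/497 (100%), Positives = 497/497 (100%)
 Identities = 10/40 (25%), Positives = 20/40 (50%)
>subject_B hypothetical entry
 Identities = 150/480 (31%), Positives = 250/480 (52%)
"""
    print(first_hsp_identities(toy_report))  # [100, 31]
```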
PSI-BLAST
3 iterations, E-value cutoff 0.005
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_0.005.out -j 3 -h 0.005
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.34 | 85 |
30-39% | 0.392 | 98 |
40-49% | 0.104 | 26 |
50-59% | 0.016 | 4 |
60-69% | 0.0 | 0 |
70-79% | 0.004 | 1 |
80-89% | 0.036 | 9 |
90-100% | 0.108 | 27 |
Total: 250 |
3 iterations, E-value cutoff 10E-6
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_10E-6.out -j 3 -h 10E-6
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.34 | 85 |
30-39% | 0.392 | 98 |
40-49% | 0.104 | 26 |
50-59% | 0.016 | 4 |
60-69% | 0.0 | 0 |
70-79% | 0.004 | 1 |
80-89% | 0.036 | 9 |
90-100% | 0.108 | 27 |
Total: 250 |
5 iterations, E-value cutoff 0.005
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_0.005.out -j 5 -h 0.005
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.384 | 96 |
30-39% | 0.36 | 90 |
40-49% | 0.096 | 24 |
50-59% | 0.012 | 3 |
60-69% | 0.0 | 0 |
70-79% | 0.004 | 1 |
80-89% | 0.036 | 9 |
90-100% | 0.108 | 27 |
Total: 250 |
5 iterations, E-value cutoff 10E-6
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_10E-6.out -j 5 -h 10E-6
identity | fraction of hits | number of hits |
0-9% | 0.0 | 0 |
10-19% | 0.0 | 0 |
20-29% | 0.376 | 94 |
30-39% | 0.364 | 91 |
40-49% | 0.1 | 25 |
50-59% | 0.012 | 3 |
60-69% | 0.0 | 0 |
70-79% | 0.004 | 1 |
80-89% | 0.036 | 9 |
90-100% | 0.108 | 27 |
Total: 250 |
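Because every PSI-BLAST run reports 250 hits, the four tables can be compared bin by bin. The following small sketch uses the counts copied from the tables above. `bin_shift` is a hypothetical helper. It shows that the two 3-iteration runs give identical distributions. Moving from 3 to 5 iterations at a cut-off of 0.005 shifts 11 hits into the 20-29% bin, mostly from the 30-39% bin.

```python
# Hits per identity bin (0-9%, 10-19%, ..., 80-89%, 90-100%),
# copied from the PSI-BLAST tables above; keys are (iterations, E-value cut-off).
RUNS = {
    ("3", "0.005"): [0, 0, 85, 98, 26, 4, 0, 1, 9, 27],
    ("3", "10E-6"): [0, 0, 85, 98, 26, 4, 0, 1, 9, 27],
    ("5", "0.005"): [0, 0, 96, 90, 24, 3, 0, 1, 9, 27],
    ("5", "10E-6"): [0, 0, 94, 91, 25, 3, 0, 1, 9, 27],
}
LABELS = [f"{10 * i}-{10 * i + 9}%" for i in range(9)] + ["90-100%"]


def bin_shift(a, b):
    """Per-bin change in hit counts when going from run a to run b."""
    return [y - x for x, y in zip(RUNS[a], RUNS[b])]


if __name__ == "__main__":
    shift = bin_shift(("3", "0.005"), ("5", "0.005"))
    for label, delta in zip(LABELS, shift):
        if delta:
            print(f"{label}: {delta:+d}")
```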
HHSearch
For HHSearch, we used the online HHpred server with the pdb70 database of May 14th.
Discussion
Multiple sequence alignments
Sequences used for multiple sequence alignments
For the multiple sequence alignments, we used our reference sequence and twenty sequences found in the sequence searches. We tried to avoid hypothetical sequences and to choose sequences with similar identities in all sequence searches. The following tables list the chosen sequences with their identities (%) in the different searches. We found only one PDB structure (which was also in the HSSP database).
Reference sequence: P04062 (GLCM_HUMAN, Glucosylceramidase)
99 - 90% sequence identity
NP_001127488.1 | glucosylceramidase precursor | Pongo abelii | 95.0, 98.0, 98.0, 98.0, 98.0, 98.1 |
3KE0 | Chain A, Crystal Structure of N370S Glucocerebrosidase at Acidic pH | | 97.0, 99.0, 99.0, 99.0, 99.0, 99.8 |
EAW53100.1 | glucosidase, beta; acid (includes glucosylceramidase), isoform CRA_a | Homo sapiens | 97.0, 99.0, 99.0, 99.0, 99.0, 99.6 |
NP_001165283.1 | glucosylceramidase isoform 3 precursor | Homo sapiens | 88.0, 90.0, 90.0, 90.0, 90.0, 90.9 |
NP_001128784.1 | DKFZP469B0323 protein | Pongo abelii | 95.0, 97.0, 97.0, 97.0, 97.0, 97.4 |
89 - 60% sequence identity
NP_032120.1 | glucosylceramidase isoform 1 | Mus musculus | 84.0, 86.0, 86.0, 86.0, 86.0, 86.4 |
EDL15229.1 | glucosidase, beta, acid, isoform CRA_a | Mus musculus | 84.0, 86.0, 86.0, 86.0, 86.0, 86.3 |
NP_001121111.1 | glucosidase, beta, acid | Rattus norvegicus | 85.0, 87.0, 87.0, 87.0, 87.0, 87.6 |
NP_001039886.1 | glucosylceramidase precursor | Bos taurus | 86.0, 89.0, 89.0, 89.0, 89.0, 89.2 |
NP_001005730.1 | glucosylceramidase precursor | Sus scrofa | 87.0, 89.0, 89.0, 89.0, 89.0, 89.6 |
59 - 40% sequence identity
EFN73638.1 | Glucosylceramidase | Camponotus floridanus | 41.0, 40.0, 40.0, 41.0, 40.0, 42.2 |
CAG11843.1 | unnamed protein product | Tetraodon nigroviridis | 52.0, 53.0, 53.0, 53.0, 53.0, 54.2 |
NP_500785.1 | hypothetical protein Y4C6B.6 | Caenorhabditis elegans | 41.0, 40.0, 39.0, 40.0, 39.0, 41.9 |
EFA07058.1 | hypothetical protein TcasGA2_TC010035 | Tribolium castaneum | 41.0, 42.0, 41.0, 42.0, 41.0, 43.2 |
EFO26573.1 | O-glycosyl hydrolase family 30 protein | Loa loa | 40.0, 40.0, 40.0, 40.0, 40.0, 41.7 |
39 - 20% sequence identity
ZP_07040024.1 | glucosylceramidase | Bacteroides sp. 3_1_23 | 26.0, 24.0, 24.0, 24.0, 24.0, 25.5 |
YP_244236.1 | glycosyl hydrolase | Xanthomonas campestris pv. campestris str. 8004 | 33.0, 31.0, 30.0, 31.0, 31.0, 33.4 |
ZP_01885435.1 | glycosyl hydrolase | Pedobacter sp. BAL39 | 36.0, 33.0, 32.0, 33.0, 33.0, 37.2 |
ZP_07388379.1 | Glucan endo-1,6-beta-glucosidase | Paenibacillus curdlanolyticus YK9 | 28.0, 24.0, 23.0, 24.0, 24.0, 30.1 |
NP_623885.1 | O-glycosyl hydrolase family protein | Thermoanaerobacter tengcongensis MB4 | 37.0, 34.0, 33.0, 34.0, 32.0, 37.5 |
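One way to make "similar identities in all sequence searches" concrete is the spread between the highest and lowest identity reported for a sequence. The page does not quantify this criterion. The following sketch is therefore only an illustration. It uses a few rows copied from the tables above, and the 5-point threshold and the helpers `identity_spread` and `consistent` are hypothetical.

```python
# Identities (%) from the six searches, copied from a few rows of the tables above.
SELECTED = {
    "NP_001165283.1": [88.0, 90.0, 90.0, 90.0, 90.0, 90.9],
    "EFN73638.1": [41.0, 40.0, 40.0, 41.0, 40.0, 42.2],
    "ZP_07388379.1": [28.0, 24.0, 23.0, 24.0, 24.0, 30.1],
    "NP_623885.1": [37.0, 34.0, 33.0, 34.0, 32.0, 37.5],
}


def identity_spread(values):
    """Difference between the highest and lowest identity reported for one sequence."""
    return round(max(values) - min(values), 1)


def consistent(candidates, max_spread):
    """Accessions whose identities differ by at most max_spread points across searches."""
    return sorted(acc for acc, vals in candidates.items()
                  if identity_spread(vals) <= max_spread)


if __name__ == "__main__":
    for acc, vals in SELECTED.items():
        print(acc, identity_spread(vals))
    # Hypothetical threshold of 5 percentage points.
    print(consistent(SELECTED, 5.0))
```

In these rows, the spread tends to be larger for the distant bacterial homologs than for the close mammalian ones.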
Cobalt
COBALT was not installed initially, so we downloaded the standalone NCBI release (ncbi-cobalt-2.0.1).
time /home/student/Desktop/ncbi-cobalt-2.0.1/cobalt -i multiple_alignment.fasta -norps T > cobalt_multiple_alignment.aln
time | |
real | 0m3.488s |
user | 0m2.320s |
sys | 0m0.180s |
ClustalW
time clustalw
time | |
real | 0m40.625s |
user | 0m5.320s |
sys | 0m0.070s |
Muscle
time muscle -in multiple_alignment.fasta -out muscle_multiple_alignment.aln
time | |
real | 0m3.018s |
user | 0m1.710s |
sys | 0m0.100s |
T-Coffee
time t_coffee multiple_alignment.fasta
time | |
real | 0m41.360s |
user | 0m34.000s |
sys | 0m0.920s |
3D-Coffee/Expresso
time t_coffee -seq multiple_alignment.fasta -mode expresso -pdb_type dn
time | |
real | 12m19.825s |
user | 5m17.140s |
sys | 0m46.970s |
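The `real` values above are wall-clock times as printed by the shell's `time` keyword. The following minimal sketch converts them to seconds and ranks the tools, using the values copied from the tables. `to_seconds` is a hypothetical helper. Note that `time clustalw` started ClustalW without arguments, which opens its interactive menu. Its wall-clock time may therefore include time spent at the prompts, since its user time is only about 5 s.

```python
import re

# "real" (wall-clock) times copied from the timing tables above.
REAL_TIMES = {
    "Cobalt": "0m3.488s",
    "ClustalW": "0m40.625s",
    "Muscle": "0m3.018s",
    "T-Coffee": "0m41.360s",
    "3D-Coffee/Expresso": "12m19.825s",
}

TIME_RE = re.compile(r"(\d+)m([\d.]+)s")


def to_seconds(value):
    """Convert a `time` field such as '12m19.825s' to seconds."""
    match = TIME_RE.fullmatch(value)
    if not match:
        raise ValueError(f"unexpected time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def ranked(times):
    """Tool names sorted from fastest to slowest wall-clock time."""
    return sorted(times, key=lambda tool: to_seconds(times[tool]))


if __name__ == "__main__":
    for tool in ranked(REAL_TIMES):
        print(f"{tool}: {to_seconds(REAL_TIMES[tool]):.3f} s")
```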
Discussion
References
<references />