Glucocerebrosidase sequence alignments

From Bioinformatikpedia
Revision as of 10:03, 21 May 2011 by Brunners (Sequences used for multiple sequence alignments)

Sequence searches

Fasta

Fasta was not yet installed, so we downloaded it from [[1]] and ran the search against the nr database:
../bin/fasta36 gbaseq.fasta /data/blast/nr/nr > fasta_gba_search.out

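Each row counts the hits whose sequence identity is at least the given bound and below the bound of the next row; the fraction is the count divided by the total number of hits.
identity fraction count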
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.462 240
>= 30% 0.308 160
>= 40% 0.085 44
>= 50% 0.012 6
>= 60% 0.006 3
>= 70% 0.012 6
>= 80% 0.052 27
>= 90% 0.065 34
Total: 520
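
The binning behind such a table can be sketched as follows. This is only an illustration, not the script that produced the numbers above: the function name identity_histogram and the example identities are hypothetical, and we assume the percent identities have already been extracted from the search output.

```python
from collections import Counter


def identity_histogram(identities, bin_width=10):
    """Return (lower_bound, fraction, count) rows for percent identities."""
    n_bins = 100 // bin_width
    counts = Counter()
    for ident in identities:
        # Exactly 100% identity is folded into the top bin (>= 90%).
        index = min(int(ident // bin_width), n_bins - 1)
        counts[index] += 1
    total = len(identities)
    rows = []
    for index in range(n_bins):
        count = counts[index]
        fraction = round(count / total, 3) if total else 0.0
        rows.append((index * bin_width, fraction, count))
    return rows


# Hypothetical identities in percent, not taken from the real search output.
example = [25.5, 33.4, 37.2, 41.9, 86.4, 99.8, 100.0]
for lower, fraction, count in identity_histogram(example):
    print(f">= {lower}% {fraction} {count}")
```

Applied to the identities of all hits of one search, this yields one row per 10% bin in the same layout as the table above.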

Blast

blastall -p blastp -d /data/blast/nr/nr -i gbaseq.fasta -o blast.out

identity fraction count
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.172 43
>= 30% 0.508 127
>= 40% 0.112 28
>= 50% 0.024 6
>= 60% 0.012 3
>= 70% 0.004 1
>= 80% 0.06 15
>= 90% 0.108 27
Total: 250

PSI-Blast

3 iterations, E-value cutoff 0.005

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_0.005.out -j 3 -h 0.005

identity fraction count
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.34 85
>= 30% 0.392 98
>= 40% 0.104 26
>= 50% 0.016 4
>= 60% 0.0 0
>= 70% 0.004 1
>= 80% 0.036 9
>= 90% 0.108 27
Total: 250

3 iterations, E-value cutoff 10E-6

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_10E-6.out -j 3 -h 10E-6

identity fraction count
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.34 85
>= 30% 0.392 98
>= 40% 0.104 26
>= 50% 0.016 4
>= 60% 0.0 0
>= 70% 0.004 1
>= 80% 0.036 9
>= 90% 0.108 27
Total: 250

5 iterations, E-value cutoff 0.005

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_0.005.out -j 5 -h 0.005

identity fraction count
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.384 96
>= 30% 0.36 90
>= 40% 0.096 24
>= 50% 0.012 3
>= 60% 0.0 0
>= 70% 0.004 1
>= 80% 0.036 9
>= 90% 0.108 27
Total: 250

5 iterations, E-value cutoff 10E-6

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_10E-6.out -j 5 -h 10E-6

identity fraction count
>= 0% 0.0 0
>= 10% 0.0 0
>= 20% 0.376 94
>= 30% 0.364 91
>= 40% 0.1 25
>= 50% 0.012 3
>= 60% 0.0 0
>= 70% 0.004 1
>= 80% 0.036 9
>= 90% 0.108 27
Total: 250
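
The four PSI-Blast runs above differ only in the number of iterations (-j) and the E-value cutoff (-h). As a minimal sketch, the command lines could be generated from these two parameter lists; the helper name psiblast_commands is hypothetical, and the flags, paths and output file names are copied from the commands above. The sketch only builds the command strings and does not run them.

```python
from itertools import product


def psiblast_commands(query="gbaseq.fasta", database="/data/blast/nr/nr",
                      iterations=(3, 5), cutoffs=("0.005", "10E-6")):
    """Build one blastpgp command line per (iterations, cutoff) combination."""
    commands = []
    for n_iter, cutoff in product(iterations, cutoffs):
        # Output name follows the pattern used above: psi_blast_<j>_<h>.out
        output = f"psi_blast_{n_iter}_{cutoff}.out"
        commands.append(
            f"blastpgp -d {database} -i {query} -o {output} "
            f"-j {n_iter} -h {cutoff}"
        )
    return commands


for command in psiblast_commands():
    print(command)
```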

HHSearch

For HHSearch we used [[2]] with the pdb70 database of May 14th.

Discussion

Multiple sequence alignments

Sequences used for multiple sequence alignments

For the multiple sequence alignments we used our reference sequence and twenty sequences found with the sequence searches. We tried to avoid hypothetical sequences and preferred sequences that have similar identities in all sequence searches. The following tables list the chosen sequences with their percent identities in the different searches. We found only one PDB structure (which was also in the HSSP database).

our reference sequence: P04062, GLCM_HUMAN Glucosylceramidase

99 - 90% sequence identity

NP_001127488.1 glucosylceramidase precursor Pongo abelii 95.0, 98.0, 98.0, 98.0, 98.0, 98.1
3KE0 A Chain A, Crystal Structure Of N370s Glucocerebrosidase At Acidic Ph. 97.0, 99.0, 99.0, 99.0, 99.0, 99.8
EAW53100.1 glucosidase, beta; acid (includes glucosylceramidase), isoform CRA_a Homo sapiens 97.0, 99.0, 99.0, 99.0, 99.0, 99.6
NP_001165283.1 glucosylceramidase isoform 3 precursor Homo sapiens 88.0, 90.0, 90.0, 90.0, 90.0, 90.9
NP_001128784.1 DKFZP469B0323 protein Pongo abelii 95.0, 97.0, 97.0, 97.0, 97.0, 97.4

89 - 60% sequence identity

NP_032120.1 glucosylceramidase isoform 1 Mus musculus 84.0, 86.0, 86.0, 86.0, 86.0, 86.4
EDL15229.1 glucosidase, beta, acid, isoform CRA_a Mus musculus 84.0, 86.0, 86.0, 86.0, 86.0, 86.3
NP_001121111.1 glucosidase, beta, acid Rattus norvegicus 85.0, 87.0, 87.0, 87.0, 87.0, 87.6
NP_001039886.1 glucosylceramidase precursor Bos taurus 86.0, 89.0, 89.0, 89.0, 89.0, 89.2
NP_001005730.1 glucosylceramidase precursor Sus scrofa 87.0, 89.0, 89.0, 89.0, 89.0, 89.6

59 - 40% sequence identity

EFN73638.1 Glucosylceramidase Camponotus floridanus 41.0, 40.0, 40.0, 41.0, 40.0, 42.2
CAG11843.1 unnamed protein product Tetraodon nigroviridis 52.0, 53.0, 53.0, 53.0, 53.0, 54.2
NP_500785.1 hypothetical protein Y4C6B.6 Caenorhabditis elegans 41.0, 40.0, 39.0, 40.0, 39.0, 41.9
EFA07058.1 hypothetical protein TcasGA2_TC010035 Tribolium castaneum 41.0, 42.0, 41.0, 42.0, 41.0, 43.2
EFO26573.1 O-glycosyl hydrolase family 30 protein Loa loa 40.0, 40.0, 40.0, 40.0, 40.0, 41.7

39 - 20% sequence identity

ZP_07040024.1 glucosylceramidase Bacteroides sp. 3_1_23 26.0, 24.0, 24.0, 24.0, 24.0, 25.5
YP_244236.1 glycosyl hydrolase Xanthomonas campestris pv. campestris str. 8004 33.0, 31.0, 30.0, 31.0, 31.0, 33.4
ZP_01885435.1 glycosyl hydrolase Pedobacter sp. BAL39 36.0, 33.0, 32.0, 33.0, 33.0, 37.2
ZP_07388379.1 Glucan endo-1,6-beta-glucosidase Paenibacillus curdlanolyticus YK9 28.0, 24.0, 23.0, 24.0, 24.0, 30.1
NP_623885.1 O-glycosyl hydrolase family protein Thermoanaerobacter tengcongensis MB4 37.0, 34.0, 33.0, 34.0, 32.0, 37.5
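
The criterion of similar identities in all searches can be made concrete by looking at the spread (maximum minus minimum) of the identities of each sequence. The following is a sketch under assumptions, not the procedure we actually followed: the three rows are copied from the tables above, while the function names and the threshold of 5 percentage points are hypothetical.

```python
# Identities (in percent) copied from three rows of the tables above.
identities = {
    "NP_001127488.1": [95.0, 98.0, 98.0, 98.0, 98.0, 98.1],
    "NP_032120.1": [84.0, 86.0, 86.0, 86.0, 86.0, 86.4],
    "ZP_07388379.1": [28.0, 24.0, 23.0, 24.0, 24.0, 30.1],
}


def identity_spread(values):
    """Difference between highest and lowest identity, rounded to 0.1."""
    return round(max(values) - min(values), 1)


def consistent(table, max_spread):
    """Accessions whose identities differ by at most max_spread points."""
    return [acc for acc, values in table.items()
            if identity_spread(values) <= max_spread]


for accession, values in identities.items():
    print(accession, identity_spread(values))

# Hypothetical threshold of 5 percentage points.
print(consistent(identities, 5.0))
```

In these three rows the spread grows as the identity to the reference sequence drops, so a fixed threshold would be stricter for distant homologs than for close ones.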

Cobalt

Cobalt was not yet installed, so we downloaded it from [[3]].

time /home/student/Desktop/ncbi-cobalt-2.0.1/cobalt -i multiple_alignment.fasta -norps T > cobalt_multiple_alignment.aln

time
real 0m3.488s
user 0m2.320s
sys 0m0.180s

ClustalW

time clustalw

time
real 0m40.625s
user 0m5.320s
sys 0m0.070s

Muscle

time muscle -in multiple_alignment.fasta -out muscle_multiple_alignment.aln

time
real 0m3.018s
user 0m1.710s
sys 0m0.100s

T-Coffee

time t_coffee multiple_alignment.fasta

time
real 0m41.360s
user 0m34.000s
sys 0m0.920s

3D-Coffee/Expresso

time t_coffee -seq multiple_alignment.fasta -mode expresso -pdb_type dn

time
real 12m19.825s
user 5m17.140s
sys 0m46.970s
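
To compare the running times of the five programs, the values printed by time can be converted to seconds and sorted. This is a minimal sketch; the function name parse_time is hypothetical, and the real times are copied from the measurements above.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '12m19.825s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Real (wall-clock) times copied from the measurements above.
real_times = {
    "Cobalt": "0m3.488s",
    "ClustalW": "0m40.625s",
    "Muscle": "0m3.018s",
    "T-Coffee": "0m41.360s",
    "3D-Coffee/Expresso": "12m19.825s",
}

# Programs ordered from fastest to slowest.
ranking = sorted(real_times, key=lambda tool: parse_time(real_times[tool]))
for tool in ranking:
    print(f"{tool}: {parse_time(real_times[tool]):.3f} s")
```

Real time also includes any waiting that is not computation, so the user times may be a fairer basis for comparison where the two differ strongly.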

Discussion