Difference between revisions of "Glucocerebrosidase sequence alignments"

From Bioinformatikpedia

Revision as of 16:50, 21 May 2011

Sequence searches

Several tools were used to search the non-redundant sequence database for sequences related to glucocerebrosidase.

  • FASTA
As Fasta was not initially installed, it was downloaded from the EBI FTP Download Site <ref>ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/fasta36/</ref>.
Command:
../bin/fasta36 gbaseq.fasta /data/blast/nr/nr > fasta_gba_search.out
  • BLAST
Command:
blastall -p blastp -d /data/blast/nr/nr -i gbaseq.fasta -o blast.out
  • PSI-BLAST
This tool was run four times, once for each combination of 3 or 5 iterations (x) and an E-value cut-off (y) of 0.005 or 10E-6.
Command:
blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_x_y.out -j x -h y
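
The four PSI-BLAST runs described above can be written as a small loop over x and y. The following is only a sketch: the function name psi_blast_commands is hypothetical, and the loop prints the commands (a dry run) instead of executing them, so it can be checked without a BLAST installation.

```shell
#!/bin/sh
# Dry run: print one blastpgp command per combination of
# iterations (x) and E-value cut-off (y), using the same paths
# and output naming scheme as the command template above.
psi_blast_commands() {
    for x in 3 5; do
        for y in 0.005 10E-6; do
            echo "blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_${x}_${y}.out -j ${x} -h ${y}"
        done
    done
}

psi_blast_commands
```

Piping the output into sh would run the four searches one after another.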


Furthermore, the online version of HHSearch <ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> was used to search the pdb70 database of May 14th.

Results

The sequence search with FASTA returned 520 sequences. BLAST and each of the PSI-BLAST runs returned 500 sequences. The HHSearch search against the pdb70 database returned only 100 sequences.

Overlap

The overlaps between the results of the different tools applied to the non-redundant sequence database are visualized using Venn-Diagrams<ref>http://bioinformatics.psb.ugent.be/webtools/Venn/</ref>. The results of HHSearch are not included as a different database (pdb70) was used.
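
The numbers behind such a Venn diagram can also be computed locally. The sketch below assumes that the hit identifiers of each search have already been extracted into plain text files with one identifier per line (that extraction step is not shown here); the function name overlap_counts and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# overlap_counts FILE_A FILE_B
# Prints how many identifiers occur only in the first file,
# only in the second file, and in both files.
overlap_counts() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # comm expects both inputs sorted in the same collation order
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    echo "only_first $(LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | grep -c '')"
    echo "only_second $(LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | grep -c '')"
    echo "shared $(LC_ALL=C comm -12 "$a_sorted" "$b_sorted" | grep -c '')"
    rm -f "$a_sorted" "$b_sorted"
}

# Usage (hypothetical file names):
#   overlap_counts fasta_ids.txt blast_ids.txt
```

For more than two tools the function would be called once per pair of result lists.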





Fasta

Fasta was not yet installed, so we downloaded it from the EBI FTP Download Site <ref>ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/fasta36/</ref>.
../bin/fasta36 gbaseq.fasta /data/blast/nr/nr > fasta_gba_search.out

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.462              240
30-39%           0.308              160
40-49%           0.085              44
50-59%           0.012              6
60-69%           0.006              3
70-79%           0.012              6
80-89%           0.052              27
90-100%          0.065              34
Total: 520
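
A table like the one above can be produced by sorting the identity values of all hits into 10% bins. The following is a minimal sketch, assuming the identities have already been extracted from the search output as one number per line (the extraction is not shown); the function name bin_identities is hypothetical.

```shell
#!/bin/sh
# bin_identities: read one percent-identity value per line from stdin
# and print, for each 10% bin, the fraction and the number of hits.
bin_identities() {
    awk '
        {
            b = int($1 / 10)
            if (b > 9) b = 9      # 100% belongs to the 90-100% bin
            count[b]++
            total++
        }
        END {
            for (b = 0; b <= 9; b++) {
                hi = (b == 9) ? 100 : b * 10 + 9
                printf "%d-%d%% %.3f %d\n", b * 10, hi, (total ? count[b] / total : 0), count[b]
            }
            printf "Total: %d\n", total
        }'
}

# Usage (hypothetical file name):
#   bin_identities < fasta_identities.txt
```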

Blast

blastall -p blastp -d /data/blast/nr/nr -i gbaseq.fasta -o blast.out

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.172              43
30-39%           0.508              127
40-49%           0.112              28
50-59%           0.024              6
60-69%           0.012              3
70-79%           0.004              1
80-89%           0.06               15
90-100%          0.108              27
Total: 250

PSI-Blast

3 iterations, E-value cutoff 0.005

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_0.005.out -j 3 -h 0.005

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.34               85
30-39%           0.392              98
40-49%           0.104              26
50-59%           0.016              4
60-69%           0.0                0
70-79%           0.004              1
80-89%           0.036              9
90-100%          0.108              27
Total: 250

3 iterations, E-value cutoff 10E-6

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_3_10E-6.out -j 3 -h 10E-6

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.34               85
30-39%           0.392              98
40-49%           0.104              26
50-59%           0.016              4
60-69%           0.0                0
70-79%           0.004              1
80-89%           0.036              9
90-100%          0.108              27
Total: 250

5 iterations, E-value cutoff 0.005

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_0.005.out -j 5 -h 0.005

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.384              96
30-39%           0.36               90
40-49%           0.096              24
50-59%           0.012              3
60-69%           0.0                0
70-79%           0.004              1
80-89%           0.036              9
90-100%          0.108              27
Total: 250

5 iterations, E-value cutoff 10E-6

blastpgp -d /data/blast/nr/nr -i gbaseq.fasta -o psi_blast_5_10E-6.out -j 5 -h 10E-6

identity range   fraction of hits   number of hits
0-9%             0.0                0
10-19%           0.0                0
20-29%           0.376              94
30-39%           0.364              91
40-49%           0.1                25
50-59%           0.012              3
60-69%           0.0                0
70-79%           0.004              1
80-89%           0.036              9
90-100%          0.108              27
Total: 250

HHSearch

For HHSearch we used the online version <ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> with the pdb70 database of May 14th.

Discussion

Multiple sequence alignments

Sequences used for multiple sequence alignments

For the multiple sequence alignments we used our reference sequence and twenty sequences found in the sequence searches. We tried to avoid hypothetical sequences and preferred sequences whose identities are similar across all sequence searches. The following tables show the chosen sequences with their identities in the different searches. We found only one PDB structure (which was also in the HSSP database).

our reference sequence: P04062, GLCM_HUMAN Glucosylceramidase

99 - 90% sequence identity

NP_001127488.1 glucosylceramidase precursor Pongo abelii 95.0, 98.0, 98.0, 98.0, 98.0, 98.1
3KE0 A Chain A, Crystal Structure Of N370s Glucocerebrosidase At Acidic Ph. 97.0, 99.0, 99.0, 99.0, 99.0, 99.8
EAW53100.1 glucosidase, beta; acid (includes glucosylceramidase), isoform CRA_a Homo sapiens 97.0, 99.0, 99.0, 99.0, 99.0, 99.6
NP_001165283.1 glucosylceramidase isoform 3 precursor Homo sapiens 88.0, 90.0, 90.0, 90.0, 90.0, 90.9
NP_001128784.1 DKFZP469B0323 protein Pongo abelii 95.0, 97.0, 97.0, 97.0, 97.0, 97.4

89 - 60% sequence identity

NP_032120.1 glucosylceramidase isoform 1 Mus musculus 84.0, 86.0, 86.0, 86.0, 86.0, 86.4
EDL15229.1 glucosidase, beta, acid, isoform CRA_a Mus musculus 84.0, 86.0, 86.0, 86.0, 86.0, 86.3
NP_001121111.1 glucosidase, beta, acid Rattus norvegicus 85.0, 87.0, 87.0, 87.0, 87.0, 87.6
NP_001039886.1 glucosylceramidase precursor Bos taurus 86.0, 89.0, 89.0, 89.0, 89.0, 89.2
NP_001005730.1 glucosylceramidase precursor Sus scrofa 87.0, 89.0, 89.0, 89.0, 89.0, 89.6

59 - 40% sequence identity

EFN73638.1 Glucosylceramidase Camponotus floridanus 41.0, 40.0, 40.0, 41.0, 40.0, 42.2
CAG11843.1 unnamed protein product Tetraodon nigroviridis 52.0, 53.0, 53.0, 53.0, 53.0, 54.2
NP_500785.1 hypothetical protein Y4C6B.6 Caenorhabditis elegans 41.0, 40.0, 39.0, 40.0, 39.0, 41.9
EFA07058.1 hypothetical protein TcasGA2_TC010035 Tribolium castaneum 41.0, 42.0, 41.0, 42.0, 41.0, 43.2
EFO26573.1 O-glycosyl hydrolase family 30 protein Loa loa 40.0, 40.0, 40.0, 40.0, 40.0, 41.7

39 - 20% sequence identity

ZP_07040024.1 glucosylceramidase Bacteroides sp. 3_1_23 26.0, 24.0, 24.0, 24.0, 24.0, 25.5
YP_244236.1 glycosyl hydrolase Xanthomonas campestris pv. campestris str. 8004 33.0, 31.0, 30.0, 31.0, 31.0, 33.4
ZP_01885435.1 glycosyl hydrolase Pedobacter sp. BAL39 36.0, 33.0, 32.0, 33.0, 33.0, 37.2
ZP_07388379.1 Glucan endo-1,6-beta-glucosidase Paenibacillus curdlanolyticus YK9 28.0, 24.0, 23.0, 24.0, 24.0, 30.1
NP_623885.1 O-glycosyl hydrolase family protein Thermoanaerobacter tengcongensis MB4 37.0, 34.0, 33.0, 34.0, 32.0, 37.5
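
Whether a candidate has similar identities in all searches can be checked by comparing its lowest and highest value. The sketch below assumes a simplified, hypothetical input format (accession, a tab, then the comma-separated identities as listed in the tables above) and a hypothetical function name identity_spread. It does not need to know which value belongs to which search.

```shell
#!/bin/sh
# identity_spread: for each input line "ACCESSION<TAB>id1, id2, ..."
# print the accession, the minimum, the maximum and their difference.
identity_spread() {
    awk -F'\t' '{
        n = split($2, v, /, */)
        min = max = v[1] + 0
        for (i = 2; i <= n; i++) {
            x = v[i] + 0
            if (x < min) min = x
            if (x > max) max = x
        }
        printf "%s %.1f %.1f %.1f\n", $1, min, max, max - min
    }'
}

# Usage (hypothetical file name):
#   identity_spread < chosen_sequences.tsv
```

A small difference in the last column indicates that the searches agree on the identity of that sequence.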

Cobalt

Cobalt was not yet installed, so we downloaded it here: [[3]]

time /home/student/Desktop/ncbi-cobalt-2.0.1/cobalt -i multiple_alignment.fasta -norps T > cobalt_multiple_alignment.aln

time
real 0m3.488s
user 0m2.320s
sys 0m0.180s
Multiple Alignment by Cobalt in Jalview

ClustalW

time clustalw

time
real 0m40.625s
user 0m5.320s
sys 0m0.070s
Multiple Alignment by ClustalW in Jalview

Muscle

time muscle -in multiple_alignment.fasta -out muscle_multiple_alignment.aln

time
real 0m3.018s
user 0m1.710s
sys 0m0.100s
Multiple Alignment by Muscle with Jalview

T-Coffee

time t_coffee multiple_alignment.fasta

time
real 0m41.360s
user 0m34.000s
sys 0m0.920s
Multiple Alignment by T-Coffee with Jalview

3D-Coffee/Expresso

time t_coffee -seq multiple_alignment.fasta -mode expresso -pdb_type dn

time
real 12m19.825s
user 5m17.140s
sys 0m46.970s
Multiple Alignment by 3D-Coffee/Expresso in Jalview
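
To compare the run times of the alignment tools, the "real" values reported by time can be converted to seconds and sorted. This is only a sketch: the helper name to_seconds and the short tool labels are hypothetical, while the times are the "real" values listed in the sections above.

```shell
#!/bin/sh
# to_seconds: convert a value such as 12m19.825s into seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# "real" times of the five alignment runs, fastest first
for entry in cobalt:0m3.488s clustalw:0m40.625s muscle:0m3.018s \
             t_coffee:0m41.360s expresso:12m19.825s; do
    printf '%s %s\n' "${entry%%:*}" "$(to_seconds "${entry#*:}")"
done | LC_ALL=C sort -k2,2n
```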

Discussion

References

<references />