Difference between revisions of "Sequence Alignment GLA"
Revision as of 01:46, 24 May 2011
Sequence Searches
The GLA sequence was searched against the PDB non-redundant (nr) database using three different tools, which are listed below. An additional search was performed with HHsearch on the pdb70 database (max. 70% sequence identity).
Blast
We used NCBI Blast version 2.2.18 with the command:
blast -i sequence.fasta -d database -p blastp
Fasta
As no FASTA program was installed, we downloaded fasta36 and installed it on the virtual machine. The command to run the program is:
fasta36 sequence.fasta database
PSI-Blast
We used PSI-Blast version 2.2.18 with the command:
blastpgp -i sequence.fasta -d database -j iterations -h e-value
Parameters
We used the following parameter combinations to run the program:
- 3 iterations and e-value threshold of 0.005
- 3 iterations and e-value threshold of 0.002
- 3 iterations and e-value threshold of 10e-6
- 5 iterations and e-value threshold of 0.005
- 5 iterations and e-value threshold of 0.002
- 5 iterations and e-value threshold of 10e-6
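The six combinations above can be generated with two nested loops. The following is only a sketch: it prints the blastpgp calls instead of running them, and the output file names passed to -o are hypothetical and not taken from our original runs.

```shell
# Print one blastpgp call per parameter combination
# (3 or 5 iterations, three e-value thresholds).
# The -o output file names are hypothetical.
psiblast_commands() (
  for iterations in 3 5; do
    for evalue in 0.005 0.002 10e-6; do
      echo "blastpgp -i sequence.fasta -d database -j $iterations -h $evalue -o psiblast_j${iterations}_h${evalue}.out"
    done
  done
)

# Dry run: only shows the commands.
psiblast_commands
```

To actually execute the searches, the printed lines could be piped into sh, for example psiblast_commands | sh.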
HHsearch
We used the online tool from the Gene Center of the LMU Munich with the default parameters:
- Database = PDB70
- Max. number of PSI-BLAST iterations = 3
- Alignment mode = local
The results are still available: Results.
HSSP
We used an HSSP online resource to find proteins related to alpha-galactosidase and chose the results that are found in Homo sapiens. The resulting files contained the IDs found by HSSP, which were used to classify the hits of the searches (FASTA/BLAST/PSI-BLAST) as true positives (TP).
Program | Sensitivity |
---|---|
Blast | 20.9% |
Fasta | 31.6% |
PSI-Blast: 3 Iterations: E-value cutoff 10e-6 | 24.0% |
PSI-Blast: 3 Iterations: E-value cutoff 0.002 | 24.1% |
PSI-Blast: 3 Iterations: E-value cutoff 0.005 | 24.2% |
PSI-Blast: 5 Iterations: E-value cutoff 10e-6 | 24.0% |
PSI-Blast: 5 Iterations: E-value cutoff 0.002 | 24.5% |
PSI-Blast: 5 Iterations: E-value cutoff 0.005 | 24.5% |
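A sensitivity value like the ones in the table can be computed from two ID lists. The sketch below assumes that sensitivity means the fraction of HSSP IDs recovered by a search, TP / (TP + FN), and that both files contain one ID per line using the same identifier type. The function name and the file layout are assumptions for illustration, not our original script.

```shell
# sensitivity REFERENCE_IDS HIT_IDS
# REFERENCE_IDS: IDs found by HSSP, one per line (assumed format).
# HIT_IDS: IDs returned by one search, one per line (assumed format).
# Prints the percentage of reference IDs that occur among the hits.
sensitivity() {
  awk '
    FILENAME == ARGV[1] { if ($1 != "") ref[$1] = 1; next }  # collect reference IDs
    ($1 in ref) && !seen[$1]++ { tp++ }                      # count each true positive once
    END {
      total = 0
      for (id in ref) total++
      if (total == 0) exit 1                                 # no reference IDs given
      printf "%.1f%%\n", 100 * tp / total
    }
  ' "$1" "$2"
}
```

A call could look like sensitivity hssp_ids.txt blast_hits.txt, where both file names are hypothetical.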
Overlap
E-value Distribution
Figure 4 shows the e-value distributions of the programs.
Since all BLAST-based programs reported a very large number of e-values equal to 0, it was impossible to plot the logarithm of the distribution directly, because the logarithm of 0 is undefined. For these hits the logarithm was set to -500 so that a plot could be produced at all. The logarithmic scale was necessary because some outliers were spread so widely that no distribution was visible otherwise (Plot).
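The handling of the zero e-values can be sketched as a small filter. It assumes a base-10 logarithm (the base we used is not stated here) and an input with one e-value per line in a form awk can parse, such as 1e-5; the function name is hypothetical.

```shell
# Read one e-value per line and print its log10.
# E-values of 0 have no logarithm and are replaced by -500.
log_evalues() {
  awk '
    $1 + 0 > 0 { printf "%.2f\n", log($1) / log(10); next }  # regular e-value
    NF         { printf "%.2f\n", -500 }                     # e-value 0: placeholder
  '
}
```

For example, log_evalues < evalues.txt would write the transformed values for a hypothetical file evalues.txt.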
Identity Distribution
Figure 5 shows the identity distributions of the programs. As not all programs returned the same number of sequences, the values are normalized to 100. Despite some differences, the identity distributions are broadly similar, except for HHsearch. The other programs all show peaks between 30% and 45% identity.
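The normalization to 100 amounts to converting the hit counts per identity bin into percentages of all hits of a program. A minimal sketch, assuming one percent identity value per line and a bin width of 5 (both the input layout and the bin width are assumptions, not the settings of our plots):

```shell
# identity_histogram [BIN_WIDTH]
# Read one percent identity per line and print "bin_start<TAB>percent_of_hits",
# so that the percentages of one program sum to 100.
identity_histogram() {
  awk -v width="${1:-5}" '
    NF { bin = int($1 / width) * width; count[bin]++; n++ }
    END {
      if (n == 0) exit 1                      # no input values
      for (b = 0; b <= 100; b += width)
        if (b in count) printf "%d\t%.1f\n", b, 100 * count[b] / n
    }
  '
}
```

Because every program is scaled by its own number of hits, the resulting curves can be compared directly even when the hit counts differ.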
Runtime Analysis
The runtime of each program was measured by prefixing the command line with the time command.
Program | Runtime |
---|---|
Blast | 2:40 min |
Fasta | 5:16 min |
PSI-Blast: 3 Iterations: E-value cutoff 10e-6 | 7:50 min |
PSI-Blast: 3 Iterations: E-value cutoff 0.002 | 7:48 min |
PSI-Blast: 3 Iterations: E-value cutoff 0.005 | 7:55 min |
PSI-Blast: 5 Iterations: E-value cutoff 10e-6 | 13:27 min |
PSI-Blast: 5 Iterations: E-value cutoff 0.002 | 13:06 min |
PSI-Blast: 5 Iterations: E-value cutoff 0.005 | 12:49 min |
Multiple Sequence Alignments
Selection of Sequences
We selected twenty sequences from the FASTA/BLAST/PSI-BLAST search results which fulfilled the following criteria:
- about 400 amino acids long
- true positive according to the HSSP search
Unfortunately, we were not able to find a sequence of a PDB structure with an identity between 60% and 89%. The selected sequences are listed in the following table.
GenBank Identifier | Source | Description | Organism | Identity |
---|---|---|---|---|
99%-90% Sequence Identity | ||||
295789486 | PDB | alpha-galactosidase A, chain A | Homo sapiens | 99% |
62896813 | GenBank | alpha-galactosidase | Homo sapiens | 99% |
269914455 | PDB | alpha-galactosidase | Homo sapiens | 99% |
297710567 | RefSeq | alpha-galactosidase A-like | Pongo abelii | 96% |
296235998 | RefSeq | alpha-galactosidase A | Callithrix jacchus | 95% |
89%-60% Sequence Identity | ||||
301788124 | RefSeq | alpha-galactosidase A-like | Ailuropoda melanoleuca | 83% |
133778924 | RefSeq | alpha-galactosidase A | Mus musculus | 78% |
114051916 | RefSeq | alpha-N-acetylgalactosaminidase | Bombyx mori | 76% |
291190554 | RefSeq | alpha-galactosidase A | Salmo salar | 67% |
148228315 | RefSeq | alpha-galactosidase | Xenopus laevis | 65% |
59%-40% Sequence Identity | ||||
20151048 | PDB | alpha-N-acetylgalactosaminidase, chain A | Gallus gallus | 57% |
261824882 | PDB | alpha-N-acetylgalactosaminidase, chain A | Homo sapiens | 54% |
148229665 | RefSeq | alpha-N-acetylgalactosaminidase | Xenopus laevis | 47% |
92096920 | GenBank | NAGA protein | Bos taurus | 46% |
260593558 | RefSeq | alpha-galactosidase | Prevotella veroralis F0319 | 41% |
39%-20% Sequence Identity | ||||
51701639 | Swiss-Prot | alpha-galactosidase precursor | Lachancea cidri | 38% |
74626383 | Swiss-Prot | alpha-galactosidase B precursor | Aspergillus niger | 35% |
299856763 | PDB | alpha-galactosidase, chain A | Saccharomyces cerevisiae | 34% |
310699603 | GenBank | alpha-D-galactopyranosidase | Fusarium oxysporum | 33% |
226293587 | Swiss-Prot | alpha-galactosidase precursor | Torulaspora delbrueckii | 31% |
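The grouping used in the table can be written down as a simple lookup. This is only an illustrative sketch with a hypothetical function name; it expects an integer percent identity as reported by the search programs.

```shell
# identity_group PERCENT
# Print the identity range of the table that an integer percent identity falls into.
identity_group() {
  if   [ "$1" -ge 90 ] && [ "$1" -le 99 ]; then echo "99%-90%"
  elif [ "$1" -ge 60 ] && [ "$1" -lt 90 ]; then echo "89%-60%"
  elif [ "$1" -ge 40 ] && [ "$1" -lt 60 ]; then echo "59%-40%"
  elif [ "$1" -ge 20 ] && [ "$1" -lt 40 ]; then echo "39%-20%"
  else echo "outside the selected ranges"   # e.g. 100% or below 20%
  fi
}
```

For instance, identity_group 83 corresponds to the row of Ailuropoda melanoleuca in the 89%-60% group.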
Cobalt
We used NCBI Cobalt version 2.0.1 with the command:
cobalt -i sequences.fasta -norps T
ClustalW
We used ClustalW version 1.83 with the command:
clustalw -infile=sequences.fasta
Muscle
We used Muscle version 3.8.31 with the command:
muscle -in sequences.fasta -out muscle_msa.aln
T-Coffee
The basic command to start T-Coffee version 8.99 is:
t_coffee sequences.fasta
3D
To start the 3D mode, the additional parameters -mode expresso -pdb_type dn were appended to the command.