Difference between revisions of "Metachromatic leukodystrophy reference aminoacids"

Revision as of 12:36, 23 May 2011

Sequence

>sp|P15289|ARSA_HUMAN Arylsulfatase A OS=Homo sapiens GN=ARSA PE=1 SV=3
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA

Source

UniProt (accession P15289)

Database Searches

FASTA, BLAST and PSI-BLAST were run against the non-redundant database (NR). HHsearch was run through the HHpred web interface<ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> against the PDB and InterPro databases. The following parameter settings were used:

BLAST: blastall -p blastp -i refSeq.fasta -d /data/blast/nr/nr > blastp, with refSeq.fasta being the file containing the reference sequence and blastp the output file.
PSI-BLAST: blastpgp -i refSeq.fasta -d /data/blast/nr/nr -e <e-value> -j <iterations> > psiblast_<e-value>_<iterations>
PSI-BLAST was run with the following parameter settings:
- e-value cutoff 0.005, 3 iterations (Psi-blast1)
- e-value cutoff 0.005, 5 iterations (Psi-blast2)
- e-value cutoff 10E-6, 3 iterations (Psi-blast3)
- e-value cutoff 10E-6, 5 iterations (Psi-blast4)
HHsearch: We used the online version of HHpred<ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> with default parameters. One search was performed against PDB and one against InterPro.
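The four PSI-BLAST runs listed above can be scripted, for example, as in the following minimal Python sketch. It only uses the blastpgp options that appear in the command line above (-i, -d, -e, -j); the function names are hypothetical, and actually executing the runs requires blastpgp and the NR database at the path given above.

```python
import subprocess
from itertools import product

# Parameter grid from the list above; labels follow the Psi-blast1..4 naming.
E_VALUES = ["0.005", "10E-6"]
ITERATIONS = [3, 5]
DB = "/data/blast/nr/nr"
QUERY = "refSeq.fasta"


def build_psiblast_runs():
    """Return (label, argv, output_file) for every PSI-BLAST parameter combination."""
    runs = []
    for i, (evalue, iters) in enumerate(product(E_VALUES, ITERATIONS), start=1):
        argv = ["blastpgp", "-i", QUERY, "-d", DB, "-e", evalue, "-j", str(iters)]
        outfile = f"psiblast_{evalue}_{iters}"
        runs.append((f"Psi-blast{i}", argv, outfile))
    return runs


def run_all(runs):
    """Execute each run, writing stdout to its output file (needs blastpgp on PATH)."""
    for _label, argv, outfile in runs:
        with open(outfile, "w") as fh:
            subprocess.run(argv, stdout=fh, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_all(...) to actually start the searches.
    for label, argv, outfile in build_psiblast_runs():
        print(label, " ".join(argv), ">", outfile)
```

Iterating with itertools.product over (E-value, iterations) reproduces the order of the list, so Psi-blast1 is 0.005/3 and Psi-blast4 is 10E-6/5.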

Alignment results

We wrote a Perl script that parses the output files of the individual programs and extracts, for each hit, the identifier, the alignment score and the percentage of identical residues within the alignment.
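Our parser was written in Perl and read the programs' default text output. Purely as an illustration, the following Python sketch extracts the same values from BLAST's 12-column tabular output (blastall option -m 8), which we did not actually use here. In that format column 2 is the subject ID, column 3 the percent identity, column 11 the E-value and column 12 the bit score (not the raw score). The Hit type and parse_tabular name are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    bit_score: float
    percent_identity: float
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Dict[str, Hit]:
    """Parse 12-column BLAST tabular lines, keeping the best-scoring HSP per subject."""
    hits: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hit = Hit(
            subject_id=cols[1],
            percent_identity=float(cols[2]),
            evalue=float(cols[10]),
            bit_score=float(cols[11]),
        )
        previous = hits.get(hit.subject_id)
        if previous is None or hit.bit_score > previous.bit_score:
            hits[hit.subject_id] = hit
    return hits
```

Keeping only the best HSP per subject means that a target with several local alignments is counted once, which matches counting aligned target sequences rather than individual HSPs.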

Mapping of identifiers

The non-redundant database contains entries from various databases, including RefSeq, PDB, PIR, PRF, GenBank and Swiss-Prot. To compare the results of the NR searches with those of the HHpred searches, the IDs have to be mapped onto each other. Furthermore, the HSSP entries, which are used later to benchmark the alignment results, contain only references to UniProtKB accession numbers (ACCNUM). To overcome this problem, we downloaded an ID mapping table from PIR<ref>http://pir.georgetown.edu/pirwww/search/idmapping.shtml</ref>. Together with some short Perl scripts, this table was used to map IDs between the databases and to compare the results.
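A minimal Python sketch of this mapping step, assuming a simplified two-column, tab-separated table (source ID, UniProtKB accession); the layout of the actual PIR table may differ, and all function names are hypothetical. It also shows how the GI number can be taken from an NR identifier of the form gi|12345|ref|...|.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def gi_from_subject(subject_id: str) -> str:
    """Return the GI number of an NR-style ID like 'gi|12345|ref|NP_1.1|', else the ID unchanged."""
    parts = subject_id.split("|")
    return parts[1] if len(parts) > 1 and parts[0] == "gi" else subject_id


def load_mapping(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Read 'source_id<TAB>accession' rows (assumed layout) into a dict of accession sets."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip malformed rows and comments
        mapping[row[0]].add(row[1])
    return dict(mapping)


def map_ids(ids: Iterable[str], mapping: Dict[str, Set[str]]) -> Tuple[Set[str], List[str]]:
    """Translate IDs to UniProtKB accessions and report the IDs that could not be mapped."""
    mapped: Set[str] = set()
    unmapped: List[str] = []
    for identifier in ids:
        if identifier in mapping:
            mapped |= mapping[identifier]
        else:
            unmapped.append(identifier)
    return mapped, unmapped
```

Returning the unmapped IDs separately makes it easy to check how many hits are lost in the translation before comparing methods.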

Summary of database searches

In this section, we briefly summarize the search results of the individual programs and then compare them to each other.

Comparison of the methods

FASTA yielded the highest number of hits, with 4733 alignments.
BLAST produced 252 alignments.
PSI-BLAST
- Using an E-value cutoff of 0.005, PSI-BLAST produced 756 alignments for 3 iterations and 1257 for 5 iterations.
- Using an E-value cutoff of 10E-6, PSI-BLAST produced 756 alignments for 3 iterations and 1257 for 5 iterations.
HHsearch produced 33 alignments for the search against PDB and 74 alignments for the search against InterPro.

FASTA shows the highest number of alignments, probably because no E-value cutoff was applied. In contrast, HHsearch produced very few alignments. This can be attributed to the different databases that were searched: PDB and InterPro simply contain fewer sequences homologous to the reference than NR does. This is also supported by the benchmark with HSSP (see next section). Another interesting observation is that, with our parameter settings, the PSI-BLAST results depended only on the number of iterations: for a given number of iterations, both E-value cutoffs yielded, apart from a few exceptions, the same aligned target sequences.

Figure: Density of the alignment scores and of the percentage of identical residues within the alignments.

Figure: The upper panel shows the number of target sequences shared between two methods (self-overlaps not shown). The lower panel shows the percentage of the aligned target sequences of a given method (x-axis) that are shared with the other methods.
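The overlaps shown in this figure can be computed with plain set operations once every method's hits are mapped to common IDs. The following Python sketch is an illustration only: it assumes one set of target IDs per method and computes the percentage separately for each other method (rather than against the union of all other methods); the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Set, Tuple


def pairwise_shared(targets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of target sequences shared by each ordered pair of different methods."""
    return {(a, b): len(targets[a] & targets[b]) for a, b in permutations(targets, 2)}


def percent_shared(targets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Percentage of method a's targets that also appear among method b's targets."""
    result: Dict[Tuple[str, str], float] = {}
    for a, b in permutations(targets, 2):
        if targets[a]:
            result[(a, b)] = 100.0 * len(targets[a] & targets[b]) / len(targets[a])
        else:
            result[(a, b)] = 0.0  # a method without hits shares nothing
    return result
```

The percentage is asymmetric: a small hit set (such as BLAST's) can be almost entirely contained in a large one (such as FASTA's) while covering only a small fraction of it.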

HSSP

Method	Recall (GI)	Recall (pdb)	Precision (GI)
FASTA	0.92	0.67	0.23
BLAST	0.11	0.42	0.54
Psi-blast1	0.21	0.42	0.65
Psi-blast2	0.23	0.5	0.62
Psi-blast3	0.21	0.42	0.65
Psi-blast4	0.23	0.5	0.62
hhpred (pdb)	0.01	1	0.11
hhpred (interpro)	0.01	0.92	0.12
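The procedure behind this table is not spelled out here. Assuming that the homologs listed in HSSP for the reference sequence serve as the set of true positives, recall and precision for one method could be computed as in the following sketch (the function name is hypothetical).

```python
from typing import Set, Tuple


def recall_precision(found: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Recall = |found & reference| / |reference|; precision = |found & reference| / |found|."""
    true_positives = len(found & reference)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(found) if found else 0.0
    return recall, precision
```

Under this definition a method with many hits, such as FASTA, tends towards high recall and low precision, whereas a method with few hits shows the opposite pattern.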

Multiple Alignments

To build the multiple alignments, the results of the PSI-BLAST run with an E-value cutoff of 10E-6 and 5 iterations (Psi-blast4) were divided into six groups by sequence identity:

<20%
20% - 39%
40% - 59%
60% - 89%
90% - 99%
>99%

Sequences with <20% or >99% sequence identity were ignored, and 5 sequences were picked at random from each of the remaining four ranges, so that 20 sequences were available for the multiple alignments.
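The grouping and random sampling can be sketched in Python as follows. The handling of non-integer identities at the range boundaries (e.g. whether 39.5% belongs to 20-39%), the fixed random seed and the function names are assumptions made for this illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def identity_bin(pid: float) -> Optional[str]:
    """Map a percent identity to its range label; None for the ignored <20% and >99% groups."""
    if pid < 20 or pid > 99:
        return None
    if pid < 40:
        return "20-39"
    if pid < 60:
        return "40-59"
    if pid < 90:
        return "60-89"
    return "90-99"


def sample_per_bin(hits: Sequence[Tuple[str, float]], k: int = 5, seed: int = 0) -> Dict[str, List[str]]:
    """Group (id, percent identity) pairs by range and draw up to k IDs per range at random."""
    groups: Dict[str, List[str]] = {}
    for seq_id, pid in hits:
        label = identity_bin(pid)
        if label is not None:
            groups.setdefault(label, []).append(seq_id)
    rng = random.Random(seed)  # fixed seed so that the selection is reproducible
    return {label: rng.sample(ids, min(k, len(ids))) for label, ids in groups.items()}
```

With four populated ranges and k = 5 this yields the 20 sequences mentioned above; a range with fewer than five members simply contributes all of them.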

References
