Difference between revisions of "Metachromatic leukodystrophy reference aminoacids"

From Bioinformatikpedia

Revision as of 13:13, 23 May 2011

Sequence

>sp|P15289|ARSA_HUMAN Arylsulfatase A OS=Homo sapiens GN=ARSA PE=1 SV=3
MGAPRSLLLALAAGLAVARPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFT
DFYVPVSLCTPSRAALLTGRLPVRMGMYPGVLVPSSRGGLPLEEVTVAEVLAARGYLTGM
AGKWHLGVGPEGAFLPPHQGFHRFLGIPYSHDQGPCQNLTCFPPATPCDGGCDQGLVPIP
LLANLSVEAQPPWLPGLEARYMAFAHDLMADAQRQDRPFFLYYASHHTHYPQFSGQSFAE
RSGRGPFGDSLMELDAAVGTLMTAIGDLGLLEETLVIFTADNGPETMRMSRGGCSGLLRC
GKGTTYEGGVREPALAFWPGHIAPGVTHELASSLDLLPTLAALAGAPLPNVTLDGFDLSP
LLLGTGKSPRQSLFFYPSYPDEVRGVFAVRTGKYKAHFFTQGSAHSDTTADPACHASSSL
TAHEPPLLYDLSKDPGENYNLLGGVAGATPEVLQALKQLQLLKAQLDAAVTFGPSQVARG
EDPALQICCHPGCTPRPACCHCPDPHA


Source


Database Searches

FASTA, BLAST and PSI-BLAST were run against the non-redundant database (NR). HHsearch was run through the HHpred web interface against the PDB and InterPro databases. The following parameter settings were used:

  • BLAST: blastall -p blastp -i refSeq.fasta -d /data/blast/nr/nr > blastp, where refSeq.fasta is the file containing the reference sequence and blastp is the output file.
  • PSI-BLAST: blastpgp -i refSeq.fasta -d /data/blast/nr/nr -e "e-value" -j "#iterations" > psiblast_"e-value"_"#iterations"
  • PSI-BLAST was run with the following parameter settings:
      • e-value cutoff 0.005, 3 iterations (Psi-blast1)
      • e-value cutoff 0.005, 5 iterations (Psi-blast2)
      • e-value cutoff 10E-6, 3 iterations (Psi-blast3)
      • e-value cutoff 10E-6, 5 iterations (Psi-blast4)
  • HHsearch: We used the online version of HHpred <ref>http://toolkit.lmb.uni-muenchen.de/hhpred</ref> with default parameters. One search was performed against PDB and one against InterPro.
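The four PSI-BLAST runs differ only in the e-value cutoff and the number of iterations, so their command lines can be generated from the blastpgp call quoted above. The following is a minimal sketch that only builds and prints the commands; it does not run them. The function name is hypothetical, and the e-value strings are passed exactly as written in the list (whether blastpgp accepts the form 10E-6 was not checked here).

```python
from itertools import product

# Values taken from the blastpgp call and the parameter list above.
QUERY = "refSeq.fasta"
DATABASE = "/data/blast/nr/nr"
E_VALUES = ["0.005", "10E-6"]  # cutoffs exactly as written in the list
ITERATIONS = [3, 5]


def psiblast_commands():
    """Return (label, argv, outfile) for each of the four PSI-BLAST runs."""
    runs = []
    combos = product(E_VALUES, ITERATIONS)
    for number, (e_value, iterations) in enumerate(combos, start=1):
        argv = ["blastpgp", "-i", QUERY, "-d", DATABASE,
                "-e", e_value, "-j", str(iterations)]
        outfile = f"psiblast_{e_value}_{iterations}"
        runs.append((f"Psi-blast{number}", argv, outfile))
    return runs


if __name__ == "__main__":
    # Dry run: print each command with its output redirection.
    for label, argv, outfile in psiblast_commands():
        print(f"{label}: {' '.join(argv)} > {outfile}")
```

The labels Psi-blast1 to Psi-blast4 come out in the same order as in the list, because the e-value cutoff varies in the outer loop and the iteration count in the inner one.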

Alignment results

We wrote a Perl script to parse the output files of the individual programs and to extract the identifier, the alignment score and the percentage of identical residues within each alignment (the script file still has to be uploaded).
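The Perl script itself is not available on this page. As an illustration of the kind of parsing involved, here is a minimal Python sketch for the default pairwise text output of blastall. It assumes that each hit starts with a ">" line and that the score and identity appear on "Score =" and "Identities =" lines; the sample records are made up for the example and are not actual search results.

```python
import re

HIT_RE = re.compile(r"^>(\S+)")
SCORE_RE = re.compile(r"Score\s*=\s*([\d.]+)\s*bits")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def parse_hits(text):
    """Return one dict per hit with id, bit score and percent identity.

    Only the first alignment (HSP) reported for each hit is used.
    """
    hits = []
    current = None
    for line in text.splitlines():
        match = HIT_RE.match(line)
        if match:
            current = {"id": match.group(1), "score": None, "identity": None}
            hits.append(current)
            continue
        if current is None:
            continue  # still in the header before the first hit
        match = SCORE_RE.search(line)
        if match and current["score"] is None:
            current["score"] = float(match.group(1))
        match = IDENT_RE.search(line)
        if match and current["identity"] is None:
            identical, aligned = int(match.group(1)), int(match.group(2))
            current["identity"] = 100.0 * identical / aligned
    return hits


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
>gi|0000001|ref|XX_000001.1| made-up protein one
          Length = 507

 Score =  850 bits (2196), Expect = 0.0
 Identities = 450/500 (90%), Positives = 470/500 (94%)

>gi|0000002|ref|XX_000002.1| made-up protein two
          Length = 480

 Score = 120.5 bits (301), Expect = 3e-26
 Identities = 60/240 (25%), Positives = 100/240 (41%), Gaps = 12/240 (5%)
"""

if __name__ == "__main__":
    for hit in parse_hits(SAMPLE):
        print(hit["id"], hit["score"], hit["identity"])
```

PSI-BLAST output repeats the hit list for every iteration, so a parser for it would probably need to restrict itself to the last round; FASTA and HHsearch use different layouts and would need their own patterns.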

Mapping of identifiers

The non-redundant database contains entries from various databases, including RefSeq, PDB, PIR, PRF, GenBank and Swiss-Prot. To compare the results of the NR database searches with those of the HHpred searches, the IDs have to be mapped onto each other. Furthermore, the entries in HSSP, which is used later to benchmark the alignment results, contain only references to UniProtKB accession numbers (ACCNUM). To overcome this problem, we downloaded a mapping table between the IDs from the PIR ID mapping service <ref>http://pir.georgetown.edu/pirwww/search/idmapping.shtml</ref>. This table was used, together with some short Perl scripts, to map IDs between the databases and compare the results.
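The mapping step can be sketched as a simple table lookup. The example below assumes the downloaded table is a tab-separated file with the source ID in the first column and the target ID in the second; this layout, the function names and the placeholder IDs are assumptions for illustration, not the format actually used.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle):
    """Read a two-column, tab-separated table into {source_id: {target_ids}}."""
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0] or not row[1]:
            continue  # skip blank or incomplete lines
        mapping[row[0]].add(row[1])
    return mapping


def map_ids(ids, mapping):
    """Return (mapped target IDs, source IDs without any mapping)."""
    mapped, unmapped = set(), set()
    for identifier in ids:
        targets = mapping.get(identifier)
        if targets:
            mapped.update(targets)
        else:
            unmapped.add(identifier)
    return mapped, unmapped


if __name__ == "__main__":
    # Placeholder IDs only; a real table would be opened from disk instead.
    table = io.StringIO("SRC_A\tACC_1\nSRC_A\tACC_2\nSRC_B\tACC_3\n")
    mapped, unmapped = map_ids(["SRC_A", "SRC_C"], load_mapping(table))
    print(sorted(mapped), sorted(unmapped))
```

Keeping a set of targets per source ID allows for one-to-many mappings, and returning the unmapped IDs separately makes it visible how many hits are lost in the conversion.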

Summary of database searches

In this section, we briefly summarize the search results of the individual programs and then compare them to each other.

FASTA

With 4733 alignments, FASTA yielded the highest number of hits.

BLAST

BLAST produced 252 alignments.

PSI-BLAST

  • Using an E-value cutoff of 0.005, PSI-BLAST produced 756 alignments for 3 iterations and 1257 for 5 iterations.
  • Using an E-value cutoff of 10E-6, PSI-BLAST produced 756 alignments for 3 iterations and 1257 for 5 iterations.

HHsearch

HHsearch produced 33 alignments for the search against PDB and 74 alignments for the search against InterPro.

Comparison

HSSP
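{| border="1" style="text-align:center; border-spacing:0;"
| '''Method''' || '''Recall (GI)''' || '''Recall (PDB)''' || '''Precision (GI)'''
|-
| FASTA || 0.92 || 0.67 || 0.23
|-
| BLAST || 0.11 || 0.42 || 0.54
|-
| Psi-blast1 || 0.21 || 0.42 || 0.65
|-
| Psi-blast2 || 0.23 || 0.5 || 0.62
|-
| Psi-blast3 || 0.21 || 0.42 || 0.65
|-
| Psi-blast4 || 0.23 || 0.5 || 0.62
|-
| hhpred (pdb) || 0.01 || 1 || 0.11
|-
| hhpred (interpro) || 0.01 || 0.92 || 0.12
|}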
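The page does not state how recall and precision were computed. A minimal sketch using the usual set-based definitions is given below, assuming the HSSP entries serve as the reference set and that hits and reference entries have already been mapped to a common ID type; the function name is hypothetical.

```python
def recall_precision(hits, reference):
    """Return (recall, precision) of a hit list against a reference set.

    recall    = |hits and reference| / |reference|
    precision = |hits and reference| / |hits|
    """
    hits, reference = set(hits), set(reference)
    true_positives = len(hits & reference)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(hits) if hits else 0.0
    return recall, precision


if __name__ == "__main__":
    # Placeholder IDs: two of the three reference entries are found,
    # and two of the four hits are in the reference.
    print(recall_precision({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

Under these definitions, a method with many hits such as FASTA can reach a high recall while its precision stays low, which is the pattern visible in the table.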


Multiple Alignments

To build the multiple alignments, the results of the PSI-BLAST run with an e-value cutoff of 10E-6 and 5 iterations were divided into 6 groups by sequence identity:

  • <20%
  • 20% - 39%
  • 40% - 59%
  • 60% - 89%
  • 90% - 99%
  • >99%

The sequences with <20% and >99% sequence identity were ignored, and 5 sequences were randomly picked from each of the four remaining ranges, so 20 sequences were available for the multiple alignments.
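The selection procedure can be sketched as follows. The code assumes a dictionary of percent identities keyed by sequence ID and treats the ranges as 20 to <40, 40 to <60, 60 to <90 and 90 to 99 inclusive; the exact handling of boundary values, the fixed random seed and all names are assumptions made for the example.

```python
import random

LOWER_EDGES = [20.0, 40.0, 60.0, 90.0]  # lower bounds of the four sampled ranges
UPPER_LIMIT = 99.0                      # identities above this are ignored


def bin_of(percent_identity):
    """Return the index of the range a value falls into, or None if ignored."""
    if percent_identity < LOWER_EDGES[0] or percent_identity > UPPER_LIMIT:
        return None
    index = 0
    for i, edge in enumerate(LOWER_EDGES):
        if percent_identity >= edge:
            index = i
    return index


def pick_sequences(identities, per_bin=5, seed=0):
    """Randomly pick up to per_bin sequence IDs from each identity range."""
    bins = [[] for _ in LOWER_EDGES]
    for seq_id, percent_identity in identities.items():
        index = bin_of(percent_identity)
        if index is not None:
            bins[index].append(seq_id)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = []
    for members in bins:
        members.sort()  # stable order before sampling
        picked.extend(rng.sample(members, min(per_bin, len(members))))
    return picked


if __name__ == "__main__":
    # Synthetic identities 0..100 percent, one placeholder sequence each.
    demo = {f"seq{i}": float(i) for i in range(101)}
    print(pick_sequences(demo))
```

With at least five sequences in every range this returns 4 x 5 = 20 IDs; a range with fewer members contributes all of them.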


References

<references />