Fabry:Sequence alignments (sequence searches and multiple alignments)/Journal
Revision authors: Rackersederj, Staniewski
Revision as of 16:57, 7 May 2012
Please see Task 2 Scripts for the scripts used.
Sequence searches
Blast
We searched the "big80" database with Blast using the first command below and post-processed the results with the remaining commands:
blastall -p blastp -d /mnt/project/pracstrucfunc12/data/big/big_80 -i P06280.fasta -m 0 -o blastsearch_default.out -v 700 -b 700
perl extract_ids_blast.pl blastsearch_default.out
perl ../download-annotation.pl blastsearch_default_ids.html
perl ../compare_GO_terms.pl P06280 blastsearch_default_ids_GOterms.tsv
perl parse_blast.pl blastsearch_default.out
R CMD BATCH hist_blast.R
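A minimal sketch of how hit IDs could be pulled from the one-line hit list of the default (-m 0) Blast report. It assumes the identifier is the first token of each line in that list; extract_blast_ids is a hypothetical name, and extract_ids_blast.pl itself may work differently.

```bash
#!/bin/bash
# Hypothetical helper: prints the first token of every line in the one-line
# hit list of a blastall -m 0 report (assumed to be the hit identifier).
extract_blast_ids() {
  awk '
    # the hit list starts after this header line
    /^Sequences producing significant alignments/ { in_list = 1; next }
    # skip the blank line after the header; the next blank line ends the list
    in_list && NF == 0 { if (seen) exit; next }
    in_list { print $1; seen = 1 }
  ' "$1"
}
```

Usage (output file name is a placeholder): `extract_blast_ids blastsearch_default.out > blast_ids.txt`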
Psi-Blast
The following command was used to run Psi-Blast against big80 with AGAL as the query sequence. It was run with two and with ten iterations, each combined with an e-value cut-off of 2e-3 and of 1e-9, giving four runs in total.
$ bash run_psi_blast.sh &> run_psi_blast.log
The log file contains the runtimes of the four Psi-Blast runs:
Iterations | E-value cut-off | Runtime |
---|---|---|
2 | 2e-3 | 3m0.814s |
2 | 1e-9 | 3m9.422s |
10 | 2e-3 | 14m29.179s |
10 | 1e-9 | 15m39.251s |
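For reference, a hypothetical sketch of a wrapper that produces the four runs in the table. It assumes the legacy NCBI blastpgp program (from the same toolkit as blastall) with -j for the number of iterations, -h for the e-value inclusion threshold and -T T for HTML output; the function name, the BLASTPGP/DB/QUERY variables and the output file names are our own, and run_psi_blast.sh may differ in all of these.

```bash
#!/bin/bash
# Hypothetical reconstruction, not the original run_psi_blast.sh.
# Assumes legacy blastpgp: -j iterations, -h inclusion e-value, -T T HTML output.
run_psi_blast_grid() {
  local blastpgp=${BLASTPGP:-blastpgp}
  local db=${DB:-/mnt/project/pracstrucfunc12/data/big/big_80}
  local query=${QUERY:-P06280.fasta}
  local iterations evalue
  for iterations in 2 10; do
    for evalue in 2e-3 1e-9; do
      echo "== $iterations iterations, e-value cut-off $evalue =="
      # bash's time keyword writes real/user/sys to stderr, so &> captures it in the log
      time "$blastpgp" -d "$db" -i "$query" -j "$iterations" -h "$evalue" -T T \
        -o "psi_results_${iterations}_${evalue}.html"
    done
  done
}
```

Usage: `run_psi_blast_grid &> run_psi_blast.log`. Setting `BLASTPGP=echo` prints the four command lines instead of running them, which is a cheap way to check the parameter grid.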
Afterwards, we parsed the Psi-Blast output to collect the following information about every hit of the last iteration: the e-value, the sequence identity, the coverage of the longer sequence by the pairwise alignment, and the alignment length. When a hit had more than one alignment, we used the first one, which is also the one listed in the short result output.
$ for i in psi_results_*.html; do
perl parse_psiblast.pl "$i" > "${i%.*}.stats"
done
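To make the two less obvious statistics concrete, here is a minimal sketch under our own reading of the definitions: identity as identical positions divided by alignment length, taken from Blast's "Identities = X/Y" line, and coverage as the number of residues of the longer sequence spanned by the alignment divided by that sequence's length. The function names are hypothetical, and parse_psiblast.pl may compute these values differently.

```bash
#!/bin/bash
# Reads a Blast line such as " Identities = 120/400 (30%), ..." on stdin and
# prints X/Y as a decimal; Y is the alignment length (gaps included).
identity_fraction() {
  sed -n 's/.*Identities = \([0-9][0-9]*\)\/\([0-9][0-9]*\).*/\1 \2/p' |
    awk '{ printf "%.3f\n", $1 / $2 }'
}

# Coverage of the longer of the two sequences by the alignment.
# Arguments: query_length subject_length query_start query_end subject_start subject_end
coverage_of_longer() {
  awk -v ql="$1" -v sl="$2" -v qs="$3" -v qe="$4" -v ss="$5" -v se="$6" 'BEGIN {
    # count the aligned residues in whichever sequence is longer
    if (ql + 0 >= sl + 0) { span = qe - qs + 1; len = ql }
    else                  { span = se - ss + 1; len = sl }
    printf "%.3f\n", span / len
  }'
}
```

For example, `coverage_of_longer 429 500 10 400 51 450` prints 0.800: the subject is the longer sequence and 400 of its 500 residues are aligned.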
The histograms were generated with the generate_histograms.sh script:
$ bash generate_histograms.sh *.stats
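generate_histograms.sh is not shown here; as a rough illustration, the core binning step for one column could look like the following minimal sketch. The column number and bin width are parameters because the .stats layout is an assumption; histogram_column is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: fixed-width histogram of one numeric column.
# $1 = file, $2 = column number, $3 = bin width; prints "bin_start count".
histogram_column() {
  awk -v col="$2" -v width="$3" '
    # skip header lines and non-numeric fields
    NF >= col && $col == $col + 0 {
      bin = int($col / width) * width
      count[bin]++
    }
    END { for (bin in count) print bin, count[bin] }
  ' "$1" | sort -n
}
```

Fixed-width bins suit identity or coverage values; e-values span many orders of magnitude and would need log-scaled bins instead.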
GO term comparison
perl ../download-annotation.pl ids_psiblast_10th_round.html
perl ../compare_GO_terms.pl P06280 ids_psiblast_10th_round_GOterms.tsv
R CMD BATCH hist_psiblast.R
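A hypothetical sketch of one simple GO comparison: the fraction of hits that share at least one GO term with the query. It assumes a tab-separated file with one accession and one GO term per line; neither that layout nor this measure is confirmed to be what compare_GO_terms.pl uses, and shared_go_fraction is our own name.

```bash
#!/bin/bash
# Hypothetical helper. $1 = query accession (e.g. P06280),
# $2 = tab-separated file "accession<TAB>GO term" (assumed layout).
shared_go_fraction() {
  awk -F'\t' -v q="$1" '
    # first pass over the file: collect the GO terms annotated to the query
    NR == FNR { if ($1 == q) query_go[$2] = 1; next }
    # second pass: record every other protein and whether it shares a term
    $1 != q { hits[$1] = 1; if ($2 in query_go) shared[$1] = 1 }
    END {
      n = 0; s = 0
      for (h in hits) { n++; if (h in shared) s++ }
      printf "%d/%d hits share a GO term with %s\n", s, n, q
    }
  ' "$2" "$2"
}
```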
HHblits / HHsearch
We searched the uniprot20 database (uniprot20_current) with HHblits, once with the default number of iterations and once with the maximum possible number of iterations (8), using the following commands:
time hhblits -i ../P06280.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 0.003 -o hhblits_default.out -E 0.003 -z 700
./extract_ids_hhblits.sh hhblits_default.out
perl parse_hhblits.pl hhblits_default.out
perl ../download-annotation.pl hhblits_default.out_cluster_ids_only.tsv
perl ../compare_GO_terms.pl P06280 hhblits_default.out_cluster_ids_only_GOterms.tsv

time hhblits -i ../P06280.fasta -d /mnt/project/pracstrucfunc12/data/hhblits/uniprot20_current -e 0.003 -o hhblits_n8_neu.out -E 0.003 -n 8 -z 800 -b 800
./extract_ids_hhblits.sh hhblits_n8_neu.out
perl parse_hhblits.pl hhblits_n8_neu.out
perl ../download-annotation.pl hhblits_n8_neu.out_cluster_ids_only.tsv
perl ../compare_GO_terms.pl P06280 hhblits_n8_neu.out_cluster_ids_only_GOterms.tsv

R CMD BATCH hist_hhblits.R
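A minimal sketch of pulling hit names out of the hit list at the top of an .hhr result file (the format HHblits writes with -o), assuming the usual layout in which the list follows a " No Hit ..." header line and ends at the first blank line. extract_hhr_hits is a hypothetical name; extract_ids_hhblits.sh may extract the cluster IDs differently.

```bash
#!/bin/bash
# Hypothetical helper: prints the hit name (second column) of every row
# in the hit list of an HHblits .hhr file.
extract_hhr_hits() {
  awk '
    # the hit list starts after the " No Hit ..." header
    /^ No Hit/ { in_list = 1; next }
    # a blank line separates the hit list from the alignments
    in_list && NF == 0 { exit }
    in_list { print $2 }
  ' "$1"
}
```

Usage: `extract_hhr_hits hhblits_n8_neu.out`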
Comparison
Venn diagrams created with Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams.
R CMD BATCH all_Evalues.R
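The region sizes of a two-set Venn diagram can also be checked on the command line. A minimal sketch with sort and comm, assuming one ID per line in each input file; venn_counts is a hypothetical name, and the diagrams themselves were made with VENNY.

```bash
#!/bin/bash
# Hypothetical helper: sizes of the three regions of a two-set Venn diagram.
# $1, $2 = files with one ID per line.
venn_counts() {
  local a b
  a=$(mktemp) && b=$(mktemp) || return 1
  # comm needs both inputs sorted in the same collation order
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  echo "only_first $(LC_ALL=C comm -23 "$a" "$b" | awk 'END { print NR }')"
  echo "only_second $(LC_ALL=C comm -13 "$a" "$b" | awk 'END { print NR }')"
  echo "both $(LC_ALL=C comm -12 "$a" "$b" | awk 'END { print NR }')"
  rm -f "$a" "$b"
}
```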
Multiple sequence alignments
Results
The following commands were used in our bash script calculate_msas.sh to generate the multiple sequence alignments. The pictures were created with Jalview.
$ clustalw -infile="<filename>.fasta" -outfile="msa/clustalw_<filename>.msa" &
$ muscle -in "<filename>.fasta" -out "msa/muscle_<filename>.msa" &
$ /mnt/opt/T-Coffee/bin/t_coffee -seq "<filename>.fasta" -outfile "msa/tcoffe_<filename>.msa" &
$ /mnt/opt/T-Coffee/bin/t_coffee -seq "<filename>.fasta" -method sap_pair -template_file "<filename>.pdb" \
    -outfile "msa/3Dcoffee_<filename>.msa" &
We counted the number of gaps and conserved columns with the Perl script countGaps.pl. A small wrapper script, countAllGaps.sh, runs countGaps.pl on every .msa file in the msa folder:
#!/bin/bash
for file in msa/*.msa; do
perl countGaps.pl "$file" > "${file%.*}.counts"
done
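countGaps.pl is not reproduced here; as an illustration, a minimal awk sketch that counts gap characters and fully conserved columns. It assumes FASTA-formatted alignments (such as MUSCLE's default output), treats only "-" as a gap, and calls a column conserved when every sequence has the same non-gap residue; these definitions are assumptions and may differ from countGaps.pl. ClustalW and T-Coffee write Clustal (ALN) format by default, which would have to be converted to FASTA first. count_gaps_and_conserved is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper for a FASTA-formatted alignment ($1).
# Prints the number of gap characters, fully conserved columns and columns.
count_gaps_and_conserved() {
  awk '
    /^>/ { n++; next }
    {
      # strip whitespace and concatenate the sequence lines of record n
      line = $0
      gsub(/[ \t\r]/, "", line)
      seq[n] = seq[n] toupper(line)
    }
    END {
      gaps = 0; conserved = 0
      columns = length(seq[1])
      for (i = 1; i <= n; i++)
        gaps += gsub(/-/, "-", seq[i])   # gsub returns the number of matches
      for (c = 1; c <= columns; c++) {
        res = substr(seq[1], c, 1)
        if (res == "-") continue
        same = 1
        for (i = 2; i <= n; i++)
          if (substr(seq[i], c, 1) != res) { same = 0; break }
        conserved += same
      }
      printf "gaps\t%d\nconserved_columns\t%d\ncolumns\t%d\n", gaps, conserved, columns
    }
  ' "$1"
}
```

Usage: `count_gaps_and_conserved "msa/muscle_<filename>.msa"`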