Difference between revisions of "Fabry:Sequence-based analyses/Journal"

From Bioinformatikpedia
(Signal peptides: description added)
 
(9 intermediate revisions by 2 users not shown)
Line 7: Line 7:
 
::- Whatever is said in Latin sounds profound.
 
::- Whatever is said in Latin sounds profound.
   
TODO for Julia: Add links to scripts! --[[User:Rackersederj|Rackersederj]] 13:27, 12 May 2012 (UTC)
 
   
 
== Secondary structure ==
 
== Secondary structure ==
  +
=== ReProf ===
  +
  +
bash <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/run_reprof.sh.html run_reprof.sh]</span>
  +
bash <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_reprof.sh.html parse_reprof.sh]</span> *.reprof
  +
  +
The *.reprof files contain the raw output from ReProf, whereas the *.reprof.ss files contain only the extracted predicted secondary structure of the protein.
  +
  +
=== PsiPred ===
  +
  +
For the PsiPred prediction, the [http://bioinf.cs.ucl.ac.uk/psipred/ PsiPred Server] was used. The results were parsed using the scripts [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_psipred_ss2.sh.html parse_psipred_ss2.sh] for the PSIPRED HFORMAT and [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_psipred_txt.sh.html parse_psipred_txt.sh] for the PSIPRED VFORMAT. The resulting secondary structures were consistent with each other.
  +
  +
bash <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_psipred_ss2.sh.html parse_psipred_ss2.sh]</span> *.ss2
  +
bash <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_psipred_txt.sh.html parse_psipred_txt.sh]</span> *_psipass2.txt
  +
  +
=== DSSP ===
  +
  +
For the DSSP analysis, the following PDB files were obtained from pdb.org:
  +
  +
{| style="border-spacing: 0em; text-align: center; margin: 2em 0px;"
  +
|-
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 1em;" | UniProt-AC
  +
! scope="col" style="border-bottom: 2px solid #000; padding: 0px 1em; border-left: 2px solid #000;" | PDB-ID
  +
|-
  +
| [http://www.uniprot.org/uniprot/P06280 P06280]
  +
| style="border-left: 2px solid #000;" | [http://www.pdb.org/pdb/explore/explore.do?structureId=1R46 1R46]
  +
|-
  +
| [http://www.uniprot.org/uniprot/P10775 P10775]
  +
| style="border-left: 2px solid #000;" | [http://www.pdb.org/pdb/explore/explore.do?structureId=2BNH 2BNH]
  +
|-
  +
| [http://www.uniprot.org/uniprot/Q08209 Q08209]
  +
| style="border-left: 2px solid #000;" | [http://www.pdb.org/pdb/explore/explore.do?structureId=1AUI 1AUI]
  +
|-
  +
| [http://www.uniprot.org/uniprot/Q9X0E6 Q9X0E6]
  +
| style="border-left: 2px solid #000;" | [http://www.pdb.org/pdb/explore/explore.do?structureId=1KR4 1KR4]
  +
|-
  +
|}
  +
  +
After the DSSP results were fetched from the [http://mrs.cmbi.ru.nl/hsspsoap/ DSSP Server], the secondary structure was extracted from the output:
  +
  +
perl <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/parse_dssp.pl.html parse_dssp.pl]</span> *.dssp
   
 
== Disorder ==
 
== Disorder ==
  +
IUPred was run on all proteins in each of its prediction types, and the output is stored in files named <UniProt-AC>.<type>, e.g. P06280.long.
  +
bash <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/run_iupred.sh.html run_iupred.sh]</span>
   
 
== Transmembrane helices ==
 
== Transmembrane helices ==
First we obtained the fasta sequences with the help of the bash script [ get_sequences.sh]
+
First, we obtained the FASTA sequences with the bash script [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/get_sequences.sh.html get_sequences.sh].
   
  +
./<span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/get_sequences.sh.html get_sequences.sh]</span>
./[ get_sequences.sh]
 
   
With these sequences we carried out a blast search, using Polyphobius' blastget. The output we used for a multiple sequence alignment derived from Kalign and used this MSA as input for PolyPhobius. All this was done with the script [ call_polyphobius.sh]
+
With these sequences we carried out a BLAST search using PolyPhobius' blastget. We built a multiple sequence alignment from the BLAST output with Kalign and used this MSA as input for PolyPhobius. All of this was done with the script [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/call_polyphobius.sh.html call_polyphobius.sh].
  +
./<span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/call_polyphobius.sh.html call_polyphobius.sh]</span>
./[ call_polyphobius.sh]
 
All pictures were autogenerated by the databases and programs. We executed an extra Polyphobius online search for each protein to obtain additional data and plots.
+
Most of the pictures were generated automatically by the databases and programs. We ran an additional PolyPhobius online search for each protein to obtain further data and plots.<br>
  +
For plotting the comparison of the length distribution, we used the R script [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/length_distr.R.html length_distr.R]
  +
R CMD BATCH <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/length_distr.R.html length_distr.R]</span>
   
 
== Signal peptides ==
 
== Signal peptides ==
Again, we obtained the fasta files for the given proteins and our own desease causing AGAL with a script called [ get_sequences_sp.sh]
+
Again, we obtained the FASTA files for the given proteins and our own disease-causing protein AGAL with a script called [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/get_sequences_sp.sh.html get_sequences_sp.sh].
  +
./<span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/get_sequences_sp.sh.html get_sequences_sp.sh]</span>
./[ get_sequences_sp.sh]
 
 
Because the locally installed SignalP version 4.0 was not yet working when we ran this analysis, we used the [http://www.cbs.dtu.dk/services/SignalP/ SignalP] server, version 4.0, instead. <br>
 
Because the locally installed SignalP version 4.0 was not yet working when we ran this analysis, we used the [http://www.cbs.dtu.dk/services/SignalP/ SignalP] server, version 4.0, instead. <br>
 
To gain additional information, we performed another PolyPhobius prediction with all proteins.
 
To gain additional information, we performed another PolyPhobius prediction with all proteins.
  +
./<span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/call_polyphobius_sp.sh.html call_polyphobius_sp.sh]</span>
./[ call_polyphobius_sp.sh]
 
 
For a clearer illustration, we depicted serum albumin's signal peptide in PyMOL using the following commands:
 
For a clearer illustration, we depicted serum albumin's signal peptide in PyMOL using the following commands:
  +
orient
(see create_pic_pdb.txt)
 
  +
hide everything,all
  +
show cartoon, all
  +
  +
color cyan, ss h
  +
color yellow, ss s
  +
color purple, ss ""
  +
  +
select signPep, resi 1-18
  +
color red, signPep
  +
  +
bg_color gray70
  +
ray
  +
(see [https://www.dropbox.com/s/04qvke9o6brnxmi/create_pic_pdb.txt create_pic_pdb.txt])
 
Apart from that, all pictures were obtained from databases and programs.
 
Apart from that, all pictures were obtained from databases and programs.
   
 
== GO terms ==
 
== GO terms ==
  +
Since all predictions were performed online, no script was needed for the predictions themselves.
  +
We used the R script [https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/protfun_category_barplots.R.html protfun_category_barplots.R] to illustrate the results.
  +
R CMD BATCH <span class="plainlinks">[https://dl.dropbox.com/u/13796643/fabry/seq_analysis_scripts/protfun_category_barplots.R.html protfun_category_barplots.R]</span>
  +
   
 
[[Category: Fabry Disease 2012]]
 
[[Category: Fabry Disease 2012]]

Latest revision as of 15:02, 11 June 2012




Quidquid latine dictum sit, altum sonatur.

- Whatever is said in Latin sounds profound.


Secondary structure

ReProf

bash run_reprof.sh
bash parse_reprof.sh *.reprof

The *.reprof files contain the raw output from ReProf, whereas the *.reprof.ss files contain only the extracted predicted secondary structure of the protein.
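The extraction step can be sketched in Python as follows. This is not the parse_reprof.sh script itself, only a minimal illustration. It assumes that ReProf output consists of comment lines starting with "#", followed by a whitespace-separated header row that names a PHEL column holding the predicted state per residue. The function name and the example excerpt are hypothetical.

```python
def reprof_secondary_structure(text):
    """Return the predicted secondary structure as one string.

    Assumes '#' comment lines, then a header row containing 'PHEL',
    then one whitespace-separated row per residue.
    """
    column = None
    states = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if column is None:
            # The first non-comment line is taken to be the header row.
            column = fields.index("PHEL")
            continue
        states.append(fields[column])
    return "".join(states)


# Hypothetical three-residue excerpt, for illustration only.
example = """\
# made-up excerpt
No\tAA\tPHEL\tRI_S
1\tM\tL\t9
2\tQ\tH\t5
3\tL\tE\t3
"""

if __name__ == "__main__":
    print(reprof_secondary_structure(example))
```

Looking up the column by name rather than by position keeps the sketch independent of the exact number of columns ReProf writes.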

PsiPred

For the PsiPred prediction, the PsiPred server was used. The results were parsed using the scripts parse_psipred_ss2.sh for the PSIPRED HFORMAT and parse_psipred_txt.sh for the PSIPRED VFORMAT. The resulting secondary structures were consistent with each other.

bash parse_psipred_ss2.sh *.ss2
bash parse_psipred_txt.sh *_psipass2.txt
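As a rough illustration of what such a parser does and how the consistency of two predictions could be checked, here is a minimal Python sketch. It is not the authors' script. It assumes the usual .ss2 layout in which each data line holds the residue index, the amino acid, the predicted state and three probabilities; the function names and the example excerpt are hypothetical.

```python
def parse_ss2(text):
    """Return (sequence, states) from .ss2-style text.

    Data lines are assumed to look like: index, residue, state,
    followed by three probabilities. Everything else is skipped.
    """
    residues, states = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            residues.append(fields[1])
            states.append(fields[2])
    return "".join(residues), "".join(states)


def agreement(states_a, states_b):
    """Fraction of positions at which two predictions are identical."""
    if len(states_a) != len(states_b):
        raise ValueError("predictions differ in length")
    if not states_a:
        return 1.0
    same = sum(a == b for a, b in zip(states_a, states_b))
    return same / len(states_a)


# Hypothetical excerpt, for illustration only.
example = """\
# header line

   1 M C   0.999  0.001  0.000
   2 Q H   0.100  0.850  0.050
   3 L H   0.050  0.900  0.050
"""

if __name__ == "__main__":
    sequence, states = parse_ss2(example)
    print(sequence, states, agreement(states, states))
```

An agreement of 1.0 between the two parsed outputs corresponds to the statement that the resulting secondary structures were consistent with each other.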

DSSP

For the DSSP analysis, the following PDB files were obtained from pdb.org:

UniProt-AC PDB-ID
P06280 1R46
P10775 2BNH
Q08209 1AUI
Q9X0E6 1KR4

After the DSSP results were fetched from the DSSP Server, the secondary structure was extracted from the output:

perl parse_dssp.pl *.dssp
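The extraction can be sketched in Python as well. This is not parse_dssp.pl, only a minimal illustration that assumes the classic fixed-width DSSP layout: per-residue records follow the line starting with "  #  RESIDUE", the amino acid sits in column 14 and the secondary structure state in column 17. The function name and the example records are hypothetical.

```python
def dssp_secondary_structure(text):
    """Return (sequence, states) from classic DSSP output text."""
    sequence, states = [], []
    in_residues = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_residues = True  # per-residue records start after this line
            continue
        if not in_residues or len(line) < 14:
            continue
        aa = line[13]
        if aa == "!":
            continue  # chain break marker, not a residue
        if aa.islower():
            aa = "C"  # lowercase letters denote bridged cysteines
        ss = line[16] if len(line) > 16 else " "
        sequence.append(aa)
        states.append("-" if ss == " " else ss)  # blank means no assignment
    return "".join(sequence), "".join(states)


# Hypothetical records built with the assumed column layout.
rows = [(1, 32, "A", "L", " "), (2, 33, "A", "D", "E"), (3, 34, "A", "N", "H")]
example = "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC\n" + "\n".join(
    f"{i:5d}{num:5d} {chain} {aa}  {ss}" for i, num, chain, aa, ss in rows
)

if __name__ == "__main__":
    print(dssp_secondary_structure(example))
```

Slicing by column rather than splitting on whitespace matters here, because the state column is blank for residues without an assignment.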

Disorder

IUPred was run on all proteins in each of its prediction types, and the output is stored in files named <UniProt-AC>.<type>, e.g. P06280.long.

bash run_iupred.sh
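To show how such output files might be named and summarised, here is a minimal Python sketch. It is not run_iupred.sh. It assumes that the per-residue output of the long and short types has three whitespace-separated columns (position, amino acid, score) after "#" comment lines, and it uses the common cutoff of 0.5 for calling a residue disordered. The function names and the example scores are hypothetical.

```python
def output_name(accession, kind):
    """File name following the <UniProt-AC>.<type> convention."""
    return f"{accession}.{kind}"


def disordered_fraction(text, threshold=0.5):
    """Fraction of residues whose score exceeds the threshold.

    Assumes data lines of the form: position, residue, score.
    Comment lines and anything else are ignored.
    """
    scores = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit():
            scores.append(float(fields[2]))
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)


# Hypothetical scores, for illustration only.
example = """\
# made-up excerpt
    1 M     0.9000
    2 Q     0.6000
    3 L     0.2000
    4 R     0.1000
"""

if __name__ == "__main__":
    print(output_name("P06280", "long"), disordered_fraction(example))
```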

Transmembrane helices

First, we obtained the FASTA sequences with the bash script get_sequences.sh.

./get_sequences.sh
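What get_sequences.sh does internally is not shown here. One way to obtain and read such sequences is sketched below in Python, assuming the sequences come from UniProt and that the REST endpoint https://rest.uniprot.org/uniprotkb/<accession>.fasta is used. The endpoint choice, the function names and the example record are assumptions made for illustration.

```python
from urllib.request import urlopen


def fasta_url(accession):
    """URL of the FASTA record for one UniProt accession (assumed endpoint)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_fasta(accession):
    """Download the FASTA text for one accession (needs network access)."""
    with urlopen(fasta_url(accession)) as response:
        return response.read().decode("utf-8")


def read_fasta(text):
    """Map each header line (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up record, for illustration only (not a real sequence).
example = ">example protein\nMQLR\nNPEL\n"

# Usage with network access:
# records = read_fasta(fetch_fasta("P06280"))
```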

With these sequences we carried out a BLAST search using PolyPhobius' blastget. We built a multiple sequence alignment from the BLAST output with Kalign and used this MSA as input for PolyPhobius. All of this was done with the script call_polyphobius.sh.

./call_polyphobius.sh

Most of the pictures were generated automatically by the databases and programs. We ran an additional PolyPhobius online search for each protein to obtain further data and plots.
To plot the comparison of the length distribution, we used the R script length_distr.R.

R CMD BATCH length_distr.R

Signal peptides

Again, we obtained the FASTA files for the given proteins and our own disease-causing protein AGAL with a script called get_sequences_sp.sh.

./get_sequences_sp.sh

Because the locally installed SignalP version 4.0 was not yet working when we ran this analysis, we used the SignalP server, version 4.0, instead.
To gain additional information, we performed another PolyPhobius prediction with all proteins.

./call_polyphobius_sp.sh

For a clearer illustration, we depicted serum albumin's signal peptide in PyMOL using the following commands:

orient
hide everything,all
show cartoon, all

color cyan, ss h
color yellow, ss s
color purple, ss "" 

select signPep, resi 1-18
color red, signPep

bg_color gray70
ray 
(see create_pic_pdb.txt)

Apart from that, all pictures were obtained from databases and programs.

GO terms

Since all predictions were performed online, no script was needed for the predictions themselves. We used the R script protfun_category_barplots.R to illustrate the results.

R CMD BATCH protfun_category_barplots.R