Difference between revisions of "Gaucher Task06 Protocol"
Latest revision as of 13:12, 18 June 2012
Mutation list
SIFT and SNAP both need a file "mutations.txt" that lists the mutations, one per line:
H99R
V211I
E150K
L236P
W248R
L509P
W351C
A423D
D482N
R83S
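As a minimal sketch, the file can be written with a here-document and then checked for malformed entries. The regular expression below is an assumption about the expected format (wild-type residue, position, mutant residue, using the 20 standard one-letter amino acid codes); it is not taken from the SIFT or SNAP documentation.

```shell
# Write the ten mutations, one per line, into mutations.txt
cat > mutations.txt <<'EOF'
H99R
V211I
E150K
L236P
W248R
L509P
W351C
A423D
D482N
R83S
EOF

# Assumed format: <wild-type residue><position><mutant residue>
if grep -Evq '^[ACDEFGHIKLMNPQRSTVWY][0-9]+[ACDEFGHIKLMNPQRSTVWY]$' mutations.txt; then
    echo "mutations.txt contains malformed lines"
else
    echo "mutations.txt looks well-formed"
fi
```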
Sources
You can check out the git repository containing all relevant data and scripts with:
git clone /mnt/home/student/angermue/mp/tasks/task06
PSSM
We created the PSSM as follows:
blastpgp -i data/P04062.seq -d $NR -j 5 -h 1e-3 -b 1000 -o pssm/all/P04062.bla -Q pssm/all/P04062.pssm
We used the script alignhits.pl from HH-suite to extract the most similar hits from the PSI-BLAST result file:
alignhits.pl -Q data/P04062.seq -qsc 1.5 pssm/all/P04062.bla pssm/best/P04062.psi
The PSSM for the resulting PSI-BLAST alignment was computed as follows:
blastpgp -i data/P04062.seq -B pssm/best/P04062.psi -d $DUMMY -j 0 -Q pssm/best/P04062.pssm
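The commands above rely on the variables $NR and $DUMMY, which this page never defines, and they write into pssm/all and pssm/best. A small guard like the following sketch can be run first from the repository root. The function name prepare_pssm_run is hypothetical, and whether the output directories already exist in the repository is an assumption that was not verified.

```shell
# Hypothetical helper: fail early if a database variable is unset,
# then make sure the output directories used above exist.
prepare_pssm_run() {
    for var in NR DUMMY; do
        # Look up the value of the variable whose name is stored in $var
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "error: \$$var is not set" >&2
            return 1
        fi
    done
    mkdir -p pssm/all pssm/best
}
```

Once both variables point to the intended databases, calling prepare_pssm_run before the first blastpgp command creates any missing output directory.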
SIFT
We used the online server of SIFT (http://sift.jcvi.org/www/SIFT_seq_submit2.html). A run takes fairly long (10-15 min) because the server first has to search the database for related sequences.
Input: the protein sequence P04062 (http://www.uniprot.org/uniprot/P04062.fasta) and the list of mutations. All other settings were left at their defaults.
Alternatively, we used the online server of SIFT BLink (http://sift.jcvi.org/www/SIFT_BLink_submit.html). Its predictions are based on pre-computed BLAST searches and are therefore returned almost immediately. SIFT BLink requires the NCBI GI number (66347912, http://www.ncbi.nlm.nih.gov/protein/CAI95090.1) that corresponds to our protein (UniProt ID: P04062).
Input: the corresponding NCBI GI number (66347912) and the list of mutations. All other settings were left at their defaults.
PolyPhen-2
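We used the online server of PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/).
Input: the protein sequence P04062 (http://www.uniprot.org/uniprot/P04062.fasta), the position of the mutation, the wild-type residue and the mutant residue. All other settings were left at their defaults.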
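Because PolyPhen-2 takes the position and the two residues as separate values, the entries of mutations.txt have to be split first. The following sketch does this with sed; the function name split_mutations and the "position wild-type mutant" output order are illustrative choices, not something the PolyPhen-2 server prescribes.

```shell
# Hypothetical helper: turn entries such as H99R into "99 H R"
# (position, wild-type residue, mutant residue).
split_mutations() {
    sed -E 's/^([A-Z])([0-9]+)([A-Z])$/\2 \1 \3/' "$1"
}
```

Running split_mutations mutations.txt prints one line per mutation, which can then be copied into the web form field by field.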
SNAP
The web version of SNAP (http://rostlab.org/services/snap/submit) does not seem to work. SNAP (https://rostlab.org/owiki/index.php/Snap) is also installed on the student cluster, where it can only be used from the command line. We need to create our own '~/.snapfunrc' file that points to the correct paths.
vi ~/.snapfunrc
The '~/.snapfunrc' file should have the following contents (note that the path to the pfam database must be changed compared to the given file '~/.snapfunrc', otherwise "Failed to open HMM database..." will be reported):
[snapfun]
# snapfun_utildir=path - path to package utilities, default: /usr/share/snapfun
snapfun_utildir=/usr/share/snapfun
# librg_utils_perl=path - path to librg-utils-perl utilities, default: /usr/share/librg-utils-perl
librg_utils_perl=/usr/share/librg-utils-perl
# blastpgp_segfilter=[T|F] - SEG filtering for blastpgpg, untested, default: F
blastpgp_segfilter=F
# blastpgp_processors=int - number of processors to use with blastpgp and hmmpfam, default: 1
blastpgp_processors=1
# hmmpfam executable
hmmpfam=hmm2pfam
# psic matrix
psic_matrix=/usr/share/psic/blosum62_psic.txt
# psic runner
runpsic=/usr/share/rost-runpsic/runNewPSIC.pl
# default mode of the program: [published|optimized|[predictprotein]
mode=optimized

[data]
# swiss_dat=path - location of UniProt/Swiss-Prot dat file
swiss_dat=/mnt/project/pracstrucfunc12/data/swissprot/uniprot_sprot.dat
# db_swiss=path - path to ID index of Swiss-Prot dat file (generated by /usr/share/librg-utils-perl/dbSwiss.pl)
db_swiss=/mnt/project/pracstrucfunc12/data/swissprot/dbswiss
# pfamdata=path - path to pfam database
pfamdata=/mnt/project/pracstrucfunc12/data/pfam_legacy/Pfam_ls

[blast]
# uniref=path - path to comprehensive sequence database (UniRef or equivalent)
uniref=/mnt/project/pracstrucfunc12/data/big/big
# uniref90=path - path to redundancy reduced database (UniRef90 or equivalent)
uniref90=/mnt/project/pracstrucfunc12/data/big/big_80
# swiss=path - path to SwissProt database
swiss=/mnt/project/pracstrucfunc12/data/swissprot/uniprot_sprot
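Since a wrong path only shows up as a runtime error such as "Failed to open HMM database...", it may help to check the configured paths before running SNAP. The following is a rough sketch, not part of SNAP: the function name check_rc_paths is hypothetical, and treating BLAST database entries as file-name prefixes (so that prefix.* files count as a match) is an assumption about how these databases are stored.

```shell
# Hypothetical helper: report key=value entries in an rc file whose
# absolute path does not exist on this machine.
check_rc_paths() {
    rc_file=$1
    missing=0
    while IFS='=' read -r key value; do
        # Skip comments, section headers and empty lines
        case $key in '#'*|'['*|'') continue ;; esac
        # Only check values that are absolute paths
        case $value in /*) ;; *) continue ;; esac
        # Assumption: database entries may be prefixes of several files
        if [ -e "$value" ] || ls "$value".* >/dev/null 2>&1; then
            echo "ok       $key=$value"
        else
            echo "MISSING  $key=$value"
            missing=1
        fi
    done < "$rc_file"
    return "$missing"
}
```

Calling check_rc_paths ~/.snapfunrc prints one line per path entry and returns a non-zero status if any path is missing.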
To run SNAP:
snapfun -i P04062.fasta -m mutations.txt -o snapfun_out.out
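Before submitting the list to any predictor, it can be worth confirming that each wild-type residue in mutations.txt actually occurs at the stated position of the FASTA sequence, since a shifted numbering scheme would otherwise go unnoticed. The sketch below assumes well-formed entries (one letter, a number, one letter) and a single-sequence FASTA file; the function name check_wildtype is hypothetical.

```shell
# Hypothetical helper: compare the wild-type residue of each mutation
# with the residue found at that position in the FASTA sequence.
check_wildtype() {
    fasta=$1
    mutations=$2
    # Join all non-header FASTA lines into one sequence string
    seq=$(grep -v '^>' "$fasta" | tr -d ' \n\r')
    status=0
    while read -r mut; do
        [ -z "$mut" ] && continue
        wt=$(printf '%s' "$mut" | cut -c1)
        pos=$(printf '%s' "$mut" | sed -E 's/^[A-Z]([0-9]+)[A-Z]$/\1/')
        found=$(printf '%s' "$seq" | cut -c"$pos")
        if [ "$wt" = "$found" ]; then
            echo "ok        $mut"
        else
            echo "MISMATCH  $mut (sequence has '$found' at position $pos)"
            status=1
        fi
    done < "$mutations"
    return "$status"
}
```

A call such as check_wildtype P04062.fasta mutations.txt lists every entry; a mismatch suggests that the mutation positions and the sequence use different numbering.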