Canavan Disease: Task 02 - Journal
Pairwise Sequence Alignments
Command Lines
The following command lines were used to generate the pairwise sequence alignments:
Blast:
blastall -p blastp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big_80 -o /folders/outfile.txt -v 20000 -b 20000
where:
-p kind of blast
-i input file
-d database to search against
-o outfile
-v number of one-line descriptions to show
-b number of database sequences to show alignments for
HHblits:
hhblits -i /folders/ASPA.fasta -o /folders/hhblits.out -d /folders/rost_db/data/hhblits/uniprot20_02Sept11 -oa3m /folders/hhblits.a3m
where:
-i infile
-o outfile
-d database to search against
-oa3m intermediate file: resulting MSA with the significant matches in a3m format
PsiBlast:
blastpgp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big_80 -o /folders/outfile.txt -j #iteration -h #eValCutoff -C /folders/checkfileOut.chk -Q /folders/pssmMatrixOut.pssm -v 20000 -b 20000
where:
-i infile
-d database to search against
-o outfile
-j number of iterations
-h e-value cutoff
-C checkfile
-Q PSSM matrix
-v number of one-line descriptions to show
-b number of database sequences to show alignments for
Big:
blastpgp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big -o /folders/outfile.txt -j 1 -h 10e-10 -Q /folders/pssmMatrixOut.pssm -R /folders/PsiBlast10it10cut_big80.chk -v 20000 -b 20000
where:
-i infile
-d database to search against
-o outfile
-j number of iterations
-h e-value cutoff
-Q PSSM matrix
-R input file for PSI-BLAST restart
-v number of one-line descriptions to show
-b number of database sequences to show alignments for
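Since the PsiBlast call is parameterised by the number of iterations and the e-value cutoff, the individual command lines can be assembled programmatically. The following is only a minimal sketch: the function names, the output file naming scheme and the parameter values in the example are assumptions made for illustration and are not the settings used in this journal. The script only builds the argument lists and prints them; it does not run blastpgp.

```python
# Hypothetical helper: assembles blastpgp argument lists, it does not run them.
from itertools import product

QUERY = "/folders/ASPA.fasta"
DATABASE = "/folders/pracstrucfunc13/data/big/big_80"


def psiblast_command(iterations, evalue_cutoff, out_dir="/folders"):
    """Return the blastpgp argument list for one parameter combination."""
    # Assumed naming scheme for the output files of each run.
    tag = "psiblast_{0}it_{1}".format(iterations, evalue_cutoff)
    return [
        "blastpgp",
        "-i", QUERY,
        "-d", DATABASE,
        "-o", "{0}/{1}.txt".format(out_dir, tag),
        "-j", str(iterations),
        "-h", str(evalue_cutoff),
        "-C", "{0}/{1}.chk".format(out_dir, tag),
        "-Q", "{0}/{1}.pssm".format(out_dir, tag),
        "-v", "20000",
        "-b", "20000",
    ]


def all_commands(iteration_values, cutoff_values):
    """Build one command per (iterations, cutoff) combination."""
    return [psiblast_command(j, h)
            for j, h in product(iteration_values, cutoff_values)]


if __name__ == "__main__":
    # Placeholder values, chosen only to show the output format.
    for cmd in all_commands([2, 10], ["0.002", "10e-10"]):
        print(" ".join(cmd))
```

Each returned list could be passed to subprocess.run to execute the search, provided blastpgp is installed and the paths exist.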
Parsers and Programs
Several Python and R scripts were written to produce the results, among them parsers for the e-value distribution, sequence identity, GO annotations and common sequences. These can be viewed on request.
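The original parsers are not reproduced here. As an illustration of two of the quantities mentioned, the following sketch computes the sequence identity of one pairwise alignment and the hits shared by two searches. The function names are hypothetical, and the identity definition used (identical columns divided by all columns in which at least one sequence has a residue) is only one of several common conventions and may differ from the one used in the actual scripts.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical columns in a pairwise alignment.

    Assumption: columns in which both sequences have a gap are ignored,
    all other columns count towards the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def common_hits(ids_a, ids_b):
    """Return the identifiers that occur in both hit lists."""
    return set(ids_a) & set(ids_b)


if __name__ == "__main__":
    # Toy alignment and placeholder identifiers, not real search results.
    print(sequence_identity("MK-LV", "MKALI"))
    print(common_hits(["HIT_1", "HIT_2", "B2JCG3"], ["B2JCG3", "HIT_1"]))
```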
For the validation using Gene Ontology (GO), the following Python script was written.
A simple call to retrieve the GO annotations of protein B2JCG3 would be:
python goAnnotation.py B2JCG3 /Desktop/result.out
(the resulting file is saved to /Desktop/result.out)
#!/usr/bin/env python3
import sys
import urllib.request

######
# Main Method
# @author: ariane
######
def main():
    progname = sys.argv[0]
    # Exactly two arguments are expected: protein accession and output file.
    if len(sys.argv) != 3:
        sys.exit("Usage: %s PROTEIN OUTFILE" % progname)
    GOannotation(sys.argv[1], sys.argv[2])

######
# GOAnnotation
######
def GOannotation(Gprotein, outfilename):
    url = "http://www.ebi.ac.uk/QuickGO/GAnnotation?protein={0}&format=tsv".format(Gprotein)
    website = urllib.request.urlopen(url)
    out = open(outfilename, 'w')
    for line in website:
        # The response is read as bytes; decode it before splitting into columns.
        temp = line.decode("utf-8").rstrip("\n").split("\t")
        if len(temp) < 8:
            continue  # skip lines without the GO ID and GO name columns
        GOid = temp[6]
        GOname = temp[7]
        out.write("{0}\t{1}\n".format(GOid, GOname))
    website.close()
    out.close()

##################
######END#########
##################
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        pass
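The script writes one line per annotation, consisting of the GO ID and the GO name separated by a tab. One possible way to use such files for validation is to compare the GO terms of two proteins. The sketch below is an assumption about how this could be done and is not the procedure used in this journal; the function names are hypothetical and the GO lines in the example are illustrative, not the actual annotations of any particular protein.

```python
def parse_go_lines(lines):
    """Map GO ID -> GO name from lines of the form 'GO_ID<TAB>GO_NAME'."""
    terms = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].startswith("GO:"):
            continue  # skips the header row and malformed lines
        terms[fields[0]] = fields[1]
    return terms


def shared_go_terms(terms_a, terms_b):
    """Return the GO terms (ID -> name) present in both annotation sets."""
    return {go_id: name for go_id, name in terms_a.items() if go_id in terms_b}


if __name__ == "__main__":
    # Illustrative input only.
    query = parse_go_lines([
        "GO ID\tGO Name\n",
        "GO:0016787\thydrolase activity\n",
        "GO:0046872\tmetal ion binding\n",
    ])
    hit = parse_go_lines(["GO:0016787\thydrolase activity\n"])
    print(shared_go_terms(query, hit))
```

A result file can be passed in directly, since an open file handle iterates over its lines, for example parse_go_lines(open("/Desktop/result.out")).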
Multiple Sequence Alignments
Parsers and Programs
Several Python and R scripts were written to produce the results, for example to build sets, to find conserved blocks in a consensus sequence, or to compute statistics. These can also be viewed on request.
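As an illustration of the conserved-block search, the sketch below marks the fully conserved columns of a multiple sequence alignment and reports runs of such columns. It is a simplified stand-in and not the original script: the function names, the strict definition of "conserved" (all sequences carry the same residue and no gap) and the minimum block length are assumptions.

```python
def conserved_columns(alignment, gap="-"):
    """Return one boolean per column: True if all residues are identical."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    flags = []
    for column in zip(*alignment):
        first = column[0]
        flags.append(first != gap and all(res == first for res in column))
    return flags


def conserved_blocks(alignment, min_length=3):
    """Return (start, end) pairs of conserved runs (0-based, end exclusive)."""
    blocks = []
    start = None
    # The trailing False acts as a sentinel that closes a block at the end.
    for i, conserved in enumerate(conserved_columns(alignment) + [False]):
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


if __name__ == "__main__":
    # Toy alignment, not taken from the actual results.
    msa = ["MKLVAG-T", "MKLIAGST", "MKLVAGTT"]
    print(conserved_blocks(msa, min_length=2))
```

A less strict variant could accept a column as conserved once a given fraction of the sequences agree, which is often more useful for larger alignments.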