Canavan Disease: Task 02 - Journal

Back to Task 02: Alignments

Pairwise Sequence Alignments

Command Lines

The following command lines were used to generate the pairwise sequence alignments:

Blast:

blastall -p blastp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big_80 -o /folders/outfile.txt -v 20000 -b 20000

   where:
-p BLAST program to run (here blastp)
-i input file (query sequence)
-d database to search against
-o output file
-v number of one-line descriptions to show
-b number of database sequences to show alignments for

HHblits:

hhblits -i /folders/ASPA.fasta -o /folders/hhblits.out -d /folders/rost_db/data/hhblits/uniprot20_02Sept11 -oa3m /folders/hhblits.a3m

   where:
-i infile
-o outfile
-d database to search against
-oa3m intermediate file: resulting MSA with significant matches in A3M format
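
The A3M file written by -oa3m can be turned into equal-length aligned sequences before further processing. The following is only a minimal sketch, assuming the usual A3M convention that insertions relative to the query are written as lowercase letters (or '.'); the function names read_a3m and match_columns are hypothetical and not part of the scripts used for this task.

```python
def read_a3m(lines):
    """Parse A3M (FASTA-like) lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_columns(sequence):
    """Drop insertions (lowercase letters and '.') and keep only match columns."""
    return "".join(c for c in sequence if not c.islower() and c != ".")


if __name__ == "__main__":
    # Path taken from the hhblits command above.
    with open("/folders/hhblits.a3m") as handle:
        for name, seq in read_a3m(handle):
            print(name, match_columns(seq))
```

After removing the insertions, all sequences have the length of the query, so they can be compared column by column.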

PsiBlast:

blastpgp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big_80 -o /folders/outfile.txt -j ITERATIONS -h EVALUE_CUTOFF
 -C /folders/checkfileOut.chk -Q /folders/pssmMatrixOut.pssm -v 20000 -b 20000

   where:
-i infile
-d database to search against
-o outfile
-j maximum number of iterations
-h E-value cutoff for including sequences in the PSSM
-C checkpoint file (output)
-Q PSSM in ASCII format (output)
-v number of one line descriptions to show
-b number of database sequences to show alignment for
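
Because the iteration count and the E-value cutoff are placeholders in the command above, the runs can be generated for several combinations. The following sketch only builds the argument lists from the flags listed above; the output prefix, the naming scheme of the output files and the parameter values in the example are assumptions for illustration, not the values used in this task.

```python
from itertools import product


def blastpgp_command(query, database, out_prefix, iterations, evalue):
    """Build one blastpgp argument list using only the flags described above."""
    tag = "{0}it_{1}".format(iterations, evalue)  # hypothetical naming scheme
    return [
        "blastpgp",
        "-i", query,
        "-d", database,
        "-o", "{0}_{1}.txt".format(out_prefix, tag),
        "-j", str(iterations),
        "-h", str(evalue),
        "-C", "{0}_{1}.chk".format(out_prefix, tag),
        "-Q", "{0}_{1}.pssm".format(out_prefix, tag),
        "-v", "20000",
        "-b", "20000",
    ]


def all_commands(query, database, out_prefix, iteration_values, evalue_values):
    """Return one command per (iterations, E-value cutoff) combination."""
    return [
        blastpgp_command(query, database, out_prefix, j, h)
        for j, h in product(iteration_values, evalue_values)
    ]


if __name__ == "__main__":
    # Example values only; "/folders/psiblast" is a made-up output prefix.
    for cmd in all_commands(
        "/folders/ASPA.fasta",
        "/folders/pracstrucfunc13/data/big/big_80",
        "/folders/psiblast",
        [2, 10],
        ["0.002", "10e-10"],
    ):
        print(" ".join(cmd))
```

Each list could then be executed with subprocess.run(cmd, check=True) on a machine where blastpgp is installed.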

PsiBlast against big (restarted from the big_80 checkpoint):

blastpgp -i /folders/ASPA.fasta -d /folders/pracstrucfunc13/data/big/big -o /folders/outfile.txt -j 1 -h 10e-10
 -Q /folders/pssmMatrixOut.pssm -R /folders/PsiBlast10it10cut_big80.chk -v 20000 -b 20000

   where:
-i infile
-d database to search against
-o outfile
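-j maximum number of iterations
-h E-value cutoff for including sequences in the PSSM
-Q PSSM in ASCII format (output)
-R checkpoint file from an earlier run, used as input to restart PSI-BLAST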
-v number of one line descriptions to show
-b number of database sequences to show alignment for

Parsers and Programs

Several Python and R scripts were written to produce the results, including parsers that extract the e-value distribution, sequence identity, GO annotations and common sequences. They are available on request.
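
As an illustration of what such a parser might look like, the sketch below extracts the E-value and the percent identity of every alignment from a BLAST report. It assumes the default pairwise text output of legacy BLAST, in which each alignment contains a "Score = ..., Expect = ..." line followed by an "Identities = a/b (...)" line. It is not the parser used for this task, and all function names are hypothetical.

```python
import re

EXPECT_RE = re.compile(r"Expect(?:\(\d+\))?\s*=\s*([0-9.eE+-]+)")
IDENTITY_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def parse_hsps(lines):
    """Return a list of (evalue, percent_identity) tuples, one per alignment."""
    hsps = []
    evalue = None
    for line in lines:
        match = EXPECT_RE.search(line)
        if match and "Score" in line:
            text = match.group(1)
            # Legacy BLAST may print very small values as "e-100".
            if text.startswith("e"):
                text = "1" + text
            evalue = float(text)
            continue
        match = IDENTITY_RE.search(line)
        if match and evalue is not None:
            identical, length = int(match.group(1)), int(match.group(2))
            hsps.append((evalue, 100.0 * identical / length))
            evalue = None
    return hsps


def identity_bins(hsps, width=10):
    """Count alignments per percent-identity bin (keyed by lower bin edge)."""
    counts = {}
    for _, identity in hsps:
        low = min(int(identity // width) * width, 100 - width)
        counts[low] = counts.get(low, 0) + 1
    return counts


if __name__ == "__main__":
    # Output path taken from the BLAST command above.
    with open("/folders/outfile.txt") as report:
        print(identity_bins(parse_hsps(report)))
```

The same list of tuples can be used to plot the E-value distribution, for example after taking the logarithm of the non-zero values.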

For the validation against Gene Ontology, the following Python script was written. For example, to retrieve the GO annotations of protein B2JCG3:
python goAnnotation.py B2JCG3 /Desktop/result.out
(the result is written to /Desktop/result.out)

#!/usr/bin/env python3

import sys
import urllib.request

######
# Main Method
# @author: ariane
######
def main():

	progname = sys.argv[0]
	# Exactly two arguments are expected: protein accession and output file.
	if len(sys.argv) != 3:
		sys.exit("Usage: %s PROTEIN OUTFILE" % progname)

	protein = sys.argv[1]
	outfile = sys.argv[2]

	GOannotation(protein, outfile)

######
# GOAnnotation
# Downloads the QuickGO annotation table (TSV) of one protein and
# writes GO ID and GO name of every annotation to outfilename.
######
def GOannotation(Gprotein, outfilename):

	url = "http://www.ebi.ac.uk/QuickGO/GAnnotation?protein={0}&format=tsv".format(Gprotein)
	website = urllib.request.urlopen(url)

	with open(outfilename, 'w') as out:
		for line in website:
			# urlopen yields bytes, so decode before splitting into columns.
			temp = line.decode("utf-8").rstrip("\r\n").split("\t")
			# Skip lines too short to hold GO ID (column 7) and GO name (column 8).
			if len(temp) < 8:
				continue
			GOid = temp[6]
			GOname = temp[7]
			out.write("{0}\t{1}\n".format(GOid, GOname))
	website.close()



##################
######END#########
##################
if __name__ == '__main__':
	try:
		main()
	except KeyboardInterrupt:
		pass
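
The result files of this script can be compared between the query and a hit to see which GO terms they share. The following is a minimal sketch of such a comparison, assuming each file contains one "GO ID<TAB>GO name" pair per line as written by the script above; the function names and the second file name are hypothetical.

```python
def read_go_terms(lines):
    """Collect {GO ID: GO name} from tab-separated 'GO ID<TAB>GO name' lines."""
    terms = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Header lines and malformed lines do not start with "GO:".
        if len(fields) >= 2 and fields[0].startswith("GO:"):
            terms[fields[0]] = fields[1]
    return terms


def shared_go_terms(query_terms, hit_terms):
    """Return the sorted GO IDs that are annotated to both proteins."""
    return sorted(set(query_terms) & set(hit_terms))


if __name__ == "__main__":
    # "/Desktop/query.out" is a made-up name for the query's result file.
    with open("/Desktop/query.out") as q, open("/Desktop/result.out") as h:
        query, hit = read_go_terms(q), read_go_terms(h)
    for go_id in shared_go_terms(query, hit):
        print(go_id, query[go_id], sep="\t")
```

Using a dictionary keyed by GO ID removes duplicate annotations that differ only in evidence or source.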

Multiple Sequence Alignments

Parsers and Programs

Several Python and R scripts were written to produce the results, for example to build sets, to find conserved blocks in a consensus sequence, or to compute statistics. They are also available on request.
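
One possible way to find conserved blocks is sketched below. It assumes that the alignment is given as equal-length strings with '-' as the gap character, and it calls a column conserved when the most frequent residue occurs in at least a given fraction of the sequences. The threshold, the minimum block length and the function names are illustrative assumptions, not the settings used for this task.

```python
from collections import Counter


def column_profile(alignment, gap="-"):
    """Return (consensus residue, fraction of sequences carrying it) per column."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        if not residues:
            profile.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        profile.append((residue, count / len(column)))
    return profile


def conserved_blocks(alignment, threshold=0.8, min_length=3, gap="-"):
    """Return (start, end, consensus) for runs of conserved columns.

    start is 0-based and end is exclusive.
    """
    profile = column_profile(alignment, gap)
    blocks, start = [], None
    # The appended sentinel column closes a block that reaches the last column.
    for i, (_, score) in enumerate(profile + [(gap, 0.0)]):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segment = "".join(r for r, _ in profile[start:i])
                blocks.append((start, i, segment))
            start = None
    return blocks


if __name__ == "__main__":
    toy_alignment = ["MKVLAGHW", "MKVQSGHW", "MKVEAGHW", "MKVLTGHW"]  # made-up data
    for block in conserved_blocks(toy_alignment):
        print(block)
```

Gaps count against conservation here because the fraction is taken over all sequences, not only over the non-gap residues.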