Sequence searches and multiple sequence alignments (Phenylketonuria)

From Bioinformatikpedia
Revision as of 20:20, 5 May 2013 by Worfk

Summary of the task

In this task we compare the protein sequence of interest, phenylalanine hydroxylase (PAH), to other protein sequences. Both sequence searches and multiple sequence alignments were performed. The searches were run against the big_80 database, which contains subsets of Swiss-Prot and PDB whose entries share a sequence similarity of 80% or less, and additionally against a PDB database. The programs BLAST, PSI-BLAST and HHblits were used for the sequence searches. Their results were then used to create multiple sequence alignments (MSAs) with ClustalW, Muscle and T-Coffee.

Sequence searches

The following invocations were used for BLAST, PSI-BLAST and HHblits:

BLAST (Basic Local Alignment Search Tool)

blastall -p blastp -d /mnt/project/rost_db/data/big/big_80 \
  -i /mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta \
  -o /mnt/home/student/worfk/Masterpractical/Task2/Blast/PAH_Blast_big_80.out \
  -v 2000 -b 2000

PSI-BLAST (Position-Specific Iterated BLAST)

For PSI-BLAST (PSI-BLAST Tutorial) more than one invocation was performed. Two iterations were run first with an E-value cutoff of 0.002 and then with a cutoff of 10E-10. Both cutoffs were then repeated with ten iterations. An example invocation would be:

blastpgp -i /mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta \
  -d /mnt/project/rost_db/data/big/big_80 -j 2 -h 0.002 -v 2000 -b 2000 \
  -o psi_blast_big_80_2_2.out -C big_80_check_2_2.chk -Q big_80_matrix_2_2.pssm
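
The four PSI-BLAST runs described above (two or ten iterations, each with both E-value cutoffs) could be generated rather than typed by hand. The following is only a minimal sketch: the function name psiblast_commands and the output file naming scheme are assumptions made for illustration and do not reproduce the file names used in the original runs. The cutoffs are passed as the strings given in the text, and the -C and -Q options are left out for brevity.

```python
from itertools import product

# Paths taken from the example invocation above.
QUERY = "/mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta"
DATABASE = "/mnt/project/rost_db/data/big/big_80"


def psiblast_commands(iterations=(2, 10), cutoffs=("0.002", "10E-10")):
    """Return one blastpgp argument list per (iterations, cutoff) pair."""
    commands = []
    for n_iter, cutoff in product(iterations, cutoffs):
        # Hypothetical naming scheme, chosen only to keep the outputs apart.
        tag = f"j{n_iter}_h{cutoff}"
        commands.append([
            "blastpgp",
            "-i", QUERY,
            "-d", DATABASE,
            "-j", str(n_iter),   # number of iterations
            "-h", cutoff,        # E-value cutoff for inclusion in the profile
            "-v", "2000",
            "-b", "2000",
            "-o", f"psi_blast_big_80_{tag}.out",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for command in psiblast_commands():
        print(" ".join(command))
```

Each returned list could be handed to subprocess.run on a machine where blastpgp and the database are available.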

HHblits

hhblits -i /mnt/home/student/waldraffs/Masterpraktikum/PAH.fasta \
  -d /mnt/project/rost_db/data/hhblits/uniprot20_02Sep11 \
  -o /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.hrr \
  -oa3m /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.a3m \
  -ohhm /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.hhm -Z 2000 -B 2000


To run all programs at once, one can use Maria's Perl script, as shown here:

perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/run.pl ...

Comparison of the results

  • Sequence identity in percent
  • E-value
  • GO terms
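
To make the first criterion concrete, the sketch below computes the percent identity of two already aligned sequences. It is an illustration under stated assumptions, not the script used for this task: the function name percent_identity is made up, and the denominator is the full alignment length including gap columns, which as far as we know matches the Identities line in BLAST output. Other definitions (for example dividing by the length of the shorter sequence) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same alignment, i.e. equally long
    strings in which gaps are written as `gap`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # A column counts as identical only if neither position is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment: 8 columns, 6 identical, 1 mismatch, 1 gap.
    print(percent_identity("MSTAV-LE", "MSSAVQLE"))
```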

For the reference sequence (P00432) the following GO terms were found on QuickGO: ...

To assess the similarity between the reference sequence and the sequences found in the searches, we count how often each of these GO terms occurs among the hits. ...
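
The counting step can be sketched as follows. This is an assumed implementation for illustration only: the function name count_shared_go_terms, the input layout (a mapping from hit identifier to its GO terms) and the placeholder term labels are hypothetical, and how the annotations of the hits are retrieved is not covered here.

```python
from collections import Counter


def count_shared_go_terms(reference_terms, hit_annotations):
    """Count, for each GO term of the reference, how many hits carry it.

    reference_terms: iterable of GO terms annotated to the reference sequence.
    hit_annotations: dict mapping a hit identifier to an iterable of its GO terms.
    """
    reference = set(reference_terms)
    # Start every reference term at zero so that unmatched terms are reported too.
    counts = Counter({term: 0 for term in reference})
    for terms in hit_annotations.values():
        # Using a set means a term is counted at most once per hit.
        for term in reference & set(terms):
            counts[term] += 1
    return counts


if __name__ == "__main__":
    # Placeholder labels, not real GO identifiers.
    reference = ["GO:term_A", "GO:term_B"]
    hits = {
        "hit1": ["GO:term_A"],
        "hit2": ["GO:term_A", "GO:term_C"],
    }
    for term, n in sorted(count_shared_go_terms(reference, hits).items()):
        print(term, n)
```

Terms that occur only in the hits (GO:term_C in the toy data) are ignored, since the comparison is made from the point of view of the reference sequence.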

Multiple sequence alignments

Datasets

For the multiple sequence alignments, three datasets were generated with a Python script: one group of ten sequences with more than 60% sequence identity to our target sequence (PAH), including two PDB sequences; a second group of 20 sequences with between 30% and 60% sequence identity, including four PDB sequences; and a third group of ten sequences with less than 30% sequence identity. The PDB sequences in this last group are an exception with 32% identity, because these were the lowest-identity hits found in the BLAST output against the PDB dataset.
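
The grouping by sequence identity can be sketched as below. This is not the Python script that was actually used; the function name split_by_identity and the input format (pairs of sequence identifier and percent identity) are assumptions. Hits with exactly 30% or 60% identity are assigned to the middle group here, which is a choice the text does not specify. Limiting the group sizes and adding the PDB sequences are left out.

```python
def split_by_identity(hits, high=60.0, low=30.0):
    """Sort hits into three groups by percent identity to the target sequence.

    hits: iterable of (sequence_id, percent_identity) pairs.
    Returns a dict with the keys "high", "medium" and "low".
    """
    groups = {"high": [], "medium": [], "low": []}
    for seq_id, identity in hits:
        if identity > high:
            groups["high"].append(seq_id)
        elif identity >= low:
            # Boundary values (exactly 30% or 60%) end up here by assumption.
            groups["medium"].append(seq_id)
        else:
            groups["low"].append(seq_id)
    return groups


if __name__ == "__main__":
    # Made-up identifiers and identities, for illustration only.
    example = [("seqA", 75.0), ("seqB", 45.0), ("seqC", 12.5)]
    print(split_by_identity(example))
```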

ClustalW

...

Muscle

...

T-Coffee

...