Sequence searches and multiple sequence alignments (Phenylketonuria)
Summary of the task
In this task we compare the protein sequence of interest, in this case phenylalanine hydroxylase (PAH), to other protein sequences. Both sequence searches and multiple sequence alignments were done using the big80 database, which contains subsets of Swiss-Prot and PDB whose entries have a sequence similarity of 80% or less. In addition, searches against a PDB database were done. For the sequence searches the programs BLAST, PSI-BLAST and HHblits were used. Their results were then used to create multiple sequence alignments (MSA) with the methods ClustalW, Muscle and T-Coffee.
Sequence searches
The following invocations were used for BLAST, PSI-BLAST and HHblits:
BLAST (Basic Local Alignment Search Tool)
blastall -p blastp -d /mnt/project/rost_db/data/big/big_80 -i /mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta -o /mnt/home/student/worfk/Masterpractical/Task2/Blast/PAH_Blast_big_80.out -v 2000 -b 2000
PSI-BLAST (Position-Specific Iterated BLAST)
For PSI-BLAST (PSI-BLAST Tutorial) more than one invocation was performed. First, two iterations were run with an E-value cutoff of 0.002 and then again with a cutoff of 10E-10. The same two cutoffs were then used with ten iterations. An example invocation would be:
blastpgp -i /mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta -d /mnt/project/rost_db/data/big/big_80 -j 2 -h 0.002 -v 2000 -b 2000 -o psi_blast_big_80_2_2.out -C big_80_check_2_2.chk -Q big_80_matrix_2_2.pssm
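The four PSI-BLAST runs described above (two or ten iterations, each with both E-value cutoffs) could be generated rather than typed by hand. The following is only a minimal sketch: it reuses the flags and paths from the example invocation, while the output file naming scheme (`tag`) and the function names are assumptions made for illustration, not the names actually used in the task.

```python
from itertools import product

# Paths and flags copied from the example invocation above.
QUERY = "/mnt/home/student/worfk/Masterpractical/Task2/PAH.fasta"
DATABASE = "/mnt/project/rost_db/data/big/big_80"

ITERATIONS = (2, 10)
# Cutoffs kept as strings exactly as written in the text.
EVALUE_CUTOFFS = ("0.002", "10E-10")


def build_blastpgp_command(iterations, cutoff):
    """Return the blastpgp argument list for one iteration/cutoff pair."""
    # Hypothetical naming scheme so that every run gets its own files.
    tag = f"j{iterations}_h{cutoff}"
    return [
        "blastpgp",
        "-i", QUERY,
        "-d", DATABASE,
        "-j", str(iterations),
        "-h", cutoff,
        "-v", "2000",
        "-b", "2000",
        "-o", f"psi_blast_big_80_{tag}.out",
        "-C", f"big_80_check_{tag}.chk",
        "-Q", f"big_80_matrix_{tag}.pssm",
    ]


def all_commands():
    """Build one command per combination of iteration count and cutoff."""
    return [
        build_blastpgp_command(iterations, cutoff)
        for iterations, cutoff in product(ITERATIONS, EVALUE_CUTOFFS)
    ]


if __name__ == "__main__":
    # Print the commands only; running them requires the course environment.
    for command in all_commands():
        print(" ".join(command))
```

Printing the commands first makes it easy to check them before passing each list to `subprocess.run` on a machine where `blastpgp` and the database are available.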
HHblits
hhblits -i /mnt/home/student/waldraffs/Masterpraktikum/PAH.fasta -d /mnt/project/rost_db/data/hhblits/uniprot20_02Sep11 -o /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.hrr -oa3m /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.a3m -ohhm /mnt/home/student/waldraffs/Masterpraktikum/PAH_2000.hhm -Z 2000 -B 2000
To run all programs at once, one could use the Perl script from Maria, as shown here:
perl /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1/run.pl ...
Comparison of the results
- Sequence identity in percent
- E-Value
- GO-terms
For the reference sequence (P00432) the following GO terms were found on QuickGO: ...
To look for similarities between the reference sequence and the sequences found in the searches, those terms are counted. ...
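The counting step can be sketched as follows. This is an illustrative sketch, not the script used for the task: the function name is hypothetical, the GO annotations of the hits are assumed to be available already as a mapping from hit identifier to a list of terms, and the term labels in the usage example are placeholders rather than real GO identifiers.

```python
from collections import Counter


def count_shared_go_terms(reference_terms, hit_annotations):
    """Count, for each reference GO term, how many hits carry it.

    reference_terms: iterable of GO terms of the reference sequence.
    hit_annotations: dict mapping a hit identifier to its GO terms.
    """
    reference = set(reference_terms)
    counts = Counter()
    for terms in hit_annotations.values():
        # A set ensures each hit contributes at most once per term.
        counts.update(reference & set(terms))
    # Report reference terms that no hit shares with a count of zero.
    return {term: counts[term] for term in sorted(reference)}


if __name__ == "__main__":
    # Placeholder labels, not real GO identifiers.
    reference = ["GO:term_a", "GO:term_b", "GO:term_d"]
    hits = {
        "hit_1": ["GO:term_a", "GO:term_b"],
        "hit_2": ["GO:term_a"],
        "hit_3": ["GO:term_c"],
    }
    print(count_shared_go_terms(reference, hits))
```

Terms that occur only in the hits (here `GO:term_c`) are ignored, since the comparison is made from the point of view of the reference sequence.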
Multiple sequence alignments
Datasets
For the multiple sequence alignments, three different datasets were generated with a Python script: one group of ten sequences with more than 60% sequence identity to our target sequence (PAH), including two PDB sequences; another group of 20 sequences with between 30% and 60% sequence identity, containing four PDB sequences; and one group of ten sequences with less than 30% sequence identity. In this last group the PDB sequences have 32% identity, because these are the lowest ones found in the BLAST output against the PDB dataset.
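The grouping by sequence identity can be sketched as below. This is not the original script, only a minimal illustration: the function names are hypothetical, the hits are assumed to be available as (identifier, percent identity) pairs, and the treatment of the exact boundary values 30% and 60% is an assumption. The special handling of the PDB sequences described above is not reproduced here.

```python
import random


def bin_by_identity(hits):
    """Split (identifier, percent_identity) pairs into three identity ranges."""
    groups = {"high": [], "medium": [], "low": []}
    for seq_id, identity in hits:
        if identity > 60:
            groups["high"].append(seq_id)
        elif identity >= 30:
            # Assumption: exactly 30% and 60% both count as the middle range.
            groups["medium"].append(seq_id)
        else:
            groups["low"].append(seq_id)
    return groups


def sample_group(members, size, seed=0):
    """Pick up to `size` members at random; the seed keeps it reproducible."""
    rng = random.Random(seed)
    return rng.sample(members, min(size, len(members)))


if __name__ == "__main__":
    # Made-up identifiers and identities, for illustration only.
    hits = [("a", 95.0), ("b", 60.0), ("c", 45.5), ("d", 30.0), ("e", 12.0)]
    groups = bin_by_identity(hits)
    # Group sizes from the text: 10, 20 and 10 sequences.
    sizes = {"high": 10, "medium": 20, "low": 10}
    for name, members in groups.items():
        print(name, sample_group(members, sizes[name]))
```

A fixed seed is used so that the same sequences are selected on every run, which keeps the resulting alignments comparable between methods.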
ClustalW
...
Muscle
...
T-Coffee
...