Lab journal task 4

From Bioinformatikpedia

Explore structural alignments

The set of structures was assembled manually from the results of the sequence search and by browsing CATH. To compute the pairwise sequence identity, we constructed pairwise sequence alignments with EMBOSS Needle (http://www.ebi.ac.uk/Tools/psa/emboss_needle/) using default parameters.
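As a rough illustration of how a pairwise identity value can be derived from a finished global alignment, the following Python sketch counts identical columns and divides by the full alignment length, gap columns included. The function name and the toy sequences are hypothetical and are not taken from the 1A6Z_A data set; the Needle report itself remains the source of the values used here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity as a percentage of all alignment columns, gaps included."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment for illustration only.
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```

Dividing by the alignment length rather than by the length of the shorter sequence is a design choice; other tools may normalise differently, so identities from different sources should be compared with care.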

PyMOL was used to view the different structures and to construct structural alignments to the template 1A6Z_A. For this we used the command "align to structure" from the action menu.

By default, the PyMOL alignment uses only the C-alpha atoms. The following command can be used to align all atoms and create the object ali12:

align structure1 and resi 1-n, structure2 and resi 1-m, object=ali12 

where n is the number of the last residue in structure 1 and m the number of the last residue in structure 2 (resi selects by residue number, not by atom).

We saved the current view of the first alignment:

my_view = cmd.get_view()

and later used this variable to restore the same orientation, so that the images of the different alignments were always made from the same view:

cmd.set_view(my_view)
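The variable my_view only lives as long as the PyMOL session. A minimal sketch for keeping the view across sessions, assuming the view is the sequence of 18 floats returned by cmd.get_view(), is to store it as JSON. The helper names save_view and load_view and the file name are hypothetical.

```python
import json


def save_view(view, path):
    """Write a PyMOL view (a sequence of 18 floats) to a JSON file."""
    values = [float(v) for v in view]
    if len(values) != 18:
        raise ValueError("expected the 18 values of a PyMOL view")
    with open(path, "w") as handle:
        json.dump(values, handle)


def load_view(path):
    """Read a view written by save_view and return it as a tuple."""
    with open(path) as handle:
        values = json.load(handle)
    if len(values) != 18:
        raise ValueError("expected the 18 values of a PyMOL view")
    return tuple(values)
```

Inside PyMOL this would be used as save_view(cmd.get_view(), "my_view.json") after the first alignment and cmd.set_view(load_view("my_view.json")) before each later image.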


In addition to PyMOL, the following four alignment methods were used through their web interfaces:

Method      Location
LGA         http://proteinmodel.org/AS2TS/LGA/lga.html
SSAP        http://protein.hbu.cn/cath/cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.html
TopMatch    https://topmatch.services.came.sbg.ac.at/
CE          http://source.rcsb.org/jfatcatserver/index.jsp

LGA:

  • superimpose onto the template structure (the second structure, in our case 1A6Z)
  • default parameters

SSAP:

  • no target structure to specify
  • no parameters to set
  • score from 0 to 100

TopMatch:

CE:

  • no template to specify
  • no parameters to set
  • jCE algorithm


Structural alignments for evaluating sequence alignments

The .hhr input file for hhmakemodel.pl was created with hhsearch, using the reference structure 1A6Z_A as query and the current pdb70 database. The maximum number of lines in the summary hit list (-Z) and the maximum number of alignments (-B) were both set to 10000:

hhsearch -i /mnt/home/student/betza/data/1A6Z_A.fasta -d /mnt/project/pracstrucfunc13/data/hhblits/pdb70_current_hhm_db -o /mnt/home/student/betza/task4/1A6Z_A_pdb.hhr -Z 10000 -B 10000

Models were then created with hhmakemodel.pl:

perl /usr/share/hhsuite/scripts/hhmakemodel.pl -i /mnt/home/student/betza/task4/1A6Z_A_pdb.hhr -ts /mnt/home/student/betza/task4/1A6Z_A_models.pdb

The hhsearch results contain 449 proteins, so we selected 9 proteins that span different ranges of probability, E-value and sequence identity. We excluded proteins with an E-value above 1, because only hits with a low E-value can be considered true hits. The sequence identity was taken from the PSI-BLAST runs where available. Not all of the 9 proteins were contained in those results, so for the remaining proteins we calculated pairwise sequence alignments with EMBOSS Needle, as described above, using default parameters.

hhmakemodel.pl was then executed for the selected proteins with the following command:
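The selection itself was done by hand. Purely as an illustration of the E-value cutoff, the Python sketch below filters a hit list at E-value 1 and then picks a fixed number of hits at evenly spaced ranks. The Hit record, both function names and the even-spacing rule are assumptions made for this example and are not the procedure used to choose the 9 proteins.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    number: int         # hit number in the .hhr summary list
    probability: float  # HHsearch probability in percent
    evalue: float
    identity: float     # percent sequence identity to the query


def eligible_hits(hits, max_evalue=1.0):
    """Keep hits with an E-value of at most max_evalue, best E-value first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


def spread_selection(hits, count):
    """Pick `count` hits at evenly spaced ranks so the choice spans the list."""
    if count <= 0:
        return []
    if count >= len(hits):
        return list(hits)
    if count == 1:
        return [hits[0]]
    step = (len(hits) - 1) / (count - 1)
    return [hits[round(i * step)] for i in range(count)]
```

A call such as spread_selection(eligible_hits(all_hits), 9) would return nine hits ordered from the best to the worst accepted E-value; the hit numbers of the result could then be passed to the -m option of hhmakemodel.pl.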

perl /usr/share/hhsuite/scripts/hhmakemodel.pl -i /mnt/home/student/betza/task4/1A6Z_A_pdb.hhr -d /mnt/project/pracstrucfunc13/data/pdb/20120401/entries -m 2 30 47 65 103 141 151 160 253 -ts /mnt/home/student/betza/task4/1A6Z_A_models.pdb

The resulting structures in 1A6Z_A_models.pdb were then extracted into individual PDB files. Using these models, we created structural alignments to the reference with LGA.
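The extraction step can be sketched in Python as follows. The sketch assumes that each model in the multi-model file starts with a line of the form "MODEL <number>" and runs until an END or ENDMDL line or the next MODEL line; only ATOM, HETATM and TER records are kept. The function names and the output file naming are hypothetical, and the layout of the actual file should be checked before relying on this.

```python
def split_models(text):
    """Return {model_number: [coordinate lines]} for each MODEL section."""
    models = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "MODEL":
            # A new model begins; the second field is its number.
            current = int(fields[1])
            models[current] = []
        elif current is not None and fields:
            if fields[0] in ("END", "ENDMDL"):
                current = None
            elif line.startswith(("ATOM", "HETATM", "TER")):
                models[current].append(line)
    return models


def write_models(models, prefix):
    """Write one PDB file per model and return the file paths."""
    paths = []
    for number, lines in models.items():
        path = f"{prefix}_{number}.pdb"
        with open(path, "w") as handle:
            handle.write("\n".join(lines) + "\nEND\n")
        paths.append(path)
    return paths
```

For example, write_models(split_models(open("1A6Z_A_models.pdb").read()), "1A6Z_A_model") would produce one file per model, each of which can be uploaded to LGA separately.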

Pearson's correlation coefficient was computed with R.
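In R this is a single call to cor(x, y), whose default method is Pearson. For readers who want to see the computation spelled out, here is a minimal Python sketch of the same coefficient. The function name is hypothetical, and x and y are placeholders, since which two quantities were correlated is not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    # The 1/(n-1) factors of covariance and variances cancel out.
    return covariance / math.sqrt(var_x * var_y)
```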