Gaucher Disease: Task 04 - Lab Journal

From Bioinformatikpedia
Revision as of 23:44, 19 August 2013

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th, .colBasic2 td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>

Exploring Structural Alignments

The structure of pyridoxal kinase (2F7K) is not part of our set, because we did not obtain any hits with less than 30% sequence identity to our reference protein. Instead, we picked 2F7K at random from the COPS entries belonging to a different L30 group. Its sequence identity to the reference structure 1ogs_A was determined with CATH (SSAP).

CATH description of different levels

  • 3.20.20 TIM Barrel
  • 3.20 Alpha-Beta Barrel
  • 3 Alpha Beta
  • 1 Mainly Alpha
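CATH codes are hierarchical: each dotted number adds one level (Class, Architecture, Topology, Homologous superfamily), so two domains share a level exactly when their codes share the corresponding prefix. A minimal sketch of this, assuming plain dotted codes such as 3.20.20; the helper names are hypothetical and not part of any CATH tool.

```python
from __future__ import annotations

# The four CATH hierarchy levels, from most general to most specific.
LEVEL_NAMES = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code: str) -> list[tuple[str, str]]:
    """Expand a dotted CATH code into (level name, code prefix) pairs."""
    parts = code.split(".")
    if len(parts) > len(LEVEL_NAMES) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH code: {code!r}")
    return [(LEVEL_NAMES[i], ".".join(parts[: i + 1])) for i in range(len(parts))]


def deepest_shared_level(a: str, b: str) -> str | None:
    """Return the most specific level at which two CATH codes agree, or None."""
    shared = None
    for (name, prefix_a), (_, prefix_b) in zip(cath_levels(a), cath_levels(b)):
        if prefix_a != prefix_b:
            break
        shared = name
    return shared


if __name__ == "__main__":
    # Print each level of the TIM barrel topology code from the list above.
    for name, prefix in cath_levels("3.20.20"):
        print(f"{prefix:<8} {name}")
```

For example, deepest_shared_level("3.20.20", "3.20") is "Architecture", while a Mainly Alpha code such as "1" shares no level with 3.20.20 at all.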


LGA

On the LGA server, the proteins were specified by their PDB IDs; the method automatically chose chain A. We tried three parameter sets:

1. default parameters:

-4 -o2 -gdc

2. with aligned CA atoms:

-4 -o2 -gdc -atom:CA -lga_m

3. with all atoms aligned:

-4 -o2 -gdc -ah:0 -lga_m

In the end, we decided to use the default parameters (1.), as they led to better RMSD values than the other two parameter sets (2. and 3.).
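All tools in this section report an RMSD over the atom pairs of their alignment after superposition, i.e. the square root of the mean squared distance between paired atoms. A minimal sketch of that formula in plain Python, assuming the coordinates have already been superposed by the tool; it does not perform the fitting itself (no Kabsch rotation).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired atoms that have ALREADY been superposed.

    Evaluates sqrt(1/N * sum_i |a_i - b_i|^2) over the N atom pairs;
    no rotation or translation is fitted here.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))
```

For example, rmsd([(0, 0, 0)], [(3, 4, 0)]) evaluates to 5.0. Because the value depends on which pairs enter the sum (only C-alpha atoms, all atoms, or a subset without badly fitting pairs), RMSD values are only directly comparable when they are computed over the same kind of atom set.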

PyMOL

In PyMOL, we aligned each structure of our set (shown in Table 1) to the structure of our disease-causing protein. We used only chain A, as we also used only chain A when calculating the RMSD with the other methods. Moreover, the two identical chains of our protein are arranged differently in space than the two chains of another structure. So even if each chain has a low RMSD to one chain of our protein, the spatial arrangement of the chains can still lead to a high RMSD for the complete structures.

To align all atoms:

align 1ogs_A and resi 1-496, structure2 and resi 1-n

To align all C alpha atoms:

align 1ogs_A and resi 1-496 and name ca, structure2 and resi 1-n and name ca

where n is the sequence length of structure2.
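The two commands differ only in the "and name ca" restriction on both selections. A small hypothetical Python helper that writes out the command for one structure of the set; the resulting string could be typed at the PyMOL prompt or passed to cmd.do in a PyMOL Python session. Like the commands above, it assumes that residue numbering starts at 1 in both objects.

```python
def pymol_align_command(structure: str, n: int, ca_only: bool = False,
                        reference: str = "1ogs_A", ref_len: int = 496) -> str:
    """Build the PyMOL 'align' command for one structure of the set.

    n is the sequence length of `structure`; reference and ref_len default
    to the chain A of 1OGS (residues 1-496) used above.
    """
    if n < 1 or ref_len < 1:
        raise ValueError("residue counts must be positive")
    # Optionally restrict both selections to C-alpha atoms.
    atoms = " and name ca" if ca_only else ""
    return (f"align {reference} and resi 1-{ref_len}{atoms}, "
            f"{structure} and resi 1-{n}{atoms}")
```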

SSAP

SSAP is the structural alignment method used by CATH. The structures were entered via their PDB IDs. If a structure had several chains, we always used chain A.

TopMatch

For TopMatch, we used the TopMatch web service. Only chain A of each structure was aligned.

SAP

SAP is a pairwise protein structure alignment method that uses double dynamic programming. We used the SAP web service with the second option, "uploading the PDB files"; the alternative is to enter the PDB IDs. The alignments cannot be restricted to a specific chain, so the whole structures were used for the superposition. However, because the two chains in each protein are identical, duplicate alignments were eliminated implicitly by the program.

Evaluation of structural alignments and sequence alignments

The script hhmakemodel.pl requires an .hhr file from HHsearch or HHblits, containing the hit list and alignments, as input. We used an .hhr file created in task 2 by running two iterations of HHblits against uniprot_20, followed by one iteration against pdb_full, with the UniProt sequence of glucocerebrosidase (P04062) as the query. The number of shown hits and alignments was set to 10000; otherwise, default parameters were used. Overall, 220 hits were found, of which the first 82 (down to hit 82 with an E-value of 9.8e-05) can be regarded as more or less significant. From these hits, we selected the following ones, which span different E-values, scores, alignment lengths and sequence identities:

<figtable id="selected_hits">

{| class="colBasic2"
! Hit no. !! PDB ID !! E-value !! Score !! Aligned columns !! Identities
|-
| 1 || 2v3f_A || 2.4e-132 || 1078.26 || 497 || 100%
|-
| 4 || 2nt0_A || 4.4e-132 || 1074.98 || 496 || 100%
|-
| 5 || 2wnw_A || 3.7e-107 || 870.15 || 439 || 29%
|-
| 8 || 3kl0_A || 1.1e-77 || 633.96 || 356 || 19%
|-
| 19 || 4atw_A || 9e-13 || 138.45 || 245 || 14%
|-
| 31 || 1fob_A || 1.1e-08 || 101.53 || 197 || 16%
|-
| 82 || 1ur1_A || 9.8e-05 || 73.33 || 175 || 11%
|}
Selected hits with different E-values, HHblits scores, alignment lengths and sequence identities to the query sequence P04062.

</figtable>
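To check the spread of the selected hits, the table can be loaded into records and filtered. A minimal sketch in Python; the cutoff of 1e-10 used below is an arbitrary example value, not the criterion by which the 82 significant hits were identified.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    no: int            # rank in the HHblits hit list
    pdb_id: str
    e_value: float
    score: float
    aligned_cols: int
    identity: int      # sequence identity in percent


# Rows copied from the table of selected hits above.
SELECTED_ROWS = """\
1 2v3f_A 2.4e-132 1078.26 497 100%
4 2nt0_A 4.4e-132 1074.98 496 100%
5 2wnw_A 3.7e-107 870.15 439 29%
8 3kl0_A 1.1e-77 633.96 356 19%
19 4atw_A 9e-13 138.45 245 14%
31 1fob_A 1.1e-08 101.53 197 16%
82 1ur1_A 9.8e-05 73.33 175 11%
"""


def parse_hits(text: str) -> List[Hit]:
    """Parse whitespace-separated rows: no, PDB ID, E-value, score, cols, identity."""
    hits = []
    for line in text.strip().splitlines():
        no, pdb_id, e_value, score, cols, identity = line.split()
        hits.append(Hit(int(no), pdb_id, float(e_value), float(score),
                        int(cols), int(identity.rstrip("%"))))
    return hits


def below_e_value(hits: List[Hit], cutoff: float) -> List[Hit]:
    """Keep hits whose E-value is strictly below `cutoff` (a free parameter)."""
    return [h for h in hits if h.e_value < cutoff]


if __name__ == "__main__":
    for h in below_e_value(parse_hits(SELECTED_ROWS), 1e-10):
        print(h.no, h.pdb_id, h.e_value, f"{h.identity}%")
```

With this cutoff, hits 1, 4, 5, 8 and 19 remain. Sorting the records by each column also shows that, among the selected hits, the score falls strictly as the E-value rises, whereas the identity does not (hit 31 has a higher identity than hit 19).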

Four of the structures from our set in the "Exploring Structural Alignments" task appear in the hit list and were selected: 2XWD is contained in the cluster of hit 1, and the sequence-identical structures 1OGS, 2NSX and 2NT1 are contained in the cluster of hit 4.

Then, we created models of P04062 with hhmakemodel.pl, using one of the selected hits at a time:

perl /usr/share/hhsuite/scripts/hhmakemodel.pl -i Assignment2_Alignments/output/task1/hhblits-2iter-uniprot20--1iter-pdb.hhr \
  -d /mnt/project/pracstrucfunc13/data/pdb/20120401/entries/* -m 1 -ts Assignment4_StructuralAlignments/hhmakemodel/P04062_models.pdb

In the command above, "-i" specifies the input .hhr file, "-d" the directories with the PDB structures of the hits (the shell expands entries/* into all entry directories), "-m" the indices of the hits to be used for modelling (default: -m 1), and "-ts" the file to which the PDB-formatted models based on the pairwise alignments are written.
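Since each model is based on a single hit, the command has to be repeated with a different -m index for every selected hit. A minimal Python sketch of that loop, reusing the paths from the command; the per-hit output names (P04062_model_hit&lt;N&gt;.pdb) are a hypothetical naming scheme, and with dry_run=True the script only prints what it would do instead of calling perl.

```python
import glob
import subprocess
from typing import List

# Paths taken from the hhmakemodel.pl command above.
SCRIPT = "/usr/share/hhsuite/scripts/hhmakemodel.pl"
HHR = "Assignment2_Alignments/output/task1/hhblits-2iter-uniprot20--1iter-pdb.hhr"
PDB_GLOB = "/mnt/project/pracstrucfunc13/data/pdb/20120401/entries/*"
OUT_DIR = "Assignment4_StructuralAlignments/hhmakemodel"

# Hit numbers from the table of selected hits.
SELECTED_HITS = [1, 4, 5, 8, 19, 31, 82]


def hhmakemodel_cmd(hit: int, pdb_dirs: List[str]) -> List[str]:
    """Argument list for one hhmakemodel.pl run that models P04062 on one hit."""
    # Hypothetical naming scheme: one output file per hit.
    out_file = f"{OUT_DIR}/P04062_model_hit{hit}.pdb"
    return ["perl", SCRIPT, "-i", HHR, "-d", *pdb_dirs,
            "-m", str(hit), "-ts", out_file]


def run_all(dry_run: bool = True) -> None:
    # Python counterpart of the shell expanding entries/* (sorted list of paths).
    pdb_dirs = sorted(glob.glob(PDB_GLOB))
    for hit in SELECTED_HITS:
        cmd = hhmakemodel_cmd(hit, pdb_dirs)
        if dry_run:
            print(f"hit {hit}: would write {cmd[-1]}")
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the argument list to subprocess.run without a shell avoids quoting problems, which is why the glob is expanded in Python rather than left to the shell.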