Task 9 Lab Journal (MSUD)

From Bioinformatikpedia
Revision as of 23:53, 12 July 2013 by Weish (FoldX)

Selection of structure model

As a tradeoff between resolution and sequence completeness of the structure model, we chose the PDB structure 2BFF as the model structure for BCKDHA.

Visualization of mutant structures

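To locate the mutations in the PDB structure 2BFF, we aligned the SEQRES sequence of 2BFF to the reference sequence of BCKDHA (NP_000700.1) using the Needleman-Wunsch algorithm. The first 45 residues of the reference sequence are not part of the SEQRES sequence, so each mutation position has to be shifted back by 45 residues (for example, reference position 82 corresponds to position 37 in 2BFF). The alignment is shown below: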

NP_000700.1        1 MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD     50
                                                                  |||||
SEQUENCE           1 ---------------------------------------------SSLDD      5

NP_000700.1       51 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE           6 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE     55

NP_000700.1      101 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE          56 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN    105

NP_000700.1      151 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER    200
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         106 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER    155

NP_000700.1      201 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF    250
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         156 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF    205

NP_000700.1      251 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG    300
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         206 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG    255

NP_000700.1      301 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE    350
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         256 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE    305

NP_000700.1      351 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK    400
                     |.||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         306 VGYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK    355

NP_000700.1      401 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK    445
                     |||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         356 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK    400
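
The 45-residue shift can be checked directly from the first two rows of the alignment. The following is only a minimal sketch: the helper names `leading_offset` and `ref_to_pdb` are hypothetical, and the simple substring search only works here because the alignment has no gaps after the leading one.

```python
# First 100 reference residues (NP_000700.1) and first 55 SEQRES residues of 2BFF,
# copied from the first two rows of the alignment.
REF_PREFIX = (
    "MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD"
    "KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE"
)
PDB_PREFIX = (
    "SSLDD"
    "KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE"
)


def leading_offset(ref, pdb):
    """Number of reference residues that precede the first PDB residue.

    Assumption: the PDB sequence occurs as an ungapped substring of the reference.
    """
    offset = ref.find(pdb)
    if offset < 0:
        raise ValueError("PDB prefix not found in reference prefix")
    return offset


OFFSET = leading_offset(REF_PREFIX, PDB_PREFIX)


def ref_to_pdb(position, offset=OFFSET):
    """Convert a 1-based reference position to a 1-based PDB sequence position."""
    pdb_position = position - offset
    if pdb_position < 1:
        raise ValueError("position %d is not covered by the structure" % position)
    return pdb_position


print(OFFSET)
print(ref_to_pdb(82))
# The residue at both positions should be the same (M for mutation M82L).
print(REF_PREFIX[82 - 1], PDB_PREFIX[ref_to_pdb(82) - 1])
```

For the full sequences a real alignment is still needed, since the two sequences differ at one position (N/G in the second-to-last alignment row).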

Create mutated structures (SCWRL)

With repairPDB the PDB file of 2BFF was repaired and the sequence was extracted:

/opt/SS12-Practical/scripts/repairPDB 2BFF.pdb > 2BFF_repaired.pdb

/opt/SS12-Practical/scripts/repairPDB 2BFF.pdb -seq > 2BFF_sequence.txt


The mutations were introduced into the sequence with the following script (located at /mnt/home/student/schillerl/MasterPractical/task9/create_mutated_sequences.py):


<source lang=python>
"""
Create mutated sequences for a sequence extracted from pdb file.

Usage: python create_mutated_sequences.py <sequence file> <pdb sequence file> [mutations]
    <sequence file>      reference sequence fasta file (mutation numbers correspond to this sequence)
    <pdb sequence file>  sequence in pdb file (extracted with repairPDB -seq option)
    mutations            list of mutations

One output file for each mutation will be created (named like pdb sequence file
extended with mutation identifier), all residues lower case except for mutated.

Example for usage:
python create_mutated_sequences.py refseq_BCKDHA_protein.fasta 2BFF_sequence.txt M82L A222T C264W R346H I361V

@author: Laura Schiller
"""

import sys
from Bio import SeqIO, pairwise2
from Bio.SubsMat import MatrixInfo

ref_seq = SeqIO.read(sys.argv[1], "fasta")
mutations = sys.argv[3:]

# read the pdb sequence (first line of the file, without line break)
pdb_seq_file = open(sys.argv[2])
pdb_seq = pdb_seq_file.readline().strip()
pdb_seq_file.close()

# pairwise alignment (plain strings are passed to pairwise2, not the SeqRecord)
matrix = MatrixInfo.blosum62
gap_open = -10
gap_extend = -0.5
alignment = pairwise2.align.globalds(str(ref_seq.seq), pdb_seq, matrix, gap_open, gap_extend)
ref_seq_aligned = alignment[0][0]
pdb_seq_aligned = alignment[0][1]

for mutation in mutations:
    mut_pos = int(mutation[1:-1])
    old_aa = mutation[0]
    new_aa = mutation[-1]

    # determine corresponding position in pdb sequence
    pos = 0
    for i in range(len(ref_seq_aligned)):
        if ref_seq_aligned[i] != '-':
            pos += 1
            if pos == mut_pos:
                break

    # the wild type residue in the pdb sequence must match the mutation identifier
    assert pdb_seq_aligned[i] == old_aa

    # all residues lower case except for the mutated one
    mutated_seq = pdb_seq_aligned.lower()
    mutated_seq = mutated_seq[0:i] + new_aa + mutated_seq[i+1:]

    # e.g. 2BFF_sequence.txt -> 2BFF_sequence_M82L.txt
    out_name = sys.argv[2].split(".")[0] + "_" + mutation + "." + sys.argv[2].split(".")[-1]
    with open(out_name, "w") as out_file:
        out_file.write(mutated_seq.replace('-', '') + "\n")
</source>
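
To illustrate the output format on a toy input, here is a minimal sketch of the lower-case/upper-case convention used for the mutated sequence files. The function name `mutate_for_scwrl` and the 8-residue example are hypothetical. As far as we understand the SCWRL4 sequence file option, lower-case residues keep their side-chain coordinates from the input structure, while upper-case residues are rebuilt; this should be checked against the SCWRL4 manual.

```python
def mutate_for_scwrl(pdb_seq, index, new_aa):
    """Return pdb_seq in lower case, with the residue at the 0-based index
    replaced by new_aa in upper case."""
    lowered = pdb_seq.lower()
    return lowered[:index] + new_aa.upper() + lowered[index + 1:]


# Toy example: replace the 4th residue (D) of a short sequence by A.
example = mutate_for_scwrl("SSLDDKPQ", 3, "A")
print(example)
```

Only the mutated residue is written in upper case; the length of the sequence stays unchanged, which SCWRL requires so that the sequence still matches the backbone of the input structure.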


For each mutated sequence, a structure was created with SCWRL:


<source lang=bash>
for mutation in M82L A222T C264W R346H I361V; do
    /opt/SS12-Practical/scwrl4/Scwrl4 -i 2BFF_repaired.pdb -s 2BFF_sequence_${mutation}.txt -o 2BFF_${mutation}_scwrl_model.pdb
done
</source>

Energy comparisons

FoldX

We adapted the example files shipped with FoldX to run a batch evaluation of the energies of the mutant structures. Since we are interested in the effect of single point mutations on protein structure and function, we listed the 5 chosen mutations on 5 separate rows, each with a trailing semicolon. As before, the mutation positions have to be converted to the numbering of the PDB structure.

List of individual mutants (the chain ID must follow the wild-type residue):

MA37L;
AA177T;
CA219W;
RA301H;
IA316V;
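
The entries of this list can also be derived from the mutation identifiers in reference numbering. The following is only a sketch under stated assumptions: the function name `to_foldx` is hypothetical, all mutations are assumed to lie on chain A, and a constant shift of 45 residues is assumed for every position.

```python
import re


def to_foldx(mutation, chain="A", offset=45):
    """Convert e.g. 'M82L' (reference numbering) into a FoldX mutant entry:
    wild-type residue, chain ID, position in the PDB structure, mutant residue, ';'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError("unexpected mutation identifier: " + mutation)
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    return "%s%s%d%s;" % (wild_type, chain, position - offset, mutant)


mutations = ["M82L", "A222T", "C264W", "R346H", "I361V"]
lines = [to_foldx(m) for m in mutations]
print("\n".join(lines))
```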

Minimise

Gromacs