Task 8 Lab Journal (MSUD)

From Bioinformatikpedia
Revision as of 13:23, 30 June 2013 by Weish (talk | contribs) (Using MSA)

Visualization of mutations

The sequence of a PDB structure is often not identical to that of the original protein; in particular, residues are frequently missing from the structure. To map the positions of mutations reported in SNP databases such as HGMD and dbSNP onto the structure, we performed a global alignment between the reference sequence of BCKDHA and the sequence of its PDB structure.

The alignment between the reference sequence of BCKDHA and chain A of 1U5B is shown below. The first 45 residues of the reference sequence are missing from the structure, so the positions of mutations reported in SNP databases must be shifted back by 45 residues. For example, the 50th residue of the reference sequence is the 5th residue in chain A of 1U5B.

########################################
# Program: needle
# Rundate: Sat 29 Jun 2013 21:56:11
# Commandline: needle
#    -auto
#    -stdout
#    -asequence emboss_needle-I20130629-215609-0499-45143350-es.aupfile
#    -bsequence emboss_needle-I20130629-215609-0499-45143350-es.bupfile
#    -datafile EBLOSUM62
#    -gapopen 10.0
#    -gapextend 0.5
#    -endopen 10.0
#    -endextend 0.5
#    -aformat3 pair
#    -sprotein1
#    -sprotein2
# Align_format: pair
# Report_file: stdout
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: NP_000700.1
# 2: SEQUENCE
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 445
# Identity:     400/445 (89.9%)
# Similarity:   400/445 (89.9%)
# Gaps:          45/445 (10.1%)
# Score: 2124.0
# 
#
#=======================================

NP_000700.1        1 MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD     50
                                                                  |||||
SEQUENCE           1 ---------------------------------------------SSLDD      5

NP_000700.1       51 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE    100
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE           6 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE     55

NP_000700.1      101 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN    150
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE          56 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN    105

NP_000700.1      151 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER    200
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         106 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER    155

NP_000700.1      201 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF    250
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         156 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF    205

NP_000700.1      251 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG    300
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         206 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG    255

NP_000700.1      301 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE    350
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         256 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE    305

NP_000700.1      351 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK    400
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         306 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK    355

NP_000700.1      401 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK    445
                     |||||||||||||||||||||||||||||||||||||||||||||
SEQUENCE         356 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK    400


#---------------------------------------
#---------------------------------------
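
The position shift read from the alignment above can also be computed from the gapped alignment rows. The following is only a minimal sketch: the function name build_position_map is hypothetical, only the first alignment block is used, and the resulting numbers refer to positions in the aligned chain sequence, which need not coincide with the residue numbering stored in the PDB file.

```python
def build_position_map(aligned_ref, aligned_chain):
    """Map 1-based reference positions to 1-based chain positions.

    Both arguments are rows of a pairwise alignment with '-' as gap symbol.
    Reference residues aligned to a gap do not appear in the result.
    """
    mapping = {}
    ref_pos = 0
    chain_pos = 0
    for ref_res, chain_res in zip(aligned_ref, aligned_chain):
        if ref_res != '-':
            ref_pos += 1
        if chain_res != '-':
            chain_pos += 1
        if ref_res != '-' and chain_res != '-':
            mapping[ref_pos] = chain_pos
    return mapping


# First block of the needle output (reference residues 1-50).
aligned_ref = "MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD"
aligned_chain = "-" * 45 + "SSLDD"

position_map = build_position_map(aligned_ref, aligned_chain)
print(position_map[50])      # chain position of reference residue 50
print(position_map.get(38))  # None: residue 38 is not part of the structure
```

Building the map from the alignment rows instead of hard-coding an offset of 45 keeps the mapping correct if a structure also has internal gaps.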

Analysis of evolutionary conservation of mutated positions

Using PSSM

A position-specific scoring matrix (PSSM) was created with PSI-Blast:

blastpgp -d /mnt/project/pracstrucfunc13/data/big/big_80 -i /mnt/home/student/weish/master-practical-2013/task01/refseq_BCKDHA_protein.fasta -j 5 -h 10e-10 -Q BCKDHA.pssm

The conservation (observed frequency) of the wildtype and mutant amino acids was extracted from the PSSM file with the following Python script (located at /mnt/home/student/schillerl/MasterPractical/task8/extract_aa_frequency.py):


<source lang=python>
"""
Extract amino acid frequencies from PsiBlast PSSM (created with -Q option).

Usage: python extract_aa_frequency.py <pssm file> [mutations]

Example for usage: python extract_aa_frequency.py BCKDHA.pssm P38H M82L T151M A222T C264W R265W R314Q R346H I361V Y413H

@author: Laura Schiller
"""

import sys

with open(sys.argv[1]) as pssm_file:
    lines = pssm_file.readlines()

# positions of amino acid frequencies in PSSM
aa_position = {'A': 22,
               'R': 23,
               'N': 24,
               'D': 25,
               'C': 26,
               'Q': 27,
               'E': 28,
               'G': 29,
               'H': 30,
               'I': 31,
               'L': 32,
               'K': 33,
               'M': 34,
               'F': 35,
               'P': 36,
               'S': 37,
               'T': 38,
               'W': 39,
               'Y': 40,
               'V': 41}

# each mutation is given as <wildtype aa><position><mutant aa>, e.g. P38H
for i in range(2, len(sys.argv)):
    old_aa = sys.argv[i][0]
    new_aa = sys.argv[i][-1]
    position = sys.argv[i][1:(len(sys.argv[i]) - 1)]

    for j in range(2, len(lines)):
        splitted = lines[j].split()
        # skip blank lines, which would otherwise raise an IndexError
        if splitted and splitted[0] == position:
            print("Frequencies at position %s: %s %s, %s %s"
                  % (position,
                     old_aa,
                     splitted[aa_position[old_aa]],
                     new_aa,
                     splitted[aa_position[new_aa]]))
            break
</source>
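
The column indices used in aa_position follow from the layout of a row in the PSSM file: position, residue, 20 scores, 20 observed percentages and two trailing values, with the amino acids in the order ARNDCQEGHILKMFPSTWYV. The sketch below illustrates this with a made-up row; the values in it are placeholders and not taken from BCKDHA.pssm.

```python
# Column order of the amino acids in the PSSM file.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# After whitespace splitting: field 0 is the position, field 1 the residue,
# fields 2-21 the scores, fields 22-41 the observed percentages.
FIRST_FREQUENCY_COLUMN = 22
aa_position = {aa: FIRST_FREQUENCY_COLUMN + k
               for k, aa in enumerate(AMINO_ACIDS)}

# Made-up row for illustration only: all scores are 0 and the percentage
# of the k-th amino acid is simply k, so the lookup is easy to check.
scores = ["0"] * 20
percentages = [str(k) for k in range(20)]
row = " ".join(["38", "P"] + scores + percentages + ["0.50", "0.10"])

fields = row.split()
print(aa_position["A"], aa_position["V"])  # first and last frequency column
print(fields[aa_position["P"]])            # placeholder percentage for P
```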

Using MSA

The reference sequence of BCKDHA was aligned against the mammalian section of the UniProt database, with the significance cutoff set to 0.1 and the number of reported hits limited to 1000. The UniProt web tool makes it easy to extract the UniProt identifiers of the homologous mammalian proteins, and the UniProt retrieve tool was then used to download their sequences. Finally, the multiple sequence alignment of the BCKDHA reference sequence and all homologous mammalian proteins was generated with the ClustalW2 tool from EBI.
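
Given such a multiple sequence alignment, the conservation of a mutated position can be read from the corresponding alignment column. The following is a rough sketch of that idea and not the procedure used in this task: the function names and the toy alignment are hypothetical, and the alignment is assumed to be available as a dictionary of equally long gapped sequences.

```python
from collections import Counter


def column_for_reference_position(aligned_ref, position):
    """Return the 0-based alignment column of a 1-based reference position."""
    seen = 0
    for column, residue in enumerate(aligned_ref):
        if residue != '-':
            seen += 1
            if seen == position:
                return column
    raise ValueError("position %d is beyond the reference sequence" % position)


def residue_frequencies(alignment, ref_id, position):
    """Relative frequency of each symbol in the column of a reference position."""
    column = column_for_reference_position(alignment[ref_id], position)
    counts = Counter(seq[column] for seq in alignment.values())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Toy alignment for illustration only, not real BCKDHA homologs.
toy_alignment = {
    "ref":  "MA-VK",
    "seq1": "MAGVK",
    "seq2": "MS-VR",
    "seq3": "MA-IK",
}

freqs = residue_frequencies(toy_alignment, "ref", 3)
print(freqs)
```

Counting gaps in the reference row is necessary because a position in the ungapped reference sequence generally does not equal its column index in the alignment.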

Predicting effects of mutations

SIFT

SIFT was run on this server with default parameters.

Polyphen2

This server was used with default parameters to run Polyphen2.

MutationTaster

SNAP