Task 8 Lab Journal (MSUD)
Contents
Visualization of mutations
Very often the PDB structures do not have identical sequences in comparison to their original proteins. Deletion of residues is very frequent in PDB structure. In order to map the positions of mutations reported in SNP databases like HGMD and dbSNP, we have applied global alignment between reference sequence and the sequence in PDB structure of BCKDHA.
Following is the alignment between reference sequence of BCKDHA and chain A of 1U5B. The result shows that the position of mutations reported in SNP databases should be shifted back with 45 residues. It means the 50th residue in reference sequence is actually the 5th in chain A of 1U5B.
######################################## # Program: needle # Rundate: Sat 29 Jun 2013 21:56:11 # Commandline: needle # -auto # -stdout # -asequence emboss_needle-I20130629-215609-0499-45143350-es.aupfile # -bsequence emboss_needle-I20130629-215609-0499-45143350-es.bupfile # -datafile EBLOSUM62 # -gapopen 10.0 # -gapextend 0.5 # -endopen 10.0 # -endextend 0.5 # -aformat3 pair # -sprotein1 # -sprotein2 # Align_format: pair # Report_file: stdout ######################################## #======================================= # # Aligned_sequences: 2 # 1: NP_000700.1 # 2: SEQUENCE # Matrix: EBLOSUM62 # Gap_penalty: 10.0 # Extend_penalty: 0.5 # # Length: 445 # Identity: 400/445 (89.9%) # Similarity: 400/445 (89.9%) # Gaps: 45/445 (10.1%) # Score: 2124.0 # # #======================================= NP_000700.1 1 MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD 50 ||||| SEQUENCE 1 ---------------------------------------------SSLDD 5 NP_000700.1 51 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 100 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 6 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 55 NP_000700.1 101 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 150 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 56 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 105 NP_000700.1 151 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 200 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 106 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 155 NP_000700.1 201 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 250 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 156 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 205 NP_000700.1 251 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 300 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 206 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 255 NP_000700.1 301 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 350 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 256 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 305 NP_000700.1 351 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 400 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 306 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 355 NP_000700.1 401 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 445 ||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 356 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 400 #--------------------------------------- #---------------------------------------
Analysis of evolutionary conservation of mutated positions
Using PSSM
A position-specific scoring matrix (PSSM) was created with PSI-Blast:
blastpgp -d /mnt/project/pracstrucfunc13/data/big/big_80 -i /mnt/home/student/weish/master-practical-2013/task01/refseq_BCKDHA_protein.fasta -j 5 -h 10e-10 -Q BCKDHA.pssm
The conservation (observed frequency) of the wildtype and mutant amino acids was extracted from the PSSM file with the following Python script (located at /mnt/home/student/schillerl/MasterPractical/task8/extract_aa_frequency.py
):
<source lang=python>
Extract amino acid frequencies from PsiBlast PSSM (created with -Q option).
Usage: python extract_aa_frequency.py <pssm file> [mutations]
Example for usage: python extract_aa_frequency.py BCKDHA.pssm P38H M82L T151M A222T C264W R265W R314Q R346H I361V Y413H
@author: Laura Schiller
import sys
pssm_file = open(sys.argv[1]) lines = pssm_file.readlines() pssm_file.close()
- positions of amino acid frequencies in PSSM
aa_position = { 'A': 22,
'R': 23, 'N': 24, 'D': 25, 'C': 26, 'Q': 27, 'E': 28, 'G': 29, 'H': 30, 'I': 31, 'L': 32, 'K': 33, 'M': 34, 'F': 35, 'P': 36, 'S': 37, 'T': 38, 'W': 39, 'Y': 40, 'V': 41}
for i in range(2, len(sys.argv)):
old_aa = sys.argv[i][0] new_aa = sys.argv[i][-1] position = sys.argv[i][1:(len(sys.argv[i]) - 1)] for j in range(2, len(lines)): splitted = lines[j].split() if splitted[0] == position: print("Frequencies at position %s: %s %s, %s %s" % (position, old_aa, splitted[aa_position[old_aa]], new_aa, splitted[aa_position[new_aa]])) break
</source>
Using MSA
Reference sequence of BCKDHA was aligned against uniprot database of mammals. Significance cutoff was set to 0.1 and maximal 1000 hit is to be shown. Using the web tool from uniprot, we can easily extract uniprot identifiers of the homologous mammal proteins. Then with the retrieve tool from uniprot all homologous mammal sequences can be extracted. At last, the multiple sequence alignment of reference sequence of BCKDHA and all homologous mammal proteins was generated with the ClustalW2 tool from EBI.
Predicting effects of mutations
SIFT
SIFT was run on this server with default parameters.
Polyphen2
This server was used with default parameters to run Polyphen2.