Task 8 Lab Journal (MSUD)
Contents
Visualization of mutations
Very often the PDB structures do not have identical sequences in comparison to their original proteins. Deletion of residues is very frequent in PDB structure. In order to map the positions of mutations reported in SNP databases like HGMD and dbSNP, we have applied global alignment between reference sequence and the sequence in PDB structure of BCKDHA.
Following is the alignment between reference sequence of BCKDHA and chain A of 1U5B. The result shows that the position of mutations reported in SNP databases should be shifted back with 45 residues. It means the 50th residue in reference sequence is actually the 5th in chain A of 1U5B.
######################################## # Program: needle # Rundate: Sat 29 Jun 2013 21:56:11 # Commandline: needle # -auto # -stdout # -asequence emboss_needle-I20130629-215609-0499-45143350-es.aupfile # -bsequence emboss_needle-I20130629-215609-0499-45143350-es.bupfile # -datafile EBLOSUM62 # -gapopen 10.0 # -gapextend 0.5 # -endopen 10.0 # -endextend 0.5 # -aformat3 pair # -sprotein1 # -sprotein2 # Align_format: pair # Report_file: stdout ######################################## #======================================= # # Aligned_sequences: 2 # 1: NP_000700.1 # 2: SEQUENCE # Matrix: EBLOSUM62 # Gap_penalty: 10.0 # Extend_penalty: 0.5 # # Length: 445 # Identity: 400/445 (89.9%) # Similarity: 400/445 (89.9%) # Gaps: 45/445 (10.1%) # Score: 2124.0 # # #======================================= NP_000700.1 1 MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD 50 ||||| SEQUENCE 1 ---------------------------------------------SSLDD 5 NP_000700.1 51 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 100 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 6 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 55 NP_000700.1 101 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 150 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 56 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 105 NP_000700.1 151 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 200 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 106 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 155 NP_000700.1 201 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 250 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 156 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 205 NP_000700.1 251 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 300 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 206 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 255 NP_000700.1 301 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 350 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 256 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 305 NP_000700.1 351 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 400 |||||||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 306 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 355 NP_000700.1 401 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 445 ||||||||||||||||||||||||||||||||||||||||||||| SEQUENCE 356 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 400 #--------------------------------------- #---------------------------------------
Analysis of evolutionary conservation of mutated positions
Using PSSM
A position-specific scoring matrix (PSSM) was created with PSI-Blast:
blastpgp -d /mnt/project/pracstrucfunc13/data/big/big_80 -i /mnt/home/student/weish/master-practical-2013/task01/refseq_BCKDHA_protein.fasta -j 5 -h 10e-10 -Q BCKDHA.pssm
The conservation (observed frequency) of the wildtype and mutant amino acids was extracted from the PSSM file with the following Python script (located at /mnt/home/student/schillerl/MasterPractical/task8/extract_aa_frequency.py
):
<source lang=python>
Extract amino acid frequencies from PsiBlast PSSM (created with -Q option).
Usage: python extract_aa_frequency.py <pssm file> [mutations]
Example for usage: python extract_aa_frequency.py BCKDHA.pssm P38H M82L T151M A222T C264W R265W R314Q R346H I361V Y413H
@author: Laura Schiller
import sys
pssm_file = open(sys.argv[1]) lines = pssm_file.readlines() pssm_file.close()
- positions of amino acid frequencies in PSSM
aa_position = { 'A': 22,
'R': 23, 'N': 24, 'D': 25, 'C': 26, 'Q': 27, 'E': 28, 'G': 29, 'H': 30, 'I': 31, 'L': 32, 'K': 33, 'M': 34, 'F': 35, 'P': 36, 'S': 37, 'T': 38, 'W': 39, 'Y': 40, 'V': 41}
for i in range(2, len(sys.argv)):
old_aa = sys.argv[i][0] new_aa = sys.argv[i][-1] position = sys.argv[i][1:(len(sys.argv[i]) - 1)] for j in range(2, len(lines)): splitted = lines[j].split() if splitted[0] == position: print("Frequencies at position %s: %s %s, %s %s" % (position, old_aa, splitted[aa_position[old_aa]], new_aa, splitted[aa_position[new_aa]])) break
</source>
Using MSA
Reference sequence of BCKDHA was aligned against uniprot database of mammals. Significance cutoff was set to 0.1 and maximal 1000 hit is to be shown. Using the web tool from uniprot, we can easily extract uniprot identifiers of the homologous mammal proteins. Then with the retrieve tool from uniprot all homologous mammal sequences can be extracted. At last, the multiple sequence alignment of reference sequence of BCKDHA and all homologous mammal proteins was generated with the ClustalW2 tool from EBI.
Predicting effects of mutations
SIFT
SIFT was run on this server with default parameters.
Polyphen2
This server was used with default parameters to run Polyphen2.
MutationTaster
The MutationTaster web tool was used to analyze mutations in BCKDHA. Because the server uses mRNA sequence to evaluate SNPs, we have consulted the reference sequence of BCKDHA (NM_000709) and find out the open reading frame of BCKDHA begins at 40th nucleotide i.e. the first start codon. Then we can map mutations of amino acid to the mRNA sequences.
SNAP
For this part of the task we have used SNAP2. In order to get SNAP2 run properly, we have to write a run-configuration file .snap2rc and put it into the home directory on the student server. Following is the configuration file:
<source lang="ini"> [snap2]
- snapfun_utildir=path - path to package utilities, default: /usr/share/snap2
snap2dir=/usr/share/snap2
- use snap cache [0|1], default: 0
use_snap_cache=0
- snap cache fetch executable
snapc_fetch=/usr/bin/snapc_fetch
- snap cache store executable
snapc_store=/usr/bin/snapc_store
- snap cache root - overrides snap-cache-mgr configuration
snap_cache_root=
- blastpgp_processors
blastpgp_processors=1
- use predictprotein cache, default: 0
use_pp_cache=0
- predictprotein executable
pp_exe=/usr/bin/predictprotein
- sift executable
sift_exe=/usr/bin/sift_for_submitting_fasta_seq.csh
- reprof executable
reprof_exe=/usr/bin/reprof
- blast executable
blast_exe=/usr/bin/blastpgp
[data]
- swiss_dat=path - location of UniProt/Swiss-Prot dat file
swiss_dat=/mnt/project/pracstrucfunc13/data/swissprot/20120501/uniprot_sprot.dat
- db_swiss=path - path to ID index of Swiss-Prot dat file (generated by /usr/share/librg-utils-perl/dbSwiss.pl)
db_swiss=/mnt/project/pracstrucfunc13/data/swissprot/20120501/dbswiss
- PHAT substitution matrix
phat_matrix=/usr/share/snap2/phat.txt
[blast]
- big80=path - path to redundancy reduced database (UniProtKB 80 or equivalent)
big80=/mnt/project/pracstrucfunc13/data/big/big_80
- swiss=path - path to SwissProt database
swiss=/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot </source>
To call snap2, we have used this command: snap2 -m mutation-list.txt -i ../task01/refseq_BCKDHA_protein.fasta -o snapfun-bckdha.result