Difference between revisions of "Reference Sequence BCKDHA"

From Bioinformatikpedia
(Multiple Alignments)
(Mutated sequence)
 
(13 intermediate revisions by 2 users not shown)
   
 
Sequence info: [http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1U5B]

== Sequence Alignments ==

=== Sequence searches ===

* FASTA
../bin/fasta36 sequence.fasta database > FastaOutput.txt

* BLAST
blastall -p blastp -d database -i sequence.fasta > BlastOutput.txt

* PSIBLAST
blastpgp -i sequence.fasta -j iterations -h evalueCutoff -d database > PsiblastOutput.txt

* HHSearch
hhsearch -i query -d database -o output.txt

database = /data/blast/nr/nr
 
 
Sequences chosen for the multiple alignment:

{| border="1" style="text-align:center; border-spacing:0;"
!SeqIdentifier
!Seq Identity
!Source
|-
!colspan="3"| 99-90% Sequence Identity
|-
|gi|56967006|pdb|1X7Z||99%||PSI BLAST, 3 iterations, E-value cutoff 0.005
|-
|gi|7546384|pdb|1DTW||95%||BLAST
|-
|gi|34810149|pdb|1OLU||99%||PSI BLAST, 3 iterations, E-value cutoff 10E-6
|-
|gi|13277798|gb|AAH03787.1||95%||PSI BLAST, 3 iterations, E-value cutoff 10E-6
|-
|gi|148727347|ref|NP_001092034.1||95%||BLAST
|-
!colspan="3"| 89-60% Sequence Identity
|-
|gi|196011048|ref|XP_002115388.1||66%||PSI BLAST, 3 iterations, E-value cutoff 0.005
|-
|gi|149543950|ref|XP_001517857.1||67%||BLAST
|-
|gi|47227873|emb|CAG09036.1||82.5%||FASTA
|-
|gi|47196273|emb|CAF88112.1||81%||PSI BLAST, 5 iterations, E-value cutoff 0.005
|-
|gi|12964598|dbj|BAB32665.1||88%||PSI BLAST, 5 iterations, E-value cutoff 10E-6
|-
!colspan="3"| 59-40% Sequence Identity
|-
|gi|193290664|gb|ACF17640.1||47%||BLAST
|-
|gi|215431443|ref|ZP_03429362.1||40%||FASTA
|-
|gi|225557347|gb|EEH05633.1||51%||PSI BLAST, 3 iterations, E-value cutoff 10E-6
|-
|gi|58267618|ref|XP_570965.1||50%||PSI BLAST, 5 iterations, E-value cutoff 0.005
|-
|gi|162449842|ref|YP_001612209.1||41%||PSI BLAST, 5 iterations, E-value cutoff 10E-6
|-
!colspan="3"| 39-20% Sequence Identity
|-
|gi|56966700|pdb|1W85||31%||PSI BLAST, 3 iterations, E-value cutoff 0.005
|-
|gi|5822330|pdb|1QS0||38.1%||FASTA
|-
|gi|13516864|dbj|BAB40585.1||33%||PSI BLAST, 3 iterations, E-value cutoff 10E-6
|-
|gi|284166853|ref|YP_003405132.1||35%||PSI BLAST, 5 iterations, E-value cutoff 0.005
|-
|gi|76800932|ref|YP_325940.1||34%||PSI BLAST, 5 iterations, E-value cutoff 10E-6
|}
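The grouping used in the table above can be reproduced with a small helper. This is only a sketch: the function name `identity_bin` is hypothetical, and the bin boundaries are read off the table headings (a hit with 100% identity would also land in the top bin here, which is an assumption).

```python
def identity_bin(percent_identity: float) -> str:
    """Return the identity-range label (as used in the table) for a hit."""
    # (lower bound, label) pairs taken from the table headings, highest first.
    bins = [(90, "99-90%"), (60, "89-60%"), (40, "59-40%"), (20, "39-20%")]
    for lower_bound, label in bins:
        if percent_identity >= lower_bound:
            return label
    return "below 20%"


# A few hits from the table, keyed by accession, with their reported identity.
hits = {"1X7Z": 99, "CAG09036.1": 82.5, "ACF17640.1": 47, "1QS0": 38.1}
for accession, identity in hits.items():
    print(accession, identity_bin(identity))
```

Sorting hits into bins this way makes it easy to check that each identity range is represented by the same number of sequences before building the multiple alignment.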
 
 
The sequences for the multiple sequence alignment were downloaded from NCBI. Changing the sequence ID in the following link retrieves the corresponding entry in FASTA format:
 
http://www.ncbi.nlm.nih.gov/protein/76800932?report=fasta
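As a sketch of how the link pattern above could be applied to every entry of the table, the following builds the URL from a `gi|...` identifier. The helper names `gi_from_identifier` and `fasta_url` are hypothetical, and the code only constructs the link; it does not download anything or check that the URL is still served.

```python
NCBI_PROTEIN_URL = "http://www.ncbi.nlm.nih.gov/protein/{seq_id}?report=fasta"


def gi_from_identifier(identifier: str) -> str:
    """Extract the numeric GI from an identifier such as 'gi|76800932|ref|YP_325940.1'."""
    fields = identifier.strip("|").split("|")
    if len(fields) < 2 or fields[0] != "gi":
        raise ValueError(f"not a gi identifier: {identifier!r}")
    return fields[1]


def fasta_url(seq_id: str) -> str:
    """Fill the sequence ID into the link pattern given in the text."""
    return NCBI_PROTEIN_URL.format(seq_id=seq_id)


print(fasta_url(gi_from_identifier("gi|76800932|ref|YP_325940.1")))
```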
 
 
=== Multiple Alignments ===

* [[ClustalW]]
clustalw sequences.fasta

* [[T-Coffee]]
t_coffee -seq sequences.fasta

* [[T-Coffee(3D)]]
t_coffee -seq sequences.fasta -mode expresso

* [[Muscle]]
muscle -in sequences.fasta -out output.aln

* [[Cobalt]]
download [ftp://ftp.ncbi.nlm.nih.gov/pub/cobalt cobalt]
./cobalt -i sequences.fasta -norps T > output.aln
   
  +
=== Conservation and Gaps ===

{| border="1" style="text-align:center; border-spacing:0;"
|'''Alignment methods'''
|'''Gaps'''
|'''Avg Gap Length'''
|-
|ClustalW
|12
|3.75
|-
|T-Coffee
|25
|4.56
|-
|T-Coffee
|56
|4.75
|-
|Cobalt
|19
|3.26
|-
|Muscle
|17
|4.76
|}
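The page does not say how the gap numbers in the table were obtained. One plausible reading, sketched below, counts every contiguous run of `-` characters in an aligned row as one gap and averages the run lengths. Both this definition and the toy alignment are assumptions made for illustration, not the method used for the table.

```python
import re


def gap_stats(aligned_rows):
    """Return (number of gap runs, mean gap-run length) over all aligned rows."""
    # Each maximal run of '-' within a row counts as a single gap.
    run_lengths = [
        len(match.group())
        for row in aligned_rows
        for match in re.finditer(r"-+", row)
    ]
    count = len(run_lengths)
    mean_length = sum(run_lengths) / count if count else 0.0
    return count, mean_length


# Hypothetical three-row alignment, not taken from the BCKDHA results.
toy_alignment = ["MAV--AIAA", "MAVAIAI-A", "M----AIAA"]
gaps, avg_gap_length = gap_stats(toy_alignment)
print(gaps, round(avg_gap_length, 2))
```

Applied to the aligned sequences written by each program, a function like this would allow the rows of the table to be compared under one consistent definition.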
 
   
 
back to [[Maple syrup urine disease]] main page
 

Latest revision as of 22:44, 16 June 2011

Sequence

  • Uniprot:

>sp|P12694|ODBA_HUMAN 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial OS=Homo sapiens GN=BCKDHA PE=1 SV=2

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL
RKQQESLARHLQTYGEHYPLDHFDK

Sequence info: [1]. The UniProt sequence is 445 aa long because it includes the transit peptide at positions 1-45.

  • PDB:

>1U5B:A|PDBID|CHAIN|SEQUENCE

SSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQ
GRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTI
SSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIA
ARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPISRLR
HYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK

Sequence info: [2]
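The relationship between the two sequences above can be checked with a few lines of code. This is a minimal sketch that uses only the first line of the UniProt entry and the first 15 residues of the PDB chain as listed on this page. The 45-residue transit peptide and the 445-residue total length are taken from the text; the variable names are illustrative.

```python
# First 60 residues of the UniProt entry P12694 as listed above.
UNIPROT_START = "MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE"
# First 15 residues of PDB chain 1U5B:A as listed above.
PDB_START = "SSLDDKPQFPGASAE"

TRANSIT_PEPTIDE_LENGTH = 45   # positions 1-45 according to the text
PRECURSOR_LENGTH = 445        # full UniProt sequence length

# Split the precursor into transit peptide and the start of the mature chain.
transit_peptide = UNIPROT_START[:TRANSIT_PEPTIDE_LENGTH]
mature_start = UNIPROT_START[TRANSIT_PEPTIDE_LENGTH:]

MATURE_LENGTH = PRECURSOR_LENGTH - TRANSIT_PEPTIDE_LENGTH

print("transit peptide:", transit_peptide)
print("mature chain starts with:", mature_start)
print("PDB chain matches mature start:", PDB_START.startswith(mature_start))
print("mature length:", MATURE_LENGTH)
```

The PDB chain therefore begins at UniProt position 46, so a residue number in PDB sequence coordinates has to be shifted by 45 to obtain the UniProt position.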

Mutated sequence

The following sequence shows the sequence including all point mutations (missense/nonsense) listed in HGMD (signal sequence: residues 1-45).

> bckdha 445 aminoacids; Mw=50481.62Da

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAE
FIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILY
ESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYG
NISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGA
ASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG
NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHP
ISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQL
RKQQESLARHLQTYGEHYPLDHFDK*


The following sequence is the reference sequence used by dbSNP. Note that it is longer than the 400 amino acids of the mature protein and even longer than the 445 amino acids of the protein plus signal sequence: it contains additional amino acids at both the beginning (LRECRTAEWLLAK) and the end (everything after ...HYPLDHFDK).

LRECRTAEWLLAKMAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSS
LDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYK
SMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYR
DYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRAN
ANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAAR
GPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRS
VDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNL
LFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK.DLLSPPPPILSYPER.PHSKG
SRGT.QHTTVFPSQLPLKYSAARAAATLHPCSSRLLHCQGTASAAVAEAPSAPSSPVVTV
PSPRGWVRAHSGLEAPLGMGWTWQVSLWNLRRCEWPAEVTNKLHLCAWLSTKKKKKK


This sequence has an additional 13 amino acids at the beginning, which must be accounted for when comparing SNP positions with the positions retrieved from HGMD.
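The shift can be derived from the sequences themselves rather than hard-coded. The sketch below assumes that both numbering schemes are 1-based, that dbSNP positions count from the first residue of the longer sequence, and that HGMD positions follow the 445-residue sequence starting at the initial methionine. The function names are hypothetical.

```python
# First line of the dbSNP reference sequence as listed above.
DBSNP_START = "LRECRTAEWLLAKMAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSS"
# First 20 residues of the 445-residue sequence used for HGMD positions.
HGMD_START = "MAVAIAAARVWRLNRGLSQA"

# Number of extra residues preceding the initial methionine in the dbSNP sequence.
OFFSET = DBSNP_START.find(HGMD_START)
if OFFSET < 0:
    raise ValueError("HGMD sequence start not found in dbSNP reference")


def dbsnp_to_hgmd(position: int) -> int:
    """Convert a 1-based dbSNP reference position to the 445-residue numbering."""
    converted = position - OFFSET
    if converted < 1:
        raise ValueError("position lies in the extra N-terminal residues")
    return converted


def hgmd_to_dbsnp(position: int) -> int:
    """Convert a 1-based position in the 445-residue numbering to dbSNP numbering."""
    return position + OFFSET


print(OFFSET, dbsnp_to_hgmd(14), hgmd_to_dbsnp(1))
```

Positions that fall in the C-terminal extension of the dbSNP sequence have no counterpart in the 445-residue sequence; a fuller version would also reject converted positions above 445.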

go to Task 2: Sequence_Alignments

go to Task 5: Mapping SNPs

back to Maple syrup urine disease main page