https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/api.php?action=feedcontributions&user=Weish&feedformat=atomBioinformatikpedia - User contributions [en]2024-03-28T16:05:09ZUser contributionsMediaWiki 1.31.16https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_4_(MSUD)&diff=37386Task 4 (MSUD)2013-08-30T20:14:34Z<p>Weish: /* Quality measures */</p>
<hr />
<div>== Structural alignment ==<br />
<br />
[[Lab Journal of Task 4 (MSUD)#Structural alignment|Lab journal]]<br />
<br />
=== Results ===<br />
The following structures were chosen to evaluate the structural alignment methods.<br />
<br />
<table border="1"><br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A1"><br />
<p class="P1">'''PDB ID'''</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A1"><br />
<p class="P1">'''Identity to BCKDHA'''</p><br />
</td><br />
<br />
<td>'''Origin'''</td><br />
<br />
<td style="text-align:left;width:2.1972in;" class="Table1_C1"><br />
<p class="P2">'''Comment'''</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P1">1u5b</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P1">100%</p><br />
</td><br />
<br />
<td>Mitochondrial branched-chain &alpha;-keto acid dehydrogenase (Human)</td><br />
<br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P2">BCKDHA</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P1">2bfe</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P3">99%</p><br />
</td><br />
<br />
<td>Mitochondrial branched-chain &alpha;-keto acid dehydrogenase (Human)</td><br />
<br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P2">BCKDHA alternative structure</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P4">3exg</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P4">24.9%</p><br />
</td><br />
<td>Pyruvate dehydrogenase (Human)</td><br />
<br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P4">structure with low identity</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P2">13pk</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P2">4.1%</p><br />
</td><br />
<td>Phosphoglycerate kinase (''Trypanosoma brucei'')</td><br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P2">same CAT classification</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P2">17gs</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P2">8.3%</p><br />
</td><br />
<td>Glutathione S-Transferase (Human)</td><br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P2">same CA classification</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P5">2z37</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P5">5.4%</p><br />
</td><br />
<td>Chitinase (''Brassica juncea'')</td><br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P2">same C classification</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td style="text-align:left;width:0.6326in;" class="Table1_A2"><br />
<p class="P5">1f0y</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.5264in;" class="Table1_A2"><br />
<p class="P5">8.1%</p><br />
</td><br />
<td>L-3-Hydroxyacyl-CoA Dehydrogenase (Human)</td><br />
<td style="text-align:left;width:2.1972in;" class="Table1_C2"><br />
<p class="P5">different CATH classification</p><br />
</td><br />
</tr><br />
</table><br />
<br />
We did not find any PDB structure with a sequence identity between 40% and 90% to BCKDHA.<br />
<br />
==== Structural alignments ====<br />
<gallery perrow=3 widths=300px heights=220px caption="Structures superimposed to X-ray structures of BCKDHA using PyMOL"><br />
File:Bckdha-aligned-to-2bfe.png|BCKDHA aligned to 2BFE. Structure of BCKDHA is shown in green.<br />
File:Bckdha-aligned-to-3exg.png|BCKDHA aligned to 3EXG. Structure of BCKDHA is shown in green.<br />
File:Bckdha-aligned-to-13pk.png|BCKDHA aligned to 13PK. Structure of BCKDHA is shown in green.<br />
File:Bckdha-aligned-to-17gs.png|BCKDHA aligned to 17GS. Structure of BCKDHA is shown in green.<br />
File:Bckdha-aligned-to-2z37.png|BCKDHA aligned to 2Z37. Structure of BCKDHA is shown in green.<br />
File:Bckdha-aligned-to-1f0y.png|BCKDHA aligned to 1F0Y. Structure of BCKDHA is shown in green.<br />
</gallery><br />
<br />
==== Quality measures ====<br />
The RMSD values (&Aring;) of the structural alignments produced by the different methods are shown in the following table. All structures were aligned to the structure of BCKDHA (1u5b):<br />
<br />
<table border="1"><br />
<tr><br />
<td>'''Structure type'''</td><br />
<td style="text-align:left;width:1.1542in;"><br />
<p class="P6">'''PDB'''</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;"><br />
<p class="P7">'''PyMOL'''</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;"><br />
<p class="P6">'''LGA'''</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;"><br />
<p class="P6">'''SSAP'''</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;"><br />
<p class="P6">'''TopMatch'''<span class="T2">(''E<sub>r</sub>'')</span></p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F1"><br />
<p class="P6">'''CE'''</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>&gt;90% identity</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">2bfe</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">0.21</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">0.29</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">0.30</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P13"><span class="T2">0.3</span><span class="T5">4</span></p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P6">0.29</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>&lt;30% identity</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">3exg</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">1.30</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">1.73</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">9.22</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P15">2.05</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P6">1.97</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>CATH: same CAT</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">13pk</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">12.05</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P6">3.11</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P9">19.22</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P16">2.45</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P9">4.01</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>CATH: same CA</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P9">17gs</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">10.75</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P10">3.41</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P10">15.88</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P16">2.72</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P10">6.05</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>CATH: same C</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P11">2z37</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">9.07</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P12">3.27</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P12">25.26</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P16">2.17</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P12">6.17</p><br />
</td><br />
</tr><br />
<br />
<tr><br />
<td>different CATH</td><br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P12">1f0y</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P8">18.73</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P12">2.96</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P12">14.07</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_A2"><br />
<p class="P17">2.52</p><br />
</td><br />
<br />
<td style="text-align:left;width:1.1542in;" class="Table2_F2"><br />
<p class="P12">5.29</p><br />
</td><br />
</tr><br />
</table><br />
<br />
=== Discussion ===<br />
* Sequence dissimilarity does not imply structural dissimilarity. <br />
** Although chain A of PDB structure 3EXG has a sequence identity of only about 25% to BCKDHA, the two structures are similar.<br />
<br />
* The RMSD values of the different alignment methods deviate strongly. <br />
** The RMSD values from SSAP and TopMatch differ almost 8-fold for 13pk (19.22 &Aring; vs. 2.45 &Aring;) and more than 11-fold for 2z37 (25.26 &Aring; vs. 2.17 &Aring;).<br />
** The deviation can be explained by the different goals of the methods. <br />
*** For example, TopMatch tends to optimize structural alignments locally, i.e. instead of aligning the whole structure, it aligns regions with high structural similarity. The resulting RMSD only describes the structural deviation between the locally aligned regions.<br />
*** In contrast to TopMatch, SSAP aligns structures globally. Domains with high similarity in different proteins cannot be detected if the overall structures have diverged.<br />
<br />
* Users should select the tool that fits their question<br />
** Although these methods return very different results, none of them can easily be declared the universal solution for structural alignment.<br />
** For proteins with similar structures, it is meaningful to use a global alignment method such as SSAP, which aligns conserved domains and secondary structures together.<br />
** For studying the conservation of specific domains, local alignment can identify common structural patterns in structurally dissimilar proteins.<br />
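<br />
The difference between a global and a local RMSD can be illustrated with a minimal Python sketch. The coordinates below are hypothetical, already superimposed C&alpha; pairs (not taken from any of the structures above), and the 5 &Aring; cutoff is an arbitrary assumption; real tools such as TopMatch or SSAP additionally optimize the superposition itself.<br />

```python
import math

def rmsd(pairs):
    """Root-mean-square deviation over paired (x, y, z) coordinates."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs]
    return math.sqrt(sum(squared) / len(squared))

def local_rmsd(pairs, cutoff):
    """RMSD restricted to pairs that lie closer together than `cutoff`."""
    close = [(p, q) for p, q in pairs if math.dist(p, q) < cutoff]
    return rmsd(close), len(close)

# Hypothetical, already superimposed C-alpha pairs: three residues match
# well (1 A apart each), one residue belongs to a diverged region (20 A).
pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((3.8, 0.0, 0.0), (3.8, 1.0, 0.0)),
    ((7.6, 0.0, 0.0), (7.6, 0.0, 1.0)),
    ((11.4, 0.0, 0.0), (11.4, 0.0, 20.0)),
]

global_value = rmsd(pairs)                             # all four pairs
local_value, n_local = local_rmsd(pairs, cutoff=5.0)   # well-matching pairs only

print(f"global RMSD: {global_value:.2f} over {len(pairs)} pairs")
print(f"local RMSD:  {local_value:.2f} over {n_local} pairs")
```

A single diverged pair dominates the global value, while the local value only reflects the well-matching region. A local RMSD should therefore always be read together with the number of aligned residues.<br />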
<br />
== Evaluation of alignments using structures ==<br />
<br />
[[Lab Journal of Task 4 (MSUD)#Evaluation of alignments using structures|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
The following table shows an overview of the structures used for building models, the scores of the structural alignment (RMSD and LGA_S, a structure similarity score, both computed on the C&alpha; atoms), and the scores of the sequence alignment (E-value, probability and sequence identity).<br />
<br />
<br />
{| class="wikitable" border="1" style="text-align:center;width:500px" align="center"<br />
!model !! RMSD [Å] !! LGA_S !! E-value !! probability [%] !! sequence identity [%]<br />
|-<br />
|1qs0 || 1.24 || 84.09 || 5.8E-94 || 100.0 || 38<br />
|-<br />
|1w85 || 1.77 || 78.36 || 8.3E-87 || 100.0 || 33<br />
|-<br />
|2ozl || 1.63 || 74.03 || 3.2E-69 || 100.0 || 27<br />
|-<br />
|2yic || 2.45 || 41.73 || 5.7E-47 || 100.0 || 16<br />
|-<br />
|3l84 || 2.01 || 32.41 || 6.5E-18 || 99.5 || 21<br />
|-<br />
|2q28 || 1.86 || 25.40 || 1.6E-08 || 97.9 || 13<br />
|-<br />
|1r9j || 1.73 || 30.10 || 1.1E-06 || 97.2 || 25<br />
|-<br />
|2vk8 || 2.12 || 21.99 || 3.7E-05 || 96.4 || 22<br />
|-<br />
|1t9b || 1.83 || 23.72 || 1.1E-03 || 94.9 || 18<br />
|-<br />
|2c31 || 2.00 || 21.85 || 1.1E-02 || 92.7 || 21<br />
|}<br />
<br />
<br />
{| class="wikitable" border="1" style="text-align:center;width:500px" align="center"<br />
|+Correlations of structural to sequence alignment scores<br />
|-<br />
! !! E-value !! log10(E-value) !! probability [%] !! sequence identity<br />
|-<br />
|'''RMSD''' || 0.15 || 0.49 || -0.19 || -0.74<br />
|-<br />
|'''LGA_S''' || -0.33 || -0.98 || 0.71 || 0.82<br />
|}<br />
<br />
<br />
As the table above shows, the RMSD correlates weakly with the logarithm of the E-value (0.49) and more strongly with the sequence identity (-0.74): the RMSD is lower if the E-value is lower or the sequence identity is higher.<br />
<br />
The same tendency can be seen for the LGA_S score, but here the correlations are stronger. In contrast to the RMSD, the LGA_S score also correlates with the probability.<br />
<br />
The signs are opposite for RMSD and LGA_S, because the RMSD is lower for higher similarity, whereas the LGA_S is higher.<br />
<br />
The relationship of LGA_S and E-value, the pair of scores with the highest correlation, for the 10 models is shown in the following plot.<br />
<br />
[[File:MSUD_cor_LGA-S_evalue.jpeg|400px|center]]<br />
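<br />
As a cross-check, the correlation between LGA_S and log10(E-value) can be recomputed from the values in the results table. The following is a minimal sketch, assuming that a plain Pearson correlation was used; the page does not state which correlation measure was applied.<br />

```python
import math

# (LGA_S, E-value) per model, copied from the results table
models = {
    "1qs0": (84.09, 5.8e-94),
    "1w85": (78.36, 8.3e-87),
    "2ozl": (74.03, 3.2e-69),
    "2yic": (41.73, 5.7e-47),
    "3l84": (32.41, 6.5e-18),
    "2q28": (25.40, 1.6e-08),
    "1r9j": (30.10, 1.1e-06),
    "2vk8": (21.99, 3.7e-05),
    "1t9b": (23.72, 1.1e-03),
    "2c31": (21.85, 1.1e-02),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

lga_s = [scores[0] for scores in models.values()]
log_evalue = [math.log10(scores[1]) for scores in models.values()]

r_log = pearson(lga_s, log_evalue)
print(f"LGA_S vs. log10(E-value): r = {r_log:.2f}")
```

Taking the logarithm matters here: the E-values span more than 90 orders of magnitude, so on the raw scale the two largest values dominate the correlation.<br />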
<br />
=== Discussion ===<br />
<br />
The correlations between structural and sequence alignment scores are as expected. A low E-value indicates a hit that is unlikely to occur by chance alone, so it is significant. Such a hit is likely related to the query and will have a similar structure. The RMSD, which measures the difference (root-mean-square deviation) between the aligned structures, will therefore be low for closely related proteins. Likewise, two sequences with a high sequence identity are more likely to share the same structure, which explains the correlation of the RMSD with sequence identity. The observed correlations of the RMSD with the alignment scores may be weaker than those of the LGA_S score because the RMSD is calculated only locally, for structurally aligned residues. It therefore tends to be too low: a protein pair with one very similar part and another dissimilar, non-alignable part would still have a low RMSD. For the probability, the values in the sample do not cover the whole range of possible values, so almost no correlation with the RMSD was observed. We did not use very distant relatives with too high an E-value to create structure models.<br />
<br />
The LGA_S score, which combines local and global distances to the reference structure (here: the structure of our BCKDHA protein), gives a better indication of the overall structural similarity than the local RMSD. It is correlated with the sequence alignment scores (in particular with the logarithm of the E-value), so a significant hit in the sequence alignment is likely to have a structure similar to that of the query and can thus be used to create a model of the structure of the query protein.<br />
<br />
All observed local RMSD values were relatively low, and the LGA_S values were high at least for closely related structures. So the approach of simply copying C&alpha; atom coordinates of aligned residues from the structure of a related sequence helps to build a model of the unknown structure of a protein.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_9_(MSUD)&diff=34487Task 9 (MSUD)2013-08-09T18:21:17Z<p>Weish: /* Visualization of mutant structures */</p>
<hr />
<div>==Results==<br />
<br />
[[Task 9 Lab Journal (MSUD)|Lab journal]]<br />
<br />
<br />
For this task we have chosen 5 mutations from HGMD and dbSNP. The three mutations from dbSNP are not annotated as pathogenic and are treated as neutral, i.e. they are not expected to change the function of the protein; the two mutations from HGMD are disease-causing. The 5 mutations are:<br />
<br />
<br />
<table border=1><br />
<tr style="font-weight:bold;"><td>Database</td><td>Accession_code</td><td>Mutation</td><td>Pathogenic</td></tr><br />
<tr><td>dbSNP</td><td>rs11549938</td><td>M82L</td><td>False</td></tr><br />
<tr><td>dbSNP</td><td>rs141086188</td><td>A222T</td><td>False</td></tr><br />
<tr style="background-color: #ffbbbb;"><td>HGMD</td><td>CM045934</td><td>C264W</td><td>True</td></tr><br />
<tr style="background-color: #ffbbbb;"><td>HGMD</td><td>CM062448</td><td>R346H</td><td>True</td></tr><br />
<tr><td>dbSNP</td><td>rs61736656</td><td>I361V </td><td>False</td></tr><br />
</table><br />
<br />
<br />
===Selection of structure model===<br />
<br />
The following table shows an overview of all structures that are available for our protein:<br />
<br />
<br />
{| class='wikitable' border='1' style='width:800px'<br />
! Entry !! Method !! Resolution [Å] !! Chain !! Positions !! R-value !! pH !! gaps<br />
|-<br />
| 2BFD || X-ray || 1.39 || A || 46-445 || 0.15 || 5.5 || 289GLY-292SER,293THR-313ASP<br />
|- style="background-color: #FFBBBB;"<br />
| 2BFF || X-ray || 1.46 || A || 46-445 || 0.15 || 5.5 || 301ARG-305GLU<br />
|-<br />
| 1X7Y || X-ray || 1.57 || A || 46-445 || 0.15 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 2BFC || X-ray || 1.64 || A || 46-445 || 0.144 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 2BFE || X-ray || 1.69 || A || 46-445 || 0.15 || 5.5 || 289GLY-291HIS,293THR-313ASP<br />
|-<br />
| 1X7Z || X-ray || 1.72 || A || 46-445 || 0.154 || 5.8 || 301ARG-306VAL<br />
|-<br />
| 1X7W || X-ray || 1.73 || A || 46-445 || 0.148 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 2BFB || X-ray || 1.77 || A || 46-445 || 0.145 || 5.5 || 287ARG-313ASP<br />
|-<br />
| 2BEW || X-ray || 1.79 || A || 46-445 || 0.147 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1V1R || X-ray || 1.80 || A || 46-445 || 0.158 || 5.5 || 222ASN-231SER,286TYR-314HIS<br />
|-<br />
| 2BEV || X-ray || 1.80 || A || 46-445 || 0.139 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1U5B || X-ray || 1.83 || A || 46-445 || 0.156 || 5.8 || 301ARG-306VAL<br />
|-<br />
| 1WCI || X-ray || 1.84 || A || 46-445 || 0.149 || 5.5 || 301ARG-309TRP<br />
|-<br />
| 1OLS || X-ray || 1.85 || A || 46-445 || 0.172 || 5.5 || 292SER-295ASP,298SER-300TYR,301ARG-308TYR<br />
|-<br />
| 2J9F || X-ray || 1.88 || A/C || 46-445 || 0.171 || 5.5 || 300TYR-313ASP<br />
|-<br />
| 2BEU || X-ray || 1.89 || A || 46-445 || 0.171 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1OLU || X-ray || 1.90 || A || 46-445 || 0.161 || 5.5 || 26ASN-30GLY,287ARG-313ASP<br />
|-<br />
| 1V16 || X-ray || 1.90 || A || 46-445 || 0.132 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 1V11 || X-ray || 1.95 || A || 46-445 || 0.139 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 1V1M || X-ray || 2.00 || A || 46-445 || 0.13 || 5.5 || 289GLY-313ASP<br />
|-<br />
| 1X80 || X-ray || 2.00 || A || 46-445 || 0.161 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 1X7X || X-ray || 2.10 || A || 46-445 || 0.149 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 1OLX || X-ray || 2.25 || A || 46-445 || 0.161 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1DTW || X-ray || 2.70 || A || 46-445 || 0.224 || 7.5 || 301ARG-314HIS<br />
|}<br />
<br />
<br />
Unfortunately, all structures contain gaps that span positions 302-304 (corresponding to 347-349 in the reference sequence), so we cannot create a composite structure without this gap. We chose 2BFF because it has the smallest gap, a good resolution and a low R-value. It was not resolved at physiological pH, but the only structure resolved at pH 7.5 (1DTW) has a poor resolution. The RMSD between 2BFF and 1DTW is about 0.3 &Aring;, so the different pH does not lead to a different structure, and the low pH at which 2BFF was resolved should therefore not be a problem.<br />
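<br />
The selection can be reproduced with a short Python sketch that parses the gap column and ranks the entries. Only a subset of the table rows is included, and the ranking key (total number of missing residues first, resolution second) is an assumption made for illustration; the choice above also took the R-value into account.<br />

```python
import re

# (entry, resolution in Angstrom, gaps) for a subset of the table rows
structures = [
    ("2BFD", 1.39, "289GLY-292SER,293THR-313ASP"),
    ("2BFF", 1.46, "301ARG-305GLU"),
    ("1X7Z", 1.72, "301ARG-306VAL"),
    ("1U5B", 1.83, "301ARG-306VAL"),
    ("1DTW", 2.70, "301ARG-314HIS"),
]

# One gap looks like "301ARG-305GLU": start position, residue name,
# end position, residue name.
GAP = re.compile(r"(\d+)[A-Z]{3}-(\d+)[A-Z]{3}")

def missing_residues(gaps):
    """Total number of unresolved residues in a comma-separated gap list."""
    return sum(int(end) - int(start) + 1 for start, end in GAP.findall(gaps))

# Smaller is better for both criteria: gap size first, then resolution.
ranked = sorted(structures, key=lambda s: (missing_residues(s[2]), s[1]))
best = ranked[0][0]

for entry, resolution, gaps in ranked:
    print(entry, missing_residues(gaps), resolution)
```

With this key, 2BFF (5 missing residues, 1.46 &Aring;) is ranked first, ahead of 1X7Z and 1U5B (6 missing residues each).<br />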
<br />
<br />
===Visualization of mutant structures===<br />
The mutagenesis wizard of PyMOL was used to introduce the mutations into the protein structure. The mutations and their neighboring residues are shown in the following figures.<br />
<gallery caption="Visualization of mutations in BCKDHA (PDB: 2BFF, all residue positions refer to the reference sequence of BCKDHA)" widths="300" heights="240" perrow=3><br />
File:2bff_m82l.png|Local environment of residue 82. Original methionine is replaced by leucine.<br />
File:2bff_a222t.png|Local environment of residue 222. Original alanine is replaced by threonine.<br />
File:2bff_c264w.png|Local environment of residue 264. Original cysteine is replaced by tryptophan. Clearly, the mutant residue tryptophan clashes with residues near the binding site (<span style="color:blue;">Blue</span>). The smallest atomic distance we have found is 1.2 &Aring;.<br />
File:2bff_r346h.png|Local environment of residue 346. Original arginine is replaced by histidine.<br />
File:2bff_i361v.png|Local environment of residue 361. Original isoleucine is replaced by valine.<br />
</gallery><br />
<br />
<!--<br />
===Mutant structures (SCWRL)===<br />
--><br />
<br />
===Energy comparisons===<br />
<br />
====FoldX====<br />
<table border=1><br />
<tr style="font-weight: bold;"><td>Mutation</td><td>PDB</td><td>SD</td><td>total energy</td><td>Backbone Hbond</td><td>Sidechain Hbond</td><td>Van der Waals</td><td></td><td></td><td></td></tr><br />
<tr><td>M82L</td><td>pdb2bff_1</td><td>0</td><td>0.08</td><td>0.01</td><td>-0.05</td><td>0.46</td><td>-0.1</td><td>0</td><td>0.44</td></tr><br />
<tr><td>A222T</td><td>pdb2bff_2</td><td>0</td><td>1.22</td><td>0.17</td><td>-1.17</td><td>-0.7</td><td>0.02</td><td>1.48</td><td>-0.51</td></tr><br />
<tr><td>C264W</td><td>pdb2bff_3</td><td>0</td><td>34.77</td><td>-0.16</td><td>-0.15</td><td>-1.99</td><td>-0.09</td><td>2.88</td><td>-2.95</td></tr><br />
<tr><td>R346H</td><td>pdb2bff_4</td><td>0</td><td>1.55</td><td>1.28</td><td>1.9</td><td>1.36</td><td>-0.57</td><td>-3.03</td><td>0.73</td></tr><br />
<tr><td>I361V</td><td>pdb2bff_5</td><td>0</td><td>0.78</td><td>-0.02</td><td>0.01</td><td>0.54</td><td>-0.02</td><td>-0.29</td><td>1.06</td></tr><br />
</table><br />
<br />
The resulting structural models were compared with the structures calculated by SCWRL. Using the alignment tool of PyMOL, we did not find any global deviation between the structures. However, the structures produced by SCWRL lack some residues. The following table lists the missing residues.<br />
<br />
<table border=1><br />
<tr style="font-weight: bold;"><td>Mutation</td><td>RMSD(&Aring;)</td><td>Missing residues in SCWRL</td></tr><br />
<tr><td>M82L</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>A222T</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>C264W</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>R346H</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>I361V</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
</table><br />
<br />
In 3 mutant structures, the side-chain conformation of the mutated residue differs between the results of FoldX and SCWRL. The other 2 mutant structures do not show such a difference.<br />
<br />
<gallery perrow=3 widths="330" heights="200" caption="Mutation residues with different side-chain conformations (FoldX in red, SCWRL in blue)"><br />
File:M82L-foldx-scwrl.png|Slight difference between side-chain conformations of LEU82 in mutant '''M82L''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>)<br />
File:C264W-foldx-scwrl.png|Difference between side-chain conformations of TRP264 in mutant '''C264W''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>). They show completely different orientations of the heterocyclic ring of tryptophan.<br />
File:R346H-foldx-scwrl.png|Slight difference between side-chain conformations of HIS346 in mutant '''R346H''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>)<br />
</gallery><br />
<br />
====Minimise====<br />
<br />
The energies of each Minimise run for the wild type and the mutant structures are given in the following table. For the mutant structures, output was only obtained for the structures calculated with FoldX. For input structures from SCWRL, Minimise gave no output (probably because of a problem with gaps).<br />
<br />
<br />
{| class='wikitable' border='1' style='width:400px'<br />
! run !! 1 !! 2 !! 3 !! 4 !! 5<br />
|-<br />
| WT || -17591 || -17424 || -16955 || -16792 || -16578<br />
|-<br />
| M82L FoldX || -17981 || -17722 || -17298 || -17001 || -16809<br />
|-<br />
| A222T FoldX || -17940 || -17708 || -17297 || -17001 || -16800<br />
|-<br />
| C264W FoldX || -15606 || -17724 || -17319 || -17164 || -16897<br />
|-<br />
| R346H FoldX || -18014 || -17769 || -17339 || -17049 || -16853<br />
|-<br />
| I361V FoldX || -18010 || -17684 || -17313 || -17012 || -16799<br />
|}<br />
<br />
<br />
A reduction in the energy can only be observed for C264W, and only for the second recursive run. All other energies increase slightly with every run, but are overall similar to each other.<br />
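<br />
This observation can be checked directly against the table. The sketch below copies the energies from the table and reports, for each structure, the runs in which the energy dropped compared to the previous run; the variable and function names are chosen for illustration only.<br />

```python
# Energies of the five successive Minimise runs, copied from the table
energies = {
    "WT":          [-17591, -17424, -16955, -16792, -16578],
    "M82L FoldX":  [-17981, -17722, -17298, -17001, -16809],
    "A222T FoldX": [-17940, -17708, -17297, -17001, -16800],
    "C264W FoldX": [-15606, -17724, -17319, -17164, -16897],
    "R346H FoldX": [-18014, -17769, -17339, -17049, -16853],
    "I361V FoldX": [-18010, -17684, -17313, -17012, -16799],
}

def energy_steps(values):
    """Energy change from each run to the next one."""
    return [after - before for before, after in zip(values, values[1:])]

# For every structure, collect the run numbers (2..5) in which the
# energy was lower than in the preceding run.
reduced = {}
for name, values in energies.items():
    runs = [i + 2 for i, step in enumerate(energy_steps(values)) if step < 0]
    if runs:
        reduced[name] = runs

print(reduced)
```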
<br />
<!--<br />
====Gromacs====<br />
--><br />
<br />
== Discussion ==<br />
<br />
* For the disease-causing mutations (C264W and R346H), FoldX and SCWRL arrive at different side-chain conformations. This indicates that the substituted amino acids do not fit into the local environment as well as the wild-type residues. They may therefore change the structure of the protein and hence alter its function.<br />
* The tryptophan that replaces a cysteine in C264W reaches into an ion binding site and might thus interfere with binding of that ion, resulting in inhibition of enzymatic activity.<br />
* We did not observe any minimization when running Minimise on the WT and mutant structures (except for C264W); the energy instead stagnated at about -17000. The structures are probably already in an energy minimum and cannot be minimized further, which does not mean that the mutant structures retain the functionality of the wild type.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_9_(MSUD)&diff=34486Task 9 (MSUD)2013-08-09T18:20:19Z<p>Weish: /* Visualization of mutant structures */</p>
<hr />
<div>==Results==<br />
<br />
[[Task 9 Lab Journal (MSUD)|Lab journal]]<br />
<br />
<br />
For this task we have chosen 5 mutations from HGMD and dbSNP. The three mutations from dbSNP are not annotated as pathogenic and are treated as neutral, i.e. they are not expected to change the function of the protein; the two mutations from HGMD are disease-causing. The 5 mutations are:<br />
<br />
<br />
<table border=1><br />
<tr style="font-weight:bold;"><td>Database</td><td>Accession_code</td><td>Mutation</td><td>Pathogenic</td></tr><br />
<tr><td>dbSNP</td><td>rs11549938</td><td>M82L</td><td>False</td></tr><br />
<tr><td>dbSNP</td><td>rs141086188</td><td>A222T</td><td>False</td></tr><br />
<tr style="background-color: #ffbbbb;"><td>HGMD</td><td>CM045934</td><td>C264W</td><td>True</td></tr><br />
<tr style="background-color: #ffbbbb;"><td>HGMD</td><td>CM062448</td><td>R346H</td><td>True</td></tr><br />
<tr><td>dbSNP</td><td>rs61736656</td><td>I361V </td><td>False</td></tr><br />
</table><br />
<br />
<br />
===Selection of structure model===<br />
<br />
The following table shows an overview of all structures that are available for our protein:<br />
<br />
<br />
{| class='wikitable' border='1' style='width:800px'<br />
! Entry !! Method !! Resolution [Å] !! Chain !! Positions !! R-value !! pH !! gaps<br />
|-<br />
| 2BFD || X-ray || 1.39 || A || 46-445 || 0.15 || 5.5 || 289GLY-292SER,293THR-313ASP<br />
|- style="background-color: #FFBBBB;"<br />
| 2BFF || X-ray || 1.46 || A || 46-445 || 0.15 || 5.5 || 301ARG-305GLU<br />
|-<br />
| 1X7Y || X-ray || 1.57 || A || 46-445 || 0.15 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 2BFC || X-ray || 1.64 || A || 46-445 || 0.144 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 2BFE || X-ray || 1.69 || A || 46-445 || 0.15 || 5.5 || 289GLY-291HIS,293THR-313ASP<br />
|-<br />
| 1X7Z || X-ray || 1.72 || A || 46-445 || 0.154 || 5.8 || 301ARG-306VAL<br />
|-<br />
| 1X7W || X-ray || 1.73 || A || 46-445 || 0.148 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 2BFB || X-ray || 1.77 || A || 46-445 || 0.145 || 5.5 || 287ARG-313ASP<br />
|-<br />
| 2BEW || X-ray || 1.79 || A || 46-445 || 0.147 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1V1R || X-ray || 1.80 || A || 46-445 || 0.158 || 5.5 || 222ASN-231SER,286TYR-314HIS<br />
|-<br />
| 2BEV || X-ray || 1.80 || A || 46-445 || 0.139 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1U5B || X-ray || 1.83 || A || 46-445 || 0.156 || 5.8 || 301ARG-306VAL<br />
|-<br />
| 1WCI || X-ray || 1.84 || A || 46-445 || 0.149 || 5.5 || 301ARG-309TRP<br />
|-<br />
| 1OLS || X-ray || 1.85 || A || 46-445 || 0.172 || 5.5 || 292SER-295ASP,298SER-300TYR,301ARG-308TYR<br />
|-<br />
| 2J9F || X-ray || 1.88 || A/C || 46-445 || 0.171 || 5.5 || 300TYR-313ASP<br />
|-<br />
| 2BEU || X-ray || 1.89 || A || 46-445 || 0.171 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1OLU || X-ray || 1.90 || A || 46-445 || 0.161 || 5.5 || 26ASN-30GLY,287ARG-313ASP<br />
|-<br />
| 1V16 || X-ray || 1.90 || A || 46-445 || 0.132 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 1V11 || X-ray || 1.95 || A || 46-445 || 0.139 || 5.5 || 288ILE-313ASP<br />
|-<br />
| 1V1M || X-ray || 2.00 || A || 46-445 || 0.13 || 5.5 || 289GLY-313ASP<br />
|-<br />
| 1X80 || X-ray || 2.00 || A || 46-445 || 0.161 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 1X7X || X-ray || 2.10 || A || 46-445 || 0.149 || 5.8 || 287ARG-313ASP<br />
|-<br />
| 1OLX || X-ray || 2.25 || A || 46-445 || 0.161 || 5.5 || 301ARG-307ASN<br />
|-<br />
| 1DTW || X-ray || 2.70 || A || 46-445 || 0.224 || 7.5 || 301ARG-314HIS<br />
|}<br />
<br />
<br />
Unfortunately, all structures contain gaps that span positions 302-304 (corresponding to 347-349 in the reference sequence), so we cannot create a composite structure without this gap. We chose 2BFF because it has the smallest gap, a good resolution and a low R-value. It was not resolved at physiological pH, but the only structure resolved at pH 7.5 (1DTW) has a poor resolution. The RMSD between 2BFF and 1DTW is about 0.3 &Aring;, so the different pH does not lead to a different structure, and the low pH at which 2BFF was resolved should therefore not be a problem.<br />
<br />
<br />
===Visualization of mutant structures===<br />
The mutagenesis wizard of PyMOL was used to introduce the mutations into the protein structure. The mutations and their neighboring residues are shown in the following figures.<br />
<gallery caption="Visualization of mutations in BCKDHA (PDB: 2BFF, all residue positions refer to the reference sequence of BCKDHA)" widths="300" heights="240" perrow=3><br />
File:2bff_m82l.png|Local environment of residue 82. Original methionine is replaced by leucine.<br />
File:2bff_a222t.png|Local environment of residue 222. Original alanine is replaced by threonine.<br />
File:2bff_c264w.png|Local environment of residue 264. Original cysteine is replaced by tryptophan. Clearly, the mutant residue tryptophan clashes with residues near the binding site (<span style="color:blue;">Blue</span>). The smallest atomic distance we have found is 1.2 &Aring;.<br />
File:2bff_r346h.png|Local environment of residue 346. Original arginine is replaced by histidine.<br />
File:2bff_i361v.png|Local environment of residue 361. Original isoleucine is replaced by valine.<br />
</gallery><br />
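<br />
A clash such as the one reported for C264W can be quantified as the smallest distance between any atom of the mutant side chain and any atom of the neighboring residues. The sketch below shows the calculation in plain Python. The coordinates are hypothetical and not taken from 2BFF, and the 2.0 &Aring; clash threshold is an assumption chosen for illustration.<br />
<br />
```python
import math
from itertools import product


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two sets of (x, y, z) coordinates."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


# Hypothetical coordinates in Angstrom (NOT taken from the 2BFF model).
mutant_side_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
neighbours = [(1.5, 1.2, 0.0), (5.0, 5.0, 5.0)]

CLASH_THRESHOLD = 2.0  # assumed cut-off, for illustration only

closest = min_distance(mutant_side_chain, neighbours)
is_clash = closest < CLASH_THRESHOLD
print(f"closest contact: {closest:.2f}, clash: {is_clash}")
```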
<br />
<!--<br />
===Mutant structures (SCWRL)===<br />
--><br />
<br />
===Energy comparisons===<br />
<br />
====FoldX====<br />
<table border=1><br />
<tr style="font-weight: bold;"><td>Mutation</td><td>PDB</td><td>SD</td><td>Total energy</td><td>Backbone Hbond</td><td>Sidechain Hbond</td><td>Van der Waals</td><td>Electrostatics</td><td>Solvation Polar</td><td>Solvation Hydrophobic</td></tr><br />
<tr><td>M82L</td><td>pdb2bff_1</td><td>0</td><td>0.08</td><td>0.01</td><td>-0.05</td><td>0.46</td><td>-0.1</td><td>0</td><td>0.44</td></tr><br />
<tr><td>A222T</td><td>pdb2bff_2</td><td>0</td><td>1.22</td><td>0.17</td><td>-1.17</td><td>-0.7</td><td>0.02</td><td>1.48</td><td>-0.51</td></tr><br />
<tr><td>C264W</td><td>pdb2bff_3</td><td>0</td><td>34.77</td><td>-0.16</td><td>-0.15</td><td>-1.99</td><td>-0.09</td><td>2.88</td><td>-2.95</td></tr><br />
<tr><td>R346H</td><td>pdb2bff_4</td><td>0</td><td>1.55</td><td>1.28</td><td>1.9</td><td>1.36</td><td>-0.57</td><td>-3.03</td><td>0.73</td></tr><br />
<tr><td>I361V</td><td>pdb2bff_5</td><td>0</td><td>0.78</td><td>-0.02</td><td>0.01</td><td>0.54</td><td>-0.02</td><td>-0.29</td><td>1.06</td></tr><br />
</table><br />
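<br />
To see at a glance which mutation FoldX considers most destabilizing, the total energy values (fourth column of the table above) can be ranked. This is a minimal sketch that uses only the numbers from the table; the variable names are ours.<br />
<br />
```python
# Total energy per mutation, copied from the FoldX table above.
foldx_total = {
    "M82L": 0.08,
    "A222T": 1.22,
    "C264W": 34.77,
    "R346H": 1.55,
    "I361V": 0.78,
}

# Highest (most destabilizing) value first.
ranked = sorted(foldx_total.items(), key=lambda item: item[1], reverse=True)
for mutation, energy in ranked:
    print(f"{mutation:6s} {energy:6.2f}")
```
<br />
C264W stands out by more than an order of magnitude, which fits the steric clash seen in the visualization.<br />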
<br />
The resulting structural models were compared to the structures calculated by SCWRL. Using the alignment tool of PyMOL, we did not find any global deviation between the structures. At the sequence level, however, the structures produced by SCWRL lack some residues. The following table lists these differences.<br />
<br />
<table border=1><br />
<tr style="font-weight: bold;"><td>Mutation</td><td>RMSD(&Aring;)</td><td>Missing residues in SCWRL</td></tr><br />
<tr><td>M82L</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>A222T</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>C264W</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>R346H</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
<tr><td>I361V</td><td>0.0</td><td>SER76,HIS96,MET146,LEU325,SER326</td></tr><br />
</table><br />
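<br />
For reference, the RMSD reported above is the root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate lists that are assumed to be already superposed and matched atom by atom (PyMOL performs the superposition itself). The coordinates are hypothetical.<br />
<br />
```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long, already superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates: the second model is the first shifted by 1 along x.
model_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
model_b = [(x + 1.0, y, z) for x, y, z in model_a]

print(rmsd(model_a, model_a))
print(rmsd(model_a, model_b))
```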
<br />
In 3 of the mutant structures, the side-chain conformation of the mutated residue differs between the results of FoldX and SCWRL. The other 2 mutant structures show no such difference.<br />
<br />
<gallery perrow=3 widths="330" heights="200" caption="Mutation residues with different side-chain conformations (FoldX in red, SCWRL in blue)"><br />
File:M82L-foldx-scwrl.png|Slight difference between the side-chain conformations of LEU82 in mutant '''M82L''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>)<br />
File:C264W-foldx-scwrl.png|Difference between the side-chain conformations of TRP264 in mutant '''C264W''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>). The two models show completely different orientations of the heterocyclic ring of the tryptophan.<br />
File:R346H-foldx-scwrl.png|Slight difference between the side-chain conformations of HIS346 in mutant '''R346H''' (FoldX in <span style="color:red">red</span>, SCWRL in <span style="color:blue">blue</span>)<br />
</gallery><br />
<br />
====Minimise====<br />
<br />
The energies of each Minimise run for the wild type and the mutant structures are given in the following table. For the mutant structures, Minimise only produced output for the structures calculated with FoldX. For input structures from SCWRL, Minimise gave no output (probably because of a problem with gaps).<br />
<br />
<br />
{| class='wikitable' border='1' style='width:400px'<br />
! run !! 1 !! 2 !! 3 !! 4 !! 5<br />
|-<br />
| WT || -17591 || -17424 || -16955 || -16792 || -16578<br />
|-<br />
| M82L FoldX || -17981 || -17722 || -17298 || -17001 || -16809<br />
|-<br />
| A222T FoldX || -17940 || -17708 || -17297 || -17001 || -16800<br />
|-<br />
| C264W FoldX || -15606 || -17724 || -17319 || -17164 || -16897<br />
|-<br />
| R346H FoldX || -18014 || -17769 || -17339 || -17049 || -16853<br />
|-<br />
| I361V FoldX || -18010 || -17684 || -17313 || -17012 || -16799<br />
|}<br />
<br />
<br />
A reduction in the energy can only be observed for C264W, and only for the second recursive run. All other energies increase slightly with every run, but are overall similar to each other.<br />
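<br />
This observation can be checked directly against the table. The following sketch lists, for every structure, the runs in which the energy is lower than in the preceding run. The values are copied from the table above; the helper name <code>decreasing_runs</code> is ours.<br />
<br />
```python
# Energies of runs 1-5, copied from the Minimise table above.
energies = {
    "WT": [-17591, -17424, -16955, -16792, -16578],
    "M82L FoldX": [-17981, -17722, -17298, -17001, -16809],
    "A222T FoldX": [-17940, -17708, -17297, -17001, -16800],
    "C264W FoldX": [-15606, -17724, -17319, -17164, -16897],
    "R346H FoldX": [-18014, -17769, -17339, -17049, -16853],
    "I361V FoldX": [-18010, -17684, -17313, -17012, -16799],
}


def decreasing_runs(series):
    """Return the run numbers whose energy is lower than in the previous run."""
    return [
        index + 2  # index 0 compares run 1 with run 2
        for index, (previous, current) in enumerate(zip(series, series[1:]))
        if current < previous
    ]


decreases = {}
for name, series in energies.items():
    runs = decreasing_runs(series)
    if runs:
        decreases[name] = runs

print(decreases)
```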
<br />
<!--<br />
====Gromacs====<br />
--><br />
<br />
== Discussion ==<br />
<br />
* For the disease-causing mutations (C264W and R346H), FoldX and SCWRL arrive at different side-chain conformations. This suggests that the substituted amino acids do not fit into the local environment as well as the wild-type residues do. They thus change the structure of the protein and can hence alter its function.<br />
* The tryptophan that replaces a cysteine in C264W reaches into an ion binding site and so might interfere with the binding of that ion, resulting in inhibition of enzymatic activity.<br />
* We did not observe any minimization when running Minimise on the WT and mutant structures (except for C264W); the energy instead stagnated at about -17000. The structures are probably already in an energy minimum and cannot be minimized any further. This does not mean, however, that the mutant structures retain the functionality of the wild type.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:2bff_c264w.png&diff=34485File:2bff c264w.png2013-08-09T18:17:37Z<p>Weish: uploaded a new version of "File:2bff c264w.png"</p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34484Task 7 (MSUD)2013-08-09T17:49:31Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Among the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definition of the mutation types:<br />
* missense: single base substitution that leads to an amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: insertion / deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
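<br />
The definitions of missense and nonsense mutations can be illustrated with the standard genetic code. The sketch below classifies a single base substitution within one codon. The example codons are chosen for illustration and are not the actual BCKDHA codons; the "synonymous" and "stop-lost" classes are added for completeness and are not HGMD categories.<br />
<br />
```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single base substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ in exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


print(classify_substitution("CAG", "TAG"))  # Gln -> stop
print(classify_substitution("ATG", "CTG"))  # Met -> Leu
print(classify_substitution("CCC", "CCT"))  # Pro -> Pro
```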
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. The dbSNP web query, FTP data and Entrez indexing were released on Jun 26, 2012. A BLAST database for this build has not been released yet; the most recent BLAST database was released on Nov 14, 2011 and is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created in a cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. dbSNP contains two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
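<br />
The clustering step can be pictured as grouping submissions by their mapped position. The following is a strongly simplified sketch and not dbSNP's actual build procedure: the ss identifiers and positions are hypothetical, and no real rs numbers are assigned.<br />
<br />
```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNP IDs by their (chromosome, position) mapping."""
    clusters = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        clusters[(chromosome, position)].append(ss_id)
    return dict(clusters)


# Hypothetical submissions: (ss identifier, chromosome, position).
submissions = [
    ("ss0001", "19", 1000),
    ("ss0002", "19", 1000),
    ("ss0003", "19", 2500),
]

clusters = cluster_submissions(submissions)
for locus, ss_ids in clusters.items():
    # Each cluster would correspond to one refSNP with its own rs# identifier.
    print(locus, ss_ids)
```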
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP. 4 are nonsense (stop-gained) mutations, which introduce a stop codon into the coding region. 152 are missense mutations, 28 of which can cause disease. 136 are synonymous mutations.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNP). Computationally annotated structural and functional effects of nsSNP. Association between nsSNP and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes collection; predicted impacts on protein structure and function, computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
102 SNPs are reported in SNPdbe for BCKDHA. 8 of them are reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The allelic variants section of a gene entry lists mutations (e.g. substitutions or deletions) together with the phenotypes they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, 7 missense mutations and 2 deletions are listed; one is a 1-bp (base pair) deletion and the other an 8-bp deletion (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project 'SNPedia' is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google searches, are included in SNPedia. It tries to gather all SNP-related information on one web site.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because SNPedia is maintained by its user community, updates can occur at any time. Its content nevertheless depends on the releases of other SNP-related resources.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease causing mutations are marked in <span style="color: red">'''red'''</span>, mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table contains the SNPs that we chose from the different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Phenotype !! Mutation !! Type !! Pathogenic !! Class<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 213 || N/A || I213I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 216 || N/A || A216A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 225 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R225W || missense || TRUE || disease<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
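<br />
The mutation labels in the table follow the pattern "reference residue, codon number, new residue", with <code>*</code> for a stop codon. The sketch below parses such labels and derives the mutation type from them, using a few entries from the table as a sample. The function names are ours.<br />
<br />
```python
import re
from collections import Counter

# One-letter reference residue, codon number, one-letter new residue or '*'.
MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z*])")


def parse_mutation(label: str):
    """Split a label such as 'Q69*' into (reference, position, alternative)."""
    match = MUTATION_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    reference, position, alternative = match.groups()
    return reference, int(position), alternative


def mutation_type(label: str) -> str:
    """Derive the type used in the table from the label alone."""
    reference, _, alternative = parse_mutation(label)
    if alternative == "*":
        return "nonsense"
    if alternative == reference:
        return "synonymous-codon"
    return "missense"


# A few labels taken from the table above.
sample = ["L17F", "P38P", "Q69*", "M109T", "R314*", "R314Q"]
counts = Counter(mutation_type(label) for label in sample)
print(counts)
```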
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome. <br />
* Many missense mutations also lead to disease, because a mutation is a random event that in most cases leads to a loss of function.<br />
* Nonsense mutations seem to be even more severe. All reported nonsense mutations in BCKDHA are disease causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that might change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD list only disease causing SNPs<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat(2003) 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34483Task 7 (MSUD)2013-08-09T17:47:52Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Among the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definition of the mutation types:<br />
* missense: single base substitution that leads to an amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: insertion / deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. The dbSNP web query, FTP data and Entrez indexing were released on Jun 26, 2012. A BLAST database for this build has not been released yet; the most recent BLAST database was released on Nov 14, 2011 and is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created in a cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. dbSNP contains two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP: 4 are nonsense (stop-gained) mutations, which introduce a stop codon in the coding region; 152 are missense mutations, 28 of which can cause disease; and 136 are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNPs), computationally annotated structural and functional effects of nsSNPs, and associations between nsSNPs and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes Project; predicted impacts on protein structure and function, computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
SNPdbe reports 102 SNPs for BCKDHA, 8 of which are reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The allelic variants section of a gene entry lists mutations (e.g. substitutions or deletions) together with the phenotype they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, there are 7 missense mutations listed and 2 deletions, where one is a 1-bp (base pair) deletion and the other 8-bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project 'SNPedia' is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google search, are included in SNPedia. It tries to gather all SNP-related information on one web site.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because SNPedia is maintained by its user community, updates can occur at any time. Its content nevertheless depends on the releases of the other SNP-related resources it draws on.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs, and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease causing mutations are marked in <span style="color: red">'''red'''</span>, mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table contains the SNPs that we have chosen from different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Phenotype !! Mutation !! Type !! Pathogenic !! Class<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 213 || N/A || I213I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 216 || N/A || A216A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 225 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R225W || missense || TRUE || disease<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
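<br />
A table in this layout can be tallied by mutation type and class with a short script. The following is only a minimal sketch: it assumes clean wikitext with seven cells per row separated by <code>||</code>, and the function name <code>tally_rows</code> and the three-row sample are chosen for illustration.<br />

```python
from collections import Counter


def tally_rows(wikitable: str) -> Counter:
    """Count data rows of a seven-column wikitable by (type, class)."""
    counts = Counter()
    for line in wikitable.splitlines():
        line = line.strip()
        # Data rows start with "|"; skip "|-" row separators and the closing "|}".
        # The opening "{|" and "!" header lines do not start with "|" at all.
        if not line.startswith("|") or line.startswith(("|-", "|}")):
            continue
        cells = [cell.strip() for cell in line[1:].split("||")]
        if len(cells) == 7:
            counts[(cells[4], cells[6])] += 1
    return counts


# Three rows copied from the table above, as plain wikitext.
sample = """{| class='wikitable'
|-
| || 17 || N/A || L17F || missense || FALSE || silent
|- style="background-color: #FFBBBB;"
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease
|-
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent
|}"""

counts = tally_rows(sample)
```

Applied to the full table, the same function would give the number of disease-causing and silent entries per mutation type.<br />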
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome.<br />
* Many missense mutations lead to disease, because a mutation is a random event that in most cases leads to a loss of function.<br />
* Nonsense mutations seem to be even more severe: all reported nonsense mutations in BCKDHA are disease causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So disease can be caused not only by mutations at functionally important sites such as binding sites or catalytic centres, but also by mutations elsewhere in the protein that change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD list only disease causing SNPs<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al. (2003). The Human Gene Mutation Database (HGMD®): 2003 update. Hum Mutat; 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34482Task 7 (MSUD)2013-08-09T17:42:19Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Of the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definition of the mutation types:<br />
* missense: single base substitution that leads to amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: insertion / deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
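<br />
For single amino acid substitutions written in the notation of the mutation table (for example R265W or Q69*), the first two definitions can be applied mechanically. The following is a minimal sketch; the helper name <code>classify</code> and the regular expression are assumptions made for illustration, and the other mutation types (splicing, deletions, insertions, rearrangements) are not covered.<br />

```python
import re

# One-letter reference residue, codon number, then the new residue or "*" for stop.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def classify(mutation: str) -> str:
    """Classify a protein-level substitution string such as 'R265W'."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation string: {mutation!r}")
    ref, _codon, alt = match.groups()
    if alt == "*":
        return "nonsense"          # substitution leads to a stop codon
    if alt == ref:
        return "synonymous-codon"  # amino acid is unchanged
    return "missense"              # amino acid is changed


labels = {m: classify(m) for m in ("Q69*", "P38P", "M109T")}
```

The labels follow the "Type" column used in the mutation table.<br />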
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. The dbSNP web query, FTP data and Entrez indexing were released on Jun 26, 2012. A new release of the BLAST database is not yet available; the most recent one was released on Nov 14, 2011, from build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created in cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information, and it is integrated with the NCBI genomic data. dbSNP holds two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP: 4 are nonsense (stop-gained) mutations, which introduce a stop codon in the coding region; 152 are missense mutations, 28 of which can cause disease; and 136 are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNPs), computationally annotated structural and functional effects of nsSNPs, and associations between nsSNPs and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes Project; predicted impacts on protein structure and function, computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
SNPdbe reports 102 SNPs for BCKDHA, 8 of which are reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The allelic variants section of a gene entry lists mutations (e.g. substitutions or deletions) together with the phenotype they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, there are 7 missense mutations listed and 2 deletions, where one is a 1-bp (base pair) deletion and the other 8-bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project 'SNPedia' is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google search, are included in SNPedia. It tries to gather all SNP-related information on one web site.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because SNPedia is maintained by its user community, updates can occur at any time. Its content nevertheless depends on the releases of the other SNP-related resources it draws on.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs, and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease causing mutations are marked in <span style="color: red">'''red'''</span>, mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table contains the SNPs that we have chosen from different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Phenotype !! Mutation !! Type !! Pathogenic !! Class<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 213 || N/A || I213I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 216 || N/A || A216A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 220 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R220W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome. <br />
* Many missense mutations lead to disease because a mutation is a random event, which in most cases leads to a loss of function.<br />
* Nonsense mutations seem to be even more severe: all reported nonsense mutations in BCKDHA are disease-causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that might change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD focus on disease-causing mutations<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al. (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34481Task 7 (MSUD)2013-08-09T17:41:22Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease-causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Of the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definitions of the mutation types:<br />
* missense: single base substitution that leads to an amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: combined insertion and deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
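<br />
The definitions of missense and nonsense substitutions above can be sketched in code. The following is a minimal illustration that assumes the standard genetic code; the function name <code>classify_substitution</code> and the example codons are hypothetical and are not taken from the BCKDHA sequence. It also reports synonymous and stop-lost changes, which HGMD does not list as separate categories.<br />

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single base substitution by its effect on the encoded residue."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("expected codons that differ in exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) before and after
    if alt_aa == "*":
        return "nonsense"     # substitution creates a stop codon
    if ref_aa == "*":
        return "stop-lost"    # substitution removes a stop codon
    return "missense"         # substitution changes the amino acid


print(classify_substitution("CAG", "TAG"))  # Gln -> stop
print(classify_substitution("GGC", "AGC"))  # Gly -> Ser
print(classify_substitution("TTT", "TTC"))  # Phe -> Phe
```

With the standard code, the three calls classify the changes as nonsense, missense and synonymous, respectively.<br />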
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. The dbSNP web query, FTP data and Entrez indexing for this build were released on Jun 26, 2012. A BLAST database for build 137 has not been released yet; the newest BLAST database dates from Nov 14, 2011 and is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was established as a cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. dbSNP holds two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
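<br />
The clustering step can be pictured as grouping submissions by their mapped position. The sketch below is a simplified illustration only: the function name, the ss identifiers and the positions are made up, and the real dbSNP build pipeline is considerably more involved than a lookup by chromosome and position.<br />

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group (ss_id, chromosome, position) records that map to the same position.

    Each resulting group stands for one non-redundant refSNP cluster.
    """
    clusters = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        clusters[(chromosome, position)].append(ss_id)
    return dict(clusters)


# Hypothetical submissions; identifiers and coordinates are invented.
submissions = [
    ("ss0000001", "19", 1000),
    ("ss0000002", "19", 1000),
    ("ss0000003", "19", 2500),
]

for (chromosome, position), ss_ids in sorted(cluster_submissions(submissions).items()):
    print(f"chr{chromosome}:{position} <- {', '.join(ss_ids)}")
```

Here the first two submissions fall into one cluster and the third forms a cluster of its own; in dbSNP each such cluster would then be assigned its rs# identifier.<br />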
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP. 4 of them are nonsense (stop-gained) mutations, which introduce a stop codon into the coding region. 152 are missense mutations, 28 of which can cause disease. 136 are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNP). Computationally annotated structural and functional effects of nsSNP. Association between nsSNP and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes Project; predicted impacts on protein structure and function, computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
102 SNPs are reported in SNPdbe for BCKDHA. 8 of them are reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The allelic variants section of a gene entry lists mutations (e.g. substitutions or deletions) together with the phenotype that they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease-associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, 7 missense mutations and 2 deletions are listed; one deletion is 1 bp (base pair) long and the other 8 bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project 'SNPedia' is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google searches, are included in SNPedia. It tries to gather all SNP-related information on one web site.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because the content is contributed by the user community, updates can occur at any time. The content nevertheless depends on the releases of other SNP-related resources.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease causing mutations are marked in <span style="color: red">'''red'''</span>, mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table contains the SNPs that we have chosen from the different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Phenotype !! Mutation !! Type !! Pathogenic !! Effect<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 213 || N/A || I213I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 215 || N/A || A215A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 220 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R220W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
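<br />
To summarise a table like the one above, the rows can be tallied by mutation type and pathogenicity. The following sketch assumes the seven-column wikitext layout used here; the function name <code>tally_rows</code> is hypothetical, and the sample contains only three rows copied from the table, so its counts are not statistics for the full table.<br />

```python
from collections import Counter


def tally_rows(wikitext):
    """Count wikitable data rows per (mutation type, pathogenic flag)."""
    counts = Counter()
    for line in wikitext.splitlines():
        line = line.strip()
        if line.endswith("<br />"):
            line = line[: -len("<br />")].rstrip()
        # Data rows start with "|"; skip row separators ("|-") and the table end ("|}").
        if not line.startswith("|") or line.startswith(("|-", "|}")):
            continue
        cells = [cell.strip() for cell in line[1:].split("||")]
        if len(cells) != 7:
            continue  # not a row in the expected seven-column layout
        mutation_type, pathogenic = cells[4], cells[5]
        counts[(mutation_type, pathogenic)] += 1
    return counts


sample = """\
|-
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent
|-
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent
|- style="background-color: #FFBBBB;"
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease
|}"""

for (mutation_type, pathogenic), n in sorted(tally_rows(sample).items()):
    print(mutation_type, pathogenic, n)
```

Running the same function over the complete table text would give the number of disease-causing and silent entries per mutation type.<br />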
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome. <br />
* Many missense mutations lead to disease because a mutation is a random event, which in most cases leads to a loss of function.<br />
* Nonsense mutations seem to be even more severe: all reported nonsense mutations in BCKDHA are disease-causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that might change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD focus on disease-causing mutations<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al. (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34480Task 7 (MSUD)2013-08-09T17:39:48Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease-causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Of the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definitions of the mutation types:<br />
* missense: single base substitution that leads to an amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: combined insertion and deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. Its web query, FTP data and Entrez indexing were released on Jun 26, 2012. The BLAST database has not yet been updated for this build; its most recent release (Nov 14, 2011) is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created as a cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. dbSNP holds two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
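<br />
The clustering step described above can be pictured with a small Python sketch. It is a simplified illustration only: the identifiers, coordinates and the function <code>cluster_submissions</code> are hypothetical, and the real dbSNP build pipeline does not number its rs# accessions this way.<br />

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNP ids (ss#) that map to the same genomic position.

    `submissions` is an iterable of (ss_id, chromosome, position) tuples.
    Returns a dict mapping an illustrative cluster id to its sorted ss ids.
    """
    by_position = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        by_position[(chromosome, position)].append(ss_id)

    # Number the clusters in coordinate order; real rs# accessions are
    # stable identifiers and are NOT derived like this.
    clusters = {}
    for index, key in enumerate(sorted(by_position), start=1):
        clusters[f"rs{index}"] = sorted(by_position[key])
    return clusters


# Made-up submissions: two of them share a position and form one cluster.
example = [
    ("ss101", "19", 1000),
    ("ss102", "19", 1000),
    ("ss103", "19", 2000),
]
print(cluster_submissions(example))
```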
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP. Four are nonsense (stop-gained) mutations, which introduce a stop codon into the coding region. 152 are missense mutations, 28 of which can cause disease. The remaining 136 are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNP). Computationally annotated structural and functional effects of nsSNP. Association between nsSNP and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes Project. Predicted impacts on protein structure and function are computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
102 SNPs were reported in SNPdbe for BCKDHA. Of these, 8 were reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': In the allelic variants section of a gene entry, mutations (e.g. substitutions or deletions) are listed together with the phenotype they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, there are 7 missense mutations listed and 2 deletions, where one is a 1-bp (base pair) deletion and the other 8-bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project SNPedia is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google searches, are included. It aims to gather all SNP-related information on one website.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because the content is contributed by the user community, updates can occur at any time. However, the content still depends on the releases of other SNP-related resources.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs, and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease-causing mutations are marked in <span style="color: red">'''red'''</span>; mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table lists the SNPs that we have chosen from the different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Disease annotation !! Mutation !! Type !! Pathogenic !! Class<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 212 || N/A || I212I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 215 || N/A || A215A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 220 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R220W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
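<br />
The entries in the "Mutation" column follow the pattern reference residue, codon number, new residue (with <code>*</code> for a stop codon), so the "Type" column can be derived from them mechanically. The following Python sketch shows one way to do this for a handful of labels copied from the table; the function name <code>parse_mutation</code> is an assumption made for illustration and is not the script that was used to build the table.<br />

```python
import re
from collections import Counter

# One-letter reference residue, codon number, one-letter new residue or '*'.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_mutation(label):
    """Split a label such as 'R265W' and derive its mutation type."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, codon, alt = match.groups()
    if alt == "*":
        kind = "nonsense"
    elif ref == alt:
        kind = "synonymous-codon"
    else:
        kind = "missense"
    return ref, int(codon), alt, kind


# A few labels taken from the table above
labels = ["L17F", "P38P", "Q69*", "M109T", "R314*", "R314Q"]
type_counts = Counter(parse_mutation(label)[3] for label in labels)
print(type_counts)
```

Whether a mutation is pathogenic cannot be derived from the label alone; that information has to come from the source database.<br />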
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome. <br />
* Many missense mutations lead to disease, because a mutation is a random event, which in most cases will lead to a loss of function.<br />
* Nonsense mutations seem to be even more severe: all reported nonsense mutations in BCKDHA are disease causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that might change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** HGMD lists only disease-causing mutations, and OMIM lists selected mutations, most of which are disease associated<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat(2003) 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=34479Task 7 (MSUD)2013-08-09T17:29:42Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. Therefore, only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Of the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definition of the mutation types:<br />
* missense: single base substitution that leads to amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: combined insertion and deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. Its web query, FTP data and Entrez indexing were released on Jun 26, 2012. The BLAST database has not yet been updated for this build; its most recent release (Nov 14, 2011) is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created as a cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. dbSNP holds two sorts of content: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP. Four are nonsense (stop-gained) mutations, which introduce a stop codon into the coding region. 152 are missense mutations, 28 of which can cause disease. The remaining 136 are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNP). Computationally annotated structural and functional effects of nsSNP. Association between nsSNP and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes Project. Predicted impacts on protein structure and function are computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
102 SNPs were reported in SNPdbe for BCKDHA. Of these, 8 were reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': In the allelic variants section of a gene entry, mutations (e.g. substitutions or deletions) are listed together with the phenotype they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, there are 7 missense mutations listed and 2 deletions, where one is a 1-bp (base pair) deletion and the other 8-bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project SNPedia is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google searches, are included. It aims to gather all SNP-related information on one website.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because the content is contributed by the user community, updates can occur at any time. However, the content still depends on the releases of other SNP-related resources.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs, and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the SNPs reported for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
102 mutations were selected from different databases. Disease-causing mutations are marked in <span style="color: red">'''red'''</span>; mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table lists the SNPs that we have chosen from the different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Disease annotation !! Mutation !! Type !! Pathogenic !! Class<br />
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|-<br />
| rs34442879 || 150 || N/A || T150M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 212 || N/A || I212I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 215 || N/A || A215A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 220 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R220W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome.<br />
* Many missense mutations also lead to disease, because a mutation is a random event, which in most cases leads to a loss of function.<br />
* Nonsense mutations seem to be even more severe: all reported nonsense mutations in BCKDHA are disease-causing. A nonsense mutation could only be neutral if it lies near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that may change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD list only disease-causing SNPs<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al. (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat; 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_8_Lab_Journal_(MSUD)&diff=34478Task 8 Lab Journal (MSUD)2013-08-09T17:26:30Z<p>Weish: /* SNAP */</p>
<hr />
<div>== Visualization of mutations ==<br />
<br />
PDB structures often do not have exactly the same sequence as the protein they represent; missing residues are very frequent. To map the positions of mutations reported in SNP databases such as HGMD and dbSNP onto the structure, we computed a global alignment between the reference sequence of BCKDHA and the sequence of its PDB structure.<br />
<br />
The following alignment compares the reference sequence of BCKDHA with chain A of 1U5B. It shows that the positions of mutations reported in the SNP databases have to be shifted back by 45 residues, i.e. the 50th residue of the reference sequence is the 5th residue of chain A of 1U5B.<br />
<br />
<nowiki>########################################<br />
# Program: needle<br />
# Rundate: Sat 29 Jun 2013 21:56:11<br />
# Commandline: needle<br />
# -auto<br />
# -stdout<br />
# -asequence emboss_needle-I20130629-215609-0499-45143350-es.aupfile<br />
# -bsequence emboss_needle-I20130629-215609-0499-45143350-es.bupfile<br />
# -datafile EBLOSUM62<br />
# -gapopen 10.0<br />
# -gapextend 0.5<br />
# -endopen 10.0<br />
# -endextend 0.5<br />
# -aformat3 pair<br />
# -sprotein1<br />
# -sprotein2<br />
# Align_format: pair<br />
# Report_file: stdout<br />
########################################<br />
<br />
#=======================================<br />
#<br />
# Aligned_sequences: 2<br />
# 1: NP_000700.1<br />
# 2: SEQUENCE<br />
# Matrix: EBLOSUM62<br />
# Gap_penalty: 10.0<br />
# Extend_penalty: 0.5<br />
#<br />
# Length: 445<br />
# Identity: 400/445 (89.9%)<br />
# Similarity: 400/445 (89.9%)<br />
# Gaps: 45/445 (10.1%)<br />
# Score: 2124.0<br />
# <br />
#<br />
#=======================================<br />
<br />
NP_000700.1 1 MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDD 50<br />
|||||<br />
SEQUENCE 1 ---------------------------------------------SSLDD 5<br />
<br />
NP_000700.1 51 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 100<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 6 KPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKE 55<br />
<br />
NP_000700.1 101 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 150<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 56 KVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDN 105<br />
<br />
NP_000700.1 151 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 200<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 106 TDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKER 155<br />
<br />
NP_000700.1 201 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 250<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 156 HFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGDAHAGF 205<br />
<br />
NP_000700.1 251 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 300<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 206 NFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDG 255<br />
<br />
NP_000700.1 301 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 350<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 256 NDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDE 305<br />
<br />
NP_000700.1 351 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 400<br />
||||||||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 306 VNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERK 355<br />
<br />
NP_000700.1 401 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 445<br />
|||||||||||||||||||||||||||||||||||||||||||||<br />
SEQUENCE 356 PKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK 400<br />
<br />
<br />
#---------------------------------------<br />
#---------------------------------------</nowiki><br />
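<br />
The shift read off the alignment above can be applied programmatically. The following is only a minimal sketch: the function name <code>ref_to_pdb_position</code> is hypothetical and not part of the original workflow, and it assumes that the residues of chain A are counted from 1 as in the alignment (the residue numbering inside the PDB file itself may differ).<br />

```python
# Offset between the reference sequence NP_000700.1 and chain A of 1U5B,
# taken from the alignment above (the first 45 residues are missing in the structure).
OFFSET = 45
# Number of residues of chain A covered by the alignment above.
CHAIN_LENGTH = 400


def ref_to_pdb_position(ref_position):
    """Map a 1-based reference position to the 1-based position in chain A.

    Returns None if the residue is not covered by the chain.
    """
    pdb_position = ref_position - OFFSET
    if pdb_position < 1 or pdb_position > CHAIN_LENGTH:
        return None
    return pdb_position
```

For example, <code>ref_to_pdb_position(50)</code> gives 5, matching the alignment, while a position inside the first 45 residues gives <code>None</code> because it has no counterpart in the structure.<br />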
<br />
== Analysis of evolutionary conservation of mutated positions ==<br />
<br />
=== Using PSSM ===<br />
<br />
A position-specific scoring matrix (PSSM) was created with PSI-Blast:<br />
<br />
<code><br />
blastpgp -d /mnt/project/pracstrucfunc13/data/big/big_80 -i /mnt/home/student/weish/master-practical-2013/task01/refseq_BCKDHA_protein.fasta -j 5 -h 10e-10 -Q BCKDHA.pssm<br />
</code><br />
<br />
The conservation (observed frequency) of the wildtype and mutant amino acids was extracted from the PSSM file with the following Python script (located at <code>/mnt/home/student/schillerl/MasterPractical/task8/extract_aa_frequency.py</code>):<br />
<br />
<br />
<source lang=python><br />
'''<br />
Extract amino acid frequencies from PsiBlast PSSM (created with -Q option).<br />
<br />
Usage: python extract_aa_frequency.py <pssm file> [mutations]<br />
<br />
Example for usage:<br />
python extract_aa_frequency.py BCKDHA.pssm P38H M82L T151M A222T C264W R265W R314Q R346H I361V Y413H<br />
<br />
@author: Laura Schiller<br />
'''<br />
<br />
import sys<br />
<br />
# Read the whole PSSM file; the with-block closes it automatically.<br />
with open(sys.argv[1]) as pssm_file:<br />
    lines = pssm_file.readlines()<br />
<br />
# positions of amino acid frequencies in PSSM<br />
# (after split(): 0 = position, 1 = residue, 2-21 = scores, 22-41 = observed percentages)<br />
aa_position = { 'A': 22,<br />
                'R': 23,<br />
                'N': 24,<br />
                'D': 25,<br />
                'C': 26,<br />
                'Q': 27,<br />
                'E': 28,<br />
                'G': 29,<br />
                'H': 30,<br />
                'I': 31,<br />
                'L': 32,<br />
                'K': 33,<br />
                'M': 34,<br />
                'F': 35,<br />
                'P': 36,<br />
                'S': 37,<br />
                'T': 38,<br />
                'W': 39,<br />
                'Y': 40,<br />
                'V': 41}<br />
<br />
# Each mutation is given as <wildtype><position><mutant>, e.g. R265W.<br />
for mutation in sys.argv[2:]:<br />
    old_aa = mutation[0]<br />
    new_aa = mutation[-1]<br />
    position = mutation[1:-1]<br />
<br />
    # Skip the first two lines, then look for the row of this position.<br />
    for line in lines[2:]:<br />
        splitted = line.split()<br />
        # Blank lines split to an empty list and are skipped.<br />
        if splitted and splitted[0] == position:<br />
            print("Frequencies at position %s: %s %s, %s %s"<br />
                  % (position,<br />
                     old_aa,<br />
                     splitted[aa_position[old_aa]],<br />
                     new_aa,<br />
                     splitted[aa_position[new_aa]]))<br />
            break<br />
</source><br />
<br />
=== Using MSA ===<br />
<br />
The reference sequence of BCKDHA was aligned against the mammalian subset of the UniProt database, with a significance cutoff of 0.1 and at most 1000 hits shown. The UniProt web tool makes it easy to extract the UniProt identifiers of the homologous mammalian proteins, and the UniProt retrieve tool was then used to download all of their sequences. Finally, the multiple sequence alignment of the BCKDHA reference sequence and all homologous mammalian proteins was generated with the ClustalW2 tool from EBI.<br />
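<br />
Once the alignment is available, the conservation of a mutated position can be read from the corresponding alignment column. The sketch below illustrates one way to do this in plain Python; it is not the procedure used for this page. It assumes the alignment has already been loaded as a list of equally long strings with the reference sequence first, and both function names are hypothetical.<br />

```python
from collections import Counter


def column_for_position(aligned_ref, position):
    """Return the 0-based alignment column of a 1-based ungapped position."""
    seen = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position %d is beyond the reference sequence" % position)


def residue_frequencies(alignment, position):
    """Relative residue frequencies in the column of a reference position.

    alignment: list of equally long aligned sequences, reference first.
    Gaps are counted as their own symbol '-'.
    """
    column = column_for_position(alignment[0], position)
    counts = Counter(seq[column] for seq in alignment)
    total = float(len(alignment))
    return dict((residue, n / total) for residue, n in counts.items())
```

The gap-aware column lookup is needed because a position in the reference sequence generally does not coincide with the column index once gaps have been inserted into the reference row.<br />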
<br />
== Predicting effects of mutations ==<br />
<br />
=== SIFT ===<br />
<br />
SIFT was run on [http://sift.jcvi.org/www/SIFT_seq_submit2.html this server] with default parameters.<br />
<br />
=== Polyphen2 ===<br />
<br />
[http://genetics.bwh.harvard.edu/pph2/ This server] was used with default parameters to run Polyphen2.<br />
<br />
=== MutationTaster ===<br />
The [http://www.mutationtaster.org/ MutationTaster] web tool was used to analyze mutations in BCKDHA. Because the server evaluates SNPs on the mRNA sequence, we consulted the reference mRNA sequence of BCKDHA (NM_000709) and found that the open reading frame begins at the 40th nucleotide, i.e. at the first start codon. With this offset, amino acid mutations can be mapped to positions in the mRNA sequence.<br />
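<br />
The mapping from an amino acid position to the mRNA is simple arithmetic. The sketch below assumes only the ORF start at nucleotide 40 stated above; the function names are hypothetical. It returns the three nucleotide positions of the affected codon. Which nucleotide within the codon is exchanged still has to be looked up in the sequence.<br />

```python
# 1-based mRNA position of the first nucleotide of the start codon (see text above).
ORF_START = 40


def codon_range(aa_position, orf_start=ORF_START):
    """Return (first, last) 1-based mRNA positions of the codon
    encoding the given 1-based amino acid position."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = orf_start + 3 * (aa_position - 1)
    return first, first + 2


def cds_range(aa_position):
    """Same codon, counted from the first nucleotide of the start codon."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = 3 * (aa_position - 1) + 1
    return first, first + 2
```

For residue 265, for instance, the codon covers mRNA positions 832 to 834, or positions 793 to 795 when counted from the start codon.<br />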
<br />
=== SNAP ===<br />
For this part of the task we used SNAP2. To make SNAP2 run properly, we had to write a run-configuration file .snap2rc and put it into the home directory on the student server. The configuration file is shown below:<br />
<br />
<source lang="ini"><br />
[snap2]<br />
# snapfun_utildir=path - path to package utilities, default: /usr/share/snap2<br />
snap2dir=/usr/share/snap2<br />
#use snap cache [0|1], default: 0<br />
use_snap_cache=0<br />
#snap cache fetch executable<br />
snapc_fetch=/usr/bin/snapc_fetch<br />
#snap cache store executable<br />
snapc_store=/usr/bin/snapc_store<br />
#snap cache root - overrides snap-cache-mgr configuration<br />
snap_cache_root=<br />
# blastpgp_processors<br />
blastpgp_processors=1<br />
#use predictprotein cache, default: 0<br />
use_pp_cache=0<br />
#predictprotein executable<br />
pp_exe=/usr/bin/predictprotein<br />
#sift executable<br />
sift_exe=/usr/bin/sift_for_submitting_fasta_seq.csh<br />
#reprof executable<br />
reprof_exe=/usr/bin/reprof<br />
#blast executable<br />
blast_exe=/usr/bin/blastpgp<br />
<br />
<br />
[data]<br />
# swiss_dat=path - location of UniProt/Swiss-Prot dat file<br />
swiss_dat=/mnt/project/pracstrucfunc13/data/swissprot/20120501/uniprot_sprot.dat<br />
# db_swiss=path - path to ID index of Swiss-Prot dat file (generated by /usr/share/librg-utils-perl/dbSwiss.pl)<br />
db_swiss=/mnt/project/pracstrucfunc13/data/swissprot/20120501/dbswiss<br />
# PHAT substitution matrix<br />
phat_matrix=/usr/share/snap2/phat.txt<br />
<br />
[blast]<br />
# big80=path - path to redundancy reduced database (UniProtKB 80 or equivalent)<br />
big80=/mnt/project/pracstrucfunc13/data/big/big_80<br />
# swiss=path - path to SwissProt database<br />
swiss=/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot<br />
</source><br />
<br />
To call snap2, we have used this command: <tt>snap2 -m mutation-list.txt -i ../task01/refseq_BCKDHA_protein.fasta -o snapfun-bckdha.result</tt></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_8_Lab_Journal_(MSUD)&diff=34477Task 8 Lab Journal (MSUD)2013-08-09T17:23:14Z<p>Weish: /* SNAP */</p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:Pymol-alignment-multitemplate-low.png&diff=34475File:Pymol-alignment-multitemplate-low.png2013-08-09T17:00:37Z<p>Weish: uploaded a new version of "File:Pymol-alignment-multitemplate-low.png"</p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:Pymol-alignment-3exg.png&diff=34474File:Pymol-alignment-3exg.png2013-08-09T16:59:43Z<p>Weish: uploaded a new version of "File:Pymol-alignment-3exg.png"</p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34463Task 2 (MSUD)2013-08-09T16:26:38Z<p>Weish: /* Results */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
We performed sequence searches for all four subunits of BCKDC. This page mainly describes and discusses the results for the subunit BCKDHA. Results and discussion for the other three subunits are covered on this page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
<br />
==== Distributions of E-value and sequence identity ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
==== Intersection of hits ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
==== Evaluation through structure and function ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP fold in BLAST hits(only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
* E-value distribution: <br />
** Very few hits were found with very low E-values. These hits show high statistical significance.<br />
** Because different databases were used for BLAST/PSI-BLAST and HHblits, the HHblits hits cover a larger range of E-values.<br />
** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations. Although the hits become statistically more significant, their biological significance still has to be tested. With more iterations the shift could be even larger, so the overlap between statistically significant hits and biological homologs must be evaluated and a suitable number of iterations selected.<br />
<br />
* Identity distribution<br />
** The results show that BLAST depends mostly on sequence identity. Homologs with low sequence identity but high biological similarity could be missed. <br />
<br />
* Intersection of hits<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST.<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits.<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST. <br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits. A possible explanation is that the E-value cutoff has a stronger influence than the number of iterations. <br />
<br />
* SCOP of hit sequences<br />
** Both BLAST and PSI-BLAST find the correct fold class for BCKDHA.<br />
** PSI-BLAST finds more hits in the fold class that describes the query protein best. Most hits are classified as c.36, the thiamin diphosphate-binding fold, which corresponds to the main binding function of BCKDHA. <br />
** PSI-BLAST also finds hits in additional fold classes, which may reflect biological similarities in domains and motifs between the hits and the query protein. <br />
<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in the hits of PSI-BLAST runs with different numbers of iterations are largely conserved and have a similar frequency ranking. <br />
** PSI-BLAST finds hits with more GO terms. It may be more sensitive to functional patterns in the sequence.<br />
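<br />
The fold classes discussed above are obtained by cutting the SCOP classification string (sccs) after its second field, so that for example c.36.1.11 becomes c.36. The following is a minimal Python sketch of that step, mirroring what the <tt>getSCOPFold</tt> function in the lab journal's R script does. The function names and the example sccs values are assumptions made for illustration and are not taken from our results.<br />

```python
import re
from collections import Counter


def scop_fold(sccs):
    """Reduce an sccs string such as 'c.36.1.11' to its fold part, 'c.36'."""
    match = re.match(r'^(\w\.\d+)\.', sccs)
    # leave the string unchanged if it does not look like an sccs
    return match.group(1) if match else sccs


def fold_histogram(sccs_list):
    """Count how often each fold occurs in a list of sccs strings."""
    return Counter(scop_fold(sccs) for sccs in sccs_list)


# illustrative values only, not real search results
example = ['c.36.1.11', 'c.36.1.7', 'c.48.1.2', 'c.36.1.11']
print(fold_histogram(example).most_common())
```

Counting per fold rather than per full sccs groups hits from different superfamilies and families of the same fold, which is the level the histograms above are drawn at.<br />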
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee produce similar results and find almost the same conserved blocks. Sometimes T-Coffee places gaps differently from the others and therefore finds fewer conserved columns. The results differ slightly, especially at the ends of the sequences. This is due to the different scoring schemes used by the programs.<br />
<br />
For low sequence identity, the programs have difficulty finding the correct alignment. They disagree on the positions of gaps and sometimes find different conserved columns. They do not cope well with low similarity, so these results are not reliable. Here, structural information, as used by Espresso (which is part of T-Coffee), can help to find the correct alignment: Espresso aligns more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many differently placed gaps at the ends of the sequences, but the programs agree more on the conserved columns they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of poor alignment quality.<br />
<br />
Overall, there are regions with many conserved columns and regions with many gaps. The conserved blocks and columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where the protein structure has loops, where insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
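<br />
The quantities compared in this discussion (alignment length, number of conserved columns, amount of gaps) can be computed directly from an aligned FASTA file. The following Python sketch shows one way to do this on a toy alignment. It is only an illustration under stated assumptions: "conserved" is defined here strictly as a gap-free column with identical residues, which is not necessarily the conservation measure Jalview displays, and the function names and the toy sequences are made up.<br />

```python
def conserved_columns(alignment):
    """Return indices of gap-free columns in which all residues are identical."""
    return [index for index, column in enumerate(zip(*alignment))
            if '-' not in column and len(set(column)) == 1]


def gap_fraction(alignment):
    """Return the fraction of gap characters among all cells of the alignment."""
    cells = sum(len(row) for row in alignment)
    gaps = sum(row.count('-') for row in alignment)
    return gaps / float(cells)


# toy alignment: all rows must have the same length
msa = ['MKV-LA',
       'MRVQLA',
       'MKV-LG']
print(len(msa[0]), conserved_columns(msa), gap_fraction(msa))
```

Comparing these numbers across the outputs of several programs for the same dataset gives a rough, purely sequence-based impression of which alignment is more compact.<br />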
<br />
One criterion for a good alignment is to run different alignment algorithms, as in this task, and compare the results. An alignment with more conserved columns might be better than another. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results and is still very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_lab_journal_(MSUD)&diff=34462Task 2 lab journal (MSUD)2013-08-09T16:26:26Z<p>Weish: /* Data creation */</p>
<hr />
<div>== Sequence searches ==<br />
=== Configure parameters===<br />
* We have used <tt>blastall</tt> for blastp and psiblast<br />
* BLAST and PSI-BLAST<br />
** '''big_80''' database was used for sequence search: <tt>-d /mnt/project/pracstrucfunc13/data/big/big_80</tt><br />
** the output format was set to XML and tab-separated values:<br />
*** XML output: <tt>-m 7</tt><br />
*** TSV output: <tt>-m 9</tt><br />
** the number of hits to be shown was set to 200000: <tt>-b 200000</tt><br />
** For PSI-Blast, the number of iterations and the E-value cutoff were set with the following parameters:<br />
*** iteration: -j <number_of_iterations><br />
*** E-value cutoff: -h <threshold><br />
* HHBlits<br />
** '''uniprot20''' database was used for sequence search: <tt>-d /mnt/project/pracstrucfunc13/data/hhblits/uniprot20_current</tt><br />
** The resulting a3m, hhm and hhr files were stored: <tt>-o result.hhr -oa3m result.a3m -ohhm result.hhm</tt><br />
** Output size was also set to 200000: <tt>-Z 200000 -B 200000</tt><br />
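<br />
To make the parameter combinations explicit, the options listed above can be assembled programmatically. The following Python sketch only builds the option lists. The program name and the query and output options are intentionally left out because they are not listed above, and the function names are assumptions made for illustration. The four PSI-BLAST settings are the ones that appear in the result file names used by the R script below.<br />

```python
from itertools import product

# database path as given in the parameter list above
DB = '/mnt/project/pracstrucfunc13/data/big/big_80'


def common_options(output_format):
    """Options shared by the BLAST and PSI-BLAST runs (7 = XML, 9 = TSV)."""
    return ['-d', DB, '-m', str(output_format), '-b', '200000']


def psiblast_options(iterations, evalue_cutoff, output_format):
    """Add the iteration count (-j) and the E-value cutoff (-h) for PSI-BLAST."""
    return common_options(output_format) + ['-j', str(iterations),
                                            '-h', evalue_cutoff]


# two iteration counts times two E-value cutoffs
for iterations, cutoff in product([2, 10], ['0.002', '10e-10']):
    print(' '.join(psiblast_options(iterations, cutoff, 9)))
```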
<br />
=== Data creation ===<br />
For each of BLAST, PSI-BLAST and HHBlits, a shell script was written to perform the sequence search.<br />
* Blast: [[all-blast.sh]] [[run-blast.pl]]<br />
* Psi-Blast: [[all-psiblast.sh]]<br />
* HHBlits: [[all-hhblits.sh]]<br />
<br />
==== Convert result of hhblits ====<br />
We found that the hhblits output in hhr format is not parser-friendly, so the program '''<tt>[[hhr2tsv]]</tt>''' was written. It extracts all hits and their statistics from an hhr file and writes them to a TSV (tab-separated values) file.<br />
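<br />
The idea behind such a conversion can be sketched in a few lines of Python. This is not the actual <tt>hhr2tsv</tt> program, only a minimal illustration. It assumes that the hit table of an hhr file has the columns No, Hit, Prob, E-value, P-value, Score, SS, Cols, Query HMM and Template HMM separated by whitespace, it ignores the alignment blocks (and therefore the sequence identity), and the example line uses a made-up accession.<br />

```python
import re

# tail of a hit-table line:
# Prob E-value P-value Score SS Cols Query-HMM Template-HMM (template length)
HIT_TAIL = re.compile(
    r'\s+(?P<prob>[\d.]+)\s+(?P<e_value>\S+)\s+(?P<p_value>\S+)'
    r'\s+(?P<score>[\d.]+)\s+(?P<ss>-?[\d.]+)\s+(?P<cols>\d+)'
    r'\s+(?P<query>\d+-\d+)\s+(?P<template>\d+-\d+)\s*\(\d+\)\s*$')

COLUMNS = ['no', 'hit', 'prob', 'e_value', 'score', 'cols']


def parse_hit_line(line):
    """Parse one line of the hit table; return None for any other line."""
    tail = HIT_TAIL.search(line)
    if tail is None:
        return None
    # everything before the numeric tail: rank, hit id, optional description
    head = line[:tail.start()].split(None, 2)
    if len(head) < 2 or not head[0].isdigit():
        return None
    fields = tail.groupdict()
    fields['no'] = int(head[0])
    fields['hit'] = head[1]
    fields['e_value'] = float(fields['e_value'])
    return fields


def hits_to_tsv(lines):
    """Convert all recognisable hit lines into a TSV string with a header."""
    out = ['\t'.join(COLUMNS)]
    for line in lines:
        row = parse_hit_line(line)
        if row is not None:
            out.append('\t'.join(str(row[name]) for name in COLUMNS))
    return '\n'.join(out)


# made-up example line, not a real hit
sample = ('  1 tr|X00001|X00001_TEST Example protein  100.0 1.2E-100 '
          '3.4E-105  700.1   0.0  400    1-400     1-400 (400)')
print(hits_to_tsv([sample]))
```

Matching the numeric columns from the right end of the line avoids having to guess where the free-text description of a hit ends.<br />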
<br />
==== SCOP classification ====<br />
In order to check the quality of the sequence search programs, we used the parsable file from the Structural Classification of Proteins ('''SCOP''').<br />
The classification of the domains in PDB files can be found in the file <tt>[http://scop.mrc-lmb.cam.ac.uk/scop/parse/ dir.cla.scop.txt_1.75]</tt>.<br />
<br />
==== Id mapping from Uniprot to Gene Ontology ====<br />
The quality check of the sequence search programs was also performed using Gene Ontology (GO) annotations. The idmapping data was used to assign GO annotations to each Uniprot protein (gene).<br />
* Idmapping data in TSV format was downloaded from FTP site of Uniprot: [http://www.uniprot.org/downloads idmapping_selected.tab.gz] (large file! 1.2 GB) [ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/README README of idmapping]<br />
* GO term IDs are stored in the 7th column<br />
* Uniprot accession codes are in the 1st column<br />
<br />
The following command was used to extract all Uniprot entries with GO annotation:<br />
<nowiki>zcat idmapping_selected.tab.gz | cut -f1,7 | grep -vP '\s$' > uniprot_go_mapping.tsv</nowiki><br />
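<br />
For readers who prefer to stay in Python, the same filtering step can be sketched as follows. This is a rough equivalent of the shell pipeline above, not the command that was actually run. It assumes Python 3, the column positions stated above (accession in column 1, GO terms in column 7), and a hypothetical function name.<br />

```python
import gzip


def extract_go_mapping(in_path, out_path):
    """Write accession and GO terms for every entry that has a GO annotation.

    Returns the number of lines written.
    """
    kept = 0
    with gzip.open(in_path, 'rt') as src, open(out_path, 'w') as dst:
        for line in src:
            fields = line.rstrip('\n').split('\t')
            # skip entries whose 7th column (GO terms) is missing or empty
            if len(fields) >= 7 and fields[6].strip():
                dst.write(fields[0] + '\t' + fields[6] + '\n')
                kept += 1
    return kept


# usage, with the file names from the command above:
# extract_go_mapping('idmapping_selected.tab.gz', 'uniprot_go_mapping.tsv')
```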
<br />
For efficient searching of GO terms, the mapping data was stored in a SQLite3 database. The schema of the database is as follows:<br />
<source lang='sql'><br />
CREATE TABLE uniprot_go_mapping ( <br />
uniprot_ac CHAR( 6 ) NOT NULL,<br />
go_term CHAR( 10 ) NOT NULL,<br />
PRIMARY KEY ( uniprot_ac, go_term ) <br />
);<br />
CREATE INDEX idx_uniprot_ac ON uniprot_go_mapping (uniprot_ac ASC);<br />
</source><br />
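<br />
Once the table is filled, the GO terms of a protein can be looked up with a single indexed query. The following Python sketch demonstrates this on an in-memory database that uses the schema above. The accessions and GO identifiers are made-up placeholders and the helper name <tt>go_terms_for</tt> is an assumption for illustration.<br />

```python
import sqlite3

SCHEMA = """
CREATE TABLE uniprot_go_mapping (
    uniprot_ac CHAR( 6 ) NOT NULL,
    go_term CHAR( 10 ) NOT NULL,
    PRIMARY KEY ( uniprot_ac, go_term )
);
CREATE INDEX idx_uniprot_ac ON uniprot_go_mapping (uniprot_ac ASC);
"""


def go_terms_for(conn, accession):
    """Return the sorted GO terms stored for one Uniprot accession."""
    rows = conn.execute(
        "SELECT go_term FROM uniprot_go_mapping "
        "WHERE uniprot_ac = ? ORDER BY go_term", (accession,))
    return [row[0] for row in rows]


# in-memory database with placeholder rows (not real annotations)
conn = sqlite3.connect(':memory:')
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT OR IGNORE INTO uniprot_go_mapping (uniprot_ac, go_term) "
    "VALUES (?, ?)",
    [('X00001', 'GO:0000001'), ('X00001', 'GO:0000002'),
     ('X00001', 'GO:0000001'), ('X00002', 'GO:0000003')])
print(go_terms_for(conn, 'X00001'))
```

The composite primary key together with <tt>INSERT OR IGNORE</tt> silently drops repeated accession and GO term pairs, so the import can be re-run without creating duplicates.<br />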
<br />
The following Python script imports the mapping data into SQLite.<br />
<source lang='python'><br />
#!/usr/bin/env python<br />
import sqlite3<br />
<br />
conn = sqlite3.connect('idmapping.sqlite3')<br />
c = conn.cursor()<br />
<br />
INSERT_SQL = ("INSERT OR IGNORE INTO uniprot_go_mapping "<br />
              "(uniprot_ac,go_term) VALUES (?,?);")<br />
<br />
def generate_key_val_pair(key, val):<br />
    '''callback function for map procedure<br />
    pack key and value into one tuple'''<br />
    return (key, val)<br />
<br />
tsv = open('/mnt/datentank/uniprot_go_reduced.map')<br />
<br />
entriesBuffer = []<br />
<br />
# add uniprot ac and GO term pairs into sqlite db<br />
for lineId, line in enumerate(tsv):<br />
    line = line.strip()<br />
    keyval = line.split('\t')<br />
    key = keyval[0]<br />
    vals = keyval[1].split('; ')<br />
    keys = [ key ] * len(vals)<br />
    # always buffer the entries of the current line, then flush in batches<br />
    entriesBuffer.extend(map(generate_key_val_pair, keys, vals))<br />
    if len(entriesBuffer) >= 1000000:<br />
        c.executemany(INSERT_SQL, entriesBuffer)<br />
        conn.commit()<br />
        entriesBuffer[:] = []<br />
        print('Processed %d lines' % (lineId + 1))<br />
<br />
# write the entries that are still left in the buffer<br />
if entriesBuffer:<br />
    c.executemany(INSERT_SQL, entriesBuffer)<br />
    conn.commit()<br />
<br />
tsv.close()<br />
conn.close()<br />
</source><br />
<br />
==== Input and output ====<br />
The query sequences for the 4 subunits of BCKDC are located at /mnt/home/student/weish/master-practical-2013/task01/.<br />
The results of the sequence searches are located in the directory /mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results. For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in *.tsv files. Detailed results are stored in XML files. For HHBlits, the *.hhr files contain information about statistics and hits.<br />
<br />
=== Perform statistics ===<br />
After all hhr files had been converted into TSV files and the SQLite3 database for ID mapping had been prepared, an R script was used for all statistical tasks, including the distributions of E-values and identity, the intersection of hits, and the SCOP and GO tests.<br />
<br />
Here is the R code:<br />
<nowiki><br />
#######################################################################<br />
#File: evalAlign.R<br />
#Description: perform pairwise comparison of sequence search methods<br />
#######################################################################<br />
<br />
library(ggplot2)<br />
<br />
library(RSQLite)<br />
<br />
loadHHBlitsTSV <- function(file) {<br />
data <- read.csv(file=file, sep='\t', header=TRUE)<br />
data$eval <- data$e_value<br />
return(data)<br />
}<br />
<br />
loadBlastTSV <- function(file) {<br />
data <- read.csv(file=file, sep='\t', header=FALSE, comment.char='#')<br />
names(data) <- c('query', <br />
'hit', <br />
'identity', <br />
'algn_len', <br />
'mismatch', <br />
'gap_open',<br />
'q_start',<br />
'q_end',<br />
's_start', <br />
's_end', <br />
'eval',<br />
'score')<br />
return(data)<br />
}<br />
<br />
#get pdb hits from sequence search results<br />
getPDBEntries <- function(data)<br />
{<br />
d <- data[grep('pdb\\|pdb', data$hit),]<br />
d$chain <- gsub(pattern='^.*(\\w)$', replacement='\\1', d$hit)<br />
d$pdb <- gsub(pattern='^.*(\\w{4})_\\w$', replacement='\\1', d$hit)<br />
d$pdb_chain <- paste(d$pdb, d$chain, sep='_')<br />
return(d)<br />
}<br />
<br />
#get uniprot hits from sequence search results<br />
getUniProtEntries <- function(data)<br />
{<br />
d <- data[grep("^(tr|sp)\\|", data$hit), ]<br />
d$uniprot <- gsub(pattern="^(tr|sp)\\|(\\w*)\\|.*$", replacement="\\2", d$hit)<br />
return(d)<br />
}<br />
<br />
#load parsable file: dir.cla.scop.txt<br />
loadSCOPClassification <- function(file)<br />
{<br />
data <- read.csv(file=file, sep='\t', comment.char='#')<br />
names(data) <- c('domain',<br />
'pdb',<br />
'chain',<br />
'sccs',<br />
'sunid',<br />
'fullstr')<br />
data$pdb_chain <- paste(data$pdb, substr(data$chain,1,1), sep='_')<br />
return(data)<br />
}<br />
<br />
getSCOPFold <- function(scopData)<br />
{<br />
gsub(pattern="^(\\w\\.\\d*)\\..*$", replacement="\\1", scopData$sccs)<br />
}<br />
<br />
getUniprotGO <- function(data, database.path)<br />
{<br />
conn <- dbConnect(dbDriver('SQLite'), dbname=database.path)<br />
uniprot_acs <- unique(data$uniprot)<br />
#create temporary table<br />
dbGetQuery(conn, "CREATE TEMPORARY TABLE temp_uniprot_ac(uniprot_ac CHAR(6));")<br />
dbSendPreparedQuery(<br />
conn, <br />
"INSERT INTO temp_uniprot_ac(uniprot_ac) VALUES(:uniprot_ac)",<br />
bind.data=data.frame(uniprot_ac=data$uniprot))<br />
result <- dbGetQuery(<br />
conn, <br />
"SELECT count(t.uniprot_ac) as count, go_term as go<br />
FROM temp_uniprot_ac t,uniprot_go_mapping m <br />
WHERE t.uniprot_ac = m.uniprot_ac GROUP BY go_term ORDER BY count")<br />
dbSendQuery(conn, "DROP TABLE temp_uniprot_ac;")<br />
dbDisconnect(conn)<br />
return(result)<br />
}<br />
<br />
###<br />
# configure test cases for reference sequences<br />
###<br />
querys <- c('BCKDHA', 'BCKDHB', 'DBT', 'DLD')<br />
data.path <- <br />
'/home/wei/git/MasterPractical2013/results/task02/01-seq-search/search-results'<br />
color.palette <- 'Set2'<br />
database.path <- '/home/wei/idmapping.sqlite3'<br />
image.type <- '.png'<br />
<br />
#load SCOP classification<br />
scop <- loadSCOPClassification(<br />
file='/home/wei/git/MasterPractical2013/data/SCOP/dir.cla.scop.txt_1.75')<br />
<br />
for (query in querys) {<br />
#configure input files<br />
input.filenames <- c(paste('blastp_', query, '.tsv', sep=''),<br />
paste('psiblast-2_iterations_eval_0.002-refseq_', query, '_protein.fasta.tsv', sep=''),<br />
paste('psiblast-2_iterations_eval_10e-10-refseq_', query, '_protein.fasta.tsv', sep=''),<br />
paste('psiblast-10_iterations_eval_0.002-refseq_', query, '_protein.fasta.tsv', sep=''),<br />
paste('psiblast-10_iterations_eval_10e-10-refseq_', query, '_protein.fasta.tsv', sep=''),<br />
paste('hhblits_refseq_', query, '_protein.fasta.hhr.tsv', sep=''))<br />
data.names <- c('blast', <br />
'psiblast(iter. 2, e-val. 0.002)', <br />
'psiblast(iter. 2, e-val. 10e-10)',<br />
'psiblast(iter. 10, e-val. 0.002)',<br />
'psiblast(iter. 10, e-val. 10e-10)',<br />
'hhblits')<br />
data.titles <- c('Blast',<br />
'PSI-Blast[iter. 2, eval. 0.002]',<br />
'PSI-Blast[iter. 2, eval. 10e-10]',<br />
'PSI-Blast[iter. 10, eval. 0.002]',<br />
'PSI-Blast[iter. 10, eval. 10e-10]',<br />
'HHBlits')<br />
<br />
#load data<br />
data <- list()<br />
evals <- c()<br />
methods <- c()<br />
identities <- c()<br />
for (index in 1:length(input.filenames))<br />
{<br />
data.name <- data.names[index]<br />
print(data.name)<br />
input.path <- file.path(data.path, input.filenames[index])<br />
if (data.name != 'hhblits') <br />
{<br />
frame <- loadBlastTSV(input.path)<br />
} else {<br />
frame <- loadHHBlitsTSV(input.path)<br />
}<br />
data[[ data.name ]] <- frame<br />
n <- length(frame$eval)<br />
evals <- c(evals, frame$eval)<br />
methods <- c(methods, rep(x=data.name, n))<br />
identities <- c(identities, frame$identity)<br />
}<br />
DATA <- data.frame(evalue=evals, identity=identities, method=methods)<br />
<br />
###<br />
# evaluation: e-value distribution<br />
###<br />
PLOT <- ggplot(DATA, aes(x=evalue))<br />
PLOT + geom_density(aes(colour=factor(method), fill=factor(method)), alpha=.7) +<br />
scale_x_log10() + scale_alpha(range=c(0, 1)) +<br />
ggtitle(paste('E-value distribution (', query, ')', sep='')) +<br />
xlab('E-value') + ylab('Density')+<br />
scale_colour_brewer(palette=color.palette) +<br />
scale_fill_brewer(palette=color.palette)<br />
ggsave(paste("e-value-distribution_", query, image.type, sep=''), width=8.3, height=6.8, dpi=100)<br />
###<br />
# evaluation: identity distribution<br />
### <br />
PLOT <- ggplot(DATA, aes(x=identity))<br />
PLOT + geom_density(aes(colour=factor(method), fill=factor(method)), alpha=.7) +<br />
scale_alpha(range=c(0, 1)) +<br />
ggtitle(paste('Identity distribution (', query, ')', sep='')) +<br />
xlab('Identity') + ylab('Density')+<br />
scale_colour_brewer(palette=color.palette) +<br />
scale_fill_brewer(palette=color.palette)<br />
ggsave(paste("identity_distribution_", query,image.type, sep=''), width=8.3, height=6.8, dpi=100)<br />
<br />
###<br />
# evaluation: intersection curve<br />
###<br />
thresholds <- c(0, 1e-100, 1e-90, 1e-80, 1e-70, <br />
1e-60, 1e-50, 1e-40, 1e-30, 1e-20, <br />
1e-10, 1, 10)<br />
for (index1 in 1:length(data.names))<br />
{<br />
dataset1 <- data[[index1]]<br />
hits1 <- dataset1$hit<br />
meth.name <- data.names[index1]<br />
threshold <- c()<br />
quality <- c()<br />
method <- c()<br />
<br />
for (index2 in 1:length(data.names))<br />
{<br />
if (index2 == index1) {<br />
next();<br />
}<br />
dataset2 <- data[[index2]]<br />
hits2 <- dataset2$hit<br />
curmeth <- data.names[index2]<br />
for (thr in thresholds)<br />
{<br />
intersection <- length( <br />
intersect(<br />
hits1[ dataset1$eval <= thr ],<br />
hits2[ dataset2$eval <= thr ]) )<br />
avg.intersect <- 0.5 * (<br />
intersection / length(hits1) + <br />
intersection / length(hits2) )<br />
<br />
threshold <- c(threshold, thr)<br />
method <- c(method, curmeth)<br />
quality <- c(quality, avg.intersect)<br />
}<br />
}<br />
<br />
performance <- data.frame(threshold, quality, method)<br />
PLOT <- ggplot(performance, aes(x=threshold, y=quality, colour=method))<br />
PLOT + geom_line(size=1) + geom_point() + scale_x_log10() +<br />
ggtitle( <br />
paste('Relative intersections (comparison to ',meth.name,')', sep='') ) +<br />
xlab('E-value') + ylab('intersecting results') +<br />
scale_colour_brewer(palette=color.palette) + ylim(0,1)<br />
ggsave( paste("intersection_to_", meth.name,"_", query,image.type, sep=''), width=8.3, height=6.8, dpi=100)<br />
}<br />
<br />
###<br />
# evaluation: protein structure classification<br />
###<br />
for (methId in 1:length(data.names))<br />
{<br />
method <- data.names[methId]<br />
dataset <- getPDBEntries(data[[ method ]])<br />
subsetSCOP <- scop[ scop$pdb_chain %in% dataset$pdb_chain, ]<br />
d <- data.frame(folds=getSCOPFold(subsetSCOP))<br />
if (length(d$folds) == 0)<br />
{<br />
next();<br />
}<br />
d$freq <- rep(0, length(d$folds))<br />
ftable <- table(d$folds)<br />
for (fold in names(ftable))<br />
{<br />
d$freq[ d$folds == fold ] <- ftable[ fold ]<br />
}<br />
PLOT <- ggplot(d, aes(x=reorder(folds, freq)))<br />
PLOT + geom_histogram() + <br />
ggtitle("Histogram of fold classes from annotated pdb hits") +<br />
xlab('fold class')<br />
ggsave( paste("SCOP_histogram_", method, "_", query, image.type, sep=''), width=8.3, height=6.8, dpi=100 );<br />
}<br />
<br />
###<br />
# evaluation: Gene Ontology<br />
###<br />
# common_gos <- list()<br />
for (methId in 1:length(data.names))<br />
{<br />
method <- data.names[methId]<br />
dataset <- getUniProtEntries(data[[ method ]])<br />
if (length(dataset$uniprot) == 0)<br />
{<br />
next();<br />
}<br />
result <- getUniprotGO(dataset, database.path=database.path)<br />
total.count <- length(dataset$uniprot)<br />
#result$percent <- result$count / length(dataset$uniprot)<br />
result$percent <- result$count / total.count<br />
#result <- result[ result$count > 0.05 * total.count, ]<br />
result <- result[ order(result$count, decreasing=TRUE)[1:5], ]<br />
PLOT <- ggplot(result, aes(x=reorder(go, count), y=percent))<br />
PLOT + geom_bar(stat='identity') + coord_flip() + <br />
ggtitle("Histogram of top-5 go terms from annotated uniprot hits") +<br />
xlab('GO term') + ylab('Frequency') + ylim(0,1)<br />
ggsave( paste("GO_histogram_", method, "_", query, ".png", sep=''), width=8.3, height=6.8, dpi=100 );<br />
<br />
# go_terms <- data.frame(threshold = thresholds, count=rep(0, length(thresholds)))<br />
# for (thr in thresholds)<br />
# {<br />
# subset <- dataset[ dataset$eval <= thr, ]<br />
# result <- getUniprotGO(dataset, database.path=database.path)<br />
# go_terms$count[ go_terms$threshold == thr ] <- length(unique(result$go))<br />
# }<br />
# common_gos[method] <- go_terms<br />
}<br />
<br />
save.image(file=paste('Workspace_', query, '.RData', sep=''), compress="xz")<br />
}<br />
</nowiki><br />
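<br />
The intersection measure plotted by the script is the least obvious part of the evaluation, so here is the same computation as a small Python sketch: for a given E-value threshold, the hits of both methods below the threshold are intersected, and the size of the intersection is divided by the total number of hits of each method and averaged. The function name and the toy hit lists are assumptions for illustration. Unlike the R data frames, the dictionaries used here cannot contain the same hit twice.<br />

```python
def relative_intersection(hits1, hits2, threshold):
    """Average relative overlap of two hit sets at an E-value threshold.

    hits1 and hits2 map hit identifiers to E-values.
    """
    below1 = set(hit for hit, evalue in hits1.items() if evalue <= threshold)
    below2 = set(hit for hit, evalue in hits2.items() if evalue <= threshold)
    common = len(below1 & below2)
    # normalise by the total number of hits of each method, as in the R script
    return 0.5 * (common / float(len(hits1)) + common / float(len(hits2)))


# toy data, not real search results
method_a = {'a': 1e-50, 'b': 1e-5, 'c': 0.5}
method_b = {'a': 1e-40, 'b': 1e-20, 'd': 1e-3, 'e': 2.0}
for thr in [1e-10, 10]:
    print(thr, relative_intersection(method_a, method_b, thr))
```

Because the denominator is the full hit list and not the thresholded one, the curve can only rise as the threshold is relaxed and reaches its final value once all hits of both methods are included.<br />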
<br />
== Multiple sequence alignments ==<br />
<br />
=== Dataset creation ===<br />
The datasets were created from the Blast output.<br />
<br />
For creating datasets with low, high and whole range sequence identity, the following Python script was used:<br />
<br />
<br />
<br />
<source lang="python"><br />
'''<br />
Use blast output to create datasets with different sequence identities.<br />
<br />
Call with:<br />
python create_dataset.py <query fasta file> <blast xml output> <database fasta file><br />
<br />
@author: Laura Schiller<br />
'''<br />
<br />
import sys<br />
from Bio import SeqIO, pairwise2<br />
from Bio.Blast import NCBIXML<br />
from Bio.SubsMat import MatrixInfo<br />
<br />
def get_sequences(query, blast_xml, db_path, out_file):<br />
    '''<br />
    Fetch full length sequences for a BLAST result and write them in a FASTA file.<br />
<br />
    @param query: fasta file with query sequence.<br />
    @param blast_xml: xml file with BLAST search result.<br />
    @param db_path: path of db fasta file.<br />
    @param out_file: file to store the sequences.<br />
    @return: name of the file where the sequences are stored.<br />
    '''<br />
<br />
    print("get all sequences for blast result in %s" % blast_xml)<br />
    query_sequence = SeqIO.read(query, "fasta")<br />
    hit_seqs = [query_sequence]<br />
<br />
    blast_output = open(blast_xml)<br />
    blast_result = NCBIXML.read(blast_output)<br />
    blast_output.close()<br />
<br />
    hit_list = []<br />
    for alignment in blast_result.alignments:<br />
        hit_list.append(alignment.title.split(" ")[1]) # the id of the sequence<br />
<br />
    # get all blast hits<br />
    seqs_db = SeqIO.parse(db_path, "fasta")<br />
    counter = 0<br />
    number = len(blast_result.alignments)<br />
    for seq in seqs_db:<br />
        if seq.id in hit_list:<br />
            hit_seqs.append(seq)<br />
            counter = counter + 1<br />
            if (counter % 100) == 0:<br />
                print("%d of %d sequences found" % (counter, number))<br />
            if counter == number:<br />
                break<br />
    print("%d of %d sequences found" % (counter, len(blast_result.alignments)))<br />
<br />
    # sort sequences according to order in blast result<br />
    hit_seqs_sorted = [hit_seqs[0]]<br />
    for seq_id in hit_list:<br />
        for seq in hit_seqs:<br />
            if seq.id == seq_id:<br />
                hit_seqs_sorted.append(seq)<br />
                break<br />
<br />
    SeqIO.write(hit_seqs_sorted, out_file, "fasta")<br />
<br />
    print("sequences saved in %s" % out_file)<br />
    return out_file<br />
<br />
def filter_seqs(seq_file, name):<br />
    '''<br />
    Filter sequences according to sequence identity limits.<br />
<br />
    @param seq_file: fasta file with sequences.<br />
    @param name: string used for output file names.<br />
    '''<br />
<br />
    sequences = SeqIO.parse(seq_file, "fasta")<br />
    seqs = []<br />
    for seq in sequences:<br />
        seqs.append(seq)<br />
    query = seqs.pop(0) # always keep query (first sequence)<br />
    # lists for low / high / whole range sequence identity<br />
    filtered = [[query], [query], [query]]<br />
<br />
    # identify hits with pdb structures -> these are preferentially taken<br />
    hits_pdb = [hit for hit in seqs if (hit.id.split("|")[0] == "pdb")]<br />
    seqs = [seq for seq in seqs if not (seq.id in [pdb_seq.id for pdb_seq in hits_pdb])]<br />
    hits_pdb.extend(seqs) # now pdb hits are at the beginning<br />
<br />
    print("filter sequences")<br />
    for seq in hits_pdb:<br />
        try:<br />
            ident = identity(query, seq)<br />
        except KeyError: # raises if there is a non amino acid letter<br />
            continue<br />
        if (len(filtered[0]) < 10):<br />
            keep = True<br />
            for seq2 in filtered[0]:<br />
                try:<br />
                    ident2 = identity(seq, seq2)<br />
                except KeyError:<br />
                    keep = False<br />
                    break<br />
                if ident2 >= 0.3:<br />
                    keep = False<br />
                    break<br />
            if keep:<br />
                filtered[0].append(seq)<br />
        if (len(filtered[1]) < 10) and (ident > 0.6):<br />
            filtered[1].append(seq)<br />
        if (len(filtered[2]) < 8) and (ident >= 0.3) and (ident <= 0.6):<br />
            filtered[2].append(seq)<br />
        if (len(filtered[0]) == 10) and (len(filtered[1]) == 10) and (len(filtered[2]) == 8):<br />
            break<br />
<br />
    # for whole range take a part of low and a part of high sequence identity<br />
    # plus the rest middle sequence identity<br />
    filtered[2].extend(filtered[0][1:min(len(filtered[0]), 7)])<br />
    filtered[2].extend(filtered[1][1:min(len(filtered[1]), 7)])<br />
<br />
    SeqIO.write(filtered[0], name + "_low_seq_ident.fasta", "fasta")<br />
    SeqIO.write(filtered[1], name + "_high_seq_ident.fasta", "fasta")<br />
    SeqIO.write(filtered[2], name + "_whole_range_seq_ident.fasta", "fasta")<br />
<br />
    print("sequences with low / high / whole range sequence identity saved in:")<br />
    print(name + "_low_seq_ident.fasta")<br />
    print(name + "_high_seq_ident.fasta")<br />
    print(name + "_whole_range_seq_ident.fasta")<br />
<br />
def identity(seq1, seq2):<br />
    '''<br />
    Calculate relative sequence identity of two sequences.<br />
<br />
    @return: number of identical residues divided by mean length.<br />
    '''<br />
<br />
    #pairwise alignment<br />
    matrix = MatrixInfo.blosum62<br />
    gap_open = -10<br />
    gap_extend = -0.5<br />
    alignment = pairwise2.align.globalds(seq1, seq2, matrix, gap_open, gap_extend)<br />
    seq1_aligned = alignment[0][0]<br />
    seq2_aligned = alignment[0][1]<br />
<br />
    #sequence identity<br />
    ident = sum(c1 == c2 for c1, c2 in zip(seq1_aligned, seq2_aligned))<br />
    ref_length = (len(seq1) + len(seq2)) / 2.0 # mean length (float division)<br />
    return float(ident) / ref_length<br />
<br />
<br />
if __name__ == '__main__':<br />
    namestring = sys.argv[1].split("/")[-1].split(".")[0] # used as beginning of the output files<br />
    print("-----------------------------------------------------------")<br />
    all_seqs = get_sequences(sys.argv[1], sys.argv[2], sys.argv[3], namestring + "_all_sequences.fasta")<br />
    filter_seqs(all_seqs, namestring)<br />
</source><br />
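<br />
The identity measure used for the filtering (identical positions divided by the mean length of the two ungapped sequences) can be illustrated without Biopython on two sequences that are already aligned. The following Python sketch is a simplified stand-in for the <tt>identity</tt> function above, not a replacement: it assumes the alignment is given as two gapped strings of equal length, and the function name and the toy sequences are made up.<br />

```python
def relative_identity(aligned1, aligned2):
    """Identical aligned residues divided by the mean ungapped length."""
    if len(aligned1) != len(aligned2):
        raise ValueError('aligned sequences must have the same length')
    # count columns with the same residue in both sequences (gaps never count)
    matches = sum(1 for a, b in zip(aligned1, aligned2)
                  if a == b and a != '-')
    length1 = len(aligned1.replace('-', ''))
    length2 = len(aligned2.replace('-', ''))
    return matches / ((length1 + length2) / 2.0)


# toy pairwise alignment: 4 identical columns, lengths 5 and 6
print(relative_identity('MKV-LA', 'MRVQLA'))
```

With this measure, the script keeps a hit for the high identity set if the value is above 0.6 and for the middle range if it lies between 0.3 and 0.6. For the low identity set it instead requires a value below 0.3 against every sequence already selected, including the query.<br />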
<br />
<br />
<br />
The script can be found in <code>/mnt/home/student/schillerl/MasterPractical/task2/create_dataset.py</code>.<br />
<br />
The datasets are located at <code>/mnt/home/student/schillerl/MasterPractical/task2/datasets/</code>.<br />
<br />
=== Calling MSA programs ===<br />
<br />
Call of T-Coffee:<br />
<br />
<pre><br />
#!/bin/bash<br />
<br />
proteins=( BCKDHA BCKDHB DBT DLD )<br />
identities=( low high whole_range )<br />
<br />
for protein in ${proteins[*]}<br />
do<br />
for identity in ${identities[*]}<br />
do<br />
t_coffee -output fasta -infile ${protein}_${identity}_seq_ident.fasta -outfile ${protein}_${identity}_seq_ident_tcoffee.fasta <br />
done<br />
done<br />
</pre><br />
<br />
Muscle:<br />
<br />
<pre><br />
#!/bin/bash<br />
<br />
proteins=( BCKDHA BCKDHB DBT DLD )<br />
identities=( low high whole_range )<br />
<br />
for protein in ${proteins[*]}<br />
do<br />
for identity in ${identities[*]}<br />
do<br />
muscle -in ${protein}_${identity}_seq_ident.fasta -out ${protein}_${identity}_seq_ident_muscle.fasta <br />
done<br />
done<br />
</pre><br />
<br />
Mafft:<br />
<br />
<pre><br />
#!/bin/bash<br />
<br />
proteins=( BCKDHA BCKDHB DBT DLD )<br />
identities=( low high whole_range )<br />
<br />
for protein in ${proteins[*]}<br />
do<br />
for identity in ${identities[*]}<br />
do<br />
mafft ${protein}_${identity}_seq_ident.fasta > ${protein}_${identity}_seq_ident_mafft.fasta <br />
done<br />
done<br />
</pre><br />
<br />
<br />
<br />
Espresso (a version of T-Coffee that uses structural information to find the right alignment) was run at the following server: [http://www.igs.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi?stage1=1&daction=EXPRESSO%283DCoffee%29::Regular Espresso].<br />
<br />
The MSAs are located at <code>/mnt/home/student/schillerl/MasterPractical/task2/MSAs/</code>.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34460Task 2 (MSUD)2013-08-09T16:15:45Z<p>Weish: /* Discussion */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
We have performed sequence search experiments for all 4 subunits of BCKDC. On this page, we mainly describe and discuss the results for the subunit BCKDHA. Results and discussions for the other 3 subunits are covered on this page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
<br />
<div style="color:silver"><br />
<div style="color:red">Old version</div><br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHblits, the <nowiki>*.hhr</nowiki> files contain the statistics and hits.<br />
</div><br />
<br />
==== Distributions of E-value and sequence identity ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
==== Intersection of hits ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
==== Evaluation through structure and function ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP folds in BLAST hits (only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
* E-value distribution: <br />
** Very few hits were found with very low E-values. These hits show high statistical significance.<br />
** Because different databases were used for BLAST/PSI-BLAST and HHblits, the HHblits hits span a larger range of E-values.<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations. Although these hits are statistically more significant, their biological significance still has to be tested. With more iterations the shift could be even larger, so the overlap between statistically significant hits and biological homologs must be evaluated, and a suitable number of iterations should be selected. <br />
<br />
* Identity distribution<br />
** Results show that BLAST depends mostly on sequence identity. Homologs with low sequence identity but high biological similarity could be missed. <br />
<br />
* Intersection of hits<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST.<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits.<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST. <br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits. A possible explanation is that the E-value cutoff has a larger influence than the number of iterations. <br />
<br />
* SCOP of hit sequences<br />
** Both BLAST and PSI-BLAST find the right fold class for BCKDHA.<br />
** PSI-BLAST finds more hits in the fold class that describes the query protein best. Most hits belong to c.36, the thiamin diphosphate-binding fold, which matches the main binding function of BCKDHA. <br />
** PSI-BLAST also finds hits in more fold classes, which may reflect biological similarities of domains and motifs between the hits and the query protein. <br />
<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in the hits of PSI-BLAST runs with different numbers of iterations are more conserved. They also have a similar frequency ranking. <br />
** PSI-BLAST finds hits with more GO terms. It may be more sensitive to functional patterns in the sequence.<br />
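The relative intersection of hits between two search methods, as discussed above, can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name <code>relative_intersection</code> is hypothetical, each input file is assumed to contain one hit identifier per line, and "relative" is taken here to mean the shared hits divided by the number of distinct hits of the first method. The plots on this page may have been computed with a different definition.

```shell
#!/bin/bash

# Hypothetical helper (not from the original analysis):
# prints the fraction of distinct hits in file A that also occur in file B.
# Both files are assumed to hold one hit identifier per line.
relative_intersection() {
    local a="$1" b="$2"
    local total shared
    # Number of distinct hits of the first method.
    total=$(sort -u "$a" | wc -l)
    # After de-duplicating each file, an identifier that appears twice
    # in the combined list must be present in both files.
    shared=$({ sort -u "$a"; sort -u "$b"; } | sort | uniq -d | wc -l)
    if [ "$total" -eq 0 ]; then
        echo "0.000"
        return
    fi
    awk -v s="$shared" -v t="$total" 'BEGIN { printf "%.3f\n", s / t }'
}
```

Because the denominator is the hit count of the first file, the measure is not symmetric: swapping the two arguments generally gives a different value.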
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee produce similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. The results of the programs differ slightly, especially at the ends of the sequences. This is due to the different scoring schemes used by the programs.<br />
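The number of conserved columns can also be counted automatically instead of being read off the Jalview images. The following is a minimal sketch under stated assumptions: the function name <code>conserved_columns</code> is hypothetical, the input is an aligned FASTA file in which all sequences have the same length, and a column counts as conserved only if every sequence has the same residue there and none has a gap. Jalview's own conservation score is defined differently.

```shell
#!/bin/bash

# Hypothetical helper (not from the original analysis):
# prints the number of fully conserved, gap-free columns of an aligned FASTA file.
conserved_columns() {
    awk '
        /^>/ { n++; next }                      # header line starts a new sequence
        {
            gsub(/[ \t\r]/, "")                 # drop whitespace and carriage returns
            seq[n] = seq[n] toupper($0)         # join wrapped sequence lines
        }
        END {
            len = length(seq[1])
            count = 0
            for (i = 1; i <= len; i++) {
                c = substr(seq[1], i, 1)
                if (c == "-") continue          # a gap column is never conserved
                same = 1
                for (j = 2; j <= n; j++)
                    if (substr(seq[j], i, 1) != c) { same = 0; break }
                count += same
            }
            print count
        }
    ' "$1"
}
```

Running it on the Mafft, Muscle and T-Coffee output for the same dataset would give one number per program that can be compared directly.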
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They do not agree on the positions of gaps and sometimes find different conserved columns. They do not cope well with low similarity, so one cannot really rely on these results. Here structural information, as used in Espresso (which is part of T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many differing gaps at the ends of the sequences, but the programs agree more on the conserved columns that they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of poor alignment quality.<br />
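Alignment length and gap content can be measured directly from the aligned FASTA files. The sketch below is an illustration only: the function name <code>msa_stats</code> is hypothetical, and it assumes that all sequences in a file have the same aligned length and that gaps are written as <code>-</code>.

```shell
#!/bin/bash

# Hypothetical helper (not from the original analysis):
# prints "<alignment length> <gap fraction>" for an aligned FASTA file.
msa_stats() {
    awk '
        /^>/ { nseq++; next }                   # one header line per sequence
        {
            gsub(/[ \t\r]/, "")                 # drop whitespace and carriage returns
            total += length($0)                 # all alignment characters
            gaps  += gsub(/-/, "-")             # gsub returns the number of gap characters
        }
        END {
            if (nseq == 0 || total == 0) { print "0 0.000"; exit }
            printf "%d %.3f\n", total / nseq, gaps / total
        }
    ' "$1"
}
```

A shorter alignment with a lower gap fraction for the same input sequences means that more residues were placed in shared columns, which is the property described above for Muscle.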
<br />
Altogether, there are regions with many conserved columns and regions with many gaps. The conserved blocks and columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where the protein structure has loops, where insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
<br />
As a criterion for a good alignment, one could run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, it might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results while still being very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34456Task 2 (MSUD)2013-08-09T15:57:44Z<p>Weish: /* Discussion */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
We performed sequence searches for all four subunits of BCKDC. This page mainly describes and discusses the results for the subunit BCKDHA. Results and discussion for the other three subunits are covered on a separate page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
<br />
<div style="color:silver"><br />
<div style="color:red">Old version</div><br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHblits, the <nowiki>*.hhr</nowiki> files contain the statistics and hits.<br />
</div><br />
<br />
==== Distributions of E-value and sequence identity ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
==== Intersection of hits ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
==== Evaluation through structure and function ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP folds in BLAST hits (only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
*E-value distribution: <br />
** Very few hits were found with very low E-values -> hits with high statistical significance<br />
** Because different databases were used for BLAST/PSI-BLAST and HHblits, HHblits found hits with a larger range of E-values -> higher density for HHblits at high E-values<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations -> better search result?<br />
* Identity distribution<br />
** Results show that BLAST depends mostly on sequence identity -> possible loss of patterns with low sequence identity but high biological similarity<br />
* Intersection of hits<br />
** HHblits was not comparable to the other methods due to a different sequence database<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST<br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits -> E-value cutoff may have a larger influence after more iterations<br />
* SCOP of hit sequences<br />
** PDB sequence required -> no evaluation for HHblits<br />
** Both BLAST and PSI-BLAST find the right fold class for the query protein<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. DLD protein, c.3 is FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in the hits of PSI-BLAST runs with different numbers of iterations are more conserved. They also have a similar frequency ranking. <br />
** PSI-BLAST finds hits with more GO terms -> It may be more sensitive to functional patterns in the sequence<br />
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee produce similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. The results of the programs differ slightly, especially at the ends of the sequences. This is due to the different scoring schemes used by the programs.<br />
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They do not agree on the positions of gaps and sometimes find different conserved columns. They do not cope well with low similarity, so one cannot really rely on these results. Here structural information, as used in Espresso (which is part of T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many differing gaps at the ends of the sequences, but the programs agree more on the conserved columns that they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of poor alignment quality.<br />
<br />
Altogether, there are regions with many conserved columns and regions with many gaps. The conserved blocks and columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where the protein structure has loops, where insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
<br />
As a criterion for a good alignment, one could run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, it might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results while still being very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MUSD)_Additional_Results&diff=34455Task 2 (MUSD) Additional Results2013-08-09T15:56:05Z<p>Weish: /* Discussion */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHblits, the <nowiki>*.hhr</nowiki> files contain the statistics and hits.<br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 0.002). This is the only test run that has PDB hits with a SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
*E-value distribution: <br />
** Very few hits were found with very low E-values. These hits have very high statistical significance.<br />
** Because BLAST/PSI-BLAST and HHBlits were run against different databases, HHBlits found hits over a larger range of E-values and found hits with high E-values more frequently.<br />
** For protein BCKDHB, PSI-BLAST tends to find more hits with intermediate E-values (1e-106 to 1e-25).<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations. Had we chosen more iterations for PSI-BLAST, the shift of the result space might have been even larger. This may cause PSI-BLAST to find more hits that are overfitted to the statistical model but fewer hits with high biological significance, so there is a trade-off between the number of iterations and result quality. <br />
<br />
* Identity distribution<br />
** The results show that BLAST depends mostly on sequence identity, because most of its hits lie at higher sequence identity. Hits with low sequence identity but high biological similarity may be lost in this case. <br />
<br />
* Intersection of hits<br />
** HHBlits was not comparable to the other methods because it used a different sequence database. We were not able to extract the sequence IDs from the HHBlits results, which are needed to compute the intersection of results. <br />
** PSI-BLAST with 2 iterations has a larger intersection with the BLAST result set.<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits.<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST. This might be explained by the shift of the result space when PSI-BLAST is run with a higher number of iterations. <br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits. This is possibly an effect of the choice of E-value cutoff. <br />
<br />
* SCOP of hit sequences<br />
** SCOP classification requires PDB sequences, so HHBlits could not be evaluated.<br />
** Both BLAST and PSI-BLAST find the right fold class for the query proteins.<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. DLD protein, c.3 is FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein.<br />
** PSI-BLAST seems to find more hits with biological significance.<br />
<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in the hits of PSI-BLAST runs with different numbers of iterations are more conserved. They also have a similar frequency ranking. <br />
** PSI-BLAST finds hits with more GO terms. It may be more sensitive to functional patterns in the sequence. Another explanation could be that, due to the shift of the result space, more irrelevant hits are included, which bring additional, different GO terms.<br />
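The intersection comparison discussed above can be sketched in a few lines. This is only an illustration: it assumes that "relative intersection" means the fraction of one method's hits that also appear in another method's hit list, and the hit identifiers and the function name <code>relative_intersection</code> are hypothetical, not taken from our scripts.

```python
def relative_intersection(reference_hits, other_hits):
    """Fraction of the reference hits that also occur in other_hits.

    Assumption: this is one plausible definition of "relative
    intersection"; the plots on this page may have used another.
    """
    reference = set(reference_hits)
    if not reference:
        return 0.0
    return len(reference & set(other_hits)) / len(reference)


# Hypothetical hit identifiers, for illustration only.
blast = ["P1", "P2", "P3", "P4"]
psiblast_2_iter = ["P1", "P2", "P3", "P9"]
psiblast_10_iter = ["P1", "P7", "P8", "P9"]

print(relative_intersection(blast, psiblast_2_iter))
print(relative_intersection(blast, psiblast_10_iter))
```

Note that this measure is not symmetric: swapping the two arguments changes the denominator, which is why each method gets its own plot as the reference.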
<br />
== Multiple sequence alignment ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
=== Discussion ===</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MUSD)_Additional_Results&diff=34446Task 2 (MUSD) Additional Results2013-08-09T15:34:58Z<p>Weish: /* Multiple sequence alignment */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 0.002). This is the only test run that has PDB hits with a SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
<br />
== Multiple sequence alignment ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
=== Discussion ===</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MUSD)_Additional_Results&diff=34443Task 2 (MUSD) Additional Results2013-08-09T15:33:03Z<p>Weish: </p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 0.002). This is the only test run that has PDB hits with a SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
<br />
== Multiple sequence alignment ==<br />
<br />
=== Results ===<br />
<br />
=== Discussion ===</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34381Task 2 (MSUD)2013-08-08T23:45:57Z<p>Weish: /* Results */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
We have performed sequence search experiments for all 4 subunits of BCKDC. On this page, we mainly describe and discuss the results for the subunit BCKDHA. Results and discussions for the other 3 subunits are covered on this page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
<br />
<div style="color:silver"><br />
<div style="color:red">Old version</div><br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
</div><br />
<br />
==== Distributions of E-value and sequence identity ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
==== Intersection of hits ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
==== Evaluation through structure and function ====<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP fold in BLAST hits (only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
*E-value distribution: <br />
** Very few hits were found with very low E-values -> hits with high statistical significance<br />
** Because different databases were used for BLAST/PSI-BLAST and HHBlits, HHBlits found hits with a larger range of E-values -> higher density for HHBlits at high E-values<br />
** For proteins BCKDHA and BCKDHB, PSI-BLAST tends to find more hits with intermediate E-values (1e-106 to 1e-25)<br />
*** E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations -> better search result?<br />
* Identity distribution<br />
** Results show that BLAST depends mostly on sequence identity -> possible loss of patterns with low sequence identity but high biological similarity<br />
* Intersection of hits<br />
** HHBlits was not comparable to the other methods due to a different sequence database<br />
** PSI-BLAST with 2 iterations has a bigger intersection with BLAST<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST<br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits -> E-value cutoff may have higher influence after more iterations<br />
* SCOP of hit sequences<br />
** PDB sequence required -> no evaluation for HHBlits<br />
** Both BLAST and PSI-BLAST find the right fold class for the query protein<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. DLD protein, c.3 is FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein<br />
* Gene Ontology of hit proteins<br />
** Top-5 GO terms in hits of PSI-BLAST with different iterations are more conserved. They also have similar ranking of frequency. <br />
** PSI-BLAST finds hits with more GO terms -> it may be more sensitive to functional patterns in the sequence<br />
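To make the E-value comparison above concrete, the following is a minimal sketch of how E-values can be binned on a log10 scale before plotting. It is not the script used for the figures: the function name <code>log10_evalue_histogram</code>, the bin width and the <code>min_evalue</code> floor (needed because search tools may report an E-value of exactly 0) are assumptions made for illustration.

```python
import math
from collections import Counter


def log10_evalue_histogram(evalues, bin_width=20, min_evalue=1e-185):
    """Count E-values per bin of log10(E-value).

    Each key is the lower edge of a bin of `bin_width` orders of magnitude.
    E-values below `min_evalue` (including 0) are clamped to `min_evalue`
    so that the logarithm is defined.
    """
    counts = Counter()
    for evalue in evalues:
        exponent = math.log10(max(evalue, min_evalue))
        bin_start = math.floor(exponent / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical E-values, for illustration only.
example = [0.0, 1e-106, 1e-50, 1e-25, 0.002, 5.0]
for bin_start, count in log10_evalue_histogram(example).items():
    print(f"[1e{bin_start}, 1e{bin_start + 20}): {count}")
```

Binning on the exponent rather than on the raw value keeps the very small E-values of significant hits from collapsing into a single bar next to zero.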
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee produce similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. The results of the programs differ a little, especially at the ends of the sequences. This is due to the different scoring schemes that are used in the programs.<br />
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They do not agree on the positions of gaps and sometimes also find different conserved columns. They do not cope well with low similarity, so one cannot really rely on these results. Here structural information, as used in Espresso (which belongs to T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many and differing gaps at the ends of the sequences, but the programs agree more on the conserved columns that they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the alignment of Muscle is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of poor alignment quality.<br />
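The quantities compared in this discussion (alignment length, number of conserved columns, amount of gaps) can be computed directly from the aligned sequences. The sketch below is an illustration under stated assumptions: a column is counted as conserved only if it is gap-free and all residues are identical, which is stricter than the conservation colouring in Jalview, and the function name <code>alignment_stats</code> and the toy sequences are hypothetical.

```python
def alignment_stats(rows, gap="-"):
    """Return length, conserved-column count and gap fraction of an MSA.

    `rows` is a list of aligned sequences of equal length. A column counts
    as conserved only if it has no gap and a single residue type.
    """
    if not rows or not rows[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    conserved = sum(
        1
        for column in zip(*rows)
        if gap not in column and len(set(column)) == 1
    )
    gap_count = sum(row.count(gap) for row in rows)
    return {
        "length": length,
        "conserved_columns": conserved,
        "gap_fraction": gap_count / (length * len(rows)),
    }


# Toy alignment, for illustration only.
toy_alignment = ["MKV-LA", "MKVQLA", "MRV-LG"]
stats = alignment_stats(toy_alignment)
print(stats)
```

Running the same function on the output of each program for the same dataset would give comparable numbers, although a higher count of conserved columns is only a heuristic for alignment quality.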
<br />
Altogether, there are regions with many conserved columns and regions with many gaps. The conserved blocks and columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where there are loops in the structure of the protein, so that insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
<br />
As a criterion for a good alignment, one could run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, its alignment might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results but is still very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34380Task 2 (MSUD)2013-08-08T23:42:49Z<p>Weish: /* Sequence searches */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
We performed sequence search experiments for all four subunits of BCKDC. On this page, we mainly describe and discuss the results for the subunit BCKDHA. Results for the other three subunits are covered on a separate page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
<br />
<div style="color:silver"><br />
<div style="color:red">Old version</div><br />
The query sequences for the four subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
</div><br />
<br />
==== BCKDHA ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP folds in BLAST hits (only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
* E-value distribution<br />
** Only very few hits have very low E-values, i.e. very high statistical significance<br />
** Because different databases were used for BLAST/PSI-BLAST and HHBlits, HHBlits found hits over a larger range of E-values -> higher density for HHBlits at high E-values<br />
** For the proteins BCKDHA and BCKDHB, PSI-BLAST tends to find more hits with intermediate E-values (1e-106 to 1e-25)<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations -> better search result?<br />
* Identity distribution<br />
** The results show that BLAST depends mostly on sequence identity -> possible loss of patterns with low sequence identity but high biological similarity<br />
* Intersection of hits<br />
** HHBlits is not comparable to the other methods because it used a different sequence database<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST<br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits -> the E-value cutoff may have a stronger influence after more iterations<br />
* SCOP of hit sequences<br />
** PDB sequences are required -> no evaluation for HHBlits<br />
** Both BLAST and PSI-BLAST find the right fold class for the query protein<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. for the DLD protein, c.3 is the FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in the hits of PSI-BLAST runs with different numbers of iterations are more conserved. They also have a similar frequency ranking.<br />
** PSI-BLAST finds hits with more GO terms -> it may be more sensitive to functional patterns in the sequence<br />
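<br />
The "relative intersection" used in the plots and in the discussion above can be illustrated with a small sketch. It assumes that the intersection is normalised by the size of the reference hit set; the page does not state the normalisation, so this is only one plausible reading. The function name and the hit identifiers are hypothetical.<br />

```python
def relative_intersection(reference_hits, other_hits):
    """Fraction of the reference hits that the other method also found.

    Assumption: normalisation by the size of the reference set.
    """
    reference = set(reference_hits)
    if not reference:
        # No reference hits: define the overlap as 0 instead of dividing by zero.
        return 0.0
    return len(reference & set(other_hits)) / len(reference)


# Hypothetical hit identifiers, for illustration only.
blast_hits = ["hit_A", "hit_B", "hit_C", "hit_D"]
psiblast_hits = ["hit_A", "hit_B", "hit_C", "hit_X", "hit_Y"]

print(relative_intersection(blast_hits, psiblast_hits))  # 0.75
```

With this normalisation the measure is not symmetric: swapping the two methods changes the denominator, which would be one reason to draw a separate plot per reference method.<br />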
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee produce similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. Especially at the ends of the sequences, the results of the programs differ a little. This is probably due to the different scoring schemes used in the programs.<br />
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They disagree on the positions of gaps and sometimes find different conserved columns. They do not cope well with low similarity, so these results cannot really be relied upon. Here structural information, as used in Espresso (which belongs to T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For the datasets covering the whole range of sequence identity, the results resemble the low-identity case in that there are many, differently placed gaps at the ends of the sequences, but the programs agree more on the conserved columns they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of poor alignment quality.<br />
<br />
Altogether, the alignments contain regions with many conserved columns and regions with many gaps. The conserved blocks and columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where the protein structure has loops, so that insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
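<br />
The quantities compared in this discussion (alignment length, number of conserved columns, amount of gaps) can be computed with a few lines of code. The following is a minimal sketch, not the procedure used for this task: it assumes the alignment is given as a list of equally long strings with "-" as gap character, and it counts a column as conserved only if it is gap-free and all residues are identical. The function name and the toy alignment are hypothetical.<br />

```python
def column_stats(alignment, gap_char="-"):
    """Return length, number of fully conserved columns and gap fraction."""
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = 0
    gaps = 0
    for column in zip(*alignment):  # iterate over alignment columns
        gaps += column.count(gap_char)
        # Conserved here means: no gap and a single residue type in the column.
        if gap_char not in column and len(set(column)) == 1:
            conserved += 1

    return {
        "length": length,
        "conserved_columns": conserved,
        "gap_fraction": gaps / (length * len(alignment)),
    }


# Toy alignment, for illustration only.
toy_alignment = ["MKV-LA", "MKVALA", "MRV-LA"]
stats = column_stats(toy_alignment)
print(stats["length"], stats["conserved_columns"])  # 6 4
```

Running the same function on the output of each program for the same dataset gives numbers that can be compared directly, e.g. a longer alignment with a higher gap fraction and fewer conserved columns.<br />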
<br />
One way to judge alignment quality is to run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, it might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results but is still very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MUSD)_Additional_Results&diff=34379Task 2 (MUSD) Additional Results2013-08-08T23:40:10Z<p>Weish: Created page with "=== Results === The query sequences for the 4 subunits of BCKDC locate at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>. Results for sequence search loc…"</p>
<hr />
<div>=== Results ===<br />
The query sequences for the four subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP folds in hits of PSI-BLAST (iter. 10, E-value 0.002). This is the only test run that has PDB hits with a SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34378Task 2 (MSUD)2013-08-08T23:36:14Z<p>Weish: /* Results */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
<div style="border:1px dash;"><br />
<div style="color:red">New version</div><br />
We performed sequence search experiments for all four subunits of BCKDC. On this page, we mainly describe and discuss the results for the subunit BCKDHA. Results for the other three subunits are covered on a separate page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
</div><br />
<br />
<div style="color:silver"><br />
<div style="color:red">Old version</div><br />
The query sequences for the four subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
</div><br />
<br />
==== BCKDHA ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP folds in BLAST hits (only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP folds in hits of PSI-BLAST (iter. 10, E-value 0.002). This is the only test run that has PDB hits with a SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
*E-value distribution: <br />
** Only a few hits have very low E-values -> these hits are highly statistically significant<br />
** Because different databases were used for BLAST/PSI-BLAST and HHBlits, HHBlits found hits over a larger range of E-values -> higher density for HHBlits at high E-values<br />
** For the proteins BCKDHA and BCKDHB, PSI-BLAST tends to find more hits with intermediate E-values (1e-106 to 1e-25)<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations -> better search result?<br />
* Identity distribution<br />
** Results show that BLAST depends mostly on sequence identity -> possible loss of patterns with low sequence identity but high biological similarity<br />
* Intersection of hits<br />
** HHBlits is not comparable to the other methods because it uses a different sequence database<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST<br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits -> the E-value cutoff may have more influence after more iterations<br />
* SCOP of hit sequences<br />
** PDB sequence required -> no evaluation for HHBlits<br />
** Both BLAST and PSI-BLAST find the right fold class for the query protein<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. for the DLD protein, c.3 is the FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in hits of PSI-BLAST with different numbers of iterations are more conserved. They also have a similar frequency ranking.<br />
** PSI-BLAST finds hits with more GO terms -> it may be more sensitive to functional patterns in the sequence<br />
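<br />
The intersection comparison discussed above can be sketched as follows. This is only a minimal illustration: it assumes that "relative intersection" means the fraction of one method's hits that another method also found, and the hit identifiers are hypothetical placeholders, not results from our searches.<br />

```python
def relative_intersection(reference_hits, other_hits):
    """Fraction of the reference method's hits that the other method also found.

    Assumed definition: |reference AND other| / |reference|.
    """
    reference = set(reference_hits)
    if not reference:
        return 0.0
    return len(reference & set(other_hits)) / len(reference)


# Hypothetical hit identifiers, for illustration only.
blast_hits = {"P1", "P2", "P3", "P4"}
psiblast_hits = {"P2", "P3", "P4", "P5", "P6"}

# The measure is asymmetric: it depends on which method is the reference.
print(relative_intersection(blast_hits, psiblast_hits))
print(relative_intersection(psiblast_hits, blast_hits))
```

Because the measure is normalised by the size of the reference set, a method that returns many more hits can cover another method almost completely while the reverse value stays small.<br />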
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee come to similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. The results of the programs differ slightly, especially at the ends of the sequences. This is due to the different scoring schemes that the programs use.<br />
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They do not agree on the position of gaps and sometimes also find different conserved columns. They do not cope well with low similarity, so one cannot really rely on these results. Here structural information, as used in Espresso (which belongs to T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many and differing gaps at the ends of the sequences, but the programs agree more on the conserved columns that they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of bad alignment quality.<br />
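<br />
The relation between alignment length and gap content can be checked with a small script. The following is a minimal sketch, not the procedure used for this task: it assumes the alignment is available as equally long strings with "-" as the gap character, and the two toy alignments are invented for illustration.<br />

```python
def gap_statistics(alignment):
    """Return (alignment length, overall gap fraction) for aligned rows."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    total_cells = length * len(alignment)
    gap_cells = sum(row.count("-") for row in alignment)
    return length, (gap_cells / total_cells if total_cells else 0.0)


# Two invented alignments of the same three toy sequences.
compact = ["MKV-LA", "MKVALA", "MRV-LA"]
spread = ["MKV--L-A", "MKVA-L-A", "M-RV-L-A"]

for name, rows in (("compact", compact), ("spread", spread)):
    length, gap_fraction = gap_statistics(rows)
    print(name, length, round(gap_fraction, 3))
```

For the same input sequences, the longer alignment necessarily has the higher gap fraction, which is why alignment length can serve as a rough quality indicator.<br />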
<br />
Altogether, there appear regions with many conserved columns and those with many gaps. The conserved blocks or columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where there are loops in the structure of the protein, so that insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
<br />
As a criterion for a good alignment, one could run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, its alignment might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results and is still very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_2_(MSUD)&diff=34377Task 2 (MSUD)2013-08-08T23:35:20Z<p>Weish: /* Results */</p>
<hr />
<div>== Sequence searches ==<br />
<br />
[[Task 2 lab journal (MSUD)#Sequence searches|Lab journal]]<br />
<br />
=== Results ===<br />
<div style="border:1px dash;"><br />
<span style="color:red">New version</span><br />
We have performed sequence search experiments for all 4 subunits of BCKDC. On this page, we mainly describe and discuss the results for the subunit BCKDHA. Results and discussions for the other 3 subunits are covered on this page: [[Task 2 (MUSD) Additional Results|Additional Results]].<br />
</div><br />
<br />
<div style="color:silver"><br />
The query sequences for the 4 subunits of BCKDC are located at <nowiki>/mnt/home/student/weish/master-practical-2013/task01/</nowiki>.<br />
<br />
The results of the sequence searches are located in the directory <nowiki>/mnt/home/student/weish/master-practical-2013/task02/01-seq-search/results</nowiki>.<br />
For BLAST and PSI-BLAST, statistics (such as E-value, probability and identity) are stored in <nowiki>*.tsv</nowiki> files. Detailed results are stored in XML files. For HHBlits, the <nowiki>*.hhr</nowiki> files contain information about statistics and hits.<br />
</div><br />
<br />
==== BCKDHA ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHA.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
File:Identity distribution BCKDHA.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHA)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHA.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHA.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHA.png|Distribution of SCOP fold in BLAST hits(only one classified PDB structure was found in SCOP)<br />
File:SCOP histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHA.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHA.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== BCKDHB ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution BCKDHB.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
File:Identity distribution BCKDHB.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of BCKDHB)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast BCKDHB.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits BCKDHB.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast BCKDHB.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast BCKDHB.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) BCKDHB.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DBT ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DBT.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
File:Identity distribution DBT.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DBT)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DBT.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DBT.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DBT.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram psiblast(iter. 10, e-val. 0.002) DBT.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 0.002). This is the only test run that has PDB hits with SCOP classification.<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DBT.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DBT.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
==== DLD ====<br />
<gallery widths=500px heights=411px caption="E-value and identity distribution for different sequence search methods"><br />
File:E-value-distribution DLD.png|E-value distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
File:Identity distribution DLD.png|Identity distribution of sequence search methods. (Query sequence is RefSeq of DLD)<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Intersection of hits between different sequence search methods"><br />
File:Intersection to blast DLD.pdf.png|Relative intersection of hits between BLAST and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 2, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 2, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 0.002) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 0.002) and other sequence search methods.<br />
File:Intersection to psiblast(iter. 10, e-val. 10e-10) DLD.pdf.png|Relative intersection between PSI-BLAST(iter. 10, E-value 10e-10) and other sequence search methods.<br />
File:Intersection to hhblits DLD.pdf.png|Relative intersection between HHBlits and other sequence search methods.<br />
</gallery><br />
<br />
<gallery widths=500px heights=400px perrow=2 caption="Distribution of SCOP folds"><br />
File:SCOP histogram blast DLD.png|Distribution of SCOP fold in BLAST hits<br />
File:SCOP histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Distribution of SCOP fold in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
<gallery widths=335px heights=267px perrow=3 caption="Top-5 common GO terms in hits with GO annotation"><br />
File:GO histogram blast DLD.png|Top-5 common GO terms in BLAST hits<br />
File:GO histogram psiblast(iter. 2, e-val. 0.002) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 2, E-value 0.002)<br />
File:GO histogram psiblast(iter. 10, e-val. 10e-10) DLD.png|Top-5 common GO terms in hits of PSI-BLAST(iter. 10, E-value 10e-10)<br />
</gallery><br />
<br />
=== Discussion ===<br />
*E-value distribution: <br />
** Only a few hits have very low E-values -> these hits are highly statistically significant<br />
** Because different databases were used for BLAST/PSI-BLAST and HHBlits, HHBlits found hits over a larger range of E-values -> higher density for HHBlits at high E-values<br />
** For the proteins BCKDHA and BCKDHB, PSI-BLAST tends to find more hits with intermediate E-values (1e-106 to 1e-25)<br />
*** The E-value distribution of PSI-BLAST shifts towards lower E-values with more iterations -> better search result?<br />
* Identity distribution<br />
** Results show that BLAST depends mostly on sequence identity -> possible loss of patterns with low sequence identity but high biological similarity<br />
* Intersection of hits<br />
** HHBlits is not comparable to the other methods because it uses a different sequence database<br />
** PSI-BLAST with 2 iterations has a larger intersection with BLAST<br />
** The two PSI-BLAST runs with 2 iterations and different E-value cutoffs have very similar sets of hits<br />
** PSI-BLAST with 10 iterations has a smaller intersection with BLAST<br />
** The two PSI-BLAST runs with 10 iterations and different E-value cutoffs share the fewest common hits -> the E-value cutoff may have more influence after more iterations<br />
* SCOP of hit sequences<br />
** PDB sequence required -> no evaluation for HHBlits<br />
** Both BLAST and PSI-BLAST find the right fold class for the query protein<br />
** PSI-BLAST generally finds more hits in the fold class that describes the query protein best (e.g. for the DLD protein, c.3 is the FAD/NAD(P)-binding domain)<br />
** PSI-BLAST also finds hits in more fold classes, which may describe domains of the query protein<br />
* Gene Ontology of hit proteins<br />
** The top-5 GO terms in hits of PSI-BLAST with different numbers of iterations are more conserved. They also have a similar frequency ranking.<br />
** PSI-BLAST finds hits with more GO terms -> it may be more sensitive to functional patterns in the sequence<br />
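<br />
The top-5 GO term ranking mentioned above could be computed along the following lines. This is a minimal sketch under stated assumptions: the mapping from hits to GO terms is assumed to be already available as a dictionary, each term is counted once per hit, and the hit and term names are placeholders rather than real annotations.<br />

```python
from collections import Counter


def top_go_terms(hit_annotations, n=5):
    """Return the n most frequent GO terms as (term, number of hits) pairs.

    Ties are broken alphabetically so that the ranking is reproducible.
    """
    counts = Counter()
    for terms in hit_annotations.values():
        counts.update(set(terms))  # count each term at most once per hit
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Placeholder annotations, for illustration only.
annotations = {
    "hit1": ["term_a", "term_b"],
    "hit2": ["term_a"],
    "hit3": ["term_a", "term_b", "term_c"],
}

print(top_go_terms(annotations, n=2))
```

Comparing the ranked lists of two search runs then shows directly whether the most frequent terms and their order are conserved.<br />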
<br />
== Multiple sequence alignments ==<br />
<br />
[[Task 2 lab journal (MSUD)#Multiple sequence alignments|Lab journal]]<br />
<br />
=== Results ===<br />
<br />
In the following sections the MSAs, visualised with [http://www.jalview.org/ Jalview], are shown.<br />
<br />
==== BCKDHA ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHA_whole_range_seq_ident_espresso.png]]<br />
<br />
==== BCKDHB ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_low_seq_ident_espresso.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_high_seq_ident_espresso.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_tcoffee.png]]<br />
<br />
Espresso:<br />
[[Image:MSUD_BCKDHB_whole_range_seq_ident_espresso.png]]<br />
<br />
==== DBT ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_low_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_low_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_low_seq_ident_tcoffee.png]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DBT_whole_range_seq_ident_tcoffee.png]]<br />
<br />
==== DLD ====<br />
<br />
===== Low sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_low_seq_ident_mafft.png|18716px]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_low_seq_ident_muscle.png|18455px]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_low_seq_ident_tcoffee.png|18644px]]<br />
<br />
===== High sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_high_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_high_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_high_seq_ident_tcoffee.png]]<br />
<br />
===== Whole range sequence identity =====<br />
<br />
Mafft:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_mafft.png]]<br />
<br />
Muscle:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_muscle.png]]<br />
<br />
T-Coffee:<br />
[[Image:MSUD_DLD_whole_range_seq_ident_tcoffee.png]]<br />
<br />
=== Discussion ===<br />
<br />
For the datasets with high sequence identity, the three MSA programs Mafft, Muscle and T-Coffee come to similar results and find almost the same conserved blocks. Sometimes T-Coffee arranges gaps differently from the others and therefore does not find as many conserved columns. The results of the programs differ slightly, especially at the ends of the sequences. This is due to the different scoring schemes that the programs use.<br />
<br />
For low sequence identity, the programs have difficulty finding the right alignment. They do not agree on the position of gaps and sometimes also find different conserved columns. They do not cope well with low similarity, so one cannot really rely on these results. Here structural information, as used in Espresso (which belongs to T-Coffee), can help to find the right alignment: Espresso can align more residues than T-Coffee.<br />
<br />
For whole range sequence identity, the results are similar with respect to the many and differing gaps at the ends of the sequences, but the programs agree more on the conserved columns that they find.<br />
<br />
The results of Muscle and Mafft seem more similar to each other than to those of T-Coffee. T-Coffee often treats the ends of the sequences, which have low sequence identity, differently from the others. It is striking that the Muscle alignment is almost always the shortest, especially in cases with low sequence identity. A very long alignment contains many gaps and fewer aligned residues, which might be a sign of bad alignment quality.<br />
<br />
Altogether, there appear regions with many conserved columns and those with many gaps. The conserved blocks or columns correspond to secondary structure elements and functionally important residues, respectively. Gaps in the alignment appear in regions where there are loops in the structure of the protein, so that insertions or deletions that occur during evolution do not alter the overall structure or function of the protein.<br />
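<br />
Counting conserved columns, as a way to compare the output of different programs, can be sketched as follows. This is a simplified illustration: a column is treated as conserved only if every sequence has the identical residue and no gap, which is stricter than the property-based conservation score that Jalview displays. The toy alignment is invented.<br />

```python
def conserved_columns(alignment):
    """Indices of columns in which all sequences share one residue and no gap."""
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


# Invented toy alignment, for illustration only.
toy_alignment = ["MKV-LA", "MKVALA", "MRV-LA"]

columns = conserved_columns(toy_alignment)
print(columns, len(columns))
```

Running the same function on the alignments produced by different programs gives one number per program that can be compared directly.<br />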
<br />
As a criterion for a good alignment, one could run different alignment algorithms, as in this task, and compare the results. If one of them finds more conserved columns, its alignment might be better than the others. Which program performs best can depend on the dataset, so it is always a good idea to try more than one algorithm and pick the best result. Mafft is often a good choice because it generates relatively precise results and is still very fast.</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_7_(MSUD)&diff=33663Task 7 (MSUD)2013-07-30T09:38:29Z<p>Weish: /* Mutation map */</p>
<hr />
<div>== HGMD ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Only disease-causing mutations are included. HGMD lists the following mutation types:<br />
* missense/nonsense <br />
* splicing <br />
* regulatory<br />
* small deletions <br />
* small insertions <br />
* small indels <br />
* gross deletions <br />
* gross insertions/duplications<br />
* complex rearrangements <br />
* repeat variations<br />
For each mutation entry the following information is given in the public version:<br />
* accession number<br />
* codon change<br />
* amino acid change<br />
* codon number<br />
* phenotype<br />
* reference <br />
* comments<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Release 2013.1 is the current professional version. Entries are made publicly accessible three years after they are included. Mutations that are taken from publicly available locus-specific mutation databases are immediately added to the public version.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information is extracted from articles that describe genetic diseases. So only published mutations are included.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
The following mutations are listed for BCKDHA (20 June 2013):<br />
<br />
{| class="wikitable" border="1" style="width:300px"<br />
! mutation type !! number of mutations<br />
|-<br />
|missense/nonsense ||40 <br />
|-<br />
|splicing ||2<br />
|-<br />
|small deletions ||4<br />
|-<br />
|small insertions ||1<br />
|-<br />
|small indels ||1<br />
|-<br />
|gross deletions ||2<br />
|-<br />
|complex rearrangements ||1<br />
|}<br />
<br />
<br />
All reported mutations are associated with MSUD. Among the 40 mutations in the category "missense/nonsense", 37 are missense mutations and 3 are nonsense mutations.<br />
<br />
Definition of the mutation types:<br />
* missense: single base substitution that leads to amino acid change<br />
* nonsense: single base substitution that leads to a stop codon<br />
* splicing: mutation that affects a splice site<br />
* small deletion: deletion of a few base pairs<br />
* small insertion: insertion of a few base pairs<br />
* small indel: insertion / deletion of a few base pairs<br />
* gross deletion: deletion of many base pairs<br />
* complex rearrangement: insertion / deletion of many base pairs<br />
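<br />
The distinction between missense and nonsense substitutions in the list above can be made explicit with a small classifier. This is a minimal sketch that assumes the standard genetic code and codons written as DNA; the function name and the extra categories "synonymous" and "stop-lost" are illustrative additions, not HGMD terminology.<br />

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-base substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ in exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # substitution creates a stop codon
    if ref_aa == "*":
        return "stop-lost"  # substitution removes a stop codon
    return "missense"       # substitution changes the amino acid


print(classify_substitution("TAT", "TAA"))  # Tyr to stop
print(classify_substitution("GAA", "GTA"))  # Glu to Val
print(classify_substitution("CTG", "CTA"))  # Leu to Leu
```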
<br />
== dbSNP ==<br />
<br />
[[File:Hist.png|thumb|320px|Histogram of different types of SNPs reported in dbSNP.]]<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Short variations in '''nucleotide''' sequence from many different organisms. It contains the following information:<br />
* mutations of different categories:<br />
** single nucleotide variations<br />
** indels<br />
** short tandem repeats<br />
** microsatellites<br />
* additional information for rare variations<br />
** disease relationship<br />
** genotype information<br />
** allele origin<br />
** somatic or germline events<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The current version of dbSNP is build 137. The dbSNP web query, FTP data and Entrez indexing were released on Jun 26, 2012. A BLAST database for this build has not been released yet; the newest BLAST database was released on Nov 14, 2011 and is based on build 135.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': dbSNP was created in cooperation between the National Human Genome Research Institute and the National Center for Biotechnology Information. It is integrated with the NCBI genomic data. There are two sorts of content in dbSNP: submitted and computed data. During a build cycle, submitted SNPs (identified by ss#) that map to the same genomic position are clustered into a non-redundant set of reference SNPs (refSNPs), each of which gets a unique rs# identifier.<br />
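<br />
The clustering of submitted SNPs by genomic position can be pictured with the following sketch. It only illustrates the grouping idea and is not how dbSNP is implemented: the record layout, the example submissions and the cluster labels are all hypothetical, and real rs# identifiers are not assigned in positional order.<br />

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNP ids by the (chromosome, position) they map to."""
    by_position = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        by_position[(chromosome, position)].append(ss_id)
    clusters = {}
    # Illustrative labels only; they stand in for the unique cluster identifier.
    for number, key in enumerate(sorted(by_position), start=1):
        clusters[f"cluster_{number}"] = sorted(by_position[key])
    return clusters


# Hypothetical submissions: (submission id, chromosome, position).
submissions = [
    ("ss1", "19", 100),
    ("ss2", "19", 100),
    ("ss3", "19", 250),
]

print(cluster_submissions(submissions))
```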
<br />
=== Mutations of BCKDHA ===<br />
<br />
In total, 292 SNPs in the coding region of BCKDHA were found in dbSNP. 4 mutations are nonsense (stop-gained) mutations, which introduce a stop codon in the coding region. 152 mutations are missense, of which 28 can cause disease. 136 mutations are synonymous.<br />
<br />
== SNPdbe ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': Experimentally annotated effects of non-synonymous SNPs (nsSNP). Computationally annotated structural and functional effects of nsSNP. Association between nsSNP and diseases.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': The most recent update took place on Mar 05, 2012. <br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Experimentally annotated nsSNPs from dbSNP; variants from UniProt and PMD; genomic data from the 1000 Genomes collection; predicted impacts on protein structure and function are computed with SNAP and SIFT.<br />
<br />
=== Mutations of BCKDHA ===<br />
102 SNPs were reported in SNPdbe for BCKDHA. Of these, 8 were reported to be associated with MSUD.<br />
<br />
== OMIM ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The allelic variants section of a gene entry lists mutations (e.g. substitutions or deletions) together with the phenotype they cause. Only selected mutations are listed (see [http://omim.org/help/faq#1.4 OMIM FAQ]), most of which are disease associated.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': OMIM is updated daily. The entry for BCKDHA was last updated 05/23/2012.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': The information comes from published articles. For each mutation the reference article is given in the text of the allelic variants section.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
For BCKDHA, there are 7 missense mutations listed and 2 deletions, where one is a 1-bp (base pair) deletion and the other 8-bp (last update of entry: 05/23/2012). All these mutations are associated with MSUD type IA (classic or intermediate form).<br />
<br />
== SNPedia ==<br />
<br />
=== Database facts ===<br />
<span style="color:blue">'''Q'''</span>: What information is given?<br />
<br />
'''A''': The wiki-style project 'SNPedia' is open to the internet community. It contains information about the effects of SNPs. Annotations from a wide range of internet resources, such as the dbSNP project, Ensembl or even Google search, are included in SNPedia. It tries to gather all SNP-related information on one web site.<br />
<br />
<span style="color:blue">'''Q'''</span>: How recent is the release?<br />
<br />
'''A''': Because SNPedia is maintained by its user community, updates can occur at any time. Its content nevertheless depends on the releases of the other SNP-related resources it draws on.<br />
<br />
<span style="color:blue">'''Q'''</span>: Where does the information come from?<br />
<br />
'''A''': Many different publicly available databases, resources about SNPs, and publications about genomic studies.<br />
<br />
=== Mutations of BCKDHA ===<br />
<br />
Because SNPedia is not a database-like data source, statistics on the reported SNPs for BCKDHA are hard to obtain.<br />
<br />
== Mutation map ==<br />
<br />
<span style="color:red; background-color:yellow">TODO: correct SNP position</span><br />
<br />
102 mutations were selected from different databases. Disease causing mutations are marked in <span style="color: red">'''red'''</span>, mutations that do not cause disease are marked in <span style="color: blue">'''blue'''</span>.<br />
<br />
[[File:Mutation-map.png|950px]]<br />
<br />
The following table contains the SNPs that we have chosen from the different databases:<br />
{| class='wikitable' border='1' style='width:900px'<br />
! Accession number !! Codon number !! Reported phenotype !! Mutation !! Type !! Pathogenic !! Class
|-<br />
| || 17 || N/A || L17F || missense || FALSE || silent<br />
|-<br />
| || 29 || N/A || G29E || missense || FALSE || silent<br />
|-<br />
| rs11549936 || 38 || N/A || P38H || missense || FALSE || silent<br />
|-<br />
| rs80014754 || 38 || N/A || P38P || synonymous-codon || FALSE || silent<br />
|-<br />
| rs150177278 || 41 || N/A || Q41R || missense || FALSE || silent<br />
|-<br />
| || 59 || N/A || A59V || missense || FALSE || silent<br />
|-<br />
| rs149251798 || 61 || N/A || I61M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM093772 || 69 || Maple syrup urine disease || Q69* || nonsense || TRUE || disease<br />
|-<br />
| rs138025447 || 70 || N/A || N70N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549938 || 81 || N/A || M81L || missense || FALSE || silent<br />
|-<br />
| rs148571328 || 95 || N/A || H95H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs11549937 || 96 || N/A || L96L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM005526 || 109 || Maple syrup urine disease || M109T || missense || TRUE || disease<br />
|-<br />
| rs150700696 || 111 || N/A || L111L || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021496 || 125 || Maple syrup urine disease || Q125E || missense || TRUE || disease<br />
|-<br />
| rs139678295 || 126 || N/A || R126W || missense || FALSE || silent<br />
|-<br />
| rs201638798 || 133 || N/A || N133N || synonymous-codon || FALSE || silent<br />
|-<br />
| rs146804716 || 140 || N/A || H140H || synonymous-codon || FALSE || silent<br />
|-<br />
| rs200947033 || 145 || N/A || A145A || synonymous-codon || FALSE || silent<br />
|-<br />
| rs34442879 || 150 || N/A || T150M || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM021497 || 151 || Maple syrup urine disease || T151M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082498 || 152 || Maple syrup urine disease || D152N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM930067 || 159 || Maple syrup urine disease || R159W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984173 || 166 || Maple syrup urine disease || Y166N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984174 || 167 || Maple syrup urine disease || R167Q || missense || TRUE || disease<br />
|-<br />
| || 170 || N/A || P170S || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930068 || 190 || Maple syrup urine disease || Q190K || missense || TRUE || disease<br />
|-<br />
| rs190610188 || 199 || N/A || R199C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 6 || 204 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || G204S || missense || TRUE || disease<br />
|-<br />
| || 209 || N/A || L209A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM097509 || 211 || Maple syrup urine disease || T211M || missense || TRUE || disease<br />
|-<br />
| rs10404506 || 212 || N/A || I212I || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984175 || 213 || Maple syrup urine disease || I213T || missense || TRUE || disease<br />
|-<br />
| rs114716391 || 215 || N/A || A215A || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062450 || 216 || Maple syrup urine disease || A216V || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 8 || 219 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || C219W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 5 || 220 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || R220W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062451 || 220 || Maple syrup urine disease || A220V || missense || TRUE || disease<br />
|-<br />
| rs141086188 || 221 || N/A || A221T || missense || FALSE || silent<br />
|-<br />
| rs146932786 || 235 || N/A || F235F || synonymous-codon || FALSE || silent<br />
|-<br />
| || 244 || N/A || G244R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 3 || 245 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA || G245R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852874 || 248 || True || G248S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984176 || 249 || Maple syrup urine disease || G249S || missense || TRUE || disease<br />
|-<br />
| rs199599175 || 252 || N/A || A252T || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930069 || 253 || Maple syrup urine disease || A253T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM005527 || 254 || Maple syrup urine disease || A254D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM021498 || 258 || Maple syrup urine disease || C258Y || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852876 || 263 || True || C263W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM045934 || 264 || Maple syrup urine disease || C264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852873 || 264 || True || R264W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 7 || 265 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || T265R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984177 || 265 || Maple syrup urine disease || R265W || missense || TRUE || disease<br />
|-<br />
| || 265 || N/A || R265A || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984178 || 267 || Maple syrup urine disease || N267S || missense || TRUE || disease<br />
|-<br />
| rs201991385 || 272 || N/A || T272T || synonymous-codon || FALSE || silent<br />
|-<br />
| rs61737367 || 279 || N/A || R279R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062452 || 283 || Maple syrup urine disease || G283D || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM984179 || 285 || Maple syrup urine disease || A285P || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM970163 || 287 || Maple syrup urine disease || R287* || nonsense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852871 || 289 || True || G289R || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM950135 || 290 || Maple syrup urine disease || G290R || missense || TRUE || disease<br />
|-<br />
| || 296 || N/A || R296C || missense || FALSE || silent<br />
|-<br />
| rs200137189 || 296 || N/A || R296H || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062446 || 297 || Maple syrup urine disease || R297C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076017 || 297 || Maple syrup urine disease || R297H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062453 || 300 || Maple syrup urine disease || G300S || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062449 || 302 || Maple syrup urine disease || D302A || missense || TRUE || disease<br />
|-<br />
| rs139390622 || 306 || N/A || N306N || synonymous-codon || FALSE || silent<br />
|-<br />
| || 309 || N/A || T309R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984180 || 310 || Maple syrup urine disease || T310R || missense || TRUE || disease<br />
|-<br />
| rs144372407 || 313 || N/A || R313Q || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM034725 || 314 || Maple syrup urine disease || R314* || nonsense || TRUE || disease<br />
|-<br />
| rs201109190 || 314 || N/A || R314Q || missense || FALSE || silent<br />
|-<br />
| rs284652 || 323 || N/A || F323F || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM930070 || 326 || Maple syrup urine disease || I326T || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM062447 || 327 || Maple syrup urine disease || E327K || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM076018 || 328 || Maple syrup urine disease || A328T || missense || TRUE || disease<br />
|-<br />
| || 337 || N/A || S337D || missense || FALSE || silent<br />
|-<br />
| rs146300600 || 343 || N/A || A343V || missense || FALSE || silent<br />
|-<br />
| || 345 || N/A || R345C || missense || FALSE || silent<br />
|-<br />
| rs139556493 || 345 || N/A || S345L || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM062448 || 346 || Maple syrup urine disease || R346H || missense || TRUE || disease<br />
|-<br />
| rs144276456 || 346 || N/A || S346S || synonymous-codon || FALSE || silent<br />
|-<br />
| rs185688419 || 356 || N/A || Q356R || missense || FALSE || silent<br />
|-<br />
| rs61736656 || 359 || N/A || I359V || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM984181 || 363 || Maple syrup urine disease || R363W || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| 4 || 364 || MAPLE SYRUP URINE DISEASE, INTERMEDIATE, TYPE IA. MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA, INCLUDED || F364C || missense || TRUE || disease<br />
|-<br />
| rs190202447 || 382 || N/A || R382R || synonymous-codon || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| 1 || 393 || MAPLE SYRUP URINE DISEASE, CLASSIC, TYPE IA || Y393N || missense || TRUE || disease<br />
|-<br />
| rs145595627 || 401 || N/A || P401P || synonymous-codon || FALSE || silent<br />
|-<br />
| || 403 || N/A || P403R || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852872 || 407 || True || F407C || missense || TRUE || disease<br />
|-<br />
| || 408 || N/A || F408C || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| CM950136 || 409 || Maple syrup urine disease || F409C || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM032853 || 412 || Maple syrup urine disease || V412M || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM082499 || 413 || Maple syrup urine disease || Y413H || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM940169 || 413 || Maple syrup urine disease || Y413C || missense || TRUE || disease<br />
|-<br />
| rs34492894 || 419 || N/A || L419L || synonymous-codon || FALSE || silent<br />
|-<br />
| rs141991700 || 422 || N/A || Q422K || missense || FALSE || silent<br />
|- style="background-color: #FFBBBB;"<br />
| rs137852870 || 436 || True || Y436N || missense || TRUE || disease<br />
|- style="background-color: #FFBBBB;"<br />
| CM890022 || 438 || Maple syrup urine disease || Y438N || missense || TRUE || disease<br />
|}<br />
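<br />
The "Type" column of the table follows directly from the mutation notation: a <code>*</code> as the new residue marks a nonsense mutation, an unchanged residue marks a synonymous codon, and anything else is a missense mutation. A minimal sketch of this rule, assuming one-letter residue codes as used in the table (the names <code>MUTATION_RE</code> and <code>classify</code> are hypothetical and not part of any database tool):<br />

```python
import re
from collections import Counter

# Protein-level notation as used in the table, e.g. "Q69*" or "P38P":
# reference residue, codon number, new residue ("*" stands for a stop codon).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def classify(mutation):
    """Return the mutation type implied by the notation alone."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation string: {mutation!r}")
    ref, _codon, alt = match.groups()
    if alt == "*":
        return "nonsense"
    if alt == ref:
        return "synonymous-codon"
    return "missense"


# A few entries taken from the table above.
sample = ["Q69*", "P38P", "M109T", "R287*", "L17F"]
counts = Counter(classify(m) for m in sample)
```

The rule only reads the notation; whether a mutation is pathogenic still has to come from the database annotation.<br />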
<br />
== Discussion ==<br />
<br />
* SNPs are very frequent in the human genome. <br />
* Many missense mutations also lead to disease, because a mutation is a random event, which in most cases will lead to a loss of function.<br />
* Nonsense mutations seem to be even more severe. All reported nonsense mutations in BCKDHA are disease causing. The only case, where a nonsense mutation could be neutral, is if the mutation is near the end of the protein, which is unlikely for a random SNP.<br />
* Disease-causing mutations are spread over almost the whole length of the protein. So not only mutations at functionally important sites such as binding sites or catalytic centres can cause disease, but also mutations elsewhere in the protein that might change its overall structure and thereby disturb its function.<br />
* The databases have different focuses: <br />
** OMIM and HGMD list only disease causing SNPs<br />
** SNPdbe adds information about functional effects of non-synonymous SNPs<br />
** dbSNP aims to collect all known SNPs<br />
<br />
== References ==<br />
<br />
* Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), 2013. [http://omim.org/ OMIM]<br />
* Stenson et al (2003). The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat(2003) 21:577-581. [http://www.hgmd.org/ HGMD]<br />
* Kitts A, Sherry S. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. 2002 Oct 9 [Updated 2011 Feb 2]. In: McEntyre J, Ostell J, editors. The NCBI Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2002-. Chapter 5. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21088/<br />
* [http://www.ncbi.nlm.nih.gov/SNP/ dbSNP]<br />
* Schaefer C, Meier A, Rost B, Bromberg Y (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics; 28(4):601-602. [http://www.rostlab.org/services/snpdbe/ SNPdbe]<br />
* Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Research 2011; doi: 10.1093/nar/gkr798. [http://www.snpedia.com/ SNPedia]</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33584Task 10 (MSUD)2013-07-27T14:02:54Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The information provided on the results page includes:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calphas<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ takes only Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
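<br />
The coarse-graining described above can be sketched as follows. This is a sketch under stated assumptions, not WEBnm@'s actual code: the input layout, the function name <code>coarse_grain</code> and the restriction to heavy atoms (hydrogens are usually absent from crystal structures) are choices made here for illustration.<br />

```python
from collections import defaultdict

# Standard atomic masses (in Da) of the heavy atoms found in proteins.
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def coarse_grain(atoms):
    """Reduce an all-atom structure to one point mass per residue.

    `atoms` is an iterable of (chain, residue_number, atom_name, element, xyz)
    tuples (hypothetical layout). Each residue is represented by the position
    of its C-alpha atom and carries the summed mass of all of its atoms.
    Residues without a C-alpha atom are skipped.
    """
    mass = defaultdict(float)
    ca_position = {}
    for chain, resnum, name, element, xyz in atoms:
        key = (chain, resnum)
        mass[key] += ATOMIC_MASS[element]
        if name == "CA":
            ca_position[key] = xyz
    return [(key, ca_position[key], mass[key]) for key in sorted(ca_position)]


# Toy input: the four heavy atoms of a single glycine residue.
glycine = [
    ("A", 1, "N", "N", (0.0, 0.0, 0.0)),
    ("A", 1, "CA", "C", (1.5, 0.0, 0.0)),
    ("A", 1, "C", "C", (2.0, 1.4, 0.0)),
    ("A", 1, "O", "O", (1.3, 2.4, 0.0)),
]
beads = coarse_grain(glycine)
```

For the glycine example the result is a single bead at the C-alpha position with the mass of all four heavy atoms.<br />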
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20 and atomic displacement and visualization for modes 7-12.<br />
<br />
In all modes that are visualized by the server, a movement of alpha and beta chain against each other can be observed, sometimes there is a hinge-movement (mode 7) and in other modes there is a torsion between the subunits. In most modes, a part at the end of alpha subunit (BCKDHA), consisting of 3 small helices that reach into the beta subunit, moves independently from the rest of the alpha subunit but together with beta subunit. An exception is mode 9, where this part moves a lot compared to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone, but the results are similar to those presented for the whole structure (see above), only the frequency of motion was higher compared to the whole structure.<br />
<br />
The parts at the beginning and end of the alpha chain are the most flexible, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between different Calpha atoms, the two chains can clearly be identified as differently moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
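<br />
How such a correlation matrix can be derived from normal modes is sketched below. The sketch uses the usual covariance estimate, in which each mode contributes the dot product of two residues' displacement vectors weighted by the inverse eigenvalue; whether WEBnm@ uses exactly this weighting and normalization is an assumption, and the function name <code>correlation_matrix</code> is hypothetical.<br />

```python
import math


def correlation_matrix(modes, eigenvalues):
    """Normalized cross-correlation of residue motions.

    `modes[k][i]` is the (x, y, z) displacement of residue i in mode k and
    `eigenvalues[k]` is the (non-zero) eigenvalue of that mode. Every residue
    must move in at least one mode, otherwise the normalization divides by zero.
    """
    n = len(modes[0])
    cov = [[0.0] * n for _ in range(n)]
    for mode, lam in zip(modes, eigenvalues):
        for i in range(n):
            for j in range(n):
                dot = sum(a * b for a, b in zip(mode[i], mode[j]))
                cov[i][j] += dot / lam  # soft modes (small eigenvalue) dominate
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy mode: residues 0 and 1 move together, residue 2 moves the opposite way.
toy_modes = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]]
corr = correlation_matrix(toy_modes, [2.0])
```

Values near +1 indicate residues that move together and values near -1 residues that move in opposite directions, which is why two chains moving against each other show up as separate blocks.<br />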
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity is defined by the fraction of residues which are significantly affected in the given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for selected top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances. <br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein.<br />
* PDB structures for the corresponding modes.<br />
* Animation for each of the top-10 modes.<br />
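<br />
The collectivity mentioned in the list above can be illustrated with a commonly used entropy-based formulation: the squared displacements of a mode are normalized to sum to one, and the exponential of their information entropy, divided by the number of residues, gives the effective fraction of residues taking part in the motion. That ElN&eacute;mo uses exactly this normalization (for example with or without mass weighting) is an assumption, and the function name <code>collectivity</code> is hypothetical.<br />

```python
import math


def collectivity(displacements):
    """Effective fraction of residues that take part in a mode.

    `displacements` is a list of (dx, dy, dz) vectors, one per residue.
    The result lies between 1/N (a single residue moves) and 1 (all
    residues move equally).
    """
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in displacements]
    total = sum(squared)
    if total == 0:
        raise ValueError("mode has no displacement")
    entropy = 0.0
    for s in squared:
        p = s / total  # share of the total motion carried by this residue
        if p > 0:
            entropy -= p * math.log(p)
    return math.exp(entropy) / len(squared)


uniform = collectivity([(1.0, 0.0, 0.0)] * 4)
localized = collectivity([(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3)
```

A mode in which only a short loop moves therefore has a collectivity close to zero, while a motion shared by whole subunits has a value closer to one.<br />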
<br />
In total, 100 modes were generated by ElN&eacute;mo, sorted by frequency in ascending order. ElN&eacute;mo automatically analysed the first 10 modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 of the top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Movements</td><br />
<td>The long flexible loop at the N-terminus moves strongly. The rest of the protein does not show obvious movement.</td><br />
<td>The two subunits of the protein show a small change in conformation. Slight structural changes take place within the subunits.</td><br />
<td>Large conformational change between the two subunits. Structural changes within the subunits are larger.</td><br />
<td>Large conformational change between the two subunits. No obvious intra-subunit structural change.</td><br />
<td>Large conformational change between the two subunits. Large parts of the subunits move apart from each other.</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
<gallery perrow=2 widths=500px heights=300px caption="R2 of modes 8, 12, 13, 14 and 16. It seems that R2 values were calculated only for chain A."><br />
File:BCKDHA-Mode8-r2.png|R<sup>2</sup> of mode 8. <br />
File:BCKDHA-Mode12-r2.png|R<sup>2</sup> of mode 12. <br />
File:BCKDHA-Mode13-r2.png|R<sup>2</sup> of mode 13. <br />
File:BCKDHA-Mode14-r2.png|R<sup>2</sup> of mode 14. <br />
File:BCKDHA-Mode16-r2.png|R<sup>2</sup> of mode 16. <br />
</gallery><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33583Task 10 (MSUD)2013-07-27T14:00:21Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The information provided on the results page includes:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calphas<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ takes only Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20 and atomic displacement and visualization for modes 7-12.<br />
<br />
In all modes that are visualized by the server, a movement of alpha and beta chain against each other can be observed, sometimes there is a hinge-movement (mode 7) and in other modes there is a torsion between the subunits. In most modes, a part at the end of alpha subunit (BCKDHA), consisting of 3 small helices that reach into the beta subunit, moves independently from the rest of the alpha subunit but together with beta subunit. An exception is mode 9, where this part moves a lot compared to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone, but the results are similar to those presented for the whole structure (see above), only the frequency of motion was higher compared to the whole structure.<br />
<br />
The parts at the beginning and end of the alpha chain are the most flexible, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between different Calpha atoms, the two chains can clearly be identified as differently moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity is defined by the fraction of residues which are significantly affected in the given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for selected top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances. <br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein.<br />
* PDB structures for the corresponding modes.<br />
* Animation for each of the top-10 modes.<br />
<br />
In total, 100 modes were generated by ElN&eacute;mo, sorted by frequency in ascending order. ElN&eacute;mo automatically analysed the first 10 modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 of the top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Movements</td><br />
<td>The long flexible loop at the N-terminus moves strongly. The rest of the protein does not show obvious movement.</td><br />
<td>The two subunits of the protein show a small change in conformation. Slight structural changes take place within the subunits.</td><br />
<td>Large conformational change between the two subunits. Structural changes within the subunits are larger.</td><br />
<td>Large conformational change between the two subunits. No obvious intra-subunit structural change.</td><br />
<td>Large conformational change between the two subunits. Large parts of the subunits move apart from each other.</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
<gallery perrow=2 widths=420px heights=250px caption="R2 of modes 8, 12, 13, 14 and 16. It seems that R2 values were calculated only for chain A."><br />
File:BCKDHA-Mode8-r2.png|R<sup>2</sup> of mode 8. <br />
File:BCKDHA-Mode12-r2.png|R<sup>2</sup> of mode 12. <br />
File:BCKDHA-Mode13-r2.png|R<sup>2</sup> of mode 13. <br />
File:BCKDHA-Mode14-r2.png|R<sup>2</sup> of mode 14. <br />
File:BCKDHA-Mode16-r2.png|R<sup>2</sup> of mode 16. <br />
</gallery><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33582Task 10 (MSUD)2013-07-27T13:47:43Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The information provided on the results page includes:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calphas<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ takes only Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20 and atomic displacement and visualization for modes 7-12.<br />
<br />
In all modes that are visualized by the server, a movement of alpha and beta chain against each other can be observed, sometimes there is a hinge-movement (mode 7) and in other modes there is a torsion between the subunits. In most modes, a part at the end of alpha subunit (BCKDHA), consisting of 3 small helices that reach into the beta subunit, moves independently from the rest of the alpha subunit but together with beta subunit. An exception is mode 9, where this part moves a lot compared to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
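<br />
A correlation matrix of this kind can be sketched as follows. This is a minimal illustration and not WEBnm@'s own code: it assumes the usual normal-mode covariance, in which each mode contributes the dot product of two atoms' displacement vectors weighted by the inverse eigenvalue, and the function name and toy mode are hypothetical.<br />

```python
import math


def correlation_matrix(modes):
    """Normalized cross-correlation between atoms from a set of normal modes.

    modes: list of (eigenvalue, vectors), where vectors[i] is the (x, y, z)
    displacement of atom i in that mode. Every atom must move in at least
    one mode, otherwise the normalization divides by zero.
    """
    n = len(modes[0][1])
    cov = [[0.0] * n for _ in range(n)]
    for eigenvalue, vectors in modes:
        for i in range(n):
            for j in range(n):
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                # Low-frequency (small eigenvalue) modes contribute most.
                cov[i][j] += dot / eigenvalue
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy mode for a hypothetical 4-atom system (not data from 2BFF):
# atoms 0-1 move in +x, atoms 2-3 move in -x, like two rigid chains.
toy_modes = [(2.0, [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                    (-1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])]
corr = correlation_matrix(toy_modes)
```

In this toy case, atoms within the same group are fully correlated and atoms in different groups are fully anti-correlated, which is the block pattern that separates the two chains in the plot above.<br />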
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity measures the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for the top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes<br />
* Animations for each of the top-10 modes<br />
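<br />
The collectivity measure can be illustrated with a short sketch. It assumes the commonly used entropy-based definition, in which the squared displacements of a mode are normalized to sum to 1 and the exponential of their entropy is divided by the number of residues. Whether ElN&eacute;mo computes it in exactly this way is an assumption, and the function name and toy vectors are hypothetical.<br />

```python
import math


def collectivity(displacements):
    """Collectivity of one mode from per-residue displacement vectors.

    kappa = exp(-sum_i p_i * ln p_i) / N, where p_i is the squared
    displacement of residue i, normalized so that all p_i sum to 1.
    """
    squared = [sum(c * c for c in vec) for vec in displacements]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("mode has no displacement")
    entropy = 0.0
    for s in squared:
        p = s / total
        if p > 0.0:  # p * ln(p) tends to 0 as p tends to 0
            entropy -= p * math.log(p)
    return math.exp(entropy) / len(squared)


# Toy modes for a hypothetical 4-residue chain (not data from 2BFF).
uniform = [(1.0, 0.0, 0.0)] * 4                         # all residues move equally
localized = [(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3   # only one residue moves

kappa_uniform = collectivity(uniform)
kappa_localized = collectivity(localized)
```

Under this definition a value near 1 means that almost all residues take part in the motion, while a value near 1/N means that the motion is confined to a single residue.<br />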
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzed the first 10 non-trivial modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
<gallery perrow=2 widths=420px heights=250px caption="R2 of modes 8, 12, 13, 14 and 16. It seems that R2 values were calculated only for chain A."><br />
File:BCKDHA-Mode8-r2.png|R<sup>2</sup> of mode 8. <br />
File:BCKDHA-Mode12-r2.png|R<sup>2</sup> of mode 12. <br />
File:BCKDHA-Mode13-r2.png|R<sup>2</sup> of mode 13. <br />
File:BCKDHA-Mode14-r2.png|R<sup>2</sup> of mode 14. <br />
File:BCKDHA-Mode16-r2.png|R<sup>2</sup> of mode 16. <br />
</gallery><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33581Task 10 (MSUD)2013-07-27T13:46:52Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The results page provides the following information:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calpha atoms<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ uses only the Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20, and atomic displacements and visualizations for modes 7-12. Numbering starts at 7 because modes 1-6 are the trivial rigid-body translations and rotations.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity measures the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for the top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes<br />
* Animations for each of the top-10 modes<br />
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzed the first 10 non-trivial modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
<gallery perrow=2 widths=420px caption="R2 of modes 8, 12, 13, 14 and 16. It seems that R2 values were calculated only for chain A."><br />
File:BCKDHA-Mode8-r2.png|R<sup>2</sup> of mode 8. <br />
File:BCKDHA-Mode12-r2.png|R<sup>2</sup> of mode 12. <br />
File:BCKDHA-Mode13-r2.png|R<sup>2</sup> of mode 13. <br />
File:BCKDHA-Mode14-r2.png|R<sup>2</sup> of mode 14. <br />
File:BCKDHA-Mode16-r2.png|R<sup>2</sup> of mode 16. <br />
</gallery><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33580Task 10 (MSUD)2013-07-27T13:43:52Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The results page provides the following information:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calpha atoms<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ uses only the Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20, and atomic displacements and visualizations for modes 7-12. Numbering starts at 7 because modes 1-6 are the trivial rigid-body translations and rotations.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity measures the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for the top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes<br />
* Animations for each of the top-10 modes<br />
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzed the first 10 non-trivial modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">R<sup>2</sup></td><br />
<td>[[File:BCKDHA-Mode8-r2.png|thumb|300px]]</td><br />
<td>[[File:BCKDHA-Mode12-r2.png|thumb|300px]]</td><br />
<td>[[File:BCKDHA-Mode13-r2.png|thumb|300px]]</td><br />
<td>[[File:BCKDHA-Mode14-r2.png|thumb|300px]]</td><br />
<td>[[File:BCKDHA-Mode16-r2.png|thumb|300px]]</td><br />
</tr><br />
</table><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33579Task 10 (MSUD)2013-07-27T13:43:07Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The results page provides the following information:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calpha atoms<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ uses only the Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20, and atomic displacements and visualizations for modes 7-12. Numbering starts at 7 because modes 1-6 are the trivial rigid-body translations and rotations.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity measures the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for the top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes<br />
* Animations for each of the top-10 modes<br />
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzed the first 10 non-trivial modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">R<sup>2</sup></td><br />
<td>[[File:BCKDHA-Mode8-r2.png|thumb|250px]]</td><br />
<td>[[File:BCKDHA-Mode12-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-r2.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33578Task 10 (MSUD)2013-07-27T13:42:49Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The results page provides the following information:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calpha atoms<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ uses only the Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20, and atomic displacements and visualizations for modes 7-12. Numbering starts at 7 because modes 1-6 are the trivial rigid-body translations and rotations.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis no difference in the normal modes between the structure with and without ligand could be observed.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity measures the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for the top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes<br />
* Animations for each of the top-10 modes<br />
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzed the first 10 non-trivial modes, i.e. those with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">R<sup>2</sup></td><br />
<td>[[File:BCKDHA-Mode8-r2.png|thumb|200px]]</td><br />
<td>[[File:BCKDHA-Mode12-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-r2.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode16-r2.png&diff=33577File:BCKDHA-Mode16-r2.png2013-07-27T13:40:58Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode14-r2.png&diff=33576File:BCKDHA-Mode14-r2.png2013-07-27T13:40:45Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode13-r2.png&diff=33575File:BCKDHA-Mode13-r2.png2013-07-27T13:40:23Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode12-r2.png&diff=33574File:BCKDHA-Mode12-r2.png2013-07-27T13:37:28Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode8-r2.png&diff=33573File:BCKDHA-Mode8-r2.png2013-07-27T13:37:02Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33572Task 10 (MSUD)2013-07-27T13:36:48Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The results page provides the following information:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calpha atoms<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ uses only the Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20, and atomic displacements and visualizations for modes 7-12. Numbering starts at 7 because modes 1-6 are the trivial rigid-body translations and rotations.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF, alpha and beta chain move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF, end of alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those for the whole structure (see above), except that the mode frequencies are higher.<br />
<br />
The beginning and end of the alpha chain are the most flexible regions, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between Calpha atoms, the two chains can clearly be identified as separately moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis, no difference in the normal modes was observed between the structures with and without ligand.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity is the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for selected top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes.<br />
* Animation for each of the top-10 modes.<br />
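<br />
To make the notion of collectivity in the list above concrete, here is a minimal sketch of one common way to quantify it: the exponential of the information entropy of the normalized squared displacements, divided by the number of residues. Whether ElN&eacute;mo uses exactly this formula is an assumption, and the function name <code>collectivity</code> is hypothetical.<br />

```python
import math

def collectivity(mode):
    """Degree of collectivity of one normal mode.

    mode: list of (dx, dy, dz) displacement vectors, one per C-alpha atom.
    Returns a value between 1/N (one atom moves) and 1 (all atoms move equally).
    """
    # Squared displacement amplitude of every atom.
    sq = [sum(c * c for c in vec) for vec in mode]
    total = sum(sq)
    # Normalize so that the weights sum to 1.
    weights = [s / total for s in sq]
    # Entropy of the weight distribution; atoms that do not move contribute 0.
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return math.exp(entropy) / len(mode)
```

Under this definition, a value close to 1 means that almost all residues take part in the motion, whereas a small value indicates motion concentrated in a small share of the residues.<br />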
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzes the first 10 of these, i.e. the modes with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">R<sup>2</sup></td><br />
<td>[[File:BCKDHA-Mode8-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-r2.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-r2.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-mode-16.gif&diff=33571File:BCKDHA-mode-16.gif2013-07-27T13:05:53Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-mode-14.gif&diff=33570File:BCKDHA-mode-14.gif2013-07-27T13:04:43Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-mode-13.gif&diff=33569File:BCKDHA-mode-13.gif2013-07-27T13:03:55Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-mode-12.gif&diff=33568File:BCKDHA-mode-12.gif2013-07-27T13:02:38Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=Task_10_(MSUD)&diff=33567Task 10 (MSUD)2013-07-27T13:02:22Z<p>Weish: /* ElN&eacute;mo */</p>
<hr />
<div>==Results==<br />
[[Task 10 Lab Journal (MSUD)|Lab journal]]<br />
<br />
===WEBnm@===<br />
<br />
The information provided on the results page includes:<br />
<br />
* deformation energies and eigenvalues for the lowest-frequency non-trivial modes<br />
* atomic displacement and fluctuation plots<br />
* mode visualization with Jmol and vector field representation of modes<br />
* correlation matrix for motions between Calphas<br />
* overlap analysis for different conformations of the same protein<br />
<br />
For the calculation of normal modes, WEBnm@ takes only Calpha atoms. The mass of the whole residue is assigned to its Calpha atom for the analysis.<br />
<br />
The server gives eigenvalues for modes 7-56, deformation energies for modes 7-20 and atomic displacement and visualization for modes 7-12.<br />
<br />
In all modes visualized by the server, the alpha and beta chains move against each other: in some modes as a hinge movement (mode 7), in others as a torsion between the subunits. In most modes, a segment at the end of the alpha subunit (BCKDHA), consisting of three small helices that reach into the beta subunit, moves independently of the rest of the alpha subunit but together with the beta subunit. An exception is mode 9, where this segment moves strongly relative to the rest of the protein. Vector representations of modes 7 and 9:<br />
<br />
<gallery widths=500px heights=250px perrow=2><br />
File:MSUD_mode7.png| Normal mode 7 of 2BFF: the alpha and beta chains move against each other.<br />
File:MSUD_mode9.png| Normal mode 9 of 2BFF: the end of the alpha chain moves strongly.<br />
</gallery><br />
<br />
The analysis was also performed for the alpha subunit (chain A) alone. The results are similar to those presented for the whole structure (see above); only the frequencies of motion were higher.<br />
<br />
The beginning and the end of the alpha chain are the most flexible parts, as can be seen in the atomic displacement plots for modes 7-12 (the beta chain begins at approximately position 400 in the plot):<br />
<br />
[[File:MSUD_2BFF_modeall.png|800px]]<br />
<br />
<br />
In the correlation matrix, which shows the correlation of motions between different Calpha atoms, the two chains can clearly be identified as differently moving domains:<br />
<br />
[[File:MSUD_2BFF_correlation_matrix.png|600px]]<br />
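<br />
A correlation matrix of this kind can be sketched as below. The example assumes the usual normal-mode construction, in which each mode contributes the dot product of the displacement vectors of two atoms, weighted by the inverse of the mode's eigenvalue, and the result is normalized to the range -1 to 1. The source does not state how WEBnm@ computes its matrix, so the formula and the function name <code>correlation_matrix</code> are assumptions.<br />

```python
import math

def correlation_matrix(modes, eigenvalues):
    """Correlation of motions between Calpha atoms from a set of normal modes.

    modes: list of modes; each mode is a list of (dx, dy, dz), one per atom.
    eigenvalues: one positive eigenvalue per mode (non-trivial modes only).
    Every atom is assumed to move in at least one of the modes.
    """
    n = len(modes[0])
    cov = [[0.0] * n for _ in range(n)]
    for mode, lam in zip(modes, eigenvalues):
        for i in range(n):
            for j in range(n):
                dot = sum(mode[i][a] * mode[j][a] for a in range(3))
                cov[i][j] += dot / lam  # soft (low-eigenvalue) modes dominate
    # Normalize so that every diagonal element equals 1.
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]
```

Values near 1 mark atoms that move in the same direction, values near -1 mark atoms that move in opposite directions. Two chains that move as separate blocks therefore show up as two squares of high correlation along the diagonal.<br />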
<br />
<br />
For BCKDHA (alpha chain in the structure) we can identify only one domain, which is consistent with the domain listed in the databases:<br />
<br />
* CATH: 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial, Homo sapiens<br />
* SCOP: Branched-chain alpha-keto acid dehydrogenase PP module (pyrophosphate-binding)<br />
* Pfam: Dehydrogenase E1 component (E1_dh) at residues 106-405<br />
<br />
In the comparative analysis, no difference in the normal modes was observed between the structures with and without ligand.<br />
<br />
===ElN&eacute;mo===<br />
<br />
The web tool ElN&eacute;mo provides the following information:<br />
* Predicted B-factors for all residues<br />
* Mode frequencies and collectivities. Collectivity is the fraction of residues that are significantly affected by a given mode. Modes are sorted by frequency in ascending order.<br />
* Fluctuation maps for selected top-10 normal modes. The maps are derived from C&alpha;-C&alpha; distances.<br />
* C&alpha; atom strains for each residue<br />
* R<sup>2</sup> (normalized mean square displacement) of all C&alpha; atoms in the protein<br />
* PDB structures for the corresponding modes.<br />
* Animation for each of the top-10 modes.<br />
<br />
In total, ElN&eacute;mo generated 100 modes, sorted by frequency in ascending order. ElN&eacute;mo automatically analyzes the first 10 of these, i.e. the modes with the lowest frequencies. Among these top-10 modes, 6 have low collectivity. The following table shows the 5 top-10 modes that we found interesting for interpretation:<br />
<br />
<table border="1"><br />
<tr style="font-weight: bold;"><br />
<td>Mode properties</td><br />
<td>Mode 8</td><br />
<td>Mode 12</td><br />
<td>Mode 13</td><br />
<td>Mode 14</td><br />
<td>Mode 16</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Frequency</td><br />
<td>1.82</td><br />
<td>8.91</td><br />
<td>10.51</td><br />
<td>10.94</td><br />
<td>14.59</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Collectivity</td><br />
<td><span style="color:red">0.0365</span></td><br />
<td>0.3276</td><br />
<td>0.5536</td><br />
<td>0.2636</td><br />
<td>0.5809</td><br />
</tr><br />
<br />
<tr><br />
<td style="font-weight: bold;">Animation</td><br />
<td>[[File:BCKDHA-mode-8.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-12.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-13.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-14.gif|thumb]]</td><br />
<td>[[File:BCKDHA-mode-16.gif|thumb]]</td><br />
</tr><br />
<tr><br />
<td style="font-weight: bold;">Fluctuation map</td><br />
<td>[[File:BCKDHA-Mode8-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode12-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode13-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode14-fluctuation.png|thumb]]</td><br />
<td>[[File:BCKDHA-Mode16-fluctuation.png|thumb]]</td><br />
</tr><br />
</table><br />
<br />
==Discussion==</div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-mode-8.gif&diff=33566File:BCKDHA-mode-8.gif2013-07-27T13:01:11Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode16-fluctuation.png&diff=33565File:BCKDHA-Mode16-fluctuation.png2013-07-27T13:00:23Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode14-fluctuation.png&diff=33564File:BCKDHA-Mode14-fluctuation.png2013-07-27T13:00:00Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode13-fluctuation.png&diff=33563File:BCKDHA-Mode13-fluctuation.png2013-07-27T12:59:37Z<p>Weish: </p>
<hr />
<div></div>Weishhttps://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php?title=File:BCKDHA-Mode12-fluctuation.png&diff=33562File:BCKDHA-Mode12-fluctuation.png2013-07-27T12:59:12Z<p>Weish: </p>
<hr />
<div></div>Weish