Difference between revisions of "Sequence-based mutation analysis of ARSA"
(→SNAP) |
m (→References) |
||
(44 intermediate revisions by 2 users not shown) | |||
Line 49: | Line 49: | ||
|- |
|- |
||
|} |
|} |
||
− | |||
− | In the following sections, we will apply the methods and discuss the individual results. An overall summary and guess of the impact on the function is then made in the last section. |
||
=== Substitution Matrices === |
=== Substitution Matrices === |
||
Line 92: | Line 90: | ||
|} |
|} |
||
− | + | === PSI-BLAST === |
|
+ | The standard substitution matrices from above can be improved upon by generating a substitution matrix that is specific to our protein and its homologs. Such a position-specific scoring matrix (PSSM) can be obtained from a PSI-BLAST search. To infer the position-specific sequence profile, we executed PSI-BLAST with the following command: |
||
− | * ''Mutation 1:'' The scores are high in all substituion matrices. Thus, this substitution is likely to have no effect. |
||
+ | |||
− | * ''Mutation 2:'' All scores are low. Thus, the properties of the amino acids are should differ and a the substitution should rather be neutral than non-neutral. |
||
+ | <code> |
||
− | * ''Mutation 3:'' |
||
+ | blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt |
||
− | * ''Mutation 4:'' |
||
+ | </code> |
||
− | * ''Mutation 5:'' |
||
+ | |||
− | * ''Mutation 6:'' |
||
+ | The excerpt below shows the rows of the profile matrix for our mutated positions. The scores of interest, i.e. the scores of our substitutions, are highlighted in green. |
||
− | * ''Mutation 7:'' |
||
+ | |||
− | * ''Mutation 8:'' |
||
+ | <code> |
||
− | * ''Mutation 9:'' |
||
+ | Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts |
||
− | * ''Mutation 10:'' |
||
+ | A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V |
||
+ | 29 D -5 -5 <span style="background:#00FF00">-2</span> 8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.49 1.56 |
||
+ | 153 Q 3 2 -1 4 -4 -1 -1 -2 <span style="background:#00FF00">0</span> -2 -3 -3 4 -2 -3 -1 -2 -3 -2 -2 26 10 3 23 0 3 3 3 2 2 1 1 13 2 1 3 2 0 1 2 0.53 1.48 |
||
+ | 274 T -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 <span style="background:#00FF00">-3</span> -5 -4 1 8 -6 -5 -3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 92 0 0 0 1.94 1.62 |
||
+ | 409 T -1 0 0 -1 -2 -1 -1 0 -1 <span style="background:#00FF00">-1</span> -1 0 -1 -1 3 0 1 6 0 -1 5 5 5 4 1 3 4 8 1 3 6 5 1 2 13 6 8 11 3 4 0.26 0.95 |
||
+ | 489 C 2 -1 1 -4 8 -4 -4 <span style="background:#00FF00">-2</span> -1 -1 -2 -3 -1 -4 -4 0 0 5 -1 -3 15 4 8 0 36 0 0 2 1 3 3 1 1 0 0 6 5 9 2 0 0.99 1.22 |
||
+ | 440 N -5 -3 6 5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6 2 <span style="background:#00FF00">-2</span> -3 -6 -6 -5 0 1 46 36 0 1 2 0 0 0 0 1 0 0 10 1 1 0 0 0 1.48 1.67 |
||
+ | 356 F -3 -1 -5 -5 -3 0 -1 -6 1 3 0 -1 0 2 -6 -3 -2 -3 5 <span style="background:#00FF00">3</span> 1 4 0 0 1 5 4 0 3 18 8 5 2 8 0 1 2 0 20 20 0.59 1.62 |
||
+ | 193 W -2 4 2 3 <span style="background:#00FF00">-5</span> 0 0 -2 0 -3 -4 1 -3 -1 -2 -1 -2 1 1 -3 3 25 11 16 0 4 5 3 2 2 1 7 0 2 2 4 2 2 5 2 0.46 1.45 |
||
+ | 136 P <span style="background:#00FF00">-3</span> -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7 9 -4 -4 -7 -6 -5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 98 0 0 0 0 0 3.03 1.61 |
||
+ | 496 R -3 1 0 -3 -4 1 1 -1 <span style="background:#00FF00">1</span> -3 1 1 -2 2 4 0 -3 -1 -1 -3 1 7 4 1 0 5 10 4 3 1 16 9 0 9 20 8 1 1 1 1 0.34 0.96 |
||
+ | </code> |
||
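The highlighted scores can also be read out of the matrix programmatically. The following is a minimal sketch, assuming the ASCII layout shown above (position, residue, 20 log-odds scores in the column order of the header row, then 20 percentages and two floats). The function names are our own illustrative choices and are not part of PSI-BLAST.

```python
# Column order of the ASCII PSSM, taken from the header row shown above.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def parse_pssm_row(line):
    """Split one PSSM row into position, row residue and a residue -> score map."""
    fields = line.split()
    position = int(fields[0])
    residue = fields[1]
    # The first 20 numbers after the residue are the log-odds scores.
    scores = [int(value) for value in fields[2:22]]
    return position, residue, dict(zip(AA_ORDER, scores))


def substitution_score(line, mutant):
    """Return the profile score for replacing the row's residue by `mutant`."""
    _, _, scores = parse_pssm_row(line)
    return scores[mutant]


# Row for position 274, copied from the matrix above (highlighting markup removed).
ROW_274 = (
    "274 T -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4 1 8 -6 -5 -3 "
    "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 92 0 0 0 1.94 1.62"
)

if __name__ == "__main__":
    # Thr -> Met at position 274
    print(substitution_score(ROW_274, "M"))
```

For the Thr-Met substitution at position 274 this returns the same value that is highlighted in the matrix.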
+ | |||
+ | === Multiple sequence alignments === |
||
+ | Another interesting feature is the conservation of the wild-type and mutant residues of our protein among its homologs. To calculate it, we first downloaded the HSSP file for ARSA to get all proteins that are homologous to it. Then we downloaded all mammalian protein sequences from UniProt by searching for the term <code>taxonomy:40674</code>, which selects all mammalian protein sequences, and saved them in one multi-FASTA file. From this file we extracted all mammalian homologs of human ARSA by mapping the IDs from the HSSP file to the sequence IDs in the multi-FASTA file. This yielded 75 mammalian sequences homologous to human ARSA. <br> |
||
+ | Next, we calculated a multiple sequence alignment of these proteins (including ARSA) with MUSCLE. The Jalview image of the alignment is shown below. |
||
+ | |||
+ | [[File:homomusclearsa.png | 200px | center | thumb | Multiple sequence alignment of all 75 homologous sequences using MUSCLE]] |
||
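The ID mapping step described above can be sketched as follows. This is only an illustration under assumptions: the accessions are taken to be already extracted from the HSSP file, the FASTA headers are assumed to follow the UniProt style <code>sp|ACCESSION|NAME</code>, and the sequences and the second accession are toy placeholders. The helper names are hypothetical.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession(header):
    """Extract the accession from a header such as 'sp|P15289|ARSA_HUMAN ...'."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    tokens = header.split()
    return tokens[0] if tokens else ""


def filter_homologs(records, homolog_ids):
    """Keep only the records whose accession is listed among the homologs."""
    wanted = set(homolog_ids)
    return [(h, s) for h, s in records if accession(h) in wanted]


# Toy input: placeholder sequences, the second accession is made up.
TOY_FASTA = """>sp|P15289|ARSA_HUMAN toy record
MSTA
QLLV
>sp|X00000|TOY_MOUSE toy record
MKKA
"""

if __name__ == "__main__":
    hits = filter_homologs(read_fasta(TOY_FASTA), ["P15289"])
    print([accession(h) for h, _ in hits])
```

A set lookup is used so that the filter stays fast even for a multi-FASTA file with all mammalian sequences.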
+ | |||
+ | The following table shows the conservation of the reference (wild-type) amino acid and of the mutant amino acid at the respective positions. |
||
+ | |||
+ | {| border="1" style="text-align:center; border-spacing:0;" |
||
+ | ! pos || conservation - reference || conservation - mutant |
||
+ | |- |
||
+ | | 29 || 0.86 || 0 |
||
+ | |- |
||
+ | | 153 || 0.14 || 0 |
||
+ | |- |
||
+ | | 274 || 0.87 || 0 |
||
+ | |- |
||
+ | | 409 || 0.35 || 0.16 |
||
+ | |- |
||
+ | | 489 || 0.80 || 0.05 |
||
+ | |- |
||
+ | | 193 || 0.13 || 0 |
||
+ | |- |
||
+ | | 356 || 0.15 || 0 |
||
+ | |- |
||
+ | | 440 || 0.15 || 0 |
||
+ | |- |
||
+ | | 496 || 0.14 || 0.01 |
||
+ | |- |
||
+ | | 136 || 0.93 || 0 |
||
+ | |- |
||
+ | |} |
||
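The text does not state exactly how the conservation values were computed. A plausible reading is the fraction of aligned sequences that carry a given residue in the alignment column of the mutated position. The sketch below implements this reading on a toy alignment. It assumes that the reference sequence is counted as well and that positions are 1-based in the ungapped reference; both are assumptions on our side.

```python
def alignment_column(reference_row, position):
    """Map a 1-based position in the ungapped reference to a 0-based alignment column."""
    seen = 0
    for column, symbol in enumerate(reference_row):
        if symbol != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies outside the reference sequence")


def conservation(alignment, reference_row, position, residue):
    """Fraction of aligned sequences that carry `residue` at the reference position."""
    column = alignment_column(reference_row, position)
    matches = sum(1 for row in alignment if row[column] == residue)
    return matches / len(alignment)


# Toy alignment (not ARSA data); the first row plays the role of the reference.
TOY_MSA = [
    "MD-KT",
    "MDAKT",
    "MN-KS",
    "MD-RT",
]

if __name__ == "__main__":
    reference = TOY_MSA[0]
    print(conservation(TOY_MSA, reference, 2, "D"))  # wild-type residue
    print(conservation(TOY_MSA, reference, 2, "N"))  # mutant residue
```

The position mapping matters because gaps in the reference row shift the alignment columns relative to the residue numbering.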
=== Secondary Structure === |
=== Secondary Structure === |
||
+ | |||
+ | Secondary structure is an important structural feature of a protein. It stabilizes the overall tertiary structure and is therefore also important for the proper functioning of the protein. Mutations that are located within secondary structure elements might destroy the secondary structure and might therefore have an impact on the protein function. To consider the position of the mutations relative to the secondary structure of ARSA, we generated the following map: |
||
[[ File:Sec_Struct_Mutations_ARSA.png | 900px ]] |
[[ File:Sec_Struct_Mutations_ARSA.png | 900px ]] |
||
Line 253: | Line 298: | ||
|} |
|} |
||
− | ==== |
+ | ==== Summary of the prediction results ==== |
To compare the results of the different prediction methods we created the table below. If a mutation was predicted to have an effect, a "X" was set, if a mutation was predicted to have no effect, a "-" was set. For PolyPhen "X" means "damaging" or "probably damaging", a "/" means "possibly damaging" and a "-" means "benign". |
To compare the results of the different prediction methods we created the table below. If a mutation was predicted to have an effect, a "X" was set, if a mutation was predicted to have no effect, a "-" was set. For PolyPhen "X" means "damaging" or "probably damaging", a "/" means "possibly damaging" and a "-" means "benign". |
||
Line 287: | Line 332: | ||
|} |
|} |
||
− | === |
+ | === Summary and Discussion === |
+ | In this section we compare the results of the previous analyses and additionally use PyMOL mutagenesis images and the physico-chemical properties of the amino acids to make our final guess of the impact of each mutation. All mutations are listed below, together with a PyMOL mutagenesis image and a description of the properties of the mutation. We also included short summary tables of the methods we applied and added a short discussion/interpretation of the results. For a detailed description of the summary tables, please read the individual sections. |
||
− | First, we downloaded the HSSP file for ARSA to get all proteins, which are homologuous to it. Then we downloaded all mammalian protein sequences from Uniprot. This was achieved by searching for the term <code>taxonomy:40674</code>, which codes for all mammalian protein sequences. We saved all sequences in one multiple fasta file. Then we extracted all homologuous mammalian proteins to human ARSA by mapping the ids from the HSSP file to sequence ids in the multi fasta file. This yielded 75 homologuous mammalian sequences to human ARSA. <br> |
||
− | Next, we calculated a multiple sequence alignments of these proteins (including ARSA) with Muscle. The Jalview image of the alignment is shown below. |
||
− | [[File:homomusclearsa.png | 200px | center | thumb | Multiple sequence alignments of all 75 homologuous sequences using muscle]] |
||
+ | ==== Mutation 1 ==== |
||
− | The following table shows the conservation of the original amino acid in the reference sequence and their mutations at the respective positions. |
||
− | |||
− | {| border="1" style="text-align:center; border-spacing:0;" |
||
− | ! pos || conservation - reference || conservation - mutant |
||
− | |- |
||
− | | 29 || 0.86 || 0 |
||
− | |- |
||
− | | 153 || 0.14 || 0 |
||
− | |- |
||
− | | 274 || 0.87 || 0 |
||
− | |- |
||
− | | 409 || 0.35 || 0.16 |
||
− | |- |
||
− | | 489 || 0.80 || 0.05 |
||
− | |- |
||
− | | 193 || 0.13 || 0 |
||
− | |- |
||
− | | 356 || 0.15 || 0 |
||
− | |- |
||
− | | 440 || 0.15 || 0 |
||
− | |- |
||
− | | 496 || 0.14 || 0.01 |
||
− | |- |
||
− | | 136 || 0.93 || 0 |
||
− | |- |
||
− | |} |
||
− | |||
− | === PSI-BLAST === |
||
− | |||
− | To infer the position specific sequence profile, we executed PSI-BLAST with the following command: |
||
− | |||
− | <code> |
||
− | blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt |
||
− | </code> |
||
− | |||
− | The graphic shows the relevant lines of the profile matrix regarding our mutated positions. The scores of interest - which score our mutation substitutions - are highlighted in green. |
||
− | |||
− | <code> |
||
− | Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts |
||
− | A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V |
||
− | 29 D <span style="background:#00FF00">-5</span> -5 -2 8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.49 1.56 |
||
− | 153 Q 3 2 -1 4 -4 -1 -1 -2 <span style="background:#00FF00">0</span> -2 -3 -3 4 -2 -3 -1 -2 -3 -2 -2 26 10 3 23 0 3 3 3 2 2 1 1 13 2 1 3 2 0 1 2 0.53 1.48 |
||
− | 274 T -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 <span style="background:#00FF00">-3</span> -5 -4 1 8 -6 -5 -3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 92 0 0 0 1.94 1.62 |
||
− | 409 T -1 0 0 -1 -2 -1 -1 0 -1 <span style="background:#00FF00">-1</span> -1 0 -1 -1 3 0 1 6 0 -1 5 5 5 4 1 3 4 8 1 3 6 5 1 2 13 6 8 11 3 4 0.26 0.95 |
||
− | 489 C 2 -1 1 -4 8 -4 -4 <span style="background:#00FF00">-2</span> -1 -1 -2 -3 -1 -4 -4 0 0 5 -1 -3 15 4 8 0 36 0 0 2 1 3 3 1 1 0 0 6 5 9 2 0 0.99 1.22 |
||
− | 440 N -5 -3 6 5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6 2 <span style="background:#00FF00">-2</span> -3 -6 -6 -5 0 1 46 36 0 1 2 0 0 0 0 1 0 0 10 1 1 0 0 0 1.48 1.67 |
||
− | 356 F -3 -1 -5 -5 -3 0 -1 -6 1 3 0 -1 0 2 -6 -3 -2 -3 5 <span style="background:#00FF00">3</span> 1 4 0 0 1 5 4 0 3 18 8 5 2 8 0 1 2 0 20 20 0.59 1.62 |
||
− | 193 W -2 4 2 3 <span style="background:#00FF00">-5</span> 0 0 -2 0 -3 -4 1 -3 -1 -2 -1 -2 1 1 -3 3 25 11 16 0 4 5 3 2 2 1 7 0 2 2 4 2 2 5 2 0.46 1.45 |
||
− | 136 P <span style="background:#00FF00">-3</span> -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7 9 -4 -4 -7 -6 -5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 98 0 0 0 0 0 3.03 1.61 |
||
− | 496 R -3 1 0 -3 -4 1 1 -1 <span style="background:#00FF00">1</span> -3 1 1 -2 2 4 0 -3 -1 -1 -3 1 7 4 1 0 5 10 4 3 1 16 9 0 9 20 8 1 1 1 1 0.34 0.96 |
||
− | </code> |
||
− | |||
− | |||
− | === Summary and Discussion === |
||
− | The mutations are listed below, together with a pymol mutagenesis image and a description of the properties of the mutations. We also included short summary tables of the methods we applied and added a short discussion/interpretation of the results. For a detailed descitption of the summary tables, please read the individual sections. |
||
{| border="1" style="text-align:left; border-spacing:0;" |
{| border="1" style="text-align:left; border-spacing:0;" |
||
Line 386: | Line 375: | ||
|} |
|} |
||
|} |
|} |
||
− | Aspartic acid is an acidic amino acid while Asparagine is a hydrophilic amino acid. So the mutation changes the behaviour towards water as well as the pH. The lysosomal enzyme |
+ | Aspartic acid is an acidic amino acid while Asparagine is a hydrophilic amino acid. So the mutation changes the behaviour towards water as well as towards the pH. The lysosomal enzyme ARSA is active at a very low pH value, thus acidic amino acids are preferred in this environment. Consequently the effect could be deleterious. This hypothesis is supported by all predictions, and the substitution matrices also show rather low values. The mutation is located at the border of a beta sheet, which is also an indicator of a possible deleterious effect. In addition, the conservation of the amino acid is very high in the MSA of related sequences, which indicates that the residue is quite important. Furthermore, it is classified as an important residue by our SNAP analysis of all possible mutants, i.e. most of the substitutions lead to a deleterious effect. <br> |
+ | If there is an effect, it is not introduced by a structural change of the amino acid itself (the structures are very similar, see the mutagenesis images above) but by the drastic change in the amino acid's properties. Based on our analysis, we classify this mutation as deleterious. |
||
− | This is supported by the Uniprot annotation, which associates it to infantile-onset Metachromatic leukodystrophy. It causes a severe reduction of enzyme activity. |
||
|- |
|- |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 2 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
+ | |- |
||
| rowspan="2" | 2 |
| rowspan="2" | 2 |
||
| rowspan="2" | Pro - Ala |
| rowspan="2" | Pro - Ala |
||
Line 417: | Line 420: | ||
|} |
|} |
||
|} |
|} |
||
− | Proline and Alanine are both hydrophobic amino acids. In contrast to mutation 1, the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is |
+ | Proline and Alanine are both hydrophobic amino acids. In contrast to mutation 1, the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is structurally very important. It is even located at the border of an alpha-helix. Thus, the change to the small amino acid Alanine could introduce a big structural change, despite the similarity of their chemical properties. This structural change might e.g. occur due to an extension of the alpha-helix. <br> |
− | For this mutation, all predictions yield damaging effects and the substitution matrices indicate, that a substitution from Pro to Ala is very unlikely. Again the |
+ | For this mutation, all predictions yield damaging effects and the substitution matrices indicate, that a substitution from Pro to Ala is very unlikely. Again, the conservation of the amino acid is very high in the MSA of related sequences, which indicates, that the residue is quite important. <br> |
Furthermore it is classified as important residue by our analysis of all possible mutants, i.e. most of the substitutions lead to a deleterious effect. <br> |
Furthermore it is classified as important residue by our analysis of all possible mutants, i.e. most of the substitutions lead to a deleterious effect. <br> |
||
− | + | Due to all these indicators, we guess that this mutation is deleterious and leads to an outbreak of the disease. |
|
|- |
|- |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 3 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
+ | |- |
||
| rowspan="2" | 3 |
| rowspan="2" | 3 |
||
| rowspan="2" | Gln-His |
| rowspan="2" | Gln-His |
||
Line 450: | Line 467: | ||
|} |
|} |
||
|} |
|} |
||
− | Glutamine is a hydrophilic amino acid while Histidine is a basic amino acid. So the behaviour towards water changes as well as the charge of the amino acid. |
+ | Glutamine is a hydrophilic amino acid while Histidine is a basic amino acid. So the behaviour towards water changes as well as the charge of the amino acid. Furthermore, Glutamine and Histidine are very different in structure. Histidine is bigger and needs much more space than Glutamine, which could have an influence on the structure of ARSA (see above pymol images). <br> |
− | In this case the |
+ | In this case the mutation is not located within a secondary structure element and the residue is also not conserved in the MSA. Furthermore, the values in PAM and the PSSM are quite high. The value in the BLOSUM62 matrix, however, lies in the mid range. These factors indicate that the mutation should not have a severe effect. <br> |
− | The predictions made by SNAP, SIFT and Polyphen are not consistent. Whereas SIFT and SNAP predict a neutral effect, Polyphen predicts a benign effect. Regarding to the above results, we tend to classify this mutation as neutral. <br> |
+ | The predictions made by SNAP, SIFT and Polyphen are not fully consistent. Whereas SIFT and SNAP predict a neutral effect, Polyphen predicts a possibly damaging effect. Based on the above results, we tend to classify this mutation as neutral. However, this prediction is not supported by striking evidence. <br> |
− | This is however not the case. HGMD states, that the mutation is associated to Metachromatic Leukodystrophy. |
||
|- |
|- |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 4 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
+ | |- |
||
| rowspan="2" | 4 |
| rowspan="2" | 4 |
||
| rowspan="2" | Trp-Cys |
| rowspan="2" | Trp-Cys |
||
Line 483: | Line 513: | ||
|} |
|} |
||
|} |
|} |
||
− | Tryptophan is a hydrophobic, aromatic amino acid while Cysteine is a hydrophilic amino acid. So the behaviour towards water changes dramatically. Also, Trp is the largest amino acid while Cys is a rather small amino acid. So the space needed for the amino acid changes also. |
+ | Tryptophan is a hydrophobic, aromatic amino acid while Cysteine is a hydrophilic amino acid. So the behaviour towards water changes dramatically. Also, Trp is the largest amino acid while Cys is a rather small amino acid. So the space needed for the amino acid changes also. Structural features and chemical properties indicate an influence on the structure and function. <br> |
− | + | The wild-type residue is not conserved across the homologs of ARSA, which could mean that it is not a very important residue. The mutation is also not located within a secondary structure element. Both observations could indicate a neutral substitution. However, all substitution matrices yield very low values for the given substitution, so the substitution is very unlikely. The predictions of SNAP, SIFT and Polyphen suggest a damaging effect, and our SNAP "all combinations" analysis also assigns importance to this residue. As for mutation 3, the results are somewhat contradictory, but because of the strong evidence from the prediction tools and the low values in the scoring matrices, we assign a deleterious effect to this mutation. <br> |
|
− | However, HGMD does not contain this mutation and dbSNP does not assign a deleterious effect. The mutation is a single nucleotide polymorphism (SNP), which - by defintion - occurs in a certain part of the population. As Metachromatic leukodystrophy is not very widespread this mutation should be a non-damaging natural variant. |
||
|- |
|- |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 5 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
+ | |- |
||
| rowspan="2" | 5 |
| rowspan="2" | 5 |
||
| rowspan="2" | Thr-Met |
| rowspan="2" | Thr-Met |
||
Line 515: | Line 558: | ||
|} |
|} |
||
|} |
|} |
||
− | Threonine is a hydrophilic amino acid while Methionine is a hydrophobic amino acid. So the behaviour towards water changes. |
+ | Threonine is a hydrophilic amino acid while Methionine is a hydrophobic amino acid. So the behaviour towards water changes. Methionine has a very long side chain while Threonine does not. So the physico-chemical features indicate that the structure of ARSA could be altered by this mutation. <br> |
− | Besides these properties, the mutation is located within a secondary structure element, the residue (Thr) is highly conserved across homologs and all prediction tools predict a deleterious effect on the enzyme's function. Further on, the values in the substitution matrices are very low, indicating a deleterious effect. |
+ | Besides these properties, the mutation is located within a secondary structure element, the residue (Thr) is highly conserved across homologs and all prediction tools predict a deleterious effect on the enzyme's function. Furthermore, the values in the substitution matrices are very low. This time everything indicates a deleterious effect, thus we predict this mutation to be harmful. <br> |
+ | |- |
||
− | HGMD assigns a deleterious effect to the mutation. |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 6 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
+ | |- |
||
|- |
|- |
||
| rowspan="2" | 6 |
| rowspan="2" | 6 |
||
Line 547: | Line 604: | ||
|} |
|} |
||
|} |
|} |
||
− | Phenylalanine and Valine are both hydrophobic amino acids. So the only impact on structure could come frome the structural differences between Phe and Val. Phe has a aromatic ring and due to that needs more space than Val. |
+ | Phenylalanine and Valine are both hydrophobic amino acids. So the only impact on structure could come from the structural differences between Phe and Val. Phe has an aromatic ring and therefore needs more space than Val. <br> |
− | + | Looking at the substitution matrices, one can notice that the scores are neither very high nor really low. The prediction methods all agree that this mutation should have no harmful effect. Furthermore, the conservation in the MSA is very low and the mutation does not disrupt a secondary structure element, which are indicators that the mutation should not have a deleterious effect. <br> |
|
− | + | In this case, we guess that this mutation should be neutral. |
|
+ | |- |
||
+ | |} |
||
+ | |||
+ | |||
+ | |||
+ | ==== Mutation 7 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
|- |
|- |
||
| rowspan="2" | 7 |
| rowspan="2" | 7 |
||
Line 579: | Line 650: | ||
|} |
|} |
||
|} |
|} |
||
− | Threonine is a hydrophilic amino acid while Isoleucine is a hydrophobic amino acid. So the behaviour towards water changes. |
+ | Threonine is a hydrophilic amino acid while Isoleucine is a hydrophobic amino acid. So the behaviour towards water changes. Furthermore they are structurally not very similar (see mutagenesis image). <br> |
− | All prediction methods except the HumVar-Mode of PolyPhen assign a functional change to this mutation |
+ | All prediction methods except the HumVar-Mode of PolyPhen assign a functional change to this mutation, which is also a clear indicator for a non-neutral effect. Our SNAP analysis also classifies this position to be important. <br> |
+ | The conservation in the MSA is not high and the mutation does not disrupt a secondary structure element, which is again an indicator that a mutation at this position might not cause any effect. However, the scores in the substitution matrices are rather low, which supports the prediction tools. <br> |
||
− | The mutation is known to cause Metachromatic Leukodystrophy. |
||
+ | Due to the stronger evidence for a non-neutral effect, we expect this mutation to be non-neutral. |
||
+ | |- |
||
+ | |} |
||
+ | |||
+ | ==== Mutation 8 ==== |
||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
|- |
|- |
||
| rowspan="2" | 8 |
| rowspan="2" | 8 |
||
Line 611: | Line 694: | ||
|} |
|} |
||
|} |
|} |
||
+ | Asparagine and Serine are both hydrophilic amino acids. They are also of similar size. The scores in the substitution matrices for this mutation are very high and the conservation in the MSA is very low, which indicates that the residue is not very important to the protein. Furthermore, the mutation does not disrupt a secondary structure element, which also favors the neutral effect hypothesis. The prediction methods do not agree on the effect of the mutation. SNAP predicts a non-neutral effect, whereas SIFT and Polyphen (HumVar) predict a neutral effect. Polyphen (HumDiv) predicts a possibly damaging effect. However, the indications for a neutral effect predominate, so we guess that this mutation has a neutral effect. |
||
− | Asparagine and Serine are both hydrophilic amino acids. Also they are almost of the same size. So the mutation should not have a very dramatic effect. |
||
+ | |- |
||
− | The scores in the substitution matrices for this mutation are very high, the conservation in the MSA is very low and the mutation is not disrupting a secondary structure elemtent but nevertheless the prediction methods do not agree on the effect of the mutation. |
||
+ | |} |
||
− | DbSNP classifies this mutation as SNP, so it should not be harmful. |
||
+ | |||
+ | ==== Mutation 9 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
|- |
|- |
||
| rowspan="2" | 9 |
| rowspan="2" | 9 |
||
Line 643: | Line 736: | ||
|} |
|} |
||
|} |
|} |
||
− | Cystein and Glycine are both hydrophilic amino acids. One difference is the |
+ | Cysteine and Glycine are both hydrophilic amino acids. One difference is the size. Gly is the smallest of the amino acids, while Cys is a little bigger. Even more importantly, Cysteine contains sulfur, which is needed to build disulfide bridges. Disulfide bridges serve as important covalent interactions in a lot of protein structures. Thus, if a Cysteine that forms a disulfide bridge is replaced by another amino acid, the structure might be dramatically changed, and this in turn is likely to affect the function. This is also reflected by the scores in the substitution matrices. Furthermore, the residue is conserved across homologs of ARSA and all prediction methods agree that this mutation is non-neutral. <br> |
+ | It does not disrupt any secondary structure elements, but all other facts indicate that this mutation has a non-neutral effect on the function. |
||
− | The conservation of Cystein is very high in the MSA and the scores in the substitution matrices are very low. Also, all 4 methods agree that this mutation changes the function of the Arylsulfatase A. |
||
+ | |- |
||
− | This mutation causes Metachromatic leukodystrophy. |
||
+ | |} |
||
+ | |||
+ | ==== Mutation 10 ==== |
||
+ | |||
+ | {| border="1" style="text-align:left; border-spacing:0;" |
||
+ | | '''Nr.''' |
||
+ | | '''mutation''' |
||
+ | | '''position''' |
||
+ | | ''' reference ''' |
||
+ | | ''' mutation ''' |
||
+ | | ''' both ''' |
||
|- |
|- |
||
| rowspan="2" | 10 |
| rowspan="2" | 10 |
||
Line 675: | Line 779: | ||
|} |
|} |
||
|} |
|} |
||
− | Arginine and Histidine are both basic amino acids |
+ | Arginine and Histidine are both basic amino acids. They strongly differ in their structure. Histidine has an aromatic ring, whereas Arginine does not. <br> |
− | + | However, the conservation of Arginine in the MSA is very low and all 4 methods agree that this mutation is neutral. The fact that the mutation does not disrupt a secondary structure element supports this idea, and the scores in the substitution matrices are quite high. This is why we conclude that this mutation should be neutral. |
|
− | The mutation is classified as SNP and due to that not disease-causing. |
||
|- |
|- |
||
|} |
|} |
||
+ | |||
+ | === Lifting the curtain - Our predictions vs. HGMD and dbSNP === |
||
+ | |||
+ | |||
+ | {| border="1" style="border-spacing:0" align="center" cellpadding="3" cellspacing="3" |
||
+ | !Mutation NR |
||
+ | !Substitution |
||
+ | !SNAP |
||
+ | !SIFT |
||
+ | !colspan="2" | PolyPhen |
||
+ | !assignment by HGMD/dbSNP |
||
+ | !our prediction |
||
+ | !result of our prediction |
||
+ | |- |
||
+ | |||||||||HumDiv||HumVar |
||
+ | |- |
||
+ | |1|| D29N ||X||X||X||X|| non-neutral || non-neutral|| true |
||
+ | |- |
||
+ | |2|| P136A ||X||X||X||X||non-neutral ||non-neutral || true |
||
+ | |- |
||
+ | |3||Q153H||-||-||/||/||non-neutral ||neutral || wrong |
||
+ | |- |
||
+ | |4||W193C||X||X||X||/||neutral || non-neutral||wrong |
||
+ | |- |
||
+ | |5||T274M||X||X||X||X|| non-neutral||non-neutral ||true |
||
+ | |- |
||
+ | |6||F356V||-||-||-||-|| neutral|| neutral||true |
||
+ | |- |
||
+ | |7||T409I||X||X||X||-||non-neutral || non-neutral||true |
||
+ | |- |
||
+ | |8||N440S||X||-||/||-|| neutral|| neutral||true |
||
+ | |- |
||
+ | |9||C489G||X||X||X||X|| non-neutral|| non-neutral ||true |
||
+ | |- |
||
+ | |10||R496H||-||-||-||-||neutral || neutral||true |
||
+ | |} |
||
+ | |||
+ | |||
+ | The table shows that our predictions are correct in 8 out of 10 cases. In most cases the prediction tools agree with each other and are right. We conclude that one can make a good prediction using the tools and methods we applied, but some uncertainty always remains as to whether a prediction is correct. This can be seen, e.g., for mutation 4, where 3 of 4 methods assign a non-neutral effect although the mutation is neutral. |
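The count from the table can be reproduced with a few lines of code. The sketch below copies the "assignment by HGMD/dbSNP" and "our prediction" columns from the table above; the variable and function names are illustrative only.

```python
# (mutation, assignment by HGMD/dbSNP, our prediction), copied from the table above.
RESULTS = [
    ("D29N", "non-neutral", "non-neutral"),
    ("P136A", "non-neutral", "non-neutral"),
    ("Q153H", "non-neutral", "neutral"),
    ("W193C", "neutral", "non-neutral"),
    ("T274M", "non-neutral", "non-neutral"),
    ("F356V", "neutral", "neutral"),
    ("T409I", "non-neutral", "non-neutral"),
    ("N440S", "neutral", "neutral"),
    ("C489G", "non-neutral", "non-neutral"),
    ("R496H", "neutral", "neutral"),
]


def count_correct(results):
    """Number of mutations for which our prediction matches the database assignment."""
    return sum(1 for _, annotated, predicted in results if annotated == predicted)


def misclassified(results):
    """Mutations for which our prediction differs from the database assignment."""
    return [name for name, annotated, predicted in results if annotated != predicted]


if __name__ == "__main__":
    print(count_correct(RESULTS), "of", len(RESULTS), "predictions are correct")
    print("misclassified:", misclassified(RESULTS))
```

With only ten mutations the resulting fraction is a rough indication and not a reliable estimate of the accuracy of the approach.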
||
=== References === |
=== References === |
||
<references /> |
<references /> |
||
+ | |||
+ | [[Category : Metachromatic_Leukodystrophy 2011]] |
Latest revision as of 14:01, 29 March 2012
Introduction
Many mutations in the human genome are suspected to have an impact on protein function. Thus, predicting the effects of these mutations on function, especially for disease-causing mutations, is a very important task. In this task, we will apply different sequence-based methods to predict mutation effects on the protein's function and then try to discriminate neutral from non-neutral mutations.
We randomly picked 10 missense mutations from dbSNP and HGMD. At this point, we act as if we did not know which of these mutations cause the disease and which do not. After having applied the methods and interpreted the results, we are going to lift the curtain and check if our guesses were correct. The mutations we picked are summarized in the table below:
Nr. | mutation | position |
1 | Asp-Asn | 29 |
2 | Pro - Ala | 136 |
3 | Gln-His | 153 |
4 | Trp-Cys | 193 |
5 | Thr-Met | 274 |
6 | Phe -Val | 356 |
7 | Thr-Ile | 409 |
8 | Asn-Ser | 440 |
9 | Cys-Gly | 489 |
10 | Arg-His | 496 |
Substitution Matrices
A first, very rough guess about the effect of a mutation can be made by looking at standard substitution matrices such as the BLOSUM and PAM matrices. Low scores in these matrices indicate that the exchange of two amino acids is rarely observed, which suggests that the amino acids have very different physico-chemical properties. Consequently, substitutions with low scores might affect the structure and/or the function of the protein.
Substitutions with a high score are observed very frequently. The properties of the two amino acids are then similar, and the substitution is not very likely to affect the protein's structure or function.
When doing this analysis, we have to keep in mind that this is a very inaccurate method to "predict" the impact of a certain mutation: these matrices are calculated from many proteins, which evens out effects specific to our protein or protein family. Still, it can give a first guess as to whether the mutation is likely to occur in general or not.
We extracted the scores for our mutations from BLOSUM62, PAM1 and PAM250 and summarized them in the following table. Additionally, we extracted the lowest possible score for any substitution of the amino acid of interest.
Nr. | Substitution | BLOSUM62 | PAM1 | PAM250 |
---|---|---|---|---|
1 | Asp(D) -> Asn(N) | 1 (worst: -4) | 36 (worst: 0) | 7 (worst: 0) |
2 | Pro(P) -> Ala(A) | -1 (worst: -4) | 22 (worst: 0) | 11 (worst: 0) |
3 | Gln(Q) -> His(H) | 0 (worst: -3) | 20 (worst: 0) | 7 (worst: 0) |
4 | Trp(W) -> Cys(C) | -2 (worst: -4) | 0 (worst: 0) | 1 (worst: 1) |
5 | Thr(T) -> Met(M) | -1 (worst: -3) | 2 (worst: 0) | 1 (worst: 0) |
6 | Phe(F) -> Val(V) | -1 (worst: -4) | 1 (worst: 0) | 10 (worst: 1) |
7 | Thr(T) -> Ile(I) | -2 (worst: -3) | 7 (worst: 0) | 4 (worst: 0) |
8 | Asn(N) -> Ser(S) | 1 (worst: -4) | 34 (worst: 0) | 8 (worst: 0) |
9 | Cys(C) -> Gly(G) | -3 (worst: -4) | 1 (worst: 0) | 4 (worst: 0) |
10 | Arg(R) -> His(H) | 0 (worst: -3) | 8 (worst: 0) | 5 (worst: 1) |
PSI-BLAST
An improvement over the standard substitution matrices above is a substitution matrix that is specific to our protein and its homologs. Such a matrix can be obtained by executing a PSI-BLAST search. To infer the position-specific sequence profile, we executed PSI-BLAST with the following command:
blastpgp -i ARSA.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Q psiblast.mat -o psiblast_eval10E_6.it.5.new.txt
The graphic shows the relevant lines of the profile matrix regarding our mutated positions. The scores of interest - which score our mutation substitutions - are highlighted in green.
Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V
29 D -5 -5 -2 8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.49 1.56
153 Q 3 2 -1 4 -4 -1 -1 -2 0 -2 -3 -3 4 -2 -3 -1 -2 -3 -2 -2 26 10 3 23 0 3 3 3 2 2 1 1 13 2 1 3 2 0 1 2 0.53 1.48
274 T -3 -4 -3 -4 -2 -4 -4 -5 -5 -4 -4 -4 -3 -5 -4 1 8 -6 -5 -3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 92 0 0 0 1.94 1.62
409 T -1 0 0 -1 -2 -1 -1 0 -1 -1 -1 0 -1 -1 3 0 1 6 0 -1 5 5 5 4 1 3 4 8 1 3 6 5 1 2 13 6 8 11 3 4 0.26 0.95
489 C 2 -1 1 -4 8 -4 -4 -2 -1 -1 -2 -3 -1 -4 -4 0 0 5 -1 -3 15 4 8 0 36 0 0 2 1 3 3 1 1 0 0 6 5 9 2 0 0.99 1.22
440 N -5 -3 6 5 -6 -2 -1 -4 -3 -6 -6 -3 -6 -6 2 -2 -3 -6 -6 -5 0 1 46 36 0 1 2 0 0 0 0 1 0 0 10 1 1 0 0 0 1.48 1.67
356 F -3 -1 -5 -5 -3 0 -1 -6 1 3 0 -1 0 2 -6 -3 -2 -3 5 3 1 4 0 0 1 5 4 0 3 18 8 5 2 8 0 1 2 0 20 20 0.59 1.62
193 W -2 4 2 3 -5 0 0 -2 0 -3 -4 1 -3 -1 -2 -1 -2 1 1 -3 3 25 11 16 0 4 5 3 2 2 1 7 0 2 2 4 2 2 5 2 0.46 1.45
136 P -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7 9 -4 -4 -7 -6 -5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 98 0 0 0 0 0 3.03 1.61
496 R -3 1 0 -3 -4 1 1 -1 1 -3 1 1 -2 2 4 0 -3 -1 -1 -3 1 7 4 1 0 5 10 4 3 1 16 9 0 9 20 8 1 1 1 1 0.34 0.96
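Reading the score of a single substitution from such a profile can be sketched as follows. This is a minimal illustration, not the procedure used above: the helper name `pssm_score` is hypothetical, and the two rows are the log-odds parts of the rows for positions 29 and 136 copied from the matrix above.

```python
# Column order of the log-odds block in the ASCII PSSM shown above.
AMINO_ACIDS = "A R N D C Q E G H I L K M F P S T W Y V".split()


def pssm_score(row: str, mutant: str) -> int:
    """Return the log-odds score of `mutant` from one PSSM row.

    A row consists of the position, the wild-type residue and 20 log-odds
    scores; any further columns (percentages, information) are ignored.
    """
    fields = row.split()
    scores = [int(value) for value in fields[2:22]]
    return scores[AMINO_ACIDS.index(mutant)]


# Log-odds part of the rows for positions 29 and 136 (copied from above).
row_29 = "29 D -5 -5 -2 8 -7 -3 -1 -4 -4 -6 -7 -4 -6 -7 -5 -3 -4 -7 -6 -6"
row_136 = "136 P -3 -5 -5 -5 -6 -4 -4 -5 -5 -6 -6 -4 -6 -7 9 -4 -4 -7 -6 -5"

print("D29N:", pssm_score(row_29, "N"))
print("P136A:", pssm_score(row_136, "A"))
```

Negative values mean that the mutant residue is seen less often at this position among the homologs than expected by chance.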
Multiple sequence alignments
Another interesting feature is the conservation of the wild-type and mutant residues of our protein in the sequences of homologs. To calculate this, we first downloaded the HSSP file for ARSA to get all proteins that are homologous to it. Then we downloaded all mammalian protein sequences from UniProt by searching for the term taxonomy:40674, which selects all mammalian protein sequences. We saved all sequences in one multi-FASTA file. Then we extracted all mammalian proteins homologous to human ARSA by mapping the IDs from the HSSP file to sequence IDs in the multi-FASTA file. This yielded 75 mammalian sequences homologous to human ARSA.
Next, we calculated a multiple sequence alignment of these proteins (including ARSA) with Muscle. The Jalview image of the alignment is shown below.
The following table shows the conservation of the original amino acid of the reference sequence and of the mutant amino acid at the respective positions.
pos | conservation - reference | conservation - mutant |
---|---|---|
29 | 0.86 | 0 |
153 | 0.14 | 0 |
274 | 0.87 | 0 |
409 | 0.35 | 0.16 |
489 | 0.80 | 0.05 |
193 | 0.13 | 0 |
356 | 0.15 | 0 |
440 | 0.15 | 0 |
496 | 0.14 | 0.01 |
136 | 0.93 | 0 |
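How such conservation values can be derived from an alignment is sketched below. The exact measure used for the table is not stated, so this sketch assumes that conservation is simply the fraction of aligned sequences carrying a given residue in the column; the function name and the toy alignment are hypothetical and not taken from the ARSA data.

```python
def column_frequency(alignment: list[str], column: int, residue: str) -> float:
    """Fraction of aligned sequences that carry `residue` at `column` (0-based)."""
    hits = sum(1 for sequence in alignment if sequence[column] == residue)
    return hits / len(alignment)


# Toy alignment (hypothetical sequences, not the real ARSA homologs).
toy_alignment = [
    "MDAP",
    "MDSP",
    "MNAP",
    "M-AP",
]

print(column_frequency(toy_alignment, 1, "D"))  # wild-type residue
print(column_frequency(toy_alignment, 1, "N"))  # mutant residue
```

With a real alignment, the position in the protein first has to be mapped to the corresponding alignment column, because gaps shift the indices.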
Secondary Structure
Secondary structure is an important structural feature of the protein, which also stabilizes the overall tertiary structure and is therefore important for the proper functioning of the protein. Mutations that are located within secondary structure elements might destroy the secondary structure and might therefore have an impact on the protein function. To consider the position of the mutations relative to the secondary structure of ARSA, we generated the following map:
As one can see in the picture above, none of the mutations is in the middle of a secondary structure element. Only mutations 1, 2, 4 and 5 are close to or - depending on the prediction method - at the border of secondary structure elements.
Prediction of effect
SNAP
SNAP uses a neural-network approach to predict effects of single amino acid substitutions on protein function. It uses in silico derived protein information - like secondary structure, conservation, solvent accessibility, etc. - for the prediction. <ref> SNAP: predict effect of non-synonymous polymorphisms on function. Yana Bromberg and Burkhard Rost Nucleic Acids Research, 2007, Vol. 35, No. 11 3823-3835 </ref>
We ran SNAP using the following command:
snapfun -i ARSA.fasta -m mutants.txt -o snap.out
output:
nsSNP Prediction Reliability Index Expected Accuracy
----- ------------ ------------------- -------------------
D29N Non-neutral 7 96%
Q153H Neutral 0 53%
T274M Non-neutral 6 93%
T409I Non-neutral 1 63%
C489G Non-neutral 5 87%
W193C Non-neutral 3 78%
F356V Neutral 1 60%
N440S Non-neutral 2 70%
R496H Neutral 1 60%
P136A Non-neutral 4 82%
SNAP predicts three of our mutations to be neutral and the other seven to be non-neutral. In order to analyze all possible amino acid substitutions at the mutated positions above, we used the Generate Mutants tool on http://rostlab.org/services/snap/submit to create all possible exchanges from the following pattern: referenceAminoAcidPosition*. Then we executed SNAP again:
snapfun -i ARSA.fasta -m all_mutants.txt -o snap_all.out
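What the expansion of the pattern referenceAminoAcidPosition* amounts to can be sketched as follows. This is only an illustration with a hypothetical helper name; it assumes the notation used for the mutations on this page (e.g. D29N) and does not reproduce the Generate Mutants tool itself or the exact file format expected by snapfun.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def all_substitutions(wild_type: str, position: int) -> list[str]:
    """List the 19 possible substitutions at one position, e.g. D29A, D29R, ..."""
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


mutants = all_substitutions("D", 29)
print(len(mutants))
print(mutants[:3])
```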
Next, we wrote a Perl script to parse the SNAP output and summarize it in the following table, which shows which amino acid substitutions are Non-neutral or Neutral. We consider a residue as important if 66-100 % of all possible substitutions are Non-neutral, as probably important if 33-66 % of all possible substitutions are Non-neutral, and as not important if 0-33 % of all possible substitutions are Non-neutral.
ref\mutation | important | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D29 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
Q153 | yes | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
T274 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
T409 | yes | Neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
C489 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
W193 | yes | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
F356 | probably | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Neutral | Neutral | Neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Neutral | |
N440 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | |
R496 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | Neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Neutral | Non-neutral | |
P136 | yes | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral | Non-neutral |
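The classification rule behind the "important" column can be sketched as follows. This is a Python illustration rather than the Perl script mentioned above; the function name is hypothetical, and because the stated ranges overlap at 33 % and 66 %, assigning the boundary values to the higher class is an assumption.

```python
def importance(predictions: list[str]) -> str:
    """Classify a residue from the SNAP calls of all its substitutions."""
    fraction = predictions.count("Non-neutral") / len(predictions)
    if fraction >= 0.66:  # assumption: boundary values go to the higher class
        return "yes"
    if fraction >= 0.33:
        return "probably"
    return "no"


# Counts taken from the F356 and Q153 rows of the table above.
f356 = ["Non-neutral"] * 12 + ["Neutral"] * 7
q153 = ["Non-neutral"] * 15 + ["Neutral"] * 4

print("F356:", importance(f356))
print("Q153:", importance(q153))
```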
SIFT
SIFT predicts the effect of amino acid substitutions by building a multiple alignment and then calculating the probability of each possible substitution. The score in the SIFT-output is the probability of the substitution. SIFT predicts a substitution as damaging if this probability is <= 0.05 and as tolerated if the probability is > 0.05. The median conservation in the output measures the diversity of the sequences used in the multiple alignment. It should be between 2.75 and 3.25. Higher values indicate that the sequences were too closely related. <ref>http://sift.jcvi.org/www/SIFT_help.html</ref> We used SIFT with the UniProt-TrEMBL 2009 Database and uploaded a file containing our chosen mutations:
D29N P136A Q153H W193C T274M F356V T409I N440S C489G R496H
As median conservation we used the standard parameter 3.00 and we excluded all sequences with a sequence identity higher than 90%.
Mutation NR | Substitution | predicted | score | median conservation | comment |
---|---|---|---|---|---|
1 | D29N | AFFECT PROTEIN FUNCTION | 0.00 | 3.04 | |
2 | P136A | AFFECT PROTEIN FUNCTION | 0.00 | 3.07 | |
3 | Q153H | TOLERATED | 0.29 | 3.04 | |
4 | W193C | AFFECT PROTEIN FUNCTION | 0.04 | 3.04 | |
5 | T274M | AFFECT PROTEIN FUNCTION | 0.00 | 3.04 | |
6 | F356V | TOLERATED | 0.81 | 3.04 | |
7 | T409I | AFFECT PROTEIN FUNCTION | 0.02 | 3.48 | low confidence |
8 | N440S | TOLERATED | 0.07 | 3.08 | |
9 | C489G | AFFECT PROTEIN FUNCTION | 0.00 | 3.56 | low confidence |
10 | R496H | TOLERATED | 0.28 | 3.56 |
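The decision rule described above (damaging if the score is <= 0.05, tolerated otherwise) can be written down in a few lines. This is a minimal sketch with a hypothetical function name; the scores are copied from the table above and the output labels follow the SIFT output shown there.

```python
def sift_call(score: float, threshold: float = 0.05) -> str:
    """Apply the SIFT cutoff: scores at or below the threshold are damaging."""
    return "AFFECT PROTEIN FUNCTION" if score <= threshold else "TOLERATED"


# Scores copied from the table above.
scores = {"W193C": 0.04, "N440S": 0.07, "F356V": 0.81}
for mutation, score in scores.items():
    print(mutation, sift_call(score))
```

W193C and N440S lie close to the cutoff on either side, so their calls are less robust than, for example, the call for F356V.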
PolyPhen
PolyPhen predicts whether a mutation is damaging or not by using a Naïve Bayes approach. The score is the posterior probability that the mutation is damaging.<ref>http://genetics.bwh.harvard.edu/pph2/dokuwiki/overview</ref> We used PolyPhen with standard parameters. The results are shown below.
Summary of the prediction results
To compare the results of the different prediction methods, we created the table below. If a mutation was predicted to have an effect, an "X" was set; if a mutation was predicted to have no effect, a "-" was set. For PolyPhen, "X" means "damaging" or "probably damaging", "/" means "possibly damaging" and "-" means "benign".
Mutation NR | Substitution | SNAP | SIFT | PolyPhen (HumDiv) | PolyPhen (HumVar) |
---|---|---|---|---|---|
1 | D29N | X | X | X | X |
2 | P136A | X | X | X | X |
3 | Q153H | - | - | / | / |
4 | W193C | X | X | X | / |
5 | T274M | X | X | X | X |
6 | F356V | - | - | - | - |
7 | T409I | X | X | X | - |
8 | N440S | X | - | / | - |
9 | C489G | X | X | X | X |
10 | R496H | - | - | - | - |
Summary and Discussion
In this section we compare the results of the previous analyses and additionally use PyMOL mutagenesis images and the physico-chemical properties of the amino acids to make our final guess about the impact of each mutation. All mutations are listed below, together with a PyMOL mutagenesis image and a description of the properties of the mutation. We also included short summary tables of the methods we applied and added a short discussion/interpretation of the results. For a detailed description of the summary tables, please read the individual sections.
Mutation 1
Nr. | mutation | position | reference | mutation | both |
1 | Asp-Asn | 29 | | | |
Description of Asp-Asn
Aspartic acid is an acidic amino acid while Asparagine is a hydrophilic amino acid. So the mutation changes the behaviour towards water as well as towards the pH. The lysosomal enzyme ARSA is active at a very low pH, thus acidic amino acids are preferred in this environment. Consequently, the effect could be deleterious. This hypothesis is supported by all prediction methods, although the substitution matrices show comparatively high scores for this exchange. The mutation is located at the border of a beta sheet, which is also an indicator of a possible deleterious effect. In addition, the conservation of the amino acid is very high in the MSA of related sequences, which indicates that the residue is quite important. Furthermore, it is classified as an important residue by our SNAP analysis of all possible mutants, i.e. most of the substitutions lead to a deleterious effect.
Mutation 2
Nr. | mutation | position | reference | mutation | both |
2 | Pro-Ala | 136 | | | |
Description of Pro-Ala
Proline and Alanine are both hydrophobic amino acids. In contrast to mutation 1, the behaviour towards water does not change. As Proline is a cyclic amino acid, it can "break" alpha-helices and is structurally very important. Here it is even located at the border of an alpha-helix. Thus, the change to the small amino acid Alanine could introduce a big structural change, despite the similarity of their chemical properties. This structural change might occur, e.g., due to an extension of the alpha-helix.
Mutation 3
Nr. | mutation | position | reference | mutation | both |
3 | Gln-His | 153 | | | |
Description of Gln-His
Glutamine is a hydrophilic amino acid while Histidine is a basic amino acid. So the behaviour towards water changes as well as the charge of the amino acid. Furthermore, Glutamine and Histidine are very different in structure. Histidine is bigger and needs much more space than Glutamine, which could have an influence on the structure of ARSA (see the PyMOL images above).
Mutation 4
Nr. | mutation | position | reference | mutation | both |
4 | Trp-Cys | 193 | | | |
Description of Trp-Cys
Tryptophan is a hydrophobic, aromatic amino acid while Cysteine is a hydrophilic amino acid. So the behaviour towards water changes dramatically. Also, Trp is the largest amino acid while Cys is a rather small amino acid, so the space needed for the residue changes as well. Structural features and chemical properties indicate an influence on the structure and function.
Mutation 5
Nr. | mutation | position | reference | mutation | both |
5 | Thr-Met | 274 | | | |
Description of Thr-Met
Threonine is a hydrophilic amino acid while Methionine is a hydrophobic amino acid. So the behaviour towards water changes. Also, Methionine has a very long side chain while Threonine does not. So the physico-chemical features indicate that the structure of ARSA could be altered by this mutation.
Mutation 6
Nr. | mutation | position | reference | mutation | both |
6 | Phe-Val | 356 | | | |
Description of Phe-Val
Phenylalanine and Valine are both hydrophobic amino acids. So the only impact on structure could come from the structural differences between Phe and Val. Phe has an aromatic ring and therefore needs more space than Val.
Mutation 7
Nr. | mutation | position | reference | mutation | both |
7 | Thr-Ile | 409 | | | |
Description of Thr-Ile
Threonine is a hydrophilic amino acid while Isoleucine is a hydrophobic amino acid. So the behaviour towards water changes. Furthermore, they are structurally not very similar (see mutagenesis image).
Mutation 8
Nr. | mutation | position | reference | mutation | both |
8 | Asn-Ser | 440 | | | |
Description of Asn-Ser
Asparagine and Serine are both hydrophilic amino acids and of similar size. The scores in the substitution matrices for this mutation are very high and the conservation in the MSA is very low, which indicates that the residue is not very important to the protein. Furthermore, the mutation does not disrupt a secondary structure element, which also favors the neutral-effect hypothesis. The prediction methods do not agree on the effect of the mutation: SNAP predicts a non-neutral effect, whereas SIFT and PolyPhen (HumVar) predict a neutral effect and PolyPhen (HumDiv) predicts a possibly damaging effect. However, the indications for a neutral effect predominate, so we guess that this mutation has a neutral effect.
Mutation 9
Nr. | mutation | position | reference | mutation | both |
9 | Cys-Gly | 489 | | | |
Description of Cys-Gly
Cysteine and Glycine are both hydrophilic amino acids. One difference is the size: Gly is the smallest of the amino acids, while Cys is a little bigger. Even more important, Cysteine contains sulfur, which is needed to form disulfide bridges. Disulfide bridges serve as important covalent interactions in many protein structures. Thus, if a Cysteine that forms a disulfide bridge is replaced by another amino acid, the structure might change dramatically, which in turn is likely to affect the function. This is also reflected by the scores in the substitution matrices. Furthermore, the residue is conserved across homologs of ARSA and all prediction methods agree that this mutation is non-neutral.
Mutation 10
Nr. | mutation | position | reference | mutation | both |
10 | Arg-His | 496 | | | |
Description of Arg-His
Arginine and Histidine are both basic amino acids. They differ strongly in their structure: Histidine has an aromatic ring, whereas Arginine does not.
Lifting the curtain - Our predictions vs. HGMD and dbSNP
Mutation NR | Substitution | SNAP | SIFT | PolyPhen (HumDiv) | PolyPhen (HumVar) | assignment by HGMD/dbSNP | our prediction | result of our prediction |
---|---|---|---|---|---|---|---|---|
1 | D29N | X | X | X | X | non-neutral | non-neutral | true |
2 | P136A | X | X | X | X | non-neutral | non-neutral | true |
3 | Q153H | - | - | / | / | non-neutral | neutral | wrong |
4 | W193C | X | X | X | / | neutral | non-neutral | wrong |
5 | T274M | X | X | X | X | non-neutral | non-neutral | true |
6 | F356V | - | - | - | - | neutral | neutral | true |
7 | T409I | X | X | X | - | non-neutral | non-neutral | true |
8 | N440S | X | - | / | - | neutral | neutral | true |
9 | C489G | X | X | X | X | non-neutral | non-neutral | true |
10 | R496H | - | - | - | - | neutral | neutral | true |
The table shows that our predictions are correct in 8 out of 10 cases. In most cases the prediction tools agree in their prediction and they are right. We can conclude that one can make a good prediction using the tools and methods we applied, but some uncertainty remains as to whether the prediction is correct. This can be seen, e.g., for mutation 4, where 3 of 4 methods assign a non-neutral effect, but the mutation is neutral.
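The count of correct predictions can be reproduced from the last table with a few lines. This is only a bookkeeping sketch; the tuples are copied from the columns "assignment by HGMD/dbSNP" and "our prediction", and the variable names are chosen for illustration.

```python
# (substitution, assignment by HGMD/dbSNP, our prediction), copied from the table above.
results = [
    ("D29N", "non-neutral", "non-neutral"),
    ("P136A", "non-neutral", "non-neutral"),
    ("Q153H", "non-neutral", "neutral"),
    ("W193C", "neutral", "non-neutral"),
    ("T274M", "non-neutral", "non-neutral"),
    ("F356V", "neutral", "neutral"),
    ("T409I", "non-neutral", "non-neutral"),
    ("N440S", "neutral", "neutral"),
    ("C489G", "non-neutral", "non-neutral"),
    ("R496H", "neutral", "neutral"),
]

# Split the substitutions into correctly and incorrectly predicted ones.
correct = [name for name, assigned, predicted in results if assigned == predicted]
wrong = [name for name, assigned, predicted in results if assigned != predicted]

print(f"{len(correct)} of {len(results)} predictions correct")
print("wrong:", ", ".join(wrong))
```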
References
<references />