Sequence-based mutation analysis

From Bioinformatikpedia

Latest revision as of 15:10, 30 August 2011

by Robert Greil and Cedric Landerer

SNPs

Figure 1: HUMAN_HFE with the analysed SNPs (HGMD: green, dbSNP: red, both: blue)

Because the HFE gene has no annotated functional site, we can only address the biochemical changes caused by each SNP; a change in functionality or stability cannot be described.
To compare the biochemical properties of the amino acids, we used Wolfram Alpha and the Wikipedia entry on amino acids<ref>http://en.wikipedia.org/wiki/Amino_acid</ref>. All SNPs are visualized in Figure 1.

Mutation Position Database BLOSUM62 PAM1 PAM250 Physicochemical changes
S/C 65 HGMD/dbSNP -1 5 3 There is no change in size or charge; only the polarity changes, because an oxygen atom is replaced by a sulfur atom.
I/T 105 HGMD/dbSNP -1 11 6 The ethyl group is replaced by an OH group, which leads to a small change in size and a large change in hydrophobicity.
Q/H 127 HGMD/dbSNP 0 20 7 Because of its aromatic ring, histidine reduces the flexibility at this position. Polarity and hydrophobicity remain the same.
C/Y,S 282 HGMD/dbSNP -2/-1 3/11 3/7 In both cases the hydrophobicity changes: cysteine is nonpolar, whereas serine and tyrosine are polar.
R/M 330 HGMD -1 1 1 The side-chain charge changes from positive to neutral. The size also changes, but this should have a smaller impact than the loss of charge.
A/V 176 HGMD 0 13 9 Alanine and valine have very similar properties; only the side-chain size and the hydrophobicity increase, and both remain water soluble.
R/S 6 HGMD -1 11 6 The side chain loses its positive charge, which affects the water solubility at the protein surface. The side chain also becomes smaller.
T/I 217 dbSNP -1 7 4 The OH group is replaced by an ethyl group, which leads to a small change in size and a large change in hydrophobicity.
M/T 35 dbSNP -1 6 5 The polarity changes (non-polar to polar), which affects the water solubility at the protein surface.
R/M 58 dbSNP -1 1 1 The side-chain charge changes from positive to neutral. The size also changes, but this should have a smaller impact than the loss of charge.

A change in polarity matters mainly for residues at the protein surface, because it alters the hydrophobicity and therefore increases or decreases the water solubility of the protein.
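The comparison above was done qualitatively with Wolfram Alpha and Wikipedia. As a quantitative illustration only (a swapped-in technique, not the method used on this page), the hydropathy change of each substitution can be computed with the Kyte-Doolittle scale, where positive values mean "more hydrophobic". The function name `hydropathy_change` is our own, hypothetical helper.

```python
# Kyte & Doolittle (1982) hydropathy index; positive = hydrophobic.
# Swapped-in scale for illustration; the analysis above did not use it.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(wild_type: str, mutant: str) -> float:
    """Mutant minus wild-type hydropathy; > 0 means the residue becomes more hydrophobic."""
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type], 1)


SUBSTITUTIONS = [("S", 65, "C"), ("I", 105, "T"), ("Q", 127, "H"), ("C", 282, "Y"),
                 ("C", 282, "S"), ("R", 330, "M"), ("A", 176, "V"), ("R", 6, "S"),
                 ("T", 217, "I"), ("M", 35, "T")]

for wt, pos, mut in SUBSTITUTIONS:
    print(f"{wt}{pos}{mut}: {hydropathy_change(wt, mut):+.1f}")
```

For example, A176V yields +2.4 (valine is more hydrophobic than alanine) and I105T yields -5.2.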

Remark

We were only able to visualize 7 of our 10 mutations (Figure 1), for the following reasons.

The mutation at position 6 lies inside the signal peptide and is therefore not included in our model.

The UniProt reference sequence has an F at position 58, whereas the SNP at this position is annotated as an R to M change. After comparing the accession number with the OMIM<ref>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1852721/?tool=pubmed</ref> database, we believe that the position is annotated incorrectly in dbSNP<ref> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC29783/?tool=pubmedf</ref>. OMIM reports this SNP at position 330, so we treated the two SNPs as the same and ignored the position-58 SNP. Because we had no other SNPs occurring only in dbSNP, and HGMD<ref>http://link.springer.de/link/service/journals/00439/papers/6098005/60980629.pdf</ref> annotates only disease-related SNPs, we did not replace this SNP with another one.

We were also unable to visualize the R/M SNP at position 330 with our PDB structure 1A6Z (chain A). We tried to align the UniProt sequence to chains A and B of 1A6Z, but beyond the end of chain A the alignment was poorly conserved and contained many gaps. Therefore we did not visualize the SNP at position 330.
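The image labels in the discussion (e.g. UProt(35) pdb(13), UProt(282) pdb(260)) imply a constant offset of 22 between UniProt and PDB residue numbering. A minimal sketch of that mapping, assuming the offset is the same for every modelled residue; `OFFSET` and `uniprot_to_pdb` are hypothetical names of our own:

```python
from typing import Optional

# Offset between UniProt and PDB (1A6Z) numbering, inferred from the image
# labels on this page, e.g. UProt(35) -> pdb(13). Assumed to be constant.
OFFSET = 22


def uniprot_to_pdb(uniprot_pos: int, offset: int = OFFSET) -> Optional[int]:
    """Return the PDB residue number for a UniProt position.

    Returns None when the residue lies before the first modelled residue,
    as is the case for the signal-peptide position 6.
    """
    pdb_pos = uniprot_pos - offset
    return pdb_pos if pdb_pos >= 1 else None


# UniProt -> PDB pairs as labelled in the superposition images.
LABELLED_PAIRS = {35: 13, 65: 43, 105: 83, 127: 105, 176: 154, 217: 195, 282: 260}
for uniprot_pos, pdb_pos in LABELLED_PAIRS.items():
    assert uniprot_to_pdb(uniprot_pos) == pdb_pos

print(uniprot_to_pdb(6))  # None: not covered by the structural model
```

This sketch cannot tell whether a position lies beyond the last residue of chain A (as for position 330); that check would need the chain length from the PDB file.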

SNP analysis by hand

First we searched a non-redundant database for homologous sequences in mammals. The result list can be found here. We then created a multiple sequence alignment with COBALT to calculate the conservation of the SNP positions.


Mutation Position Conservation (Jalview) Secondary Structure (UniProt)
S/C 65 7 Beta-Strand
I/T 105 7 Helix
Q/H 127 2 ---
C/Y,S 282 11 Beta-Strand
R/M 330 6 ---
A/V 176 7 Helix
R/S 6 0 ---
T/I 217 0 ---
M/T 35 0 Beta-Strand

We removed all unsuitable sequences, such as very short ones and sequences with an obvious large insertion or deletion.
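The filtering and the per-position conservation lookup can be sketched as follows. This is not Jalview's conservation score (which is property-based, 0 to 11); as a simpler stand-in it reports the fraction of sequences that carry the reference residue in a column. The 20% length tolerance, the toy alignment and all function names are hypothetical.

```python
def filter_by_length(seqs: dict[str, str], query_len: int,
                     tolerance: float = 0.2) -> dict[str, str]:
    """Keep sequences whose ungapped length is within +/- tolerance of the query length."""
    low, high = query_len * (1 - tolerance), query_len * (1 + tolerance)
    return {name: seq for name, seq in seqs.items()
            if low <= len(seq.replace("-", "")) <= high}


def ref_pos_to_column(ref_aligned: str, pos: int) -> int:
    """Translate a 1-based residue position of the gapped reference into a 0-based column."""
    residues_seen = 0
    for column, char in enumerate(ref_aligned):
        if char != "-":
            residues_seen += 1
            if residues_seen == pos:
                return column
    raise ValueError(f"position {pos} lies beyond the reference sequence")


def column_identity(alignment: list[str], column: int) -> float:
    """Fraction of aligned sequences carrying the reference (first row) residue."""
    ref_residue = alignment[0][column]
    return sum(seq[column] == ref_residue for seq in alignment) / len(alignment)


# Toy alignment (hypothetical); the first row is the reference.
toy_alignment = ["MR-SC", "MRASC", "MK-SY", "MR-TC"]
col = ref_pos_to_column(toy_alignment[0], 4)     # residue 4 of the reference (C)
print(col, column_identity(toy_alignment, col))  # 4 0.75
```

Mapping residue positions to alignment columns first matters because gaps shift the column index away from the sequence position.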

PSI-BLAST<ref>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647318/?tool=pubmed</ref>

  • command line: blastpgp -i hfe.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Qpsiblast.mat -o psiblast.out
Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gap-less real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
   6 R   -2  2 -3 -4 -2 -2 -3 -4 -3  1  4 -2  1  0 -4 -3 -2 -3 -2  1    0  17   0   0   0   0   0   0   0   0  72   0   0   3   0   0   0   0   0   8  0.51 0.24
  35 M    0 -2 -1 -2 -2 -1 -2 -2 -2 -1 -1 -2  3 -2 -2  0  5 -3 -2  0    3   0   1   0   0   0   0   0   0   0   0   0  17   0   0   0  78   0   0   1  0.66 0.25
  65 S    1 -2  0 -1 -2 -1 -1 -1 -2 -3 -3 -1 -2 -3 -2  5  1 -4 -3 -2    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   0   0   0   0  0.80 0.25
 105 I    1 -3 -4 -4 -2 -3 -3 -3 -3  2  4 -3  1 -1 -3 -2 -2 -3 -2  1   19   0   0   0   0   0   0   0   0  17  64   0   0   0   0   0   0   0   0   1  0.50 0.25
 127 Q   -1 -2 -1 -2 -3  2 -1  6 -2 -4 -4 -2 -3 -4 -3 -1 -2 -3 -4 -4    0   0   0   0   0  17   2  81   0   0   0   0   0   0   0   0   0   0   0   0  1.06 0.25
 176 A    5 -2 -2 -2 -1 -1 -1  0 -2 -2 -2 -1 -2 -3 -2  0 -1 -3 -3 -1   96   0   0   0   0   0   4   1   0   0   0   0   0   0   0   0   0   0   0   0  0.72 0.25
 217 T    0 -2  0 -1 -2 -1 -1 -1 -2 -3 -3 -1 -2 -3 -1  5  3 -4 -2 -2    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  83  17   0   0   0  0.70 0.25
 282 C   -1 -4 -4 -4 10 -4 -5 -3 -4 -2 -2 -4 -2 -3 -4 -2 -2 -3 -3 -2    0   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  1.89 0.25
 330 R   -3  2 -4 -4  3 -2 -3 -3 -3 -3 -3 -2 -2  0 -4 -3 -3 11  1 -3    0  19   0   0  10   0   0   0   0   0   0   0   0   0   0   0   0  70   1   0  1.90 0.26

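Almost all of our mutations have negative scores in this position-specific matrix, i.e. the mutant residue is observed at that position less often than expected by chance, which suggests that the substitution is not tolerated. The exception is M35T: threonine scores +5 at position 35.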
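Reading a score off the matrix can be automated. A minimal sketch, assuming the row layout shown above (position, wild-type residue, then 20 log-odds scores in the column order ARNDCQEGHILKMFPSTWYV); only the log-odds part of four rows is copied here, and `parse_pssm`/`pssm_score` are hypothetical helpers:

```python
ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # column order of the ASCII PSSM above

# Log-odds columns of four rows from the matrix above.
PSSM_ROWS = """
  6 R   -2  2 -3 -4 -2 -2 -3 -4 -3  1  4 -2  1  0 -4 -3 -2 -3 -2  1
 35 M    0 -2 -1 -2 -2 -1 -2 -2 -2 -1 -1 -2  3 -2 -2  0  5 -3 -2  0
217 T    0 -2  0 -1 -2 -1 -1 -1 -2 -3 -3 -1 -2 -3 -1  5  3 -4 -2 -2
282 C   -1 -4 -4 -4 10 -4 -5 -3 -4 -2 -2 -4 -2 -3 -4 -2 -2 -3 -3 -2
"""


def parse_pssm(text: str) -> dict[int, tuple[str, dict[str, int]]]:
    """Map each position to (wild-type residue, {amino acid: log-odds score})."""
    pssm = {}
    for line in text.strip().splitlines():
        fields = line.split()
        pos, wild_type = int(fields[0]), fields[1]
        scores = [int(x) for x in fields[2:22]]  # percentage columns, if present, are ignored
        pssm[pos] = (wild_type, dict(zip(ALPHABET, scores)))
    return pssm


def pssm_score(pssm, wild_type: str, pos: int, mutant: str) -> int:
    """Log-odds score of the mutant residue, after checking the wild type matches."""
    ref, scores = pssm[pos]
    if ref != wild_type:
        raise ValueError(f"PSSM has {ref} at position {pos}, not {wild_type}")
    return scores[mutant]


pssm = parse_pssm(PSSM_ROWS)
for wt, pos, mut in [("R", 6, "S"), ("M", 35, "T"), ("T", 217, "I"),
                     ("C", 282, "Y"), ("C", 282, "S")]:
    print(f"{wt}{pos}{mut}: {pssm_score(pssm, wt, pos, mut):+d}")
```

This prints -3, +5, -3, -3 and -2 respectively, matching the negative scores for all mutations except M35T.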

SIFT

SIFT (Sorting Intolerant From Tolerant) exploits the observation that functionally important amino acids are more conserved, and therefore less mutable, than unimportant ones. It predicts whether a mutation affects protein function. SIFT was described by P. Ng and S. Henikoff in 2003.

SIFT marked its prediction with a low-confidence warning because the sequences used for the prediction were not diverse enough.

Mutation Position Prediction Score
S/C 65 AFFECT PROTEIN FUNCTION 0.00
I/T 105 AFFECT PROTEIN FUNCTION 0.00
Q/H 127 TOLERATED 0.16
C/Y,S 282 AFFECT PROTEIN FUNCTION 0.00
R/M 330 TOLERATED 0.06
A/V 176 AFFECT PROTEIN FUNCTION 0.01
R/S 6 AFFECT PROTEIN FUNCTION 0.01
T/I 217 TOLERATED 1.00
M/T 35 TOLERATED 1.00
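The calls in this table follow from the scores via SIFT's commonly used cutoff of 0.05: scores at or below it are reported as affecting protein function, higher scores as tolerated. A small sketch reproducing the table under that assumption (`sift_call` is a hypothetical name):

```python
SIFT_CUTOFF = 0.05  # commonly used SIFT threshold; at or below = damaging

# Scores copied from the table above.
SIFT_SCORES = {
    "S65C": 0.00, "I105T": 0.00, "Q127H": 0.16, "C282Y,S": 0.00,
    "R330M": 0.06, "A176V": 0.01, "R6S": 0.01, "T217I": 1.00, "M35T": 1.00,
}


def sift_call(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Translate a SIFT score into the label used in the table."""
    return "AFFECT PROTEIN FUNCTION" if score <= cutoff else "TOLERATED"


for mutation, score in SIFT_SCORES.items():
    print(f"{mutation:8} {score:.2f} {sift_call(score)}")
```

R330M, at 0.06, sits just above the cutoff, which is why its "tolerated" call is a borderline one.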

The complete prediction for each position and amino acid can be found here.

We received three warning messages from SIFT, for which we currently have no explanation.

WARNING: Original amino acid H at position 31 is not allowed by the prediction. 
WARNING: Original amino acid S at position 45 is not allowed by the prediction. 
WARNING: Original amino acid Y at position 230 is not allowed by the prediction.

PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping version 2) uses structure- and sequence-based features to predict the impact of an amino acid substitution caused by a single nucleotide mutation on a human protein. PolyPhen-2 was published by Adzhubei et al. in 2010.

Mutation Position Prediction Score
S/C 65 PROBABLY DAMAGING 0.997
I/T 105 PROBABLY DAMAGING 0.998
Q/H 127 BENIGN 0.002
C/Y,S 282 PROBABLY DAMAGING 1.000/0.997
R/M 330 PROBABLY DAMAGING 0.948
A/V 176 PROBABLY DAMAGING 0.998
R/S 6 PROBABLY DAMAGING 0.738
T/I 217 BENIGN 0.195
M/T 35 PROBABLY DAMAGING 0.989

SNAP

SNAP (screening for non-acceptable polymorphisms) uses a neural-network approach to predict the functional effects of non-synonymous SNPs. It was developed by Y. Bromberg and B. Rost in 2007. It is installed locally on our VM and can therefore be run from the command line.

  • command line: snapfun -i hfe.fasta -m muta.txt -o snap.out
Mutation Position Prediction Reliability Index Expected Accuracy
S/C 65 Non-neutral 3 78%
I/T 105 Non-neutral 3 78%
Q/H 127 Non-neutral 2 70%
C/Y,S 282 Non-neutral 6/5 93%/87%
R/M 330 Neutral 2 69%
A/V 176 Non-neutral 3 78%
R/S 6 Non-neutral 0 58%
T/I 217 Non-neutral 1 63%
M/T 35 Neutral 1 60%
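The mutation file passed to `snapfun -m` (muta.txt in the command above) can be generated from the SNP list. A sketch, assuming one substitution per line in the form wild type, position, mutant (e.g. R6S); we have not re-checked this format against the SNAP documentation, and both function names are hypothetical. The combined C282Y,S entry is split into two substitutions.

```python
def mutation_lines(snps: list[tuple[str, int, str]]) -> list[str]:
    """Expand (wild type, position, mutant[s]) into one 'R6S'-style token each."""
    lines = []
    for wild_type, pos, mutants in snps:
        for mutant in mutants.split(","):  # "Y,S" -> two substitutions
            lines.append(f"{wild_type}{pos}{mutant}")
    return lines


def write_mutation_file(path: str, snps: list[tuple[str, int, str]]) -> None:
    """Write one mutation per line (assumed snapfun -m format)."""
    with open(path, "w") as handle:
        handle.write("\n".join(mutation_lines(snps)) + "\n")


HFE_SNPS = [("S", 65, "C"), ("I", 105, "T"), ("Q", 127, "H"), ("C", 282, "Y,S"),
            ("R", 330, "M"), ("A", 176, "V"), ("R", 6, "S"), ("T", 217, "I"),
            ("M", 35, "T")]
print(mutation_lines(HFE_SNPS))
# write_mutation_file("muta.txt", HFE_SNPS)  # then: snapfun -i hfe.fasta -m muta.txt -o snap.out
```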

Discussion

General

To learn as much as possible about each SNP, we combined the properties of the exchanged amino acids, the results of SNAP, SIFT and PolyPhen2, and whether the mutation lies inside a secondary structure element. We introduced the column HGMD/dbSNP disease, which gives the ground truth according to HGMD or dbSNP, i.e. whether the mutation causes the disease. We discuss each SNP one by one and conclude with our own judgement of whether the mutation is disease-causing; this judgement is then compared against the ground truth from HGMD/dbSNP.

In general, we see damaging mutations mostly in regions that are assumed to interact with beta-2-microglobulin (B2M) and TFRC.

Mutation 1 [R6S]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
R/S 6 Affect Function Affect Function Affect Function --- yes - - -

The change from the polar, positively charged arginine to the polar, neutral serine suggests some problems because of the differences in charge, hydrophobicity and side-chain size. The substitution matrices score this change slightly negatively (BLOSUM62: -1), so it occurs somewhat less often than expected by chance. The change does not lie inside a secondary structure element, which makes a structural disruption less likely.

SNAP, SIFT and PolyPhen2 agree that this change is non-neutral and disrupts the protein function. SNAP has a reliability index of 0 (58% expected accuracy), so it is very unsure about its prediction, whereas SIFT and PolyPhen2 are more confident: SIFT gives a score of 0.01, and PolyPhen2 a score of 0.738, which it still classifies as probably damaging.

Because all three methods agree, we trust them and classify this mutation as non-neutral. According to HGMD, this prediction is correct.

Mutation 2 [M35T]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
M/T 35 Tolerated Tolerated Affect Function Beta-Strand no UProt(35) pdb(13) org.png UProt(35) pdb(13) mutant.png UProt(35) pdb(13) superpos.png

A change from the non-polar methionine to the polar threonine alters the hydrophobicity of the protein: the hydrophobic, sulfur-containing side chain is replaced by one carrying an OH group. The general substitution matrices score this exchange moderately (BLOSUM62: -1, PAM1: 6), but the PSI-BLAST matrix gives threonine a high score (+5) at this position, so it should be a common exchange here.

Given this information, and since two of the three methods predict the mutation to be neutral, we classify it as neutral. According to dbSNP this prediction is correct, although a disease association might simply not be known yet.

Mutation 3 [S65C]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
S/C 65 Affect Function Affect Function Affect Function Beta-Strand yes UProt(65) pdb(43) org.png UProt(65) pdb(43) mutant.png UProt(65) pdb(43) superpos.png

Replacing the polar, neutral serine with the non-polar, neutral cysteine only changes the polarity of the side chain, because an oxygen atom is replaced by a sulfur atom. Therefore this mutation might not even affect the protein function. The substitution matrix scores are in the midrange, meaning this exchange occurs from time to time and is nothing unusual. The position is moderately conserved (Jalview score 7) but lies inside a beta-strand, which could affect protein folding.

SNAP, SIFT and PolyPhen2 are again all confident that this mutation is non-neutral: SNAP's prediction has a reliability index of 3 (78% expected accuracy), SIFT gives a score of 0.00, and PolyPhen2 a score of 0.997. According to HGMD and dbSNP, all three predictions are correct.

Mutation 4 [I105T]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
I/T 105 Affect Function Affect Function Affect Function Helix yes UProt(105) pdb(83) org.png UProt(105) pdb(83) mutant.png UProt(105) pdb(83) superpos.png

A change from the non-polar isoleucine to the polar threonine alters the hydrophobicity of the protein, because an ethyl group is replaced by an OH group. This could change the affinity of a binding site. Since no binding site is annotated, we can only assume, based on the I-TASSER prediction from Task 4, that this amino acid could be part of a binding site.

All methods predict a damaging effect of this change, although the general substitution matrices describe it as a fairly common exchange (PAM1: 11). Only the position-specific matrix from the PSI-BLAST run indicates that this substitution is rare at this position (score -2).

Based on this information, we trust the predictions of the tools, which are supported by the PSI-BLAST matrix, and therefore classify this SNP as non-neutral. According to HGMD and dbSNP, this prediction is correct.

Mutation 5 [Q127H]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
Q/H 127 Affect Function Tolerated Tolerated --- yes UProt(127) pdb(105) org.png UProt(127) pdb(105) mutant.png UProt(127) pdb(105) superpos.png

This mutation changes the polar amino acid glutamine into the polar, aromatic histidine, which primarily changes the size of the side chain. According to the substitution matrices, this is a very common exchange in general (BLOSUM62: 0, PAM1: 20). Two of the three tools predict a neutral effect; only SNAP predicts a damaging one. This, together with the PSI-BLAST matrix, in which histidine is not observed at this position (score -2), casts doubt on the neutral prediction. We therefore remain undecided and assign neither a damaging nor a neutral effect to this mutation. According to HGMD and dbSNP it is damaging, so this seems to be a hard case.

Mutation 6 [A176V]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
A/V 176 Affect Function Affect Function Affect Function Helix yes UProt(176) pdb(154) org.png UProt(176) pdb(154) mutant.png UProt(176) pdb(154) superpos.png

Alanine and valine are both non-polar and neutral and very similar in their properties, but valine has a larger, branched side chain, so the mutation increases the hydrophobicity. The PAM matrices score this exchange relatively high (PAM1: 13, PAM250: 9) and BLOSUM62 scores it 0, so in general it is not an unusual substitution. However, the mutation lies in a helix, where it may break the helix or, in the worst case, disrupt the whole protein structure.

SNAP, SIFT and PolyPhen2 all classify this mutation as non-neutral, and all are confident: SIFT gives a score of 0.01, SNAP a reliability index of 3 (78% expected accuracy) and PolyPhen2 a score of 0.998.

We were surprised, because we initially thought that such a small change from alanine to valine would not cause many problems. We think most of the damage derives from a possible break of the secondary structure element and the resulting loss of protein function. According to HGMD all three methods are correct, because the mutation is deleterious.

Mutation 7 [T217I]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
T/I 217 Affect Function Tolerated Tolerated --- no UProt(217) pdb(195) org.png UProt(217) pdb(195) mutant.png UProt(217) pdb(195) superpos.png

A change from the polar threonine to the non-polar isoleucine alters the hydrophobicity of the protein, because an OH group is replaced by an ethyl group. This is a common exchange, and SIFT and PolyPhen2 predict it to be neutral. Only SNAP predicts a non-neutral effect, with a low reliability index of 1. We therefore consider this mutation neutral, which is correct according to dbSNP.

Mutation 8 [C282Y,S]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation:Y Superposition:Y Mutation:S Superposition:S
C/Y,S 282 Affect Function Affect Function Affect Function Beta-Strand yes UProt(282) pdb(260) org.png UProt(282a) pdb(260) mutant.png UProt(282a) pdb(260) superpos.png UProt(282b) pdb(260) mutant.png UProt(282b) pdb(260) superpos.png

This mutation replaces the non-polar, neutral cysteine with the polar, neutral tyrosine or serine, which changes the hydrophobicity. Tyrosine also carries an aromatic ring, which can influence protein folding. The change lies inside a beta-strand, which is also unfavourable, because changes inside secondary structure elements can affect the structure and thus the function of the protein.

The substitution matrix scores are higher for the change from cysteine to serine (BLOSUM62: -1, PAM1: 11, PAM250: 7) than for the change to tyrosine (-2, 3, 3). This indicates that the replacement by tyrosine is rare, while the replacement by serine occurs more often. According to Jalview, the position is fully conserved (score 11).

SNAP, SIFT and PolyPhen2 all conclude that the change affects the function of the protein. Taking the other results into account as well, we come to the same conclusion, because everything points to a loss of protein function.

According to HGMD/dbSNP our assessment is correct. C282Y is the most common HFE mutation and accounts for around 90% of all hereditary hemochromatosis cases.

Mutation 9 [R330M]

Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease Reference Mutation Superposition
R/M 330 Tolerated Tolerated Affect Function --- yes - - -

The mutation from the polar, positively charged arginine to the non-polar, neutral methionine changes the polarity, which may cause problems with hydrophobicity and protein folding. On the other hand, the mutation does not lie inside a secondary structure element and the position is only moderately conserved (Jalview score 6), so the change might not affect the protein function. The substitution matrices score this exchange low (BLOSUM62: -1, PAM1: 1, PAM250: 1), i.e. it is rarely observed in general.

The view that the mutation may be tolerated is supported by SNAP and SIFT, which both classify it as neutral. However, neither is confident: SNAP has a reliability index of only 2 (69% expected accuracy), and SIFT's score of 0.06 lies just above its tolerance cutoff. PolyPhen2, in contrast, clearly classifies the mutation as damaging with a score of about 0.95.

Although two of our three methods call this mutation neutral, we decided that it is non-neutral. Because SIFT and SNAP are both unsure about their predictions while PolyPhen2 scores it close to 1 (0.948), we place more trust in PolyPhen2. This decision is also supported by the fact that a change of polarity inside a protein is likely to cause problems. We therefore classify this mutation as non-neutral and as affecting protein function.

Evaluation

Name Mutation Position SNAP SIFT PolyPhen2 Secondary Structure HGMD/dbSNP disease
Mutation 1 [R6S] R/S 6 Affect Function Affect Function Affect Function --- yes
Mutation 2 [M35T] M/T 35 Tolerated Tolerated Affect Function Beta-Strand no
Mutation 3 [S65C] S/C 65 Affect Function Affect Function Affect Function Beta-Strand yes
Mutation 4 [I105T] I/T 105 Affect Function Affect Function Affect Function Helix yes
Mutation 5 [Q127H] Q/H 127 Affect Function Tolerated Tolerated --- yes
Mutation 6 [A176V] A/V 176 Affect Function Affect Function Affect Function Helix yes
Mutation 7 [T217I] T/I 217 Affect Function Tolerated Tolerated --- no
Mutation 8 [C282Y,S] C/Y,S 282 Affect Function Affect Function Affect Function Beta-Strand yes
Mutation 9 [R330M] R/M 330 Tolerated Tolerated Affect Function --- yes
Correct Predictions 7 7 7

With this small set of mutations we cannot say which method performs better: each method predicted 7 of the 9 mutations correctly and 2 incorrectly. Notably, PolyPhen2 was always very confident about its predictions, whereas SNAP and SIFT sometimes were not.
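The "Correct Predictions" row can be recomputed from the evaluation table. A minimal sketch that encodes each call as a boolean (True = "Affect Function" or disease-causing, False = "Tolerated" or not disease-causing) and counts agreements with the HGMD/dbSNP ground truth; the names are our own:

```python
# Rows of the evaluation table: (mutation, SNAP, SIFT, PolyPhen2, HGMD/dbSNP disease).
AFF, TOL = True, False
EVALUATION = [
    ("R6S",     AFF, AFF, AFF, True),
    ("M35T",    TOL, TOL, AFF, False),
    ("S65C",    AFF, AFF, AFF, True),
    ("I105T",   AFF, AFF, AFF, True),
    ("Q127H",   AFF, TOL, TOL, True),
    ("A176V",   AFF, AFF, AFF, True),
    ("T217I",   AFF, TOL, TOL, False),
    ("C282Y,S", AFF, AFF, AFF, True),
    ("R330M",   TOL, TOL, AFF, True),
]
TOOLS = ("SNAP", "SIFT", "PolyPhen2")


def correct_predictions(rows) -> dict[str, int]:
    """Count, per tool, how many calls agree with the ground truth."""
    counts = dict.fromkeys(TOOLS, 0)
    for _name, *calls, truth in rows:
        for tool, call in zip(TOOLS, calls):
            counts[tool] += call == truth
    return counts


print(correct_predictions(EVALUATION))  # {'SNAP': 7, 'SIFT': 7, 'PolyPhen2': 7}
```

The errors fall on different mutations (SNAP: T217I, R330M; SIFT: Q127H, R330M; PolyPhen2: M35T, Q127H), which is consistent with no single tool standing out on this set.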

References

<references />