Sequence-based mutation analysis
Revision as of 13:34, 27 June 2011
SNPs
Because the HFE gene has no annotated functional site, we can only address the biochemical changes caused by each SNP; changes in function or stability cannot be described.
To compare the biochemical properties of the amino acids, we used Wolfram Alpha and the Wikipedia entry on amino acids <ref>http://en.wikipedia.org/wiki/Amino_acid</ref>. All SNPs are visualized in the following picture of the HFE_HUMAN UniProt sequence.
Mutation | Position | Database | BLOSUM62 | PAM1 | PAM250 | Physicochemical changes |
---|---|---|---|---|---|---|
S/C | 65 | HGMD/dbSNP | -1 | 5 | 3 | There is no change in size or charge; only the polarity changes, because an oxygen atom is replaced by a sulfur atom. |
I/T | 105 | HGMD/dbSNP | -1 | 11 | 6 | An ethyl group is replaced by an OH group, which leads to a small change in size and a large change in hydrophobicity. |
Q/H | 127 | HGMD/dbSNP | 0 | 20 | 7 | Because of its aromatic ring, histidine reduces the flexibility at this position. Polarity and hydrophobicity remain the same. |
C/Y,S | 282 | HGMD/dbSNP | -2/-1 | 3/11 | 3/7 | In both cases the hydrophobicity changes: cysteine is nonpolar, whereas serine and tyrosine are polar amino acids. |
R/M | 330 | HGMD | -1 | 1 | 1 | The side-chain charge changes from positive to neutral. The size also changes, but this should have a smaller impact than the change in charge. |
A/V | 176 | HGMD | 0 | 13 | 9 | Alanine and valine have very similar properties; only the size changes and the hydrophobicity increases, but both are water-soluble. |
R/S | 6 | HGMD | -1 | 11 | 6 | The polarity changes, which affects the water solubility of the protein if the residue lies on the surface. The size of the side chain also changes. |
T/I | 217 | dbSNP | -1 | 7 | 4 | An OH group is replaced by an ethyl group, which leads to a small change in size and a large change in hydrophobicity. |
M/T | 35 | dbSNP | -1 | 6 | 5 | The polarity changes, which affects the water solubility of the protein if the residue lies on the surface. |
R/M | 58 | dbSNP | -1 | 1 | 1 | The side-chain charge changes from positive to neutral. The size also changes, but this should have a smaller impact than the change in charge. |
A change in polarity matters mainly for residues at the protein surface, because it changes the hydrophobicity and therefore increases or decreases the water solubility.
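The hydrophobicity changes described in the table can also be put on a numeric scale. The sketch below swaps in the Kyte-Doolittle hydropathy scale, which was not used for the table (that was built from Wolfram Alpha and Wikipedia), and simply subtracts the wild-type value from the mutant value. The function and variable names are hypothetical, and a single hydropathy number ignores size, charge and aromaticity.

```python
# Kyte-Doolittle hydropathy scale (an assumption here: the original table was
# built from Wolfram Alpha and Wikipedia, not from this scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# SNPs from the table as (wild type, position, mutant); C282 has two alleles.
SNPS = [
    ("S", 65, "C"), ("I", 105, "T"), ("Q", 127, "H"),
    ("C", 282, "Y"), ("C", 282, "S"), ("R", 330, "M"),
    ("A", 176, "V"), ("R", 6, "S"), ("T", 217, "I"), ("M", 35, "T"),
]


def hydropathy_change(wild_type: str, mutant: str) -> float:
    """Mutant minus wild-type hydropathy; positive means more hydrophobic."""
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type], 2)


def describe(delta: float) -> str:
    """Turn a hydropathy difference into a short verbal label."""
    if delta > 0:
        return "more hydrophobic"
    if delta < 0:
        return "more hydrophilic"
    return "unchanged"


if __name__ == "__main__":
    for wt, pos, mut in SNPS:
        delta = hydropathy_change(wt, mut)
        print(f"{wt}{pos}{mut}: {delta:+.1f} ({describe(delta)})")
```

On this scale Q127H changes by only +0.3, in line with the remark that its hydrophobicity stays about the same, while I105T and T217I shift by roughly 5 units in opposite directions.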
Remark:
The UniProt reference sequence has an F at position 58, whereas the dbSNP SNP at this position is annotated as an R-to-M change. After comparing the accession number with the OMIM database, we believe that the position is annotated incorrectly in dbSNP: OMIM reports position 330. We therefore treated these SNPs as the same and ignored the position 58 SNP. Because we had no further SNPs occurring only in dbSNP, and HGMD annotates only disease-related SNPs, we decided not to replace this SNP with another one.
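A mismatch like the one at position 58 can be caught automatically by comparing each annotated wild-type residue with the reference sequence. The sketch below assumes mutations are written like R58M with 1-based positions; the helper names and the toy sequence are hypothetical, and the real HFE_HUMAN sequence would have to be loaded from UniProt.

```python
import re

# Single-residue substitution such as "R58M": wild type, 1-based position, mutant.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label like 'C282Y' into ('C', 282, 'Y')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def matches_reference(label: str, reference: str) -> bool:
    """True if the annotated wild-type residue equals the reference residue."""
    wild_type, position, _ = parse_mutation(label)
    if not 1 <= position <= len(reference):
        return False  # position lies outside the reference sequence
    return reference[position - 1] == wild_type


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT the HFE_HUMAN sequence.
    toy_reference = "MAFRS"
    for label in ("F3Y", "R3M"):
        print(label, matches_reference(label, toy_reference))
```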
SNP analysis by hand
First, we searched a non-redundant database for homologous sequences in mammals. The result list can be found here. We then created a multiple sequence alignment with COBALT to calculate the conservation at the SNP positions.
Mutation | Position | Conservation (Jalview) | Secondary Structure (UniProt) |
---|---|---|---|
S/C | 65 | 7 | Beta-Strand |
I/T | 105 | 7 | Helix |
Q/H | 127 | 2 | --- |
C/Y,S | 282 | 11 | Beta-Strand |
R/M | 330 | 6 | --- |
A/V | 176 | 7 | Helix |
R/S | 6 | 0 | --- |
T/I | 217 | 0 | --- |
M/T | 35 | 0 | Beta-Strand |
We removed all unsuitable sequences, such as very short ones and sequences with an obvious large insertion or deletion.
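Conservation at a SNP position can also be estimated directly from the COBALT alignment. Jalview's conservation score is based on shared physicochemical properties, so the sketch below is not a reimplementation of it: it swaps in Shannon entropy per alignment column (0 bits means every sequence carries the same residue), and its numbers will not match the Jalview column above. It assumes the alignment is held as equal-length gapped strings with the query first, and maps HFE residue numbers to alignment columns by skipping the query's gaps. All names are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = "-."


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    residues = [r for r in column.upper() if not (ignore_gaps and r in GAP_CHARS)]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return max(0.0, entropy)  # avoid returning -0.0 for conserved columns


def alignment_column(sequences: list[str], column: int) -> str:
    """Residues at a 1-based alignment column across all aligned sequences."""
    return "".join(seq[column - 1] for seq in sequences)


def query_position_to_column(aligned_query: str, position: int) -> int:
    """Map a 1-based residue number of the gapped query to its alignment column."""
    seen = 0
    for column, residue in enumerate(aligned_query, start=1):
        if residue not in GAP_CHARS:
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"query has fewer than {position} residues")


if __name__ == "__main__":
    # Hypothetical toy alignment; the first row is the query.
    alignment = ["M-CSR", "MACSK", "MGCTR"]
    col = query_position_to_column(alignment[0], 2)
    print(col, column_entropy(alignment_column(alignment, col)))
```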
PSI-BLAST
- command line:
blastpgp -i hfe.fasta -d /data/blast/nr/nr -e 10E-6 -j 5 -Qpsiblast.mat -o psiblast.out
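Last position-specific scoring matrix computed, shown for the SNP positions, with weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts.
Position-specific scores:
Position | Residue | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | R | -2 | 2 | -3 | -4 | -2 | -2 | -3 | -4 | -3 | 1 | 4 | -2 | 1 | 0 | -4 | -3 | -2 | -3 | -2 | 1 |
35 | M | 0 | -2 | -1 | -2 | -2 | -1 | -2 | -2 | -2 | -1 | -1 | -2 | 3 | -2 | -2 | 0 | 5 | -3 | -2 | 0 |
65 | S | 1 | -2 | 0 | -1 | -2 | -1 | -1 | -1 | -2 | -3 | -3 | -1 | -2 | -3 | -2 | 5 | 1 | -4 | -3 | -2 |
105 | I | 1 | -3 | -4 | -4 | -2 | -3 | -3 | -3 | -3 | 2 | 4 | -3 | 1 | -1 | -3 | -2 | -2 | -3 | -2 | 1 |
127 | Q | -1 | -2 | -1 | -2 | -3 | 2 | -1 | 6 | -2 | -4 | -4 | -2 | -3 | -4 | -3 | -1 | -2 | -3 | -4 | -4 |
176 | A | 5 | -2 | -2 | -2 | -1 | -1 | -1 | 0 | -2 | -2 | -2 | -1 | -2 | -3 | -2 | 0 | -1 | -3 | -3 | -1 |
217 | T | 0 | -2 | 0 | -1 | -2 | -1 | -1 | -1 | -2 | -3 | -3 | -1 | -2 | -3 | -1 | 5 | 3 | -4 | -2 | -2 |
282 | C | -1 | -4 | -4 | -4 | 10 | -4 | -5 | -3 | -4 | -2 | -2 | -4 | -2 | -3 | -4 | -2 | -2 | -3 | -3 | -2 |
330 | R | -3 | 2 | -4 | -4 | 3 | -2 | -3 | -3 | -3 | -3 | -3 | -2 | -2 | 0 | -4 | -3 | -3 | 11 | 1 | -3 |
Weighted observed percentages (rounded down), information per position, and relative weight of gapless real matches to pseudocounts:
Position | Residue | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | Information | Relative weight |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | R | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 72 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 8 | 0.51 | 0.24 |
35 | M | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 78 | 0 | 0 | 1 | 0.66 | 0.25 |
65 | S | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 0 | 0 | 0 | 0 | 0.80 | 0.25 |
105 | I | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 64 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 | 0.25 |
127 | Q | 0 | 0 | 0 | 0 | 0 | 17 | 2 | 81 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.06 | 0.25 |
176 | A | 96 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.72 | 0.25 |
217 | T | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 83 | 17 | 0 | 0 | 0 | 0.70 | 0.25 |
282 | C | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.89 | 0.25 |
330 | R | 0 | 19 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 70 | 1 | 0 | 1.90 | 0.26 |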
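The PSSM can also be read programmatically, for example to compare the score of the query residue with that of the mutant residue at each SNP position. This sketch assumes each data row of psiblast.mat (the file written via -Q) has the layout of the PSSM above: position, query residue, 20 scores, 20 percentages, then information and relative weight, in the column order A R N D C Q E G H I L K M F P S T W Y V. Header lines and the statistics at the end of the real file are not handled; all names are hypothetical.

```python
from dataclasses import dataclass

# Column order of the scores and percentages in the blastpgp PSSM shown above.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Row for position 282, copied from the PSSM above.
EXAMPLE_ROW = ("282 C -1 -4 -4 -4 10 -4 -5 -3 -4 -2 -2 -4 -2 -3 -4 -2 -2 -3 -3 -2 "
               "0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.89 0.25")


@dataclass
class PssmRow:
    position: int
    residue: str
    scores: dict[str, int]
    percentages: dict[str, int]
    information: float
    relative_weight: float


def parse_pssm_row(line: str) -> PssmRow:
    """Parse one data row: position, residue, 20 scores, 20 percentages, 2 floats."""
    fields = line.split()
    if len(fields) != 44:
        raise ValueError(f"expected 44 fields, got {len(fields)}")
    return PssmRow(
        position=int(fields[0]),
        residue=fields[1],
        scores=dict(zip(AMINO_ACIDS, map(int, fields[2:22]))),
        percentages=dict(zip(AMINO_ACIDS, map(int, fields[22:42]))),
        information=float(fields[42]),
        relative_weight=float(fields[43]),
    )


def score_change(row: PssmRow, mutant: str) -> int:
    """Mutant score minus query-residue score; negative means the mutant is disfavoured."""
    return row.scores[mutant] - row.scores[row.residue]


if __name__ == "__main__":
    row = parse_pssm_row(EXAMPLE_ROW)
    for mutant in "YS":
        print(f"C{row.position}{mutant}: {score_change(row, mutant):+d}")
```

For position 282 the score drops from 10 for cysteine to -3 for tyrosine and -2 for serine, matching the fully conserved column (100% C).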
SIFT
SIFT (Sorting Intolerant From Tolerant) relies on the observation that important amino acids are more conserved, and therefore less tolerant of mutations, than unimportant ones. It predicts whether or not a mutation affects protein function. SIFT was created in 2003 by Ng P. and Henikoff S.
SIFT marked its prediction with a low-confidence warning because the sequences used for the prediction were not diverse enough.
Mutation | Position | Prediction | Score |
---|---|---|---|
S/C | 65 | AFFECT PROTEIN FUNCTION | 0.00 |
I/T | 105 | AFFECT PROTEIN FUNCTION | 0.00 |
Q/H | 127 | TOLERATED | 0.16 |
C/Y,S | 282 | AFFECT PROTEIN FUNCTION | 0.00 |
R/M | 330 | TOLERATED | 0.06 |
A/V | 176 | AFFECT PROTEIN FUNCTION | 0.01 |
R/S | 6 | AFFECT PROTEIN FUNCTION | 0.01 |
T/I | 217 | TOLERATED | 1.00 |
M/T | 35 | TOLERATED | 1.00 |
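SIFT's default cutoff predicts substitutions with a score below 0.05 as affecting protein function. Applying that cutoff to the scores in the table reproduces the predictions above and shows that R330M (0.06) lies just above the threshold. The variable and function names are illustrative only.

```python
# SIFT's default cutoff: scores below this are predicted to affect function.
SIFT_THRESHOLD = 0.05

# Scores copied from the SIFT table above.
sift_scores = {
    "S65C": 0.00, "I105T": 0.00, "Q127H": 0.16, "C282Y,S": 0.00,
    "R330M": 0.06, "A176V": 0.01, "R6S": 0.01, "T217I": 1.00, "M35T": 1.00,
}


def sift_prediction(score: float, threshold: float = SIFT_THRESHOLD) -> str:
    """Classify a SIFT score with the given cutoff."""
    return "AFFECT PROTEIN FUNCTION" if score < threshold else "TOLERATED"


if __name__ == "__main__":
    for mutation, score in sift_scores.items():
        print(f"{mutation}: {score:.2f} -> {sift_prediction(score)}")
```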
The complete prediction for each position and amino acid can be found here.
We received three warning messages from SIFT for which we have no explanation at this time:
WARNING: Original amino acid H at position 31 is not allowed by the prediction.
WARNING: Original amino acid S at position 45 is not allowed by the prediction.
WARNING: Original amino acid Y at position 230 is not allowed by the prediction.
PolyPhen-2
Mutation | Position | Prediction | Score |
---|---|---|---|
S/C | 65 | PROBABLY DAMAGING | 0.997 |
I/T | 105 | PROBABLY DAMAGING | 0.998 |
Q/H | 127 | BENIGN | 0.002 |
C/Y,S | 282 | PROBABLY DAMAGING | 1.000/0.997 |
R/M | 330 | PROBABLY DAMAGING | 0.948 |
A/V | 176 | PROBABLY DAMAGING | 0.998 |
R/S | 6 | PROBABLY DAMAGING | 0.738 |
T/I | 217 | BENIGN | 0.195 |
M/T | 35 | PROBABLY DAMAGING | 0.989 |
SNAP
SNAP (screening for non-acceptable polymorphisms) uses a neural network approach to predict the functional effects of non-synonymous SNPs. It was developed by Rost B. and Bromberg Y. in 2007. It is installed locally on our VM and can therefore be run from the command line.
- command line:
snapfun -i hfe.fasta -m muta.txt -o snap.out
Mutation | Position | Prediction | Reliability Index | Expected Accuracy |
---|---|---|---|---|
S/C | 65 | Non-neutral | 3 | 78% |
I/T | 105 | Non-neutral | 3 | 78% |
Q/H | 127 | Non-neutral | 2 | 70% |
C/Y,S | 282 | Non-neutral | 6/5 | 93%/87% |
R/M | 330 | Neutral | 2 | 69% |
A/V | 176 | Non-neutral | 3 | 78% |
R/S | 6 | Non-neutral | 0 | 58% |
T/I | 217 | Non-neutral | 1 | 63% |
M/T | 35 | Neutral | 1 | 60% |
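The snapfun call above reads the substitutions from muta.txt. A small script can generate that file from the SNP list so that C282Y and C282S become separate entries. The file format expected by snapfun is an assumption here (one substitution per line, written like C282Y); check it against the SNAP documentation installed on the VM. The helper names are hypothetical.

```python
from pathlib import Path

# SNPs analysed on this page as (wild type, position, mutant); C282 has two alleles.
MUTATIONS = [
    ("R", 6, "S"), ("M", 35, "T"), ("S", 65, "C"), ("I", 105, "T"),
    ("Q", 127, "H"), ("A", 176, "V"), ("T", 217, "I"),
    ("C", 282, "Y"), ("C", 282, "S"), ("R", 330, "M"),
]


def mutation_lines(mutations) -> list[str]:
    """Format each substitution as e.g. 'C282Y' (ASSUMED snapfun input format)."""
    return [f"{wt}{pos}{mut}" for wt, pos, mut in mutations]


def write_mutation_file(path: str, mutations) -> None:
    """Write one substitution per line, ending with a newline."""
    Path(path).write_text("\n".join(mutation_lines(mutations)) + "\n")


if __name__ == "__main__":
    write_mutation_file("muta.txt", MUTATIONS)
```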
Discussion
General
To learn as much as possible about each SNP, we combined the properties of the mutated amino acids, the results from SNAP, SIFT and PolyPhen-2, and whether the mutation lies inside a secondary structure element. We introduced the column "HGMD/dbSNP disease", which holds the ground truth from HGMD or dbSNP on whether the mutation causes the disease. We discuss each SNP in turn and finally conclude, based on our own assessment, whether the mutation is disease-causing; this statement is then compared against the HGMD/dbSNP ground truth.
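Our verdicts come from discussing each SNP, but a simple majority vote over the three predictors is one hypothetical baseline to compare them with. The sketch below encodes the calls from the tables below (True for "Affect Function") and counts how often the vote agrees with HGMD/dbSNP; it is not the method used on this page, and the names are illustrative. On these calls the vote agrees for 7 of 9 SNPs and misses Q127H and R330M.

```python
# Calls from the discussion tables: (SNAP, SIFT, PolyPhen-2, HGMD/dbSNP disease).
# True = "Affect Function" (or disease-causing), False = "Tolerated" (or not).
CALLS = {
    "R6S":     (True,  True,  True,  True),
    "M35T":    (False, False, True,  False),
    "S65C":    (True,  True,  True,  True),
    "I105T":   (True,  True,  True,  True),
    "Q127H":   (True,  False, False, True),
    "A176V":   (True,  True,  True,  True),
    "T217I":   (True,  False, False, False),
    "C282Y,S": (True,  True,  True,  True),
    "R330M":   (False, False, True,  True),
}


def majority_vote(predictions) -> bool:
    """True if more than half of the predictors call the mutation damaging."""
    predictions = list(predictions)
    return sum(predictions) * 2 > len(predictions)


def evaluate(calls):
    """Return the consensus per SNP and the number of agreements with the ground truth."""
    consensus = {name: majority_vote(row[:3]) for name, row in calls.items()}
    hits = sum(consensus[name] == row[3] for name, row in calls.items())
    return consensus, hits


if __name__ == "__main__":
    consensus, hits = evaluate(CALLS)
    print(f"{hits}/{len(CALLS)} agree with HGMD/dbSNP")
```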
Mutation 1 [R6S]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
R/S | 6 | Affect Function | Affect Function | Affect Function | --- | yes |
The polarity changes, which affects the water solubility of the protein if the residue lies on the surface. The size of the side chain also changes.
Mutation 2 [M35T]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
M/T | 35 | Tolerated | Tolerated | Affect Function | Beta-Strand | no |
Here we will still discuss the mutation and what we have learned from it.
Mutation 3 [S65C]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
S/C | 65 | Affect Function | Affect Function | Affect Function | Beta-Strand | yes |
Here we will still discuss the mutation and what we have learned from it.
Mutation 4 [I105T]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
I/T | 105 | Affect Function | Affect Function | Affect Function | Helix | yes |
Here we will still discuss the mutation and what we have learned from it.
Mutation 5 [Q127H]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
Q/H | 127 | Affect Function | Tolerated | Tolerated | --- | yes |
Here we will still discuss the mutation and what we have learned from it.
Mutation 6 [A176V]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
A/V | 176 | Affect Function | Affect Function | Affect Function | Helix | yes |
Here we will still discuss the mutation and what we have learned from it.
Mutation 7 [T217I]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
T/I | 217 | Affect Function | Tolerated | Tolerated | --- | no |
Here we will still discuss the mutation and what we have learned from it.
Mutation 8 [C282Y,S]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
C/Y,S | 282 | Affect Function | Affect Function | Affect Function | Beta-Strand | yes |
Here we will still discuss the mutation and what we have learned from it.
Mutation 9 [R330M]
Mutation | Position | SNAP | SIFT | PolyPhen2 | Secondary Structure | HGMD/dbSNP disease |
---|---|---|---|---|---|---|
R/M | 330 | Tolerated | Tolerated | Affect Function | --- | yes |
Here we will still discuss the mutation and what we have learned from it.
References
<references />