Difference between revisions of "Fabry:Sequence-based mutation analysis"
Staniewski (talk | contribs) (→PSSM) |
Revision as of 17:00, 18 June 2012
The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.
Dataset preparation
Q279E N215S I289V S65T R356W V316I P323T P40S R118H A143T
Amino acid properties
<figtable id="tab:aaProp">
Physicochemical properties of the chosen SNPs and changes of properties between wildtype (wt) and mutant (mt).
Used abbreviation in this table:
AA: Amino Acid, Pol: Side-chain polarity, Charge: Side-chain charge at pH 7.4, HI: Hydropathy index, RM: Residue Mass, iP: isoelectric point
SNP | wt AA | wt Pol | wt Charge | wt HI | wt RM | wt iP | mt AA | mt Pol | mt Charge | mt HI | mt RM | mt iP | change in Pol | change in Charge | change in HI | change in RM | change in iP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Q279E | Q | polar | neutral | -3.5 | 128.131 | 5.65 | E | polar | negative | -3.5 | 129.116 | 3.15 | none | neutral to negative | 0 | 0.99 | -2.5 |
N215S | N | polar | neutral | -3.5 | 114.104 | 5.41 | S | polar | neutral | -0.8 | 87.078 | 5.68 | none | none | 2.7 | -27.026 | 0.27 |
I289V | I | nonpolar | neutral | 4.5 | 113.160 | 6.05 | V | nonpolar | neutral | 4.2 | 99.133 | 6.00 | none | none | -0.3 | -14.027 | -0.05 |
S65T | S | polar | neutral | -0.8 | 87.078 | 5.68 | T | polar | neutral | -0.7 | 101.105 | 5.60 | none | none | 0.1 | 14.027 | -0.08 |
R356W | R | polar | positive | -4.5 | 156.188 | 10.76 | W | nonpolar | neutral | -0.9 | 186.213 | 5.89 | polar to nonpolar | positive to neutral | 3.6 | 30.025 | -4.87 |
V316I | V | nonpolar | neutral | 4.2 | 99.133 | 6.00 | I | nonpolar | neutral | 4.5 | 113.160 | 6.05 | none | none | 0.3 | 14.027 | 0.05 |
P323T | P | nonpolar | neutral | -1.6 | 97.117 | 6.30 | T | polar | neutral | -0.7 | 101.105 | 5.60 | nonpolar to polar | none | 0.9 | 3.988 | -0.7 |
P40S | P | nonpolar | neutral | -1.6 | 97.117 | 6.30 | S | polar | neutral | -0.8 | 87.078 | 5.68 | nonpolar to polar | none | 0.8 | -10.039 | -0.62 |
R118H | R | polar | positive | -4.5 | 156.188 | 10.76 | H | polar | pos(10%), neutr(90%) | -3.2 | 137.142 | 7.60 | none | positive to pos(10%), neutr(90%) | 1.3 | -19.046 | -3.16 |
A143T | A | nonpolar | neutral | 1.8 | 71.079 | 6.01 | T | polar | neutral | -0.7 | 101.105 | 5.60 | nonpolar to polar | none | -2.5 | 30.026 | -0.41 |
</figtable>
The polarity of the side chain determines whether an amino acid is hydrophobic or not.
Hydrophobicity is a measure of how poorly an amino acid dissolves in water. Hydrophobic amino acids are more likely to be found inside a protein, while hydrophilic amino acids tend to be in contact with the aqueous environment. <ref>Hydrophobicity Index for Common Amino Acids http://www.sigmaaldrich.com/life-science/metabolomics/learning-center/amino-acid-reference-chart.html#hydro, June 16, 2012</ref>
Therefore, depending on the location of an amino acid, a change in polarity due to a mutation can cause a major defect. This may be the case for the SNPs P40S, A143T, P323T and R356W.
Furthermore, the side-chain charge (positive or negative) is important for the structure of a protein, because it governs the interactions of the amino acid with nearby residues. A change in charge can therefore disrupt the structural integrity of the protein, which might happen with the mutations R118H, Q279E and R356W.
The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its sidechain.<ref>Kyte J, Doolittle RF (May 1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1): 105–32. PMID 7108955. </ref>
The larger the number is, the more hydrophobic the amino acid, thus the most hydrophobic amino acid is isoleucine (4.5) and the most hydrophilic one is arginine (-4.5).
Considering a hydropathy change greater than 1 (in either direction) as crucial, only one SNP markedly increases the hydrophilic character of the position (A143T, with a change of -2.5), while three increase its hydrophobicity (R118H, N215S and R356W).
The average residue mass ranges from 57.052 (glycine) to 186.213 (tryptophan); we therefore consider a change in mass greater than 10 as critical. This concerns all mutations except Q279E and P323T.
The isoelectric point (pI, abbreviated iP in the tables) is the pH at which an amino acid carries no net charge. Below the pI it carries a net positive charge, above it a net negative charge. Since the pH in the human body is on average 6.7, only three amino acids are positively charged (histidine, lysine and arginine). The pI ranges from 2.85 (aspartic acid) to 10.76 (arginine); we therefore considered a change of 0.8 or more as probably disease causing. This applies only to R118H, Q279E and R356W.
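The cut-offs described above can be applied mechanically. The following is a minimal sketch, not the commands used for this page: the property values are copied from the table, while the function names (`property_changes`, `flagged`) and the choice of `>` for hydropathy and mass versus `>=` for the isoelectric point are assumptions made for illustration.

```python
# (hydropathy index, residue mass, isoelectric point), copied from the table above
PROPS = {
    "A": (1.8, 71.079, 6.01),   "E": (-3.5, 129.116, 3.15),
    "H": (-3.2, 137.142, 7.60), "I": (4.5, 113.160, 6.05),
    "N": (-3.5, 114.104, 5.41), "P": (-1.6, 97.117, 6.30),
    "Q": (-3.5, 128.131, 5.65), "R": (-4.5, 156.188, 10.76),
    "S": (-0.8, 87.078, 5.68),  "T": (-0.7, 101.105, 5.60),
    "V": (4.2, 99.133, 6.00),   "W": (-0.9, 186.213, 5.89),
}

SNPS = ["Q279E", "N215S", "I289V", "S65T", "R356W",
        "V316I", "P323T", "P40S", "R118H", "A143T"]


def is_critical(name, delta):
    """Apply the cut-offs from the text (comparison operators are an assumption)."""
    if name == "pI":
        return abs(delta) >= 0.8
    return abs(delta) > {"HI": 1.0, "RM": 10.0}[name]


def property_changes(snp):
    """Return {property: (mutant - wild type, exceeds cut-off?)} for one SNP."""
    wt, mt = PROPS[snp[0]], PROPS[snp[-1]]
    changes = {}
    for name, w, m in zip(("HI", "RM", "pI"), wt, mt):
        delta = round(m - w, 3)
        changes[name] = (delta, is_critical(name, delta))
    return changes


def flagged(name):
    """Sorted list of SNPs whose change in the given property exceeds the cut-off."""
    return sorted(s for s in SNPS if property_changes(s)[name][1])


if __name__ == "__main__":
    for prop in ("HI", "RM", "pI"):
        print(prop, flagged(prop))
```

A positive hydropathy change means the mutant residue is more hydrophobic than the wild type, a negative one that it is more hydrophilic.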
Simple structural analysis
- Now take into consideration where in the protein the mutation occurs and document it: create a picture with PyMOL showing the original and the mutated residue in the protein. More thorough structural analyses will be introduced in the next task.
Secondary Structure
<figtable id="tab:Location">
Secondary structure assignment predicted by the three methods Psipred, Reprof and DSSP in Task 3 for the mutated amino acid itself and the ten adjacent residues to the left and to the right.
H represents a helix at this position, C a coil region, E a sheet, and - a region that could not be predicted.
SNP | SecStruc Psipred | SecStruc Psipred long | SecStruc Reprof | SecStruc Reprof long | SecStruc DSSP | SecStruc DSSP long |
---|---|---|---|---|---|---|
Q279E | H | CCCCCCCHHHHHHHHHHHHHH | H | EECCCCCCHHHHHHHHHHHHH | H | CCCCC--HHHHHHHHHHHHHC |
N215S | H | CCCCCCCCCCHHHHHCCCCCC | H | CECCCCCCCCHHHHHHHHHHH | H | HHHCCCC---HHHHCCC-CEE |
I289V | H | HHHHHHHHHHHCCCEEEECCC | H | HHHHHHHHHHHHHCHHCCCCC | C | HHHHHHHHHHCC--EEE-C-C |
S65T | H | CCCCCCCCCCHHHHHHHHHHH | H | CCCCCCCHHHHHHHHHHHHHH | H | CCC-CCCC-CHHHHHHHHHHH |
R356W | C | CCEEEEEEECCCCCCEEEEEE | C | HHHHHHHHHHCCCCCCCCHHH | - | CEEEEEEEE---CCC-EEEEE |
V316I | H | HHHCCCCHHHHHHCCCCCCCC | E | HHHHCCCCCEEEECCCCCCCC | H | HHHHHH-HHHHHHHC-CC--- |
P323T | C | HHHHHHCCCCCCCCCEEEEEC | C | CCEEEECCCCCCCCCCEECCC | C | HHHHHHHC-CC----EEEE-C |
P40S | C | CCCCCCCCCCCCCCCCCCCCC | C | HHCCCCCCCCCCCHHHHHHEE | - | ---CC--CC--EEEECHHHHC |
R118H | H | CCCCCCCCHHHHHHHHHHCCC | H | CCCCCCHHHHHHHHHHHCCCC | H | -CCC-CCHHHHHHHHHHHCC- |
A143T | C | EECCCCCCCCCCCCCCCCHHH | C | EEECCCCCCCCCCCCCCCCCC | C | EEECCCE-CCCCE--CCCHHH |
</figtable>
Although we know from last week's task that the disease-causing mutations are spread all over the protein regardless of the secondary structure, we assumed here that we had no prior knowledge about it. We therefore looked at the predicted secondary structure at the position of each point mutation and its surroundings (10 residues to the left and 10 residues to the right). The only remarkable observation is that there are (almost) no sheets at the mutated residues. From Task 6 we know that this happened only by chance and is due to the small number of SNPs picked.
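The single-letter state in the table is simply the middle character of each 21-residue window. As a minimal sketch (the dictionary layout and function names are assumptions, the strings are copied from the table), the sheet count at the mutated positions can be checked like this:

```python
METHODS = ("Psipred", "Reprof", "DSSP")

# 21-residue windows (10 left, mutated residue, 10 right), copied from the table above
WINDOWS = {
    "Q279E": ("CCCCCCCHHHHHHHHHHHHHH", "EECCCCCCHHHHHHHHHHHHH", "CCCCC--HHHHHHHHHHHHHC"),
    "N215S": ("CCCCCCCCCCHHHHHCCCCCC", "CECCCCCCCCHHHHHHHHHHH", "HHHCCCC---HHHHCCC-CEE"),
    "I289V": ("HHHHHHHHHHHCCCEEEECCC", "HHHHHHHHHHHHHCHHCCCCC", "HHHHHHHHHHCC--EEE-C-C"),
    "S65T":  ("CCCCCCCCCCHHHHHHHHHHH", "CCCCCCCHHHHHHHHHHHHHH", "CCC-CCCC-CHHHHHHHHHHH"),
    "R356W": ("CCEEEEEEECCCCCCEEEEEE", "HHHHHHHHHHCCCCCCCCHHH", "CEEEEEEEE---CCC-EEEEE"),
    "V316I": ("HHHCCCCHHHHHHCCCCCCCC", "HHHHCCCCCEEEECCCCCCCC", "HHHHHH-HHHHHHHC-CC---"),
    "P323T": ("HHHHHHCCCCCCCCCEEEEEC", "CCEEEECCCCCCCCCCEECCC", "HHHHHHHC-CC----EEEE-C"),
    "P40S":  ("CCCCCCCCCCCCCCCCCCCCC", "HHCCCCCCCCCCCHHHHHHEE", "---CC--CC--EEEECHHHHC"),
    "R118H": ("CCCCCCCCHHHHHHHHHHCCC", "CCCCCCHHHHHHHHHHHCCCC", "-CCC-CCHHHHHHHHHHHCC-"),
    "A143T": ("EECCCCCCCCCCCCCCCCHHH", "EEECCCCCCCCCCCCCCCCCC", "EEECCCE-CCCCE--CCCHHH"),
}


def center_state(window):
    """Secondary structure state of the mutated residue (middle of the window)."""
    return window[len(window) // 2]


def calls_with_state(state):
    """List (SNP, method) pairs whose mutated residue is assigned the given state."""
    return [(snp, method)
            for snp, windows in WINDOWS.items()
            for method, window in zip(METHODS, windows)
            if center_state(window) == state]


if __name__ == "__main__":
    print(calls_with_state("E"))
```

Only one of the thirty assignments places a sheet at a mutated residue, which is the "(almost) no sheets" observation made above.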
Substitution matrices
<figtable id="tab:Subsmatr">
Substitution values for all SNPs, assigned by the three substitution matrices BLOSUM62, PAM1 and PAM250.
SNP | Value BLOSUM62 | Value PAM1 | Value PAM250 |
---|---|---|---|
Q279E | 3 | 27 | 2 |
N215S | 1 | 20 | 1 |
I289V | 4 | 33 | 4 |
S65T | 2 | 38 | 1 |
R356W | -4 | 8 | 2 |
V316I | 4 | 57 | 4 |
P323T | -2 | 4 | 0 |
P40S | -1 | 12 | 1 |
R118H | 0 | 10 | 2 |
A143T | 0 | 32 | 1 |
</figtable>
Since the PAM1 and the PAM250 matrices are designed for proteins at very different degrees of kinship (about 99% and ~20% relationship, respectively), the two matrices tend to give contradictory scores for how likely a substitution is. BLOSUM62 and PAM1, on the other hand, usually provide similar predictions, even though the BLOSUM62 matrix was created from sequences with less than 62 percent identity.
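One way to make the agreement between the matrices concrete is to correlate the scores in the table. This is only an illustrative sketch: the Pearson correlation is our choice for this example and was not part of the original analysis, and with only ten SNPs it should be read as a rough indication.

```python
from math import sqrt

# SNP: (BLOSUM62, PAM1, PAM250), values copied from the table above
SCORES = {
    "Q279E": (3, 27, 2),  "N215S": (1, 20, 1), "I289V": (4, 33, 4),
    "S65T":  (2, 38, 1),  "R356W": (-4, 8, 2), "V316I": (4, 57, 4),
    "P323T": (-2, 4, 0),  "P40S":  (-1, 12, 1), "R118H": (0, 10, 2),
    "A143T": (0, 32, 1),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Split the table into one list per matrix
blosum62, pam1, pam250 = (list(column) for column in zip(*SCORES.values()))

if __name__ == "__main__":
    print("BLOSUM62 vs PAM1:", round(pearson(blosum62, pam1), 2))
    print("PAM1 vs PAM250:  ", round(pearson(pam1, pam250), 2))
```

For the ten SNPs in the table, the BLOSUM62 and PAM1 scores correlate more strongly than the PAM1 and PAM250 scores, in line with the statement above.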
PSSM
conservation of | P40S | S65T | R118H | A143T | N215S | Q279E | I289V | V316I | P323T | R356W |
---|---|---|---|---|---|---|---|---|---|---|
wild type | 81% | 16% | 8% | 11% | 6% | 12% | 17% | 29% | 15% | 15% |
mutant type | 2% | 20% | 2% | 6% | 4% | 38% | 11% | 19% | 7% | 4% |
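The PSSM percentages can be compared directly to find positions where the mutant residue is seen more often than the wild-type residue. A minimal sketch, with the values copied from the table and the function name `mutant_favoured` chosen for illustration:

```python
# SNP: (wild-type conservation in %, mutant conservation in %), from the table above
PSSM = {
    "P40S": (81, 2),  "S65T": (16, 20),  "R118H": (8, 2),   "A143T": (11, 6),
    "N215S": (6, 4),  "Q279E": (12, 38), "I289V": (17, 11), "V316I": (29, 19),
    "P323T": (15, 7), "R356W": (15, 4),
}


def mutant_favoured(table):
    """SNPs where the mutant residue is more frequent than the wild type."""
    return [snp for snp, (wt, mt) in table.items() if mt > wt]


if __name__ == "__main__":
    print(mutant_favoured(PSSM))
```

For the values above this selects S65T and Q279E; at all other positions the wild-type residue is the more frequent one.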
Multiple sequence alignment
- And another step closer to evolution: Identify all mammalian homologous sequences. Create a multiple sequence alignment for them with a method of your choice. Using this you can now calculate conservation for WT and mutant residues again. Compare this to the matrix- and PSSM-derived results.
Scoring methods
SIFT
<figtable id="tab:Sift"> Sift Scores
SNP | Prediction | Sift Score | Sequences represented at this position |
---|---|---|---|
P40S | AFFECT PROTEIN FUNCTION | 0.00 | 41 |
S65T | AFFECT PROTEIN FUNCTION | 0.01 | 45 |
R118H | TOLERATED | 0.06 | 48 |
A143T | AFFECT PROTEIN FUNCTION | 0.01 | 48 |
N215S | AFFECT PROTEIN FUNCTION | 0.01 | 48 |
Q279E | AFFECT PROTEIN FUNCTION | 0.00 | 48 |
I289V | AFFECT PROTEIN FUNCTION | 0.05 | 48 |
V316I | TOLERATED | 0.75 | 48 |
P323T | AFFECT PROTEIN FUNCTION | 0.01 | 48 |
R356W | AFFECT PROTEIN FUNCTION | 0.01 | 47 |
</figtable>
Median sequence conservation: 2.99
Polyphen2
<figtable id="tab:Polyphen"> Polyphen Scores
SNP | rs ID | Sec Struc | Prediction | pph2 Class | pph2 Prob | pph2 FPR | pph2 TPR | pph2 FDR |
---|---|---|---|---|---|---|---|---|
Q279E | rs28935485 | H | probably damaging | deleterious | 0.983 | 0.0387 | 0.745 | 0.0657 |
N215S | rs28935197 | . | benign | neutral | 0.048 | 0.167 | 0.941 | 0.194 |
I289V | ? | H | probably damaging | deleterious | 0.975 | 0.0436 | 0.762 | 0.072 |
S65T | ? | . | probably damaging | deleterious | 0.995 | 0.0277 | 0.681 | 0.0521 |
R356W | ? | . | probably damaging | deleterious | 1 | 0.00026 | 0.00018 | 0.0109 |
V316I | ? | H | benign | neutral | 0.308 | 0.113 | 0.904 | 0.144 |
P323T | ? | T | possibly damaging | deleterious | 0.612 | 0.091 | 0.872 | 0.124 |
P40S | ? | . | probably damaging | deleterious | 1 | 0.00026 | 0.00018 | 0.0109 |
R118H | ? | H | benign | neutral | 0.015 | 0.209 | 0.956 | 0.229 |
A143T | ? | T | probably damaging | deleterious | 1 | 0.00026 | 0.00018 | 0.0109 |
</figtable>
SNAP
- SNAP is installed on the VirtualBox and should be used command-line only. -- As blast is the bottleneck of SNAP, and you are doing that anyway, we might as well look at all possible substitutions in the position of our mutations. This way we can learn much more about the nature of the given mutation: Is our mutation problematic because we introduce an unwanted effect, or because the WT residue is essential and by mutating we remove that?
Results and Conclusion
<figtable id="tab:Overview">
This table gives an overview of all features examined in the sections above. A red background color indicates a disease-causing prediction, a green one a non-disease-causing prediction. Finally, all red fields are summed up for each row, and the resulting value leads to our prediction given in <xr id="tab:Result"/>.
Used abbreviation in this table:
AA: Amino Acid, Pol: Side-chain polarity, Charge: Side-chain charge at pH 7.4, HI: Hydropathy index, RM: Residue Mass, iP: isoelectric point
SNP | change in Pol | change in Charge | change in HI | change in RM | change in iP | SecStruc Psipred | SecStruc Reprof | SecStruc DSSP | Value BLOSUM62 | Value PAM1 | Value PAM250 | Sift Score | pph2 Class | Sum bad scores |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A143T | nonpolar to polar | none | -2.5 | 30.026 | -0.41 | C | C | C | 0 | 32 | 1 | 0.01 | deleterious | 6 |
R356W | polar to nonpolar | positive to neutral | 3.6 | 30.025 | -4.87 | C | C | - | -4 | 8 | 2 | 0.01 | deleterious | 9 |
I289V | none | none | -0.3 | -14.027 | -0.05 | H | H | C | 4 | 33 | 4 | 0.05 | deleterious | 5 |
V316I | none | none | 0.3 | 14.027 | 0.05 | H | E | H | 4 | 57 | 4 | 0.75 | neutral | 4 |
R118H | none | positive to pos(10%), neutr(90%) | 1.3 | -19.046 | -3.16 | H | H | H | 0 | 10 | 2 | 0.06 | neutral | 8 |
N215S | none | none | 2.7 | -27.026 | 0.27 | H | H | H | 1 | 20 | 1 | 0.01 | neutral | 7 |
Q279E | none | neutral to negative | 0 | 0.99 | -2.5 | H | H | H | 3 | 27 | 2 | 0.00 | deleterious | 7 |
P40S | nonpolar to polar | none | 0.8 | -10.039 | -0.62 | C | C | - | -1 | 12 | 1 | 0.00 | deleterious | 7 |
S65T | none | none | 0.1 | 14.027 | -0.08 | H | H | H | 2 | 38 | 1 | 0.01 | deleterious | 7 |
P323T | nonpolar to polar | none | 0.9 | 3.988 | -0.7 | C | C | C | -2 | 4 | 0 | 0.01 | deleterious | 6 |
</figtable>
<figtable id="tab:Result">
All examined SNPs with the "sum bad score" according to <xr id="tab:Overview"/> and our resulting prediction. The table also shows whether the SNP truly is disease causing and whether our sequence-based prediction is correct.
SNP | Sum bad scores | Prediction | True classification | Result prediction |
---|---|---|---|---|
A143T | 6 | Non-disease causing | Disease causing | Wrong |
R356W | 9 | Disease causing | Disease causing | Right |
I289V | 5 | Non-disease causing | Non-disease causing | Right |
V316I | 4 | Non-disease causing | Non-disease causing | Right |
R118H | 8 | Disease causing | Non-disease causing | Wrong |
N215S | 7 | Disease causing | Disease causing | Right |
Q279E | 7 | Disease causing | Disease causing | Right |
P40S | 7 | Disease causing | Disease causing | Right |
S65T | 7 | Disease causing | Disease causing | Right |
P323T | 6 | Non-disease causing | Non-disease causing | Right |
</figtable>
In <xr id="tab:Overview"/> we list all previously gathered information in condensed form and highlight in red the values that we consider indicators of a disease-causing mutation. Results in green fields are considered neutral. The summed score of disease indicators is shown again in <xr id="tab:Result"/> along with our prediction for each SNP. Mutations with a score smaller than 7 are considered neutral, those with a score of 7 or greater disease causing. Next to the prediction we show the true classification of the single nucleotide polymorphism according to the mapping we did in Task 6. Only two of the ten predictions are wrong, which we consider a surprisingly good result.
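The decision rule and its evaluation can be reproduced in a few lines. This is a minimal sketch using the summed scores and true classifications from the result table; the names `predict` and `evaluate` are chosen for illustration.

```python
THRESHOLD = 7  # a sum of bad scores of 7 or more is predicted as disease causing

# SNP: (sum of bad scores, truly disease causing?), copied from the result table
RESULTS = {
    "A143T": (6, True),  "R356W": (9, True),  "I289V": (5, False),
    "V316I": (4, False), "R118H": (8, False), "N215S": (7, True),
    "Q279E": (7, True),  "P40S": (7, True),   "S65T": (7, True),
    "P323T": (6, False),
}


def predict(score, threshold=THRESHOLD):
    """Return True if the summed score predicts a disease-causing mutation."""
    return score >= threshold


def evaluate(results):
    """Return (accuracy, list of wrongly predicted SNPs)."""
    wrong = [snp for snp, (score, truth) in results.items()
             if predict(score) != truth]
    return (len(results) - len(wrong)) / len(results), wrong


if __name__ == "__main__":
    accuracy, wrong = evaluate(RESULTS)
    print(f"accuracy: {accuracy:.0%}, wrong: {wrong}")
```

With the threshold of 7 this gives eight correct predictions out of ten; the two misclassified SNPs are A143T (a missed disease-causing mutation) and R118H (a neutral SNP predicted as disease causing).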
References
<references/>