Task 6: MSUD - Sequence-based mutation analysis
- 1 Sequence-based mutation analysis
- 1.1 Task description
- 1.2 Comparison of wild type AA and mutant AA
- 1.3 PSI-BLAST - PSSM // homolog MSA
- 1.4 Tools
- 2 Discussion
- 3 References
Sequence-based mutation analysis
For this task, the group that reviewed our page last week was supposed to choose 10 SNPs for us to work on. We assume we do not know which of the SNPs cause MSUD or affect the protein's structure or function. As we had not received any message from the reviewing group by Thursday, we decided to choose the 10 SNPs ourselves: L17F, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, R429H, N404S
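The SNP labels above follow the usual pattern of wild type residue, sequence position, mutant residue. A minimal sketch of how such labels could be split into their parts for the later steps (the names `Substitution` and `parse_substitution` are ours, introduced only for illustration):

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based position in the protein sequence
    mutant: str     # one-letter code of the replacing residue


# One uppercase letter, digits, one uppercase letter, e.g. "Q125E".
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'Q125E' into wild type, position and mutant."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


SNPS = "L17F, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, R429H, N404S"
substitutions = [parse_substitution(label) for label in SNPS.split(",")]

for sub in substitutions:
    print(sub.position, sub.wild_type, "->", sub.mutant)
```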
Comparison of wild type AA and mutant AA
Physicochemical properties<ref>http://en.wikipedia.org/wiki/Amino_acid</ref> and predicted secondary structure element (reprof with PSSM; H = helix, E = sheet, L = loop)
|Position||wt-AA||wt properties||mutant-AA||mutant properties||expected effect on protein||sec-struct-element( reprof )|
|17||L||Hydrophobic side chain, non-polar, charge: neutral||F||Hydrophobic side chain, non-polar, charge: neutral||-||L|
|82||M||Hydrophobic side chain, non-polar, charge: neutral||I||Hydrophobic side chain, non-polar, charge: neutral||-||E|
|125||Q||polar, charge: neutral||E||polar, charge: negative||+||L|
|213||I||Hydrophobic side chain, non-polar, charge: neutral||T||polar, charge: neutral||+||H|
|258||C||polar, charge: neutral||Y||polar, charge: neutral||-||L|
|310||T||polar, charge: neutral||R||polar, charge: positive||+||H|
|328||A||non-polar, charge: neutral||T||polar, charge: neutral||-||E|
|361||I||Hydrophobic side chain, non-polar, charge: neutral||V||non-polar, charge: neutral||-||H|
|404||N||polar, charge: neutral||S||polar, charge: neutral||-||L|
|429||R||polar, charge: positive||H||polar, charge: positive (10%), neutral (90%)||+||H|
Blosum62, PAM1, PAM250 scores
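The scores for an amino acid exchange in the BLOSUM and PAM matrices give another hint as to whether the substitution affects the resulting protein. The higher the X in BLOSUMX, the more similar (evolutionarily closer) the sequences it was derived from, while in PAMX the opposite is the case: a higher X corresponds to a larger evolutionary distance. In both families a low or negative score marks an exchange that is rarely observed and is therefore more likely to be disruptive.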
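As an illustration of such a lookup, the sketch below ranks our ten substitutions by their BLOSUM62 score. To stay self-contained it hard-codes only the ten pairs we need. The values were transcribed by hand from the standard BLOSUM62 matrix and should be checked against a full copy of the matrix before being relied on. The PAM scores are not included.

```python
# BLOSUM62 scores for the ten wild type / mutant pairs only.
# Assumption: transcribed by hand from the standard BLOSUM62 matrix;
# verify against a full copy of the matrix before reuse.
# frozenset keys make the lookup symmetric (L->F equals F->L).
BLOSUM62_SUBSET = {
    frozenset("LF"): 0,
    frozenset("MI"): 1,
    frozenset("QE"): 2,
    frozenset("IT"): -1,
    frozenset("CY"): -2,
    frozenset("TR"): -1,
    frozenset("AT"): 0,
    frozenset("IV"): 3,
    frozenset("RH"): 0,
    frozenset("NS"): 1,
}


def blosum62_score(wild_type: str, mutant: str) -> int:
    """Return the BLOSUM62 score for one of the ten hard-coded pairs."""
    return BLOSUM62_SUBSET[frozenset((wild_type, mutant))]


SNPS = ["L17F", "M82I", "Q125E", "I213T", "C258Y",
        "T310R", "A328T", "I361V", "R429H", "N404S"]

# First and last character of each label are wild type and mutant residue.
scores = {snp: blosum62_score(snp[0], snp[-1]) for snp in SNPS}

# Lowest score first: these exchanges are the least frequently observed.
ranked = sorted(scores.items(), key=lambda item: item[1])
for snp, score in ranked:
    print(f"{snp}: {score:+d}")
```

In this subset C258Y receives the lowest score (-2) and I361V the highest (+3).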
We used PyMOL's Mutagenesis Wizard to visualize the individual mutations. In each image the carbon chain of the original residue is coloured green and that of the SNP is coloured pink. Oxygen atoms are coloured red, nitrogen blue and sulphur yellow for both residues. The L17F SNP is not displayed because position 17 is not contained in the PDB file we used (1DTW). The changes that are easiest to spot and look like they might have an effect are C258Y, T310R and R429H. In the case of the C258Y mutation the protein loses a sulphur atom and gains an aromatic ring, while the T310R mutation greatly increases the size of the amino acid. In the case of the R429H mutation the protein again gains an aromatic ring. While the mutant residue in Q125E is similar in size to the original amino acid, the change in charge is very likely to have an effect on the protein.
PSI-BLAST - PSSM // homolog MSA
The command we used to create the PSSM file, the file itself, and the MSA can be found in the journal.
The underlying database is big_db. To interpret these values better, we should find all mammalian homologs of the gene, build an MSA from them, and calculate the conservation of the wild type and mutant residues in this context. We used HomoloGene<ref>http://www.ncbi.nlm.nih.gov/homologene</ref> from NCBI for this task. Because the search returned no mammalian homologs, we used all results from the search, which can be found at <ref>http://www.ncbi.nlm.nih.gov/sites/homologene/569</ref>.
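The conservation of the wild type and mutant residue at one position can be read off the corresponding MSA column as a relative frequency. The sketch below shows the idea on a toy alignment that is invented purely for illustration and is not our HomoloGene alignment. The function names are ours. Note that the column index refers to the alignment, so a sequence position first has to be mapped to its alignment column when the reference sequence contains gaps.

```python
from collections import Counter


def column_frequencies(alignment, column):
    """Relative residue frequencies in one alignment column, ignoring gaps."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    total = len(residues)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(residues).items()}


def conservation(alignment, column, wild_type, mutant):
    """Return (wild type frequency, mutant frequency) for one column."""
    freqs = column_frequencies(alignment, column)
    return freqs.get(wild_type, 0.0), freqs.get(mutant, 0.0)


# Hypothetical toy alignment (equal-length rows, '-' marks a gap).
toy_msa = [
    "MKQLA",
    "MKELA",
    "MRQLA",
    "M-QIA",
]

# Column 2 (0-based) contains Q, E, Q, Q.
wt_freq, mut_freq = conservation(toy_msa, 2, "Q", "E")
print(f"wild type Q: {wt_freq:.2f}, mutant E: {mut_freq:.2f}")
```

A mutant residue that never occurs in the column while the wild type dominates it points towards a conserved position. A mutant that is already seen in homologs is more likely to be tolerated.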
A visual comparison of the predictions can be seen in Figure 1. For the comparison we only used results from SNAP2. The first observation when comparing the three predictions is that the number of SNPs predicted to have a negative effect varies greatly between the tools. While PolyPhen predicts seven of the SNPs to be deleterious, SIFT predicts five and SNAP only four to impact protein function. All three methods predicted the SNPs Q125E, T310R and C258Y to have a negative effect, which increases our confidence in the quality of these predictions. L17F was predicted to be deleterious by PolyPhen and SIFT, and I213T by PolyPhen and SNAP, which again increases our confidence in these predictions. None of the tools predicted I361V or N404S to be deleterious, and the R429H, M82I and A328T SNPs were each predicted to be deleterious by only one tool (either SIFT or PolyPhen), which makes a negative effect less likely.
|SNP||sift-prediction||sift-score||sift-conservation||sequences represented at pos|
|L17F||AFFECT PROTEIN FUNCTION||0.00||4.32||1|
|Q125E||AFFECT PROTEIN FUNCTION||0.02||3.02||19|
|C258Y||AFFECT PROTEIN FUNCTION||0.02||3.02||19|
|T310R||AFFECT PROTEIN FUNCTION||0.01||3.02||19|
|R429H||AFFECT PROTEIN FUNCTION||0.03||3.21||15|
Given that all three prediction methods agree, we will assume the mutations Q125E, T310R and C258Y to be deleterious. As neither I361V nor N404S was predicted to be deleterious by a single tool, we will assume them to be neutral.
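This consensus rule can be written down as a simple vote count. In the sketch below the per-tool sets are reconstructed from the comparison above and the SIFT table. Which tool flagged M82I and A328T is our inference from the stated totals (seven for PolyPhen, five for SIFT, four for SNAP), so treat these sets as an assumption rather than as raw tool output. The function names are ours.

```python
SNPS = ["L17F", "M82I", "Q125E", "I213T", "C258Y",
        "T310R", "A328T", "I361V", "R429H", "N404S"]

# SNPs each tool predicted to be deleterious.
# Assumption: reconstructed from the text, not copied from tool output.
PREDICTED_DELETERIOUS = {
    "SIFT": {"L17F", "Q125E", "C258Y", "T310R", "R429H"},
    "PolyPhen": {"L17F", "M82I", "Q125E", "I213T", "C258Y", "T310R", "A328T"},
    "SNAP2": {"Q125E", "I213T", "C258Y", "T310R"},
}


def votes(snp: str) -> int:
    """Number of tools that call the SNP deleterious."""
    return sum(snp in calls for calls in PREDICTED_DELETERIOUS.values())


def classify(snp: str) -> str:
    """Accept only unanimous calls; everything else needs a closer look."""
    n = votes(snp)
    if n == len(PREDICTED_DELETERIOUS):
        return "deleterious"
    if n == 0:
        return "neutral"
    return "inspect manually"


for snp in SNPS:
    print(f"{snp}: {votes(snp)}/3 -> {classify(snp)}")
```

Only the unanimous cases are decided automatically. The remaining five SNPs are the ones that need manual inspection.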
For the less certain negative-effect predictions L17F and I213T we tried to classify the SNPs based on the difference in amino acid properties and on visual inspection. For L17F we believe that the change from L to F will not have a major effect on protein function. The reason is that the mutation lies inside a loop, so it is less likely to affect the protein structure and thereby the function. Also, Leu and Phe are similar in both properties and size. The I213T mutation, however, causes a large change in polarity and lies inside a helix. Both of these reasons increase the likelihood that this SNP is non-neutral.
The SNPs M82I, A328T and R429H were all predicted by two methods to have no effect on protein function. While M82I and A328T are both inside a sheet, M82I is not a large change in terms of size or chemical properties. The A328T mutation, however, changes the polarity, so it might have an effect. For R429H we assume that the change to an amino acid with an aromatic ring inside a helix must have some kind of effect on the protein.
The actual results in HGMD for our SNPs:
By simply using the consensus between all three predictions we were already able to correctly classify five of the ten SNPs. For the contradicting predictions it helped in most cases to consider the chemical properties and the size of the amino acids. In the cases of the A328T and R429H mutations we feel that the change in properties is not clear-cut enough to safely call the mutation either deleterious or neutral. Overall, SNAP2 performed best of the prediction methods on our SNPs and predicted nine out of ten mutations correctly, while SIFT and PolyPhen each got only seven out of ten right. It should be kept in mind, however, that our test set is very small and therefore might not be representative of SNPs in general, so applying SNAP2 alone might not be ideal in every situation. Using multiple prediction methods also helps to detect the 'easier' cases where no closer manual inspection is necessary, which is useful if the number of mutations is larger than in our case.