Predicting the Effect of SNPs (PKU)
Short Introduction
This week's task builds on the data gathered last week. We blindly chose five disease-causing and five harmless SNPs and try to predict their effect from the sequence change alone. A detailed task description can be found at the usual place. There is no task journal this week, as we do not consider the mere execution of tools and web services noteworthy and no other programming effort was required.
Our dataset
We propose the following dataset, chosen mostly from well-known SNPs in OMIM. It includes mutations with no reported effect, mutations causing the milder, related hyperphenylalaninemia (reduced activity, but a functional enzyme) and mutations causing phenylketonuria.
SNP | effect | prediction | validation |
---|---|---|---|
GLU76GLY | hyperphenylalaninemia | effect | |
SER87ARG | hyperphenylalaninemia | effect | |
ARG158GLN | disease causing | effect | |
GLN172HIS | non-disease | no effect | |
ARG243GLN | disease causing | effect | |
LEU255SER | disease causing | effect | |
MET276VAL | non-disease | no effect | |
ALA322GLY | hyperphenylalaninemia | no effect | |
GLY337VAL | disease causing | effect | |
ARG408TRP | disease causing | effect | |
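The SNP names in the table follow the pattern wild-type residue, position, mutant residue, with both residues in three-letter code. A minimal Python sketch for splitting such a name is given here; it assumes the standard three-letter to one-letter amino acid codes, and the helper names `parse_snp` and `short_name` are hypothetical and not part of any tool we used.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Three letters, one or more digits, three letters.
SNP_PATTERN = re.compile(r"^([A-Z]{3})(\d+)([A-Z]{3})$")


def parse_snp(name):
    """Split e.g. 'ARG408TRP' into ('R', 408, 'W')."""
    match = SNP_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"not a three-letter SNP name: {name!r}")
    wild, position, mutant = match.groups()
    try:
        return THREE_TO_ONE[wild], int(position), THREE_TO_ONE[mutant]
    except KeyError as err:
        raise ValueError(f"unknown residue code in {name!r}") from err


def short_name(name):
    """Return the compact one-letter form, e.g. 'R408W'."""
    wild, position, mutant = parse_snp(name)
    return f"{wild}{position}{mutant}"


if __name__ == "__main__":
    for snp in ("GLU76GLY", "ARG408TRP"):
        print(snp, "->", short_name(snp))
```

The one-letter form is convenient when a web service expects input such as `R408W` rather than the three-letter name.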
Investigated SNPs
In the following section, we present the information we gathered for each SNP individually. Summary comments on the methods we used are placed on subpages. These include the PsiBlast profile of our protein, the PsiPred predictions for the mutated sequences, the complete alignment of 22 mammalian homologs of PAH, and the SIFT, PolyPhen and SNAP predictions. The conclusions drawn from them are repeated below.
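As a small consistency check on the SIFT results listed for each SNP below, the following Python sketch re-derives the label from the score. It assumes SIFT's commonly documented cutoff of 0.05 (scores at or below it are reported as affecting protein function); this page itself does not state the cutoff, and `sift_label` is a hypothetical helper name.

```python
# Assumption: SIFT's commonly documented cutoff, not stated on this page.
SIFT_CUTOFF = 0.05


def sift_label(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT score to the label used in the per-SNP lists."""
    return "AFFECT PROTEIN FUNCTION" if score <= cutoff else "TOLERATED"


# Scores and labels exactly as reported in the per-SNP lists.
REPORTED = {
    "GLU76GLY": (0.18, "TOLERATED"),
    "SER87ARG": (0.54, "TOLERATED"),
    "ARG158GLN": (0.00, "AFFECT PROTEIN FUNCTION"),
    "GLN172HIS": (0.03, "AFFECT PROTEIN FUNCTION"),
    "ARG243GLN": (0.00, "AFFECT PROTEIN FUNCTION"),
    "LEU255SER": (0.00, "AFFECT PROTEIN FUNCTION"),
    "MET276VAL": (0.06, "TOLERATED"),
    "ALA322GLY": (0.01, "AFFECT PROTEIN FUNCTION"),
    "GLY337VAL": (0.00, "AFFECT PROTEIN FUNCTION"),
    "ARG408TRP": (0.00, "AFFECT PROTEIN FUNCTION"),
}

# SNPs whose reported label disagrees with the assumed cutoff.
mismatches = [
    snp for snp, (score, label) in REPORTED.items()
    if sift_label(score) != label
]

if __name__ == "__main__":
    print("inconsistent entries:", mismatches)
```

Under this assumed cutoff, MET276VAL (0.06) and GLN172HIS (0.03) are the two borderline cases, lying on either side of the threshold.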
GLU76GLY
This mutation changes glutamic acid to glycine. Although the two amino acids differ in structure, as can be seen in <xr id="fig:mutationGLUGLY"/>, we initially expected the change to have a rather minor effect. Glutamic acid is charged under biological conditions and glycine is not, but glycine could serve as a universal substitution, because it is neither hydrophobic nor hydrophilic. Additionally, as it is the smallest amino acid, it cannot cause any steric problems. It might of course be that glycine cannot stabilize a structure that should be present at this residue. We do not have any structural data for this position, but predictions suggest a location either in a helix or at the edge of a sheet structure. We can therefore only rely on the physicochemical properties, and judging by those we would say that the changes are not drastic enough to cause the disease. Had the mutation occurred close to the catalytic site, our judgment would have been different because of the strong affinity of glutamic acid for ions, which plays a major role in PheOH activity. After considering the substitution scores (PAM/BLOSUM) and the other, more thorough methods, however, we have to assume that our first impression is wrong, and we therefore change the prediction to disease causing.
- Amino acid changes: [From negatively charged, polar, strongly hydrophilic, medium sized to neutral, non-polar, non-hydrophilic, small]
- BLOSUM62/PAM250/PAM1: bad replacement
- PSSM: conserved
- Alignment of homologs: some variability
- SIFT: TOLERATED with a score of 0.18.
- PolyPhen: Benign
- SNAP: Non-neutral (low reliability)
<figure id="fig:mutationGLUGLY">
</figure>
SER87ARG
We expect this mutation, which changes serine to arginine, to have a bigger effect on the protein than the one above. Here the strength of the effect depends entirely on the location of the amino acid. Both amino acids are hydrophilic, but as arginine is one of the snorkeling amino acids, owing to its rather hydrophobic stem, the changes can be rather serious (<xr id="fig:mutationSERARG"/>). The residue is predicted to lie two positions N-terminal of a helix. Given the overall change in size, the change from an uncharged polar side chain to a positively charged one, and the fact that arginine is a rather rarely used amino acid (mostly appearing in the catalytic domain for phosphorylated substrates), we would say that this is rather a disease-causing mutation.
- Amino acid changes: [From neutral, polar, slightly hydrophilic to pos. charged, polar, strongly hydrophilic]
- BLOSUM62/PAM250/PAM1: neutral or good replacement.
- PSSM: residue change appears often
- Alignment of homologs: some variability
- SIFT: TOLERATED with a score of 0.54.
- PolyPhen: Benign
- SNAP: Non-neutral (low reliability)
<figure id="fig:mutationSERARG">
</figure>
ARG158GLN
From the structures of the two amino acids (<xr id="fig:mutationSERARG"/>, left side), one would guess that if an arginine fits in this region, a glutamine will fit there as well. But in the close-up on the right side of the figure there are some small red discs, which indicate steric collisions. These are due to the oxygen atom of the glutamine amide group, at a position where the original amino acid only had hydrogen atoms. Depending on the importance of this site and of the collisions, the mutation can have almost no effect or a very big one. Since the side chain collides in the interior of the protein, we predict a rather serious effect.
- Amino acid changes: [From pos. charged, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic]
- BLOSUM62/PAM250/PAM1: neutral replacement
- PSSM: residue change appears often
- Alignment of homologs: some variability
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
- PolyPhen: Probably Damaging
- SNAP: Non-neutral (high reliability)
<figure id="fig:mutationSERARG">
</figure>
GLN172HIS
This mutation looks rather easy at first sight. The general chemical properties are the same; the amino acids only differ in size and structure. As one can see in <xr id="mutationGLNHIS"/> on the right, there are some collisions with the structure. But because this amino acid is located on the outside of the protein, in a coiled region that only interferes with another coiled region, we are quite sure that this mutation is not disease causing.
- Amino acid changes: [From neutral, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic, ring-structure]
- BLOSUM62/PAM250/PAM1: neutral to good replacement
- PSSM: residue change appears often
- Alignment of homologs: highly conserved
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.03.
- PolyPhen: Benign
- SNAP: Neutral (high reliability)
<figure id="mutationGLNHIS">
</figure>
ARG243GLN
This mutation changes a positively charged arginine residue in a beta sheet to a neutral glutamine that is otherwise quite similar in its properties. The scores in the scoring matrices reflect this, yet the position is highly conserved in the PSSM and in the alignment. This, plus the strong agreement of the prediction methods (maximum scores in SIFT and PolyPhen, fourth highest expected accuracy of SNAP), leads us to the conclusion that this SNP is disease causing.
- Amino acid changes: From pos. charged, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic.
- BLOSUM62/PAM250/PAM1: neutral to good replacement
- PSSM: conserved
- Alignment of homologs: highly conserved
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
- PolyPhen: Probably damaging
- SNAP: Non-neutral (medium reliability)
<figure id="fig:mutationARG243GLN">
</figure>
LEU255SER
This mutation is located in a helix spatially near the active center of PheOH, and the change from the non-polar, hydrophobic leucine to the polar, hydrophilic serine with its hydroxyl group might affect the secondary structure. While the PsiBlast search shows some variability, the mismatch scores in the scoring matrices are very low and the alignment of homologs shows strong conservation in this region. The prediction methods agree that this SNP is damaging, and we also judge this mutation to be most likely disease causing.
- Amino acid changes: From neutral, non polar, strongly hydrophobic to neutral, polar, slightly hydrophilic.
- BLOSUM62/PAM250/PAM1: bad replacement
- PSSM: position variable
- Alignment of homologs: highly conserved
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
- PolyPhen: Probably damaging
- SNAP: Non-neutral (low reliability)
<figure id="fig:mutationLEU25SER">
</figure>
MET276VAL
This mutation is located in a coil region at the outside of the protein (see <xr id="fig:mutationMET276VAL" />). While the introduced valine is more hydrophobic than the original methionine, the location of the residue appears to be uncritical enough for the mutation to have no effect on the protein structure. Indeed, the PSSM shows that the mismatch valine is observed more often than the match methionine. The prediction methods agree for once, and we also think that this mutation is likely not disease causing.
- Amino acid changes: From neutral, non-polar, hydrophobic to neutral, non-polar, strongly hydrophobic.
- BLOSUM62/PAM250/PAM1: neutral replacement
- PSSM: residue change appears often
- Alignment of homologs: highly conserved
- SIFT: TOLERATED with a score of 0.06.
- PolyPhen: Benign
- SNAP: Neutral (low reliability)
<figure id="fig:mutationMET276VAL">
</figure>
ALA322GLY
While this residue is highly conserved in the mammalian homologs and in the PSSM, the replacement of alanine with glycine within the helix visible in <xr id="fig:mutationALA322GLY" /> appears possible. Both are small, non-polar amino acids that can take part in helix formation, and the now missing methyl group of alanine is unlikely to have a large effect on residues outside the helix. The prediction methods disagree again, but we judge this mutation to be neutral.
- Amino acid changes: From neutral, non-polar, hydrophobic, small to neutral, non-polar, slightly hydrophilic, small.
- BLOSUM62/PAM250/PAM1: good replacement
- PSSM: conserved
- Alignment of homologs: highly conserved
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.01.
- PolyPhen: Possibly damaging
- SNAP: Neutral (low reliability)
<figure id="fig:mutationALA322GLY">
</figure>
GLY337VAL
While the alignment shows some replacements in the region around GLY337 and the PSSM shows no high conservation, the residue itself is conserved in mammals. The position in the loop between two beta sheets (see <xr id="fig:mutationGLY337VAL" />) makes this likely a critical position, especially as the introduced valine is larger but has to fit in the loop, and is much more hydrophobic but lies at the outside of the protein. The prediction methods disagree on the effect of this SNP, but we deem it more likely that this is a disease causing mutation.
- Amino acid changes: From neutral, non-polar, slightly hydrophilic, small to neutral, non-polar, strongly hydrophobic, medium sized.
- BLOSUM62/PAM250/PAM1: bad replacement
- PSSM: position variable
- Alignment of homologs: some variability
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
- PolyPhen: Possibly damaging
- SNAP: Neutral (low reliability)
<figure id="fig:mutationGLY337VAL">
</figure>
ARG408TRP
Here, the mutation lies in a coil region, but the introduction of the large tryptophan is likely to cause steric clashes and might influence the helix and sheet structures nearby (see <xr id="fig:mutationARG408TRP" />). The predictions agree, and all available information suggests that this mutation has a negative effect.
- Amino acid changes: From pos. charged, polar, strongly hydrophilic, medium sized to neutral, non-polar, slightly hydrophilic, large.
- BLOSUM62/PAM250/PAM1: bad replacement
- PSSM: conserved
- Alignment of homologs: highly conserved
- SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
- PolyPhen: Probably damaging
- SNAP: Non-neutral (medium reliability)
<figure id="fig:mutationARG408TRP">
</figure>
Conclusion
We concentrated on gathering information on the SNPs instead of comparing the individual information-gathering methods, and organized the Wiki accordingly. For a comparison of the methods, larger and more diverse datasets might be more suitable.
We tried to find an intuitive consensus of all the information gathered about our SNPs, since we find it inappropriate to weigh the methods equally in all cases. This approach obviously does not work on a large scale and leaves a lot of room for discussion, but we do not consider this a disadvantage in general. We tried to take as much of the context and of our background knowledge into account as possible, and at least on this small dataset we reached an accuracy of 90%.
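The 90% figure can be re-derived from the dataset table, and the per-SNP tool results allow a comparison with an unweighted majority vote of SIFT, PolyPhen and SNAP. The Python sketch below does both. It rests on two assumptions that are ours for this illustration: hyperphenylalaninemia counts as an effect, and PolyPhen's "possibly damaging" counts as damaging. With only ten SNPs, the per-tool numbers are an illustration rather than a meaningful benchmark.

```python
# Each row: SNP -> (known effect, our prediction, SIFT, PolyPhen, SNAP).
# True = effect / damaging / non-neutral, False = no effect.
# Assumptions: hyperphenylalaninemia counts as an effect, and PolyPhen's
# "possibly damaging" counts as damaging.
CALLS = {
    "GLU76GLY":  (True,  True,  False, False, True),
    "SER87ARG":  (True,  True,  False, False, True),
    "ARG158GLN": (True,  True,  True,  True,  True),
    "GLN172HIS": (False, False, True,  False, False),
    "ARG243GLN": (True,  True,  True,  True,  True),
    "LEU255SER": (True,  True,  True,  True,  True),
    "MET276VAL": (False, False, False, False, False),
    "ALA322GLY": (True,  False, True,  True,  False),
    "GLY337VAL": (True,  True,  True,  True,  False),
    "ARG408TRP": (True,  True,  True,  True,  True),
}


def accuracy(predict):
    """Fraction of SNPs for which predict(row) matches the known effect."""
    hits = sum(predict(row) == row[0] for row in CALLS.values())
    return hits / len(CALLS)


def majority(row):
    """Unweighted vote: at least two of the three tools call an effect."""
    return sum(row[2:]) >= 2


results = {
    "our consensus": accuracy(lambda row: row[1]),
    "SIFT": accuracy(lambda row: row[2]),
    "PolyPhen": accuracy(lambda row: row[3]),
    "SNAP": accuracy(lambda row: row[4]),
    "majority vote": accuracy(majority),
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value:.0%}")
```

Under these assumptions, the only miss of our consensus is ALA322GLY, while the unweighted vote misses GLU76GLY and SER87ARG, the two SNPs for which only SNAP reported an effect.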