Predicting the Effect of SNPs (PKU)

From Bioinformatikpedia

Short Introduction

This week's task builds on the data gathered last week. We blindly chose 5 disease-causing and 5 harmless SNPs and try to predict their effect from the sequence change alone. A detailed task description can be found at the usual place. There is no task journal this week, as we do not deem the mere execution of tools and web services noteworthy and no other programming effort was required.

Our dataset

We propose the following dataset, chosen mostly from well-known SNPs in OMIM. It includes mutations with no reported effect, mutations causing the related but milder hyperphenylalaninemia (reduced activity, but functional enzyme), and mutations causing phenylketonuria.

SNP | effect | prediction | validation
GLU76GLY | hyperphenylalaninemia | effect | True
SER87ARG | hyperphenylalaninemia | effect | True
ARG158GLN | disease causing | effect | True
GLN172HIS | non-disease | no effect | True
ARG243GLN | disease causing | effect | True
LEU255SER | disease causing | effect | True
MET276VAL | non-disease | no effect | True
ALA322GLY | hyperphenylalaninemia | no effect | Wrong
GLY337VAL | disease causing | effect | True
ARG408TRP | disease causing | effect | True
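
The validation column can be recomputed from the other two. The following is a minimal sketch, assuming (as the table does) that hyperphenylalaninemia counts as an effect; the names `DATASET`, `expected_prediction` and `validate` are ours and only serve as illustration.

```python
# Rows of the table above: (SNP, reported phenotype, our prediction).
DATASET = [
    ("GLU76GLY", "hyperphenylalaninemia", "effect"),
    ("SER87ARG", "hyperphenylalaninemia", "effect"),
    ("ARG158GLN", "disease causing", "effect"),
    ("GLN172HIS", "non-disease", "no effect"),
    ("ARG243GLN", "disease causing", "effect"),
    ("LEU255SER", "disease causing", "effect"),
    ("MET276VAL", "non-disease", "no effect"),
    ("ALA322GLY", "hyperphenylalaninemia", "no effect"),
    ("GLY337VAL", "disease causing", "effect"),
    ("ARG408TRP", "disease causing", "effect"),
]


def expected_prediction(phenotype: str) -> str:
    """Assumption: only 'non-disease' SNPs count as 'no effect'."""
    return "no effect" if phenotype == "non-disease" else "effect"


def validate(rows):
    """Map each SNP to True if our prediction matches the reported phenotype."""
    return {snp: pred == expected_prediction(pheno) for snp, pheno, pred in rows}


results = validate(DATASET)
n_correct = sum(results.values())
accuracy = n_correct / len(results)
print(f"{n_correct}/{len(results)} correct, accuracy = {accuracy:.0%}")
```

With the rows as listed, only ALA322GLY is counted as wrong, which corresponds to 9 of 10 correct predictions.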

Investigated SNPs

In the following section, we present the information we gathered for each SNP individually. Summary comments on the methods we used are placed on subpages. These cover the PsiBlast profile of our protein, the PsiPred predictions for the mutated sequences, the complete alignment of 22 mammalian homologs of PAH, and the SIFT, PolyPhen and SNAP predictions; the conclusions are repeated below.
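
The SNP labels on this page follow the pattern wild-type residue, position, mutant residue in three-letter code. Prediction services often expect the one-letter form instead (for example E76G), so a small converter can be handy. The sketch below is only an illustration; `parse_mutation` and `to_short` are hypothetical helper names, not part of any of the tools mentioned.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

MUTATION_RE = re.compile(r"^([A-Z]{3})(\d+)([A-Z]{3})$")


def parse_mutation(label: str):
    """Split a label such as 'GLU76GLY' into ('E', 76, 'G')."""
    match = MUTATION_RE.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a three-letter mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in THREE_TO_ONE or mutant not in THREE_TO_ONE:
        raise ValueError(f"unknown residue code in {label!r}")
    return THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[mutant]


def to_short(label: str) -> str:
    """Convert 'GLU76GLY' to the one-letter form 'E76G'."""
    wild_type, position, mutant = parse_mutation(label)
    return f"{wild_type}{position}{mutant}"


for snp in ("GLU76GLY", "ARG408TRP"):
    print(snp, "->", to_short(snp))
```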

GLU76GLY

Prediction:Rejected.jpg

This mutation changes glutamic acid to glycine. Although the two differ in structure, as can be seen in <xr id="fig:mutationGLUGLY"/>, we initially expected the change to have a rather minor effect. Glutamic acid is charged under biological conditions and glycine is not, but glycine can act as a near-universal substitute because it is neither hydrophobic nor hydrophilic. In addition, as the smallest amino acid, it cannot cause any steric problems. It is possible, however, that glycine cannot stabilize a structure that should be present at this residue. We do not have any structural data for this position, but predictions suggest that it lies either in a helix or at the edge of a sheet. We could therefore only rely on the physicochemical properties, and judged these changes not to be drastic enough to cause the disease. Had the mutation occurred near the catalytic site, our judgment would have been different, given the strong affinity of glutamic acid for ions, which plays a major role in PheOH activity. After considering the substitution scores (PAM/BLOSUM) and the other, more thorough methods, however, we have to assume that our first impression was wrong, and we therefore change the prediction to disease causing.

  • Amino acid changes: [From negatively charged, polar, strongly hydrophilic, medium sized to neutral, non-polar, non-hydrophilic, small]
  • BLOSUM62/PAM250/PAM1: bad replacement
  • PSSM: conserved
  • Alignment of homologs: some variability
  • SIFT: TOLERATED with a score of 0.18.
  • PolyPhen: Benign
  • SNAP: Non-neutral (low reliability)
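
To make the "bad replacement" entry above more concrete, the sketch below looks up the standard BLOSUM62 scores for the substitutions in our dataset. Only the entries needed here are hard-coded, and the cut-off used for the label is our assumption; it is not the rule by which the three matrices were summarized on this page.

```python
# Subset of the standard BLOSUM62 matrix (one-letter codes), limited to
# the substitutions that occur in this dataset.
BLOSUM62_SUBSET = {
    ("E", "G"): -2,  # GLU76GLY
    ("S", "R"): -1,  # SER87ARG
    ("R", "Q"): 1,   # ARG158GLN, ARG243GLN
    ("Q", "H"): 0,   # GLN172HIS
    ("L", "S"): -2,  # LEU255SER
    ("M", "V"): 1,   # MET276VAL
    ("A", "G"): 0,   # ALA322GLY
    ("G", "V"): -3,  # GLY337VAL
    ("R", "W"): -3,  # ARG408TRP
}

# Assumption: scores at or below this value are called "bad".
BAD_CUTOFF = -2


def blosum62(wild_type: str, mutant: str) -> int:
    """Symmetric lookup in the hard-coded subset."""
    if (wild_type, mutant) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(wild_type, mutant)]
    return BLOSUM62_SUBSET[(mutant, wild_type)]


def label(score: int) -> str:
    """Coarse label under the assumed cut-off."""
    return "bad" if score <= BAD_CUTOFF else "neutral or better"


for pair in BLOSUM62_SUBSET:
    score = blosum62(*pair)
    print(f"{pair[0]}->{pair[1]}: {score:+d} ({label(score)})")
```

With this cut-off, the substitutions labeled "bad" are GLU to GLY, LEU to SER, GLY to VAL and ARG to TRP.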

<figure id="fig:mutationGLUGLY">

Amino acids in the first mutation

</figure>

2D structure projection of glutamic acid with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.

2D structure projection of glycine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.


SER87ARG

Prediction: Rejected.jpg

We expect this mutation, which changes serine to arginine, to have a bigger effect on the protein than the one above. How strong the effect is depends entirely on the location of the residue. Both amino acids are hydrophilic, but as arginine is one of the "snorkeling" amino acids because of its rather hydrophobic stem, the changes can be rather serious (<xr id="fig:mutationSERARG"/>). The residue is predicted to lie two positions N-terminal of a helix. Considering the overall change in size, the change in polarity from rather negative to positive, and the fact that arginine is a rather rarely used amino acid (mostly appearing in catalytic domains for phosphorylated substrates), we would say that this is more likely a disease-causing mutation.

  • Amino acid changes: [From neutral, polar, slightly hydrophilic to pos. charged, polar, strongly hydrophilic]
  • BLOSUM62/PAM250/PAM1: neutral or good replacement.
  • PSSM: residue change appears often
  • Alignment of homologs: some variability
  • SIFT: TOLERATED with a score of 0.54.
  • PolyPhen: Benign
  • SNAP: Non-neutral (low reliability)
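
SIFT reports a score between 0 and 1 together with its call. The calls listed on this page are consistent with the commonly documented cut-off of 0.05, below which a substitution is reported as affecting protein function. The sketch below applies that cut-off to the scores from this page; `sift_call` is a hypothetical helper, and the handling of a score of exactly 0.05 is an assumption.

```python
# Assumed cut-off; scores below it are treated as damaging.
SIFT_CUTOFF = 0.05

# SIFT scores as listed for each SNP on this page.
SIFT_SCORES = {
    "GLU76GLY": 0.18, "SER87ARG": 0.54, "ARG158GLN": 0.00,
    "GLN172HIS": 0.03, "ARG243GLN": 0.00, "LEU255SER": 0.00,
    "MET276VAL": 0.06, "ALA322GLY": 0.01, "GLY337VAL": 0.00,
    "ARG408TRP": 0.00,
}


def sift_call(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Translate a SIFT score into the call wording used on this page."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie in [0, 1], got {score}")
    return "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"


calls = {snp: sift_call(score) for snp, score in SIFT_SCORES.items()}
for snp, call in calls.items():
    print(f"{snp}: {SIFT_SCORES[snp]:.2f} -> {call}")
```

Note that MET276VAL, with a score of 0.06, sits just above the assumed cut-off, so its TOLERATED call is a borderline one.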

<figure id="fig:mutationSERARG">

Amino acids in the second mutation

</figure>

2D structure projection of serine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of arginine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.


ARG158GLN

Prediction: Rejected.jpg

From the 2D structures of both amino acids (<xr id="fig:mutationSERARG"/>, left side), one would guess that if an arginine fits in this region, a glutamine will fit there as well. But the close-up on the right side of this figure shows some small red discs, which indicate steric collisions. These are due to the additional oxygen of the amide group at a position where the original amino acid only had hydrogen atoms. Depending on the importance of this site and the severity of the collisions, this mutation can have almost no effect or a very big one. But since the side chain collides in the interior of the protein, we predict a rather serious effect.

  • Amino acid changes: [From pos. charged, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic]
  • BLOSUM62/PAM250/PAM1: neutral replacement
  • PSSM: residue change appears often
  • Alignment of homologs: some variability
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
  • PolyPhen: Probably Damaging
  • SNAP: Non-neutral (high reliability)

<figure id="fig:mutationSERARG">

Amino acids in the third mutation

</figure>

2D structure projection of arginine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of glutamine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated glutamine at residue 158 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 1.7% of mutations according to the PyMol rotamer database.
Mutated glutamine at residue 158 in PheOH. The residue is located in a helix region.


GLN172HIS

Prediction: Approved.jpg

At first sight, this mutation is rather easy to judge. The general chemical properties are the same; the amino acids differ only in size and structure. As one can see in <xr id="mutationGLNHIS"/> on the right, there are some collisions with the structure. But because this residue is located on the outside of the protein, in a coil region that only interferes with another coil region, we are quite sure that this mutation is not disease causing.

  • Amino acid changes: [From neutral, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic, ring-structure]
  • BLOSUM62/PAM250/PAM1: neutral to good replacement
  • PSSM: residue change appears often
  • Alignment of homologs: highly conserved
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.03.
  • PolyPhen: Benign
  • SNAP: Neutral (high reliability)


<figure id="mutationGLNHIS">

Amino acids in the fourth mutation

</figure>

2D structure projection of glutamine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of histidine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated histidine at residue 172 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 18.9% of mutations according to the PyMol rotamer database.
Mutated histidine at residue 172 in PheOH. The residue is located in a coil region.


ARG243GLN

Prediction:Rejected.jpg

This mutation changes a positively charged arginine residue in a beta sheet to an uncharged glutamine that is otherwise quite similar in its properties. The scores in the scoring matrices reflect this, yet the position is highly conserved in the PSSM and the alignment. This, plus the strong agreement of the prediction methods (maximum scores in SIFT and PolyPhen, fourth-highest expected accuracy of SNAP), leads us to the conclusion that this SNP is disease causing.

  • Amino acid changes: From pos. charged, polar, strongly hydrophilic to neutral, polar, strongly hydrophilic.
  • BLOSUM62/PAM250/PAM1: neutral to good replacement
  • PSSM: conserved
  • Alignment of homologs: highly conserved
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
  • PolyPhen: Probably damaging
  • SNAP: Non-neutral (medium reliability)


<figure id="fig:mutationARG243GLN">

Amino acids in the fifth mutation

</figure>

2D structure projection of arginine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of glutamine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated glutamine at residue 243 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 17.3% of mutations according to the PyMol rotamer database.
Mutated glutamine at residue 243 in PheOH. The residue is located in a sheet region.


LEU255SER

Prediction:Rejected.jpg

This mutation is located in a helix spatially near the active center of PheOH, and the change from the non-polar, hydrophobic leucine to the polar, hydrophilic serine with its hydroxyl group might affect the secondary structure. While the PsiBlast search shows some variability, the mismatch scores in the scoring matrices are very low and the alignment of homologs shows strong conservation in this region. The prediction methods agree that this SNP is damaging, and we also judge this mutation to be most likely disease causing.

  • Amino acid changes: From neutral, non polar, strongly hydrophobic to neutral, polar, slightly hydrophilic.
  • BLOSUM62/PAM250/PAM1: bad replacement
  • PSSM: position variable
  • Alignment of homologs: highly conserved
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
  • PolyPhen: Probably damaging
  • SNAP: Non-neutral (low reliability)
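
The shift from hydrophobic to hydrophilic described above can be put into numbers with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is our choice for illustration and not necessarily the scale behind the qualitative labels in the list above; `hydropathy_change` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}


def hydropathy_change(label: str) -> float:
    """Hydropathy of the mutant minus that of the wild type.

    `label` is a three-letter mutation label such as 'LEU255SER';
    a negative result means the residue became less hydrophobic.
    """
    wild_type, mutant = label[:3].upper(), label[-3:].upper()
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type], 1)


for snp in ("LEU255SER", "MET276VAL", "GLY337VAL"):
    print(f"{snp}: {hydropathy_change(snp):+.1f}")
```

On this scale, LEU255SER is a large move towards the hydrophilic side, whereas MET276VAL and GLY337VAL both move towards the hydrophobic side.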


<figure id="fig:mutationLEU25SER">

Amino acids in the sixth mutation
2D structure projection of leucine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of serine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated serine at residue 255 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 42.9% of mutations according to the PyMol rotamer database.
Mutated serine at residue 255 in PheOH. The residue is located in a helix structure.

</figure>


MET276VAL

Prediction:Approved.jpg

This mutation is located in a coil region on the outside of the protein (see <xr id="fig:mutationMET276VAL" />), and while the introduced valine is more hydrophobic than the original methionine, the location of the residue appears uncritical enough for this mutation to have no effect on the protein structure. Indeed, the PSSM shows that the mismatch valine is observed more often than the match methionine. The prediction methods agree for once, and we also consider this mutation likely not to be disease causing.

  • Amino acid changes: From neutral, non-polar, hydrophobic to neutral, non-polar, strongly hydrophobic.
  • BLOSUM62/PAM250/PAM1: neutral replacement
  • PSSM: residue change appears often
  • Alignment of homologs: highly conserved
  • SIFT: TOLERATED with a score of 0.06.
  • PolyPhen: Benign
  • SNAP: Neutral (low reliability)

<figure id="fig:mutationMET276VAL">

Amino acids in the seventh mutation
2D structure projection of methionine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of valine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated valine at residue 276 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 94.5% of mutations according to the PyMol rotamer database.
Mutated valine at residue 276 in PheOH. The residue is located in a coil.

</figure>


ALA322GLY

Prediction:Approved.jpg

While this residue is highly conserved in the mammalian homologs and in the PSSM, the replacement of alanine with glycine within the helix visible in <xr id="fig:mutationALA322GLY" /> appears possible. Both are small, non-polar amino acids compatible with helix formation, and the now missing methyl group of alanine is unlikely to have a large effect on residues outside the helix. The prediction methods disagree again, but we judge this mutation to be neutral.

  • Amino acid changes: From neutral, non-polar, hydrophobic, small to neutral, non-polar, slightly hydrophilic, small.
  • BLOSUM62/PAM250/PAM1: good replacement
  • PSSM: conserved
  • Alignment of homologs: highly conserved
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.01.
  • PolyPhen: Possibly damaging
  • SNAP: Neutral (low reliability)

<figure id="fig:mutationALA322GLY">

Amino acids in the eighth mutation
2D structure projection of alanine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of glycine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated glycine at residue 322 in PheOH. There is only one rotamer, and there cannot be collisions with neighboring residues, as glycine is the smallest amino acid.
Mutated glycine at residue 322 in PheOH. The residue is located in a helix region.

</figure>


GLY337VAL

Prediction:Rejected.jpg

While the alignment shows some replacements in the region around GLY337 and the PSSM shows no high conservation, the residue itself is conserved in mammals. Its position in the loop between two beta sheets (see <xr id="fig:mutationGLY337VAL" />) likely makes it a critical position, especially as the introduced valine is larger but has to fit into the loop, and is much more hydrophobic but lies on the outside of the protein. The prediction methods disagree on the effect of this SNP, but we deem it more likely that this is a disease-causing mutation.

  • Amino acid changes: From neutral, non-polar, slightly hydrophilic, small to neutral, non-polar, strongly hydrophobic, medium sized.
  • BLOSUM62/PAM250/PAM1: Bad replacement
  • PSSM: position variable
  • Alignment of homologs: some variability
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
  • PolyPhen: Possibly damaging
  • SNAP: Neutral (low reliability)

<figure id="fig:mutationGLY337VAL">

PyMol rendering of the GLY337VAL mutation.
2D structure projection of glycine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of valine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated valine at residue 337 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 2.9% of mutations according to the PyMol rotamer database.
Mutated valine at residue 337 in PheOH. The residue is located in a loop between two sheet elements.

</figure>


ARG408TRP

Prediction:Rejected.jpg

Here, the mutation lies in a coil region, but the introduction of the large tryptophan is likely to cause steric clashes and might influence the helix and sheet structures nearby (see <xr id="fig:mutationARG408TRP" />). The predictions agree, and all available information suggests that this mutation has a negative effect.


  • Amino acid changes: From pos. charged, polar, strongly hydrophilic, medium sized to neutral, non-polar, slightly hydrophilic, large.
  • BLOSUM62/PAM250/PAM1: Bad replacement
  • PSSM: conserved
  • Alignment of homologs: highly conserved
  • SIFT: AFFECT PROTEIN FUNCTION with a score of 0.00.
  • PolyPhen: Probably damaging
  • SNAP: Non-neutral (medium reliability)

<figure id="fig:mutationARG408TRP">

PyMol rendering of the ARG408TRP mutation.
2D structure projection of arginine with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
2D structure projection of tryptophan with the pK values for each group. For easier reference, the C atoms are labeled according to the common nomenclature.
Close-up of the mutated tryptophan at residue 408 in PheOH. The best-fitting rotamer was chosen to minimize severe or smaller collisions (red to green discs) with neighboring residues. This rotamer appears in 2.3% of mutations according to the PyMol rotamer database.
Mutated tryptophan at residue 408 in PheOH. The residue is located in a coil near a helix region.

</figure>


Conclusion

We concentrated on gathering information on the SNPs instead of comparing the individual information-gathering methods, and organized the Wiki accordingly. For a comparison of the methods, larger and more diverse datasets might be more suitable.

We tried to find an intuitive consensus of all the information gathered about our SNPs, since we find it inappropriate to weigh the methods equally in all cases. This approach obviously does not work on a large scale and leaves a lot of room for discussion, but we do not consider this a disadvantage in general. We tried to take as much of the context and of our background knowledge into account as possible, and at least on this small dataset we reached an accuracy of 90%.
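
For comparison, weighing the three prediction methods equally can be sketched as a simple majority vote over the calls listed on this page. This is only an illustrative baseline under stated assumptions: "Possibly damaging" is counted as a vote for an effect, the reliability annotations of SNAP are ignored, and hyperphenylalaninemia counts as an effect. The names `TOOL_CALLS` and `majority_vote` are ours.

```python
# (SIFT, PolyPhen, SNAP) calls as listed for each SNP on this page.
TOOL_CALLS = {
    "GLU76GLY": ("TOLERATED", "Benign", "Non-neutral"),
    "SER87ARG": ("TOLERATED", "Benign", "Non-neutral"),
    "ARG158GLN": ("AFFECT PROTEIN FUNCTION", "Probably damaging", "Non-neutral"),
    "GLN172HIS": ("AFFECT PROTEIN FUNCTION", "Benign", "Neutral"),
    "ARG243GLN": ("AFFECT PROTEIN FUNCTION", "Probably damaging", "Non-neutral"),
    "LEU255SER": ("AFFECT PROTEIN FUNCTION", "Probably damaging", "Non-neutral"),
    "MET276VAL": ("TOLERATED", "Benign", "Neutral"),
    "ALA322GLY": ("AFFECT PROTEIN FUNCTION", "Possibly damaging", "Neutral"),
    "GLY337VAL": ("AFFECT PROTEIN FUNCTION", "Possibly damaging", "Neutral"),
    "ARG408TRP": ("AFFECT PROTEIN FUNCTION", "Probably damaging", "Non-neutral"),
}

# SNPs listed as non-disease in our dataset; all others have an effect.
NO_EFFECT_SNPS = {"GLN172HIS", "MET276VAL"}

# Assumption: these calls count as a vote for "effect".
DAMAGING_CALLS = {
    "affect protein function", "probably damaging",
    "possibly damaging", "non-neutral",
}


def majority_vote(calls) -> bool:
    """True if more than half of the tools vote for an effect."""
    votes = sum(call.lower() in DAMAGING_CALLS for call in calls)
    return 2 * votes > len(calls)


predictions = {snp: majority_vote(calls) for snp, calls in TOOL_CALLS.items()}
correct = sum(
    predictions[snp] == (snp not in NO_EFFECT_SNPS) for snp in TOOL_CALLS
)
print(f"majority vote: {correct}/{len(TOOL_CALLS)} correct")
```

Under these assumptions the unweighted vote gets 8 of the 10 SNPs right and misses GLU76GLY and SER87ARG, the two cases where only SNAP votes for an effect.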