Sequence-based mutation analysis (Phenylketonuria)

From Bioinformatikpedia
Revision as of 10:22, 2 August 2013 by Waldraffs (talk | contribs) (Summary)

Summary

In this task, different features based on sequence information are used to predict the effects of SNPs. For this purpose, substitution matrices such as PAM and BLOSUM as well as the properties of the amino acids are used. Furthermore, we take a closer look at the structure to see whether the SNP might disturb a HELIX or STRAND, or whether it lies in a more flexible, uncoiled region of the sequence. Four prediction tools are used as well. Three of them - SIFT<ref name="sift1"> Pauline C Ng & Steven Henikoff (2001): "Predicting Deleterious Amino Acid Substitutions". Genome Res. 11: 863-874 doi:10.1101/gr.176601 </ref>,<ref name="sift2"> Prateek Kumar, Steven Henikoff & Pauline C Ng (2009): "Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm". Nature Protocols 4: 1073-1081 doi:10.1038/nprot.2009.86</ref>, PolyPhen2<ref name="polyphen2"> Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010): "A method and server for predicting damaging missense mutations". Nat Methods 7(4): 248-249 doi:10.1038/nmeth0410-248 </ref> and SNAP<ref name="snap"> Yana Bromberg and Burkhard Rost (2007): "SNAP: predict effect of non-synonymous polymorphisms on function". Nucleic Acids Research, Vol. 35, No. 11: 3823-3835 doi:10.1093/nar/gkm238 </ref> - rely on sequence similarity and on the conservation of amino acids at particular positions in a multiple alignment, complemented by further features. The fourth one is MutationTaster<ref name="mutationTaster"> Schwarz JM, Rödelsperger C, Schuelke M, Seelow D (2010): "MutationTaster evaluates disease-causing potential of sequence alterations". Nat Methods 7(8): 575-576 doi:10.1038/nmeth0810-575 </ref>, which integrates information from different biomedical databases.
Combining all these features, we make a consensus prediction for each of the ten substitutions and compare it with the information given in the HGMD and dbSNP databases.

Mutation dataset

For the analysis of sequence-based mutations we randomly chose ten SNPs that lead to an amino acid exchange from HGMD and dbSNP (<xr id="snps"/>). Detailed information about the SNPs can be found in the Lab journal. Here we act as if we did not know the effect of the SNPs. <figtable id="snps">

SNPs
AA - three letter AA - one letter Nucleotides
Ala259Val A259V C776T
Arg123Ile R123I G368T
Gln20Leu Q20L A59T
Gln172His Q172H G516T
Gly103Ser G103S G307A
Ile421Thr I421T T1262C
Lys341Thr K341T A1022C
Phe392Ser F392S T1175C
Pro416Gln P416Q C1247A
Thr266Ala T266A A796G
Ten SNPs of PAH annotated in HGMD or dbSNP. The first two columns give the original amino acid, its position and the mutated amino acid, in three-letter and one-letter code respectively.
For example, A259V stands for alanine mutated to valine at position 259. The third column gives the nucleotide exchange that causes the amino acid exchange.

</figtable> As you can see, both the amino acid exchange and the nucleotide exchange are given. This information is used as input for the different prediction tools: SIFT, PolyPhen2 and SNAP take the amino acid exchange, whereas MutationTaster takes the nucleotide exchange.
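The two notations in the table can be cross-checked against each other. The following is a minimal sketch, assuming that nucleotide positions are counted from the first base of the start codon, so that coding position n falls into codon (n + 2) // 3. The function names are hypothetical helpers and not part of any of the prediction tools.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_protein_change(change):
    """Split e.g. 'Ala259Val' into ('A', 259, 'V')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", change)
    if match is None:
        raise ValueError(f"unrecognised protein change: {change!r}")
    wild, position, mutant = match.groups()
    return THREE_TO_ONE[wild], int(position), THREE_TO_ONE[mutant]


def parse_nucleotide_change(change):
    """Split e.g. 'C776T' into ('C', 776, 'T')."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", change)
    if match is None:
        raise ValueError(f"unrecognised nucleotide change: {change!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def codon_of(nt_position):
    """1-based codon number for a 1-based coding position (assumption: CDS numbering)."""
    return (nt_position + 2) // 3


def is_consistent(protein_change, nucleotide_change):
    """True if the nucleotide position lies in the codon of the changed residue."""
    _, residue, _ = parse_protein_change(protein_change)
    _, nt_position, _ = parse_nucleotide_change(nucleotide_change)
    return codon_of(nt_position) == residue


# Three rows taken from the table above.
for protein, nucleotide in [("Ala259Val", "C776T"),
                            ("Arg123Ile", "G368T"),
                            ("Gln20Leu", "A59T")]:
    print(protein, nucleotide, is_consistent(protein, nucleotide))
```

Such a check only confirms that the positions match; whether the base exchange really yields the stated amino acid additionally requires the coding sequence and a codon table.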

Analyze SNPs

The complete results of the prediction tools can be found in the Lab journal.
Looking at the substitution matrices, the substitutions analyzed in this task never receive the worst possible values.
Because the structure 2pah begins at position 118, the substitutions at positions 20 and 103 cannot be shown with PyMOL. A combination of the other eight mutations can be seen in <xr id="all_mutations"/>. None of these residues participates in forming the binding site, so it is not disturbed by any of the substitutions examined here.

<figure id ="all_mutations">

All eight substitutions shown in the catalytic domain of 2pah (green), where the original amino acids are shown in yellow and the mutated ones in purple.

</figure>
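The statement about the substitution matrices can be illustrated with a small lookup. This is only a sketch: the text does not say which PAM or BLOSUM variants or which cutoffs were used, so BLOSUM62 and a cutoff of 0 are assumptions made here for illustration. Only the ten matrix entries needed for the SNPs of this task are included.

```python
# BLOSUM62 entries for the ten substitutions of this task.
# Assumption: BLOSUM62 is used for illustration; the original analysis
# does not state which matrix variant was consulted.
BLOSUM62_SUBSET = {
    ("A", "V"): 0, ("R", "I"): -3, ("Q", "L"): -2, ("Q", "H"): 0,
    ("G", "S"): 0, ("I", "T"): -1, ("K", "T"): -1, ("F", "S"): -2,
    ("P", "Q"): -1, ("T", "A"): 0,
}

# Lowest score that occurs anywhere in the full BLOSUM62 matrix.
BLOSUM62_MIN = -4


def rate_substitution(wild, mutant, cutoff=0):
    """Return (score, label); scores below the assumed cutoff count as 'low'."""
    score = BLOSUM62_SUBSET[(wild, mutant)]
    label = "middle" if score >= cutoff else "low"
    return score, label


for (wild, mutant) in BLOSUM62_SUBSET:
    score, label = rate_substitution(wild, mutant)
    print(f"{wild}->{mutant}: {score:+d} ({label})")
```

Under these assumptions none of the ten exchanges reaches the matrix minimum of -4, and the low/middle labels coincide with the "Substitution Matrices" rows given for the individual SNPs below.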

Ala-259-Val

<figure id="A259V">

Mutation of alanine (yellow) to valine (purple) at position 259 of 2pah (green).

</figure>

Properties alanine is small, non-polar, neutral and hydrophobic
valine is aliphatic, small, non-polar, neutral and hydrophobic
Structure LOOP
Substitution Matrices middle ranged value → neutral substitution
PSSM alanine seems to be highly conserved with a value of 70. Valine has a value of 1.
SIFT TOLERATED with a score of 0.16
PolyPhen2 probably damaging with a score of 1.000
SNAP non-neutral, RI = 2
MutationTaster disease causing
prediction: non-neutral

For the substitution of alanine by valine the properties change very little, as the only difference is that valine is aliphatic. This is reflected in the substitution matrices, where the exchange has a middle-ranged value and therefore indicates a neutral substitution. SIFT also predicts the mutation to be tolerated, although the score of 0.16 is not very high. The PSSM created with PSI-BLAST, however, shows a high conservation of alanine at position 259, whereas valine only has a low value there. Furthermore, the other three prediction tools all predict a non-neutral effect, with PolyPhen2 most notably giving a very high score. SNAP only has a reliability index of 2, but an accuracy of 70%. Looking at the structure itself (<xr id="A259V"/>), the substitution is located in a LOOP, but it is directly adjacent to a HELIX or forms its last residue; MutationTaster predicts that this HELIX is lost if the amino acid is changed. In this case it is hard to decide whether the substitution is neutral, as the features point in both directions. Nevertheless, we predict this mutation to be non-neutral.
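The property comparison used for each SNP amounts to a set difference. A minimal sketch, using only the property lists given in the tables of this page (the dictionary and function names are hypothetical):

```python
# Property sets exactly as listed in the per-SNP tables of this page.
PROPERTIES = {
    "alanine": {"small", "non-polar", "neutral", "hydrophobic"},
    "valine": {"aliphatic", "small", "non-polar", "neutral", "hydrophobic"},
    "arginine": {"positively charged", "polar", "hydrophilic"},
    "isoleucine": {"aliphatic", "neutral", "non-polar", "hydrophobic"},
}


def compare_properties(wild, mutant):
    """Return which properties are shared, lost and gained by the exchange."""
    before, after = PROPERTIES[wild], PROPERTIES[mutant]
    return {
        "shared": before & after,
        "lost": before - after,
        "gained": after - before,
    }


print(compare_properties("alanine", "valine"))
print(compare_properties("arginine", "isoleucine"))
```

For alanine to valine the only change is the gained property "aliphatic", whereas arginine and isoleucine share no property at all, which matches the verbal assessment of the two exchanges.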

Arg-123-Ile

<figure id="R123I">

Mutation of arginine (yellow) to isoleucine (purple) at position 123 of 2pah (green).

</figure>

Properties arginine is positively charged, polar and hydrophilic
isoleucine is aliphatic, neutral, non-polar and hydrophobic
Substitution Matrices low value → bad substitution
Structure LOOP
PSSM arginine seems to be highly conserved with a value of 49. Isoleucine has a value of 7.
SIFT AFFECT PROTEIN FUNCTION with a score of 0.00
PolyPhen2 possibly damaging with a score of 0.807 / 0.582
SNAP neutral, RI = 0
MutationTaster disease causing
prediction: non-neutral

Here many of the features indicate a non-neutral and probably damaging substitution, starting with the completely different properties of the two amino acids. Furthermore, the substitution matrices show low values and arginine is highly conserved at position 123. Apart from SNAP, all prediction tools forecast a bad substitution, although PolyPhen2 only indicates a possibly damaging mutation. The SNAP prediction itself has the lowest possible reliability index of 0. The substitution lies in a LOOP, and <xr id="R123I"/> shows that the two amino acids adopt a similar conformation in the molecule. Altogether we predict that the substitution of arginine by isoleucine at position 123 of the PAH protein is non-neutral and might affect function.

Gln-20-Leu

Properties glutamine is neutral, polar and hydrophilic
leucine is aliphatic, neutral, non-polar and hydrophobic
Substitution Matrices low value → bad substitution
Structure LOOP
PSSM glutamine seems not to be conserved, as the value for aspartic acid (17) is higher than for glutamine (14). Leucine has a value of 6.

SIFT TOLERATED with a score of 0.90
PolyPhen2 benign with a score of 0.000
SNAP neutral, RI = 0
MutationTaster disease causing
prediction: neutral

The exchange of glutamine by leucine at position 20 is predicted to be neutral, as most features point in this direction. Only the substitution matrices indicate a bad substitution. Glutamine does not seem to be conserved at this position; the position appears to be flexible, as aspartic acid has a higher value in the PSSM. SIFT (0.90) and PolyPhen2 (0.000) both give clear scores for a neutral substitution, whereas SNAP has a very low reliability index. Only MutationTaster predicts the exchange to be disease causing. Again the substitution lies in a LOOP region.
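The conservation argument can be written down as a small check on one PSSM column. This is a sketch under the assumption that the quoted values are comparable per-residue values from the PSI-BLAST profile. Only the three values quoted for position 20 are included, so the column is incomplete, and the function name is hypothetical.

```python
def conservation_summary(column, wild, mutant):
    """Compare wild-type and mutant values with the highest value in the column.

    `column` maps one-letter residues to their PSSM values; it may be
    incomplete, in which case 'top_residue' refers to the listed entries only.
    """
    top_residue = max(column, key=column.get)
    return {
        "top_residue": top_residue,
        "wild_type_is_top": top_residue == wild,
        "wild_value": column[wild],
        "mutant_value": column[mutant],
    }


# Values quoted for position 20 of PAH (partial column).
position_20 = {"Q": 14, "D": 17, "L": 6}
print(conservation_summary(position_20, wild="Q", mutant="L"))
```

Because aspartic acid scores higher than the wild-type glutamine, the position is flagged as not strictly conserved, in line with the reasoning above.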

Gln-172-His

<figure id="Q172H">

Mutation of glutamine (yellow) to histidine (purple) at position 172 of 2pah (green).

</figure>

Properties glutamine is neutral, polar and hydrophilic
histidine is positively charged, polar and hydrophilic
Substitution Matrices middle ranged → neutral substitution
Structure LOOP
PSSM glutamine seems to be conserved with a score of 29.
histidine only has a score of 1.
SIFT AFFECT PROTEIN FUNCTION with a score of 0.03.
PolyPhen2 possibly damaging with a score of 0.705 / benign with a score of 0.170
SNAP neutral, RI = 4
MutationTaster disease causing
prediction: neutral

Glutamine and histidine are both polar and hydrophilic. Furthermore, the substitution matrices show middle-ranged values that tend towards high scores. Again the substitution lies in a LOOP region at the outside of the protein, as can be seen in <xr id="Q172H"/>. Once more the prediction tools disagree: SIFT and MutationTaster predict an effect on the protein, whereas SNAP and PolyPhen2 indicate a neutral substitution. For PolyPhen2 the result depends on the dataset used; here we are more interested in HumVar. Altogether, we think that this mutation is neutral.

Gly-103-Ser

Properties glycine is small, neutral, non-polar and hydrophilic
serine is small, neutral, polar and hydrophilic
Substitution Matrices middle ranged value → neutral substitution
Structure HELIX
PSSM glycine seems to be completely unconserved as every amino acid has the same value 5.
SIFT TOLERATED with a score of 0.05
PolyPhen2 benign with a score of 0.003 / 0.006
SNAP neutral, RI = 6
MutationTaster disease causing
prediction: neutral

For the substitution of glycine by serine the properties change little, as the only difference is that the original amino acid is non-polar and the new one polar. This is also reflected in the substitution matrices, where the middle-ranged values indicate a neutral substitution. The residue is located in a HELIX. The PSSM shows that glycine is unconserved, as all amino acids including serine have the same value, and SIFT, PolyPhen2 and SNAP all predict no effect on the protein, the last two with reliable scores. We therefore assume this substitution to be neutral. Nevertheless, MutationTaster predicts the mutation to be disease causing, as the ACT domain is forecast to be lost.

Ile-421-Thr

<figure id="I421T">

Mutation of isoleucine (yellow) to threonine (purple) at position 421 of 2pah (green).

</figure>

Properties isoleucine is aliphatic, neutral, non-polar and hydrophobic
threonine is small, neutral, polar and hydrophobic
Substitution Matrices low value → bad substitution
Structure STRAND
PSSM isoleucine with a value of 32 seems to be not conserved as valine has a value of 41. Threonine has a value of 0.

SIFT AFFECT PROTEIN FUNCTION with a score of 0.00.
PolyPhen2 possibly damaging with a score of 0.667 / probably damaging with a score of 0.913
SNAP non-neutral, RI = 2
MutationTaster disease causing
prediction: non-neutral

At position 421 we found a substitution of isoleucine by threonine. This substitution lies on a beta STRAND, as you can see in <xr id="I421T"/>. Threonine seems to fit this position less well, as it resembles isoleucine only in the side chain but not within the STRAND itself. The properties are neither clearly similar nor clearly different: they change from aliphatic and non-polar to small and polar, but remain neutral and hydrophobic. The substitution matrices reflect this with low values, although PAM shows a slight tendency towards a neutral substitution. Although isoleucine with a value of 32 is not conserved at this position, where valine has the highest score, threonine is unlikely to be a good replacement, as it has a value of 0. Additionally, all four prediction tools agree on a damaging effect. This substitution is therefore most likely non-neutral.

Lys-341-Thr

<figure id="K341T">

Mutation of lysine (yellow) to threonine (purple) at position 341 of 2pah (green).

</figure>

Properties lysine is positively charged, polar and hydrophobic
threonine is small, neutral, polar and hydrophobic
Substitution Matrices low value → bad substitution
Structure STRAND
PSSM lysine seems to be conserved with a value of 39. Threonine has a value of 1.

SIFT AFFECT PROTEIN FUNCTION with a score of 0.00.
PolyPhen2 probably damaging with a score of 1.000 / 0.996
SNAP non-neutral, RI = 3
MutationTaster disease causing
prediction: non-neutral

In this case the properties change only from positively charged to small and neutral; nevertheless, the substitution matrices show rather low values, indicating a bad substitution. Furthermore, the residue lies on a beta STRAND, and as <xr id="K341T"/> shows, there are some structural differences between the two amino acids. Additionally, lysine is highly conserved at position 341, whereas threonine only has a score of 1. Again all four prediction tools agree in forecasting a substitution that affects the protein. Therefore we are fairly sure that this substitution is non-neutral and probably damaging.

Phe-392-Ser

<figure id="F392S">

Mutation of phenylalanine (yellow) to serine (purple) at position 392 of 2pah (green).

</figure>

Properties phenylalanine is aromatic, neutral, non-polar and hydrophobic
serine is small, neutral, polar and hydrophilic
Substitution Matrices low value → bad substitution
Structure HELIX
PSSM phenylalanine seems to be highly conserved with a value of 49. Serine has a value of 0.
SIFT AFFECT PROTEIN FUNCTION with a score of 0.00.
PolyPhen2 probably damaging with a score of 1.000
SNAP non-neutral, RI = 1
MutationTaster disease causing
prediction: non-neutral

For the exchange of phenylalanine by serine at position 392 all features indicate a harmful effect; most probably this substitution is disease causing. Starting with the properties, the only thing the two amino acids have in common is that both are neutral, and the substitution matrices show very low values. Furthermore, the structures of the two amino acids are very different and therefore most likely have a drastic effect on the alpha HELIX in which this residue is found (<xr id="F392S"/>). Conservation and the prediction tools also indicate, with high reliability, a substitution that affects the protein.

Pro-416-Gln

<figure id="P416Q">

Mutation of proline (yellow) to glutamine (purple) at position 416 of 2pah (green).

</figure>

Properties proline is small, neutral, non-polar and hydrophilic
glutamine is neutral, polar and hydrophilic
Substitution Matrices low value → bad substitution
Structure LOOP
PSSM proline seems to be highly conserved with a value of 66. Glutamine has a value of 0.
SIFT AFFECT PROTEIN FUNCTION with a score of 0.00.
PolyPhen2 probably damaging with a score of 0.996 / 0.985
SNAP non-neutral, RI = 1
MutationTaster disease causing
prediction: non-neutral

As for the previous substitution, most features indicate a bad substitution. The exchange of proline by glutamine does not change the properties much, but it receives low values in the substitution matrices. As you can see in <xr id="P416Q"/>, the substitution occurs in a coil region and would therefore probably be less harmful if it were located at the outside of the protein. However, as it lies between two beta STRANDs connected only by a small LOOP, it most likely affects both STRANDs. Additionally, proline is very highly conserved, as the PSSM shows. Again all four tools predict an effect on the protein.

Thr-266-Ala

<figure id="T266A">

Mutation of threonine (yellow) to alanine (purple) at position 266 of 2pah (green).

</figure>

Properties threonine is small, neutral, polar and hydrophobic
alanine is small, neutral, non-polar and hydrophobic
Substitution Matrices middle ranged value → neutral substitution
Structure HELIX
PSSM threonine seems to be conserved with a value of 43. Alanine has a value of 29.
SIFT AFFECT PROTEIN FUNCTION with a score of 0.00.
PolyPhen2 probably damaging with a score of 1.000
SNAP neutral, RI = 2
MutationTaster disease causing
prediction: non-neutral

The last substitution shows some similarity in the properties, as only the polarity changes. The substitution matrices also indicate a neutral substitution. Nevertheless, threonine is conserved, although alanine is also often seen at this position. Furthermore, SIFT and PolyPhen2 agree with reliable scores that there is an effect on the protein which might cause a disease. SNAP, however, predicts a neutral substitution, whereas MutationTaster forecasts that it is disease causing. The structure shown in <xr id="T266A"/> reveals that the residue lies at the end of a HELIX, and threonine and alanine show similar orientations. Altogether it is not easy to say whether this mutation is neutral. However, given the change in polarity, the high conservation and the scores of the prediction tools, we assume this mutation to be non-neutral, though perhaps with smaller effects than the other non-neutral substitutions above.
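The decisions above were made by weighing all features by hand. As a simplified stand-in, and explicitly not the procedure actually used here, the four tool verdicts alone can be combined by a strict majority vote. The sketch below assumes that, where two PolyPhen2 values are listed, the second one is the HumVar result, and that ties count as neutral.

```python
# True means the tool predicts an effect on the protein.
# Column order: SIFT, PolyPhen2 (second value where two are listed,
# assumed to be HumVar), SNAP, MutationTaster.
TOOL_CALLS = {
    "A259V": (False, True, True, True),
    "R123I": (True, True, False, True),
    "Q20L": (False, False, False, True),
    "Q172H": (True, False, False, True),
    "G103S": (False, False, False, True),
    "I421T": (True, True, True, True),
    "K341T": (True, True, True, True),
    "F392S": (True, True, True, True),
    "P416Q": (True, True, True, True),
    "T266A": (True, True, False, True),
}

# Manual consensus predictions made on this page, for comparison.
MANUAL_CALLS = {
    "A259V": "non-neutral", "R123I": "non-neutral", "Q20L": "neutral",
    "Q172H": "neutral", "G103S": "neutral", "I421T": "non-neutral",
    "K341T": "non-neutral", "F392S": "non-neutral", "P416Q": "non-neutral",
    "T266A": "non-neutral",
}


def majority_call(votes):
    """Non-neutral only if strictly more than half of the tools see an effect."""
    return "non-neutral" if sum(votes) * 2 > len(votes) else "neutral"


for snp, votes in TOOL_CALLS.items():
    agrees = majority_call(votes) == MANUAL_CALLS[snp]
    print(snp, majority_call(votes), "(agrees with manual call)" if agrees else "")
```

With these assumptions the vote gives the same label as the manual prediction for all ten SNPs, but it ignores reliability scores, conservation and structure, which the manual assessment takes into account.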

Mammalian Homologous Sequences

We ran a standard BLAST search against the mammalian database of UniProt and filtered the results by hand for duplicate entries. Altogether we found 22 homologous sequences (<xr id="IDs"/>). <figtable id="IDs">

UniProt IDs homologous to P00439
H2Q6R0 G3S964 G1R3M2 G7PJC2 F7HMW9 F7I717 F7BKF9
F6XY00 E2R366 G1T8B6 H2NIF5 M3YKN3 M9P0Q7 H0WTI6
M3W9R1 G3TIW0 G1LIM6 Q2KIH7 G1P4I7 P16331 M9P0Y5
UniProt IDs of the 22 mammalian homologous sequences of the PAH protein. The only ID whose sequence is not 100% identical to P00439 (F6XY00) is marked in green.

</figtable>

We used ClustalW to create the multiple alignment. Only F6XY00 has a slightly different sequence. Like E2R366, F6XY00 is a homologue of human PAH in Canis familiaris. At the positions of our substitutions the sequences are conserved.
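Checking whether the substituted positions are conserved in the alignment requires mapping residue numbers of the reference sequence to alignment columns, because gaps shift the columns. A minimal sketch of this step follows; the three short sequences are invented toy data, not PAH sequences, and the function names are hypothetical.

```python
def column_for_residue(aligned_reference, residue_number):
    """Return the 0-based alignment column of the given 1-based residue.

    Gap characters ('-') in the aligned reference are skipped when counting.
    """
    seen = 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds the length of the reference")


def is_conserved(alignment, reference_id, residue_number):
    """True if every sequence has the same character in the residue's column."""
    column = column_for_residue(alignment[reference_id], residue_number)
    return len({sequence[column] for sequence in alignment.values()}) == 1


# Invented toy alignment (NOT real PAH data); all rows have equal length.
toy_alignment = {
    "ref": "MS-TAVLE",
    "hom1": "MSQTAVLE",
    "hom2": "MS-TGVLE",
}

print(is_conserved(toy_alignment, "ref", 3))  # residue T of the reference
print(is_conserved(toy_alignment, "ref", 4))  # residue A of the reference
```

Applied to the real ClustalW alignment with P00439 as the reference, the same check would be run for the ten positions 20, 103, 123, 172, 259, 266, 341, 392, 416 and 421.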

Prediction

<figtable id="summary">

SNP-Prediction
SNP Prediction Validation
Ala259Val non-neutral correct HGMD
Arg123Ile non-neutral ? dbSNP
Gln20Leu neutral wrong HGMD
Gln172His neutral ? dbSNP
Gly103Ser neutral wrong HGMD
Ile421Thr non-neutral ? dbSNP
Lys341Thr non-neutral correct HGMD
Phe392Ser non-neutral ? dbSNP
Pro416Gln non-neutral correct HGMD
Thr266Ala non-neutral correct dbSNP
Summary of the predictions made with the different features for the ten substitutions. The validation is done with the databases HGMD and dbSNP.

</figtable>

<xr id="summary"/> shows that it is not easy to predict the effect of SNPs. Of the five SNPs taken from HGMD, three are correctly predicted as non-neutral and the other two are wrongly predicted as neutral. For dbSNP, Thr266Ala is correctly predicted to have an effect, whereas the other four cannot be judged, as there is no further information on whether they are neutral. Altogether, there are a lot of substitutions in PAH that lead to PKU.
The different features show that no single feature is always decisive: sometimes it is important that the properties stay the same, whereas for other residues the structural position (LOOP, HELIX or STRAND, inside or outside the protein) matters more. Looking at the amino acid and its substitution visually also gives hints. Prediction tools must be handled carefully, and probability and evaluation scores are important for deciding whether a prediction can be trusted. MutationTaster always returns disease causing with high probabilities, which makes sense for the HGMD SNPs, as the program looks at the databases and reports whether the SNP was found. The most important feature in our case seems to be the conservation of an amino acid at a specific position, whereas the mammalian homologues only tell us that PAH is highly conserved across species and do not help to decide between neutral and non-neutral SNPs.
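The counts quoted for the summary table can be reproduced with a short tally. This is a minimal sketch; the rows are copied from the table above and the function name is hypothetical.

```python
from collections import Counter

# (SNP, prediction, validation, database) rows from the summary table.
SUMMARY = [
    ("Ala259Val", "non-neutral", "correct", "HGMD"),
    ("Arg123Ile", "non-neutral", "?", "dbSNP"),
    ("Gln20Leu", "neutral", "wrong", "HGMD"),
    ("Gln172His", "neutral", "?", "dbSNP"),
    ("Gly103Ser", "neutral", "wrong", "HGMD"),
    ("Ile421Thr", "non-neutral", "?", "dbSNP"),
    ("Lys341Thr", "non-neutral", "correct", "HGMD"),
    ("Phe392Ser", "non-neutral", "?", "dbSNP"),
    ("Pro416Gln", "non-neutral", "correct", "HGMD"),
    ("Thr266Ala", "non-neutral", "correct", "dbSNP"),
]


def tally(rows):
    """Count validation outcomes separately for each source database."""
    counts = Counter()
    for _snp, _prediction, validation, database in rows:
        counts[(database, validation)] += 1
    return counts


for (database, validation), count in sorted(tally(SUMMARY).items()):
    print(f"{database:6s} {validation:8s} {count}")
```

Of the six SNPs with a known effect, four are predicted correctly; the four remaining dbSNP entries cannot be scored.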

References

<references/>