Sequence-based mutation analysis (Phenylketonuria)
Summary
In this task different features that are based on sequence information are used to predict the effects of SNPs.
For this purpose, substitution matrices such as PAM and BLOSUM as well as the properties of the amino acids are used. Furthermore, the structure is examined to see whether a SNP might disturb a HELIX or STRAND, or whether it is located in a more flexible, uncoiled region of the sequence. Four prediction tools are used as well. Three of them, SIFT<ref name="sift">Prateek Kumar, Steven Henikoff & Pauline C Ng (2009): "Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm". Nature Protocols 4: 1073-1081. doi:10.1038/nprot.2009.86</ref>, PolyPhen2<ref name="polyphen2">Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010): "A method and server for predicting damaging missense mutations". Nat Methods 7(4): 248-249. doi:10.1038/nmeth0410-248</ref> and SNAP<ref name="snap">Yana Bromberg and Burkhard Rost (2007): "SNAP: predict effect of non-synonymous polymorphisms on function". Nucleic Acids Research 35(11): 3823-3835. doi:10.1093/nar/gkm238</ref>, are based on sequence similarity and on the conservation of the amino acids at particular positions in a multiple alignment, but also use additional features. The fourth one is MutationTaster, which integrates information from different biomedical databases.
Based on all of these features, a consensus prediction is made for each of the ten substitutions and compared with the information given in the HGMD and dbSNP databases.
Mutation dataset
For the analysis of sequence-based mutations we randomly chose ten SNPs that lead to an amino acid exchange from HGMD and dbSNP (<xr id="snps"/>). Detailed information about the SNPs can be found in the Lab journal. Here we act as if we did not know the effect of the SNPs. <figtable id="snps">
SNPs | |||
---|---|---|---|
AA - three letter | AA - one letter | Nucleotides | |
Ala259Val | A259V | C776T | |
Arg123Ile | R123I | G368T | |
Gln20Leu | Q20L | A59T | |
Gln172His | Q172H | G516T | |
Gly103Ser | G103S | G307A | |
Ile421Thr | I421T | T1262C | |
Lys341Thr | K341T | A1022C | |
Phe392Ser | F392S | T1175C | |
Pro416Gln | P416Q | C1247A | |
Thr266Ala | T266A | A796C |
For example, A259V stands for alanine mutated to valine at position 259. The third column gives the nucleotide change that causes the amino acid exchange.
</figtable> As you can see, both the amino acid exchange and the nucleotide exchange are given. This information is needed as input for the different prediction tools: SIFT, PolyPhen2 and SNAP take the amino acid exchange as input, whereas MutationTaster works on the nucleotide exchange.
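The two notations in the table can be cross-checked with a few lines of code. The following is only a minimal Python sketch and not part of the original analysis. The function names `to_one_letter` and `codon_of` are hypothetical, and the sketch assumes that nucleotide positions are counted from the first base of the start codon.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(variant):
    """Convert a three-letter variant such as 'Ala259Val' to 'A259V'."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", variant)
    if match is None:
        raise ValueError(f"unexpected variant format: {variant!r}")
    ref, pos, alt = match.groups()
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


def codon_of(nucleotide_change):
    """Return (codon number, position within the codon) for e.g. 'C776T'.

    Assumption: position 1 is the first base of the start codon.
    """
    match = re.fullmatch(r"[ACGT](\d+)[ACGT]", nucleotide_change)
    if match is None:
        raise ValueError(f"unexpected nucleotide format: {nucleotide_change!r}")
    pos = int(match.group(1))
    return (pos - 1) // 3 + 1, (pos - 1) % 3 + 1


# The ten SNPs from the table above.
SNPS = [
    ("Ala259Val", "C776T"), ("Arg123Ile", "G368T"), ("Gln20Leu", "A59T"),
    ("Gln172His", "G516T"), ("Gly103Ser", "G307A"), ("Ile421Thr", "T1262C"),
    ("Lys341Thr", "A1022C"), ("Phe392Ser", "T1175C"), ("Pro416Gln", "C1247A"),
    ("Thr266Ala", "A796C"),
]

for amino_acid_change, nucleotide_change in SNPS:
    codon, offset = codon_of(nucleotide_change)
    print(to_one_letter(amino_acid_change), nucleotide_change, codon, offset)
```

Under the stated numbering assumption, each nucleotide position falls into the codon with the same number as the amino acid position.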
Analyze SNPs
The complete results of the prediction tools can be found in the Lab journal.
Looking at the substitution matrices, you can see that none of the substitutions analyzed in this task receives the worst possible value.
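As an illustration, the BLOSUM62 scores of the ten substitutions can be looked up and compared with the lowest score in the matrix. This is a minimal Python sketch under several assumptions: the scores were transcribed by hand from the standard BLOSUM62 table and should be verified against your own copy, the PAM matrices are not considered, and the cut-off in the hypothetical `label` function is only one possible way to separate "low" from "middle ranged" values.

```python
# BLOSUM62 scores for the ten substitutions (hand-transcribed; assumption:
# verify against the matrix you actually use).
BLOSUM62 = {
    ("A", "V"): 0, ("R", "I"): -3, ("Q", "L"): -2, ("Q", "H"): 0,
    ("G", "S"): 0, ("I", "T"): -1, ("K", "T"): -1, ("F", "S"): -2,
    ("P", "Q"): -1, ("T", "A"): 0,
}

# Lowest score that occurs anywhere in BLOSUM62.
BLOSUM62_MIN = -4


def label(score):
    """Hypothetical cut-off: negative scores count as 'low', others as 'middle'."""
    return "low" if score < 0 else "middle"


for (ref, alt), score in BLOSUM62.items():
    print(f"{ref}->{alt}: {score:+d} ({label(score)})")

# None of the ten substitutions reaches the worst value of the matrix.
print(all(score > BLOSUM62_MIN for score in BLOSUM62.values()))
```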
As the structure 2pah begins at position 118, the substitutions at amino acids 20 and 103 cannot be shown with PyMOL. A combination of the other eight mutations can be seen in <xr id="all_mutations"/>. None of these amino acids participates in forming the binding site, so the binding site is not disturbed by any of the substitutions examined here.
<figure id ="all_mutations">
</figure>
Ala-259-Val
<figure id="A259V">
</figure>
Properties | alanine is small, non-polar, neutral and hydrophobic |
valine is aliphatic, small, non-polar, neutral and hydrophobic | |
Structure | LOOP |
Substitution Matrices | middle ranged value → neutral substitution |
PSSM | alanine seems to be highly conserved with a value of 70. Valine has a value of 1. |
SIFT | TOLERATED with a score of 0.16 |
PolyPhen2 | probably damaging with a score of 1.000 |
SNAP | non-neutral, RI = 2 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
For the substitution of alanine by valine the properties hardly change; the only difference is that valine is aliphatic. This is also reflected in the substitution matrices, where the exchange has a mid-range value and therefore indicates a neutral substitution. In addition, SIFT predicts this mutation to be tolerated, although the score is not very high. The PSSM created with PSI-BLAST, however, shows a high conservation of alanine at position 259, whereas valine has only a low value. Furthermore, the other three prediction tools all predict a non-neutral effect, most notably PolyPhen2 with a very high score. SNAP has a reliability index of only 2, but an accuracy of 70%. Looking at the structure (<xr id="A259V"/>), you can see that the substitution is located in a LOOP; however, it lies directly next to a HELIX or is its last amino acid, and MutationTaster predicts this HELIX to be lost if the amino acid is changed. In this case it is really hard to decide whether the substitution is neutral, as the features point in both directions. Nevertheless, we predict this mutation to be non-neutral.
Arg-123-Ile
<figure id="R123I">
</figure>
Properties | arginine is positively charged, polar and hydrophilic |
isoleucine is aliphatic, neutral, non-polar and hydrophobic | |
Substitution Matrices | low value → bad substitution |
Structure | LOOP |
PSSM | arginine seems to be highly conserved with a value of 49. Isoleucine has a value of 7. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00 |
PolyPhen2 | possibly damaging with a score of 0.807 / 0.582 |
SNAP | neutral, RI = 0 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
Here many of the features indicate a non-neutral and probably damaging substitution, beginning with the completely different properties of the two amino acids. Furthermore, the substitution matrices show low values and arginine is highly conserved at position 123. Apart from SNAP, all prediction tools forecast a bad substitution, although PolyPhen2 only indicates a possibly damaging mutation. SNAP's neutral call, however, comes with the lowest reliability index of 0. The substitution lies in a LOOP, and <xr id="R123I"/> shows that the two amino acids adopt a similar conformation in the molecule. Altogether we predict that the substitution of arginine by isoleucine at position 123 of the PAH protein is non-neutral and might affect function.
Gln-20-Leu
Properties | glutamine is neutral, polar and hydrophilic |
leucine is aliphatic, neutral, non-polar and hydrophobic | |
Substitution Matrices | low value → bad substitution |
Structure | LOOP |
PSSM | seems to be not conserved as the value for aspartic acid (17) is higher than for glutamine (14).
leucine has a value of 6. |
SIFT | TOLERATED with a score of 0.90 |
PolyPhen2 | benign with a score of 0.000 |
SNAP | neutral, RI = 0 |
MutationTaster | disease causing |
⇒ prediction: neutral |
The exchange of glutamine by leucine at position 20 is predicted to be neutral, as most features point in this direction. Among the sequence features, only the substitution matrices indicate a bad substitution. Glutamine does not seem to be conserved at this position; the position appears flexible, as aspartic acid has a higher value in the PSSM. SIFT and PolyPhen2 both give scores that clearly indicate a neutral substitution, whereas SNAP has a very low reliability index. Among the tools, only MutationTaster predicts the mutation to be disease causing. Again the substitution is located in a LOOP region.
Gln-172-His
<figure id="Q172H">
</figure>
Properties | glutamine is neutral, polar and hydrophilic |
histidine is positively charged, polar and hydrophilic | |
Substitution Matrices | middle ranged → neutral substitution |
Structure | LOOP |
PSSM | glutamine seems to be conserved with a score of 29. |
histidine only has a score of 1. | |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.03. |
PolyPhen2 | possibly damaging with a score of 0.705 / benign with a score of 0.170 |
SNAP | neutral, RI = 4 |
MutationTaster | disease causing |
⇒ prediction: neutral |
The properties of glutamine and histidine agree in being polar and hydrophilic. Furthermore, the substitution matrices show mid-range values tending towards high scores. Again the substitution is located in a LOOP region on the outside of the protein, which can be seen in <xr id="Q172H"/>. Once more the prediction tools disagree: SIFT and MutationTaster predict an effect on the protein, whereas SNAP and PolyPhen2 indicate a neutral substitution. For PolyPhen2 you have to be careful, as the two datasets give different results; here we are more interested in HumVar. Altogether, we think that this mutation is neutral.
Gly-103-Ser
Properties | glycine is small, neutral, non-polar and hydrophilic |
serine is small, neutral, polar and hydrophilic | |
Substitution Matrices | middle ranged value → neutral substitution |
Structure | HELIX |
PSSM | glycine seems to be completely unconserved as every amino acid has the same value 5. |
SIFT | TOLERATED with a score of 0.05 |
PolyPhen2 | benign with a score of 0.003 / 0.006 |
SNAP | neutral, RI = 6 |
MutationTaster | disease causing |
⇒ prediction: neutral |
For the substitution of glycine by serine the properties change little; the only difference is that the original amino acid is non-polar and the new one polar. This is also reflected in the substitution matrices, where the mid-range values indicate a neutral substitution. In this case the residue is located in a HELIX. The PSSM shows that glycine is not conserved, as all amino acids including serine have the same value, and SIFT, PolyPhen2 and SNAP all predict no effect on the protein, the latter two with reliable scores. We therefore assume this substitution to be neutral. MutationTaster nevertheless predicts the mutation to be disease causing, as the ACT domain is forecast to be lost.
Ile-421-Thr
<figure id="I421T">
</figure>
Properties | isoleucine is aliphatic, neutral, non-polar and hydrophobic |
threonine is small, neutral, polar and hydrophobic | |
Substitution Matrices | low value → bad substitution |
Structure | STRAND |
PSSM | isoleucine with a value of 32 seems to be not conserved as valine has a value of 41.
threonine has a value of 0. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00. |
PolyPhen2 | possibly damaging with a score of 0.667 / probably damaging with a score of 0.913 |
SNAP | non-neutral, RI = 2 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
At position 421 isoleucine is substituted by threonine. The substitution is located on a beta STRAND, as you can see in <xr id="I421T"/>. Threonine does not seem to fit into that position perfectly, as it resembles isoleucine only in the side chain but not within the STRAND itself. The properties are neither clearly similar nor clearly different: they change from aliphatic and non-polar to small and polar, but remain neutral and hydrophobic. The substitution matrices show low values, although PAM shows a slight tendency towards a neutral substitution. Although isoleucine with a value of 32 is not strictly conserved at this position, valine having the highest score, threonine is unlikely to be a good replacement as it has a value of 0. In addition, all four prediction tools agree on a damaging effect. So this substitution is most likely non-neutral.
Lys-341-Thr
<figure id="K341T">
</figure>
Properties | lysine is positively charged, polar and hydrophobic |
threonine is small, neutral, polar and hydrophobic | |
Substitution Matrices | low value → bad substitution |
Structure | STRAND |
PSSM | lysine seems to be conserved with a value of 39.
threonine has a value of 1. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00. |
PolyPhen2 | probably damaging with a score of 1.000 / 0.996 |
SNAP | non-neutral, RI = 3 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
In this case the properties change only from positively charged to small and neutral; however, the substitution matrices show rather low values, indicating a bad substitution. Furthermore, the residue lies on a beta STRAND and, as you can see in <xr id="K341T"/>, there are some structural differences between the two amino acids. In addition, lysine is highly conserved at position 341, where threonine has a score of only 1. Again all four prediction tools agree in forecasting a substitution that affects the protein. Therefore we are fairly sure that this substitution is non-neutral and probably damaging.
Phe-392-Ser
<figure id="F392S">
</figure>
Properties | phenylalanine is aromatic, neutral, non-polar and hydrophobic |
serine is small, neutral, polar and hydrophilic | |
Substitution Matrices | low value → bad substitution |
Structure | HELIX |
PSSM | phenylalanine seems to be highly conserved with a value of 49. Serine has a value of 0. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00. |
PolyPhen2 | probably damaging with a score of 1.000 |
SNAP | non-neutral, RI = 1 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
For the exchange of phenylalanine by serine at position 392 all features indicate a deleterious effect, so this substitution is most probably disease causing. Beginning with the properties, the only thing the two amino acids have in common is that both are neutral, and the substitution matrices show very low values. Furthermore, the structures of the two amino acids are very different and therefore most likely have a drastic effect on the alpha HELIX in which this residue is located (<xr id="F392S"/>). Conservation and the prediction tools likewise indicate a substitution that affects the protein, although SNAP's reliability index is only 1.
Pro-416-Gln
<figure id="P416Q">
</figure>
Properties | proline is small, neutral, non-polar and hydrophilic |
glutamine is neutral, polar and hydrophilic | |
Substitution Matrices | low value → bad substitution |
Structure | LOOP |
PSSM | proline seems to be highly conserved with a value of 66. Glutamine has a value of 0. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00. |
PolyPhen2 | probably damaging with a score of 0.996 / 0.985 |
SNAP | non-neutral, RI = 1 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
As for the previous substitution, most features indicate a bad substitution. The exchange of proline by glutamine does not change the properties very much, but it gets low values in the substitution matrices. As you can see in <xr id="P416Q"/>, the substitution occurs in a coiled region and would therefore probably be less harmful if it were on the outside of the protein. However, as it lies between two beta STRANDs connected by only a small LOOP, it most likely affects both STRANDs. In addition, proline is very highly conserved at this position, as can be seen in the PSSM. Again all four tools predict an effect on the protein.
Thr-266-Ala
<figure id="T266A">
</figure>
Properties | threonine is small, neutral, polar and hydrophobic |
alanine is small, neutral, non-polar and hydrophobic | |
Substitution Matrices | middle ranged value → neutral substitution |
Structure | HELIX |
PSSM | threonine seems to be conserved with a value of 43. Alanine has a value of 29. |
SIFT | AFFECT PROTEIN FUNCTION with a score of 0.00. |
PolyPhen2 | probably damaging with a score of 1.000 |
SNAP | neutral, RI = 2 |
MutationTaster | disease causing |
⇒ prediction: non-neutral |
The last substitution shows similar properties, the only change being in polarity. The substitution matrices also indicate a neutral substitution. Nevertheless, threonine is conserved, although alanine is also frequently seen at this position. Furthermore, SIFT and PolyPhen2 agree, with reliable scores, that there is an effect on the protein which might cause disease. SNAP, however, predicts a neutral substitution, whereas MutationTaster forecasts the mutation to be disease causing. The structure shown in <xr id="T266A"/> reveals that the residue lies at the end of a HELIX, and threonine and alanine show similar orientations. Altogether it is not easy to say whether this mutation is neutral or not. However, given the change in polarity and the high conservation as well as the scores of the prediction tools, we assume this mutation to be non-neutral, though possibly with smaller effects than the other non-neutral substitutions above.
Mammalian Homologous Sequences
We ran a standard BLAST search against the mammalian database of UniProt and filtered the results by hand for duplicate entries. Altogether we found 22 homologous sequences (<xr id="IDs"/>). <figtable id="IDs">
UniProt IDs homologous to P00439 | |||||||
---|---|---|---|---|---|---|---|
H2Q6R0 | G3S964 | G1R3M2 | G7PJC2 | F7HMW9 | F7I717 | F7BKF9 | |
F6XY00 | E2R366 | G1T8B6 | H2NIF5 | M3YKN3 | M9P0Q7 | H0WTI6 | |
M3W9R1 | G3TIW0 | G1LIM6 | Q2KIH7 | G1P4I7 | P16331 | M9P0Y5 |
</figtable>
For the creation of the multiple alignment we used ClustalW. Only F6XY00 has a slightly different sequence. Like E2R366, F6XY00 is a homologue of human PAH from Canis familiaris. At the positions of our substitutions the sequences are conserved.
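Checking whether a residue position is conserved in a multiple alignment requires mapping the ungapped position in the reference sequence to the corresponding alignment column. The following minimal Python sketch illustrates this idea; the helper names and the toy alignment are hypothetical and do not reproduce the actual ClustalW output.

```python
def column_for_position(aligned_ref, position):
    """Map a 1-based ungapped residue position to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies beyond the end of the reference sequence")


def is_conserved(alignment, ref_id, position):
    """Return True if all sequences share one residue at the given position."""
    column = column_for_position(alignment[ref_id], position)
    residues = {sequence[column] for sequence in alignment.values()}
    return len(residues) == 1


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = {
    "ref":  "MK-TAYLQ",
    "homA": "MKGTAYLQ",
    "homB": "MK-SAYLQ",
}

for position in (3, 4):
    print(position, is_conserved(toy_alignment, "ref", position))
```

In the toy example, position 3 of the reference (T) is not conserved because one homologue carries S, whereas position 4 (A) is conserved. For the real data the alignment would be read from the ClustalW output file and queried at positions 20, 103, 123, 172, 259, 266, 341, 392, 416 and 421.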
Prediction
<figtable id="summary">
SNP-Prediction | |||
---|---|---|---|
SNP | Prediction | Validation | |
Ala259Val | non-neutral | correct | HGMD |
Arg123Ile | non-neutral | ? | dbSNP |
Gln20Leu | neutral | wrong | HGMD |
Gln172His | neutral | ? | dbSNP |
Gly103Ser | neutral | wrong | HGMD |
Ile421Thr | non-neutral | ? | dbSNP |
Lys341Thr | non-neutral | correct | HGMD |
Phe392Ser | non-neutral | ? | dbSNP |
Pro416Gln | non-neutral | correct | HGMD |
Thr266Ala | non-neutral | correct | dbSNP |
</figtable>
In <xr id="summary"/> you can see that it is not easy to predict the effect of SNPs. Of the five SNPs taken from HGMD, three are correctly predicted as non-neutral and the other two are wrongly predicted as neutral. Of the dbSNP entries, Thr266Ala is correctly predicted to have an effect, whereas for the other four no statement is possible, as there is no further information on whether they are neutral or not. Altogether, there are many substitutions in PAH that lead to PKU.
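The counts in this paragraph can be reproduced from the summary table. The following is a minimal Python sketch in which the rows are transcribed from the table; the variable names are hypothetical.

```python
from collections import Counter

# Rows of the summary table: SNP, prediction, validation, source database.
SUMMARY = [
    ("Ala259Val", "non-neutral", "correct", "HGMD"),
    ("Arg123Ile", "non-neutral", "?", "dbSNP"),
    ("Gln20Leu", "neutral", "wrong", "HGMD"),
    ("Gln172His", "neutral", "?", "dbSNP"),
    ("Gly103Ser", "neutral", "wrong", "HGMD"),
    ("Ile421Thr", "non-neutral", "?", "dbSNP"),
    ("Lys341Thr", "non-neutral", "correct", "HGMD"),
    ("Phe392Ser", "non-neutral", "?", "dbSNP"),
    ("Pro416Gln", "non-neutral", "correct", "HGMD"),
    ("Thr266Ala", "non-neutral", "correct", "dbSNP"),
]

# Count validation outcomes per source database.
tally = Counter((source, validation) for _, _, validation, source in SUMMARY)
for (source, validation), count in sorted(tally.items()):
    print(source, validation, count)

# Fraction of correct predictions among the SNPs that could be validated.
validated = [row for row in SUMMARY if row[2] != "?"]
correct = sum(row[2] == "correct" for row in validated)
print(f"{correct} of {len(validated)} validated predictions are correct")
```

With only six validated SNPs, the resulting fraction is too small a sample to be read as a general accuracy estimate.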
Using different features shows that no single feature is always decisive: sometimes it matters that the properties stay the same, while for other residues the structural position (LOOP, HELIX or STRAND, inside or outside the protein) is more important. Looking at the amino acid and its substitution visually also gives hints. Prediction tools must be handled carefully, and looking at probability and reliability scores is important for deciding whether a prediction can be trusted. MutationTaster always reports disease causing with high probabilities, which makes sense for the HGMD variants, as the program consults databases and reports whether the SNP has been found there.
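How often each tool calls a substitution non-neutral, and how often it agrees with our consensus prediction, can be tallied from the per-variant tables. The following is a minimal Python sketch with hypothetical names. PolyPhen2 is left out because several variants have two scores for the two datasets, and agreement with our consensus is not the same as accuracy.

```python
# Calls transcribed from the per-variant tables.
# True  = non-neutral (SIFT "AFFECT PROTEIN FUNCTION", SNAP "non-neutral",
#         MutationTaster "disease causing"); False = neutral / tolerated.
# Tuple order: SIFT, SNAP, MutationTaster, consensus prediction.
CALLS = {
    "A259V": (False, True, True, True),
    "R123I": (True, False, True, True),
    "Q20L": (False, False, True, False),
    "Q172H": (True, False, True, False),
    "G103S": (False, False, True, False),
    "I421T": (True, True, True, True),
    "K341T": (True, True, True, True),
    "F392S": (True, True, True, True),
    "P416Q": (True, True, True, True),
    "T266A": (True, False, True, True),
}
TOOLS = ("SIFT", "SNAP", "MutationTaster")
CONSENSUS = 3  # index of the consensus prediction in each tuple


def summarize(calls):
    """Return {tool: (non-neutral calls, agreements with the consensus)}."""
    summary = {}
    for index, tool in enumerate(TOOLS):
        non_neutral = sum(row[index] for row in calls.values())
        agreement = sum(row[index] == row[CONSENSUS] for row in calls.values())
        summary[tool] = (non_neutral, agreement)
    return summary


for tool, (non_neutral, agreement) in summarize(CALLS).items():
    print(f"{tool}: {non_neutral}/10 non-neutral, {agreement}/10 agree with consensus")
```

The tally makes the behaviour of MutationTaster explicit: it calls all ten substitutions non-neutral, so its output does not discriminate between the variants in this dataset.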
The most important feature in our case seems to be the conservation of an amino acid at a specific position, whereas the mammalian homologues only tell us that PAH is highly conserved across species and do not help us to distinguish between neutral and non-neutral SNPs.
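To make the conservation argument more concrete, the PSSM values quoted in the per-variant tables can be ranked by the drop from the wild-type to the mutant residue. This is a minimal Python sketch; the ranking criterion and the name `rank_by_drop` are assumptions for illustration, not the procedure used to reach the predictions above.

```python
# PSSM values quoted in the tables: (wild-type residue, mutant residue).
PSSM_VALUES = {
    "A259V": (70, 1), "R123I": (49, 7), "Q20L": (14, 6), "Q172H": (29, 1),
    "G103S": (5, 5), "I421T": (32, 0), "K341T": (39, 1), "F392S": (49, 0),
    "P416Q": (66, 0), "T266A": (43, 29),
}


def rank_by_drop(values):
    """Sort variants by (wild-type value - mutant value), largest drop first."""
    drops = ((wild_type - mutant, name) for name, (wild_type, mutant) in values.items())
    return sorted(drops, reverse=True)


for drop, name in rank_by_drop(PSSM_VALUES):
    print(f"{name}: drop of {drop}")
```

The ranking places Q172H, which we predicted to be neutral, above T266A, which we predicted to be non-neutral, so the drop in the PSSM value alone does not reproduce our predictions.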
References
<references/>