Sequence-based mutation analysis HEXA

From Bioinformatikpedia
Revision as of 21:35, 25 June 2011 by Uskat

Mutations

SNP ID | codon number | amino acid change | codon change
rs4777505 | 29 | Asn -> Ser | AAC -> AGC
rs121907979 | 39 | Leu -> Arg | CTT -> CGT
rs61731240 | 179 | His -> Asp | CAT -> GAT
rs121907974 | 211 | Phe -> Ser | TTC -> TCC
rs61747114 | 248 | Leu -> Phe | CTT -> TTT
rs1054374 | 293 | Ser -> Ile | AGT -> ATT
rs121907967 | 329 | Trp -> TER | TGG -> TAG
rs1800430 | 399 | Asn -> Asp | AAC -> GAC
rs121907982 | 436 | Ile -> Val | ATA -> GTA
rs121907968 | 485 | Trp -> Arg | TGG -> CGG
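
The codon changes listed above can be checked against the standard genetic code. The following is a minimal sketch and not part of the original analysis; the names `CODON_TABLE`, `MUTATIONS` and `check_mutation` are hypothetical helpers. Trp is encoded only by TGG, so that codon is used as the original codon for both Trp rows.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One-letter to three-letter codes for the residues that occur in the table.
THREE_LETTER = {"N": "Asn", "S": "Ser", "L": "Leu", "R": "Arg", "H": "His",
                "D": "Asp", "F": "Phe", "I": "Ile", "W": "Trp", "V": "Val",
                "*": "TER"}

# (SNP id, codon number, amino acid change, codon change) as listed in the table.
MUTATIONS = [
    ("rs4777505", 29, "Asn -> Ser", "AAC -> AGC"),
    ("rs121907979", 39, "Leu -> Arg", "CTT -> CGT"),
    ("rs61731240", 179, "His -> Asp", "CAT -> GAT"),
    ("rs121907974", 211, "Phe -> Ser", "TTC -> TCC"),
    ("rs61747114", 248, "Leu -> Phe", "CTT -> TTT"),
    ("rs1054374", 293, "Ser -> Ile", "AGT -> ATT"),
    ("rs121907967", 329, "Trp -> TER", "TGG -> TAG"),
    ("rs1800430", 399, "Asn -> Asp", "AAC -> GAC"),
    ("rs121907982", 436, "Ile -> Val", "ATA -> GTA"),
    ("rs121907968", 485, "Trp -> Arg", "TGG -> CGG"),
]


def check_mutation(aa_change, codon_change):
    """Return True if the codon change encodes the stated amino acid change
    and the two codons differ at exactly one base."""
    aa_from, aa_to = aa_change.split(" -> ")
    codon_from, codon_to = codon_change.split(" -> ")
    translated = (THREE_LETTER[CODON_TABLE[codon_from]],
                  THREE_LETTER[CODON_TABLE[codon_to]])
    n_diff = sum(a != b for a, b in zip(codon_from, codon_to))
    return translated == (aa_from, aa_to) and n_diff == 1


for snp_id, codon_number, aa_change, codon_change in MUTATIONS:
    status = "consistent" if check_mutation(aa_change, codon_change) else "INCONSISTENT"
    print(snp_id, codon_number, aa_change, codon_change, status)
```

A check like this only confirms that the table is internally consistent. It does not verify the codon numbers against the HEXA reference sequence.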

Analysis of the mutations

We created a separate page for each mutation. The analyses are summarized in the summary section.


Summary page

Here we summarize all the analyses we performed for the mutations:

Results

  • physicochemical properties: we called a mutation neutral if the properties of the mutated amino acid are very similar to those of the original amino acid. Otherwise, it is called non-neutral.
  • visual analysis: a mutation is called neutral if the structure of the mutated amino acid is very similar to the structure of the original amino acid.
  • PAM1, PAM250, BLOSUM62 and PSSM analysis: a mutation is called neutral if its substitution score is close to the score of the most frequently exchanged amino acid.
  • multiple alignment: a mutation is called non-neutral if the original amino acid is highly conserved in the alignment. If the conservation rate is below 50%, we called the mutation neutral.
  • analysis with JPred, PsiPred: if the mutated position is predicted as coil (no secondary structure element), we called the mutation neutral.
  • analysis with the real structure: here we checked whether the mutation lies in a secondary structure element or not. Instead of DSSP, we used the real structure. Normally, the real structure is not available, and therefore this value cannot be used in a prediction. For this reason we did not use this value when we decided whether a mutation is neutral or not.
  • SNAP, SIFT and PolyPhen2 prediction: these are the three mutation prediction methods we used in our analysis. Here a mutation is called neutral if the program predicts it as neutral.
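
Three of the rules above can be written down as small functions. This is only a sketch under stated assumptions: the 50% threshold comes from the text, but treating every conservation rate of 50% or more as non-neutral, ignoring gaps, using the state letter `C` for coil, and the `margin` that defines "close to" a score are all assumptions made here for illustration. The function names are hypothetical.

```python
from collections import Counter


def conservation_call(column, original, threshold=0.5):
    """Multiple-alignment rule: neutral if the original residue makes up less
    than `threshold` of the alignment column, otherwise non-neutral."""
    residues = [r for r in column if r != "-"]  # assumption: gaps are ignored
    if not residues:
        raise ValueError("alignment column contains only gaps")
    rate = Counter(residues)[original] / len(residues)
    return "neutral" if rate < threshold else "non-neutral"


def secondary_structure_call(state):
    """JPred/PsiPred rule: neutral if the position is predicted as coil.
    Assumption: 'C' = coil, 'H' = helix, 'E' = strand."""
    return "neutral" if state == "C" else "non-neutral"


def matrix_call(score, best, worst, margin=1):
    """Substitution-matrix rule: neutral if the score is close to the best
    substitution score for the original residue, non-neutral if it is close to
    the worst one, and 'no statement' if it lies in between."""
    if best - score <= margin:
        return "neutral"
    if score - worst <= margin:
        return "non-neutral"
    return "no statement"


# Hypothetical toy inputs, not taken from the real HEXA alignment or matrices.
print(conservation_call("WWWWWWRWWW", "W"))
print(secondary_structure_call("C"))
print(matrix_call(score=-1, best=2, worst=-4))
```

The three-way outcome of `matrix_call` mirrors the "no statement" entries in the summary table, where a score could not be assigned to either category.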


method | Asn -> Ser (rs4777505) | Leu -> Arg (rs121907979) | His -> Asp (rs61731240) | Phe -> Ser (rs121907974) | Leu -> Phe (rs61747114) | Ser -> Ile (rs1054374) | Trp -> TER (rs121907967) | Asn -> Asp (rs1800430) | Ile -> Val (rs121907982) | Trp -> Arg (rs121907968)
physicochemical properties | neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | neutral | neutral | non-neutral
visual analysis | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | neutral | non-neutral
PAM1 | neutral | non-neutral | non-neutral | non-neutral | no statement | non-neutral | no information | neutral | neutral | neutral
PAM250 | neutral | non-neutral | no statement | non-neutral | no statement | no statement | no information | neutral | neutral | neutral
BLOSUM62 | neutral | no statement | no statement | non-neutral | neutral | non-neutral | no information | neutral | neutral | non-neutral
PSSM analysis | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | no information | neutral | neutral | non-neutral
multiple alignment | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | neutral | neutral | neutral | non-neutral
analysis with JPred | non-neutral | non-neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral
analysis with PsiPred | non-neutral | non-neutral | neutral | neutral | neutral | neutral | neutral | non-neutral | neutral | neutral
analysis with real structure | non-neutral | no statement | neutral | neutral | non-neutral | neutral | non-neutral | no statement | non-neutral | non-neutral
SNAP prediction | neutral | non-neutral | non-neutral | non-neutral | neutral | neutral | no information | neutral | neutral | non-neutral
SIFT prediction | neutral | non-neutral | non-neutral | non-neutral | neutral | neutral | no information | neutral | neutral | non-neutral
PolyPhen2 prediction | neutral | non-neutral | non-neutral | non-neutral | neutral | neutral | no information | neutral | neutral | non-neutral
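
The neutral and non-neutral calls per mutation can be counted directly from the summary table. The sketch below copies the nine rows that enter the manual prediction. The real-structure row and the three prediction tools are left out, because they were not used for the manual decision. The one-letter encoding and the name `tally` are illustrative choices.

```python
from collections import Counter

MUTATIONS = ["Asn -> Ser", "Leu -> Arg", "His -> Asp", "Phe -> Ser", "Leu -> Phe",
             "Ser -> Ile", "Trp -> TER", "Asn -> Asp", "Ile -> Val", "Trp -> Arg"]

# N = neutral, X = non-neutral, S = no statement, I = no information.
# One character per mutation, in the column order of the summary table.
CALLS = {
    "physicochemical properties": "NXXXNXXNNX",
    "visual analysis":            "NXXXXNXXNX",
    "PAM1":                       "NXXXSXINNN",
    "PAM250":                     "NXSXSSINNN",
    "BLOSUM62":                   "NSSXNXINNX",
    "PSSM analysis":              "NXXXXNINNX",
    "multiple alignment":         "NXXXXNNNNX",
    "JPred":                      "XXNNNNNNNN",
    "PsiPred":                    "XXNNNNNXNN",
}


def tally(index):
    """Return (neutral, non-neutral) counts for the mutation in column `index`."""
    counts = Counter(row[index] for row in CALLS.values())
    return counts["N"], counts["X"]


for i, mutation in enumerate(MUTATIONS):
    neutral, non_neutral = tally(i)
    print(f"{mutation}: {neutral} neutral, {non_neutral} non-neutral")
```

For Ser -> Ile this gives 5 neutral and 3 non-neutral calls, and for Trp -> Arg 4 neutral and 5 non-neutral calls, which matches the counts quoted in the discussion of these two mutations.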

Own prediction

The table above summarizes all the analyses we carried out in order to make a statement about each mutation. Here we sum up the results for each mutation and collect our predictions in a separate table at the end. Finally, we wanted to compare our predictions with reality, and therefore we checked from which database each mutation was extracted.

  • Asn -> Ser

The first mutation we looked at is a substitution from Asn to Ser. As the summary table shows, every category predicted this mutation as neutral except the secondary structure analysis, which means that the mutation lies in a predicted secondary structure element. However, the two amino acids are very similar, so a mutation inside a secondary structure element should not change the structure dramatically. All three prediction tools also predicted this mutation as neutral. In sum, we therefore predict this mutation as neutral.

  • Leu -> Arg

It is a little more complicated to predict the effect of this mutation than that of the previous one, because not every category gave a clear result. Most of the categories classified this mutation as non-neutral. With BLOSUM62 it was not possible to make a statement, because the score of this mutation lies between the lowest and the highest score, so the mutation could not be assigned to one of the two categories. The position is well conserved in the multiple alignment, so according to this analysis the mutation is non-neutral. In the comparison with the real structure it was not possible to make a clear statement, because this amino acid is located at the border between a secondary structure element and a coiled region, and we do not know whether a mutation at the last position of a secondary structure element really changes the structure dramatically. This is not important for our prediction, however, because we do not take the real structure into account. In sum, we predicted this mutation as non-neutral, which is the same result as the prediction methods gave us.

  • His -> Asp

In this case we have to distinguish between the secondary structure analysis and the other analyses. The other analyses showed that this mutation might not be neutral. With PAM250 and BLOSUM62 it was not possible to make a statement: this substitution occurs from time to time, but without a trend towards a very common or a very rare exchange. The secondary structure analysis predicted a neutral mutation, which means that the mutation does not lie in a secondary structure element. However, our analysis method for the secondary structure is very simple. We do not consider any contacts with other amino acids in the structure, although such contacts exist. It is therefore quite possible that the mutation of an amino acid in a loop region affects the structure and function of the protein, especially if the physicochemical properties differ. Furthermore, we know that there are disordered regions that are essential for the function of a protein but have no defined secondary structure and are therefore predicted as coiled regions. This is a difficult case, but because of the different physicochemical properties, the very different structure of the two amino acids and the results of the multiple alignment, we decided to predict this mutation as non-neutral.

  • Phe -> Ser

Here we have the same situation as before. All our analyses indicated that this mutation is non-neutral, except the secondary structure analysis. As mentioned before, a mutation can have a big impact on the structure of the protein even if it lies in a coil region, especially if we keep in mind the possibility of a disordered region. We therefore think that the secondary structure is a less reliable criterion for the function of the protein than the substitution rate or the physicochemical properties. Consequently, we decided to predict this mutation as non-neutral, which is consistent with the results of the three prediction methods.

  • Leu -> Phe

This mutation is a very interesting case, because the methods gave us conflicting hints. First of all, both amino acids have identical physicochemical properties, which is always a strong hint that the mutation does not destroy the function of the protein. On the other hand, the structures of Leu and Phe differ considerably, which is a hint that the structure of the protein could change. The PAM matrices did not allow a statement about the effect of this mutation, but the BLOSUM62 matrix rates it as neutral. The PSSM analysis and the multiple alignment analysis, however, suggest that the mutation is non-neutral. The result of the secondary structure analysis is particularly interesting. According to PsiPred and JPred, the mutation is neutral, because it lies in a coiled region. However, both methods predicted the secondary structure incorrectly: the real structure shows that the mutation lies in a secondary structure element. It is therefore important to keep in mind that we work with predictions, which can be wrong. As we said before, the real structure is not considered in our manual prediction, and we decided that this mutation is neutral for the following reasons. First, the physicochemical properties are identical, which is a very important point. Second, the structures of the residues are not similar, but according to the predictions the mutation lies in a coiled region, where a different side chain is less critical than in a secondary structure element. Third, BLOSUM62 rates this substitution as neutral. In sum, we have more neutral predictions than non-neutral predictions.
Of course, the multiple alignment is a strong hint that the mutation is non-neutral, but we do not know whether the alignment is correct, and the two secondary structure prediction methods gave us the same result, so we have to trust these predictions. We therefore predicted the same effect as the prediction methods did.

  • Ser -> Ile

Interestingly, in this case the physicochemical properties are not identical and the substitution matrices also score this substitution as non-neutral, but the rest of our analyses indicate that the mutation is neutral. The residues have a similar structure, the position is not conserved in the alignment, and the PSSM of the PSI-BLAST run does not show any conservation of this residue either. Furthermore, the mutation lies in a coiled region. Although the physicochemical properties and the substitution matrices are very important hints for the effect of a mutation, we decided to predict this mutation as neutral. First of all, five categories predict this mutation as neutral and only three predict an effect. A further argument for a neutral mutation is that the physicochemical properties are perhaps less important for a residue located in a coiled region, especially if this residue does not have many contacts with other residues. The substitution matrices, which are position-independent, rate this exchange as non-neutral, whereas the PSSM, which takes the position of the substitution in the sequence into account, rates it as neutral. It is therefore possible that this substitution is normally not neutral but is neutral in this special case. This prediction agrees with the predictions of the methods.

  • Trp -> TER

In this case it is not necessary to look at the individual analyses. The mutation is located in the middle of the protein and leads to a truncated protein, which surely cannot fold correctly and therefore cannot function any more. This mutation is therefore non-neutral. Unfortunately, the prediction methods could not predict the effect of a mutation that leads to a truncated protein, so it is not possible to compare the results of the methods with our prediction. This is a pity, because a chain termination can also be neutral if it occurs at the very end of the protein. In this case, however, the mutation lies in the middle of the protein, and we therefore predict it as non-neutral.
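
How much of the protein survives a chain termination can be estimated from the position of the stop codon. The sketch below is only illustrative: the function name is hypothetical, and the protein length of 529 residues is an assumption, since this page does not state the length of HEXA.

```python
def truncated_fraction(stop_codon, protein_length):
    """Fraction of the protein that is still translated before the stop codon."""
    if not 1 <= stop_codon <= protein_length:
        raise ValueError("stop codon must lie inside the protein")
    return (stop_codon - 1) / protein_length


# ASSUMPTION: 529 residues for the HEXA precursor; not given on this page.
HEXA_LENGTH = 529

fraction = truncated_fraction(329, HEXA_LENGTH)
print(f"{fraction:.0%} of the protein remains")
```

Under this assumption roughly three fifths of the chain would still be translated, which fits the statement that the stop codon lies in the middle of the protein rather than at its very end.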

  • Asn -> Asp

This mutation is a clear case: only the visual analysis and PsiPred do not predict it as neutral. The PsiPred result is not very surprising, because the real structure of the protein shows that this amino acid is located directly at the border between a secondary structure element and a coiled region. The rest of our analyses, especially the physicochemical properties, the multiple alignment and the substitution matrices, showed clearly that this mutation is neutral. The prediction methods gave us the same result.

  • Ile -> Val

This mutation is also very easy to classify, because every one of our categories predicted it as neutral. Only the comparison with the real structure hinted that the mutation might not be neutral, because it lies in a secondary structure element. However, first, we do not take the comparison with the real structure into account (here the secondary structure prediction methods failed), and second, the structures of the residues and their physicochemical properties are very similar, so the mutation should not have a big effect on the structure of the protein even if it is located inside a secondary structure element. We therefore predicted this mutation as neutral, which was also the result of the three prediction methods.

  • Trp -> Arg

For the last mutation we analysed, only the secondary structure prediction methods and the PAM matrices predict the mutation as neutral. All other categories scored it as non-neutral. As the real structure shows, the secondary structure prediction failed, because the mutation is located in a secondary structure element. We predicted this mutation as non-neutral. We have five predictions for non-neutral and four for neutral, and a difference of only one category is in general not enough to make a prediction. However, the most important categories (physicochemical properties, alignment, PSSM) predict this mutation as non-neutral, and we weighted these categories more heavily than, for example, the secondary structure. We therefore decided to predict this mutation as non-neutral, which is consistent with the results of the three prediction methods.

We summarize our predictions in the following table so that the reader can see them at a glance. Because we want to verify our predictions in the next section, we also list the results of the prediction methods once more:

mutation | our prediction | SNAP | SIFT | PolyPhen2
Asn -> Ser (rs4777505) | neutral | neutral | neutral | neutral
Leu -> Arg (rs121907979) | non-neutral | non-neutral | non-neutral | non-neutral
His -> Asp (rs61731240) | non-neutral | non-neutral | non-neutral | non-neutral
Phe -> Ser (rs121907974) | non-neutral | non-neutral | non-neutral | non-neutral
Leu -> Phe (rs61747114) | neutral | neutral | neutral | neutral
Ser -> Ile (rs1054374) | neutral | neutral | neutral | neutral
Trp -> TER (rs121907967) | non-neutral | no information | no information | no information
Asn -> Asp (rs1800430) | neutral | neutral | neutral | neutral
Ile -> Val (rs121907982) | neutral | neutral | neutral | neutral
Trp -> Arg (rs121907968) | non-neutral | non-neutral | non-neutral | non-neutral

The table above shows 100% agreement between our predictions and the predictions of the different methods (except for Trp -> TER, because the other methods were not able to predict the effect of a chain termination). It therefore seems useful to compare the predictions with the real effects of the mutations, which is done in the next section.
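
The agreement between the manual prediction and each tool can be recomputed from the table. This is a small sketch with hypothetical names (`ROWS`, `agreement`). Rows for which a tool gives no information are encoded as `None` and skipped.

```python
# (mutation, our prediction, SNAP, SIFT, PolyPhen2); None = no information.
ROWS = [
    ("Asn -> Ser", "neutral", "neutral", "neutral", "neutral"),
    ("Leu -> Arg", "non-neutral", "non-neutral", "non-neutral", "non-neutral"),
    ("His -> Asp", "non-neutral", "non-neutral", "non-neutral", "non-neutral"),
    ("Phe -> Ser", "non-neutral", "non-neutral", "non-neutral", "non-neutral"),
    ("Leu -> Phe", "neutral", "neutral", "neutral", "neutral"),
    ("Ser -> Ile", "neutral", "neutral", "neutral", "neutral"),
    ("Trp -> TER", "non-neutral", None, None, None),
    ("Asn -> Asp", "neutral", "neutral", "neutral", "neutral"),
    ("Ile -> Val", "neutral", "neutral", "neutral", "neutral"),
    ("Trp -> Arg", "non-neutral", "non-neutral", "non-neutral", "non-neutral"),
]
TOOLS = ("SNAP", "SIFT", "PolyPhen2")


def agreement(rows, tool_index):
    """Return (agreeing, comparable) for one tool, skipping rows without a tool call."""
    comparable = [(row[1], row[2 + tool_index]) for row in rows
                  if row[2 + tool_index] is not None]
    agreeing = sum(own == tool for own, tool in comparable)
    return agreeing, len(comparable)


for i, tool in enumerate(TOOLS):
    agreeing, comparable = agreement(ROWS, i)
    print(f"{tool}: {agreeing}/{comparable} calls agree with the manual prediction")
```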

Comparison with the databases

Here we wanted to find out whether we and the prediction methods classified the mutations correctly. To do so, we looked up from which database each mutation was taken. We already know that mutations annotated only in SNP-DB are neutral, whereas mutations annotated in HGMD or in both databases are non-neutral.

mutation | database | our prediction | SNAP | SIFT | PolyPhen2
Asn -> Ser (rs4777505) | SNP-DB (neutral) | right | right | right | right
Leu -> Arg (rs121907979) | HGMD and SNP-DB (non-neutral) | right | right | right | right
His -> Asp (rs61731240) | SNP-DB (neutral) | wrong | wrong | wrong | wrong
Phe -> Ser (rs121907974) | HGMD and SNP-DB (non-neutral) | right | right | right | right
Leu -> Phe (rs61747114) | SNP-DB (neutral) | right | right | right | right
Ser -> Ile (rs1054374) | SNP-DB (neutral) | right | right | right | right
Trp -> TER (rs121907967) | HGMD and SNP-DB (non-neutral) | right | no information | no information | no information
Asn -> Asp (rs1800430) | SNP-DB (neutral) | right | right | right | right
Ile -> Val (rs121907982) | SNP-DB (neutral) | right | right | right | right
Trp -> Arg (rs121907968) | HGMD and SNP-DB (non-neutral) | right | right | right | right
right/all | | 9/10 | 8/10 | 8/10 | 8/10

As the table above shows, the prediction of the effect of the SNPs works very well. We predicted 90% of all cases correctly, and the prediction methods predicted 8 of 9 cases (about 89%) correctly if we disregard the chain termination. We think it is acceptable to ignore the fact that none of the methods predicts what happens if the chain terminates too early, because it is clear that in almost all cases such a mutation is non-neutral. The only wrong prediction was the mutation His -> Asp, which was predicted as non-neutral although it is neutral. The table with the results of our individual analyses makes this mutation look clearly non-neutral. There must therefore be other influences that we did not consider in our analysis. We think that this mutation is a special case, or that it is perhaps wrongly annotated or still missing from the HGMD database. In general, however, we can say that the predictions worked very well.
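
The accuracy figures can be reproduced from the database labels and the predictions. The sketch below uses hypothetical names. Since SNAP, SIFT and PolyPhen2 gave identical calls for these mutations, one list stands for all three tools, and `None` marks the chain termination for which they give no information.

```python
# Order: Asn->Ser, Leu->Arg, His->Asp, Phe->Ser, Leu->Phe,
#        Ser->Ile, Trp->TER, Asn->Asp, Ile->Val, Trp->Arg
N, X = "neutral", "non-neutral"
DATABASE = [N, X, N, X, N, N, X, N, N, X]   # label derived from SNP-DB / HGMD
OWN      = [N, X, X, X, N, N, X, N, N, X]   # manual prediction
TOOL     = [N, X, X, X, N, N, None, N, N, X]  # SNAP, SIFT and PolyPhen2 (identical)


def accuracy(truth, predicted, skip_missing=True):
    """Return (correct, total); missing predictions are skipped or counted as wrong."""
    pairs = [(t, p) for t, p in zip(truth, predicted)
             if p is not None or not skip_missing]
    correct = sum(t == p for t, p in pairs)
    return correct, len(pairs)


for name, calls in (("manual", OWN), ("tools", TOOL)):
    correct, total = accuracy(DATABASE, calls)
    print(f"{name}: {correct}/{total} = {correct / total:.1%}")
```

Counting the missing tool prediction as wrong (`skip_missing=False`) gives the 8/10 shown in the last row of the table, while skipping it gives 8 of 9 cases.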

Comparison of the different prediction methods

First of all, each prediction method works very well. If you make a manual prediction, you of course learn more about the protein and the specific mutation, but it is very time-consuming and in the end the results are mostly no better than those of the prediction methods. All prediction methods gave the same results, so all of them worked equally well. In our opinion, SNAP and SIFT are a little better than PolyPhen2, because SNAP and SIFT accept a list with many mutations, whereas PolyPhen2 has to be started separately for each mutation. SNAP and SIFT are therefore more user-friendly than PolyPhen2. Between SNAP and SIFT there is no big difference. However, we want to mention that SNAP is the slowest method. On the other hand, the output of SNAP is very good and easy to parse, whereas the output of SIFT consists of HTML tables or pictures, which is a difficult format to parse. We therefore suggest using SNAP if there is enough time.
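
For tools that accept a list of mutations, such a list can be generated from the mutation table. The sketch below writes each missense mutation in the common one-letter notation (for example N29S). The exact input format that SNAP or SIFT expects is not described on this page, so this format is an assumption, and the chain termination is left out because the tools could not handle it.

```python
ONE_LETTER = {"Asn": "N", "Ser": "S", "Leu": "L", "Arg": "R", "His": "H",
              "Asp": "D", "Phe": "F", "Ile": "I", "Trp": "W", "Val": "V"}

# (codon number, original residue, mutated residue) for the nine missense mutations.
MISSENSE = [
    (29, "Asn", "Ser"), (39, "Leu", "Arg"), (179, "His", "Asp"),
    (211, "Phe", "Ser"), (248, "Leu", "Phe"), (293, "Ser", "Ile"),
    (399, "Asn", "Asp"), (436, "Ile", "Val"), (485, "Trp", "Arg"),
]


def to_short_notation(position, original, mutated):
    """Format a substitution as <original><position><mutated>, e.g. N29S."""
    return f"{ONE_LETTER[original]}{position}{ONE_LETTER[mutated]}"


# One mutation per line; an assumed, not a documented, batch format.
batch = "\n".join(to_short_notation(*m) for m in MISSENSE)
print(batch)
```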

Important points for predicting the effect of a SNP

We have now learned a lot about the different factors that influence the effect of a SNP and about what is useful to consider when deciding whether a SNP is neutral or non-neutral. Not every category is as important as the others, however. Here we want to rank the categories, because if different categories give different results, it is useful to know which of them have more impact on the effect than others.

In our opinion, the most important category is the physicochemical properties of the residues. The size of the residues is also very important, especially if a small amino acid is replaced by a very large one. Also very important, in our opinion, are the PSSM value and the conservation in the multiple alignment. We weighted the PSSM value and the conservation more heavily than the values of the substitution matrices, because the values in a substitution matrix are position-independent, whereas the effect of a mutation depends on its position. Conservation in particular is a very important point, because it is a good hint of how important the residue is for the protein family. It is important to keep in mind, however, that we do not know what the correct alignment looks like or whether we worked with a good alignment. In our view, the location of the substitution is less important. The mutation may lie in a disordered region, in which case the argument that a mutation in a coiled region is less harmful than one in a secondary structure element is simply wrong. Furthermore, if the amino acids are very similar in properties and size and the residue is not important for the function of the protein, the mutation can be located in a secondary structure element without changing anything in the protein. In addition, many residues are located at the border of a secondary structure element, and in this case it is completely unclear whether a substitution changes anything in the function of the protein, even if the secondary structure element ends two or three residues earlier than in the non-mutated sequence. In sum, we think it is important to consider all available categories, but if there are doubts about the effect of a mutation, it is also important to weight the categories and to decide on the basis of this weighting.
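
The ranking described above can be turned into a simple weighted vote. The following is only a sketch: the numeric weights are hypothetical values chosen to follow the order given in the text (physicochemical properties first, then residue size, PSSM and conservation, then the substitution matrices, and secondary structure last). They are not the weights used in the original analysis, which was done by hand.

```python
# HYPOTHETICAL weights that follow the ranking in the text.
WEIGHTS = {
    "physicochemical properties": 4,
    "visual analysis": 3,          # stands in for size and shape of the residue
    "PSSM analysis": 3,
    "multiple alignment": 3,
    "PAM1": 2, "PAM250": 2, "BLOSUM62": 2,
    "JPred": 1, "PsiPred": 1,
}


def weighted_call(calls, weights=WEIGHTS):
    """Add the weight of every non-neutral call and subtract the weight of every
    neutral call; 'no statement' and 'no information' contribute nothing."""
    score = 0
    for category, call in calls.items():
        if call == "non-neutral":
            score += weights[category]
        elif call == "neutral":
            score -= weights[category]
    if score > 0:
        return "non-neutral", score
    if score < 0:
        return "neutral", score
    return "no statement", score


# Calls for Trp -> Arg and Leu -> Phe, copied from the summary table.
TRP_ARG = {
    "physicochemical properties": "non-neutral", "visual analysis": "non-neutral",
    "PAM1": "neutral", "PAM250": "neutral", "BLOSUM62": "non-neutral",
    "PSSM analysis": "non-neutral", "multiple alignment": "non-neutral",
    "JPred": "neutral", "PsiPred": "neutral",
}
LEU_PHE = {
    "physicochemical properties": "neutral", "visual analysis": "non-neutral",
    "PAM1": "no statement", "PAM250": "no statement", "BLOSUM62": "neutral",
    "PSSM analysis": "non-neutral", "multiple alignment": "non-neutral",
    "JPred": "neutral", "PsiPred": "neutral",
}

print("Trp -> Arg:", weighted_call(TRP_ARG))
print("Leu -> Phe:", weighted_call(LEU_PHE))
```

With these weights the narrow 5 to 4 vote for Trp -> Arg becomes a clear non-neutral call. Leu -> Phe, in contrast, ends up just above zero, unlike the manual decision, which shows how sensitive borderline cases are to the chosen weights.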