Difference between revisions of "Sequence-based mutation analysis TSD"
(→PolyPhen2) |
(→PolyPhen2) |
||
Line 271: | Line 271: | ||
== SIFT == |
== SIFT == |
||
== PolyPhen2 == |
== PolyPhen2 == |
||
+ | |||
+ | |||
<figtable id="tab:biochem"> |
<figtable id="tab:biochem"> |
||
{| class="wikitable", style="width:600px; border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000" |
{| class="wikitable", style="width:600px; border-collapse: collapse; border-style: solid; border-width:0px; border-color: #000" |
||
+ | |+ Table 1: Prediction results for all SNPs by PolyPhen2. '''Prediction''' is the predicted effect for the specific SNP. A 'probably damaging' call corresponds to a false positive rate (FPR) of less than 5% <ref name="pph2">Adzhubei,I.A. et al. (2010) A method and server for predicting damaging missense mutations. Nature methods, 7, 248-249.</ref>. '''pph2_class''' is the output of a binary effect classification and '''pph2_prob''' the raw probability of the mutation having a damaging effect. The column '''based_on''', which is also present in the output file, is not shown since it is a remnant of PolyPhen1 and obsolete <ref name="pph2_output">http://genetics.bwh.harvard.edu/pph2/dokuwiki/appendix_a</ref>. |
||
− | |+ Table 1: Prediction results for all SNPs by PolyPhen2. |
||
|- align="center" |
|- align="center" |
||
! style="border-style: solid; border-width: 0 0 2px 0" | Mutation |
! style="border-style: solid; border-width: 0 0 2px 0" | Mutation |
||
! style="border-style: solid; border-width: 0 0 2px 0" | Prediction |
! style="border-style: solid; border-width: 0 0 2px 0" | Prediction |
||
− | ! style="border-style: solid; border-width: 0 0 2px 0" | based_on |
||
! style="border-style: solid; border-width: 0 0 2px 0" | pph2_class |
! style="border-style: solid; border-width: 0 0 2px 0" | pph2_class |
||
! style="border-style: solid; border-width: 0 0 2px 0" | pph2_prob |
! style="border-style: solid; border-width: 0 0 2px 0" | pph2_prob |
||
Line 283: | Line 284: | ||
| style="border-style: solid; border-width: 0 0 0 0" | M1V |
| style="border-style: solid; border-width: 0 0 0 0" | M1V |
||
| style="border-style: solid; border-width: 0 0 0 0" | benign |
| style="border-style: solid; border-width: 0 0 0 0" | benign |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment_mz |
||
| style="border-style: solid; border-width: 0 0 0 0" | neutral |
| style="border-style: solid; border-width: 0 0 0 0" | neutral |
||
| style="border-style: solid; border-width: 0 0 0 0" | 0.063 |
| style="border-style: solid; border-width: 0 0 0 0" | 0.063 |
||
Line 289: | Line 289: | ||
| style="border-style: solid; border-width: 0 0 0 0" | L39R |
| style="border-style: solid; border-width: 0 0 0 0" | L39R |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 295: | Line 294: | ||
| style="border-style: solid; border-width: 0 0 0 0" | C58Y |
| style="border-style: solid; border-width: 0 0 0 0" | C58Y |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | sequence annotation |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 301: | Line 299: | ||
| style="border-style: solid; border-width: 0 0 0 0" | L127R |
| style="border-style: solid; border-width: 0 0 0 0" | L127R |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 307: | Line 304: | ||
| style="border-style: solid; border-width: 0 0 0 0" | R170W |
| style="border-style: solid; border-width: 0 0 0 0" | R170W |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 313: | Line 309: | ||
| style="border-style: solid; border-width: 0 0 0 0" | R178H |
| style="border-style: solid; border-width: 0 0 0 0" | R178H |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | structure |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" |1 |
| style="border-style: solid; border-width: 0 0 0 0" |1 |
||
Line 319: | Line 314: | ||
| style="border-style: solid; border-width: 0 0 0 0" | S210F |
| style="border-style: solid; border-width: 0 0 0 0" | S210F |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 325: | Line 319: | ||
| style="border-style: solid; border-width: 0 0 0 0" | D258H |
| style="border-style: solid; border-width: 0 0 0 0" | D258H |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
||
Line 331: | Line 324: | ||
| style="border-style: solid; border-width: 0 0 0 0" | L451V |
| style="border-style: solid; border-width: 0 0 0 0" | L451V |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 0.994 |
| style="border-style: solid; border-width: 0 0 0 0" | 0.994 |
||
Line 337: | Line 329: | ||
| style="border-style: solid; border-width: 0 0 0 0" | E482K |
| style="border-style: solid; border-width: 0 0 0 0" | E482K |
||
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
| style="border-style: solid; border-width: 0 0 0 0" | probably damaging |
||
− | | style="border-style: solid; border-width: 0 0 0 0" | alignment |
||
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
| style="border-style: solid; border-width: 0 0 0 0" | deleterious |
||
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
| style="border-style: solid; border-width: 0 0 0 0" | 1 |
Revision as of 11:21, 17 June 2012
There was only one catch and that was Catch-22, which specified that a concern for one's own safety in the face of dangers that were real and immediate was the process of a rational mind. Orr was crazy and could be grounded. All he had to do was ask; and as soon as he did, he would no longer be crazy and would have to fly more missions. Orr would be crazy to fly more missions and sane if he didn't, but if he was sane, he had to fly them. If he flew them, he was crazy and didn't have to; but if he didn't want to, he was sane and had to. Yossarian was moved very deeply by the absolute simplicity of this clause of Catch-22 and let out a respectful whistle.
"That's some catch, that Catch-22," he observed.
"It's the best there is," Doc Daneeka agreed.
-Catch 22
The journal for this task can be found here.
Mutations
Dataset
The following SNPs, selected by an unbiased source, will be analysed: M1V, L39R, C58Y, L127R, R170W, R178H, S210F, D258H, L451V and E482K.
Pick 10 mutations (SNPs) of your dataset, some of which are from the HGMD (missense mutations) and some that were only found in dbSNP (change in amino acid sequence but not found in the HGMD). Shuffle them and PLEASE do not try to memorize whether they cause the disease! The goal is to pretend that we do NOT know what is going on. It would be great if the most common disease-causing mutations were included, too.
Chemical properties
The biochemical properties of the wildtype and mutant amino acids of the chosen SNPs are listed in <xr id="tab:biochem"/>. Displayed are the hydrophobicity in the form of the hydropathy index with the corresponding category, the volume with the matching characterisation, the charge and the Grantham score.
The Grantham score predicts the effect of substitutions between amino acids based on chemical properties, including polarity and molecular volume. It categorizes codon replacements into classes of increasing chemical dissimilarity, and it ranges from 5 to 215<ref name="grantham">Grantham R. Amino acid difference formula to help explain protein evolution. Science 1974; 185: 862-864 </ref>.
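As a rough illustration of how such a score arises, the following minimal sketch recomputes the Grantham distance for a few of the substitutions above. The property values (composition, polarity, volume), the weights and the scaling factor are quoted here as assumptions recalled from the 1974 paper and should be checked against the original. Only the four residues needed for the example are included, and because of rounding in the constants the results may differ from the published matrix by up to one unit.

```python
import math

# Assumed side-chain properties after Grantham (1974):
# (composition c, polarity p, volume v). Verify against the paper before reuse.
PROPERTIES = {
    "L": (0.00, 4.9, 111.0),
    "R": (0.65, 10.5, 124.0),
    "V": (0.00, 5.9, 84.0),
    "M": (0.00, 5.7, 105.0),
}

# Assumed weights for the three squared property differences and the
# scaling factor that brings the mean distance to about 100.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399
SCALE = 50.723


def grantham_distance(wt, mt):
    """Return the (unrounded) Grantham distance between two residues."""
    c1, p1, v1 = PROPERTIES[wt]
    c2, p2, v2 = PROPERTIES[mt]
    raw = (ALPHA * (c1 - c2) ** 2
           + BETA * (p1 - p2) ** 2
           + GAMMA * (v1 - v2) ** 2)
    return SCALE * math.sqrt(raw)


if __name__ == "__main__":
    for wt, mt in [("L", "R"), ("L", "V"), ("M", "V")]:
        print(f"{wt}->{mt}: {grantham_distance(wt, mt):.1f}")
```

The values printed for L to R, L to V and M to V should land close to the table entries for L39R/L127R, L451V and M1V, which shows that the large score of L to R is driven mainly by the polarity term.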
TODO: Cool stuff, where did volume come from and which of the many many hydrophobicity scales is this? How should I know?? Its from snpdbe.... wherever they got it from is far beyond my knowledge
<figtable id="tab:biochem">
Mutation | Wildtype hydrophobicity | Wildtype volume | Wildtype charge | Mutant hydrophobicity | Mutant volume | Mutant charge | Grantham score |
---|---|---|---|---|---|---|---|
M1V | 1.9 (nonpolar) | 162.9 (bulky) | neutral | 4.2 (nonpolar) | 140.0 (small) | neutral | 21 |
L39R | 3.8 (nonpolar) | 166.7 (bulky) | neutral | -4.5 (polar) | 173.4 (bulky) | positive | 102 |
C58Y | 2.5 (polar) | 108.5 (small) | neutral | -1.3 (polar) | 193.6 (bulky) | neutral | 194 |
L127R | 3.8 (nonpolar) | 166.7 (bulky) | neutral | -4.5 (polar) | 173.4 (bulky) | positive | 102 |
R170W | -4.5 (polar) | 173.4 (bulky) | positive | -0.9 (nonpolar) | 227.8 (bulky) | neutral | 101 |
R178H | -4.5 (polar) | 173.4 (bulky) | positive | -3.2 (polar) | 153.2 (bulky) | neutral | 29 |
S210F | -0.8 (polar) | 89.0 (tiny) | neutral | 2.8 (nonpolar) | 189.9 (bulky) | neutral | 155 |
D258H | -3.5 (polar) | 111.1 (small) | negative | -3.2 (polar) | 153.2 (bulky) | neutral | 81 |
L451V | 3.8 (nonpolar) | 166.7 (bulky) | neutral | 4.2 (nonpolar) | 140.0 (small) | neutral | 32 |
E482K | -3.5 (polar) | 138.4 (bulky) | negative | -3.9 (polar) | 168.6 (bulky) | positive | 56 |
</figtable>
Structural observations
Now take into consideration where in the protein the mutation occurs and document it: create a picture with PyMOL showing the original and the mutated residue in the protein. More thorough structural analyses will be introduced in the next task.
Using your secondary structure predictions from the previous tasks, investigate whether the mutations are inside secondary structure elements (Helix, Strand) or not.
<figure id="fig:snpsOnstr">
</figure>
Substitution matrices
BLOSUM62 and PAM
<xr id="tab:pamnlos"/> shows the substitution scores of all mutations for the BLOSUM62, PAM1 and PAM250 matrices. PAM1 and PAM250 are the extremes for this type of substitution matrix and likely not used often in reality <ref name="asd">TODO</ref>. BLOSUM62, however, is one of the standard matrices used by default in large tools like BLAST. The BLOSUM62 matrix used here is the incorrectly calculated version, since it generally performs better <ref name="wrongblosum">Styczynski, M. P., Jensen, K. L., Rigoutsos, I., & Stephanopoulos, G. (2008). BLOSUM62 miscalculations improve search performance. Nature biotechnology, 26(3), 274–275. Nature Publishing Group. Retrieved from http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-274.html</ref>.
An entry in the PAM matrix describes the probability that the wt amino acid is replaced by the mt amino acid in a given timeframe, e.g. 1PAM or 250PAM. TODO Scores within a BLOSUM matrix are log-odds scores: for a pair of aligned amino acids, they give the logarithm of the ratio between the likelihood of the pair appearing for biological reasons and the likelihood of the same pair appearing by chance.[2]
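The log-odds idea can be made concrete with a minimal sketch. It assumes that scores are reported in half-bit units (twice the base-2 logarithm, rounded to an integer), as is commonly stated for BLOSUM62, and that `p_pair` is the observed frequency of the ordered aligned pair. The function name and the frequencies in the example are purely illustrative and are not taken from any real matrix.

```python
import math


def log_odds_half_bits(p_pair, q_a, q_b):
    """Log-odds score of an aligned pair in half-bit units.

    p_pair: observed frequency of the (ordered) aligned pair a/b
    q_a, q_b: background frequencies of the two amino acids
    """
    expected = q_a * q_b  # frequency expected if the pairing were random
    return round(2 * math.log2(p_pair / expected))


if __name__ == "__main__":
    # Illustrative numbers only: a pair seen twice as often as expected
    # scores positive, a pair seen half as often scores negative.
    print(log_odds_half_bits(0.02, 0.1, 0.1))
    print(log_odds_half_bits(0.005, 0.1, 0.1))
```

A score of 0 therefore means that the pair occurs about as often as chance predicts, which helps when judging how low a value of 1 or -2 really is.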
<figtable id="tab:pamnlos">
Mutation | BLOSUM62 <ref name="blosumvalsrc">http://www.ncbi.nlm.nih.gov/Class/BLAST/BLOSUM62.txt</ref> | PAM (1<ref name="pamvalsrc1">http://www.icp.ucl.ac.be/~opperd/private/pam1.html</ref>/250<ref name="pamvalsrc250">http://www.icp.ucl.ac.be/~opperd/private/pam250.html</ref>) |
---|---|---|
M1V | 1 | 17/400 |
L39R | -2 | 1/200 |
C58Y | -2 | 3/300 |
L127R | -2 | 1/200 |
R170W | -3 | 2/200 |
R178H | 0 | 8/500 |
S210F | -2 | 2/200 |
D258H | -1 | 3/400 |
L451V | 1 | 11/1500 |
E482K | 1 | 7/800 |
</figtable>
In a BLOSUM62 matrix, the values range from -4, a very unlikely substitution, to 11, the most likely substitution. With this in mind it becomes apparent that most SNPs describe rare changes. The most likely ones only have values of 1, which does not seem much compared to the maximum of 11. However, the 75% quantile of all values in the matrix is only 0, so a value of 1 is not as low as it might seem (TODO I'm writing strange stuff here, can I say that and what do the values truly mean?). On the other hand, a value of 1 seems high considering that E482K is a change from a negatively charged amino acid to a positively charged one.
In PAM1 and PAM250, the minimum and maximum values are 0,9976 and 9,72 respectively. The high maximum, especially for PAM1, stems from the diagonal of the matrix; the 90% quantile is only 22. As is to be expected, given more time for the changes to occur (PAM1 to PAM250), the probability of the substitutions rises. However, the general relation between the SNPs mostly stays comparable. The biggest change occurs for M1V, which seems much more likely given only one unit of time.
From this small example the two matrices seem to correlate well. All values that are high in BLOSUM62 are comparably high in both PAM matrices as well, the only exception being M1V, which becomes significantly less probable in PAM250. Concerning the phenotype, it would be rather far-fetched to make an assumption based only on these values, especially since they are mostly close together. However, both matrices seem to agree that if there are disease-causing SNPs among the set, they are M1V, L451V and E482K, possibly R178H as well.
Look at the BLOSUM62 and PAM(1/250) matrix. What are the scores for the amino acid substitutions? Is it the worst possible substitution or not? Can we say anything about phenotype from this?
PSSM
PSI-BLAST was used to create a PSSM, which should contain more refined mutation scores since they are calculated separately for each position. The resulting scores are shown in <xr id="tab:pssm"/>. Comparing the values to the scores from the position-independent matrices used above, one can see that the general relations between the SNPs mostly stay the same. The exceptions are M1V, L451V and E482K, which become less likely, and R178H, which becomes more likely.
The change in M1V is to be expected, considering that the first position in the homologous sequences should mostly be occupied by methionine due to the start codon. The fact that R178H becomes more likely is surprising, since this position is one of those originally identified as potentially important for catalysis. This might be a small indicator that the original assumption was wrong or that the position is at least not important in all homologs.
L451 is found in a single-residue loop that could also be seen as an extension of the preceding alpha-helix and is directly followed by a beta-strand. In any case, the side-chain at this position points to the outside of the protein, and with only beta-sheets and alpha-helices nearby it does not seem to fulfil a deeper purpose. Therefore exchanging the leucine for the comparably similar valine should not be a problem, and the decreased score is a surprise that cannot be explained for now. Apparently the residue does serve a purpose in other sequences. E482K is again an expected change, since there is not only the change in charge, but structural analysis also suggests that the glutamate forms hydrogen bonds to neighbouring residues.
<figtable id="tab:pssm">
Mutation | PSI-Blast PSSM |
---|---|
M1V | -6 |
L39R | -4 |
C58Y | -2 |
L127R | -7 |
R170W | -5 |
R178H | 3 |
S210F | 1 |
D258H | 1 |
L451V | -2 |
E482K | -2 |
</figtable>
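The comparison between the position-independent and the position-specific scores can be made explicit with a small sketch. It copies the BLOSUM62 and PSI-BLAST PSSM values from the two tables above and lists the difference per mutation. Subtracting the scores directly is an assumption made here for illustration and gives only a rough comparison, since the two score sets need not be on exactly the same scale.

```python
# Scores copied from the BLOSUM62 column and the PSI-BLAST PSSM table above.
blosum62 = {
    "M1V": 1, "L39R": -2, "C58Y": -2, "L127R": -2, "R170W": -3,
    "R178H": 0, "S210F": -2, "D258H": -1, "L451V": 1, "E482K": 1,
}
pssm = {
    "M1V": -6, "L39R": -4, "C58Y": -2, "L127R": -7, "R170W": -5,
    "R178H": 3, "S210F": 1, "D258H": 1, "L451V": -2, "E482K": -2,
}

# Positive delta: the substitution looks more acceptable at this specific
# position than the general matrix suggests; negative delta: less acceptable.
delta = {mutation: pssm[mutation] - blosum62[mutation] for mutation in blosum62}

if __name__ == "__main__":
    for mutation, change in sorted(delta.items(), key=lambda item: item[1]):
        print(f"{mutation:6s} BLOSUM62={blosum62[mutation]:3d} "
              f"PSSM={pssm[mutation]:3d} delta={change:+d}")
```

Sorting by the difference puts the strongest drops (such as M1V) at the top and the gains (such as R178H) at the bottom, which makes the exceptions discussed above easy to spot.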
Getting a bit closer to evolution you will have to create a PSSM (position specific scoring matrix) for your protein sequence using PSI-BLAST (5 iterations). How conserved are the WT residues in your mutant positions? How is the frequency of occurrence (conservation) for the mutant residue type? Anything interesting?
Multiple sequence alignments
And another step closer to evolution: Identify all mammalian homologous sequences. Create a multiple sequence alignment for them with a method of your choice. Using this you can now calculate conservation for WT and mutant residues again. Compare this to the matrix- and PSSM-derived results.
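One simple way to quantify conservation from such an alignment is the relative frequency of the WT and mutant residue in the alignment column. The sketch below assumes the alignment is available as a list of equal-length strings with `-` for gaps; the toy alignment and the function name are hypothetical and only serve to show the calculation. Note that the alignment column index is generally not the protein position, so the mapping via the reference sequence has to be done separately.

```python
from collections import Counter


def column_frequencies(alignment, column):
    """Relative residue frequencies in one alignment column, ignoring gaps."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment, not real homologs of the protein.
    toy_alignment = ["MLKR", "MLRR", "MIKR", "-LKH"]
    freqs = column_frequencies(toy_alignment, 3)
    wt, mt = "R", "H"
    print(f"WT {wt}: {freqs.get(wt, 0.0):.2f}, mutant {mt}: {freqs.get(mt, 0.0):.2f}")
```

A WT frequency near 1 together with a mutant frequency of 0 would point to a strictly conserved position, whereas a mutant residue that already occurs in some homologs suggests the substitution is tolerated.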
Prediction
SIFT
PolyPhen2
<figtable id="tab:biochem">
Mutation | Prediction | pph2_class | pph2_prob |
---|---|---|---|
M1V | benign | neutral | 0.063 |
L39R | probably damaging | deleterious | 1 |
C58Y | probably damaging | deleterious | 1 |
L127R | probably damaging | deleterious | 1 |
R170W | probably damaging | deleterious | 1 |
R178H | probably damaging | deleterious | 1 |
S210F | probably damaging | deleterious | 1 |
D258H | probably damaging | deleterious | 1 |
L451V | probably damaging | deleterious | 0.994 |
E482K | probably damaging | deleterious | 1 |
</figtable>
SNAP
Finally, we use three different approaches to score our mutants: SIFT, Polyphen2 and SNAP. SNAP is installed on the student cluster and should be used command-line only. You will need to create your own ~/.snapfunrc (unless Tim will change the default one) to point to the correct paths. As blast is the bottleneck of SNAP, and you are doing that anyway, we might as well look at all possible substitutions in the position of our mutations. This way we can learn much more about the nature of the given mutation: Is our mutation problematic because we introduce an unwanted effect, or because the WT residue is essential and by mutating we remove that?
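Listing all possible substitutions at the mutated positions can be sketched as follows. The helper only expands a mutation such as L39R into the 19 alternatives at that position. It makes no assumption about the input format SNAP expects, so the resulting strings would still have to be written out in whatever form the tool requires.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def all_substitutions(mutation):
    """Expand e.g. 'L39R' into all 19 substitutions of L at position 39."""
    wt = mutation[0]
    position = int(mutation[1:-1])
    return [f"{wt}{position}{aa}" for aa in AMINO_ACIDS if aa != wt]


if __name__ == "__main__":
    snps = ["M1V", "L39R", "C58Y", "L127R", "R170W",
            "R178H", "S210F", "D258H", "L451V", "E482K"]
    for snp in snps:
        variants = all_substitutions(snp)
        print(snp, len(variants), " ".join(variants))
```

If most of the 19 variants at a position are predicted to be harmful, the WT residue itself is probably essential; if only a few are, the specific mutant residue is more likely to be the problem.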
Consensus
Compare ALL results and create an overview table. Try to come up with a consensus between all the findings requested above.
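A simple way to derive a consensus is a majority vote over the individual calls. The sketch below assumes every method's output has already been reduced to a common label such as "damaging" or "neutral". The function name and the example calls are hypothetical placeholders, not results from this analysis.

```python
from collections import Counter


def consensus(calls):
    """Majority vote over per-method labels; ties are reported as 'undecided'."""
    if not calls:
        return "undecided"
    ranked = Counter(calls.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undecided"
    return ranked[0][0]


if __name__ == "__main__":
    # Placeholder labels for illustration only.
    example_calls = {"SIFT": "damaging", "PolyPhen2": "damaging", "SNAP": "neutral"}
    print(consensus(example_calls))
```

A plain vote treats all methods as equally reliable; weighting the methods or keeping the "undecided" cases for manual inspection may be more appropriate when the evidence from the matrices and the structure points in different directions.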
Evaluation
Check whether you are right in the HGMD – were you able to predict a change?
For this task it is very important to us that you properly interpret and discuss your results. The production of the data should not take that long – so you have more time to do real science!