Task 6: MSUD - Sequence-based mutation analysis

From Bioinformatikpedia

Latest revision as of 18:08, 17 November 2012

Sequence-based mutation analysis

Task description

For this task, the group that reviewed our page last week was supposed to choose 10 SNPs for us to work on. We assume that we do not know which of the SNPs cause MSUD or affect the protein structure or function. As we had not received any message from that group by Thursday, we decided to choose the 10 SNPs ourselves: L17F, M82I, Q125E, I213T, C258Y, T310R, A328T, I361V, R429H, N404S

Comparison of wild type AA and mutant AA

Physicochemical properties<ref>http://en.wikipedia.org/wiki/Amino_acid</ref> and predicted secondary structure element (reprof with PSSM)

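<table border=1>
<tr><td>Position</td><td>wt-AA</td><td>wt properties</td><td>mutant-AA</td><td>mutant properties</td><td>expected effect on protein</td><td>sec-struct-element (reprof)</td></tr>
<tr><td>17</td><td>L</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>F</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>-</td><td>L</td></tr>
<tr><td>82</td><td>M</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>I</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>-</td><td>E</td></tr>
<tr><td>125</td><td>Q</td><td>polar, charge: neutral</td><td>E</td><td>polar, charge: negative</td><td>+</td><td>L</td></tr>
<tr><td>213</td><td>I</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>T</td><td>polar, charge: neutral</td><td>+</td><td>H</td></tr>
<tr><td>258</td><td>C</td><td>polar, charge: neutral</td><td>Y</td><td>polar, charge: neutral</td><td>-</td><td>L</td></tr>
<tr><td>310</td><td>T</td><td>polar, charge: neutral</td><td>R</td><td>polar, charge: positive</td><td>+</td><td>H</td></tr>
<tr><td>328</td><td>A</td><td>non-polar, charge: neutral</td><td>T</td><td>polar, charge: neutral</td><td>-</td><td>E</td></tr>
<tr><td>361</td><td>I</td><td>Hydrophobic side chain, non-polar, charge: neutral</td><td>V</td><td>non-polar, charge: neutral</td><td>-</td><td>H</td></tr>
<tr><td>404</td><td>N</td><td>polar, charge: neutral</td><td>S</td><td>polar, charge: neutral</td><td>-</td><td>L</td></tr>
<tr><td>429</td><td>R</td><td>polar, charge: positive</td><td>H</td><td>polar, charge: positive (10%), neutral (90%)</td><td>+</td><td>H</td></tr>
</table>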

Blosum62, PAM1, PAM250 scores

The scores for an amino acid exchange in the BLOSUM and PAM matrices give another hint as to whether the substitution affects the resulting protein. The higher the X in BLOSUMX, the more similar (less diverged) the sequences the matrix was derived from, while in PAMX a higher X corresponds to a larger evolutionary distance.

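<table border=1>
<tr><td>Position</td><td>wt-AA</td><td>mutant-AA</td><td>Blosum62-Score<ref>http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm</ref></td><td>PAM1-Score<ref>http://www.icp.ucl.ac.be/~opperd/private/pam1.html</ref></td><td>PAM250-Score<ref>http://www.icp.ucl.ac.be/~opperd/private/pam250.html</ref></td></tr>
<tr><td>17</td><td>L</td><td>F</td><td>0</td><td>13</td><td>13</td></tr>
<tr><td>82</td><td>M</td><td>I</td><td>1</td><td>5</td><td>2</td></tr>
<tr><td>125</td><td>Q</td><td>E</td><td>2</td><td>27</td><td>7</td></tr>
<tr><td>213</td><td>I</td><td>T</td><td>-2</td><td>7</td><td>4</td></tr>
<tr><td>258</td><td>C</td><td>Y</td><td>-2</td><td>3</td><td>4</td></tr>
<tr><td>310</td><td>T</td><td>R</td><td>-1</td><td>2</td><td>5</td></tr>
<tr><td>328</td><td>A</td><td>T</td><td>-1</td><td>32</td><td>11</td></tr>
<tr><td>361</td><td>I</td><td>V</td><td>1</td><td>33</td><td>9</td></tr>
<tr><td>404</td><td>N</td><td>S</td><td>1</td><td>20</td><td>5</td></tr>
<tr><td>429</td><td>R</td><td>H</td><td>0</td><td>10</td><td>6</td></tr>
</table>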
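
To make the sign of the BLOSUM scores easier to read: BLOSUM-style matrices are log-odds scores, i.e. they compare how often a residue pair is observed in aligned blocks with how often it would be expected by chance. BLOSUM62 is commonly given in half-bit units. The following minimal sketch illustrates this relation; the function name and all frequencies are hypothetical and are not taken from the matrices linked above.

```python
import math


def log_odds_score(q_ij: float, p_i: float, p_j: float, scale: float = 2.0) -> int:
    """Rounded log-odds score in units of 1/scale bits (scale=2 means half-bits).

    q_ij: observed frequency of the aligned pair (i, j)
    p_i, p_j: background frequencies of residues i and j
    """
    return round(scale * math.log2(q_ij / (p_i * p_j)))


# Hypothetical frequencies, chosen only for illustration.
print(log_odds_score(0.02, 0.1, 0.1))    # pair seen twice as often as expected by chance
print(log_odds_score(0.0025, 0.1, 0.1))  # pair seen a quarter as often as expected
```

A positive score therefore means that the exchange is observed more often than expected by chance, a negative score that it is observed less often.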

3D-Structure

We used PyMOL's Mutagenesis Wizard to visualize the individual mutations. In each image the carbon atoms of the original residue are coloured green and those of the mutant residue pink. Oxygen atoms are coloured red, nitrogen blue and sulphur yellow for both residues. The L17F SNP is not displayed because the residue is not in the PDB file we used (1DTW). The changes that are easiest to spot and look like they might have an effect are C258Y, T310R and R429H. In the C258Y mutation the protein loses a sulphur atom and gains an aromatic ring, while the T310R mutation greatly increases the size of the amino acid. In the R429H mutation the protein again gains an aromatic ring. While the Q125E mutation is similar in size to the original amino acid, the change in charge is very likely to have an effect on the protein.

M82I
Q125E
I213T
C258Y
T310R
A328T
I361V
N404S
R429H

PSI-BLAST - PSSM // homolog MSA

The command we used to create the PSSM file, the file itself, and the MSA can be found in the journal.

The underlying database is big_db. To interpret these values better, we wanted to find all mammalian homologs of the gene, build an MSA from them, and calculate the conservation of the wild-type and mutant residues in this context. We used HomoloGene<ref>http://www.ncbi.nlm.nih.gov/homologene</ref> from NCBI for this task. Because the search yielded no mammalian homologs, we simply used all results from the search, which can be found at <ref>http://www.ncbi.nlm.nih.gov/sites/homologene/569</ref>.

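<table border=1>
<tr><td>pos</td><td>wt-AA</td><td>mut-AA</td><td>pssm-score</td><td>pssm percent wt</td><td>pssm percent mut</td></tr>
<tr><td>17</td><td>L</td><td>F</td><td>1</td><td>31</td><td>9</td></tr>
<tr><td>82</td><td>M</td><td>I</td><td>1</td><td>28</td><td>6</td></tr>
<tr><td>125</td><td>Q</td><td>E</td><td>-1</td><td>97</td><td>0</td></tr>
<tr><td>213</td><td>I</td><td>T</td><td>-2</td><td>41</td><td>2</td></tr>
<tr><td>258</td><td>C</td><td>Y</td><td>-4</td><td>29</td><td>0</td></tr>
<tr><td>310</td><td>T</td><td>R</td><td>-3</td><td>60</td><td>0</td></tr>
<tr><td>328</td><td>A</td><td>T</td><td>2</td><td>59</td><td>12</td></tr>
<tr><td>361</td><td>I</td><td>V</td><td>2</td><td>63</td><td>13</td></tr>
<tr><td>404</td><td>N</td><td>S</td><td>2</td><td>19</td><td>16</td></tr>
<tr><td>429</td><td>R</td><td>H</td><td>-1</td><td>26</td><td>1</td></tr>
</table>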
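<table border=1>
<tr><td>snp</td><td>conservation wt</td><td>conservation mut</td><td>others</td></tr>
<tr><td>L17F</td><td>10/15 (66%)</td><td>0/15 (0%)</td><td>T(1), I(1), R(1), N(1), M(1)</td></tr>
<tr><td>M82I</td><td>10/15 (66%)</td><td>0/15 (0%)</td><td>V(2), L(2), T(1)</td></tr>
<tr><td>Q125E</td><td>15/15 (100%)</td><td>0/15 (0%)</td><td>-</td></tr>
<tr><td>I213T</td><td>10/15 (66%)</td><td>0/15 (0%)</td><td>L(4), M(1)</td></tr>
<tr><td>C258Y</td><td>12/15 (80%)</td><td>0/15 (0%)</td><td>S(1), A(2)</td></tr>
<tr><td>T310R</td><td>12/15 (80%)</td><td>0/15 (0%)</td><td>M(1), V(2)</td></tr>
<tr><td>A328T</td><td>14/15 (93%)</td><td>0/15 (0%)</td><td>C(1)</td></tr>
<tr><td>I361V</td><td>13/15 (86%)</td><td>1/15 (7%)</td><td>L(1)</td></tr>
<tr><td>N404S</td><td>10/15 (66%)</td><td>0/15 (0%)</td><td>A(1), D(1), P(2), H(1)</td></tr>
<tr><td>R429H</td><td>8/15 (53%)</td><td>0/15 (0%)</td><td>E(2), D(2), Q(1), A(1), K(1)</td></tr>
</table>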
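
The conservation values can be counted directly from the MSA column at each SNP position. The following is a minimal sketch of such a count, not the script we actually used (the real command is in the journal); the function name is hypothetical, and the example column is rebuilt from the I361V counts, so the order of residues in it is made up.

```python
from collections import Counter


def column_conservation(column: str, wt: str, mut: str):
    """Count wild-type, mutant and all other residues in one MSA column.

    Gap characters are not treated specially here; they would simply be
    reported among the 'others'.
    """
    counts = Counter(column)
    total = len(column)
    others = {aa: n for aa, n in counts.items() if aa not in (wt, mut)}
    return counts[wt], counts[mut], total, others


# Column with the residue counts reported for I361V: 13 x I, 1 x V, 1 x L.
column = "I" * 13 + "V" + "L"
wt_n, mut_n, total, others = column_conservation(column, "I", "V")
print(f"wt {wt_n}/{total}, mut {mut_n}/{total}, others {others}")
```

Returning raw counts instead of formatted percentages leaves the rounding to the caller.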

Tools

Comparison

Figure 1. Venn diagram of the SNPs that PolyPhen, SIFT and SNAP predict to have a negative effect on protein functionality.

A visual comparison of the predictions can be seen in Figure 1. For the comparison we only used the results from SNAP2. The first observation when comparing the three predictions is that the number of SNPs predicted to have a negative effect varies greatly between the tools: PolyPhen predicts seven of the SNPs to be deleterious, SIFT five, and SNAP2 only four. All three methods predicted the SNPs Q125E, T310R and C258Y to have a negative effect, which increases our confidence in these predictions. L17F (PolyPhen and SIFT) and I213T (PolyPhen and SNAP2) were each predicted to be deleterious by two tools, which again increases our confidence in these predictions. None of the tools predicted I361V or N404S to be deleterious, and R429H, M82I and A328T were each predicted by only one tool (SIFT or PolyPhen), making a negative effect less likely.
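
The grouping described above can be reproduced by counting, for each SNP, how many tools call it deleterious. This is a minimal sketch with a hypothetical function name; the calls are read from the per-tool tables on this page, and counting PolyPhen's "POSSIBLY DAMAGING" as deleterious is an assumption that matches the seven PolyPhen predictions mentioned in the text.

```python
# True = predicted to affect protein function.
# Tuple order: (PolyPhen2, SIFT, SNAP2).
# Assumption: "POSSIBLY DAMAGING" is counted as deleterious.
CALLS = {
    "L17F":  (True,  True,  False),
    "M82I":  (True,  False, False),
    "Q125E": (True,  True,  True),
    "I213T": (True,  False, True),
    "C258Y": (True,  True,  True),
    "T310R": (True,  True,  True),
    "A328T": (True,  False, False),
    "I361V": (False, False, False),
    "N404S": (False, False, False),
    "R429H": (False, True,  False),
}


def group_by_votes(calls):
    """Group SNPs by the number of tools that predict a negative effect."""
    groups = {n: [] for n in range(4)}
    for snp, votes in calls.items():
        groups[sum(votes)].append(snp)
    return groups


for n_tools, snps in sorted(group_by_votes(CALLS).items(), reverse=True):
    print(n_tools, "tool(s):", ", ".join(snps))
```

SNPs with three or zero votes are the unanimous cases; those with one or two votes are the ones that need closer manual inspection.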

Polyphen2

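<table border=1>
<tr><td>SNP</td><td>polyphen-score</td><td>sensitivity</td><td>specificity</td><td>description</td></tr>
<tr><td>L17F</td><td>0.995</td><td>0.68</td><td>0.97</td><td>PROBABLY DAMAGING</td></tr>
<tr><td>M82I</td><td>0.468</td><td>0.89</td><td>0.90</td><td>POSSIBLY DAMAGING</td></tr>
<tr><td>Q125E</td><td>0.544</td><td>0.88</td><td>0.91</td><td>POSSIBLY DAMAGING</td></tr>
<tr><td>I213T</td><td>0.644</td><td>0.87</td><td>0.91</td><td>POSSIBLY DAMAGING</td></tr>
<tr><td>C258Y</td><td>1.000</td><td>0.00</td><td>1.00</td><td>PROBABLY DAMAGING</td></tr>
<tr><td>T310R</td><td>1.000</td><td>0.00</td><td>1.00</td><td>PROBABLY DAMAGING</td></tr>
<tr><td>A328T</td><td>0.999</td><td>0.14</td><td>0.99</td><td>PROBABLY DAMAGING</td></tr>
<tr><td>I361V</td><td>0.386</td><td>0.90</td><td>0.89</td><td>BENIGN</td></tr>
<tr><td>N404S</td><td>0.000</td><td>1.00</td><td>1.00</td><td>BENIGN</td></tr>
<tr><td>R429H</td><td>0.002</td><td>0.99</td><td>0.30</td><td>BENIGN</td></tr>
</table>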

Snap/Snap2

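<table border=1>
<tr><td>SNP</td><td>snap-pred</td><td>snap-rel</td><td>snap-expected-acc</td><td>snap2-pred</td><td>snap2-rel</td><td>snap2-expected-acc</td></tr>
<tr><td>L17F</td><td>Neutral</td><td>3</td><td>78%</td><td>Neutral</td><td>6</td><td>82%</td></tr>
<tr><td>M82I</td><td>Neutral</td><td>4</td><td>85%</td><td>Neutral</td><td>4</td><td>72%</td></tr>
<tr><td>Q125E</td><td>Non-neutral</td><td>1</td><td>63%</td><td>Non-neutral</td><td>4</td><td>71%</td></tr>
<tr><td>I213T</td><td>Neutral</td><td>3</td><td>78%</td><td>Non-neutral</td><td>6</td><td>80%</td></tr>
<tr><td>C258Y</td><td>Non-neutral</td><td>2</td><td>70%</td><td>Non-neutral</td><td>6</td><td>80%</td></tr>
<tr><td>T310R</td><td>Non-neutral</td><td>1</td><td>63%</td><td>Non-neutral</td><td>6</td><td>80%</td></tr>
<tr><td>A328T</td><td>Neutral</td><td>4</td><td>85%</td><td>Neutral</td><td>1</td><td>57%</td></tr>
<tr><td>I361V</td><td>Neutral</td><td>5</td><td>89%</td><td>Neutral</td><td>8</td><td>93%</td></tr>
<tr><td>N404S</td><td>Neutral</td><td>7</td><td>94%</td><td>Neutral</td><td>9</td><td>97%</td></tr>
<tr><td>R429H</td><td>Neutral</td><td>6</td><td>92%</td><td>Neutral</td><td>4</td><td>72%</td></tr>
</table>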

SIFT

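<table border=1>
<tr><td>SNP</td><td>sift-prediction</td><td>sift-score</td><td>sift-conservation</td><td>sequences represented at pos</td></tr>
<tr><td>L17F</td><td>AFFECT PROTEIN FUNCTION</td><td>0.00</td><td>4.32</td><td>1</td></tr>
<tr><td>M82I</td><td>TOLERATED</td><td>0.26</td><td>3.12</td><td>17</td></tr>
<tr><td>Q125E</td><td>AFFECT PROTEIN FUNCTION</td><td>0.02</td><td>3.02</td><td>19</td></tr>
<tr><td>I213T</td><td>TOLERATED</td><td>0.22</td><td>3.02</td><td>19</td></tr>
<tr><td>C258Y</td><td>AFFECT PROTEIN FUNCTION</td><td>0.02</td><td>3.02</td><td>19</td></tr>
<tr><td>T310R</td><td>AFFECT PROTEIN FUNCTION</td><td>0.01</td><td>3.02</td><td>19</td></tr>
<tr><td>A328T</td><td>TOLERATED</td><td>1.00</td><td>3.02</td><td>19</td></tr>
<tr><td>I361V</td><td>TOLERATED</td><td>0.06</td><td>3.02</td><td>19</td></tr>
<tr><td>N404S</td><td>TOLERATED</td><td>0.99</td><td>3.02</td><td>19</td></tr>
<tr><td>R429H</td><td>AFFECT PROTEIN FUNCTION</td><td>0.03</td><td>3.21</td><td>15</td></tr>
</table>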

Discussion

Given that all three prediction methods agree, we assume the mutations Q125E, T310R and C258Y to be deleterious. As neither I361V nor N404S was predicted to be deleterious by a single tool, we assume them to be neutral.

For the less certain negative-effect predictions L17F and I213T, we tried to classify them based on the difference in amino acid properties and on visual inspection. For L17F we believe that the change from L to F will not have a major effect on protein function. The mutation lies inside a loop, so it is less likely to affect the protein structure and thereby the function. Also, Leu and Phe are similar in both properties and size. The I213T mutation, however, causes a large change in polarity and lies inside a helix. Both of these reasons increase the likelihood that this SNP is non-neutral.

The SNPs M82I, A328T and R429H were each predicted by two methods to have no effect on protein function. While M82I and A328T are both inside a sheet, the M82I mutation is not a large change in terms of size or chemical properties. The A328T mutation, however, is a change in polarity, so it might have an effect. For R429H we assume that the change to an amino acid with an aromatic ring inside a helix must have some kind of effect on the protein.

The actual results in HGMD for our SNPs:

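<table border=1>
<tr><td>SNP</td><td>Final Prediction</td><td>HGMD</td></tr>
<tr><td>L17F</td><td>neutral</td><td>neutral</td></tr>
<tr><td>M82I</td><td>neutral</td><td>neutral</td></tr>
<tr><td>Q125E</td><td>negative</td><td>negative</td></tr>
<tr><td>I213T</td><td>negative</td><td>negative</td></tr>
<tr><td>C258Y</td><td>negative</td><td>negative</td></tr>
<tr><td>T310R</td><td>negative</td><td>negative</td></tr>
<tr><td>A328T</td><td>uncertain</td><td>negative</td></tr>
<tr><td>I361V</td><td>neutral</td><td>neutral</td></tr>
<tr><td>N404S</td><td>neutral</td><td>neutral</td></tr>
<tr><td>R429H</td><td>uncertain</td><td>neutral</td></tr>
</table>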

By simply using the consensus between all three predictions, we were already able to correctly classify five of the ten SNPs. For the contradicting predictions, it helped in most cases to consider the chemical properties and the size of the amino acids. For the A328T and R429H mutations, we feel that the change in properties is not clearly large or small enough to safely call the mutation deleterious or neutral. Overall, SNAP2 performed best of the prediction methods on our SNPs and predicted nine out of ten mutations correctly, while both SIFT and PolyPhen only got seven out of ten right. It should be kept in mind, however, that our test set is very small and therefore might not be representative of SNPs in general, so applying SNAP2 alone might not be ideal in every situation. Using multiple prediction methods also helps to detect the 'easier' cases where no closer manual inspection is necessary, which is useful when the number of mutations is larger than in our case.

References

<references />