Sequence-based mutation analysis GLA

From Bioinformatikpedia
Revision as of 00:28, 30 June 2011 by Drexler (talk | contribs) (Secondary Structure)

by Benjamin Drexler and Fabian Grandke

Introduction

Selected Mutations

We randomly selected ten annotated point mutations of the human gene GLA from a pool that consists of two subsets. The first subset contains mutations that are present in HGMD; these were already gathered in task 4 (Mapping SNPs). The second subset contains mutations that are present in dbSNP but not included in HGMD. This was the case for only three mutations.

Mutations at amino acid positions 1 to 31 were excluded from the selection process, because these residues are part of the signal peptide (see UniProt entry) and are not present in the reference structure (PDB ID 1R47).

Number  AA-Position  Codon change  Amino acid change  Visualization
1       42           ATG-ACG       Met -> Thr         Close-up of residue 42 of GLA
2       65           AGT-ACG       Ser -> Thr         Close-up of residue 65 of GLA
3       117          ATT-AGT       Ile -> Ser         Close-up of residue 117 of GLA
4       143          cGCA-ACA      Ala -> Thr         Close-up of residue 143 of GLA
5       186          CAC-CGC       His -> Arg         Close-up of residue 186 of GLA
6       205          gCCT-ACT      Pro -> Thr         Close-up of residue 205 of GLA
7       244          gGAC-CAC      Asp -> His         Close-up of residue 244 of GLA
8       283          CAG-CCG       Gln -> Pro         Close-up of residue 283 of GLA
9       321          tCAG-TAG      Gln -> Glu         Close-up of residue 321 of GLA
10      363          TATa-TAA      Arg -> Cys         Close-up of residue 363 of GLA

The visualization was done with PyMOL, and the mutagenesis of each residue was performed according to this tutorial. The wildtype residue is colored green and the mutated residue is colored red.

Mutation Analysis

Physicochemical Properties and Changes

Substitution Matrices

In this section, we use substitution matrices to evaluate whether the substitution introduced by each mutation is favorable in a biological context. For this, we use two different kinds of substitution matrices. First, the Blocks of Amino Acid Substitution Matrix (BLOSUM) is an evidence-based matrix that is calculated from alignments between proteins <ref name=blosum>en.wikipedia.org/wiki/BLOSUM</ref>. Second, Point Accepted Mutation or Percent Accepted Mutation (PAM) is a set of matrices that is derived from the amino acid substitutions between closely related proteins <ref name=pam>en.wikipedia.org/wiki/Point_accepted_mutation</ref>. In general, a higher value in a substitution matrix indicates a more likely substitution.


                      BLOSUM62                 PAM1                     PAM250
Number  Substitution  Mutation  Best¹  Worst²  Mutation  Best   Worst   Mutation  Best   Worst
1       Met -> Thr    -1        2      -3      2         8      0       1         3      0
2       Ser -> Thr    1         1      -3      38        38     0       9         9      3
3       Ile -> Ser    -2        2      -4      1         33     0       3         9      1
4       Ala -> Thr    -1        1      -3      32        35     0       11        12     2
5       His -> Arg    0         2      -3      8         20     0       5         7      2
6       Pro -> Thr    1         1      -4      4         13     0       5         7      1
7       Asp -> His    -1        2      -4      4         53     0       6         10     1
8       Gln -> Pro    -1        2      -3      6         27     0       4         7      1
9       Gln -> Glu    2         2      -3      27        27     0       7         7      1
10      Arg -> Cys    -3        2      -3      1         19     0       2         9      1

1 Best is the highest value in the respective column/row, except for the self-substitution (e.g. Met -> Met).
2 Worst is the lowest value in the respective column/row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • gray: the substitution value of the mutation has the same absolute difference to both values
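
The coloring rule can be written down in a few lines. The following is only a minimal sketch of the rule as stated above; the function name `color_for` is our own choice for illustration and not part of any tool used here.

```python
def color_for(mutation: int, best: int, worst: int) -> str:
    """Return the table color for one substitution score."""
    to_best = abs(best - mutation)    # distance to the best value
    to_worst = abs(mutation - worst)  # distance to the worst value
    if to_best < to_worst:
        return "green"
    if to_worst < to_best:
        return "red"
    return "gray"  # same absolute difference to both values


# BLOSUM62 values (mutation, best, worst) taken from the table above
print(color_for(-1, 2, -3))  # Met -> Thr: red
print(color_for(1, 1, -3))   # Ser -> Thr: green
print(color_for(-1, 2, -4))  # Asp -> His: gray
```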


The following substitution matrices were used: BLOSUM62, PAM1 and PAM250.

PSSM

In this section, we use a position-specific scoring matrix (PSSM) to evaluate how well the wildtype and mutant residues are conserved in related proteins. For this, we ran PSI-BLAST with the following command:

blastpgp -i GLA.fasta -j 5 -d /data/blast/nr/nr -e 10E-6 -Q psiblast.mat -o psiblast.out

The relevant values are listed in the following table. The full PSSM rows for the mutated positions are provided on this page.

                      PSSM
Number  Substitution  Mutation  Wildtype  Best¹  Worst²
1       Met -> Thr    -3        9         9      -6
2       Ser -> Thr    3         3         5      -3
3       Ile -> Ser    -3        4         5      -6
4       Ala -> Thr    -1        1         2      -5
5       His -> Arg    1         0         2      -5
6       Pro -> Thr    -2        2         3      -5
7       Asp -> His    1         4         4      -5
8       Gln -> Pro    -5        3         8      -5
9       Gln -> Glu    1         7         7      -5
10      Arg -> Cys    -1        2         3      -4

1 Best is the highest value in the respective row.
2 Worst is the lowest value in the respective row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • gray: the substitution value of the mutation has the same absolute difference to both values
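
The four values per mutation can also be read from the ASCII matrix file (here psiblast.mat) by a script. The following is a minimal sketch, not the procedure we used. It assumes the usual layout of that file: one row per position, starting with the position number and the one-letter residue, followed by 20 whitespace-separated integer scores in the column order A R N D C Q E G H I L K M F P S T W Y V. The function names are hypothetical.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # assumed column order of the ASCII PSSM


def parse_pssm(text: str) -> dict:
    """Map position -> (wildtype residue, {amino acid: score})."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: position number, one-letter residue, >= 20 scores.
        # Header and statistics lines do not match and are skipped.
        if len(fields) < 22 or not fields[0].isdigit() or len(fields[1]) != 1:
            continue
        scores = [int(value) for value in fields[2:22]]
        rows[int(fields[0])] = (fields[1], dict(zip(AA_ORDER, scores)))
    return rows


def summarize(rows: dict, position: int, mutant: str) -> tuple:
    """Return (mutation, wildtype, best, worst) for one position."""
    wildtype, scores = rows[position]
    return (scores[mutant], scores[wildtype],
            max(scores.values()), min(scores.values()))
```

With the file content read into a string, `summarize(parse_pssm(text), 42, "T")` would return the four values for mutation 1 in the order used in the table.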

Multiple Sequence Alignment

We take a look at a multiple sequence alignment (MSA) to evaluate the conservation of the residues which are affected by one of the mutations. We used BLAST to get the sequences for the MSA. A table of the sequences is provided on this page. Afterwards we created two MSAs with the locally installed T-Coffee.

  • MSA with 100 sequences: see here
  • MSA with 25 sequences: see figure 1
Figure 1: multiple sequence alignment of the best 25 sequences by T-Coffee in JalView.

We used the conservation index according to Livingstone C.D. and Barton G.J.<ref name=livingstone>Livingstone C.D. and Barton G.J. (1993), "Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation", CABIOS Vol. 9 No. 6 (745-756), PubMed</ref> to determine whether a residue is conserved or not. The conservation index was calculated by JalView and ranges from 0 to 11.

Number  Position  Conservation (100 sequences)  Conservation (25 sequences)
1       42        10                            11
2       65        8                             11
3       117       9                             9
4       143       10                            8
5       186       4                             2
6       205       10                            11
7       244       10                            11
8       283       11                            11
9       321       11                            11
10      363       3                             5
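
Gaps in the alignment shift the columns, so a residue number first has to be mapped to its alignment column before the conservation value can be read. The following is a minimal sketch of that mapping. It assumes that the aligned GLA row is available as a string, that gaps are written as "-" or ".", and that residue numbering starts at 1 with the first residue of that row. The function name is hypothetical.

```python
def alignment_column(aligned_seq: str, residue_number: int,
                     gap_chars: str = "-.") -> int:
    """Return the 0-based alignment column of a 1-based residue number."""
    seen = 0  # residues counted so far, gaps excluded
    for column, char in enumerate(aligned_seq):
        if char in gap_chars:
            continue
        seen += 1
        if seen == residue_number:
            return column
    raise ValueError(f"sequence has only {seen} residues")


# Toy example: the third residue (L) sits in column 4 because of two gaps
print(alignment_column("MK--LT", 3))  # 4
```

Note that the returned column is 0-based, whereas alignment viewers usually count columns from 1.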

Secondary Structure

We examined the potential influence of the mutations on the secondary structure of α-galactosidase. For this, we used two programs that predict the secondary structure, namely PSIPRED and JPred3. Please see task 2 - sequence-based predictions for further explanations of the programs.

We performed one run with the wildtype sequence of α-galactosidase and afterwards ten runs, each with a single isolated mutation, so that the co-occurrence of two or more mutations cannot influence the prediction. Such an influence would be unlikely anyway, since each pair of mutations should be far enough apart in the sequence. The results are listed in the table below. As can be seen, none of the mutations seems to influence the secondary structure, at least in the predictions.

                      UniProt      PSIPRED                   JPred3
Number  Substitution               Wildtype     Mutation     Wildtype     Mutation
1       Met -> Thr    Beta strand  Coil         Coil         Coil         Coil
2       Ser -> Thr    Coil         Coil         Coil         Helix        Helix
3       Ile -> Ser    Helix        Coil         Coil         Helix        Helix
4       Ala -> Thr    Coil         Coil         Coil         Coil         Coil
5       His -> Arg    Helix        Helix        Helix        Helix        Helix
6       Pro -> Thr    Helix        Coil         Coil         Coil         Coil
7       Asp -> His    Helix        Helix        Helix        Helix        Helix
8       Gln -> Pro    Helix        Helix        Helix        Helix        Helix
9       Gln -> Glu    Coil         Coil         Coil         Coil         Coil
10      Arg -> Cys    Beta strand  Beta strand  Beta strand  Beta strand  Beta strand
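
The ten single-mutant input sequences for such runs can be generated by a short script. The following is a minimal sketch, not the procedure we used. It assumes that the full-length wildtype sequence (including the signal peptide, so that the position numbers match) is available as a string. The function names are hypothetical. The check against the expected wildtype residue guards against off-by-one errors in the numbering.

```python
def apply_mutation(sequence: str, position: int,
                   wildtype: str, mutant: str) -> str:
    """Return a copy of sequence with one residue exchanged (1-based)."""
    found = sequence[position - 1]
    if found != wildtype:
        raise ValueError(f"expected {wildtype} at {position}, found {found}")
    return sequence[:position - 1] + mutant + sequence[position:]


# (position, wildtype, mutant) in one-letter code, as in the table above
MUTATIONS = [(42, "M", "T"), (65, "S", "T"), (117, "I", "S"),
             (143, "A", "T"), (186, "H", "R"), (205, "P", "T"),
             (244, "D", "H"), (283, "Q", "P"), (321, "Q", "E"),
             (363, "R", "C")]


def single_mutants(sequence: str) -> dict:
    """Map labels such as 'M42T' to sequences with exactly one mutation."""
    return {f"{wt}{pos}{mut}": apply_mutation(sequence, pos, wt, mut)
            for pos, wt, mut in MUTATIONS}
```

Each returned sequence differs from the wildtype at exactly one position, so every prediction run sees one isolated mutation.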

Programs

SNAP

SIFT

PolyPhen

Discussion

References

<references />