Lab journal task 8

From Bioinformatikpedia
Revision as of 16:53, 24 August 2013 by Betza

Mutation selection

10 mutations were randomly selected from HGMD and dbSNP.

Mutation analysis

The description of the physicochemical properties is based on the Wikipedia entry on amino acids.

The mutations were visualized with PyMOL. Because the PDB structure 1A6Z starts at position 22 of the reference sequence, we subtracted 22 from the codon position to obtain the residue number of the mutation in the structure. The mutations were introduced following the description in "Use PyMOL for this". We introduced the first 9 SNPs, but the last one (Arg330Met) could not be visualized: the PDB structure is shorter than the reference sequence and only contains residues 22 to 297.

For each mutated residue, the rotamer was chosen based on its orientation and on the size and color of the discs PyMOL draws to indicate steric clashes. If no rotamer was free of red discs, we selected the one with the fewest and smallest red discs. For residues on the surface of the protein, we also tried to choose rotamers that do not point into the solvent.
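The numbering step above amounts to a fixed offset plus a range check. The following is a minimal Python sketch that takes the offset (22) and the covered range (22 to 297) exactly as stated in the text. The helper name `to_structure_position` is hypothetical.

```python
# Hypothetical helper: map reference-sequence positions to 1A6Z residue numbers.
# Assumes the offset and covered range stated in the text (offset 22, positions 22-297).
from typing import Optional

OFFSET = 22              # subtracted from the reference (codon) position
FIRST, LAST = 22, 297    # reference positions covered by the 1A6Z structure

# Mutation positions as listed in the PSSM tables of this journal
MUTATION_POSITIONS = [53, 63, 67, 97, 130, 168, 183, 217, 282, 330]


def to_structure_position(ref_pos: int) -> Optional[int]:
    """Return the 1A6Z residue number, or None if the position is not in the structure."""
    if FIRST <= ref_pos <= LAST:
        return ref_pos - OFFSET
    return None


if __name__ == "__main__":
    for pos in MUTATION_POSITIONS:
        mapped = to_structure_position(pos)
        print(pos, mapped if mapped is not None else "not in structure")
```

Under these assumptions, position 330 falls outside the covered range, which matches the observation that Arg330Met could not be visualized.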

The secondary structure at each mutation site was taken from the DSSP assignment for chain A of 1A6Z (1A6Z_A).
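The DSSP assignment at the mutation sites can be read out with a short script. This sketch assumes the classic fixed-column DSSP output format: residue number in columns 6-10, chain identifier in column 12, amino acid in column 14 and secondary-structure code in column 17 (all 1-based). The function name `read_dssp` is hypothetical, and insertion codes are ignored.

```python
# Minimal sketch: read the secondary-structure code per residue from classic
# DSSP output (fixed-column layout assumed, see lead-in).
from typing import Dict, Iterable


def read_dssp(lines: Iterable[str], chain: str = "A") -> Dict[int, str]:
    """Map PDB residue number -> DSSP code for one chain ('-' for no assignment)."""
    result: Dict[int, str] = {}
    in_table = False
    for line in lines:
        if line.startswith("  #  RESIDUE"):
            in_table = True          # per-residue records follow this header line
            continue
        if not in_table or len(line) < 17:
            continue
        if line[13] == "!":
            continue                 # chain-break marker, not a residue
        if line[11] != chain:
            continue                 # residue belongs to another chain
        resnum = int(line[5:10])
        ss = line[16]
        result[resnum] = ss if ss != " " else "-"
    return result
```

Calling `read_dssp` on the lines of a DSSP file for 1A6Z with chain "A" returns a dictionary keyed by residue number. You can then query it with the structure positions of the mutations.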

The BLOSUM62 and PAM250 substitution scores were taken from the standard BLOSUM62 and PAM250 matrices.
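Looking up a substitution score means indexing the matrix by the wild-type and mutant residue. The following is a minimal sketch. It assumes the matrix is stored in the common NCBI text layout: '#' comment lines, a header row of residue letters, then one row per residue starting with its letter. The function names are hypothetical, and the test uses a small toy matrix rather than the full BLOSUM62 or PAM250 file.

```python
# Minimal sketch: parse a substitution matrix in NCBI text layout and look up
# the score for a wild-type -> mutant exchange.
from typing import Dict, Iterable, List, Optional


def parse_matrix(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    matrix: Dict[str, Dict[str, int]] = {}
    columns: Optional[List[str]] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.split()
        if columns is None:
            columns = fields               # header row: column residue letters
            continue
        row, scores = fields[0], fields[1:]
        matrix[row] = {col: int(s) for col, s in zip(columns, scores)}
    return matrix


def score(matrix: Dict[str, Dict[str, int]], wild: str, mutant: str) -> int:
    """Substitution score for replacing `wild` by `mutant`."""
    return matrix[wild][mutant]
```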

The PSSM was computed with PSI-BLAST (blastpgp, 5 iterations, otherwise default parameters) against the /mnt/project/pracstrucfunc13/data/big/big_80 database:

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -j 5 \
  -o /mnt/home/student/betza/task8/psiblast/iter5_big80.results \
  -Q /mnt/home/student/betza/task8/psiblast/iter5_big80.pssm \
  -C /mnt/home/student/betza/task8/psiblast/iter5_big80.chk

The resulting PSSM is the following (only the 10 mutation positions are shown):

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V	
  53 V   -3 -6 -7 -7 -3 -6 -6 -7 -2  0  4 -6  1 -1 -6 -4 -5 -6 -5  6    2   0   0   0   1   0   0   0   1   4  37   0   3   2   0   1   0   0   0  47  1.32 inf
  63 H   -3 -1  2 -3  1 -5 -4 -1  0 -4 -5  0 -3 -7 -3  6 -1 -2 -7 -5    1   3   8   2   3   0   1   5   2   1   1   5   1   0   1  62   4   1   0   1  1.26 inf
  67 R   -3  5  0 -3 -3  2  0 -3 -1 -2 -3  3 -2 -6 -2  0  2 -1 -5 -4    2  33   4   1   1   7   5   2   1   2   2  16   1   0   2   5  13   1   0   1  0.69 inf
  97 M   -1  0  0  0  2  0  0 -1  2  0 -1  0  2  1  0 -1  0  4  1 -1    5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5  0.09 inf
 130 N   -4  0  2  0 -2 -2 -1  5 -2 -2 -4  0 -1 -4 -4 -1  2 -1 -6 -3    1   5  10   4   1   2   4  40   1   3   2   5   1   1   1   5  11   1   0   2  0.67 inf
 168 E   -3  1 -1 -2 -3 -1  0 -3 -2  0  1  4  0  0 -3 -3 -3  0  2  0    2   6   4   2   1   2   7   2   1   6  14  22   2   4   2   3   1   1   7   6  0.29 inf
 183 L   -4 -3 -3 -4 -2 -4 -4 -5 -1  1  5 -2  2 -1 -4 -3 -3 -1  1  2    2   2   1   1   1   1   1   1   1   6  45   3   4   3   1   2   2   1   5  13  0.68 inf
 217 T    0 -3 -2  0 -5 -1 -1  1 -1 -3  1 -2 -1 -3  3  3  0 -6 -1 -1    8   1   1   4   0   3   3  10   1   1  12   2   1   1  15  22   6   0   2   5  0.30 inf
 282 C   -6 -6 -8 -9 12 -8 -9 -8 -8 -7 -7 -8 -7 -5 -8 -5 -6 -8 -5 -6    0   1   0   0  94   0   0   0   0   0   0   0   0   1   0   1   0   0   1   0  4.31 inf
 330 R   -5  5 -4 -6  1 -1 -5 -6  2 -5 -5  3 -3  0 -5 -4 -5  8  4 -6    1  28   1   0   3   2   0   0   4   1   1  18   0   3   1   1   0  23  14   0  1.41 inf
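The table above was produced by extracting the mutation positions from the full ASCII PSSM, and that step can be scripted. This sketch assumes the row layout shown above: position, wild-type residue, 20 log-odds scores, 20 weighted percentages, information content and relative weight, with columns in the order ARNDCQEGHILKMFPSTWYV. `parse_pssm` and `mutation_score` are hypothetical helpers.

```python
# Minimal sketch: keep selected rows of a PSI-BLAST ASCII PSSM
# (blastpgp -Q / psiblast -out_ascii_pssm) and look up mutant scores.
from typing import Dict, Iterable, List, Tuple

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # column order of the ASCII PSSM

Row = Tuple[str, List[int], List[int]]  # (wild-type residue, scores, percentages)


def parse_pssm(lines: Iterable[str], positions: Iterable[int]) -> Dict[int, Row]:
    wanted = set(positions)
    rows: Dict[int, Row] = {}
    for line in lines:
        fields = line.split()
        # data rows: position number, one-letter residue, then >= 40 numbers
        if len(fields) < 42 or not fields[0].isdigit() or len(fields[1]) != 1:
            continue
        pos = int(fields[0])
        if pos in wanted:
            scores = [int(x) for x in fields[2:22]]
            percents = [int(x) for x in fields[22:42]]
            rows[pos] = (fields[1], scores, percents)
    return rows


def mutation_score(row: Row, mutant: str) -> int:
    """Log-odds score of the mutant residue at this position."""
    return row[1][ALPHABET.index(mutant)]
```

For the row at position 330, this gives a score of 5 for the wild-type Arg and -3 for the mutant Met (Arg330Met).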


To build the MSA, we first searched for homologous mammalian sequences with the NCBI BLASTP web tool against UniProtKB/Swiss-Prot, restricting the organisms to Mammalia. The E-value cutoff was set to 0.1 and PSI-BLAST was selected as the algorithm. The maximum number of target sequences was set to 100 because we only wanted close homologs. All other parameters were left at their defaults. We performed two iterations and then downloaded all matched sequences in FASTA format. These sequences were used as input for ClustalW, which generated the MSA with default parameters. Jalview was then used to save the alignment in FASTA format. Finally, the command-line version of BLAST 2.2.25+ was used to generate a PSSM for the HFE protein (Q30201) from the MSA:

psiblast -subject Q30201.fasta -in_msa alignment.fa -out_ascii_pssm pssm.txt
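To inspect the alignment directly at a mutation site, you first need to map the query's ungapped position to an alignment column. The following is a minimal sketch that assumes the Jalview FASTA export uses '-' or '.' for gaps. The function names and the query identifier passed in are hypothetical; the actual FASTA header of Q30201 depends on how the sequences were downloaded.

```python
# Minimal sketch: map a 1-based ungapped query position to its alignment
# column and count the residues observed in that column.
from collections import Counter
from typing import Dict, List

GAPS = "-."


def read_fasta(lines: List[str]) -> Dict[str, str]:
    """Read aligned FASTA records; the ID is the first word of the header."""
    seqs: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def column_for_position(aligned_query: str, position: int) -> int:
    """Return the 0-based alignment column of the query residue `position`."""
    count = 0
    for col, ch in enumerate(aligned_query):
        if ch not in GAPS:
            count += 1
            if count == position:
                return col
    raise ValueError(f"position {position} beyond query length {count}")


def column_composition(seqs: Dict[str, str], query_id: str, position: int) -> Counter:
    """Residue counts (gaps excluded) in the column of the query position."""
    col = column_for_position(seqs[query_id], position)
    return Counter(s[col] for s in seqs.values() if s[col] not in GAPS)
```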

The resulting PSSM is the following:

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  53 V    0 -2  1 -2 -3  1  0  0  2  0  0  0 -1  1 -3  0  0  4  1  0    7   0   7   0   0   7   7   7   7   7   7   7   0   7   0   7   7   7   7   7  0.11 inf
  63 H    0  1  0 -1  0  1  0  1  0 -2  0  1 -2 -1  0  1  0 -3 -2 -1    6   7   5   3   3   6   5  10   3   0  10  14   0   3   3  15   4   0   0   4  0.09 inf
  67 R   -2  3 -1 -1 -4  1 -1 -4  1  1 -1  0  1 -3 -1  0  0  4  0  0    2  20   3   3   0   6   2   0   4   9   6   5   4   0   3   7   8   8   4   6  0.26 inf
  97 M   -1 -1 -1 -1  1 -1 -1 -1 -3  1  0 -1  2  0 -1  0  1  2  1  2    4   3   3   4   3   3   3   3   0   7  10   3   7   3   3   5  12   3   5  16  0.10 inf
 130 N   -1  2  2  2 -4  0  1  2  1 -3 -1  0  0 -3  0 -1 -1  0 -3 -2    3  12  12  13   0   2   9  17   4   0   7   5   2   0   5   2   3   2   0   2  0.23 inf
 168 E   -1  0  0  1  2  1  0 -2 -3  1  2  2 -1  1 -3 -2 -1 -3 -2 -2    3   3   5   7   5   5   4   2   0   9  25  19   0   8   0   0   3   0   0   0  0.18 inf
 183 L   -1 -4 -1 -4  3 -1 -4  0 -4  1  4 -4  2 -3  0  0 -3 -4 -4 -1    4   0   3   0   6   3   0   9   0   7  47   0   6   0   4   9   0   0   0   2  0.50 inf
 217 T    1  0  0  0 -2  0  0  0 -2  0  0 -1  1  0  0  0  0 -3 -2  0   15   7   4   5   0   5   5   9   0   6  12   0   6   5   5   4   5   0   0   7  0.04 inf
 282 C    0 -4  1 -4  8  1 -4 -4 -4 -3 -1 -4 -3 -3 -4  1 -3 -4  1  1    7   0   7   0  46   8   0   0   0   0   7   0   0   0   0  10   0   0   6  10  1.16 inf
 330 R    0 -1 -2 -3  2 -1 -3  3 -1 -1  0 -1  0 -1  1 -1  0  4  0  0    6   4   1   0   6   2   0  25   1   2   8   5   3   2   6   2   6   7   3   9  0.22 inf


          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  53 V    0 -3 -3 -4 -1 -3 -3 -4 -3  2  1 -3  1 -1 -3 -2  0 -3 -1  5    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100  0.47 inf
  63 H    0 -1  0 -1 -2  0  0 -1  5 -3 -3 -1 -2 -2 -1  4  1 -3 -1 -2    0   0   0   0   0   0   0   0  26   0   0   0   0   0   0  74   0   0   0   0  0.47 inf
  67 R   -2  6 -1 -2 -4  1  0 -3 -1 -3 -3  2 -2 -3 -2 -1 -1 -3 -2 -3    0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  0.91 inf
  97 M    0 -2 -1 -2 -1 -1 -1 -2 -2  0  0 -1  3 -2 -2  1  4  1 -2  0    1   0   0   0   0   0   0   0   0   4   0   0  23   0   0   7  62   2   0   0  0.37 inf
 130 N   -2  6  3 -1 -4  0 -1 -2  0 -4 -3  1 -2 -3 -1 -1 -1 -3 -2 -3    0  75  23   0   0   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0  0.73 inf
 168 E   -1  1 -1  0 -4  1  3 -2 -1 -3 -3  5 -2 -4 -1 -1 -1 -3 -2 -3    0   0   0   0   0   0  23   0   0   0   0  77   0   0   0   0   0   0   0   0  0.62 inf
 183 L   -2 -2 -4 -4 -1 -2 -3 -4 -3  1  4 -3  2  0 -3 -3 -1 -2 -1  1    0   0   0   0   0   0   0   0   0   0  96   0   4   0   0   0   0   0   0   0  0.51 inf
 217 T    1  2 -2 -2 -1 -1 -1 -2 -2  2  0  0  0 -2 -2  0  1 -3 -2  2   15  25   0   0   0   0   0   0   0  12   0   0   0   0   0   0  12   0   0  35  0.15 inf
 282 C   -1 -4 -3 -4  9 -3 -4 -3 -3 -2 -2 -3 -2 -3 -3 -1 -1 -3 -3 -1    0   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  1.41 inf
 330 R   -1  3 -2 -3 -2  0 -2 -3 -2  1  1  0  5 -1 -3 -2 -1 -2 -2  1    0  30   0   0   0   0   0   0   0  14   6   0  47   0   0   0   0   0   0   3  0.31 inf