Difference between revisions of "Gaucher Disease: Task 08 - LabJournal"

From Bioinformatikpedia

Revision as of 11:24, 31 August 2013

Choosing Mutation set

The cDNA sequence used by HGMD has the accession number NM_001005741.2. The protein sequence (one-letter code) encoded by this cDNA is 100% identical to the UniProt reference sequence, which we used for all tasks. Therefore, the mutation positions listed by HGMD can be used directly. The accession number is also listed by dbSNP, so the correct mutation positions can be read off there as well.

To choose mutations from dbSNP, we selected four point mutations from the SNP GeneView report of glucocerebrosidase for NP_001005741.1, which was already used in task 7. For HGMD, we randomly picked six mutations from the SNP list for NP_001005741.1.

Mutation Analysis

The information about the amino acid properties was taken from Wikipedia.

The visualisation of the mutations was done with PyMOL. To mutate the residues, we followed the description on the PyMOLWiki. As before, we took into account the sequence shift of 39 residues between the UniProt and PDB sequences.
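The numbering shift can be handled with a small helper. The following is a minimal sketch, assuming that the 39-residue shift means PDB residue number = UniProt position - 39 (the PDB numbering omits the first 39 residues of the UniProt sequence). The function names and the default chain "A" are our own illustrative choices and are not part of PyMOL.

```python
# Assumption: PDB numbering = UniProt numbering - 39.
SHIFT = 39


def uniprot_to_pdb(pos: int) -> int:
    """Convert a UniProt position to the PDB residue number."""
    if pos <= SHIFT:
        raise ValueError(f"position {pos} lies within the first {SHIFT} residues")
    return pos - SHIFT


def pdb_to_uniprot(pos: int) -> int:
    """Convert a PDB residue number back to the UniProt position."""
    return pos + SHIFT


def pymol_selection(uniprot_pos: int, chain: str = "A") -> str:
    """Build a PyMOL selection string for a UniProt position.

    The chain identifier is a hypothetical default; adjust it to the structure used.
    """
    return f"chain {chain} and resi {uniprot_to_pdb(uniprot_pos)}"


if __name__ == "__main__":
    for position in (409, 483):
        print(position, "->", pymol_selection(position))
```

The resulting string can be pasted into a PyMOL `select` command before running the mutagenesis wizard.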

We took the information about the secondary structure from our previous task 3. We always chose the secondary structure type that was predicted by the majority of the prediction tools in that task.
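The majority decision can be written down as a short function. This is only a sketch: the state letters (H, E, C) and the rule of returning `None` on a tie are assumptions made for illustration, since the journal does not say how ties were handled.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_state(predictions: Sequence[str]) -> Optional[str]:
    """Return the most frequently predicted state for one residue.

    Returns None if there are no predictions or if the top two states tie
    (tie handling is an assumption, not taken from the journal).
    """
    ranked = Counter(predictions).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical predictions of three tools for a single residue.
    print(majority_state(["H", "H", "C"]))
```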

The scores for the mutations were looked up in two substitution matrices: BLOSUM62 and PAM250.

PSSM

We created different PSSMs. The first was generated with PSI-BLAST:

blastpgp -j 5 -h 10e-6 \
  -i /mnt/home/student/gerkej/gaucher/task8/P04062.fasta \
  -d /mnt/project/pracstrucfunc13/data/big/big_80 \
  -o /mnt/home/student/gerkej/gaucher/task8/psiBl/big80_it5.out \
  -Q /mnt/home/student/gerkej/gaucher/task8/psiBl/big80_it5.pssm \
  -C /mnt/home/student/gerkej/gaucher/task8/psiBl/big80_it5.chk

PSSM matrix

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  77 S    0  1  0  1 -1  1  1 -1  0 -1 -2  0 -1 -3 -3  1  1 -4 -2  0    7   9   4   8   1   7  11   3   2   4   4   6   1   1   0  11  11   0   1   8  0.14 inf
 141 N   -1  0  2  4 -2  2  2 -3 -1 -3 -3  1 -2 -5 -4  0  1  1 -2 -3    5   4  10  20   1  10  15   2   1   1   3   7   1   0   0   7   7   2   1   2  0.40 inf
 159 R   -6  8 -6 -7 -8 -4 -5 -7 -6 -8 -4  2 -6 -7 -7 -3 -6  1 -7 -7    0  81   0   0   0   0   0   0   0   0   2  11   0   0   0   2   0   2   0   0  2.56 inf
 213 L   -3 -5 -2 -6 -4 -5 -5 -2 -5  4  2 -5  0  3 -5 -4 -1  3  0  3    2   0   2   0   0   0   0   3   0  23  22   0   2  13   0   0   4   4   2  21  0.72 inf
 241 G   -1  2  0  2 -4  1 -1  1 -1 -2 -1  3 -2 -3 -1 -1 -1  0 -1 -2    5  11   5  11   0   7   3  11   2   2   6  17   1   1   4   3   4   2   3   3  0.19 inf
 349 V    0 -3 -2 -3  0 -4 -3  2 -1  0 -1 -4 -1  4 -5  0  0  5  2  2   10   1   1   1   2   0   1  14   2   5   4   0   2  17   0   6   6   9   7  14  0.42 inf
 408 T   -1  1  2  2  0  2  0  2  2 -2 -2  0 -1 -1 -1  0  0 -4  0 -2    3   7  10  11   2   9   5  19   5   2   3   4   2   2   2   5   4   0   3   3  0.19 inf
 409 N    1 -2  1  1  1 -2 -1  0  1 -1 -1 -1  0  1  0  1  0  2  1 -1   11   0  10   9   4   0   3   6   3   4   4   3   3   6   4   9   6   4   7   4  0.09 inf
 483 L   -2 -3 -2 -3 -4 -3 -3  0 -2  3  3 -2  0  0 -3 -3 -2 -4 -1  3    2   1   2   2   0   1   1   9   1  13  29   3   2   4   1   2   2   0   2  24  0.46 inf
 501 N   -6 -6  9 -3 -8 -4 -3 -6 -5 -6 -8 -6 -3 -8 -5 -2 -2 -8 -7 -6    1   0  87   1   0   0   2   0   0   0   0   0   1   0   1   3   3   0   0   1  2.83 inf
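To read scores out of rows like the ones above, a small parser is enough. The sketch below assumes that every row is whitespace-separated in the layout shown (position, wild-type residue, 20 log-odds scores, 20 weighted percentages, then two trailing values). The function name is our own, and the lookup of proline at position 483 is only an illustrative query.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM header


def parse_pssm_row(line: str):
    """Split one PSSM row into position, residue, scores and percentages.

    Assumes whitespace-separated fields; rows with fused columns
    (no space between two numbers) are not handled.
    """
    fields = line.split()
    position = int(fields[0])
    residue = fields[1]
    scores = dict(zip(AMINO_ACIDS, (int(x) for x in fields[2:22])))
    percents = dict(zip(AMINO_ACIDS, (int(x) for x in fields[22:42])))
    return position, residue, scores, percents


# Row for position 483, copied from the table above.
ROW_483 = (
    " 483 L   -2 -3 -2 -3 -4 -3 -3  0 -2  3  3 -2  0  0 -3 -3 -2 -4 -1  3"
    "    2   1   2   2   0   1   1   9   1  13  29   3   2   4   1   2   2"
    "   0   2  24  0.46 inf"
)

if __name__ == "__main__":
    pos, res, scores, percents = parse_pssm_row(ROW_483)
    print(pos, res, "wild type score:", scores[res], "score for P:", scores["P"])
```

A positive score means the residue is observed more often than expected at that position, a negative score means it is observed less often.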

The second PSSM is based on an alignment of all mammalian homologs. We identified the homologs on UniProt. The BLASTP search was run against the mammalian database. For P04062 we found 169 homologous sequences.
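How a PSSM can be derived from an alignment is sketched below. This is not the procedure PSI-BLAST uses and not necessarily the one used for this task: the uniform background frequency, the add-one pseudocount and the toy sequences are simplifying assumptions chosen to keep the example short.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def alignment_to_pssm(sequences, pseudocount=1.0):
    """Compute log2-odds scores per alignment column.

    Assumptions: all sequences have equal (aligned) length, gaps and
    unknown characters are ignored, background frequencies are uniform.
    """
    if not sequences:
        raise ValueError("alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in sequences if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


if __name__ == "__main__":
    # Hypothetical three-sequence alignment; "-" marks a gap.
    toy_alignment = ["ACD", "ACE", "AC-"]
    for index, column in enumerate(alignment_to_pssm(toy_alignment), start=1):
        best = max(column, key=column.get)
        print(index, best, round(column[best], 2))
```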