Difference between revisions of "Task6 Hemochromatosis Protocol"

Latest revision as of 17:17, 18 June 2012

Amino acid features

The values were taken from Wiki: Amino Acids and Wiki: Proteinogenic Amino Acids.

BLOSUM62/PAM1/PAM250

BLOSUM62: NCBI: BLAST (last accessed 17th Jun 2012)

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4 
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4 
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4 
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4 
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4 
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4 
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4 
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4 
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4 
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4 
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4 
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4 
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4 
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4 
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4 
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4 
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4 
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4 
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4 
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4 
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4 
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4 
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
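The BLOSUM62 table is plain whitespace-separated text: a header row of residue codes, then one row per residue. It can therefore be loaded into a lookup table directly. The following Perl sketch assumes the table has been pasted verbatim into a string. It uses only a four-residue excerpt of the values above. The sub names parse_matrix and score are hypothetical helpers, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a substitution matrix in the NCBI text layout: one header line of
# column labels, then one line per row label followed by its scores.
# Returns a hash of hashes: $m->{ROW}{COL} = score.
sub parse_matrix {
    my ($text) = @_;
    my (@cols, %m);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/ || $line =~ /^\s*#/;   # blank or comment line
        my @f = split ' ', $line;                        # split on whitespace
        if (!@cols) {                                    # first line: column labels
            @cols = @f;
            next;
        }
        my $row = shift @f;
        die "Row $row has the wrong number of columns\n" unless @f == @cols;
        @{ $m{$row} }{@cols} = @f;                       # column label -> score
    }
    return \%m;
}

# Score for replacing residue $from by residue $to (case-insensitive).
sub score {
    my ($m, $from, $to) = @_;
    return $m->{ uc $from }{ uc $to };
}

# Excerpt of the BLOSUM62 table above (rows/columns C, H, W, Y only).
my $excerpt = <<'END';
   C  H  W  Y
C  9 -3 -2 -2
H -3  8 -2  2
W -2 -2 11  2
Y -2  2  2  7
END

my $blosum = parse_matrix($excerpt);
printf "C -> Y: %d\n", score($blosum, 'C', 'Y');   # prints "C -> Y: -2"
```

The same sub should also accept the full 24 x 24 table, including the B, Z, X and * rows, because it only relies on the header row and whitespace separation.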

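PAM1: http://www.icp.ucl.ac.be (last accessed 17th Jun 2012). Entries are mutation probabilities multiplied by 10,000: the value in row i, column j is the probability that the amino acid of column j is replaced by the amino acid of row i, so each column sums to 10,000 (up to rounding).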

        Ala  Arg  Asn  Asp  Cys  Gln  Glu  Gly  His  Ile  Leu  Lys  Met  Phe  Pro  Ser  Thr  Trp  Tyr  Val
          A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
Ala A  9867    2    9   10    3    8   17   21    2    6    4    2    6    2   22   35   32    0    2   18
Arg R     1 9913    1    0    1   10    0    0   10    3    1   19    4    1    4    6    1    8    0    1
Asn N     4    1 9822   36    0    4    6    6   21    3    1   13    0    1    2   20    9    1    4    1
Asp D     6    0   42 9859    0    6   53    6    4    1    0    3    0    0    1    5    3    0    0    1
Cys C     1    1    0    0 9973    0    0    0    1    1    0    0    0    0    1    5    1    0    3    2
Gln Q     3    9    4    5    0 9876   27    1   23    1    3    6    4    0    6    2    2    0    0    1
Glu E    10    0    7   56    0   35 9865    4    2    3    1    4    1    0    3    4    2    0    1    2
Gly G    21    1   12   11    1    3    7 9935    1    0    1    2    1    1    3   21    3    0    0    5
His H     1    8   18    3    1   20    1    0 9912    0    1    1    0    2    3    1    1    1    4    1
Ile I     2    2    3    1    2    1    2    0    0 9872    9    2   12    7    0    1    7    0    1   33
Leu L     3    1    3    0    0    6    1    1    4   22 9947    2   45   13    3    1    3    4    2   15
Lys K     2   37   25    6    0   12    7    2    2    4    1 9926   20    0    3    8   11    0    1    1
Met M     1    1    0    0    0    2    0    0    0    5    8    4 9874    1    0    1    2    0    0    4
Phe F     1    1    1    0    0    0    0    1    2    8    6    0    4 9946    0    2    1    3   28    0
Pro P    13    5    2    1    1    8    3    2    5    1    2    2    1    1 9926   12    4    0    0    2
Ser S    28   11   34    7   11    4    6   16    2    2    1    7    4    3   17 9840   38    5    2    2
Thr T    22    2   13    4    1    3    2    2    1   11    2    8    6    1    5   32 9871    0    2    9
Trp W     0    2    0    0    0    0    0    0    0    0    0    0    0    1    0    1    0 9976    1    0
Tyr Y     1    0    3    0    3    0    1    0    4    1    1    0    0   21    0    1    1    2 9945    1
Val V    13    2    1    1    3    2    2    3    3   57   11    1   17    1    3    2   10    0    2 9901
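Each column of the PAM1 table should sum to 10,000 up to rounding, because it lists mutation probabilities multiplied by 10,000. Summing the columns is therefore a quick check that a matrix was transcribed correctly. Dividing by 10,000 gives the probabilities themselves. The Perl sketch below shows both steps on a hypothetical 3 x 3 toy matrix, not on real PAM data. The sub names column_sums and to_probabilities are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum each column of a matrix given as a list of row array refs.
sub column_sums {
    my @rows = @_;
    my @sums = (0) x scalar @{ $rows[0] };
    for my $row (@rows) {
        $sums[$_] += $row->[$_] for 0 .. $#$row;
    }
    return @sums;
}

# Divide every entry by $scale (10_000 for PAM1) to obtain probabilities.
sub to_probabilities {
    my ($scale, @rows) = @_;
    return map { [ map { $_ / $scale } @$_ ] } @rows;
}

# Hypothetical 3x3 toy matrix in the PAM1 layout (not real data):
# columns = original residue, rows = replacement, values x 10,000.
my @toy = (
    [ 9990,    4,    1 ],
    [    6, 9980,    9 ],
    [    4,   16, 9990 ],
);

print "column sums: @{[ column_sums(@toy) ]}\n";          # 10000 10000 10000
my @p = to_probabilities(10_000, @toy);
printf "P(residue 1 -> residue 2) = %.4f\n", $p[1][0];   # 0.0006
```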

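PAM250: http://www.icp.ucl.ac.be (last accessed 17th Jun 2012). Entries are mutation probabilities in percent (row: replacement amino acid, column: original amino acid).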

        Ala  Arg  Asn  Asp  Cys  Gln  Glu  Gly  His  Ile  Leu  Lys  Met  Phe  Pro  Ser  Thr  Trp  Tyr  Val
          A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
Ala A    13    6    9    9    5    8    9   12    6    8    6    7    7    4   11   11   11    2    4    9
Arg R     3   17    4    3    2    5    3    2    6    3    2    9    4    1    4    4    3    7    2    2
Asn N     4    4    6    7    2    5    6    4    6    3    2    5    3    2    4    5    4    2    3    3
Asp D     5    4    8   11    1    7   10    5    6    3    2    5    3    1    4    5    5    1    2    3
Cys C     2    1    1    1   52    1    1    2    2    2    1    1    1    1    2    3    2    1    4    2
Gln Q     3    5    5    6    1   10    7    3    7    2    3    5    3    1    4    3    3    1    2    3
Glu E     5    4    7   11    1    9   12    5    6    3    2    5    3    1    4    5    5    1    2    3
Gly G    12    5   10   10    4    7    9   27    5    5    4    6    5    3    8   11    9    2    3    7
His H     2    5    5    4    2    7    4    2   15    2    2    3    2    2    3    3    2    2    3    2
Ile I     3    2    2    2    2    2    2    2    2   10    6    2    6    5    2    3    4    1    3    9
Leu L     6    4    4    3    2    6    4    3    5   15   34    4   20   13    5    4    6    6    7   13
Lys K     6   18   10    8    2   10    8    5    8    5    4   24    9    2    6    8    8    4    3    5
Met M     1    1    1    1    0    1    1    1    1    2    3    2    6    2    1    1    1    1    1    2
Phe F     2    1    2    1    1    1    1    1    3    5    6    1    4   32    1    2    2    4   20    3
Pro P     7    5    5    4    3    5    4    5    5    3    3    4    3    2   20    6    5    1    2    4
Ser S     9    6    8    7    7    6    7    9    6    5    4    7    5    3    9   10    9    4    4    6
Thr T     8    5    6    6    4    5    5    6    4    6    4    6    5    3    6    8   11    2    3    6
Trp W     0    2    0    0    0    0    0    0    1    0    1    0    0    1    0    1    0   55    1    0
Tyr Y     1    1    2    1    3    1    1    1    3    2    2    1    2   15    1    2    2    3   31    2
Val V     7    4    4    4    4    4    4    4    5    4   15   10    4   10    5    5    5   72    4   17
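In Dayhoff's model, the PAM250 mutation probability matrix is the 250th power of the PAM1 probability matrix, that is, PAM1 multiplied by itself 250 times. The table above lists that result in percent. The Perl sketch below computes such a power by repeated squaring. It is demonstrated on a hypothetical two-residue matrix rather than the real 20 x 20 data. The sub names mat_mult and mat_pow are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multiply two square matrices given as array refs of row array refs.
sub mat_mult {
    my ($x, $y) = @_;
    my $n = scalar @$x;
    my @z;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            my $s = 0;
            $s += $x->[$i][$_] * $y->[$_][$j] for 0 .. $n - 1;
            $z[$i][$j] = $s;
        }
    }
    return \@z;
}

# Raise a square matrix to the power $k (k >= 1) by repeated squaring.
sub mat_pow {
    my ($m, $k) = @_;
    my $result;
    while ($k > 0) {
        if ($k & 1) {
            $result = defined($result) ? mat_mult($result, $m) : $m;
        }
        $m = mat_mult($m, $m);
        $k >>= 1;
    }
    return $result;
}

# Hypothetical two-residue "PAM1" as probabilities (each column sums to 1).
my $pam1 = [
    [ 0.99, 0.02 ],
    [ 0.01, 0.98 ],
];

my $pam250 = mat_pow($pam1, 250);

# Print in percent, like the PAM250 table above.
for my $row (@$pam250) {
    print join(' ', map { sprintf('%6.1f', 100 * $_) } @$row), "\n";
}
```

Applied to the full PAM1 table divided by 10,000, this gives only an approximation of the published PAM250 values, because the PAM1 entries above are rounded integers.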

Secondary structure

Sequence and PyMOL editing were done by hand. The SWISS-MODEL models were created with the SWISS-MODEL web server (template: 1a6zC).

PSSM-Matrix and homolog search

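The PSSM and the homolog search were computed with PSI-BLAST (blastpgp), running five iterations (-j 5) against the local big database (-d). Option -Q writes the final PSSM to outMatrix.txt and -o writes the search results to output.txt: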

blastpgp -i Q30201.fasta -j 5 -Q outMatrix.txt -o output.txt -d /mnt/project/pracstrucfunc12/data/big/big
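With -Q, blastpgp writes the last PSSM as ASCII text. The layout is assumed here to be one line per query position: the position number, the query residue, and 20 log-odds scores in the column order A R N D C Q E G H I L K M F P S T W Y V, followed by further columns (weighted observed percentages and per-position summary values). The Perl sketch below extracts only the 20 scores per position under that assumption. The sub name parse_pssm and the sample lines are hypothetical, and the values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of the 20 log-odds scores in the ASCII PSSM (assumed layout).
my @AA = qw(A R N D C Q E G H I L K M F P S T W Y V);

# Parse the ASCII PSSM written by "blastpgp -Q".
# Returns one hash ref per query position: { pos, res, score => { A => .., ... } }.
sub parse_pssm {
    my ($text) = @_;
    my @positions;
    for my $line (split /\n/, $text) {
        # Data lines start with the position number and the query residue;
        # header and footer lines do not, so they are skipped.
        next unless $line =~ /^\s*(\d+)\s+([A-Z])\s+(.*)$/;
        my ($pos, $res, $rest) = ($1, $2, $3);
        my @fields = split ' ', $rest;
        next unless @fields >= 20;
        my %score;
        @score{@AA} = @fields[0 .. 19];    # first 20 numbers: log-odds scores
        push @positions, { pos => $pos, res => $res, score => \%score };
    }
    return @positions;
}

# Hypothetical, shortened two-position excerpt (made-up values; the
# percentage columns are omitted and only two trailing numbers kept).
my $sample = join "\n",
    '           A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V',
    '    1 M   -1 -2 -2 -3 -2 -1 -2 -3 -2  1  2 -2  6  0 -3 -2 -1 -2 -1  1  0.91 0.05',
    '    2 C    0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1  1.02 0.07';

# Print each position with the score of its own (query) residue.
for my $p (parse_pssm($sample)) {
    printf "%d %s: %d\n", $p->{pos}, $p->{res}, $p->{score}{ $p->{res} };
}
```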

To retrieve all homologous proteins, we retained only the results of the fifth iteration and used the following script to extract them:

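#!/usr/bin/perl
# Print the ID between the first two '|' of every hit line.
# Input: the PSI-BLAST output of the fifth iteration.
use strict;
use warnings;
use autodie;

my $file = shift or die "Usage: $0 <blast_output>\n";

open(my $fh, '<', $file);
while (my $line = <$fh>) {
    chomp $line;
    # Hit lines look like "sp|ID|NAME description ..."; keep those that
    # also contain at least three consecutive digits after the ID.
    if ($line =~ m/^..\|([^|]+)\|.*\d\d\d/) {
        print "$1\n";
    }
}
close($fh);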

Because we only wanted mammalian sequences, we retrieved the UniProt entries of these proteins in flat-file format.

To keep only mammalian proteins, we ran the following script on the flat file:

#!/usr/bin/perl
# Print all accession numbers of UniProt flat-file entries whose
# taxonomic lineage (OC lines) contains "Mammalia".
use strict;
use warnings;
use autodie;

my $file = shift or die "Usage: $0 <uniprot_flat_file>\n";

my @currentIDs;
open(my $fh, '<', $file);
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^AC\s+(.*)/) {
        # "AC   ACC1; ACC2;" -> collect all accessions of this entry
        (my $acc = $1) =~ s/\s//g;
        push @currentIDs, grep { $_ ne '' } split(/;/, $acc);
    } elsif ($line =~ m/^SQ\s/) {
        # SQ follows the OC lines: reset the ID list for the next entry
        @currentIDs = ();
    } elsif ($line =~ m/^OC.*Mammalia/) {
        print "$_\n" for @currentIDs;
    }
}
close($fh);

In this way we obtained the IDs of all mammalian homologs of our protein and retrieved their sequences from UniProt.

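Because some proteins have several accession numbers in the flat file, we kept only the IDs that PSI-BLAST had found by using the

uniq -d

command, which prints only the lines that occur more than once in sorted input. Applied to the sorted combination of both ID lists, it returns the IDs present in both.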
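The uniq -d step acts as an intersection of two ID lists. The same step can be sketched in Perl with a hash, which needs no sorting. Unlike sort | uniq -d, it does not report an ID that is merely repeated within one list. The sub name intersect_ids and the accessions other than Q30201 are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the IDs of @$first that also occur in @$second,
# in the order of @$first and without duplicates.
sub intersect_ids {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;
    my %seen;
    return grep { $in_second{$_} && !$seen{$_}++ } @$first;
}

# Hypothetical lists: IDs found by PSI-BLAST, and all accessions
# printed for the mammalian UniProt entries (including secondary ones).
my @psiblast  = qw(Q30201 ACC_B ACC_C);
my @mammalian = qw(Q30201 ACC_X ACC_C ACC_Y);

print "$_\n" for intersect_ids(\@psiblast, \@mammalian);   # prints Q30201 and ACC_C
```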

After adding our own sequence, we computed the multiple sequence alignments (MSAs) with ClustalW and MUSCLE:

clustalw -align -infile=Mammals.fasta -outfile=MammalsClustalw.fasta
muscle -in Mammals.fasta -out MuscleMSAMammals.fasta
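MUSCLE writes its alignment in FASTA format by default: one gapped sequence per record, all of the same length. The Perl sketch below reads such an aligned FASTA string, checks that all rows have equal length, and computes the percent identity of two rows. Identity is defined here as identical columns divided by the columns where neither sequence has a gap, which is one of several common definitions. The sub names and the toy alignment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read aligned FASTA text into a list of [name, sequence] pairs.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];          # start a new record
        } elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;    # sequence may span several lines
            $records[-1][1] .= $chunk;
        }
    }
    return @records;
}

# Percent identity over columns where neither sequence has a gap ('-').
sub percent_identity {
    my ($s1, $s2) = @_;
    die "Aligned sequences differ in length\n" unless length($s1) == length($s2);
    my ($same, $compared) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next if $x eq '-' || $y eq '-';
        $compared++;
        $same++ if uc($x) eq uc($y);
    }
    return $compared ? 100 * $same / $compared : 0;
}

# Hypothetical toy alignment (not real data).
my $aln = <<'END';
>seq1
MKT-AYIAKQ
>seq2
MKTLAY-AKR
END

my @rec = read_fasta($aln);
for my $r (@rec) {
    die "Rows have different lengths\n" if length($r->[1]) != length($rec[0][1]);
}
printf "%s vs %s: %.1f%% identity\n", $rec[0][0], $rec[1][0],
    percent_identity($rec[0][1], $rec[1][1]);   # seq1 vs seq2: 87.5% identity
```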
