Sequence Alignments Hemochromatosis

From Bioinformatikpedia

Revision as of 13:19, 7 May 2012

Henry Frankenstein: Look! It's moving. It's alive. It's alive... It's alive, it's moving, it's alive, it's alive, it's alive, it's alive, IT'S ALIVE!

Victor Moritz: Henry - In the name of God!

Henry Frankenstein: Oh, in the name of God! Now I know what it feels like to be God!

Task Description

Protocol

Sorry for the inconvenience if something is not readable yet; we're rerunning some data...

Protocol

Reference Sequence

Sequence from Uniprot: Q30201

>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

Sequence Searches


Blast

<figtable id="blastdist">

Hemo eval blast 80.png
Hemo ident blast 80.png
Table 1: E-Value and identity distributions of the BLAST search against Big80.

</figtable>

The first BLAST search against the Big80 database reached the hit limit of 250 sequences, and even the worst of these hits still had an e-Value of around 1e-30. We therefore repeated the search with a limit of 1500 reported hits. These hits were then filtered for unique IDs and an e-Value cutoff of 2e-3, which left 1159 hits. Only 6 of these were PDB IDs.
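The unique-ID and e-Value filtering described above can be sketched as follows. This is a minimal illustration, not the script actually used here: it assumes the hits were exported in BLAST's 12-column tabular format (query, subject, % identity, ..., e-value, bit score), and the function name `filter_hits` is hypothetical. For each subject ID it keeps the hit with the lowest e-Value and drops everything above 2e-3.

```python
import csv
import io

EVALUE_CUTOFF = 2e-3  # same cutoff as in the text


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {subject_id: (identity, evalue)} with one entry per subject.

    Assumes 12-column BLAST tabular output: column 2 is the subject ID,
    column 3 the percent identity and column 11 the e-value.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, identity, evalue = row[1], float(row[2]), float(row[10])
        if evalue > cutoff:
            continue  # not significant enough
        # Keep only the best (lowest e-value) hit per unique subject ID.
        if subject not in best or evalue < best[subject][1]:
            best[subject] = (identity, evalue)
    return best
```

Counting `len(filter_hits(text))` on the full output would then give the number of remaining hits.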

The distributions of the e-Values and identities of these hits are shown in <xr id="blastdist"/>. Most e-Values lie between 1e-50 and the cutoff of 2e-3; only a few hits have a better (lower) e-Value. The identities are concentrated between 20% and 40%, with two peaks at around 27% and 34%.
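Distributions like these are easiest to read when the e-Values are binned on a logarithmic scale and the identities in fixed-width bins. A minimal sketch; the bin widths (one bin per 10 orders of magnitude, 5% identity bins) and the function names are arbitrary choices for illustration, not the settings used for the plots above.

```python
import math
from collections import Counter


def evalue_decade_bins(evalues, width=10):
    """Count e-values per bin of log10(e-value), e.g. key -50 = [1e-50, 1e-40).

    BLAST reports extremely significant hits as 0.0; log10(0) is undefined,
    so those are counted under the key None.
    """
    counts = Counter()
    for ev in evalues:
        if ev == 0:
            counts[None] += 1
        else:
            counts[math.floor(math.log10(ev) / width) * width] += 1
    return counts


def identity_bins(identities, width=5):
    """Count percent identities per bin, e.g. key 25 = [25%, 30%)."""
    return Counter(int(ident // width) * width for ident in identities)
```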


PSI-Blast

<figtable id="psiblastdist">

Hemo eval psi 80.png
Hemo ident psi 80.png
Table 2: E-Value and identity distributions of the PSI-BLAST search against Big80.

</figtable>

After the BLAST search, we also performed several searches with PSI-BLAST. This time we increased the number of reported hits and tried several parameter combinations to test their impact on the search results. We varied two parameters, 'h' and 'j'. 'h' sets the e-Value cutoff for including sequences in the PSI-BLAST profile; 'j' sets the number of PSI-BLAST iterations. For each parameter we used two values: 2e-3 and 1e-10 for 'h', and 2 and 10 for 'j'. This resulted in a total of 4 combinations.
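The 2 × 2 parameter grid can be enumerated as below. This sketch assumes the legacy `blastpgp` command-line interface, in which `-h` and `-j` are the flags named above, `-i`/`-d` give the query and database, and `-b`/`-v` limit the number of reported alignments and descriptions. The query and database names are hypothetical placeholders, and the code only builds the argument lists without running anything.

```python
from itertools import product

H_VALUES = ["2e-3", "1e-10"]  # 'h': e-value cutoff for inclusion in the profile
J_VALUES = [2, 10]            # 'j': number of iterations


def psiblast_grid(query, database, max_hits):
    """Return [((h, j), argv), ...] for every combination of h and j."""
    runs = []
    for h, j in product(H_VALUES, J_VALUES):
        argv = ["blastpgp", "-i", query, "-d", database,
                "-h", h, "-j", str(j),
                "-b", str(max_hits), "-v", str(max_hits)]
        runs.append(((h, j), argv))
    return runs
```

Each `argv` could then be handed to `subprocess.run`; the order of the combinations matches the order of the lists below.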

The first 4 searches were run against Big80 with a maximum of 10000 reported hits. We also saved the PSI-BLAST profiles for later use (see below). The hits of each parameter combination were again filtered for unique IDs and an e-Value cutoff of 2e-3. This resulted in the following numbers of hits:

  • h=2e-3, j=2: 1892 (2786 prefiltered)
  • h=2e-3, j=10: 1704 (2734 prefiltered)
  • h=1e-10, j=2: 2058 (3574 prefiltered)
  • h=1e-10, j=10: 2035 (3458 prefiltered)
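As a quick check of how much the filtering removed, the share of hits that survived it can be computed from the numbers above. A minimal sketch; the dictionary simply restates the counts from the list, and the function name is hypothetical.

```python
# (hits after filtering, hits before filtering) for the Big80 runs
BIG80_COUNTS = {
    ("2e-3", 2): (1892, 2786),
    ("2e-3", 10): (1704, 2734),
    ("1e-10", 2): (2058, 3574),
    ("1e-10", 10): (2035, 3458),
}


def retained_percent(counts):
    """Percentage of hits kept by the unique-ID and e-value filter, per run."""
    return {params: round(100 * kept / total, 1)
            for params, (kept, total) in counts.items()}


for params, pct in retained_percent(BIG80_COUNTS).items():
    print(params, pct)
```

With these numbers, between 57.6% (h=1e-10, j=2) and 67.9% (h=2e-3, j=2) of the reported hits remain after filtering.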

<xr id="psiblastdist"/> shows the e-Value and identity distributions for the Big80 results. The differences between the parameter combinations are easy to spot. The stricter inclusion cutoff (h=1e-10) produces more significant hits (lower e-Values). This might be because fewer sequences are included in the profile, which makes it more specific for closely related sequences (i.e. sequences with low e-Values). More iterations, on the other hand, reduce the number of significant hits.
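The explanation above rests on how 'h' selects profile members: only hits with an e-Value below the cutoff contribute to the profile used in the next iteration. The toy example below, with hypothetical hit IDs and made-up e-Values, shows that the stricter cutoff admits fewer sequences into the profile.

```python
def profile_members(hits, h):
    """Return the IDs of hits with an e-value below the inclusion cutoff h.

    `hits` is a list of (sequence_id, evalue) pairs from one iteration.
    """
    return sorted(seq_id for seq_id, evalue in hits if evalue < h)


# Hypothetical hits from one PSI-BLAST iteration.
example_hits = [("A", 1e-40), ("B", 1e-12), ("C", 1e-5), ("D", 1e-2)]
print(profile_members(example_hits, 2e-3))   # ['A', 'B', 'C']
print(profile_members(example_hits, 1e-10))  # ['A', 'B']
```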

After the searches against Big80, we also ran PSI-BLAST against the Big database. We reused the profiles from the Big80 runs and increased the maximum number of reported hits to 100000. This resulted in the following numbers of hits:

  • h=2e-3, j=2: 23840 (25934 prefiltered)
  • h=2e-3, j=10: 25756 (30616 prefiltered)
  • h=1e-10, j=2: 26483 (28766 prefiltered)
  • h=1e-10, j=10: 27535 (29609 prefiltered)
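Reusing a profile across databases can be sketched with blastpgp's checkpoint options: `-C` writes the profile of the Big80 run to a file and `-R` restarts a search from that file. This is a sketch under assumptions: that blastpgp was used (as the 'h'/'j' flag names suggest), that the Big runs kept the same 'h' and 'j', and that the database and checkpoint file names are hypothetical placeholders.

```python
def big80_and_big_runs(query, h, j, big80_max=10000, big_max=100000):
    """Return (big80_argv, big_argv) sharing one checkpoint (profile) file."""
    checkpoint = f"profile_h{h}_j{j}.chk"  # hypothetical file name
    common = ["blastpgp", "-i", query, "-h", h, "-j", str(j)]
    # Big80 run: save the final profile with -C.
    big80 = common + ["-d", "big80", "-b", str(big80_max),
                      "-v", str(big80_max), "-C", checkpoint]
    # Big run: start from the saved profile with -R.
    big = common + ["-d", "big", "-b", str(big_max),
                    "-v", str(big_max), "-R", checkpoint]
    return big80, big
```

The Big80 command has to finish before the Big command is started, since the latter reads the checkpoint file the former writes.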

The two combinations with 10 iterations produced multiple error messages, but the runs finished nevertheless. These errors came from an internal failure of PSI-BLAST, caused by the very large number of possible hits.

The runtimes of the different PSI-BLAST runs (see <xr id="psiblastruntime"/>) show that the profile inclusion cutoff ('h') has little effect on the runtime in most cases. The number of iterations, on the other hand, has a big impact, and so, of course, does the size of the database. The exceptionally high runtime of the h=2e-3, 10-iteration run against Big (367 minutes, compared with 64 minutes for h=1e-10) might also be caused by the errors mentioned above.

<figtable id="psiblastruntime">

Iterations (j)   2        2        10        10
E-Value (h)      0.002    1e-10    0.002     1e-10
Big80            3m21     3m6      16m39     16m41
Big              28m17    26m43    367m15    64m4

Table 3: Runtime analysis of PSI-BLAST (runtimes given as minutes and seconds, e.g. 3m21 = 3 min 21 s). </figtable>
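To put numbers on the effect of the iteration count, the runtimes of Table 3 can be converted to seconds and compared. A minimal sketch, assuming that an entry such as 16m39 means 16 minutes and 39 seconds; the values are copied from the table and the function names are hypothetical.

```python
import re

# Runtimes from Table 3, keyed by database and (h, j).
RUNTIMES = {
    "Big80": {("2e-3", 2): "3m21", ("1e-10", 2): "3m6",
              ("2e-3", 10): "16m39", ("1e-10", 10): "16m41"},
    "Big": {("2e-3", 2): "28m17", ("1e-10", 2): "26m43",
            ("2e-3", 10): "367m15", ("1e-10", 10): "64m4"},
}


def to_seconds(runtime):
    """Convert a runtime such as '16m39' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m(\d+)", runtime).groups()
    return int(minutes) * 60 + int(seconds)


def iteration_slowdown(db):
    """Runtime ratio of 10 vs. 2 iterations for each cutoff h."""
    runs = RUNTIMES[db]
    return {h: round(to_seconds(runs[(h, 10)]) / to_seconds(runs[(h, 2)]), 1)
            for h in ("2e-3", "1e-10")}


for db in RUNTIMES:
    print(db, iteration_slowdown(db))
```

This gives a factor of about 5 for Big80 with both cutoffs, about 2.4 for Big with h=1e-10, and about 13 for Big with h=2e-3, which is the outlier discussed above.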


HHblits

<figtable id="hhblitsdist">

Hemo eval hhblits 80.png
Hemo ident hhblits 80.png
E-Value and identity distributions of the HHblits search against Uniprot20.

</figtable>


Comparison

<figtable id="psiblastoverlap">

Hemo Overlap PSI.png
Hemo Overlap PSI BIG.png
Hemo Overlap PSI BLAST.png
Nothing to see here....

</figtable>

<figtable id="alloverlap">

Hemo Overlap All 80.png
Hemo Overlap All BIG.png
Hemo Overlap PSI HHB BIG.png
Nothing to see here....

</figtable>


Multiple Sequence Alignments


Dataset

<figtable id="msagroup60">

Uniprot AC (Group 60-99) Identity Comment
G3QU39 99.14
H2PI54 97.41
Q6B0J5 95.98
G7P2L8 94.54
F7GRH8 90.52
F7DKE9 86.18
Q9GL42 79.89
F6RUG7 78.16
G3THV5 75.21
G5BQE5 67.66

</figtable>

<figtable id="msagroup40">

Uniprot AC (Group 00-40) Identity Comment
P16391 36.03
P05534 35.13
Q30597 33.72
P01900 32.11
Q31093 31.74
P14432 29.14
Q860W6 27.74
Q31615 26.38
Q31206 25.87
P01921 21.10

</figtable>

<figtable id="msagroup00">

Uniprot AC (Group 00-99) Identity Comment
Q6B0J5 95.98
F7GRH8 90.52
F7DKE9 86.18
Q9GL42 79.89
G5BQE5 67.66
G1PHG2 57.43
F7C3B3 40.11
Q30597 33.72
Q860W6 27.74
P01921 21.10

</figtable>


ClustalW


Muscle


T-Coffee


3D-Coffee