Difference between revisions of "Sequence Alignments Hemochromatosis"

From Bioinformatikpedia

Revision as of 11:07, 7 May 2012

Henry Frankenstein: Look! It's moving. It's alive. It's alive... It's alive, it's moving, it's alive, it's alive, it's alive, it's alive, IT'S ALIVE!

Victor Moritz: Henry - In the name of God!

Henry Frankenstein: Oh, in the name of God! Now I know what it feels like to be God!

Task Description

Protocol

Sorry for the inconvenience (some parts may not be readable yet); we're rerunning some data...

Protocol

Reference Sequence

Sequence from Uniprot: Q30201

>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE

Sequence Searches

Blast

The first BLAST search against the Big80 database reached the hit limit of 250 sequences, with an e-Value of e-30 for the worst hit. Since even the worst reported hit was still highly significant, the limit had clearly cut off further hits, so we repeated the search with a new limit of 1500 reported hits. These hits were then filtered for unique IDs and an e-Value cutoff of 2e-3. After the filtering, 1159 hits were left. Only 6 of these were PDB-IDs.
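The filtering step described above can be sketched as follows. This is a minimal illustration, not the script we actually used: it assumes the hits are available as tab-separated lines with the subject ID in the second column and the e-Value in the eleventh (the usual 12-column BLAST tabular layout), and it assumes that the cutoff is inclusive. The function name `filter_hits` and the example rows are made up for illustration.

```python
def filter_hits(lines, max_evalue=2e-3):
    """Return {subject_id: best e-value} for hits at or below max_evalue.

    Assumes 12-column tab-separated BLAST output:
    column 2 = subject ID, column 11 = e-value.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # worse than the cutoff
        # Unique IDs: keep only the best (lowest) e-value per subject.
        if subject_id not in best or evalue < best[subject_id]:
            best[subject_id] = evalue
    return best


# Made-up example rows (not real search results).
example = [
    "HFE_HUMAN\tseqA\t34.0\t300\t180\t5\t1\t300\t1\t295\t1e-40\t160",
    "HFE_HUMAN\tseqA\t30.0\t120\t80\t2\t10\t130\t5\t124\t5e-04\t45",
    "HFE_HUMAN\tseqB\t27.0\t250\t170\t6\t20\t270\t15\t262\t0.5\t30",
]
hits = filter_hits(example)
print(len(hits), "unique hits after filtering")
```

Keeping the lowest e-Value per ID is one possible way to resolve duplicates; keeping the first occurrence would work equally well if the output is already sorted by significance.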

<figtable id="blastdist">

Hemo eval blast 80.png
Hemo ident blast 80.png
Table 1: E-Value and identity distributions of the Blast search within Big80.

</figtable>

The distributions of the e-Values and identities of these hits are shown in <xr id="blastdist"/>. Most of the e-Values lie between 1e-50 and 2e-3 (the cutoff). Only a few hits have a better e-Value. The identities are concentrated between 20% and 40%, with two peaks at around 27% and 34%.
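Distributions like these can be tabulated with a few lines of Python. The following is only a sketch under our own assumptions: e-Values are grouped by their power of ten, identities by 5% wide bins, and the input values are made up. The function names are hypothetical and this is not necessarily how the plots were produced.

```python
from collections import Counter


def evalue_decades(evalues):
    """Count e-values per power of ten (key n means 10^n <= e < 10^(n+1)).

    E-values reported as 0.0 are counted under the key None.
    """
    counts = Counter()
    for e in evalues:
        if e <= 0.0:
            counts[None] += 1
        else:
            # Read the exponent from the scientific notation, e.g. "2.000000e-03".
            counts[int(f"{e:e}".split("e")[1])] += 1
    return counts


def identity_bins(identities, width=5):
    """Count percent identities per bin; the key is the lower bin edge."""
    counts = Counter()
    for ident in identities:
        counts[int(ident // width) * width] += 1
    return counts


# Made-up values for illustration only.
decades = evalue_decades([3e-50, 5e-31, 2e-3, 0.0])
bins = identity_bins([27.0, 34.2, 33.9, 21.5])
for lower_edge in sorted(bins):
    print(f"{lower_edge}-{lower_edge + 5}%: {bins[lower_edge]}")
```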

PSI-Blast

After the BLAST search, we also performed several searches with PSI-BLAST. This time we increased the number of reported hits and used a variety of parameter combinations to test their impact on the search results. The parameters we varied were 'h' and 'j'. The first one, 'h', sets the e-Value cutoff for the inclusion of sequences into the PSI-BLAST profile. The second one, 'j', sets the number of iterations of the PSI-BLAST search. For each of these parameters we used two different values: 2e-3 and 1e-10 for 'h', and 2 and 10 for 'j'. This resulted in a total of 4 combinations.
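The four parameter combinations can be enumerated programmatically. The sketch below only builds the command lines and does not run anything. Only 'h' and 'j' are taken from the text above; the remaining options (`-i`, `-d`, `-v`, `-b`, `-C`, `-o`) assume the legacy `blastpgp` command line, and the file names and database name are placeholders, so adapt them to your installation.

```python
from itertools import product

INCLUSION_EVALUES = ["2e-3", "1e-10"]  # values tested for 'h'
ITERATIONS = [2, 10]                   # values tested for 'j'


def build_commands(query="query.fasta", database="big_80", max_hits=10000):
    """Build one command line per (h, j) combination.

    query, database and the output names are placeholders; all flags
    other than -h and -j are assumptions about legacy blastpgp.
    """
    commands = []
    for h, j in product(INCLUSION_EVALUES, ITERATIONS):
        tag = f"h{h}_j{j}"
        commands.append([
            "blastpgp",
            "-i", query,           # query sequence
            "-d", database,        # database to search
            "-h", h,               # e-value cutoff for profile inclusion
            "-j", str(j),          # number of iterations
            "-v", str(max_hits),   # maximum number of reported hits
            "-b", str(max_hits),   # maximum number of reported alignments
            "-C", f"{tag}.chk",    # save the profile for later reuse
            "-o", f"{tag}.out",    # output file
        ])
    return commands


commands = build_commands()
for cmd in commands:
    print(" ".join(cmd))
```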

The first 4 searches were against Big80 with a maximum of 10000 reported hits. We also saved the PSI-BLAST profiles for later (see below). The hits for each individual parameter combination were again filtered for unique IDs and an e-Value cutoff of 2e-3. This resulted in the following numbers of hits:

  • h=2e-3, j=2: 1892 (2786 prefiltered)
  • h=2e-3, j=10: 1704 (2734 prefiltered)
  • h=1e-10, j=2: 2058 (3574 prefiltered)
  • h=1e-10, j=10: 2035 (3458 prefiltered)
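To make the effect of the filtering easier to compare, the share of hits that survive it can be computed from the numbers above. This is just a small worked calculation, assuming that "prefiltered" denotes the number of hits before filtering for unique IDs and the e-Value cutoff.

```python
# (h, j) -> (hits after filtering, hits before filtering), taken from the list above.
BIG80_COUNTS = {
    ("2e-3", 2): (1892, 2786),
    ("2e-3", 10): (1704, 2734),
    ("1e-10", 2): (2058, 3574),
    ("1e-10", 10): (2035, 3458),
}


def retained_fraction(filtered, prefiltered):
    """Fraction of the reported hits that pass the filtering."""
    return filtered / prefiltered


fractions = {
    params: retained_fraction(*counts) for params, counts in BIG80_COUNTS.items()
}
for (h, j), fraction in fractions.items():
    print(f"h={h}, j={j}: {fraction:.1%} of the hits retained")
```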

<xr id="psiblastdist"/> shows the e-Value and identity distributions for the Big80 results. The differences between the parameter combinations are quite easy to spot. The lower e-Value cutoff (1e-10) also produces more significant hits (lower e-Values). This might be caused by the inclusion of fewer sequences into the profile and therefore a higher specificity for more closely related sequences (i.e. low e-Values). An increased number of iterations, on the other hand, reduces the number of significant hits.

<figtable id="psiblastdist">

Hemo eval psi 80.png
Hemo ident psi 80.png
Table 2: E-Value and identity distributions of the PSI-BLAST search within Big80.

</figtable>

After the searches against Big80, we also ran PSI-BLAST against the Big database. We reused the profiles from the Big80 runs and increased the maximum number of reported hits to 100000. The resulting numbers of hits were:

  • h=2e-3, j=2: 23840 (25934 prefiltered)
  • h=2e-3, j=10: 25756 (30616 prefiltered)
  • h=1e-10, j=2: 26483 (28766 prefiltered)
  • h=1e-10, j=10: 27535 (29609 prefiltered)

The two combinations with 10 iterations produced multiple error messages (but nevertheless finished). These errors were due to an internal failure of PSI-BLAST caused by too many possible hits.

HHblits

<figtable id="hhblitsdist">

Hemo eval hhblits 80.png
Hemo ident hhblits 80.png
E-Value and identity distributions of the HHblits search within Uniprot20.

</figtable>

Comparison

<figtable id="psiblastoverlap">

Hemo Overlap PSI.png
Hemo Overlap PSI BIG.png
Hemo Overlap PSI BLAST.png
Nothing to see here....

</figtable>

<figtable id="alloverlap">

Hemo Overlap All 80.png
Hemo Overlap All BIG.png
Hemo Overlap PSI HHB BIG.png
Nothing to see here....

</figtable>

Multiple Sequence Alignments

Dataset

ClustalW

Muscle

T-Coffee