Difference between revisions of "Sequence and multiple alignments"

From Bioinformatikpedia
(Database search)
m (Alignments)
Line 130: Line 130:
 
All of our alignments are very bad alignments strechted with many gaps, which led to the decision that our sequence input is not very useful. Because of this, we decided to try it again but this time with more direct homologs to the HFE protein. This step will be done in the next two to four days because it is too late now to recalculate everything. Maybe we can also fix the strange 3dcoffee bug this time.
 
All of our alignments are very bad alignments strechted with many gaps, which led to the decision that our sequence input is not very useful. Because of this, we decided to try it again but this time with more direct homologs to the HFE protein. This step will be done in the next two to four days because it is too late now to recalculate everything. Maybe we can also fix the strange 3dcoffee bug this time.
   
  +
==Residues==
Furthermore we have almost no conserved column (one column with the same amino acid at all sequences) which we think depends also on our input sequences. Counting and analyzing the gaps is quite useless in this actual view because all results will be redone and corrected.
 
  +
Furthermore we have almost no conserved column (one column with the same amino acid at all sequences) which we think depends also on our input sequences.
  +
  +
=Functional Important=
  +
  +
  +
  +
==Gaps==
  +
Counting and analyzing the gaps is quite useless in this actual view because all results will be redone and corrected.

Revision as of 23:34, 23 May 2011

Sequence Alignments

Database search

Overlap.png

We used different tools for the search in a non-reduntant(NR) database like BLAST, FASTA and PSI-BLAST. Because of the absence of the database in a hmm format, we decided not to integrate the hhpred web-results. These results are not compareable.

We called BLAST and FASTA with the standard parameter. blast -p blastp -i 1a6z.fasta -d /data/nr/nr -o blastp_1a6z_on_NR.txt runtime 5:34
../../Desktop/fasta-36.3.4/bin/fasta36 -q 1a6z.fasta /data/nr/nr >fasta_1a5z_on_nr.txt the runtime was 10:44.

Venn blast fasta psi.png

PSI-BLAST was used with different parameter settings. blastpgp -i 1a6z.fasta -d /data/nr/nr runtime 21:40
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 6 -e 10E-6 > psiblast_on_NR_itera3_e-val_10E-6.txt runtime 23:49
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 3 -e 10E-6 > psiblast_on_NR_itera6_e-val_10E-6.txt runtime 15:08
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 3 > psiblast_on_NR_itera3.txt runtime 9:41
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 6 > psiblast_on_NR_itera6.txt runtime 23:30
We also compared the runtime of the different tools. Suprisingly for PSI-BLAST, in our case, the e-Value and the number of iterations had no impact on the results, so we used PSI-BLAST with standart parameter for the comparison.

Identity histo.png
Fasta score histo.png
Blast psiblast score histo.png

Multiple Alignments

Used Sequences

SeqIdentifier Seq Identity Protein description
99-90% Sequence Identity
AAG29574.1 94% emochromatosis splice variant
89-60% Sequence Identity
AAO47091.1 87% hemochromatosis
Q9GL42.1 68% HFE_DICSU RecName
NP_620577.1 86% hereditary hemochromatosis protein isoform 8 precursor
NP_620578.1 78% hereditary hemochromatosis protein isoform 9 precursor
AAF01222.1 75% hereditary haemochromatosis protein precursor
EDL86571.1 62% hemochromatosis, isoform CRA_c
NP_620576.1 86% hereditary hemochromatosis protein isoform 7 precursor
Q9GL43.1 68% HFE_DICBI RecName
AAH74721.1 84% HFE protein
NP_001166905.1 63% hereditary hemochromatosis protein homolog isoform 2 precursor
P70387.1 61% HFE_MOUSE
59-40% Sequence Identity
XP_001511789.1 41% PREDICTED: similar to MHC class I heavy chain antigen
XP_002713938.1 40% PREDICTED: histocompatibility 2, Q region locus 10-like
39-20% Sequence Identity
AAR25266.1 37% MHC class I heavy chain
AAZ30022.1 34% MHC class I antigen alpha chain
XP_585880.4 37% PREDICTED: histocompatibility 2, Q region locus 10-like
AAX51393.1 36% MHC class I antigen alpha chain precursor
BAD23967.1 35.7% MHC class I antigen

Programs

All programs except cobalt were preinstalled at our VM. Cobalt was downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/cobalt/executables/2.0.1/ (file depending on system architecture) and extracted. Afterwards the current directory was changed to ~/usr/bin and a symbolic link was created (ln -s <extraced-cobalt-runnable> cobalt).

Cobalt

msa using cobalt

call:

  • cobalt -i hfe_fasta_multi.txt -norps T -outfmt clustalw > cobalt.txt

time:

  • 0,55s user
  • 0,02s system

ClustalW

msa using clustalW

call:

  • clustalw hfe_fasta_multi.txt

time:

  • 1,32s user
  • 0,04s system

Muscle

msa using muscle

call:

  • muscle -in hfe_fasta_multi.txt -clw -out muscle.txt

time:

  • 1,25s user
  • 0,00s system

T-Coffee

standard

msa using tcoffee (standard)

call:

  • t_coffee hfe_fasta_multi.txt

time:

  • 11,00s user
  • 0,18s system

3dcoffee

call:

  • t_coffee hfe_fasta_multi.txt -mode 3dcoffee

time:

  • xx,xxs user
  • x,xxxs system

Unfortunately were we not able to run the 3d mode successfully, because our 3dcoffee crashed everytime with an rather unusual and useless error report.

Alignments

All of our alignments are very bad alignments strechted with many gaps, which led to the decision that our sequence input is not very useful. Because of this, we decided to try it again but this time with more direct homologs to the HFE protein. This step will be done in the next two to four days because it is too late now to recalculate everything. Maybe we can also fix the strange 3dcoffee bug this time.

Residues

Furthermore we have almost no conserved column (one column with the same amino acid at all sequences) which we think depends also on our input sequences.

Functional Important

Gaps

Counting and analyzing the gaps is quite useless in this actual view because all results will be redone and corrected.