Sequence and multiple alignments
Contents
Sequence Alignments
Database search
We used different tools for the search in a non-reduntant(NR) database like BLAST, FASTA and PSI-BLAST. Because of the absence of the database in a hmm format, we decided not to integrate the hhpred web-results. These results are not compareable.
We called BLAST and FASTA with the standard parameter.
blast -p blastp -i 1a6z.fasta -d /data/nr/nr -o blastp_1a6z_on_NR.txt runtime 5:34
../../Desktop/fasta-36.3.4/bin/fasta36 -q 1a6z.fasta /data/nr/nr >fasta_1a6z_on_nr.txt the runtime was 10:44.
PSI-BLAST was used with different parameter settings.
blastpgp -i 1a6z.fasta -d /data/nr/nr runtime 21:40
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 6 -e 10E-6 > psiblast_on_NR_itera3_e-val_10E-6.txt runtime 23:49
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 3 -e 10E-6 > psiblast_on_NR_itera6_e-val_10E-6.txt runtime 15:08
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 3 > psiblast_on_NR_itera3.txt runtime 9:41
blastpgp -i 1a6z.fasta -d /data/nr/nr -j 6 > psiblast_on_NR_itera6.txt runtime 23:30
We also compared the runtime of the different tools. Suprisingly for PSI-BLAST, in our case, the e-Value and the number of iterations had no impact on the results, so we used PSI-BLAST with standart parameter for the comparison.
As we looked for an overlab between the reference sequences given by HSSP, no overlap occured.
The HSSP reference is given with UniProt AC, so we mapped the ID's on the GenBank ID's using PIR<ref>pir.georgetown.edu/pirwww/search/idmapping.shtml</ref>. Than we compared the ID's without any result.
We also mapped the PDB ID's found by FASTA to UniProt. Here we found a small overlap of about 34 Proteins.
But the issue is, that the highest identety given by a pdb entry is abput 40%.
Multiple Alignments
Used Sequences
SeqIdentifier | Seq Identity | Protein description | |
---|---|---|---|
99-90% Sequence Identity | |||
AAG29574.1 | 94% | emochromatosis splice variant | |
89-60% Sequence Identity | |||
AAO47091.1 | 87% | hemochromatosis | |
Q9GL42.1 | 68% | HFE_DICSU RecName | |
NP_620577.1 | 86% | hereditary hemochromatosis protein isoform 8 precursor | |
NP_620578.1 | 78% | hereditary hemochromatosis protein isoform 9 precursor | |
AAF01222.1 | 75% | hereditary haemochromatosis protein precursor | |
EDL86571.1 | 62% | hemochromatosis, isoform CRA_c | |
NP_620576.1 | 86% | hereditary hemochromatosis protein isoform 7 precursor | |
Q9GL43.1 | 68% | HFE_DICBI RecName | |
AAH74721.1 | 84% | HFE protein | |
NP_001166905.1 | 63% | hereditary hemochromatosis protein homolog isoform 2 precursor | |
P70387.1 | 61% | HFE_MOUSE | |
59-40% Sequence Identity | |||
XP_001511789.1 | 41% | PREDICTED: similar to MHC class I heavy chain antigen | |
XP_002713938.1 | 40% | PREDICTED: histocompatibility 2, Q region locus 10-like | |
39-20% Sequence Identity | |||
AAR25266.1 | 37% | MHC class I heavy chain | |
AAZ30022.1 | 34% | MHC class I antigen alpha chain | |
XP_585880.4 | 37% | PREDICTED: histocompatibility 2, Q region locus 10-like | |
AAX51393.1 | 36% | MHC class I antigen alpha chain precursor | |
BAD23967.1 | 35.7% | MHC class I antigen |
Programs
All programs except cobalt were preinstalled at our VM. Cobalt was downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/cobalt/executables/2.0.1/ (file depending on system architecture) and extracted. Afterwards the current directory was changed to ~/usr/bin and a symbolic link was created (ln -s <extraced-cobalt-runnable> cobalt).
Cobalt
call:
- cobalt -i hfe_fasta_multi.txt -norps T -outfmt clustalw > cobalt.txt
time:
- 0,55s user
- 0,02s system
ClustalW
call:
- clustalw hfe_fasta_multi.txt
time:
- 1,32s user
- 0,04s system
Muscle
call:
- muscle -in hfe_fasta_multi.txt -clw -out muscle.txt
time:
- 1,25s user
- 0,00s system
T-Coffee
standard
call:
- t_coffee hfe_fasta_multi.txt
time:
- 11,00s user
- 0,18s system
3dcoffee
call:
- t_coffee hfe_fasta_multi.txt -mode 3dcoffee
time:
- xx,xxs user
- x,xxxs system
Unfortunately were we not able to run the 3d mode successfully, because our 3dcoffee crashed everytime with an rather unusual and useless error report.
Alignments
All of our alignments are very bad alignments strechted with many gaps, which led to the decision that our sequence input is not very useful. Because of this, we decided to try it again but this time with more direct homologs to the HFE protein. This step will be done in the next two to four days because it is too late now to recalculate everything. Maybe we can also fix the strange 3dcoffee bug this time.
Residues
Furthermore we have almost no conserved column (one column with the same amino acid at all sequences) which we think depends also on our input sequences.
Functional Important
We used the information available at UniProt for determining functional important residues and regions.
key | positions | length | description |
---|---|---|---|
Topological domain | 23 – 306 | 284 | Extracellular Potential |
Transmembrane | 307 – 330 | 24 | Helical; Potential |
Topological domain | 331 – 348 | 18 | Cytoplasmic Potential |
Domain | 207 – 298 | 92 | Ig-like C1-type |
Region | 23 – 114 | 92 | Alpha-1 |
Region | 15 – 205 | 91 | Alpha-2 |
Region | 206 – 297 | 92 | Alpha-3 |
Region | 298 – 306 | 9 | Connecting peptide |
Gaps
Counting and analyzing the gaps is quite useless in this actual view because all results will be redone and corrected.