Sequence Alignments HEXA
Use of database searching tools
/bin/fasta36 seq.fasta /data/blast/nr/nr > fasta_out.txt
blastall -p blastp -d /data/blast/nr/nr -i mult_seq.fasta > blast_out.txt
blastpgp -i seq.fasta -j <#iterations> -h <e-value threshold> -d /data/blast/nr/nr > psiblast_out.txt
HHSearch was run through its online server.
For the statistical analysis we wrote a script that reports the distribution of the E-values and sequence identities as well as the set of aligned sequences for each method. We also created Venn diagrams to show the overlap between the results of the different search methods (using http://bioinfogp.cnb.csic.es/tools/venny/index.html). First we compared BLAST, FASTA and PsiBlast (PsiBlast with 3 and 5 iterations and an E-value cutoff of 10E-6). Then we looked at the overlap among all PsiBlast runs.
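The E-value distribution mentioned above can be summarised by counting hits per order of magnitude. The following is only a minimal sketch, not the script we actually used; the function name `evalue_histogram` and the example values are hypothetical, and it assumes the E-values have already been extracted from the search output as floats.

```python
import math
from collections import Counter

def evalue_histogram(evalues):
    """Count hits per order of magnitude of the E-value."""
    counts = Counter()
    for e in evalues:
        if e <= 0.0:
            # Search tools report extremely small E-values as 0.0.
            counts["0"] += 1
        else:
            counts[f"1e{math.floor(math.log10(e))}"] += 1
    return counts

# Hypothetical E-values, for illustration only.
example_evalues = [3e-50, 5e-50, 2e-7, 0.0]
histogram = evalue_histogram(example_evalues)
for label, count in sorted(histogram.items()):
    print(label, count)
```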
Overlap of the aligned sequences
FASTA found a large number of matches that the other methods did not find. About 1400 of its hits were found by neither BLAST nor PsiBlast, which is much higher than the number of sequences found by both FASTA and BLAST. This suggests that FASTA aligns many sequences that are probably of lower quality or even wrong. The two PsiBlast variants deliver the same hits, all of which are also found by FASTA. Furthermore, all sequences returned by BLAST were also aligned by FASTA, and most of them by PsiBlast as well.
We also compared different PsiBlast runs: 3 iterations with E-value cutoffs of 0.005 and 10E-6, and 5 iterations with the same two cutoffs. The Venn diagram shows that the results mostly overlap; only a few hits differ between runs. PsiBlast therefore delivers largely similar results across these iteration numbers and E-value cutoffs. In summary, the BLAST-based methods agree with each other, whereas FASTA delivers many more sequences that are not found by any of the other methods.
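The numbers behind such a Venn diagram are plain set operations on the hit identifiers. The sketch below is an illustration under assumptions: `overlap_report` and the toy identifiers are hypothetical, and the hit IDs of each method are assumed to be already collected into Python sets.

```python
from itertools import combinations

def overlap_report(hits_by_method):
    """Return pairwise intersection sizes and the hits unique to each method."""
    pairwise = {}
    for a, b in combinations(sorted(hits_by_method), 2):
        pairwise[(a, b)] = len(hits_by_method[a] & hits_by_method[b])
    unique = {}
    for name, hits in hits_by_method.items():
        # Union of the hits of all other methods.
        others = set().union(
            *(h for n, h in hits_by_method.items() if n != name)
        )
        unique[name] = hits - others
    return pairwise, unique

# Toy identifiers, for illustration only.
hits = {
    "FASTA": {"A", "B", "C", "D"},
    "BLAST": {"A", "B"},
    "PsiBlast": {"A", "C"},
}
pairwise, unique = overlap_report(hits)
print(pairwise)
print(unique)
```

In this toy example every BLAST and PsiBlast hit is also a FASTA hit, and only FASTA has a hit that no other method reports, which mirrors the pattern described above.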
True positive hits
HSSP (Homology-derived Secondary Structure of Proteins) lists proteins that are homologous and have a similar secondary structure. We therefore used the HSSP alignment to check our results by determining how much the hits of each method overlap with HSSP. The overlapping sequences are counted as true positives.
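The true-positive check can be sketched as an intersection with the HSSP reference set. This is a minimal sketch, not the procedure we actually ran: `true_positive_stats` and the identifiers are hypothetical, and it assumes the identifiers of the search hits and of the HSSP entries have already been mapped to a common naming scheme.

```python
def true_positive_stats(method_hits, reference_ids):
    """Return the true positives and their share of the hits and of the reference."""
    true_positives = method_hits & reference_ids
    share_of_hits = len(true_positives) / len(method_hits) if method_hits else 0.0
    share_of_reference = (
        len(true_positives) / len(reference_ids) if reference_ids else 0.0
    )
    return true_positives, share_of_hits, share_of_reference

# Toy identifiers, for illustration only.
method_hits = {"P1", "P2", "P3", "P4"}
hssp_ids = {"P2", "P4", "P9"}
tp, share_of_hits, share_of_reference = true_positive_stats(method_hits, hssp_ids)
print(sorted(tp), share_of_hits, share_of_reference)
```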
gi|212691177|ref|ZP_03299305.1 || 22% | PsiBlast, 3 Iterations, E-Value Cutoff = 10E-6
Based on the results of this analysis, we created the input file for the multiple alignments.
|99%-90% Sequence Identity|
|89%-60% Sequence Identity|
|59%-40% Sequence Identity|
|867691|gb|AAA68620.1||55%||PsiBlast, 3 Iterations, E-Value Cutoff = 0.005|
|39%-20% Sequence Identity|
|299139410|ref|ZP_07032585.1||36%||PsiBlast, 3 Iterations, E-Value Cutoff = 0.005|
|166159759|gb|ABY83272.1||32%||PsiBlast, 5 Iterations, E-Value Cutoff = 0.005|
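Sorting the hits into the identity ranges used above can be sketched as follows. The function `bin_by_identity` is hypothetical, and the sketch assumes that identities are available as whole-number percentages; the four example pairs are the accession/identity values listed in this section.

```python
# Identity ranges (inclusive, in percent) as used in the table above.
IDENTITY_BINS = [(90, 99), (60, 89), (40, 59), (20, 39)]

def bin_by_identity(hits):
    """Group (sequence id, identity) pairs by identity range.

    Hits outside all ranges (for example 100% or below 20%) are skipped.
    """
    bins = {identity_range: [] for identity_range in IDENTITY_BINS}
    for seq_id, identity in hits:
        for low, high in IDENTITY_BINS:
            if low <= identity <= high:
                bins[(low, high)].append(seq_id)
                break
    return bins

example_hits = [
    ("AAA68620.1", 55),
    ("ZP_07032585.1", 36),
    ("ABY83272.1", 32),
    ("ZP_03299305.1", 22),
]
bins = bin_by_identity(example_hits)
for identity_range, ids in bins.items():
    print(identity_range, ids)
```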
Download Cobalt from ftp://ftp.ncbi.nlm.nih.gov/pub/cobalt/executables/2.0.1/ (ncbi-cobalt-2.0.1-x64-linux.tar). Extract the archive with tar xfz ncbi-cobalt-2.0.1-x64-linux.tar and change to the extracted cobalt directory. Call: ./cobalt -i mult_seq.fasta -norps T > cobalt_out.aln
clustalw -infile=mult_seq.fasta > clustalW_out.aln
muscle -in mult_seq.fasta -out muscle_out.aln -clw
t_coffee -seq mult_seq.fasta
- T-Coffee (3D)
t_coffee -seq mult_seq.fasta -mode expresso
|Alignment methods||Conserved Columns|
|Gaps||100% cons||>90% cons||>80% cons||>70% cons||>60% cons||>50% cons||>40% cons|
|Alignment methods||Secondary Structure|
|#Gaps in total||Helix||Extended||Coil||No secondary structure|
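Gap counts and conserved-column counts like those in the tables above can be derived directly from the aligned sequences. The sketch below rests on an assumption: it defines the conservation of a column as the share of all sequences that carry the most frequent non-gap residue, which may differ from the definition used for the tables. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

def column_conservation(alignment):
    """Return, per column, the share of sequences with the most common residue."""
    n_sequences = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / n_sequences)
    return scores

def count_above(scores, threshold):
    """Count columns whose conservation is strictly greater than the threshold."""
    return sum(1 for s in scores if s > threshold)

# Toy alignment (equal-length rows), for illustration only.
alignment = ["MKW-A", "MKWQA", "MRW-A", "MKY-G"]
scores = column_conservation(alignment)
total_gaps = sum(seq.count("-") for seq in alignment)
fully_conserved = sum(1 for s in scores if s == 1.0)
print(total_gaps, fully_conserved, count_above(scores, 0.7))
```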
We found several functional residues from (link still missing). Because these residues are functionally important, they should be conserved. We compared the different alignments and checked whether these residues are conserved.
|Residue||Position||Cobalt||ClustalW||Muscle||T-Coffee||3D T-Coffee|
|D||207||conserved (once E)||conserved (once E)||conserved (once E)||conserved (once E)||conserved (once E)|
|W||373||conserved||conserved (once V, R)||conserved||conserved||conserved|
|W||392||conserved||conserved (once P, T, G)||conserved||conserved||conserved|
|Y||421||conserved||conserved (twice G, once -, S)||conserved||conserved||conserved (once H)|
|W||460||conserved (once -)||conserved (once -)||conserved (once -)||conserved (once -)||conserved (once -)|
|E||462||conserved (once -)||conserved (once Q)||conserved (once Q)||conserved (once -)||conserved (once -)|
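A check like the one in the table above needs the alignment column that corresponds to a residue position in the ungapped reference sequence. The following is a minimal sketch under assumptions: `column_for_position` and `residues_at` are hypothetical helpers, positions are 1-based, and the toy alignment is not real HEXA data.

```python
from collections import Counter

def column_for_position(ref_aligned, position):
    """Map a 1-based position in the ungapped reference to an alignment column."""
    seen = 0
    for column, char in enumerate(ref_aligned):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"reference has fewer than {position} residues")

def residues_at(alignment, ref_index, position):
    """Tally the characters all sequences carry at a reference position."""
    column = column_for_position(alignment[ref_index], position)
    return Counter(seq[column] for seq in alignment)

# Toy alignment with the reference in the first row, for illustration only.
alignment = ["M-DWE", "MADWE", "M-EW-"]
tally_d = residues_at(alignment, 0, 2)   # reference residue D
tally_e = residues_at(alignment, 0, 4)   # reference residue E
print(dict(tally_d), dict(tally_e))
```

A column in which the reference residue dominates and only single sequences deviate corresponds to entries such as "conserved (once E)" in the table.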