Revision as of 10:14, 21 May 2011
Sequence Alignments
Sequence searches
- FASTA
../bin/fasta36 sequence.fasta database > FastaOutput.txt
- BLAST
blastall -p blastp -d database -i sequence.fasta > BlastOutput.txt
- PSIBLAST
blastpgp -i sequence.fasta -j iterations -h evalueCutoff -d database > PsiblastOutput.txt
- HHSearch
hhsearch -i query -d database -o output.txt
database = /data/blast/nr/nr
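The four PSI-BLAST configurations compared in the overlap analysis (PSI1 to PSI4: 3 or 5 iterations, E-value inclusion cutoff 0.005 or 10E-6) can be scripted with the blastpgp call above. The following is a minimal sketch, assuming legacy blastpgp is on the PATH and the query is stored in sequence.fasta. The output names PSI1.txt to PSI4.txt and the DRY_RUN switch are hypothetical. By default the script only prints the commands. Note that under standard floating-point parsing, a cutoff written as 10E-6 means 10 × 10^-6 = 1e-5, so use 1e-6 if a cutoff of 10^-6 is intended.

```shell
#!/bin/sh
# Sketch: run the four PSI-BLAST configurations PSI1-PSI4.
# Assumptions: legacy blastpgp on PATH, query in sequence.fasta,
# output file names PSI1.txt ... PSI4.txt are hypothetical.
DB=/data/blast/nr/nr
QUERY=sequence.fasta
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = execute them

run_psiblast() {
    # $1 = run label, $2 = iterations (-j), $3 = E-value inclusion cutoff (-h)
    out="$1.txt"
    if [ "$DRY_RUN" = 1 ]; then
        echo "blastpgp -i $QUERY -j $2 -h $3 -d $DB > $out"
    else
        blastpgp -i "$QUERY" -j "$2" -h "$3" -d "$DB" > "$out"
    fi
}

run_psiblast PSI1 3 0.005
run_psiblast PSI2 5 0.005
run_psiblast PSI3 3 10E-6
run_psiblast PSI4 5 10E-6
```

Run it with DRY_RUN=0 to actually start the four searches.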
Result Statistics
- Overlap
To illustrate the overlap between the sequences returned by the different searches, Venn diagrams were drawn.
PSI1 = PSI-BLAST run with 3 iterations, E-value cutoff 0.005
PSI2 = PSI-BLAST run with 5 iterations, E-value cutoff 0.005
PSI3 = PSI-BLAST run with 3 iterations, E-value cutoff 10E-6
PSI4 = PSI-BLAST run with 5 iterations, E-value cutoff 10E-6
The first diagram shows the overlap of the sequences returned by the two PSI-BLAST runs with E-value cutoff 10E-6 (PSI3 and PSI4).
The overlap for the two PSI-BLAST runs with E-value cutoff 0.005 (PSI1 and PSI2) is identical to the first diagram. In both cases, PSI-BLAST with 5 iterations returns 6 more sequences than with 3 iterations, presumably because the additional iterations refine the search profile further and let it pick up more matching hits.
The two PSI-BLAST runs with 3 iterations each (PSI1 and PSI3) returned exactly the same sequences matching our query BCKDHA. From the three diagrams it follows that the two PSI-BLAST runs with 5 iterations each (PSI2 and PSI4) also returned exactly the same results.
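The counts in such Venn diagrams can be reproduced by comparing the sets of hit identifiers of two runs. Below is a minimal sketch using the standard sort and comm utilities. It assumes each run's hits have already been extracted into a plain-text file with one identifier per line; these input files and the function name overlap_counts are hypothetical.

```shell
#!/bin/sh
# Sketch: overlap of two hit lists for a two-set Venn diagram.
# Input (hypothetical): two files with one hit identifier per line.
overlap_counts() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # comm needs sorted input; -u also drops duplicate hits (e.g. several HSPs)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    only_a=$(comm -23 "$a" "$b" | awk 'END { print NR }')
    only_b=$(comm -13 "$a" "$b" | awk 'END { print NR }')
    both=$(comm -12 "$a" "$b" | awk 'END { print NR }')
    rm -f "$a" "$b"
    # prints: <only in first> <only in second> <shared>
    echo "$only_a $only_b $both"
}

# Example call: overlap_counts PSI1_ids.txt PSI2_ids.txt
if [ $# -eq 2 ]; then
    overlap_counts "$1" "$2"
fi
```

The three numbers correspond to the two exclusive regions and the intersection of the diagram.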
- Identity Distribution
Since the two PSI-BLAST runs with 3 iterations each returned the same hits, as did the two runs with 5 iterations each, the identity distributions of each pair were pooled and drawn in the same colour. The identity distribution of the FASTA run stands out: it returned many more low-identity hits than the other runs. In total, FASTA returned almost 3000 hits (search with default parameters), while each BLAST/PSI-BLAST run returned no more than 300 hits. Consequently, many of the FASTA hits have low identity and a fairly high E-value (see below), but FASTA still returned some good results.
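A distribution like this can be tabulated from the search reports. The sketch below assumes legacy BLAST/PSI-BLAST text output, where every alignment contains a line such as "Identities = 250/251 (99%)". FASTA reports use a different format and would need another pattern. The function names and the 10% bin width are illustrative choices. A multi-iteration PSI-BLAST report may list alignments once per round, so it is safest to restrict the input to the final round to avoid counting hits twice.

```shell
#!/bin/sh
# Sketch: histogram of percent identities in 10% bins.
# Assumption: legacy BLAST text reports ("Identities = 250/251 (99%)").

blast_identities() {
    # print one percent identity per alignment (HSP) found in report $1
    sed -n 's/.*Identities = [0-9]*\/[0-9]* (\([0-9]*\)%).*/\1/p' "$1"
}

identity_histogram() {
    # read one percent identity per line on stdin, print "<bin>\t<count>"
    awk '
    {
        b = int($1 / 10) * 10          # 0-9 -> 0, 10-19 -> 10, ...
        if (b >= 90) b = 90            # 90-100% share the top bin
        count[b]++
    }
    END {
        for (b = 0; b <= 90; b += 10)
            printf "%d-%d%%\t%d\n", b, (b == 90 ? 100 : b + 9), count[b] + 0
    }'
}

# Example: blast_identities BlastOutput.txt | identity_histogram
```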
- E-value Distribution
The E-value distributions of the BLAST and PSI-BLAST results are quite similar. Only FASTA covers a much wider E-value range. This can be explained by the fact that FASTA returned about 10 times more hits, many of which have low identity and therefore a fairly high E-value.
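Because E-values span hundreds of orders of magnitude, such a distribution is easiest to compare per decade. The sketch below counts E-values by decimal exponent; each output line gives the lower bound of a decade (1e-45 means 1e-45 <= E < 1e-44), and E-values reported as 0.0 get their own bin. It assumes the E-values have already been extracted, one per line on stdin (the function name is hypothetical), and it handles the legacy BLAST habit of printing values such as "e-180" without a leading 1.

```shell
#!/bin/sh
# Sketch: count E-values per order of magnitude (decimal exponent).
# Input (hypothetical): one E-value per line on stdin.
evalue_decades() {
    awk '
    {
        v = $1
        if (v ~ /^[eE]/) v = "1" v          # legacy BLAST prints e.g. "e-180" for 1e-180
        x = v + 0
        if (x == 0) { zero++; next }        # reported as 0.0, kept in a separate bin
        split(sprintf("%e", x), parts, "e") # "2.000000e-45" -> exponent "-45"
        k = parts[2] + 0
        count[k]++
        if (n == 0 || k < lo) lo = k
        if (n == 0 || k > hi) hi = k
        n++
    }
    END {
        if (zero) printf "0\t%d\n", zero
        if (n > 0)
            for (k = lo; k <= hi; k++)
                if (k in count) printf "1e%d\t%d\n", k, count[k]
    }'
}
```

Reading the exponent from the %e representation avoids the rounding problems of computing log10 with floating-point arithmetic.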
Sequences chosen for the multiple alignment:
Sequence identifier | Sequence identity | Source |
---|---|---|
99-90% sequence identity | | |
56967006\|pdb\|1X7Z | 99% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
7546384\|pdb\|1DTW | 95% | BLAST |
34810149\|pdb\|1OLU | 99% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
13277798\|gb\|AAH03787.1 | 95% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
148727347\|ref\|NP_001092034.1 | 95% | BLAST |
89-60% sequence identity | | |
196011048\|ref\|XP_002115388.1 | 66% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
149543950\|ref\|XP_001517857.1 | 67% | BLAST |
47227873\|emb\|CAG09036.1 | 82.5% | FASTA |
47196273\|emb\|CAF88112.1 | 81% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
12964598\|dbj\|BAB32665.1 | 88% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
59-40% sequence identity | | |
193290664\|gb\|ACF17640.1 | 47% | BLAST |
215431443\|ref\|ZP_03429362.1 | 40% | FASTA |
225557347\|gb\|EEH05633.1 | 51% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
58267618\|ref\|XP_570965.1 | 50% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
162449842\|ref\|YP_001612209.1 | 41% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
39-20% sequence identity | | |
56966700\|pdb\|1W85 | 31% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
5822330\|pdb\|1QS0 | 38.1% | FASTA |
13516864\|dbj\|BAB40585.1 | 33% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
284166853\|ref\|YP_003405132.1 | 35% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
76800932\|ref\|YP_325940.1 | 34% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
Sequences for the multiple sequence alignment were downloaded from NCBI. To retrieve a sequence in FASTA format, replace the sequence ID in this link: http://www.ncbi.nlm.nih.gov/protein/76800932?report=fasta
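Instead of editing the link by hand for every entry, the sequences can be fetched in one loop. The sketch below swaps in NCBI's E-utilities efetch service (db=protein, rettype=fasta, retmode=text) in place of the protein web page used above. It assumes curl is installed and that efetch accepts the GI numbers from the first column of the table; if it does not, the accession.version numbers (e.g. AAH03787.1) can be used instead. Only two of the IDs are listed, and the DRY_RUN switch is hypothetical.

```shell
#!/bin/sh
# Sketch: download the selected sequences in FASTA format via NCBI E-utilities (efetch).
# Assumptions: curl is installed; IDs are GI numbers from the table above.
EFETCH=https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
DRY_RUN=${DRY_RUN:-1}   # 1 = print the URLs only, 0 = download

efetch_url() {
    # $1 = protein identifier (GI or accession.version)
    printf '%s?db=protein&id=%s&rettype=fasta&retmode=text\n' "$EFETCH" "$1"
}

IDS="56967006 76800932"   # extend with the remaining identifiers from the table

for id in $IDS; do
    if [ "$DRY_RUN" = 1 ]; then
        efetch_url "$id"
    else
        curl -s "$(efetch_url "$id")" >> sequences.fasta
        sleep 1                # stay below NCBI's request rate limit
    fi
done
```

With DRY_RUN=0 all records are appended to sequences.fasta, the input file used by the alignment commands.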
Multiple Alignments
- ClustalW
clustalw sequences.fasta
- T-Coffee
t_coffee -seq sequences.fasta
- T-Coffee (3D, Expresso mode)
t_coffee -seq sequences.fasta -mode expresso
- MUSCLE
muscle -in sequences.fasta -out output.aln
- COBALT (standalone version, downloaded from NCBI)
./cobalt -i sequences.fasta -norps T > output.aln
Conservation and Gaps
Alignment method | Gaps | Avg gap length | 100% conserved | >90% conserved | >80% conserved | >70% conserved | >60% conserved | >50% conserved |
---|---|---|---|---|---|---|---|---|
ClustalW | 12 | 3.75 | 24 | 136 | 53 | 60 | 71 | 91 |
T-Coffee | 25 | 4.56 | 24 | 136 | 53 | 60 | 71 | 91 |
T-Coffee (3D) | 56 | 4.75 | 21 | 326 | 83 | 92 | 67 | 71 |
Cobalt | 19 | 3.26 | 24 | 128 | 68 | 56 | 101 | 72 |
Muscle | 17 | 4.76 | 26 | 193 | 26 | 38 | 43 | 10 |
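The page does not state how column conservation was measured. The sketch below uses one plausible definition as an assumption: a column's conservation is the share of all sequences carrying the column's most frequent residue, with gaps never counted as conserved. Columns are sorted into non-overlapping bins (100%, >90%, ..., >50%), which matches the table's numbers (for example, fewer columns fall into >80% than into >90%). The input must be an alignment in aligned FASTA format with '-' as the gap character, so ClustalW-format .aln files would have to be converted first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: bin alignment columns by conservation (definition assumed, see text).
# Input: aligned FASTA file, all rows of equal length, '-' or '.' as gaps.
conservation_bins() {
    awk '
    /^>/ { n++; next }                      # new record
    n > 0 { seq[n] = seq[n] $0 }            # join wrapped sequence lines
    END {
        len = length(seq[1])
        for (c = 1; c <= len; c++) {
            split("", freq)                 # portable way to empty an array
            best = 0
            for (i = 1; i <= n; i++) {
                r = toupper(substr(seq[i], c, 1))
                if (r == "-" || r == ".") continue   # gaps never count as conserved
                if (++freq[r] > best) best = freq[r]
            }
            pct = 100 * best / n            # share of ALL sequences, gapped ones included
            if (pct == 100)     bin[1]++
            else if (pct > 90)  bin[2]++
            else if (pct > 80)  bin[3]++
            else if (pct > 70)  bin[4]++
            else if (pct > 60)  bin[5]++
            else if (pct > 50)  bin[6]++
        }
        split("100% >90% >80% >70% >60% >50%", label, " ")
        for (j = 1; j <= 6; j++) printf "%s\t%d\n", label[j], bin[j] + 0
    }' "$1"
}
```

Counting gapped sequences in the denominator means a column with one gap can never reach 100%, which is a deliberate (but debatable) choice.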
Gaps in secondary structure
ClustalW
Gap position | Gap length | Secondary structure |
---|---|---|
109-110 | 4 | Helix |
142-143 | 1 | Helix |
235-236 | 1 | Beta strand |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
394-395 | 5 | Helix |
T-Coffee
Gap position | Gap length | Secondary structure |
---|---|---|
141-142 | 1 | Helix |
232-233 | 1 | Beta strand |
275-276 | 11 | Helix |
310-311 | 1 | Helix |
369-370 | 5 | Turn |
395-396 | 18 | Helix |
398-399 | 5 | Helix |
T-Coffee (3D)
Gap position | Gap length | Secondary structure |
---|---|---|
101-102 | 1 | Helix |
108-109 | 4 | Helix |
115-116 | 1 | Helix |
116-117 | 1 | Helix |
141-142 | 1 | Helix |
153-154 | 1 | Beta strand |
163-164 | 1 | Helix |
177-178 | 3 | Helix |
234-235 | 1 | Beta strand |
263-264 | 4 | Beta strand |
265-266 | 1 | Beta strand |
276-277 | 2 | Helix |
308-309 | 8 | Helix |
309-310 | 5 | Helix |
314-315 | 6 | Helix |
362-363 | 6 | Helix |
371-372 | 4 | Turn |
376-377 | 1 | Helix |
380-381 | 1 | Helix |
382-383 | 7 | Helix |
383-384 | 3 | Helix |
384-385 | 2 | Helix |
387-388 | 2 | Helix |
394-395 | 5 | Helix |
Cobalt
Gap position | Gap length | Secondary structure |
---|---|---|
108-109 | 4 | Helix |
141-142 | 1 | Helix |
177-178 | 3 | Helix |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
305-306 | 1 | Helix |
311-312 | 2 | Helix |
387-388 | 1 | Helix |
388-389 | 1 | Helix |
395-396 | 13 | Helix |
Muscle
Gap position | Gap length | Secondary structure |
---|---|---|
109-110 | 4 | Helix |
141-142 | 1 | Helix |
177-178 | 3 | Helix |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
305-306 | 1 | Helix |
394-395 | 5 | Helix |
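Gap positions like those in the tables above can be listed by walking along one row of the alignment and counting residues. The sketch below assumes that positions refer to residue numbers of a reference row (for example the query BCKDHA), that a gap between residues i and i+1 is reported as "i-(i+1)", and that the alignment is in aligned FASTA format. The function name and record names are hypothetical. Gaps opened in the other sequences are not counted by this variant, and the secondary structure column still has to be added from a separate structure annotation.

```shell
#!/bin/sh
# Sketch: list gap runs in one alignment row as "<before>-<after>\t<length>".
# Input: aligned FASTA file ($1) and the record name of the reference row ($2),
# i.e. the first word after '>' (hypothetical, e.g. the query ID).
gap_runs() {
    awk -v ref="$2" '
    /^>/ { name = substr($1, 2); next }     # record name without the ">"
    name == ref { row = row $0 }            # collect the reference row
    END {
        pos = 0; run = 0; count = 0; total = 0
        for (c = 1; c <= length(row); c++) {
            ch = substr(row, c, 1)
            if (ch == "-" || ch == ".") { run++; continue }
            if (run > 0) {                  # gap ended between residue pos and pos+1
                printf "%d-%d\t%d\n", pos, pos + 1, run
                count++; total += run; run = 0
            }
            pos++                           # residue number in the reference sequence
        }
        if (run > 0) {                      # trailing gap after the last residue
            printf "%d-end\t%d\n", pos, run
            count++; total += run
        }
        if (count > 0) printf "# %d gaps, average length %.2f\n", count, total / count
    }' "$1"
}
```

The summary line gives a gap count and average gap length for that single row, comparable in spirit to the Gaps and Avg gap length columns above, though the page does not say exactly how those were counted.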
References
Secondary structure information