Latest revision as of 11:13, 21 May 2011
Sequence Alignments
Sequence searches
- FASTA
../bin/fasta36 sequence.fasta database > FastaOutput.txt
- BLAST
blastall -p blastp -d database -i sequence.fasta > BlastOutput.txt
- PSIBLAST
blastpgp -i sequence.fasta -j iterations -h evalueCutoff -d database > PsiblastOutput.txt
- HHSearch
hhsearch -i query -d database -o output.txt
database = /data/blast/nr/nr
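The statistics in the next section need the hit IDs, identities and E-values of each search in machine-readable form. The commands above write the default report format. A minimal sketch, assuming the searches are rerun with tabular output (`-m 8` for `blastall` and `blastpgp`), could collect them as follows. The function name `parse_tabular` and the sample lines are hypothetical.

```python
def parse_tabular(text):
    """Collect {subject_id: (percent_identity, evalue)} from tabular hits.

    Expected columns (BLAST -m 8): query, subject, % identity, alignment
    length, mismatches, gap openings, q. start, q. end, s. start, s. end,
    E-value, bit score. Only the first line per subject is kept; in BLAST
    tabular output this is normally the best-scoring alignment.
    """
    hits = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines (present in some tabular modes).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        hits.setdefault(subject, (identity, evalue))
    return hits


# Made-up example lines, not real search results.
sample = "\n".join([
    "query\tgi|111\t99.00\t400\t4\t0\t1\t400\t1\t400\t0.0\t800",
    "query\tgi|222\t35.50\t310\t190\t8\t20\t320\t5\t300\t2e-20\t95.1",
    "query\tgi|222\t30.00\t50\t35\t1\t350\t400\t310\t360\t0.5\t30.0",
])
hits = parse_tabular(sample)
print(hits)  # {'gi|111': (99.0, 0.0), 'gi|222': (35.5, 2e-20)}
```

The second `gi|222` line is ignored because only the first alignment per subject is kept.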
Result Statistics
- Overlap
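To illustrate the overlap between the hits returned by the different sequence searches, Venn diagrams were drawn.
Figure: Comparison of the results of the PSI-BLAST runs with 3 iterations each (PSI1 = 3 iterations, E-value cutoff 0.005; PSI3 = 3 iterations, E-value cutoff 10E-6).
Figure: Comparison of the results of the PSI-BLAST runs with 5 iterations each (PSI2 = 5 iterations, E-value cutoff 0.005; PSI4 = 5 iterations, E-value cutoff 10E-6).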
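The two PSI-BLAST runs with 3 iterations returned exactly the same sequences matching our query BCKDHA, and the same is true for the two PSI-BLAST runs with 5 iterations. We therefore combined the results of each pair in the following Venn diagram (created with Venny: http://bioinfogp.cnb.csic.es/tools/venny/index.html):
Figure: Venn diagram of the hits returned by BLAST, FASTA and the PSI-BLAST runs.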
Comparing all searches, the most striking result is that FASTA found more than 2600 hits that none of the other search algorithms returned. This may be because the default FASTA search imposes no strict E-value restriction.
BLAST returned only one hit that was not found by any of the PSI-BLAST searches, and this hit is also included in the FASTA results. All PSI-BLAST hits were also detected by FASTA. PSI-BLAST with 5 iterations returned 6 more hits than PSI-BLAST with 3 iterations, presumably because each additional iteration rebuilds the position-specific scoring matrix from the hits found so far, which lets the search pick up further, more distantly related sequences.
All in all, the search algorithms returned nearly the same set of sequences matching our query; the main difference is the large number of additional FASTA hits.
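The region counts of a Venn diagram like this can be reproduced from the hit lists with plain set operations. The sketch below assumes each search has already been reduced to a set of sequence IDs; the function name `venn_regions` and the toy IDs are hypothetical. It also shows the check used to justify merging identical runs.

```python
def venn_regions(hit_sets):
    """Count how many IDs fall into each exclusive region of a Venn diagram.

    hit_sets maps a search name to the set of sequence IDs it returned.
    The result maps a frozenset of search names to the number of IDs that
    were returned by exactly those searches and by no other.
    """
    all_ids = set().union(*hit_sets.values())
    regions = {}
    for seq_id in all_ids:
        # The region of an ID is the set of searches that found it.
        key = frozenset(name for name, ids in hit_sets.items() if seq_id in ids)
        regions[key] = regions.get(key, 0) + 1
    return regions


# Hypothetical toy hit lists (not the real results).
psi3_a = {"s1", "s2", "s3"}   # e.g. 3 iterations, cutoff 0.005
psi3_b = {"s1", "s2", "s3"}   # e.g. 3 iterations, cutoff 10E-6
assert psi3_a == psi3_b        # identical runs can be merged into one set

hits = {
    "BLAST": {"s1", "s2", "s4"},
    "PSI-BLAST (3 it.)": psi3_a,
    "PSI-BLAST (5 it.)": psi3_a | {"s5"},
    "FASTA": {"s1", "s2", "s3", "s4", "s5", "s6", "s7"},
}
regions = venn_regions(hits)
print(regions[frozenset({"FASTA"})])  # 2 IDs were found only by FASTA
```

Keying regions by `frozenset` keeps the result independent of the order in which the searches are listed.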
- Identity Distribution
Figure: Identity distribution of the hits returned by the sequence searches.
Because the two PSI-BLAST runs with 3 iterations returned the same hits, as did the two runs with 5 iterations, the identity distributions of each pair were pooled and drawn in the same colour. The identity distribution of the FASTA run stands out: it contains many more low-identity hits than the other runs. With default parameters, FASTA returned almost 3000 hits, whereas none of the BLAST/PSI-BLAST runs returned more than 300 hits. Consequently, many of the FASTA hits have low identity and a rather high E-value (see below), although FASTA also returned some good hits.
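An identity distribution is a histogram of the percent identities of a run's hits. A minimal sketch, assuming the identities are already available as numbers and using 10% bins (the bin width actually used for the figure is not stated); `identity_histogram` and the values are hypothetical. Runs with identical hit sets, such as the paired PSI-BLAST runs, only need to be counted once.

```python
from collections import Counter


def identity_histogram(identities, bin_width=10):
    """Count percent identities per bin of width bin_width (which should divide 100).

    A value v goes into the bin starting at floor(v / bin_width) * bin_width;
    exactly 100% is put into the top bin instead of a bin of its own.
    """
    counts = Counter()
    for v in identities:
        start = min(int(v // bin_width) * bin_width, 100 - bin_width)
        counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical identities of one search run (not the real data).
print(identity_histogram([12.0, 18.5, 25.0, 33.3, 95.0, 100.0]))
# {10: 2, 20: 1, 30: 1, 90: 2}
```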
- E-value Distribution
Figure: E-value distribution of the hits returned by the sequence searches.
The E-value distributions of the BLAST and PSI-BLAST results are quite similar. Only FASTA covers a much wider E-value range, which is explained by the fact that FASTA returned about 10 times more hits, many of which have low identity and therefore a rather high E-value.
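E-values span many orders of magnitude, so their distributions are easiest to compare per decade. For BLAST, the expected number of chance hits is E = K m n e^(-lambda S) (m, n: query and database lengths; S: raw score), so low-scoring, typically low-identity hits get exponentially larger E-values. The sketch below assumes the E-values are already parsed; `log_evalue_histogram`, the `floor` used for E = 0.0, and the values are hypothetical.

```python
import math
from collections import Counter


def log_evalue_histogram(evalues, floor=1e-180):
    """Count E-values per decade, i.e. by floor(log10(E)).

    E-values reported as 0.0 (typical for near-identical hits) are raised
    to `floor` so that the logarithm is defined; the floor is an arbitrary
    choice of this sketch.
    """
    counts = Counter()
    for e in evalues:
        decade = math.floor(math.log10(max(e, floor)))
        counts[decade] += 1
    return dict(sorted(counts.items()))


# Hypothetical E-values (not the real results).
print(log_evalue_histogram([3e-150, 3e-40, 2e-5, 0.004, 0.3, 7.5]))
# {-150: 1, -40: 1, -5: 1, -3: 1, -1: 1, 0: 1}
```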
Sequences chosen for the multiple alignment:
Sequence identifier | Sequence identity | Source |
---|---|---|
99-90% sequence identity | | |
56967006|pdb|1X7Z | 99% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
7546384|pdb|1DTW | 95% | BLAST |
34810149|pdb|1OLU | 99% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
13277798|gb|AAH03787.1 | 95% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
148727347|ref|NP_001092034.1 | 95% | BLAST |
89-60% sequence identity | | |
196011048|ref|XP_002115388.1 | 66% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
149543950|ref|XP_001517857.1 | 67% | BLAST |
47227873|emb|CAG09036.1 | 82.5% | FASTA |
47196273|emb|CAF88112.1 | 81% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
12964598|dbj|BAB32665.1 | 88% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
59-40% sequence identity | | |
193290664|gb|ACF17640.1 | 47% | BLAST |
215431443|ref|ZP_03429362.1 | 40% | FASTA |
225557347|gb|EEH05633.1 | 51% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
58267618|ref|XP_570965.1 | 50% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
162449842|ref|YP_001612209.1 | 41% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
39-20% sequence identity | | |
56966700|pdb|1W85 | 31% | PSI-BLAST, 3 iterations, E-value cutoff 0.005 |
5822330|pdb|1QS0 | 38.1% | FASTA |
13516864|dbj|BAB40585.1 | 33% | PSI-BLAST, 3 iterations, E-value cutoff 10E-6 |
284166853|ref|YP_003405132.1 | 35% | PSI-BLAST, 5 iterations, E-value cutoff 0.005 |
76800932|ref|YP_325940.1 | 34% | PSI-BLAST, 5 iterations, E-value cutoff 10E-6 |
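Sorting candidate hits into the identity bands of the table can be sketched as below. The bands are taken from the table; how a fractional identity between two bands (e.g. 89.5%) is assigned is an assumption of this sketch (it goes to the lower band), as are the function names.

```python
# Identity bands used in the table, as (lower bound in %, label).
BANDS = [(90, "99-90%"), (60, "89-60%"), (40, "59-40%"), (20, "39-20%")]


def identity_band(identity):
    """Return the label of the band an identity (in %) falls into, or None."""
    for lower, label in BANDS:
        if identity >= lower:
            return label
    return None  # below 20%: outside the bands used for the alignment


def group_by_band(candidates):
    """Group (identifier, identity) pairs by identity band."""
    groups = {}
    for seq_id, identity in candidates:
        band = identity_band(identity)
        if band is not None:
            groups.setdefault(band, []).append(seq_id)
    return groups


# Three entries from the table above.
print(group_by_band([("1X7Z", 99), ("CAG09036.1", 82.5), ("1QS0", 38.1)]))
# {'99-90%': ['1X7Z'], '89-60%': ['CAG09036.1'], '39-20%': ['1QS0']}
```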
Sequences for the multiple sequence alignment were downloaded from NCBI. To retrieve a sequence in FASTA format, replace the sequence ID in this link: http://www.ncbi.nlm.nih.gov/protein/76800932?report=fasta
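Instead of editing the link by hand for each sequence, the URLs can be generated. The sketch builds the page link in the style above and, as an alternative, an NCBI E-utilities `efetch` request that returns several sequences as plain-text FASTA in one call. It assumes the numeric IDs from the table are still accepted by NCBI; the function names are hypothetical, and `fetch_fasta` needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_page_url(gi):
    """Link in the style used above: the NCBI protein page in FASTA view."""
    return f"http://www.ncbi.nlm.nih.gov/protein/{gi}?report=fasta"


def efetch_url(gis):
    """E-utilities efetch URL returning all given IDs as plain-text FASTA."""
    params = {"db": "protein", "id": ",".join(str(g) for g in gis),
              "rettype": "fasta", "retmode": "text"}
    # Keep the commas between IDs unescaped.
    return EFETCH + "?" + urlencode(params, safe=",")


def fetch_fasta(gis):
    """Download the sequences (requires network access)."""
    with urlopen(efetch_url(gis)) as response:
        return response.read().decode("utf-8")


ids = [56967006, 7546384, 76800932]
print(fasta_page_url(ids[-1]))
print(efetch_url(ids))
```

Writing the result of `fetch_fasta(ids)` to `sequences.fasta` would give the input file used by the alignment commands.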
Multiple Alignments
clustalw sequences.fasta
t_coffee -seq sequences.fasta
t_coffee -seq sequences.fasta -mode expresso
muscle -in sequences.fasta -out output.aln
COBALT (standalone program, downloaded first):
./cobalt -i sequences.fasta -norps T > output.aln
Conservation and Gaps
Alignment method | Gaps | Avg gap length | 100% cons | >90% cons | >80% cons | >70% cons | >60% cons | >50% cons |
---|---|---|---|---|---|---|---|---|
ClustalW | 12 | 3.75 | 24 | 136 | 53 | 60 | 71 | 91 |
T-Coffee | 25 | 4.56 | 24 | 136 | 53 | 60 | 71 | 91 |
T-Coffee (3D) | 56 | 4.75 | 21 | 326 | 83 | 92 | 67 | 71 |
Cobalt | 19 | 3.26 | 24 | 128 | 68 | 56 | 101 | 72 |
Muscle | 17 | 4.76 | 26 | 193 | 26 | 38 | 43 | 10 |
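The page does not state how gaps and conserved columns were counted. A minimal sketch, assuming a gap is a maximal run of '-' in one sequence and a column's conservation is the share of sequences carrying its most frequent residue, could compute comparable numbers from an alignment exported as aligned FASTA. Each column is counted once, at the highest threshold it reaches; whether the table counts cumulatively or exclusively is not stated. All function names and the toy alignment are hypothetical.

```python
import re
from collections import Counter

THRESHOLDS = [100, 90, 80, 70, 60, 50]


def read_aligned_fasta(text):
    """Return the aligned sequences of a FASTA-formatted alignment, in order."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def gap_stats(aligned):
    """Number of gaps and mean gap length; a gap is a maximal run of '-'."""
    runs = [len(m.group()) for seq in aligned for m in re.finditer(r"-+", seq)]
    mean = sum(runs) / len(runs) if runs else 0.0
    return len(runs), mean


def conservation_bins(aligned):
    """Count columns per conservation threshold (gaps count in the denominator)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    bins = Counter()
    for column in zip(*aligned):
        residues = Counter(c for c in column if c != "-")
        if not residues:
            continue  # all-gap column
        pct = 100 * residues.most_common(1)[0][1] / n
        for t in THRESHOLDS:
            # 100% means fully conserved; other thresholds mean "more than t%".
            if (t == 100 and pct == 100) or (t < 100 and pct > t):
                bins[t] += 1
                break  # count each column once, at its highest threshold
    return dict(bins)


alignment = """>seq1
MKV-LA
>seq2
MKVQLA
>seq3
MRV--A
"""
aligned = read_aligned_fasta(alignment)
print(gap_stats(aligned))          # (2, 1.5)
print(conservation_bins(aligned))  # {100: 3, 60: 2}
```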
Gaps in secondary structure
ClustalW
Gap position | Gap length | Secondary structure |
109-110 | 4 | Helix |
142-143 | 1 | Helix |
235-236 | 1 | Beta strand |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
394-395 | 5 | Helix |
T-Coffee
Gap position | Gap length | Secondary structure |
141-142 | 1 | Helix |
232-233 | 1 | Beta strand |
275-276 | 11 | Helix |
310-311 | 1 | Helix |
369-370 | 5 | Turn |
395-396 | 18 | Helix |
398-399 | 5 | Helix |
T-Coffee (3D)
Gap position | Gap length | Secondary structure |
101-102 | 1 | Helix |
108-109 | 4 | Helix |
115-116 | 1 | Helix |
116-117 | 1 | Helix |
141-142 | 1 | Helix |
153-154 | 1 | Beta strand |
163-164 | 1 | Helix |
177-178 | 3 | Helix |
234-235 | 1 | Beta strand |
263-264 | 4 | Beta strand |
265-266 | 1 | Beta strand |
276-277 | 2 | Helix |
308-309 | 8 | Helix |
309-310 | 5 | Helix |
314-315 | 6 | Helix |
362-363 | 6 | Helix |
371-372 | 4 | Turn |
376-377 | 1 | Helix |
380-381 | 1 | Helix |
382-383 | 7 | Helix |
383-384 | 3 | Helix |
384-385 | 2 | Helix |
387-388 | 2 | Helix |
394-395 | 5 | Helix |
Cobalt
Gap position | Gap length | Secondary structure |
108-109 | 4 | Helix |
141-142 | 1 | Helix |
177-178 | 3 | Helix |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
305-306 | 1 | Helix |
311-312 | 2 | Helix |
387-388 | 1 | Helix |
388-389 | 1 | Helix |
395-396 | 13 | Helix |
Muscle
Gap position | Gap length | Secondary structure |
109-110 | 4 | Helix |
141-142 | 1 | Helix |
177-178 | 3 | Helix |
276-277 | 11 | Helix |
294-295 | 1 | Beta strand |
305-306 | 1 | Helix |
394-395 | 5 | Helix |
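A gap position such as 109-110 can be related to secondary structure by looking up the residues on either side of the gap. The sketch below assumes the positions refer to residues of the reference sequence and that a DSSP-style string (H = helix, E = strand, T = turn) is available for it, e.g. derived from a PDB structure of BCKDHA. `gap_context`, `parse_position` and the structure string are hypothetical; a gap where two elements meet is reported as a boundary rather than assigned to one of them.

```python
# DSSP-style codes mapped to the categories used in the tables.
SS_NAMES = {"H": "Helix", "E": "Beta strand", "T": "Turn"}


def parse_position(text):
    """Split a gap position such as '109-110' into two residue numbers."""
    left, right = (int(x) for x in text.split("-"))
    return left, right


def gap_context(ss, left, right):
    """Secondary structure around a gap between residues left and right.

    ss holds one code per residue of the reference sequence (residue 1 is
    ss[0]); codes other than H, E and T are reported as coil.
    """
    def name(pos):
        return SS_NAMES.get(ss[pos - 1], "Coil")

    a, b = name(left), name(right)
    return a if a == b else f"{a}/{b} boundary"


# Hypothetical 17-residue structure string (not real data).
ss = "CCHHHHHCCEEEETTCC"
for pos in ["4-5", "7-8", "10-11", "14-15"]:
    print(pos, gap_context(ss, *parse_position(pos)))
```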
Functionally important residues
According to UniProt (entry P12694, http://www.uniprot.org/uniprot/P12694), the functionally important sites are:
- Metal binding site, 206
- Metal binding site, 211
- Metal binding site, 212
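To check whether the metal-binding residues 206, 211 and 212 are conserved among the selected sequences, one can look up the alignment column of each reference position and count the residues found there. A minimal sketch, assuming the reference sequence is part of the alignment. UniProt numbers residues of the full-length precursor, so positions may need shifting if the aligned reference is numbered differently. Function names and the toy alignment are hypothetical.

```python
from collections import Counter


def ref_to_column(ref_aligned):
    """Map 1-based residue numbers of the reference to 0-based alignment columns."""
    mapping, pos = {}, 0
    for col, char in enumerate(ref_aligned):
        if char != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def residues_at(aligned, ref_index, position):
    """Characters (gaps included) in the column of a reference residue."""
    col = ref_to_column(aligned[ref_index])[position]
    return Counter(seq[col] for seq in aligned)


# Hypothetical toy alignment; the reference is the first sequence.
aligned = ["MK-DHE", "MKQDHE", "MR-NHE"]
for position in (3, 4):
    print(position, residues_at(aligned, 0, position))
```

With the real alignment, the same call with positions 206, 211 and 212 would show whether the metal-binding residues are shared by all selected sequences.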
References
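Secondary structure information: UniProt entry P12694, http://www.uniprot.org/uniprot/P12694
PDB entry 1U5B, remediated sequence: http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1U5B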