Difference between revisions of "Sequence Alignment GLA"

From Bioinformatikpedia

Revision as of 21:41, 23 May 2011

Sequence Searches

The GLA sequence was searched in the PDB non-redundant (nr) database using the three tools listed below. An additional search was performed with HHsearch on the pdb70 database (max. 70% sequence identity).

Blast

We used NCBI Blast version 2.2.18 with the command:

blast -i sequence.fasta -d database -p blastp

Fasta

As no Fasta program was installed, we downloaded it and installed it on the virtual machine. The command to run the program is:

fasta36 sequence.fasta database

PSI-Blast

We used PSI-Blast version 2.2.18 with the command:

blastpgp -i sequence.fasta -d database -j iterations -h e-value

Parameters

We used the following parameter combinations to run the program:

  • 3 iterations and e-value threshold of 0.005
  • 3 iterations and e-value threshold of 0.002
  • 3 iterations and e-value threshold of 10e-6
  • 5 iterations and e-value threshold of 0.005
  • 5 iterations and e-value threshold of 0.002
  • 5 iterations and e-value threshold of 10e-6
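
The six combinations above can be enumerated programmatically. The following is a minimal Python sketch, not the procedure actually used: it only builds the argument lists and does not run anything. The file name `sequence.fasta` and the database name `database` are the placeholders from the command above, the function name `psiblast_commands` is made up for illustration, and the e-value strings are passed through exactly as listed.

```python
from itertools import product

# Values taken from the list above; e-values are kept as strings so that
# they are passed to the program exactly as written.
ITERATIONS = [3, 5]
E_VALUES = ["0.005", "0.002", "10e-6"]


def psiblast_commands(query="sequence.fasta", database="database"):
    """Return one blastpgp argument list per (iterations, e-value) pair."""
    commands = []
    for iterations, e_value in product(ITERATIONS, E_VALUES):
        commands.append([
            "blastpgp",
            "-i", query,            # input sequence (placeholder name)
            "-d", database,         # database (placeholder name)
            "-j", str(iterations),  # number of iterations
            "-h", e_value,          # e-value threshold
        ])
    return commands


if __name__ == "__main__":
    for cmd in psiblast_commands():
        print(" ".join(cmd))
```

Each argument list could be handed to `subprocess.run` on a machine where `blastpgp` is installed.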

HHsearch

We used the online tool from the Gene Center of the LMU Munich. The results are still available: Results.

HSSP

Overlap

Figure 1: Overlap between Blast-, Fasta- and PSI-Blast results.
Figure 2: Overlap between PSI-Blast results with different iteration numbers.
Figure 3: Overlap between PSI-Blast results with different e-values.
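
How the overlaps in Figures 1 to 3 were computed is not stated on this page. One plausible way is to treat the hits of each program as a set of identifiers and to count the regions of a three-set Venn diagram. The Python sketch below assumes this approach; the function name and the identifiers `hit_a` to `hit_e` are placeholders, not real search results.

```python
def overlap_regions(blast_hits, fasta_hits, psiblast_hits):
    """Count the hit identifiers in each region of a three-set Venn diagram."""
    b, f, p = set(blast_hits), set(fasta_hits), set(psiblast_hits)
    return {
        "Blast only": len(b - f - p),
        "Fasta only": len(f - b - p),
        "PSI-Blast only": len(p - b - f),
        "Blast and Fasta": len((b & f) - p),
        "Blast and PSI-Blast": len((b & p) - f),
        "Fasta and PSI-Blast": len((f & p) - b),
        "all three": len(b & f & p),
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real search results.
    blast = ["hit_a", "hit_b", "hit_c"]
    fasta = ["hit_b", "hit_c", "hit_d"]
    psiblast = ["hit_c", "hit_d", "hit_e"]
    for region, count in overlap_regions(blast, fasta, psiblast).items():
        print(f"{region}: {count}")
```

The same function can compare three PSI-Blast runs with different iteration numbers or e-values by passing their hit lists instead.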

E-value Distribution

Figure 4: Distribution of the e-values.

Figure 4 shows the e-value distributions of the programs. Because all Blast-based programs reported a large number of e-values equal to 0, whose logarithm is undefined, the logarithm of the distribution could not be plotted directly. These values were therefore set to -500 so that a plot could be produced at all. The logarithmic scale was necessary because some outliers were spread so widely that no distribution was visible otherwise (Plot).
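
The handling of zero e-values can be illustrated with a short Python sketch. It rests on two assumptions the text does not confirm: that the logarithm is base 10, and that -500 replaces the logarithm of a zero e-value rather than the e-value itself. The function name is hypothetical.

```python
import math

# Value used in the text for e-values of 0 (assumed to replace the logarithm).
ZERO_LOG_VALUE = -500.0


def log_e_values(e_values, zero_log_value=ZERO_LOG_VALUE):
    """Return log10 of each e-value, substituting a fixed value for zeros.

    log10(0) is undefined, so zero e-values are mapped to zero_log_value.
    Base 10 is an assumption; the text does not name the base.
    """
    return [math.log10(e) if e > 0 else zero_log_value for e in e_values]


if __name__ == "__main__":
    # Made-up e-values for illustration only.
    print(log_e_values([0.0, 1e-50, 0.002, 3.5]))
```

Any value far below the smallest real logarithm works as the substitute; it only needs to keep the zero e-values visibly separate from the rest of the distribution.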


Identity Distribution

Figure 5: Distribution of the identities.

Figure 5 shows the identity distributions of the programs. Because the programs did not all return the same number of sequences, the values are normalized to 100. Despite some differences, the identity distributions are broadly similar, except for HHsearch; the other programs all have peaks between 30% and 45% identity.
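
The normalization to 100 can be sketched in Python as follows. The bin width of 5 percentage points and the function name are assumptions for illustration; the text does not state how the identities were binned.

```python
def normalized_histogram(identities, bin_width=5):
    """Bin percent identities and scale the counts so that they sum to 100.

    Keys of the result are the lower bounds of the bins. The bin width is
    an assumed value, not taken from the original analysis.
    """
    counts = {}
    for identity in identities:
        # Put an identity of exactly 100% into the last bin.
        start = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        counts[start] = counts.get(start, 0) + 1
    total = len(identities)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up identities for illustration only.
    print(normalized_histogram([32, 34, 41, 78]))
```

After this scaling, programs that return different numbers of hits can be compared on the same axis.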


Runtime Analysis

The runtime of each program was measured by prefixing the command line with the time command.

Program                                          Runtime
Blast                                            2:40 min
Fasta                                            5:16 min
PSI-Blast: 3 Iterations: E-value cutoff 10e-6    7:50 min
PSI-Blast: 3 Iterations: E-value cutoff 0.002    7:48 min
PSI-Blast: 3 Iterations: E-value cutoff 0.005    7:55 min
PSI-Blast: 5 Iterations: E-value cutoff 10e-6    13:27 min
PSI-Blast: 5 Iterations: E-value cutoff 0.002    13:06 min
PSI-Blast: 5 Iterations: E-value cutoff 0.005    12:49 min
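
The runtimes above were taken with the shell's time command. A comparable wall-clock measurement could be scripted in Python, as in the sketch below. It is an alternative, not the method used here: it records elapsed wall-clock time only, whereas time also reports user and system time. The function names are hypothetical.

```python
import subprocess
import sys
import time


def timed_run(command):
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


def format_min_sec(seconds):
    """Format seconds in the m:ss style used in the runtime table."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d} min"


if __name__ == "__main__":
    # A trivial command stands in for a real search run.
    code, elapsed = timed_run([sys.executable, "-c", "pass"])
    print(code, format_min_sec(elapsed))
```

To time one of the searches, the argument list of the corresponding command would be passed to `timed_run` instead of the stand-in command.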



Multiple Sequence Alignments

Selection of Sequences

Cobalt

cobalt -i sequences.fasta -norps T

ClustalW

clustalw -infile=sequences.fasta

Muscle

muscle -in sequences.fasta -out muscle_msa.aln

T-Coffee

The basic command to start T-Coffee is:

t_coffee sequences.fasta

3D

To start the 3D mode, the additional parameters -mode expresso -pdb_type dn were appended to the command.
