Difference between revisions of "Parse output.pl"
Kalemanovm (talk | contribs) (Created page with "You can find the script <code>parse_output.pl</code> here on biocluster: <code>/mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1</code>. The usag…") |
Kalemanovm (talk | contribs) |
Revision as of 10:35, 5 May 2013
You can find the script parse_output.pl on biocluster: /mnt/home/student/kalemanovm/master_practical/Assignment2_Alignments/scripts/task1. So far, the script parses (Psi-)BLAST and HHblits hhr output files:
Usage: perl parse_output.pl --out_h <output hhblits file> [--out_p <output psiblast file>]
Optional parameters:
--pdb70, if HHblits was done against pdb70
--pdb_full, if HHblits was done against pdb_full
Note: The flag --pdb_full must be given if the HHblits run was performed against the pdb_full database, and --pdb70 must be given if the clustered pdb70 database was used. If uniprot20 was used, no extra flag is needed. The reason is that the databases use different header formats for the cluster master sequences, in which the IDs of the cluster members are listed (for pdb70, an extra mapping must also be used).
The output is a tab-separated file with the columns:
- id
- evalue
- identity
- similarity
- length
- score
- probability (only for HHblits)
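As a rough illustration of how such a file could be read downstream, here is a minimal Python sketch. It is not part of parse_output.pl. It assumes the file has no header row and that the columns appear in the order listed above; the function name read_hits and the example rows are made up for illustration.

```python
import csv
import io

# Column names in the order listed above; the last one is only present for HHblits.
COLUMNS = ["id", "evalue", "identity", "similarity", "length", "score", "probability"]


def read_hits(handle):
    """Parse tab-separated rows into dicts keyed by column name.

    Assumption: no header row, columns in the order given in COLUMNS.
    Rows without a probability column simply lack that key.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hits.append(dict(zip(COLUMNS, row)))
    return hits


# Made-up example rows: one with a probability (HHblits style), one without.
example = (
    "1abc_A\t1e-30\t45\t60\t120\t210.5\t99.8\n"
    "P12345\t2e-5\t30\t48\t98\t55.1\n"
)
hits = read_hits(io.StringIO(example))
print(hits[0]["id"], hits[0]["probability"])
print("probability" in hits[1])
```

All values are kept as strings here; convert evalue and score to float where numeric comparison is needed.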
The number of hits found is printed to stdout. If both HHblits and (Psi-)BLAST files are given, the overlap of hits with the same ID is calculated.
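The overlap calculation can be pictured as a set intersection over hit IDs. The following Python sketch only illustrates that idea and is not the code used in parse_output.pl; the function name hit_overlap and the IDs are hypothetical.

```python
def hit_overlap(hhblits_ids, blast_ids):
    """Return the set of IDs that occur in both hit lists."""
    return set(hhblits_ids) & set(blast_ids)


# Made-up IDs for illustration only.
hhblits_ids = ["P12345", "Q9XYZ1", "O00001"]
blast_ids = ["Q9XYZ1", "P12345", "A0A000"]

shared = hit_overlap(hhblits_ids, blast_ids)
print(len(hhblits_ids), len(blast_ids), len(shared))
```

Using sets means each ID counts once per method, which matches the duplicate filtering described in the note below.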
Note: The script "filters the duplicates": if more than one HSP with the same ID is found in one output file, only the HSP with the lowest E-value is kept (for both the calculations and the output).
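The duplicate filter described in the note can be sketched in Python as follows. This is an illustration under assumptions, not the script's actual implementation: HSPs are represented as dicts with a numeric evalue, the name filter_duplicates is made up, and on an exact E-value tie the first HSP seen is kept (the script's tie-breaking is not documented here).

```python
def filter_duplicates(hsps):
    """Keep, for each ID, only the HSP with the lowest E-value."""
    best = {}
    for hsp in hsps:
        current = best.get(hsp["id"])
        # Strict '<' keeps the first HSP seen when E-values are equal (assumption).
        if current is None or hsp["evalue"] < current["evalue"]:
            best[hsp["id"]] = hsp
    return list(best.values())


# Made-up HSPs for illustration only.
hsps = [
    {"id": "P12345", "evalue": 1e-5},
    {"id": "P12345", "evalue": 1e-20},
    {"id": "Q9XYZ1", "evalue": 0.01},
]
filtered = filter_duplicates(hsps)
print([(h["id"], h["evalue"]) for h in filtered])
```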