Latest revision as of 14:58, 11 June 2012
Fabry Disease » Homology based structure predictions » Journal
Dataset preparation
The homology search was performed online and produced two output files, hhpred.out and coma.out, for the structure searches with HHpred and COMA, respectively. In both cases, we used the default values, thresholds and databases. From each of the two result files we tried to create three distinct datasets with the required sequence identity to the target protein, using the following calls and scripts.
$ perl make_dataset_hhpred.pl hhpred.out 0.0000000000000001
This resulted in the HHpred datasets mentioned in Dataset preparation Table 1 and the corresponding pdb structure files.
$ perl make_dataset_coma.pl coma.out 0.002
This resulted in the Coma datasets mentioned in Dataset preparation Table 2 and the corresponding pdb structure files.
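The dataset construction can be sketched as follows. This is not the Perl code of make_dataset_hhpred.pl or make_dataset_coma.pl; it is a minimal Python illustration that assumes the numeric command line argument is an E-value cutoff and that the three datasets correspond to sequence identity ranges. The names `Hit`, `build_datasets` and `EXAMPLE_BINS`, the bin boundaries, and the E-values in the example are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One search hit (hypothetical record layout)."""
    pdb_id: str
    evalue: float
    identity: float  # percent sequence identity to the target, 0-100


def build_datasets(hits: List[Hit],
                   max_evalue: float,
                   bins: Dict[str, Tuple[float, float]]) -> Dict[str, List[str]]:
    """Drop hits above the cutoff, then sort the rest into identity bins."""
    datasets: Dict[str, List[str]] = {name: [] for name in bins}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant enough for any dataset
        for name, (low, high) in bins.items():
            if low <= hit.identity < high:
                datasets[name].append(hit.pdb_id)
                break  # each hit belongs to at most one dataset
    return datasets


# Assumed identity ranges; the real thresholds are defined in the Perl scripts.
EXAMPLE_BINS = {"low": (0, 30), "medium": (40, 80), "high": (80, 101)}

# The identities are the ones quoted for the Modeller runs on this page;
# the E-values are invented for illustration only.
example_hits = [
    Hit("3HG3", 1e-100, 100.0),
    Hit("1KTB", 1e-80, 53.0),
    Hit("3CC1", 1e-20, 25.0),
]

if __name__ == "__main__":
    print(build_datasets(example_hits, 1e-16, EXAMPLE_BINS))
```

A hit whose identity falls between two bins (for example 35% with the ranges above) is deliberately left out, so that the datasets stay distinct.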
Calculation of models
Modeller
The following steps resulted in 10 models, which can all be downloaded here.
Default settings
For the standard homology modeling with Modeller, two basic scripts were used, which build on the ones described in the recommended tutorial: 1_align.py and 2_Single_template_modeling.py (the linked files are the versions for 1ktb). They need the appropriate template pdb structure files as well as the target (AGAL) sequence in pir format as input:
>P1;1R46
sequence:1R46:::::::0.00: 0.00
MQLRNPELHLGCALALRFLALVSWDIPGARALDNGLARTPTMGWLHWERFMCNLDCQEEP
DSCISEKLFMEMAELMVSEGWKDAGYEYLCIDDCWMAPQRDSEGRLQADPQRFPHGIRQL
ANYVHSKGLKLGIYADVGNKTCAGFPGSFGYYDIDAQTFADWGVDLLKFDGCYCDSLENL
ADGYKHMSLALNRTGRSIVYSCEWPLYMWPFQKPNYTEIRQYCNHWRNFADIDDSWKSIK
SILDWTSFNQERIVDVAGPGGWNDPDMLVIGNFGLSWNQQVTQMALWAIMAAPLFMSNDL
RHISPQAKALLQDKDVIAINQDPLGKQGYQLRQGDNFEVWERPLSGLAWAVAMINRQEIG
GPRSYTIAVASLGKGVACNPACFITQLLPVKRKLGFYEWTSRLRSHINPTGTVLLQLENT
MQMSLKDLL*
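To make the layout of such a pir entry explicit, here is a small Python sketch that checks the three parts: the `>P1;` code line, the colon-separated description line with ten fields, and the sequence terminated by `*`. The function name `parse_pir` is hypothetical, and the example entry is shortened to the first and last sequence line of the target, so it is not the full AGAL sequence.

```python
from typing import List, Tuple


def parse_pir(text: str) -> Tuple[str, List[str], str]:
    """Split a single PIR entry into (code, header fields, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">P1;"):
        raise ValueError("entry must start with a '>P1;code' line")
    code = lines[0][len(">P1;"):]
    # Description line: ten colon-separated fields, the first being the entry type.
    fields = lines[1].split(":")
    if len(fields) != 10:
        raise ValueError(f"expected 10 header fields, found {len(fields)}")
    sequence = "".join(lines[2:])
    if not sequence.endswith("*"):
        raise ValueError("sequence must be terminated by '*'")
    return code, fields, sequence[:-1]


# Shortened example: only the first and last sequence lines of the entry above.
EXAMPLE_ENTRY = """\
>P1;1R46
sequence:1R46:::::::0.00: 0.00
MQLRNPELHLGCALALRFLALVSWDIPGARALDNGLARTPTMGWLHWERFMCNLDCQEEP
MQMSLKDLL*
"""

if __name__ == "__main__":
    code, fields, sequence = parse_pir(EXAMPLE_ENTRY)
    print(code, fields[0], len(sequence))
```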
The first script creates an alignment based only on the structure of template and target, as well as a 2d-alignment that is also based on the given structural data. The second script creates the actual model and assesses it with the DOPE score and the GA341 method (see Melo et al., 2002 and John & Šali, 2003). The following runs were performed on our home computers:
#3HG3 100%
$ mod9.10 1_align_3hg3.py
$ mod9.10 2_Single_template_modeling_3hg3.py

#1KTB 53%
$ mod9.10 1_align.py
$ mod9.10 2_Single_template_modeling.py

#3CC1 25%
$ mod9.10 1_align_3cc1.py
$ mod9.10 2_Single_template_modeling_3cc1.py
The scores were read from the log files with the perl script read_logfiles_modeller.pl.
$ perl read_logfiles_modeller.pl 1_align_3hg3.log 2_Single_template_modeling_3hg3.log
$ perl read_logfiles_modeller.pl 1_align.log 2_Single_template_modeling.log
$ perl read_logfiles_modeller.pl 1_align_3cc1.log 2_Single_template_modeling_3cc1.log
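Reading the scores from a modeling log could look roughly like the following Python sketch. It is not the code of read_logfiles_modeller.pl. It assumes that the log contains a summary table with one row per model file and the columns molpdf, DOPE score and GA341 score, in that order; the exact layout should be checked against a real log. The function name `read_model_scores`, the file name, and the numbers in the sample text are made up.

```python
import re
from typing import Dict

# One summary row: a file name ending in .pdb followed by three numbers.
ROW = re.compile(
    r"^(?P<name>\S+\.pdb)\s+(?P<molpdf>-?\d+\.\d+)\s+"
    r"(?P<dope>-?\d+\.\d+)\s+(?P<ga341>-?\d+\.\d+)$"
)


def read_model_scores(log_text: str) -> Dict[str, Dict[str, float]]:
    """Collect molpdf, DOPE and GA341 values for every model row found."""
    scores: Dict[str, Dict[str, float]] = {}
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            scores[match.group("name")] = {
                "molpdf": float(match.group("molpdf")),
                "DOPE": float(match.group("dope")),
                "GA341": float(match.group("ga341")),
            }
    return scores


# Invented sample in the assumed layout; the values are not real results.
SAMPLE_LOG = """\
>> Summary of successfully produced models:
Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
example.B99990001.pdb         2000.00000   -50000.00000        1.00000
"""

if __name__ == "__main__":
    for name, values in read_model_scores(SAMPLE_LOG).items():
        print(name, values)
```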
The visualization was done with Pymol and the commands listed in pymol_single.txt.
Multiple templates
For multiple templates, three scripts are needed. They are again based on the tutorial and are shown here for one case only: 3_multiAli_1.py, 4_add1R46_1.py and 5_multipleMod_1.py. In the last script we added the assessment methods, since otherwise there would have been no quality scores for comparison. The following combinations were performed:
#Multiple 3HG3,1KTB
$ mod9.10 3_multiAli_1.py
$ mod9.10 4_add1R46_1.py
$ mod9.10 5_multipleMod_1.py

#Multiple 3HG3,1KTB,3CC1
$ mod9.10 3_multiAli_2.py
$ mod9.10 4_add1R46_2.py
$ mod9.10 5_multipleMod_2.py

#Multiple 3CC1, 3ZSS, 3A24
$ mod9.10 3_multiAli_3.py
$ mod9.10 4_add1R46_3.py
$ mod9.10 5_multipleMod_3.py

#Multiple 3CC1, 3HG3
$ mod9.10 3_multiAli_4.py
$ mod9.10 4_add1R46_4.py
$ mod9.10 5_multipleMod_4.py
The scores were read from the log file with the perl script read_logfiles_modeller_CHAS.pl.
$ perl read_logfiles_modeller_CHAS.pl 5_multipleMod_1.log MULTI1
$ perl read_logfiles_modeller_CHAS.pl 5_multipleMod_2.log MULTI2
$ perl read_logfiles_modeller_CHAS.pl 5_multipleMod_3.log MULTI3
$ perl read_logfiles_modeller_CHAS.pl 5_multipleMod_4.log MULTI4
The visualization was done with Pymol and the commands listed in pymol_multi.txt.
Edited Alignment input
For the edited alignment models, only the second Modeller script 2_Single_template_modeling.py was needed. Edited alignments were provided as input. We rearranged them with the program SeaView.
#Active Site shifted right to next D (7 and 1) in -2d.ali
$ mod9.10 2_Single_template_modeling_3hg3_changed_actSite.py

#Active Site shifted right to next D (7 and 1) in both ali files
$ mod9.10 2_Single_template_modeling_3hg3_changed_actSite_2.py

#Active Site shifted right to next D (7 and 1) in both ali files + Substrate binding region (203-207) forced to be consecutive
$ mod9.10 2_Single_template_modeling_3hg3_changed_actSite_3.py
The scores were read from the log file with the perl script read_logfiles_modeller_CHAS.pl.
$ perl read_logfiles_modeller_CHAS.pl 2_Single_template_modeling_3hg3_changed_actSite.log CHAS
$ perl read_logfiles_modeller_CHAS.pl 2_Single_template_modeling_3hg3_changed_actSite_2.log CHAS2
$ perl read_logfiles_modeller_CHAS.pl 2_Single_template_modeling_3hg3_changed_actSite_3.log CHAS3
The visualization was done with Pymol and the commands listed in pymol_changed.txt.
SwissModel
The SwissModel calculations were performed online.
Pictures were created either with Pymol (see pymol_swiss.txt) or with R:
$ R CMD BATCH localError.R
iTasser
As input sequence, the already known Alpha-galactosidase sequence was used for I-Tasser.
The models were calculated using Option II: Exclude some templates from I-TASSER template library, in order to exclude homologous templates. In the different runs, the field Exclude homologous templates was
- left blank
- set to 80%
- set to 30%
- set to 20%
Each run produced up to five models, which were downloaded and stored in folders named AGAL_HUMAN*. Finally, the script calc_itasser_scores.sh was applied to the models to calculate the different measures using the TMscore and sap binaries. It also invokes the scripts tm2wikirow.pl and sap2wikirow.pl to parse the output of these tools.
$ bash calc_itasser_scores.sh
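The step of turning parsed scores into wiki markup can be illustrated with a short Python sketch. It is not the code of tm2wikirow.pl or sap2wikirow.pl; it only shows how one table row in MediaWiki syntax can be assembled from a label and a list of numbers. The function name `wiki_row`, the number of decimals, and the example values are hypothetical.

```python
from typing import Iterable


def wiki_row(label: str, values: Iterable[float], digits: int = 3) -> str:
    """Format one MediaWiki table row: a row separator plus '||'-separated cells."""
    cells = " || ".join(f"{value:.{digits}f}" for value in values)
    return f"|-\n| {label} || {cells}"


if __name__ == "__main__":
    # Invented label and numbers, for illustration only.
    print(wiki_row("model1", [0.5, 1.25]))
```

The rows produced this way still have to be wrapped in the table start `{|` and the table end `|}` before they render as a table.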
3D-Jigsaw
As input sequence for 3D-Jigsaw, the already known Alpha-galactosidase sequence was used and the upload mode was selected. The input model sets are listed in detail in the 3D-Jigsaw section. These sets were generated by the following scripts, which use repairPDB to fix the input files:
$ bash gen_top5_30.sh
$ bash gen_top5_40_80.sh
$ bash gen_top5_80.sh
When the jobs were done, the results were downloaded with
$ bash fetch_jigsaw_results.sh
and the RMSD, TM-Score, etc. were calculated:
$ bash calc_3djigsaw_scores.sh
Evaluation
TM-score
The scores were computed on a home computer with the TM-score version of 2012/05/07. We used a bash script that calls the TM-score program for all models, together with the perl script read_TMoutput.pl, which extracts the calculated scores and writes them to a file in MediaWiki table format. The bash script requires the template pdb file and the directory containing all models as input.
#MODELLER
$ ./calculate_TMScore.sh ../Modeller/pdb/1R46.pdb ../Modeller/final_models/
$ ./calculate_TMScore.sh ../Modeller/pdb/1R47.pdb ../Modeller/final_models/

#SWISSMODEL
$ ./calculate_TMScore.sh ../Modeller/pdb/1R46.pdb ../Swissmodel/final_models/
$ ./calculate_TMScore.sh ../Modeller/pdb/1R47.pdb ../Swissmodel/final_models/
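For orientation, the quantity behind these runs can be written down in a few lines of Python. The sketch evaluates the TM-score formula, the sum of 1 / (1 + (d_i / d0)^2) over the aligned residue pairs divided by the target length, with d0 = 1.24 * (L - 15)^(1/3) - 1.8, for a given list of distances. It does not perform the superposition search that the TM-score program carries out to maximise this value, so it is not a replacement for the program. The function names `tm_d0` and `tm_score` are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in Angstrom."""
    if target_length <= 15:
        raise ValueError("this sketch only covers targets longer than 15 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for fixed distances (in Angstrom) between aligned residue pairs."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalising by the target length penalises residues that are not aligned.
    return total / target_length


if __name__ == "__main__":
    # Identical structures: every distance is zero.
    print(tm_score([0.0] * 100, 100))
```

Because each term lies between 0 and 1 and the sum is divided by the target length, the score stays between 0 and 1 regardless of protein size.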
RMSD with SAP
Due to some server errors, we calculated most of the RMSD values online with the SAP Web Tool, which gave the same output as the command line tool. The output was read by another pair of bash and perl scripts to extract the calculated scores and create a table.
$ ./readRMSD.sh
The PNG files used for the animated GIFs were created with Pymol and the commands listed in pdb_superimpose.txt. Afterwards, the PNG files were animated with the program GIF Movie Gear.
RMSD around catalytic site
The all-atom RMSD of the atoms within a radius of 6 Angstrom around the catalytic centre was calculated with Pymol and the commands that can be found in RMSD_6A.txt.
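The calculation itself can be sketched in plain Python. This is not the content of RMSD_6A.txt; it is an illustration that assumes both structures are already superimposed, that atoms are matched by a common key, and that the selection is made on the reference structure. The function name `rmsd_near`, the atom keys, and the coordinates in the example are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple

Coord = Tuple[float, float, float]


def rmsd_near(reference: Dict[Hashable, Coord],
              model: Dict[Hashable, Coord],
              centre: Coord,
              radius: float = 6.0) -> float:
    """RMSD over all atoms lying within `radius` of `centre` in the reference."""
    selected = [
        key for key, xyz in reference.items()
        if math.dist(xyz, centre) <= radius and key in model
    ]
    if not selected:
        raise ValueError("no matching atoms inside the selection sphere")
    squared = sum(math.dist(reference[key], model[key]) ** 2 for key in selected)
    return math.sqrt(squared / len(selected))


if __name__ == "__main__":
    # Toy coordinates: two atoms near the centre, one far away.
    ref = {"a1": (0.0, 0.0, 0.0), "a2": (3.0, 0.0, 0.0), "a3": (10.0, 0.0, 0.0)}
    mod = {"a1": (1.0, 0.0, 0.0), "a2": (3.0, 0.0, 1.0), "a3": (15.0, 0.0, 0.0)}
    print(rmsd_near(ref, mod, centre=(0.0, 0.0, 0.0)))
```

The atom that is far from the centre does not contribute, however badly it is modelled, which is the point of restricting the RMSD to the neighbourhood of the catalytic site.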
DOPE-score
The DOPE profile was read with another script, based on a tutorial on the MODELLER website. It reads the DOPE score for each residue and writes the scores to a file. These scores can be plotted with the R script Profile.R.
$ mod9.10 6_DOPE_score.py
$ R CMD BATCH Profile.R
Comparison
Visual comparisons were performed with Pymol and the commands listed in comparison.txt.