Difference between revisions of "TSD Homology modelling protocol"

From Bioinformatikpedia
m (Modeller)
 
(18 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
Back to [[Homology modelling TSD| results]].
 
Back to [[Homology modelling TSD| results]].
  +
==Template selection ==
  +
[http://toolkit.tuebingen.mpg.de/hhpred HHpred] and [http://bioinformatics.ibt.lt:8085/coma/ COMA] were employed in order to complement the 3 sequence identity classes.
   
 
==SWISS-MODEL ==
 
==SWISS-MODEL ==
 
The modelling was performed with the webserver of SWISS-MODEL in [http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1 automated mode] and [http://swissmodel.expasy.org/workspace/index.php?func=modelling_align1 alignment mode].
 
The modelling was performed with the webserver of SWISS-MODEL in [http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1 automated mode] and [http://swissmodel.expasy.org/workspace/index.php?func=modelling_align1 alignment mode].
  +
  +
===Visualisation with Pymol===
  +
SWISS-MODEL provides a model file containing the error coding in the B factor column.
  +
This error coding was adopted for the superimposed reference-model illustration. The command therefore is:
  +
spectrum b, red_blue, <model-name>
  +
  +
==iTasser ==
  +
  +
==Modeller ==
  +
Get all necessary files
  +
<source lang="bash">
  +
cd ../input/
  +
  +
#HEXA sequence
  +
wget http://www.uniprot.org/uniprot/P06865.fasta
  +
sed -ri 's/^>.+/>P06865/' P06865.fasta
  +
  +
#Reference structure
  +
wget http://www.pdb.org/pdb/files/2GJX.pdb
  +
  +
#Templates
  +
wget http://www.pdb.org/pdb/files/2GK1.pdb
  +
wget http://www.pdb.org/pdb/files/3GH5.pdb
  +
wget http://www.pdb.org/pdb/files/1O7A.pdb
  +
</source>
  +
  +
Create the pairwise alignments, using Modeller internal methods
  +
<source lang=python>
  +
#!/usr/bin/env python
  +
from modeller import *
  +
  +
OUT='../prediction/'
  +
  +
def makeAli(targetId, targetFile, templateId, templateFile, templateChain) :
  +
env = environ()
  +
aln = alignment(env)
  +
mdl = model(env, file=templateFile, model_segment=('FIRST:'+templateChain, 'LAST:'+templateChain))
  +
aln.append_model(mdl, align_codes=templateId, atom_files=templateFile)
  +
aln.append(file=targetFile, align_codes=targetId, alignment_format='FASTA')
  +
aln.align2d()
  +
aln.check()
  +
aln.write(file=OUT+targetId+templateId+'-2d.ali', alignment_format='PIR')
  +
aln.malign()
  +
aln.check()
  +
aln.write(file=OUT+targetId+templateId+'.ali', alignment_format='PIR')
  +
return
  +
  +
makeAli('P06865','../input/P06865.fasta','2GK1', '../input/2GK1.pdb','A' )
  +
makeAli('P06865','../input/P06865.fasta','1O7A', '../input/1O7A.pdb','D' )
  +
makeAli('P06865','../input/P06865.fasta','3GH5', '../input/3GH5.pdb','A' )
  +
</source>
  +
  +
Call modeller on a single alignment file
  +
<source lang="python">
  +
#!/usr/bin/env python
  +
  +
from modeller import *
  +
from modeller.automodel import *
  +
import os
  +
import sys
  +
  +
ali = sys.argv[1]
  +
strc = sys.argv[2]
  +
wd = sys.argv[3]
  +
  +
os.chdir(wd)
  +
  +
env = environ()
  +
  +
a = automodel(env,
  +
alnfile = ali,
  +
knowns = strc,
  +
sequence = 'P06865',
  +
assess_methods=(assess.DOPE, assess.GA341))
  +
  +
a.starting_model= 1
  +
a.ending_model = 1
  +
a.make()
  +
</source>
  +
  +
Caller script for all modeller predictions
  +
  +
<source lang="bash">
  +
#!/bin/bash
  +
  +
#Ugly hotfix so we can work in our own folders later on (who hardcodes filepaths AND doesn't allow output folders?!)
  +
sed -i 's|X:../input|X:../../input|' ../prediction/*.ali
  +
  +
PDBS=( 2GK1 1O7A 3GH5 )
  +
  +
for pdb in "${PDBS[@]}" #Gotta love bash syntax
  +
do
  +
echo "$pdb"
  +
mkdir -p ../prediction/$pdb/
  +
mkdir -p ../prediction/${pdb}_2D/
  +
  +
./createModel.py ../P06865${pdb}.ali ${pdb} ../prediction/$pdb/ &> ../prediction/$pdb/log
  +
./createModel.py ../P06865${pdb}-2d.ali ${pdb} ../prediction/${pdb}_2D/ &> ../prediction/${pdb}_2D/log
  +
  +
done
  +
  +
mkdir -p ../prediction/3GH5_ED/
  +
./createModel.py ../P068653GH5_edited.pir 3GH5 ../prediction/3GH5_ED/ &> ../prediction/3GH5_ED/log
  +
</source>
  +
  +
Creating MSAs:
  +
<source lang="python">
  +
#!/usr/bin/env python
  +
from modeller import *
  +
import os
  +
  +
OUT='../prediction/'
  +
IN='../input/'
  +
  +
P=OUT+'3GH5_3NSM/'
  +
  +
if not os.path.exists (P) :
  +
os.mkdir(P)
  +
  +
env = environ()
  +
aln = alignment(env)
  +
  +
for (code, chain) in (('3GH5', 'A'), ('3NSM', 'A')):
  +
mdl = model(env, file=IN+code+'.pdb', model_segment=('FIRST:'+chain, 'LAST:'+chain))
  +
aln.append_model(mdl, atom_files=code, align_codes=code+chain)
  +
aln.salign()
  +
aln.write(file=P+'mymas.pap', alignment_format='PAP')
  +
aln.write(file=P+'mymsa.ali', alignment_format='PIR')
  +
  +
aln = alignment(env)
  +
aln.append(file=P+'mymsa.ali', align_codes='all')
  +
aln_block = len(aln)
  +
# Read aligned sequence(s):
  +
aln.append(file='../input/P06865.fasta', align_codes='P06865', alignment_format='FASTA')
  +
# Structure sensitive variable gap penalty sequence-sequence alignment:
  +
aln.salign()
  +
aln.write(file=P+'mymsa.ali', alignment_format='PIR')
  +
aln.write(file=P+'mymsa.pap', alignment_format='PAP')
  +
#And the same thing again for 3gh5 and 2gk1
  +
</source>
  +
  +
Script for creating the models from the MSAs:
  +
<source lang="python">
  +
#!/usr/bin/env python
  +
  +
from modeller import *
  +
from modeller.automodel import *
  +
import os
  +
  +
OUT='../prediction'
  +
  +
os.chdir(OUT)
  +
  +
P='3GH5_3NSM/'
  +
  +
os.chdir(P)
  +
  +
env = environ()
  +
a = automodel(env, alnfile='mymsa.ali',
  +
knowns=('3GH5A', '3NSMA'), sequence='P06865')
  +
a.starting_model = 1
  +
a.ending_model = 1
  +
a.make()
  +
#And the same thing again for 3gh5 and 2gk1
  +
</source>
  +
  +
==3D-Jigsaw==
  +
PDB files consisting of multiple structures were created using repairPDB, for each sequence identity group 3 models were used. These pdb files were then used as input for [http://bmm.cancerresearchuk.org/~populus/populus_submit.html 3D-Jigsaw]. <br>
  +
Example script
  +
<source lang="bash">
  +
#!/bin/bash
  +
  +
IN=/mnt/home/student/meiera/4_HomoMod
  +
  +
for model in $IN/1_SWISSMODEL/2gk1/SwissModel_2gk1A.pdb $IN/2_MODELLER/prediction/2GK1/P06865.B99990001.pdb $IN/3_iTASSER/iTasser/2gk1/model1.pdb; do
  +
perl repairPDB $model >> highseq.pdb
  +
done
  +
  +
</source>
  +
  +
==Evaluation==
  +
  +
=== SAP and TMScore ===
  +
The two scripts were called as follows, exemplified here by the Modeller data.
  +
<source lang="bash">
  +
#!/bin/bash
  +
  +
  +
PDBS=( 2GK1 1O7A 3GH5 )
  +
  +
for pdb in "${PDBS[@]}" #Gotta love bash syntax
  +
do
  +
echo "$pdb"
  +
/mnt/project/pracstrucfunc12/bin/sap ../prediction/$pdb/P06865.B99990001.pdb ../REFERENZchaina.pdb &> ../prediction/$pdb/sap.log
  +
/mnt/project/pracstrucfunc12/bin/sap ../prediction/${pdb}_2D/P06865.B999900012d.pdb ../REFERENZchaina.pdb &> ../prediction/${pdb}_2D/sap.log
  +
  +
/mnt/project/pracstrucfunc12/bin/TMscore ../prediction/$pdb/P06865.B99990001.pdb ../REFERENZchaina.pdb &> ../prediction/$pdb/tmscore.log
  +
/mnt/project/pracstrucfunc12/bin/TMscore ../prediction/${pdb}_2D/P06865.B999900012d.pdb ../REFERENZchaina.pdb &> ../prediction/${pdb}_2D/tmscore.log
  +
done
  +
  +
/mnt/project/pracstrucfunc12/bin/TMscore ../prediction/3GH5_ED/P06865.B99990001ed.pdb ../REFERENZchaina.pdb &> ../prediction/3GH5_ED/tmscore.log
  +
/mnt/project/pracstrucfunc12/bin/sap ../prediction/3GH5_ED/P06865.B99990001ed.pdb ../REFERENZchaina.pdb &> ../prediction/3GH5_ED/sap.log
  +
</source>
  +
===6Å RMSD using Pymol===
  +
*Load reference and model and align them
  +
*Select active site from reference
  +
*Expand selection by 6Å
  +
modify -> expand -> by 6 A
  +
*Exclude the model sequence
  +
modify -> exclude -> object
  +
*Extract selection into object
  +
*Select according region from model
  +
*Align selection to object

Latest revision as of 22:42, 4 June 2012

Back to results.

Template selection

HHpred and COMA were employed in order to complement the 3 sequence identity classes.

SWISS-MODEL

The modelling was performed with the webserver of SWISS-MODEL in automated mode and alignment mode.

Visualisation with Pymol

SWISS-MODEL provides a model file containing the error coding in the B factor column. This error coding was adopted for the superimposed reference-model illustration. The command therefore is:

spectrum b, red_blue, <model-name>

iTasser

Modeller

Get all necessary files <source lang="bash"> cd ../input/

  1. HEXA sequence

wget http://www.uniprot.org/uniprot/P06865.fasta sed -ri 's/^>.+/>P06865/' P06865.fasta

  1. Reference structure

wget http://www.pdb.org/pdb/files/2GJX.pdb

  1. Templates

wget http://www.pdb.org/pdb/files/2GK1.pdb wget http://www.pdb.org/pdb/files/3GH5.pdb wget http://www.pdb.org/pdb/files/1O7A.pdb </source>

Create the pairwise alignments, using Modeller internal methods <source lang=python>

  1. !/usr/bin/env python

from modeller import *

OUT='../prediction/'

def makeAli(targetId, targetFile, templateId, templateFile, templateChain) :

       env = environ()
       aln = alignment(env)
       mdl = model(env, file=templateFile, model_segment=('FIRST:'+templateChain, 'LAST:'+templateChain))
       aln.append_model(mdl, align_codes=templateId, atom_files=templateFile)
       aln.append(file=targetFile, align_codes=targetId, alignment_format='FASTA')
       aln.align2d()
       aln.check()
       aln.write(file=OUT+targetId+templateId+'-2d.ali', alignment_format='PIR')
       aln.malign()
       aln.check()
       aln.write(file=OUT+targetId+templateId+'.ali', alignment_format='PIR')
       return

makeAli('P06865','../input/P06865.fasta','2GK1', '../input/2GK1.pdb','A' ) makeAli('P06865','../input/P06865.fasta','1O7A', '../input/1O7A.pdb','D' ) makeAli('P06865','../input/P06865.fasta','3GH5', '../input/3GH5.pdb','A' ) </source>

Call modeller on a single alignment file <source lang="python">

  1. !/usr/bin/env python

from modeller import * from modeller.automodel import * import os import sys

ali = sys.argv[1] strc = sys.argv[2] wd = sys.argv[3]

os.chdir(wd)

env = environ()

a = automodel(env,

            alnfile  = ali,
            knowns   = strc,
            sequence = 'P06865',
            assess_methods=(assess.DOPE, assess.GA341))

a.starting_model= 1 a.ending_model = 1 a.make() </source>

Caller script for all modeller predictions

<source lang="bash">

  1. !/bin/bash
  1. Ugly hotfix so we can work in our own folders later on (who hardcodes filepaths AND doesn't allow output folders?!)

sed -i 's|X:../input|X:../../input|' ../prediction/*.ali

PDBS=( 2GK1 1O7A 3GH5 )

for pdb in "${PDBS[@]}" #Gotta love bash syntax do

 echo "$pdb"
 mkdir -p ../prediction/$pdb/
 mkdir -p ../prediction/${pdb}_2D/
 ./createModel.py ../P06865${pdb}.ali ${pdb} ../prediction/$pdb/ &> ../prediction/$pdb/log
 ./createModel.py ../P06865${pdb}-2d.ali ${pdb} ../prediction/${pdb}_2D/ &> ../prediction/${pdb}_2D/log

done

mkdir -p ../prediction/3GH5_ED/ ./createModel.py ../P068653GH5_edited.pir 3GH5 ../prediction/3GH5_ED/ &> ../prediction/3GH5_ED/log </source>

Creating MSAs: <source lang="python">

  1. !/usr/bin/env python

from modeller import * import os

OUT='../prediction/' IN='../input/'

P=OUT+'3GH5_3NSM/'

if not os.path.exists (P) :

       os.mkdir(P)

env = environ() aln = alignment(env)

for (code, chain) in (('3GH5', 'A'), ('3NSM', 'A')):

  mdl = model(env, file=IN+code+'.pdb', model_segment=('FIRST:'+chain, 'LAST:'+chain))
  aln.append_model(mdl, atom_files=code, align_codes=code+chain)

aln.salign() aln.write(file=P+'mymas.pap', alignment_format='PAP') aln.write(file=P+'mymsa.ali', alignment_format='PIR')

aln = alignment(env) aln.append(file=P+'mymsa.ali', align_codes='all') aln_block = len(aln)

  1. Read aligned sequence(s):

aln.append(file='../input/P06865.fasta', align_codes='P06865', alignment_format='FASTA')

  1. Structure sensitive variable gap penalty sequence-sequence alignment:

aln.salign() aln.write(file=P+'mymsa.ali', alignment_format='PIR') aln.write(file=P+'mymsa.pap', alignment_format='PAP')

  1. And the same thing again for 3gh5 and 2gk1

</source>

Script for creating the models from the MSAs: <source lang="python">

  1. !/usr/bin/env python

from modeller import * from modeller.automodel import * import os

OUT='../prediction'

os.chdir(OUT)

P='3GH5_3NSM/'

os.chdir(P)

env = environ() a = automodel(env, alnfile='mymsa.ali',

             knowns=('3GH5A', '3NSMA'), sequence='P06865')

a.starting_model = 1 a.ending_model = 1 a.make()

  1. And the same thing again for 3gh5 and 2gk1

</source>

3D-Jigsaw

PDB files consisting of multiple structures were created using repairPDB, for each sequence identity group 3 models were used. These pdb files were then used as input for 3D-Jigsaw.
Example script <source lang="bash">

  1. !/bin/bash

IN=/mnt/home/student/meiera/4_HomoMod

for model in $IN/1_SWISSMODEL/2gk1/SwissModel_2gk1A.pdb $IN/2_MODELLER/prediction/2GK1/P06865.B99990001.pdb $IN/3_iTASSER/iTasser/2gk1/model1.pdb; do

        perl repairPDB $model >> highseq.pdb

done

</source>

Evaluation

SAP and TMScore

The two scripts were called as follows, exemplified here by the Modeller data. <source lang="bash">

  1. !/bin/bash


PDBS=( 2GK1 1O7A 3GH5 )

for pdb in "${PDBS[@]}" #Gotta love bash syntax do

 echo "$pdb"
 /mnt/project/pracstrucfunc12/bin/sap ../prediction/$pdb/P06865.B99990001.pdb ../REFERENZchaina.pdb &> ../prediction/$pdb/sap.log
 /mnt/project/pracstrucfunc12/bin/sap ../prediction/${pdb}_2D/P06865.B999900012d.pdb ../REFERENZchaina.pdb &> ../prediction/${pdb}_2D/sap.log
 /mnt/project/pracstrucfunc12/bin/TMscore ../prediction/$pdb/P06865.B99990001.pdb ../REFERENZchaina.pdb &> ../prediction/$pdb/tmscore.log
 /mnt/project/pracstrucfunc12/bin/TMscore ../prediction/${pdb}_2D/P06865.B999900012d.pdb ../REFERENZchaina.pdb &> ../prediction/${pdb}_2D/tmscore.log

done

/mnt/project/pracstrucfunc12/bin/TMscore ../prediction/3GH5_ED/P06865.B99990001ed.pdb ../REFERENZchaina.pdb &> ../prediction/3GH5_ED/tmscore.log /mnt/project/pracstrucfunc12/bin/sap ../prediction/3GH5_ED/P06865.B99990001ed.pdb ../REFERENZchaina.pdb &> ../prediction/3GH5_ED/sap.log </source>

6Å RMSD using Pymol

  • Load reference and model and align them
  • Select active site from reference
  • Expand selection by 6Å
modify -> expand -> by 6 A
  • Exclude the model sequence
modify -> exclude -> object
  • Extract selection into object
  • Select according region from model
  • Align selection to object