CHAINSAW (CCP4: Supported Program)

NAME

chainsaw - Mutate a pdb file according to an input sequence alignment

SYNOPSIS

chainsaw xyzin foo.pdb alignin foo_ali.ali xyzout foo_out.pdb
[Keyworded input]

DESCRIPTION

Chainsaw is a Molecular Replacement utility which takes an alignment between target and model sequences and modifies the model pdb file by pruning non-conserved residues. The pruning can be done in one of three ways: back to the gamma atom, back to the beta atom, or in such a way as to retain all atoms common to the target and model residues. The names of the retained atoms will be changed if necessary to match the target. Conserved residues are left unchanged.

Residues in the output pdb file are numbered consistently with the target sequence. If a residue in the target corresponds to a gap in the model, the residue numbers in the output pdb file will contain a gap at that position. If a residue in the model corresponds to a gap in the target, that residue is not carried over, so the residue numbers in the output pdb file remain consecutive.
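
The numbering rule can be illustrated with a short Python sketch. It is not taken from the chainsaw source: the function name is hypothetical, and it assumes that '-' marks a gap and that target residues are numbered consecutively from 1.

```python
def number_output_residues(target_aln, model_aln, first_target_number=1):
    """Return (residue_number, target_char, model_char) for each output residue.

    Illustrative only: assumes '-' is the gap character and that the target
    is numbered consecutively starting at first_target_number.
    """
    if len(target_aln) != len(model_aln):
        raise ValueError("aligned sequences must have equal length")
    kept = []
    number = first_target_number - 1
    for t, m in zip(target_aln, model_aln):
        if t != "-":
            number += 1  # every target residue consumes a number
        if t == "-" or m == "-":
            # Target gap: the model residue is dropped, numbering stays consecutive.
            # Model gap: no coordinates exist, so this number is skipped in the output.
            continue
        kept.append((number, t, m))
    return kept


if __name__ == "__main__":
    for entry in number_output_residues("MKSLV-G", "MK-LVAG"):
        print(entry)
```

In this toy alignment the output residues carry the numbers 1, 2, 4, 5 and 6: number 3 is missing because the model has a gap there, while the extra model residue opposite the target gap does not disturb the numbering.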

If there are alternate conformations in the input pdb file, chainsaw will choose the most probable conformation, and assign it an occupancy of 1 in the output pdb file.
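
As a rough illustration of this step, the Python sketch below assumes that "most probable" means the alternate location with the highest summed occupancy within a residue. That interpretation, the function name and the dictionary layout are all assumptions made for the example, not a description of chainsaw's internals.

```python
def pick_conformation(atoms):
    """Keep one alternate conformation for a single residue.

    atoms: list of dicts with keys 'name', 'altloc' ('' if none) and 'occupancy'.
    Assumption: the preferred conformer is the altloc with the largest summed
    occupancy; ties go to the alphabetically first altloc label.
    """
    totals = {}
    for atom in atoms:
        if atom["altloc"]:
            totals[atom["altloc"]] = totals.get(atom["altloc"], 0.0) + atom["occupancy"]
    if not totals:
        return [dict(atom) for atom in atoms]  # nothing to choose between
    best = max(sorted(totals), key=totals.get)
    chosen = []
    for atom in atoms:
        if atom["altloc"] == "":
            chosen.append(dict(atom))  # atoms shared by all conformers
        elif atom["altloc"] == best:
            # Selected conformer: clear the altloc label and set occupancy to 1.
            chosen.append(dict(atom, altloc="", occupancy=1.0))
    return chosen
```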

Chainsaw accepts several alignment file formats, which it identifies by the file extension: PIR (.pir), Fasta (.fas), Clustal (.aln), MSF/GCG (.msf), interleaved Phylip (.phy), BLAST (.bla) and OCA (.oca). Suggestions for additional formats are welcome.
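
Because the format is inferred from the extension alone, it can be worth checking the file name before running the program. The following Python sketch simply mirrors the list above. The function name is hypothetical, and the exact-match, case-sensitive comparison is an assumption, since the text does not say how chainsaw treats upper-case extensions.

```python
import os

# Extension-to-format table copied from the list of supported formats.
ALIGNMENT_FORMATS = {
    ".pir": "PIR",
    ".fas": "Fasta",
    ".aln": "Clustal",
    ".msf": "MSF/GCG",
    ".phy": "interleaved Phylip",
    ".bla": "BLAST",
    ".oca": "OCA",
}


def alignment_format(filename):
    """Return the format name implied by the file extension, or raise ValueError."""
    ext = os.path.splitext(filename)[1]
    if ext not in ALIGNMENT_FORMATS:
        raise ValueError("unrecognised alignment extension: %r" % ext)
    return ALIGNMENT_FORMATS[ext]
```

For example, alignment_format("1mzr_1a80.pir") returns "PIR", while a file named "alignment.txt" is rejected.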

A .bla file consists of output from the BLAST server:

 Score =  164 bits (416),  Expect = 6e-42, Method: Composition-based stats.
 Identities = 85/120 (70%), Positives = 101/120 (84%), Gaps = 0/120 (0%)

Query  4    LVGVIMGSTSDWETMKYACDILDELNIPYEKKVVSAHRTPDYMFEYAETARERGLKVIIA  63
            +VG+IMGS SDWETM++A  +L EL IP+E  +VSAHRTPD + +YA TA ERGL VIIA
Sbjct  23   VVGIIMGSQSDWETMRHADALLTELEIPHETLIVSAHRTPDRLADYARTAAERGLNVIIA  82

Query  64   GAGGAAHLPGMVAAKTNLPVIGVPVQSKALNGLDSLLSIVQMPGGVPVATVAIGKAGSTN  123
            GAGGAAHLPGM AA T LPV+GVPV+S+AL G+DSLLSIVQMPGGVPV T+AIG +G+ N
Sbjct  83   GAGGAAHLPGMCAAWTRLPVLGVPVESRALKGMDSLLSIVQMPGGVPVGTLAIGASGAKN  142

while a .oca file contains output from the OCA database:

>>PDB:1U11 _A mol:protein length:182     Pure (N5-Carboxy  (182 aa)
 initn: 605 init1: 605 opt: 606  Z-score: 729.8  bits: 141.5 E(): 1.3e-33
Smith-Waterman score: 606;  65.972% identity (65.972% ungapped) in 144 aa overlap (4-147:23-166)

                                  10        20        30        40
SEARCH                    MKSLVGVIMGSTSDWETMKYACDILDELNIPYEKKVVSAHR
                             .::.:::: ::::::..:  .: ::.::.:  .:::::
PDB:1U MSETAPLPSASSALEDKAASAPVVGIIMGSQSDWETMRHADALLTELEIPHETLIVSAHR
               10        20        30        40        50        60

              50        60        70        80        90       100
SEARCH TPDYMFEYAETARERGLKVIIAGAGGAAHLPGMVAAKTNLPVIGVPVQSKALNGLDSLLS
       ::: . .::.:: ::::.::::::::::::::: :: : :::.::::.:.::.:.:::::
PDB:1U TPDRLADYARTAAERGLNVIIAGAGGAAHLPGMCAAWTRLPVLGVPVESRALKGMDSLLS
               70        80        90       100       110       120

             110       120       130       140       150       160
SEARCH IVQMPGGVPVATVAIGKAGSTNAGLLAAQILGSFHDDIHDALELRREAIEKDVREGSELV
       ::::::::::.:.::: .:. ::.::::.::. ..  .   ::  :
PDB:1U IVQMPGGVPVGTLAIGASGAKNAALLAASILALYNPALAARLETWRALQTASVPNSPITE
              130       140       150       160       170       180

PDB:1U DK

The alignment file should contain the target sequence first and the model sequence second.
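
To make the target-first, model-second convention concrete for the BLAST case, the Python sketch below collects the two aligned sequences from BLAST pairwise text. It is a simplified reader written for illustration, not chainsaw's parser, and it assumes that the Query lines hold the target and the Sbjct lines hold the model.

```python
def read_blast_pair(text):
    """Return (target, model) aligned sequences from BLAST pairwise output.

    Assumption: 'Query' is the target and 'Sbjct' is the model. Only lines of
    the form '<label> <start> <sequence> <end>' are used; the match line and
    the score header are ignored.
    """
    target, model = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] in ("Query", "Sbjct"):
            if fields[0] == "Query":
                target.append(fields[2])
            else:
                model.append(fields[2])
    return "".join(target), "".join(model)
```

Applied to a .bla file such as the one shown earlier, the two returned strings have equal length, and each column pairs one target residue with one model residue.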

In practice, the model sequence in the alignment file will often be different from the model sequence in the pdb file. This may be because the alignment only uses part of the model sequence, or because the structure determination has not resolved all residues. Chainsaw is capable of handling such differences automatically.

Chainsaw will work with both monomer and multimer search models. If you wish to use a monomer model and the model pdb file contains more than one chain, you will have to delete the surplus chains manually. If you use a multimer model, chainsaw will apply the same alignment to each successive chain in the input pdb file.

Chainsaw will output a list of conserved, mutated and deleted residues, together with an estimate of the sequence identity. If this estimate is lower than you expect, something has probably gone wrong; the usual cause is a problem with missing residues.
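
An independent estimate computed directly from the alignment gives a figure to compare against. The Python sketch below is one plausible way to do this. The definition of identity used here (conserved residues divided by aligned residue pairs) and the function name are assumptions, since the exact formula used by chainsaw is not stated here.

```python
def summarise_alignment(target_aln, model_aln):
    """Count conserved/mutated/deleted model residues and estimate identity.

    Assumptions: '-' is the gap character; identity is conserved / (conserved
    + mutated), i.e. computed over aligned residue pairs only.
    """
    counts = {"conserved": 0, "mutated": 0, "deleted": 0}
    for t, m in zip(target_aln, model_aln):
        if m == "-":
            continue  # no model residue at this position
        if t == "-":
            counts["deleted"] += 1  # model residue with no target partner
        elif t == m:
            counts["conserved"] += 1
        else:
            counts["mutated"] += 1
    aligned = counts["conserved"] + counts["mutated"]
    identity = counts["conserved"] / aligned if aligned else 0.0
    return counts, identity
```

If the value obtained this way is much higher than the one chainsaw reports, it is worth checking whether residues present in the alignment are missing from the pdb file.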

INPUT AND OUTPUT FILES

XYZIN: Input template coordinates.
ALIGNIN: Input sequence alignment file. Chainsaw accepts several alignment file formats, which it identifies by the file extension: PIR (.pir), Fasta (.fas), Clustal (.aln), MSF/GCG (.msf), interleaved Phylip (.phy), BLAST (.bla) and OCA (.oca). See above for further details.
XYZOUT: Output model coordinates with some atoms removed according to the Chainsaw protocol, and remaining atoms renamed and/or renumbered.

KEYWORDED INPUT

Possible keywords are:

MODE <mode>

Currently, three modes are supported: MIXS (default), MIXA and MAXI. MIXS implements the Mixed Model of R. Schwarzenbacher et al. in which non-conserved residues are truncated to the gamma atom, while conserved residues are preserved unchanged. The MIXA mode is similar, but non-conserved residues are truncated to the beta atom. The MAXI mode retains the maximal number of atoms common to the target and model residues. Exactly which atoms are retained is determined by the table in the chainsaw.h header file.
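
The difference between the three modes can be pictured with the following Python sketch. It is a deliberate simplification and not the table from chainsaw.h: the atom-name sets and the function are hypothetical, the MIXS and MIXA branches ignore the target residue type, and the MAXI branch is approximated by a plain intersection of atom names with no renaming.

```python
BACKBONE = {"N", "CA", "C", "O"}
GAMMA = {"CG", "CG1", "CG2", "OG", "OG1", "SG"}  # assumed gamma-position atom names


def retained_atoms(mode, model_atoms, target_atoms=None, conserved=False):
    """Return the model atom names kept for one residue under the given mode."""
    if conserved:
        return list(model_atoms)  # conserved residues are left unchanged
    if mode == "MIXA":
        keep = BACKBONE | {"CB"}  # truncate to the beta atom
    elif mode == "MIXS":
        keep = BACKBONE | {"CB"} | GAMMA  # truncate to the gamma atom
    elif mode == "MAXI":
        if target_atoms is None:
            raise ValueError("MAXI needs the target residue's atom names")
        keep = set(target_atoms)  # simplification: common names only, no renaming
    else:
        raise ValueError("unknown mode: %r" % mode)
    return [name for name in model_atoms if name in keep]


if __name__ == "__main__":
    leu = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]
    val = ["N", "CA", "C", "O", "CB", "CG1", "CG2"]
    for mode in ("MIXS", "MIXA", "MAXI"):
        print(mode, retained_atoms(mode, leu, target_atoms=val))
```

For a leucine in the model aligned to a valine in the target, this sketch keeps the backbone, CB and CG under MIXS, and the backbone and CB under MIXA. Under MAXI the real program may also keep and rename further atoms according to its table, which the name intersection used here does not capture.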

END

End keyworded input.

EXAMPLES

Target 1mzr and model 1a80_1.pdb, using the default mode (MIXS):

chainsaw xyzin 1a80_1.pdb alignin 1mzr_1a80.pir xyzout 1a80_1_chainsaw.pdb <<eof
END
eof
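
When chainsaw is driven from a script, the same call can be assembled in Python. The helper below is a hypothetical convenience function: it only builds the command line and the keyworded input, using the MODE and END keywords described above, and leaves the actual execution to the caller.

```python
def chainsaw_call(xyzin, alignin, xyzout, mode=None):
    """Build (argv, stdin_text) for a chainsaw run without executing it."""
    argv = ["chainsaw", "xyzin", xyzin, "alignin", alignin, "xyzout", xyzout]
    keywords = []
    if mode is not None:
        if mode not in ("MIXS", "MIXA", "MAXI"):
            raise ValueError("unknown mode: %r" % mode)
        keywords.append("MODE " + mode)
    keywords.append("END")  # END terminates the keyworded input
    return argv, "\n".join(keywords) + "\n"


if __name__ == "__main__":
    argv, stdin_text = chainsaw_call(
        "1a80_1.pdb", "1mzr_1a80.pir", "1a80_1_chainsaw.pdb", mode="MIXA"
    )
    print(" ".join(argv))
    print(stdin_text, end="")
    # To run it (assumes a CCP4 installation with chainsaw on the PATH):
    #   import subprocess
    #   subprocess.run(argv, input=stdin_text, text=True, check=True)
```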

AUTHOR

Norman Stein

ACKNOWLEDGEMENTS

Randy Read, Eleanor Dodson, Martyn Winn.

REFERENCES

N. Stein, J. Appl. Cryst. 41, 641-643 (2008).
CHAINSAW: a program for mutating pdb files used as templates in molecular replacement.

R. Schwarzenbacher et al., Acta Cryst. D60, 1229-1236 (2004).
The importance of alignment accuracy for molecular replacement.