This program will generate a crank XML file so that crank can be run via
script. The user can either input the number of monomers that is expected
and the corresponding solvent content, or the program can get a first guess
via a Kantardjieff-Rupp [1] probability analysis of the
Matthews [2] coefficient.
Note that the ordering of some keywords is important. In particular, the
XTAL subkeywords (CELL, ATOM, DNAME) must be preceded by the corresponding XTAL
keyword, and similarly for the ATOM and DNAME subkeywords.
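The ordering rule can be checked mechanically before a keyword file is passed to gcx. The Python sketch below is an illustration only: the function name check_keyword_order and the grouping of subkeywords are assumptions based on the keyword descriptions in this document, and keywords are compared on their first four characters, as the capitalisation used here (e.g. DNAMe, COLUmn) suggests.

```python
# Hypothetical helper (not part of gcx): checks that subkeywords follow their parent.
XTAL_SUBKEYS = {"CELL", "ATOM", "DNAM"}          # must follow an XTAL keyword
ATOM_SUBKEYS = {"NUMB", "XYZ", "OCCU", "BISO"}   # must follow an ATOM keyword
DNAME_SUBKEYS = {"COLU"}                         # must follow a DNAMe keyword


def check_keyword_order(lines):
    """Return a list of error messages; an empty list means the order looks valid."""
    errors = []
    seen_xtal = seen_atom = seen_dname = False
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        key = words[0].upper()[:4]  # assumption: first four characters are significant
        if key == "XTAL":
            # A new crystal opens a new scope for ATOM and DNAMe.
            seen_xtal, seen_atom, seen_dname = True, False, False
        elif key in XTAL_SUBKEYS:
            if not seen_xtal:
                errors.append(f"line {number}: {key} given before any XTAL")
            seen_atom = seen_atom or key == "ATOM"
            seen_dname = seen_dname or key == "DNAM"
        elif key in ATOM_SUBKEYS:
            if not seen_atom:
                errors.append(f"line {number}: {key} given before any ATOM")
        elif key in DNAME_SUBKEYS:
            if not seen_dname:
                errors.append(f"line {number}: {key} given before any DNAMe")
    return errors


example = [
    "XTAL se1",
    "CELL 50 60 70 90 90 90",
    "ATOM Se",
    "NUMB 4",
    "DNAMe PEAK",
    "COLUmn F+=FP SF+=SFP F-=FM SF-=SFM",
]
print(check_keyword_order(example))
```

The column labels in the example are made up; only the relative order of the keywords matters here.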
The NNUC keyword sets the number of nucleotides per monomer. Reading nucleic acid sequences is currently
not supported, so if nucleotides are present the NNUC keyword must be set. See also the NMON (optional) keyword.
XTAL <ID>
-
<ID>
-
The crystal's name/identification string.
XTAL SUBKEYWORDS:
CELL <a> <b> <c> <alpha> <beta> <gamma>
-
<a> <b> <c> <alpha> <beta> <gamma>
-
Cell parameters for the given XTAL.
Default: take the values from the mtz file.
MODL <pdbfile>
-
<pdbfile>
-
Input a pdb file containing substructure coordinates in standard pdb format. Note: input
coordinates with only one of MODL, XYZ or SITE. Using any two of these keywords will result in an error.
ATOM <ID>
-
<ID>
-
The atom's name. The name must match a (case insensitive) atom's name in
$CLIBD/atomsf.lib.
ATOM SUBKEYWORDS:
NUMB <numb>
-
<numb>
-
The number of expected atoms *PER MONOMER*. Thus the total number of
heavy atoms searched for is <numb> * <nmonomer>. See also the NMON keyword.
XYZ <Xfrac> <Yfrac> <Zfrac> [NOREf [X] [Y] [Z] ]
-
<Xfrac> <Yfrac> <Zfrac>
-
Fractional atomic coordinates.
-
NOREF X Y Z
-
Including the NOREf subkeyword of XYZ indicates that the X, Y and/or Z
coordinate of this atom will not be refined.
Default: refine all the coordinates.
OCCU <occ> [NOREf]
-
<occ>
-
Atomic occupancy. At the moment, convergence is faster if you start with
a lower value (e.g. 0.25).
-
NOREf
-
The atomic occupancy will not be refined
BISO <bfac> [NOREf]
-
<bfac>
-
Atomic isotropic B factor. For faster convergence, use the B-factor from
a Wilson plot (e.g. from the WILSON program). This will be the default value
when using CCP4i.
-
NOREf
-
The atomic B factor will not be refined
DNAMe <ID>
The dataset identifier. This keyword is required. If your pipeline contains
a SHELX program, <ID> must be one of NAT, SIR, SIRA, PEAK, INFL, HREM, or LREM.
DNAMe SUBKEYWORDS:
Either intensities or structure factor amplitudes can be input (NOT both!).
To input intensities use the following:
COLUmn I=<i> SI=<si> I+=<i+> SI+=<si+> I-=<i-> SI-=<si->
Diffraction data for the XTAL and DNAMe defined.
If anomalous data is not to be used, set I and SI only. If using
anomalous data, set I+, SI+, I-, SI-. Setting both I and I+ will result
in an error.
-
<i>
-
I (observed intensity *if no anomalous data is present*).
-
<si>
-
Corresponding sigma for <i>.
-
<i+>
-
I+ (observed intensity of positive Bijvoet pair).
-
<si+>
-
Corresponding sigma of <i+>.
-
<i->
-
I- (observed intensity of negative Bijvoet pair).
-
<si->
-
Corresponding sigma of <i->.
To input structure factor amplitudes, use the following:
COLUmn F=<f> SF=<sf> F+=<f+> SF+=<sf+> F-=<f-> SF-=<sf->
Diffraction data for the XTAL and DNAMe defined.
If anomalous data is not to be used, set F and SF only. If using
anomalous data, set F+, SF+, F-, SF-. Setting both F and F+ will result
in an error. If only F and DANO are present in the mtz file, use the ccp4
program mtzMADmod to change F/DANO to F+/F-.
-
<f>
-
|F| (observed structure factor amplitude *if no anomalous data is present*).
-
<sf>
-
Corresponding sigma for <f>.
-
<f+>
-
|F+| (observed structure factor amplitude of positive Bijvoet pair).
-
<sf+>
-
Corresponding sigma of <f+>.
-
<f->
-
|F-| (observed structure factor amplitude of negative Bijvoet pair).
-
<sf->
-
Corresponding sigma of <f->.
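The COLUmn rules above (intensities or amplitudes but not both, and mean or anomalous columns but not both) can be sketched as a small Python check. This is an illustration under stated assumptions, not how gcx itself validates input: the function name classify_columns is hypothetical, and the sketch is stricter than the text in that it also requires each set of columns to be complete.

```python
def classify_columns(labels):
    """Classify a COLUmn assignment.

    labels maps COLUmn keys (e.g. "F+", "SF+") to mtz column names.
    Returns (kind, anomalous) where kind is "I" or "F".
    Raises ValueError for combinations that the text describes as errors,
    and (an extra assumption of this sketch) for incomplete sets.
    """
    keys = set(labels)
    intensity = {"I", "SI", "I+", "SI+", "I-", "SI-"}
    amplitude = {"F", "SF", "F+", "SF+", "F-", "SF-"}
    if not keys:
        raise ValueError("no columns given")
    unknown = keys - intensity - amplitude
    if unknown:
        raise ValueError(f"unknown COLUmn keys: {sorted(unknown)}")
    if keys & intensity and keys & amplitude:
        raise ValueError("give intensities or amplitudes, not both")
    kind = "I" if keys & intensity else "F"
    mean = {kind, "S" + kind}
    anomalous = {kind + "+", "S" + kind + "+", kind + "-", "S" + kind + "-"}
    if keys == mean:
        return kind, False
    if keys == anomalous:
        return kind, True
    raise ValueError("set either the mean columns or the +/- columns, not a mixture")


# Column names on the right-hand side are made-up examples.
print(classify_columns({"F": "FP", "SF": "SIGFP"}))
print(classify_columns({"I+": "IP", "SI+": "SIP", "I-": "IM", "SI-": "SIM"}))
```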
OPTIONAL KEYWORDS:
PIPEline <type> FIRSt <firststep> LAST <laststep>
In gcx, there are six predefined pipelines. (At the moment, all predefined pipelines
use PREP to prepare the data, and pipelines 1 to 4 use ARP/wARP + REFMAC for automated model building and refinement.)
-
Setting <type> to 1 gives the following pipeline (which is the default):
-
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.
-
Setting <type> to 2 gives the following pipeline:
-
PREP -> SHELXC -> SHELXD -> SHELXE -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.
-
Setting <type> to 3 gives a pipeline that George Sheldrick has suggested:
-
PREP -> SHELXC -> SHELXD -> SHELXE -> ARP/wARP + REFMAC.
-
The first two SHELXE jobs, followed by SHELXEHAND, try to determine the correct enantiomorph.
The third and last SHELXE job attempts to get the best possible phases by running 100
SHELXE cycles.
-
Setting <type> to 4 gives the following:
-
PREP -> AFRO -> CRUNCH2 -> BP3 -> DM -> ARP/wARP + REFMAC.
-
Setting <type> to 5 gives the following:
-
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> RESOLVEDM -> RESOLVEMB.
-
Setting <type> to 6 gives the following:
-
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> PIRATE -> BUCCANEER.
The FIRSt subkeyword of PIPE allows you to optionally start the pipeline at a given step. The following
are the possible values of <firststep>.
-
DETEct
-
Start at substructure detection (default).
-
PHASe
-
Start at substructure phasing (you must input ATOM coordinates).
The LAST subkeyword of PIPE allows you to optionally stop the pipeline after a given step. The following
are the possible values of <laststep>.
-
DETEct
-
Stop after substructure detection.
-
PHASe
-
Stop after substructure phasing.
-
DM
-
Stop after density modification.
-
BUILD
-
Stop after model building.
If you would like to change any of the program options in a predefined pipeline, after
the PIPEline has been set, you can reset a program value using the program keyword (shown
below). So, for example, if you want to use PIPEline 1, but would like to set
a SIGF and SANO cutoff for Fa estimation, you would do the following:
-
PIPEline 1
-
AFRO 2 SIGF 2 SANO 1
Since the PIPEline keyword defines AFRO as the second step, you can access AFRO subkeywords
as shown. See the "PROGRAM KEYWORDS" section below for more options that can be modified.
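The step number used in a program keyword follows from the position of the program in the chosen pipeline. The Python sketch below tabulates the predefined pipelines exactly as listed above and looks up a step number. The names PIPELINES, step_number and program_keyword are hypothetical, and the strings are the program names from the pipeline listing, which are not always identical to the program keywords (AFRO happens to be both).

```python
# Predefined pipelines as listed in this document (program names, not keywords).
PIPELINES = {
    1: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "ARP/wARP + REFMAC"],
    2: ["PREP", "SHELXC", "SHELXD", "SHELXE", "BP3", "SOLOMON", "ARP/wARP + REFMAC"],
    3: ["PREP", "SHELXC", "SHELXD", "SHELXE", "ARP/wARP + REFMAC"],
    4: ["PREP", "AFRO", "CRUNCH2", "BP3", "DM", "ARP/wARP + REFMAC"],
    5: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "RESOLVEDM", "RESOLVEMB"],
    6: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "PIRATE", "BUCCANEER"],
}


def step_number(pipeline_type, program):
    """Return the 1-based position of a program in a predefined pipeline."""
    return PIPELINES[pipeline_type].index(program) + 1


def program_keyword(pipeline_type, program, **options):
    """Format a program keyword line such as 'AFRO 2 SIGF 2 SANO 1'.

    Assumes the program name can be used as the keyword, which holds for AFRO.
    """
    parts = [program, str(step_number(pipeline_type, program))]
    for name, value in options.items():
        parts += [name, str(value)]
    return " ".join(parts)


print("PIPEline 1")
print(program_keyword(1, "AFRO", SIGF=2, SANO=1))
```

For pipeline 1 this reproduces the two keyword lines of the example above.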
PROGRAM KEYWORDS:
You can set up a pipeline in crank and change options for a program that is part of the
pipeline using the program keywords. The general syntax for changing options for a
given program is the following:
<programname> <step> <option1> <option2> ...
where <programname> is the name of the crank plugin. Currently, there are plugins for
the following programs: PREP, AFRO, CRUNch2, BP3, DM, SOLO, PIRA, WARP, SHLC, SHLD, SHLE.
<step> is the step number for the program in the crank pipeline. Thus, using gcx, a user
can construct their own pipeline. However, at the moment, there is no checking, so this
should be reserved for experts only!
<option1> are the particular options that can be changed for the program at the particular
<step>.
PREP <step>
PREP is the crank plugin for using truncate [3] to calculate structure factor amplitudes
from intensities (if necessary) and using scaleit [4] to relatively scale the data sets
together (if necessary). At the moment, no options are available to be changed with gcx.
AFRO <step> SIGF <nsigf> SANO <nsano> SISO <nsiso> HIREs <hires>
AFRO is the crank plugin for the AFRO [5] program to calculate FA values needed for substructure
detection programs. The subkeyword options for the AFRO keyword are the following:
-
SIGF <nsigf>
-
Exclude reflections if FP < <nsigf> * SIGF. The default setting is 2.
-
SANO <nsano>
-
Exclude reflections if DANO < <nsano> * SDANO. The recommended setting is 0.5.
(DANO = abs(|F+| - |F-|) and SDANO is the standard deviation of DANO in measurement.)
-
SISO <nsiso>
-
Exclude reflections if abs(|Fder| - |Fnat|) < <nsiso> * SDISO. The recommended setting is 0.5.
SDISO is sqrt(SIGFder^2 + SIGFnat^2).
-
HIREs <hires>
-
Specify the high resolution limit. By default, gcx sets the limit to 0.5 Angstroms above the high
resolution limit of the data, unless that limit is lower than 2.5 Angstroms, in which case it is set to 3.0 Angstroms.
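The three exclusion tests described for SIGF, SANO and SISO can be written out as follows. This Python sketch only restates the inequalities given above; the function names are hypothetical, the value 2 is the stated SIGF default, and the value 0.5 is the recommended (not necessarily default) setting for SANO and SISO.

```python
import math


def excluded_by_sigf(fp, sigf, nsigf=2.0):
    """True if FP < nsigf * SIGF."""
    return fp < nsigf * sigf


def excluded_by_sano(f_plus, f_minus, sdano, nsano=0.5):
    """True if DANO < nsano * SDANO, with DANO = abs(|F+| - |F-|)."""
    dano = abs(f_plus - f_minus)
    return dano < nsano * sdano


def excluded_by_siso(f_der, f_nat, sig_der, sig_nat, nsiso=0.5):
    """True if abs(|Fder| - |Fnat|) < nsiso * SDISO."""
    sdiso = math.hypot(sig_der, sig_nat)  # sqrt(SIGFder^2 + SIGFnat^2)
    return abs(f_der - f_nat) < nsiso * sdiso


print(excluded_by_sigf(10.0, 6.0))              # 10 < 2 * 6
print(excluded_by_sano(100.0, 98.0, 5.0))       # 2 < 0.5 * 5
print(excluded_by_siso(100.0, 90.0, 3.0, 4.0))  # 10 < 0.5 * 5
```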
CRUNch2 <step> TRIAls <ntrials> PTRIals <pntrials> THREshold <nthreshold> DEVIation <ndeviation> BP3Test <ntest> SPEC
Crunch2 is the crank plugin for the CRUNCH2 [6] substructure detection program using
Karle-Hauptman matrices. The following options can be changed within gcx.
-
TRIAls <ntrials>
-
Sets the number of trials. The default is 15. For difficult problems with weak signals, increasing
the number of trials can lead to a solution, if one was not found after 15 trials.
-
PTRIals <pntrials>
-
Sets the number of Patterson trials. Before a crunch2 run, a Patterson minimal function is calculated to
generate trial solutions for crunch2. The default is to generate 150 starting Patterson trials.
Crunch2 ranks all starting patterson solutions and runs trials on the top <ntrials>.
-
THREshold <nthreshold>
-
<nthreshold> is the minimum crunch2 figure of merit to stop the crunch2 run. The default is 1.00.
-
DEVIation <ndeviation>
-
<ndeviation> specifies another stopping criterion: Crunch2 is stopped if the highest score of a
trial is <ndeviation> times greater than the lowest score. The default value for
<ndeviation> is 1.75.
-
BP3Test <ntest>
-
<ntest> specifies the number of crunch2 trials to perform before running BP3 to verify if the substructure
is correct. If <ntest> is 0, BP3 will not be run between crunch2 trials. The default value for <ntest> is 3.
-
SPEC
-
Add the SPEC subkeyword if you think that any of your substructure atoms lie on special positions.
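The two stopping criteria (THREshold and DEVIation) can be pictured with the following Python sketch. It is a reading of the descriptions above, not the CRUNCH2 implementation: it assumes that the figure of merit and the trial score are the same quantity, that higher values are better, and that scores are positive. The function name crunch2_should_stop is hypothetical.

```python
def crunch2_should_stop(trial_scores, threshold=1.00, deviation=1.75):
    """Decide whether to stop after the trials run so far.

    Assumptions of this sketch: scores are positive, higher is better,
    and the same score is used for both criteria.
    """
    if not trial_scores:
        return False
    best, worst = max(trial_scores), min(trial_scores)
    if best >= threshold:
        # THREshold: a trial reached the minimum figure of merit.
        return True
    # DEVIation: the best trial stands out from the worst by the given factor.
    return worst > 0 and best >= deviation * worst


print(crunch2_should_stop([0.4, 0.5]))
print(crunch2_should_stop([0.4, 0.8]))
```

With the default values, scores of 0.4 and 0.5 do not trigger a stop, whereas 0.4 and 0.8 do, because 0.8 is at least 1.75 times 0.4.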
BP3 <step> STOP <stop> NOHAnd NODIff PHASe
BP3 is the crank plugin for using the substructure phasing program BP3 [7], [8].
-
STOP <stop>
-
<stop> is the minimum figure of merit (FOM) required to proceed to a further step.
-
NOHAnd
-
Do not generate phases for the other hand.
-
NODIff
-
Do not attempt to find and refine additional sites from gradient maps.
-
PHASe
-
Perform fast phasing in BP3. This is the default for MAD.
SOLOmon <step> NOHAnd OPTI NODM NOBIas BETA <beta> MARGin <margin> NCYCles <ncyc> HCYCles <hcyc> MLHL DIREct SIGMaa
SOLOmon is the crank plugin for using the density modification program SOLOMON [9].
-
NOHAnd
-
Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand
by running a few cycles of density modification and seeing which hand has a greater contrast in protein
to solvent region.
-
OPTI
-
Optimize the solvent content (by running a few cycles of density modification corresponding to
different monomers). The default is not to optimize the solvent content.
-
NODM
-
Do not run density modification (i.e. if you only wish to use the density modification program to determine
the hand or optimize the solvent content).
-
NOBIas
-
Do not calculate or apply a bias correction parameter.
-
BETA <beta>
-
Use the value <beta> for the bias correction parameter (and do not refine it).
-
MARGin <margin>
-
<margin> is the margin by which the selected hand's contrast*(correlation coefficient) must exceed that of the
other hand.
-
NCYCles <ncyc>
-
Run <ncyc> cycles of density modification.
-
HCYCles <hcyc>
-
Determine the hand by looking at the (correlation coefficient)*contrast at the <hcyc> cycle of density
modification of each hand.
-
MLHL
-
Use the MLHL function of multicomb for phase combination. Use only one of the MLHL or DIREct keywords.
-
DIREct
-
Use the DIREct function (for SAD or SIRAS data) of multicomb for phase combination.
-
SIGMaa
-
Use the SIGMAA program from CCP4 for phase combination.
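The hand determination controlled by HCYCles and MARGin can be pictured as a comparison of contrast*(correlation coefficient) for the two hands. The Python sketch below is an assumption-laden illustration: it treats <margin> as an additive difference that the better hand must exceed, which this document does not specify, and the function name choose_hand and the return values are hypothetical.

```python
def choose_hand(contrast_orig, cc_orig, contrast_inv, cc_inv, margin):
    """Pick a hand from the scores reached at the chosen cycle.

    The score of each hand is contrast * (correlation coefficient).
    Assumption of this sketch: the margin is an additive difference.
    Returns "original", "inverted", or None if neither hand wins clearly.
    """
    score_orig = contrast_orig * cc_orig
    score_inv = contrast_inv * cc_inv
    if score_orig - score_inv > margin:
        return "original"
    if score_inv - score_orig > margin:
        return "inverted"
    return None


print(choose_hand(2.0, 0.5, 1.0, 0.5, margin=0.1))
print(choose_hand(1.0, 0.5, 1.0, 0.55, margin=0.1))
```

In the first call the original hand scores 1.0 against 0.5 and is selected; in the second the scores differ by less than the margin, so no hand is selected.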
DM <step> NOHAnd NODM NOHIst
DM is the crank plugin for using the density modification program DM [10].
-
NOHAnd
-
Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand
by running a few cycles of density modification and seeing which hand has a greater contrast in protein
to solvent region.
-
NODM
-
Do not run density modification (i.e. if you only wish to use the density modification program to determine
the hand or optimize the solvent content).
-
NOHIst
-
Do not use histogram matching in density modification (use this if you have a metal cluster).
SHLC <step> HIREs <hires>
SHLC is the crank plugin for the SHELXC [11] program to generate files
(including FA values) needed for SHELXD and SHELXE. The subkeyword options for the SHLC keyword
are the following:
-
HIREs <hires>
-
Specify the high resolution limit. gcx defaults to set the limit to 0.5 Angstroms above the high
resolution limit - unless the high resolution limit is lower than 2.5, in which case the limit is
set to the lower of 3.0 Angstroms or the high resolution limit of the data.
SHLD <step> TRIAls <ntrials> MIND <mind> MDEQ <mdeq> THREshold <nthreshold> DSUL <ndsul>
SHLD is the crank plugin for the SHELXD [12] program to determine substructures.
-
TRIAls <ntrials>
-
Sets the number of trials. The default is 500, but for more difficult problems with a weak signal,
setting <ntrials> to a greater value (like 1000) can lead to a solution, when a solution was
not found with 500.
-
MIND <mind>
-
Set the minimum distance between substructure atoms. The crank default value is 3.5.
-
MDEQ <mdeq>
-
Set the minimum distance allowed between symmetry equivalents. The crank default value is -0.1 (i.e. allowing
for special positions).
-
THREshold <nthreshold>
-
<nthreshold> is the minimum "weak" correlation coefficient needed to stop the shelxd run.
The default is 30.
-
DSUL <ndsul>
-
<ndsul> is the number of disulfide bridges expected in the structure. The default is 0.
SHLE <step> NOHAnd OPTI NODM FREE <reso> NCYCles <ncyc> HCYCles <hcyc>
SHLE is the crank plugin for using the density modification program SHELXE [13].
-
NOHAnd
-
Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand
by running a few cycles of density modification and seeing which hand has a greater contrast in protein
to solvent region.
-
OPTI
-
Optimize the solvent content (by running a few cycles of density modification corresponding to
different monomers). The default is not to optimize the solvent content.
-
NODM
-
Do not run density modification (i.e. if you only wish to use the density modification program to determine
the hand or optimize the solvent content).
-
FREE <reso>
-
Use the "free lunch" algorithm and extend the resolution of your data to the value <reso>. If no
value is given for <reso>, the highest resolution limit minus 0.5 is used. The default is not to
use the free lunch algorithm.
-
NCYCles <ncyc>
-
Run <ncyc> cycles of density modification.
-
HCYCles <hcyc>
-
Determine the hand by looking at the contrast at the <hcyc> cycle of density
modification of each hand.
WARP <step> BIG <nbig> SMAL <nsmall> DOCK <ndock> CYCL <ncyc> TARG <targ> NOREstrain NOLOop TWIN
WARP is the crank plugin for automated model building and iterative refinement with
ARP/wARP [14] and REFMAC5 [15]. The following
subkeywords can be modified for this step.
-
BIG <nbig>
-
<nbig> is the number of ARP/wARP "big" cycles or "cycles of autobuilding" given in the ARP/wARP ccp4i
interface. The default is 10.
-
SMAL <nsmall>
-
<nsmall> is the number of ARP/wARP "small" cycles. The "total cycles" as defined in the ARP/wARP ccp4i
interface is <nbig> * <nsmall>. The default is 5.
-
DOCK <ndock>
-
<ndock> is the number of ARP/wARP cycles of autobuilding to perform before sequence docking (if a
sequence file is given). The default is 7.
-
CYCL <ncyc>
-
<ncyc> is the number of cycles of refinement in each REFMAC run. The default is 3.
-
TARG <targ>
-
<targ> is the name of the target function to use in REFMAC refinement. The choices are RICE, which uses no
prior phase information, MLHL [16], which uses phase information encoded by
Hendrickson-Lattman coefficients, or SAD [17], which uses phase information directly
from Bijvoet differences.
-
NOREstrain
-
Do not use conditional dynamics. The default is to use it.
-
NOLOop
-
Do not use loop building. The default is to use it.
-
TWIN
-
Use twin refinement in refmac: only available with TARGets MLHL and RICE at the moment. The default is to use it.
PIRAte <step> NCYCles <ncyc> INWEight
PIRAte is the crank plugin for using the density modification program PIRATE [18].
-
NCYCles <ncyc>
-
<ncyc> is the number of cycles of density modification to perform. The default is 3.
-
INWEight
-
The weight to apply to the input phases/Hendrickson-Lattman coefficients. The default is 0.75.
BUCC <step> NCYCles <ncyc> FAST NORFree RELIability <nreli>
BUCCaneer is the crank plugin for using the model building program BUCCANEER [19].
-
NCYCles <ncyc>
-
<ncyc> is the number of cycles of building. The default is 3.
-
FAST
-
Use FAST building (i.e. no recycling with REFMAC). The default is to recycle with REFMAC.
-
NORFree
-
Do not use the free R factor. The default is to use it.
-
RELIability <nreli>
-
<nreli> is the reliability of sequence docking as a fraction. The default is 0.95.
NMON <nmon>
<nmon> is the number of monomers expected. If it is not set, the program
will calculate the most probable number by performing a Kantardjieff-Rupp
probability analysis of the Matthews coefficient.
SOLV <solv>
<solv> is the solvent content. If it is not set, the program
will calculate the most probable number of monomers by performing a
Kantardjieff-Rupp probability analysis of the Matthews coefficient
and set the solvent content based on this. If it is set, and the value
differs by 0.25 from that corresponding to NMON (if set) or the most probable
NMON, an error will result.
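The link between NMON and SOLV follows from the Matthews coefficient. The Python sketch below computes the cell volume from the CELL parameters, the Matthews coefficient Vm = V / (Z * nmon * M), and the usual protein estimate of the solvent fraction, 1 - 1.23 / Vm. It does not reproduce the Kantardjieff-Rupp probability analysis that gcx performs; the function names and the example values (cell, number of symmetry operators, monomer mass) are hypothetical.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms; angles are in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, n_symops, nmon, monomer_mass):
    """Vm in cubic Angstroms per Dalton.

    n_symops is the number of asymmetric units in the cell and nmon the
    number of monomers per asymmetric unit.
    """
    return volume / (n_symops * nmon * monomer_mass)


def solvent_fraction(vm):
    """Approximate solvent fraction for a protein crystal: 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm


volume = cell_volume(100.0, 100.0, 100.0, 90.0, 90.0, 90.0)
for nmon in (1, 2, 3):
    vm = matthews_coefficient(volume, n_symops=4, nmon=nmon, monomer_mass=50000.0)
    print(nmon, round(vm, 3), round(solvent_fraction(vm), 3))
```

Increasing the number of monomers lowers Vm and therefore the solvent fraction, which is why a SOLV value that is inconsistent with NMON is reported as an error.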
OBFA <bfac>
<bfac> is an estimate of the overall B-factor. If it is not set, the program
will calculate it based on a maximum likelihood analysis.
OUTPut <outputname>
<outputname> is the string associated with the XML file
that gcx writes out. The default <outputname> is "crank".
VERBose <n>
Specify the amount of information to be output (where n is a non-negative integer).
n = 0 is the normal output, n = 1 is more output and n = 2 is for debugging
purposes. Default: n = 0.
Please look in the examples directory!
REFERENCES
- [1] Kantardjieff, K.A. and Rupp, B. (2003) Protein Science, 12, 1865-1871.
- [2] Matthews, B.W. (1968) J.Mol.Biol, 33, 491-497.
- [3] French, G.S. and Wilson, K.S. (1978) Acta Cryst., A34, 517-534.
- [4] Evans, P.R., Dodson, E.J. and Dodson R. (unpublished).
- [5] Pannu, N.S. (unpublished).
- [6] de Graaff, R.A.G., Hilge, M., van der Plas, J.L. and Abrahams, J.P. (2001)
Acta Cryst., D57, 1857-1862.
- [7] Pannu, N.S. and Read, R.J. (2004) Acta Cryst., D60, 22-27.
- [8] Pannu, N.S., McCoy, A.J. and Read, R.J. (2003) Acta Cryst., D59, 1801-1808.
- [9] Abrahams, J.P. and Leslie, A.J.W. (1996) Acta Cryst., D52, 30-42.
- [10] Cowtan, K.D. (1994) Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography,
31, 34-38.
- [11] Sheldrick, G.M. http://shelx.uni-ac.gwdg.de/SHELX/
- [12] Schneider T.R., Sheldrick G.M. (2002) Acta Cryst., D58, 1772-1779.
- [13] Sheldrick, G.M. (2002) Z. Kristallogr., 217, 644-650.
- [14] Perrakis, A., Morris, R.M. and Lamzin, V.S. (1999) Nat Struct Biol., 6, 458-463.
- [15] Murshudov, G.N., Vagin, A.A. and Dodson, E.J. (1997) Acta Cryst., D53, 240-255.
- [16] Pannu, N.S., Murshudov, G.N., Dodson, E.J. and Read, R.J. (1998) Acta Cryst. D54,
1285-1294.
- [17] Skubak, P, Murshudov, G.N. and Pannu, N.S. (2004) Acta Cryst., D60, 2196-2201.
- [18] Cowtan, K (2000) Acta Cryst., D56, 1612-1621.
- [19] Cowtan, K (2006) Acta Cryst., D62, 1002-1011.
- [20] Terwilliger, T.C. (2000) Acta Cryst., D56, 965-972.
- [21] Terwilliger, T.C. (2003) Acta Cryst., D59, 38-44.
Last modified: Thu May 4 11:32:32 CEST 2006