GCX Version 0.8 - Documentation

NAME

gcx - a program to generate a crank XML file to run crank via script

SYNOPSIS

gcx HKLIN foo.mtz SEQIN foo.pir HKLOUT foo_out.mtz

Keyworded input

DESCRIPTION

This program generates a crank XML file so that crank can be run via a script. The user can either input the expected number of monomers and the corresponding solvent content, or let the program obtain a first guess via a Kantardjieff - Rupp [1] probability analysis of the Matthews [2] coefficient.
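As a rough illustration of the quantity behind this first guess, the following Python sketch computes the Matthews coefficient and the corresponding solvent fraction for a trial number of monomers. It is not the gcx implementation and does not reproduce the Kantardjieff - Rupp probability analysis. The function names, the average residue mass of 110 Da and the constant 1.23 (the usual approximation for proteins) are assumptions made for this example.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms (cell angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, n_symops, nmon, nres, residue_mass=110.0):
    """V_M = cell volume / (symmetry operators * monomers * monomer mass).

    residue_mass (Da per residue) is an assumed average, not a gcx value.
    """
    return volume / (n_symops * nmon * nres * residue_mass)


def solvent_fraction(vm):
    """Approximate solvent fraction for protein (assumed constant 1.23)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
    for nmon in (1, 2):
        vm = matthews_coefficient(volume, n_symops=4, nmon=nmon, nres=200)
        print(nmon, round(vm, 2), round(solvent_fraction(vm), 2))
```

A larger trial number of monomers gives a smaller Matthews coefficient and a lower solvent fraction, which is why NMON and SOLV have to be consistent with each other.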

GETTING STARTED

The best way to start is to use the example scripts, which also start crank.

KEYWORDED INPUT

Note that the ordering of some keywords is important. In particular, the XTAL subkeywords (CELL, ATOM, DNAME) must be preceded by the corresponding XTAL keyword, and similarly for the ATOM and DNAME subkeywords.

NRES <nres>

<nres>

The number of residues PER MONOMER. Either the NRES or the NNUC keyword must be set, or the (protein) sequence must be input with SEQIN. See also the NMON (optional) keyword.

NNUC <nnuc>

<nnuc>

The number of nucleotides per monomer. Reading nucleic acid sequences is not currently supported, so the NNUC keyword must be set if nucleotides are present. See also the NMON (optional) keyword.

XTAL <ID>

<ID>: The crystal's name/identification string.

XTAL SUBKEYWORDS:

CELL

<a> <b> <c> <alpha> <beta> <gamma>: Cell parameters for the given XTAL. Default: take the values from the mtz file.

MODL <pdbfile>

<pdbfile>: A pdb file containing substructure coordinates in standard pdb format. Note: input coordinates with only one of MODL, XYZ or SITE. Using any two of these keywords will result in an error.

ATOM <ID>

<ID>

The atom's name. The name must match a (case insensitive) atom's name in $CLIBD/atomsf.lib.

ATOM SUBKEYWORDS:

NUMB <numb>

<numb>: The number of expected atoms *PER MONOMER*. Thus the total number of heavy atoms searched for is <numb> * <nmonomer>. See also the NMON keyword.

XYZ <Xfrac> <Yfrac> <Zfrac> [NOREf [X] [Y] [Z] ]

<Xfrac> <Yfrac> <Zfrac>: Fractional atomic coordinates.
NOREf [X] [Y] [Z]: Including the NOREf subkeyword of XYZ indicates that the X, Y and/or Z coordinate of this atom will not be refined. Default: refine all the coordinates.

OCCU <occ> [NOREf]

<occ>: Atomic occupancy. At the moment, convergence is faster if you start with a lower value (e.g. 0.25).
NOREf: The atomic occupancy will not be refined

BISO <bfac> [NOREf]

<bfac>: Atomic isotropic B factor. For faster convergence, use the B-factor from a Wilson plot (e.g. from the WILSON program). This will be the default value when using CCP4i.
NOREf: The atomic B factor will not be refined.

DNAMe <ID>

The dataset identifier. This keyword is required. If your pipeline contains a SHELX program, <ID> must be either NAT, SIR, SIRA, PEAK, INFL, HREM, or LREM.

DNAMe SUBKEYWORDS:

Either intensities or structure factor amplitudes can be input (NOT both!). To input intensities, use the following:

COLUmn I=<i> SI=<si> I+=<i+> SI+=<si+> I-=<i-> SI-=<si->

Diffraction data for the XTAL and DNAMe defined. If anomalous data is not to be used, set I and SI only. If using anomalous data, set I+, SI+, I-, SI-. Setting both I and I+ will result in an error.

<i>: I (observed intensity *if no anomalous data is present*).
<si>: Corresponding sigma for <i>.
<i+>: I+ (observed intensity of positive Bijvoet pair).
<si+>: Corresponding sigma of <i+>.
<i->: I- (observed intensity of negative Bijvoet pair).
<si->: Corresponding sigma of <i->.

To input structure factor amplitudes, use the following:

COLUmn F=<f> SF=<sf> F+=<f+> SF+=<sf+> F-=<f-> SF-=<sf->

Diffraction data for the XTAL and DNAMe defined. If anomalous data is not to be used, set F and SF only. If using anomalous data, set F+, SF+, F-, SF-. Setting both F and F+ will result in an error. If only F and DANO are present in the mtz file, use the CCP4 program mtzMADmod to change F/DANO to F+/F-.

<f>: |F| (observed structure factor amplitude *if no anomalous data is present*).
<sf>: Corresponding sigma for <f>.
<f+>: |F+| (observed structure factor amplitude of positive Bijvoet pair).
<sf+>: Corresponding sigma of <f+>.
<f->: |F-| (observed structure factor amplitude of negative Bijvoet pair).
<sf->: Corresponding sigma of <f->.
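
The COLUmn rules above can be summarised in a small Python sketch. It only illustrates the combinations described in this section and is not the check that gcx performs; the function name, the dictionary layout and the error messages are assumptions.

```python
INTENSITY_KEYS = {"I", "SI", "I+", "SI+", "I-", "SI-"}
AMPLITUDE_KEYS = {"F", "SF", "F+", "SF+", "F-", "SF-"}


def check_column_labels(labels):
    """Classify COLUmn assignments such as {"F+": "FP", "SF+": "SIGFP", ...}.

    Returns "mean" (no anomalous data) or "anomalous". Raises ValueError for
    the combinations that this section describes as errors.
    """
    keys = set(labels)
    unknown = keys - INTENSITY_KEYS - AMPLITUDE_KEYS
    if unknown:
        raise ValueError("unknown COLUmn labels: %s" % sorted(unknown))
    if keys & INTENSITY_KEYS and keys & AMPLITUDE_KEYS:
        raise ValueError("input intensities or amplitudes, not both")

    # "I" for intensities, "F" for amplitudes (also the fallback when empty).
    prefix = "I" if keys & INTENSITY_KEYS else "F"
    mean = {prefix, "S" + prefix}
    anomalous = {prefix + s for s in "+-"} | {"S" + prefix + s for s in "+-"}

    if keys == mean:
        return "mean"
    if keys == anomalous:
        return "anomalous"
    raise ValueError("set either %s or %s" % (sorted(mean), sorted(anomalous)))
```

For example, a dictionary holding only F and SF is classified as "mean", while one holding F, SF and F+ is rejected.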

OPTIONAL KEYWORDS:

PIPEline <type> FIRSt <firststep> LAST <laststep>

In gcx, there are six predefined pipelines. (At the moment, all predefined pipelines use PREP to prepare the data, and pipelines 1 to 4 use ARP/wARP + REFMAC for automated model building and refinement).

Setting <type> to 1 gives the following pipeline (which is the default):
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.

Setting <type> to 2 gives the following pipeline:
PREP -> SHELXC -> SHELXD -> SHELXE -> BP3 -> SOLOMON -> ARP/wARP + REFMAC.

Setting <type> to 3 gives a pipeline that George Sheldrick has suggested:
PREP -> SHELXC -> SHELXD -> SHELXE -> ARP/wARP + REFMAC.
The first two SHELXE jobs, followed by SHELXEHAND, try to determine the correct enantiomorph. The third and last SHELXE job attempts to get the best possible phases by running 100 SHELXE cycles.

Setting <type> to 4 gives the following:
PREP -> AFRO -> CRUNCH2 -> BP3 -> DM -> ARP/wARP + REFMAC.

Setting <type> to 5 gives the following:
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> RESOLVEDM -> RESOLVEMB.

Setting <type> to 6 gives the following:
PREP -> AFRO -> CRUNCH2 -> BP3 -> SOLOMON -> PIRATE -> BUCCANEER.

The FIRSt subkeyword of PIPE allows you to optionally start the pipeline at a given step. The following are the possible values of <firststep>.

DETEct: Start at substructure detection (default).
PHASe: Start at substructure phasing (you must input ATOM coordinates).

The LAST subkeyword of PIPE allows you to optionally stop the pipeline after a given step. The following are the possible values of <laststep>.

DETEct: Stop after substructure detection.
PHASe: Stop after substructure phasing.
DM: Stop after density modification.
BUILD: Stop after model building.

If you would like to change any of the program options in a predefined pipeline, after the PIPEline has been set, you can reset a program value using the program keyword (shown below). So, for example, if you want to use PIPEline 1, but would like to set a SIGF and SANO cutoff for Fa estimation, you would do the following:

PIPEline 1
AFRO 2 SIGF 2 SANO 1

Since the PIPEline keyword defines AFRO as the second step, you can access AFRO subkeywords as shown. See the "PROGRAM KEYWORDS" section below for more options that can be modified.
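
To make the step numbering concrete, the following Python sketch tabulates the predefined pipelines listed above and looks up the step number of a program. It assumes that steps are counted from 1 starting at PREP (which matches AFRO being the second step of pipeline 1) and that each keyword is written on its own input line. The names PIPELINES, step_number and override_lines are hypothetical and are not part of gcx.

```python
# Predefined pipelines as listed in this section, keyed by <type>.
PIPELINES = {
    1: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "ARP/wARP + REFMAC"],
    2: ["PREP", "SHELXC", "SHELXD", "SHELXE", "BP3", "SOLOMON", "ARP/wARP + REFMAC"],
    3: ["PREP", "SHELXC", "SHELXD", "SHELXE", "ARP/wARP + REFMAC"],
    4: ["PREP", "AFRO", "CRUNCH2", "BP3", "DM", "ARP/wARP + REFMAC"],
    5: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "RESOLVEDM", "RESOLVEMB"],
    6: ["PREP", "AFRO", "CRUNCH2", "BP3", "SOLOMON", "PIRATE", "BUCCANEER"],
}


def step_number(pipeline_type, program):
    """1-based position of a program in a predefined pipeline (assumed numbering)."""
    return PIPELINES[pipeline_type].index(program) + 1


def override_lines(pipeline_type, program, keyword, options):
    """Build the PIPEline line plus one program keyword line with its options.

    keyword is the gcx program keyword (e.g. "AFRO"); options maps
    subkeywords to values, e.g. {"SIGF": 2, "SANO": 1}.
    """
    step = step_number(pipeline_type, program)
    opts = " ".join("%s %s" % (name, value) for name, value in options.items())
    return ["PIPEline %d" % pipeline_type, "%s %d %s" % (keyword, step, opts)]


if __name__ == "__main__":
    print("\n".join(override_lines(1, "AFRO", "AFRO", {"SIGF": 2, "SANO": 1})))
```

Under these assumptions, BP3 is step 4 in pipeline 1 but step 5 in pipeline 2, so the <step> value has to be chosen for the pipeline in use.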

PROGRAM KEYWORDS:

You can set up a pipeline in crank and change options for a program that is part of the pipeline using the program keywords. The general syntax for changing options for a given program is the following: <programname> <step> <option1> <option2> ... where <programname> is the name of the crank plugin. Currently, there are plugins for the following programs: PREP, AFRO, CRUNch2, BP3, DM, SOLO, PIRA, WARP, SHLC, SHLD, SHLE. <step> is the step number of the program in the crank pipeline, and <option1>, <option2>, ... are the options to be changed for the program at that <step>. Thus, using gcx, a user can construct their own pipeline. However, at the moment there is no checking, so this should be reserved for experts only!

PREP <step>

PREP is the crank plugin for using truncate [3] to calculate structure factor amplitudes from intensities (if necessary) and using scaleit [4] to relatively scale the data sets together (if necessary). At the moment, no options are available to be changed with gcx.

AFRO <step> SIGF <nsigf> SANO <nsano> SISO <nsiso> HIREs <hires>

AFRO is the crank plugin for the AFRO [5] program to calculate FA values needed for substructure detection programs. The subkeyword options for the AFRO keyword are the following:

SIGF <nsigf>: Exclude reflections if FP < <nsigf> * SIGF. The default setting is 2.
SANO <nsano>: Exclude reflections if DANO < <nsano> * SDANO. The recommended setting is 0.5. (DANO = abs(|F+| - |F-|) and SDANO is the standard deviation of the DANO measurement.)
SISO <nsiso>: Exclude reflections if abs(|Fder| - |Fnat|) < <nsiso> * SDISO. The recommended setting is 0.5. SDISO is sqrt(SIGFder^2 + SIGFnat^2).
HIREs <hires>: Specify the high resolution limit. By default, gcx sets the limit to 0.5 Angstroms above the high resolution limit of the data, unless that limit is lower than 2.5, in which case it is set to 3.0 Angstroms.
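
The three exclusion criteria above can be written out as a short Python sketch. This is only an illustration of the inequalities as stated, not code from AFRO; the function names are hypothetical, and the default arguments are the default (SIGF) and recommended (SANO, SISO) values quoted above.

```python
import math


def keep_for_anomalous(fplus, fminus, sdano, fp, sigf, nsigf=2.0, nsano=0.5):
    """Return True if a reflection passes the SIGF and SANO cutoffs.

    fp and sigf are the amplitude and its sigma used for the SIGF test.
    """
    if fp < nsigf * sigf:           # excluded: FP < <nsigf> * SIGF
        return False
    dano = abs(fplus - fminus)      # DANO = abs(|F+| - |F-|)
    return dano >= nsano * sdano    # excluded if DANO < <nsano> * SDANO


def keep_for_isomorphous(fder, sigfder, fnat, sigfnat, nsiso=0.5):
    """Return True if a reflection passes the SISO cutoff."""
    sdiso = math.sqrt(sigfder ** 2 + sigfnat ** 2)
    return abs(fder - fnat) >= nsiso * sdiso


if __name__ == "__main__":
    print(keep_for_anomalous(105.0, 100.0, 4.0, 102.5, 3.0))
    print(keep_for_isomorphous(101.0, 3.0, 100.0, 4.0))
```

Raising <nsano> or <nsiso> removes more reflections with weak differences, at the cost of using fewer reflections for substructure detection.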

CRUNch2 <step> TRIAls <ntrials> PTRIals <pntrials> THREshold <nthreshold> DEVIation <ndeviation> BP3Test <ntest> SPEC

Crunch2 is the crank plugin for the CRUNCH2 [6] substructure detection program using Karle-Hauptman matrices. The following options can be changed within gcx.

TRIAls <ntrials>: Sets the number of trials. The default is 15. For difficult problems with weak signals, increasing the number of trials can lead to a solution if one was not found after 15 trials.
PTRIals <pntrials>: Sets the number of Patterson trials. Before a crunch2 run, a Patterson minimal function is calculated to generate trial solutions for crunch2. The default is to generate 150 starting Patterson trials. Crunch2 ranks all starting Patterson solutions and runs trials on the top <ntrials>.
THREshold <nthreshold>: <nthreshold> is the minimum crunch2 figure of merit to stop the crunch2 run. The default is 1.00.
DEVIation <ndeviation>: <ndeviation> specifies another stopping criterion: Crunch2 is stopped if the highest score of a trial is <ndeviation> times greater than the lowest score. The default value for <ndeviation> is 1.75.
BP3Test <ntest>: <ntest> specifies the number of crunch2 trials to perform before running BP3 to verify if the substructure is correct. If <ntest> is 0, BP3 will not be run between crunch2 trials. The default value for <ntest> is 3.
SPEC: Add the SPEC subkeyword if you think that any of your substructure atoms lie on special positions.
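
The two stopping criteria (THREshold and DEVIation) can be sketched in Python as follows. This is one reading of the descriptions above, not the crank or CRUNCH2 source: it assumes that the "score" of a trial is its figure of merit and that the criteria are checked after each completed trial. The function name is hypothetical.

```python
def should_stop(scores, threshold=1.00, deviation=1.75):
    """Decide whether to stop after the trials whose scores are given.

    scores: figures of merit of the trials completed so far (assumed to be
    the quantity that both THREshold and DEVIation refer to).
    """
    if not scores:
        return False
    best, worst = max(scores), min(scores)
    if best >= threshold:            # THREshold criterion
        return True
    # DEVIation criterion: needs at least two trials to compare.
    return len(scores) > 1 and worst > 0 and best > deviation * worst


if __name__ == "__main__":
    print(should_stop([0.4, 0.5]))   # neither criterion is met
    print(should_stop([0.4, 0.8]))   # 0.8 exceeds 1.75 * 0.4
```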

BP3 <step> STOP <stop> NOHAnd NODIff PHASe

BP3 is the crank plugin for using the substructure phasing program BP3 [7], [8].

STOP <stop>: <stop> is the minimum figure of merit (FOM) required to proceed to a further step.
NOHAnd: Do not generate phases for the other hand.
NODIff: Do not attempt to find and refine additional sites from gradient maps.
PHASe: Perform fast phasing in BP3. This is the default for MAD.

SOLOmon <step> NOHAnd OPTI NODM NOBIas BETA <beta> MARGin <margin> NCYCles <ncyc> HCYCles <hcyc> MLHL DIREct SIGMaa

SOLOmon is the crank plugin for using the density modification program SOLOMON [9].

NOHAnd: Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand by running a few cycles of density modification and seeing which hand has a greater contrast in protein to solvent region.
OPTI: Optimize the solvent content (by running a few cycles of density modification corresponding to different monomers). The default is not to optimize the solvent content.
NODM: Do not run density modification (e.g. if you only wish to use the density modification program to determine the hand or optimize the solvent content).
NOBIas: Do not calculate or apply a bias correction parameter.
BETA <beta>: Use the value <beta> for the bias correction parameter (and do not refine it).
MARGin <margin>: <margin> is the margin by which the selected hand's contrast*(correlation coefficient) must exceed that of the other hand.
NCYCles <ncyc>: Run <ncyc> cycles of density modification.
HCYCles <hcyc>: Determine the hand by looking at the (correlation coefficient)*contrast at cycle <hcyc> of density modification of each hand.
MLHL: Use the MLHL function of multicomb for phase combination. Use only one of the MLHL and DIREct keywords.
DIREct: Use the DIREct function (for SAD or SIRAS data) of multicomb for phase combination.
SIGMaa: Use the SIGMAA program from CCP4 for phase combination.

DM <step> NOHAnd NODM NOHIst

DM is the crank plugin for using the density modification program DM [10].

NOHAnd: Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand by running a few cycles of density modification and seeing which hand has a greater contrast in protein to solvent region.
NODM: Do not run density modification (e.g. if you only wish to use the density modification program to determine the hand or optimize the solvent content).
NOHIst: Do not use histogram matching in density modification (use this if you have a metal cluster).

SHLC <step> HIREs <hires>

SHLC is the crank plugin for the SHELXC [11] program to generate files (including FA values) needed for SHELXD and SHELXE. The subkeyword options for the SHLC keyword are the following:

HIREs <hires>: Specify the high resolution limit. gcx defaults to set the limit to 0.5 Angstroms above the high resolution limit - unless the high resolution limit is lower than 2.5, in which case the limit is set to the lower of 3.0 Angstroms or the high resolution limit of the data.

SHLD <step> TRIAls <ntrials> MIND <mind> MDEQ <mdeq> THREshold <nthreshold> DSUL <ndsul>

SHLD is the crank plugin for the SHELXD [12] program to determine substructures.

TRIAls <ntrials>: Sets the number of trials. The default is 500, but for more difficult problems with a weak signal, setting <ntrials> to a greater value (like 1000) can lead to a solution, when a solution was not found with 500.
MIND <mind>: Set the minimum distance between substructure atoms. The crank default value is 3.5.
MDEQ <mdeq>: Set the minimum distance allowed between symmetry equivalents. The crank default value is -0.1 (ie. allowing for special positions).
THREshold <nthreshold>: <nthreshold> is the minimum "weak" correlation coefficient needed to stop the shelxd run. The default is 30.
DSUL <ndsul>: <ndsul> is the number of disulfide bridges expected in the structure. The default is 0.

SHLE <step> NOHAnd OPTI NODM FREE <reso> NCYCles <ncyc> HCYCles <hcyc>

SHLE is the crank plugin for using the density modification program SHELXE [13].

NOHAnd: Do not attempt to determine the correct enantiomorph. The default is to determine the correct hand by running a few cycles of density modification and seeing which hand has a greater contrast in protein to solvent region.
OPTI: Optimize the solvent content (by running a few cycles of density modification corresponding to different monomers). The default is not to optimize the solvent content.
NODM: Do not run density modification (e.g. if you only wish to use the density modification program to determine the hand or optimize the solvent content).
FREE <reso>: Use the "free lunch" algorithm and extend the resolution of your data to the value <reso>. If no value is given for <reso>, the highest resolution limit minus 0.5 is used. The default is not to use the free lunch algorithm.
NCYCles <ncyc>: Run <ncyc> cycles of density modification.
HCYCles <hcyc>: Determine the hand by looking at the contrast at cycle <hcyc> of density modification of each hand.

WARP <step> BIG <nbig> SMAL <nsmall> DOCK <ndock> CYCL <ncyc> TARG <targ> NOREstrain NOLOop TWIN

WARP is the crank plugin for automated model building and iterative refinement with ARP/wARP [14] and REFMAC5 [15]. The following subkeywords can be modified for this step.

BIG <nbig>

<nbig> is the number of ARP/wARP "big" cycles or "cycles of autobuilding" given in the ARP/wARP ccp4i interface. The default is 10.

SMAL <nsmall>

<nsmall> is the number of ARP/wARP "small" cycles. The "total cycles" as defined in the ARP/wARP ccp4i interface is <nbig> * <nsmall>. The default is 5.

DOCK <ndock>

<ndock> is the number of ARP/wARP cycles of autobuilding to perform before sequence docking (if a sequence file is given). The default is 7.

CYCL <ncyc>

<ncyc> is the number of cycles of refinement in each REFMAC run. The default is 3.

TARG <targ>

<targ> is the name of the target function to use in REFMAC refinement. The choices are RICE, which uses no prior phase information, MLHL [16], which uses phase information encoded by Hendrickson-Lattman coefficients, or SAD [17], which uses phase information directly from Bijvoet differences.

NOREstrain

Do not use conditional dynamics. The default is to use it.

NOLOop

Do not use loop building. The default is to use it.

TWIN

Use twin refinement in REFMAC. At the moment, this is only available with the TARGets MLHL and RICE. The default is to use it.

PIRAte <step> NCYCles <ncyc> INWEight

PIRAte is the crank plugin for using the density modification program PIRATE [18].

NCYCles: <ncyc> is the number of cycles of density modification to perform. The default is 3.
INWEight: The weight to apply to the input phases/Hendrickson-Lattman coefficients. The default is 0.75.

BUCC <step> NCYCles <ncyc> FAST NORFree RELIability <nreli>

BUCCaneer is the crank plugin for using the model building program BUCCANEER [19].

NCYCles: <ncyc> is the number of cycles of building. The default is 3.
FAST: Use FAST building (ie. no recycling with REFMAC). The default is to recycle with REFMAC.
NORFree: Do not use the free R factor. The default is to use it.
RELIability: <nreli> is the reliability of sequence docking as a fraction. The default is 0.95.

NMON <nmon>

<nmon> is the number of monomers expected. If it is not set, the program will calculate the most probable number by performing a Kantardjieff - Rupp probability analysis of the Matthews coefficient.

SOLV <solv>

<solv> is the solvent content. If it is not set, the program will calculate the most probable number of monomers by performing a Kantardjieff - Rupp probability analysis of the Matthews coefficient and set the solvent content based on this. If it is set, and the number differs by 0.25 from that corresponding to NMON (if set) or the most probable NMON, an error will result.

OBFA <bfac>

<bfac> is an estimate of the overall B-factor. If it is not set, the program will calculate it based on a maximum likelihood analysis.

OUTPut <outputname>

<outputname> is the string associated with the XML file that gcx writes out. The default <outputname> is "crank".

VERBose <n>

Specify the amount of information to be output (where n is a non-negative integer). n = 0 is the normal output, n = 1 is more output and n = 2 is for debugging purposes. Default: n = 0.

EXAMPLES

Please look in the examples directory!

REFERENCES

[1] Kantardjieff, K.A. and Rupp, B. (2003) Protein Science, 12, 1865-1871.

[2] Matthews, B.W. (1968) J. Mol. Biol., 33, 491-497.

[3] French, G.S. and Wilson, K.S. (1978) Acta Cryst., A34, 517-534.

[4] Evans, P.R., Dodson, E.J. and Dodson R. (unpublished).

[5] Pannu, N.S. (unpublished).

[6] de Graaff, R.A.G., Hilge, M., van der Plas, J.L. and Abrahams, J.P. (2001) Acta Cryst., D57, 1857-1862.

[7] Pannu, N.S. and Read, R.J. (2004) Acta Cryst., D60, 22-27.

[8] Pannu, N.S., McCoy, A.J. and Read, R.J. (2003) Acta Cryst., D59, 1801-1808.

[9] Abrahams, J.P. and Leslie, A.J.W. (1996) Acta Cryst., D52, 30-42.

[10] Cowtan, K.D. (1994) Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, 34-38.

[11] Sheldrick, G.M. http://shelx.uni-ac.gwdg.de/SHELX/

[12] Schneider T.R., Sheldrick G.M. (2002) Acta Cryst., D58, 1772-1779.

[13] Sheldrick, G.M. (2002) Z. Kristallogr., 217, 644-650.

[14] Perrakis, A., Morris, R.M. and Lamzin, V.S. (1999) Nat Struct Biol., 6, 458-463.

[15] Murshudov, G.N., Vagin, A.A. and Dodson, E.J. (1997) Acta Cryst., D53, 240-255.

[16] Pannu, N.S., Murshudov, G.N., Dodson, E.J. and Read, R.J. (1998) Acta Cryst., D54, 1285-1294.

[17] Skubak, P., Murshudov, G.N. and Pannu, N.S. (2004) Acta Cryst., D60, 2196-2201.

[18] Cowtan, K. (2000) Acta Cryst., D56, 1612-1621.

[19] Cowtan, K. (2006) Acta Cryst., D62, 1002-1011.

[20] Terwilliger, T.C. (2000) Acta Cryst., D56, 965-972.

[21] Terwilliger, T.C. (2003) Acta Cryst., D59, 38-44.

Last modified: Thu May 4 11:32:32 CEST 2006