AIMLESS (CCP4: Supported Program)

NAME

aimless
 - scale together multiple observations of reflections

SYNOPSIS

aimless HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]

References
Input and Output files
Release Notes

DESCRIPTION

Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
Data from Denzo
Datasets

This program scales together multiple observations of reflections, and merges multiple observations into an average intensity. It is a successor program to SCALA.

Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of Phi (the rotation angle ROT), or expressed as BATCH (image) number (strongly deprecated). In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction expanded as spherical harmonics. The secondary beam correction is related to the absorption anisotropy correction described by Blessing (1995).

The merging algorithm analyses the data for outliers, and gives detailed analyses. It generates a weighted mean of the observations of the same reflection, after rejecting the outliers.

The program does several passes through the data:

  1. initial estimate of the scales
  2. first round scale refinement, using strong data selected with an I/sigma(I) cutoff
  3. first round of outlier rejection
  4. if both summation and profile-fitted intensity estimates are present (from Mosflm), then the cross-over point is determined between using profile-fitted for weak data and summation for strong data.
  5. first analysis pass to refine the "corrections" to the standard deviation estimates
  6. final round scale refinement, using strong data within limits on the normalised intensity |E|^2
  7. final analysis pass to refine the "corrections" to the standard deviation estimates
  8. final outlier rejections
  9. a final pass to apply scales, analyse agreement & write the output file, usually with merged intensities, but alternatively as a file with scaled but unmerged observations, with partials summed and outliers rejected, for each dataset

Anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv).

Scaling options

The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help. For most purposes, my normal recommendation is the default:

  scales rotation spacing 5 secondary bfactor on brotation spacing 20

Other hints:
  1. Only use the SCALE BATCH option if every image is different from every other one, i.e. off-line detectors (including film), or rapidly or discontinuously changing incident beam flux. This is rarely the case for synchrotron data, but is appropriate for serial data (eg XFEL). This mode may be VERY slow if there are many batches.
  2. If there is a discontinuity between one set of images and another (e.g. change of exposure time), then flag them as different RUNs. This will be done automatically if no runs are specified.
  3. The SECONDARY correction is recommended and is the default: this provides a correction for absorption. It should always be restrained with a TIE SURFACE command (this is the default): with this restraint it is reasonably stable under most conditions. The ABSORPTION (crystal frame) correction is similar to SECONDARY (camera frame) in most cases, but may be preferable if data have been collected from multiple alignments of the same crystal.
  4. Use a B-factor correction unless the data are only very low-resolution. Traditionally, the relative B-factor is a correction for radiation damage (hence it is a function of time), but it also includes some other corrections eg absorption.
  5. When trying out more complex scaling options, it is a good idea to try a simple scaling first, to check that the more elaborate model gives a real improvement.
  6. When scaling multiple MAD data sets they should all be scaled together in one pass, outliers rejected across all datasets, then each wavelength merged separately. This is the default if multiple datasets are present in the input file.
Other options are described in greater detail under the KEYWORDS.

Control of flow through the program

The ONLYMERGE flag skips the scaling (usually in conjunction with RESTORE to read in previously determined scales), calculates statistics and outputs the data.

Partially recorded reflections

See appendix 1

The different options for the treatment of partials are set by  the PARTIALS command. Partials may either be summed or scaled : in the latter case, each part is treated independently of the others.

Summed partials [default]:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

Scaled partials:
In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5]. This only works well if the calculated fractions are accurate, which is not usually the case.

Scaling algorithm

See appendix 2

Data from Denzo

Data integrated with Denzo may be scaled and merged with Aimless as an alternative to Scalepack, or unmerged output from scalepack may be used. Both have some limitations. See appendix 3 for more details. 

Datasets

TBD

KEYWORDED INPUT - DESCRIPTION

In the definitions below "[]" encloses optional items, "|" delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after "!" or "#" is treated as comment. The available keywords are:

ANALYSIS, ANOMALOUS, BINS, DUMP, EXCLUDE, HKLIN, HKLOUT, HKLREF, INITIAL, INTENSITIES, LABREF, LINKNAME, ONLYMERGE, OUTPUT, PARTIALS, REFINE, REJECT, RESOLUTION, RESTORE, RUN, SCALES, SDCORRECTION, TIE, TITLE, UNLINK, USESDPARAMETER, XMLOUT, XYZIN

RUN <Nrun>  BATCH <b1> to <b2>

Define a "run" : Nrun is the Run number, with an arbitrary integer label (i.e. not necessarily 1,2,3 etc). A "run" defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The definition of a run may use several RUN commands. If no RUN command is given then run assignment will be done automatically, with run breaks at discontinuities in dataset, batch number or Phi. If any RUN definitions are given, then all batches not explicitly specified will be excluded.

SCALES [<subkeys>]

Define layout of scales, ie the scaling model. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.

Subkeys:
RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs
ROTATION <Nscales> | SPACING <delta_rotation>
Define layout of scale factors along rotation axis (i.e. primary beam), either as number of scales or (if SPACING keyword present) as interval on rotation [default SPACING 5]
BATCH
Set "Batch" mode, no interpolation along rotation (primary) axis. This option is compulsory if a ROT column is not present in the input file, but otherwise the ROTATION option is preferred. WARNING: this option is not optimised and may take a very long time if you have many batches
BFACTOR ON | OFF 
Switch Bfactors on or off. The default is ON.
BROTATION <Ntime> | SPACING <delta_time>
Define number of B-factors or (if SPACING keyword present) the interval on "time": usually no time is defined in the input file, and the rotation angle is used as its proxy [default SPACING 20].
SECONDARY [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the camera spindle frame. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6, default 4). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. Set Lmax = 0 to switch off
ABSORPTION [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the crystal frame based on POLE (qv). The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6, default 4). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. This is not substantially different from SECONDARY in most cases, but may be preferred if data are collected from multiple settings of the same crystal, and you want to use the same absorption surface. This would only be strictly valid if the beam is larger than the crystal.
POLE <h|k|l>
Define the polar axis for ABSORPTION or SURFACE as h, k or l (eg POLE L): the pole will default to either the closest axis to the spindle (if known), or l (k for monoclinic space-groups).
CONSTANT
One scale for each run (equivalent to ROTATION 1)
TILE <NtileX> <NtileY> [CCD]
Define a detector scale for each tile. Currently this implements a scale model for 3x3 tiled CCD detectors to correct for the underestimation of intensities in the corners of the tile, see Appendix 2. If the detector appears to be a 3x3 CCD (3072x3072 pixels) then this correction will be activated automatically unless the NOTILE keyword is given. The parameters are restrained using the TIE TILE parameters (qv)
NOTILE
Switch off the automatic TILE 3 3 correction for CCD detectors

SDCORRECTION [[NO]REFINE]   [INDIVIDUAL | SAME]  [FIXSDB]

[RUN <RunNumber>] [FULL | PARTIAL] <SdFac> [<SdB>] <SdAdd> [DAMP <dampfactor>]

[SIMILAR [<sd1> <sd2> <sd3>]] ||

[[NO]TIE  SdFac | SdB | SdAdd <targetvalue> <SDtarget>]

[SAMPLESD]

Input or set options for the "corrections" to the input standard deviations: these are modified to
        sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}

where Ihl is the intensity (SdB may be omitted in the input).
The default is "SDCORRECTION REFINE INDIVIDUAL". If explicit values are given, the default changes to NOREFINE.
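The correction formula above can be tried out numerically. The following is only a minimal Python sketch of the stated expression, not the program's own code: the function name and argument names are made up for illustration, and clamping a negative variance to zero is an assumption of this sketch.

```python
import math

def corrected_sd(sd_i, i_hl, sd_fac, sd_add, sd_b=0.0):
    """Apply sd' = SdFac * sqrt(sd^2 + SdB*Ihl + (SdAdd*Ihl)^2) to one observation.

    sd_b defaults to 0.0, mirroring the fact that SdB may be omitted on input.
    """
    variance = sd_i ** 2 + sd_b * i_hl + (sd_add * i_hl) ** 2
    # A negative SdB*Ihl term could make the sum negative; clamping to zero
    # is a choice made for this sketch only (hypothetical behaviour).
    return sd_fac * math.sqrt(max(variance, 0.0))

# A strong observation: the (SdAdd*Ihl)^2 term dominates the corrected sd
strong = corrected_sd(sd_i=10.0, i_hl=1000.0, sd_fac=1.4, sd_add=0.05)
# A very weak observation: the correction reduces to SdFac * sd
weak = corrected_sd(sd_i=3.0, i_hl=0.0, sd_fac=1.4, sd_add=0.05)
print(strong, weak)
```

The sketch shows why SdAdd mainly affects strong reflections, while SdFac rescales all standard deviations uniformly.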

The keyword REFINE controls refinement of the correction parameters, essentially trying to make the SD of the distribution of fractional deviations (Ihl - <I>)/sigma equal to 1.0 over all intensity ranges. The residual minimised is Sum( w * (1 - SD)^2) + Restraint Residual.

SAMPLESD is intended for very high multiplicity data such as XFEL serial data. The final SDs are estimated from the weighted population variance, assuming that the input sigma(I)^2 values are proportional to the true errors. This probably gives a more realistic estimate of the error in <I>. In this case refinement of the corrections is switched off unless explicitly requested.

Other subkeys control what values are determined and used for each run (if more than one). TIE and SIMILAR are mutually exclusive

RUN <run_number>
Define the run for which values are given: the run must have been previously defined. If no run is defined, it applies to all runs. Different values may be specified for fully recorded reflections (FULL) and for partially recorded reflections (PARTIAL), or the same values may be used for both if one set is given, e.g.

         sdcorrection full 1.4 0.11 part 1.4 0.05

USESDPARAMETER  [NO | DIAGONAL | COVARIANCE]

For the final estimation of intensity errors sd(I), incorporate the estimated error in the refined scale model parameters, as estimated from the inverse normal matrix in the scale refinement. The default is DIAGONAL if this keyword is omitted, or given with no sub-keyword. "NO" switches it off. The DIAGONAL option uses the separate parameter variances, ie the diagonal of the variance/covariance matrix. COVARIANCE uses the full matrix, which is slower but may be more accurate.

The variance/covariance matrix [V] = Sum(wD^2)/(m-n) [H]^-1, where [H] is the normal (Hessian) matrix, Sum(wD^2) is the minimised residual, m the number of observations, and n the number of parameters.

The scaled intensity I'hl  =   Ihl/ghl  where ghl is its inverse scale factor

    Var(I')/I'^2 = Var(I)/I^2 + Var(g)/g^2          ie Var(I') = (1/g^2) [ Var(I) + I'^2 Var(g) ]
  
    Var(g) = [dg/dp]T [V] [dg/dp]      (COVARIANCE option)   where  dg/dp is the vector of partial derivatives with respect to parameters p

    DIAGONAL approximation: Var(g) = Sum(i) { [dg/dp(i)]^2 V(i,i) }    ie summed over parameters i
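As an illustration of the error propagation above, here is a small Python sketch comparing the DIAGONAL and COVARIANCE estimates of Var(g) and applying them to a scaled intensity. The function names, the two-parameter matrix and the derivative vector are invented for the example; only the formulae are taken from the text.

```python
def var_g_diagonal(dg_dp, v):
    """DIAGONAL approximation: Var(g) = Sum(i) [dg/dp(i)]^2 * V(i,i)."""
    return sum(d * d * v[i][i] for i, d in enumerate(dg_dp))

def var_g_covariance(dg_dp, v):
    """COVARIANCE option: Var(g) = [dg/dp]T [V] [dg/dp], using the full matrix."""
    n = len(dg_dp)
    return sum(dg_dp[i] * v[i][j] * dg_dp[j] for i in range(n) for j in range(n))

def scaled_variance(i_obs, var_i, g, var_g):
    """Var(I') = (1/g^2) * [Var(I) + I'^2 * Var(g)], with I' = I/g."""
    i_scaled = i_obs / g
    return (var_i + i_scaled ** 2 * var_g) / g ** 2

# Hypothetical 2-parameter variance/covariance matrix and derivatives
v = [[0.010, 0.002],
     [0.002, 0.040]]
dg_dp = [1.0, 0.5]

vg_diag = var_g_diagonal(dg_dp, v)    # ignores the off-diagonal terms
vg_full = var_g_covariance(dg_dp, v)  # includes the parameter correlations
print(vg_diag, vg_full)
print(scaled_variance(200.0, 100.0, 2.0, vg_diag))
```

The two estimates differ only by the off-diagonal contributions, so they agree when the refined parameters are uncorrelated.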

PARTIALS [[NO]CHECK] [TEST [<lower_limit> <upper_limit>]] [CORRECT <minimum_fraction>] [[NO]GAP [<maxgap>]]

Set criteria for accepting complete or incomplete partials. Default is CHECK TEST 0.95 1.05 CORRECT 0.95 NOGAP

After all parts have been assembled, the total observation is accepted if:-

  1. the CHECK flag is set [default] and the MPART flags (if present) are all consistent (these flags indicate that a set of parts is eg 1 of 3, 2 of 3, 3 of 3)

  2. if CHECK fails, then the total fraction is checked to lie between lower_limit & upper_limit [default 0.95, 1.05]

  3. if this fails, then the incomplete partial is scaled up by the total fraction if it is > minimum_fraction [default 0.95] (NB Pointless has a different default for a different purpose)

  4. a reflection that has a gap in the middle may be accepted if GAP is set; maxgap is the maximum number of missing slots [not recommended: default 1 if GAP is set]
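The acceptance sequence for summed partials (steps 1 to 3 above) can be sketched as a small decision function. This is a hedged Python illustration only: the function and argument names are hypothetical, the GAP test is left out, and "scaled up by the total fraction" is interpreted here as division by the total fraction, which is an assumption.

```python
def accept_summed_partial(total_i, total_fraction, mpart_consistent,
                          check=True, lower=0.95, upper=1.05,
                          min_fraction=0.95):
    """Return (accepted, intensity) for one assembled set of parts.

    mpart_consistent: True if MPART flags are present and consistent.
    """
    # Step 1: CHECK flag set and MPART flags all consistent
    if check and mpart_consistent:
        return True, total_i
    # Step 2: TEST, total fraction within [lower_limit, upper_limit]
    if lower <= total_fraction <= upper:
        return True, total_i
    # Step 3: CORRECT, scale the incomplete partial by its total fraction
    # (assumed here to mean dividing the summed intensity by the fraction)
    if total_fraction > min_fraction:
        return True, total_i / total_fraction
    # Otherwise the observation fails the checks
    return False, None

print(accept_summed_partial(100.0, 1.00, False))                   # passes TEST
print(accept_summed_partial(90.0, 0.90, False, min_fraction=0.8))  # corrected
print(accept_summed_partial(50.0, 0.50, False))                    # rejected
```

Counting how many observations fall through to the final branch corresponds to the number of reflections failing the checks that the program prints.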

INITIAL UNITY | MEAN

Set initial scale factors either based on mean intensities (MEAN, default) or all set to 1.0 (UNITY)

INTENSITIES [SUMMATION | PROFILE | COMBINE [<Imid>] [POWER <Ipower>]]

Set which intensity to use, the integrated intensity (column I) or the profile-fitted intensity (column IPR), if both are present. This applies to all stages of the program, scaling & averaging. Mosflm produces two different estimates of the intensity, from summation integration and from profile fitting. Generally the profile-fitted estimate is better, but for the strongest reflections the summation value is often better. The default is to use a weighted mean, depending on the "raw" intensity ie before LP correction (COMBINE option), and to optimise automatically the switch-over point Imid, to give the best overall Rmeas.

Subkeys:
SUMMATION
use summation integrated intensity Isum.
PROFILE
use profile-fitted intensity Ipr.
COMBINE [<Imid>] [POWER <Ipower>]
Use weighted mean of profile-fitted & integrated intensity, profile-fitted for weak data, summation integration value for strong. If no value is given for Imid, it will be automatically optimised.

          I = w*Ipr + (1-w)*Isum
          w = 1/(1 + (Iraw/Imid)**Ipower)

Ipower defaults to 3.
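The COMBINE weighting can be illustrated with a few lines of Python. This is a sketch of the two formulae above under stated assumptions, not the program's implementation: the function names are hypothetical, and treating a negative raw intensity as zero (so that weak data use the profile-fitted value) is a choice made for this example.

```python
def combine_weight(i_raw, i_mid, i_power=3.0):
    """w = 1 / (1 + (Iraw/Imid)**Ipower); Ipower defaults to 3."""
    # Negative raw intensities are clamped to zero here (sketch assumption)
    ratio = max(i_raw, 0.0) / i_mid
    return 1.0 / (1.0 + ratio ** i_power)

def combined_intensity(i_pr, i_sum, i_raw, i_mid, i_power=3.0):
    """I = w*Ipr + (1-w)*Isum."""
    w = combine_weight(i_raw, i_mid, i_power)
    return w * i_pr + (1.0 - w) * i_sum

i_mid = 100.0  # hypothetical switch-over point
for i_raw in (0.0, 50.0, 100.0, 200.0, 1000.0):
    # w falls from 1 (pure profile-fitted) towards 0 (pure summation)
    print(i_raw, round(combine_weight(i_raw, i_mid), 4))
```

At Iraw = Imid the two estimates are weighted equally, and a larger Ipower makes the switch-over between them sharper.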

REJECT
[SCALE | MERGE] [COMBINE] [SEPARATE]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]
[EMAX <Emax>]
[BATCH <batchrejectfactor>]

Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging passes. If neither SCALE nor MERGE are specified, the same values are used for both stages. The default values are REJECT 6 ALL -8, ie test within I+ or I- sets on 6sigma, between I+ & I- with a threshold adjusted upwards from 8sigma according to the strength of the anomalous signal. The adjustment of the ALL test is not necessarily reliable.

If there are multiple datasets, by default, deviation calculations include data from all datasets [COMBINE]. The SEPARATE flag means that outlier rejections are done only between observations from the same dataset. The usual case of multiple datasets is MAD data.

If ANOMALOUS ON is set, then the main outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged "@" in the ROGUES file.

REJECT BATCH <batchrejectfactor> is intended for batch scaling of eg XFEL data. After the initial scales are calculated, very weak batches with scale factors < batchrejectfactor x median scale are rejected.

Subkeys:
SEPARATE
rejection & deviation calculations only between observations from the same dataset
COMBINE
rejection & deviation calculations are done with all datasets [default]
SCALE
use these values for the scaling pass
MERGE
use these values for the merging (FINAL) pass
sdrej
sd multiplier for maximum deviation from weighted mean I [default 6.0]
[sdrej2]
special value for reflections measured twice [default = sdrej]
ALL
check outliers in merging step between as well as within I+ & I- sets (not relevant if ANOMALOUS OFF). A negative value [default -8] means adjust the value upwards according to the slope of the normal probability analysis of anomalous differences (AnomPlot)
sdrej+-
sd multiplier for maximum deviation from weighted mean I including all I+ & I- observations (not relevant if ANOMALOUS OFF)
[sdrej2+-]
special value for reflections measured twice [default = sdrej+-]
KEEP
in merging, if two observations disagree, keep both of them [default]
REJECT
in merging, if two observations disagree, reject both of them
LARGER
in merging, if two observations disagree, reject the larger
SMALLER
in merging, if two observations disagree, reject the smaller
EMAX
maximum acceptable value for E = normalised |F| [default = 10.0 for acentrics]

The test for outliers is described in Appendix 4

ANOMALOUS [OFF] [ON]

OFF [default]
no anomalous used, I+ & I- observations averaged together in merging
ON
separate anomalous observations in the final output pass, for statistics & merging: this is also selected by the keyword ANOMALOUS on its own

RESOLUTION [RUN <RunNumber>] [[LOW] <Resmin>] [[HIGH] <Resmax>]

Set resolution limits in Angstrom, in either order, optionally for individual datasets. The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If a RUN is specified this limit applies only to that run: this may override a previous general limit for all runs, and may be used with automatic run generation. [Default use all data]

TITLE <new title>

Set new title to replace the one taken from the input file. By default, the title is copied from hklin to hklout

ANALYSIS [CONE <angle>] [CCMINIMUM <MinimumHalfdatasetCC>] [CCANOMMINIMUM <MinimumHalfdatasetAnomCC>] [ISIGMINIMUM <MinimumIoverSigma>] [BATCHISIGMINIMUM <MinimumBatchIoverSigma>]

Specify analysis parameters:

CONE specifies the half-angle (degrees) for cones around each reciprocal axis, for anisotropy analysis [default 20°].

CCMINIMUM & ISIGMINIMUM specify thresholds for estimation of suitable maximum resolution limits, both overall and along each reciprocal axis. These estimates are printed in the final Results summary, and give a guide to possible cut-offs. BATCHISIGMINIMUM gives the threshold for the analysis of maximum resolution by batch, on <I/sd> before averaging. CCANOMMINIMUM is the threshold for analysis of the resolution limit of strong anomalous differences, from CC(1/2)anom.

Resolution estimates from CC(1/2) and CC(1/2)anom are done by fitting a function (1/2)(1 - tanh(z)) where z = (s - d0)/r, s = 1/d^2, and d0 is the value of s for which the function = 0.5, and r controls the steepness of falloff. For very negative CCs (usually from CCanom), an additional offset parameter dcc is added, giving (1/2)(1 - tanh(z)) * dcc - dcc + 1. The fitted function is plotted along with the values. This curve-fitting was suggested by Ed Pozharski.
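The shape of the fitted curve can be explored with a short Python sketch. It evaluates the function as described above (with dcc = 1 reducing to the basic form) and inverts the basic form to find the resolution at which the curve crosses a chosen threshold. The function names and the example parameter values are hypothetical, and the inversion is a derived illustration of how a threshold might be applied, not a description of the program's code.

```python
import math

def cc_model(d, d0, r, dcc=1.0):
    """Fitted CC(1/2) curve at resolution d (Angstrom).

    z = (s - d0)/r with s = 1/d^2; value = (1/2)(1 - tanh(z)) * dcc - dcc + 1.
    With dcc = 1 this is the basic form (1/2)(1 - tanh(z)).
    """
    s = 1.0 / d ** 2
    z = (s - d0) / r
    return 0.5 * (1.0 - math.tanh(z)) * dcc - dcc + 1.0

def resolution_at(threshold, d0, r):
    """Resolution d where the basic curve (dcc = 1) equals threshold.

    Solves (1/2)(1 - tanh(z)) = threshold for s, valid for 0 < threshold < 1
    and only when the resulting s is positive.
    """
    z = math.atanh(1.0 - 2.0 * threshold)
    s = d0 + r * z
    return 1.0 / math.sqrt(s)

d0, r = 0.25, 0.05   # hypothetical fit: midpoint at s = 0.25, i.e. d = 2.0 A
print(cc_model(2.0, d0, r))          # the midpoint of the basic curve
print(resolution_at(0.3, d0, r))     # resolution where the curve drops to 0.3
print(cc_model(0.5, d0, r, dcc=1.2)) # high-resolution limit is 1 - dcc
```

With dcc greater than 1 the curve tends to the negative value 1 - dcc at high resolution, which is what allows very negative CCanom values to be fitted.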

ONLYMERGE

Only do the merge step, no initial analysis, no scaling. If RESTORE is also given, the SDCORRECTION optimising will also be skipped.

DUMP [<Scale_file_name>]

Dump all scale factors to a file after the main scaling. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line.

RESTORE [<Scale_file_name>]

Read scales and SDcorrection parameters from a SCALES file from a previous run of Aimless (see DUMP). 

REFINE  [CYCLES <Ncycle>] [BFGS | FH] [SELECT <IovSDmin> <E2min> [<E2max>]]

[PARALLEL [AUTO] | <Nprocessors> | <Fractionprocessors>]

Define number of refinement cycles Ncycle and method for scale refinement.

    BFGS use BFGS optimisation (usual method)
    FH      use Fox-Holmes least-squares algorithm (not recommended)

SELECT  define selection limits for the two rounds of scaling. If unset, suitable values will be chosen automatically
PARALLEL  use multiple processors for the scale refinement steps, if available. This produces some speed-up for very large jobs.
For this option to be available, the program must be compiled and linked with the "-fopenmp" option, and the environment variable OMP_NUM_THREADS must be set to the maximum number of threads allowed by the system

EXCLUDE BATCH <batch range> | <batch list>

BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
Define a list of batches, or a range of batches, to be excluded altogether.

TIE [SURFACE <Sd_srf>] [BFACTOR <Sd_bfac>] [ZEROB <Sd_zerob>] [ROTATION <Sd_z>] [TILE <Sd1-5>] [TARGETTILE <r0> <w0>]

Apply or remove restraints to parameters. These can be pairs of neighbouring scale factors on rotation axis (ROTATION = primary beam) to have the same value, or neighbouring Bfactors, or surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.001. A negative SD value indicates no tie.

SURFACE: tie surface parameters to spherical surface [default is TIE SURFACE 0.001]
BFACTOR: tie Bfactors along rotation
ZEROB: tie all B-factors to zero
ROTATION: tie parameters along rotation axis (mainly useful with BATCH mode)
TILE: tie the CCD tile parameters. 5 SDs for radius r, width w, amplitude A, centre x0,y0, and Fourier coefficients
TARGETTILE: target values for tile parameters r and w

OUTPUT [MTZ] [NO]MERGED [UNMERGED [SPLIT|TOGETHER]] [SCALEPACK [MERGED | UNMERGED]]

Control what goes in the output file. Two types of output files may be produced, either in MTZ format or in Scalepack format: (a) MERGED (or AVERAGE), average intensity for each hkl (I+ & I-) (b) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected. Up to four types of files may be created at the same time: UNMERGED filenames are created from the HKLOUT filename (with dataset appended if there are multiple datasets) with the string "_unmerged" appended. If there are multiple datasets, by default MTZ files,  merged or unmerged, are  split into separate files (SPLIT). Unmerged MTZ files may optionally include all datasets if the keyword TOGETHER qualifies UNMERGED.

The default is to create a merged MTZ file for each dataset.

File format options:
NONE
no output file written
MERGED or AVERAGE
[default] output averaged intensities, <I+> & <I-> for each hkl
UNMERGED
apply scales, sum or scale partials, reject outliers, but do not average observations
SCALEPACK or POLISH
Write reflections to a formatted file in a format as written by "scalepack" (or my best approximation to it). If the UNMERGED option is also selected, then the output matches the scalepack "output nomerge original index", otherwise it is the "normal" scalepack output, with either I, sigI or I+ sigI+, I-, sigI-, depending on the "anomalous" flag.

KEEP [OVERLOADS|BGRATIO <bgratio_max>|PKRATIO <pkratio_max>|GRADIENT <bg_gradient_max>|EDGE] 

Set options to accept observations flagged as rejected by the FLAG column from Mosflm. By default, any observation with FLAG .ne. 0 is rejected. Flagged reflections which are accepted may be marked in the ROGUES file.

Subkeys:
OVERLOADS
Accept profile-fitted overloads
BGRATIO
Observations are flagged in Mosflm if the ratio of rms background deviation relative to its expected value from counting statistics is too large. This option accepts observations if bgratio < bgratio_max [default in Mosflm 3.0]
PKRATIO
Accept observations with peak fitting rms/sd ratio pkratio < pkratio_max [default maximum in Mosflm 3.5]. Only set for fully recorded observations
GRADIENT
Accept observations with background gradient < bg_gradient_max [default in Mosflm 0.03].
EDGE
Accept profile-fitted observations on edge of active area of detector

LINK [SURFACE] ALL | <run_2> TO <run_1> 

run_2 will use the same SURFACE (SECONDARY or ABSORPTION) parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilize the parameters. The keyword ALL will be assumed if omitted.

UNLINK [SURFACE] ALL | <run_2> TO <run_1>

Remove links set by LINK command (or by default). The keyword ALL will be assumed if omitted

BINS [RESOLUTION] <Nsbins> INTENSITY <Nibins>

Define number of resolution and intensity bins for analysis [default 10]

SMOOTHING <subkeyword> <value> NOT YET DONE

Set smoothing factors ("variances" of weights). A larger "variance" leads to greater smoothing

Subkeys:
TIME <Vt>
smoothing of B-factors [default 0.5]
ROTATION <Vz>
smoothing of scale along rotation [default 1.0]
PROB_LIMIT <DelMax_t> <DelMax_z> <DelMax_xy>
maximum values of normalized squared deviation (del**2/V) to include a scale [default set automatically, typically 3]

NAME  PROJECT <project_name> CRYSTAL <crystal_name> DATASET <dataset_name>

Assign or reassign project/crystal/dataset names, for output file. The names given here supersede those in the input file and redefine the single output dataset.
Note that these names apply to all data: if multiple datasets are required, these must be specified in Pointless. DATASET must be present, and may optionally be given in the syntax crystal_name/dataset_name

BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name>  NOT YET DONE

If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if wavelength is unknown). Typically, the CRYSTAL keyword may be omitted.

HKLIN <input file name>

Filename for the main input file, as an alternative to specifying it on the command line.

HKLOUT <output file name>

Filename for the output file, as an alternative to specifying it on the command line.

XMLOUT <output XML file name>

Filename for the XML output file, as an alternative to specifying it on the command line.

HKLREF <reference file name>

Filename for a reference reflection file, as an alternative to specifying it on the command line. This file is used to provide a "best" estimate of intensity, and the observed data are analysed for their agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. This reference data could for example be calculated from the best current model, eg the FC_ALL_LS column from Refmac. Column labels may be specified with the LABREF command. Amplitudes are squared to intensities, and intensities are scaled to the merged observations with a scale and an anisotropic temperature factor. This is an alternative to giving a coordinate file XYZIN from which structure factors will be calculated.

LABREF [F | I =] <columnlabel>

For an HKLREF file, this defines the column label for intensity or amplitude (which will be squared to an intensity). If this command is omitted, the first intensity column (or if no intensities, the first amplitude) will be used. The next column is assumed to contain the corresponding sigma.

XYZIN <reference coordinate file name>

The filename for a reference coordinate set. Structure factors will be calculated to use as a reference, in the same way as HKLREF. This provides a current "best" estimate of intensity, and the observed data is analysed for its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. The file should contain a valid space group name (full name with spaces, eg "P 21 21 21", "P 1 21 1" etc) and unit cell parameters (ie a CRYST1 line in PDB format).

INPUT AND OUTPUT FILES

Input

HKLIN
The input file must be sorted on H K L M/ISYM BATCH

Compulsory columns:

        H K L           indices
        M/ISYM          partial flag, symmetry number
        BATCH           batch number
        I               intensity (integrated intensity)
        SIGI            sd(intensity) (integrated intensity)

Optional columns:

        XDET YDET       position on detector of this reflection: these
                        may be in any units (e.g. mm or pixels), but the
                        range of values must be specified in the
                        orientation data block for each batch.
        ROT             rotation angle of this reflection ("Phi"). If
                        this column is absent, only SCALES BATCH is valid.
        IPR             intensity (profile-fitted intensity)
        SIGIPR          sd(intensity) (profile-fitted intensity)
        SCALE           previously calculated scale factor (e.g. from
                        previous run of Scala). This will be applied
                        on input
        SIGSCALE        sd(SCALE)
        TIME            time for B-factor variation (if this is
                        missing, ROT is used instead)
        MPART           partial flag from Mosflm
        FRACTIONCALC    calculated fraction, required to SCALE PARTIALS
        LP              Lorentz/polarization correction (already applied)
        FLAG            error flag (packed bits) from Mosflm (v6.2.3
                        or later). By default, if this column is present,
                        observations with a non-zero FLAG will be
                        omitted. They may be conditionally accepted
                        using the KEEP command (qv)
                        Bit flags:
                             1  BGRATIO too large
                             2  PKRATIO too large
                             4  Negative > 5*sigma
                             8  BG Gradient too high
                            16  Profile fitted overload
                            32  Profile fitted "edge" reflection
        BGPKRATIOS      packed background & peak ratios, & background
                        gradient, from Mosflm, to go with FLAG
        LATTNUM         lattice number for multilattice data
        Hn, Kn, Ln      hkl indices for overlapped observations with
                        multilattice data
HKLREF   reference file for analysis of agreement by batch. This may contain intensities or amplitudes (which will be squared), eg the FC_ALL_LS column from Refmac. The label is specified on the LABREF command

XYZIN  as an alternative to HKLREF, a coordinate file may be given, from which amplitudes and intensities will be calculated
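The packed FLAG bits listed for the optional FLAG column can be unpacked with simple bit tests. The Python sketch below is an illustration only: the function and dictionary names are made up, and the conversion of the column value to an integer assumes it is stored as a whole number.

```python
# Bit values and meanings as listed for the FLAG column
FLAG_BITS = {
    1: "BGRATIO too large",
    2: "PKRATIO too large",
    4: "Negative > 5*sigma",
    8: "BG Gradient too high",
    16: "Profile fitted overload",
    32: 'Profile fitted "edge" reflection',
}

def decode_flag(flag):
    """Return the list of conditions set in a packed FLAG value."""
    value = int(flag)  # assumes the stored value is a whole number
    return [name for bit, name in FLAG_BITS.items() if value & bit]

# A value of 0 means no error condition, so the observation is kept
print(decode_flag(0))
# 17 = 1 + 16: background ratio too large and a profile-fitted overload
print(decode_flag(17))
```

An observation whose only set bits correspond to conditions accepted by the KEEP command would then be retained rather than omitted.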

Output

Reflection files output

In all cases, separate files are written for each dataset:  files are named with the base HKLOUT name with the dataset name appended, as "_dataset"

(a) HKLOUT: option OUTPUT [MTZ] MERGED

The output file contains columns
H K L  IMEAN SIGIMEAN  I(+) SIGI(+)  I(-) SIGI(-)

Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.


(b) HKLOUTUNMERGED: option OUTPUT [MTZ] UNMERGED
Unmerged data with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. Only a single scaled intensity value is written, chosen as summation, profile-fitted or combined as specified by the INTENSITIES command. Columns defining the diffraction geometry (e.g. FRACTIONCALC XDET YDET ROT TIME WIDTH LP)  will be preserved in the output file. If HKLOUTUNMERGED is not specified, then the filename for the unmerged file has "_unmerged" appended to HKLOUT

Output columns:

        H,K,L           REDUCED or ORIGINAL indices (see OUTPUT options)
        M/ISYM          Symmetry number (REDUCED), = 1 for ORIGINAL indices
        BATCH           batch number as for input
        I, SIGI         scaled intensity & sd(I)
        SCALEUSED       scale factor applied
        SIGSCALEUSED    sd(SCALE applied)
        NPART           number of parts, = 1 for fulls, negated for scaled
                        partials, i.e. = -1 for scaled single part partial
        FRACTIONCALC    total fraction (if present in input file)
        TIME            copied from input if present
        XDET,YDET       copied from input if present
        ROT             copied from input if present (averaged for
                        multi-part partials)
        WIDTH           copied from input if present
        LP              copied from input if present

(c) SCALEPACK: option OUTPUT SCALEPACK MERGED
If a SCALEPACK filename is not specified then the filename will be taken from HKLOUT with the extension ".sca"

(d) SCALEPACKUNMERGED: option OUTPUT SCALEPACK UNMERGED
If a SCALEPACKUNMERGED filename is not specified then the filename will be taken from SCALEPACK with "_unmerged" appended and the extension ".sca"

Other output files

XMLOUT
XML output for plotting etc. It includes the NORMPLOT, ANOMPLOT, CORRELPLOT and ROGUEPLOT data, as well as the $TABLE graph data
SCALES
scale factors from DUMP, used by RESTORE option
ROGUES
list of bad agreements
TILEIMAGE
a detector image representing the CCD TILE correction, if activated, in ADSC image format which may be viewed with adxv
The following 4 files are also represented in the XMLOUT file:
NORMPLOT
normal probability plot from merge stage
*** this is at present written in a format for the plotting program xmgr (aka [xm]grace), but can also be read by loggraph ***
ANOMPLOT
normal probability plot of anomalous differences
            (I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
CORRELPLOT
scatter plot of pairs of anomalous differences (in multiples of RMS) from random half-datasets. One of these files is generated for each output dataset
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
ROGUEPLOT
a plot of the position on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the position of the principal ice rings shown
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***

REFERENCES

    1. P.R.Evans and G.N.Murshudov, "How good are my data and what is the resolution?", Acta Cryst. D69, 1204-1214 (2013)
    2. P.R.Evans "An introduction to data reduction: space-group determination, scaling and intensity statistics", Acta Cryst. D67, 282-292 (2011)
    3. P.R.Evans, "Scaling and assessment  of data quality", Acta Cryst. D62, 72-82  (2006). Note that definitions of Rmeas and Rpim in this paper are missing a square-root on the (1/n-1) factor
    4. W. Kabsch, J.Appl.Cryst. 21, 916-924 (1988)
    5. P.R.Evans, "Data reduction", Proceedings of CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
    6. P.R.Evans, "Scaling of MAD Data", Proceedings of CCP4 Study Weekend, 1997, on Recent Advances in Phasing
    7. R.Read, "Outlier rejection", Proceedings of CCP4 Study Weekend, 1999, on Data Collection & Processing
    8. Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965)
    9. Blessing, R.H., Acta Cryst. A51, 33-38 (1995)
    10. Kay Diederichs & P. Andrew Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology, 4, 269-275 (1997)
    11. Manfred Weiss & Rolf Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J.Appl.Cryst. 30, 203-205 (1997)
    12. Manfred Weiss, "Global Indicators of X-ray data quality" J.Appl.Cryst. 34, 130-135 (2001)

Appendix 1: Partially recorded reflections

In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 301, 302, 303), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.

Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The parameters for the checks are set by the PARTIALS command. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

  1. if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked (TEST). NOCHECK switches off this check.
  2. if the TEST option is set (default), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.05]
  3. if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> to <lower_limit>. This works also for a single unmatched partial. This correction relies on accurate FRACTIONCALC values, so beware.
  4. if the GAP option is set (not recommended), partials with a gap in are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.

By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g.

      TEST 1.2 1.2 CORRECT 0.5 

will scale up all partials with a total fraction between 0.5 & 1.2

      TEST 0.95 1.05           

will accept summed partials 0.95->1.05, no scaling

      TEST 0.95 1.05 CORRECT 0.4  

will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
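
The TEST and CORRECT checks described above can be sketched as follows. This is an illustration only, not the Aimless code: the function name check_summed_partial is hypothetical, the MPART (CHECK) and GAP logic is omitted, and whether the limits are inclusive at the boundaries is an assumption.

```python
def check_summed_partial(total_fraction, intensity,
                         lower_limit=0.95, upper_limit=1.05,
                         minimum_fraction=None):
    """Apply the TEST and CORRECT checks to one summed partial.

    total_fraction   sum of the FRACTIONCALC values of the parts
    minimum_fraction CORRECT limit; None means CORRECT is not set
    Returns (accepted, intensity_after_any_correction).
    """
    # TEST: accept unchanged if the total fraction is within the limits
    if lower_limit <= total_fraction <= upper_limit:
        return True, intensity
    # CORRECT: scale by the inverse total fraction
    # (boundary handling here is an assumption)
    if minimum_fraction is not None and minimum_fraction <= total_fraction < lower_limit:
        return True, intensity / total_fraction
    return False, intensity


# TEST 0.95 1.05 CORRECT 0.4
print(check_summed_partial(1.00, 100.0, 0.95, 1.05, 0.4))  # accepted, unchanged
print(check_summed_partial(0.50, 100.0, 0.95, 1.05, 0.4))  # accepted, scaled up
print(check_summed_partial(0.30, 100.0, 0.95, 1.05, 0.4))  # rejected
```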

Appendix 2: Scaling algorithm

For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing

 
Sum( whl * ( Ihl - ghl * Ih )**2 ) Ref Hamilton, Rollett & Sparks

where Ih is the current best estimate of the "true" intensity

        Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
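
As a small numerical illustration of the two expressions above, the Python sketch below evaluates Ih and the weighted residual for one reflection. The function names and the numbers are made up for the example; the weights are taken as given inputs, since their exact definition is not spelled out here.

```python
def best_intensity(obs):
    """obs is a list of (Ihl, ghl, whl); returns Sum(w*g*I) / Sum(w*g**2)."""
    numerator = sum(w * g * i for i, g, w in obs)
    denominator = sum(w * g * g for i, g, w in obs)
    return numerator / denominator


def residual(obs, ih):
    """Weighted sum of squares Sum( whl * (Ihl - ghl*Ih)**2 ) for one reflection."""
    return sum(w * (i - g * ih) ** 2 for i, g, w in obs)


# Two observations of the same reflection, the second on twice the scale
obs = [(100.0, 1.0, 1.0), (210.0, 2.0, 1.0)]
ih = best_intensity(obs)      # (100 + 420) / (1 + 4)
print(ih, residual(obs, ih))
```

For fixed scale factors, this Ih is the value that minimises the residual for that reflection, so moving Ih away from it in either direction increases the sum.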

Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.

The inverse scale factor ghl is derived as follows:

        ghl = Thl * Chl * Shl

where Thl is an optional relative B-factor contribution, Chl is a scale factor, and Shl is an anisotropic correction expressed as spherical harmonics (i.e. SECONDARY, ABSORPTION options).

a) B-factor (optional)

For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, . . tn). Then for an observation measured at time tl

        B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))

where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp ( - (delt)**2 / Vt )
Vt is "variance" of weight, & controls the smoothness
of interpolation

Thl = exp ( + 2 s B )
s = (sin theta / lambda)**2

b) Scale factors

For each run, scale factors Cz are determined at intervals on rotation angle z. Then for an observation at position (z0),

        Chl(z0) = Sum(z)[p(delz)*Cz] / Sum(z)[p(delz)]

where delz = z - z0
p(delz) = exp(-delz**2/Vz)
Vz is the "variance" of the weight & controls the smoothness of interpolation
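
The B-factor interpolation in (a) and the scale interpolation in (b) have the same form, a Gaussian-weighted average over the parameter positions. The Python sketch below shows that form under stated assumptions: the function names, parameter positions and values are invented for illustration, and the sum runs over all positions in the run, whereas a real implementation might restrict it to nearby ones.

```python
import math


def smoothed(position, knots, values, variance):
    """Gaussian-weighted average of parameter values.

    weight p(del) = exp(-del**2 / variance), del = position - knot
    """
    weights = [math.exp(-((position - k) ** 2) / variance) for k in knots]
    return sum(p * v for p, v in zip(weights, values)) / sum(weights)


def b_scale(b, sin_theta_over_lambda):
    """Thl = exp(+2 s B) with s = (sin theta / lambda)**2."""
    s = sin_theta_over_lambda ** 2
    return math.exp(2.0 * s * b)


# Made-up example: B-factors at three "times" along a run
knots = [0.0, 10.0, 20.0]
b_values = [0.0, -2.0, -4.0]
b = smoothed(10.0, knots, b_values, variance=25.0)
print(b, b_scale(b, 0.25))
```

A larger variance spreads the weight over more neighbouring parameters and so gives a smoother curve.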

For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor for each batch. 

c) Anisotropy factor

The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
(1) the secondary beam (SECONDARY correction) in the camera spindle frame,
(2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis

  1. SECONDARY beam direction (camera frame)
    s = [Phi] [UB] h
    s2 = s - s0
    s2' = [-Phi] s2
    Polar coordinates:
    s2' = (x y z)
    PolarTheta = arctan(sqrt(x**2 + y**2)/z)
    PolarPhi = arctan(y/x)

    where [Phi] is the spindle rotation matrix
    [-Phi] is its inverse
    [UB] is the setting matrix
    h = (h k l)
  2. ABSORPTION: Secondary beam direction (permuted crystal frame)
    s = [Phi] [UB] h
    s2 = s - s0
    s2c' = [-Q] [-U] [-Phi] s2
    Polar coordinates:
    s2c' = (x y z)
    PolarTheta = arctan(sqrt(x**2 + y**2)/z)
    PolarPhi = arctan(y/x)

    where [Phi] is the spindle rotation matrix
    [-Phi] is its inverse
    [Q] is a permutation matrix to put
    h, k, or l along z (see POLE option)
    [U] is the orientation matrix
    [B] is the orthogonalization matrix
    h = (h k l)
then
 Shl = 1  +  Sum[l=1,lmax] Sum[m=-l,+l] Clm  Ylm(PolarTheta,PolarPhi)

where Ylm is the spherical harmonic function for
the direction given by the polar angles
Clm are the coefficients determined by
the program
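
The conversion to polar angles and the sum over harmonic terms can be sketched in Python as below. This is only an illustration: the function names are hypothetical, atan2 is used as a quadrant-safe form of the arctan expressions above (an assumption about the intended angle ranges), and the spherical harmonic function itself is passed in by the caller because its normalisation convention is not specified here.

```python
import math


def polar_angles(x, y, z):
    """Polar angles of the vector (x y z), using quadrant-safe atan2."""
    polar_theta = math.atan2(math.hypot(x, y), z)
    polar_phi = math.atan2(y, x)
    return polar_theta, polar_phi


def surface_factor(coefficients, ylm, polar_theta, polar_phi):
    """Shl = 1 + Sum over (l, m) of Clm * Ylm(PolarTheta, PolarPhi).

    coefficients  dict mapping (l, m) to Clm
    ylm           caller-supplied function ylm(l, m, theta, phi)
    """
    return 1.0 + sum(c * ylm(l, m, polar_theta, polar_phi)
                     for (l, m), c in coefficients.items())


theta, phi = polar_angles(0.0, 1.0, 0.0)   # vector along y
print(theta, phi)
```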

Notes:

(d) Detector correction (TILES)

A correction for tiled CCD detectors has been implemented to attempt to correct for the underestimation of spots falling in the corner of the detector. The present model expresses a correction factor in terms of an erfc function of the distance from the tile centre, such that the correction = 1 in the centre of the tile and falls off at the edge and corners

For a spot at position x,y relative to the tile centre, normalised by the tile width in pixels such that x & y run from -1 to +1, then
distance from centre (x0,y0) d = sqrt[(x-x0)**2 + (y-y0)**2]
correction factor g  = A f(z) + 1 - A    where A is the amplitude of the correction near the edge and f(z) is a radial function of the modified "radius"  z = (2/w)(d - r - w).  r defines the point at which the scale starts to decline from 1.0, and w the "width" of the fall-off
Currently f(z) = 0.5 erfc(z) though other expressions have been tried

Amplitude A varies azimuthally with the angle phi = arctan(y/x) as a Fourier series, A = A0{a cos(phi) + b sin(phi) + c cos(2phi) + d sin(2phi)}

Refined parameters for each tile are r, w, A0, x0, y0, and the four Fourier terms for A, a,b,c,d.
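
The radial part of the tile correction can be evaluated as in the following Python sketch. It is an illustration under assumptions, not the program's code: the function name is hypothetical, the amplitude A is passed in directly rather than built from its Fourier terms, and the default r and w are simply the restraint target values quoted for TIE TILE.

```python
import math


def tile_correction(x, y, amplitude, r=0.70, w=0.40, x0=0.0, y0=0.0):
    """Correction factor g = A f(z) + 1 - A for a spot at (x, y) on a tile.

    x, y are relative to the tile centre and normalised to run from -1 to +1.
    """
    d = math.hypot(x - x0, y - y0)     # distance from centre (x0, y0)
    z = (2.0 / w) * (d - r - w)        # modified "radius"
    f = 0.5 * math.erfc(z)             # 1 near the centre, falling to 0 outside
    return amplitude * f + 1.0 - amplitude


print(tile_correction(0.0, 0.0, 0.1))  # tile centre
print(tile_correction(1.0, 1.0, 0.1))  # tile corner
```

With these values the factor is essentially 1 at the tile centre and approaches 1 - A towards the corners.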

By default, parameters are restrained (TIE) as follows (see TIE TILE)
A0, a,b,c,d and x0,y0 are tied to 0.0 with their SDs
r, w are tied to target values with their SDs [default 0.70, 0.40]
r, w, and A0 are tied to be similar over all tiles
Five SD values control the strength of the restraints, respectively for r, w, A0, x0|y0, and abcd
SD = 0 switches off the restraint

Appendix 3: Data from Denzo

DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from scalepack as input to Aimless. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see COMBAT documentation) or POINTLESS (for Scalepack output only).

Both of these options have some problems


Appendix 4: Outlier algorithm

The test for outliers is as follows:

(1) if there are 2 observations (left), then
(a) for each observation Ihl, test deviation
     Delta(hl) =  (Ihl - ghl Iother) / sqrt[sigIhl**2 + (ghl*sdIother)**2]

against sdrej2, where Iother = the other observation

(b) if either |Delta(hl)| > sdrej2, then
  1. in scaling, reject reflection. Or:
  2. in merging,
    1. keep both (default or if KEEP subkey given) or
    2. reject both (subkey REJECT) or
    3. reject larger (subkey LARGER) or
    4. reject smaller (subkey SMALLER).
(2) if there are 3 or more observations left, then
(a) for each observation Ihl,
  1. calculate weighted mean of all other observations <I>n-1 & its sd(<I>n-1)
  2. calculate deviation
          Delta(hl) = (Ihl - ghl <I>n-1) / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]
  3. find largest deviation max|Delta(hl)|
  4. count number of observations for which Delta(hl) .ge. 0 (ngt), & for which Delta(hl) .lt. 0 (nlt)
(b) if max|Delta(hl)| > sdrej, then reject one observation, but which one?
  1. if ngt == 1 .or. nlt == 1, then one observation is a long way from the others, and this one is rejected
  2. else reject the one with the worst deviation max|Delta(hl)|
(3)  iterate from beginning
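
The test for 3 or more observations can be sketched in Python as below. This is a simplified illustration, not the Aimless implementation: the function names are hypothetical, weights are assumed to be 1/sigI**2 with the weighted mean formed as in Appendix 2, sdrej must be supplied by the caller, and the 2-observation case with its KEEP, REJECT, LARGER and SMALLER options is not handled.

```python
import math


def deviations(obs):
    """obs is a list of (I, sigI, g); returns Delta(hl) for each observation,
    measured against the weighted mean of all the other observations."""
    deltas = []
    for k, (i_k, sig_k, g_k) in enumerate(obs):
        others = [o for j, o in enumerate(obs) if j != k]
        # Assumption: w = 1/sigI**2, mean = Sum(w*g*I) / Sum(w*g**2)
        den = sum(g * g / (s * s) for _, s, g in others)
        mean = sum(g * i / (s * s) for i, s, g in others) / den
        sd_mean = 1.0 / math.sqrt(den)
        deltas.append((i_k - g_k * mean)
                      / math.sqrt(sig_k ** 2 + (g_k * sd_mean) ** 2))
    return deltas


def reject_outliers(obs, sdrej):
    """Reject one observation at a time while 3 or more are left."""
    kept, rejected = list(obs), []
    while len(kept) >= 3:
        deltas = deviations(kept)
        worst = max(range(len(kept)), key=lambda k: abs(deltas[k]))
        if abs(deltas[worst]) <= sdrej:
            break
        ngt = sum(1 for d in deltas if d >= 0)
        nlt = len(deltas) - ngt
        if ngt == 1:      # one observation lies far above the others
            target = next(k for k, d in enumerate(deltas) if d >= 0)
        elif nlt == 1:    # one observation lies far below the others
            target = next(k for k, d in enumerate(deltas) if d < 0)
        else:
            target = worst
        rejected.append(kept.pop(target))
    return kept, rejected


obs = [(100.0, 5.0, 1.0), (102.0, 5.0, 1.0), (98.0, 5.0, 1.0),
       (101.0, 5.0, 1.0), (500.0, 5.0, 1.0)]
kept, rejected = reject_outliers(obs, sdrej=6.0)
print(len(kept), rejected)
```

In this made-up example every observation has a large deviation on the first pass, because the single discrepant value distorts the mean of the others, but only one deviation is positive, so that observation is the one rejected.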

RELEASE NOTES

0.5.10,11,12 bug fix in setting spherical harmonic orders. Keep empty batches in unmerged file output. Bug fix in scaling one lattice from multilattice data
0.5.9 bug fix to allow ABSORPTION <lmax> to work. Fix for restore problem with variances & tiles
0.5.8 bug fix for case SCALES CONSTANT BROTATION with one run (not sensible anyway). Also fixed bug when there are different resolution limits for different datasets
0.5.7 minor bug fix to resolution tables
0.5.6 bug fix for SCALE CONSTANT with more than one run
0.5.5 bug fix in radiation damage analysis. Rescale Scalepack output if intensities are small
0.5.4 bug fix for Sca output with one of I+ or I- missing
0.5.3 fix save/restore bug for BFACTOR OFF
0.5.2 bug fix for already merged data. Fix long-standing rare bug in hash table
0.5.1 "improved" SD correction refinement. Added [UN]LINK commands, improved default linking
0.4.10 add SD analysis graph to XML
0.4.8,9 improved robustness of maximum resolution curve fit
0.4.7 better trap for no data in SDCORRECTION refinement
0.4.5,6 fill in missing IPR columns from I, shouldn't normally happen
0.4.2,3,4 Bug fixes. Unmerged SCA files written with corrected symmetry translations
0.4.1 Inflate sd(I) using estimated parameter errors from inverse normal matrix (see USESDPARAMETER). Fixed bug in TILE correction. Added curve fit for maximum resolution estimation
0.3.11 Fixed nasty bug from XDS->Pointless giving Assertion failed: (sd > 0.0), function Average
0.3.10 Bug fixes. Also restrict run-run correlations to < 200 runs.
0.3.9 Added matrix of run-run cross-correlations
0.3.8 bug fixes to make EXCLUDE BATCH <range> option work
0.3.7 Bug fixes for unusual case of runs with all fulls and no fulls. Options for XFEL data: SDCORRECTION SAMPLESD; Rsplit. Bug fixes for Batch scaling with rejected batches. REJECT BATCH option for batch scaling
0.3.6 Fix bug with explicit RUN definitions.
0.3.4,5 remove debug print for self-overlaps. Pick up number of parts for previously summed partials (MPART column from Feckless), for partial bias analysis
0.3.3 fixed save/restore for TILE correction. Fixed reading of SDcorrection parameters
0.3.2 more corrections to multilattice handling (mapping lattice number to run number for scaling)
0.3.1  optional reference data for analysis of agreement by batch, either as structure factors (or intensities) HKLREF,LABREF, or coordinates (XYZIN)
0.2.20 fix bug for single B-factor/run
0.2.18,19 updates from Pointless for multiple lattices. Corrected calculation of anomalous multiplicity
0.2.17 fix bug in setting same resolution bin widths for multiple datasets when NBINS is set
0.2.16 message for std::bad_alloc, running out of memory
0.2.15 fix to XML graphs (for ccp4i2)
0.2.14 fix to correctly append to MTZ history
0.2.13 activate writing spacegroup confidence. Reflection status flags cleared before outlier checks
0.2.12 fix bug in reading multilattice files
0.2.10 small bug fix in radiation damage analysis
0.2.9 fix for Batch scaling if no phi range information
0.2.8 XML changes for I2 report. Change automatic anomalous thresholds, always output anom statistics
0.2.7 Bug fix in XML if no orientation data (ROGUEPLOT)
0.2.6 Fix to output multilattice overlaps. Added radiation damage analysis as in CHEF, for Graeme Winter
0.2.5 Fix so that BINS RESOLUTION works
0.2.4 Bug for XDS data, was omitting reflections with FRACTIONCALC (derived from IPEAK) < 0.95, leading to incompleteness
0.2.3 Now does reject and record Emax outliers properly (though work is continuing on improving this). Fixed small bug in analyseoverlaps.
0.2.2 fixed bug in Bdecay plot when batches omitted. Explicit Xrange for XML batch plots. No ROGUEPLOT if no orientation data. List overlaps in ROGUES file
0.2.1   some major reorganisations. Added XML output. SCALES TILE option. Handling of multilattice data. SDCORRECTION SIMILAR
0.1.30 allow TIE with negative sd to turn off tie, as documented. Also fixed bug in ABSORPTION
0.1.29 small change to Result table to work with Baubles arcane (and undocumented) rules for Magic Tables
0.1.28 bug fix in "sdcorrection same"
0.1.27 bug fix in minimizer which sometimes affected the case with just 2 parameters
0.1.26 Default to "scales secondary"
0.1.25 omit sigI<=0, process REJECT command properly, small bug fix in smoothed Bfactors
0.1.24 small bug fix in printing batch tables with multiple datasets
0.1.22,23 INITIAL UNITY option. In tables, print batches with no observations but not rejected batches. Put title into output file. Fix initial scale bug with 3 scales
0.1.21 corrections to ROGUEPLOT, ice rings were in wrong place (by a factor of wavelength)
0.1.20 made sdcorrection refinement more robust to low multiplicity. If anomalous off (or no anomalous detected), statistics are now printed over all I+ I- together. Reject large negative observations (default E < -5)
0.1.19 preliminary addition of spg_confience(|| status). Bug fix from valgrind (from Marcin)
0.1.18 changed tablegraph to fix compilation problem (va_start)
0.1.17 bug fix in outlier rejection, problem with large variances leading to inconsistencies in Rogues file and some over-rejection
0.1.16 made SDcorrection refinement more robust
0.1.14,15 various bug fixes (including memory leaks), fixed autorun generation, improved SD correction for large anomalous, constrain cell to lattice group, etc
0.1.12 Half-dataset CC labelled as "CC(1/2)"
0.1.11 Small bug fixes
0.1.9 autodetect anomalous. Plot Rmeas for each run
0.1.7 fix for SCALES CONSTANT from XSCALE
0.1.6  anisotropy analysis against planes in trigonal, hexagonal and tetragonal systems (including rhombohedral axes), principal anisotropic axes in monoclinic and triclinic, cone analyses weighted according to cos(AngleFromPrincipalDirection). Fixed cases where multiple datasets have different resolution limits
0.1.4,5 more fixes for multiple datasets, dump/restore. OUTPUT UNMERGED SPLIT is default
0.1.3 More "resolution run" bug fixes
0.1.2 REFINE PARALLEL option (thanks to Ronan Keegan). Fixed bug in "resolution run" options
0.1.1 fixed bugs in writing ROGUES file; introduced HKLOUTUNMERGED etc filename specifiers;  cleaned up Unmerged output; added Rfull to tables
0.1.0   fixed some bugs found by cppcheck and valgrind
0.0.16 fixed small bug in INTENSITIES COMBINE optimisation
0.0.15 if run definitions are given explicitly, then unspecified batches are excluded
0.0.14  Added optimisation for INTENSITIES COMBINE, for Mosflm data. This is now the default

AUTHOR

Phil Evans, MRC Laboratory of Molecular Biology, Cambridge (pre@mrc-lmb.cam.ac.uk) See above for Release Notes.

SEE ALSO