SCALA (CCP4: Supported Program)
NAME
scala - scale together multiple observations of reflections
SYNOPSIS
scala HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]
Keyworded input summary
References
Input and Output files
Examples
Release Notes
DESCRIPTION
Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
Corner correction
TAILS correction
Data from Denzo
Datasets
Data harvesting
This program scales together multiple observations of reflections,
and merges multiple observations into an average intensity.
Various scaling models can be used. The scale factor is a function
of the primary beam direction, either as a smooth function of Phi (the
rotation angle ROT), or expressed as BATCH (image) number (deprecated).
In addition,
the scale may be a function of the secondary beam direction, acting
principally as an absorption correction, either expanded as spherical
harmonics, or as an interpolated three-dimensional function of Phi and
the spatial coordinates of the measured spot on the detector.
Such three-dimensional scaling is typically somewhat ill-determined,
but it is
generally useful if suitably restrained (see below for discussion of
this) and should normally be used.
The secondary beam correction is related to the absorption anisotropy
correction described by Blessing (1995); the interpolated
three-dimensional correction is similar to that described by Kabsch
(1988).
The merging algorithm analyses the data for outliers, and gives
detailed analyses. It generates a weighted mean of the observations
of the same reflection, after rejecting the outliers.
The program does three passes through the data:
- a scaling pass: firstly, there is an initial estimate of the
scales, then the scale parameters are refined
- an analysis pass to refine the standard
deviation estimates
- a final pass to apply scales, analyse agreement & write the
output file, usually with merged intensities, but alternatively as a
copy of the input file with evaluated scales appended to each
observation.
Normally anomalous scattering is ignored during the scale
determination (I+ & I- observations are treated together), but the
merged file always contains I+ & I-, even if the ANOMALOUS OFF
command is used. Switching ANOMALOUS ON does affect the statistics and
the outlier rejection (qv).
Scaling options
The optimum form of the scaling will depend a great deal on how the
data were collected. It is not possible to lay down definitive rules,
but some of the following hints may help. For most purposes, my normal
recommendation is
scales rotation spacing 5 secondary 6 bfactor on brotation spacing 20
Other hints:-
- If successive images are collected with the same detector (on-line
detector) or equivalent detectors, and the beam intensity is steady or
smoothly varying, then use a smoothed scaling option. Only use the
SCALE BATCH option if every image is different from every other one,
i.e. off-line detectors (including film), or a rapidly or
discontinuously changing incident beam flux. This may occasionally be
the case for synchrotron data (if a "dose" mode is not used). It is
possible to "mix-and-match" options: for instance, the best option for
data from an unstable synchrotron beam may be SCALES BATCH BFACTOR ON
BROTATION SPACING 10, which will make the Bfactor variation smooth,
but the scales discontinuous by batch.
- If there is a discontinuity between one set of images and another
(e.g. change of exposure time), then flag them as different
RUNs. This will be done automatically if no runs are specified.
- The SECONDARY correction is recommended: this provides a
correction for absorption and is better than the DETECTOR option. It
should always be restrained with a TIE SURFACE command (this is the
default): under these conditions it is reasonably stable under most
conditions, even in the absence of a reference dataset. The ABSORPTION
(crystal frame)
correction is similar to SECONDARY (camera frame) in most cases, but
may be preferable if data have been collected from multiple alignments
of the same crystal.
- Use a
B-factor correction unless the data are only low-resolution.
Traditionally, the relative B-factor is a correction for radiation
damage (hence it is a function of time), but it also includes some
other corrections eg absorption.
- The TAILS correction might be tried if the fractional bias is
significant: this is only useful if there are many fully recorded
reflections (ie rarely). The refinement of the TAILS parameters is
not very robust, and it may be necessary to FIX A1 (this should be
improved).
- When trying out more complex scaling options (eg TAILS), it is a
good idea to try a simple scaling first, to check that
the more elaborate model gives a real improvement.
- When scaling multiple MAD data sets, they should all be scaled
together in one pass, outliers rejected across all datasets, then each
wavelength merged separately. This is the default if multiple datasets
are present in the input file. For isomorphous replacement, it may
sometimes be useful to provide a native dataset as a reference, to
make the systematic errors in the derivative similar to those in the
native (ie "local" scaling, using the SECONDARY option).
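As a concrete illustration of the recommendation above, a minimal
keyword script for a typical single-sweep dataset might look like this
(the filenames and run number are illustrative only):
scala HKLIN native.mtz HKLOUT native_scaled.mtz <<eof
run 1 all
scales rotation spacing 5 secondary 6 bfactor on brotation spacing 20
eof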
Other options are described in greater detail under the KEYWORDS.
Control of flow through the program
Each of the stages can be individually activated or suppressed.
Particularly useful options are:
- Restarting scaling after a crash or failure to converge: the
RESTORE option enables a restart from where you left off. Scales are
dumped by default to a file SCALES after each cycle in case of crashes
(see DUMP/NODUMP options).
- Rerunning the merge step without repeating the scaling, using the
ONLYMERGE and RESTORE commands, e.g. to adjust the SDCORRECTION
parameters.
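For instance, a sketch of a merge-only rerun (the SDCORRECTION values
are illustrative, in the style of the example under that keyword):
scala HKLIN native.mtz HKLOUT native_scaled.mtz <<eof
onlymerge
restore                  # read scales from the SCALES dump file
sdcorrection norefine full 1.4 0.11 partial 1.4 0.05
eof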
Partially recorded reflections
See appendix 1
Partially recorded reflections are by default included in the
scaling pass, as well as in the final analysis and merging.
They may optionally be excluded from the scaling (controlled by the
command INTENSITIES), and excluded from the final analysis
(controlled by the command FINAL). Note that this default has changed
from some antique versions.
The different options for the treatment of partials are set by
either the PARTIALS command, effective for both scaling & merging
stages; or separately for the scaling stage only (INTENSITIES command)
or for the merging stage only (FINAL command).
Partials may either be summed or scaled: in the latter case, each
part is treated independently of the others.
Summed partials [default]:
All the parts are summed (after applying
scales) to give the total intensity, provided some checks are passed.
The number of reflections failing the checks is printed. You
should make sure that you are not losing too many reflections in these
checks.
Scaled partials:
In this option, each individual partial observation is scaled up by
the inverse of its FRACTIONCALC, provided that the fraction is greater
than <minimum_fraction> [default = 0.5]. This only works well if the
calculated fractions are accurate, which is not usually the case.
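For instance, one plausible combination (limits illustrative) is to
use summed partials with an explicit total-fraction test, but to admit
scaled partials in the merging step only:
partials test 0.95 1.05        # summed partials, checked on total fraction
final scale_partial 0.5        # merging only: scale up partials above 0.5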
Scaling algorithm
See appendix 2
Corner correction
CCD detectors underestimate the intensities of spots close to the edges
and particularly in the corners of the tiles, due to the point spread
function from the optical taper. This is a significant problem
for 3x3 tiled detectors, as the corners lie
in critical parts of the diffraction pattern. The spot intensities may
be corrected using a calibrated correction table for the individual
detector, using the pixel coordinates in the HKLIN file. The table is
given to Scala as an ADSC image
format file, and activated by the CORNERCORRECT
command. Acknowledgements for this correction are due to the following
people: Andy Arvai, Xuong Nguyen-huu, Chris Nielsen, Raimond Ravelli,
Gordon Leonard, Sean McSweeney, Sandor Brockhauser, and Andrew
McCarthy.
Note that at present Scala has no way of knowing which detector was
used, so it is up to the user to provide the correct file: correction
files should be available from the synchrotron beamlines, or from the
detector manufacturers.
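For example (the correction-table filename here is hypothetical; use
the calibrated file appropriate to your detector):
cornercorrect q315_corner_corrections.img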
TAILS correction
The TAILS (SCALES .. TAILS) correction may be used to improve poor
partial bias: this is an attempt to allow for the difference in scan
width between fulls and partials. A partial is measured across twice
(or 3 times etc) the rotation width of a full, so more of the diffuse
scattering tails are included in the intensity, leading to an
under-estimation of the fulls relative to partials. This correction is
not very robust (though more so than in earlier versions of Scala),
and the parameters may be unstable: you should always try first
without this correction, and check that it really does improve the
data statistics, without applying ridiculously large corrections. This
correction is only useful if you have a large proportion of
fully-recorded observations. See appendix 3
for more details.
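If the correction is tried, a cautious sketch might be the following,
checked against an otherwise identical run without TAILS:
scales rotation spacing 5 secondary 6 tails
fix a1          # may be needed if the TAILS parameters refine wildly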
Data from Denzo
Data integrated with Denzo may be scaled and merged with Scala as
an alternative to Scalepack, or unmerged output from scalepack may be
used. Both have some limitations. See appendix 4
for more details.
Datasets
Data in MTZ files are assigned to "datasets", within a hierarchy of
Crystal/Dataset. A crystal
also has a "project name" which is not part of the hierarchy but is
used to group data for harvesting. Each of these levels of hierarchy
has "properties": a crystal has a unit cell, and a dataset has a
wavelength. Unmerged data files as used in Scala typically contain a
single dataset, but may contain multiple datasets if for instance
multiple wavelength datasets are being scaled together, or if a
reference set is present. Each BATCH in the file is assigned to a
specific dataset.
Assigning a dataset:-
- Preferably, a project name, crystal name and dataset name should
be assigned when the file is created, eg in Mosflm
- Utility programs (eg REBATCH) may be used to
(re)assign dataset
names and add or correct dataset properties (wavelength and cell)
- Names may be (re)assigned within Scala using the NAME command.
This may be useful if names have not been assigned before, or if data
from different crystals are merged into a single dataset. Note that
each NAME command defines a different output dataset.
Using datasets in Scala:
- A RUN may not contain batches from different datasets, but a
dataset may contain multiple runs. Datasets may be explicitly assigned
to runs (see the RUN
command).
- By default, each dataset is written out to a different output
file, (see OUTPUT options).
- By default, outliers are rejected across all
datasets (unless REJECT SEPARATE). This is normally a sensible thing
to do for MAD data, since the expected differences are small, but
carries with it the danger of rejecting real differences. By default,
the rejection test is automatically adjusted upwards (to accept larger
differences) if the anomalous signal is strong, but this is not very
precise. If you have strong signals and good data, check the
ROGUES file & the value of the I+/I- test & reset it if
necessary, eg
ANOMALOUS ON
REJECT 6 ALL 15 # to check between I+ and I-
- Various analyses are done between datasets, comparing the
anomalous differences and the dispersive (isomorphous) differences
from a defined "base" set (ie correlation between ((I(i) - I(base))
and (I(j) - I(base)) (i .ne. j .ne. base)). Typically the base dataset
would be a high-energy remote (this is the default), but it may be set
with the BASE command.
Data Harvesting
Provided a Project Name and a Dataset Name are specified (either
explicitly
or from the MTZ file) and provided the NOHARVEST
keyword is not given, the program will automatically produce a data
harvesting
file. This file will be written to
$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.scala
The environment variable $HARVESTHOME defaults to the
user's
home directory, but could be changed, for example, to a group project
directory.
See also Data Harvesting.
KEYWORDED INPUT - SUMMARY
Summary classification of keywords
- The most commonly used keywords (almost essential)
- SCALES
- define scaling method (scaling model)
- RUN
- define subsets of data as
"runs". By default, data are split into runs at points of
discontinuity.
- REJECT
- set outlier rejection limits
- ANOMALOUS on
- anomalous scattering is present
- RESOLUTION
- resolution limits
- TITLE
- set a title
- Control of program flow
- ONLYMERGE
- Skip the scaling, go straight to merge step: this
requires RESTORE as well if the original input HKLIN file is used, but
not if a file from a previous OUTPUT SEPARATE run is re-input.
- RESTORE
- restore previously-determined scales, eg after
convergence failure or instead of re-running the scaling
- General keywords:
- PARTIALS
- controls acceptance of partials
- PRINT
- how much printing in logfile
- Principal keywords affecting scaling
- CYCLES
- number of cycles and convergence etc
- EXCLUDE
- select reliable reflections for scaling
- TIE
- restrain scaling parameters, particularly useful for the
SECONDARY (ABSORPTION) scaling option
- LINK
- use same scaling parameters for different runs (for
surface parameters (SECONDARY, ABSORPTION) or TAILS)
- INTENSITIES full
- use only fulls in scaling
- Principal keywords affecting merging
- OUTPUT
- what to put in the output file
- FINAL
- treatment of partials
- SDCORRECTION
- set SDcorrection parameters (particularly after
first run). The Sd parameters are refined by default.
- Dataset and Data Harvesting keywords
- NAME
- assign project/crystal/dataset name
- BASE
- define "base" dataset for dispersive differences
- PRIVATE
- directory permissions for user only
- USECWD
- write deposit file to current directory
- RSIZE
- width of a row in deposit file
- NOHARVEST
- do not write deposit file
- Rarely used keywords: ANALYSE, BINS, DAMP, DUMP, FILTER, HISTORY,
INITIAL, INSCALE, NODUMP, NOSCALE, OVERLAPMAP, SKIP, SMOOTHING,
[UN]FIX, UNLINK, WIDTH, XYBINS
KEYWORDED INPUT - DESCRIPTION
In the definitions below "[]" encloses optional items,
"|" delineates alternatives. All keywords are
case-insensitive, but are listed below in upper-case. Anything after
"!" or "#" is treated as comment. The available
keywords are:
ACCEPT,
ANALYSE,
ANOMALOUS,
BASE,
BINS, CORNERCORRECT,
CYCLES,
DAMP,
DUMP,
EXCLUDE,
FILTER,
FINAL,
HISTORY,
INITIAL,
INSCALE,
INTENSITIES,
LINK,
NAME,
NODUMP,
NOHARVEST,
NORMALISE,
NOSCALE,
ONLYMERGE,
OUTPUT,
OVERLAPMAP,
PARTIALS,
PRINT,
PRIVATE, REJECT,
RESOLUTION,
RESTORE,
RSIZE, RUN,
SCALES,
SDCORRECTION,
SKIP,
SMOOTHING,
TIE,
TITLE,
[UN]FIX,
UNLINK,
USECWD,
WIDTH,
XYBINS
RUN <Nrun> [<subkeys>]
Define a "run" : Nrun is the Run number, with an arbitrary
integer label (i.e. not necessarily 1,2,3 etc). A "run"
defines
a set of reflections which share a set of scale factors. Typically a
run
will be a continuous rotation around a single axis. The subkeys allow
definition
of a run in a flexible way. The definition of a run may use several RUN
commands. If no RUN command is given, or if the ALL keyword is used,
then run assignment will be done automatically, with run breaks at
discontinuities in dataset, batch number or Phi. Batches or batch
ranges may still be excluded, either with the EXCLUDE subkey here, or
by using the EXCLUDE keyword (qv)
- Subkeys:
- REFERENCE
- This run is a reference set, i.e. it will be given
a single scale factor
= 1.0 (an input scale factor in the SCALE column will still be applied
if present). Reference datasets are (by default) excluded from the
merging process, both from the output intensities and from the
statistics
- BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
- Define a list of batches, or a range of batches, to be
included in
or excluded from the run. If batches are included in more than one run
definition, the last definition will take priority.
- ALL
- Include all batches. In this case automatic run assignment
will be
done: to override this use eg RUN 1 BATCH 1 to 99999
- CRYSTAL <crystal_name>
- Define a crystal name to be included in the run. This would
usually be used in conjunction with the DATASET subkey.
- DATASET <dataset_name>
- Define a dataset name to be included in the run. A crystal
name
may be combined with the dataset name using the syntax
<crystal_name>/<dataset_name>. The dataset names
used here are those present in the input file, not those assigned or
altered by the NAME command.
- INCLUDE | EXCLUDE
- Set include/exclude flag for a following RANGE or BATCH
keyword. Excluded
batches or ranges will be omitted from the output file.
- RANGE <r1> TO <r2>
- Rotation range to include or exclude
Examples:
RUN 1 BATCH 1 TO 10000 # unconditionally include all batches
RUN 1 ALL EXCLUDE 77 79 132 # automatic run splitting will be done
RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132
RUN 2 CRYSTAL Native DATASET Lambda1
RUN 3 DATASET Native/Lambda2
RUN 4 INCLUDE RANGE 0 TO 90 EXCLUDE RANGE 45 TO 48
SCALES [<subkeys>]
Define layout of scales, ie the scaling model. Note that a layout
may be defined for all runs (no RUN subkeyword), then overridden for
particular runs by additional commands (see the example at the end of
this entry).
- Subkeys:
- RUN <run_number>
- Define run to which this command applies: the run must have
been previously
defined. If no run is defined, it applies to all runs
- ROTATION <Nscales> |
SPACING <delta_rotation>
- Define layout of scale factors along rotation axis (i.e.
primary beam),
either as number of scales or (if SPACING keyword present) as interval
on rotation [default SPACING 10]
- BATCH
- Set "Batch" mode, no interpolation along rotation
(primary) axis. This option is compulsory if a ROT column is not
present in
the input file, but otherwise the ROTATION option is preferred.
- SMOOTH <delta_batch>
- Set smoothed Batch mode: this treats the batch number as a
rotation angle, and interpolates along rotation axis in the same way
as the ROTATION option. <delta_batch> sets the interval on
batches (ie the number of batches to smooth over). This option is an
alternative to ROTATION if you have lost the information in the ROT
column (spindle rotation angle (Phi)), but otherwise the ROTATION
option is preferred.
- BFACTOR ON | OFF | ANISOTROPIC
- Switch Bfactors on or off. The default is ON, but Bfactor
refinement will be switched off by default if the scales are allowed
to vary across the detector (qv DETECTOR). The ANISOTROPIC keyword
activates anisotropic Bfactors (NOT RECOMMENDED): beware that the
parameters for this option are likely to be poorly determined. Note
that the anisotropic correction is centrosymmetric.
- BROTATION [|TIME] <Ntime> | SPACING <delta_time>
- Define the number of B-factors or (if SPACING keyword present) the
interval on "time": usually no time is defined in the input file, and
the rotation angle is used as its proxy. SCALES BATCH BROTATION
SPACING 5 makes the Bfactor variation smooth, but the scales
discontinuous by batch.
- SECONDARY [<Lmax>]
- Secondary beam correction expanded in spherical harmonics up
to
maximum order Lmax in the camera spindle frame. The number of
parameters increases as (Lmax + 1)**2, so you should use the minimum
order needed (eg 4 - 6). This correction would typically be combined
with the usual primary beam correction (eg ROTATION SPACING 5
SECONDARY 6). The deviation of the surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]
- ABSORPTION [<Lmax>]
- Secondary beam correction expanded in spherical harmonics up
to
maximum order Lmax in the crystal frame based on POLE (qv). The number
of parameters increases as (Lmax + 1)**2, so you should use the
minimum order needed (eg 4 - 6). This correction would typically be
combined with the usual primary beam correction (eg ROTATION SPACING 5
ABSORPTION 6). The deviation of the surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]. This is not
substantially different from SECONDARY in most cases, but may be
preferred if data are collected from multiple settings of the same
crystal, and you want to use the same absorption surface. This would
only be strictly valid if the beam is larger than the crystal.
- SURFACE [<Lmax>]
- Local correction expanded on direction of the scattering
vector in
hkl space (ie crystal frame) in spherical harmonics up to maximum
order Lmax. The number of parameters increases as (Lmax + 1)**2, so
you should use the minimum order needed (eg 4 - 6). The polar axis may
be specified with the POLE keyword (qv). If you want to do
3-dimensional scaling, the SECONDARY or ABSORPTION option is
preferable: this option should only be used if the diffraction
geometry information required to work out the beam directions is not
available.
- POLE <h|k|l>
- Define the polar axis for ABSORPTION or SURFACE as h, k or l
(eg
POLE L): the pole will default to either the closest axis to the
spindle (if known), or l (k for monoclinic space-groups).
- DETECTOR <Nscales_X> [<Nscales_Y>] | SPACING <delta_X> [<delta_Y>]
- Define layout of scale factors on the detector (i.e. secondary
beam), either as the number of scales in each direction (along XDET &
YDET), or (if the SPACING keyword is present) as an interval on XDET &
YDET. The values for Y default to those for X if not specified. This
option assumes that the detector positions are recorded in the input
file (columns XDET, YDET), in any units (mm or pixels). If you allow
the scale to vary across the detector (anything other than DETECTOR 1,
the default), then by default Bfactor refinement is switched off,
since the combination is likely to be unstable [default 1 scale, i.e.
no variation of scale across the detector]. The SECONDARY option is
probably better.
- CONSTANT
- One scale for each run (equivalent to ROTATION 1)
- TAILS [<v> [<a0> [<a1>]]]
- Not normally recommended. Apply correction for diffuse
scattering (reflection tails)
for
this run. This can only be used with summed partials (INTENSITIES
PARTIALS: this is the default). See introduction for explanation.
Initial values for the parameters v, a0 & a1 may be given
following the keyword
- v width of tails in reciprocal space (A**-1) [default = 0.01]
- a0 fraction of intensity in diffuse peak at theta = 0 [default =
0.0, fixed]
- a1 slope of intensity fraction against (sin theta/lambda)**2
[default = 10]
- Parameters may be fixed using the FIX command, or the same set used
by different runs as defined by the LINK command. These controls may
be required to avoid the parameters going wild.
- SLOPE
- NOT RECOMMENDED. Set "Slope" mode, like Batch, except
that each batch has different scales at the beginning and end of the
rotation range. The value used for each reflection is interpolated
linearly according to the "Rotation" (phi) value. SLOPE
implies BATCH mode. Be careful with this option: does it really
improve the data? It is unlikely to work well if the mosaicity is
large. TIE ROTATION may be used to restrain the difference in scales.
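For example (run numbers illustrative), a layout for all runs may be
followed by an override for one run:
scales rotation spacing 5 secondary 6 bfactor on brotation spacing 20
scales run 2 batch       # run 2 only: discontinuous scales by batch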
CORNERCORRECT <correction table filename>
Apply "corner correction" for CCD detectors, see above. This applies a
correction on input based on the pixel coordinates of the observation,
using a calibrated table of correction factors. The name of the file
containing the corrections (as an ADSC image) is given here, or on the
command line as the CORNERCORRECT parameter.
SDCORRECTION [[NO]REFINE] [UNIFORM | INDIVIDUAL | COMMON] [FIXSDB]
[[NO]ADJUST] [RUN <RunNumber>] [FULL | PARTIAL | BOTH]
<SdFac> [<SdB>] <SdAdd>
Input or set options for the "corrections" to the input standard
deviations: these are modified to
sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}
where Ihl is the intensity (SdB may be omitted in the input). Note
that the SdB term was multiplied by the Lorentz/polarization factor LP
in versions 3.3.0 to 3.3.8, but not in earlier or later versions: the
values of SdB cannot be compared between these versions.
The default is "SDCORRECTION REFINE INDIVIDUAL NOADJUST"
The keyword REFINE controls refinement of the correction parameters,
essentially trying to make the SD of the distribution of fractional
deviations (Ihl - <I>)/sigma equal to 1.0 over all intensity ranges.
The residual minimised is Sum( w * (1 - SD)^2 ), where w is the number
of reflections in that intensity bin. Other subkeys control what
values are determined and used for different runs (if more than one):
- UNIFORM same SD
parameters for all runs, fulls and partials (always used in a first
pass before the other options)
- INDIVIDUAL [default] use different SD parameters for
each run, fulls and partials
- COMMON same SdB &
SdAdd for all, but individual SdFac parameters
- FIXSDB
fixes the SdB parameter in the refinement (but it seems best to let it
refine, even though it has no obvious physical meaning)
The keyword ADJUST activates an automatic adjustment of the SdFac
parameters from the normal probability analysis, after any REFINE step
[default is NOADJUST] (this applies to all runs).
RUN <run_number>
Define run to which this command applies: the run must have been
previously
defined. If no run is defined, it applies to all runs. Different values
may be specified for fully recorded reflections (FULL) and for
partially
recorded reflections (PARTIAL), or the same values may be used for both
(BOTH), e.g.
sdcorrection full 1.4 0.11 part 1.4 0.05
With the output options SEPARATE or POSTREF, the modified Sds are
written
to the output file in columns SIGIC [& SIGIPRC if IPR is present].
These columns will be used by Postref
but ignored on reinput to Scala.
PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]]
[CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>]
[SCALE_PARTIAL <minimum_fraction>] [USE_PROFILE]
Select the way in which partials are treated in both scaling and
merging.
These settings may be overridden separately for the scaling and merging
steps with the INTENSITIES and FINAL commands respectively.
By default, partials are included (summed) in both scaling and in
merging.
- Subkeys:
- [NO]CHECK
- do [not] check for consistency of MPART flags (if present, i.e.
from
Mosflm). Reflections failing this test are tested for total fraction
(see
TEST option) [default do if MPART is present]
- [NO]TEST [<lower_limit>
<upper_limit>]
- do [not] accept partials only if total fraction (from
FRACTIONCALC
column) is in range lower_limit -> upper_limit [default if no MPART
flag, limits 0.95, 1.05]
- CORRECT [<minimum_fraction>]
- Scale partials whose predicted total fraction (needs reliable
FRACTIONCALC) is in the range minimum_fraction -> lower_limit
[default minimum = <lower_limit>]
- [NO]GAP
- do [not] accept partials with a gap in them, e.g. a partial over 3
parts with the middle one missing. GAP implies NOCHECK and TEST:
CORRECT may also be set [default NOGAP]
- MAXWIDTH
<maximum_width>
- maximum number of parts for an acceptable summed partial
- SCALE_PARTIALS
- use scaled partials greater than <Minimum_fraction>.
Only use this if the FRACTIONCALC column contains a good estimate of
the
partiality, and if you really need to recover these observations.
- USE_PROFILE
- use profile-fitted intensity even for scaled partials
INTENSITIES [INTEGRATED | PROFILE | PR_PART | COMBINE [<Imid>]
[POWER <Ipower>]] [[NO]ANOMALOUS]
[FULLS | ONLYFULLS | SCALE_PARTIAL <minimum_fraction>
| PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]]
[CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>]]
Intensities selection for scaling: which intensities to use, whether
to keep Bijvoet pairs separate, and treatment of partials in scaling:
(a) Intensity selection options:
Set which intensity to use, of the integrated intensity (column I)
or
profile-fitted (column IPR), if both are present. Note this applies to
all stages of the program, scaling & averaging.
- Subkeys:
- INTEGRATED
- summation integrated intensity I.
- PROFILE
- profile-fitted intensity IPR [default if present]. Note that
this will
not be used for scaled partials unless PARTIALS USE_PROFILE is set.
- PR_PART
- profile IPR for fulls, integrated intensity for partials
- COMBINE [<Imid>]
[POWER <Ipower>]
- Use weighted mean of profile-fitted & integrated
intensity,
profile-fitted for weak data, summation integration value for strong.
- I = w*Ipr + (1-w)*Iint
- w = 1/(1 + (Iint/Imid)**Ipower)
- Imid may either be given here explicitly or by default will be set
to the mean unscaled intensity. Ipower defaults to 3.
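As a worked illustration (with an assumed Imid = 1000 and the default
Ipower = 3): a weak observation with Iint = 100 gives
w = 1/(1 + 0.1**3), i.e. about 0.999, so I is essentially the
profile-fitted value, while a strong observation with Iint = 10000
gives w = 1/(1 + 10**3), i.e. about 0.001, so I is essentially the
summation value.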
(b) Treatment of Bijvoet-related observations
By default, all observations (I+ & I-) are treated alike in
scaling.
This is normally the correct thing to do, since the anomalous
differences
are usually small and randomly positive and negative. In a case with
large
anomalous differences and high redundancy, it may be better to keep the
I+ & I- observations separate in the scaling. Note that typically
this
will severely reduce the scaling overlaps between different parts of
the
data, and is not recommended except in special cases.
- Subkeys:
- ANOMALOUS
- keep I+ and I- observations separate in scaling
- NOANOMALOUS
- use I+ and I- together in scaling [default]
(c) Options for treatment of partials in scaling (overrides options
given under PARTIALS):
Set whether partially recorded reflections should be used in
scaling, & if so, whether to use summed or scaled partials. By
default summed partials are used in scaling as well as fulls. See
introduction above for a description of the use of partially recorded
reflections. Treatment of partials in the final averaging stage is
defined with the FINAL command
- Subkeys:
- FULLS
- use fully recorded observations only, & previously summed
partials
(from MOSFLM ADDPART)
- ONLYFULLS
- use fulls only: exclude previously summed partials (from
MOSFLM)
- SCALE_PARTIALS
- use scaled partials greater than <Minimum_fraction> in
the scaling.
Only use this if the FRACTIONCALC column contains a good estimate of
the
partiality.
- PARTIALS
- use summed partials in scaling (if present) [this is the
default].
The following flags are qualifiers of PARTIALS and will override those
given on a previous PARTIALS command, for the scaling step only (not
merging):
- [NO]CHECK
- do [not] check for consistency of MPART flags (if present).
Reflections
failing this test are tested for total fraction (see TEST option)
[default
do if MPART is present]
- [NO]TEST
[<lower_limit>
<upper_limit>]
- do [not] accept partials only if total fraction (from
FRACTIONCALC
column) is in range lower_limit -> upper_limit [default if no MPART
flag, limits 0.95, 1.05]
- CORRECT [<minimum_fraction>]
- Scale partials whose predicted total fraction (needs reliable
FRACTIONCALC) is in the range minimum_fraction -> lower_limit
[default minimum = <lower_limit>]
- [NO]GAP
- do [not] accept partials with a gap in them, e.g. a partial over 3
parts with the middle one missing. GAP implies NOCHECK and TEST:
CORRECT may also be set [default NOGAP]
- MAXWIDTH
<maximum_width>
- maximum number of parts for an acceptable summed partial
REJECT [SCALE | MERGE] [COMBINE | SEPARATE] <Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]] [KEEP | REJECT | LARGER | SMALLER]
Define rejection criteria for outliers: different criteria may be
set for the scaling and for the merging (FINAL) passes. If neither
SCALE nor MERGE are specified, the same values are used for both
stages. The default values are REJECT 6 ALL -8, ie test within I+ or
I- sets at 6 sigma, and between I+ & I- with a threshold adjusted
upwards from 8 sigma according to the strength of the anomalous
signal. The adjustment of the ALL test is not necessarily reliable.
If there are multiple datasets, by default, deviation calculations
include data from all datasets [COMBINE]. The SEPARATE flag means that
outlier rejections are done only between observations from the same
dataset. The usual case of multiple datasets is MAD data.
If ANOMALOUS ON is set, then the main outlier test is done in
the merging step only within the I+ & I- sets for that reflection,
ie
Bijvoet-related reflections are treated as independent. The ALL
keyword here enables an additional test on all observations including
I+ & I- observations. Observations rejected on this second check
are
flagged "@" in the ROGUES file. In the scaling step, the outlier check
includes all observations, unless anomalous observations are kept
separate in scaling (INTENSITIES ANOMALOUS: this is an unusual option
for special cases only).
- Subkeys:
- SEPARATE
- rejection & deviation calculations only between
observations
from the same dataset
- COMBINE
- rejection & deviation calculations are done with all
datasets
[default]
- SCALE
- use these values for the scaling pass
- MERGE
- use these values for the merging (FINAL) pass
- sdrej
- sd multiplier for maximum deviation from weighted mean I
[default 6.0]
- [sdrej2]
- special value for reflections measured twice [default =
sdrej]
- ALL
- check outliers in merging step between as well as within I+
& I- sets (not
relevant if ANOMALOUS OFF). A negative value [default -8] means adjust
the value upwards according to the slope of the normal probability
analysis of anomalous differences (AnomPlot)
- sdrej+-
- sd multiplier for maximum deviation from weighted mean I
including
all I+ & I- observations (not relevant if ANOMALOUS OFF)[default
check within I+ & I- sets only]
- [sdrej2+-]
- special value for reflections measured twice [default =
sdrej+-]
- KEEP
- in merging, if two observations disagree, keep both of them
[default]
- REJECT
- in merging, if two observations disagree, reject both of them
- LARGER
- in merging, if two observations disagree, reject the larger
- SMALLER
- in merging, if two observations disagree, reject the smaller
The test for outliers is described in Appendix 5
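For example (values illustrative), either of the following might be
given:
reject merge 4           # tighter 4 sigma test in the merging pass only
reject separate 6        # compare observations within each dataset only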
ANOMALOUS [OFF] [ON | ALL]
[RUN <Nrun>]
[MATCH [ [NO]INRUN | SPINDLE | INVERT | <hkl symmetry>]
[PHIDIF <maximum Phi difference>]
[TIMEDIF <maximum Time difference>]]
Controls the treatment of anomalous scattering information in the
merging
step. Note that the option of selecting matching anomalous pairs is not
recommended for normal use: it is likely to lead to seriously
incomplete
data in many cases, and the results should be compared carefully with
those
with the MATCH option switched off.
- Subkeys:
- OFF [default]
- no anomalous used, I+ & I- observations averaged together
in merging
- ON | ALL
- separate anomalous observations in the final output pass, for
statistics & merging: this is also selected by the keyword ANOMALOUS
on its own
- RUN <run number>
- set run for this MATCH option to apply to, otherwise it
applies to
all runs [default]
- MATCH
- use only matching I+ & I- pairs in merging
- Matching pairs are :-
- (a) in same run if INRUN is specified [default NOINRUN]
- (b) related by defined symmetry (if given as SPINDLE |
INVERT | <hkl
symmetry>)
- (c) not more than DeltaPhi apart (if given by PHIDIF)
- (d) not more than DeltaTime apart (if given by TIMEDIF
and a TIME
column is present)
- INRUN
- Matching pairs must be in the same run [default NOINRUN]
Definition of symmetry:-
- SPINDLE
- related by negation of the reciprocal index closest to the spindle:
this option requires full orientation data to be present in the file
- INVERT
- related by inversion of indices, i.e. -h, -k, -l
- <hkl symmetry>
- specified hkl symmetry (e.g. h, -k, l)
- PHIDIF <maximum Phi difference>
- maximum difference in Phi (ROT) between matching pairs
- TIMEDIF <maximum Time difference>
- maximum difference in TIME between matching pairs
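For example (values illustrative; the MATCH form is not recommended
for routine use, as noted above):
anomalous on                           # the normal case
! anomalous on match inrun phidif 30   # matched pairs within 30 degrees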
RESOLUTION [RUN <Nrun>] [DATASET
<dataset_name>] [[LOW]
<Resmin>]
[[HIGH] <Resmax>]
Set resolution limits in Angstrom, in either order, optionally for
individual datasets, or for runs (in which case this command MUST come
after the definition of the run).
The keywords LOW or HIGH, followed by a number, may be used to set the
low or high resolution limits explicitly: an unset limit will be set as
in the input HKLIN file. If the RUN & DATASET subkeywords are
omitted, the limit
applies
to all runs. A crystal
name
may be combined with the dataset name using the syntax
<crystal_name>/<dataset_name>. The dataset names
used here are those present in the input file, not those assigned or
altered by the NAME command. [Default use all data]
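Examples (dataset names illustrative):
resolution 30 1.8
resolution dataset Native/Lambda2 high 2.1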
TITLE <new title>
Set new title to replace the one taken from the input file. By
default,
the title is copied from hklin to hklout
ONLYMERGE
Only do the merge step, no initial analysis, no scaling (== INITIAL
NONE; NOSCALE). Note that this will usually need to be combined with a
RESTORE command.
RESTORE [<Scale_file_name>]
Read initial scales from a SCALES file from a previous run of Scala
(scales are normally dumped on every cycle, see DUMP). The number of
scales defined for each run this time should typically be the same as
in the dump, although a set of scale factors along ROTATION or
DETECTOR may be extrapolated to additional batches which were not
present in the initial scaling. The file may contain scales for runs
which are not used this time, but new runs may not be added. RESTORing
from a scale file which does not properly correspond to the run which
generated the file is liable to give silly results. No initial
analysis pass will be done unless the command INITIAL ANALYSE is
given.
INITIAL MEAN | UNITY | RUN <RunNumber>
<InitialScale> | NONE | ANALYSE
Define method of setting initial scales
- Subkeys:
- MEAN
- from mean intensities by rotation range [default]
- UNITY
- set all scales = 1.0
- RUN <RunNumber> <InitialScale>
- set the initial scale factor for this run. If this option is used,
any runs whose scales are not set explicitly will have their scales
set = 1.0
- NONE
- no initial analysis pass, set all scales to unity
- ANALYSE
- force initial analysis pass even if RESTORE option is used
PRINT [<subkey>]
Define amount of printing
- Subkeys:
- NONE
- almost none
- BRIEF
- some more [default]
- CYCLES
- more information about each minimization cycle
- FULL
- quite a lot
- DEBUG [<reflection_interval>]
- far too much: also define reflection interval for printing
- ALLOVERLAP
- print all numbers in overlap matrix after initial pass,
rather than
the default condensed table
- OVERLAP
- print condensed table of overlap matrix after initial pass
- NOOVERLAP
- no printing of overlap matrix after initial pass [default]
CYCLES [[NUMBER] <Ncycle>] [CONVERGE
<Conv_limit>]
[REJECT <Rej_cycle>] [WEIGHT VARIANCE | UNIT ]
Define number of refinement cycles, convergence limit, and
weighting scheme for scale refinement
- Subkeys:
- [NUMBER]
- maximum number of cycles [default 10]
- CONVERGE
- convergence limit (multiple of sd(param)) [default 0.3]
- REJECT
- 1st cycle number for rejection of outliers [default 2]. The default
is not to reject outliers on the first cycle, when the scales may be a
long way off, but if the initial scales are reasonable (particularly
if they come from a previous run) it is probably better to exclude
outliers from the first cycle as well
- WEIGHT VARIANCE | UNIT
- Weighting scheme for scale refinement: VARIANCE weighting is
default and usual; UNIT weights may help if the scale-factors vary
over a large range (unit weights have not been much tested)
EXCLUDE [RUN <Nrun>]
[[NO]EMAX <maximum_E> | EPROB <minimum_probability>]
[SDMIN <value>]
[SDMAX <value>] [ABSMAX <value>]
[ARC INSIDE|OUTSIDE <X1> <Y1>
<X2> <Y2> <X3> <Y3> ... <Xn> <Yn>]
[RECTANGLE <Xmin> <Xmax>
<Ymin> <Ymax>]
[BATCH <batch range>|<batch list>]
[CRYSTAL <crystal_name>]
[DATASET <dataset_name>]
Set intensity limits or positional limits for excluding observations.
Limits for scaling and merging passes:-
EMAX or EPROB, ARC, RECTANGLE, BATCH, CRYSTAL and DATASET limits
apply to all stages of the program
Limits for scaling pass only:-
If an observation is considered too weak (I .lt. sd(I) * SDMIN), or
if
an observation is too strong (I .gt. sd(I) * SDMAX .or. I .gt.
ABSMAX), then all observations of that reflection are omitted from the
scaling. Exclusions are not applied to a Reference run. [Default
EXCLUDE SDMIN 3.0]
These exclusions do not apply to the initial scale calculation
(INITIAL MEAN), nor to the output statistics, only to the scaling. The
test is only done on fully recorded observations, and against the
input standard deviations (i.e. unmodified by SDCORRECTION
parameters)
- Subkeys:
- RUN <Nrun>
- defines a run number (previously defined) for these exclusion
parameters
to apply to: else applies to all runs (this applies to SD, arc and
rectangle limits only: the EMAX|EPROB limit applies to
all runs)
- EMAX <maximum_E> | EPROB <minimum_probability>
- Define the maximum normalized amplitude E allowed: this may be given
either as the maximum E-value EMAX for an acentric reflection, eg 8 -
10, or as the minimum allowed probability EPROB, eg 1e-8 (Eprob =
exp(-Emax**2)). Excluded reflections are listed in the log file, and
in the ROGUES file. See R. Read, CCP4 Study Weekend, Sheffield 1999.
[Default EMAX 10]. NOEMAX switches this test off
- SDMIN
- minimum sd multiple for inclusion
- SDMAX
- maximum sd multiple for inclusion
- ABSMAX
- maximum absolute value i.e. observations are
excluded if:-
I .lt. sd(I) * SDMIN
.or. I .gt. sd(I) * SDMAX
.or. I .gt. ABSMAX
- ARC
- defines an area of detector coordinates (XDET, YDET) to be
excluded from all calculations, both scaling and merging, as a
circular arc. Data are excluded either INSIDE (lower radius) or
OUTSIDE (higher radius) the arc. The arc is defined by fitting a
circle to the coordinates of 3 or more points: points 1 (X1,Y1) and 2
(X2,Y2) define the ends of the arc (in either order). If X1,Y1 = X2,Y2
a complete circle is excluded. A series of arcs may be defined. This
option allows for the exclusion of shadows on the detector from eg
backstop or cryocooler etc.
- RECTANGLE
- defines a rectangular area of detector coordinates (XDET,
YDET) to
be excluded from all calculations, both scaling and merging. A series
of
rectangles may be defined.
- BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
- Define a list of batches, or a range of batches, to be excluded
altogether.
- CRYSTAL <crystal_name>
- Define a crystal name to be excluded altogether. This would
usually be used in conjunction with the DATASET subkey.
- DATASET <dataset_name>
- Define a dataset name to be excluded altogether. A crystal
name
may be combined with the dataset name using the syntax
<crystal_name>/<dataset_name>. The dataset names
used here are those present in the input file, not those assigned or
altered by the NAME command.
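Examples (all values illustrative):
exclude sdmin 6                  # use only I > 6*sd(I) in scaling
exclude batch 101 to 110         # drop these images altogether
exclude rectangle 0 50 0 400     # mask a detector strip (XDET/YDET units)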
[UN]TIE [SURFACE [<Sd_srf>]] [BFACTOR [<Sd_bfac>]] [A1 [<Sd_a1>]]
[ROTATION [<Sd_z>]] [DETECTOR [<Sd_xy>]]
Apply or remove restraints to parameters. These can be pairs of
neighbouring scale factors on rotation axis (ROTATION = primary beam)
or in detector plane (DETECTOR = secondary beam) to have the same
value, or neighbouring Bfactors, or surface spherical harmonic
parameters to zero (for SECONDARY or SURFACE corrections, to keep the
correction approximately spherical), with a standard deviation as
given. This may be used if scales are varying too wildly, particularly
in the detector plane. The default is no restraints on scales. A tie
is recommended (a) if scales are varied across the detector, eg TIE
DETECTOR 0.1, or (b) for SECONDARY or SURFACE corrections, eg TIE
SURFACE 0.001
UNTIE may be used to remove the default restraints on SURFACE and
A1 (not recommended)
- SURFACE: tie surface parameters to spherical surface [default
is TIE
SURFACE 0.001]
- BFACTOR: tie Bfactors along rotation
- A1: tie TAILS parameter A1 to starting value, ie that given on
the
SCALES command [default is TIE A1 4]
- ROTATION: tie parameters along rotation axis (mainly useful
with
BATCH mode)
- DETECTOR: tie parameters on detector
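For example (the values shown are the suggested ones from above):
tie surface 0.001        # default restraint on the harmonic surface terms
tie detector 0.1         # if scales are varied across the detector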
NORMALISE [SCALES|BFACTOR] [BEST|FIRST|RUN
<run_number>]
Controls which scale factors and Bfactors are "normalised", ie set
to 1.0 or 0.0. The overall scale of the data is indeterminate, so one
scale factor needs to be set = 1.0: similarly, one relative B-factor
needs to be set = 0.0. The default options are to normalise scales on
the first part of the first run, and Bfactors on the best part (ie to
make all the Bfactors negative: because of the smoothing they may
still go slightly positive). The normalisation of the scales is not
important, but the normalisation of Bfactors is, because negative Bs
will sharpen data, while positive Bs will blur it.
- SCALES
- Following keywords apply to scales
- BFACTORS
- Following keywords apply to Bfactors [default]
- BEST
- Normalise B-factors on the best bit (not applicable to scales)
[default for Bfactors]
- FIRST
- Normalise on the beginning of the first run [default for scales]
- RUN <run_number>
- Normalise on the beginning of the defined run
OUTPUT <subkeywords>
Control what goes in the output file. Three types of output MTZ file
may be produced: (a) AVERAGE, average intensity for each hkl (I+ &
I-); (b) SEPARATE, observations from the input file with the scale
calculated, for re-input to Scala (or Postref, see POSTREF option);
(c) UNMERGED, unaveraged observations, but with scales applied,
partials summed or scaled, and outliers rejected. AVERAGE and UNMERGED
may be combined to write both types of file at the same time: in this
case the filename is created from the HKLOUT filename (with the
dataset appended if the SPLIT option is on) with the string
"_unmerged" appended.
A reference batch is always excluded from the final statistics,
even if it is included in the output file (only possible with the
SEPARATE option).
- File format options:
- NONE
- no output file written
- AVERAGE
- [default] output averaged intensities, <I+> &
<I->
for each hkl
- SEPARATE
- output observations as input, but with added columns for
SCALE
etc. This file may be reinput to Scala for further scaling
(e.g. with a different scaling model)
- POSTREF
- append columns for Postref. This option implies SEPARATE. The added
columns are IMEAN SIGIMEAN ISUM SIGISUM: IMEAN is the mean of
fully-recorded reflections, ISUM the summed partials (partials only)
- UNMERGED
- apply scales, sum or scale partials, reject outliers, but do
not
average observations
- POLISH
- Write reflections also to a formatted file as well as the MTZ
file
(logical name SCALEPACK) in some obscure format as written by
"scalepack" (or my best approximation to it). Why would anyone want to
do this? If the UNMERGED option is also selected, then the output
matches the scalepack "output nomerge original index", otherwise it is
the "normal" scalepack output, with either I, sigI or I+ sigI+, I-,
sigI-, depending on the "anomalous" flag.
- Dataset options (only relevant for multiple datasets):
- SPLIT
- If there are multiple datasets defined, split them into
separate
output files [this is the default]. The base filename is taken from
the HKLOUT, with the dataset name added for each dataset.
- TOGETHER
- Write out multiple datasets into the
same
file, but labelled as different datasets
Other options:
- (a) UNMERGED options:
- ORIGINAL
- write original indices hkl: M/ISYM = 1 for all reflections
- REDUCED
- [default] hkl indices are reduced to asymmetric unit, as in
input file
- BEAMS
- output direction cosines of incident (s0) and diffracted (s2)
beams in output file (columns S0X, S0Y, S0Z, S2X, S2Y, S2Z). These
vectors are in the orthogonalised crystal frame with x,y,z axes along
a*, c x a*, c (or in the diffractometer frame if the keyword DBEAMS
is used)
- (b) SEPARATE (POSTREF) options
the following apply only to the SEPARATE (POSTREF) option, and must not
precede that switch:-
- REFERENCE
- write reference batch (if present) to output file
- NOREFERENCE
- [default] omit reference batch (if present) from output file
- KEEP
- [default unless average] keep reflections outside resolution
limits.
The SCALE column will be set = 0.0
- KEEP SCALE
- keep reflections outside resolution limits, and calculate
scales for
them. This is dangerous unless the proportion of reflections omitted
from
scaling is small
- EXCLUDE
- [default if AVERAGE] exclude reflections outside resolution
limits
- OMIT OUTLIERS
- omit rejected outliers from output file (SEPARATE &
POSTREF options
only). In this case a ROGUES file is written (see below) [default keep
them in, but flagged in the FLAG column]
- OMIT PARTIALS [RUN
<Nrun>]
- omit partially recorded reflections from the output file. If no run
number is given, it applies to all runs. Multiple runs may be
specified on successive OUTPUT OMIT PARTIALS RUN commands
- ROGUES
- write a list of rejected reflections to the file ROGUES. This may
be assigned on the command line. A ROGUES file is always written for
the AVERAGE & UNMERGED options. [for SEPARATE, default no ROGUES file
written unless the OMIT OUTLIERS option is used]
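For example, a sketch combining the merged and unmerged outputs:
output average unmerged          # HKLOUT plus an "_unmerged" copy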
ACCEPT [OVERLOADS|BGRATIO
<bgratio_max>|PKRATIO <pkratio_max>|GRADIENT
<bg_gradient_max>|EDGE]
Set options to accept observations flagged as rejected by the FLAG
column from Mosflm (Version 6.2.3 and later). By default, any
observation with FLAG .ne. 0 is rejected.
- Subkeys:
- OVERLOADS
- Accept profile-fitted overloads
- BGRATIO
- Observations are flagged in Mosflm if the ratio of rms
background
deviation relative to its expected value from counting statistics is
too large. This option accepts observations if bgratio < bgratio_max
[default in Mosflm 3.0]
- PKRATIO
- Accept observations with peak fitting rms/sd ratio pkratio
<
pkratio_max [default maximum in Mosflm 3.5]. Only set for fully
recorded
observations
- GRADIENT
- Accept observations with background gradient <
bg_gradient_max [default in Mosflm 0.03].
- EDGE
- Accept profile-fitted observations on edge of active area of
detector
FINAL [NONE | FULLS | ONLYFULLS | SCALE_PARTIAL <Minimum_fraction>
| PARTIALS [[NO]CHECK] [[NO]TEST [<lower_limit> <upper_limit>]]
[CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>]]
Select whether or not to use summed or scaled partials in the final
analysis after scale determination. If this command is missing, summed
partials will be included if the input file contains a FRACTIONCALC
column.
- Subkeys:
- NONE
- no final analysis/output pass
- FULLS
- use fulls only (& previously summed partials, eg from
MOSFLM
ADDPART or Scalepack) [default if no FRACTIONCALC column]
- ONLYFULLS
- use fulls only: exclude previously summed partials (from
MOSFLM)
- SCALE_PARTIALS
- use scaled partials greater than <Minimum_fraction> in
the merging.
Only use this if the FRACTIONCALC column contains a good estimate of
the
partiality.
- PARTIALS
- use summed partials in the final analysis (if present); see the
introduction above for a description of the use of partially recorded
reflections [this is the default if a FRACTIONCALC column is present].
The following flags are qualifiers of PARTIALS and will override those
given on a previous PARTIALS command, for the merging step only (not
scaling):
- [NO]CHECK
- do [not] check for consistency of MPART flags (if present).
Reflections
failing this test are tested for total fraction (see TEST option)
[default
do if MPART is present]
- [NO]TEST
[<lower_limit> <upper_limit>]
- do [not] accept partials only if total fraction (from
FRACTIONCALC
column) is in range lower_limit -> upper_limit [default if no MPART
flag, limits 0.95, 1.05]
- CORRECT [<minimum_fraction>]
- Scale partials whose predicted total fraction (needs reliable
FRACTIONCALC) is in the range minimum_fraction -> lower_limit
[default minimum = <lower_limit>]
- [NO]GAP
- do [not] accept partials with a gap in them, e.g. a partial over 3
parts with the middle one missing. GAP implies NOCHECK and TEST:
CORRECT may also be set [default NOGAP]
- MAXWIDTH
<maximum_width>
- maximum number of parts for an acceptable summed partial
[UN]FIX [V] [A0] [A1]
Option to fix or free TAILS parameters: by default V & A1 are
free,
A0 is fixed [default A0 = 0.0]. Fixing A1 may help for low resolution
data
particularly.
LINK [SURFACE|TAILS] ALL | <run_2> TO
<run_1>
run_2 will use the same SURFACE (or SECONDARY) or TAILS parameters
as run_1. This can be useful when different runs come from the same
crystal, and may stabilize the parameters. LINK TAILS ALL will use the
same tails parameters for all runs for which TAILS parameters are
refined. The keyword ALL will be assumed if omitted.
- For TAILS parameters, the default is LINK TAILS ALL,
but any LINK or UNLINK command will override this.
- For SECONDARY or SURFACE parameters, the default is to link runs
which come from the same dataset. They should be UNLINKed if they are
different.
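For example (run numbers illustrative):
link surface 2 to 1      # run 2 shares run 1's absorption surface
link tails all           # the default for TAILS parameters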
UNLINK [SURFACE|TAILS] ALL | <run_2> TO
<run_1>
Remove links set by LINK command (or by default). The keyword ALL
will
be assumed if omitted, e.g. UNLINK TAILS [ALL] will use
separate tails
parameters for each run.
SKIP <N_skip> [[FOR]
<N_skip_cycles>]
Allow a subset of reflections to be used during the initial cycles
of
scaling, to speed up the program. For the first N_skip_cycles, only
every
N_skip'th unique reflection will be used. N_skip_cycles defaults =
Ncycle-2,
and the program will force 2 more cycles with all data if convergence
is
reached while reflections are still being skipped. You should check
that
convergence has been reached with all observations, particularly if the
number of observations used in the early cycles is small.
FILTER <Filter> [<Damp>]
Define filter level, & damp level. In the minimization, shifts
corresponding
to eigenvalues .lt. <Filter> are removed, <Damp> is added
to
all eigenvalues. [Default 1.0e-6, 0.0]
DAMP [NONE] | <Damp> <NcycDamp>
Set damping level for shifts. <Damp> is added to all eigenvalues
for
the first <NcycDamp> cycles. This may be useful if the scales
vary over
a wide range, particularly if the scale refinement diverges at first,
but
is not normally recommended, as it seems to slow convergence. Default
is
DAMP NONE. If <NcycDamp> is omitted, the damping applies to all
cycles
BINS <Nsrange>
Define number of resolution bins for analysis [default 10]
XYBINS <Nx> [<Ny>]
Define number of bins across the detector, x (=XDET) and y (=YDET).
Only used if XDET, YDET columns are present in the input file. <Ny>
defaults to <Nx>. XYBINS 0 turns off the analysis [default Nx = Ny =
20]
SMOOTHING <subkeyword> <value>
Set smoothing factors ("variances" of weights). A larger
"variance" leads to greater smoothing
- Subkeys:
- TIME <Vt>
- smoothing of B-factors [default 0.5]
- ROTATION <Vz>
- smoothing of scale along rotation [default 1.0]
- DETECTOR <Vxy>
- smoothing of scale on detector [default 1.0]
- PROB_LIMIT
<DelMax_t> <DelMax_z>
<DelMax_xy>
- maximum values of normalized squared deviation (del**2/V) to
include
a scale [default set automatically, typically 3.0]
INSCALE OFF | ON
Switch OFF or ON application of an input SCALE column. By default, if
the
input file contains a column called SCALE (e.g. from a
previous run of
Scala), it will be applied.
NOSCALE
Don't do any scaling, just the final analysis (equivalent to CYCLES
0)
DUMP [<Scale_file_name>]
Dump all scale factors to a file after each cycle. These can be used
to restart scaling using the RESTORE option, or for rerunning the merge
step. If no filename is given, the scales will be written to logical
file
SCALES, which may be assigned on the command line. DUMP is set by
default,
but may be turned off with the NODUMP command.
NODUMP
No dump of scales to file. Default is DUMP.
ANALYSE [[NO]NORMAL] [[NO]PLOT] [MAXDENSITY
<maximum
point density>]
This command controls the normal probability analyses
- Subkeys:
- [NO]NORMAL
- do [not] do normal probability analyses [default do them]
- [NO]PLOT
- do [not] write normal probability plot to output file with
logical
name DELTA [default do write file]. This file contains pairs of
delta(expected),
delta(observed) for fulls, then summed partials, then scaled partials
- MAXDENSITY
- maximum point density for normal probability plot. This plot
includes
a point for every observation, so in large datasets it can get very
big.
This parameter allows the sampling of the plot, so that in the central
crowded part only some of the points are included in the plotfile
[default
25]
HISTORY <history line>
Define optional line to be added to the history records in the file.
This is in addition to a line giving the date and time of the run,
which
is always added. Only one optional history line may be added.
OVERLAPMAP
Write the overlap matrix from the initial analysis to a map file
assigned
to MAPOUT. Note that the initial analysis is not done if the RESTORE
option
is used or INITIAL NONE is set.
WIDTH WILSON | LINEAR | SQUARE
[NBINS <Nbins>] [<mid-point>]
Select binning mode on intensity
- Subkeys:
- WILSON
- [default] exponential bins
- LINEAR
- linear bins
- SQUARE
- quadratic bins
In each case, <mid-point> is the upper limit for the middle
bin. The NBINS keyword may be used to specify the number of bins
[maximum & default = 13]
NAME [RUN <RunNumber(s)>]
PROJECT <project_name> CRYSTAL <crystal_name> DATASET
<dataset_name>
Assign or reassign project/crystal/dataset names for the output file. The names given here supersede those in the input file: each NAME command defines an output dataset. If the RUN subkey is present, different runs (or groups of runs) may be assigned to different datasets: the run must have already been defined. If the RUN subkey is omitted, the names apply to all data. RunNumber may be a list or a range of run numbers (see examples below). DATASET must be present and must be unique: if PROJECT or CRYSTAL are omitted, they take the value last given for these parameters. DATASET may optionally be given in the syntax crystal_name/dataset_name
Examples:
name run 1 project Lysozyme crystal Native dataset L1
name run 2 3 dataset L2 # takes project & crystal from previous line
name run 4 to 6 crystal Native dataset L3
BASE [CRYSTAL <crystal_name>] DATASET
<base_dataset_name>
If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, i.e. for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). The CRYSTAL keyword may normally be omitted.
PRIVATE
Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').
USECWD
Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.
RSIZE <row_length>
Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.
NOHARVEST
Do not write out a deposit file; the default is to do so provided Project and Dataset names are available.
INPUT AND OUTPUT FILES
Input
- HKLIN
- The input file must be sorted on H K L M/ISYM BATCH
Compulsory columns:
H K L indices
M/ISYM partial flag, symmetry number
BATCH batch number
I intensity (integrated intensity)
SIGI sd(intensity) (integrated intensity)
Optional columns:
XDET YDET position on detector of this reflection: these
may be in any units (e.g. mm or pixels), but the
range of values must be specified in the
orientation data block for each batch. If
these columns are absent, the scale may not be
varied across the detector (i.e. only SCALES
DETECTOR 1 is valid)
ROT rotation angle of this reflection ("Phi"). If
this column is absent, only SCALES BATCH is valid.
IPR intensity (profile-fitted intensity)
SIGIPR sd(intensity) (profile-fitted intensity)
SCALE previously calculated scale factor (e.g. from
previous run of Scala). This will be applied
on input
SIGSCALE sd(SCALE)
TIME time for B-factor variation (if this is
missing, ROT is used instead)
MPART partial flag from Mosflm
FRACTIONCALC calculated fraction, required for the SCALE_PARTIALS option
LP Lorentz/polarization correction (already applied)
FLAG error flag (packed bits) from Mosflm (v6.2.3
or later). By default, if this column is present,
observations with a non-zero FLAG will be
omitted. They may be conditionally accepted
using the ACCEPT command (qv)
Bit flags:
1 BGRATIO too large
2 PKRATIO too large
4 Negative > 5*sigma
8 BG Gradient too high
16 Profile fitted overload
32 Profile fitted "edge" reflection
BGPKRATIOS packed background & peak ratios, & background
gradient, from Mosflm, to go with FLAG
CORNERCORRECT
File containing pixel corrections for the corner correction option (qv), as an ADSC image file, for groups of 8x8 pixels for a binned image. Note that this correction is unique to an individual detector, and Scala is unable to check whether the appropriate file has been given.
Output
- HKLOUT
- (a) Option AVERAGE
- The output file contains columns
H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively, and are always present even for the option ANOMALOUS OFF.
If the "TOGETHER" option is selected, then all datasets will be
written to the same file, with the column labels augmented by the
dataset name
IMEAN_dataset SIGIMEAN_dataset I(+)_dataset SIGI(+)_dataset I(-)_dataset SIGI(-)_dataset
- If the "SPLIT" option is specified then separate files
are written for each dataset: files are named with the base
HKLOUT name with the dataset name appended, as "_dataset"
- (b) Option SEPARATE
- The output file contains the same columns as the input, with some columns added if not previously present:-
SCALE & SIGSCALE - the calculated scale factor & its sd (this may be applied in another run of Scala). SCALE will be = 0.0 for reflections outside the resolution cutoff, if they are included in the output file (option OUTPUT KEEP) (see example)
SIGIC [, SIGIPRC] - the corrected standard deviations of I [and IPR], as altered by SDCORRECTION commands. These columns are only written if an SDCORRECTION command is given to Scala.
If the OUTPUT POSTREF option is given, then the columns IMEAN SIGIMEAN ISUM SIGISUM are also added:
IMEAN mean of fully-recorded reflections
ISUM summed partials (partials only)
- (c) Option UNMERGED
- As for SEPARATE, but with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. If a separate profile-fitted intensity column IPR, SIGIPR is present in the input file as well as columns I, SIGI, only one set will be chosen, as specified. Columns defining the diffraction geometry (e.g. XDET YDET ROT TIME LP FRACTIONCALC) will be preserved in the output file. If both AVERAGE & UNMERGED are specified, then the filename for the unmerged file has "_unmerged" appended
Output columns:
H,K,L REDUCED or ORIGINAL indices (see OUTPUT options)
M/ISYM Symmetry number (REDUCED), = 1 for ORIGINAL indices
BATCH batch number as for input
I, SIGI scaled intensity & sd(I)
SCALEUSED scale factor applied
SIGSCALEUSED sd(SCALE applied)
NPART number of parts, = 1 for fulls, negated for scaled
partials, i.e. = -1 for scaled single part partial
TIME copied from input if present
XDET,YDET copied from input if present
ROT copied from input if present (averaged for
multi-part partials)
FRACTIONCALC total fraction (if present in input file)
LP copied from input if present
If BEAM option is used:-
S0X, S0Y, S0Z direction cosines of the incident beam in the
orthogonalised crystal frame (x, y, z axes along
a*, c x a*, c)
S2X, S2Y, S2Z direction cosines of diffracted beam in
orthogonalised crystal frame
- SCALES
- scale factors from DUMP, used by RESTORE option
- ROGUES
- list of bad agreements
- PLOT
- If SCALES SECONDARY or SURFACE options are used, graph of
correction surface (Plot84 format)
- NORMPLOT
- normal probability plot from merge stage
*** this is at present written in a format for the plotting program xmgr (aka grace) ***
- ANOMPLOT
- normal probability plot of anomalous differences
(I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
*** this is at present written in a format for the plotting program xmgr (aka grace) ***
- CORRELPLOT
- scatter plot of pairs of anomalous differences (in multiples of
RMS) from random half-datasets. One of these files is generated for
each output dataset
*** this is at present written in a format for the plotting program xmgr (aka grace) ***
- ROGUEPLOT
- a plot of the position on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the positions of the principal ice rings shown
*** this is at present written in a format for the plotting program xmgr (aka grace) ***
- SCALEPACK
- Formatted output selected by the command OUTPUT POLISH
EXAMPLES
- Simple smoothed scaling, with some alternatives flagged as #*#
set crystal = "tfn2"
set run = 1    # run number, used in the file names below
scala hklin ${crystal}_srs \
hklout ${crystal}_merge \
scales ${crystal}_${run}.scales \
rogues ${crystal}_${run}.rogues \
normplot ${crystal}_${run}.norm \
<< eof
run 1 all
intensities partial # we have few fulls: this is the default
cycles 20
anomalous off # this is a native set
#*# anomalous on # or a derivative
sdcorrection 1.3 0.02 # from a previous run
# try it with and without the tails correction: this is with tails
scales rotation spacing 10 bfactor on tails
#*#
#*# Some alternatives
#*# >> Recommended usual case
#*# >> If you have radiation damage, you need a Bfactor,
#*# >> but a Bfactor at coarser intervals is more stable
#*# scales rotation spacing 5 secondary 6 \
#*# bfactor on brotation spacing 20
#*# tie bfactor 0.5 ## restraining the Bfactor also sometimes helps
#*#
reject 4 # reject outliers more than 4sd from mean
#*# reject 6 all 8 # this is the default
exclude emax 8 # reject very large observations
# default is Emax 10
eof
- Simple Batch scaling
#!/bin/csh -f
#
# Scale data from Mosflm, merge with Scala
#
scala hklin jpa_example hklout jpa_example_sc \
scales jpa.scales \
rogues jpa.rogues \
normplot jpa.norm \
anomplot jpa.anom \
<< eof-1
run 1 batch 2001 to 2049
run 2 batch 2051 to 2100
cycles 8
sdcorr 1.5 0.03
scales batch bfactor on # batch scaling is generally poorer than smoothed
reject merge 4
anomalous on
eof-1
- A more complicated example: smooth scaling of native, then scaling of derivative to native
#!/bin/csh -f
#
#scala
#
cd /scr0/fm1/Temp
#
##
#==== Sort native output from Mosflm together
##
sort:
sortmtz hklout m6c8_sort.mtz << end_sort
H K L M/ISYM BATCH I SIGI
m6c8a1.mtz
m6c8a2.mtz
end_sort
#
##
#==== scale native data together, no Bfactor, smooth scale on rotation
#==== merge native
##
scala hklin m6c8_sort.mtz hklout m6c8_scala <<EOF
run 1 batch 1 to 90000
title frozen native monoclinic m6c8
scales bfactor off rotation spacing 5
resolution 25 6.1
anomalous off
reject merge 4
sdcorr 1.3 0.04
EOF
#
# Convert native data into form suitable for reinput to Scala
combat hklin m6c8_scala hklout m6c8_r << eof-r
input mtzi
labin I=IMEAN SIGI=SIGIMEAN
batch 1
eof-r
#
##
#==== Sort derivative data together
##
sort:
sortmtz hklout m6cb3_sort.mtz << end_sort
H K L M/ISYM BATCH I SIGI
m6cb3b.mtz
m6cb3c.mtz
end_sort
#
##
#==== Combine together merged native & sorted derivative data, by
# interleaving reflection records
# Must resort data after this step
##
mtzutils:
mtzutils hklin2 m6cb3_sort.mtz \
hklin1 m6c8_r \
hklout temp_m6cb3_resort << eof-m
merge
eof-m
#
sortmtz hklin temp_m6cb3_resort hklout m6cb3_resort << eof-m
H K L M/ISYM BATCH
eof-m
#
##
#==== Scale and merge derivative data, using native data as reference (run 1)
# Use secondary beam absorption correction for derivative,
# but with some restraints (tie)
# The reference data (native) is omitted from the output file
##
scala hklin m6cb3_resort.mtz hklout m6cb3_scala \
scales m6cb3.scales \
rogues m6cb3.rogues \
normplot m6cb3.norm \
anomplot m6cb3.anom \
plot m6cb3.plt \
<<EOF
run 1 batch 1 reference
run 2 batches 10 to 23156 exclude 23152 # reject one duff batch
run 3 batches 23157 to 90000
title frozen native monoclinic m6cb3
scales bfactor off rotation spacing 5 secondary 6
tie surface 0.001 # this is the default value anyway
resolution 25 2.5
reject merge 4
anomalous on
sdcorr 1.1 0.005
EOF
#
#
#
#exit
trunc:
truncate hklin m6cb3_scala \
hklout /ss3/fm1/Mutase/Derivs_FzM/m6cb3_F <<end-trunc
anomalous yes
resolution 25 2.5
nresidue 1400
labout F=FM623 SIGF=SIGFM623 DANO=DANOM623 SIGDANO=SIGDANOM623
end-trunc
- Scaling of several MAD datasets together, no reference dataset
#!/bin/csh -f
# Define a base name for files created in this script
set name = dfxe_3d
set project = dfxe
set crystal = crys1
# Input filenames for the 4 datasets at different wavelengths
set l1 = dfxe_1 # peak
set l2 = dfxe_2 # inflection
set l3 = dfxe_3 # hard remote
set l4 = dfxe_4 # 1A wavelength
set nl1 = peak
set nl2 = inflect
set nl3 = highE
set nl4 = lowE
# Angular spacing for smoothed scales
set spacing = 5
# Sort together the initial data files
sortmtz hklout ${name}_all << eof-s
H K L M/ISYM BATCH
${l1}.mtz
${l2}.mtz
${l3}.mtz
${l4}.mtz
eof-s
###=== Step 1 ==========================================================
###=== Scale all datasets together
###=== This will write out 4 output files, with filenames constructed
###=== by appending the dataset name on to the hklout name
scale_1:
set run = all
scala hklin ${name}_all hklout ${name} \
scales ${run}.scales \
normplot ${run}.norm \
anomplot ${run}.anom \
rogues ${run}.rogues \
<< eof-r1
title Scale all datasets together, smooth, secondary
# Define runs
run 1 batch 1000 to 1999
run 2 batch 2000 to 2999
run 3 batch 3000 to 3999
run 4 batch 4000 to 4999
# Define datasets: this should have been done in Mosflm previously
name run 1 project ${project} crystal ${crystal} dataset ${nl1} # peak
name run 2 project ${project} crystal ${crystal} dataset ${nl2} # inflection
name run 3 project ${project} crystal ${crystal} dataset ${nl3} # highE
name run 4 project ${project} crystal ${crystal} dataset ${nl4} # lowE
# Dispersive differences for analysis are relative to the "base" dataset
base dataset highE
# If using secondary beam correction, usually turn Bfactor off
# unless you have high resolution and radiation damage
scales rotation spacing ${spacing} bfactor off secondary 6
tie surface 0.001 # this is the default restraint to keep the
# absorption surface spherical
anomalous on
# reject on 5 sigma within the I+ or I- sets, 8 sigma between I+ & I-
reject 5 all 8
eof-r1
###=== Convert I to F, do Wilson plot, for each dataset
###=== A future change to Truncate may allow processing of multiple
###=== datasets together
l1:
set ln = ${nl1}
truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${ln}
l2:
set ln = ${nl2}
truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${ln}
l3:
set ln = ${nl3}
truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${ln}
l4:
set ln = ${nl4}
truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
nresidues 117
ranges 30
labout F=F${ln} SIGF=SIGF${ln} \
DANO=DANO${ln} SIGDANO=SIGDANO${ln} ISYM=ISYM${ln} \
F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+) F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
eof_t${ln}
###=== Sort together merged data for all wavelengths, outputting a
###=== single record for each hkl
###=== For each wavelength, store amplitude F & sigF,
###=== anomalous difference DANO (= F+ - F-) & sigDANO,
###=== and ISYM flag which shows if both F+ & F- were measured
cad hklout ${name}_fcad \
hklin1 ${name}_${nl1}_f \
hklin2 ${name}_${nl2}_f \
hklin3 ${name}_${nl3}_f \
hklin4 ${name}_${nl4}_f << eof-c
labin file_number 1 \
E1=F${nl1} E2=SIGF${nl1} E3=DANO${nl1} E4=SIGDANO${nl1} E5=ISYM${nl1} \
E6=F${nl1}(+) E7=SIGF${nl1}(+) E8=F${nl1}(-) E9=SIGF${nl1}(-)
labin file_number 2 \
E1=F${nl2} E2=SIGF${nl2} E3=DANO${nl2} E4=SIGDANO${nl2} E5=ISYM${nl2} \
E6=F${nl2}(+) E7=SIGF${nl2}(+) E8=F${nl2}(-) E9=SIGF${nl2}(-)
labin file_number 3 \
E1=F${nl3} E2=SIGF${nl3} E3=DANO${nl3} E4=SIGDANO${nl3} E5=ISYM${nl3} \
E6=F${nl3}(+) E7=SIGF${nl3}(+) E8=F${nl3}(-) E9=SIGF${nl3}(-)
labin file_number 4 \
E1=F${nl4} E2=SIGF${nl4} E3=DANO${nl4} E4=SIGDANO${nl4} E5=ISYM${nl4} \
E6=F${nl4}(+) E7=SIGF${nl4}(+) E8=F${nl4}(-) E9=SIGF${nl4}(-)
eof-c
REFERENCES
- P.R. Evans, "Scaling and assessment of data quality", Acta Cryst. D62, 72-82 (2006). Note that the definitions of Rmeas and Rpim in this paper are missing a square root on the (1/n-1) factor
- W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988)
- P.R. Evans, "Data reduction", Proceedings of the CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
- P.R. Evans, "Scaling of MAD Data", Proceedings of the CCP4 Study Weekend, 1997, on Recent Advances in Phasing
- R. Read, "Outlier rejection", Proceedings of the CCP4 Study Weekend, 1999, on Data Collection & Processing
- W.C. Hamilton, J.S. Rollett & R.A. Sparks, Acta Cryst. 18, 129-130 (1965)
- R.H. Blessing, Acta Cryst. A51, 33-38 (1995)
- Kay Diederichs & P. Andrew Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997)
- Manfred Weiss & Rolf Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J. Appl. Cryst. 30, 203-205 (1997)
- Manfred Weiss, "Global indicators of X-ray data quality", J. Appl. Cryst. 34, 130-135 (2001)
Appendix 1: Partially recorded reflections
Partially recorded reflections are usually used in scaling (controlled by the command INTENSITIES), and in the final analysis (controlled by the command FINAL). The default is to include summed partials in both scaling and the final analysis and merging.
Different options for the treatment of partials are set for both the scaling & merging stages by the PARTIALS command, or separately for the scaling stage (INTENSITIES command) and the merging stage (FINAL command). Partials may either be summed (subkeyword PARTIALS, with various options), or scaled (subkeyword SCALE_PARTIALS): in the latter case, each part is treated independently of the others. If summed partials are used in scaling with the SCALES BATCH option, the FRACTIONCALC is used to partition the effects of the different scales for the two halves. In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also have a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 31, 32, 33), allowing a check that all parts have been found: MPART = 10 for partials already summed in Mosflm.
For datasets with few partials, where the mosaicity is low compared to the image width, very few partials run over more than two images, & partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags below to accept or reject partial sets according to their reliability.
Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The options to use partials as well as fulls are defined separately for the scaling and merging steps on the INTENSITIES and FINAL commands. The parameters for the checks are set by the PARTIALS command for both stages, or separately on the INTENSITIES and FINAL commands. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.
- (a)
- At least two parts must be present (unless the CORRECT option is set, see (e) below)
- (b)
- not more than MAXWIDTH <maximum_width> parts may be present [default maximum_width = 5]
- (c)
- if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked, unless NOTEST is specified, in which case they are rejected. NOCHECK switches off this check.
- (d)
- if the TEST option is set (the default if there is no MPART column), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.2]
- (e)
- if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. As for the scaled partial option, this correction relies on accurate FRACTIONCALC values, so beware.
- (f)
- if the GAP option is set, partials with a gap are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.
By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g.
TEST 1.2 1.2 CORRECT 0.5
will scale up all partials with a total fraction between 0.5 & 1.2
TEST 0.95 1.05
will accept summed partials 0.95->1.05, no scaling
TEST 0.95 1.05 CORRECT 0.4
will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
Note that a profile-fitted intensity, if present in the file as a separate IPR column, will not be used for a scaled partial unless the PARTIALS USE_PROFILE flag is set.
Scaled partials:
In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5].
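As an illustration of the logic above, here is a minimal Python sketch of the total-fraction TEST, the CORRECT scale-up, and the scaled-partials option. The function and parameter names are invented for this sketch and do not correspond to the program's internal code:
def sum_partials(parts, lower=0.95, upper=1.2, correct_min=None):
    # parts: list of (intensity, fractioncalc) pairs for one reflection,
    # already on a common scale.  Returns the total intensity, or None
    # if the summed reflection is rejected.
    total_i = sum(i for i, f in parts)
    total_f = sum(f for i, f in parts)
    if lower <= total_f <= upper:
        return total_i                    # TEST passed: accept the sum as is
    if correct_min is not None and correct_min <= total_f < lower:
        return total_i / total_f          # CORRECT: scale by inverse fraction
    return None                           # rejected

def scale_partial(intensity, fraction, minimum_fraction=0.5):
    # SCALE_PARTIALS option: each part treated independently and scaled
    # up by the inverse FRACTIONCALC (relies on an accurate fraction)
    if fraction > minimum_fraction:
        return intensity / fraction
    return None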
Appendix 2: Scaling algorithm
For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing
Sum( whl * ( Ihl - ghl * Ih )**2 )    (Ref Hamilton, Rollett & Sparks)
where Ih is the current best estimate of the "true" intensity
Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2 )
Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
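One simple way to carry out such a minimization is to alternate the two equations above: with the current scales fixed, compute the best Ih for every reflection, then update the scales by least squares, and repeat. A minimal Python sketch, under the simplifying assumption of one constant inverse scale per run (the function name and this trivial parameterization are illustrative only; the program refines smoothed scale parameters):
import numpy as np

def refine_scales(I, sig, hkl_id, run_id, nref, nrun, ncycles=20):
    # I, sig: intensities Ihl and sd's shl for all observations
    # hkl_id: integer index of the unique reflection h per observation
    # run_id: integer index of the run (one scale per run in this sketch)
    w = 1.0 / sig**2                     # weights whl
    g = np.ones(nrun)                    # inverse scale factors
    for _ in range(ncycles):
        ghl = g[run_id]
        # Ih = Sum(whl*ghl*Ihl) / Sum(whl*ghl**2), per reflection
        num = np.bincount(hkl_id, w * ghl * I, minlength=nref)
        den = np.bincount(hkl_id, w * ghl**2, minlength=nref)
        Ih = num / np.maximum(den, 1e-30)
        # least-squares update of each run's scale, minimizing
        # Sum(whl*(Ihl - ghl*Ih)**2) over that run's observations
        Ih_obs = Ih[hkl_id]
        gnum = np.bincount(run_id, w * Ih_obs * I, minlength=nrun)
        gden = np.bincount(run_id, w * Ih_obs**2, minlength=nrun)
        g = gnum / np.maximum(gden, 1e-30)
        g /= g[0]                        # fix the overall scale
    return g, Ih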
The inverse scale factor ghl is derived as follows:
ghl = Thl * Chl * Shl
where Thl is an optional relative B-factor contribution, Chl is a scale factor (1-dimensional, or 3-dimensional with the DETECTOR option), and Shl is an anisotropic correction expressed as spherical harmonics (the SECONDARY, ABSORPTION or SURFACE options).
a) B-factor (optional)
For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, ... tn). Then for an observation measured at time tl
B = Sum[i=1,n] ( p(delt) Bi ) / Sum ( p(delt) )
where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp ( - delt**2 / Vt )
Vt is the "variance" of the weight, & controls the smoothness of interpolation
Thl = exp ( + 2 s B )
s = (sin theta / lambda)**2
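As a toy illustration of this weighting scheme (not the program's code; the names are invented), the smoothed B-factor and the resulting Thl might be computed as:
import numpy as np

def smoothed_bfactor(tl, ti, Bi, Vt=0.5):
    # Gaussian-weighted interpolation of the run's B-factors Bi at nodes ti:
    # B = Sum(p(delt)*Bi) / Sum(p(delt)), p(delt) = exp(-delt**2/Vt)
    p = np.exp(-(tl - np.asarray(ti))**2 / Vt)
    return np.sum(p * np.asarray(Bi)) / np.sum(p)

def Thl(tl, ti, Bi, s_theta_lambda, Vt=0.5):
    # Thl = exp(+2*s*B), with s = (sin theta / lambda)**2
    s = s_theta_lambda**2
    return np.exp(2.0 * s * smoothed_bfactor(tl, ti, Bi, Vt))
A larger "variance" Vt spreads the weight over more nodes, giving a smoother (stiffer) B-factor variation, as described under the SMOOTHING keyword.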
An alternative anisotropic B-factor may be used to correct for anisotropic fall-off of scattering: THIS OPTION IS NOT RECOMMENDED. It is parameterized on the components of the scattering vector (divided by 2 for compatibility with the normal definition of B) in two directions perpendicular to the X-ray beam (y & z in the "Cambridge" coordinate frame, with x along the beam).
Thl = exp ( + 2 [uy**2 Byy + 2 uy uz Byz + uz**2 Bzz] )
where uy, uz are the components of d*/2
Byy, Byz, Bzz are functions of time ti or batch, as for the isotropic B-factor. The principal components of B (Bfac_min, Bfac_max) are also printed.
b) Scale factors
For each run, scale factors Cxyz are determined at positions (x,y) on the detector, at intervals on rotation angle z. Then for an observation at position (x0, y0, z0),
Chl(x0, y0, z0) = Sum(z)[ p(delz) * { Sum(xy)[q(delxy)*Cxyz] / Sum(xy)[q(delxy)] } ] / Sum(z)[p(delz)]
where delz = z - z0
p(delz) = exp( -delz**2 / Vz )
q(delxy) = exp( -((x-x0)**2 + (y-y0)**2) / Vxy )
Vz, Vxy are the "variances" of the weights & control the smoothness of interpolation
For the SCALES BATCH option, the scale along z is discontinuous:
the normal option has one scale factor (or set of scale factors across
the detector) for each batch. The SLOPE (not recommended) option has
two scale factors per batch, with the scale interpolated linearly
between the beginning and end according to the rotation angle of the
reflection.
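A minimal numpy sketch of this two-level Gaussian interpolation (detector positions first, then rotation), assuming scale nodes C[iz, ixy] on a grid of rotation nodes zk and detector node positions (xk, yk); all names are invented for illustration:
import numpy as np

def Chl(x0, y0, z0, xk, yk, zk, C, Vxy=1.0, Vz=1.0):
    # q(delxy): weights over the detector nodes for this observation
    q = np.exp(-((np.asarray(xk) - x0)**2 + (np.asarray(yk) - y0)**2) / Vxy)
    # detector-smoothed scale at each rotation node zk
    Cz = (np.asarray(C) * q).sum(axis=1) / q.sum()
    # p(delz): weights along the rotation axis
    p = np.exp(-(np.asarray(zk) - z0)**2 / Vz)
    return np.sum(p * Cz) / np.sum(p)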
c) Anisotropy factor
The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
(1) the secondary beam (SECONDARY correction) in the camera spindle frame,
(2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis, or
(3) the scattering vector in the crystal frame (SURFACE option).
- SECONDARY beam direction (camera frame)
s = [Phi] [UB] h
s2 = s - s0
s2' = [-Phi] s2
Polar coordinates:
s2' = (x y z)
PolarTheta = arctan(sqrt(x**2 + y**2)/z)
PolarPhi = arctan(y/x)
where [Phi] is the spindle rotation matrix
[-Phi] is its inverse
[UB] is the setting matrix
h = (h k l)
- ABSORPTION: Secondary beam direction (permuted crystal frame)
s = [Phi] [UB] h
s2 = s - s0
s2c' = [-Q] [-U] [-Phi] s2
Polar coordinates:
s2c' = (x y z)
PolarTheta = arctan(sqrt(x**2 + y**2)/z)
PolarPhi = arctan(y/x)
where [Phi] is the spindle rotation matrix
[-Phi] is its inverse
[Q] is a permutation matrix to put
h, k, or l along z (see POLE option)
[U] is the orientation matrix
[B] is the orthogonalization matrix
h = (h k l)
- Scattering vector in crystal frame
(x y z) = [Q][B] h
Polar coordinates:
PolarTheta = arctan(sqrt(x**2 + y**2)/z)
PolarPhi = arctan(y/x)
where [Q] is a permutation matrix to put
h, k, or l along z (see POLE option)
[B] is the orthogonalization matrix
h = (h k l)
then
Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm Ylm(PolarTheta,PolarPhi)
where Ylm is the spherical harmonic function for
the direction given by the polar angles
Clm are the coefficients determined by
the program
Notes:
- The initial term "1" is essentially the l = 0 term, but with a fixed coefficient.
- The number of terms = (lmax + 1)**2 - 1
- Even terms (i.e. l even) are centrosymmetric, odd terms antisymmetric
- Restraining all terms to zero (with TIE SURFACE) reduces the anisotropic correction. This should always be done.
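For illustration, the SECONDARY polar coordinates defined above can be evaluated directly; a small numpy sketch (the function name is invented; Phi, UB and s0 are the 3x3 matrices and incident-beam vector from the equations):
import numpy as np

def secondary_polar(h, UB, Phi, s0):
    # s = [Phi][UB]h;  s2 = s - s0;  s2' = [-Phi]s2
    s = Phi @ UB @ np.asarray(h, dtype=float)
    s2 = s - np.asarray(s0, dtype=float)
    x, y, z = Phi.T @ s2      # [-Phi] = inverse (transpose) of the rotation
    polar_theta = np.arctan2(np.hypot(x, y), z)   # PolarTheta
    polar_phi = np.arctan2(y, x)                  # PolarPhi
    return polar_theta, polar_phi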
Appendix 3: TAILS correction
For many crystals, the reflection profile on rotation ("phi") is not a simple closed curve, but has long tails due at least in part to thermal diffuse scattering (TDS): the amount of this depends on the crystal, and is larger at high resolution than at low resolution. If all reflections were scanned through the same angle, then equal amounts of this diffuse scattering would be included in each reflection. However, in typical "coarse sliced" data collection schemes, where the image rotation width is larger than the reflection width, reflections are recorded on a variable number of images (1, 2, 3 etc), and different amounts of the tails are included in the integrated intensity. This generally leads to a negative "partial bias", increasing with resolution, i.e. the apparent intensities of partially recorded reflections are higher than equivalent fulls.
The TAILS correction is an attempt to correct for the different truncation of the tails, using a simple (crude) model of thermal diffuse scattering, although the correction only attempts to correct for the different truncation, and does not attempt to correct for the diffuse scattering itself. Some of the ideas used are based on suggestions by R.H. Blessing, Cryst. Reviews, 1, 3-58 (1987), but he should not be blamed for this.
This is a brief account of the method (see code & comments in subroutine dffscn for more details):-
- I = J ( 1 + alpha )
where J is the Bragg intensity (true intensity) & I is the measured intensity, i.e. the TDS intensity is proportional to the Bragg intensity
- alpha = alpha0 + alpha1 * (sin theta / lambda)**2
where alpha0 & alpha1 are refinable parameters. This is a simple linear isotropic model of the amount of TDS. alpha0 should be 0.0, and may be fixed as such, but allowing it to vary seems to help sometimes. Both alpha0 & alpha1 are reset if they go negative in the refinement. An extension of the model would be to make alpha anisotropic.
- each reflection is scanned over an angle DPhi, which is an integral multiple of the image width (DPhi = Nimages * DelPhi). A rotation by DPhi moves the reflection a distance in reciprocal space
Dq = DPhi * xsi
where xsi is the radius from the rotation axis.
If the half width of the reflection (including tails) is v (another refineable parameter), and 2v > Dq, then part of the tails will be truncated. Taking a simple model of the shape of the tails as a triangle of base width 2v and height in the middle h (h = J * alpha / v), the area in the tails (= tail intensity) and the intensity truncated by the restricted scan range can be calculated. Then the corrected ("true") intensity J can be calculated
For a full scan:
J = I / (1 + alpha)
For a truncated scan (missing parts of tails C1 & C2):
J = I / (1 + alpha*(1 - C1 - C2))
- because this model is very crude, it seems insufficiently trustworthy to use as a proper correction for TDS. It does however seem reasonable to correct for the different amounts of tail truncation, C1 & C2 (>= 0.0). The correction applied is thus
I' = I * (1 + alpha) / (1 + alpha*(1 - C1 - C2))
- the parameters refined are v, alpha0 (A0) and alpha1 (A1). By default, the same parameters are used for all runs (see LINK, UNLINK). Refinement of the parameters often seems to be unstable. If they are being reset from negative values, try setting A0 = 0.0 (e.g. SCALES . . TAILS 0.005 0.0 30.0) and fixing A0 (FIX A0, this is the default).
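Once the truncated tail fractions C1 & C2 are known, the correction itself is a one-liner; a Python sketch (the derivation of C1 & C2 from the triangular tail model of half width v is omitted here):
def tails_corrected(I, s_theta_lambda, alpha0, alpha1, C1, C2):
    # alpha = alpha0 + alpha1*(sin theta/lambda)**2
    alpha = alpha0 + alpha1 * s_theta_lambda**2
    # I' = I*(1 + alpha)/(1 + alpha*(1 - C1 - C2)), with C1, C2 >= 0
    return I * (1.0 + alpha) / (1.0 + alpha * (1.0 - C1 - C2))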
Appendix 4: Data from Denzo
DENZO is often run refining the cell and orientation angles for each image independently, with postrefinement then done in Scalepack. It is essential that you do this postrefinement. Then either reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Scala. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see COMBAT documentation) or POINTLESS (for Scalepack output only).
Both of these options have some problems:
- If you take the output from Denzo into Scala, there may be problems with partially recorded reflections: it is difficult for Scala to determine reliably that it has all parts of a partial to sum together.
- If you take unmerged output from Scalepack into Scala, most of the geometrical information about how the observations were collected is lost, so many of the scaling options in Scala are not available. Only Batch scaling can be used, but simultaneous scaling of several wavelengths or derivatives may still be useful.
Appendix 5: Outlier algorithm
The test for outliers is as follows:
- (1)
- if there are 2 observations (left), then
- (a)
- for each observation Ihl, test the deviation
Delta(hl) = (Ihl - ghl Iother) / sqrt[sigIhl**2 + (ghl*sdIother)**2]
against sdrej2, where Iother = the other observation
- (b)
- if either |Delta(hl)| > sdrej2, then
- in scaling, reject the reflection. Or:
- in merging,
- keep both (default, or if the KEEP subkey is given) or
- reject both (subkey REJECT) or
- reject the larger (subkey LARGER) or
- reject the smaller (subkey SMALLER).
- (2)
- if there are 3 or more observations left, then
- (a)
- for each observation Ihl,
- calculate the weighted mean of all other observations <I>n-1 & its sd(<I>n-1)
- deviation
Delta(hl) = (Ihl - ghl <I>n-1) / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]
- find the largest deviation max|Delta(hl)|
- count the number of observations for which Delta(hl) >= 0 (ngt), & for which Delta(hl) < 0 (nlt)
- (b)
- if max|Delta(hl)| > sdrej, then reject one observation, but which one?
- if ngt == 1 or nlt == 1, then one observation is a long way from the others, and this one is rejected
- else reject the one with the worst deviation max|Delta(hl)|
- (3)
- iterate from the beginning
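A Python sketch of the test for 3 or more observations (the two-observation case with its KEEP/REJECT/LARGER/SMALLER choices is omitted; names and structure are illustrative, not the program's code):
import numpy as np

def reject_outliers(I, sig, g, sdrej=6.0):
    # I, sig, g: observations Ihl, sd's shl and inverse scales ghl for ONE
    # reflection.  Returns a boolean mask of the surviving observations.
    keep = np.ones(len(I), dtype=bool)
    while keep.sum() >= 3:
        idx = np.flatnonzero(keep)
        deltas = np.empty(len(idx))
        for k, i in enumerate(idx):
            others = idx[idx != i]
            J = I[others] / g[others]            # others on common scale
            w = (g[others] / sig[others])**2     # weights 1/sd(J)**2
            mean = np.sum(w * J) / np.sum(w)     # <I>n-1
            sd_mean = 1.0 / np.sqrt(np.sum(w))   # sd(<I>n-1)
            deltas[k] = (I[i] - g[i] * mean) / np.hypot(sig[i], g[i] * sd_mean)
        worst = int(np.argmax(np.abs(deltas)))
        if abs(deltas[worst]) <= sdrej:
            break                                # no outlier left
        ngt = np.sum(deltas >= 0)
        nlt = np.sum(deltas < 0)
        if ngt == 1:                             # one obs far above the rest
            keep[idx[int(np.argmax(deltas))]] = False
        elif nlt == 1:                           # one obs far below the rest
            keep[idx[int(np.argmin(deltas))]] = False
        else:
            keep[idx[worst]] = False             # reject the worst deviation
    return keep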
RELEASE NOTES
Version 3.3.21
- rename CC_Imean to CC(1/2), add to summary results
Version 3.3.18
- Corrected geometry calculations for phi scans in a 3-axis system (eg from SAINT)
- Allow rotation ("phi") to go backwards: in this case "time" (for relative B-factor) defaults to -ROT= -phi
Version 3.3.17
- Removed spurious error warning of "batch not overlapping"
- Default total maximum width 10°
Version 3.3.15, 16
- Minor changes to rogueplot, and gap warnings, corrected Reference
Version 3.3.14
- More robust to missing cell information
- Slight correction to ice-ring placing in ROGUEPLOT
Version 3.3.8, 9, 10
- Removed LP factor on SdB term of SD correction
- Omit DelAnom > [5] * RMSanom from the correlation analysis between half datasets, as was done before for the "RMS correlation ratio". This makes the correlation coefficient more robust to outliers
- Fixed some infelicities associated with observations flagged as bad (from Mosflm)
Version 3.3.4, 5
- More tweaking of SD refinement
- RESOLUTION DATASET option
- Changed smooth scaling (& B-factors) to always use 3-point smoothing
Version 3.3.1, 2, 3
- Logfile reorganised to work with baubles
- OUTPUT UNMERGED column names for the applied scales are now SCALEUSED, SIGSCALEUSED
- Can now simultaneously write OUTPUT AVERAGE UNMERGED
Version 3.3.0
- Refinement of SD correction parameters. Note that the SdB term is now multiplied by the LP factor, so the values of SdB cannot be compared with those from earlier versions.
Version 3.2.33
- Fixed long-standing bug in ACCEPT option (this never worked properly)
- Optional analysis of fractional bias v. fraction (PRINT FULL), WIDTH NBINS
Version 3.2.31
Version 3.2.28
- Implemented OUTPUT AVERAGE TOGETHER option
Version 3.2.22
- INTENSITIES command now preserves PARTIALS CORRECT setting
correctly (inadvertent interaction between these two keywords). This
led scripts from ccp4i not to accept PARTIALS CORRECT settings.
Version 3.2.21
- Removed incorrect update of FLAG in OUTPUT SEPARATE option
- With multiple datasets, the SD analysis to determine SDFAC only looks at deviations within datasets
Version 3.2.20
Version 3.2.18
- Check for unique dataset name. Forces cell to fit lattice
constraints (angles 90 or 120, a=b when necessary). Improved gap
finding algorithm.
Version 3.2.17
- Bugfix for reference run with multiple batches, and related "scales constant" problems
- Allow for generalised goniostat information in orientation block
Version 3.2.16
- Fixed buglet in setting dataset wavelength & cell if splitting one dataset into more than one
Version 3.2.15
- Fixed bug in anomalous multiplicity, wrong for all but the 1st dataset (totals not cleared)
- Fixed bug in ANOMALOUS MATCH (since CCP4 5.0)
Version 3.2.13
- Scatter plot and RMS Correlation Ratio analysis
- OUTPUT POLISH scaled to fit into the format
Version 3.2.10
- Fixed bug in handling reference data (failure saying not in any dataset)
- Activated INTENSITIES COMBINE option
Version 3.2.8-9
- Added Rpim statistic
- Better diagnostic of negative U matrix
Version 3.2.0-3
- Changed internal workings of SD analysis
- Inflate default I+/I- outlier test (REJECT ALL) if there is a strong anomalous signal (not a satisfactory solution to the outlier problem)
- Fixed array size bug nstrej in refout (rejection flags)
- Added processing of FLAG & BGPKRATIOS columns from Mosflm, also ACCEPT command & options to accept flagged observations
- Added correlation analysis between random subsets of each dataset (i.e. split into equal halves)
- Added summary at end of logfile, added anomalous multiplicity, reordered & tidied up logfile
- Fixed a few bugs in UNMERGED output: for multiple datasets, this now defaults to SPLIT mode, i.e. it writes multiple output files, also for scalepack-type output
- Support for crystal level (XNAME) from MTZ file
- Debugged ANOMALOUS MATCH
- Suppressed spurious phi-range warnings with SCALES SMOOTH option
- Force SCALES CONSTANT if ONLYMERGE and not RESTORE
Version 3.1.20
- Fixed bug for files lacking XDET, YDET
Version 3.1.19
- Fixed bug for BATCH BFACTOR mode in working out "best" batch
Version 3.1.18
- Corrected totals in harvest file
Version 3.1.15
- Fixed long-standing bug in resolution limits for analysis, when the resolution is cut back from the maximum in the MTZ file
Version 3.1.12
- Fixed bug in BATCH mode when runs are automatically allocated
Version 3.1.6-11
- Default maximum width of partials to 5 degrees
- Minor syntax changes to keep compilers happy
- Extend partial bias analysis to the case when there are no fulls, correct small bug in previous analysis
- Correct (again) case of no datasets defined in file
Version 3.1.5
- In Project & dataset names, only accept alphanumeric and "-._" characters, change others to "_"
- fixed bug in dtsstore routine (failed with 5 or more datasets)
- defaults BASE dataset = 1 if no wavelength information in file
Version 3.0.N, 3.1.2-4
- Many changes to handle multiple datasets properly
- REJECT BYRUN option removed, replaced by COMBINE to do the opposite
- Analysis of correlation and differences between datasets, for MAD
- TIE BFACTOR
- NORMALISE to allow normalisation of Bfactor on best bit
- Wrap-around of Phi at 360 degrees
- Fixed bug in anomalous normal probability plot
- Added SCALES SMOOTH option
- Automatic assignment of runs
- TIE A1 option, and auto-fixing of tails parameters
- OUTPUT BEAMS option
- EXCLUDE BATCH, DATASET, CRYSTAL options
- SCALES ABSORPTION option
- default Bfactor smoothing made less smooth (Vwt = 0.5 instead of 1.0). This seems to improve behaviour (reduces oscillations)
- Absorption options now use datum setting in orientation block if set (relevant for 3-circle goniostats for the ABSORPTION option)
Version 2.7.6
- Corrected B-factor in analysis table against batch when it is a function of TIME.
Version 2.7.5
Version 2.7.4
- Added OUTPUT POLISH option
Version 2.7.3
- Bug fix: "output separate" now works again
Version 2.7.2
- Correct calculation of completeness, by counting reflections,
instead of approximate calculation by volume.
Version 2.7.1
- Harvest stuff added by Martyn Winn & Kim Henrick
- Corrected bug in counting rejections by batch
Version 2.6.4
- Set SDADD parameter = 0.02 by default
- New algorithm to determine initial scales from mean intensities: this should work much better when different runs or batches have very different resolution ranges.
Version 2.6.3
- Default to include summed partials in scaling
- Classify partials for analysis in the batch to which they
contribute most, rather than to the first batch they occur in.
Version 2.6.2
- Allow extrapolation of RESTORE file to new batches
- Buffer input echo so that there is a plain text & html form
Version 2.6.1
Version 2.6.0
- Spherical harmonic expansion of scale factors: SCALES SECONDARY & SCALES SURFACE options
Version 2.5.5
- Added EXCLUDE EMAX|EPROB option, to reject zingers & ice spots (Read's method)
- Added unit weight option CYCLES WEIGHT UNIT
- Added calls to libhtml to write html stuff into log file
Version 2.5.4
- New optional outlier check comparing I+ with I- observations in
ANOMALOUS ALL case (REJECT ... ALL)
- Removed all (or most) html-reserved characters from logfile
Version 2.5.3
- Checks on PhiRange if present
Version 2.5.2
- Proper default for PARTIAL TEST (s/r chkkpf)
- Count failures with inconsistent MPART flags
Version 2.5.1
- Fixed uninitialized variables in s/r setscl, particularly affecting e.g. "scales batch detector 3 3"
- Updated version of ea06cd
Version 2.5.0
- Includes all Kim Henrick's harvesting calls, and calls to new MTZ library things for project & dataset names, but currently commented out or inactivated
Version 2.4.3
- Fixed a couple of uninitialized variable bugs (in s/r anlini,
nrmprb)
Version 2.4.2
- New REJECT options for two observations (KEEP, REJECT, LARGER,
SMALLER)
Version 2.4.1
- Allow for MPART > 200, for Mosflm 5.51
- Corrected partial check, to allow for errors in MPART
Version 2.3.2
- Out of Phi range is a warning, not fatal
- Check for M>0 (flag set in Postref) for partials: previously didn't work with data from Postref
- Correct labels for UNMERGED output option
- DAMP keyword added
- Bug fix to avoid normal probability analysis problem if no fulls
Version 2.3.1
- Output labels for SEPARATE option changed to conform with CCP4 3.3 convention, i.e. I(+) and I(-) etc
Version 2.3.0
- added "anomalous match" options for selecting matched I+ & I-
- EXCLUDE does not check reference batch
Version 2.2.3
- fixed bug in summed partials in case of "scales batch": this combination is still dubious, but awaits proper analysis
- added PARTIALS keyword
- fixed bug in calculation of Rfull: this was completely wrong if anomalous data was present
- added INTENSITIES ANOMALOUS option to keep I+ & I- separate in scaling (not normally recommended)
- allow incomplete orientation data in certain cases
Version 2.2.2, November 1996
- defaults on partial summation improved (and again 18/12/96)
- analysis on fulls only even when partials are used
- bug fix in random number routine (thanks to Adam)
- ONLYMERGE option
- If scaling across detector (e.g. "scales detector 3 3"), checks on valid Xdet, Ydet (within limits in file header)
- Rogues file lists Xdet, Ydet, Phi
- default in scaling is "exclude sdmin 6" (omitting weak observations speeds scaling)
- default FIX A0
- reject outliers on every cycle if scales "restored" (else previous scaling gets messed up)
- analysis by position on detector
- fixed bug affecting "reject byrun" & deviations with anomalous on
Version 2.2.1, November 1996
Many changes from version 1.x.x
- this version by default merges multiple measurements and thus replaces Agrovata. See the keyword OUTPUT for further description of the output options:-
- AVERAGE [default] merged I (as from Agrovata)
- SEPARATE separate scaled measurements (as from older Scala versions), for reinput into Scala, or input into Agrovata [not recommended]
- POSTREF scaled file for input to POSTREF
- UNMERGED scaled, partials summed (or scaled), but not merged
- by default, the SDCORRECTION parameter SdFac (multiplier) will be automatically adjusted, from the normal probability analysis of deviations. This is done in a separate pass through the data before the final merging pass. The command SDCORRECTION NOADJUST disables this adjustment.
- The scaling option TAILS has been introduced. This makes some attempt to correct for the different truncation of the tails of diffuse scattering between fulls & partials. This option comes with a health warning: it should be treated with caution. Try with & without. (see commands SCALES . . TAILS, FIX, [UN]LINK)
- the way of putting data (e.g. native) back into the scaling as a reference set has changed. See example.
- treatment of summed partials has been elaborated (see FINAL & INTENSITIES keywords above). In 2.2.1, the defaults are not set optimally (whatever that means!): this is improved in 2.2.2. Recommended usage:
FINAL PARTIALS CHECK TEST 0.95 1.05 # for Mosflm
FINAL PARTIALS TEST 0.95 1.05 # for Denzo (but FractionCalc is rather unreliable)
- Scales are dumped to the file SCALES by default (see DUMP & RESTORE)
- Normal probability analyses are done, with plots output to files NORMPLOT and ANOMPLOT in a format suitable for xmgr (from your favourite ftp server)
- by default scaling now excludes weak data (EXCLUDE SDMIN 3.0)
AUTHOR
Phil Evans, MRC Laboratory of Molecular Biology, Cambridge
(pre@mrc-lmb.cam.ac.uk)
See above for Release Notes.
SEE ALSO
truncate, postref,
Data Harvesting