FREERFLAG (CCP4: Supported Program)

NAME

freerflag - tags each reflection in an MTZ file with a flag for cross-validation

SYNOPSIS

freerflag HKLIN foo.mtz HKLOUT foo_out.mtz
[Keyworded input]

DESCRIPTION

This program is used to tag each reflection with a flag. Using the keyword UNIQUE is strongly recommended: the resulting reflection file will then contain all possible (h k l) for the space group of the structure within the defined resolution limits. This file is then used for refinement, and the tagged reflections are used to calculate `Free R factors' (reference [1]).

This master list of FreeR assignments can then be transferred to any new data sets, or to isomorphous data sets such as substrate complexes. This is important if you plan to start refinement against new data using the previously refined model (as we all do!), or if you are combining different methods of refinement. In these cases it is essential to tag the SAME reflections.

This can be done by generating an mtz file with FreeR flags (the uniqueify script is recommended), then using the program MTZ2VARIOUS to convert it to any other (non-CCP4) format with the appropriate flag. These formats use different conventions to indicate the free and working sets:

Program   Convention for free and working set flags
CCP4      assigns the flag FreeR_flag to be 0 for the free set and 1,...,n-1 for the working set.
XPLOR     assigns the flag TEST to be 1 for the free set and 0 for the working set.
CNS       assigns the flag TEST to be 1 for the free set and 0,2,...,n-1 for the working set.
SHELX     assigns a flag with -1 for the free set and 1 for the working set.
TNT       assigns a flag with 0 to indicate the free set.
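
As an illustration of the table above, the following Python sketch tests whether a flag value marks a free-set reflection under each convention. The names `FREE_VALUE` and `is_free` are hypothetical and are not part of any CCP4 program; only the free-set values listed in the table are used, and no attempt is made to convert working-set values between conventions.

```python
# Free-set flag value for each program, copied from the table above.
FREE_VALUE = {
    "CCP4": 0,
    "XPLOR": 1,
    "CNS": 1,
    "SHELX": -1,
    "TNT": 0,
}


def is_free(flag, program):
    """Return True if `flag` marks a free-set reflection for `program`."""
    return flag == FREE_VALUE[program.upper()]
```

For example, `is_free(0, "CCP4")` is true, whereas `is_free(0, "CNS")` is false because CNS uses 1 for the free set.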

Conversion from other (non-CCP4) formats requires the use of F2MTZ to convert the original file to an mtz file, which can then be extended to fit the CCP4 convention. See the examples for XPLOR, CNS, SHELX or TNT input. The program FREERFLAG recognises the different conventions and automatically transforms the flags into the CCP4 convention (see table above).

Input:

HKLIN
This must contain: H K L plus some amplitude information.

Output:

HKLOUT
This will contain the same items as HKLIN plus the FreeR_flag appended to each reflection. The column is given the LABEL `FreeR_flag', and the CTYPE `I'.

By default, the FreeR_flag for each reflection is 0, 1, 2 etc., so that each value occurs (on average) in a fraction of the data specified by the FREERFRAC keyword. Under the CCP4 convention, the free set is assigned a FreeR_flag = 0, and the working set is assigned a flag between 1 and (n-1) where n = 1/fraction.

The FreeR_flag is randomly and uniformly distributed reflection by reflection. In addition, unless the keyword NOSYM is set, all reflections that are equivalent under the point-group symmetry of the twin lattice (assuming the data are twinned) receive the same flag. This covers both merohedral and pseudomerohedral twinning. In the latter case, the obliquity parameter can be set using the keyword TWIN.
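
The idea that equivalent reflections share one flag can be sketched in Python as follows. This is only an illustration under stated assumptions, not the algorithm used by FREERFLAG: the operator list is a toy one (identity and inversion only), whereas the program works with the point group of the twin lattice, and the names `OPERATORS`, `representative` and `assign_flags` are hypothetical.

```python
import random

# Toy operator set (an assumption): identity and inversion only.
# The point group of a real twin lattice would supply more operators.
OPERATORS = [
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    ((-1, 0, 0), (0, -1, 0), (0, 0, -1)),
]


def apply_op(op, hkl):
    """Apply a 3x3 integer matrix to an (h, k, l) index triple."""
    return tuple(sum(row[i] * hkl[i] for i in range(3)) for row in op)


def representative(hkl, operators=OPERATORS):
    """Pick one canonical member from the set of equivalent reflections."""
    return max(apply_op(op, hkl) for op in operators)


def assign_flags(reflections, n_values=20, seed=0, operators=OPERATORS):
    """Draw one random flag per equivalence class and share it."""
    rng = random.Random(seed)
    flag_of_class = {}
    flags = {}
    for hkl in reflections:
        key = representative(hkl, operators)
        if key not in flag_of_class:
            flag_of_class[key] = rng.randrange(n_values)
        flags[hkl] = flag_of_class[key]
    return flags
```

With this operator set, (1, 2, 3) and (-1, -2, -3) always receive the same flag, while unrelated reflections are flagged independently.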

(Note that it is no longer possible to generate flags under the old system, where the FREE percentage has the flag 0 and the rest of the data is flagged 1. The OLDFREE keyword, which used to allow this, is now obsolete.)

Because the FreeR_flag takes several values, it is possible to select different blocks of reflections for exclusion, using a preset `exclusion flag'. The selected value should be held constant throughout a complete refinement run. For density modification and other procedures which need full `cross validation' (reference [2]) it may be useful to be able to vary the FREE set. WARNING - do NOT change the selected set casually!

If during any calculation (e.g. refinement, map calculation or agreement analysis) the program label assignment `FREE=FreeR_flag' is made, reflections which are flagged with the chosen value (default 0) are excluded from the calculation. For instance, during refinement this means that the agreement between their FP and the Fc is independent of the refinement procedure. The Free R factor calculated for these reflections is a useful indicator of the quality of the refinement, especially when there is a shortage of observations and the structure is underdetermined.
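
As a rough illustration of how the excluded reflections are used, the Python sketch below splits a reflection list by flag and computes a conventional R factor, sum(|FP - Fc|) / sum(FP), for each set. It assumes that FP and Fc are amplitudes already on a common scale and that both sets are non-empty; the function names are hypothetical and the refinement programs themselves apply scaling and weighting that are not shown here.

```python
def r_factor(pairs):
    """Conventional R factor over (FP, Fc) amplitude pairs."""
    numerator = sum(abs(fp - fc) for fp, fc in pairs)
    denominator = sum(fp for fp, _ in pairs)
    return numerator / denominator


def r_work_and_free(fp, fc, flags, free_value=0):
    """Return (R_work, R_free); reflections flagged free_value are the free set."""
    free = [(o, c) for o, c, f in zip(fp, fc, flags) if f == free_value]
    work = [(o, c) for o, c, f in zip(fp, fc, flags) if f != free_value]
    return r_factor(work), r_factor(free)
```

The default `free_value=0` mirrors the default exclusion value described above; passing a different value selects a different block of reflections as the free set.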

Treatment of systematic absences

Systematically absent reflections (if present) are treated like other reflections, and are also assigned a FreeR flag. This differs from the behaviour of previous versions of the program, where systematic absences were flagged as "missing" in the FreeR_flag column.

KEYWORDED INPUT

All the possible keywords are optional but if you wish to retain the existing freeR flags then COMPLETE must be given.

Keywords are:

FREERFRAC, SEED, COMPLETE, NOSYM, TWIN, UNIQUE, END

The OLDFREE keyword is now obsolete and has no function.

FREERFRAC <fraction>

A <fraction> of all reflections in the file is flagged with a given value (`indicator') in the FreeR_flag column. The indicators will range from 0 to int(1.0/<fraction>)-1. <fraction> defaults to 0.05 and therefore the indicators will range from 0 to 19.
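
The indicator range given above can be checked with a short Python sketch. The functions `indicator_range` and `random_flags` are hypothetical helpers written for this illustration; the random assignment uses Python's own generator and is not the one used by FREERFLAG.

```python
import random


def indicator_range(fraction=0.05):
    """Indicators run from 0 to int(1.0/fraction) - 1, as stated above."""
    return 0, int(1.0 / fraction) - 1


def random_flags(n_reflections, fraction=0.05, seed=0):
    """Assign each reflection a uniformly random indicator in the range."""
    lowest, highest = indicator_range(fraction)
    rng = random.Random(seed)
    return [rng.randint(lowest, highest) for _ in range(n_reflections)]
```

With the default fraction of 0.05 the range is 0 to 19, so on average one reflection in twenty carries the value 0 and belongs to the free set.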

SEED

By default, for a given job on a given machine, the random number generator produces the same list of "random" free-R flags each time the job is run. Since you would generally only produce one list of free-R flags for each project, this is not usually a problem. However, if you specify the keyword SEED, then the random number generator is seeded with the current time, and will produce a different list of free-R flags each time the job is run.
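
The difference between the default behaviour and SEED can be mimicked in Python as below. This is a sketch of the general idea only: the default seed value of 0, the use of `time.time_ns()` and the function names are assumptions made for the illustration, not details of the FREERFLAG implementation.

```python
import random
import time


def make_rng(seed_keyword=False, default_seed=0):
    """Fixed seed by default; clock-based seed when SEED is requested."""
    if seed_keyword:
        return random.Random(time.time_ns())
    return random.Random(default_seed)


def flag_list(n_reflections, n_values=20, seed_keyword=False):
    """Generate one list of flags with the chosen seeding behaviour."""
    rng = make_rng(seed_keyword)
    return [rng.randrange(n_values) for _ in range(n_reflections)]
```

Two calls to `flag_list(100)` return identical lists, whereas calls with `seed_keyword=True` will in general differ from run to run.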

COMPLETE FREE=<column>

This option will complete an existing list of FREE flags when extending the indices. If a FREE value is present in the file in <column>, it is carried through to the output; if the FREE <column> has no value for a given reflection, a value is assigned using the standard random number generation.

The keywords FREERFRAC and SEED are ignored when COMPLETE is specified. The fraction of data per bin is taken from the highest value of the freeR flag. If the file has an old style freeR (i.e. 0 or 1) then the output MTZ has the same format. The fraction of data flagged as free would then be calculated from the existing reflections. This fraction may not be exactly the same as the one you used originally because of statistical variations. See the example.
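
The completion step can be pictured with the following Python sketch, in which `None` stands for a reflection with no existing flag. It assumes new-style flags (0 to n-1) with at least one flag already present, and takes the number of bins from the highest existing flag value as described above. The function name is hypothetical, and the old-style 0/1 case, where the fraction is estimated from the existing reflections, is not covered.

```python
import random


def complete_flags(existing, seed=0):
    """Keep existing flags and fill the gaps (None) with random values."""
    present = [f for f in existing if f is not None]
    # Number of bins inferred from the highest existing flag value.
    n_values = max(present) + 1
    rng = random.Random(seed)
    return [f if f is not None else rng.randrange(n_values) for f in existing]
```

Existing values are never changed, so the free set chosen for earlier refinement stays the same after the reflection list is extended.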

NOSYM

This option makes the program emulate the older version, in which no twinning symmetries were taken into account. If NOSYM is not used, all reflections that are equivalent under any possible twin law acquire the same FreeR flag.

TWIN <obliquity>

This option sets the twin obliquity angle for pseudomerohedral twinning. The default value is 5.0 degrees, which is used if the keyword is not given.

UNIQUE

This option will create a unique list of reflections, i.e., give the MTZ file all allowed reflections present whether or not data have been measured for them. It replicates the functionality of the old 'uniqueify' script with the '-s' option (see documentation for UNIQUE). The systematic absences are kept but a warning is issued for them.

END

End input.

REFERENCES

  1. A.T. Brünger, Nature 355, 472-4 (1992)
  2. A.T. Brünger, "Free R Value: Cross-validation in crystallography", Methods in Enzym. 277, 366-396 (1997).
    See The Brunger Lab Publications for more references on the Free R.

SEE ALSO

f2mtz, mtz2various, sfall

EXAMPLES

FREERFLAG is normally run as part of the uniqueify script, examples of which are:

With the new keyword UNIQUE, it is no longer necessary to use the uniqueify script. The examples will be updated.

Examples of running FREERFLAG on its own can be found at: