fhscal (CCP4: Supported Program)

NAME

fhscal - Scaling of isomorphous derivative data using Kraut's method.

SYNOPSIS

fhscal hklin foo_in.mtz hklout foo_out.mtz
[Keyworded input]

DESCRIPTION

Derivative-to-native scale factors are calculated in equi-volume shells in reciprocal space using Kraut's formula (ref 1), generalised to use both centric and acentric data, and are then applied to the derivative data. This formula takes account of the degree of heavy-atom substitution but does not require the presence of anomalous differences.

The program also computes a scale factor to put the isomorphous difference Patterson on the correct scale for the vector-space refinement program VECREF.

It is also possible to apply the scales to all "scaleable" columns in a dataset (i.e. to F(+)/F(-) and to the intensities; see the LABIN and AUTO keywords). This is advisable, to avoid a mixture of scaled and unscaled data for a single derivative. For input MTZ files that contain dataset information, FHSCAL attempts to detect datasets that would be output with such a mixture and warns you if it finds any. In these cases, specifying the AUTO keyword causes the appropriate scale factor to be applied automatically to all such columns.

INPUT CONTROL DATA

Free format using keywords. The following keywords may be used; only the leading 4 characters are significant and the order is immaterial:

AUTO, BIAS, CENT, END, LABIN, LIST, RESO, SHELLS, TITLE

The LABIN keyword is always required; the others are optional and take default values if omitted. BIAS 1 is recommended, provided that the standard deviations produced by the data processing program (e.g. SCALA) are reliable. If in doubt, omit BIAS.

TITLE <title>

Title (max 100 characters).

LIST <list>

Number of reflections to list (for debugging purposes). Default = 0.

SHELLS <nshell>

Number of shells into which the data are divided (aim for at least 200 reflections per shell; the program may override your choice in order to maintain this limit). If more than one derivative is to be scaled, this applies to the one with the highest resolution. Default = 20.
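Equi-volume shells are not equally spaced in S = 4*sin(theta)**2/lambda**2. The reciprocal-space volume inside radius d* = sqrt(S) grows as S**1.5, so the boundaries are evenly spaced in S**1.5 instead. The sketch below computes such boundaries and caps the shell count so that an average shell holds at least 200 reflections. It assumes reflections are spread uniformly through reciprocal space, and both the helper name and its capping rule are hypothetical; FHSCAL's exact override rule is not given here.

```python
def equivolume_shell_edges(s_min, s_max, n_shells, n_refl, min_per_shell=200):
    """Hypothetical helper: S boundaries (S = 1/d^2) of shells enclosing equal
    reciprocal-space volumes, assuming uniformly distributed reflections."""
    # Reduce the shell count so that, on average, each shell holds min_per_shell reflections.
    n = max(1, min(n_shells, n_refl // min_per_shell))
    # Volume inside radius d* = sqrt(S) is proportional to d*^3 = S^1.5.
    v_min, v_max = s_min ** 1.5, s_max ** 1.5
    # Evenly spaced volumes, converted back to S.
    return [(v_min + i * (v_max - v_min) / n) ** (2.0 / 3.0) for i in range(n + 1)]
```

With 1000 reflections the requested 20 shells would be reduced to 5 by this rule, while 10000 reflections would keep all 20.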

BIAS <bias>

Bias factor by which the standard deviations are multiplied. It is used to correct for the bias that arises when averaging squares of differences. Normally this should be 1; however, care should be taken that the standard deviations are valid. For example, some programs set the s.d. of a zero F to 9999, whereas the correct value is sqrt(I+sigma(I))-sqrt(I). Such values will cause the program to give incorrect results: either ignore the s.d.'s by setting BIAS = 0 or, better, delete or correct the affected reflection(s). Default = 0.
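The corrected s.d. quoted above can be computed directly from I and sigma(I). The sketch below does that and shows a hypothetical pre-processing step that replaces 9999 placeholders before the data reach FHSCAL. The (F, SIGF, I, SIGI) row layout, the function names and the clamping of negative intensities to zero are all assumptions for illustration, not part of FHSCAL.

```python
import math

def sigma_f_from_intensity(i, sig_i):
    """s.d. of F = sqrt(I) as given in the BIAS description: sqrt(I + sigma(I)) - sqrt(I)."""
    i = max(i, 0.0)  # assumption: negative intensities are treated as zero here
    return math.sqrt(i + sig_i) - math.sqrt(i)

def fix_placeholder_sigmas(rows, placeholder=9999.0):
    """Hypothetical clean-up: rows are (F, SIGF, I, SIGI) tuples; any SIGF at or
    above the placeholder value is recomputed from I and SIGI."""
    fixed = []
    for f, sig_f, i, sig_i in rows:
        if sig_f >= placeholder:
            sig_f = sigma_f_from_intensity(i, sig_i)
        fixed.append((f, sig_f, i, sig_i))
    return fixed
```

For a zero intensity with sigma(I) = 4 this gives sigma(F) = 2, a finite value in place of 9999.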

RESOLUTION <maxres> [ <minres> ]

Maxres and minres can be given in either order, in Angstrom units or as 4*sin(theta)**2/lambda**2. If specified, reflections outside this resolution range are excluded from scaling and from the output MTZ file. If the card is absent, all possible reflections are used for scaling and the resolution range of the native determines the output. If only one number is given, it is taken as the high-resolution cutoff.
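Because Bragg's law gives 2d sin(theta) = lambda, the quantity 4*sin(theta)**2/lambda**2 is simply 1/d**2; a resolution of 2.0 Angstrom therefore corresponds to 0.25. The sketch below applies the RESOLUTION rules in those units. The helper name and the explicit in_angstrom flag are assumptions for illustration; this document does not say how FHSCAL decides which unit a given number is in.

```python
def resolution_limits(a, b=None, in_angstrom=True):
    """Hypothetical helper mirroring RESOLUTION <maxres> [<minres>].
    Returns (s_low, s_high) in S = 4 sin^2(theta)/lambda^2 = 1/d^2 units."""
    to_s = (lambda d: 1.0 / (d * d)) if in_angstrom else (lambda s: s)
    if b is None:
        # A single value is the high-resolution cutoff; no low-resolution cutoff.
        return 0.0, to_s(a)
    s1, s2 = to_s(a), to_s(b)
    # The two limits may be given in either order.
    return min(s1, s2), max(s1, s2)
```

For example, `resolution_limits(20.0, 2.0)` and `resolution_limits(2.0, 20.0)` both return the S range from 0.0025 to 0.25.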

LABIN <program label>=<file label>

MTZ assignments, see below.

AUTO

Switches on AUTOmatic column selection. This option can only be used if the input file contains dataset information.

It is only necessary to specify FPHn for each dataset on the LABIN line (except in special cases, see below). Other labels can also be specified if desired. The program will then try to identify all "scaleable" columns in the dataset, automatically read them in and then apply the appropriate scale factor determined from FPHn.

This option is intended to prevent a mixture of scaled and unscaled columns within a dataset, e.g. FPHn is scaled but not FPHn(+) and FPHn(-). There are a couple of caveats:

It is assumed that each dataset contains the information for one derivative.
There may be problems with the automatic scaling if datasets contain both SIGIMEAN and SIGDPHn. This is because the program cannot distinguish between sigmas for intensities (which need to be scaled by the square of the scale factor) and those for other quantities (which are multiplied by the scale factor).
In these cases the automatic selection will make a best guess at which sigma is which; the ambiguity can also be resolved provided that IMEAN and SIGIMEAN are explicitly set by the user on the LABIN line (which is safer).
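The sigma ambiguity follows from the standard MTZ column types. In the sketch below the type codes are the usual MTZ conventions (from general knowledge of the format, not from this document), and `column_multiplier` is a hypothetical helper rather than FHSCAL's actual logic. Amplitude-like columns are multiplied by K and intensity-like columns by K^2, but a type-Q sigma has to be told what it belongs to, which is exactly the SIGIMEAN/SIGDPHn problem described above.

```python
# Assumed MTZ column-type codes (standard MTZ conventions):
#   F amplitude, D anomalous difference, G F(+)/F(-), L sigma of G  -> scale by K
#   J intensity, K I(+)/I(-), M sigma of K                          -> scale by K**2
#   Q standard deviation of F, D or J                               -> ambiguous
AMPLITUDE_TYPES = {"F", "D", "G", "L"}
INTENSITY_TYPES = {"J", "K", "M"}

def column_multiplier(col_type, k, sigma_of=None):
    """Factor applied to a column of MTZ type col_type for derivative scale k.
    For type Q, sigma_of names the type of the quantity the sigma belongs to,
    e.g. 'J' for SIGIMEAN or 'D' for SIGDPH."""
    if col_type in AMPLITUDE_TYPES:
        return k
    if col_type in INTENSITY_TYPES:
        return k * k
    if col_type == "Q":
        if sigma_of is None:
            raise ValueError("type Q is ambiguous: sigma of an amplitude or an intensity?")
        return column_multiplier(sigma_of, k)
    return 1.0  # other columns (indices, phases, flags, ...) are left unchanged
```

Passing `sigma_of` explicitly plays the same role as setting IMEAN and SIGIMEAN on the LABIN line.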

CENT

Use only centric reflections to compute the scale factors. All reflections will be scaled and output.

END

Terminate input. Equivalent to end-of-file.

INPUT and OUTPUT FILES

Standard MTZ reflection files are used for input (HKLIN) and output (HKLOUT). The following column labels are used :

        H, K, L         Standard meaning.

        FP, SIGFP       Native amplitude and sigma.

If only 1 derivative is being scaled:

        FPH, SIGFPH     Derivative amplitude and sigma.

        DPH, SIGDPH     Derivative anomalous difference and sigma
                        (optional).

        FPH(+), SIGFPH(+), FPH(-), SIGFPH(-)
                        Derivative amplitudes and sigmas for Friedel pair
                        (optional).

If more than 1 derivative is being scaled (up to 20 per run), the column labels are FPH1, SIGFPH1, [ DPH1, SIGDPH1, FPH1(+), SIGFPH1(+), FPH1(-), SIGFPH1(-), ] FPH2, SIGFPH2, [ DPH2 ... ] etc.

Scales are applied to FPH, SIGFPH and DPH, SIGDPH, FPH(+), SIGFPH(+), FPH(-), SIGFPH(-) if present. All other columns, including those for which no label assignments are given, are output unchanged.

CAVEAT

WARNING : Reflections that have a derivative measurement but no native measurement, and whose value of S (4*sin(theta)**2/lambda**2) is greater than that of every reflection for which both are measured, will be rejected, because no valid scale can be applied to them. These rejected reflections must be re-incorporated later, when higher-resolution native data become available.

In order to avoid losing reflections in the scaling procedure, it is worth considering using the dataset with the highest resolution limit as the reference (i.e. 'native') dataset in FHSCAL.
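The rejection rule in the WARNING above can be sketched as a simple filter. Here S is 4*sin(theta)**2/lambda**2, and the (S, FP, FPH) tuple layout, with None marking a missing measurement, is a hypothetical representation chosen for illustration rather than FHSCAL's internal data structure.

```python
def split_rejected(reflections):
    """reflections: list of (s, fp, fph) tuples, with None for a missing value.
    Returns (kept, rejected) following the CAVEAT rule."""
    common = [s for s, fp, fph in reflections if fp is not None and fph is not None]
    if not common:
        raise ValueError("no common reflections")
    s_limit = max(common)  # highest S at which both native and derivative exist
    kept, rejected = [], []
    for r in reflections:
        s, fp, fph = r
        if fp is None and fph is not None and s > s_limit:
            rejected.append(r)  # beyond every shell scale: cannot be scaled
        else:
            kept.append(r)
    return kept, rejected
```

Derivative-only reflections inside the common range survive; only those beyond it are lost, which is why a higher-resolution reference dataset helps.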

PRINTER OUTPUT

After echoing the input data, a table with the following columns is produced for each derivative:

Shell number
Maximum resolution for shell
Number of reflections in shell
RMS FP
RMS FPH
RMS (K.FPH - FP) for centrics
RMS (K.FPH - FP) for acentrics
Smoothed scale factor for shell

Overall scale and temperature factors are determined from a Wilson plot and printed with their estimated standard deviations; however, the scale factors actually applied to the derivative data are obtained by interpolating the shell scale factors.
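One way to interpolate shell scale factors is linear interpolation in S between shell centres, sketched below with `numpy.interp`. This document does not say which interpolation FHSCAL uses, so the linear scheme, the use of shell-centre S values and the function name are all assumptions.

```python
import numpy as np

def interpolate_scales(s_refl, shell_s_centres, shell_scales):
    """Per-reflection derivative scale from smoothed shell scales, by linear
    interpolation in S (shell_s_centres must be increasing). np.interp holds
    the end values constant outside the range of shell centres."""
    return np.interp(s_refl, shell_s_centres, shell_scales)
```

With shell centres at S = 0.0, 0.1 and 0.2 and scales 1.0, 1.2 and 1.6, a reflection at S = 0.15 would receive a scale of 1.4.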

At the end, the program prints the V factor (pseudo-cell volume) to be given to the FFT program, so that the isomorphous difference Patterson it computes is correctly scaled:

V' = V * C / Kv

where V is the true cell volume, C is the completeness of the data and the scale factor Kv is defined below. This value should be used for the VF000 keyword of FFT when preparing a Patterson map for the program VECREF.

ERRORS

In addition to the usual MTZ file opening errors:

ERROR(S) IN DATA: syntax errors were found in the general equivalent positions. Check for spurious characters, missing commas, extra commas etc.

ERROR - NO REFLECTIONS: no common reflections were found. Check column assignments, check reflection listing.

NO REFLECTIONS IN SHELL n: reflections may be missing in a resolution range. Try using a smaller number of shells (SHELLS keyword).

PROGRAM FUNCTION

Kraut's formula can be derived by equating the origin peak of the derivative Patterson with the sum of the origin peaks of the native and heavy-atom Pattersons:

        K^2 . sum FPH^2  =  sum FP^2  +  sum FH^2               (1)

where FPH, FP and FH are derivative, native and heavy-atom amplitudes respectively, and K is the derivative scale to be determined.

For acentric reflections :

        <FH^2>a  ~=  2.<(K.FPH - FP)^2>                         (2)

Elimination of the unknown FH from (1) and (2) gives a quadratic equation for K, the solution of which is :

        K  =  (2.sum FP.FPH  -  sqrt(4.(sum FP.FPH)^2 -
              3.sum FP^2 . sum FPH^2)) / sum FPH^2              (3)
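The elimination step can be written out explicitly. This is a worked expansion of (1) and (2), with all sums taken over acentric reflections:

```latex
\begin{aligned}
\sum F_H^2 &= K^2 \sum F_{PH}^2 - \sum F_P^2
  && \text{from (1)}\\
\sum F_H^2 &= 2\sum (K F_{PH} - F_P)^2
  = 2K^2 \sum F_{PH}^2 - 4K \sum F_P F_{PH} + 2\sum F_P^2
  && \text{from (2)}\\
0 &= K^2 \sum F_{PH}^2 - 4K \sum F_P F_{PH} + 3\sum F_P^2
  && \text{subtracting the first from the second}
\end{aligned}
```

This quadratic has two roots, and (3) takes the one with the minus sign. As a check, if FPH = FP for every reflection the quadratic reduces to K^2 - 4K + 3 = 0, with roots K = 1 and K = 3; the minus sign selects the physically sensible K = 1.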

Note that in the original reference (ref 1) the leading factor is given as 1/2; it should be 2. Formula (3) is valid only for acentric reflections; however, it can easily be generalised to include centric reflections by noting that

        <FH^2>  ~=  <M.(K.FPH - FP)^2>                          (4)

where M = 1 for centric reflections and M = 2 for acentric reflections, so, using (4) instead of (2) :

        K  =  (sum M.FP.FPH  -  sqrt((sum M.FP.FPH)^2 -
              sum (M+1).FP^2 . sum (M-1).FPH^2)) / sum (M-1).FPH^2
                                                                (5)

If all the reflections in a shell were centric, every (M-1) term would vanish and both the numerator and the denominator of (5) would be zero. This is unlikely, but as a safeguard the equivalent formula obtained by rationalising the numerator of (5) can be used instead :

        K  =  sum (M+1).FP^2 / (sum M.FP.FPH  +
              sqrt((sum M.FP.FPH)^2 - sum (M+1).FP^2 . sum (M-1).FPH^2))
                                                                (6)
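In the same way, (5) and (6) follow from a quadratic in K. Writing, purely as shorthand introduced here, A = sum M.FP.FPH, B = sum (M-1).FPH^2 and C = sum (M+1).FP^2, equating (1) with the summed form of (4) gives:

```latex
\begin{aligned}
K^2 \sum F_{PH}^2 - \sum F_P^2 &= \sum M\,(K F_{PH} - F_P)^2 \\
\Rightarrow\quad B K^2 - 2 A K + C &= 0 \\
K = \frac{A - \sqrt{A^2 - BC}}{B}
  &= \frac{A - \sqrt{A^2 - BC}}{B}\cdot\frac{A + \sqrt{A^2 - BC}}{A + \sqrt{A^2 - BC}}
   = \frac{C}{A + \sqrt{A^2 - BC}}
\end{aligned}
```

If every reflection in the shell is centric then B = 0 and (5) becomes 0/0, but (6) still gives K = C/(2A) = sum FP^2 / sum FP.FPH.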

This formula is modified slightly to take account of the bias that arises when averaging the squares of differences: the term <M.(K.FPH - FP)^2> is replaced by <M.(K.FPH - FP)^2 - M.((K.sigma(FPH))^2 + sigma(FP)^2)>, where the sigmas have been multiplied by the BIAS factor.
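The following Python sketch evaluates (6) for one shell, including the BIAS correction. Carrying the bias-corrected term through the same algebra subtracts sum M.(b.sigma(FPH))^2 from the FPH^2 coefficient and sum M.(b.sigma(FP))^2 from the FP^2 coefficient, where b is the BIAS factor. That expansion is this sketch's own derivation rather than something taken from the FHSCAL source, and the function name and argument layout are hypothetical.

```python
import math

def kraut_scale(fp, fph, centric, sig_fp=None, sig_fph=None, bias=0.0):
    """Kraut scale K for one resolution shell via equation (6).

    fp, fph        : native and derivative amplitudes (same reflections, same order)
    centric        : True for centric reflections (M = 1), False for acentric (M = 2)
    sig_fp, sig_fph: optional standard deviations, multiplied by `bias`
    """
    a = b = c = 0.0
    for i, (p, d, is_centric) in enumerate(zip(fp, fph, centric)):
        m = 1.0 if is_centric else 2.0
        s_p = bias * sig_fp[i] if sig_fp is not None else 0.0
        s_d = bias * sig_fph[i] if sig_fph is not None else 0.0
        a += m * p * d                          # sum M.FP.FPH
        b += (m - 1.0) * d * d - m * s_d * s_d  # sum (M-1).FPH^2, bias-corrected
        c += (m + 1.0) * p * p - m * s_p * s_p  # sum (M+1).FP^2, bias-corrected
    disc = a * a - b * c
    if disc < 0.0:
        raise ValueError("negative discriminant: no real scale factor for this shell")
    # Form (6) stays finite when every reflection is centric (b == 0).
    return c / (a + math.sqrt(disc))
```

For example, with fp = [10, 20, 30], all acentric, and fph equal to half of fp, the function returns 2.0; with fph = fp and all reflections centric it returns 1.0 even though b is zero.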

The program reads the input control data, then makes a first pass through the reflections to find the resolution limits (these can be controlled with the RESOLUTION keyword), and a second pass to flag reflections as centric or acentric and to accumulate the sums in shells for the scale factors. The scale factors are then calculated, smoothed and applied in a third pass. The program also computes a scale factor to apply to the isomorphous difference Patterson for use in the program VECREF:

        Kv  =  ((sum (FPH-FP)^2)c + 2.(sum (FPH-FP)^2)a)  /
               ((sum (FPH-FP)^2)c + (sum (FPH-FP)^2)a)

where the subscripts c and a denote sums over centric and acentric reflections respectively.
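Kv is the ratio of the M-weighted sum of squared isomorphous differences (centric weight 1, acentric weight 2) to the unweighted sum, and V' = V * C / Kv. The sketch below computes both. It assumes that FPH has already been put on the native scale and that the completeness C is a fraction between 0 and 1; the function names are hypothetical.

```python
def patterson_kv(fp, fph, centric):
    """Kv: centric squared differences weighted 1, acentric weighted 2,
    divided by the unweighted sum. FPH is assumed to be already scaled."""
    num = den = 0.0
    for p, d, is_centric in zip(fp, fph, centric):
        diff2 = (d - p) ** 2
        num += diff2 if is_centric else 2.0 * diff2
        den += diff2
    return num / den

def pseudo_cell_volume(cell_volume, completeness, kv):
    """V' = V * C / Kv, the value suggested for the VF000 keyword of FFT."""
    return cell_volume * completeness / kv
```

For data that are entirely acentric Kv is 2, so a cell of volume 100000 A^3 with 90% completeness would give V' = 45000 A^3.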

AUTHOR

Ian Tickle, Birkbeck College, London

REFERENCES

Kraut J, Sieker LC, High DF and Freer ST, Proc. Nat. Acad. Sci. USA, 48, 1417-1424 (1962).

EXAMPLE

A simple Unix example script can be found in $CEXAM/unix/runnable/:

fhscal.exam (Kraut scaling of derivative data).

FHSCAL is also used together with other programs in the following example script in $CEXAM/unix/runnable/:

vecref.exam (use in vector-space heavy-atom refinement).