LAUENORM - WAVELENGTH NORMALISATION
===================================

INTRODUCTION

The program LAUENORM is used to perform a wavelength normalisation for Laue data using symmetry equivalent reflections measured at different wavelengths. The wavelength normalisation curve is calculated by splitting the data into wavelength bins, scaling these bins together and curve fitting the resulting scale factors. The program will also calculate inter film pack scale and temperature factors if more than one set of Laue data is input. An iterative procedure is carried out alternating wavelength normalisation curve calculation with film pack scaling. The output file may contain either merged or unmerged data as required. This version also contains an option to deconvolute multiples data, making use of the variation of the normalisation curve with wavelength.

The program was written by J.W. Campbell, Daresbury Laboratory.

List of sections:

   Data Control Cards
   Input and Output Files
   Running the Program
   Notes
   Printer Output
   Error Messages
   Program Function
   References
   Examples

DATA CONTROL CARDS

Data Card 1

   TITLE

   Title for the output MTZ file containing the normalised reflection data (max of 70 characters). Also used as a title on the printer output and as a batch title if a file of unmerged data is output.

Data Card 2

   A B C ALPHA BETA GAMMA

   The cell dimensions in Angstroms and degrees for the output MTZ file.

Data Card 3

   OPTION NPACK [GE INTYP ISPAT]

   OPTION is a program option code which may be either 'NORMALISE', to normalise a set or sets of Laue data, or 'SCALE', if it is only required to apply a previously determined normalisation curve.

   NPACK is the number of input Laue film packs (default = 1).

   Normally the input files to LAUENORM are files created by the program AFSCALE. However, there is an alternative option to use the data from the 'A' films only directly from the '.ge1' files.
   To use this option, three extra parameters are given:

   The code 'GE' to indicate that .ge1 files are to be read.

   INTYP is the integration type:
      = 1, box integration
      = 2, profile fitted (the default)

   ISPAT indicates the required handling of spatially overlapped spots:
      = 0, Do not include spatially overlapped data
      = 1, Include spatially overlapped data

Data Card 4

   LAMCOD

   LAMCOD is a code indicating the initial type of normalisation curve to be applied. It may be one of the following:

      UNITY,   Apply values of 1.0 throughout (the default)
      PEALECT, Apply a Pea Lectin normalisation curve
      YORK,    Apply a 'York data' normalisation curve
      INPUT,   Apply the normalisation curves specified on the cards which define the wavelength ranges.

   Usually, if the program option 'NORMALISE' was specified, the type 'UNITY' is given. In this case a wavelength normalisation calculation will be followed by film pack scaling for each iteration. If, however, one of the other types is specified, each iteration will start with film pack scaling followed by wavelength normalisation. For the program option 'SCALE' one of the options other than 'UNITY' will normally be specified.

Data Card 5

   SCODE [parameters]
   [SC1 SC2 ... SC(NPACK)]
   [B1 B2 ... B(NPACK)]

   SCODE is a code indicating the type of scaling between film packs to be applied initially. It may be one of the following:

      UNITY, Apply unit scale factors and zero temperature factors (the default).

      IAVE,  Apply initial scale factors calculated by equalising the average intensity for reflections within the wavelength range SCLAM1 to SCLAM2. The values of SCLAM1 and SCLAM2 follow the code 'IAVE'
             e.g. IAVE 0.95 1.05

      SYMM,  Calculate and refine initial scale factors using symmetry equivalent reflections for which the measurements have been made within DLAM Angstroms of each other. If an absorption edge occurs between the measurements they are not included.
             The parameters following the code SYMM are the value of DLAM and the wavelengths of up to 6 absorption edges.
             e.g. SYMM 0.1 0.49 0.92

      SYMMB, As for SYMM except that initial temperature factors as well as scale factors are calculated and refined.

      INPUT, Input scale factors and temperature factors are given on following cards. The program will read NPACK scale factors starting on a new card followed by NPACK temperature factors starting on a new card. If necessary the scale and temperature factors may each be input on more than one card.
             e.g. INPUT
                  1.0 0.676 0.875
                  0.0 -3.0 -6.0

   Note: The scale factor and temperature factor cards are only given if the scaling code is 'INPUT'.

Data Card 6

   NSPGRP NRANGE NITER NBCYC NBFIX NPCYC NPFIX

   NSPGRP is the space group number. The symmetry positions for this space group are read from the CCP4 symmetry operators file and the corresponding point group symmetry matrices are read from the CCP4 point groups file.

   NRANGE is the number of separate ranges of wavelength for which wavelength bins are to be defined (1 to 8).

   NITER is the number of iterations of the wavelength normalisation curve calculation and film pack scaling procedure to be carried out (e.g. 3).

   NBCYC is the number of cycles of wavelength bin scale factor refinement to be carried out in each iteration (e.g. 4).

   NBFIX is the number of the bin whose scale factor is to be set to 1.0. It is probably best to select a bin around the middle of one of the wavelength ranges.

   NPCYC is the number of cycles of film pack scale factor refinement to be carried out in each iteration (e.g. 4). This number will also be used for the number of refinement cycles for initial scale factor refinement when the option SYMM or SYMMB is specified.

   NPFIX is the number of the reference film pack (1 to NPACK) for which the scale and temperature factor are set to 1.0 and 0.0.
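The pairing rule used by the SYMM and SYMMB options on Data Card 5 can be illustrated with a short Python sketch. It is not taken from the LAUENORM source: the function name `usable_pair` is hypothetical, and the treatment of an edge lying exactly at a measured wavelength (here, not counted as "between") is an assumption.

```python
def usable_pair(lam1, lam2, dlam, edges):
    """Decide whether two measurements of symmetry equivalents may be paired.

    lam1, lam2 : wavelengths (Angstroms) of the two measurements
    dlam       : the DLAM value given after the code SYMM
    edges      : wavelengths of the absorption edges (up to 6 on the card)
    """
    # The two measurements must lie within DLAM Angstroms of each other.
    if abs(lam1 - lam2) > dlam:
        return False
    lo, hi = sorted((lam1, lam2))
    # A pair is dropped if any absorption edge lies strictly between them
    # (assumption: an edge exactly at lo or hi does not disqualify the pair).
    return not any(lo < edge < hi for edge in edges)


# Values from the card example "SYMM 0.1 0.49 0.92"
dlam, edges = 0.1, [0.49, 0.92]
print(usable_pair(0.60, 0.68, dlam, edges))
print(usable_pair(0.90, 0.95, dlam, edges))
```

The first call pairs two measurements that are close in wavelength with no edge between them; the second straddles the 0.92 Angstrom edge.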
Data Card 7

   IWSCAL NCURV IBREF SIGREJ IRFAC IR2 IR3 SIGSCA

   IWSCAL is a flag indicating the type of weighting to be used when scaling together the wavelength bins.
      = 0, use weights of 1/sigI**2 (probably best avoided)
      = 1, use unit weights

   NCURV is the number of the iteration from which curve fitting is to be carried out on the refined wavelength bin scale factors (e.g. 1).

   IBREF is the number of the iteration from which inter film pack temperature factors as well as scale factors are to be calculated and refined (e.g. 2).

   SIGREJ. Reject reflections with I < SIGREJ*SIGI (I = reflection intensity, SIGI = standard deviation of the intensity). Such reflections are excluded throughout and are not written to the output file (e.g. 1.5). If SIGREJ is zero then all reflections, including those with negative intensities, will be read in. Negative intensities will be output to the unmerged MTZ ROTAVATA/AGROVATA output file if this option is selected but will be omitted from the other files where F values are output.

   IRFAC is a flag
      = 0, Output R-factor tables only before any scaling and at the end of each iteration. This will be the value normally used.
      = 1, Output R-factor tables before scaling and after each wavelength normalisation, curve fitting and film pack scaling.

   IR2 is a flag which controls the printing of the inter-bin table of R-factors of type 2.
      = 0, omit this table (the usual choice)
      = 1, print this table

   IR3 is a flag which controls the printing of the inter-bin table of R-factors of type 3.
      = 0, omit this table (the usual choice)
      = 1, print this table

   SIGSCA. If a non-zero value is given then reflections with I < SIGSCA*SIGI are excluded from the calculation of the lambda normalisation curve and the calculation of the inter-pack scaling factors. If the value is zero or not given then all reflections (except those omitted via the SIGREJ cutoff) will be used (I = reflection intensity, SIGI = standard deviation of the intensity).
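The combined effect of the SIGREJ and SIGSCA cut-offs on Data Card 7 can be summarised in a small Python sketch. The function name `classify` and the three labels are hypothetical; only the two inequalities and the special meaning of a zero value come from the card description.

```python
def classify(i, sigi, sigrej, sigsca=0.0):
    """Classify one reflection according to the Data Card 7 sigma cut-offs.

    Returns one of three illustrative labels:
      'rejected'    - dropped throughout and not written to the output file
      'output_only' - kept for output but left out of the lambda curve and
                      inter-pack scale factor calculations
      'scaling'     - used everywhere
    """
    # A zero SIGREJ switches the rejection test off, so that negative
    # intensities are also read in.
    if sigrej != 0.0 and i < sigrej * sigi:
        return "rejected"
    # A zero (or omitted) SIGSCA means every surviving reflection is used.
    if sigsca != 0.0 and i < sigsca * sigi:
        return "output_only"
    return "scaling"


for refl in [(5.0, 5.0), (10.0, 5.0), (20.0, 5.0)]:
    print(refl, classify(*refl, sigrej=1.5, sigsca=3.0))
```

Note that the order of the tests matters: SIGSCA is only applied to reflections that have already passed the SIGREJ test.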
Data Card 8

   BMINL BMAXL BINS NFIT IFEX [IORD Q(0) ... Q(IORD)]

   BMINL, BMAXL are the minimum and maximum Lambda values (in Angstroms) for the range. If only one range was requested, then zero values may be given and the values will be set to the minimum and maximum lambda values present in the input Laue data file.

   BINS may be given either as the width of the Lambda bins to be used in the wavelength binning (a value in Angstroms less than 1.0) or as the number of bins (a value greater than 1). The total number of bins for all the ranges must not exceed 100.

   NFIT is the order of the polynomial (1 to 10) to be fitted for the range. If a value of zero is given, no curve fitting is done for the range but instead a single scale factor (as derived from the inter bin scaling) is used for all the reflections within a given bin.

   IFEX is a flag indicating whether or not extra end of range points are to be added to the wavelength bin scale factors before curve fitting. See the Notes on Curve Fitting under PROGRAM FUNCTION for details.
      = 0, Do not add extra end points (the default).
      = 1, Add an extra end point at the low wavelength end of the range.
      = 2, Add an extra end point at the high wavelength end of the range.
      = 3, Add extra end points at both ends of the wavelength range.

   IORD Q(0) ... Q(IORD). These values are only required if the input normalisation code LAMCOD was 'INPUT'. They are the order of the polynomial followed by the polynomial coefficients (IORD+1 values) of the normalisation curve to be applied. If necessary the polynomial coefficients may be continued on additional cards.

Data Card 9

   IOUT IC1 IC2 IC3 IC4 IC5 IC6 IC7

   IOUT is a flag controlling the type of output reflection data file to be written.
      = 1, Output a file of fully merged data with no anomalous data.
      = 2, Output a file of fully merged data including anomalous data.
      >2,  Output a file of unmerged data with a batch serial number of IOUT.
      =-1, Write a file of fully merged data with no anomalous data in SHELX format.
      =-2, Write a file of unmerged data with lambda values in SHELX format.

   IC1...IC7 are flags indicating the centric zones (only relevant if IOUT=2). Non zero values indicate centric zones. The zones 1 to 7 are 0kl, h0l, hk0, hkk, hkh, hhl and h,-2h,l respectively.

The version of LAUENORM (LNZ) which deconvolutes harmonics has the same input as LAUENORM but with an additional data control card, given below. The Laue data input must be from the .ge files (top films/plates).

Data Card 10

   IFMUL SCMAX IWMUL NEGTYP MDIAG

   IFMUL
      = 0, do not deconvolute multiples
      = 1, deconvolute multiples

   SCMAX, Maximum lambda curve scaling factor for inclusion of terms (0.0 = no limit) e.g. 25.0

   IWMUL
      = 0, Use unit weights in equations
      = 1, Use weights of 1/sigma(i)**2

   NEGTYP
      = 0, Eliminate results from equations with negative results
      = 1, Include recalculated positive components (normal option)
      = 2, Only output these

   MDIAG
      = 0, Do not write multiples diagnostics file
      >0,  Write multiples diagnostics file

INPUT AND OUTPUT FILES

The input files are:

a) The control data file

b) The Laue Data input reflection files. These are normally card image files with one reflection per record giving the items H K L LAMBDA THETA I SIGI. See program documentation for further details. The data are unmerged and need be in no particular order. Alternatively, if the code 'GE' is specified on Data Card 3, then the input files are '.ge1' format files.

The output file is one of the following:

a) A reflection data file in standard MTZ format containing the merged normalised reflection data. The data items are either H K L F SIGF DANO SIGDANO or H K L F SIGF depending on whether or not anomalous data are to be processed and output. The reflection data are sorted on H, K and L.

b) A reflection data file containing unmerged but normalised data as a single batch. The data are sorted on H, K, L and M/ISYM and the file is suitable as input to the program AGROVATA or for combining with other such files for input to ROTAVATA.
   The output data items are H K L M/ISYM BATCH I SIGI.

c) A card image file containing the fully merged normalised reflection data (without anomalous) in SHELX format. (H K L F SIGF in format 3I5,2F8.2)

d) A card image file containing the unmerged normalised reflection data (with lambda values) in SHELX format. (H K L F SIGF LAMBDA in format 3I4,3F8.2)

and (if multiples are being deconvoluted):

a) A reflection file in standard MTZ format containing the deconvoluted multiple reflections with columns H K L F SIGF.

b) Optionally a diagnostics file giving details of the reflections and equations used in the deconvolution.

RUNNING THE PROGRAM

Use the command 'laue lauenorm'

Parameters:

   DATA      The control data file.

   LAUEHKL1, LAUEHKL2 etc.
             The input Laue data files (type = .afout or .ge1).

   HKLOUT    The output reflection data file in MTZ format. This will be specified unless there is a SHELX output file.

   SHELX     The optional output reflection file in SHELX format (type = .data).

   HKLMULT   Optional deconvoluted multiples file in MTZ format.

   MULTDIAG  Optional multiples deconvolution diagnostics file.

NOTES

Current program limits:

   Maximum number of wavelength ranges                 = 8
   Maximum number of wavelength bins (over all ranges) = 100
   Maximum number of film packs                        = 100
   Maximum number of input Laue reflections            = 300000

PRINTER OUTPUT

The printer output starts with details of the input control data, the symmetry matrices and the header of the output MTZ file. Details are then given of the number of bins to be used for each wavelength range, the number of reflections read from the Laue input data file, the minimum and maximum wavelength values found and the number of bins which will be used in the scaling. The numbers of reflections rejected for various reasons are then printed.

A summary is then given showing the wavelength bins with their bin numbers, numbers of reflections, initial scale factors and wavelength ranges. A second table gives the number of reflections in each film pack.
Tables are then printed showing the numbers of overlapping reflections between each of the pairs of bins and each of the pairs of film packs. For these tables, the overlaps are defined such that any given reflection may only contribute up to 1 overlap for a given pair of bins or packs. For the overlaps between a bin or pack and itself, each overlap indicates that the reflection occurred twice or more within the given bin or pack.

If initial scale factors were calculated, details of these are then given.

R factor analyses of the data before scaling are then printed. Three types of Laue data merging R-factors are calculated. These are calculated as

   R = Sum(|Il-Imean|)/Sum(Imean)

with sums over individual Laue reflections (Il = a Laue intensity), where Imean is calculated as follows for the three types.

   RFACT1  Imean = The mean intensity for all measurements (I+) and (I-) for the reflection (and all symmetry equivalents).

   RFACT2  Imean = The mean intensity for all measurements of the same sign (I+ or I-) for the reflection.

   RFACT3  Imean = The mean intensity for all measurements of the same sign (I+ or I-) for the reflection and with lambda values within 0.1 Angstroms.

Tables of R-factors between wavelength bins and between film packs are also printed. In these tables, the R-factor between bins 'n' and 'm' is calculated as

   R(n,m) = Sum(|In-Im|)/Sum(In+Im)

where the sum is over all reflections with measurements in both bins 'n' and 'm'. A reflection which has multiple measurements within a bin and has measurements in different bins will give several contributions to the R-factor. If there are N measurements in bin 'n' and M measurements in bin 'm' then there will be N*M contributions to the R-factor between bins 'n' and 'm' (and 'm' and 'n'). For the R-factor within bin 'n' containing N measurements there will be NC2 (= N!/(2!*(N-2)!) = N*(N-1)/2) contributions to the R-factor.
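The pairwise counting behind the inter-bin R-factors can be sketched in Python as follows. This is an illustration of the formulae quoted above, not the program's own code; the function names and the use of a dictionary keyed on a unique (symmetry-reduced) reflection index are assumptions.

```python
from itertools import combinations, product


def between_bins(meas_n, meas_m):
    """R-factor between bins n and m: Sum(|In-Im|)/Sum(In+Im).

    meas_n, meas_m map a unique reflection key (assumed here to be the
    symmetry-reduced hkl) to the list of intensities measured in that bin.
    Returns (R, number_of_contributions).
    """
    num = den = 0.0
    count = 0
    for key in meas_n.keys() & meas_m.keys():
        # Each of the N measurements in bin n is paired with each of the
        # M measurements in bin m, giving N*M contributions.
        for i_n, i_m in product(meas_n[key], meas_m[key]):
            num += abs(i_n - i_m)
            den += i_n + i_m
            count += 1
    return (num / den if den else None, count)


def within_bin(meas):
    """R-factor within one bin: every unordered pair of measurements."""
    num = den = 0.0
    count = 0
    for values in meas.values():
        # N measurements give N*(N-1)/2 unordered pairs.
        for i_a, i_b in combinations(values, 2):
            num += abs(i_a - i_b)
            den += i_a + i_b
            count += 1
    return (num / den if den else None, count)


bin_n = {(1, 2, 3): [100.0, 110.0]}
bin_m = {(1, 2, 3): [90.0, 100.0, 105.0]}
print(between_bins(bin_n, bin_m)[1], within_bin(bin_n)[1], within_bin(bin_m)[1])
```

The printed values are the numbers of contributions to R(n,m), R(n,n) and R(m,m) for a single reflection with two measurements in bin 'n' and three in bin 'm'.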
As an example to indicate how many contributions a reflection will make to a given R-factor, take the case where a reflection has 2 measurements in bin 'n' and 3 measurements within bin 'm'. The numbers of contributions are as follows:

   RFACT1   5
   R(n,m)   6
   R(n,n)   1
   R(m,m)   3

Two further optional tables of R-factors between wavelength bins may also be output. In the second table only pairs of measurements of the same sign (I+) or (I-) are included and, in the third table, only pairs of measurements of the same sign and with lambda values within 0.1 Angstroms are included.

Then for each iteration the following output is printed:

(Note: the order of the items will be different if the input normalisation curve type was not 'UNITY' and these sections will be omitted except for the final R-factor tables if the program option 'SCALE' was used.)

a) The scale factors (and shifts etc.) for each cycle of wavelength bin scale factor refinement.

b) R-factor analyses of the data after the bin scale factors have been applied (see above for details) if IRFAC was not zero.

c) Details of the curve fitting (if requested) including mean square deviations for the different orders of polynomial and a table (with graph) showing the results of the curve fitting for the selected orders of polynomials.

d) R-factor analyses of the data after scaling using the curve fitted scale factors (see above for details). (These will only be present if curve fitting was requested and IRFAC is non zero).

e) The scale factors (and shifts etc.) for each cycle of film pack scale factor refinement. Details of the temperature factor refinement will also be given if this was carried out.

f) R-factor analyses of the data at the end of the iteration after all wavelength and film pack scaling has been applied.
A table is then printed giving the cumulative inverse scale factors for the wavelength bins (using curve fitted values at the mean bin wavelengths if curve fitting was requested) normalised to give a maximum value of 1.0. The inverse scale factors are given so that the values may be more easily compared, if required, with those from the program LAUESCALE.

Finally, the number of reflections written to the output file is printed.

ERROR MESSAGES

a) General syntax error in the control data

   **SYNTAX ERROR IN FIELD n ** text

b) Errors in the control data

   **PROGRAM OPTION code INVALID**
   **LAMBDA CURVE TYPE code INVALID**
   **INPUT SCALING TYPE code INVALID**
   **SCLAM1 > SCLAM2**: sclam1 sclam2
   **MAXIMUM OF n RANGES ALLOWED**
   **INVALID CURVE FITTING PARAMETER SPECIFICATION**
   **LAMBDA RANGE LIMITS MUST BE IN INCREASING POSITIVE VALUES OF LAMBDA**: blmin1 blmax1 blmin2 blmax2 blmin3 blmax3
   **INVALID VALUE OF IFEX FOR RANGE n**
   **BINS PARAMETER MUST BE GREATER THAN 0, BINS = x FOR RANGE y **

c) Error in the number of bins, film packs or reflections

   **MAXIMUM OF n BINS ALLOWED**
   **MAXIMUM OF n FILM PACKS ALLOWED**
   **BIN TO BE ASSIGNED A SCALE FACTOR OF 1.0 IS ABSENT**
   **MAXIMUM OF n REFLECTIONS EXCEEDED**
   **NO REFLECTION DATA FOR SCALING BETWEEN SCLAM1 AND SCLAM2 FOR FILM PACK n **

d) Error in sorting the data

   **STACK LIMIT OF 16 EXCEEDED IN REFSOR**

e) Errors in scaling the bins

   **MISSING DATA FOR SOME BATCHES**

   If insufficient overlaps are present then the program may fail during the scale factor refinement process with overflows, underflows etc. The table of overlaps should be inspected carefully and an attempt should be made to get round the problem by adjusting the bin selection parameters.

f) Errors during curve fitting

   **LESS THAN TWO POINTS DETERMINED FOR CURVE FITTING**
   **INVALID PARAMETERS FOR EXTRAPOLATION** x x0 x1 x2

g) MTZ file handling errors

   Error messages may be produced by the MTZ file handling routines.
PROGRAM FUNCTION

The program LAUENORM is designed to perform an internal wavelength normalisation for one or more sets of Laue data based on symmetry equivalent reflections measured at different wavelengths (ref. 1). It may also be used to apply previously calculated normalisation curves to further sets of Laue data. For the normalisation, the symmetry and setting of the crystal are important factors in determining the success of the method. If more than one film pack is input then the program will refine the inter film pack scaling (and optionally temperature) factors. An iterative procedure is carried out which alternates cycles of wavelength normalisation and cycles of inter film pack scaling.

The user may input a starting normalisation curve or starting values for the film pack scaling factors. Alternatively the program may use film pack scale factors derived from equivalent reflections where the equivalents are measured at wavelengths close to each other (avoiding absorption edges) or may use approximate scale factors based on average intensity values for reflections within a given wavelength range. In many cases it may be sufficient to input initial scaling values of 1.0 for both the film pack scale factors and for the initial wavelength normalisation curve.

The Laue reflection data are read in and split into user defined wavelength bins. Up to eight non-overlapping ranges of bins may be defined to allow for discontinuities in the normalisation function at absorption edges. After initial film pack scale factors have been determined, if required, the symmetry equivalent reflection overlaps between the wavelength bins are found and the bins are scaled together using the Fox and Holmes method (ref. 2) based on the program ROTAVATA. The refined scale factors as a function of wavelength are fitted by calculated polynomial curves if required. The inter film pack scale factors are then refined again using the Fox and Holmes method.
This wavelength normalisation and film pack scaling procedure can be iterated as required and it is possible to start the iterations with the film pack scaling instead of the wavelength normalisation if required.

The internal R-factors for the Laue data are extensively analysed and printed at each stage of the scaling process (see PRINTER OUTPUT above for details).

The normalised reflection data may either be written to an output file as a merged set of data (using 1/sigI**2 weighting in calculating the means and calculating sigF from the expression SQRT(I+sigI)-SQRT(I)) or as an unmerged set of data suitable for input to the program AGROVATA or for combination with other batches of unmerged data for scaling them together using the program ROTAVATA. An option is also available to write a merged set of data to a file in a format suitable for input to SHELX.

The normalisation function varies with wavelength, and it has been recognised for some time (e.g. ref. 6; Prof Keith Moffat, University of Chicago, personal communication) that this variation provides, in principle at least, information which may be used in the deconvolution of intensities from multiple spots. This can be made use of when a reflection multiple has been measured with the crystal in more than one orientation (i.e. at different wavelengths) or when a symmetry equivalent multiple is present, again at different wavelengths from the first multiple. To deconvolute a double, two or more such orientations are needed, three or more for a triple and so on. The method and some results are described in more detail in ref. 5. Further experience may well suggest improvements to the method but it has now been included within this current version of LAUENORM so that it can be tried by those interested. It should be noted that at the moment single reflections, which also occur as part of multiples, are not included in the deconvolution process.
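The merging step mentioned in this section (1/sigI**2 weighted means and sigF = SQRT(I+sigI)-SQRT(I)) can be sketched in Python as follows. Only the weighting and the sigF expression come from the text; taking F as SQRT(I), using 1/SQRT(sum of weights) as the sigma of the merged intensity, and skipping non-positive means are assumptions made for the illustration, and the function name is hypothetical.

```python
import math


def merge_equivalents(measurements):
    """Merge normalised measurements of one reflection.

    measurements : list of (I, sigI) pairs, all with sigI > 0
    Returns (I_mean, sigI_mean, F, sigF), or None if the mean is not positive.
    """
    # Weights of 1/sigI**2, as stated for the merged output.
    weights = [1.0 / (sig * sig) for _, sig in measurements]
    wsum = sum(weights)
    i_mean = sum(w * i for w, (i, _) in zip(weights, measurements)) / wsum
    # Assumption: sigma of the weighted mean taken as 1/sqrt(sum of weights).
    sig_mean = math.sqrt(1.0 / wsum)
    if i_mean <= 0.0:
        # Assumption: no F value is produced for a non-positive intensity.
        return None
    f = math.sqrt(i_mean)  # assumption: F = SQRT(I)
    # sigF from the expression quoted in the text.
    sig_f = math.sqrt(i_mean + sig_mean) - math.sqrt(i_mean)
    return i_mean, sig_mean, f, sig_f


print(merge_equivalents([(100.0, 10.0), (120.0, 10.0)]))
```

With equal sigmas the weighted mean reduces to the plain average, which makes the example easy to check by hand.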
Notes on Curve Fitting

Some factors affecting the choice of curve fitting are summarised here based on tests described in reference 3.

The scale factors between the wavelength bins are taken as being the scale factors for the wavelengths at the mid points of the bins. These scale factors are then fitted using a polynomial curve fitting routine in which the user may choose the order of the polynomial to be fitted. The program indicates the goodness of fit for a series of different orders to help the user to make a suitable choice. There is however a potential problem in that the first half of the first bin and the second half of the final bin fall outside the range used to determine the polynomial, with the result that the polynomial may behave badly in these regions. It is important therefore that the curve fitting at the ends of the range be examined closely even if the curve fitting in general looks satisfactory.

In order to try and improve the behaviour of the curve, there is a program option to estimate the scale factors at the end points of a wavelength range by examining the local trend of the scale factors near the ends of the range. The end point at the end of a range is estimated by making a quadratic extrapolation from the three points nearest to the end of the range. The extrapolated point is added to the set of points before the curve fitting is performed.

The following general conclusions were made:

a) The choice of curve fitting parameters is important in getting good scaling of the data as judged by R-factor analyses. It should therefore be done with some care.

b) Do not choose too low an order for the curve fitting. It is probably wise to use an order of at least 4 if possible.

c) It is important to examine closely the behaviour of a fitted curve at the ends of the range and to look at the R-factors for the data in the end of range bins. Bad fitting in these regions may not be obvious if the data are only considered as a whole.
d) In general it is best to use a reasonably large number of bins provided that there are sufficient reflection overlaps between the bins to give good inter-bin scale factors.

e) If the number of bins is increased, it is important to increase the order of the polynomial fitted if a low order was used initially. Otherwise worse rather than better results may be obtained and problems of bad end of range behaviour may not be overcome.

f) If only a few bins can be used then the addition of end of range points may be of considerable benefit.

g) If a large number of bins is used then it is probably not a good idea to add the end points, as random errors in the scale factors at the three points closest to the end may give misleading information about the local change of scale factor in that area.

REFERENCES

1) "Determination of the Wavelength Normalisation Curve in the Laue Method" by J. Campbell, J. Habash, J.R. Helliwell and K. Moffat, Information Quarterly for Protein Crystallography, 18, July 1986.

2) Fox and Holmes, Acta Cryst. (1966), 20, 886.

3) "Curve fitting during Laue data normalisation" by John W. Campbell, Information Quarterly for Protein Crystallography, 19, December 1986.

4) "The Recording and Analysis of Synchrotron X-radiation Laue Diffraction Photographs" by J.R. Helliwell, J. Habash, D.W.I. Cruickshank, M.M. Harding, T.J. Greenhough, J.W. Campbell, I.D. Clifton, M. Elder, P.A. Machin, M.Z. Papiz and S. Zurek, J. Appl. Cryst. (1989), 22, 483-497.

5) "Evaluation of Reflection Intensities for the Components of Multiple Laue Diffraction Spots. II. Using the Wavelength Normalisation Curve" by J.W. Campbell and Q. Hao, Acta Cryst. (1993), A49, 889-893.

6) "Macromolecular Crystallography with Synchrotron Radiation", J.R. Helliwell (1992), Cambridge University Press.

EXAMPLES

Example of the control data for normalising a set of Laue data.
   PEA LECTIN LAUE DATA
   50.8 61.6 137.4 90 90 90
   NORMALISE 5
   UNITY
   UNITY
   19 2 3 4 10 4 1
   1 1 2 1.5 0 0 0
   0.45 0.92 5 4 0
   0.93 2.07 10 5 0
   1 1 1 1 0 0 0 0
   0 25.0 0 1 0
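To show how the values in the example map onto the data cards described earlier, the following Python sketch reads the example into named fields. It is only an illustration and not part of LAUENORM: the function name and dictionary layout are hypothetical, one card per line is assumed, and the sketch covers only the simple case in which neither LAMCOD nor SCODE is 'INPUT' (any parameters following SCODE on its card are ignored).

```python
EXAMPLE = """\
PEA LECTIN LAUE DATA
50.8 61.6 137.4 90 90 90
NORMALISE 5
UNITY
UNITY
19 2 3 4 10 4 1
1 1 2 1.5 0 0 0
0.45 0.92 5 4 0
0.93 2.07 10 5 0
1 1 1 1 0 0 0 0
0 25.0 0 1 0
"""


def parse_control_data(text):
    """Split LAUENORM-style control data into named fields (simple case only)."""
    cards = iter(line.strip() for line in text.strip().splitlines())
    data = {"TITLE": next(cards)}                               # Data Card 1
    data["CELL"] = [float(v) for v in next(cards).split()]      # Data Card 2
    card3 = next(cards).split()                                 # Data Card 3
    data["OPTION"] = card3[0]
    data["NPACK"] = int(card3[1]) if len(card3) > 1 else 1
    data["LAMCOD"] = next(cards).split()[0]                     # Data Card 4
    data["SCODE"] = next(cards).split()[0]                      # Data Card 5
    if "INPUT" in (data["LAMCOD"], data["SCODE"]):
        raise NotImplementedError("INPUT codes need extra cards/fields")
    names6 = ("NSPGRP", "NRANGE", "NITER", "NBCYC", "NBFIX", "NPCYC", "NPFIX")
    data.update(zip(names6, (int(v) for v in next(cards).split())))
    card7 = next(cards).split()                                 # Data Card 7
    data["IWSCAL"], data["NCURV"], data["IBREF"] = (int(v) for v in card7[:3])
    data["SIGREJ"] = float(card7[3])
    data["IRFAC"], data["IR2"], data["IR3"] = (int(v) for v in card7[4:7])
    data["SIGSCA"] = float(card7[7]) if len(card7) > 7 else 0.0
    data["RANGES"] = []
    for _ in range(data["NRANGE"]):                 # one Data Card 8 per range
        f = next(cards).split()
        bins = float(f[2])
        data["RANGES"].append({
            "BMINL": float(f[0]), "BMAXL": float(f[1]), "BINS": bins,
            # Values below 1.0 are bin widths; larger values are bin counts.
            "BINS_IS_WIDTH": bins < 1.0,
            "NFIT": int(f[3]), "IFEX": int(f[4]),
        })
    card9 = [int(v) for v in next(cards).split()]               # Data Card 9
    data["IOUT"], data["CENTRIC"] = card9[0], card9[1:8]
    card10 = next(cards, None)        # Data Card 10: deconvoluting version only
    if card10 is not None:
        f = card10.split()
        data["MULTIPLES"] = {"IFMUL": int(f[0]), "SCMAX": float(f[1]),
                             "IWMUL": int(f[2]), "NEGTYP": int(f[3]),
                             "MDIAG": int(f[4])}
    return data


for key, value in parse_control_data(EXAMPLE).items():
    print(key, value)
```

Read this way, the example requests normalisation of five film packs in space group 19 with two wavelength ranges (5 and 10 bins, polynomial orders 4 and 5), merged output without anomalous data, and no deconvolution of multiples.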