f2mtz hklin foo.hkl hklout foo.mtz
[Keyworded input]
F2MTZ converts a formatted (ASCII) reflection file, in free or fixed format, to MTZ format. It should be used on merged data in the later stages of structure determination, e.g. when importing data from X-PLOR. If you wish to import data from a data processing program for reduction in SCALA and TRUNCATE, it is better to use COMBAT. F2MTZ requires some keywords to describe the content of the MTZ header.
Possible keywords are:
CELL, CTYPOUT, END, FILE, FORMAT, LABOUT, NAME, SCALE, SKIP, SYMMETRY, TITLE
The CELL, SYMMETRY, LABOUT and CTYPOUT specifications are mandatory; the program will stop with an error if they are not present.
CELL is followed by the cell lengths and angles.
SYMMETRY is followed by the standard space group name or number, or explicit symmetry operators.
In the specification of the MTZ format, each data column has an associated label and type. The LABOUT command allows you to specify the column labels. The various programs in the CCP4 program suite expect reflection data to be labelled according to a specific scheme. If the label in the input reflection MTZ file does not match the expected name, all programs in the suite will allow you to specify the label names using the LABIN and LABOUT commands.
The standard label names used are as follows:
Name                 Item
H, K, L              Miller indices.
S                    (4 sin**2 theta / lambda**2).
IC                   Centric flag.
M/ISYM               Partial flag and symmetry number.
BATCH                Batch number.
I                    Intensity.
I'                   Selected mean intensity.
SIGI                 sigma(I).
SIGI'                sigma(I').
FRACTIONCALC         Calculated partial fraction.
IMEAN                Mean intensity.
SIGIMEAN             sigma(IMEAN).
RATDELSD             Agreement factor between films in a pack.
FP                   Native `F' value.
FC                   Calculated `F'.
FPHn                 `F' value for derivative `n'.
DP                   Anomalous difference for native data.
DPHn                 Anomalous difference for derivative `n'.
SIGFP                sigma(FP).
SIGDP                sigma(DP).
SIGFPHn              sigma(Fn).
SIGDPHn              sigma(DELn).
PHIC                 Calc Phase.
PHIM                 Most prob phase.
PHIB                 Phase.
FOM                  figure of merit.
WT                   weight
HLA, HLB, HLC, HLD   ABCD H/L coeffs
CTYPOUT allows you to specify column types. It takes a number of strings as arguments, one per column. The MTZ format requires each data column to have an associated label and type; if a CTYPOUT specification is absent a default type of R (see below) is assumed in the output file. There is one special case: if a type is given as `X' (normally an invalid type), it is changed to `I', and that column is assumed to be an X-PLOR or SHELX free R factor; the difference in conventions is accounted for.
The data types for the different types of data which can be present in an MTZ file are as follows:
H   index h,k,l
J   intensity
F   structure amplitude, F
D   anomalous difference
Q   standard deviation of anything: J,F,D or other
G   structure amplitude associated with one member of an hkl -h-k-l pair, F(+) or F(-)
L   standard deviation of a column of type G
K   intensity associated with one member of an hkl -h-k-l pair, I(+) or I(-)
M   standard deviation of a column of type K
P   phase angle in degrees
W   weight (of some sort)
A   phase probability coefficients (Hendrickson/Lattman)
B   BATCH number
Y   M/ISYM, packed partial/reject flag and symmetry number
I   any other integer
R   any other real
X   special dummy type for Rfree in X-PLOR; this is reset to I.
This information allows the programs to decide whether the data is being treated sensibly in a given situation.
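Since labels and types are given one per column, it can be worth checking that the two lists have the same length before running the program. The following is only a sketch of such a check; the label and type lists are illustrative, and `FreeR_flag` is an assumed label name that is not part of the list above.

```shell
#!/bin/sh
# Illustrative LABOUT and CTYPOUT argument lists (assumed values, not from any real data set).
labout="H K L FP SIGFP FreeR_flag"
ctypout="H H H F Q X"

# Count whitespace-separated words; the unquoted expansion is intentional.
count_words() {
    set -- $1
    echo $#
}

nlab=$(count_words "$labout")
ntyp=$(count_words "$ctypout")

if [ "$nlab" -eq "$ntyp" ]; then
    status="ok: $nlab labels, $ntyp types"
else
    status="mismatch: $nlab labels, $ntyp types"
fi
echo "$status"
```

This checks only that the counts agree, not that each type is appropriate for its label.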
NAME specifies the project, crystal and dataset names for the output MTZ file. It is strongly recommended that this information is given. Otherwise, the default project, crystal and dataset names are "unknown", "unknown" and "unknownddmmyy" respectively.
The project-name specifies a particular structure solution project, the crystal name specifies a physical crystal contributing to that project, and the dataset-name specifies a particular dataset obtained from that crystal. All three should be given.
FORMAT supplies a valid Fortran fixed-format string, as would be given in a FORMAT statement, quoted and including the brackets. E.g.,
FORMAT '(6(6X,F6.0))'
will read records comprising six numbers each preceded by a six-character-wide field which will be skipped. It is not possible to read more than one reflexion from each input line. However, a single reflexion can be read from more than one line by using the '/' format character. The 'X' format character can be used to skip over keywords in the input file, e.g. "INDE" in the X-Plor format. Under Unix, the cut (1) command may be useful for reformatting the input columns if, for instance, the relevant fields aren't in fixed positions.
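To see which character positions a format such as '(6(6X,F6.0))' picks out, the same extraction can be imitated outside the program. The sketch below uses awk on a made-up 72-character record; the record contents and the `H=`, `K=` etc. text are purely hypothetical, and this is not how F2MTZ itself reads the file.

```shell
#!/bin/sh
# Build a hypothetical record: six groups of a 6-character text field
# followed by a 6-character right-justified number.
line=$(printf '%-6s%6s%-6s%6s%-6s%6s%-6s%6s%-6s%6s%-6s%6s' \
    'H=' 1 'K=' 2 'L=' 3 'F=' 123.4 'SIG=' 5.6 'FLAG=' 0)

# Imitate (6(6X,F6.0)): in each 12-character group, skip 6 characters
# and read the next 6 as a number (columns 7-12, 19-24, ..., 67-72).
fields=$(printf '%s\n' "$line" | awk '{
    for (i = 0; i < 6; i++)
        printf "%s%g", (i ? " " : ""), substr($0, 12 * i + 7, 6)
    printf "\n"
}')
echo "$fields"
```

Running a few records of the real input through a check like this is a quick way to confirm that the skipped and read widths line up with the data.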
Since the MTZ format stores all data, including indices, as reals, the FORMAT statement must read numbers as real (F), and not integer (I). Numbers which are integers in the input file should be read as real with a .0 extension, e.g. F6.0, see the CAVEATS section below. If the supplied FORMAT string includes I's then F2MTZ will convert them to the correct F format automatically, e.g. 3I4 will be converted to 3F4.0.
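The effect of the automatic conversion can be illustrated with sed. This is only an imitation of the rewriting described above (Iw becomes Fw.0), not the program's own code, and the example format string is an assumption.

```shell
#!/bin/sh
# A hypothetical FORMAT string containing integer descriptors.
fmt_in="(3I4,2F8.2)"

# Rewrite every Iw descriptor as Fw.0, leaving repeat counts and F items untouched.
fmt_out=$(printf '%s\n' "$fmt_in" | sed -E 's/I([0-9]+)/F\1.0/g')
echo "$fmt_out"
```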
If no FORMAT keyword is specified, then the program will assume free format.
SKIP specifies a number of lines to be skipped at the start of the file before the data are read.
TITLE puts a suitable title in the output file.
SCALE is followed by a list of scale factors to apply to the values in each column of the output file, given as real numbers in column order.
FILE specifies the ASCII input file. Usually this would be done via the logical name HKLIN. It can be in free format or in the format specified by the FORMAT keyword. The input typically contains h, k, l, Fp and SigFp.
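The keywords above might be combined as in the following sketch for a free-format file containing h, k, l, Fp and SigFp. The cell, space group, title and names are invented for illustration, the NAME argument layout (PROJECT, CRYSTAL, DATASET subkeywords) is an assumption not spelled out in this text, and the file names follow the synopsis at the top. No FORMAT keyword is given, so free format is assumed.

```shell
#!/bin/sh
# Write the keyworded input to a file so it can be inspected and reused.
# All values below are hypothetical.
cat > f2mtz.inp <<'EOF'
TITLE Hypothetical native data set
NAME PROJECT myproject CRYSTAL xtal1 DATASET native
CELL 50.0 60.0 70.0 90.0 90.0 90.0
SYMMETRY P212121
LABOUT H K L FP SIGFP
CTYPOUT H H H F Q
END
EOF

# Run only if the program and the input reflection file are actually present.
if command -v f2mtz >/dev/null 2>&1 && [ -f foo.hkl ]; then
    f2mtz hklin foo.hkl hklout foo.mtz < f2mtz.inp
fi
```

Keeping the keywords in a separate file makes it easy to rerun the conversion after correcting a label or type.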
If you are using this to convert Raxis data do not forget to run CAD afterwards - your data may not be in the conventional CCP4 asymmetric unit. This may apply to data from other sources too.
Note the comment above about REAL numbers being required. In cases like that of scalepack output, where the format actually changes depending on the value of the datum, becoming integer rather than real in some cases (!), ensure you use a format item ending with `.0'. If the numbers are right-justified in the F format field you specify, the `.0' can't harm.
Note that I's in the supplied FORMAT string will automatically be converted to the appropriate F format; see FORMAT keyword.
If you read the input in free format (no FORMAT specification), data which take the `default' value according to the Fortran rules will appear in the output with the canonical `missing' value. Free-format default values are acquired, for instance, if there are consecutive commas or the record is prematurely terminated by a slash.
================== CUT ================
#!/usr/bin/perl
until(($_=<>)=~/\binde/i){}
do{s/\b[a-z]\S*\s*//gi;print}while(<>);
================== CUT ================
Usage: save it (e.g. as 'cnshkl.pl'), make it executable, then:
cnshkl.pl in.hkl > out.hkl
Then run f2mtz without the FORMAT option.
The program is at the mercy of the argument to FORMAT.
Make sure it's going to read the correct number of real numbers from the
right columns.
Morten Kjeldgaard.
combat, scalepack2mtz, Data Harvesting, CAD, cut (1)