mtz2various - produces an ASCII reflection file for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN, mmCIF, pseudo-SCALEPACK, XtalView (foo.phs) or user-defined format. This may contain amplitudes, intensities or differences.
mtz2various hklin foo_in.mtz hklout foo_out
[Keyworded input]
This reads an MTZ file (assigned to HKLIN) and produces an ASCII file (assigned to HKLOUT) in a suitable form for MULTAN, SHELX, TNT, X-PLOR/CNS, pseudo-SCALEPACK, MAIN, XtalView (foo.phs) or in a user-defined format. For SHELX it is possible to output all quantities as intensities, i.e. F or delF terms may be squared. An mmCIF file can also be produced with all the relevant information taken from the MTZ header.
There are many options controlled by the assignments on the LABIN line. The most common requirements are:
When using OUTPUT USER you can define the output columns as you wish; this option can be used to construct a foo.phs file by assigning F PHI and FOM (see examples).
Many of the tasks can also be performed with SFTOOLS.
The allowed keywords are:
END, EXCLUDE, FREEVAL, FSQUARED, INCLUDE, LABIN, MISS, MONITOR, OUTPUT, RESOLUTION, SCALE
Compulsory input keywords are OUTPUT and LABIN.
The output types are as follows:
The output file has h,k,l,f,imt in FORMAT(3I4,7X,F7.0,I6), where imt=0 for a good reflection.
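As an illustration of the record layout just described, the following Python sketch writes one line laid out like FORMAT(3I4,7X,F7.0,I6). The function name multan_record is hypothetical and is not part of mtz2various. Python format specifiers only approximate Fortran edit descriptors: Fortran prints asterisks when a value overflows its field, which this sketch does not reproduce.

```python
def multan_record(h, k, l, amp, imt=0):
    """Approximate FORMAT(3I4,7X,F7.0,I6).

    Three 4-wide integers, 7 blanks, a 7-wide amplitude with no decimals,
    and a 6-wide integer flag (imt=0 marks a good reflection).
    """
    # The '#' flag keeps the trailing decimal point, as Fortran's F7.0 does.
    return f"{h:4d}{k:4d}{l:4d}{'':7}{amp:#7.0f}{imt:6d}"


if __name__ == "__main__":
    # Illustrative values only, not real data.
    print(repr(multan_record(1, 2, 3, 123.4)))
```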
To use the SHELX suite of programs (SHELXD, SHELXE, SHELXL or SHELXS) it is necessary to prepare two input files: foo.ins containing information about the cell, symmetry and some parameters to control the SHELX run, and foo.hkl containing a reflection list. The foo.hkl file may contain intensities (HKLF 4) or amplitudes (HKLF 3). Intensities may be generated from input amplitudes using the FSQUARED keyword, but it is better to use the original intensities. The foo.ins file finishes with a record HKLF 3 if foo.hkl contains amplitudes, or HKLF 4 for intensities.
The foo.hkl file created by this program contains:
To use the program SHELXD to solve a complete molecule by direct methods, to use SHELXL for refinement, or to prepare a reflection list for SHELXE, assign only I/SIGI/FREE or FP/SIGFP/FREE on the LABIN line. Reflections previously flagged for FreeR analysis are marked with -1 in the last column. These can be extracted with "grep -e -1$ foo.hkl".
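Where grep is not to hand, the same selection can be sketched in Python. This is a minimal sketch that mirrors the grep pattern (a line ending in -1), not a full HKLF parser; the function name split_free and the sample lines are hypothetical.

```python
def split_free(lines):
    """Split reflection lines into (free, working) lists.

    A line counts as 'free' when it ends in -1, mirroring grep '-1$'.
    Trailing whitespace is ignored and blank lines are skipped.
    """
    free, work = [], []
    for line in lines:
        if not line.strip():
            continue
        if line.rstrip().endswith("-1"):
            free.append(line)
        else:
            work.append(line)
    return free, work


if __name__ == "__main__":
    # Made-up lines for illustration; a real foo.hkl comes from OUTPUT SHELX.
    sample = [
        "   1   0   0  123.40    5.60  -1",
        "   1   0   1   98.70    4.20",
        "   1   1   0   11.00    1.00",
    ]
    free, work = split_free(sample)
    print(len(free), "free,", len(work), "working")
```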
To use the program SHELXD to find heavy atom or anomalous scattering sites, followed by SHELXE to calculate protein phases, you need to prepare two *.hkl files: one containing the FP to be phased, and the other containing the differences between two observations which are related to the substructure signal. To use isomorphous differences, scale FP and FPH together, assign FP and FPH on the LABIN line, and request OUTPUT SHELX. MTZ2VARIOUS outputs to foo.hkl the difference |FP - FPH|, or its squared value (i.e. |FP - FPH|^2) if FSQUARED is specified, and an appropriate SIGMA, followed by a phase estimate. The output file will contain lines of the form (HKLF 3 format if FSQUARED is not specified):
h, k, l, Del = ABS(FPH-FP), sigma(Del), PHIdel in FORMAT(3I4,2F8.1,I4), where PHIdel is 0 or 180, depending on whether the difference FPH-FP is positive or negative. Similarly for HKLF 4 format, if FSQUARED is specified.
If you wish to use anomalous differences, you can EITHER assign FP as FPH(-) and FPH as FPH(+), OR assign DP as DPH in which case the program will output DPH or its square. You must use keyword SHELXDiff to use this option; this flags that phases must be 90 or 270, not 0 or 180. The output file contains the anomalous differences and has lines of the form (HKLF 3 format if FSQUARED not specified):
h, k, l, Del = ABS(FPH(+)-FPH(-)), sigma(Del), PHIdel in FORMAT(3I4,2F8.1,I4), where PHIdel is 90 or 270, depending on whether the difference FPH(+)-FPH(-) is positive or negative. Similarly for HKLF 4 format, if FSQUARED is specified.
The phase information is needed for SHELXE. If the program SHELXD is to be used to find heavy atom or anomalous scattering sites from substructure differences, and you wish to run the program SHELXE to calculate protein phases using the SHELXD file, it must also list a phase estimate for the difference.
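The difference records described above can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the text says only that "an appropriate SIGMA" is written, so combining the two sigmas in quadrature is an assumption, as is the mapping of a positive difference to 0 (or 90) and a negative one to 180 (or 270). The function name shelx_diff_record is hypothetical.

```python
import math


def shelx_diff_record(h, k, l, fp, sigfp, fph, sigfph, anomalous=False):
    """Format one difference record laid out like FORMAT(3I4,2F8.1,I4).

    For anomalous differences, pass FPH(-) as fp and FPH(+) as fph.
    """
    diff = fph - fp
    # ASSUMPTION: sigmas combined in quadrature; the manual does not say how.
    sig = math.hypot(sigfp, sigfph)
    # ASSUMPTION: positive -> 0 (or 90), negative -> 180 (or 270).
    if anomalous:
        phase = 90 if diff >= 0 else 270
    else:
        phase = 0 if diff >= 0 else 180
    return f"{h:4d}{k:4d}{l:4d}{abs(diff):8.1f}{sig:8.1f}{phase:4d}"


if __name__ == "__main__":
    # Illustrative values only.
    print(shelx_diff_record(1, 2, 3, 100.0, 3.0, 110.0, 4.0))
    print(shelx_diff_record(1, 2, 3, 110.0, 3.0, 100.0, 4.0, anomalous=True))
```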
The output file has 'HKL ', h, k, l, F, sig(F), phase, fom in
format(A4,3I4,3F8.1,F8.4), with phase = 1000, fom = 0 i.e. dummies.
Note that files for TNT must be sorted on h, k, l and certain reflection
zones are required. You may need to run CAD to resort your data.
Use keywords
INCLUDE FREER <num> and EXCLUDE FREER <num>
to generate files for R-free calculation.
There is a maximum likelihood version of TNT from Pannu and Read
which requires a free-R flag (in XPLOR convention). This column
will be output if you assign the FREE column in LABIN and do
not use the INCLUDE | EXCLUDE FREER options.
CIF output is invoked, where <data block header> is a maximum of 80 characters
long, and must begin with the characters "data_" (any mixture of upper and
lowercase thereafter). OUTPUT CIF can be used to prepare data (from crystallography
or EM) for deposition to the PDB.
Unlike the other output formats, all the reflections from HKLIN are written
to HKLOUT. Not all column labels are appropriate for CIF output (see Notes
on CIF). Also, only RESO, EXCLUDE SIGP and
FREEVAL can be used with OUTPUT CIF.
They are used to flag certain reflections but not to reject them. The others
are ignored.
The output file has FORMAT(A,3I5,A,F10.1,F10.1,A,F10.2,A,I6...). The exact contents will depend on which labels have been specified by the LABIN keyword. See the documentation for FREERFLAG for a table explaining the differences in free R flag conventions.
Similar to XPLOR output. However, free R flags are left unchanged. To select the correct free R flag in CNS, you will need something like:
{===>} test_flag=0;
For SHELX and XPLOR/CNS ONLY. If FP and the anomalous difference are assigned (see
LABIN), then the amplitudes for reflections h,k,l and -h,-k,-l are generated and
output as separate reflections. In this case, the column ISYM may also be assigned
if it is present. This is a flag from TRUNCATE which is
= 0 if F comes from both positive (hkl) and negative (-h-k-l) Bijvoet reflections,
= 1 if F comes only from F+, and
= 2 if F comes only from F-.
This gives output suitable for the MAIN program. The output file contains H K L FP SIGFP and optionally FREE, PHIB and FOM if they are specified on the LABIN line. Alternatively, if FC is specified on the LABIN line, then FP and FC are interpreted as the real and imaginary parts respectively of a calculated F, and output as a "COMPLEX" field.
This gives pseudo-SCALEPACK output which is needed as input to the SOLVE package. The output file assigned to HKLOUT is ASCII and writes out H K L I(+) SIGI(+) I(-) SIGI(-), with the format (3I4,4F8.1). The output may need to be rescaled to fit this format. If the input is F(+) and F(-), the rescaling is done within the program.
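To see why rescaling can be needed, the sketch below writes one record laid out like (3I4,4F8.1) and refuses values that do not fit in an 8-character field. The function name scalepack_record and its scale parameter are hypothetical; how mtz2various itself chooses a scale is not described here.

```python
def scalepack_record(h, k, l, i_plus, sig_plus, i_minus, sig_minus, scale=1.0):
    """Format H K L I(+) SIGI(+) I(-) SIGI(-) like (3I4,4F8.1).

    Raises ValueError if a scaled value overflows its 8-character field,
    which is the situation in which the data would need rescaling.
    """
    fields = []
    for value in (i_plus, sig_plus, i_minus, sig_minus):
        text = f"{value * scale:8.1f}"
        if len(text) > 8:
            raise ValueError(f"{value * scale} does not fit F8.1; rescale the data")
        fields.append(text)
    return f"{h:4d}{k:4d}{l:4d}" + "".join(fields)


if __name__ == "__main__":
    # Illustrative values only.
    print(scalepack_record(1, 0, 0, 1234.5, 10.0, 1200.0, 11.0))
```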
The output file is of the form H K L ? ? ... where the user can specify which columns are to be output, how many and in what format. It can be used to generate a foo.phs file suitable for XtalView. See examples. Ten dummy labels (DUM??) are available to assign to any column and are output as real. Also, there are ten dummy columns (IDUM??) which are output as integer. The order of the data in the ASCII file is taken from the order of the program labels specified on the LABIN card, e.g. LABIN FP=FP1 DP=DP1 SIGFP=SIG1 SIGDP=SIGDP1 would give the order H K L FP1 DP1 SIG1 SIGDP1 in the output file. The format must either be a FORTRAN format, beginning with three integer items for H, K and L and with the remaining items matching the columns assigned on the LABIN card, e.g.
LABIN FP=FP DUM1=X IDUM1=Y
OUTPUT USER '(3I4,2F7.1,I4)'
or
OUTPUT USER *
to use free formatted output. However, all columns after H, K and L will be treated as real numbers.
The output is controlled by the labels specified here:
Beware: if you want to take any sort of difference, either Fph - Fp or F(+) - F(-), you MUST specify FP=... and FPH=... on the LABIN line.
Input labels accepted are:
H, K, L                        Indices
FP, SIGFP                      F and Sigma for native
FPH, SIGFPH                    F and Sigma for derivative
FC, PHIC                       F and Phase from model
FPART, PHIPART                 F and Phase from partial structure
DP, SIGDP                      Anomalous difference and Sigma
I, SIGI                        I and Sigma
F(+), SIGF(+)                  F+ and Sigma(F+)
F(-), SIGF(-)                  F- and Sigma(F-) used for anomalous output
I(+), SIGI(+)                  I+ and Sigma(I+)
I(-), SIGI(-)                  I- and Sigma(I-)
FPART_BULK_S, PHIPART_BULK_S   Partial F and Phase for bulk solvent correction
W, FOM                         Weights
PHIB                           Best phase (experimental)
HLA,HLB,HLC,HLD                Hendrickson-Lattman coefficients
FREE                           FreeR flag
ISYM                           (see TRUNCATE)
DUM??                          Dummy labels (output as real)
IDUM??                         Dummy labels (output as integer)
Not all columns are used in the various output formats, see Notes on INPUT and OUTPUT. Also, the contents of the columns which are output may depend on which input columns are assigned by LABIN, see DESCRIPTION above.
Note: when using the DUM?? and IDUM?? labels, the program may generate warnings about column type mismatches. This may happen for instance if an anomalous difference (column type D) is assigned to one of the DUM labels (which is nominally of type R, i.e. 'any other real'). These warnings should be ignored, and the output is not affected.
End input.
If this flag is set, the program expects F and SIGF and will output I and SIGI: I = F*F, SIGI = 2*SIGF*F + SIGF*SIGF. These intensities are not necessarily the same as the measured intensities (pre-TRUNCATE); it is better to use the measured values if you have them.
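The conversion stated above can be written out directly. This sketch uses exactly the two formulae given in the text; the function name f_to_i is hypothetical.

```python
def f_to_i(f, sigf):
    """Convert an amplitude and its sigma to an intensity and sigma.

    Uses the formulae quoted in the text:
        I    = F*F
        SIGI = 2*SIGF*F + SIGF*SIGF
    """
    intensity = f * f
    sig_intensity = 2.0 * sigf * f + sigf * sigf
    return intensity, sig_intensity


if __name__ == "__main__":
    # Illustrative values only: F = 10, SIGF = 1.
    print(f_to_i(10.0, 1.0))
```

As the text notes, intensities regenerated this way need not match the originally measured ones.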
followed by an integer <Nmon>. Every <Nmon>-th reflection within the resolution range is monitored (printed out).
followed by 2 real numbers, <resmin>, <resmax>. This can be used to restrict the output data to the given resolution range.
The F/SIGF or I/SIGI are multiplied by <scale> before output. For SHELX output, if the SCALE keyword is not given then a scale factor is computed so that the maximum intensity is 99999.0 (so as to fit into the output format).
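A minimal sketch of the default SHELX scaling described above: one factor is chosen so that the largest intensity becomes 99999.0, and the same factor is applied to values and sigmas. The function name shelx_auto_scale is hypothetical, and how the program treats amplitude (HKLF 3) output or non-positive maxima is not stated in the text, so the error raised here is an assumption.

```python
def shelx_auto_scale(intensities, target=99999.0):
    """Return the factor that maps the largest intensity onto `target`."""
    imax = max(intensities)
    if imax <= 0.0:
        # ASSUMPTION: the manual does not cover this case.
        raise ValueError("no positive intensity to scale against")
    return target / imax


if __name__ == "__main__":
    # Illustrative values only.
    data = [(10.0, 1.0), (200000.0, 500.0)]
    scale = shelx_auto_scale([i for i, _ in data])
    for i, sig in data:
        print(f"{i * scale:8.1f}{sig * scale:8.1f}")
```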
Each secondary keyword is followed by a number setting the appropriate limit for excluding data. Possible keywords are FREER.
Each secondary keyword is followed by a number setting the appropriate limit for excluding data. Possible keywords are SIGP, SIGH, DIFF, FPMAX, FPHMAX, FREER. If DP is assigned without FP then the exclusion criterion for DIFF is applied to |DP|.
The reflections with FreeRflag = <num> are treated as the freeR set: the default is 0 if FREE is assigned. This is important if you want to include a free-R test in your XPLOR/CNS or SHELX refinement, or you are using the Pannu-Read version of TNT. The FREE column must be assigned with LABIN.
By default, if any data associated with a reflection are missing, i.e. are represented in HKLIN by a Missing Number Flag (MNF), then that reflection will not appear in the output. However, if the keyword MISS is given then these reflections will be output, but with the MNFs converted to <valm>. The latter need not be given, and defaults to 0.0. The other exclusions are still effective. Note that mmCIF output is a special case, and the mmCIF character '?' is used to denote missing values. This keyword is therefore ignored for mmCIF output.
Also, if MISS is present, then when producing isomorphous data, i.e. |FPH-FP|, if either FPH or FP is an MNF the difference is set to zero and the sigma is twice the measured sigma. For example, if FP=MNF, SIGFP=MNF, FPH=100 and SIGFPH=10, then FPH-FP = 0 and SIG=20.
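The MISS rule for isomorphous differences can be sketched as below, with None standing in for a Missing Number Flag. Only the one-value-missing branch comes from the text. The quadrature sigma for two measured values and the error when both are missing are assumptions, and the function name miss_difference is hypothetical.

```python
import math


def miss_difference(fp, sigfp, fph, sigfph):
    """Return (|FPH-FP|, sigma), with None representing an MNF."""
    if fp is None and fph is None:
        # ASSUMPTION: the manual does not describe this case.
        raise ValueError("both FP and FPH are missing")
    if fp is None:
        # Rule from the text: zero difference, twice the measured sigma.
        return 0.0, 2.0 * sigfph
    if fph is None:
        return 0.0, 2.0 * sigfp
    # ASSUMPTION: sigmas combined in quadrature when both are measured.
    return abs(fph - fp), math.hypot(sigfp, sigfph)


if __name__ == "__main__":
    # The example from the text: FP=MNF, SIGFP=MNF, FPH=100, SIGFPH=10.
    print(miss_difference(None, None, 100.0, 10.0))
```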
Not all INPUT columns are accepted with a particular OUTPUT format. If one has OUTPUT <subkw> then the allowed input columns are given below (see LABIN and OUTPUT):
You may still have trouble getting exactly the output you want. You can use the UNIX utilities cut(1) or sed(1) to manipulate the mtz2various output.
All reflections in the MTZ input file will be output to the CIF file. However, there are ways to flag certain reflections with the data type _refln.status. Observed reflections will be flagged with 'o'. Unobserved reflections, i.e. those flagged as missing in the relevant amplitude or intensity column, will be flagged as 'x'; these reflections will not be added to _reflns.number_obs. The 'free' reflections will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set. Systematically absent reflections are flagged with '-'.
If the RESO keyword is specified then reflections at higher or lower resolution than the limits given, will be written with _refln.status 'h' or 'l' respectively. The limits will be written to the CIF as the values of _refine.ls_d_res_high and _refine.ls_d_res_low.
If EXCLUDE SIGP is given then reflections for which F < <value>*sigma(F),
and which satisfy the resolution limits (if given), will be written with
_refln.status '<'. The value of _reflns.number_obs excludes all reflections
which do not satisfy the condition on sigma(F). All other sub-keywords of
EXCLUDE are ignored for CIF output.
NB: The translation of the RESOLUTION and EXCLUDE SIGP conditions to
_refln.status values does not imply that the use of these conditions is
good crystallographic practice. Be prepared to justify why you have excluded
any data from your final refinement!
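The _refln.status codes described in these notes can be summarised in a short sketch. The codes themselves come from the text, as does the rule that '<' applies only to reflections inside the resolution limits. The remaining order of precedence is an assumption, and the function name refln_status and its arguments are hypothetical.

```python
def refln_status(f, sigf, d, *, is_free=False, is_absent=False,
                 d_min=None, d_max=None, sig_cut=None):
    """Return a _refln.status code for one reflection.

    f, sigf  : amplitude and sigma (f is None if flagged as missing)
    d        : d-spacing of the reflection in Angstrom
    d_min    : high-resolution limit, d_max : low-resolution limit
    sig_cut  : the EXCLUDE SIGP value, if given
    ASSUMPTION: the precedence below is illustrative, except that the
    resolution tests come before the sigma test, as the text requires.
    """
    if is_absent:
        return "-"   # systematically absent
    if f is None:
        return "x"   # unobserved (missing value)
    if d_min is not None and d < d_min:
        return "h"   # beyond the high-resolution limit
    if d_max is not None and d > d_max:
        return "l"   # beyond the low-resolution limit
    if sig_cut is not None and f < sig_cut * sigf:
        return "<"   # F < value*sigma(F)
    if is_free:
        return "f"   # member of the free set (see FREEVAL)
    return "o"       # observed


if __name__ == "__main__":
    # Illustrative values only.
    print(refln_status(100.0, 10.0, 2.5))
    print(refln_status(5.0, 10.0, 2.5, sig_cut=1.0))
```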
Below is a list of the items output to the CIF file:
_entry.id
_audit.revision_id
_audit.creation_date
_audit.creation_method
_audit.update_record
_cell.entry_id
_cell.length_a
_cell.length_b
_cell.length_c
_cell.angle_alpha
_cell.angle_beta
_cell.angle_gamma
_symmetry.entry_id
_symmetry.Int_Tables_number
_symmetry.space_group_name_H-M
_symmetry_equiv.id
_symmetry_equiv.pos_as_xyz
_reflns.entry_id
_reflns.d_resolution_high
_reflns.d_resolution_low
_reflns.limit_h_max
_reflns.limit_h_min
_reflns.limit_k_max
_reflns.limit_k_min
_reflns.limit_l_max
_reflns.limit_l_min
_reflns.number_all
_reflns.number_obs
_diffrn_radiation_wavelength.id
_exptl_crystal.id
_reflns_scale.group_code

These items are the ones per reflection.

_refln.wavelength_id                       Always written
_refln.crystal_id                          Always written
_refln.scale_group_code                    Always written
_refln.index_h                             Always written
_refln.index_k                             Always written
_refln.index_l                             Always written
_refln.status                              Always written
_refln.F_meas_au                           FP
_refln.F_meas_sigma_au                     SIGFP
_refln.F_calc                              FC
_refln.phase_calc                          PHIC
_refln.phase_meas                          PHIB
_refln.fom                                 FOM
_refln.intensity_meas                      I
_refln.intensity_sigma                     SIGI
_refln.ebi_F_xplor_bulk_solvent_calc       FPART_BULK_S
_refln.ebi_phase_xplor_bulk_solvent_calc   PHIPART_BULK_S
_refln.pdbx_HL_A_iso                       HLA
_refln.pdbx_HL_B_iso                       HLB
_refln.pdbx_HL_C_iso                       HLC
_refln.pdbx_HL_D_iso                       HLD
_refln.pdbx_F_meas_plus                    F(+)
_refln.pdbx_F_plus_sigma                   SIGF(+)
_refln.pdbx_F_minus                        F(-)
_refln.pdbx_F_minus_sigma                  SIGF(-)
_refln.pdbx_anom_difference                DP
_refln.pdbx_anom_difference_sigma          SIGDP
_refln.pdbx_I_plus                         I(+)
_refln.pdbx_I_plus_sigma                   SIGI(+)
_refln.pdbx_I_minus                        I(-)
_refln.pdbx_I_minus_sigma                  SIGI(-)
Important note: In the 6.0 version of MTZ2VARIOUS, the tokens associated with anomalous data (such as _refln.pdbx_F_meas_plus) and with Hendrickson-Lattman coefficients have been updated to use the PDB exchange dictionary, replacing those from the CCP4 harvest dictionary. This is a change in nomenclature only and the new tokens are accepted by the deposition sites.
mmCIF (at least at version 0.8) makes no provision for the output of derivative data in the same data block as native data. For more information about what these mmCIF categories are, check out the mmCIF dictionary.
#   Output a file suitable for input to CNS or XPLOR
#
mtz2various HKLIN nicona HKLOUT dell.hkl << EOF
RESOLUTION 10000 2
OUTPUT XPLOR
EXCLUDE SIGP 0.01   # to exclude unmeasured refl.
LABIN FP=F SIGFP=SIGF FREE=FreeR_flag
END
EOF

#   Output a file suitable for shelx solution or refinement
mtz2various HKLIN aucn_trn-unique.mtz HKLOUT aucn_I.hkl <<eof
LABIN I=IMEAN SIGI=SIGIMEAN FREE=FreeR_flag
OUTPUT SHELX
END
eof

#   Output a file suitable for shelxd to find heavy atom sites
mtz2various HKLOUT $CCP4_SCR/toxd.hkl hklin $CEXAM/toxd/toxd <<EOF
LABIN FP=FTOXD3 SIGFP=SIGFTOXD3 FPH=FAU20 SIGFPH=SIGFAU20
OUTPUT SHELX   # program will recognise this is an isomorphous difference
RESOLUTION 10 3.5
END
EOF

#   Output a file suitable for shelxd to find heavy atom sites from anomalous differences
mtz2various HKLOUT $CCP4_SCR/toxd.hkl hklin $CEXAM/toxd/toxd <<EOF
LABIN DP=DANOAU20 SIGDP=SIGDANOAU20
# or, instead of the LABIN line above:
# LABIN FP=FAU20(-) SIGFP=SIGFAU20(-) FPH=FAU20(+) SIGFPH=SIGFAU20(+)
OUTPUT SHELXD
RESOLUTION 10 3.5
END
EOF

#   Output a foo.phs file suitable for XtalView map calculation after a REFMAC5 refinement
mtz2various HKLOUT $CCP4_SCR/toxd.phs hklin $CEXAM/toxd/toxd-refmac5 <<EOF
LABIN DUM1=FWT DUM2=SIGFP DUM3=PHWT
OUTPUT USER '(3i5,3f12.1)'
RESOLUTION 10 3.5
END
EOF
A runnable unix example script is in $CEXAM/unix/runnable/
A non-runnable unix example script which demonstrates mtz2various used to output anomalous data is in $CEXAM/unix/non-runnable/
mtzdump, f2mtz, SFTOOLS, cut(1), sed(1)
Eleanor Dodson, York University