mtz2cif - produce an mmCIF reflection file suitable for deposition. This may contain amplitudes, intensities and/or anomalous differences.
mtz2cif hklin foo_in.mtz hklout foo_out.cif
[Keyworded input]
MTZ2CIF reads an MTZ file (assigned to HKLIN) and produces an mmCIF file (assigned to HKLOUT) in a form suitable for deposition with the PDB. The user must specify which quantities are to be exported via the LABIN keyword; cell and symmetry information is taken directly from the MTZ file.
It is also possible to export multiple MTZ datasets to a single mmCIF file by specifying multiple LABIN lines.
The allowed keywords are:
DATABLOCK, END, EXCLUDE, FREEVAL, LABIN, MODE, RESOLUTION
Compulsory input keywords are DATABLOCK and LABIN.
DATABLOCK <data block header>
(Compulsory)
<data block header> is at most 80 characters long and must begin with the characters "data_" (any mixture of upper and lower case may follow).
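The header rule above can be checked before running the program. The following is a minimal sketch that uses a hypothetical POSIX shell function, valid_datablock, which is not part of CCP4. It accepts a header only if the header is at most 80 characters long and starts with the literal string "data_". It assumes the prefix itself must be lowercase, as written above, and performs no other CIF syntax checks.

```shell
# Hypothetical helper, not part of CCP4: checks a DATABLOCK header against
# the two rules stated for MTZ2CIF. Returns 0 if valid, 1 otherwise.
valid_datablock() {
    header=$1
    # Rule 1: at most 80 characters.
    if [ "${#header}" -gt 80 ]; then
        return 1
    fi
    # Rule 2: must start with "data_" (assumed lowercase); anything may follow.
    case $header in
        data_*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, valid_datablock data_gere_TEST succeeds, while valid_datablock gere_TEST fails.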
END
Terminates keyworded input.
EXCLUDE SIGP <value>
Only one subkeyword, SIGP, is allowed for EXCLUDE:
Reflections for which F < <value>*sigma(F), and which lie within the resolution limits (if given), are written with _refln.status '<'. Reflections that fail this sigma cutoff are not counted in _reflns.number_obs.
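The sigma cutoff for a single reflection can be sketched as follows. The hypothetical POSIX shell function sigma_flag is not the MTZ2CIF source. It applies only the EXCLUDE SIGP test and ignores the resolution, free-set and missing-data rules. awk is used because the shell itself cannot compare floating-point numbers.

```shell
# Hypothetical helper, not part of CCP4: applies the EXCLUDE SIGP <value>
# test to one reflection. Prints "<" when F < <value>*sigma(F), otherwise "o".
# Other _refln.status rules (resolution, free set, missing data) are ignored.
sigma_flag() {
    # $1 = F, $2 = sigma(F), $3 = <value> given to EXCLUDE SIGP
    awk -v f="$1" -v s="$2" -v cut="$3" \
        'BEGIN { if (f + 0 < (cut + 0) * (s + 0)) print "<"; else print "o" }'
}
```

For example, sigma_flag 1.5 1.0 2.0 prints "<", and sigma_flag 2.5 1.0 2.0 prints "o".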
FREEVAL <num>
Reflections with FreeR flag = <num> are treated as the free-R set; the default is 0 if FREE is assigned. The FREE column must be assigned with LABIN.
LABIN <program label>=<file label> ...
The output is controlled by the labels specified here. Input labels accepted are:
H, K, L          Indices
FP, SIGFP        F and Sigma for native
FC, PHIC         F and Phase from model
DP, SIGDP        Anomalous difference and Sigma
I, SIGI          I and Sigma
F(+), SIGF(+)    F+ and Sigma(F+)
F(-), SIGF(-)    F- and Sigma(F-), used for anomalous output
I(+), SIGI(+)    I+ and Sigma(I+)
I(-), SIGI(-)    I- and Sigma(I-)
W, FOM           Weights
PHIB             Best phase (experimental)
HLA,HLB,HLC,HLD  Hendrickson-Lattman coefficients
FREE             FreeR flag
To output multiple datasets from a single MTZ file to a single CIF, use multiple LABIN lines (one per dataset). In CIF, a dataset corresponds to a unique crystal/wavelength pair. The program assumes that the crystal and dataset information is correctly set up in the MTZ file - see the MTZ documentation for more details about crystals and datasets in MTZ files.
There are restrictions on the use of multiple datasets:
Note that writing multiple datasets involves non-standard CIF tokens; these need to be agreed with the RCSB and EBI. If only a single dataset is written, the resulting CIF should conform to the existing standards.
MODE PDBX | CCP4
Default: PDBX
Specifies which set of _refln.* tokens is used to write anomalous data and Hendrickson-Lattman coefficients to the output CIF.
The CCP4 exchange dictionary corresponds to the token set for the old MTZ2VARIOUS CIF output.
RESOLUTION <resmin> <resmax>
Specifies the minimum (<resmin>) and maximum (<resmax>) resolution limits in Angstroms. Reflections outside these limits are still output, but are flagged 'l' (beyond the low-resolution limit) or 'h' (beyond the high-resolution limit).
The limits will be written to the CIF as the values of _reflns.d_resolution_high and _reflns.d_resolution_low.
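The 'l'/'h' classification can be sketched with a hypothetical POSIX shell function, resolution_flag, which is not part of CCP4. It takes the resolution d of a reflection in Angstroms; computing d from the cell and indices is left out. It also takes the two RESOLUTION values. It assumes the larger value is the low-resolution limit and the smaller value the high-resolution limit, whichever order they are given in. It also assumes that a reflection lying exactly on a limit counts as inside the range.

```shell
# Hypothetical helper, not part of CCP4: classifies one reflection of
# resolution d (Angstroms) against RESOLUTION <resmin> <resmax>.
# Prints "l" (beyond the low-resolution limit), "h" (beyond the
# high-resolution limit) or "o" (inside the limits; other rules may still apply).
resolution_flag() {
    awk -v d="$1" -v r1="$2" -v r2="$3" 'BEGIN {
        d += 0; r1 += 0; r2 += 0
        low  = (r1 > r2) ? r1 : r2    # low-resolution limit: largest d
        high = (r1 > r2) ? r2 : r1    # high-resolution limit: smallest d
        if (d > low)       print "l"
        else if (d < high) print "h"
        else               print "o"
    }'
}
```

For example, with RESOLUTION 50 2, a reflection at 60 A gives "l", one at 1.5 A gives "h", and one at 3 A gives "o".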
All reflections in the MTZ input file are output to the CIF file, but individual reflections are flagged using the _refln.status data item. Observed reflections are flagged 'o'. Unobserved reflections, i.e. those flagged as missing in all the relevant amplitude and/or intensity columns, are flagged 'x'; these reflections are not counted in _reflns.number_obs.
The 'free' reflections will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set. Systematically absent reflections are flagged with '-'. Note that 'free' reflections are counted as 'observed' when outputting the total number of observed reflections to _reflns.number_obs.
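The counting rule for _reflns.number_obs can be sketched with a hypothetical POSIX shell function, count_obs, which is not part of CCP4. It reads one _refln.status value per line on standard input. Per the text, 'o' and 'f' reflections are counted, while 'x' reflections and reflections flagged '<' by EXCLUDE SIGP are not. Treating 'l', 'h' and '-' as not counted is an assumption.

```shell
# Hypothetical helper, not part of CCP4: reads one _refln.status value per
# line and prints how many would count towards _reflns.number_obs.
# Only 'o' (observed) and 'f' (free) are counted; excluding 'l', 'h'
# and '-' is an assumption.
count_obs() {
    awk '$1 == "o" || $1 == "f" { n++ } END { print n + 0 }'
}
```

For example, printf 'o\nf\nx\n<\no\n' | count_obs prints 3.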
Note that translating the RESOLUTION and EXCLUDE SIGP conditions into _refln.status values does not imply that applying these conditions is good crystallographic practice. Be prepared to justify why you have excluded any data from your final refinement.
The mmCIF character '?' is used to denote missing values.
The output of anomalous data from MTZ to CIF is still not completely resolved. The OUTPUT CIF option in older versions of the MTZ2VARIOUS program had no CIF tokens corresponding to F(+)/F(-) or anomalous differences. Anomalous data were therefore converted to explicit hkl/-h-k-l pairs, with the corresponding F(+) or F(-) value written to _refln.F_meas_au as appropriate.
With explicit tokens for anomalous data, this approach is no longer necessary: only hkl needs to be written. Note, however, that there is some ambiguity if only mean FP is supplied (i.e. without anomalous differences or supporting F(+) and F(-) pairs). In this case MTZ2CIF writes only one reflection to the CIF per reflection in the MTZ file.
Note also that while the CIF2MTZ program can recognise the anomalous tokens (as of CCP4 v6.0), other programs such as SFCHECK may not handle the anomalous data in the CIF correctly.
MTZ2CIF can write multiple crystals and datasets from a single MTZ file into a single CIF. This is done by specifying multiple LABIN lines (one for each dataset).
Each LABIN line corresponds to a unique _refln.crystal_id and _refln.wavelength_id pair in the output reflection list. Additional non-standard CIF tokens are written to the _cell, _reflns and _diffrn_radiation_wavelength categories so that their contents can be related correctly to the crystals and wavelengths that have been output. These are the items marked "only for multiple datasets" in the list of output items.
Note that at present neither CIF2MTZ nor SFCHECK can deal with multiple crystals and datasets.
Below is a list of the items output to the CIF file:
_entry.id
_audit.revision_id
_audit.creation_date
_audit.creation_method
_audit.update_record
_cell.entry_id
_cell.CCP4_wavelength_id (only for multiple datasets)
_cell.CCP4_crystal_id (only for multiple datasets)
_cell.length_a
_cell.length_b
_cell.length_c
_cell.angle_alpha
_cell.angle_beta
_cell.angle_gamma
_symmetry.entry_id
_symmetry.Int_Tables_number
_symmetry.space_group_name_H-M
_symmetry_equiv.id
_symmetry_equiv.pos_as_xyz
_reflns.entry_id
_reflns.CCP4_wavelength_id (only for multiple datasets)
_reflns.CCP4_crystal_id (only for multiple datasets)
_reflns.d_resolution_high
_reflns.d_resolution_low
_reflns.limit_h_max
_reflns.limit_h_min
_reflns.limit_k_max
_reflns.limit_k_min
_reflns.limit_l_max
_reflns.limit_l_min
_reflns.number_all
_reflns.number_obs
_diffrn_radiation_wavelength.CCP4_crystal_id (only for multiple datasets)
_diffrn_radiation_wavelength.id
_exptl_crystal.id
_reflns_scale.group_code
The following items are one per reflection:
CIF item                                  Label
--------------------------------------------------------
_refln.wavelength_id                      Always written
_refln.crystal_id                         Always written
_refln.scale_group_code                   Always written
_refln.index_h                            Always written
_refln.index_k                            Always written
_refln.index_l                            Always written
_refln.status                             Always written
_refln.F_meas_au                          FP
_refln.F_meas_sigma_au                    SIGFP
_refln.F_calc                             FC
_refln.phase_calc                         PHIC
_refln.phase_meas                         PHIB
_refln.fom                                FOM
_refln.intensity_meas                     I
_refln.intensity_sigma                    SIGI
_refln.ebi_F_xplor_bulk_solvent_calc      FPART_BULK_S
_refln.ebi_phase_xplor_bulk_solvent_calc  PHIPART_BULK_S
The following items are also one per reflection; the exact token depends on which token set (selected with the MODE keyword) is being written:
PDBX                               CCP4                                   Label
-------------------------------------------------------------------------------
_refln.pdbx_HL_A_iso               _refln.ccp4_SAD_HL_A_iso               HLA
_refln.pdbx_HL_B_iso               _refln.ccp4_SAD_HL_B_iso               HLB
_refln.pdbx_HL_C_iso               _refln.ccp4_SAD_HL_C_iso               HLC
_refln.pdbx_HL_D_iso               _refln.ccp4_SAD_HL_D_iso               HLD
_refln.pdbx_F_meas_plus            _refln.ccp4_SAD_F_meas_plus_au         F(+)
_refln.pdbx_F_meas_plus_sigma      _refln.ccp4_SAD_F_meas_plus_sigma_au   SIGF(+)
_refln.pdbx_F_meas_minus           _refln.ccp4_SAD_F_meas_minus_au        F(-)
_refln.pdbx_F_meas_minus_sigma     _refln.ccp4_SAD_F_meas_minus_sigma_au  SIGF(-)
_refln.pdbx_anom_difference        _refln.ccp4_SAD_phase_anom             DP
_refln.pdbx_anom_difference_sigma  _refln.ccp4_SAD_phase_anom_sigma       SIGDP
_refln.pdbx_I_plus                 _refln.ccp4_I_plus                     I(+)
_refln.pdbx_I_plus_sigma           _refln.ccp4_I_plus_sigma               SIGI(+)
_refln.pdbx_I_minus                _refln.ccp4_I_minus                    I(-)
_refln.pdbx_I_minus_sigma          _refln.ccp4_I_minus_sigma              SIGI(-)
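The MODE-dependent token lookup can be sketched as a hypothetical POSIX shell function, refln_token, which is not part of CCP4. The token pairs are transcribed from the table above. The sketch assumes the PDBX token for I(-) is _refln.pdbx_I_minus. It accepts only the upper-case mode names PDBX and CCP4, and returns a non-zero status for any other mode or label.

```shell
# Hypothetical helper, not part of CCP4: prints the _refln token written for
# an input label under a given MODE. Token pairs are transcribed from the
# MODE table; the PDBX token for I(-) is assumed to be _refln.pdbx_I_minus.
refln_token() {
    # $1 = MODE (PDBX or CCP4), $2 = input label, e.g. "F(+)"
    case $2 in
        HLA)       pair="pdbx_HL_A_iso ccp4_SAD_HL_A_iso" ;;
        HLB)       pair="pdbx_HL_B_iso ccp4_SAD_HL_B_iso" ;;
        HLC)       pair="pdbx_HL_C_iso ccp4_SAD_HL_C_iso" ;;
        HLD)       pair="pdbx_HL_D_iso ccp4_SAD_HL_D_iso" ;;
        'F(+)')    pair="pdbx_F_meas_plus ccp4_SAD_F_meas_plus_au" ;;
        'SIGF(+)') pair="pdbx_F_meas_plus_sigma ccp4_SAD_F_meas_plus_sigma_au" ;;
        'F(-)')    pair="pdbx_F_meas_minus ccp4_SAD_F_meas_minus_au" ;;
        'SIGF(-)') pair="pdbx_F_meas_minus_sigma ccp4_SAD_F_meas_minus_sigma_au" ;;
        DP)        pair="pdbx_anom_difference ccp4_SAD_phase_anom" ;;
        SIGDP)     pair="pdbx_anom_difference_sigma ccp4_SAD_phase_anom_sigma" ;;
        'I(+)')    pair="pdbx_I_plus ccp4_I_plus" ;;
        'SIGI(+)') pair="pdbx_I_plus_sigma ccp4_I_plus_sigma" ;;
        'I(-)')    pair="pdbx_I_minus ccp4_I_minus" ;;
        'SIGI(-)') pair="pdbx_I_minus_sigma ccp4_I_minus_sigma" ;;
        *)         return 1 ;;
    esac
    # First word of the pair is the PDBX token, second is the CCP4 token.
    case $1 in
        PDBX) echo "_refln.${pair%% *}" ;;
        CCP4) echo "_refln.${pair#* }" ;;
        *)    return 1 ;;
    esac
}
```

For example, refln_token PDBX 'F(+)' prints _refln.pdbx_F_meas_plus, and refln_token CCP4 DP prints _refln.ccp4_SAD_phase_anom.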
Note (2/5/2006): the CCP4 tokens are not recognised by CIF2MTZ, and neither the CCP4 nor the PDBX tokens are recognised by SFCHECK.
Example with a single wavelength:
mtz2cif hklin $CEXAM/tutorial/data/gere_MAD_nat.mtz \
        hklout $CCP4_SCR/gere_MAD_nat.cif <<EOF
labin FP=F_nat SIGFP=SIGF_nat \
      F(+)=F_nat(+) SIGF(+)=SIGF_nat(+) \
      F(-)=F_nat(-) SIGF(-)=SIGF_nat(-) \
      FREE=FreeR_flag
datablock data_gere_TEST
mode PDBX # Default
end
EOF
Example with multiple crystals and wavelengths:
mtz2cif hklin $CEXAM/tutorial/data/gere_MAD_nat.mtz \
        hklout $CCP4_SCR/gere_MAD_nat.cif <<EOF
# Dataset 1
labin FP=F_nat SIGFP=SIGF_nat \
      F(+)=F_nat(+) SIGF(+)=SIGF_nat(+) \
      F(-)=F_nat(-) SIGF(-)=SIGF_nat(-) \
      FREE=FreeR_flag
# Dataset 2
labin FP=F_peak SIGFP=SIGF_peak \
      F(+)=F_peak(+) SIGF(+)=SIGF_peak(+) \
      F(-)=F_peak(-) SIGF(-)=SIGF_peak(-)
# Dataset 3
labin FP=F_infl SIGFP=SIGF_infl \
      F(+)=F_infl(+) SIGF(+)=SIGF_infl(+) \
      F(-)=F_infl(-) SIGF(-)=SIGF_infl(-)
datablock data_gere_TEST
mode PDBX # Default
end
EOF
A runnable unix example script is in $CEXAM/unix/runnable/
Peter Briggs, CCLRC Daresbury