mmCIF FORMAT: (CCP4: Formats)

NAME

mmCIF format for CCP4 - the mmCIF format as used in CCP4

OVERVIEW

The macromolecular Crystallographic Information File (mmCIF) format was developed by a working group of the IUCr formed in 1990. It extends the CIF format used by small-molecule crystallographers, which is also used for automatic submission to Acta Crystallographica C. mmCIF files are text files with a flexible format built from either <data_name> <data_value> pairs or loop structures, which work like tables. A wide variety of data items is supported (as defined in the mmCIF dictionary), and character data values may be lengthy and descriptive. This alleviates many of the restrictions of the traditional PDB format.
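
The two layouts described above (name/value pairs and loops) can be illustrated with a minimal Python sketch. It is not a full CIF parser: it assumes that each data name and its value sit on the same line, that values contain no multi-line text fields, and that shell-style quoting is close enough for the sample. The function name parse_simple_mmcif and the sample values are hypothetical.

```python
import shlex

# Invented sample: three name/value pairs followed by one loop (a table).
SAMPLE = """\
data_example
_cell.length_a   58.39
_cell.length_b   86.70
_cell.length_c   46.27
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  VAL 25.369 30.691 11.795
ATOM 2 CA VAL 25.970 31.965 12.332
"""


def parse_simple_mmcif(text):
    """Return (items, loops) for a simplified mmCIF text.

    items: dict mapping data name -> value (all values kept as strings)
    loops: list of loops, each loop a list of row dicts
    """
    items = {}
    loops = []
    names = None       # column names of the loop being read, or None
    values = []        # flat list of values collected for that loop
    in_header = False  # True while still reading the loop's data names

    def close_loop():
        # Cut the flat value list into rows, one value per column name.
        nonlocal names, values
        if names:
            n = len(names)
            rows = [dict(zip(names, values[i:i + n]))
                    for i in range(0, len(values), n)]
            loops.append(rows)
        names, values = None, []

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if line.startswith("data_"):
            close_loop()                  # a new data block starts
            continue
        if line == "loop_":
            close_loop()
            names, in_header = [], True
            continue
        if line.startswith("_"):
            if names is not None and in_header:
                names.append(line)        # another column of the loop
                continue
            close_loop()                  # a plain name/value pair
            tokens = shlex.split(line)
            items[tokens[0]] = tokens[1]
            continue
        if names is not None:             # a row (or part of a row) of data
            in_header = False
            values.extend(shlex.split(line))
    close_loop()
    return items, loops


items, loops = parse_simple_mmcif(SAMPLE)
print(items["_cell.length_a"])
for row in loops[0]:
    print(row["_atom_site.label_atom_id"], row["_atom_site.Cartn_x"])
```

Keeping every value as a string leaves the interpretation of types to the dictionary, which is where mmCIF defines them.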

Full details of the mmCIF format can be found on the IUCr mmCIF Page. Central to the format is the dictionary of allowed data items, which are grouped into categories. As of January 2002, the mmCIF dictionary is at version 2.0.03. The dictionary is designed to be extensible, and new data items are added in new versions.

An mmCIF dictionary is distributed with the CCP4 suite as $CCP4/lib/data/cif_mm.dic. It consists of the standard mmCIF dictionary together with some additional data items required for data harvesting and some data items for TLS refinement. The CCIF software library uses a binary symbol table representation of the mmCIF dictionary, which is produced during the CCP4 build.

The mmCIF format is currently used in the CCP4 Suite in the following ways:

Overview of some useful mmCIF categories

The following categories cover the information typically held in CCP4-PDB files:
CELL
cell dimensions (replacing CRYST1 card).
SYMMETRY
spacegroup name or number (not always included in CCP4-PDB files).
ATOM_SITES
cell transformations (replacing SCALEx cards).
ATOM_SITE
atom site information (replacing ATOM, HETATM, ANISOU cards).
Note:
ATOM_SITES_ALT, ATOM_SITES_ALT_ENS, ATOM_SITES_ALT_GEN
pointed to by _atom_site.label_alt_id; these categories give more information on alternative conformations. For programs in their current form, _atom_site.label_alt_id alone is sufficient.
ATOM_SITE_ANISOTROP
this contains data items that are alternate_exclusive to those in category ATOM_SITE; in general, it is simpler to use the latter. However, when anisotropic U data exist for only a small subset of atoms, e.g. for metal ions only, it might be more convenient to use the separate category.
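
As an illustration of how a PDB card maps onto these categories, the following Python sketch converts a CRYST1 card into CELL and SYMMETRY data items. The column positions follow the fixed-column CRYST1 layout of the PDB format; the function names cryst1_to_mmcif and format_items and the sample cell are hypothetical, and the Z value at the end of the card is ignored. It is a sketch, not the conversion used by any CCP4 program.

```python
def cryst1_to_mmcif(card):
    """Map one CRYST1 card onto _cell and _symmetry data items (as strings)."""
    if not card.startswith("CRYST1"):
        raise ValueError("not a CRYST1 card")
    # Zero-based slices for the fixed columns of the CRYST1 record.
    columns = {
        "_cell.length_a": (6, 15),
        "_cell.length_b": (15, 24),
        "_cell.length_c": (24, 33),
        "_cell.angle_alpha": (33, 40),
        "_cell.angle_beta": (40, 47),
        "_cell.angle_gamma": (47, 54),
    }
    items = {name: card[lo:hi].strip() for name, (lo, hi) in columns.items()}
    # The space group is optional here, since CCP4-PDB files may omit it.
    space_group = card[55:66].strip()
    if space_group:
        items["_symmetry.space_group_name_H-M"] = space_group
    return items


def format_items(items):
    """Write the items as mmCIF name/value lines, quoting values with spaces."""
    lines = []
    for name, value in items.items():
        if " " in value:
            value = "'%s'" % value
        lines.append("%-34s %s" % (name, value))
    return "\n".join(lines)


# Build an invented sample card with the column widths of the record.
card = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    58.39, 86.70, 46.27, 90.0, 90.0, 90.0, "P 21 21 21", 4)
print(format_items(cryst1_to_mmcif(card)))
```

The same column-slicing approach would apply to the SCALEx and ATOM cards, with ATOM and HETATM records becoming rows of an ATOM_SITE loop rather than single name/value pairs.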

In addition, the following categories are also useful:

AUDIT
information on how the file was created and subsequently modified.
ENTITY
defines polymer/non-polymer/water entities.
ENTITY_POLY_SEQ
sequence information. Ideally, this should correspond to the sequence in the ATOM_SITE category, although there are exceptions, e.g. if the latter describes a temporary poly-ALA model.
STRUCT_ASYM
describes the contents of the asymmetric unit.
STRUCT_CONN
describes disulphides, salt bridges and hydrogen bonds. The first of these (disulphides) would be useful for protin.
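
To show how a program might pick the disulphides out of STRUCT_CONN, here is a small Python sketch. It assumes the loop has already been read into a list of row dictionaries keyed by data name; the rows, the residue numbers and the function name disulphide_pairs are invented for illustration, and how protin itself would consume the result is not shown.

```python
# Invented STRUCT_CONN rows, one dictionary per row of the loop.
STRUCT_CONN_ROWS = [
    {"_struct_conn.id": "disulf1",
     "_struct_conn.conn_type_id": "disulf",
     "_struct_conn.ptnr1_label_asym_id": "A",
     "_struct_conn.ptnr1_label_comp_id": "CYS",
     "_struct_conn.ptnr1_label_seq_id": "6",
     "_struct_conn.ptnr2_label_asym_id": "A",
     "_struct_conn.ptnr2_label_comp_id": "CYS",
     "_struct_conn.ptnr2_label_seq_id": "127"},
    {"_struct_conn.id": "saltbr1",
     "_struct_conn.conn_type_id": "saltbr",
     "_struct_conn.ptnr1_label_asym_id": "A",
     "_struct_conn.ptnr1_label_comp_id": "ASP",
     "_struct_conn.ptnr1_label_seq_id": "18",
     "_struct_conn.ptnr2_label_asym_id": "A",
     "_struct_conn.ptnr2_label_comp_id": "LYS",
     "_struct_conn.ptnr2_label_seq_id": "33"},
]


def disulphide_pairs(rows):
    """Return ((asym_id, seq_id), (asym_id, seq_id)) for each disulphide row."""
    pairs = []
    for row in rows:
        # Only rows whose connection type is "disulf" are disulphides.
        if row["_struct_conn.conn_type_id"].lower() != "disulf":
            continue
        partners = tuple(
            (row["_struct_conn.ptnr%d_label_asym_id" % n],
             int(row["_struct_conn.ptnr%d_label_seq_id" % n]))
            for n in (1, 2)
        )
        pairs.append(partners)
    return pairs


for first, second in disulphide_pairs(STRUCT_CONN_ROWS):
    print("SS bond between", first, "and", second)
```

Salt bridges and hydrogen bonds could be selected the same way by testing for the connection types "saltbr" and "hydrog".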

See Also

The imgCIF Dictionary
The image CIF dictionary (imgCIF) is a CIF dictionary of data names required by the Crystallographic Binary File (CBF) image representation project. imgCIF/CBF is an initiative to extend the IUCr CIF concept to cover efficient storage of 2-D area detector data and other large datasets.
The Symmetry CIF Dictionary
The symmetry CIF dictionary (symCIF) is a supplement to the Core dictionary designed to provide the data names required to describe crystallographic symmetry.