mmCIF FORMAT: (CCP4: Formats)
NAME
mmCIF format for CCP4 - the mmCIF format as used in CCP4
The macromolecular Crystallographic Information File (mmCIF) format
was developed by a working group of the IUCr formed in 1990. It
extends the CIF format used by small-molecule
crystallographers, which is also used for automatic submission to Acta
Crystallographica C. mmCIF files are text files with a flexible format
based on either <data_name> <data_value> pairs or a
loop structure, which works like a table. In particular, a wide variety of
data items are supported (as defined in the mmCIF dictionary), and
character data values may be lengthy and descriptive. This alleviates
many of the restrictions of the traditional PDB format.
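As a rough illustration of the two layouts described above, the following Python sketch reads a small, made-up mmCIF fragment containing both <data_name> <data_value> pairs and one loop. The function name parse_simple_mmcif and the sample values are hypothetical. The sketch deliberately ignores multi-line text fields (delimited by ';'), save frames and the finer points of CIF quoting, so it is not a substitute for a real mmCIF library.

```python
import shlex


def _is_structural(token):
    """True for tokens that start a new item, loop or data block."""
    return token == "loop_" or token.startswith(("_", "data_"))


def parse_simple_mmcif(text):
    """Parse a restricted mmCIF subset into (pairs, loops).

    pairs: dict mapping data names to values for items outside loops.
    loops: list of tables; each table is a list of row dicts.
    Assumption of this sketch: every value fits on one line and quoted
    values can be split with shell-like rules (shlex).
    """
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):   # skip blanks and comments
            tokens.extend(shlex.split(line))

    pairs, loops = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):              # data block header
            i += 1
        elif tok == "loop_":                     # table: names, then values
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not _is_structural(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names):
                raise ValueError("loop values do not fill whole rows")
            loops.append([dict(zip(names, values[j:j + len(names)]))
                          for j in range(0, len(values), len(names))])
        elif tok.startswith("_"):                # single name/value pair
            if i + 1 >= len(tokens):
                raise ValueError("missing value for " + tok)
            pairs[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError("unexpected token: " + tok)
    return pairs, loops


SAMPLE = """\
data_example
_cell.length_a   58.39
_symmetry.space_group_name_H-M   'P 21 21 21'
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
1 N  10.000
2 CA 11.200
"""

pairs, loops = parse_simple_mmcif(SAMPLE)
```

Here pairs holds the two single-valued items, and loops[0] holds one dictionary per atom row, which is the sense in which a loop works like a table.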
Full details of the mmCIF format can be found on the
IUCr mmCIF Page.
Central to the format is the
dictionary of allowed data items. Note that data
items are grouped into categories. As of January 2002, the mmCIF dictionary
is at version 2.0.03. The dictionary is designed to be extensible, and new
data items are added with new versions.
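The grouping of data items into categories is visible in the data names themselves, which take the form _category.item (for example _atom_site.label_alt_id belongs to category ATOM_SITE). The short Python sketch below groups a list of data names on that basis. The function name group_by_category is hypothetical, and no dictionary lookup or validation is attempted.

```python
from collections import defaultdict


def group_by_category(data_names):
    """Group mmCIF data names of the form _category.item by category."""
    groups = defaultdict(list)
    for name in data_names:
        # Drop the single leading underscore, then split at the first dot.
        bare = name[1:] if name.startswith("_") else name
        category, _, item = bare.partition(".")
        groups[category].append(item)
    return dict(groups)


grouped = group_by_category([
    "_cell.length_a",
    "_cell.length_b",
    "_atom_site.label_alt_id",
])
```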
An mmCIF dictionary is distributed with the CCP4 suite as
$CCP4/lib/data/cif_mm.dic, consisting
of the standard mmCIF dictionary together with some additional data items
required for data harvesting and some data
items for TLS refinement. The CCIF software
library uses a binary symbol table representation of the mmCIF dictionary
which is produced during the CCP4 build.
The mmCIF format is currently used in the CCP4 Suite in the following ways:
- Data Harvesting:
A limited number of programs write out data harvesting files into
a subdirectory of HARVESTHOME (which defaults to the home directory) for
subsequent transfer to deposition sites at the time of structure deposition.
These files are in mmCIF format.
- The CCP4 distribution includes Peter Keller's
CCIF software library for reading and writing mmCIF files. Some of
the harvest files are produced using this library.
- The CCP4 distribution also includes a set of
library routines which perform the same function for mmCIF that the
rwbrook library performs for the PDB format.
- Reflection files in mmCIF format can
be created by the program MTZ2VARIOUS.
This format is suitable for deposition of structure factors.
- Version 2.7 of RASMOL will read and
display coordinate files in mmCIF format.
- An emacs lisp file for a CIF major mode is distributed as
$CCP4/include/cif.el.
- The refinement program REFMAC (from version 5 onwards) stores
restraint information and other intermediate files in mmCIF format.
- The forthcoming MMDB
software library for coordinate data will read and write coordinate
files in mmCIF format, as well as PDB format and an internal binary
format.
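The HARVESTHOME default mentioned under Data Harvesting above can be sketched in Python as follows. The function name harvest_home is hypothetical, and treating an empty HARVESTHOME as unset is an assumption of this sketch rather than documented CCP4 behaviour. The name of the subdirectory that the programs write into is not given here, so it is left out.

```python
import os


def harvest_home(environ=None):
    """Return the directory under which harvesting files are collected.

    Uses HARVESTHOME if it is set, otherwise the user's home directory.
    Assumption: an empty HARVESTHOME is treated the same as an unset one.
    """
    env = os.environ if environ is None else environ
    return env.get("HARVESTHOME") or os.path.expanduser("~")
```

Passing a dictionary as environ makes the fallback easy to check without changing the real environment.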
Overview of some useful mmCIF categories
The following categories cover the information typically held
in CCP4-PDB files:
- CELL
- cell dimensions (replacing the CRYST1 card).
- SYMMETRY
- spacegroup name or number (not always included in CCP4-PDB files).
- ATOM_SITES
- cell transformations (replacing the SCALEx cards).
- ATOM_SITE
- atom site information (replacing the ATOM, HETATM and ANISOU cards).
Note:
- ATOM_SITES_ALT, ATOM_SITES_ALT_ENS, ATOM_SITES_ALT_GEN
- these categories are pointed to by _atom_site.label_alt_id and give more
information on alternative conformations. _atom_site.label_alt_id alone is
sufficient for programs in their current form.
- ATOM_SITE_ANISOTROP
- this contains data items that are alternate_exclusive to those in
category ATOM_SITE: in general, it is simpler to use the latter.
However, when there is anisotropic U data for only a small subset
of atoms, e.g. for metal ions only, then it might be more convenient
to use a separate category.
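To make the correspondence between the CELL and SYMMETRY categories and the CRYST1 card concrete, the Python sketch below formats a CRYST1 record from a dictionary of mmCIF items. The function name cryst1_record and the sample values are hypothetical. Leaving the spacegroup blank and using Z = 1 when those items are absent are assumptions of this sketch, not CCP4 behaviour.

```python
def cryst1_record(items):
    """Build a PDB CRYST1 record from mmCIF CELL/SYMMETRY items.

    items maps data names (e.g. "_cell.length_a") to string values.
    Assumptions: a missing spacegroup is left blank and a missing
    _cell.Z_PDB is written as 1.
    """
    lengths = [float(items["_cell.length_" + axis]) for axis in "abc"]
    angles = [float(items["_cell.angle_" + name])
              for name in ("alpha", "beta", "gamma")]
    spacegroup = items.get("_symmetry.space_group_name_H-M", "")
    z = int(items.get("_cell.Z_PDB", 1))
    # Fixed-width layout: 3 x 9.3f lengths, 3 x 7.2f angles,
    # a blank, an 11-character spacegroup field and a 4-digit Z.
    return ("CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f} {:<11s}{:4d}"
            .format(*lengths, *angles, spacegroup, z))


record = cryst1_record({
    "_cell.length_a": "58.39",
    "_cell.length_b": "86.70",
    "_cell.length_c": "46.27",
    "_cell.angle_alpha": "90.00",
    "_cell.angle_beta": "90.00",
    "_cell.angle_gamma": "90.00",
    "_symmetry.space_group_name_H-M": "P 21 21 21",
    "_cell.Z_PDB": "4",
})
```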
In addition, the following categories are also useful:
- AUDIT
- information on how the file was created and subsequently modified.
- ENTITY
- defines polymer/non-polymer/water entities.
- ENTITY_POLY_SEQ
- sequence information. Ideally, this should correspond to the
sequence in the ATOM_SITE category, although there are exceptions,
e.g. if the latter describes a temporary poly-ALA model.
- STRUCT_ASYM
- describes the contents of the asymmetric unit.
- STRUCT_CONN
- describes disulphides, salt bridges and hydrogen bonds. The first
of these (disulphides) would be useful for PROTIN.
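As a sketch of how the disulphides might be picked out of STRUCT_CONN, the Python function below filters loop rows (one dictionary per row) on _struct_conn.conn_type_id. It assumes the dictionary's connection type code disulf for disulphide bridges. The function name disulphide_pairs and the sample rows are hypothetical, and nothing here reflects how PROTIN itself reads its input.

```python
def disulphide_pairs(struct_conn_rows):
    """Return (seq_id_1, seq_id_2) for each disulphide in STRUCT_CONN rows.

    Each row is a dict keyed by full data name. Rows of other connection
    types (e.g. salt bridges, hydrogen bonds) are skipped.
    """
    pairs = []
    for row in struct_conn_rows:
        if row.get("_struct_conn.conn_type_id") == "disulf":
            pairs.append((row["_struct_conn.ptnr1_label_seq_id"],
                          row["_struct_conn.ptnr2_label_seq_id"]))
    return pairs


# Made-up rows for illustration only.
rows = [
    {"_struct_conn.id": "C1",
     "_struct_conn.conn_type_id": "disulf",
     "_struct_conn.ptnr1_label_seq_id": "6",
     "_struct_conn.ptnr2_label_seq_id": "127"},
    {"_struct_conn.id": "C2",
     "_struct_conn.conn_type_id": "hydrog",
     "_struct_conn.ptnr1_label_seq_id": "10",
     "_struct_conn.ptnr2_label_seq_id": "14"},
]
bridges = disulphide_pairs(rows)
```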
See Also
- The imgCIF Dictionary
- The image CIF dictionary (imgCIF) is a CIF dictionary of data names
required by the Crystallographic Binary File (CBF) image representation project.
imgCIF/CBF is an initiative to extend the IUCr CIF concept to cover efficient
storage of 2-D area detector data and other large datasets.
- The Symmetry CIF Dictionary
- The symmetry CIF dictionary (symCIF) is a supplement to the Core dictionary
designed to provide the data names required to describe crystallographic symmetry.