CCP4 Program Suite

fffear HKLIN foo.mtz [XYZIN foo.pdb] [MAPIN foo.map] [MAXIN foo.max] XYZOUT bar.pdb [ SOLIN foo.msk ]
[Keyworded input]

REFERENCE

DESCRIPTION

`fffear' is a package which searches for molecular fragments in poor quality electron density maps. It was inspired by the Uppsala `ESSENS' software (Kleywegt+Jones, 1997), but achieves greater speed and sensitivity through the use of Fast Fourier transforms, maximum likelihood, and a mixed bag of mathematical and computational approaches (Cowtan, 1998). Currently, the main application is the detection of helices in poor electron density maps (5.0A or better), and the detection of beta strands in intermediate electron density maps (4.0A or better). It is also possible to use electron density as a search model, allowing the location of NCS elements. Approximate matches may be refined, and translation searches may be performed using a single orientation. The results are scored using an agreement function based on the mean squared difference between model and map over a masked region.

The program takes as input an mtz file containing the Fourier coefficients of the map to be searched, and a search model in the form of a pdb file, map, or maximum likelihood target. A `fragment mask' is generated to cover the fragment density, and orientations and translations are searched to find those transformations which give a good fit between the fragment density and map density within the fragment mask.
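
The translation part of such a search can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the fffear implementation: the score is taken to be the plain mean squared difference inside the mask, the map is treated as periodic on a P1 grid, and the names `rho`, `frag`, `mask`, `correlate` and `msd_translation_search` are hypothetical. Expanding the squared difference gives three terms, two of which are cross-correlations that an FFT evaluates for every translation at once.

```python
import numpy as np


def correlate(a, b):
    """Periodic cross-correlation c[t] = sum_x a[x] * b[x + t] for real arrays."""
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)))


def msd_translation_search(rho, frag, mask):
    """Mean squared difference between fragment and map inside the mask,
    evaluated for every grid translation t of the fragment:

        score[t] = sum_x mask[x] * (rho[x + t] - frag[x])**2 / sum_x mask[x]
    """
    n_mask = mask.sum()
    map_sq = correlate(mask, rho ** 2)      # sum_x mask[x] * rho[x + t]**2
    cross = correlate(mask * frag, rho)     # sum_x mask[x] * frag[x] * rho[x + t]
    frag_sq = np.sum(mask * frag ** 2)      # does not depend on t
    return (map_sq - 2.0 * cross + frag_sq) / n_mask


def best_translation(score):
    """Grid indices of the lowest (best) score."""
    idx = np.unravel_index(np.argmin(score), score.shape)
    return tuple(int(i) for i in idx)
```

Low scores indicate a good fit, in line with the convention used for the output B-factors. A full search would repeat this calculation for each orientation of the fragment.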

The program has been highly optimised using reciprocal-space rotations, grid-doubling FFTs, and crystallographic symmetry (Rossmann+Arnold, 1993), giving a 4-50 times speed improvement over the results published in 1998. The speed of the calculation is almost independent of the size of the model, so the program may also be used for molecular replacement calculations where weak phases are available.

A maximum likelihood search function is under consideration for future versions.

`fffear' Recipes

INPUT/OUTPUT FILES

HKLIN

Input mtz file - This should contain the conventional (CCP4) asymmetric unit of data (see CAD).

The mtz file should contain all reflections to the limit of the measured diffraction pattern, since all the reflections are used to scale the data accurately. However, only those reflections with phases, to the resolution limit specified by the compulsory RESOLUTION keyword, will actually be used in the search procedure.

XYZIN

Input pdb file. This may contain an arbitrary crystal header, or none at all. The only restriction is that the atomic coordinates are given in Angstroms on arbitrary orthogonal axes. The B-factors of the input atoms should be set to an average value of (around) zero. Normally, all B-factors should be equal, unless some prior information about the B-factors of atoms in the desired fragment is available. (It is legitimate for example to make the B-factors of the C-beta or Oxygen atoms higher).

MAPIN

Input map file. This may be specified as an alternative to XYZIN to perform a search for NCS or cross-crystal operators. The map should contain the search density, placed in a cubic cell. Regions of the map outside the search mask should be set to zero. The search map will usually be generated using maprot in map cutting mode.

An input coordinate file may also be provided on XYZIN. This will not be used for the search, but will be rotated and output for visualisation purposes.

MAXIN

Input maximum likelihood search target. This is used in the same way as an input map, however it also contains density variance information. (Special software is used for the construction of ML targets for fffear). ML targets are resolution dependent, so the appropriate target should be used in conjunction with the RESOLUTION keyword. XYZIN may again be used for visualisation purposes.

XYZOUT

Output pdb file. This contains multiple copies of the input fragment, rotated and translated to the positions of the best matches between the fragment density and the map density. The fragments are sorted in order of quality, with the best first. The B-factor is set to the value of the search function, with low values representing a good fit.

Good matches to major secondary structure features are usually obvious because several fragments link up or overlap in sensible manners. At better than 4.0A resolution, the direction of the chain is commonly correct as well.

MAPOUT

A map of the best fragment fit at each position in the map. Values closest to zero represent the best fit.

SOLIN

Input mask - this is used as a filter for the results. Any rotation/translation solutions whose centre-of-mass falls in the solvent (zero) region of the mask will be excluded from the output. If no mask is given, the whole cell is allowed.

There is generally no point in providing a solvent mask, since solvent density does not usually match atomic features. However, a mask may be useful when fitting a molecular replacement map from a very incomplete model, to exclude hits to the MR model.
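
The filter described above can be sketched in a few lines of Python. This is only an illustration: it assumes the mask covers the whole unit cell as a nested list indexed `[x][y][z]`, that centres of mass are given in fractional coordinates, and that a nearest-grid-point lookup is good enough. The function names are hypothetical and the program's own lookup may differ.

```python
def in_allowed_region(mask, frac_xyz):
    """True if the mask grid point nearest to a fractional coordinate is non-zero."""
    dims = (len(mask), len(mask[0]), len(mask[0][0]))
    # Wrap each coordinate into the unit cell, then round to the nearest grid point.
    i, j, k = (int(round((f % 1.0) * n)) % n for f, n in zip(frac_xyz, dims))
    return mask[i][j][k] != 0


def filter_solutions(mask, centres):
    """Keep solutions whose centre of mass lies outside the zero (solvent) region.

    With no mask, the whole cell is allowed and every solution is kept.
    """
    if mask is None:
        return list(centres)
    return [c for c in centres if in_allowed_region(mask, c)]
```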

KEYWORDS

BASIC KEYWORDS

SOLC <solc> [MEAN <solvval> <protval>]

<solc>: solvent content for scaling. Always input the correct solvent content here to ensure correct scaling. 0.0=all protein, 1.0=all solvent.
MEAN <solvval> <protval>: used to set mean density for solvent and protein regions. This affects scaling and density modification.
<solvval> = mean density in solvent region.
<protval> = mean density in protein region.
(defaults 0.32, 0.43 electrons per cubic angstrom)

LABIN FP=.. [SIGFP=..] PHIO=.. [FOMO=..]

Enough columns must be provided to allow calculation of a map. Common combinations include (calculated_magnitude + phase), (observed_magnitude + phase + weight), (weighted_magnitude + phase).

FP: = F magnitude
SIGFP: = standard deviation, 0 for unmeasured
PHIO: = best initial phase estimate
FOMO: = weight attached to PHIO

RESOLUTION <rmin> <rmax>

Resolution range of reflections to include in the translation search stage of the calculation. This should be set to cover the resolution range for which significant phase information is available. Good results are obtained with phases to 4.0A or better; for larger fragments (10 residues or more) information may be obtained at still lower resolutions.
(default is the whole range of the input mtz file)

MODEL [RADIUS <mdlrad>] [RESOLUTION <mrmax>] [BFACTOR <bfac>] [ROTATE <alpha> <beta> <gamma>]

Set the parameters for the model atoms.

<mdlrad>: is the radius over which the atomic density is calculated (as VDWR in sfall).
<mrmax>: is the resolution at which the atomic shape functions are calculated. At higher resolutions this should make little difference; however, at lower resolutions a significant amount of the atomic density leaks out of the fragment mask, and so better results are obtained if the atomic shape function is corrected. If <mrmax> is set to zero, resolution truncation is disabled and the true shape function is used.
<bfac>: Temperature factor to add to all the atoms in the input fragment. By default the search is conducted in a map sharpened to Boverall=0. If a different type of map is used through use of the SCALE card, the B-factors of the model atoms should be adjusted accordingly.
<alpha> <beta> <gamma>: An initial rotation to apply to the model before starting the search. This rotation is included in the output rotation angles. Useful for doing a translation search at fixed orientation, or for refining an MR solution.

Defaults: <mdlrad>=2.5A, <mrmax>=<rmax>, <bfac>=0, <alpha>=<beta>=<gamma>=0.

MASK [RADIUS <mskrad>]

Set the radius of the fragment mask about the model atoms. This determines the volume over which the agreement between the map and the model is evaluated. Default: <mskrad>=2.5A.

FILTER [MAP] [MODEL] [VARIANCE] [RADIUS <fltrad>]

Apply a filter to the map and/or model before starting the search. The filter may match either the local mean (default) within the filter radius, or it may match both the local mean and variance. This is useful at low resolutions or when performing MR or NCS searches.

MAP: apply filter to map.
MODEL: apply filter to search model.
VARIANCE: match both local mean and variance.
<fltrad>: radius of sphere for local mean/variance calculation.

Defaults: do not filter either map or model, do not match variance, <fltrad>=5.0A

SEARCH [STEP <step>] [RANGE <range>] [PEAKS <npeaks>] [GRID <sampling>]

Set the parameters for the search function.

<step>: is the orientation step angle in degrees. At 4.0A resolution or better, a search step of 10 degrees is sufficient for a 10 residue fragment. The search step should be smaller at higher resolutions or for larger models. A quick and dirty result may often be obtained with a much coarser search step, as large as 30 degrees.
<range>: Set the maximum range in degrees for the search. To perform a translation search on a single orientation, set <range> to 0.1. To refine the result of a previous search, use SEARCH STEP 2 RANGE 10 and give the approximate orientation with MODEL ROTATE.
<npeaks>: is the number of peaks in the resulting search function for which rotated fragment atoms will be output. Defaults: <step>=10 degrees, <npeaks>=100.
<sampling>: is the factor by which the search grid is oversampled. Higher values give finer search grids, and potentially better results at the cost of time. Default: <sampling>=1.333.

SCALE [NATURAL] [<scale> <bfac>]

Override internal scaling and scale input data by:
F² = <scale> exp (<bfac> s/2) F_obs²
Scaling is critical to correctly fitting the density with a model. The data scale will be accurately determined automatically if structure factor magnitudes are provided to better than 4.5A, otherwise it is a good idea to provide a SCALE card.

NATURAL: place the map on an absolute scale but do not adjust the B-factor.
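
As a worked illustration of the scaling formula above, the following Python sketch applies a given <scale> and <bfac> to one squared magnitude. The source does not define s, so the sketch assumes s = 1/d^2 with d the resolution in Angstroms. That convention, and the function name, are assumptions made for illustration only.

```python
import math


def scale_f_squared(f_obs_sq, d_spacing, scale, bfac):
    """Return <scale> * exp(<bfac> * s / 2) * F_obs^2 for one reflection.

    ASSUMPTION: s = 1 / d^2, where d is the resolution of the reflection
    in Angstroms. Check the convention before relying on the numbers.
    """
    s = 1.0 / d_spacing ** 2
    return scale * math.exp(bfac * s / 2.0) * f_obs_sq
```

With <bfac>=0 the exponential term is 1 and only the overall scale is applied.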

CENTRE FRAC/ORTH <x> <y> <z>

Centre the output fragment positions in an asymmetric unit around <x> <y> <z>, given in fractional or orthogonal coordinates in accordance with the preceding keyword. Useful to put your matches in the same region as any model you are working on.
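
The basic keywords can be assembled into a keyworded input with a small helper such as the Python sketch below. The helper and its argument names are hypothetical. The column labels and numerical values in the usage note are placeholders and must be replaced with those appropriate to your own mtz file. Only keywords described in this section are emitted.

```python
def fffear_keywords(solc, labin, resolution,
                    model_rotate=None, search=None, centre_frac=None):
    """Build keyworded input text from the basic keywords.

    labin:        list of (program_label, file_label) pairs, e.g. ("FP", "FP")
    resolution:   (<rmin>, <rmax>) in the order given in the keyword synopsis
    model_rotate: optional (<alpha>, <beta>, <gamma>) for MODEL ROTATE
    search:       optional list of (subkeyword, value) pairs, e.g. ("STEP", 2)
    centre_frac:  optional (<x>, <y>, <z>) in fractional coordinates
    """
    lines = ["SOLC %s" % solc]
    lines.append("LABIN " + " ".join("%s=%s" % pair for pair in labin))
    lines.append("RESOLUTION %s %s" % tuple(resolution))
    if model_rotate is not None:
        lines.append("MODEL ROTATE %s %s %s" % tuple(model_rotate))
    if search is not None:
        lines.append("SEARCH " + " ".join("%s %s" % pair for pair in search))
    if centre_frac is not None:
        lines.append("CENTRE FRAC %s %s %s" % tuple(centre_frac))
    return "\n".join(lines) + "\n"
```

For example, refining an approximate orientation as suggested under SEARCH might use fffear_keywords(0.5, [("FP", "FP"), ("PHIO", "PHIB"), ("FOMO", "FOM")], (20.0, 4.0), model_rotate=(10, 20, 30), search=[("STEP", 2), ("RANGE", 10)]). The resulting text is then supplied to the program as its keyworded input.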

OTHER KEYWORDS

GRID <nx> <ny> <nz>

Set the grid for the calculation. Ideally the grid spacing should be 1/5 of the resolution of the phases; thus for 4.0A phases the grid spacing should be 0.8A. Spacings greater than 1/4 of the resolution will cause an error. Grid sampling must be a multiple of 4 and obey any other requirements imposed by the spacegroup.
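
The arithmetic behind these rules can be sketched in Python as follows. The function names are hypothetical, and the sketch only applies the 1/5 target, the 1/4 limit and the multiple-of-4 rule stated above. It does not check any additional spacegroup requirements.

```python
import math


def suggest_grid(cell_edges, resolution, multiple=4):
    """Smallest grid, as multiples of 4, with spacing <= resolution / 5 on each axis."""
    grid = []
    for edge in cell_edges:
        n = math.ceil(edge * 5.0 / resolution)      # spacing no coarser than resolution/5
        grid.append(multiple * math.ceil(n / multiple))
    return tuple(grid)


def spacing_too_coarse(cell_edges, grid, resolution):
    """True if any axis has a spacing greater than resolution / 4."""
    return any(edge / n > resolution / 4.0 for edge, n in zip(cell_edges, grid))
```

For a 40A cell edge and 4.0A phases, the 0.8A target gives 50 grid points, which is rounded up to 52 to satisfy the multiple-of-4 rule.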

FORM <z> <a1> <b1> <a2> <b2>

Alternate 2-gaussian formfactor coefficient for atomic number <z>. f=<a1>exp(<b1>s)+<a2>exp(<b2>s). Formfactors are supplied for H, N, C, O, S and other atom types are scaled from these. Given that the model B-factors will generally be wrong, a crude approximation is sufficient for all common cases.
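
The two-gaussian expression can be evaluated directly, as in the Python sketch below. The function name is hypothetical, and the definition of s is taken as given since the text above does not specify it. No real coefficients are implied: any values passed in are the caller's own.

```python
import math


def two_gaussian_form_factor(s, a1, b1, a2, b2):
    """Evaluate f = a1 * exp(b1 * s) + a2 * exp(b2 * s)."""
    return a1 * math.exp(b1 * s) + a2 * math.exp(b2 * s)
```

At s=0 the form factor reduces to <a1>+<a2>, which gives a quick sanity check on a set of coefficients.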

TRUNCATE <rmin> <rmax>

Resolution range of reflections to include in the data scaling stage. This keyword can be used to exclude part of the input data by resolution cutoffs. This is generally highly inadvisable.
(default is the whole range of the input mtz file)

STRUCFAC [REAL <rscale>] [RECIP <hscale>]

Use a (slow) direct Fourier transform to calculate the model and mask structure factors instead of the default FFT. The REAL and RECIP keywords may then be used to set the spacing of the real space grid used to calculate the fragment density and mask, and the reciprocal space sampling of the fragment and mask transforms.

<rscale>: The spacing in Angstroms of the grid on which the fragment model density and mask are calculated. Default 0.5 A.
<hscale>: The spacing in reciprocal Angstroms of the grid on which the fragment and mask transforms (structure factors) are calculated. This grid needs to be quite fine to ensure accurate interpolation when the fragment transform is rotated. The default value is sufficient for a model whose longest axis is 15A; for larger models decrease this number proportionally. Default 0.01 A^-1.
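
The proportional rule for <hscale> amounts to the following Python sketch. The function name is hypothetical, and the sketch assumes that "proportionally" means inverse proportion to the longest model axis, with the default kept for models of 15A or less.

```python
def suggest_hscale(longest_axis, default_hscale=0.01, reference_axis=15.0):
    """Reciprocal-space spacing (A^-1) for a model with the given longest axis (A)."""
    if longest_axis <= reference_axis:
        return default_hscale
    # Larger models need finer reciprocal-space sampling.
    return default_hscale * reference_axis / longest_axis
```

A model with a longest axis of 30A would therefore use about half the default spacing.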

OUTPUT

The output PDB file (XYZOUT) contains up to 1000 copies of the input molecule in decreasing order of fit to the density. For the purposes of visualisation I find it useful to get the header and the first 250 C-alpha atoms from this file, as follows:
grep 'C[AR]' XYZOUT | head -250 > ca.pdb
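
Since the B-factor column of XYZOUT holds the search function value, it can also be useful to list the scores of the C-alpha atoms. The Python sketch below relies on the standard fixed-column PDB layout (atom name in columns 13-16, B-factor in columns 61-66). The function name is hypothetical. Unlike the grep command it does not keep the header, and it will also pick up any atom named CA in a HETATM record.

```python
def ca_scores(pdb_lines, limit=250):
    """Return (atom serial, B-factor) pairs for the first `limit` C-alpha atoms."""
    scores = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue                      # skip header and other record types
        if line[12:16].strip() != "CA":
            continue                      # keep atoms named CA only
        scores.append((int(line[6:11]), float(line[60:66])))
        if len(scores) >= limit:
            break
    return scores
```

It can be applied to an open file, for example ca_scores(open("bar.pdb")), where the file name is the one given on XYZOUT.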

The translation function map (which omits the orientation information) is also output on MAPOUT. This has peaks where the origins of the good orientations are found. If the input model has an alpha carbon at the origin, a rough backbone trace of map regions matching the fragment may be obtained.

AUTHOR

Kevin D. Cowtan, Department of Chemistry, University of York
email: cowtan@ysbl.york.ac.uk

FFFEAR (CCP4: Supported Program)
