EDSTATS

NAME

edstats - Calculates per-residue real-space electron density R factors, correlation coefficients, Z(observed) metrics for the ρ_obs Fourier map and Z(difference) metrics for the Δρ (difference) Fourier; also computes data for the histogram, and P-P and Q-Q difference plots for the observed and difference Fourier maps.

SYNOPSIS

edstats MAPIN1 input1.map MAPIN2 input2.map XYZIN input.pdb [HISOUT output.his] [PPDOUT output.ppd] [QQDOUT output.qqd] [MAPOUT1 output1.map] [MAPOUT2 output2.map] [OUT output.out] [XYZOUT output.pdb]

DESCRIPTION

The program EDSTATS calculates real-space electron density R factors, correlation coefficients, Z_obs and Z_diff metrics for main- (includes Cβ atom) and side-chain atoms of individual residues and/or atoms. This integrates and replaces the functionalities of SFALL (MODE ATMMAP ATMMOD/RESMOD options) and OVERLAPMAP (CORRELATE ATOM/RESIDUE options). In addition it recognises the chain ID and the PDB residue label insertion code (which SFALL ignores!), and so does not require a specification of the residue label mapping for each chain (CHAIN option in SFALL/OVERLAPMAP).

The real-space R factor (RSR) is defined (Brändén & Jones, 1990; Jones et al., 1991) as:
RSR = Σ |ρ_obs - ρ_calc| / Σ |ρ_obs + ρ_calc|
The real-space correlation coefficient (RSCC) is defined as:
RSCC = cov(ρ_obs,ρ_calc) / sqrt(var(ρ_obs) var(ρ_calc))
where cov(.,.) and var(.) are the sample covariance and variance (i.e. calculated with respect to the sample means of ρ_obs and ρ_calc).
EDSTATS computes two real-space correlation coefficients: the 'sample' correlation coefficient defined above, and the 'population' correlation coefficient, i.e. with respect to the population (overall) means, which will be zero if the F(000) terms were not included in the map calculation (OVERLAPMAP uses only the sample means). The RSCC based on the population means seems to be better at detecting weak correlations.
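The RSR and the two correlation coefficients defined above can be sketched in a few lines. This is only an illustration of the formulas as written here, not the EDSTATS source; the function names and the sample density values are invented, and the 'population' CC is obtained simply by passing the overall map means (zero when F(000) is omitted) instead of the sample means.

```python
import math

def rsr(rho_obs, rho_calc):
    """Real-space R factor: sum|obs - calc| / sum|obs + calc|."""
    num = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    den = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return num / den

def rscc(rho_obs, rho_calc, mean_obs=None, mean_calc=None):
    """Correlation coefficient about the given means.

    With no means supplied the sample means are used ('sample' CC);
    supplying the overall map means gives the 'population' CC.
    """
    n = len(rho_obs)
    if mean_obs is None:
        mean_obs = sum(rho_obs) / n
    if mean_calc is None:
        mean_calc = sum(rho_calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc)
              for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_obs * var_calc)

# Invented density values at the grid points covering one group of atoms.
rho_obs = [0.9, 1.4, 2.1, 1.6, 0.8]
rho_calc = [1.0, 1.5, 2.0, 1.5, 1.0]
print("RSR            ", rsr(rho_obs, rho_calc))
print("sample CC      ", rscc(rho_obs, rho_calc))
print("population CC  ", rscc(rho_obs, rho_calc, 0.0, 0.0))
```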
The real-space Z_obs metric (RSZO) is defined (Tickle, 2011) as:
RSZO = mean(ρ_obs) / σ(Δρ)
where σ(Δρ) is the standard uncertainty of the difference Fourier map. Note that this is the standard uncertainty of the 'Fo-Fc' map, NOT the RMS value of the '2Fo-Fc' map, which bears no relationship whatsoever to the uncertainty!
The real-space Z_diff metrics (RSZD- and RSZD+) are defined (Tickle, 2011) as follows for the sets of negative and positive values respectively of Δρ at the grid points that are covered by the group of main- or side-chain atoms under consideration:
1. Order the values in each set by increasing magnitude (i.e. ignoring the sign).
2. For each of N subsets of size 1, 2, ..., N-1, N of the numerically highest values of the original set of size N, compute the cumulative probability of chi-square (χ² = Σ (Δρ/σ(Δρ))²) for the subset. So the subset of size 1 is simply the numerically highest value ('maximum order statistic') in the original set, the subset of size 2 consists of the 2 highest values of the set, the subset of size N-1 excludes the lowest value, and the subset of size N is just the set itself.
3. In practice this χ² cumulative probability is very difficult to compute (even by stochastic numerical integration) for subsets other than those of size 1 and N (it involves integrals up to dimension N where N may be anything from 10 to 1000). Note that the standard χ² cumulative probability assumes that the sample is selected randomly, whereas here we are selecting the highest values. Therefore we approximate it as the product of two components: the standard cumulative probability of χ² for a randomly selected subset, and a correction, the Dunn-Šidák correction (Sokal & Rohlf, 1995; Gibbons & Chakraborti, 2003), in this case the cumulative probability of the order statistic, for the fact that we are selecting the highest values.
4. Take the highest cumulative probability over all subsets, and convert this to the corresponding normal Z-score, making the Z-score negative for the set of negative values; this is the final RSZD- or RSZD+ score. The program also computes a combined RSZD score which is simply the maximum of |RSZD-| and RSZD+.
The real-space Z-scores RSZO, RSZD- and RSZD+ require estimates of the standard uncertainty σ(Δρ) and offset of the 'Fo-Fc' map (the offset arises from omission of the F(000) term, which may differ from zero since the model is not necessarily complete). The recommended procedure is to use as an initial estimate the value of the σ(Δρ) in the map header, with zero as the offset, and then rescale σ(Δρ) and the offset separately for each chain and the bulk solvent. Bulk solvent is assigned the chain ID '%' for this purpose and ordered waters are considered to belong to a chain with ID '0' whatever their actual chain IDs in the PDB file.
The sample size correction above arises because the greater the sample size the more likely it is that high values will occur purely by chance. This correction takes into account the fact that the number of grid points is not the same for all residues, because obviously different residue types contain different numbers of atoms, and also different limiting atom radii will enclose different numbers of grid points, because the radius varies with atom type and B_iso. The correction is therefore necessary to make the metrics comparable between different residues and to be able to apply a common threshold to the metric for all residues. Note that the RSR and RSCC metrics do not apply a sample size correction: it is assumed that all sample points contribute equally to the metrics independent of the sample size.
The number of grid points referred to above is the number of statistically independent grid points covering the atoms; this is the actual number of grid points with an over-sampling correction factor. According to the Nyquist-Shannon sampling theorem, the grid spacing required for statistical independence is 1/2 the high resolution cut-off (d_min), so e.g. if a grid spacing of d_min/4 is used then the effective number of grid points is the actual number / 2³.
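The over-sampling correction described above amounts to dividing the actual number of grid points by the cube of the linear over-sampling factor. A minimal sketch, assuming the same grid spacing along all three axes (the function name is hypothetical):

```python
def effective_grid_points(n_actual, grid_spacing, d_min):
    """Number of statistically independent grid points.

    The spacing required for independence is d_min / 2, so the linear
    over-sampling factor is (d_min / 2) / grid_spacing, cubed for 3-D.
    """
    oversampling = (d_min / 2.0) / grid_spacing
    return n_actual / oversampling ** 3

# Grid spacing of d_min/4 at d_min = 2.0: 400 points count as 400 / 2**3.
print(effective_grid_points(400, 2.0 / 4, 2.0))  # 50.0
```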
The advantage of the real-space Z-scores over the real-space R factor and correlation coefficient scores (including the 'population' CC metric) is that the former depend purely on model accuracy (RSZD) or model precision (RSZO), whereas RSR and RSCC depend on both (e.g. it's obvious from the plots that RSR and RSCC are at least partially correlated with the atomic B_isos); this means that it's impossible to say how much of the observed effect on the metric is due to lack of accuracy and how much to lack of precision.
Note that model accuracy is related to the likelihood of the model (i.e. the consistency of the model with the data), and is what is improved by model building and refinement. The difference Fourier density is obviously a measure of any discrepancy between the model and the data, so is a direct measure of model accuracy.
Model precision is a property of the crystal and the data (assuming the refinement is done optimally), and is related to data quality and completeness, resolution, atom type (or atomic scattering factor), occupancy and atomic B_iso; hence model precision can only be improved by crystallizing in a different crystal form and/or collecting better (e.g. more precise and/or higher resolution) data. The ρ_obs density, divided by its standard uncertainty (note: this is not the same as RMS(ρ_obs)), is a measure of model precision which incorporates all the above factors correlated with precision (e.g. the atomic B_iso is also a precision metric but it doesn't take account of the variation of precision with atom type and occupancy).

The sums and min/max functions required to compute all residue or atom metrics are taken over all map grid points within a specified distance of each atom centre. This distance limit is naturally a function of the atom type (via the atomic scattering factors computed from the 5-Gaussian approximation table in $CLIBD/atomsf.lib), the atomic B_iso values and the resolution limits, as shown in the following table of the distance limit r_max for an O atom. Values used by SFALL are also shown for comparison: note that the latter depend only on B_iso and are independent of atom type and resolution:

B/Å²:                   10    20    30    40    50    60    70    80    90
d_min/Å
      r_max/Å (SFALL: all atoms)
 All                   2.35  2.67  2.95  3.21  3.45  3.67  3.88  4.08  4.27

      r_max/Å (EDSTATS: O atom)
 3.5                   1.72  1.78  1.83  1.89  1.95  2.02  2.08  2.15  2.22
 3.0                   1.51  1.58  1.65  1.72  1.80  1.88  1.97  2.06  2.14
 2.5                   1.31  1.39  1.49  1.59  1.70  1.80  1.91  2.02  2.12
 2.0                   1.12  1.24  1.38  1.52  1.66  1.79  1.91  2.02  2.13
 1.5                   0.96  1.16  1.35  1.52  1.66  1.79  1.91  2.02  2.13
<=1.0                  0.91  1.16  1.35  1.52  1.66  1.79  1.91  2.02  2.13

Note that the limiting high-resolution values of r_max are attained at ~ d_min = 1.5Å.

The resolution-dependent distance limit is computed by first performing an analytical truncated Fourier transform of the atomic scattering factor f(s) to obtain the equation for the calculated electron density ρ(r) for data between specified resolution cut-offs, at distance r from the atom centre:
```
                               s_max
        ρ(r) = FT(f(s)) = (8/r) ∫ f(s) exp(-Bs²) sin(4πrs) s ds
                               s_min
```
for specified limits s_min and s_max of sin(θ)/λ.
Then the ratio of the radius integral of ρ(r) integrated out to the outer limit r_max relative to the radius integral integrated to infinite distance is:
```
                               r_max          ∞
        Radius integral ratio = ∫ ρ(r) dr / ∫ ρ(r) dr
                                0           0
```
and this equation is solved to obtain r_max for a radius integral ratio = 0.95 (i.e. 95% of the integral lies within distance r_max of the atom centre). The integrals with respect to r can be obtained analytically; the integrals with respect to s in general have no analytical solution and must be computed numerically (using e.g. the QUADPACK library). Note that ideally the volume integral of ρ(r):
```
                         r_max           r_max
        Volume integral = ∫ ρ(r) dV = 4π ∫ ρ(r) r² dr
                          0              0
```
should be used, but unfortunately this integral does not converge.
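The truncated transform ρ(r) defined above can be evaluated numerically as in the following sketch. It is not the EDSTATS code: a hypothetical single-Gaussian scattering factor f(s) = a exp(-b s²) stands in for the 5-Gaussian table, composite Simpson's rule stands in for QUADPACK, and the values of a and b are purely illustrative. The limits are converted from resolution using s = sin(θ)/λ = 1/(2d).

```python
import math

def form_factor(s, a=8.0, b=10.0):
    """Hypothetical single-Gaussian scattering factor (illustrative values)."""
    return a * math.exp(-b * s * s)

def rho(r, b_iso, s_min, s_max, n=2000):
    """rho(r) = (8/r) * integral of f(s) exp(-B s^2) sin(4 pi r s) s ds
    over [s_min, s_max], by composite Simpson's rule (requires r > 0)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (s_max - s_min) / n

    def integrand(s):
        return (form_factor(s) * math.exp(-b_iso * s * s)
                * math.sin(4.0 * math.pi * r * s) * s)

    total = integrand(s_min) + integrand(s_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(s_min + i * h)
    return (8.0 / r) * total * h / 3.0

# Density profile of the model atom with B = 20 for data between 50 and 2 A.
s_min, s_max = 1.0 / (2.0 * 50.0), 1.0 / (2.0 * 2.0)
for r in (0.25, 0.5, 1.0, 1.5, 2.0):
    print(f"r = {r:4.2f}  rho = {rho(r, 20.0, s_min, s_max):8.4f}")
```

With s_min = 0 and a large s_max the numerical result can be checked against the closed form for a Gaussian, a (4π/(b+B))^(3/2) exp(-4π²r²/(b+B)), which is a convenient test of the quadrature.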
For the RSZO metric EDSTATS uses the ρ_obs map with Fourier coefficients 2mF_o-DF_c for acentric reflections or mF_o for centrics (Main, 1979; Read, 1986); for the RSZD metric it uses the Δρ (difference Fourier) map with Fourier coefficients 2(mF_o-DF_c) for acentrics, or mF_o-DF_c for centrics. For the RSR and RSCC metrics it uses the ρ_obs and ρ_calc maps.
However, for the latter, because we cannot rely on the correct Fourier coefficient for ρ_calc being present in the file of map coefficients, it is necessary to obtain it as the difference between the ρ_obs and Δρ coefficients. Since we have:
```
        Δρ = ρ_obs - ρ_calc
```
or:
```
        ρ_calc = ρ_obs - Δρ
```
therefore for acentrics:
```
        ρ_calc = F(2mF_o-DF_c) - F(2(mF_o-DF_c))
          = F(DF_c)
```
whereas for centrics:
```
        ρ_calc = F(mF_o) - F(mF_o-DF_c)
          = F(DF_c)
```
Hence the correct Fourier coefficient for ρ_calc is DF_c for all reflections. Note that it is frequently stated that the coefficient for acentrics is mF_o-DF_c but if this were used it would give completely the wrong result for the ρ_calc coefficient (it would give mF_o !).
EDSTATS also has options to output data for the histogram, 'P-P difference' and 'Q-Q difference' plots of the difference Fourier and observed Fourier maps. Note that the 'P-P difference' and 'Q-Q difference' plots are functionally identical to the standard 'P-P plot' (probability-probability) and 'Q-Q plot' (quantile-quantile: 'quantile' is just another name for 'normalised deviate' or 'Z-score'). The distinction is purely one of presentation: whereas the standard 'P-P' or 'Q-Q' plot plots x vs. y, where x and y are respectively the normal expected and observed probabilities or quantiles, the 'P-P difference' or 'Q-Q difference' plot plots x vs. y-x.
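The relationship between the standard and the 'difference' forms of these plots can be sketched as follows. This is an illustration only: the function name is hypothetical, and the plotting position (i + 0.5)/n used for the expected cumulative probability is an assumption of this sketch, not necessarily what EDSTATS uses.

```python
from statistics import NormalDist

def qq_pp_difference(z_values):
    """Return (qq, pp): lists of (x, y - x) pairs for a sample of Z-scores."""
    std_normal = NormalDist()
    ordered = sorted(z_values)
    n = len(ordered)
    qq, pp = [], []
    for i, z_obs in enumerate(ordered):
        p_exp = (i + 0.5) / n              # expected cumulative probability (assumed)
        z_exp = std_normal.inv_cdf(p_exp)  # expected normal quantile
        p_obs = std_normal.cdf(z_obs)      # normal probability of observed quantile
        qq.append((z_exp, z_obs - z_exp))  # Q-Q difference: x vs. y - x
        pp.append((p_exp, p_obs - p_exp))  # P-P difference: x vs. y - x
    return qq, pp

# Invented sample with one positive outlier in the upper tail.
sample = [-1.3, -0.7, -0.2, 0.1, 0.4, 0.9, 4.5]
qq, pp = qq_pp_difference(sample)
for (z_exp, dz), (p_exp, dp) in zip(qq, pp):
    print(f"{z_exp:7.3f} {dz:7.3f}   {p_exp:6.3f} {dp:7.3f}")
```

A perfectly normal sample gives differences of zero throughout, so outliers show up as departures from the y = 0 line in the tails.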

INPUT

The input is in 'namelist' format, i.e. it consists of 'keyword = value' pairs separated by a comma or newline. The keyword is always case-insensitive and only the first 4 characters are significant. The value may be a character string, a logical (true or t or false or f) or an integer or real scalar or array. The RESLO & RESHI values, obtainable from an MTZDUMP summary table for the map coefficient columns (NOT the overall values for the file as given in the MTZ header), are required; all other input values are optional.
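The keyword rules above (case-insensitive, first 4 characters significant, pairs separated by comma or newline) can be illustrated with a small sketch. This is not the program's own parser; the function name is hypothetical, values are kept as strings, and array values are deliberately not handled.

```python
def parse_keywords(text):
    """Parse 'keyword = value' pairs separated by commas or newlines.

    Keywords are upper-cased and truncated to 4 characters.
    Scalar values only: arrays are outside the scope of this sketch.
    """
    options = {}
    for item in text.replace("\n", ",").split(","):
        if "=" not in item:
            continue  # ignore empty fragments
        key, value = item.split("=", 1)
        options[key.strip().upper()[:4]] = value.strip()
    return options

print(parse_keywords("resl=50,resh=2.1"))
print(parse_keywords("ResLo = 50\nRESHI = 2.1, main = atom"))
```

Both calls yield the same RESL and RESH entries, which is why the abbreviated forms resl and resh are accepted for RESLO and RESHI.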

Available options:

MAIN = string
Optional specification of type of averaging used to compute main-chain (including Cβ atom) R factors and correlation coefficients (both types), where string is either RESI (default) or ATOM (both case-insensitive):
RESI averages all map values for the main-chain atoms in each residue.
ATOM averages the map values for each atom, but reports the extreme values of these as the residue metrics.
This option has no effect on the real-space Z-scores, which are as defined in the DESCRIPTION section above.
SIDE = string
Same as MAIN, but for side-chains.
MOLE = string
Optional concatenated list of chain IDs defining the molecule for which metrics are to be calculated (default is to use all atoms). Chain IDs are case-sensitive.
RESC = string
Optional specification of type of rescaling of σ(Δρ) by Q-Q plot required: string may be ALL, BULK, CHAIN (default) or NONE (all are case-insensitive).
Scaling type ALL rescales using a single scale factor and offset based on all map points in the asymmetric unit.
BULK rescales using a single scale factor and offset based only on points in the bulk solvent.
CHAIN independently rescales each chain and the bulk solvent with a separate scale factor and offset for each group (ordered waters are treated as belonging to a single separate chain '0' regardless of their chain IDs in the PDB file). This is now the recommended procedure.
NONE does no Q-Q plot rescaling; the value of σ(Δρ) read from the map header is used, with zero for the offset.
RESLO = real
Required low resolution cut-off used in map calculation.
RESHI = real
Required high resolution cut-off used in map calculation.
THR1 = real
Optional σ cut-off threshold for Fo map: default is no cut-off.
THR2 = real
Optional σ cut-off threshold for ΔF map: default is no cut-off.

TEST = integer
Debug flag used for testing and obsolete options: sum of debug option values as follows:

LS-bit  Value  Output
  0       1    General debugging.
  1       2    P-P & Q-Q difference plots for chains.
  2       4    Memory allocation debugging.
  3       8    ZSCORE s/r debugging for RSZD values.
  4      16    RSZD outliers.
  5      32    Cumulative frequencies for RSZDs > 3 σ.
  6      64    Normality tests.
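
Since the TEST value is a sum of the option values in the table above, several outputs can be requested at once. The following sketch (helper name hypothetical) shows how a combined value decomposes into its options:

```python
TEST_FLAGS = {
    1: "General debugging",
    2: "P-P & Q-Q difference plots for chains",
    4: "Memory allocation debugging",
    8: "ZSCORE s/r debugging for RSZD values",
    16: "RSZD outliers",
    32: "Cumulative frequencies for RSZDs > 3 sigma",
    64: "Normality tests",
}

def decode_test(value):
    """List the outputs selected by a TEST value (bitwise decomposition)."""
    return [name for bit, name in TEST_FLAGS.items() if value & bit]

# Chain plots (2) plus RSZD outliers (16) are requested with TEST = 18.
print(2 + 16)
print(decode_test(18))
```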

USEFO = logical
A value TRUE indicates that the density histogram and Q-Q difference plots should use the Fo density instead of the ΔF density. This is only intended for demonstration purposes: the Fo density is not useful in the calculation of the RSZD metrics so with this option set, the program will stop after doing the Q-Q plot calculations.

INPUT FILES

XYZIN - Co-ordinate file in PDB format.
MAPIN1 - Input 2mFo-DFc map in CCP4 format.
MAPIN2 - Input 2(mFo-DFc) map in CCP4 format: this must contain the same header info as MAPIN1.
Both maps should be calculated with a grid spacing between 1/4 and 1/6 of the high resolution cut-off (usually 1/4 is sufficient), and the PDB file and the maps should all be from the same refinement job.
NOTE: it is essential that the MTZ file from the refinement job is run through the MTZFIX program before map calculation with FFT to ensure that the map coefficients are correct and consistent between programs (unfortunately different refinement programs have different conventions for the map coefficients!).

OUTPUT FILES

HISOUT - Optional output file for the histogram of map values, containing 2 data columns: the observed Z-score and the observed frequency. This is used for visualising the deviations of the observed distribution of either Δρ/σ(Δρ) (if USEFO = f), or of ρ_obs/σ(Δρ) (USEFO = t), from the theoretical normal distribution. A normal distribution would give the Gaussian curve y = exp(-.5 x²) / √(2π), so deviations from this indicate deviations from normality. However a histogram does not show up outliers nearly as clearly as the P-P and Q-Q plots (see below), so is really only suitable for demonstration purposes. The output is readily visualised using a plotting program such as gnuplot, e.g.:
```
> gnuplot
Terminal type set to 'x11'
gnuplot> plot'edstats.his' w l,exp(-.5*x**2)/sqrt(2*pi)
```
PPDOUT - Optional output file for the 'P-P difference' plot, containing 2 data columns: the cumulative probability for the normal distribution, and the difference (normal cumulative probability of the observed quantile - normal expected probability). The output is readily visualised using gnuplot (see example for Q-Q difference plot below). The P-P plot is not as informative as the Q-Q plot, and generally is only used for test purposes.
QQDOUT - Optional output file for the 'Q-Q difference' plot, containing 2 data columns: the expected quantile (or Z-score) for the normal distribution, and the difference (observed quantile - normal expected quantile). This is used for visualising the deviations of the observed distribution of either Δρ/σ(Δρ) (if USEFO = f), or of ρ_obs/σ(Δρ) (USEFO = t) from the normal distribution. A normal distribution would give the straight line y = 0, so deviations from this line indicate outliers, i.e. deviations from normality (note that the Q-Q plot does not show deviations from zero density, but rather deviations from the normal, or other assumed, distribution). The numerically highest outliers will be in the 'tails', i.e. the negative outliers are the troughs in Δρ/σ(Δρ) or ρ_obs/σ(Δρ) and the positive outliers are the peaks. The output is readily visualised (with the 'normal' y = 0 line) using gnuplot, e.g.:
```
> gnuplot
Terminal type set to 'x11'
gnuplot> plot'edstats.qqd' w l,0 lt 0
```
OUT - Optional output file for table of per-residue metrics suitable for plotting with e.g. gnuplot. If no output file is specified the data go to standard output. The columns in this table are:
1. Residue 3-letter code.
2. Chain ID.
3. Residue label (including insertion code if present).
4. Weighted average B_iso for main-chain atoms in residue (including Cβ). This is weighted according to the contribution of the atoms to the total scattering in the resolution range specified (Tickle et al., 1998).
5. Number of statistically independent grid points covered by main-chain atoms.
6. Real-space R factor (RSR) for the main-chain atoms in the residue.
7. Real-space correlation coefficient (RSCC).
8. Real-space 'population' correlation coefficient.
9. Real-space Z_obs metric (RSZO).
10. Real-space Z_diff metric (RSZD); this is simply the maximum value of |RSZD-| and RSZD+.
11. Real-space Z_diff metric for negative differences (RSZD-).
12. Real-space Z_diff metric for positive differences (RSZD+).
Columns 13-21 contain the same information as columns 4-12 above (i.e. add 9), but for the side-chain atoms (excluding Cβ) if present.
To plot the RSZD- and RSZD+ metrics (in columns 11 & 12) by residue for the main-chain atoms with the suggested threshold lines at ±3σ, using gnuplot:
```
> gnuplot
Terminal type set to 'x11'
gnuplot> set style data impulses
gnuplot> plot'edstats.out'u 11,''u 12,-3 lt 0,3 lt 0
```
Similarly use columns 20 & 21 to plot the side-chain values. See separate section below on interpreting these plots.
MAPOUT1 - Optional rescaled and normalised 2mFo-DFc map, i.e. a map of ρ_obs/σ(Δρ) where σ(Δρ) may vary between grid points.
MAPOUT2 - Optional rescaled and normalised 2(mFo-DFc) map, i.e. a map of Δρ/σ(Δρ) where σ(Δρ) may vary between grid points.
XYZOUT - Optional co-ordinate file in PDB format; if given, only the molecule(s) selected are output and the occupancy column (character columns 55-60) is overwritten with the per-atom |RSZD-| metric.

ANALYSING THE STANDARD OUTPUT: OVERALL METRICS

The default run options will produce 2 files: the standard output from edstats (e.g. redirected to a file such as edstats.log) which contains some overall metrics, and the output file (e.g. edstats.out) containing the table of per-residue metrics (see following section).

The supplied Perl script percent-rank.pl extracts a small subset of the overall metrics from the standard output, compares the results with a pre-calculated set in the supplied data file pdb-edstats.out, and for each metric prints out the per-cent rank (i.e. the percentage of structures in the pre-calculated set which have a worse score, so 0% is 'worst' and 100% is 'best'). This is intended to give a quick overview of the state of the difference Fourier and is not meant as a substitute for interpreting the per-residue metrics (see next section). Generally you would probably want your structure to score above average on all measures, so at least above the median 50% rank. But obviously not every structure can be above average!
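
The per-cent rank as defined above can be sketched as follows. This is not the code of percent-rank.pl; the function name and the reference values are invented, and the treatment of ties (not counted as worse) is an assumption.

```python
def percent_rank(value, reference, larger_is_worse=True):
    """Percentage of reference scores that are worse than `value`."""
    if larger_is_worse:
        worse = sum(1 for ref in reference if ref > value)
    else:
        worse = sum(1 for ref in reference if ref < value)
    return 100.0 * worse / len(reference)

# Invented reference set of a metric for which large values are worse.
reference = [1.2, 2.5, 3.1, 4.0, 6.3]
print(percent_rank(2.0, reference))  # 80.0: four of the five are worse
```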

The data file pdb-edstats.out, or a link to it, must be present in the current directory; alternatively set the environment variable PDB_EDSTATS to point to it. The data were obtained by running edstats on ~ 600 supposedly 'good' structures (anonymous!) from PDB_REDO with Rfree < 0.175 and > 100 residues (protein only). This is not ideal, since it would clearly be much better to bin the known structures by high resolution cut-off and compare your structure only with known structures at roughly the same resolution; however this will require a much larger database than I have the resources to set up in the short term. Hopefully this feature will be developed and improved in a future release.

The columns in the data file pdb-edstats.out contain:

1. High resolution cut-off.
2. Resolution-weighted average B_iso (e.g. the effective average of B_isos 10 and 100 is not 55 but something much closer to 10, depending on the resolution cut-offs, since the atom with B_iso = 100 only contributes significantly to the scattering at low resolution).
3. Q-Q plot ZD- metric: this gives an overall indication of how much the distribution of all negative difference density in the asymmetric unit deviates from the expected normal distribution for purely random errors. Significant negative density outliers giving a high numerical Q-Q plot ZD- metric probably indicate wrongly placed atoms, over-restrained B factors, problems with the bulk solvent parameters (e.g. due to low completeness at low resolution), or generally low data completeness.
4. Q-Q plot ZD+ metric: ditto for all positive difference density. Low per-cent ranks (large values) for this metric are not as indicative of problems as are low ranks for the Q-Q plot ZD- metric above, because it can be difficult to interpret residual density outliers due to disorder, buffer ions, cryo-protectants and other additives, so uninterpreted (and uninterpretable) density is quite common in deposited structures. Consequently a high value for the Q-Q plot ZD+ metric does not necessarily indicate a serious problem; it would be better to check the per-residue RSZD+ scores.
5. Percentage of residue RSZD- metrics numerically above the 3σ threshold (see also next section).
6. Percentage of residue RSZD+ metrics above the 3σ threshold.

The percent-rank.pl script prints out the per-cent ranks for metrics 3-6 above.

Examples of usage:

percent-rank.pl edstats.log
or
percent-rank.pl *.log

Note that the overall statistics for the RSZO metrics which appear in the standard output are not listed by the percent-rank.pl script; this is deliberate: the RSZO metric is a measure of precision and is really only meaningful when analysed at the residue level. For example it may be that only say 50% of the residues score above the threshold of the precision metric, but if these 50% tell you all that you wanted to know about the biological function, then clearly the experiment can be counted as a success (assuming of course that all residues have acceptable scores for the accuracy metrics). So it all depends on which residues have high values of the precision metric. On the other hand, if only 50% of residues scored above the threshold for the accuracy metric then this would be regarded as a poor result, no matter which residues they were.

INTERPRETING THE PER-RESIDUE METRICS

For the per-residue metrics listed in the output file (e.g. edstats.out) I have suggested rejection limits of < -3σ and > 3σ for the residue RSZD-/+ metrics respectively, and < 1σ for the residue RSZO metrics, though these may need to be adjusted in the light of experience.
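
The suggested limits can be applied to the main-chain columns of the output table with a short script such as the sketch below. It assumes whitespace-separated columns in the order listed under OUT (so that neither the chain ID nor the residue label is blank or contains spaces); the function name and the two example rows are invented.

```python
def flag_residues(lines, zd_limit=3.0, zo_limit=1.0):
    """Yield (residue, chain, label, problems) for main-chain metrics."""
    for line in lines:
        fields = line.split()
        try:
            rszo = float(fields[8])       # column 9: RSZO
            rszd_neg = float(fields[10])  # column 11: RSZD-
            rszd_pos = float(fields[11])  # column 12: RSZD+
        except (IndexError, ValueError):
            continue  # skip header lines or incomplete rows
        problems = []
        if rszd_neg < -zd_limit:
            problems.append("RSZD-")
        if rszd_pos > zd_limit:
            problems.append("RSZD+")
        if rszo < zo_limit:
            problems.append("RSZO")
        if problems:
            yield fields[0], fields[1], fields[2], problems

# Invented rows holding columns 1-12 only.
rows = [
    "ALA A 12 15.2 40.0 0.08 0.95 0.97 3.4 1.2 -1.2 0.9",
    "LYS A 13 45.0 38.0 0.21 0.80 0.85 0.7 4.1 -4.1 2.0",
]
for flagged in flag_residues(rows):
    print(flagged)
```

For a real run, pass the lines of the OUT file, e.g. `flag_residues(open("edstats.out"))`; the side-chain metrics sit 9 columns further along.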

The RSZD scores are accuracy metrics, i.e. at least in theory they can be improved by adjusting the model (by eliminating the obvious difference density), so start by checking the worst offenders first. Use the Fourier and difference maps in your favourite graphics model-building program to guide any adjustments of the model that may be required, in the usual way. Note that positive density deviations are usually more frequent than negative ones, because they represent uninterpretable, as opposed to incorrectly interpreted, density, and are therefore less symptomatic of underlying problems.

The RSZO scores are precision metrics and will be strongly correlated with the B_isos (since that is also a precision metric), i.e. assuming you've fixed any issues with accuracy of that residue there's nothing you can do about the precision, short of re-collecting the data.

The RSR and RSCC (both 'sample' and 'population') metrics are tabulated for comparison but are correlated with both accuracy and precision, so they can be useful in some circumstances, but they don't always help with telling you whether adjustment of the model is required, or whether the problem is actually an intrinsic property of the structure, or lies with the data. Note that the RSR and RSCC metrics vary with the program used, since they depend strongly on the radius cut-off, scaling algorithm and other variables which can vary a lot between programs.

REFERENCES

C-I. Brändén & T.A. Jones Nature (1990). 343, 687-689.
J.D. Gibbons & S. Chakraborti (2003). Nonparametric statistical inference, 4th ed., New York: Marcel Dekker, Inc.
T.A. Jones, J-Y. Zou, S.W. Cowan & M. Kjeldgaard Acta Cryst. (1991). A47, 110-119.
P. Main Acta Cryst. (1979). A35, 779-785.
R.J. Read Acta Cryst. (1986). A42, 140-149.
R.R. Sokal & F.J. Rohlf (1995). Biometry, 3rd ed., New York: WH Freeman.
I.J. Tickle, R.A. Laskowski, & D.S. Moss Acta Cryst. (1998). D54, 243-252.
I.J. Tickle CCP4 Study Weekend (2011). Manuscript of presentation submitted - to be published in Acta Cryst. D.

AUTHOR

EXAMPLES

Example 1

This example illustrates how the maps must be prepared. Failure to follow this recipe is likely to give inaccurate results!

#!/bin/tcsh
# Fix up the map coefficients: FLABEL specifies the label for Fobs &
# σ(Fobs) (defaults are F/SIGF or FOSC/SIGFOSC).  Here, 'in.mtz'
# is the output reflection file from the refinement program in MTZ
# format.

rm -f fixed.mtz
mtzfix  FLABEL FP  HKLIN in.mtz  HKLOUT fixed.mtz  >mtzfix.log
if($?) exit $?

# Good idea to check the mtzfix output before proceeding!

less mtzfix.log

# If no fix-up was needed, use the original file.

if(! -e fixed.mtz)  ln -s  in.mtz fixed.mtz

# Compute the 2mFo-DFc map; you need to specify the correct labels for
# the F and phi columns: 'FWT' & 'PHWT' should work for Refmac.
# Note that EDSTATS needs only 1 asymmetric unit (but will also work
# with more).  Grid sampling must be at least 4.

echo 'labi F1=FWT PHI=PHWT\nxyzl asu\ngrid samp 4.5'  | fft  \
HKLIN fixed.mtz  MAPOUT fo.map
if($?) exit $?

# Compute the 2(mFo-DFc) map; again you need to specify the right
# labels.

echo 'labi F1=DELFWT PHI=PHDELWT\nxyzl asu\ngrid samp 4.5'  | fft  \
HKLIN fixed.mtz  MAPOUT df.map
if($?) exit $?

Example 2

#!/bin/tcsh
# Q-Q difference plot & main- & side-chain residue statistics.

echo resl=50,resh=2.1  | edstats  XYZIN in.pdb  MAPIN1 fo.map  \
MAPIN2 df.map  QQDOUT q-q.out  OUT stats.out
if($?) exit $?

Example 3

#!/bin/tcsh
# Main- & side-chain atom statistics, using chains A & I only & writing
# PDB file with per-atom Z_diff metrics.
echo mole=AI,resl=50,resh=2.1,main=atom,side=atom  | edstats  \
XYZIN in.pdb  MAPIN1 fo.map  MAPIN2 df.map  XYZOUT out.pdb  \
OUT stats.out
if($?) exit $?