sfcheck [HKLIN in.mtz] [XYZIN in.pdb]
[HKLOUT out.mtz] [MAPOUT map.ccp4]
[PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]
Authors: A.A.Vagin, J.Richelle, S.J.Wodak.
email: alexei@ysbl.york.ac.uk
A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of
procedures for evaluating the quality of macromolecular structure-factor
data and their agreement with the atomic model.
Acta Cryst. (1999). D55, 191-205
Copy file sfcheck.tar.gz
and uncompress it (`gunzip sfcheck.tar.gz')
After unpacking `sfcheck.tar' (command: tar xvf sfcheck.tar) you will get an sfcheck directory with src, doc and bin subdirectories. To build the executable, go to src, where you will have the following options:
You can also download binaries (executable files),
all with the memory allocation option:
sfcheck_sgi.gz
sfcheck_alpha.gz
sfcheck_linux.gz
You can use this version in the same ways as the previous one:
1. by command (batch) file
2. interactively
3. by ccp4i
New style of use:
You can run the program from the command line with options (without any keywords):
sfcheck -f file_sf_mtz_or_cif_or_map -m model_pdb_or_cif
-out out -nomit Nomit
-mem Nm -na Na
-scl map_scale_factor -map -invert
-h -r
-po path_out -ps path_scratch
-lf label_F -lsf label_sigF
-li label_I -lsi label_sigI
-lfree label_free_flag
h = help and information about mtz labels
r = rest some special files(.dst,...)
out = y - see nomit option
a - program creates CIFile (sfcheck.hkl)
with anisothermal corrected Fobs
u - CIFile with detwinned data
map = an extracted density map will be created (sfcheck_ext.map),
or a new map if the input was a map (sfcheck.map).
Useful for preparing a mirrored and/or scaled map
invert = the mirrored map will be used
nomit = number of cycles of the omit procedure.
2 is a good choice. It takes time.
If OUT = Y, the program creates a CIFile (sfcheck.hkl)
with omit phases
Nm = memory request in Mb (for f90 only)
Na = maximal number of atoms in the model
label_* = labels for mtz_file
For example:
sfcheck -f file.mtz
or
sfcheck -m file.pdb
or
sfcheck -f file.mtz -m file.pdb
or
sfcheck -f file.mtz -m file.pdb -nomit 2 -map -out y
or
sfcheck -f file.mtz -lf FP -lsf SIGFP
or
sfcheck -f file.mtz -h
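For scripted use, the command strings above can be assembled programmatically. The following is a minimal Python sketch, not part of SFCHECK: the helper name build_sfcheck_command and its parameter names are hypothetical, and it uses only options listed above (-f, -m, -nomit, -map, -out, -lf, -lsf).

```python
import shlex

def build_sfcheck_command(sf_file=None, model_file=None, nomit=None,
                          want_map=False, out=None,
                          label_f=None, label_sigf=None):
    """Assemble an sfcheck argument list from the documented options.

    Hypothetical helper: only flags listed in this document are emitted.
    """
    if sf_file is None and model_file is None:
        raise ValueError("give at least one of sf_file (-f) or model_file (-m)")
    cmd = ["sfcheck"]
    if sf_file is not None:
        cmd += ["-f", sf_file]
    if model_file is not None:
        cmd += ["-m", model_file]
    if nomit is not None:
        cmd += ["-nomit", str(nomit)]
    if want_map:
        cmd.append("-map")
    if out is not None:
        cmd += ["-out", out]
    if label_f is not None:
        cmd += ["-lf", label_f]
    if label_sigf is not None:
        cmd += ["-lsf", label_sigf]
    return cmd

cmd = build_sfcheck_command("file.mtz", "file.pdb", nomit=2,
                            want_map=True, out="y")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if sfcheck is on the PATH; that step is left out here so that the sketch runs without the program installed.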
1.Crystal:
cell parameters and space group
2.Model:
number of atoms
number of water molecules
solvent content
<B> for model
Matthews coefficient and corresponding solvent %
reported resolution
reported R-factor
3.Refinement:
refinement program
resolution range for refinement
reported sigma cut-off for refinement
reported R-factor
reported Rfree
4.Structure factors:
number of reflections
number of reflections with I > sigma
number of reflections with I > 3sigma
resolution range
completeness
R-standard (sum(sigma)/sum(F))
Wilson plot (amplitudes vs. resolution)
overall B-factor by Patterson origin peak and by Wilson plot
optical resolution
expected minimal error in coordinates
Anisotropic distribution of Structure Factors -ratio of Eigenvalues
5.Model vs. structure factors:
R-factor
Correlation coefficient
R-factor for reported resolution range and sigma cut-off
Rfree
Luzzati plot (R-factor vs. resolution)
coordinate error from Luzzati plot
expected maximal error in coordinates
DPI
Patterson scaling - scale, Badd
Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
Solvent correction - Ks,Bs
Optical resolution
Optical resolution is defined as the expected minimum distance
between two resolved peaks in the electron density map.
With a single-Gaussian approximation of the shape of an atomic peak,
the minimum distance between two resolved peaks is twice the standard
deviation "sigma", i.e. the width of the atomic peak W (W = 2 sigma).
The expected width of the atomic peak W is computed as
W = sqrt( 2 (sigma_patt^2 + sigma_res^2) )
where sigma_patt - standard deviation of the Gaussian corresponding
to the Patterson origin peak.
sigma_res - standard deviation of the Gaussian corresponding
to the origin peak of the spherical interference function,
which is the Fourier transform of a sphere in
reciprocal space with radius 1/d_min.
sigma_res = 0.356 d_min.
d_min is the minimum d-spacing, the "nominal resolution".
The "expected optical resolution for complete data set" is
calculated as above but using all reflections, with values for
missing reflections being the average value in the corresponding
resolution shell.
The plot of optical resolution for an atom with B=0 shows
the part of the optical resolution that is due to
series termination.
(for the proof see Appendix)
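As an illustration of the formula for W, here is a small Python sketch. It assumes sigma_patt has already been obtained from a Gaussian fit to the Patterson origin peak (that fit is not shown), and the function name and the numerical inputs are illustrative only.

```python
import math

SIGMA_RES_PER_DMIN = 0.356  # from the text: sigma_res = 0.356 * d_min

def optical_resolution(sigma_patt, d_min):
    """Expected atomic peak width W = sqrt(2 * (sigma_patt^2 + sigma_res^2)).

    sigma_patt: standard deviation of the Patterson origin peak (Angstrom),
                assumed to be known from a separate Gaussian fit.
    d_min:      nominal resolution (Angstrom).
    """
    sigma_res = SIGMA_RES_PER_DMIN * d_min
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res ** 2))

# Illustrative values only.
w = optical_resolution(sigma_patt=0.9, d_min=2.0)
print(f"W = {w:.3f} A")
```

Setting sigma_patt to zero isolates the series-termination part of the width, sqrt(2) * 0.356 * d_min.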
Patterson scaling
Scaling in SFCHECK is based on the Patterson origin peak which is
approximated as a gaussian. Compared to the conventional scaling
by the Wilson plot, this method is particularly advantageous when
only low resolution data are available.
The program gives overall B-factors estimated by both methods.
Low resolution cut-off
Disordered solvent contributes to diffraction at low resolution.
However, removing low resolution data from the calculations results
in a series termination effect which is noticeable in the electron
density at the surface of the molecule. To reduce the influence of
low resolution terms, SFCHECK applies a "soft" low resolution
cut-off to the structure factors according to the formula:
Fnew = Fold * (1 - exp(-Boff*s^2)), where Boff = 2*dmax^2
The program uses Boff = 256.
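The soft cut-off can be sketched in a few lines of Python. This is an illustration rather than the program's code: the function names are hypothetical, and s2 stands for the quantity written s^2 in the formula, whose exact definition (for example 1/d^2 or (sin(theta)/lambda)^2) is not stated here and must be supplied by the caller in the convention that matches Boff.

```python
import math

B_OFF = 256.0  # value quoted in the text

def soft_low_res_weight(s2, b_off=B_OFF):
    """Weight 1 - exp(-Boff * s^2): 0 at s = 0, tends to 1 at high resolution."""
    return 1.0 - math.exp(-b_off * s2)

def apply_soft_cutoff(f_old, s2, b_off=B_OFF):
    """Fnew = Fold * (1 - exp(-Boff * s^2))."""
    return f_old * soft_low_res_weight(s2, b_off)

# The weight rises smoothly instead of switching on at a sharp limit.
for s2 in (0.0, 0.001, 0.01, 0.1):
    print(s2, round(soft_low_res_weight(s2), 4))
```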
Scaling
The program scales Fobs and Fcalc by the Patterson origin peak using all
data, applying Boff.
First, it computes the overall B-factors (Boverall) for the observed
and calculated amplitudes.
Second, it makes the width of the calculated peak equal to that of the
observed peak, i.e. it computes an additional thermal factor Badd:
Badd = Boverall_obs - Boverall_calc
Third, it computes the scale factor for Fcalc:
                       sum( Fobs^2 * (1-exp(-Boff*s^2)) )
scale = sqrt( ------------------------------------------------------ )
               sum( Fcalc^2 * exp(-Badd*s^2) * (1-exp(-Boff*s^2)) )
Finally we have:
Fcalc_scaled = Fcalc * scale * exp(-Badd*s^2)
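The third and final steps can be written out as a short Python sketch. It follows the formulas exactly as printed above and assumes that Badd has already been determined from the two overall B-factors; the function names are hypothetical, and s2 is the per-reflection s^2 in whatever convention the B-factors use.

```python
import math

def patterson_scale(f_obs, f_calc, s2, b_add, b_off=256.0):
    """Scale factor for Fcalc, following the formula as printed in the text."""
    num = 0.0
    den = 0.0
    for fo, fc, x in zip(f_obs, f_calc, s2):
        soft = 1.0 - math.exp(-b_off * x)          # soft low resolution cut-off
        num += fo * fo * soft
        den += fc * fc * math.exp(-b_add * x) * soft
    return math.sqrt(num / den)

def scale_fcalc(f_calc, s2, scale, b_add):
    """Fcalc_scaled = Fcalc * scale * exp(-Badd * s^2)."""
    return [fc * scale * math.exp(-b_add * x) for fc, x in zip(f_calc, s2)]

# Toy data: Fobs is exactly twice Fcalc and no extra B-factor is needed.
f_calc = [10.0, 20.0, 30.0]
f_obs = [20.0, 40.0, 60.0]
s2 = [0.01, 0.04, 0.09]
scale = patterson_scale(f_obs, f_calc, s2, b_add=0.0)
print(scale, scale_fcalc(f_calc, s2, scale, b_add=0.0))
```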
The program computes R-factor and Correlation coefficient for all
data applying the soft low resolution cut-off as described above.
The program also computes R-factor and Correlation coefficient for
the reported resolution range and reported sigma cut-off without
applying Boff. If the Fobs file contains reflections marked with
the Rfree flag, the program computes Rfree.
Completeness
Missing data are restored by using the average values of
intensities for the corresponding resolution shell.
The program produces a plot of completeness vs. resolution and
a plot of the average radial completeness in polar coordinates
theta and phi.
Expected minimal error
The minimal coordinate error is estimated using the experimental
sigma(F) values. The standard deviation of atomic coordinates is
sig_min(r) = sqrt(3)*sigma(slope)/curvature
where sigma(slope) is the standard deviation of the slope of the
electron density in the x direction (along A),
curvature is the average curvature of the electron
density at the atomic peak centre.
They are computed as:
sigma(slope) = (2*pi*sqrt(sum(h^2*sigF^2)))/(VOL*A)
VOL - volume of the unit cell
A - cell parameter
h - Miller index
summation over all reflections
(Cruickshank, D.W.J. (1949) Acta Cryst. 2, 65.)
curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)
(Murshudov et al. (1997) Acta Cryst. D53, 240.)
If there are no experimental sigmas for the observed data, the program
uses sigma = Fobs * 0.04 for all reflections.
Expected maximal error
The expected maximal error in coordinates is estimated from the difference
between |Fobs| and |Fcalc|:
sig_max(r) = sqrt(3)*sigma(slope)/curvature
sigma(slope) = (2*pi*sqrt(sum(h^2*(Fobs-Fcalc)^2)))/(VOL*A)
curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)
For missing reflections the program uses the average value of
sigma(Fobs) for the corresponding resolution shell instead
of (Fobs-Fcalc).
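The minimal and maximal error estimates share one form and differ only in the per-reflection error term: sigma(F) for the minimal error, (Fobs-Fcalc) for the maximal error. The Python sketch below illustrates this. The function names are hypothetical, the prefactor "2pi2" of the curvature is read as 2*pi^2, and the amplitude F in the curvature is assumed to be Fobs; both readings are assumptions.

```python
import math

def default_sigmas(f_obs, fraction=0.04):
    """Fallback from the text: sigma = 0.04 * Fobs when no sigmas are given."""
    return [fraction * f for f in f_obs]

def slope_sigma(h, err, volume, a):
    """sigma(slope) = 2*pi*sqrt(sum(h^2 * err^2)) / (VOL * A)."""
    total = sum((hi * e) ** 2 for hi, e in zip(h, err))
    return 2.0 * math.pi * math.sqrt(total) / (volume * a)

def curvature(h, f, volume, a):
    """curvature = 2*pi^2 * sum(h^2 * F) / (VOL * A^2)  (prefactor assumed)."""
    total = sum(hi * hi * fi for hi, fi in zip(h, f))
    return 2.0 * math.pi ** 2 * total / (volume * a * a)

def coordinate_error(h, f, err, volume, a):
    """sqrt(3) * sigma(slope) / curvature for a given per-reflection error."""
    return math.sqrt(3.0) * slope_sigma(h, err, volume, a) / curvature(h, f, volume, a)

# Toy data, for illustration only.
h = [1, 2, 3]
f_obs = [100.0, 80.0, 50.0]
f_calc = [95.0, 85.0, 48.0]
volume, a = 50000.0, 40.0

sig_min = coordinate_error(h, f_obs, default_sigmas(f_obs), volume, a)
sig_max = coordinate_error(h, f_obs,
                           [fo - fc for fo, fc in zip(f_obs, f_calc)],
                           volume, a)
print(sig_min, sig_max)
```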
DPI - diffraction-data precision indicator
Cruickshank's method of estimating the coordinate error.
(The Refinement of Macromolecular Structure, Proceedings
of CCP4 Study Weekend, pp. 11-22, 1996)
sig(x) = sqrt(Natoms/(Nobs-4*Natoms)) * C^(-1/3) * dmin * Rfact
where C - fractional completeness
Rfact - conventional crystallographic R-factor
Nobs - number of reflections
dmin - maximal resolution
If Rfree flags are specified, the program uses Murshudov's approach
to calculate DPI:
(Newsletter on Protein Crystallography, Daresbury
Laboratory, (1997) 33, pp. 25-30.)
sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree
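Both DPI variants are simple closed-form expressions. The Python sketch below evaluates them for made-up input values; the function names are hypothetical and the numbers are illustrative, not taken from any real structure.

```python
import math

def dpi_cruickshank(n_atoms, n_obs, completeness, d_min, r_fact):
    """sig(x) = sqrt(Natoms/(Nobs - 4*Natoms)) * C^(-1/3) * dmin * Rfact."""
    dof = n_obs - 4 * n_atoms
    if dof <= 0:
        raise ValueError("Nobs must exceed 4*Natoms for this formula")
    return math.sqrt(n_atoms / dof) * completeness ** (-1.0 / 3.0) * d_min * r_fact

def dpi_rfree(n_atoms, n_obs, completeness, d_min, r_free):
    """sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree."""
    return math.sqrt(n_atoms / n_obs) * completeness ** (-1.0 / 3.0) * d_min * r_free

# Illustrative numbers only.
print(dpi_cruickshank(1000, 20000, 1.0, 2.0, 0.20))
print(dpi_rfree(1000, 16000, 1.0, 2.0, 0.25))
```

The Rfree form stays defined when Nobs is smaller than 4*Natoms, a case in which the R-factor form cannot be evaluated.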
Luzzati plot (R-factor vs. resolution)
The program computes the average radial error <delta> in coordinates
from the Luzzati plot.
<delta(r)> = 1.6 sig(x)
Solvent content
Solvent content is the fraction of the unit cell volume not occupied
by the model. The model consists of ALL atoms present in the coordinate
file.
Residual factor Rmerge
sum_i (sum_j |Ij - <I>|)
Rmerge(I) = --------------------------
sum_i (sum_j (<I>))
Ij = the intensity of the jth observation of reflection i
<I> = the mean of the intensities of all observations of
reflection i
sum_i is taken over all reflections
sum_j is taken over all observations of each reflection
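The definition of Rmerge translates directly into code. The Python sketch below assumes the unmerged observations are grouped per reflection in a dictionary keyed by (h, k, l); that data layout and the function name are choices made for this illustration only.

```python
def r_merge(observations):
    """Rmerge(I) = sum_i sum_j |Ij - <I>| / sum_i sum_j <I>.

    observations: dict mapping a reflection index (h, k, l) to the list of
    intensities measured for that reflection (assumed layout).
    """
    num = 0.0
    den = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += mean_i * len(intensities)
    return num / den

# Toy data: one reflection with scatter, one measured consistently.
obs = {
    (1, 0, 0): [90.0, 110.0],
    (0, 1, 0): [200.0, 200.0],
}
print(r_merge(obs))
```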
Local error estimation (plotted for each residue, for the backbone
and for the side chain):
1. Amplitude of displacement of atoms from electron density
2. Density correlation coefficient
3. Density index
4. B-factor
5. Index of connectivity
Displacement
Displacement of atoms from electron density is estimated from the
difference (Fobs - Fcal) map. The displacement vector is the ratio of
the gradient of difference density to the curvature. The amplitude of
the displacement vector is an indicator of the positional error.
Correlation coefficient
The density correlation coefficient is calculated for each residue
from the atomic densities of the (2Fobs-Fcalc) map - "Robs" - and of
the model map (Fcalc) - "Rcalc":
D_corr = <Robs><Rcalc>/sqrt(<Robs^2><Rcalc^2>)
where <Robs> is the mean of the "observed" densities of the atoms of
the residue (backbone or side chain).
<Rcalc> is the mean of the "calculated" densities of the atoms of
the residue.
The density value for an atom, taken from the map R(x), is:
        sum_i ( R(xi) * Ratom(xi - xa) )
Dens = ----------------------------------
        sum_i ( Ratom(xi - xa) )
where Ratom(x) is the atomic electron density at grid point x.
xa - vector of the centre of the atom.
xi - vector of the i-th grid point.
The sum is taken over all grid points whose distance
from the centre of the atom is less than Radius_limit.
For all atoms Radius_limit = 2.5 A.
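The weighted density and the correlation coefficient can be sketched in Python as follows. This is an illustration under stated assumptions: grid points are given as Cartesian coordinates in Angstrom, the atomic shape Ratom is approximated by a hypothetical spherical Gaussian (gaussian_atom, with an arbitrary sigma), and all function names are invented for the example.

```python
import math

RADIUS_LIMIT = 2.5  # Angstrom, from the text

def gaussian_atom(dist, sigma=0.7):
    """Hypothetical spherical atomic shape; stands in for Ratom(xi - xa)."""
    return math.exp(-dist * dist / (2.0 * sigma * sigma))

def atom_density(grid_points, map_values, atom_centre, atom_shape=gaussian_atom):
    """Dens = sum_i R(xi)*Ratom(xi-xa) / sum_i Ratom(xi-xa), within Radius_limit."""
    num = 0.0
    den = 0.0
    for xi, rho in zip(grid_points, map_values):
        dist = math.dist(xi, atom_centre)
        if dist < RADIUS_LIMIT:
            weight = atom_shape(dist)
            num += rho * weight
            den += weight
    return num / den

def density_correlation(r_obs, r_calc):
    """D_corr = <Robs><Rcalc> / sqrt(<Robs^2><Rcalc^2>) over a residue's atoms."""
    n = len(r_obs)
    mean_obs = sum(r_obs) / n
    mean_calc = sum(r_calc) / n
    mean_obs2 = sum(x * x for x in r_obs) / n
    mean_calc2 = sum(x * x for x in r_calc) / n
    return mean_obs * mean_calc / math.sqrt(mean_obs2 * mean_calc2)

# Toy grid: the third point lies outside Radius_limit and is ignored.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
values = [2.0, 2.0, 100.0]
print(atom_density(points, values, (0.0, 0.0, 0.0)))
print(density_correlation([1.0, 1.0], [2.0, 2.0]))
```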
Index of density and index of connectivity
The index of connectivity is the geometric mean (the cube root of the
product) of the (2Fobs-Fcal) electron density values for the backbone
atoms N, CA and C. Low values of this index indicate breaks
in the backbone electron density which may be due to flexibility of
the chain or incorrect tracing. The index of density is a similar
indicator which is calculated for all atoms of a given residue.
An omit map is a way to reduce the model bias in the electron
density calculated with model phases. SFCHECK produces the so
called total omit map by an automatic procedure. First, the
initial (Fobs, PHImodel) map is divided into N boxes. For each
box, the electron density in it is set to zero and new phases are
calculated from this modified map. A new map is calculated using
these phases and Fobs. This map contains the omit map for the
given box which is stored until the procedure is repeated for
all boxes. At the end, all the boxes with omit maps compose
the total omit map. Phases calculated from the total omit map
are combined with the initial phases. The whole procedure may
be repeated (keyword NOMIT). Note: it is time consuming!
Program can create output file with omit phases (see keyword OUT)
The program can also be run with only one input file (coordinates
or structure factors). In this case it reports the information derived
from that file, without local estimation.
The program checks for merohedral twinning.
Perfect twinning test: <I^2>/<I>^2
Also (if possible)
the program will compute the partial twinning test:
H = |I(h1)-I(h2)|/(I(h1)+I(h2))
Alpha (twinning fraction) = 1/2 - <H>
If 0.05 < Alpha < 0.45 the program can create an output file
with detwinned data (see keyword OUT)
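The two twinning statistics can be sketched in Python as below. The sketch assumes that the twin-related reflection pairs (h1, h2) have already been identified for the twin law in question, which is the non-trivial part and is not shown; the function names are hypothetical.

```python
def perfect_twinning_ratio(intensities):
    """<I^2>/<I>^2 over a set of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)

def twin_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I(h1)-I(h2)| / (I(h1)+I(h2)).

    pairs: list of (I(h1), I(h2)) for twin-related reflections (assumed given).
    Pairs with a non-positive sum are skipped to avoid division by zero.
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(h_values) / len(h_values)
    return 0.5 - mean_h

def detwinning_possible(alpha):
    """Range quoted in the text for writing detwinned data."""
    return 0.05 < alpha < 0.45

# Toy pairs, for illustration only.
alpha = twin_fraction([(60.0, 40.0), (30.0, 20.0)])
print(alpha, detwinning_possible(alpha))
```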
SFCHECK is easy to use interactively, but it can also be run in batch mode. The best and easiest way to prepare a command file is to run SFCHECK once in dialogue mode. If a sfcheck.log file was requested (first question), the program creates a command (batch) file (sfcheck.bat) automatically.
See some command (batch) file examples.
All keywords must be preceded by an underscore (e.g. _DOC). The available keywords are:
The first keyword must always be defined:
DOC
One or both of these keywords must be defined:
FILE_C
FILE_F
Other keywords:
NOMIT OUT MAP PATH_SCR TEST SCL INVER
To get started with SFCHECK interactively, you first have to answer this question:
Do you want to have FILE-DOCUMENT sfcheck.log? < N | Y >
_DOC:
Default: <N>
The DOC-file contains the protocol of the running of the program. With the DOC-file, the program creates a command (batch) file: sfcheck.bat.
You can also use the keyword DOC to redirect the output files:
sfcheck.log
sfcheck.bat
sfcheck_XXX.ps
sfcheck.hkl
sfcheck_ext.map
sfcheck.map
to a specific directory ( _DOC Y>path or _DOC >path). Examples:
_DOC Y>/y/people/alexei/
or
_DOC >/y/people/alexei/
Default: < >
Default: < >
When using an MTZ file, MTZ keywords must be used (or program will use default values).
Default: <0>
<nomit> is the number of cycles of the omit procedure. 2 is a good choice.
Default: <N>
Default: <N>
Default: < >
Default: <1 >
Default: <N>
Default: <N>
The output information is represented in the PostScript file:
sfcheck_<identifier>.ps
sfcheck_map.ps (if input was map)
A simple ASCII version of this file is in:
sfcheck.log
Also the program can create:
a new formatted CIFile of Fobs: sfcheck.hkl (keyword OUT)
a file of density around model: sfcheck_ext.map (keyword MAP)
/CCP4 format for CCP4 distribution or BLANC format/
a new map if input was map: sfcheck.map
Some other files will not be deleted if keyword TEST = Y.
These files have internal format of the BLANC program
suite (see file README by ftp from anonymous @ftp.ysbl.york.ac.uk)
and can be used by programs of this suite.
sfcheck_fob.dat - BLANC_Fobserved_file
sfcheck_ph.dat - BLANC_phases of the model
sfcheck_omit_ph.dat - BLANC_omit_phases
sfcheck_detwin.dat - BLANC_detwinned_Fobs
You can use the keyword PATH_SCR to redirect all scratch files to a specific directory. Example:
_PATH_SCR /y/people/alexei/
You can use the keyword DOC to redirect the output files:
sfcheck_<identifier>.ps
sfcheck.hkl
sfcheck_ext.map
sfcheck.map
and also (if keyword TEST = Y)
sfcheck_fob.dat
sfcheck_ph.dat
sfcheck_omit_ph.dat
sfcheck_detwin.dat
to a specific directory. Examples:
_DOC Y>path or _DOC >path
A CCP4 version of SFCHECK is available which can read an MTZ file
or an EM map (CCP4 format) and create a file with the extracted density
around the model, or a new mirrored and/or scaled map (CCP4 format).
1. This option uses the CCP4 libraries.
You must set up CCP4 beforehand.
2. Keywords for reading an MTZ file.
The following keywords are needed only for an MTZ file:
F - label of F or F(+)
SIGF - label of sigma F or sigma F(+)
F- - label of F(-)
SIGF- - label of sigma F(-)
FREE - label of Free_flag
I - label of I or I(+)
SIGI - label of sigma I or sigma I(+)
I- - label of I(-)
SIGI- - label of sigma I(-)
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC Y
#
_FILE_C model.pdb
_FILE_F fobs.cif
#
#
_END
stop
In this case all output files will be in directory: /y/people/alexei/
and all scratch files will be created in directory: /y/people/alexei/work/
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC >/y/people/alexei/
#
_FILE_C model.pdb
_FILE_F fobs.cif
#
#
_NOMIT 2
_OUT Y
_path_scr /y/people/alexei/work/
_END
stop
In this case the coordinate file isn't used.
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC Y
#
_FILE_C
_FILE_F p1.mtz
#
_F FO
_SIGF SDFO
_END
stop
1. Input PDB_file of coordinates
Input PDB_file of coordinates must contain the CRYST1 card with
the unit cell and the space group name.
Program can use the information from HEADER,SCALE,MTRIX,REMARK cards.
2. Input formatted file of structure factors
This file of structure factors must be in PDB format or a CIFile
containing indices and structure factors or intensities.
(A simple formatted file with "h,k,l,|F|,sig(F)" or "h,k,l,|F|"
and without titles is also acceptable.)
A CIFile is the preferred format.
A. Example of a CIfile of structure factor amplitudes:
data_structure_9ins
_entry.id 9ins
_struct.title ' insuline 9ins'
_cell.length_a 100.000
_cell.length_b 100.000
_cell.length_c 100.000
_cell.angle_alpha 90.000
_cell.angle_beta 90.000
_cell.angle_gamma 90.000
_symmetry.space_group_name_H-M 'P 1 21 1'
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_au_sigma
2 3 4 12.3 1.2
-2 -3 -4 11.4 1.1
. . . . . . . . . . . . .
or just:
data_structure_9ins
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_au_sigma
2 3 4 12.3 1.2
-2 -3 -4 11.4 1.1
. . . . . . . . . . . . .
For intensities use:
_refln.intensity_meas
_refln.intensity_sigma
B. Example of a PDB file of structure factor amplitudes:
HEADER R2SARSF 15-JAN-91
COMPND RIBONUCLEASE SA (E.C.3.1.4.8) COMPLEX WITH 3'-*GUANYLIC ACID
SOURCE (STREPTOMYCES $AUREOFACIENS)
AUTHOR J.SEVCIK,E.J.DODSON,G.G.DODSON
CRYST1 64.900 78.320 38.790 90.00 90.00 90.00 P 21 21 21 8
CONTNT H,K,L,S,FOBS,SIGMA(FOBS)
FORMAT (2(I3,2I4,2F7.0,F6.0,9X))
COORDS 2SAR
REMARK 1 TWO REFLECTIONS PER RECORD.
REMARK 2 DMIN=1.85, DMAX=16.28
CHKSUM 1 MIN H=0,MAX H=34,MIN K=0,MAX K=41,MIN L=0,MAX L=20
CHKSUM 2 TOTAL NUMBER OF REFLECTIONS=17346
CHKSUM 3 TOTAL NUMBER OF REFLECTION RECORDS=8673
CHKSUM 4 SUM OF FOBS=0.235499E+07
0 0 3 60 9 16 0 0 4 106 307 25
0 0 5 166 23 20 0 0 6 239 657 52
0 0 7 326 0 38 0 0 8 425 511 40
. . . . . . . . . . . . . . . . . . . . . .
C. Example of a simple formatted file of structure factor amplitudes
which is assumed to contain H,K,L,F,sig(F):
2 3 4 12.3 1.2
-2 -3 -4 11.4 1.1
. . . . . . . . . . . . .
or without sig(F):
2 3 4 12.3
-2 -3 -4 11.4
. . . . . . . . .
The length of file records must not exceed 80 characters.
The format of the records is free, i.e. data must be separated by
blanks. (be careful - some PDB files do not satisfy this rule)
The program uses the information about cell parameters and space
group from the coordinate file and ignores such information in
the structure factor file.
Memory control parameters (in main_sfcheck_ccp4.f):
C     MEMORY - memory for densities, gradients, coordinates, ...
      PARAMETER ( MEMORY=5000000)
      REAL POOL(MEMORY)
C     NCRDMAX - maximal number of coordinates
      PARAMETER ( NCRDMAX=200000)
C     IPRSYM - maximal number of symmetry operators
      PARAMETER ( IPRSYM=96 )
      INTEGER*2 ISYM(5,3,IPRSYM)
C     ISYM - integer*2 array for cryst.symmetry operators
C     IPRSYM - dimension of integer*2 array ISYM(5,3,IPRSYM)
C              maximal number of cryst.symmetry operators.
C     MEMORY - dimension array POOL.
C     MEMORY = MAPMAX + (NCRDMAX/2)*5 , where MAPMAX - maximal
C              size of XY-section (NX*NY)
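To judge whether the compiled-in MEMORY value is large enough for a given map, the relation above can be evaluated directly. The Python sketch below is only a convenience calculation: the function name is hypothetical, and integer division is used to mirror Fortran's NCRDMAX/2.

```python
def required_memory(nx, ny, ncrdmax=200000):
    """MEMORY = MAPMAX + (NCRDMAX/2)*5, with MAPMAX = NX*NY (one XY-section)."""
    mapmax = nx * ny
    return mapmax + (ncrdmax // 2) * 5  # // mirrors Fortran integer division

# Example: a 1000 x 1000 section with the default NCRDMAX.
needed = required_memory(1000, 1000)
print(needed, needed <= 5000000)
```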
Estimation of the width of the atomic peak from the Patterson origin peak.
The Fourier transform of the atomic Gaussian
1/(2 pi sigma_four^2)^(3/2) * exp( -r^2/(2 sigma_four^2) )
where sigma_four is the standard deviation of the Gaussian,
is also a Gaussian:
exp( - B s^2 / 4 )   where B = 8 pi^2 sigma_four^2
The Patterson function, calculated as the Fourier transform of the square
of the reciprocal-space Gaussian:
exp( - 2 B s^2 / 4 )
is also a Gaussian, with standard deviation (for an infinite Fourier series)
sigma_patt_0^2 = 2B / (8 pi^2) = 2 sigma_four^2
The effect of series termination of the Fourier transform can be considered
as the product, in reciprocal space, of the infinite set of
Fourier coefficients and a sphere with radius 1/d_min, where
d_min is the minimum d-spacing. The product in reciprocal space
corresponds to a convolution in Patterson space.
The Fourier image of the sphere is the spherical interference
function T(r) (Int. Tables, 1993, vol. B, p. 247):
T(r) = 3 ( sin(x) - x cos(x) ) / x^3    where x = 2 pi r (1/d_min)
Using a Taylor expansion (T(r) ~ 1 - x^2/10 near the origin), the origin
peak of the function T(r) can be approximated by a Gaussian:
exp( - r^2 / (2 sigma_res^2) )
where sigma_res is the standard deviation of the Gaussian,
sigma_res = ( d_min * sqrt(5) ) / (2 pi) = 0.356 * d_min
This result is identical to the optical definition of resolution
(Blundell, 1976), (James, 1948)
as twice the distance from the maximum to the first zero of the image
of a point source. In the 3-dimensional case the coordinate of the first
zero is 0.715 d_min ~ 2 sigma_res.
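The two constants quoted here can be checked numerically. The Python sketch below evaluates sqrt(5)/(2 pi) and locates the first zero of T(r) by bisection on sin(x) - x cos(x); the bracketing interval (pi, 3*pi/2) and the function names are choices made for this check.

```python
import math

# sigma_res / d_min from the Taylor expansion: sqrt(5) / (2*pi)
SIGMA_RES_COEFF = math.sqrt(5.0) / (2.0 * math.pi)

def interference(x):
    """Spherical interference function T = 3*(sin x - x cos x)/x^3, x > 0."""
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def first_zero(lo=math.pi, hi=1.5 * math.pi, tol=1e-12):
    """First positive root of sin x - x cos x = 0, found by bisection."""
    def g(x):
        return math.sin(x) - x * math.cos(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Convert the root in x to a distance in units of d_min: r = x / (2*pi).
zero_over_dmin = first_zero() / (2.0 * math.pi)
print(round(SIGMA_RES_COEFF, 3), round(zero_over_dmin, 3))
```

Close to the origin, interference(x) and exp(-x^2/10) agree to within a term of order x^4, which is the sense in which the Gaussian approximates the origin peak.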
The standard deviation 'sigma' of a Gaussian which is the convolution of
two Gaussians with standard deviations sigma_1 and sigma_2 is given by
sigma^2 = sigma_1^2 + sigma_2^2
Therefore the standard deviation of the Patterson origin peak with a finite
Fourier series is
sigma_patt^2 = sigma_patt_0^2 + sigma_res^2
The standard deviation of the expected atomic peak for a finite Fourier series is
sigma_four^2 = sigma_patt_0^2/2 + sigma_res^2 =
             = sigma_patt^2/2 + sigma_res^2/2
Finally, the expected width of the atomic peak is:
W = 2 sigma_four = sqrt( 2 ( sigma_patt^2 + sigma_res^2 ) )