RESTRAIN version 4.6

A MACROMOLECULAR REFINEMENT PROGRAM MINIMISING A FUNCTION CONTAINING TERMS INVOLVING:

  STRUCTURE AMPLITUDES
  PHASES
  INTERATOMIC DISTANCES
  GROUP PLANARITY
  ISOTROPIC THERMAL PARAMETER DIFFERENCES
  ANISOTROPIC THERMAL PARAMETER DIFFERENCES

with respect to

  OVERALL SCALE FACTOR
  OVERALL ISOTROPIC THERMAL PARAMETER
  OVERALL ANISOTROPIC THERMAL PARAMETERS
  BULK SOLVENT PARAMETERS
  ATOMIC COORDINATES
  RIGID BODY ROTATIONS AND TRANSLATIONS
  NON-CRYSTALLOGRAPHIC SYMMETRY OPERATORS
  ATOMIC ISOTROPIC THERMAL PARAMETERS
  ATOMIC ANISOTROPIC THERMAL PARAMETERS
  GROUP ISOTROPIC THERMAL PARAMETERS
  GROUP ANISOTROPIC THERMAL PARAMETERS
  GROUP TLS TENSOR COMPONENTS
  ATOMIC, GROUP AND COUPLED OCCUPANCIES
Contact: Ian Tickle (tickle@mail.cryst.bbk.ac.uk).
The design and implementation follow papers by Waser (1963), Rollett (1969), Moss (1981), Moss & Morffew (1982), Haneef et al. (1985) and Driessen et al. (1989).
The function minimised is of the form:
M = SUM [w(f) (|Fo| - G.|Fc|)^2]
  + SUM [w(p) (PHIo - PHIc)^2]
  + SUM [w(d) (d(t) - d(c))^2]
  + SUM [w(b) (b(o) - b(min))^2]
  + SUM [w(U) delta-U^2]
  + SUM [w(Ua) delta-Ua^2]
  + SUM [w(v) |V|^2]
  + SUM [w(c) (d(t) - d(c))^2]                    (1)
The non-bonded interaction is only operational when b(o) < b(min) and chirality restraints are applied as distance restraints along the edges of chiral tetrahedra. Equation (1) may be written as a function of three terms: M = M(a) + M(b) + M(c). M(a) is the first term and is the one conventionally found in crystallographic least-squares procedures. M(b) is the second term which allows the use of estimates of phases from isomorphous and/or anomalous data. M(c) is the sum of the remaining terms and represents pseudo-potential energy terms.
The function M may be minimised with respect to a selection of the following parameters:
Although RESTRAIN has been written primarily for refinement of macromolecular structures, the use of a user defined dictionary for interatomic and planar restraints and other options allows the user to specify additional interatomic restraints and planes, and means that virtually any structure can be refined by the program. The program at present uses a four-Gaussian expansion of scattering factors (INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol. IV). Coefficients for this expansion suitable for X-ray or neutron diffraction may be read from the dictionary.
The program is completely general and may be used for any number of reflections in any space group. The program can be used for any size of problem. The number of atoms which may be refined is only limited by the available memory of the computer used. Array sizes are increased by a global change of the relevant variables in PARAMETER statements in an INCLUDE file (common.inc), followed by re-compilation of the source file.
At Birkbeck College this program has been used for refinement of protein and nucleic acid structures using X-ray or neutron diffraction data. It has generally been used in conjunction with model building using an interactive graphics system. The program has been set up so that the input/output interfaces easily with the graphics model building program O (Jones 1991) and FFT programs. Coordinate files have the standard PDB format. Reflection input files may be either formatted or unformatted (CCP4 MTZ).
Note that some of these subroutines (the monoclinic and orthorhombic ones) may be used for a higher symmetry space group provided it is a super-group with the same origin. For example P213 is a super-group of P212121 with no origin shift.
User friendliness of input/output has been an important criterion in the design of RESTRAIN. No preparation programs need be used. The authors have endeavoured to print sensible error messages on job failure, and to intercept lethal input. Any suggestions for improvement will be welcome.
File                            File name         Explanation
- Script with control and       none              section 3.1
  steering data
- Dictionary                    DICTION           section 3.2
- Coordinates                   XYZIN             section 3.3
- Optional group thermal        TLSIN             section 3.4
  parameters
- Optional reflections          HKLIN or REFIN    section 3.5
Alternatively you may have the control and steering data in an input file separate from the job script.
Care must be taken in preparing the coordinates for refinement. After each polymer chain a TER record must be inserted. This includes breaks in the chain due to one or more missing residues. The residues need not be numbered sequentially and the residue labels may contain non-numeric characters at any position. However, to maintain compatibility with the PDB standard format it is advisable to restrict the use of non-numeric characters to alphabetic characters, and then only in the last character position (residue insertion code).
The C-terminal residue of a protein chain may have an extra O (carboxyl) or N (amide) atom, but it must be put in a separate residue (CAR or CAM) with the atom label OXT or NXT. All atoms not contained in chains must be supplied as HETATMs. The atom labels in a residue must correspond exactly (i.e. in case and justification) with the supplied dictionary, and there must be no missing or extra atoms. Missing atoms can be dealt with by temporarily renaming the residue (e.g. for missing protein side-chain rename to GLY or ALA). Extra disordered atoms must be supplied after the TER record as HETATMs; extra distance restraints will have to be supplied for these atoms.
The PDB file may contain either Uiso's or Biso's, but the appropriate steering parameter must be specified (ISO=true and BINPUT=false or true respectively). The file may also contain anisotropic U's in the standard PDB format.
After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8Å^2 or B > 64Å^2) to more reasonable starting values (e.g. U = 0.2Å^2 or B = 16Å^2).
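The U and B values quoted above are related by the standard conversion B = 8.pi^2.U. The following is a minimal sketch of that conversion and of the suggested reset, not part of RESTRAIN itself; the function names and the default thresholds (taken from the example values above) are assumptions made for illustration.

```python
import math

# Standard relation between the two isotropic conventions (both in Å^2).
EIGHT_PI_SQ = 8.0 * math.pi ** 2


def u_to_b(u):
    """Convert an isotropic U value to the equivalent B value."""
    return EIGHT_PI_SQ * u


def b_to_u(b):
    """Convert an isotropic B value to the equivalent U value."""
    return b / EIGHT_PI_SQ


def reset_large_u(u_values, u_max=0.8, u_reset=0.2):
    """Return a copy in which every U above u_max is replaced by u_reset.

    u_max and u_reset are hypothetical defaults based on the example
    values in the text; choose values suited to your own structure.
    """
    return [u_reset if u > u_max else u for u in u_values]
```

With these definitions U = 0.8 corresponds to B of about 63 and U = 0.2 to B of about 16, which agrees with the paired values given in the text.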
The atomic coordinates in the polymer chains need not be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary. If they are not they will be re-ordered and the output file of atomic coordinates will then be produced in dictionary order for subsequent cycles. Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.
After each run (1 or more cycles) 1 or more output files will be created:
File               Filename            Explanation
- coordinates      XYZOUT              section 4.2
- TLS parameters   TLSOUT              section 4.3
- reflections      HKLOUT or REFOUT    section 4.4
- normal matrix    MATOUT              section 4.5
Furthermore the listing of the run (section 4.1) will have to be examined closely, since the steering data may need to be updated for the next run, especially G, U, SB1 and SB2 (section 3.1.3). You should update the cycle number CYCNO, so that you keep track of how many cycles you have done, and later relate this to the R-factor.
If you are refining NCS parameters you will need to supply updated parameters. You may also want to change the weighting coefficients for the reflections WF(i), section 3.1.3. All the required parameters are always printed at the end of every log file whenever new values have been computed; these can be pasted into the steering data ready for the next run. Refined coordinates and group thermal parameters can be read back in by the program without modification. In order to obtain output reflections define HKLOUT or REFOUT (section 3.1.1).
The input that is necessary and the sections that are relevant to you depend on the application for which you intend to use RESTRAIN. There are basically two categories:
The following options are available, either separately or in combination to refine a set of coordinates from low to high resolution. However note that some combinations do not make sense, and will cause abnormal program termination, for example if both RIGID coordinate groups and UISO/UANISO/TLS thermal parameter groups are defined, the thermal parameter groups must be completely contained within the coordinate groups, otherwise application of the refined RIGID body rotations and translations to the thermal parameters would destroy the correlations within the thermal parameter group.
You will normally start by setting ISO=false to get an overall thermal parameter U and scale factor G. At this stage you may still want to include the MIR or MIRAS phases in the refinement. Set PHAS=true and make sure that the input reflection file contains these phases. However, if your low-resolution model is reasonable, you may not want to use these data.
Unless phasing extends to a resolution better than 3Å, you may find that refinement is best begun by breaking the structure up into rigid body segments and refining these as strictly rigid bodies. Set RIGID=true and specify RIGID groups; structure outside the rigid groups will not be refined. Such segments may be as small as one residue or one side chain.
If the bonds between such segments become seriously disrupted during rigid body refinement, then those parts may have to be rebuilt on a graphics system; otherwise the structure may be annealed by restrained refinement. Remember that refinement cannot usually correct errors which are larger than one third of the high resolution cut-off.
Regions of the structure which are more highly disordered may have to be omitted initially if maps show no clear main chain density. In this case the structure will have to be broken up into extra chains with TER records at the end of each chain. If the main chain density is clear and the side chain is unclear or the sequence at this point is uncertain then the residue should be treated as ALA or GLY in the case of proteins. Remember that the number of atoms in a residue in the coordinates must correspond with the number of atoms in the residue of that name in the dictionary.
Initially the data-parameter ratio will be unfavourable and the normal matrix for the positional parameters ill-conditioned. In the first cycles at low resolution you will normally get large shifts.
In the case where only one molecule is present in the asymmetric unit it is best to start by refining the six rigid body parameters from the molecular replacement by using RIGID=true and the RIGID specification to delineate the molecule. After convergence it may be possible to break the structure up into large chunks, e.g. in the case of domains. See sections 2.3.2 and 2.3.5 for further information.
In the case where more than one molecule is present in the asymmetric unit, one may want to proceed as with one molecule. However, it is possible at low to intermediate resolution to save on time and parameters by refining the structure making use of non-crystallographic symmetry and then only to rebuild one molecule on the graphics before further refinement. There are two modes to deal with non-crystallographic symmetry.
MODE 1:
The sole purpose of MODE 1 is to enable the refinement of the
relative positions of up to 14 identical molecules (or
subunits) in the crystallographic asymmetric unit by applying
rigid body refinement. MODE 1 is usually used in the earlier
stages of refinement in which case the transformations relating
the molecules may come from molecular replacement studies.
The orthogonal coordinates of one molecule are supplied along with the transformations operating on these coordinates which generate the coordinates of up to 14 molecules.
Set RIGID=true and define one or more RIGID bodies as before, and the molecules will then be refined as independent rigid bodies. Output will be the refined coordinates of the generated molecules, and the refined transformations, which should be input to the next cycle.
Note that the program will not notify you if the same molecule is generated twice. This may happen if a dimer is supplied and also generated. You therefore must make sure that only one molecule and the correct transformation are used by the program.
MODE 2:
The purpose of MODE 2 is to assist a user who has more than one
molecule (or subunit) in the asymmetric unit and who wishes to
refine these molecules while imposing the condition that they
are structurally identical. This is useful in the earlier stages
of refinement (possibly after the use of MODE 1) as it saves
having to manually adjust the coordinates of more than one
molecule.
Input is the same as for MODE 1 except that RIGID=false and RIGID specifications are absent (see above). The transformations supplied are used as extra "equivalent positions" and the refinement produces an asymmetric unit where the molecules are identical and tend to an average of the real molecules.
The coordinates of only one molecule are written out and the same transformations are supplied for subsequent cycles. As in MODE 1, it is important to make sure that you do not generate a molecule that has already been read in.
See sections 2.3.2 and 2.3.5 for further information.
When small errors in isomorphism are present, it may be useful to refine the protein in CONSTRAINED mode before difference Fouriers are calculated. Set RIGID=true and use RIGID groups. Use only an overall Uiso. After difference Fouriers and building in the ligand, it may be advisable to refine the ligand and the macromolecule in CONSTRAINED-RESTRAINED mode by setting RIGID=false and defining RIGID groups (see above). See sections 2.3.2 and 2.3.5 for further information.
How to proceed at intermediate resolution has already been discussed partially in sections 2.3.3 and 2.3.4. Generally it may be still useful to do some cycles of CONSTRAINED-RESTRAINED refinement before proceeding to RESTRAINED refinement only. Set RIGID=false and use RIGID records to delineate "rigid" bodies. This will accelerate convergence. Finally RESTRAINED refinement is obtained by removing the definitions of any RIGID groups.
It may now be useful to refine individual isotropic thermal parameters. When these are already present in the input coordinates they can be used and refined using ISO=true together with BINPUT=false (if Uiso's are present in the PDB file), or BINPUT=true (the default, if B's are present), together with the default ISOREF=true. When not present in your input coordinate data set, use ISO=false and ISOREF=true in the initial run. Having ISO=true and ISOREF=false will merely indicate that you want to read isotropic thermal parameters, but not refine them. This can be useful for molecular replacement models. In order to get meaningful isotropic thermal parameters it is usually necessary to include data higher than 3Å resolution. Note that MFACR (see section 3.1.3) is used to remove ill-conditioning. The input Uiso for each atom is checked and reset if necessary. The lowest allowed Uiso is set with ULIML; the highest by ULIMH.
Physical background:
In this option the thermal parameters of atomic groups are refined using the approximation that the groups possess, either partly or wholly, "correlated amplitude" motion. This is not necessarily the same as "rigid body" motion because the Bragg scattering is sensitive only to the amplitudes of vibrating atoms, not to their relative phases. Small rigid groups of bonded atoms such as the planar aromatic rings in HIS, PHE, TYR and TRP are likely to vibrate as rigid bodies, because the mean square vibration amplitude of a typical bond is very small (~ 0.002Å^2). However larger groups such as secondary structure elements or domains are likely to have larger internal motions, where sub-structures have vibration amplitudes which are correlated, but whose relative phases are not (e.g. in anti-phase, as opposed to in phase); this correlated amplitude motion will be indistinguishable from true in-phase rigid body motion if only Bragg scattering data are used.
The atomic groups may be whole molecules, units of secondary structure (e.g. alpha helices) or they may be pseudo-rigid side groups such as phenyl rings, imidazole, carboxylate, guanidinium or amide groups. When units of secondary structure are chosen, there is an option to include main chain atoms only. For small groups (i.e. < 20 atoms) data at high resolution (e.g. 1.5Å) may be required for success. It should also be remembered that the model assumes harmonic thermal parameters and this may not be valid for side groups on the surface of a macromolecule.
There are three group thermal parameter options: UISO, UANISO and TLS. The UISO option refines 1 parameter per group, the UANISO option 5 or 6 per group, and the TLS (translation/libration/screw-rotation) option 19 or 20 per group. This is still likely to be far fewer than the 6 per atom required in full anisotropic refinement (see section 2.3.7). The potentially rigid groups in proteins which may be suitable are aromatic rings, the "propellers" of ASP/ASN, GLU/GLN and ARG, ligands such as heme, the secondary structure elements, domains, the entire molecule, or even the entire contents of the asymmetric unit.
For the UANISO and TLS options it is possible to refine the atomic isotropic thermal parameters in addition to the group parameters; this reduces the number of group parameters from 6 to 5 and 20 to 19 respectively (because the isotropic component of the T tensor is then not used, and is set to the mean Uiso). This is in fact the default if atomic isotropic thermal parameters are refined (ISOREF=true); if this option is not desired it must be deselected (see option NOATOM in the description of parameters).
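The parameter counts quoted above can be tabulated as follows. This is only an illustrative sketch: the function names are hypothetical, and the assumption that the UISO count is unaffected by atomic Uiso refinement is ours, since the text describes the reduction only for UANISO and TLS.

```python
# (count without atomic Uiso refinement, count with atomic Uiso refinement)
GROUP_PARAMETER_COUNTS = {
    "UISO": (1, 1),     # assumption: no reduction is described for UISO
    "UANISO": (6, 5),
    "TLS": (20, 19),
}


def group_parameter_count(option, refine_atomic_uiso=False):
    """Number of thermal parameters refined for one group."""
    full, reduced = GROUP_PARAMETER_COUNTS[option]
    return reduced if refine_atomic_uiso else full


def full_anisotropic_count(n_atoms):
    """Number of parameters for individual anisotropic refinement (6 per atom)."""
    return 6 * n_atoms
```

For example, a single TLS group covering a 13-atom ligand needs 20 parameters, compared with 78 for full anisotropic refinement of the same atoms.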
In order to analyse the TLS tensors, the output files may be used as input to the CCP4 program TLSANL. The resulting anisotropic tensors may be visualised by using the output coordinate file to compute very high atomic resolution (0.7Å) structure factors, and then contouring the Fcalc electron density with a program such as O.
If your data extend to atomic resolution it will be possible to refine individual atomic anisotropic thermal parameters using a 6 element anisotropic U tensor. This type of refinement can be started up by defining groups using the ANISO keyword. The isotropic U value of each atom will be put in the diagonal elements of the anisotropic U tensor (U11, U22, U33) to use as a starting value.
After refinement the new anisotropic U tensor (U11 U22 U33 U12 U13 U23) will be written to the coordinate file behind the ATOM record in a separate record identified by ANISOU using the standard PDB format. These records will then be used in future runs for reading and writing the anisotropic tensors.
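The starting tensor described above can be sketched as below, using the element order U11 U22 U33 U12 U13 U23 given in the text. The off-diagonal elements are set to zero, as stated in the description of the ANISO record. The function names are hypothetical, and the equivalent isotropic value is computed as one third of the trace, which assumes the tensor is expressed on orthogonal axes.

```python
def uiso_to_uaniso(u_iso):
    """Return (U11, U22, U33, U12, U13, U23) for an isotropic starting value."""
    return (u_iso, u_iso, u_iso, 0.0, 0.0, 0.0)


def equivalent_uiso(u_aniso):
    """Mean of the diagonal elements of an anisotropic U tensor."""
    u11, u22, u33 = u_aniso[:3]
    return (u11 + u22 + u33) / 3.0
```

Converting a Uiso to a tensor and back with these two functions returns the original value, which is a quick consistency check on any file conversion script.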
Uncoupled group occupancy refinement may be useful for protein-inhibitor complexes, where the inhibitor is not present in stoichiometric amounts. The occupancy groups are defined in the control data with records using keyword OCCUp. The contiguous segment(s) comprising each group is/are specified by the starting atom number as present in the coordinates, the number of atoms in the segment (may be just 1 atom), and the group identifier using free format. Use as starting occupancy for the atoms in the group a value as suggested by the electron density.
Coupled alternative sites may be most easily created by using extra dictionary entries (see section 3.2); for example, call the short alternative site residue ASX if it is the alternative site of the side chain for an ASP. These alternative site residues should then be added to the coordinate data set as ATOM records after chains terminated by TER, and effectively treated as separate protein chains themselves by inserting a TER record. Both the first and subsequent sites are specified as described above, but with different coupling identifiers appended; the group identifier must be the same for these coupled sites. It will be useful to use an extra restraint to tie the alternative site(s) down to the atom where it diverges, and extra restraints will also be required between atoms defined as HETATM's (see XTRDIST in section 3.1.1). Van der Waals repulsion is automatically turned off for coupled groups. It is always important to study the U values for the atoms in alternative sites because of the strong correlation between occupancy and U. Too large a U value with a low occupancy means either that the coordinates have been built in the wrong position, or that the site is not "real". A reasonable starting atomic isotropic U value for the second site is 0.2Å^2.
Secondly in the latter stages of refinement, the weights may be used to reflect the expected discrepancies between observations and target values or functions and the corresponding quantities calculated from the model. As the model improves, higher resolution data may be included, and the higher angle data and weak reflections may be given higher weighting until the sum of the weighted residual squared over all observations and restraints equals the total number of observations and restraints minus the total number of variable parameters. This may be called statistical weighting. The weighting strategies to be adopted in the two cases may be quite different.
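The statistical weighting criterion above can be checked numerically. The sketch below is an illustration only; the function names are hypothetical and the inputs are assumed to be the individual weighted squared residuals taken over all observations and restraints.

```python
def expected_weighted_residual(n_observations, n_restraints, n_parameters):
    """Target value for the sum of weighted squared residuals."""
    return n_observations + n_restraints - n_parameters


def goodness_ratio(weighted_sq_residuals, n_observations, n_restraints,
                   n_parameters):
    """Ratio of the actual to the expected sum of weighted squared residuals.

    A ratio close to 1 indicates that the weights satisfy the statistical
    weighting condition described in the text.
    """
    total = sum(weighted_sq_residuals)
    return total / expected_weighted_residual(
        n_observations, n_restraints, n_parameters)
```

A ratio well above 1 suggests the weights are too high for the current model errors, and a ratio well below 1 suggests they are too low.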
When applying any weights one has to recall the function that is minimised:
M = SUM [w(f) (|Fo| - G.|Fc|)^2]          [= M(a)]
  + SUM [w(p) (PHIo - PHIc)^2]            [= M(b)]
  + SUM [w(d) (d(t) - d(c))^2]
  + SUM [w(b) (b(o) - b(min))^2]
  + SUM [w(U) delta-U^2]
  + SUM [w(Ua) delta-Ua^2]
  + SUM [w(v) |V|^2]
  + SUM [w(c) (d(t) - d(c))^2]            [= M(c), the sum of the last six terms]
The factors w(f), w(p), w(d), w(U), w(Ua), w(v) and w(c) are the weights, the choice of which determines the relative influence of the terms in the function M which is to be minimised. It should be noted that only relative weights are significant. The choice of the absolute value of the weights does not influence the course of refinement. The relative contributions to the residual will be found in the general weighting analysis table (***ANALYSIS OF FUNCTION MINIMISED***). The weights are not directly supplied by the user. Instead weighting coefficients are supplied which are used in a formula to generate the weights. The formulae and their use are discussed in the sections below.
If the structure factor model perfectly described the diffraction of the macromolecule, the theory of least squares shows that the structure amplitudes should be given weights which are inversely proportional to their variances. However, due to the disorder present in macromolecular crystals, the structure factor model is always significantly in error. The final values of residuals and R factors usually owe more to errors in the model than to experimental errors in the diffraction data.
The object of weighting the structure amplitude terms is to ensure that terms heavily affected by model or experimental errors are down-weighted. Several weighting schemes may be employed.
Scheme 1:
  w(f) = WF(1)
This is the scheme that should be employed only in the initial stages.

Scheme 2:
  w(f) = WF(1).S^WF(2) / [WF(3).sigma(Fo)^2 + WF(4).Fo^2]
where S = sin(theta)/lambda

Scheme 3:
  w(f) = WF(1) / [WF(2) + WF(3).Fo + WF(4).Fo^2]

Scheme 4:
  w(f) = WF(1) / [WF(2) + WF(3).Fo + WF(4).Fo^2 + WF(5).S + WF(6).S^2 + WF(7).Fo.S]
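The four weighting formulae above, in the order given, can be sketched as follows. This is an illustration and not the program's own code. The coefficients are passed as a zero-based list, so wf[0] holds WF(1). The reading of the second formula as S raised to the power WF(2) is an assumption about the original typesetting.

```python
def weight_scheme_1(wf):
    """Constant weight: w(f) = WF(1)."""
    return wf[0]


def weight_scheme_2(wf, fo, sigma_fo, s):
    """w(f) = WF(1).S^WF(2) / [WF(3).sigma(Fo)^2 + WF(4).Fo^2].

    s is sin(theta)/lambda; the exponent on s is an assumed reading.
    """
    return wf[0] * s ** wf[1] / (wf[2] * sigma_fo ** 2 + wf[3] * fo ** 2)


def weight_scheme_3(wf, fo):
    """w(f) = WF(1) / [WF(2) + WF(3).Fo + WF(4).Fo^2]."""
    return wf[0] / (wf[1] + wf[2] * fo + wf[3] * fo ** 2)


def weight_scheme_4(wf, fo, s):
    """As the third formula, with additional resolution-dependent terms."""
    return wf[0] / (wf[1] + wf[2] * fo + wf[3] * fo ** 2
                    + wf[4] * s + wf[5] * s ** 2 + wf[6] * fo * s)
```

Setting WF(2) = 0, WF(3) = 1 and WF(4) = 0 in the second formula gives pure 1/sigma^2 weighting scaled by WF(1), which is a convenient way to check an implementation.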
Note that the previously suggested procedure of adjusting the WE coefficients on each cycle is not recommended. The current recommendation is to leave the WE coefficients set at their default values, and adjust the WF coefficients only after a rebuild. In any case because the structure factor and energy weights are purely relative, adjusting only WF(1) to raise or lower the F weights will give the same effect as simultaneously adjusting the geometry weights.
Alternatively the weighting coefficients can be chosen manually so that the mean values of w(f).(|Fo| - |Fc|)^2 are approximately independent of Fo and/or resolution (within a factor of two or three). These mean values may be inspected in the tables ***ANALYSIS OF STRUCTURE FACTOR TERMS*** supplied in the output, where they are displayed in bins dependent on sin(theta)/lambda and Fo.
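The manual check described above can be reproduced outside the program. The sketch below bins reflections on |Fo| and reports the mean weighted squared residual per bin; it is an illustration under assumed inputs, with hypothetical function names, and it is not how RESTRAIN produces its own tables. The same approach works for bins on sin(theta)/lambda.

```python
def binned_mean_weighted_residual(reflections, bin_edges):
    """Mean of w.(|Fo| - |Fc|)^2 in each |Fo| bin.

    reflections: iterable of (fo, fc, weight) tuples.
    bin_edges:   ascending bin boundaries; bin i covers
                 bin_edges[i] <= |Fo| < bin_edges[i + 1].
    Returns one mean per bin, or None for an empty bin.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for fo, fc, weight in reflections:
        for i in range(n_bins):
            if bin_edges[i] <= abs(fo) < bin_edges[i + 1]:
                sums[i] += weight * (abs(fo) - abs(fc)) ** 2
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


def is_roughly_flat(means, factor=3.0):
    """True if the populated bins agree to within the given factor."""
    present = [m for m in means if m is not None]
    if not present:
        return True
    return max(present) <= factor * min(present)
```

The default factor of 3 follows the "within a factor of two or three" guidance in the text.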
It is recommended that the user starts with scheme 1 and then when most of the ordered atoms have been refined, scheme 2 should be selected if standard deviations are available, otherwise use scheme 3 or 4. The choice of weighting coefficients is not a precise science but the resulting parameters are not likely to be critically dependent on it.
For schemes 2, 3 and 4, the optimum coefficients to make the mean values of w(f).(|Fo| - |Fc|)^2 approximately independent of Fo and/or resolution will be calculated by Nielsen's method before the first refinement cycle if USEWFC is set to true, and the same values will then be used for all the cycles in the job.
Phase observations from isomorphous replacement or anomalous scattering measurements may be weighted using the figure of merit. The weighting formula is designed to weight down those reflections according to the difference between the observed and calculated values. Centric reflections are always given zero weight as they cannot contribute to a refinement. The formula is
w(p) = WP(1)*FOM*[180 - |PHIo - PHIc|^WP(2)]^2

The figure of merit (FOM) must be read from the reflection file. The best way to choose WP(1) and WP(2) requires further research. Use the weighting analysis table for guidance.
Energy weighting involves the application of geometric restraints to the structure during refinement. The paucity of reflection data in a macromolecular refinement usually means that large random errors in atomic coordinates occur when an unrestrained refinement is attempted. These errors result in poor molecular stereochemistry.
Energy weighting uses a dictionary of target interatomic distances and standard deviations which govern the allowed deviations from the target values. Alternatively, the weights may be controlled by use of weight coefficients (WE) supplied in the steering data.
Weight            Case                              Ideal r.m.s. deviation
W(d) = WE(1)^2    if d(t) < 2.12Å                   0.02Å
W(d) = WE(2)^2    if 2.12Å < d(t) < 2.625Å          0.04Å
W(d) = WE(3)^2    if d(t) > 2.625Å                  0.05Å
W(v) = WE(4)^2    for planar peptide groups         0.01Å
W(c) = WE(5)^2    for all other planar groups       0.01Å
W(c) = WE(6)^2    for edges of chiral tetrahedra    0.02Å
Chiral restraints are applied as distance restraints along the edges of chiral tetrahedra with d(t) <= 2.12Å. In all cases WE(i)^2 is the weighting coefficient that decides the relative weight of the particular energy restraint and the other terms in the function minimised.
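The selection of a distance weight from the target distance can be sketched as below. This is an illustration only: the function name is hypothetical, the coefficients are passed as a zero-based list (we[0] is WE(1)), and the treatment of targets exactly equal to 2.12Å or 2.625Å is an assumption because the table above does not specify it.

```python
def distance_weight(d_target, we):
    """Return W(d) for a target distance d(t) in Å.

    Boundary handling (strictly-less-than comparisons) is an assumption.
    """
    if d_target < 2.12:
        return we[0] ** 2   # shortest targets use WE(1)
    if d_target < 2.625:
        return we[1] ** 2   # intermediate targets use WE(2)
    return we[2] ** 2       # longer targets use WE(3)
```

Because only relative weights matter, scaling all the WE coefficients by a common factor has the same effect as scaling WF(1) in the opposite direction.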
Softer restraints than those suggested above may assist convergence at earlier stages. Note that application of harder restraints at too early a stage may severely reduce the rate of convergence. Because the structure factor and geometry weights are purely relative, the effect of reducing all the geometry weights can be obtained by increasing the weight coefficient WF(1).
Relevant information about the weighting can be found in the table under the heading:
***ANALYSIS OF ENERGY TERMS***
There are 2 weighting coefficients (WU(1) and WU(2)) for the thermal parameter restraints which aim to minimise the difference between thermal parameters of pairs of atoms whose interatomic distance is also restrained (i.e. 1-2 and 1-3 bonded atoms), though the two types of restraint can be applied independently. WU(1) applies to isotropic thermal parameters, and WU(2) to anisotropic thermal parameters (but not group thermal parameters as these are already constrained).
The standard deviation of the half-bond restraint for an atom in the isotropic and anisotropic cases (where d is the interatomic distance) is given by the equations:
s(iso) = WU(1).Uiso^2
s(aniso) = WU(2).d^2

The weight for the restraint on the thermal parameter difference between atoms i and j is then:

w(ij) = 1/(s(i)^2 + s(j)^2)
The target of the restraint is also different in the two cases; in the isotropic case it is simply the difference between the Uiso's; in the anisotropic case it is the difference between the components of the anisotropic tensors along the line joining the atoms.
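The standard deviations and the pair weight defined above can be sketched as follows. The function names are hypothetical, and the coefficients WU(1) and WU(2) must be supplied by the caller; this is an illustration of the formulae, not the program's own implementation.

```python
def sigma_iso(u_iso, wu1):
    """Half-bond standard deviation in the isotropic case: WU(1).Uiso^2."""
    return wu1 * u_iso ** 2


def sigma_aniso(distance, wu2):
    """Half-bond standard deviation in the anisotropic case: WU(2).d^2."""
    return wu2 * distance ** 2


def restraint_weight(sigma_i, sigma_j):
    """Weight for the restraint between atoms i and j: 1/(s(i)^2 + s(j)^2)."""
    return 1.0 / (sigma_i ** 2 + sigma_j ** 2)
```

In the isotropic case the weight falls rapidly as the Uiso values of the two atoms increase, so restraints between mobile atoms are much looser than those between well-ordered atoms.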
There are sound statistical and physical reasons for using different forms of the weight in the isotropic and anisotropic cases.
In the isotropic case the differences are purely statistical in origin: they are almost entirely due to the assumption of isotropy, not to any actual difference in thermal parameters. In reality atomic vibrations in a macromolecule, in particular in loosely bound regions such as chain termini and side-chains will have large anisotropic and/or librational components, so that the isotropy assumption is only very approximate.
The distribution of Uiso's is always very skewed, i.e. most cluster near the modal value, but with a long tail of large values. Consequently an atom with a value near the mode is most likely to find itself next to one with a similar value giving a small difference, whereas one with a value much larger than the mode will also most likely be near one with a value near the mode, giving a large difference. This leads to a dependence of the r.m.s. difference in Uiso proportional to the square of the mean Uiso, with a proportionality factor found empirically from refinement of high resolution (1Å) structures of ~ 1; this is the weighting coefficient WU(1).
In contrast, in the anisotropic case, where the difference is between the along-bond components of the tensors, the differences are real and reflect the physical situation. From IR spectroscopy it is found that the mean square amplitude of a typical bond vibration (for a single C-C bond) at ambient temperature is about 0.002Å^2 (equivalent to delta-B ~ 0.16Å^2), so the bond is very rigid in comparison with the atomic vibrations (B typically > 5 to 10Å^2). The atomic vibrations therefore arise almost entirely as a consequence of bond librations.
In the anisotropic case, therefore, the r.m.s. difference in the thermal tensor components should be independent of the isotropic thermal parameters. The difference between thermal tensor components will however be larger across bond angles (1-3 restraints), so a dependence on the square of the interatomic distance is used. The default value of the weighting coefficient WU(2) (0.01) is rather larger than the expected difference (0.002). This is because if the correct value is used initially the restraints are so tight that the refinement often fails to converge. It may be possible to use the correct value of WU(2) (0.0007) once convergence has been attained.
FILE                           FILE NAME         EXPLANATION
- control and steering data                      section 3.1
- dictionary                   DICTION           section 3.2
- atomic coordinates           XYZIN             section 3.3
- group thermal parameters     TLSIN             section 3.4
- reflections                  REFIN or HKLIN    section 3.5
Any record or part of a record can be temporarily "commented out" by use of the ! or # character; this causes all subsequent characters on the same line to be skipped.
Each record in the control data is identified by a keyword, but only the first 4 characters are significant and case-insensitive. Any other input required follows immediately in free-format (space-separated) on the same line, with the sole exception of the keyword STEER where the data must follow on the succeeding line(s). Data records (but not comments) may be continued by finishing a line with a "-". The keywords available are:
ANISo, DESOut, DICTion, DNAMe, FORMat, HKLIn, HKLOut, LABIn, LABOut, MATOut, NCSYmm, OCCUp, PNAMe, PRIVate, REFIn, REFOut, RIGId, STEEr, SYMMetry, TITLe, TLSIn, TLSOut, USECwd, XTRDist, XTRPlan, XYZIn, XYZOut
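The record conventions described above (comments introduced by ! or #, continuation with a trailing "-", and keywords significant to 4 characters regardless of case) can be illustrated with a small reader. This is a sketch for checking your own input files, not the parser used by RESTRAIN. The function name is hypothetical, and the sketch does not handle the STEER exception, apostrophe-quoted strings, or comment characters inside quoted strings.

```python
def read_control_records(lines):
    """Return a list of (keyword, arguments) tuples from control data lines."""
    records = []
    pending = ""  # text carried over from lines ending in "-"
    for raw in lines:
        # Skip everything from the first ! or # to the end of the line.
        cut = len(raw)
        for marker in "!#":
            pos = raw.find(marker)
            if pos != -1:
                cut = min(cut, pos)
        text = raw[:cut].rstrip()
        if not text.strip():
            continue  # blank or comment-only line
        if text.endswith("-"):
            pending += text[:-1] + " "  # continued on the next line
            continue
        fields = (pending + text).split()
        pending = ""
        # Only the first 4 characters of the keyword are significant.
        records.append((fields[0][:4].upper(), fields[1:]))
    if pending.strip():
        # Flush a record whose continuation never arrived.
        fields = pending.split()
        records.append((fields[0][:4].upper(), fields[1:]))
    return records
```

Normalising keywords to four upper-case characters means that XYZIN, XYZIn and xyzin are all reported as XYZI, which mirrors the matching rule stated above.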
Each of the keywords DICTION, XYZIN, TLSIN, HKLIN, REFIN, XYZOUT, TLSOUT, HKLOUT, REFOUT, MATOUT and DESOUT specifies a filename. Files may also be connected using the CCP4 logical names matching these keywords. The keyword information overrides the logical names.
In the case that no HKLIN or REFIN name is given, the only possibility is regularisation.
The program labels are 'H', 'K', 'L', 'FP', 'SIGFP', 'PHIB', 'FOM' and 'FREE' with the conventional meanings. For conventional amplitude refinement only the FP and SIGFP columns need be assigned. To calculate free R factors, assign label FREE to a free R flag column generated by `freerflag' (or otherwise).
This record contains the format for the reflections when using a formatted reflection file (section 3.5). This record is not required for unformatted reflection files.
This requires that the input reflection file also be in MTZ format. It is not possible to have an input formatted and an output unformatted file, or vice versa.
The program labels are 'H', 'K', 'L', 'FP', 'SIGFP', 'FC', 'PHIC' and 'FREE'.
This is used to obtain individual standard deviations of the parameters
by matrix inversion, which is performed by a separate program
(FUMAIN*).
*FUMAIN is not yet a part of CCP4.
Be aware that the process of accumulating the terms of the full normal matrix and then inverting it is extremely CPU and memory intensive!
This feature is experimental.
Each ANISO record defines a contiguous segment of atoms in the coordinate file whose isotropic thermal parameters are to be converted to individual anisotropic parameters by setting each of the diagonal elements of the U tensor to Uiso and the off-diagonal elements to zero. The new anisotropic tensors will be written to the coordinate file in the standard PDB format. This option should therefore not be used for atoms that are already defined as anisotropic (unless you really want to reset them). This option should not be confused with the group thermal parameter option TLSIN. For each segment atoms may be selected by name or by using various keywords. Each contiguous segment is specified as:
An atom identifier is interpreted as a character string, not as an integer, and is matched with the atom number in columns 7 to 11 of the PDB ATOM or HETATM record. Alternatively (and probably more conveniently, as some programs may change the atom numbers), the atoms may be specified by their residue and atom labels joined by a ".", for example: 34A.CG1 . If the coordinate file contains chain identifiers, the chain id must be prepended, including the correct number of spaces. If the resulting string contains any spaces it must be completely enclosed by apostrophes, for example: 'C 13.N'. The atom name may also be omitted leaving both the residue label and the final "."; in that case the range specified either starts at the first atom of the residue or ends at the last atom.  Atom and residue labels, and also residue names are always case sensitive (usually only capital letters are used).
If the second component of the range specification is omitted or given as a null string (i.e. double apostrophe: ' ' in the input), it is set equal to the first component, i.e. specifying a single atom or residue. If both are omitted or given as nulls, the range is set to the entire coordinate file. Note that if you want to specify the optional selection string, you can't leave out either of the range components, you must supply both of them as either non-null or null, so that the selection string is then the third one on the line after the keyword.
Beware that the range specification applies to the file AFTER any re-ordering is done, so it is probably safer to re-order first, then check the coordinate file and specify the ANISO ranges in a separate job.
In the optional selection string, atom names have to conform to the PDB convention. All atom codes found in the PDB atom files can be used. Additionally, four group codes can also be specified: SDCH, MNCH, ALL and NOT. MNCH will select all mainchain atoms (' N ', ' CA ', ' C ' and ' O '), SDCH selects all non-mainchain atoms, ALL selects all atoms and NOT negates the selection of atom types on the line. The order of atom specifiers is not important. If no atom specifier is given, the default is ALL.
Each record contains either the rotation or the translation component of an orthogonal non-crystallographic symmetry operator. The 9 elements for the rotation matrix are read in ROW-wise (beware other programs which read and write column-wise matrices!). In the case of two molecules in the asymmetric unit the input would be:
NCSY MATRIX R11 R12 R13 R21 R22 R23 R31 R32 R33 (rotation 1-2)
NCSY TRANS T1 T2 T3 (translation 1-2)
For N molecules in the asymmetric unit there would be N-1 pairs of these records altogether. Alternatively it may be more convenient (and less error-prone!) to use polar angles to specify the rotation component. The use of Eulerian angles to specify the rotation has not been implemented because there are so many different Eulerian angle conventions in use.
NCSY POLAR theta phi chi (rotation 1-2)
NCSY TRANS T1 T2 T3 (translation 1-2)
Note that the identity operator is always assumed and may be omitted.
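The row-wise convention can be illustrated with a minimal Python sketch. It is not part of RESTRAIN: the function names are hypothetical, and it assumes that the operator maps molecule 1 onto molecule 2 as x2 = R.x1 + T in orthogonal coordinates.

```python
def parse_ncs_pair(matrix_record, trans_record):
    """Parse one 'NCSY MATRIX' / 'NCSY TRANS' pair (hypothetical helper).

    The nine rotation elements are taken ROW-wise: R11 R12 R13 R21 ... R33.
    """
    m = matrix_record.split()
    t = trans_record.split()
    if len(m) != 11 or m[1].upper() != "MATRIX":
        raise ValueError("expected NCSY MATRIX followed by 9 numbers")
    if len(t) != 5 or t[1].upper() != "TRANS":
        raise ValueError("expected NCSY TRANS followed by 3 numbers")
    values = [float(v) for v in m[2:11]]
    rotation = [values[0:3], values[3:6], values[6:9]]  # one row per slice
    translation = [float(v) for v in t[2:5]]
    return rotation, translation


def apply_ncs(rotation, translation, xyz):
    """Return R.x + T for an orthogonal coordinate (assumed direction 1 -> 2)."""
    return [
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
```

Reading the same nine numbers column-wise would give the transposed (inverse) rotation, which is why the convention matters.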
Each record defines a contiguous "occupancy segment" by means of a starting atom number, the number of atoms in the segment, an optional segment "group identifier" and an optional segment "coupling identifier". One or more occupancy segments with the same group and coupling identifiers comprise an "occupancy group". All atoms belonging to the same occupancy group have the same shift applied during occupancy refinement. Two or more occupancy groups may be coupled so that the sum of their occupancies is constrained to be constant; this is done by giving the groups the same group identifier but different coupling identifiers. If occupancy coupling is not required, the coupling identifiers may be omitted. Note that the content of a group identifier carries no significance; only its equality or inequality as compared with the other group identifiers is significant. The same applies to the coupling identifiers; in addition their equality or inequality is only significant when they share a common group identifier.
Each contiguous occupancy segment is specified as:
Alternatively, the atom may be specified by its residue and atom labels; for details see above under "ANISO".
Note that non-unit occupancies must be specified in OCCUP records even if they are only to be used in structure factor calculation and not in refinement (in which case OCCREF=false).
Each RIGID record defines a contiguous segment of atoms in the coordinate file whose rigid-body parameters (3 rotations and 3 translations) are to be refined. Each contiguous segment is specified as:
Alternatively, the atoms may be specified by their residue and atom labels; for details see above under "ANISO".
The purpose of the optional identifier is to allow consolidation of several segments into one rigid body, because a rigid body does not necessarily consist of contiguous atoms in the file. To do this just give the same identifier to segments that are to be part of the same rigid body. Many rigid bodies may be present, but nesting is not allowed.
Each record contains a general equivalent position for the space group typed as in INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol A. If symmetry information is not given here, the SGROUP parameter is used; if that is not defined, the CRYST1 record in the PDB file is searched for the space group name; if none is found then the symmetry information in the MTZ file (if given) is used. Use of this option is discouraged as it is very error-prone; it is better to update the symop.lib file (see symlib.html).
Each record contains an extra interatomic restraint. This is specified as
The atoms either may be specified by their atom numbers in the coordinate file, or by their residue and atom labels; for details see above under "ANISO".
If the distance is given as negative it is interpreted as a repulsion- only restraint, i.e. it is only applied if the calculated distance is less than the specified target distance. This explicit extra restraint may be required because the implicit repulsion restraints (when REPEL=true) are not applied to pairs of atoms in the same residue; nor are they applied to an atom involved in any explicit extra restraints, whether repulsion or not.
Each record contains an extra plane. This is specified as:
Alternatively, the atom may be specified by its residue and atom labels; for details see above under "ANISO".
This keyword introduces the steering data. It must appear on a line by itself after all of the keywords in the above list.
After the record with the single keyword STEER, the data follows on the next line and consists of a series of "name=value" specifications separated either by a comma or by the end of the line (a comma at the end of a line is optional) e.g.:
A=10.8, Gamma =90, ISO= f , isoref = T, Aniso= False
G=2 ,High=2.8, dxyzlm=.02 , wF(1)=1.234e-6
The read statement makes use of a simulated version of the FORTRAN NAMELIST facility and thus the order in which the variables are given is immaterial. The letter case and spacing do not matter, and there may be any number of "name=value" specifications per record, up to 80 columns. However a "name=value" specification may not be split across two or more lines, and the use of the "-" continuation character is not allowed.
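The "name=value" rules described here can be mimicked in a few lines of Python. This is only a sketch of the stated rules, not the program's actual reader: the function names are hypothetical, and the set of accepted logical spellings (t, true, f, false) is an assumption based on the example record.

```python
def _convert(raw):
    """Convert a value string to bool, float or str (assumed spellings)."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in ("t", "true"):
        return True
    if lowered in ("f", "false"):
        return False
    try:
        return float(text)
    except ValueError:
        return text  # e.g. a space group name


def parse_steering(text):
    """Parse steering data up to end of input or the &EOF terminator."""
    values = {}
    for line in text.splitlines():
        # Specifications are separated by commas or by the end of the line.
        for item in line.split(","):
            item = item.strip()
            if not item:
                continue  # a trailing comma is optional
            if item.upper() == "&EOF":
                return values
            name, sep, raw = item.partition("=")
            if not sep:
                raise ValueError(f"not a name=value specification: {item!r}")
            # Letter case and spacing in the name do not matter.
            values["".join(name.split()).upper()] = _convert(raw)
    return values
```

Because the result is a dictionary keyed by upper-cased name, the order of the specifications is immaterial, as with NAMELIST input.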
Only those items which you want to differ from default values need be entered. For example cell parameters are not normally supplied in the steering data because the values in the reflection and/or coordinate files are usually the correct ones. A list of variables which can be input to the program is given below. A detailed explanation of each variable is given in section 3.1.3.
The steering data may be terminated either by end-of-file, or by a variable name &EOF (without a value). In either case, this will cause refinement to be initiated. Additional steering data items (starting on a new line) may follow the &EOF variable. The refinement will then be restarted from the point that it was terminated. The values of the variables used will be those at the termination of the original refinement updated by the new supplied values. This may be repeated as often as desired.
VARIABLE | DEFAULT VALUE |
---|---|
A | (see note) |
ALPHA | 90 |
ANISO | true |
B | (see note) |
BETA | 90 |
BINPUT | true |
C | (see note) |
CGFACR | 25 |
CREACT | false |
CYCNO | 1 |
DESMAT | false |
DICPRI | false |
DIFS | true |
DXYZLM | 0.05 |
FLIBR | 0 |
FMAX | 0 |
FOBMIN | 0 |
FREERFLAG | 0 |
FREF | true |
FULMAT | false |
G | (see note) |
GAMMA | 90 |
GSFACR | ? |
HIGH | 0 |
ILLCON | false |
ISO | true |
ISOREF | true |
ISYM | 0 |
LATTYP | 1 |
LOW | 9999 |
MAXFMT | 5 |
MFACR | 0.1 |
MODULO | 5 |
NCSREF | true |
NCYC | 1 |
NORMAT | false |
OCCREF | true |
OFFDIA | false |
ONLYFC | false |
ONLYFR | false |
ORDER | false |
PHAS | false |
PRTALL | false |
REPEL | true |
RIGID | false |
RMERGE | 0.1 |
RMSMIN | 0.03 |
RSIZE | true |
RWDMIN | 100 |
RWLMIN | 4 |
SB1 | 5 |
SB2 | 1.6 |
SCHEME | 1 |
SFACR | 0.8 |
SFTLIM | 0.02 |
SGROUP | (see note) |
SIGMA | 0 |
TESTIN | false |
TLSREF | true |
TPREST | true |
TSFACR | 0.01 |
U | 0 |
UHIGH | 0.15 |
ULIMH | 2.5 |
ULIML | 0 |
ULOW | 0.02 |
USEDSD | true |
USEFR | false |
USEWFC | false |
WATER | true |
WE(1) | 0.02 |
WE(2) | 0.04 |
WE(3) | 0.05 |
WE(4) | 0.01 |
WE(5) | 0.01 |
WE(6) | 0 |
WF(1) | (see note) |
WF(2) | (see note) |
WF(3) | (see note) |
WF(4) | (see note) |
WF(5) | (see note) |
WF(6) | (see note) |
WF(7) | (see note) |
WFREF | false |
WP(1) | 20 |
WP(2) | 0.2 |
WU(1) | 1 |
WU(2) | 0.01 |
&EOF | - |
Note for table: refer to full explanation of variable in the next section.
Default values are given in brackets immediately after the variable name.
Cell parameters default first to those defined by the SCALE matrix in the input PDB coordinate file; if one is not supplied the values given in the steering data are used; if none are supplied the values given on the CRYST1 record in the PDB file are used; finally if one is not given, the values read from the reflection file are used. If cell parameters cannot be found anywhere the program will terminate abnormally. Usually it is not necessary to supply cell parameters. The default orthogonal setting is the standard PDB one, i.e. x || a and z || c* .
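The order of precedence described above can be summarised as a short Python sketch. The function and argument names are hypothetical; each argument stands for the cell found in one source, or None if that source supplies none.

```python
def choose_cell(scale=None, steering=None, cryst1=None, reflections=None):
    """Return (source, cell) for the first source that supplies a cell.

    Order follows the text: SCALE matrix, steering data, CRYST1 record,
    reflection file. Raises if no source provides cell parameters.
    """
    candidates = (
        ("SCALE", scale),
        ("steering data", steering),
        ("CRYST1", cryst1),
        ("reflection file", reflections),
    )
    for source, cell in candidates:
        if cell is not None:
            return source, cell
    raise ValueError("cell parameters not found in any input")
```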
The scale factor G and the overall thermal parameter U may, as an alternative to least squares refinement from initial estimated values, be calculated ab initio by the method of Kraut. (See the documentation for the CCP4 program FHSCAL for details of the method, noting that FP is to be considered as Fobs and FPH as Fcalc). The initial values of G and U are obtained by the program in an extra structure factor cycle before the coordinate refinement cycle(s) (but if the weight calculation option USEWFC is also set to true only one extra cycle is done). This option is activated by omitting both the G and U parameters from the input.
Structure factors are listed when
$$\sqrt{w(f)}\;\Delta(|F|) > \mathrm{RWDMIN}$$
where $\Delta(|F|)$ is the absolute difference between the calculated and observed structure amplitudes. If set to a negative value all structure factors are listed.
Distances are printed when
$$\sqrt{w(d)}\;\Delta(\mathrm{dist}) > \mathrm{RWLMIN}$$
where $\Delta(\mathrm{dist})$ is the absolute difference between calculated and observed distances. If set to a negative value all restrained distances are listed.
$$f' = f - \mathrm{SB1}\,\exp\!\left(-\tfrac{1}{2}\,\mathrm{SB2}\,q^2\right)$$
where $q = 4\pi\sin\theta/\lambda$.
The parameters SB1 and SB2 are only used if WATER=true. Their refined values may be used in subsequent cycles in the same way as G and U. These parameters are highly correlated and well defined values may not exist. They may also allow for disordered parts of a macromolecule which do not form part of the model currently being refined.
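As a numerical illustration of the correction formula, the following Python sketch evaluates f' for a given scattering factor value. The function name and argument names are hypothetical; the default values SB1=5 and SB2=1.6 are those listed in the table of steering variables.

```python
import math


def solvent_corrected_f(f, sin_theta_over_lambda, sb1=5.0, sb2=1.6):
    """Return f' = f - SB1*exp(-SB2*q^2/2) with q = 4*pi*sin(theta)/lambda."""
    q = 4.0 * math.pi * sin_theta_over_lambda
    return f - sb1 * math.exp(-0.5 * sb2 * q * q)
```

The correction equals SB1 at sin(theta)/lambda = 0 and decays rapidly with resolution, so only the low-resolution terms are affected appreciably.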
WE(1) is used for restraints on 1-2 distances (< 2.12Å).
WE(2) is used for restraints on 1-3 distances across bond angles (>= 2.12 but < 2.625Å).
WE(3) is used for restraints on non-bonded distances (>= 2.625Å).
WE(4) is used for restraints on peptide planes.
WE(5) is used for restraints on ring and other planes.
WE(6) is used for restraints on the edges of chiral tetrahedra.
If all these variables are set to 0 then no contribution from ideal geometry is included, i.e. the refinement is based solely on the structure amplitudes, thermal parameters and/or phase data. See also section 2.4. If USEDSD is set true, and the standard deviation of the distance restraint given in the dictionary is > 0, then the WE coefficient (1, 2 or 3) is not used to obtain the weight. For compatibility with previous versions of the program, this version will also accept values of WE(i) >= 1, in which case the value used is 1/WE(i).
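The distance classes and the backward-compatibility rule can be sketched in Python as follows. The function names are hypothetical, and it is an assumption that the classification is made on the restraint distance in Å; the thresholds are those quoted above.

```python
def we_index(distance):
    """Return which WE coefficient (1, 2 or 3) applies to a distance in Å."""
    if distance < 2.12:
        return 1  # 1-2 (bonded) distances
    if distance < 2.625:
        return 2  # 1-3 distances across bond angles
    return 3      # non-bonded distances


def effective_we(value):
    """Apply the compatibility rule: values >= 1 are used as 1/value."""
    return 1.0 / value if value >= 1.0 else value
```

For example, an old-style input of WE(1)=50 is treated the same as WE(1)=0.02.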
The first block is organised into residue types, the first entry for each type being "RESI" followed by the residue name as a three letter abbreviation. Note that these residue names must correspond to those present in your coordinate set (see section 3.3). Within each residue entry the records may appear in any order.
Following the residue entry record are a series of "DIST" records defining the atom names, and each distance restraint in sequence moving down the residue. Each restraint is specified by a positional number defining which atom following the current atom it is restrained to, then the distance in Å and its standard deviation. The order of the different atoms in the residue therefore specifies the positional number. By default the restraint weights are calculated from the standard deviations. Note that the atom names must correspond to those present in your coordinate file (see section 3.3).
"DIHE" records define the name of each dihedral angle and the four positional numbers of the atoms defining this angle. Note that the names are not stored in the program. It is however sensible to use a consistent logical order, since the calculated dihedral angles will be printed in the same order, e.g. phi and psi, chi angles, omega for amino acid residues.
"CHIR" records define the name of each chiral centre and the four positional numbers of the atoms defining this centre. The order in which these atoms should be given should refer to a right-handed rotation when looking along the bond between the first atom (with the lowest positional number in the table) and the one at the centre of the tetrahedron. For Calpha chiral centres in amino acids the order therefore is N-Calpha-C-Cbeta. Note that the names are not stored in the program.
"PLAN" records define the name of each plane, the plane type, an individual plane weight (not used; for future development), and the atom pointers defining these planes. In this version of RESTRAIN only two types of planes are recognised. Planes of type 1 in the list will be put in the first category (PLANE1), all of type 2 in the second one (PLANE2). For amino acid residues the peptide planes therefore are usually put in first position. The reason for this is that RESTRAIN allows different weighting to be used for the two types of plane (see section 2.4). Note that the plane names are not stored in the program.
The residue entries in the first block are terminated by a record starting with END.
The second block consists of "ATOM" records and is organised into atom types, the first entry for each type being the atom name. Note that these atom names must correspond to those present in the first block and in your coordinate set (see section 3.3). Each atom name is followed by a record containing the 4 constants S(i), the 4 constants E(i), the constant C and the closest van der Waals radius RKL.
These constants will be used for a four-Gaussian expansion of the scattering factor:
$$f(hkl) = \sum_{i=1}^{4} S(i)\,\exp\!\left(-E(i)\,(\sin\theta/\lambda)^2\right) + C$$
These constants can be found in INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol. IV. The van der Waals radius is used for calculation of nearest allowed distances of atoms more than three bond distances apart when REPEL=true. The second block is terminated by a record starting with END.
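The four-Gaussian expansion is straightforward to evaluate; a minimal Python sketch follows. The function name is hypothetical, and the coefficients used in the usage note are illustrative placeholders only, not values from International Tables or from any distributed dictionary.

```python
import math


def scattering_factor(s_coeffs, e_coeffs, c, sin_theta_over_lambda):
    """Evaluate SUM(i=1..4) S(i)*exp(-E(i)*(sin(theta)/lambda)^2) + C."""
    if len(s_coeffs) != 4 or len(e_coeffs) != 4:
        raise ValueError("four S(i) and four E(i) constants are required")
    x2 = sin_theta_over_lambda ** 2
    return sum(s * math.exp(-e * x2) for s, e in zip(s_coeffs, e_coeffs)) + c
```

With placeholder constants S = (2.0, 1.0, 1.5, 1.0), E = (20.0, 10.0, 0.5, 50.0) and C = 0.5, the value at sin(theta)/lambda = 0 is simply the sum of the S(i) plus C, and the function falls off monotonically with resolution.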
The distributed dictionaries (in $CLIBD) are:
chiral_pep4.dic: Main-chain chiral restraints; 4-atom peptide planes.
chiral_pep5.dic: Ditto, but 5-atom planes.
dna.dic
The first is the default if DICTION isn't assigned. A program "rdent" is available to generate RESTRAIN dictionary entries from PDB coordinate files. However, it only makes the distance records (without standard deviations); the user has to work out the other sections, but this is not difficult.
The peptide dictionaries use values published by Engh & Huber (1991).
XO || a
YO || c* x a
ZO || c*
If SCALE records are present in the file, these will override the above, as well as any cell parameters given in the steering data.
A CRYST1 record, if present, will override any crystal data (i.e. cell and space group) read from the MTZ file (if used). However, any crystal data given in the steering data will override both the PDB and MTZ files.
The coordinate records must be in the format designed by the Brookhaven Protein Data Bank. The format expected is:
Care must be taken in preparing the coordinates for refinement. After each polymer chain a TER record must be inserted. All atoms not contained in chains must be labelled HETATM.
Note that atomic thermal parameters can be read as either U's or B's (B = 8π²U); the variable BINPUT must be set accordingly. After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8 or B > 64Å²) to more reasonable starting values (e.g. U=0.2 or B=16Å²).
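The relation between U and B is simple enough to check by hand, but a small Python sketch (with hypothetical function names) makes the conversion explicit:

```python
import math

_EIGHT_PI_SQUARED = 8.0 * math.pi ** 2


def u_to_b(u):
    """Convert an isotropic U (Å^2) to B (Å^2) using B = 8*pi^2*U."""
    return _EIGHT_PI_SQUARED * u


def b_to_u(b):
    """Convert an isotropic B (Å^2) to U (Å^2)."""
    return b / _EIGHT_PI_SQUARED
```

The rounded figures quoted in the text follow from this: U=0.2 corresponds to B of about 15.8Å², and U=0.8 to about 63.2Å².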
The number of atoms in each residue in the polymer chains must be the same as the number of atoms in that residue in the dictionary. The names of all atoms must correspond to the names of the atoms in the dictionary. Blanks (including leading blanks) are significant in assessing an atom name.
The atomic coordinates in the polymer chains must be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary. If this is not the case, set ORDER=true in the steering data in the initial cycle. The output file of atomic coordinates will then be produced in dictionary order for subsequent cycles. Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.
For anisotropic thermal parameters the six values defining the U tensor of an atom, U(11) U(22) U(33) U(12) U(13) U(23), are written out (multiplied by 10^4) immediately following the coordinate record of that atom. The record containing the U tensor is identified by the label ANISOU. The format used for this record is (A6,22X,6I7).
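The Fortran format (A6,22X,6I7) corresponds to a 70-column record: the label, 22 skipped columns, then six 7-column integers. A Python sketch of writing and reading the six values is given below. The function names are hypothetical, and the 22 skipped columns are simply left blank here, whereas in a real file they carry the atom identification.

```python
def format_anisou_values(u):
    """Write (U11, U22, U33, U12, U13, U23) in Å^2 as an ANISOU-style record."""
    if len(u) != 6:
        raise ValueError("six tensor elements are required")
    scaled = [int(round(x * 1.0e4)) for x in u]  # stored as U * 10^4
    return "ANISOU" + " " * 22 + "".join(f"{i:7d}" for i in scaled)


def parse_anisou_values(record):
    """Read the six integers from columns 29-70 and rescale to Å^2."""
    if not record.startswith("ANISOU"):
        raise ValueError("not an ANISOU record")
    return tuple(int(record[28 + 7 * k:35 + 7 * k]) / 1.0e4 for k in range(6))
```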
All information for the group thermal parameter refinement is contained in the file assigned to TLSIN; the steering data does not contain any information. Each thermal parameter group is defined by an entry in the TLSIN file.
The layout of a UISO entry is typically:
UISO name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . . . . . . . . . . . .
U Uiso (Å²)
The layout of a UANISO entry is typically:
UANISO name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . . . . . . . . . . . .
U U11 U22 U33 U23 U31 U12 (Å²)
The layout of a TLS entry is typically:
TLS name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . . . . . . . . . . . .
ORIGIN x y z (Å)
T T11 T22 T33 T23 T31 T12 (Å²)
L L11 L22 L33 L23 L31 L12 (deg²)
S S1 S2 S23 S31 S12 S32 S13 S21 (Å.deg.)
Uij means the element (i,j) of tensor U. Since X-ray data allow the calculation of only eight of nine S tensor elements, the usual constraint of setting the trace of S to zero is adopted. This means that the elements S1 and S2 are (S33 - S22) and (S11 - S33) of the S tensor as defined by the equation U = T + A L A' + A S + S'A' (Johnson and Levy, 1974).
Note that the order of the off-diagonal terms in the group U, T and L tensors is different from that of the U tensor in the coordinate file (the 23 and 12 elements are swapped).
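Since the two orderings differ only by an exchange of the 23 and 12 elements, converting between them is a single swap. The Python sketch below (hypothetical function name) applies to the symmetric U, T and L tensors, for which the 31 and 13 elements are equal; it is not meant for the S tensor.

```python
def swap_tensor_order(t):
    """Convert between group order (11 22 33 23 31 12) and
    coordinate-file order (11 22 33 12 13 23) for a symmetric tensor.

    The same swap of the 4th and 6th elements works in both directions.
    """
    if len(t) != 6:
        raise ValueError("six tensor elements are required")
    t11, t22, t33, fourth, t13, sixth = t
    return (t11, t22, t33, sixth, t13, fourth)
```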
All the records of each entry except the first (UISO, UANISO or TLS) are optional, and can appear in any order. The data will assume sensible defaults if not supplied (so the TLSIN file may contain only 1 line). If the U or T record is omitted, the mean isotropic thermal parameter for the group is either used as is for UISO, or converted to the equivalent anisotropic tensor for UANISO or TLS. ORIGIN specifies the local origin of a TLS group; if omitted it is set to the mean centre of the group. The L and S tensors if omitted are set to zero. In addition to the keyworded records shown above, the following are also accepted: DEFAULT, NOATOM, RESIDUE (see the next section for details).
Only the first 4 letters of the keywords are significant and they are case-insensitive. The format is free, that is items separated by one or more spaces. If items are left blank they default to zero values.
UISO [name]
Introduces Uiso group. "name" is optional text used to identify
the group in the output.
UANISO [name]
Introduces Uaniso group.
TLS [name]
Introduces TLS group.
RANGE atom_id_start atom_id_end [atom_selection]
The RANGE record contains two atom identifiers indicating the start and
finish of a segment of the coordinate file followed optionally by the
names of atoms to be selected from this segment for inclusion in the
group. There may be any number of RANGE records per entry,
including none (in which case the range of the group is the entire
coordinate file). See section 3.1.1 under keyword ANISO for a
description of the options available for defining the range and the
atom selection.
U Uiso
or
U U11 U22 U33 U23 U31 U12
Group isotropic thermal parameter, if a UISO group, or group anisotropic
thermal tensor components (6), if a UANISO group.
ORIGIN x y z
Coordinates of the local origin of the TLS group. For an aromatic
ring it is usually the C-beta atom; for larger groups such as
domains it is usually the mean centre (the default).
T T11 T22 T33 T23 T31 T12
T tensor components (6) for TLS group.
L L11 L22 L33 L23 L31 L12
L tensor components (6) for TLS group.
S S1 S2 S23 S31 S12 S32 S13 S21
S tensor components (8) for TLS group. If CREACT=true (refine
centre of reaction of all TLS groups), the S tensor is symmetric, so
only the first 5 components are needed.
DEFAULT
This specifies that the values in the current group may be overridden
if a subsequent group specifies any atoms in common with this
group. Otherwise it is an error to specify groups that have common
atoms. For example, one could specify a default UANISO group for
the whole coordinate file; then override it with smaller UANISO or TLS
groups. Any atoms left outside these groups would get the overall
Uaniso.
NOATOM
This switches off the default option to refine isotropic thermal
parameters for atoms in the current group at the same time as the group
parameters. This is only valid for UANISO and TLS groups.
RESIDUE
This causes all the range(s) specified for the current group to be
split up into single residues, each with its own set of parameters of
the same type as the parent group, which are then refined independently.
An example of the TLS record specifying a TLS group consisting of two mainchain segments, with atoms in residues 1 to 68 and 129 to 300 is:
TLS N domain
RANGE 1. 68. MNCH
RANGE 129. 300. MNCH
T .112 .165 .131 -.052 -.003 -.003
L 1.877 2.165 3.471 4.562 6.152 7.313
S .366 -.382 .147 -.981 .185 .118 .132 .140
Where TLS tensors result in a U tensor that is not positive-definite, a warning message is printed out stating the atom name, number and U tensor.
If the L tensor elements are large (>20 deg²) and an atom is far away from the origin used for the calculation of the TLS tensors (>20Å), then the observed and calculated structure factor amplitudes can differ by several orders of magnitude. This is a consequence of the numerical instability in the calculation of derivatives of the TLS tensors with respect to positional coordinates (on some machines it may also result in a floating point overflow error). These problems usually appear at the beginning of the TLS refinement of large groups if the user does not set the initial L small enough and the origin of the rigid group sufficiently close to the centre of gravity. Such an error is checked for in two ways. First, a warning message is printed if the selected origin is more than 10Å away from the centre of gravity. Second, a warning message is printed if more than 30% of the elements of the U tensors for individual atoms had to be reset to an arbitrary interval [0, ULIMH].
Note that TLS calculations, like all anisotropic calculations, cannot take advantage of space-group specific subroutines. The general space-group subroutine must be used.
H K L FOBS SIGMA(FOBS) PHASE FOM FREERFLAG

Item | Description | Formatted | Unformatted |
---|---|---|---|
H K L | Miller indices of reflection | I | R |
FOBS | Observed structure factor amplitude | I | R |
SIGMA(FOBS) | Standard deviation in observed amplitude | I | R |
PHASE | Estimated phase from isomorphous and/or anomalous data | I | R |
FOM | Figure of merit for phase (on scale of 0-100) | I | R |
FREERFLAG | Free R flag (MTZ only) | | I |

Two file types containing the amplitude and/or phase data are accepted. Which file type is actually read depends on the keyword REFIN or HKLIN (see section 3.1.1).
When REFIN is used, a formatted reflection file is read and the input depends on the value specified for MAXFMT, which must be >=4 and <=7. When MAXFMT is 5 the items H, K, L, FOBS and SIGMA(FOBS) will be read. The reflections are read in with the format specified after the steering data. Note that the format must be consistent with the value for MAXFMT.
When HKLIN is used then the input is read from an unformatted (MTZ) reflection file. The file has header information containing the crystal data (cell parameters and space group), which means that this information does not normally need to be supplied in the steering data.
File | File name | Description |
---|---|---|
refined atomic coordinates | XYZOUT | section 4.2 |
refined group thermal parameters | TLSOUT | section 4.3 |
structure factors | HKLOUT or REFOUT | section 4.4 |
full normal matrix | MATOUT | section 4.5 |
design matrix | DESOUT | section 4.5 |

There are also 3 scratch files used by RESTRAIN:

File | Unit | Description |
---|---|---|
coordinates for ordering | 12 | section 4.6 |
reflections for scaling & weighting | 14 | section 4.6 |
normal equations for positional parms. | 11 | section 4.6 |
1. The program header stating the version number used.
2. The array dimensions which have been set using the PARAMETER statements.
3. The TITLE as supplied by the user in the control data.
4. The filenames for coordinates input XYZIN, reflections input REFIN, dictionary DICTION, coordinates output XYZOUT and reflections output REFOUT.
5. FORMAT FOR INPUT: The format specified by the user is printed.
6. Under this heading there follows a list of all the steering parameters, with their default values and the input values which were specified by the user. If no value for a parameter has been given, the default value is used, with the exception of the cell parameters and the scale factor G and overall thermal parameter U, which must be supplied by the user.
7. FRACTIONAL CRYSTALLOGRAPHIC EQUIVALENT POSITIONS. The general equivalent positions are given in the format of International Tables Vol. A. It is advisable to check these at the beginning of a refinement.
8. When refining using non-crystallographic symmetry MODE 2 (RIGID=false) ORTHOGONAL NON-CRYSTALLOGRAPHIC EQUIVALENT POSITIONS will be printed. These will then be followed by a list of ALL ORTHOGONAL EQUIVALENT POSITIONS including those generated by the non-crystallographic symmetry.
9. When extra distance restraints are to be used NUMBER OF NON-DICTIONARY RESTRAINTS will be printed. Six restraints per line are listed. These restraints are ATOM1 ATOM2 DISTANCE e.g. 190- 638 2.08 means that the distance between atoms 190 and 638 is 2.08Å. Check that the restraints are correct.
10. When extra planes are to be used NUMBER OF NON-DICTIONARY PLANES will be printed. Check that the planes are correct. E.g.

FIRST ATOM | ATOMS IN PLANE |
---|---|
1045 | 6 |

This means that there are 6 atoms in the extra plane, the first atom being number 1045, the other 5 atoms following sequentially with no atoms being skipped.
11. When atoms are to have occupancies refined (OCCREF=true) NUMBER OF OCCUPANCY GROUPS will be printed. The occupancy groups are then listed.
FIRST ATOM | NUMBER OF ATOMS | GROUP NUMBER | COUPLING NUMBER |
---|---|---|---|
910 | 6 | 1 | 1 |
952 | 5 | 2 | 1 |
1045 | 6 | 1 | -1 |

This shows the two cases:
The present occupancies as read from the input coordinates are then listed.
ATOM 910 HAS OCCUPANCY 0.621.
ATOM 1045 HAS OCCUPANCY 0.379.
ATOM 952 HAS OCCUPANCY 0.565.

Note that coupled occupancies should add up to 1.
12. In case DICPRI is true, the contents of the dictionary will be printed as it is read to facilitate the development of new entries. At the end some overall statistics are printed.
13. MOLECULAR PARAMETERS. This is self-explanatory. Note that (groups of) terminal atoms may be counted as extra residues. This is seen when for the carboxyterminal oxygen a separate residue entry in the dictionary is used.
14. If there are groups of atoms which are to have their thermal parameters refined by rigid body option, the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY BY RIGID BODY (TLS) is printed, followed by the description of rigid bodies using the format in section 2.3.6.
15. If there are groups of atoms which are to have their thermal parameters refined anisotropically the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY is printed, followed by 10 ranges per line giving first and last atom number (internal counters).
16. If there are rigid groups, these are listed under the heading ATOMS IN THE FOLLOWING RANGES TO BE REFINED AS RIGID GROUPS. Ten ranges per line are printed giving first and last atom number (internal counters).
17. When refining using non-crystallographic symmetry MODE 1 (RIGID=true) ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED AS RIGID GROUPS RELATED BY NON-CRYSTALLOGRAPHIC SYMMETRY is printed. For each molecule the atom ranges (internal counters) are given, followed by a description of the non-crystallographic symmetry operation in terms of a rotation and a screw translation. This is an aid in visualising the transformation involved.
18. NUMBER OF PARAMETERS TO BE REFINED. This gives an indication of the stability of the refinement seen in relation to the number of observations and restraints.
19. The cycle number CYCNO as supplied by the user (or default value 1).
20. When refining TLS parameters there is a list of those atoms within TLS groups for which the derived anisotropic tensors are not positive definite. This information is listed below the details of the TLS group concerned.
***AGREEMENT BETWEEN FO AND FC BASED ON INPUT COORDINATES***
26. TITLES READ FROM REFLECTION FILE when a binary reflection file is used.
27. UNFAVOURABLE AGREEMENTS BETWEEN F(OBS) AND F(CALCS) AS DETERMINED BY RWDMIN. Under this header structure factors are listed, when their rootweighted (Fo - G.Fc) (DELTA ROOTW) is larger than the user supplied value for RWDMIN. In the early stages of a refinement it is advisable to print some structure factors, to check whether the amplitudes and/or phases are read correctly, and to see which reflections cause problems. In later stages this output can then be suppressed.
28. TABLE OF TOTALS DERIVED FROM THE STRUCTURE FACTORS INCLUDING THE R FACTOR. This table gives information about the number of reflections (and phases) used. W DELTA SQ, or SUM w(f)(Fo - G.Fc)^2, is the term being minimised. Then two residuals and a correlation coefficient are printed.
R = SUM(|Fo| - G.|Fc|) / SUM(|Fo|)
RDASH = (SUM(W.(|Fo| - G.|Fc|)^2) / SUM(W.|Fo|^2))^(1/2)
C = (N.SUM(|Fo|.|Fc|) - SUM(|Fo|).SUM(|Fc|)) / ((N.SUM(|Fo|^2) - SUM(|Fo|)^2) . (N.SUM(|Fc|^2) - SUM(|Fc|)^2))^(1/2)
where N is the number of amplitudes used.
The conventional R-factor is self-explanatory. However, it is the weighted R-factor which gives an indication of the progress of the refinement. As long as this residual is decreasing, there is hope, even when the unweighted R-factor temporarily increases (which is sometimes seen in the initial cycles of a refinement). The correlation coefficient may have greater discriminating power than the R-factors when refining potential molecular replacement solutions at low resolution.
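The three statistics of item 28 can be sketched in Python as follows. The function and argument names are hypothetical. The sketch takes the absolute value of each difference in R, as in the conventional definition of the R-factor; this is an assumption about what the formula for R intends.

```python
import math


def agreement_statistics(fo, fc, w, g):
    """Return (R, RDASH, C) for observed amplitudes fo, calculated
    amplitudes fc, weights w and scale factor g (all hypothetical names)."""
    n = len(fo)
    # Conventional R: absolute differences summed (assumption, see lead-in).
    r = sum(abs(o - g * c) for o, c in zip(fo, fc)) / sum(fo)
    # Weighted R (RDASH): square root of the ratio of weighted sums.
    rdash = math.sqrt(
        sum(wi * (o - g * c) ** 2 for wi, o, c in zip(w, fo, fc))
        / sum(wi * o * o for wi, o in zip(w, fo))
    )
    # Linear correlation coefficient between |Fo| and |Fc|.
    so, sc = sum(fo), sum(fc)
    soc = sum(o * c for o, c in zip(fo, fc))
    soo = sum(o * o for o in fo)
    scc = sum(c * c for c in fc)
    corr = (n * soc - so * sc) / math.sqrt(
        (n * soo - so * so) * (n * scc - sc * sc)
    )
    return r, rdash, corr


if __name__ == "__main__":
    # Made-up amplitudes purely for illustration.
    print(agreement_statistics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0],
                               [1.0, 1.0, 1.0], 1.0))
```

Note that C does not depend on the scale factor G, which is one reason it can remain informative when the scale is still poorly determined.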
***ANALYSIS OF STRUCTURE FACTOR TERMS***
29. This table prints the mean w.delta^2 values for amplitudes (and phases if PHAS is true) in batches according to resolution (columns) and amplitude (rows). The weighting formula, as defined by SCHEME and WF(i), is shown above the table. The table is very useful when judging the effect of the weights.
30. The values of the refined scale (G) and overall thermal parameter (U). If WATER=true, the values of the parameters SB1 and SB2 will also be printed.
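The resolution/amplitude analysis of item 29 can be pictured with a short Python sketch. It is only an illustration under assumed inputs: the function name, the tuple layout and the bin boundaries are hypothetical, and the batching RESTRAIN actually uses is not specified here.

```python
from bisect import bisect_right


def mean_wdelta2_table(reflections, res_edges, amp_edges):
    """Average w.delta^2 in bins of resolution (columns) and amplitude (rows).

    reflections: iterable of (s2, fo, wdelta2) tuples, where s2 is
                 (sin(theta)/lambda)^2 and wdelta2 is w * (Fo - G*Fc)^2.
    res_edges, amp_edges: ascending bin boundaries (hypothetical inputs).
    Empty bins are returned as None.
    """
    nrow, ncol = len(amp_edges) + 1, len(res_edges) + 1
    sums = [[0.0] * ncol for _ in range(nrow)]
    counts = [[0] * ncol for _ in range(nrow)]
    for s2, fo, wdelta2 in reflections:
        row = bisect_right(amp_edges, fo)   # amplitude bin
        col = bisect_right(res_edges, s2)   # resolution bin
        sums[row][col] += wdelta2
        counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(ncol)] for r in range(nrow)]


if __name__ == "__main__":
    # Made-up reflections purely for illustration.
    data = [(0.01, 50.0, 1.0), (0.01, 60.0, 3.0), (0.05, 500.0, 2.0)]
    for row in mean_wdelta2_table(data, [0.02], [100.0]):
        print(row)
```

In least-squares practice a weighting scheme is usually judged by how uniform such cell means are; large systematic trends along rows or columns suggest that the weights need adjusting.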
***GEOMETRY OF INPUT COORDINATES***
31. Under this header restrained interatomic distances are listed, when their rootweighted d(t) - d(c) (RWDELTA) is larger than the user supplied value for RWLMIN. In the early stages of a refinement it is advisable to print some differences, to check whether the order of the coordinates is correct, and to see which distances cause problems. In later stages this output can then be suppressed. This table also gives the r.m.s deviations from planarity of the peptide and ring planes where they exceed 0.03Å. If a chiral centre threatens to reverse hand, or has already done so, the tetrahedral volume will be printed. If many residues have this tendency as sometimes happens in the early stages of a refinement, it may be useful to use a dictionary with extra chiral restraints, and to use a value for the weighting coefficient WE(6) < WE(1).
At the right-hand side of this table the torsion angles as calculated from the coordinates are listed in the order as defined by the dictionary.
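Two of the geometric quantities mentioned in item 31, the signed tetrahedral volume at a chiral centre and a torsion angle, can be computed from coordinates as sketched below. The function names are hypothetical and the atom ordering is an assumption; in RESTRAIN the ordering is defined by the dictionary.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tetrahedral_volume(centre, a, b, c):
    """Signed volume of the tetrahedron (centre, a, b, c).

    The sign flips when two substituents are swapped, i.e. when the
    chiral centre reverses hand.
    """
    return _dot(_sub(a, centre), _cross(_sub(b, centre), _sub(c, centre))) / 6.0


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates purely for illustration.
    print(tetrahedral_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```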
***ANALYSIS OF ENERGY TERMS***
32. A table printing the mean w.delta^2 values for distance and planarity restraints in groups according to the target distance or plane type is given. This table will be very useful when judging the effect of the weighting coefficients which are also printed in this table, with WE(1) to WE(6) from left to right.
***ANALYSIS OF FUNCTION MINIMISED***
33. Under this heading a table prints the value of the function minimised (see section 1.1), showing the sum of the w.delta^2 values for the amplitudes, phases, distance restraints and planarity restraints, and their relative contributions to the total. This will be useful in defining the relative weights for each term. When FREF=true there will be a second table showing the relative residuals as a function of resolution.
***ANALYSIS OF GAUSS-SEIDEL SOLUTION OF NORMAL EQUATIONS***
34. This next block of information describes the convergence of the Gauss-Seidel iterative method for solving the normal equations for the positional parameters. The first table describes the condition of the matrix.
This is followed by a table describing the solution of the normal equations, listing for each iteration: the iteration number I, and MEAN(Q) and MAX(Q), the mean and maximum respectively of the elements of DELTA P(I) - DELTA P(I-1) and of (DELTA P(I) - DELTA P(I-1)) / DELTA P(I), where P(I) = solution vector at iteration I.
The ANGLE BETWEEN SHIFT VECTOR AND DIRECTION OF STEEPEST DESCENT gives an indication of the progress towards the minimum.
If the program cannot solve the normal equations, MFACR will be automatically incremented, and a retry will take place. If this leads to divergence again, some suggestions are printed.
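The kind of information reported in item 34 can be illustrated with a generic Gauss-Seidel solver in Python. This is a textbook sketch on a small dense matrix, not RESTRAIN's implementation: the function names are hypothetical, the per-iteration statistics are taken here as the mean and maximum absolute change in the solution vector, and the right-hand side of the normal equations is assumed to point along the direction of steepest descent.

```python
import math


def gauss_seidel(a, b, max_iter=50, tol=1e-8):
    """Solve a.x = b iteratively; return (x, history).

    history holds one (iteration, mean_change, max_change) tuple per
    iteration, where the changes are |x_new - x_old| per element.
    """
    n = len(b)
    x = [0.0] * n
    history = []
    for iteration in range(1, max_iter + 1):
        old = x[:]
        for i in range(n):
            # Use the most recent values of the other unknowns.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
        changes = [abs(x[i] - old[i]) for i in range(n)]
        history.append((iteration, sum(changes) / n, max(changes)))
        if max(changes) < tol:
            break
    return x, history


def angle_to_steepest_descent(shift, rhs):
    """Angle in degrees between the shift vector and the right-hand side
    (assumed here to be the direction of steepest descent)."""
    dot = sum(s * r for s, r in zip(shift, rhs))
    norm = math.sqrt(sum(s * s for s in shift)) * math.sqrt(sum(r * r for r in rhs))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


if __name__ == "__main__":
    # Small made-up symmetric positive definite system.
    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, history = gauss_seidel(a, b)
    for row in history:
        print(row)
    print(x, angle_to_steepest_descent(x, b))
```

For a positive definite matrix the angle stays below 90 degrees, so the computed shift always has a component along the downhill direction.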
***ANALYSIS OF RESIDUAL TO DETERMINE OPTIMUM SHIFT FACTOR***
35. This table shows the results of the sampled residual calculations using
Actual shift = SFACR * calculated shift
Sampled residual calculations are made to determine the optimum shift factor (ESTIMATED SHIFT FACTOR).
36. The r.m.s atomic shift is printed out. This indicates whether any refinement is still taking place, or if convergence has been reached.
37. If there are rigid groups, for each group the three translations and a rotation angle around an axis, of which the direction cosines are given, are printed together with the r.m.s atomic shift. The latter value will give an indication if convergence is being approached.
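One way to picture the shift-factor estimate of item 35 and the r.m.s. shift of item 36 is sketched below. How RESTRAIN derives ESTIMATED SHIFT FACTOR from the sampled residuals is not stated here; the sketch simply assumes a parabola fitted through three sampled points, and all names are hypothetical.

```python
import math


def estimate_shift_factor(residual, factors=(0.0, 0.5, 1.0)):
    """Estimate the shift factor that minimises residual(factor).

    A parabola is fitted through the residual sampled at three shift
    factors and its vertex is returned (an assumed strategy). If the
    fit has no minimum, the best sampled factor is returned instead.
    """
    f0, f1, f2 = factors
    r0, r1, r2 = residual(f0), residual(f1), residual(f2)
    denom = (f0 - f1) * (f0 - f2) * (f1 - f2)
    a = (f2 * (r1 - r0) + f1 * (r0 - r2) + f0 * (r2 - r1)) / denom
    b = (f2 * f2 * (r0 - r1) + f1 * f1 * (r2 - r0) + f0 * f0 * (r1 - r2)) / denom
    if a <= 0.0:
        return min(zip((r0, r1, r2), factors))[1]
    return -b / (2.0 * a)


def rms_shift(old_xyz, new_xyz):
    """Root-mean-square atomic shift between two coordinate lists."""
    total = sum((n[0] - o[0]) ** 2 + (n[1] - o[1]) ** 2 + (n[2] - o[2]) ** 2
                for o, n in zip(old_xyz, new_xyz))
    return math.sqrt(total / len(old_xyz))


if __name__ == "__main__":
    # Made-up residual with its minimum at a shift factor of 0.7.
    print(estimate_shift_factor(lambda s: (s - 0.7) ** 2))
    print(rms_shift([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 1, 0)]))
```

An r.m.s. shift that keeps falling from cycle to cycle is the practical sign that convergence is being approached.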
***ANALYSIS OF NON-CRYSTALLOGRAPHIC SYMMETRY***
38. When refining using non-crystallographic symmetry MODE 1 (RIGID=true) the program will print the new transformation for each molecule, followed by a description of this non-crystallographic symmetry operation in terms of a rotation and a screw translation. This can then be compared to the input value printed in item 17.
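The description of a transformation as a rotation plus a screw translation, as printed in items 17 and 38, can be derived from a rotation matrix and a translation vector as in the following sketch. It assumes the convention x' = R.x + t and does not handle rotations of exactly 0 or 180 degrees; the function name is hypothetical, and the angle conventions of NCSY POLAR are not addressed.

```python
import math


def describe_operator(rot, trans):
    """Return (angle_degrees, axis, screw) for x' = rot.x + trans.

    rot:   3x3 proper rotation matrix given as a list of rows.
    trans: translation vector.
    axis:  direction cosines of the rotation axis.
    screw: component of the translation along the axis.
    """
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.acos(cos_angle)
    s = 2.0 * math.sin(angle)
    if abs(s) < 1e-8:
        raise ValueError("0 or 180 degree rotation: axis needs special handling")
    # The antisymmetric part of the matrix gives the axis direction.
    axis = ((rot[2][1] - rot[1][2]) / s,
            (rot[0][2] - rot[2][0]) / s,
            (rot[1][0] - rot[0][1]) / s)
    screw = sum(a * t for a, t in zip(axis, trans))
    return math.degrees(angle), axis, screw


if __name__ == "__main__":
    # Made-up operator: 90 degrees about z with a translation.
    rot = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(describe_operator(rot, (1.0, 2.0, 5.0)))
```

A screw component close to zero indicates a pure rotation, which is often the easiest operator to visualise when comparing the refined transformation with the input one.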
***SHIFTS IN OUTPUT COORDINATES***
39. Next is printed a listing of all atoms, to which shifts larger than DXYZLM have been applied, or which have U values not within the range ULOW to UHIGH. In case of anisotropic atoms the trace is used to determine whether the tensor is printed. In the case of multiple cycles the shifts refer to the last cycle only.
40. The r.m.s atomic shift for the original input coordinates is printed out. This will be different from the one under item 36 when more than one cycle has been run, and/or when constrained-restrained refinement has taken place.
41. When refining TLS parameters there is a list of the refined TLS groups with the derived anisotropic tensor for each atom in the group. This is checked for being positive definite. The results may be compared with those of item 20.
H K L 40000(sin(theta)/lambda)^2 Fo/G SIGMA/G Fc PHASE
in the format (3I4,4I6,I4) for REFOUT, or
H K L Fo/G SIGMA/G Fc PHASE
unformatted for HKLOUT. When no sigma is read in, 1/sqrt(weight) replaces SIGMA in the output.
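A REFOUT record written with the Fortran format (3I4,4I6,I4) can be read back with fixed-width slicing, as in this Python sketch. The field names and the helper for converting the resolution field to a d-spacing are assumptions made for the example; only the field widths and their order come from the format given for REFOUT.

```python
import math

# Field widths from the Fortran format (3I4,4I6,I4).
WIDTHS = (4, 4, 4, 6, 6, 6, 6, 4)
# Hypothetical field names, in the order given for REFOUT.
NAMES = ("h", "k", "l", "res", "fo", "sigma", "fc", "phase")


def parse_refout_line(line):
    """Split one fixed-width REFOUT record into a dict of integers."""
    record = {}
    pos = 0
    for name, width in zip(NAMES, WIDTHS):
        field = line[pos:pos + width]
        # Fortran reads an all-blank integer field as zero.
        record[name] = int(field) if field.strip() else 0
        pos += width
    return record


def d_spacing(res):
    """Convert 40000*(sin(theta)/lambda)^2 to d in Angstrom.

    d = 1 / (2 sin(theta)/lambda) = 100 / sqrt(res).
    """
    return 100.0 / math.sqrt(res)


if __name__ == "__main__":
    # Made-up record purely for illustration.
    line = "%4d%4d%4d%6d%6d%6d%6d%4d" % (1, -2, 3, 2500, 1234, 56, 1200, 90)
    rec = parse_refout_line(line)
    print(rec, d_spacing(rec["res"]))
```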
If DESMAT is set true, the design matrix is written to the file DESOUT. The output file is used by another program (FUMAIN2*) for estimation of the variance of the least-squares residual. At present this feature is experimental.
*FUMAIN2 is not yet a part of CCP4.
An unformatted scratch file (unit 14) may be used for temporary reflection storage when initial calculation of the overall scale and thermal parameters, or of the amplitude weighting coefficients, is required.
An unformatted scratch file (unit 11) will be opened to store the approximation to the normal matrix, in which contributions to the off-diagonal terms are included for the energy restraints and 3x3 blocks are used for the contribution from the positional parameters of the atoms. All other off-diagonal terms are taken as zero. This file is read several times during the solving of the normal equations (see variables SFTLIM, CGFACR and GSFACR in section 3.1.3).
When using weighting schemes with the standard deviation or when using MIR or MIRAS phases you must have these present in your reflection file.
#!/bin/tcsh
set r=$0:r
time restrain <<EOF
TITLE Illustrating all the options in one script!
!
! First define the input and output files (can also do it on command line).
! All input is free format, order and letter case of keywords don't matter.
!
XYZIN hexpep.brk                  ! Check section 3.3 for preparation guide.
TLSIN hexpep.tls                  ! Needed for group thermal parameters.
                                  ! Described in detail below.
HKLIN hexpepf.mtz
LABIN FP=FP_hexpep SIGFP=SP_hexpep FREE=FreeR_flag
XYZOUT $r.brk
TLSOUT $r.tls
HKLOUT $r.mtz
LABOUT FC=FC_hexpep PHIC=PC_hexpep
!
! ANISO creates individual atomic anisotropic thermal tensors (high res.only!).
!
ANISO 327.CA                      ! This will match either Calpha or calcium.
ANISO 10. 50.                     ! Residues 10-50, all atoms.
ANISO 200. 250. ' CA' ' CB'       ! Calpha's (but not calcium!) & Cbeta's only.
ANISO 100. 150. mnch              ! Main chain atoms only.
ANISO 151. 190. sdch              ! Side chain atoms only.
!
! NCSYMM defines NCS operators (3 molecules/a.u. here; identity is assumed).
!
NCSY POLAR 25.563 87.995 127.906  ! Can also say "NCSY MATRIX ...".
NCSY TRANS 100.076 -3.502 9.137   ! Use lsqkab to get these.
NCSY POLAR 65.746 117.435 180.153
NCSY TRANS 119.479 46.151 31.805
!
! OCCU allows occupancies in PDB file to be used, and creates occupancy groups.
! Here group A consists of 4 atoms with 3 coupled occupancy parameters,
! i.e. their sum is constant.
! Group B consists of 6 atoms with one free occupancy parameter.
!
OCCU 101.CG 4 A 1                 ! First atom id, no. of atoms, group id, coupling id.
OCCU 151.CB 5 B
OCCU 51.SG 1 B
OCCU 251.CG 4 A 2
OCCU 201.CG 4 A 3
!
! RIGID defines rigid bodies.
!
RIGID 10. 50. A                   ! Residues 10-50, all atoms, rigid group A.
RIGID 200. 250. A                 ! More atoms in group A.
RIGID 100. 150. A                 ! Yet more.
RIGID 151. 190. B                 ! These are in rigid group B.
!
! XTRD defines extra distance restraints.
! Here's a real example with a disordered cystine.
!
XTRD 18.N 618.CB 2.455 0.034      ! Atom 1 Atom 2 d [sigma(d)]
XTRD 18.CA 618.CB 1.530 0.020     ! Residue 618 is an alternate s/c of 18.
XTRD 18.CA 618.SG 2.822 0.043
XTRD 18.CB 622.SG 3.034 0.059
XTRD 18.SG 622.CB 3.034 0.059
XTRD 18.SG 622.SG 2.030 0.008
XTRD 18.C 618.CB 2.504 0.038
XTRD 22.N 622.CB 2.455 0.034      ! Residue 622 is an alternate s/c of 22.
XTRD 22.CA 622.CB 1.530 0.020
XTRD 22.CA 622.SG 2.822 0.043
XTRD 22.C 622.CB 2.504 0.038
XTRD 618.CB 622.SG 3.034 0.059
XTRD 618.SG 622.CB 3.034 0.059
XTRD 618.SG 622.SG 2.030 0.008
!
STEER
!
! "Steering data" follows STEER keyword (uses simulated Fortran NAMELIST).
!
NCYC=8, CYCNO=21, SCHEME=5        ! May want to modify these.
EOF
UANISO                 ! Overall anisotropic tensor.
DEFAULT                ! Defines default values,
                       ! i.e. may be overridden.
UANISO N-domain        ! Group anisotropic tensor just for one domain.
RANGE 1. 180.          ! Domain consists of 2 contiguous segments.
RANGE 98. 327.
TLS C-domain           ! TLS tensor for other domain.
RANGE 191. 290.        ! This domain has just one segment.
TLS A-helix            ! TLS tensor for helix main chain.
RANGE 30. 55. mnch
NOATOM                 ! Don't refine atomic Uiso's for this group.
UANISO TRP 99 s/c
RANGE 99. '' sdch      ! U tensor for individual side chain.
NOATOM                 ! Don't refine atomic Uiso's for this group.
UISO                   ! Can also do group isotropic tensors.
RANGE 100. 130. sdch   ! Side-chains of residues 100-130 will have
RESIDUE                ! separate group Uiso's.
! APP ANISO/TLS AT 2.1Å no refine Creact.
! Output from refinement cycle 5
UANISO Polypro helix.
RANGE 1. 9. ALL
U 0.0053 -0.0086 0.0033 -0.0140 0.0044 0.0007
! (0.0112)(0.0127)(0.0204)<0.0064>(0.0063)(0.0066)
TLS Alpha helix.
RANGE 13. 32. ALL
ORIGIN -3.103 -8.863 3.788
T 0.2217 0.1885 0.2016 -0.0052 0.0067 0.0045
! <0.0083><0.0090><0.0145>(0.0053)<0.0052>(0.0052)
L 0.62 1.67 2.23 -0.45 -0.15 0.23
! ( 1.12)< 0.46>< 0.87>< 0.44>( 0.82)( 0.47)
S -0.025 0.000 0.045 -0.008 0.048 -0.034 -0.034 -0.066
! ( 0.057)( 0.070)< 0.034>( 0.049)( 0.052)( 0.052)( 0.054)< 0.033>
END
NCSY POLAR ...
NCSY TRANS ...
G=5.0972, SB1=3.6804, SB2=10.6158
WF(2)= 1.94021E+03, WF(3)= 1.06121E+01, WF(4)= 1.22706E-02