WATERTIDY (CCP4: Supported Program)

NAME

watertidy - rationalise waters at the end of refinement

SYNOPSIS

watertidy xyzin refined-coords.brk distout distang-out.log xyzout tidied-coords.brk
[Keyworded input]

DESCRIPTION

At the end of refinement it is useful to try to rationalise the H2O naming. You may have more than one molecule in the asymmetric unit, or two isomorphous structures, and want to compare their H2O structures.

This program has two purposes.

  1. It moves the H2O coordinates to the symmetry related position nearest to the host molecule.
  2. It attempts to design an H2O naming system which gives some information about the residue which a particular H2O is hydrogen bonded to. The user inputs chain IDs for host chains and assigns an output ID for the H2Os bonded to this chain.
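
As an illustration of the first purpose, the following minimal Python sketch picks, from all symmetry images of one water, the image closest to a host atom. It is not WATERTIDY's own code. The function name nearest_symmetry_mate, the representation of operators as (rotation, translation) pairs in fractional coordinates, and the restriction to an orthogonal cell are all assumptions made to keep the example short.

```python
import math


def nearest_symmetry_mate(water_frac, host_fracs, symops, cell):
    """Return (distance, fractional coords) of the water image nearest a host atom.

    water_frac : (x, y, z) of the water, fractional coordinates
    host_fracs : list of (x, y, z) host atom positions, fractional coordinates
    symops     : list of (rotation, translation); rotation is a 3x3 nested list
    cell       : (a, b, c) in Angstrom; ASSUMES all cell angles are 90 degrees
    """
    best = None
    for rot, trans in symops:
        # Apply the symmetry operator to the water position.
        image = [sum(rot[r][c] * water_frac[c] for c in range(3)) + trans[r]
                 for r in range(3)]
        for host in host_fracs:
            # Shift the image by whole unit cells so that it lies as close as
            # possible to this host atom (exact per axis for an orthogonal cell).
            delta = [image[r] - host[r] for r in range(3)]
            delta = [d - round(d) for d in delta]
            dist = math.sqrt(sum((delta[r] * cell[r]) ** 2 for r in range(3)))
            if best is None or dist < best[0]:
                best = (dist, [host[r] + delta[r] for r in range(3)])
    return best
```

A general (non-orthogonal) cell would need the orthogonalisation matrix and a search over neighbouring cells instead of the per-axis rounding used here.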

The distance search is done with the program DISTANG, which must be run first. WATERTIDY then reads in the DISTOUT file from DISTANG which lists all close contacts, and does some preliminary analysis of H2O contacts (e.g. contact too close, C involved in close contact, number of contacts per chain).

Important: post CCP4 V4.2, WATERTIDY cannot read the logfile from DISTANG directly. Instead the OUTPUT DISTOUT option of DISTANG must be used - the resulting output file assigned to DISTOUT from DISTANG will be in the appropriate format to be read in by WATERTIDY.

The naming scheme raises a further problem: what to do about H2Os which are bonded to more than one host atom? The solution used here is to list such H2Os more than once, giving the site closest to a host atom the input occupancy, and all secondary sites occupancy <occw> (default value 0.01, see keyword OCCW).
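
The duplication rule can be sketched in a few lines of Python. This is an illustration only, not the program's source: the function name duplicate_sites and the (host label, distance) input format are hypothetical. The rule that secondary sites are dropped when <occw> is 0.0 is taken from the OCCW keyword description.

```python
def duplicate_sites(contacts, input_occupancy, occw=0.01):
    """Return [(host_label, occupancy), ...] for one water, closest host first.

    contacts : list of (host_label, distance_in_angstrom) for this water
               (hypothetical input format chosen for the example)
    """
    if not contacts:
        return []
    ranked = sorted(contacts, key=lambda contact: contact[1])
    # The site closest to a host atom keeps the input occupancy.
    sites = [(ranked[0][0], input_occupancy)]
    # Secondary sites get <occw>; with OCCW 0.0 they are not written at all.
    if occw > 0.0:
        sites += [(host, occw) for host, _ in ranked[1:]]
    return sites


# Water 772 from the example output; distances computed from the listed
# coordinates and rounded.
print(duplicate_sites([("N VAL A 3", 2.97), ("OE1 GLU A 4", 2.75)], 1.00))
```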

The program can be run first to find the H2Os linked to the protein molecule, then a second or third pass would attempt to apply the same rules to renaming H2Os in a second or third solvent shell which will not have been renamed at all in the previous pass.

Atoms that are not relabelled are output exactly as input.

WATERTIDY names the waters with the appropriate output ID and a label containing information about which residue and atom type the water is H-bonded to. An H2O is labelled in the output PDB file as

O<i><j> WAT <chnid> <nres>
where <nres> is the host residue number and <chnid> is the assigned output ID. <i> and <j> are defined as follows:
  1. If the host atom belongs to a protein residue the number <i> (range 0-9) defines the bonding atom type as follows:
          0 for N 
          1 for O
          2 for OG OG1 
          3 for OD1 ND1
          4 for OD2 ND2
          5 for OE OE1 NE1  
          6 for OE2 NE2
          7 for NZ        
          8 for OH OH1 NH1 
          9 for OH2 NH2
    
    Additional assignments for <i> are made as follows:
          0  for OW
         <n> for O<n> or OW<n> where n=0-9
         <n> for O<n><m> where n,m=0-9
    
    The number <j> (range 0-3) numbers the contact of the H2O to the protein atom; up to <hbond> H2Os can be bonded to one atom. To allow other acceptor atoms (e.g. C, S; see keyword ACCEPT), the atom-type numbering <i> is extended as follows:
          0 for CA        as well
          1 for C         as well
          2 for CG CG1    as well
          3 for CD CD1    as well
          4 for CD2 CD3   as well
          5 for CE CE1    as well
          6 for CE2 CE3.. as well
          7 for CZ        as well
          8 for CH CH1    as well
          9 for CH2 CH3.. as well
    
  2. If the host atom is another H2O the number <i> will be the same as that of the host atom.
    The number <j> (range 4-6) numbers the contact of the H2O to its host for the second shell; up to 3 H2Os can be bonded to one atom and <j> is offset to the range 4-6 to make it clear which H2Os are in the second shell.
    The number <j> (range 7-9) numbers the contact of the H2O to its host for the third shell; up to 3 H2Os can be bonded to one atom and <j> is offset to the range 7-9 to make it clear which H2Os are in the third shell.
    For molecules with non-crystallographic symmetry there is no guarantee that the value of <j> for an H2O bonded to one chain will be the same as that for the equivalent H2O bonded to the related chain.
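
The labelling rules above can be summarised in a short Python sketch. It is a reading of the tables in this section, not the program's own code. The names ATOM_TYPE_INDEX, atom_type_index, contact_index and water_label are hypothetical, the contact counter k is assumed to start at 0, and only the atom names listed explicitly in the tables are included (the ".." entries are not expanded).

```python
import re

# <i> values transcribed from the two tables in this section.
ATOM_TYPE_INDEX = {
    "N": 0, "CA": 0,
    "O": 1, "C": 1,
    "OG": 2, "OG1": 2, "CG": 2, "CG1": 2,
    "OD1": 3, "ND1": 3, "CD": 3, "CD1": 3,
    "OD2": 4, "ND2": 4, "CD2": 4, "CD3": 4,
    "OE": 5, "OE1": 5, "NE1": 5, "CE": 5, "CE1": 5,
    "OE2": 6, "NE2": 6, "CE2": 6, "CE3": 6,
    "NZ": 7, "CZ": 7,
    "OH": 8, "OH1": 8, "NH1": 8, "CH": 8, "CH1": 8,
    "OH2": 9, "NH2": 9, "CH2": 9, "CH3": 9,
}


def atom_type_index(host_atom):
    """Return <i> for a host atom name (protein atom or water)."""
    name = host_atom.strip().upper()
    if name in ATOM_TYPE_INDEX:
        return ATOM_TYPE_INDEX[name]
    if name == "OW":
        return 0
    # O<n>, O<n><m> and OW<n> all map to n.
    match = re.fullmatch(r"O(\d)\d?|OW(\d)", name)
    if match:
        return int(match.group(1) or match.group(2))
    raise ValueError(f"no <i> assignment listed for atom {host_atom!r}")


def contact_index(shell, k, hbond=4):
    """Return <j> for the k-th (0-based, assumed) water on one host atom."""
    # Shell 1 uses 0-3, shell 2 uses 4-6, shell 3 uses 7-9.
    offset, limit = {1: (0, min(hbond, 4)), 2: (4, 3), 3: (7, 3)}[shell]
    if not 0 <= k < limit:
        raise ValueError(f"shell {shell} allows at most {limit} waters per atom")
    return offset + k


def water_label(host_atom, shell, k, hbond=4):
    """Build the O<i><j> atom name for a water."""
    return f"O{atom_type_index(host_atom)}{contact_index(shell, k, hbond)}"


print(water_label("OE1", 1, 0))  # first-shell water on an OE1 atom
print(water_label("O10", 2, 1))  # second water bonded to first-shell water O10
```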

When you have assigned as many shells as you feel are needed, re-sort the output water atoms of the PDB file on <chnid>, residue number, etc., using the system sort utility. On Unix, the following command sorts on <chnid> first, then residue number, then atom number:

sort +4 -5 +5 -6 +3.1 -3.3 wat.pdb > wat_sorted.pdb
BEWARE: Your CRYST1 and SCALE cards will be scrambled by the sorting.
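
One way to avoid scrambling the header cards is to reorder only the water records and leave every other line where it is. The Python sketch below does this using the fixed column positions of the PDB format. The function name sort_waters is hypothetical, and sorting on the atom name (the O<i><j> label) as the third key is an assumption about what "atom number" is meant to cover.

```python
def sort_waters(lines, water_resname="WAT"):
    """Return a copy of lines with water records sorted, other lines untouched."""

    def is_water(line):
        # Record name in columns 1-6, residue name in columns 18-20.
        return line[:6] in ("ATOM  ", "HETATM") and line[17:20] == water_resname

    def sort_key(line):
        # Chain ID (column 22), residue number (columns 23-26),
        # atom name (columns 13-16).
        return (line[21], int(line[22:26]), line[12:16])

    # Positions currently occupied by water records.
    slots = [i for i, line in enumerate(lines) if is_water(line)]
    ordered = sorted((lines[i] for i in slots), key=sort_key)

    # Put the sorted waters back into the same positions.
    result = list(lines)
    for i, line in zip(slots, ordered):
        result[i] = line
    return result
```

Typical use would be to read XYZOUT with readlines(), pass the list through sort_waters, and write the result to a new file.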

INPUT AND OUTPUT FILES

Input

XYZIN
Input coordinate file in PDB format.
DISTOUT
Output file from the program DISTANG, using the OUTPUT DISTOUT option. The program reads the list of distances included in the DISTOUT file, and ignores the rest.

Output

XYZOUT
Output coordinate file in PDB format. Water atoms will be relabelled as described above, and may have been moved to a symmetry-related position. Water atoms which bond to more than one host atom will be duplicated, with second and subsequent entries having occupancy <occw>.

KEYWORDED INPUT

Available keywords are:

ACCEPT, CHNID, END, HBOND, OCCW, SHELL, SYMMETRY, TITLE, WATID.

ACCEPT <id> ...

Specify extra acceptors: single character atom types, default O N.

CHNID <chainid> [ WATOUTID <id> ] [ RANGE <residue1> <residue2> ]

<chainid>
The chain identifier of the host chain, as it appears in XYZIN, e.g. A or B.
<id>
A single character label for the water chain bonded to <chainid>, to be used in XYZOUT.
<residue1> <residue2>
The starting and ending residue numbers for the host chain. This range is necessary if the chain is not numbered 1, 2, 3... or if you have more than one chain.

HBOND <hbond>

Maximum number of waters bonded to one atom, default 4.

OCCW <occw>

Occupancy for secondary sites (default 0.01). If <occw> is set to 0.0 then secondary sites are not written to XYZOUT.

SHELL <shell>

Specify the shell number (up to 3), default 1.

SYMMETRY <SG name> | <SG number> | <operators>

Standard symmetry specification. This must be the same as used for DISTANG.

TITLE <title>

<title> is written to the output PDB file as a REMARK.

WATID <id>

Water chain id. The chain identifier for unassigned H2Os to be assigned in this pass, as it appears in XYZIN.

END

Terminate input.
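
To show how the keywords fit together, the following Python sketch assembles a keyworded input for a single pass. It uses only the keywords documented above. The function name watertidy_keywords and all values in the usage line (space group, chain IDs, residue range) are hypothetical. It assumes one CHNID line per host chain and that the order of keywords before END does not matter.

```python
def watertidy_keywords(symmetry, chains, watid=None, shell=1,
                       hbond=None, occw=None, title=None):
    """Return keyworded input text for one WATERTIDY pass.

    chains : list of (chainid, watoutid, residue_range); watoutid and
             residue_range may be None, residue_range is (first, last)
    """
    lines = []
    if title is not None:
        lines.append(f"TITLE {title}")
    lines.append(f"SYMMETRY {symmetry}")
    for chainid, watoutid, residue_range in chains:
        parts = ["CHNID", chainid]
        if watoutid is not None:
            parts += ["WATOUTID", watoutid]
        if residue_range is not None:
            parts += ["RANGE", str(residue_range[0]), str(residue_range[1])]
        lines.append(" ".join(parts))
    if watid is not None:
        lines.append(f"WATID {watid}")
    lines.append(f"SHELL {shell}")
    if hbond is not None:
        lines.append(f"HBOND {hbond}")
    if occw is not None:
        lines.append(f"OCCW {occw}")
    lines.append("END")
    return "\n".join(lines) + "\n"


# Hypothetical first-shell pass: waters in chain W bonded to chain A become chain P.
print(watertidy_keywords("P212121", [("A", "P", (1, 100))], watid="W"))
```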

EXAMPLES

Example of output file

REMARK
REMARK
SCALE2       0.00000   0.03820   0.00000        0.00000
SCALE3       0.00000   0.00000   0.01937        0.00000
SCALE1       0.01897   0.00000   0.00099        0.00000
ATOM      1  N   GLY A   1      -8.094   0.714  38.861  1.00 19.52
 ...
ATOM     18  C   VAL A   3     -10.635   2.653  34.037  1.00 15.79
ATOM     13  N   VAL A   3      -8.153   2.210  33.953  1.00 16.23
 ...
ATOM     25  N   GLU A   4     -10.661   2.145  35.262  1.00 13.58
ATOM     28  O   GLU A   4     -12.831   4.702  36.359  1.00 15.64
 ....
ATOM     21  OE1 GLU A   4      -9.572   0.074  36.837  1.00 30.05
ATOM     20  OE2 GLU A   4     -11.042  -1.224  35.968  1.00 32.63
 ....
ATOM    769  O00 WAT P   1      -8.453  -1.913  39.350  1.00 45.10
   An H2O bonded to the N of GLY A 1...
ATOM    772  O00 WAT P   3      -7.612  -0.514  34.997  0.01 22.90
   An H2O bonded to the N of VAL A 3...
ATOM    750  O10 WAT P   4     -14.304   4.121  38.925  1.00 25.25
ATOM    772  O50 WAT P   4      -7.612  -0.514  34.997  1.00 22.90
 ...
ATOM    795  O04 WAT T   3      -5.847  -2.930  35.432  0.01 30.04
ATOM    749  O14 WAT T   4     -11.391   4.228  40.350  1.00 32.06
ATOM    811  O15 WAT T   4     -14.681   2.966  41.308  1.00 56.74
ATOM    795  O54 WAT T   4      -5.847  -2.930  35.432  0.01 30.04
 ...

Unix example script found in $CEXAM/unix/runnable/

SEE ALSO

distang, pdbset, sort(1)