The shape correlation statistic Sc (Lawrence and Colman, 1993) can be used to quantify the shape complementarity of protein/protein interfaces and give an idea of the "goodness of fit" between two protein surfaces. The program SC will calculate values of Sc and related statistics for the interface region between two molecules in a Brookhaven coordinate file.
SC also allows the normal products to be merged into GRASP surface files for display in GRASP (Nicholls, 1993).
The input comprises three sections:
The molecule definition commands are used to select which atoms in the input file are to make up the two individual molecules for the Sc calculation. Entries for this section appear twice, once for each molecule (see EXAMPLES):
AT_EXCL, AT_INCL, CHAIN, MOLECULE, ZONE
The default values for the parameters are set inside the program at compilation time (in the file defaults.h), and should be suitable for most applications. In particular you should avoid using different values for PROBE_RADIUS, TRIM and WEIGHT if you intend to compare your values of Sc with the results of other calculations, or with values found in the literature.
DOT_DENSITY, INTERFACE, PROBE_RADIUS, TRIM, WEIGHT
These commands are only required if you want to merge the results of the Sc calculations with existing GRASP surface files for the purposes of graphical display.
GRASP_BACKGROUND, GRASP_MATCH
See NOTES ON GRASP FILES if you intend to use the merging facility.
PROBE_RADIUS [Default: 1.7 Å]
Sets the radius of the probe sphere which is used to define the solvent excluded surface.
Note: You should avoid changing the probe radius if you intend to cross-compare the results of the Sc calculation with values obtained elsewhere, as the comparison will be invalid if different probe radii are used.
DOT_DENSITY [Default: 15 dots/Å²]
The density of the dots used to calculate the molecular surface - higher values (more dots per unit area) give higher precision but also take longer to run.
TRIM <trim> [Default: 1.5 Å]
Sets the distance used to generate the peripheral band.
The peripheral band consists of those surface points which are part of the buried portion of the molecular surface but which lie within a distance <trim> of the non-buried (i.e. solvent accessible) surface. Points in the peripheral band are omitted from the calculations.
Note: You should avoid changing the width of the peripheral band if you intend to cross-compare the results of the Sc calculation with values obtained elsewhere, as the value of Sc depends on the width of the excluded band.
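The trimming rule can be illustrated with a minimal sketch. It is not the code used by SC: the function name, the point lists and the strict "less than" comparison are assumptions made for illustration only.

```python
import math

def trim_peripheral_band(buried, accessible, trim=1.5):
    """Return the buried points that are NOT in the peripheral band.

    A buried point is treated as part of the band (and dropped) when it lies
    closer than `trim` to any solvent accessible point. Whether SC uses a
    strict or inclusive comparison is an assumption here.
    """
    kept = []
    for p in buried:
        near_edge = any(math.dist(p, q) < trim for q in accessible)
        if not near_edge:
            kept.append(p)
    return kept

# Illustrative coordinates (in Å), not taken from any real structure.
buried = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
accessible = [(4.0, 0.0, 0.0)]
print(trim_peripheral_band(buried, accessible))
# -> [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

The third buried point is 1.0 Å from the accessible point, so it falls inside the 1.5 Å band and is omitted. A small interface loses a larger fraction of its points this way than a large one.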
INTERFACE [Default: 8 Å]
Distance determining which atoms are used in the calculations. See PROGRAM FUNCTION for details about this parameter before changing it.
WEIGHT [Default: 0.5 Å⁻²]
This sets the value of the weighting factor used in the calculation of the surface complementarity function S(A->B). (See PRINTER OUTPUT for the definition of S(A->B).)
Note: You should avoid changing the weighting factor if you intend to cross-compare the results of the Sc calculation with values obtained elsewhere, as the value of Sc depends on the weighting used.
GRASP_MATCH <tol> [Default: 1.5 Å]
The tolerance for equivalencing GRASP and SC surface points. The strategy employed by the program is to assign to each GRASP surface vertex the weighted normal dot product associated with the Connolly surface point nearest to that vertex. If no point used in the Sc calculation is found within a distance <tol> of the vertex, then the vertex is deemed to be part of the non-interacting surface. A suitable value of <tol> will depend on the dot density and resolution of the respective surfaces. Vertices of the non-interacting surface are given the general property 1 value set by the GRASP_BACKGROUND keyword (below).
GRASP_BACKGROUND [Default: -2.0]
General Property 1 value for vertices that lie more than GRASP_MATCH from any Connolly point within the interacting surfaces. The aim here is simply to set up a distinctly different value that can hence be displayed in a separate colour within GRASP.
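The matching strategy controlled by these two keywords can be sketched as follows. This is an illustration only, not the routine in SC: the function name, the argument names and the inclusive tolerance test are assumptions, and no GRASP file is read or written.

```python
import math

def map_to_vertices(vertices, sc_points, sc_values, tol=1.5, background=-2.0):
    """Give each GRASP vertex the value of its nearest SC surface point.

    A vertex with no SC point within `tol` receives `background`, so that it
    can be shown in a distinct colour.
    """
    mapped = []
    for v in vertices:
        best_value, best_dist = background, tol
        for p, s in zip(sc_points, sc_values):
            d = math.dist(v, p)
            if d <= best_dist:  # inclusive tolerance is an assumption
                best_value, best_dist = s, d
        mapped.append(best_value)
    return mapped

# Illustrative data: two GRASP vertices and two SC points with S values.
vertices = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
sc_points = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
sc_values = [0.9, 0.2]
print(map_to_vertices(vertices, sc_points, sc_values))
# -> [0.9, -2.0]
```

The first vertex takes the value of the SC point 0.5 Å away; the second has no SC point within 1.5 Å and so receives the background value.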
END
End keyworded input.
The program output includes the following loggraph tables for each of the molecules.
D(A->B) is defined as

    D(A->B)(xA) = |xA - x'A|

where xA is a point on the interface (i.e. buried) surface of molecule A and x'A is the nearest surface point to xA on molecule B. (It is noted that differences in shape complementarity are less well discerned by these simple distance metrics. See Lawrence and Colman, 1993.)
S(A->B) (also referred to as the weighted normal dot product) is defined as

    S(A->B)(xA) = (nA . n'A) exp[-w |xA - x'A|^2]

where xA, x'A have the same meanings as above, nA,n'A are the normals to the surfaces at those points, and w is a weighting factor.
The shape correlation Sc is then defined as

    Sc = ( {S(A->B)} + {S(B->A)} ) / 2

where the braces denote the median of the S(A->B), S(B->A) distributions. (See Lawrence and Colman, 1993 for more detailed descriptions of these functions.)
Interfaces with Sc = 1 mesh precisely; interfaces with Sc close to zero are effectively uncorrelated in their topography.
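As a rough numerical illustration of these definitions, the sketch below evaluates S and Sc for two small sets of surface points. It assumes that the weighted normal dot product has the form (nA . n'A) exp(-w |xA - x'A|^2) given by Lawrence and Colman (1993), and that both sets of normals are stored pointing outward from their own molecule, so the normal on the second surface is negated. The function names and the sample points are hypothetical, and no surface generation or trimming is attempted.

```python
import math
from statistics import median

def s_values(points_a, normals_a, points_b, normals_b, w=0.5):
    """Weighted normal dot product S(A->B) at every surface point of A."""
    values = []
    for x_a, n_a in zip(points_a, normals_a):
        # Nearest surface point of B to x_a, with the normal stored for it.
        x_b, n_b = min(zip(points_b, normals_b),
                       key=lambda pn: math.dist(x_a, pn[0]))
        d = math.dist(x_a, x_b)
        # Assumption: both normals point outward, so n_b is negated to make
        # facing surfaces give a positive product.
        dot = -sum(a * b for a, b in zip(n_a, n_b))
        values.append(dot * math.exp(-w * d * d))
    return values

def shape_correlation(points_a, normals_a, points_b, normals_b, w=0.5):
    """Sc as the mean of the medians of S(A->B) and S(B->A)."""
    s_ab = s_values(points_a, normals_a, points_b, normals_b, w)
    s_ba = s_values(points_b, normals_b, points_a, normals_a, w)
    return 0.5 * (median(s_ab) + median(s_ba))

# Two flat patches facing each other across a 1 Å gap (illustrative only).
patch_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
norm_a = [(0.0, 0.0, 1.0)] * 3
patch_b = [(x, y, 1.0) for x, y, _ in patch_a]
norm_b = [(0.0, 0.0, -1.0)] * 3

print(round(shape_correlation(patch_a, norm_a, patch_b, norm_b), 4))
# -> 0.6065
```

With perfectly opposed normals the dot product is 1, so the value here is just the distance weighting exp(-0.5 x 1^2). Closing the gap to zero gives Sc = 1, the "mesh precisely" case.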
Note that Sc may become rather meaningless when the buried area becomes small, and hence it may not be a good measure for small crystal contacts. This is simply because as the overall buried area becomes smaller and/or more convoluted or disjointed in shape, the percentage removed as part of the peripheral band increases substantially.
This program computes Sc between two molecules in a numerical fashion. The algorithm is fully detailed in Lawrence and Colman, 1993. Briefly: the molecular surfaces are represented as a series of discrete points (Connolly, 1983) of sufficiently high surface sampling density (set by the DOT_DENSITY keyword) and S(1->2) and S(2->1) are then evaluated at these points.
The interface surfaces are defined as being the portion of the molecular surface of molecule 1 which is buried from solvent by its interaction with molecule 2 (and vice versa). The molecular surface itself is defined (Richards, 1977) as the union of contact and re-entrant portions demarcated by a probe sphere of a given radius (set by the PROBE_RADIUS keyword).
Only atoms within the INTERFACE distance of any "buried" atoms (defined in the Connolly sense) are selected for initial surface computation. This parameter does not enter formally into the evaluation of Sc; its purpose is simply to speed up the computation by excluding atoms remote from the interface. In practice the program does not compute the entire surface of each molecule, but only the surface of the subset of atoms within the INTERFACE distance of the other molecule. A portion of this surface is non-physical, as it is buried within the core of the individual molecule; however, its presence does not affect the computation of Sc because it is remote from the interaction. If there is any doubt about the validity of this approach for a particular molecule, the program should be rerun with a larger value of this parameter to check that the computed Sc is stable. Subsequently, the peripheral band of buried points is removed, i.e. those points that lie within a distance TRIM of any solvent accessible surface point.
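The effect of the INTERFACE distance can be pictured with a simplified sketch. SC bases its selection on buried atoms in the Connolly sense; the version below substitutes a plain centre-to-centre distance test against the other molecule, which is an assumption made to keep the example short. The function name and coordinates are hypothetical.

```python
import math

def select_interface_atoms(atoms_1, atoms_2, interface=8.0):
    """Keep atoms of molecule 1 within `interface` of any atom of molecule 2.

    Simplified stand-in for SC's selection, which is based on buried atoms
    rather than on a raw distance to the partner molecule.
    """
    return [a for a in atoms_1
            if any(math.dist(a, b) <= interface for b in atoms_2)]

# Illustrative atom centres (in Å).
mol_1 = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
mol_2 = [(-2.0, 0.0, 0.0)]

print(select_interface_atoms(mol_1, mol_2))
# -> [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(len(select_interface_atoms(mol_1, mol_2, interface=25.0)))
# -> 3
```

Raising the distance only adds atoms far from the interface, which is why a stable Sc on rerunning with a larger value indicates that the default selection was adequate.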
Cross-comparison of Sc values between proteins (i.e. characterisation of surfaces as more or less complementary than other types of surface) is the main interest in SC. This is only valid if the same values of the critical parameters (probe radius, width of the peripheral band, atomic radii, weighting factor) are used in both computations. To this end it is recommended that the default values of PROBE_RADIUS, TRIM and WEIGHT, and the atomic radii set in the sc_radii.lib file, should be used, so that the results will be comparable with values in the literature.
The program includes a modified version of Michael Connolly's subroutine "mds" for calculating molecular surfaces; the original code can be obtained from his website at http://www.biohedron.com. The version contained in SC is provided here with the consent of Michael Connolly. The modifications include a minor bug fix, and use of the CCP4 library routines for exiting on fatal errors ("CCPERR") and for calculating vector products ("CROSS").
Sc itself cannot be computed satisfactorily within GRASP, as GRASP uses a rather different approach to surface definition. However, qualitative display of the weighted normal products S(A->B) is possible; this is achieved by a simple mapping of this value from the one surface to the other.
There are however some limits to SC's interaction with GRASP. See the NOTES ON GRASP FILES below.
To the best of our knowledge, GRASP is only available for Silicon Graphics machines, and since the surface files it produces contain unformatted data these files are not generally portable to other systems, e.g. Digital Alphas.
SC will make a check on the compatibility of input surface files before trying to read them in. In cases where it detects a problem, the files will not be read in, no merging will be performed, and no output surface files will be generated. In these cases, if GRASP output is required it will be necessary to run SC on another machine which has compatible conventions for reading and writing unformatted data.
There have been some reports of bugs in GRASP 1.3.6 which have caused problems with the GRASP output from SC. Please let us know if you experience problems which might be due to such bugs.
It will be necessary to edit the radii file used by the program if your input file contains atoms which are not already in the file. It is not recommended that you change the values of radii already in the file, as this will compromise comparison of your calculated Sc values with values reported in the literature.
Each entry in the file is a single line with three fields separated by spaces, of the format:
Residue_name Atom_name Radius
Either of the name fields can contain one or more wildcards (i.e. the asterisk character '*') to match to multiple residues or atoms, e.g. O* will match to O1, O2 etc. Unidentified residue/atom combinations will cause the program to stop.
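The wildcard lookup described above can be sketched in a few lines. This is not the parser used by SC: the function names are hypothetical, the radii in the sample text are made-up values rather than entries from sc_radii.lib, and the rule that the first matching entry wins is an assumption. Note also that Python's fnmatchcase gives special meaning to '?' and '[', which the radii file format may not.

```python
from fnmatch import fnmatchcase

def parse_radii(text):
    """Read 'Residue_name Atom_name Radius' lines into a list of tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append((fields[0], fields[1], float(fields[2])))
    return entries

def lookup_radius(entries, residue, atom):
    """Return the radius of the first entry matching residue and atom."""
    for res_pattern, atom_pattern, radius in entries:
        if fnmatchcase(residue, res_pattern) and fnmatchcase(atom, atom_pattern):
            return radius
    # Mirrors the documented behaviour: unidentified combinations are fatal.
    raise KeyError(f"no radius for residue {residue!r}, atom {atom!r}")

# Made-up entries for illustration only.
sample = """\
ALA CA 1.90
*   O* 1.40
"""
entries = parse_radii(sample)
print(lookup_radius(entries, "GLY", "O1"))
# -> 1.4
print(lookup_radius(entries, "ALA", "CA"))
# -> 1.9
```

A combination such as residue GLY with atom ZN matches no entry in the sample and raises KeyError, which corresponds to the case where a line must be added to the radii file.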
The default radii file is sc_radii.lib in $CLIBD; to use a modified radii file in a different directory, assign the filename and path via the SCRADII logical name.
It is essential to remove ALL multiple conformations from the input PDB file (XYZIN). If multiple conformations are present in the file then the program may terminate with the message ERROR IN CHAIN CARD (from the CCP4 libraries), in which case it is recommended that you check that there are no remaining multiple conformations.
There also appear to be problems with H atoms in XYZIN. The program may stop with error message "SC: imaginary contain". Stripping H atoms from XYZIN seems to cure it. It is not known how general this problem is, nor why it occurs.
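One way to prepare XYZIN is to filter the coordinate records before running SC. The sketch below is an assumption-laden illustration, not a CCP4 tool: it relies on the standard fixed-column PDB layout (alternate location indicator in column 17, element symbol in columns 77-78), assumes the conformer to keep is labelled "A", and falls back on a simple atom-name heuristic when the element field is blank, which can misclassify heavy atoms whose names begin with H.

```python
def clean_pdb_lines(lines):
    """Drop alternate conformers other than 'A' and all hydrogen atoms."""
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            altloc = line[16:17]
            if altloc not in (" ", "A"):
                continue  # second and later conformations
            element = line[76:78].strip().upper()
            name = line[12:16].strip().upper()
            # Heuristic fallback when the element columns are empty.
            name_looks_h = name.lstrip("0123456789").startswith("H")
            if element in ("H", "D") or (not element and name_looks_h):
                continue  # hydrogen or deuterium
            # Blank the alternate location field of the conformer we keep.
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept

# Usage with hypothetical file names:
#   with open("input.pdb") as src:
#       cleaned = clean_pdb_lines(src.readlines())
#   with open("input_clean.pdb", "w") as dst:
#       dst.writelines(cleaned)
```

Occupancies are left untouched, so a kept "A" conformer will still carry its partial occupancy. Records other than ATOM and HETATM pass through unchanged.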
If these problems persist, please report them to CCP4.
Version 2.0
Copyright Michael Lawrence,
Biomolecular Research Institute,
343 Royal Parade Parkville Victoria Australia