POLYPOSE (CCP4: Deprecated Program)
NAME
polypose
- a program for superimposing many multi-domain structures
SYNOPSIS
polypose xyzin1 xyzin2 ... outdir xyzout1 xyzout2 ...
foo.dat
[Keyworded input]
DESCRIPTION
This program originating from R. Diamond is a program to superimpose several
multi-domained structures. This is done in an optimal way by minimising the
residuals between (n*(n-1)/2) pairwise comparisons. One of the molecules can
be fixed and the others rotated to that orientation. Alternatively, the
structures can be oriented so their longest dimensions are along X. Each
domain is fitted to the equivalent domains in the other structures.
EACH DOMAIN MUST CONTAIN THE SAME NUMBER OF ATOMS.
The program expects coordinates in Brookhaven format and will only read the
`ATOM' cards. Care must be taken to ensure that each file has the same axis
definitions (NCODE number).
Transformed coordinates can be written out with optionally an average
structure. The output files will also be in PDB format with the same axis
system as the input files. There are coordinates for each domain fitted, for
each structure. The coordinates used for fitting the structure are given as
REMARKs followed by the whole structure.
All the equations referred to in this documentation come from
reference [1].
KEYWORDED INPUT
The program has been parameterised and the current defaults are: maximum
number of molecules = 10, maximum number of sub-domains/domains = 60, maximum
number of atoms in molecule = 1200, maximum number of atoms in a domain = 2000.
The permitted data control statements in the form of keywords are listed below:
CHECK, COMBINE,
END, FIX,
INCLUDE, INDEPENDENT,
INPUT, MAXCYCLE,
OUTPUT, TERMINATE
MAXCYCLE <num>
This determines the maximum number of cycles to achieve the best fit between
the structures. Note that it can be set to 1 if there are just two structures
or the structures are of the same shape. (default=10)
INDEPENDENT
If present this causes the residual (equation R0 #40 and #41) to be calculated
from N(N-1)/2 distinct and independent orientations. If the card is not
present then R0 is not calculated.
INPUT [ CA | ALL ]
The program will either work with all the atoms or just the C-alpha atoms
(default) in a residue. The number of atoms in a domain MUST be equal between
molecules.
INCLUDE <res1> TO <res2> FILE <num>
These define sub-domains within the <num>-th molecule. The ordering of
the cards relates to the sub-domain number. Each molecule must contain
the same number of sub-domains. If absent all the residues
in the molecule are included.
e.g. INCL 1 to 10 file 4
INCL A1 to B10 file 1
COMBINE <num1> <num2> ... <numN>
This combines sub-domains into a domain (applies to all molecules).
Sub-domains can be included several
times which has the effect of weighting the atoms in that sub-domain. If there
is no COMBINE card then each sub-domain is treated as a domain. Sub-domains
not mentioned in the COMBINE cards are treated as domains. If the number of
atoms exceeds the maximum permitted in a domain then an error will be given.
Note that the atoms in a domain are paired off in order between molecules.
e.g. if there are 6 sub-domains
COMB 1 2 3
COMB 2 4
COMB 2 6
gives
1st domain is 1 2 3
2nd domain is 2 4
3rd domain is 2 6
4th domain is 5
and
COMB 1 2 2 3
gives
1st domain 1 2 3 atoms of the second sub-domain have double weight
2nd domain 4
3rd domain 5
4th domain 6
TERMINATE <crit>
This defines when the refinement will stop before reaching MAXCYCLE. The
criterion is either when:
SUM [ SIN{DP/2}**2 ] < <crit>
DP = angular shift in orientation.
or
the reduction in rms in the cycle < <crit> * rms
Default <crit>=0.00001.
OUTPUT [ MATRIX | COORDS | AVERAGE ]
Defines what output will be given from the program. MATRIX will just output
the orientation matrices. COORDS will give the transformed coordinates (.rot),
the atoms used to fit the structure will be given as REMARKS in the PDB file
followed by the whole structure. The calculations will be done per domain per
molecule. AVERAGE will produce an average structure (.ave) based on the
orientation calculated for each domain. Each option is progressive from MATRIX
to AVERAGE. Thus OUTPUT AVER will effectively include the other options.
CHECK
If the keyword is present then the program will check residue and atom name.
The program will terminate if inconsistencies are found between molecules.
FIX <num>
The orientation of the <num>-th molecule will be fixed and the other
structures will be fitted to this molecule. If this card is not
present then the longest axes of the molecules will be aligned along
x.
END
Terminate input.
INPUT AND OUTPUT FILES
The input PDB files are assigned to logical names XYZIN1 XYZIN2 etc. The
input keywords must use the same numbering, e.g. domains specified for 'file 1'
should refer to XYZIN1, FIX 2 should refer to the molecule in XYZIN2, and so
on. However, if the numbering used is not consecutive, the program will
renumber the files, e.g. if XYZIN1, XYZIN2 and XYZIN4 are specified then
the latter becomes structure 3.
The output files are:
- a)
-
optionally transformed coordinates <name>d<nn>.rot, where <nn> is the domain
number. The root <name> can be assigned through the logical name XYZOUT1,
XYZOUT2 etc. If the logical names are undefined then the root name of the
input file is taken. If the logical OUTDIR is defined all coordinate files
will be sent to this directory.
- b)
-
optionally average coordinates <name>d<nn>.ave, where <name> is taken
from the fixed coordinates or the first file (defined in a similar way to the
transformed coordinates) and <nn> is the domain number.
EXAMPLES
polypose XYZIN1 dtk1.pdb XYZIN2 dtk2.pdb OUTDIR /scr1/acr/ XYZOUT1 jnk1 << +
maxcycle 3
input all
indep
include 1 to 10 file 1
include 1 to 10 file 2
include 11 to 20 file 1
include 11 to 20 file 2
combine 1 2
output average
check
fix 1
+
The output files would be:
/scr1/acr/jnk1d01.rot
/scr1/acr/dtk2d01.rot
/scr1/acr/jnk1d01.ave
PRINTER OUTPUT
Rms differences are shown between all structures pairwise. Also, the summation
of these differences is shown (R1 equation #42). R1 can be compared with R0
(equation #40 and #41) which represents the sum of the rms residuals over the
best possible fit between two molecules calculated from all possible pairs. R0
is the lowest rms achievable but can only be recognised in practice when all
the structures are of the same shape or there are only two structures.
AUTHOR
R. Diamond
MRC Laboratory for Molecular Biology
Hills Road
Cambridge CB2 2QH
England
REFERENCES
-
R. Diamond, Protein Science, 1, 1279-1287 (1992)