PDBCUR (CCP4: Supported Program)
NAME
pdbcur - a curation tool providing various analyses and manipulations of PDB files
SYNOPSIS
pdbcur xyzin foo_in.pdb xyzout foo_out.pdb
[Key-worded input file]
DESCRIPTION
pdbcur provides various functions for analysing and manipulating
the contents of PDB files. The program is written using the new MMDB
library for coordinate data, and thus works with a hierarchical view
of the atomic model. This hierarchy is visible, for example, in the
atom selection syntax.
The program processes the keywords one at a time, in the order given.
Therefore, the behaviour of a keyword may depend on which keywords have
already been applied and how they have modified the model. In complex
cases, it may be better to perform several runs of pdbcur.
INPUT AND OUTPUT FILES
XYZIN
Input coordinate file.
XYZOUT
Output coordinate file.
KEYWORDED INPUT
Summarise the contents of the coordinate file. For each chain in each model,
the following information is listed:
- The total number of residues
- The set of residue ranges
- The number of residues which have alternative conformations
- The amino acid composition
Delete all solvent residues from all chains and all
models. The list of residues identified as "solvent" is
kept in $CCP4/lib/src/mmdb/mmdb_tables.cpp as array
StdSolventName, and is currently "ADE", "CYT", "GUA",
"INO", "THY", "URA", "WAT", "HOH", "TIP", "H2O", "DOD",
"MOH".
If a chain contains both protein and solvent, only the solvent is
removed. A chain that becomes empty after removal of solvent is itself
removed.
Delete all hydrogens in the model. This is a simple wrapper
around delatom.
Keep only the most probable alternate conformation, i.e.
that with the highest occupancy, irrespective of ordering
and altID label. Other conformations are deleted from the
model. For the kept atoms, the occupancy is set to 1.0
and the altID removed. Note that the resetting of occupancies
renders a subsequent 'cutocc' keyword ineffective.
If there are two conformations with occupancy of 0.5, then
the conformation with alternate ID <altID> is kept.
If this parameter is not given, it defaults to "A". Note that
this only applies to equal dual conformers - for unequal
conformers, that with the highest occupancy is kept, as explained
above.
This option is useful for switching back and forth between single
and multiple conformers during refinement to check the validity of
the model, and for generating single models for simulation or
modelling.
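The selection rule described above can be sketched in Python as follows. This is an illustration only, not pdbcur's code: the Atom record and the function most_probable are hypothetical, and the sketch assumes that the choice is made independently for each atom name within a single residue, and that atoms without an altID are left unchanged.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    altloc: str  # "" means no alternate location indicator
    occ: float


def most_probable(atoms: List[Atom], prefer: str = "A") -> List[Atom]:
    """Keep one conformer per atom name within one residue (hypothetical sketch)."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[atom.name].append(atom)
    kept = []
    for confs in groups.values():
        if len(confs) == 1 and confs[0].altloc == "":
            kept.append(confs[0])  # no alternates: atom is left as it is
            continue
        # Highest occupancy wins; on an exact tie, the preferred altID wins.
        best = max(confs, key=lambda a: (a.occ, a.altloc == prefer))
        # The kept conformer loses its altID and gets full occupancy.
        kept.append(best._replace(altloc="", occ=1.0))
    return kept
```

Because every kept conformer ends up with occupancy 1.0, a later cutocc step has nothing left to remove, which is the interaction noted above.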
Delete all atoms with an occupancy less than or equal
to {cutoff}. The default cutoff is 0.0, i.e. specifying
cutocc with no arguments removes all atoms with zero
occupancy.
Removes all the ANISOU records from the coordinate file.
Renames the selected chain(s).
Example: renchain /*/A 'B'
Quotation marks are optional; they are useful for designating 'no chain ID'.
Examples:
- rename A to 'no chain ID': renchain A ''
- rename 'no chain ID' to B: renchain /*// B
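Renames the selected residue(s).
Example: renresidue (ALA) 'AL1'
Renames the selected atom(s).
Example: renatom CA[C] ' CC '
Changes the chemical element name of the selected atom(s).
Example: renelement CA[C] 'AL'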
Deletes the specified model(s).
Example (delete model #1): delmodel /1
Example (delete all models with chain A): delmodel /*/A
Deletes the specified chain(s).
Example (delete chain A in all models): delchain A
Example (delete chain A in 1st model): delchain /1/A
Same as 'delchain', but also deletes the enclosing model(s) if they become
empty as a result of the operation.
Example (remove chain A in all models): rmchain A
Example (remove chain A in 1st model): rmchain /1/A
Deletes the specified residue(s).
Example (delete residues 33 to 120): delresidue 33-120
Same as 'delresidue', but also deletes the enclosing chain(s) and model(s)
if they become empty as a result of the operation.
Example (remove residues 33 to 120): rmresidue 33-120
Deletes the specified atom(s).
Example (delete all C-gamma atoms): delatom CG[C]
Same as 'delatom', but also deletes the enclosing residue(s), chain(s)
and model(s) if they become empty as a result of the operation.
Example (remove all C-gamma atoms): rmatom CG[C]
Deletes all atoms within R angstroms of the point (x,y,z).
Example: deldist 32.1 45.6 -0.4 10.0
Keeps the specified model(s); everything else is deleted.
Example (leave only model #1): lvmodel /1
Example (leave all models with chain A): lvmodel /*/A
Keeps the specified chain(s); everything else is deleted.
Example (leave chain A in all models): lvchain A
Example (leave only chain A in 1st model): lvchain /1/A
Keeps the specified residue(s); everything else is deleted.
Example (leave residues 33.A to 120.B): lvresidue 33.A-120.B
Keeps the specified atom(s); everything else is deleted.
Example (leave only C-alpha atoms): lvatom "CA[C]:*"
Note the use of * to get all alternative conformations, since the
default for atom-level selection is a blank alt loc indicator
only, see atom selection syntax.
Keeps all atoms within R angstroms of the point (x,y,z); everything else is deleted.
Example: lvdist 32.1 45.6 -0.4 10.0
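A minimal sketch of the distance test shared by deldist and lvdist, assuming atoms are given as hypothetical (label, (x, y, z)) pairs in orthogonal angstroms and that an atom exactly R away counts as "within" the sphere (the boundary convention pdbcur uses is not stated here).

```python
import math


def split_by_distance(atoms, x, y, z, r):
    """Split (label, (x, y, z)) atoms into those within r of (x, y, z) and the rest."""
    center = (x, y, z)
    near, far = [], []
    for atom in atoms:
        _label, pos = atom
        # Assumed inclusive boundary: distance == r counts as "within".
        (near if math.dist(pos, center) <= r else far).append(atom)
    return near, far


def deldist(atoms, x, y, z, r):
    return split_by_distance(atoms, x, y, z, r)[1]  # drop the near atoms


def lvdist(atoms, x, y, z, r):
    return split_by_distance(atoms, x, y, z, r)[0]  # keep only the near atoms
```

The two keywords are complementary: for the same arguments, every atom ends up in exactly one of the two results.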
Writes 'xyzout' as a PDB, mmCIF or MMDB BINary
file. By default, the file is written in the format
of the input file.
No parameters; this keyword generates PDB 'TER' cards.
No parameters; this keyword deletes all PDB 'TER' cards.
No parameters; this keyword generates correct atom
serial numbers.
No parameters; moves solvent chains to the end of models.
Specifies the space group symmetry name, e.g. 'P 21 21 21'
(without quotation marks; spaces _are_ significant, and the name is case-sensitive).
This keyword is mandatory if the coordinate file does not
specify the space group symmetry.
Specifies the unit cell dimensions (space-separated
real numbers). This keyword is mandatory if the coordinate
file does not specify the cell parameters.
Generates a unit cell as defined by the crystallographic
information given in the coordinate file or set up with the
keywords 'symmetry' and 'geometry'. Chains generated
by the identity operation retain their names; all others
are renamed as c_n, where c is the chain's original
name and n is the number of the symmetry operation in
the space group used (counting from 0, which is the identity
operation). To comply with PDB standards, the chains
should then be renamed using the renchain
command, e.g. renchain A_2 H. Alternatively, the chains may be
assigned automatically generated 1-character names
using the command mkchainIDs.
Example: rnase.pdb contains 2 chains A and B.
Generate a unit cell, space group P 21 21 21, 4
symmetry operations, and assign chain IDs C,D,E for
chain A transformed by operations #1,2,3, and IDs
F,G,H for chain B transformed by the same operations.
Chains A and B transformed by 0th operation (identity)
retain their IDs:
pdbcur xyzin rnase.pdb xyzout ucell.pdb <<eof
symm P 21 21 21
genu
renc A_1 C
renc A_2 D
renc A_3 E
renc B_1 F
renc B_2 G
renc B_3 H
eof
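What genunit computes for this example can be sketched as below: each of the four general positions of P 21 21 21 is applied to the fractional coordinates of every chain, and the copies are named c_n. The operator list is the standard one for space group No. 19, listed in the order assumed to correspond to operations #0 to #3; the helper names are hypothetical, and pdbcur may additionally shift copies into the unit cell, which this sketch does not do.

```python
from fractions import Fraction as F

# General positions of P 21 21 21 (No. 19) as (rotation rows, translation)
# acting on fractional coordinates; operation 0 is the identity.
P212121 = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (F(0), F(0), F(0))),          # x, y, z
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (F(1, 2), F(0), F(1, 2))),  # -x+1/2, -y, z+1/2
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (F(0), F(1, 2), F(1, 2))),  # -x, y+1/2, -z+1/2
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (F(1, 2), F(1, 2), F(0))),  # x+1/2, -y+1/2, -z
]


def apply_op(op, frac):
    """Apply one symmetry operation to a fractional coordinate triple."""
    rot, trans = op
    return tuple(sum(rot[i][k] * frac[k] for k in range(3)) + trans[i] for i in range(3))


def unit_cell_copies(chains, ops):
    """Map each generated chain name to its coordinates; identity keeps the name."""
    out = {}
    for chain_id, frac_xyz in chains.items():
        for n, op in enumerate(ops):
            name = chain_id if n == 0 else f"{chain_id}_{n}"
            out[name] = [apply_op(op, p) for p in frac_xyz]
    return out
```

Exact fractions are used so that the half-cell translations stay exact; real coordinates would be converted from orthogonal to fractional first.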
Declares (but does not apply) a symmetry operation.
The expressions for the X, Y and Z fractional
coordinates must be written without spaces.
Pairs 'old chain ID' - 'new chain ID' specify how
the chains should be renamed after operation. This
input is not mandatory. If no renaming is specified,
the newly generated chains will be renamed automatically
(see keyword symcommit).
Example: symop Y+1/2,X-1/2,Z A S B R
(declares the symmetry transformation x=Y+1/2, y=X-1/2, z=Z,
renaming chain A to S and chain B to R).
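The operator string and the optional renaming pairs can be parsed as in the following hypothetical Python sketch (parse_symop and parse_symop_args are not part of pdbcur). It turns an expression such as Y+1/2,X-1/2,Z into integer rotation rows and a fractional translation, and rejects components containing spaces.

```python
import re
from fractions import Fraction

_T = r"(?:X|Y|Z|\d+(?:/\d+)?)"                        # one term: axis or number
_COMPONENT = re.compile(rf"[+-]?{_T}(?:[+-]{_T})*")   # signed terms, no spaces
_TERM = re.compile(r"([+-]?)(X|Y|Z|\d+(?:/\d+)?)")


def parse_symop(text):
    """Parse e.g. 'Y+1/2,X-1/2,Z' into (rotation rows, translation)."""
    parts = text.upper().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated components")
    rot, trans = [], []
    for part in parts:
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"cannot parse component {part!r}")
        row, shift = [0, 0, 0], Fraction(0)
        for sign, token in _TERM.findall(part):
            s = -1 if sign == "-" else 1
            if token in ("X", "Y", "Z"):
                row["XYZ".index(token)] += s  # coefficient of that axis
            else:
                shift += s * Fraction(token)  # constant fractional shift
        rot.append(tuple(row))
        trans.append(shift)
    return tuple(rot), tuple(trans)


def parse_symop_args(args):
    """Split the text after 'symop' into the operator and old->new chain pairs."""
    op, *pairs = args.split()
    if len(pairs) % 2:
        raise ValueError("chain renaming must be given as old/new pairs")
    return parse_symop(op), dict(zip(pairs[0::2], pairs[1::2]))
```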
No parameters.
Applies all symmetry operations declared since the
last symcommit statement. The first operation (normally the
identity) is applied to the existing set of
coordinates; all others are applied to
duplicates of the coordinates, and the results
are merged.
The newly generated chains are named as C_n,
where C is the original chain name, and n is the
symmetry operation number. Symmetry operations
are numbered as they appear in symop statements,
from 0 on; however the very first one is applied
to the existing chains, which are not renamed in
this case.
Example:
pdbcur xyzin rnase.pdb xyzout rnase1.pdb <<eof
symop X,Y,Z
symop Y+1/2,X-1/2,Z
symcommit
eof
just adds two chains named A_1 and B_1, obtained
according to the rule Y+1/2,X-1/2,Z from chains
A and B, to the existing file.
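The naming rule for symcommit can be summarised by this hypothetical sketch. It assumes that chain renaming pairs given on a symop line override the default C_n name and that copies are listed operation by operation; the order in which pdbcur actually writes the new chains is not specified here.

```python
def symcommit_chain_names(chains, ops):
    """Chain names after SYMCOMMIT (hypothetical sketch).

    chains: existing chain IDs, in file order.
    ops: declared operations as (symop_text, rename_map) pairs, in declaration order.
    """
    names = []
    for n, (_text, rename) in enumerate(ops):
        for chain in chains:
            # Operation 0 acts on the existing chains, which keep their names.
            default = chain if n == 0 else f"{chain}_{n}"
            names.append(rename.get(chain, default))
    return names
```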
Automatically generates 1-character chain IDs after
applying symmetry operations. The IDs are generated
such that they use all available letters starting
from A, and a chain is not renamed if its name is
already a 1-character one.
The following example
pdbcur xyzin rnase.pdb xyzout ucell.pdb <<eof
symm P 21 21 21
genu
mkch
eof
produces exactly the same result as that given for
keyword GENUNIT, because the original chains are named
sequentially as A,B (not G,I, for example).
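One way to reproduce the mkchainIDs behaviour described above is sketched below. The function make_chain_ids is hypothetical, it draws only on the uppercase letters A to Z (pdbcur may use a larger alphabet), and it assumes chains are visited in the order given.

```python
import string


def make_chain_ids(chain_names):
    """Give each multi-character chain name an unused one-letter ID, starting from A."""
    used = {c for c in chain_names if len(c) == 1}
    free = (ch for ch in string.ascii_uppercase if ch not in used)
    mapping = {}
    for name in chain_names:
        if len(name) == 1:
            continue  # already a 1-character ID: not renamed
        try:
            mapping[name] = next(free)
        except StopIteration:
            raise ValueError("ran out of one-letter chain IDs") from None
    return mapping
```

With original chains A and B, the free letters start at C, which is why the result matches the explicit renchain commands of the GENUNIT example; with chains named G and I, the first generated chain would instead receive A.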
Euler rotation of the selected atoms through angles alpha,
beta and gamma (degrees) applied about the initial
Z-axis, the new Y-axis and the newest Z-axis, respectively.
The rotation center is given by either orthogonal
coordinates x, y and z or by keyword 'center' for
specifying the mass center of the selected atoms.
Examples:
1. 90-degree rotation of chain A about Z-axis in
original coordinate system:
rotate A 90 0 0 0 0 0
2. 60-degree rotation of chains A and B about Y-axis
in the coordinate system of their mass center:
rotate 'A,B' 0 60 0 center
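The rotation can be written as R = Rz(alpha) Ry(beta) Rz(gamma), since successive rotations about the new axes compose by right multiplication. The sketch below applies R about a given point or about the centroid of the selected atoms; it assumes a right-handed, counterclockwise sense of rotation and uses the unweighted centroid in place of the mass center, so treat it as an illustration rather than pdbcur's exact convention.

```python
import math


def _rz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _ry(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rotation matrix for intrinsic Z, new Y, newest Z angles (degrees)."""
    a, b, g = (math.radians(x) for x in (alpha, beta, gamma))
    return _matmul(_matmul(_rz(a), _ry(b)), _rz(g))


def centroid(xyz):
    n = len(xyz)
    return tuple(sum(p[i] for p in xyz) / n for i in range(3))


def rotate_euler(xyz, alpha, beta, gamma, center="center"):
    """Rotate points about `center`: a point, or 'center' for the unweighted centroid."""
    c = centroid(xyz) if center == "center" else center
    r = euler_zyz(alpha, beta, gamma)
    out = []
    for p in xyz:
        d = [p[i] - c[i] for i in range(3)]  # position relative to the center
        out.append(tuple(c[i] + sum(r[i][k] * d[k] for k in range(3)) for i in range(3)))
    return out
```

In this form, example 1 corresponds to rotate_euler(points, 90, 0, 0, (0, 0, 0)) and example 2 to rotate_euler(points, 0, 60, 0, "center").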
Translates the selected atoms by (tx,ty,tz), given
in fractional or orthogonal coordinates according to the
subkeyword 'frac' or 'orth'.
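For the 'frac' case, the translation has to be converted to orthogonal angstroms using the cell. The sketch below assumes the common convention with a along X and b in the XY plane; the helper names and the cell tuple (a, b, c, alpha, beta, gamma) are assumptions for illustration.

```python
import math


def orth_matrix(a, b, c, alpha, beta, gamma):
    """Fractional-to-orthogonal matrix, assuming a along X and b in the XY plane."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    ca, cb, cg = math.cos(al), math.cos(be), math.cos(ga)
    sg = math.sin(ga)
    vol = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, vol / (a * b * sg)],
    ]


def translate(xyz, t, mode, cell=None):
    """Shift orthogonal points by t, given as 'frac' (needs cell) or 'orth'."""
    if mode == "frac":
        m = orth_matrix(*cell)
        t = [sum(m[i][k] * t[k] for k in range(3)) for i in range(3)]
    elif mode != "orth":
        raise ValueError("mode must be 'frac' or 'orth'")
    return [tuple(p[i] + t[i] for i in range(3)) for p in xyz]
```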
Rotation of selected atoms through angle alpha (degrees)
about a vector given by direction (vx,vy,vz) from the
rotation center (given as x,y,z or by keyword 'center'
for the mass center of the selected atoms). The vector
may also be specified by two atoms atom1 and atom2
represented in the mmdb selection notation.
Examples:
1. 90-degree rotation of chain A about Z-axis in
original coordinate system:
vrotate A 90 0 0 1 0 0 0
2. 60-degree rotation of chains A and B about Y-axis
in the coordinate system of their mass center:
vrotate 'A,B' 60 0 1 0 center
3. 45-degree rotation of all atoms about the vector connecting
the C-alpha atoms of residue 20.A of chain A and residue 55
of chain B:
vrotate /*/*/*/* 45 /1/A/20.A/CA[C] /1/B/55/CA[C]
or, if there is only one model in the PDB file:
vrotate * 45 A/20.A/CA[C] B/55/CA[C]
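Rotation about an arbitrary vector can be expressed with Rodrigues' rotation formula. The sketch below (hypothetical helper names, right-hand-rule sense of rotation assumed) covers both forms: an explicit direction plus center, and an axis defined by two atom positions. In the two-atom case the first atom is used as the center, which gives the same result as any other point on that axis.

```python
import math


def rotate_about_axis(xyz, angle, axis, center):
    """Rotate points by `angle` degrees about the line through `center` along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    k = [a / norm for a in axis]  # unit direction of the axis
    t = math.radians(angle)
    c, s = math.cos(t), math.sin(t)
    out = []
    for p in xyz:
        v = [p[i] - center[i] for i in range(3)]
        kxv = [k[1] * v[2] - k[2] * v[1],
               k[2] * v[0] - k[0] * v[2],
               k[0] * v[1] - k[1] * v[0]]
        kdv = sum(k[i] * v[i] for i in range(3))
        # Rodrigues: v cos t + (k x v) sin t + k (k . v)(1 - cos t)
        out.append(tuple(center[i] + v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c)
                         for i in range(3)))
    return out


def rotate_about_atoms(xyz, angle, atom1, atom2):
    """Axis runs from atom1 to atom2; atom1 serves as the rotation center."""
    axis = [atom2[i] - atom1[i] for i in range(3)]
    return rotate_about_axis(xyz, angle, axis, atom1)
```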
Specification of the selection sets:
- either
- /mdl/chn/s1.i1-s2.i2/at[el]:aloc
- or
- /mdl/chn/*(res).ic/at[el]:aloc
where no spaces are allowed. The slashes separate the
hierarchical levels of models, chains, residues and atoms.
Notations:
mdl - the model's serial number or 0 or '*' for any model
(default).
chn - the chain ID or list of chain IDs like 'A,B,C' or
'*' for any chain (default).
s1,s2 - the starting and ending residue sequence numbers
or '*' for any sequence number (default).
i1,i2 - the residues' insertion codes or '*' for any
insertion code. If a sequence number other than
'*' is specified, then the insertion code defaults to ""
(no insertion code); otherwise the default is '*'.
res - residue name or list of residue names like 'ALA,SER'
or '*' for any residue name (default)
at - atom name or list of atom names like 'CA,N1,O' or
'*' for any atom name (default)
el - chemical element name or list of chemical element
names like 'C,N,O', or '*' for any chemical element
name (default)
aloc - the alternative location indicator or list of
alternative locations like 'A,B,C', or '*' for any
alternate location. If the atom name and chemical
element name are specified (both may be '*'), then
the alternative location indicator defaults to ""
(no alternate location); otherwise the default is
'*'.
Values for chain IDs, residue names, atom names, chemical element
names and alternative location indicators may be negated by
prefix '!'. For example, '!A,B,C' for the list of chain names
means 'any chain ID but A,B,C'.
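The list-and-negation rule for a single field can be sketched as follows. The function match_field is hypothetical; it assumes that '!' negates the whole list, as in the example above, and that an empty item stands for a blank value (as in /*/,A,B/).

```python
def match_field(value, spec):
    """Match one selection field, e.g. spec '*', 'A,B,C', '!A,B,C' or ',A,B'."""
    if spec == "*":
        return True  # wildcard: any value
    negate = spec.startswith("!")
    items = (spec[1:] if negate else spec).split(",")
    hit = value in items  # "" in items matches a blank value
    return hit != negate  # flip the result for a negated list
```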
Generally, any hierarchical element as well as the selection
code may be omitted, in which case it is replaced by the
default (see above). This makes the following examples valid:
  *                    select all atoms
  /1                   select all atoms in model 1
  A,B                  select all atoms in chains A and B in all models
  /*/1,2               select all atoms in chains 1 and 2 in all models.
                       Note that you must use this format with numerical
                       chain identifiers
  /1//                 select all atoms in the chain without a chain ID
                       in model 1
  /*/,A,B/             select all atoms in the chain without a chain ID
                       and in chains A and B, in all models
  33-120               select all atoms in residues 33. to 120. in all
                       chains and models
  A/33.A-120.B         select all atoms in residues 33.A to 120.B in
                       chain A only, in all models
  A/33.-120.A/[C]      select all carbons in residues 33. to 120.A in
                       chain A, in all models
  CA[C]                select all C-alphas in all models/chains/residues
  A//[C]               select all carbons in chain A, in all models
  (!ALA,SER)           select all atoms in any residues except ALA and
                       SER, in all models/chains
  /1/A/(GLU)/CA[C]     select all C-alphas in GLU residues of chain A,
                       model 1
  /1/A/*(GLU)./CA[C]:  same as above
  [C]:,A               select all carbons without an alternative location
                       indicator and carbons in alternate location A
NOTE: if a selection contains comma(s), the selection must be enclosed
in quotation marks, which indicate to the input parser that the
selection is a single input parameter rather than a set of
comma-separated arguments.
PROGRAM OUTPUT
The program currently gives a short summary of the operations carried
out.
EXAMPLES
Runnable example
pdbcur.exam
SEE ALSO
ncont - MMDB application for finding contacts.
pdbset - traditional PDB utility program.
AUTHORS
Eugene Krissinel, European Bioinformatics Institute, Cambridge, UK.
Martyn Winn, Daresbury Laboratory, UK - some additional keywords.