PDBCUR (CCP4: Supported Program)

NAME

pdbcur - a curation tool providing various analyses and manipulations of PDB files

SYNOPSIS

pdbcur xyzin foo_in.pdb xyzout foo_out.pdb
[Key-worded input file]

DESCRIPTION

pdbcur provides various functions for analysing and manipulating the contents of PDB files. The program is written using the MMDB library for coordinate data, and thus works with a hierarchical view of the atomic model (models, chains, residues and atoms). This hierarchy is visible, for example, in the atom selection syntax.

The program processes the keywords one at a time, in the order given. The behaviour of a keyword may therefore depend on which keywords have already been applied and how they have changed the model. In complex cases, it may be better to perform several runs of pdbcur.

INPUT AND OUTPUT FILES

XYZIN

Input coordinate file.

XYZOUT

Output coordinate file.

KEYWORDED INPUT

summarise

Summarise the contents of the coordinate file. For each chain in each model, the following information is listed:

The total number of residues
The set of residue ranges
The number of residues which have alternative conformations
The amino acid composition
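As a rough illustration of the "residue ranges" part of the summary, the sketch below collapses residue sequence numbers into contiguous runs. It is not pdbcur code: the function name `residue_ranges` is hypothetical, and the sketch assumes that a new range starts wherever sequence numbers are not consecutive, ignoring insertion codes.

```python
def residue_ranges(seqnums):
    """Collapse residue sequence numbers into (start, end) runs.

    Hypothetical illustration: a new run starts whenever the next number
    is not exactly one greater than the previous one. Insertion codes
    are ignored in this sketch.
    """
    ranges = []
    for n in sorted(set(seqnums)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n          # extend the current run
        else:
            ranges.append([n, n])      # start a new run
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    # A chain with a gap between residues 5 and 10
    print(residue_ranges([1, 2, 3, 4, 5, 10, 11, 12]))  # [(1, 5), (10, 12)]
```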

delsolvent

Delete all solvent residues from all chains and all models. The list of residues identified as "solvent" is kept in $CCP4/lib/src/mmdb/mmdb_tables.cpp as array StdSolventName, and is currently "ADE", "CYT", "GUA", "INO", "THY", "URA", "WAT", "HOH", "TIP", "H2O", "DOD", "MOH".

If a chain contains both protein and solvent, only the solvent is removed. A chain that becomes empty after removal of the solvent is itself removed.
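The two rules above (drop solvent residues, then drop chains left empty) can be sketched as follows. The solvent names are those listed above for StdSolventName; the data model (a dict mapping each chain ID of a single model to its residue names) and the function name `delete_solvent` are hypothetical simplifications, not MMDB structures.

```python
# Solvent residue names as listed above for StdSolventName
SOLVENT_NAMES = {"ADE", "CYT", "GUA", "INO", "THY", "URA",
                 "WAT", "HOH", "TIP", "H2O", "DOD", "MOH"}


def delete_solvent(chains):
    """Remove solvent residues, then drop chains that are left empty.

    `chains` is a hypothetical dict mapping a chain ID to a list of
    residue names (one model only).
    """
    result = {}
    for chain_id, residues in chains.items():
        kept = [name for name in residues if name not in SOLVENT_NAMES]
        if kept:                       # chains left empty are removed
            result[chain_id] = kept
    return result


if __name__ == "__main__":
    model = {"A": ["MET", "ALA", "HOH", "HOH"], "W": ["HOH", "WAT"]}
    print(delete_solvent(model))  # {'A': ['MET', 'ALA']}
```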

delhydrogen

Delete all hydrogens in the model. This is a simple wrapper around delatom.

mostprob [ <altID> ]

Keep only the most probable alternate conformation, i.e. that with the highest occupancy, irrespective of ordering and altID label. Other conformations are deleted from the model. For the kept atoms, the occupancy is set to 1.0 and the altID removed. Note that the resetting of occupancies renders a subsequent 'cutocc' keyword ineffective.

If two conformations both have an occupancy of 0.5, the conformation with alternate ID <altID> is kept. If this parameter is not given, it defaults to "A". Note that this tie-break applies only to two equally occupied conformers; for unequal conformers, the one with the highest occupancy is kept, as explained above.

This option is useful for switching back and forth between single and multiple conformers during refinement to check the validity of the model, and for generating single models for simulation or modelling.
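The selection rule of mostprob can be sketched for a single atom as below. This is a hypothetical illustration, not pdbcur's implementation: the name `most_probable` and the (alt ID, occupancy) representation are assumptions, the sketch works per atom rather than per residue, and the fallback when the preferred ID is not among tied conformers is an assumption, since the text above does not cover that case.

```python
def most_probable(conformers, preferred_alt="A"):
    """Return the alt ID of the conformer that mostprob-style logic keeps.

    `conformers` is a hypothetical list of (alt_id, occupancy) pairs for
    the alternative positions of one atom. The highest occupancy wins; on
    an exact tie the preferred alt ID (default "A") is kept. The kept atom
    would then get occupancy 1.0 and a blank alt ID; the others are deleted.
    """
    best_occ = max(occ for _, occ in conformers)
    tied = [alt for alt, occ in conformers if occ == best_occ]
    if preferred_alt in tied:
        return preferred_alt
    # Assumption: if the preferred ID is not among the tied ones, take the first
    return tied[0]


if __name__ == "__main__":
    print(most_probable([("A", 0.3), ("B", 0.7)]))       # B
    print(most_probable([("A", 0.5), ("B", 0.5)]))       # A
    print(most_probable([("A", 0.5), ("B", 0.5)], "B"))  # B
```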

cutocc {cutoff}

Delete all atoms with an occupancy less than or equal to {cutoff}. The default cutoff is 0.0, i.e. specifying cutocc with no arguments removes all atoms with zero occupancy.
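Because the comparison is "less than or equal to", the default cutoff of 0.0 removes exactly the zero-occupancy atoms. A minimal sketch of that filter, using a hypothetical list of (atom name, occupancy) pairs and a hypothetical function name `cut_occupancy`:

```python
def cut_occupancy(atoms, cutoff=0.0):
    """Keep only atoms whose occupancy is strictly greater than cutoff.

    `atoms` is a hypothetical list of (name, occupancy) pairs. Atoms with
    occupancy <= cutoff are dropped, so the default 0.0 removes exactly
    the zero-occupancy atoms.
    """
    return [(name, occ) for name, occ in atoms if occ > cutoff]


if __name__ == "__main__":
    atoms = [("N", 1.0), ("CA", 0.0), ("CB", 0.3)]
    print(cut_occupancy(atoms))        # [('N', 1.0), ('CB', 0.3)]
    print(cut_occupancy(atoms, 0.3))   # [('N', 1.0)]
```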

noanisou

Removes all the ANISOU records from the coordinate file.

renchain {selection of chain(s)} 'new chain ID'

Example: renchain /*/A 'B'
Quotation marks are optional; they are useful for designating 'no chain ID' (an empty string).
Examples:

rename A to 'no chain ID': renchain A ''
rename 'no chain ID' to B: renchain /*// B

renresidue {selection of residue(s)} 'new residue name'

Example: renresidue (ALA) 'AL1'

renatom {selection of atom(s)} 'new 4-letter atom name'

Example: renatom CA[C] ' CC '

renelement {selection of atom(s)} 'new element name'

Example: renelement CA[C] 'AL'

delmodel {selection of model(s)}

Deletes the specified model(s).
Example (delete model #1): delmodel /1
Example (delete all models with chain A): delmodel /*/A

delchain {selection of chain(s)}

Deletes the specified chain(s).
Example (delete chain A in all models): delchain A
Example (delete chain A in 1st model): delchain /1/A

rmchain {selection of chain(s)}

Same as 'delchain', but also deletes the enclosing model(s) if they become empty as a result of the operation.
Example (remove chain A in all models): rmchain A
Example (remove chain A in 1st model): rmchain /1/A

delresidue {selection of residue(s)}

Deletes the specified residue(s).
Example (delete residues 33 to 120): delresidue 33-120

rmresidue {selection of residue(s)}

Same as 'delresidue', but also deletes the enclosing chain(s) and model(s) if they become empty as a result of the operation.
Example (remove residues 33 to 120): rmresidue 33-120

delatom {selection of atom(s)}

Deletes the specified atom(s).
Example (delete all C-gamma atoms): delatom CG[C]

rmatom {selection of atom(s)}

Same as 'delatom', but also deletes the enclosing residue(s), chain(s) and model(s) if they become empty as a result of the operation.
Example (remove all C-gamma atoms): rmatom CG[C]

deldist <x> <y> <z> <R>

Deletes all atoms within R angstroms of the point (x,y,z).
Example: deldist 32.1 45.6 -0.4 10.0

lvmodel {selection of model(s)}

Leaves the specified model(s); everything else is deleted.
Example (leave only model #1): lvmodel /1
Example (leave all models with chain A): lvmodel /*/A

lvchain {selection of chain(s)}

Leaves the specified chain(s); everything else is deleted.
Example (leave chain A in all models): lvchain A
Example (leave only chain A in 1st model): lvchain /1/A

lvresidue {selection of residue(s)}

Leaves the specified residue(s); everything else is deleted.
Example (leave residues 33.A to 120.B): lvresidue 33.A-120.B

lvatom {selection of atom(s)}

Leaves the specified atom(s); everything else is deleted.
Example (leave only C-alpha atoms): lvatom "CA[C]:*"
Note the use of * to select all alternative conformations: when an atom name is given, the alternative location indicator defaults to blank only (see ATOM SELECTION SYNTAX).

lvdist <x> <y> <z> <R>

Leaves all atoms within R angstroms of the point (x,y,z); everything else is deleted.
Example: lvdist 32.1 45.6 -0.4 10.0
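deldist and lvdist are complementary: both split the atoms by their distance from a point, and keep opposite halves. The sketch below shows that partition. The function name and the (name, (x, y, z)) representation are hypothetical, coordinates are assumed to be orthogonal in angstroms, and treating a distance exactly equal to R as "within" is an assumption the text above does not settle.

```python
import math


def split_by_distance(atoms, centre, radius):
    """Partition atoms by distance from a point.

    `atoms` is a hypothetical list of (name, (x, y, z)) pairs in
    orthogonal coordinates (angstroms). Returns (inside, outside):
    deldist would keep `outside`, lvdist would keep `inside`.
    Assumption: a distance exactly equal to radius counts as inside.
    """
    inside, outside = [], []
    for name, xyz in atoms:
        if math.dist(xyz, centre) <= radius:
            inside.append(name)
        else:
            outside.append(name)
    return inside, outside


if __name__ == "__main__":
    atoms = [("a1", (32.1, 45.6, -0.4)),
             ("a2", (32.1, 45.6, 15.0)),
             ("a3", (40.0, 45.6, -0.4))]
    print(split_by_distance(atoms, (32.1, 45.6, -0.4), 10.0))
    # (['a1', 'a3'], ['a2'])
```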

write {PDB|CIF|BIN}

Writes 'xyzout' as a PDB, mmCIF or MMDB binary file. By default, the file is written in the format of the input file.

genter

No parameters; this keyword generates PDB 'TER' cards.

delter

No parameters; this keyword deletes all PDB 'TER' cards.

sernum

No parameters; this keyword generates correct atom serial numbers.

mvsolvent

No parameters; moves solvent chains to the end of models.

symmetry <spgname>

Input of the space group symmetry name, e.g. 'P 21 21 21' (without quotation marks; spaces _are_ significant, and the name is case-sensitive). This parameter is mandatory if the coordinate file does not specify the space group symmetry.

geometry <a> <b> <c> <alpha> <beta> <gamma>

Input of the unit cell dimensions (space-separated real numbers). This parameter is mandatory if the coordinate file does not specify the cell parameters.

genunit

Generates a unit cell as defined by the crystallographic information given in the coordinate file or set up with the keywords 'symmetry' and 'geometry'. Chains generated by the identity operation retain their names; all others are renamed as c_n, where c is the chain's original name and n is the number of the symmetry operation in the space group used (counting from 0 for the identity operation). To comply with PDB standards, the chains should then be renamed using the renchain command, e.g. renchain A_2 H. Alternatively, the chains may be assigned automatically generated 1-character names using the command mkchainIDs.

Example: rnase.pdb contains 2 chains, A and B. The following run generates a unit cell in space group P 21 21 21 (4 symmetry operations) and assigns chain IDs C, D, E to chain A transformed by operations #1, 2, 3, and IDs F, G, H to chain B transformed by the same operations. Chains A and B transformed by the 0th operation (identity) retain their IDs:


pdbcur xyzin rnase.pdb xyzout ucell.pdb <<eof
symm P 21 21 21
genu
renc A_1 C
renc A_2 D
renc A_3 E
renc B_1 F
renc B_2 G
renc B_3 H
eof

symop X,Y,Z 'old chain ID' 'new chain ID' 'old ID' 'new ID' ...

Declares (but does not apply) a symmetry operation. The operation is written in terms of the fractional coordinates X, Y, Z, without spaces. Pairs of 'old chain ID' and 'new chain ID' specify how the chains should be renamed after the operation. Renaming is optional: if no renaming is specified, the newly generated chains are named automatically (see keyword symcommit).

Example: symop Y+1/2,X-1/2,Z A S B R
(declares the symmetry transformation x=Y+1/2, y=X-1/2, z=Z, renaming chain A to S and chain B to R.)
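To show what an operator string such as Y+1/2,X-1/2,Z means numerically, the sketch below parses it into a rotation part and a fractional shift and applies it to one fractional coordinate triple, using exact fractions. It is not MMDB's parser: the names `parse_symop` and `apply_symop` are hypothetical, only signed X, Y, Z terms and signed integers or fractions are understood, and results are not wrapped back into the unit cell.

```python
import re
from fractions import Fraction

# One signed term of an operator component: an axis letter or a number/fraction
TERM = re.compile(r"([+-]?)(X|Y|Z|\d+(?:/\d+)?)")


def parse_symop(op):
    """Parse e.g. 'Y+1/2,X-1/2,Z' into (rows, shifts).

    New fractional coordinate i = sum(rows[i][j] * old[j]) + shifts[i].
    Only signed X, Y, Z and signed integers/fractions are understood.
    """
    parts = op.upper().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated components")
    rows, shifts = [], []
    for part in parts:
        row, shift, pos = [0, 0, 0], Fraction(0), 0
        for m in TERM.finditer(part):
            sign, token = m.groups()
            # Terms must be contiguous, and every term after the first needs a sign
            if m.start() != pos or (pos > 0 and not sign):
                raise ValueError(f"cannot parse component {part!r}")
            factor = -1 if sign == "-" else 1
            if token in ("X", "Y", "Z"):
                row["XYZ".index(token)] += factor
            else:
                shift += factor * Fraction(token)
            pos = m.end()
        if pos == 0 or pos != len(part):
            raise ValueError(f"cannot parse component {part!r}")
        rows.append(row)
        shifts.append(shift)
    return rows, shifts


def apply_symop(op, frac_xyz):
    """Apply the operator to one fractional-coordinate triple (no cell wrapping)."""
    rows, shifts = parse_symop(op)
    return tuple(sum(r * x for r, x in zip(row, frac_xyz)) + s
                 for row, s in zip(rows, shifts))


if __name__ == "__main__":
    xyz = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10))
    print([float(v) for v in apply_symop("Y+1/2,X-1/2,Z", xyz)])  # [0.7, -0.4, 0.3]
```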

symcommit

No parameters.
Applies all symmetry operations declared since the last symcommit statement. The first operation (normally the identity) is applied to the existing set of coordinates; all others are applied to duplicates of the coordinates, and the results are merged.

The newly generated chains are named as C_n, where C is the original chain name and n is the symmetry operation number. Symmetry operations are numbered in the order in which they appear in symop statements, starting from 0; however, the very first one is applied to the existing chains, which are therefore not renamed.

Example:


pdbcur xyzin rnase.pdb xyzout rnase1.pdb <<eof
symop  X,Y,Z
symop  Y+1/2,X-1/2,Z
symcommit
eof

simply adds to the existing file two chains, A_1 and B_1, obtained from chains A and B by the operation Y+1/2,X-1/2,Z.
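The naming rule for symcommit can be sketched on its own, without any coordinates: operation 0 keeps the existing chain IDs, and each later operation n adds a copy named ID_n. The function name `symcommit_names` is hypothetical, and the order in which the sketch lists chains is not a claim about the order pdbcur writes them to the file.

```python
def symcommit_names(chain_ids, n_ops):
    """Map each chain produced by a symcommit-like step to its origin.

    Returns {new_chain_id: (original_chain_id, operation_number)}.
    Operation 0 is applied to the existing chains in place, so they keep
    their IDs; every later operation n adds a copy named '<ID>_<n>'.
    Naming only: coordinates are not handled here.
    """
    names = {}
    for cid in chain_ids:
        for n in range(n_ops):
            new_id = cid if n == 0 else f"{cid}_{n}"
            names[new_id] = (cid, n)
    return names


if __name__ == "__main__":
    # Two declared operations on chains A and B, as in the example above
    print(sorted(symcommit_names(["A", "B"], 2)))  # ['A', 'A_1', 'B', 'B_1']
```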

mkchainIDs

Automatically generates 1-character chain IDs after applying symmetry operations. The IDs are generated such that they use all available letters starting from A, and a chain is not renamed if its name is already a 1-character one.

The following example


pdbcur xyzin rnase.pdb xyzout ucell.pdb <<eof
symm P 21 21 21
genu
mkch
eof

produces exactly the same result as that given for keyword GENUNIT, because the original chains are named sequentially as A,B (not G,I, for example).
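A hypothetical sketch of the renaming rule described for mkchainIDs: chains that already have a 1-character ID keep it, and every other chain takes the next unused letter, starting from A. The function name is an assumption, only uppercase letters are used, and the chains are processed grouped by original chain (A, A_1, A_2, A_3, B, ...), an ordering chosen because it reproduces the mapping in the GENUNIT example; the program's actual processing order is not documented here.

```python
from string import ascii_uppercase


def make_chain_ids(chain_ids):
    """Assign 1-character chain IDs in a mkchainIDs-like way (hypothetical).

    Chains whose ID is already 1 character long keep it; every other
    chain receives the next unused uppercase letter, in the order given.
    Raises ValueError if the 26 letters run out.
    """
    used = {cid for cid in chain_ids if len(cid) == 1}
    free = (c for c in ascii_uppercase if c not in used)
    mapping = {}
    for cid in chain_ids:
        if len(cid) == 1:
            mapping[cid] = cid
        else:
            try:
                mapping[cid] = next(free)
            except StopIteration:
                raise ValueError("ran out of 1-character chain IDs") from None
    return mapping


if __name__ == "__main__":
    chains = ["A", "A_1", "A_2", "A_3", "B", "B_1", "B_2", "B_3"]
    print(make_chain_ids(chains))
```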

rotate {selection of atoms} alpha beta gamma x y z

rotate {selection of atoms} alpha beta gamma center

Euler rotation of the selected atoms through angles alpha, beta and gamma (degrees), applied about the initial Z-axis, the new Y-axis and the newest Z-axis, respectively. The rotation center is given either by the orthogonal coordinates x, y and z or by the keyword 'center', which specifies the mass center of the selected atoms.

Examples:


 1. 90-degree rotation of chain A about Z-axis in
    original coordinate system:
    rotate   A   90 0 0   0 0 0
 2. 60-degree rotation of chains A and B about Y-axis
    in the coordinate system of their mass center:
    rotate 'A,B'  0 60 0   center
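The Z, new Y, newest Z sequence described above corresponds to the matrix product R = Rz(alpha) Ry(beta) Rz(gamma), applied about the rotation center as x' = R (x - c) + c. The sketch below builds that matrix; it is an illustration, not pdbcur's code. The function names are hypothetical, the active sign convention (atoms move, axes stay fixed) is an assumption, and the center is passed explicitly rather than computed as the mass center.

```python
import math


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyz(alpha, beta, gamma):
    """Rotation matrix for Z, new Y, newest Z Euler angles (degrees).

    Intrinsic rotations compose left to right: R = Rz(alpha) Ry(beta) Rz(gamma).
    Assumption: active convention (coordinates move, axes fixed).
    """
    a, b, g = (math.radians(v) for v in (alpha, beta, gamma))
    return matmul(matmul(rot_z(a), rot_y(b)), rot_z(g))


def rotate_about(points, matrix, centre):
    """Apply x' = R (x - c) + c to each (x, y, z) point."""
    out = []
    for p in points:
        d = [p[i] - centre[i] for i in range(3)]
        out.append(tuple(sum(matrix[i][j] * d[j] for j in range(3)) + centre[i]
                         for i in range(3)))
    return out


if __name__ == "__main__":
    # Like example 1: 90 degrees about Z, center at the origin
    r = euler_zyz(90, 0, 0)
    x, y, z = rotate_about([(1.0, 0.0, 0.0)], r, (0.0, 0.0, 0.0))[0]
    print(round(x, 6), round(y, 6), round(z, 6))
```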

translate {selection of atoms} frac tx ty tz

translate {selection of atoms} orth tx ty tz

Translates the selected atoms by (tx, ty, tz), given in fractional or orthogonal coordinates according to the subkeyword 'frac' or 'orth'.
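A fractional shift has to be converted to angstroms with the unit cell before it can be added to orthogonal coordinates. The sketch below does this with the commonly used PDB orthogonalisation convention (a along X, b in the XY plane); whether pdbcur uses exactly this convention, or the SCALE records of the file, is an assumption here, and the function names are hypothetical.

```python
import math


def orth_matrix(a, b, c, alpha, beta, gamma):
    """Fractional-to-orthogonal matrix (angles in degrees).

    Assumption: common PDB convention, a along X and b in the XY plane.
    """
    ca, cb, cg = (math.cos(math.radians(v)) for v in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    vol_term = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [[a, b * cg, c * cb],
            [0.0, b * sg, c * (ca - cb * cg) / sg],
            [0.0, 0.0, c * vol_term / sg]]


def translate(points, shift, mode, cell=None):
    """Shift (x, y, z) points by `shift`, given in 'orth' or 'frac' units."""
    if mode == "frac":
        m = orth_matrix(*cell)
        shift = [sum(m[i][j] * shift[j] for j in range(3)) for i in range(3)]
    elif mode != "orth":
        raise ValueError("mode must be 'frac' or 'orth'")
    return [tuple(p[i] + shift[i] for i in range(3)) for p in points]


if __name__ == "__main__":
    cell = (10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
    # Half a cell edge along a in a 10 A x 20 A x 30 A orthorhombic cell
    print(translate([(1.0, 1.0, 1.0)], (0.5, 0.0, 0.0), "frac", cell))
```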

vrotate {selection of atoms} alpha vx vy vz x y z

vrotate {selection of atoms} alpha vx vy vz center

vrotate {selection of atoms} alpha atom1 atom2

Rotation of the selected atoms through angle alpha (degrees) about an axis with direction (vx,vy,vz) passing through the rotation center (given as x,y,z or by the keyword 'center' for the mass center of the selected atoms). The axis may also be specified by two atoms, atom1 and atom2, written in the MMDB selection notation.

Examples:


 1. 90-degree rotation of chain A about Z-axis in
    original coordinate system:
    vrotate  A   90  0 0 1   0 0 0
 2. 60-degree rotation of chains A and B about Y-axis
    in the coordinate system of their mass center:
    vrotate 'A,B'  60  0 1 0  center
 3. 45-degree rotation of all atoms about the vector connecting
    the C-alpha atoms of residue 20.A of chain A and residue 55
    of chain B:
    vrotate /*/*/*/* 45  /1/A/20.A/CA[C] /1/B/55/CA[C]
    or, if there is only one model in the PDB file:
    vrotate *  45  A/20.A/CA[C] B/55/CA[C]
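Rotation about an arbitrary axis through a point is commonly computed with Rodrigues' rotation formula, v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)), where k is the unit axis and v is the offset from the center. The sketch below applies it; it is not pdbcur's code. The function names are hypothetical, the right-hand sense of positive angles is an assumption, and for the two-atom form the axis is assumed to point from atom1 to atom2 and pass through atom1.

```python
import math


def vrotate(points, angle_deg, axis, centre):
    """Rotate points by angle_deg about the line through `centre` with
    direction `axis`, using Rodrigues' rotation formula (right-handed)."""
    norm = math.sqrt(sum(v * v for v in axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    k = [v / norm for v in axis]
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = []
    for p in points:
        d = [p[i] - centre[i] for i in range(3)]
        kxd = [k[1] * d[2] - k[2] * d[1],
               k[2] * d[0] - k[0] * d[2],
               k[0] * d[1] - k[1] * d[0]]
        kd = sum(k[i] * d[i] for i in range(3))
        out.append(tuple(centre[i] + d[i] * c + kxd[i] * s + k[i] * kd * (1 - c)
                         for i in range(3)))
    return out


def axis_from_atoms(p1, p2):
    """Axis for the two-atom form: direction atom1 -> atom2, through atom1."""
    return [p2[i] - p1[i] for i in range(3)], p1


if __name__ == "__main__":
    # Like example 1: 90 degrees about Z through the origin
    print(vrotate([(1.0, 0.0, 0.0)], 90, (0, 0, 1), (0.0, 0.0, 0.0)))
```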

ATOM SELECTION SYNTAX

Specification of the selection sets:

either: /mdl/chn/s1.i1-s2.i2/at[el]:aloc
or: /mdl/chn/*(res).ic/at[el]:aloc

where no spaces are allowed. The slashes separate the hierarchical levels of models, chains, residues and atoms.

Notations:


 mdl   - the model's serial number or 0 or '*' for any model
         (default).
 chn   - the chain ID or list of chain IDs like 'A,B,C' or
         '*' for any chain (default).
 s1,s2 - the starting and ending residue sequence numbers
         or '*' for any sequence number (default).
 i1,i2 - the residue insertion codes or '*' for any
         insertion code. If a sequence number other than
         '*' is specified, then the insertion code defaults
         to "" (no insertion code), otherwise the default is '*'.
 res   - residue name or list of residue names like 'ALA,SER'
         or '*' for any residue name (default)
 at    - atom name or list of atom names like 'CA,N1,O' or
         '*' for any atom name (default)
 el    - chemical element name or list of chemical element
         names like 'C,N,O', or '*' for any chemical element
         name (default)
 aloc  - the alternative location indicator or list of
         alternative locations like 'A,B,C', or '*' for any
         alternate location. If the atom name and chemical
         element name are specified (both may be '*'), then
         the alternative location indicator defaults to ""
         (no alternate location), otherwise the default is
         '*'.
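To make the hierarchy concrete, the sketch below splits a selection written in one of the two fully specified forms (/mdl/chn/s1.i1-s2.i2/at[el]:aloc or /mdl/chn/*(res).ic/at[el]:aloc) into its fields. It is an illustration only, not MMDB's parser: the regular expressions and the name `split_selection` are hypothetical, abbreviated forms are rejected, defaults are not filled in (None marks a sub-field that was not written), and a '!' prefix is left in the raw field values.

```python
import re

# Hypothetical patterns for the two fully specified forms; not MMDB's grammar.
RANGE_RE = re.compile(
    r"^(?P<s1>-?\d+|\*)(?:\.(?P<i1>[A-Za-z*]?))?"
    r"(?:-(?P<s2>-?\d+|\*)(?:\.(?P<i2>[A-Za-z*]?))?)?$")
NAME_RE = re.compile(r"^\*\((?P<res>[^)]*)\)(?:\.(?P<ic>[A-Za-z*]?))?$")
ATOM_RE = re.compile(
    r"^(?P<at>[^\[\]:]*)(?:\[(?P<el>[^\]]*)\])?(?::(?P<aloc>.*))?$")


def split_selection(sel):
    """Split '/mdl/chn/res/atom' into its hierarchical fields."""
    parts = sel.split("/")
    if len(parts) != 5 or parts[0] != "":
        raise ValueError("expected the full form /mdl/chn/res/atom")
    _, mdl, chn, res, atom = parts
    fields = {"model": mdl, "chains": chn.split(",")}
    m = RANGE_RE.match(res) or NAME_RE.match(res)
    if m is None:
        raise ValueError(f"cannot parse residue field {res!r}")
    fields.update(m.groupdict())
    m = ATOM_RE.match(atom)
    if m is None:
        raise ValueError(f"cannot parse atom field {atom!r}")
    fields.update(m.groupdict())
    return fields


if __name__ == "__main__":
    print(split_selection("/1/A/33.A-120.B/CA[C]:*"))
    print(split_selection("/1/A/*(GLU)./CA[C]:"))
```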

Values for chain IDs, residue names, atom names, chemical element names and alternative location indicators may be negated by prefix '!'. For example, '!A,B,C' for the list of chain names means 'any chain ID but A,B,C'.
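The negation rule can be sketched as a small matching function. The name `matches` is hypothetical; the sketch assumes that a leading '!' negates the whole comma-separated list (as in the '!A,B,C' example above) and that an empty entry in a list stands for a blank value, as in the /*/,A,B/ example.

```python
def matches(value, spec):
    """Test one value against a selection list such as 'A,B' or '!A,B'.

    '*' matches anything; a leading '!' negates the whole list; an empty
    entry (e.g. the leading one in ',A,B') matches a blank value.
    """
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:]
    hit = spec == "*" or value in spec.split(",")
    return hit != negate


if __name__ == "__main__":
    print(matches("D", "!A,B,C"))  # True
    print(matches("A", "!A,B,C"))  # False
```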

Generally, any hierarchical element, as well as the selection code, may be omitted, in which case it is replaced by its default (see above). This makes the following examples valid:


 *                   select all atoms
 /1                  select all atoms in model 1
 A,B                 select all atoms in chains A and B in
                     all models
 /*/1,2              select all atoms in chains 1 and 2 in
                     all models. Note that you must use this 
                     format with numerical chain identifiers
 /1//                select all atoms in chain without chainID
                     in model 1
 /*/,A,B/            select all atoms in chain without chainID,
                     chain A and B in all models
 33-120              select all atoms in residues 33. to 120.
                     in all chains and models
 A/33.A-120.B        select all atoms in residues 33.A to
                     120.B in chain A only, in all models
 A/33.-120.A/[C]     select all carbons in residues 33. to
                     120.A in chain A, in all models
 CA[C]               select all C-alphas in all
                     models/chains/residues
 A//[C]              select all carbons in chain A, in all models
 (!ALA,SER)          select all atoms in any residues but
                     ALA and SER, in all models/chains
 /1/A/(GLU)/CA[C]    select all C-alphas in GLU residues of
                     chain A, model 1
 /1/A/*(GLU)./CA[C]: same as above
 [C]:,A              select all carbons without alternative
                     location indicator and carbons in alternate
                     location A

NOTE: if a selection contains comma(s), the selection must be enclosed in quotation marks, which tell the input parser that it is a single input parameter rather than a set of comma-separated arguments.

PROGRAM OUTPUT

The program currently gives a short summary of the operations carried out.

EXAMPLES

Runnable example

pdbcur.exam

AUTHORS

Eugene Krissinel, European Bioinformatics Institute, Cambridge, UK.
Martyn Winn, Daresbury Laboratory, UK (some additional keywords).