For a complete description of the PDB file see the PDB guide - Format Description Version 2.3. Only short descriptions of the records used by REFMAC are given here. REFMAC uses the MODRES, SSBOND, LINK and CISPEP records to define restraints for refinement; whether it does so depends on the MAKE_restraints keyword input. There are some CCP4-specific extensions to the standard definitions, for example residue renaming with MODRES and the link identifier in columns 73-80 of the LINK record.
Note that PDB is a formatted file, so take care when editing it manually. Both the order of the records and the placement of characters in the correct columns within a record are important. The easiest way to enter, review or edit the restraints in a file is to use the CCP4I task 'Edit Restraints in PDB File'.
When REFMAC reads a PDB file, it uses the following records:
The CRYST1 record defines the unit cell dimensions and the space group of the crystal.
COLUMNS   DATA TYPE     FIELD     DEFINITION
-------------------------------------------------------------
 1 -  6   Record name   "CRYST1"
 7 - 15   Real(9.3)     a         a (Angstroms).
16 - 24   Real(9.3)     b         b (Angstroms).
25 - 33   Real(9.3)     c         c (Angstroms).
34 - 40   Real(7.2)     alpha     alpha (degrees).
41 - 47   Real(7.2)     beta      beta (degrees).
48 - 54   Real(7.2)     gamma     gamma (degrees).
56 - 66   LString       sGroup    Space group.
Example:
CRYST1   76.560   55.400   84.650  90.00 116.53  90.00 P 1 21 1
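Because every field sits in fixed columns, a reader should slice each record by column rather than split it on whitespace; the space group symbol "P 1 21 1" itself contains blanks. The Python sketch below is only an illustration, not REFMAC's own reader, and the names `cols`, `Cell` and `parse_cryst1` are hypothetical. It assumes the record is column-aligned as in the table, as in the example written out at the bottom of the block.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float
    space_group: str


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive), matching the table."""
    return line[first - 1:last]


def parse_cryst1(line: str) -> Cell:
    """Read the cell dimensions and space group from a CRYST1 record."""
    if cols(line, 1, 6) != "CRYST1":
        raise ValueError("not a CRYST1 record")
    return Cell(
        a=float(cols(line, 7, 15)),
        b=float(cols(line, 16, 24)),
        c=float(cols(line, 25, 33)),
        alpha=float(cols(line, 34, 40)),
        beta=float(cols(line, 41, 47)),
        gamma=float(cols(line, 48, 54)),
        # Left-justified text field; internal blanks are kept, trailing ones dropped.
        space_group=cols(line, 56, 66).strip(),
    )


if __name__ == "__main__":
    record = "CRYST1   76.560   55.400   84.650  90.00 116.53  90.00 P 1 21 1"
    print(parse_cryst1(record))
```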
MODRES is mainly used to get around the three-character limit on PDB residue names. With this record a residue can be given a longer name (maximum 8 characters) that is present in the dictionary file. It can also be used for any other modification described in the dictionary.
Example:
1234567890123456789012345678901234567890123456789012345678901234567890123456789
MODRES DTT A 950 DTT_oxd RENAME
This means that residue 950 of chain A is named DTT in the PDB file but should be interpreted as DTT_oxd, which is present in the dictionary (either supplied by us or created by the user).
Note that, like all PDB records, MODRES is formatted, so its fields must be placed in the correct columns. The renamed residue can be at most 8 characters long.
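The sketch below shows how MODRES ... RENAME records could be collected into a lookup table from (chain, residue number) to the dictionary name. It is written in Python with hypothetical names (`modres_renames`, `MAX_NAME`). Because this section does not list the exact columns of the CCP4 extension, the sketch assumes that all six fields are non-blank and separated by spaces, so `split()` recovers them; a production reader should use the record's columns instead, and blank chain IDs or insertion codes would break this simple approach.

```python
from typing import Dict, Iterable, Tuple

MAX_NAME = 8  # maximum length of a renamed residue, as stated above


def modres_renames(lines: Iterable[str]) -> Dict[Tuple[str, int], Tuple[str, str]]:
    """Map (chain ID, residue number) -> (PDB name, dictionary name).

    Assumption: whitespace-separated, non-blank fields, as in
    "MODRES DTT A 950 DTT_oxd RENAME"; not a column-exact parser.
    """
    renames: Dict[Tuple[str, int], Tuple[str, str]] = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) == 6 and tokens[0] == "MODRES" and tokens[5] == "RENAME":
            _, pdb_name, chain, seq, new_name, _ = tokens
            if len(new_name) > MAX_NAME:
                raise ValueError(f"{new_name!r} is longer than {MAX_NAME} characters")
            renames[(chain, int(seq))] = (pdb_name, new_name)
    return renames


if __name__ == "__main__":
    print(modres_renames(["MODRES DTT A 950 DTT_oxd RENAME"]))
```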
The SCALEn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates as contained in the entry to fractional crystallographic coordinates. Non-standard coordinate systems should be explained in the remarks.
COLUMNS   DATA TYPE     FIELD     DEFINITION
----------------------------------------------------------------
 1 -  6   Record name   "SCALEn"  n = 1, 2, or 3
11 - 20   Real(10.6)    s[n][1]   Sn1
21 - 30   Real(10.6)    s[n][2]   Sn2
31 - 40   Real(10.6)    s[n][3]   Sn3
46 - 55   Real(10.5)    u[n]      Un
If the vectors a, b, c describe the crystallographic cell edges, and A, B, C are the unit vectors of the default orthogonal Angstrom system, then A, B, C and a, b, c have the same origin; A is parallel to a, B is parallel to C × A, and C is parallel to a × b (i.e. to c*).
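The fractional coordinates are obtained from the orthogonal coordinates X, Y, Z (in Angstroms) as:
$$
\begin{aligned}
x_{\mathrm{frac}} &= S_{11}X + S_{12}Y + S_{13}Z + U_1\\
y_{\mathrm{frac}} &= S_{21}X + S_{22}Y + S_{23}Z + U_2\\
z_{\mathrm{frac}} &= S_{31}X + S_{32}Y + S_{33}Z + U_3
\end{aligned}
$$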
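The SCALEn columns and the transformation from orthogonal to fractional coordinates translate directly into code. The Python sketch below is illustrative only, not REFMAC's implementation; `parse_scale`, `to_fractional` and `scale_from_cell` are hypothetical names. `scale_from_cell` assumes the orthogonal frame described above (A parallel to a, C parallel to a × b, B = C × A) and a zero origin shift (all Un = 0). It builds the matrix whose columns are a, b, c in that frame and inverts it; the matrix is upper triangular, so the inverse can be written out term by term.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive), as in the SCALEn table."""
    return line[first - 1:last]


def parse_scale(lines: Sequence[str]) -> Tuple[Matrix, List[float]]:
    """Collect the 3x3 matrix S and the vector U from SCALE1..SCALE3 records."""
    s: Matrix = [[0.0] * 3 for _ in range(3)]
    u = [0.0] * 3
    seen = set()
    for line in lines:
        name = cols(line, 1, 6)
        if name in ("SCALE1", "SCALE2", "SCALE3"):
            n = int(name[5]) - 1
            s[n] = [float(cols(line, 11, 20)), float(cols(line, 21, 30)),
                    float(cols(line, 31, 40))]
            u[n] = float(cols(line, 46, 55))
            seen.add(n)
    if seen != {0, 1, 2}:
        raise ValueError("SCALE1, SCALE2 and SCALE3 are all required")
    return s, u


def to_fractional(s: Matrix, u: Sequence[float],
                  xyz: Sequence[float]) -> Tuple[float, ...]:
    """x_frac = S11*X + S12*Y + S13*Z + U1, and likewise for y_frac, z_frac."""
    x, y, z = xyz
    return tuple(s[i][0] * x + s[i][1] * y + s[i][2] * z + u[i] for i in range(3))


def scale_from_cell(a: float, b: float, c: float,
                    alpha: float, beta: float, gamma: float) -> Matrix:
    """S for the frame A || a, C || a x b, B = C x A (angles in degrees, U = 0)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Components of the cell vector c in the orthogonal frame.
    cy = c * (ca - cb * cg) / sg
    cz = math.sqrt(c * c - (c * cb) ** 2 - cy * cy)
    # Orthogonalisation matrix M (columns a, b, c) is upper triangular.
    m00, m01, m02 = a, b * cg, c * cb
    m11, m12, m22 = b * sg, cy, cz
    return [
        [1 / m00, -m01 / (m00 * m11), (m01 * m12 - m02 * m11) / (m00 * m11 * m22)],
        [0.0, 1 / m11, -m12 / (m11 * m22)],
        [0.0, 0.0, 1 / m22],
    ]
```

For a cell with all Un equal to zero, `scale_from_cell` applied to the CRYST1 parameters should reproduce the SCALEn matrix written in the file; comparing the two is a quick check that the coordinates use the default orthogonal frame.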
The SSBOND record identifies each disulfide bond in protein and polypeptide structures by identifying the two residues involved in the bond.
COLUMNS   DATA TYPE     FIELD      DEFINITION
----------------------------------------------------------------------------
 1 -  6   Record name   "SSBOND"
 8 - 10   Integer       serNum     Serial number.
12 - 14   LString(3)    "CYS"      Residue name.
16        Character     chainID1   Chain identifier.
18 - 21   Integer       seqNum1    Residue sequence number.
22        AChar         icode1     Insertion code.
26 - 28   LString(3)    "CYS"      Residue name.
30        Character     chainID2   Chain identifier.
32 - 35   Integer       seqNum2    Residue sequence number.
36        AChar         icode2     Insertion code.
60 - 65   SymOP         sym1       Symmetry operator for 1st residue.
67 - 72   SymOP         sym2       Symmetry operator for 2nd residue.
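A minimal Python sketch of reading an SSBOND record by column, assuming a well-formed, column-aligned line. The names `Residue`, `cols` and `parse_ssbond` are hypothetical and this is not REFMAC's reader; the symmetry operators are returned as plain text (or None when blank) without interpreting them.

```python
from typing import NamedTuple, Optional, Tuple


class Residue(NamedTuple):
    name: str
    chain: str
    seq: int
    icode: str


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive); short lines give ''."""
    return line[first - 1:last]


def parse_ssbond(line: str) -> Tuple[Residue, Residue, Optional[str], Optional[str]]:
    """Return the two bonded residues and their symmetry operators."""
    if cols(line, 1, 6) != "SSBOND":
        raise ValueError("not an SSBOND record")
    res1 = Residue(cols(line, 12, 14).strip(), cols(line, 16, 16),
                   int(cols(line, 18, 21)), cols(line, 22, 22).strip())
    res2 = Residue(cols(line, 26, 28).strip(), cols(line, 30, 30),
                   int(cols(line, 32, 35)), cols(line, 36, 36).strip())
    # Blank operator fields are reported as None rather than "".
    sym1 = cols(line, 60, 65).strip() or None
    sym2 = cols(line, 67, 72).strip() or None
    return res1, res2, sym1, sym2
```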
The LINK records specify connectivity between residues that is not implied by the primary structure. Connectivity is expressed in terms of the atom names. This record supplements information given in CONECT records and is provided here for convenience in searching.
COLUMNS   DATA TYPE      FIELD      DEFINITION
--------------------------------------------------------------------------------
 1 -  6   Record name    "LINK  "
13 - 16   Atom           name1      Atom name.
17        Character      altLoc1    Alternate location indicator.
18 - 20   Residue name   resName1   Residue name.
22        Character      chainID1   Chain identifier.
23 - 26   Integer        resSeq1    Residue sequence number.
27        AChar          iCode1     Insertion code.
43 - 46   Atom           name2      Atom name.
47        Character      altLoc2    Alternate location indicator.
48 - 50   Residue name   resName2   Residue name.
52        Character      chainID2   Chain identifier.
53 - 56   Integer        resSeq2    Residue sequence number.
57        AChar          iCode2     Insertion code.
60 - 65   SymOP          sym1       Symmetry operator for 1st atom.
67 - 72   SymOP          sym2       Symmetry operator for 2nd atom.
73 - 80   LinkID         linkid     Cross-reference to LINK definition in CCP4 libraries.
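The LINK record pairs two atom descriptions with their symmetry operators, plus the CCP4 link identifier in columns 73-80. A Python sketch of reading it, assuming a column-aligned line; `LinkAtom`, `Link`, `cols`, `parse_link` and `_link_atom` are hypothetical names, and the link identifier is only returned as text here, not looked up in any dictionary.

```python
from typing import NamedTuple, Optional


class LinkAtom(NamedTuple):
    atom: str
    alt_loc: str
    res_name: str
    chain: str
    seq: int
    icode: str
    sym: Optional[str]


class Link(NamedTuple):
    atom1: LinkAtom
    atom2: LinkAtom
    link_id: Optional[str]  # CCP4 extension, columns 73-80


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive); short lines give ''."""
    return line[first - 1:last]


def _link_atom(line: str, start: int, sym_first: int) -> LinkAtom:
    """Read one atom block; start is its first column (13 or 43)."""
    return LinkAtom(
        atom=cols(line, start, start + 3).strip(),
        alt_loc=cols(line, start + 4, start + 4).strip(),
        res_name=cols(line, start + 5, start + 7).strip(),
        chain=cols(line, start + 9, start + 9),
        seq=int(cols(line, start + 10, start + 13)),
        icode=cols(line, start + 14, start + 14).strip(),
        sym=cols(line, sym_first, sym_first + 5).strip() or None,
    )


def parse_link(line: str) -> Link:
    if cols(line, 1, 6).rstrip() != "LINK":
        raise ValueError("not a LINK record")
    return Link(_link_atom(line, 13, 60), _link_atom(line, 43, 67),
                cols(line, 73, 80).strip() or None)
```

Reading both atom blocks through one helper relies on their identical internal layout (name, altLoc, resName, chainID, resSeq, iCode at the same offsets from columns 13 and 43), which the table confirms.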
CISPEP records specify the prolines and other peptides found to be in the cis conformation. This record replaces the use of footnote records to list cis peptides.
COLUMNS   DATA TYPE     FIELD      DEFINITION
-------------------------------------------------------------------------
 1 -  6   Record name   "CISPEP"
 8 - 10   Integer       serNum     Record serial number.
12 - 14   LString(3)    pep1       Residue name.
16        Character     chainID1   Chain identifier.
18 - 21   Integer       seqNum1    Residue sequence number.
22        AChar         icode1     Insertion code.
26 - 28   LString(3)    pep2       Residue name.
30        Character     chainID2   Chain identifier.
32 - 35   Integer       seqNum2    Residue sequence number.
36        AChar         icode2     Insertion code.
44 - 46   Integer       modNum     Identifies the specific model.
54 - 59   Real(6.2)     measure    Measure of the angle in degrees.
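A Python sketch of reading a CISPEP record by column, assuming a column-aligned line. `PeptideResidue`, `CisPep`, `cols` and `parse_cispep` are hypothetical names; blank model numbers or angles are returned as None rather than guessed.

```python
from typing import NamedTuple, Optional


class PeptideResidue(NamedTuple):
    name: str
    chain: str
    seq: int
    icode: str


class CisPep(NamedTuple):
    first: PeptideResidue
    second: PeptideResidue
    model: Optional[int]
    angle: Optional[float]  # the "measure" field, in degrees


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive); short lines give ''."""
    return line[first - 1:last]


def parse_cispep(line: str) -> CisPep:
    if cols(line, 1, 6) != "CISPEP":
        raise ValueError("not a CISPEP record")
    first = PeptideResidue(cols(line, 12, 14).strip(), cols(line, 16, 16),
                           int(cols(line, 18, 21)), cols(line, 22, 22).strip())
    second = PeptideResidue(cols(line, 26, 28).strip(), cols(line, 30, 30),
                            int(cols(line, 32, 35)), cols(line, 36, 36).strip())
    model_txt = cols(line, 44, 46).strip()
    angle_txt = cols(line, 54, 59).strip()
    return CisPep(first, second,
                  int(model_txt) if model_txt else None,
                  float(angle_txt) if angle_txt else None)
```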
The ATOM records present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.
COLUMNS   DATA TYPE      FIELD        DEFINITION
---------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "
 7 - 11   Integer        serial       Atom serial number.
13 - 16   Atom           name         Atom name.
17        Character      altLoc       Alternate location indicator.
18 - 20   Residue name   resName      Residue name.
22        Character      chainID      Chain identifier.
23 - 26   Integer        resSeq       Residue sequence number.
27        AChar          iCode        Code for insertion of residues.
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
55 - 60   Real(6.2)      occupancy    Occupancy.
61 - 66   Real(6.2)      tempFactor   Temperature factor.
73 - 76   LString(4)     segID        Segment identifier, left-justified.
77 - 78   LString(2)     element      Element symbol, right-justified.
79 - 80   LString(2)     charge       Charge on the atom.
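The same column-slicing approach reads ATOM and HETATM records, which share this layout. In the Python sketch below (`Atom`, `cols` and `parse_atom` are hypothetical names), the atom name is kept exactly as written in columns 13-16, because its alignment within the field carries meaning (for example " CA " for an alpha carbon versus "CA  " for calcium). Optional trailing fields missing from short lines come back as empty strings.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    icode: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    seg_id: str
    element: str
    charge: str


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive); short lines give ''."""
    return line[first - 1:last]


def parse_atom(line: str) -> Atom:
    record = cols(line, 1, 6).rstrip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        record=record,
        serial=int(cols(line, 7, 11)),
        name=cols(line, 13, 16),  # not stripped: alignment is significant
        alt_loc=cols(line, 17, 17).strip(),
        res_name=cols(line, 18, 20).strip(),
        chain=cols(line, 22, 22),
        res_seq=int(cols(line, 23, 26)),
        icode=cols(line, 27, 27).strip(),
        x=float(cols(line, 31, 38)),
        y=float(cols(line, 39, 46)),
        z=float(cols(line, 47, 54)),
        occupancy=float(cols(line, 55, 60)),
        b_factor=float(cols(line, 61, 66)),
        seg_id=cols(line, 73, 76).strip(),
        element=cols(line, 77, 78).strip(),
        charge=cols(line, 79, 80).strip(),
    )
```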
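The ANISOU records present the anisotropic temperature factors. The six independent components of the symmetric tensor U are stored as integers scaled by 10^4 (that is, in units of 10^-4 Å²).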
COLUMNS   DATA TYPE      FIELD      DEFINITION
----------------------------------------------------------------------
 1 -  6   Record name    "ANISOU"
 7 - 11   Integer        serial     Atom serial number.
13 - 16   Atom           name       Atom name.
17        Character      altLoc     Alternate location indicator.
18 - 20   Residue name   resName    Residue name.
22        Character      chainID    Chain identifier.
23 - 26   Integer        resSeq     Residue sequence number.
27        AChar          iCode      Insertion code.
29 - 35   Integer        u[0][0]    U(1,1)
36 - 42   Integer        u[1][1]    U(2,2)
43 - 49   Integer        u[2][2]    U(3,3)
50 - 56   Integer        u[0][1]    U(1,2)
57 - 63   Integer        u[0][2]    U(1,3)
64 - 70   Integer        u[1][2]    U(2,3)
73 - 76   LString(4)     segID      Segment identifier, left-justified.
77 - 78   LString(2)     element    Element symbol, right-justified.
79 - 80   LString(2)     charge     Charge on the atom.
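An ANISOU record repeats the identification fields of its ATOM or HETATM record and adds the six U components in 7-column integer fields, scaled by 10^4. The Python sketch below (hypothetical names `cols`, `parse_anisou`, `b_equivalent`) rebuilds the symmetric 3×3 tensor in Å² and, as one common use, computes the equivalent isotropic B from the standard relation B_eq = 8π² U_eq with U_eq = (U11 + U22 + U33)/3 in an orthogonal frame.

```python
import math
from typing import List


def cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive), as in the table."""
    return line[first - 1:last]


def parse_anisou(line: str) -> List[List[float]]:
    """Return the symmetric U tensor (in A^2) from an ANISOU record."""
    if cols(line, 1, 6) != "ANISOU":
        raise ValueError("not an ANISOU record")
    # Six 7-column integer fields starting at columns 29, 36, 43, 50, 57, 64,
    # in the order U11, U22, U33, U12, U13, U23; divide by 10^4 to get A^2.
    u11, u22, u33, u12, u13, u23 = (
        int(cols(line, first, first + 6)) / 10000.0 for first in range(29, 71, 7)
    )
    return [[u11, u12, u13],
            [u12, u22, u23],
            [u13, u23, u33]]


def b_equivalent(u: List[List[float]]) -> float:
    """B_eq = 8 * pi^2 * U_eq, with U_eq = trace(U) / 3."""
    return 8 * math.pi ** 2 * (u[0][0] + u[1][1] + u[2][2]) / 3
```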
TER: the chain termination record. More information to follow.