ACEDRG (CCP4: Supported Program)

NAME

acedrg - A stereo-chemical description generator for ligands

SYNOPSIS

acedrg -h

acedrg -c (or --mmcif=) input_mmcif_file -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)

acedrg -i (or --smi=) SMILES_string_or_input_file_containing_a_SMILES_string -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)

acedrg -m (or --mol=) input_mol_file -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)

acedrg -g (or --mol2=) input_mol2_file -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)

acedrg -x (or --pdb=) input_pdb_file -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)

acedrg -L (or --linkInstruction=) instruction_file_for_building_covalent_links (txt format) -o (or --out=) name_root_for_output_files -r (or --res=) output_short_monomer_name (optional)
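The flags in the SYNOPSIS can be assembled programmatically, for example when processing many ligands from a script. The following is a minimal sketch in Python, assuming that the input format can be inferred from the file extension (acedrg itself relies on the flag you pass, not on the file name). The names build_acedrg_command and FLAG_BY_SUFFIX are hypothetical and are not part of acedrg or CCP4.

```python
from pathlib import Path
from typing import List, Optional

# Input flags as listed in the SYNOPSIS. Choosing the flag from the file
# extension is an assumption of this sketch, not acedrg behaviour.
FLAG_BY_SUFFIX = {
    ".cif": "-c",   # mmCIF
    ".smi": "-i",   # file containing a SMILES string
    ".mol": "-m",   # MDL MOL
    ".mol2": "-g",  # SYBYL MOL2
    ".pdb": "-x",   # PDB
    ".txt": "-L",   # instruction file for covalent links / modifications
}


def build_acedrg_command(input_file: str, out_root: str,
                         res_name: Optional[str] = None) -> List[str]:
    """Return an argument list such as ['acedrg', '-c', 'x.cif', '-o', 'x_out']."""
    suffix = Path(input_file).suffix.lower()
    try:
        flag = FLAG_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no acedrg input flag known for {suffix!r}") from None
    cmd = ["acedrg", flag, input_file, "-o", out_root]
    if res_name is not None:  # -r is optional
        cmd += ["-r", res_name]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). A literal SMILES string given directly with -i is not covered by this sketch, which handles file names only.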


Description
Input and output files
Usage
Keyworded input
References
Authors and credits
How to cite ACEDRG

DESCRIPTION

The program ACEDRG is designed to derive stereo-chemical information about monomers/ligands (or small molecules). It uses atom types based on the local chemical and topological environment of each atom to organise bond lengths and angles taken from a small-molecule database, the Crystallography Open Database (COD). The atom types encode the hybridisation state of each atom, its membership of small rings (up to seven-membered rings), ring aromaticity and nearest-neighbour information. All atoms from COD have been classified according to the generated atom types. All bonds and angles have also been classified according to the atom types and, in a certain sense, bond types.

Using the tables of those bonds and angles, ACEDRG can derive ideal bond lengths and angles for an unknown monomer/ligand. It also generates information on planes and stereo-chemical properties of the monomer/ligand. The minimum information Acedrg requires from users is the element types of the atoms in the monomer/ligand and the basic bonding pattern, i.e. atom connections and bond orders. Users can also provide extra information, such as atom coordinates and the properties of existing chiral centers, and ask Acedrg to use it.
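The table-based idea described above can be illustrated with a toy example: each atom gets a type string built from its element, ring size and neighbours, and observed bond lengths are grouped by the pair of atom types, so that every bond class gets a mean, a standard deviation and a count. This is a simplified sketch only. The type format and the helpers toy_atom_type and build_bond_table are hypothetical and much cruder than the real ACEDRG atom types, which also encode hybridisation and ring aromaticity.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def toy_atom_type(element: str, ring_size: int, neighbours: List[str]) -> str:
    """Hypothetical type: element, smallest ring size (0 = not in a ring),
    then the sorted neighbour elements, e.g. 'C[6](CCH)'."""
    ring = f"[{ring_size}]" if ring_size else ""
    return f"{element}{ring}({''.join(sorted(neighbours))})"


def build_bond_table(
    observations: List[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], Tuple[float, float, int]]:
    """Group observed bond lengths by unordered atom-type pair and return
    (mean, standard deviation, count) for each bond class."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for type_a, type_b, length in observations:
        key = (type_a, type_b) if type_a <= type_b else (type_b, type_a)
        groups[key].append(length)
    table = {}
    for key, lengths in groups.items():
        sd = stdev(lengths) if len(lengths) > 1 else 0.0
        table[key] = (mean(lengths), sd, len(lengths))
    return table


if __name__ == "__main__":
    c_type = toy_atom_type("C", 6, ["H", "C", "C"])
    o_type = toy_atom_type("O", 0, ["C", "H"])
    obs = [(c_type, o_type, 1.40), (o_type, c_type, 1.42)]
    print(build_bond_table(obs))
```

Looking up an unknown ligand's bond then amounts to computing the two atom types and reading the mean (the "ideal" length) and its spread from the table.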

Users may want to join two monomers/ligands by covalently bonding an atom in one monomer/ligand to an atom in the other. The effects of such a bond can be described by running Acedrg in covalent-link generation mode. When a job finishes successfully, Acedrg gives (1) information on the link, i.e. the bonds, angles and torsions that involve both of the newly joined atoms, and (2) information on modifications to the two input monomers/ligands. The latter consists of changes to bonds, angles, torsions, chiral centers and planes in those two monomers. All of this information is written to an output file in mmCIF format. To obtain the information on the link and on the modifications to the original monomers/ligands, users need to give Acedrg a set of instructions. These instructions are written in a .txt file, which is passed to Acedrg as a command-line argument. The format of the instruction file, the output mmCIF file and some examples are shown in the following sections.

INPUT AND OUTPUT FILES

When used to generate a full description of a monomer/ligand, Acedrg accepts input files in several common computational chemistry formats, including SMILES, mmCIF, SDF/MOL and SYBYL MOL2. It outputs ACEDRG-derived ideal bond lengths, angles, plane groups, aromatic rings and chirality information, and writes them to an mmCIF file that can be used by refinement and model-building programs. It also outputs the coordinates of the ligand as a PDB file.

An instruction file is required as input when running Acedrg to obtain information on covalent links and the resulting modifications to monomers/ligands.
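Before an instruction file is passed to acedrg, a script can run a rough pre-flight check based on the rules given in the Usage section: the whole content must be on one line, it must start with "LINK:" or "MOD:", and a LINK instruction must contain the residue-name and atom-name keywords for both ligands (keywords are case-insensitive). This is a minimal sketch; check_instruction_text is a hypothetical helper, treating "LINK:"/"MOD:" case-insensitively and requiring RES-NAME in MOD instructions are assumptions, and acedrg's own parser remains the authority.

```python
from typing import List

# Keywords every LINK example supplies. FILE-1/FILE-2 are left out because
# they are not needed for standard monomers such as amino acids.
LINK_REQUIRED = ("RES-NAME-1", "ATOM-NAME-1", "RES-NAME-2", "ATOM-NAME-2")


def check_instruction_text(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 1:
        return ["instructions must be on a single line"]
    tokens = lines[0].split()
    head = tokens[0].upper()                   # assumption: case-insensitive
    present = {t.upper() for t in tokens[1:]}  # keywords are case-insensitive
    problems = []
    if head == "LINK:":
        for kw in LINK_REQUIRED:
            if kw not in present:
                problems.append(f"missing keyword {kw}")
    elif head == "MOD:":
        if "RES-NAME" not in present:          # assumption of this sketch
            problems.append("missing keyword RES-NAME")
    else:
        problems.append('file must begin with "LINK:" or "MOD:"')
    return problems
```

For example, check_instruction_text(open("my_instructions.txt").read()) returns an empty list for the LINK and MOD examples in the Usage section.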

Input

SMILES
A typographical line notation for specifying chemical structure. It can be given to acedrg in two ways:
  1. as a string, e.g. "C1=CC=CC(CCC2)=C12", directly on the command line
  2. as a file, e.g. a_smiles.smi, which contains the above SMILES string and whose name is given on the command line
MDL MOL File
An MDL Molfile is a file format created by MDL Information Systems (later part of Symyx, and now of BIOVIA). It contains information about the atoms, bonds, connectivity and coordinates of a molecule. The file begins with a header block, followed by the Connection Table (CT), which lists the atoms and then the bond connections and their types, followed by sections for more complex information.
mmCif File
The macromolecular Crystallographic Information File (mmCIF) is an extension of the Crystallographic Information File (CIF) data representation.
Mol2 file
A SYBYL MOL2 (.mol2) file is a flexible representation of molecules, containing atom coordinates, bonds and substructure information.
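The minimum input acedrg needs (element types plus atom connections and bond orders) is exactly what the atom and bond blocks of an MDL MOL file carry. The sketch below reads only those two blocks from a V2000-format MOL file, using its fixed column layout; it ignores charges, the properties block and the newer V3000 format. read_v2000_topology is a hypothetical helper, not part of acedrg, and the small methanol example (heavy atoms only) is made up for illustration.

```python
from typing import List, Tuple


def read_v2000_topology(molblock: str) -> Tuple[List[str], List[Tuple[int, int, int]]]:
    """Return (elements, bonds) from a V2000 MDL MOL block.

    Bonds are (atom_1, atom_2, bond_type) with 1-based atom indices, as stored
    in the file (bond type 1 = single, 2 = double, 3 = triple, 4 = aromatic).
    """
    lines = molblock.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    if "V2000" not in counts:
        raise ValueError("only the V2000 fixed-column layout is handled here")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Element symbol sits in columns 32-34, after three 10-character coordinates.
    elements = [ln[31:34].strip() for ln in atom_lines]
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9])) for ln in bond_lines]
    return elements, bonds


MOLBLOCK = "\n".join([
    "methanol",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])

if __name__ == "__main__":
    print(read_v2000_topology(MOLBLOCK))  # (['C', 'O'], [(1, 2, 1)])
```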

Output

Running ACEDRG in different modes produces different output files. The mode used most frequently is the one that generates ligand descriptions.

(a) Ligand-description mode:
If the job finishes successfully, two output files are written: name_root.cif, an mmCIF file containing the derived stereo-chemical description, and name_root.pdb, the coordinates of the ligand, where name_root is the value given with -o.
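A short Python wrapper can run the ligand-description mode on a SMILES string and confirm that both output files were written. This sketch assumes that acedrg is on the PATH (for example after the CCP4 environment has been set up) and that a one-line .smi file is acceptable input, as in the Usage section; describe_from_smiles and expected_outputs are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Tuple


def expected_outputs(out_root: str) -> Tuple[Path, Path]:
    """The two files written in ligand-description mode: <root>.cif and <root>.pdb."""
    return Path(out_root + ".cif"), Path(out_root + ".pdb")


def describe_from_smiles(smiles: str, out_root: str) -> Tuple[Path, Path]:
    """Write the SMILES to <root>.smi, run 'acedrg -i <root>.smi -o <root>'
    and check that both expected output files exist."""
    smi_file = Path(out_root + ".smi")
    smi_file.write_text(smiles + "\n")
    if shutil.which("acedrg") is None:
        raise FileNotFoundError("acedrg is not on PATH; set up the CCP4 environment first")
    # check=True raises CalledProcessError if acedrg exits with an error status.
    subprocess.run(["acedrg", "-i", str(smi_file), "-o", out_root], check=True)
    cif, pdb = expected_outputs(out_root)
    missing = [p.name for p in (cif, pdb) if not p.exists()]
    if missing:
        raise RuntimeError(f"acedrg finished but did not write: {missing}")
    return cif, pdb
```

With acedrg installed, describe_from_smiles("C1=CC=CC(CCC2)=C12", "my_ligand") should leave my_ligand.cif and my_ligand.pdb in the working directory.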

USAGE

  1. Generate descriptions of a ligand from input files of different formats
    SMILES
    Generate descriptions of a ligand when the input is a SMILES string, given either directly or in a file
    1.  acedrg -i "C1=CC=CC(CCC2)=C12"  -o my_ligand
             When the job finishes, you will see two output files, my_ligand.cif and my_ligand.pdb.

    2.  acedrg -i my_ligand.smi  -o my_ligand
             The file my_ligand.smi contains a SMILES string, such as C1=CC=CC(CCC2)=C12.
             Again, the output files are my_ligand.cif and my_ligand.pdb.

    MMCIF
    Generate descriptions of a ligand when the input file is in mmCIF format
               acedrg -c my_ligand.cif  -o my_ligand_fromAcedrg
           When the job finishes, you will see two output files, my_ligand_fromAcedrg.cif
           and my_ligand_fromAcedrg.pdb. The difference between my_ligand.cif and my_ligand_fromAcedrg.cif
           is that the latter contains more detailed stereo-chemical information.
               
    MDL MOL
    Generate descriptions of a ligand when the input file is in MDL MOL format
               acedrg -m my_ligand.mol  -o my_ligand  
           When the job finishes, you will see two output files, my_ligand.cif and my_ligand.pdb.
               
    SYBYL MOL2
    Generate descriptions of a ligand when the input file is in SYBYL MOL2 (.mol2) format
               acedrg -g my_ligand.mol2  -o my_ligand       
           When the job finishes, you will see two output files, my_ligand.cif and my_ligand.pdb.
               
    Other useful options
    The following options change how acedrg treats the input:
    1.            acedrg -c my_ligand.cif  -o my_ligand_fromAcedrg -p
             When option -p is used, acedrg uses the coordinates of atoms in my_ligand.cif as
             the initial coordinates for optimization.

    2.            acedrg -m my_ligand.mol  -o my_ligand  -K
             When option -K (upper case) is used, acedrg keeps the original protonation/deprotonation states in the input file, my_ligand.mol.
                   

  2. Generate a description of a linker between two atoms in two different ligands
    1. Generating a linker involves a few steps and needs an instruction file that tells acedrg what to do:
      1. Create an instruction file, e.g. my_instructions.txt.
      2. Run acedrg on the command line, using the instruction file created in step 1, e.g.
               acedrg -L my_instructions.txt -o my_linker

      3. The output file, e.g. my_linker.cif, contains detailed information on the linker.
    2. How to create an instruction file.
      1. The instruction file always begins with "LINK:"
      2. Keywords:
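        1. The required keywords:
          • RES-NAME-1
          • ATOM-NAME-1
          • FILE-1 (not needed for standard monomers such as amino acids; see below)
          • RES-NAME-2
          • ATOM-NAME-2
          • FILE-2 (not needed for standard monomers such as amino acids; see below)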
        2. The optional keywords:
          • DELETE
          • CHANGE
          • BOND-TYPE
        3. Keywords are case-insensitive: ATOM-NAME-1 and aToM-NamE-1 have the same effect.
      3. Details and examples:
        • All contents of an instruction file must be on one line, e.g.
                 LINK: RES-NAME-1 2OP FILE-1 2OP_acedrg.cif ATOM-NAME-1  C  RES-NAME-2 VAL ATOM-NAME-2  N
                                   
        • Keywords RES-NAME-1 and RES-NAME-2 are each followed by a residue name, e.g. 2OP and VAL.
        • Keywords ATOM-NAME-1 and ATOM-NAME-2 are each followed by an atom name, e.g. C and N. These are the two atoms to be linked.
        • Keywords FILE-1 and FILE-2 are each followed by the name of an mmCIF file that provides detailed information on the corresponding ligand. Whether these keywords are needed depends on the types of ligands to be linked.
          1. If the ligand/monomer is an amino acid, such as VAL in the above example, you do not need these keywords.
          2. You can generate the mmCIF file of a ligand/monomer, e.g. 2OP, by running acedrg as follows:
            1. Get the mmCIF file, 2OP.cif, from the PDB.
            2. Run acedrg using 2OP.cif as the input file.
                     acedrg -c 2OP.cif -o 2OP_acedrg  -p (optional)

            3. Put the generated mmCIF file, 2OP_acedrg.cif, into the instruction file as shown in the example instruction file.
        • Some keywords can be used to perform further actions on the linker
          • The keyword DELETE can be used to delete the following items in one of the ligands/monomers. The number at the end of the clause (1 or 2) selects the ligand.
            1. An atom, e.g.
                  LINK: RES-NAME-1 2OP FILE-1 2OP_acedrg.cif ATOM-NAME-1 C RES-NAME-2 VAL ATOM-NAME-2 N DELETE ATOM OXT 1
                  which means that atom OXT in ligand 1, i.e. 2OP, will be deleted when the linker is generated.

            2. A bond, e.g.
                  LINK: RES-NAME-1 CYS ATOM-NAME-1 SG RES-NAME-2 TMP FILE-2 TMP.cif ATOM-NAME-2 C1 DELETE BOND C1 C2 2
                  which means that the bond between C1 and C2 in ligand 2, i.e. TMP, will be deleted.
                                                             
          • The keyword CHANGE can be used to change the following properties in one of the ligands/monomers.
            1. The order of a bond, e.g. the bond between atoms C1 and C2 in residue TMP:
                  LINK: RES-NAME-1 CYS ATOM-NAME-1 SG RES-NAME-2 TMP FILE-2 TMP.cif ATOM-NAME-2 C1 CHANGE BOND C1 C2 SINGLE 2
                  which means that the bond between C1 and C2 in ligand 2, i.e. TMP, will be changed from the original double bond into a single bond.

            2. The formal charge on an atom in a residue/monomer. The keywords "CHANGE CHARGE" are followed by the residue number, the atom name and the value (a positive or negative integer). For example:
                  LINK: RES-NAME-1 TYR ATOM-NAME-1  CE1  RES-NAME-2 MET ATOM-NAME-2 SD CHANGE CHARGE 2 SD 1
                  which means that the formal charge on atom SD in residue 2, i.e. MET, will be changed to 1.
                  Note: formal charges are always integers.

                  LINK: RES-NAME-1 BO2 ATOM-NAME-1 B26 RES-NAME-2 THR ATOM-NAME-2 OG1 CHANGE CHARGE 1 B26 1
                  which means that the formal charge on atom B26 in residue 1, i.e. BO2, will be changed to 1.
                                                             
          • The keyword BOND-TYPE can be used to define the order of the bond between the two linked atoms. By default, the bond between the linked atoms is single.
                LINK: RES-NAME-1 LYS ATOM-NAME-1 NZ RES-NAME-2 PLP FILE-2 PLP_acedrg.cif ATOM-NAME-2 C4A BOND-TYPE DOUBLE DELETE ATOM O4A 2
                which means that the bond between NZ in LYS and C4A in PLP (described in PLP_acedrg.cif) will be a double bond; atom O4A in ligand 2, i.e. PLP, is also deleted.
                                            

  3. Generate a description of a modification of a ligand (Modification for short)
    1. Generating a Modification is similar to generating a link. The steps are:
      1. Create an instruction file, e.g. my_instructions.txt, which tells acedrg what to do.
      2. Run acedrg on the command line, using the instruction file created in step 1, e.g.
               acedrg -L my_instructions.txt -o my_modification

      3. If the program finishes successfully, there are two output cif files. One is the dictionary file for the modified ligand; the other is a cif file describing the modifications made to the ligand. For the above command line, these are "my_modification.cif" and "my_modification_modres.cif".
    2. How to create an instruction file.
      1. The instruction file always begins with "MOD:"
      2. Keywords:
        1. The keywords:
          • RES-NAME
          • FILE
          • ADD
          • DELETE
          • CHANGE
          • ATOM
          • BOND
          • CHARGE
        2. Keywords are case-insensitive: ATOM and aToM have the same effect.
      3. Values following keywords:
        1. All contents of an instruction file must be on one line.
        2. The keyword "RES-NAME" is followed by the name of the original ligand, e.g.
           RES-NAME HIS
        3. The keywords ADD, DELETE and CHANGE are always followed by ATOM, BOND or CHARGE.
        4. To add an atom, use "ADD ATOM" followed by the atom name, element and charge, e.g.
           ADD ATOM O1 O 0
          When adding a non-H atom, you do not need to add the associated H atoms. However, there may be several possible ways of adding them. If you want to make sure that particular H atoms are added, put those atoms in the instruction file as shown above. See the examples for details.
        5. To delete an atom, use "DELETE ATOM" followed by the atom name only, e.g.
           DELETE ATOM O2
          When an atom is deleted, any atoms attached only to that atom are deleted at the same time. In the above example, any H atoms attached to atom O2 will be deleted.
        6. To add a bond, use "ADD BOND" followed by the names of the two atoms forming the bond and the bond type, e.g.
           ADD BOND NZ CM SINGLE
        7. To delete a bond, use "DELETE BOND" followed only by the names of the two atoms forming the bond, e.g.
           DELETE BOND NZ CM
        8. To change the charge on an atom, use "CHANGE CHARGE" followed by the atom name and the value of the charge, e.g.
           CHANGE CHARGE ND 1
          Using the CHARGE keyword in the instruction file is not recommended. If there is no CHARGE keyword, acedrg re-calculates bond orders and charges if necessary.
        9. Acedrg adds or deletes H atoms accordingly, and re-calculates bond orders and charges if necessary.
      4. Details and examples:
        • Example 1
                 MOD: RES-NAME LYS ADD ATOM CM C 0 ADD BOND NZ CM SINGLE
                                   
          In this example:
          1. the monomer to be modified is LYS.
          2. the atom CM is added to LYS. It bonds to atom NZ in LYS, and the bond order is "SINGLE".
          3. As no other information is provided, acedrg satisfies the valence requirement of atom CM by adding three H atoms, each bonded to CM. That means a methyl group is added, instead of just one C atom. One of the H atoms bonded to NZ is deleted automatically.
        • Example 2
                 MOD: RES-NAME HIS DELETE ATOM HD1 CHANGE CHARGE ND1 0
                                   
          In this example:
          1. the monomer to be modified is HIS.
          2. the atom HD1 is deleted.
          3. the charge on atom ND1 is changed to zero.
        • Example 3
                 MOD: RES-NAME HIS DELETE ATOM HE2
                                   
          In this example:
          1. the monomer to be modified is HIS.
          2. the atom HE2 is deleted.
          3. after HE2 is deleted, the valence requirement of NE2 is not satisfied. As no other condition is given, the bond orders and charges will be re-calculated.
        • Example 4
            MOD: RES-NAME LYS ADD ATOM CM1 C 0  ADD ATOM CM2 C 0 ADD ATOM CM3 C 0 ADD BOND NZ CM1 SINGLE ADD BOND NZ CM2 SINGLE ADD BOND OXT CM3 SINGLE
                                   
          In this example:
          1. the monomer to be modified is LYS.
          2. three atoms, CM1, CM2 and CM3, are added to the monomer.
          3. H atoms associated with atoms CM1, CM2, CM3 are added automatically
          4. three bonds are added, all of which are SINGLE.
        • Example 5
            MOD: RES-NAME HY3 ADD ATOM O3 O 0 ADD BOND C4 O3 SINGLE DELETE ATOM O2 ADD ATOM HB3 H 0 ADD BOND HB3 C3 SINGLE
                                   
          In this example:
          1. the monomer to be modified is HY3.
          2. atom O3 is added to the monomer, together with a bond between C4 and O3.
          3. H atom HB3, bonded to C3, is explicitly instructed to be added; acedrg will make sure that it is added.
          4. atom O2 and its associated H atom HO1 are deleted.
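The LINK and MOD instructions above are plain keyword sequences, so they are easy to generate from a script. The following sketch assembles them in the keyword order used in the examples and writes a one-line instruction file. It also includes a simplified hydrogen count for neutral C, N and O atoms that reproduces the reasoning of Example 1 in the Modification section (a new C atom with one single bond needs three H atoms). All function names are hypothetical, and the hydrogen count is an illustration, not acedrg's actual algorithm.

```python
from typing import Optional, Sequence


def link_instruction(res1: str, atom1: str, res2: str, atom2: str,
                     file1: Optional[str] = None, file2: Optional[str] = None,
                     extra: Sequence[str] = ()) -> str:
    """Build a one-line LINK instruction. 'extra' holds optional clauses such
    as 'BOND-TYPE DOUBLE' or 'DELETE ATOM O4A 2', copied verbatim."""
    parts = ["LINK:", "RES-NAME-1", res1]
    if file1:
        parts += ["FILE-1", file1]
    parts += ["ATOM-NAME-1", atom1, "RES-NAME-2", res2]
    if file2:
        parts += ["FILE-2", file2]
    parts += ["ATOM-NAME-2", atom2, *extra]
    return " ".join(parts)


def mod_instruction(res_name: str, *clauses: str) -> str:
    """Build a one-line MOD instruction from clauses like 'ADD ATOM CM C 0'."""
    return " ".join(["MOD:", "RES-NAME", res_name, *clauses])


def write_instruction(path: str, line: str) -> None:
    """Write the instruction file; the format requires a single line."""
    if "\n" in line:
        raise ValueError("instruction must fit on one line")
    with open(path, "w") as handle:
        handle.write(line + "\n")


# Typical valences of neutral atoms: an illustrative assumption only.
NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def neutral_h_count(element: str, bond_order_sum: int) -> int:
    """Number of H atoms needed to satisfy the valence of a neutral atom."""
    return max(NEUTRAL_VALENCE[element] - bond_order_sum, 0)
```

For instance, write_instruction("my_instructions.txt", mod_instruction("LYS", "ADD ATOM CM C 0", "ADD BOND NZ CM SINGLE")) produces the instruction file of Example 1, ready for acedrg -L my_instructions.txt -o my_modification.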

REFERENCES

    1. Fei Long, Robert A Nicholls, Paul Emsley, Saulius Gražulis, Andrius Merkys, Antanas Vaitkus and Garib N Murshudov "ACEDRG: A stereo-chemical description generator for ligands" Acta Cryst. (2017), D73, 112-122.
    2. Fei Long, Robert A Nicholls, Paul Emsley, Saulius Gražulis, Andrius Merkys, Antanas Vaitkus and Garib N Murshudov "Validation and extraction of stereochemical information from small molecular databases" Acta Cryst. (2017), D73, 103-111.

AUTHORS AND CREDITS

Fei Long (flong@mrc-lmb.cam.ac.uk) and Garib N Murshudov (garib@mrc-lmb.cam.ac.uk) for most ideas and programming
Robert A Nicholls (nicholls@mrc-lmb.cam.ac.uk) for statistical validation of the tables used in ACEDRG
Paul Emsley and Robert A Nicholls for systematic testing

Special thanks to the CCP4 core team for work on distribution, and to the participants in the Ligand Forum for useful discussions.


HOW TO CITE ACEDRG

The main reference for ACEDRG is:

Fei Long, Robert A Nicholls, Paul Emsley, Saulius Gražulis, Andrius Merkys, Antanas Vaitkus and Garib N Murshudov
"ACEDRG: A stereo-chemical description generator for ligands"
Acta Cryst. (2017), D73, 112-122.

The following references should also be included when citing ACEDRG, because the software makes frequent use of the cheminformatics software RDKit and the CCP4 program REFMAC:

For RDKit cite
RDKit Documentation


For REFMAC cite
Murshudov, G. N., Skubak, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A.
REFMAC5 for the refinement of macromolecular crystal structures
Acta Cryst. (2011), D67, 355-367

SEE ALSO

RDKit, REFMAC