crank - automated structure solution pipeline for SAD/MAD or SIRAS data.
Crank [1] is a program to automate macromolecular structure determination from single- or multiple-wavelength anomalous diffraction (SAD/MAD) or single isomorphous replacement with anomalous scattering (SIRAS) experiments. Crank interfaces with various crystallographic programs and is designed both to automate the structure determination process and to let the user re-run steps and optimize results, if necessary. Users can start at either the substructure detection or the substructure phasing step and can end at any stage after the initial step.
This version of Crank has interfaces to the programs CRUNCH2 [2] and SHELXD [3] for substructure detection, BP3 [4], [5] for substructure phasing, SOLOMON [6], DM [7], SHELXE [8], PARROT, PIRATE [9] and RESOLVE [23] for density modification, and RESOLVE [24], BUCCANEER [25] and ARP/wARP [10] for automated model building. ARP/wARP uses REFMAC [11] for iterative refinement. Within REFMAC, either the likelihood function restraining phases via Hendrickson-Lattman coefficients [12] or a multivariate likelihood SAD function [13] is used. To calculate the FA values needed for substructure detection, Crank interfaces with the programs SHELXC [14] or AFRO [15]. For setting up and preparing files, Crank uses programs from the CCP4 [16] suite, including SFTOOLS [17] and TRUNCATE [18]. Crank also uses the Kantardjieff-Rupp algorithm [19], which performs a probabilistic Matthews coefficient [20] calculation to estimate the number of monomers in the asymmetric unit. To visualize the results produced by Crank, an interface to COOT [26] is also available.
Crank can be run using its CCP4i [21] interface or via a script using the program GCX [22]. Crank's only dependency for producing a density-modified map is a licensed CCP4 installation, version 5.99.x or later. If you would like to use the SHELX programs (SHELXC [14], SHELXD [3], SHELXE [8]), ARP/wARP [10], RESOLVE [23], [24] and/or BUCCANEER [25] within Crank, you must have them installed on your system with the appropriate licence. If these programs are not in your path, they will not appear as options in the CCP4i interface.
Crank can be run either through its CCP4i interface or via a script using the program GCX. Currently, the CCP4i interface offers more options. To see how to run Crank via GCX, please consult the program's documentation in the programs/gcx/doc subdirectory of the Crank distribution. To start the Crank CCP4i interface, type the command:
Then, using the main CCP4i menu on the far left hand side of the interface, select "Experimental Phasing", then select "Automated Search & Phasing" and see "Crank - automated EP pipeline" (or within the "Program List", scroll down to "Crank" and click it).
Below, descriptions of the crank CCP4i fields are given.
A short descriptive title for the experiment to appear in the CCP4i task window
Select your experiment type: single-wavelength anomalous diffraction (SAD), single isomorphous replacement with anomalous scattering (SIRAS), two-, three- or four-wavelength MAD (2W-MAD, 3W-MAD, 4W-MAD), or two-, three- or four-wavelength MAD with a native data set (2W-MADN, 3W-MADN, 4W-MADN).
Click this button if you wish to input the protein sequence in PIR format, for use in automated model building and refinement and in estimating the solvent content. Crank will then display the total number of amino acid residues. If you do not wish to input the protein sequence, unclick the button and input the number of protein residues per monomer.
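Crank counts the residues itself once the sequence is supplied. As a rough illustration of what that involves, the sketch below counts residues in a PIR-format string, assuming the usual PIR layout: a `>P1;name` header line, one free-text description line, then sequence lines terminated by `*`. The function name `count_pir_residues` and the example sequence are hypothetical; this is not Crank's own parser.

```python
from __future__ import annotations


def count_pir_residues(text: str) -> dict[str, int]:
    """Count residues per entry in a PIR-format sequence string (sketch).

    Assumes each entry is: a '>P1;name' header, one description line,
    then sequence lines terminated by '*'.
    """
    counts: dict[str, int] = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        if not header.startswith(">"):
            i += 1
            continue
        # The entry name follows the ';' in the header line.
        name = header[1:].split(";", 1)[-1].strip()
        i += 2  # skip the header and the free-text description line
        seq_lines = []
        while i < len(lines) and not lines[i].lstrip().startswith(">"):
            seq_lines.append(lines[i])
            i += 1
        # Drop whitespace, cut at the '*' terminator, count residue letters.
        seq = "".join("".join(seq_lines).split()).split("*", 1)[0]
        counts[name] = sum(ch.isalpha() for ch in seq)
    return counts


if __name__ == "__main__":
    demo = ">P1;demo\nHypothetical test protein\nMKVLAAGIV\nLLSA*\n"
    print(count_pir_residues(demo))
```

The description line is skipped explicitly because it is free text and may contain letters that would otherwise be counted as residues.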
If you have DNA and/or RNA, click on the DNA/RNA button and input the number of nucleotides per monomer.
The name of your input MTZ file. At the moment, this must contain merged intensities or structure factor amplitudes from your experiment.
Choose whether you wish to input Intensities or Structure factor amplitudes.
By default, CRANK will create an R-free flag. Alternatively, you can specify an existing R-free column label present in the MTZ IN file.
The name of the output MTZ file. The intermediate steps run by Crank may also write MTZ files; see the section on INTERPRETING RESULTS for more information.
You will now have to input information on your substructure atom and the mtz columns for your data.
Give your anomalous or heavy substructure atom. The name must correspond to an atom in CCP4's library file ($CLIBD/atomsf.lib).
Give the number of anomalous/heavy atoms expected per monomer. The total number of substructure atoms looked for (in the asymmetric unit) will be this number multiplied by the number given or obtained in the "number of monomers in the asu" field in the Required parameters section.
If your data were collected at the CuKalpha wavelength, click this box and the f' and f" values for your atom will be obtained automatically. Otherwise, input the f' and f" values; to get the best possible results, please give reasonable values. If you only know the wavelength and did not measure the values with a fluorescence scan, you can use the CCP4 program CROSSEC to get an estimate.
Input the mtz columns corresponding to your merged mtz file. If you have a significant anomalous signal, input the anomalous intensities.
Once the number of protein residues and/or nucleotides is given, Crank will attempt to guess the solvent content, overall B-factor and number of monomers in the asymmetric unit. Crank obtains a first guess for the solvent content and the number of monomers in the asymmetric unit using the functional form proposed by Kantardjieff and Rupp [19]; these values are filled in automatically. If you would like to see the Matthews coefficient [20], Kantardjieff-Rupp probability and solvent content corresponding to a different number of monomers, simply enter that number in the box and click the "Guess Overall B, solvent content..." button again; the updated parameters will be shown.
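The classical Matthews relation behind these estimates can be sketched as follows. It computes V_M = V_cell / (n_sym * n_mon * M) and the solvent fraction 1 - 1.23/V_M for a trial number of monomers n_mon, and also the total number of substructure sites searched for (anomalous/heavy atoms per monomer times monomers in the asymmetric unit). It does not implement the Kantardjieff-Rupp probability that Crank reports; the average residue mass of 110 Da, the function names and the example cell are assumptions for illustration only.

```python
import math

# Assumption: average mass of one amino-acid residue, in daltons.
AVG_RESIDUE_MASS_DA = 110.0


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (A^3) from edge lengths (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews(volume, n_sym_ops, residues_per_monomer, n_monomers):
    """Return (V_M in A^3/Da, solvent fraction) for a trial monomer count."""
    mass = residues_per_monomer * AVG_RESIDUE_MASS_DA
    vm = volume / (n_sym_ops * n_monomers * mass)
    # 1.23 corresponds to a protein partial specific volume of 0.74 cm^3/g.
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


def total_substructure_atoms(atoms_per_monomer, n_monomers):
    """Number of substructure sites searched for in the asymmetric unit."""
    return atoms_per_monomer * n_monomers


if __name__ == "__main__":
    # Hypothetical orthorhombic cell in P2(1)2(1)2(1) (4 symmetry operators),
    # 250 residues per monomer and 3 Se atoms per monomer.
    vol = cell_volume(60.0, 70.0, 80.0, 90.0, 90.0, 90.0)
    for n in (1, 2):
        vm, solv = matthews(vol, 4, 250, n)
        print(f"{n} monomer(s): V_M = {vm:.2f}, solvent = {solv:.0%}, "
              f"sites = {total_substructure_atoms(3, n)}")
```

In this made-up cell the solvent fraction drops from about 60% for one monomer to about 19% for two, which is the kind of comparison the "Guess Overall B, solvent content..." button lets you make.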
This option allows you to start or end the pipeline at a certain step.
If you choose to start at a step that requires inputting a substructure, and you would like to input a substructure in pdb format, the format of that pdb file should be the following:
HETATM    1 SE   HAT     1      25.284  28.195  17.180  1.00 33.96
OR
ATOM      1 SE   HAT     1      25.284  28.195  17.180  1.00 33.96
The fixed column layout follows the PDB format, but the atom name (the third field, PDB columns 13-16) must be the name of your substructure atom and must match an atom in $CLIBD/atomsf.lib. See the file gere.pdb in the test subdirectory of the main Crank directory for an example.
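If you need to write such a file by hand, the sketch below formats one site using the standard PDB fixed columns (record name 1-6, serial 7-11, atom name 13-16, residue name 18-20, residue number 23-26, coordinates 31-54, occupancy 55-60, B-factor 61-66). The helper name `substructure_record` is hypothetical; it does not check the atom name against `$CLIBD/atomsf.lib`, and it leaves blank the optional PDB fields (chain ID, element symbol) that the example line also omits.

```python
def substructure_record(serial, atom, x, y, z, occ=1.0, b=20.0,
                        record="HETATM", resseq=1):
    """Format one substructure site as a fixed-column PDB ATOM/HETATM line.

    Residue name 'HAT' follows the example above. The atom name is NOT
    checked against $CLIBD/atomsf.lib. The default B-factor is arbitrary.
    """
    if record not in ("ATOM", "HETATM"):
        raise ValueError("record must be 'ATOM' or 'HETATM'")
    # PDB convention: one-letter element names start in column 14,
    # two-letter names (e.g. SE) in column 13.
    name = f" {atom:<3}" if len(atom) == 1 else f"{atom:<4}"
    # Columns: 1-6 record, 7-11 serial, 13-16 name, 18-20 residue name,
    # 22 chain (blank), 23-26 residue number, 31-54 x/y/z,
    # 55-60 occupancy, 61-66 B-factor.
    return (f"{record:<6}{serial:>5} {name} HAT  {resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


if __name__ == "__main__":
    print(substructure_record(1, "SE", 25.284, 28.195, 17.180, 1.00, 33.96))
```

Run as a script, this reproduces the HETATM example line in the PDB column layout; pass `record="ATOM"` for the ATOM variant.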
This section allows you to choose the experimental pipeline you would like to run. At the moment, five predefined pipelines are available:
Click this option if you wish to adjust any of the program options. If you click it, you can remove all programs by clicking the "Clear All" button, which removes the experimental palette pipeline. You can construct your own pipeline by selecting the program you wish to use next with the "Next possible program:" option, followed by the "Add program" button. However, this should only be used by experts who know exactly what they are doing! If you do not wish to run the last program listed in the Crank pipeline, it can be removed with the "Edit list" and "Delete last item" options. You can also see the flow of information (e.g. MTZ columns and substructures) by clicking "Show all pipeline input columns".
Below, a description of all the crank plugins as well as some of the more important modifiable options is given.
PREP is the Crank plugin that uses ctruncate [17] to calculate structure factor amplitudes from intensities and scaleit [15] to scale the data sets relative to one another. Most of the options given are self-explanatory.
AFRO is the Crank plugin for the AFRO [15] program, which calculates the FA values needed by substructure detection programs. Some of the important fields that can be modified to improve results in substructure detection are the following:
CRUNCH2 [2] is a substructure detection program using Karle-Hauptmann matrices.
BP3 [4], [5] is a substructure phasing program. The "Fast phasing" option can be toggled on and off if you would like to have a quicker run of BP3. Fast phasing is the default for MAD phasing.
SOLOMON's [6] interface can attempt to determine the correct hand, optimize the solvent content, and perform a density modification run.
DM [7] is a density modification program. If you have a metalloprotein, it is probably best to unclick the "Histogram matching" option. As with SOLOMON, the DM interface can be used to determine the correct hand, optimize the solvent content and, of course, perform a complete density modification run.
PIRATE [9] is a density modification program. Consider changing the weight applied to the input phases if you believe that the input phases are biased.
SHELXC [14] is the program to generate files (including FA values) needed for SHELXD and SHELXE.
SHELXD [3] can be used to determine substructures, if the SHELX suite is installed. Important options to consider are the following.
Crank also has an interface to the density modification program SHELXE [8], which can be used to determine the correct hand, optimize the solvent content and perform a complete density modification run.
ARPWARP is the Crank plugin for the automated model-building program ARP/wARP [10], which performs iterative refinement with REFMAC [11].
Crank uses a hierarchical directory structure to store the runs of the various crystallographic programs. There are two types of directories in the Crank hierarchy: directories where crystallographic programs are run, and directories where information is collected.
The names of directories in which programs are run always start with a number followed by a dash and the program name, as in "3-crunch2" or "4-bp3". The number indicates the program's position in the pipeline.
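Because the step number is only a name prefix, plain alphabetical sorting would place "12-name" before "3-name". A minimal sketch of listing run directories in pipeline order, assuming only the `<number>-<program>` naming convention described above (the helper `pipeline_steps` and the "12-arpwarp" directory are hypothetical, not part of Crank):

```python
import re
import tempfile
from pathlib import Path

# Run directories are named "<step>-<program>", e.g. "3-crunch2" or "4-bp3".
RUN_DIR = re.compile(r"^(\d+)-(.+)$")


def pipeline_steps(crank_dir):
    """Return [(step, program, path), ...] for run directories, in pipeline order."""
    steps = []
    for entry in Path(crank_dir).iterdir():
        match = RUN_DIR.match(entry.name)
        # Keep only directories whose names follow "<step>-<program>".
        if match and entry.is_dir():
            steps.append((int(match.group(1)), match.group(2), entry))
    # Sort numerically on the step so that 12 comes after 3.
    return sorted(steps, key=lambda s: (s[0], s[1]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("3-crunch2", "4-bp3", "12-arpwarp"):  # hypothetical layout
            (Path(tmp) / name).mkdir()
        for step, program, _ in pipeline_steps(tmp):
            print(step, program)
```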
Each of these directories is where the corresponding crystallographic program is run, so it contains all the files produced by that run. Crank constructs these directories and first builds a shell script to run the program; this script is designed to be as close as possible to the example scripts provided by the program's author. Crank then copies all the data files the program requires into the run directory.
Then, Crank simply executes the shell script that it has built, timing the run and collecting results.
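The run-directory pattern described above (write a shell script, copy the input files, execute the script and time it) can be sketched in Python as follows. This is a hypothetical helper illustrating the general technique, not Crank's implementation; the names `run_step`, `run.sh` and `run.log` are assumptions.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def run_step(run_dir, script_body, input_files=()):
    """Hypothetical helper: prepare run_dir, run a shell script there, time it.

    Returns (exit_code, elapsed_seconds); stdout and stderr go to run.log.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    # Copy the required data files so the program works on local copies.
    for path in input_files:
        shutil.copy(path, run_dir)
    script = run_dir / "run.sh"
    script.write_text("#!/bin/sh\n" + script_body)
    start = time.perf_counter()
    result = subprocess.run(["sh", script.name], cwd=run_dir,
                            capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    (run_dir / "run.log").write_text(result.stdout + result.stderr)
    return result.returncode, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.mtz"
        data.write_text("placeholder")  # stands in for a real MTZ file
        code, secs = run_step(Path(tmp) / "3-demo",
                              "test -f data.mtz && echo found\n", [data])
        print(code, f"{secs:.3f} s")
```

Keeping the script on disk next to the copied inputs means each step can be inspected or re-run by hand later, which matches the re-run and optimize use case described at the start of this document.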
When automated structure solution fails, or when you want to optimize the results, the first step is to identify which step (i.e. substructure detection or phasing) failed or produced sub-optimal results. If a particular step failed, the best place to start is the documentation of the individual program. It may also be useful to look at the Crank test system, which gives an indication of the statistics from successful jobs as well as diagnostics (and suggestions) from jobs that failed with default settings.
Last modified: Tue Feb 17 11:39:19 CET 2009