XIA2 (CCP4: Unsupported Program)

NAME

xia2 - automated data reduction.

SYNOPSIS

xia2 -xinfo project.xinfo

xia2 is a new automated data reduction system designed to work from raw diffraction data and a little metadata, and to produce usefully reduced data in a form suitable for immediately starting phasing and structure solution, e.g. through MrBUMP or your favourite experimental phasing suite.

The following steps are performed (although not strictly in this order)

There is a lot of fun stuff which goes on in there - run it to find out more...

The system is designed to cope with MAD data and data measured in multiple passes, or a combination of the two. The input is therefore a little complex, but consists only of information you should have to hand anyway. Here is a simple example; a more complex example can be found below.

! This is a demonstration .xinfo file which illustrates how to cope
! with a simple case - this example is a native cubic insulin data
! set measured on 14.2 at the SRS

BEGIN PROJECT DEMONSTRATION

BEGIN CRYSTAL INSULIN

BEGIN AA_SEQUENCE

! this is only really needed at the moment for assessing the solvent
! content and number of residues in the asu

GIVEQCCASVCSLYQLENYCN
FVNQHLCGSHLVEALYLVCGERGFFYTPKA

END AA_SEQUENCE

BEGIN WAVELENGTH NATIVE

! this doesn't have to be here - if it is 
! not included then the values from 
! the image headers will be used - however 
! if it is there then it should
! be correct!

WAVELENGTH 0.979000

! in here you can also have
! F' value
! F'' value

END WAVELENGTH NATIVE

BEGIN SWEEP NATIVE
WAVELENGTH NATIVE
IMAGE insulin_1_001.img

! you will probably need to change this - 
! this is the only thing which 
! you will need to change for the 
! demonstration data set

DIRECTORY /media/data1/graeme/demo/

! additionally you can add the following 
! information - if it is wrong in the headers
! BEAM x y (mm)
! DISTANCE z (mm)

! this describes the order in which 
! the sweeps were collected - 
! it usually comes from the image header 
! if that information is in there
! EPOCH 5

! you can also add this to only reduce 
! a subset of the data 
! START_END 1 30 (image numbers)

END SWEEP 

END CRYSTAL INSULIN

END PROJECT DEMONSTRATION
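
The nesting of BEGIN/END blocks in the example above can be read with a few lines of Python. The following is only a minimal sketch and is not the parser xia2 itself uses: the function name parse_xinfo, the dictionary layout and the shortened SAMPLE text are all assumptions made for illustration. It skips blank lines and comment lines, and accepts both the "END SWEEP" and "END SWEEP NAME" forms seen in the examples.

```python
def parse_xinfo(text):
    """Parse .xinfo text into nested dicts (illustrative sketch only)."""
    root = {"kind": "ROOT", "name": "", "records": [], "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with "!" or "#" carry no data.
        if not line or line[0] in "!#":
            continue
        tokens = line.split()
        if tokens[0] == "BEGIN":
            if len(tokens) < 2:
                raise ValueError("BEGIN without a block type")
            block = {"kind": tokens[1], "name": " ".join(tokens[2:]),
                     "records": [], "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
        elif tokens[0] == "END":
            # Only the block type is checked; the name after END is optional.
            if len(tokens) < 2 or len(stack) == 1 or stack[-1]["kind"] != tokens[1]:
                raise ValueError("unbalanced END: " + line)
            stack.pop()
        else:
            # Anything else is a record belonging to the innermost open block.
            stack[-1]["records"].append(line)
    if len(stack) != 1:
        raise ValueError("missing END for " + stack[-1]["kind"])
    return root


# Shortened copy of the demonstration file above (sequence block omitted).
SAMPLE = """
BEGIN PROJECT DEMONSTRATION
BEGIN CRYSTAL INSULIN
BEGIN WAVELENGTH NATIVE
! optional - image header values are used if this is absent
WAVELENGTH 0.979000
END WAVELENGTH NATIVE
BEGIN SWEEP NATIVE
WAVELENGTH NATIVE
IMAGE insulin_1_001.img
DIRECTORY /media/data1/graeme/demo/
END SWEEP
END CRYSTAL INSULIN
END PROJECT DEMONSTRATION
"""

if __name__ == "__main__":
    project = parse_xinfo(SAMPLE)["children"][0]
    for crystal in project["children"]:
        for block in crystal["children"]:
            print(crystal["name"], block["kind"], block["name"], block["records"])
```

A tree of this kind makes it easy to list, per crystal, which sweeps and wavelengths have been declared before running xia2.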
    

This uses test images which are available from the links below. Thanks to John Cowan for providing this test data!

Here are a couple of templates - for native data from two sweeps, SAD data from one sweep, MAD data from three wavelengths - all you need to do is select an appropriate one and add your own information! Please note - if you wish to comment something out from a .xinfo file, simply add a "!" at the beginning of the appropriate lines.

More examples can be found in $DPA_ROOT/examples. Remember - you need to tell xia2 where to find your data! Answers to frequently asked questions can be found here.

To assist in generating the .xinfo file I have written a small program, xia2setup, which is passed the directory where your data are stored and writes an illustrative .xinfo file to the standard output. It simply numbers the wavelengths and sweeps, so the names it assigns will not be meaningful, and it takes all of its information from the image headers, which are sometimes unreliable.

The new program is run thus:

xia2setup -atom se -project foo -crystal bar \
-beam 109,105 /path/to/image/directory > automatic.xinfo
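
If you want to drive xia2setup from a script, the command above can be assembled and its standard output captured as follows. This is a sketch only: the helper names xia2setup_command and write_xinfo are made up for illustration, only the options shown above are used, and treating -atom and -beam as optional is an assumption rather than documented behaviour.

```python
import subprocess


def xia2setup_command(image_dir, project, crystal, atom=None, beam=None):
    """Build the xia2setup argument list from the options shown above."""
    command = ["xia2setup"]
    if atom is not None:  # assumption: -atom may be left out
        command += ["-atom", atom]
    command += ["-project", project, "-crystal", crystal]
    if beam is not None:  # assumption: -beam may be left out
        command += ["-beam", "%g,%g" % beam]
    command.append(image_dir)
    return command


def write_xinfo(command, xinfo_path):
    """Run the command, sending its standard output to xinfo_path."""
    with open(xinfo_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


if __name__ == "__main__":
    cmd = xia2setup_command("/path/to/image/directory", "foo", "bar",
                            atom="se", beam=(109, 105))
    print(" ".join(cmd))
    # write_xinfo(cmd, "automatic.xinfo")  # needs xia2setup on the PATH
```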
    

This will look for sequence files (.seq), scan files (.scan) and image files (.img, .osc, .mar345, .mccd, .cbf) and make a good guess as to what is going on. If you have a scan file in there and you have chooch installed, then xia2 will have a stab at identifying the different wavelengths. This is a guess and may be wrong. For this to work I have made the following assumptions:

If you want to combine data from a number of crystals in the same .xinfo file, then you will need to copy out all text from BEGIN CRYSTAL to END CRYSTAL from one .xinfo file to the other.

You should then load automatic.xinfo in your favourite editor, and check that the sequence looks correct and that the names are sensible as well as checking that the epoch numbers are set correctly and also that the wavelengths and beam centres are correct. If you provided a heavy atom there is a place to say how many to look for.

Finally, if you have labelit installed xia2setup will run this to update the beam positions. If this happens, you will see a comment to this effect above the BEAM records in the sweeps.

Platforms

The following platforms are supported (see the notes at the bottom of the page for platform-specific advice).

Supported Detector Types

The following detectors are supported:

If you have another detector class which needs support, please get in touch.

Requirements

The only software which is actually required is CCP4 6.0.1/2 and Python 2.4.x. On Windows the PyWin32 module is also required, to provide additional process control functionality. If you are using this on a Windows platform and wish to be able to use the integration functionality, you should get in touch with Harry Powell for a custom build of Mosflm.

In addition the autoindexing program Labelit is supported but the system will work without it. If it is installed (and it should be a moderately recent version) then it will be used for autoindexing.

License

This software is distributed under the BSD license for all users. Everyone who uses this software is invited to join the xia2bb mailing list. Finally, if you use this software in solving a structure which is published, please acknowledge it! Thanks!

A copy of the license is available here.

This software depends upon CCP4 and will also make use of Labelit - users are reminded that it is their responsibility to have properly obtained licenses for this software.

Usage

Once the software is installed, the first thing you need to do is prepare a .xinfo file (example above) to run from. This contains a description of the experiment, which will allow xia2 to decide how to handle your data and what information to put in the MTZ headers.

Essentially the input file just needs a description of the data which were collected from each crystal - for instance, if you measured three MAD wavelengths, a peak, inflection and high remote, then there should be three WAVELENGTH blocks in the file, containing the fp, fpp and wavelength values. The images that you measured from these should then be described in the SWEEP block, which is then assigned to a wavelength. This structure simplifies handling of multiple passes contributing to one wavelength. In the reduced output, the wavelength names will correspond to MTZ datasets. If you are interested in keeping the harvesting information, and you have more than one crystal to process in a single xia2 run, you will need to ensure that the WAVELENGTH names are globally unique within the input file.
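
The requirement that WAVELENGTH names be globally unique is easy to check before starting a long run. The sketch below is not part of xia2: the function names and the SAMPLE text are hypothetical, and it only looks at BEGIN CRYSTAL and BEGIN WAVELENGTH lines.

```python
from collections import defaultdict


def wavelength_owners(xinfo_text):
    """Map each WAVELENGTH block name to the crystals that define it."""
    owners = defaultdict(list)
    crystal = None
    for raw in xinfo_text.splitlines():
        tokens = raw.split()
        # Comment lines start with "!" or "#", so they never match BEGIN.
        if len(tokens) < 3 or tokens[0] != "BEGIN":
            continue
        if tokens[1] == "CRYSTAL":
            crystal = tokens[2]
        elif tokens[1] == "WAVELENGTH":
            owners[tokens[2]].append(crystal)
    return dict(owners)


def duplicated_wavelengths(xinfo_text):
    """Return only the wavelength names defined more than once."""
    return {name: crystals
            for name, crystals in wavelength_owners(xinfo_text).items()
            if len(crystals) > 1}


# Hypothetical input in which two crystals both use the name NATIVE.
SAMPLE = """
BEGIN CRYSTAL A
BEGIN WAVELENGTH NATIVE
END WAVELENGTH NATIVE
END CRYSTAL A
BEGIN CRYSTAL B
BEGIN WAVELENGTH NATIVE
END WAVELENGTH NATIVE
BEGIN WAVELENGTH PEAK
END WAVELENGTH PEAK
END CRYSTAL B
"""

if __name__ == "__main__":
    for name, crystals in duplicated_wavelengths(SAMPLE).items():
        print("WAVELENGTH %s is defined in crystals: %s" % (name, ", ".join(crystals)))
```

Renaming one of the clashing blocks (and the WAVELENGTH record in the sweeps that refer to it) resolves the clash.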

The program is actually run with:

xia2 -xinfo project.xinfo
    

All data reduction is performed in the working directory, with a directory structure CRYSTAL/WAVELENGTH/SWEEP. Interesting log files can be found in the CRYSTAL/scale directory.
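
To see what a run has left in CRYSTAL/scale, something like the following can be used. It is a sketch that assumes only the directory layout described above; the function name and the file name used in the demonstration are hypothetical, and no particular log file names are assumed.

```python
import tempfile
from pathlib import Path


def scale_directory_files(working_dir, crystal):
    """Return the files in CRYSTAL/scale under working_dir, sorted by name."""
    scale_dir = Path(working_dir) / crystal / "scale"
    if not scale_dir.is_dir():
        return []
    return sorted(p for p in scale_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Build a throwaway tree so the sketch runs without real xia2 output.
    with tempfile.TemporaryDirectory() as tmp:
        scale = Path(tmp) / "INSULIN" / "scale"
        scale.mkdir(parents=True)
        (scale / "example.log").write_text("placeholder\n")  # hypothetical name
        for path in scale_directory_files(tmp, "INSULIN"):
            print(path.name)
```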

Acknowledgements

Development of xia2 is supported by the BBSRC e-HTPX grant and the EU framework 6 BioXHit initiative. xia2 makes extensive use of Phil Evans's program Pointless, and I would like to thank him for the extensive modifications made to the program. I would also like to thank the US structural genomics JCSG group for providing their extensive collection of raw diffraction data to methods developers. Finally I would like to acknowledge CCP4, for without their software this project would go nowhere!

The More Complex Example

This example has two crystals, one of which is selenomethionated, with two wavelengths (inflection and low remote) measured for the derivative and two native sweeps measured for the native sample. Lines beginning with # or ! will be treated as comments.

BEGIN PROJECT TS01
BEGIN CRYSTAL 12847

BEGIN AA_SEQUENCE

MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF

END AA_SEQUENCE

BEGIN WAVELENGTH NATIVE
WAVELENGTH 0.99187
END WAVELENGTH NATIVE

! high resolution native pass

BEGIN SWEEP NATIVE_HR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_4_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_HR

! low resolution native pass

BEGIN SWEEP NATIVE_LR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_5_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_LR

END CRYSTAL 12847

BEGIN CRYSTAL 13140

BEGIN AA_SEQUENCE

MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF

END AA_SEQUENCE

BEGIN HA_INFO
ATOM SE
NUMBER_PER_MONOMER 5
END HA_INFO

BEGIN WAVELENGTH INFL
WAVELENGTH 0.979741
F' -10.0
F'' 3.2
END WAVELENGTH INFL

BEGIN WAVELENGTH LREM
WAVELENGTH 1.019859
F' -2.6
F'' 0.55
END WAVELENGTH LREM

BEGIN SWEEP INFL
WAVELENGTH INFL
BEAM 108.7 102.0
IMAGE 13140_1_E1_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP

BEGIN SWEEP LREM
WAVELENGTH LREM
BEAM 108.7 102.0
IMAGE 13140_1_E2_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP

END CRYSTAL 13140

END PROJECT TS01
    

This is a particularly complex example, but it illustrates the capability of the system well. There are data from two crystals, one native and the other selenomethionated. The native set is measured in two sweeps. Currently (version 0.2.6.3) no attempt is made to combine data from multiple crystals; however, this is on the to-do list.

Release Notes

Changes since 0.2.6.2

Changes since 0.2.6.1

Changes since 0.2.6.0

Changes since 0.2.5.2

Changes since 0.2.5.1 - big changes in bold

Changes since 0.2.5

Changes since 0.2.4

Changes since 0.2.3

Changes since 0.2.2.4:

Changes since 0.2.2.3:

Changes since 0.2.2.2:

Platform Specific Advice

Gfortran compiled CCP4

Recent versions of the gcc compiler suite have included a new compiler called "gfortran" in place of "g77". This has, unfortunately, been designed with buffering too high on the agenda, and the output from programs can sometimes be mashed. If this happens you need to set "GFORTRAN_UNBUFFERED_ALL" to 1 in your environment.
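
If you start xia2 from a script rather than an interactive shell, the variable can be set for that one run only. This is a minimal sketch: the helper names are made up for illustration, and the xia2 command line is the one given in the Usage section.

```python
import os
import subprocess


def unbuffered_environment(base=None):
    """Return a copy of the environment with GFORTRAN_UNBUFFERED_ALL set to 1."""
    env = dict(os.environ if base is None else base)
    env["GFORTRAN_UNBUFFERED_ALL"] = "1"
    return env


def run_xia2(xinfo_path):
    """Run xia2 so that it, and the programs it starts, inherit the setting."""
    return subprocess.run(["xia2", "-xinfo", xinfo_path],
                          env=unbuffered_environment(), check=True)


if __name__ == "__main__":
    print(unbuffered_environment({"PATH": "/usr/bin"}))
    # run_xia2("project.xinfo")  # needs xia2 on the PATH
```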

Linux

There should be no particular problems on this platform - assuming that you have a recent version of Python. Any 2.4 version should work, but I have mostly used 2.4.3 and 2.4.4, and would recommend them. If the version you have on your system by default (type "python" to find out what version you have) is too old, you can easily install a new version, perhaps in the xia directory, from here. This is best installed by unpacking the tarball (tar xvfz Python-2.4.4.tgz) then doing ./configure --prefix=/where/I/want/it then make, make install.
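
The version check can also be scripted. The sketch below takes the text the interpreter prints when it starts (for example "Python 2.4.4") and tests it against the 2.4 series named in the Requirements section; the function names are hypothetical, and whether later series also work is not something this page states, so the check is deliberately strict.

```python
import re


def parse_python_version(banner):
    """Extract (major, minor, micro) from text such as 'Python 2.4.4'."""
    match = re.search(r"Python (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError("no Python version found in %r" % banner)
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def meets_xia2_requirement(banner):
    """True if the version belongs to the 2.4 series."""
    return parse_python_version(banner)[:2] == (2, 4)


if __name__ == "__main__":
    for banner in ("Python 2.3.5", "Python 2.4.4"):
        verdict = "ok" if meets_xia2_requirement(banner) else "needs replacing"
        print(banner, "-", verdict)
```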

OS X 10.4

The only potential problem with this platform is that the Python which comes as standard is 2.3 - however a universal binary is available from here, which should sort out any problems. Just follow the instructions in the disk image.

Windows XP

Once more you will need a recent version of Python, which can be found here. You will also need "pywin32" to provide a little more job control functionality - this is available from here. Both are simple, double-click, binary installers.

There is now a version of Labelit available for Windows, which can be found here.

The mosflm binary which comes with iMosflm works well with xia2.

AUTHOR

Graeme Winter