xia2 Automated Data Reduction

Sourceforge project page.

New! I have now set up a mailing list for xia2 on SourceForge, where new versions of xia2 are announced and problems can be discussed. If you would like to subscribe, please go to the list page. Also, if you are interested in keeping up to date with what is going on in xia2 development, take a look at the blog.

xia2 is a new automated data reduction system designed to work from raw diffraction data and a little metadata, and to produce usefully reduced data in a form suitable for immediately starting phasing and structure solution, e.g. through MrBUMP or your favourite experimental phasing suite.

The following steps are performed (although not strictly in this order)

There is a lot of fun stuff which goes on in there - run it to find out more...

The system is designed to cope with MAD data and data measured in multiple passes, or a combination of the two. The input is therefore a little complex, but consists only of information you should have to hand anyway - here is a simple example, and a more complex example can be found below.

Download

The current version, 0.3.6.0, can be obtained by following these links:

All of the files are the same, simply packed for different platforms.

Installation

Since everything is Python using cctbx, the only requirement is to export XIA2_ROOT to point to the directory where xia2 was unpacked (including the xia2-0.3.6.0 bit) and then source $XIA2_ROOT/setup.(c)sh.

Input

There are two ways of running xia2 - with and without an input file, an example of which follows below. If you just run
xia2 /my/data/are/here
xia2 will do something sensible - in this case process all of the data, scale all measurements as if they are from a single crystal and merge the data from each wavelength separately. If only one wavelength is present xia2 will assume that the data are a native data set - to separate anomalous pairs provide a heavy atom (at this time it doesn't matter what it is...) i.e.
xia2 -atom se /my/data/are/here
Other options are (type just xia2 to get this list)
Command-line options to xia2:
[-2d] or [-3d] or [-3dii]
[-parallel 4] (say, for XDS usage)
[-resolution 2.8] (say, applies to all sweeps)
[-freer_file free.mtz]
[-quick]
[-atom se] (say)
[-reversephi]
[-migrate_data]
[-beam x,y]
Running this way some assumptions are made:

If you want to combine data from a number of crystals in the same .xinfo file, then you will need to copy out all text from BEGIN CRYSTAL to END CRYSTAL from one .xinfo file to the other.

You should then load automatic.xinfo in your favourite editor and check that the sequence looks correct, that the names are sensible, that the epoch numbers are set correctly, and that the wavelengths and beam centres are correct. If you provided a heavy atom there is a place to say how many to look for.

Finally, if you have labelit installed xia2 will run this to update the beam positions. If this happens, you will see a comment to this effect above the BEAM records in the sweeps.

The other mechanism for running xia2 is via an xinfo file, which explains the layout of the data set to xia2 explicitly. This is helpful if you wish to process only a subset of the measurements, or want to process data for an RIP experiment. A simple example of an xinfo file follows below, and more complex examples can be found here:

More examples can be found in $XIA2_ROOT/examples. Remember - you need to tell xia2 where to find your data! Answers to frequently asked questions can be found here.

! This is a demonstration .xinfo file which illustrates how to cope
! with a simple case - this example is a native cubic insulin data
! set measured on 14.2 at the SRS

BEGIN PROJECT DEMONSTRATION

BEGIN CRYSTAL INSULIN

BEGIN AA_SEQUENCE

! this is only really needed at the moment for assessing the solvent
! content and number of residues in the asu

GIVEQCCASVCSLYQLENYCN
FVNQHLCGSHLVEALYLVCGERGFFYTPKA

END AA_SEQUENCE

BEGIN WAVELENGTH NATIVE

! this doesn't have to be here - if it is
! not included then the values from
! the image headers will be used - however
! if it is there then it should
! be correct!

WAVELENGTH 0.979000

! in here you can also have
! F' value
! F'' value

END WAVELENGTH NATIVE

BEGIN SWEEP NATIVE
WAVELENGTH NATIVE
IMAGE insulin_1_001.img

! you will probably need to change this -
! this is the only thing which
! you will need to change for the
! demonstration data set

DIRECTORY /media/data1/graeme/demo/

! additionally you can add the following
! information - if it is wrong in the headers
! BEAM x y (mm)
! DISTANCE z (mm)

! this describes the order in which
! the sweeps were collected -
! it usually comes from the image header
! if that information is in there
! EPOCH 5

! you can also add this to only reduce
! a subset of the data
! START_END 1 30 (image numbers)

END SWEEP

END CRYSTAL INSULIN

END PROJECT DEMONSTRATION
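
Every BEGIN in an .xinfo file has to be closed by a matching END, so a quick structural check before running xia2 can catch editing slips. The following is a minimal sketch only and is not xia2's own parser: the function name check_xinfo_blocks is hypothetical, it assumes that an END without a name (such as END SWEEP above) closes the innermost open block of that kind, and it treats lines starting with ! or # as comments. It is written for a current Python 3 rather than the Python that xia2 itself runs under.

```python
def check_xinfo_blocks(text):
    """Return a list of problems with BEGIN/END nesting (empty if none found)."""
    problems = []
    open_blocks = []  # stack of (kind, name, line number) for unclosed BEGINs
    for number, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.split()
        # Skip blank lines and comment lines starting with ! or #
        if not tokens or tokens[0][0] in "!#":
            continue
        # Only BEGIN/END records matter for the nesting check
        if tokens[0] not in ("BEGIN", "END") or len(tokens) < 2:
            continue
        kind = tokens[1]
        name = tokens[2] if len(tokens) > 2 else None
        if tokens[0] == "BEGIN":
            open_blocks.append((kind, name, number))
            continue
        if not open_blocks:
            problems.append("line %d: END %s has no matching BEGIN"
                            % (number, kind))
            continue
        open_kind, open_name, open_line = open_blocks.pop()
        # The kind must match; the name is only compared when END gives one
        if kind != open_kind or (name is not None and name != open_name):
            problems.append("line %d: END %s does not close BEGIN %s from line %d"
                            % (number, kind, open_kind, open_line))
    for kind, name, number in open_blocks:
        problems.append("line %d: BEGIN %s is never closed" % (number, kind))
    return problems
```

Calling check_xinfo_blocks(open("project.xinfo").read()) returns an empty list when the nesting is consistent, and one message per problem otherwise.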
    

Demonstration Data: Cubic Insulin

This uses test images which are available from the links below. Thanks to John Cowan for providing this test data!

Platforms

The following platforms are supported (see the notes at the bottom of the page for platform-specific advice).

Supported Detector Types

The following detectors are supported:

If you have another detector class which needs support, please get in touch.

Requirements

The only software which is actually required is CCP4 6.0.1/2 and Python 2.4.x. On Windows the PyWin32 module is also required, to provide additional process control functionality. If you are using this on a Windows platform and wish to be able to use the integration functionality, you should get in touch with Harry Powell for a custom build of Mosflm.

In addition the autoindexing program Labelit is supported but the system will work without it. If it is installed (and it should be a moderately recent version) then it will be used for autoindexing.

License

This software is distributed under the BSD license for all users. Everyone who uses this software is invited to join the xia2bb mailing list. Finally, if you use this software in solving a structure which is published, please acknowledge it! Thanks!

A copy of the license is available here.

This software depends upon CCP4 and will also make use of Labelit - users are reminded that it is their responsibility to have properly obtained licenses for this software.

Usage

Once the software is installed, the first thing you need to do is prepare a .xinfo file (example above) to run from. This contains a description of the experiment, which will allow xia2 to decide how to handle your data and what information to put in the MTZ headers.

Essentially the input file just needs a description of the data which were collected from each crystal - for instance, if you measured three MAD wavelengths, a peak, inflection and high remote, then there should be three WAVELENGTH blocks in the file, containing the F', F'' and wavelength values. The images that you measured at these wavelengths should then be described in SWEEP blocks, each of which is assigned to a wavelength. This structure simplifies handling of multiple passes contributing to one wavelength. In the reduced output, the wavelength names will correspond to MTZ datasets. If you are interested in keeping the harvesting information, and you have more than one crystal to process in a single xia2 run, you will need to ensure that the WAVELENGTH names are globally unique within the input file.
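
The requirement that WAVELENGTH names be unique across crystals is easy to check mechanically. The following is a minimal sketch, not part of xia2: the function name duplicated_wavelength_names is hypothetical, and it assumes the BEGIN CRYSTAL and BEGIN WAVELENGTH records look like those in the examples on this page. It is written for a current Python 3.

```python
from collections import defaultdict


def duplicated_wavelength_names(text):
    """Map each repeated WAVELENGTH name to the crystals that declare it."""
    crystal = None
    declared = defaultdict(list)  # wavelength name -> crystals declaring it
    for raw in text.splitlines():
        tokens = raw.split()
        # Only named BEGIN records are of interest; comments never match
        if len(tokens) < 3 or tokens[0] != "BEGIN":
            continue
        if tokens[1] == "CRYSTAL":
            crystal = tokens[2]
        elif tokens[1] == "WAVELENGTH":
            declared[tokens[2]].append(crystal)
    return {name: crystals for name, crystals in declared.items()
            if len(crystals) > 1}
```

An empty dictionary means every name is used once; otherwise each key is a name to rename in one of the listed crystals, for example NATIVE to NATIVE_12847.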

The program is actually run with:

xia2 -xinfo project.xinfo
    

All data reduction is performed in the working directory, with a directory structure CRYSTAL/WAVELENGTH/SWEEP. Interesting log files can be found in the CRYSTAL/scale directory.
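
To see in advance which directories a given input file should produce, the CRYSTAL/WAVELENGTH/SWEEP layout can be derived from the .xinfo file itself. The following is a rough sketch under stated assumptions and is not taken from xia2: the function name expected_directories is hypothetical, it assumes each SWEEP block contains one WAVELENGTH record naming its wavelength, and the directories xia2 actually creates may differ in detail. It is written for a current Python 3.

```python
def expected_directories(text):
    """List the relative directories implied by an .xinfo file."""
    paths = []
    crystal = None
    sweep = None
    for raw in text.splitlines():
        tokens = raw.split()
        # Skip blank lines and comment lines starting with ! or #
        if not tokens or tokens[0][0] in "!#":
            continue
        if tokens[0] == "BEGIN" and len(tokens) >= 3:
            if tokens[1] == "CRYSTAL":
                crystal = tokens[2]
            elif tokens[1] == "SWEEP":
                sweep = tokens[2]
        elif tokens[0] == "WAVELENGTH" and len(tokens) >= 2:
            # Inside a SWEEP block this record names the wavelength used;
            # inside a WAVELENGTH block it is a numeric value and is ignored
            if crystal is not None and sweep is not None:
                paths.append("/".join((crystal, tokens[1], sweep)))
        elif tokens[0] == "END" and len(tokens) >= 2:
            if tokens[1] == "SWEEP":
                sweep = None
            elif tokens[1] == "CRYSTAL" and crystal is not None:
                paths.append(crystal + "/scale")
                crystal = None
    return paths
```

For the simple insulin example above this gives INSULIN/NATIVE/NATIVE followed by INSULIN/scale.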

Acknowledgements

Development of xia2 is supported by the BBSRC e-HTPX grant and the EU framework 6 BioXHit initiative. xia2 makes extensive use of Phil Evans's program Pointless, and I would like to thank him for the extensive modifications made to the program. I would also like to thank the US structural genomics JCSG group for providing their extensive collection of raw diffraction data to methods developers. Finally I would like to acknowledge CCP4, for without their software this project would go nowhere!

The More Complex Example

This example has two crystals, one of which is selenomethionated, with two wavelengths (inflection and low remote) measured for the derivative and two native sweeps measured for the native sample. Lines beginning with # or ! will be treated as comments.

BEGIN PROJECT TS01
BEGIN CRYSTAL 12847

BEGIN AA_SEQUENCE

MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF

END AA_SEQUENCE

BEGIN WAVELENGTH NATIVE
WAVELENGTH 0.99187
END WAVELENGTH NATIVE

! high resolution native pass

BEGIN SWEEP NATIVE_HR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_4_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_HR

! low resolution native pass

BEGIN SWEEP NATIVE_LR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_5_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_LR

END CRYSTAL 12847

BEGIN CRYSTAL 13140

BEGIN AA_SEQUENCE

MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF

END AA_SEQUENCE

BEGIN HA_INFO
ATOM SE
NUMBER_PER_MONOMER 5
END HA_INFO

BEGIN WAVELENGTH INFL
WAVELENGTH 0.979741
F' -10.0
F'' 3.2
END WAVELENGTH INFL

BEGIN WAVELENGTH LREM
WAVELENGTH 1.019859
F' -2.6
F'' 0.55
END WAVELENGTH LREM

BEGIN SWEEP INFL
WAVELENGTH INFL
BEAM 108.7 102.0
IMAGE 13140_1_E1_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP

BEGIN SWEEP LREM
WAVELENGTH LREM
BEAM 108.7 102.0
IMAGE 13140_1_E2_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP

END CRYSTAL 13140

END PROJECT TS01
    

This is a particularly complex example, but it illustrates the capability of the system well. There are data from two crystals, one native and the other selenomethionated. The native set is measured in two sweeps. Currently (version 0.3.5.1) no attempt is made to intelligently combine data from multiple crystals; however, this is on the to-do list.

Release Notes

Changes since 0.3.5.2: dealing with huge data sets

Changes since 0.3.5.1: CCP4 patch release

Changes since 0.3.5.0

Changes since 0.3.4.0

Changes since 0.3.3.4

Changes since 0.3.3.3

Changes since 0.3.3.2

Changes since 0.3.3.1

Changes since 0.3.3.0

Changes since 0.3.2.0

Changes since 0.3.1.7

Changes since 0.3.1.6

Changes since 0.3.1.0

Changes since 0.3.0.6

Changes since 0.3.0.5

Changes since 0.3.0.4

Changes since 0.3.0.3

Changes since 0.3.0.0

Changes since 0.2.7.2

Changes since 0.2.7.0

Changes since 0.2.6.6

Changes since 0.2.6.5

Changes since 0.2.6.4

Changes since 0.2.6.3

Changes since 0.2.6.2

Changes since 0.2.6.1

Changes since 0.2.6.0

Changes since 0.2.5.2

Changes since 0.2.5.1 - big changes in bold

Changes since 0.2.5

Changes since 0.2.4

Changes since 0.2.3

Changes since 0.2.2.4:

Changes since 0.2.2.3:

Changes since 0.2.2.2:

Platform Specific Advice

Gfortran compiled CCP4

Recent versions of the gcc compiler suite include a new compiler called "gfortran" in place of "g77". Unfortunately, gfortran buffers program output rather aggressively, and the output from programs can sometimes come out garbled. If this happens, you need to set "GFORTRAN_UNBUFFERED_ALL" to 1 in your environment.
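
If you launch programs from a script rather than from a login shell, the same setting can be applied to the child process only. The following is a minimal sketch, not part of xia2: the helper names unbuffered_env and run_with_unbuffered_output are hypothetical, and it is written for a current Python 3.

```python
import os
import subprocess


def unbuffered_env(base=None):
    """Return a copy of the environment with gfortran buffering disabled."""
    env = dict(os.environ if base is None else base)
    env["GFORTRAN_UNBUFFERED_ALL"] = "1"
    return env


def run_with_unbuffered_output(command):
    """Run a command (a list of arguments) with the setting applied."""
    return subprocess.run(command, env=unbuffered_env(), check=False)
```

For example, run_with_unbuffered_output(["xia2", "-xinfo", "project.xinfo"]) starts xia2 with the variable set, without changing the environment of the calling script.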

Linux

There should be no particular problems on this platform - assuming that you have a recent version of Python. Any 2.4 version should work, but I have mostly used 2.4.3 and 2.4.4, and would recommend them. If the version you have on your system by default (type "python" to find out what version you have) is too old, you can easily install a new version, perhaps in the xia directory, from here. This is best installed by unpacking the tarball (tar xvfz Python-2.4.4.tgz), then running ./configure --prefix=/where/I/want/it, followed by make and make install.

OS X.4 - X.6

The only potential problem with this platform is that the Python which comes as standard is 2.3 - however since you need cctbx.python anyway the problem has gone away.

Windows XP

Once more you will need a recent version of Python, which can be found here. You will also need "pywin32" to provide a little more job control functionality - this is available from here. Both are simple, double-click, binary installers.

There is now a version of Labelit available for Windows, which can be found here.

The mosflm binary which comes with iMosflm works well with xia2.