xia2 is a new automated data reduction system designed to work from raw diffraction data and a little metadata, and to produce usefully reduced data in a form suitable for immediately starting phasing and structure solution, e.g. through MrBUMP or your favourite experimental phasing suite.
The following steps are performed (although not strictly in this order):
There is a lot of fun stuff which goes on in there - run it to find out more...
The system is designed to cope with MAD data and data measured in multiple passes, or a combination of the two. The input is therefore a little complex, but consists only of information you should have to hand anyway. Here is a simple example; a more complex example can be found below.
! This is a demonstration .xinfo file which illustrates how to cope
! with a simple case - this example is a native cubic insulin data
! set measured on 14.2 at the SRS
BEGIN PROJECT DEMONSTRATION
BEGIN CRYSTAL INSULIN
BEGIN AA_SEQUENCE
! this is only really needed at the moment for assessing the solvent
! content and number of residues in the asu
GIVEQCCASVCSLYQLENYCN
FVNQHLCGSHLVEALYLVCGERGFFYTPKA
END AA_SEQUENCE
BEGIN WAVELENGTH NATIVE
! this doesn't have to be here - if it is
! not included then the values from
! the image headers will be used - however
! if it is there then it should
! be correct!
WAVELENGTH 0.979000
! in here you can also have
! F' value
! F'' value
END WAVELENGTH NATIVE
BEGIN SWEEP NATIVE
WAVELENGTH NATIVE
IMAGE insulin_1_001.img
! you will probably need to change this -
! this is the only thing which
! you will need to change for the
! demonstration data set
DIRECTORY /media/data1/graeme/demo/
! additionally you can add the following
! information - if it is wrong in the headers
! BEAM x y (mm)
! DISTANCE z (mm)
! this describes the order in which
! the sweeps were collected -
! it usually comes from the image header
! if that information is in there
! EPOCH 5
! you can also add this to only reduce
! a subset of the data
! START_END 1 30 (image numbers)
END SWEEP
END CRYSTAL INSULIN
END PROJECT DEMONSTRATION
This uses test images which are available from the links below. Thanks to John Cowan for providing this test data!
Here are some templates, for native data from two sweeps, SAD data from one sweep and MAD data from three wavelengths: all you need to do is select an appropriate one and add your own information! Please note: if you wish to comment something out of a .xinfo file, simply add a "!" at the beginning of the appropriate lines.
More examples can be found in $DPA_ROOT/examples. Remember - you need to tell xia2 where to find your data! Answers to frequently asked questions can be found here.
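Because the .xinfo format is just nested BEGIN/END blocks plus comment lines, it is easy to read from a script, for instance to sanity-check a file before a long run. The Python 3 sketch below is not xia2's own parser; it assumes that every line other than a comment, BEGIN or END line is a record belonging to the innermost open block, and that repeating the block name on the END line is optional (as with the bare END SWEEP in the example above). The function name parse_xinfo and its dict layout are hypothetical.

```python
def parse_xinfo(text):
    """Parse the nested BEGIN/END blocks of an .xinfo file into dicts.

    Hypothetical helper, not part of xia2: every line that is not a comment,
    BEGIN or END line is kept verbatim as a record of the innermost block.
    """
    root = {"type": "ROOT", "name": None, "records": [], "children": []}
    stack = [root]
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and lines starting with "!" or "#" are comments.
        if not line or line[0] in "!#":
            continue
        tokens = line.split()
        name = " ".join(tokens[2:]) or None
        if tokens[0] == "BEGIN" and len(tokens) > 1:
            block = {"type": tokens[1], "name": name, "records": [], "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
        elif tokens[0] == "END" and len(tokens) > 1:
            current = stack[-1]
            # END must close the innermost open block; the name is optional.
            if current is root or tokens[1] != current["type"] or (
                name is not None and name != current["name"]
            ):
                raise ValueError(f"line {number}: unexpected {line!r}")
            stack.pop()
        else:
            # Records such as "DIRECTORY /path with spaces/" are kept whole.
            stack[-1]["records"].append(line)
    if len(stack) > 1:
        block = stack[-1]
        raise ValueError(f"unclosed block: BEGIN {block['type']} {block['name']}")
    return root


if __name__ == "__main__":
    example = """
! a cut-down version of the insulin example
BEGIN PROJECT DEMONSTRATION
BEGIN CRYSTAL INSULIN
BEGIN WAVELENGTH NATIVE
WAVELENGTH 0.979000
END WAVELENGTH NATIVE
BEGIN SWEEP NATIVE
WAVELENGTH NATIVE
IMAGE insulin_1_001.img
DIRECTORY /media/data1/graeme/demo/
END SWEEP
END CRYSTAL INSULIN
END PROJECT DEMONSTRATION
"""
    crystal = parse_xinfo(example)["children"][0]["children"][0]
    for block in crystal["children"]:
        print(block["type"], block["name"], block["records"])
```

Raising an error on a mismatched END line is a deliberate choice here: a block closed in the wrong place silently moves records into the wrong wavelength or sweep, which is exactly the kind of mistake that is easy to make when copying blocks between files.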
To assist in generating the .xinfo file I have written a small program, xia2setup, which is given the directory where your data are stored and writes an illustrative .xinfo file to standard output. It simply numbers the wavelengths and sweeps rather than giving them meaningful names, and gets all of its information from the image headers, which are sometimes unreliable.
The new program is run thus:
xia2setup -atom se -project foo -crystal bar \
    -beam 109,105 /path/to/image/directory > automatic.xinfo
This will look for sequence files (.seq), scan files (.scan) and image files (.img, .osc, .mar345, .mccd, .cbf) and make a good guess as to what is going on. If you have a scan file in there and you have Chooch installed, then xia2 will have a stab at identifying the different wavelengths. This is a guess and may be wrong. For this to work I have made the following assumptions:
If you want to combine data from a number of crystals in the same .xinfo file, you will need to copy all of the text from BEGIN CRYSTAL to END CRYSTAL from one .xinfo file into the other.
You should then load automatic.xinfo in your favourite editor and check that the sequence looks correct, that the names are sensible, that the epoch numbers are set correctly, and that the wavelengths and beam centres are correct. If you provided a heavy atom, there is a place to say how many to look for.
Finally, if you have Labelit installed, xia2setup will run it to update the beam positions. If this happens, you will see a comment to this effect above the BEAM records in the sweeps.
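The guessing described above can be illustrated with a short Python 3 sketch that sorts a directory listing by the file extensions xia2setup looks for and groups numbered images into candidate sweeps. This is an assumption about how such grouping might work, not xia2setup's actual code: the survey function is hypothetical, and it assumes the image number is the last run of digits before the extension, replacing those digits with "#" characters to form a template.

```python
import os
import re
from collections import defaultdict

# Image extensions taken from the description of xia2setup.
IMAGE_EXTENSIONS = {".img", ".osc", ".mar345", ".mccd", ".cbf"}

# Assumption: the image number is the last run of digits before the extension.
NUMBERED_IMAGE = re.compile(r"^(.*?)(\d+)(\.[A-Za-z0-9]+)$")


def survey(filenames):
    """Sort file names into sequence files, scan files and candidate sweeps.

    Hypothetical helper, not xia2setup's implementation. Candidate sweeps are
    keyed by a template in which the image number is replaced by '#'.
    """
    sequences, scans = [], []
    sweeps = defaultdict(list)
    for name in sorted(filenames):
        extension = os.path.splitext(name)[1].lower()
        if extension == ".seq":
            sequences.append(name)
        elif extension == ".scan":
            scans.append(name)
        elif extension in IMAGE_EXTENSIONS:
            match = NUMBERED_IMAGE.match(os.path.basename(name))
            if match:
                prefix, digits, suffix = match.groups()
                template = prefix + "#" * len(digits) + suffix
                sweeps[template].append(int(digits))
    return {
        "sequence": sequences,
        "scan": scans,
        # Image numbers within each candidate sweep, in ascending order.
        "sweeps": {t: sorted(numbers) for t, numbers in sweeps.items()},
    }


if __name__ == "__main__":
    import sys

    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    print(survey(os.listdir(directory)))
```

Grouping on the template alone cannot tell two passes apart if they share a file name prefix, which is one reason the generated .xinfo file still needs checking by hand.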
The following platforms are supported (see the notes at the bottom of the page for platform-specific advice):
The following detectors are supported:
If you have another detector class which needs support, please get in touch.
The only software actually required is CCP4 (6.0.1 or 6.0.2) and Python 2.4.x. On Windows the PyWin32 module is also required, to provide additional process control functionality. If you are using xia2 on Windows and wish to be able to use the integration functionality, you should get in touch with Harry Powell for a custom build of Mosflm.
In addition, the autoindexing program Labelit is supported, but the system will work without it. If it is installed (and it should be a moderately recent version), it will be used for autoindexing.
This software is distributed under the BSD license for all users. Everyone who uses this software is invited to join the xia2bb mailing list. Finally, if you use this software in solving a structure which is published, please acknowledge it! Thanks!
A copy of the license is available here.
This software depends upon CCP4 and will also make use of Labelit - users are reminded that it is their responsibility to have properly obtained licenses for this software.
Once the software is installed, the first thing you need to do is prepare a .xinfo file (example above) to run from. This contains a description of the experiment, which will allow xia2 to decide how to handle your data and what information to put in the MTZ headers.
Essentially the input file just needs a description of the data which were collected from each crystal. For instance, if you measured three MAD wavelengths (peak, inflection and high remote), there should be three WAVELENGTH blocks in the file, containing the wavelength, F' and F'' values. The images that you measured at each wavelength should then be described in SWEEP blocks, each of which is assigned to a wavelength. This structure simplifies handling of multiple passes contributing to one wavelength. In the reduced output, the wavelength names will correspond to MTZ datasets. If you are interested in keeping the harvesting information and you have more than one crystal to process in a single xia2 run, you will need to ensure that the WAVELENGTH names are globally unique within the input file.
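Before running xia2 on a file with several crystals, a quick check of the names can save a wasted run. The Python 3 sketch below assumes the .xinfo contents have already been read into a plain dict, with one entry per crystal listing its WAVELENGTH names and which wavelength each SWEEP uses; that layout and the check_names function are hypothetical. It also assumes a sweep may only refer to a wavelength declared in its own crystal, as in the examples on this page.

```python
from collections import Counter


def check_names(crystals):
    """Return a list of naming problems; an empty list means none were found.

    Hypothetical helper: `crystals` maps a crystal name to
    {"wavelengths": [names], "sweeps": {sweep name: wavelength name}}.
    """
    problems = []
    # WAVELENGTH names should be unique across the whole input file.
    declared = Counter(
        wavelength
        for crystal in crystals.values()
        for wavelength in crystal["wavelengths"]
    )
    for name, count in sorted(declared.items()):
        if count > 1:
            problems.append(f"WAVELENGTH {name} is declared {count} times")
    # Assumption: each SWEEP refers to a wavelength of its own crystal.
    for crystal_name, crystal in crystals.items():
        for sweep, wavelength in crystal["sweeps"].items():
            if wavelength not in crystal["wavelengths"]:
                problems.append(
                    f"SWEEP {sweep} in CRYSTAL {crystal_name} "
                    f"refers to undeclared WAVELENGTH {wavelength}"
                )
    return problems


if __name__ == "__main__":
    ts01 = {
        "12847": {"wavelengths": ["NATIVE"],
                  "sweeps": {"NATIVE_HR": "NATIVE", "NATIVE_LR": "NATIVE"}},
        "13140": {"wavelengths": ["INFL", "LREM"],
                  "sweeps": {"INFL": "INFL", "LREM": "LREM"}},
    }
    print(check_names(ts01) or "names look consistent")
```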
The program is actually run with:
xia2 -xinfo project.xinfo
All data reduction is performed in the working directory, using a directory structure CRYSTAL/WAVELENGTH/SWEEP. Interesting log files can be found in the CRYSTAL/scale directory.
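As a rough guide to where output will appear, the Python 3 sketch below lists the directories implied by the CRYSTAL/WAVELENGTH/SWEEP layout, plus the CRYSTAL/scale directory holding the scaling logs. The expected_directories function and its input format are hypothetical, and the exact set of directories xia2 creates may differ between versions.

```python
from pathlib import PurePosixPath


def expected_directories(crystals, root="."):
    """List CRYSTAL/WAVELENGTH/SWEEP directories and each CRYSTAL/scale directory.

    Hypothetical helper: `crystals` maps a crystal name to a dict of
    {sweep name: wavelength name}, mirroring the SWEEP blocks of an .xinfo file.
    PurePosixPath keeps '/' separators regardless of platform.
    """
    directories = []
    for crystal, sweeps in crystals.items():
        for sweep, wavelength in sweeps.items():
            directories.append(str(PurePosixPath(root, crystal, wavelength, sweep)))
        # Scaling log files are collected per crystal.
        directories.append(str(PurePosixPath(root, crystal, "scale")))
    return directories


if __name__ == "__main__":
    ts01 = {
        "12847": {"NATIVE_HR": "NATIVE", "NATIVE_LR": "NATIVE"},
        "13140": {"INFL": "INFL", "LREM": "LREM"},
    }
    for directory in expected_directories(ts01):
        print(directory)
```

To check which of these actually exist after a run, the same strings can be passed to pathlib.Path(...).is_dir().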
Development of xia2 is supported by the BBSRC e-HTPX grant and the EU framework 6 BioXHit initiative. xia2 makes extensive use of Phil Evans's program Pointless, and I would like to thank him for the extensive modifications made to the program. I would also like to thank the US structural genomics JCSG group for providing their extensive collection of raw diffraction data to methods developers. Finally I would like to acknowledge CCP4, for without their software this project would go nowhere!
This example has two crystals, one of which is selenomethionated, with two wavelengths (inflection and low remote) measured for the derivative and two native sweeps measured for the native sample. Lines beginning with # or ! will be treated as comments.
BEGIN PROJECT TS01
BEGIN CRYSTAL 12847
BEGIN AA_SEQUENCE
MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF
END AA_SEQUENCE
BEGIN WAVELENGTH NATIVE
WAVELENGTH 0.99187
END WAVELENGTH NATIVE
! high resolution native pass
BEGIN SWEEP NATIVE_HR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_4_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_HR
! low resolution native pass
BEGIN SWEEP NATIVE_LR
WAVELENGTH NATIVE
BEAM 109.0 105.0
IMAGE 12847_5_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.2.1/20050121/collection/TM0892/12847/
END SWEEP NATIVE_LR
END CRYSTAL 12847
BEGIN CRYSTAL 13140
BEGIN AA_SEQUENCE
MKVKKWVTQDFPMVEESATVRECLHRMRQYQTNECIVKDREGHFRGVVNKEDLLDLDLDSSVFNKVSLPD
FFVHEEDNITHALLLFLEHQEPYLPVVDEEMRLKGAVSLHDFLEALIEALAMDVPGIRFSVLLEDKPGEL
RKVVDALALSNINILSVITTRSGDGKREVLIKVDAVDEGTLIKLFESLGIKIESIEKEEGF
END AA_SEQUENCE
BEGIN HA_INFO
ATOM SE
NUMBER_PER_MONOMER 5
END HA_INFO
BEGIN WAVELENGTH INFL
WAVELENGTH 0.979741
F' -10.0
F'' 3.2
END WAVELENGTH INFL
BEGIN WAVELENGTH LREM
WAVELENGTH 1.019859
F' -2.6
F'' 0.55
END WAVELENGTH LREM
BEGIN SWEEP INFL
WAVELENGTH INFL
BEAM 108.7 102.0
IMAGE 13140_1_E1_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP
BEGIN SWEEP LREM
WAVELENGTH LREM
BEAM 108.7 102.0
IMAGE 13140_1_E2_001.img
DIRECTORY /Volumes/Arthur/JCSG Data/1vr9/data/jcsg/als1/8.3.1/20050105/collect/TM0892/13140/
END SWEEP
END CRYSTAL 13140
END PROJECT TS01
This is a particularly complex example, but it illustrates the capabilities of the system well. There are data from two crystals, one native and the other selenomethionated. The native data were measured in two sweeps. Currently (version 0.2.6.3) no attempt is made to combine data from multiple crystals; however, this is on the to-do list.
Changes since 0.2.6.2
xia2 -project TG6623 -crystal X77788 -atom se /my/images/are/here
However, this relies on your image headers being accurate and the images having some kind of recognisable format...
Changes since 0.2.6.1
Changes since 0.2.6.0
Changes since 0.2.5.2
xia2 -ehtpx_xml_out project.xml
Changes since 0.2.5.1 - big changes in bold
-trust_timestamps can be used, which will use the time stamps on the image files to analyse things like radiation damage.
-migrate_data to the command line - the data will be removed from the local disk once the processing is finished.
Changes since 0.2.5
DIRECTORY $DATA/example or ~/data or %DATA%/example (win32)
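The DIRECTORY forms above rely on environment variable and home directory expansion. A minimal Python 3 sketch of equivalent expansion with the standard library follows; it is an illustration under that assumption rather than a description of xia2's code, and resolve_directory is a hypothetical name. Note that os.path.expandvars only recognises the %DATA% form when Python is running on Windows.

```python
import os


def resolve_directory(value):
    """Expand environment variables and a leading ~ in a DIRECTORY value.

    Hypothetical helper for illustration. On Windows os.path.expandvars also
    understands %DATA%; elsewhere only $DATA and ${DATA} are expanded, and
    undefined variables are left unchanged.
    """
    expanded = os.path.expanduser(os.path.expandvars(value))
    # normpath tidies up doubled or trailing separators.
    return os.path.normpath(expanded)


if __name__ == "__main__":
    for value in ("$DATA/example", "~/data", "%DATA%/example"):
        print(value, "->", resolve_directory(value))
```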
Changes since 0.2.4
Changes since 0.2.3
Changes since 0.2.2.4:
Changes since 0.2.2.3:
Changes since 0.2.2.2:
Recent versions of the gcc compiler suite include a new Fortran compiler called "gfortran" in place of "g77". Unfortunately, it buffers program output quite aggressively, and the output from programs can sometimes come out garbled. If this happens, you need to set the environment variable "GFORTRAN_UNBUFFERED_ALL" to 1.
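Setting the variable in a Bourne-style shell is export GFORTRAN_UNBUFFERED_ALL=1, and in csh or tcsh it is setenv GFORTRAN_UNBUFFERED_ALL 1. If you launch gfortran-compiled programs from your own Python 3 script instead, a sketch like the one below sets the variable only for the child process, leaving your own environment untouched; run_unbuffered is a hypothetical helper, not part of xia2.

```python
import os
import subprocess
import sys


def run_unbuffered(command):
    """Run `command` with GFORTRAN_UNBUFFERED_ALL=1 and return its standard output.

    Hypothetical helper: the variable is added to a copy of the current
    environment, so os.environ in this process is not modified.
    """
    environment = dict(os.environ, GFORTRAN_UNBUFFERED_ALL="1")
    result = subprocess.run(
        command, env=environment, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Demonstrate with a child Python process that echoes the variable back.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['GFORTRAN_UNBUFFERED_ALL'])"]
    print(run_unbuffered(probe).strip())
```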
There should be no particular problems on this platform, assuming that you have a recent version of Python. Any 2.4 version should work, but I have mostly used 2.4.3 and 2.4.4 and would recommend them. If the version on your system by default is too old (type "python -V" to find out which version you have), you can easily install a new version, perhaps in the xia directory, from here. This is best installed by unpacking the tarball (tar xvfz Python-2.4.4.tgz), then running ./configure --prefix=/where/I/want/it, followed by make and make install.
The only potential problem with this platform is that the Python which comes as standard is 2.3 - however a universal binary is available from here, which should sort out any problems. Just follow the instructions in the disk image.
Once more you will need a recent version of Python, which can be found here; however, you will also need "pywin32" to provide a little more job control functionality, which is available from here. Both are simple, double-click binary installers.
There is now a version of Labelit available for Windows, which can be found here.
The mosflm binary which comes with iMosflm works well with xia2.