freerunique - Convert FreeRflags Between CCP4 and Other Formats (XPLOR/CNS/TNT/SHELX)
For successful cross validation:
Different programs have different philosophies for dealing with FreeR reflections:
CCP4 | first expands the data set to include all possible HKLs to the given resolution, marking those that are unmeasured. It then randomly divides the data set into n partitions, assigning each partition a FreeR_flag value from (0, 1, 2, ..., (n-1)). These cross-validation sets are used in density modification and in refinement. The default FreeR set used in refinement is the one flagged 0, but this can be changed with the keyword FREE x. |
XPLOR | assigns the flag TEST=x. The only acceptable values are TEST=0 (working set) and TEST=1 (free set). |
CNS | assigns the flag TEST=x. The acceptable values range from x=0,1,...,n-1. The defaults are TEST=1 for the free set and all other values for the working set. |
SHELX | has a flag as the last field of the format (3I4,2F8.2,I2). The values are 1 for the working set and -1 for the free set. |
TNT | separates the data into different files: one for the free set and one for the working set. Old versions of SHELX also separated the data into different files. |
It is important to choose a fraction that is large enough for the statistics to be meaningful (the consensus at the moment seems to be at least 500 reflections), but small enough that as many reflections as possible are still used in refinement. This holds whichever philosophy is chosen for selecting the FreeR reflections.
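The following plain-Python sketch puts numbers on this trade-off. The function names are hypothetical and nothing here is part of uniqueify. It computes the approximate free-set size for a given fraction, and the smallest fraction that still gives a chosen minimum, such as the 500 reflections mentioned above. Flags are assigned at random, so real counts scatter around these figures.

```python
import math

def free_set_size(n_reflections: int, fraction: float) -> int:
    """Approximate number of free reflections for a test fraction (rounded down)."""
    return math.floor(n_reflections * fraction)

def min_fraction(n_reflections: int, min_free: int = 500) -> float:
    """Smallest test fraction that still gives about `min_free` free reflections."""
    if n_reflections <= 0:
        raise ValueError("n_reflections must be positive")
    return min(1.0, min_free / n_reflections)

# 10208 reflections (the NREF of the CNS example file further down) at 5%:
print(free_set_size(10208, 0.05))   # 510
print(min_fraction(10208))
```

With small data sets the minimum fraction can climb well above the default 5%. At that point the trade-off against reflections lost to refinement becomes a real decision.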
When you are ready to start the first refinement, or preferably as soon as you collect the native data:
Run uniqueify mydata.mtz.
This script generates an output file mydata-unique.mtz which contains
(H K L F SIGF ( I SIGI ) .. FreeR_flag)
for all observed reflections to the resolution limit available,
plus entries for any unobserved reflection, all with FreeR_flags assigned.
The percentage flagged defaults to 5%, but this can be reset using
uniqueify {-p fraction} mydata.mtz.
The default label is FreeR_flag but this can be reset using
uniqueify {-f FreeLABel} mydata.mtz.
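The random partitioning behind these flags can be sketched as follows. This illustrates the idea under stated assumptions and is not the code used by uniqueify or FREERFLAG. The number of partitions is taken as n = round(1/fraction), which gives 20 for the default 5%. Every reflection draws its flag independently, and the HKL list is a toy grid rather than a real unique set.

```python
import random

def assign_free_flags(hkls, fraction=0.05, seed=None):
    """Give each (h, k, l) a random CCP4-style FreeR_flag in 0..n-1.

    n is taken as round(1 / fraction), so each partition, including the
    default free set (flag 0), holds roughly `fraction` of the reflections.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_sets = round(1 / fraction)
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return {hkl: rng.randrange(n_sets) for hkl in hkls}

# Usage: an 8000-reflection toy list (no symmetry or resolution cut applied).
hkls = [(h, k, l) for h in range(20) for k in range(20) for l in range(20)]
flags = assign_free_flags(hkls, fraction=0.05, seed=1)
n_free = sum(1 for f in flags.values() if f == 0)
print(f"{n_free} of {len(flags)} reflections in the free set")
```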
A complete set of FreeR_flags (similar to that produced for a new data set, see above) can be added to any other related data set using CAD:
cad hklin1 new.mtz hklin2 olddata-unique.mtz hklout new-unique.mtz <<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FreeR_flag
END
eof
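Conceptually, this CAD step copies each flag across by matching Miller indices. Below is a minimal Python sketch of that lookup, using a hypothetical helper rather than CAD itself. It assumes both data sets use the same indexing and the same asymmetric unit.

```python
def transfer_flags(new_hkls, old_flags):
    """Copy FreeR_flags onto a new data set by matching (h, k, l).

    Reflections that the old set lacks get None, i.e. a missing flag
    that still has to be filled in.
    """
    return {hkl: old_flags.get(hkl) for hkl in new_hkls}

old = {(1, 0, 0): 0, (2, 0, 0): 3}
new = [(1, 0, 0), (2, 0, 0), (9, 0, 0)]  # (9, 0, 0): a new high-resolution reflection
print(transfer_flags(new, old))  # {(1, 0, 0): 0, (2, 0, 0): 3, (9, 0, 0): None}
```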
If the new data extend to higher resolution, you now need to run
uniqueify again to pad out the FreeR_flags:
uniqueify {-f FreeLABel} new-unique.mtz new-uniquer.mtz
(the default label for the free set is FreeR_flag, but you can use
whatever you like).
This assigns FreeR_flags to any reflections in the higher-resolution shell that are missing from the previous set of FreeR_flags.
The script also estimates the percentage of data you have used as a test set.
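The padding step can be sketched like this. The helper is hypothetical and is not uniqueify's implementation. It assumes CCP4-style flags 0..n-1 are already present, and it infers n from the largest existing flag unless n is given. The key property is that existing flags, and therefore the original free set, are never changed.

```python
import random

def pad_missing_flags(flags, n_sets=None, seed=None):
    """Return a copy of `flags` with every None replaced by a random flag in 0..n_sets-1.

    If n_sets is not given it is taken as (largest existing flag + 1).
    Existing flags are copied unchanged.
    """
    present = [f for f in flags.values() if f is not None]
    if n_sets is None:
        if not present:
            raise ValueError("no existing flags to infer n_sets from")
        n_sets = max(present) + 1
    rng = random.Random(seed)
    return {hkl: rng.randrange(n_sets) if flag is None else flag
            for hkl, flag in flags.items()}

flags = {(1, 0, 0): 0, (2, 0, 0): 19, (9, 0, 0): None}
print(pad_missing_flags(flags, seed=0))
```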
You can use the jiffy MTZ2VARIOUS to convert quite simply from MTZ to XPLOR/CNS, TNT or SHELX format. These programs all have different conventions, but MTZ2VARIOUS attempts to reproduce them (see the program documentation: MTZ2VARIOUS).
XPLOR | output will have TEST=0 for working set; TEST=1 for free set |
CNS | output will have TEST=1 for free set; TEST=0,2,...,(n-1) for working set |
SHELX | output will have 1 as the flag for the working set, and -1 for free set |
TNT | output may be split into two files |
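The flag translation implied by these output conventions can be written as a small function. This sketches the mapping in the table above and is not the code used by MTZ2VARIOUS; `ccp4_to_other` is a hypothetical name. For CNS it assumes that the free partition and partition 1 simply swap values, which yields TEST=1 for the free set and TEST=0,2,...,(n-1) for the working set. TNT is left out because it separates the sets into files rather than writing a flag.

```python
def ccp4_to_other(flag, fmt, free_value=0):
    """Translate one CCP4 FreeR_flag (0..n-1) into the XPLOR, CNS or SHELX convention.

    `free_value` is the CCP4 flag of the free set (0 by default, as in refinement).
    """
    is_free = flag == free_value
    if fmt == "XPLOR":
        return 1 if is_free else 0          # TEST=1 free, TEST=0 working
    if fmt == "SHELX":
        return -1 if is_free else 1         # -1 free, 1 working
    if fmt == "CNS":
        # Assumed renumbering: swap the free partition with partition 1,
        # so the free set becomes TEST=1 and the other partitions stay distinct.
        if is_free:
            return 1
        return free_value if flag == 1 else flag
    raise ValueError(f"unsupported format: {fmt!r}")

for fmt in ("XPLOR", "CNS", "SHELX"):
    print(fmt, [ccp4_to_other(f, fmt) for f in range(4)])
# XPLOR [1, 0, 0, 0]
# CNS [1, 0, 2, 3]
# SHELX [-1, 1, 1, 1]
```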
# test set flagged with TEST=1, working set with TEST=0
#
mtz2various \
hklin pc553_19f-unique.mtz \
HKLOUT xplor.hkl \
<<eof
# All these labels can be set and will be handled appropriately:
#
LABIN FP=F SIGFP=SIGF [FPART PHIPART PA PB PC PD PHIB WEIGHT ] FREE=FreeR_flag
OUTPUT CNS/XPLOR
#
END
eof
exit
mtz2various \
hklin lmw.mtz \
HKLOUT shelxout.hkl \
<<eof
OUTPUT SHELX
LABIN FP=FRBP SIGFP=SIGFRBP [IP SIGIP FP(+) FP(-) IP(+) IP(-) ] FREE=FreeR_flag
# This always outputs Is, and rescales the data to fit the format.
# You can override the default by setting SCALE yourself.
SCALE 0.01
#
END
eof
# TNT uses a different asymmetric unit of reciprocal space from CCP4. Dale has
# programs to convert the data if necessary.
# The data are separated into a free set and a working set.
#
mtz2various \
hklin lisa.wright/lmw.mtz \
HKLOUT lisa.wright/tnt_work.hkl \
<<eof
LABIN FP=FP SIGFP=SIGFP FREE=FreeR_flag
OUTPUT TNT
EXCLUDE FREER 0
#
END
eof
#
mtz2various \
hklin lisa.wright/lmw.mtz \
HKLOUT lisa.wright/tnt_free.hkl \
<<eof
LABIN FP=FP SIGFP=SIGFP FREE=FreeR_flag
OUTPUT TNT
INCLUDE FREER 0
#
END
eof
exit
These are all ASCII formats, so F2MTZ can be used in a straightforward way. After any of these conversions, you need to run uniqueify on the resulting MTZ file.
Run uniqueify {-f FreeLABel} mydata.mtz
The script guesses what style of file is being imported by looking at the
distribution of FreeR_flags. It then estimates the percentage of reflections flagged as the FreeR set, pads out the missing reflections and converts the flags to the CCP4 style of (0, 1, ..., (n-1)).
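The exact rules uniqueify uses to recognise each style are not spelled out here. The following hypothetical Python helper implements only one plausible heuristic. For a file with exactly two flag values (XPLOR, SHELX, or TNT files concatenated as described below), it treats the less frequent value as the free set and estimates its fraction. It then maps the free set to 0 and spreads the working set at random over 1..(n-1). Multi-valued CCP4 or CNS flags would need separate handling.

```python
import random
from collections import Counter

def to_ccp4_flags(flags, seed=None):
    """Convert a two-valued flag list to CCP4 style 0..n-1 (heuristic sketch).

    The less frequent value is taken to be the free set and becomes 0;
    working-set reflections are spread at random over 1..n-1, with n chosen
    so that the free set is about 1/n of the data.
    Returns (converted_flags, estimated_free_fraction).
    """
    counts = Counter(flags)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct flag values")
    free_value = min(counts, key=counts.get)   # minority value = free set
    fraction = counts[free_value] / len(flags)
    n_sets = max(2, round(1 / fraction))
    rng = random.Random(seed)
    converted = [0 if f == free_value else rng.randrange(1, n_sets) for f in flags]
    return converted, fraction

shelx_flags = [1] * 95 + [-1] * 5        # SHELX style: -1 marks the free set
ccp4_flags, frac = to_ccp4_flags(shelx_flags, seed=0)
print(f"estimated free fraction: {frac:.2%}")   # estimated free fraction: 5.00%
```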
SHELX "input"
Use F2MTZ and TRUNCATE to convert (H K L I SIGI FreeR_flag) to an MTZ file.
See example.
SHELX "output"
Use F2MTZ (and TRUNCATE) to convert (H K L I SIGI FC PHIC FreeR_flag) to
an MTZ file. See example.
TNT
The easiest way is to append a final column of 1 to the working-set file and
0 to the free-set file, 'cat' the two files together and use F2MTZ.
See example.
CNS/XPLOR
See example.
#
# NREFlection= 10208
# ANOMalous=FALSe { equiv. to HERMitian=TRUE}
# DECLare name=FOBS DOMAin=RECIprocal type=COMP END
# DECLare name=SIGMA DOMAin=RECIprocal type=REAL END
# DECLare name=FPART DOMAin=RECIprocal type=COMP END
# DECLare name=WEIGHT DOMAin=RECIprocal type=REAL END
# DECLare name=TEST DOMAin=RECIprocal type=INTE END
# INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
# INDE     8    0    0 FOBS=   827.600     0.000 SIGMA=    30.983
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0

#!/bin/csh -f
#
f2mtz \
hklin suying/b-over.hkl \
hklout suying/b-over.mtz \
<<eof
# skip the NREF and DECLARE lines
SKIP 7
# For XPLOR you would probably need: SKIP 0
CELL 55.19 79.73 66.68 90.00 90.00 90.00
SYMM C2221
#
# f2mtz assumes a free format without any character data,
# so you must either remove the labels from the file, or design
# a format statement to skip them.
#
# You have to get this format right! nX ignores n characters.
# Count characters:
FORMT '(6x,3F5.0,6X,2f10.0,7X,f10.0,/,25X,2f10.0,8X,F10.0,6x,F10.0)'
#
#1234561234512345123451234561234567890123456789012345671234567890
# INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
#1234567890123456789012345123456789012345678901234567812345678901234561234567890
#                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
#
LABO H K L FRBP PHIB SIGFRBP FPART PHIPART WEIGHT FreeR_flag
CTYPO H H H F P Q F P W I
END
eof
#
uniqueify suying/b-over.mtz
exit
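Counting columns for FORMT is error-prone. One alternative is to pre-process the CNS/XPLOR file in Python. This is an assumption on my part, not a CCP4 tool. The sketch reads each INDE record by keyword rather than by column position. It then writes plain whitespace-separated columns that F2MTZ can read in free format, so SKIP and FORMT can be dropped. The helper names are hypothetical. The fields and their order follow the example file above; files that DECLARE other fields would need the output line adjusted.

```python
import re

# "NAME=" followed by one number, or by two numbers on the same line
# (amplitude and phase of a COMPlex field such as FOBS or FPART).
FIELD = re.compile(r"(\w+)=\s*(-?\d+\.?\d*)(?:[ \t]+(-?\d+\.?\d*))?")
HKL = re.compile(r"\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)")

def parse_cns(text):
    """Return (h, k, l, fields) for every INDE record in CNS/XPLOR reflection text.

    Everything before the first INDE (NREF, ANOM, DECLARE lines) is ignored,
    and a record runs until the next INDE, so continuation lines are handled.
    """
    records = []
    for chunk in re.split(r"\bINDE\b", text)[1:]:
        match = HKL.match(chunk)
        if match is None:
            continue  # malformed record: skipped in this sketch
        h, k, l = map(int, match.groups())
        fields = {}
        for name, first, second in FIELD.findall(chunk[match.end():]):
            fields[name] = (float(first), float(second)) if second else float(first)
        records.append((h, k, l, fields))
    return records

def to_free_format(records):
    """Columns in the order of the LABO line above:
    FRBP PHIB SIGFRBP FPART PHIPART WEIGHT FreeR_flag."""
    lines = []
    for h, k, l, f in records:
        fobs, phib = f["FOBS"]
        fpart, phipart = f["FPART"]
        lines.append(f"{h} {k} {l} {fobs} {phib} {f['SIGMA']} "
                     f"{fpart} {phipart} {f['WEIGHT']} {int(f['TEST'])}")
    return lines

sample = """ NREFlection= 1
 DECLare name=TEST DOMAin=RECIprocal type=INTE END
 INDE     6    0    0 FOBS=  1259.884     0.000 SIGMA=    38.561
                   FPART=     0.000     0.000 WEIGHT=     1.000 TEST=         0
"""
print(to_free_format(parse_cns(sample))[0])  # 6 0 0 1259.884 0.0 38.561 0.0 0.0 1.0 0
```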
f2mtz \
hklin pc553_19.hkl \
hklout pc553_19i.mtz \
<<eof
CELL 37.144 39.422 44.021 90.00 90.00 90.00
SYMM P212121
LABO H K L I SIGI [ FreeR_flag ]
CTYPO H H H J Q [ I ]
END
eof
#
# To convert Is to Fs, use truncate
#
truncate \
hklin pc553_19i.mtz \
hklout pc553_19f.mtz \
<<eof
LABI IMEAN=I SIGIMEAN=SIGI
END
eof
#
# If you read a FreeR_flag, you will now have to rescue it:
# TRUNCATE ignores it.
#
cad hklin1 pc553_19f.mtz \
hklin2 pc553_19i.mtz \
hklout pc553_19f-free.mtz \
<<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FreeR_flag
END
eof
#
# Modify FreeR_flags (use pc553_19f.mtz instead if no FreeR_flag was read)
uniqueify pc553_19f-free.mtz
#
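SHELX files use fixed columns, so two fields can touch. Examples are a large intensity next to the l index, or a negative flag right after SIGI, and then a free-format read fails. The following is a hypothetical Python pre-processing sketch. It slices the columns of the (3I4,2F8.2,I2) layout quoted in the table at the top, stops at the 0 0 0 record that ends a SHELX reflection list, and writes whitespace-separated `h k l I SIGI flag` lines for the F2MTZ step above.

```python
def parse_shelx_line(line):
    """Split one reflection line using the fixed columns (3I4,2F8.2,I2).

    Column slicing copes with fields that run together, which a
    whitespace-split (free-format) reader would not.
    """
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    intensity = float(line[12:20])
    sigma = float(line[20:28])
    flag_text = line[28:30].strip()
    flag = int(flag_text) if flag_text else None   # flag column may be absent
    return h, k, l, intensity, sigma, flag

def read_shelx(text):
    """Parse reflection lines up to the terminating 0 0 0 record."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = parse_shelx_line(line)
        if rec[:3] == (0, 0, 0):
            break
        records.append(rec)
    return records

def to_free_format(records):
    """Whitespace-separated 'h k l I SIGI [flag]' lines for a free-format F2MTZ run."""
    lines = []
    for h, k, l, i, s, flag in records:
        fields = [h, k, l, i, s] + ([] if flag is None else [flag])
        lines.append(" ".join(str(x) for x in fields))
    return lines

# The first record has I touching l and the flag touching SIGI.
sample = ("  10   2   312345.67   45.10-1\n"
          "   1   0   0  100.00    2.50 1\n"
          "   0   0   0    0.00    0.00\n")
for out_line in to_free_format(read_shelx(sample)):
    print(out_line)
```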
# First edit the TNT files to assign flag 1 to the working set and 0 to the free set,
# then cat both files together:
#
# sed 's/$/ 1/' $SCRATCH/tnt_work.hkl >  $SCRATCH/tnt_all.hkl
# sed 's/$/ 0/' $SCRATCH/tnt_free.hkl >> $SCRATCH/tnt_all.hkl
#
# Example piece:
# HKL  -22   0   4  2010.9   134.7  1000.0  0.0000 1
# HKL  -22   0   5  4005.2    83.1  1000.0  0.0000 1
# HKL  -22   0   6  3661.5    91.1  1000.0  0.0000 1
# HKL  -22   0   7  2321.9    59.7  1000.0  0.0000 1
# ....
# HKL  -21   1   9   488.4   143.9  1000.0  0.0000 0
# HKL  -20   0   6   329.5   202.9  1000.0  0.0000 0
# HKL  -20   0  11  1009.2   146.7  1000.0  0.0000 0
# HKL  -20   4  10  1989.1    46.5  1000.0  0.0000 0
# ....
#
f2mtz \
hklin $SCRATCH/tnt_all.hkl \
hklout tnt_all.mtz \
<<eof
CELL 37.144 39.422 44.021 90.00 90.00 90.00
SYMM P212121
LABO H K L F SIGF FreeRflag
CTYPO H H H F Q I
#
# See the comments about formats above. You need to skip the HKL label.
#
FORMT '(4x,3F4.0,2F8.0,16X,F4.0)'
#
# or, if PHI and FOM are given:
# LABO H K L F SIGF PHIB FOM FreeRflag
# CTYPO H H H F Q P W I
# FORMT '(4x,3F4.0,4F8.0,F4.0)'
END
eof
#
# uniqueify will now complete the hkl list and add FreeRflags
#
uniqueify -f FreeRflag tnt_all.mtz
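The sed recipe appends the flag straight after each record. The flag therefore only lines up with the F4.0 field of the FORMT above if every record ends exactly at column 48. The Python sketch below is slightly more defensive and pads each record to that width first. Its helpers are hypothetical, and the 48-column width is inferred from that FORMT, not from TNT documentation.

```python
def tag_tnt(lines, flag, width=48):
    """Pad each TNT record to `width` columns and append `flag` as a 4-wide integer.

    width=48 matches FORMT '(4x,3F4.0,2F8.0,16X,F4.0)': the flag is then
    read from columns 49-52.
    """
    tagged = []
    for line in lines:
        record = line.rstrip()
        if len(record) > width:
            raise ValueError(f"record wider than {width} columns: {record!r}")
        tagged.append(f"{record:<{width}}{flag:4d}")
    return tagged

def combine_tnt(work_lines, free_lines):
    """Working set flagged 1, free set flagged 0, then concatenated (like sed + cat)."""
    return tag_tnt(work_lines, 1) + tag_tnt(free_lines, 0)

work = ["HKL  -22   0   4  2010.9   134.7  1000.0  0.0000"]
free = ["HKL  -21   1   9   488.4   143.9  1000.0  0.0000"]
for record in combine_tnt(work, free):
    print(record)
```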
#!/bin/csh -f
#
f2mtz HKLIN ./1bxo*-sf.hkl \
hklout $CCP4_SCR/junk.mtz \
<<eof
TITLE X-PLOR to MTZ
CELL 96.980 46.650 65.710 90.00 115.57 90.00
LABOUT H K L I SIGI FC PHIC
CTYPE H H H J Q F P
SKIP 2
SYMM C2
eof
if($status) exit
truncate \
hklin $CCP4_SCR/junk.mtz \
hklout $CCP4_SCR/junk1.mtz \
<<eof
LABI IMEAN=I SIGIMEAN=SIGI
TRUNCATE YES
END
eof
#
if($status) exit
cad \
hklin1 $CCP4_SCR/junk1.mtz \
hklin2 $CCP4_SCR/junk.mtz \
hklout ./1bxo-sf.mtz \
<<eof
LABI FILE 1 ALLIN
LABI FILE 2 E1=FC E2=PHIC
END
eof
Eleanor Dodson, University of York, England
Maria Turkenburg, University of York, England