From CCP4 5.0, the core handling of input parsing is done by a set of C functions. Separate documentation describes the structures and functions used, and the API for C/C++ programs.
For Fortran programs, the original set of subroutines (held in parser.f) has mostly been replaced by an interface to the C library. From the point of view of an application programmer, this interface should be identical to the original set of subroutines. This document originates from the original Fortran library, but should be applicable to the new library.
The PARSER module of the CCP4 library contains routines which are mainly used for `free-format' `keyworded' input of control data for programs. Most programs have a loop over input records which are initially fed to the routine PARSER to tokenise them and extract the initial keyword. PARSER can cope with continued, commented input lines and included files. It calls PARSE to tokenise individual records and PARSE is sometimes useful itself to compensate for the lack of free-format internal READs in the Fortran 77 standard. See the entries below for details.
See also the keyparse library, which are wrapper routines for the parser routines designed to make it easier to read keyworded input.
The library contains routines to decode the parameters following the `standard' program keywords:
There are also routines to extract real and integer numbers from fields.
At present PARSER only supports one atom selection command syntax, with the subroutine RDATOMSELECT. This will decode lines with the following type of atom selection commands:
Keywords... ATOM <inat0> [ [TO] <inat1> ] | RESIDUE [ ALL | ONS | CA ] [ CHAIN <chnam> ] <ires0> [ [TO] <ires1> ]
This is based on the syntax used in atom selection in DISTANG. For the purposes of decoding the selection commands the preceding keyword(s) are irrelevant.
In practice the syntax described above is designed to allow selections such as:
... ATOM 1 TO 1000 ... ATOM 7 9 ... ATOM 10 ... RESIDUE 11 TO 22 ... RESIDUE 10 CHAIN A ... RESIDUE CHAIN S CA 12 19
The selection will specifies a range either of atom numbers or of residue numbers. In the latter case it can also optionally be used to specify a chain identifier (one character) and/or an ``atom type'' selection keyword:
ALL all types of atoms in the selected range ONS only oxygens and nitrogens in the selected range CA only carbon atoms in the in the selected range(Note that RDATOMSELECT can also be made to treat ALL/ONS/CA as invalid input, which may be appropriate in some applications.)
The ordering of the RESIDUE subarguments is flexible, so that RESIDUE 1 TO 9 CA CHAIN B is the same as RESIDUE CA CHAIN B 1 TO 9 and so on.
For examples see DISTANG and CONTACT.
Library parser.f contains the following subroutines:
Routine(argument list) | Comments |
---|---|
PARSER | read and interpret data from the input stream |
PARSE | free format read routine |
PARSDL | change delimiters |
KEYNUM | check number of numeric fields |
[ KEYERR ] | - internal subroutine (KEYNUM) |
[ CHKNUM ] | - not used (Sep 1993) |
[ CHKTOK ] | - internal (CHKNUM) |
[ GETREA ] | - not used (Sep 1993) |
[ GETINT ] | - not used (Sep 1993) |
GTNREA | extract reals from input |
GTNINT | extract integers from input |
GTPREA | extract single real number |
GTPINT | extract single integer |
[ GETSTR ] | - not used (Sep 1993) |
SBLANK | blank characters in an array |
[ GTCFLD ] | - not used (Sep 1993) |
[ CPYCHR ] | - not used (Sep 1993) |
[ CMOVE ] | - not used (Sep 1993) |
CHKKEY | check keyword against a list |
PUTLIN | dummy routine? |
BLANK | put blank lines to output |
LERROR | general error reporting routine |
RDSYMM | read 'symmetry' keyword |
RDHEAD | read 'header' keyword |
RDCELL | read 'cell' keyword |
RDRESO | read 'resolution' keyword |
RDSCAL | read 'scale' keyword |
RDRESL | read and decode 'resolution' limits |
RDATOMSELECT | read and decode atom selection keywords |
GTTREA | extract a real number |
GTTINT | extract an integer |
CMATCH | compare characters in a string |
SUBROUTINE PARSER(KEY,LINE,IBEG,IEND,ITYP,FVALUE,CVALUE,IDEC,NTOK,LEND,PRINT)
The normal behaviour is to read `keyworded' data from the input stream, and interpret it. This is the case if LINE is initially blank. Stream 5 is the standard input stream, but a line beginning with @<name> starts reading from a file <name> (on stream 11), until end-of-file.
Each logical 'card' may be continued on next line by the continuation characters `&', `-' or `\' at the end of the line: this character is dropped from the list returned to the calling routine.
Trailing comments may be present, following the character '#' or '!': any continuation character (`&', `-' or `\') must PRECEDE the comment character -- comments can't be continued. The complete (continued) line, less any comments, is returned in LINE. Lines containing ONLY comments (or blank) will not be returned from this routine -- reading will continue.
Strings may be quoted or unquoted. See also PARSE for details of token delimiters etc.
Alternatively, if LINE is non-blank it will be interpreted before possibly reading further data on the standard input if LINE ends with a continuation character.
Arguments
KEY (O) CHARACTER*4 Keyword at beginning of line (if present), uppercased before returning. LINE (I/O) CHARACTER*(*) Parse this input string. If blank read lines from unit 5. LINE will be updated to contain the entire line read, including continuations. IBEG (O) INTEGER(*) Array of size at least NTOK. 1st column number of tokens in field IEND (O) INTEGER(*) Array of size at least NTOK. Last column number of tokens in field ITYP (O) INTEGER(*) Array of size at least NTOK. =0 null field =1 character string =2 number FVALUE (O) REAL(*) Array of size at least NTOK. Value of number. CVALUE (O) CHARACTER(*)*4 Array of size at least NTOK. Character string (1st 4 characters), for numbers as well as strings. Items in FVALUE and CVALUE are left unchanged for null fields IDEC (O) INTEGER(*) Array of size at least NTOK. Number of 'digits': for string, number of characters (=4 if.gt.4) for integer, number of digits for real number, (number of digits before point+1)*100 +number of digits after point NTOK (I/O) INTEGER On input sets the maximum number of fields to be parsed (if <20 then defaults to 20) On output returns the number of fields parsed. LEND (O) LOGICAL .FALSE. for control card .TRUE. for end-of-file PRINT (I) LOGICAL .TRUE. echo line to unit 6 via PUTLIN .FALSE. don't echo
SUBROUTINE PARSE(LINE,IBEG,IEND,ITYP,FVALUE,CVALUE,IDEC,N)
Free format read routine. This is really a scanner, not a parser. It scans the LINE into N tokens which are separated by delimiters and updates the information arrays for each, as below. The default delimiters are space, tab, comma and equals; they may be changed using PARSDL. Adjacent commas delimit `null' fields (the same as empty strings). Strings may be unquoted or single- or double-quoted if they don't contain delimiters, but must be surrounded by delimiters to be recognised. This allows literal quotes to be read, e.g. "ab"c" will be recognised as the token `ab"c'. An unquoted `!' or `#' in LINE introduces a trailing comment, which is ignored.
Arguments:
LINE (I) CHARACTER*(*) String to be parsed N (I/O) INTEGER Usually <0, when abs(N) is the maximum number of fields to interpret and should be <= the array dimensions. If N>0 it is the number of tokens read so far, intended for continuation lines with PARSER. Returns number of fields scanned or 0 if line is blank or just contains a comment For I=1,N : IBEG(I) (O) INTEGER(*) 1st column number in field IEND(I) (O) INTEGER(*) last column number in field ITYP(I) (O) INTEGER(*) =0 null field =1 character string =2 number FVALUE(I) (O) REAL(*) Value of number. Use NINT(FVALUE(I)) to extract an integer. CVALUE(I) (O) CHARACTER(*)*4 Character string (1st 4 characters) for numbers as well as strings Items in FVALUE and CVALUE are left unchanged for null fields IDEC(I) (O) INTEGER(*) Number of 'digits' for string, number of characters (=4 if.gt.4) for integer, number of digits for real number, (number of digits before point+1)*100 +number of digits after point
SUBROUTINE PARSDL(NEWDLM,NNEWDL,NSPECD)
Call to change delimiters used by PARSE(R)
NEWDLM (I) CHARACTER*(*) Array containing NNEWDL new delimiters NNEWDL (I) INTEGER Number of new delimiters. If .le. 0, reset delimiters to the standard default set (in DDELIM). NSPECD (I) INTEGER Number of special delimiters which cannot delimit a null field. These are at the beginning of the delimiter array. (defaults in NDSDLM)
SUBROUTINE KEYNUM(N,NSTART,LINE,IBEG,IEND,ITYP,NTOK)
Check that correct number of numbers (numeric fields) are present
Arguments:
N (I) INTEGER Number of consecutive numeric fields expected NSTART (I) INTEGER Number of first field to check LINE (I) CHARACTER*(*) Array containing the fields IBEG (I) INTEGER(*) First column number of fields (from PARSER) IEND (I) INTEGER(*) Last column number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) NTOK (I) INTEGER Number of fields (from PARSER)
SUBROUTINE GTNREA(N,M,X,NTOK,ITYP,FVALUE)
Extract M real numbers X starting from N'th value of Parser array FVALUE, if possible. If no value, X = 0.0 . If illegal, write message.
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted M (I) INTEGER Number of elements to be extracted X (O) REAL(M) Put extracted elements into this array NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE GTNINT(N,M,J,NTOK,ITYP,FVALUE)
Extract M integers J starting from N'th value of Parser array FVALUE, if possible. If no value, J = 0 . If illegal, write message
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted M (I) INTEGER Number of elements to be extracted J (O) INTEGER(M) Put extracted elements into this array NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE GTPREA(N,X,NTOK,ITYP,FVALUE)
Extract real number X from N'th value Parser array FVALUE, if possible. If no value, leave X unchanged. If illegal, write message
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted X (O) REAL Extracted number put here NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE GTPINT(N,I,NTOK,ITYP,FVALUE)
Extract integer I from N'th value Parser array FVALUE, if possible If no value, leave I unchanged. If illegal, write message
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted I (O) INTEGER Extracted number put here NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE SBLANK(ARRAY,N1,N2)
Blank characters N1 to N2 of ARRAY
Arguments:
ARRAY (I/O) CHARACTER(*) N1 (I) INTEGER N2 (I) INTEGER
LOGICAL FUNCTION CMATCH(STRING1,STRING2,NCHAR)
Compare nchar character in string1 and string2 return cmatch .true. if all match, else .false.
Arguments:
STRING1 (I) CHARACTER*(*) 1st string to compare STRING2 (I) CHARACTER*(*) 2nd string to compare NCHAR (I) INTEGER number of characters to compare
SUBROUTINE CHKKEY(KEY,WORDS,NWORDS,IKEY)
Check keyword KEY against list of NWORDS possible keywords in WORDS. Allows abbreviated or extended keys provided they are not ambiguous.
Arguments:
KEY (I) CHARACTER*(*) Keyword for checking WORDS (I) CHARACTER(NWORDS)*(*) List of possible keywords NWORDS (I) INTEGER Number of keywords in WORDS IKEY (I/O) INTEGER = '?', list all words Returns: = keyword number found (.gt.0) = 0 if not found or null = -1 if ambiguous
SUBROUTINE PUTLIN(STROUT,OUTWIN)
This is a dummy PUTLIN to link with the MTZ routines mark 1 - all it does is write the line in STROUT to lun 6. Later the routines will be linked with the Compose-Parser etc. from Kim where PUTLIN does a few more things !
Arguments:
STROUT (I) CHARACTER*(*) Input line OUTWIN (O) CHARACTER*(*) Not used
SUBROUTINE BLANK(OUTWIN,NLINES)
This subroutine calls PUTLIN to output NLINES blank lines to the window OUTWIN
Arguments:
OUTWIN (I) CHARACTER*6 output window NLINES (I) INTEGER number of blank lines to output
SUBROUTINE LERROR(ERRFLG,IFAIL,ERRMSG)
General error reporting subroutine, for the MTZ routines, etc
Arguments:
ERRFLG (I) INTEGER =1 output message as warning =2 output message as fatal IFAIL (I) INTEGER =0 return after fatal error =-1 STOP after reporting fatal error ERRMSG (I) CHARACTER*(*) character string containing error message to output
SUBROUTINE RDSYMM(JTOK,LINE,IBEG,IEND,ITYP,FVALUE,NTOK, . SPGNAM,NUMSGP,PGNAME,NSYM,NSYMP,RSYM)
Read and decode symmetry specification
Arguments:
JTOK (I) INTEGER Number of first field to interpret LINE (I) CHARACTER*(*) Input string (from PARSER) IBEG (I) INTEGER(*) 1st column number of tokens in field (from PARSER) IEND (I) INTEGER(*) Last column number of tokens in field (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) NSYM (I/O) INTEGER Number of symmetry operations already read, including non-primitive. (should be cleared to 0 at beginning) SPGNAM (O) CHARACTER*(*) Space group name NUMSGP (O) INTEGER Space group number PGNAME (O) CHARACTER*(*) Point group name NSYMP (O) INTEGER Number of primitive symmetry operations RSYM (O) REAL(4,4,*) Symmetry matrices. * should be at least =NSYM
SUBROUTINE RDHEAD(JTOK,LINE,IBEG,IEND,ITYP,FVALUE,NTOK, . MTZPRT,MTZBPR)
Read and decode HEADER command, to set print flags for MTZ headers
Arguments:
JTOK (I) INTEGER Number of first field to interpret LINE (I) CHARACTER*(*) Input string (from PARSER) IBEG (I) INTEGER(*) 1st column number of tokens in field (from PARSER) IEND (I) INTEGER(*) Last column number of tokens in field (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) MTZPRT (O) INTEGER Flag to control printout from MTZ file header NONE sets MTZPRT = 0 no header o/p BRIEF sets MTZPRT = 1 (default) brief header o/p HISTORY sets MTZPRT = 2 brief + mtz history ALL sets MTZPRT = 3 full header o/p from mtz reads MTZBPR (O) INTEGER Controls printout from BATCH HEADERS NOBATCH sets MTZBPR = 0 no batch header o/p BATCH sets MTZBPR = 1 (default) batch titles o/p ORIENTATION sets MTZBPR = 2 batch orientation also
SUBROUTINE RDCELL(ITOK,ITYPE,FVALUE,NTOK,CELL)
Read and decode cell parameters
Arguments:
ITOK (I) INTEGER Number of first field to interpret ITYPE (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) CELL (O) REAL(6) Cell parameters a, b, c, alpha, beta, gamma.
SUBROUTINE RDRESO(ITOK,ITYPE,FVALUE,NTOK,RESMIN, + RESMAX,SMIN,SMAX)
Read and decode resolution limits.
Arguments:
ITOK (I) INTEGER Number of first field to interpret ITYPE (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) RESMIN (O) REAL Minimum resolution (in As) RESMAX (O) REAL Maximum resolution (in As) SMIN (O) REAL Minimum resolution ( 4sin**2 theta/lambda**2) SMAX (O) REAL Maximum resolution ( 4sin**2 theta/lambda**2)
SUBROUTINE RDSCAL(ITOK,LINE,IBEG,IEND,ITYP,FVALUE,NTOK, . NLPRGI,LSPRGI,ILPRGI,SCAL,BB)
Read and decode SCALE.
Arguments:
ITOK (I/O) INTEGER Input: number of first field to interpret Output: number of next token to interpret (.gt. 0) = 0 if line exhausted (SCAL & BB OK) = -1 if no scale given = -2 unrecognized label LINE (I) CHARACTER*(*) Input string (from PARSER) IBEG (I) INTEGER(*) 1st column number of tokens in field (from PARSER) IEND (I) INTEGER(*) Last column number of tokens in field (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) LSPRGI (I) CHARACTER(*)*30 Program label strings. L(abel) S(tring) PRG(ram) I(nput) NLPRGI (I) INTEGER Number of label strings in LSPRGI ILPRGI (O) INTEGER Number in array of LSPRGI whose scale has been rese SCAL (O) REAL Scale factor, no default BB (O) REAL Temperature factor, default = 0.0
SUBROUTINE RDRESL(ITOK,ITYPE,FVALUE,CVALUE,NTOK,RESMIN, + RESMAX,SMIN,SMAX,ISTAT)
Read and decode resolution limits.
Subkeywords in CVALUE recognized:
LOW read next number as low resolution limit HIGH read next number as high resolution limit
If LOW & HIGH are both present, the limits will still be swapped to the correct order
If only LOW or HIGH are given, the unset limit (ie either RESMAX, SMAX or RESMIN, SMIN) will be set to -1.0. If only one number is given, it is treated as a high resolution limit
If both limits are given without keywords, and both are .lt. 1.0, it is assumed that the limits are 4(sin theta/lambda)**2 rather than A
Arguments:
ITOK (I) INTEGER Number of first field to interpret ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) NTOK (I) INTEGER The number of fields parsed. (from PARSER) CVALUE (I) CHARACTER(*)*4 Parsed tokens from program input. (from PARSER) RESMIN (O) REAL Minimum resolution (in As) (ie low resolution) RESMAX (O) REAL Maximum resolution (in As) (ie high resolution) SMIN (O) REAL Minimum resolution ( 4sin**2 theta/lambda**2) (ie low resolution) SMAX (O) REAL Maximum resolution ( 4sin**2 theta/lambda**2) (ie high resolution) ISTAT (O) INTEGER =0 OK =-1 illegal subkeyword =+1 no limits set =+2 illegal number (probably can't happen)
SUBROUTINE GTTREA(N,X,LFLAG,NTOK,ITYP,FVALUE)
Extract real number X from N'th value of Parser array FVALUE, if possible.
If no value, leave X unchanged. If illegal, write message
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted X (O) REAL Put extracted number here LFLAG (O) INTEGER = 0 OK (valid number or null field) = -1 beyond end of line = +1 illegal number NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE GTTINT(N,I,LFLAG,NTOK,ITYP,FVALUE)
Extract integer I from N'th value of Parser array FVALUE, if possible.
If no value, leave I unchanged. If illegal, write message.
Arguments:
N (I) INTEGER Number of 1st element of FVALUE to be extracted I (O) INTEGER Put extracted number here LFLAG (O) INTEGER = 0 OK (valid number or null field) = -1 beyond end of line = +1 illegal number NTOK (I) INTEGER Total number of fields (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) FVALUE (I) REAL(*) Array of numbers to be extracted (from PARSER)
SUBROUTINE RDATOMSELECT(JTOK,INAT0,INAT1,IRES0,IRES1,CHNAM, + IMODE,NTOK,LINE,IBEG,IEND,ITYP,IDEC, + FVALUE,IFAIL)
Subroutine to process atom selection keyword with the following general syntax:
<Keywords...> ATOM <inat0> [ [TO] <inat1> ] | RESIDUE [ALL | ONS | CA] [ CHAIN <chnam> ] <ires0> [ [TO] <ires1> ] e.g. kywd atom 1 to 100 kywd residue chain A 20 to 30 kywd residue all 11 32 etc...
To be compatible with DISTANG, CONTACT etc the ordering of the RESIDUE subarguments is flexible, eg RESIDUE 1 TO 9 CA CHAIN B is the same as RESIDUE CA CHAIN B 1 TO 9...
The subroutine returns the selection entered by the user and expects the calling program to deal with the results. The preceding keywords are relevant for this subroutine
ARGUMENTS
JTOK (I) INTEGER Number of first field to interpret NTOK (I) INTEGER The number of fields parsed, from PARSER LINE (I) CHARACTER*(*) Input string, from PARSER IBEG (I) INTEGER(*) 1st column number of tokens in field (from PARSER) IEND (I) INTEGER(*) Last column number of tokens in field (from PARSER) ITYP (I) INTEGER(*) =0 null field =1 character string =2 number (from PARSER) IDEC (I) INTEGER(*) Number of characters/digits in each token (from PARSER) FVALUE (I) REAL(*) Array of numbers. (from PARSER) INAT0 (O) INTEGER Lower limit of atom range (-99 if not set) INAT1 (O) INTEGER Upper limit of atom range (-99 if not set) IRES0 (O) INTEGER Lower limit of residue range (-99 if not set) IRES1 (O) INTEGER Upper limit of residue range (-99 if not set) CHNAM (O) CHARACTER*1 Chain identifier (' ' if not set) IMODE (I/O) INTEGER On entry: -1 = don't allow MODE any other value = allow MODE On exit: Type of atoms to include: 1=ALL 2=ONS 3=CA (see eg CONTACT) IFAIL (I/O) INTEGER On entry: 0 = suppress warnings -1 = print warnings On exit: 0 = LINE parsed ok >0 = error occurred parsing line (value of IFAIL is no. of bad token)
RETURNED VALUES
The subroutine returns either:
all = (default) all atoms in residue range ons = only oxygen and nitrogen atoms ca = only CA atoms(see CONTACT/DISTANG)
Unset atoms/residue numbers will be returned < 0 (i.e. -99)
Unset chain identifier will be returned as a blank, i.e. ' '
Mode defaults to 1 = include all types of atoms.