TIDYTEXT - TEXT FORMATTING PROGRAM
==================================

INTRODUCTION

TIDYTEXT is a text formatting program designed to be used, in particular, with machine readable documentation, though its application is much more general. It was written with the following requirements:

1) The program must be able to produce neatly formatted text files which can be easily read when printed or listed using the usual commands available on a computer system.

2) The program must be able to produce a paged printed output and allow for a range of page sizes and printer types.

3) The program must be portable so that it can be distributed to sites receiving the machine readable documentation.

The first requirement dictated that the text files should contain an absolute minimum of special control characters so that the formatting is done following a few simple conventions which must be observed when the text is prepared or edited. The actual editing of a text file is done using whatever editor is available on the computer holding the text files and the file is then re-formatted using TIDYTEXT. To meet requirement (3), the program TIDYTEXT has been written in ANSI standard FORTRAN 77.

The program was written by J.W. Campbell, Daresbury Laboratory.

List of sections:

   Preparing The Text
   Underlining
   Tables
   Forcing A New Page
   Bold Print
   Summary of Control Options
   Restrictions
   Warnings
   Notes
   Example
   Control Data
   Installation
   Hard Copy

PREPARING THE TEXT

The text formatting is done on a paragraph basis with paragraphs being separated by one or more blank lines. Two types of paragraph are distinguished and treated as follows:

a) Single Line Paragraphs : The spacing within the lines is left unchanged. Single lines must not be longer than the number of columns requested for the formatted output or they will be converted to multiple line paragraphs (see also Warnings below).

b) Multiple Line Paragraphs: These are formatted with both left and right justification as follows.
Two features of the way TIDYTEXT does this are of particular importance.

(i) The left hand columns for justification are taken from the starting positions of the lines actually present in the text whereas the right hand margin is justified to a fixed column number for the complete document. The last line of a paragraph is left justified only.

(ii) The concept of 'lead in' text in the first line of a multiple line paragraph is used. The 'lead in' text is the text in the first line up to the column at which the second line starts. The spacing of such text is not altered.

These points are illustrated by the layout of the text in this section itself. The text string indicated below is an example of 'lead in' text for a multiple line paragraph:

   'Multiple Line Paragraphs: '

UNDERLINING

Special provision is made for underlining if this is required. This is done by linking the words to be underlined with the underline character or by appending an underline character if a single word is to be underlined. The character used to indicate underlining may be defined by the user. As the underline character itself is often needed in documents, the up-arrow character is now normally used as the standard underline indicating character.

e.g.   UNDERLINED_   Data_Control_Cards

When the document is printed using TIDYTEXT, the underline characters within the text are replaced by spaces or deleted as appropriate (see Note 1 in the Notes section for full details).

TABLES

In some cases, the text formatting procedure, as described above, is not appropriate, particularly in tables where it is necessary to retain the spacing of the items in lines of text which are not separated by blank lines. To cater for such requirements, text which appears between two delimiting lines, each containing only a full stop in column 1, will remain unaltered in its spacing except for minor adjustments if underlining/bold printing is used.

.
    N    Average   Sigma
   246    126.4     3.1
   278     98.3     2.7
   943    250.4     4.2
.
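The conventions described so far (blank lines separating paragraphs, single versus multiple line paragraphs, and table regions enclosed by lines holding only a full stop in column 1) can be illustrated with a short sketch. It is written in Python purely for illustration and is not taken from the TIDYTEXT source, which is FORTRAN 77. The function name split_blocks and the block labels are hypothetical, and the handling of an unterminated table is an assumption.

```python
def split_blocks(lines):
    """Group input lines into ('single' | 'multi' | 'table', [lines]) blocks.

    Hypothetical helper: it only classifies blocks, it does not justify text.
    """
    blocks = []
    para = []       # lines of the paragraph currently being collected
    table = None    # list of table lines while inside a table, else None

    def flush():
        # Close the current paragraph, if any, and record its type.
        if para:
            kind = "single" if len(para) == 1 else "multi"
            blocks.append((kind, list(para)))
            para.clear()

    for raw in lines:
        line = raw.rstrip()
        if line == ".":                  # full stop alone in column 1
            if table is None:
                flush()                  # the delimiter also ends a paragraph
                table = []
            else:
                blocks.append(("table", table))
                table = None
        elif table is not None:
            table.append(line)           # spacing inside a table is kept
        elif line.strip() == "":
            flush()                      # blank line separates paragraphs
        else:
            para.append(line)

    flush()
    if table is not None:                # assumption: keep an unterminated table
        blocks.append(("table", table))
    return blocks


sample = [
    "A single line paragraph",
    "",
    "First line of a paragraph",
    "and its second line",
    ".",
    "  N   Average",
    " 246   126.4",
    ".",
]
for kind, body in split_blocks(sample):
    print(kind, len(body))
```

With the sample above, the sketch reports one single line paragraph, one multiple line paragraph of two lines, and one table of two lines.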
On printing the document, the delimiting full stops are replaced by spaces. It may also be noted that the lines containing only a full stop in column 1 also act as paragraph delimiters. For details of what happens if the lines of a table are longer than the output line or record length specified, see Note 2 in the Notes section below.

FORCING A NEW PAGE

The user may force the taking of a new page on the printer output by inserting in the text a line containing only an '@' character in column 1. It should be noted that such a line also acts as a paragraph delimiter. The line itself will not appear on the printer output.

BOLD PRINT

Bold printing (using overprinting) may also be incorporated if required by making an appropriate specification as a program control option (see below). This is carried out in an analogous manner to the underlining except that the underline character is replaced by another user defined character e.g. a tilde.

SUMMARY OF CONTROL OPTIONS

The various control options are summarised here. Fuller details are given later.

General Options

   WF    Write a re-formatted output file.
   PR    Produce a paged printer output (File with Fortran carriage control or, if PS flag also set, a Postscript format file).
   LI    Output a monitor listing.
   OV    Overprinting available.
   DS    Request double spacing of the printer output.
   BOx   Request bold printing and define character 'x' to indicate that bold printing is to be carried out.
   ULx   Re-define character indicating underline as 'x'.
   PS    Output a Postscript format file (PR must also be specified).

Printer Output Options

   Define minimum column for printing (left hand margin).
   Define maximum column for printing (right hand margin).
   Define first line on the page for printer output.
   Define final line on the page for printer output.
   Define page length if required (ignored if Postscript file is being output).
   Request printing of 'cutting' lines if using line printer output (ignored if Postscript file output).
   Specify the number of extra blank pages to be output at the end of the text.
   Set the page number for the first page or suppress page numbering.

Input/Output File Options

   The record lengths of the input and output files may be defined for FORTRAN formatted sequential files of fixed record length. An option to use sequential files with variable length records is also available.

Character Translation

   A table may be set up defining character translations which are to be performed when a text file is printed using TIDYTEXT.

RESTRICTIONS

a) The top line of the page on the printer output is reserved for page numbering even if page numbering is suppressed.

b) For multiple line paragraphs, the start of each line must be at least 25 columns to the left of the maximum print column requested. If this is not the case then the text will automatically be shifted to the left.

c) The minimum number of columns allowed for output is 25.

d) The maximum number of columns allowed is 150.

e) The maximum record lengths allowed for the input and output text files are 150.

WARNINGS

a) The most common mistake is to forget to leave blank lines between a set of lines whose spacing is to remain unaltered. For example, if an address is typed as:

   Daresbury Laboratory,
   Warrington,
   WA4 4AD

it will be reformatted as:

   Daresbury Laboratory, Warrington, WA4 4AD

To get the required effect, the address must either be treated as a table or the lines separated by blank lines i.e.

.
   Daresbury Laboratory,
   Warrington,
   WA4 4AD
.

or

   Daresbury Laboratory,

   Warrington,

   WA4 4AD

b) Single line paragraphs will be converted to multiple line paragraphs if they are longer than the output width of text requested, with a consequent change of the spacing within the text.

NOTES

1) Further Details of Underlining and Bold Printing

This note gives details of how trailing underline/bold characters are treated when text is processed for printing (all embedded underline/bold characters will be replaced by spaces).
Let 'nr' be the number of trailing underline/bold characters which are to be replaced or removed as indicated for the following cases ('string' may contain embedded underline/bold characters). The example assumes that the back-quote character rather than the more usual tilde is being used as the bold text indicator (this is done here so that this document itself may be produced and printed in TIDYTEXT format).

   string      nr=0
   string_     nr=1
   string`     nr=1
   string_`    nr=2
   string`_    nr=2

For single line paragraphs and lines of a table, the trailing 'nr' characters will be removed and the following text shifted 'nr' positions to the left. In multiple line paragraphs, the 'nr' trailing characters will be replaced by spaces. If they reach the right hand margin, then the last word of the line will be shifted 'nr' characters to the right in order to maintain the justification of the text.

2) Treatment of Too Long Table Lines

If output lines of a table (to a file or the printer) are longer than the requested width of text, then the text is split over as many lines as required with a null continuation character being output at the ends of the lines to be continued. Thus for a line width of 'iw' characters, 'iw-1' characters of the text will be output per line. When a line of a table is read from an input file, the program will look for a continuation character at the end of the line and will re-constitute the original line if it was split.

3) Treatment of Hyphens

When a multiple line paragraph is read in and the last 'word' of an input line has a hyphen appended, then the first 'word' of the next line is appended to this 'word' and the hyphen is removed. An isolated hyphen sign at the end of a line remains unaltered.

EXAMPLE

This section gives an example of a text file as prepared/edited and shows the results of formatting using TIDYTEXT in a number of cases.
a) A Listing of the Text File as Prepared/Edited CCP4` (SERC Daresbury Laboratory) The Science and Engineering Research Council has set up a number of Collaborative Computational Projects (CCP's) for the benefit of UK University research groups. The services provided by the CCP in Protein Crystallography (CCP4) include the following: a) Software` A comprehensive suite of protein crystallography programs is being built up and documented to an agreed set of standards. b) News`Distribution The project publishes an informal newsletter, the Information Quarterly in Protein Crystallography. c) Meetings` The project organises study weekends, workshops and other smaller meetings. The following volumes of documentation are available: . Volume` Colour` Contents` A Red Overview B Blue User Documentation C Green Program Documentation D Yellow Programmer's Guide E Violet Daresbury Housekeeping . b) A Listing of the Formatted File (74 columns wide) CCP4` (SERC Daresbury Laboratory) The Science and Engineering Research Council has set up a number of Collaborative Computational Projects (CCP's) for the benefit of UK University research groups. The services provided by the CCP in Protein Crystallography (CCP4) include the following: a) Software` A comprehensive suite of protein crystallography programs is being built up and documented to an agreed set of standards. b) News`Distribution The project publishes an informal newsletter, the Information Quarterly in Protein Crystallography. c) Meetings` The project organises study weekends, workshops and other smaller meetings. The following volumes of documentation are available: . Volume` Colour` Contents` A Red Overview B Blue User Documentation C Green Program Documentation D Yellow Programmer's Guide E Violet Daresbury Housekeeping . 
c) A Printed Output From the File (74 columns wide) 1 CCP4 (SERC Daresbury Laboratory) The Science and Engineering Research Council has set up a number of Collaborative Computational Projects (CCP's) for the benefit of UK University research groups. The services provided by the CCP in Protein Crystallography (CCP4) include the following: a) Software A comprehensive suite of protein crystallography programs is being built up and documented to an agreed set of standards. b) News Distribution The project publishes an informal newsletter, the Information Quarterly in Protein Crystallography. c) Meetings The project organises study weekends, workshops and other smaller meetings. The following volumes of documentation are available: Volume Colour Contents A Red Overview B Blue User Documentation C Green Program Documentation D Yellow Programmer's Guide E Violet Daresbury Housekeeping d) A Printed Output from the File (55 columns wide) 1 CCP4 (SERC Daresbury Laboratory) The Science and Engineering Research Council has set up a number of Collaborative Computational Projects (CCP's) for the benefit of UK University research groups. The services provided by the CCP in Protein Crystallography (CCP4) include the following: a) Software A comprehensive suite of protein crystallography programs is being built up and documented to an agreed set of standards. b) News Distribution The project publishes an informal newsletter, the Information Quarterly in Protein Crystallography. c) Meetings The project organises study weekends, workshops and other smaller meetings. The following volumes of documentation are available: Volume Colour Contents A Red Overview B Blue User Documentation C Green Program Documentation D Yellow Programmer's Guide E Violet Daresbury Housekeeping CONTROL DATA The program TIDYTEXT requires a small file of control data in which the required options are selected. It is a card image (i.e. 80 characters per record) file. 
Control Record 1

   MINCOL MAXCOL MINLIN MAXLIN NLINES LREC1 LREC2 IC1 IC2 NBL IPAGE

MINCOL    is the minimum column number for the printer output. A value greater than or equal to 1 must be given.

MAXCOL    is the maximum column number for the printer output. A maximum value of 150 is allowed. When an output text file is being written, the number of columns written is (MAXCOL-MINCOL+1) starting at column 1.

MINLIN    is the number of the first line on the page for the printer output of the text. This must be greater than or equal to 2 as the first line is reserved for the page number.

MAXLIN    is the final line number on the page for the printer output of the text.

NLINES    is the total number of lines in a page. If a hardware form feed is available on the printer then a value of 0 should be given.

LREC1     is the record length of the input text file (maximum of 150) for a formatted sequential file. A value of 0 indicates a sequential file of variable length records.

LREC2     is the record length of the output text file (maximum of 150) for a formatted sequential file. A value of 0 indicates a sequential file of variable length records.

IC1, IC2  are the minimum and maximum column numbers for the output of MAXLIN lines of 'cutting lines' (e.g. indicating the width of an A4 page) to be printed at the start of the printer output if required. If no cutting lines are required then set IC1 and IC2 to 0. If requested, the cutting lines are output as two columns of full stops.

          e.g.   .          .
                 .          .
                 .          .
                 .          .
                 .          .
                 .          .
                 etc.

NBL       is the number of extra blank pages to be output at the end of the printed text.

IPAGE     is the page number minus 1 to be printed on the first page of the printed output. If a value of -1 is given, then page numbering will be suppressed.

Control Records 2

   CODE1 CODE2 ...
   .....
   blank line

These records contain a list of codes selected as required from the following list. They may be input on one or more records as required and are terminated by a blank record.
   WF    Write a re-formatted output file.

   PR    Print a paged printer output (or Postscript if PS is also set).

   LI    Output a monitor listing to the terminal while printing and/or re-formatting a file.

   OV    Indicates that overprinting is available. If this is not specified then no underlining or bold printing will be done on the printer output.

   DS    Double space the printer output.

   BOx   Request bold printing on the printer output. The character 'x' is to be used to indicate bold printing. If 'x' is omitted, the tilde is assumed. No bold printing will be done unless BO or BOx is specified.

   ULx   Redefines the character indicating underlining as 'x'.

   PS    Output a Postscript file (PR must also be given).

Control Records 3

   NTRANS
   N1(1) N2(1) N1(2) N2(2) ... ... N1(NTRANS) N2(NTRANS)

NTRANS is the number of characters for which character translations are to be performed when the text file is to be printed using TIDYTEXT. The records following contain NTRANS pairs of numbers (on as many records as required) giving N1(i) as the position in the collation sequence of an input character which is to be translated and N2(i) as the position in the collation sequence of the translated character. NTRANS may be 0.

INSTALLATION

The program TIDYTEXT is written in ANSI standard FORTRAN 77. It uses two subroutines from the SERC CCP4 program suite at the Daresbury Laboratory. These are CCPOPN and CCPRVR. Default versions of the subroutines are given commented out at the end of the source code file for TIDYTEXT as distributed.

The program uses the following unit numbers and logical file names:

   Unit   File      Use
    1     TEXTIN    The input text file
    2     TEXTOUT   The output text file (Option WF)
    5     DATA      The control data file (Card image)
    6     TERMOUT   Monitor listing output to the terminal (Option LI)
    7     PROUT     The printer output stream (Option PR)

It should be noted that the printer is treated as a FORTRAN output device i.e.
as one which recognises the carriage control characters '1' as a form feed and '+' for overprinting (unless the Postscript option is also requested, in which case a Postscript format file will be written).

As the program is written, it uses a CHARACTER*500 variable LINBUF. If the length of this needs to be reduced then the length of the character variables MARG, ITEXT, JTEXT, UTEXT and BTEXT should be reduced to about one third of the new length of LINBUF. LBMAX and MXTEXT must also be reset to the reduced lengths for LINBUF and for the other variables respectively. The maximum allowed record lengths of the input and output files and the maximum number of output columns will be reduced from 150 to the new value of MXTEXT.

HARD COPY

When printing the documentation of TIDYTEXT the following options must be used:

   ULx   Where x is an up-arrow character
   BOx   Where x is a tilde

A minimum of 75 print columns (the recommended value) is required.
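
As an illustration of the control data described above, the following sketch assembles a control data file for a hard copy run and checks the documented limits. It is written in Python for illustration only and is not part of TIDYTEXT. The function name build_control_data is hypothetical. The sketch assumes that the fields of each record are free format and separated by blanks (the field layout is not stated in this document), that the up-arrow is the '^' character, and that the minimum of 25 output columns applies to MAXCOL-MINCOL+1. The MAXLIN value of 60 is an arbitrary example, not a recommended setting.

```python
def build_control_data(mincol, maxcol, minlin, maxlin, nlines=0,
                       lrec1=0, lrec2=0, ic1=0, ic2=0, nbl=0, ipage=0,
                       codes=(), translations=()):
    """Return the control data as a list of records (hypothetical helper)."""
    # Limits quoted in the Restrictions and Control Data sections.
    if mincol < 1:
        raise ValueError("MINCOL must be >= 1")
    if maxcol > 150:
        raise ValueError("MAXCOL must be <= 150")
    if maxcol - mincol + 1 < 25:        # assumption: applies to the column count
        raise ValueError("at least 25 output columns are required")
    if minlin < 2:
        raise ValueError("MINLIN must be >= 2 (line 1 holds the page number)")
    if lrec1 > 150 or lrec2 > 150:
        raise ValueError("record lengths must be <= 150")
    if "PS" in codes and "PR" not in codes:
        raise ValueError("PS requires PR")

    # Control Record 1 (assumed free format, blank separated).
    fields = (mincol, maxcol, minlin, maxlin, nlines,
              lrec1, lrec2, ic1, ic2, nbl, ipage)
    records = [" ".join(str(v) for v in fields)]

    # Control Records 2: the option codes, terminated by a blank record.
    records.append(" ".join(codes))
    records.append("")

    # Control Records 3: NTRANS followed by the N1(i) N2(i) pairs, if any.
    records.append(str(len(translations)))
    if translations:
        records.append(" ".join(f"{n1} {n2}" for n1, n2 in translations))

    # The control data file is a card image file (80 characters per record).
    for rec in records:
        if len(rec) > 80:
            raise ValueError("record longer than 80 characters")
    return records


# Hard copy settings: 75 print columns, up-arrow underline, tilde bold.
for record in build_control_data(1, 75, 2, 60,
                                 codes=("PR", "OV", "UL^", "BO~")):
    print(record)
```

OV is included in the example because, as stated under Control Records 2, no underlining or bold printing is done on the printer output unless overprinting is declared available.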