Useful docs: W3Schools SQL, "SQL As Understood By SQLite", and Wikipedia for TLAs etc.
i2 uses sqlite accessed through the Python sqlite3 interface (other options are discussed later). All access should go via the database API, dbapi/CCP4DbApi.py; the schema is dbapi/database_schema.sql. The db is usually opened as part of startup by the utils/startup.py:startDb() function and can be accessed via CCP4Modules.PROJECTSMANAGER().db(). By default the user database file is $HOME/.CCP4I2/db/database.sqlite but this can be changed either by a command line argument (-db) or by the environment variables:

CCP4_LOCAL_HOME - resets where .CCP4I2 and other project directories are by default
CCP4_LOCAL_DOTDIR - resets the .CCP4I2 parent path

These are mostly useful for classroom environments with disk space and/or network speed issues.
Note that the Python sqlite3 module has a stated requirement that a connection is used only in the thread that created it. The database is accessed by both the main gui process and the 'background' runTask processes. I have tried to minimise all access, especially from the runTask processes.
The present implementation expects the db to have single-user access - the user's login name is found automatically and checked against the value saved in the Users table (set when the database was created), and access may fail if they do not match. The gui (in the startup.py:startDb() function) deals with this by asking the user to provide the 'old' username as found in the db. There are tables in the db, such as ProjectsUsersPermissions, which are intended for more sophisticated multi-user access but are not fully implemented.
Each of the main tables in the database has an id which is a randomly generated unique identifier serving as the primary key. In Python this is handled as a string (or as defined by the CCP4DbApi.UUIDTYPE variable, currently set to str) and is widely used. There are also some tables (FileTypes, JobStatus) which contain lists of integer 'enums' that are used in the 'main' tables.
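A minimal sketch of the id scheme, assuming the random identifiers are generated with the standard uuid module (the exact generation call used by CCP4DbApi is not stated here; the table and column names below are illustrative, not the real schema):

```python
import sqlite3
import uuid

# UUIDTYPE is str in CCP4DbApi; a plausible way to generate such an id
# is a 32-character hex string from a random UUID.
def new_id() -> str:
    return uuid.uuid4().hex

con = sqlite3.connect(":memory:")
# Illustrative table: the id is stored as TEXT and used as the primary key.
con.execute("CREATE TABLE Jobs (JobID TEXT PRIMARY KEY, JobNumber TEXT)")
job_id = new_id()
con.execute("INSERT INTO Jobs (JobID, JobNumber) VALUES (?, ?)", (job_id, "1"))
row = con.execute("SELECT JobNumber FROM Jobs WHERE JobID = ?", (job_id,)).fetchone()
```

Because the ids are random rather than sequential, rows created in different databases (e.g. on import/export of projects) do not collide.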
To inspect the db from the Job list, the context (right mouse) menus for jobs and files have a View->Database entry. The program sqlitebrowser is also very useful and can be used to edit a db.
All access to the db is channelled through the execute() and commit() methods, so changes needed when porting to an alternative sql backend are hopefully localised to these methods. Most methods in CDbApi create an sql command and use execute() (always!) to run that command. This function traps and reports errors and, if setDiagnostic() has been set True, it reports all executed sql commands. If the sql command has changed the database then it must be followed by a commit(). If the change is not committed then the process does not release the db and, typically in the case of the gui holding the db, prevents the runTask process from reporting a job finished, so the job appears to run forever. If the sql command is a 'getter' returning data then the fetchAll2Py() or fetchAll2PyList() method should be called to return the data as lists of appropriate Python data types. The difference between these functions is that fetchAll2Py() returns a single list for a single column of data from the db, whereas fetchAll2PyList() returns a list of lists for multiple columns. Both of these functions require as input the Python type (or list of Python types) for the returned data.
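The pattern above can be sketched with plain sqlite3. The function names mimic the CDbApi methods but the bodies here are assumptions for illustration, not the real implementations:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

def execute(sql, args=()):
    # Single choke point: error trapping, diagnostics, and any port to
    # another sql backend all live here.
    return cur.execute(sql, args)

def commit():
    # Required after any change, otherwise sqlite keeps the write lock
    # and other processes (e.g. runTask) block on the db.
    con.commit()

def fetchAll2Py(pytype):
    # Single column: return a flat list of pytype values.
    return [pytype(row[0]) for row in cur.fetchall()]

def fetchAll2PyList(pytypes):
    # Multiple columns: return a list of lists, converted per column.
    return [[t(v) for t, v in zip(pytypes, row)] for row in cur.fetchall()]

execute("CREATE TABLE Jobs (JobNumber TEXT, Status INTEGER)")
execute("INSERT INTO Jobs VALUES ('1', 6), ('2', 3)")
commit()

execute("SELECT JobNumber FROM Jobs ORDER BY JobNumber")
numbers = fetchAll2Py(str)
execute("SELECT JobNumber, Status FROM Jobs ORDER BY JobNumber")
rows = fetchAll2PyList([str, int])
```

Here numbers is a flat list (['1', '2']) while rows is a list of lists ([['1', 6], ['2', 3]]), matching the single-column/multi-column distinction described above.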
For the main tables in the database there are methods:

createWhatever() - create a new row in the table.

updateWhatever(whateverId=None,key=None,value=None) - update the key column in the row to value.

getWhateverInfo(whateverId,mode=None) - return a dictionary of data for the whatever with id whateverId, for all the columns listed in the mode argument. If mode is the default, None, then all the columns are returned. If mode is a single column name then the single Python value is returned rather than a dict. The keys in the returned dict are the column names, but all lower case.

deleteWhatever(whateverId) - delete the row with id whateverId.
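A sketch of this create/update/getInfo/delete pattern for a hypothetical Projects table (the column set, the % string formatting, and the whitelist check are illustrative assumptions, not the real CDbApi code):

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Projects (ProjectID TEXT PRIMARY KEY,"
            " ProjectName TEXT, ProjectDirectory TEXT)")

COLUMNS = ["ProjectID", "ProjectName", "ProjectDirectory"]

def createProject(name, directory):
    # createWhatever(): make a new row keyed by a random unique id.
    pid = uuid.uuid4().hex
    con.execute("INSERT INTO Projects VALUES (?,?,?)", (pid, name, directory))
    con.commit()
    return pid

def updateProject(projectId=None, key=None, value=None):
    # updateWhatever(): set one column; whitelist the column name since
    # identifiers cannot be bound as sql parameters.
    if key not in COLUMNS:
        raise ValueError("unknown column: %s" % key)
    con.execute("UPDATE Projects SET %s = ? WHERE ProjectID = ?" % key,
                (value, projectId))
    con.commit()

def getProjectInfo(projectId, mode=None):
    # getWhateverInfo(): dict of columns, keys lower-cased; a single
    # column name in mode returns the bare value.
    cols = COLUMNS if mode is None else [mode] if isinstance(mode, str) else mode
    row = con.execute("SELECT %s FROM Projects WHERE ProjectID = ?"
                      % ",".join(cols), (projectId,)).fetchone()
    if isinstance(mode, str):
        return row[0]
    return {c.lower(): v for c, v in zip(cols, row)}

def deleteProject(projectId):
    # deleteWhatever(): remove the row.
    con.execute("DELETE FROM Projects WHERE ProjectID = ?", (projectId,))
    con.commit()

pid = createProject("demo", "/tmp/demo")
updateProject(projectId=pid, key="ProjectName", value="renamed")
info = getProjectInfo(pid)                      # dict, lower-case keys
name = getProjectInfo(pid, mode="ProjectName")  # bare value
```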
Note that, for convenience, the much-used getJobInfo() can return data not in the Jobs table: projectname, runtime (finish minus start time), parentjobnumber, childjobs, descendentjobs, performanceclass, performance.
There are numerous methods to return the more complex data combinations required in specific gui contexts, to load data from finished jobs, to handle running jobs (?LINK), and to support import/export of projects (?LINK).
The following review of db tables mentions several redundant and 'not yet fully implemented' tables and columns. There is a difficulty: an sqlite update will not remove columns - this can only be done by copying the db without the no-longer-required features.
The Projects table holds:
ProjectsUsersPermissions is a table to hold project access details for users other than the owner - not currently used.
ProjectExports and ProjectImports tables track whenever a project is exported or imported. Currently this info is used in the gui to offer the user the option to export only jobs created since the last import/export. ProjectImports.ProjectExportDatabaseId references the databaseId of the source of the imported data. There is potentially useful info on 'where did that job come from' in these tables, but it is not currently accessible from the gui.
Recently (Oct 2016) tables to support tags and comments on projects have been added. ?LINK User docs. The Tags table is a simple list of tags, but with Tags.ParentTagID it enables grouping of tags into a hierarchy. The ProjectTags table associates a project with a tag. The ProjectComments table saves comments with userid and time and could potentially be used for a 'conversation' on the project.
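A sketch of walking such a ParentTagID hierarchy with a recursive CTE (the column names and sample data are assumptions based on the description above, not the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed shape: each tag optionally points at a parent tag.
con.executescript("""
CREATE TABLE Tags (TagID TEXT PRIMARY KEY, ParentTagID TEXT, Text TEXT);
INSERT INTO Tags VALUES ('t1', NULL, 'crystallography'),
                        ('t2', 't1', 'phasing'),
                        ('t3', 't2', 'experimental');
""")
# Collect tag 't1' and all of its descendants.
rows = con.execute("""
WITH RECURSIVE subtags(TagID, Text) AS (
    SELECT TagID, Text FROM Tags WHERE TagID = 't1'
    UNION ALL
    SELECT Tags.TagID, Tags.Text FROM Tags
      JOIN subtags ON Tags.ParentTagID = subtags.TagID
)
SELECT Text FROM subtags
""").fetchall()
```

This kind of query would let the gui select all projects tagged anywhere under a parent tag.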
The Comments table is a list of comments on specific jobs.
JobKeyValues and JobKeyCharValues hold items of real or character data: performance indicators that summarise the quality of the job result. See dev docs. The KeyTypes table lists the allowed JobKeyValues.KeyTypeID values.
A record of a file is created in the Files table when the file is either imported into i2 or output by a job. The Files record includes the jobId of the job importing/creating the file. If the file was imported then it is also recorded in the ImportedFiles table (whose columns include the name of the original source file).
FileUses A file may subsequently be input into another job, and the FileUses table associates the file with that other job (with a Role value indicating 'in'). A further complication is that a file may be output twice (sort of): once by a sub-job in a pipeline, and then as one of the final files output by the whole pipeline - the file needs to be associated with two jobs, so a FileUses entry (with a Role column value indicating 'out') is used for the second job.
Filename The Files.Filename column contains just the name of the file (no path). A file is always expected to be in the job directory of the job that created or imported it, or in the CCP4_IMPORTED_FILES directory for the project. The Files.PathFlag column indicates which of these it is. There is a redundant Files.FilePath column (which can not be easily removed from an sqlite db).
JobParamName In both Files and FileUses tables there is the jobId of a job and also a JobParamName column which contains the name of the parameter from the task (e.g. XYZIN).
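The Files/FileUses relationship can be sketched as follows. The column names are simplified from the description above, and the Role enum values (1 for 'in') are an assumption for illustration - the real values live in CCP4DbApi:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Files (FileID TEXT PRIMARY KEY, JobID TEXT,
                    JobParamName TEXT, Filename TEXT);
CREATE TABLE FileUses (FileID TEXT, JobID TEXT,
                       JobParamName TEXT, Role INTEGER);
-- Job j1 outputs a model as its XYZOUT parameter...
INSERT INTO Files VALUES ('f1', 'j1', 'XYZOUT', 'model.pdb');
-- ...and job j2 later reads the same file back as its XYZIN parameter.
INSERT INTO FileUses VALUES ('f1', 'j2', 'XYZIN', 1);
""")
# List the input files of job j2 (Role 1 assumed to mean 'in').
inputs = con.execute("""
SELECT Files.Filename, FileUses.JobParamName
  FROM FileUses JOIN Files ON Files.FileID = FileUses.FileID
 WHERE FileUses.JobID = 'j2' AND FileUses.Role = 1
""").fetchall()
```

Note that the file row itself carries the creating job's id and parameter name, while each later use gets its own FileUses row with the using job's parameter name.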
Annotation Files.Annotation is a short text string to appear in the gui.
File types There are three Files columns containing flags that denote some aspect of file type. Files.FiletypeID is the fundamental type, which corresponds to a single data class derived from CDataFile; possible values are listed in CCP4DbApi.FILETYPELIST. Files.FileContent: some types of file (notably the 'mini' MTZs) can hold data in different forms and this column flags that form - permissible values depend on the Files.FiletypeID. Files.FileSubType: for some types of file (eg CPdbDataFile) the content of the file may only be appropriate for use in some of the contexts that use that type of file. This is an attempt at codifying that problem - it is little used in practice. Note -- the names FileContent and FileSubType would probably be more appropriate if reversed!
ImportFiles This table exists mostly to hold the SourceFilename of an imported file. It also contains a checksum value for the source file at the time of import. Any subsequent access to the source file should check that it has the same checksum, i.e. is still the same file.
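A sketch of that checksum recheck. The hash algorithm actually used by i2 is not stated here; md5 is assumed purely for illustration:

```python
import hashlib

def checksum(path):
    # Hash the file in blocks so large source files do not need to fit
    # in memory. md5 is an assumption; substitute whatever i2 records.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def still_same_file(path, recorded_checksum):
    # True if the source file is unchanged since it was imported.
    return checksum(path) == recorded_checksum
```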
ExportFiles Ideally, when a user exports a file from i2 (presumably to use in some other software) the export is noted in this table. When another file is subsequently imported (presumably the output from the other software) it should be associated with the exported file. There is a column ImportFiles.ExportFileID to make this connection. This is not fully implemented and used, partly due to user resistance to being organised.
FileAssociations with support tables FileAssociationTypes, FileAssociationRoles and FileAssociationMembers is intended to make an association between two files such as the reflection data and freer data. This is fully implemented in CDbApi but not used yet. The CDbApi.createFileAssociation() method creates an entry in the FileAssociations table and then for every file in the association adds an entry to FileAssociationMembers with a reference to the FileAssociations entry.
Loading file data. The params.def.xml file associated with a job contains all parameters associated with the job, including the input and output data files. The CDbApi.gleanJobFiles() method extracts data from the params.def.xml and creates the Files and FileUses records. The gleanJobFiles() method is passed a CContainer instance containing the contents of the param file. The method scans the inputData and outputData sub-containers for objects of types derived from CDataFile and checks that the specified file name exists before creating a Files or FileUses record.
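The gleaning idea can be sketched as below. The classes and function here are stand-ins invented for illustration, not the real CContainer/CDataFile/gleanJobFiles:

```python
# Hedged sketch: scan a container of file-parameter objects and keep
# only those whose file actually exists on disk, as gleanJobFiles()
# is described as doing before creating Files/FileUses records.

class DataFile:
    """Stand-in for a CDataFile-derived object: a parameter name
    (e.g. XYZIN) plus the file it points at."""
    def __init__(self, param, path):
        self.param = param
        self.path = path

def glean(container, exists):
    """Return (param, path) pairs for files that pass the existence
    check. 'exists' stands in for os.path.exists so the sketch is
    testable without touching the filesystem."""
    return [(f.param, f.path) for f in container if exists(f.path)]

# A task may declare output files that the run never actually wrote;
# those are skipped rather than recorded in the db.
outputs = [DataFile("XYZOUT", "/job_1/XYZOUT.pdb"),
           DataFile("HKLOUT", "/job_1/HKLOUT.mtz")]
found = glean(outputs, exists=lambda p: p.endswith(".pdb"))
```

In the real code the existence check is against the job directory on disk, and a Files or FileUses record is created for each surviving entry.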