XML Format for Database Schemas

Contents

Return to main dbmstools documentation page.

Example Schema

At times like these an example always helps, so here's an extract from an XML representation of a DBMS schema:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE db-schema SYSTEM "file:///db-schema.dtd">

<db-schema name="mmspring">
    <templates>
        <template name="systemId">
            <column name="systemId" type="long" allows-null="false" description="Primary key (auto-generated)">
                <column-override dbms="mssql, mysql" auto-increment="true"/>
            </column>
        </template>
        <template
            ...
    <sequence dbms="postgres8,oracle" name="SEQ_MsoUser_SystemId"/>

    <table name="MsoUser" description="Authorised users of MSO's" size="small" update-frequency="low" select-frequency="low">
        <columns>
            <column template="systemId" />
            <column name="userName" type="varchar" length="18" allows-null="false"
                    description="Full name of the user"/>
            <column name="bigtextcol" type="varchar" length="8000">
                <column-override dbms="oracle,mssql" type="clob" length="0"/>
            </column>
            ....
        </columns>
        <constraints>
            <primary-key name="PK_MsoUser">
                <column-ref name="systemId"/>
            </primary-key>
            <unique name="UN_MsoUser_mso_userName"
                    description="MSO and userName combination must be unique">
                <column-ref name="mso"/>
                <column-ref name="userName"/>
            </unique>
            <foreign-key name="FK_MsoUser_Mso" references="Mso">
                <column-ref name="mso"/>
                <referenced-column name="systemId"/>
            </foreign-key>
        </constraints>
        <indexes>
            <index name="IN_MsoUser_userName">
                <column-ref name="userName"/>
            </index>
        </indexes>
    </table>
    <table
        ...
</db-schema>

OK, so it's a little contrived, but you can probably see what's going on here. In particular, you can guess that if the target DBMS is Oracle or Microsoft SQLServer, instead of doing the normal DBMS mapping for a varchar for the bigtextcol column, instead we use a CLOB, and omit the length (by setting it to zero). The CLOB will mapped to a CLOB column in Oracle, and a TEXT in SQLServer. In other words, you get pretty full control.

IMPORTANT NOTE: The tables should be listed in dependency order - in other words, your foreign keys should only refer to tables that have already appeared in the XML, rather than tables that occur later. dbmstools always does things in the order you've specified in the XML (except for drop scripts, where the order is reversed).

Use of DTD

There is a document type definition (DTD) in the same directory as the main scripts, called db-schema.dtd. The DTD isn't ever checked if you're using CPython, even if you add a reference to it in your XML database schema document (since the xml.dom.minidom parser is non-validating).

However, if you're running any of the tools using Jython (and therefore also if running from an ant build script), the DTD will be checked if you insert a reference to it in the XML document, i.e.:

<!DOCTYPE db-schema SYSTEM "file:///db-schema.dtd">

XML Structure

The XML structure for defining a database schema is shown below, with the number of occurrences of each element shown in brackets. Documentation for each of these elements can be found in the following section.

Note that description elements are omitted in this tree - see the following sections for details.

Descriptions

Every element which allows a description attribute (see the table below) also supports a description sub-element (which is used in preference). This allows one-liners to be included as attributes (often more convenient and compact), while more involved descriptions can use the element form.

Conditions

Many of the schema elements allow a condition element, which is used to determine whether the element will be included in any output (based on variables supplied to the tool at execution time).

Conditions have the following structure:

  • condition
    • Any _one_ of the following elements:
      • if-set
      • if-value
      • if-value-in
        • value (0 or more)
      • not
        • Any _one_ of: if-set, if-value, if-value-in, and, or
      • or
        • One or more of: if-set, if-value, if-value-in, not, and, or
      • and
        • One or more of: if-set, if-value, if-value-in, not, and, or

XML Elements

Notes for the following table:

  1. The dbms attribute accepts either the name of a single supported DBMS, drawn from the set (postgres7, postgres8, oracle, mysql, mssql, hsql, sqlite), or a comma-separated list of values, e.g. 'postgres8, oracle'. If omitted, the item applies to all DBMSs
  2. Item names must not exceed 30 characters in length, unless this limit is overridden on the command line for xml2ddl and deltaddl. This limit exists for Oracle databases.
  3. The previous-name attribute is used to tell deltaddl the name by which an element was known in the schema version immediately previous to the current version. This allows deltaddl to generate SQL to rename the element. Note that you should remove this field in subsequent schema versions, otherwise deltaddl may attempt to rename the element every time.
  4. The description attribute and element are alternatives for describing any element. The element is used in preference (if both are provided). Typically, you'd use the attribute for simple descriptions (since this provides a more compact representation), and use the element for multi-line descriptions.
  5. The description attribute and element allow basic Wiki-style markup (for character formatting and new lines). See the following section for details.
  6. The description attribute and element allow simple token replacement using variables, using the form ${variableName}, e.g. ${schemaType}. Each variable must be defined in the variables section, although the virtual variable dbms is always available.
Parent Element Notes and attributes
- db-schema

The root element for the schema XML document. Attributes:

name:The database name, used only in documentation and filenames (mandatory)
description:(optional)
version:Version number of the schema, for documentation purposes only (optional)
max-name-length:
 (optional) Maximum number of characters for names of all items in schemas, e.g tables, indexes, columns etc.
schema:(optional) Name of the default schema in which to create objects
db-schema description Alternative to the description attribute
db-schema variables Defines any variables that may be used in conditions or for token replacement
db-schema schema-override

Overrides the standard schema name (here the global default) for one or more specified DBMSs. Zero or more. Attributes:

dbms:(mandatory)
schema:(mandatory) Name of the schema in which to create objects for this DBMS. Set to an empty string to prevent use of a schema name for the DBMS
db-schema templates Contains 1 or more 'template' elements, which are used as templates when creating columns. Optional
db-schema grant-templates Contains 1 or more 'grant-template' elements, which are used as templates when creating sets of grants. Optional
db-schema sequence

Defines a database sequence (of numbers) for DBMSs that support them, e.g. Oracle, Postgres. Zero or more. Attributes:

name:(mandatory)
description:(optional)
dbms:(optional)
start-from:(optional) - The number from which the sequence starts (default is DBMS-specific)
db-schema table

Defines a table in the schema. Zero or more. Attributes:

name:(mandatory)
previous-name:(optional)
description:(optional)
dbms:(optional)
size:(optional) - contains a value from the set (small, medium, large), nominating the likely size of the table in production. For documentation only
update-frequency:
 (optional) - contains a value from the set (none, low, medium, high), nominating how often rows in the table will be inserted, updated or deleted. For documentation only
select-frequency:
 (optional) - contains a value from the set (none, low, medium, high), nominating how often rows in the table will be retrieved. For documentation only
variables variable

Defines a variable that may be used in conditions or for token replacement. Values for variables must be passed into the tool (on the command-line, or via properties in the ant build file).

name:(mandatory)
description:(optional)
default:(optional) - if no value is provided for the variable, this value will be used (it is an error to refer to a variable that has neither value nor default)
valid-values:(optional) - a shorthand alternative to the valid-values sub-element. Consists of a comma-separated list (white space is ignored) of valid values for the variable
variable description See definition for db-schema element
variable valid-value Represents a single value that can be held by the variable
schema-override condition A condition to be evaluated based on the variables supplied at runtime. The containing element will only be included if the condition evaluates as True
condition if-set

Evaluates true if the named variable is set (either at runtime, or via a default-value)

variable:(mandatory) - name of the variable to check
condition if-value

Evaluates true if the named variable contains the specified value

variable:(mandatory) - name of the variable to check
value:(mandatory) - the required value
condition if-value-in

Evaluates true if the named variable contains any of the specified values

variable:(mandatory) - name of the variable to check
values:(optional) - the list of acceptable values. A shorthand alternative to the value sub-element, consists of a comma-separated list (white space is ignored)
condition not Negates the contained condition element (if-set, if-value, if-value-in, and or or)
condition or True if any of the contained condition elements are true (if-set, if-value, if-value-in, not, and or or)
condition and True if all of the contained condition elements are true (if-set, if-value, if-value-in, not, and or or)
if-value-in value A valid value for the variable
templates template

Defines a column template that can be used (in whole or in part) when defining columns in tables. Zero or more. Attributes:

name:(mandatory) - the name by which this template is referenced from a column definition
description:(optional)
template description See definition for db-schema element
template column

Defines a column within a column template. Mandatory. Attributes:

name:(optional) - standard name for this column.
previous-name:(optional)
description:(optional)
type:(optional) - standard type for this column. Allowable values are int, long, short, tiny, varchar, char, numeric, boolean, datetime, date, blob, clob.
raw-type:(optional) - DBMS-specific type for this column. This will be inserted directly in the DDL, without being translated for the DBMS. If you need this behaviour, it's better to put in a DBMS-specific column-override
length:(optional) - length of the column (used for char, varchar, numeric)
scale:(optional) - numeric scale of the column (used for numeric only)
allows-null:(optional, default true) - whether the column allows nulls
auto-increment:(optional, default false) - whether the column is auto-incrementing, i.e. the DBMS will generate a sequential value for you
default-value:(optional) - a value that the DBMS will insert in the column for you if you don't provide a value. NB: You MUST provide your own single quotes around character values, e.g. default-value="'Some value'". This is so that you can use functions etc. as default values (in this case you don't include single quotes)
constraints:(optional) - used for standard (non-DBMS-specific) column constraints, i.e. those words that appear immediately after the column type and size declaration, e.g. 'CHECK()'. Note that it's more common to put constraints in a 'column-override' element, since they're commonly DBMS-specific
column description See definition for db-schema element
column condition See definition for schema-override element
column column-override

Defines a column override, which overrides the standard values for the column for one or more specified DBMSs. It's normally only required when you want to do something special, like identity columns, LOBs, raw types etc. Optional. Attributes:

dbms:(mandatory)
description:(optional)
type:(optional) - DBMS-specific type for this column, used to override the default type that xml2ddl would normally use for this DBMS
raw-type:(optional) - DBMS-specific type for this column. This will be inserted directly in the DDL, without being translated for the DBMS
length:(optional) - length of the column (used for char, varchar, numeric). Set to 0 to omit a length that was set in the 'column' element
allows-null:(optional) - whether the column allows nulls
scale:(optional) - numeric scale of the column (used for numeric only)
constraints:(optional) - used for DBMS-specific column constraints, which will be added immediately after the column type, size declaration, and any NULL constraints. This can be used to insert constraints such as 'IDENTITY' (mssql), DEFAULT etc.
column-override description See definition for db-schema element
column-override condition See definition for schema-override element
grant-templates grant-template

Defines a grant template that can be used to apply a pre-defined set of grants to a table. Zero or more. Attributes:

name:(mandatory) - the name by which this template is referenced from a grant-set
description:(optional)
grant-template description See definition for db-schema element
grant-template grant

Defines the grant of one or more actions for a single role. One or more required. Attributes:

dbms:(optional) - determines the DBMS(s) this grant applies to
description:(optional)
role:(mandatory) - the role to which the actions are to be granted
actions:(mandatory) - one or more values (comma-separated) from the set (select, insert, update, delete, references, all)
grant description See definition for db-schema element
grant condition See definition for schema-override element
sequence description See definition for db-schema element
sequence condition See definition for schema-override element
sequence schema-override See definition for database element
sequence grants Defines a set of action grants to a role for a sequence. Optional
grants grant Defines the grant of one or more actions for a single role on the sequence or table. See definition under grant-template
sequence grant-set

Pointer to a grant-template used to apply a set of grants together. Optional. Attributes:

template:(required) Uses the grant-template of this name to provide the set of grants to apply
table description See definition for db-schema element
table condition See definition for schema-override element
table schema-override See definition for database element
table columns Defines the set of columns in the table. Mandatory
columns column

Defines a column within the table. Documentation is generally as for the 'column' element above (in the 'template' section), with the following differences:

  • Attribute template (optional) - if included, uses the template with this name for default values. Selection order for attributes is:
    • Attribute on column-override element under this column (applying to the target DBMS), if any
    • Attribute on column-override element under the template (applying to the target DBMS), if any
    • Attribute on this element
    • Attribute on the template column
  • Name and type attributes are optional at each level, but must be found within the above search order. Type can be omitted if a raw-type is provided, but is still recommended.
table constraints Contains all primary key, unique and foreign key constraints for the table. Optional
constraints primary-key

Defines the primary key for a table. Optional. Attributes:

name:(optional, if omitted defaults to 'PK_<tableName>')
previous-name:(optional)
description:(optional)
primary-key description See definition for db-schema element
primary-key condition See definition for schema-override element
primary-key column-ref

Defines one of the columns making up a primary key. One or more, in order. Attributes:

name:(mandatory) - name of the referenced column in the table. Note that names provided by DBMS-specific overrides are not available here - every column must have a single name by which it is referenced
previous-name:(optional)
column-ref condition See definition for schema-override element
constraints unique

Defines a unique constraint applied to one or more columns in a table. Optional. Attributes:

dbms:(optional) - determines the DBMS(s) this constraint is used for
name:(optional, if omitted defaults to 'UN_<tableName>_<columnName>*')
previous-name:(optional)
description:(optional)
unique description See definition for db-schema element
unique condition See definition for schema-override element
unique column-ref Defines one of the columns making up a unique constraint. One or more, in order. Attributes as per column-ref in primary-key above
constraints foreign-key

Defines a foreign key relationship to another table. Zero or more. Attributes:

dbms:(optional) - determines the DBMS(s) this foreign key is used for
name:(optional, if omitted defaults to 'UN_<tableName>_<referencedTableName>')
previous-name:(optional)
description:(optional)
references:(required) - the name of the table this foreign key references
foreign-key description See definition for db-schema element
foreign-key condition See definition for schema-override element
foreign-key column-ref Defines one of the columns in this table that refers to the other table in the foreign key relationship. One or more, in order. Attributes as per column-ref in primary-key above
foreign-key referenced-column Defines one of the columns in the referenced table in the foreign key relationship. Only required if the foreign key doesn't relate to the primary key of the referenced table, but in that case you must have one such element for each column-ref, in the same order. Attributes as per column-ref in primary-key above
referenced-column condition See definition for schema-override element
table indexes Contains all indexes for the table. Optional
indexes index

Defines a single index on the table. One or more required. Attributes:

dbms:(optional) - determines the DBMS(s) this index applies to
name:(optional, if omitted defaults to 'IN_<tableName>_<columnName>*')
previous-name:(optional)
description:(optional)
unique:(optional, defaults to 'false') - whether this is a unique index
index description See definition for db-schema element
index condition See definition for schema-override element
index column-ref Defines one of the columns in this index. Zero or more, in order (at least one column-ref or expression is required). Attributes as per column-ref in primary-key above
index expression

Defines an SQL expression used in this index, for DBMSs that support expressions on indexes. For example, "upper(name)" or "name(30)". Zero or more, in order (at least one column-ref or expression is required).

text:(mandatory) - the text of the SQL expression
expression condition See definition for schema-override element
table storage-lines Defines a set of raw lines that are written after the closing bracket in the table create statement. Typically used for Oracle storage parameters. Optional
storage-lines storage-line

Defines a single raw line to be written after the closing bracket in the table create statement. Typically used for Oracle storage parameters. One or more, in order. Attributes:

dbms:(optional) - determines the DBMS(s) this storage line applies to
text:(required) - the line to write
storage-line description See definition for db-schema element
storage-line condition See definition for schema-override element
table grants Defines a set of action grants to a role for a table. Optional
grants grant Defines the grant of one or more actions for a single role on the table. See definition under grant-template
table grant-set

Pointer to a grant-template used to apply a set of grants together. Optional. Attributes:

template:(required) Uses the grant-template of this name to provide the set of grants to apply
db-schema view

Defines a view in the schema. Zero or more. Attributes:

name:(mandatory)
previous-name:(optional)
description:(optional)
dbms:(optional)
view description See definition for db-schema element
view condition See definition for schema-override element
view schema-override See definition for database element
view view-definition

Contains the actual definition of the view (as text within the element), for 1 or more DBMS. This should contain only the DML part of the definition, i.e. it should include neither 'CREATE VIEW xxx AS' nor the command terminator (;). One or more

dbms:(optional)
view grants See definition for table
view grant-set See definition for table

Column Type Mapping

The standard column type mapping is shown in the following table. Note that all these mappings can be over-ridden using column-override elements, with either the 'type' or 'raw-type' attributes.

Standard type DBMS DBMS type Notes
int default NUMERIC(9,0) Assumed to be a 32-bit signed integer
int postgres7, postgres8, sqlite INTEGER  
int mysql, mssql, hsql INT  
long default NUMERIC(18,0) Assumed to be a 64-bit signed integer
long postgres7, postgres8, mysql, mssql hsql, sqlite BIGINT  
short default NUMERIC(5,0) Assumed to be a 16-bit signed integer
short postgres7, postgres8 mysql, mssql hsql, sqlite SMALLINT  
tiny default NUMERIC(3,0) Assumed to be an 8-bit signed integer
tiny mysql TINYINT  
boolean default BOOLEAN  
boolean oracle CHAR(1)  
boolean mssql BIT  
varchar default VARCHAR Variable length character column
varchar oracle VARCHAR2  
char default CHAR Fixed length character column, space-padded
numeric default NUMERIC Arbitrary length and scale numeric
numeric oracle NUMBER  
datetime default TIMESTAMP  
datetime oracle TIMESTAMP Only works from Oracle 9 onwards
datetime mysql DATETIME  
datetime mssql DATETIME A word to the wise - MSSQL datetimes are rounded to the nearest 3 milliseconds (or thereabouts). This fact has shortened my lifespan by at least 3 years :{
date default DATE  
blob default BLOB Binary large object
blob postgres7, postgres8 BYTEA  
blob mysql LONGBLOB  
blob mssql IMAGE  
blob hsql LONGVARBINARY  
clob default CLOB Character large object
clob postgres7, postgres8, mssql, sqlite TEXT  
clob mysql LONGTEXT  
clob hsql LONGVARCHAR  

Wiki Markup

By using the -w option, xml2doc allows limited Wiki-style formatting to be included in descriptions.

Markup Accepted

Format Start End Example Result
Bold * * These *words are* in bold These words are in bold
Underlined _ _ And _this_ one is underlined And this one is underlined (I can't really show it here, but you know what underlining looks like)
Literal ` (i.e. a backtick) ` Useful for `codes` Useful for codes
Line break [[BR]] Nothing A line[[BR]]Another line
A line
Another line

Rules for Character Markup

The three character markup types (bold, underlined and literal) share the following rules:

  • The first format character must be preceded by white space (including a new line), or a literal line break ([[BR]])
  • The first format character must not be followed by white space
  • The second format character must not be preceded by white space.

In other words, the following are invalid:

White space _ after_ the first format character
White space `before ` the first second character

Markup Example

Here's a description, followed by what it will be turned into.

description="*Current* status of the project.  Values are:[[BR]]
        `SI`: Setup - initial[[BR]]
        `SD`: Setup - generating diffs[[BR]]
        `SF`: Setup - selecting files[[BR]]
        `RP`: Review - preparation[[BR]]
        `RM`: Review - meeting[[BR]]
        `C`:  Complete"

This will be output as:

Current status of the project. Values are
SI: Setup - initial
SD: Setup - generating diffs
SF: Setup - selecting files
RP: Review - preparation
RM: Review - meeting
C: Complete

Using Variables

Variables can be used for 2 purposes:

  • Controlling the generation of schema items, with conditions based on variable values
  • As replaceable tokens in description fields.

To use a variable, you must:

  • Define the variable, in the variables section of the XML schema. Definitions can also include valid values, and a default value
    • Note that the dbms variable is always provided - there's no need to define it
  • Optionally, provide a value for the variable at runtime. For command-line tools, this uses the --var name:value option syntax, while the Ant wrappers for these tools use the <mapargument name="variables"> element to define the values. If you defined a default value for the variable, there's no need to provide a value at runtime.

The elements that can be used for conditions are defined in the conditions section above.

All variables can be used in description attributes/elements, using the ${variableName} syntax. The DBMS for which the schema is being generated is also included as a virtual variable, accessible as ${dbms}.

Variable Example

Here's a simple example of a single variable used to control inclusion of an index. Note that the values are also included in the schema description, where they'll be included by xml2doc, and also that we're using Wiki markup.

Fragments of the XML schema:

<db-schema name="varexample">
    <description>
        A sample schema for showing off variables and schema variants.[[BR]]
        [[BR]]
        Generated for DBMS: *${dbms}* [[BR]]
        Schema variant:     *${schemaType}*
    </description>

    <variables>
        <variable name="schemaType" default="single" valid-values="single, master, slave"/>
    </variables>

    ...

    <table name="atable" ... >

        ...

        <indexes>
            <index name="IN_someindex" description="A big expensive index">
                <condition>
                    <and>
                        <if-value-in variable="schemaType" values="single, slave"/>
                        <not><if-value variable="dbms" value="hsql"/></not>
                    </and>
                </condition>

                ...

            </index>
        </indexes>
    </table>

    ...

</db-schema>

And you'd generate the master schema for your target DBMSs like this:

$ main/xml2ddl.py -d postgres8,oracle,mssql -tw --var schemaType:master schema.xml