JBoss.orgCommunity Documentation

Chapter 29. DDL File Sequencer

29.1. Example

The DDL file sequencer included in ModeShape is capable of parsing the more important DDL statements from SQL-92, Oracle, Derby, and PostgreSQL, and constructing a graph structure containing a structured representation of these statements. The resulting graph structure is largely the same for all dialects, though some dialects have non-standard additions to their grammar, and thus require dialect-specific additions to the graph structure.

The sequencer is designed to behave as intelligently as possible with as little configuration. Thus, the sequencer automatically determines the dialect used by a given DDL stream. This can be tricky, of course, since most dialects are very similar and the distinguishing features of a dialect may only be apparent in some of the statements.

To get around this, the sequencer uses a "best fit" algorithm: run the DDL stream through the parser for each of the dialects, and determine which parser was able to successfully read the greatest number of statements and tokens.

Note

It is possible to define which DDL dialects (or grammars) should be considered during sequencing using the "grammars" property in the sequencer configuration. Set the values of this property to the names of the grammars (e.g., "oracle", "postgres", "standard", or "derby"), specified in the order they should be used. To use a custom DDL parser not provided by ModeShape, simply provide the fully-qualified class name of the DdlParser implementation class.

One very interesting capability of this sequencer is that, although only a subset of the (more common) DDL statements are supported, the sequencer is still extremely functional since it does still add all statements into the output graph, just without much detail other than just the statement text and the position in the DDL file. Thus, if a DDL file contains statements the sequencer understands and statements the sequencer does not understand, the graph will still contain all statements, where those statements understood by the sequencer will have full detail. Since the underlying parsers are able to operate upon a single statement, it is possible to go back later (after the parsers have been enhanced to support additional DDL statements) and re-parse only those incomplete statements in the graph.

At this time, the sequencer supports SQL-92 standard DDL as well as dialects from Oracle, Derby, and PostgreSQL. It supports:

Note that the sequencer does not perform detailed parsing of SQL (i.e. SELECT, INSERT, UPDATE, etc....) statements.

Caution

The DDL sequencer is being included as a Technology Preview. It is fully functional for the dialects listed above, and may indeed work on certain DDL files that use other dialects. But we would like to have feedback from users, test against more DDL examples, support additional dialects, and support more kinds of DDL statements. As such, the output format and node types associated with the DefaultClassFileRecorder may change in future versions.

Sequencing results in graph nodes basically representing the BNF structure of each DDL statement. Below is an example DDL schema definition statement containing table and view definition statements.

CREATE SCHEMA hollywood
	  CREATE TABLE films (title varchar(255), release date, producerName varchar(255))
	  CREATE VIEW winners AS SELECT title, release FROM films WHERE producerName IS NOT NULL;

The resulting graph structure contains the raw statement expression, pertinent table, column and key reference information and position of the statement in the text stream (e.g., line number, column number and character index) so the statement can be tied back to the original DDL:



<nt:unstructured jcr:name="statements" 
                 jcr:mixinTypes = "mode:derived" 
                 mode:derivedAt="2011-05-13T13:12:03.925Z" 
                 mode:derivedFrom="/files/foo.sql"
                 ddl:parserId="POSTGRES">
  <nt:unstructured jcr:name="hollywood" jcr:mixinTypes="ddl:createSchemaStatement" 
                     ddl:startLineNumber="1"
                   ddl:startColumnNumber="1"
                   ddl:expression="CREATE SCHEMA hollywood"
                   ddl:startCharIndex="0">
    <nt:unstructured jcr:name="films" jcr:mixinTypes="ddl:createTableStatement"
                   ddl:startLineNumber="2"
                   ddl:startColumnNumber="5"
                   ddl:expression="CREATE TABLE films (title varchar(255), release date, producerName varchar(255))"
                   ddl:startCharIndex="28"/>
    <nt:unstructured jcr:name="title" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="VARCHAR"
                   ddl:datatypeLength="255"/>
    <nt:unstructured jcr:name="release" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="DATE"/>
    <nt:unstructured jcr:name="producerName" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="VARCHAR"
                   ddl:datatypeLength="255"/>
  <nt:unstructured jcr:name="winners" jcr:mixinTypes="ddl:createViewStatement"
                   ddl:startLineNumber="3"
                   ddl:startColumnNumber="5"
                   ddl:expression="CREATE VIEW winners AS SELECT title, release FROM films WHERE producerName IS NOT NULL;"
                   ddl:queryExpression="SELECT title, release FROM films WHERE producerName IS NOT NULL"
                   ddl:startCharIndex="113"/>
</nt:unstructured>

Note that all nodes are of type nt:unstructured while the type of statement is identified using mixins. Also, each of the nodes representing a statement contain: a ddl:expression property with the exact statement as it appeared in the original DDL stream; a ddl:startLineNumber and ddl:startColumnNumber property defining the position in the original DDL stream of the first character in the expression; and a ddl:startCharIndex property that defines the integral index of the first character in the expression as found in the DDL stream. All of these properties make sure the statement can be traced back to its location in the original DDL.

To use this sequencer, simply include the modeshape-sequencer-ddl JAR in your application and configure the JcrConfiguration to use this sequencer using something similar to:



JcrConfiguration config = ...
config.sequencer("DDL Sequencer")
      .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
      .loadedFromClasspath()
      .setDescription("Sequences DDL files to extract individual statements and accompanying statement properties and values")
      .sequencingFrom("//(*.(ddl)[*])/jcr:content[@jcr:data]")
      .andOutputtingTo("/ddls/$1"); 

This will use all of the built-in grammars (e.g., "standard", "oracle", "postgres", and "derby"). To specify a different order or subset of the grammars, use the setProperty(...) method. Here's an example that just uses the standard grammar followed by the PostgreSQL grammar:



config.sequencer("DDL Sequencer")
      .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
      .loadedFromClasspath()
      .setDescription("Sequences DDL files to extract individual statements and accompanying statement properties and values")
      .setProperty("grammar","standard","postgres")
      .sequencingFrom("//(*.(ddl)[*])/jcr:content[@jcr:data]")
      .andOutputtingTo("/ddls/$1"); 

And, to use a custom implementation of DdlParser, simply use the fully-qualified name of the implementation class (which must have a no-arg constructor) as the name of the grammar:



config.sequencer("DDL Sequencer")
      .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
      .loadedFromClasspath()
      .setDescription("Sequences DDL files to extract individual statements and accompanying statement properties and values")
      .setProperty("grammar","standard","postgres","org.example.ddl.MyCustomDdlParser")
      .sequencingFrom("//(*.(ddl)[*])/jcr:content[@jcr:data]")
      .andOutputtingTo("/ddls/$1");