Chapter 4. Using JBoss DNA for Sequencing

As we've mentioned before, JBoss DNA is able to work with existing JCR repositories. Your client applications make changes to the information in those repositories, and JBoss DNA automatically uses its sequencers to extract additional information from the uploaded files.

Note

Configuring JBoss DNA services is a bit more manual than is ideal. As you'll see, JBoss DNA uses dependency injection to allow a great deal of flexibility in how it can be configured and customized. But this flexibility makes it more difficult for you to use. We understand this, and will soon provide a much easier way to set up and manage JBoss DNA. Current plans are to use the JBoss Microcontainer along with a configuration repository.

4.1. Configuring the Sequencing Service

The JBoss DNA sequencing service is the component that manages the sequencers, reacting to changes in JCR repositories and then running the appropriate sequencers. This involves processing the changes on a node, determining which (if any) sequencers should be run on that node, and for each sequencer constructing the execution environment, calling the sequencer, and saving the information generated by the sequencer.

To set up the sequencing service, an instance is created, and dependent components are injected into the object. This includes among other things:

An execution context that defines the context in which the service runs, including a factory for JCR sessions given names of the repository and workspace. This factory must be configured, and is how JBoss DNA knows about your JCR repositories and how to connect to them. More on this a bit later.
A java.util.concurrent.ExecutorService used to execute the sequencing activities. If none is supplied, a new single-threaded executor is created by calling Executors.newSingleThreadExecutor(). (This can easily be changed by subclassing and overriding the SequencingService.createDefaultExecutorService() method.)
Filters for sequencers and events. By default, all sequencers are considered for "node added", "property added" and "property changed" events.
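
To make the effect of the executor choice concrete, here is a minimal sketch that uses only java.util.concurrent. It does not touch the SequencingService itself: the class name ExecutorChoiceDemo and the runTasks helper are hypothetical, and the numbered tasks merely stand in for sequencing activities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only illustrates how the executor choice affects task execution.
class ExecutorChoiceDemo {

    // Submits 'count' stand-in tasks and returns their ids in the order in which they ran.
    static List<Integer> runTasks(ExecutorService executor, int count) {
        List<Integer> completed = new CopyOnWriteArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            executor.execute(() -> completed.add(id));
        }
        executor.shutdown(); // accept no further tasks, but let the submitted ones finish
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        // A single-threaded executor runs tasks one at a time, in submission order.
        System.out.println(runTasks(Executors.newSingleThreadExecutor(), 5)); // [0, 1, 2, 3, 4]
        // A pool runs tasks concurrently, so only the number of completed tasks is predictable.
        System.out.println(runTasks(Executors.newFixedThreadPool(4), 5).size()); // 5
    }
}
```

With the default single-threaded executor, work is processed strictly one item at a time. Supplying a thread pool trades that ordering for throughput.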

As mentioned above, the JcrExecutionContext provides access to a SessionFactory that is used by JBoss DNA to establish sessions to your JCR repositories. Two implementations are available:

The JndiSessionFactory looks up JCR Repository instances in JNDI using names that are supplied when creating sessions. This implementation also has methods to set the JCR Credentials for a given workspace name.
The SimpleSessionFactory has methods to register the JCR Repository instances with names, as well as methods to set the JCR Credentials for a given workspace name.

You can use the SimpleExecutionContext implementation of ExecutionContext and supply a SessionFactory instance, or you can provide your own implementation.

Here's an example of how to instantiate and configure the SequencingService:



SimpleSessionFactory sessionFactory = new SimpleSessionFactory();

sessionFactory.registerRepository("Repository", this.repository);

Credentials credentials = new SimpleCredentials("jsmith", "secret".toCharArray());

sessionFactory.registerCredentials("Repository/Workspace1", credentials);

JcrExecutionContext context = new JcrExecutionContext(sessionFactory,"Repository/Workspace1");


// Create the sequencing service, passing in the execution context ...

SequencingService sequencingService = new SequencingService();

sequencingService.setExecutionContext(context);

After the sequencing service is created and configured, it must be started. The SequencingService has an administration object (an instance of ServiceAdministrator) with start(), pause(), and shutdown() methods. The shutdown() method closes the queue for sequencing but allows sequencing operations already running to complete normally. To wait until all sequencing operations have completed, call the awaitTermination method and pass it the maximum amount of time you want to wait.

sequencingService.getAdministrator().start();

The sequencing service must also be configured with the sequencers that it will use. This is done using the addSequencer(SequencerConfig) method and passing a SequencerConfig instance that you create. Here's the code that defines 3 sequencer configurations: 1 that places image metadata into "/images/<filename>", another that places MP3 metadata into "/mp3s/<filename>", and a third that places a structure that represents the classes, methods, and attributes found within Java source into "/java/<filename>".



String name = "Image Sequencer";

String desc = "Sequences image files to extract the characteristics of the image";

String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";

String[] classpath = null; // Use the current classpath

String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1"};

SequencerConfig imageSequencerConfig = new SequencerConfig(name, desc, classname, 

                                                           classpath, pathExpressions);

sequencingService.addSequencer(imageSequencerConfig);


name = "MP3 Sequencer";

desc = "Sequences MP3 files to extract the ID3 tags from the audio file";

classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";

pathExpressions = new String[] {"//(*.mp3[*])/jcr:content[@jcr:data] => /mp3s/$1"};

SequencerConfig mp3SequencerConfig = new SequencerConfig(name, desc, classname, 

                                                         classpath, pathExpressions);

sequencingService.addSequencer(mp3SequencerConfig);


name = "Java Sequencer";

desc = "Sequences java files to extract the characteristics of the Java source";

classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";

pathExpressions = new String[] {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};

SequencerConfig javaSequencerConfig = new SequencerConfig(name, desc, classname, 

                                                          classpath, pathExpressions);

sequencingService.addSequencer(javaSequencerConfig);

Each configuration defines several things, including the name, description, and sequencer implementation class. The configuration also defines the classpath information, which can be passed to the ExecutionContext to get a Java classloader with which the sequencer class can be loaded. (If no classpath information is provided, as is done in the code above, the application class loader is used.) The configuration also specifies the path expressions that identify the nodes that should be sequenced with the sequencer and where to store the output generated by the sequencer. Path expressions are pretty straightforward but are quite powerful, so before we go any further with the example, let's dive into path expressions in more detail.
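
The class-loading step described above can be sketched with plain Java reflection. This is only an illustration under stated assumptions: the helper name instantiate is hypothetical, the class loader of the demo class stands in for "the application class loader", and java.util.ArrayList stands in for a sequencer class so that the sketch runs without JBoss DNA on the classpath.

```java
// Hypothetical demo class; it is not part of JBoss DNA.
class SequencerLoadingDemo {

    // Instantiates the named class using the supplied class loader. When no loader
    // is supplied, the loader of this demo class is used as a stand-in for the
    // application class loader.
    static Object instantiate(String className, ClassLoader loader) {
        ClassLoader effective = (loader != null) ? loader : SequencerLoadingDemo.class.getClassLoader();
        try {
            Class<?> clazz = Class.forName(className, true, effective);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unable to load " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is only a placeholder for a sequencer class name.
        Object instance = instantiate("java.util.ArrayList", null);
        System.out.println(instance.getClass().getName()); // java.util.ArrayList
    }
}
```

Because the configuration holds only a class name and optional classpath information, the sequencer implementation does not have to be known at compile time.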

4.1.1. Path Expressions

Path expressions consist of two parts: selection criteria (the input path) and an output path:

  inputPath => outputPath

The inputPath part defines an expression for the path of a node that is to be sequenced. Input paths consist of '/' separated segments, where each segment represents a pattern for a single node's name (including the same-name-sibling indexes) and '@' signifies a property name.

Let's first look at some simple examples:

Table 4.1. Simple Input Path Examples

Input Path	Description
/a/b	Match node "b" that is a child of the top-level node "a". Neither node may have any same-name-siblings.
/a/*	Match any child node of the top-level node "a".
/a/*.txt	Match any child node of the top-level node "a" that also has a name ending in ".txt".
/a/b@c	Match the property "c" of node "/a/b".
/a/b[2]	The second child named "b" below the top-level node "a".
/a/b[2,3,4]	The second, third or fourth child named "b" below the top-level node "a".
/a/b[*]	Any (and every) child named "b" below the top-level node "a".
//a/b	Any node named "b" that exists below a node named "a", regardless of where node "a" occurs. Again, neither node may have any same-name-siblings.

With these simple examples, you can probably discern the most important rules. First, the '*' is a wildcard character that matches any character or sequence of characters in a node's name (or index if appearing in between square brackets), and can be used in conjunction with other characters (e.g., "*.txt").

Second, square brackets (i.e., '[' and ']') are used to match a node's same-name-sibling index. You can put a single non-negative number or a comma-separated list of non-negative numbers. Use '0' to match a node that has no same-name-siblings, or any positive number to match the specific same-name-sibling.

Third, combining two delimiters (e.g., "//") matches any sequence of nodes, regardless of what their names are or how many nodes there are. This is often used together with other patterns to identify matching nodes at any level. Three or more sequential slash characters are treated as two.
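
As a rough illustration of the first and third rules, the following sketch translates a simplified input path into a java.util.regex pattern. It is an approximation for explanatory purposes and not how JBoss DNA evaluates path expressions: the class InputPathSketch and its methods are hypothetical, and same-name-sibling indexes, properties, and parentheses are deliberately not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper covering only '*' wildcards and '//' in input paths.
class InputPathSketch {

    // Translates a simplified input path into an equivalent regular expression.
    static Pattern toRegex(String inputPath) {
        StringBuilder regex = new StringBuilder("^");
        int i = 0;
        while (i < inputPath.length()) {
            char c = inputPath.charAt(i);
            if (c == '/') {
                // Measure the run of slashes; two or more are treated as "//".
                int j = i;
                while (j < inputPath.length() && inputPath.charAt(j) == '/') j++;
                regex.append(j - i >= 2 ? "(?:/[^/]+)*/" : "/");
                i = j;
            } else if (c == '*') {
                regex.append("[^/]*"); // any characters within a single node name
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
                i++;
            }
        }
        return Pattern.compile(regex.append("$").toString());
    }

    // Returns true if the node path satisfies the simplified input path.
    static boolean matches(String inputPath, String nodePath) {
        return toRegex(inputPath).matcher(nodePath).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/a/*.txt", "/a/notes.txt"));   // true
        System.out.println(matches("/a/*.txt", "/a/b/notes.txt")); // false: '*' stays within one segment
        System.out.println(matches("//a/b", "/x/y/a/b"));          // true: "//" spans any number of nodes
        System.out.println(matches("///a/b", "/a/b"));             // true: three slashes behave like two
    }
}
```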

Many input paths can be created using just these simple rules. However, input paths can be more complicated. Here are some more examples:

Table 4.2. More Complex Input Path Examples

Input Path	Description
/a/(b|c|d)	Match children of the top-level node "a" that are named "b", "c" or "d". None of the nodes may have same-name-sibling indexes.
/a/b[c/d]	Match node "b" child of the top-level node "a", when node "b" has a child named "c", and "c" has a child named "d". Node "b" is the selected node, while nodes "c" and "d" are used as criteria but are not selected.
/a(/(b|c|d|)/e)[f/g/@something]	Match node "/a/b/e", "/a/c/e", "/a/d/e", or "/a/e" when they also have a child "f" that itself has a child "g" with property "something". None of the nodes may have same-name-sibling indexes.

These examples show a few more advanced rules. Parentheses (i.e., '(' and ')') can be used to define a set of options for names, as shown in the first and third examples. Whatever part of the selected node's path appears between the parentheses is captured for use within the output path. Thus, the first input path in the previous table would match node "/a/b", and "b" would be captured and could be used within the output path using "$1", where the number used in the output path identifies the parentheses.

Square brackets can also be used to specify criteria on a node's properties or children. Whatever appears in between the square brackets does not appear in the selected node.

Let's go back to the previous code fragment and look at the first path expression:

  //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1

This matches a node named "jcr:content" with property "jcr:data" but no siblings with the same name, and that is a child of a node whose name ends with ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png" that may have any same-name-sibling index. These nodes can appear at any level in the repository. Note how the input path captures the filename (the segment containing the file extension), including any same-name-sibling index. This filename is then used in the output path, which is where the sequenced content is placed.
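
The capture-and-substitute behavior just described resembles capture groups in regular expressions. The sketch below approximates the image path expression with java.util.regex to show how the captured filename ends up in the output path. It is an analogy rather than the JBoss DNA implementation: the class and method names are hypothetical, the "[@jcr:data]" criterion is omitted because a plain path string carries no property information, and the inner group of extensions is made non-capturing to keep the example focused on "$1".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; a regex analogy for one path expression.
class ImagePathExpressionDemo {

    // Regex approximation (an assumption made for illustration) of the input path
    //   //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content
    private static final Pattern INPUT = Pattern.compile(
        "^(?:/[^/]+)*/([^/]*\\.(?:jpg|jpeg|gif|bmp|pcx|png)(?:\\[\\d+\\])?)/jcr:content$");

    // Returns the output path "/images/$1" for a matching node path, or null if there is no match.
    static String outputPathFor(String nodePath) {
        Matcher matcher = INPUT.matcher(nodePath);
        return matcher.matches() ? "/images/" + matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(outputPathFor("/files/vacation/photo.jpg/jcr:content")); // /images/photo.jpg
        System.out.println(outputPathFor("/files/logo.png[2]/jcr:content"));        // /images/logo.png[2]
        System.out.println(outputPathFor("/files/notes.txt/jcr:content"));          // null
    }
}
```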

4.1.2. Path Expressions Used in the Example

Now that we've covered path expressions, let's go back to the three sequencer configurations in the example. Here they are again, with a description of what each path means:

Table 4.3. Path Expressions for the 3 Sequencers

Input Path	Output Path	Description
//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data]	/images/$1	Any node with a name ending in ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/images/<filename>".
//(*.mp3[*])/jcr:content[@jcr:data]	/mp3s/$1	Any node with a name ending in ".mp3", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/mp3s/<filename>".
//(*.java[*])/jcr:content[@jcr:data]	/java/$1	Any node with a name ending in ".java", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/java/<filename>".

After these sequencer configurations are defined and added to the SequencingService, the service is now ready to start reacting to changes in the repository and automatically looking for nodes to sequence. But we first need to wire the service into the repository to receive those change events. This is accomplished using the ObservationService described in the next section.

4.2. Configuring the Observation Service

The JBoss DNA ObservationService is responsible for listening to one or more JCR repositories and multiplexing the events to its listeners. Unlike JCR events, this framework embeds in the events the name of the repository and workspace that can be passed to a SessionFactory to obtain a session to the repository in which the change occurred. This simple design makes it very easy for JBoss DNA to concurrently work with multiple JCR repositories.
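
The multiplexing idea can be sketched in a few lines of plain Java. Everything in this sketch is hypothetical (ChangeEvent, ChangeListener, and Multiplexer are not JBoss DNA types). It only shows why embedding the "repository/workspace" name in each event lets a single listener handle changes from several repositories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; none of these types belong to JBoss DNA.
class EventMultiplexingDemo {

    // An event that carries the name of the repository and workspace in which the change occurred.
    static final class ChangeEvent {
        final String repositoryWorkspaceName;
        final String path;

        ChangeEvent(String repositoryWorkspaceName, String path) {
            this.repositoryWorkspaceName = repositoryWorkspaceName;
            this.path = path;
        }
    }

    interface ChangeListener {
        void onChange(ChangeEvent event);
    }

    // Forwards the changes observed in any repository to every registered listener.
    static final class Multiplexer {
        private final List<ChangeListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(ChangeListener listener) {
            listeners.add(listener);
        }

        void publish(String repositoryWorkspaceName, String path) {
            ChangeEvent event = new ChangeEvent(repositoryWorkspaceName, path);
            for (ChangeListener listener : listeners) {
                listener.onChange(event);
            }
        }
    }

    // Publishes changes from two made-up repositories and returns what one listener saw.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Multiplexer multiplexer = new Multiplexer();
        multiplexer.addListener(event -> seen.add(event.repositoryWorkspaceName + " -> " + event.path));
        multiplexer.publish("RepoA/Workspace1", "/files/photo.jpg");
        multiplexer.publish("RepoB/default", "/music/song.mp3");
        return seen;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

A listener that receives such an event has everything it needs to ask a session factory for a session to the right repository and workspace.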

Configuring an observation service is pretty easy, especially if you reuse the same SessionFactory supplied to the sequencing service. Here's an example:



  this.observationService = new ObservationService(sessionFactory);

  this.observationService.getAdministrator().start();

Note

Both ObservationService and SequencingService implement AdministeredService, which has a ServiceAdministrator used to start, pause, and shut down the service. In other words, the lifecycles of the services are managed in the same way.

After the observation service is started, listeners can be added. The SequencingService implements the required interface, and so it may be registered directly:



  observationService.addListener(sequencingService);

Finally, the observation service must be wired to monitor one of your JCR repositories. This is done with one of the monitor(...) methods:



  int eventTypes = Event.NODE_ADDED | Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED;

  observationService.monitor("Repository/Workspace1", eventTypes);

At this point, the observation service is listening to a JCR repository and forwarding the appropriate events to the sequencing service, which will asynchronously process the changes and sequence the information added to or changed in the repository.
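
The eventTypes argument passed to monitor(...) is a bit mask. The sketch below shows how such a mask is combined and tested. The constants are redeclared locally, with the values that javax.jcr.observation.Event defines in JCR 1.0, only so that the example runs without the JCR API on the classpath. The class name and the isMonitored helper are hypothetical.

```java
// Hypothetical demo class for event-type bit masks.
class EventTypeMaskDemo {

    // Local copies of the javax.jcr.observation.Event constants (JCR 1.0 values).
    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 2;
    static final int PROPERTY_ADDED = 4;
    static final int PROPERTY_REMOVED = 8;
    static final int PROPERTY_CHANGED = 16;

    // Returns true if the given event type is included in the mask.
    static boolean isMonitored(int mask, int eventType) {
        return (mask & eventType) != 0;
    }

    public static void main(String[] args) {
        int eventTypes = NODE_ADDED | PROPERTY_ADDED | PROPERTY_CHANGED;
        System.out.println(eventTypes);                                // 21
        System.out.println(isMonitored(eventTypes, PROPERTY_CHANGED)); // true
        System.out.println(isMonitored(eventTypes, NODE_REMOVED));     // false
    }
}
```

Removal events are left out of the mask in the example above, which fits the sequencing use case: there is nothing to sequence once a node or property is gone.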

4.3. Shutting down JBoss DNA services

The JBoss DNA services use resources and threads that must be released before your application shuts down. The safe way to do this is to obtain the ServiceAdministrator for each service (via the getAdministrator() method) and call shutdown(). As previously mentioned, the shutdown method simply prevents new work from being processed and does not wait for existing work to be completed. If you want to wait until the service completes all its work, you must wait until the service terminates. Here's an example that shows how this is done:



// Shut down the service and wait until it's all shut down ...

sequencingService.getAdministrator().shutdown();

sequencingService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);


// Shut down the observation service ...

observationService.getAdministrator().shutdown();

observationService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
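
The difference between shutting down and waiting for termination mirrors the behavior of java.util.concurrent.ExecutorService, which makes for a convenient analogy. The following sketch uses only the JDK and says nothing about how ServiceAdministrator is implemented. The class name and the demo method are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; an analogy built on ExecutorService.
class ShutdownSemanticsDemo {

    // Returns true if, after shutdown(), new work was rejected while the task
    // that was already running still completed before termination.
    static boolean demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean(false);

        executor.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(200); // simulate work that is still in progress at shutdown
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finished.set(true);
        });

        try {
            started.await();
            executor.shutdown(); // stop accepting work; running work continues

            boolean rejected = false;
            try {
                executor.execute(() -> { });
            } catch (RejectedExecutionException e) {
                rejected = true; // new work is refused after shutdown
            }

            // Block until the in-flight task has finished, or until the timeout elapses.
            boolean terminated = executor.awaitTermination(5, TimeUnit.SECONDS);
            return rejected && terminated && finished.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

Note that awaitTermination returns a boolean in the ExecutorService API. Checking it tells you whether the work really finished or the timeout expired first.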

At this point, we've covered how to configure and use the JBoss DNA services in your application. The next section goes back to the sample application to show how all these pieces fit together.

4.4. Reviewing the example application

Recall that the example application consists of a client application that sets up an in-memory JCR repository and that allows a user to upload files into that repository. The client also sets up the DNA services with an image sequencer so that if any of the uploaded files are PNG, JPEG, GIF, BMP or other images, DNA will automatically extract the image's metadata (e.g., image format, physical size, pixel density, etc.) and store that in the repository. Or, if the client uploads MP3 audio files, the title, author, album, year, and comment are extracted from the audio file and stored in the repository.

The example consists of 5 classes and 1 interface, located in the src/main/java directory:

  org/jboss/example/dna/sequencers/ConsoleInput.java
                                  /ContentInfo.java
                                  /JavaInfo.java
                                  /MediaInfo.java
                                  /SequencingClient.java
                                  /UserInterface.java

SequencingClient is the class that contains the main application. ContentInfo is a simple class that encapsulates metadata generated by the sequencers and accessed by this example application, and there are two subclasses: MediaInfo encapsulates metadata about media (image and MP3) files, while JavaInfo encapsulates information about a Java class. The client accesses the content from the repository, represents the information using instances of ContentInfo (and its subclasses), and then passes them to the UserInterface. UserInterface is an interface with methods that will be called at runtime to request data from the user. ConsoleInput is an implementation of this interface that creates a text user interface, allowing the user to operate the client from the command line. We can easily create a graphical implementation of UserInterface at a later date. We can also create a mock implementation for testing purposes that simulates a user entering data. This allows us to check the behavior of the client automatically using conventional JUnit test cases, as demonstrated by the code in the src/test/java directory:

  org/jboss/example/dna/sequencers/SequencingClientTest.java
                                  /MockUserInterface.java
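
To show why hiding user interaction behind an interface makes the client testable, here is a minimal sketch of the pattern. All names in it are hypothetical: the real UserInterface and MockUserInterface in the example may declare different methods, and both promptForFilePath() and the "/files/" target location are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical demo class; it does not reproduce the example's actual types.
class MockUserInterfaceDemo {

    // Stand-in for the example's UserInterface; the real interface's methods may differ.
    interface UserInterface {
        String promptForFilePath();
    }

    // A mock that replays scripted answers instead of asking a real user.
    static final class ScriptedUserInterface implements UserInterface {
        private final Deque<String> answers;

        ScriptedUserInterface(String... answers) {
            this.answers = new ArrayDeque<>(Arrays.asList(answers));
        }

        @Override
        public String promptForFilePath() {
            String next = answers.poll();
            if (next == null) throw new IllegalStateException("No scripted answer left");
            return next;
        }
    }

    // Made-up client logic under test: asks the UI for a file and derives where to upload it.
    static String uploadTargetFor(UserInterface ui) {
        String path = ui.promptForFilePath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return "/files/" + fileName;
    }

    public static void main(String[] args) {
        UserInterface ui = new ScriptedUserInterface("/tmp/caution.png");
        System.out.println(uploadTargetFor(ui)); // /files/caution.png
    }
}
```

Because the client depends only on the interface, a test can drive it with scripted input and assert on the result without any console interaction.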

If we look at the SequencingClient code, there are a handful of methods that encapsulate the various activities.

Note

Some of the code samples included in this book have had some of the error handling and comments removed so that the samples are more readable and concise.

The startRepository() method starts up an in-memory Jackrabbit JCR repository. The bulk of this method is simply gathering and passing the information required by Jackrabbit. Because Jackrabbit's TransientRepository implementation shuts down after the last session is closed, the application maintains a session to ensure that the repository remains open throughout the application's lifetime. And finally, the node type needed by the image sequencer is registered with Jackrabbit.



public void startRepository() throws Exception {

    if (this.repository == null) {

        try {


            // Load the Jackrabbit configuration ...

            File configFile = new File(this.jackrabbitConfigPath);

            String pathToConfig = configFile.getAbsolutePath();


            // Find the directory where the Jackrabbit repository data will be stored ...

            File workingDirectory = new File(this.workingDirectory);

            String workingDirectoryPath = workingDirectory.getAbsolutePath();


            // Get the Jackrabbit custom node definition (CND) file ...

            URL cndFile = Thread.currentThread().getContextClassLoader().getResource("jackrabbitNodeTypes.cnd");


            // Create the Jackrabbit repository instance and establish a session to keep the repository alive ...

            this.repository = new TransientRepository(pathToConfig, workingDirectoryPath);

            if (this.username != null) {

                Credentials credentials = new SimpleCredentials(this.username, this.password);

                this.keepAliveSession = this.repository.login(credentials, this.workspaceName);

            } else {

                this.keepAliveSession = this.repository.login();

            }


            try {

                // Register the node types (only valid the first time) ...

                Workspace workspace = this.keepAliveSession.getWorkspace();

                JackrabbitNodeTypeManager mgr = (JackrabbitNodeTypeManager)workspace.getNodeTypeManager();

                mgr.registerNodeTypes(cndFile.openStream(), JackrabbitNodeTypeManager.TEXT_X_JCR_CND);

            } catch (RepositoryException e) {

                if (!e.getMessage().contains("already exists")) throw e;

            }


        } catch (Exception e) {

            this.repository = null;

            this.keepAliveSession = null;

            throw e;

        }

    }

}

As you can see, this method really has nothing to do with JBoss DNA, other than setting up a JCR repository that JBoss DNA will use.

The shutdownRepository() method shuts down the Jackrabbit transient repository by closing the "keep-alive session". Again, this method really does nothing specifically with JBoss DNA, but is needed to manage the JCR repository that JBoss DNA uses.



public void shutdownRepository() throws Exception {

    if (this.repository != null) {

        try {

            this.keepAliveSession.logout();

        } finally {

            this.repository = null;

            this.keepAliveSession = null;

        }

    }

}

The startDnaServices() method first starts the JCR repository (if it was not already started), and proceeds to create and configure the SequencingService as described earlier. This involves setting up the SessionFactory and ExecutionContext, creating the SequencingService instance, and configuring the image, MP3, and Java sequencers. The method then continues by setting up the ObservationService as described earlier and starting the service.



public void startDnaServices() throws Exception {

    if (this.repository == null) this.startRepository();

    if (this.sequencingService == null) {


        SimpleSessionFactory sessionFactory = new SimpleSessionFactory();

        sessionFactory.registerRepository(this.repositoryName, this.repository);

        if (this.username != null) {

            Credentials credentials = new SimpleCredentials(this.username, this.password);

            sessionFactory.registerCredentials(this.repositoryName + "/" + this.workspaceName, credentials);

        }

        this.executionContext = new JcrExecutionContext(sessionFactory, this.repositoryName + "/" + this.workspaceName);


        // Create the sequencing service, passing in the execution context ...

        this.sequencingService = new SequencingService();

        this.sequencingService.setExecutionContext(executionContext);


        // Configure the sequencers.

        String name = "Image Sequencer";

        String desc = "Sequences image files to extract the characteristics of the image";

        String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";

        String[] classpath = null; // Use the current classpath

        String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd))[*]/jcr:content[@jcr:data] => /images/$1"};

        SequencerConfig imageSequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);

        this.sequencingService.addSequencer(imageSequencerConfig);


        // Set up the MP3 sequencer ...

        name = "Mp3 Sequencer";

        desc = "Sequences mp3 files to extract the id3 tags of the audio file";

        classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";

        pathExpressions = new String[] {"//(*.mp3)[*]/jcr:content[@jcr:data] => /mp3s/$1"};

        SequencerConfig mp3SequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);

        this.sequencingService.addSequencer(mp3SequencerConfig);


        // Set up the Java sequencer ...

        name = "Java Sequencer";

        desc = "Sequences java files to extract the characteristics of the Java source";

        classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";

        pathExpressions = new String[] {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};

        SequencerConfig javaSequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);

        this.sequencingService.addSequencer(javaSequencerConfig);


        // Use the DNA observation service to listen to the JCR repository (or multiple ones), and

        // then register the sequencing service as a listener to this observation service...

        this.observationService = new ObservationService(this.executionContext.getSessionFactory());

        this.observationService.getAdministrator().start();

        this.observationService.addListener(this.sequencingService);

        this.observationService.monitor(this.repositoryName + "/" + this.workspaceName, Event.NODE_ADDED | Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED);

    }

    // Start up the sequencing service ...

    this.sequencingService.getAdministrator().start();

}

The shutdownDnaServices() method is pretty straightforward: it just calls shutdown on each of the services and waits until they terminate.



public void shutdownDnaServices() throws Exception {

    if (this.sequencingService == null) return;


    // Shut down the service and wait until it's all shut down ...

    this.sequencingService.getAdministrator().shutdown();

    this.sequencingService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);


    // Shut down the observation service ...

    this.observationService.getAdministrator().shutdown();

    this.observationService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);

}

None of the other methods really do anything with JBoss DNA per se. Instead, they merely work with the repository using the JCR API.

The main method of the SequencingClient class creates a SequencingClient instance, and passes a new ConsoleInput instance:



public static void main( String[] args ) throws Exception {

    SequencingClient client = new SequencingClient();

    client.setRepositoryInformation("repo", "default", "jsmith", "secret".toCharArray());

    client.setUserInterface(new ConsoleInput(client));

}

If we look at the ConsoleInput constructor, it starts the repository, the DNA services, and a thread for the user interface. At this point, the constructor returns, but the main application continues under the user interface thread. When the user requests to quit, the user interface thread also shuts down the DNA services and JCR repository.



public ConsoleInput( final SequencingClient client ) {

  try {

      client.startRepository();

      client.startDnaServices();

  

      System.out.println(getMenu());

      Thread eventThread = new Thread(new Runnable() {

          private boolean quit = false;

          public void run() {

              try {

                  while (!quit) {

                      // Display the prompt and process the requested operation ...

                  }

              } finally {

                  try {

                      // Terminate ...

                      client.shutdownDnaServices();

                      client.shutdownRepository();

                  } catch (Exception err) {

                      System.out.println("Error shutting down sequencing service and repository: " 

                                         + err.getLocalizedMessage());

                      err.printStackTrace(System.err);

                  }

              }

          }

      });

      eventThread.start();

  } catch (Exception err) {

      System.out.println("Error: " + err.getLocalizedMessage());

      err.printStackTrace(System.err);

  }

}

At this point, we've reviewed all of the interesting code in the example application. However, feel free to play with the application, trying different things.

4.5. Summarizing what we just did

In this chapter we covered the different JBoss DNA components used for automatically sequencing a variety of types of information, and how those components can be used in your application. Specifically, we described how the SequencingService and ObservationService can be configured and used. And we ended the chapter by reviewing the example application, which uses not only JBoss DNA but also the repository via the JCR API.