As we've mentioned before, JBoss DNA is able to work with existing JCR repositories. Your client applications make changes to the information in those repositories, and JBoss DNA automatically uses its sequencers to extract additional information from the uploaded files.
Configuring JBoss DNA services is a bit more manual than is ideal. As you'll see, JBoss DNA uses dependency injection to allow a great deal of flexibility in how it can be configured and customized. But this flexibility makes it more difficult for you to use. We understand this, and will soon provide a much easier way to set up and manage JBoss DNA. Current plans are to use the JBoss Microcontainer along with a configuration repository.
The JBoss DNA sequencing service is the component that manages the sequencers, reacting to changes in JCR repositories and then running the appropriate sequencers. This involves processing the changes on a node, determining which (if any) sequencers should be run on that node, and for each sequencer constructing the execution environment, calling the sequencer, and saving the information generated by the sequencer.
To set up the sequencing service, an instance is created, and dependent components are injected into the object. This includes among other things:
An execution context that defines the context in which the service runs, including a factory for JCR sessions given names of the repository and workspace. This factory must be configured, and is how JBoss DNA knows about your JCR repositories and how to connect to them. More on this a bit later.
A java.util.concurrent.ExecutorService used to execute the sequencing activities. If none is supplied, a new single-threaded executor is created by calling Executors.newSingleThreadExecutor(). (This can easily be changed by subclassing and overriding the SequencingService.createDefaultExecutorService() method.)
Filters for sequencers and events. By default, all sequencers are considered for "node added", "property added" and "property changed" events.
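The default-executor behavior described in the list above can be pictured with plain java.util.concurrent classes. The following is only a minimal sketch and not JBoss DNA code: ExecutorHolder, PooledExecutorHolder, and the pool size of 4 are hypothetical, chosen to show how an overridable factory method lets a subclass replace the single-threaded default.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a service that accepts an optional executor.
class ExecutorHolder {
    private ExecutorService executor;

    // Inject an executor; leaving it unset means "use the default".
    void setExecutorService(ExecutorService executor) {
        this.executor = executor;
    }

    // Subclasses can override this method to change the default executor.
    protected ExecutorService createDefaultExecutorService() {
        return Executors.newSingleThreadExecutor();
    }

    // Lazily falls back to the default when nothing was injected.
    ExecutorService getExecutorService() {
        if (executor == null) executor = createDefaultExecutorService();
        return executor;
    }
}

// Hypothetical subclass that swaps the default for a fixed pool of 4 threads.
class PooledExecutorHolder extends ExecutorHolder {
    @Override
    protected ExecutorService createDefaultExecutorService() {
        return Executors.newFixedThreadPool(4);
    }
}
```

A single-threaded default processes work strictly in order, which is simple to reason about; a pool trades that ordering for throughput.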
As mentioned above, the ExecutionContext provides access to a SessionFactory that is used by JBoss DNA to establish sessions to your JCR repositories. Two implementations are available:
The JndiSessionFactory looks up JCR Repository instances in JNDI using names that are supplied when creating sessions. This implementation also has methods to set the JCR Credentials for a given workspace name.
The SimpleSessionFactory has methods to register the JCR Repository instances with names, as well as methods to set the JCR Credentials for a given workspace name.
You can use the SimpleExecutionContext implementation of ExecutionContext and supply a SessionFactory instance, or you can provide your own implementation.
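To make the idea of registering repositories by name more concrete, here is a rough sketch of a name-based registry. It is not the JBoss DNA SimpleSessionFactory: the class NamedRepositoryRegistry is hypothetical, and splitting a name such as "Repository/Workspace1" at the last '/' is an assumption made for this illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified registry keyed by repository name.
class NamedRepositoryRegistry<R> {
    private final Map<String, R> repositories = new HashMap<>();

    // Associates a repository object with a name.
    void registerRepository(String name, R repository) {
        repositories.put(name, repository);
    }

    // Splits "Repository/Workspace1" into {"Repository", "Workspace1"}.
    // Splitting at the LAST '/' is an assumption of this sketch.
    static String[] split(String repositoryWorkspaceName) {
        int slash = repositoryWorkspaceName.lastIndexOf('/');
        if (slash < 0) return new String[] {repositoryWorkspaceName, null};
        return new String[] {
            repositoryWorkspaceName.substring(0, slash),
            repositoryWorkspaceName.substring(slash + 1)
        };
    }

    // Returns the repository registered under the repository part of the name, or null.
    R findRepository(String repositoryWorkspaceName) {
        return repositories.get(split(repositoryWorkspaceName)[0]);
    }
}
```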
Here's an example of how to instantiate and configure the SequencingService:
SimpleSessionFactory sessionFactory = new SimpleSessionFactory();
sessionFactory.registerRepository("Repository", this.repository);
Credentials credentials = new SimpleCredentials("jsmith", "secret".toCharArray());
sessionFactory.registerCredentials("Repository/Workspace1", credentials);
JcrExecutionContext context = new BasicJcrExecutionContext(sessionFactory,"Repository/Workspace1");
// Create the sequencing service, passing in the execution context ...
SequencingService sequencingService = new SequencingService();
sequencingService.setExecutionContext(context);
After the sequencing service is created and configured, it must be started. The SequencingService has an administration object (an instance of ServiceAdministrator) with start(), pause(), and shutdown() methods. The latter method will close the queue for sequencing, but will allow sequencing operations already running to complete normally. To wait until all sequencing operations have completed, simply call the awaitTermination method and pass it the maximum amount of time you want to wait.
sequencingService.getAdministrator().start();
The sequencing service must also be configured with the sequencers that it will use. This is done using the addSequencer(SequencerConfig) method and passing a SequencerConfig instance that you create. Here's the code that defines three sequencer configurations: one that places image metadata into "/images/<filename>", another that places MP3 metadata into "/mp3s/<filename>", and a third that places a structure that represents the classes, methods, and attributes found within Java source into "/java/<filename>".
// Set up the image sequencer ...
String name = "Image Sequencer";
String desc = "Sequences image files to extract the characteristics of the image";
String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";
String[] classpath = null; // Use the current classpath
String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1"};
SequencerConfig imageSequencerConfig = new SequencerConfig(name, desc, classname,
                                                           classpath, pathExpressions);
sequencingService.addSequencer(imageSequencerConfig);
// Set up the MP3 sequencer ...
name = "MP3 Sequencer";
desc = "Sequences MP3 files to extract the ID3 tags from the audio file";
classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";
// An array initializer alone is only legal in a declaration, so use "new String[]" when reassigning
pathExpressions = new String[] {"//(*.mp3[*])/jcr:content[@jcr:data] => /mp3s/$1"};
SequencerConfig mp3SequencerConfig = new SequencerConfig(name, desc, classname,
                                                         classpath, pathExpressions);
sequencingService.addSequencer(mp3SequencerConfig);
// Set up the Java source sequencer ...
name = "Java Sequencer";
desc = "Sequences java files to extract the characteristics of the Java source";
classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";
pathExpressions = new String[] {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};
SequencerConfig javaSequencerConfig = new SequencerConfig(name, desc, classname,
                                                          classpath, pathExpressions);
sequencingService.addSequencer(javaSequencerConfig);
Each configuration defines several things, including the name, description, and sequencer implementation class. The configuration also defines the classpath information, which can be passed to the ExecutionContext to get a Java classloader with which the sequencer class can be loaded. (If no classpath information is provided, as is done in the code above, the application class loader is used.) The configuration also specifies the path expressions that identify the nodes that should be sequenced with the sequencer and where to store the output generated by the sequencer.
Path expressions are pretty straightforward but are quite powerful, so before we go any further with the example, let's dive into path expressions in more detail.
Path expressions consist of two parts: the selection criteria (or input path) and an output path:
inputPath => outputPath
The inputPath part defines an expression for the path of a node that is to be sequenced. Input paths consist of '/' separated segments, where each segment represents a pattern for a single node's name (including the same-name-sibling indexes) and '@' signifies a property name.
Let's first look at some simple examples:
Table 4.1. Simple Input Path Examples
Input Path | Description |
---|---|
/a/b | Match node "b" that is a child of the top level node "a". Neither node may have any same-name-siblings. |
/a/* | Match any child node of the top level node "a". |
/a/*.txt | Match any child node of the top level node "a" that also has a name ending in ".txt". |
/a/b@c | Match the property "c" of node "/a/b". |
/a/b[2] | The second child named "b" below the top level node "a". |
/a/b[2,3,4] | The second, third or fourth child named "b" below the top level node "a". |
/a/b[*] | Any (and every) child named "b" below the top level node "a". |
//a/b | Any node named "b" that exists below a node named "a", regardless of where node "a" occurs. Again, neither node may have any same-name-siblings. |
With these simple examples, you can probably discern the most important rules. First, the '*' is a wildcard character that matches any character or sequence of characters in a node's name (or index if appearing in between square brackets), and can be used in conjunction with other characters (e.g., "*.txt").
Second, square brackets (i.e., '[' and ']') are used to match a node's same-name-sibling index. You can put a single non-negative number or a comma-separated list of non-negative numbers. Use '0' to match a node that has no same-name-siblings, or any positive number to match the specific same-name-sibling.
Third, combining two delimiters (e.g., "//") matches any sequence of nodes, regardless of what their names are or how many nodes there are. This is often used with other patterns to identify nodes at any level that match those patterns. Three or more sequential slash characters are treated as two.
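The wildcard rule can be approximated with java.util.regex. The sketch below is an analogy only, not the way JBoss DNA evaluates path expressions: the class SegmentWildcard is hypothetical, it handles a single name segment without indexes, and it assumes '*' may also match an empty sequence of characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper that matches one node-name segment against a '*' pattern.
class SegmentWildcard {

    // Converts a name pattern such as "*.txt" into a regular expression:
    // literal pieces are quoted and each '*' becomes ".*".
    static Pattern toRegex(String namePattern) {
        StringBuilder regex = new StringBuilder();
        String[] literals = namePattern.split("\\*", -1); // keep empty pieces
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole node name matches the pattern.
    static boolean matches(String namePattern, String nodeName) {
        return toRegex(namePattern).matcher(nodeName).matches();
    }
}
```

With this helper, matches("*.txt", "readme.txt") is true while matches("*.txt", "readme.md") is false.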
Many input paths can be created using just these simple rules. However, input paths can be more complicated. Here are some more examples:
Table 4.2. More Complex Input Path Examples
Input Path | Description |
---|---|
/a/(b|c|d) | Match children of the top level node "a" that are named "b", "c" or "d". None of the nodes may have same-name-sibling indexes. |
/a/b[c/d] | Match node "b" child of the top level node "a", when node "b" has a child named "c", and "c" has a child named "d". Node "b" is the selected node, while nodes "c" and "d" are used as criteria but are not selected. |
/a(/(b|c|d|)/e)[f/g/@something] | Match node "/a/b/e", "/a/c/e", "/a/d/e", or "/a/e" when they also have a child "f" that itself has a child "g" with property "something". None of the nodes may have same-name-sibling indexes. |
These examples show a few more advanced rules. Parentheses (i.e., '(' and ')') can be used to define a set of options for names, as shown in the first and third examples. Whatever part of the selected node's path appears between the parentheses is captured for use within the output path. Thus, the first input path in the previous table would match node "/a/b", and "b" would be captured and could be used within the output path using "$1", where the number used in the output path identifies the parentheses.
Square brackets can also be used to specify criteria on a node's properties or children. Whatever appears in between the square brackets does not appear in the selected node.
Let's go back to the previous code fragment and look at the first path expression:
//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1
This matches a node named "jcr:content" with property "jcr:data" but no siblings with the same name, and that is a child of a node whose name ends with ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png" that may have any same-name-sibling index. These nodes can appear at any level in the repository. Note how the input path captures the filename (the segment containing the file extension), including any same-name-sibling index. This filename is then used in the output path, which is where the sequenced content is placed.
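The capture-and-substitute behavior is similar in spirit to capture groups in regular expressions. The following sketch uses java.util.regex as an analogy for the image path expression; it is not JBoss DNA's implementation. The class ImagePathSketch is hypothetical, and the regular expression is a hand-written approximation that ignores the "[@jcr:data]" property criterion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex analogy for:
//   //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1
class ImagePathSketch {

    // Group 1 captures the file node's name, including an optional "[n]" index.
    private static final Pattern INPUT = Pattern.compile(
        "^(?:/[^/]+)*/([^/]+\\.(?:jpg|jpeg|gif|bmp|pcx|png)(?:\\[\\d+\\])?)/jcr:content$");

    // Returns the output path for a matching node path, or null when there is no match.
    static String outputPathFor(String nodePath) {
        Matcher matcher = INPUT.matcher(nodePath);
        return matcher.matches() ? "/images/" + matcher.group(1) : null;
    }
}
```

For example, outputPathFor("/files/photos/sunset.jpg/jcr:content") yields "/images/sunset.jpg", while a path under a ".txt" node yields null.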
Now that we've covered path expressions, let's go back to the three sequencer configurations in the example. Here they are again, with a description of what each path means:
Table 4.3. Path Expressions for the 3 Sequencers
Input Path | Output Path | Description |
---|---|---|
//(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] | /images/$1 | Any node with a name ending in ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/images/<filename>". |
//(*.mp3[*])/jcr:content[@jcr:data] | /mp3s/$1 | Any node with a name ending in ".mp3", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/mp3s/<filename>". |
//(*.java[*])/jcr:content[@jcr:data] | /java/$1 | Any node with a name ending in ".java", whether or not it has a same-name-sibling index, but that has a child named "jcr:content" with "jcr:data" property. The node name representing the filename (including any same-name-sibling index) is captured, and used to place the output in "/java/<filename>". |
After these sequencer configurations are defined and added to the SequencingService, the service is ready to start reacting to changes in the repository and automatically looking for nodes to sequence. But we first need to wire the service into the repository to receive those change events. This is accomplished using the ObservationService described in the next section.
The JBoss DNA ObservationService is responsible for listening to one or more JCR repositories and multiplexing the events to its listeners. Unlike JCR events, this framework embeds in the events the name of the repository and workspace that can be passed to a SessionFactory to obtain a session to the repository in which the change occurred. This simple design makes it very easy for JBoss DNA to concurrently work with multiple JCR repositories.
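The multiplexing idea can be sketched in a few lines of plain Java. None of the types below are JBoss DNA classes: WorkspaceChange, WorkspaceChangeListener, and ChangeMultiplexer are hypothetical names used only to show an event that carries its "repository/workspace" name to every registered listener.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event that embeds the name of the repository and workspace it came from.
final class WorkspaceChange {
    final String repositoryWorkspaceName;
    final String nodePath;

    WorkspaceChange(String repositoryWorkspaceName, String nodePath) {
        this.repositoryWorkspaceName = repositoryWorkspaceName;
        this.nodePath = nodePath;
    }
}

// Hypothetical listener contract.
interface WorkspaceChangeListener {
    void onChange(WorkspaceChange change);
}

// Hypothetical multiplexer: one source of events, many listeners.
class ChangeMultiplexer {
    // CopyOnWriteArrayList allows listeners to be added while events are being delivered.
    private final List<WorkspaceChangeListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(WorkspaceChangeListener listener) {
        listeners.add(listener);
    }

    // Delivers the same event object to every listener, in registration order.
    void publish(String repositoryWorkspaceName, String nodePath) {
        WorkspaceChange change = new WorkspaceChange(repositoryWorkspaceName, nodePath);
        for (WorkspaceChangeListener listener : listeners) {
            listener.onChange(change);
        }
    }
}
```

Because each event names its own repository and workspace, a listener needs no other state to find out where the change happened.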
Configuring an observation service is pretty easy, especially if you reuse the same SessionFactory supplied to the sequencing service. Here's an example:
this.observationService = new ObservationService(sessionFactory);
this.observationService.getAdministrator().start();
Both ObservationService and SequencingService implement AdministeredService, which has a ServiceAdministrator used to start, pause, and shut down the service. In other words, the lifecycle of both services is managed in the same way.
After the observation service is started, listeners can be added. The SequencingService implements the required interface, and so it may be registered directly:
observationService.addListener(sequencingService);
Finally, the observation service must be wired to monitor one of your JCR repositories. This is done with one of the monitor(...) methods:
int eventTypes = Event.NODE_ADDED | Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED;
observationService.monitor("Repository/Workspace1", eventTypes);
At this point, the observation service is listening to a JCR repository and forwarding the appropriate events to the sequencing service, which will asynchronously process the changes and sequence the information added to or changed in the repository.
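The eventTypes value passed to monitor(...) is a bitmask built by OR-ing event type constants together. The sketch below shows how such a mask can be tested. To avoid a JCR dependency it declares its own constants, which are meant to mirror the values of javax.jcr.observation.Event; real code should use the constants from that interface. The class EventMaskSketch and the isMonitored method are hypothetical.

```java
// Hypothetical illustration of combining and testing event type flags.
class EventMaskSketch {
    // Local copies intended to mirror javax.jcr.observation.Event; each is a distinct bit.
    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 2;
    static final int PROPERTY_ADDED = 4;
    static final int PROPERTY_REMOVED = 8;
    static final int PROPERTY_CHANGED = 16;

    // True when the given event type is one of the types selected in the mask.
    static boolean isMonitored(int eventTypes, int eventType) {
        return (eventTypes & eventType) != 0;
    }
}
```

With the mask NODE_ADDED | PROPERTY_ADDED | PROPERTY_CHANGED, isMonitored reports true for NODE_ADDED and false for NODE_REMOVED.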
The JBoss DNA services use resources and threads that must be released before your application shuts down. The safe way to do this is to obtain the ServiceAdministrator for each service (via the getAdministrator() method) and call shutdown(). As previously mentioned, the shutdown method simply prevents new work from being processed and does not wait for existing work to complete. If you want to wait until the service completes all its work, you must wait until the service terminates. Here's an example that shows how this is done:
// Shut down the service and wait until it's all shut down ...
sequencingService.getAdministrator().shutdown();
sequencingService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
// Shut down the observation service ...
observationService.getAdministrator().shutdown();
observationService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
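The same "shut down, then wait" pattern exists on java.util.concurrent.ExecutorService, which makes the semantics easy to try out in isolation. This is a sketch by analogy, not the ServiceAdministrator implementation; the class ShutdownSketch and its method are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demonstration of shutdown() followed by awaitTermination().
class ShutdownSketch {

    // Submits one task, stops accepting new work, and waits for the task to finish.
    // Returns true when the executor terminated within the timeout.
    static boolean runAndShutdown(Runnable work, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(work);
        executor.shutdown(); // no new tasks are accepted; the submitted task still runs
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

A false return value means the timeout elapsed first, so callers that must not lose work should check it rather than ignore it.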
At this point, we've covered how to configure and use the JBoss DNA services in your application. The next section goes back to the sample application to show how all these pieces fit together.
Recall that the example application consists of a client application that sets up an in-memory JCR repository and that allows a user to upload files into that repository. The client also sets up the DNA services with an image sequencer so that if any of the uploaded files are PNG, JPEG, GIF, BMP or other images, DNA will automatically extract the image's metadata (e.g., image format, physical size, pixel density, etc.) and store that in the repository. Or, if the client uploads MP3 audio files, the title, author, album, year, and comment are extracted from the audio file and stored in the repository.
The example is comprised of 5 classes and 1 interface, located in the src/main/java directory:
org/jboss/example/dna/sequencers/ConsoleInput.java
                                /ContentInfo.java
                                /JavaInfo.java
                                /MediaInfo.java
                                /SequencingClient.java
                                /UserInterface.java
SequencingClient is the class that contains the main application. ContentInfo is a simple class that encapsulates metadata generated by the sequencers and accessed by this example application, and there are two subclasses: MediaInfo encapsulates metadata about media (image and MP3) files, while JavaInfo encapsulates information about a Java class. The client accesses the content from the repository, represents the information using instances of ContentInfo (and its subclasses), and then passes them to the UserInterface.
UserInterface is an interface with methods that will be called at runtime to request data from the user. ConsoleInput is an implementation of this that creates a text user interface, allowing the user to operate the client from the command-line. We can easily create a graphical implementation of UserInterface at a later date. We can also create a mock implementation for testing purposes that simulates a user entering data. This allows us to check the behavior of the client automatically using conventional JUnit test cases, as demonstrated by the code in the src/test/java directory:
org/jboss/example/dna/sequencers/SequencingClientTest.java
                                /MockUserInterface.java
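The benefit of hiding user input behind an interface can be shown with a small, self-contained sketch. The types below are hypothetical and are not the example's UserInterface or MockUserInterface, whose methods are not listed here; CommandSource, ScriptedCommandSource, and CommandLoop merely illustrate how a scripted implementation lets a test drive an input loop without a console.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical abstraction over "ask the user what to do next".
interface CommandSource {
    String nextCommand();
}

// Hypothetical mock that replays a fixed script and then asks to quit.
class ScriptedCommandSource implements CommandSource {
    private final Deque<String> script;

    ScriptedCommandSource(String... commands) {
        this.script = new ArrayDeque<>(Arrays.asList(commands));
    }

    @Override
    public String nextCommand() {
        return script.isEmpty() ? "quit" : script.poll();
    }
}

// Hypothetical client loop that depends only on the interface.
class CommandLoop {
    // Processes commands until "quit" and returns how many were handled.
    static int run(CommandSource source) {
        int processed = 0;
        while (!"quit".equals(source.nextCommand())) {
            processed++;
        }
        return processed;
    }
}
```

A test can then call CommandLoop.run(new ScriptedCommandSource("upload", "search")) and assert on the result, with no human at the keyboard.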
If we look at the SequencingClient
code, there are a handful of methods that encapsulate the various activities.
Some of the code samples included in this book have had some of the error handling and comments removed so that the samples are more readable and concise.
The startRepository() method starts up an in-memory Jackrabbit JCR repository. The bulk of this method is simply gathering and passing the information required by Jackrabbit. Because Jackrabbit's TransientRepository implementation shuts down after the last session is closed, the application maintains a session to ensure that the repository remains open throughout the application's lifetime. And finally, the node type needed by the image sequencer is registered with Jackrabbit.
public void startRepository() throws Exception {
    if (this.repository == null) {
        try {
            // Load the Jackrabbit configuration ...
            File configFile = new File(this.jackrabbitConfigPath);
            String pathToConfig = configFile.getAbsolutePath();
            // Find the directory where the Jackrabbit repository data will be stored ...
            File workingDirectory = new File(this.workingDirectory);
            String workingDirectoryPath = workingDirectory.getAbsolutePath();
            // Get the Jackrabbit custom node definition (CND) file ...
            URL cndFile = Thread.currentThread().getContextClassLoader().getResource("jackrabbitNodeTypes.cnd");
            // Create the Jackrabbit repository instance and establish a session to keep the repository alive ...
            this.repository = new TransientRepository(pathToConfig, workingDirectoryPath);
            if (this.username != null) {
                Credentials credentials = new SimpleCredentials(this.username, this.password);
                this.keepAliveSession = this.repository.login(credentials, this.workspaceName);
            } else {
                this.keepAliveSession = this.repository.login();
            }
            try {
                // Register the node types (only valid the first time) ...
                Workspace workspace = this.keepAliveSession.getWorkspace();
                JackrabbitNodeTypeManager mgr = (JackrabbitNodeTypeManager)workspace.getNodeTypeManager();
                mgr.registerNodeTypes(cndFile.openStream(), JackrabbitNodeTypeManager.TEXT_X_JCR_CND);
            } catch (RepositoryException e) {
                if (!e.getMessage().contains("already exists")) throw e;
            }
        } catch (Exception e) {
            this.repository = null;
            this.keepAliveSession = null;
            throw e;
        }
    }
}
As you can see, this method really has nothing to do with JBoss DNA, other than setting up a JCR repository that JBoss DNA will use.
The shutdownRepository() method shuts down the Jackrabbit transient repository by closing the "keep-alive session".
Again, this method really does nothing specifically with JBoss DNA, but is needed to manage the JCR repository that JBoss DNA uses.
public void shutdownRepository() throws Exception {
    if (this.repository != null) {
        try {
            this.keepAliveSession.logout();
        } finally {
            this.repository = null;
            this.keepAliveSession = null;
        }
    }
}
The startDnaServices() method first starts the JCR repository (if it was not already started), and proceeds to create and configure the SequencingService as described earlier. This involves setting up the SessionFactory and ExecutionContext, creating the SequencingService instance, and configuring the image, MP3, and Java sequencers. The method then continues by setting up the ObservationService as described earlier and starting the service.
public void startDnaServices() throws Exception {
    if (this.repository == null) this.startRepository();
    if (this.sequencingService == null) {
        SimpleSessionFactory sessionFactory = new SimpleSessionFactory();
        sessionFactory.registerRepository(this.repositoryName, this.repository);
        if (this.username != null) {
            Credentials credentials = new SimpleCredentials(this.username, this.password);
            sessionFactory.registerCredentials(this.repositoryName + "/" + this.workspaceName, credentials);
        }
        this.executionContext = new SimpleExecutionContext(sessionFactory);
        // Create the sequencing service, passing in the execution context ...
        this.sequencingService = new SequencingService();
        this.sequencingService.setExecutionContext(executionContext);
        // Set up the image sequencer ...
        String name = "Image Sequencer";
        String desc = "Sequences image files to extract the characteristics of the image";
        String classname = "org.jboss.dna.sequencer.images.ImageMetadataSequencer";
        String[] classpath = null; // Use the current classpath
        String[] pathExpressions = {"//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd))[*]/jcr:content[@jcr:data] => /images/$1"};
        SequencerConfig imageSequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);
        this.sequencingService.addSequencer(imageSequencerConfig);
        // Set up the MP3 sequencer ...
        name = "Mp3 Sequencer";
        desc = "Sequences mp3 files to extract the id3 tags of the audio file";
        classname = "org.jboss.dna.sequencer.mp3.Mp3MetadataSequencer";
        // An array initializer alone is only legal in a declaration, so use "new String[]" when reassigning
        pathExpressions = new String[] {"//(*.mp3)[*]/jcr:content[@jcr:data] => /mp3s/$1"};
        SequencerConfig mp3SequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);
        this.sequencingService.addSequencer(mp3SequencerConfig);
        // Set up the Java source sequencer ...
        name = "Java Sequencer";
        desc = "Sequences java files to extract the characteristics of the Java source";
        classname = "org.jboss.dna.sequencer.java.JavaMetadataSequencer";
        pathExpressions = new String[] {"//(*.java[*])/jcr:content[@jcr:data] => /java/$1"};
        SequencerConfig javaSequencerConfig = new SequencerConfig(name, desc, classname, classpath, pathExpressions);
        this.sequencingService.addSequencer(javaSequencerConfig);
        // Use the DNA observation service to listen to the JCR repository (or multiple ones), and
        // then register the sequencing service as a listener to this observation service...
        this.observationService = new ObservationService(this.executionContext.getSessionFactory());
        this.observationService.getAdministrator().start();
        this.observationService.addListener(this.sequencingService);
        this.observationService.monitor(this.repositoryName + "/" + this.workspaceName,
                                        Event.NODE_ADDED | Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED);
    }
    // Start up the sequencing service ...
    this.sequencingService.getAdministrator().start();
}
The shutdownDnaServices() method is pretty straightforward: it just calls shutdown on each of the services and waits until they terminate.
public void shutdownDnaServices() throws Exception {
    if (this.sequencingService == null) return;
    // Shut down the service and wait until it's all shut down ...
    this.sequencingService.getAdministrator().shutdown();
    this.sequencingService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
    // Shut down the observation service ...
    this.observationService.getAdministrator().shutdown();
    this.observationService.getAdministrator().awaitTermination(5, TimeUnit.SECONDS);
}
None of the other methods really do anything with JBoss DNA per se. Instead, they merely work with the repository using the JCR API.
The main method of the SequencingClient class creates a SequencingClient instance, and passes a new ConsoleInput instance:
public static void main( String[] args ) throws Exception {
    SequencingClient client = new SequencingClient();
    client.setRepositoryInformation("repo", "default", "jsmith", "secret".toCharArray());
    client.setUserInterface(new ConsoleInput(client));
}
If we look at the ConsoleInput constructor, it starts the repository, the DNA services, and a thread for the user interface. At this point, the constructor returns, but the main application continues under the user interface thread. When the user requests to quit, the user interface thread also shuts down the DNA services and JCR repository.
// The parameter is final so that the anonymous Runnable below can reference it
public ConsoleInput( final SequencingClient client ) {
    try {
        client.startRepository();
        client.startDnaServices();
        System.out.println(getMenu());
        Thread eventThread = new Thread(new Runnable() {
            private boolean quit = false;
            public void run() {
                try {
                    while (!quit) {
                        // Display the prompt and process the requested operation ...
                    }
                } finally {
                    try {
                        // Terminate ...
                        client.shutdownDnaServices();
                        client.shutdownRepository();
                    } catch (Exception err) {
                        System.out.println("Error shutting down sequencing service and repository: "
                                           + err.getLocalizedMessage());
                        err.printStackTrace(System.err);
                    }
                }
            }
        });
        eventThread.start();
    } catch (Exception err) {
        System.out.println("Error: " + err.getLocalizedMessage());
        err.printStackTrace(System.err);
    }
}
At this point, we've reviewed all of the interesting code in the example application. However, feel free to play with the application, trying different things.
In this chapter we covered the different JBoss DNA components used for automatically sequencing a variety of types of information, and how those components can be used in your application. Specifically, we described how the SequencingService and ObservationService can be configured and used. And we ended the chapter by reviewing the example application, which not only uses JBoss DNA, but also the repository via the JCR API.