
ModeShape makes it easy to use JCR repositories within web and Java EE applications deployed to virtually any web or application server. ModeShape makes this even easier with JBoss AS7, since ModeShape can be installed, managed and monitored as a true JBoss AS7 subsystem.

But ModeShape is also small and lightweight enough that you can easily embed it into your own Java SE applications. The only thing you need to determine is how much control and management your application will need to have over the ModeShape repositories. On one hand, if your application just needs to look up and use one or more JCR Repository instances, then it can use the JCR API as we've seen before. On the other hand, your application may need more control over dynamically deploying, monitoring, reconfiguring, and undeploying individual repositories. In this case, your application can use the ModeShape-specific API.

The ModeShape Engine

ModeShape provides a single component, called the ModeShape engine, that controls and manages your repositories. The engine is just a simple Java class, called ModeShapeEngine, that your application instantiates, starts, uses to (dynamically) deploy and undeploy repositories, and stops before the application shuts down. There are two ways to do this: use the ModeShape-specific API, or use only the JCR API (even to manage and use multiple repositories in a single application). We'll cover both approaches, including the pros and cons of each.

Using the ModeShape Engine API

The ModeShape Engine API gives your application full control over the ModeShape repositories and their lifecycles, and is best suited for applications that dynamically create and manage multiple repositories, or that need explicit control over ModeShape's lifecycle.

There are primarily two classes that are involved: ModeShapeEngine and RepositoryConfiguration.

Creating the engine and deploying repositories

The ModeShapeEngine class represents a container for named javax.jcr.Repository instances, and can

  • dynamically deploy new Repository instances
  • start and stop Repository instances
  • change the configuration of a Repository, even when it is running and being used
  • obtain the names of all deployed Repository instances
  • dynamically undeploy Repository instances
  • shut down the entire engine while (gracefully or immediately) shutting down any running Repository instances

The ModeShapeEngine class is thread-safe, so your application can use multiple threads to do any of these operations. Each ModeShapeEngine instance is completely independent of the others, so your application can even create multiple ModeShapeEngine instances within the same JVM. However, most applications will simply need a single instance.

Most ModeShape components are thread-safe and can safely be used concurrently by multiple threads. This includes the ModeShapeEngine and implementations of javax.jcr.Repository, javax.jcr.Session, javax.jcr.Node, javax.jcr.Property, and other JCR interfaces, as well as immutable classes like RepositoryConfiguration. Remember, however, that each Session instance can contain transient changes, so do not have multiple threads share a Session to perform writes. The threads will succeed in making concurrent changes, but the transient state of the Session will be a combination of all the changes, and calls to Session.save() will result in strange persisted states and potentially invalid content.

Your application can create a new ModeShapeEngine with its no-argument constructor:

At this point the engine exists in a minimal state and must be started before it can be used. To do this, call the start() method, which will block while the engine initializes its small internal state:

A new repository is deployed by reading in its JSON configuration document (which we'll learn about later) and then passing that to the engine:

The deploy(...) method first validates the JSON document to make sure it is structurally correct and satisfies the schema; any problems result in an exception containing the validation errors. If the configuration is valid and there isn't already a deployed repository with the same name, the deploy(...) method will then create and return the Repository instance.

Each repository can also be started and stopped, although unlike the engine a repository will automatically start when your application attempts to create a Session.

One advantage of using the Engine API is that your application can get the names of the deployed Repository instances and, given a repository name, can return the running state of the Repository as well as the Repository instance:

Note that the Repository doesn't need to be running in order to get it. In fact, each Repository instance can be started explicitly or will automatically start as soon as the Repository.login(...) method is called.

Using a programmatic Infinispan configuration (advanced)

ModeShape's repository configuration files will usually reference an Infinispan configuration file. Sometimes, you want to programmatically define the Infinispan configuration.

To do this, simply use Infinispan's public API to create and/or modify an existing Infinispan configuration for a cache, and then set up your RepositoryConfiguration to use a special "Environment" object:

Lines 6-9 involve defining the name of the Infinispan cache and using the Infinispan API to build a new cache Configuration. (Note you should define the whole configuration or read in an existing configuration to modify.)

On line 12 we instantiate a new org.modeshape.jcr.LocalEnvironment object that owns all non-ModeShape components. Then register your cache configuration and/or cache container:

  • If you want to define and manage your own cache container, simply instantiate it, register it with the environment, and then define your cache (lines 16-19); or
  • If you just want to use a default container, you can simply define your cache configuration (line 21).

In both cases, be sure that the value of cacheName matches the "cacheName" value in the repository configuration.

Finally, obtain a copy of your original RepositoryConfiguration that uses your LocalEnvironment instance.

Modifying the configuration programmatically before deployment (advanced)

Sometimes your application will need to review or modify a repository configuration. If you need to do this before you deploy the repository, then you can edit the JSON document using ModeShape's editor API. Here's a very simple example:

At this point, you can deploy the new configuration:

or even write out the configuration to a JSON file:

or write the JSON to a string:

Modifying the configuration of a deployed repository (advanced)

Sometimes you want to be able to change the configuration of a repository that is already deployed and running.

Each Repository instance keeps a reference to its immutable RepositoryConfiguration. But that configuration can be edited to alter the repository's configuration even if that Repository is running and being used by JCR clients. Here's the basic workflow for changing the configuration of a deployed Repository:

The example obtains the RepositoryConfiguration (line 2), obtains an editor for it (line 5), and then manipulates the JSON document on lines 8-10: it gets or creates the "storage" nested document, inside that gets or creates the "binaryStorage" nested document, and inside that sets the "minimumBinarySizeInBytes" field to 8K. The example then gets the changes made by the editor (line 13), validates the changes (line 14), and either writes out the validation problems (line 17) or applies the changes (line 18).

The engine.update(...) method call (line 18) applies the configuration in a consistent and thread-safe manner. It first obtains an internal lock, grabs the repository's current configuration (which may have changed since our call at line 2), applies the changes that were made by the editor, validates the configuration, updates the running repository with the new valid configuration, and releases the internal lock. Note that this can all be done even when there are other parts of your application that are still using the Repository to read and update content.

Of course, some configuration changes are pretty severe, like changing the Infinispan cache where a repository stores all its content. These kinds of changes can still be made, but will not take effect until the repository is shut down and restarted.

This process may seem complicated, but it means that your application doesn't have to coordinate or centralize the changes. Instead, multiple threads can safely make changes to the same repository configuration without having to worry about locking or synchronizing the changes. Of course, if multiple threads make different changes to the same configuration property, the last one to be applied will win.
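The merge behavior described above can be illustrated with a toy model. The following sketch is not ModeShape's implementation: the ConfigHolder class and its flat key/value configuration are hypothetical stand-ins, used only to show why applying a set of changes to the current configuration under a lock lets independent edits coexist, while edits to the same field resolve as last-writer-wins.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: a hypothetical holder for a flat key/value configuration. */
final class ConfigHolder {
    private final Object lock = new Object();
    private Map<String, Object> current;

    ConfigHolder(Map<String, Object> initial) {
        this.current = Collections.unmodifiableMap(new HashMap<>(initial));
    }

    /** Returns the current immutable snapshot of the configuration. */
    Map<String, Object> snapshot() {
        synchronized (lock) {
            return current;
        }
    }

    /**
     * Applies only the changed fields to whatever the configuration is now,
     * not to the (possibly stale) snapshot the caller originally read.
     */
    Map<String, Object> update(Map<String, Object> changes) {
        synchronized (lock) {
            Map<String, Object> next = new HashMap<>(current);
            next.putAll(changes); // same key changed twice: the later call wins
            current = Collections.unmodifiableMap(next);
            return current;
        }
    }
}
```

Two callers that change different fields both see their edits preserved, because neither overwrites the whole configuration with its own stale copy.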

Shutting down and undeploying Repositories

Repository instances can be shut down and undeployed:

Note that the ModeShapeEngine.undeploy(String) called on line 2 will undeploy the repository (meaning no new sessions can be created) and asynchronously shut the repository down (closing all existing sessions). Because it is asynchronous, the undeploy(...) method returns immediately, handing back a java.util.concurrent.Future object that the caller can optionally use to block until the repository has completely shut down (line 3).
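Blocking indefinitely on such a Future is not always desirable. The following is a minimal sketch, using only java.util.concurrent, of a bounded wait. The ShutdownWaiter class and its method name are hypothetical helpers, not part of ModeShape; the caller would pass in the Future returned by the engine.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for waiting on an asynchronous shutdown. */
final class ShutdownWaiter {

    /**
     * Waits up to the given timeout for the future to complete.
     * Returns true if it completed in time, false if the wait timed out
     * or the waiting thread was interrupted.
     */
    static boolean completedWithin(Future<?> future, long timeout, TimeUnit unit) {
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false; // still shutting down; the caller may retry or give up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException e) {
            // The asynchronous work itself failed; surface the underlying cause.
            throw new IllegalStateException("Shutdown failed", e.getCause());
        }
    }
}
```

A false result does not cancel the shutdown; it only means the caller stopped waiting.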

Shutting down the engine

And finally, the entire engine can be shut down:

Once again, the shutdown() method is asynchronous, but it returns a Future so that the caller can block if needed. There is an alternative form of shutdown that takes a boolean parameter specifying whether the engine should force the shutdown of all running repositories, or whether the shutdown should fail if there is at least one running repository:

Using RepositoryFactory and the JCR API

The simplest approach a Java SE application can take is to use only the JCR 2.0 API. We talked in the Introduction to JCR about how an application can use the J2SE Service Loader mechanism and JCR's RepositoryFactory API to find a JCR Repository instance:

This approach is great if your application is designed to use different JCR implementations and you don't want to use implementation-specific APIs. You can even load the properties from a file:

When embedding ModeShape into an application, the parameters map should contain at a minimum a single property that defines the URL to the repository's configuration file. Thus the properties file might look like this:

or you can create the parameters programmatically:

In addition to the "org.modeshape.jcr.URL" parameter, ModeShape also looks for a "org.modeshape.jcr.RepositoryName" parameter. Each repository configuration file contains the name of the repository, so this "RepositoryName" parameter is not required. But providing it allows ModeShape's RepositoryFactory implementation to see if the named Repository has already been deployed without having to read the configuration file. If it doesn't find a Repository that was deployed with that name and that configuration file, the factory will automatically deploy the specified configuration and return the named repository.
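As a rough illustration, the parameter map can be assembled with JDK classes alone. In this sketch the two parameter names are taken from the text above, while the FactoryParameters class, its method names, and any configuration URL passed to it are hypothetical. The check for the URL parameter reflects the "at a minimum" requirement described earlier.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that builds the parameters for a RepositoryFactory lookup. */
final class FactoryParameters {
    // Parameter names as quoted in the text above.
    static final String URL = "org.modeshape.jcr.URL";
    static final String REPOSITORY_NAME = "org.modeshape.jcr.RepositoryName";

    /** Builds the map programmatically; repositoryName is optional and may be null. */
    static Map<String, String> of(String configUrl, String repositoryName) {
        Map<String, String> params = new HashMap<>();
        params.put(URL, configUrl);
        if (repositoryName != null) {
            params.put(REPOSITORY_NAME, repositoryName);
        }
        return params;
    }

    /** Loads the map from properties-file content and checks the required URL. */
    static Map<String, String> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> params = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            params.put(key, props.getProperty(key));
        }
        if (!params.containsKey(URL)) {
            throw new IllegalArgumentException("Missing required parameter " + URL);
        }
        return params;
    }
}
```

The resulting map is what an application would hand to each javax.jcr.RepositoryFactory found through the service loader.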

As a convenience, ModeShape provides two constants you can use in your application when programmatically creating the parameters to pass to RepositoryFactory:

Pros and cons

Obviously the most important benefit of your applications using the JCR RepositoryFactory to find the Repository instances is that it is completely independent of ModeShape (or any other JCR 2.0 implementation). This may be very important if your application needs to work with multiple JCR implementations.

On the other hand, the JCR 2.0 API doesn't provide a way to manage the Repository instances. For example, your application may want to shut down a repository after it has finished using it. Or perhaps it wants to alter the configuration of a repository while it is in use. In these cases, your application may want to use the ModeShape-specific Engine API.

Configuring repositories

Whether your application uses the JCR 2.0 RepositoryFactory to obtain its repositories or the ModeShape Engine API to explicitly manage and access its repositories, your application will need to have a separate ModeShape configuration file for each repository you want to use. For each ModeShape Engine instance, you'll also likely want one Infinispan configuration file that defines the caches used for all of the repositories in that engine.

ModeShape repository configuration files

Each ModeShape repository is configured with a separate and independent JSON file that adheres to our JSON Schema. Every field within the configuration has a sensible default, so actually the following is a completely valid configuration:

Simplest.json

Note that the name of the repository is derived from the filename. It is more idiomatic, however, to at least specify the repository name:

Simplest.json

When deployed, this configuration specifies a non-clustered repository named "Simplest" that stores the content, binary values, and query indexes on the local file system under a local directory named "Simplest". Of course, you're very likely going to want to expressly state the various configuration fields for your repositories.

Here's a far more complete example for a repository named "DataRepository" that uses most of the available fields:

DataRepository.json

Most of the field values match their defaults, although by default:

  • the "storage/cacheConfiguration" field is not specified, meaning an Infinispan cache configuration is dynamically created to store the content in local memory;
  • the "workspaces/predefined", "query/extractors", and "sequencing/sequencers" fields are each empty arrays;
  • the "query/indexing/hibernate.search.*" properties are not defined; and
  • the "security/providers" field defaults to an empty array, meaning only the anonymous provider is configured.

Of course, the standard JSON formatting rules apply.

Variables

Variables may appear anywhere within the configuration JSON document's string field values. If a variable is to be used within a non-string field, simply use a string field within the JSON document. When ModeShape reads in the JSON document, these variables will be replaced with the system properties of the same name, and any resulting fields that are expected to be non-string values will be converted into the expected field type. Any values that cannot be converted will be reported as problems.

Here's the grammar for the variables:

The value of each variableName is used to look up a System property via System.getProperty(String). Note that the grammar allows specifying multiple variable names within a single variable and optionally specifying a default value. The logic will process the multiple variable names from left to right until an existing system property is found; if one is found, it will stop and will not attempt to find values for the other variables.

For example, here is part of the earlier "DataRepository.json" file, except the "cacheConfiguration" field value has been changed to include a variable:

DataRepository.json

Note how the "minimumBinarySizeInBytes" value is a string with the variable name; this works because ModeShape (in this case) will attempt to autoconvert the variable's replacement and default values to an integer, which is what the JSON Schema stipulates for the "minimumBinarySizeInBytes" field.
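To make the lookup order concrete, here is a small standalone sketch of the substitution logic. It is not ModeShape's code. It assumes a variable is written as ${name1,name2:default}, which is inferred from the description above, and it leaves a variable untouched when no property and no default is available, which is a choice made for this sketch only. The VariableResolver class is hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resolver for variables of the assumed form ${name1,name2:default}. */
final class VariableResolver {
    // Group 1: comma-separated names; group 2: optional default after the first ':'.
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    /** Resolves variables against system properties. */
    static String resolve(String text) {
        return resolve(text, System::getProperty);
    }

    /** Resolves variables against an arbitrary lookup function. */
    static String resolve(String text, Function<String, String> lookup) {
        Matcher m = VARIABLE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = null;
            for (String name : m.group(1).split(",")) {
                String trimmed = name.trim();
                if (trimmed.isEmpty()) continue;
                value = lookup.apply(trimmed);
                if (value != null) break;          // first defined name wins
            }
            if (value == null) value = m.group(2); // fall back to the default, if any
            if (value == null) value = m.group();  // sketch-only choice: leave as written
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A string field that resolves to digits can then be converted with Integer.parseInt, which mirrors the automatic conversion described for "minimumBinarySizeInBytes".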

Infinispan configuration file

Most of the time you'll probably want to explicitly define an Infinispan configuration file for your repository or repositories. Infinispan provides a [configuration reference] that documents the structure of their XML files.

The following is an example of a configuration file referenced by our repository configuration in the previous section (line 14), and it defines a single cache named "DataRepository" referenced in the repository configuration (line 13):

infinispan_configuration.xml

Clustering

Clustering a repository that is embedded into a Java application simply requires ensuring both ModeShape and Infinispan are clustered properly. Some things to consider are:

  1. Where will the persisted content be stored, and will that persisted data be shared? For example, the repository's binary store should be shared amongst all processes in the cluster (e.g., they all access/use the same file system, JDBC/MongoDB/Cassandra database, Infinispan cache), but the Infinispan cache(s) used by a repository can either share the same storage or have their own independent copy. Be sure to read more about clustering topologies.
  2. How will the different processes communicate? This communication is all via JGroups, so a proper JGroups configuration is essential.
  3. How frequently will processes be added and removed from the cluster?

There are three areas of a repository's configuration that are related to clustering.

Using variables in the ModeShape and Infinispan configuration files is a very good practice because it allows your application to use a single set of configuration files throughout the cluster. Consider using a variable such as "${cluster-id}" to represent the unique identifier of the process within the cluster. Just be sure to set the value of each variable in the system properties; ModeShape does not provide any built-in variables.
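Because ModeShape provides no built-in variables, the application has to set the property itself before any configuration file is read. A minimal sketch of one way to do that follows. The "cluster-id" property name comes from the text above, while the ClusterIdSetup class, its method, and the fallback value are hypothetical.

```java
/** Hypothetical startup helper that makes sure the cluster-id property is defined. */
final class ClusterIdSetup {
    static final String PROPERTY = "cluster-id";

    /**
     * Returns the existing value of the property (for example one passed with -D
     * on the command line), or sets and returns the supplied fallback if none exists.
     */
    static String ensureClusterId(String fallback) {
        String existing = System.getProperty(PROPERTY);
        if (existing != null && !existing.trim().isEmpty()) {
            return existing; // never override an explicitly configured value
        }
        System.setProperty(PROPERTY, fallback);
        return fallback;
    }
}
```

Calling this once at startup, before the repository configuration is read, keeps a single set of configuration files usable by every process in the cluster.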

Cluster name and JGroups

The "clustering" section of the repository JSON configuration file specifies the name of the cluster and the JGroups configuration. Here is an example of this section:

The ModeShape repository will only act in a clustered way if this section is defined.

Each ModeShape repository must be clustered independently of the other repositories deployed to the same processes, so be sure that the "clusterName" value is unique. Even though there is a default value (e.g., "ModeShape-JCR"), it is far better to explicitly set this value.

The "channelConfiguration" field defines the path to the JGroups configuration file. If this field is absent, then the repository will use the default JGroups configuration, which may or may not work out-of-the-box on your network. Here is a sample JGroups configuration file we use in some of our tests:

A third field in the "clustering" section of the repository's JSON configuration is the "channelProvider" field, which specifies the fully-qualified name of an org.modeshape.jcr.clustering.ChannelProvider implementation class. The purpose of this class is to return a "JChannel" instance, and the default implementation does this by reading the aforementioned "channelConfiguration" field and setting up JGroups. If you require a different way of configuring and acquiring the JGroups channel, simply implement your own ChannelProvider and tell the repository about it with the "channelProvider" field. Most applications don't need to worry about this field.

Storage

The "storage" section of the repository's JSON configuration file defines the Infinispan cache configuration as well as the binary storage, and both need to be properly configured for clustering. Here's an example:

where the "infinispan.xml" file for our example is:

There are a couple of things to note:

  1. The "${cluster-id}" variable is used in file system paths in the Infinispan configuration. If this is set to a unique value for each process via a system or environment variable, then each process will have its own separate cache storage area on the file system. Since the cache is replicated, each directory will contain a complete copy of all the content stored in this cache. (Note: if you want the processes to share the same store, be sure that the cache store implementation supports it. For example, the FileCacheStore should never be used as a shared store.)
  2. All processes store the binaries in the "storage/binaries" directory. Using the file system may not work under heavy load, so in such cases you may consider using a database or Infinispan for binary storage.
  3. The ModeShape repository and Infinispan cache use the same JGroups configuration. This is perfectly fine and in fact suggested. Also, it is perfectly acceptable for both to use the same cluster name, since they send and expect different information through the channel.

Indexes

The "query" section of the ModeShape repository JSON configuration file defines how it is to create and manage the indexes used for querying. And configuring the indexes properly is an essential part of clustering ModeShape. Unfortunately, clustering the indexes can be a little tricky.

If your application does not use queries at all, disable the query and indexing system altogether by setting "enabled" to false. This will make clustering your repository significantly easier.

One option for clustering indexes is to configure one of the processes to be the master index writer and all others slaves, and then use a (durable) JMS queue to forward all update requests to the master process. The indexes will be immediately updated on the master process, and then periodically copied to the slaves on a schedule that you decide. The advantages of this approach are that it minimizes the number of writes (since the indexes are updated only once for each change in content), and that additional slave processes automatically copy the indexes from the master when they are brought up. The disadvantages are that this is more complicated to set up and maintain, each process needs access to a shared network file system, and queries on the slaves may show different results than the same query executed at the same time on the master because the slave indexes are updated only periodically.

Another approach is to have each process maintain its own completely isolated copy of the indexes. This does tend to increase overall CPU load, since all processes are updating their own indexes for every change made in the repository. It also makes it a bit more difficult to add or remove processes from the cluster, since each process' index must be populated either by copying the indexes from another process or by rebuilding the indexes locally. But this approach is far easier to configure and maintain, makes the processes much more independent, and makes it more likely that the query results on each of the processes are the same.

Here is an example of the "query" section of the ModeShape repository configuration that uses the latter technique:

This enables indexing and ensures that the indexes are completely rebuilt if they are absent when the process starts up. The indexes themselves are stored on the file system in a directory that uses the same variable we used above. This configuration is also simple and straightforward.

Here is an example of a "query" section for use in the configuration for the repository acting as the master:

and a corresponding example of a "query" section for use in the configuration for the repositories acting as slaves:

Note that there are several differences between the master and slave configurations, but again variables could be used to reduce it down to a single configuration file.
