ModeShape in Java applications

ModeShape makes it easy to use JCR repositories within web and Java EE applications deployed to virtually any web or application server. ModeShape makes this even easier with JBoss Wildfly, since ModeShape can be installed, managed and monitored as a true JBoss Wildfly subsystem.

But ModeShape is also small and lightweight enough to embed directly into your own Java SE applications. The only thing you need to determine is how much control and management your application needs over the ModeShape repositories. If your application only needs to look up and use one or more JCR Repository instances, it can use the JCR API as we've seen before. If it needs more control, such as dynamically deploying, monitoring, reconfiguring, and undeploying individual repositories, it can use the ModeShape-specific API.

The ModeShape Engine

ModeShape provides a single component, called the ModeShape engine, that controls and manages your repositories. The engine is a simple Java class, ModeShapeEngine, that your application instantiates, starts, uses to (dynamically) deploy and undeploy repositories, and stops before the application shuts down. There are two ways to work with it: use the ModeShape-specific API directly, or use only the JCR API (even to manage and use multiple repositories in a single application). We'll cover both approaches, including the pros and cons of each.

Using the ModeShape Engine API

The ModeShape Engine API gives your application full control over the ModeShape repositories and their lifecycle. It is best suited for applications that dynamically create and manage multiple repositories, or that need explicit control over ModeShape's lifecycle.

There are primarily two classes that are involved: ModeShapeEngine and RepositoryConfiguration.

Creating the engine and deploying repositories

The ModeShapeEngine class represents a container for named javax.jcr.Repository instances, and can

dynamically deploy new Repository instances
start and stop Repository instances
change the configuration of a Repository, even when it is running and being used
obtain the names of all deployed Repository instances
dynamically undeploy Repository instances
shut down the entire engine while (gracefully or immediately) shutting down any running Repository instances

The ModeShapeEngine class is thread-safe, so your application can use multiple threads to do any of these operations. Each ModeShapeEngine instance is completely independent of the others, so your application can even create multiple ModeShapeEngine instances within the same JVM. However, most applications will simply need a single instance.

Most ModeShape components are thread-safe and can be used concurrently by multiple threads. This includes ModeShapeEngine and the implementations of javax.jcr.Repository, javax.jcr.Session, javax.jcr.Node, javax.jcr.Property, and other JCR interfaces, as well as immutable classes like RepositoryConfiguration. Remember, however, that each Session instance can contain transient changes, so do not share a Session among multiple threads that perform writes: the threads will succeed in making concurrent changes, but the transient state of the Session will be a combination of all the changes, and calls to Session.save() will result in strange persisted states and potentially invalid content.
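
The advice about not sharing a Session between writer threads can be illustrated with a small toy model. This is only a sketch: ToySession, runWriters, and the list-based "store" are hypothetical stand-ins invented for illustration and are not part of the JCR or ModeShape APIs. Each writer thread creates its own session, so every saved batch contains only that thread's changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Toy illustration of "one session per writer thread"; not JCR or ModeShape code. */
final class SessionPerThreadSketch {

    /** Hypothetical stand-in for a Session: collects transient changes, persists them as one batch. */
    static final class ToySession {
        private final List<String> transientChanges = new ArrayList<>();
        private final List<List<String>> store;

        ToySession(List<List<String>> store) { this.store = store; }

        void change(String description) { transientChanges.add(description); }

        void save() {
            store.add(new ArrayList<>(transientChanges)); // everything pending goes out together
            transientChanges.clear();
        }
    }

    /** Starts one writer thread per name; each thread creates, uses, and saves its own session. */
    static List<List<String>> runWriters(String... writers) {
        List<List<String>> store = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String writer : writers) {
            Thread t = new Thread(() -> {
                ToySession session = new ToySession(store); // never shared with another thread
                session.change(writer + ":add node");
                session.change(writer + ":set property");
                session.save();
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // Each saved batch holds the changes of exactly one writer.
        System.out.println(runWriters("A", "B"));
    }
}
```

Had both threads written through a single ToySession, one save() could have persisted a mixture of both writers' pending changes, which is the situation the paragraph above warns about.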

Your application can create a new ModeShapeEngine with its no-argument constructor:

org.modeshape.jcr.ModeShapeEngine engine = new ModeShapeEngine();

At this point the engine exists in a minimal state and must be started before it can be used. To do this, call the start() method, which blocks while the engine initializes its small internal state:

engine.start();

A new repository is deployed by reading in its JSON configuration document (which we'll learn about later) and then passing that to the engine:

RepositoryConfiguration config = RepositoryConfiguration.read(...); // from a file, URL, stream, or content String
javax.jcr.Repository repo = engine.deploy(config);

The deploy(...) method first validates the JSON document to make sure it is structurally correct and satisfies the schema; any problems result in an exception containing the validation errors. If the configuration is valid and there isn't already a deployed repository with the same name, the deploy(...) method will then create and return the Repository instance.

Each repository can also be started and stopped, although unlike the engine a repository will automatically start when your application attempts to create a Session.

One advantage of using the Engine API is that your application can get the names of the deployed Repository instances and, given a repository name, can return the running state of the Repository as well as the Repository instance:

Set<String> names = engine.getRepositoryNames();
for ( String name : names ) {
    State state = engine.getRepositoryState(name);
    if ( State.RUNNING == state ) {
        // do something with this info
    }
    Repository repo = engine.getRepository(name);
}

Note that the Repository doesn't need to be running in order to get it. In fact, each Repository instance can be started explicitly or will automatically start as soon as the Repository.login(...) method is called.

Modifying the configuration programmatically before deployment (advanced)

Sometimes your application will need to review or modify a repository configuration. If you need to do this before you deploy the repository, then you can edit the JSON document using ModeShape's editor API. Here's a very simple example:

// Read in the existing configuration ...
RepositoryConfiguration config = RepositoryConfiguration.read("path/to/config.json");

// Edit the document ...
Editor editor = config.edit();
editor.setString(RepositoryConfiguration.FieldName.JNDI_NAME, "new-jndi-name");

// Create a new configuration with the edited document ...
RepositoryConfiguration newConfig = new RepositoryConfiguration(editor, config.getName());

At this point, you can deploy the new configuration:

javax.jcr.Repository repo = engine.deploy(newConfig);

or even write out the configuration to a JSON file:

OutputStream stream = ...
org.modeshape.schematic.document.Json.write(newConfig.getDocument(),stream);

or write the JSON to a string:

String json = org.modeshape.schematic.document.Json.write(newConfig.getDocument());

Modifying the configuration of a deployed repository (advanced)

Sometimes you want to change the configuration of a repository that is already deployed and running.

Each Repository instance keeps a reference to its immutable RepositoryConfiguration. But that configuration can be edited to alter the repository's configuration even if that Repository is running and being used by JCR clients. Here's the basic workflow for changing the configuration of a deployed Repository:

String repoName = ...
RepositoryConfiguration deployedConfig = engine.getRepositoryConfiguration(repoName);

// Create an editor ...
Editor editor = deployedConfig.edit();

// Use the editor to modify the JSON configuration (we'll do something trivial here) ...
EditableDocument storage = editor.getOrCreateDocument(FieldName.STORAGE);
EditableDocument binaries = storage.getOrCreateDocument(FieldName.BINARY_STORAGE);
binaries.setNumber(FieldName.MINIMUM_BINARY_SIZE_IN_BYTES,8096);

// Get the changes made by the editor and validate them ...
Changes changes = editor.getChanges();
Results validationResults = deployedConfig.validate(changes);
if ( validationResults.hasErrors() ) {
   // Report the errors
   System.out.println(validationResults);
} else {
   // Update the deployed repository's configuration with these changes ...
   engine.update(repoName,changes);
}

The example obtains the RepositoryConfiguration (line 2) and an editor for it (line 5), and then manipulates the JSON document on lines 8-10: it gets or creates the "storage" nested document, inside that gets or creates the "binaryStorage" nested document, and inside that sets the "minimumBinarySizeInBytes" field to 8096 bytes (roughly 8K). The example then gets the changes made by the editor (line 13), validates them (line 14), and either writes out the validation problems (line 17) or applies the changes (line 20).

The engine.update(...) method call (line 20) applies the configuration in a consistent and thread-safe manner. It first obtains an internal lock, grabs the repository's current configuration (which may have changed since our call at line 2), applies the changes that were made by the editor, validates the configuration, updates the running repository with the new valid configuration, and releases the internal lock. Note that this can all be done even while other parts of your application are still using the Repository to read and update content.

Of course, some configuration changes are pretty severe, like changing the place where a repository stores all its content. These kinds of changes can still be made, but will not take effect until the repository is shut down and restarted.

This process may seem complicated, but it means that your application doesn't have to coordinate or centralize the changes. Instead, multiple threads can safely make changes to the same repository configuration without having to worry about locking or synchronizing the changes. Of course, if multiple threads make different changes to the same configuration property, the last one to be applied will win.
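
The lock, re-read, apply, validate, and swap sequence described above can be sketched with plain Java collections. This is a minimal model under stated assumptions, not ModeShape's implementation: ConfigUpdateSketch, its map-based configuration, and the single validation rule are all hypothetical and exist only to show why callers need no coordination of their own and why the last change to a given field wins.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a lock-protected configuration update; not ModeShape's implementation. */
final class ConfigUpdateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Map<String, Object> current = Collections.emptyMap();

    /** Returns the current immutable snapshot of the configuration. */
    Map<String, Object> snapshot() {
        return current;
    }

    /** Applies the changes on top of whatever configuration is current when the lock is held. */
    void update(Map<String, Object> changes) {
        lock.lock();
        try {
            Map<String, Object> next = new HashMap<>(current); // re-read inside the lock
            next.putAll(changes);                               // later updates overwrite earlier ones
            validate(next);                                     // throws before anything is swapped
            current = Collections.unmodifiableMap(next);        // publish the new snapshot
        } finally {
            lock.unlock();
        }
    }

    /** Hypothetical rule standing in for schema validation. */
    private static void validate(Map<String, Object> config) {
        Object size = config.get("minimumBinarySizeInBytes");
        if (size != null && (!(size instanceof Integer) || (Integer) size < 0)) {
            throw new IllegalArgumentException("minimumBinarySizeInBytes must be a non-negative integer");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConfigUpdateSketch repo = new ConfigUpdateSketch();
        Thread a = new Thread(() -> repo.update(Collections.singletonMap("minimumBinarySizeInBytes", 4096)));
        Thread b = new Thread(() -> repo.update(Collections.singletonMap("jndiName", "jcr/sketch")));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(repo.snapshot()); // contains both changes, whichever thread ran first
    }
}
```

Because each update starts from the snapshot that is current inside the lock, two threads changing different fields never lose each other's work, while two threads changing the same field simply leave the later value in place.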

Shutting down and undeploying Repositories

Repository instances can be shut down and undeployed:

String repoName = ...
Future<Boolean> future = engine.undeploy(repoName);
future.get();   // optional, but blocks until repository is completely shutdown and removed

Note that the ModeShapeEngine.undeploy(String) call on line 2 undeploys the repository (meaning no new sessions can be created) and asynchronously shuts the repository down (closing all existing sessions). Because it is asynchronous, the undeploy(...) method returns immediately, but it returns a java.util.concurrent.Future that the caller can optionally use to block until the repository has completely shut down (line 3).
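
Since the returned object is a standard java.util.concurrent.Future, an application that does not want to block indefinitely can use the timed form of get(). The following is a minimal sketch using only the JDK; the awaitShutdown helper is a hypothetical name, and a CompletableFuture stands in for the Future that the engine would return.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of a bounded wait on a shutdown Future; uses only java.util.concurrent. */
final class BoundedWaitSketch {

    /** Waits up to the given time; returns false on timeout or failure instead of blocking forever. */
    static boolean awaitShutdown(Future<Boolean> future, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return false;                        // still shutting down; the caller decides what to do next
        } catch (ExecutionException e) {
            return false;                        // the asynchronous task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future returned by an undeploy or shutdown call.
        Future<Boolean> future = CompletableFuture.supplyAsync(() -> true);
        System.out.println(awaitShutdown(future, 10, TimeUnit.SECONDS));
    }
}
```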

Shutting down the engine

And finally, the entire engine can be shut down:

Future<Boolean> future = engine.shutdown();
if ( future.get() ) {   // optional, but blocks until engine is completely shutdown or interrupted
    System.out.println("Shut down ModeShape");
}

Once again, the shutdown() method is asynchronous, but it returns a Future so that the caller can block if needed. There is an alternative form of shutdown that takes a boolean parameter specifying whether the engine should force the shutdown of all running repositories, or whether the shutdown should fail if there is at least one running repository:

boolean forceShutdown = false;
Future<Boolean> future = engine.shutdown(forceShutdown);
if ( future.get() ) {   // optional, but blocks until engine is completely shutdown or interrupted
    System.out.println("Shut down ModeShape.");
} else {
    System.out.println("At least one repository is in use, so shutdown aborted.");
}

Use RepositoryFactory and the JCR API

The simplest approach a Java SE application can take is to use only the JCR 2.0 API. We talked in the Introduction to JCR about how an application can use the Java SE ServiceLoader mechanism and JCR's RepositoryFactory API to find a JCR Repository instance:

Map<String,String> parameters = ...
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}

This approach is great if your application is designed to use different JCR implementations and you don't want to use implementation-specific APIs. You can even load the properties from a file:

java.io.InputStream stream = ... // get the stream to the properties file
java.util.Properties parameters = new java.util.Properties();
parameters.load(stream);  // or a java.io.Reader

Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}

When embedding ModeShape into an application, the parameters map should contain at a minimum a single property that defines the URL of the repository's configuration file. Thus the properties file might look like this:

org.modeshape.jcr.URL = file://path/to/configFile.json

or you can create the parameters programmatically:

Map<String,String> parameters = new HashMap<String,String>();
parameters.put("org.modeshape.jcr.URL","file://path/to/configFile.json");

Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}

In addition to the "org.modeshape.jcr.URL" parameter, ModeShape also looks for a "org.modeshape.jcr.RepositoryName" parameter. Each repository configuration file contains the name of the repository, so this "RepositoryName" parameter is not required. But providing it allows ModeShape's RepositoryFactory implementation to see if the named Repository has already been deployed without having to read the configuration file. If it doesn't find a Repository that was deployed with that name and that configuration file, the factory will automatically deploy the specified configuration and return the named repository.

As a convenience, ModeShape provides two constants you can use in your application when programmatically creating the parameters to pass to RepositoryFactory:

package org.modeshape.jcr.api;

public interface RepositoryFactory extends javax.jcr.RepositoryFactory {

    public static final String URL = "org.modeshape.jcr.URL";

    public static final String REPOSITORY_NAME = "org.modeshape.jcr.RepositoryName";

    ...
}

Pros and cons

Obviously the most important benefit of your applications using the JCR RepositoryFactory to find the Repository instances is that it is completely independent of ModeShape (or any other JCR 2.0 implementation). This may be very important if your application needs to work with multiple JCR implementations.

On the other hand, the JCR 2.0 API doesn't provide a way to manage the Repository instances. For example, your application may want to shut down a repository after it has finished using it. Or perhaps it wants to alter the configuration of a repository while it is in use. In these cases, your application may want to use the ModeShape-specific Engine API.

Configuring repositories

Whether your application uses the JCR 2.0 RepositoryFactory to obtain its repositories or the ModeShape Engine API to explicitly manage and access its repositories, your application will need to have a separate ModeShape configuration file for each repository you want to use.

ModeShape repository configuration files

Each ModeShape repository is configured with a separate and independent JSON file that adheres to our JSON Schema (https://github.com/ModeShape/modeshape/blob/modeshape-5.4.1.Final/modeshape-jcr/src/main/resources/org/modeshape/jcr/repository-config-schema.json). Every field within the configuration has a sensible default, so the following is a completely valid configuration:

Simplest.json

{ }

Note that the name of the repository is derived from the filename. It is more idiomatic, however, to at least specify the repository name:

Simplest.json

{
    "name" : "Simplest"
}

When deployed, this configuration specifies a non-clustered repository named "Simplest" that stores the content, binary values, and query indexes on the local file system under a local directory named "Simplest". Of course, you're very likely going to want to expressly state the various configuration fields for your repositories.

Here's a far more complete example for a repository named "DataRepository" that uses most of the available fields:

DataRepository.json

{
    "name" : "DataRepository",
    "transactionMode" : "auto",
    "monitoring" : {
        "enabled" : true
    },
    "workspaces" : {
        "predefined" : ["otherWorkspace"],
        "default" : "default",
        "allowCreation" : true
    },
    "storage" : {
        "persistence": {
            "type": "file",
            "path" : "DataRepository/store"
        },
        "binaryStorage" : {
            "type" : "file",
            "directory" : "DataRepository/binaries",
            "minimumBinarySizeInBytes" : 4096
        }
    },
    "security" : {
        "anonymous" : {
            "username" : "<anonymous>",
            "roles" : ["readonly","readwrite","admin"],
            "useOnFailedLogin" : false
        },
        "providers" : [
          {
           "name" : "My Custom Security Provider",
           "classname" : "com.example.MyAuthenticationProvider"
          },
          {
            "classname" : "JAAS",
            "policyName" : "modeshape-jcr"
          }
         ]
    },
    "indexProviders" : {
        "local" : {
            "classname" : "org.modeshape.jcr.index.local.LocalIndexProvider",
            "directory" : "target/LocalIndexProviderQueryTest"
        }
    },
    "indexes" : {
        "nodesByName" : {
            "kind" : "value",
            "provider" : "local",
            "nodeType" : "nt:base",
            "columns" : "jcr:name(NAME)"
        },
        "nodesByLocalName" : {
            "kind" : "value",
            "provider" : "local",
            "nodeType" : "nt:base",
            "columns" : "mode:localName(STRING)"
        },
        "nodesByDepth" : {
            "kind" : "value",
            "provider" : "local",
            "nodeType" : "nt:base",
            "columns" : "mode:depth(LONG)"
        },
        "nodesByPath" : {
            "kind" : "value",
            "provider" : "local",
            "nodeType" : "nt:base",
            "columns" : "jcr:path(PATH)"
        }
    },
    "sequencing" : {
        "removeDerivedContentWithOriginal" : true,
        "threadPool" : "modeshape-workers",
        "sequencers" : {
            "ZIP Sequencer" : {
                "description" : "ZIP Files loaded under '/files' and extracted into '/sequenced/zip/$1'",
                "classname" : "ZipSequencer",
                "pathExpressions" : ["default:/files(//)(*.zip[*])/jcr:content[@jcr:data] => default:/sequenced/zip/$1"]
            },
            "Delimited Text File Sequencer" : {
                "classname" : "org.modeshape.sequencer.text.DelimitedTextSequencer",
                "pathExpressions" : [
                    "default:/files//(*.csv[*])/jcr:content[@jcr:data] => default:/sequenced/text/delimited/$1"
                ],
                "splitPattern" : ","
            }
        }
    }
}

Most of the field values match their defaults, although by default:

the "workspaces/predefined", "query/extractors", and "sequencing/sequencers" fields are each empty arrays;
the "security/providers" field defaults to an empty array, meaning only the anonymous provider is configured.

Of course, the standard JSON formatting rules apply.

Variables

Variables may appear anywhere within the string field values of the configuration JSON document. To use a variable in a non-string field, simply write that field's value as a string. When ModeShape reads in the JSON document, it replaces these variables with the system properties of the same name and converts any resulting field that is expected to hold a non-string value to the expected type. Any value that cannot be converted is reported as a problem.

Here's the grammar for the variables:

    variable := '${' variableNames [ ':' defaultValue ] '}'

    variableNames := variableName [ ',' variableNames ]

    variableName := /* any characters except ',' and ':' and '}' */

    defaultValue := /* any characters except '}' */

The value of each variableName is used to look up a system property via System.getProperty(String). Note that the grammar allows multiple variable names within a single variable, plus an optional default value. The names are processed from left to right until an existing system property is found; once one is found, the remaining names are not looked up. If none of the names resolves, the default value (if given) is used.
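
To make the lookup order concrete, here is a small self-contained sketch of this kind of substitution. It is not ModeShape's own code: the VariableSketch class and its resolve methods are hypothetical names, and leaving an unresolved variable without a default untouched is an assumption of this sketch rather than documented behavior.

```java
import java.util.function.Function;

/** Sketch of ${name1,name2:default} substitution; not ModeShape's own implementation. */
final class VariableSketch {

    /** Replaces each ${...} in the input using the given lookup (for example System::getProperty). */
    static String resolve(String input, Function<String, String> lookup) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = input.indexOf("${", pos);
            int end = start < 0 ? -1 : input.indexOf('}', start);
            if (start < 0 || end < 0) {
                out.append(input.substring(pos)); // no further complete variable
                return out.toString();
            }
            out.append(input, pos, start);        // copy the literal text before the variable
            String body = input.substring(start + 2, end);
            int colon = body.indexOf(':');
            String names = colon < 0 ? body : body.substring(0, colon);
            String defaultValue = colon < 0 ? null : body.substring(colon + 1);

            String value = null;
            for (String name : names.split(",")) {
                String key = name.trim();
                if (key.isEmpty()) continue;      // System.getProperty rejects empty keys
                value = lookup.apply(key);
                if (value != null) break;         // the first defined name wins
            }
            if (value == null) value = defaultValue;
            // Assumption: an unresolved variable without a default is left as written.
            out.append(value != null ? value : input.substring(start, end + 1));
            pos = end + 1;
        }
    }

    /** Convenience overload that resolves against the JVM's system properties. */
    static String resolve(String input) {
        return resolve(input, System::getProperty);
    }

    public static void main(String[] args) {
        System.setProperty("application.home.location", "/opt/app");
        System.out.println(resolve("${application.home.location}/data"));
        System.out.println(resolve("${application.min.binary.size:4096}"));
    }
}
```

The resolved text is still a string; converting it to the field's expected type (for instance with Integer.parseInt) would be a separate step after substitution.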

For example, here is part of the earlier "DataRepository.json" file, except that several "storage" field values have been changed to include variables:

DataRepository.json

{
    ...
    "storage" : {
         "persistence": {
            "type": "file",
            "path" : "${application.home.location}/data"
        },
        "binaryStorage" : {
            "type" : "file",
            "directory" : "${application.home.location}/binaries",
            "minimumBinarySizeInBytes" : "${application.min.binary.size:4096}"
        }
    },
    ...
}

Note how the "minimumBinarySizeInBytes" value is a string with the variable name; this works because ModeShape (in this case) will attempt to autoconvert the variable's replacement and default values to an integer, which is what the JSON Schema stipulates for the "minimumBinarySizeInBytes" field.

Clustering

Clustering a repository that is embedded into a Java application means settling a few things:

How the data will be stored. ModeShape 5 only supports using a shared database (be sure to read more about clustering).
How the different processes will communicate. This communication is all via JGroups, so a proper JGroups configuration is essential.
How frequently processes will be added to and removed from the cluster.

There are three areas of a repository's configuration that are related to clustering.

Using variables in the ModeShape configuration files is a very good practice because it allows your application to use a single set of configuration files throughout the cluster. Consider using a variable such as "${cluster-id}" to represent the unique identifier of the process within the cluster. Just be sure to set the value of each variable in the system properties; ModeShape does not provide any built-in variables.
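
Because ModeShape resolves variables from system properties, the application (or its launcher) has to define them before the configuration is read. The sketch below shows one way to do that with plain JDK calls. The ClusterIdSketch class is hypothetical, and falling back to a random UUID is an assumption made for illustration; many deployments would instead pass a stable value such as -Dcluster-id=node1 on the command line.

```java
import java.util.UUID;

/** Sketch: make sure the "cluster-id" system property exists before any configuration is read. */
final class ClusterIdSketch {
    static final String PROPERTY = "cluster-id"; // matches a ${cluster-id} variable in the configuration

    /** Keeps a value supplied by the launcher, otherwise generates one for this process. */
    static String ensureClusterId() {
        String id = System.getProperty(PROPERTY);
        if (id == null || id.isEmpty()) {
            id = "node-" + UUID.randomUUID(); // assumption: any unique string is acceptable
            System.setProperty(PROPERTY, id);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(ensureClusterId());
        // Only after this point would the application read its repository configuration.
    }
}
```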

Cluster name and JGroups

Since it no longer depends on Infinispan, ModeShape 5 brings back the ModeShape 3 style of clustering:

The "clustering" section of the repository JSON configuration file specifies the name of the cluster and the JGroups configuration. Here is an example of this section:

    "clustering" : {
        "clusterName" : "my-repo-cluster",
        "configuration" : "config/jgroups-config.xml"
    }

The ModeShape repository will only act in a clustered way if this section is defined.

Each ModeShape repository must be clustered independently of the other repositories deployed to the same processes, so be sure that the "clusterName" value is unique. Even though there is a default value (e.g., "ModeShape-JCR"), it is far better to explicitly set this value.

The "configuration" field defines the path to the JGroups configuration file. If this field is absent, then the repository will use the default JGroups configuration, which may or may not work out-of-the-box on your network. Here is a sample JGroups configuration file we use in some of our tests:

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP bind_port="0"
         recv_buf_size="${tcp.recv_buf_size:5M}"
         send_buf_size="${tcp.send_buf_size:5M}"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         sock_conn_timeout="3000"
         timer_type="new3"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="false"
         oob_thread_pool.enabled="false"
         port_range="0"/>
    <MPING/>
    <MERGE3 min_interval="10000"
            max_interval="30000"/>
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3 />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS join_timeout="3000" view_bundling="true"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K" />
    <pbcast.STATE_TRANSFER/>
</config>

For ModeShape to work correctly in a cluster, it is important that the JGroups stack is configured to dispatch local changes in the same thread. This is accomplished via two JGroups configuration attributes:

JGroups Attributes

 <PROTOCOL>
    ....
    thread_pool.enabled="false"
    oob_thread_pool.enabled="false"
 </PROTOCOL>

Storage

The "storage" section of the repository's JSON configuration file defines the repository persistence configuration as well as the binary storage, and both need to be properly configured for clustering. Here's an example:

{
    "name" : "Clustered Repository",
    "clustering" : {
        "clusterName" : "my-repo-cluster",
        "configuration" : "config/jgroups-config.xml"
    },
    "storage" : {
        "persistence" : {
            "type" : "db",
            "connectionUrl": "jdbc:h2:file:./target/clustered/db;AUTO_SERVER=TRUE",
            "driver": "org.h2.Driver"
        },
        "binaryStorage":{
          "type":"file",
          "directory":"storage/binaries",
          "minimumBinarySizeInBytes":4096
        }
    }
}

There are a couple of things to note:

All cluster members must have the same clusterName
All repository content is stored in the same database (in this example an H2 database)
All processes store the binaries in the "storage/binaries" directory. Using the file system may not work under heavy load, so in such cases you may consider using a database for binary storage.

Binary storage

When clustering, you have several options to consider here. For applications that are not binary-intensive, using the same file system location, with or without NFS, may be acceptable. For larger systems, however, you may want to consider a database or one of our other binary storage options, which are better suited to such cases.

Indexes

Indexes are optional. If you decide to use them, then depending on the type of index provider the indexes are either stored locally on each cluster member or shared globally by all members via a provider such as Elasticsearch.