Getting Started - ModeShape 5

Complete Maven examples

We have a number of self-contained examples that you can check out, build, and then modify to try different things. So if Git is your thing, the easiest way to get going with ModeShape 5.4.1.Final is to simply clone the repository and build the examples. For details, see our modeshape-examples repository on GitHub, and follow the instructions shown in the readme file on that page.

If Git isn't your thing, then read on to learn how to build a JCR application that embeds ModeShape and how you can install ModeShape into Wildfly and use it from within your web applications and services.

Embedding ModeShape in an application or library built with Maven

If your Java SE application or library uses Maven, then embedding ModeShape is straightforward.

The instructions on this page are for Java SE applications. If you're creating applications for deployment onto JBoss Wildfly, see the specific documentation about how to install ModeShape into JBoss Wildfly and use it with your web applications.

Prerequisites

Before you can use Maven to build an application that uses ModeShape, you'll need to have JDK 8 and Maven 3 installed.

All ModeShape releases are now available directly from the Maven Central repository. It takes a few hours (at least) after the artifacts are in the JBoss repository before they appear in Maven Central. So if you don't see a recent release in Maven Central, just give it a bit of time - or use the JBoss Maven repository.

Add ModeShape Dependencies

The next step is to edit your application or library's POM file and add the dependencies on the JCR API and ModeShape. The easiest way to do that is to use one of our Maven BOMs, which specify the versions for all of the ModeShape components and all of their dependencies:

Maven dependencies for the JCR API and ModeShape engine

<dependencyManagement>
  <dependencies>
    <!-- Import the ModeShape BOM for embedded usage. This adds to the "dependencyManagement" section
         defaults for all of the modules we might need, but we still have to include in the
         "dependencies" section the modules we DO need. The benefit is that we don't have to
         specify the versions of any of those modules.-->
    <dependency>
      <groupId>org.modeshape.bom</groupId>
      <artifactId>modeshape-bom-embedded</artifactId>
      <version>5.1.0.Final</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

Then include in the POM "<dependencies>" section the ModeShape modules that you will directly use. Note that you don't need to specify any of the versions, since that's what the modeshape-bom-embedded provides. The one module that you need to include is the primary JCR implementation:

Maven dependencies for the JCR API and ModeShape engine

<dependencies>
  ...
  <dependency>
    <groupId>org.modeshape</groupId>
    <artifactId>modeshape-jcr</artifactId>
  </dependency>
  ...
</dependencies>

But you should also include any other modules that you'll either directly use or optional modules that you want to use. For example, if you're going to use any of ModeShape's public API (instead of just the JCR API), then you should include this dependency:

Optional Maven dependencies for the ModeShape public API

<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-jcr-api</artifactId>
</dependency>

If you're going to use the relational database (DB) persistence store, then you'll also need to add a dependency on the JDBC driver or embeddable database. For example, here's the dependency required to use the embeddable H2 database:

Maven Dependency for the H2 embeddable database

<dependency>
   <groupId>com.h2database</groupId>
   <artifactId>h2</artifactId>
</dependency>

Logging

ModeShape is designed to use the same logging framework as your application, and it can dynamically bind to Log4J, SLF4J, Logback and the JDK's logging system. Your application or library will probably already be using one of these logging frameworks and will already have them in the dependencies.

Configuring a ModeShape repository

The ModeShape engine is capable of running (or "deploying") multiple JCR repositories. However, each repository is configured separately and is completely independent from all other repositories. To configure a repository, you'll need a configuration file. Starting with ModeShape 3.0, these configuration files use the JSON format (which is a lot easier to read and create). Here is the minimum configuration file for a repository:

{ }

That's not a mistake. An empty JSON document is a completely valid repository configuration. Everything has a default value except for the repository's name, and the filename is used if a name is not specified in the file. For example, if this configuration is saved in a file named my_repository.json, the name of the repository will be "my_repository".
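As a rough illustration of the naming rule just described (not ModeShape's actual implementation), the fallback name can be thought of as the file name with its directory and last extension removed. The class and method names below are hypothetical.

```java
import java.nio.file.Paths;

/** Hypothetical helper that mimics the "use the file name as the repository name" rule. */
final class DefaultRepositoryName {

    /** Returns the file name without its directories and without its last extension. */
    public static String fromFileName(String path) {
        String fileName = Paths.get(path).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Names without an extension (or starting with a dot) are returned unchanged.
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("config/my_repository.json"));
    }
}
```

Setting an explicit "name" field in the configuration file avoids relying on this fallback.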

Of course, lots of other options can be specified in the configuration file, but typically only the non-default values are specified. Since most of the defaults are sensible, many configurations will be pretty small.

Here's a configuration file that uses most of the available fields, most of which happen to be set to the same values as the defaults. (The line numbers cited in the description that follows count the lines of this file, starting at 1 for the opening brace.)

'my_repository.json'

{
    "name" : "Test Repository",
    "jndiName" : "jcr/Test Repository",
    "monitoring" : {
        "enabled" : true
    },
    "workspaces" : {
        "default" : "defaultWorkspace",
        "predefined" : ["otherWorkspace"],
        "allowCreation" : true
    },
    "storage" : {
        "persistence" : {
            "type" : "file",
            "path": "target/test_repository"
        } ,
        "binaryStorage" : {
            "minimumBinarySizeInBytes" : 4096,
            "minimumStringSize" : 4096,
            "type" : "file"
        }
    },
    "security" : {
        "jaas" : {
            "policyName" : "modeshape-jcr"
        },
        "anonymous" : {
            "roles" : ["readonly","readwrite","admin"],
            "username" : "<anonymous>",
            "useOnFailedLogin" : false
        },
        "providers" : [
            {
                "classname" : "org.example.MyAuthorizationProvider",
                "member1" : "value of instance member1"
            }
        ]
    },
    "indexProviders" : {
        "local" : {
            "classname" : "org.modeshape.jcr.index.local.LocalIndexProvider",
            "directory" : "target/local_index_test_repository/1"
        }
    },
    "indexes" : {
        "index" : {
            "kind" : "value",
            "nodeType" : "mix:title",
            "columns" : "jcr:title(STRING)",
            "provider" : "local",
            "synchronous" : true,
            "workspaces" : "*"
        }
    },
   "textExtraction": {
        "extractors" : {
            "tikaExtractor":{
                "name" : "Tika content-based extractor",
                "classname" : "tika"
            }
        }
    },
    "sequencing" : {
        "removeDerivedContentWithOriginal" : true,
        "threadPool" : "modeshape-workers",
        "sequencers" : [
            {
                "name" : "XSD sequencer",
                "classname" : "xsd",
                "pathExpressions" : [ "/(*.xsd)/jcr:content[@jcr:data]" ]
            }
        ]
    }
}

This configuration defines:

The name of the repository (on line 2) to be "Test Repository", which will take precedence over the name of the file.
The repository will be registered in JNDI (if JNDI is available in the environment) with the name "jcr/Test Repository" (line 3). By default, the JNDI name will follow the pattern "jcr/<name>", where "<name>" is the repository name.
The repository will periodically collect performance and statistical metrics in the background (line 5). This is enabled by default, but can be set to false to turn off the collection.
The "defaultWorkspace" workspace (on line 8) is used by default when the client calls a Repository.login(...) method that doesn't take the workspace name as a parameter, or if the client provides a null reference for the workspace name. If not specified, the default workspace for the repository will be named "default".
One other workspace named "otherWorkspace" (line 9) will exist upon startup. By default, only the default workspace will exist.
Clients can use the "Workspace.createWorkspace(...)" methods to create new workspaces (line 10). This is the default.
The repository will store data on disk inside the target/test_repository folder (lines 14-15), relative to the runtime working directory of the JVM.
The repository will store all BINARY values equal to or larger than 4096 bytes (line 18) in the binary store that uses the file system (line 20). Smaller BINARY values are held in-memory or persisted with the node information. The default size is 4096 bytes, and the default type is "filesystem".
The repository can also store all STRING values equal to or larger than a specified number of characters. In this case, all STRING values with 4096 or more characters (line 19) will be stored in the binary store that uses the file system (line 20). Smaller STRING values are held in-memory or persisted with the node information. By default, the minimumStringSize value will be set to the explicit or default value of minimumBinarySizeInBytes.
The repository will use several security providers for authentication and authorization. By default, only the anonymous provider is used. The order of the providers is important: a caller will be authenticated or authorized if any of the providers succeed for the caller:
- The JAAS policy named "modeshape-jcr" will be used (lines 24-26). If the "jaas" nested document is not specified, JAAS will not be used. If specified in this fashion, the JAAS security provider will always be used first. The "modeshape-jcr" policy is used by default if JAAS is enabled.
- Any providers as configured by the "providers" nested array (lines 32-37), where each array value is a nested document specifying the provider's name, description, and type (or classname). Only the "type" (or "classname") field is required. The two built-in types are "jaas" and "servlet", but any implementation of the "org.modeshape.jcr.security.AuthorizationProvider" interface can be specified instead. Any instance members on the implementation class can be set by specifying additional fields of the same name, as long as the member type is String, a primitive boolean or number, java.util.Map, or java.util.List.
- The anonymous provider (lines 27-31) is enabled by default and (if enabled) is always the last provider to be consulted. It authenticates all users with read and write permission by default, although the exact roles (either "readonly", "readwrite", or "admin") can be configured with the "roles" field; specify an empty "roles" array to completely disable the anonymous provider. All sessions that are authenticated by this provider will be given the username given by the "username" field (line 29), which defaults to the literal "<anonymous>" value (including the angle brackets). Any user that fails to properly authenticate with another provider will not be given an anonymous session unless the "useOnFailedLogin" field (line 30) is set to true.
Since 4.0.0.Final, ModeShape supports defining indexes (lines 39-54), similar to the way you would in an RDBMS, together with index providers, the mechanism through which the index definitions are maintained. If you omit these two sections, queries will still work but may not perform as well, since ModeShape will have to crawl the entire repository each time to find the nodes. If you decide to define indexes, make sure both "indexes" and "indexProviders" are configured and each index has a defined index provider. See https://docs.jboss.org/author/display/MODE50/Query+and+search#Queryandsearch-Usingindexes for a more thorough explanation.
Text extractors (lines 55-62) are used to find the search terms from BINARY values. No text extractors are used by default, but specifying the name, description, and type (or classname) for one or more text extractor implementation classes enables this feature. Two text extractor types are provided out of the box (e.g., "tika" and "vdb"); the example above configures only the Tika extractor, using the "classname" field and an optional name (useful for documentation and during administration).
The configured sequencers (lines 63-73) specify the types of sequencers that should be run. Each sequencer is configured with one or more path expressions that are matched against the paths of changed nodes; when any changed path matches the expression, the sequencer is called on the changed property/node and the generated output of the sequencer invocation is written to the location specified in the path expression. Each sequencer is configured by specifying the required "type" field, and an optional name and description. Custom implementations of the "org.modeshape.jcr.api.sequencer.Sequencer" interface can be specified using the "classname" field (instead of the "type" field), and any instance members on the implementation class can be set by specifying additional fields of the same name, as long as the member type is String, a primitive boolean or number, java.util.Map, or java.util.List. Several types of sequencers are available out of the box:
- "cnd" parses JCR CND files to generate a node structure describing the namespaces, node types, property definitions, and child node definitions
- "class" and "java" parse Java class files and source files (respectively) and generate a node structure describing the encoded types, fields, methods, parameters, etc.
- "ddl" parses the more important DDL statements from SQL-92, Oracle, Derby, and PostgreSQL, and constructs a graph structure containing a structured representation of these statements. The resulting graph structure is largely the same for all dialects, though some dialects have non-standard additions to their grammar, and thus require dialect-specific additions to the graph structure.
- "image" extracts metadata from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM and PSD image files. This sequencer extracts the file format, image resolution, number of bits per pixel and optionally number of images, comments and physical resolution.
- "wsdl" parses WSDL files that adhere to the W3C's Web Service Definition Language (WSDL) 1.1 specification, and outputs a representation of the WSDL file's messages, port types, bindings, services, types (including embedded XML Schemas), documentation, and extension elements (including HTTP, SOAP and MIME bindings). This derived information is intended to mirror the structure and semantics of the actual WSDL files while also making it possible for ModeShape users to easily navigate, query and search over this derived information. This sequencer captures the namespace and names of all referenced components, and will resolve references to components appearing within the same file.
- "xsd" parses XML Schema Documents that adhere to the W3C's XML Schema Part 1 and Part 2 specifications, and outputs a representation of the XSD's attribute declarations, element declarations, simple type definitions, complex type definitions, import statements, include statements, attribute group declarations, annotations, other components, and even attributes with a non-schema namespace. This derived information is intended to accurately reflect the structure and semantics of the XSD files while also making it possible for ModeShape users to easily navigate, query and search over this derived information. This sequencer captures the namespace and names of all referenced components, and will resolve references to components appearing within the same file.
- "xml" parses XML files and extracts the element, attribute, namespace, DTD, entity, comments and other information in the file, producing a node structure representative of this information.
- "zip" extracts the files and folders contained in the ZIP archive file, extracting the files and folders into the repository using JCR's nt:file and nt:folder built-in node types. The structure of the output thus matches the logical structure of the contents of the ZIP file. Note that the resulting files may then be sequenced.
- "mp3" processes MP3 audio files added to a repository and extracts the ID3 metadata for the file, including the track's title, author, album name, year, and comment, and then writes a node structure representing this information.
- "fixedwidth" extracts rows and fixed-width columns from text streams and generates a node structure representative of the rows and column values in each row.
- "delimited" extracts rows and delimited columns from text streams and generates a node structure representative of the rows and column values in each row.
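The provider ordering described in the list above (configured providers consulted in order, first success wins, anonymous provider last) can be sketched with plain Java. This is an illustrative model only: the class name, the use of a String for credentials, and the assumption that a login without credentials goes straight to the anonymous provider are all simplifications introduced here, not ModeShape classes or behavior guarantees. Disabling the anonymous provider with an empty "roles" array is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Illustrative model only: these names are hypothetical and are not ModeShape classes. */
final class ProviderChainSketch {

    // Each provider maps credentials to an authenticated username, or empty on failure.
    private final List<Function<String, Optional<String>>> providers;
    private final boolean useOnFailedLogin;

    public ProviderChainSketch(List<Function<String, Optional<String>>> providers,
                               boolean useOnFailedLogin) {
        this.providers = new ArrayList<>(providers);
        this.useOnFailedLogin = useOnFailedLogin;
    }

    /** A null argument models a login without credentials (an assumption of this sketch). */
    public Optional<String> authenticate(String credentials) {
        if (credentials != null) {
            for (Function<String, Optional<String>> provider : providers) {
                Optional<String> user = provider.apply(credentials);
                if (user.isPresent()) {
                    return user; // the first provider that succeeds wins
                }
            }
            // Credentials were supplied but every provider rejected them.
            if (!useOnFailedLogin) {
                return Optional.empty();
            }
        }
        // The anonymous provider is consulted last.
        return Optional.of("<anonymous>");
    }

    public static void main(String[] args) {
        Function<String, Optional<String>> jaasLike =
                c -> "secret".equals(c) ? Optional.of("alice") : Optional.<String>empty();
        ProviderChainSketch chain =
                new ProviderChainSketch(Collections.singletonList(jaasLike), false);
        System.out.println(chain.authenticate("secret"));
        System.out.println(chain.authenticate("wrong"));
        System.out.println(chain.authenticate(null));
    }
}
```

With "useOnFailedLogin" left at false, a caller who supplies wrong credentials is rejected rather than silently downgraded to an anonymous session, which is usually the safer default.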

Persistence

Starting with 5.0, ModeShape comes with its own persistence stores which support storing data in-memory, on disk or in a relational DB.

Simple configuration

The simplest configuration is a JSON file without a specific persistence section. This will store all repository data in-memory.

Minimal configuration

{
    "name" : "repo"
}

One can also explicitly define in-memory persistence like so:

Explicit in-memory persistence

{
    "name" : "test",
    "storage" : {
        "persistence": {
            "type": "mem"
        }
    }
}

Disk persistence

To store repository data on disk, you add a persistence section with the file type:

Disk persistence

{
    "name" : "Test Repository",
    "storage" : {
        "persistence": {
            "type": "file",
            "path" : "target/test_repository"
        }
    }
}

DB persistence

To store repository data in an RDBMS, you add a persistence section with the db type:

DB persistence

{
    "name" : "Test Repository",
    "storage" : {
      "persistence" : {
            "type" : "db",
            "connectionUrl": "jdbc:h2:file:./target/test_repo/db;AUTO_SERVER=TRUE"
        }
    }
}

See the persistence section for more details about each persistent store and the different available options.
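Because RepositoryConfiguration.read(...) is described later on this page as also accepting the JSON string itself, an application could assemble one of these small configurations at runtime instead of shipping a file. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and the escaping handles only backslashes and double quotes, so it is not a general-purpose JSON writer.

```java
/** Hypothetical helper that renders the "file" persistence configuration shown above. */
final class FilePersistenceConfig {

    /** Escapes backslashes and double quotes for use inside a JSON string literal. */
    public static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a repository configuration that stores data on disk under the given path. */
    public static String toJson(String repositoryName, String path) {
        return "{\n"
             + "    \"name\" : \"" + escape(repositoryName) + "\",\n"
             + "    \"storage\" : {\n"
             + "        \"persistence\" : {\n"
             + "            \"type\" : \"file\",\n"
             + "            \"path\" : \"" + escape(path) + "\"\n"
             + "        }\n"
             + "    }\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("Test Repository", "target/test_repository"));
    }
}
```

For anything beyond a couple of fields, a JSON library or a configuration file on the classpath is likely easier to maintain than string concatenation.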

Binary store

Add a binary store configuration - see the binary stores section for the different implementations.

Starting a ModeShape Repository

Now that we have a configuration for a ModeShape repository, we can start writing the code to start up ModeShape, deploy our repository, and start using JCR.

Starting the ModeShape engine

The first step is to instantiate and start the ModeShape engine. As we mentioned earlier, the ModeShape engine has no configuration, so this is almost trivial:

Start the ModeShape engine

// Create and start the engine ...
ModeShapeEngine engine = new ModeShapeEngine();
engine.start();

This uses the org.modeshape.jcr.ModeShapeEngine class' no-argument constructor, and then calls start(), which will block until the engine is running. Since the engine is extremely lightweight, this returns almost immediately.

At this point we have a running ModeShape engine, but it doesn't contain any repositories. That's next.

Deploying our Repository

In order to deploy a repository to our running engine, we need to read in the repository's configuration. This is easily done with one of the static org.modeshape.jcr.RepositoryConfiguration.read(...) methods, which can read from a java.io.File, a java.io.InputStream, a java.net.URL to the file, or a String containing either the path to the resource file on the classpath or the JSON string itself. In this example, we'll read the file from the classpath:

Read a ModeShape repository configuration

RepositoryConfiguration config = RepositoryConfiguration.read("my-repository-config.json");

Here, the name of the repository will either be defined in the file, or will be "my-repository-config" due to the name of the file being read. Of course, we can also optionally change the name programmatically:

Optionally set the repository name programmatically

config = config.withName("My Repository");

Once we've read in the configuration, we can validate it to ensure it was constructed correctly. If not, we'll print out the problems (which will have the line number and description for each error) and simply exit, although you probably want to do something more useful.

Validate the repository configuration

// Verify the configuration for the repository ...
Problems problems = config.validate();
if (problems.hasErrors()) {
    System.err.println("Problems with the configuration.");
    System.err.println(problems);
    System.exit(-1);
}

Any errors at this point will absolutely prevent deploying a repository, and they need to be dealt with. That's why the above sample code exits the process if there are errors. However, not everything in the configuration can be validated at this time. For example, references to CND files or initial content files can only be dereferenced within a running environment, something which the RepositoryConfiguration does not have on its own.

So after we determine the configuration has no errors, the next step is to deploy it to our engine:

Deploy the repository to the engine

javax.jcr.Repository repository = engine.deploy(config);

If there are any catastrophic problems, the repository will not successfully deploy and the above method will throw an exception. If the repository does successfully deploy, then the repository will be in a running state.

Starting with ModeShape 3.6, the repository will record warnings and errors that do not prevent deployment but which otherwise may be significant problems:

Checking for deployment problems

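// getStartupProblems() is defined by ModeShape's JcrRepository, not by javax.jcr.Repository
Problems startupProblems = ((org.modeshape.jcr.JcrRepository) repository).getStartupProblems();
if (startupProblems.hasErrors() || startupProblems.hasWarnings()) {
    System.err.println("Problems deploying the repository.");
    System.err.println(startupProblems);
    System.exit(-1);
}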

Again, your application should handle such errors more gracefully than the sample code above.

After this, at any time we could shut down the repository and/or remove it from the engine. But let's continue by getting a JCR Session.

Using the Repository and the JCR API

Once a repository has been deployed to an engine (and is running), we can simply look up the repository by name:

Get the JCR Repository by name

javax.jcr.Repository repository = engine.getRepository("My Repository");

And at this point, we can use the standard JCR API to obtain a Session and start using the repository:

Create and use a JCR Session

javax.jcr.Session session = repository.login("default");

// Get the root node ...
Node root = session.getRootNode();
assert root != null;

System.out.println("Found the root node in the \"" + session.getWorkspace().getName() + "\" workspace");

session.logout();

Stopping the repository and engine

When we're finished with the engine, we can shut it down to stop all repositories, terminate any ongoing background operations (such as sequencing), and reclaim any resources that were acquired by this engine. Since this might take a little time, the "shutdown()" method immediately returns a java.util.concurrent.Future that you can use to wait until the shutdown process has completed. Of course, if you don't want to block while the engine shuts down, there's no need to call "get()" on the future.

Shut down the ModeShape engine, optionally blocking until completed

Future<Boolean> future = engine.shutdown();
if ( future.get() ) {  // blocks until the engine is shut down
   System.out.println("Shutdown successful");
}
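If blocking indefinitely is not acceptable, the standard Future API also allows a bounded wait. The sketch below uses only JDK classes; the helper name is hypothetical, and a completed CompletableFuture stands in for the Future that engine.shutdown() returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for waiting on a shutdown Future with an upper time limit. */
final class BoundedShutdownWait {

    /** Returns true only if the future completed with TRUE within the given time. */
    public static boolean awaitShutdown(Future<Boolean> future, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return false; // still shutting down; the caller may log this and move on
        } catch (ExecutionException e) {
            return false; // the shutdown task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future<Boolean> returned by engine.shutdown().
        Future<Boolean> alreadyDone = CompletableFuture.completedFuture(true);
        System.out.println(awaitShutdown(alreadyDone, 30, TimeUnit.SECONDS));
    }
}
```

A bounded wait like this is typically useful in a JVM shutdown hook, where waiting forever would delay process exit.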

This entire section showed how to use ModeShape to start an engine, deploy a repository, obtain the repository, create a Session, and then shut down the repository and the engine. This required the use of ModeShape-specific classes, which isn't always desirable. In the next section, we'll see how this same process can be done while only using the standard JCR API.

Using JCR's RepositoryFactory

The JCR 2.0 specification introduced the javax.jcr.RepositoryFactory interface that can be used with the Java SE Service Locator pattern to find a Repository instance without using any implementation-specific APIs. The basic process is as follows:

Use only the standard JCR API to find a Repository

Map<String,String> parameters = ...
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}
Session session = repository.login("default");
...

Note how simple this is, while under the covers it is doing exactly the same process we described above.

Here, the parameters contain implementation-specific properties, but your application can easily read them from a file to keep all implementation-specific details out of your application code.

ModeShape requires one parameter:

Properties file for the ModeShape RepositoryFactory

org.modeshape.jcr.URL = file:path/to/my_repository.json

where the value of the property is the URL that can be resolved to the JSON configuration file. Other URLs might be to a file on the file system using an absolute path (e.g., "file:///abs/path/to/my_repository.json") or even a URL to a web server (or governance repository!) and the configuration file (e.g., "http://www.example.com/repos/my_repository.json").
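One way to keep that property out of application code is to load it from a properties file into the Map<String,String> that RepositoryFactory.getRepository(...) expects. The following sketch uses only JDK classes; the class and method names are hypothetical, and a StringReader stands in for a real file so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that turns a properties source into RepositoryFactory parameters. */
final class FactoryParameters {

    /** Reads all properties from the reader and copies them into a String-to-String map. */
    public static Map<String, String> load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> parameters = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            parameters.put(key, props.getProperty(key));
        }
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = load(
                new StringReader("org.modeshape.jcr.URL = file:path/to/my_repository.json\n"));
        System.out.println(parameters.get("org.modeshape.jcr.URL"));
    }
}
```

In a real application the Reader would typically come from a file or classpath resource, and the resulting map would be passed to each RepositoryFactory found by the ServiceLoader.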

At this point using ModeShape just requires using the standard JCR API.

Oh, and if you want to shut down the ModeShape engine, you can (try to) cast the javax.jcr.RepositoryFactory instance to an org.modeshape.jcr.api.RepositoryFactory instance. If successful, you can call the "shutdown()" method that returns a Future<Boolean> like the ModeShapeEngine's shutdown() method.

ModeShape and JBoss Wildfly

If you're building a web application or service (using any Java web or EE technology) and deploying to JBoss Wildfly, then the easiest way to set up ModeShape is to install it as a subsystem within Wildfly. Then you can use the Wildfly administrative tools (including the CLI) to dynamically configure one or more repositories, and ModeShape registers them in JNDI where your applications can simply look them up and start using them.

See our detailed instructions for installing and working with ModeShape in JBoss Wildfly. Also, we have a separate repository with Wildfly-specific examples.