We've published the ModeShape artifacts and JARs for this beta release only in the JBoss Maven repository. The rest of this page shows how you can use ModeShape within your Maven-based projects. We've also added several distributions on our project's download page:
- a binary distribution with all the JARs, JavaDoc, and examples
- a kit to install ModeShape into a Wildfly installation
- a source distribution
So without further ado...
- Complete Maven examples
- Embedding ModeShape in an application or library built with Maven
- Add ModeShape Dependencies
- Using a transaction manager
- Configuring a ModeShape repository
- Configuring the Infinispan Cache
- Starting a ModeShape Repository
- Starting the ModeShape engine
- Deploying our Repository
- Using the Repository and the JCR API
- Stopping the repository and engine
- Using JCR's RepositoryFactory
- ModeShape and JBoss Wildfly
We have a number of self-contained examples that you can check out, build, and then modify to try different things. So if Git is your thing, the easiest way to get going with ModeShape 4.0.0 is to simply clone this repository and build the examples. For details, see our modeshape-examples repository on GitHub, and follow the instructions shown in the README file on that page.
If Git isn't your thing, then read on to learn how to build a JCR application that embeds ModeShape and how you can install ModeShape into Wildfly and use it from within your web applications and services.
If your Java SE application or library uses Maven, then embedding ModeShape is really very easy.
|The instructions on this page are for Java SE applications. If you're creating applications for deployment onto JBoss Wildfly, see the specific documentation about how to install ModeShape into JBoss Wildfly and use it with your web applications.|
All ModeShape releases since 3.0.0.Final are now available directly from the Maven Central repository. It takes a few hours (at least) after the artifacts are in the JBoss repository before they appear in Maven Central. So if you don't see a recent release in Maven Central, just give it a bit of time - or use the JBoss Maven repository.
The next step is to edit your application or library's POM file and add the dependencies to the JCR API and ModeShape. The easiest way to do that is to use one of our Maven BOMs that specifies the versions for all of the ModeShape components and all of its dependencies:
Then include in the POM "<dependencies>" section the ModeShape modules that you will directly use. Note that you don't need to specify any of the versions, since that's what the modeshape-bom-embedded provides. The one module that you need to include is the primary JCR implementation:
You should also include any other modules that you'll use directly, along with any optional modules that you want. For example, if you're going to use any of ModeShape's public API (instead of just the JCR API), then you should include this dependency:
If you want to use one of Infinispan's cache stores, then pick from ONE of the following:
|Adding multiple cache stores may be necessary if you're using multiple Infinispan caches, each with a different cache store. Adding a dependency on any cache stores that you're not using, however, simply brings in more unnecessary (transitive) dependencies and should be avoided.|
If you're going to use the JDBC Cache Store (e.g., "infinispan-cachestore-jdbc"), then you'll also need to add a dependency on the JDBC driver or embeddable database. For example, here's the dependency required to use the embeddable H2 database:
ModeShape is designed to use the same logging framework as your application, and it can dynamically bind to Log4J, SLF4J, Logback and the JDK's logging system. Your application or library will probably already be using one of these logging frameworks and will already have them in the dependencies.
If you're deploying ModeShape within a JavaSE application (or a non-JavaEE environment such as Tomcat), you will likely want to choose a transaction manager. (Infinispan has a simple one that is good enough for non-clustered testing, but probably not for production.)
We're somewhat partial to the JBoss Transaction Manager. It's solid and used in the popular JBoss Application Server and Red Hat Middleware platforms. And it's what we use in our own testing and examples.
Using it is easy, especially if you're using our embedded BOM (as we described above), because all you have to do is add a dependency in your POM on the JBoss Transaction Manager:
Note that you don't need to (but can) specify the version, since our BOM already defines the default version. The BOM also excludes a lot of the dependencies and components that are not necessary when running in a non-clustered environment.
By default, the Infinispan configuration will automatically look for and find the transaction manager.
If you want to use another transaction manager such as Atomikos or Spring Transaction Manager, simply add it as a normal dependency to your application, but be sure that it's one that Infinispan can automatically find. If not, then you'll have to provide an implementation of the org.infinispan.transaction.lookup.TransactionManagerLookup interface and specify it in your Infinispan configuration file:
The ModeShape engine is capable of running (or "deploying") multiple JCR repositories. However, each repository is configured separately and is completely independent from all other repositories. To configure a repository, you'll need a configuration file. Starting with ModeShape 3.0, these configuration files use the JSON format (which is a lot easier to read and create). Here is the minimum configuration file for a repository:
That's not a mistake. An empty JSON document is a completely valid repository configuration. Everything has a default value except for the repository's name, and the filename is used if one is not specified in the file. In this case, the name of this repository will be "my_repository".
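The naming rule can be pictured with a small sketch. It assumes the fallback name is simply the configuration file's name without its directory and extension, which matches the examples on this page but is not a statement of how ModeShape implements it. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RepositoryNameSketch {

    /**
     * Illustrative only: derives a fallback repository name from a configuration
     * file path by dropping the directories and the final extension.
     */
    public static String defaultNameFor(String configPath) {
        Path fileName = Paths.get(configPath).getFileName();
        if (fileName == null) {
            return ""; // a bare root path has no file name
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot).
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** A name given inside the file wins; otherwise fall back to the file name. */
    public static String effectiveName(String nameInFile, String configPath) {
        return (nameInFile != null && !nameInFile.isEmpty())
                ? nameInFile
                : defaultNameFor(configPath);
    }

    public static void main(String[] args) {
        System.out.println(effectiveName(null, "config/my_repository.json"));
        System.out.println(effectiveName("Test Repository", "config/my_repository.json"));
    }
}
```

With an empty configuration stored in a file named "my_repository.json", the first call yields "my_repository"; the second shows an explicit name taking precedence.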
Of course, lots of other options can be specified in the configuration file, but typically only the non-default values are specified. Since most of the defaults are sensible, many configurations will be pretty small.
Here's a configuration file that uses most of the available fields, most of which happen to be set to the same values as the defaults. (This time we'll show line numbers so we can more easily describe what's going on.)
This configuration defines:
- The name of the repository (on line 2) to be "Test Repository", which will take precedence over the name of the file.
- The repository will be registered in JNDI (if JNDI is available in the environment) with the name "jcr/Test Repository" (line 3). By default, the JNDI name will follow the pattern "jcr/<name>", where "<name>" is the repository name.
- The repository will periodically collect performance and statistical metrics in the background (line 5). This is enabled by default, but can be set to false to turn off the collection.
- The "defaultWorkspace" workspace (on line 8) is used by default when the client calls a Repository.login(...) method that doesn't have the workspace name as a parameter or if the client provides a null reference for the workspace name. If not specified, the default workspace for the repository will be named "default".
- One other workspace named "otherWorkspace" (line 9) will exist upon startup. By default, only the default workspace will exist.
- Clients can use the "Workspace.createWorkspace(...)" methods to create new workspaces (line 10). This is the default.
- The repository will look for an Infinispan configuration file at "/path/to/infinispan/cache/configuration.xml" (line 13) to create a new Infinispan CacheContainer instance. The value can be an (absolute or relative) path on the file system, the path to a resource on the (application, system, or thread-context) classloader, or the JNDI name where the CacheContainer instance can be found in JNDI. If no configuration file is found at any of these locations, a default Infinispan configuration (a basic, local mode, non-clustered, in-memory cache) will be used.
- The repository will use the Infinispan cache named "Test Repository" (line 14). If not specified, the repository's name is used.
- The repository will store all BINARY values equal to or larger than 4096 bytes (line 16) in the binary store that uses the file system (line 18). Smaller BINARY values are held in-memory or persisted with the node information. The default size is 4096 bytes, and the default type is "filesystem".
- The repository can also store all STRING values equal to or larger than a specified number of characters. In this case, all STRING values with 4096 or more characters (line 17) will be stored in the binary store that uses the file system (line 18). Smaller STRING values are held in-memory or persisted with the node information. By default, the maximumStringSize value will be set to the explicit or default value of maximumBinaryValueInBytes.
- The repository will use several security providers for authentication and authorization. By default, only the anonymous provider is used. The order of the providers is important: a caller will be authenticated or authorized if any of the providers succeed for the caller:
- The JAAS policy named "modeshape-jcr" will be used (lines 23-24). If the "jaas" nested document is not specified, JAAS will not be used. If specified in this fashion, the JAAS security provider will always be used first. The "modeshape-jcr" policy is used by default if JAAS is enabled.
- Any providers as configured by the "providers" nested array (lines 31-36), where each array value is a nested document specifying the provider's name, description, and type (or classname). Only the "type" (or "classname") field is required. The two built-in types are "jaas" and "servlet", but any implementation of the "org.modeshape.jcr.security.AuthorizationProvider" interface can be specified instead. Any instance members on the implementation class can be set by specifying additional fields of the same name, as long as the member type is String, a primitive boolean or number, java.util.Map, or java.util.List.
- The anonymous provider (lines 26-30) is enabled by default and (if enabled) always is the last provider to be consulted. It authenticates all users with read and write permission by default, although the exact roles (either "read", "readwrite", or "admin") can be configured with the "roles" field; specify an empty "roles" array to completely disable the anonymous provider. All sessions that are authenticated by this provider will be given the username given by the "username" field (line 30), which defaults to the literal "<anonymous>" value (including the angle brackets). Any user that fails to properly authenticate with another provider will not be given an anonymous session unless the "useOnFailedLogin" field is set to true.
- Since 4.0.0.Final, ModeShape supports defining indexes (lines 38-52), much as you would in an RDBMS, along with index providers, the mechanism by which index definitions are maintained. If you omit these two sections, queries will still work but may not perform as well, since ModeShape will have to crawl the entire repository each time to find the nodes. If you decide to define indexes, make sure both "indexes" and "indexProviders" are configured and that each index has a defined index provider. See https://docs.jboss.org/author/display/MODE40/Query+and+search#Queryandsearch-Usingindexes for a more thorough explanation.
- Text extractors (lines 54-61) are used to find the search terms from BINARY values. No text extractors are used by default, but specifying the name, description, and type (or classname) for one or more text extractor implementation classes enables this feature. Two text extractor types are provided out of the box, and both are configured here with the required "type" fields (e.g., "tika" and "vdb") and an optional description (useful for documentation and during administration).
- The configured sequencers (lines 62-74) specify the types of sequencers that should be run. Each sequencer is configured with one or more path expressions that are matched against the paths of changed nodes; when any changed path matches the expression, the sequencer is called on the changed property/node and the generated output of the sequencer invocation is written to the location specified in the path expression. Each sequencer is configured by specifying the required "type" field, and an optional name and description. Custom implementations of the "org.modeshape.jcr.api.sequencer.Sequencer" interface can be specified using the "classname" field (instead of the "type" field), and any instance members on the implementation class can be set by specifying additional fields of the same name, as long as the member type is String, a primitive boolean or number, java.util.Map, or java.util.List. Several types of sequencers are available out of the box:
- "cnd" parses JCR CND files to generate a node structure describing the namespaces, node types, property definitions, and child node definitions
- "class" and "java" parse Java class files and source files (respectively) and generate a node structure describing the encoded types, fields, methods, parameters, etc.
- "ddl" parses the more important DDL statements from SQL-92, Oracle, Derby, and PostgreSQL, and constructs a graph structure containing a structured representation of these statements. The resulting graph structure is largely the same for all dialects, though some dialects have non-standard additions to their grammar, and thus require dialect-specific additions to the graph structure.
- "image" extracts metadata from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, PBM, PGM, PPM and PSD image files. This sequencer extracts the file format, image resolution, number of bits per pixel and optionally number of images, comments and physical resolution.
- "model" parses the model files produced by the Teiid Designer to extract the structured relational data model described by the XMI file, and outputs a node structure that represents this model.
- "vdb" parses the VDB archive files produced by the Teiid Designer to extract the virtual database information and the structured relational data model described in each of the contained XMI model files, and outputs a node structure that represents the VDB and these models.
- "wsdl" parses WSDL files that adhere to the W3C's Web Service Definition Language (WSDL) 1.1 specification, and outputs a representation of the WSDL file's messages, port types, bindings, services, types (including embedded XML Schemas), documentation, and extension elements (including HTTP, SOAP and MIME bindings). This derived information is intended to mirror the structure and semantics of the actual WSDL files while also making it possible for ModeShape users to easily navigate, query and search over this derived information. This sequencer captures the namespace and names of all referenced components, and will resolve references to components appearing within the same file.
- "xsd" parses XML Schema Documents that adhere to the W3C's XML Schema Part 1 and Part 2 specifications, and outputs a representation of the XSD's attribute declarations, element declarations, simple type definitions, complex type definitions, import statements, include statements, attribute group declarations, annotations, other components, and even attributes with a non-schema namespace. This derived information is intended to accurately reflect the structure and semantics of the XSD files while also making it possible for ModeShape users to easily navigate, query and search over this derived information. This sequencer captures the namespace and names of all referenced components, and will resolve references to components appearing within the same file.
- "xml" parses XML files and extracts the element, attribute, namespace, DTD, entity, comments and other information in the file, producing a node structure representative of this information.
- "zip" extracts the files and folders contained in the ZIP archive file, extracting the files and folders into the repository using JCR's nt:file and nt:folder built-in node types. The structure of the output thus matches the logical structure of the contents of the ZIP file. Note that the resulting files may then be sequenced.
- "mp3" processes MP3 audio files added to a repository and extracts the ID3 metadata for the file, including the track's title, author, album name, year, and comment, and then writes a node structure representing this information.
- "fixedwidth" extracts rows and fixed-width columns from text streams and generates a node structure representative of the rows and column values in each row.
- "delimited" extracts rows and delimited columns from text streams and generates a node structure representative of the rows and column values in each row.
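The size thresholds described earlier in this list for BINARY and STRING values can be sketched in a few lines of Java. This is only an illustration of the rule as stated (values at or above the threshold go to the binary store, and the string threshold falls back to the binary threshold when it is not set). The class and method names are hypothetical and are not part of the ModeShape API.

```java
public class BinaryThresholdSketch {

    /** Default threshold stated in the text: 4096 bytes. */
    public static final long DEFAULT_MIN_BINARY_SIZE = 4096L;

    /** Values at or above the threshold are written to the binary store. */
    public static boolean storedInBinaryStore(long size, long threshold) {
        return size >= threshold;
    }

    /**
     * Resolves the threshold used for STRING values.
     * A null argument stands for "not configured" in this sketch.
     */
    public static long effectiveStringThreshold(Long configuredStringSize,
                                                Long configuredBinarySize) {
        long binary = configuredBinarySize != null
                ? configuredBinarySize
                : DEFAULT_MIN_BINARY_SIZE;
        // The string threshold falls back to the explicit or default binary threshold.
        return configuredStringSize != null ? configuredStringSize : binary;
    }

    public static void main(String[] args) {
        System.out.println(storedInBinaryStore(4096, DEFAULT_MIN_BINARY_SIZE)); // at the threshold
        System.out.println(storedInBinaryStore(4095, DEFAULT_MIN_BINARY_SIZE)); // just below it
        System.out.println(effectiveStringThreshold(null, 8192L));
    }
}
```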
As noted in the previous section, the repository configuration can specify the configuration file for the Infinispan CacheContainer (see line 13 in the previous example). If no configuration or existing CacheContainer instance can be found, a basic Infinispan configuration (a basic, local mode, non-clustered in-memory cache) will be used.
The rest of this section describes some basic ways to configure Infinispan. However, please see the Infinispan documentation for much more detailed information about how to properly configure Infinispan and its cache loaders using its XML configuration file format.
As with ModeShape, Infinispan's minimal configuration is a (basically) empty file:
This default configuration will result in a basic, local mode (not replicated or distributed), non-clustered, in-memory cache. While this cache will make the ModeShape repository exceedingly fast, it's not the most practical. So more often than not, you'll want to configure Infinispan to persist information.
One of the reasons Infinispan is so fast is because it keeps an in-memory cache of the information (node content in ModeShape's case) most recently used. If all of the information can be kept in memory, then retrieving and/or updating the information is extremely fast. However, keeping all the information in-memory is not always a good idea, and Infinispan addresses this in several ways.
The most powerful way is to form a cluster of Infinispan caches so that Infinispan can distribute multiple copies of each piece of information across the different members of the cluster. Normally there are many more machines than there are copies, so the effective storage capacity is many, many times the capacity of a single machine. Doing this forms a data grid, and Infinispan can always calculate on which processes in the grid a piece of information is stored. And, because each piece of information is stored in multiple locations on the grid, the information kept in memory is safe even if some of the grid fails.
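To make the idea of "calculating" where a piece of information lives more concrete, here is a deliberately simplified sketch. It uses a plain hash-and-walk placement over a fixed member list, which is not Infinispan's actual consistent-hash implementation. The class name, member names, and key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OwnerPlacementSketch {

    /**
     * Picks up to numOwners distinct members for a key, starting at a position
     * derived from the key's hash and walking the member list as a ring.
     * The same key and member list always produce the same owners.
     */
    public static List<String> ownersOf(String key, List<String> members, int numOwners) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("at least one member is required");
        }
        int copies = Math.min(numOwners, members.size());
        int start = Math.floorMod(key.hashCode(), members.size());
        List<String> owners = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++) {
            owners.add(members.get((start + i) % members.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> members =
                Arrays.asList("node-1", "node-2", "node-3", "node-4", "node-5");
        // Two copies of each entry are spread over five members.
        System.out.println(ownersOf("/files/report.pdf", members, 2));
    }
}
```

Because every process can run the same calculation, any of them can find the owners of a key without asking a central directory. A real data grid also has to handle members joining and leaving, which this sketch ignores.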
An alternative is to use a cluster but to replicate every piece of information across all the processes in the cluster. The size of these clusters is typically much smaller than a data grid, since for durability only a handful of copies are needed. And because every process in the cluster contains all the data, this too is extremely fast, though it can't scale to the capacity of a data grid.
Keeping information in memory is fast, but sometimes it's desirable to also persist the information somewhere. Perhaps all of the information is to be persisted, or perhaps only that which can't be kept in memory is to be persisted. Either way, Infinispan's cache loaders provide a way for Infinispan to write out the information to an external store. The cache loaders that can persist information are also called cache stores.
The cache loader system also means that we can use Infinispan even when we don't have a cluster where Infinispan can replicate or distribute the information. In other words, we can configure an Infinispan cache store when we're running ModeShape as a single process, and we're still able to persist the information. Even in this mode, Infinispan will still act as a cache by keeping the most recently used items in-memory.
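The interplay between an in-memory cache and a cache store can be sketched with plain Java collections. This is a conceptual illustration only, assuming a write-through policy and least-recently-used eviction. It does not use or mirror the Infinispan API, and a java.util.Map stands in for the persistent store.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WriteThroughCacheSketch<K, V> {

    private final Map<K, V> store;            // stands in for a persistent cache store
    private final LinkedHashMap<K, V> memory; // bounded, access-ordered in-memory entries

    public WriteThroughCacheSketch(final int maxInMemory, Map<K, V> store) {
        this.store = store;
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxInMemory;
            }
        };
    }

    public void put(K key, V value) {
        store.put(key, value);  // persist first, so eviction never loses data
        memory.put(key, value);
    }

    public V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = store.get(key); // cache miss: load from the store
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    public boolean isInMemory(K key) {
        return memory.containsKey(key);
    }

    public static void main(String[] args) {
        WriteThroughCacheSketch<String, String> cache =
                new WriteThroughCacheSketch<>(2, new HashMap<String, String>());
        cache.put("/a", "node a");
        cache.put("/b", "node b");
        cache.put("/c", "node c");
        System.out.println(cache.isInMemory("/a")); // "/a" was evicted from memory
        System.out.println(cache.get("/a"));        // but is still loaded from the store
    }
}
```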
Here is a simple configuration file for Infinispan that defines a single cache named "Test Repository" that stores its contents in a LevelDB database stored on the file system at "/target/content/leveldb":
When a ModeShape repository is configured to use this Infinispan cache, all the repository contents will be persisted to disk (either in the binary store or in the Infinispan cache). Thus, the repository can be shut down and restarted without loss of any information.
Of course other cache stores are available. See the Infinispan documentation for the details of how to properly configure cache stores for your environment and needs.
- org.infinispan.persistence.file.SingleFileStore - A simple loader that stores information on the file system. This has severe limitations but is a simple cache loader for testing purposes. Note that it is not transactional, and it should not be used on NFS or Windows shares that do not properly implement file locking.
- org.infinispan.persistence.leveldb.LevelDBStore - A cache loader that stores files in a LevelDB database.
- org.infinispan.persistence.jdbc.stringbased.JdbcStringBasedStore - A JDBC-based cache loader that stores each ModeShape node in a separate row in a simple 4-column table. This isn't as fast as some other cache loaders, but works very well when the repository content needs to be stored in a relational database. See the Infinispan documentation for details on configuring the JDBC store.
- org.infinispan.loaders.cloud.CloudCacheStore - A cache loader that stores repository content in Amazon S3, Rackspace Cloudfiles, or any other provider supported by JClouds.
- org.infinispan.loaders.cassandra.CassandraCacheStore - A cache loader that can store repository content in an Apache Cassandra database. See the Infinispan documentation for the details on this cache loader.
Now that we have a configuration for a ModeShape repository and a configuration for our Infinispan cache, we can start writing the code to start up ModeShape, deploy our repository, and start using JCR.
The first step is to instantiate and start the ModeShape engine. As we mentioned earlier, the ModeShape 3 engine has no configuration, so this is almost trivial:
This uses the org.modeshape.jcr.ModeShapeEngine class' no-argument constructor, and then calls start(), which will block until the engine is running. Since the engine is extremely lightweight, this returns almost immediately.
At this point we have a running ModeShape engine, but it doesn't contain any repositories. That's next.
In order to deploy a repository to our running engine, we need to read in the repository's configuration. This is easily done with one of the org.modeshape.jcr.RepositoryConfiguration.read(...) static methods, which can read a java.io.File, a java.io.InputStream, the java.net.URL to the file, or a String with either the path to the resource file on the classpath or the JSON string itself. In this example, we'll read the file from the classpath:
Here, the name of the repository will either be defined in the file, or will be "my-repository-config" due to the name of the file being read. Of course, we can also optionally change the name programmatically:
Once we've read in the configuration, we can validate it to ensure it was constructed correctly. If not, we'll print out the problems (which will have the line number and description for each error) and simply exit, although you probably want to do something more useful.
Any errors at this point will absolutely prevent deploying a repository, and they need to be dealt with. That's why the above sample code exits the process if there are errors. However, not everything in the configuration can be validated at this time. For example, references to CND files or initial content files can only be dereferenced within a running environment, something which the RepositoryConfiguration does not have on its own.
So after we determine the configuration has no errors, the next step is to deploy it to our engine:
If there are any catastrophic problems, the repository will not successfully deploy and the above method will throw an exception. If the repository does successfully deploy, then the repository will be in a running state.
Starting with ModeShape 3.6, the repository will record warnings and errors that do not prevent deployment but which otherwise may be significant problems:
Again, your application should handle such errors more gracefully than the sample code above.
After this, at any time we could shut down the repository and/or we could remove it from the engine. But let's continue by getting a JCR Session.
Once a repository has been deployed to an engine (and is running), we can simply look up the repository by name:
And at this point, we can use the standard JCR API to obtain a Session and start using the repository:
When we're finished with the engine, we can shut it down to stop all repositories, terminate any ongoing background operations (such as sequencing), and reclaim any resources that were acquired by this engine. Since this might take a little time, the "shutdown()" method immediately returns a java.util.concurrent.Future that you can use to wait until the shutdown process has completed. Of course, if you don't want to block while the engine shuts down, there's no need to call "get()" on the future.
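The calling pattern for a shutdown method that returns a Future can be sketched without any ModeShape classes. In the following illustration, FakeEngine is a hypothetical stand-in whose shutdown() does its work on a background thread. Only the use of java.util.concurrent.Future reflects what the text describes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ShutdownFutureSketch {

    /** Hypothetical stand-in for an engine whose shutdown() returns a Future. */
    public static class FakeEngine {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private volatile boolean running = true;

        public boolean isRunning() {
            return running;
        }

        public Future<Boolean> shutdown() {
            Future<Boolean> result = worker.submit(() -> {
                Thread.sleep(50); // pretend to stop repositories and release resources
                running = false;
                return Boolean.TRUE;
            });
            worker.shutdown(); // accept no new work; the submitted task still runs
            return result;
        }
    }

    /** Waits for completion with an upper bound; returns false on timeout or failure. */
    public static boolean awaitShutdown(Future<Boolean> future, long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine();
        Future<Boolean> future = engine.shutdown(); // returns immediately
        // Block only if we care about completion.
        System.out.println("stopped: " + awaitShutdown(future, 5000));
    }
}
```

Waiting with a timeout rather than an unbounded get() is a design choice of this sketch; it keeps an application from hanging indefinitely if shutdown stalls.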
This entire section showed how to use ModeShape to start an engine, deploy a repository, obtain the repository, create a Session, and then shut down the repository and the engine. This required the use of ModeShape-specific classes, which isn't always desirable. In the next section, we'll see how this same process can be done while only using the standard JCR API.
The JCR 2.0 specification introduced the javax.jcr.RepositoryFactory interface that can be used with the Java SE service provider mechanism (for example, java.util.ServiceLoader) to find a Repository instance without using any implementation-specific APIs. The basic process is as follows:
Note how simple this is, while under the covers it is doing exactly the same process we described above.
Here, the parameters contain implementation-specific properties, but your application can easily read them from a file to keep all implementation-specific details out of your application code.
ModeShape requires one parameter:
where the value of the property is the URL that can be resolved to the JSON configuration file. Other URLs might be to a file on the file system using an absolute path (e.g., "file:///abs/path/to/my_repository.json") or even a URL to a web server (or governance repository!) and the configuration file (e.g., "http://www.example.com/repos/my_repository.json").
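One way to keep the implementation-specific parameters out of application code is to load them from a properties file into a map. The sketch below uses only java.util.Properties. The parameter key shown ("org.modeshape.jcr.URL") is an assumption that should be checked against the ModeShape documentation, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class FactoryParametersSketch {

    /** Reads key=value pairs and returns them as a map of factory parameters. */
    public static Map<String, String> loadParameters(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> parameters = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            parameters.put(name, props.getProperty(name));
        }
        return parameters;
    }

    public static void main(String[] args) {
        // A real application would read a file or classpath resource instead.
        // The key below is assumed, not confirmed by this page.
        String text = "org.modeshape.jcr.URL=file:///abs/path/to/my_repository.json\n";
        Map<String, String> parameters = loadParameters(new StringReader(text));
        System.out.println(parameters);
    }
}
```

The resulting map is the kind of object that would be handed to each discovered RepositoryFactory, so switching JCR implementations only requires changing the properties file.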
At this point using ModeShape just requires using the standard JCR API.
Oh, and if you want to shut down the ModeShape engine, you can (try to) cast the javax.jcr.RepositoryFactory instance to an org.modeshape.jcr.api.RepositoryFactory instance. If successful, you can call the "shutdown()" method that returns a Future<Boolean> like the ModeShapeEngine's shutdown() method.
If you're building a web application or service (using any Java web or EE technology) and deploying to JBoss Wildfly, then the easiest way to set up ModeShape is to install it as a subsystem within Wildfly. Then you can use the Wildfly administrative tools (including the CLI) to dynamically configure one or more repositories, and ModeShape registers them in JNDI where your applications can simply look them up and start using them.
See our detailed instructions for installing and working within ModeShape and JBoss Wildfly. Also, we have a separate repository with Wildfly-specific examples.