JBoss.org Community Documentation

Chapter 1. Introduction to ModeShape

1.1. Use cases for ModeShape
1.2. What is metadata?
1.3. What is JCR?
1.4. Project roadmap
1.5. ModeShape modules
1.6. What's new?

ModeShape is a JCR implementation that provides access to content stored in many different kinds of systems. A ModeShape repository isn't yet another silo of isolated information, but rather it's a JCR view of the information you already have in your environment: file systems, databases, other repositories, services, applications, etc.

To your applications, ModeShape looks and behaves like a regular JCR repository. Using the standard JCR 2.0 API (a.k.a. JSR-283), applications can search, navigate, version, and listen for changes in the content. But under the covers, ModeShape gets its content by federating multiple back-end systems (like databases, services, other repositories, etc.), allowing those systems to continue "owning" the information while ensuring the unified repository stays up-to-date and in sync.

Of course, when you start providing a unified view of all this information, you start recognizing the need to store more information, including metadata about and relationships between the existing content. ModeShape lets you do this, too. And ModeShape even tries to help you discover more about the information you already have, especially the information wrapped up in the kinds of files often found in enterprise systems: service definitions, policy files, images, media, documents, presentations, application components, reusable libraries, configuration files, application installations, database schemas, management scripts, and so on. As files are loaded into the repository, you can make ModeShape automatically sequence these files to extract from their content meaningful information that can be stored in the repository, where it can then be searched, accessed, and analyzed using the JCR API.

This document goes into detail about how ModeShape works to provide these capabilities. It also talks in detail about many of the parts within ModeShape - what they do, how they work, and how you can extend or customize the behavior. In particular, you'll learn about ModeShape connectors and sequencers, how you can use the implementations included in ModeShape, and how you can write your own to tailor ModeShape for your needs.

So whether you are a developer on the project or you're trying to learn the intricate details of how ModeShape works, this document hopefully serves as a good reference.

1.1. Use cases for ModeShape

ModeShape repositories can be used in a variety of applications. One of the more obvious use cases for a metadata repository is in provisioning and management, where it's critical to understand and keep track of the metadata for models, databases, services, components, applications, clusters, machines, and other systems used in an enterprise. Governance takes that a step further, by also tracking the policies and expectations against which performance of the systems described by the repository can be verified. In these cases, a repository is an excellent mechanism for managing this complex and highly-varied information.

But these large and complex use cases aren't the only way to use a ModeShape repository. You could use an embedded ModeShape repository to manage configuration information for an application, or you could use ModeShape just to provide a JCR interface on top of a few non-JCR systems.

The point is that ModeShape can be used in many different ways, ranging from the very tiny embedded repository to a large and distributed enterprise-grade repository. The choice is yours.

1.2. What is metadata?

Before we dive into more detail about ModeShape and metadata repositories, it's probably useful to explain what we mean by the term "metadata." Simply put, metadata is the information you need to manage something. For example, it's the information needed to configure an operating system, or the description of the information in an LDAP tree, or the topology of your network. It's the configuration of an application server or enterprise service bus. It's the steps involved in validating an application before it can go into production. It's the description of your database schemas, or of your services, or of the messages going in and coming out of a service. ModeShape is designed to be a repository for all this (and more).

There are a couple of important things to understand about metadata. First, many systems manage (and frequently change) their own metadata and information. Databases, applications, file systems, source code management systems, services, content management systems, and even other repositories are just a few types of systems that do this. We can't pull the information out and duplicate it, because then we risk having multiple copies that are out-of-sync. Ideally, we could access all of this information through a homogeneous API that also provides navigation, caching, versioning, search, and notification of changes. That would make our lives significantly easier.

What we want is federation. We can connect to these back-end systems to dynamically access the content and project it into a single, unified repository. We can cache it for faster access, as long as the cache can be invalidated based upon time or event. But we also need to maintain a clear picture of where all the bits come from, so users can be sure they're looking at the right information. And we need to make it as easy as possible to write new connectors, since there are a lot of systems out there that have information we want to federate.
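
The federation idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not ModeShape's connector API: the `FederatedView` class, its `Source` interface, and the prefix-based mounting scheme are hypothetical names invented for this example. The sketch projects back-end sources under path prefixes, caches lookups until a time-to-live expires, and records which source each value came from.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of federation; none of these types are ModeShape APIs. */
class FederatedView {

    /** A back-end system that continues to "own" its content. */
    interface Source {
        String read(String relativePath); // returns null when nothing is found
    }

    /** A value plus the mount prefix (origin) it was read from. */
    static final class Entry {
        final String value;
        final String origin;
        final long loadedAt;

        Entry(String value, String origin, long loadedAt) {
            this.value = value;
            this.origin = origin;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Source> mounts = new LinkedHashMap<>();
    private final Map<String, Entry> cache = new LinkedHashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clock;

    FederatedView(long timeToLiveMillis, LongSupplier clock) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    /** Projects a source into the unified view under a prefix such as "/db". */
    void mount(String prefix, Source source) {
        mounts.put(prefix, source);
    }

    /** Looks up a path, serving cached entries until they expire. */
    Entry get(String path) {
        long now = clock.getAsLong();
        Entry cached = cache.get(path);
        if (cached != null && now - cached.loadedAt < timeToLiveMillis) {
            return cached;
        }
        for (Map.Entry<String, Source> mount : mounts.entrySet()) {
            String prefix = mount.getKey();
            if (path.startsWith(prefix + "/")) {
                String value = mount.getValue().read(path.substring(prefix.length()));
                if (value == null) {
                    break;
                }
                Entry fresh = new Entry(value, prefix, now);
                cache.put(path, fresh);
                return fresh;
            }
        }
        cache.remove(path); // nothing found: drop any stale entry
        return null;
    }
}
```

A caller might mount a map-backed source with `view.mount("/db", someMap::get)` and then read `/db/customers/schema`. Passing the clock in as a `LongSupplier` keeps the time-based invalidation easy to exercise in tests; event-based invalidation, which the text also mentions, is left out of this sketch.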

The second important characteristic of the metadata is that a lot of it is represented as files, and there are a lot of different file formats. These include source code, configuration files, web pages, database schemas, XML schemas, service definitions, policies, documents, spreadsheets, presentations, images, audio files, workflow definitions, business rules, and on and on. And logically if files contain metadata, we want to add those files to our metadata repository. The problem is, all that metadata is tied up as blobs in the repository. Ideally, our repository would automatically extract from those files the content that's most useful to us, and place that content inside the repository where it can be much more easily used, searched, related, and analyzed. ModeShape does exactly this via a process we call sequencing, and it's an important part of a metadata repository.
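
To make the idea of sequencing more concrete, here is a minimal sketch using only the JDK's XML parser. It is not ModeShape's sequencer interface: the `XmlSequencingSketch` class and the path-style keys it produces are assumptions made for illustration. It takes an opaque XML blob and derives path/value entries of the kind a repository could store as nodes and properties.

```java
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of sequencing; this is not ModeShape's sequencer SPI. */
class XmlSequencingSketch {

    /** Extracts path -> value entries from the bytes of an XML file. */
    static Map<String, String> sequence(byte[] blob) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(blob))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Content is not well-formed XML", e);
        }
        Map<String, String> output = new LinkedHashMap<>();
        walk(root, "/" + root.getTagName(), output);
        return output;
    }

    // Note: same-name sibling elements overwrite each other in this simple sketch.
    private static void walk(Element element, String path, Map<String, String> output) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            output.put(path + "/@" + attribute.getNodeName(), attribute.getNodeValue());
        }
        NodeList children = element.getChildNodes();
        boolean hasChildElements = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElements = true;
                Element childElement = (Element) child;
                walk(childElement, path + "/" + childElement.getTagName(), output);
            }
        }
        if (!hasChildElements) {
            String text = element.getTextContent().trim();
            if (!text.isEmpty()) {
                output.put(path, text);
            }
        }
    }
}
```

Once the content is exposed as individual entries rather than a single blob, it can be searched and related to other content, which is the benefit the paragraph above describes.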

The third important characteristic of metadata is that it rarely stays the same. Different consumers of the information need to see different views of it. Metadata about two similar systems is not always the same. The metadata often needs to be tagged or annotated with additional information. And the things being described often change over time, meaning the metadata has to change, too. As a result, the way in which we store and manage the metadata has to be flexible and able to adapt to our ever-changing needs, and the object model we use to interact with the repository must accommodate these needs. The graph-based nature of the JCR API provides this flexibility while also giving us the ability to constrain information when it needs to be constrained.

1.3. What is JCR?

There are a lot of choices for how applications can store information persistently so that it can be accessed at a later time and by other processes. The challenge developers face is how to use an approach that most closely matches the needs of their application. This choice becomes more important as developers choose to focus their efforts on application-specific logic, delegating much of the responsibility for persistence to libraries and frameworks.

Perhaps one of the easiest techniques is to simply store information in files. The Java language makes working with files relatively easy, but Java really doesn't provide many bells and whistles. So using files is an easy choice when the information is either not complicated (for example, property files), or when users may need to read or change the information outside of the application (for example, log files or configuration files). But using files to persist information becomes more difficult as the information becomes more complex, as the volume of it increases, or if it needs to be accessed by multiple processes. For these situations, other techniques often have more benefits.
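
As a small illustration of the simple case, the following sketch stores and reloads settings with `java.util.Properties`. The class name, the keys a caller might use, and the temporary-file round trip are made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Illustrative only: persisting uncomplicated information in a property file. */
class PropertiesFileSketch {

    static void save(Path file, Properties settings) {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            settings.store(writer, "example settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Properties load(Path file) {
        Properties settings = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            settings.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    /** Writes the settings to a temporary file and reads them back. */
    static Properties roundTrip(Properties settings) {
        try {
            Path file = Files.createTempFile("settings", ".properties");
            try {
                save(file, settings);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting file is plain text that a user can edit by hand, but nothing here addresses concurrent access or structured, high-volume data, which is where the limitations mentioned above start to matter.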

Another technique built into the Java language is Java serialization, which is capable of persisting the state of an object graph so that it can be read back in at a later time. However, Java serialization can quickly become tricky if the classes are changed, and so it is usually beneficial only when the information is persisted for a very short period of time. For example, serialization is sometimes used to send an object graph from one process to another. Using serialization for longer-term storage of information is far less useful.
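
A short round trip shows what serialization looks like in practice. The `Machine` class below is a made-up example type. Its `serialVersionUID` field hints at why class changes are tricky: if the class evolves incompatibly and the identifier no longer matches, previously stored bytes fail to deserialize with an `InvalidClassException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: short-lived persistence of an object graph. */
class SerializationSketch {

    /** Hypothetical example type; the list makes it a small object graph. */
    static final class Machine implements Serializable {
        private static final long serialVersionUID = 1L;
        final String host;
        final List<String> services;

        Machine(String host, List<String> services) {
            this.host = host;
            this.services = new ArrayList<>(services);
        }
    }

    /** Serializes the object graph reachable from the given root. */
    static byte[] toBytes(Serializable graph) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back an object graph; only do this with bytes from a trusted source. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```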

One of the more popular and widely-used persistence technologies is the relational database. Relational database management systems have been around for decades and are very capable. The Java Database Connectivity (JDBC) API provides a standard interface for connecting to and interacting with relational databases. However, it is a low-level API that requires a lot of code to use correctly, and it still doesn't abstract away the DBMS-specific SQL grammar. Also, working with relational data in an object-oriented language can feel somewhat unnatural, so many developers map this data to classes that fit much more cleanly into their application. The problem is that manually creating this mapping layer requires a lot of repetitive and non-trivial JDBC code.

Object-relational mapping libraries automate the creation of this mapping layer and result in far less code that is much more maintainable with performance that is often as good as (if not better than) handwritten JDBC code. The Java Persistence API (JPA) provides a standard mechanism for defining the mappings (through annotations) and working with these entity objects. Several commercial and open-source libraries implement JPA, and some even offer additional capabilities and features that go beyond JPA. For example, Hibernate is one of the most feature-rich JPA implementations and offers object caching, statement caching, extra association mappings, and other features that help to improve performance and usefulness. Plus, Hibernate is open-source (with support offered by JBoss).

While relational databases and JPA are solutions that work well for many applications, they are more limited in cases when the information structure is highly flexible, the structure is not known a priori, or that structure is subject to frequent change and customization. In these situations, content repositories may offer a better choice for persistence. Content repositories offer the storage capabilities of relational databases with the flexibility offered by other systems, such as using files. Content repositories also typically provide other capabilities as well, including hierarchical organization, versioning, indexing, search, access control, transactions, and observation. Content repositories are often used by content management systems (CMS), document management systems (DMS), and other applications that manage electronic files (e.g., documents, images, multi-media, web content, etc.) and metadata associated with them (e.g., author, date, status, security information, etc.). The Content Repository for Java technology API provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed through the Java Community Process originally under JSR-170 (as "JCR 1.0"), but has since been revised and improved as "JCR 2.0" under JSR-283.

The JCR 2.0 API provides a number of information services that are needed by many applications, including: read and write access to information; the ability to structure information in a hierarchical and flexible manner that can adapt and evolve over time; ability to work with structured, semi-structured, and unstructured content; ability to (transparently) handle large strings; notifications of changes in the information; search and query; versioning of information; access control; integrity constraints; participation within distributed transactions; explicit locking of content; and of course persistence.

ModeShape implements the JCR 2.0 API, including many of the optional features.

1.4. Project roadmap


The ModeShape open source project uses its JIRA instance to track issues for tasks, requirements, bugs, and other activities. The roadmap report shows how each of these issues is targeted to the upcoming releases, while the change log report shows all of the issues that were fixed in each of the past releases.

By convention, the ModeShape project team periodically reviews JIRA issues that aren't targeted to a release, and then schedules them based upon current workload, severity, and the roadmap. And if we review an issue and don't know how to target it, we target it to the Future Releases bucket.

At the start of a release, the project team reviews the roadmap, identifies the goals for the release, and targets (or retargets) the issues appropriately.

1.5. ModeShape modules

ModeShape consists of quite a few separate modules. Just a few of these make up the essential core components of the system:

Several other modules are also essential, but for the most part are hidden from client applications as they provide components used within the JCR implementation:

  • modeshape-repository provides the core ModeShape graph engine and services for managing repository connections, sequencers, MIME type detectors, and observation. If you're using ModeShape repositories via our graph API rather than JCR, then this is where you'd start.

  • modeshape-cnd provides a self-contained utility for parsing CND (Compact Node Definition) files and transforming the node definitions into a graph notation compatible with ModeShape's JCR implementation.

  • modeshape-graph defines the Application Programming Interface (API) for ModeShape's low-level graph model, including a fluent-style API for working with graph content. This module also defines the APIs necessary to implement custom connectors, sequencers, and MIME type detectors.

  • modeshape-common is a small low-level library of common utilities and frameworks, including logging, progress monitoring, internationalization/localization, text translators, component management, and class loader factories.

Most of the ModeShape modules, however, are optional extensions. Many of these depend on third party libraries, so you will probably want to include only those modules that provide functionality you'll use in your repository. These modules are located in the source under the extensions/ directory.

  • modeshape-clustering contains ModeShape's clustering components and is needed only when two or more ModeShape engines are to be clustered together (so listeners in one session get notifications made from within any of the engines). ModeShape clustering uses the powerful, flexible and mature JGroups reliable multicast communication library. Simply enable clustering in ModeShape's configuration, include this library, and start your cluster. Engines can be dynamically added and removed from the cluster.

  • modeshape-connector-infinispan is the preferred ModeShape repository connector for persistently storing content. Infinispan is an extremely scalable, highly available data grid platform that distributes the data across the nodes in the grid. This connector makes it possible for repository content to be stored in a very efficient, fast, highly-concurrent (essentially lock- and synchronization-free), and reliable manner, even when the content size grows to massive sizes. This connector is capable of storing any kind of content, and dictates how the content is stored on the data grid. Therefore, this connector cannot be used to access the content of existing data grids created by/for other applications.

  • modeshape-connector-jbosscache is a ModeShape repository connector that stores content within a JBoss Cache instance. JBoss Cache is a powerful cache implementation that can serve as a distributed cache and that can persist information. The cache instance can be found via JNDI or created and managed by the connector. This connector is capable of storing any kind of content, and dictates how the content is stored in the cache. Therefore, this connector cannot be used to access the content of existing cache instances created by/for other applications.

  • modeshape-connector-jdbc-metadata is a ModeShape repository connector that provides read-only access to metadata and schema information from relational databases through a JDBC connection. This connector provides an optional and configurable caching facility to prevent frequent requests to the database.

  • modeshape-connector-store-jpa is a ModeShape repository connector that stores content in a JDBC database, using the Java Persistence API (JPA) and the very highly-regarded and widely-used Hibernate implementation. This connector is capable of storing any kind of content, and dictates the schema in which it stores the content. Therefore, this connector cannot be used to access the data in existing databases created by/for other applications.

  • modeshape-connector-jcr is a ModeShape repository connector that accesses and stores content in an external JCR 2.0 repository. This allows ModeShape to integrate with other JCR implementations and even federate multiple JCR repositories into a single unified repository. Any differences in namespaces are automatically handled, although node types used by the content in the external JCR repository must also be registered into the ModeShape repository using the connector. Note that this connector is currently a technical preview, and we're seeking feedback and assistance in identifying the required functionality.

  • modeshape-connector-filesystem is a ModeShape repository connector that accesses the files and folders on (a part of) the local file system, providing that content in the form of nt:file and nt:folder nodes. This connector does support updating the file system when changes are made to the nt:file and nt:folder nodes. However, this connector does not support storing other kinds of nodes.

  • modeshape-connector-svn is a ModeShape repository connector that accesses the content of an existing Subversion repository, providing that content in the form of nt:file and nt:folder nodes. This connector does support updating the SVN repository when changes are made to the nt:file and nt:folder nodes. However, this connector does not support storing other kinds of nodes.

  • modeshape-sequencer-cnd is a ModeShape sequencer that extracts JCR node definitions from JCR Compact Node Definition (CND) files.

  • modeshape-sequencer-ddl is a ModeShape sequencer that extracts the structure and content from DDL files. This is still under development and includes support for the basic DDL statements in the Oracle, PostgreSQL, Derby, and standard DDL dialects.

  • modeshape-sequencer-zip is a ModeShape sequencer that extracts the files (with content) and directories from ZIP archives.

  • modeshape-sequencer-xml is a ModeShape sequencer that extracts the structure and content from XML files.

  • modeshape-sequencer-classfile is a ModeShape sequencer that extracts the package, class/type, member, documentation, annotations, and other information from Java class files.

  • modeshape-sequencer-java is a ModeShape sequencer that extracts the package, class/type, member, documentation, annotations, and other information from Java source files.

  • modeshape-sequencer-jbpm-jpdl is a prototype ModeShape sequencer that extracts process definition metadata from jBPM process definition language (jPDL) files. This is still under development.

  • modeshape-sequencer-msoffice is a ModeShape sequencer that extracts metadata and summary information from Microsoft Office documents. For example, the sequencer extracts from a PowerPoint presentation the outline as well as thumbnails of each slide. Microsoft Word and Excel files are also supported.

  • modeshape-sequencer-images is a ModeShape sequencer that extracts the image metadata (e.g., size, date, etc.) from PNG, JPEG, GIF, BMP, PCX, IFF, RAS, PBM, PGM, and PPM image files.

  • modeshape-sequencer-mp3 is a ModeShape sequencer that extracts metadata (e.g., author, album name, etc.) from MP3 audio files.

  • modeshape-sequencer-teiid contains two sequencers. ModelSequencer extracts the structured data model contained within a Teiid relational XMI model, including the catalogs, schemas, tables, views, columns, primary keys, foreign keys, indexes, procedures, procedure parameters, procedure results, logical relationships, and the JDBC source from which the model was imported. Teiid VDB files contain several models, so the VdbSequencer extracts the virtual database metadata and the structured data model from each of the models contained within the VDB.

  • modeshape-sequencer-text is a ModeShape sequencer that extracts data from text streams. There are separate sequencers for character-delimited sequencing and fixed width sequencing, but both treat the incoming text stream as a series of rows separated by line-terminators with each row consisting of one or more columns.

  • modeshape-search-lucene is an implementation of the SearchEngine interface that uses the Lucene library. This module is one of the few extensions that is used directly by the modeshape-jcr module.

  • modeshape-mimetype-detector-aperture is a MimeTypeDetector implementation that uses the Aperture library to determine the best MIME type given the name and contents of a file.

  • modeshape-classloader-maven is a small library that provides a ClassLoaderFactory implementation that can create ClassLoader instances capable of loading classes given a Maven Repository and a list of Maven coordinates. The Maven Repository can be managed within a JCR repository.
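
The row-and-column model described for modeshape-sequencer-text in the list above can be illustrated with a small sketch. This is not the module's implementation or configuration. The `TextRowsSketch` class, its method names, and the column-offset convention are assumptions made for the example. Both methods treat the input as rows separated by line terminators. One splits each row on a delimiter and the other cuts each row at fixed character offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the modeshape-sequencer-text implementation. */
class TextRowsSketch {

    /** Character-delimited rows: each non-empty line is split on the delimiter. */
    static List<List<String>> delimited(String text, String delimiter) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            // A limit of -1 keeps trailing empty columns.
            rows.add(Arrays.asList(line.split(Pattern.quote(delimiter), -1)));
        }
        return rows;
    }

    /**
     * Fixed-width rows: columnStarts lists the offsets at which the second
     * and later columns begin; the first column always starts at offset 0.
     */
    static List<List<String>> fixedWidth(String text, int... columnStarts) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            List<String> columns = new ArrayList<>();
            int begin = 0;
            for (int start : columnStarts) {
                int end = Math.min(start, line.length());
                columns.add(line.substring(Math.min(begin, end), end).trim());
                begin = start;
            }
            columns.add(line.substring(Math.min(begin, line.length())).trim());
            rows.add(columns);
        }
        return rows;
    }
}
```

For example, `TextRowsSketch.delimited("a,b,c\n1,,3\n", ",")` yields two rows of three columns each, with an empty middle column in the second row. A real sequencer would then write each row and column into the repository as nodes and properties.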

The following modules make up the various web application projects (and are located in the source under the web/ directory). You may be able to use these artifacts "out of the box", but more likely the configuration defined in the WAR files will not be exactly what you want for your environment. In this case, you can replicate one of our "-war" modules and customize the configuration settings to easily assemble a custom WAR.

  • modeshape-web-jcr-webdav provides a WebDAV server for Java Content Repositories. This project provides integration with ModeShape's JCR implementation (of course) but also contains a service provider interface (SPI) that can be used to integrate other JCR implementations with these WebDAV services in the future. For ease of packaging, these classes are provided as a JAR that can be placed in the WEB-INF/lib of a deployed WebDAV server WAR.

  • modeshape-web-jcr-webdav-war wraps the WebDAV services from the modeshape-web-jcr-webdav JAR into a WAR and provides in-container integration tests. This project can be consulted as a template for how to deploy the WebDAV services in a custom implementation.

  • modeshape-web-jcr-rest provides a set of JSR-311 (JAX-RS) objects that form the basis of a RESTful server for Java Content Repositories. This project provides integration with ModeShape's JCR implementation (of course) but also contains a service provider interface (SPI) that can be used to integrate other JCR implementations with these RESTful services in the future. For ease of packaging, these classes are provided as a JAR that can be placed in the WEB-INF/lib of a deployed RESTful server WAR.

  • modeshape-web-jcr-rest-war wraps the RESTful services from the modeshape-web-jcr-rest JAR into a WAR and provides in-container integration tests. This project can be consulted as a template for how to deploy the RESTful services in a custom implementation.

  • modeshape-web-jcr-rest-client is a library that uses POJOs to access the REST web service. This module eliminates the need for applications to know how to create HTTP request URLs and payloads, and how to parse the JSON responses. It can be used to publish (upload) and unpublish (delete) files from ModeShape repositories.

  • modeshape-web-jcr provides a reusable library for web applications using JCR, and is used by the modeshape-web-jcr-rest and modeshape-web-jcr-webdav modules.

ModeShape recently added several modules that make it very easy to deploy ModeShape in JBoss AS or EAP as a full-fledged, central, shared service that can be monitored and administered using the embedded console and used directly by web applications deployed to the application server. Our Maven build produces a "kit" ZIP file that can be unzipped into a JBoss AS profile. When your server restarts, ModeShape will be running with a very simple configuration (although that can be easily changed).

The modules that make up the JBoss AS deployment kit are located in the source under the deploy/jbossas directory:

  • modeshape-jbossas-service provides several components that are deployed through the microcontainer in JBoss AS, registered in JNDI, and exposed through the Profile Service for monitoring and management. This service leverages the JAAS support within the application server.

  • modeshape-jbossas-console defines the plugin for RHQ/JOPR that enables administration, monitoring, alerting, operational control and configuration. All of the major components within a ModeShape engine are exposed as JOPR resources, and the plugin provides a number of metrics and administrative operations as well as exposing most configuration properties. (We plan to add more metrics and operations over the next few releases, as we gain more experience using the ModeShape JOPR plugin.)

  • modeshape-jbossas-web-rest-war defines a variant of the more general modeshape-web-jcr-rest-war that is tailored for deployment on JBoss AS, since it reuses the same ModeShape service deployed into the application server.

  • modeshape-jbossas-web-webdav-war defines a variant of the more general modeshape-web-jcr-webdav-war that is tailored for deployment on JBoss AS, since it reuses the same ModeShape service deployed into the application server.

There are also modules for ModeShape's documentation (located in the source under the docs/ directory):

  • docs-getting-started is the project with the DocBook source for the ModeShape Getting Started document.

  • docs-getting-started-examples is the project with the Java source for the example application used in the ModeShape Getting Started document.

  • docs-reference-guide is the project with the DocBook source for this document, the ModeShape Reference Guide document.

There are several utility modules:

  • modeshape-jpa-ddl-gen provides a standalone utility that can generate the DDL for the database schema used by the JPA connector. Because it uses Hibernate, it can generate DDL for any of the databases that the connector can use. This is also useful for users who prefer not to give DDL privileges to the ModeShape database user.

  • modeshape-jdbc provides a JDBC driver implementation that allows JDBC clients to query the contents of a JCR repository using JCR-SQL2. The driver even supports JDBC metadata, so client applications can dynamically discover the tables and columns available for querying (which are determined from the node types). It can be configured as a data source in JBoss AS, and can even leverage the ModeShape service, allowing JDBC-based access to the same repository content available via the JCR API, RESTful service, or WebDAV.

There is another module that runs the full suite of JCR TCK tests, and which at the moment still contains a few failures. This module is never needed in client applications.

  • modeshape-jcr-tck provides a separate testing project that executes all of the reference implementation's JCR TCK tests on a nightly basis to track implementation progress against the JCR 1.0 specification. This module will likely be retired when the ModeShape JCR implementation is complete, since modeshape-jcr and modeshape-integration-tests will be running the full suite of JCR TCK unit tests.

Another module provides system- and integration-level tests and is never needed in client applications:

  • modeshape-integration-tests provides a home for all of the integration tests that involve more components than just unit tests. Integration tests are often more complicated, take longer, and involve testing the integration and functionality of multiple components (whereas unit tests focus on testing a single class or component and may use stubs or mock objects to isolate the code being tested from other related components).

Finally, there is a Maven parent pom.xml file that aggregates all of the other projects, provides common defaults for Maven plugins and dependency versions used throughout the modules, and defines various asset files that help build the necessary Maven artifacts during a build.

Each of these modules is a Maven project with a group ID of org.modeshape. All of these projects correspond to artifacts in the JBoss Maven 2 Repository, the settings for which are described on the JBoss.org wiki.

1.6. What's new?

Although ModeShape 2.3.0.Final includes several improvements and minor features, this release is primarily a bug-fix release, with numerous fixes for issues reported against 2.1 and 2.2. For details, see the release notes.

ModeShape implements all of the required JCR 2.0 features: repository acquisition, authentication, reading/navigating, query, export, node type discovery, and permissions and capability checking. ModeShape also implements most of the optional JCR 2.0 features: writing, import, observation, workspace management, versioning, locking, node type management, same-name siblings, orderable child nodes, and shareable nodes. The remaining optional features (access control management, lifecycle management, retention and hold, and transactions) may be introduced in future versions.

Note

ModeShape 2.3.0.Final currently passes 1372 of the 1391 JCR TCK tests, where 17 of these 19 failures appear to be bugs in the TCK tests (see JCR-2648, JCR-2661, JCR-2662, and JCR-2663). The remaining 2 failures are due to a known issue (see MODE-760).