JBoss.org Community Documentation

ModeShape

Reference Guide


Target audience
1. Introduction to ModeShape
1.1. Use cases for ModeShape
1.2. What is metadata?
1.3. What is JCR?
1.4. Project roadmap
1.5. ModeShape modules
1.6. Compiling and building
1.7. What's new?
I. ModeShape Core
2. Execution Context
2.1. Security
2.1.1. JAAS
2.1.2. Web application security
2.2. Namespace Registry
2.3. Class Loaders
2.4. MIME Type Detectors
2.5. Text Extractors
2.6. Property factory and value factories
2.7. Summary
3. Graph Model
3.1. Names
3.2. Paths
3.3. Properties
3.4. Values and Value Factories
3.5. Readable, TextEncoder, and TextDecoder
3.6. Locations
3.7. Graph API
3.7.1. Using Workspaces
3.7.2. Working with Nodes
3.8. Requests
3.9. Request processors
3.10. Observation
3.10.1. Observable
3.10.2. Observers
3.10.3. Changes
3.11. Summary
4. Connector Framework
4.1. Connectors
4.2. Out-of-the-box connectors
4.3. Writing custom connectors
4.3.1. Creating the Maven 3 project
4.3.2. Implementing a RepositorySource
4.3.3. Implementing a RepositoryConnection
4.3.4. Testing custom connectors
4.4. Summary
5. Sequencing framework
5.1. Sequencers
5.2. Stream Sequencers
5.3. Path Expressions
5.4. Out-of-the-box Sequencers
5.5. Creating Custom Sequencers
5.5.1. Creating the Maven 3 project
5.5.2. Testing custom sequencers
5.6. Summary
II. ModeShape JCR
6. Configuration
6.1. Configuring ModeShape
6.1.1. Configuration Files
6.1.2. Programmatic Configuration
6.1.3. Loading from a Configuration Repository
6.2. JCR Repository options
6.3. Repository system content
6.4. Query index directory
6.5. Clustering
6.5.1. Enabling Clustering in ModeShape
6.5.2. JGroups configuration
6.6. Using ModeShape in Web Applications
6.6.1. Deploying ModeShape to JBoss AS
6.6.2. Deploying ModeShape to Tomcat
6.7. Setting the Classpath
6.7.1. Building against ModeShape via Maven
6.7.2. Add dependencies for logging
6.7.3. Building against ModeShape via JARs
6.8. What's next
7. Using the JCR API with ModeShape
7.1. What's new in JCR 2.0?
7.1.1. Connecting
7.1.2. Identifiers
7.1.3. Binary Values
7.1.4. Node Type Management
7.1.5. Queries
7.1.6. Workspace Management
7.1.7. Observation
7.1.8. Locking
7.1.9. Versioning
7.1.10. Importing and Exporting
7.1.11. Shareable Nodes
7.1.12. Orderable Child Nodes
7.1.13. Paths
7.1.14. getItem(String)
7.2. Obtaining a JCR Repository
7.2.1. Configuration File URLs
7.2.2. Using JNDI URLs
7.2.3. Cleaning Up after JcrRepositoryFactory
7.3. ModeShape's JcrEngine
7.4. Creating JCR Sessions
7.4.1. Using JAAS
7.4.2. Using HTTP Servlet security
7.4.3. Guest (Anonymous) User Access
7.4.4. Using Custom Security
7.5. JCR Specification Support
7.5.1. Required features
7.5.2. Optional features
7.5.3. TCK Compatibility features
7.5.4. JCR Security
7.5.5. Built-In Node Types
7.5.6. Custom Node Type Registration
7.6. Summary
8. Querying and Searching using JCR
8.1. JCR Query API
8.2. JCR XPath Query Language
8.2.1. Column Specifiers
8.2.2. Type Constraints
8.2.3. Property Constraints
8.2.4. Path Constraints
8.2.5. Ordering Specifiers
8.2.6. Miscellaneous
8.3. JCR-SQL Query Language
8.3.1. Queries
8.4. JCR-SQL2 Query Language
8.4.1. Queries
8.4.2. Sources
8.4.3. Joins
8.4.4. Equi-Join Conditions
8.4.5. Same-Node Join Conditions
8.4.6. Child-Node Join Conditions
8.4.7. Descendant-Node Join Conditions
8.4.8. Constraints
8.4.9. And Constraints
8.4.10. Or Constraints
8.4.11. Not Constraints
8.4.12. Comparison Constraints
8.4.13. Between Constraints
8.4.14. Property Existence Constraints
8.4.15. Set Constraints
8.4.16. Full-text Search Constraints
8.4.17. Same-Node Constraint
8.4.18. Child-Node Constraints
8.4.19. Descendant-Node Constraints
8.4.20. Paths and Names
8.4.21. Static Operands
8.4.22. Bind Variables
8.4.23. Subqueries
8.4.24. Dynamic Operands
8.4.25. Ordering
8.4.26. Columns
8.4.27. Limit and Offset
8.4.28. Pseudo-columns
8.4.29. Example JCR-SQL2 queries
8.5. Full-Text Search Language
8.5.1. Full-text Search Language
8.6. JCR Query Object Model (JCR-QOM) API
9. Accessing ModeShape Remotely
9.1. The ModeShape WebDAV Server
9.1.1. Configuring the ModeShape WebDAV Server
9.1.2. Deploying the ModeShape WebDAV Server
9.2. The ModeShape REST Server
9.2.1. Supported Resources and Methods
9.2.2. Configuring the ModeShape REST Server
9.2.3. Deploying the ModeShape REST Server
9.2.4. ModeShape REST Client API
9.3. Repository Providers
9.4. Summary
III. Connector Library
10. In-Memory Connector
11. File System Connector
12. JPA Connector
12.1. Simple Model
13. JCR Connector
14. Federation Connector
14.1. Projections
14.2. Multiple Projections
14.3. Processing flow
14.4. Update operations
14.5. Configuration
14.6. Repository Source properties
15. Subversion Connector
16. JBoss Cache Connector
17. Infinispan Connector
17.1. Considerations for Distributed Sources
18. JDBC Metadata Connector
IV. Sequencer Library
19. Compact Node Type (CND) Sequencer
19.1. Example
20. XML Document Sequencer
20.1. Example
21. XML Schema Document (XSD) Sequencer
21.1. Example
21.2. Node Types
21.3. Configuration
22. Web Service Definition Language (WSDL) 1.1 Sequencer
22.1. Example
22.2. Node Types
22.3. Configuration
23. ZIP File Sequencer
23.1. Example
24. Microsoft Office Document Sequencer
24.1. Example
25. Java Source File Sequencer
26. Java Class File Sequencer
27. Image Sequencer
28. MP3 Sequencer
28.1. Example
29. DDL File Sequencer
29.1. Example
30. Text Sequencers
30.1. Delimited Text Sequencer
30.2. Fixed Width Text Sequencer
31. Teiid Relational Model Sequencer
31.1. UUIDs
31.2. Node Types
31.2.1. XMI Namespace
31.2.2. Core Namespace
31.2.3. Relational Namespace
31.2.4. JDBC Source Namespace
31.2.5. Transformation Namespace
31.3. Default values
31.4. Annotations
31.5. Tags
31.6. Transformation
31.7. Configuration
31.8. Example
32. Teiid VDB Sequencer
32.1. UUIDs and References
32.2. Node Types
32.2.1. VDB Namespace
32.3. Configuration
32.4. Example
V. MIME Type Detector Library
33. Aperture MIME type detector
34. Writing custom detectors
VI. Text Extractor Library
35. Teiid text extractor
36. Tika text extractor
37. Writing custom text extractors
38. Looking to the future

ModeShape is a JCR implementation that provides access to content stored in many different kinds of systems. A ModeShape repository isn't yet another silo of isolated information, but rather it's a JCR view of the information you already have in your environment: file systems, databases, other repositories, services, applications, etc.

To your applications, ModeShape looks and behaves like a regular JCR repository. Using the standard JCR 2.0 API (a.k.a. JSR-283), applications can search, navigate, version, and listen for changes in the content. But under the covers, ModeShape gets its content by federating multiple back-end systems (like databases, services, other repositories, etc.), allowing those systems to continue "owning" the information while ensuring the unified repository stays up-to-date and in sync.

Of course, when you start providing a unified view of all this information, you start recognizing the need to store more information, including metadata about and relationships between the existing content. ModeShape lets you do this, too. And ModeShape even tries to help you discover more about the information you already have, especially the information wrapped up in the kinds of files often found in enterprise systems: service definitions, policy files, images, media, documents, presentations, application components, reusable libraries, configuration files, application installations, database schemas, management scripts, and so on. As files are loaded into the repository, you can make ModeShape automatically sequence these files to extract from their content meaningful information that can be stored in the repository, where it can then be searched, accessed, and analyzed using the JCR API.

This document goes into detail about how ModeShape works to provide these capabilities. It also talks in detail about many of the parts within ModeShape - what they do, how they work, and how you can extend or customize the behavior. In particular, you'll learn about ModeShape connectors and sequencers, how you can use the implementations included in ModeShape, and how you can write your own to tailor ModeShape for your needs.

So whether you are a developer on the project or you're simply trying to learn the intricate details of how ModeShape works, this document hopefully serves as a good reference.

Before we dive into more detail about ModeShape and metadata repositories, it's probably useful to explain what we mean by the term "metadata." Simply put, metadata is the information you need to manage something. For example, it's the information needed to configure an operating system, or the description of the information in an LDAP tree, or the topology of your network. It's the configuration of an application server or enterprise service bus. It's the steps involved in validating an application before it can go into production. It's the description of your database schemas, or of your services, or of the messages going in and coming out of a service. ModeShape is designed to be a repository for all this (and more).

There are a couple of important things to understand about metadata. First, many systems manage (and frequently change) their own metadata and information. Databases, applications, file systems, source code management systems, services, content management systems, and even other repositories are just a few types of systems that do this. We can't pull the information out and duplicate it, because then we risk having multiple copies that are out-of-sync. Ideally, we could access all of this information through a homogeneous API that also provides navigation, caching, versioning, search, and notification of changes. That would make our lives significantly easier.

What we want is federation. We can connect to these back-end systems to dynamically access the content and project it into a single, unified repository. We can cache it for faster access, as long as the cache can be invalidated based upon time or event. But we also need to maintain a clear picture of where all the bits come from, so users can be sure they're looking at the right information. And we need to make it as easy as possible to write new connectors, since there are a lot of systems out there that have information we want to federate.
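
The caching requirement described above (cache for faster access, but invalidate by time or by event) can be sketched in plain Java. This is only an illustration under stated assumptions and does not use ModeShape's connector API: the class name ExpiringSourceCache is hypothetical, a Function stands in for the connection to the back-end system that owns the content, and the clock is passed in so that expiry is easy to exercise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: caches content read from a back-end system and expires it by time or by event. */
final class ExpiringSourceCache {

    /** A cached value together with the time at which it was read from the back-end. */
    private static final class Entry {
        final String value;
        final long loadedAtMillis;

        Entry(String value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Function<String, String> backEnd; // stands in for the system that "owns" the content
    private final long timeToLiveMillis;
    private final LongSupplier clock;
    private final Map<String, Entry> entries = new HashMap<>();

    ExpiringSourceCache(Function<String, String> backEnd, long timeToLiveMillis, LongSupplier clock) {
        this.backEnd = backEnd;
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    /** Returns the cached value while it is fresh; otherwise re-reads it from the back-end. */
    synchronized String get(String path) {
        long now = clock.getAsLong();
        Entry entry = entries.get(path);
        if (entry == null || now - entry.loadedAtMillis >= timeToLiveMillis) {
            entry = new Entry(backEnd.apply(path), now);
            entries.put(path, entry);
        }
        return entry.value;
    }

    /** Event-based invalidation: forget the entry when the owning system reports a change. */
    synchronized void invalidate(String path) {
        entries.remove(path);
    }

    public static void main(String[] args) {
        ExpiringSourceCache cache = new ExpiringSourceCache(
                path -> "content of " + path, // pretend read from the back-end
                5_000L,
                System::currentTimeMillis);
        System.out.println(cache.get("/files/readme.txt"));
        cache.invalidate("/files/readme.txt"); // e.g. the back-end signalled a change
        System.out.println(cache.get("/files/readme.txt"));
    }
}
```

The back-end remains the owner of the information in this sketch: the cache never writes to it and always falls back to it once an entry is stale or has been invalidated.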

The second important characteristic of the metadata is that a lot of it is represented as files, and there are a lot of different file formats. These include source code, configuration files, web pages, database schemas, XML schemas, service definitions, policies, documents, spreadsheets, presentations, images, audio files, workflow definitions, business rules, and on and on. And logically if files contain metadata, we want to add those files to our metadata repository. The problem is, all that metadata is tied up as blobs in the repository. Ideally, our repository would automatically extract from those files the content that's most useful to us, and place that content inside the repository where it can be much more easily used, searched, related, and analyzed. ModeShape does exactly this via a process we call sequencing, and it's an important part of a metadata repository.
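
To make the idea of sequencing more concrete, here is a minimal sketch in plain Java of extracting useful facts from an opaque blob. It is not ModeShape's sequencer API: the class BlobSequencingSketch and its methods are hypothetical names, and a ZIP archive is used only because the JDK can read that format without additional libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of sequencing: derive searchable facts from the bytes of a file. */
final class BlobSequencingSketch {

    /** Produces one "name (N bytes)" fact for each file stored inside the ZIP blob. */
    static List<String> sequence(byte[] zipBlob) {
        List<String> facts = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBlob))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes() stops at the end of the current entry, not the end of the archive
                int size = in.readAllBytes().length;
                facts.add(entry.getName() + " (" + size + " bytes)");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return facts;
    }

    /** Builds a small in-memory archive so the sketch needs no files on disk. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("docs/readme.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("schema.ddl"));
            out.write("CREATE TABLE t (id INT);".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        sequence(sampleZip()).forEach(System.out::println);
    }
}
```

In a repository, facts like these would be stored as nodes and properties next to the original file rather than returned as strings, which is what makes them searchable and relatable to other content.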

The third important characteristic of metadata is that it rarely stays the same. Different consumers of the information need to see different views of it. Metadata about two similar systems is not always the same. The metadata often needs to be tagged or annotated with additional information. And the things being described often change over time, meaning the metadata has to change, too. As a result, the way in which we store and manage the metadata has to be flexible and able to adapt to our ever-changing needs, and the object model we use to interact with the repository must accommodate these needs. The graph-based nature of the JCR API provides this flexibility while also giving us the ability to constrain information when it needs to be constrained.

There are a lot of choices for how applications can store information persistently so that it can be accessed at a later time and by other processes. The challenge developers face is how to use an approach that most closely matches the needs of their application. This choice becomes more important as developers choose to focus their efforts on application-specific logic, delegating much of the responsibilities for persistence to libraries and frameworks.

Perhaps one of the easiest techniques is to simply store information in files. The Java language makes working with files relatively easy, but Java really doesn't provide many bells and whistles. So using files is an easy choice when the information is either not complicated (for example, property files), or when users may need to read or change the information outside of the application (for example, log files or configuration files). But using files to persist information becomes more difficult as the information becomes more complex, as the volume of it increases, or if it needs to be accessed by multiple processes. For these situations, other techniques often have more benefits.
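
As a small illustration of the simple case, the following sketch persists a few settings in a property file using only the JDK. The class name PropertyFileSketch, the key server.port, and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical sketch: persisting uncomplicated information in a property file. */
final class PropertyFileSketch {

    /** Writes the settings as human-readable key=value lines. */
    static void save(Path file, Properties settings) {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            settings.store(out, "application settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the settings back, including any edits a user made outside the application. */
    static Properties load(Path file) {
        Properties settings = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            settings.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    /** Saves one setting to a temporary file, loads it again, and returns the loaded value. */
    static String roundTrip(String key, String value) {
        try {
            Path file = Files.createTempFile("settings", ".properties");
            try {
                Properties settings = new Properties();
                settings.setProperty(key, value);
                save(file, settings);
                return load(file).getProperty(key);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("server.port", "8080"));
    }
}
```

This works well for flat key/value data, but nothing here helps with hierarchy, concurrent writers, or search, which is where the difficulties mentioned above begin.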

Another technique built into the Java language is Java serialization, which is capable of persisting the state of an object graph so that it can be read back in at a later time. However, Java serialization can quickly become tricky if the classes are changed, so it is usually beneficial only when the information is persisted for a very short period of time. For example, serialization is sometimes used to send an object graph from one process to another. Using serialization for longer-term storage of information is far less useful.
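
A short round trip shows what serialization of an object graph looks like in practice. This is a sketch under stated assumptions: SerializationSketch and its nested Node class are hypothetical names, and the bytes are kept in memory rather than sent to another process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: writing an object graph to bytes and reading it back. */
final class SerializationSketch {

    /** A tiny graph node; changing its fields later may make previously written bytes unreadable. */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Serializes the whole graph reachable from the given object. */
    static byte[] toBytes(Serializable graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds a copy of the graph; the classes must still be compatible with the stored form. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.children.add(new Node("child"));
        Node copy = (Node) fromBytes(toBytes(root));
        System.out.println(copy.name + " has " + copy.children.size() + " child(ren)");
    }
}
```

The stored bytes are tied to the shape of the Node class, which is the reason serialization suits short-lived transfers better than long-term storage.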

One of the more popular and widely-used persistence technologies is the relational database. Relational database management systems have been around for decades and are very capable. The Java Database Connectivity (JDBC) API provides a standard interface for connecting to and interacting with relational databases. However, it is a low-level API that requires a lot of code to use correctly, and it still doesn't abstract away the DBMS-specific SQL grammar. Also, working with relational data in an object-oriented language can feel somewhat unnatural, so many developers map this data to classes that fit much more cleanly into their application. The problem is that manually creating this mapping layer requires a lot of repetitive and non-trivial JDBC code.

Object-relational mapping libraries automate the creation of this mapping layer and result in far less code that is much more maintainable, with performance that is often as good as (if not better than) handwritten JDBC code. The Java Persistence API (JPA) provides a standard mechanism for defining the mappings (through annotations) and working with these entity objects. Several commercial and open-source libraries implement JPA, and some even offer additional capabilities and features that go beyond JPA. For example, Hibernate is one of the most feature-rich JPA implementations and offers object caching, statement caching, extra association mappings, and other features that help to improve performance and usefulness. Plus, Hibernate is open-source (with support offered by JBoss).

While relational databases and JPA are solutions that work well for many applications, they are more limited in cases when the information structure is highly flexible, the structure is not known a priori, or that structure is subject to frequent change and customization. In these situations, content repositories may offer a better choice for persistence. Content repositories offer the storage capabilities of relational databases with the flexibility offered by other systems, such as using files. Content repositories also typically provide other capabilities as well, including hierarchical organization, versioning, indexing, search, access control, transactions, and observation. Content repositories are often used by content management systems (CMS), document management systems (DMS), and other applications that manage electronic files (e.g., documents, images, multi-media, web content, etc.) and metadata associated with them (e.g., author, date, status, security information, etc.). The Content Repository for Java technology API provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed through the Java Community Process originally under JSR-170 (as "JCR 1.0"), but has since been revised and improved as "JCR 2.0" under JSR-283.

The JCR 2.0 API provides a number of information services that are needed by many applications, including: read and write access to information; the ability to structure information in a hierarchical and flexible manner that can adapt and evolve over time; ability to work with structured, semi-structured, and unstructured content; ability to (transparently) handle large strings; notifications of changes in the information; search and query; versioning of information; access control; integrity constraints; participation within distributed transactions; explicit locking of content; and of course persistence.

ModeShape implements the JCR 2.0 API, including many of the optional features.


The ModeShape open source project uses its JIRA instance to track issues for tasks, requirements, bugs, and other activities. The roadmap report shows how each of these issues is targeted to the upcoming releases, while the change log report shows all of the issues that were fixed in each of the past releases.

By convention, the ModeShape project team periodically reviews JIRA issues that aren't targeted to a release, and then schedules them based upon current workload, severity, and the roadmap. And if we review an issue and don't know how to target it, we target it to the Future Releases bucket.

At the start of a release, the project team reviews the roadmap, identifies the goals for the release, and targets (or retargets) the issues appropriately.

ModeShape consists of quite a few separate modules. Just a few of these make up the essential core components of the system:

Several other modules are also essential, but for the most part are hidden from client applications, as they provide components used within the JCR implementation:

  • modeshape-repository provides the core ModeShape graph engine and services for managing repository connections, sequencers, MIME type detectors, and observation. If you're using ModeShape repositories via our graph API rather than JCR, then this is where you'd start.

  • modeshape-cnd provides a self-contained utility for parsing CND (Compact Node Definition) files and transforming the node definitions into a graph notation compatible with ModeShape's JCR implementation.

  • modeshape-graph defines the Application Programming Interface (API) for ModeShape's low-level graph model, including a fluent-style API for working with graph content. This module also defines the APIs necessary to implement custom connectors, sequencers, and MIME type detectors.

  • modeshape-common is a small low-level library of common utilities and frameworks, including logging, progress monitoring, internationalization/localization, text translators, component management, and class loader factories.

Most of the ModeShape modules, however, are optional extensions. Many of these depend on third party libraries, so you will probably want to include only those modules that provide functionality you'll use in your repository. These modules are located in the source under the extensions/ directory.

  • modeshape-clustering contains ModeShape's clustering components and is needed only when two or more ModeShape engines are to be clustered together (so listeners in one session get notifications made from within any of the engines). ModeShape clustering uses the powerful, flexible and mature JGroups reliable multicast communication library. Simply enable clustering in ModeShape's configuration, include this library, and start your cluster. Engines can be dynamically added and removed from the cluster.

  • modeshape-connector-infinispan is the preferred ModeShape repository connector for persistently storing content. Infinispan is an extremely scalable, highly available data grid platform that distributes the data across the nodes in the grid. This connector makes it possible for repository content to be stored in a very efficient, fast, highly-concurrent (essentially lock- and synchronization-free), and reliable manner, even when the content size grows to massive sizes. This connector is capable of storing any kind of content, and dictates how the content is stored on the data grid. Therefore, this connector cannot be used to access the content of existing data grids created by/for other applications.

  • modeshape-connector-jbosscache is a ModeShape repository connector that stores content within a JBoss Cache instance. JBoss Cache is a powerful cache implementation that can serve as a distributed cache and that can persist information. The cache instance can be found via JNDI or created and managed by the connector. This connector is capable of storing any kind of content, and dictates how the content is stored in the cache. Therefore, this connector cannot be used to access the content of existing cache instances created by/for other applications.

  • modeshape-connector-jdbc-metadata is a ModeShape repository connector that provides read-only access to metadata and schema information from relational databases through a JDBC connection. This connector provides an optional and configurable caching facility to prevent frequent requests to the database.

  • modeshape-connector-store-jpa is a ModeShape repository connector that stores content in a JDBC database, using the Java Persistence API (JPA) and the very highly-regarded and widely-used Hibernate implementation. This connector is capable of storing any kind of content, and dictates the schema in which it stores the content. Therefore, this connector cannot be used to access the data in existing databases created by/for other applications.

  • modeshape-connector-jcr is a ModeShape repository connector that accesses and stores content in an external JCR 2.0 repository. This allows ModeShape to integrate with other JCR implementations and even federate multiple JCR repositories into a single unified repository. Any differences in namespaces are automatically handled, although node types used by the content in the external JCR repository must also be registered into the ModeShape repository using the connector. Note that this connector is currently a technical preview, and we're seeking feedback and assistance in identifying the required functionality.

  • modeshape-connector-filesystem is a ModeShape repository connector that accesses the files and folders on (a part of) the local file system, providing that content in the form of nt:file and nt:folder nodes. This connector does support updating the file system when changes are made to the nt:file and nt:folder nodes. However, this connector does not support storing other kinds of nodes.

  • modeshape-connector-svn is a ModeShape repository connector that accesses the content of an existing Subversion repository, providing that content in the form of nt:file and nt:folder nodes. This connector does support updating the SVN repository when changes are made to the nt:file and nt:folder nodes. However, this connector does not support storing other kinds of nodes.

  • modeshape-sequencer-cnd is a ModeShape sequencer that extracts JCR node definitions from JCR Compact Node Definition (CND) files.

  • modeshape-sequencer-ddl is a ModeShape sequencer that extracts the structure and content from DDL files. This is still under development and includes support for the basic DDL statements in the Oracle, PostgreSQL, Derby, and standard DDL dialects.

  • modeshape-sequencer-zip is a ModeShape sequencer that extracts the files (with content) and directories from ZIP archives.

  • modeshape-sequencer-xml is a ModeShape sequencer that extracts the structure and content from XML files.

  • modeshape-sequencer-xsd is a ModeShape sequencer that extracts the structure and content from XML Schema Definition (XSD) files.

  • modeshape-sequencer-wsdl is a ModeShape sequencer that extracts the structure and content from Web Service Definition Language (WSDL) 1.1 files.

  • modeshape-sequencer-sramp is a library with reusable node types patterned after the core model of S-RAMP, and used by other ModeShape sequencers.

  • modeshape-sequencer-classfile is a ModeShape sequencer that extracts the package, class/type, member, documentation, annotations, and other information from Java class files.

  • modeshape-sequencer-java is a ModeShape sequencer that extracts the package, class/type, member, documentation, annotations, and other information from Java source files.

  • modeshape-sequencer-jbpm-jpdl is a prototype ModeShape sequencer that extracts process definition metadata from jBPM process definition language (jPDL) files. This is still under development.

  • modeshape-sequencer-msoffice is a ModeShape sequencer that extracts metadata and summary information from Microsoft Office documents. For example, the sequencer extracts from a PowerPoint presentation the outline as well as thumbnails of each slide. Microsoft Word and Excel files are also supported.

  • modeshape-sequencer-images is a ModeShape sequencer that extracts the image metadata (e.g., size, date, etc.) from PNG, JPEG, GIF, BMP, PCX, IFF, RAS, PBM, PGM, and PPM image files.

  • modeshape-sequencer-mp3 is a ModeShape sequencer that extracts metadata (e.g., author, album name, etc.) from MP3 audio files.

  • modeshape-sequencer-teiid contains two sequencers. ModelSequencer extracts the structured data model contained within a Teiid relational XMI model, including the catalogs, schemas, tables, views, columns, primary keys, foreign keys, indexes, procedures, procedure parameters, procedure results, logical relationships, and the JDBC source from which the model was imported. Teiid VDB files contain several models, so the VdbSequencer extracts the virtual database metadata and the structured data model from each of the models contained within the VDB.

  • modeshape-sequencer-text is a ModeShape sequencer that extracts data from text streams. There are separate sequencers for character-delimited sequencing and fixed width sequencing, but both treat the incoming text stream as a series of rows separated by line-terminators with each row consisting of one or more columns.

  • modeshape-search-lucene is an implementation of the SearchEngine interface that uses the Lucene library. This module is one of the few extensions that is used directly by the modeshape-jcr module.

  • modeshape-mimetype-detector-aperture is a MimeTypeDetector implementation that uses the Aperture library to determine the best MIME type given the name and contents of a file.

  • modeshape-extractor-tika is a TextExtractor implementation that uses the Apache Tika parsing library to extract from binary content text that can be used for indexing the content.

  • modeshape-classloader-maven is a small library that provides a ClassLoaderFactory implementation that can create ClassLoader instances capable of loading classes given a Maven Repository and a list of Maven coordinates. The Maven Repository can be managed within a JCR repository.
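
The row-and-column treatment described above for modeshape-sequencer-text can be illustrated with a few lines of plain Java. This is only a sketch of the idea for the character-delimited case and does not use ModeShape's sequencer classes; DelimitedTextSketch and its parse method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: treating text as rows split by line terminators and columns split by a delimiter. */
final class DelimitedTextSketch {

    /** Returns one list of column values per non-empty line of the input. */
    static List<List<String>> parse(String text, String delimiter) {
        List<List<String>> rows = new ArrayList<>();
        // \R matches any line terminator (\n, \r\n, \r, ...)
        for (String line : text.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            // The limit of -1 keeps trailing empty columns; quote() treats the delimiter literally
            rows.add(Arrays.asList(line.split(Pattern.quote(delimiter), -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : parse("id,name\r\n1,alpha\n2,\n", ",")) {
            System.out.println(row);
        }
    }
}
```

A fixed-width variant would differ only in how each line is cut into columns, for example by taking substrings at known offsets instead of splitting on a delimiter.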

The following modules make up the various web application projects (and are located in the source under the web/ directory). You may be able to use these artifacts "out of the box", but more likely the configuration defined in the WAR files will not be exactly what you want for your environment. In this case, you can replicate one of our "-war" modules and customize the configuration settings to easily assemble a custom WAR.

  • modeshape-web-jcr-webdav provides a WebDAV server for Java Content Repositories. This project provides integration with ModeShape's JCR implementation (of course) but also contains a service provider interface (SPI) that can be used to integrate other JCR implementations with these WebDAV services in the future. For ease of packaging, these classes are provided as a JAR that can be placed in the WEB-INF/lib of a deployed WebDAV server WAR.

  • modeshape-web-jcr-webdav-war wraps the WebDAV services from the modeshape-web-jcr-webdav JAR into a WAR and provides in-container integration tests. This project can be consulted as a template for how to deploy the WebDAV services in a custom implementation.

  • modeshape-web-jcr-rest provides a set of JSR-311 (JAX-RS) objects that form the basis of a RESTful server for Java Content Repositories. This project provides integration with ModeShape's JCR implementation (of course) but also contains a service provider interface (SPI) that can be used to integrate other JCR implementations with these RESTful services in the future. For ease of packaging, these classes are provided as a JAR that can be placed in the WEB-INF/lib of a deployed RESTful server WAR.

  • modeshape-web-jcr-rest-war wraps the RESTful services from the modeshape-web-jcr-rest JAR into a WAR and provides in-container integration tests. This project can be consulted as a template for how to deploy the RESTful services in a custom implementation.

  • modeshape-web-jcr-rest-client is a library that uses POJOs to access the REST web service. This module eliminates the need for applications to know how to create HTTP request URLs and payloads, and how to parse the JSON responses. It can be used to publish (upload) and unpublish (delete) files from ModeShape repositories.

  • modeshape-web-jcr provides a reusable library for web applications using JCR, and is used by the modeshape-web-jcr-rest and modeshape-web-jcr-webdav modules.

ModeShape recently added several modules that make it very easy to deploy ModeShape in JBoss AS or EAP as a full-fledged, central, shared service that can be monitored and administered using the embedded console and used directly by web applications deployed to the application server. Our Maven build produces a "kit" ZIP file that can be unzipped into a JBoss AS profile. When your server restarts, ModeShape will be running with a very simple configuration (although that can be easily changed).

The modules that make up the JBoss AS deployment kit are located in the source under the deploy/jbossas directory:

  • modeshape-jbossas-service provides several components that are deployed through the microcontainer in JBoss AS, registered in JNDI, and exposed through the Profile Service for monitoring and management. This service leverages the JAAS support within the application server.

  • modeshape-jbossas-console defines the plugin for RHQ that enables administration, monitoring, alerting, operational control and configuration. All of the major components within a ModeShape engine are exposed as RHQ resources, and the plugin provides a number of metrics and administrative operations as well as exposing most configuration properties. (We plan to add more metrics and operations over the next few releases, as we gain more experience using the ModeShape RHQ plugin.)

  • modeshape-jbossas-web-rest-war defines a variant of the more general modeshape-web-jcr-rest-war that is tailored for deployment on JBoss AS, since it reuses the same ModeShape service deployed into the application server.

  • modeshape-jbossas-web-webdav-war defines a variant of the more general modeshape-web-jcr-webdav-war that is tailored for deployment on JBoss AS, since it reuses the same ModeShape service deployed into the application server.

There are also modules for ModeShape's documentation (located in the source under the docs/ directory):

  • docs-getting-started is the project with the DocBook source for the ModeShape Getting Started document.

  • docs-getting-started-examples is the project with the Java source for the example application used in the ModeShape Getting Started document.

  • docs-reference-guide is the project with the DocBook source for this document, the ModeShape Reference Guide document.

There are several utility modules:

  • modeshape-jpa-ddl-gen provides a standalone utility that can generate the DDL for the database schema used by the JPA connector. Because it uses Hibernate, it can generate DDL for any of the databases that the connector can use. This is also useful for users who prefer not to give DDL privileges to the ModeShape database user.

  • modeshape-jdbc-local provides a JDBC driver implementation that allows JDBC clients to query the contents of a local JCR repository using JCR-SQL2. The driver even supports JDBC metadata, making it possible to dynamically discover the tables and columns available for querying (which are determined from the node types). It can be configured as a data source in JBoss AS, and can even leverage the ModeShape service, allowing JDBC-based access by clients deployed to that JBoss AS instance to query the repository content. This library is very lightweight and fast, since it directly accesses the repository using the JCR API.

  • modeshape-jdbc provides a JDBC driver implementation that allows JDBC clients to query the contents of a local or remote JCR repository using JCR-SQL2. The driver even supports JDBC metadata, making it possible to dynamically discover the tables and columns available for querying (which are determined from the node types). It can be configured as a data source in JBoss AS, and can even leverage the ModeShape service, allowing JDBC-based access to the same repository content available via the JCR API, RESTful service, or WebDAV.

There is another module that runs the full suite of JCR TCK tests, and which at the moment still contains a few failures. This module is never needed in client applications.

  • modeshape-jcr-tck provides a separate testing project that executes all of the reference implementation's JCR TCK tests on a nightly basis to track implementation progress against the JCR 1.0 specification. This module will likely be retired when the ModeShape JCR implementation is complete, since modeshape-jcr and modeshape-integration-tests will be running the full suite of JCR TCK unit tests.

Another module provides system- and integration-level tests and is never needed in client applications:

  • modeshape-integration-tests provides a home for all of the integration tests that involve more components than unit tests do. Integration tests are often more complicated, take longer, and involve testing the integration and functionality of multiple components (whereas unit tests focus on testing a single class or component and may use stubs or mock objects to isolate the code being tested from other related components).

Finally, there is a Maven parent pom.xml file that aggregates all of the other projects, provides common defaults for the Maven plugins and dependency versions used throughout the modules, and defines various asset files that help build the necessary Maven artifacts during a build.

Each of these modules is a Maven project with a group ID of org.modeshape. All of these projects correspond to artifacts in the JBoss Maven 2 Repository, the settings for which are described on the JBoss.org wiki.

ModeShape 2.5.0.Final includes several improvements and minor features, and numerous fixes for issues reported against the earlier 2.x releases. For details, see the release notes.

ModeShape implements all of the required JCR 2.0 features: repository acquisition, authentication, reading/navigating, query, export, node type discovery, and permissions and capability checking. ModeShape also implements most of the optional JCR 2.0 features: writing, import, observation, workspace management, versioning, locking, node type management, same-name siblings, orderable child nodes, and shareable nodes. The remaining optional features (access control management, lifecycle management, retention and hold, and transactions) may be introduced in future versions.

Note

ModeShape 2.5.0.Final currently passes 1372 of the 1391 JCR TCK tests, where 17 of these 19 failures appear to be bugs in the TCK tests (see JCR-2648, JCR-2661, JCR-2662, and JCR-2663). The remaining 2 failures are due to a known issue (see MODE-760).

The ModeShape project organizes the codebase into a number of subprojects. The most fundamental are the core libraries: the graph API, the connector framework, and the sequencing framework, as well as the configuration and engine in which all the components run. These are all topics covered in this part of the document.

The ModeShape implementation of the JCR API as well as some other JCR-related components are covered in the next part.

The various components of ModeShape are designed as plain old Java objects (POJOs). Rather than making assumptions about its environment, each component requires that any external dependencies it needs in order to operate be supplied to it. This pattern is known as Dependency Injection. It keeps the components simpler and allows a great deal of flexibility and customization in how they are configured.

The approach that ModeShape takes is straightforward: a single POJO represents everything about the environment in which components operate. Called ExecutionContext, it contains references to most of the essential facilities, including: security (authentication and authorization); namespace registry; name factories; factories for properties and property values; logging; and access to class loaders (given a classpath). Most of the ModeShape components require an ExecutionContext and thus have access to all these facilities.

The ExecutionContext is a concrete class that is instantiated with the no-argument constructor:

public class ExecutionContext implements ClassLoaderFactory {

    /**
     * Create an instance of an execution context, with default implementations for all components.
     */
    public ExecutionContext() { ... }

    /**
     * Get the factories that should be used to create values for {@link Property properties}.
     * @return the property value factory; never null
     */
    public ValueFactories getValueFactories() {...}

    /**
     * Get the namespace registry for this context.
     * @return the namespace registry; never null
     */
    public NamespaceRegistry getNamespaceRegistry() {...}

    /**
     * Get the factory for creating {@link Property} objects.
     * @return the property factory; never null
     */
    public PropertyFactory getPropertyFactory() {...}

    /**
     * Get the security context for this environment.
     * @return the security context; never null
     */
    public SecurityContext getSecurityContext() {...}

    /**
     * Return a logger associated with this context. This logger records only those activities within the
     * context and provides a way to capture the context-specific activities. All log messages are also
     * sent to the system logger, so classes that log via this mechanism should <i>not</i> also
     * {@link Logger#getLogger(Class) obtain a system logger}.
     * @param clazz the class that is doing the logging
     * @return the logger, named after clazz; never null
     */
    public Logger getLogger( Class<?> clazz ) {...}

    /**
     * Return a logger associated with this context. This logger records only those activities within the
     * context and provides a way to capture the context-specific activities. All log messages are also
     * sent to the system logger, so classes that log via this mechanism should <i>not</i> also
     * {@link Logger#getLogger(Class) obtain a system logger}.
     * @param name the name for the logger
     * @return the logger, named after the supplied name; never null
     */
    public Logger getLogger( String name ) {...}

    ...
}

The fact that so many of the ModeShape components take ExecutionContext instances gives us some interesting possibilities. For example, one execution context instance can be used as the highest-level (or "application-level") context for all of the services (e.g., RepositoryService, SequencingService, etc.). Then, an execution context could be created for each user that will be performing operations, and that user's context can be passed around to not only provide security information about the user but also to allow the activities being performed to be recorded for user feedback, monitoring and/or auditing purposes.

As mentioned above, the starting point is to create a default execution context, which will have all the default components:

ExecutionContext context = new ExecutionContext();

Once you have this top-level context, you can start creating subcontexts with different components, and different security contexts. (Of course, you can create a subcontext from any instance.) To create a subcontext, simply use one of the with(...) methods on the parent context. We'll show examples later on in this chapter.
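To make the subcontext idea concrete, here is a minimal sketch of the general pattern: an immutable context whose with-style methods return a modified copy and leave the parent untouched. MiniContext, its two fields, and the withUser and withNamespace methods are hypothetical stand-ins invented for this illustration; they are not ModeShape classes, and the real ExecutionContext carries many more components.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an execution context; NOT the ModeShape class.
final class MiniContext {
    private final String userName;                 // null means "no authenticated user"
    private final Map<String, String> namespaces;  // prefix -> namespace URI

    MiniContext() {
        this(null, Collections.<String, String>emptyMap());
    }

    private MiniContext(String userName, Map<String, String> namespaces) {
        this.userName = userName;
        this.namespaces = Collections.unmodifiableMap(new HashMap<String, String>(namespaces));
    }

    // Each with-style method leaves this context untouched and returns a subcontext.
    MiniContext withUser(String newUserName) {
        return new MiniContext(newUserName, namespaces);
    }

    MiniContext withNamespace(String prefix, String uri) {
        Map<String, String> copy = new HashMap<String, String>(namespaces);
        copy.put(prefix, uri);
        return new MiniContext(userName, copy);
    }

    String getUserName() {
        return userName;
    }

    String getNamespaceUri(String prefix) {
        return namespaces.get(prefix);
    }
}

class MiniContextExample {
    public static void main(String[] args) {
        MiniContext application = new MiniContext().withNamespace("ex", "http://example.com/ns");
        MiniContext forUser = application.withUser("jsmith");

        System.out.println(application.getUserName());      // the parent context is unchanged
        System.out.println(forUser.getUserName());
        System.out.println(forUser.getNamespaceUri("ex"));  // inherited from the parent context
    }
}
```

Because every subcontext is a separate object, an application-level context can be shared freely while per-user contexts are created and discarded as needed.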

ModeShape uses a simple abstraction layer to isolate it from the security infrastructure used within an application. A SecurityContext represents the context of an authenticated user, and is defined as an interface:

public interface SecurityContext {

    /**
     * Get the name of the authenticated user.
     * @return the authenticated user's name
     */
    String getUserName();

    /**
     * Determine whether the authenticated user has the given role.
     * @param roleName the name of the role to check
     * @return true if the user has the role and is logged in; false otherwise
     */
    boolean hasRole( String roleName );

    /**
     * Logs the user out of the authentication mechanism.
     * For some authentication mechanisms, this will be implemented as a no-op.
     */
    void logout();
}

Every ExecutionContext has a SecurityContext instance, though the top-level (default) execution context does not represent an authenticated user. But you can create a subcontext for a user authenticated via JAAS:

ExecutionContext context = ...
String username = ...
char[] password = ...
String jaasRealm = ...
SecurityContext securityContext = new JaasSecurityContext(jaasRealm, username, password);
ExecutionContext userContext = context.with(securityContext);

In the case of JAAS, you might not have the password but would rather prompt the user. In that case, simply create a subcontext with a different security context:

ExecutionContext context = ...
String jaasRealm = ...
CallbackHandler callbackHandler = ...
ExecutionContext userContext = context.with(new JaasSecurityContext(jaasRealm, callbackHandler));
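The CallbackHandler in the snippet above is the standard javax.security.auth.callback.CallbackHandler interface from the JDK. As a rough sketch of what such a handler might look like, the following class answers name and password callbacks from values it already holds; an interactive application would prompt the user inside handle(...) instead. The class name FixedCredentialsCallbackHandler is hypothetical, and only JDK classes are used.

```java
import java.io.IOException;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

// Hypothetical handler that answers JAAS callbacks from values already in hand.
final class FixedCredentialsCallbackHandler implements CallbackHandler {
    private final String userName;
    private final char[] password;

    FixedCredentialsCallbackHandler(String userName, char[] password) {
        this.userName = userName;
        this.password = password.clone();
    }

    @Override
    public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
        for (Callback callback : callbacks) {
            if (callback instanceof NameCallback) {
                ((NameCallback) callback).setName(userName);
            } else if (callback instanceof PasswordCallback) {
                // PasswordCallback keeps its own copy of the supplied array
                ((PasswordCallback) callback).setPassword(password);
            } else {
                // Any other kind of callback is rejected rather than silently ignored
                throw new UnsupportedCallbackException(callback);
            }
        }
    }
}

class CallbackHandlerExample {
    public static void main(String[] args) throws Exception {
        CallbackHandler handler = new FixedCredentialsCallbackHandler("jsmith", "secret".toCharArray());
        NameCallback name = new NameCallback("User name: ");
        PasswordCallback password = new PasswordCallback("Password: ", false);
        handler.handle(new Callback[] { name, password });
        System.out.println(name.getName());
    }
}
```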

Of course if your application has a non-JAAS authentication and authorization system, you can simply provide your own implementation of SecurityContext:

ExecutionContext context = ...
SecurityContext mySecurityContext = ...
ExecutionContext myAppContext = context.with(mySecurityContext);

These ExecutionContexts then represent the authenticated user in any component that uses the context.
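As an illustration of a non-JAAS implementation, the sketch below backs the three SecurityContext methods with a user name and a fixed set of role names. The SecurityContext interface is redeclared locally only so that the example compiles on its own; InMemorySecurityContext and the role names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Local copy of the interface shown earlier, so the sketch compiles on its own.
interface SecurityContext {
    String getUserName();
    boolean hasRole(String roleName);
    void logout();
}

// Hypothetical implementation backed by an application-supplied set of role names.
final class InMemorySecurityContext implements SecurityContext {
    private final String userName;
    private final Set<String> roles;
    private volatile boolean loggedIn = true;

    InMemorySecurityContext(String userName, String... roles) {
        this.userName = userName;
        this.roles = Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(roles)));
    }

    public String getUserName() {
        return userName;
    }

    // Per the interface contract: true only if the user has the role AND is still logged in.
    public boolean hasRole(String roleName) {
        return loggedIn && roles.contains(roleName);
    }

    public void logout() {
        loggedIn = false;
    }
}

class SecurityContextExample {
    public static void main(String[] args) {
        SecurityContext securityContext = new InMemorySecurityContext("jsmith", "editor");
        System.out.println(securityContext.hasRole("editor"));
        securityContext.logout();
        System.out.println(securityContext.hasRole("editor"));
    }
}
```

An instance like this would be passed to context.with(...) exactly as mySecurityContext is in the snippet above.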

One of the SecurityContext implementations provided by ModeShape is the JaasSecurityContext, which delegates any authentication or authorization requests to a Java Authentication and Authorization Service (JAAS) provider. This is the standard approach for authenticating and authorizing in Java.

There are quite a few JAAS providers available, but one of the best and most powerful providers is JBoss Security, the open source security framework used by JBoss. JBoss Security offers a number of JAAS login modules, including:

  • User-Roles Login Module is a simple javax.security.auth.spi.LoginModule implementation that uses usernames and passwords stored in a properties file.

  • Client Login Module prompts the user for their username and password.

  • Database Server Login Module uses a JDBC database to authenticate principals and associate them with roles.

  • LDAP Login Module uses an LDAP directory to authenticate principals. Two implementations are available.

  • Certificate Login Module authenticates using X509 certificates, obtaining roles from either property files or a JDBC database.

  • Operating System Login Module authenticates using the operating system's mechanism.

Many others are available as well. JBoss Security also provides other capabilities, such as XACML policies and federated single sign-on. For more detail, see the JBoss Security project.

If ModeShape is being used within a web application, then it is probably desirable to reuse the security infrastructure of the application server. This can be accomplished by implementing the SecurityContext interface with an implementation that delegates to the HttpServletRequest. Then, for each request, create a SecurityContextCredentials instance around your SecurityContext, and use those credentials to obtain a JCR Session.

Here is an example of the SecurityContext implementation that uses the servlet request:

@Immutable
public class ServletSecurityContext implements SecurityContext {

    private final String userName;
    private final HttpServletRequest request;

    /**
     * Create a {@link ServletSecurityContext} with the supplied 
     * {@link HttpServletRequest servlet information}.
     * 
     * @param request the servlet request; may not be null
     */
    public ServletSecurityContext( HttpServletRequest request ) {
        this.request = request;
        this.userName = request.getUserPrincipal() != null ? request.getUserPrincipal().getName() : null;
    }

    /**
     * Get the name of the authenticated user.
     * @return the authenticated user's name
     */
    public String getUserName() {
        return userName;
    }

    /**
     * Determine whether the authenticated user has the given role.
     * @param roleName the name of the role to check
     * @return true if the user has the role and is logged in; false otherwise
     */
    public boolean hasRole( String roleName ) {
        return request.isUserInRole(roleName);
    }

    /**
     * Logs the user out of the authentication mechanism.
     * For some authentication mechanisms, this will be implemented as a no-op.
     */
    public void logout() {
    }
}

Then use this to create an ExecutionContext for the authenticated user:

HttpServletRequest request = ...
Repository repository = engine.getRepository("my repository");
SecurityContext securityContext = new ServletSecurityContext(request);
ExecutionContext servletContext = context.with(securityContext);

We'll see later in the JCR chapter how this can be used to obtain a JCR Session for the authenticated user.

As we saw earlier, every ExecutionContext has a registry of namespaces. Namespaces are used throughout the graph API (as we'll see soon), and the prefix associated with each namespace makes for more readable string representations. The namespace registry tracks all of these namespaces and prefixes, and allows registrations to be added, modified, or removed. The interface for the NamespaceRegistry shows how these operations are done:

public interface NamespaceRegistry {

    /**
     * Return the namespace URI that is currently mapped to the empty prefix.
     * @return the namespace URI that represents the default namespace, 
     * or null if there is no default namespace
     */
    String getDefaultNamespaceUri();

    /**
     * Get the namespace URI for the supplied prefix.
     * @param prefix the namespace prefix
     * @return the namespace URI for the supplied prefix, or null if there is no 
     * namespace currently registered to use that prefix
     * @throws IllegalArgumentException if the prefix is null
     */
    String getNamespaceForPrefix( String prefix );

    /**
     * Return the prefix used for the supplied namespace URI.
     * @param namespaceUri the namespace URI
     * @param generateIfMissing true if the namespace URI has not already been registered and the 
     *        method should auto-register the namespace with a generated prefix, or false if the  
     *        method should never auto-register the namespace
     * @return the prefix currently being used for the namespace, or "null" if the namespace has 
     *         not been registered and "generateIfMissing" is "false"
     * @throws IllegalArgumentException if the namespace URI is null
     * @see #isRegisteredNamespaceUri(String)
     */
    String getPrefixForNamespaceUri( String namespaceUri, boolean generateIfMissing );

    /**
     * Return whether there is a registered prefix for the supplied namespace URI.
     * @param namespaceUri the namespace URI
     * @return true if the supplied namespace has been registered with a prefix, or false otherwise
     * @throws IllegalArgumentException if the namespace URI is null
     */
    boolean isRegisteredNamespaceUri( String namespaceUri );

    /**
     * Register a new namespace using the supplied prefix, returning the namespace URI previously 
     * registered under that prefix.
     * @param prefix the prefix for the namespace, or null if a namespace prefix should be generated 
     *        automatically
     * @param namespaceUri the namespace URI
     * @return the namespace URI that was previously registered with the supplied prefix, or null if the 
     *         prefix was not previously bound to a namespace URI
     * @throws IllegalArgumentException if the namespace URI is null
     */
    String register( String prefix, String namespaceUri );

    /**
     * Unregister the namespace with the supplied URI.
     * @param namespaceUri the namespace URI
     * @return true if the namespace was removed, or false if the namespace was not registered
     * @throws IllegalArgumentException if the namespace URI is null
     * @throws NamespaceException if there is a problem unregistering the namespace
     */
    boolean unregister( String namespaceUri );

    /**
     * Obtain the set of namespaces that are registered.
     * @return the set of namespace URIs; never null
     */
    Set<String> getRegisteredNamespaceUris();

    /**
     * Obtain a snapshot of all of the {@link Namespace namespaces} registered at the time this method 
     * is called. The resulting set is immutable, and will not reflect changes made to the registry.
     * @return an immutable set of Namespace objects reflecting a snapshot of the registry; never null
     */
    Set<Namespace> getNamespaces();
}

This interface exposes Namespace objects that are immutable:

@Immutable
interface Namespace extends Comparable<Namespace> {
    /**
     * Get the prefix for the namespace
     * @return the prefix; never null but possibly the empty string
     */
    String getPrefix();

    /**
     * Get the URI for the namespace
     * @return the namespace URI; never null but possibly the empty string
     */
    String getNamespaceUri();
}

ModeShape actually uses several implementations of NamespaceRegistry, but you can even implement your own and create ExecutionContexts that use it:

NamespaceRegistry myRegistry = ...
ExecutionContext contextWithMyRegistry = context.with(myRegistry);
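The sketch below shows how the register, getNamespaceForPrefix, and getPrefixForNamespaceUri operations described by the interface might behave, including the generateIfMissing option. It is deliberately simplified: SimpleNamespaces is a hypothetical class that does not implement the full NamespaceRegistry interface, is not thread-safe, and uses a made-up "ns001" naming scheme for generated prefixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified registry covering three of the operations in the interface above.
// It is not thread-safe and does not enforce a single prefix per namespace URI.
final class SimpleNamespaces {
    private final Map<String, String> uriByPrefix = new LinkedHashMap<String, String>();
    private int nextGeneratedId = 1;

    // Returns the URI previously registered under the prefix, or null if there was none.
    String register(String prefix, String namespaceUri) {
        if (namespaceUri == null) throw new IllegalArgumentException("namespaceUri may not be null");
        if (prefix == null) prefix = generatePrefix();
        return uriByPrefix.put(prefix, namespaceUri);
    }

    String getNamespaceForPrefix(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix may not be null");
        return uriByPrefix.get(prefix);
    }

    String getPrefixForNamespaceUri(String namespaceUri, boolean generateIfMissing) {
        if (namespaceUri == null) throw new IllegalArgumentException("namespaceUri may not be null");
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceUri)) return entry.getKey();
        }
        if (!generateIfMissing) return null;
        // Auto-register the namespace under a generated prefix
        String prefix = generatePrefix();
        uriByPrefix.put(prefix, namespaceUri);
        return prefix;
    }

    // The "ns001", "ns002", ... naming scheme is an assumption made for this sketch.
    private String generatePrefix() {
        String prefix;
        do {
            prefix = String.format("ns%03d", nextGeneratedId++);
        } while (uriByPrefix.containsKey(prefix));
        return prefix;
    }
}

class SimpleNamespacesExample {
    public static void main(String[] args) {
        SimpleNamespaces namespaces = new SimpleNamespaces();
        namespaces.register("ex", "http://example.com/ns");
        System.out.println(namespaces.getPrefixForNamespaceUri("http://example.com/ns", false));
        System.out.println(namespaces.getPrefixForNamespaceUri("http://example.com/other", true));
    }
}
```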

ModeShape is designed around extensions: sequencers, connectors, MIME type detectors, and class loader factories. The core part of ModeShape is relatively small and has few dependencies, while many of the "interesting" components are extensions that plug into and are used by different parts of the core or by layers above (such as the JCR implementation). The core doesn't really care what the extensions do or what external libraries they require, as long as the extension fulfills its end of the extension contract.

This means that you only need the core modules of ModeShape on the application classpath, while the extensions do not have to be on the application classpath. And because the core modules of ModeShape have few dependencies, the risk of ModeShape libraries conflicting with the application's is lower. Extensions, on the other hand, will likely have a lot of unique dependencies. By separating the core of ModeShape from the class loaders used to load the extensions, your application is isolated from the extensions and their dependencies.

Note

Of course, you can put all the JARs on the application classpath, too. This is what the examples in the Getting Started document do.

But in this case, how does ModeShape load all the extension classes? You may have noticed earlier that ExecutionContext implements the ClassLoaderFactory interface with a single method:

public interface ClassLoaderFactory {
    /**
     * Get a class loader given the supplied classpath.  The meaning of the classpath 
     * is implementation-dependent.
     * @param classpath the classpath to use
     * @return the class loader; may not be null
     */
    ClassLoader getClassLoader( String... classpath );
}

This means that any component that has a reference to an ExecutionContext has the ability to create a class loader with a supplied class path. As we'll see later, the connectors and sequencers are all defined with a class and optional class path. This is where that class path comes in.

The actual meaning of the class path, however, is a function of the implementation. ModeShape uses a StandardClassLoaderFactory that just loads the classes using the Thread's current context class loader (or, if there is none, delegates to the class loader that loaded the StandardClassLoaderFactory class). Of course, it's possible to write other ClassLoaderFactory implementations. Then, just create a subcontext with your implementation:

ClassLoaderFactory myClassLoaderFactory = ...
ExecutionContext contextWithMyClassLoaderFactories = context.with(myClassLoaderFactory);
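Since the meaning of the class path is left to the implementation, one possible interpretation is to treat each entry as a file system path to a JAR file or a directory of classes. The following sketch does that using the JDK's URLClassLoader. The ClassLoaderFactory interface is redeclared locally so the example compiles on its own, and FileSystemClassLoaderFactory is a hypothetical class, not something ModeShape provides.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Local copy of the interface shown above, so the sketch compiles on its own.
interface ClassLoaderFactory {
    ClassLoader getClassLoader(String... classpath);
}

// Hypothetical factory that interprets each classpath entry as a file system path
// to a JAR file or a directory of classes.
final class FileSystemClassLoaderFactory implements ClassLoaderFactory {

    public ClassLoader getClassLoader(String... classpath) {
        // Prefer the thread's context class loader, falling back to this class's loader
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) parent = FileSystemClassLoaderFactory.class.getClassLoader();

        // With no classpath entries there is nothing to add, so use the parent directly
        if (classpath == null || classpath.length == 0) return parent;

        URL[] urls = new URL[classpath.length];
        for (int i = 0; i < classpath.length; i++) {
            try {
                urls[i] = new File(classpath[i]).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Invalid classpath entry: " + classpath[i], e);
            }
        }
        return new URLClassLoader(urls, parent);
    }
}

class ClassLoaderFactoryExample {
    public static void main(String[] args) {
        ClassLoaderFactory factory = new FileSystemClassLoaderFactory();
        // The paths below are placeholders; they do not need to exist to create the loader
        ClassLoader loader = factory.getClassLoader("lib/my-extension.jar", "build/classes");
        System.out.println(loader instanceof URLClassLoader);
    }
}
```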

Note

The modeshape-classloader-maven project has a class loader factory implementation that parses the names into Maven coordinates, then uses those coordinates to look up artifacts in a Maven 2 repository. The artifact's POM file is used to determine the dependencies, which is done transitively to obtain the complete dependency graph. The resulting class loader has access to these artifacts in dependency order.

This class loader is not ready for use, however, since there is no tooling to help populate the repository.

ModeShape often needs the ability to determine the MIME type for some binary content. When uploading content into a repository, we may want to add the MIME type as metadata. Or, we may want to make some processing decisions based upon the MIME type. So, ModeShape has a small pluggable framework for determining the MIME type by using the name of the file (e.g., extensions) and/or by reading the actual content.

ModeShape defines a MimeTypeDetector interface that abstracts the implementation that actually determines the MIME type given the name and content. If the detector is able to determine the MIME type, it simply returns it as a string. If not, it merely returns null. Note, however, that a detector must be thread-safe. Here is the interface:

@ThreadSafe
public interface MimeTypeDetector {

    /**
     * Returns the MIME-type of a data source, using its supplied content and/or its supplied name, 
     * depending upon the implementation. If the MIME-type cannot be determined, either a "default" 
     * MIME-type or null may be returned, where the former will prevent earlier 
     * registered MIME-type detectors from being consulted.
     * 
     * @param name The name of the data source; may be null.
     * @param content The content of the data source; may be null.
     * @return The MIME-type of the data source, or optionally null 
     * if the MIME-type could not be determined.
     * @throws IOException If an error occurs reading the supplied content.
     */
    String mimeTypeOf( String name, InputStream content ) throws IOException;
}

To use a detector, simply invoke the method and supply the name of the content (e.g., the name of the file, with the extension) and the InputStream to the actual binary content. The result is a String containing the MIME type (e.g., "text/plain") or null if the MIME type cannot be determined. Note that the name or InputStream may be null, making this a very versatile utility.

Once again, you can obtain a MimeTypeDetector from the ExecutionContext. ModeShape provides and uses by default an implementation that uses only the name (the content is ignored), looking at the name's extension and looking for a match in a small listing (the org/modeshape/graph/mime.types file, loaded from the classpath). You can add extensions by copying this file, adding or correcting the entries, and then placing your updated file in the expected location on the classpath.
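To show the general shape of a name-only detector like the one just described, here is a minimal sketch. It is not ModeShape's implementation: ExtensionMimeTypeDetector is a hypothetical class, the MimeTypeDetector interface is redeclared locally so the example compiles on its own, and the three hard-coded mappings stand in for entries that a real detector would load from a mime.types resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Local copy of the interface shown above, so the sketch compiles on its own.
interface MimeTypeDetector {
    String mimeTypeOf(String name, InputStream content) throws IOException;
}

// Hypothetical name-only detector. The map is populated once in the constructor and
// never modified afterwards, so instances can be shared between threads.
final class ExtensionMimeTypeDetector implements MimeTypeDetector {
    private final Map<String, String> mimeTypeByExtension = new HashMap<String, String>();

    ExtensionMimeTypeDetector() {
        mimeTypeByExtension.put("txt", "text/plain");
        mimeTypeByExtension.put("xml", "application/xml");
        mimeTypeByExtension.put("pdf", "application/pdf");
    }

    // The content is ignored; only the extension of the name is examined.
    public String mimeTypeOf(String name, InputStream content) {
        if (name == null) return null;
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return null;  // no extension
        String extension = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return mimeTypeByExtension.get(extension);              // null if unknown
    }
}

class MimeTypeDetectorExample {
    public static void main(String[] args) {
        ExtensionMimeTypeDetector detector = new ExtensionMimeTypeDetector();
        System.out.println(detector.mimeTypeOf("notes.txt", null));
        System.out.println(detector.mimeTypeOf("archive.unknown", null));
    }
}
```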

Of course, you can always use a different MimeTypeDetector by creating a subcontext and supplying your implementation:

MimeTypeDetector myDetector = ...
ExecutionContext contextWithMyDetector = context.with(myDetector);

ModeShape can store all kinds of content, and ModeShape makes it easy to perform full-text searches on that content. To support searching, ModeShape extracts the text from the various properties on each node. The way it does this for most property types (e.g., STRING, LONG, DATE, PATH, NAME, etc.) is simply to read and use the literal values. But BINARY properties are another story: there's no way to index the binary content directly. Instead, ModeShape has a small pluggable framework for extracting useful text from the binary content, based upon the MIME type of the content itself.

The process works like this: when a BINARY property needs to be indexed for search, ModeShape determines the MIME type of the content, determines if there is a text extractor capable of handling that MIME type, and if so it passes the content to the text extractor and gets back a string of text, and it indexes that text.

ModeShape provides two text extractors out-of-the-box. The Teiid VDB text extractor operates only upon Teiid virtual database (i.e., ".vdb") files and extracts the virtual database's logical name, description, and version, plus the logical name, description, source name, source translator name, and JNDI name for each of the virtual database's models.

The second out-of-the-box extractor is capable of extracting text from a wider variety of file types, including Microsoft Office, PDF, HTML, plain text, and XML. This extractor uses the Tika toolkit from Apache, so a number of other file formats are supported. However, these other file formats require additional libraries that are not included out of the box. This is discussed in more detail in a later chapter.

Text extraction can be an intensive process, so it is not enabled by default. But enabling the text extractors in ModeShape's configuration is actually pretty easy. When using a configuration file, simply add a "<mode:textExtractors>" fragment under the "<configuration>" root element. Within the "<mode:textExtractors>" element place one or more "<mode:textExtractor>" fragments specifying at least the extractor's name and fully-qualified Java class.

For example, here is the fragment that defines the Teiid text extractor and the Tika text extractor. Note that the Teiid text extractor has no options and is pretty simple, while the Tika extractor allows much more control over the MIME types that should be processed:



<mode:textExtractors>
    <mode:textExtractor jcr:name="VDB Text Extractors">
      <mode:description>Extract text from Teiid VDB files</mode:description>        
      <mode:classname>org.modeshape.extractor.teiid.TeiidVdbTextExtractor</mode:classname>
    </mode:textExtractor>

    <mode:textExtractor jcr:name="Tika Text Extractors">
      <mode:description>Text extractors using Tika parsers</mode:description>        
      <mode:classname>org.modeshape.extractor.tika.TikaTextExtractor</mode:classname>
      <!-- 
      A comma- or whitespace-delimited list of MIME types that are to be excluded. 
      The following are excluded by default, but the default is completely overridden 
      when this property is set. In other words, if you explicitly exclude any MIME types,
      be sure to list all of the MIME types you want to exclude. Exclusions always 
      have a higher precedence than inclusions.
      -->
      <mode:excludedMimeTypes>
         application/x-archive,application/x-bzip,application/x-bzip2, 
         application/x-cpio,application/x-gtar,application/x-gzip, 
         application/x-tar,application/zip,application/vnd.teiid.vdb
      </mode:excludedMimeTypes>
      <!-- 
      A comma- or whitespace-delimited list of MIME types that are to be included. 
      If this is used, then the extractor will include only those MIME types found 
      in this list for which there is an available parser (unless the MIME type
      is also excluded). Including explicit MIME types is often easier if text is 
      to be extracted for only a few MIME types.
      -->
      <mode:includedMimeTypes>
         application/msword,application/vnd.oasis.opendocument.text
      </mode:includedMimeTypes>
    </mode:textExtractor>
    ... <!-- other extractors -->
</mode:textExtractors>

It's also possible to define your own text extractors by implementing the TextExtractor interface:

@ThreadSafe
public interface TextExtractor {

    /**
     * Determine if this extractor is capable of processing content with the supplied MIME type.
     * 
     * @param mimeType the MIME type; never null
     * @return true if this extractor can process content with the supplied MIME type, or false otherwise.
     */
    boolean supportsMimeType( String mimeType );

    /**
     * Extract the text from the content in the supplied stream, recording the extracted text in the
     * supplied output.
     * <p>
     * This method is called only after {@link #supportsMimeType(String)} has returned true for the MIME type
     * of the content. The implementation should read the stream and use the supplied output to record one
     * or more strings containing the extracted text.
     * </p>
     * 
     * @param stream the stream with the content from which text is to be extracted; never null
     * @param output the output on which the extracted text is recorded; never null
     * @param context the context for the extraction operation; never null
     * @throws IOException if there is a problem reading the stream
     */
    void extractFrom( InputStream stream,
                      TextExtractorOutput output,
                      TextExtractorContext context ) throws IOException;

}

As mentioned above, the "supportsMimeType" method will be called first, and only if your implementation returns true for a given MIME type will the "extractFrom" method be called. The supplied TextExtractorContext object provides information about the text being processed, while the TextExtractorOutput is a simple interface that your extractor uses to record one or more strings containing the extracted text.
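The calling sequence can be sketched as follows, using a trivial extractor that treats the content as UTF-8 plain text. The three interfaces are simplified local stand-ins so that the example compiles on its own: the recordText method on TextExtractorOutput and the empty TextExtractorContext are assumptions made for this sketch, and PlainTextExtractor and TextExtractionExample are hypothetical classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-ins; recordText(...) and the empty context are assumptions.
interface TextExtractorOutput {
    void recordText(String text);
}

interface TextExtractorContext {
}

interface TextExtractor {
    boolean supportsMimeType(String mimeType);

    void extractFrom(InputStream stream, TextExtractorOutput output,
                     TextExtractorContext context) throws IOException;
}

// Hypothetical extractor that treats the content as UTF-8 encoded plain text.
final class PlainTextExtractor implements TextExtractor {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    public boolean supportsMimeType(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    public void extractFrom(InputStream stream, TextExtractorOutput output,
                            TextExtractorContext context) throws IOException {
        Reader reader = new InputStreamReader(stream, UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        output.recordText(text.toString());
    }
}

class TextExtractionExample {
    // Mirrors the sequence described above: check the MIME type first, then extract.
    static List<String> extract(TextExtractor extractor, String mimeType, byte[] content)
            throws IOException {
        final List<String> recorded = new ArrayList<String>();
        if (extractor.supportsMimeType(mimeType)) {
            TextExtractorOutput output = new TextExtractorOutput() {
                public void recordText(String text) {
                    recorded.add(text);
                }
            };
            TextExtractorContext context = new TextExtractorContext() {
            };
            extractor.extractFrom(new ByteArrayInputStream(content), output, context);
        }
        return recorded;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello world".getBytes("UTF-8");
        System.out.println(extract(new PlainTextExtractor(), "text/plain", content));
        System.out.println(extract(new PlainTextExtractor(), "application/pdf", content));
    }
}
```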

If you need text extraction in sequencers or connectors, you can always get a TextExtractor instance from the ExecutionContext. That TextExtractor implementation is actually a composite of all of the text extractors defined in the configuration.

Of course, you can always use a different TextExtractor by creating a subcontext and supplying your implementation:

TextExtractor myExtractor = ...
ExecutionContext contextWithMyExtractor = context.with(myExtractor);

Two other components are made available by the ExecutionContext. The PropertyFactory is an interface that can be used to create Property instances, which are used throughout the graph API. The ValueFactories interface provides access to a number of different factories for different kinds of property values. These will be discussed in much more detail in the next chapter. But like the other components that are in an ExecutionContext, you can create subcontexts with different implementations:

PropertyFactory myPropertyFactory = ...
ExecutionContext contextWithMyPropertyFactory = context.with(myPropertyFactory);

and

ValueFactories myValueFactories = ...
ExecutionContext contextWithMyValueFactories = context.with(myValueFactories);

Of course, implementing your own factories is a pretty advanced topic, and it will likely be something you do not need to do in your application.
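
The with(...) calls shown above all follow the same pattern: the original context is left untouched, and a new context is returned with one component replaced. The following is a minimal, self-contained sketch of that pattern only. MiniContext and its string-valued components are hypothetical stand-ins for illustration, not ModeShape classes:

```java
/** Hypothetical stand-in for an immutable context; not a ModeShape class. */
final class MiniContext {
    private final String textExtractor;   // stands in for a TextExtractor
    private final String propertyFactory; // stands in for a PropertyFactory

    MiniContext(String textExtractor, String propertyFactory) {
        this.textExtractor = textExtractor;
        this.propertyFactory = propertyFactory;
    }

    /** Returns a new context that differs only in its text extractor. */
    MiniContext withTextExtractor(String replacement) {
        return new MiniContext(replacement, this.propertyFactory);
    }

    /** Returns a new context that differs only in its property factory. */
    MiniContext withPropertyFactory(String replacement) {
        return new MiniContext(this.textExtractor, replacement);
    }

    String textExtractor() { return textExtractor; }

    String propertyFactory() { return propertyFactory; }
}
```

Because every field is final and each "with" method builds a new object, code that holds the original context never observes the replacement.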

In this chapter, we introduced the ExecutionContext as a representation of the environment in which many of the ModeShape components operate. ExecutionContext provides a very simple but powerful way to inject commonly-needed facilities throughout the system.

In the next chapter, we'll dive into the Graph API and introduce the notions of nodes, paths, names, and properties, which are used throughout ModeShape.

One of the central concepts within ModeShape is that of its graph model. Information is structured into a hierarchy of nodes with properties, where nodes in the hierarchy are identified by their path (and/or identifier properties). Properties are identified by a name that incorporates a namespace and local name, and contain one or more property values consisting of normal Java strings, names, paths, URIs, booleans, longs, doubles, decimals, binary content, dates, UUIDs, references to other nodes, or any other serializable object.

This graph model is used throughout ModeShape: it forms the basis for the connector framework, it is used by the sequencing framework for the generated output, and it is what the JCR implementation uses internally to access and operate on the repository content.

Therefore, this chapter provides information that is essential to understanding how the connectors, sequencers, and other ModeShape features work.

ModeShape uses names to identify quite a few different types of objects. As we'll soon see, each property of a node is given by a name, and each segment in a path is comprised of a name. Therefore, names are a very important concept.

ModeShape names consist of a local part that is qualified with a namespace. The local part can consist of any character, and the namespace is identified by a URI. Namespaces were introduced in the previous chapter and are managed by the ExecutionContext's namespace registry. Namespaces help reduce the risk of clashes between names that have the same local part.

All names are immutable, which means that once a Name object is created, it will never change. This characteristic makes it much easier to write thread-safe code - the objects never change and therefore require no locks or synchronization to guarantee atomic reads. This is a technique that is more and more often found in newer languages and frameworks that simplify concurrent operations.

Name is also an interface rather than a concrete class:

@Immutable
public interface Name extends Comparable<Name>, Serializable, Readable {

    /**
     * Get the local name part of this qualified name.
     * @return the local name; never null
     */
    String getLocalName();

    /**
     * Get the URI for the namespace used in this qualified name.
     * @return the URI; never null but possibly empty
     */
    String getNamespaceUri();
}

This means that you need to use a factory to create Name instances.

The use of a factory may seem like a disadvantage and unnecessary complexity, but there actually are several benefits. First, it hides the concrete implementations, which is very appealing if an optimized implementation can be chosen for particular situations. Second, it simplifies the usage, since Name only has a few methods. Third, it allows the factory to cache or pool instances where appropriate to help conserve memory. Finally, the very same factory actually serves as a conversion mechanism from other forms. We'll see more of this later in this chapter, when we talk about other kinds of property values.

The factory for creating Name objects is called NameFactory and is available within the ExecutionContext, via the getValueFactories() method.

We'll see how names are used later on, but one more point to make: Name is both serializable and comparable, and all implementations should support equals(...) and hashCode() so that Name can be used as a key in a hash-based map. Name also extends the Readable interface, which we'll learn more about later in this chapter.
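
To make the equals(...), hashCode(), and comparison requirements concrete, here is a minimal sketch of an immutable qualified name used as a key in a hash-based map. SimpleName and SimpleNameDemo are hypothetical classes written for this illustration (they are not ModeShape's implementation, and real Name instances come from the NameFactory), and the ordering used in compareTo is an arbitrary choice for the sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified qualified name; not ModeShape's implementation. */
final class SimpleName implements Comparable<SimpleName> {
    private final String namespaceUri;
    private final String localName;

    SimpleName(String namespaceUri, String localName) {
        this.namespaceUri = Objects.requireNonNull(namespaceUri);
        this.localName = Objects.requireNonNull(localName);
    }

    String getNamespaceUri() { return namespaceUri; }

    String getLocalName() { return localName; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof SimpleName)) return false;
        SimpleName that = (SimpleName) obj;
        return namespaceUri.equals(that.namespaceUri) && localName.equals(that.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespaceUri, localName);
    }

    /** Orders by namespace URI first, then by local name (an assumption of this sketch). */
    @Override
    public int compareTo(SimpleName that) {
        int diff = namespaceUri.compareTo(that.namespaceUri);
        return diff != 0 ? diff : localName.compareTo(that.localName);
    }
}

final class SimpleNameDemo {
    /** Two equal names that were created separately address the same map entry. */
    static String lookup() {
        Map<SimpleName, String> values = new HashMap<>();
        values.put(new SimpleName("http://example.com/ns", "title"), "Reference Guide");
        return values.get(new SimpleName("http://example.com/ns", "title"));
    }
}
```

Since the fields never change after construction, the hash code is stable for the lifetime of the object, which is what makes such a name safe to use as a map key.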

Another important concept in ModeShape's graph model is that of a path, which provides a way of locating a node within a hierarchy. ModeShape's Path object is an immutable ordered sequence of Path.Segment objects. A small portion of the interface is shown here:

@Immutable
public interface Path extends Comparable<Path>, Iterable<Path.Segment>, Serializable, Readable {

    /**
     * Return the number of segments in this path.
     * @return the number of path segments
     */
    public int size();

    /**
     * Return whether this path represents the root path.
     * @return true if this path is the root path, or false otherwise
     */
    public boolean isRoot();

    /**
     * {@inheritDoc}
     */
    public Iterator<Path.Segment> iterator();

    /**
     * Obtain a copy of the segments in this path. None of the segments are encoded.
     * @return the array of segments as a copy
     */
    public Path.Segment[] getSegmentsArray();

    /**
     * Get an unmodifiable list of the path segments.
     * @return the unmodifiable list of path segments; never null
     */
    public List<Path.Segment> getSegmentsList();
    /**
     * Get the last segment in this path.
     * @return the last segment, or null if the path is empty
     */
    public Path.Segment getLastSegment();

    /**
     * Get the segment at the supplied index.
     * @param index the index
     * @return the segment
     * @throws IndexOutOfBoundsException if the index is out of bounds
     */
    public Path.Segment getSegment( int index );

    /**
     * Return an iterator that walks the paths from the root path down to this path. This method 
     * always returns at least one path (the root returns an iterator containing itself).
     * @return the path iterator; never null
     */
    public Iterator<Path> pathsFromRoot();

    /**
     * Return a new path consisting of the segments starting at beginIndex index (inclusive). 
     * This is equivalent to calling path.subpath(beginIndex,path.size()).
     * @param beginIndex the beginning index, inclusive.
     * @return the specified subpath
     * @exception IndexOutOfBoundsException if the beginIndex is negative or larger 
     *            than the length of this Path object
     */
    public Path subpath( int beginIndex );

    /**
     * Return a new path consisting of the segments between the beginIndex index (inclusive)
     * and the endIndex index (exclusive).
     * @param beginIndex the beginning index, inclusive.
     * @param endIndex the ending index, exclusive.
     * @return the specified subpath
     * @exception IndexOutOfBoundsException if the beginIndex is negative, or 
     *            endIndex is larger than the length of this Path 
     *            object, or beginIndex is larger than endIndex.
     */
    public Path subpath( int beginIndex, int endIndex );

    ...
}   

There are actually quite a few methods (not shown above) for obtaining related paths: the path of the parent, the path of an ancestor, resolving a path relative to this path, normalizing a path (by removing "." and ".." segments), finding the lowest common ancestor shared with another path, etc. There are also a number of methods that compare the path with others, including determining whether a path is above, equal to, or below this path.
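
The semantics of a few of these operations can be sketched without any ModeShape classes. The example below is only an illustration under simplifying assumptions: a path is modeled as a plain list of segment strings, and PathSketch and its methods are hypothetical names, not part of the Path interface:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that models a path as a list of segment strings. */
final class PathSketch {

    /** Removes "." segments and resolves ".." against the preceding segment. */
    static List<String> normalize(List<String> segments) {
        List<String> result = new ArrayList<>();
        for (String segment : segments) {
            if (segment.equals(".")) continue;
            boolean canPop = !result.isEmpty() && !result.get(result.size() - 1).equals("..");
            if (segment.equals("..") && canPop) {
                result.remove(result.size() - 1);
            } else {
                result.add(segment); // a leading ".." of a relative path is kept
            }
        }
        return result;
    }

    /** Returns the longest shared leading run of segments (an empty list means the root). */
    static List<String> commonAncestor(List<String> a, List<String> b) {
        int limit = Math.min(a.size(), b.size());
        int i = 0;
        while (i < limit && a.get(i).equals(b.get(i))) i++;
        return new ArrayList<>(a.subList(0, i));
    }

    /** Segments from beginIndex (inclusive) to endIndex (exclusive). */
    static List<String> subpath(List<String> path, int beginIndex, int endIndex) {
        return new ArrayList<>(path.subList(beginIndex, endIndex));
    }
}
```

Each method returns a new list rather than modifying its argument, which mirrors the way the immutable Path methods return new Path objects.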

Each Path.Segment is an immutable pair of a Name and a same-name-sibling (SNS) index. When two sibling nodes have the same name, the first sibling will have an SNS index of "1" and the second will be given an SNS index of "2". (This mirrors the same-name-sibling index behavior of JCR paths.)

@Immutable
public static interface Path.Segment extends Cloneable, Comparable<Path.Segment>, Serializable, Readable 
{

    /**
     * Get the name component of this segment.
     * @return the segment's name
     */
    public Name getName();

    /**
     * Get the index for this segment, which will be 1 by default.
     * @return the index
     */
    public int getIndex();

    /**
     * Return whether this segment has an index that is not "1"
     * @return true if this segment has an index, or false otherwise.
     */
    public boolean hasIndex();

    /**
     * Return whether this segment is a self-reference (or ".").
     * @return true if the segment is a self-reference, or false otherwise.
     */
    public boolean isSelfReference();

    /**
     * Return whether this segment is a reference to a parent (or "..")
     * @return true if the segment is a parent-reference, or false otherwise.
     */
    public boolean isParentReference();
}
		

Like Name, the only way to create a Path or a Path.Segment is to use the PathFactory, which is available within the ExecutionContext via the getValueFactories() method.
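
To make the same-name-sibling index more concrete, here is a small sketch of a segment that follows the JCR path notation, where the index appears in square brackets and an index of 1 is normally left out. SegmentSketch is a hypothetical class with a plain string name; real segments are created with the PathFactory and carry a Name:

```java
/** Hypothetical, simplified path segment; not ModeShape's Path.Segment. */
final class SegmentSketch {
    private final String name;
    private final int index; // same-name-sibling index, starting at 1

    SegmentSketch(String name, int index) {
        if (index < 1) throw new IllegalArgumentException("SNS index starts at 1");
        this.name = name;
        this.index = index;
    }

    String getName() { return name; }

    int getIndex() { return index; }

    /** True only when the index is something other than the default of 1. */
    boolean hasIndex() { return index != 1; }

    /** JCR-style string form: the index is written only when it is not 1. */
    String getString() {
        return hasIndex() ? name + "[" + index + "]" : name;
    }

    /** Parses "name" or "name[n]" (no escaping is handled in this sketch). */
    static SegmentSketch parse(String text) {
        int open = text.lastIndexOf('[');
        if (open > 0 && text.endsWith("]")) {
            int parsedIndex = Integer.parseInt(text.substring(open + 1, text.length() - 1));
            return new SegmentSketch(text.substring(0, open), parsedIndex);
        }
        return new SegmentSketch(text, 1);
    }
}
```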

The ModeShape graph model allows nodes to hold multiple properties, where each property is identified by a unique Name and may have one or more values. Like many of the other classes used in the graph model, Property is an immutable object that, once constructed, can never be changed and therefore provides a consistent snapshot of the state of a property as it existed at the time it was read.

ModeShape properties can hold a wide range of value objects, including normal Java strings, names, paths, URIs, booleans, longs, doubles, decimals, binary content, dates, UUIDs, references to other nodes, or any other serializable object. All but three of these are the standard Java classes: dates are represented by an immutable DateTime class; binary content is represented by an immutable Binary interface patterned after the interface of the same name in JSR-283; and Reference is an immutable interface patterned after the corresponding interface in JSR-170 and JSR-283.

The Property interface defines methods for obtaining the name and property values:

@Immutable
public interface Property extends Iterable<Object>, Comparable<Property>, Readable {

    /**
     * Get the name of the property.
     * 
     * @return the property name; never null
     */
    Name getName();

    /**
     * Get the number of actual values in this property.
     * @return the number of actual values in this property; always non-negative
     */
    int size();

    /**
     * Determine whether the property currently has multiple values.
     * @return true if the property has multiple values, or false otherwise.
     */
    boolean isMultiple();

    /**
     * Determine whether the property currently has a single value.
     * @return true if the property has a single value, or false otherwise.
     */
    boolean isSingle();

    /**
     * Determine whether this property has no actual values. This method may return true 
     * regardless of whether the property has a single value or multiple values.
     * This method is a convenience method that is equivalent to size() == 0.
     * @return true if this property has no values, or false otherwise
     */
    boolean isEmpty();

    /**
     * Obtain the property's first value in its natural form. This is equivalent to calling
     * isEmpty() ? null : iterator().next()
     * @return the first value, or null if the property is {@link #isEmpty() empty}
     */
    Object getFirstValue();

    /**
     * Obtain the property's values in their natural form. This is equivalent to calling iterator().
     * A valid iterator is returned whether the property is single-valued or multi-valued.
     * The resulting iterator is immutable, and all property values are immutable.
     * @return an iterator over the values; never null
     */
    Iterator<?> getValues();

    /**
     * Obtain the property's values as an array of objects in their natural form.
     * A valid array is returned if the property is single-valued or multi-valued, or a
     * null value is returned if the property is {@link #isEmpty() empty}.
     * The resulting array is a copy, guaranteeing immutability for the property.
     * @return the array of values
     */
    Object[] getValuesAsArray();
}
		

Creating Property instances is done by using the PropertyFactory object owned by the ExecutionContext. This factory defines methods for creating properties with a Name and various representations of values, including variable-length arguments, arrays, Iterator, and Iterable.
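
The immutability guarantees described above come down to copying values on the way in and on the way out. The following is a minimal sketch of that idea, not ModeShape's implementation: PropertySketch is a hypothetical class that uses a plain string for the name and accepts variable-length arguments, in the spirit of the factory methods just mentioned:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified immutable property; not ModeShape's Property. */
final class PropertySketch implements Iterable<Object> {
    private final String name;
    private final List<Object> values;

    PropertySketch(String name, Object... values) {
        this.name = name;
        // Copy the values so later changes to the caller's array cannot leak in.
        this.values = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(values)));
    }

    String getName() { return name; }

    int size() { return values.size(); }

    boolean isEmpty() { return values.isEmpty(); }

    boolean isSingle() { return values.size() == 1; }

    boolean isMultiple() { return values.size() > 1; }

    Object getFirstValue() { return values.isEmpty() ? null : values.get(0); }

    /** The iterator is read-only because the backing list is unmodifiable. */
    @Override
    public Iterator<Object> iterator() { return values.iterator(); }

    /** Returns a fresh copy each time, so callers cannot alter the property. */
    Object[] getValuesAsArray() { return values.toArray(); }
}
```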

When it comes to using the property values, ModeShape takes a non-traditional approach. Many other graph models (including JCR) mark each property with a data type and then require that all property values adhere to this data type. When the property values are obtained, they are guaranteed to be of the correct type. However, the property's data type often does not match the data type expected by the caller, so a conversion is required and has to be coded.

The ModeShape graph model takes a different tack. Because callers almost always have to convert the values to the types they can handle, ModeShape skips the steps of associating the Property with a data type and ensuring the values match. Instead, ModeShape simply provides a very easy mechanism to convert the property values to the type desired by the caller. In fact, the conversion mechanism is exactly the same as the factories that create the values in the first place.

ModeShape properties can hold a variety of value object types: strings, names, paths, URIs, booleans, longs, doubles, decimals, binary content, dates, UUIDs, references to other nodes, or any other serializable object. To assist in the creation of these values and conversion into other types, ModeShape defines a ValueFactory interface. This interface is parameterized with the type of value that is being created, but defines methods for creating those values from all of the other known value types:

public interface ValueFactory<T> {

    /**
     * Get the PropertyType of values created by this factory.
     * @return the value type; never null
     */
    PropertyType getPropertyType();

    /*
     * Methods to create a value by converting from another value type.
     * If the supplied value is the same type as returned by this factory,
     * these methods simply return the supplied value.
     * All of these methods throw a ValueFormatException if the supplied value
     * could not be converted to this type.
     */
    T create( String value ) throws ValueFormatException;
    T create( String value, TextDecoder decoder ) throws ValueFormatException;
    T create( int value ) throws ValueFormatException;
    T create( long value ) throws ValueFormatException;
    T create( boolean value ) throws ValueFormatException;
    T create( float value ) throws ValueFormatException;
    T create( double value ) throws ValueFormatException;
    T create( BigDecimal value ) throws ValueFormatException;
    T create( Calendar value ) throws ValueFormatException;
    T create( Date value ) throws ValueFormatException;
    T create( DateTime value ) throws ValueFormatException;
    T create( Name value ) throws ValueFormatException;
    T create( Path value ) throws ValueFormatException;
    T create( Reference value ) throws ValueFormatException;
    T create( URI value ) throws ValueFormatException;
    T create( UUID value ) throws ValueFormatException;
    T create( byte[] value ) throws ValueFormatException;
    T create( Binary value ) throws ValueFormatException, IoException;
    T create( InputStream stream, long approximateLength ) throws ValueFormatException, IoException;
    T create( Reader reader, long approximateLength ) throws ValueFormatException, IoException;
    T create( Object value ) throws ValueFormatException, IoException;

    /*
     * Methods to create an array of values by converting from another array of values.
     * If the supplied values are the same type as returned by this factory,
     * these methods simply return the supplied array.
     * All of these methods throw a ValueFormatException if the supplied values
     * could not be converted to this type.
     */
    T[] create( String[] values ) throws ValueFormatException;
    T[] create( String[] values, TextDecoder decoder ) throws ValueFormatException;
    T[] create( int[] values ) throws ValueFormatException;
    T[] create( long[] values ) throws ValueFormatException;
    T[] create( boolean[] values ) throws ValueFormatException;
    T[] create( float[] values ) throws ValueFormatException;
    T[] create( double[] values ) throws ValueFormatException;
    T[] create( BigDecimal[] values ) throws ValueFormatException;
    T[] create( Calendar[] values ) throws ValueFormatException;
    T[] create( Date[] values ) throws ValueFormatException;
    T[] create( DateTime[] values ) throws ValueFormatException;
    T[] create( Name[] values ) throws ValueFormatException;
    T[] create( Path[] values ) throws ValueFormatException;
    T[] create( Reference[] values ) throws ValueFormatException;
    T[] create( URI[] values ) throws ValueFormatException;
    T[] create( UUID[] values ) throws ValueFormatException;
    T[] create( byte[][] values ) throws ValueFormatException;
    T[] create( Binary[] values ) throws ValueFormatException, IoException;
    T[] create( Object[] values ) throws ValueFormatException, IoException;

    /**
     * Create an iterator over the values (of an unknown type). The factory converts any 
     * values as required.  This is useful when wanting to iterate over the values of a property,
     * where the resulting iterator exposes the desired type.
     * @param values the values
     * @return the iterator of type T over the values, or null if the supplied parameter is null
     * @throws ValueFormatException if the conversion from an iterator of objects could not be performed
     * @throws IoException If an unexpected problem occurs during the conversion.
     */
    Iterator<T> create( Iterator<?> values ) throws ValueFormatException, IoException;
    Iterable<T> create( Iterable<?> valueIterable ) throws ValueFormatException, IoException;
}
	

This makes it very easy to convert one or more values (of any type, including mixtures) into corresponding value(s) that are of the desired type. For example, converting the first value of a property (regardless of type) to a String is simple:

ValueFactory<String> stringFactory = ...
Property property = ...
String value = stringFactory.create( property.getFirstValue() );
		

Likewise, iterating over the values in a property and converting them is just as easy:

ValueFactory<String> stringFactory = ...
Property property = ...
for ( String value : stringFactory.create(property) ) {
    // do something with the values
}
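
The idea of converting on demand, rather than enforcing a type on the property, can be tried out with a much-reduced analogue. The sketch below is only an illustration: MiniValueFactory and MiniFactories are hypothetical names, only two target types are covered, and an IllegalArgumentException stands in for ModeShape's ValueFormatException:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, much-reduced analogue of a value factory. */
interface MiniValueFactory<T> {

    /** Converts a single value of any supported type to T. */
    T create(Object value);

    /** Converts every value in turn, so mixed input types are fine. */
    default List<T> createAll(Iterable<?> values) {
        List<T> result = new ArrayList<>();
        for (Object value : values) result.add(create(value));
        return result;
    }
}

final class MiniFactories {

    static final MiniValueFactory<String> STRING =
        value -> value == null ? null : value.toString();

    static final MiniValueFactory<Long> LONG = value -> {
        if (value == null) return null;
        if (value instanceof Long) return (Long) value; // already the right type
        if (value instanceof Number) return Long.valueOf(((Number) value).longValue());
        try {
            return Long.valueOf(value.toString().trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Cannot convert to long: " + value, e);
        }
    };
}
```

With these definitions, MiniFactories.LONG.createAll(...) accepts a list that mixes strings, integers, and decimals, and each caller simply picks the factory for the type it wants to work with.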
		

What we've glossed over so far, however, is how to obtain the correct ValueFactory for the desired type. If you remember back in the previous chapter, ExecutionContext has a getValueFactories() method that returns a ValueFactories interface.

This interface exposes a ValueFactory for each of the types, and even has methods to obtain a ValueFactory given the PropertyType enumeration. So, the previous examples could be expanded a bit:

ValueFactory<String> stringFactory = context.getValueFactories().getStringFactory();
Property property = ...
String value = stringFactory.create( property.getFirstValue() );
		

and

ValueFactory<String> stringFactory = context.getValueFactories().getStringFactory();
Property property = ...
for ( String value : stringFactory.create(property) ) {
    // do something with the values
}
		

You might have noticed that several of the ValueFactories methods return subinterfaces of ValueFactory. These add type-specific methods that are more commonly needed in certain cases. For example, here is the NameFactory interface:

public interface NameFactory extends ValueFactory<Name> {

    Name create( String namespaceUri, String localName );
    Name create( String namespaceUri, String localName, TextDecoder decoder );

    NamespaceRegistry getNamespaceRegistry();
}
		

and here is the DateTimeFactory interface, which adds methods for creating DateTime values for the current time as well as for specific instants in time:

public interface DateTimeFactory extends ValueFactory<DateTime> {

    /**
     * Create a date-time instance for the current time in the local time zone.
     */
    DateTime create();

    /**
     * Create a date-time instance for the current time in UTC.
     */
    DateTime createUtc();

    DateTime create( DateTime original, long offsetInMillis );
    DateTime create( int year, int monthOfYear, int dayOfMonth,
                     int hourOfDay, int minuteOfHour, int secondOfMinute, int millisecondsOfSecond );
    DateTime create( int year, int monthOfYear, int dayOfMonth,
                     int hourOfDay, int minuteOfHour, int secondOfMinute, int millisecondsOfSecond,
                     int timeZoneOffsetHours );
    DateTime create( int year, int monthOfYear, int dayOfMonth,
                     int hourOfDay, int minuteOfHour, int secondOfMinute, int millisecondsOfSecond,
                     int timeZoneOffsetHours, String timeZoneId );
}
		

The PathFactory interface defines methods for creating relative and absolute Path objects using combinations of other Path objects and Names and Path.Segments, and introduces methods for creating Path.Segment objects:

public interface PathFactory extends ValueFactory<Path> {

    Path createRootPath();
    Path createAbsolutePath( Name... segmentNames );
    Path createAbsolutePath( Path.Segment... segments );
    Path createAbsolutePath( Iterable<Path.Segment> segments );

    Path createRelativePath();
    Path createRelativePath( Name... segmentNames );
    Path createRelativePath( Path.Segment... segments );
    Path createRelativePath( Iterable<Path.Segment> segments );

    Path create( Path parentPath, Path childPath );
    Path create( Path parentPath, Name segmentName, int index );
    Path create( Path parentPath, String segmentName, int index );
    Path create( Path parentPath, Name... segmentNames );
    Path create( Path parentPath, Path.Segment... segments );
    Path create( Path parentPath, Iterable<Path.Segment> segments );
    Path create( Path parentPath, String subpath );

    Path.Segment createSegment( String segmentName );
    Path.Segment createSegment( String segmentName, TextDecoder decoder );
    Path.Segment createSegment( String segmentName, int index );
    Path.Segment createSegment( Name segmentName );
    Path.Segment createSegment( Name segmentName, int index );
}

And finally, the BinaryFactory defines methods for creating Binary objects from a variety of binary formats, as well as a method that looks for a cached Binary instance given the supplied secure hash:

public interface BinaryFactory extends ValueFactory<Binary> {

    /**
     * Create a value from the binary content given by the supplied input, the approximate length, 
     * and the SHA-1 secure hash of the content. If the secure hash is null, then a secure hash is
     * computed from the content. If the secure hash is not null, it is assumed to be the hash for 
     * the content and may not be checked.
     */
    Binary create( InputStream stream, long approximateLength, byte[] secureHash ) 
                          throws ValueFormatException, IoException;
    Binary create( Reader reader, long approximateLength, byte[] secureHash ) 
                          throws ValueFormatException, IoException;

    /**
     * Create a binary value from the given file.
     */
    Binary create( File file ) throws ValueFormatException, IoException;

    /**
     * Find an existing binary value given the supplied secure hash. If no such binary value exists, 
     * null is returned. This method can be used when the caller knows the secure hash (e.g., from 
     * a previously-held Binary object), and would like to reuse an existing binary value 
     * (if possible) rather than recreate the binary value by processing the stream contents. This is
     * especially true when the size of the binary is quite large.
     * 
     * @param secureHash the secure hash of the binary content, which was probably obtained from a
     *        previously-held Binary object; a null or empty value is allowed, but will always 
     *        result in returning null
     * @return the existing Binary value that has the same secure hash, or null if there is no 
     *        such value available at this time
     */
    Binary find( byte[] secureHash );
}
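
The relationship between binary content, its SHA-1 secure hash, and a lookup by hash can be sketched with the standard java.security.MessageDigest class. This is an illustration only, assuming a simple in-memory map as the cache: BinaryCacheSketch and its store(...) method are hypothetical, and say nothing about how ModeShape actually stores or caches Binary values:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory cache of binary content keyed by SHA-1 hash. */
final class BinaryCacheSketch {
    private final Map<String, byte[]> contentByHash = new HashMap<>();

    /** Computes the SHA-1 hash (20 bytes) of the supplied content. */
    static byte[] sha1(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }

    /** Hex form of the hash, used as the map key since arrays lack value equality. */
    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    /** Stores a copy of the content (if not already present) and returns its hash. */
    byte[] store(byte[] content) {
        byte[] hash = sha1(content);
        contentByHash.putIfAbsent(hex(hash), content.clone());
        return hash;
    }

    /** Returns a copy of the cached content, or null; a null or empty hash always yields null. */
    byte[] find(byte[] secureHash) {
        if (secureHash == null || secureHash.length == 0) return null;
        byte[] found = contentByHash.get(hex(secureHash));
        return found == null ? null : found.clone();
    }
}
```

A caller that kept the hash from an earlier store(...) call can ask find(...) for the content again without re-reading and re-hashing a potentially large stream.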
		

ModeShape provides efficient implementations of all of these interfaces: the ValueFactory interfaces and subinterfaces; the Path, Path.Segment, Name, Binary, DateTime, and Reference interfaces; and the ValueFactories interface returned by the ExecutionContext. In fact, some of these interfaces have multiple implementations that are optimized for specific but frequently-occurring conditions.

As shown above, the Name, Path.Segment, Path, and Property interfaces all extend the Readable interface, which defines a number of getString(...) methods that can produce a (readable) string representation of that object. Recall that all of these objects contain names with namespace URIs and local names (consisting of any characters), and so obtaining a readable string representation will require converting the URIs to prefixes, escaping certain characters in the local names, and formatting the prefix and escaped local name appropriately. The different getString(...) methods of the Readable interface accept various combinations of NamespaceRegistry and TextEncoder parameters:

@Immutable
public interface Readable {

    /**
     * Get the string form of the object. A default encoder is used to encode characters.
     * @return the encoded string
     */
    public String getString();

    /**
     * Get the encoded string form of the object, using the supplied encoder to encode characters.
     * @param encoder the encoder to use, or null if the default encoder should be used
     * @return the encoded string
     */
    public String getString( TextEncoder encoder );

    /**
     * Get the string form of the object, using the supplied namespace registry to convert any 
     * namespace URIs to prefixes. A default encoder is used to encode characters.
     * @param namespaceRegistry the namespace registry that should be used to obtain the prefix
     *        for any namespace URIs
     * @return the encoded string
     * @throws IllegalArgumentException if the namespace registry is null
     */
    public String getString( NamespaceRegistry namespaceRegistry );

    /**
     * Get the encoded string form of the object, using the supplied namespace registry to convert 
     * any namespace URIs to prefixes.
     * @param namespaceRegistry the namespace registry that should be used to obtain the prefix for 
     *        the namespace URIs
     * @param encoder the encoder to use, or null if the default encoder should be used
     * @return the encoded string
     * @throws IllegalArgumentException if the namespace registry is null
     */
    public String getString( NamespaceRegistry namespaceRegistry,
                             TextEncoder encoder );

    /**
     * Get the encoded string form of the object, using the supplied namespace registry to convert 
     * the names' namespace URIs to prefixes and the supplied encoder to encode characters, and using 
     * the second delimiter to encode (or convert) the delimiter used between the namespace prefix 
     * and the local part of any names.
     * @param namespaceRegistry the namespace registry that should be used to obtain the prefix 
     *        for the namespace URIs in the names
     * @param encoder the encoder to use for encoding the local part and namespace prefix of any names, 
     *        or null if the default encoder should be used
     * @param delimiterEncoder the encoder to use for encoding the delimiter between the local part 
     *        and namespace prefix of any names, or null if the standard delimiter should be used
     * @return the encoded string
     */
    public String getString( NamespaceRegistry namespaceRegistry,
                             TextEncoder encoder, TextEncoder delimiterEncoder );
}
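
The steps just described (look up the prefix for the namespace URI, encode the local name, and join the two) can be sketched as follows. This is a simplified illustration rather than ModeShape's code: ReadableNameSketch is a hypothetical class, a plain Map stands in for the NamespaceRegistry, and a UnaryOperator stands in for the TextEncoder:

```java
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical helper that formats a qualified name as "prefix:localName". */
final class ReadableNameSketch {

    static String getString(Map<String, String> prefixByUri,
                            UnaryOperator<String> encoder,
                            String namespaceUri,
                            String localName) {
        if (prefixByUri == null) {
            throw new IllegalArgumentException("The namespace registry may not be null");
        }
        String prefix = prefixByUri.get(namespaceUri);
        if (prefix == null) {
            throw new IllegalArgumentException("No prefix registered for " + namespaceUri);
        }
        String encodedLocal = encoder.apply(localName);
        // An empty prefix (the "no namespace" case) produces no delimiter.
        return prefix.isEmpty() ? encodedLocal : prefix + ":" + encodedLocal;
    }
}
```

For example, with the JCR namespace URI "http://www.jcp.org/jcr/1.0" mapped to the prefix "jcr", the local name "primaryType" is rendered as "jcr:primaryType".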
		

We've seen the NamespaceRegistry in the previous chapter, but we haven't yet talked about the TextEncoder interface. A TextEncoder merely does what you'd expect: it encodes the characters in a string using some implementation-specific algorithm. ModeShape provides a number of TextEncoder implementations, including:

  • The Jsr283Encoder escapes characters that are not allowed in JCR names, per the JSR-283 specification. Specifically, these are the '*', '/', ':', '[', ']', and '|' characters, which are escaped by replacing them with the Unicode characters U+F02A, U+F02F, U+F03A, U+F05B, U+F05D, and U+F07C, respectively.

  • The NoOpEncoder does no conversion.

  • The UrlEncoder converts text to be used within the different parts of a URL, as defined by Section 2.3 of RFC 2396. Note that this class does not encode a complete URL (since java.net.URLEncoder and java.net.URLDecoder should be used for such purposes).

  • The XmlNameEncoder converts any UTF-16 unicode character that is not a valid XML name character according to the World Wide Web Consortium (W3C) Extensible Markup Language (XML) 1.0 (Fourth Edition) Recommendation, escaping such characters as _xHHHH_, where HHHH stands for the four-digit hexadecimal UTF-16 unicode value for the character in the most significant bit first order. For example, the name "Customer ID" (which contains a space, U+0020) is encoded as "Customer_x0020_ID".

  • The XmlValueEncoder escapes characters that are not allowed in XML values. Specifically, these are the '&', '<', '>', '"', and ''' characters, which are escaped as '&amp;', '&lt;', '&gt;', '&quot;', and '&#039;', respectively.

  • The FileNameEncoder escapes characters that are not allowed in file names on Linux, OS X, or Windows XP. Unsafe characters are escaped as described in the UrlEncoder.

  • The SecureHashTextEncoder performs a secure hash of the input text and returns that hash as the encoded text. This encoder can be configured to use different secure hash algorithms and to return a fixed number of characters from the hash.

All of these classes also implement the TextDecoder interface, which defines a method that decodes an encoded string using the opposite transformation.

Of course, you can provide alternative implementations, and supply them to the appropriate getString(...) methods as required.
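
As a rough idea of what an encoder and its matching decoder involve, here is a self-contained sketch that applies the character mapping described above for the Jsr283Encoder: each of the six reserved characters is shifted into the Unicode private-use range by adding 0xF000. PrivateUseEncoderSketch is a hypothetical class that does not implement ModeShape's TextEncoder or TextDecoder interfaces, whose exact method signatures are not shown in this chapter:

```java
/** Hypothetical encoder/decoder pair based on the JSR-283 character mapping. */
final class PrivateUseEncoderSketch {

    /** The characters that are not allowed in JCR names. */
    private static final String RESERVED = "*/:[]|";

    /** Replaces each reserved character with its private-use counterpart (for example '/' becomes U+F02F). */
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(RESERVED.indexOf(c) >= 0 ? (char) (0xF000 + c) : c);
        }
        return sb.toString();
    }

    /** Applies the opposite transformation, leaving all other characters alone. */
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            boolean mapped = c >= 0xF000 && RESERVED.indexOf((char) (c - 0xF000)) >= 0;
            sb.append(mapped ? (char) (c - 0xF000) : c);
        }
        return sb.toString();
    }
}
```

The two methods are exact inverses for any input that does not already contain one of the six private-use characters, which is the property a decoder paired with an encoder needs to have.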

In addition to Path objects, nodes can be identified by one or more identification properties. These really are just Property instances with names that have a special meaning (usually to connectors). ModeShape also defines a Location class that encapsulates:

  • the Path to the node; or

  • one or more identification properties that are likely source-specific and that are represented with Property objects; or

  • a combination of both.

So, when a client knows the path and/or the identification properties, they can create a Location object and then use that to identify the node. Location is a class that can be instantiated through factory methods on the class:

public abstract class Location implements Iterable<Property>, Comparable<Location> {

    public static Location create( Path path ) { ... }
    public static Location create( UUID uuid ) { ... }
    public static Location create( Path path, UUID uuid ) { ... }
    public static Location create( Path path, Property idProperty ) { ... }
    public static Location create( Path path, Property firstIdProperty, 
                                     Property... remainingIdProperties ) { ... }
    public static Location create( Path path, Iterable<Property> idProperties ) { ... }
    public static Location create( Property idProperty ) { ... }
    public static Location create( Property firstIdProperty, 
                                     Property... remainingIdProperties ) { ... }
    public static Location create( Iterable<Property> idProperties ) { ... }
    public static Location create( List<Property> idProperties ) { ... }
    ...
}		

Like many of the other classes and interfaces, Location is immutable and cannot be changed once created. However, there are methods on the class to create a copy of the Location object with a different Path, a different UUID, or different identification properties:

public abstract class Location implements Iterable<Property>, Comparable<Location> {
    ...
    public Location with( Property newIdProperty );
    public Location with( Path newPath );
    public Location with( UUID uuid );
    ...
}		
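
To make the copy-on-change idea concrete, here is a minimal sketch of an immutable location with with(...) methods. SketchLocation is a hypothetical, much simplified stand-in (a String path and a UUID only); it is not ModeShape's Location class and ignores general identification properties.

```java
import java.util.UUID;

// Hypothetical, simplified stand-in for an immutable location.
final class SketchLocation {
    private final String path;   // may be null when only the UUID is known
    private final UUID uuid;     // may be null when only the path is known

    private SketchLocation(String path, UUID uuid) {
        if (path == null && uuid == null) {
            throw new IllegalArgumentException("A location needs a path, a UUID, or both");
        }
        this.path = path;
        this.uuid = uuid;
    }

    static SketchLocation create(String path) { return new SketchLocation(path, null); }
    static SketchLocation create(UUID uuid) { return new SketchLocation(null, uuid); }
    static SketchLocation create(String path, UUID uuid) { return new SketchLocation(path, uuid); }

    // The with(...) methods never modify this instance; each returns a new object.
    SketchLocation with(String newPath) { return new SketchLocation(newPath, uuid); }
    SketchLocation with(UUID newUuid) { return new SketchLocation(path, newUuid); }

    String getPath() { return path; }
    UUID getUuid() { return uuid; }
}
```

A caller that starts with only a path can therefore obtain a fuller location later without disturbing any code that still holds the original object.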

One more thing about locations: we'll see in the next chapter how they are used to make requests to the connectors. When creating the requests, clients usually have an incomplete location (e.g., a path but no identification properties). When processing the requests, connectors provide an actual location that contains the path and all identification properties. If the client reuses these actual Location objects in subsequent requests, the connectors will have both the path and the identification properties and may be able to locate the identified node more efficiently.

ModeShape's Graph API was designed as a lightweight public API for working with graph information. The Graph class is the primary class in the API, and each instance represents a single, independent view of a single graph. Graph instances don't maintain state, so every request (or batch of requests) operates against the underlying graph and then returns immutable snapshots of the requested state at the time the request was made.

There are several ways to obtain a Graph instance, as we'll see in later chapters. For the time being, the important thing to understand is what a Graph instance represents and how it interacts with the underlying content to return representations of portions of that underlying graph content.

The Graph class basically represents an internal domain specific language (DSL), designed to be easy to use in an application. The Graph API makes extensive use of interfaces and method chaining, so that methods return a concise interface that has only those methods that make sense at that point. In fact, this should be really easy if your IDE has code completion. Just remember that under the covers, a Graph is just building Request objects, submitting them to the connector, and then exposing the results.
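
The method-chaining style can be illustrated with a miniature, hypothetical API. None of the types below are ModeShape classes: FluentGraphSketch and its Into interface are invented for this sketch, and plain strings stand in for Request objects. The point is only that move(...) returns an interface with a single sensible next step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a method-chaining API.
final class FluentGraphSketch {

    // After move(...) or copy(...), the only available step is naming a destination.
    interface Into {
        FluentGraphSketch into(String destinationPath);
    }

    // Stands in for the requests that would be submitted to a connector.
    private final List<String> submittedRequests = new ArrayList<>();

    Into move(String sourcePath) {
        return destination -> submit("MOVE " + sourcePath + " -> " + destination);
    }

    Into copy(String sourcePath) {
        return destination -> submit("COPY " + sourcePath + " -> " + destination);
    }

    FluentGraphSketch delete(String path) {
        return submit("DELETE " + path);
    }

    private FluentGraphSketch submit(String request) {
        // A real graph would build a Request object and hand it to the connector here.
        submittedRequests.add(request);
        return this;
    }

    List<String> submittedRequests() {
        return new ArrayList<>(submittedRequests);
    }
}
```

With this shape, an IDE's code completion offers only into(...) after move(...), which is the effect the paragraph above describes.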

The next few subsections describe how to use a Graph instance.

ModeShape graphs have the notion of workspaces that provide different views of the content. Some graphs may have one workspace, while others may have multiple workspaces. Some graphs will allow a client to create new workspaces or destroy existing workspaces, while other graphs will not allow adding or removing workspaces. Some graphs may have workspaces that may show the same (or very similar) content, while other graphs may have workspaces that contain completely independent content.

The Graph object is always bound to a workspace, which initially is the default workspace. To find out what the name of the default workspace is, simply ask for the current workspace after creating the Graph:

Workspace current = graph.getCurrentWorkspace();

To obtain the list of workspaces available in a graph, simply ask for them:

Set<String> workspaceNames = graph.getWorkspaces();

Once you know the name of a particular workspace, you can specify that the graph should use it:

graph.useWorkspace("myWorkspace");

From this point forward, all requests will apply to the workspace named "myWorkspace". At any time, you can use a different workspace, which will affect all subsequent requests made using the graph. To go back to the default workspace, simply supply a null name:

graph.useWorkspace(null);

Of course, creating a new workspace is just as easy:

graph.createWorkspace().named("newWorkspace");

This will attempt to create a workspace named "newWorkspace", which will fail if that workspace already exists. You may want to create a new workspace with a name that should be altered if the name you supply is already used. The following code shows how you can do this:

graph.createWorkspace().namedSomethingLike("newWorkspace");

If there is no existing workspace named "newWorkspace", a new one will be created with this name. However, if "newWorkspace" already exists, this call will create a workspace with a name that is some alteration of the supplied name.
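
The text does not say how the supplied name is altered. One possible strategy, shown purely as an assumption, is to append a numeric suffix until an unused name is found. WorkspaceNameSketch is a hypothetical helper, not part of ModeShape.

```java
import java.util.Set;

// Hypothetical helper; the alteration ModeShape actually applies may differ.
final class WorkspaceNameSketch {

    // Return desiredName if it is free, otherwise desiredName plus the first free numeric suffix.
    static String somethingLike(String desiredName, Set<String> existingNames) {
        if (!existingNames.contains(desiredName)) {
            return desiredName;
        }
        int suffix = 1;
        while (existingNames.contains(desiredName + suffix)) {
            suffix++;
        }
        return desiredName + suffix;
    }
}
```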

You can also clone workspaces:

graph.createWorkspace().clonedFrom("original").named("something");

or

graph.createWorkspace().clonedFrom("original").namedSomethingLike("something");

As you can see, it's very easy to specify which workspace you want to use or to create new workspaces. You can also find out which workspace the graph is currently using:

String current = graph.getCurrentWorkspaceName();

or, if you want, you can get more information about the workspace:

Workspace current = graph.getCurrentWorkspace();
String name = current.getName();
Location rootLocation = current.getRoot();

Now let's switch to working with nodes. This first example returns a map of properties (keyed by property name) for a node at a specific Path:

Path path = ...
Map<Name,Property> propertiesByName = graph.getPropertiesByName().on(path);

This next example shows how the graph can be used to obtain and loop over the properties of a node:

Path path = ...
for ( Property property : graph.getProperties().on(path) ) {
    ...
}

Likewise, the next example shows how the graph can be used to obtain and loop over the children of a node:

Path path = ...
for ( Location child : graph.getChildren().of(path) ) {
    Path childPath = child.getPath();
    ...
}

Notice that the examples pass a Path instance to the on(...) and of(...) methods. Many of the Graph API methods accept a variety of parameter types, including String, Path, Location, UUID, and Property. This makes the API easy to use in many different situations.

Of course, changing content is more interesting. Here are a few examples:

Path path = ...
Location location = ...
Property idProp1 = ...
Property idProp2 = ...
UUID uuid = ...
graph.move(path).into(idProp1, idProp2);
graph.copy(path).into(location);
graph.delete(uuid);
graph.delete(idProp1,idProp2);

The methods shown above work immediately, as soon as each request is built. However, there is another way to use the Graph object, and that is in a batch mode. Simply create a Graph.Batch object using the batch() method, create the requests on that batch object, and then execute all of the commands on the batch by calling its execute() method. That execute() method returns a Results interface that can be used to read the node information retrieved by the batched requests.

Method chaining works really well with the batch mode, since multiple commands can be assembled together very easily:

Path path = ...
String path2 = ...
Location location = ...
Property idProp1 = ...
Property idProp2 = ...
UUID uuid = ...
UUID uuid2 = ...
// Queue three changes and submit them together
graph.batch().move(path).into(idProp1, idProp2)
       .and().copy(path2).into(location)
       .and().delete(uuid)
       .execute();
// Queue three reads and collect their results
Results results = graph.batch().read(path2)
                           .and().readChildren().of(idProp1,idProp2)
                           .and().readSubgraphOfDepth(3).at(uuid2)
                           .execute();
for ( Location child : results.getNode(path2) ) {
    ...
}
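
The essence of batch mode (queue requests, then process them all on execute()) can be sketched without ModeShape. BatchSketch below is a hypothetical class: a map of child paths stands in for the underlying content, and a plain map stands in for the Results interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of batch mode; not ModeShape's Graph.Batch or Results types.
final class BatchSketch {
    private final Map<String, List<String>> childrenByPath;   // stands in for the content
    private final List<String> pendingReads = new ArrayList<>();

    BatchSketch(Map<String, List<String>> childrenByPath) {
        this.childrenByPath = childrenByPath;
    }

    // Queue a read; nothing is sent to the content yet.
    BatchSketch readChildrenOf(String path) {
        pendingReads.add(path);
        return this;
    }

    // Purely for readability: batch.readChildrenOf(a).and().readChildrenOf(b)
    BatchSketch and() {
        return this;
    }

    int pendingCount() {
        return pendingReads.size();
    }

    // Process every queued request in order and return the collected results.
    Map<String, List<String>> execute() {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String path : pendingReads) {
            List<String> children = childrenByPath.get(path);
            List<String> copy = (children == null) ? new ArrayList<String>()
                                                   : new ArrayList<String>(children);
            results.put(path, copy);
        }
        pendingReads.clear();
        return results;
    }
}
```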

Of course, this section provided just a hint of the Graph API. The Graph class is actually quite complete and offers a full-featured approach for reading and updating a graph. For more information, see the Graph JavaDocs.

ModeShape Graph objects operate upon the underlying graph content, but we haven't really talked about how that works. Recall that the Graph objects don't maintain any stateful representation of the content, but instead submit requests to the underlying graph and return representations of the requested portions of the content. This section focuses on what those requests look like, since they'll actually become very important when working with connectors in the next chapter.

A graph Request is an encapsulation of a command that is to be executed by the underlying graph owner (typically a connector). Request objects can take many different forms, as there are different classes for each kind of request. Each request contains the information needed to complete the processing, and it also is the place where the results (or error) are recorded.

The Graph object creates the Request objects using Location objects to identify the node (or nodes) that are the subject of the request. The Graph can either submit the request immediately, or it can batch multiple requests together into "units of work". The submitted requests are then processed by the underlying system (e.g., connector) and returned back to the Graph object, which then extracts and returns the results.
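
The idea that a request carries its inputs and also records its results or error can be sketched as follows. Both classes are hypothetical stand-ins: ReadChildrenRequestSketch is far simpler than ModeShape's Request classes, and IllegalArgumentException is used here in place of PathNotFoundException so that the example needs no ModeShape types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical request type holding both the input and the outcome.
final class ReadChildrenRequestSketch {
    private final String parentPath;                           // input: the subject node
    private final List<String> children = new ArrayList<>();   // output: filled in by the processor
    private Throwable error;                                    // output: set on failure

    ReadChildrenRequestSketch(String parentPath) { this.parentPath = parentPath; }

    String parentPath() { return parentPath; }
    void addChild(String childPath) { children.add(childPath); }
    void setError(Throwable error) { this.error = error; }

    boolean hasError() { return error != null; }
    Throwable getError() { return error; }
    List<String> getChildren() { return Collections.unmodifiableList(children); }
}

// Hypothetical "connector" that processes a request and records the outcome on it.
final class InMemoryProcessorSketch {
    private final Map<String, List<String>> content;

    InMemoryProcessorSketch(Map<String, List<String>> content) { this.content = content; }

    void process(ReadChildrenRequestSketch request) {
        List<String> found = content.get(request.parentPath());
        if (found == null) {
            // Stand-in for setting a PathNotFoundException on the request.
            request.setError(new IllegalArgumentException("No node at " + request.parentPath()));
            return;
        }
        for (String child : found) {
            request.addChild(child);
        }
    }
}
```

The caller inspects the same object it submitted, checking hasError() before reading the results.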

There are actually quite a few different types of Request classes:

Table 3.1. Types of Read Requests

Name | Description
ReadNodeRequest | A request to read a node's properties and children from the named workspace in the source. The node may be specified by path and/or by identification properties. The connector returns all properties and the locations for all children, or sets a PathNotFoundException error on the request if the node does not exist in the workspace. If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties). The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
VerifyNodeExistsRequest | A request to verify the existence of a node at the specified location in the named workspace of the source. The connector returns the actual location for the node if it exists, or sets a PathNotFoundException error on the request if the node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadAllPropertiesRequest | A request to read all of the properties of a node from the named workspace in the source. The node may be specified by path and/or by identification properties. The connector returns all properties that were found on the node, or sets a PathNotFoundException error on the request if the node does not exist in the workspace. If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties). The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadPropertyRequest | A request to read a single property of a node from the named workspace in the source. The node may be specified by path and/or by identification properties, and the property is specified by name. The connector returns the property if found on the node, or sets a PathNotFoundException error on the request if the node or property does not exist in the workspace. If the node is found, the connector sets on the request the actual location of the node (including the path and identification properties). The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadAllChildrenRequest | A request to read all of the children of a node from the named workspace in the source. The node may be specified by path and/or by identification properties. The connector returns an ordered list of locations for each child found on the node, an empty list if the node has no children, or sets a PathNotFoundException error on the request if the node does not exist in the workspace. If the node is found, the connector sets on the request the actual location of the parent node (including the path and identification properties). The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadBlockOfChildrenRequest | A request to read a block of children of a node, starting with the nth child, from the named workspace in the source. This is designed to allow paging through the children, which is much more efficient for large numbers of children. The node may be specified by path and/or by identification properties, and the block is defined by a starting index and a count (i.e., the block size). The connector returns an ordered list of locations for each of the node's children found in the block, or an empty list if there are no children in that range. The connector also sets on the request the actual location of the parent node (including the path and identification properties) or sets a PathNotFoundException error on the request if the parent node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadNextBlockOfChildrenRequest | A request to read a block of children of a node, starting with the children that immediately follow a previously-returned child, from the named workspace in the source. This is designed to allow paging through the children, which is much more efficient for large numbers of children. The node may be specified by path and/or by identification properties, and the block is defined by the location of the node immediately preceding the block and a count (i.e., the block size). The connector returns an ordered list of locations for each of the node's children found in the block, or an empty list if there are no children in that range. The connector also sets on the request the actual location of the parent node (including the path and identification properties) or sets a PathNotFoundException error on the request if the parent node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
ReadBranchRequest | A request to read a portion of a subgraph that has as its root a particular node, up to a maximum depth. This request is an efficient mechanism when a branch (or part of a branch) is to be navigated and processed, and replaces some non-trivial code to read the branch iteratively using multiple ReadNodeRequests. The connector reads the branch to the specified maximum depth, returning the properties and children for all nodes found in the branch. The connector also sets on the request the actual location of the branch's root node (including the path and identification properties). The connector sets a PathNotFoundException error on the request if the node at the top of the branch does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
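
The two paging requests in the table can be illustrated with a small sketch. ChildBlockSketch is a hypothetical helper that works on a plain list of child names; it assumes zero-based starting indexes, which may not match the indexing ModeShape uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging helper illustrating "starting index plus count" and "next block after a child".
final class ChildBlockSketch {

    // Return up to 'count' children starting at 'startIndex', or an empty list past the end.
    static List<String> readBlock(List<String> allChildren, int startIndex, int count) {
        if (startIndex < 0 || count < 0) {
            throw new IllegalArgumentException("startIndex and count must not be negative");
        }
        if (startIndex >= allChildren.size()) {
            return new ArrayList<>();
        }
        // Use long arithmetic so that a very large count cannot overflow.
        int end = (int) Math.min((long) allChildren.size(), (long) startIndex + count);
        return new ArrayList<>(allChildren.subList(startIndex, end));
    }

    // Return up to 'count' children that immediately follow 'previousChild'.
    static List<String> readNextBlock(List<String> allChildren, String previousChild, int count) {
        int index = allChildren.indexOf(previousChild);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown child: " + previousChild);
        }
        return readBlock(allChildren, index + 1, count);
    }
}
```

A client pages by calling readBlock once and then passing the last child it received to readNextBlock until an empty list comes back.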

ChangeRequest is a subclass of Request that serves as the base class for all requests that change the content. As we'll see later, these ChangeRequest objects also get reused by the observation system.

Table 3.2. Types of Change Requests

Name | Description
CreateNodeRequest | A request to create a node at the specified location and to set on the new node the properties included in the request. The connector creates the node at the desired location, adjusting any same-name-sibling (SNS) indexes as required. (If an SNS index is provided in the new node's location, existing children with the same name after that SNS index will have their SNS indexes adjusted. However, if the requested location does not include an SNS index, the new node is added after all existing children, and its SNS index is set accordingly.) The connector also sets on the request the actual location of the new node (including the path and identification properties). The connector sets a PathNotFoundException error on the request if the parent node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
RemovePropertiesRequest | A request to remove a set of properties on an existing node. The request contains the location of the node as well as the names of the properties to be removed. The connector performs these changes and sets on the request the actual location (including the path and identification properties) of the node. The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
UpdatePropertiesRequest | A request to set or update properties on an existing node. The request contains the location of the node as well as the properties to be set and those to be deleted. The connector performs these changes and sets on the request the actual location (including the path and identification properties) of the node. The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
RenameNodeRequest | A request to change the name of a node. The connector changes the node's name, adjusts all SNS indexes accordingly, and returns the actual locations (including the path and identification properties) of both the original location and the new location. The connector sets a PathNotFoundException error on the request if the node does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
CopyBranchRequest | A request to copy a portion of a subgraph that has as its root a particular node, up to a maximum depth. The request includes the name of the workspace where the original node is located as well as the name of the workspace where the copy is to be placed (these may be the same or different). The connector copies the branch from the original location, up to the specified maximum depth, and places a copy of the node as a child of the new location. The connector also sets on the request the actual location (including the path and identification properties) of the original location as well as the location of the new copy. The connector sets a PathNotFoundException error on the request if the node at the top of the branch does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if one of the named workspaces does not exist.
MoveBranchRequest | A request to move a subgraph that has a particular node as its root. The connector moves the branch from the original location and places it as a child of the specified new location. The connector also sets on the request the actual location (including the path and identification properties) of the original and new locations. The connector will adjust SNS indexes accordingly. The connector sets a PathNotFoundException error on the request if the node that is to be moved or the new location does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
DeleteBranchRequest | A request to delete an entire branch specified by a single node's location. The connector deletes the specified node and all nodes below it, and sets the actual location, including the path and identification properties, of the node that was deleted. The connector sets a PathNotFoundException error on the request if the node being deleted does not exist in the workspace. The connector sets an InvalidWorkspaceException error on the request if the named workspace does not exist.
CompositeRequest | A request that actually comprises multiple requests (none of which will be a composite). The connector simply processes all of the requests in the composite request, but should set on the composite request any error (usually the first error) that occurs during processing of the contained requests.
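
The same-name-sibling handling described for CreateNodeRequest can be sketched on an ordered list of child names. SnsSketch is a hypothetical helper, not ModeShape code. It assumes 1-based SNS indexes and assumes that an existing sibling at or after the requested index moves up by one; because indexes are derived from list position, that adjustment happens implicitly.

```java
import java.util.List;

// Hypothetical helper; SNS indexes are derived from the order of same-named children.
final class SnsSketch {

    // 1-based SNS index of the child at the given list position.
    static int snsIndexAt(List<String> childNames, int position) {
        String name = childNames.get(position);
        int index = 0;
        for (int i = 0; i <= position; i++) {
            if (childNames.get(i).equals(name)) {
                index++;
            }
        }
        return index;
    }

    // No SNS index requested: add after all existing children and report the resulting index.
    static int append(List<String> childNames, String name) {
        childNames.add(name);
        return snsIndexAt(childNames, childNames.size() - 1);
    }

    // SNS index requested: insert so that the new child receives exactly that index.
    static void insertAt(List<String> childNames, String name, int snsIndex) {
        int seen = 0;
        for (int i = 0; i < childNames.size(); i++) {
            if (childNames.get(i).equals(name)) {
                seen++;
                if (seen == snsIndex) {
                    childNames.add(i, name);   // later same-named siblings shift up by one
                    return;
                }
            }
        }
        if (seen + 1 == snsIndex) {
            childNames.add(name);              // requested index is the next free one
            return;
        }
        throw new IllegalArgumentException("No valid position for " + name + "[" + snsIndex + "]");
    }
}
```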

There are also requests that deal with workspaces:


And there are also requests that deal with changing workspaces (and thus extend ChangeRequest):


Although there are over a dozen different kinds of requests, we do anticipate adding more in future releases. For example, ModeShape has recently added support for searching repository content in sources through an additional subclass of Request. Getting the version history for a node will likely be another kind of request added in an upcoming release.

This section covered the different kinds of Request classes. The next section describes an easy way to encapsulate how a component should respond to these requests, and after that we'll see how these Request objects are also used in the observation framework.

ModeShape connectors are typically the components that receive these Request objects. We'll dive deep into connectors in the next chapter, but before we do there is one more component related to Requests that should be discussed.

The RequestProcessor class is an abstract class that defines a process(...) method for each concrete Request subclass. In other words, there is a process(CompositeRequest) method, a process(ReadNodeRequest) method, and so on. This makes it easy to implement behavior that responds to the different kinds of Request classes: simply subclass RequestProcessor, implement all of the abstract methods, and optionally override any of the other methods that have a default implementation.

Note

The RequestProcessor abstract class contains default implementations for quite a few of the process(...) methods. These defaults behave correctly but are probably not the most efficient option for your source. If you can provide a more efficient implementation given your source, feel free to do so. However, if performance is not a big issue, the default implementations will provide the correct behavior. Keep things simple to start out; you can always provide better implementations later.
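
The pattern of one process(...) overload per request type, with correct but potentially slow defaults, can be sketched as follows. All of the types are hypothetical stand-ins declared for this example; which RequestProcessor methods are actually abstract, and how its defaults are implemented, is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request types, declared here so the sketch is self-contained.
final class ReadPropertiesSketch {
    final String path;
    final List<String> properties = new ArrayList<>();
    ReadPropertiesSketch(String path) { this.path = path; }
}

final class ReadChildrenSketch {
    final String path;
    final List<String> children = new ArrayList<>();
    ReadChildrenSketch(String path) { this.path = path; }
}

final class ReadNodeSketch {
    final String path;
    final List<String> properties = new ArrayList<>();
    final List<String> children = new ArrayList<>();
    ReadNodeSketch(String path) { this.path = path; }
}

// Hypothetical processor with one process(...) overload per request type.
abstract class ProcessorSketch {
    abstract void process(ReadPropertiesSketch request);
    abstract void process(ReadChildrenSketch request);

    // Default implementation: correct, but it costs two lower-level requests.
    void process(ReadNodeSketch request) {
        ReadPropertiesSketch readProperties = new ReadPropertiesSketch(request.path);
        ReadChildrenSketch readChildren = new ReadChildrenSketch(request.path);
        process(readProperties);
        process(readChildren);
        request.properties.addAll(readProperties.properties);
        request.children.addAll(readChildren.children);
    }
}

// Minimal subclass that implements only the abstract methods.
final class FixedContentProcessor extends ProcessorSketch {
    int calls = 0;   // counts how many primitive requests were processed

    @Override
    void process(ReadPropertiesSketch request) {
        calls++;
        request.properties.add("title");
    }

    @Override
    void process(ReadChildrenSketch request) {
        calls++;
        request.children.add(request.path + "/child");
    }
}
```

A source that can fetch properties and children in one round trip would override process(ReadNodeSketch) to avoid the two separate calls.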

The ModeShape graph model also incorporates an observation framework that allows components to register and be notified when changes occur within the content owned by a graph.

Many event frameworks define the listeners and sources as interfaces. While this is often useful, it requires that the implementations properly address the thread-safe semantics of managing and calling the listeners. The ModeShape observation framework uses abstract or concrete classes to minimize the effort required for implementing ChangeObserver or Observable. These abstract classes provide implementations for a number of utility methods (such as the unregister() method on ChangeObserver) that also save effort and code.

However, one of the more important reasons for providing classes is that ChangeObserver uses weak references to track the Observable instances, and the ChangeObservers class uses weak references for the listeners. This means that an observer does not prevent Observable instances from being garbage collected, nor do Observable instances prevent their observers from being garbage collected. These classes provide all of this functionality for free.

Any component that can have changes and be observed can implement the Observable interface. This interface allows Observers to register (or be registered) to receive notifications of the changes. However, a concrete and thread-safe implementation of this interface, called ChangeObservers, is available and should be used where possible, since it automatically manages the registered ChangeObserver instances and properly implements the register and unregister mechanisms.

Components that are to receive notifications of changes are called observers. To create an observer, simply extend the ChangeObserver abstract class and provide an implementation of the notify(Changes) method. Then, register the observer with an Observable using its register(ChangeObserver) method. The observer's notify(Changes) method will then be called with the changes that have been made to the Observable.

When an observer is no longer needed, it should be unregistered from all Observable instances with which it was registered. The ChangeObserver class automatically tracks which Observable instances it is registered with, and calling the observer's unregister() will unregister the observer from all of these Observables. Alternatively, an observer can be unregistered from a single Observable using the Observable's unregister(ChangeObserver) method.
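
The register and unregister mechanics, including the use of weak references in both directions, can be sketched as below. ObserverSketch, ObservableSketch, and RecordingObserverSketch are hypothetical classes written for this example; onChanges(String) plays the role of notify(Changes), and the real ChangeObserver and ChangeObservers classes may be implemented differently.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical observer base class that remembers where it has been registered.
abstract class ObserverSketch {
    // Weak references, so this observer does not keep its observables alive.
    private final List<WeakReference<ObservableSketch>> sources = new CopyOnWriteArrayList<>();

    abstract void onChanges(String changes);

    void registeredWith(ObservableSketch source) {
        sources.add(new WeakReference<>(source));
    }

    // Unregister from every observable that is still alive.
    void unregister() {
        for (WeakReference<ObservableSketch> ref : sources) {
            ObservableSketch source = ref.get();
            if (source != null) {
                source.unregister(this);
            }
        }
        sources.clear();
    }
}

// Hypothetical observable that holds its observers weakly.
final class ObservableSketch {
    private final List<WeakReference<ObserverSketch>> observers = new CopyOnWriteArrayList<>();

    synchronized boolean register(ObserverSketch observer) {
        for (WeakReference<ObserverSketch> ref : observers) {
            if (ref.get() == observer) {
                return false;   // already registered
            }
        }
        observers.add(new WeakReference<>(observer));
        observer.registeredWith(this);
        return true;
    }

    synchronized boolean unregister(ObserverSketch observer) {
        boolean removed = false;
        for (WeakReference<ObserverSketch> ref : observers) {
            if (ref.get() == observer) {
                observers.remove(ref);
                removed = true;
            }
        }
        return removed;
    }

    void broadcast(String changes) {
        for (WeakReference<ObserverSketch> ref : observers) {
            ObserverSketch observer = ref.get();
            if (observer == null) {
                observers.remove(ref);   // the observer was garbage collected
            } else {
                observer.onChanges(changes);
            }
        }
    }
}

// Simple observer that records what it is told.
final class RecordingObserverSketch extends ObserverSketch {
    final List<String> received = new ArrayList<>();

    @Override
    void onChanges(String changes) {
        received.add(changes);
    }
}
```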

The Changes class represents the set of individual changes that have been made during a single, atomic operation. Each Changes instance has information about the source of the changes, the timestamp at which the changes occurred, and the individual changes that were made. These individual changes take the form of ChangeRequest objects, which we'll see more of in the next chapter. Each request is frozen, meaning it is immutable and will not change. Also, none of the change requests will be marked as cancelled.

Using the actual ChangeRequest objects as the "events" has a number of advantages. First, the existing ChangeRequest subclasses already contain the information to accurately and completely describe the operation. Reusing these classes means we don't need a duplicate class structure or a generic event class.

Second, the requests have all the state required for an event, and often more. For example, the DeleteBranchRequest has the actual location of the branch that was deleted (and in this way is not much different from a more generic event), but the CreateNodeRequest has the actual location of the created node along with the properties of that node. Additionally, the RemovePropertyRequest has the actual location of the node along with the name of the property that was removed. In many cases, these requests carry everything a more general event class would, plus enough additional information that many observers can use them directly without having to read the graph to decide what actually changed.

Third, the requests that make up a Changes instance can actually be replayed. Consider the case of a cache that is backed by a RepositorySource, which might use an observer to keep the cache in sync. As the cache is notified of Changes, the cache can simply replay the changes against its source.
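
Replaying changes to keep a cache in sync can be sketched with a few hypothetical types. ChangeSketch and its two implementations stand in for ChangeRequest subclasses, and CacheSketch keeps a simple path-to-value map rather than a real RepositorySource.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical change type: each change knows how to apply itself to some content.
interface ChangeSketch {
    void applyTo(Map<String, String> content);
}

final class SetValueSketch implements ChangeSketch {
    private final String path;
    private final String value;

    SetValueSketch(String path, String value) {
        this.path = path;
        this.value = value;
    }

    @Override
    public void applyTo(Map<String, String> content) {
        content.put(path, value);
    }
}

final class DeleteBranchSketch implements ChangeSketch {
    private final String path;

    DeleteBranchSketch(String path) {
        this.path = path;
    }

    @Override
    public void applyTo(Map<String, String> content) {
        // Remove the node itself and everything below it.
        content.keySet().removeIf(key -> key.equals(path) || key.startsWith(path + "/"));
    }
}

// Hypothetical cache that stays in sync by replaying the changes it is notified of.
final class CacheSketch {
    private final Map<String, String> cached = new TreeMap<>();

    void replay(List<? extends ChangeSketch> changes) {
        for (ChangeSketch change : changes) {
            change.applyTo(cached);
        }
    }

    Map<String, String> snapshot() {
        return new TreeMap<>(cached);
    }
}
```

Because each change describes the operation completely, the cache never needs to re-read the original source to work out what happened.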

As we'll see in the next chapter, each connector is responsible for propagating the ChangeRequest objects to the connector's Observer. But that's not the only use of Observers. We'll also see later how the sequencing system uses Observers to monitor for changes in the graph content to determine which, if any, sequencers should be run. And, the JCR implementation also uses the observation framework to propagate those changes to JCR clients.

There is a lot of information stored in many different places: databases, repositories, SCM systems, registries, file systems, services, etc. The purpose of the federation engine is to allow applications to use the JCR API to access that information as if it were all stored in a single JCR repository, while leaving the information where it is.

Why not just copy or move the information into a JCR repository? Moving it is probably pretty difficult, since most likely there are existing applications that rely upon that information being where it is. All of those applications would break or have to change. And copying the information means that we'd have to continually synchronize the changes. This not only is a lot of work, but it often makes it difficult to know whether information is accurate and "the master" data.

ModeShape lets us leave information where it is, yet access it through the JCR API as if it were in one big repository. One major benefit is that existing applications that use the information in the original locations don't break, since they can keep using the information. But now our JCR clients can also access all the information, too. And if our federating ModeShape repository is configured to allow updates, JCR client applications can change the information in the repository and ModeShape will propagate those changes down to the original source, making those changes visible to all the other applications.

In short, all clients see the correct information, even when it changes in the underlying systems. But the JCR clients can get to all of the information in one spot, using one powerful standard API.

With ModeShape, your applications use the JCR 2.0 API to work with the repository, but the ModeShape repository transparently fetches the information from different kinds of repositories and storage systems, not just a single purpose-built store. This is fundamentally what makes ModeShape different.

How does ModeShape do this? At the heart of ModeShape and its JCR implementation is a simple graph-based connector system. Essentially, ModeShape's JCR implementation uses a single connector to access all content:


That single repository connector could access:


Really, the federated connector gives us all kinds of possibilities, since we can use that connector on top of lots of connectors to other individual sources. This simple connector architecture is fundamentally what makes ModeShape so powerful and flexible, especially when combined with a good library of connectors, which is what we're planning to create.

For instance, we want to build a connector to access existing relational databases so that some or all of the existing data (in whatever structure) can be accessed through JCR. For more information, check out our roadmap. Of course, if we don't have a connector to suit your needs, you can write your own.


It's even possible to put a different API layer on top of the connectors. For example, the new New I/O (JSR-203) API offers the opportunity to build new file system providers. This would be very straightforward to put on top of a JCR implementation, but it could be made even simpler by putting it on top of a ModeShape connector. In both cases, it'd be a trivial mapping from nodes that represent files and folders into JSR-203 files and directories, and events on those nodes could easily be translated into JSR-203 watch events. Then, simply choose a ModeShape connector and configure it to use the source you want to use.


Before we go further, let's define some terminology regarding connectors.

As an example, consider if we wanted ModeShape to give us access through JCR to the information contained in a relational database. We first have to develop a connector that allows us to interact with relational databases using JDBC. That connector would contain a JdbcAccessSource Java class that implements RepositorySource, and that has all of the various JavaBean properties for setting the name of the driver class, URL, username, password, and other properties. If we add a JavaBean property defining the JNDI name, our connector could look in JNDI to find a JDBC DataSource instance, perhaps already configured to use connection pools.

Note

Of course, before you develop a connector, you should probably check the list of connectors ModeShape already provides out of the box. And we've been adding new connectors with almost every release.

Our new connector might also have a JdbcAccessConnection Java class that implements the RepositoryConnection interface. This class would probably wrap a JDBC database connection, and would implement the execute(...) method such that the nodes exposed by the connector describe the database tables and their contents. For example, the connector might represent each database table as a node with the table's name, with properties that describe the table (e.g., the description, whether it's a temporary table), and with child nodes that represent rows in the table.

To use our connector in an application that uses ModeShape, we would need to create an instance of the JdbcAccessSource for each database instance that we want to access. If we have 3 MySQL databases, 9 Oracle databases, and 4 PostgreSQL databases, then we'd need to create a total of 16 JdbcAccessSource instances, each with the properties describing a single database instance. Those sources are then available for use by ModeShape components, including the JCR implementation.

So far, we've learned what connectors are and how they're used to establish connections to the underlying sources and access the content in those sources. Next we'll show how connectors expose the notion of workspaces, and describe how to create your own connectors.

There may come a time when you want to tackle creating your own connector. Maybe the connectors we provide out-of-the-box don't work with your source. Maybe you want to use a different cache system. Maybe you have a system that you want to make available through a ModeShape repository. Or, maybe you're a contributor and want to help us round out our library with a new connector. No matter what the reason, creating a new connector is pretty straightforward, as we'll see in this section.

Creating a custom connector involves the following steps:

  1. Create a Maven 3 project for your connector;

  2. Implement the RepositorySource interface, using JavaBean properties for each bit of information the implementation will need to establish a connection to the source system. Then, implement the RepositoryConnection interface with a class that represents a connection to the source. The execute(ExecutionContext, Request) method should process any and all requests that may come down the pike, and the results of each request can be put directly on that request. This approach is pretty straightforward, and gives you ultimate freedom in terms of your class structure.

    Alternatively, an easier way to get a complete read-write connector would be to extend one of our two abstract RepositorySource implementations. If the content your connector exposes has unique keys (such as a unique string, UUID or other identifier), consider implementing MapRepositorySource, subclassing MapRepository, and using the existing MapRepositoryConnection implementation. This MapRepositoryConnection does most of the work already, relying upon your MapRepository subclass for anything that might be source-specific. (See the JavaDoc for details.) Or, if the content your connector exposes is simply path-based, consider implementing PathRepositorySource, subclassing PathRepository, and using the existing PathRepositoryConnection implementation. Again, the PathRepositoryConnection class does almost all of the work and delegates to your PathRepository subclass for anything that might be source-specific. (See the JavaDoc for details.)

    Don't forget unit tests that verify that the connector is doing what it's expected to do. (If you'll be committing the connector code to the ModeShape project, please ensure that the unit tests can be run by others who may not have access to the source system. In this case, consider writing integration tests that can be easily configured to use different sources in different environments, and try to make the failure messages clear when the tests can't connect to the underlying source.)

  3. Configure ModeShape to use your connector. This may involve just registering the source with the RepositoryService, or it may involve adding a source to a configuration repository used by the federated repository.

  4. Deploy the JAR file with your connector (as well as any dependencies), and make them available to ModeShape in your application.

Let's go through each one of these steps in more detail.

The first step is to create the Maven 3 project that you can use to compile your code and build the JARs. Maven 3 automates a lot of the work, and since you're already set up to use Maven, using Maven for your project will save you a lot of time and effort. Of course, you don't have to use Maven 3, but then you'll have to get the required libraries and manage the compiling and building process yourself.

Note

ModeShape may provide in the future a Maven archetype for creating connector projects. If you'd find this useful and would like to help create it, please join the community.

In lieu of a Maven archetype, you may find it easier to start with a small existing connector project. The modeshape-connector-filesystem project is small and provides a good example of implementing a path-based repository. See the Git repository: http://github.com/ModeShape/modeshape//tree/modeshape-2.5.0.Final/extensions/modeshape-connector-filesystem/

You can create your Maven project any way you'd like. For examples, see the Maven 3 documentation. Once you've done that, just add the dependencies in your project's pom.xml dependencies section:



<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-graph</artifactId>
  <version>2.4.0.Final</version>
</dependency>
     

This is the only dependency required for compiling a connector - Maven pulls in all of the dependencies needed by the 'modeshape-graph' artifact. Of course, you'll still have to add dependencies for any library your connector needs to talk to its underlying system.

As for testing, you probably will want to add more dependencies, such as those listed here:



<!-- ModeShape-related unit testing utilities and classes -->
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-graph</artifactId>
  <version>2.4.0.Final</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-common</artifactId>
  <version>2.4.0.Final</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
<!-- Unit testing -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.4</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <version>1.8.4</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>1.1</version>
  <scope>test</scope>
</dependency>
<!-- Logging with Log4J -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.11</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.16</version>
  <scope>test</scope>
</dependency>
     

Testing ModeShape connectors does not require a JCR repository or the ModeShape services. (For more detail, see the testing section.) However, if you want to do integration testing with a JCR repository and the ModeShape services, you'll need additional dependencies (e.g., modeshape-repository and any other extensions).

At this point, your project should be set up correctly, and you're ready to move on to writing the Java implementation for your connector.

As mentioned earlier, a connector consists of the Java code that is used to access content from a system. Perhaps the most important class that makes up a connector is the implementation of the RepositorySource. This class is analogous to JDBC's DataSource in that it is instantiated to represent a single instance of a system that will be accessed, and it contains enough information (in the form of JavaBean properties) so that it can create connections to the source.

Why is the RepositorySource implementation a JavaBean? Well, this is the class that is instantiated, usually reflectively, and so a no-arg constructor is required. Using JavaBean properties makes it possible to reflect upon the object's class to determine the properties that can be set (using setters) and read (using getters). This means that an administrative application can instantiate, configure, and manage the objects that represent the actual sources, without having to know anything about the actual implementation.
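To make the reflective configuration idea concrete, here is a minimal sketch using only the JDK. It does not use any ModeShape classes: ExampleSource is a hypothetical stand-in for a RepositorySource implementation, and SourceConfigurer is a hypothetical helper that plays the role of an administrative tool, which knows nothing about the class beyond its JavaBean conventions.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical stand-in for a RepositorySource implementation; not a ModeShape class. */
class ExampleSource {
    private String name;
    private String url;
    private int retryLimit;

    public ExampleSource() {} // the no-arg constructor needed for reflective instantiation

    public String getName() { return name; }
    public void setName( String name ) { this.name = name; }
    public String getUrl() { return url; }
    public void setUrl( String url ) { this.url = url; }
    public int getRetryLimit() { return retryLimit; }
    public void setRetryLimit( int retryLimit ) { this.retryLimit = retryLimit; }
}

/** Hypothetical helper that configures any JavaBean without compile-time knowledge of it. */
class SourceConfigurer {

    /** Instantiate the class reflectively and apply each value through the matching setter. */
    static <T> T configure( Class<T> type, Map<String, Object> values ) {
        try {
            T source = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : values.entrySet()) {
                String name = entry.getKey();
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                findSetter(type, setter).invoke(source, entry.getValue());
            }
            return source;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unable to configure " + type.getName(), e);
        }
    }

    /** Find a public single-argument method with the given name. */
    private static Method findSetter( Class<?> type, String setterName ) {
        for (Method method : type.getMethods()) {
            if (method.getName().equals(setterName) && method.getParameterCount() == 1) return method;
        }
        throw new IllegalArgumentException("No setter named " + setterName + " on " + type.getName());
    }
}
```

A caller would pass a map such as name, url, and retryLimit and get back a fully configured instance. The same approach works for any class that follows the JavaBean conventions.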

So, your connector will need a public class that implements RepositorySource and provides JavaBean properties for any kind of inputs or options required to establish a connection to and interact with the underlying source. Most of the semantics of the class are defined by the RepositorySource and inherited interface. However, there are a few characteristics that are worth mentioning here.

The previous chapter talked about how connectors expose their information through the graph language of ModeShape. This is true, except that we didn't dive into too much of the detail. ModeShape graphs have the notion of workspaces in which the content appears, and it's very easy for clients using the graph to switch between workspaces. In fact, workspaces differ from each other in that they provide different views of the same information.

Consider a source control system, like SVN or CVS. These systems provide different views of the source code: a mainline development branch as well as other branches (or tags) commonly used for releases. So, just like one source file might appear in the mainline branch as well as the previous two release branches, a node in a repository source might appear in multiple workspaces.

However, each connector can decide how (or whether) it uses workspaces. For example, there may be no overlap in the content between workspaces. Or a connector might only expose a single workspace (in other words, there's only one "default" workspace).

When your RepositorySource instance is put into the library within a running ModeShape system, the initialize(RepositoryContext) method will be called on the instance. The supplied RepositoryContext object represents the context in which the RepositorySource is running, and provides access to an ExecutionContext, a RepositoryConnectionFactory that can be used to obtain connections to other sources, and an Observer of your source that should be called with events describing the Changes being made within the source, either as a result of ChangeRequest operations being performed on this source, or as a result of operations being performed on the content from outside the source.

Each connector is responsible for determining whether and how long ModeShape is to cache the content made available by the connector. This is referred to as the caching policy, and consists of a time-to-live (TTL) value representing the number of milliseconds that a piece of data may be cached. After the TTL has passed, the cached information is no longer used.

ModeShape allows a connector to use a flexible and powerful caching policy. First, each connection returns the default caching policy for all information returned by that connection. Often this policy can be configured via properties on the RepositorySource implementation. This is optional, meaning the connector can return null if it does not wish to have a default caching policy.

Second, the connector is able to override its default caching policy on individual requests (which we'll cover in the next section). Again, this is optional, meaning that a null caching policy on a request implies that the request has no overridden caching policy.

Third, if the connector has no default caching policy and none is set on the individual requests, ModeShape uses whatever caching policy is set up for that component using the connector. For example, the federating connector allows a default caching policy to be specified, and this policy is used should the sources being federated not define their own caching policy.

In summary, a connector has total control over whether and for how long the information it provides is cached.
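The three levels of precedence described above can be sketched as follows. This is an illustration only: SimpleCachePolicy and CachePolicies are hypothetical stand-ins (a policy here is reduced to a time to live in milliseconds), and treating a missing policy as "do not cache" is an assumption of this sketch rather than documented ModeShape behavior.

```java
/** Hypothetical, simplified caching policy: just a time to live in milliseconds. */
final class SimpleCachePolicy {
    private final long timeToLiveMillis;

    SimpleCachePolicy( long timeToLiveMillis ) {
        this.timeToLiveMillis = timeToLiveMillis;
    }

    long getTimeToLive() {
        return timeToLiveMillis;
    }
}

/** Hypothetical helper showing the order in which the policies are consulted. */
final class CachePolicies {

    /** Return the first non-null policy: the request's, then the connection's default, then the component's. */
    static SimpleCachePolicy effectivePolicy( SimpleCachePolicy onRequest,
                                              SimpleCachePolicy connectionDefault,
                                              SimpleCachePolicy componentDefault ) {
        if (onRequest != null) return onRequest;
        if (connectionDefault != null) return connectionDefault;
        return componentDefault;
    }

    /** True if data loaded at loadedAtMillis may still be used at nowMillis under the policy. */
    static boolean isStillUsable( SimpleCachePolicy policy, long loadedAtMillis, long nowMillis ) {
        if (policy == null) return false; // assumption of this sketch: no policy means no caching
        return nowMillis - loadedAtMillis < policy.getTimeToLive();
    }
}
```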

Note

At this time, not every connector takes advantage of cache policies. However, it is anticipated that this will change.

Sometimes it is necessary (or easier) for a RepositorySource implementation to look up an object in JNDI. One example of this is the JBoss Cache connector: while the connector can instantiate a new JBoss Cache instance, more interesting use cases involve JBoss Cache instances that are set up for clustering and replication, something that is generally difficult to configure in a single JavaBean. Therefore the JBossCacheSource has optional JavaBean properties that define how it is to look up a JBoss Cache instance in JNDI.

This is a simple pattern that you may find useful in your connector. Basically, if your source implementation can look up an object in JNDI, simply use a single JavaBean String property that defines the full name that should be used to locate that object in JNDI. Usually it's best to include "Jndi" in the JavaBean property name so that administrative users understand the purpose of the property. (And some may suggest that any optional property also use the word "optional" in the property name.)
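As an illustration of this pattern, the following sketch shows a source with a single optional String property that names an object in JNDI. The class name, the property name cacheJndiName, and the resolveCache method are all hypothetical; only javax.naming.Context and its lookup(String) method are real JDK API. When the property is not set, the sketch falls back to creating a new instance through a supplier.

```java
import java.util.function.Supplier;
import javax.naming.Context;
import javax.naming.NamingException;

/** Hypothetical source showing an optional JNDI name as a JavaBean property. */
class JndiAwareSourceSketch {
    private String cacheJndiName; // optional; null or blank means "create a new instance"

    public String getCacheJndiName() { return cacheJndiName; }
    public void setCacheJndiName( String cacheJndiName ) { this.cacheJndiName = cacheJndiName; }

    /**
     * Look up the object in JNDI if a name was configured; otherwise create a new instance.
     * The Context is passed in so that callers (and tests) control where lookups go.
     */
    Object resolveCache( Context jndi, Supplier<Object> newInstance ) {
        if (cacheJndiName == null || cacheJndiName.trim().isEmpty()) {
            return newInstance.get();
        }
        try {
            return jndi.lookup(cacheJndiName);
        } catch (NamingException e) {
            throw new IllegalStateException("Unable to find '" + cacheJndiName + "' in JNDI", e);
        }
    }
}
```

In an application server the Context would typically be a new javax.naming.InitialContext; passing it as a parameter simply keeps the sketch easy to exercise outside a container.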

Another characteristic of a RepositorySource implementation is that it provides some hint as to whether it supports several features. This is defined on the interface as a method that returns a RepositorySourceCapabilities object. This class currently provides methods that say whether the connector supports updates, whether it supports same-name-siblings (SNS), and whether the connector supports listeners and events.

Note that these may be hard-coded values, or the connector's response may be determined at runtime by various factors. For example, a connector may interrogate the underlying system to decide whether it can support updates.

The RepositorySourceCapabilities can be used as is (the class is immutable), or it can be subclassed to provide more complex behavior. It is important, however, that the capabilities remain constant throughout the lifetime of the RepositorySource instance.

Note

Why a concrete class and not an interface? By using a concrete class, connectors inherit the default behavior. If additional capabilities need to be added to the class in future releases, connectors may not have to override the defaults. This provides some insulation against future enhancements to the connector framework.

As we'll see in the next section, the main method that connectors use to process requests takes an ExecutionContext, which contains the JAAS security information of the subject performing the request. This means that the connector can use this to determine authentication and authorization information for each request.

Sometimes that is not sufficient. For example, it may be that the connector needs its own authorization information so that it can establish a connection (even if user-level privileges still use the ExecutionContext provided with each request). In this case, the RepositorySource implementation will probably need JavaBean properties that represent the connector's authentication information. This may take the form of a username and password, or it may be properties that are used to delegate authentication to JAAS. Either way, just realize that it's perfectly acceptable for the connector to require its own security properties.

One job of the RepositorySource implementation is to create connections to the underlying sources. Connections are represented by classes that implement the RepositoryConnection interface, and creating this class is the next step in writing a connector. This is what we'll cover in this section.

The RepositoryConnection interface is pretty straightforward:

/**
 * A connection to a repository source.
 *
 * These connections need not support concurrent operations by multiple threads.
 */
@NotThreadSafe
public interface RepositoryConnection {

    /**
     * Get the name for this repository source. This value should be the same as that returned
     * by the same RepositorySource that created this connection.
     * 
     * @return the identifier; never null or empty
     */
    String getSourceName();

    /**
     * Return the transactional resource associated with this connection. The transaction manager 
     * will use this resource to manage the participation of this connection in a distributed transaction.
     * 
     * @return the XA resource, or null if this connection is not aware of distributed transactions
     */
    XAResource getXAResource();

    /**
     * Ping the underlying system to determine if the connection is still valid and alive.
     * 
     * @param time the length of time to wait before timing out
     * @param unit the time unit to use; may not be null
     * @return true if this connection is still valid and can still be used, or false otherwise
     * @throws InterruptedException if the thread has been interrupted during the operation
     */
    boolean ping( long time, TimeUnit unit ) throws InterruptedException;

    /**
     * Get the default cache policy for this repository. If none is provided, a global cache policy
     * will be used.
     * 
     * @return the default cache policy
     */
    CachePolicy getDefaultCachePolicy();

    /**
     * Execute the supplied commands against this repository source.
     * 
     * @param context the environment in which the commands are being executed; never null
     * @param request the request to be executed; never null
     * @throws RepositorySourceException if there is a problem loading the node data
     */
    void execute( ExecutionContext context, Request request ) throws RepositorySourceException;

    /**
     * Close this connection to signal that it is no longer needed and that any accumulated 
     * resources are to be released.
     */
    void close();
}

While most of these methods are straightforward, a few warrant additional information. The ping(...) method allows ModeShape to check the connection to see if it is alive. This method can be used in a variety of situations, ranging from verifying that a RepositorySource's JavaBean properties are correct to ensuring that a connection is still alive before returning the connection from a connection pool.

The most important method on this interface, though, is the execute(...) method, which serves as the mechanism by which the component using the connector accesses and manipulates the content exposed by the connector. The first parameter to this method is the ExecutionContext, which contains the information about the environment as well as the subject performing the request. This was discussed earlier.

The second parameter, however, represents a Request that is to be processed by the connector. Request objects can take many different forms, as there are different classes for each kind of request (see the previous chapter for details). Each request contains the information a connector needs to do the processing, and it also is the place where the connector places the results (or the error, if one occurs).

A connector is technically free to implement the execute(...) method in any way, as long as the semantics are maintained. But as discussed in the previous chapter, ModeShape provides a RequestProcessor class that can simplify writing your own connector and at the same time help insulate your connector from new kinds of requests that may be added in the future. The RequestProcessor is an abstract class that defines a process(...) method for each concrete Request subclass. In other words, there is a process(CompositeRequest) method, a process(ReadNodeRequest) method, and so on.

To use this in your connector, simply create a subclass of RequestProcessor, overriding all of the abstract methods and optionally overriding any of the other methods that have a default implementation.

Note

The RequestProcessor abstract class contains default implementations for quite a few of the process(...) methods, and these will be sufficient but probably not efficient or optimum. If you can provide a more efficient implementation given your source, feel free to do so. However, if performance is not a big issue, all of the concrete methods will provide the correct behavior. Keep things simple to start out - you can always provide better implementations later.

Also, make sure your RequestProcessor is properly broadcasting the changes made during execution. The RequestProcessor class has a recordChange(ChangeRequest) method that can be called from each of the process(...) methods that take a ChangeRequest. The RequestProcessor enqueues these requests, and when the RequestProcessor is closed, the default implementation is to send a Changes to the Observer supplied to the constructor.

Then, in your connector's execute(ExecutionContext, Request) method, instantiate your RequestProcessor subclass and call its process(Request) method, passing in the execute(...) method's Request parameter. The RequestProcessor will determine the appropriate method given the actual Request object and will then invoke that method:

public void execute( final ExecutionContext context,
                     final Request request ) throws RepositorySourceException {
    // The source name is the one reported by the RepositorySource that created this connection.
    String sourceName = getSourceName();
    // Assumption: this connection kept the Observer obtained from the RepositoryContext
    // in a (hypothetical) 'observer' field when it was created.
    Observer observer = this.observer;
    RequestProcessor processor = new CustomRequestProcessor(sourceName, context, observer);
    try {
        processor.process(request);
    } finally {
        processor.close(); // sends the accumulated ChangeRequests as a Changes to the Observer
    }
}
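To show the dispatch-and-record idea in isolation, here is a self-contained sketch that does not use the ModeShape classes. Every type in it (SketchRequest, SketchReadRequest, SketchUpdateRequest, SketchObserver, SketchProcessor, InMemoryProcessor) is a hypothetical, heavily simplified stand-in, and the in-memory map plays the role of the underlying source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical base request: results and errors are stored on the request itself. */
abstract class SketchRequest {
    private Throwable error;
    void setError( Throwable error ) { this.error = error; }
    boolean hasError() { return error != null; }
}

class SketchReadRequest extends SketchRequest {
    final String path;
    String result;
    SketchReadRequest( String path ) { this.path = path; }
}

class SketchUpdateRequest extends SketchRequest {
    final String path;
    final String value;
    SketchUpdateRequest( String path, String value ) { this.path = path; this.value = value; }
}

/** Hypothetical observer that receives the batch of recorded changes. */
interface SketchObserver {
    void changed( List<SketchRequest> changes );
}

abstract class SketchProcessor {
    private final SketchObserver observer;
    private final List<SketchRequest> changes = new ArrayList<>();

    SketchProcessor( SketchObserver observer ) { this.observer = observer; }

    /** Dispatch on the concrete request type; unexpected failures are recorded on the request. */
    void process( SketchRequest request ) {
        try {
            if (request instanceof SketchReadRequest) process((SketchReadRequest)request);
            else if (request instanceof SketchUpdateRequest) process((SketchUpdateRequest)request);
            else throw new UnsupportedOperationException("Unknown request type");
        } catch (RuntimeException e) {
            request.setError(e);
        }
    }

    abstract void process( SketchReadRequest request );
    abstract void process( SketchUpdateRequest request );

    /** Remember a change so that it can be broadcast when the processor is closed. */
    void recordChange( SketchRequest change ) { changes.add(change); }

    /** Send the accumulated changes to the observer as one batch. */
    void close() {
        if (observer != null && !changes.isEmpty()) observer.changed(new ArrayList<>(changes));
        changes.clear();
    }
}

class InMemoryProcessor extends SketchProcessor {
    private final Map<String, String> store;

    InMemoryProcessor( Map<String, String> store, SketchObserver observer ) {
        super(observer);
        this.store = store;
    }

    @Override
    void process( SketchReadRequest request ) {
        String value = store.get(request.path);
        if (value == null) request.setError(new IllegalArgumentException("No node at " + request.path));
        else request.result = value;
    }

    @Override
    void process( SketchUpdateRequest request ) {
        store.put(request.path, request.value);
        recordChange(request); // only change requests are broadcast
    }
}
```

Note that a failed read places its error on the request rather than throwing, and that nothing reaches the observer until close() is called.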

If you do this, the bulk of your connector implementation may be in the RequestProcessor implementation methods. This not only is pretty maintainable, it also lends itself to easier testing. And should any new request types be added in the future, your connector may work just fine without any changes. In fact, if the RequestProcessor class can implement meaningful methods for those new request types, your connector may "just work". Or, at least your connector will still be binary compatible, even if your connector won't support any of the new features.

Finally, how should the connector handle exceptions? As mentioned above, each Request object has a slot where the connector can set any exception encountered during processing. This not only handles the exception, but in the case of CompositeRequests it also correctly associates the problem with the request. However, it is perfectly acceptable to throw an exception if the connection becomes invalid (e.g., there is a communication failure) or if a fatal error would prevent subsequent requests from being processed.

Many repositories are used (at least in part) to manage files and other artifacts, including service definitions, policy files, images, media, documents, presentations, application components, reusable libraries, configuration files, application installations, database schemas, management scripts, and so on. Unlocking the information buried within all of those files is what ModeShape sequencing is all about. As files are loaded into the repository, your ModeShape instance can automatically sequence these files to extract from their content meaningful information that can be stored in the repository, where it can then be searched, accessed, and analyzed using the JCR API.

Sequencers are just POJOs that implement a specific interface, and their job is to process a stream of data (supplied by ModeShape) to extract meaningful content that usually takes the form of a structured graph. Exactly what content is extracted is up to each sequencer implementation. For example, ModeShape comes with an image sequencer that extracts the simple metadata from different kinds of image files (e.g., JPEG, GIF, PNG, etc.). Another example is the Compact Node Definition (CND) sequencer that processes the CND files to extract and produce a structured representation of the node type definitions, property definitions, and child node definitions contained within the file.

Sequencers are configured to identify the kinds of nodes that the sequencers can work against. When content in the repository changes, ModeShape looks to see which (if any) sequencers might be able to run on the changed content. If any sequencer configurations do match, those sequencers are run against the content, and the structured graph output of the sequencers is then written back into the repository (at a location dictated by the sequencer configuration). And once that information is in the repository, it can be easily found and accessed via the standard JCR API.

In other words, ModeShape uses sequencers to help you extract more meaning from the artifacts you already are managing, and makes it much easier for applications to find and use all that valuable information. All without your applications doing anything extra.

The StreamSequencer interface defines the single method that must be implemented by a sequencer:

public interface StreamSequencer {

    /**
     * Sequence the data found in the supplied stream, placing the output 
     * information into the supplied map.
     *
     * @param stream the stream with the data to be sequenced; never null
     * @param output the output from the sequencing operation; never null
     * @param context the context for the sequencing operation; never null
     */
    void sequence( InputStream stream, SequencerOutput output, StreamSequencerContext context );
}

A new instance is created for each sequencing operation, so there is no need for the class to be synchronized or thread-safe. Additionally, when a sequencer configuration includes properties (see configuring a sequencer), ModeShape will set those properties on the StreamSequencer implementation using JavaBean-style setter methods. This makes it easy to define sequencer-specific properties on the sequencer configurations, while making it easy to implement with JavaBean-style setter methods.

Implementations are responsible for processing the content in the supplied InputStream and generating structured content using the supplied SequencerOutput interface. The StreamSequencerContext provides additional details about the information that is being sequenced, including the location and properties of the node being sequenced, the MIME type of the node being sequenced, and a Problems object where the sequencer can record problems that aren't severe enough to warrant throwing an exception. The StreamSequencerContext also provides access to the ValueFactories that can be used to create Path, Name, and any other value objects.

The SequencerOutput interface is fairly easy to use, and its job is to hide from the sequencer all the specifics about where the output is being written. Therefore, the interface has only a few methods for implementations to call. Two methods set the property values on a node, while the other sets references to other nodes in the repository. Use these methods to describe the properties of the nodes you want to create, using relative paths for the nodes and valid JCR property names for properties and references. ModeShape will ensure that nodes are created or updated whenever they're needed.

public interface SequencerOutput {

  /**
   * Set the supplied property on the supplied node.  The allowable
   * values are any of the following:
   *   - primitives (which will be autoboxed)
   *   - String instances
   *   - String arrays
   *   - byte arrays
   *   - InputStream instances
   *   - Calendar instances
   *
   * @param nodePath the path to the node containing the property; 
   * may not be null
   * @param property the name of the property to be set
   * @param values the value(s) for the property; may be empty if 
   * any existing property is to be removed
   */
  void setProperty( String nodePath, String property, Object... values );
  void setProperty( Path nodePath, Name property, Object... values );

  /**
   * Set the supplied reference on the supplied node.
   *
   * @param nodePath the path to the node containing the property; 
   * may not be null
   * @param property the name of the property to be set
   * @param paths the paths to the referenced property, which may be
   * absolute paths or relative to the sequencer output node;
   * may be empty if any existing property is to be removed
   */
  void setReference( String nodePath, String property, String... paths );
}
		

Note

ModeShape will create nodes of type nt:unstructured unless you specify the value for the jcr:primaryType property. You can also specify the values for the jcr:mixinTypes property if you want to add mixins to any node.
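The following toy example sketches how a sequencer might use the String-based setProperty(...) method. To stay self-contained it does not implement the real StreamSequencer interface: SketchOutput is a hypothetical stand-in that mirrors only that one method, RecordingOutput is a hypothetical in-memory implementation, and the "text:" property names are invented for illustration. A real sequencer would also receive a StreamSequencerContext and could record non-fatal problems there instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in mirroring the String-based setProperty(...) method shown above. */
interface SketchOutput {
    void setProperty( String nodePath, String property, Object... values );
}

/** Records each property in memory under the key "nodePath/@property" for inspection. */
class RecordingOutput implements SketchOutput {
    final Map<String, List<Object>> properties = new LinkedHashMap<>();

    @Override
    public void setProperty( String nodePath, String property, Object... values ) {
        properties.put(nodePath + "/@" + property, Arrays.asList(values));
    }
}

/** A toy sequencer that extracts line and word counts from a UTF-8 text stream. */
class TextStatsSequencer {

    void sequence( InputStream stream, SketchOutput output ) {
        long lines = 0;
        long words = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) words += trimmed.split("\\s+").length;
            }
        } catch (IOException e) {
            throw new IllegalStateException("Unable to read the content being sequenced", e);
        }
        // Relative node path; the primary type is set explicitly, as the note above describes.
        output.setProperty("text:stats", "jcr:primaryType", "nt:unstructured");
        output.setProperty("text:stats", "text:lineCount", lines);
        output.setProperty("text:stats", "text:wordCount", words);
    }
}
```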

Each sequencer must be configured to describe the areas or types of content that the sequencer is capable of handling. This is done by specifying these patterns using path expressions that identify the nodes (or node patterns) that should be sequenced and where to store the output generated by the sequencer. We'll see how to fully configure a sequencer in the next chapter, but before then let's dive into path expressions in more detail.

A path expression consists of two parts: a selection criterion (or an input path) and an output path:

  inputPath => outputPath 

The inputPath part defines an expression for the path of a node that is to be sequenced. Input paths consist of '/' separated segments, where each segment represents a pattern for a single node's name (including the same-name-sibling indexes) and '@' signifies a property name.

Let's first look at some simple examples:


With these simple examples, you can probably discern the most important rules. First, the '*' is a wildcard character that matches any character or sequence of characters in a node's name (or index if appearing in between square brackets), and can be used in conjunction with other characters (e.g., "*.txt").

Second, square brackets (i.e., '[' and ']') are used to match a node's same-name-sibling index. You can put a single non-negative number or a comma-separated list of non-negative numbers. Use '0' to match a node that has no same-name-siblings, or any positive number to match the specific same-name-sibling.

Third, combining two delimiters (e.g., "//") matches any sequence of nodes, regardless of what their names are or how many nodes there are. This is often used with other patterns to identify nodes at any level that match those patterns. Three or more sequential slash characters are treated as two.

Many input paths can be created using just these simple rules. However, input paths can be more complicated. Here are some more examples:


These examples show a few more advanced rules. Parentheses (i.e., '(' and ')') can be used to define a set of options for names, as shown in the first and third rules. Whatever part of the selected node's path appears between the parentheses is captured for use within the output path. Thus, the first input path in the previous table would match node "/a/b", and "b" would be captured and could be used within the output path using "$1", where the number used in the output path identifies the parentheses.

Square brackets can also be used to specify criteria on a node's properties or children. Whatever appears in between the square brackets does not appear in the selected node.

So far, we've talked about how input paths and output paths are independent of the repository and workspace. However, there are times when it's desirable to configure sequencers to only work against content in a specific source and/or specific workspace. In these cases, it is possible to specify the repository name and workspace names before the path. For example:


Again, the rules are pretty straightforward. You can leave off the repository name and workspace name, or you can prepend the path with "{sourceNamePattern}:{workspaceNamePattern}:", where "{sourceNamePattern}" is a regular-expression pattern used to match the applicable source names, and "{workspaceNamePattern}" is a regular-expression pattern used to match the applicable workspace names. A blank pattern implies any match, and is a shorthand notation for ".*". Note that the repository names may not include forward slashes ('/') or colons (':').

Let's go back to the previous code fragment and look at the first path expression:

  //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content[@jcr:data] => /images/$1 

This matches a node named "jcr:content" with property "jcr:data" but no siblings with the same name, and that is a child of a node whose name ends with ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", or ".png" that may have any same-name-sibling index. These nodes can appear at any level in the repository. Note how the input path captures the filename (the segment containing the file extension), including any same-name-sibling index. This filename is then used in the output path, which is where the sequenced content is placed.
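To make the capture-and-substitute behavior concrete, here is a minimal sketch that approximates this one path expression with a hand-written java.util.regex pattern. It is not how ModeShape evaluates path expressions; the class PathExpressionSketch is hypothetical, and the "[@jcr:data]" property criterion is not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathExpressionSketch {

    // Hand-written approximation of the input path
    //   //(*.(jpg|jpeg|gif|bmp|pcx|png)[*])/jcr:content
    // Group 1 is the file node's segment, including any same-name-sibling index.
    private static final Pattern INPUT = Pattern.compile(
        "(?:/[^/]+)*/([^/]*\\.(?:jpg|jpeg|gif|bmp|pcx|png)(?:\\[\\d+\\])?)/jcr:content");

    /** Returns the output path for a matching node path, or null if there is no match. */
    static String outputPathFor(String nodePath) {
        Matcher matcher = INPUT.matcher(nodePath);
        if (!matcher.matches()) return null;
        // "$1" in the output path is replaced by the first captured group
        return "/images/$1".replace("$1", matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(outputPathFor("/files/2010/photo.png/jcr:content"));
        System.out.println(outputPathFor("/files/logo.gif[2]/jcr:content"));
        System.out.println(outputPathFor("/files/readme.txt/jcr:content"));
    }
}
```

In this sketch a node at "/files/2010/photo.png/jcr:content" yields the output path "/images/photo.png", while a node under "readme.txt" is not selected at all.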

The current release of ModeShape comes with eleven sequencers. However, it's very easy to create your own sequencers and to then configure ModeShape to use them in your own application.

Creating a custom sequencer involves the following steps:

  1. Create a Maven 3 project for your sequencer;

  2. Implement the StreamSequencer interface with your own implementation, and create unit tests to verify the functionality and expected behavior;

  3. Add the sequencer configuration to the ModeShape SequencingService in your application as described in the previous chapter; and

  4. Deploy the JAR file with your implementation (as well as any dependencies), and make them available to ModeShape in your application.

It's that simple.

The first step is to create the Maven 3 project that you can use to compile your code and build the JARs. Maven 3 automates a lot of the work, and since you're already set up to use Maven, using Maven for your project will save you a lot of time and effort. Of course, you don't have to use Maven 3, but then you'll have to get the required libraries and manage the compiling and building process yourself.

Note

ModeShape may provide in the future a Maven archetype for creating sequencer projects. If you'd find this useful and would like to help create it, please join the community.

In lieu of a Maven archetype, you may find it easier to start with a small existing sequencer project. The modeshape-sequencer-images project is a small, self-contained sequencer implementation that has only the minimal dependencies. See the Git repository: http://github.com/ModeShape/modeshape//tree/modeshape-2.5.0.Final/extensions/modeshape-sequencer-images/

You can create your Maven project any way you'd like. For examples, see the Maven 3 documentation. Once you've done that, just add the dependencies in your project's pom.xml dependencies section:



<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-graph</artifactId>
  <version>2.4.0.Final</version>
</dependency>
     

These are the minimum dependencies required for compiling a sequencer. Of course, you'll have to add any other dependencies that your sequencer needs.

As for testing, you probably will want to add more dependencies, such as those listed here:



<!-- ModeShape-related unit testing utilities and classes -->
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-graph</artifactId>
  <version>2.4.0.Final</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-common</artifactId>
  <version>2.4.0.Final</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
<!-- Unit testing -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.4</version>
  <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.8.4</version>
    <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>1.1</version>
  <scope>test</scope>
</dependency>
<!-- Logging with Log4J -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.11</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.16</version>
  <scope>test</scope>
</dependency>    

Testing ModeShape sequencers does not require a JCR repository or the ModeShape services. (For more detail, see the testing section.) However, if you want to do integration testing with a JCR repository and the ModeShape services, you'll need additional dependencies for these libraries.



<!-- ModeShape JCR Repository -->
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-jcr</artifactId>
  <version>2.4.0.Final</version>
  <scope>test</scope>
</dependency>
<!-- Java Content Repository API -->
<dependency>
  <groupId>javax.jcr</groupId>
  <artifactId>jcr</artifactId>
  <version>2.0</version>
  <scope>test</scope>
</dependency>
     

At this point, your project should be set up correctly, and you're ready to move on to write your custom implementation of the StreamSequencer interface. As stated earlier, this should be fairly straightforward: process the stream and generate the output that's appropriate for the kind of file being sequenced.

Let's look at an example. Here is the complete code for the ImageMetadataSequencer implementation:

public class ImageMetadataSequencer implements StreamSequencer {

		/**
		 * {@inheritDoc}
		 * 
		 * @see StreamSequencer#sequence(InputStream, SequencerOutput, StreamSequencerContext)
		 */
		public void sequence( InputStream stream,
		                      SequencerOutput output,
		                      StreamSequencerContext context ) {

		    ImageMetadata metadata = new ImageMetadata();
		    metadata.setInput(stream);
		    metadata.setDetermineImageNumber(true);
		    metadata.setCollectComments(true);

		    // Process the image stream and extract the metadata ...
		    if (!metadata.check()) {
		        metadata = null;
		    }

		    // Generate the output graph if we found useful metadata ...
		    if (metadata != null) {
		        PathFactory pathFactory = context.getValueFactories().getPathFactory();
		        Path metadataNode = pathFactory.createRelativePath(ImageMetadataLexicon.METADATA_NODE);

		        // Place the image metadata into the output map ...
		        output.setProperty(metadataNode, JcrLexicon.PRIMARY_TYPE, "image:metadata");
		        // output.setProperty(metadataNode, nameFactory.create(IMAGE_MIXINS), "");
		        output.setProperty(metadataNode, JcrLexicon.MIMETYPE, metadata.getMimeType());
		        // output.setProperty(metadataNode, nameFactory.create(IMAGE_ENCODING), "");
		        output.setProperty(metadataNode, ImageMetadataLexicon.FORMAT_NAME, metadata.getFormatName());
		        output.setProperty(metadataNode, ImageMetadataLexicon.WIDTH, metadata.getWidth());
		        output.setProperty(metadataNode, ImageMetadataLexicon.HEIGHT, metadata.getHeight());
		        output.setProperty(metadataNode, ImageMetadataLexicon.BITS_PER_PIXEL, metadata.getBitsPerPixel());
		        output.setProperty(metadataNode, ImageMetadataLexicon.PROGRESSIVE, metadata.isProgressive());
		        output.setProperty(metadataNode, ImageMetadataLexicon.NUMBER_OF_IMAGES, metadata.getNumberOfImages());
		        output.setProperty(metadataNode, ImageMetadataLexicon.PHYSICAL_WIDTH_DPI, metadata.getPhysicalWidthDpi());
		        output.setProperty(metadataNode, ImageMetadataLexicon.PHYSICAL_HEIGHT_DPI, metadata.getPhysicalHeightDpi());
		        output.setProperty(metadataNode, ImageMetadataLexicon.PHYSICAL_WIDTH_INCHES, metadata.getPhysicalWidthInch());
		        output.setProperty(metadataNode, ImageMetadataLexicon.PHYSICAL_HEIGHT_INCHES, metadata.getPhysicalHeightInch());
		    }
		}
}

where the ImageMetadataLexicon class contains the Name constants and is defined as:

	/**
	 * A lexicon of names used within the image sequencer.
	 */
	@Immutable
	public class ImageMetadataLexicon {

	    public static class Namespace {
	        public static final String URI = "http://www.modeshape.org/images/1.0";
	        public static final String PREFIX = "image";
	    }

	    public static final Name METADATA_NODE = new BasicName(Namespace.URI, "metadata");
	    public static final Name FORMAT_NAME = new BasicName(Namespace.URI, "formatName");
	    public static final Name WIDTH = new BasicName(Namespace.URI, "width");
	    public static final Name HEIGHT = new BasicName(Namespace.URI, "height");
	    public static final Name BITS_PER_PIXEL = new BasicName(Namespace.URI, "bitsPerPixel");
	    public static final Name PROGRESSIVE = new BasicName(Namespace.URI, "progressive");
	    public static final Name NUMBER_OF_IMAGES = new BasicName(Namespace.URI, "numberOfImages");
	    public static final Name PHYSICAL_WIDTH_DPI = new BasicName(Namespace.URI, "physicalWidthDpi");
	    public static final Name PHYSICAL_HEIGHT_DPI = new BasicName(Namespace.URI, "physicalHeightDpi");
	    public static final Name PHYSICAL_WIDTH_INCHES = new BasicName(Namespace.URI, "physicalWidthInches");
	    public static final Name PHYSICAL_HEIGHT_INCHES = new BasicName(Namespace.URI, "physicalHeightInches");

	}

Notice how the image metadata is extracted and the output graph is generated. A single node is created with the name image:metadata and with the image:metadata node type. No mixins are defined for the node, but several properties are set on the node using the values obtained from the image metadata. After this method returns, the constructed graph will be saved to the repository in all of the places defined by its configuration. (This is why only relative paths are used in the sequencer.)

The sequencing framework was designed to make testing sequencers much easier. In particular, the StreamSequencer interface does not make use of the JCR API. So instead of requiring a fully-configured JCR repository and ModeShape system, unit tests for a sequencer can focus on testing that the content is processed correctly and the desired output graph is generated.

Note

For a complete example of a sequencer unit test, see the ImageMetadataSequencerTest unit test in the org.modeshape.sequencer.images package of the modeshape-sequencer-images project.

The following code fragment shows one way of testing a sequencer, using JUnit 4.4 assertions and some of the classes made available by ModeShape. Of course, this example code does not do any error handling and does not make all the assertions a real test would.

StreamSequencer sequencer = new ImageMetadataSequencer();
MockSequencerOutput output = new MockSequencerOutput();
MockSequencerContext context = new MockSequencerContext();
InputStream stream = null;
try {
    stream = this.getClass().getClassLoader().getResource("caution.gif").openStream();
    sequencer.sequence(stream,output,context);   // writes to 'output'
    assertThat(output.getPropertyValues("image:metadata", "jcr:primaryType"), 
               is(new Object[] {"image:metadata"}));
    assertThat(output.getPropertyValues("image:metadata", "jcr:mimeType"), 
               is(new Object[] {"image/gif"}));
    // ... make more assertions here
    assertThat(output.hasReferences(), is(false));
} finally {
    if (stream != null) stream.close();   // 'stream' is null if the resource could not be opened
}

It's also useful to test that a sequencer produces no output for something it should not understand:

StreamSequencer sequencer = new ImageMetadataSequencer();
MockSequencerOutput output = new MockSequencerOutput();
MockSequencerContext context = new MockSequencerContext();
InputStream stream = null;
try {
    stream = this.getClass().getClassLoader().getResource("caution.pict").openStream();
    sequencer.sequence(stream,output,context);   // writes to 'output'
    assertThat(output.hasProperties(), is(false));
    assertThat(output.hasReferences(), is(false));
} finally {
    if (stream != null) stream.close();   // 'stream' is null if the resource could not be opened
}

These are just two simple tests that show ways of testing a sequencer. Some tests may get quite involved, especially if a lot of output data is produced.

It may also be useful to create some integration tests that configure ModeShape to use a custom sequencer, and to then upload content using the JCR API, verifying that the custom sequencer did run. However, remember that ModeShape runs sequencers asynchronously in the background, and you must synchronize your tests to ensure that the sequencers have a chance to run before checking the results.
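One common way to synchronize such a test is to poll for the expected result with a timeout instead of sleeping for a fixed period. The helper below is a minimal sketch using only the JDK; the class SequencingWaitSketch and the waitUntil method are hypothetical and not part of ModeShape. In a real integration test the condition would check the repository (for example, whether the output node exists yet); here a flag set by a background thread stands in for the sequencer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class SequencingWaitSketch {

    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and report the current state
                Thread.currentThread().interrupt();
                return condition.getAsBoolean();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean sequenced = new AtomicBoolean(false);
        // Stand-in for the background sequencer finishing its work
        Thread background = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            sequenced.set(true);
        });
        background.start();
        System.out.println("sequenced in time: " + waitUntil(sequenced::get, 2000, 10));
    }
}
```

Polling keeps the test fast when sequencing finishes quickly, and the timeout turns a sequencer that never runs into a clear test failure rather than a hang.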

The ModeShape project provides an implementation of the JCR 2.0 API, which is built on top of the core libraries discussed earlier. This implementation as well as a number of JCR-related components are described in this part of the document. But before talking about how to use the JCR API with a ModeShape repository, first we need to show how to set up a ModeShape engine.

Table of Contents

6. Configuration
6.1. Configuring ModeShape
6.1.1. Configuration Files
6.1.2. Programmatic Configuration
6.1.3. Loading from a Configuration Repository
6.2. JCR Repository options
6.3. Repository system content
6.4. Query index directory
6.5. Clustering
6.5.1. Enabling Clustering in ModeShape
6.5.2. JGroups configuration
6.6. Using ModeShape in Web Applications
6.6.1. Deploying ModeShape to JBoss AS
6.6.2. Deploying ModeShape to Tomcat
6.7. Setting the Classpath
6.7.1. Building against ModeShape via Maven
6.7.2. Add dependencies for logging
6.7.3. Building against ModeShape via JARs
6.8. What's next
7. Using the JCR API with ModeShape
7.1. What's new in JCR 2.0?
7.1.1. Connecting
7.1.2. Identifiers
7.1.3. Binary Values
7.1.4. Node Type Management
7.1.5. Queries
7.1.6. Workspace Management
7.1.7. Observation
7.1.8. Locking
7.1.9. Versioning
7.1.10. Importing and Exporting
7.1.11. Shareable Nodes
7.1.12. Orderable Child Nodes
7.1.13. Paths
7.1.14. getItem(String)
7.2. Obtaining a JCR Repository
7.2.1. Configuration File URLs
7.2.2. Using JNDI URLs
7.2.3. Cleaning Up after JcrRepositoryFactory
7.3. ModeShape's JcrEngine
7.4. Creating JCR Sessions
7.4.1. Using JAAS
7.4.2. Using HTTP Servlet security
7.4.3. Guest (Anonymous) User Access
7.4.4. Using Custom Security
7.5. JCR Specification Support
7.5.1. Required features
7.5.2. Optional features
7.5.3. TCK Compatibility features
7.5.4. JCR Security
7.5.5. Built-In Node Types
7.5.6. Custom Node Type Registration
7.6. Summary
8. Querying and Searching using JCR
8.1. JCR Query API
8.2. JCR XPath Query Language
8.2.1. Column Specifiers
8.2.2. Type Constraints
8.2.3. Property Constraints
8.2.4. Path Constraints
8.2.5. Ordering Specifiers
8.2.6. Miscellaneous
8.3. JCR-SQL Query Language
8.3.1. Queries
8.4. JCR-SQL2 Query Language
8.4.1. Queries
8.4.2. Sources
8.4.3. Joins
8.4.4. Equi-Join Conditions
8.4.5. Same-Node Join Conditions
8.4.6. Child-Node Join Conditions
8.4.7. Descendant-Node Join Conditions
8.4.8. Constraints
8.4.9. And Constraints
8.4.10. Or Constraints
8.4.11. Not Constraints
8.4.12. Comparison Constraints
8.4.13. Between Constraints
8.4.14. Property Existence Constraints
8.4.15. Set Constraints
8.4.16. Full-text Search Constraints
8.4.17. Same-Node Constraint
8.4.18. Child-Node Constraints
8.4.19. Descendant-Node Constraints
8.4.20. Paths and Names
8.4.21. Static Operands
8.4.22. Bind Variables
8.4.23. Subqueries
8.4.24. Dynamic Operands
8.4.25. Ordering
8.4.26. Columns
8.4.27. Limit and Offset
8.4.28. Pseudo-columns
8.4.29. Example JCR-SQL2 queries
8.5. Full-Text Search Language
8.5.1. Full-text Search Language
8.6. JCR Query Object Model (JCR-QOM) API
9. Accessing ModeShape Remotely
9.1. The ModeShape WebDAV Server
9.1.1. Configuring the ModeShape WebDAV Server
9.1.2. Deploying the ModeShape WebDAV Server
9.2. The ModeShape REST Server
9.2.1. Supported Resources and Methods
9.2.2. Configuring the ModeShape REST Server
9.2.3. Deploying the ModeShape REST Server
9.2.4. ModeShape REST Client API
9.3. Repository Providers
9.4. Summary

Using ModeShape within your application is actually quite straightforward, and with JCR 2.0 it is possible for your application to do everything using only the JCR 2.0 API. Your application will first obtain a javax.jcr.Repository instance, and will use that object to create sessions through which your application will read, modify, search, or monitor content in the repository.

However, before you can use ModeShape, you need to configure it, and that's what this chapter covers.

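There really are three options: define the configuration in a configuration file, construct it programmatically, or load it from a configuration repository.

Each of these approaches has its obvious advantages, so the choice of which one to use is entirely up to you.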

By far the easiest approach to defining your ModeShape configuration is to use a configuration file. As mentioned above, you'll want to do this if your application uses the standard and implementation-independent RepositoryFactory mechanism to obtain the JCR Repository reference.

Here is an example configuration file used in the repository example covered in the Getting Started document (though it has been slightly simplified for clarity):


<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <!-- 
  Define the JCR repositories 
  -->
  <mode:repositories>
      <!-- 
      Define a JCR repository that accesses the 'Cars' source directly.
      This of course is optional, since we could access the same content through 'vehicles'.
      -->
      <mode:repository jcr:name="car repository" mode:source="Cars">
          <mode:options jcr:primaryType="mode:options">
              <mode:option jcr:name="jaasLoginConfigName" mode:value="modeshape-jcr"/>
          </mode:options>
          <mode:descriptors>
            <!-- 
                This adds a JCR Repository descriptor named "myDescriptor" with a value of "foo".
                So this code:
                Repository repo = ...;
                System.out.println(repo.getDescriptor("myDescriptor"));

                Will now print out "foo".
            -->
            <myDescriptor mode:value="foo" />
          </mode:descriptors>
          <!-- 
                Import the custom node types defined in the named files. The values
                can be an absolute path to a classpath resource, an absolute file system
                path, a relative path on the file system (relative to where the process was
                started from), or a resolvable URL. If more than one node type definition 
                file is needed, the files can be listed as a single comma-delimited string
                in the 'mode:resource' attribute of the 'jcr:nodeTypes' element, or listed 
                individually using multiple mode:resource child elements (as shown below).
            -->
          <jcr:nodeTypes>
               <mode:resource>/org/example/my-node-types.cnd</mode:resource>
               <mode:resource>/org/example/additional-node-types.cnd</mode:resource>
            </jcr:nodeTypes>
      </mode:repository>
  </mode:repositories>
   <!-- 
   Define the sources for the content. These sources are directly accessible using the 
   ModeShape-specific Graph API.
   -->
   <mode:sources jcr:primaryType="nt:unstructured">
       <mode:source jcr:name="Cars" 
              mode:classname="org.modeshape.graph.connector.inmemory.InMemoryRepositorySource" 
              mode:retryLimit="3" mode:defaultWorkspaceName="workspace1">
               <mode:predefinedWorkspaceNames>workspace2</mode:predefinedWorkspaceNames>
               <mode:predefinedWorkspaceNames>workspace3</mode:predefinedWorkspaceNames>
       </mode:source>
   </mode:sources>
   <!-- 
   Define the sequencers. This is an optional section. For this example, we're not using any sequencers. 
   -->
   <mode:sequencers>
       <!--mode:sequencer jcr:name="Image Sequencer">
           <mode:classname>
            org.modeshape.sequencer.image.ImageMetadataSequencer
           </mode:classname>
           <mode:description>Image metadata sequencer</mode:description>        
           <mode:pathExpression>/foo/source => /foo/target</mode:pathExpression>
           <mode:pathExpression>/bar/source => /bar/target</mode:pathExpression>
       </mode:sequencer-->
   </mode:sequencers>
   <mode:mimeTypeDetectors>
       <mode:mimeTypeDetector jcr:name="Detector" 
                             mode:description="Standard extension-based MIME type detector"/>
   </mode:mimeTypeDetectors>
</configuration>

Most likely you'll define your configuration in a file. But there are some situations where it's far easier - even necessary - to programmatically configure ModeShape. For example, you may not be able to predefine a configuration, because it needs parameters and information known only at runtime.

One obvious approach is to write code that takes this new information and generates a ModeShape configuration file. The challenge here is that a sizable amount of code may be required just to write out the XML file in the correct format.

Perhaps an easier approach is to use the ModeShape JcrConfiguration class to programmatically construct the configuration, and then have it write the configuration out to a file. You can even load a starting configuration, programmatically modify it, and write it out to a file. From there, your application can use the standard and implementation-independent JCR API to find and use the Repository instances.

The JcrConfiguration class is used by ModeShape to read in the configuration files, but it was also designed to have an easy-to-use API that makes it easy to configure each of the different kinds of components, especially when using an IDE with code completion. The next few sections describe how to configure the various parts of a ModeShape configuration.

Each repository source definition must include the name of the RepositorySource class as well as each bean property that should be set on the object:

JcrConfiguration config = ...

config.repositorySource("source A")
     .usingClass(InMemoryRepositorySource.class)
     .setDescription("The repository for our content")
     .setProperty("defaultWorkspaceName", workspaceName);

This example defines an in-memory source with the name "source A", a description, and a single "defaultWorkspaceName" bean property. Different RepositorySource implementations will define which bean properties are required and which are optional. Of course, the class can be specified as a Class reference or a string (followed by whether the class should be loaded from the classpath or from a specific classpath).

Note

Each time repositorySource(String) is called, it will either load the existing definition with the supplied name or will create a new definition if one does not already exist. To remove a definition, simply call remove() on the result of repositorySource(String). The set of existing definitions can be accessed with the repositorySources() method.

Each defined sequencer must specify the name of the StreamSequencer implementation class as well as the path expressions defining which nodes should be sequenced and the output paths defining where the sequencer output should be placed (often as a function of the input path expression).

JcrConfiguration config = ...

config.sequencer("Image Sequencer")
     .usingClass("org.modeshape.sequencer.image.ImageMetadataSequencer")
     .loadedFromClasspath()
     .setDescription("Sequences image files to extract the characteristics of the image")
     .sequencingFrom("//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd)[*])/jcr:content[@jcr:data]")
     .andOutputtingTo("/images/$1");

This shows an example of a sequencer definition named "Image Sequencer" that uses the ImageMetadataSequencer class (loaded from the classpath) to sequence the "jcr:data" property on any new or changed nodes that are named "jcr:content" below a parent node with a name ending in ".jpg", ".jpeg", ".gif", ".bmp", ".pcx", ".png", ".iff", ".ras", ".pbm", ".pgm", ".ppm" or ".psd". The output of the sequencing operation should be placed at the "/images/$1" node, where the "$1" value is captured as the name of the parent node. (The capture groups work the same way as in regular expressions.) Of course, the class can be specified as a Class reference or a string (followed by whether the class should be loaded from the classpath or from a specific classpath).

Note

Each time sequencer(String) is called, it will either load the existing definition with the supplied name or will create a new definition if one does not already exist. To remove a definition, simply call remove() on the result of sequencer(String). The set of existing definitions can be accessed with the sequencers() method.

Note that in addition to including a description for the configuration, it is also possible to set sequencer-specific properties using the setProperty(String,String[]) method. When ModeShape uses this configuration to set up a sequencing operation, it will instantiate the StreamSequencer class and will call a JavaBean-style setter method for each property. For example, calling setProperty("foo","val1") on the sequencer configuration will mean that ModeShape will instantiate the sequencer implementation and will look for a setFoo(String) method on the sequencer implementation class, and use that method (if found) to pass the "val1" value to the instance.
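The JavaBean-style lookup described above can be illustrated with plain Java reflection. This is a sketch of the general technique only, not ModeShape's actual code: BeanPropertySketch, ExampleSequencer, and applyProperty are hypothetical names, and the sketch handles only single String values.

```java
import java.lang.reflect.Method;

class BeanPropertySketch {

    /** Hypothetical sequencer class with one JavaBean-style property named "foo". */
    static class ExampleSequencer {
        private String foo;
        public void setFoo(String foo) { this.foo = foo; }
        public String getFoo() { return foo; }
    }

    /**
     * Looks for a public setXxx(String) method matching the property name and invokes it.
     * Returns true if a setter was found and called, false if no such setter exists.
     */
    static boolean applyProperty(Object target, String propertyName, String value) {
        String setterName = "set"
            + Character.toUpperCase(propertyName.charAt(0))
            + propertyName.substring(1);
        try {
            Method setter = target.getClass().getMethod(setterName, String.class);
            setter.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no matching setter, so the property is skipped
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not set property " + propertyName, e);
        }
    }

    public static void main(String[] args) {
        ExampleSequencer sequencer = new ExampleSequencer();
        applyProperty(sequencer, "foo", "val1"); // calls setFoo("val1")
        System.out.println(sequencer.getFoo());
    }
}
```

A property with no matching setter (for example "bar" in this sketch) is simply skipped, which mirrors the "if found" wording above.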

Each defined MIME type detector must specify the name of the MimeTypeDetector implementation class as well as any other bean properties required by the implementation.

JcrConfiguration config = ...

config.mimeTypeDetector("Extension Detector")
     .usingClass(org.modeshape.graph.mimetype.ExtensionBasedMimeTypeDetector.class);

Of course, the class can be specified as a Class reference or a string (followed by whether the class should be loaded from the classpath or from a specific classpath).

Note

Each time mimeTypeDetector(String) is called, it will either load the existing definition with the supplied name or will create a new definition if one does not already exist. To remove a definition, simply call remove() on the result of mimeTypeDetector(String). The set of existing definitions can be accessed with the mimeTypeDetectors() method.

Regardless of how the JcrConfiguration is loaded, it can also be stored to a file or stream in an XML format that can then be reloaded in the future to recreate the configuration. This makes it very easy to programmatically generate a configuration file once while being able to load that same configuration at a later time (or on a different instance).

JcrConfiguration config = ...

String pathToFile = ...
// Save any changes before this point in the configuration repository ...
config.save();
// And now write out the configuration repository to a file ...
config.storeTo(pathToFile);

This will create a file at pathToFile that contains the current configuration in XML format. Any changes made after the most recent call to the save() method on the JcrConfiguration object will not be saved in the configuration repository, and thus will not be in the generated file. The generated XML will not be formatted, so it may be a bit hard to read. (Any good XML editor will be able to format it for readability.)

So far, we've seen how to load a configuration from a file, and how to programmatically define a configuration and write it out to a file. In this section, we'll see how ModeShape can load its configuration from another repository.

The first step is to create and configure the RepositorySource instance that we'll use to access the repository where the configuration is stored. Then, create a JcrConfiguration instance and load from this source:

RepositorySource configSource = ...

JcrConfiguration config = new JcrConfiguration();
config.loadFrom(configSource);

The loadFrom(...) method can be called any number of times, but each time it is called it completely wipes out any current notion of the configuration and replaces it with the configuration found in the supplied source or file.

There is an optional second parameter that defines the name of the workspace in the supplied source where the configuration content can be found. It is not needed if the workspace is the source's default workspace. There is an optional third parameter that defines the Path within the configuration repository identifying the parent node of the various configuration nodes. If not specified, it assumes "/". This makes it possible for the configuration content to be located at a different location in the hierarchical structure. (This is not often required, but it is very useful if your ModeShape configuration file is embedded within another XML file.)

Once the JcrConfiguration has been loaded from a RepositorySource, the JcrConfiguration instance can be used to modify the configuration and then save those changes back to the repository. This technique can be used to place a configuration into a repository (such as a database) for the first time:

RepositorySource configSource = ... // a RepositorySource to an empty source

JcrConfiguration config = new JcrConfiguration();
// Bind the configuration to the repository source (which is initially empty)...
config.loadFrom(configSource);
// Now load a configuration from a file (or construct one programmatically) ...
String pathToFile = ... 
config.loadFrom(pathToFile);
// Now save the configuration into the source ...
config.save();

Now you can load this configuration in multiple processes, using the approach mentioned above.

ModeShape JCR repositories have a number of behaviors that can be controlled from within the configuration. These are known as repository options, and all have sensible defaults. However, they do allow you to better configure the JCR repository instances to best suit your needs.

As mentioned earlier, these options can be set programmatically or within the configuration file. When setting up the configuration programmatically, the actual enum literal values must be used, and all values are String literals:

JcrConfiguration config = ...

config.repository("repository A")
     .setOption(JcrRepository.Option.JAAS_LOGIN_CONFIG_NAME, "modeshape-jcr");

When using a configuration file, you set the option within the "mode:options" fragment under the "mode:repository" section. Each option fragment typically looks something like this:

<mode:option jcr:name="jaasLoginConfigName" mode:value="modeshape-jcr"/>

where the "jcr:name" XML attribute value contains the lower-camel-case form of the option literal, and the "mode:value" XML attribute value contains the repository option value. In the example above, the "jaasLoginConfigName" is the option name, and "modeshape-jcr" is the option value. An alternative representation is to set the name using the XML element name and set the primary type with an XML attribute. Thus, this fragment is equivalent to the previous listing:

<jaasLoginConfigName jcr:primaryType="mode:option" mode:value="modeshape-jcr"/>
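The relationship between an enumeration literal and its lower-camel-case option name can be sketched as a simple string conversion. The class OptionNameSketch and the method toOptionName below are hypothetical helpers written for illustration; they are not part of the ModeShape API, and ModeShape may derive the names differently.

```java
import java.util.Locale;

class OptionNameSketch {

    /** Converts an upper-case, underscore-delimited literal to lower camel case. */
    static String toOptionName(String enumLiteral) {
        StringBuilder name = new StringBuilder();
        for (String word : enumLiteral.toLowerCase(Locale.ROOT).split("_")) {
            if (word.isEmpty()) continue; // ignore doubled or leading underscores
            if (name.length() == 0) {
                name.append(word); // the first word stays lower case
            } else {
                name.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOptionName("JAAS_LOGIN_CONFIG_NAME"));
        System.out.println(toOptionName("SYSTEM_SOURCE_NAME"));
    }
}
```

For example, the literal JAAS_LOGIN_CONFIG_NAME corresponds to the "jaasLoginConfigName" value used in the "jcr:name" attribute above.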

The following table describes all of the current repository options.

Table 6.1. JCR Repository Options

Option Description
jaasLoginConfigName The JAAS application configuration name that specifies which login module should be used to validate credentials. By default, "modeshape-jcr" is used. The enumeration literal is Option.JAAS_LOGIN_CONFIG_NAME
systemSourceName

The name of the source (and optionally the workspace in the source) where the "/jcr:system" branch should be stored. The format is "name of workspace@name of source", or simply "name of source" if the default workspace is to be used. If this option is not used, a transient in-memory source will be used. Note that all leading and trailing whitespace is removed for both the source name and workspace name. Thus, a value of "@" implies a zero-length workspace name and zero-length source name. Also, any use of the '@' character in source and workspace names must be escaped with a preceding backslash.

The enumeration literal is Option.SYSTEM_SOURCE_NAME

anonymousUserRoles A comma-delimited list of default roles provided for anonymous access. A null or empty value for this option means that anonymous access is disabled. The enumeration literal is Option.ANONYMOUS_USER_ROLES
exposeWorkspaceNamesInDescriptor

A boolean flag that indicates whether a complete list of workspace names should be exposed in the custom repository descriptor "org.modeshape.jcr.api.Repository.REPOSITORY_WORKSPACES". If this option is set to true, then any code that can access the repository can retrieve a complete list of workspace names through the javax.jcr.Repository.getDescriptor(String) method without logging in. The default value is 'true', meaning that the descriptor is populated.

Since some ModeShape installations may consider the list of workspace names to be restricted information and limit the ability of some or all users to see a complete list of workspace names, this option can be set to "false" to disable this capability. If this option is set to "false", the "org.modeshape.jcr.api.Repository.REPOSITORY_WORKSPACES" descriptor will not be set.

The enumeration literal is Option.EXPOSE_WORKSPACE_NAMES_IN_DESCRIPTOR

repositoryJndiLocation A string property that, when specified, tells the JcrEngine where to put the Repository in JNDI. This assumes that you have write access to the JNDI tree. If no value is set, then the Repository will not be bound to JNDI. The enumeration literal is Option.REPOSITORY_JNDI_LOCATION
queryExecutionEnabled A boolean flag that specifies whether this repository is expected to execute searches and queries. If client applications will never perform searches or queries, then maintaining the query indexes is an unnecessary overhead, and can be disabled. Note that this is merely a hint, and that searches and queries might still work when this is set to 'false'. The default is 'true', meaning that clients can execute searches and queries. The enumeration literal is Option.QUERY_EXECUTION_ENABLED
queryIndexDirectory

The system may maintain a set of indexes that improve the performance of searching and querying the content. The size of these indexes depends upon the size of the content being stored, and thus may consume a significant amount of space. This option defines a location on the file system where this repository may (if needed) store indexes so they don't consume large amounts of memory.

If specified, the value must be a valid path to a writable directory on the file system. If the path specifies a nonexistent location, the repository may attempt to create the missing directories. The path may be absolute or relative to the location where this VM was started. If the specified location is not a readable and writable directory (or cannot be created as such), then this will generate an exception when the repository is created.

The default value is null, meaning the search indexes may not be stored on the local file system and, if needed, will be stored within memory.

The enumeration literal is Option.QUERY_INDEX_DIRECTORY

queryIndexesUpdatedSynchronously

An advanced boolean flag that specifies whether updates to the indexes (if used) should be made synchronously, meaning that a call to Session.save() will not return until the search indexes have been completely updated. The benefit of synchronous updates is that a search or query performed immediately after a save() will operate upon content that was just changed. The downside is that the save() operation will take longer.

With asynchronous updates, however, the only work done during a save() invocation is that required to persist the changes in the underlying repository source, while changes to the search indexes are made in a different thread that may not run immediately. In this case, there may be an indeterminate lag before searching or querying after a save() will operate upon the changed content.

The default value is 'false', meaning the updates are performed asynchronously.

The enumeration literal is Option.QUERY_INDEXES_UPDATED_SYNCHRONOUSLY

queryIndexesRebuiltSynchronously

An advanced boolean flag that specifies whether the indexes should be rebuilt synchronously when the repository restarts. If this flag is set to 'true', query indexes for each workspace in the repository will be rebuilt synchronously the first time that the repository is accessed (e.g., at the first login). If this flag is set to 'false', the query indexes for each workspace in the repository will be rebuilt asynchronously.

Rebuilding the indexes synchronously can cause very significant latency in the initial repository access if the repository contains a significant amount of content that must be reindexed. Updating the indexes asynchronously eliminates this latency, but repository queries may generate inconsistent results while the indexes are being updated. That is, query results may refer to content that is no longer in the repository or may fail to include appropriate results for nodes that had been added to the repository.

The default value is 'true', meaning the rebuilds are performed synchronously.

The enumeration literal is Option.QUERY_INDEXES_REBUILT_SYNCHRONOUSLY

rebuildQueryIndexOnStartup

An advanced setting that specifies the strategy used to determine which query indexes need to be rebuilt when the repository restarts. ModeShape currently supports two strategies:

  • A value of "always" dictates that the query index for every workspace in the repository will be rebuilt each time that the repository restarts. This can sharply increase the startup time for the repository, particularly if the queryIndexesRebuiltSynchronously option is set to 'true' (the default). However, this strategy ensures that any repository content that was modified outside of the repository (e.g., files in a FileSystemSource that were directly modified on the file system) is properly indexed.

  • A value of "ifMissing" indicates that indexes should only be rebuilt if they do not currently exist or are obviously invalid. This strategy is always the most appropriate strategy for non-clustered repositories with repository sources that provide exclusive control over content (e.g., the InfinispanSource, the JpaSource) as it greatly reduces repository startup time for repositories with significant amounts of content.

Note that repositories that do not configure the queryIndexDirectory option will always use an in-memory index. This type of index will not be persisted across repository restarts and will require ModeShape to rebuild the indexes each time the repository starts up even if the "ifMissing" strategy is specified.

The "always" strategy is used by default and in cases where the option's value does not case-independently match one of these two values. This was the only strategy available prior to ModeShape 2.5.0.Beta3.

The enumeration literal is Option.REBUILD_QUERY_INDEX_ON_STARTUP, and the values are RebuildQueryIndexOnStartupOption.ALWAYS and RebuildQueryIndexOnStartupOption.IF_MISSING

projectNodeTypes An advanced boolean flag that defines whether or not the node types should be exposed as content under the "/jcr:system/jcr:nodeTypes" node. Value is either "true" or "false" (default). The enumeration literal is Option.PROJECT_NODE_TYPES
readDepth An advanced integer flag that specifies the depth of the subgraphs that should be loaded from the connectors during normal read operations. The default value is 1. The enumeration literal is Option.READ_DEPTH
indexReadDepth An advanced integer flag that specifies the depth of the subgraphs that should be loaded from the connectors during indexing operations. The default value is 4. The enumeration literal is Option.INDEX_READ_DEPTH
tablesIncludeColumnsForInheritedProperties

An advanced boolean flag that dictates whether the property definitions inherited from supertypes should be represented in the corresponding queryable table with columns. The JCR specification gives implementations some flexibility, so ModeShape allows this to be controlled.

When this option is set to "false", then each table has only those columns representing the (single-valued) property definitions explicitly defined by the node type. When this option is set to "true" (the default), each table will contain columns for each of the (single-valued) property definitions explicitly defined on the node type and inherited by the node type from all of the supertypes.

The enumeration literal is Option.TABLES_INCLUDE_COLUMNS_FOR_INHERITED_PROPERTIES

performReferentialIntegrityChecks

An advanced boolean flag that specifies whether referential integrity checks should be performed upon Session.save(). If set to "true" (the default), referential integrity checks are performed to ensure that nodes referenced by other nodes cannot be removed. If the value is set to "false", then these referential integrity checks will not be performed when removing nodes.

Many people discourage the use of REFERENCE properties because of the overhead and the need for referential integrity. These concerns are somewhat mitigated by the introduction in JCR 2.0 of the WEAKREFERENCE property type, which is excluded from referential integrity checks.

This option is available for those cases where REFERENCE properties are not used within your content, and thus the referential integrity checks will never find violations. In these cases, you may disable these checks to slightly improve performance of delete operations.

The enumeration literal is Option.PERFORM_REFERENTIAL_INTEGRITY_CHECKS

versionHistoryStructure

An advanced flag that specifies the structure used to store version histories under the "/jcr:system/jcr:versionStorage" branch. The JCR 2.0 specification does not predefine any particular structure, but ModeShape supports two types:

  • A value of "flat" dictates that all "nt:versionHistory" nodes are stored with a name matching the UUID of the versioned node and directly under the "/jcr:system/jcr:versionStorage" node. For example, given a "mix:versionable" node with the UUID fae2b929-c5ef-4ce5-9fa1-514779ca0ae3, the corresponding "nt:versionHistory" node will be at "/jcr:system/jcr:versionStorage/fae2b929-c5ef-4ce5-9fa1-514779ca0ae3".

  • A value of "hierarchical" dictates that all "nt:versionHistory" nodes are stored under a hierarchical structure created by the first 8 characters of the UUID string. For example, given a "mix:versionable" node with the UUID fae2b929-c5ef-4ce5-9fa1-514779ca0ae3, the corresponding "nt:versionHistory" node will be at "/jcr:system/jcr:versionStorage/fa/e2/b9/29/c5ef-4ce5-9fa1-514779ca0ae3".

The "hierarchical" structure is used by default and in cases where the option's value does not case-independently match one of these two values.

The enumeration literal is Option.VERSION_HISTORY_STRUCTURE, and the values are VersionHistoryOption.FLAT and VersionHistoryOption.HIERARCHICAL

removeDerivedContentWithOriginal

An advanced boolean flag that dictates whether content derived from other content (e.g., that output by sequencers) should be automatically (re)moved when the content from which it was derived is (re)moved from the repository. For example, consider that a file is uploaded and sequenced, and that the content derived from the file is stored in the repository. When that file is (re)moved, this option dictates whether the derived content should also be (re)moved automatically.

By default this option has a value of "true", ensuring that all derived content is deleted whenever the original content is deleted. A value of "false" will leave the derived content.

The enumeration literal is Option.REMOVE_DERIVED_CONTENT_WITH_ORIGINAL

useAnonymousAccessOnFailedLogin

A boolean flag that indicates whether any failed, non-anonymous login attempts will automatically cause the Session to be created using the anonymous context. If anonymous logins are not enabled (with the anonymousUserRoles option), then the login will still fail.

By default this option has a value of "false", ensuring that non-anonymous login attempts either succeed as the requested user or fail.

The enumeration literal is Option.USE_ANONYMOUS_ACCESS_ON_FAILED_LOGIN
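
To make the two versionHistoryStructure layouts from the table above concrete, the following sketch computes the location of an "nt:versionHistory" node from the UUID of a versioned node. It simply reproduces the two path shapes described for the "flat" and "hierarchical" values; the VersionHistoryPaths class is a hypothetical name used for illustration and is not how ModeShape itself is implemented.

```java
import java.util.UUID;

/** Illustrative only: reproduces the documented "flat" and "hierarchical" path shapes. */
final class VersionHistoryPaths {
    private static final String STORAGE = "/jcr:system/jcr:versionStorage";

    private VersionHistoryPaths() {}

    /** "flat": the history node is named after the full UUID, directly under the storage node. */
    static String flatPath(UUID uuid) {
        return STORAGE + "/" + uuid;
    }

    /** "hierarchical": the first 8 characters of the UUID become four 2-character levels. */
    static String hierarchicalPath(UUID uuid) {
        String s = uuid.toString();          // canonical form: 8-4-4-4-12 hex digits
        return STORAGE
                + "/" + s.substring(0, 2)
                + "/" + s.substring(2, 4)
                + "/" + s.substring(4, 6)
                + "/" + s.substring(6, 8)
                + "/" + s.substring(9);      // skip the hyphen that follows the first 8 characters
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("fae2b929-c5ef-4ce5-9fa1-514779ca0ae3");
        // prints /jcr:system/jcr:versionStorage/fae2b929-c5ef-4ce5-9fa1-514779ca0ae3
        System.out.println(flatPath(uuid));
        // prints /jcr:system/jcr:versionStorage/fa/e2/b9/29/c5ef-4ce5-9fa1-514779ca0ae3
        System.out.println(hierarchicalPath(uuid));
    }
}
```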


Each JCR repository contains information about the system in the "/jcr:system" area of the repository content. All of this system content applies to the whole repository (e.g., namespaces, node types, locks, versions, etc.) and therefore every session for each workspace sees the exact same "/jcr:system" content.

ModeShape implements this behavior by storing all "/jcr:system" content in a separate workspace, and then using federation to project that content into each workspace. This ensures that all workspaces see the same content, without having to duplicate the "/jcr:system" content in each workspace and ensure those copies stay in sync. Federation is better than duplication.

By default, ModeShape creates this separate system workspace in a transient, in-memory store. This works well for simple cases, but it does not work when using clustering or versioning, or when dynamically registering namespaces or adding or changing node types. This is because these features all rely upon changing or adding content in the "/jcr:system" area. For example, version histories are stored under "/jcr:system/jcr:versionStorage", node types under "/jcr:system/jcr:nodeTypes", and namespaces under "/jcr:system/mode:namespaces".

In these situations, it is necessary to persist the system content in a repository source, and if clustering is enabled this source needs to be accessible to all members of the cluster. Many times, the easiest approach is to simply define an extra workspace in your repository source where the system content can be stored. It's also possible to define a separate repository source with a separate workspace for each repository's system content. (Using a separate source is required when the repository is using a single repository source that can only store limited kinds of nodes, like the file system connector or Subversion connector that can only store nt:file and nt:folder nodes.)

You should always configure each ModeShape repository with a source for its system workspace by using the SYSTEM_SOURCE_NAME repository option. Its value defines the name of the source and the name of the workspace in that source where the system content should be stored, in the format:

  workspaceName@sourceName

This specifies the system content should be stored in the workspace named "workspaceName" in the "sourceName" repository source.
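
As a rough illustration of this format, the sketch below splits an option value into its workspace and source names. It follows the rules given in the options table (surrounding whitespace is trimmed, the workspace part is optional, and a literal '@' is escaped with a backslash), but the SystemSourceName class is hypothetical and is not ModeShape's own parser.

```java
/** Illustrative parser for "workspaceName@sourceName" values (hypothetical, not ModeShape code). */
final class SystemSourceName {
    final String workspaceName; // null when only a source name was given (default workspace)
    final String sourceName;

    private SystemSourceName(String workspaceName, String sourceName) {
        this.workspaceName = workspaceName;
        this.sourceName = sourceName;
    }

    static SystemSourceName parse(String value) {
        int split = -1;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\') {
                i++;                 // skip the escaped character, e.g. the '@' in "\@"
            } else if (c == '@') {
                split = i;           // first unescaped '@' separates workspace from source
                break;
            }
        }
        if (split < 0) {
            return new SystemSourceName(null, clean(value));
        }
        return new SystemSourceName(clean(value.substring(0, split)),
                                    clean(value.substring(split + 1)));
    }

    /** Trims whitespace and turns an escaped "\@" back into a plain '@'. */
    private static String clean(String part) {
        return part.trim().replace("\\@", "@");
    }

    public static void main(String[] args) {
        SystemSourceName parsed = parse("system@SystemStore");
        // prints system in SystemStore
        System.out.println(parsed.workspaceName + " in " + parsed.sourceName);
    }
}
```

A value without an '@' (for example "SystemStore") yields a null workspace name here, standing in for "use the default workspace of that source".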

The system content can be stored in any repository source capable of storing any content and, in the case of clustering, that is accessible across multiple processes. For most people, this will mean a relational database. Here is an abbreviated example of an XML configuration that defines a source for the system storage (in a MySQL database) and a repository that uses it:


<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" 
                 xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <mode:repositories>
    <mode:repository jcr:name="car repository" mode:source="Cars">
      <mode:options jcr:primaryType="mode:options">
        <!-- Explicitly specify the "system" workspace in the "SystemStore" source. -->
        <systemSourceName jcr:primaryType="mode:option" 
                               mode:value="system@SystemStore"/>
        ...
      </mode:options>
      ...
    </mode:repository>
    ...
  </mode:repositories>
  <mode:sources jcr:primaryType="nt:unstructured">
    <!-- One source for the "/jcr:system" content ... -->
    <mode:source jcr:name="SystemStore" 
                 mode:classname="org.modeshape.connector.store.jpa.JpaSource"
                 mode:description="The database store for our system content"
                 mode:dialect="org.hibernate.dialect.MySQLDialect"
                 mode:dataSourceJndiName="java:/MyDataSource"
                 mode:defaultWorkspaceName="system"
                 mode:autoGenerateSchema="validate"/>    
    <!-- Another source for the regular content ... -->
    <mode:source jcr:name="Cars"
                 mode:classname="org.modeshape.connector.store.jpa.JpaSource"
                 mode:description="The database store for our regular content"
                 mode:dialect="org.hibernate.dialect.MySQLDialect"
                 mode:dataSourceJndiName="java:/MyDataSource"
                 mode:defaultWorkspaceName="workspace1"
                 mode:autoGenerateSchema="validate">
      <mode:predefinedWorkspaceNames>workspace1</mode:predefinedWorkspaceNames>
      <mode:predefinedWorkspaceNames>workspace2</mode:predefinedWorkspaceNames>
      <mode:predefinedWorkspaceNames>workspace3</mode:predefinedWorkspaceNames>
    </mode:source>
    ...
  </mode:sources>
  ...
</configuration>

Of course, you can always use a separate workspace in your primary source, too:


<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <mode:repositories>
    <mode:repository jcr:name="car repository" mode:source="Cars">
      <mode:options jcr:primaryType="mode:options">
        <!-- Explicitly specify the "system" workspace in the "Cars" source. -->
        <systemSourceName jcr:primaryType="mode:option" mode:value="system@Cars"/>
        ...
      </mode:options>
      ...
    </mode:repository>
    ...
  </mode:repositories>
  <mode:sources jcr:primaryType="nt:unstructured">
    <!-- 
    Define one source for the regular content with a special workspace for the system content.
    -->
    <mode:source jcr:name="Cars" 
                 mode:classname="org.modeshape.connector.store.jpa.JpaSource"
                 mode:description="The database store for our regular and system content"
                 mode:dialect="org.hibernate.dialect.MySQLDialect"
                 mode:dataSourceJndiName="java:/MyDataSource"
                 mode:defaultWorkspaceName="workspace1"
                 mode:autoGenerateSchema="validate">
      <mode:predefinedWorkspaceNames>workspace1</mode:predefinedWorkspaceNames>    
      <mode:predefinedWorkspaceNames>workspace2</mode:predefinedWorkspaceNames>    
      <mode:predefinedWorkspaceNames>workspace3</mode:predefinedWorkspaceNames>    
      <mode:predefinedWorkspaceNames>system</mode:predefinedWorkspaceNames>    
    </mode:source>
    ...
  </mode:sources>
  ...
</configuration>

ModeShape maintains a set of index files that are used to process queries and searches, using the Lucene search engine. By default, these indexes are kept in memory (primarily because it's easy to configure). But most production configurations should not store them in-memory but should instead store these index files on the local file system.

Each ModeShape repository can be configured with the location where its indexes should be stored, using the "QUERY_INDEX_DIRECTORY" repository option (see JcrRepository.Option) when using the programmatic API or the "queryIndexDirectory" repository option in a ModeShape configuration file. The value of this setting should be the absolute or relative path to the folder where the indexes should be stored. In this directory, ModeShape will store the index files for each workspace in a folder named similarly to the workspace. Note that ModeShape will dynamically create these workspace folders as required.

For example, here is part of a ModeShape configuration file that specifies these index files should be stored in the "data/car_repository/indexes" folder, relative to the folder where the JVM process was started:


<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" 
                 xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <mode:repositories>
    <mode:repository jcr:name="car repository" mode:source="Cars">
      <mode:options jcr:primaryType="mode:options">
        <!-- Explicitly specify the directory where the index files should be stored. -->
        <queryIndexDirectory jcr:primaryType="mode:option" 
                               mode:value="data/car_repository/indexes"/>
        ...
      </mode:options>
      ...
    </mode:repository>
    ...
  </mode:repositories>
  ...
</configuration>
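
If you want to check such a location before starting the engine, the requirements stated for this option (a readable and writable directory that may be created if missing, with one folder per workspace) can be approximated with the standard java.nio.file API. This is a minimal sketch under stated assumptions: the IndexDirectories class is hypothetical, and using the workspace name verbatim as the folder name is a simplification of "named similarly to the workspace".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative pre-flight check for an index directory (hypothetical, not ModeShape code). */
final class IndexDirectories {
    private IndexDirectories() {}

    /**
     * Creates the configured index directory if needed, verifies that it is a readable
     * and writable directory, and returns the folder for the given workspace.
     */
    static Path prepare(Path configured, String workspaceName) {
        // Relative paths are resolved against the directory where the JVM was started.
        Path root = configured.toAbsolutePath().normalize();
        try {
            Files.createDirectories(root); // fails if the path exists but is not a directory
            if (!Files.isDirectory(root) || !Files.isReadable(root) || !Files.isWritable(root)) {
                throw new IOException("Not a readable and writable directory: " + root);
            }
            // Assumption: the workspace folder is named exactly like the workspace.
            return Files.createDirectories(root.resolve(workspaceName));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot use index directory " + root, e);
        }
    }
}
```

For the configuration above, a call such as IndexDirectories.prepare(Paths.get("data/car_repository/indexes"), "workspace1") would create the directories relative to the JVM's working directory and fail early if they cannot be used.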

ModeShape 2.1 introduced the ability to have a cluster of JcrEngine instances distributed across multiple processes while behaving as though everything was happening in a single process. With clusters, the workload can be distributed across multiple machines, increasing tolerance against failure while allowing ModeShape to scale out to handle more workload.

ModeShape clustering uses the powerful, flexible and mature JGroups library to handle all network communication within the cluster. JGroups provides a wealth of capabilities, including automatically detecting new engines in the cluster (called discovery), reliable multicast communication, and automatic determination of the master node in the cluster. JGroups has a flexible protocol stack, works across firewalls, WANs and LANs, and supports multiple transport protocols, failure detection, reliable unicast and multicast message transmission, and encryption.

By default, clustering is not enabled. This means that each JcrEngine instance is self-contained and will not be aware of changes made in other JcrEngine instances. This is perfect in many lightweight or embedded scenarios, because it does not introduce any overhead associated with network communication.

However, clustering ModeShape is very easy and requires only a few simple steps:

  1. Enable clustering in the ModeShape configuration (more on this in a bit).

  2. Include the modeshape-clustering module in your application, either by JAR file or Maven dependency.

  3. Start (or deploy) multiple JcrEngine instances using the same configuration. For embedded scenarios, this means simply instantiating multiple JcrEngine instances in multiple processes. In other cases, this means deploying ModeShape to multiple servers (either using the WebDAV server, REST server, or into JNDI and using with your own applications).

Your JCR-based application doesn't need to change in any other way. Any event listener implementations registered in Sessions on any of the engines will be notified of all events, regardless of whether those events were due to changes in the local or remote engines.

It also doesn't matter how many Repository instances are defined in the configuration and managed by each JcrEngine instance: each engine in the cluster can manage multiple named repositories. ModeShape ensures that all Sessions for a named repository see the changes made to that repository, regardless of where those sessions are located in the cluster. Likewise, those same changes will not be visible to the sessions for any other named repository.

A ModeShape configuration can have a "clustering" fragment that defines the name of the cluster and the JGroups configuration:


<mode:clustering clusterName="modeshape-cluster" configuration="jgroups-modeshape.xml" />

The "clusterName" is a string that is a logical name of the cluster; all engines connecting to the same name form a cluster. Any messages multicast from one engine in the cluster will be received by all other members of the cluster. Again, the cluster name is independent of the repositories managed by the engines.

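The "configuration" value is a string that is one of:

  • a URL that resolves to a JGroups configuration file;

  • the path to a JGroups configuration file, either as a resource on the classpath or as a path on the file system;

  • the JGroups configuration itself, in the older string representation; or

  • the JGroups configuration itself, in the newer XML representation.

The format of this JGroups configuration will be described in the next section. If the "configuration" property is not given, ModeShape will use the default JGroups configuration (as defined by the specific JGroups version).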

Note

Note that all engines in the cluster must have the same JGroups configuration. In fact, all engines in the cluster will almost always have exactly the same ModeShape configuration.

Here is an example of a "clustering" fragment defining a cluster named "modeshape-cluster" using the JGroups configuration defined in the "jgroups-modeshape.xml" file at the supplied URL:


<clustering clusterName="modeshape-cluster" 
      configuration="file://some/path/jgroups-modeshape.xml" />

This next example uses the JGroups configuration defined in the "jgroups-modeshape.xml" resource file on the classpath (or as an absolute path on a *nix system):


<clustering clusterName="modeshape-cluster" 
      configuration="/some/path/jgroups-modeshape.xml" />

Next is an example that specifies the JGroups configuration using the older string representation of the form:


<clustering clusterName="modeshape-cluster" 
      configuration="PROTOCOL(param=value;param=value):PROTOCOL:PROTOCOL" />

Of course, the "configuration" property can be specified as a child element, too (line breaks added for readability):


<clustering clusterName="modeshape-cluster">
         <configuration>UDP(max_bundle_size="60000":max_bundle_timeout="30"):
                          PING(timeout="2000"):...</configuration>
</clustering>

And finally an example that specifies the JGroups configuration using the newer XML representation (line breaks added for readability):


<clustering clusterName="modeshape-cluster">
     <configuration><![CDATA[<config><UDP max_bundle_size="60000" 
          max_bundle_timeout="30".../><PING timeout="2000"/>...</config>]]>
     </configuration>
</clustering>

Note that this example uses a child XML element for the "configuration", along with a CDATA section, so that the XML configuration can be nested within the ModeShape configuration.

Warning

Remember to specify the system workspace name for each repository that is clustered.

The JGroups configuration defines a protocol stack that is used for messaging, starting with the bottom-most protocol and ending with the top-most protocol.

An example of the newer-style JGroups XML format is:


<config>
   <UDP
        mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"
        mcast_port="${jgroups.udp.mcast_port:45588}"
        discard_incompatible_packets="true"
        max_bundle_size="60000"
        max_bundle_timeout="30"
        ip_ttl="${jgroups.udp.ip_ttl:2}"
        enable_bundling="true"
        thread_pool.enabled="true"
        thread_pool.min_threads="1"
        thread_pool.max_threads="25"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="false"
        thread_pool.queue_max_size="100"
        thread_pool.rejection_policy="Run"
        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="1"
        oob_thread_pool.max_threads="8"
        oob_thread_pool.keep_alive_time="5000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="Run"/>
   <PING timeout="2000"
           num_initial_members="3"/>
   <MERGE2 max_interval="30000"
           min_interval="10000"/>
   <FD_SOCK/>
   <FD timeout="10000" max_tries="5" />
   <VERIFY_SUSPECT timeout="1500"  />
   <BARRIER />
   <pbcast.NAKACK
                  use_mcast_xmit="false" gc_lag="0"
                  retransmit_timeout="300,600,1200,2400,4800"
                  discard_delivered_msgs="true"/>
   <UNICAST timeout="300,600,1200,2400,3600"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                  max_bytes="400000"/>
   <VIEW_SYNC avg_send_interval="60000"   />
   <pbcast.GMS print_local_addr="true" join_timeout="3000"
               view_bundling="true"/>
   <FC max_credits="20000000"
                   min_threshold="0.10"/>
   <FRAG2 frag_size="60000"  />
   <pbcast.STATE_TRANSFER  />
</config>
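
Several values in this stack, such as "${jgroups.udp.mcast_port:45588}", are placeholders: JGroups substitutes the value of the named Java system property and falls back to the default after the colon when the property is not set. The following sketch mimics that substitution so you can see which value a given setting would take. The Placeholders class is hypothetical and is not JGroups' actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative resolver for "${property:default}" placeholders (not JGroups code). */
final class Placeholders {
    // group(1) = system property name, group(2) = optional default after the first colon
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    private Placeholders() {}

    static String resolve(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Without a default, an unset property leaves the placeholder text unchanged.
            String fallback = m.group(2) != null ? m.group(2) : m.group(0);
            String resolved = System.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The result depends on whether -Djgroups.udp.mcast_port=... was passed to the JVM.
        System.out.println(resolve("${jgroups.udp.mcast_port:45588}"));
        System.out.println(resolve("${jgroups.udp.mcast_addr:228.10.10.10}"));
    }
}
```

Starting the JVM with, for example, -Djgroups.udp.mcast_port=46000 would therefore override the default port without editing the configuration file.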

The older-style JGroups string format is of the form:

PROTOCOL(param1=value1:param2=value2):PROTOCOL:PROTOCOL

This format is generally harder to read and generally discouraged. Nevertheless, here's an example of the older string format defining the same stack as the previous XML example (line breaks have been added for readability):

UDP(
        mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}":
        mcast_port="${jgroups.udp.mcast_port:45588}":
        discard_incompatible_packets="true":
        max_bundle_size="60000":
        max_bundle_timeout="30":
        ip_ttl="${jgroups.udp.ip_ttl:2}":
        enable_bundling="true":
        thread_pool.enabled="true":
        thread_pool.min_threads="1":
        thread_pool.max_threads="25":
        thread_pool.keep_alive_time="5000":
        thread_pool.queue_enabled="false":
        thread_pool.queue_max_size="100":
        thread_pool.rejection_policy="Run":
        oob_thread_pool.enabled="true":
        oob_thread_pool.min_threads="1":
        oob_thread_pool.max_threads="8":
        oob_thread_pool.keep_alive_time="5000":
        oob_thread_pool.queue_enabled="false":
        oob_thread_pool.queue_max_size="100":
        oob_thread_pool.rejection_policy="Run"):
   PING(timeout="2000":
        num_initial_members="3"):
   MERGE2(max_interval="30000":
          min_interval="10000"):
   FD_SOCK:
   FD(timeout="10000":max_tries="5"):
   VERIFY_SUSPECT(timeout="1500"):
   BARRIER:
   pbcast.NAKACK(use_mcast_xmit="false":gc_lag="0":
                 retransmit_timeout="300,600,1200,2400,4800":
                 discard_delivered_msgs="true"):
   UNICAST(timeout="300,600,1200,2400,3600"):
   pbcast.STABLE(stability_delay="1000":desired_avg_gossip="50000":
                 max_bytes="400000"):
   VIEW_SYNC(avg_send_interval="60000"):
   pbcast.GMS(print_local_addr="true":join_timeout="3000":
              view_bundling="true"):
   FC(max_credits="20000000":
      min_threshold="0.10"):
   FRAG2(frag_size="60000"):
   pbcast.STATE_TRANSFER

For more details on how to configure the JGroups stack, see the JGroups Manual.

Note

JGroups is also used in Infinispan, JBoss AS, and other open source projects, and many of the JGroups configurations will work with ModeShape deployed in those same environments. For example, this blog post describes how to configure JGroups with three autodiscovery options available on Amazon EC2.

Sometimes your applications can simply define a configuration file and use the RepositoryFactory to access its repositories. This is very straightforward, and this is useful for many simple applications because the application will then own the ModeShape instance(s).

Web applications are a different story. Often, you would rather your web application not contain the code that initializes the JCR repository, but instead configure ModeShape as a central, shared service that all of your web applications can simply reference and use.

Unfortunately, there is no single way to deploy ModeShape into every web or application server, since they all have slightly different deployment and configuration techniques. The remainder of this section describes how to deploy ModeShape to two popular open source servers.

The JBoss Application Server (or JBoss AS) is a very popular open source Java application server, with an extremely healthy and active community. ModeShape can be deployed into JBoss AS as a central, shared service that can be monitored and administered using the embedded console.

ModeShape provides a downloadable ZIP file that can be unzipped into any JBoss AS profile. When you do this, that profile will contain all the files necessary for ModeShape to run when the server is started. The default configuration is for a single, in-memory repository with two users. However, for anything beyond basic experimentation, you will want to edit the configuration files to define a more robust, persistent and secure configuration.

This JBoss AS distribution ZIP file contains several components:

  • JAR files for the JCR 2.0 API and ModeShape's small extensions to the JCR API on the global classpath (that is, in the "lib/" directory). These APIs are available to all deployed applications, services and components. The JCR API contains the "javax.jcr" packages and has no other dependencies. ModeShape's extensions define interfaces in the "org.modeshape.jcr.api" packages; these extend a few of the standard JCR API interfaces and add several methods to make them more useful.

  • The ModeShape Service, represented as an exploded JAR file in the "deploy" directory. This is where the JcrEngine is running, though any application (or other JBoss service) can access its JCR Repository instances using the standard RepositoryFactory approach (covered in the next chapter) with JNDI URLs:

     jndi:jcr/local?repositoryName=repository

    By default, there is a single in-memory repository named "repository", but this can be changed by simply editing the "deploy/modeshape-services.jar/managedConfigRepository.xml" configuration file. All of ModeShape's standard sequencers and connectors (and JARs for their dependencies) are included, meaning they can be configured for use without worrying about adding JARs to the classpath. Feel free to remove any of the JARs that are not needed for your custom configuration.

  • A pair of JAAS properties files, located in the "conf/props/" directory, that come out of the box with an "admin" user (with password "admin") that has full read, write, and administration privileges, and a "guest" user (with password "guest") that has only read and write privileges. Simply edit these files to change users, passwords, and roles, or to configure JAAS differently.

  • The ModeShape RESTful API, represented as an exploded WAR file in the "deploy" directory. This allows remote applications to interact with ModeShape to access and manipulate repository content using a RESTful API that uses JSON in the requests and responses. All ModeShape repositories can be accessed, and authentication is done using the ModeShape JAAS configuration.

  • The ModeShape WebDAV API, represented as an exploded WAR file in the "deploy" directory. This web application allows external clients to access and manipulate the content in the ModeShape repositories using the standard WebDAV protocol. For example, you can mount a repository (or parts of it) as a network drive on most operating systems, and then upload or download files and folders using standard OS operations and graphical tools. All ModeShape repositories can be accessed, and authentication is done using the ModeShape JAAS configuration.

  • A plugin for the embedded JBoss AS console, represented as a WAR file in the "deploy" directory. This plugin also works with the RHQ administration, monitoring, alerting, operational control and configuration system. (We plan to add more metrics and operations over the next few releases, as we gain more experience using the ModeShape RHQ plugin.)

  • A JDBC driver that allows applications also deployed on the same JBoss AS instance to query the repositories through JDBC. This driver is on the global classpath so it can be used in any deployed component. A single JDBC DataSource is also configured in the "deploy/modeshape-services.jar/modeshape-jdbc-ds.xml" file to use the single default in-memory repository available out of the box. Simply edit this file to add or change the DataSource definitions. The driver can also be used in a separate JVM to issue queries and access database metadata.

  • A remote client JAR that can be used by Java applications to use JDBC or the RESTful API to remotely access a ModeShape repository deployed on JBoss AS. This JAR includes ModeShape's full JDBC driver.

Here are the contents of this file:

conf/
conf/props/
conf/props/modeshape-roles.properties  
conf/props/modeshape-users.properties  
lib/
lib/jcr-2.0.jar         
lib/modeshape-jcr-api-2.5.0.Final.jar  
lib/modeshape-jdbc-local-2.5.0.Final.jar  
deploy/
deploy/modeshape-jboss-beans.xml  
deploy/modeshape-services.jar/
deploy/modeshape-services.jar/META-INF/
deploy/modeshape-services.jar/aperture-1.1.0.Beta1.jar 
deploy/modeshape-services.jar/joda-time-1.6.jar  
deploy/modeshape-services.jar/lucene-analyzers-3.0.2.jar  
deploy/modeshape-services.jar/lucene-core-3.0.2.jar  
deploy/modeshape-services.jar/lucene-regex-3.0.2.jar  
deploy/modeshape-services.jar/lucene-snowball-3.0.2.jar  
deploy/modeshape-services.jar/lucene-misc-3.0.2.jar  
deploy/modeshape-services.jar/poi-3.6.jar  
deploy/modeshape-services.jar/poi-scratchpad-3.6.jar  
deploy/modeshape-services.jar/managedConfigRepository.xml  
deploy/modeshape-services.jar/rdf2go.api-4.6.2.jar
deploy/modeshape-services.jar/META-INF/jboss-beans.xml  
deploy/modeshape-services.jar/modeshape-cnd-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-common-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-filesystem-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-infinispan-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-jbosscache-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-jcr-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-jdbc-metadata-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-store-jpa-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-connector-svn-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-graph-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-jbossas-service-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-jcr-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-jdbc-ds.xml  
deploy/modeshape-services.jar/modeshape-mimetype-detector-aperture-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-repository-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-search-lucene-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-classfile-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-cnd-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-ddl-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-java-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-jbpm-jpdl-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-msoffice-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-teiid-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-text-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-xml-2.5.0.Final.jar  
deploy/modeshape-services.jar/modeshape-sequencer-zip-2.5.0.Final.jar  
deploy/modeshape-rest.war/
deploy/modeshape-rest.war/META-INF/
deploy/modeshape-rest.war/WEB-INF/
deploy/modeshape-rest.war/WEB-INF/lib/
deploy/modeshape-rest.war/META-INF/MANIFEST.MF  
deploy/modeshape-rest.war/WEB-INF/jboss-web.xml  
deploy/modeshape-rest.war/WEB-INF/lib/jaxrs-api-1.2.1.GA.jar  
deploy/modeshape-rest.war/WEB-INF/lib/jettison-1.1.jar  
deploy/modeshape-rest.war/WEB-INF/lib/modeshape-jcr-2.5.0.Final.jar  
deploy/modeshape-rest.war/WEB-INF/lib/modeshape-web-jcr-2.5.0.Final.jar  
deploy/modeshape-rest.war/WEB-INF/lib/modeshape-web-jcr-rest-2.5.0.Final.jar  
deploy/modeshape-rest.war/WEB-INF/lib/resteasy-jaxb-provider-1.2.1.GA.jar  
deploy/modeshape-rest.war/WEB-INF/lib/resteasy-jaxrs-1.2.1.GA.jar  
deploy/modeshape-rest.war/WEB-INF/lib/resteasy-jettison-provider-1.2.1.GA.jar  
deploy/modeshape-rest.war/WEB-INF/lib/scannotation-1.0.2.jar  
deploy/modeshape-rest.war/WEB-INF/web.xml  
deploy/modeshape-webdav.war/
deploy/modeshape-webdav.war/WEB-INF/
deploy/modeshape-webdav.war/WEB-INF/lib/
deploy/modeshape-webdav.war/WEB-INF/jboss-web.xml  
deploy/modeshape-webdav.war/WEB-INF/lib/aperture-1.1.0.Beta1.jar  
deploy/modeshape-webdav.war/WEB-INF/lib/modeshape-jcr-2.5.0.Final.jar  
deploy/modeshape-webdav.war/WEB-INF/lib/modeshape-mimetype-detector-aperture-2.5.0.Final.jar  
deploy/modeshape-webdav.war/WEB-INF/lib/modeshape-web-jcr-2.5.0.Final.jar  
deploy/modeshape-webdav.war/WEB-INF/lib/modeshape-web-jcr-webdav-2.5.0.Final.jar  
deploy/modeshape-webdav.war/WEB-INF/lib/webdav-servlet-2.0.jar  
deploy/modeshape-webdav.war/WEB-INF/web.xml  
deploy/admin-console.war/
deploy/admin-console.war/plugins/
deploy/admin-console.war/plugins/modeshape-jbossas-console-2.5.0.Final.jar  

Your web application or JBoss service can use one of the JCR Repository instances running inside the ModeShape service by simply using the RepositoryFactory technique described earlier, with a URL such as:

 jndi:jcr/local?repositoryName=repository

Be sure to use the correct repository name.

Since the JCR API JAR is on the global classpath, your web application can use the JCR API without having to include the JAR file in your application's WAR file. In fact, your application will likely get ClassCastExceptions if it does include the JCR API in its WAR file. Plus, if needed, your application can use ModeShape's "org.modeshape.jcr.api" extensions to the JCR API (again, on the global classpath), and should not need or use any of the classes or interfaces in the ModeShape implementation.

Each kind of web server or application server is different, but all servlet containers do provide a way of configuring objects and placing them into JNDI. ModeShape provides a JndiRepositoryFactory class that implements the JNDI object factory interface (javax.naming.spi.ObjectFactory) and that can be used in the server's configuration. The JndiRepositoryFactory requires two properties:

  • configFile is the path to the configuration file resource, which must be available on the classpath

  • repositoryName is the name of a JCR repository that exists in the JCR configuration and that will be made available by this JNDI entry

Here's an example of a fragment of the conf/context.xml for Tomcat:


<Resource name="jcr/local" 
          auth="Container"
          type="javax.jcr.Repository"
          factory="org.modeshape.jcr.JndiRepositoryFactory"
          configFile="/resource/path/to/configuration.xml"
          repositoryName="Test Repository Source" />

Note that it is possible to have multiple Resource entries. The JndiRepositoryFactory ensures that only one JcrEngine is instantiated, but that a Repository instance is registered for each entry.

Before the server can start, however, all of the ModeShape jars need to be placed on the classpath for the server. JAAS also needs to be configured, and this can be done using the application server's configuration or in your web application if you're using a simple servlet container. For more details, see the Reference Guide.

Note

The ModeShape community has solicited input on how we can make it easier to consume and use ModeShape in applications that do not use Maven. Check out the discussion thread, and please add any suggestions or opinions!

Then, your web application needs to reference the Resource and state its requirements in its web.xml:


<resource-env-ref>
   <description>Repository</description>
   <resource-env-ref-name>jcr/local</resource-env-ref-name>
   <resource-env-ref-type>javax.jcr.Repository</resource-env-ref-type>
</resource-env-ref>

Note that the value of resource-env-ref-name matches the value of the name attribute on the <Resource> tag in the context.xml described above. This is a must.

At this point, your web application can perform the lookup of the Repository object by using JNDI directly (or the more standard RepositoryFactory technique shown in the next chapter), create and use a Session, and then close the Session. Here's an example of a JSP page that does this:



<%@ page import="javax.naming.*, javax.jcr.*, org.jboss.security.config.IDTrustConfiguration" %>
<%!
static {
    // Initialize IDTrust
    IDTrustConfiguration idtrustConfig = new IDTrustConfiguration();
    try {
        idtrustConfig.config("security/jaas.conf.xml");
    } catch (Exception ex) {
        throw new IllegalStateException(ex);
    }
}
%>
<%
Session sess = null;
try {
    InitialContext initCtx = new InitialContext();
    Context envCtx = (Context) initCtx.lookup("java:comp/env");
    Repository repo = (Repository) envCtx.lookup("jcr/local");
    sess = repo.login(new SimpleCredentials("readwrite", "readwrite".toCharArray()));
    // Do something interesting with the Session ...
    out.println(sess.getRootNode().getPrimaryNodeType().getName());
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    if (sess != null) sess.logout();
}
%>

Since this uses a servlet container, there is no JAAS implementation configured, so note the loading of IDTrust to create the JAAS realm. (To make this work in Tomcat, the security folder that contains the jaas.conf.xml, users.properties, and roles.properties needs to be moved into the %CATALINA_HOME% directory.)

Note

If you deploy your application to JBoss AS or EAP and deploy ModeShape as a service, your application doesn't have to do anything with JAAS, since that's provided by the platform.

Before you deploy ModeShape into your application or its environment, you need to make sure that all of the ModeShape JARs are on the appropriate classpath. Two different scenarios are covered in this section: Maven-based, and using JARs with the traditional classpath.

By far the easiest way to use ModeShape is to use Maven, because with just a few lines of code, Maven will automatically pull all the JARs and source for all of the ModeShape libraries as well as everything those libraries need. All of ModeShape's artifacts for each release are published in the new JBoss Maven repository under the "org.modeshape" group ID.

The JBoss Maven repository not only contains all of the artifacts for ModeShape and other open source projects hosted at JBoss.org, but it also proxies quite a few other repositories that contain many other third-party libraries.

So if you're using Maven (or Ivy), first make sure your project knows about this new JBoss Maven repository. One way to do this is to add the following to your project POM (you'll still likely want to use other Maven repositories for third-party artifacts):


<repositories>
  <repository>
    <id>jboss</id>
    <url>http://repository.jboss.org/nexus/content/groups/public/</url>
  </repository>
</repositories>

Or, you can add this information to your ~/.m2/settings.xml file. For more information, see the JBoss wiki page.

Then, simply modify your project's POM by adding dependencies on the ModeShape JCR library:


<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-jcr</artifactId>
  <version>2.4.0.Final</version>
</dependency>

This adds only the minimal libraries required to use ModeShape. If your application is going to use clustering, you'll need to also depend upon the clustering module:


<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-clustering</artifactId>
  <version>2.4.0.Final</version>
</dependency>

You also need to add dependencies for each of the connectors and sequencers you want to use. Here is the list of available sequencers:


<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-cnd</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-ddl</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-images</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-classfile</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-java</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-mp3</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-msoffice</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-xml</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-teiid</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-text</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-sequencer-zip</artifactId>
  <version>2.4.0.Final</version>
</dependency>

Here is the list of available connectors:


<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-filesystem</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-infinispan</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-jcr</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-jbosscache</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-jdbc-metadata</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-store-jpa</artifactId>
  <version>2.4.0.Final</version>
</dependency>
<dependency>
  <groupId>org.modeshape</groupId>
  <artifactId>modeshape-connector-svn</artifactId>
  <version>2.4.0.Final</version>
</dependency>

The sequencer and connector libraries you choose, plus every third-party library they need, will be pulled in automatically by Maven into your project.

ModeShape is designed to use the same logging framework as your application, and it uses SLF4J to accomplish this. In other words, ModeShape depends upon the SLF4J API library, but requires you to provide a logging implementation as well as the appropriate SLF4J binding JAR.

For example, if your application is using Log4J, it will already have a dependency on it. So that ModeShape log messages are sent to the same logging system used in your application, you also need to add a dependency on the SLF4J-to-Log4J binding JAR:


<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.11</version>
</dependency>
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.16</version>
</dependency>

Of course, SLF4J works with other logging frameworks, too. Some logging implementations (such as LogBack) implement the SLF4J API natively, meaning they require no binding JAR. For details on the options and how to configure them, see the SLF4J manual.

If your application doesn't use Maven, you'll need to obtain the ModeShape JARs and place them onto your application's classpath. ModeShape provides a single download with all of the JARs for all ModeShape components and all dependencies. This file contains the following:

  • modeshape-jcr-2.5.0.Final-jar-with-dependencies.jar contains all of the classes (except those under javax.jcr) necessary to run the core ModeShape JCR repository engine using the in-memory connector and the federating connector;

  • one modeshape-connector-<type>-2.5.0.Final-jar-with-dependencies.jar for each type of connector, each containing all of the classes necessary for that connector, designed to be added to the classpath after the modeshape-jcr-2.5.0.Final-jar-with-dependencies.jar file;

  • one modeshape-sequencer-<type>-2.5.0.Final-jar-with-dependencies.jar for each type of sequencer, each containing all of the classes necessary for that sequencer, designed to be added to the classpath after the modeshape-jcr-2.5.0.Final-jar-with-dependencies.jar file;

  • modeshape-mimetype-detector-aperture-2.5.0.Final-jar-with-dependencies.jar containing all of the classes necessary for detecting the MIME type of files based upon their name and/or content, designed to be added to the classpath after the modeshape-jcr-2.5.0.Final-jar-with-dependencies.jar file.

Note that the core engine is required in all configurations. The jcr-2.0.jar file is not included and must be provided by you. And, as mentioned in the previous section, ModeShape uses SLF4J for logging and you must provide a logging implementation as well as the appropriate SLF4J binding JAR.

The Content Repository for Java Technology API 2.0 provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed as part of the Java Community Process under JSR-170 (JCR 1.0) and has been revised and improved as JCR 2.0 under JSR-283. Some of the improvements make it possible for your application to be written entirely against the JCR 2.0 API.

Note

In the interests of brevity, this chapter does not attempt to reproduce the JSR-283 specification nor provide an exhaustive definition of ModeShape JCR capabilities. Rather, this chapter will describe any deviations from the specification as well as any ModeShape-specific public APIs and configuration. So, for a detailed explanation of the JCR API and its many interfaces and methods, see the JSR-283 specification.

Using ModeShape within your application is actually quite straightforward, and with JCR 2.0 it is possible for your application to do everything using only the JCR 2.0 API. Your application will first obtain a javax.jcr.Repository instance, and will use that object to create sessions through which your application will read, modify, search, or monitor content in the repository. JCR sessions are designed to be lightweight, so it is perfectly fine (and actually recommended) for your application to create many short-lived sessions while generally avoiding longer-lived sessions. In fact, javax.jcr.Session objects are not required to be thread-safe (and are not in ModeShape), so your application should avoid using a single Session instance in multiple threads.

Before we get started talking about how to use ModeShape via the standard JCR 2.0 API, it's worth spending a little time talking about the changes in JCR 2.0 compared with JCR 1.0.

Although an application written against the JCR 1.0 API will for the most part work very well against a JCR 2.0 repository, there are a few improvements to the JCR 2.0 API that your application will likely want to leverage.

Let's look at some of the more important changes in the JCR 2.0 API. However, this is certainly not definitive nor a complete comparison, so please consult the JSR-283 specification.

JCR 1.0 did not specify a way for client applications to obtain the Repository instance, though the JCR 1.0 specification did state this is typically done through JNDI. Consequently, JCR clients either used the JNDI approach or were required to use implementation-specific code. Often, client applications abstracted this process to minimize their reliance upon implementation-specific interfaces.

While the JNDI approach still works, JCR 2.0 introduces a new mechanism that makes it possible to find a Repository instance using only the JCR API. This is covered in more detail later, but suffice it to say that ModeShape does support this new RepositoryFactory approach.

How this affects your application: If your application used an implementation-specific approach to obtaining a Repository instance, you might consider changing it to use the new RepositoryFactory mechanism.

JCR 1.0 has always supported storing binary values in properties, but clients could do little more than just stream the bytes for each value. JCR 2.0 introduces a Binary interface that defines a way to get the size of the binary value, an InputStream to the value, a method for random access to the value's bytes, and a way to dispose of the binary value when completed (allowing the implementation to better clean up memory and other resources).

How this affects your application: The way your existing JCR application accesses and sets binary values will still work, but the methods are now deprecated. Therefore, you will very likely want to change to use the new Binary interface. For example, code that previously accessed the input stream directly from the Property:

Property property = ...

InputStream stream = property.getStream(); // Property.getStream() is deprecated in JCR 2.0
try {
   // Read stream
} finally {
   stream.close();
}

can be minimally changed to first get the Binary value and then get the stream from this Binary value:

Property property = ...

InputStream stream = property.getBinary().getStream(); // Binary.getStream() is the JCR 2.0 replacement
try {
   // Read stream
} finally {
   stream.close();
}

This second example is not using any deprecated methods, but does not actually dispose of the Binary object. This actually works just fine in ModeShape, as closing the InputStream will automatically dispose of the Binary object.

You may also consider whether your application may benefit from the new Binary.getSize() or Binary.read(byte[],long) methods.

JCR 1.0 made it possible for applications to query the repository using XPath and JCR-SQL query languages. JCR 2.0 maintains the (mostly) similar Java interfaces for executing queries, but it deprecates the XPath and JCR-SQL query languages and introduces a new declarative language called "JCR-SQL2" that is a very good improvement over JCR-SQL. JCR 2.0 also introduces a new query object model (called "JCR-QOM") for defining queries using a programmatic API.

ModeShape supports all of these languages (XPath, JCR-SQL, JCR-SQL2, JCR-QOM), and also supports a full-text query language that is defined by the full-text search expression in the JCR-SQL2 language. Additionally, ModeShape extends most of these languages to support richer and more capable queries.

How this affects your application: Your application can continue to use XPath and JCR-SQL queries. However, your application may benefit from switching from JCR-SQL to JCR-SQL2 and its greater capabilities and expressive power. Leverage some of the ModeShape extensions to make your JCR-SQL2 queries even more powerful.

Versioning of nodes was defined as an optional feature of the JCR 1.0 API. The JCR 2.0 API expanded upon versioning by defining a simple versioning model, introducing the VersionManager interface, and making some semantic changes as well. For example, restoring a version that contained a versioned child in its subgraph no longer automatically restores the versioned child. This behavior was ambiguous in the JCR 1.0 specification, and ModeShape 1.x performed the restore operation recursively down the graph. The JCR 2.0 specification more clearly requires a non-recursive restore. ModeShape 2.5.0.Final follows this behavior and supports the "full versioning" model.

How this affects your application: If your application is already using JCR 1.0 versioning feature, be aware that many of the version-related methods on Node were deprecated in JCR 2.0 and moved to the new VersionManager interface. Also, any reliance upon ModeShape's recursive restore operation must be changed, per the JCR 2.0 specification.

Note

Remember to specify the system workspace name for your repositories if using versioning. Otherwise, ModeShape will not persist your versioning information.

Before your application can use a JCR repository, it has to find it. As mentioned above, the JCR 2.0 API defines a new RepositoryFactory interface that can be used with the Java Standard Edition Service Loader mechanism to obtain a Repository instance, all using the JCR API alone:



Map<String,String> parameters = ...
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}

This code looks for all RepositoryFactory implementations on the classpath (assuming those implementations properly defined the service provider within their JARs), and will ask each to create a repository given the supplied parameters. Thus, the parameters are specific to the implementation you want to use.

Note

With JCR 1.0, applications could only find a Repository instance using implementation-specific code. This new JCR 2.0 approach is a bit more complicated, but should work with most JCR 2.0 implementations and does not require using any implementation classes. And your application can even load the parameters from a configuration resource, meaning nothing in your application depends on a particular JCR implementation.

ModeShape uses a single property named "org.modeshape.jcr.URL" with a value that is a URL that either resolves to a ModeShape configuration file or points to a ModeShape engine registered in JNDI. Pointing directly to a configuration file often works well in stand-alone applications or where the configuration is managed in a central system. JNDI works great for applications deployed to server platforms (e.g., an application server or servlet container) where multiple applications might want to use the same JCR repository (or same ModeShape engine). We'll see in the next section how to configure ModeShape's JcrEngine explicitly and register it in JNDI.

So, here's the ServiceLoader example again, but with ModeShape-specific parameters:



String configUrl = ... ; // URL that points to your configuration file
Map<String,String> parameters = Collections.singletonMap("org.modeshape.jcr.URL", configUrl);
Repository repository = null;
for (RepositoryFactory factory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = factory.getRepository(parameters);
    if (repository != null) break;
}
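
As noted earlier, the parameters can be loaded from a configuration resource so that nothing in the application names a particular JCR implementation. The following is a minimal sketch of that idea using only the JDK. The class name RepositoryParameters and its load method are hypothetical helpers, not part of ModeShape or the JCR API, and the sample text stands in for a real properties file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: turns a properties resource into RepositoryFactory parameters. */
final class RepositoryParameters {

    private RepositoryParameters() {}

    /** Reads key=value pairs from the reader and copies them into a String-to-String map. */
    static Map<String, String> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> parameters = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            parameters.put(name, props.getProperty(name));
        }
        return parameters;
    }

    public static void main(String[] args) {
        // In a real application the reader would wrap a file or classpath resource.
        String text = "org.modeshape.jcr.URL=file:path/to/configFile.xml?repositoryName=MyRepository\n";
        Map<String, String> parameters = load(new StringReader(text));
        System.out.println(parameters.get("org.modeshape.jcr.URL"));
    }
}
```

The resulting map can be passed to factory.getRepository(parameters) in the loop shown above, so switching configurations only requires editing the properties resource.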

Once you've gotten hold of a Repository instance, you can use it to create Sessions, using code similar to:



Credentials credentials = ...; // JCR credentials
String workspaceName = ...;  // Name of repository workspace
Session session = repository.login(credentials,workspaceName);

We'll talk about the various ways of creating sessions in a later chapter. First, let's look at the various kinds of URLs that you can use.

The value of configUrl in the code snippets can be any URL that is resolvable on your system. For example:

 file:path/to/configFile.xml?repositoryName=MyRepository 

In this example, the configuration file that specifies the repository setup will be loaded from the relative file path path/to/configFile.xml, and the repository named MyRepository will be returned. If ModeShape cannot find a file at the given path, it will try to load a configuration file as a resource through the classloader.

You might have noticed that this URL contains a query parameter (the "?repositoryName=MyRepository" part). ModeShape strips all query parameters when attempting to resolve file: URLs to the underlying file.
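
To make the split between the file location and the query parameter concrete, here is a small sketch that separates the two parts of such a URL with plain string handling. The class ConfigUrls and its methods are hypothetical and only illustrate the idea; this is not ModeShape's own parsing code, and it does not URL-decode the parameter value.

```java
/** Hypothetical helper that splits a configuration URL into its location and repository name. */
final class ConfigUrls {

    private ConfigUrls() {}

    /** Returns the URL text without the query part, which is what gets resolved to a file. */
    static String stripQuery(String url) {
        int queryStart = url.indexOf('?');
        return queryStart < 0 ? url : url.substring(0, queryStart);
    }

    /** Returns the value of the "repositoryName" query parameter, or null if it is absent. */
    static String repositoryName(String url) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) return null;
        for (String pair : url.substring(queryStart + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && "repositoryName".equals(pair.substring(0, eq))) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "file:path/to/configFile.xml?repositoryName=MyRepository";
        System.out.println(stripQuery(url));      // the part resolved to a file
        System.out.println(repositoryName(url));  // the repository to return
    }
}
```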

Here's another example of a file URL that uses an absolute path to the file:

 file:///path/to/configFile.xml?repositoryName=MyRepository 

Note the addition of the three forward slashes after the protocol portion of the URL (i.e., file:). These indicate the path is absolute.

Other URLs are possible, too. Here is a URL that points to a configuration file stored in a web-enabled service, such as a web server, WebDAV file share, or version control system:

 http://www.example.com/path/to/configFile.xml?repositoryName=MyRepository 

Unlike with "file:" URLs, ModeShape does not strip the URL's query parameters when resolving to the configuration file, since most web servers ignore any query parameters not needed. This allows you to include additional query parameters in the URL if they're needed to retrieve the file from the server.

If your platform supports URLs with the "classpath:" scheme, you can point to a resource file on the classpath:

 classpath:path/to/configFile.xml?repositoryName=MyRepository 

Not all environments have such support, however. Many application servers, including JBoss AS and EAP, do include support by default. However, the Java Standard Edition (SE) does not come with a "classpath:" URL handler, though it is easy to add.
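
For Java SE environments, one way to add such support is a custom java.net.URLStreamHandler that resolves the URL's path through a class loader. The sketch below is one possible approach, not something shipped with ModeShape: the class ClasspathUrlHandler and its classpathUrl method are hypothetical names. It passes the handler directly to the URL constructor, which avoids registering a JVM-wide handler factory; note that recent JDKs mark the URL constructors as deprecated.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Hypothetical handler that resolves "classpath:" URLs through a class loader. */
final class ClasspathUrlHandler extends URLStreamHandler {

    private final ClassLoader loader;

    ClasspathUrlHandler(ClassLoader loader) {
        this.loader = loader;
    }

    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        // getPath() excludes the query, so "?repositoryName=..." does not affect the lookup.
        URL resource = loader.getResource(url.getPath());
        if (resource == null) {
            throw new FileNotFoundException("Not found on classpath: " + url.getPath());
        }
        return resource.openConnection();
    }

    /** Builds a URL whose protocol is handled by this class rather than a registered handler. */
    static URL classpathUrl(String spec, ClassLoader loader) {
        try {
            return new URL(null, spec, new ClasspathUrlHandler(loader));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = classpathUrl("classpath:path/to/configFile.xml?repositoryName=MyRepository",
                               ClassLoader.getSystemClassLoader());
        System.out.println(url.getPath());
        System.out.println(url.getQuery());
    }
}
```

A URL built this way only works where the URL object itself is handed around. Code that receives the URL as a string and parses it again would still need a handler registered with the JVM.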

ModeShape does the same thing with all of these URLs: it looks to see whether it already has started a JcrEngine with a configuration file at the given URL. If so, it uses the value of the "repositoryName" query parameter and passes it to the getRepository(String) method. The result of this method call will be a Repository object that is then returned from the factory.

However, if the RepositoryFactory has not yet seen this URL, it will download the configuration file at the URL, load it using a new JcrConfiguration object, and start a new JcrEngine instance. It then uses the "repositoryName" query parameter to obtain the Repository as mentioned above.
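
The reuse-or-start behavior described above can be pictured as a cache keyed by configuration URL. The following sketch shows that pattern with a generic stand-in for the engine type. EngineCache is a hypothetical class, the starter function stands in for loading a configuration and starting an engine, and whether the key includes the query string is an assumption left to the caller; none of this is ModeShape's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: at most one engine is started per configuration URL. */
final class EngineCache<E> {

    private final Map<String, E> enginesByConfigUrl = new ConcurrentHashMap<>();
    private final Function<String, E> starter;

    /** The starter is invoked only the first time a given configuration URL is seen. */
    EngineCache(Function<String, E> starter) {
        this.starter = starter;
    }

    /** Returns the engine already started for this URL, starting a new one on first use. */
    E engineFor(String configUrl) {
        return enginesByConfigUrl.computeIfAbsent(configUrl, starter);
    }

    public static void main(String[] args) {
        AtomicInteger starts = new AtomicInteger();
        EngineCache<String> cache = new EngineCache<>(url -> "engine#" + starts.incrementAndGet());
        String first = cache.engineFor("file:path/to/configFile.xml");
        String second = cache.engineFor("file:path/to/configFile.xml");
        System.out.println(first.equals(second)); // same engine is reused
        System.out.println(starts.get());         // number of engines started
    }
}
```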

The previous section showed how to use a URL to a configuration file to start a new ModeShape instance. However, ModeShape can be deployed and managed as a central, shared service in a variety of environments, including JBoss AS and EAP. Since a single ModeShape instance can manage multiple repositories, using a single shared instance will have a smaller footprint than multiple ModeShape instances each running a single repository. Plus, the central ModeShape instance can be configured, monitored, administered, and managed without requiring each application to perform these functions.

The easiest and most common way for applications to find and reuse this central, shared ModeShape service is to use JNDI. ModeShape's RepositoryFactory implementation accepts "jndi:" URLs instead of the file-based URL described in the previous chapter. The format of these JNDI URLs is:

 jndi:name/in/jndi?repositoryName=MyRepository 

The RepositoryFactory will look for a ModeShape engine registered in JNDI at "name/in/jndi", and will ask that engine for the Repository instance with the name "MyRepository". Note that when a JNDI URL is used, the RepositoryFactory will never create its own ModeShape engine instance: if none can be found in JNDI, the RepositoryFactory will simply return null.

Sometimes a JNDI implementation will require creating a new InitialContext instance with a hashtable of environment parameters. If this is the case for your environment, simply include those extra parameters in the Map passed into the getRepository(Map) method. ModeShape will forward these extra parameters into the InitialContext constructor it uses to look up the JNDI reference.
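As a rough sketch of what such a parameter map might look like, the following plain-Java example builds a map holding a "jndi:" URL plus two standard JNDI environment properties, and then separates the extra entries into a Hashtable of the kind an InitialContext constructor accepts. Several things here are assumptions: the parameter name "org.modeshape.jcr.URL", the context factory class name, and the provider URL are placeholders for illustration, and the filtering shown is a guess at the behavior described, not ModeShape's code.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import javax.naming.Context;

/** Illustrative only: shows how extra JNDI settings could travel in the parameter map. */
final class JndiParametersSketch {

    /** Assumed name of the parameter that carries the ModeShape URL. */
    static final String URL_PARAMETER = "org.modeshape.jcr.URL";

    /** Copies every parameter except the URL into a JNDI environment table. */
    static Hashtable<String, String> jndiEnvironment(Map<String, String> parameters) {
        Hashtable<String, String> env = new Hashtable<>();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            if (!URL_PARAMETER.equals(entry.getKey())) {
                env.put(entry.getKey(), entry.getValue());
            }
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put(URL_PARAMETER, "jndi:name/in/jndi?repositoryName=MyRepository");
        // Hypothetical values; use whatever your JNDI provider requires.
        parameters.put(Context.INITIAL_CONTEXT_FACTORY, "com.example.HypotheticalContextFactory");
        parameters.put(Context.PROVIDER_URL, "hypothetical://localhost:1099");
        System.out.println(jndiEnvironment(parameters));
    }
}
```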

If your application uses RepositoryFactory with a ModeShape URL pointing to a configuration file, the RepositoryFactory creates an embedded ModeShape engine (or several, if multiple configuration files are used) that maintains a series of connections, thread pools, and other resources. In these cases, your application should shut down ModeShape so that it can properly release all accumulated resources.

The JSR-283 specification does not define a standard way to shut down engines or repositories created as a side effect of using RepositoryFactory, so ModeShape has an extension to the JSR-283 API that provides this capability.

When you obtain your Repository instance using the ServiceLoader mechanism described earlier, keep a reference to the RepositoryFactory that returns a non-null Repository:

Map<String,String> parameters = ...

Repository repository = null;
RepositoryFactory factory = null;
for (RepositoryFactory aFactory : ServiceLoader.load(RepositoryFactory.class)) {
    repository = aFactory.getRepository(parameters);
    if (repository != null) {
        factory = aFactory;
        break;
    }
}

Save this reference where your application's shutdown code can access it, then when your application is terminating, check the type of the factory, cast to the ModeShape extension, and call the "shutdown()" method:

if ( factory instanceof org.modeshape.jcr.api.RepositoryFactory ) {

    ((org.modeshape.jcr.api.RepositoryFactory)factory).shutdown();
}

This call to shutdown() instructs each of the JcrEngine instances created by the factory to shut down gracefully, and returns immediately (without waiting for any of them to complete the shutdown process). If you'd rather block while the engines perform their shutdown, simply supply a timeout:

if ( factory instanceof org.modeshape.jcr.api.RepositoryFactory ) {

    ((org.modeshape.jcr.api.RepositoryFactory)factory).shutdown(30,TimeUnit.SECONDS);
}

This call will wait up to 30 seconds for each JcrEngine to shut down.

Although the preferred mechanism to obtain a Repository object is through the RepositoryFactory interface described above, there are times when an application wants or needs more control over an actual ModeShape engine. The engine encapsulates everything necessary to run one or more JCR repositories: it manages the underlying repository sources, the pools of connections to the sources, the sequencers, the MIME type detector(s), and the Repository implementations.

Note

If your application uses the RepositoryFactory, then you can proceed to the next section.

The first step to programmatically instantiating a ModeShape JcrEngine is to define a configuration file as described in the previous chapter. Then, load that configuration file and check for problems:

JcrConfiguration config = new JcrConfiguration();

config.loadFrom(file);
if ( !config.getProblems().isEmpty() ) {
    for ( Problem problem : config.getProblems() ) {
        // Report these problems!
    }
}

where the file parameter can actually be a File instance, a URL to the file, an InputStream containing the contents of the file, or a String containing the path to the configuration file.

Note

The loadFrom(...) method can be called any number of times, but each time it is called it completely wipes out any current notion of the configuration and replaces it with the configuration found in the file.

There is an optional second parameter that defines the Path within the configuration file identifying the parent node of the various configuration nodes. If not specified, it assumes "/". This makes it possible for the configuration content to be located at a different location in the hierarchical structure. (This is not often required, but it is very useful if your ModeShape configuration file is embedded within another XML file.)

Note

If your application is coding against the ModeShape classes, you may also consider programmatically creating the configuration. This is useful when you cannot predefine a configuration, but instead have to build one based upon some parameters known only at runtime. Of course, you can always create the configuration programmatically, write that configuration out to a file, and then load the configuration using the standard RepositoryFactory mechanism.

Once you have a valid JcrConfiguration instance with no errors, you can build and start the JcrEngine:



JcrConfiguration config = ...
JcrEngine engine = config.build();
engine.start();
 

Obtaining a JCR Repository instance is a matter of simply asking the engine for it by the name defined in the configuration:



javax.jcr.Repository repository = engine.getRepository("Name of repository");
 

At this point, your application can proceed by working with the JCR API.

And, once you're finished with the JcrEngine, you should shut it down:



engine.shutdown();
engine.awaitTermination(3,TimeUnit.SECONDS);    // optional
 

When the shutdown() method is called, the Repository instances managed by the engine are marked as being shut down, and they will not be able to create new Sessions. However, any existing Sessions or ongoing operations (e.g., event notifications) present at the time of the shutdown() call will be allowed to finish. In essence, shutdown() is a graceful request, and since it may take some time to complete, you can wait until the shutdown has completed by simply calling awaitTermination(...) as shown above. This method will block until the engine has indeed shutdown or until the supplied time duration has passed (whichever comes first). And, yes, you can call the awaitTermination(...) method repeatedly if needed.
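The shutdown() and awaitTermination(...) pair behaves much like the methods of the same names on java.util.concurrent.ExecutorService, so the pattern can be tried out with a plain executor. The sketch below is an analogy only and does not use any ModeShape classes: the class GracefulShutdownSketch is hypothetical, the executor stands in for the engine, and a rejected task stands in for a refused attempt to create a new Session.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Analogy only: an ExecutorService standing in for an engine being shut down. */
final class GracefulShutdownSketch {

    /** Returns true if in-flight work finished and new work was refused after shutdown(). */
    static boolean demonstrate() {
        ExecutorService service = Executors.newSingleThreadExecutor();
        // Work that is already running when shutdown is requested.
        service.execute(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        service.shutdown(); // graceful request; returns immediately

        boolean refused = false;
        try {
            service.execute(() -> { }); // new work is no longer accepted
        } catch (RejectedExecutionException expected) {
            refused = true;
        }

        try {
            // Block until existing work completes or the timeout elapses.
            boolean terminated = service.awaitTermination(3, TimeUnit.SECONDS);
            return refused && terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("graceful shutdown observed: " + demonstrate());
    }
}
```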

Once you have obtained a reference to the JCR Repository, you can create a JCR session using one of its login(...) methods. The JSR-283 specification provides four login methods, but the behavior of these methods depends on the kind of authentication system your application is using.

The login() method allows the implementation to choose its own security context to create a session in the default workspace for the repository. The ModeShape JCR implementation uses the security context from the current JAAS AccessControlContext. This implies that this method will throw a LoginException if it is not executed as a PrivilegedAction (AND the JcrRepository.Options.ANONYMOUS_USER_ROLES option does not allow access; see below for an example of how to configure guest user access). Here is one example of how this might work:

Subject subject = ...;
Session session = Subject.doAsPrivileged(subject, 
    new PrivilegedExceptionAction<Session>() {
        public Session run() throws Exception {
            return repository.login();
        }
    }, AccessController.getContext());

Another variant of this is to use the AccessControlContext directly, which then operates against the current Subject:

Session session = AccessController.doPrivileged( 
    new PrivilegedExceptionAction<Session>() {
        public Session run() throws Exception {
            return repository.login();
        }
    });

Either of these approaches will yield a session with the same user name and roles as subject. The login(String workspaceName) method is comparable and allows the workspace to be specified by name:

Subject subject = ...;
final String workspaceName = ...;
Session session = (Session) Subject.doAsPrivileged(subject, 
    new PrivilegedExceptionAction<Session>() {
        public Session run() throws Exception {
            return repository.login(workspaceName);
        }
    }, AccessController.getContext());

The JCR API also allows supplying a JCR Credentials object directly as part of the login process, although ModeShape imposes some requirements on what types of Credentials may be supplied. The simplest way is to provide a JCR SimpleCredentials object. These credentials will be validated against the JAAS realm named "modeshape-jcr", unless another realm name is provided as an option during the JCR repository configuration. For example:

String userName = ...;
char[] password = ...;
Session session = repository.login(new SimpleCredentials(userName, password));

Similarly, the login(Credentials credentials, String workspaceName) method enables passing the credentials and a workspace name:

String userName = ...;
char[] password = ...;
String workspaceName = ...;
Credentials credentials = new SimpleCredentials(userName, password);
Session session = repository.login(credentials, workspaceName);

If you want to use a different JAAS realm than the one ModeShape is configured to use, you can use a JaasCredentials instance to pass the actual JAAS LoginContext that should be used for authentication and authorization:

LoginContext loginContext = ...;
Credentials credentials = new JaasCredentials(loginContext);
String workspaceName = ...;
Session session = repository.login(credentials,workspaceName);

Note that even in this case, ModeShape will still use the same roles for authorization.

Servlet-based applications can make use of the servlet's existing authentication mechanism from HttpServletRequest. Please note that the example below assumes that the servlet has a security constraint that prevents unauthenticated access.

HttpServletRequest request = ...;
Session session = repository.login(new ServletCredentials(request));

The ServletCredentials is just a JCR Credentials implementation that uses ModeShape's ServletSecurityContext to delegate the authorization requests to HttpServletRequest's "hasRole" method. The ServletCredentials class is in the small "modeshape-web-jcr" module, so feel free to use this class in your servlet-based applications.

By default, ModeShape allows guest users full administrative access. This is done to make it easier to get started with ModeShape. Of course, this is clearly not an appropriate security model for a production system.

To modify the roles granted to guest users, change the JcrRepository.Options.ANONYMOUS_USER_ROLES option for your repository to have a different value, like "" (to disable guest access entirely) or "readonly" (to give guests read-only access to all repositories). The value of this option can be any pattern that matches those described in the table below.

Note

The Using ModeShape chapter of the Getting Started Guide provides examples of modifying this option through programmatic configuration or in an XML configuration file.

Once ModeShape is configured properly, getting anonymous JCR sessions requires no authentication. The easiest way to do this is to use the JCR API methods that do not have Credentials parameters. For example, this gets an anonymous session to the default workspace:

Session session = repository.login();

while the following gets an anonymous session to the workspace with the supplied name:

String workspaceName = ...;
Session session = repository.login(workspaceName);

Per the JCR API, these are equivalent to passing a null Credentials reference to the "login" methods, so you can choose that approach as well. ModeShape provides the AnonymousCredentials implementation that can be used if your application expects to use a non-null Credentials object:

Session session = repository.login(new AnonymousCredentials());

or

String workspaceName = ...;
Session session = repository.login(new AnonymousCredentials(),workspaceName);

If you supply any other Credentials implementation to the "login" methods, ModeShape will not treat it as an anonymous login and will authenticate using JAAS or, if the credentials is a SecurityContextCredentials instance, its SecurityContext instance. In other words, there's no way to turn off authentication, but you can use anonymous sessions.

Not all applications can or want to use JAAS for their authentication system, so ModeShape provides a way to integrate your own custom security provider. The first step is to provide a custom implementation of SecurityContext that integrates with your application security, allowing ModeShape to discover the authenticated user's name, determine whether the authenticated user has been assigned particular roles (see the JCR Security section), and to notify your application security system that the authenticated session (for JCR) has ended.

The next step is to wrap your SecurityContext instance within an instance of SecurityContextCredentials, and pass it as the Credentials parameter in one of the two login(...) methods:

SecurityContext securityContext = new CustomSecurityContext(...);
Session session = repository.login(new SecurityContextCredentials(securityContext));
			

Once the Session is obtained, the repository content can be accessed and modified like any other JCR repository.

We believe that the ModeShape JCR implementation is JCR-compliant, but we are awaiting final certification of compliance. Additionally, the JCR specification allows some latitude to implementors for some implementation details. The sections below clarify ModeShape's current and planned behavior. As always, please consult the current list of known issues and bugs.

ModeShape 2.5.0.Final implements all of the JCR 2.0 required features:

ModeShape supports several query languages, including the JCR-SQL2 and JCR-QOM query languages defined in JSR-283, and the XPath and JCR-SQL languages defined in JSR-170 but deprecated in JSR-283. ModeShape also supports a fulltext search language that is defined by the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language. We just pulled it out and made it available as a first-class query language.

The ModeShape project has not yet been certified to be fully-compliant with the JCR 2.0 specification, but does plan on attaining this certification in the very near future.

However, the ModeShape project also runs the JCR TCK unit tests from the reference implementation every night. These tests technically do not represent the official TCK, but are used within the TCK. Most of these unit tests are run in the modeshape-jcr module against the in-memory repository to ensure our JCR implementation behaves correctly, and the same tests are run in the modeshape-integration-tests module against a variety of connectors to ensure they're implemented correctly. The modeshape-jcr-tck module runs all of these TCK unit tests, and currently there are only a handful of failures due to known issues (see the JCR specification support section for details).

ModeShape 2.5.0.Final currently passes 1372 of the 1391 JCR TCK tests, where 17 of these 19 failures appear to be bugs in the TCK tests (see JCR-2648, JCR-2661, JCR-2662, and JCR-2663). The remaining 2 failures are due to a known issue (see MODE-760).

Although the JSR-283 specification requires implementation of the Session.checkPermission(String, String) method, it allows implementors to choose the granularity of their access controls. ModeShape supports coarse-grained, role-based access control at the repository and workspace level.

ModeShape has extended the set of JCR-defined actions ("add_node", "set_property", "remove", and "read") with additional actions ("register_type", "register_namespace", "unlock_any", "create_workspace" and "delete_workspace"). The "register_type" and "register_namespace" permissions control the ability to register (and unregister) node types and namespaces, respectively. The "unlock_any" permission grants the user the ability to unlock any locked node or branch (as opposed to users without that permission, who can only unlock nodes or branches that they have locked themselves or for which they hold the lock token). Finally, the "create_workspace" and "delete_workspace" permissions grant the user the ability to create workspaces and delete workspaces, respectively, using the corresponding methods on Workspace. Permissions to perform these actions are aggregated in roles that can be assigned to users.

ModeShape currently defines three roles: readonly, readwrite, and admin. If the Credentials passed into Repository.login(...) (or the Subject from the AccessControlContext, if one of the no-credential login methods was used) have any of these roles, the session will have the corresponding access to all workspaces within the repository. The mapping from the roles to the actions that they allow is provided below, for any values of path.


It is also possible to grant access only to one or more repositories on a single ModeShape server or to one or more named workspaces within a repository. The format for role names is defined below:


It is also possible to grant more than one role to the same user. For example, the user "jsmith" could be granted the roles "readonly.production", "readwrite.production.jsmith", and "readwrite.staging" to allow read-only access to any workspace on a production repository, read/write access to a personal workspace on the same production repository, and read/write access to any workspace in a staging repository.
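The "jsmith" example suggests that a role name consists of a base role optionally followed by a repository name and then a workspace name, separated by periods. Under that assumption (inferred from the example above, not taken from ModeShape's source), the scope check could be sketched as follows. The class RoleScopeSketch and its methods are hypothetical, and the sketch deliberately does not model which actions each base role allows.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: scope matching for role names such as "readwrite.production.jsmith". */
final class RoleScopeSketch {

    /** The role name without any repository or workspace qualifier. */
    static String baseRole(String role) {
        int dot = role.indexOf('.');
        return dot < 0 ? role : role.substring(0, dot);
    }

    /** True if the role's optional qualifiers match the given repository and workspace. */
    static boolean appliesTo(String role, String repository, String workspace) {
        String[] parts = role.split("\\.", 3);
        if (parts.length >= 2 && !parts[1].equals(repository)) return false;
        if (parts.length == 3 && !parts[2].equals(workspace)) return false;
        return true;
    }

    /** True if any granted role has the given base name and covers the repository and workspace. */
    static boolean hasRole(Iterable<String> granted, String base, String repository, String workspace) {
        for (String role : granted) {
            if (baseRole(role).equals(base) && appliesTo(role, repository, workspace)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jsmith = Arrays.asList(
            "readonly.production", "readwrite.production.jsmith", "readwrite.staging");
        System.out.println(hasRole(jsmith, "readwrite", "production", "jsmith"));
        System.out.println(hasRole(jsmith, "readwrite", "production", "default"));
        System.out.println(hasRole(jsmith, "readonly", "production", "default"));
    }
}
```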

As a final note, the ModeShape JCR implementation may have additional security roles added in the future. A CONNECT role is already being used by the ModeShape REST Server to control whether users have access to the repository through that means.

ModeShape supports all of the built-in node types described in the JSR-283 specification. ModeShape also defines some custom node types in the mode namespace, but none of these node types (other than mode:resource) are intended to be used by developers integrating with ModeShape and may be changed or removed at any time.

Although the JSR-283 specification does not require support for registration and unregistration of custom types, ModeShape supports this extremely useful feature. Custom node types can be added at startup, as noted above, at runtime using the standard JCR API for managing node types, or at runtime by reading CND files or Jackrabbit XML files. These node type registration mechanisms are supported equally within ModeShape, although defining node types in standard CND files is recommended for portability.

Note

ModeShape also supports defining custom node types to load at startup. This is discussed in more detail in the previous chapter.

The JCR 2.0 API provides a mechanism for registering and unregistering node types. Registration is done by creating NodeTypeTemplate objects, NodeDefinitionTemplate objects (for child node definitions), and PropertyDefinitionTemplate objects (for property definitions). Use the setter methods to set the various attributes, and then register the node type definition with the NodeTypeManager:

Session session = ... ;
Workspace workspace = session.getWorkspace();

// Obtain the node type manager (standard JCR 2.0 API) ...
NodeTypeManager nodeTypeManager = workspace.getNodeTypeManager();

// Declare a mixin node type named "searchable" (with no namespace)
NodeTypeTemplate nodeType = nodeTypeManager.createNodeTypeTemplate();
nodeType.setName("searchable");
nodeType.setMixin(true);

// Add a mandatory child named "source" with a required primary type of "nt:file" 
NodeDefinitionTemplate childNode = nodeTypeManager.createNodeDefinitionTemplate();
childNode.setName("source");
childNode.setMandatory(true);
childNode.setRequiredPrimaryTypeNames(new String[] { "nt:file" });
childNode.setDefaultPrimaryTypeName("nt:file");
nodeType.getNodeDefinitionTemplates().add(childNode);

// Add a multi-valued STRING property named "keywords"
PropertyDefinitionTemplate property = nodeTypeManager.createPropertyDefinitionTemplate();
property.setName("keywords");
property.setMultiple(true);
property.setRequiredType(PropertyType.STRING);
nodeType.getPropertyDefinitionTemplates().add(property);

// Register the custom node type
nodeTypeManager.registerNodeType(nodeType,false);

Residual properties and child node definitions can also be defined simply by not calling setName on the template.

ModeShape also supports a simple means of unregistering types, although it is not possible to unregister types that are currently being used by nodes or as required primary types or supertypes of other types. Unused node types can be unregistered with the following code, using the standard JCR 2.0 API:

String[] unusedNodeTypeNames = ...;

Session session = ... ;
NodeTypeManager nodeTypeManager = session.getWorkspace().getNodeTypeManager();
nodeTypeManager.unregisterNodeTypes(unusedNodeTypeNames);

This approach is often used to register custom node types within an application, when the application knows the node type definitions or retrieves these definitions from some persisted format (e.g., file, database, etc.). However, ModeShape provides some utilities if you want to programmatically register node types defined in certain file formats. We'll see in the next section how to use these.

Custom node types can be defined more succinctly through the CND file format defined by the JCR 2.0 specification. In fact, this is how JBoss ModeShape defines its built-in node types. An example CND file that declares the same node type as above would be:

[searchable] mixin
- keywords (string) multiple
+ source (nt:file) = nt:file mandatory

This definition could then be registered as part of the repository configuration (see the previous chapter). Alternatively, you can use a Session to programmatically register the node types in a CND file, but this requires a ModeShape-specific class to read the file:

Session session = ...
CndNodeTypeReader reader = new CndNodeTypeReader(session);
reader.read(cndFile); // from file, file system path, classpath resource, URL, etc.

if (!reader.getProblems().isEmpty()) {
  for (Problem problem : reader.getProblems()) {
    // report or record problem
  }
} else {
  boolean allowUpdate = ...
  NodeTypeManager nodeTypeManager = session.getWorkspace().getNodeTypeManager();
  nodeTypeManager.registerNodeTypes(reader.getNodeTypeDefinitions(), allowUpdate);
}

The CndNodeTypeReader class provides a number of read(...) methods that accept Files, paths to files on the file system, the names of resources on the classpath, and InputStreams. CndNodeTypeReader will also register any namespace mappings defined in the CND file but not yet registered in the session or workspace. For details, see the JavaDoc for CndNodeTypeReader. If you have multiple CND files, you can either call read(...) multiple times before registering (as long as the CND files don't contain duplicate node type definitions), or you can simply create and use a new reader for each CND file. The choice is yours.

ModeShape also provides a class that reads the node types defined in the Jackrabbit XML format. This is useful if you've been using Jackrabbit, have defined your custom node types in the Jackrabbit-specific format, but want to switch to ModeShape and don't want to manually convert your node types into the standard CND format. This class is used almost identically to the CndNodeTypeReader class described above:

Session session = ...
JackrabbitXmlNodeTypeReader reader = new JackrabbitXmlNodeTypeReader(session);
reader.read(xmlFile); // from file, file system path, classpath resource, URL, etc.

if (!reader.getProblems().isEmpty()) {
  for (Problem problem : reader.getProblems()) {
    // report or record problem
  }
} else {
  boolean allowUpdate = ...
  NodeTypeManager nodeTypeManager = session.getWorkspace().getNodeTypeManager();
  nodeTypeManager.registerNodeTypes(reader.getNodeTypeDefinitions(), allowUpdate);
}

The JCR API defines a way to query a repository for content that meets user-defined criteria. The JCR 2.0 API actually makes it possible for implementations to support multiple query languages, and the specification requires support for two languages: JCR-SQL2 and JCR-QOM. JCR 1.0 defined two other languages (XPath and JCR-SQL), though these languages were deprecated in JCR 2.0.

At this time, ModeShape supports all of these query languages, plus one search-engine-like language called "search" that is actually just the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language.

ModeShape handles all of these languages in nearly the same manner, the only difference being whether the query is represented as a string or built programmatically using the javax.jcr.query.qom part of the JCR API. Each query is processed in the following steps:

  1. A language-independent representation, called the query model, is constructed by parsing the string representation of the query (using a language-specific parser) or the JCR-QOM objects created by the client.

  2. The language-independent query model is used to create a canonical (relational) query plan.

  3. The canonical query plan is then validated to ensure that all identifiers in the query are resolvable.

  4. The canonical query plan is then optimized using a flexible rule-based optimizer. Optimizations include (but are not limited to): replace view references; unify handling of aliases; convert right outer joins into left outer joins; choose algorithms for each join; raise and lower criteria; push projection of columns as low in the plan as possible; duplicate criteria across identity joins; rewrite identity joins involving only columns that form keys; remove parts of the plan that (based upon the criteria) will return no rows; determination of the low-level "access" queries that will be submitted to the connector layer.

  5. The optimized query plan is then executed, whereby each access query is pushed down to the connector and the results are then processed and combined to produce the desired result set.

Note that only the parsing step is dependent upon the query language. This means that all of the query languages are processed using the same, unified engine.

The rest of this chapter describes how your applications can use queries to search your repositories, and outlines the specifics of each of the four query languages available in ModeShape.

With ModeShape, all query operations can be performed using only the JCR API interfaces. The first step is to obtain the QueryManager from your Session instance. The QueryManager interface defines methods for creating Query objects, executing queries, storing queries (not results) as Nodes in the repository, and reconstituting queries that were stored on Nodes. Thus, querying a repository generally follows this pattern:



// Obtain the query manager for the session ...
javax.jcr.query.QueryManager queryManager = session.getWorkspace().getQueryManager();
// Create a query object ...
String language = ...
String expression = ...
javax.jcr.query.Query query = queryManager.createQuery(expression,language);
// Execute the query and get the results ...
javax.jcr.query.QueryResult result = query.execute();
// Iterate over the nodes in the results ...
javax.jcr.NodeIterator nodeIter = result.getNodes();
while ( nodeIter.hasNext() ) {
    javax.jcr.Node node = nodeIter.nextNode();
    ...
}
// Or iterate over the rows in the results ...
String[] columnNames = result.getColumnNames();
javax.jcr.query.RowIterator rowIter = result.getRows();
while ( rowIter.hasNext() ) {
    javax.jcr.query.Row row = rowIter.nextRow();
    // Iterate over the column values in each row ...
    javax.jcr.Value[] values = row.getValues();
    for ( javax.jcr.Value value : values ) {
        ...
    }
    // Or access the column values by name ...
    for ( String columnName : columnNames ) {
        javax.jcr.Value value = row.getValue(columnName);
        ...
    }
}
// When finished, close the session ...
session.logout();

For more detail about these methods or about how to use other facets of the JCR query API, please consult chapter 6 of the JCR 2.0 specification.

The JCR 1.0 specification uses the XPath query language because node structures in JCR are very analogous to the structure of an XML document. Thus, XPath provides a useful language for selecting and searching workspace content. And since JCR 1.0 defines a mapping between XML and a workspace view called the "document view", adapting XPath to workspace content is quite natural.

A JCR XPath query specifies the subset of nodes in a workspace that satisfy the constraints defined in the query. Constraints can limit the nodes in the results to be those nodes with a specific (primary or mixin) node type, with properties having particular values, or to be within a specific subtree of the workspace. The query also defines how the nodes are to be returned in the result sets using column specifiers and ordering specifiers.

ModeShape offers a bit more functionality in the "jcr:contains(...)" clauses than required by the specification. In particular, the second parameter specifies the search expression, and for these ModeShape accepts full-text search language expressions, including wildcard support.

Note

As an aside, ModeShape actually implements XPath queries by transforming them into the equivalent JCR-SQL2 representation. And the JCR-SQL2 language, although often more verbose, is much more capable of representing complex queries with multiple combinations of type, property, and path constraints.

JCR 1.0 specifies that support is required only for specifying constraints of one primary type, and it is optional to support specifying constraints on one (or more) mixin types. The specification also defines that the XPath element test be used to test against node types, and that it is optional to support element tests on location steps other than the last one. Type constraints are inherently inheritance-sensitive, in that a constraint against a particular node type 'X' will be satisfied by nodes explicitly declared to be of type 'X' or of subtypes of 'X'.

ModeShape does support using the element test to test against primary or mixin type. ModeShape also only supports using an element test on the last location step. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.


Note that the JCR-SQL2 language supported by ModeShape is far more capable of joining multiple sets of nodes with different type, property and path constraints.

JCR 1.0 specifies that attribute tests on the last location step are required, but that predicate tests on any other location steps are optional.

ModeShape does support using attribute tests on the last location step to specify property constraints, as well as supporting axis and filter predicates on other location steps. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.


Section 6.6.3.3 of the JCR 1.0 specification contains an in-depth description of property value constraints using various comparison operators.

JCR 1.0 specifies that exact, child node, and descendants-or-self path constraints be supported on the location steps in an XPath query.

ModeShape supports each of these kinds of path constraints. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.


Note that the JCR-SQL2 language supported by ModeShape is capable of representing a wider combination of path constraints, although the XPath expressions are easier to understand and significantly shorter.

Also, path constraints in XPath do not need to specify wildcards for the same-name-sibling (SNS) indexes, as XPath should naturally find all nodes regardless of the SNS index, unless the SNS index is explicitly specified. In other words, any path segment that does not have an explicit SNS index (or an SNS index of '[%]' or '[_]') will match all SNS index values. However, any segments in the path expression that have an explicit numeric SNS index will require an exact match. Thus this path constraint:

/a/b/c[2]/d[%]/%/e[_]

will effectively be converted into

/a[%]/b[%]/c[2]/d[%]/%/e[_]

This behavior is very different from how JCR-SQL and JCR-SQL2 path constraints are handled, since these languages interpret the lack of an SNS index as equating to '[1]'. To achieve the XPath-like matching, a query written in JCR-SQL or JCR-SQL2 would need to explicitly include '[%]' in each path segment where an SNS index literal is not already specified.
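The expansion described above can be sketched in a few lines of Java. This is only an illustration of the rule, not ModeShape's implementation: the class and method names are hypothetical, and leaving a bare '%' segment untouched is an assumption read off the example above.

```java
final class SnsPathExpander {

    /** Appends "[%]" to every path segment that lacks an explicit SNS index. */
    static String expand(String pathConstraint) {
        StringBuilder expanded = new StringBuilder();
        for (String segment : pathConstraint.split("/")) {
            if (segment.isEmpty()) {
                continue; // the leading '/' produces an empty first element
            }
            expanded.append('/').append(segment);
            boolean hasIndex = segment.endsWith("]");
            // Assumption: a bare '%' segment is left as-is, as in the example above.
            boolean isWildcardSegment = segment.equals("%");
            if (!hasIndex && !isWildcardSegment) {
                expanded.append("[%]");
            }
        }
        return expanded.toString();
    }

    public static void main(String[] args) {
        // Prints /a[%]/b[%]/c[2]/d[%]/%/e[_]
        System.out.println(expand("/a/b/c[2]/d[%]/%/e[_]"));
    }
}
```

A JCR-SQL or JCR-SQL2 path constraint would need the second form written out by hand to get the same matching behavior.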

The JCR-SQL query language is defined by the JCR 1.0 specification as a way to express queries using strings that are similar to SQL. Support for the language is optional, and in fact this language was deprecated in the JCR 2.0 specification in favor of the improved and more powerful (and more SQL-like) JCR-SQL2 language, which is covered in the next section.

The JCR 2.0 specification defines how nodes in a repository are mapped onto relational tables queryable through a SQL-like language, including JCR-SQL and JCR-SQL2. Basically, each node type is mapped as a relational view with a single column for each of the node type's (residual and non-residual) property definitions. Conceptually, each node in the repository then appears as a record inside the view corresponding to the node type for which "Node.isNodeType(nodeTypeName)" would return true.

Since each node likely returns true from this method for multiple node types (e.g., the primary node type, the mixin types, and all supertypes of the primary and mixin node types), all nodes will likely appear as records in multiple views. And since each view only exposes those properties defined by (or inherited by) the corresponding node type, a full picture of a node will likely require joining the views for multiple node types. This special kind of join, where the nodes have the same identity on each side of the join, is referred to as an identity join, and is handled very efficiently by ModeShape.

ModeShape includes support for the JCR-SQL language, and adds several extensions to make it even more powerful and useful:

  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the rows that are common to two result sets, and the EXCEPT operator returns the difference between two result sets (the rows of the first that do not appear in the second). Duplicate rows are removed unless the operator is followed by the ALL keyword. For details, see the grammar for set queries.

  • Removal of duplicate rows in the results, using "SELECT DISTINCT ...".

  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped.

  • Support for the IN and NOT IN clauses to more easily and concisely supply multiple discrete static operands. For example, "WHERE ... prop1 IN (3,5,7,10,11,50) ...".

  • Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... prop1 BETWEEN 3 EXCLUSIVE AND 10 ...".

  • Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, this query "WHERE ... prop1 IN ( SELECT my:prop2 FROM my:type2 WHERE my:prop3 < '1000' ) AND ..." will use the results of the subquery as the literal values in the IN clause.
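The set operations in the list above follow standard SQL semantics, which can be pictured with plain Java collections. The sketch below models each row as a single string (a node path) and is only an illustration; the class and method names are hypothetical and say nothing about how ModeShape evaluates set queries internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetOperations {

    /** UNION: rows from either result set, duplicates removed. */
    static <T> Set<T> union(List<T> left, List<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.addAll(right);
        return result;
    }

    /** UNION ALL: rows from both result sets, duplicates kept. */
    static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> result = new ArrayList<>(left);
        result.addAll(right);
        return result;
    }

    /** INTERSECT: rows that appear in both result sets. */
    static <T> Set<T> intersect(List<T> left, List<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.retainAll(right);
        return result;
    }

    /** EXCEPT: rows of the left result set that do not appear in the right one. */
    static <T> Set<T> except(List<T> left, List<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("/a", "/b", "/b", "/c");
        List<String> right = Arrays.asList("/b", "/d");
        System.out.println(union(left, right));     // [/a, /b, /c, /d]
        System.out.println(intersect(left, right)); // [/b]
        System.out.println(except(left, right));    // [/a, /c]
    }
}
```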

The grammar for the JCR-SQL query language is actually a superset of that defined by the JCR 1.0 specification, and as such the complete grammar is included here.

Note

The grammar is presented using the same EBNF nomenclature as used in the JCR 1.0 specification. Terms surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values. Literals (or keywords) are denoted by single-quotes.

QueryCommand ::= Query | SetQuery

SetQuery ::= Query ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query
                 { ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query }

Query ::= Select From [Where] [OrderBy] [Limit]

Select ::= 'SELECT' ('*' | Proplist ) 

From ::= 'FROM' NtList 

Where ::= 'WHERE' WhereExp

OrderBy ::= 'ORDER BY' propname [Order] {',' propname [Order]}

Order ::= 'DESC' | 'ASC'

Proplist ::= propname {',' propname}

NtList ::= ntname {',' ntname}

WhereExp ::= propname Op value |
             propname 'IS' ['NOT'] 'NULL' |
             like | 
             contains | 
             whereexp ('AND'|'OR') whereexp | 
             'NOT' whereexp |
             '(' whereexp ')' | 
             joinpropname '=' joinpropname |
             between |
             propname ['NOT'] 'IN' '(' value {',' value } ')'

Op ::= '='|'>'|'<'|'>='|'<='|'<>'

joinpropname ::= quotedjoinpropname | unquotedjoinpropname
quotedjoinpropname ::= ''' unquotedjoinpropname '''
unquotedjoinpropname ::= ntname '.jcr:path'

propname ::= quotedpropname | unquotedpropname
quotedpropname ::= ''' unquotedpropname '''
unquotedpropname ::= /* A property name, possibly a pseudo-property: jcr:score or jcr:path */

ntname ::= quotedntname | unquotedntname 
quotedntname ::= ''' unquotedntname ''' 
unquotedntname ::= /* A node type name */ 

value ::= literal | subquery

literal ::= ''' literalvalue ''' | literalvalue
literalvalue ::= /* A property value (in standard string form) */

subquery ::= '(' QueryCommand ')' | QueryCommand

like ::= propname 'LIKE' likepattern [ escape ]
likepattern ::= ''' likechar { likepattern } '''
likechar ::= char | '%' | '_'

escape ::= 'ESCAPE' ''' likechar '''

char ::= /* Any character valid within the string representation of a value
            except for the characters % and _ themselves. These must be escaped */

contains ::= 'CONTAINS(' scope ',' searchexp ')'
scope ::= unquotedpropname | '.'
searchexp ::= ''' exp '''
exp ::= ['-']term {whitespace ['OR'] whitespace ['-']term}
term ::= word | '"' word {whitespace word} '"'
word ::= /* A string containing no whitespace */
whitespace ::= /* A string of only whitespace*/

between ::= propname ['NOT'] 'BETWEEN' lowerBound ['EXCLUSIVE'] 
                                 'AND' upperBound ['EXCLUSIVE']
lowerBound ::= value
upperBound ::= value

Limit ::= 'LIMIT' count [ 'OFFSET' offset ]
count ::= /* Positive integer value */
offset ::= /* Non-negative integer value */

The JCR-SQL2 query language is defined by the JCR 2.0 specification as a way to express queries using strings that are similar to SQL. This query language is an improvement over the JCR-SQL language, providing among other things far richer specifications of joins and criteria.

ModeShape includes full support for the complete JCR-SQL2 query language. However, ModeShape adds several extensions to make it even more powerful:

  • Support for the "FULL OUTER JOIN" and "CROSS JOIN" join types, in addition to the "LEFT OUTER JOIN", "RIGHT OUTER JOIN" and "INNER JOIN" types defined by JCR-SQL2. Note that "JOIN" is a shorthand for "INNER JOIN". For detail, see the grammar for joins.

  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the rows that are common to two result sets, and the EXCEPT operator returns the difference between two result sets (the rows of the first that do not appear in the second). Duplicate rows are removed unless the operator is followed by the ALL keyword. For details, see the grammar for set queries.

  • Removal of duplicate rows in the results, using "SELECT DISTINCT ...". For detail, see the grammar for queries.

  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped. For detail, see the grammar for limits and offsets.

  • Additional dynamic operands "DEPTH([<selectorName>])" and "PATH([<selectorName>])" that enable placing constraints on the node depth and path, respectively. These dynamic operands can be used in a manner similar to "NAME([<selectorName>])" and "LOCALNAME([<selectorName>])" that are defined by JCR-SQL2. Note in each of these cases, the selector name is optional if there is only one selector in the query. For detail, see the grammar for dynamic operands.

  • Additional dynamic operands "REFERENCE([<selectorName>.]<propertyName>)" and "REFERENCE([<selectorName>])" that enable placing constraints on one or any of the reference properties, respectively, and which can be used in a manner similar to "PropertyValue([<selectorName>.]<propertyName>)". Note in each of these cases, the selector name is optional if there is only one selector in the query, and that the property name can be excluded if the constraint should apply to all reference properties. For detail, see the grammar for dynamic operands.

  • Support for the IN and NOT IN clauses to more easily and concisely supply multiple discrete static operands. For example, "WHERE ... [my:type].[prop1] IN (3,5,7,10,11,50) ...". For detail, see the grammar for set constraints.

  • Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... [my:type].[prop1] BETWEEN 3 EXCLUSIVE AND 10 ...". For detail, see the grammar for between constraints.

  • Support for simple arithmetic in numeric-based criteria and order-by clauses. For example, "... WHERE SCORE(type1) + SCORE(type2) > 1.0" or "... ORDER BY (SCORE(type1) * SCORE(type2)) ASC, LENGTH(type2.property1) DESC". For detail, see the grammar for order-by clauses.

  • Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, this query "WHERE ... [my:type].[prop1] IN ( SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ..." will use the results of the subquery as the literal values in the IN clause.

  • Support for several pseudo-columns ("jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth") that can be used in the SELECT, equijoin, and WHERE clauses. These pseudo-columns make it possible to return location-related and score information within the QueryResult's rows. They also make queries look more like SQL, and thus may be more friendly and easier to use in existing SQL-aware client applications. See the detailed description for more information.
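The BETWEEN and LIMIT/OFFSET extensions in the list above can be pictured with a small Java sketch. It assumes that a bound is inclusive unless it is followed by EXCLUSIVE, which is how the example "BETWEEN 3 EXCLUSIVE AND 10" reads; the class and method names are hypothetical and the code is not taken from ModeShape.

```java
import java.util.Arrays;
import java.util.List;

final class RangeAndLimit {

    /** Evaluates "value BETWEEN lower [EXCLUSIVE] AND upper [EXCLUSIVE]". */
    static boolean between(long value, long lower, boolean lowerExclusive,
                           long upper, boolean upperExclusive) {
        // Assumption: each bound is inclusive unless marked EXCLUSIVE.
        boolean aboveLower = lowerExclusive ? value > lower : value >= lower;
        boolean belowUpper = upperExclusive ? value < upper : value <= upper;
        return aboveLower && belowUpper;
    }

    /** Applies "LIMIT count OFFSET offset" to an already ordered list of rows. */
    static <T> List<T> limit(List<T> rows, int count, int offset) {
        int from = Math.min(offset, rows.size());
        int to = (int) Math.min((long) from + count, rows.size());
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        // prop1 BETWEEN 3 EXCLUSIVE AND 10
        System.out.println(between(3, 3, true, 10, false));  // false
        System.out.println(between(10, 3, true, 10, false)); // true
        // LIMIT 2 OFFSET 1
        System.out.println(limit(Arrays.asList("r1", "r2", "r3", "r4"), 2, 1)); // [r2, r3]
    }
}
```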

The grammar for the JCR-SQL2 query language is actually a superset of that defined by the JCR 2.0 specification, and as such the complete grammar is included here.

Note

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values. Literals (or keywords) are denoted by single-quotes.

	
FullTextSearch ::= 'CONTAINS(' ([selectorName'.']propertyName | selectorName'.*') 
                           ',' ''' fullTextSearchExpression''' ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional */

fullTextSearchExpression ::= FulltextSearch

where FulltextSearch is defined by the following, and is the same as the full-text search language supported by ModeShape:


FulltextSearch ::= Disjunct {Space 'OR' Space Disjunct}

Disjunct ::= Term {Space Term}

Term ::= ['-'] SimpleTerm

SimpleTerm ::= Word | '"' Word {Space Word} '"'

Word ::= NonSpaceChar {NonSpaceChar}

Space ::= SpaceChar {SpaceChar}

NonSpaceChar ::= Char - SpaceChar /* Any Char except SpaceChar */

SpaceChar ::= ' '

Char ::= /* Any character */

	
DynamicOperand ::= PropertyValue | ReferenceValue | Length | NodeName | NodeLocalName | NodePath |
                   NodeDepth | FullTextSearchScore | LowerCase | UpperCase | Arithmetic | 
                   '(' DynamicOperand ')'

PropertyValue ::= [selectorName'.'] propertyName
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional */

ReferenceValue ::= 'REFERENCE(' selectorName '.' propertyName ')' |
                   'REFERENCE(' selectorName ')' |
                   'REFERENCE()'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional. Also, the property name may be excluded 
                      if the constraint should apply to any reference property. */

Length ::= 'LENGTH(' PropertyValue ')'

NodeName ::= 'NAME(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodeLocalName ::= 'LOCALNAME(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodePath ::= 'PATH(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodeDepth ::= 'DEPTH(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

FullTextSearchScore ::= 'SCORE(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

LowerCase ::= 'LOWER(' DynamicOperand ')'

UpperCase ::= 'UPPER(' DynamicOperand ')'

Arithmetic ::= DynamicOperand ('+'|'-'|'*'|'/') DynamicOperand

The design of the JCR-SQL2 query language makes fairly heavy use of functions, including SCORE(), NAME(), and LOCALNAME(). ModeShape adds several more useful functions, including PATH() and DEPTH(), that follow the same patterns.

However, there are several disadvantages of these functions. First, they make the JCR-SQL2 language less "SQL-like", since SQL-92 and -99 don't define these kinds of functions. (There are aggregate functions, like COUNT, SUM, etc., but they are not terribly analogous.) This means that applications that use SQL and SQL-like query languages are less likely to be able to build and issue JCR-SQL2 queries.

A second disadvantage of these functions is that JCR-SQL2 does not allow them to be used within the SELECT clause. As a result, the location-related and score information cannot be included as columns of values in the QueryResult rows. Instead, a client can only access this information by obtaining the Node object(s) for each row. Relying upon both the result set and additional Java objects makes it difficult to use.

For example, ModeShape's JDBC driver is designed to enable JDBC-aware applications to query repository content using JCR-SQL2 queries. The standard JDBC API cannot expose the Node objects, so the only way to return the path-related and score information is through additional columns in the result. While such columns could "magically" appear in the result set, doing this is not compatible with JDBC applications that dynamically build queries based upon database metadata. Such applications require the columns to be properly described in database metadata, and the columns need to be used within queries.

ModeShape attempts to solve these issues by directly supporting a number of "pseudo-columns" within JCR-SQL2 queries, wherever columns can be used. These "pseudo-columns" include:

  • jcr:score is a column of type DOUBLE that represents the full-text search score of the node, which is a measure of the node's relevance to the full-text search expression. ModeShape does compute the scores for all queries, though the score for rows in queries that do not include a full-text search criterion may not be reliable.

  • jcr:path is a column of type PATH that represents the normalized path of a node, including same-name siblings. This is the same as what would be returned by the getPath() method of Node. Examples of paths include "/jcr:system" and "/foo/bar[3]".

  • jcr:name is a column of type NAME that represents the node name in its namespace-qualified form using namespace prefixes and excluding same-name-sibling indexes. Examples of node names include "jcr:system", "jcr:content", "ex:UserData", and "bar".

  • mode:localName is a column of type STRING that represents the local name of the node, which excludes the namespace prefix and same-name-sibling index. As an example, the local name of the "jcr:system" node is "system", while the local name of the "ex:UserData[3]" node is "UserData".

  • mode:depth is a column of type LONG that represents the depth of a node, which corresponds exactly to the number of path segments within the path. For example, the depth of the root node is 0, whereas the depth of the "/jcr:system/jcr:nodeTypes" node is 2.
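The location-related pseudo-columns above are all functions of a node's path. The following Java sketch derives the jcr:name, mode:localName, and mode:depth values from a path string, using the examples given in the list. It is an illustration only: the class and method names are hypothetical, and ModeShape computes these values from its own Path and Name objects rather than from strings.

```java
final class PseudoColumns {

    /** mode:depth: the number of path segments; the root node "/" has depth 0. */
    static int depth(String path) {
        if (path.equals("/")) {
            return 0;
        }
        return path.split("/").length - 1; // the leading '/' yields one empty element
    }

    /** jcr:name: the last segment with its prefix, without the same-name-sibling index. */
    static String name(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        int bracket = last.indexOf('[');
        return bracket < 0 ? last : last.substring(0, bracket);
    }

    /** mode:localName: the name without its namespace prefix. */
    static String localName(String path) {
        String name = name(path);
        return name.substring(name.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        System.out.println(depth("/jcr:system/jcr:nodeTypes")); // 2
        System.out.println(name("/foo/bar[3]"));                // bar
        System.out.println(localName("/ex:UserData[3]"));       // UserData
    }
}
```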

All of these pseudo-columns can be used in the SELECT clause of any JCR-SQL2 query, and their use defines whether such columns appear in the result set. In fact, all of these pseudo-columns will be included when "SELECT *" clauses in JCR-SQL2 queries are expanded by the query engine. This means that every node type (even mixin node types that have no properties and are essentially markers) is represented by a queryable table.

Like any other column, all of these pseudo-columns can also be used in the WHERE clause of any JCR-SQL2 query, even if they are not included in the SELECT clause. They can be used anywhere that a regular column can be used, including within constraints and dynamic operands. ModeShape will automatically rewrite queries that use pseudo-columns in the dynamic operands to use the corresponding function, such as SCORE(), PATH(), NAME(), LOCALNAME(), and DEPTH(). Additionally, any property existence constraint using these pseudo-columns will always evaluate to 'true' (and will thus be removed by the optimizer).

The jcr:path pseudo-column may also be used on both sides of an equijoin constraint clause. For example:

 ... selector1.[jcr:path] = selector2.[jcr:path] ... 

Equijoins of this form will be automatically rewritten by the optimizer to the following form:

 ... ISSAMENODE(selector1,selector2) ... 

As with regular columns, the pseudo-columns must be qualified with the selector name if the query contains more than one selector.

Note

The jcr:path and jcr:score pseudo-columns are consistent with the pseudo-columns of the same names used in the JCR-SQL query language. However, unlike in JCR-SQL, in JCR-SQL2 these columns are not automatically included in the results unless explicitly included in the SELECT clause or implicitly included via "SELECT *".

One of the simplest JCR-SQL2 queries finds all nodes in the current workspace of the repository:

 SELECT * FROM [nt:base] 

This query will return a result set containing a single "jcr:primaryType" column, since the nt:base defines only one single-valued property called "jcr:primaryType". (The jcr:mixinTypes property is multi-valued, and as such the JCR 2.0 specification does not require returning these in query results.)

Queries can explicitly specify the columns that are to be returned in the results. The following query is semantically equivalent to the previous query, and produces identical results:

 SELECT [jcr:primaryType] FROM [nt:base] 

The following query will return the same rows as in the previous queries, but the SELECT clause includes two pseudo-columns containing the values computed from the nodes' locations:

 SELECT [jcr:primaryType], [jcr:path], [mode:depth] FROM [nt:base] 

In JCR-SQL2, a table representing a particular node type will have a column for each of the node type's property definitions, including those inherited from supertypes. For example, the nt:file node type, its nt:hierarchyNode supertype, and the mix:created mixin type are defined using the CND notation as follows:

[mix:created] mixin 
  - jcr:created (date) protected  
  - jcr:createdBy (string) protected

[nt:hierarchyNode] > mix:created abstract 

[nt:file] > nt:hierarchyNode 
  + jcr:content (nt:base) primary mandatory

Therefore, the table representing the nt:file node type will have three columns: the jcr:created and jcr:createdBy columns inherited from the mix:created mixin node type (via the nt:hierarchyNode node type), and the jcr:primaryType column inherited from the nt:base node type, which is the implicit supertype of the nt:hierarchyNode. Thus, this query:

 SELECT * FROM [nt:file] 

is equivalent to this query:

SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], 
       [jcr:path], [jcr:name], [jcr:score], [mode:localName], [mode:depth] 
FROM [nt:file] 
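The expansion shown above amounts to listing the node type's property columns followed by the pseudo-columns. A minimal Java sketch of that step follows; the class and method names are hypothetical, the property columns are passed in by the caller rather than read from a node type registry, and the pseudo-column order simply mirrors the expanded query above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SelectStarExpander {

    // Pseudo-columns in the order used by the expanded query in the text.
    private static final List<String> PSEUDO_COLUMNS = Arrays.asList(
            "jcr:path", "jcr:name", "jcr:score", "mode:localName", "mode:depth");

    /** Builds the explicit form of "SELECT * FROM [nodeType]". */
    static String expand(String nodeType, List<String> propertyColumns) {
        List<String> columns = new ArrayList<>(propertyColumns);
        columns.addAll(PSEUDO_COLUMNS);
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('[').append(columns.get(i)).append(']');
        }
        return sql.append(" FROM [").append(nodeType).append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("nt:file",
                Arrays.asList("jcr:primaryType", "jcr:created", "jcr:createdBy")));
    }
}
```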

Here is an example query that selects some of the available columns from the nt:file table and uses a constraint to ensure the resulting file nodes have names that end in '.txt':

SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] 
WHERE LOCALNAME() LIKE '%.txt'

Of course, we could use the mode:localName pseudo-column instead of the LOCALNAME() function. Such a query is equivalent to the previous query and will produce the exact same results:

SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] 
WHERE [mode:localName] LIKE '%.txt'

Although this query looks much more like SQL, the use of the '[' and ']' characters to quote the identifiers is not typical of a SQL dialect. ModeShape actually supports using double-quote characters and square brackets interchangeably around identifiers (although they must match around any single identifier). Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous two queries:

SELECT "jcr:primaryType", "jcr:created", "jcr:createdBy", "jcr:path" FROM "nt:file" 
WHERE "mode:localName" LIKE '%.txt'

In JCR-SQL2, a node will appear as a row in each table that corresponds to the node types defined by that node's primary type or mixin types, or any supertypes of these node types. In other words, a node will appear in the table corresponding to each node type for which Node.isNodeType(...) returns true.
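The rule above can be sketched as a walk over the supertype hierarchy, starting from the node's primary type and mixin types. The Java below is an illustration under stated assumptions: the class and method names are hypothetical, and the supertype map is a hand-written excerpt of the built-in node types rather than something read from a repository.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NodeTypeTables {

    /** Returns every node type (table) for which Node.isNodeType(...) would be true. */
    static Set<String> tablesFor(Map<String, List<String>> supertypes,
                                 String primaryType, List<String> mixinTypes) {
        Set<String> tables = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(mixinTypes);
        pending.push(primaryType);
        while (!pending.isEmpty()) {
            String type = pending.pop();
            if (tables.add(type)) { // visit each type once
                pending.addAll(supertypes.getOrDefault(type, Collections.emptyList()));
            }
        }
        return tables;
    }

    /** Hand-written excerpt of the built-in hierarchy (an assumption for this sketch). */
    static Map<String, List<String>> sampleSupertypes() {
        Map<String, List<String>> supertypes = new HashMap<>();
        supertypes.put("nt:file", Arrays.asList("nt:hierarchyNode"));
        supertypes.put("nt:hierarchyNode", Arrays.asList("mix:created", "nt:base"));
        return supertypes;
    }

    public static void main(String[] args) {
        // Prints [mix:created, mix:referenceable, nt:base, nt:file, nt:hierarchyNode]
        System.out.println(tablesFor(sampleSupertypes(), "nt:file",
                Arrays.asList("mix:referenceable")));
    }
}
```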

For example, consider a node that has a primary type of nt:file but has a mixin of mix:referenceable. This node will appear as a row in the nt:file, mix:referenceable, nt:hierarchyNode, mix:created, and nt:base tables. The table for nt:file contains all of the columns in the nt:hierarchyNode, mix:created, and nt:base tables. However, the nt:file table does not contain the jcr:uuid column, since the nt:file node type does not extend mix:referenceable. Thus, to obtain the UUID for our node, we need to perform an identity join. The next query shows how this is done to return all properties for nt:file nodes that are also mix:referenceable:

SELECT file.*, ref.* FROM [nt:file] AS file JOIN [mix:referenceable] AS ref
ON ISSAMENODE(file,ref)

The select clause would be expanded to the following query:

SELECT file.[jcr:primaryType], file.[jcr:created], file.[jcr:createdBy], ref.[jcr:uuid], 
       file.[jcr:path], file.[jcr:name], file.[jcr:score], file.[mode:localName], file.[mode:depth] 
FROM [nt:file] AS file JOIN [mix:referenceable] AS ref
ON ISSAMENODE(file,ref)

Of course, we could return even more information and make the query look very SQL-like by using pseudo-columns:

SELECT file."jcr:primaryType", file."jcr:created", file."jcr:createdBy", ref."jcr:uuid",
       file."jcr:path", file."jcr:name", file."mode:localName", file."mode:depth", file."jcr:score" 
FROM "nt:file" AS file JOIN "mix:referenceable" AS ref
ON file."jcr:path" = ref."jcr:path"

These are examples of two-way inner joins, but ModeShape supports joining multiple tables together in a single query. ModeShape also supports a variety of joins, including INNER JOIN (or just JOIN), LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN.

ModeShape supports several other query features beyond JCR-SQL2. One of these is support for UNION, INTERSECT and EXCEPT. Here is an example of a union:

SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file]
UNION
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:folder]

ModeShape also supports using (non-correlated) subqueries within the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries, though, must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used.

Subqueries in ModeShape are a powerful and easy way to use more complex criteria that are a function of the content in the repository, without having to resort to multiple queries (taking the results of one query and dynamically generating the criteria of another query).

Here's an example of a query that finds all nt:file nodes in the repository whose paths are referenced in the vdb:originalFile property of the vdb:virtualDatabase nodes. (This query also uses bind variables in the subquery.)

SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file] 
WHERE PATH() IN ( 
   SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
   WHERE [vdb:version] <= $maxVersion AND CONTAINS([vdb:description],'xml OR xml maybe')
) 

Without subqueries, this query would need to be broken into two separate queries: the first would find all of the paths referenced by the vdb:virtualDatabase nodes matching the version and description criteria, followed by one (or more) subsequent queries to find the nt:file nodes with the paths expressed as literal values (or bind variables).

The examples shown in this section hopefully begin to show the power and flexibility of JCR-SQL2 and the ModeShape extensions.

There are times when a formal structured query language is overkill, and the easiest way to find the right content is to perform a search, like you would with a search engine such as Google or Yahoo! This is where ModeShape's full-text search language comes in, because it allows you to use the JCR query API but with a far simpler, Google-style search grammar.

This query language is actually defined by the JCR 2.0 specification as the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language. We just pulled it out and made it available as a first-class query language, such that a full-text search query supplied by the user, full-text-query, is equivalent to executing this JCR-SQL2:

 SELECT * FROM [nt:base] WHERE CONTAINS([nt:base],'full-text-query') 

This language allows a JCR client to construct a query to find nodes with property values that match the supplied terms. Nodes that "best" match the terms are returned before nodes that have a lesser match. Of course, ModeShape uses a complex system to analyze the node content and the query terms, and may perform a number of optimizations, such as (but not limited to) eliminating stop words (e.g., "the", "a", "and", etc.), treating terms independent of case, and converting words to base forms using a process called stemming (e.g., "running" into "run", "customers" into "customer").

Search terms can also include phrases by simply wrapping the phrase with double-quotes. For example, the search term 'table "customer invoice"' would rank higher those nodes with properties containing the phrase "customer invoice" than nodes with properties containing just "customer" or "invoice".

Terms in the query are implicitly AND-ed together, meaning that the matches occur when a node has property values that match all of the terms. However, it is also possible to put an "OR" in between two terms where either of those terms may occur.

By default, all terms are assumed to be positive terms, in the sense that the occurrence of the term will increase the rank of any nodes containing the value. However, it is possible to specify that terms should not appear in the results. This is called a negative term, and it reduces the rank of any node whose property values contain the value. To specify a negative term, simply prefix the term with a hyphen ('-').
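To make the structure of a search expression concrete, here is a minimal Java sketch that splits an expression into OR-ed groups of AND-ed terms, recognizing quoted phrases and the '-' prefix. It is an illustration only: the class and method names are hypothetical, it does not handle wildcards or escapes, and it is not the parser ModeShape uses.

```java
import java.util.ArrayList;
import java.util.List;

final class FullTextSearchParser {

    /** A word or quoted phrase, optionally negated with a leading '-'. */
    static final class Term {
        final String text;
        final boolean negated;

        Term(String text, boolean negated) {
            this.text = text;
            this.negated = negated;
        }

        @Override
        public String toString() {
            return (negated ? "-" : "") + text;
        }
    }

    /** Returns the OR-ed groups of the expression; terms inside a group are AND-ed. */
    static List<List<Term>> parse(String expression) {
        List<List<Term>> groups = new ArrayList<>();
        List<Term> current = new ArrayList<>();
        int i = 0;
        int n = expression.length();
        while (i < n) {
            char c = expression.charAt(i);
            if (c == ' ') { i++; continue; }
            boolean negated = (c == '-');
            if (negated) {
                if (++i == n) break;
                c = expression.charAt(i);
            }
            boolean quoted = (c == '"');
            String text;
            if (quoted) { // a phrase runs to the closing double-quote
                int end = expression.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("Unterminated phrase");
                text = expression.substring(i + 1, end);
                i = end + 1;
            } else {      // a word runs to the next space
                int end = expression.indexOf(' ', i);
                if (end < 0) end = n;
                text = expression.substring(i, end);
                i = end;
            }
            if (text.isEmpty()) continue;
            if (!quoted && !negated && text.equals("OR")) {
                if (!current.isEmpty()) { groups.add(current); current = new ArrayList<>(); }
            } else {
                current.add(new Term(text, negated));
            }
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        // Prints [[table, customer invoice], [-draft]]
        System.out.println(parse("table \"customer invoice\" OR -draft"));
    }
}
```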

Each term may also contain wildcards to specify the pattern to be matched (or negated). ModeShape supports two different sets of wildcards:

  • '*' matches zero or more characters, and '?' matches any single character; and

  • '%' matches zero or more characters, and '_' matches any single character.

The former are wildcards that are more commonly used in various systems (including older JCR repository implementations), while the latter are the wildcards used in LIKE expressions in both JCR-SQL and JCR-SQL2. Both families are supported for convenience, and you can also mix and match and combine the various wildcards, such as 'ta**bl_' and 'ta__ble%*'. (Of course, placing multiple '*' or '%' characters next to each other offers no real benefit, as it is equivalent to a single '*' or '%'.)

If you want to use these characters literally in a term and do not want them to be treated as wildcards, they must be escaped by prefixing them with a '\' character. For example, this full text search expression:

 table\* "customer invoice\?" 

would rank higher those nodes with properties containing 'table*' (including the asterisk) and those containing the phrase "customer invoice?" (including the question mark). To use a literal backslash character, simply escape it as well.
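The wildcard and escaping rules above can be illustrated by translating a single term into a java.util.regex.Pattern. This is a sketch under stated assumptions, not ModeShape's implementation: the class and method names are hypothetical, the term is matched against a whole value, and case folding and stemming are ignored.

```java
import java.util.regex.Pattern;

final class WildcardTerms {

    /** Converts a search term with wildcards and '\' escapes into a regular expression. */
    static Pattern toPattern(String term) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\' && i + 1 < term.length()) {
                literal.append(term.charAt(++i)); // escaped character is taken literally
            } else if (c == '*' || c == '%') {
                flush(literal, regex);
                regex.append(".*");               // zero or more characters
            } else if (c == '?' || c == '_') {
                flush(literal, regex);
                regex.append('.');                // exactly one character
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString());
    }

    /** Appends any pending literal text to the regex, quoted so it has no special meaning. */
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPattern("ta**bl_").matcher("table").matches());    // true
        System.out.println(toPattern("table\\*").matcher("table*").matches());  // true
        System.out.println(toPattern("table\\*").matcher("tables").matches());  // false
    }
}
```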

The grammar for this full-text search language is specified in Section 6.7.19 of the JCR 2.0 specification, but it is also included here as a convenience.

Note

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values.

JCR 2.0 introduces a new API for programmatically constructing a query. This API allows the client to construct the lower-level objects for each part of the query, and is a great fit for applications that would otherwise generate fairly complicated query expressions. Using this API is a matter of getting the QueryObjectModelFactory from the session's QueryManager, and using the factory to create the various components, starting with the lowest-level components. Then, these lower-level components can be passed to other factory methods to create the higher-level components, and so on, until finally the createQuery(...) method is called to return the QueryObjectModel.

Here is a simple example that shows how this is done for the simple query "SELECT * FROM [nt:unstructured] AS unstructNodes":



// Obtain the query manager for the session ...
javax.jcr.query.QueryManager queryManager = session.getWorkspace().getQueryManager();
// Create a query object model factory ...
QueryObjectModelFactory factory = queryManager.getQOMFactory();
// Create the FROM clause: a selector for the [nt:unstructured] nodes ...
Selector source = factory.selector("nt:unstructured","unstructNodes");
// Create the SELECT clause (we want all columns defined on the node type) ...
Column[] columns = null;
// Create the WHERE clause (we have none for this query) ...
Constraint constraint = null;
// Define the orderings (we have none for this query)...
Ordering[] orderings = null;
// Create the query ...
QueryObjectModel query = factory.createQuery(source,constraint,orderings,columns);
 
// Execute the query and get the results ...
// (This is the same as before.)
javax.jcr.query.QueryResult result = query.execute();

From this point on, processing the results is the same as when using the JCR Query API:




// Iterate over the nodes in the results ...
javax.jcr.NodeIterator nodeIter = result.getNodes();
while ( nodeIter.hasNext() ) {
    javax.jcr.Node node = nodeIter.nextNode();
        ...
}
// Or iterate over the rows in the results ...
String[] columnNames = result.getColumnNames();
javax.jcr.query.RowIterator rowIter = result.getRows();
while ( rowIter.hasNext() ) {
    javax.jcr.query.Row row = rowIter.nextRow();
    // Iterate over the column values in each row ...
    javax.jcr.Value[] values = row.getValues();
    for ( javax.jcr.Value value : values ) {
                ...
    }
    // Or access the column values by name ...
    for ( String columnName : columnNames ) {
        javax.jcr.Value value = row.getValue(columnName);
                ...
    }
}
// When finished, close the session ...
session.logout();

Of course, most queries will create the columns, orderings, and constraints using the QueryObjectModelFactory, whereas the example above just assumes all of the columns, no orderings, and no constraints.

ModeShape provides two ways to connect from remote clients: a WebDAV interface and a RESTful interface. This chapter details the capabilities of both as well as the configuration required to use each.

Note

Although the WebDAV and REST servers are treated separately here, many of the configuration parameters are the same. This is because both share a fair amount of common code and have been designed to be able to be deployed simultaneously on the same server or even within the same web archive.

Note

The WebDAV and REST servers described here exist for easy use, though they may need to be customized and WAR files reassembled to fit your particular application server and configuration. ModeShape's JBoss AS kit is one such customization, with a number of additional components built specifically for the JBoss Application Server environment.

ModeShape provides a WebDAV server interface to its JCR implementation to ease integration with client applications. The WebDAV server maps some of the content nodes (by default, nodes with a primary type of nt:file) to WebDAV resources and the other nodes to WebDAV folders. This allows any WebDAV client to navigate through the content repository to store files in a given location, as well as to create or delete nodes in the repository. The remainder of this section describes how to configure and deploy the WebDAV server.

The ModeShape WebDAV server is deployed as a WAR and configured mostly through its web configuration file (web.xml). Here is an example web configuration that is used for integration testing of the ModeShape WebDAV server along with an explanation of its parts.



<?xml version="1.0"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
                         "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
    <display-name>ModeShape JCR RESTful Interface</display-name>

This first section is largely boilerplate and should look familiar to anyone who has deployed a servlet-based application before. The display-name can be customized, of course.

The next stanza configures the repository provider.



  <!--
    This parameter provides the fully-qualified name of a class that implements
    the o.m.web.jcr.spi.RepositoryProvider interface.  It is required
    by the ModeShapeJcrDeployer that controls the lifecycle for the ModeShape WebDAV server.
  -->
  <context-param>
    <param-name>org.modeshape.web.jcr.REPOSITORY_PROVIDER</param-name>
    <param-value>org.modeshape.web.jcr.spi.FactoryRepositoryProvider</param-value>
  </context-param>

As noted above, this parameter informs the ModeShapeJcrDeployer of the specific repository provider in use. Unless you are using the ModeShape WebDAV server to connect to a different JCR implementation, this should never change. The ModeShape REST server also uses the ModeShapeJcrDeployer to get access to the JCR repository, so the two servlets can be deployed in the same WAR.

Next we configure the ModeShape JcrEngine itself.



  <!--
    This parameter, specific to the FactoryRepositoryProvider implementation, specifies
    the name of the configuration file to initialize the repository or repositories.
    This configuration file must be on the classpath and is given as a classpath-relative
    directory.
  -->
  <context-param>
    <param-name>org.modeshape.web.jcr.JCR_URL</param-name>
    <param-value>file:/configRepository.xml</param-value>
  </context-param>

If you are not familiar with the file format for a JcrEngine configuration file, you can build one programmatically with the JcrConfiguration class and call save(...) instead of build() to output the configuration file that equates to the configuration.

The ContentMapper implementation can also be configured, but this is optional.



    <!--
        This parameter provides the fully-qualified name of a class that implements
        the o.m.w.jcr.webdav.ContentMapper interface.  If no value is provided for this
        parameter, o.m.w.jcr.webdav.DefaultContentMapper will be used.
    -->
    <context-param>
        <param-name>org.modeshape.web.jcr.webdav.CONTENT_MAPPER_CLASS_NAME</param-name>
        <param-value>org.modeshape.web.jcr.webdav.DefaultContentMapper</param-value>
    </context-param>

This class is used to prepare WebDAV responses from content nodes. The DefaultContentMapper implementation creates nodes with type nt:folder and nt:file for WebDAV requests to create WebDAV folders and files, respectively. Users can provide their own implementation that maps WebDAV content to other node content or structures.

This is followed by some additional WebDAV configuration that controls the mapping between JCR node types and WebDAV files and resources. These parameters are all specific to the DefaultContentMapper implementation. You can omit this section entirely to use the default values or if a custom ContentMapper is used.



<!--
    Nodes with any of the primary node types in this comma-delimited list will be treated by the 
    WebDAV implementation as content nodes.  The value below is the default value for this 
    parameter.  That is, if this init parameter is omitted, the value below will be used by default.
-->
<context-param>
    <param-name>org.modeshape.web.jcr.webdav.CONTENT_PRIMARY_TYPE_NAMES</param-name>
    <param-value>nt:resource, mode:resource</param-value>
</context-param>

<!--
    Nodes with any of the primary node types in this comma-delimited list will be treated by the 
    WebDAV implementation as resource (file) nodes.  The value below is the default value for this 
    parameter.  That is, if this init parameter is omitted, the value below will be used by default.
-->
<context-param>
    <param-name>org.modeshape.web.jcr.webdav.RESOURCE_PRIMARY_TYPE_NAMES</param-name>
    <param-value>nt:file</param-value>
</context-param>

<!--
    Each folder created through the WebDAV servlet will be created as a node with the primary node 
    type below.  The value below is the default value for this parameter.  That is, if this init 
    parameter is omitted, the value below will be used by default.
-->
<context-param>
    <param-name>org.modeshape.web.jcr.webdav.NEW_FOLDER_PRIMARY_TYPE_NAME</param-name>
    <param-value>nt:folder</param-value>
</context-param>

<!--
    Each resource (file) created through the WebDAV servlet will be created as a node with the primary 
    node type below.  The value below is the default value for this parameter.  That is, if this init 
    parameter is omitted, the value below will be used by default.
-->
<context-param>
    <param-name>
        org.modeshape.web.jcr.webdav.NEW_RESOURCE_PRIMARY_TYPE_NAME
    </param-name>
    <param-value>nt:file</param-value>
</context-param>

<!--
    Content created through the WebDAV servlet will be created as a node with the primary node 
    type below.  The value below is the default value for this parameter.  That is, if this init 
    parameter is omitted, the value below will be used by default.
-->
<context-param>
    <param-name>
        org.modeshape.web.jcr.webdav.NEW_CONTENT_PRIMARY_TYPE_NAME
    </param-name>
    <param-value>nt:resource</param-value>
</context-param>

In general, this part of the web configuration file should not be modified.

Next, the RequestResolver must be configured. The RequestResolver converts the incoming URI into a repository name, workspace name, and path within the repository. ModeShape provides several implementations:

  • MultiRepositoryRequestResolver - supports multiple repositories and workspaces, by using a URI format with repository name and workspace name as the first two levels of the URI. This was added in ModeShape 2.3.0.Final, and is now the resolver that is configured by default.

  • SingleRepositoryRequestResolver - maps URIs onto a single repository and workspace that are configured in the web.xml. This is useful if you want to limit which repository and workspace is exposed via WebDAV.

  • DefaultRequestResolver - maps URIs onto a single repository and workspace that are configured in the web.xml. This used to be the default resolver, and is identical to SingleRepositoryRequestResolver. However, it is now deprecated and will be removed in a future version.

If none of these fit your needs, it is easy to develop a custom implementation of this interface.

To specify the resolver, set the org.modeshape.web.jcr.webdav.REQUEST_RESOLVER_CLASS_NAME property to the name of the implementation class. For example, here is how the MultiRepositoryRequestResolver class is specified:



<!--
    This optional parameter provides the name of the o.m.w.j.webdav.RequestResolver
    implementation class.  The provided value must be the name of a class that 
    implements the RequestResolver interface and has a public, no-arg constructor.
    If no value is provided, o.m.w.j.webdav.MultiRepositoryRequestResolver will be used.
-->
<context-param>
    <param-name>org.modeshape.web.jcr.webdav.REQUEST_RESOLVER_CLASS_NAME</param-name>
    <param-value>org.modeshape.web.jcr.webdav.MultiRepositoryRequestResolver</param-value>
</context-param>

Alternatively, if the SingleRepositoryRequestResolver class is to be used, then two additional properties must define the repository name and workspace name:



<!--
    This optional parameter provides the name of the o.m.w.j.webdav.RequestResolver
    implementation class.  The provided value must be the name of a class that 
    implements the RequestResolver interface and has a public, no-arg constructor.
-->
<context-param>
    <param-name>org.modeshape.web.jcr.webdav.REQUEST_RESOLVER_CLASS_NAME</param-name>
    <param-value>org.modeshape.web.jcr.webdav.SingleRepositoryRequestResolver</param-value>
</context-param>

<!--
    This parameter is required if (and only if) the SingleRepositoryRequestResolver is used.
    It provides the name of the JCR repository that will be accessed.  An exception
    will be thrown if no value is provided for this parameter.
-->
<context-param>
    <param-name>
        org.modeshape.web.jcr.webdav.SINGLE_REPOSITORY_RESOLVER_REPOSITORY_NAME
    </param-name>
    <param-value>repository</param-value>
</context-param>

<!--
    This parameter is required if (and only if) the SingleRepositoryRequestResolver is used.
    It provides the name of the JCR workspace that will be accessed.  An exception
    will be thrown if no value is provided for this parameter.
-->
<context-param>
    <param-name>
        org.modeshape.web.jcr.webdav.SINGLE_REPOSITORY_RESOLVER_WORKSPACE_NAME
    </param-name>
    <param-value>default</param-value>
</context-param>

ModeShape also provides the older DefaultRequestResolver class, which is now deprecated and is retained only for backward compatibility. Please switch to the SingleRepositoryRequestResolver or MultiRepositoryRequestResolver class.

Once the RequestResolver has been specified, some more brief boilerplate defines additional configuration information:



<!-- Required parameter for ModeShape WebDAV - should not be modified -->
<listener>
    <listener-class>org.modeshape.web.jcr.ModeShapeJcrDeployer</listener-class>
</listener>

<!-- Required WebDAV servlet - should not be modified -->
<servlet>
    <servlet-name>WebDAV</servlet-name>
    <servlet-class>org.modeshape.web.jcr.webdav.ModeShapeWebdavServlet</servlet-class>
    
    <!--
        The webdav library requires this parameter to be present, but does not use it.
    -->
    <init-param>
        <param-name>rootpath</param-name>
        <param-value>.</param-value>
    </init-param>
</servlet>

<!-- Required parameter for ModeShape WebDAV - should not be modified -->
<servlet-mapping>
    <servlet-name>WebDAV</servlet-name>
    <url-pattern>/*</url-pattern>
</servlet-mapping>

Finally, security must be configured for the WebDAV server.



    <!-- 
        The ModeShape WebDAV implementation leverages the HTTP credentials for authentication 
        and authorization within the JCR repository.  Unless the repository provides for anonymous 
        access, it makes no sense to try to log into the JCR repository without credentials, so 
        this constraint helps lock down the repository.
        
        This should generally not be modified. 
    -->
    <security-constraint>
        <display-name>ModeShape WebDAV</display-name>
        <web-resource-collection>
            <web-resource-name>WebDAV</web-resource-name>
            <url-pattern>/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
            <!--  
                A user must be assigned this role to connect to any JCR repository, in addition to 
                needing the READONLY or READWRITE roles to actually read or modify the data.  This 
                is not used internally, so another role could be substituted here.
            -->
            <role-name>connect</role-name>
        </auth-constraint>
    </security-constraint>

    <!--  
        Any auth-method will work for ModeShape.  BASIC is used in this example for simplicity.
     -->
    <login-config>
        <auth-method>BASIC</auth-method>
    </login-config>

    <!-- 
        This must match the role-name in the auth-constraint above. 
     -->
    <security-role>
        <role-name>connect</role-name>
    </security-role>
</web-app>

As noted above, the WebDAV server will not function properly unless security is configured. All authentication methods supported by the Servlet specification are supported by ModeShape and can be used interchangeably, as long as authenticated users have the connect role listed above.
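Because the example configuration uses BASIC authentication, a client has to send an Authorization header with each request. The following Java sketch shows how that header value is formed under the standard HTTP Basic scheme. The class name and the credentials are hypothetical, and whether the user is accepted still depends on the users and roles configured in your web container.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {

    /** Build the value of the HTTP "Authorization" header for BASIC authentication. */
    static String headerValue(String user, String password) {
        // The scheme is "Basic", followed by Base64("user:password")
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Hypothetical credentials, for illustration only
        System.out.println(headerValue("jsmith", "secret"));
    }
}
```

BASIC authentication only encodes the credentials and does not encrypt them, so it is normally combined with HTTPS.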

Deploying the ModeShape WebDAV server only requires three steps: preparing the web configuration, configuring the users and their roles in your web container (outside the scope of this document), and assembling the WAR. This section describes the requirements for assembling the WAR.

If you are using Maven to build your projects, the WAR can be built from a POM. Here is a portion of the POM used to build the ModeShape WebDAV Server integration subproject.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<artifactId>modeshape</artifactId>
		<groupId>org.modeshape</groupId>
		<version>2.0</version>
		<relativePath>../..</relativePath>
	</parent>
	<artifactId>modeshape-web-jcr-webdav-war</artifactId>
	<packaging>war</packaging>
	<name>ModeShape JCR WebDAV Servlet</name>
	<description>ModeShape servlet that provides WebDAV access to JCR items</description>
	<url>http://www.modeshape.org</url>
	<dependencies>
		<dependency>
			<groupId>org.modeshape</groupId>
			<artifactId>modeshape-web-jcr-webdav</artifactId>
			<version>${project.version}</version>
		</dependency>

		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.5.8</version>
			<scope>runtime</scope>
		</dependency>
	</dependencies>
</project>

If you use this approach, make sure that the web configuration file is in the /src/main/webapp/WEB-INF directory.

Of course, the ModeShape WebDAV Server WAR can still be built if you are not using Maven. Simply construct a WAR with the following contents:

+ /WEB-INF
	+ /classes
	|	+ configRepository.xml
	|	+ log4j.properties (Optional)
	+ /lib
	|	+ aperture-1.1.0.Beta1.jar
	|	+ hamcrest-core-1.1.jar
	|	+ jakarta-regexp-1.4.jar
	|	+ jcr-2.0.jar
	|	+ joda-time-1.6.jar
	|	+ junit-dep-4.4.jar
	|	+ lucene-analyzers-3.0.2.jar
	|	+ lucene-core-3.0.2.jar
	|	+ lucene-regex-3.0.2.jar
	|	+ lucene-snowball-3.0.2.jar
	|	+ lucene-misc-3.0.2.jar
	|	+ modeshape-cnd-2.5.0.Final.jar
	|	+ modeshape-common-2.5.0.Final.jar
	|	+ modeshape-graph-2.5.0.Final.jar
	|	+ modeshape-jcr-2.5.0.Final.jar
	|	+ modeshape-jcr-api-2.5.0.Final.jar
	|	+ modeshape-mimetype-detector-aperture-2.5.0.Final.jar
	|	+ modeshape-repository-2.5.0.Final.jar
	|	+ modeshape-search-lucene-2.5.0.Final.jar
	|	+ modeshape-web-jcr-2.5.0.Final.jar
	|	+ modeshape-web-jcr-webdav-2.5.0.Final.jar
	|	+ rdf2go.api-4.6.2.jar
	|	+ slf4j-api-1.5.11.jar
	|	+ slf4j-log4j12-1.5.8.jar
	|	+ stax-api-1.0-2.jar
	|	+ webdav-servlet-2.0.jar
	+ web.xml
			

If you are using sequencers or any connectors other than the in-memory or federated connector, you will also have to add the JARs for those dependencies to the WEB-INF/lib directory. You will also have to change the version numbers on the JARs to reflect the version of ModeShape that you are using.

Note

Your servlet container may already provide a logging system, in which case you may need to remove the "slf4j-log4j12-1.5.8.jar" file and replace it with the appropriate SLF4J binding JAR. Or, if your servlet container already uses SLF4J globally, you may want to remove all of the "slf4j*.jar" files.

This WAR can be deployed into your servlet container.

ModeShape provides a RESTful interface to its JCR implementation that allows HTTP-based access and updating of content. Although the initial version of this REST server only supports the ModeShape JCR implementation, it has been designed to make integration with other JCR implementors easy. This section describes how to configure and deploy the REST server.

The REST Server currently supports the URIs and HTTP methods described below. The URI patterns assume that the REST server is deployed at its conventional location of "/resources". If the REST server were deployed under a different web context, the URI patterns below would change accordingly.

Note

The JBoss AS kit by default will deploy the RESTful service at the "/modeshape-rest" location, which is more descriptive and better fits with the other deployed applications and services. To use these examples against this RESTful service, simply replace "/resources" with "/modeshape-rest" in each of the URLs.

Currently, only JSON-encoded responses are provided.


Note that this approach supports dynamic discovery of the available repositories on the server. A typical conversation might start with a request to the server to check the available repositories.

GET http://www.example.com/resources

This request would generate a response that mapped the names of the available repositories to metadata information about the repositories like so:

{
	"modeshape%3arepository" : {
		"repository" : {
			"name" : "modeshape%3arepository",
			"resources" : { "workspaces":"/resources/modeshape%3arepository" },
			"metadata" : {
				"jcr.specification.name" : "Content Repository for Java Technology API",
				"jcr.specification.version" : "2.0",
				"jcr.repository.name" : "ModeShape JCR Repository",
				"jcr.repository.vendor.url" : "http://www.modeshape.org",
				"jcr.repository.version" : "2.6.0.FINAL",
				"option.versioning.supported" : "true",

				... etc. ...

			}
		}
	}
}

The actual response wouldn't be pretty-printed like the example, but the format would be the same. The name of the repository ("modeshape:repository", URL-encoded) is mapped to a repository object that contains a name (the same name, repeated) and a list of available resources within the repository and their respective URIs. Note that ModeShape supports deploying multiple JCR repositories side-by-side on the same server, so this response could easily contain multiple repositories in a real deployment.

Also, the "metadata" section is included only in responses from RESTful services starting with the 2.5.0.Final release. It contains the JCR descriptor keys and values, where each value will either be a string or, if there are multiple values for the descriptor, an array of strings. Note that not all of the descriptors are shown in the above example.

The only thing that you can do with a repository through the REST interface at this time is to get a list of its workspaces. A request to do so can be built up from the previous response like this:

GET http://www.example.com/resources/modeshape%3arepository

This request (and each of the following requests) actually creates a JCR Session to service the request and requires that security be configured. This process is described in more detail in a later section. Assuming that security has been properly configured, the response would look something like this:

{
	"default" : {
		"workspace" : {
			"name" : "default",
			"resources" : { 
				"items":"/resources/modeshape%3arepository/default/items", 
				"query":"/resources/modeshape%3arepository/default/query"
			}
		}
	}
}

Like the first response, this response consists of a list of workspace names mapped to metadata about the workspaces. The example above only lists one workspace for simplicity, but there could be many different workspaces returned in a real deployment. Note that the "items" resource builds the full URI to the root of the items hierarchy, including the encoding of the repository name and the workspace name, and that the "query" resource builds the full URI needed to execute queries.

Now a request can be built to retrieve the root item of the repository.

GET http://www.example.com/resources/modeshape%3arepository/default/items

Any other item in the repository could be accessed by appending its path to the URI above. In a default repository with no content, this would return the following response:

{
	"properties": {
		"jcr:primaryType": "mode:root",
		"jcr:uuid": "97d7e2ef-996e-4d99-8ec2-dc623e6c2239"
	},
	"children": ["jcr:system"]
}

The response contains a mapping of property names to their values and an array of child names. Had one of the properties been multi-valued, the values for that property would have been provided as an array as well, as will shortly be shown.

The items resource also accepts an optional query parameter: mode:depth. This parameter, which defaults to 1, controls how deep the hierarchy of returned nodes should be. Suppose the request had included the parameter:

GET http://www.example.com/resources/modeshape%3arepository/default/items?mode:depth=2

Then the response would have contained details for the children of the root node as well.

{
	"properties": {
		"jcr:primaryType": "mode:root",
		"jcr:uuid": "163bc5e5-3b57-4e63-b2ae-ededf43d3445"
	},
	"children": {
		"jcr:system": {
			"properties": {"jcr:primaryType": "mode:system"},
			"children": ["mode:namespaces"]
		}
	}
}
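The URIs used in the requests above follow a regular pattern, so a client can assemble them from the repository name, workspace name, and node path. The Java sketch below is one way to do that. The class name RestUris and the BASE constant are hypothetical, and the deployment location is assumed to be "/resources". Note that URLEncoder produces upper-case hex digits ("%3A"), which is equivalent to the "%3a" shown in the examples, and that it performs form encoding, so it is only adequate for simple names such as these.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class RestUris {
    // Assumed deployment location; adjust to your server and web context
    static final String BASE = "http://www.example.com/resources";

    static String encode(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    /** URI that lists the workspaces of a repository. */
    static URI workspaces(String repository) {
        return URI.create(BASE + "/" + encode(repository));
    }

    /** URI of the node at an absolute path ("/" for the root), with a mode:depth parameter. */
    static URI items(String repository, String workspace, String path, int depth) {
        String nodePath = "/".equals(path) ? "" : path;
        return URI.create(BASE + "/" + encode(repository) + "/" + encode(workspace)
                + "/items" + nodePath + "?mode:depth=" + depth);
    }

    public static void main(String[] args) {
        System.out.println(workspaces("modeshape:repository"));
        System.out.println(items("modeshape:repository", "default", "/", 2));
    }
}
```

In practice a client would more likely read the "items" and "query" URIs from the workspace response than build them by hand, since that keeps the client independent of the deployment location.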

It is also possible to use the RESTful API to add, modify and remove repository content. Removes are simple - a DELETE request with no body returns a response with no body.

DELETE http://www.example.com/resources/modeshape%3arepository/default/items/path/to/deletedNode

Adding content simply requires a POST to the name of the relative root node of the content that you wish to add and a request body in the same format as the response from a GET. Adding multiple nodes at once is supported, as shown below.

POST http://www.example.com/resources/modeshape%3arepository/default/items/newNode

{
	"properties": {
		"jcr:primaryType": "nt:unstructured",
		"jcr:mixinTypes": "mix:referenceable",
		"someProperty": "foo"
	},
	"children": {
		"newChildNode": {
			"properties": {"jcr:primaryType": "nt:unstructured"}
		}
	}
}

Note that protected properties like jcr:uuid are not provided but that the primary type and mixin types are provided as properties. The REST server will translate these into the appropriate calls behind the scenes. The JSON-encoded response from the request will contain the node that you just posted, including any autocreated properties and child nodes.

If you do not need this information, add mode:includeNode=false as a query parameter to your URL.

POST http://www.example.com/resources/modeshape%3arepository/default/items/newNode?mode:includeNode=false

{
	"properties": {
		"jcr:primaryType": "nt:unstructured",
		"jcr:mixinTypes": "mix:referenceable",
		"someProperty": "foo"
	},
	"children": {
		"newChildNode": {
			"properties": {"jcr:primaryType": "nt:unstructured"}
		}
	}
}

This will instruct the REST server to only return the path of the newly-created node in the response.

The PUT method allows for updates of nodes and properties. If the URI points to a property, the body of the request should be the new JSON-encoded value for the property, which includes the property name (allowing proper determination of whether the values are binary; see the discussion of binary values below).

PUT http://www.example.com/resources/modeshape%3arepository/default/items/some/existing/node/someProperty

{
	"someProperty" : "bar"
}

Setting multiple properties at once can be performed by providing a URI to a node instead of a property. The body of the request should then be a JSON object that maps property names to their new values.

PUT http://www.example.com/resources/modeshape%3arepository/default/items/some/existing/node

{
	"someProperty": "foobar",
	"someOtherProperty": "newValue"
}

The JSON request can even contain a properties container:

PUT http://www.example.com/resources/modeshape%3arepository/default/items/some/existing/node

{
	"properties": {
		"someProperty": "foobar",
		"someOtherProperty": "newValue"
	}
}

A subgraph can be updated all at once using a PUT against the URI of the top node in the subgraph. Note that in this case every node in the subgraph must be provided in the JSON request (any node not in the request will be removed). This method will attempt to set all of the properties to the new value(s) specified in the JSON request. Any descendant node in the JSON request that does not correspond to an existing node will be created, while any existing node not reflected in the JSON request will be removed. (Any specifications of "jcr:primaryType" are ignored if the node already exists.) In other words, every node must appear in the request, but for existing nodes the request only needs to contain the properties that are changed. Of course, if a node is being added, all of its properties need to be included in the request.

Here is an example:

PUT http://www.example.com/resources/modeshape%3arepository/default/items/some/existing/node

{
	"properties": {
		"jcr:primaryType": "nt:unstructured",
		"jcr:mixinTypes": "mix:referenceable",
		"someProperty": "foo"
	},
	"children": {
		"childNode": {
			"properties": {"jcr:primaryType": "nt:unstructured"}
		}
	}
}

This will update the existing node at "/some/existing/node" with the specified properties, and ensure that it contains one child node named "childNode". Note that the body of this request is identical in structure to that of the POST requests.
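The add, update, and remove requests described above differ only in HTTP method, URI, and body. As a rough illustration, the following Java sketch builds (but does not send) such requests with the java.net.http API that ships with Java 11 and later. The class name RestWrites and the ITEMS constant are hypothetical, and the "application/json" content type is an assumption, since this guide does not state which content type the server expects for item requests. Credentials are omitted here and would have to be supplied when the request is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class RestWrites {
    // Assumed "items" URI of the target workspace, as returned by the server
    static final String ITEMS =
            "http://www.example.com/resources/modeshape%3arepository/default/items";

    /** POST: add a node (and optionally its descendants) at the given path. */
    static HttpRequest addNode(String path, String json, boolean includeNode) {
        String query = includeNode ? "" : "?mode:includeNode=false";
        return HttpRequest.newBuilder(URI.create(ITEMS + path + query))
                .header("Content-Type", "application/json") // assumption, see above
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    /** PUT: update a property, a node's properties, or a whole subgraph. */
    static HttpRequest update(String path, String json) {
        return HttpRequest.newBuilder(URI.create(ITEMS + path))
                .header("Content-Type", "application/json") // assumption, see above
                .PUT(BodyPublishers.ofString(json))
                .build();
    }

    /** DELETE: remove the node at the given path; no body is sent. */
    static HttpRequest remove(String path) {
        return HttpRequest.newBuilder(URI.create(ITEMS + path)).DELETE().build();
    }
}
```

A request built this way could then be sent with java.net.http.HttpClient, for example via client.send(request, HttpResponse.BodyHandlers.ofString()).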

Queries can be executed through the REST interface by POSTing to the query URI with the query statement in the body of the request. The query language must be specified by setting the appropriate MIME type.


If no content type is specified or the content type for the request is not one of the content types listed above, the request will generate a response code of 400 (BAD REQUEST).

All queries for a given workspace are posted to the same URI and the request body is not JSON-encoded.

POST http://www.example.com/resources/modeshape%3arepository/default/query

/a/b/c/d[@foo='bar']

Assuming that the request above was POSTed with a content type of application/jcr+xpath, a response would be generated that consisted of a JSON object that contained a property named "rows". The "rows" property would contain an array of rows with each element being a JSON object that represented one row in the query result set.

{
	"types": {
		"someProperty": "STRING",
		"someOtherProperty": "BOOLEAN",
		"jcr:path": "STRING",
		"jcr:score": "DECIMAL"
	},
	"rows": [
		{
			"someProperty": "foobar",
			"someOtherProperty": "true",
			"jcr:path" : "/a/b/c/d",
			"jcr:score" : 0.9327
		},
		{
			"someProperty": "localValue",
			"someOtherProperty": "false",
			"jcr:path" : "/a/b/c/d[2]",
			"jcr:score" : 0.8143
		}
	]
}

If ModeShape is used as the underlying JCR implementation, the JSON object in the response will also contain a "types" property. The value of the "types" property is a JSON object that maps column names to their JCR type.
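A query request therefore differs from the item requests in two ways: the body is the plain query statement rather than JSON, and the content type selects the query language. The Java sketch below builds (but does not send) the XPath request shown above using the java.net.http API from Java 11 and later. The class and method names are hypothetical, and only the application/jcr+xpath content type mentioned in this section is used.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class RestQuery {

    /** Build a POST of an XPath statement to the workspace's "query" URI. */
    static HttpRequest xpath(String queryUri, String statement) {
        return HttpRequest.newBuilder(URI.create(queryUri))
                // The content type tells the server which query language is used
                .header("Content-Type", "application/jcr+xpath")
                // The statement is sent as-is; it is not JSON-encoded
                .POST(BodyPublishers.ofString(statement))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = xpath(
                "http://www.example.com/resources/modeshape%3arepository/default/query",
                "/a/b/c/d[@foo='bar']");
        System.out.println(request.method() + " " + request.uri());
    }
}
```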

Binary property values are included in any of the responses or requests, but are represented as string values containing the Base 64 encoding of the binary content. Any such property is explicitly annotated such that "/base64/" is appended to the property name. First of all, this makes it very clear to the client and service which properties are encoded, allowing them to properly decode the values before use. Secondly, the "/base64/" suffix was carefully chosen because it cannot be used in a real property name (without escaping). Here's an example of a node containing a "jcr:primaryType" property with a single string value, a "jcr:uuid" property with a single UUID value, an "options" property that has two integer values, and a fourth "content" property that has a single binary value:

{
	"properties": {
		"jcr:primaryType": "nt:unstructured",
		"jcr:uuid": "163bc5e5-3b57-4e63-b2ae-ededf43d3445",
		"options": [ "1", "2" ],
		"content/base64/": 
	"TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="
	}
}

All values of a property will always be Base 64 encoded if at least one of the values is binary. If there are multiple values, then they will be separated by commas and will appear within '[' and ']' characters (just like other properties).
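A client has to recognize the "/base64/" suffix and decode such values itself. The following is a minimal sketch of that step using java.util.Base64 (Java 8 or later); the class Base64PropertySketch and its methods are hypothetical and are not part of the ModeShape REST client.

```java
import java.util.Base64;

// Hypothetical client-side helper (not part of ModeShape).
class Base64PropertySketch {
    static final String SUFFIX = "/base64/";

    // True if the JSON property name carries the "/base64/" marker.
    static boolean isEncoded(String jsonName) {
        return jsonName.endsWith(SUFFIX);
    }

    // Strips the marker, if present, to recover the real property name.
    static String propertyName(String jsonName) {
        return isEncoded(jsonName)
                ? jsonName.substring(0, jsonName.length() - SUFFIX.length())
                : jsonName;
    }

    // Decodes one value; the MIME decoder ignores line breaks inside the text.
    static byte[] decode(String value) {
        return Base64.getMimeDecoder().decode(value);
    }

    // Encodes binary content for use in a request body.
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }
}
```

For a multi-valued property, the same decode call would be applied to each element of the JSON array.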

The ModeShape REST server is deployed as a WAR and configured mostly through its web configuration file (web.xml). Here is an example web configuration that is used for integration testing of the ModeShape REST server along with an explanation of its parts.



<?xml version="1.0"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
                         "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <display-name>ModeShape JCR RESTful Interface</display-name>

This first section is largely boilerplate and should look familiar to anyone who has deployed a servlet-based application before. The display-name can be customized, of course.

The next stanza configures the repository provider.



  <!--
    This parameter provides the fully-qualified name of a class that implements
    the o.m.web.jcr.spi.RepositoryProvider interface.  It is required
    by the ModeShapeJcrDeployer that controls the lifecycle for the ModeShape REST server.
  -->
  <context-param>
    <param-name>org.modeshape.web.jcr.REPOSITORY_PROVIDER</param-name>
    <param-value>org.modeshape.web.jcr.spi.FactoryRepositoryProvider</param-value>
  </context-param>

As noted above, this parameter informs the ModeShapeJcrDeployer of the specific repository provider in use. Unless you are using the ModeShape REST server to connect to a different JCR implementation, this should never change.

Next we configure the ModeShape JcrEngine itself.



  <!--
    This parameter, specific to the FactoryRepositoryProvider implementation, specifies
    the name of the configuration file to initialize the repository or repositories.
    This configuration file must be on the classpath and is given as a classpath-relative
    directory.
  -->
  <context-param>
    <param-name>org.modeshape.web.jcr.JCR_URL</param-name>
    <param-value>file:/configRepository.xml</param-value>
  </context-param>

If you are not familiar with the file format for a JcrEngine configuration file, you can build a configuration programmatically with the JcrConfiguration class and call save(...) instead of build() to write out the equivalent configuration file.

This is followed by a bit of RESTEasy and JAX-RS boilerplate.



  <!--
    This parameter defines the JAX-RS application class, which is really just a metadata class
    that lets the JAX-RS engine (RESTEasy in this case) know which classes implement pieces
    of the JAX-RS specification like exception handling and resource serving.
        
    This should not be modified. 
  -->
  <context-param>
    <param-name>javax.ws.rs.Application</param-name>
    <param-value>org.modeshape.web.jcr.rest.JcrApplication</param-value>
  </context-param>

  <!-- Required parameter for RESTEasy - should not be modified -->
  <listener>
    <listener-class>org.jboss.resteasy.plugins.server.servlet.ResteasyBootstrap</listener-class>
  </listener>

  <!-- Required parameter for ModeShape REST - should not be modified -->
  <listener>
    <listener-class>org.modeshape.web.jcr.ModeShapeJcrDeployer</listener-class>
  </listener>

  <!-- Required parameter for RESTEasy - should not be modified -->
  <servlet>
    <servlet-name>Resteasy</servlet-name>
    <servlet-class>org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher</servlet-class>
  </servlet>

  <!-- Required parameter for ModeShape REST - should not be modified -->
  <servlet-mapping>
    <servlet-name>Resteasy</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>

In general, this part of the web configuration file should not be modified.

Finally, security must be configured for the REST server.



  <!-- 
    The ModeShape REST implementation leverages the HTTP credentials for authentication and 
    authorization within the JCR repository.  It makes no sense to try to log into the JCR 
    repository without credentials, so this constraint helps lock down the repository.
        
    This should generally not be modified. 
  -->
  <security-constraint>
    <display-name>ModeShape REST</display-name>
    <web-resource-collection>
      <web-resource-name>RestEasy</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
            <!--  
        A user must be assigned this role to connect to any JCR repository, in addition to needing the 
        READONLY or READWRITE roles to actually read or modify the data.  This is not used internally, 
        so another role could be substituted here.
      -->
      <role-name>connect</role-name>
    </auth-constraint>
  </security-constraint>

  <!--  
    Any auth-method will work for ModeShape.  BASIC is used in this example for simplicity.
  -->
  <login-config>
    <auth-method>BASIC</auth-method>
  </login-config>

  <!-- 
    This must match the role-name in the auth-constraint above. 
  -->
  <security-role>
    <role-name>connect</role-name>
  </security-role>
</web-app>

As noted above, the REST server will not function properly unless security is configured. All authentication methods supported by the Servlet specification are supported by ModeShape and can be used interchangeably, as long as authenticated users have the connect role listed above.

Just as with the ModeShape WebDAV server, deploying the ModeShape REST server only requires three steps: preparing the web configuration, configuring the users and their roles in your web container (outside the scope of this document), and assembling the WAR. This section describes the requirements for assembling the WAR.

If you are using Maven to build your projects, the WAR can be built from a POM. Here is a portion of the POM used to build the ModeShape REST Server integration subproject.

<project xmlns="http://maven.apache.org/POM/4.0.0" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<artifactId>modeshape</artifactId>
		<groupId>org.modeshape</groupId>
		<version>2.0</version>
		<relativePath>../..</relativePath>
	</parent>
	<artifactId>modeshape-web-jcr-rest-war</artifactId>
	<packaging>war</packaging>
	<name>ModeShape JCR REST Servlet</name>
	<description>ModeShape servlet that provides RESTful access to JCR items</description>
	<url>http://www.modeshape.org</url>
	<dependencies>
		<dependency>
			<groupId>org.modeshape</groupId>
			<artifactId>modeshape-web-jcr-rest</artifactId>
			<version>2.0</version>
		</dependency>

		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.5.8</version>
			<scope>runtime</scope>
		</dependency>
		
		<dependency>
			<groupId>org.jboss.resteasy</groupId>
			<artifactId>resteasy-client</artifactId>
			<version>1.2.1.GA</version>
		</dependency>		
	</dependencies>
</project>

If you use this approach, make sure that the web configuration file is in the /src/main/webapp/WEB-INF directory.

The ModeShape REST Server WAR is still easy enough to build if you are not using Maven. Simply construct a WAR with the following contents:

+ /WEB-INF
	+ /classes
	|	+ configRepository.xml
	|	+ log4j.properties (Optional)
	+ /lib
	|	+ activation-1.1.jar
	|	+ commons-codec-1.2.jar
	|	+ commons-httpclient-3.1.jar
	|	+ hamcrest-core-1.1.jar
	|	+ httpclient-4.0.jar
	|	+ httpcore-4.0.1.jar
	|	+ jakarta-regexp-1.4.jar
	|	+ javassist-3.6.0.GA.jar
	|	+ jaxb-api-2.1.jar
	|	+ jaxb-impl-2.1.12.jar
	|	+ jaxrs-api-1.2.1.GA.jar
	|	+ jcl-over-slf4j-1.5.8.jar
	|	+ jcr-2.0.jar
	|	+ jettison-1.1.jar
	|	+ joda-time-1.6.jar
	|	+ jsr250-api-1.0.jar
	|	+ junit-dep-4.4.jar
	|	+ lucene-analyzers-3.0.0.jar
	|	+ lucene-core-3.0.0.jar
	|	+ lucene-regex-3.0.0.jar
	|	+ lucene-snowball-3.0.0.jar
	|	+ modeshape-cnd-2.5.0.Final.jar
	|	+ modeshape-common-2.5.0.Final.jar
	|	+ modeshape-graph-2.5.0.Final.jar
	|	+ modeshape-jcr-2.5.0.Final.jar
	|	+ modeshape-jcr-api-2.5.0.Final.jar
	|	+ modeshape-repository-2.5.0.Final.jar
	|	+ modeshape-search-lucene-2.5.0.Final.jar
	|	+ modeshape-web-jcr-2.5.0.Final.jar
	|	+ modeshape-web-jcr-rest-2.5.0.Final.jar
	|	+ resteasy-jaxb-provider-1.2.1.GA.jar
	|	+ resteasy-jaxrs-1.2.1.GA.jar
	|	+ resteasy-jettison-provider-1.2.1.GA.jar
	|	+ scannotation-1.0.2.jar
	|	+ sjsxp-1.0.1.jar
	|	+ slf4j-api-1.5.11.jar
	|	+ slf4j-log4j12-1.5.8.jar
	|	+ slf4j-simple-1.5.8.jar
	|	+ stax-api-1.0-2.jar
	+ web.xml
			

If you are using sequencers or any connectors other than the in-memory or federated connector, you will also have to add the JARs for those dependencies to the WEB-INF/lib directory. In addition, change the version numbers on the JARs to reflect the current version of ModeShape.

Note

Your servlet container may already provide a logging system, in which case you may need to remove the "slf4j-log4j12-1.5.8.jar" and replace it with the appropriate SLF4J binding JAR. Or, if your servlet container already uses SLF4J globally, you may want to remove all of the "slf4j*.jar" files.

This WAR can be deployed into your servlet container.

The ModeShape REST Client API provides a POJO way of using the ModeShape REST web service to publish (upload) and unpublish (delete) files from ModeShape repositories. Java objects open the HTTP connection, create the HTTP request URLs, attach the payload associated with PUT and POST requests, parse the HTTP JSON response back into Java objects, and close the HTTP connection.

The Java business objects you will need are all found in the org.modeshape.web.jcr.rest.client.domain package. The publishing example in this section uses Server, Repository, and Workspace.

Along with the POJOs above, an org.modeshape.web.jcr.rest.client.IRestClient is needed. The IRestClient is responsible for executing the publishing and unpublishing operations. You can also use the IRestClient to find out what repositories and workspaces are available on a ModeShape server.

Here's a code snippet that publishes (uploads) a file:

// Setup POJOs
Server server = new Server("http://localhost:8080", "username", "password");
Repository repository = new Repository("repositoryName", server);
Workspace workspace = new Workspace("workspaceName", repository);

// Publish
File file = new File("/path/to/file");
IRestClient restClient = new JsonRestClient();
Status status = restClient.publish(workspace, "/workspace/path/", file);

if (status.isError()) {
    // Handle error here
}
            

Successfully executing the above code results in the creation of a JCR folder node (nt:folder) for each segment of the workspace path (if the folder didn't already exist). Also, a JCR file node (a node with primary type nt:file) is created or updated under the last folder node, and the file contents are encoded and uploaded into a child node of that file node.

Both the ModeShape REST server and the ModeShape WebDAV server can also be used as an interface to other JCR repositories by creating an implementation of the RepositoryProvider interface that connects to the other repository.

The RepositoryProvider only has a few methods that must be implemented. When the ModeShapeJcrDeployer starts up, it will dynamically load the RepositoryProvider implementation (as noted above) and call the startup(ServletContext) method on the provider. The provider can use this method to load any required configuration parameters from the web configuration (web.xml) and initialize the repository.

As an example, here's the ModeShape JCR provider implementation of this method with exception handling omitted for brevity.

public void startup( ServletContext context ) {
    // Name of the configuration file, as given in web.xml
    String configFile = context.getInitParameter(CONFIG_FILE);

    // Load the configuration from the classpath, then build and start the engine
    InputStream configFileInputStream = getClass().getResourceAsStream(configFile);
    jcrEngine = new JcrConfiguration().loadFrom(configFileInputStream).build();
    jcrEngine.start();
}

As you can see, the name of the configuration file for the JcrEngine is read from the servlet context and used to initialize the engine. Once the repository has been started, the provider is ready to serve calls to the main methods that provide the interface to the repository.

The first method returns the set of repository names supported by this repository.

public Set<String> getJcrRepositoryNames() {
    return new HashSet<String>(jcrEngine.getRepositoryNames());
}

The ModeShape JCR repository does support multiple repositories on the same server. Other JCR implementations that don't support multiple repositories are free to return a singleton set containing any string from this method.

The other required method returns an open JCR Session for the user from the current request in a given repository and workspace. The provider can use the HttpServletRequest to get the authentication credentials for the HTTP user.

public Session getSession( HttpServletRequest request,
                           String repositoryName,
                           String workspaceName ) throws RepositoryException {
    Repository repository = getRepository(repositoryName);

    // Wrap the HTTP user's credentials so they can be passed to the JCR login
    SecurityContext context = new ServletSecurityContext(request);
    Credentials credentials = new SecurityContextCredentials(context);
    return repository.login(credentials, workspaceName);
}

The getSession(...) method is used by most of the REST server methods to access the JCR repository and return results as needed.

Finally, the shutdown() method signals that the web context is being undeployed and that the JCR repository should shut down and clean up any resources that are in use.

This chapter has described two ways to access a ModeShape JCR repository remotely through HTTP-based protocols. In the next chapter, the different repository connectors will be described so that you can start to use ModeShape to store new data, connect to existing data through JCR, or both.

The in-memory repository connector is a simple connector that creates a transient, in-memory repository. This repository is used as a very simple in-memory cache or as a standalone transient repository. This connector works well for a readable and writable repository source with small to moderate sized content that need not be permanently saved.

The InMemoryRepositorySource class provides a number of JavaBean properties that control its behavior:

Table 10.1. InMemoryRepositorySource properties

Property: Description
defaultCachePolicy: Optional property that, if used, defines the default for how long the information provided by this source may be cached by other, higher-level components. The default value of null implies that this source does not define a specific duration for caching information provided by this repository source.
defaultWorkspaceName: Optional property that is initialized to an empty string and which defines the name for the workspace that will be used by default if none is specified.
jndiName: Optional property that, if used, specifies the name in JNDI where an InMemoryRepository instance can be found. This is an advanced property that is infrequently used.
name: The name of the repository source, which is used by the RepositoryService when obtaining a RepositoryConnection by name.
rootNodeUuid: Optional property that, if used, defines the UUID of the root node in the in-memory repository. If not used, then a new UUID is generated.
retryLimit: Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried following a communication failure. The default value is '0'.

One way to configure the in-memory connector is to create a JcrConfiguration instance with a repository source that uses the InMemoryRepositorySource class. For example:



JcrConfiguration config = ...
config.repositorySource("IMR Store")
      .usingClass(InMemoryRepositorySource.class)
      .setDescription("The repository for our content")
      .setProperty("predefinedWorkspaceNames", new String[] { "staging", "dev"})
      .setProperty("defaultWorkspaceName", workspaceName);
 

Another way to configure the in-memory connector is to create a JcrConfiguration instance and load an XML configuration file that contains a repository source that uses the InMemoryRepositorySource class. For example, a file named configRepository.xml can be created with these contents:



<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
    <!-- 
    Define the sources for the content.  These sources are directly accessible using the 
    ModeShape-specific Graph API.  In fact, this is how the ModeShape JCR implementation works.  You 
    can think of these as being similar to JDBC DataSource objects, except that they expose 
    graph content via the Graph API instead of records via SQL or JDBC. 
    -->
    <mode:sources jcr:primaryType="nt:unstructured">
        <!-- 
        The 'IMR Store' repository is an in-memory source with a single default workspace (though 
        others could be created, too).
        -->
        <mode:source jcr:name="IMR Store" 
                    mode:classname="org.modeshape.graph.connector.inmemory.InMemoryRepositorySource" 
                    mode:description="The repository for our content" 
                    mode:defaultWorkspaceName="default">
            <mode:predefinedWorkspaceNames>staging</mode:predefinedWorkspaceNames>
            <mode:predefinedWorkspaceNames>dev</mode:predefinedWorkspaceNames>
        </mode:source>
    </mode:sources>
    
    <!-- MIME type detectors and JCR repositories would be defined below --> 
</configuration>
 

The configuration can then be loaded from Java like this:



JcrConfiguration config = new JcrConfiguration().loadFrom("/configRepository.xml");
 

This connector exposes an area of the local file system as a graph of "nt:file" and "nt:folder" nodes. The connector can be configured so that the workspace name is either a path to the directory on the file system that represents the root of that workspace or the name of a subdirectory within a root directory (see the workspaceRootPath property below). Each connector can define whether it allows new workspaces to be created. If the directory for a workspace does not exist, this connector will attempt to create the directory (and any missing parent directories).
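The mapping from workspace name to directory can be sketched with plain java.io.File calls. This is an illustration of the rule only, under the assumption that a missing root is treated as the current working directory (as the workspaceRootPath property describes); WorkspaceDirectorySketch and its methods are hypothetical names, not the connector's code.

```java
import java.io.File;

// Hypothetical illustration (not the connector's code) of resolving a workspace directory.
class WorkspaceDirectorySketch {

    // Resolves the workspace name against the root, or against "." if no root is set.
    static File resolve(String workspaceRootPath, String workspaceName) {
        String root = workspaceRootPath != null ? workspaceRootPath : ".";
        return new File(root, workspaceName);
    }

    // Creates the directory and any missing parents; returns true if it exists afterwards.
    static boolean ensureExists(File workspaceDirectory) {
        return workspaceDirectory.isDirectory() || workspaceDirectory.mkdirs();
    }
}
```

For example, resolve("/home/content/someApp", "staging") yields the directory /home/content/someApp/staging, and ensureExists(...) would create it on first use.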

By default, this connector is not capable of storing extra properties other than those defined on the nt:file, nt:folder and nt:resource node types. This is because such properties cannot be represented natively on the file system. When the connector is asked to store such properties, the default behavior is to log warnings and then ignore the extra properties. This is unlikely to be sufficient for production use (unless only the standard properties are used). To select this behavior explicitly, set the "extraPropertiesBehavior" to "log".

However, the connector can be configured differently. If the "extraPropertiesBehavior" is set to "ignore", then these extra properties will simply be silently ignored and lost: none will be stored, none will be loaded, and no warnings will be logged. If the "extraPropertiesBehavior" is set to "error", the connector will throw an exception if any extra properties are used.

Perhaps the best setting for general use, however, is to set the "extraPropertiesBehavior" to "store". In this mode, any extra properties are written to files on the file system that are adjacent to the actual file or folder. For example, given a "nt:folder" node that represents the "folder1" directory, all extra properties will be stored in a text file named "folder1.modeshape" in the same parent directory as the "folder1" directory. Similarly, given a "nt:file" node that represents the "file1" file on the file system, all extra properties will be stored in a text file named "file1.modeshape" located next to the "file1" file. The extra properties of the "nt:resource" node for our "nt:file" node are stored in the same location. Because the "file1.modeshape" file is already used for the "nt:file" node, the connector uses the "file1.content.modeshape" file instead.
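The naming rule for these ancillary files can be expressed in a few lines. The sketch below illustrates only the file names described above; it is not the connector's implementation, and ExtraPropertiesFileSketch and its method names are hypothetical.

```java
import java.io.File;

// Hypothetical illustration of the naming rule for "store" mode (not the connector's code).
class ExtraPropertiesFileSketch {
    static final String EXTENSION = ".modeshape";
    static final String RESOURCE_EXTENSION = ".content.modeshape";

    // Ancillary file for the nt:file or nt:folder node backed by fileOrFolder.
    static File forNode(File fileOrFolder) {
        return new File(fileOrFolder.getParentFile(), fileOrFolder.getName() + EXTENSION);
    }

    // Ancillary file for the nt:resource child of the nt:file node backed by file.
    static File forResource(File file) {
        return new File(file.getParentFile(), file.getName() + RESOURCE_EXTENSION);
    }
}
```

Both results share the parent directory of the original file or folder, which matches the "adjacent" placement described above.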

The FileSystemSource class provides a number of JavaBean properties that control its behavior:

Table 11.1. FileSystemSource properties

Property: Description
cachePolicy: Optional property that, if used, defines the cache policy for this repository source. When not used, this source will not define a specific duration for caching information.
creatingWorkspaceAllowed: Optional property that defines whether clients can create additional workspaces. The default value is "true".
customPropertiesFactory: Specifies the CustomPropertiesFactory implementation that should be used to augment the default properties available on each node. This property can be set either from an object that implements the CustomPropertiesFactory interface or from the name of a class with a public, no-argument constructor that implements the CustomPropertiesFactory interface. In the latter case, the named class will be instantiated and used as the custom properties factory implementation. See also the "extraPropertiesBehavior" setting.
extraPropertiesBehavior: Optional setting that specifies how to handle the extra properties on "nt:file", "nt:folder", and "nt:resource" nodes that cannot be represented on the native files themselves. Set this to "log" if warnings are to be sent to the log (the default), "error" if setting such properties should cause an error, "store" if they should be stored in ancillary files next to the files and folders, or "ignore" if they should be silently ignored. The "log" value will be used by default or if an invalid value is specified. This setting will be ignored if a "customPropertiesFactory" class name is specified.
defaultWorkspaceName: Optional property that is initialized to "default" and which defines the name for the workspace that will be used by default if none is specified.
exclusionPattern: Specifies a regular expression that is used to determine which files and folders in the underlying file system should be exposed through this connector. Files and folders with a name that matches the provided regular expression will not be exposed by this source. Setting this property to null has the effect of removing the exclusion pattern.
inclusionPattern: Specifies a regular expression that is used to determine which files and folders in the underlying file system should be exposed through this connector. Files and folders with a name that matches the provided regular expression will be exposed by this source. Setting this property to null has the effect of removing the inclusion pattern.
filenameFilter: Specifies the FilenameFilter that is used to determine which files and folders in the underlying file system should be exposed through this connector. Only files and folders that the filter accepts will be accessible through this source. This property can be set either from an object that implements the FilenameFilter interface or from the name of a class with a public, no-argument constructor that implements the FilenameFilter interface. In the latter case, the named class will be instantiated and used as the filename filter implementation. Setting this property to null has the effect of clearing the filter. Note: the filenameFilter, exclusionPattern, and inclusionPattern properties are somewhat mutually exclusive. If a filenameFilter is specified, then exclusionPattern and inclusionPattern are both ignored.
name: The name of the repository source, which is used by the RepositoryService when obtaining a RepositoryConnection by name.
predefinedWorkspaceNames: Optional property that, if used, defines names of the workspaces that are predefined and need not be created before being used. This can be coupled with a "false" value for the "creatingWorkspaceAllowed" property to allow the use of only predefined workspaces.
rootNodeUuid: Optional property that, if used, specifies the UUID that should be used for the root node of each workspace. If no value is specified, a default UUID is used.
retryLimit: Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried following a communication failure. The default value is '0'.
updatesAllowed: Determines whether the content in the file system can be updated ("true"), or if the content may only be read ("false"). The default value is "false" to avoid unintentional security vulnerabilities.
workspaceRootPath: Optional property that, if used, specifies a path on the local file system to the root of all workspaces. The source will use the name of the workspace as a relative path from the workspaceRootPath to determine the path for a particular workspace. If no value (or a null value) is specified, the source will use the name of the workspace as a relative path from the current working directory of this virtual machine (as defined by new File(".")). As an example for a workspace named "default/foo", the source will use new File(workspaceRootPath, "default/foo") as the source directory for the connector if workspaceRootPath is set to a non-null value, or new File(".", "default/foo") as the source directory for the connector if workspaceRootPath is set to null.
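The filenameFilter property accepts any implementation of the JDK's java.io.FilenameFilter interface. As a minimal sketch, the hypothetical HiddenEntryFilter below hides entries whose names start with a dot (such as ".svn"); the class name and the choice of pattern are illustrative assumptions, not something ModeShape ships.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

// Hypothetical filter: rejects entries whose names start with a dot.
// In a real project this class would be declared public in its own file so that
// it can be instantiated by class name through its public no-argument constructor.
class HiddenEntryFilter implements FilenameFilter {
    private static final Pattern HIDDEN = Pattern.compile("\\..*");

    public HiddenEntryFilter() {
    }

    // Accepts every name that does not match the hidden-entry pattern.
    @Override
    public boolean accept(File dir, String name) {
        return !HIDDEN.matcher(name).matches();
    }
}
```

Because a configured filenameFilter causes exclusionPattern and inclusionPattern to be ignored, any pattern-based rules that are still needed would have to be folded into the filter itself.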


One way to configure the file system connector is to create a JcrConfiguration instance with a repository source that uses the FileSystemSource class. For example:



JcrConfiguration config = ...
config.repositorySource("FS Store")
      .usingClass(FileSystemSource.class)
      .setDescription("The repository for our content")
      .setProperty("workspaceRootPath", "/home/content/someApp")
      .setProperty("defaultWorkspaceName", "prod")
      .setProperty("predefinedWorkspaceNames", new String[] { "staging", "dev"})
      .setProperty("rootNodeUuid", UUID.fromString("fd129c12-81a8-42ed-aa4b-820dba49e6f0"))
      .setProperty("updatesAllowed", "true")
      .setProperty("creatingWorkspaceAllowed", "false");
 

Another way to configure the file system connector is to create a JcrConfiguration instance and load an XML configuration file that contains a repository source that uses the FileSystemSource class. For example, a file named configRepository.xml can be created with these contents:



<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:mode="http://www.modeshape.org/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
    <!-- 
    Define the sources for the content.  These sources are directly accessible using the 
    ModeShape-specific Graph API. In fact, this is how the ModeShape JCR implementation works.  You can 
    think of these as being similar to JDBC DataSource objects, except that they expose graph 
    content via the Graph API instead of records via SQL or JDBC. 
    -->
    <mode:sources jcr:primaryType="nt:unstructured">
        <!-- 
        The 'FS Store' repository is a file system source with three predefined workspaces 
        ("prod", "staging", and "dev").
        -->
        <mode:source jcr:name="FS Store" 
            mode:classname="org.modeshape.connector.filesystem.FileSystemSource"
            mode:description="The repository for our content"
            mode:workspaceRootPath="/home/content/someApp"
            mode:defaultWorkspaceName="prod"
            mode:creatingWorkspacesAllowed="false"
            mode:rootNodeUuid="fd129c12-81a8-42ed-aa4b-820dba49e6f0"
            mode:updatesAllowed="true" >
            <mode:predefinedWorkspaceNames>staging</mode:predefinedWorkspaceNames>
            <mode:predefinedWorkspaceNames>dev</mode:predefinedWorkspaceNames>
            <!-- 
            If desired, specify a cache policy that caches items in memory for 5 minutes (300000 ms).
            This fragment can be left out if the connector should not cache any content.
            -->
            <mode:cachePolicy jcr:name="cachePolicy" 
              mode:classname="org.modeshape.graph.connector.path.cache.InMemoryWorkspaceCache$InMemoryCachePolicy"
              mode:timeToLiveInMilliseconds="300000" />
        </mode:source>    
    </mode:sources>

    <!-- MIME type detectors and JCR repositories would be defined below --> 
</configuration>
 

The configuration can then be loaded from Java like this:



JcrConfiguration config = new JcrConfiguration().loadFrom("/configRepository.xml");
 

This connector stores a graph of any structure or size in a relational database, using a JPA provider on top of a JDBC driver. Currently this connector relies upon some Hibernate-specific capabilities. The schema of the database is dictated by this connector and is optimized for storing a graph structure. (In other words, this connector does not expose as a graph the data in an existing database with an arbitrary schema.)

The JpaSource class provides a number of JavaBean properties that control its behavior:

Table 12.1. JpaSource properties

Property: Description
autoGenerateSchema: Sets the Hibernate setting dictating what it does with the database schema upon first connection. Valid values are as follows (though the value is not checked):
  • "create" - Create the database schema objects when the EntityManagerFactory is created (actually when Hibernate's SessionFactory is created by the entity manager factory). If a file named "import.sql" exists in the root of the class path (e.g., '/import.sql') Hibernate will read and execute the SQL statements in this file after it has created the database objects. Note that Hibernate first deletes all tables, constraints, or any other database object that is going to be created in the process of building the schema.

  • "create-drop" - Same as "create", except that the schema will be dropped after the EntityManagerFactory is closed.

  • "update" - Attempt to update the database structure to the current mapping (but does not read and invoke the SQL statements from "import.sql"). Use with caution.

  • "validate" - Validates the existing schema with the current entities configuration, but does not make any changes to the schema (and does not read and invoke the SQL statements from "import.sql"). This is the default value because it is the least intrusive and safest option, since it will verify the database's schema matches what the connector expects.

  • "disable" - Does nothing and assumes that the database is already properly configured. This should be the setting used in production, as it is a best practice that DB administrators explicitly configure/upgrade production database schemas (using scripts).

cacheTimeToLiveInMilliseconds: Optional property that, if used, defines the maximum time in milliseconds that any information returned by this connector is allowed to be cached before being considered invalid. When not used, this source will not define a specific duration for caching information. The default value is "600000" milliseconds, or 10 minutes.
compressData: An advanced boolean property that dictates whether large binary and string values should be stored in a compressed form. Setting this value only affects how new records are stored; records can always be read regardless of the value of this setting. The default value is "true".
creatingWorkspaceAllowed: Optional property that defines whether clients can create additional workspaces. The default value is "true".
dialect: Optional property that defines the dialect of the database. If not provided, the dialect will be auto-discovered by Hibernate. Otherwise, this must match one of the Hibernate dialect names, and must correspond to the type of driver being used. Because Hibernate does a good job of auto-determining the dialect, it is recommended that you set this only if auto-discovery fails for your database. (Note that auto-discovering the dialect does not always work well with MySQL, since Hibernate has multiple dialects for MySQL and will often choose MySQL 4 MyISAM.)
dataSourceJndiName: The JNDI name of the JDBC DataSource instance that should be used. If not specified, the other driver properties must be set.
driverClassloaderName: The name of the class loader or classpath that should be used to load the JDBC driver class. This is not required if the DataSource is found in JNDI.
driverClassName: The name of the JDBC driver class. This is not required if the DataSource is found in JNDI, but is required otherwise.
idleTimeInSecondsBeforeTestingConnections: The number of seconds that a connection may remain idle in the pool before it is tested to ensure it is still valid. The default is 180 seconds (or 3 minutes).
isolationLevel: Optional property that, if used, de of the java.sql.Connection#TRANSACTION_* c