ModeShape contains a backup and restore feature that enables repository administrators to create backups of an entire repository (even while the repository is in use) and then restore a repository to the state reflected by a particular backup. This works regardless of where the repository content is persisted.
There are several reasons why you might want to restore a repository to a previous state, and many are quite obvious. For example, the application or the process it’s running in might stop unexpectedly. Or perhaps the hardware on which the process is running might fail. Or perhaps the persistent store might have a catastrophic failure (although surely you’re also using the persistent store’s backup system, too).
But there are also non-failure-related reasons. Backups of a running repository can be used to transfer the content to a new repository that is perhaps hosted in a different location. It might be possible to manually transfer the persisted content (e.g., in a database or on the file system), but the process of doing so varies with different kinds of persistence options. Also, ModeShape can be configured to use a distributed in-memory data grid that already maintains its own copies for ensuring high availability, and therefore the data grid might not persist anything to disk. In such cases, the content is stored on the data grid’s virtual heap, and getting access to it without ModeShape may be quite difficult. Or, you may initially configure your repository to use a particular persistence approach that is suitable given the current needs, but over time the repository grows and you want to move to a different, more scalable (but perhaps more complex) persistence approach. Finally, the backup and restore feature can be used to migrate to a new major version of ModeShape.
In short, you may very well have the need to set the contents of a repository back to an earlier state. ModeShape’s backup and restore feature makes this easy to do.
Let’s walk through the basic process of creating a backup of an existing repository and then restoring the repository. Both of these steps require an authenticated Session that has administrative privileges. It actually doesn’t matter which workspace the session uses:
So far, this is basic and standard stuff for any JCR client.
Each JCR Session instance has its own Workspace object that provides workspace-level functionality and access to a set of “manager” interfaces: the VersionManager, NodeTypeManager, ObservationManager, LockManager, etc. The JSR-333 (aka “JCR 2.1”) effort is still incomplete, but it plans to introduce a RepositoryManager that offers some repository-level functionality. The ModeShape public API has created such an interface, and accessing it from a standard JCR Session instance is pretty simple:
The interface is pretty self-explanatory, and defines several methods including two that are related to the backup and restore feature:
Next, we’ll take a look at each of these two methods.
The backupRepository(...) method on ModeShape’s RepositoryManager interface is used to create a backup of the entire repository, including all workspaces that existed when the backup was initiated. This method blocks until the backup is completed, so it is the caller’s responsibility to invoke the method asynchronously if that is desired. When this method is called on a repository that is being actively used, all of the changes made while the backup process is underway will be included; at some point near the end of the backup process, however, additional changes will be excluded from the backup. This means that each backup contains a fully-consistent snapshot of the entire repository as it existed near the time at which the backup completed.
Here’s a code example showing how easy it is to call this method:
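The following is a minimal sketch of the calling pattern described above, not the library's own listing. To stay self-contained it uses a hypothetical stand-in interface, `RepositoryBackup`, whose single method mirrors the `backupRepository(...)` method named in the text; the returned list of problem messages and the helper names `backupNow` and `backupAsync` are assumptions made for illustration. It shows the blocking call and one way the caller could run it asynchronously.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BackupCallSketch {

    /** Hypothetical stand-in for the repository-level backup method described in the text. */
    interface RepositoryBackup {
        // Assumption: problems are reported as plain messages; an empty list means a clean backup.
        List<String> backupRepository(File backupDirectory) throws Exception;
    }

    /** Blocking call: returns only once the backup has finished. */
    static boolean backupNow(RepositoryBackup repository, File backupDirectory) throws Exception {
        List<String> problems = repository.backupRepository(backupDirectory);
        problems.forEach(problem -> System.err.println("backup problem: " + problem));
        return problems.isEmpty();
    }

    /** Runs the same blocking call on a caller-supplied executor so the caller is not held up. */
    static CompletableFuture<Boolean> backupAsync(RepositoryBackup repository,
                                                  File backupDirectory,
                                                  ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return backupNow(repository, backupDirectory);
            } catch (Exception e) {
                // Surface checked exceptions through the future.
                throw new CompletionException(e);
            }
        }, executor);
    }

    public static void main(String[] args) throws Exception {
        // A fake repository that reports no problems, used only to exercise the sketch.
        RepositoryBackup fake = directory -> Collections.emptyList();
        File target = new File("backups/example");
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println("blocking backup clean: " + backupNow(fake, target));
            System.out.println("async backup clean: " + backupAsync(fake, target, executor).get());
        } finally {
            executor.shutdown();
        }
    }
}
```

In a real application the stand-in would be replaced by the RepositoryManager obtained from the session's workspace, and the executor would typically be one the application already manages.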
Each ModeShape backup is stored on the file system in a directory that contains a series of GZIP-ed files (each containing representations of approximately 100K nodes) and a subdirectory in which all the large BINARY values are stored.
It is also the application’s responsibility to initiate each backup operation. In other words, there currently is no way to configure ModeShape to perform backups on a schedule. Doing so would add significant complexity to ModeShape and the configuration, whereas leaving it to the application lets the application fully control how and when such backups occur.
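Since scheduling is left to the application, one possible approach is a plain `ScheduledExecutorService`. The sketch below is an assumption about how an application might do this, not a ModeShape facility: the `backupAction` callback stands in for whatever code invokes the backup, and the timestamped directory naming (`backup-yyyyMMdd-HHmmss`) is a hypothetical convention chosen for illustration.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ScheduledBackupSketch {

    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Hypothetical naming convention: one timestamped directory per backup. */
    static File directoryFor(File root, LocalDateTime when) {
        return new File(root, "backup-" + STAMP.format(when));
    }

    /**
     * Starts a single-threaded scheduler that calls backupAction with a fresh directory.
     * A fixed delay is used so that the next run is timed from the end of the previous
     * (blocking) backup rather than from its start.
     */
    static ScheduledExecutorService schedule(File root, long delay, TimeUnit unit,
                                             Consumer<File> backupAction) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                backupAction.accept(directoryFor(root, LocalDateTime.now()));
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs, so log and continue.
                System.err.println("scheduled backup failed: " + e);
            }
        }, 0, delay, unit);
        return scheduler;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = schedule(new File("backups"), 1, TimeUnit.SECONDS,
                directory -> System.out.println("would back up to " + directory));
        Thread.sleep(2500);       // let a few runs happen
        scheduler.shutdownNow();  // stop scheduling when the application shuts down
    }
}
```

The callback keeps the scheduling logic independent of how the backup is actually invoked, so the application remains in full control of when backups occur and where they are written.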
Once you have a complete backup on disk, you can then restore a repository back to the state captured within the backup. To do that, simply start a repository (or perhaps a new instance of a repository with a different configuration) and, before it’s used by any applications, load into the new repository all of the content in the backup. Here’s a simple code example that shows how this is done:
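A minimal sketch of the restore step follows. As with any illustration that has to run without the library, it relies on a hypothetical stand-in interface, `RepositoryRestorer`, whose method mirrors the `restoreRepository(...)` method named earlier; the list of problem messages and the directory check are assumptions added for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Collections;
import java.util.List;

class RestoreSketch {

    /** Hypothetical stand-in for the repository-level restore method described in the text. */
    interface RepositoryRestorer {
        // Assumption: problems are reported as plain messages; an empty list means a clean restore.
        List<String> restoreRepository(File backupDirectory) throws Exception;
    }

    /** Restores from the given backup directory and reports whether the restore was clean. */
    static boolean restore(RepositoryRestorer repository, File backupDirectory) throws Exception {
        // Fail early if the backup location is wrong, before touching the repository.
        if (!backupDirectory.isDirectory()) {
            throw new IllegalArgumentException("Not a backup directory: " + backupDirectory);
        }
        List<String> problems = repository.restoreRepository(backupDirectory);
        problems.forEach(problem -> System.err.println("restore problem: " + problem));
        return problems.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // A temporary directory and a fake repository, used only to exercise the sketch.
        File backupDirectory = Files.createTempDirectory("restore-sketch").toFile();
        RepositoryRestorer fake = directory -> Collections.emptyList();
        System.out.println("restore clean: " + restore(fake, backupDirectory));
    }
}
```

The restore should run before any application starts using the new repository, as described above.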
Once a restore succeeds, the newly-restored repository will be restarted and will be ready to be used.
The above examples show how to perform a default backup and restore. However, ModeShape also offers a more advanced API that allows fine-tuning of both the backup and the restore behavior by extending a couple of abstract classes: BackupOptions and RestoreOptions.
In the case of backup, the following options are configurable:
|Option||Default||Description|
|includeBinaries||true||Whether binary values should be included in the backup. If your repository has a large number of binary values, you may want to exclude them, since including them can cause the backup to take a very long time. Because ModeShape stores only references from nodes to the binary values, which are stored externally, backing up and restoring a repository without including binaries will work and ModeShape will recreate the correct links|
|documentsPerFile||100000||The number of documents (i.e., entries stored by ModeShape in Infinispan) to be included in each backup file|
|compress||true||Whether each file containing documents should be compressed|
For restore, the following options are configurable:
|Option||Default||Description|
|includeBinaries||true||Whether binary values should be restored from the backup folder. Because ModeShape stores only references from nodes to the binary values, which are stored externally, backing up and restoring a repository without including binaries will work and ModeShape will recreate the correct links|
|reindexContent||true||Whether the content should be reindexed once the restore is complete. If the backup contains a lot of data and you already have accurate indexes created prior to the backup, you may want to skip this|
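The override pattern behind these options can be sketched as follows. The two abstract classes here are hypothetical stand-ins, declared locally so the example runs on its own; their method names and defaults are taken from the option tables in this section and may not match the actual BackupOptions and RestoreOptions signatures, so check the ModeShape Javadoc before relying on them.

```java
class OptionsSketch {

    /** Hypothetical stand-in for backup options; defaults follow the backup options table. */
    abstract static class BackupOptionsStandIn {
        public boolean includeBinaries() { return true; }
        public long documentsPerFile() { return 100_000L; }
        public boolean compress() { return true; }
    }

    /** Hypothetical stand-in for restore options; defaults follow the restore options table. */
    abstract static class RestoreOptionsStandIn {
        public boolean includeBinaries() { return true; }
        public boolean reindexContent() { return true; }
    }

    /** Skips binary values to shorten the backup; every other option keeps its default. */
    static BackupOptionsStandIn backupWithoutBinaries() {
        return new BackupOptionsStandIn() {
            @Override
            public boolean includeBinaries() { return false; }
        };
    }

    /** Skips reindexing after the restore, for cases where accurate indexes already exist. */
    static RestoreOptionsStandIn restoreWithoutReindex() {
        return new RestoreOptionsStandIn() {
            @Override
            public boolean reindexContent() { return false; }
        };
    }

    public static void main(String[] args) {
        BackupOptionsStandIn backup = backupWithoutBinaries();
        RestoreOptionsStandIn restore = restoreWithoutReindex();
        System.out.println("backup includes binaries: " + backup.includeBinaries());
        System.out.println("backup documents per file: " + backup.documentsPerFile());
        System.out.println("restore reindexes content: " + restore.reindexContent());
    }
}
```

Overriding only the methods that differ from the defaults keeps each customization small and leaves the remaining behavior as documented in the tables.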
We mentioned above that backup and restore can be used to migrate from one version of ModeShape to the next major version of ModeShape. To do this, first upgrade to the latest version of 3.8 (this can be done in-place) and perform a backup of the repository, which will store the backup files on the local file system. Start up a new ModeShape 4.x repository instance with the desired storage and configuration, and then use the restore functionality to read your backup files.
Note that in a few limited cases it may be possible to upgrade in-place from 3.8 to 4.x. This will only work if the Infinispan 5.3 cache store used in ModeShape 3.8 can be directly accessed by the Infinispan 6.x cache store used in ModeShape 4.x. For example, this appears to be true of the JDBC cache store. If you are going to attempt this route, be sure to back up the 3.8 repository first.
Backup and restore cannot, however, be used to migrate from ModeShape 2.x to 3.x: the backup feature does not exist in any of the 2.x releases, so this is not currently an option. Instead, content can be migrated via JCR import/export or with a custom application that walks a 2.x repository and copies content into the 3.x repository.
When ModeShape creates a backup in the directory of your choosing, it simply extracts the node representations and binary values and writes them to files in a directory on the file system. The node representations are actually Schematic documents, which are in-memory documents that have all the capability of both JSON and BSON, and can easily be written out in either format without loss of information.
ModeShape performs the following steps when creating a backup:
- Iterate over all of the Infinispan cache entries, appending a number (e.g., 1000) of node documents in JSON format to a file. Whenever the maximum number of entries per backup file is reached, the file is closed, a new file is created, and the appending continues. Note that files are compressed using GZIP, as the JSON format compresses quite well. By default, 100K nodes will be exported to a single backup file; if each node required about 200 bytes (compressed), the resulting files would be about 19 MB in size (100,000 × 200 bytes = 20,000,000 bytes).
- Write out each of the binary values to a separate file.
ModeShape uses a naming convention and organization within a single directory so that the restore process can simply process all of these files and load them into the new repository's Infinispan cache and binary store.
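The first step in the list above (append documents to a GZIP-compressed file, then roll over to a new file once the per-file limit is reached) can be sketched with the JDK alone. This is an illustration of the technique, not ModeShape's implementation: the file name pattern `documents_NNNNNN.json.gz` and the one-document-per-line layout are assumptions, and each document is assumed to be a single-line JSON string.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

class ChunkedBackupWriterSketch {

    /**
     * Writes the documents into GZIP-compressed files inside the directory, starting a new
     * file whenever documentsPerFile documents have been written. Returns the files created.
     */
    static List<File> writeDocuments(Iterator<String> jsonDocuments, File directory,
                                     int documentsPerFile) throws IOException {
        if (documentsPerFile < 1) {
            throw new IllegalArgumentException("documentsPerFile must be positive");
        }
        Files.createDirectories(directory.toPath());
        List<File> files = new ArrayList<>();
        while (jsonDocuments.hasNext()) {
            // Hypothetical naming convention: a zero-padded sequence number per file.
            String name = String.format(Locale.ROOT, "documents_%06d.json.gz", files.size() + 1);
            File file = new File(directory, name);
            try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(file.toPath())),
                    StandardCharsets.UTF_8))) {
                for (int written = 0; written < documentsPerFile && jsonDocuments.hasNext(); written++) {
                    out.write(jsonDocuments.next());
                    out.newLine(); // one document per line
                }
            }
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        List<String> documents = Arrays.asList(
                "{\"key\":\"a\"}", "{\"key\":\"b\"}", "{\"key\":\"c\"}",
                "{\"key\":\"d\"}", "{\"key\":\"e\"}");
        File directory = Files.createTempDirectory("backup-sketch").toFile();
        for (File file : writeDocuments(documents.iterator(), directory, 2)) {
            System.out.println(file.getName() + " (" + file.length() + " bytes)");
        }
    }
}
```

A predictable, ordered file naming scheme like this is what lets a restore process enumerate the directory and load every file without any additional index.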