JBoss Community Archive (Read Only)

ModeShape 5

Backup and restore

ModeShape contains a backup and restore feature that enables repository administrators to create backups of an entire repository (even when the repository is in use) and then to restore a repository to the state reflected by a particular backup. This works regardless of where the repository content is persisted.

There are several reasons why you might want to restore a repository to a previous state, and many are quite obvious. For example, the application or the process it’s running in might stop unexpectedly. Or perhaps the hardware on which the process is running might fail. Or perhaps the persistent store might have a catastrophic failure (although surely you’re also using the persistent store’s backup system, too).

But there are also non-failure-related reasons. Backups of a running repository can be used to transfer the content to a new repository that is perhaps hosted in a different location. It might be possible to manually transfer the persisted content (e.g., in a database or on the file system), but the process of doing so varies with different kinds of persistence options. Also, ModeShape can be configured to use a distributed in-memory data grid that already maintains its own copies for ensuring high availability, and therefore the data grid might not persist anything to disk. In such cases, the content is stored on the data grid’s virtual heap, and getting access to it without ModeShape may be quite difficult. Or, you may initially configure your repository to use a particular persistence approach that is suitable for your current needs, but over time the repository grows and you want to move to a different, more scalable (but perhaps more complex) persistence approach. Finally, the backup and restore feature can be used to migrate to a new major version of ModeShape.

In short, you may very well have the need to set the contents of a repository back to an earlier state. ModeShape’s backup and restore feature makes this easy to do.

Getting started

Let’s walk through the basic process of creating a backup of an existing repository and then restoring the repository. Both of these steps require an authenticated Session that has administrative privileges. It actually doesn’t matter which workspace the session uses:

javax.jcr.Repository repository = ...
javax.jcr.Credentials credentials = ...
String workspaceName = ...
javax.jcr.Session session = repository.login(credentials,workspaceName);

So far, this is basic and standard stuff for any JCR client.

Introducing the RepositoryManager

Each JCR Session instance has its own Workspace object that provides workspace-level functionality and access to a set of “manager” interfaces: the VersionManager, NodeTypeManager, ObservationManager, LockManager, etc. The JSR-333 (aka, “JCR 2.1”) effort is still incomplete, but has plans to introduce a RepositoryManager that offers some repository-level functionality. The ModeShape public API has created such an interface, and accessing it from a standard JCR Session instance is pretty simple:

org.modeshape.jcr.api.Session msSession = (org.modeshape.jcr.api.Session)session;
org.modeshape.jcr.api.RepositoryManager repoMgr = msSession.getWorkspace().getRepositoryManager();

The interface is pretty self-explanatory, and defines several methods including two that are related to the backup and restore feature:

public interface RepositoryManager {

    ...

     /**
     * Begin a backup operation of the entire repository, writing the files associated with the backup to the specified directory
     * on the local file system.
     * <p>
     * The repository must be active when this operation is invoked, and it can continue to be used during backup (e.g., this can
     * be a "live" backup operation), but this is not recommended if the backup will be used as part of a migration to a different
     * version of ModeShape or to a different installation.
     * </p>
     * <p>
     * Multiple backup operations can operate at the same time, so it is the responsibility of the caller to not overload the
     * repository with backup operations.
     * </p>
     * 
     * @param backupDirectory the directory on the local file system into which all backup files will be written; this directory
     *        need not exist, but the process must have write privilege for this directory
     * @return the problems that occurred during the backup operation
     * @throws AccessDeniedException if the current session does not have sufficient privileges to perform the backup
     * @throws RepositoryException if the backup cannot be run
     */
    Problems backupRepository( File backupDirectory ) throws RepositoryException;
    
    /**
     * Begin a backup operation of the entire repository, writing the files associated with the backup to the specified directory
     * on the local file system.
     * <p>
     * The repository must be active when this operation is invoked, and it can continue to be used during backup (e.g., this can
     * be a "live" backup operation), but this is not recommended if the backup will be used as part of a migration to a different
     * version of ModeShape or to a different installation.
     * </p>
     * <p>
     * Multiple backup operations can operate at the same time, so it is the responsibility of the caller to not overload the
     * repository with backup operations.
     * </p>
     * 
     * @param backupDirectory the directory on the local file system into which all backup files will be written; this directory
     *        need not exist, but the process must have write privilege for this directory
     * @param backupOptions a {@link org.modeshape.jcr.api.BackupOptions} instance which can be used for more fine-grained control
     * of the elements that are backed up.
     * @return the problems that occurred during the backup operation
     * @throws AccessDeniedException if the current session does not have sufficient privileges to perform the backup
     * @throws RepositoryException if the backup cannot be run
     */
    Problems backupRepository( File backupDirectory, BackupOptions backupOptions ) throws RepositoryException;

    /**
     * Begin a restore operation of the entire repository, reading the backup files in the specified directory on the local file
     * system. Upon completion of the restore operation, the repository will be restarted automatically.
     * <p>
     * The repository must be active when this operation is invoked. However, the repository <em>may not</em> be used by any other
     * activities during the restore operation; doing so will likely result in a corrupt repository.
     * </p>
     * <p>
     * It is the responsibility of the caller to ensure that this method is only invoked once; calling multiple times will lead to
     * a corrupt repository.
     * </p>
     * 
     * @param backupDirectory the directory on the local file system in which all backup files exist and were written by a
     *        previous {@link #backupRepository(File) backup operation}; this directory must exist, and the process must have read
     *        privilege for all contents in this directory
     * @return the problems that occurred during the restore operation
     * @throws AccessDeniedException if the current session does not have sufficient privileges to perform the restore
     * @throws RepositoryException if the restoration cannot be run
     */
    Problems restoreRepository( File backupDirectory ) throws RepositoryException;
    
    /**
     * Begin a restore operation of the entire repository, reading the backup files in the specified directory on the local file
     * system. Upon completion of the restore operation, the repository will be restarted automatically.
     * <p>
     * The repository must be active when this operation is invoked. However, the repository <em>may not</em> be used by any other
     * activities during the restore operation; doing so will likely result in a corrupt repository.
     * </p>
     * <p>
     * It is the responsibility of the caller to ensure that this method is only invoked once; calling multiple times will lead to
     * a corrupt repository.
     * </p>
     * 
     * @param backupDirectory the directory on the local file system in which all backup files exist and were written by a
     *        previous {@link #backupRepository(File) backup operation}; this directory must exist, and the process must have read
     *        privilege for all contents in this directory
     * @param options a {@link org.modeshape.jcr.api.RestoreOptions} instance which can be used to control the elements and the
     * behavior of the restore process.
     * @return the problems that occurred during the restore operation
     * @throws AccessDeniedException if the current session does not have sufficient privileges to perform the restore
     * @throws RepositoryException if the restoration cannot be run
     */
    Problems restoreRepository( File backupDirectory, RestoreOptions options ) throws RepositoryException;
}

Next, we’ll take a look at each of these two methods.

Creating a backup

The backupRepository(...) method on ModeShape’s RepositoryManager interface is used to create a backup of the entire repository, including all workspaces that existed when the backup was initiated. This method blocks until the backup is completed, so it is the caller’s responsibility to invoke the method asynchronously if that is desired. When this method is called on a repository that is being actively used, all of the changes made while the backup process is underway will be included; at some point near the end of the backup process, however, additional changes will be excluded from the backup. This means that each backup contains a fully-consistent snapshot of the entire repository as it existed near the time at which the backup completed.

Here’s a code example showing how easy it is to call this method:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.backupRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems backing up the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
        System.out.println(problem);
    }
} else {
    System.out.println("The backup was successful");
}
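
Because backupRepository(...) blocks until the backup completes, a caller that needs to stay responsive can hand the call to an executor. The following is a minimal sketch using only the JDK. The AsyncBackup class, its runAsync helper, and the Callable that stands in for the real backupRepository(...) call are illustrative assumptions and are not part of the ModeShape API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncBackup {

    /**
     * Runs a blocking task on the given executor and returns a future for its result.
     * The task is a stand-in for a call such as repoMgr.backupRepository(backupDirectory).
     */
    public static <T> CompletableFuture<T> runAsync(Callable<T> blockingBackup, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingBackup.call();
            } catch (Exception e) {
                // Surface checked exceptions (e.g. a RepositoryException) through the future
                throw new CompletionException(e);
            }
        }, executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Hypothetical stand-in: a real task would call backupRepository(...) and return its Problems
            CompletableFuture<String> result = runAsync(() -> "backup finished", executor);
            // The calling thread is free to do other work here; join() waits for the result
            System.out.println(result.join());
        } finally {
            executor.shutdown();
        }
    }
}
```

A single-threaded executor is used here so that at most one backup runs at a time, which is one simple way to avoid overloading the repository with concurrent backup operations.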

Each ModeShape backup is stored on the file system in a directory that contains a series of GZIP-ed files (each containing representations of approximately 100K nodes) and a subdirectory in which all the large BINARY values are stored.

It is also the application’s responsibility to initiate each backup operation. In other words, there currently is no way to configure ModeShape to perform backups on a schedule. Doing so would add significant complexity to ModeShape and the configuration, whereas leaving it to the application lets the application fully control how and when such backups occur.
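
Since scheduling is left to the application, one possible approach is a JDK ScheduledExecutorService. The sketch below is only an illustration under stated assumptions: the BackupScheduler class and its schedule helper are hypothetical names, and the Runnable stands in for code that would call backupRepository(...) with a suitable directory.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class BackupScheduler {

    /**
     * Schedules the given backup task to run repeatedly. A fixed delay is measured from the
     * end of one run to the start of the next, so runs on this schedule never overlap.
     */
    public static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                              Runnable backupTask,
                                              long initialDelay,
                                              long delay,
                                              TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                backupTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and continue
                System.err.println("Backup run failed: " + e);
            }
        };
        return scheduler.scheduleWithFixedDelay(guarded, initialDelay, delay, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical task; a real one would call backupRepository(...) with a fresh directory
        schedule(scheduler, () -> System.out.println("backup run"), 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

In a real deployment the delay would more likely be hours or days, and each run would probably write to a new, timestamped directory so that earlier backups are preserved.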

Restoring a repository

Once you have a complete backup on disk, you can then restore a repository back to the state captured within the backup. To do that, simply start a repository (or perhaps a new instance of a repository with a different configuration) and, before it’s used by any applications, load into the new repository all of the content in the backup. Here’s a simple code example that shows how this is done:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.restoreRepository(backupDirectory);
if ( problems.hasProblems() ) {
    System.out.println("Problems restoring the repository:");
    // Report the problems (we'll just print them out) ...
    for ( Problem problem : problems ) {
        System.out.println(problem);
    }
} else {
    System.out.println("The restoration was successful");
}

Once a restore succeeds, the newly-restored repository will be restarted and will be ready to be used.

Remote backup & restore

It's also possible to back up and restore a repository remotely, using the backup and restore methods of ModeShape's REST Service or via the repository Web Explorer.

Advanced backup & restore configuration options

The above examples show how to perform a default backup & restore. However, ModeShape also offers a more advanced API that allows fine-tuning of both the backup and restore behavior by extending a couple of abstract classes: BackupOptions and RestoreOptions.

In the case of backup, the following options are configurable:

| Parameter | Default | Description |
|---|---|---|
| includeBinaries | true | Whether binary values should be included in the backup. If your repository has a large number of binary values, you may want to exclude them, since including them can cause the backup to take a very long time. Because ModeShape stores only references between nodes and the binary values, which are stored externally, backing up and restoring a repository without including binaries will work, and ModeShape will recreate the correct links. |
| documentsPerFile | 100000 | The number of documents (i.e., entries stored by ModeShape) to be included in each backup file. |
| compress | true | Whether each file containing documents should be compressed. |
| batchSize | 10000 | The number of documents read in a batch from the persistent store and written to the backup. This is a performance setting to avoid out-of-memory errors. |

To use these options, one would use ModeShape's Backup API like so:

org.modeshape.jcr.api.RepositoryManager repoMgr = ...
java.io.File backupDirectory = ...
Problems problems = repoMgr.backupRepository(backupDirectory, new BackupOptions() {
            @Override
            public boolean includeBinaries() {
                return false;
            }

            @Override
            public long documentsPerFile() {
                return 1000;
            }

            @Override
            public boolean compress() {
                return true;
            }
        });

For restore, the following options are configurable:

| Parameter | Default | Description |
|---|---|---|
| includeBinaries | true | Whether binary values should be restored from the backup folder. Because ModeShape stores only references between nodes and the binary values, which are stored externally, backing up and restoring a repository without including binaries will work, and ModeShape will recreate the correct links. |
| reindexContent | true | Whether the content should be reindexed once the restore is complete. If the backup contains lots of data and you already have accurate indexes created prior to the backup, you may want to skip this. |
| batchSize | 1000 | The number of documents to write in a single transaction to the persistent store. This is a performance setting to avoid out-of-memory errors. |

The API call is similar to the backup API call: pass a RestoreOptions instance as the second argument to restoreRepository(...).

What's in the backup?

When ModeShape creates a backup in the directory of your choosing, it simply extracts the node representations and binary values and writes them to files in a directory on the file system. The node representations are actually Schematic documents, which are in-memory documents that have all the capability of both JSON and BSON, and can easily be written out in either format without loss of information.

ModeShape performs the following steps when creating a backup:

  1. Iterate over all of the repository's documents, appending a number (e.g., 1000) of node documents in JSON format to a file. Whenever the maximum number of entries per backup file is reached, the file is closed, a new file is created, and the appending continues. Note that files are compressed using GZIP, as the JSON format compresses quite well. By default, 100K nodes are exported to a single backup file; if each node requires about 200 bytes (compressed), each resulting file will be about 19 MB in size (100,000 × 200 bytes = 20,000,000 bytes).

  2. Write out each of the binary values to a separate file.

The backup uses a naming convention and organization within a single directory so that the restore process can simply process all of these files and load them into the new repository's persistent store and binary store.
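
The first step can be illustrated with a small, self-contained sketch that uses only the JDK. It is not ModeShape's actual implementation: the ChunkedBackupWriter class, the one-document-per-line layout, and the documents_NNNNNN.json.gz file names are all assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class ChunkedBackupWriter {

    /**
     * Writes each JSON document on its own line, starting a new GZIP-compressed file
     * whenever the current file holds documentsPerFile documents.
     * Returns the files that were written, in order.
     */
    public static List<Path> write(Iterable<String> jsonDocuments, Path directory, int documentsPerFile)
            throws IOException {
        if (documentsPerFile <= 0) {
            throw new IllegalArgumentException("documentsPerFile must be positive");
        }
        Files.createDirectories(directory);
        List<Path> files = new ArrayList<>();
        BufferedWriter writer = null;
        int inCurrentFile = 0;
        try {
            for (String document : jsonDocuments) {
                if (writer == null) {
                    // Hypothetical naming convention; numbered so files can be read back in order
                    Path file = directory.resolve(String.format("documents_%06d.json.gz", files.size() + 1));
                    files.add(file);
                    writer = new BufferedWriter(new OutputStreamWriter(
                            new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8));
                }
                writer.write(document);
                writer.newLine();
                if (++inCurrentFile == documentsPerFile) {
                    // Maximum reached: close this file; the next document opens a new one
                    writer.close();
                    writer = null;
                    inCurrentFile = 0;
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("backup-sketch");
        List<String> documents = Arrays.asList("{\"n\":1}", "{\"n\":2}", "{\"n\":3}", "{\"n\":4}", "{\"n\":5}");
        // Five documents at two per file produce three files
        System.out.println(write(documents, directory, 2).size()); // prints 3
    }
}
```

A restore process written against this layout would read the files in name order through a GZIPInputStream and load one document per line.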

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-11 12:12:25 UTC, last content change 2016-03-29 08:46:14 UTC.