JBoss.org Community Documentation

Chapter 36. eXo JCR Backup Service

36.1. Concept
36.2. How it works
36.2.1. Implementation details
36.2.2. Work basics
36.3. Configuration
36.4. RDBMS backup
36.5. Usage
36.5.1. Performing a Backup
36.5.2. Performing a Restore
36.5.3. Repository and Workspace initialization from backup
36.6. Scheduling (experimental)
36.7. Restore existing workspace or repository
36.8. Restore a workspace or a repository using original configuration
36.9. Backup set portability

Note

Restoring the system workspace is supported only as part of restoring the whole repository.

The main purpose of the Backup service is to restore data after system faults and repository crashes. The backup results may also be used as a content history.

The concept is based on exporting a workspace unit in the Full or Full + Incrementals model. A repository workspace can be backed up and restored using a combination of these modes. In all cases, at least one Full (initial) backup must be executed to mark the starting point of the backup history. An Incremental backup is not a complete image of the workspace; it contains only the changes made during some period. It is therefore not possible to perform an Incremental backup without an initial Full backup.

The Backup service may operate as a hot-backup process at runtime on an in-use workspace. In this case the Full + Incrementals model should be used to guarantee data consistency during restoration. The Incremental backup runs from the start point of the Full backup, so it also contains the changes that occurred while the Full backup was running.

A restore operation mirrors a backup. At least one Full backup must be restored to obtain a workspace corresponding to some point in time. Incrementals may then be restored in the order of their creation to reach the required state of the content. If an Incremental contains the same data as the Full backup (hot-backup), the changes are applied again as if they were made in the normal way via API calls.
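The restore order described above can be pictured with a toy model. The sketch below is not the eXo JCR implementation: the class RestoreReplay, the map-based workspace and the change-set representation are hypothetical and serve only to illustrate that the Full image is loaded first and the Incrementals are then replayed in creation order. In this toy model a change is a plain property assignment, so replaying a change that the Full image already contains (the hot-backup case) leaves the state unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a Full + Incrementals restore; not the eXo JCR implementation. */
final class RestoreReplay {

    /**
     * Rebuilds a workspace state from one Full image and the Incrementals
     * taken after it. Incrementals must be passed in creation order.
     * A null value in a change set stands for a removed property (an assumption
     * of this sketch).
     */
    static Map<String, String> restore(Map<String, String> full,
                                       List<Map<String, String>> incrementals) {
        // Step 1: the Full backup is the starting point of the history.
        Map<String, String> workspace = new LinkedHashMap<>(full);

        // Step 2: replay each Incremental in the order it was created.
        for (Map<String, String> changes : incrementals) {
            for (Map.Entry<String, String> change : changes.entrySet()) {
                if (change.getValue() == null) {
                    workspace.remove(change.getKey());
                } else {
                    workspace.put(change.getKey(), change.getValue());
                }
            }
        }
        return workspace;
    }

    public static void main(String[] args) {
        Map<String, String> full = new LinkedHashMap<>();
        full.put("/a", "1");
        full.put("/b", "2");

        Map<String, String> inc1 = new LinkedHashMap<>();
        inc1.put("/b", "2");   // change already captured by the hot Full backup
        inc1.put("/c", "3");   // change made after the Full backup

        Map<String, String> inc2 = new LinkedHashMap<>();
        inc2.put("/a", null);  // removal

        List<Map<String, String>> incrementals = new ArrayList<>();
        incrementals.add(inc1);
        incrementals.add(inc2);

        System.out.println(restore(full, incrementals)); // prints {/b=2, /c=3}
    }
}
```

Stopping the replay after the first Incremental would yield the earlier state with /a, /b and /c, which is how a chosen point in the backup history is reached.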

According to the model there are several modes for backup logic:

The work of Backup is based on the BackupConfig configuration and the BackupChain logical unit.

BackupConfig describes the backup operation chain that will be performed by the service. When you intend to work with it, the configuration should be prepared before the backup is started.

The configuration contains such values as:

BackupChain is the unit that performs the backup process: it executes the initial Full backup and manages the Incremental operations. BackupChain is used as the key object for accessing current backups at runtime via BackupManager. Each BackupJob performs a single atomic operation, a Full or an Incremental process, and the result of that operation is data for a Restore. A BackupChain can contain one or more BackupJobs, but the initial Full job is always present. Each BackupJob has its own unique number, which is its order in the chain; the initial Full job always has the number 0.

Backup process, result data and file location

To start the backup process, create a BackupConfig and call the BackupManager.startBackup(BackupConfig) method. This method returns the BackupChain created according to the configuration. At the same time, the chain creates a BackupChainLog which persists the BackupConfig content and the BackupChain operation states to a file in the service working directory (see Configuration).

When the chain starts its work and the initial BackupJob starts, the job creates a result data file using the destination directory path from BackupConfig. The destination directory will contain a directory with an automatically created name following the pattern repository_workspace-timestamp, where timestamp is the current time in the format yyyyMMdd_hhmmss (e.g. db1_ws1-20080306_055404). That directory will contain the results of all Jobs configured for execution. Each Job stores its backup result in its own file named repository_workspace-timestamp.jobNumber. BackupChain saves each state (STARTING, WAITING, WORKING, FINISHED) of its Jobs in the BackupChainLog, which also holds the full path of the current result file.
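The naming pattern above can be reproduced with a few lines of Java. This is a minimal sketch, not the service's own code: the class BackupSetNames and its methods are hypothetical. It copies the yyyyMMdd_hhmmss pattern literally as stated; note that in java.text.SimpleDateFormat the letters hh mean the 12-hour clock (HH would be the 24-hour clock), and which of the two the service really uses is not confirmed here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Calendar;
import java.util.Locale;

/** Hypothetical helper that mirrors the documented backup set naming pattern. */
final class BackupSetNames {

    // Pattern copied from the text; "hh" is the 12-hour clock in SimpleDateFormat.
    static final String TIMESTAMP_PATTERN = "yyyyMMdd_hhmmss";

    /** Builds repository_workspace-timestamp for the given chain start time. */
    static String directoryName(String repository, String workspace, Date start) {
        SimpleDateFormat format = new SimpleDateFormat(TIMESTAMP_PATTERN, Locale.ROOT);
        return repository + "_" + workspace + "-" + format.format(start);
    }

    /** Builds repository_workspace-timestamp.jobNumber; job 0 is the initial Full job. */
    static String jobFileName(String directoryName, int jobNumber) {
        return directoryName + "." + jobNumber;
    }

    public static void main(String[] args) {
        Date start = new GregorianCalendar(2008, Calendar.MARCH, 6, 5, 54, 4).getTime();
        String dir = directoryName("db1", "ws1", start);
        System.out.println(dir);                 // db1_ws1-20080306_055404
        System.out.println(jobFileName(dir, 0)); // db1_ws1-20080306_055404.0
    }
}
```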

The BackupChain log file and the job result files together form a single consistent unit, which is the source for a Restore.

Restore requirements

As mentioned before, a Restore operation is a mirror of a Backup. The process is a Full restore of the root node, followed by restoring additional Incremental backups to reach the desired workspace state. Restoring the workspace Full backup creates a new workspace in the repository, using the given RepositoryEntry of the existing repository and the given (preconfigured) WorkspaceEntry for the new target workspace. The Restore process restores the root node from the SysView XML data.

In short, a Restore creates a new Workspace and fills it with the Backup content. If a target Workspace with the same name already exists in the Repository, you have to configure a new name for the restored one. If no such workspace exists in the Repository, you may use the same name as in the Backup.

As an optional extension, the Backup service is not enabled by default. You need to enable it via configuration.

The following is an example configuration:

<component>
  <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key>
  <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type>
  <init-params>
    <properties-param>
      <name>backup-properties</name>
      <property name="backup-dir" value="target/backup" />
    </properties-param>
  </init-params>
</component>

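The mandatory parameter is backup-dir, the path to the service working directory where the Backup service stores its chain logs and scheduler task files.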

Also, there are optional parameters:

RDBMS backup is the latest, currently supported, default and recommended implementation of the full backup job for the BackupManager service. It is useful when a database is used to store the data.

Brings such advantages:

Restoration involves reloading the backup file into a BackupChainLog and applying the appropriate workspace initialization. The following snippet shows the typical sequence for restoring a workspace:

// find the BackupChain using the repository and workspace names (returns null if not found)
BackupChain chain = backup.findBackup("db1", "ws1");

// get the RepositoryEntry of the target repository and prepare the WorkspaceEntry
// ("repository" and "workspace" hold the target repository and workspace names)
ManageableRepository repo = repositoryService.getRepository(repository);
RepositoryEntry repositoryEntry = repo.getConfiguration();
List<WorkspaceEntry> entries = repositoryEntry.getWorkspaceEntries();
WorkspaceEntry workspaceEntry = getNewEntry(entries, workspace); // create a copy entry from an existing one

// load the backup log written by the chain
File backLog = new File(chain.getLogFilePath());
BackupChainLog bchLog = new BackupChainLog(backLog);

// initialize the workspace
repo.configWorkspace(workspaceEntry);

// run the restoration using the prepared RepositoryEntry and WorkspaceEntry
backup.restore(bchLog, repositoryEntry, workspaceEntry);

Repository and Workspace initialization from a backup can be done with the BackupWorkspaceInitializer.

To restore a Workspace from a backup through the initializer, configure BackupWorkspaceInitializer in the configuration of that workspace.

To restore a whole Repository from a backup through the initializer, configure BackupWorkspaceInitializer in the configuration of every workspace of the Repository.

Restoring a repository or a workspace requires shutting down the repository.

Follow these steps:

Example of an initializer configuration that restores the workspace "backup" through BackupWorkspaceInitializer:

<workspaces>
  <workspace name="backup" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_backup-20110120_044734"/>
      </properties>
    </initializer>
    ...
  </workspace>
</workspaces>

Example of initializer configurations that restore the repository "repository" through BackupWorkspaceInitializer:

In the repository configuration, the initializer of each workspace must refer to the corresponding backup.

For example:

...
<workspaces>
 <workspace name="system" ... >
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
  ...
  </container>
  <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
   <properties>
    <property name="restore-path" value="D:\java\exo-working\backup\repository_system-20110120_052334"/>
   </properties>
  </initializer>
  ...
 </workspace>

 <workspace name="collaboration" ... >
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
  ...
  </container>
  <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
   <properties>
    <property name="restore-path" value="D:\java\exo-working\backup\repository_collaboration-20110120_052341"/>
   </properties>
  </initializer>
  ...
 </workspace>

 <workspace name="backup" ... >
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
  ...
  </container>

  <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
   <properties>
    <property name="restore-path" value="D:\java\exo-working\backup\repository_backup-20110120_052417"/>
   </properties>
  </initializer>
  ...
 </workspace>
</workspaces>

The Backup service has an additional feature that can be useful for a production-level backup implementation. To organize the backup of a repository, you need a tool that can create and manage a cycle of Full and Incremental backups in a periodic manner.

The service has an internal BackupScheduler which can run a configurable cycle of BackupChains as if they had been executed by a user over some period of time. In other words, BackupScheduler is a user-like daemon which asks the BackupManager to start or stop backup operations.

For that purpose, BackupScheduler has the following method:

BackupScheduler.schedule(backupConfig, startDate, stopDate, chainPeriod, incrementalPeriod)

where

// getting the scheduler from the BackupManager
BackupScheduler scheduler = backup.getScheduler();

// schedule backup using a ready configuration (Full + Incrementals) to run from startTime
// to stopTime. Full backup will be performed every 24 hours (BackupChain lifecycle),
// incremental will rotate result files every 3 hours.
scheduler.schedule(config, startTime, stopTime, 3600 * 24, 3600 * 3);

// it's possible to run the scheduler for an open-ended period of time (i.e. without stop time).
// schedule backup to run from startTime until it is stopped manually;
// here the incremental will rotate result files as configured in BackupConfig
scheduler.schedule(config, startTime, null, 3600 * 24, 0);

// to unschedule a backup, simply call the scheduler with the configuration describing the
// already planned backup cycle.
// the scheduler will search its internal task list for the task with the repository and
// workspace name from the configuration and will stop that task.
scheduler.unschedule(config);

When the BackupScheduler starts the scheduling, it uses an internal Timer with startDate for the first (or only) execution. If chainPeriod is greater than 0, the task is repeated with this value as the period, starting from startDate. Otherwise, the task is executed once at startDate. If the scheduler has a stopDate, it stops the task (the chain cycle) after stopDate. The last parameter, incrementalPeriod, is used instead of the corresponding value from BackupConfig if it is greater than 0.
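The timing rules above can be written down as a small calculation. The following is only a sketch of those rules, not the BackupScheduler code: the class ChainSchedule and its methods are hypothetical, the periods are assumed to be in seconds (inferred from the 3600 * 24 value used for 24 hours in the example), and a run falling exactly on stopDate is assumed not to start. The maxRuns argument exists only to keep the list finite when there is no stopDate.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical model of the documented chain scheduling rules. */
final class ChainSchedule {

    /**
     * Returns the start times of the chains a task would run.
     * chainPeriodSeconds <= 0 means "run once at startDate";
     * stopDate == null means "run until stopped manually".
     */
    static List<Date> chainStarts(Date startDate, Date stopDate,
                                  long chainPeriodSeconds, int maxRuns) {
        List<Date> starts = new ArrayList<>();
        if (chainPeriodSeconds <= 0) {
            starts.add(startDate); // single execution at startDate
            return starts;
        }
        long next = startDate.getTime();
        // Assumption of this sketch: a run at or after stopDate is not started.
        while (starts.size() < maxRuns
                && (stopDate == null || next < stopDate.getTime())) {
            starts.add(new Date(next));
            next += chainPeriodSeconds * 1000L;
        }
        return starts;
    }

    /** The scheduled incrementalPeriod overrides the BackupConfig value only if it is > 0. */
    static long effectiveIncrementalPeriod(long configuredPeriod, long scheduledPeriod) {
        return scheduledPeriod > 0 ? scheduledPeriod : configuredPeriod;
    }

    public static void main(String[] args) {
        Date start = new Date(0L);
        Date stop = new Date(3L * 24 * 3600 * 1000); // three days after start
        System.out.println(chainStarts(start, stop, 3600 * 24, 10).size()); // 3
        System.out.println(effectiveIncrementalPeriod(7200, 0));            // 7200
    }
}
```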

For each task started with BackupScheduler.schedule(...), the scheduler creates a task file in the service working directory (see Configuration, backup-dir) which describes the task's backup configuration and periodic values. These files are used at backup service start (JVM start) to reinitialize BackupScheduler so that task scheduling continues. Only tasks that have no stopDate, or whose stopDate has not expired, are reinitialized.
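The reinitialization condition can be stated as a one-line predicate. This is a hedged illustration only: TaskReinitialization is a hypothetical name, and the real service reads the stopDate from its task files, which is not modelled here.

```java
import java.util.Date;

/** Hypothetical predicate for the documented reinitialization rule. */
final class TaskReinitialization {

    /** A task is picked up again if it has no stopDate or its stopDate is still in the future. */
    static boolean shouldReinitialize(Date stopDate, Date now) {
        return stopDate == null || stopDate.after(now);
    }

    public static void main(String[] args) {
        Date now = new Date(1_000_000L);
        System.out.println(shouldReinitialize(null, now));                // true
        System.out.println(shouldReinitialize(new Date(2_000_000L), now)); // true
        System.out.println(shouldReinitialize(new Date(500_000L), now));   // false
    }
}
```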

There is one caveat about BackupScheduler task reinitialization in the current implementation, which follows from the nature of the BackupScheduler. As the scheduler is just a virtual user which asks the BackupManager to start or stop backup operations, it cannot reinitialize the BackupChains that existed before the service (JVM) was stopped. It can, however, start a new operation via BackupManager with the same configuration (the one that was configured before and stored in the task file).

This is the main detail of the BackupScheduler that should currently be taken into account when designing a backup operation. After reinitialization, the task has new time values for the backup operation cycle, because chainPeriod and incrementalPeriod are applied again. That behaviour may change in the future.

Restoring an existing workspace or repository is supported.

The following special methods are used for such a restore:

   /**
    * Restore existing workspace. Previous data will be deleted.
    * To get the status of the workspace restore, use the
    * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
    * 
    * @param workspaceBackupIdentifier
    *          backup identifier
    * @param workspaceEntry
    *          new workspace configuration
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingWorkspace(String workspaceBackupIdentifier, String repositoryName, WorkspaceEntry workspaceEntry,
      boolean asynchronous) throws BackupOperationException, BackupConfigurationException;

   /**
    * Restore existing workspace. Previous data will be deleted.
    * To get the status of the workspace restore, use the
    * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
    * 
    * @param log
    *          workspace backup log
    * @param workspaceEntry
    *          new workspace configuration
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingWorkspace(BackupChainLog log, String repositoryName, WorkspaceEntry workspaceEntry, boolean asynchronous)  throws BackupOperationException, BackupConfigurationException;

   /**
    * Restore existing repository. Previous data will be deleted.
    * To get the status of the repository restore, use the
    * BackupManager.getLastRestore(String repositoryName) method.
    * 
    * @param repositoryBackupIdentifier
    *          backup identifier
    * @param repositoryEntry
    *          new repository configuration
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingRepository(String  repositoryBackupIdentifier, RepositoryEntry repositoryEntry, boolean asynchronous)  throws BackupOperationException, BackupConfigurationException;

   /**
    * Restore existing repository. Previous data will be deleted.
    * To get the status of the repository restore, use the
    * BackupManager.getLastRestore(String repositoryName) method.
    * 
    * @param log
    *          repository backup log
    * @param repositoryEntry
    *          new repository configuration
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingRepository(RepositoryBackupChainLog log, RepositoryEntry repositoryEntry, boolean asynchronous)
      throws BackupOperationException, BackupConfigurationException;

These methods for restore will do:

The Backup manager allows you to restore a repository or a workspace using the original configuration stored in the backup log:

   /**
    * Restore existing workspace. Previous data will be deleted.
    * To get the status of the workspace restore, use the
    * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
    * The WorkspaceEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param workspaceBackupIdentifier
    *          identifier to workspace backup. 
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread) 
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred 
    */
   void restoreExistingWorkspace(String workspaceBackupIdentifier, boolean asynchronous)
            throws BackupOperationException,
            BackupConfigurationException;

   /**
    * Restore existing repository. Previous data will be deleted.
    * To get the status of the repository restore, use the
    * BackupManager.getLastRestore(String repositoryName) method.
    * The RepositoryEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param repositoryBackupIdentifier
    *          identifier to repository backup.   
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingRepository(String repositoryBackupIdentifier, boolean asynchronous)
            throws BackupOperationException,
            BackupConfigurationException;

   /**
    * The WorkspaceEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param workspaceBackupIdentifier
    *          identifier to workspace backup. 
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread) 
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred 
    */
   void restoreWorkspace(String workspaceBackupIdentifier, boolean asynchronous) throws BackupOperationException,
            BackupConfigurationException;

   /**
    * The RepositoryEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param repositoryBackupIdentifier
    *          identifier to repository backup.   
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreRepository(String repositoryBackupIdentifier, boolean asynchronous) throws BackupOperationException,
            BackupConfigurationException;

   /**
    * Restore existing workspace. Previous data will be deleted.
    * To get the status of the workspace restore, use the
    * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
    * The WorkspaceEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param workspaceBackupSetDir
    *          the directory with backup set  
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread) 
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred 
    */
   void restoreExistingWorkspace(File workspaceBackupSetDir, boolean asynchronous)
            throws BackupOperationException, BackupConfigurationException;

   /**
    * Restore existing repository. Previous data will be deleted.
    * To get the status of the repository restore, use the
    * BackupManager.getLastRestore(String repositoryName) method.
    * The RepositoryEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param repositoryBackupSetDir
    *          the directory with backup set     
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreExistingRepository(File repositoryBackupSetDir, boolean asynchronous)
            throws BackupOperationException, BackupConfigurationException;

   /**
    * The WorkspaceEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param workspaceBackupSetDir
    *          the directory with backup set 
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread) 
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred 
    */
   void restoreWorkspace(File workspaceBackupSetDir, boolean asynchronous) throws BackupOperationException,
            BackupConfigurationException;

   /**
    * The RepositoryEntry for the restore must be contained in the BackupChainLog.
    * 
    * @param repositoryBackupSetDir
    *          the directory with backup set   
    * @param asynchronous
    *          if 'true' restore will be in asynchronous mode (i.e. in separated thread)
    * @throws BackupOperationException
    *           if backup operation exception occurred 
    * @throws BackupConfigurationException
    *           if configuration exception occurred
    */
   void restoreRepository(File repositoryBackupSetDir, boolean asynchronous) throws BackupOperationException,
            BackupConfigurationException;

During the Backup operation, the Backup log is stored in two different locations: the backup-dir directory of the Backup service, to support interactive operations via the Backup API (e.g. the console), and the backup set files, for portability (e.g. to restore on another server).