JBoss.orgCommunity Documentation
The main purpose of that feature is to restore data in case of system faults and repository crashes. Also, the backup results may be used as a content history.
The eXo JCR backup service was developed from the JCR 1.8 implementation. It's an independent service available as an eXo JCR Extensions project.
The concept is based on the export of a workspace unit in the Full, or Full + Incrementals model. A repository workspace can be backup and restored using a combination of these modes. In all cases, at least one Full (initial) backup must be executed to mark a starting point of the backup history. An Incremental backup is not a complete image of the workspace. It contains only changes for some period. So it is not possible to perform an Incremental backup without an initial Full backup.
The Backup service may operate as a hot-backup process at runtime on an in-use workspace. It's a case when the Full + Incrementals model should be used to have a guaranty of data consistency during restoration. An Incremental will be run starting from the start point of the Full backup and will contain changes that have occured during the Full backup, too.
A restore operation is a mirror of a backup one. At least one Full backup should be restored to obtain a workspace corresponding to some points in time. On the other hand, Incrementals may be restored in the order of creation to reach a required state of a content. If the Incremental contains the same data as the Full backup (hot-backup), the changes will be applied again as if they were made in a normal way via API calls.
According to the model there are several modes for backup logic:
Full backup only : Single operation, runs once
Full + Incrementals : Start with an initial Full backup and then keep incrementals changes in one file. Run until it is stopped.
Full + Incrementals(periodic) : Start with an initial Full backup and then keep incrementals with periodic result file rotation. Run until it is stopped.
Full backup/restore is implemented using the JCR SysView Export/Import. Workspace data will be exported into Sysview XML data from root node.
Restoring is implemented, using the special eXo JCR API feature: a dynamic workspace creation. Restoring of the workspace Full backup will create one new workspace in the repository. Then, the SysView XML data will be imported as the root node.
Incremental backup is implemented using the eXo JCR ChangesLog API. This API allows to record each JCR API call as atomic entries in a changelog. Hence, the Incremental backup uses a listener that collects these logs and stores them in a file.
Restoring an incremental backup consists in applying the collected set of ChangesLogs to a workspace in the correct order.
The work of Backup is based on the BackupConfig configuration and the BackupChain logical unit.
BackupConfig describes the backup operation chain that will be performed by the service. When you intend to work with it, the configuration should be prepared before the backup is started.
The configuration contains such values as:
Types of full and incremental backup (fullBackupType, incrementalBackupType): Strings with full names of classes which will cover the type functional.
Incremental period: A period after that a current backup will be stopped and a new one will be started in seconds (long).
Target repository and workspace names: Strings with described names
Destination directory for result files: String with a path to a folder where operation result files will be stored.
BackupChain is a unit performing the backup process and it covers the principle of initial Full backup execution and manages Incrementals operations. BackupChain is used as a key object for accessing current backups during runtime via BackupManager. Each BackupJob performs a single atomic operation - a Full or Incremental process. The result of that operation is data for a Restore. BackupChain can contain one or more BackupJobs. But at least the initial Full job is always there. Each BackupJobs has its own unique number which means its Job order in the chain, the initial Full job always has the number 0.
Backup process, result data and file location
To start the backup process, it's necessary to create the BackupConfig and call the BackupManager.startBackup(BackupConfig) method. This method will return BackupChain created according to the configuration. At the same time, the chain creates a BackupChainLog which persists BackupConfig content and BackupChain operation states to the file in the service working directory (see Configuration).
When the chain starts the work and the initial BackupJob starts, the job will create a result data file using the destination directory path from BackupConfig. The destination directory will contain a directory with an automatically created name using the pattern repository_workspace-timestamp where timestamp is current time in the format of yyyyMMdd_hhmmss (E.g. db1_ws1-20080306_055404). The directory will contain the results of all Jobs configured for execution. Each Job stores the backup result in its own file with the name repository_workspace-timestamp.jobNumber. BackupChain saves each state (STARTING, WAITING, WORKING, FINISHED) of its Jobs in the BackupChainLog, which has a current result full file path.
BackupChain log file and job result files are a whole and consistent unit, that is a source for a Restore.
BackupChain log contains absolute paths to job result files. Don't move these files to another location.
Restore requirements
As mentioned before a Restore operation is a mirror of a Backup. The process is a Full restore of a root node with restoring an additional Incremental backup to reach a desired workspace state. Restoring of the workspace Full backup will create a new workspace in the repository using given RepositoyEntry of existing repository and given (preconfigured) WorkspaceEntry for a new target workspace. A Restore process will restore a root node from the SysView XML data.
The target workspace should not be in the repository. Otherwise, a BackupConfigurationException exception will be thrown.
Finally, we may say that Restore is a process of a new Workspace creation and filling it with a Backup content. In case you already have a target Workspace (with the same name) in a Repository, you have to configure a new name for it. If no target workspace exists in the Repositor, you may use the same name as the Backup one.
As an optional extension, the Backup service is not enabled by default. You need to enable it via configuration.
The following is an example configuration compatible with JCR 1.9.3 and later :
<component> <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key> <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type> <init-params> <properties-param> <name>backup-properties</name> <property name="default-incremental-job-period" value="3600" /> <!-- set default incremental period = 60 minutes --> <property name="full-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob" /> <property name="incremental-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob" /> <property name="backup-dir" value="target/backup" /> </properties-param> </init-params> </component>
Where:
incremental-backup-type (since 1.9.3) : The FQN of incremental job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
full-backup-type (since 1.9.3) : The FQN of the full backup job class; Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
default-incremental-job-period (since 1.9.3) : The period between incremetal flushes (in seconds)
backup-dir : The path to a working directory where the service will store internal files and chain logs.
In the following example, we create a BackupConfig bean for the Full + Incrementals mode, then we ask the BackupManager to start the backup process.
// Obtaining the backup service from the eXo container. BackupManager backup = (BackupManager) container.getComponentInstanceOfType(BackupManager.class); // And prepare the BackupConfig instance with custom parameters. // full backup & incremental File backDir = new File("/backup/ws1"); // the destination path for result files backDir.mkdirs(); BackupConfig config = new BackupConfig(); config.setRepository(repository.getName()); config.setWorkspace("ws1"); config.setBackupDir(backDir); // Before 1.9.3, you also need to indicate the backupjobs class FDNs // config.setFullBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob"); // config.setIncrementalBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob"); // start backup using the service manager BackupChain chain = backup.startBackup(config);
To stop the backup operation, you have to use the BackupChain instance.
// stop backup backup.stopBackup(chain);
Restoration involves reloading the backup file into a BackupChainLog and applying appropriate workspace initialization. The following snippet shows the typical sequence for restoring a workspace :
// find BackupChain using the repository and workspace names (return null if not found) BackupChain chain = backup.findBackup("db1", "ws1"); // Get the RepositoryEntry and WorkspaceEntry ManageableRepository repo = repositoryService.getRepository(repository); RepositoryEntry repoconf = repo.getConfiguration(); List<WorkspaceEntry> entries = repoconf.getWorkspaceEntries(); WorkspaceEntry = getNewEntry(entries, workspace); // create a copy entry from an existing one // restore backup log using ready RepositoryEntry and WorkspaceEntry File backLog = new File(chain.getLogFilePath()); BackupChainLog bchLog = new BackupChainLog(backLog); // initialize the workspace repository.configWorkspace(workspaceEntry); // run restoration backup.restore(bchLog, repositoryEntry, workspaceEntry);
These instructions only applies to regular workspace. Special instructions are provided for System workspace below.
To restore a backup over an existing workspace, you are required to clear its data. Your backup process should follow these steps :
Remove workspace
ManageableRepository repo = repositoryService.getRepository(repository); repo.removeWorkspace(workspace);
Clean database, value storage, index
Restore (see snippet above)
The BackupWorkspaceInitializer is available in JCR 1.9 and later.
Restoring the JCR System workspace requires to shutdown the system and use of a special initializer.
Follow these steps (this will also work for normal workspaces) :
Stop repository (or portal)
Clean database, value storage, index;
In configuration, the workspace set BackupWorkspaceInitializer to refer to your backup.
For example :
<workspaces> <workspace name="production" ... > <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> ... </container> <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer"> <properties> <property name="restore-path" value="D:\java\exo-working\backup\repository_production-20090527_030434"/> </properties> </initializer> ... </workspace>
Start repository (or portal).
The Backup service has an additional feature that can be useful for a production level backup implementation. When you need to organize a backup of a repository, it's necessary to have a tool which will be able to create and manage a cycle of Full and Incremental backups in periodic manner.
The service has internal BackupScheduler which can run a configurable cycle of BackupChains as if they have been executed by a user during some period of time. I.e. BackupScheduler is a user-like daemon which asks the BackupManager to start or stop backup operations.
For that purpose, BackupScheduler has the method.
BackupScheduler.schedule(backupConfig, startDate, stopDate, chainPeriod, incrementalPeriod)
where
backupConfig: A ready configuration which will be given to the BackupManager.startBackup() method
startDate: The date and time of the backup start
stopDate: The date and time of the backup stop
chainPeriod: A period after which a current BackupChain will be stopped and a new one will be started in seconds
incrementalPeriod: If it is greater than 0, it will be used to override the same value in backupConfig.
// geting the scheduler from the BackupManager BackupScheduler scheduler = backup.getScheduler(); // schedule backup using a ready configuration (Full + Incrementals) to run from startTime // to stopTime. Full backuop will be performed every 24 hours (BackupChain lifecycle), // incremental will rotate result files every 3 hours. scheduler.schedule(config, startTime, stopTime, 3600 * 24, 3600 * 3); // it's possible to run the scheduler for an uncertain period of time (i.e. without stop time). // schedule backup to run from startTime till it will be stopped manually // also there, the incremental will rotate result files as it configured in BackupConfig scheduler.schedule(config, startTime, null, 3600 * 24, 0); // to unschedule backup simply call the scheduler with the configuration describing the // already planned backup cycle. // the scheduler will search in internal tasks list for task with repository and // workspace name from the configuration and will stop that task. scheduler.unschedule(config);
When the BackupScheduler starts the scheduling, it uses the internal Timer with startDate for the first (or just once) execution. If chainPeriod is greater than 0, then the task is repeated with this value used as a period starting from startDate. Otherwise, the task will be executed once at startDate time. If the scheduler has stopDate, it will stop the task ( the chain cycle) after stopDate. And the last parameter incrementalPeriod will be used instead of the same from BackupConfig if its values are greater than 0.
Starting each task (BackupScheduler.schedule(...)), the scheduler creates a task file in the service working directory (see Configuration, backup-dir) which describes the task backup configuration and periodic values. These files will be used at the backup service start (JVM start) to reinitialize BackupScheduler for continuous task scheduling. Only tasks that don't have a stopDate or a stopDate not expired will be reinitialized.
There is one notice about BackupScheduler task reinitialization in the current implementation. It comes from the BackupScheduler nature and its implemented behaviour. As the scheduler is just a virtual user which asks the BackupManager to start or stop backup operations, it isn't able to reinitialize each existing BackupChain before the service (JVM) is stopped. But it's possible to start a new operation with the same configuration via BackupManager (that was configured before and stored in a task file).
This is a main detail of the BackupScheduler which should be taken into suggestion of a backup operation design now. In case of reinitialization, the task will have new time values for the backup operation cycle as the chainPeriod and incrementalPeriod will be applied again. That behaviour may be changed in the future.