JBoss.org Community Documentation
eXo provides a JCR implementation called eXo JCR.
This part shows you how to configure and use eXo JCR, both in GateIn and in standalone mode.
The Java Content Repository API, like other Java language standards, was created within the Java Community Process (http://jcp.org/) through the collaboration of an expert group and the Java community. It is known as JSR-170 (Java Specification Request 170).
The main purpose of a content repository is to maintain data. The heart of a content repository (CR) is its data model:
The main data storage abstraction of JCR's data model is a workspace
Each repository should have one or more workspaces
The content is stored in a workspace as a hierarchy of items
Each workspace has its own hierarchy of items
Node: supports the data hierarchy. Nodes are typed using namespaced names, which allows the content to be structured according to standardized constraints. A node may be versioned through an associated version graph (an optional feature).
Property: stores data as values of predefined types (String, Binary, Long, Boolean, Double, Date, Name, Reference, Path).
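The hierarchy described above can be pictured with a small, self-contained sketch. It is a toy model only: `ToyWorkspace` and `ToyNode` are hypothetical stand-ins written for this illustration, not the javax.jcr or eXo JCR classes, and typing, versioning and namespaces are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a JCR node: named child nodes plus named property values. */
final class ToyNode {
    final String name;
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    final Map<String, Object> properties = new LinkedHashMap<>();

    ToyNode(String name) { this.name = name; }

    /** Adds a child node and returns it, so calls can be chained down the tree. */
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName);
        children.put(childName, child);
        return child;
    }

    /** Stores a value under a property name and returns this node. */
    ToyNode setProperty(String propertyName, Object value) {
        properties.put(propertyName, value);
        return this;
    }
}

/** Toy stand-in for a workspace: a single root node and lookup by absolute path. */
final class ToyWorkspace {
    final ToyNode root = new ToyNode("");

    /** Resolves a path such as "/docs/report"; returns null if any segment is missing. */
    ToyNode getNode(String absPath) {
        ToyNode current = root;
        for (String segment : absPath.split("/")) {
            if (segment.isEmpty()) continue; // skips the empty segment before the leading slash
            current = current.children.get(segment);
            if (current == null) return null;
        }
        return current;
    }
}
```

In this model the data always lives in properties, while nodes only provide the structure, which mirrors the split between the two item kinds in the list above.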
It is important to note that the data model for the interface (the repository model) is rarely the same as the data models used by the repository's underlying storage subsystems. The repository knows how to make the client's changes persistent because that is part of the repository configuration, rather than part of the application programming task.
JCR (Java Content Repository) is a Java API for accessing content, which includes not only web content but any hierarchically stored data. The content is stored in a repository, which can be backed by a file system, a relational database or an XML document. The internal structure of JCR data resembles an XML document, that is, a tree of nodes and data, with one small difference: in JCR, the data is stored in "property items".
Or, to quote the JCR specification: "A content repository is a high-level information management system that is a superset of traditional data repositories."
Do you know how the data of your website is stored? The images are probably in a file system and the metadata in some dedicated files, maybe in XML. The text documents and PDFs are stored in different folders, with their metadata in yet another place (a database?) and in a proprietary structure. How do you update this data, and how do you manage the access rights? What if your boss asks you to manage different versions of each document? The larger your website is, the more you need a Content Management System (CMS) that tackles all these issues.
These CMS solutions are sold by different vendors, and each vendor provides its own API for interfacing with its proprietary content repository. Developers have to learn the vendor-specific API. If you later wish to switch to a different vendor, everything will be different: a new implementation, a new interface, and so on.
JCR provides a single Java interface for interacting with both text and binary data and for dealing with any kind and amount of metadata your documents might have. JCR supplies methods for storing, updating, deleting and retrieving your data, regardless of whether the data is stored in an RDBMS, in a file system or as an XML document; you simply do not need to care. The JCR interface also defines classes and methods for searching, versioning, access control, locking, and observation.
Furthermore, an export and import functionality is specified so that a switch to a different vendor is always possible.
eXo JCR fully complies with the JCR standard (JSR-170), so with eXo JCR you use a vendor-independent API. This means that you can switch to a different vendor at any time. Using the standard lowers your lifecycle cost and reduces your long-term risk.
eXo offers not only JCR, but also complete solutions for ECM (Enterprise Content Management) and WCM (Web Content Management).
To better understand the theory of JCR and the API, please refer to these external documents about the standard:
Roy T. Fielding, JSR 170 Overview: Standardizing the Content Repository Interface (March 13, 2005)
Benjamin Mestrallet, Tuan Nguyen, Gennady Azarenkov, Francois Moron and Brice Revenant, eXo Platform v2, Portal, JCR, ECM, Groupware and Business Intelligence (January 2006)
Access Control Configuration, Export Import Implementation, External Value Storages, JDBC Data Container config, Locking, Multilanguage support, Node types and Namespaces, Repository and Workspace management, Repository container life cycle, Workspace, Persistence Storage Workspace, SimpleDB storage
eXo Repository Service is a standard eXo service and a registered IoC component, that is, it can be deployed in any eXo Container (see Service configuration for details). The relationships between components are shown in the picture below:
eXo Container: a subclass of org.exoplatform.container.ExoContainer (usually org.exoplatform.container.StandaloneContainer or org.exoplatform.container.PortalContainer) that holds a reference to the Repository Service.
Repository Service: contains information about repositories. eXo JCR is able to manage many Repositories.
Repository: Implementation of javax.jcr.Repository. It holds references to one or more Workspace(s).
Workspace: Container of a single rooted tree of Items. (Note that here it is not exactly the same as javax.jcr.Workspace as it is not a per Session object).
A typical JCR application use case starts with two steps:
Obtaining a Repository object, either by getting the Repository Service from the current eXo Container (the eXo "native" way) or via a JNDI lookup if the eXo repository is bound to the naming context (see Service configuration for details).
Creating a javax.jcr.Session object by calling Repository.login(..).
The following diagram explains which components of eXo JCR implementation are used in a data flow to perform operations specified in JCR API
The Workspace Data Model can be split into 4 levels by data isolation and value from the JCR model point of view.
eXo JCR core implements JCR API interfaces, such as Item, Node, Property. It contains JCR "logical" view on stored data.
Session Level: isolates transient data viewable inside one JCR Session and interacts with API level using eXo JCR internal API.
Session Data Manager: maintains transient session data. Along with data access, modification and validation logic, it contains the Modified Items Storage, which holds the data changed between subsequent save() calls, and the Session Items Cache.
Transaction Data Manager: maintains session data between save() and transaction commit/rollback if the current session is part of a transaction.
Workspace Level: operates on the shared data of a particular workspace. It contains per-workspace objects:
Workspace Storage Data Manager: maintains workspace data, including final validation, event firing and caching.
Workspace Data Container: implements physical data storage. It allows different types of backend (such as RDB, FS files, etc.) to be used as storage for JCR data. Alongside the main Data Container, other storages for persisted Property Values can be configured and used.
Indexer: maintains workspace data indexing for further queries.
Storage Level: Persistent storages for:
JCR Data
Indexes (Apache Lucene)
Values (e.g., for BLOBs) if different from the main Data Container
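The separation between the Session Level and the Workspace Level can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the eXo JCR internals: `ToyWorkspaceStore` and `ToySession` are hypothetical names, values are plain strings keyed by path, and validation, caching, transactions and indexing are ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy shared storage standing in for the workspace level. */
final class ToyWorkspaceStore {
    private final Map<String, String> persisted = new HashMap<>();

    synchronized String read(String path) { return persisted.get(path); }

    synchronized void apply(Map<String, String> changes) { persisted.putAll(changes); }
}

/** Toy session: changes stay private to the session until save() is called. */
final class ToySession {
    private final ToyWorkspaceStore store;
    private final Map<String, String> pending = new HashMap<>(); // "modified items storage"

    ToySession(ToyWorkspaceStore store) { this.store = store; }

    /** Records a transient change; other sessions cannot see it yet. */
    void setProperty(String path, String value) { pending.put(path, value); }

    /** A session sees its own pending changes first, then the shared workspace state. */
    String getProperty(String path) {
        return pending.containsKey(path) ? pending.get(path) : store.read(path);
    }

    /** Pushes the pending changes down to the shared workspace storage. */
    void save() {
        store.apply(pending);
        pending.clear();
    }

    /** Drops the pending changes without persisting them. */
    void discardChanges() { pending.clear(); }
}
```

Two `ToySession` objects created on the same store behave like two JCR sessions on one workspace: a value set in the first becomes visible to the second only after the first calls `save()`.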
The data repository and the application are isolated from each other, so an application developer does not need to learn the details of a particular data storage's interfaces and can concentrate on the business logic of the application built on top of JCR.
Repositories can simply be exchanged between different applications without changing the applications themselves. This is a matter of repository configuration.
Data storage types and versions can be changed, and different types of data storage can be combined in one repository data model. Of course, the complexity and work of building interfaces between the repository and its data storage do not disappear, but these changes are isolated in the repository and are thus manageable from the customer's point of view.
Using a standardized repository for content management reduces the risk of dependence on a particular software vendor and proprietary API.
The cost of developing and maintaining a custom application based on a content repository is significantly lower than that of developing and supporting your own interfaces and maintaining your own data repository applications (staff can be trained once, and help is available from the community and third-party consultants).
Thanks to the flexible layered JCR API (see below), it is possible to fit a legacy storage subsystem into the new interfaces and to decrease the costs and the risk of losing data.
An extension to the API exists as we can see in the following layer schema.
The Java Content Repository specification JSR-170 has been split into two compliance levels as well as a set of optional features.
Level 1 defines a read-only repository.
Level 2 defines methods for writing content and bidirectional interaction with the repository.
eXo JCR supports JSR-170 level 1 and level 2 and all optional features. The recent JSR-283 is not yet supported.
Level 1 includes read-only functionality for very simple repositories. It is useful to port an existing data repository and convert it to a more advanced form step by step. JCR uses a well-known Session abstraction to access the repository data (similar to the sessions we have in OS, web, etc).
The features of level 1:
Initiating a session by calling the login method with the name of the desired workspace and the client credentials. This involves a security mechanism (JAAS) to authenticate the client; if the client is authorized to use the data of that workspace, it receives a session tied to the workspace.
Using the obtained session, the client can retrieve data (items) by traversing the tree, by directly accessing a particular item (by path or UUID) or by traversing a query result. An application developer can thus choose the "best" form depending on the content structure and the desired operation.
Reading property values. All content of a repository is ultimately accessed through properties and stored in property values of predefined types (Boolean, Binary, Date, Double, Long, String) and the special types Name, Reference and Path. A property value can also be read without knowing the property's name if the property is the primary item of its node.
Export to XML. The repository supports two XML/JCR data model mappings: system view and document view. The system view provides complete XML serialization without loss of information but is somewhat difficult for a human to read. In contrast, the document view is easy to read but does not completely reflect the state of the repository; it is used for XPath queries.
Query facility with XPath syntax. XPath, originally developed for XML, suits the JCR data model as well because the JCR data model is very close to that of XML. It is applied to JCR as it would be applied to the document view of the serialized repository content, returning a table of property names and content matching the query.
Discovery of available node types. Every node has exactly one primary node type, which defines the names, types and other characteristics of its child nodes and properties. It can also have one or more mixin node types that define additional characteristics. Level 1 provides methods for discovering the node types available in the repository and the node types of a particular node.
Transient namespace remapping. An item name can have a prefix, delimited by a single ':' (colon) character, that indicates the namespace of the name. This is patterned after XML namespaces: the prefix is mapped to a URI to minimize name collisions. In Level 1, a prefix can be temporarily overridden by another prefix within the scope of a session.
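Transient namespace remapping, the last feature in the list above, is easier to see in a small model. The sketch below is an illustration only: `ToyNamespaceRegistry` and `ToySessionNamespaces` are hypothetical classes, the prefix and URI used with them are made up, and a real repository does this through the JCR session and namespace registry interfaces rather than through these types.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy persistent registry shared by all sessions: prefix -> namespace URI. */
final class ToyNamespaceRegistry {
    final Map<String, String> uriByPrefix = new HashMap<>();

    void register(String prefix, String uri) { uriByPrefix.put(prefix, uri); }
}

/** Toy per-session view: a prefix override lives only as long as this object. */
final class ToySessionNamespaces {
    private final ToyNamespaceRegistry registry;
    private final Map<String, String> localPrefixByUri = new HashMap<>();

    ToySessionNamespaces(ToyNamespaceRegistry registry) { this.registry = registry; }

    /** Overrides the prefix of a URI for this session only; the registry is untouched. */
    void setNamespacePrefix(String newPrefix, String uri) { localPrefixByUri.put(uri, newPrefix); }

    /** Rewrites "prefix:local" using the session-local prefix, if one was set. */
    String toSessionName(String registeredName) {
        int colon = registeredName.indexOf(':');
        if (colon < 0) return registeredName; // no prefix, nothing to remap
        String registeredPrefix = registeredName.substring(0, colon);
        String uri = registry.uriByPrefix.get(registeredPrefix);
        String prefix = localPrefixByUri.getOrDefault(uri, registeredPrefix);
        return prefix + registeredName.substring(colon);
    }
}
```

Because the override is stored in the session-side object, a second `ToySessionNamespaces` on the same registry keeps seeing the registered prefix, which is what distinguishes this from the persistent namespace changes of Level 2.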
JCR level 2 includes reading/ writing content functionality, importing other sources and managing content definition and structuring using extensible node types.
In addition to the features of the Level 1, it also supports the following major features:
Adding, moving, copying and removing items inside workspace and moving, copying and cloning items between workspaces. The client can also compare the persisted state of an item with its unsaved states and either save the new state or discard it.
Modifying and writing value of properties. Property types are checked and can be converted to the defined format.
Importing XML document into the repository as a tree of nodes and properties. If the XML document is an export of JCR system view, the content of repository can be completely restored. If this is not the case, the document is interpreted as a document view and the import procedure builds a tree of JCR nodes and properties that matches the tree structure of the XML document.
Assigning node types to nodes. The primary node type is assigned when adding a node. This can be done automatically based on the parent node type definition and mixin node types.
Persistent namespace changes. Adding, changing and removing namespaces stored in the namespace registry, excluding the built-in namespaces required by JCR.
On top of Level 1 or Level 2, a number of optional features are defined for more advanced repository functionality. These include Versioning, (JTA) Transactions, Query using SQL, Explicit Locking and Content Observation. eXo JCR supports all optional features.
A javax.jcr.Repository object can be obtained by:
Using the eXo Container "native" mechanism. All Repositories are kept by a single RepositoryService component, which can be obtained from the eXo Container as follows:

    RepositoryService repositoryService =
        (RepositoryService) container.getComponentInstanceOfType(RepositoryService.class);
    Repository repository = repositoryService.getRepository("repositoryName");

Using the eXo Container "native" mechanism with a "current" repository saved in a thread local (especially if you plan to use a single repository, which covers more than 90% of use cases):

    // set current repository at initial time
    RepositoryService repositoryService =
        (RepositoryService) container.getComponentInstanceOfType(RepositoryService.class);
    repositoryService.setCurrentRepositoryName("repositoryName");
    ....
    // retrieve and use this repository
    Repository repository = repositoryService.getCurrentRepository();

Using JNDI as specified in JSR-170. In this case you have to configure the reference (see eXo JNDI Naming configuration):

    Context ctx = new InitialContext();
    Repository repository = (Repository) ctx.lookup("repositoryName");
Remember that javax.jcr.Session is not a thread safe object. Never try to share it between threads.
Do not use System session from the user related code because a system session has unlimited rights. Call ManageableRepository.getSystemSession() from process related code only.
Call Session.logout() explicitly to release resources assigned to the session.
When designing your application, take care of the Session policy inside your application. Two strategies are possible: Stateless (Session per business request) and Stateful (Session per User) or some mix.
(one-shot logout for all opened sessions)
Use org.exoplatform.services.jcr.ext.common.SessionProvider which is responsible for caching/obtaining your JCR Sessions and closing all opened sessions at once.
    public class SessionProvider implements SessionLifecycleListener {

      /**
       * Creates a SessionProvider for a certain identity
       * @param cred
       */
      public SessionProvider(Credentials cred)

      /**
       * Gets the session from internal cache or creates and caches a new one
       */
      public Session getSession(String workspaceName, ManageableRepository repository)
        throws LoginException, NoSuchWorkspaceException, RepositoryException

      /**
       * Calls a logout() method for all cached sessions
       */
      public void close()

      /**
       * a Helper for creating a System session provider
       * @return System session provider
       */
      public static SessionProvider createSystemProvider()

      /**
       * a Helper for creating an Anonymous session provider
       * @return Anonymous session provider
       */
      public static SessionProvider createAnonimProvider()

      /**
       * Helper for creating session provider from AccessControlEntry.
       *
       * @return session provider for the given access list
       */
      SessionProvider createProvider(List<AccessControlEntry> accessList)

      /**
       * Remove the session from the cache
       */
      void onCloseSession(ExtendedSession session)

      /**
       * Gets the current repository used
       */
      ManageableRepository getCurrentRepository()

      /**
       * Gets the current workspace used
       */
      String getCurrentWorkspace()

      /**
       * Set the current repository to use
       */
      void setCurrentRepository(ManageableRepository currentRepository)

      /**
       * Set the current workspace to use
       */
      void setCurrentWorkspace(String currentWorkspace)
    }
The SessionProvider is a per-request or per-user object, depending on your policy. Create it in your application before performing JCR operations, use it to obtain Sessions, and close it at the end of the application session (or request). See the following example:

    // (1) obtain current javax.jcr.Credentials, for example get it from AuthenticationService
    Credentials cred = ....

    // (2) create SessionProvider for current user
    SessionProvider sessionProvider = new SessionProvider(ConversationState.getCurrent());

    // NOTE: for creating an Anonymous or System Session use the corresponding
    // static SessionProvider.create...() method

    // Get appropriate Repository as described in "Obtaining Repository object" section, for example
    ManageableRepository repository = (ManageableRepository) ctx.lookup("repositoryName");

    // get an appropriate workspace's session
    Session session = sessionProvider.getSession("workspaceName", repository);

    .........
    // your JCR code
    .........

    // Close the session provider
    sessionProvider.close();
As shown above, creating the SessionProvider involves multiple steps and you may not want to repeat them each time you need to get a JCR session. In order to avoid all this plumbing code, we provide the SessionProviderService whose goal is to help you to get a SessionProvider object.
The org.exoplatform.services.jcr.ext.app.SessionProviderService interface is defined as follows:

    public interface SessionProviderService {
      void setSessionProvider(Object key, SessionProvider sessionProvider);
      SessionProvider getSessionProvider(Object key);
      void removeSessionProvider(Object key);
    }

Using this service is straightforward: the main contract of an implementation is to return a SessionProvider by key. eXo provides two implementations:
Table 1.1. SessionProvider implementations
Implementation | Description | Typical Use |
---|---|---|
org.exoplatform.services.jcr.ext.app.MapStoredSessionProviderService | per-user style : keeps objects in a Map | per-user. The usual practice uses a user's name or Credentials as a key. |
org.exoplatform.services.jcr.ext.app.ThreadLocalSessionProviderService | per-request style : keeps a single SessionProvider in a static ThreadLocal variable | Always use null for the key. |
For either implementation, your code should follow this sequence:
Call SessionProviderService.setSessionProvider(Object key, SessionProvider sessionProvider) at the beginning of a business request (Stateless policy) or of the application's session (Stateful policy).
Call SessionProviderService.getSessionProvider(Object key) to obtain a SessionProvider object.
Call SessionProviderService.removeSessionProvider(Object key) at the end of a business request (Stateless policy) or of the application's session (Stateful policy).
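The set / get / remove sequence above can be sketched for the per-request (Stateless) style. The code below is a stand-alone model, not the eXo classes: `ToySessionProvider`, `ToyThreadLocalProviderService` and `ToyRequestHandler` are hypothetical stand-ins, and the choice to close the provider inside `removeSessionProvider` is an assumption of this sketch rather than a statement about the real service.

```java
import java.util.function.Function;

/** Stand-in for SessionProvider: it only tracks whether close() was called. */
final class ToySessionProvider {
    private boolean closed;

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

/** Per-request style: one provider per thread; the key argument is ignored. */
final class ToyThreadLocalProviderService {
    private static final ThreadLocal<ToySessionProvider> CURRENT = new ThreadLocal<>();

    void setSessionProvider(Object key, ToySessionProvider provider) { CURRENT.set(provider); }

    ToySessionProvider getSessionProvider(Object key) { return CURRENT.get(); }

    /** Closes the current provider (assumption of this sketch) and clears the thread local. */
    void removeSessionProvider(Object key) {
        ToySessionProvider provider = CURRENT.get();
        if (provider != null) provider.close();
        CURRENT.remove();
    }
}

/** Wraps one business request in the set / get / remove sequence. */
final class ToyRequestHandler {
    static <T> T handle(ToyThreadLocalProviderService service,
                        Function<ToySessionProvider, T> work) {
        service.setSessionProvider(null, new ToySessionProvider()); // start of the request
        try {
            return work.apply(service.getSessionProvider(null));    // business logic
        } finally {
            service.removeSessionProvider(null);                    // end of the request, even on error
        }
    }
}
```

Putting the remove step in a `finally` block is the point of the sketch: a provider left in a thread local would otherwise leak into the next request served by the same pooled thread.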
eXo JCR supports observation (JSR-170 8.3), which enables applications to register interest in events that describe changes to a workspace, and then monitor and respond to those events. The standard observation feature allows dispatching events when persistent change to the workspace is made.
eXo JCR also offers a proprietary Extension Action mechanism, which dispatches and fires an event upon each transient, session-level change performed by a client. In other words, the event is triggered when a client program invokes an updating method in a session or a workspace (such as Node.addNode(), Node.setProperty(), Workspace.move(), etc.).
By default, when an action fails, the related exception is simply logged. If you would like to change the default exception handling, implement the AdvancedAction interface. When JCR detects that your action is of type AdvancedAction, it calls the onError method instead of simply logging the exception. A default implementation of onError is available in the abstract class AbstractAdvancedAction. It reverts all pending changes of the current JCR session for any kind of event corresponding to a write operation. Then, if the provided exception is an instance of AdvancedActionException, it rethrows it; otherwise it simply logs it. An AdvancedActionException is also thrown if the changes could not be reverted.
AdvancedAction interface must be implemented with a lot of caution to avoid being a performance killer.
One important recommendation applies to any extension action implementation. Each action adds its own execution time to that of the standard JCR methods (Node.addNode(), Node.setProperty(), Workspace.move(), etc.). As a consequence, it is necessary to minimize the execution time of the Action.execute(Context) body.
To follow this rule, you can run custom logic in a dedicated Thread started from the Action.execute(Context) body. However, if your application logic requires the action to add items to a created or updated item and you save these changes immediately after the JCR API method call returns, the dedicated Thread approach is not applicable.
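One way to keep the action body short is to hand the slow part to a worker thread. The sketch below shows the idea with plain java.util.concurrent classes; `ToyAction` is a hypothetical stand-in for the action contract (the real interface receives a Context, which is replaced here by a path string), and `ToyAsyncAuditAction` is an invented example class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for the action contract: called once for each matching event. */
interface ToyAction {
    boolean execute(String eventPath);
}

/** Action whose execute() only enqueues work, so the calling JCR method is not slowed down. */
final class ToyAsyncAuditAction implements ToyAction {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicInteger processed = new AtomicInteger();

    @Override
    public boolean execute(String eventPath) {
        worker.submit(() -> slowCustomLogic(eventPath)); // returns immediately
        return true;
    }

    /** Placeholder for expensive logic such as auditing or notifications. */
    private void slowCustomLogic(String eventPath) {
        processed.incrementAndGet();
    }

    /** Stops accepting work and waits for queued tasks; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The worker runs after `execute()` has returned, so it must not rely on transient state of the calling session. This is why the approach does not fit actions that have to modify the item being saved.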
Add a SessionActionCatalog service and an appropriate AddActionsPlugin configuration (see the example below) to your eXo Container configuration. As usual, the plugin can be configured in-component-place, which is the case for a Standalone Container, or externally, which is the usual case for a Root/Portal Container configuration.
Each Action entry is exposed as an org.exoplatform.services.jcr.impl.ext.action.ActionConfiguration in the actions collection of org.exoplatform.services.jcr.impl.ext.action.AddActionsPlugin$ActionsConfig (see the example below). The mandatory field actionClassName is the fully qualified name of an org.exoplatform.services.command.action.Action implementation; this command is launched when the current event matches the criteria. All other fields are criteria. The criteria are ANDed together. In other words, for a particular item to be listened to, it must meet ALL the criteria:
* workspace: a comma-delimited (ORed) list of workspaces.
* eventTypes: a comma-delimited (ORed) list of event names (see below) to be listened to. This is the only mandatory criterion; the others are optional and, if missing, are interpreted as ANY.
* path: a comma-delimited (ORed) list of absolute item paths (an item matches if it is at one of these paths or, when isDeep is true, which is the default, within its subtree).
* nodeTypes: a comma-delimited (ORed) list of node types. Since version 1.6.1, JCR supports the functionalities of nodeType and parentNodeType. This parameter has different semantics depending on the type of the current item and the operation performed. If the current item is a property, it means the parent node type. If the current item is a node, the semantics depend on the event type:
** add node event: the node type of the newly added node.
** add mixin event: the newly added mixin node type of the current node.
** remove mixin event: the removed mixin type of the current node.
** other events: the already assigned NodeType(s) of the current node (can be both primary and mixin).
The list of fields can be extended.
No spaces between list elements.
isDeep=false means node, node properties and child nodes.
The list of supported Event names: addNode, addProperty, changeProperty, removeProperty, removeNode, addMixin, removeMixin, lock, unlock, checkin, checkout, read.
    <component>
      <type>org.exoplatform.services.jcr.impl.ext.action.SessionActionCatalog</type>
      <component-plugins>
        <component-plugin>
          <name>addActions</name>
          <set-method>addPlugin</set-method>
          <type>org.exoplatform.services.jcr.impl.ext.action.AddActionsPlugin</type>
          <description>add actions plugin</description>
          <init-params>
            <object-param>
              <name>actions</name>
              <object type="org.exoplatform.services.jcr.impl.ext.action.AddActionsPlugin$ActionsConfig">
                <field name="actions">
                  <collection type="java.util.ArrayList">
                    <value>
                      <object type="org.exoplatform.services.jcr.impl.ext.action.ActionConfiguration">
                        <field name="eventTypes"><string>addNode,removeNode</string></field>
                        <field name="path"><string>/test,/exo:test</string></field>
                        <field name="isDeep"><boolean>true</boolean></field>
                        <field name="nodeTypes"><string>nt:file,nt:folder,mix:lockable</string></field>
                        <!-- field name="workspace"><string>backup</string></field -->
                        <field name="actionClassName"><string>org.exoplatform.services.jcr.ext.DummyAction</string></field>
                      </object>
                    </value>
                  </collection>
                </field>
              </object>
            </object-param>
          </init-params>
        </component-plugin>
      </component-plugins>
    </component>
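How the criteria combine (ORed inside each comma-delimited list, ANDed across fields, a missing field meaning ANY) can be modelled in a few lines. The class below is a hypothetical illustration, not the SessionActionCatalog matching code; node type matching is left out, and the non-deep case follows one reading of the isDeep note above (the item itself or its direct children).

```java
import java.util.Arrays;
import java.util.List;

/** Toy matcher for action criteria: lists are ORed, fields are ANDed, null means ANY. */
final class ToyActionCriteria {
    private final List<String> eventTypes; // mandatory
    private final List<String> paths;      // null = ANY
    private final List<String> workspaces; // null = ANY
    private final boolean deep;

    ToyActionCriteria(String eventTypes, String paths, String workspaces, boolean deep) {
        if (eventTypes == null) throw new IllegalArgumentException("eventTypes is mandatory");
        this.eventTypes = split(eventTypes);
        this.paths = split(paths);
        this.workspaces = split(workspaces);
        this.deep = deep;
    }

    /** Splits a comma-delimited list; there must be no spaces between elements. */
    private static List<String> split(String commaList) {
        return commaList == null ? null : Arrays.asList(commaList.split(","));
    }

    /** True only if every configured criterion accepts the event. */
    boolean matches(String eventType, String itemPath, String workspace) {
        return eventTypes.contains(eventType)
            && (workspaces == null || workspaces.contains(workspace))
            && (paths == null || paths.stream().anyMatch(p -> pathMatches(p, itemPath)));
    }

    private boolean pathMatches(String configured, String itemPath) {
        if (itemPath.equals(configured)) return true;
        if (!itemPath.startsWith(configured + "/")) return false;
        // deep: anywhere in the subtree; otherwise only direct children of the configured path
        return deep || itemPath.indexOf('/', configured.length() + 1) < 0;
    }
}
```

With the values from the configuration above ("addNode,removeNode", "/test,/exo:test", isDeep true, no workspace), an addNode event at /test/a/b matches, while an addProperty event at the same path or an addNode event at /testing does not.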
The following is a picture about the interaction between Applications and JCR:
Every Content (JCR) dependent application interacts with eXo JCR via JSR-170 and eXo JCR API extension (mostly for administration) directly or using some intermediate Framework (Neither Application nor Framework should ever rely on Implementation directly!)
Content Application: all applications may use JCR as a data storage. Some of them are generic and completely decoupled from JCR API as interaction protocol hides Content storage nature (like WebDav client), some partially decoupled (like Command framework based), meaning that they do not use JCR API directly, and some (most part) use JSR-170 directly.
A Framework is a special kind of JCR client that acts as an intermediate level between the Content Repository and the End Client Application. There are Protocol-specific (WebDav, RMI or FTP servers, for example) and Pattern-specific (Command, Web (servlet), J2EE connector) Frameworks. It is possible to build a multi-layered (in the framework sense) JCR application; for example, a Web application uses a Web framework that uses the Command framework underneath.
The eXo JCR implementation supports two ways of registering node types:
From a NodeTypeValue POJO
From an XML document (stream)
The ExtendedNodeTypeManager (from JCR 1.11) interface provides the following methods related to registering node types:

    public static final int IGNORE_IF_EXISTS = 0;

    public static final int FAIL_IF_EXISTS = 2;

    public static final int REPLACE_IF_EXISTS = 4;

    /**
     * Return NodeType for a given InternalQName.
     *
     * @param qname nodetype name
     * @return NodeType
     * @throws NoSuchNodeTypeException if no nodetype found with the name
     * @throws RepositoryException Repository error
     */
    NodeType findNodeType(InternalQName qname) throws NoSuchNodeTypeException, RepositoryException;

    /**
     * Registers node type using value object.
     *
     * @param nodeTypeValue
     * @param alreadyExistsBehaviour
     * @throws RepositoryException
     */
    NodeType registerNodeType(NodeTypeValue nodeTypeValue, int alreadyExistsBehaviour)
      throws RepositoryException;

    /**
     * Registers all node types using XML binding value objects from xml stream.
     *
     * @param xml a InputStream
     * @param alreadyExistsBehaviour a int
     * @throws RepositoryException
     */
    NodeTypeIterator registerNodeTypes(InputStream xml, int alreadyExistsBehaviour, String contentType)
      throws RepositoryException;

    /**
     * Gives the {@link NodeTypeManager}
     *
     * @throws RepositoryException if another error occurs.
     */
    NodeTypeDataManager getNodeTypesHolder() throws RepositoryException;

    /**
     * Return <code>NodeTypeValue</code> for a given nodetype name. Used for
     * nodetype update. Value can be edited and registered via
     * <code>registerNodeType(NodeTypeValue nodeTypeValue, int alreadyExistsBehaviour)</code>.
     *
     * @param ntName nodetype name
     * @return NodeTypeValue
     * @throws NoSuchNodeTypeException if no nodetype found with the name
     * @throws RepositoryException Repository error
     */
    NodeTypeValue getNodeTypeValue(String ntName) throws NoSuchNodeTypeException, RepositoryException;

    /**
     * Registers or updates the specified <code>Collection</code> of
     * <code>NodeTypeValue</code> objects. This method is used to register or
     * update a set of node types with mutual dependencies. Returns an iterator
     * over the resulting <code>NodeType</code> objects.
     * <p/>
     * The effect of the method is "all or nothing"; if an error occurs, no node
     * types are registered or updated.
     * <p/>
     * Throws an <code>InvalidNodeTypeDefinitionException</code> if a
     * <code>NodeTypeDefinition</code> within the <code>Collection</code> is
     * invalid or if the <code>Collection</code> contains an object of a type
     * other than <code>NodeTypeDefinition</code>.
     * <p/>
     * Throws a <code>NodeTypeExistsException</code> if <code>allowUpdate</code> is
     * <code>false</code> and a <code>NodeTypeDefinition</code> within the
     * <code>Collection</code> specifies a node type name that is already
     * registered.
     * <p/>
     * Throws an <code>UnsupportedRepositoryOperationException</code> if this
     * implementation does not support node type registration.
     *
     * @param values a collection of <code>NodeTypeValue</code>s
     * @param alreadyExistsBehaviour a int
     * @return the registered node types.
     * @throws InvalidNodeTypeDefinitionException if a
     *           <code>NodeTypeDefinition</code> within the
     *           <code>Collection</code> is invalid or if the
     *           <code>Collection</code> contains an object of a type other than
     *           <code>NodeTypeDefinition</code>.
     * @throws NodeTypeExistsException if <code>allowUpdate</code> is
     *           <code>false</code> and a <code>NodeTypeDefinition</code> within
     *           the <code>Collection</code> specifies a node type name that is
     *           already registered.
     * @throws UnsupportedRepositoryOperationException if this implementation does
     *           not support node type registration.
     * @throws RepositoryException if another error occurs.
     */
    public NodeTypeIterator registerNodeTypes(List<NodeTypeValue> values, int alreadyExistsBehaviour)
      throws UnsupportedRepositoryOperationException, RepositoryException;

    /**
     * Unregisters the specified node type.
     *
     * @param name a <code>String</code>.
     * @throws UnsupportedRepositoryOperationException if this implementation does
     *           not support node type registration.
     * @throws NoSuchNodeTypeException if no registered node type exists with the
     *           specified name.
     * @throws RepositoryException if another error occurs.
     */
    public void unregisterNodeType(String name)
      throws UnsupportedRepositoryOperationException, NoSuchNodeTypeException, RepositoryException;

    /**
     * Unregisters the specified set of node types.
     * <p/>
     * Used to unregister a set of node types with mutual dependencies.
     *
     * @param names a <code>String</code> array
     * @throws UnsupportedRepositoryOperationException if this implementation does
     *           not support node type registration.
     * @throws NoSuchNodeTypeException if one of the names listed is not a
     *           registered node type.
     * @throws RepositoryException if another error occurs.
     */
    public void unregisterNodeTypes(String[] names)
      throws UnsupportedRepositoryOperationException, NoSuchNodeTypeException, RepositoryException;
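The three alreadyExistsBehaviour constants suggest how a registration call treats a node type name that is already registered. The sketch below models one plausible reading of them with a plain map. It is an assumption-laden illustration: `ToyNodeTypeRegistry` is hypothetical, definitions are reduced to strings, and the exceptions chosen here are not the ones the real manager throws.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy registry illustrating the three "already exists" behaviours. */
final class ToyNodeTypeRegistry {
    static final int IGNORE_IF_EXISTS = 0;
    static final int FAIL_IF_EXISTS = 2;
    static final int REPLACE_IF_EXISTS = 4;

    private final Map<String, String> definitions = new HashMap<>();

    /**
     * Registers a definition under a node type name.
     *
     * @return true if the registry was changed, false if the call was ignored
     */
    boolean register(String name, String definition, int alreadyExistsBehaviour) {
        if (definitions.containsKey(name)) {
            switch (alreadyExistsBehaviour) {
                case IGNORE_IF_EXISTS:
                    return false; // keep the existing definition silently
                case FAIL_IF_EXISTS:
                    throw new IllegalStateException("Node type already exists: " + name);
                case REPLACE_IF_EXISTS:
                    break;        // fall through to overwrite below
                default:
                    throw new IllegalArgumentException("Unknown behaviour: " + alreadyExistsBehaviour);
            }
        }
        definitions.put(name, definition);
        return true;
    }

    String get(String name) { return definitions.get(name); }
}
```

In this reading, IGNORE_IF_EXISTS suits start-up scripts that may run repeatedly, FAIL_IF_EXISTS surfaces accidental name clashes, and REPLACE_IF_EXISTS is the update path.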
The NodeTypeValue interface represents a simple container structure used to define node types which are then registered through the ExtendedNodeTypeManager.registerNodeType method. The implementation of this interface does not contain any validation logic.

    /** @return Returns the declaredSupertypeNames. */
    public List<String> getDeclaredSupertypeNames();

    /** @param declaredSupertypeNames The declaredSupertypeNames to set. */
    public void setDeclaredSupertypeNames(List<String> declaredSupertypeNames);

    /** @return Returns the mixin. */
    public boolean isMixin();

    /** @param mixin The mixin to set. */
    public void setMixin(boolean mixin);

    /** @return Returns the name. */
    public String getName();

    /** @param name The name to set. */
    public void setName(String name);

    /** @return Returns the orderableChild. */
    public boolean isOrderableChild();

    /** @param orderableChild The orderableChild to set. */
    public void setOrderableChild(boolean orderableChild);

    /** @return Returns the primaryItemName. */
    public String getPrimaryItemName();

    /** @param primaryItemName The primaryItemName to set. */
    public void setPrimaryItemName(String primaryItemName);

    /** @return Returns the declaredChildNodeDefinitionNames. */
    public List<NodeDefinitionValue> getDeclaredChildNodeDefinitionValues();

    /** @param declaredChildNodeDefinitionNames The declaredChildNodeDefinitionNames to set. */
    public void setDeclaredChildNodeDefinitionValues(List<NodeDefinitionValue> declaredChildNodeDefinitionValues);

    /** @return Returns the declaredPropertyDefinitionNames. */
    public List<PropertyDefinitionValue> getDeclaredPropertyDefinitionValues();

    /** @param declaredPropertyDefinitionNames The declaredPropertyDefinitionNames to set. */
    public void setDeclaredPropertyDefinitionValues(List<PropertyDefinitionValue> declaredPropertyDefinitionValues);
The NodeDefinitionValue interface extends ItemDefinitionValue with setter methods, so that the characteristics of a child node definition can be set before the NodeDefinitionValue is added to a NodeTypeValue.
/**
 * @return Returns the defaultNodeTypeName.
 */
public String getDefaultNodeTypeName();

/**
 * @param defaultNodeTypeName The defaultNodeTypeName to set.
 */
public void setDefaultNodeTypeName(String defaultNodeTypeName);

/**
 * @return Returns the sameNameSiblings.
 */
public boolean isSameNameSiblings();

/**
 * @param sameNameSiblings The sameNameSiblings to set.
 */
public void setSameNameSiblings(boolean sameNameSiblings);

/**
 * @return Returns the requiredNodeTypeNames.
 */
public List<String> getRequiredNodeTypeNames();

/**
 * @param requiredNodeTypeNames The requiredNodeTypeNames to set.
 */
public void setRequiredNodeTypeNames(List<String> requiredNodeTypeNames);
The PropertyDefinitionValue interface extends ItemDefinitionValue with setter methods, so that the characteristics of a child property definition can be set before the PropertyDefinitionValue is added to a NodeTypeValue.
/** * @return Returns the defaultValues. */ public List<String> getDefaultValueStrings(); /** * @param defaultValues The defaultValues to set. */ public void setDefaultValueStrings(List<String> defaultValues); /** * @return Returns the multiple. */ public boolean isMultiple(); /** * @param multiple The multiple to set. */ public void setMultiple(boolean multiple); /** * @return Returns the requiredType. */ public int getRequiredType(); /** * @param requiredType The requiredType to set. */ public void setRequiredType(int requiredType); /** * @return Returns the valueConstraints. */ public List<String> getValueConstraints(); /** * @param valueConstraints The valueConstraints to set. */ public void setValueConstraints(List<String> valueConstraints);
/** * @return Returns the autoCreate. */ public boolean isAutoCreate(); /** * @param autoCreate The autoCreate to set. */ public void setAutoCreate(boolean autoCreate); /** * @return Returns the mandatory. */ public boolean isMandatory(); /** * @param mandatory The mandatory to set. */ public void setMandatory(boolean mandatory); /** * @return Returns the name. */ public String getName(); /** * @param name The name to set. */ public void setName(String name); /** * @return Returns the onVersion. */ public int getOnVersion(); /** * @param onVersion The onVersion to set. */ public void setOnVersion(int onVersion); /** * @return Returns the readOnly. */ public boolean isReadOnly(); /** * @param readOnly The readOnly to set. */ public void setReadOnly(boolean readOnly);
The eXo JCR implementation supports several ways of registering node types.
ExtendedNodeTypeManager nodeTypeManager = (ExtendedNodeTypeManager) session.getWorkspace()
   .getNodeTypeManager();
InputStream is = MyClass.class.getResourceAsStream("mynodetypes.xml");
nodeTypeManager.registerNodeTypes(is, ExtendedNodeTypeManager.IGNORE_IF_EXISTS);
ExtendedNodeTypeManager nodeTypeManager = (ExtendedNodeTypeManager) session.getWorkspace()
   .getNodeTypeManager();
NodeTypeValue testNValue = new NodeTypeValue();

List<String> superType = new ArrayList<String>();
superType.add("nt:base");
testNValue.setName("exo:myNodeType");
testNValue.setPrimaryItemName("");
testNValue.setDeclaredSupertypeNames(superType);

List<PropertyDefinitionValue> props = new ArrayList<PropertyDefinitionValue>();
props.add(new PropertyDefinitionValue("*", false, false, 1, false,
   new ArrayList<String>(), false, 0, new ArrayList<String>()));
testNValue.setDeclaredPropertyDefinitionValues(props);

nodeTypeManager.registerNodeType(testNValue, ExtendedNodeTypeManager.FAIL_IF_EXISTS);
If you want to replace existing node type definition, you should pass ExtendedNodeTypeManager.REPLACE_IF_EXISTS as a second parameter for the method ExtendedNodeTypeManager.registerNodeType.
ExtendedNodeTypeManager nodeTypeManager = (ExtendedNodeTypeManager) session.getWorkspace()
   .getNodeTypeManager();
InputStream is = MyClass.class.getResourceAsStream("mynodetypes.xml");
.....
nodeTypeManager.registerNodeTypes(is, ExtendedNodeTypeManager.REPLACE_IF_EXISTS);
A node type can be removed only when the repository does not contain any nodes of this type.
nodeTypeManager.unregisterNodeType("myNodeType");
NodeTypeValue myNodeTypeValue = nodeTypeManager.getNodeTypeValue(myNodeTypeName);

List<PropertyDefinitionValue> props = new ArrayList<PropertyDefinitionValue>();
props.add(new PropertyDefinitionValue("tt", true, true, 1, false,
   new ArrayList<String>(), false, PropertyType.STRING, new ArrayList<String>()));
myNodeTypeValue.setDeclaredPropertyDefinitionValues(props);

nodeTypeManager.registerNodeType(myNodeTypeValue, ExtendedNodeTypeManager.REPLACE_IF_EXISTS);
NodeTypeValue myNodeTypeValue = nodeTypeManager.getNodeTypeValue(myNodeTypeName);

List<NodeDefinitionValue> nodes = new ArrayList<NodeDefinitionValue>();
nodes.add(new NodeDefinitionValue("child", false, false, 1, false, "nt:base",
   new ArrayList<String>(), false));
// Set the child node definitions on the value that is registered below.
myNodeTypeValue.setDeclaredChildNodeDefinitionValues(nodes);

nodeTypeManager.registerNodeType(myNodeTypeValue, ExtendedNodeTypeManager.REPLACE_IF_EXISTS);
Note that the existing data must be consistent before changing or removing an existing definition. JCR does not allow you to change a node type in a way that would make the existing data incompatible with the new node type. If such changes are needed, you can make them in several phases, consistently changing the node type and the existing data.
For example:
Add a new mandatory property definition with the name "downloadCount" to the existing node type "myNodeType".
Two limitations prevent us from completing the task with a single call of the registerNodeType method:
Existing nodes of the type "myNodeType" do not contain the property "downloadCount", which conflicts with the node type that we need.
The registered node type "myNodeType" does not allow us to add the property "downloadCount" to those nodes, because it has no such property definition.
To complete the task, we need to take three steps:
Change the existing node type "myNodeType" by adding the non-mandatory property "downloadCount".
Add the property "downloadCount" to all the existing nodes of the type "myNodeType".
Change the definition of the property "downloadCount" of the node type "myNodeType" to mandatory.
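The phased approach can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class PhasedNodeTypeChange and its methods are hypothetical and are not part of eXo JCR. It reads the steps above as follows: the property is first defined as non-mandatory, then set on every existing node, and only then made mandatory. The model rejects a definition change that would invalidate existing nodes, which is the kind of consistency check described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one node type and its nodes; not eXo JCR code. */
final class PhasedNodeTypeChange {

    // Property definitions of the modelled node type: name -> mandatory flag.
    private final Map<String, Boolean> definitions = new HashMap<>();
    // Existing nodes of that type: each maps property name -> value.
    private final List<Map<String, Object>> nodes = new ArrayList<>();

    void addNode() {
        nodes.add(new HashMap<>());
    }

    /** Models re-registering the type; rejected if existing nodes would become invalid. */
    void defineProperty(String name, boolean mandatory) {
        if (mandatory) {
            for (Map<String, Object> node : nodes) {
                if (!node.containsKey(name)) {
                    throw new IllegalStateException("Existing node lacks mandatory property " + name);
                }
            }
        }
        definitions.put(name, mandatory);
    }

    /** Models setting a property on every node; rejected if the type does not define it. */
    void setOnAllNodes(String name, Object value) {
        if (!definitions.containsKey(name)) {
            throw new IllegalStateException("Node type has no definition for " + name);
        }
        for (Map<String, Object> node : nodes) {
            node.put(name, value);
        }
    }

    boolean isMandatory(String name) {
        return Boolean.TRUE.equals(definitions.get(name));
    }

    public static void main(String[] args) {
        PhasedNodeTypeChange type = new PhasedNodeTypeChange();
        type.addNode();
        type.defineProperty("downloadCount", false); // phase 1: optional definition
        type.setOnAllNodes("downloadCount", 0L);     // phase 2: fill the existing nodes
        type.defineProperty("downloadCount", true);  // phase 3: make it mandatory
        System.out.println(type.isMandatory("downloadCount"));
    }
}
```

In this model, calling defineProperty("downloadCount", true) or setOnAllNodes("downloadCount", 0L) as the very first step fails, which mirrors the two limitations listed above.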
NodeTypeValue testNValue = nodeTypeManager.getNodeTypeValue("exo:myNodeType");

List<String> superType = testNValue.getDeclaredSupertypeNames();
superType.add("mix:versionable");
testNValue.setDeclaredSupertypeNames(superType);

nodeTypeManager.registerNodeType(testNValue, ExtendedNodeTypeManager.REPLACE_IF_EXISTS);
The Registry Service is one of the key parts of the infrastructure built around eXo JCR. Each JCR-based service, application, etc. may have its own configuration, settings and other data that have to be stored persistently and used by the appropriate service or application (we call it a "Consumer").
The service acts as a centralized collector (Registry) for such data. Naturally, a registry storage is JCR based i.e. stored in some JCR workspace (one per Repository) as an Item tree under /exo:registry node.
Although the structure of the tree is well defined (see the scheme below), other services should not manipulate the data directly through the JCR API; going through the Registry Service keeps them more flexible. The Registry Service thus acts as a mediator between a Consumer and its settings.
The proposed structure of the Registry Service storage is divided into 3 logical groups: services, applications and users:
exo:registry/          <-- registry "root" (exo:registry)
   exo:services/       <-- service data storage (exo:registryGroup)
      service1/
         Consumer data (exo:registryEntry)
      ...
   exo:applications/   <-- application data storage (exo:registryGroup)
      app1/
         Consumer data (exo:registryEntry)
      ...
   exo:users/          <-- user personal data storage (exo:registryGroup)
      user1/
         Consumer data (exo:registryEntry)
      ...
Each upper-level eXo service may store its configuration in the eXo Registry. On the first start, the configuration is read from the XML configuration (in the jar, etc.), and afterwards from the Registry. In the configuration file, you can add the force-xml-configuration parameter to a component so that it does not read its initialization parameters from the RegistryService and uses the file instead:
<value-param>
  <name>force-xml-configuration</name>
  <value>true</value>
</value-param>
The main functionality of the Registry Service is simple and straightforward. It is described in the Registry abstract class as follows:
public abstract class Registry { /** * Returns Registry node object which wraps Node of "exo:registry" type (the whole registry tree) */ public abstract RegistryNode getRegistry(SessionProvider sessionProvider) throws RepositoryConfigurationException, RepositoryException; /** * Returns existed RegistryEntry which wraps Node of "exo:registryEntry" type */ public abstract RegistryEntry getEntry(SessionProvider sessionProvider, String entryPath) throws PathNotFoundException, RepositoryException; /** * creates an entry in the group. In a case if the group does not exist it will be silently * created as well */ public abstract void createEntry(SessionProvider sessionProvider, String groupPath, RegistryEntry entry) throws RepositoryException; /** * updates an entry in the group */ public abstract void recreateEntry(SessionProvider sessionProvider, String groupPath, RegistryEntry entry) throws RepositoryException; /** * removes entry located on entryPath (concatenation of group path / entry name) */ public abstract void removeEntry(SessionProvider sessionProvider, String entryPath) throws RepositoryException; }
As you can see it looks like a simple CRUD interface for the RegistryEntry object which wraps registry data for some Consumer as a Registry Entry. The Registry Service itself knows nothing about the wrapping data, it is Consumer's responsibility to manage and use its data in its own way.
To create an entry, the Consumer should know how to serialize its data to some XML structure and then either create a RegistryEntry from these data at once or populate them in a RegistryEntry object (using the RegistryEntry(String entryName) constructor and then obtaining and filling a DOM document).
Example of using RegistryService:
RegistryService regService = (RegistryService) container
   .getComponentInstanceOfType(RegistryService.class);

RegistryEntry registryEntry = regService.getEntry(sessionProvider,
   RegistryService.EXO_SERVICES + "/my-service");

Document doc = registryEntry.getDocument();

// The lookup has to be made on the entry's DOM document.
String mySetting = doc.getElementsByTagName("tagname").item(index).getTextContent();
.....
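The DOM side of such an entry can be tried out without a running repository. The following is a minimal sketch that uses only the standard javax.xml and org.w3c.dom APIs; the class EntryDocumentSketch, its method names, and the sample value are illustrative assumptions, not part of the Registry Service. It builds a small document of the kind a Consumer might store and reads one setting back by tag name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative helper (hypothetical name) for filling and reading an entry document. */
final class EntryDocumentSketch {

    /** Builds a document with a root element named after the entry and one setting element. */
    static Document build(String entryName, String tag, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(entryName);
        doc.appendChild(root);
        Element setting = doc.createElement(tag);
        setting.setTextContent(value);
        root.appendChild(setting);
        return doc;
    }

    /** Returns the text of the index-th element with the given tag, or null if absent. */
    static String read(Document doc, String tag, int index) {
        NodeList matches = doc.getElementsByTagName(tag);
        return index >= 0 && index < matches.getLength()
            ? matches.item(index).getTextContent()
            : null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("my-service", "tagname", "some value");
        System.out.println(read(doc, "tagname", 0));
    }
}
```

Checking the index against the NodeList length avoids a NullPointerException when the setting is missing, which a direct item(index).getTextContent() call would throw.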
RegistryService has two optional parameters: the values parameter mixin-names and the properties parameter locations. mixin-names is used to add additional mixins to the nodes exo:registry, exo:applications, exo:services, exo:users and exo:groups of RegistryService. This allows top-level applications to manage these nodes in a special way. locations specifies where exo:registry is placed for each repository. The name of each property is interpreted as a repository name and its value as a workspace name (the system workspace by default).
<component>
  <type>org.exoplatform.services.jcr.ext.registry.RegistryService</type>
  <init-params>
    <values-param>
      <name>mixin-names</name>
      <value>exo:hideable</value>
    </values-param>
    <properties-param>
      <name>locations</name>
      <property name="db1" value="ws2"/>
    </properties-param>
  </init-params>
</component>
Since version 1.11, the eXo JCR implementation supports altering namespaces.
ExtendedNamespaceRegistry namespaceRegistry = (ExtendedNamespaceRegistry) workspace.getNamespaceRegistry(); namespaceRegistry.registerNamespace("newMapping", "http://dumb.uri/jcr");
ExtendedNamespaceRegistry namespaceRegistry = (ExtendedNamespaceRegistry) workspace.getNamespaceRegistry(); namespaceRegistry.registerNamespace("newMapping", "http://dumb.uri/jcr"); namespaceRegistry.registerNamespace("newMapping2", "http://dumb.uri/jcr");
Support of node types and namespaces is required by the JSR-170 specification. Beyond the methods required by the specification, eXo JCR has its own API extension for the Node type registration as well as the ability to declaratively define node types in the Repository at the start-up time.
The node type registration extension is declared in the org.exoplatform.services.jcr.core.nodetype.ExtendedNodeTypeManager interface.
Your custom service can register some necessary predefined node types at start-up time. The node type definitions should be placed in a special XML file (see DTD below) and declared in the service's configuration file through the eXo component plugin mechanism, as follows:
<external-component-plugins>
  <target-component>org.exoplatform.services.jcr.RepositoryService</target-component>
  <component-plugin>
    <name>add.nodeType</name>
    <set-method>addPlugin</set-method>
    <type>org.exoplatform.services.jcr.impl.AddNodeTypePlugin</type>
    <init-params>
      <values-param>
        <name>autoCreatedInNewRepository</name>
        <description>Node types configuration file</description>
        <value>jar:/conf/test/nodetypes-tck.xml</value>
        <value>jar:/conf/test/nodetypes-impl.xml</value>
      </values-param>
      <values-param>
        <name>repo1</name>
        <description>Node types configuration file for repository with name repo1</description>
        <value>jar:/conf/test/nodetypes-test.xml</value>
      </values-param>
      <values-param>
        <name>repo2</name>
        <description>Node types configuration file for repository with name repo2</description>
        <value>jar:/conf/test/nodetypes-test2.xml</value>
      </values-param>
    </init-params>
  </component-plugin>
</external-component-plugins>
There are two types of registration. The first type is the registration of node types in all created repositories; it is configured in the values-param named autoCreatedInNewRepository. The second type is the registration of node types in a specified repository; it is configured in a values-param named after the repository.
Node type definition file format:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE nodeTypes [ <!ELEMENT nodeTypes (nodeType)*> <!ELEMENT nodeType (supertypes?|propertyDefinitions?|childNodeDefinitions?)> <!ATTLIST nodeType name CDATA #REQUIRED isMixin (true|false) #REQUIRED hasOrderableChildNodes (true|false) primaryItemName CDATA > <!ELEMENT supertypes (supertype*)> <!ELEMENT supertype (CDATA)> <!ELEMENT propertyDefinitions (propertyDefinition*)> <!ELEMENT propertyDefinition (valueConstraints?|defaultValues?)> <!ATTLIST propertyDefinition name CDATA #REQUIRED requiredType (String|Date|Path|Name|Reference|Binary|Double|Long|Boolean|undefined) #REQUIRED autoCreated (true|false) #REQUIRED mandatory (true|false) #REQUIRED onParentVersion (COPY|VERSION|INITIALIZE|COMPUTE|IGNORE|ABORT) #REQUIRED protected (true|false) #REQUIRED multiple (true|false) #REQUIRED > <!-- For example if you need to set ValueConstraints [], you have to add an empty element <valueConstraints/>. The same order is for other properties like defaultValues, requiredPrimaryTypes etc. --> <!ELEMENT valueConstraints (valueConstraint*)> <!ELEMENT valueConstraint (CDATA)> <!ELEMENT defaultValues (defaultValue*)> <!ELEMENT defaultValue (CDATA)> <!ELEMENT childNodeDefinitions (childNodeDefinition*)> <!ELEMENT childNodeDefinition (requiredPrimaryTypes)> <!ATTLIST childNodeDefinition name CDATA #REQUIRED defaultPrimaryType CDATA #REQUIRED autoCreated (true|false) #REQUIRED mandatory (true|false) #REQUIRED onParentVersion (COPY|VERSION|INITIALIZE|COMPUTE|IGNORE|ABORT) #REQUIRED protected (true|false) #REQUIRED sameNameSiblings (true|false) #REQUIRED > <!ELEMENT requiredPrimaryTypes (requiredPrimaryType+)> <!ELEMENT requiredPrimaryType (CDATA)> ]>
Default namespaces are registered by repository at the start-up time
Your custom service can extend a set of namespaces with some application specific ones, declaring it in service's configuration file thanks to eXo component plugin mechanism, described as follows:
<component-plugin>
  <name>add.namespaces</name>
  <set-method>addPlugin</set-method>
  <type>org.exoplatform.services.jcr.impl.AddNamespacesPlugin</type>
  <init-params>
    <properties-param>
      <name>namespaces</name>
      <property name="test" value="http://www.test.org/test"/>
    </properties-param>
  </init-params>
</component-plugin>
This section describes eXo JCR configuration in detail, covering both basic and advanced configuration.
Like other eXo services, eXo JCR can be configured and used in the portal or embedded mode (as a service embedded in GateIn) and in standalone mode.
In embedded mode, JCR services are registered in the Portal container; in standalone mode, they are registered in a Standalone container. The main difference between these container types is that the first one is intended to be used in a Portal (Web) environment, while the second one can be used standalone (see the comprehensive page Service Configuration for Beginners for more details).
The following setup procedure is used to obtain a Standalone configuration (see more in Container configuration):
Configuration that is set explicitly using StandaloneContainer.addConfigurationURL(String url) or StandaloneContainer.addConfigurationPath(String path) before getInstance()
Configuration from the $base:directory/exo-configuration.xml or $base:directory/conf/exo-configuration.xml file, where $base:directory is either the AS's home directory in the case of a J2EE AS environment or just the current directory in the case of a standalone application.
/conf/exo-configuration.xml in the current classloader (e.g. war, ear archive)
Configuration from $service_jar_file/conf/portal/configuration.xml. WARNING: Don't rely on some concrete jar's configuration if you have more than one jar containing conf/portal/configuration.xml file. In this case choosing a configuration is unpredictable.
JCR service configuration looks like:
<component> <key>org.exoplatform.services.jcr.RepositoryService</key> <type>org.exoplatform.services.jcr.impl.RepositoryServiceImpl</type> </component> <component> <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key> <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type> <init-params> <value-param> <name>conf-path</name> <description>JCR repositories configuration file</description> <value>jar:/conf/standalone/exo-jcr-config.xml</value> </value-param> <value-param> <name>max-backup-files</name> <value>5</value> </value-param> <properties-param> <name>working-conf</name> <description>working-conf</description> <property name="source-name" value="jdbcjcr" /> <property name="dialect" value="hsqldb" /> <property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister" /> </properties-param> </init-params> </component>
conf-path : a path to a RepositoryService JCR Configuration.
max-backup-files : the maximum number of backup files to keep. The number of stored backups cannot exceed this value; when a new backup file would exceed the limit, it replaces the oldest one.
working-conf : optional; JCR configuration persister configuration. If there isn't a working-conf, the persister will be disabled.
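The max-backup-files rule can be pictured as a bounded queue. The sketch below is only one way to read that rule; the class BackupRotation, its method names, and the requirement that the limit be at least 1 are assumptions made for this illustration and do not describe the actual persister code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the max-backup-files rule: the oldest backup is replaced first. */
final class BackupRotation {

    private final int maxBackupFiles;
    private final Deque<String> backups = new ArrayDeque<>(); // oldest entry first

    BackupRotation(int maxBackupFiles) {
        if (maxBackupFiles < 1) {
            // Assumption of this sketch only: a limit below 1 is treated as invalid.
            throw new IllegalArgumentException("max-backup-files must be at least 1");
        }
        this.maxBackupFiles = maxBackupFiles;
    }

    /** Records a new backup and returns the name of the replaced (oldest) one, or null. */
    String add(String fileName) {
        String replaced = backups.size() >= maxBackupFiles ? backups.removeFirst() : null;
        backups.addLast(fileName);
        return replaced;
    }

    int count() {
        return backups.size();
    }

    public static void main(String[] args) {
        BackupRotation rotation = new BackupRotation(2);
        rotation.add("backup-1.xml");
        rotation.add("backup-2.xml");
        System.out.println(rotation.add("backup-3.xml")); // name of the replaced file
    }
}
```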
The Configuration is defined in an XML file (see DTD below).
JCR Service can use multiple Repositories and each repository can have multiple Workspaces.
Since JCR v.1.9, repository configuration parameters support human-readable value formats. They are all case-insensitive:
Number formats: K, KB - kilobytes; M, MB - megabytes; G, GB - gigabytes; T, TB - terabytes. Examples: 100.5 - the number 100.5, 200k - 200 Kbytes, 4m - 4 Mbytes, 1.4G - 1.4 Gbytes, 10T - 10 Tbytes.
Time format endings: ms - milliseconds, m - minutes, h - hours, d - days, w - weeks, if no ending - seconds. Examples: 500ms - 500 milliseconds, 20 - 20 seconds, 30m - 30 minutes, 12h - 12 hours, 5d - 5 days, 4w - 4 weeks.
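To make the two formats above concrete, here is a minimal parsing sketch. It is not the parser used by eXo JCR; the class HumanReadableValues and its methods are hypothetical, and the choice of binary multiples (1 K = 1024) is an assumption, since the text does not state the base. Note that the ending "m" means megabytes in a number and minutes in a time value, so the two formats are parsed by separate methods.

```java
import java.util.Locale;

/** Hypothetical parser for the number and time formats listed above. */
final class HumanReadableValues {

    // Assumption: binary multiples (1 K = 1024); the text does not state the base.
    private static final double KILO = 1024d;
    private static final String[] SIZE_SUFFIXES = {"kb", "mb", "gb", "tb", "k", "m", "g", "t"};
    private static final double[] SIZE_FACTORS = {
        KILO, KILO * KILO, KILO * KILO * KILO, KILO * KILO * KILO * KILO,
        KILO, KILO * KILO, KILO * KILO * KILO, KILO * KILO * KILO * KILO};

    /** Parses values such as "100.5", "200k" or "1.4G" into a plain number. */
    static double parseNumber(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        for (int i = 0; i < SIZE_SUFFIXES.length; i++) {
            if (s.endsWith(SIZE_SUFFIXES[i])) {
                String digits = s.substring(0, s.length() - SIZE_SUFFIXES[i].length()).trim();
                return Double.parseDouble(digits) * SIZE_FACTORS[i];
            }
        }
        return Double.parseDouble(s);
    }

    /** Parses values such as "500ms", "20", "30m" or "4w" into milliseconds. */
    static long parseTimeMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new NumberFormatException("empty time value");
        }
        if (s.endsWith("ms")) {
            return Math.round(Double.parseDouble(s.substring(0, s.length() - 2).trim()));
        }
        long unitMillis;
        switch (s.charAt(s.length() - 1)) {
            case 'm': unitMillis = 60_000L; break;
            case 'h': unitMillis = 3_600_000L; break;
            case 'd': unitMillis = 86_400_000L; break;
            case 'w': unitMillis = 604_800_000L; break;
            default:  unitMillis = 0L; break; // no ending: the value is in seconds
        }
        if (unitMillis == 0L) {
            return Math.round(Double.parseDouble(s) * 1000d);
        }
        String digits = s.substring(0, s.length() - 1).trim();
        return Math.round(Double.parseDouble(digits) * unitMillis);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("200k"));
        System.out.println(parseTimeMillis("30m"));
    }
}
```

Both methods lower-case the input first, which reflects the case-insensitivity mentioned above.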
Service configuration may be placed in jar:/conf/standalone/exo-jcr-config.xml for standalone mode. For portal mode, it is located in the portal web application portal/WEB-INF/conf/jcr/repository-configuration.xml.
default-repository: The name of a default repository (one returned by RepositoryService.getRepository()).
repositories: The list of repositories.
name: The name of a repository.
default-workspace: The name of a workspace obtained using Session's login() or login(Credentials) methods (ones without an explicit workspace name).
system-workspace: The name of workspace where /jcr:system node is placed.
security-domain: The name of a security domain for JAAS authentication.
access-control: The name of an access control policy. There are 3 types: optional - an ACL is created on demand (default); disable - no access control; mandatory - an ACL is created for each added node (not supported yet).
authentication-policy: The name of an authentication policy class.
workspaces: The list of workspaces.
session-max-age: The time after which an idle session will be removed (called logout). If session-max-age is not set up, idle session will never be removed.
lock-remover-max-threads: Number of threads that can serve LockRemover tasks. The default value is 1. A repository may have many workspaces, and each workspace has its own LockManager. JCR supports locks with a defined lifetime. Such a lock must be removed once it has expired; that is what a LockRemover does. A LockRemover is not an independent timer thread but a task that is executed every 30 seconds. These tasks are served by a ThreadPoolExecutor, which may use a different number of threads.
name: The name of a workspace
container: Workspace data container (physical storage) configuration.
initializer: Workspace initializer configuration.
cache: Workspace storage cache configuration.
query-handler: Query handler configuration.
auto-init-permissions: DEPRECATED in JCR 1.9 (use initializer). Default permissions of the root node. It is defined as a set of semicolon-delimited permissions containing a group of space-delimited identities (user, group, etc, see Organization service documentation for details) and the type of permission. For example, any read; :/admin read;:/admin add_node; :/admin set_property;:/admin remove means that users from group admin have all permissions and other users have only a 'read' permission.
The value-storage element is optional. If you don't include it, the values will be stored as BLOBs inside the database.
value-storage: Optional value Storage plugin definition.
class: A value storage plugin class name (attribute).
properties: The list of properties (name-value pairs) for a concrete Value Storage plugin.
filters: The list of filters defining conditions when this plugin is applicable.
class: Initializer implementation class.
properties: The list of properties (name-value pairs). Properties are supported.
root-nodetype: The node type for root node initialization.
root-permissions: Default permissions of the root node. It is defined as a set of semicolon-delimited permissions containing a group of space-delimited identities (user, group etc, see Organization service documentation for details) and the type of permission. For example any read; :/admin read;:/admin add_node; :/admin set_property;:/admin remove means that users from group admin have all permissions and other users have only a 'read' permission.
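The root-permissions format described above (semicolon-delimited entries, each an identity followed by a space and a permission type) can be split as in the sketch below. This is an illustration only; the class RootPermissionsParser is hypothetical and is not the parser used by eXo JCR. The sample string is the one from the text, copied as it appears there.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical parser for a root-permissions value. */
final class RootPermissionsParser {

    /** Splits "identity permission; identity permission; ..." into identity -> permissions. */
    static Map<String, Set<String>> parse(String rootPermissions) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String entry : rootPermissions.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing or doubled semicolon
            }
            // The permission type is the last space-delimited token of the entry.
            int space = trimmed.lastIndexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("Missing permission type in: " + trimmed);
            }
            String identity = trimmed.substring(0, space).trim();
            String permission = trimmed.substring(space + 1);
            result.computeIfAbsent(identity, key -> new LinkedHashSet<>()).add(permission);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "any read; :/admin read;:/admin add_node; :/admin set_property;:/admin remove";
        System.out.println(parse(sample));
    }
}
```

Grouping by identity makes it easy to see that the admin identity in the sample holds all four permission types while any holds only read.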
Configurable initializer adds a capability to override workspace initial startup procedure (used for Clustering).
enabled: If workspace cache is enabled or not.
class: Cache implementation class, optional since 1.9. The default value is org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl.
The cache can be configured to use a concrete implementation of the WorkspaceStorageCache interface. JCR core provides two implementations:
LinkedWorkspaceStorageCacheImpl - default, with configurable read behavior and statistic.
WorkspaceStorageCacheImpl - pre 1.9, still can be used.
properties: The list of properties (name-value pairs) for Workspace cache.
max-size: Cache maximum size (maxSize prior to v.1.9).
live-time: Cached item live time (liveTime prior to v.1.9).
From 1.9 LinkedWorkspaceStorageCacheImpl supports additional optional parameters.
statistic-period: Period (time format) of cache statistic thread execution, 5 minutes by default.
statistic-log: If true, cache statistics are printed to the default logger (log.info); false by default.
statistic-clean: If true, cache statistics are reset after they have been gathered; false by default.
cleaner-period: Period of the eldest items remover execution, 20 minutes by default.
blocking-users-count: Number of concurrent users allowed to read cache storage, 0 - unlimited by default.
class: A Query Handler class name.
properties: The list of properties (name-value pairs) for a Query Handler (indexDir).
Properties and advanced features are described in Search Configuration.
time-out: Time after which the unused global lock will be removed.
persister: A class for storing lock information for future use. For example, remove lock after jcr restart.
path: A lock folder. Each workspace has its own one.
Also see lock-remover-max-threads repository configuration parameter.
<!ELEMENT repository-service (repositories)> <!ATTLIST repository-service default-repository NMTOKEN #REQUIRED> <!ELEMENT repositories (repository)> <!ELEMENT repository (security-domain,access-control,session-max-age,authentication-policy,workspaces)> <!ATTLIST repository default-workspace NMTOKEN #REQUIRED name NMTOKEN #REQUIRED system-workspace NMTOKEN #REQUIRED > <!ELEMENT security-domain (#PCDATA)> <!ELEMENT access-control (#PCDATA)> <!ELEMENT session-max-age (#PCDATA)> <!ELEMENT authentication-policy (#PCDATA)> <!ELEMENT workspaces (workspace+)> <!ELEMENT workspace (container,initializer,cache,query-handler)> <!ATTLIST workspace name NMTOKEN #REQUIRED> <!ELEMENT container (properties,value-storages)> <!ATTLIST container class NMTOKEN #REQUIRED> <!ELEMENT value-storages (value-storage+)> <!ELEMENT value-storage (properties,filters)> <!ATTLIST value-storage class NMTOKEN #REQUIRED> <!ELEMENT filters (filter+)> <!ELEMENT filter EMPTY> <!ATTLIST filter property-type NMTOKEN #REQUIRED> <!ELEMENT initializer (properties)> <!ATTLIST initializer class NMTOKEN #REQUIRED> <!ELEMENT cache (properties)> <!ATTLIST cache enabled NMTOKEN #REQUIRED class NMTOKEN #REQUIRED > <!ELEMENT query-handler (properties)> <!ATTLIST query-handler class NMTOKEN #REQUIRED> <!ELEMENT access-manager (properties)> <!ATTLIST access-manager class NMTOKEN #REQUIRED> <!ELEMENT lock-manager (time-out,persister)> <!ELEMENT time-out (#PCDATA)> <!ELEMENT persister (properties)> <!ELEMENT properties (property+)> <!ELEMENT property EMPTY>
Products that use eXo JCR sometimes misuse it: they continue to use a session that has been closed, through a method call on a node, a property or even the session itself. To prevent such bad practices, we propose the following three modes:
If the system property exo.jcr.prohibit.closed.session.usage has been set to true, a RepositoryException is thrown any time an application tries to access a closed session. The stack trace shows the call stack that closed the session.
If the system property exo.jcr.prohibit.closed.session.usage has not been set and the system property exo.product.developing has been set to true, a warning is logged in the log file with the full stack trace in order to help identify the root cause of the issue. The stack trace shows the call stack that closed the session.
If none of the previous system properties has been set, the issue is ignored and the application can use the closed session as it could before, without anything being thrown or logged. This allows applications to migrate step by step.
Since the use of a closed session leads to the use of a closed datasource, we propose three ways to handle such issues:
If the system property exo.jcr.prohibit.closed.datasource.usage is set to true (the default value), a SQLException is thrown any time an application tries to access a closed datasource. The stack trace shows the call stack that closed the datasource.
If the system property exo.jcr.prohibit.closed.datasource.usage is set to false and the system property exo.product.developing is set to true, a warning is logged in the log file with the full stack trace in order to help identify the root cause of the issue. The stack trace shows the call stack that closed the datasource.
If the system property exo.jcr.prohibit.closed.datasource.usage is set to false and the system property exo.product.developing is set to false, the use of a closed datasource is allowed and nothing is logged or thrown.
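The two sets of rules above differ mainly in the default of the prohibit property. The sketch below shows how the three modes could be derived from system properties; it is not the eXo JCR implementation. The class ClosedResourcePolicy and the Mode enum are hypothetical, and treating any value other than "true" as "not prohibited" is an assumption of this sketch.

```java
import java.util.Properties;

/** Hypothetical helper that maps the system properties above to one of three modes. */
final class ClosedResourcePolicy {

    enum Mode { FAIL, WARN, IGNORE }

    /** Closed sessions: prohibition is off unless the property is set to true. */
    static Mode forSession(Properties props) {
        return decide(props, "exo.jcr.prohibit.closed.session.usage", "false");
    }

    /** Closed datasources: prohibition is on by default. */
    static Mode forDatasource(Properties props) {
        return decide(props, "exo.jcr.prohibit.closed.datasource.usage", "true");
    }

    private static Mode decide(Properties props, String prohibitKey, String prohibitDefault) {
        if (Boolean.parseBoolean(props.getProperty(prohibitKey, prohibitDefault))) {
            return Mode.FAIL;   // an exception is thrown on each access
        }
        if (Boolean.parseBoolean(props.getProperty("exo.product.developing", "false"))) {
            return Mode.WARN;   // a warning with the full stack trace is logged
        }
        return Mode.IGNORE;     // nothing is thrown or logged
    }

    public static void main(String[] args) {
        System.out.println("session: " + forSession(System.getProperties()));
        System.out.println("datasource: " + forDatasource(System.getProperties()));
    }
}
```

For example, starting the JVM with -Dexo.product.developing=true and no other setting would give WARN for sessions but still FAIL for datasources in this model, because the datasource prohibition defaults to true.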
The effective configuration of all the repositories and their workspaces can be obtained through the method getConfigurationXML(), which is exposed through JMX at the RepositoryServiceConfiguration level. In the case of a PortalContainer, the name of the related MBean is of the form exo:portal=${portal-container-name},service=RepositoryServiceConfiguration. This method returns the effective configuration, in XML format, as it has actually been interpreted by the JCR core. This can be helpful for understanding how your repositories and workspaces are configured, especially if you want to override parts of the configuration.
You can configure values of properties defined in the file repository-configuration.xml using System Properties. This is particularly helpful when you want to change the default configuration of all the workspaces, for example to disable RDBMS indexing for every workspace; without this mechanism, such a change is very error-prone. For all components that can be configured through properties (container, value-storage, workspace-initializer, cache, query-handler, lock-manager, access-manager and persister), the logic, taking the component 'container' and a property called 'foo' as an example, is the following:
1. If a system property called exo.jcr.config.force.workspace.repository_collaboration.container.foo has been defined, its value will be used for the configuration of the repository 'repository' and the workspace 'collaboration'.
2. If a system property called exo.jcr.config.force.repository.repository.container.foo has been defined, its value will be used for the configuration of all the workspaces of the repository 'repository', except the workspaces for which the same property was configured using system properties defined in #1.
3. If a system property called exo.jcr.config.force.all.container.foo has been defined, its value will be used for the configuration of all the workspaces, except the workspaces for which the same property was configured using system properties defined in #1 or #2.
4. If a property 'foo' is configured for the repository 'repository' and the workspace 'collaboration' and there are no system properties corresponding to rules #1, #2 and #3, this value is used (current behavior).
5. If the previous rules do not give a value to the property 'foo', the default value is looked up in the following order: exo.jcr.config.default.workspace.repository_collaboration.container.foo, exo.jcr.config.default.repository.repository.container.foo, exo.jcr.config.default.all.container.foo.
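The lookup order described above can be sketched as follows. This is an illustration of the precedence only, not the eXo JCR code: the class WorkspacePropertyResolver is hypothetical, and system properties are passed in as a plain map so that the example is easy to try. The forced values (rules #1 to #3) win over the value configured in the file, which in turn wins over the default values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver that applies the precedence rules described above. */
final class WorkspacePropertyResolver {

    private static final String FORCE = "exo.jcr.config.force.";
    private static final String DEFAULT = "exo.jcr.config.default.";

    /**
     * Resolves one component property. configuredValue is the value found in
     * repository-configuration.xml, or null when the file does not set it.
     */
    static String resolve(Map<String, String> systemProperties, String repository,
                          String workspace, String component, String property,
                          String configuredValue) {
        String suffix = component + "." + property;
        // Scopes ordered from the most specific to the least specific one.
        String[] scopes = {
            "workspace." + repository + "_" + workspace + "." + suffix,
            "repository." + repository + "." + suffix,
            "all." + suffix};
        for (String scope : scopes) {
            String forced = systemProperties.get(FORCE + scope);
            if (forced != null) {
                return forced;              // forced values always win
            }
        }
        if (configuredValue != null) {
            return configuredValue;         // value from the configuration file
        }
        for (String scope : scopes) {
            String fallback = systemProperties.get(DEFAULT + scope);
            if (fallback != null) {
                return fallback;            // default values are the last resort
            }
        }
        return null;                        // the property stays undefined
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("exo.jcr.config.force.all.container.foo", "forced");
        System.out.println(resolve(props, "repository", "collaboration", "container", "foo", "fromFile"));
    }
}
```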
To turn on this feature you need to define a component called SystemParametersPersistenceConfigurator. A simple example:
<component> <key>org.exoplatform.services.jcr.config.SystemParametersPersistenceConfigurator</key> <type>org.exoplatform.services.jcr.config.SystemParametersPersistenceConfigurator</type> <init-params> <value-param> <name>file-path</name> <value>target/temp</value> </value-param> <values-param> <name>unmodifiable</name> <value>cache.test-parameter-I</value> </values-param> <values-param> <name>before-initialize</name> <value>value-storage.enabled</value> </values-param> </init-params> </component>
To make the configuration process easier, you can define three parameters:
file-path: a mandatory parameter which defines the location of the file where all the parameters configured on the previous launch of the AS are stored.
unmodifiable: defines the list of parameters which cannot be modified using system properties.
before-initialize: defines the list of parameters which can be set only for workspaces that are not yet initialized (e.g. during the first start of the AS).
The parameters in these lists have the following format: {component-name}.{parameter-name}. Each entry takes effect for every workspace component called {component-name}.
Please take into account that if this component is not defined in the configuration, the workspace configuration overriding using system properties mechanism will be disabled. In other words: if you don't configure SystemParametersPersistenceConfigurator, the system properties are ignored.
Whenever a relational database is used to store multilingual text data of eXo Java Content Repository, the configuration must be adapted to support UTF-8 encoding. Here is a short HOWTO for several supported RDBMS, with examples.
The configuration file you have to modify: .../webapps/portal/WEB-INF/conf/jcr/repository-configuration.xml
The datasource jdbcjcr used in the examples can be configured via the InitialContextInitializer component.
In order to run multilanguage JCR on an Oracle backend, Unicode encoding for the character set should be applied to the database. Other Oracle globalization parameters have no impact. The only property to modify is NLS_CHARACTERSET.
We have tested NLS_CHARACTERSET = AL32UTF8 and it works well for many European and Asian languages.
Example of database configuration (used for JCR testing):
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET AL32UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
JCR doesn't use NVARCHAR columns, so the value of the parameter NLS_NCHAR_CHARACTERSET does not matter for JCR.
Create a database with Unicode encoding and use the Oracle dialect for the Workspace Container:
<workspace name="collaboration">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="oracle" />
      <property name="multi-db" value="false" />
      <property name="max-buffer-size" value="200k" />
      <property name="swap-directory" value="target/temp/swap/ws" />
    </properties>
    .....
DB2 Universal Database (DB2 UDB) supports UTF-8 and UTF-16/UCS-2. When a Unicode database is created, CHAR, VARCHAR and LONG VARCHAR data are stored in UTF-8 form. This is sufficient for JCR multilingual support.
Example of UTF-8 database creation:
DB2 CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US
Create a database with UTF-8 encoding and use the db2 dialect for the Workspace Container on DB2 v.9 and higher:
<workspace name="collaboration">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="db2" />
      <property name="multi-db" value="false" />
      <property name="max-buffer-size" value="200k" />
      <property name="swap-directory" value="target/temp/swap/ws" />
    </properties>
    .....
To support DB2 v.8.x, change the property "dialect" to db2v8.
The JCR MySQL backend requires the special dialect MySQL-UTF8 for internationalization support. However, the database default charset should be latin1 in order to use the limited index space effectively (1000 bytes for the MyISAM engine, 767 for InnoDB). If the database default charset is multibyte, a JCR database initialization error is thrown concerning index creation failure. In other words, JCR can work with any single-byte database default charset, provided that UTF8 is supported by the MySQL server. Note that this has only been tested with latin1 as the database default charset.
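The effect of the index space limits can be made concrete with a little arithmetic. The sketch below assumes that MySQL's utf8 charset reserves up to 3 bytes per character and latin1 uses 1 byte per character; the class and method names are hypothetical.

```java
/** Arithmetic sketch: how many characters fit into a limited index key. */
final class IndexKeyBudget {

    /** Longest indexable column prefix, in characters, for a given key-size limit in bytes. */
    static int maxIndexedChars(int keyLimitBytes, int maxBytesPerChar) {
        return keyLimitBytes / maxBytesPerChar;
    }

    public static void main(String[] args) {
        int innoDbLimit = 767;  // bytes, as stated above
        int myIsamLimit = 1000; // bytes, as stated above
        System.out.println("InnoDB, latin1: " + maxIndexedChars(innoDbLimit, 1));
        System.out.println("InnoDB, utf8:   " + maxIndexedChars(innoDbLimit, 3));
        System.out.println("MyISAM, latin1: " + maxIndexedChars(myIsamLimit, 1));
        System.out.println("MyISAM, utf8:   " + maxIndexedChars(myIsamLimit, 3));
    }
}
```

With a multibyte default charset, the same index can cover only about a third of the characters, which is why index creation can fail for the column sizes used by JCR.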
Repository configuration, workspace container entry example:
<workspace name="collaboration">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="mysql-utf8" />
      <property name="multi-db" value="false" />
      <property name="max-buffer-size" value="200k" />
      <property name="swap-directory" value="target/temp/swap/ws" />
    </properties>
    .....
You will also need to indicate the charset name, either at server level using the server parameter --character-set-server (see the MySQL server documentation for details) or at datasource configuration level by adding a new property as below:
<property name="connectionProperties" value="useUnicode=yes;characterEncoding=utf8;characterSetResults=UTF-8;" />
On PostgreSQL/PostgrePlus-backend, multilingual support can be enabled in different ways:
Using the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and other aspects. UTF-8 is widely used on Linux distributions by default, so it can be useful in such case.
Providing a number of different character sets defined in the PostgreSQL/PostgrePlus server, including multiple-byte character sets, to support storing text in any language, and providing character set translation between client and server. We recommend using a UTF-8 database charset; it allows any-to-any conversions and makes this issue transparent for the JCR.
Create a database with UTF-8 encoding and use the PgSQL dialect for the Workspace Container:
<workspace name="collaboration">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    <properties>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="pgsql" />
      <property name="multi-db" value="false" />
      <property name="max-buffer-size" value="200k" />
      <property name="swap-directory" value="target/temp/swap/ws" />
    </properties>
    .....
Frequently, a single database instance must be shared by several other applications. Some of our customers have also asked for a way to host several JCR instances in the same database instance. To fulfill this need, we had to review our queries and scope them to the current schema; it is now possible to have one JCR instance per DB schema instead of one per DB instance. To benefit from the work done for this feature, you will need to apply the configuration changes described below.
To enable this feature, you need to replace org.jboss.cache.loader.JDBCCacheLoader with org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader in the JBossCache configuration file.
Here is an example of this very part of the configuration:
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
<locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false"
lockAcquisitionTimeout="20000" />
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
<stateRetrieval timeout="20000" fetchInMemoryState="false" />
<sync />
</clustering>
<loaders passivation="false" shared="true">
<!-- All the data of the JCR locks needs to be loaded at startup -->
<preload>
<node fqn="/" />
</preload>
<!--
For another cache-loader class you should use another template with
cache-loader specific parameters
-->
<loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false"
ignoreModifications="false" purgeOnStartup="false">
<properties>
cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name}
cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create}
cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop}
cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey}
cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column}
cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type}
cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column}
cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type}
cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column}
cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource}
</properties>
</loader>
</loaders>
</jbosscache>
You can also obtain an example file from GitHub.
If you use HibernateService for JDBC connection management, you will need to specify the default schema explicitly by setting the "hibernate.default_schema" property in the configuration of HibernateService.
Here is an example:
<component>
<key>org.exoplatform.services.database.HibernateService</key>
<jmx-name>database:type=HibernateService</jmx-name>
<type>org.exoplatform.services.database.impl.HibernateServiceImpl</type>
<init-params>
<properties-param>
<name>hibernate.properties</name>
<description>Default Hibernate Service</description>
...........
<property name="hibernate.default_schema" value="${gatein.idm.datasource.schema:}"/>
</properties-param>
</init-params>
</component>
Search is an important function in eXo JCR, so it is worth knowing how to configure the eXo JCR search tool.
JCR index configuration. You can find this file here:
.../portal/WEB-INF/conf/jcr/repository-configuration.xml
<repository-service default-repository="db1">
  <repositories>
    <repository name="db1" system-workspace="ws" default-workspace="ws">
      ....
      <workspaces>
        <workspace name="ws">
          ....
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" />
              <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
              <property name="synonymprovider-config-path" value="/synonyms.properties" />
              <property name="indexing-configuration-path" value="/indexing-configuration.xml" />
              <property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" />
            </properties>
          </query-handler>
          ...
        </workspace>
      </workspaces>
    </repository>
  </repositories>
</repository-service>
Table 1.2.
Parameter | Default | Description | Since |
---|---|---|---|
index-dir | none | The location of the index directory. This parameter is mandatory. Up to 1.9, this parameter was called "indexDir". | 1.0 |
use-compoundfile | true | Advises lucene to use compound files for the index files. | 1.9 |
min-merge-docs | 100 | Minimum number of nodes in an index until segments are merged. | 1.9 |
volatile-idle-time | 3 | Idle time in seconds until the volatile index part is moved to a persistent index even though minMergeDocs is not reached. | 1.9 |
max-merge-docs | Integer.MAX_VALUE | Maximum number of nodes in segments that will be merged. The default value changed in JCR 1.9 to Integer.MAX_VALUE. | 1.9 |
merge-factor | 10 | Determines how often segment indices are merged. | 1.9 |
max-field-length | 10000 | The number of words that are fulltext indexed at most per property. | 1.9 |
cache-size | 1000 | Size of the document number cache. This cache maps uuids to lucene document numbers | 1.9 |
force-consistencycheck | false | Runs a consistency check on every startup. If false, a consistency check is only performed when the search index detects a prior forced shutdown. | 1.9 |
auto-repair | true | Errors detected by a consistency check are automatically repaired. If false, errors are only written to the log. | 1.9 |
query-class | QueryImpl | Class name that implements the javax.jcr.query.Query interface. This class must also extend the class org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl. | 1.9 |
document-order | true | If true and the query does not contain an 'order by' clause, result nodes will be in document order. For better performance when queries return a lot of nodes set to 'false'. | 1.9 |
result-fetch-size | Integer.MAX_VALUE | The number of results when a query is executed. Default value: Integer.MAX_VALUE (-> all). | 1.9 |
excerptprovider-class | DefaultXMLExcerpt | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider and should be used for the rep:excerpt() function in a query. | 1.9 |
support-highlighting | false | If set to true additional information is stored in the index to support highlighting using the rep:excerpt() function. | 1.9 |
synonymprovider-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider. The default value is null (-> not set). | 1.9 |
synonymprovider-config-path | none | The path to the synonym provider configuration file. This path is interpreted relative to the path parameter. If there is a path element inside the SearchIndex element, then this path is interpreted relative to the root path of that path element. Whether this parameter is mandatory depends on the synonym provider implementation. The default value is null (-> not set). | 1.9 |
indexing-configuration-path | none | The path to the indexing configuration file. | 1.9 |
indexing-configuration-class | IndexingConfigurationImpl | The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration. | 1.9 |
force-consistencycheck | false | If set to true, a consistency check is performed, depending on the parameter forceConsistencyCheck. If set to false, no consistency check is performed on startup, even if a redo log had been applied. | 1.9 |
spellchecker-class | none | The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker. | 1.9 |
spellchecker-more-popular | true | If set to true, the spellchecker returns only suggested words that are as frequent as or more frequent than the checked word. If set to false, the spellchecker returns null if the checked word exists in the dictionary, and otherwise returns the closest suggested word. | 1.10 |
spellchecker-min-distance | 0.55f | Minimal distance between checked word and proposed suggest word. | 1.10 |
errorlog-size | 50(Kb) | The default size of error log file in Kb. | 1.9 |
upgrade-index | false | Allows JCR to convert an existing index into the new format. Also, it is possible to set this property via system property, for example: -Dupgrade-index=true Indexes before JCR 1.12 will not run with JCR 1.12. Hence you have to run an automatic migration: Start JCR with -Dupgrade-index=true. The old index format is then converted in the new index format. After the conversion the new format is used. On the next start, you don't need this option anymore. The old index is replaced and a back conversion is not possible - therefore better take a backup of the index before. (Only for migrations from JCR 1.9 and later.) | 1.12 |
analyzer | org.apache.lucene.analysis.standard.StandardAnalyzer | Class name of a lucene analyzer to use for fulltext indexing of text. | 1.12 |
The maximum number of clauses permitted per BooleanQuery can be changed via the system property org.apache.lucene.maxClauseCount. The default value of this parameter is Integer.MAX_VALUE.
The global search index is configured in the above-mentioned configuration file (portal/WEB-INF/conf/jcr/repository-configuration.xml) in the tag "query-handler".
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
In fact, when using Lucene, you should always use the same analyzer for indexing and for querying, otherwise the results are unpredictable. You don't have to worry about this, eXo JCR does this for you automatically. If you don't like the StandardAnalyzer configured by default, just replace it by your own.
If you don't have a handy QueryHandler, you should learn how to create a customized Handler in 5 minutes.
By default, eXo JCR uses the Lucene StandardAnalyzer to index contents. This analyzer applies some standard filters in the method that analyzes the content:
public TokenStream tokenStream(String fieldName, Reader reader) {
  StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
  tokenStream.setMaxTokenLength(maxTokenLength);
  TokenStream result = new StandardFilter(tokenStream);
  result = new LowerCaseFilter(result);
  result = new StopFilter(result, stopSet);
  return result;
}
The first one (StandardFilter) removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms.
The second one (LowerCaseFilter) normalizes token text to lower case.
The last one (StopFilter) removes stop words from a token stream. The stop set is defined in the analyzer.
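To make the effect of the three filters tangible, here is a plain-Java approximation that does not use Lucene at all. It is only a sketch: the class name is hypothetical, tokens are split on whitespace instead of using StandardTokenizer, and every dot inside a token is removed, which is cruder than what the real StandardFilter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java approximation of the filter chain described above (no Lucene involved). */
final class FilterChainSketch {

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Simplification: split on whitespace instead of using StandardTokenizer.
        for (String raw : text.trim().split("\\s+")) {
            String token = raw;
            // StandardFilter-like step: drop a trailing 's and the dots of acronyms.
            if (token.endsWith("'s") || token.endsWith("'S")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace(".", "");
            // LowerCaseFilter-like step.
            token = token.toLowerCase(Locale.ROOT);
            // StopFilter-like step: skip stop words (and empty tokens).
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Peter's I.B.M. and THE Report", Set.of("and", "the")));
    }
}
```

Because the same chain is applied to indexed content and to query terms, a query for "peter" can match a document containing "Peter's".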
For specific cases, you may wish to use additional filters like ISOLatin1AccentFilter, which replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalents.
In order to use a different filter, you have to create a new analyzer, and a new search index to use the analyzer. You put it in a jar, which is deployed with your application.
The ISOLatin1AccentFilter is not present in the current Lucene version used by eXo. You can use the attached file. You can also create your own filter; the relevant method is
public final Token next(final Token reusableToken) throws java.io.IOException
which defines how chars are read and used by the filter.
The analyzer has to extend org.apache.lucene.analysis.standard.StandardAnalyzer and override the method
public TokenStream tokenStream(String fieldName, Reader reader)
to put your own filters. You can have a glance at the example analyzer attached to this article.
Now that we have the analyzer, we have to write the SearchIndex which will use it. You have to extend org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. You have to write the constructor, to set the right analyzer, and the method
public Analyzer getAnalyzer() { return MyAnalyzer; }
to return your analyzer. You can see the attached SearchIndex.
Since version 1.12, the analyzer can be set directly in the configuration, so creating a new SearchIndex only to use a new analyzer is no longer necessary.
If you wrote your own SearchIndex, then in portal/WEB-INF/conf/jcr/repository-configuration.xml you have to replace each
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
by your own class
<query-handler class="mypackage.indexation.MySearchIndex">
If you only set the analyzer (version 1.12 and later), then in portal/WEB-INF/conf/jcr/repository-configuration.xml you have to add the parameter "analyzer" to each query-handler config:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    ...
    <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
    ...
  </properties>
</query-handler>
When you start eXo, your SearchIndex will start to index contents with the specified filters.
Starting with version 1.9, the default search index implementation in JCR allows you to control which properties of a node are indexed. You also can define different analyzers for different nodes.
The configuration parameter is called indexingConfiguration and is not set by default. This means all properties of a node are indexed.
If you wish to configure the indexing behavior, you need to add a parameter to the query-handler element in your configuration file.
<property name="indexing-configuration-path" value="/indexing_configuration.xml"/>
The indexing configuration path can point to any file located on the file system or inside a jar or war file.
Please note that you have to declare the namespace prefixes in the configuration element that you are using throughout the XML file!
To optimize the index size, you can limit the node scope so that only certain properties of a node type are indexed.
With the below configuration, only properties named Text are indexed for nodes of type nt:unstructured. This configuration also applies to all nodes whose type extends from nt:unstructured.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
It is also possible to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured" boost="2.0">
    <property>Text</property>
  </index-rule>
</configuration>
If you do not wish to boost the complete node but only certain properties, you can also provide a boost value for the listed properties:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="3.0">Title</property>
    <property boost="1.5">Text</property>
  </index-rule>
</configuration>
You may also add a condition to the index rule and have multiple rules with the same nodeType. The first index rule that matches will apply and all remaining ones are ignored:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured" boost="2.0" condition="@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
In the above example, the first rule only applies if the nt:unstructured node has a priority property with a value 'high'. The condition syntax supports only the equals operator and a string literal.
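The "first matching rule wins" behavior can be illustrated with a small Java sketch. It is not the eXo JCR implementation: the class names are hypothetical, node properties are modeled as a simple string map, and node type inheritance is ignored (only exact type names match here).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch of "first matching index rule applies"; not the eXo JCR implementation. */
final class IndexRuleSketch {

    /** One index-rule element; a null conditionProperty means the rule has no condition. */
    static final class Rule {
        final String nodeType;
        final String conditionProperty;
        final String conditionValue;
        final float boost;

        Rule(String nodeType, String conditionProperty, String conditionValue, float boost) {
            this.nodeType = nodeType;
            this.conditionProperty = conditionProperty;
            this.conditionValue = conditionValue;
            this.boost = boost;
        }

        boolean matches(String type, Map<String, String> properties) {
            if (!nodeType.equals(type)) {
                return false; // simplification: subtypes are not considered
            }
            // Only the equals operator with a string literal is supported.
            return conditionProperty == null
                || conditionValue.equals(properties.get(conditionProperty));
        }
    }

    /** Returns the first rule, in declaration order, that matches the node. */
    static Optional<Rule> firstMatch(List<Rule> rules, String type, Map<String, String> properties) {
        for (Rule rule : rules) {
            if (rule.matches(type, properties)) {
                return Optional.of(rule);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("nt:unstructured", "priority", "high", 2.0f),
            new Rule("nt:unstructured", null, null, 1.0f));
        Optional<Rule> rule = firstMatch(rules, "nt:unstructured", Map.of("priority", "high"));
        System.out.println(rule.map(r -> r.boost).orElse(null));
    }
}
```

The order of the rules matters: if the unconditional rule were declared first, the conditional one would never be reached.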
You may also refer to properties in the condition that are not on the current node:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured" boost="2.0" condition="ancestor::*/@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured" boost="0.5" condition="parent::foo/@priority = 'low'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured" boost="1.5" condition="bar/@priority = 'medium'">
    <property>Text</property>
  </index-rule>
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
</configuration>
The indexing configuration also allows you to specify the type of a node in the condition. Please note however that the type match must be exact. It does not consider sub types of the specified node type.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured" boost="2.0" condition="element(*, nt:unstructured)/@priority = 'high'">
    <property>Text</property>
  </index-rule>
</configuration>
By default, all configured properties are fulltext indexed if they are of type STRING, and they are included in the node scope index. A node scope search normally finds all nodes of an index. That is, a query with jcr:contains(., 'foo') returns all nodes that have a string property containing the word 'foo'. You can explicitly exclude a property from the node scope index:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property nodeScopeIndex="false">Text</property>
  </index-rule>
</configuration>
You can disable indexing for nodes that are sub nodes of excluded paths and/or that are of a given type. To do so, add some lines to the configuration file:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.3.dtd">
<configuration xmlns:exo="http://www.exoplatform.com/jcr/exo/1.0">
  <exclude nodeType="exo:hiddenable"/>
  <exclude path="/my[2]/path"/>
  <exclude nodeType="exo:foo" path="/my/other[2]/path"/>
</configuration>
This will exclude nodes of type "exo:hiddenable" and nodes with the path "/my[2]/path" from the results. As you see you can also combine exclusions.
Sometimes it is useful to include the contents of descendant nodes in a single node, to make it easier to search content that is scattered across multiple nodes.
JCR allows you to define indexed aggregates, based on relative path patterns and primary node types.
The following example creates an indexed aggregate on nt:file that includes the content of the jcr:content node:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include>jcr:content</include>
  </aggregate>
</configuration>
You can also restrict the included nodes to a certain type:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include primaryType="nt:resource">jcr:content</include>
  </aggregate>
</configuration>
You may also use the * to match all child nodes:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include primaryType="nt:resource">*</include>
  </aggregate>
</configuration>
If you wish to include nodes up to a certain depth below the current node, you can add multiple include elements. E.g. the nt:file node may contain a complete XML document under jcr:content:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <aggregate primaryType="nt:file">
    <include>*</include>
    <include>*/*</include>
    <include>*/*/*</include>
  </aggregate>
</configuration>
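The reason why one include element is needed per depth level can be seen in the following Java sketch of the pattern matching. It assumes that each pattern segment matches exactly one path segment and that * matches any single node name; the class and method names are hypothetical and this is not the eXo JCR matching code.

```java
/** Sketch of matching a relative path against an include pattern such as "*" or "* / *". */
final class AggregateIncludeSketch {

    /** Returns true if every pattern segment equals the path segment or is the wildcard "*". */
    static boolean matches(String pattern, String relativePath) {
        String[] patternSegments = pattern.split("/");
        String[] pathSegments = relativePath.split("/");
        // Assumption: a pattern never matches a path of a different depth.
        if (patternSegments.length != pathSegments.length) {
            return false;
        }
        for (int i = 0; i < patternSegments.length; i++) {
            if (!patternSegments[i].equals("*") && !patternSegments[i].equals(pathSegments[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("*", "jcr:content"));
        System.out.println(matches("*", "jcr:content/body"));
        System.out.println(matches("*/*", "jcr:content/body"));
    }
}
```

Under this assumption, a single * include only reaches direct children, so covering three levels requires the three include elements shown above.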
In this configuration section, you define how a property has to be analyzed. If there is an analyzer configuration for a property, this analyzer is used for indexing and searching of this property. For example:
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
      <property>mytext</property>
    </analyzer>
    <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
      <property>mytext2</property>
    </analyzer>
  </analyzers>
</configuration>
The configuration above means that the property "mytext" for the entire workspace is indexed (and searched) with the Lucene KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyzer. Using different analyzers for different languages is particularly useful.
The WhitespaceAnalyzer tokenizes a property, the KeywordAnalyzer takes the property as a whole.
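The difference between the two analyzers can be mimicked without Lucene. In the sketch below (hypothetical class and method names), the first method splits a value on whitespace and the second keeps the whole value as a single token; neither changes the case of the text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java illustration of whitespace tokenization versus keyword tokenization. */
final class TokenizationSketch {

    /** Splits the value on runs of whitespace, like a whitespace-based analyzer would. */
    static List<String> whitespaceTokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Keeps the whole value as one token, like a keyword-based analyzer would. */
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("testing my analyzers"));
        System.out.println(keywordTokens("testing my analyzers"));
    }
}
```

With keyword tokenization, only a search for the complete value can match, whereas whitespace tokenization lets each word match on its own.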
When using analyzers, you may encounter an unexpected behavior when searching within a property compared to searching within a node scope. The reason is that the node scope always uses the global analyzer.
Let's suppose that the property "mytext" contains the text : "testing my analyzers" and that you haven't configured any analyzers for the property "mytext" (and not changed the default analyzer in SearchIndex).
If your query is for example:
xpath = "//*[jcr:contains(mytext,'analyzer')]"
This xpath does not return a hit in the node with the property above and default analyzers.
Also a search on the node scope
xpath = "//*[jcr:contains(.,'analyzer')]"
won't give a hit. Note that you can only set specific analyzers on a node property, and that the node scope indexing/analyzing is always done with the globally defined analyzer in the SearchIndex element.
Now, if you change the analyzer used to index the "mytext" property above to
<analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
  <property>mytext</property>
</analyzer>
and you do the same search again, then for
xpath = "//*[jcr:contains(mytext,'analyzer')]"
you would get a hit because of the word stemming (analyzers - analyzer).
The other search,
xpath = "//*[jcr:contains(.,'analyzer')]"
still would not give a result, since the node scope is indexed with the global analyzer, which in this case does not take into account any word stemming.
In conclusion, be aware that when using analyzers for specific properties, you might find a hit in a property for some search text, and you do not find a hit with the same search text in the node scope of the property!
Both index rules and index aggregates influence how content is indexed in JCR. If you change the configuration, the existing content is not automatically re-indexed according to the new rules. You, therefore, have to manually re-index the content when you change the configuration!
eXo JCR supports some advanced features, which are not specified in JSR 170:
Get a text excerpt with highlighted words that matches the query: ExcerptProvider.
Search a term and its synonyms: SynonymSearch
Search similar nodes: SimilaritySearch
Check spelling of a full text query statement: SpellChecker
Define index aggregates and rules: IndexingConfiguration.
eXo JCR allows using a persister to store its configuration. This section explains how to configure and use the eXo JCR persister.
JCR Repository Service uses the org.exoplatform.services.jcr.config.RepositoryServiceConfiguration component to read its configuration.
<component>
  <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
  <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
  <init-params>
    <value-param>
      <name>conf-path</name>
      <description>JCR configuration file</description>
      <value>/conf/standalone/exo-jcr-config.xml</value>
    </value-param>
  </init-params>
</component>
In the example, Repository Service will read the configuration from the file /conf/standalone/exo-jcr-config.xml.
In some cases, however, the configuration has to be changed on the fly, with the guarantee that the new configuration will be used afterwards, and without modifying the original file.
In this case, use the configuration persister feature, which allows storing the configuration in different locations.
On startup, the RepositoryServiceConfiguration component checks if a configuration persister was configured. In that case, it uses the provided ConfigurationPersister implementation class to instantiate the persister object.
Configuration with persister:
<component>
  <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
  <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
  <init-params>
    <value-param>
      <name>conf-path</name>
      <description>JCR configuration file</description>
      <value>/conf/standalone/exo-jcr-config.xml</value>
    </value-param>
    <properties-param>
      <name>working-conf</name>
      <description>working-conf</description>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="mysql" />
      <property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister" />
    </properties-param>
  </init-params>
</component>
Where:
source-name: JNDI source name configured in the InitialContextInitializer component (sourceName prior to v.1.9). Find more in database configuration.
dialect: SQL dialect which will be used with the database from source-name. Find more in database configuration.
persister-class-name: class name of the ConfigurationPersister interface implementation (persisterClassName prior to v.1.9).
ConfigurationPersister interface:
/**
 * Init persister.
 * Used by RepositoryServiceConfiguration on init.
 * @param params - persister configuration properties
 */
void init(PropertiesParam params) throws RepositoryConfigurationException;

/**
 * Read config data.
 * @return - config data stream
 */
InputStream read() throws RepositoryConfigurationException;

/**
 * Create table, write data.
 * @param confData - config data stream
 */
void write(InputStream confData) throws RepositoryConfigurationException;

/**
 * Tell if the config exists.
 * @return - flag
 */
boolean hasConfig() throws RepositoryConfigurationException;
The JCR Core implementation contains a persister which stores the repository configuration in a relational database using JDBC calls: org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister.
The implementation will create and use the table JCR_CONFIG in the provided database.
Developers can also implement their own persister for a particular use case.
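As an illustration of a custom persister, here is a minimal sketch that keeps the working configuration in a plain file. To stay self-contained it does not use the eXo classes: SimplePersister is a hypothetical stand-in for ConfigurationPersister, java.util.Properties replaces PropertiesParam, IOException replaces RepositoryConfigurationException, and the "file-path" property name is an assumption made for this example only. The loader shows one plausible way to instantiate a persister from its class name; it is not taken from the eXo sources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical stand-in mirroring the four methods of ConfigurationPersister.
interface SimplePersister {
    void init(Properties params) throws IOException;
    InputStream read() throws IOException;
    void write(InputStream confData) throws IOException;
    boolean hasConfig() throws IOException;
}

// Stores the working configuration in a single file.
class FilePersister implements SimplePersister {
    private Path file;

    @Override
    public void init(Properties params) throws IOException {
        // "file-path" is an assumed property name, not an eXo parameter.
        String path = params.getProperty("file-path");
        if (path == null) {
            throw new IOException("file-path property is required");
        }
        file = Paths.get(path);
    }

    @Override
    public InputStream read() throws IOException {
        return Files.newInputStream(file);
    }

    @Override
    public void write(InputStream confData) throws IOException {
        if (file.getParent() != null) {
            Files.createDirectories(file.getParent());
        }
        // Replace any previously stored configuration.
        Files.copy(confData, file, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public boolean hasConfig() {
        return Files.exists(file);
    }
}

// Instantiates a persister from its class name and initializes it.
class PersisterLoader {
    static SimplePersister load(String className, Properties params) throws Exception {
        SimplePersister persister =
            (SimplePersister) Class.forName(className).getDeclaredConstructor().newInstance();
        persister.init(params);
        return persister;
    }
}
```

A real implementation would implement the eXo ConfigurationPersister interface directly and be referenced through the persister-class-name property.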
eXo JCR persistent data container can work in two configuration modes:
Multi-database: One database for each workspace (used in standalone eXo JCR service mode)
Single-database: All workspaces persisted in one database (used in embedded eXo JCR service mode, e.g. in GateIn)
The data container uses the JDBC driver to communicate with the actual database software, i.e. any JDBC-enabled data storage can be used with eXo JCR implementation.
Currently the data container is tested with the following configurations:
MySQL 5.0.18 MYSQL Connector/J 5.0.8
MySQL 5.1.36 MYSQL Connector/J 5.1.14
MySQL 5.5.17 MYSQL Connector/J 5.1.18
MySQL Cluster (NDB engine)
PostgreSQL 8.2.4 JDBC4 Driver, Version 8.2-507
PostgreSQL 8.3.7 JDBC4 Driver, Version 8.3-606
PostgreSQL 8.4.14 JDBC4 Driver, Version 8.4-702
PostgreSQL 9.1.5 JDBC4 Driver, Version 9.1-902
PostgreSQL 9.2.4 JDBC4 Driver, Version 9.2-1002
Enterprise DB Postgres Plus Advanced Server 9.2.1 JDBC4 Driver, Version 9.2.1.3
Oracle DB 10g R2 (10.2.0.4), JDBC Driver Oracle 10g R2 (10.2.0.4)
Oracle DB 11g R1 (11.1.0.6.0), JDBC Driver Oracle 11g R1 (11.1.0.6.0)
Oracle DB 11g R2 (11.2.0.1.0), JDBC Driver Oracle 11g R2 (11.2.0.1.0)
DB2 9.7.4 IBM Data Server Driver for JDBC and SQLJ (JCC Driver) v.9.7
MS SQL Server 2005 SP3 JDBC Driver 3.0
MS SQL Server 2008 JDBC Driver 3.0
MS SQL Server 2008 R2 JDBC Driver 3.0
Sybase 15.0.3 ASE Driver: Sybase jConnect JDBC driver v7 (Build 26502)
Sybase ASE 15.7 Driver: Sybase jConnect JDBC driver v7 (build 26666)
HSQLDB (2.0.0)
H2 (1.3.161)
Every database product supports the ANSI SQL standards but also has its own specifics, so each database has its own configuration in eXo JCR, selected by the database dialect parameter. If you need a more detailed configuration of the database, you can edit the metadata SQL script files.
You can obtain the SQL scripts from the conf/storage/ directory of the jar file exo.jcr.component.core-XXX.XXX.jar. They can also be found on GitHub.
The next two tables show which script corresponds to which database, first for the single-database configuration (jcr-sjdbc scripts) and then for the multi-database configuration (jcr-mjdbc scripts).
MySQL DB | jcr-sjdbc.mysql.sql |
MySQL DB with utf-8 | jcr-sjdbc.mysql-utf8.sql |
MySQL DB with MyISAM* | jcr-sjdbc.mysql-myisam.sql |
MySQL DB with MyISAM and utf-8* | jcr-sjdbc.mysql-myisam-utf8.sql |
MySQL DB with NDB engine | jcr-sjdbc.mysql-ndb.sql |
MySQL DB with NDB engine and utf-8 | jcr-sjdbc.mysql-ndb-utf8.sql |
PostgreSQL and Postgres Plus | jcr-sjdbc.pgsql.sql |
Oracle DB | jcr-sjdbc.ora.sql |
DB2 | jcr-sjdbc.db2.sql |
MS SQL Server | jcr-sjdbc.mssql.sql |
Sybase | jcr-sjdbc.sybase.sql |
HSQLDB | jcr-sjdbc.sql |
H2 | jcr-sjdbc.h2.sql |
MySQL DB | jcr-mjdbc.mysql.sql |
MySQL DB with utf-8 | jcr-mjdbc.mysql-utf8.sql |
MySQL DB with MyISAM* | jcr-mjdbc.mysql-myisam.sql |
MySQL DB with MyISAM and utf-8* | jcr-mjdbc.mysql-myisam-utf8.sql |
MySQL DB with NDB engine | jcr-mjdbc.mysql-ndb.sql |
MySQL DB with NDB engine and utf-8 | jcr-mjdbc.mysql-ndb-utf8.sql |
PostgreSQL and Postgres Plus | jcr-mjdbc.pgsql.sql |
Oracle DB | jcr-mjdbc.ora.sql |
DB2 | jcr-mjdbc.db2.sql |
MS SQL Server | jcr-mjdbc.mssql.sql |
Sybase | jcr-mjdbc.sybase.sql |
HSQLDB | jcr-mjdbc.sql |
H2 | jcr-mjdbc.h2.sql |
If non-ANSI node names are used, the database must have multi-language support. Some JDBC drivers need additional parameters to establish a Unicode-friendly connection. For example, with MySQL an additional parameter must be appended to the JDBC URL:
jdbc:mysql://exoua.dnsalias.net/portal?characterEncoding=utf8
There are preconfigured configuration files for HSQLDB. Look for these files in /conf/portal and /conf/standalone folders of the jar-file exo.jcr.component.core-XXX.XXX.jar or source-distribution of eXo JCR implementation.
By default, the configuration files are located in the service jars: /conf/portal/configuration.xml (eXo services, including the JCR Repository Service) and exo-jcr-config.xml (repositories configuration). In the GateIn product, JCR is configured in the portal web application: portal/WEB-INF/conf/jcr/jcr-configuration.xml (JCR Repository Service and related services) and repository-configuration.xml (repositories configuration).
Read more about Repository configuration.
Please note that JCR requires at least the READ_COMMITTED isolation level. Other RDBMS configurations can cause side effects and issues, so make sure that a proper isolation level is configured on the database server side.
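The isolation level that a data source actually delivers can be checked from Java with plain JDBC. The following is a minimal sketch, not part of eXo JCR; the IsolationCheck class and its method names are hypothetical. It only relies on the standard java.sql.Connection constants.

```java
import java.sql.Connection;
import java.sql.SQLException;

class IsolationCheck {
    // True if the JDBC isolation level is READ_COMMITTED or stricter.
    static boolean isSufficient(int level) {
        return level == Connection.TRANSACTION_READ_COMMITTED
            || level == Connection.TRANSACTION_REPEATABLE_READ
            || level == Connection.TRANSACTION_SERIALIZABLE;
    }

    // Reads the level from an open connection and checks it.
    static boolean isSufficient(Connection connection) throws SQLException {
        return isSufficient(connection.getTransactionIsolation());
    }
}
```

Such a check could be run once against a connection obtained from the configured data source before starting the repository.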
On DB2, the RDBMS reindexing feature uses queries based on LIMIT and OFFSET clauses, which are not enabled by default. You can enable them by executing the following commands:
$ db2set DB2_COMPATIBILITY_VECTOR=MYS
$ db2stop
$ db2start
Statistics are collected automatically starting from DB2 Version 9. However, statistics collection must be launched manually during the very first start, otherwise it could take a very long time. Run the 'RUNSTATS' command
RUNSTATS ON TABLE <scheme>.<table> WITH DISTRIBUTION AND INDEXES ALL
for JCR_SITEM (or JCR_MITEM) and JCR_SVALUE (or JCR_MVALUE) tables.
If you don't want to enable the LIMIT/OFFSET clauses, you can still use "db2-mys" as the dialect. However, please note that indexing is then 120 times slower.
The MySQL MyISAM engine is not supported because it lacks transaction support and integrity checks. Use it only if you do not expect any support and if read performance matters more than consistency in your use case. This dialect is provided for the community only.
MySQL relies on collected statistics for keeping track of data distribution in tables and for optimizing join statements, but you can manually call 'ANALYZE' to update statistics if needed. For example
ANALYZE TABLE JCR_SITEM, JCR_SVALUE
When using RDBMS reindexing with PostgreSQL, you need to set "enable_seqscan" to "off" or "default_statistics_target" to at least "50".
Although the PostgreSQL/Postgres Plus server performs query optimization automatically, you can manually call the 'ANALYZE' command to collect statistics, which can influence performance. For example:
ANALYZE JCR_SITEM
ANALYZE JCR_SVALUE
If the parameter standard_conforming_strings is enabled on a version prior to 9.1, you need to use "pgsql-scs" as the dialect.
One more mandatory JCR requirement for underlying databases is a case-sensitive collation. Microsoft SQL Server customers (both 2005 and 2008) must configure their server with a collation that suits their needs and requirements, but it must be case sensitive. For more information, please refer to the Microsoft SQL Server documentation page "Selecting a SQL Server Collation".
MS SQL DB server's optimizer automatically processes queries to increase performance. Optimization is based on statistical data which is collected automatically, but you can manually call Transact-SQL command 'UPDATE STATISTICS' which in very few situations may increase performance. For example
UPDATE STATISTICS JCR_SITEM
UPDATE STATISTICS JCR_SVALUE
Sybase DB Server optimizer automatically processes queries to increase performance. Optimization is based on statistical data which is collected automatically, but you can manually call Transact-SQL command 'update statistics' which in very few situations may increase performance. For example
update statistics JCR_SITEM
update statistics JCR_SVALUE
Oracle DB automatically collects statistics to optimize performance of queries, but you can manually call 'ANALYZE' command to start collecting statistics immediately which may improve performance. For example
ANALYZE TABLE JCR_SITEM COMPUTE STATISTICS
ANALYZE TABLE JCR_SVALUE COMPUTE STATISTICS
ANALYZE TABLE JCR_SREF COMPUTE STATISTICS
ANALYZE INDEX JCR_PK_SITEM COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SITEM_PARENT_FK COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SITEM_PARENT COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SITEM_PARENT_NAME COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SITEM_PARENT_ID COMPUTE STATISTICS
ANALYZE INDEX JCR_PK_SVALUE COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SVALUE_PROPERTY COMPUTE STATISTICS
ANALYZE INDEX JCR_PK_SREF COMPUTE STATISTICS
ANALYZE INDEX JCR_IDX_SREF_PROPERTY COMPUTE STATISTICS
ANALYZE INDEX JCR_PK_SCONTAINER COMPUTE STATISTICS
Isolated-database configuration uses a single database for the repository but separate database tables for each workspace. The first step is to configure the data container in the org.exoplatform.services.naming.InitialContextInitializer service. It is the JNDI context initializer, which registers (binds) naming resources (DataSources) for data containers.
For example:
<external-component-plugins> <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component> <component-plugin> <name>bind.datasource</name> <set-method>addPlugin</set-method> <type>org.exoplatform.services.naming.BindReferencePlugin</type> <init-params> <value-param> <name>bind-name</name> <value>jdbcjcr</value> </value-param> <value-param> <name>class-name</name> <value>javax.sql.DataSource</value> </value-param> <value-param> <name>factory</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> <properties-param> <name>ref-addresses</name> <description>ref-addresses</description> <property name="driverClassName" value="org.postgresql.Driver"/> <property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/> <property name="username" value="exoadmin"/> <property name="password" value="exo12321"/> </properties-param> </init-params> </component-plugin> </external-component-plugins>
We configure the database connection parameters:
driverClassName, e.g. "org.hsqldb.jdbcDriver", "com.mysql.jdbc.Driver", "org.postgresql.Driver"
url, e.g. "jdbc:hsqldb:file:target/temp/data/portal", "jdbc:mysql://exoua.dnsalias.net/jcr"
username, e.g. "sa", "exoadmin"
password, e.g. "", "exo12321"
When the data container configuration is done, we can configure the repository service. Each workspace will be configured for the same data container.
For example:
<workspaces> <workspace name="ws"> <!-- for system storage --> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr" /> <property name="db-structure-type" value="isolated" /> ... </properties> ... </container> ... </workspace> <workspace name="ws1"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr" /> <property name="db-structure-type" value="isolated" /> ... </properties> ... </container> ... </workspace> </workspaces>
In this way, we have configured two workspaces which will be persisted in different database tables.
Starting from v.1.9, repository configuration parameters support human-readable value formats (e.g. 200K for 200 Kbytes, 30m for 30 minutes).
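To make the human-readable formats concrete, here is a minimal sketch of how such values could be parsed. It is not the eXo JCR parser: the HumanReadable class is hypothetical, the size suffixes K, M and G are assumed to be binary multiples (1K = 1024 bytes), and the time suffixes s, m, h and d are an assumed set.

```java
class HumanReadable {
    // Parses a size such as "200K" into bytes (assumes 1K = 1024 bytes).
    static long parseBytes(String text) {
        String s = text.trim().toUpperCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 'K': factor = 1024L; break;
            case 'M': factor = 1024L * 1024; break;
            case 'G': factor = 1024L * 1024 * 1024; break;
            default: factor = 1; break;
        }
        // Strip the suffix only when one was recognized.
        String digits = factor == 1 ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits.trim()) * factor;
    }

    // Parses a duration such as "30m" into milliseconds.
    // A value without a suffix is taken as milliseconds (assumption).
    static long parseMillis(String text) {
        String s = text.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty duration");
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 's': factor = 1000L; break;
            case 'm': factor = 60_000L; break;
            case 'h': factor = 3_600_000L; break;
            case 'd': factor = 86_400_000L; break;
            default: factor = 1; break;
        }
        String digits = factor == 1 ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits.trim()) * factor;
    }
}
```

Under these assumptions, "200K" corresponds to 204800 bytes and "30m" to 1800000 milliseconds.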
The multi-database configuration option is now deprecated. Use isolated database configuration instead.
You need to configure each workspace in a repository. The workspaces can be placed on different remote servers if needed.
First of all, configure the data containers in the org.exoplatform.services.naming.InitialContextInitializer service. It is the JNDI context initializer, which registers (binds) naming resources (DataSources) for data containers.
For example:
<component>
  <key>org.exoplatform.services.naming.InitialContextInitializer</key>
  <type>org.exoplatform.services.naming.InitialContextInitializer</type>
  <component-plugins>
    <component-plugin>
      <name>bind.datasource</name>
      <set-method>addPlugin</set-method>
      <type>org.exoplatform.services.naming.BindReferencePlugin</type>
      <init-params>
        <value-param>
          <name>bind-name</name>
          <value>jdbcjcr</value>
        </value-param>
        <value-param>
          <name>class-name</name>
          <value>javax.sql.DataSource</value>
        </value-param>
        <value-param>
          <name>factory</name>
          <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
        </value-param>
        <properties-param>
          <name>ref-addresses</name>
          <description>ref-addresses</description>
          <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
          <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/>
          <property name="username" value="sa"/>
          <property name="password" value=""/>
        </properties-param>
      </init-params>
    </component-plugin>
    <component-plugin>
      <name>bind.datasource</name>
      <set-method>addPlugin</set-method>
      <type>org.exoplatform.services.naming.BindReferencePlugin</type>
      <init-params>
        <value-param>
          <name>bind-name</name>
          <value>jdbcjcr1</value>
        </value-param>
        <value-param>
          <name>class-name</name>
          <value>javax.sql.DataSource</value>
        </value-param>
        <value-param>
          <name>factory</name>
          <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
        </value-param>
        <properties-param>
          <name>ref-addresses</name>
          <description>ref-addresses</description>
          <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
          <property name="url" value="jdbc:mysql://exoua.dnsalias.net/jcr"/>
          <property name="username" value="exoadmin"/>
          <property name="password" value="exo12321"/>
          <property name="maxActive" value="50"/>
          <property name="maxIdle" value="5"/>
          <property name="initialSize" value="5"/>
        </properties-param>
      </init-params>
    </component-plugin>
  </component-plugins>
</component>
When the data container configuration is done, we can configure the repository service. Each workspace will be configured for its own data container.
For example:
<workspaces> <workspace name="ws"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr"/> <property name="db-structure-type" value="multi"/> ... </properties> </container> ... </workspace> <workspace name="ws1"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr1"/> <property name="db-structure-type" value="multi"/> ... </properties> </container> ... </workspace> </workspaces>
In this way, we have configured two workspaces which will be persisted in two different databases (ws in HSQLDB, ws1 in MySQL).
It's simpler to configure a single-database data container. We only have to configure one naming resource.
For example:
<external-component-plugins> <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component> <component-plugin> <name>bind.datasource</name> <set-method>addPlugin</set-method> <type>org.exoplatform.services.naming.BindReferencePlugin</type> <init-params> <value-param> <name>bind-name</name> <value>jdbcjcr</value> </value-param> <value-param> <name>class-name</name> <value>javax.sql.DataSource</value> </value-param> <value-param> <name>factory</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> <properties-param> <name>ref-addresses</name> <description>ref-addresses</description> <property name="driverClassName" value="org.postgresql.Driver"/> <property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/> <property name="username" value="exoadmin"/> <property name="password" value="exo12321"/> <property name="maxActive" value="50"/> <property name="maxIdle" value="5"/> <property name="initialSize" value="5"/> </properties-param> </init-params> </component-plugin> </external-component-plugins>
And configure repository workspaces in repositories configuration with this one database.
For example:
<workspaces>
  <workspace name="ws">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      <properties>
        <property name="source-name" value="jdbcjcr"/>
        <property name="db-structure-type" value="single" />
        ...
      </properties>
    </container>
    ...
  </workspace>
  <workspace name="ws1">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      <properties>
        <property name="source-name" value="jdbcjcr"/>
        <property name="db-structure-type" value="single" />
        ...
      </properties>
    </container>
    ...
  </workspace>
</workspaces>
In this way, we have configured two workspaces which will be persisted in one database (PostgreSQL).
Workspaces can be added dynamically during runtime.
This can be performed in two steps:
First, ManageableRepository.configWorkspace(WorkspaceEntry wsConfig) registers a new configuration in RepositoryContainer and creates a WorkspaceContainer.
Second, the main step, ManageableRepository.createWorkspace(String workspaceName) creates the new workspace.
eXo JCR provides two ways to interact with the database: JDBCStorageConnection, which uses simple queries, and CQJDBCStorageConnection, which uses complex queries to reduce the number of database calls.
Simple queries are used if you choose org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer:
<workspaces>
  <workspace name="ws">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
      ...
    </container>
  </workspace>
</workspaces>
Complex queries are used if you choose org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer:
<workspaces>
  <workspace name="ws">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
  </workspace>
</workspaces>
Why use complex queries?
They are optimised to reduce the number of requests to the database.
Why use simple queries?
Simple queries are implemented to support as many database dialects as possible.
Simple queries do not use subqueries or left or right joins.
Some databases (such as Oracle and MySQL) support hints to increase query performance. eXo JCR has a separate Complex Query implementation for the Oracle dialect, which uses query hints to increase the performance of a few important queries.
To enable this option, set the following configuration property:
<workspace name="ws">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
    <properties>
      <property name="dialect" value="oracle"/>
      <property name="force.query.hints" value="true" />
      ......
Query hints are enabled by default.
eXo JCR uses query hints only for Complex Query Oracle dialect. For all other dialects this parameter is ignored.
The current configuration of eXo JCR uses the Apache DBCP connection pool (org.apache.commons.dbcp.BasicDataSourceFactory).
It is possible to set a large value for the maxActive parameter in configuration.xml. This means that the pool (i.e. the JDBC driver) uses many TCP/IP ports on the client machine. As a result, the data container can throw exceptions like "Address already in use". To solve this problem, configure the client machine's networking software to use shorter timeouts for opened TCP/IP ports.
Microsoft Windows has the MaxUserPort and TcpTimedWaitDelay registry keys in the node HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. By default these keys are unset; set each one with values like these:
"TcpTimedWaitDelay"=dword:0000001e, sets TIME_WAIT parameter to 30 seconds, default is 240.
"MaxUserPort"=dword:00001b58, sets the maximum of open ports to 7000 or higher, default is 5000.
A sample registry file is below:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"MaxUserPort"=dword:00001b58
"TcpTimedWaitDelay"=dword:0000001e
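Registry dword values are written in hexadecimal, so it can help to convert them when checking the settings. The following small sketch (the RegistryDword class is hypothetical) converts the two entries from the sample file to decimal.

```java
class RegistryDword {
    // Converts an entry such as "dword:0000001e" to its decimal value.
    static long toDecimal(String entry) {
        String hex = entry.substring(entry.indexOf(':') + 1);
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("dword:0000001e")); // TcpTimedWaitDelay: 30 (seconds)
        System.out.println(toDecimal("dword:00001b58")); // MaxUserPort: 7000
    }
}
```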
By default JCR Values are stored in the Workspace Data container along with the JCR structure (i.e. Nodes and Properties). eXo JCR offers an additional option of storing JCR Values separately from Workspace Data container, which can be extremely helpful to keep Binary Large Objects (BLOBs) for example.
Value storage configuration is a part of Repository configuration, find more details there.
Tree-based storage is recommended for most cases. If you run an application on Amazon EC2, the S3 option may be interesting for your architecture. Simple 'flat' storage is fast at creating and deleting values, so it might be a good compromise for small storages.
Tree file value storage holds Values in tree-like file-system files. The path property points to the root directory in which the files are stored.
This is the recommended type of external storage. It can contain a large number of files, limited only by the free space on the disk or volume.
A disadvantage is that Value deletion takes longer because unused tree nodes must be removed.
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
  <properties>
    <property name="path" value="data/values"/>
  </properties>
  <filters>
    <filter property-type="Binary" min-value-size="1M"/>
  </filters>
</value-storage>
Where:
id: The unique identifier of the value storage, used for linking with properties stored in the workspace container.
path: A location where value files will be stored.
Each file value storage can have one or more filters for incoming values. A filter can match values by property type (property-type), property name (property-name), ancestor path (ancestor-path) and/or size of the stored values (min-value-size, in bytes). The code sample uses a filter with property-type and min-value-size only, i.e. a storage for binary values larger than 1 MB. It is recommended to store only properties with large values in file value storage.
Another example shows a value storage with a separate location for large files (a filter with min-value-size of 20 MB). A value storage uses ORed logic when selecting a filter: the first filter in the list is asked first and, if it does not match, the next one is asked, and so on. Here a value that matches the 20 MB min-value-size filter is stored in the path "data/20Mvalues", and all others in "data/values".
<value-storages>
  <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
    <properties>
      <property name="path" value="data/20Mvalues"/>
    </properties>
    <filters>
      <filter property-type="Binary" min-value-size="20M"/>
    </filters>
  </value-storage>
  <value-storage id="Storage #2" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
    <properties>
      <property name="path" value="data/values"/>
    </properties>
    <filters>
      <filter property-type="Binary" min-value-size="1M"/>
    </filters>
  </value-storage>
</value-storages>
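The first-match selection described above can be sketched in a few lines of Java. This is an illustration only, not the eXo JCR implementation: the ValueStorageSelector class is hypothetical, each storage is reduced to one Binary filter, and a value is assumed to match when its size is strictly greater than min-value-size.

```java
import java.util.ArrayList;
import java.util.List;

class ValueStorageSelector {
    // One storage with a single Binary filter (simplified for this sketch).
    private static final class Storage {
        final String path;
        final long minValueSize;

        Storage(String path, long minValueSize) {
            this.path = path;
            this.minValueSize = minValueSize;
        }
    }

    private final List<Storage> storages = new ArrayList<>();

    // Storages are asked in the order in which they were added.
    ValueStorageSelector add(String path, long minValueSize) {
        storages.add(new Storage(path, minValueSize));
        return this;
    }

    // Returns the path of the first matching storage, or null when no filter
    // matches and the value stays in the workspace data container.
    String select(boolean binary, long valueSize) {
        if (!binary) {
            return null;
        }
        for (Storage storage : storages) {
            if (valueSize > storage.minValueSize) { // assumption: strictly greater
                return storage.path;
            }
        }
        return null;
    }
}
```

With the two storages from the example added in the same order, a 25 MB binary value goes to "data/20Mvalues", a 5 MB binary value goes to "data/values", and a 500 KB value matches no filter. Because the first match wins, the storage with the larger threshold has to be listed first.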
Simple (flat) file value storage is not recommended for production use because of capacity limits on most file systems.
However, if you are confident in your file system or the amount of data is small, it may be useful because Value removal is faster.
It holds Values in flat file-system files. The path property points to the root directory in which the files are stored.
<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.SimpleFileValueStorage">
  <properties>
    <property name="path" value="data/values"/>
  </properties>
  <filters>
    <filter property-type="Binary" min-value-size="1M"/>
  </filters>
</value-storage>
eXo JCR supports Content-addressable storage feature for Values storing.
Content-addressable storage, also referred to as associative storage and abbreviated CAS, is a mechanism for storing information that can be retrieved based on its content, not its storage location. It is typically used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations.
Content-addressable Value storage stores unique content only once. Different properties (values) with the same content are stored as one data file shared between those values.
Storage size therefore decreases for applications that manage potentially identical data.
For example: if you have 100 different properties containing the same data (e.g. mail attachment), the storage stores only one single file. The file will be shared with all referencing properties.
If a property Value changes, it is stored in an additional file, or it shares an existing file with other values that point to the same content.
The storage calculates the Value content address each time the property is changed, so CAS write operations are much more expensive than in non-CAS storages.
Content address calculation is based on java.security.MessageDigest hash computation and has been tested with the MD5 and SHA1 algorithms.
CAS storage works most efficiently on data that does not change often. For data that changes frequently, CAS is not as efficient as location-based addressing.
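To illustrate why identical content ends up in one shared file, here is a minimal sketch of a content address computed with java.security.MessageDigest. The ContentAddress class and the hexadecimal encoding of the address are assumptions for this example; the actual address format used by eXo JCR is not described here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentAddress {
    // Returns the digest of the content as a lowercase hexadecimal string.
    static String of(byte[] content, String algorithm) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] hash = digest.digest(content);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Two values with the same bytes always produce the same address, so they can point to one file, while any change to the content produces a different address and therefore a new file. The hash has to be recomputed over the whole content on every write, which is where the extra write cost comes from.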
CAS support can be enabled for Tree and Simple File Value Storage types.
To enable CAS support, just configure it in JCR Repositories configuration as we do for other Value Storages.
<workspaces>
  <workspace name="ws">
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      <properties>
        <property name="source-name" value="jdbcjcr"/>
        <property name="dialect" value="oracle"/>
        <property name="multi-db" value="false"/>
        <property name="max-buffer-size" value="200k"/>
        <property name="swap-directory" value="target/temp/swap/ws"/>
      </properties>
      <value-storages>
        <!-- here -->
        <value-storage id="ws" class="org.exoplatform.services.jcr.impl.storage.value.fs.CASableTreeFileValueStorage">
          <properties>
            <property name="path" value="target/temp/values/ws"/>
            <property name="digest-algo" value="MD5"/>
            <property name="vcas-type" value="org.exoplatform.services.jcr.impl.storage.value.cas.JDBCValueContentAddressStorageImpl"/>
            <property name="jdbc-source-name" value="jdbcjcr"/>
            <property name="jdbc-dialect" value="oracle"/>
          </properties>
          <filters>
            <filter property-type="Binary"/>
          </filters>
        </value-storage>
      </value-storages>
Properties:
digest-algo: Digest hash algorithm (MD5 and SHA1 were tested).
vcas-type: Value CAS internal data type. A JDBC-backed implementation is currently available: org.exoplatform.services.jcr.impl.storage.value.cas.JDBCValueContentAddressStorageImpl.
jdbc-source-name: JDBCValueContentAddressStorageImpl-specific parameter, the database that will be used to save CAS metadata. The simplest choice is to use the same one as in the workspace container.
jdbc-dialect: JDBCValueContentAddressStorageImpl-specific parameter, the database dialect. The simplest choice is to use the same one as in the workspace container.
Each JCR workspace has its own persistent storage to hold the workspace's item data. eXo Content Repository can be configured to use one or more workspaces, which are logical units of the repository content. The physical data storage mechanism is configured using the mandatory element container. The container type is given by the class attribute, which holds the fully qualified name of an org.exoplatform.services.jcr.storage.WorkspaceDataContainer subclass, for example:
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr1"/> <property name="dialect" value="hsqldb"/> <property name="multi-db" value="true"/> <property name="max-buffer-size" value="200K"/> <property name="swap-directory" value="target/temp/swap/ws"/> <property name="lazy-node-iterator-page-size" value="50"/> <property name="acl-bloomfilter-false-positive-probability" value="0.1d"/> <property name="acl-bloomfilter-elements-number" value="1000000"/> <property name="check-sns-new-connection" value="false"/> <property name="batch-size" value="1000"/> </properties>
Workspace Data Container specific parameters:
max-buffer-size: A threshold in bytes, if a value size is greater, then it will be spooled to a temporary file. Default value is 200k.
swap-directory: A location where the value will be spooled if no value storage is configured but a max-buffer-size is exceeded. Default value is the value of "java.io.tmpdir" system property.
lazy-node-iterator-page-size: "Lazy" child nodes iterator settings. Defines size of page, the number of nodes that are retrieved from persistent storage at once. Default value is 100.
acl-bloomfilter-false-positive-probability: ACL Bloom-filter settings. ACL Bloom-filter desired false positive probability. Range [0..1]. Default value 0.1d.
acl-bloomfilter-elements-number: ACL Bloom-filter settings. Expected number of ACL-elements in the Bloom-filter. Default value 1000000.
check-sns-new-connection: Defines if we need to create new connection for checking if an older same-name sibling exists. Default value is "false".
trigger-events-for-descendants-on-rename: Indicates whether or not each descendant item must be included into the changes log in case of a rename. If it is set to false, it will allow to increase performance on rename operations if there is a big amount of nodes under the source parent node but it will decrease the performance with a small amount of sub nodes. If it is set to true, we will get the exact opposite, the performance will be better in case of small amount of sub nodes and worse in case of big amount of sub nodes. When this parameter is not set, the application will rely on the parameter max-descendant-nodes-allowed-on-move to add or not the descendant items to the changes log. If this parameter is not set but the parameter trigger-events-for-descendants-on-move is set, it will have the same value.
trigger-events-for-descendants-on-move: Indicates whether or not each descendant item must be included into the changes log in case of a move. If it is set to false, it will allow to increase performance on move operations if there is a big amount of nodes under the source parent node but it will decrease the performance with a small amount of sub nodes. If it is set to true, we will get the exact opposite, the performance will be better in case of small amount of sub nodes and worse in case of big amount of sub nodes. When this parameter is not set, the application will rely on the parameter max-descendant-nodes-allowed-on-move to add or not the descendant items to the changes log.
max-descendant-nodes-allowed-on-move: The maximum number of descendant nodes allowed before considering that the descendant items should not be included in the changes log. This gives the best possible performance whatever the total number of sub nodes. The default value is 100. This parameter is used only if trigger-events-for-descendants-on-move is not set and, in the case of a rename, trigger-events-for-descendants-on-rename is not set either.
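The interplay of the three parameters above can be summarized as a small decision function. This is a sketch of the rules as described in this section, not the eXo JCR code: the DescendantEventsPolicy class is hypothetical, a null value stands for "parameter not set", and treating a count equal to the maximum as still included is an assumption.

```java
class DescendantEventsPolicy {
    static final int DEFAULT_MAX_DESCENDANTS = 100;

    private final Boolean onMove;   // trigger-events-for-descendants-on-move, null if not set
    private final Boolean onRename; // trigger-events-for-descendants-on-rename, null if not set
    private final int maxDescendants;

    DescendantEventsPolicy(Boolean onMove, Boolean onRename, Integer maxDescendants) {
        this.onMove = onMove;
        this.onRename = onRename;
        this.maxDescendants = maxDescendants != null ? maxDescendants : DEFAULT_MAX_DESCENDANTS;
    }

    // Should descendant items be added to the changes log on a move?
    boolean includeOnMove(int descendantCount) {
        if (onMove != null) {
            return onMove;
        }
        return descendantCount <= maxDescendants; // assumption: boundary is inclusive
    }

    // Should descendant items be added to the changes log on a rename?
    boolean includeOnRename(int descendantCount) {
        if (onRename != null) {
            return onRename;
        }
        if (onMove != null) {
            return onMove; // the rename setting falls back to the move setting
        }
        return descendantCount <= maxDescendants;
    }
}
```

For example, with no parameter set, a rename of a node with 50 descendants includes them in the changes log, while a rename of a node with 5000 descendants does not.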
Bloom filters are not supported by all cache implementations; so far only the Infinispan implementation supports them. They are used to avoid reading nodes that definitely do not have an ACL. The acl-bloomfilter-false-positive-probability and acl-bloomfilter-elements-number parameters are used to configure such filters.
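To get a feeling for what the two Bloom filter parameters imply, the textbook sizing formulas can be applied to them: for n expected elements and a desired false positive probability p, the optimal number of bits is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. The sketch below uses these standard formulas only; whether eXo JCR or Infinispan sizes its filter exactly this way is an assumption, and the BloomSizing class is hypothetical.

```java
class BloomSizing {
    // Optimal number of bits for n elements and false positive probability p (0 < p < 1).
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions for n elements and m bits.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

With the default values (1000000 elements, probability 0.1), these formulas give roughly 4.8 million bits, i.e. about 600 KB, and 3 hash functions. Lowering the probability increases the memory needed for the filter.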
eXo JCR has an RDB (JDBC) based, production ready Workspace Data Container.
JDBC Workspace Data Container specific parameters:
source-name: JDBC data source name, registered in JNDI by InitialContextInitializer (sourceName prior to v1.9). This property is mandatory.
dialect: Database dialect, one of "hsqldb", "h2", "mysql", "mysql-myisam", "mysql-utf8", "mysql-myisam-utf8", "pgsql", "pgsql-scs", "oracle", "oracle-oci", "mssql", "sybase", "derby", "db2", "db2-mys", "db2v8". The default value is "auto".
multi-db: Enables a multi-database container if set to "true"; if set to "false", the container is configured as single-database. Note that this property is deprecated; use db-structure-type instead.
db-structure-type: Can be set to isolated, multi, or single to select the corresponding data container configuration. This property is mandatory.
db-tablename-suffix: If db-structure-type is set to isolated, the tables used by the repository service are named as follows:
JCR_I${db-tablename-suffix} - for items
JCR_V${db-tablename-suffix} - for values
JCR_R${db-tablename-suffix} - for references
By default, db-tablename-suffix equals the workspace name, but it can be set to any suitable value in the configuration.
batch-size: the batch size. Default value is -1 (disabled)
Workspace Data Container MAY support external storages for javax.jcr.Value (which can be the case for BLOB values for example) using the optional element value-storages. Data Container will try to read or write Value using underlying value storage plugin if the filter criteria (see below) match the current property.
<value-storages> <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage"> <properties> <property name="path" value="data/values"/> </properties> <filters> <filter property-type="Binary" min-value-size="1M"/><!-- Values larger than 1 MB --> </filters> ......... </value-storages>
Here, the class of each value-storage is a subclass of org.exoplatform.services.jcr.storage.value.ValueStoragePlugin, and properties are optional plugin-specific parameters.
filters: Each file value storage can have one or more filters for incoming values. If there are several filter criteria, they all have to match (AND condition).
A filter can match values by property type (property-type), property name (property-name), ancestor path (ancestor-path) and/or the size of values stored (min-value-size, e.g. 1M, 4.2G, 100 (bytes)).
The code sample above uses a filter with property-type and min-value-size only. This means that the storage is used only for binary values whose size is greater than 1 MB.
It's recommended to store properties with large values in a file value storage only.
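The filter rules described above (all criteria must match, sizes written as 1M, 4.2G or 100) can be sketched as follows. This is an illustration only, not the eXo JCR filter class: the names are hypothetical, a null criterion means "not configured", the K/M/G suffixes are assumed to be multiples of 1024, and the ancestor-path check is a plain prefix test.

```java
import java.util.Locale;

// Hypothetical sketch of value-storage filter matching (AND of all criteria).
final class ValueFilterSketch {
    private final String propertyType;  // null means "any type"
    private final String propertyName;  // null means "any name"
    private final String ancestorPath;  // null means "any path"
    private final Long minValueSize;    // in bytes, null means "any size"

    ValueFilterSketch(String propertyType, String propertyName,
                      String ancestorPath, String minValueSize) {
        this.propertyType = propertyType;
        this.propertyName = propertyName;
        this.ancestorPath = ancestorPath;
        this.minValueSize = minValueSize == null ? null : parseSize(minValueSize);
    }

    /** Parses sizes such as "100", "1M" or "4.2G" into bytes (1024-based, an assumption). */
    static long parseSize(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        long multiplier = 1L;
        char last = s.charAt(s.length() - 1);
        if (last == 'K') {
            multiplier = 1024L;
        } else if (last == 'M') {
            multiplier = 1024L * 1024;
        } else if (last == 'G') {
            multiplier = 1024L * 1024 * 1024;
        }
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1);
        }
        return (long) (Double.parseDouble(s) * multiplier);
    }

    /** Every configured criterion has to match for the storage to be used. */
    boolean matches(String type, String name, String path, long sizeInBytes) {
        return (propertyType == null || propertyType.equalsIgnoreCase(type))
                && (propertyName == null || propertyName.equals(name))
                && (ancestorPath == null || path.startsWith(ancestorPath))
                && (minValueSize == null || sizeInBytes > minValueSize);
    }
}
```

With the filter from the sample configuration (property-type "Binary", min-value-size "1M"), a 2 MB binary value matches, while a 1 KB binary value or a 2 MB string value does not.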
The dialect for PostgreSQL/PostgrePlus is set automatically and depends on the version of the database. If you change the default value of the standard_conforming_strings parameter, then you must configure one of the following dialects manually:
PgSQL - this dialect is used if standard_conforming_strings is set to off. This is the default value for versions before 9.1.
PgSQL-SCS - this dialect is used if standard_conforming_strings is set to on. This is the default value for versions starting from 9.1.
As with PostgreSQL, the dialect for DB2 is set automatically, depending on the version of the database. If you change the default value of the DB2_COMPATIBILITY_VECTOR parameter, then you must configure one of the following dialects manually:
DB2V8 - this dialect is used if the version of the database is lower than 9.
DB2 - this dialect is used if the version of the database is not lower than 9 and DB2_COMPATIBILITY_VECTOR is set to 0.
DB2-MYS - this dialect is used if the version of the database is not lower than 9 and DB2_COMPATIBILITY_VECTOR is set to MYS. This is the default value for versions beginning from 9.7.2.
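The two selection rules above for PostgreSQL and DB2 can be written down as a small decision function. This is only a sketch of the rules as stated in the text, not the auto-detection code of eXo JCR; the class and method names are hypothetical, and rejecting compatibility vector values other than 0 and MYS is an assumption.

```java
// Hypothetical helper that mirrors the dialect rules described above.
final class DialectSelectionSketch {

    /** Chooses between "pgsql" and "pgsql-scs" from standard_conforming_strings. */
    static String postgresDialect(boolean standardConformingStrings) {
        return standardConformingStrings ? "pgsql-scs" : "pgsql";
    }

    /** Chooses between "db2v8", "db2" and "db2-mys". */
    static String db2Dialect(int majorVersion, String compatibilityVector) {
        if (majorVersion < 9) {
            return "db2v8";
        }
        if ("MYS".equalsIgnoreCase(compatibilityVector)) {
            return "db2-mys";
        }
        if ("0".equals(compatibilityVector)) {
            return "db2";
        }
        // Assumption: other vector values are not covered by the text above.
        throw new IllegalArgumentException(
                "Unsupported DB2_COMPATIBILITY_VECTOR: " + compatibilityVector);
    }
}
```

The returned names are the dialect values listed for the dialect parameter.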
mysql - use this dialect to create JCR tables with the InnoDB engine (the default)
mysql-utf8 - use this dialect to create JCR tables with the InnoDB engine and UTF-8 encoding support
mysql-myisam - use this dialect to create JCR tables with the MyISAM engine
mysql-myisam-utf8 - use this dialect to create JCR tables with the MyISAM engine and UTF-8 encoding support
mysql-ndb - use this dialect to create JCR tables with the NDB engine (MySQL Cluster)
mysql-ndb-utf8 - use this dialect to create JCR tables with the NDB engine (MySQL Cluster) and UTF-8 encoding support
The MySQL NDB engine does not support foreign keys, which may lead to improper item removal and, as a consequence, to InvalidItemStateException. In this case, you will need to use the consistency checker tool.
Starting from version 1.9, JCR Service supports creating REST services with Groovy scripts.
The feature is based on the RESTful framework and uses the ResourceContainer concept.
Scripts should extend ResourceContainer and should be stored in JCR as a node of type exo:groovyResourceContainer.
For a detailed step-by-step implementation of REST services, see Create REST service step by step.
The following component configuration enables the Groovy services loader:
<component> <type>org.exoplatform.services.jcr.ext.script.groovy.GroovyScript2RestLoader</type> <init-params> <object-param> <name>observation.config</name> <object type="org.exoplatform.services.jcr.ext.script.groovy.GroovyScript2RestLoader$ObservationListenerConfiguration"> <field name="repository"> <string>repository</string> </field> <field name="workspaces"> <collection type="java.util.ArrayList"> <value> <string>collaboration</string> </value> </collection> </field> </object> </object-param> </init-params> </component>
To deploy eXo JCR to JBoss, do the following steps:
Download the latest version of eXo JCR .ear file distribution.
Copy <jcr.ear> into <%jboss_home%/server/default/deploy>
Put exo-configuration.xml into the root: <%jboss_home%/exo-configuration.xml>
Configure JAAS by inserting XML fragment shown below into <%jboss_home%/server/default/conf/login-config.xml>
<application-policy name="exo-domain"> <authentication> <login-module code="org.exoplatform.services.security.j2ee.JbossLoginModule" flag="required"></login-module> </authentication> </application-policy>
Ensure that you use JBossTS Transaction Service and JBossCache Transaction Manager. Your exo-configuration.xml must contain the following parts:
<component> <key>org.jboss.cache.transaction.TransactionManagerLookup</key> <type>org.jboss.cache.GenericTransactionManagerLookup</type> </component> <component> <key>org.exoplatform.services.transaction.TransactionService</key> <type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type> <init-params> <value-param> <name>timeout</name> <value>300</value> </value-param> </init-params> </component>
Start server:
bin/run.sh for Unix
bin/run.bat for Windows
Try accessing http://localhost:8080/browser with root/exo as login/password. If you have done everything right, you will get access to the repository browser.
To configure the repository manually, create a new configuration file (e.g., exo-jcr-configuration.xml). For details, see JCR Configuration. Your configuration should look like this:
<repository-service default-repository="repository1"> <repositories> <repository name="repository1" system-workspace="ws1" default-workspace="ws1"> <security-domain>exo-domain</security-domain> <access-control>optional</access-control> <authentication-policy>org.exoplatform.services.jcr.impl.core.access.JAASAuthenticator</authentication-policy> <workspaces> <workspace name="ws1"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr" /> <property name="dialect" value="oracle" /> <property name="multi-db" value="false" /> <property name="update-storage" value="false" /> <property name="max-buffer-size" value="200k" /> <property name="swap-directory" value="../temp/swap/production" /> </properties> <value-storages> see "Value storage configuration" part. </value-storages> </container> <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer"> <properties> <property name="root-nodetype" value="nt:unstructured" /> </properties> </initializer> <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache"> see "Cache configuration" part. </cache> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> see "Indexer configuration" part. </query-handler> <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl"> see "Lock Manager configuration" part. </lock-manager> </workspace> <workspace name="ws2"> ... </workspace> <workspace name="wsN"> ... </workspace> </workspaces> </repository> </repositories> </repository-service>
Then, update RepositoryServiceConfiguration configuration in exo-configuration.xml to use this file:
<component> <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key> <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type> <init-params> <value-param> <name>conf-path</name> <description>JCR configuration file</description> <value>exo-jcr-configuration.xml</value> </value-param> </init-params> </component>
Every node of the cluster MUST have the same Network File System mounted, with read and write permissions on it.
"/mnt/tornado" - path to the mounted Network File System (all cluster nodes must use the same NFS).
Every node of the cluster MUST use the same database.
The same clusters on different nodes MUST have the same names (e.g., if the indexer cluster in workspace production on the first node has the name "production_indexer_cluster", then the indexer clusters in workspace production on all other nodes MUST have the same name "production_indexer_cluster").
The configuration of every workspace in the repository must contain the following parts:
<value-storages> <value-storage id="system" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage"> <properties> <property name="path" value="/mnt/tornado/temp/values/production" /> <!--path within NFS where ValueStorage will hold its data--> </properties> <filters> <filter property-type="Binary" /> </filters> </value-storage> </value-storages>
<cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache"> <properties> <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-data.xml" /> <!-- path to JBoss Cache configuration for data storage --> <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> <!-- path to JGroups configuration --> <property name="jbosscache-cluster-name" value="JCR_Cluster_cache" /> <!-- JBoss Cache data storage cluster name --> <property name="jgroups-multiplexer-stack" value="false" /> <property name="jbosscache-shareable" value="true" /> </properties> </cache>
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" /> <property name="index-dir" value="/mnt/tornado/temp/jcrlucenedb/production" /> <!-- path within NFS where the index will be stored --> <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-indexer.xml" /> <!-- path to JBoss Cache configuration for indexer --> <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> <!-- path to JGroups configuration --> <property name="jbosscache-cluster-name" value="JCR_Cluster_indexer" /> <!-- JBoss Cache indexer cluster name --> <property name="jgroups-multiplexer-stack" value="false" /> <property name="jbosscache-shareable" value="true" /> </properties> </query-handler>
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl"> <properties> <property name="time-out" value="15m" /> <property name="jbosscache-configuration" value="jar:/conf/portal/test-jbosscache-lock.xml" /> <!-- path to JBoss Cache configuration for lock manager --> <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> <!-- path to JGroups configuration --> <property name="jbosscache-cluster-name" value="JCR_Cluster_locks" /> <!-- JBoss Cache locks cluster name --> <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks"/> <!-- the name of the DB table where lock's data will be stored --> <property name="jbosscache-cl-cache.jdbc.table.create" value="true"/> <property name="jbosscache-cl-cache.jdbc.table.drop" value="false"/> <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk"/> <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn"/> <property name="jbosscache-cl-cache.jdbc.node.column" value="node"/> <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent"/> <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr"/> <property name="jgroups-multiplexer-stack" value="false" /> <property name="jbosscache-shareable" value="true" /> </properties> </lock-manager>
This section shows you how to use and configure JBoss Cache in a clustered environment. You will also learn how to use the template-based configuration offered by eXo JCR for JBoss Cache instances.
Each of the components mentioned uses instances of JBoss Cache for caching in a clustered environment, so every element has its own transport and has to be configured properly. Workspaces usually have similar configurations, but with different cluster names and possibly some other parameters. The simplest way to configure them is to define a separate configuration file for each component in each workspace:
<property name="jbosscache-configuration" value="conf/standalone/test-jbosscache-lock-db1-ws1.xml" />
However, if there are more than a few workspaces, configuring them this way can be painful and hard to manage. eXo JCR offers a template-based configuration for JBoss Cache instances. You can have one template for the Lock Manager, one for the Indexer and one for the data container, and use them in all the workspaces, defining the map of substitution parameters in the main configuration file. Simply define ${jbosscache-<parameter name>} inside the XML template and list the correct value in the JCR configuration file just below "jbosscache-configuration", as shown:
Template:
... <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> ...
and JCR configuration file:
... <property name="jbosscache-configuration" value="jar:/conf/portal/jbosscache-lock.xml" /> <property name="jbosscache-cluster-name" value="JCR-cluster-locks-db1-ws" /> ...
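The substitution step described above can be illustrated with a short sketch. It is not the code eXo JCR uses to process templates: the class name and method are hypothetical, and failing on a placeholder that has no matching property is an assumption made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${jbosscache-<parameter name>} substitution.
final class TemplateSubstitutionSketch {
    // Matches placeholders such as ${jbosscache-cluster-name}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{(jbosscache-[^}]+)\\}");

    /** Replaces every placeholder with the property of the same name. */
    static String fill(String template, Map<String, String> properties) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = properties.get(name);
            if (value == null) {
                // Assumption: an unresolved placeholder is treated as an error.
                throw new IllegalArgumentException("No value for " + name);
            }
            // quoteReplacement keeps '$' and '\' in values from being interpreted
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Applied to the template fragment above with jbosscache-cluster-name set to "JCR-cluster-locks-db1-ws", the clusterName attribute ends up with that value, so one template can serve every workspace.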
JGroups is used by JBoss Cache for network communications and transport in a clustered environment. If property "jgroups-configuration" is defined in component configuration, it will be injected into the JBoss Cache instance on startup.
<property name="jgroups-configuration" value="your/path/to/modified-udp.xml" />
As mentioned above, each component (lock manager, data container and query handler) of each workspace requires its own clustered environment. In other words, each has its own cluster with a unique name. By default, each cluster performs multicasts on a separate port, which causes a lot of unnecessary overhead on the cluster. That's why JGroups offers the multiplexer feature, which makes it possible to use a single channel for a set of clusters. This feature reduces network overhead and increases the performance and stability of the application. To enable the multiplexer stack, define an appropriate configuration file (udp-mux.xml is the one pre-shipped with eXo JCR) and set "jgroups-multiplexer-stack" to "true".
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" />
It is now highly recommended to use the shared transport instead of the multiplexer. To do so, disable the multiplexer stack in the configuration of each component, then set the singleton_name property of your JGroups configuration to a unique name.
<property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="false" />
A JBoss Cache instance is quite resource-consuming, and by default there are 3 JBoss Cache instances (one for the indexer, one for the lock manager and one for the data container) for each workspace. If you intend to have a lot of workspaces, it can make sense to share one JBoss Cache instance between several cache instances of the same type (i.e. indexer, lock manager or data container). This feature is disabled by default and can be enabled at component configuration level (i.e. indexer configuration, lock manager configuration and/or data container configuration) by setting the property "jbosscache-shareable" to true, as below:
<property name="jbosscache-shareable" value="true" />
Once enabled, this feature allows the JBoss Cache instance used by the component to be reused by other components of the same type (i.e. indexer, lock manager or data container) that have exactly the same JBoss Cache configuration (except the eviction configuration, which can be different). This means that all the parameters of type ${jbosscache-<parameter name>} must be identical between the components of the same type in different workspaces. In other words, if we use the same values for the parameters of type ${jbosscache-<parameter name>} in each workspace, only 3 JBoss Cache instances (one for the indexer, one for the lock manager and one for the data container) are used, whatever the total number of workspaces defined.
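The sharing rule above (same component type and identical ${jbosscache-<parameter name>} values give the same instance) can be pictured as a registry keyed by those values. The sketch below is only a model of that idea and not the eXo JCR implementation: the class name is hypothetical, a plain Object stands in for a JBoss Cache instance, and the eviction configuration is simply left out of the key.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of sharing cache instances between workspaces.
final class ShareableCacheRegistrySketch {
    // A plain Object stands in for a JBoss Cache instance in this sketch.
    private final Map<String, Object> instances = new ConcurrentHashMap<>();

    /**
     * Returns the instance for this component type, template and parameter
     * values, creating it only if no identical configuration exists yet.
     */
    Object getOrCreate(String componentType, String templatePath,
                       Map<String, String> templateParameters) {
        // TreeMap sorts the parameters so that the key is order-independent.
        String key = componentType + "|" + templatePath + "|"
                + new TreeMap<>(templateParameters);
        return instances.computeIfAbsent(key, k -> new Object());
    }

    /** Number of distinct instances created so far. */
    int size() {
        return instances.size();
    }
}
```

Two workspaces whose lock managers use the same template and the same parameter values would receive the same instance, while the indexer would still get its own.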
The eXo JCR implementation is shipped with ready-to-use JBoss Cache configuration templates for JCR's components. They are located in the application package, in the /conf/portal/ folder.
Data container template is "jbosscache-data.xml":
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <sync /> </clustering> <!-- Eviction configuration --> <eviction wakeUpInterval="5000"> <default algorithmClass="org.jboss.cache.eviction.ExpirationAlgorithm" actionPolicyClass="org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.ParentNodeEvictionActionPolicy" eventQueueSize="1000000"> <property name="maxNodes" value="1000000" /> <property name="warnNoExpirationKey" value="false" /> </default> </eviction> </jbosscache>
Table 1.5. Template variables
Variable | Description |
---|---|
jbosscache-cluster-name | cluster name (must be unique) |
The lock manager template is "jbosscache-lock.xml":
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <sync /> </clustering> <loaders passivation="false" shared="true"> <preload> <node fqn="/" /> </preload> <loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false"> <properties> cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name} cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create} cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop} cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey} cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column} cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type} cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column} cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type} cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column} cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource} </properties> </loader> </loaders> </jbosscache>
To prevent any consistency issue regarding the lock data please ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and that your database engine is transactional.
Table 1.6. Template variables
Variable | Description |
---|---|
jbosscache-cluster-name | cluster name (must be unique) |
jbosscache-cl-cache.jdbc.table.name | the name of the table. |
jbosscache-cl-cache.jdbc.table.create | can be true or false. Indicates whether to create the table during startup. If true, the table is created if it doesn't already exist. The default value is true. |
jbosscache-cl-cache.jdbc.table.drop | can be true or false. Indicates whether to drop the table during shutdown. The default value is true. |
jbosscache-cl-cache.jdbc.table.primarykey | the name of the primary key for the table. |
jbosscache-cl-cache.jdbc.fqn.column | FQN column name. The default value is 'fqn'. |
jbosscache-cl-cache.jdbc.fqn.type | FQN column type. The default value is 'varchar(255)'. |
jbosscache-cl-cache.jdbc.node.column | node contents column name. The default value is 'node'. |
jbosscache-cl-cache.jdbc.node.type | node contents column type. The default value is 'blob'. This type must specify a valid binary data type for the database being used. |
jbosscache-cl-cache.jdbc.parent.column | Parent column name. The default value is 'parent'. |
jbosscache-cl-cache.jdbc.datasource | JNDI name of the DataSource. |
The indexer template is "jbosscache-indexer.xml":
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <sync /> </clustering> </jbosscache>
Table 1.7. Template variables
Variable | Description |
---|---|
jbosscache-cluster-name | cluster name (must be unique) |
What does LockManager do?
In general, LockManager stores Lock objects, so it can give a Lock object or can release it.
LockManager is also responsible for removing Locks that live too long. The lifetime can be configured with the "time-out" property.
JCR provides one basic implementation of LockManager:
org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl
CacheableLockManagerImpl stores Lock objects in JBoss Cache, so Locks are replicated and affect the whole cluster, not only a single node. In addition, JBoss Cache has a JDBCCacheLoader, so Locks are stored in the database.
You can enable LockManager by adding lock-manager-configuration to workspace-configuration.
For example:
<workspace name="ws"> ... <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl"> <properties> <property name="time-out" value="15m" /> ... </properties> </lock-manager> ... </workspace>
Here, the time-out parameter is the interval after which expired Locks are removed. LockRemover is a separate thread that periodically asks LockManager to remove Locks that have lived too long.
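The expiry mechanism described above can be sketched in a few lines. This is a simplified model, not CacheableLockManagerImpl or LockRemover: the class is hypothetical, locks are identified by a plain string, time is passed in explicitly so that the behaviour is easy to follow, and reading "time-out" as the maximum lifetime of a lock is an interpretation of the text.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of time-out based lock removal.
final class LockExpirySketch {
    // Maps a node identifier to the time (in milliseconds) its lock was created.
    private final Map<String, Long> lockBirthMillis = new ConcurrentHashMap<>();
    private final long timeOutMillis;

    LockExpirySketch(long timeOutMillis) {
        this.timeOutMillis = timeOutMillis;
    }

    void addLock(String nodeId, long nowMillis) {
        lockBirthMillis.put(nodeId, nowMillis);
    }

    boolean isLocked(String nodeId) {
        return lockBirthMillis.containsKey(nodeId);
    }

    /**
     * What a LockRemover-like thread would call periodically.
     * Removes locks older than the time-out and returns how many were removed.
     */
    int removeExpired(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lockBirthMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeOutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

With a time-out of 15 minutes (900000 ms), a lock created at time 0 is removed by a pass running just after the 15 minute mark, while a lock created 10 minutes in is kept.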
The configuration uses the template JBoss-cache configuration for all LockManagers.
Lock template configuration
test-jbosscache-lock.xml
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="500" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <sync /> </clustering> <loaders passivation="false" shared="true"> <!-- All the data of the JCR locks needs to be loaded at startup --> <preload> <node fqn="/" /> </preload> <!-- For another cache-loader class you should use another template with cache-loader specific parameters --> <loader class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader" async="false" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false"> <properties> cache.jdbc.table.name=${jbosscache-cl-cache.jdbc.table.name} cache.jdbc.table.create=${jbosscache-cl-cache.jdbc.table.create} cache.jdbc.table.drop=${jbosscache-cl-cache.jdbc.table.drop} cache.jdbc.table.primarykey=${jbosscache-cl-cache.jdbc.table.primarykey} cache.jdbc.fqn.column=${jbosscache-cl-cache.jdbc.fqn.column} cache.jdbc.fqn.type=${jbosscache-cl-cache.jdbc.fqn.type} cache.jdbc.node.column=${jbosscache-cl-cache.jdbc.node.column} cache.jdbc.node.type=${jbosscache-cl-cache.jdbc.node.type} cache.jdbc.parent.column=${jbosscache-cl-cache.jdbc.parent.column} cache.jdbc.datasource=${jbosscache-cl-cache.jdbc.datasource} </properties> </loader> </loaders> </jbosscache>
To prevent any consistency issue regarding the lock data, please ensure that your cache loader is org.exoplatform.services.jcr.impl.core.lock.jbosscache.JDBCCacheLoader and that your database engine is transactional.
As you can see, all configurable parameters are template placeholders, which will be replaced by the LockManager's configuration parameters:
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl"> <properties> <property name="time-out" value="15m" /> <property name="jbosscache-configuration" value="test-jbosscache-lock.xml" /> <property name="jgroups-configuration" value="udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" /> <property name="jbosscache-cluster-name" value="JCR-cluster-locks-ws" /> <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks_ws" /> <property name="jbosscache-cl-cache.jdbc.table.create" value="true" /> <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" /> <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_ws_pk" /> <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" /> <property name="jbosscache-cl-cache.jdbc.fqn.type" value="AUTO"/> <property name="jbosscache-cl-cache.jdbc.node.column" value="node" /> <property name="jbosscache-cl-cache.jdbc.node.type" value="AUTO"/> <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" /> <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" /> <property name="jbosscache-shareable" value="true" /> </properties> </lock-manager>
Configuration requirements:
jbosscache-cl-cache.jdbc.fqn.type and jbosscache-cl-cache.jdbc.node.type are the same as cache.jdbc.fqn.type and cache.jdbc.node.type in the JBoss Cache configuration. You can set these data types according to your database type, or set them to AUTO (or not set them at all), in which case the data type is detected automatically.
As you can see, jgroups-configuration is moved to a separate configuration file, udp-mux.xml. In this case, the udp-mux.xml file is a common JGroups configuration for all components (QueryHandler, Cache, LockManager), but we can still create our own configuration.
Our udp-mux.xml:
<config> <UDP singleton_name="JCR-cluster" mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}" mcast_port="${jgroups.udp.mcast_port:45588}" tos="8" ucast_recv_buf_size="20000000" ucast_send_buf_size="640000" mcast_recv_buf_size="25000000" mcast_send_buf_size="640000" loopback="false" discard_incompatible_packets="true" max_bundle_size="64000" max_bundle_timeout="30" use_incoming_packet_handler="true" ip_ttl="${jgroups.udp.ip_ttl:2}" enable_bundling="false" enable_diagnostics="true" thread_naming_pattern="cl" use_concurrent_stack="true" thread_pool.enabled="true" thread_pool.min_threads="2" thread_pool.max_threads="8" thread_pool.keep_alive_time="5000" thread_pool.queue_enabled="true" thread_pool.queue_max_size="1000" thread_pool.rejection_policy="discard" oob_thread_pool.enabled="true" oob_thread_pool.min_threads="1" oob_thread_pool.max_threads="8" oob_thread_pool.keep_alive_time="5000" oob_thread_pool.queue_enabled="false" oob_thread_pool.queue_max_size="100" oob_thread_pool.rejection_policy="Run" /> <PING timeout="2000" num_initial_members="3"/> <MERGE2 max_interval="30000" min_interval="10000"/> <FD_SOCK /> <FD timeout="10000" max_tries="5" shun="true" /> <VERIFY_SUSPECT timeout="1500" /> <BARRIER /> <pbcast.NAKACK use_stats_for_retransmission="false" exponential_backoff="150" use_mcast_xmit="true" gc_lag="0" retransmit_timeout="50,300,600,1200" discard_delivered_msgs="true"/> <UNICAST timeout="300,600,1200" /> <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/> <VIEW_SYNC avg_send_interval="60000" /> <pbcast.GMS print_local_addr="true" join_timeout="3000" shun="false" view_bundling="true"/> <FC max_credits="500000" min_threshold="0.20"/> <FRAG2 frag_size="60000" /> <!--pbcast.STREAMING_STATE_TRANSFER /--> <pbcast.STATE_TRANSFER /> <pbcast.FLUSH /> </config>
Table 1.8. FQN type and node type in different databases
DataBase name | Node data type | FQN data type |
---|---|---|
default | BLOB | VARCHAR(512) |
HSQL | OBJECT | VARCHAR(512) |
MySQL | LONGBLOB | VARCHAR(512) |
ORACLE | BLOB | VARCHAR2(512) |
PostgreSQL/PostgrePlus | bytea | VARCHAR(512) |
MSSQL | VARBINARY(MAX) | VARCHAR(512) |
DB2 | BLOB | VARCHAR(512) |
Sybase | IMAGE | VARCHAR(512) |
Ingres | long byte | VARCHAR(512) |
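Table 1.8 can be read as a lookup from database to column types, which is roughly what the AUTO setting has to resolve. The sketch below only encodes the table; it is not the detection code of eXo JCR. The class name and the lowercase database keys passed to the methods are assumptions chosen for the example.

```java
// Hypothetical lookup that encodes Table 1.8; the keys are illustrative only.
final class LockColumnTypesSketch {

    /** Node data type for the given database, falling back to the default row. */
    static String nodeType(String database) {
        switch (database) {
            case "hsqldb":  return "OBJECT";
            case "mysql":   return "LONGBLOB";
            case "pgsql":   return "bytea";
            case "mssql":   return "VARBINARY(MAX)";
            case "sybase":  return "IMAGE";
            case "ingres":  return "long byte";
            case "oracle":
            case "db2":
            default:        return "BLOB";
        }
    }

    /** FQN data type: only Oracle differs from the default in the table. */
    static String fqnType(String database) {
        return "oracle".equals(database) ? "VARCHAR2(512)" : "VARCHAR(512)";
    }
}
```

Setting the two type properties explicitly, as opposed to AUTO, amounts to overriding the values this lookup would return.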
There are three options:
I. The new Shareable Cache feature is not going to be used and all locks should be kept after migration.
Ensure that the same lock tables are used in the configuration;
Start the server;
II. The new Shareable Cache feature is not going to be used and all locks should be removed after migration.
Ensure that the same lock tables are used in the configuration;
Start the server WITH the system property -Dorg.exoplatform.jcr.locks.force.remove=true;
Stop the server;
Start the server (WITHOUT the system property -Dorg.exoplatform.jcr.locks.force.remove);
III. The new Shareable Cache feature will be used (in this case all locks are removed after migration).
Start the server WITH the system property -Dorg.exoplatform.jcr.locks.force.remove=true;
Stop the server;
Start the server (WITHOUT the system property -Dorg.exoplatform.jcr.locks.force.remove);
(Not mandatory) Manually remove the old lock tables;
This section shows you how to configure QueryHandler indexing in a clustered environment.
JCR offers multiple indexing strategies, for both standalone and clustered environments, which either take advantage of running in a single JVM or make the best use of all the resources available in a cluster. JCR uses the Lucene library as its underlying search and indexing engine, but Lucene has several limitations that greatly reduce the possibilities and limit the use of cluster advantages. That's why eXo JCR offers four strategies suited to its own use cases: standalone, clustered with shared index, clustered with local indexes, and RSync-based. Each one has its pros and cons.
The standalone strategy provides a stack of indexes to achieve greater performance within a single JVM.
It combines an in-memory buffer index directory with delayed file-system flushing. This index is called "Volatile" and it is also used in searches. Under certain conditions, the volatile index is flushed to the persistent storage (file system) as a new index directory. This gives very good results for write operations.
The clustered implementation with local indexes is built on the same strategy, with a volatile in-memory index buffer and delayed flushing to persistent storage.
As this implementation is designed for a clustered environment, it has additional mechanisms for data delivery within the cluster. The actual text extraction jobs are done on the same node that performs the content operations (i.e. write operations). Prepared "documents" (a Lucene term for a block of data ready for indexing) are replicated across the cluster nodes and processed by the local indexes, so each cluster instance has the same index content. When a new node joins the cluster, it has no initial index, so one must be created. There are several supported ways of doing this. The simplest is to copy the index manually, but this is not the intended use. If no initial index is found, JCR uses automated scenarios. They are controlled via configuration (see the "index-recovery-mode" parameter) and offer either full re-indexing from the database or copying from another cluster node.
Keeping multiple copies of the index, one on each instance, can be costly. A shared index can be used instead (see diagram below).
This indexing strategy combines the advantages of an in-memory index with a shared persistent index, offering "near" real-time search capabilities: newly added content is accessible via search practically immediately. Nodes index data in their own volatile (in-memory) indexes, but the persistent indexes are managed by a single "coordinator" node only. Each cluster instance has read access to the shared index and performs queries by combining its results with those found in its own in-memory index. Take into account that the shared folder must be configured in your system environment (i.e. a mounted NFS folder). In some extremely rare cases the volatile indexes of different cluster instances can differ slightly for a while; within a few seconds they will be up to date.
The shared index is consistent and stable enough, but slow, while the local index is fast but requires a lot of time for re-synchronization when a cluster node leaves the cluster for a short period of time. The RSync-based index solves this problem while keeping the speed advantages of the local file system.
This strategy works like the shared index, but stores the actual data on the local file system instead of a shared one. It eventually triggers a synchronization job that works at the level of file blocks and synchronizes only modified data. Diagram shows it in action. Only a single node in the cluster, the Coordinator node, is responsible for modifying the index files. When data is persisted, the corresponding command is fired, starting synchronization jobs all over the cluster.
See more about Search Configuration.
Configuration example:
<workspace name="ws">
  <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
    <properties>
      <property name="index-dir" value="shareddir/index/db1/ws" />
      <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
      <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
      <property name="jgroups-configuration" value="udp-mux.xml" />
      <property name="jgroups-multiplexer-stack" value="true" />
      <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
      <property name="max-volatile-time" value="60" />
      <property name="rdbms-reindexing" value="true" />
      <property name="reindexing-page-size" value="1000" />
      <property name="index-recovery-mode" value="from-coordinator" />
      <property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
      <property name="indexing-thread-pool-size" value="16" />
    </properties>
  </query-handler>
</workspace>
Table 1.9. Config properties description
Property name | Description |
---|---|
index-dir | The path to the index. |
changesfilter-class | The FQN of the class that defines the policy used to manage Lucene index changes. This class must extend org.exoplatform.services.jcr.impl.core.query.IndexerChangesFilter. It must be set in a cluster environment to define the clustering strategy to adopt. To use the Shared Indexes Strategy, set it to org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter. If you prefer the Local Indexes Strategy, set it to org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter. |
jbosscache-configuration | The template of the JBoss Cache configuration for all query-handlers in the repository (search, cache, locks). |
jgroups-configuration | The path to the JGroups configuration. It should no longer be a set of JGroups stack definitions but a normal JGroups configuration file, with the shared transport configured by simply setting the JGroups property singleton_name to a unique name (it must remain unique from one portal container to another). This file is also pre-bundled with templates and is recommended for use. |
jgroups-multiplexer-stack | If set to true, it indicates that the file referenced by the parameter jgroups-configuration is actually a file defining a set of JGroups multiplexer stacks. In the XML tag jgroupsConfig within the JBoss Cache configuration, you can then set the name of the multiplexer stack to use with the attribute multiplexerStack. Please note that the JGroups multiplexer has been deprecated by the JGroups team and replaced by the shared transport, so it is highly recommended not to use it anymore. |
jbosscache-cluster-name | The cluster name (must be unique). |
max-volatile-time | The maximum time to live for the Volatile Index. |
rdbms-reindexing | Indicates whether the RDBMS re-indexing mechanism must be used; the default value is true. |
reindexing-page-size | The maximum number of nodes that can be retrieved from storage for re-indexing purposes; the default value is 100. |
index-recovery-mode | If set to from-indexing, a full re-indexing is launched automatically; if set to from-coordinator (default behavior), the index is retrieved from the coordinator. |
index-recovery-filter | Defines the implementation class or classes of RecoveryFilters, the index synchronization mechanism for the Local Index strategy. |
async-reindexing | Controls the process of re-indexing on JCR startup. If the flag is set, indexing is launched asynchronously, without blocking the JCR. The default is "false". |
indexing-thread-pool-size | Defines the total number of indexing threads. |
max-volatile-size | The maximum size of the volatile index in bytes before it is written to disk. The default value is 1048576 (1MB). |
If you use PostgreSQL and the parameter rdbms-reindexing is set to true, the performance of the queries used while indexing can be improved by setting the parameter "enable_seqscan" to "off" or "default_statistics_target" to at least "50" in the configuration of your database. You then need to restart the DB server and run ANALYZE on the JCR_SVALUE (or JCR_MVALUE) table.
If you use DB2 and the parameter rdbms-reindexing is set to true, the performance of the queries used while indexing can be improved by collecting statistics on the tables, running "RUNSTATS ON TABLE <scheme>.<table> WITH DISTRIBUTION AND INDEXES ALL" for the JCR_SITEM (or JCR_MITEM) and JCR_SVALUE (or JCR_MVALUE) tables.
When JCR runs in standalone mode, standalone indexing is usually used as well. Parameters such as "changesfilter-class", "jgroups-configuration" and all the "jbosscache-*" parameters must be omitted, as in the configuration below.
<workspace name="ws">
  <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
    <properties>
      <property name="index-dir" value="shareddir/index/db1/ws" />
      <property name="max-volatile-time" value="60" />
      <property name="rdbms-reindexing" value="true" />
      <property name="reindexing-page-size" value="1000" />
      <property name="index-recovery-mode" value="from-coordinator" />
    </properties>
  </query-handler>
</workspace>
For both cluster-ready implementations, the JBoss Cache, JGroups and Changes Filter values must be defined. The shared index requires some kind of remote or shared file system to be attached to the system (i.e. NFS, SMB, etc.). The indexing directory (the "index-dir" value) must point to it. Setting "changesfilter-class" to "org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" enables the shared index implementation.
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
    <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
    <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
    <property name="jgroups-configuration" value="udp-mux.xml" />
    <property name="jgroups-multiplexer-stack" value="true" />
    <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
    <property name="max-volatile-time" value="60" />
    <property name="rdbms-reindexing" value="true" />
    <property name="reindexing-page-size" value="1000" />
    <property name="index-recovery-mode" value="from-coordinator" />
  </properties>
</query-handler>
A mandatory requirement for the RSync-based indexing strategy is an installed and properly configured RSync utility. It must be accessible by calling "rsync" without specifying its full path. In addition, each cluster node should have a running RSync Server supporting the "rsync://" protocol. For more details, please refer to the RSync and operating system documentation. There are some additional limitations as well. First, the path to the index of each workspace must be the same across the cluster, i.e. "/var/data/index/<repository-name>/<workspace-name>". Second, the RSync Server must share one of the index's parent folders, for example "/var/data/index". In other words, the index is stored inside an RSync Server shared folder. A sample RSync Server configuration and further details are given below.
The configuration has much in common with the Shared Index; it just requires some additional parameters for the RSync options. If they are present, JCR switches from the shared to the RSync-based index. Here is an example configuration:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <property name="index-dir" value="/var/data/index/repository1/production" />
    <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
    <property name="jbosscache-configuration" value="jar:/conf/portal/cluster/jbosscache-indexer.xml" />
    <property name="jgroups-configuration" value="jar:/conf/portal/cluster/udp-mux.xml" />
    <property name="jgroups-multiplexer-stack" value="false" />
    <property name="jbosscache-cluster-name" value="JCR-cluster-indexer" />
    <property name="jbosscache-shareable" value="true" />
    <property name="max-volatile-time" value="60" />
    <property name="rsync-entry-name" value="index" />
    <property name="rsync-entry-path" value="/var/data/index" />
    <property name="rsync-port" value="8085" />
    <property name="rsync-user" value="rsyncexo" />
    <property name="rsync-password" value="exo" />
  </properties>
</query-handler>
Let's start with authentication: "rsync-user" and "rsync-password". They are optional and can be skipped if the RSync Server is configured to accept anonymous identity. Before reviewing the other RSync index options, have a look at the RSync Server configuration.
Sample RSync Server (rsyncd) configuration:
uid = nobody
gid = nobody
use chroot = no
port = 8085
log file = rsyncd.log
pid file = rsyncd.pid
[index]
path = /var/data/index
comment = indexes
read only = true
auth users = rsyncexo
secrets file = rsyncd.secrets
This sample configuration shares the folder "/var/data/index" as the entry "index". These parameters must match the corresponding properties in the JCR configuration: "rsync-entry-name", "rsync-entry-path" and "rsync-port" respectively. Note: make sure "index-dir" is a descendant folder of the RSync shared folder and that these paths are the same on each cluster node.
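The descendant-folder requirement is easy to get wrong, so a deployment script could verify it before starting the cluster. The following is a minimal sketch using only java.nio; the class and method names are hypothetical and are not part of eXo JCR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that checks "index-dir" against "rsync-entry-path". */
final class RsyncPathCheck {

    /** Returns true only if indexDir is a strict descendant of rsyncEntryPath. */
    static boolean isInsideRsyncEntry(String indexDir, String rsyncEntryPath) {
        Path index = Paths.get(indexDir).normalize();
        Path entry = Paths.get(rsyncEntryPath).normalize();
        // Path.startsWith compares whole name elements, so "/var/data/index2"
        // is not treated as being inside "/var/data/index".
        return index.startsWith(entry) && !index.equals(entry);
    }
}
```

With the values from the example configuration, `isInsideRsyncEntry("/var/data/index/repository1/production", "/var/data/index")` holds, while an index directory outside the shared folder is rejected.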
To use the cluster-ready strategy based on local indexes, where each node has its own copy of the index on its local file system, the following configuration must be applied. The indexing directory must point to a folder on the local file system and "changesfilter-class" must be set to "org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter".
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <property name="index-dir" value="/var/data/index/db1/ws" />
    <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter" />
    <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
    <property name="jgroups-configuration" value="udp-mux.xml" />
    <property name="jgroups-multiplexer-stack" value="true" />
    <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
    <property name="max-volatile-time" value="60" />
    <property name="rdbms-reindexing" value="true" />
    <property name="reindexing-page-size" value="1000" />
    <property name="index-recovery-mode" value="from-coordinator" />
  </properties>
</query-handler>
A common use case for all cluster-ready applications is the hot joining and leaving of processing units. A node that joins the cluster for the first time, or rejoins after some downtime, must be brought into a synchronized state. With shared value storages, databases and indexes, cluster nodes are synchronized at all times. It becomes an issue, however, when the local index strategy is used. If a new node joins the cluster without an index, the index is retrieved or recreated. A node can also be restarted, in which case its index is not empty. By default an existing index is considered up to date, although it may be outdated. JCR offers a mechanism called RecoveryFilters that automatically retrieves the index for the joining node on startup. This feature is a set of filters that can be defined via the QueryHandler configuration:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
The number of filters is not limited, so they can be combined:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter" />
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter" />
If any one of them fires, the index is re-synchronized. Please take into account that DocNumberRecoveryFilter is used when no filter is configured. So, if resynchronization should be blocked, or strictly required on start, ConfigurationPropertyRecoveryFilter can be used.
This feature uses the standard index recovery mode defined by the previously described parameter, which can be "from-indexing" or "from-coordinator" (the default value):
<property name="index-recovery-mode" value="from-coordinator" />
There are several filter implementations:
org.exoplatform.services.jcr.impl.core.query.lucene.DummyRecoveryFilter: always returns true, for cases when the index must be forcibly resynchronized (recovered) each time;
org.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFilter: returns the value of the system property "org.exoplatform.jcr.recoveryfilter.forcereindexing". Index recovery can thus be controlled from the top level using system properties, without changing the configuration;
org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter: returns the value of the QueryHandler configuration property "index-recovery-filter-forcereindexing". Index recovery can thus be controlled from the configuration separately for each workspace, e.g.:
<property name="index-recovery-filter" value="org.exoplatform.services.jcr.impl.core.query.lucene.ConfigurationPropertyRecoveryFilter" />
<property name="index-recovery-filter-forcereindexing" value="true" />
org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFilter: compares the number of documents in the index on the coordinator side and on the local side, and returns true if they differ. The advantage of this filter over the others is that it skips re-indexing for workspaces whose index was not modified. For example, suppose there are 10 repositories with 3 workspaces each, and only one workspace is heavily used in the cluster: frontend/production. With this filter only the workspaces that have really changed are re-indexed, without affecting the other indexes, which greatly reduces startup time.
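The "any filter fires" rule can be sketched as follows. This is a simplified model and not the actual RecoveryFilter API: filters are represented as plain `BooleanSupplier` values, all class and method names are hypothetical, and only the system property name is taken from the description above. The fallback to DocNumberRecoveryFilter when no filter is configured is not modelled.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** Simplified, hypothetical model of combining recovery filters. */
final class RecoveryFilterSketch {

    /** The index is re-synchronized as soon as any configured filter accepts. */
    static boolean needsRecovery(List<BooleanSupplier> filters) {
        for (BooleanSupplier filter : filters) {
            if (filter.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    /** Models the document-count comparison: fires when the counts differ. */
    static BooleanSupplier docNumberFilter(long coordinatorDocs, long localDocs) {
        return () -> coordinatorDocs != localDocs;
    }

    /** Models a filter driven by the system property named in the text. */
    static BooleanSupplier systemPropertyFilter() {
        // Assumption: the property is read as a boolean flag.
        return () -> Boolean.getBoolean("org.exoplatform.jcr.recoveryfilter.forcereindexing");
    }
}
```

In this model a node whose document count matches the coordinator skips recovery unless another filter, such as the system property one, forces it.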
The JBoss Cache template configuration for the query handler is about the same for both clustered strategies.
jbosscache-indexer.xml
<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">
  <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" />
  <!-- Configure the TransactionManager -->
  <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" />
  <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
    <stateRetrieval timeout="20000" fetchInMemoryState="false" />
    <sync />
  </clustering>
</jbosscache>
See more about template configurations here.
Managing a large set of data with JCR in a production environment sometimes requires special operations on the indexes stored on the file system. One of these maintenance operations is recreating the index, also called "re-indexing". There are various use cases in which it is important: hardware faults, hard restarts, data corruption, migrations, and JCR updates that bring new index-related features. Index re-creation is usually requested either on server startup or at runtime.
The common way to update and re-create an index is to stop the server and manually remove the indexes of the workspaces that require it. When the server is started, the missing indexes are automatically recovered by re-indexing. JCR supports direct RDBMS re-indexing, which is usually faster than the ordinary mechanism and can be enabled by setting the QueryHandler parameter "rdbms-reindexing" to "true" (for more information please refer to "Query-handler configuration overview"). A newer feature is asynchronous indexing on startup. Normally startup is blocked until the indexing process has finished, which can take any amount of time depending on the amount of data persisted in the repositories. Asynchronous startup indexing resolves this: briefly, it performs all index operations in the background, without blocking the repository. It is controlled by the value of the "async-reindexing" parameter in the QueryHandler configuration. With asynchronous indexing active, JCR starts with no active indexes present. Queries can still be executed without exceptions, but no results are returned until index creation has completed. The index state can be checked via QueryManagerImpl:
boolean online = ((QueryManagerImpl)Workspace.getQueryManager()).getQueryHandeler().isOnline();
The "OFFLINE" state means that the index is currently being re-created. When the state changes, a corresponding log event is printed. When the background task starts, the index is switched to "OFFLINE" with the following log event:
[INFO] Setting index OFFLINE (repository/production[system]).
When the process has finished, two events are logged:
[INFO] Created initial index for 143018 nodes (repository/production[system]).
[INFO] Setting index ONLINE (repository/production[system]).
These two log lines indicate the end of the process for the workspace given in brackets. Calling isOnline() as mentioned above will then also return true.
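Application code that depends on search results may want to wait until the index is back online. The following is a minimal polling sketch; it deliberately avoids eXo classes, so the online check is passed in as a `BooleanSupplier` (in a real application it would wrap the isOnline() call shown above). The class name, method name and timing parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper that polls an "is the index online?" check with a timeout. */
final class IndexOnlineWaiter {

    /**
     * Polls the given check until it returns true or the timeout elapses.
     * Returns true if the index came online, false on timeout or interruption.
     */
    static boolean awaitOnline(BooleanSupplier isOnline, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isOnline.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Polling is used here only because the text describes a state getter and log events; whether a listener-style notification exists is not covered by this section.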
Hard system faults, errors during upgrades, migration issues and other factors may corrupt the index. End customers will most likely want production systems to fix index issues at runtime, without delays and restarts. Current versions of JCR support the "Hot Asynchronous Workspace Reindexing" feature. It allows the end user (Service Administrator) to launch the process in the background, without stopping or blocking the whole application, using any JMX-compatible console (see screenshot below, "JConsole in action").
The server can continue working as expected while the index is recreated. This depends on the "allow queries" flag passed via the JMX interface when the reindex operation is invoked. If the flag is set, the application continues working. There is, however, one critical limitation that end users must be aware of: the index is frozen while the background task is running. This means that queries are performed on the index as it was when the task started, and data written to the repository after that moment will not be available through search until the process has finished. Data added during re-indexing is also indexed, but becomes available only when the task is done. Briefly, JCR takes a snapshot of the indexes when the asynchronous task starts and uses it for searches. When the operation has finished, the stale indexes are replaced by the newly created ones, which include the newly added data. If the "allow queries" flag is set to false, all queries throw an exception while the task is running. The current state can be acquired using the following JMX operation:
getHotReindexingState() - returns information about latest invocation: start time, if in progress or finish time if done.
First of all, hot re-indexing cannot be launched via JMX if the index is already in offline mode. Offline mode means that the index is currently involved in some other operation, such as re-indexing at startup or being copied to another node in the cluster. Another important point is that Hot Asynchronous Reindexing via JMX and "on startup" reindexing are completely different features. You therefore cannot get the state of startup reindexing using the getHotReindexingState command in the JMX interface, but there are some common JMX operations:
getIOMode - returns current index IO mode (READ_ONLY / READ_WRITE), belongs to clustered configuration states;
getState - returns current state: ONLINE / OFFLINE.
As mentioned above, JCR indexing is based on the Lucene indexing library as its underlying search engine. Lucene uses Directories to store the index and manages access to the index through Lock Factories. By default the JCR implementation uses an optimal combination of Directory and Lock Factory implementations. On operating systems other than Windows, the NIOFSDirectory implementation is used, and on Windows SimpleFSDirectory is used. NativeFSLockFactory is an optimal solution for a wide variety of cases, including clustered environments with NFS shared resources. These defaults can be overridden with the help of system properties. Two properties are responsible for changing the default behavior: "org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass" and "org.exoplatform.jcr.lucene.FSDirectory.class". The first one defines the implementation of the abstract Lucene LockFactory class and the second one sets the implementation class for FSDirectory instances. For more information please refer to the Lucene documentation. Be sure you know what you are changing: JCR allows end users to change the implementation classes of Lucene internals, but does not guarantee their stability and functionality.
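The selection logic described above can be sketched as follows. This is an illustration only, not the code eXo JCR uses: the class and method names are hypothetical, and detecting Windows through the "os.name" value is an assumption. The system property name comes from the text, and the Lucene class names are given as fully qualified strings so that the sketch does not need Lucene on the classpath.

```java
import java.util.Locale;

/** Hypothetical sketch of choosing the Lucene Directory implementation class. */
final class LuceneStoreDefaults {
    static final String DIRECTORY_PROP = "org.exoplatform.jcr.lucene.FSDirectory.class";

    /** Default described in the text: SimpleFSDirectory on Windows, NIOFSDirectory elsewhere. */
    static String defaultDirectoryClass(String osName) {
        // Assumption: Windows is recognized from the "os.name" system property value.
        boolean windows = osName.toLowerCase(Locale.ROOT).contains("windows");
        return windows
            ? "org.apache.lucene.store.SimpleFSDirectory"
            : "org.apache.lucene.store.NIOFSDirectory";
    }

    /** The system property, when set, overrides the OS-based default. */
    static String effectiveDirectoryClass(String osName) {
        return System.getProperty(DIRECTORY_PROP, defaultDirectoryClass(osName));
    }
}
```

In practice the override would typically be passed on the JVM command line with `-D`, for example `-Dorg.exoplatform.jcr.lucene.FSDirectory.class=org.apache.lucene.store.SimpleFSDirectory`.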
From time to time, the Lucene index needs to be optimized. The process is essentially a defragmentation. Until an optimization is triggered, Lucene only marks deleted documents as such; no physical deletions are applied. The deletions are applied during the optimization process. Optimizing the Lucene index speeds up searches but has no effect on indexing (update) performance. Before optimizing, make sure the repository is suspended to avoid any possible inconsistency. It is recommended to schedule optimization. Checking for pending deletions is also supported; if there are any, it is a first sign that the index should be optimized. All operations are available via JMX:
JBossTransactionsService implements eXo TransactionService and provides access to JBoss Transaction Service (JBossTS) JTA implementation via eXo container dependency.
TransactionService is used in the JCR cache implementation org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache. See Cluster configuration for an example.
Example configuration:
<component>
  <key>org.exoplatform.services.transaction.TransactionService</key>
  <type>org.exoplatform.services.transaction.jbosscache.JBossTransactionsService</type>
  <init-params>
    <value-param>
      <name>timeout</name>
      <value>3000</value>
    </value-param>
  </init-params>
</component>
timeout - XA transaction timeout in seconds
JBossCache class is registered as an eXo container component in the configuration.xml file.
<component>
  <key>org.jboss.cache.transaction.TransactionManagerLookup</key>
  <type>org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup</type>
</component>
JBossStandaloneJTAManagerLookup is used in a standalone environment, but GenericTransactionManagerLookup is used in the Application Server environment.
eXo JCR can rely on distributed cache such as Infinispan. This article describes the required configuration.
<component>
  <key>org.infinispan.transaction.lookup.TransactionManagerLookup</key>
  <type>org.exoplatform.services.transaction.infinispan.JBossStandaloneJTAManagerLookup</type>
</component>
<component profiles="ispn">
  <key>org.exoplatform.services.transaction.TransactionService</key>
  <type>org.exoplatform.services.transaction.infinispan.JBossTransactionsService</type>
  <init-params>
    <value-param>
      <name>timeout</name>
      <value>3000</value>
    </value-param>
  </init-params>
</component>
<component profiles="ispn">
  <key>org.exoplatform.services.rpc.RPCService</key>
  <type>org.exoplatform.services.rpc.jgv3.RPCServiceImpl</type>
  <init-params>
    <value-param>
      <name>jgroups-configuration</name>
      <value>jar:/conf/udp-mux-v3.xml</value>
    </value-param>
    <value-param>
      <name>jgroups-cluster-name</name>
      <value>RPCService-Cluster</value>
    </value-param>
    <value-param>
      <name>jgroups-default-timeout</name>
      <value>0</value>
    </value-param>
  </init-params>
</component>
Each of the components mentioned below uses instances of Infinispan Cache for caching in a clustered environment. Every element therefore has its own transport and has to be configured properly. As usual, workspaces have similar configurations. The simplest way to configure them is to define a separate configuration file for each component in each workspace. There are several common parameters.
"infinispan-configuration" defines the path to the template-based configuration for the Infinispan Cache instance.
JGroups is used by Infinispan Cache for network communications and transport in a clustered environment. If the property "jgroups-configuration" is defined in the component configuration, it is injected into the Infinispan Cache instance on startup.
Another parameter is "infinispan-cluster-name". It defines the name of the cluster and needs to be the same for all nodes in a cluster so that they can find each other.
Cache configuration:
<cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.infinispan.ISPNCacheWorkspaceStorageCache">
  <properties>
    <property name="infinispan-configuration" value="jar:/conf/portal/cluster/infinispan-data.xml" />
    <property name="jgroups-configuration" value="jar:/conf/udp-mux-v3.xml" />
    <property name="infinispan-cluster-name" value="JCR-cluster" />
  </properties>
</cache>
Indexer configuration
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <property name="index-dir" value="${exo.jcr.parent.dir:..}/temp/jcrlucenedb/production" />
    <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.ispn.ISPNIndexChangesFilter" />
    <property name="infinispan-configuration" value="jar:/conf/portal/cluster/infinispan-indexer.xml" />
    <property name="jgroups-configuration" value="jar:/conf/udp-mux-v3.xml" />
    <property name="infinispan-cluster-name" value="JCR-cluster" />
    <property name="max-volatile-time" value="60" />
  </properties>
</query-handler>
changesfilter-class - defines the cluster-ready index strategy based on Infinispan Cache. It can be either org.exoplatform.services.jcr.impl.core.query.ispn.ISPNIndexChangesFilter (for the shared and RSync-based index strategies) or org.exoplatform.services.jcr.impl.core.query.ispn.LocalIndexChangesFilter (for the local index).
Lock Manager configuration
<lock-manager class="org.exoplatform.services.jcr.impl.core.lock.infinispan.ISPNCacheableLockManagerImpl">
  <properties>
    <property name="time-out" value="15m" />
    <property name="infinispan-configuration" value="jar:/conf/portal/cluster/infinispan-lock.xml" />
    <property name="jgroups-configuration" value="jar:/conf/udp-mux-v3.xml" />
    <property name="infinispan-cluster-name" value="JCR-cluster" />
    <property name="infinispan-cl-cache.jdbc.table.name" value="lk" />
    <property name="infinispan-cl-cache.jdbc.table.create" value="true" />
    <property name="infinispan-cl-cache.jdbc.table.drop" value="false" />
    <property name="infinispan-cl-cache.jdbc.id.column" value="id" />
    <property name="infinispan-cl-cache.jdbc.data.column" value="data" />
    <property name="infinispan-cl-cache.jdbc.timestamp.column" value="timestamp" />
    <property name="infinispan-cl-cache.jdbc.datasource" value="jdbcjcr" />
    <property name="infinispan-cl-cache.jdbc.connectionFactory" value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
  </properties>
</lock-manager>
infinispan-cl-cache.jdbc.table.name - table name
infinispan-cl-cache.jdbc.table.create - is true or false. Indicates whether to create table at start phase. If true, the table is created if it does not already exist.
infinispan-cl-cache.jdbc.table.drop - is true or false. Indicates whether to drop the table at stop phase.
infinispan-cl-cache.jdbc.id.column - id column name
infinispan-cl-cache.jdbc.data.column - data column name
infinispan-cl-cache.jdbc.timestamp.column - timestamp column name
infinispan-cl-cache.jdbc.datasource - name of the datasource to use to store locks.
infinispan-cl-cache.jdbc.connectionFactory - connection factory to use with the JDBC Cache Store.
eXo JCR implementation is shipped with ready-to-use Infinispan Cache configuration templates for JCR's components.
Data container template is "infinispan-data.xml":
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
  xmlns="urn:infinispan:config:5.1">
  <global>
    <evictionScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
      <properties>
        <property name="threadNamePrefix" value="EvictionThread"/>
      </properties>
    </evictionScheduledExecutor>
    <globalJmxStatistics jmxDomain="exo" enabled="true" allowDuplicateDomains="true"/>
    <transport transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport" clusterName="${infinispan-cluster-name}" distributedSyncTimeout=
      <properties>
        <property name="configurationFile" value="${jgroups-configuration}"/>
      </properties>
    </transport>
  </global>
  <default>
    <clustering mode="replication">
      <stateTransfer timeout="20000" fetchInMemoryState="false" />
      <sync replTimeout="20000"/>
    </clustering>
    <locking isolationLevel="READ_COMMITTED" lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="500" useLockStriping="true"/>
    <transaction transactionManagerLookupClass="org.exoplatform.services.transaction.infinispan.JBossStandaloneJTAManagerLookup" syncRollbackPhase="true" s
    <jmxStatistics enabled="true"/>
    <eviction strategy="LRU" threadPolicy="DEFAULT" maxEntries="1000000"/>
    <expiration wakeUpInterval="5000"/>
  </default>
</infinispan>
Table 1.10. Template variables
Variable | Description |
---|---|
jgroups-configuration | The path to the JGroups configuration. It should no longer be a set of JGroups stack definitions but a normal JGroups configuration file, with the shared transport configured by simply setting the JGroups property singleton_name to a unique name (it must remain unique from one portal container to another). This file is also pre-bundled with templates and is recommended for use. |
infinispan-cluster-name | This defines the name of the cluster. Needs to be the same for all nodes in a cluster in order to find each other. |
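The `${...}` variables in the template are filled in from the component's properties. How eXo JCR performs this substitution internally is not described here; the following is only a minimal sketch of the idea, with a hypothetical class and method name, using a regular expression over the template text. Placeholders without a matching property are left unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of resolving ${name} placeholders in a configuration template. */
final class TemplateSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with props.get(name); unknown names are kept as-is. */
    static String resolve(String template, Map<String, String> props) {
        Matcher matcher = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = props.get(matcher.group(1));
            String replacement = value != null ? value : matcher.group(0);
            // quoteReplacement prevents '$' and '\' in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For instance, resolving `clusterName="${infinispan-cluster-name}"` with the property value "JCR-cluster" from the cache configuration above yields `clusterName="JCR-cluster"`.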
The lock manager template is "infinispan-lock.xml":
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
  xmlns="urn:infinispan:config:5.1">
  <global>
    <evictionScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
      <properties>
        <property name="threadNamePrefix" value="EvictionThread"/>
      </properties>
    </evictionScheduledExecutor>
    <globalJmxStatistics jmxDomain="exo" enabled="true" allowDuplicateDomains="true"/>
    <transport transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport" clusterName="${infinispan-cluster-name}" distributedSyncTimeout=
      <properties>
        <property name="configurationFile" value="${jgroups-configuration}"/>
      </properties>
    </transport>
  </global>
  <default>
    <clustering mode="replication">
      <stateTransfer timeout="20000" fetchInMemoryState="false" />
      <sync replTimeout="20000"/>
    </clustering>
    <locking isolationLevel="READ_COMMITTED" lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="500" useLockStriping="false"/>
    <transaction transactionManagerLookupClass="org.exoplatform.services.transaction.infinispan.JBossStandaloneJTAManagerLookup" syncRollbackPhase="true" s
    <jmxStatistics enabled="true"/>
    <eviction strategy="NONE"/>
    <loaders passivation="false" shared="true" preload="true">
      <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStar
        <properties>
          <property name="stringsTableNamePrefix" value="${infinispan-cl-cache.jdbc.table.name}"/>
          <property name="idColumnName" value="${infinispan-cl-cache.jdbc.id.column}"/>
          <property name="dataColumnName" value="${infinispan-cl-cache.jdbc.data.column}"/>
          <property name="timestampColumnName" value="${infinispan-cl-cache.jdbc.timestamp.column}"/>
          <property name="idColumnType" value="${infinispan-cl-cache.jdbc.id.type}"/>
          <property name="dataColumnType" value="${infinispan-cl-cache.jdbc.data.type}"/>
          <property name="timestampColumnType" value="${infinispan-cl-cache.jdbc.timestamp.type}"/>
          <property name="dropTableOnExit" value="${infinispan-cl-cache.jdbc.table.drop}"/>
          <property name="createTableOnStart" value="${infinispan-cl-cache.jdbc.table.create}"/>
          <property name="connectionFactoryClass" value="${infinispan-cl-cache.jdbc.connectionFactory}"/>
          <property name="datasourceJndiLocation" value="${infinispan-cl-cache.jdbc.datasource}"/>
        </properties>
        <async enabled="false"/>
      </loader>
    </loaders>
  </default>
</infinispan>
Table 1.11. Template variables
Variable | Description |
---|---|
jgroups-configuration | The path to the JGroups configuration. It should no longer be a JGroups stack definition but a regular JGroups configuration file, with the shared transport enabled by setting the JGroups property singleton_name to a unique name (it must remain unique from one portal container to another). This file is also pre-bundled with the templates and is recommended for use. |
infinispan-cluster-name | The name of the cluster. It must be the same on all nodes of a cluster so that they can find each other. |
infinispan-cl-cache.jdbc.table.name | The table name. |
infinispan-cl-cache.jdbc.id.column | The name of the id column. |
infinispan-cl-cache.jdbc.data.column | The name of the data column. |
infinispan-cl-cache.jdbc.timestamp.column | The name of the timestamp column. |
infinispan-cl-cache.jdbc.id.type | The type of the id column. |
infinispan-cl-cache.jdbc.data.type | The type of the data column. |
infinispan-cl-cache.jdbc.timestamp.type | The type of the timestamp column. |
infinispan-cl-cache.jdbc.table.drop | Can be set to true or false. Indicates whether to drop the table during the stop phase. |
infinispan-cl-cache.jdbc.table.create | Can be set to true or false. Indicates whether to create the table during the start phase. If true, the table is created if it does not already exist. |
infinispan-cl-cache.jdbc.connectionFactory | The connection factory to use with the JDBC Cache Store. |
infinispan-cl-cache.jdbc.datasource | The name of the datasource to use to store locks. |
Have a look at "infinispan-indexer.xml"
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd". xmlns="urn:infinispan:config:5.1"> <global> <evictionScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory"> <properties> <property name="threadNamePrefix" value="EvictionThread"/> </properties> </evictionScheduledExecutor> <globalJmxStatistics jmxDomain="exo" enabled="true" allowDuplicateDomains="true"/> <transport transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport" clusterName="${infinispan-cluster-name}" distributedSyncTimeout= <properties> <property name="configurationFile" value="${jgroups-configuration}"/> </properties> </transport> </global> <default> <clustering mode="replication"> <stateTransfer timeout="20000" fetchInMemoryState="false" /> <sync replTimeout="20000"/> </clustering> <locking isolationLevel="READ_COMMITTED" lockAcquisitionTimeout="20000" writeSkewCheck="false" concurrencyLevel="500" useLockStriping="false"/> <transaction transactionManagerLookupClass="org.exoplatform.services.transaction.infinispan.JBossStandaloneJTAManagerLookup" syncRollbackPhase="true" s <jmxStatistics enabled="true"/> <eviction strategy="NONE"/> <loaders passivation="false" shared="false" preload="false"> <loader class="${infinispan-cachestore-classname}" fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false"> <async enabled="false"/> </loader> </loaders> </default> </infinispan>
Table 1.12. Template variables
Variable | Description |
---|---|
jgroups-configuration | The path to the JGroups configuration. It should no longer be a JGroups stack definition but a regular JGroups configuration file, with the shared transport enabled by setting the JGroups property singleton_name to a unique name (it must remain unique from one portal container to another). This file is also pre-bundled with the templates and is recommended for use. |
infinispan-cluster-name | The name of the cluster. It must be the same on all nodes of a cluster so that they can find each other. |
RepositoryCreationService is a service used to create repositories at runtime. It can be used in a standalone or clustered environment.
RepositoryCreationService depends on the following components:
DBCreator - used to create a new database for each unbound datasource.
BackupManager - used to create a repository from a backup.
RPCService - used for communication between cluster nodes.
RPCService may be left unconfigured; in this case, RepositoryCreationService works as a standalone service.
The user calls reserveRepositoryName(String repositoryName). The client node asks the coordinator node to reserve repositoryName. If this name is already reserved, or a repository with this name already exists, the client node gets a RepositoryCreationException. Otherwise, the client gets a token string.
Then the user calls createRepository(String backupId, RepositoryEntry rEntry, String token). The coordinator node checks the token and creates the repository.
Once the repository has been created, the user node broadcasts a message with the RepositoryEntry to all cluster nodes, so that each cluster node starts the new repository.
There are two ways to create a repository: do it in a single step by calling createRepository(String backupId, RepositoryEntry); or first reserve the repository name with reserveRepositoryName(String repositoryName) and then create the reserved repository with createRepository(String backupId, RepositoryEntry rEntry, String token).
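The reserve-then-create flow described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class RepositoryNameRegistry and its methods are hypothetical names invented for illustration, they are not part of eXo JCR, and the real service additionally involves the coordinator node, backups and datasources.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical in-memory model of the reserve-then-create flow; not eXo JCR code. */
final class RepositoryNameRegistry {
    // Names that are reserved but not yet created, mapped to their tokens.
    private final Map<String, String> tokenByName = new HashMap<>();
    // Names of repositories that have already been created.
    private final Set<String> created = new HashSet<>();

    /** Reserves a name and returns a token; fails if the name is reserved or in use. */
    synchronized String reserve(String name) {
        if (created.contains(name) || tokenByName.containsKey(name)) {
            throw new IllegalStateException("Name already reserved or in use: " + name);
        }
        String token = UUID.randomUUID().toString();
        tokenByName.put(name, token);
        return token;
    }

    /** Creates the repository only if the token matches the reservation for that name. */
    synchronized void create(String name, String token) {
        if (token == null || !token.equals(tokenByName.get(name))) {
            throw new IllegalStateException("Invalid token for repository: " + name);
        }
        tokenByName.remove(name);
        created.add(name);
    }

    /** Single-step variant: reserve and create in one call. */
    synchronized void create(String name) {
        create(name, reserve(name));
    }

    /** Tells whether a repository with this name has been created. */
    synchronized boolean exists(String name) {
        return created.contains(name);
    }
}
```

The two-step variant is useful when the creation itself is delayed or runs in a dedicated thread: the name stays protected by the token in the meantime.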
RepositoryCreationService configuration
<component> <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key> <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type> <init-params> <properties-param> <name>backup-properties</name> <property name="backup-dir" value="target/backup" /> </properties-param> </init-params> </component> <component> <key>org.exoplatform.services.database.creator.DBCreator</key> <type>org.exoplatform.services.database.creator.DBCreator</type> <init-params> <properties-param> <name>db-connection</name> <description>database connection properties</description> <property name="driverClassName" value="org.hsqldb.jdbcDriver" /> <property name="url" value="jdbc:hsqldb:file:target/temp/data/" /> <property name="username" value="sa" /> <property name="password" value="" /> </properties-param> <properties-param> <name>db-creation</name> <description>database creation properties</description> <property name="scriptPath" value="src/test/resources/test.sql" /> <property name="username" value="sa" /> <property name="password" value="" /> </properties-param> </init-params> </component> <component> <key>org.exoplatform.services.rpc.RPCService</key> <type>org.exoplatform.services.rpc.impl.RPCServiceImpl</type> <init-params> <value-param> <name>jgroups-configuration</name> <value>jar:/conf/standalone/udp-mux.xml</value> </value-param> <value-param> <name>jgroups-cluster-name</name> <value>RPCService-Cluster</value> </value-param> <value-param> <name>jgroups-default-timeout</name> <value>0</value> </value-param> </init-params> </component> <component> <key>org.exoplatform.services.jcr.ext.repository.creation.RepositoryCreationService</key> <type> org.exoplatform.services.jcr.ext.repository.creation.RepositoryCreationServiceImpl </type> <init-params> <value-param> <name>factory-class-name</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> </init-params> </component>
factory-class-name is an optional parameter that specifies which factory class to use to create DataSource objects.
public interface RepositoryCreationService { /** * Reserves, validates and creates repository in a simplified form. * * @param rEntry - repository Entry - note that datasource must not exist. * @param backupId - backup id * @param creationProps - storage creation properties * @throws RepositoryConfigurationException * if some exception occurred during repository creation or repository name is absent in reserved list * @throws RepositoryCreationServiceException * if some exception occurred during repository creation or repository name is absent in reserved list */ void createRepository(String backupId, RepositoryEntry rEntry, StorageCreationProperties creationProps) throws RepositoryConfigurationException, RepositoryCreationException; /** * Reserves, validates and creates repository in a simplified form. * * @param rEntry - repository Entry - note that datasource must not exist. * @param backupId - backup id * @throws RepositoryConfigurationException * if some exception occurred during repository creation or repository name is absent in reserved list * @throws RepositoryCreationServiceException * if some exception occurred during repository creation or repository name is absent in reserved list */ void createRepository(String backupId, RepositoryEntry rEntry) throws RepositoryConfigurationException, RepositoryCreationException; /** * Reserve repository name to prevent repository creation with same name from other place in same time * via this service. * * @param repositoryName - repositoryName * @return repository token. Anyone obtaining a token can later create a repository of reserved name. * @throws RepositoryCreationServiceException if can't reserve name */ String reserveRepositoryName(String repositoryName) throws RepositoryCreationException; /** * Creates repository, using token of already reserved repository name. * Good for cases, when repository creation should be delayed or made asynchronously in dedicated thread. 
* * @param rEntry - repository entry - note, that datasource must not exist * @param backupId - backup id * @param rToken - token * @param creationProps - storage creation properties * @throws RepositoryConfigurationException * if some exception occurred during repository creation or repository name is absent in reserved list * @throws RepositoryCreationServiceException * if some exception occurred during repository creation or repository name is absent in reserved list */ void createRepository(String backupId, RepositoryEntry rEntry, String rToken, StorageCreationProperties creationProps) throws RepositoryConfigurationException, RepositoryCreationException; /** * Creates repository, using token of already reserved repository name. Good for cases, when repository creation should be delayed or * made asynchronously in dedicated thread. * * @param rEntry - repository entry - note, that datasource must not exist * @param backupId - backup id * @param rToken - token * @throws RepositoryConfigurationException * if some exception occurred during repository creation or repository name is absent in reserved list * @throws RepositoryCreationServiceException * if some exception occurred during repository creation or repository name is absent in reserved list */ void createRepository(String backupId, RepositoryEntry rEntry, String rToken) throws RepositoryConfigurationException, RepositoryCreationException; /** * Remove previously created repository. * * @param repositoryName - the repository name to delete * @param forceRemove - force close all opened sessions * @throws RepositoryCreationServiceException * if some exception occurred during repository removing occurred */ void removeRepository(String repositoryName, boolean forceRemove) throws RepositoryCreationException; }
Each datasource referenced in the RepositoryEntry of a new repository must be unbound, which means that no database may already exist behind it. This restriction exists to avoid corrupting the data of existing repositories.
RPCService is an optional component, but without it RepositoryCreationService cannot communicate with other cluster nodes and works as a standalone service.
JCR supports two query languages: SQL and XPath. A query, whether XPath or SQL, specifies a subset of nodes within a workspace, called the result set. The result set constitutes all the nodes in the workspace that meet the constraints stated in the query.
SQL
// get QueryManager QueryManager queryManager = workspace.getQueryManager(); // make SQL query Query query = queryManager.createQuery("SELECT * FROM nt:base ", Query.SQL); // execute query QueryResult result = query.execute();
XPath
// get QueryManager QueryManager queryManager = workspace.getQueryManager(); // make XPath query Query query = queryManager.createQuery("//element(*,nt:base)", Query.XPATH); // execute query QueryResult result = query.execute();
// fetch query result QueryResult result = query.execute();
Now we can get result in an iterator of nodes:
NodeIterator it = result.getNodes();
or we get the result in a table:
// get column names String[] columnNames = result.getColumnNames(); // get column rows RowIterator rowIterator = result.getRows(); while(rowIterator.hasNext()){ // get next row Row row = rowIterator.nextRow(); // get all values of row Value[] values = row.getValues(); }
The result returns a score for each row in the result set. The score indicates how well the result node matches the query: a higher value means a better match. This score can be used for ordering the result.
eXo JCR scoring is a mapping of Lucene scoring. For a more in-depth understanding, please study the Lucene documentation.
jcr:score is calculated as (lucene score)*1000f.
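As a minimal sketch of the formula above, the following plain-Java helper scales a Lucene score by 1000f and sorts a list of scores from best to worst. The class and method names are hypothetical and chosen only for illustration; how eXo JCR rounds or stores the value internally is not stated here and is not assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper illustrating the score formula; not eXo JCR code. */
final class ScoreSketch {
    /** Scales a Lucene score as described above: (lucene score) * 1000f. */
    static float toJcrScore(float luceneScore) {
        return luceneScore * 1000f;
    }

    /** Returns a copy of the given scores sorted from best to worst match. */
    static List<Float> bestFirst(List<Float> jcrScores) {
        List<Float> sorted = new ArrayList<>(jcrScores);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }
}
```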
Score may be increased for specified nodes, see Index Boost Value
Also, see an example Order by Score
Select all nodes with primary type 'nt:unstructured' and return only 3 nodes, starting with the second node in the list.
QueryImpl class has two methods: one to indicate how many results shall be returned at most, and another to fix the starting position.
setOffset(long offset) - Sets the start offset of the result set.
setLimit(long position) - Sets the maximum size of the result set.
The repository contains several nt:unstructured nodes.
root
node1 (nt:unstructured)
node2 (nt:unstructured)
node3 (nt:unstructured)
node4 (nt:unstructured)
node5 (nt:unstructured)
node6 (nt:unstructured)
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:unstructured"; QueryImpl query = (QueryImpl)queryManager.createQuery(sqlStatement, Query.SQL); //return starting with second result query.setOffset(1); // return 3 results query.setLimit(3); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
Normally (without the setOffset and setLimit methods), the NodeIterator returns all nodes (node1...node6). In our case, the NodeIterator returns only "node2", "node3" and "node4".
[node1 node2 node3 node4 node5 node6]
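The effect of combining an offset of 1 with a limit of 3 can be reproduced on a plain list. This is only a sketch of the windowing semantics, assuming the result order is node1 to node6; OffsetLimitSketch and window are hypothetical names and do not use the JCR API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of offset/limit semantics; not eXo JCR code. */
final class OffsetLimitSketch {
    /** Skips the first 'offset' items, then returns at most 'limit' items. */
    static <T> List<T> window(List<T> all, long offset, long limit) {
        return all.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4", "node5", "node6");
        System.out.println(window(nodes, 1, 3)); // prints [node2, node3, node4]
    }
}
```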
Find all nodes in the repository. Only those nodes are found to which the session has READ permission. See also Access Control.
Repository contains many different nodes.
root
folder1 (nt:folder)
document1 (nt:file)
folder2 (nt:folder)
document2 (nt:unstructured)
document3 (nt:folder)
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:base"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:base)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "folder1", "folder2", "document1", "document2", "document3", and any other nodes present in the workspace.
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.13. Table content
jcr:path | jcr:score |
---|---|
/folder1 | 1000 |
/folder1/document1 | 1000 |
/folder1/folder2 | 1000 |
/folder1/folder2/document2 | 1000 |
/folder1/folder2/document3 | 1000 |
... | ... |
Find all nodes whose primary type is "nt:file".
The repository contains nodes with different primary types and mixin types.
root
document1 primarytype = "nt:unstructured" mixintype = "mix:title"
document2 primarytype = "nt:file" mixintype = "mix:lockable"
document3 primarytype = "nt:file" mixintype = "mix:title"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:file"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:file)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2" and "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is
Find all nodes in the repository that have the mixin type "mix:title".
The repository contains nodes with different primary types and mixin types.
root
document1 primarytype = "nt:unstructured" mixintype = "mix:title"
document2 primarytype = "nt:file" mixintype = "mix:lockable"
document3 primarytype = "nt:file" mixintype = "mix:title"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "document1" and "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.15. Table content
jcr:title | ... | jcr:path | jcr:score |
---|---|---|---|
First document | ... | /document1 | 2674 |
Second document | ... | /document3 | 2674 |
Find all nodes with mixin type 'mix:title' where the prop_pagecount property contains a value less than 90. Only select the title of each node.
Repository contains several mix:title nodes, where each prop_pagecount contains a different value.
root
document1 (mix:title) jcr:title="War and peace" prop_pagecount=1000
document2 (mix:title) jcr:title="Cinderella" prop_pagecount=100
document3 (mix:title) jcr:title="Puss in Boots" prop_pagecount=60
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT jcr:title FROM mix:title WHERE prop_pagecount < 90"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[@prop_pagecount < 90]/@jcr:title"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is
Find all nodes with mixin type 'mix:title' and where the property 'jcr:title' starts with 'P'.
See also the article about "Find all mix:title nodes where jcr:title does NOT start with 'P'"
The repository contains 3 mix:title nodes, where each jcr:title has a different value.
root
document1 (mix:title) jcr:title="Star wars" jcr:description="Dart rules!!"
document2 (mix:title) jcr:title="Prison break" jcr:description="Run, Forest, run ))"
document3 (mix:title) jcr:title="Panopticum" jcr:description="It's imagine film"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE jcr:title LIKE 'P%'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[jcr:like(@jcr:title, 'P%')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "document2" and "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is
Table 1.17. Table content
jcr:title | jcr:description | jcr:path | jcr:score |
---|---|---|---|
Prison break | Run, Forest, run )) | /document2 | 4713 |
Panopticum | It's imagine film | /document3 | 5150 |
Find all nodes with a mixin type 'mix:title' and whose property 'jcr:title' starts with 'P%ri'.
As you see "P%rison break" contains the symbol '%'. This symbol is reserved for LIKE comparisons. So what can we do?
Within the LIKE pattern, literal instances of percent ("%") or underscore ("_") must be escaped. The SQL ESCAPE clause allows the definition of an arbitrary escape character within the context of a single LIKE statement. The following example defines the backslash ('\') as the escape character:
SELECT * FROM mytype WHERE a LIKE 'foo\%' ESCAPE '\'
XPath does not provide a way to define escape symbols, so we must use the default escape character ('\').
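To make the escaping rule concrete, here is a minimal plain-Java sketch that evaluates a LIKE pattern with a configurable escape character by translating it into a regular expression. It only models the rules stated above ("%" matches any run of characters, "_" matches one character, an escaped character is taken literally); LikePatternSketch is a hypothetical name, and this is not how eXo JCR evaluates LIKE internally.

```java
import java.util.regex.Pattern;

/** Hypothetical model of LIKE matching with an escape character; not eXo JCR code. */
final class LikePatternSketch {
    /** Converts a LIKE pattern to a regex: % = any run of chars, _ = exactly one char. */
    static Pattern toRegex(String like, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == escape && i + 1 < like.length()) {
                // The character following the escape character is taken literally.
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if the whole value matches the LIKE pattern. */
    static boolean like(String value, String pattern, char escape) {
        return toRegex(pattern, escape).matcher(value).matches();
    }
}
```

For instance, like("P%rison break", "P#%ri%", '#') is true, while like("Prison break", "P#%ri%", '#') is false because the escaped "%" must appear literally after the "P".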
The repository contains mix:title nodes, where jcr:title can have different values.
root
document1 (mix:title) jcr:title="Star wars" jcr:description="Dart rules!!"
document2 (mix:title) jcr:title="P%rison break" jcr:description="Run, Forest, run ))"
document3 (mix:title) jcr:title="Panopticum" jcr:description="It's imagine film"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE jcr:title LIKE 'P#%ri%' ESCAPE '#'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[jcr:like(@jcr:title, 'P\\%ri%')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is
Table 1.18. Table content
jcr:title | jcr:description | jcr:path | jcr:score |
---|---|---|---|
P%rison break | Run, Forest, run )) | /document2 | 7452 |
Find all nodes with a mixin type 'mix:title' and where the property 'jcr:title' does NOT start with a 'P' symbol
The repository contains mix:title nodes, where jcr:title has different values.
root
document1 (mix:title) jcr:title="Star wars" jcr:description="Dart rules!!"
document2 (mix:title) jcr:title="Prison break" jcr:description="Run, Forest, run ))"
document3 (mix:title) jcr:title="Panopticum" jcr:description="It's imagine film"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE NOT jcr:title LIKE 'P%'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[not(jcr:like(@jcr:title, 'P%'))]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get the nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document1".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.19. Table content
jcr:title | jcr:description | jcr:path | jcr:score |
---|---|---|---|
Star wars | Dart rules!! | /document1 | 4713 |
Find all fairytales with a page count of more than 90 pages.
In JCR terms: find all nodes with mixin type 'mix:title' where the property 'jcr:description' equals "fairytale" and whose "prop_pagecount" property value is greater than 90.
See also Multivalue Property Comparison.
The repository contains mix:title nodes, where prop_pagecount has different values.
root
document1 (mix:title) jcr:title="War and peace" jcr:description="novel" prop_pagecount=1000
document2 (mix:title) jcr:title="Cinderella" jcr:description="fairytale" prop_pagecount=100
document3 (mix:title) jcr:title="Puss in Boots" jcr:description="fairytale" prop_pagecount=60
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE jcr:description = 'fairytale' AND prop_pagecount > 90"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[@jcr:description='fairytale' and @prop_pagecount > 90]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2".
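The expected result can be double-checked by applying the same two constraints to the sample data in plain Java. This is only a sketch: Doc and FairytaleFilterSketch are hypothetical stand-ins for the mix:title nodes of this example and do not use the JCR API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory check of the AND constraint; not eXo JCR code. */
final class FairytaleFilterSketch {
    /** Stand-in for a mix:title node with the properties used in this example. */
    static final class Doc {
        final String name;
        final String description;
        final long pageCount;

        Doc(String name, String description, long pageCount) {
            this.name = name;
            this.description = description;
            this.pageCount = pageCount;
        }
    }

    /** The three sample documents listed above. */
    static List<Doc> sampleData() {
        return Arrays.asList(
            new Doc("document1", "novel", 1000),
            new Doc("document2", "fairytale", 100),
            new Doc("document3", "fairytale", 60));
    }

    /** Mirrors: jcr:description = 'fairytale' AND prop_pagecount > 90. */
    static List<String> longFairytales(List<Doc> docs) {
        List<String> names = new ArrayList<>();
        for (Doc d : docs) {
            if ("fairytale".equals(d.description) && d.pageCount > 90) {
                names.add(d.name);
            }
        }
        return names;
    }
}
```

Only "document2" satisfies both constraints: "document1" is not a fairytale, and "document3" has just 60 pages.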
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.20. Table content
jcr:title | jcr:description | prop_pagecount | jcr:path | jcr:score |
---|---|---|---|---|
Cinderella | fairytale | 100 | /document2 | 7086 |
Find all documents whose title is 'Cinderella' or whose description is 'novel'.
In JCR terms: find all nodes with a mixin type 'mix:title' whose property 'jcr:title' equals "Cinderella" or whose "jcr:description" property value is "novel".
The repository contains mix:title nodes, where jcr:title and jcr:description have different values.
root
document1 (mix:title) jcr:title="War and peace" jcr:description="novel"
document2 (mix:title) jcr:title="Cinderella" jcr:description="fairytale"
document3 (mix:title) jcr:title="Puss in Boots" jcr:description="fairytale"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE jcr:title = 'Cinderella' OR jcr:description = 'novel'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[@jcr:title='Cinderella' or @jcr:description = 'novel']"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document1" and "document2".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.21. Table content
jcr:title | jcr:description | jcr:path | jcr:score |
---|---|---|---|
War and peace | novel | /document1 | 3806 |
Cinderella | fairytale | /document2 | 3806 |
Find all nodes with a mixin type 'mix:title' where the property 'jcr:description' does not exist (is null).
The repository contains mix:title nodes; in one of them the jcr:description property does not exist (is null).
root
document1 (mix:title) jcr:title="Star wars" jcr:description="Dart rules!!"
document2 (mix:title) jcr:title="Prison break" jcr:description="Run, Forest, run ))"
document3 (mix:title) jcr:title="Titanic" // The description property does not exist. This is the node we wish to find.
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE jcr:description IS NULL"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[not(@jcr:description)]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document3".
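The "property does not exist" constraint can be modelled by representing each node as a map of its properties and selecting the nodes whose map lacks the given key. This is a sketch on the sample data only; MissingPropertySketch and its methods are hypothetical names and do not use the JCR API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory check of the IS NULL constraint; not eXo JCR code. */
final class MissingPropertySketch {
    /** Builds a property map from alternating name/value arguments. */
    static Map<String, String> props(String... nameValuePairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            map.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return map;
    }

    /** The three sample documents listed above, keyed by node name. */
    static Map<String, Map<String, String>> sampleData() {
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        nodes.put("document1", props("jcr:title", "Star wars", "jcr:description", "Dart rules!!"));
        nodes.put("document2", props("jcr:title", "Prison break", "jcr:description", "Run, Forest, run ))"));
        nodes.put("document3", props("jcr:title", "Titanic"));
        return nodes;
    }

    /** Returns the names of nodes that do not have the given property. */
    static List<String> withoutProperty(Map<String, Map<String, String>> nodes, String property) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : nodes.entrySet()) {
            if (!entry.getValue().containsKey(property)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```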
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Find all nodes with a mixin type 'mix:title' and where the property 'jcr:title' equals 'casesensitive' in lower or upper case.
The repository contains mix:title nodes, whose jcr:title properties have different values.
root
document1 (mix:title) jcr:title="CaseSensitive"
document2 (mix:title) jcr:title="casesensitive"
document3 (mix:title) jcr:title="caseSENSITIVE"
UPPER case
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE UPPER(jcr:title) = 'CASESENSITIVE'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[fn:upper-case(@jcr:title)='CASESENSITIVE']"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
LOWER case
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE LOWER(jcr:title) = 'casesensitive'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[fn:lower-case(@jcr:title)='casesensitive']"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document1", "document2" and "document3" (in all examples).
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.23. Table content
jcr:title | ... | jcr:path |
---|---|---|
CaseSensitive | ... | /document1 |
casesensitive | ... | /document2 |
caseSENSITIVE | ... | /document3 |
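The effect of UPPER and LOWER can be modelled in plain Java. The sketch below is only an analogy and does not use the JCR API; the class and method names are hypothetical. It folds each stored title to upper case and compares it with the literal, which is why the literal itself must already be written in the folded case.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical model of "UPPER(jcr:title) = '...'" over the example titles.
final class CaseInsensitiveTitleMatch {
    // Titles taken from the example repository above.
    static final List<String> TITLES =
        List.of("CaseSensitive", "casesensitive", "caseSENSITIVE");

    // Fold the stored value to upper case, then compare it exactly with the literal.
    static List<String> matchUpper(List<String> titles, String literal) {
        return titles.stream()
            .filter(t -> t.toUpperCase(Locale.ROOT).equals(literal))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // All three titles match an upper-case literal.
        System.out.println(matchUpper(TITLES, "CASESENSITIVE"));
        // A lower-case literal can never equal an upper-cased value.
        System.out.println(matchUpper(TITLES, "casesensitive"));
    }
}
```

In this model, a literal written in the wrong case matches nothing, so the literal has to agree with the function used in the statement.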
Find all nodes of primary type "nt:resource" whose jcr:lastModified property value is greater than 2006-06-04 and less than 2008-06-04.
Repository contains nt:resource nodes with different values of jcr:lastModified property
root
document1 (nt:file)
jcr:content (nt:resource) jcr:lastModified="2006-01-19T15:34:15.917+02:00"
document2 (nt:file)
jcr:content (nt:resource) jcr:lastModified="2005-01-19T15:34:15.917+02:00"
document3 (nt:file)
jcr:content (nt:resource) jcr:lastModified="2007-01-19T15:34:15.917+02:00"
SQL
In SQL you have to use the keyword TIMESTAMP for date comparisons. Otherwise, the date would be interpreted as a string. The date has to be surrounded by single quotes (TIMESTAMP 'datetime') and written in the ISO standard format YYYY-MM-DDThh:mm:ss.sTZD (see http://en.wikipedia.org/wiki/ISO_8601; the format is also well explained in the W3C note http://www.w3.org/TR/NOTE-datetime).
You will see that it can be a date only (YYYY-MM-DD) but also a complete date and time with a timezone designator (TZD).
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query StringBuffer sb = new StringBuffer(); sb.append("select * from nt:resource where "); sb.append("( jcr:lastModified >= TIMESTAMP '"); sb.append("2006-06-04T15:34:15.917+02:00"); sb.append("' )"); sb.append(" and "); sb.append("( jcr:lastModified <= TIMESTAMP '"); sb.append("2008-06-04T15:34:15.917+02:00"); sb.append("' )"); String sqlStatement = sb.toString(); Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
Compared to the SQL format, you have to use the keyword xs:dateTime and surround the datetime by extra brackets: xs:dateTime('datetime'). The actual format of the datetime also conforms with the ISO date standard.
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// create query
StringBuffer sb = new StringBuffer();
sb.append("//element(*,nt:resource)");
sb.append("[");
sb.append("@jcr:lastModified >= xs:dateTime('2006-06-04T15:34:15.917+02:00')");
sb.append(" and ");
sb.append("@jcr:lastModified <= xs:dateTime('2008-06-04T15:34:15.917+02:00')");
sb.append("]");
String xpathStatement = sb.toString();
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); if(it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "/document3/jcr:content".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is:
Table 1.24. Table content
jcr:lastModified | ... | jcr:path |
---|---|---|
2007-01-19T15:34:15.917+02:00 | ... | /document3/jcr:content |
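The date literals used in both statements can be generated from java.time values instead of being typed by hand. The following is a minimal sketch under that assumption; the class JcrDateLiterals and its methods are hypothetical helpers that only build strings, and the resulting statement would still be passed to QueryManager.createQuery as shown above.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that formats dates for JCR SQL and XPath statements.
final class JcrDateLiterals {
    // ISO 8601 with milliseconds and a zone offset such as +02:00.
    static final DateTimeFormatter ISO_MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Bounds used in the example above.
    static final OffsetDateTime FROM =
        OffsetDateTime.of(2006, 6, 4, 15, 34, 15, 917_000_000, ZoneOffset.ofHours(2));
    static final OffsetDateTime TO =
        OffsetDateTime.of(2008, 6, 4, 15, 34, 15, 917_000_000, ZoneOffset.ofHours(2));

    // SQL form: TIMESTAMP 'datetime'
    static String sqlTimestamp(OffsetDateTime t) {
        return "TIMESTAMP '" + ISO_MILLIS.format(t) + "'";
    }

    // XPath form: xs:dateTime('datetime')
    static String xpathDateTime(OffsetDateTime t) {
        return "xs:dateTime('" + ISO_MILLIS.format(t) + "')";
    }

    static String sqlBetween(String property, OffsetDateTime from, OffsetDateTime to) {
        return "SELECT * FROM nt:resource WHERE " + property + " >= " + sqlTimestamp(from)
            + " AND " + property + " <= " + sqlTimestamp(to);
    }

    static String xpathBetween(String property, OffsetDateTime from, OffsetDateTime to) {
        return "//element(*,nt:resource)[@" + property + " >= " + xpathDateTime(from)
            + " and @" + property + " <= " + xpathDateTime(to) + "]";
    }

    public static void main(String[] args) {
        System.out.println(sqlBetween("jcr:lastModified", FROM, TO));
        System.out.println(xpathBetween("jcr:lastModified", FROM, TO));
    }
}
```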
Find all nodes with primary type 'nt:file' whose node name is 'document'. The node name is accessible by a function called "fn:name()".
fn:name() can be used ONLY with an equal('=') comparison.
The repository contains nt:file nodes with different names.
root
document (nt:file)
file (nt:file)
somename (nt:file)
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:file WHERE fn:name() = 'document'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:file)[fn:name() = 'document']"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); if(it.hasNext()) { Node findedNode = it.nextNode(); }
The NodeIterator will return the node whose fn:name equals "document".
Also we can get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Find all nodes with the primary type 'nt:unstructured' whose property 'multiprop' contains both values "one" and "two".
The repository contains nt:unstructured nodes with different 'multiprop' properties.
root
node1 (nt:unstructured) multiprop = [ "one","two" ]
node2 (nt:unstructured) multiprop = [ "one","two","three" ]
node3 (nt:unstructured) multiprop = [ "one","five" ]
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:unstructured WHERE multiprop = 'one' AND multiprop = 'two'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:unstructured)[@multiprop = 'one' and @multiprop = 'two']"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "node1" and "node2".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.26. Table content
jcr:primaryType | jcr:path | jcr:score |
---|---|---|
nt:unstructured | /node1 | 3806 |
nt:unstructured | /node2 | 3806 |
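The result above follows from how a comparison on a multi-valued property is read: the condition holds when at least one of the values equals the literal, so two conditions joined by AND require both values to be present. A plain-Java model of this reading is sketched below, using the example data; it does not use the JCR API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "multiprop = 'one' AND multiprop = 'two'".
final class MultiValueMatch {
    // Node name -> values of the multi-valued property 'multiprop'.
    static Map<String, List<String>> repository() {
        Map<String, List<String>> nodes = new LinkedHashMap<>();
        nodes.put("node1", List.of("one", "two"));
        nodes.put("node2", List.of("one", "two", "three"));
        nodes.put("node3", List.of("one", "five"));
        return nodes;
    }

    // A node matches when every required value occurs among its values.
    static List<String> havingAll(Map<String, List<String>> nodes, String... required) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : nodes.entrySet()) {
            if (entry.getValue().containsAll(Arrays.asList(required))) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(havingAll(repository(), "one", "two"));
    }
}
```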
Find a node with the primary type 'nt:file' that is located on the exact path "/folder1/folder2/document1".
The repository is filled with different nodes. There are several folders that contain other folders and files.
root
folder1 (nt:folder)
folder2 (nt:folder)
document1 (nt:file) // This is the document we want to find
folder3 (nt:folder)
document1 (nt:file)
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // we want find 'document1' String sqlStatement = "SELECT * FROM nt:file WHERE jcr:path = '/folder1/folder2/document1'"; // create query Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// we want to find 'document1'
String xpathStatement = "/jcr:root/folder1[1]/folder2[1]/element(document1,nt:file)[1]";
// create query
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Remark: The indexes [1] are used in order to get the same result as the SQL statement. SQL by default only returns the first node, whereas XPath fetches by default all nodes.
Let's get nodes:
NodeIterator it = result.getNodes(); if (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return expected "document1".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Find all nodes with the primary type 'nt:folder' that are children of the node at the path "/folder1/folder2". Only find children, do not find further descendants.
The repository is filled with "nt:folder" nodes. The nodes are placed in a multilayer tree.
root
folder1 (nt:folder)
folder2 (nt:folder)
folder3 (nt:folder) // This node we want to find
folder4 (nt:folder) // This node is not child but a descendant of '/folder1/folder2/'.
folder5 (nt:folder) // This node we want to find
SQL
The "%" wildcard in a LIKE expression matches any string, including strings that contain "/". The first LIKE therefore matches all descendants, and a second, negated LIKE excludes every path that contains a further "/" after the parent path. This way child nodes are included but deeper descendants are excluded.
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:folder WHERE jcr:path LIKE '/folder1/folder2/%' AND NOT jcr:path LIKE '/folder1/folder2/%/%'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "/jcr:root/folder1[1]/folder2[1]/element(*,nt:folder)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "folder3" and "folder5".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is:
Find all nodes with the primary type 'nt:folder' that are descendants of the node "/folder1/folder2".
The repository contains "nt:folder" nodes. The nodes are placed in a multilayer tree.
root
folder1 (nt:folder)
folder2 (nt:folder)
folder3 (nt:folder) // This node we want to find
folder4 (nt:folder) // This node we want to find
folder5 (nt:folder) // This node we want to find
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:folder WHERE jcr:path LIKE '/folder1/folder2/%'"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "/jcr:root/folder1[1]/folder2[1]//element(*,nt:folder)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return "folder3", "folder4" and "folder5" nodes.
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.29. Table content
jcr:path | jcr:score |
---|---|
/folder1/folder2/folder3 | 1000 |
/folder1/folder2/folder3/folder4 | 1000 |
/folder1/folder2/folder5 | 1000 |
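The difference between the child and the descendant constraint on jcr:path can be checked with a small model of LIKE matching. The sketch below is an illustration only and makes no claim about how the repository evaluates the statement: it translates a LIKE pattern into a regular expression ("%" becomes any string, "_" any single character) and applies the two constraints to the example paths. The class PathLikeFilter is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical model of the jcr:path LIKE constraints used above.
final class PathLikeFilter {
    static final List<String> FOLDER_PATHS = List.of(
        "/folder1/folder2/folder3",
        "/folder1/folder2/folder3/folder4",
        "/folder1/folder2/folder5");

    // Translate a LIKE pattern into a regex and require a full match.
    static boolean like(String value, String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    // jcr:path LIKE 'parent/%'
    static List<String> descendants(List<String> paths, String parent) {
        return paths.stream()
            .filter(p -> like(p, parent + "/%"))
            .collect(Collectors.toList());
    }

    // jcr:path LIKE 'parent/%' AND NOT jcr:path LIKE 'parent/%/%'
    static List<String> children(List<String> paths, String parent) {
        return paths.stream()
            .filter(p -> like(p, parent + "/%") && !like(p, parent + "/%/%"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(children(FOLDER_PATHS, "/folder1/folder2"));
        System.out.println(descendants(FOLDER_PATHS, "/folder1/folder2"));
    }
}
```

In this model, folder4 is dropped from the child result because its path still matches the negated pattern with a second "/".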
Select all nodes with the mixin type 'mix:title' and order them by the 'prop_pagecount' property.
The repository contains several mix:title nodes, where prop_pagecount has different values.
root
document1 (mix:title) jcr:title="War and peace" jcr:description="roman" prop_pagecount=4
document2 (mix:title) jcr:title="Cinderella" jcr:description="fairytale" prop_pagecount=7
document3 (mix:title) jcr:title="Puss in Boots" jcr:description="fairytale" prop_pagecount=1
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title ORDER BY prop_pagecount ASC"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title) order by @prop_pagecount ascending"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
The NodeIterator will return nodes in the following order "document3", "document1", "document2".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.30. Table content
jcr:title | jcr:description | prop_pagecount | jcr:path | jcr:score |
---|---|---|---|---|
Puss in Boots | fairytale | 1 | /document3 | 1405 |
War and peace | roman | 4 | /document1 | 1405 |
Cinderella | fairytale | 7 | /document2 | 1405 |
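The expected order can be cross-checked with a plain-Java sort over the example values. This is only a sketch of the ordering, not of the query engine, and it assumes that prop_pagecount is compared numerically; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "ORDER BY prop_pagecount ASC" over the example nodes.
final class OrderByPageCount {
    // Node name -> prop_pagecount, as in the example repository.
    static Map<String, Integer> pageCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("document1", 4);
        counts.put("document2", 7);
        counts.put("document3", 1);
        return counts;
    }

    // Sort node names by their page count, smallest first.
    static List<String> ascending(Map<String, Integer> counts) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort(Comparator.comparingInt(name -> counts.get(name)));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(ascending(pageCounts()));
    }
}
```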
Find all nodes with the primary type 'nt:unstructured' and sort them by the property value of descendant nodes with the relative path 'a/b/c'.
This ORDER BY construction only works in XPath!
root
node1 (nt:unstructured)
a (nt:unstructured)
b (nt:unstructured)
node2 (nt:unstructured)
a (nt:unstructured)
b (nt:unstructured)
c (nt:unstructured) prop = "a"
node3 (nt:unstructured)
a (nt:unstructured)
b (nt:unstructured)
c (nt:unstructured) prop = "b"
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// create query
String xpathStatement = "/jcr:root/* order by a/b/c/@prop descending";
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return nodes in the following order - "node3","node2" and "node1".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.31. Table content
jcr:primaryType | jcr:path | jcr:score |
---|---|---|
nt:unstructured | /testroot/node3 | 1000 |
nt:unstructured | /testroot/node2 | 1000 |
nt:unstructured | /testroot/node1 | 1000 |
Select all nodes with the mixin type 'mix:title' containing any word from the set {'brown','fox','jumps'}. Then, sort the result by score in ascending order. This way, nodes that match the query statement better are placed at the end of the result list.
SQL and XPath queries support both score constructions, jcr:score and jcr:score():
SELECT * FROM nt:base ORDER BY jcr:score [ASC|DESC]
SELECT * FROM nt:base ORDER BY jcr:score() [ASC|DESC]
//element(*,nt:base) order by jcr:score() [descending]
//element(*,nt:base) order by @jcr:score [descending]
Do not use "ascending" combined with jcr:score in XPath. The following XPath statement may throw an exception:
... order by jcr:score() ascending
Do not set any ordering specifier - ascending is default:
... order by jcr:score()
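When query statements are assembled in code, the rule above can be kept in one place. The following sketch is a hypothetical string-building helper, not part of any JCR API: it never emits the "ascending" keyword for XPath and relies on the default order instead. It does not escape quotes in the search terms.

```java
// Hypothetical helper that appends a score ordering to a query statement.
final class ScoreOrdering {
    // XPath: omit the specifier for ascending order, since ascending is the default.
    static String xpathOrderByScore(boolean descending) {
        return descending ? " order by jcr:score() descending" : " order by jcr:score()";
    }

    // SQL: both ASC and DESC may be written explicitly.
    static String sqlOrderByScore(boolean descending) {
        return " ORDER BY jcr:score() " + (descending ? "DESC" : "ASC");
    }

    // Full-text XPath statement over a node type, ordered by score.
    static String xpathFullText(String nodeType, String terms, boolean descending) {
        return "//element(*," + nodeType + ")[jcr:contains(., '" + terms + "')]"
            + xpathOrderByScore(descending);
    }

    public static void main(String[] args) {
        System.out.println(xpathFullText("mix:title", "brown OR fox OR jumps", false));
        System.out.println("SELECT * FROM mix:title" + sqlOrderByScore(false));
    }
}
```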
The repository contains mix:title nodes, where the jcr:description has different values.
root
document1 (mix:title) jcr:description="The quick brown fox jumps over the lazy dog."
document2 (mix:title) jcr:description="The brown fox lives in the forest."
document3 (mix:title) jcr:description="The fox is a nice animal."
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(*, 'brown OR fox OR jumps') ORDER BY jcr:score() ASC"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[jcr:contains(., 'brown OR fox OR jumps')] order by jcr:score()"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return nodes in the following order: "document3", "document2", "document1".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.32. Table content
jcr:description | ... | jcr:path | jcr:score |
---|---|---|---|
The fox is a nice animal. | ... | /document3 | 2512 |
The brown fox lives in the forest. | ... | /document2 | 3595 |
The quick brown fox jumps over the lazy dog. | ... | /document1 | 5017 |
Ordering by jcr:path or jcr:name is not supported.
There are two ways to order results when a path may be used as the criterion:
Order by a property with the value type NAME or PATH (JCR supports it)
Order by jcr:path or jcr:name - sort by the exact path or name of the node (JCR does not support it)
If no order specification is supplied in the query statement, implementations may support document order on the result nodes (see JSR-170 / 6.6.4.2 Document Order). In that case, the nodes are sorted by their order number.
By default (if the query does not contain any ordering statement), result nodes are sorted by document order.
SELECT * FROM nt:unstructured WHERE jcr:path LIKE 'testRoot/%'
Find all nodes containing a mixin type 'mix:title' and whose 'jcr:description' contains "forest" string.
The repository is filled with nodes of the mixin type 'mix:title' and different values of the 'jcr:description' property.
root
document1 (mix:title) jcr:description = "The quick brown fox jumps over the lazy dog."
document2 (mix:title) jcr:description = "The brown fox lives in a forest." // This is the node we want to find
document3 (mix:title) jcr:description = "The fox is a nice animal."
document4 (nt:unstructured) jcr:description = "There is the word forest, too."
SQL
// make SQL query
QueryManager queryManager = workspace.getQueryManager();
// we want to find documents that contain the word "forest"
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'forest')";
// create query
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// we want to find documents that contain the word "forest"
String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:description, 'forest')]";
// create query
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); if (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Find nodes with mixin type 'mix:title' where any property contains 'break' string.
The repository is filled with different nodes with the mixin type 'mix:title' and different values of the 'jcr:title' and 'jcr:description' properties.
root
document1 (mix:title) jcr:title ='Star Wars' jcr:description = 'Dart rules!!'
document2 (mix:title) jcr:title ='Prison break' jcr:description = 'Run, Forest, run ))'
document3 (mix:title) jcr:title ='Titanic' jcr:description = 'An iceberg breaks a ship.'
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(*,'break')"; // create query Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// we want to find nodes where any property contains 'break'
String xpathStatement = "//element(*,mix:title)[jcr:contains(.,'break')]";
// create query
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2" and "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.34. Table content
jcr:title | jcr:description | ... | jcr:path |
---|---|---|---|
Prison break | Run, Forest, run )) | ... | /document2 |
Titanic | An iceberg breaks a ship. | ... | /document3 |
In this example, we will create a new Analyzer, set it in the QueryHandler configuration, and run a query to check it.
The standard analyzer does not normalize accents like é, è, à, so a word like 'tréma' is stored in the index as 'tréma'. But what if we want to normalize such characters, so that the word 'tréma' is stored as 'trema'?
There are two ways of setting up a new Analyzer (no matter whether it is a standard one or our own):
The first way: Create a descendant class of SearchIndex with the new Analyzer (see Search Configuration);
With this approach there is only one way: create a new Analyzer (if no existing one suits our needs) and set it in the search index.
The second way: Register the new Analyzer in the QueryHandler configuration (this has been accepted since version 1.12);
We will use the last one:
Create new MyAnalyzer
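import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer
{
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader)
   {
      StandardTokenizer tokenStream = new StandardTokenizer(reader);
      // process all text with the standard filter:
      // removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms
      TokenStream result = new StandardFilter(tokenStream);
      // this filter normalizes token text to lower case
      result = new LowerCaseFilter(result);
      // this one replaces accented characters in the ISO Latin 1 character set (ISO-8859-1)
      // by their unaccented equivalents
      result = new ISOLatin1AccentFilter(result);
      // and finally return the token stream
      return result;
   }
}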
Then, register new MyAnalyzer in configuration
<workspace name="ws">
   ...
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
         ...
      </properties>
   </query-handler>
   ...
</workspace>
After that, check it with query:
Find node with mixin type 'mix:title' where 'jcr:title' contains "tréma" and "naïve" strings.
The repository is filled with nodes with the mixin type 'mix:title' and different values of the 'jcr:title' property.
root
node1 (mix:title) jcr:title = "tréma blabla naïve"
node2 (mix:title) jcr:title = "trema come text naive"
SQL
// make SQL query
QueryManager queryManager = workspace.getQueryManager();
// create query ('\u00E9' is 'é', '\u00EF' is 'ï')
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, 'tr\u00E9ma na\u00EFve')";
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// create query ('\u00E9' is 'é', '\u00EF' is 'ï')
String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:title, 'tr\u00E9ma na\u00EFve')]";
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "node1" and "node2". How is that possible? Remember that our MyAnalyzer transforms the word 'tréma' to 'trema', so node2 satisfies our constraints too.
Also, we can get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.35. Table content
jcr:title | ... | jcr:path |
---|---|---|
tréma blabla naïve | ... | /node1 |
trema come text naive | ... | /node2 |
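Why both nodes match can be illustrated without Lucene. The sketch below is a stand-in for the accent and lower-case filters, not the Lucene implementation: it uses java.text.Normalizer to decompose accented characters and strips the combining marks, so the stored text and the query terms fold to the same form. The class AccentFolding is hypothetical, and this approach does not cover every replacement ISOLatin1AccentFilter performs (ligatures such as 'æ', for example).

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for lower-casing plus accent removal at index and query time.
final class AccentFolding {
    static String fold(String text) {
        // NFD splits 'é' into 'e' followed by a combining acute accent.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks, then lower-case the remainder.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Both titles contain the folded query terms "trema" and "naive".
        System.out.println(fold("tr\u00E9ma blabla na\u00EFve"));
        System.out.println(fold("trema come text naive"));
    }
}
```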
The node type nt:file represents a file. It requires a single child node called jcr:content. This node type represents images and other binary content in a JCRWiki entry. The node type of jcr:content is nt:resource, which represents the actual content of a file.
Find nodes with the primary type 'nt:file' whose 'jcr:content' child node contains "cats".
Normally, we cannot find such nodes using just JCR SQL or XPath queries. But we can configure indexing so that nt:file aggregates its jcr:content child node.
So, change indexing-configuration.xml:
<?xml version="1.0"?> <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.2.dtd"> <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"> <aggregate primaryType="nt:file"> <include>jcr:content</include> <include>jcr:content/*</include> <include-property>jcr:content/jcr:lastModified</include-property> </aggregate> </configuration>
Now the content of 'nt:file' and 'jcr:content' ('nt:resource') nodes are concatenated in a single Lucene document. Then, we can make a fulltext search query by content of 'nt:file'; this search includes the content of child 'jcr:content' node.
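The effect of the aggregate can be pictured with a small model: the text of the child node is treated as part of the parent file's document, and the search term is looked up among its lower-cased words. This is a rough sketch under that assumption, not the Lucene indexing code; the class and method names are hypothetical, and the sample texts are the ones used in the example repository of this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model: each nt:file is searched through the text of its jcr:content child.
final class AggregateSearchModel {
    // File name -> text of the aggregated jcr:content/jcr:data property.
    static Map<String, String> files() {
        Map<String, String> contentByFile = new LinkedHashMap<>();
        contentByFile.put("document1", "The quick brown fox jumps over the lazy dog.");
        contentByFile.put("document2", "Dogs do not like cats.");
        contentByFile.put("document3", "Cats jumping high.");
        return contentByFile;
    }

    // Return the files whose aggregated text contains the word, ignoring case.
    static List<String> filesContaining(Map<String, String> contentByFile, String word) {
        String wanted = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentByFile.entrySet()) {
            String[] tokens = entry.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            if (Arrays.asList(tokens).contains(wanted)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(filesContaining(files(), "cats"));
    }
}
```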
Repository contains different nt:file nodes.
root
document1 (nt:file)
jcr:content (nt:resource) jcr:data = "The quick brown fox jumps over the lazy dog."
document2 (nt:file)
jcr:content (nt:resource) jcr:data = "Dogs do not like cats."
document3 (nt:file)
jcr:content (nt:resource) jcr:data = "Cats jumping high."
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:file WHERE CONTAINS(*,'cats')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:file)[jcr:contains(.,'cats')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes(); while (it.hasNext()) { Node foundNode = it.nextNode(); }
NodeIterator will return "document2" and "document3".
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
In this example, we will set different boost values for predefined nodes and check the effect by selecting those nodes and ordering them by jcr:score.
The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.
See 4.2.2 Index Boost Value Search Configuration
In the next configuration, we will set boost values for the 'text' property of nt:unstructured nodes.
indexing-config.xml:
<!-- This rule actually does nothing. The 'text' property keeps the default boost value. -->
<index-rule nodeType="nt:unstructured" condition="@rule='boost1'">
   <!-- default boost: 1.0 -->
   <property>text</property>
</index-rule>
<!-- Set the boost value to 2.0 for the 'text' property in nt:unstructured nodes where the property 'rule' equals 'boost2' -->
<index-rule nodeType="nt:unstructured" condition="@rule='boost2'">
   <!-- boost: 2.0 -->
   <property boost="2.0">text</property>
</index-rule>
<!-- Set the boost value to 3.0 for the 'text' property in nt:unstructured nodes where the property 'rule' equals 'boost3' -->
<index-rule nodeType="nt:unstructured" condition="@rule='boost3'">
   <!-- boost: 3.0 -->
   <property boost="3.0">text</property>
</index-rule>
The repository contains many nodes with the primary type nt:unstructured. Each node contains a 'text' property and a 'rule' property with different values.
root
node1(nt:unstructured) rule='boost1' text='The quick brown fox jump...'
node2(nt:unstructured) rule='boost2' text='The quick brown fox jump...'
node3(nt:unstructured) rule='boost3' text='The quick brown fox jump...'
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(text, 'quick') ORDER BY jcr:score() DESC"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(@text, 'quick')] order by @jcr:score descending"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
In this example, we will exclude the 'text' property of some nt:unstructured nodes from the node scope index. Therefore, such a node will not be found by the content of this property, even if it satisfies all constraints.
First of all, add rules to indexing-configuration.xml:
<index-rule nodeType="nt:unstructured" condition="@rule='nsiTrue'"> <!-- default value for nodeScopeIndex is true --> <property>text</property> </index-rule> <index-rule nodeType="nt:unstructured" condition="@rule='nsiFalse'"> <!-- do not include text in node scope index --> <property nodeScopeIndex="false">text</property> </index-rule>
The repository contains nt:unstructured nodes with the same 'text' property and different 'rule' properties (even null)
root
node1 (nt:unstructured) rule="nsiTrue" text="The quick brown fox ..."
node2 (nt:unstructured) rule="nsiFalse" text="The quick brown fox ..."
node3 (nt:unstructured) text="The quick brown fox ..." // as you see, this node is not mentioned in indexing-configuration
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(*,'quick')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'quick')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes();
if (it.hasNext()) {
  Node foundNode = it.nextNode();
}
NodeIterator will return "node1" and "node3". As you can see, "node2" is not in the result set.
Also, we can get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.37. Table content
jcr:primarytype | jcr:path | jcr:score |
---|---|---|
nt:unstructured | /node1 | 3806 |
nt:unstructured | /node3 | 3806 |
In this example, we want to configure indexing as follows: all properties of nt:unstructured nodes must be excluded from search, except properties whose names end with the string 'Text'. First of all, add rules to indexing-configuration.xml:
<index-rule nodeType="nt:unstructured">
  <property isRegexp="true">.*Text</property>
</index-rule>
Now, let's check this rule with a simple query: select all nodes with primary type 'nt:unstructured' that contain the string 'quick' (fulltext search over the whole node).
The repository contains nt:unstructured nodes with different 'Text'-like property names:
root
node1 (nt:unstructured) Text="The quick brown fox ..."
node2 (nt:unstructured) OtherText="The quick brown fox ..."
node3 (nt:unstructured) Textle="The quick brown fox ..."
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:unstructured WHERE CONTAINS(*,'quick')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'quick')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes();
if (it.hasNext()) {
  Node foundNode = it.nextNode();
}
NodeIterator will return "node1" and "node2". As you can see, "node3" is not in the result set.
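Why "Textle" is skipped can be checked outside the repository. The following is a minimal sketch, assuming that the expression in the rule has to match the whole property name (this is how Pattern.matcher(...).matches() behaves in Java). The class name RegexpRuleCheck is purely illustrative and is not part of eXo JCR.

```java
import java.util.regex.Pattern;

// Illustrative helper, not an eXo JCR class.
final class RegexpRuleCheck {
    // Same expression as in the index rule above.
    static final Pattern RULE = Pattern.compile(".*Text");

    /** Returns true if the whole property name matches the rule expression. */
    static boolean isIndexed(String propertyName) {
        return RULE.matcher(propertyName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Text", "OtherText", "Textle"}) {
            System.out.println(name + " -> " + isIndexed(name));
        }
    }
}
```

Under this assumption, "Text" and "OtherText" match, while "Textle" does not, which agrees with the result set described above.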
Also, we can get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is:
Table 1.38. Table content
jcr:primarytype | jcr:path | jcr:score |
---|---|---|
nt:unstructured | /node1 | 3806 |
nt:unstructured | /node2 | 3806 |
Highlighting the terms matched by a fulltext search is also called excerption (see Excerpt configuration in Search Configuration and in the Searching Repository article).
The goal of this query is to find the words "eXo" and "implementation" with a fulltext search and highlight these words in the result value.
Highlighting is not enabled by default, so we must enable it in jcr-config.xml. An excerpt provider must also be defined:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    ...
    <property name="support-highlighting" value="true" />
    <property name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.WeightedHTMLExcerpt"/>
    ...
  </properties>
</query-handler>
Also, remember that we can define indexing rules, as in the example below:
Let's write a rule for all nodes with primary node type 'nt:unstructured' whose 'rule' property equals the string "excerpt". For those nodes, we will exclude the "title" property from highlighting and make the "text" property highlightable. indexing-configuration.xml must contain the following rule:
<index-rule nodeType="nt:unstructured" condition="@rule='excerpt'"> <property useInExcerpt="false">title</property> <property>text</property> </index-rule>
We have a single node with primary type 'nt:unstructured':
document (nt:unstructured)
rule = "excerpt"
title = "eXoJCR"
text = "eXo is a JCR implementation"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT rep:excerpt() FROM nt:unstructured WHERE CONTAINS(*, 'eXo implementation')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,nt:unstructured)[jcr:contains(., 'eXo implementation')]/rep:excerpt(.)"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Now let's look at the result table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
Table content is
Table 1.39. Table content
rep:excerpt() | jcr:path | jcr:score |
---|---|---|
<div><span><strong>eXo</strong> is a JCR <strong>implementation</strong></span></div> | /testroot/node1 | 335 |
As you can see, the words "eXo" and "implementation" are highlighted.
We can also get the "rep:excerpt" value directly:
RowIterator rows = result.getRows();
Value excerpt = rows.nextRow().getValue("rep:excerpt(.)");
// excerpt will be equal to "<div><span><strong>eXo</strong> is a JCR <strong>implementation</strong></span></div>"
Find all mix:title nodes whose title contains synonyms of the word 'fast'.
See also the synonym provider configuration in Searching Repository Content.
The synonym provider must be configured in the query-handler element of the JCR configuration:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> ... <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" /> <property name="synonymprovider-config-path" value="../../synonyms.properties" /> ... </properties> </query-handler>
The file synonyms.properties contains the following list of synonyms:
ASF=Apache Software Foundation
quick=fast
sluggish=lazy
Repository contains mix:title nodes, where jcr:title has different values.
root
document1 (mix:title) jcr:title="The quick brown fox jumps over the lazy dog."
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, '~fast')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "//element(*,mix:title)[jcr:contains(@jcr:title, '~fast')]"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Check the spelling of the phrase 'quik OR (-foo bar)' against the data already stored in the index.
See also the SpellChecker configuration in Searching Repository Content.
The SpellChecker must be set in the query-handler configuration.
test-jcr-config.xml:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> ... <property name="spellchecker-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$FiveSecondsRefreshInterval" /> ... </properties> </query-handler>
The repository contains a node with the string property "The quick brown fox jumps over the lazy dog."
root
node1 property="The quick brown fox jumps over the lazy dog."
The query selects only the root node, because the spell checker derives its suggestions from the whole index. A more complicated query would be redundant.
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('quik OR (-foo bar)')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query QueryManager queryManager = workspace.getQueryManager(); // create query String xpathStatement = "/jcr:root[rep:spellcheck('quik OR (-foo bar)')]/(rep:spellcheck())"; Query query = queryManager.createQuery(xpathStatement, Query.XPATH); // execute query and fetch result QueryResult result = query.execute();
Find nodes similar to the node at path '/baseFile/jcr:content'.
In our example, baseFile contains text in which the word "terms" occurs many times. That is why the presence of this word is used as the criterion of node similarity (for node baseFile).
See also the Similarity configuration in Searching Repository Content.
Highlighting support must be added to the configuration. test-jcr-config.xml:
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> ... <property name="support-highlighting" value="true" /> ... </properties> </query-handler>
The repository contains many nt:file nodes:
root
baseFile (nt:file)
jcr:content (nt:resource) jcr:data="Similarity is determined by looking up terms that are common to nodes. There are some conditions that must be met for a term to be considered. This is required to limit the number possibly relevant terms. Only terms with at least 4 characters are considered. Only terms that occur at least 2 times in the source node are considered. Only terms that occur in at least 5 nodes are considered."
target1 (nt:file)
jcr:content (nt:resource) jcr:data="Similarity is determined by looking up terms that are common to nodes."
target2 (nt:file)
jcr:content (nt:resource) jcr:data="There is no you know what"
target3 (nt:file)
jcr:content (nt:resource) jcr:data=" Terms occures here"
SQL
// make SQL query QueryManager queryManager = workspace.getQueryManager(); // create query String sqlStatement = "SELECT * FROM nt:resource WHERE SIMILAR(.,'/baseFile/jcr:content')"; Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
XPath
// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// create query
String xpathStatement = "//element(*, nt:resource)[rep:similar(., '/baseFile/jcr:content')]";
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
Let's get nodes:
NodeIterator it = result.getNodes();
if (it.hasNext()) {
  Node foundNode = it.nextNode();
}
NodeIterator will return "/baseFile/jcr:content", "/target1/jcr:content" and "/target3/jcr:content".
As you can see, the base node is also in the result set.
We can also get a table:
String[] columnNames = result.getColumnNames(); RowIterator rit = result.getRows(); while (rit.hasNext()) { Row row = rit.nextRow(); // get values of the row Value[] values = row.getValues(); }
The table content is
Table 1.40. Table content
jcr:path | ... | jcr:score |
---|---|---|
/baseFile/jcr:content | ... | 2674 |
/target1/jcr:content | ... | 2674 |
/target3/jcr:content | ... | 2674 |
If you execute an XPath request like this:
XPath
// get QueryManager QueryManager queryManager = workspace.getQueryManager(); // make XPath query Query query = queryManager.createQuery("/jcr:root/Documents/Publie/2010//element(*, exo:article)", Query.XPATH);
You will get an error: "Invalid request". This happens because XML does not allow names starting with a number, and XPath is part of XML: http://www.w3.org/TR/REC-xml/#NT-Name
Therefore, you cannot do XPath requests using a node name that starts with a number.
Easy workarounds:
Use an SQL request.
Use escaping:
XPath
// get QueryManager QueryManager queryManager = workspace.getQueryManager(); // make XPath query Query query = queryManager.createQuery("/jcr:root/Documents/Publie/_x0032_010//element(*, exo:article)", Query.XPATH);
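The escaped name above follows the pattern _xHHHH_, where HHHH is the hexadecimal code of the offending character ('2' is 0x32). The following is a minimal sketch that applies this pattern to a leading digit only; the class and method names are hypothetical and not part of eXo JCR, and a complete ISO 9075 style encoder would have to handle other illegal characters as well.

```java
// Illustrative helper, not an eXo JCR class. Handles only a leading ASCII digit.
final class XPathNameEscaper {

    /** Replaces a leading digit with _xHHHH_ (HHHH = hex code of the digit). */
    static String escapeLeadingDigit(String name) {
        if (name.isEmpty()) {
            return name;
        }
        char first = name.charAt(0);
        if (first < '0' || first > '9') {
            return name;
        }
        return String.format("_x%04x_", (int) first) + name.substring(1);
    }

    /** Escapes every segment of a slash-separated path. */
    static String escapePath(String path) {
        String[] segments = path.split("/", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(escapeLeadingDigit(segments[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints /jcr:root/Documents/Publie/_x0032_010
        System.out.println(escapePath("/jcr:root/Documents/Publie/2010"));
    }
}
```

The escaped path can then be concatenated with the rest of the XPath statement before it is passed to createQuery().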
You can find the JCR configuration file at .../portal/WEB-INF/conf/jcr/repository-configuration.xml. Please read also Search Configuration for more information about index configuration.
QueryResult.getNodes() will return a bi-directional NodeIterator implementation.
Bi-directional NodeIterator is not supported in two cases:
SQL query: select * from nt:base
XPath query: //* .
TwoWayRangeIterator interface:
/**
 * Skip a number of elements in the iterator.
 *
 * @param skipNum the non-negative number of elements to skip
 * @throws java.util.NoSuchElementException if skipped past the first element
 *           in the iterator.
 */
public void skipBack(long skipNum);
Usage:
NodeIterator iter = queryResult.getNodes();
while (iter.hasNext()) {
  if (skipForward) {
    iter.skip(10); // Skip 10 nodes in forward direction
  } else if (skipBack) {
    TwoWayRangeIterator backIter = (TwoWayRangeIterator) iter;
    backIter.skipBack(10); // Skip 10 nodes back
  }
  .......
}
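To make the skip() and skipBack() semantics concrete without a running repository, here is a small list-backed sketch. It is only a model: TwoWayCursor is a hypothetical class, and it assumes that both methods move the position relative to the current one and throw NoSuchElementException when they would leave the range, as the Javadoc above suggests for skipBack().

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative model of a two-way range iterator, not the eXo JCR implementation.
final class TwoWayCursor<T> {
    private final List<T> items;
    private int position; // index of the element that next() will return

    TwoWayCursor(List<T> items) {
        this.items = items;
    }

    boolean hasNext() {
        return position < items.size();
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }

    /** Moves forward; fails if that would pass the last element. */
    void skip(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > items.size() - position) {
            throw new NoSuchElementException();
        }
        position += (int) skipNum;
    }

    /** Moves backward; fails if that would pass the first element. */
    void skipBack(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > position) {
            throw new NoSuchElementException();
        }
        position -= (int) skipNum;
    }

    /** Skips 3 forward, reads one element, skips 2 back, reads again. */
    static String demo() {
        TwoWayCursor<String> cursor = new TwoWayCursor<>(Arrays.asList("a", "b", "c", "d", "e"));
        cursor.skip(3);
        String first = cursor.next();   // "d"
        cursor.skipBack(2);
        String second = cursor.next();  // "c"
        return first + "," + second;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```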
JCR supports Lucene fuzzy searches (see Apache Lucene - Query Parser Syntax).
To use this feature, form a query like the one below:
QueryManager qman = session.getWorkspace().getQueryManager(); Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL); QueryResult res = q.execute();
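Fuzzy matching of this kind is based on edit distance between terms. As an illustration only, the sketch below computes the classic Levenshtein distance with the standard dynamic programming approach; it assumes plain Levenshtein distance and does not reproduce Lucene's scoring or its similarity threshold. The class name EditDistance is hypothetical.

```java
// Illustrative helper, not part of Lucene or eXo JCR.
final class EditDistance {

    /** Returns the number of single-character insertions, deletions and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("quik", "quick"));
    }
}
```

A term such as 'quik~' can therefore match "quick", which is one edit away.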
Searching with synonyms is integrated in the jcr:contains() function and uses the same syntax as synonym searches in Google. If a search term is prefixed by a tilde symbol ( ~ ), synonyms of the search term are also taken into consideration. For example:
SQL: select * from nt:resource where contains(., '~parameter')
XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')]
This feature is disabled by default and you need to add a configuration parameter to the query-handler element in your jcr configuration file to enable it.
<param name="synonymprovider-config-path" value="..you path to configuration file....."/> <param name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/>
/** * <code>SynonymProvider</code> defines an interface for a component that * returns synonyms for a given term. */ public interface SynonymProvider { /** * Initializes the synonym provider and passes the file system resource to * the synonym provider configuration defined by the configuration value of * the <code>synonymProviderConfigPath</code> parameter. The resource may be * <code>null</code> if the configuration parameter is not set. * * @param fsr the file system resource to the synonym provider * configuration. * @throws IOException if an error occurs while initializing the synonym * provider. */ public void initialize(InputStream fsr) throws IOException; /** * Returns an array of terms that are considered synonyms for the given * <code>term</code>. * * @param term a search term. * @return an array of synonyms for the given <code>term</code> or an empty * array if no synonyms are known. */ public String[] getSynonyms(String term); }
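A provider of this shape can be sketched with java.util.Properties. The class below deliberately does not implement the real SynonymProvider interface, so that it compiles without eXo JCR on the classpath; SimpleSynonymLookup and fromString are hypothetical names. It also assumes that synonyms are treated as bidirectional (so that quick=fast lets '~fast' find "quick"), which is an assumption about the bundled provider rather than something stated here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in with the same two operations as SynonymProvider.
final class SimpleSynonymLookup {
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    /** Loads term=synonym pairs in properties format. */
    void initialize(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            add(key, value);
            add(value, key); // assumption: synonyms work in both directions
        }
    }

    private void add(String term, String synonym) {
        synonyms.computeIfAbsent(term.toLowerCase(), k -> new TreeSet<>())
                .add(synonym.toLowerCase());
    }

    /** Returns the known synonyms of the term, or an empty array. */
    String[] getSynonyms(String term) {
        Set<String> found = synonyms.get(term.toLowerCase());
        return found == null ? new String[0] : found.toArray(new String[0]);
    }

    /** Convenience factory for examples and tests. */
    static SimpleSynonymLookup fromString(String content) {
        SimpleSynonymLookup lookup = new SimpleSynonymLookup();
        try (InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.ISO_8859_1))) {
            lookup.initialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lookup;
    }

    public static void main(String[] args) {
        SimpleSynonymLookup lookup = fromString("quick=fast\nsluggish=lazy");
        System.out.println(String.join(",", lookup.getSynonyms("fast")));
    }
}
```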
An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.
By default, highlighting the words that matched the query is disabled, because this feature requires additional information to be written to the search index. To enable it, add the following configuration parameter to the query-handler element in your JCR configuration file:
<param name="support-highlighting" value="true"/>
Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is set to org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. The configuration parameter for this setting is:
<param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/>
This excerpt provider creates an XML fragment of the following form:
<excerpt> <fragment> <highlight>exoplatform</highlight> implements both the mandatory XPath and optional SQL <highlight>query</highlight> syntax. </fragment> <fragment> Before parsing the XPath <highlight>query</highlight> in <highlight>exoplatform</highlight>, the statement is surrounded </fragment> </excerpt>
The DefaultHTMLExcerpt provider creates an HTML fragment of the following form:
<div> <span> <strong>exoplatform</strong> implements both the mandatory XPath and optional SQL <strong>query</strong> syntax. </span> <span> Before parsing the XPath <strong>query</strong> in <strong>exoplatform</strong>, the statement is surrounded </span> </div>
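The shape of the HTML excerpt can be imitated with a few lines of plain Java. This is only a sketch of the output format: the real providers work on the indexed terms and split the text into fragments, whereas the hypothetical HtmlExcerptSketch below simply wraps whole-word matches in one span and does not escape HTML in the input.

```java
import java.util.regex.Pattern;

// Illustrative helper, not the DefaultHTMLExcerpt implementation.
final class HtmlExcerptSketch {

    /** Wraps every whole-word occurrence of the given terms in <strong> tags. */
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return "<div><span>" + text + "</span></div>";
        }
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        String marked = pattern.matcher(text).replaceAll("<strong>$1</strong>");
        return "<div><span>" + marked + "</span></div>";
    }

    public static void main(String[] args) {
        System.out.println(highlight("eXo is a JCR implementation", "eXo", "implementation"));
    }
}
```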
If you are using XPath, you must use the rep:excerpt() function in the last location step, just like you would select properties:
QueryManager qm = session.getWorkspace().getQueryManager(); Query q = qm.createQuery("//*[jcr:contains(., 'exoplatform')]/(@Title|rep:excerpt(.))", Query.XPATH); QueryResult result = q.execute(); for (RowIterator it = result.getRows(); it.hasNext(); ) { Row r = it.nextRow(); Value title = r.getValue("Title"); Value excerpt = r.getValue("rep:excerpt(.)"); }
The above code searches for nodes that contain the word exoplatform and then gets the value of the Title property and an excerpt for each result node.
It is also possible to use a relative path in the call Row.getValue() while the query statement still remains the same. Also, you may use a relative path to a string property. The returned value will then be an excerpt based on string value of the property.
Both available excerpt providers create up to 3 fragments of about 150 characters each.
In SQL, the function is called excerpt() without the rep prefix, but the column in the RowIterator will nonetheless be labeled rep:excerpt(.)!
QueryManager qm = session.getWorkspace().getQueryManager(); Query q = qm.createQuery("select excerpt(.) from nt:resource where contains(., 'exoplatform')", Query.SQL); QueryResult result = q.execute(); for (RowIterator it = result.getRows(); it.hasNext(); ) { Row r = it.nextRow(); Value excerpt = r.getValue("rep:excerpt(.)"); }
The Lucene-based query handler implementation supports a pluggable spell checker mechanism. By default, spell checking is not available and you have to configure it first. See the parameter spellCheckerClass on the Search Configuration page. JCR currently provides one implementation class, org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, which uses the lucene-spellchecker contrib module. The dictionary is derived from the fulltext indexed content of the workspace and updated periodically. You can configure the refresh interval by picking one of the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker:
OneMinuteRefreshInterval
FiveMinutesRefreshInterval
ThirtyMinutesRefreshInterval
OneHourRefreshInterval
SixHoursRefreshInterval
TwelveHoursRefreshInterval
OneDayRefreshInterval
For example, if you want a refresh interval of six hours, the class name is: org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval. If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour.
The spell checker dictionary is stored as a lucene index under "index-dir"/spellchecker. If it does not exist, a background thread will create it on startup. Similarly, the dictionary refresh is also done in a background thread to not block regular queries.
You can spell check a fulltext statement either with an XPath or a SQL query:
// rep:spellcheck('explatform') will always evaluate to true
Query query = qm.createQuery("/jcr:root[rep:spellcheck('explatform')]/(rep:spellcheck())", Query.XPATH);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
  // no suggestion returned, the spelling is correct or the spell checker
  // does not know how to correct it.
} else {
  String suggestion = v.getString();
}
And the same using SQL:
// SPELLCHECK('explatform') will always evaluate to true
Query query = qm.createQuery("SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('explatform')", Query.SQL);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
  // no suggestion returned, the spelling is correct or the spell checker
  // does not know how to correct it.
} else {
  String suggestion = v.getString();
}
Starting with version 1.12, JCR allows you to search for nodes that are similar to an existing node.
Similarity is determined by looking up terms that are common to nodes. There are some conditions that must be met for a term to be considered. This is required to limit the number of possibly relevant terms.
Only terms with at least 4 characters are considered.
Only terms that occur at least 2 times in the source node are considered.
Only terms that occur in at least 5 nodes are considered.
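The three conditions above can be expressed as a simple filter. The sketch below is an illustration under stated assumptions: it uses a naive tokenizer, takes the per-term node counts as a ready-made map, and the names SimilarityTermFilter, relevantTerms and demo are hypothetical. It is not the query handler's actual code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative filter applying the three term conditions listed above.
final class SimilarityTermFilter {
    static final int MIN_TERM_LENGTH = 4;      // at least 4 characters
    static final int MIN_SOURCE_FREQUENCY = 2; // at least 2 occurrences in the source node
    static final int MIN_NODE_FREQUENCY = 5;   // occurs in at least 5 nodes

    /**
     * @param sourceText    text of the source node
     * @param nodeFrequency for each term, the number of nodes that contain it
     */
    static Set<String> relevantTerms(String sourceText, Map<String, Integer> nodeFrequency) {
        Map<String, Integer> sourceFrequency = new HashMap<>();
        for (String token : sourceText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                sourceFrequency.merge(token, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : sourceFrequency.entrySet()) {
            String term = entry.getKey();
            if (term.length() >= MIN_TERM_LENGTH
                    && entry.getValue() >= MIN_SOURCE_FREQUENCY
                    && nodeFrequency.getOrDefault(term, 0) >= MIN_NODE_FREQUENCY) {
                result.add(term);
            }
        }
        return result;
    }

    /** Small fixed example: only "terms" passes all three conditions. */
    static Set<String> demo() {
        Map<String, Integer> nodeFrequency = new HashMap<>();
        nodeFrequency.put("terms", 5);
        nodeFrequency.put("nodes", 4); // too few nodes
        nodeFrequency.put("the", 9);   // too short
        return relevantTerms("Terms are terms, and nodes are nodes; the the", nodeFrequency);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```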
Note: The similarity functionality requires that highlighting support is enabled. Please make sure that you have the following parameter set for the query handler in your workspace.xml.
<param name="support-highlighting" value="true"/>
The functions are called rep:similar() (in XPath) and similar() (in SQL) and have two arguments:
relativePath: a relative path to a descendant node or . for the current node.
absoluteStringPath: a string literal that contains the path to the node for which to find similar nodes.
Relative path is not supported yet.
Examples:
//element(*, nt:resource)[rep:similar(., '/parentnode/node.txt/jcr:content')]
Finds nt:resource nodes that are similar to the node at path /parentnode/node.txt/jcr:content.
Each property of a node (if it is indexable) is processed by a Lucene analyzer and stored in the Lucene index. This is called indexing of a property. After that, we can perform a fulltext search among these indexed properties.
The purpose of analyzers is to transform all strings stored in the index into a well-defined form. The same analyzer(s) is/are used when searching, in order to adapt the query string to the form stored in the index.
Therefore, performing the same query using different analyzers can return different results.
Now, let's see how the same string is transformed by different analyzers.
Table 1.41. "The quick brown fox jumped over the lazy dogs"
Analyzer | Parsed |
---|---|
org.apache.lucene.analysis.WhitespaceAnalyzer | [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
org.apache.lucene.analysis.SimpleAnalyzer | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
org.apache.lucene.analysis.StopAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
org.apache.lucene.analysis.standard.StandardAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [quick] [brown] [fox] [jump] [over] [lazi] [dog] |
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - jcr default analyzer) | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
Table 1.42. "XY&Z Corporation - xyz@example.com"
Analyzer | Parsed |
---|---|
org.apache.lucene.analysis.WhitespaceAnalyzer | [XY&Z] [Corporation] [-] [xyz@example.com] |
org.apache.lucene.analysis.SimpleAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
org.apache.lucene.analysis.StopAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
org.apache.lucene.analysis.standard.StandardAnalyzer | [xy&z] [corporation] [xyz@example] [com] |
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [xy&z] [corpor] [xyz@exampl] [com] |
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop word - jcr default analyzer) | [xy&z] [corporation] [xyz@example] [com] |
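The first three rows of both tables can be approximated in plain Java, which may help to see what each analyzer does. This is a rough sketch without Lucene: whitespace() splits on whitespace only, simple() splits on non-letters and lowercases, and stop() additionally removes stop words. The stop list here is a tiny illustrative subset and the class name AnalyzerSketch is hypothetical; StandardAnalyzer and SnowballAnalyzer are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative approximation of three Lucene analyzers, written without Lucene.
final class AnalyzerSketch {
    // Tiny illustrative stop list; the list used by Lucene is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the"));

    /** Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation. */
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Like SimpleAnalyzer: split on non-letters and lowercase. */
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Like StopAnalyzer: SimpleAnalyzer behavior plus stop word removal. */
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "XY&Z Corporation - xyz@example.com";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
        System.out.println(stop("The quick brown fox jumped over the lazy dogs"));
    }
}
```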
StandardAnalyzer is the default analyzer in eXo's JCR search engine, but it is configured without stop words.
You can assign your own analyzer as described in Search Configuration.
Different property types are indexed in different ways, which determines whether a property can be searched by fulltext search on that exact property.
Only two property types are indexed as fulltext searchable: STRING and BINARY.
Table 1.43. Fulltext search by different properties
Property Type | Fulltext search by all properties | Fulltext search by exact property |
---|---|---|
STRING | YES | YES |
BINARY | YES | NO |
For example, we have the property jcr:data (it is BINARY). It is stored correctly, but you will never find any string with a query like:
SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string')
This is because BINARY is not searchable by fulltext search on the exact property.
However, the next query will return a result (of course, only if the node contains the searched data):
SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string')
First of all, we will fill the repository with nodes of mixin type 'mix:title' and different values of the 'jcr:description' property.
root
document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs"
document2 (mix:title) jcr:description = "Brown fox live in forest."
document3 (mix:title) jcr:description = "Fox is a nice animal."
Let's look closer at the effect of analyzers. In the first case, we use the base JCR settings, so, as mentioned above, the string "The quick brown fox jumped over the lazy dogs" will be transformed to the set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] }
// make SQL query QueryManager queryManager = workspace.getQueryManager(); String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')"; // create query Query query = queryManager.createQuery(sqlStatement, Query.SQL); // execute query and fetch result QueryResult result = query.execute();
NodeIterator will return "document1".
Now change the default analyzer to org.apache.lucene.analysis.StopAnalyzer. Fill the repository again (the new analyzer must process the node properties) and run the same query again. It will return nothing, because stop words like "the" are excluded from the parsed string set.
The eXo JCR implementation offers an extended feature beyond the JCR specification. Sometimes a JCR node has hundreds or even thousands of child nodes. This situation is strongly discouraged for content repository data storage, but it sometimes occurs. To help deal with huge child lists, they can now be iterated in a "lazy" manner, which improves performance and reduces RAM usage.
The lazy child nodes iteration feature is accessible via the extended interface org.exoplatform.services.jcr.core.ExtendedNode, which extends javax.jcr.Node. It provides a single new method, shown below:
/**
 * Returns a NodeIterator over all child Nodes of this Node. Does not include properties
 * of this Node. If this node has no child nodes, then an empty iterator is returned.
 *
 * @return A NodeIterator over all child Nodes of this <code>Node</code>.
 * @throws RepositoryException If an error occurs.
 */
public NodeIterator getNodesLazily() throws RepositoryException;
From the point of view of an end user or client application, getNodesLazily() works similarly to the JCR-specified getNodes(), returning a NodeIterator. The "lazy" iterator supports the same set of features as an ordinary NodeIterator, including skip() but excluding remove(). The "lazy" implementation reads from the DB by pages. Each time it has no more elements stored in memory, it reads the next set of items from the persistent layer. This set is called a "page". The getNodesLazily feature fully supports the session and transaction changes log, so it is a functionally complete analogue of the specified getNodes() operation. Therefore, when dealing with a huge list of child nodes, getNodes() can be simply and safely substituted with getNodesLazily().
JCR offers an experimental option to replace all getNodes() invocations with getNodesLazily() calls. It handles a boolean system property named "org.exoplatform.jcr.forceUserGetNodesLazily" that internally replaces one call with the other, without any code changes. Be sure to use it only for development purposes. This feature can be used with top-level products built on eXo JCR to perform quick compatibility and performance tests without changing any code. It is not recommended as a production solution.
To enable it, add "-Dorg.exoplatform.jcr.forceUserGetNodesLazily=true" to the Java system properties.
The "lazy" iterator reads the child nodes into memory "page" after "page". In this context, a "page" is a set of nodes that is read at once. The page size is 100 nodes by default and can be configured through the workspace container configuration using the "lazy-node-iterator-page-size" parameter. For example:
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr" /> <property name="multi-db" value="true" /> <property name="max-buffer-size" value="200k" /> <property name="swap-directory" value="target/temp/swap/ws" /> <property name="lazy-node-iterator-page-size" value="50" /> ... </properties>
It's not recommended to configure a large number for the page size.
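The page-by-page reading described above can be modeled with a small generic iterator. This is a sketch of the general "lazy" pattern only, not the eXo JCR implementation: PagedIterator, its page loader function and the demo method are hypothetical, and the in-memory list stands in for the persistent layer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Illustrative page-by-page iterator; the loader stands in for a database read.
final class PagedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> pageLoader; // (offset, pageSize) -> page
    private final int pageSize;
    private List<T> page = new ArrayList<>();
    private int indexInPage = 0;
    private int offset = 0;
    private boolean exhausted = false;
    int pagesLoaded = 0; // exposed so the example can show how many reads happened

    PagedIterator(BiFunction<Integer, Integer, List<T>> pageLoader, int pageSize) {
        this.pageLoader = pageLoader;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // Nothing left in memory: read the next page from storage.
        page = pageLoader.apply(offset, pageSize);
        pagesLoaded++;
        offset += page.size();
        indexInPage = 0;
        if (page.size() < pageSize) {
            exhausted = true; // a short page means there is nothing more to read
        }
        return !page.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(indexInPage++);
    }

    /** Iterates over 'total' items and returns {items seen, pages loaded}. */
    static int[] demo(int total, int pageSize) {
        List<Integer> storage = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            storage.add(i);
        }
        PagedIterator<Integer> it = new PagedIterator<>(
                (off, size) -> new ArrayList<>(
                        storage.subList(Math.min(off, total), Math.min(off + size, total))),
                pageSize);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return new int[] {count, it.pagesLoaded};
    }

    public static void main(String[] args) {
        int[] result = demo(250, 100);
        System.out.println(result[0] + " items in " + result[1] + " pages");
    }
}
```

In this model, at most one page is held in memory at a time, so a larger page size means fewer reads but more memory per read.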
The current "lazy" child nodes iterator supports caching: pages are cached atomically in a safe and optimized way. The cache is always kept in a consistent state by invalidation when the child list changes. Take into account the following difference between getNodes and getNodesLazily. The specification-defined getNodes method reads the whole list of nodes, so child items added after the invocation will never be in the results. getNodesLazily does not acquire the full list of nodes, so child items added after iterator creation can appear in the result. In that sense, getNodesLazily can return a kind of "real-time" result. However, this depends heavily on numerous conditions and should not be relied on as a feature; it is an implementation-specific side effect typical of the "lazy" pattern.
The WebDAV protocol enables you to use third-party tools to communicate with hierarchical content servers via HTTP. It is possible to add and remove documents or a set of documents from a path on the server. DeltaV is an extension of the WebDAV protocol that allows managing document versioning. Locking guarantees protection against concurrent access when writing resources. The ordering support allows changing the position of a resource in the list and sorting the directory so that the directory tree can be viewed conveniently. The full-text search makes it easy to find the necessary documents. You can search by using two languages: SQL and XPath.
In eXo JCR, we plug in the WebDAV layer (based on the code taken from the extension modules of the reference implementation) on top of our JCR implementation, so that it is possible to browse a workspace using third-party tools (Windows folders or Mac ones, as well as a Java WebDAV client such as DAVExplorer, or IE using File->Open as a Web Folder).
WebDAV is now an extension of the REST service. To get the WebDAV server ready, you must deploy the REST application. Then, you can access any workspace of your repository by using the following URL:
Standalone mode:
http://host:port/rest/jcr/{RepositoryName}/{WorkspaceName}/{Path}
Portal mode:
http://host:port/portal/rest/private/jcr/{RepositoryName}/{WorkspaceName}/{Path}
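The two URL templates above can be filled in programmatically. The helper below is a hypothetical sketch (WebDavUrls is not an eXo class); it only concatenates the parts and does not URL-encode repository, workspace or path segments.

```java
// Illustrative helper that fills in the two URL templates shown above.
final class WebDavUrls {

    /** Standalone mode: http://host:port/rest/jcr/{RepositoryName}/{WorkspaceName}/{Path} */
    static String standalone(String host, int port, String repository, String workspace, String path) {
        return "http://" + host + ":" + port + "/rest/jcr/"
                + repository + "/" + workspace + normalize(path);
    }

    /** Portal mode: http://host:port/portal/rest/private/jcr/{RepositoryName}/{WorkspaceName}/{Path} */
    static String portal(String host, int port, String repository, String workspace, String path) {
        return "http://" + host + ":" + port + "/portal/rest/private/jcr/"
                + repository + "/" + workspace + normalize(path);
    }

    // Makes sure a non-empty path starts with exactly one slash; no URL encoding is done.
    private static String normalize(String path) {
        if (path == null || path.isEmpty()) {
            return "";
        }
        return path.startsWith("/") ? path : "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(standalone("localhost", 8080, "repository", "production", ""));
        System.out.println(portal("localhost", 8080, "repository", "collaboration", "Documents"));
    }
}
```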
When accessing the WebDAV server with the URL http://localhost:8080/rest/jcr/repository/production, you might also use "collaboration" (instead of "production"), which is the default workspace in eXo products. You will be asked to enter your login and password. These will then be checked by the organization service, which can be implemented by an InMemory (dummy) module, a DB module or an LDAP one, and the JCR user session will be created with the correct JCR Credentials.
If you try the "in ECM" option, add "@ecm" to the user's password. Alternatively, you may modify jaas.conf by adding the domain=ecm option as follows:
exo-domain { org.exoplatform.services.security.jaas.BasicLoginModule required domain=ecm; };
<component>
  <key>org.exoplatform.services.jcr.webdav.WebDavServiceImpl</key>
  <type>org.exoplatform.services.jcr.webdav.WebDavServiceImpl</type>
  <init-params>

    <!-- default node type which is used for the creation of collections -->
    <value-param>
      <name>def-folder-node-type</name>
      <value>nt:folder</value>
    </value-param>

    <!-- default node type which is used for the creation of files -->
    <value-param>
      <name>def-file-node-type</name>
      <value>nt:file</value>
    </value-param>

    <!-- if MimeTypeResolver can't find the required mime type,
         which conforms with the file extension, and the mimeType header is absent
         in the HTTP request header, this parameter is used as the default mime type -->
    <value-param>
      <name>def-file-mimetype</name>
      <value>application/octet-stream</value>
    </value-param>

    <!-- This parameter indicates one of the three cases when you update the content of the resource by PUT command.
         In case of "create-version", PUT command creates the new version of the resource if this resource exists.
         In case of "replace" - if the resource exists, PUT command updates the content of the resource and its last modification date.
         In case of "add", the PUT command tries to create the new resource with the same name (if the parent node allows same-name siblings). -->
    <value-param>
      <name>update-policy</name>
      <value>create-version</value>
      <!--value>replace</value -->
      <!-- value>add</value -->
    </value-param>

    <!-- This parameter determines how service responds to a method that attempts to modify file content.
         In case of "checkout-checkin" value, when a modification request is applied to a checked-in version-controlled resource,
         the request is automatically preceded by a checkout and followed by a checkin operation.
         In case of "checkout" value, when a modification request is applied to a checked-in version-controlled resource,
         the request is automatically preceded by a checkout operation. -->
    <value-param>
      <name>auto-version</name>
      <value>checkout-checkin</value>
      <!--value>checkout</value -->
    </value-param>

    <!-- This parameter is responsible for managing Cache-Control header value which will be returned to the client.
         You can use patterns like "text/*", "image/*" or wildcard to define the type of content. -->
    <value-param>
      <name>cache-control</name>
      <value>text/xml,text/html:max-age=3600;image/png,image/jpg:max-age=1800;*/*:no-cache;</value>
    </value-param>

    <!-- This parameter determines the absolute path to the folder icon file, which is shown during WebDAV view of the contents -->
    <value-param>
      <name>folder-icon-path</name>
      <value>/absolute/path/to/file</value>
    </value-param>

    <!-- This parameter determines the absolute path to the file icon file, which is shown during WebDAV view of the contents -->
    <value-param>
      <name>file-icon-path</name>
      <value>/absolute/path/to/file</value>
    </value-param>

    <!-- This parameter is responsible for untrusted user agents definition.
         Content-type headers of listed here user agents should be ignored and MimeTypeResolver should be explicitly used instead -->
    <values-param>
      <name>untrusted-user-agents</name>
      <value>Microsoft Office Core Storage Infrastructure/1.0</value>
    </values-param>

    <!-- Allows to define which node type can be used to create files via WebDAV. Default value: nt:file -->
    <values-param>
      <name>allowed-file-node-types</name>
      <value>nt:file</value>
    </values-param>

    <!-- Allows to define which node type can be used to create folders via WebDAV. Default value: nt:folder -->
    <values-param>
      <name>allowed-folder-node-types</name>
      <value>nt:folder</value>
    </values-param>

  </init-params>
</component>
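To make the format of the cache-control value easier to read, the sketch below parses a value of the form "types:directive;types:directive;" and looks up the directive for a given media type. It is only an illustration and not the code of WebDavServiceImpl: CacheControlSketch, parse, resolve and matches are hypothetical names, and the rule that the first matching pattern in declaration order wins is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: shows how a value such as
// "text/xml,text/html:max-age=3600;image/png,image/jpg:max-age=1800;*/*:no-cache;" can be read.
final class CacheControlSketch {

    /** Parses the configured value into an ordered map: media type pattern -> Cache-Control directive. */
    static Map<String, String> parse(String config) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String entry : config.split(";")) {
            String trimmed = entry.trim();
            int colon = trimmed.indexOf(':');
            if (trimmed.isEmpty() || colon < 0) {
                continue; // skip empty or malformed entries
            }
            String directive = trimmed.substring(colon + 1).trim();
            for (String pattern : trimmed.substring(0, colon).split(",")) {
                rules.put(pattern.trim(), directive);
            }
        }
        return rules;
    }

    /** Assumption: the first pattern that matches, in declaration order, decides the directive. */
    static String resolve(Map<String, String> rules, String mediaType) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (matches(rule.getKey(), mediaType)) {
                return rule.getValue();
            }
        }
        return null; // no rule applies
    }

    /** Supports exact types, "type/*" patterns and the "*" + "/*" wildcard. */
    static boolean matches(String pattern, String mediaType) {
        if (pattern.equals("*/*") || pattern.equalsIgnoreCase(mediaType)) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 1); // keeps the trailing '/'
            return mediaType.regionMatches(true, 0, prefix, 0, prefix.length());
        }
        return false;
    }
}
```

With the value from the configuration above, text/html resolves to max-age=3600, image/png to max-age=1800, and any other type falls through to no-cache.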
At present, the eXo JCR WebDAV server has been tested with MS Internet Explorer, DAV Explorer, Xythos Drive, Microsoft Office 2003 (as a client, using File->Open and typing the http://... href in the file name box), and Ubuntu Linux.
Table 1.44.
WebDav | JCR |
---|---|
COPY | Workspace.copy(...) |
DELETE | Node.remove() |
GET | Node.getProperty(...); Property.getValue() |
HEAD | Node.getProperty(...); Property.getLength() |
MKCOL | Node.addNode(...) |
MOVE | Session.move(...) or Workspace.move(...) |
PROPFIND | Session.getNode(...); Node.getNode(...); Node.getNodes(...); Node.getProperties() |
PROPPATCH | Node.setProperty(...); Node.getProperty(...).remove() |
PUT | Node.addNode("node","nt:file"); Node.setProperty("jcr:data", "data") |
CHECKIN | Node.checkin() |
CHECKOUT | Node.checkout() |
REPORT | Node.getVersionHistory(); VersionHistory.getAllVersions(); Version.getProperties() |
UNCHECKOUT | Node.restore(...) |
VERSION-CONTROL | Node.addMixin("mix:versionable") |
LOCK | Node.lock(...) |
UNLOCK | Node.unlock() |
ORDERPATCH | Node.orderBefore(...) |
SEARCH | Workspace.getQueryManager(); QueryManager.createQuery(); Query.execute() |
ACL | Node.setPermission(...) |
There are some restrictions for WebDAV in different operating systems.
When you try to set up a web folder by “adding a network location” or “map a network drive” through My Computer, you can get an error message saying that either “The folder you entered does not appear to be valid. Please choose another” or “Windows cannot access… Check the spelling of the name. Otherwise, there might be…”. These errors may appear when you are using SSL or non-SSL.
To fix this, do as follows:
Go to Windows Registry Editor.
Find a key: \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlset\services\WebClient\Parameters\BasicAuthLevel .
Change the value to 2.
This issue occurs when you have Microsoft Office 2010 or Microsoft Office 2007 applications installed on a client computer and, from that computer, you try to access an Office file stored on a web server that is configured for Basic authentication, over a connection that does not use Secure Sockets Layer (SSL). When you try to open or download the file, you experience the following symptoms:
The Office file does not open or download.
You do not receive a Basic authentication password prompt when you try to open or to download the file.
You do not receive an error message when you try to open the file. The associated Office application starts. However, the selected file does not open.
To enable Basic authentication on the client computer, follow these steps:
Click Start, type regedit in the Start Search box, and then press Enter.
Locate and then click the following registry subkey:
HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Common\Internet
On the Edit menu, point to New, and then click DWORD Value.
Type BasicAuthLevel, and then press Enter.
Right-click BasicAuthLevel, and then click Modify.
In the Value data box, type 2, and then click OK.
The JCR-FTP Server is a standard eXo service that operates as an FTP server, giving access to content stored in JCR repositories in the form of nt:file/nt:folder nodes or their successors. Any FTP client can connect to a running server. The FTP server comes with a standard configuration which can be changed as required.
<value-param> <name>command-port</name> <value>21</value> </value-param>
The value of the command channel port. The default value is '21'.
If you already have an FTP server installed on your system, or if the port is protected, change this parameter (to 2121, for example) to avoid conflicts.
<value-param> <name>data-min-port</name> <value>52000</value> </value-param>
<value-param> <name>data-max-port</name> <value>53000</value> </value-param>
These two parameters indicate the minimum and maximum values of the range of ports used by the server. The FTP protocol requires an additional data channel, which is used to transfer file contents and directory listings. No other server program should be listening on ports in this range.
<value-param> <name>system</name> <value>Windows_NT</value> or <value>UNIX Type: L8</value> </value-param>
The supported types of directory listing format.
<value-param> <name>client-side-encoding</name> <value>windows-1251</value> or <value>KOI8-R</value> </value-param>
This parameter specifies the encoding which is used for the dialogue with the client.
<value-param> <name>def-folder-node-type</name> <value>nt:folder</value> </value-param>
This parameter specifies the node type used when an FTP folder is created.
<value-param> <name>def-file-node-type</name> <value>nt:file</value> </value-param>
This parameter specifies the node type used when an FTP file is created.
<value-param> <name>def-file-mime-type</name> <value>application/zip</value> </value-param>
The mime type of a created file is chosen by using its file extension. If the server cannot find the corresponding mime type, this value is used.
<value-param> <name>cache-folder-name</name> <value>../temp/ftp_cache</value> </value-param>
The Path of the cache folder.
<value-param> <name>upload-speed-limit</name> <value>20480</value> </value-param>
Restriction of the upload speed. It is measured in bytes.
<value-param> <name>download-speed-limit</name> <value>20480</value> </value-param>
Restriction of the download speed. It is measured in bytes.
<value-param> <name>timeout</name> <value>60</value> </value-param>
Defines the value of a timeout.
<value-param> <name>replace-forbidden-chars</name> <value>true</value> </value-param>
Indicates whether or not the forbidden characters must be replaced.
<value-param> <name>forbidden-chars</name> <value>:[]*'"|</value> </value-param>
Defines the list of forbidden characters.
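The interaction of replace-forbidden-chars and forbidden-chars can be pictured with the short sketch below. It is not the FTP service code: ForbiddenCharsSketch and sanitize are hypothetical names, and the replacement character (an underscore in the usage note) is an assumption, since the text above does not say which character the server substitutes.

```java
// Illustrative sketch of the two parameters above; the replacement character is an assumption.
final class ForbiddenCharsSketch {

    /**
     * Returns the name unchanged when replacement is disabled; otherwise replaces every
     * character listed in forbiddenChars with the given replacement character.
     */
    static String sanitize(String name, String forbiddenChars, boolean replace, char replacement) {
        if (!replace) {
            return name;
        }
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            out.append(forbiddenChars.indexOf(c) >= 0 ? replacement : c);
        }
        return out.toString();
    }
}
```

With the forbidden characters from the configuration above and an underscore as the replacement, a name such as report[1]:final*.txt becomes report_1__final_.txt.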
Restoring the system workspace on its own is not supported; it can only be restored as part of restoring the whole repository.
The main purpose of that feature is to restore data in case of system faults and repository crashes. Also, the backup results may be used as a content history.
The concept is based on the export of a workspace unit in the Full, or Full + Incrementals model. A repository workspace can be backed up and restored using a combination of these modes. In all cases, at least one Full (initial) backup must be executed to mark a starting point of the backup history. An Incremental backup is not a complete image of the workspace. It contains only the changes for some period. So it is not possible to perform an Incremental backup without an initial Full backup.
The Backup service may operate as a hot-backup process at runtime on an in-use workspace. In this case, the Full + Incrementals model should be used to guarantee data consistency during restoration. An Incremental backup will run from the start point of the Full backup and will therefore also contain changes that occurred during the Full backup.
A restore operation is a mirror of a backup one. At least one Full backup should be restored to obtain a workspace corresponding to some point in time. Incrementals may then be restored in the order of creation to reach the required state of the content. If an Incremental contains the same data as the Full backup (hot-backup), the changes will be applied again as if they were made in the normal way via API calls.
According to the model there are several modes for backup logic:
Full backup only : Single operation, runs once
Full + Incrementals : Start with an initial Full backup and then keep incremental changes in one file. Run until it is stopped.
Full + Incrementals(periodic) : Start with an initial Full backup and then keep incremental changes with periodic result file rotation. Run until it is stopped.
Full backup/restore is implemented using the JCR SysView Export/Import. Workspace data will be exported into Sysview XML data from root node.
Restoring is implemented, using the special eXo JCR API feature: a dynamic workspace creation. Restoring of the workspace Full backup will create one new workspace in the repository. Then, the SysView XML data will be imported as the root node.
Incremental backup is implemented using the eXo JCR ChangesLog API. This API makes it possible to record each JCR API call as an atomic entry in a changelog. Hence, the Incremental backup uses a listener that collects these logs and stores them in a file.
Restoring an incremental backup consists in applying the collected set of ChangesLogs to a workspace in the correct order.
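The Full + Incrementals restore model can be pictured with a toy example. The sketch below does not use the eXo JCR ChangesLog API: IncrementalRestoreSketch, Change and restore are hypothetical names, a sorted map stands in for a workspace, and each "log" is simply an ordered list of changes. It shows that the restore starts from the Full image and replays the logs in creation order, and that replaying a change already present in the Full image (the hot-backup case) leaves the result unchanged.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: a map of path -> value stands in for workspace content.
final class IncrementalRestoreSketch {

    /** One recorded change; a null value means the item at that path was removed. */
    static final class Change {
        final String path;
        final String value;

        Change(String path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    /** Copies the Full image, then applies every change log in the order it was created. */
    static Map<String, String> restore(Map<String, String> fullImage, List<List<Change>> logsInOrder) {
        Map<String, String> workspace = new TreeMap<>(fullImage); // the Full image itself is not modified
        for (List<Change> log : logsInOrder) {
            for (Change change : log) {
                if (change.value == null) {
                    workspace.remove(change.path);
                } else {
                    workspace.put(change.path, change.value);
                }
            }
        }
        return workspace;
    }
}
```

Applying the logs in a different order could produce a different final state, which is why the order of creation matters.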
Incremental backup is an experimental feature and is not supported, so it must be used with great caution.
The work of Backup is based on the BackupConfig configuration and the BackupChain logical unit.
BackupConfig describes the backup operation chain that will be performed by the service. When you intend to work with it, the configuration should be prepared before the backup is started.
The configuration contains such values as:
Types of full and incremental backup (fullBackupType, incrementalBackupType): Strings with the full names of the classes that implement each backup type.
Incremental period: The period, in seconds (long), after which the current backup is stopped and a new one is started.
Target repository and workspace names: Strings with described names
Destination directory for result files: String with a path to a folder where operation result files will be stored.
BackupChain is the unit that performs the backup process: it executes the initial Full backup and manages the Incremental operations. BackupChain is used as a key object for accessing current backups during runtime via BackupManager. Each BackupJob performs a single atomic operation, a Full or Incremental process. The result of that operation is data for a Restore. A BackupChain can contain one or more BackupJobs, but the initial Full job is always there. Each BackupJob has its own unique number that gives its order in the chain; the initial Full job always has the number 0.
Backup process, result data and file location
To start the backup process, it's necessary to create the BackupConfig and call the BackupManager.startBackup(BackupConfig) method. This method will return BackupChain created according to the configuration. At the same time, the chain creates a BackupChainLog which persists BackupConfig content and BackupChain operation states to the file in the service working directory (see Configuration).
When the chain starts the work and the initial BackupJob starts, the job will create a result data file using the destination directory path from BackupConfig. The destination directory will contain a directory with an automatically created name using the pattern repository_workspace-timestamp where timestamp is current time in the format of yyyyMMdd_hhmmss (E.g. db1_ws1-20080306_055404). The directory will contain the results of all Jobs configured for execution. Each Job stores the backup result in its own file with the name repository_workspace-timestamp.jobNumber. BackupChain saves each state (STARTING, WAITING, WORKING, FINISHED) of its Jobs in the BackupChainLog, which has a current result full file path.
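The naming scheme described above can be reproduced with a few lines of Java. This is a sketch for illustration, not the service's own code: BackupNamingSketch and its methods are hypothetical names. It uses the pattern exactly as quoted in the text; note that in Java date patterns "hh" is the 12-hour clock field.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper that follows the naming pattern quoted in the text.
final class BackupNamingSketch {

    // Pattern as given in the text; "hh" is the hour on the 12-hour clock.
    private static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("yyyyMMdd_hhmmss");

    /** Directory name for one backup: repository_workspace-timestamp */
    static String backupSetName(String repository, String workspace, LocalDateTime startedAt) {
        return repository + "_" + workspace + "-" + TIMESTAMP.format(startedAt);
    }

    /** Result file name for one job: repository_workspace-timestamp.jobNumber */
    static String jobFileName(String backupSetName, int jobNumber) {
        return backupSetName + "." + jobNumber;
    }
}
```

For repository db1, workspace ws1 and a start time of 6 March 2008, 05:54:04, backupSetName returns db1_ws1-20080306_055404, and the initial Full job (number 0) writes to db1_ws1-20080306_055404.0.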
BackupChain log file and job result files are a whole and consistent unit, that is a source for a Restore.
BackupChain log contains absolute paths to job result files. Don't move these files to another location.
Restore requirements
As mentioned before, a Restore operation is a mirror of a Backup. The process is a Full restore of a root node, optionally followed by restoring additional Incremental backups to reach a desired workspace state. Restoring the workspace Full backup will create a new workspace in the repository, using the given RepositoryEntry of the existing repository and the given (preconfigured) WorkspaceEntry for the new target workspace. A Restore process will restore a root node from the SysView XML data.
The target workspace should not already exist in the repository. Otherwise, a BackupConfigurationException will be thrown.
Finally, we may say that Restore is the process of creating a new Workspace and filling it with the Backup content. If a target Workspace with the same name already exists in the Repository, you have to configure a new name for it. If no such workspace exists in the Repository, you may use the same name as the Backup one.
As an optional extension, the Backup service is not enabled by default. You need to enable it via configuration.
The following is an example configuration:
<component>
  <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key>
  <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type>
  <init-params>
    <properties-param>
      <name>backup-properties</name>
      <property name="backup-dir" value="target/backup" />
    </properties-param>
  </init-params>
</component>
The mandatory parameter is:
backup-dir : The path to a working directory where the service will store internal files and chain logs.
Also, there are optional parameters:
incremental-backup-type : The FQN of the incremental job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob. By default, org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob is used.
default-incremental-job-period : The period between incremental flushes (in seconds). Default is 3600 seconds.
full-backup-type : The FQN of the full backup job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob. By default, org.exoplatform.services.jcr.ext.backup.impl.rdbms.FullBackupJob is used. Note that the file-system based implementation org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob is deprecated and not recommended for use.
The number of rows that should be fetched from the database during backup operation, can be changed thanks to the System property exo.jcr.component.ext.FullBackupJob.fetch-size. The default value of this parameter is 1000.
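As a minimal sketch of how such a System property is typically read, the snippet below falls back to the documented default of 1000 when the property is absent or not a number. FetchSizeSketch and resolveFetchSize are hypothetical names; this is not the code of the backup job itself.

```java
// Illustrative only: reads the System property named in the text, with the documented default.
final class FetchSizeSketch {

    static final String FETCH_SIZE_PROPERTY = "exo.jcr.component.ext.FullBackupJob.fetch-size";
    static final int DEFAULT_FETCH_SIZE = 1000;

    /** Returns the configured fetch size, or 1000 when the property is missing or not numeric. */
    static int resolveFetchSize() {
        return Integer.getInteger(FETCH_SIZE_PROPERTY, DEFAULT_FETCH_SIZE);
    }
}
```

Like any Java System property, the value can be supplied when the JVM is started, for example with -Dexo.jcr.component.ext.FullBackupJob.fetch-size=250.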
RDBMS backup is the latest, currently supported and recommended implementation of the full backup job for the BackupManager service, and it is used by default. It is useful when a database is used to store data.
It brings the following advantages:
fast: a full backup of a repository with 1 million rows in its tables takes only several minutes;
atomic restore: restoring into an existing workspace/repository with the same configuration is atomic, which means you do not lose data if the restore fails; the original data remains;
cluster aware: it is possible to back up and restore in a cluster environment into an existing workspace/repository with the same configuration;
consistent backup: all threads wait until the backup is finished and then continue to work, so no data is modified during the backup process.
In the following example, we create a BackupConfig bean for the Full + Incrementals mode, then we ask the BackupManager to start the backup process.
// Obtaining the backup service from the eXo container.
BackupManager backup = (BackupManager) container.getComponentInstanceOfType(BackupManager.class);

// And prepare the BackupConfig instance with custom parameters.
// full backup & incremental
File backDir = new File("/backup/ws1"); // the destination path for result files
backDir.mkdirs();

BackupConfig config = new BackupConfig();
config.setRepository(repository.getName());
config.setWorkspace("ws1");
config.setBackupDir(backDir);

// Before 1.9.3, you also need to indicate the backupjobs class FDNs
// config.setFullBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob");
// config.setIncrementalBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob");

// start backup using the service manager
BackupChain chain = backup.startBackup(config);
To stop the backup operation, you have to use the BackupChain instance.
// stop backup
backup.stopBackup(chain);
Restoration involves reloading the backup file into a BackupChainLog and applying appropriate workspace initialization. The following snippet shows the typical sequence for restoring a workspace:
// find BackupChain using the repository and workspace names (returns null if not found)
BackupChain chain = backup.findBackup("db1", "ws1");

// get the RepositoryEntry and WorkspaceEntry
ManageableRepository repo = repositoryService.getRepository(repository);
RepositoryEntry repositoryEntry = repo.getConfiguration();
List<WorkspaceEntry> entries = repositoryEntry.getWorkspaceEntries();
WorkspaceEntry workspaceEntry = getNewEntry(entries, workspace); // create a copy entry from an existing one

// restore backup log using ready RepositoryEntry and WorkspaceEntry
File backLog = new File(chain.getLogFilePath());
BackupChainLog bchLog = new BackupChainLog(backLog);

// initialize the workspace
repo.configWorkspace(workspaceEntry);

// run restoration
backup.restore(bchLog, repositoryEntry, workspaceEntry);
These instructions only apply to regular workspaces. Special instructions for the System workspace are provided below.
To restore a backup over an existing workspace, you are required to clear its data first. Your restore process should follow these steps:
Remove workspace
ManageableRepository repo = repositoryService.getRepository(repository);
repo.removeWorkspace(workspace);
Clean database, value storage, index
Restore (see snippet above)
The BackupWorkspaceInitializer is available in JCR 1.9 and later.
Restoring the JCR System workspace requires shutting down the system and using a special initializer.
Follow these steps (this will also work for normal workspaces):
Stop repository (or portal)
Clean database, value storage, index;
In the configuration of the workspace, set BackupWorkspaceInitializer to refer to your backup.
For example:
<workspaces>
  <workspace name="production" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_production-20090527_030434"/>
      </properties>
    </initializer>
    ...
  </workspace>
  ...
</workspaces>
Start repository (or portal).
Repository and Workspace initialization from a backup can use the BackupWorkspaceInitializer.
To restore a Workspace from a backup through the initializer, configure BackupWorkspaceInitializer in the configuration of that workspace.
To restore a Repository from a backup through the initializer, configure BackupWorkspaceInitializer in the configuration of every workspace of the Repository.
Restoring the repository or workspace requires shutting down the repository.
Follow these steps:
Stop the repository (skip this step if the repository or workspace does not exist)
Clean database, value storage, index (skip this step if the repository or workspace is new)
In the configuration of the workspace(s), set BackupWorkspaceInitializer to refer to your backup.
Start the repository
Example of configuring the initializer to restore the workspace "backup" through BackupWorkspaceInitializer:
Stop the repository (skip this step if the workspace does not exist)
Clean database, value storage, index (skip this step if the workspace is new)
In the configuration of the workspace(s), set BackupWorkspaceInitializer to refer to your backup.
<workspaces>
  <workspace name="backup" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_backup-20110120_044734"/>
      </properties>
    </initializer>
    ...
  </workspace>
  ...
</workspaces>
Start repository
Example of configuring the initializers to restore the repository "repository" through BackupWorkspaceInitializer:
Stop the repository (skip this step if the repository does not exist)
Clean database, value storage, index (skip this step if the repository is new)
In the configuration of the repository, configure the initializer of each workspace to refer to your backup.
For example:
...
<workspaces>
  <workspace name="system" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_system-20110120_052334"/>
      </properties>
    </initializer>
    ...
  </workspace>
  <workspace name="collaboration" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_collaboration-20110120_052341"/>
      </properties>
    </initializer>
    ...
  </workspace>
  <workspace name="backup" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_backup-20110120_052417"/>
      </properties>
    </initializer>
    ...
  </workspace>
</workspaces>
Start repository.
Restoring an existing workspace or repository is also available.
The following special methods are used for such a restore:
/**
 * Restore existing workspace. Previous data will be deleted.
 * To get the status of the workspace restore, use the
 * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
 *
 * @param workspaceBackupIdentifier backup identifier
 * @param repositoryName repository name
 * @param workspaceEntry new workspace configuration
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingWorkspace(String workspaceBackupIdentifier, String repositoryName,
   WorkspaceEntry workspaceEntry, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing workspace. Previous data will be deleted.
 * To get the status of the workspace restore, use the
 * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
 *
 * @param log workspace backup log
 * @param repositoryName repository name
 * @param workspaceEntry new workspace configuration
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingWorkspace(BackupChainLog log, String repositoryName,
   WorkspaceEntry workspaceEntry, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing repository. Previous data will be deleted.
 * To get the status of the repository restore, use the
 * BackupManager.getLastRestore(String repositoryName) method.
 *
 * @param repositoryBackupIdentifier backup identifier
 * @param repositoryEntry new repository configuration
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingRepository(String repositoryBackupIdentifier, RepositoryEntry repositoryEntry,
   boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing repository. Previous data will be deleted.
 * To get the status of the repository restore, use the
 * BackupManager.getLastRestore(String repositoryName) method.
 *
 * @param log repository backup log
 * @param repositoryEntry new repository configuration
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingRepository(RepositoryBackupChainLog log, RepositoryEntry repositoryEntry,
   boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;
These restore methods will:
remove the existing workspace or repository;
clean database;
clean index data;
clean value storage;
restore from backup.
The Backup manager allows you to restore a repository or a workspace using the original configuration stored into the backup log:
/**
 * Restore existing workspace. Previous data will be deleted.
 * To get the status of the workspace restore, use the
 * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
 * The WorkspaceEntry for the restore should be contained in the BackupChainLog.
 *
 * @param workspaceBackupIdentifier identifier of the workspace backup
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingWorkspace(String workspaceBackupIdentifier, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing repository. Previous data will be deleted.
 * To get the status of the repository restore, use the
 * BackupManager.getLastRestore(String repositoryName) method.
 * The RepositoryEntry for the restore should be contained in the BackupChainLog.
 *
 * @param repositoryBackupIdentifier identifier of the repository backup
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingRepository(String repositoryBackupIdentifier, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * The WorkspaceEntry for the restore should be contained in the BackupChainLog.
 *
 * @param workspaceBackupIdentifier identifier of the workspace backup
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreWorkspace(String workspaceBackupIdentifier, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * The RepositoryEntry for the restore should be contained in the BackupChainLog.
 *
 * @param repositoryBackupIdentifier identifier of the repository backup
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreRepository(String repositoryBackupIdentifier, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing workspace. Previous data will be deleted.
 * To get the status of the workspace restore, use the
 * BackupManager.getLastRestore(String repositoryName, String workspaceName) method.
 * The WorkspaceEntry for the restore should be contained in the BackupChainLog.
 *
 * @param workspaceBackupSetDir the directory with the backup set
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingWorkspace(File workspaceBackupSetDir, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * Restore existing repository. Previous data will be deleted.
 * To get the status of the repository restore, use the
 * BackupManager.getLastRestore(String repositoryName) method.
 * The RepositoryEntry for the restore should be contained in the BackupChainLog.
 *
 * @param repositoryBackupSetDir the directory with the backup set
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreExistingRepository(File repositoryBackupSetDir, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * The WorkspaceEntry for the restore should be contained in the BackupChainLog.
 *
 * @param workspaceBackupSetDir the directory with the backup set
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreWorkspace(File workspaceBackupSetDir, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;

/**
 * The RepositoryEntry for the restore should be contained in the BackupChainLog.
 *
 * @param repositoryBackupSetDir the directory with the backup set
 * @param asynchronous if 'true' restore will be in asynchronous mode (i.e. in a separate thread)
 * @throws BackupOperationException if backup operation exception occurred
 * @throws BackupConfigurationException if configuration exception occurred
 */
void restoreRepository(File repositoryBackupSetDir, boolean asynchronous)
   throws BackupOperationException, BackupConfigurationException;
During a backup operation, the backup log is stored in two locations: the backup-dir directory of BackupService, to support interactive operations via the Backup API (e.g. the console), and the backup set files, for portability (e.g. to restore on another server).
You can use the backup/restore mechanism to migrate between DB type configurations. Three DB types are currently supported (single, multi, isolated), and you can migrate between any of them.
To migrate, set the desired DB type in the repository configuration file of the backup set. It is highly recommended to make a backup at the DB level before starting the migration process.
After migration, because the DB structures differ, some unnecessary DB tables may remain; these can be safely removed.
Before migrating your JCR data from the single or multi data format to the isolated data format, you need the backup console (backupconsole).
See the Building application section for more details.
Or you can download it from ow2 directly.
Enable the Backup service
See the Configuration Backup service section for details.
Create a full backup
For example:
jcrbackup.cmd http://root:exo@localhost:8080/rest start /repository
Return
Successful : status code = 200
Get the backup id
You need the backup id for the restore action.
For example:
jcrbackup http://root:exo@localhost:8080 list completed
Return
The completed (ready to restore) backups information :
1) Repository backup with id 5dcbc851c0a801c9545eb434947dbe87 :
   repository name : repository
   backup type : full only
   started time : lun., 21 janv. 2013 16:48:21 GMT+01:00
   finished time : lun., 21 janv. 2013 16:48:25 GMT+01:00
The backup id: 5dcbc851c0a801c9545eb434947dbe87
See the Backup Client Usage section for more details.
Set the desired DB type in the repository configuration file of the backup
Change db-structure-type to isolated.
For example: In original-repository-config :
exo-tomcat\temp\backup\repository_repository_backup_1358783301705\original-repository-config
replace
<property name="db-structure-type" value="single"/>
by
<property name="db-structure-type" value="isolated"/>
This change must be done for all workspaces.
Activate the persister config
Before starting the restore operation, ensure that the persister is configured to save the changes of the repository configuration.
If it is not activated, configure it. See the JCR Configuration persister section for more details.
Restore the repository with the original configuration and remove the existing one
For example:
jcrbackup.cmd http://root:exo@localhost:8080/rest restore remove-exists 5dcbc851c0a801c9545eb434947dbe87
Return
Successful : status code = 200
Drop the old tables with the old data format
drop table JCR_SREF; drop table JCR_SVALUE; drop table JCR_SITEM;
Enable the Backup service
See the Configuration Backup service section for details.
Create a full backup
For example:
jcrbackup.cmd http://root:exo@localhost:8080/rest start /repository
Return
Successful : status code = 200
Get the backup id
You need the backup id to launch the restore action.
For example:
jcrbackup http://root:exo@localhost:8080 list completed
Return
The completed (ready to restore) backups information :
1) Repository backup with id 5dcbc851c0a801c9545eb434947dbe87 :
   repository name : repository
   backup type : full only
   started time : lun., 21 janv. 2013 16:48:21 GMT+01:00
   finished time : lun., 21 janv. 2013 16:48:25 GMT+01:00
The backup id: 5dcbc851c0a801c9545eb434947dbe87
See the Backup Client Usage section for more details.
Set the desired DB type in the repository configuration file of the backup
Change db-structure-type to isolated.
For example: In original-repository-config :
exo-tomcat\temp\backup\repository_repository_backup_1358783301705\original-repository-config
replace
<property name="db-structure-type" value="multi"/>
by
<property name="db-structure-type" value="isolated"/>
This change must be done for all workspaces.
Configure the datasource name used for the isolated mode
Make sure that, in your repository configuration, all the workspaces of the same repository share the same datasource.
Activate the persister config
Before starting the restore operation, ensure that the persister is configured to save the changes of the repository configuration.
If it is not activated, configure it. See the JCR Configuration persister section for more details.
Restore the repository with the original configuration and remove the existing one
For example:
jcrbackup.cmd http://root:exo@localhost:8080/rest restore remove-exists 5dcbc851c0a801c9545eb434947dbe87
Return
Successful : status code = 200
Drop the old tables with the old data format
drop table JCR_MREF; drop table JCR_MVALUE; drop table JCR_MITEM;
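The db-structure-type change described in the migration steps above has to be repeated for every workspace in original-repository-config, so it can help to script it. The following is only a minimal sketch, assuming the property is written exactly as in the examples above (a self-closing property element with name and value attributes); the class and method names are hypothetical and not part of eXo JCR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of eXo JCR: rewrites db-structure-type in configuration text.
final class DbStructureTypeRewriter {

    // Matches the db-structure-type property whatever its current value (single, multi, ...).
    private static final Pattern DB_TYPE = Pattern.compile(
        "(<property\\s+name=\"db-structure-type\"\\s+value=\")[^\"]*(\"\\s*/>)");

    /** Returns the configuration text with every db-structure-type value set to newType. */
    static String setDbStructureType(String configXml, String newType) {
        Matcher m = DB_TYPE.matcher(configXml);
        return m.replaceAll("$1" + Matcher.quoteReplacement(newType) + "$2");
    }

    public static void main(String[] args) {
        // Two workspaces, one per line, as they might appear in original-repository-config.
        String before = "<property name=\"db-structure-type\" value=\"single\"/>\n"
                      + "<property name=\"db-structure-type\" value=\"single\"/>";
        System.out.println(setDbStructureType(before, "isolated"));
    }
}
```

Reading and writing the file itself is left out; the original file should be backed up before it is overwritten.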
For this service, you should configure the org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister in order to save the changes of the repository configuration. See the eXo JCR Configuration article at the 'Portal and Standalone configuration' section.
GateIn uses the context /portal/rest, therefore you need to use http://host:port/portal/rest/ instead of http://host:port/rest/
GateIn uses form authentication, so first you need to login (url to form authentication is http://host:port/portal/login) and then perform requests.
The service org.exoplatform.services.jcr.ext.backup.server.HTTPBackupAgent is a REST-based front end to the service org.exoplatform.services.jcr.ext.backup.BackupManager. HTTPBackupAgent exposes the BackupManager operations: creating a backup, restoring, getting the status of a current or completed backup or restore, and so on.
The backup client is an HTTP client for HTTPBackupAgent.
The HTTPBackupAgent is based on REST (see details about the REST Framework).
HTTPBackupAgent uses the POST and GET methods for requests.
The HTTPBackupAgent allows :
Start backup
Stop backup
Restore from backup
Delete the workspace
Get information about backup service (BackupManager)
Get information about current backup / restores / completed backups
/rest/jcr-backup/start/{repo}/{ws}
Start backup on specific workspace
URL:
http://host:port/rest/jcr-backup/start/{repo}/{ws}
Formats: json.
Method: POST
Parameters:
{repo} - the repository name;
{ws} - the workspace name;
BackupConfigBean - the JSON to BackupConfigBean.
The BackupConfigBean:
header : "Content-Type" = "application/json; charset=UTF-8"
body: <JSON to BackupConfigBean>
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.BackupConfigBean :
{"incrementalRepetitionNumber":<Integer>,"incrementalBackupJobConfig":<JSON to BackupJobConfig>, "backupType":<Integer>,"fullBackupJobConfig":<JSON to BackupJobConfig>, "incrementalJobPeriod":<Long>,"backupDir":"<String>"}
Where :
backupType - the type of backup:
  0 - full backup only;
  1 - full and incremental backup.
backupDir - the path to the backup folder;
incrementalJobPeriod - the incremental job period;
incrementalRepetitionNumber - the incremental repetition number;
fullBackupJobConfig - the configuration of the full backup, JSON to BackupJobConfig;
incrementalBackupJobConfig - the configuration of the incremental backup, JSON to BackupJobConfig.
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.BackupJobConfig :
{"parameters":[<JSON to Pair>, ..., <JSON to pair> ],"backupJob":"<String>"}
Where:
backupJob - the FQN (fully qualified name) of the BackupJob class;
parameters - the list of JSON of Pair.
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.Pair :
{"name":"<String>","value":"<String>"}
Where:
name - the name of the parameter;
value - the value of the parameter.
Returns:
Return when being successful
status code = 200
Return when being failure
status code = 404 - repository '{repo}' or workspace '{ws}' not found
status code = 500 - other unknown errors
failure message in response - the description of the failure
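As an illustration of the request described above, the following minimal sketch builds (but does not send) the POST request with the JDK's java.net.http API (Java 11 or later). The class name, host, port and the sample body values are assumptions made for the example; authentication is omitted, and whether the server accepts null job configurations has not been verified here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of eXo JCR.
final class StartBackupRequest {

    /** Builds the POST request for /rest/jcr-backup/start/{repo}/{ws} without sending it. */
    static HttpRequest build(String baseUrl, String repo, String ws, String backupConfigJson) {
        URI uri = URI.create(baseUrl + "/rest/jcr-backup/start/" + repo + "/" + ws);
        return HttpRequest.newBuilder(uri)
            // Header documented for the BackupConfigBean body.
            .header("Content-Type", "application/json; charset=UTF-8")
            .POST(HttpRequest.BodyPublishers.ofString(backupConfigJson, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        // Illustrative body only: field names come from the BackupConfigBean description,
        // the values (including the null job configs) are placeholders.
        String body = "{\"incrementalRepetitionNumber\":0,\"incrementalBackupJobConfig\":null,"
            + "\"backupType\":0,\"fullBackupJobConfig\":null,"
            + "\"incrementalJobPeriod\":0,\"backupDir\":\"../temp/backup\"}";
        HttpRequest request = build("http://localhost:8080", "repository", "production", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with java.net.http.HttpClient; for GateIn, the base path would be /portal/rest instead of /rest.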
/rest/jcr-backup/stop/{id}
Stop backup with identifier {id}.
URL:
http://host:port/rest/jcr-backup/stop/{id}
Formats: plain text
Method: GET
Parameters:
{id} - the identifier of backup
Returns:
Return when being successful
status code = 200
Return when being failure
status code = 404 - no active backup with identifier {id}
status code = 500 - other unknown errors
failure message in response - the description of the failure
/rest/jcr-backup/info
Information about the backup service.
URL:
http://host:port/rest/jcr-backup/info
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
Return the JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.BackupServiceInfoBean :
{"backupLogDir":"<String>","defaultIncrementalJobPeriod":<Long>,"fullBackupType":"<String>","incrementalBackupType":"<String>"}
Where:
fullBackupType - the FQN (fully qualified name) of the BackupJob class for the full backup type;
incrementalBackupType - the FQN (fully qualified name) of the BackupJob class for the incremental backup type;
backupLogDir - the path to the backup folder;
defaultIncrementalJobPeriod - the default incremental job period.
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/drop-workspace/{repo}/{ws}/{force-session-close}
Delete the workspace from repository /{repo}/{ws}. With this service, you can delete any workspaces regardless of whether the workspace is a backup or has been copied to a backup.
URL:
http://host:port/rest/jcr-backup/drop-workspace/{repo}/{ws}/{force-session-close}
Formats: plain text
Method: GET
Parameters:
{repo} - the repository name;
{ws} - the workspace name;
{force-session-close} - the boolean value : true - the open sessions on workspace will be closed; false - will not close open sessions.
Returns:
Return when being successful.
status code = 200
Return when being failure
status code = 500 - other unknown errors, or repository '{repo}' or workspace '{ws}' not found
failure message in response - the description of the failure
/rest/jcr-backup/info/backup
Information about the current and completed backups
URL:
http://host:port/rest/jcr-backup/info/backup
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfoList :
{"backups":[<JSON to ShortInfo>,<JSON to ShortInfo>,...,<JSON to ShortInfo>]}
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfo :
{"startedTime":"<String>","backupId":"<String>","type":<Integer>,"state":<Integer>,"backupType":<Integer>, "workspaceName":"<String>","finishedTime":"<String>","repositoryName":"<String>"}
Where:
type - the type of ShortInfo:
  0 - the ShortInfo of a completed backup;
  -1 - the ShortInfo of a current (active) backup;
  1 - the ShortInfo of a current restore.
backupType - the type of backup:
  0 - full backup only;
  1 - full and incremental backup.
backupId - the identifier of the backup;
workspaceName - the name of the workspace;
repositoryName - the name of the repository;
startedTime - the date the backup started, in RFC 1123 format (for example "Thu, 16 Apr 2009 14:56:49 EEST").
The ShortInfo of a current (active) backup:
  finishedTime - not applicable, always an empty string ("");
  state - the state of the full backup: 0 - starting; 1 - waiting; 2 - working; 4 - finished.
The ShortInfo of a completed backup:
  finishedTime - the date the backup finished, in RFC 1123 format;
  state - not applicable, always zero (0).
The ShortInfo of a current restore:
  finishedTime - the date the backup finished, in RFC 1123 format;
  state - the state of the restore: 1 - started; 2 - successful; 3 - failure; 4 - initialized.
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
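Because the meaning of the state field depends on the type field, a client has to decode the two together. The sketch below shows one way to do that, using only the numeric codes listed above; the class and method names are hypothetical and not part of eXo JCR.

```java
// Hypothetical decoder for the "type" and "state" codes of a ShortInfo bean.
final class ShortInfoCodes {

    /** Maps the "type" code to a readable label. */
    static String describeType(int type) {
        switch (type) {
            case 0:  return "completed backup";
            case -1: return "current backup";
            case 1:  return "current restore";
            default: return "unknown";
        }
    }

    /** Maps the "state" code to a readable label; its meaning depends on "type". */
    static String describeState(int type, int state) {
        if (type == -1) {            // current (active) backup: state of the full backup
            switch (state) {
                case 0: return "starting";
                case 1: return "waiting";
                case 2: return "working";
                case 4: return "finished";
                default: return "unknown";
            }
        }
        if (type == 1) {             // current restore
            switch (state) {
                case 1: return "started";
                case 2: return "successful";
                case 3: return "failure";
                case 4: return "initialized";
                default: return "unknown";
            }
        }
        if (type == 0) {             // completed backup: state is always 0 and carries no meaning
            return "not applicable";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describeType(-1) + ": " + describeState(-1, 2));
        System.out.println(describeType(1) + ": " + describeState(1, 2));
    }
}
```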
/rest/jcr-backup/info/backup/current
Information about the current backups.
URL:
http://host:port/rest/jcr-backup/info/backup/current
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfoList (see item /rest/jcr-backup/info/backup)
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/info/backup/completed
Information about the completed backups.
URL:
http://host:port/rest/jcr-backup/info/backup/completed
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfoList (see item /rest/jcr-backup/info/backup)
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/info/backup/{repo}/{ws}
Information about the current and completed backups for a specific workspace.
URL:
http://host:port/rest/jcr-backup/info/backup/{repo}/{ws}
Formats: json
Method: GET
Parameters:
{repo} - the repository name
{ws} - the workspace name
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfoList (see item /rest/jcr-backup/info/backup)
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/info/backup/{id}
Detailed information about a current or completed backup with identifier '{id}'.
URL:
http://host:port/rest/jcr-backup/info/backup/{id}
Formats: json
Method: GET
Parameters:
{id} - the identifier of backup
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.DetailedInfo :
{"backupConfig":<JSON to BackupConfigBean>,"startedTime":"<String>","backupId":"<String>","type":<Integer>, "state":<Integer>,"backupType":<Integer>,"workspaceName":"<String>","finishedTime":"<String>", "repositoryName":"<String>"}
Where:
type - the type of DetailedInfo:
  0 - the DetailedInfo of a completed backup;
  -1 - the DetailedInfo of a current (active) backup;
  1 - the DetailedInfo of a restore.
backupType - the type of backup:
  0 - full backup only;
  1 - full and incremental backup.
backupId - the identifier of the backup;
workspaceName - the name of the workspace;
repositoryName - the name of the repository;
backupConfig - the JSON to BackupConfigBean.
The DetailedInfo of a current (active) backup:
  startedTime - the date the backup started, in RFC 1123 format (for example "Thu, 16 Apr 2009 14:56:49 EEST");
  finishedTime - not applicable, always an empty string ("");
  state - the state of the full backup: 0 - starting; 1 - waiting; 2 - working; 4 - finished.
The DetailedInfo of a completed backup:
  startedTime - the date the backup started, in RFC 1123 format (for example "Thu, 16 Apr 2009 14:56:49 EEST");
  finishedTime - the date the backup finished, in RFC 1123 format;
  state - not applicable, always zero (0).
The DetailedInfo of a restore:
  startedTime - the date the restore started, in RFC 1123 format (for example "Thu, 16 Apr 2009 14:56:49 EEST");
  finishedTime - the date the restore finished;
  state - the state of the restore: 1 - started; 2 - successful; 3 - failure; 4 - initialized.
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.BackupConfigBean (see item /rest/jcr-backup/start/{repo}/{ws}).
Return when being failure
status code = 404 - backup with identifier {id} not found
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/info/restore/{repo}/{ws}
The information about the last restore on a specific workspace /{repo}/{ws}.
URL:
http://host:port/rest/jcr-backup/info/restore/{repo}/{ws}
Formats: json
Method: GET
Parameters:
{repo} - the repository name
{ws} - the workspace name
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.DetailedInfo (see item /rest/jcr-backup/info/backup/{id})
Return when being failure
status code = 404 - no restore found for workspace /{repo}/{ws}
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/info/restores
The information about the last restores.
URL:
http://host:port/rest/jcr-backup/info/restores
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
The JSON bean of org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfoList (see item /rest/jcr-backup/info/backup)
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
/rest/jcr-backup/restore/{repo}/{id}
Restore the workspace from specific backup.
URL:
http://host:port/rest/jcr-backup/restore/{repo}/{id}
Formats: json.
Method: POST
Parameters:
{repo} - the repository name;
{id} - the identifier of the backup;
WorkspaceEntry - the JSON to WorkspaceEntry.
The RestoreBean:
header : "Content-Type" = "application/json; charset=UTF-8"
body: <JSON to WorkspaceEntry>
The example of JSON bean to org.exoplatform.services.jcr.config.WorkspaceEntry :
{
  "accessManager" : null,
  "autoInitPermissions" : null,
  "autoInitializedRootNt" : null,
  "cache" : {
    "parameters" : [
      { "name" : "max-size", "value" : "10k" },
      { "name" : "live-time", "value" : "1h" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl"
  },
  "container" : {
    "parameters" : [
      { "name" : "source-name", "value" : "jdbcjcr" },
      { "name" : "dialect", "value" : "hsqldb" },
      { "name" : "multi-db", "value" : "false" },
      { "name" : "max-buffer-size", "value" : "200k" },
      { "name" : "swap-directory", "value" : "../temp/swap/production" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer",
    "valueStorages" : [
      {
        "filters" : [
          { "ancestorPath" : null, "minValueSize" : 0, "propertyName" : null, "propertyType" : "Binary" }
        ],
        "id" : "system",
        "parameters" : [
          { "name" : "path", "value" : "../temp/values/production" }
        ],
        "type" : "org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage"
      }
    ]
  },
  "initializer" : {
    "parameters" : [
      { "name" : "root-nodetype", "value" : "nt:unstructured" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer"
  },
  "lockManager" : {
    "timeout" : 15728640
  },
  "name" : "production",
  "queryHandler" : {
    "analyzer" : { },
    "autoRepair" : true,
    "bufferSize" : 10,
    "cacheSize" : 1000,
    "documentOrder" : true,
    "errorLogSize" : 50,
    "excerptProviderClass" : "org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt",
    "excludedNodeIdentifers" : null,
    "extractorBackLogSize" : 100,
    "extractorPoolSize" : 0,
    "extractorTimeout" : 100,
    "indexDir" : "../temp/jcrlucenedb/production",
    "indexingConfigurationClass" : "org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfigurationImpl",
    "indexingConfigurationPath" : null,
    "maxFieldLength" : 10000,
    "maxMergeDocs" : 2147483647,
    "mergeFactor" : 10,
    "minMergeDocs" : 100,
    "parameters" : [
      { "name" : "index-dir", "value" : "../temp/jcrlucenedb/production" }
    ],
    "queryClass" : "org.exoplatform.services.jcr.impl.core.query.QueryImpl",
    "queryHandler" : null,
    "resultFetchSize" : 2147483647,
    "rootNodeIdentifer" : "00exo0jcr0root0uuid0000000000000",
    "spellCheckerClass" : null,
    "supportHighlighting" : false,
    "synonymProviderClass" : null,
    "synonymProviderConfigPath" : null,
    "type" : "org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex",
    "useCompoundFile" : false,
    "volatileIdleTime" : 3
  },
  "uniqueName" : "repository_production"
}
Returns:
Return when being successful
status code = 200
Return the JSON bean org.exoplatform.services.jcr.ext.backup.server.bean.response.ShortInfo of just started restore. For JSON description see item /rest/jcr-backup/info/backup
Return when being failure
status code = 403 - there has already been a restore to workspace /{repo}/{ws}
status code = 404 - repository '{repo}' not found, or unsupported encoding of workspaceConfig
status code = 500 - other unknown errors
failure message in response - the description of the failure
/rest/jcr-backup/info/default-ws-config
Returns the JSON bean of WorkspaceEntry for the default workspace.
URL:
http://host:port/rest/jcr-backup/info/default-ws-config
Formats: json
Method: GET
Parameters: no
Returns:
Return when being successful
The JSON bean to org.exoplatform.services.jcr.config.WorkspaceEntry :
{
  "accessManager" : null,
  "autoInitPermissions" : null,
  "autoInitializedRootNt" : null,
  "cache" : {
    "parameters" : [
      { "name" : "max-size", "value" : "10k" },
      { "name" : "live-time", "value" : "1h" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl"
  },
  "container" : {
    "parameters" : [
      { "name" : "source-name", "value" : "jdbcjcr" },
      { "name" : "dialect", "value" : "hsqldb" },
      { "name" : "multi-db", "value" : "false" },
      { "name" : "max-buffer-size", "value" : "200k" },
      { "name" : "swap-directory", "value" : "../temp/swap/production" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer",
    "valueStorages" : [
      {
        "filters" : [
          { "ancestorPath" : null, "minValueSize" : 0, "propertyName" : null, "propertyType" : "Binary" }
        ],
        "id" : "system",
        "parameters" : [
          { "name" : "path", "value" : "../temp/values/production" }
        ],
        "type" : "org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage"
      }
    ]
  },
  "initializer" : {
    "parameters" : [
      { "name" : "root-nodetype", "value" : "nt:unstructured" }
    ],
    "type" : "org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer"
  },
  "lockManager" : {
    "timeout" : 15728640
  },
  "name" : "production",
  "queryHandler" : {
    "analyzer" : { },
    "autoRepair" : true,
    "bufferSize" : 10,
    "cacheSize" : 1000,
    "documentOrder" : true,
    "errorLogSize" : 50,
    "excerptProviderClass" : "org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt",
    "excludedNodeIdentifers" : null,
    "extractorBackLogSize" : 100,
    "extractorPoolSize" : 0,
    "extractorTimeout" : 100,
    "indexDir" : "../temp/jcrlucenedb/production",
    "indexingConfigurationClass" : "org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfigurationImpl",
    "indexingConfigurationPath" : null,
    "maxFieldLength" : 10000,
    "maxMergeDocs" : 2147483647,
    "mergeFactor" : 10,
    "minMergeDocs" : 100,
    "parameters" : [
      { "name" : "index-dir", "value" : "../temp/jcrlucenedb/production" }
    ],
    "queryClass" : "org.exoplatform.services.jcr.impl.core.query.QueryImpl",
    "queryHandler" : null,
    "resultFetchSize" : 2147483647,
    "rootNodeIdentifer" : "00exo0jcr0root0uuid0000000000000",
    "spellCheckerClass" : null,
    "supportHighlighting" : false,
    "synonymProviderClass" : null,
    "synonymProviderConfigPath" : null,
    "type" : "org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex",
    "useCompoundFile" : false,
    "volatileIdleTime" : 3
  },
  "uniqueName" : "repository_production"
}
Return when being failure
status code = 500 - unknown error
failure message in response - the description of the failure
Add the components org.exoplatform.services.jcr.ext.backup.server.HTTPBackupAgent and org.exoplatform.services.jcr.ext.backup.BackupManager to services configuration :
<component>
  <type>org.exoplatform.services.jcr.ext.backup.server.HTTPBackupAgent</type>
</component>
<component>
  <type>org.exoplatform.services.jcr.ext.repository.RestRepositoryService</type>
</component>
<component>
  <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key>
  <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type>
  <init-params>
    <properties-param>
      <name>backup-properties</name>
      <property name="backup-dir" value="../temp/backup" />
    </properties-param>
  </init-params>
</component>
If you restore a backup into the same workspace (which means the previous workspace is dropped), you need to configure RepositoryServiceConfiguration so that the changes to the repository configuration are saved. For example:
<component>
  <key>org.exoplatform.services.jcr.config.RepositoryServiceConfiguration</key>
  <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceConfigurationImpl</type>
  <init-params>
    <value-param>
      <name>conf-path</name>
      <description>JCR repositories configuration file</description>
      <value>jar:/conf/portal/exo-jcr-config.xml</value>
    </value-param>
    <properties-param>
      <name>working-conf</name>
      <description>working-conf</description>
      <property name="source-name" value="jdbcjcr" />
      <property name="dialect" value="hsqldb" />
      <property name="persister-class-name" value="org.exoplatform.services.jcr.impl.config.JDBCConfigurationPersister" />
    </properties-param>
  </init-params>
</component>
See the eXo JCR Configuration article at the 'Portal and Standalone configuration' section for details.
For GateIn, use the context "/portal/rest". GateIn uses form authentication, so you first need to log in (the form authentication URL is http://host:port/portal/login) and then perform requests.
The backup client supports form authentication. For example, to call the "info" command with form authentication against GateIn:
./jcrbackup.sh http://127.0.0.1:8080/portal/rest form POST "/portal/login?initialURI=/portal/private&username=root&password=gtn" info
The backup client is a console application.
It is an HTTP client for HTTPBackupAgent.
Command signature:
Help info:
 <url_basic_authentication>|<url form authentication> <cmd>
 <url_basic_authentication> : http(s)//login:password@host:port/<context>
 <url form authentication> : http(s)//host:port/<context> "<form auth parm>"
 <form auth parm> : form <method> <form path>
 <method> : POST or GET
 <form path> : /path/path?<paramName1>=<paramValue1>&<paramName2>=<paramValue2>...
 Example to <url form authentication> : http://127.0.0.1:8080/portal/rest form POST "/portal/login?initialURI=/portal/private&username=root&password=gtn"

 <cmd> : start <repo[/ws]> <backup_dir> [<incr>]
         stop <backup_id>
         status <backup_id>
         restores <repo[/ws]>
         restore [remove-exists] {{<backup_id>|<backup_set_path>} | {<repo[/ws]> {<backup_id>|<backup_set_path>} [<pathToConfigFile>]}}
         list [completed]
         info
         drop [force-close-session] <repo[/ws]>
         help

 start - start backup of repository or workspace
 stop - stop backup
 status - information about the current or completed backup by 'backup_id'
 restores - information about the last restore on specific repository or workspace
 restore - restore the repository or workspace from specific backup
 list - information about the current backups (in progress)
 list completed - information about the completed (ready to restore) backups
 info - information about the service backup
 drop - delete the repository or workspace
 help - print help information about backup console

 <repo[/ws]> - /<reponsitory-name>[/<workspace-name>] the repository or workspace
 <backup_dir> - path to folder for backup on remote server
 <backup_id> - the identifier for backup
 <backup_set_dir> - path to folder with backup set on remote server
 <incr> - incemental job period
 <pathToConfigFile> - path (local) to repository or workspace configuration
 remove-exists - remove fully (db, value storage, index) exists repository/workspace
 force-close-session - close opened sessions on repository or workspace.

 All valid combination of parameters for command restore:
 1. restore remove-exists <repo/ws> <backup_id> <pathToConfigFile>
 2. restore remove-exists <repo> <backup_id> <pathToConfigFile>
 3. restore remove-exists <repo/ws> <backup_set_path> <pathToConfigFile>
 4. restore remove-exists <repo> <backup_set_path> <pathToConfigFile>
 5. restore remove-exists <backup_id>
 6. restore remove-exists <backup_set_path>
 7. restore <repo/ws> <backup_id> <pathToConfigFile>
 8. restore <repo> <backup_id> <pathToConfigFile>
 9. restore <repo/ws> <backup_set_path> <pathToConfigFile>
 10. restore <repo> <backup_set_path> <pathToConfigFile>
 11. restore <backup_id>
 12. restore <backup_set_path>
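The twelve valid restore combinations listed above follow one pattern: an optional remove-exists flag, then either a backup id or backup set path alone, or a target (/repo or /repo/ws) together with a backup id or backup set path and a configuration file. The sketch below encodes that pattern as an argument builder, for example for a script that launches the console. The class and method names are hypothetical, and treating the target and the configuration file as an inseparable pair is an assumption drawn from the list of combinations, not from the client source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the backup console.
final class RestoreArgs {

    /**
     * Builds the argument list of a "restore" command.
     *
     * @param removeExists whether to add the remove-exists flag
     * @param target       "/repo" or "/repo/ws", or null to restore with the original configuration
     * @param backup       backup id or backup set path (required)
     * @param configFile   local path to the configuration file; given if and only if target is given
     */
    static List<String> build(boolean removeExists, String target, String backup, String configFile) {
        if (backup == null || backup.isEmpty()) {
            throw new IllegalArgumentException("a backup id or backup set path is required");
        }
        if ((target == null) != (configFile == null)) {
            throw new IllegalArgumentException("target and config file must be given together");
        }
        List<String> args = new ArrayList<>();
        args.add("restore");
        if (removeExists) {
            args.add("remove-exists");
        }
        if (target != null) {
            args.add(target);
        }
        args.add(backup);
        if (configFile != null) {
            args.add(configFile);
        }
        return args;
    }

    public static void main(String[] args) {
        // Combination 5: restore remove-exists <backup_id>
        System.out.println(String.join(" ", build(true, null, "5dcbc851c0a801c9545eb434947dbe87", null)));
        // Combination 7: restore <repo/ws> <backup_id> <pathToConfigFile>
        System.out.println(String.join(" ",
            build(false, "/repository/backup3", "6c302adc7f00010100df88d29535c6ee", "exo-jcr-config_backup3.xml")));
    }
}
```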
Go to the "backup client" folder ${JCR-SRC-HOME}/applications/exo.jcr.applications.backupconsole and build the application:
mvn clean install -P deploy
Go to ${JCR-SRC-HOME}/applications/exo.jcr.applications.backupconsole/target/backupconsole-binary and use it.
${JCR-SRC-HOME} is the path where the eXo JCR sources are located.
Run jar
java -jar exo.jcr.applications.backupconsole-binary.jar <command>
or use jcrbackup.cmd (or .sh);
jcrbackup http://root:exo@127.0.0.1:8080 info
Return :
The backup service information :
   full backup type : org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob
   incremetal backup type : org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob
   backup log folder : /home/rainf0x/java/exo-working/JCR-839/new_JCR/exo-tomcat/bin/../temp/backup
   default incremental job period : 3600
Start a full-only backup of the workspace "backup"; the <backup_dir> directory (../temp/backup) must already exist:
jcrbackup http://root:exo@127.0.0.1:8080 start /repository/backup ../temp/backup
Return :
Successful : status code = 200
Start full and incremental backup on workspace "production":
jcrbackup http://root:exo@127.0.0.1:8080 start /repository/production ../temp/backup 10000
Return :
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 list
Return :
The current backups information :
1) Backup with id b46370107f000101014b03ea5fbe8d54 :
   repository name : repository
   workspace name : production
   backup type : full + incremetal
   full backup state : finished
   incremental backup state : working
   started time : Fri, 17 Apr 2009 17:03:16 EEST
2) Backup with id b462e4427f00010101cf243b4c6015bb :
   repository name : repository
   workspace name : backup
   backup type : full only
   full backup state : finished
   started time : Fri, 17 Apr 2009 17:02:41 EEST
jcrbackup http://root:exo@127.0.0.1:8080 status b46370107f000101014b03ea5fbe8d54
Return:
The current backup information :
   backup id : b46370107f000101014b03ea5fbe8d54
   backup folder : /home/rainf0x/java/exo-working/JCR-839/new_JCR/exo-tomcat/bin/../temp/backup
   repository name : repository
   workspace name : production
   backup type : full + incremetal
   full backup state : finished
   incremental backup state : working
   started time : Fri, 17 Apr 2009 17:03:16 EEST
jcrbackup http://root:exo@127.0.0.1:8080 stop 6c302adc7f00010100df88d29535c6ee
Return:
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 list completed
Return:
The completed (ready to restore) backups information :
1) Backup with id adf6fadc7f00010100053b2cba43513c :
   repository name : repository
   workspace name : backup
   backup type : full only
   started time : Thu, 16 Apr 2009 11:07:05 EEST
2) Backup with id b46370107f000101014b03ea5fbe8d54 :
   repository name : repository
   workspace name : production
   backup type : full + incremetal
   started time : Fri, 17 Apr 2009 17:03:16 EEST
3) Backup with id aec419cc7f000101004aca277b2b4e9f :
   repository name : repository
   workspace name : backup8
   backup type : full only
   started time : Thu, 16 Apr 2009 14:51:08 EEST
Restore to workspace "backup3". The restore needs the <backup_id> of a completed backup and the path to a file with the workspace configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository/backup3 6c302adc7f00010100df88d29535c6ee /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup3.xml
Return:
Successful : status code = 200
Get information about the current restore for workspace /repository/backup3:
jcrbackup http://root:exo@127.0.0.1:8080 restores
Return:
The current restores information : 1) Restore with id 6c302adc7f00010100df88d29535c6ee: full backup date : 2009-04-03T16:34:37.394+03:00 backup log file : /home/rainf0x/java/exo-working/JCR-839/exo-tomcat/bin/../temp/backup/backup-6c302adc7f00010100df88d29535c6ee.xml repository name : repository workspace name : backup3 backup type : full only path to backup folder : /home/rainf0x/java/exo-working/JCR-839/exo-tomcat/bin/../temp/backup restore state : successful
Restore to workspace "backup" and completely remove the existing workspace (its content is removed from the database, value storage, and index). The restore needs the <backup_id> of a completed backup and the path to a file with the workspace configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /repository/backup 6c302adc7f00010100df88d29535c6ee /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup.xml
Return:
Successful : status code = 200
Restore to workspace "backup". The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side) and the path to a file with the workspace configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository/backup /tmp/123/repository_backup-20101220_114156 /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup.xml
Return:
Successful : status code = 200
Restore to workspace "backup" and completely remove the existing workspace (its content is removed from the database, value storage, and index). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side) and the path to a file with the workspace configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /repository/backup /tmp/123/repository_backup-20101220_114156 /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup.xml
Return:
Successful : status code = 200
Restore to workspace "backup" with the original workspace configuration (the original configuration was stored in the backup set). The restore needs the <backup_id> of a completed backup:
jcrbackup http://root:exo@127.0.0.1:8080 restore 6c302adc7f00010100df88d29535c6ee
Return:
Successful : status code = 200
Restore to workspace "backup" with the original workspace configuration (the original configuration was stored in the backup set) and completely remove the existing workspace (its content is removed from the database, value storage, and index). The restore needs the <backup_id> of a completed backup:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists 6c302adc7f00010100df88d29535c6ee
Return:
Successful : status code = 200
Restore to workspace "backup" with the original workspace configuration (the original configuration was stored in the backup set). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side):
jcrbackup http://root:exo@127.0.0.1:8080 restore /tmp/123/repository_backup-20101220_114156
Return:
Successful : status code = 200
Restore to workspace "backup" with the original workspace configuration (the original configuration was stored in the backup set) and completely remove the existing workspace (its content is removed from the database, value storage, and index). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side):
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /tmp/123/repository_backup-20101220_114156
Return:
Successful : status code = 200
Restore to repository "repository". The restore needs the <backup_id> of a completed backup and the path to a file with the repository configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository 6c302adc7f00010100df88d29535c6ee /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config.xml
Return:
Successful : status code = 200
Restore to repository "repository" and completely remove the existing repository (its content is removed from the database, value storage, and index). The restore needs the <backup_id> of a completed backup and the path to a file with the repository configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /repository 6c302adc7f00010100df88d29535c6ee /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config.xml
Return:
Successful : status code = 200
Restore to repository "repository". The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side) and the path to a file with the repository configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository /tmp/123/repository_repository_backup_1292833493681 /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config.xml
Return:
Successful : status code = 200
Restore to repository "repository" and completely remove the existing repository (its content is removed from the database, value storage, and index). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side) and the path to a file with the repository configuration:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /repository /tmp/123/repository_repository_backup_1292833493681 /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config.xml
Return:
Successful : status code = 200
Restore to repository "repository" with the original repository configuration (the original configuration was stored in the backup set). The restore needs the <backup_id> of a completed backup:
jcrbackup http://root:exo@127.0.0.1:8080 restore 6c302adc7f00010100df88d29535c6ee
Return:
Successful : status code = 200
Restore to repository "repository" with the original repository configuration (the original configuration was stored in the backup set) and completely remove the existing repository (its content is removed from the database, value storage, and index). The restore needs the <backup_id> of a completed backup:
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists 6c302adc7f00010100df88d29535c6ee
Return:
Successful : status code = 200
Restore to repository "repository" with the original repository configuration (the original configuration was stored in the backup set). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side):
jcrbackup http://root:exo@127.0.0.1:8080 restore /tmp/123/repository_repository_backup_1292833493681
Return:
Successful : status code = 200
Restore to repository "repository" with the original repository configuration (the original configuration was stored in the backup set) and completely remove the existing repository (its content is removed from the database, value storage, and index). The restore needs the <backup_set_path> of a completed backup (the path to the backup set folder on the server side):
jcrbackup http://root:exo@127.0.0.1:8080 restore remove-exists /tmp/123/repository_repository_backup_1292833493681
Return:
Successful : status code = 200
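The restore variants above differ only in which optional arguments are present. The following is a minimal sketch of that pattern, inferred from the examples rather than from a formal grammar of the tool. RestoreArgs and its parameters are hypothetical names that are not part of eXo JCR, and the rule that the target path and the configuration file are given together is an assumption based on the examples shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of eXo JCR): assembles jcrbackup "restore" arguments. */
final class RestoreArgs {
    private RestoreArgs() {
    }

    /**
     * @param serverUrl    server URL with credentials, e.g. "http://root:exo@127.0.0.1:8080"
     * @param removeExists true to add the "remove-exists" keyword
     * @param target       "/repository" or "/repository/workspace", or null to restore with
     *                     the original configuration stored in the backup set
     * @param backupRef    a backup id or a server-side backup set path
     * @param configPath   path to the configuration file, or null when target is null
     */
    static List<String> build(String serverUrl, boolean removeExists, String target,
                              String backupRef, String configPath) {
        // Assumption drawn from the examples: target and configuration file appear together.
        if ((target == null) != (configPath == null)) {
            throw new IllegalArgumentException("target and configPath must be given together");
        }
        List<String> args = new ArrayList<String>();
        args.add("jcrbackup");
        args.add(serverUrl);
        args.add("restore");
        if (removeExists) {
            args.add("remove-exists");
        }
        if (target != null) {
            args.add(target);
        }
        args.add(backupRef);
        if (configPath != null) {
            args.add(configPath);
        }
        return args;
    }
}
```

The resulting list can be joined with spaces for display or passed to a process launcher; the sketch only builds the arguments and does not contact the server.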
jcrbackup http://root:exo@127.0.0.1:8080 start /repository/backup ../temp/backup 10000
Return :
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 list
Return :
The current backups information : 1) Backup with id b469ba957f0001010178febaedf20eb7 : repository name : repository workspace name : backup backup type : full + incremetal full backup state : finished incremental backup state : working started time : Fri, 17 Apr 2009 17:10:09 EEST
Stop backup with id b469ba957f0001010178febaedf20eb7 :
jcrbackup http://root:exo@127.0.0.1:8080 stop b469ba957f0001010178febaedf20eb7
Return :
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 drop force-close-session /repository/backup
Return :
Successful : status code = 200
Delete/clean the database for workspace "backup". When "single-db" is used, run the following SQL queries to clean the database:
delete from JCR_SREF where NODE_ID in (select ID from JCR_SITEM where CONTAINER_NAME = 'backup');
delete from JCR_SVALUE where PROPERTY_ID in (select ID from JCR_SITEM where CONTAINER_NAME = 'backup');
delete from JCR_SITEM where CONTAINER_NAME = 'backup';
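If the cleanup is scripted rather than typed into a SQL console, the same three statements can be run through JDBC in a single transaction. This is a minimal sketch: SingleDbWorkspaceCleaner is a hypothetical class name, the workspace name is passed as a bind parameter instead of the literal 'backup', and obtaining the Connection (for example from the jdbcjcr datasource) is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of eXo JCR): cleans one workspace in a single-db schema. */
final class SingleDbWorkspaceCleaner {
    // JCR_SITEM is deleted last because the first two statements select from it.
    static final List<String> STATEMENTS = Collections.unmodifiableList(Arrays.asList(
        "delete from JCR_SREF where NODE_ID in (select ID from JCR_SITEM where CONTAINER_NAME = ?)",
        "delete from JCR_SVALUE where PROPERTY_ID in (select ID from JCR_SITEM where CONTAINER_NAME = ?)",
        "delete from JCR_SITEM where CONTAINER_NAME = ?"));

    private SingleDbWorkspaceCleaner() {
    }

    /** Runs the three deletes atomically; rolls back if any of them fails. */
    static void clean(Connection connection, String workspaceName) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            for (String sql : STATEMENTS) {
                try (PreparedStatement statement = connection.prepareStatement(sql)) {
                    statement.setString(1, workspaceName);
                    statement.executeUpdate();
                }
            }
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```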
Delete the value storage for workspace "backup";
Delete the index data for workspace "backup";
Restore:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository/backup b469ba957f0001010178febaedf20eb7 /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup.xml
Return :
Successful : status code = 200
The file /home/rainf0x/java/exo-working/JCR-839/exo-jcr-config_backup.xml contains the configuration for the restored workspace "backup":
<repository-service default-repository="repository">
  <repositories>
    <repository name="repository" system-workspace="production" default-workspace="production">
      <security-domain>exo-domain</security-domain>
      <access-control>optional</access-control>
      <authentication-policy>org.exoplatform.services.jcr.impl.core.access.JAASAuthenticator</authentication-policy>
      <workspaces>
        <workspace name="backup">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="dialect" value="pgsql" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="../temp/swap/backup" />
            </properties>
            <value-storages>
              <value-storage id="draft" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
                <properties>
                  <property name="path" value="../temp/values/backup" />
                </properties>
                <filters>
                  <filter property-type="Binary"/>
                </filters>
              </value-storage>
            </value-storages>
          </container>
          <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
            <properties>
              <property name="root-nodetype" value="nt:unstructured" />
            </properties>
          </initializer>
          <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl">
            <properties>
              <property name="max-size" value="10k" />
              <property name="live-time" value="1h" />
            </properties>
          </cache>
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="../temp/jcrlucenedb/backup" />
            </properties>
          </query-handler>
          <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
            <properties>
              <property name="time-out" value="15m" />
              <property name="jbosscache-configuration" value="jbosscache-lock.xml" />
              <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks" />
              <property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
              <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
              <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk" />
              <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
              <property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
              <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
              <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" />
              <property name="jbosscache-shareable" value="true" />
            </properties>
          </lock-manager>
        </workspace>
      </workspaces>
    </repository>
  </repositories>
</repository-service>
jcrbackup http://root:exo@127.0.0.1:8080 restores /repository/backup
Return:
The current restores information : Restore with id b469ba957f0001010178febaedf20eb7: backup folder : /home/rainf0x/java/exo-working/JCR-839/new_JCR/exo-tomcat/bin/../temp/backup repository name : repository workspace name : backup backup type : full + incremetal restore state : successful started time : Fri, 17 Apr 2009 16:38:00 EEST finished time : Fri, 17 Apr 2009 16:38:00 EEST
If you delete the default repository, the repository that is restored must have the same name as the default repository.
This use case requires RestRepositoryService to be enabled, because deleting the repository depends on it:
<component>
  <type>org.exoplatform.services.jcr.ext.repository.RestRepositoryService</type>
</component>
jcrbackup http://root:exo@127.0.0.1:8080 start /repository ../temp/backup 10000
Return :
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 list
Return :
The current backups information : 1) Repository backup with id 9a4d40fb7f0000012ec8f0a4ec70b3da : repository name : repository backup type : full + incremetal full backups state : finished incremental backups state : working started time : Mon, 11 Oct 2010 10:59:35 EEST
Stop backup with id 9a4d40fb7f0000012ec8f0a4ec70b3da :
jcrbackup http://root:exo@127.0.0.1:8080 stop 9a4d40fb7f0000012ec8f0a4ec70b3da
Return :
Successful : status code = 200
jcrbackup http://root:exo@127.0.0.1:8080 drop force-close-session /repository
Return :
Successful : status code = 200
Delete/clean the database for repository "repository". When "single-db" is used, run the following SQL queries to clean the database:
drop table JCR_SREF; drop table JCR_SVALUE; drop table JCR_SITEM;
Delete the value storage for repository "repository";
Delete the index data for repository "repository";
Restore:
jcrbackup http://root:exo@127.0.0.1:8080 restore /repository 9a6dba327f000001325dfb228a181b07 /home/rainf0x/exo-jcr-config_backup.xml
Return :
Successful : status code = 200
The file /home/rainf0x/exo-jcr-config_backup.xml contains the configuration for the restored repository "repository":
<repository-service default-repository="repository">
  <repositories>
    <repository name="repository" system-workspace="production" default-workspace="production">
      <security-domain>exo-domain</security-domain>
      <access-control>optional</access-control>
      <authentication-policy>org.exoplatform.services.jcr.impl.core.access.JAASAuthenticator</authentication-policy>
      <workspaces>
        <workspace name="production">
          <!-- for system storage -->
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="../temp/swap/production" />
            </properties>
            <value-storages>
              <value-storage id="system" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
                <properties>
                  <property name="path" value="../temp/values/production" />
                </properties>
                <filters>
                  <filter property-type="Binary" />
                </filters>
              </value-storage>
            </value-storages>
          </container>
          <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
            <properties>
              <property name="root-nodetype" value="nt:unstructured" />
            </properties>
          </initializer>
          <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl">
            <properties>
              <property name="max-size" value="10k" />
              <property name="live-time" value="1h" />
            </properties>
          </cache>
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="../temp/jcrlucenedb/production" />
            </properties>
          </query-handler>
          <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
            <properties>
              <property name="time-out" value="15m" />
              <property name="jbosscache-configuration" value="jbosscache-lock.xml" />
              <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks" />
              <property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
              <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
              <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk" />
              <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
              <property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
              <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
              <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" />
              <property name="jbosscache-shareable" value="true" />
            </properties>
          </lock-manager>
        </workspace>
        <workspace name="backup">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="../temp/swap/backup" />
            </properties>
            <value-storages>
              <value-storage id="draft" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
                <properties>
                  <property name="path" value="../temp/values/backup" />
                </properties>
                <filters>
                  <filter property-type="Binary" />
                </filters>
              </value-storage>
            </value-storages>
          </container>
          <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
            <properties>
              <property name="root-nodetype" value="nt:unstructured" />
            </properties>
          </initializer>
          <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl">
            <properties>
              <property name="max-size" value="10k" />
              <property name="live-time" value="1h" />
            </properties>
          </cache>
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="../temp/jcrlucenedb/backup" />
            </properties>
          </query-handler>
          <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
            <properties>
              <property name="time-out" value="15m" />
              <property name="jbosscache-configuration" value="jbosscache-lock.xml" />
              <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks" />
              <property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
              <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
              <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk" />
              <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
              <property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
              <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
              <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" />
              <property name="jbosscache-shareable" value="true" />
            </properties>
          </lock-manager>
        </workspace>
        <workspace name="digital-assets">
          <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
            <properties>
              <property name="source-name" value="jdbcjcr" />
              <property name="multi-db" value="false" />
              <property name="max-buffer-size" value="200k" />
              <property name="swap-directory" value="../temp/swap/digital-assets" />
            </properties>
            <value-storages>
              <value-storage id="digital-assets" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
                <properties>
                  <property name="path" value="../temp/values/digital-assets" />
                </properties>
                <filters>
                  <filter property-type="Binary" />
                </filters>
              </value-storage>
            </value-storages>
          </container>
          <initializer class="org.exoplatform.services.jcr.impl.core.ScratchWorkspaceInitializer">
            <properties>
              <property name="root-nodetype" value="nt:folder" />
            </properties>
          </initializer>
          <cache enabled="true" class="org.exoplatform.services.jcr.impl.dataflow.persistent.LinkedWorkspaceStorageCacheImpl">
            <properties>
              <property name="max-size" value="5k" />
              <property name="live-time" value="15m" />
            </properties>
          </cache>
          <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
            <properties>
              <property name="index-dir" value="../temp/jcrlucenedb/digital-assets" />
            </properties>
          </query-handler>
          <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
            <properties>
              <property name="time-out" value="15m" />
              <property name="jbosscache-configuration" value="jbosscache-lock.xml" />
              <property name="jbosscache-cl-cache.jdbc.table.name" value="jcrlocks" />
              <property name="jbosscache-cl-cache.jdbc.table.create" value="true" />
              <property name="jbosscache-cl-cache.jdbc.table.drop" value="false" />
              <property name="jbosscache-cl-cache.jdbc.table.primarykey" value="jcrlocks_pk" />
              <property name="jbosscache-cl-cache.jdbc.fqn.column" value="fqn" />
              <property name="jbosscache-cl-cache.jdbc.node.column" value="node" />
              <property name="jbosscache-cl-cache.jdbc.parent.column" value="parent" />
              <property name="jbosscache-cl-cache.jdbc.datasource" value="jdbcjcr" />
              <property name="jbosscache-shareable" value="true" />
            </properties>
          </lock-manager>
        </workspace>
      </workspaces>
    </repository>
  </repositories>
</repository-service>
jcrbackup http://root:exo@127.0.0.1:8080 restores /repository
Return:
Repository restore with id 9a6dba327f000001325dfb228a181b07: backup folder : /home/rainf0x/java/exo-working/JCR-1459/exo-tomcat/bin/../temp/backup/repository_repository_backup_1286786103858 repository name : repository backup type : full + incremetal restore state : successful started time : Mon, 11 Oct 2010 11:51:15 EEST finished time : Mon, 11 Oct 2010 11:51:17 EEST
To keep all the data of your repository consistent, you have to suspend it which means that all the working threads will be suspended until the resume operation is performed. Indexes will be flushed during the suspend operation.
You can suspend your repository by calling the suspend method on the MBean of the RepositorySuspendController corresponding to your repository as shown below:
The suspend method returns "suspended" if everything worked well. Otherwise it returns "undefined", which means that at least one component has not been suspended successfully; in that case, check the log file to understand what happened.
Now the data can be backed up manually or with third-party software. You need to back up:
The database content
The Lucene indexes
The value storages content (if configured)
This section will show you how to get and manage all statistics provided by eXo JCR.
To get a better idea of the time spent in the database access layer, it can be useful to collect statistics on that part of the code, since most of the time spent in eXo JCR is database access. These statistics let you identify, without using a profiler, what is slow in this layer, which can help to fix the problem quickly.
If you use org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer or org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer as WorkspaceDataContainer, you can get statistics on the time spent in the database access layer. In eXo JCR, the database access layer is represented by the methods of the interface org.exoplatform.services.jcr.storage.WorkspaceStorageConnection, so the following figures are available for every method defined in this interface:
The minimum time spent into the method.
The maximum time spent into the method.
The average time spent into the method.
The total amount of time spent into the method.
The total number of times the method has been called.
Those figures are also available globally for all the methods which gives us the global behavior of this layer.
To enable the statistics, set the JVM parameter JDBCWorkspaceDataContainer.statistics.enabled to true. The corresponding CSV file is StatisticsJDBCStorageConnection-${creation-timestamp}.csv. For more details about how the CSV files are managed, please refer to the section dedicated to the statistics manager.
The format of each column header is ${method-alias}-${metric-alias}. The metric aliases are described in the statistics manager section.
The name of the category of statistics corresponding to these statistics is JDBCStorageConnection; this name is mostly needed to access the statistics through JMX.
Table 1.45. Method Alias
global | This is the alias for all the methods. |
getItemDataById | This is the alias for the method getItemData(String identifier). |
getItemDataByNodeDataNQPathEntry | This is the alias for the method getItemData(NodeData parentData, QPathEntry name). |
getChildNodesData | This is the alias for the method getChildNodesData(NodeData parent). |
getChildNodesCount | This is the alias for the method getChildNodesCount(NodeData parent). |
getChildPropertiesData | This is the alias for the method getChildPropertiesData(NodeData parent). |
listChildPropertiesData | This is the alias for the method listChildPropertiesData(NodeData parent). |
getReferencesData | This is the alias for the method getReferencesData(String nodeIdentifier). |
commit | This is the alias for the method commit(). |
addNodeData | This is the alias for the method add(NodeData data). |
addPropertyData | This is the alias for the method add(PropertyData data). |
updateNodeData | This is the alias for the method update(NodeData data). |
updatePropertyData | This is the alias for the method update(PropertyData data). |
deleteNodeData | This is the alias for the method delete(NodeData data). |
deletePropertyData | This is the alias for the method delete(PropertyData data). |
renameNodeData | This is the alias for the method rename(NodeData data). |
rollback | This is the alias for the method rollback(). |
isOpened | This is the alias for the method isOpened(). |
close | This is the alias for the method close(). |
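When the CSV file is post-processed, each column header can be split back into its method alias and metric alias. The sketch below is illustrative only: StatisticsColumn is a hypothetical class, and it splits on the last hyphen on the assumption that metric aliases (Min, Max, Total, Avg, Times) never contain one.

```java
/** Hypothetical helper (not part of eXo JCR): one parsed CSV column header. */
final class StatisticsColumn {
    final String methodAlias;
    final String metricAlias;

    private StatisticsColumn(String methodAlias, String metricAlias) {
        this.methodAlias = methodAlias;
        this.metricAlias = metricAlias;
    }

    /** Splits a header of the form ${method-alias}-${metric-alias} on its last hyphen. */
    static StatisticsColumn parse(String header) {
        int separator = header.lastIndexOf('-');
        if (separator <= 0 || separator == header.length() - 1) {
            throw new IllegalArgumentException("Not a statistics column header: " + header);
        }
        return new StatisticsColumn(header.substring(0, separator), header.substring(separator + 1));
    }
}
```

For instance, parsing a header such as getItemDataById-Avg yields the method alias getItemDataById and the metric alias Avg.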
To know exactly how your application uses eXo JCR, it can be useful to record all the JCR API accesses. This makes it easy to create real-life test scenarios based on pure JCR calls, and to tune eXo JCR to better fit your requirements.
To let you configure which parts of eXo JCR are monitored without changing your code or building anything, this feature relies on the load-time weaving provided by AspectJ.
To enable this feature, add the following jar files to your classpath:
exo.jcr.component.statistics-X.Y.Z.jar corresponding to your eXo JCR version, which you can get from the JBoss Maven repository https://repository.jboss.org/nexus/content/groups/public/org/exoplatform/jcr/exo.jcr.component.statistics.
aspectjrt-1.6.8.jar, which you can get from the main Maven repository http://repo2.maven.org/maven2/org/aspectj/aspectjrt.
You will also need aspectjweaver-1.6.8.jar from the main Maven repository http://repo2.maven.org/maven2/org/aspectj/aspectjweaver. At this stage, to enable the statistics on the JCR API accesses, add the JVM parameter -javaagent:${pathto}/aspectjweaver-1.6.8.jar to your command line. For more details, please refer to http://www.eclipse.org/aspectj/doc/released/devguide/ltw-configuration.html.
By default, the configuration will collect statistics on all the methods of the internal interfaces org.exoplatform.services.jcr.core.ExtendedSession and org.exoplatform.services.jcr.core.ExtendedNode, and the JCR API interface javax.jcr.Property. To add and/or remove some interfaces to monitor, you have two configuration files to change that are bundled into the jar exo.jcr.component.statistics-X.Y.Z.jar, which are conf/configuration.xml and META-INF/aop.xml.
The content of conf/configuration.xml is shown below. To add or remove interfaces to monitor, edit the fully qualified interface names in the list of values of the init param called targetInterfaces.
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.exoplatform.org/xml/ns/kernel_1_3.xsd http://www.exoplatform.org/xml/ns/kernel_1_3.xsd"
   xmlns="http://www.exoplatform.org/xml/ns/kernel_1_3.xsd">
  <component>
    <type>org.exoplatform.services.jcr.statistics.JCRAPIAspectConfig</type>
    <init-params>
      <values-param>
        <name>targetInterfaces</name>
        <value>org.exoplatform.services.jcr.core.ExtendedSession</value>
        <value>org.exoplatform.services.jcr.core.ExtendedNode</value>
        <value>javax.jcr.Property</value>
      </values-param>
    </init-params>
  </component>
</configuration>
The content of META-INF/aop.xml is shown below. To add or remove interfaces to monitor, edit the fully qualified interface names in the expression filter of the pointcut called JCRAPIPointcut. By default, only JCR API calls from the exoplatform packages are taken into account; do not hesitate to modify this filter to add your own package names.
<aspectj>
  <aspects>
    <concrete-aspect name="org.exoplatform.services.jcr.statistics.JCRAPIAspectImpl" extends="org.exoplatform.services.jcr.statistics.JCRAPIAspect">
      <pointcut name="JCRAPIPointcut"
        expression="(target(org.exoplatform.services.jcr.core.ExtendedSession) || target(org.exoplatform.services.jcr.core.ExtendedNode) || target(javax.jcr.Property)) &amp;&amp; call(public * *(..))" />
    </concrete-aspect>
  </aspects>
  <weaver options="-XnoInline">
    <include within="org.exoplatform..*" />
  </weaver>
</aspectj>
The corresponding CSV files are named Statistics${interface-name}-${creation-timestamp}.csv. For more details about how the CSV files are managed, please refer to the section dedicated to the statistics manager.
The format of each column header is ${method-alias}-${metric-alias}. The method alias is of the form ${method-name}(list of parameter types separated by ; to be compatible with the CSV format).
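To make the method alias format concrete, the sketch below derives such an alias from a java.lang.reflect.Method. It is only an illustration of the format: MethodAliases is a hypothetical class, and the use of simple (unqualified) type names is an assumption, since the text does not say whether the real aliases use simple or fully qualified names.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

/** Hypothetical helper (not part of eXo JCR): builds aliases like name(Type1;Type2). */
final class MethodAliases {
    private MethodAliases() {
    }

    /** Joins the parameter types with ';' so that the alias never contains a comma. */
    static String aliasOf(Method method) {
        StringJoiner parameters = new StringJoiner(";", "(", ")");
        for (Class<?> type : method.getParameterTypes()) {
            // Assumption: simple names; the real aliases may use qualified names.
            parameters.add(type.getSimpleName());
        }
        return method.getName() + parameters;
    }

    /** Convenience overload that looks the public method up by name and parameter types. */
    static String aliasOf(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return aliasOf(owner.getMethod(name, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Using a semicolon between the parameter types keeps each alias inside a single CSV cell, which is the compatibility concern mentioned above.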
The metric aliases are described in the statistics manager section.
The name of the category of statistics corresponding to these statistics is the simple name of the monitored interface (e.g. ExtendedSession for org.exoplatform.services.jcr.core.ExtendedSession); this name is mostly needed to access the statistics through JMX.
Please note that this feature affects the performance of eXo JCR, so it must be used with caution.
The statistics manager manages all the statistics provided by eXo JCR. It is responsible for printing the data into the CSV files and for exposing the statistics through JMX and/or REST.
The statistics manager creates one CSV file for each category of statistics that it manages. The files are named Statistics${category-name}-${creation-timestamp}.csv and are created in the user directory if possible, otherwise in the temporary directory. The files are in CSV format (i.e. Comma-Separated Values): one new line is added regularly (every 5 seconds by default) and one last line is added at JVM exit. Each line is composed of the 5 figures described below, for each method and globally for all the methods.
Table 1.46. Metric Alias
Min | The minimum time spent into the method expressed in milliseconds. |
Max | The maximum time spent into the method expressed in milliseconds. |
Total | The total amount of time spent into the method expressed in milliseconds. |
Avg | The average time spent into the method expressed in milliseconds. |
Times | The total number of times the method has been called. |
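The five figures can be derived from a stream of call durations with a very small accumulator. The class below is a minimal sketch of that bookkeeping, not the implementation used by eXo JCR; MethodStatistics and its method names are hypothetical, and returning 0 for Min and Avg before the first call is a choice made for this example.

```java
/** Hypothetical accumulator (not the eXo JCR implementation) for the five metrics. */
final class MethodStatistics {
    private long min = Long.MAX_VALUE;
    private long max;
    private long total;
    private long times;

    /** Records one call that took the given number of milliseconds. */
    synchronized void record(long elapsedMillis) {
        if (elapsedMillis < min) {
            min = elapsedMillis;
        }
        if (elapsedMillis > max) {
            max = elapsedMillis;
        }
        total += elapsedMillis;
        times++;
    }

    synchronized long getMin() {
        return times == 0 ? 0 : min;
    }

    synchronized long getMax() {
        return max;
    }

    synchronized long getTotal() {
        return total;
    }

    synchronized double getAvg() {
        return times == 0 ? 0.0 : (double) total / times;
    }

    synchronized long getTimes() {
        return times;
    }
}
```

Only Min, Max, Total and Times need to be stored; Avg is computed on demand as Total divided by Times.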
You can disable the persistence of the statistics by setting the JVM parameter JCRStatisticsManager.persistence.enabled to false; by default, it is set to true. You can also define the period of time between each record (i.e. line of data in the file) by setting the JVM parameter JCRStatisticsManager.persistence.timeout to the expected value in milliseconds; by default, it is set to 5000.
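Assuming these JVM parameters are passed as system properties (for example -DJCRStatisticsManager.persistence.timeout=10000), they can be inspected from Java as sketched below. StatisticsSettings is a hypothetical class written for this example; it mirrors the documented defaults but is not how eXo JCR itself reads the values.

```java
/** Hypothetical reader (not part of eXo JCR) for the statistics-related JVM parameters. */
final class StatisticsSettings {
    private StatisticsSettings() {
    }

    /** Persistence of the statistics into CSV files; documented default is true. */
    static boolean persistenceEnabled() {
        return Boolean.parseBoolean(
            System.getProperty("JCRStatisticsManager.persistence.enabled", "true"));
    }

    /** Period between two CSV records in milliseconds; documented default is 5000. */
    static long persistenceTimeoutMillis() {
        return Long.getLong("JCRStatisticsManager.persistence.timeout", 5000L);
    }

    /** Database access layer statistics; enabled only when explicitly set to true. */
    static boolean jdbcStatisticsEnabled() {
        return Boolean.getBoolean("JDBCWorkspaceDataContainer.statistics.enabled");
    }
}
```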
You can also access the statistics through JMX. The available methods are the following:
Table 1.47. JMX Methods
getMin | Returns the minimum time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
getMax | Returns the maximum time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
getTotal | Returns the total amount of time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
getAvg | Returns the average time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
getTimes | Returns the total number of times the method corresponding to the given category name and statistics name has been called. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
reset | Reset the statistics for the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method or global for the global value. |
resetAll | Reset all the statistics for the given category name. The expected argument is the name of the category of statistics (e.g. JDBCStorageConnection). |
The full name of the related MBean is
exo:service=statistic, view=jcr.
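As an illustration, the operations listed in Table 1.47 could be invoked from Java through the standard javax.management API. This is a sketch under assumptions: the class JcrStatisticsJmxClient is hypothetical, both arguments are assumed to be of type java.lang.String, and the MBean is looked up with a pattern (trailing ",*") because the registered name may carry additional key properties depending on the container.

```java
import java.io.IOException;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical client (not an eXo JCR class) for the statistics MBean.
final class JcrStatisticsJmxClient {

    // Pattern derived from the documented name; ",*" tolerates extra key properties (assumption).
    public static ObjectName statisticsPattern() {
        try {
            return new ObjectName("exo:service=statistic,view=jcr,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes e.g. "getMin" with a category such as "JDBCStorageConnection"
    // and a method name, or "global" for the global value.
    public static Object invokeStatistic(MBeanServerConnection connection, String operation,
                                         String category, String statisticsName) {
        try {
            Set<ObjectName> names = connection.queryNames(statisticsPattern(), null);
            if (names.isEmpty()) {
                throw new IllegalStateException("No JCR statistics MBean is registered");
            }
            ObjectName mbean = names.iterator().next();
            // The parameter types are assumed to be String; adjust if the MBean differs.
            return connection.invoke(mbean, operation,
                    new Object[] {category, statisticsName},
                    new String[] {String.class.getName(), String.class.getName()});
        } catch (IOException | JMException e) {
            throw new IllegalStateException("JMX call failed: " + operation, e);
        }
    }
}
```

The connection can come from a remote JMXConnector or, inside the same JVM, from ManagementFactory.getPlatformMBeanServer().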
It is highly recommended to back up your data before repairing inconsistencies (either automatically or manually). It is also recommended to store the results of the queries that check the data consistency. This may be useful for the support team if a deeper restoration process is needed.
Production systems, like any other systems, may develop faults over time. Faults may be caused by hardware or software problems, human error during updates, and many other circumstances. It is important to check the integrity and consistency of the system, especially if there is no backup, the backup is stale, or the recovery process would take a long time. The eXo JCR implementation offers a JMX-based consistency checking tool. During an inspection, this tool checks every major JCR component, such as the persistent data layer and the index. The persistent layer includes the JDBC Data Container and the Value Storage, if they are configured. The database is verified using a set of specialized domain-specific queries. The Value Storage check verifies that each file exists and is accessible. Index verification is a two-way pass: each node in the index is checked against the persistent layer and, in the opposite direction, each node from the Data Container is validated against the index. The checking tool is exposed via the JMX interface (RepositoryCheckController MBean) with the following operations available:
Table 1.48. Check methods
Operation | Description |
---|---|
checkAll() | Inspect the full repository data (database, value storage and search indexes). |
checkDataBase() | Inspect only the DB. |
checkValueStorage() | Inspect only the value storage. |
checkIndex() | Inspect only the search indexes. |
Among the list of known inconsistencies described in the next section, see below what can be checked and repaired automatically:
An item has no parent node: Properties will be removed and the root UUID will be assigned in case of nodes.
A node has a single valued property with nothing declared in the VALUE table: This property will be removed if it is not required by primary type of its node.
A node has no primary type property: This node and the whole subtree will be removed if it is not required by primary type of its parent.
Value record has no related property record: Value record will be removed from database.
An item is its own parent: Properties will be removed and root UUID will be assigned in case of nodes.
Several versions of same item: All earlier records with earlier versions will be removed from ITEM table.
Reference properties without reference records: The property will be removed if it is not required by the primary type of its node.
A node is marked as locked in the lockmanager's table but not in ITEM table or the opposite: All lock inconsistencies will be removed from both tables.
The only inconsistency that cannot be fixed automatically is corrupted VALUE records, where both the STORAGE_DESC and DATA fields contain a non-null value. There is no way to determine which value is valid: the one on the file system or the one in the database.
The list of ValueStorage inconsistencies which can be checked and repaired automatically:
Property's value is stored in the File System but the content is missing: A new empty file corresponding to this value will be created.
The list of SearchIndex inconsistencies which can be checked is given below. To repair them, the content must be reindexed completely, which can also be done using JMX:
Not indexed document
Document indexed more than one time
Document corresponds to removed node
Table 1.49. Repair methods
Operation | Description |
---|---|
repairDataBase() | Repair DB inconsistencies declared above. |
repairValueStorage() | Repair value storage inconsistencies declared above. |
All tool activities are logged to a file, which can be found in the app directory.
The file name follows the pattern report-<repository name>-dd-MMM-yy-HH-mm.txt.
Here are examples of corrupted JCR and ways to eliminate them:
It is assumed that queries for single and multi DB configurations are different only in the JCR_xITEM table name, otherwise queries will be explicitly introduced.
In some examples, you will be asked to replace some identifiers with the corresponding values. This means that, for each row returned by the query executed during the issue detection stage, you insert the values of that row into the corresponding placeholders. If the replacement has to be done in another way, it is explained explicitly.
Items have no parent nodes.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM I where NOT EXISTS(select * from JCR_SITEM P where P.ID = I.PARENT_ID)
Fix description: Assign root as parent node to be able to delete this node later if it is not needed anymore.
To fix this issue, do as follows:
For all result rows containing items with I_CLASS = 1 (nodes), execute the following query, replacing ${ID} and ${CONTAINER_NAME} with the corresponding values:
Single DB:
update JCR_SITEM set PARENT_ID='${CONTAINER_NAME}00exo0jcr0root0uuid0000000000000' where ID = '${ID}'
Multi DB:
update JCR_MITEM set PARENT_ID='00exo0jcr0root0uuid0000000000000' where ID = '${ID}'
For all result rows containing items with I_CLASS = 2 (properties), execute the following queries, replacing ${ID} with the corresponding value:
delete from JCR_SREF where PROPERTY_ID = '${ID}'
delete from JCR_SVALUE where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where PARENT_ID = '${ID}' or ID='${ID}'
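The branching of this fix (nodes are re-parented to the root, properties are deleted) can be sketched as a small statement generator. The class OrphanItemFix is hypothetical; it only builds the SQL text for review and does not touch a database. The property branch uses the table names exactly as printed above, so for a multi DB configuration the table names must be adapted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of eXo JCR): builds the fix statements for one result row.
final class OrphanItemFix {
    public static final String ROOT_UUID = "00exo0jcr0root0uuid0000000000000";

    public static List<String> statementsFor(String id, int iClass, String containerName, boolean singleDb) {
        String safeId = rejectQuote(id);
        List<String> sql = new ArrayList<String>();
        if (iClass == 1) {
            // Node: assign the root as parent so the node can be deleted later if needed.
            if (singleDb) {
                sql.add("update JCR_SITEM set PARENT_ID='" + rejectQuote(containerName) + ROOT_UUID
                        + "' where ID = '" + safeId + "'");
            } else {
                sql.add("update JCR_MITEM set PARENT_ID='" + ROOT_UUID + "' where ID = '" + safeId + "'");
            }
        } else if (iClass == 2) {
            // Property: remove references, values and the item itself.
            sql.add("delete from JCR_SREF where PROPERTY_ID = '" + safeId + "'");
            sql.add("delete from JCR_SVALUE where PROPERTY_ID = '" + safeId + "'");
            sql.add("delete from JCR_SITEM where PARENT_ID = '" + safeId + "' or ID='" + safeId + "'");
        } else {
            throw new IllegalArgumentException("Unexpected I_CLASS: " + iClass);
        }
        return sql;
    }

    // Values are pasted into SQL text, so refuse anything containing a quote.
    private static String rejectQuote(String value) {
        if (value.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("Value must not contain a quote: " + value);
        }
        return value;
    }
}
```

Printing the statements and reviewing them before execution keeps the manual, row-by-row character of the procedure.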
A node has a single valued property with no declaration in the VALUE table.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM P where P.I_CLASS=2 and P.P_MULTIVALUED=0 and NOT EXISTS (select * from JCR_SVALUE V where V.PROPERTY_ID=P.ID)
For PostgreSQL, P_MULTIVALUED=0 should be replaced by P_MULTIVALUED='f'.
Fix description: Simply remove corrupted properties.
To fix this issue, execute the following queries for every row, replacing ${ID} with the corresponding value:
delete from JCR_SREF where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where ID = '${ID}'
Node has no primary type property.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM N where N.I_CLASS=1 and NOT EXISTS (select * from JCR_SITEM P where P.I_CLASS=2 and P.PARENT_ID=N.ID and P.NAME='[http://www.jcp.org/jcr/1.0]primaryType')
Fix description: Remove node, all its children, properties, values and reference records.
To fix this issue, do as follows:
Recursively traverse to the bottom of the tree, repeating the following query for each returned node until it returns no rows:
select * from JCR_SITEM where PARENT_ID='${ID}' and I_CLASS=1
You will receive a tree structure containing a node, its children and properties.
Execute the following steps with tree structure elements in reverse order (from leaves to head).
Execute the following query for the tree element's ${ID}:
select * from JCR_SITEM where PARENT_ID='${ID}'
Execute the following queries for each ${ID} returned by the query mentioned above:
delete from JCR_SREF where PROPERTY_ID = '${ID}'
delete from JCR_SVALUE where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where PARENT_ID = '${ID}' or ID='${ID}'
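The traversal described above amounts to a post-order walk: children are processed before their parent. The following sketch computes that order in memory. The class SubtreeDeletionOrder is hypothetical, and the childNodes map stands in for the results of the query select * from JCR_SITEM where PARENT_ID='${ID}' and I_CLASS=1. A guard against revisiting an ID is included because corrupted data may contain cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of eXo JCR): orders node IDs from leaves to head.
final class SubtreeDeletionOrder {

    // childNodes maps a node ID to the IDs of its child nodes (I_CLASS=1 rows).
    public static List<String> leavesFirst(String rootId, Map<String, List<String>> childNodes) {
        List<String> order = new ArrayList<String>();
        visit(rootId, childNodes, new HashSet<String>(), order);
        return order;
    }

    private static void visit(String id, Map<String, List<String>> childNodes,
                              Set<String> seen, List<String> order) {
        if (!seen.add(id)) {
            return; // already visited: protects against cycles in corrupted data
        }
        for (String child : childNodes.getOrDefault(id, Collections.<String>emptyList())) {
            visit(child, childNodes, seen, order);
        }
        order.add(id); // a node is listed only after all of its descendants
    }
}
```

The resulting list gives the order in which the delete queries can be applied to the tree elements.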
Value records have no related property record.
To detect this issue, you need to execute the following query:
select * from JCR_SVALUE V where NOT EXISTS(select * from JCR_SITEM P where V.PROPERTY_ID = P.ID and P.I_CLASS=2)
Fix description: Remove these unnecessary records from the JCR_SVALUE table.
To fix this issue, execute the following query for every row, replacing ${ID} with the corresponding value:
delete from JCR_SVALUE where ID = '${ID}'
Corrupted VALUE records. Both STORAGE_DESC and DATA fields contain not null value.
To detect this issue, you need to execute the following query:
select * from JCR_SVALUE where (STORAGE_DESC is not null and DATA is not null)
Fix description: Set the STORAGE_DESC field to null, assuming that the value stored in the database is valid.
To fix this issue, execute the following query for every row, replacing ${ID} with the corresponding value:
update JCR_SVALUE set STORAGE_DESC = null where ID = '${ID}'
For Sybase DB, "DATA is not null" must be replaced by "not DATA like null".
Item is its own parent.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM I where I.ID = I.PARENT_ID and I.NAME <> '__root_parent'
Fix description: Assign root as parent node to be able to delete this node later if it is not needed anymore.
To fix this issue, do as follows:
For all result rows containing items with I_CLASS = 1 (nodes), execute the following query, replacing ${ID} and ${CONTAINER_NAME} with the corresponding values:
Single DB:
update JCR_SITEM set PARENT_ID='${CONTAINER_NAME}00exo0jcr0root0uuid0000000000000' where ID = '${ID}'
Multi DB:
update JCR_MITEM set PARENT_ID='00exo0jcr0root0uuid0000000000000' where ID = '${ID}'
For all result rows containing items with I_CLASS = 2 (properties), execute the following queries, replacing ${ID} with the corresponding value:
delete from JCR_SREF where PROPERTY_ID = '${ID}'
delete from JCR_SVALUE where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where PARENT_ID = '${ID}' or ID='${ID}'
Several versions of same item.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM I where EXISTS (select * from JCR_SITEM J WHERE I.CONTAINER_NAME = J.CONTAINER_NAME and I.PARENT_ID = J.PARENT_ID AND I.NAME = J.NAME and I.I_INDEX = J.I_INDEX and I.I_CLASS = J.I_CLASS and I.VERSION != J.VERSION)
Fix description: Keep the newest version and remove the others.
To fix this issue, do as follows:
Grouping:
select max(VERSION) as MAX_VERSION, PARENT_ID, NAME, CONTAINER_NAME, I_CLASS, I_INDEX from JCR_SITEM WHERE I_CLASS=2 GROUP BY PARENT_ID, CONTAINER_NAME, NAME, I_CLASS, I_INDEX HAVING count(VERSION) > 1
Execute the following query, replacing ${PARENT_ID}, ${CONTAINER_NAME}, ${NAME}, ${I_CLASS}, ${I_INDEX} and ${MAX_VERSION} with the corresponding values contained in the results of the query mentioned above:
Single DB:
select * from JCR_SITEM where CONTAINER_NAME='${CONTAINER_NAME}' and PARENT_ID='${PARENT_ID}' and NAME='${NAME}' and I_CLASS='${I_CLASS}' and I_INDEX='${I_INDEX}' and VERSION < ${MAX_VERSION}
Multi DB:
select * from JCR_MITEM where PARENT_ID='${PARENT_ID}' and NAME='${NAME}' and I_CLASS='${I_CLASS}' and I_INDEX='${I_INDEX}' and VERSION < ${MAX_VERSION}
Execute the following queries, replacing ${ID} with the corresponding values of the newly obtained results:
delete from JCR_SREF where PROPERTY_ID = '${ID}'
delete from JCR_SVALUE where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where ID='${ID}'
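The "keep the newest version" rule can be illustrated in memory: rows are grouped by the same columns as in the grouping query, and every row whose version is lower than the maximum of its group is marked for deletion. The class DuplicateVersionCleanup and its Item type are hypothetical and only model the columns used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of eXo JCR): finds the IDs of outdated item versions.
final class DuplicateVersionCleanup {

    // One item row, reduced to the columns used by the detection and grouping queries.
    public static final class Item {
        public final String id, containerName, parentId, name;
        public final int iClass, index, version;

        public Item(String id, String containerName, String parentId, String name,
                    int iClass, int index, int version) {
            this.id = id;
            this.containerName = containerName;
            this.parentId = parentId;
            this.name = name;
            this.iClass = iClass;
            this.index = index;
            this.version = version;
        }

        // Same columns as the GROUP BY clause of the grouping query.
        List<Object> groupKey() {
            return Arrays.<Object>asList(parentId, containerName, name, iClass, index);
        }
    }

    public static List<String> staleIds(List<Item> items) {
        // First pass: highest version per group (MAX_VERSION).
        Map<List<Object>, Integer> newest = new HashMap<List<Object>, Integer>();
        for (Item item : items) {
            newest.merge(item.groupKey(), item.version, Integer::max);
        }
        // Second pass: every row with VERSION < MAX_VERSION is stale.
        List<String> stale = new ArrayList<String>();
        for (Item item : items) {
            if (item.version < newest.get(item.groupKey())) {
                stale.add(item.id);
            }
        }
        return stale;
    }
}
```

The returned IDs correspond to the ${ID} values used in the delete queries of the last step.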
Reference properties without reference records.
To detect this issue, you need to execute the following query:
select * from JCR_SITEM P, JCR_SVALUE V where P.ID = V.PROPERTY_ID and P.P_TYPE=9 and NOT EXISTS (select * from JCR_SREF R where P.ID=R.PROPERTY_ID)
Fix description: Remove broken reference properties.
To fix this issue, execute the following queries, replacing ${ID} with the corresponding value:
delete from JCR_SVALUE where PROPERTY_ID = '${ID}'
delete from JCR_SITEM where ID = '${ID}'
A node is considered locked according to the lock manager data but not according to the JCR data, or the opposite.
To detect this issue, you need to:
First, get the IDs of all locked nodes in the repository that are recorded in the JCR_xITEM table by executing the following query:
select distinct PARENT_ID from JCR_SITEM where I_CLASS=2 and (NAME='[http://www.jcp.org/jcr/1.0]lockOwner' or NAME='[http://www.jcp.org/jcr/1.0]lockIsDeep')
Then compare them to the node IDs from the LockManager's table.
JBC:
When comparing results, be aware that for single DB configurations you need to cut off the ID prefix representing the workspace name from the results obtained from the JCR_xITEM table.
Though a single lock table is usually used for the whole repository, it is possible to configure separate DB lock tables for each workspace. In this case, you need to execute the queries for each table to obtain the information for the whole repository.
Non shareable:
Select fqn from ${LOCK_TABLE} where parent='/$LOCKS'
Shareable:
Replace ${REPOSITORY_NAME} with the corresponding value:
select fqn from ${LOCK_TABLE} where parent like '/${REPOSITORY_NAME}%/$LOCKS/'
ISPN:
ISPN lock tables are defined for each workspace separately, so you must execute the queries for all lock tables to obtain the information for the whole repository.
To get the full set of locked node IDs in the repository, execute the following query for each workspace:
select id from ${LOCK_TABLE}
Fix description: Remove inconsistent lock entries and properties. Remove entries in LOCK_TABLE that have no corresponding properties in JCR_xITEM table and remove the JCR_xITEM properties that have no corresponding entries in LOCK_TABLE.
To fix this issue, do as follows:
First, remove the property values, replacing ${ID} with the corresponding node ID:
Delete from JCR_SVALUE where PROPERTY_ID in (select ID from JCR_SITEM where PARENT_ID='${ID}' and (NAME = '[http://www.jcp.org/jcr/1.0]lockIsDeep' or NAME = '[http://www.jcp.org/jcr/1.0]lockOwner'))
Then remove the property items themselves, replacing ${ID} with the corresponding node ID:
delete from JCR_SITEM where PARENT_ID='${ID}' and (NAME = '[http://www.jcp.org/jcr/1.0]lockIsDeep' or NAME = '[http://www.jcp.org/jcr/1.0]lockOwner')
Finally, remove the inconsistent entries from the lock table, replacing ${ID} and ${FQN} with the corresponding node ID and FQN:
JBC:
delete from ${LOCK_TABLE} where fqn = '${FQN}'
ISPN:
Execute the following query for each workspace:
delete from ${LOCK_TABLE} where id = '${ID}'
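The comparison step boils down to two set differences once both ID lists are available. The sketch below assumes the node IDs have already been extracted from the query results; the class LockConsistencyCheck is hypothetical, and stripping the workspace name as a plain string prefix is an assumption about how single DB IDs are built.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of eXo JCR): compares locked node IDs from both sources.
final class LockConsistencyCheck {

    // Single DB only: cut off the workspace-name prefix of an ID taken from the JCR_xITEM table.
    public static String stripWorkspacePrefix(String itemId, String workspaceName) {
        return itemId.startsWith(workspaceName) ? itemId.substring(workspaceName.length()) : itemId;
    }

    // IDs present in 'left' but missing from 'right'.
    public static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> difference = new TreeSet<String>(left);
        difference.removeAll(right);
        return difference;
    }
}
```

With itemIds taken from JCR_xITEM and lockIds taken from the lock table, onlyIn(itemIds, lockIds) lists the nodes whose lock properties should be removed, and onlyIn(lockIds, itemIds) lists the lock table entries to delete.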
A property's value is stored in the file system, but its content is missing.
This cannot be checked via simple SQL queries.
eXo JCR supports the Java Transaction API out of the box. If a TransactionService has been defined (refer to the section about the TransactionService for more details), then at session save eXo JCR checks whether a global transaction is active and, if so, automatically enlists the JCR session in the global transaction. If you intend to use a managed data source, you will have to configure the DataSourceProvider service (for more details, refer to the corresponding section).
eXo JCR supports J2EE Connector Architecture 1.5. If you would like to delegate the JCR Session lifecycle to your application server, you can use the JCA Resource Adapter for eXo JCR, provided that your application server supports JCA 1.5. This adapter only supports XA transactions; in other words, you cannot use it for local transactions. Since JCR Sessions have not been designed to be shareable, session pooling is not covered by the adapter.
The equivalent of javax.resource.cci.ConnectionFactory in JCA terminology is org.exoplatform.connectors.jcr.adapter.SessionFactory in the context of eXo JCR. The resource that you get from a JNDI lookup is of type SessionFactory and provides the following methods:
/**
 * Get a JCR session corresponding to the repository
 * defined in the configuration and the default workspace.
 * @return a JCR session corresponding to the criteria
 * @throws RepositoryException if the session could not be created
 */
Session getSession() throws RepositoryException;

/**
 * Get a JCR session corresponding to the repository
 * defined in the configuration and the default workspace, using
 * the given user name and password.
 * @param userName the user name to use for the authentication
 * @param password the password to use for the authentication
 * @return a JCR session corresponding to the criteria
 * @throws RepositoryException if the session could not be created
 */
Session getSession(String userName, String password) throws RepositoryException;

/**
 * Get a JCR session corresponding to the repository
 * defined in the configuration and the given workspace.
 * @param workspace the name of the expected workspace
 * @return a JCR session corresponding to the criteria
 * @throws RepositoryException if the session could not be created
 */
Session getSession(String workspace) throws RepositoryException;

/**
 * Get a JCR session corresponding to the repository
 * defined in the configuration and the given workspace, using
 * the given user name and password.
 * @param workspace the name of the expected workspace
 * @param userName the user name to use for the authentication
 * @param password the password to use for the authentication
 * @return a JCR session corresponding to the criteria
 * @throws RepositoryException if the session could not be created
 */
Session getSession(String workspace, String userName, String password) throws RepositoryException;
Table 1.50. Configuration Properties
PortalContainer | In case of the portal mode, if no portal container can be found in the context of the request, the adapter will use the value of this parameter to get the name of the expected portal container to create the JCR sessions. In case of a standalone mode, this parameter is not used. This parameter is optional, by default the default portal container will be used. |
Repository | The repository name used to create JCR sessions. This parameter is optional, by default the current repository will be used. |
In standalone mode, where the JCR and its dependencies are not provided, you will need to deploy the whole ear file corresponding to the artifactId exo.jcr.ear and groupId org.exoplatform.jcr; the rar file is embedded in the ear file. If the JCR and its dependencies are provided, as is the case when you use it with GateIn for example, you will need to deploy only the rar file corresponding to the artifactId exo.jcr.connectors.jca and groupId org.exoplatform.jcr.
To deploy the JCA module in standalone mode:
Run "mvn clean install -DskipTests=true" from the root of the jcr project.
Get/download the JBoss AS 7 bundle.
Deploy the whole ear file corresponding to the artifactId exo.jcr.ear and groupId org.exoplatform.jcr
Configure the resource adapter in JBOSS_HOME/standalone/configuration/standalone.xml. You should replace:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.0"/>
by:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.0">
   <resource-adapters>
      <resource-adapter>
         <archive>exo.jcr.ear.ear#exo-jcr.rar</archive>
         <transaction-support>XATransaction</transaction-support>
         <connection-definitions>
            <connection-definition class-name="org.exoplatform.connectors.jcr.impl.adapter.ManagedSessionFactory" jndi-name="java:/jcr/Repository">
               <config-property name="PortalContainer">portal</config-property>
               <config-property name="Repository">repository</config-property>
            </connection-definition>
         </connection-definitions>
      </resource-adapter>
   </resource-adapters>
</subsystem>
To deploy the JCA module on Platform:
Get/download the JBoss bundle of platform 4 or higher
Go to folder "exo.jcr.connectors.jca" and run "mvn clean install -Pplatform" command.
Deploy exo.jcr.connectors.jca/target/exo.jcr.connectors.jca-1.15.x-GA.rar in PLATFORM_HOME/standalone/deployments/ of platform 4 bundle, and rename it to exo-jcr.rar
Configure the resource adapter in PLATFORM_HOME/standalone/configuration/standalone-exo.xml. You should replace:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.1"/>
by:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.1">
   <resource-adapters>
      <resource-adapter>
         <archive>exo-jcr.rar</archive>
         <transaction-support>XATransaction</transaction-support>
         <connection-definitions>
            <connection-definition class-name="org.exoplatform.connectors.jcr.impl.adapter.ManagedSessionFactory" jndi-name="java:/jcr/Repository">
               <config-property name="PortalContainer">portal</config-property>
               <config-property name="Repository">repository</config-property>
            </connection-definition>
         </connection-definitions>
      </resource-adapter>
   </resource-adapters>
</subsystem>
To deploy the JCA module on GateIn/JPP:
Get/download the JBoss bundle of GateIn 3.5/JPP6 or higher
Go to folder "exo.jcr.connectors.jca" and run "mvn clean install -Pgatein" command.
Deploy exo.jcr.connectors.jca/target/exo.jcr.connectors.jca-1.15.x-GA.rar in GATEIN_HOME/standalone/deployments/ and rename it to exo-jcr.rar
Configure the resource adapter in GATEIN_HOME/standalone/configuration/standalone.xml. You should replace:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.0"/>
by:
<subsystem xmlns="urn:jboss:domain:resource-adapters:1.0">
   <resource-adapters>
      <resource-adapter>
         <archive>exo-jcr.rar</archive>
         <transaction-support>XATransaction</transaction-support>
         <connection-definitions>
            <connection-definition class-name="org.exoplatform.connectors.jcr.impl.adapter.ManagedSessionFactory" jndi-name="java:/jcr/Repository">
               <config-property name="PortalContainer">portal</config-property>
               <config-property name="Repository">repository</config-property>
            </connection-definition>
         </connection-definitions>
      </resource-adapter>
   </resource-adapters>
</subsystem>
eXo JCR is a complete implementation of the standard JSR 170: Content Repository for Java™ Technology API, including Level 1, Level 2 and Additional Features specified in the JCR Specification.
The JCR specification (JSR 170) does not have many requirements about Access Control. It only requires the implementation of the Session.checkPermission(String absPath, String actions) method. This method checks if the current session has permissions to perform some actions on absPath:
absPath : The string representation of a JCR absolute path.
actions : eXo JCR interprets this string as a comma-separated list of individual action names, such as the 4 types defined in JSR 170:
add_node : Permission to add a node.
set_property : Permission to set a property.
remove : Permission to remove an item (node or property).
read : Permission to retrieve a node or read a property value.
For example :
session.checkPermission("/Groups/organization", "add_node,set_property") will check if the session is allowed to add a child node to "organization" and to modify its properties. If one of the two permissions is denied, an AccessDeniedException is thrown.
session.checkPermission("/Groups/organization/exo:name", "read,set_property") will check if the session is allowed to read and change the "exo:name" property of the "organization" node.
session.checkPermission("/Groups/organization/exo:name", "remove") will check if the session is allowed to remove the "exo:name" property or node.
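To make the format of the actions argument concrete, the sketch below splits such a string into individual action names and rejects anything outside the four JSR 170 actions listed above. The class ActionList is hypothetical, and trimming whitespace around the names is an assumption; it does not describe how eXo JCR parses the string internally.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of eXo JCR): splits a comma-separated actions string.
final class ActionList {
    // The four action names defined in JSR 170.
    public static final List<String> KNOWN_ACTIONS =
            Arrays.asList("add_node", "set_property", "remove", "read");

    public static Set<String> parse(String actions) {
        Set<String> result = new LinkedHashSet<String>();
        for (String part : actions.split(",")) {
            String action = part.trim(); // tolerating spaces is an assumption
            if (!KNOWN_ACTIONS.contains(action)) {
                throw new IllegalArgumentException("Unknown action: " + action);
            }
            result.add(action);
        }
        return result;
    }
}
```

For example, the string "add_node,set_property" yields the two actions add_node and set_property, each of which must be granted for the check to pass.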
The JSR 170 specification does not define how permissions are managed or checked, so eXo JCR has implemented its own proprietary extension to manage and check permissions on nodes. In essence, this extension uses an Access Control List (ACL) policy model applied to the eXo Organization model (see eXo Platform Organization Service).
At the heart of eXo Access Control is the notion of identity. Access to JCR is made through sessions acquired against a repository. Sessions can be authenticated through the standard (but optional) repository login mechanism. Each session is associated with a principal. The principal is an authenticated user or group that may act on JCR data. The identity is a string identifying this group or user.
There are 3 reserved identities that have special meanings in eXo JCR:
any : represents any authenticated session.
anonim : represents a principal for non authenticated sessions. (No error, it's really "anonim").
system : represents a principal for system sessions, typically used for administrative purposes. System session has full access (all permissions) to all nodes; therefore be careful when working with system sessions.
An access control list (ACL) is a list of permissions attached to an object. An ACL specifies which users, groups or system processes are granted access to JCR nodes, as well as what operations are allowed to be performed on given objects.
eXo JCR Access Control is based on two facets applied to nodes :
Privilegeable : The user or group (also called principal) needs the appropriate privileges to access this node. The privileges are defined as (positive) permissions that are granted to users or groups.
Ownable : The node has an owner. The owner always has full access (all permissions) to the node, independent of the privilegeable facet.
A privilegeable node defines the permissions required for actions on this node. For this purpose, it contains an ACL.
At JCR level, this is implemented by an exo:privilegeable mixin.
<nodeType name="exo:privilegeable" isMixin="true" hasOrderableChildNodes="false" primaryItemName="">
   <propertyDefinitions>
      <propertyDefinition name="exo:permissions" requiredType="Permission" autoCreated="true" mandatory="true" onParentVersion="COPY" protected="true" multiple="true">
         <valueConstraints/>
      </propertyDefinition>
   </propertyDefinitions>
</nodeType>
A privilegeable node can have multiple exo:permissions values. The type of these values is the eXo JCR specific Permission type. Each Permission value is one ACL entry that grants one permission to one identity.
The possible permissions correspond to the JCR standard actions:
read: The node or its properties can be read.
remove: The node or its properties can be removed.
add_node : Child nodes can be added to this node.
set_property : The node's properties can be modified, added or removed.
An ownable node defines an owner identity. The owner always has full privileges. These privileges are independent of the permissions set by exo:permissions. At JCR level, the ownership is implemented by an exo:owneable mixin. This mixin holds an owner property.
<nodeType name="exo:owneable" isMixin="true" hasOrderableChildNodes="false" primaryItemName="">
   <propertyDefinitions>
      <propertyDefinition name="exo:owner" requiredType="String" autoCreated="true" mandatory="true" onParentVersion="COPY" protected="true" multiple="false">
         <valueConstraints/>
      </propertyDefinition>
   </propertyDefinitions>
</nodeType>
The exo:owner property value contains exactly one identity string value. There might be a long list of different permissions for different identities (users or groups). All permissions are always positive permissions; denials are not possible. When checking a permission for an action, it is therefore sufficient that the principal of a session belongs to one of the groups to which the concerned action is granted.
To grant or deny access to a node, eXo JCR applies a privilege resolving logic at node access time.
If a node is privilegeable, the node's ACL is used exclusively. If the ACL does not match the principal's identity, the principal has no access (except the owner of the node).
Non-privilegeable nodes inherit permissions from their parent node. If the parent node is not privilegeable either, the resolving logic looks further up the node hierarchy and stops with the first privilegeable ancestor of the current node. All nodes potentially inherit from the workspace root node.
The owner of a node is inherited in accordance with the same logic: If the node has no owner, the owner information of the closest owneable ancestor is inherited.
This inheritance is implemented by browsing up the node's hierarchy. At access time, if the node does not have owner or permissions, the system looks up into the node's ancestor hierarchy for the first ACL.
When no matching ACL is found in the ancestor hierarchy, the system may end up looking at the root node's ACL. Since ACLs are optional, even for the root node, the following rule is ultimately applied to resolve privileges if the root node has no ACL:
any identity (any authenticated session) is granted all permissions
Access Control nodetypes are not extendible: The access control mechanism works for exo:owneable and exo:privilegeable nodetypes only, not for their subtypes! So you cannot extend those nodetypes.
Autocreation: By default, newly created nodes are neither exo:privilegeable nor exo:owneable, but it is possible to configure the repository to auto-create exo:privilegeable and/or exo:owneable thanks to eXo's JCR interceptors extension (see JCR Extensions).
OR-based Privilege Inheritance: Note that eXo's Access Control implementation supports a privilege inheritance that follows an either/or strategy and has only an ALLOW privilege mechanism (there is no DENY feature). This means that a session is allowed to perform some operations on some nodes if its identity has an appropriate permission assigned to this node. The permissions of the node's ancestors are used only if there is no exo:permissions property assigned to the node itself.
In the following example, you see a node named "Politics" which contains two nodes named "Cats" and "Dogs".
These examples are exported from eXo DMS using the "document view" representation of JCR. The values of a multi-valued property are separated by whitespace, and each whitespace character inside a value is escaped as _x0020_.
<Politics jcr:primaryType="nt:unstructured"
          jcr:mixinTypes="exo:owneable exo:datetime exo:privilegeable"
          exo:dateCreated="2009-10-08T18:02:43.687+02:00"
          exo:dateModified="2009-10-08T18:02:43.703+02:00"
          exo:owner="root"
          exo:permissions="any_x0020_read *:/platform/administrators_x0020_read *:/platform/administrators_x0020_add_node *:/platform/administrators_x0020_set_property *:/platform/administrators_x0020_remove">
   <Cats jcr:primaryType="exo:article"
         jcr:mixinTypes="exo:owneable"
         exo:owner="marry"
         exo:summary="The_x0020_secret_x0020_power_x0020_of_x0020_cats_x0020_influences_x0020_the_x0020_leaders_x0020_of_x0020_the_x0020_world."
         exo:text=""
         exo:title="Cats_x0020_rule_x0020_the_x0020_world" />
   <Dogs jcr:primaryType="exo:article"
         jcr:mixinTypes="exo:privilegeable"
         exo:permissions="manager:/organization_x0020_read manager:/organization_x0020_set_property"
         exo:summary="Dogs"
         exo:text=""
         exo:title="Dogs_x0020_are_x0020_friends" />
</Politics>
The "Politics" node is exo:owneable and exo:privilegeable. It has both an exo:owner property and an exo:permissions property. There is an exo:owner="root" property, so the user root is the owner. In the exo:permissions value, you can see the ACL, which is a list of access controls. In this example, the group *:/platform/administrators has all rights on this node (remember that the "*" means any kind of membership). The any entry means that any user also has the read permission.
As you see in the jcr:mixinTypes property, the "Cats" node is exo:owneable and there is an exo:owner="marry" property, so the user marry is the owner. The "Cats" node is not exo:privilegeable and has no exo:permissions. In this case, the inheritance mechanism applies: the "Cats" node has the same permissions as the "Politics" node.
Finally, the "Dogs" node is also a child node of "Politics". This node is not exo:owneable and inherits the owner of the "Politics" node (which is the user root). On the other hand, "Dogs" is exo:privilegeable and therefore has its own exo:permissions. That means only the users having a "manager" role in the group "/organization" and the user "root" have the rights to access this node.
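The resolution rules applied to this example can be modelled with a few lines of Java. This is a deliberately simplified in-memory model, not the eXo JCR implementation: the class AclResolutionModel, its Node type and the membership matching are hypothetical, and the reserved identities anonim and system are left out.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the privilege resolving logic (not eXo JCR code).
final class AclResolutionModel {

    public static final class Node {
        final Node parent;       // null for the root of the model
        final String owner;      // null when the node is not exo:owneable
        final List<String> acl;  // null when the node is not exo:privilegeable; entries are "identity permission"

        public Node(Node parent, String owner, List<String> acl) {
            this.parent = parent;
            this.owner = owner;
            this.acl = acl;
        }
    }

    // The owner is taken from the closest ancestor (or the node itself) that has one.
    static String effectiveOwner(Node node) {
        for (Node n = node; n != null; n = n.parent) {
            if (n.owner != null) return n.owner;
        }
        return null;
    }

    // The ACL is taken from the closest privilegeable ancestor (or the node itself).
    static List<String> effectiveAcl(Node node) {
        for (Node n = node; n != null; n = n.parent) {
            if (n.acl != null) return n.acl;
        }
        return null;
    }

    public static boolean isAllowed(Node node, String user, Set<String> memberships, String permission) {
        if (user.equals(effectiveOwner(node))) return true; // the owner always has full access
        List<String> acl = effectiveAcl(node);
        if (acl == null) return true; // no ACL up to the root: any identity is granted all permissions
        for (String entry : acl) {
            int space = entry.lastIndexOf(' ');
            String identity = entry.substring(0, space);
            String granted = entry.substring(space + 1);
            if (granted.equals(permission) && matches(identity, user, memberships)) return true;
        }
        return false; // only positive permissions exist, so no match means no access
    }

    // memberships look like "manager:/organization"; "*:/group" matches any membership type.
    static boolean matches(String identity, String user, Set<String> memberships) {
        if (identity.equals("any") || identity.equals(user)) return true;
        for (String membership : memberships) {
            if (identity.equals(membership)) return true;
            if (identity.startsWith("*:") && membership.endsWith(identity.substring(1))) return true;
        }
        return false;
    }
}
```

In this model, a user without memberships can read "Cats" through the inherited any read entry but cannot read "Dogs", while root may act on "Dogs" as the inherited owner.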
Here is an example showing the accessibility of two nodes (to show inheritance) for two sample users named manager and user:
The "+" symbol means that there is a child node "exo:owneable".
This section describes how permissions are validated for different JCR actions.
read node: Check the read permission on a target node.
For example: Read /node1/subnode node, JCR will check the "read" permission exactly on "subnode".
read property : Check the read permission on a parent node.
For example: Read /node1/myprop - JCR will check the "read" permission on "node1".
add node: Check add_node on a parent node.
For example: Add /node1/subnode node, JCR will check the "add_node" permission on "node1".
set property: Check the set_property permission on a parent node.
For example: Try to set /node1/myprop property, JCR will check the "set_property" permission on "node1".
remove node: Check the remove permission on a target node.
For example: Try to remove /node1/subnode node, JCR will check the "remove" permission on "subnode".
remove property: Check the remove permission on a parent node.
For example: Try to remove /node1/myprop property, JCR will check the "remove" permission on "node1".
add mixin: Check the "add_node" and "set_property" permission on a target node.
For example: Try to add mixin to /node1/subnode node, JCR will check the "add_node" and "set_property" permission on "subnode".
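The rules above can be summarized as a small lookup. This is only an illustrative model of the current behavior described in this section; PermissionRules and its Action enum are hypothetical names and do not exist in eXo JCR.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model (not an eXo JCR class) of which node and permissions are checked per action. */
final class PermissionRules {

    enum Action { READ_NODE, READ_PROPERTY, ADD_NODE, SET_PROPERTY, REMOVE_NODE, REMOVE_PROPERTY, ADD_MIXIN }

    /** Returns the path of the node whose ACL is consulted for the action on the given item path. */
    static String checkedNode(Action action, String itemPath) {
        switch (action) {
            case READ_NODE:
            case REMOVE_NODE:
            case ADD_MIXIN:
                return itemPath;           // checked on the target node itself
            default:
                return parentOf(itemPath); // checked on the parent node
        }
    }

    /** Returns the permissions that must be granted on the checked node. */
    static List<String> requiredPermissions(Action action) {
        switch (action) {
            case READ_NODE:
            case READ_PROPERTY:
                return Arrays.asList("read");
            case ADD_NODE:
                return Arrays.asList("add_node");
            case SET_PROPERTY:
                return Arrays.asList("set_property");
            case REMOVE_NODE:
            case REMOVE_PROPERTY:
                return Arrays.asList("remove");
            case ADD_MIXIN:
                return Arrays.asList("add_node", "set_property");
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }

    private static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }
}
```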
The behavior of the permission "remove" and "add mixin" validation has changed since JCR 1.12.6-GA. The old behavior is:
remove node: Check the remove permission on a parent node.
For example: Try to remove /node1/subnode node, JCR will check the "remove" permission on "node1".
add mixin: Check the "add_node" and "set_property" permission on a parent node.
For example: Try to add mixin to /node1/subnode node, JCR will check the "add_node" and "set_property" permission on "node1".
eXo JCR's ExtendedNode interface, which extends the javax.jcr.Node interface, provides additional methods for Access Control management.
Table 1.51. Additional methods
Method signature | Description |
---|---|
void setPermissions(Map<String, String[]> permissions) | Assigns a set of Permissions to a node |
void setPermission(String identity, String[] permission) | Assigns the given Permissions to an Identity on a node |
void removePermission(String identity) | Removes an Identity's Permission |
void removePermission(String identity, String permission) | Removes the specified permission for a particular identity |
void clearACL() | Clears the current ACL so it becomes default |
AccessControlList getACL() | Returns the current ACL |
void checkPermission(String actions) | Checks Permission (AccessDeniedException will be thrown if denied) |
The "identity" parameter is a user or a group name. The permissions are the literal strings of the standard action permissions (add_node, set_property, remove, read).
An extended Access Control system consists of:
Specifically configured custom ExtendedAccessManager which is called by eXo JCR internals to check if user's Session (user) has some privilege to perform some operation or not.
The Action sets a thread local InvocationContext at runtime. The InvocationContext instance is then used by the ExtendedAccessManager in handling permissions of the current Session.
InvocationContext is a collection of properties which reflect the state of the current Session. At present, it contains the type of the current operation on the Session (event), the current Item (javax.jcr.Item) on which this operation is performed, and the current eXo Container.
This is an extension of eXo JCR Access Control features. Please read Access Control and JCR Extensions topics first.
SetAccessControlContextAction implements Action and may be called by SessionActionInterceptor as a reaction of some events - usually before writing methods and after reading (getNode(), getProperty() etc). This SetAccessControlContextAction calls the AccessManager.setContext(InvocationContext context) method which sets the ThreadLocal invocation context for the current call.
The Action's configuration may look like the following:
<value>
  <object type="org.exoplatform.services.jcr.impl.ext.action.ActionConfiguration">
    <field name="eventTypes"><string>addNode,read</string></field>
    <field name="workspace"><string>production</string></field>
    <field name="actionClassName"><string>org.exoplatform.services.jcr.ext.access.SetAccessControlContextAction</string></field>
  </object>
</value>
The InvocationContext exposes the current Item, the previous Item, the current ExoContainer and the current event type, as shown below:
public class InvocationContext extends HashMap implements Context {

  /**
   * @return The related eXo container.
   */
  public final ExoContainer getContainer()

  /**
   * @return The current item.
   */
  public final Item getCurrentItem()

  /**
   * @return The previous item before the change.
   */
  public final Item getPreviousItem()

  /**
   * @return The type of the event.
   */
  public final int getEventType()
}
By default, all Workspaces share an AccessManager instance (DefaultAccessManagerImpl), created by RepositoryService at startup, which supports the default access control policy described in the Access Control section. A custom Access Control policy can be applied to a certain Workspace by configuring the access-manager element inside the workspace as follows:
<workspace name="ws">
  ...
  <!-- after query-handler element -->
  <access-manager class="org.exoplatform.services.jcr.CustomAccessManagerImpl">
    <properties>
      <property name="someProperty" value="value"/>
      ...
    </properties>
  </access-manager>
  ...
</workspace>
When implementing AccessManager, the hasPermission() method has to be overridden so that it uses the current invocation context at its discretion. For instance, it may get the current node's metadata and decide whether the current User has the appropriate permissions. Use the Invocation Context's runtime properties to make a decision about the current Session's privileges (see the Example below).
Simplified Sequence diagram for the Session.getNode() method (as an Example):
The sample CustomAccessManagerImpl below extends the default access manager and uses some DecisionMakingService in the overridden hasPermission method. The service decides, based on the current item, the event type, the user and a parameter of the AccessManager, whether the current user has the permission. To make this access manager work, it must be configured in the JCR configuration as described in Custom Extended Access Manager, and SetAccessControlContextAction must be configured as described in Access Context Action.
public class CustomAccessManagerImpl extends AccessManager {

  private String property;
  private DecisionMakingService theService;

  public CustomAccessManagerImpl (RepositoryEntry config, WorkspaceEntry wsConfig,
      DecisionMakingService someService) throws RepositoryException, RepositoryConfigurationException {
    super(config, wsConfig);
    this.property = wsConfig.getAccessManager().getParameterValue("someParam");
    this.theService = someService;
  }

  @Override
  public boolean hasPermission(AccessControlList acl, String[] permission, Identity user) {
    // call the default permission check
    if (super.hasPermission(acl, permission, user)) {
      Item curItem = context().getCurrentItem();
      int eventType = context().getEventType();
      ExoContainer container = context().getContainer();
      // call some service's method
      return theService.makeDecision(curItem, eventType, user, property);
    } else {
      return false;
    }
  }
}
Link Producer Service is a simple service which generates an .lnk file that is compatible with the Microsoft link file format. It is an extension of the REST Framework library and is included in the WebDav service. On dispatching a GET request, the service generates the content of an .lnk file which points to a JCR resource via WebDav.
Link Producer has a simple configuration, as shown below:
<component>
  <key>org.exoplatform.services.jcr.webdav.lnkproducer.LnkProducer</key>
  <type>org.exoplatform.services.jcr.webdav.lnkproducer.LnkProducer</type>
</component>
When using JCR, a resource can be addressed by a WebDav reference (href) like
http://host:port/rest/jcr/repository/workspace/somenode/somefile.extention
To get a link file for this resource, call the link servlet with an href like
http://localhost:8080/rest/lnkproducer/openit.lnk?path=/repository/workspace/somenode/somefile.extention
Please note that in portal mode, the REST servlet is available using a reference (href) like
http://localhost:8080/portal/rest/...
The .lnk file can have any name, but for the best compatibility it should be the same as the name of the JCR resource.
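As a rough illustration of how such an href is put together, here is a minimal sketch. LnkHref is a hypothetical helper, not part of eXo JCR, and it assumes that "the same name" means the resource name followed by ".lnk". The path is appended unencoded, as in the examples above.

```java
/** Hypothetical helper (not an eXo JCR class) that builds a Link Producer href. */
final class LnkHref {

    /**
     * @param restBase     base of the REST servlet, e.g. "http://localhost:8080/rest"
     * @param resourcePath path of the JCR resource, starting with "/repository/workspace/"
     */
    static String build(String restBase, String resourcePath) {
        // Last path segment is the name of the JCR resource.
        String name = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
        // Assumption: the link file is named after the resource, with ".lnk" appended.
        return restBase + "/lnkproducer/" + name + ".lnk?path=" + resourcePath;
    }
}
```

In portal mode, the same helper would be called with a base such as http://localhost:8080/portal/rest.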
Here is a step-by-step sample use case of the link producer. First, type a valid reference to the resource, using the link producer, in your browser's address field:
Internet Explorer will show a dialog window asking whether to Open the file or to Save it. Click on the Open button.
On a Windows system, the .lnk file will be downloaded and the file it points to will be opened with the application registered for that file type. In case of a .doc file, Windows opens Microsoft Office Word, which will try to open the remote file (test0000.doc). It may be necessary to enter USERNAME and PASSWORD.
Next, you will be able to edit the file in Microsoft Word.
The Link Producer is necessary for opening/editing and then saving the remote files in Microsoft Office Word, without any further updates.
The Link Producer can also be referenced from an HTML page. If the page contains code like
<a href="http://localhost:8080/rest/lnkproducer/openit.lnk?path=/repository/workspace/somenode/somefile.extention">somefile.extention</a>
the file "somefile.extention" will open directly.
Processing binary large object (BLOB) is very important in eXo JCR, so this section focuses on explaining how to do it.
Binary large object (BLOB) properties can be stored in two ways in eXo JCR: in the database together with the items' information, or in an external storage on the host file system. These options can be configured per workspace in the repository configuration file (repository-configuration.xml in portal mode and exo-jcr-config.xml in standalone mode). The database storage can't be completely disabled.
The first option is optimal for most cases, where you do not use very large values and/or do not have too many BLOBs. The appropriate BLOB size and quantity in a repository depend on your database features and hardware.
The second option is to use an external value storage. The storage can be located on a built-in hard disk or on an attached storage, but in any case, it must be accessible as regular files. The external value storage is optional and can be enabled in the database configuration.
eXo JCR Repository service configuration basics are discussed in Configuration
Database and workspace persistence storage configuration is discussed in JDBC Data Container config
Configuration details for External Value Storages.
In both cases, a developer can set or update the binary Property via Node.setProperty(String, InputStream) or Property.setValue(InputStream), as described in the JSR-170 specification. There is also a setter that takes a ready Value object (obtained from ValueFactory.createValue(InputStream)).
An example of the specification API usage:
// Set the property value with the given stream content.
Property binProp = node.setProperty("BinData", myDataStream);
// Get the property value stream.
InputStream binStream = binProp.getStream();

// You may change the binary property value with a new stream; all data will be replaced
// with the content from the new stream.
Property updatedBinProp = node.setProperty("BinData", newDataStream);
// Or update an obtained property.
updatedBinProp.setValue(newDataStream);
// Or update using a Value object created by the session's ValueFactory.
updatedBinProp.setValue(node.getSession().getValueFactory().createValue(newDataStream));
// Get the updated property value stream.
InputStream newStream = updatedBinProp.getStream();
But if you need to update the property sequentially and with partial content, you have no choice but to edit the whole data stream outside the repository and write it back each time. With really large data, the application will stall and performance will drop considerably. JCR stream setters will also check constraints and perform common validation each time.
There is a feature of the eXo JCR extension that can be used for binary values partial writing without frequent session level calls. The main idea is to use a value object obtained from the property as the storage of the property content while writing/reading during runtime.
According to the spec JSR-170, Value interface provides the state of property that can't be changed (edited). The eXo JCR core provides ReadableBinaryValue and EditableBinaryValue interfaces which themselves extend JCR Value. The interfaces allow the user to partially read and change a value content.
A ReadableBinaryValue can be cast from any value, i.e. String, Binary, Date etc.
// get the property value of type PropertyType.STRING
ReadableBinaryValue extValue = (ReadableBinaryValue) node.getProperty("LargeText").getValue();
// read 200 bytes to a destStream from the position 1024 in the value content
OutputStream destStream = new FileOutputStream("MyTextFile.txt");
extValue.read(destStream, 200, 1024);
But EditableBinaryValue can be applied only to properties of type PropertyType.BINARY. In other cases, a cast to EditableBinaryValue will fail.
After the value has been edited, the EditableBinaryValue value can be applied to the property using the standard setters (Property.setValue(Value), Property.setValues(Value), Node.setProperty(String, Value) etc.). Only after the EditableBinaryValue has been set to the property, it can be obtained in this session by getters (Property.getValue(), Node.getProperty(String) etc.).
The user can obtain an EditableBinaryValue instance, fill it with data iteratively (or in any other way that suits the purpose), and set the value back to the property once the content is complete.
// get the property value for PropertyType.BINARY Property
EditableBinaryValue extValue = (EditableBinaryValue) node.getProperty("BinData").getValue();
// update length bytes from the stream starting from the position 1024 in existing Value data
extValue.update(dataInputStream, dataLength, 1024);
// apply the edited EditableBinaryValue to the Property
node.setProperty("BinData", extValue);
// save the Property to persistence
node.save();
A practical example of the iterative usage. In this example, the value is updated with data from the sequence of streams and after the update is done, the value will be applied to the property and be visible during the session.
// update length bytes from the stream starting from the particular
// position in the existing Value data
int dpos = 1024;
while (source.dataAvailable()) {
  extValue.update(source.getInputStream(), source.getLength(), dpos);
  dpos = dpos + source.getLength();
}
// apply the edited EditableBinaryValue to the Property
node.setProperty("BinData", extValue);
ReadableBinaryValue has one method to read Value.
Reads length bytes from the binary value, starting at the given position, into the stream.
long read(OutputStream stream, long length, long position) throws IOException, RepositoryException ;
EditableBinaryValue has two methods to edit value.
Updates this value's data with length bytes from the specified stream, starting at the given position. If the position is lower than 0, an IOException will be thrown. If the position is higher than the current Value length, the Value length will first be increased to the position, and length bytes will be added after it.
void update(InputStream stream, long length, long position) throws IOException;
Sets the length of the Value in bytes to the specified size. If the size is lower than 0, an IOException will be thrown. This operation can be used to extend or truncate the Value. This method is used internally by the update operation when the size has to be extended to the given position.
void setLength(long size) throws IOException;
An application can perform JCR binary operations more flexibly and will have less I/O and CPU usage using these methods.
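To make the update and setLength contract easier to follow, here is a minimal in-memory sketch. InMemoryEditableValue is a hypothetical stand-in written for illustration, not the eXo JCR implementation. It assumes that extended regions are filled with zero bytes, which the description above does not specify.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical in-memory stand-in (not the eXo JCR implementation) for the update/setLength contract. */
final class InMemoryEditableValue {

    private byte[] data = new byte[0];

    /** Sets the value length: truncates, or extends with zero bytes (an assumption of this sketch). */
    void setLength(long size) throws IOException {
        if (size < 0) {
            throw new IOException("Size must not be negative: " + size);
        }
        data = Arrays.copyOf(data, (int) size);
    }

    /** Copies up to length bytes from the stream into the value, starting at position. */
    void update(InputStream stream, long length, long position) throws IOException {
        if (position < 0) {
            throw new IOException("Position must not be negative: " + position);
        }
        if (position + length > data.length) {
            setLength(position + length); // grow first, as described for update
        }
        int off = (int) position;
        int remaining = (int) length;
        while (remaining > 0) {
            int n = stream.read(data, off, remaining);
            if (n < 0) {
                break; // stream ended early; the rest keeps its previous content
            }
            off += n;
            remaining -= n;
        }
    }

    long getLength() {
        return data.length;
    }

    byte[] toByteArray() {
        return data.clone();
    }
}
```

Writing 3 bytes at position 2 into an empty value yields a length of 5, with the first two bytes left as padding.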
* Java Community Process: JSR 170 and JSR 283
* Roy T. Fielding, JSR 170 Overview: Standardizing the Content Repository Interface (March 13, 2005)
The goals of this section are:
Coverage of the requirements of Workspace Data Container implementation
Description of container life cycle
Description of relations between container and high-level DataManagers
Workspace Data Container (container) serves as the persistent storage of a Repository Workspace. WorkspacePersistentDataManager (data manager) uses the container to perform CRUD operations on the persistent storage. The data manager accesses the storage via a storage connection obtained from the container (a WorkspaceDataContainer interface implementation). Each connection represents a transaction on the storage. Storage Connection (connection) should be an implementation of WorkspaceStorageConnection.
Container acts as a factory of new storage connections. Usually, this method is designed to be synchronized to avoid possible concurrency issues.
WorkspaceStorageConnection openConnection() throws RepositoryException;
Open read-only WorkspaceStorageConnection. Read-only connections can be potentially a bit faster in some cases.
WorkspaceStorageConnection openConnection(boolean readOnly) throws RepositoryException;
Read-only WorkspaceStorageConnection is an experimental feature and is not currently handled in JCR. Such connections have not shown a performance benefit, so JCR Core doesn't use them.
A storage connection might also be reused. This means that a physical resource (e.g. a JDBC Connection) allocated by one connection is reused in another. This feature is used in the data manager for saving ordinary and system changes on the system Workspace. Reuse is an optional feature: if it is not available, a new connection will be opened.
WorkspaceStorageConnection reuseConnection(WorkspaceStorageConnection original) throws RepositoryException;
When checking for the existence of Same-Name Siblings (SNS), JCR Core can either use a new connection or not. This is defined in the Workspace Data Container configuration and retrieved by using a special method.
boolean isCheckSNSNewConnection();
Container initialization is based only on a configuration. After the container has been created, it's not possible to change its parameters. The configuration consists of the implementation class, a set of properties and the Value Storages configuration.
Container provides an optional special mechanism for Value storing. External Value Storages can be configured only via the container configuration. A Value Storage works as a fully independent pluggable storage that obtains all required parameters from its own configuration. Several storages can be configured for one container. The configuration describes such parameters as the ValueStoragePlugin implementation class, a set of implementation-specific properties and filters. The filters declare the criteria for matching Values to the storage. Only matched Property Values will be stored, so in the common case, the storage might contain only a part of the Workspace content. Value Storages are very useful for storing BLOBs, e.g. on the File System instead of a database.
Container obtains Value Storages from the ValueStoragePluginProvider component. The provider acts as a factory of Value channels (ValueIOChannel). A channel provides all CRUD operations for the Value Storage and respects the transactional manner of work (as far as the implementation specifics of the storage allow).
Container is used for read and write operations by the data manager. Read operations (getters) use a connection once and close it in a finally block. Write operations are performed in the commit method as a sequence of create/update calls and a final commit (or rollback on error). Writes use one connection (or two, with another one for the system workspace) per commit call. One connection guarantees transaction support for write operations. Commit or rollback should free/clean all resources consumed by the container (connection).
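The read and write patterns described above can be sketched as follows. All types here (SketchConnection, SketchContainer, SketchDataManager, RecordingConnection) are hypothetical, heavily simplified stand-ins and not the eXo JCR interfaces. In particular, a real connection works with ItemData objects and checked exceptions, and this sketch assumes that commit and rollback also release the connection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a storage connection. */
interface SketchConnection {
    String read(String id);
    void add(String id, String value);
    void commit();
    void rollback();
    void close();
}

/** Hypothetical stand-in for the container: a factory of connections. */
interface SketchContainer {
    SketchConnection openConnection();
}

final class SketchDataManager {
    private final SketchContainer container;

    SketchDataManager(SketchContainer container) {
        this.container = container;
    }

    /** Read: the connection is used once and always closed in finally. */
    String getItem(String id) {
        SketchConnection con = container.openConnection();
        try {
            return con.read(id);
        } finally {
            con.close();
        }
    }

    /** Write: all changes go through one connection, followed by commit, or rollback on error. */
    void save(List<String[]> changes) {
        SketchConnection con = container.openConnection();
        try {
            for (String[] change : changes) {
                con.add(change[0], change[1]);
            }
            con.commit();
        } catch (RuntimeException e) {
            con.rollback();
            throw e;
        }
    }
}

/** Connection that only records the calls made on it, useful for tracing the flow. */
final class RecordingConnection implements SketchConnection {
    final List<String> calls = new ArrayList<>();

    public String read(String id) { calls.add("read"); return null; }

    public void add(String id, String value) {
        if (id == null) {
            throw new IllegalStateException("id is required");
        }
        calls.add("add");
    }

    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
    public void close() { calls.add("close"); }
}
```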
Connection creation and reuse should be a thread safe operation. Connection provides CRUD operations support on the storage.
Read ItemData from the storage by item identifier.
ItemData getItemData(String identifier) throws RepositoryException, IllegalStateException;
Find Item by parent (id) and name (with path index) of a given type.
ItemData getItemData(NodeData parentData, QPathEntry name, ItemType itemType) throws RepositoryException, IllegalStateException;
Get child Nodes of the parent node.
List<NodeData> getChildNodesData(NodeData parent) throws RepositoryException, IllegalStateException;
Get child Nodes of the parent node. The ItemDataFilter is used to reduce the number of returned items, but there is no guarantee that only items matching the filter will be returned.
List<NodeData> getChildNodesData(NodeData parent, List<QPathEntryFilter> pattern) throws RepositoryException, IllegalStateException;
Reads List of PropertyData from the storage by using the parent location of the item.
List<PropertyData> getChildPropertiesData(NodeData parent) throws RepositoryException, IllegalStateException;
Get child Properties of the parent node. The ItemDataFilter is used to reduce the number of returned items, but there is no guarantee that only items matching the filter will be returned.
List<PropertyData> getChildPropertiesData(NodeData parent, List<QPathEntryFilter> pattern) throws RepositoryException, IllegalStateException;
Reads the List of PropertyData with empty ValueData from the storage by using the parent location of the item.
This method is specially dedicated to operations that do not modify content (e.g. deleting Items).
List<PropertyData> listChildPropertiesData(NodeData parent) throws RepositoryException, IllegalStateException;
Reads the List of PropertyData of REFERENCE type from the storage, that is, the Properties referencing the Node with the given nodeIdentifier.
See more in javax.jcr.Node.getReferences()
List<PropertyData> getReferencesData(String nodeIdentifier) throws RepositoryException, IllegalStateException, UnsupportedOperationException;
Get child Nodes of the parent node whose order number is between fromOrderNum and toOrderNum. Returns true if there is more data to retrieve with the next request, and false otherwise.
boolean getChildNodesDataByPage(NodeData parent, int fromOrderNum, int toOrderNum, List<NodeData> childs) throws RepositoryException;
Get children nodes count of the parent node.
int getChildNodesCount(NodeData parent) throws RepositoryException;
Get order number of parent's last child node.
int getLastOrderNumber(NodeData parent) throws RepositoryException;
Add single NodeData.
void add(NodeData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Add single PropertyData.
void add(PropertyData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Update NodeData.
void update(NodeData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Update PropertyData.
void update(PropertyData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Rename NodeData by using Node identifier and new name and indexing from the data.
void rename(NodeData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Delete NodeData.
void delete(NodeData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Delete PropertyData.
void delete(PropertyData data) throws RepositoryException,UnsupportedOperationException,InvalidItemStateException,IllegalStateException;
Prepare the commit phase.
void prepare() throws IllegalStateException, RepositoryException;
Persists changes and closes the connection. It can be, for instance, a database transaction commit.
void commit() throws IllegalStateException, RepositoryException;
Discards the changes and closes the connection. It can be, for instance, a database transaction rollback.
void rollback() throws IllegalStateException, RepositoryException;
All methods throw IllegalStateException if the connection is closed, UnsupportedOperationException if the method is not supported (e.g. in a JCR Level 1 implementation), and RepositoryException if an error occurs during preparation, validation or persistence.
Container has to take care of storage consistency (JCR constraints) on write operations (InvalidItemStateException should be thrown according to the spec). At least, the following checks should be performed:
On ADD errors
Parent not found. Condition: Parent ID (Item with this ID does not exist).
Item already exists. Condition: ID (Item with ID already exists).
Item already exists. Condition: Parent ID, Name, Index (Item with parent ID, name and index already exists).
On DELETE errors
Item not found. Condition ID.
Cannot delete a parent while its children exist.
On UPDATE errors
Item not found. Condition ID.
Item already exists with higher Version. Condition: ID, Version (Some Session has updated the Item with this ID prior to this update).
The container (connection) should implement Commit and Rollback consistently, in a transactional manner. That is, if a set of operations has been performed before the Commit and a subsequent operation fails, it should be possible to roll back the applied changes using the Rollback command.
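The minimum set of checks listed above can be illustrated with an in-memory sketch. CheckedStorage is a hypothetical class written for this example and is not part of eXo JCR. InvalidStateException stands in for javax.jcr.InvalidItemStateException, and the version comparison used in update is an assumption about how a "higher version" could be detected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory storage (not an eXo JCR class) showing the minimum ADD/DELETE/UPDATE checks. */
final class CheckedStorage {

    /** Stand-in for javax.jcr.InvalidItemStateException. */
    static final class InvalidStateException extends Exception {
        InvalidStateException(String message) {
            super(message);
        }
    }

    static final class Item {
        final String id;
        final String parentId; // null for the root item
        final String name;
        final int index;
        final int version;

        Item(String id, String parentId, String name, int index, int version) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.index = index;
            this.version = version;
        }
    }

    private final Map<String, Item> byId = new HashMap<>();

    void add(Item item) throws InvalidStateException {
        if (item.parentId != null && !byId.containsKey(item.parentId)) {
            throw new InvalidStateException("Parent not found: " + item.parentId);
        }
        if (byId.containsKey(item.id)) {
            throw new InvalidStateException("Item already exists: " + item.id);
        }
        for (Item other : byId.values()) {
            if (Objects.equals(other.parentId, item.parentId)
                    && other.name.equals(item.name) && other.index == item.index) {
                throw new InvalidStateException("Item already exists at this parent, name and index");
            }
        }
        byId.put(item.id, item);
    }

    void delete(String id) throws InvalidStateException {
        if (!byId.containsKey(id)) {
            throw new InvalidStateException("Item not found: " + id);
        }
        for (Item other : byId.values()) {
            if (id.equals(other.parentId)) {
                throw new InvalidStateException("Cannot delete a parent while children exist: " + id);
            }
        }
        byId.remove(id);
    }

    void update(Item item) throws InvalidStateException {
        Item stored = byId.get(item.id);
        if (stored == null) {
            throw new InvalidStateException("Item not found: " + item.id);
        }
        // Assumption: an update must carry a version newer than the stored one.
        if (stored.version >= item.version) {
            throw new InvalidStateException("Item already exists with higher version: " + item.id);
        }
        byId.put(item.id, item);
    }
}
```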
Container implementation obtains Values Storages option via ValueStoragePluginProvider component. Provider acts as a factory of Value channels (ValueIOChannel) and has two methods for this purpose:
Returns the ValueIOChannel that matches this property and valueOrderNumer. Null will be returned if no channel matches.
ValueIOChannel getApplicableChannel(PropertyData property, int valueOrderNumer) throws IOException;
Return ValueIOChannel associated with given storageId.
ValueIOChannel getChannel(String storageId) throws IOException, ValueStorageNotFoundException;
There is also a method for a consistency check, but this method isn't used anywhere and storage implementations leave it empty.
Provider implementation should use the ValueStoragePlugin abstract class as a base for all storage implementations. The plugin provides support for the provider implementation methods. The following plugin methods should be implemented:
Initialize this plugin. Used at start time in ValueStoragePluginProvider.
public abstract void init(Properties props, ValueDataResourceHolder resources) throws RepositoryConfigurationException, IOException;
Open ValueIOChannel. Used in ValueStoragePluginProvider.getApplicableChannel(PropertyData, int) and getChannel(String).
public abstract ValueIOChannel openIOChannel() throws IOException;
Return true if this storage has the same storageId.
public abstract boolean isSame(String valueDataDescriptor);
Channel should implement the ValueIOChannel interface. It provides the following CRUD operations for the Value Storage:
Read Property value.
ValueData read(String propertyId, int orderNumber, int maxBufferSize) throws IOException;
Add or update Property value.
void write(String propertyId, ValueData data) throws IOException;
Delete all values of the Property.
void delete(String propertyId) throws IOException;
Modification operations should be applied only on commit. Rollback is required to clean up the data that has been created.
Commit channel changes.
void commit() throws IOException;
Rollback channel changes.
void rollback() throws IOException;
Prepare Value content.
void prepare() throws IOException;
Commit Value content (two phases).
void twoPhaseCommit() throws IOException;
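The "apply only on commit" rule for channels can be illustrated with a small staging model. StagingChannel is a hypothetical in-memory class, not the eXo JCR ValueIOChannel. As a simplification, it keeps one byte array per property id instead of ordered ValueData entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory channel (not an eXo JCR class): changes reach the storage only on commit. */
final class StagingChannel {

    private final Map<String, byte[]> storage;                        // committed values by property id
    private final Map<String, byte[]> staged = new LinkedHashMap<>(); // a null value marks a delete

    StagingChannel(Map<String, byte[]> storage) {
        this.storage = storage;
    }

    /** Stages an add or update; the storage is not touched yet. */
    void write(String propertyId, byte[] data) {
        staged.put(propertyId, data.clone());
    }

    /** Stages the removal of the property's value. */
    void delete(String propertyId) {
        staged.put(propertyId, null);
    }

    /** Applies all staged changes to the storage in the order they were made. */
    void commit() {
        for (Map.Entry<String, byte[]> e : staged.entrySet()) {
            if (e.getValue() == null) {
                storage.remove(e.getKey());
            } else {
                storage.put(e.getKey(), e.getValue());
            }
        }
        staged.clear();
    }

    /** Drops all staged changes, leaving the storage as it was. */
    void rollback() {
        staged.clear();
    }
}
```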
To implement Workspace data container, you need to do the following:
Read a bit about the contract.
Start a new implementation project pom.xml with the org.exoplatform.jcr parent. It is not required, but will ease the development.
Update the sources of JCR Core and read the JavaDoc of the org.exoplatform.services.jcr.storage.WorkspaceDataContainer and org.exoplatform.services.jcr.storage.WorkspaceStorageConnection interfaces. They are the main part of the implementation.
Look at the org.exoplatform.services.jcr.impl.dataflow.persistent.WorkspacePersistentDataManager source code and check how the data manager uses the container and its connections (see the save() method).
Create a WorkspaceStorageConnection dummy implementation class. It's a freeform class, but to stay close to eXo JCR, check how the JDBC connection is implemented (org.exoplatform.services.jcr.impl.storage.jdbc.JDBCStorageConnection). Take into account the usage of ValueStoragePluginProvider in both implementations. Value storage is a useful option for production versions, but leave it to the end of the implementation work.
Create unit tests for the connection implementation to practice TDD (optional, but it brings many benefits to the process).
Implement CRUD, starting from the read operations and moving on to the write operations. Test the methods by reading and writing data in your backend through means external to the implementation.
When all methods of the connection are done, start on the WorkspaceDataContainer. The container class is very simple; it acts only as a factory for the connections.
Take care with the logic of the container's reuseConnection(WorkspaceStorageConnection) method. For some backends, it can be the same as openConnection(), but for others, it's important to reuse the physical backend connection, e.g. to stay in the same transaction (see the JDBC container).
The container is now almost ready to be used by the data manager. Start another test and go on.
When the container is ready to run as a JCR persistence storage (e.g. for testing at this level), it should be configured in the Repository configuration.
Assume that our new implementation class name is org.project.jcr.impl.storage.MyWorkspaceDataContainer:
<repository-service default-repository="repository">
  <repositories>
    <repository name="repository" system-workspace="production" default-workspace="production">
      .............
      <workspaces>
        <workspace name="production">
          <container class="org.project.jcr.impl.storage.MyWorkspaceDataContainer">
            <properties>
              <property name="propertyName1" value="propertyValue1" />
              <property name="propertyName2" value="propertyValue2" />
              .......
              <property name="propertyNameN" value="propertyValueN" />
            </properties>
            <value-storages>
              .......
            </value-storages>
          </container>
The container can be configured by setting these properties.
Value storages are pluggable to the container, but if they are used, the container implementation should respect a set of interfaces and the principles of external storage usage.
If the container has a ValueStoragePluginProvider (e.g. via constructor), only a few methods are needed to manipulate external Values data.
// get channel for ValueData write (add or update)
ValueIOChannel channel = valueStorageProvider.getApplicableChannel(data, i);
if (channel != null) {
  // a matching external storage exists: write
  channel.write(data.getIdentifier(), vd);
  // obtain storage id, id can be used for linkage of external ValueData and PropertyData in main backend
  String storageId = channel.getStorageId();
}
....
// delete all Property Values in external storage
ValueIOChannel channel = valueStorageProvider.getChannel(storageId);
channel.delete(propertyData.getIdentifier());
....
// read ValueData from external storage
ValueIOChannel channel = valueStorageProvider.getChannel(storageId);
ValueData vdata = channel.read(propertyData.getIdentifier(), orderNumber, maxBufferSize);
After a sequence of write and/or delete operations on the storage channel, the channel should be committed (or rolled back on an error). See ValueIOChannel.commit() and ValueIOChannel.rollback() and how those methods are used in JDBC container.
It is a special service for removing data from the database. This section briefly describes how DBCleanerTool works with the different databases.
Code that invokes the methods of DBCleanService must have JCRRuntimePermissions.MANAGE_REPOSITORY_PERMISSION permission.
DBCleanService provides the following methods:
Table 1.52. API
public static void cleanWorkspaceData(WorkspaceEntry wsEntry) throws DBCleanException | Clean workspace data from database |
public static void cleanRepositoryData(RepositoryEntry rEntry) throws DBCleanException | Cleanup repository data from database |
public static DBCleanerTool getWorkspaceDBCleaner(Connection jdbcConn, WorkspaceEntry wsEntry) throws DBCleanException | Returns database cleaner of workspace. |
public static DBCleanerTool getRepositoryDBCleaner(Connection jdbcConn, RepositoryEntry rEntry) | Returns database cleaner of repository. Returns null in case of multi-db configuration. |
The cleaning is a part of restoring from backup and it is used in the following restore phases:
Table 1.53. Relations between restore phases and what is called on DBCleanerTool
clean | DBCleanerTool.clean(); |
restore | does nothing with DBCleanerTool |
commit | DBCleanerTool.commit(); |
rollback | DBCleanerTool.rollback(); |
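The mapping between restore phases and cleaner calls can be sketched as a small driver. The Cleaner interface below is a hypothetical stand-in that exposes only the three calls named in the table; it is not the real DBCleanerTool class. The sketch also assumes that a failure in any phase leads to the rollback phase, which the table does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

class RestoreSketch {
    // Hypothetical stand-in for the clean/commit/rollback calls of DBCleanerTool.
    interface Cleaner {
        void clean() throws Exception;
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    // Cleaner that does nothing, used only to make the sketch runnable.
    static final Cleaner NO_OP = new Cleaner() {
        public void clean() { }
        public void commit() { }
        public void rollback() { }
    };

    // Runs clean, restore, commit in order and returns the phases that were entered.
    // Assumption: any failure switches to the rollback phase.
    static List<String> runRestore(Cleaner cleaner, Runnable restoreData) {
        List<String> phases = new ArrayList<>();
        try {
            phases.add("clean");
            cleaner.clean();
            phases.add("restore");
            restoreData.run();      // the restore phase does not touch the cleaner
            phases.add("commit");
            cleaner.commit();
        } catch (Exception e) {
            phases.add("rollback");
            try {
                cleaner.rollback();
            } catch (Exception ignored) {
                // nothing more can be done in this sketch
            }
        }
        return phases;
    }

    public static void main(String[] args) {
        System.out.println(runRestore(NO_OP, () -> { }));
        System.out.println(runRestore(NO_OP, () -> {
            throw new IllegalStateException("simulated restore failure");
        }));
    }
}
```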
Different approaches are used for database cleaning depending on database and JCR configuration.
In a single-db configuration, records are simply removed from the JCR tables.
Table 1.54. PostgreSQL/PostgrePlus, DB2 and MSSQL
clean() | removing all records from the database. Foreign key of JCR_SITEM table is also removed |
commit() | adding foreign key |
rollback() |
Table 1.55. Oracle, Sybase, HSQLDB, MySQL
clean() | removing all records from the database. Foreign key of JCR_SITEM table is also removed |
commit() | adding foreign key |
rollback() | adding foreign key |
In a multi-db configuration, JCR tables are either removed or renamed.
Table 1.56. PostgreSQL/PostgrePlus, DB2 and MSSQL
clean() | removing tables JCR_MVALUE, JCR_MREF, JCR_MITEM, initializing new tables without foreign key of JCR_MITEM table, adding root node |
commit() | adding foreign key |
rollback() |
Table 1.57. Oracle, Sybase, HSQLDB, MySQL
clean() | renaming current tables, initializing new tables without foreign key of JCR_MITEM table, adding root node, removing indexes for some databases |
commit() | renaming tables, adding indexes |
rollback() | removing previously renamed tables, adding indexes, adding foreign key |
This section shows possible ways of improving JCR performance.
It is intended for GateIn administrators and those who want to use JCR features.
EC2 network: 1Gbit
Servers hardware:
7.5 GB memory |
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) |
850 GB instance storage (2×420 GB plus 10 GB root partition) |
64-bit platform |
I/O Performance: High |
API name: m1.large |
NFS and statistics (cacti snmp) server were located on one physical server.
JBoss AS configuration
JAVA_OPTS: -Dprogram.name=run.sh -server -Xms4g -Xmx4g
-XX:MaxPermSize=512m -Dorg.jboss.resolver.warning=true
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+UseParallelGC
-Djava.net.preferIPv4Stack=true
The benchmark test uses WebDAV (Complex read/write load test (benchmark)) with the same 20K file. To obtain per-operation results, we used custom output from the test case threads to a CSV file.
Read operation:
Warm-up iterations: 100 |
Run iterations: 2000 |
Background writing threads: 25 |
Reading threads: 225 |
Table 1.58.
Nodes count | tps | Responses >2s | Responses >4s |
---|---|---|---|
1 | 523 | 6.87% | 1.27% |
2 | 1754 | 0.64% | 0.08% |
3 | 2388 | 0.49% | 0.09% |
4 | 2706 | 0.46% | 0.1% |
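To read the table as scaling behaviour, the throughput of each cluster size can be divided by the single-node throughput. The sketch below does only this arithmetic on the tps values from the table; the class and method names are illustrative, and the ratios say nothing about the cause of the scaling.

```java
import java.util.Locale;

class ScalingSketch {
    // Throughput of a cluster relative to the single-node baseline.
    static double speedup(double baselineTps, double tps) {
        return tps / baselineTps;
    }

    // Average throughput contributed by each node of the cluster.
    static double tpsPerNode(double tps, int nodes) {
        return tps / nodes;
    }

    public static void main(String[] args) {
        int[] nodes = {1, 2, 3, 4};
        double[] tps = {523, 1754, 2388, 2706};   // values from the table above
        for (int i = 0; i < nodes.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "%d node(s): speedup %.2f, tps per node %.1f",
                nodes[i], speedup(tps[0], tps[i]), tpsPerNode(tps[i], nodes[i])));
        }
    }
}
```

The speedup grows with every added node, while the throughput per node is highest at two nodes and declines after that.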
Read operation with more threads:
Warm-up iterations: 100 |
Run iterations: 2000 |
Background writing threads: 50 |
Reading threads: 450 |
You can use the maxThreads parameter to increase the maximum number of threads that can be launched in an AS instance. This can improve performance if you need a high level of concurrency. You can also use the -XX:+UseParallelGC java directive to enable the parallel garbage collector.
Beware of setting maxThreads too high, as this can cause OutOfMemoryError. We got it with maxThreads=1250 on the following machine:
7.5 GB memory |
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) |
850 GB instance storage (2×420 GB plus 10 GB root partition) |
64-bit platform |
I/O Performance: High |
API name: m1.large |
java -Xmx 4g |
Cache size
The JCR cluster implementation is built using JBoss Cache as a distributed, replicated cache. One particularity concerns the remove action: its speed depends on the actual size of the cache. The more nodes are currently in the cache, the more time is needed to remove one particular node (subtree) from it.
Eviction
Changing the eviction wakeUpInterval value does not affect performance. Performance results with values from 500 up to 3000 are approximately equal.
Transaction Timeout
Using a short timeout for long transactions, such as Export/Import or the removal of a huge subtree, may cause TransactionTimeoutException.
For performance, it is better to have the load balancer, the DB server and the shared NFS on different computers. If for some reason one node gets more load than the others, you can decrease this load using the load value in the load balancer.
JGroups configuration
It's recommended to use the "multiplexer stack" feature present in JGroups. It is set by default in eXo JCR and offers higher performance in a cluster while using fewer network connections. If there are two or more clusters in your network, please check that they use different ports and different cluster names.
Write performance in cluster
The eXo JCR implementation uses the Lucene indexing engine to provide search capabilities. But Lucene brings some limitations for write operations: it can perform indexing in only one thread. That's why write performance in a cluster is not higher than in a singleton environment. Data is indexed on the coordinator node, so increasing the write load on the cluster may lead to a ReplicationTimeout exception. It occurs because writing threads queue in the indexer, and under high load the timeout for replication to the coordinator is exceeded.
Taking this fact into consideration, it is recommended to increase the replTimeout value in the cache configurations in case of a high write load.
Replication timeout
Some operations may take too much time. So if you get ReplicationTimeoutException, try increasing the replication timeout:
<clustering mode="replication" clusterName="${jbosscache-cluster-name}">
  ...
  <sync replTimeout="60000" />
</clustering>
The value is set in milliseconds.
PermGen space size
If you intend to use Infinispan, you will have to increase the PermGen size to at least 256 MB because of the latest versions of JGroups that Infinispan requires (please note that Infinispan is only dedicated to the community for now; no support will be provided). If you intend to use JBoss Cache, you can keep using JGroups 2.6.13.GA, which means that you don't need to increase the PermGen size.