Chapter 24. QueryHandler configuration

JCR offers multiple indexing strategies. They include both for standalone and clustered environments using the advantages of running in a single JVM or doing the best to use all resources available in cluster. JCR uses Lucene library as underlying search and indexing engine, but it has several limitations that greatly reduce possibilities and limits the usage of cluster advantages. That's why eXo JCR offers three strategies that are suitable for it's own usecases. They are standalone, clustered with shared index and clustered with local indexes. Each one has it's pros and cons.

Stanadlone strategy provides a stack of indexes to achieve greater performance within single JVM.

It combines in-memory buffer index directory with delayed file-system flushing. This index is called "Volatile" and it is invoked in searches also. Within some conditions volatile index is flushed to the persistent storage (file system) as new index directory. This allows to achieve great results for write operations.

Clustered implementation with local indexes is built upon same strategy with volatile in-memory index buffer along with delayed flushing on persistent storage.

As this implementation designed for clustered environment it has additional mechanisms for data delivery within cluster. Actual text extraction jobs done on the same node that does content operations (i.e. write operation). Prepared "documents" (Lucene term that means block of data ready for indexing) are replicated withing cluster nodes and processed by local indexes. So each cluster instance has the same index content. When new node joins the cluster it has no initial index, so it must be created. There are some supported ways of doing this operation. The simplest is to simply copy the index manually but this is not intended for use. If no initial index found JCR uses automated sceneries. They are controlled via configuration (see "index-recovery-mode" parameter) offering full re-indexing from database or copying from another cluster node.

For some reasons having a multiple index copies on each instance can be costly. So shared index can be used instead (see diagram below).

This indexing strategy combines advantages of in-memory index along with shared persistent index offering "near" real time search capabilities. This means that newly added content is accessible via search practically immediately. This strategy allows nodes to index data in their own volatile (in-memory) indexes, but persistent indexes are managed by single "coordinator" node only. Each cluster instance has a read access for shared index to perform queries combining search results found in own in-memory index also. Take in account that shared folder must be configured in your system environment (i.e. mounted NFS folder). But this strategy in some extremely rare cases can have a bit different volatile indexes within cluster instances for a while. In a few seconds they will be up2date.

See more about Search Configuration.

24.2. Configuration

24.2.1. Query-handler configuration overview

Configuration example:

<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="shareddir/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

Table 24.1. Config properties description

Property name	Description
index-dir	path to index
changesfilter-class	template of JBoss-cache configuration for all query-handlers in repository
jbosscache-configuration	template of JBoss-cache configuration for all query-handlers in repository
jgroups-configuration	jgroups-configuration is template configuration for all components (search, cache, locks) [Add link to document describing template configurations]
jgroups-multiplexer-stack	[TODO about jgroups-multiplexer-stack - add link to JBoss doc]
jbosscache-cluster-name	cluster name (must be unique)
max-volatile-time	max time to live for Volatile Index
rdbms-reindexing	indicate that need to use rdbms reindexing mechanism if possible, the default value is true
reindexing-page-size	maximum amount of nodes which can be retrieved from storage for re-indexing purpose, the default value is 100
index-recovery-mode	If the parameter has been set to `from-indexing`, so a full indexing will be automatically launched (default behavior), if the parameter has been set to `from-coordinator`, the index will be retrieved from coordinator

24.2.2. Standalone strategy

When running JCR in standalone usually standalone indexing is used also. Such parameters as "changesfilter-class", "jgroups-configuration" and all the "jbosscache-*" must be skipped and not defined. Like the configuration below.

<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="shareddir/index/db1/ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

24.2.3. Cluster-ready indexing strategies

For both cluster-ready implementations JBoss Cache, JGroups and Changes Filter values must be defined. Shared index requires some kind of remote or shared file system to be attached in a system (i.e. NFS, SMB or etc). Indexing directory ("indexDir" value) must point to it. Setting "changesfilter-class" to "org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" will enable shared index implementation.

<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

In order to use cluster-ready strategy based on local indexes, when each node has own copy of index on local file system, the following configuration must be applied. Indexing directory must point to any folder on local file system and "changesfilter-class" must be set to "org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter".

<workspace name="ws">
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" />
         <property name="changesfilter-class"
            value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter" />
         <property name="jbosscache-configuration" value="jbosscache-indexer.xml" />
         <property name="jgroups-configuration" value="udp-mux.xml" />
         <property name="jgroups-multiplexer-stack" value="true" />
         <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" />
         <property name="max-volatile-time" value="60" />
         <property name="rdbms-reindexing" value="true" />
         <property name="reindexing-page-size" value="1000" />
         <property name="index-recovery-mode" value="from-coordinator" />
      </properties>
   </query-handler>
</workspace>

24.2.4. JBoss-Cache template configuration

JBoss-Cache template configuration for query handler is about the same for both clustered strategies.

jbosscache-indexer.xml

<?xml version="1.0" encoding="UTF-8"?>
<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1">

   <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false"
      lockAcquisitionTimeout="20000" />
   <!-- Configure the TransactionManager -->
   <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" />

   <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
      <stateRetrieval timeout="20000" fetchInMemoryState="false" />
      <jgroupsConfig multiplexerStack="jcr.stack" />
      <sync />
   </clustering>
   <!-- Eviction configuration -->
   <eviction wakeUpInterval="5000">
      <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="1000000">
         <property name="maxNodes" value="10000" />
         <property name="minTimeToLive" value="60000" />
      </default>
   </eviction>

</jbosscache>

See more about template configurations here.

Chapter 24. QueryHandler configuration

24.1. Indexing in clustered environment

24.2. Configuration

24.2.1. Query-handler configuration overview

24.2.2. Standalone strategy

24.2.3. Cluster-ready indexing strategies

24.2.4. JBoss-Cache template configuration