JBoss.orgCommunity Documentation
JCR offers multiple indexing strategies. They include both for standalone and clustered environments using the advantages of running in a single JVM or doing the best to use all resources available in cluster. JCR uses Lucene library as underlying search and indexing engine, but it has several limitations that greatly reduce possibilities and limits the usage of cluster advantages. That's why eXo JCR offers three strategies that are suitable for it's own usecases. They are standalone, clustered with shared index and clustered with local indexes. Each one has it's pros and cons.
Stanadlone strategy provides a stack of indexes to achieve greater performance within single JVM.
It combines in-memory buffer index directory with delayed file-system flushing. This index is called "Volatile" and it is invoked in searches also. Within some conditions volatile index is flushed to the persistent storage (file system) as new index directory. This allows to achieve great results for write operations.
Clustered implementation with local indexes is built upon same strategy with volatile in-memory index buffer along with delayed flushing on persistent storage.
As this implementation designed for clustered environment it has additional mechanisms for data delivery within cluster. Actual text extraction jobs done on the same node that does content operations (i.e. write operation). Prepared "documents" (Lucene term that means block of data ready for indexing) are replicated withing cluster nodes and processed by local indexes. So each cluster instance has the same index content. When new node joins the cluster it has no initial index, so it must be created. There are some supported ways of doing this operation. The simplest is to simply copy the index manually but this is not intended for use. If no initial index found JCR uses automated sceneries. They are controlled via configuration (see "index-recovery-mode" parameter) offering full re-indexing from database or copying from another cluster node.
For some reasons having a multiple index copies on each instance can be costly. So shared index can be used instead (see diagram below).
This indexing strategy combines advantages of in-memory index along with shared persistent index offering "near" real time search capabilities. This means that newly added content is accessible via search practically immediately. This strategy allows nodes to index data in their own volatile (in-memory) indexes, but persistent indexes are managed by single "coordinator" node only. Each cluster instance has a read access for shared index to perform queries combining search results found in own in-memory index also. Take in account that shared folder must be configured in your system environment (i.e. mounted NFS folder). But this strategy in some extremely rare cases can have a bit different volatile indexes within cluster instances for a while. In a few seconds they will be up2date.
See more about Search Configuration.
Configuration example:
<workspace name="ws"> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="shareddir/index/db1/ws" /> <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" /> <property name="jbosscache-configuration" value="jbosscache-indexer.xml" /> <property name="jgroups-configuration" value="udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" /> <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" /> <property name="max-volatile-time" value="60" /> <property name="rdbms-reindexing" value="true" /> <property name="reindexing-page-size" value="1000" /> <property name="index-recovery-mode" value="from-coordinator" /> </properties> </query-handler> </workspace>
Table 24.1. Config properties description
Property name | Description |
---|---|
index-dir | path to index |
changesfilter-class | template of JBoss-cache configuration for all query-handlers in repository |
jbosscache-configuration | template of JBoss-cache configuration for all query-handlers in repository |
jgroups-configuration | jgroups-configuration is template configuration for all components (search, cache, locks) [Add link to document describing template configurations] |
jgroups-multiplexer-stack | [TODO about jgroups-multiplexer-stack - add link to JBoss doc] |
jbosscache-cluster-name | cluster name (must be unique) |
max-volatile-time | max time to live for Volatile Index |
rdbms-reindexing | indicate that need to use rdbms reindexing mechanism if possible, the default value is true |
reindexing-page-size | maximum amount of nodes which can be retrieved from storage for re-indexing purpose, the default value is 100 |
index-recovery-mode | If the parameter has been set to
from-indexing , so a full indexing will be
automatically launched (default behavior), if the parameter
has been set to from-coordinator , the index
will be retrieved from coordinator |
When running JCR in standalone usually standalone indexing is used also. Such parameters as "changesfilter-class", "jgroups-configuration" and all the "jbosscache-*" must be skipped and not defined. Like the configuration below.
<workspace name="ws"> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="shareddir/index/db1/ws" /> <property name="max-volatile-time" value="60" /> <property name="rdbms-reindexing" value="true" /> <property name="reindexing-page-size" value="1000" /> <property name="index-recovery-mode" value="from-coordinator" /> </properties> </query-handler> </workspace>
For both cluster-ready implementations JBoss Cache, JGroups and Changes Filter values must be defined. Shared index requires some kind of remote or shared file system to be attached in a system (i.e. NFS, SMB or etc). Indexing directory ("indexDir" value) must point to it. Setting "changesfilter-class" to "org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" will enable shared index implementation.
<workspace name="ws"> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" /> <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" /> <property name="jbosscache-configuration" value="jbosscache-indexer.xml" /> <property name="jgroups-configuration" value="udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" /> <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" /> <property name="max-volatile-time" value="60" /> <property name="rdbms-reindexing" value="true" /> <property name="reindexing-page-size" value="1000" /> <property name="index-recovery-mode" value="from-coordinator" /> </properties> </query-handler> </workspace>
In order to use cluster-ready strategy based on local indexes, when each node has own copy of index on local file system, the following configuration must be applied. Indexing directory must point to any folder on local file system and "changesfilter-class" must be set to "org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter".
<workspace name="ws"> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="/mnt/nfs_drive/index/db1/ws" /> <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexChangesFilter" /> <property name="jbosscache-configuration" value="jbosscache-indexer.xml" /> <property name="jgroups-configuration" value="udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" /> <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" /> <property name="max-volatile-time" value="60" /> <property name="rdbms-reindexing" value="true" /> <property name="reindexing-page-size" value="1000" /> <property name="index-recovery-mode" value="from-coordinator" /> </properties> </query-handler> </workspace>
JBoss-Cache template configuration for query handler is about the same for both clustered strategies.
jbosscache-indexer.xml
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <!-- Configure the TransactionManager --> <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <jgroupsConfig multiplexerStack="jcr.stack" /> <sync /> </clustering> <!-- Eviction configuration --> <eviction wakeUpInterval="5000"> <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="1000000"> <property name="maxNodes" value="10000" /> <property name="minTimeToLive" value="60000" /> </default> </eviction> </jbosscache>
See more about template configurations here.