JBoss.orgCommunity Documentation
Let's talk about indexing content in cluster.
For couple of reasons, we can't replicate index. It means that some data added and indexed on one cluster node will be replicated to another cluster node, but will not be indexed on that node.
So, how do the indexing works in cluster environment?
As, we can not index the same data on all nodes of cluster, we must index it on one node. Node, that can index data and do changes on lucene index, is called "coordinator". Coordinator-node is choosen automaticaly, so we do not need special configuration for coordinator.
But, how can another nodes save their changes to lucene index?
First of all, data is already saved and replicated to another cluster-nodes, so we need only deliver message like "we need to index this data" to coordinator. Thats why Jboss-cache is used.
All nodes of cluster writes messages into JBoss-cache but only coordinator takes those messages and makes changes Lucene index.
How do the search works in cluster environment?
Search engine do not works with indexer, coordinator, etc. Search needs only lucene index. But only one cluster node can change lucene index - asking you. Yes - lucene index is shared. So, all cluster nodes must be configured to use lucene index from shared directory.
A little bit about indexing process (no matter, cluster or not): Indexer does not write changes to FS lucene index immediately. At first, Indexer write the changes to Volatile index. If Volatile index size become 1Mb or more, it is flushed to FS. Also, there is timer, that flushes volatile index by timeout. Volatile index timeout is configured by "max-volatile-time" paremeter.
See more about Search Configuration.
Common scheme of Shared Index
Now, let's see what we need to run Search engine in cluster environment.
Shared directory for storing Lucene index (i.e. NFS);
Changes filter configured as org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter;
This filter ignore changes on non-coordinator nodes, and index changes on coordinator node.
configure JBoss-cache, course;
Configuration example:
<workspace name="ws"> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="shareddir/index/db1/ws" /> <property name="changesfilter-class" value="org.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" /> <property name="jbosscache-configuration" value="jbosscache-indexer.xml" /> <property name="jgroups-configuration" value="udp-mux.xml" /> <property name="jgroups-multiplexer-stack" value="true" /> <property name="jbosscache-cluster-name" value="JCR-cluster-indexer-ws" /> <property name="max-volatile-time" value="60" /> </properties> </query-handler> </workspace>
Table 24.1. Config properties description
Property name | Description |
---|---|
index-dir | path to index |
jbosscache-configuration | template of JBoss-cache configuration for all query-handlers in repository |
jgroups-configuration | jgroups-configuration is template configuration for all components (search, cache, locks) [Add link to document describing template configurations] |
jgroups-multiplexer-stack | [TODO about jgroups-multiplexer-stack - add link to JBoss doc] |
jbosscache-cluster-name | cluster name (must be unique) |
max-volatile-time | max time to live for Volatile Index |
JBoss-Cache template configuration for query handler.
jbosscache-indexer.xml
<?xml version="1.0" encoding="UTF-8"?> <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false" lockAcquisitionTimeout="20000" /> <!-- Configure the TransactionManager --> <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" /> <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> <stateRetrieval timeout="20000" fetchInMemoryState="false" /> <jgroupsConfig multiplexerStack="jcr.stack" /> <sync /> </clustering> <!-- Eviction configuration --> <eviction wakeUpInterval="5000"> <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="1000000"> <property name="maxNodes" value="10000" /> <property name="minTimeToLive" value="60000" /> </default> </eviction> </jbosscache>
See more about template configurations here.