Infinispan as a Directory for Lucene

Infinispan includes a highly scalable, distributed implementation of the Apache Lucene Directory.

This directory closely mimics the semantics of the traditional filesystem and RAM-based directories, so it works as a drop-in replacement for existing applications using Lucene. It provides reliable index sharing along with other Infinispan features such as node autodiscovery, automatic failover and rebalancing, and optional transactions, and it can be backed by traditional storage solutions such as filesystems, databases or cloud store engines.

The implementation extends Lucene's org.apache.lucene.store.Directory so it can be used to store the index in a cluster-wide shared memory, making it easy to distribute the index. Compared to rsync-based replication, this solution suits use cases in which your application makes frequent changes to the index and you need them quickly distributed to all nodes. It offers configurable consistency levels, synchronicity and guarantees, total elasticity and autodiscovery. Changes applied to the index can optionally participate in a JTA transaction; since version 5, XA transactions with recovery are supported.

Two different LockFactory implementations are provided to guarantee only one IndexWriter at a time will make changes to the index, again implementing the same semantics as when opening an index on a local filesystem. As with other Lucene Directories, you can override the LockFactory if you prefer to use an alternative implementation.
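
The single-writer guarantee can be pictured with a minimal sketch. It assumes a plain ConcurrentMap standing in for the clustered cache and uses a hypothetical WriterLockSketch class; it is not one of the LockFactory implementations shipped with Infinispan, only an illustration of how an atomic put-if-absent on shared storage lets exactly one caller hold a write lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a hypothetical lock built on an atomic put-if-absent.
// The map stands in for shared storage; this is not Infinispan's own code.
final class WriterLockSketch {
    private final ConcurrentMap<String, String> sharedMap;
    private final String lockKey;
    private final String owner;

    WriterLockSketch(ConcurrentMap<String, String> sharedMap, String indexName, String owner) {
        this.sharedMap = sharedMap;
        this.lockKey = indexName + "/write.lock"; // one lock marker per index name
        this.owner = owner;
    }

    // Succeeds only for the first caller; later callers find the existing marker.
    boolean obtain() {
        return sharedMap.putIfAbsent(lockKey, owner) == null;
    }

    // Removes the marker only if this owner is the one that placed it.
    void release() {
        sharedMap.remove(lockKey, owner);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> shared = new ConcurrentHashMap<>();
        WriterLockSketch nodeA = new WriterLockSketch(shared, "indexName", "nodeA");
        WriterLockSketch nodeB = new WriterLockSketch(shared, "indexName", "nodeB");
        System.out.println(nodeA.obtain()); // true: the first writer gets the lock
        System.out.println(nodeB.obtain()); // false: a second writer is refused
        nodeA.release();
        System.out.println(nodeB.obtain()); // true: the lock is free again
    }
}
```

A second writer that fails to obtain the lock would typically retry or report an error, much as Lucene does when a write lock is already held on a local filesystem index.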

Additional Links

Javadoc: http://docs.jboss.org/infinispan/5.0/apidocs/org/infinispan/lucene/InfinispanDirectory.html
Issue tracker: https://jira.jboss.org/browse/ISPN/component/12312732
Source code: http://www.jboss.org/infinispan/sourcecode.html

Lucene compatibility

The current version was developed against Lucene 3.2.0 and has also been tested with Lucene 3.0.x, 3.1.0, 2.9.x, the older 2.4.1, and the latest Lucene release, 3.3.0.

How to use it

To create a Directory instance:

import org.apache.lucene.store.Directory;
import org.infinispan.lucene.InfinispanDirectory;
import org.infinispan.Cache;

Cache cache = // create an Infinispan cache, configured as you like
Directory indexDir = new InfinispanDirectory(cache, "indexName");

The indexName is a unique key that identifies your index. It plays the same role as the path does for filesystem-based indexes: you can create several different indexes by giving them different names. When you use the same indexName in another instance connected to the same network (or instantiated on the same machine, which is useful for testing), the instances will join, form a cluster and share all content.
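
One way to picture the role of the index name is as part of the key under which index content is stored. The sketch below is only an illustration under that assumption: a plain map stands in for the clustered cache, and IndexNamespaceSketch and its Key class are hypothetical names, not Infinispan's internal types.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a plain map stands in for the clustered cache, and the
// Key class is hypothetical, not one of Infinispan's internal key types.
final class IndexNamespaceSketch {

    static final class Key {
        final String indexName;
        final String fileName;

        Key(String indexName, String fileName) {
            this.indexName = indexName;
            this.fileName = fileName;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return indexName.equals(other.indexName) && fileName.equals(other.fileName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(indexName, fileName);
        }
    }

    private final Map<Key, String> shared;
    private final String indexName;

    IndexNamespaceSketch(Map<Key, String> shared, String indexName) {
        this.shared = shared;
        this.indexName = indexName;
    }

    // Every entry is stored under a key that includes the index name.
    void write(String fileName, String content) {
        shared.put(new Key(indexName, fileName), content);
    }

    String read(String fileName) {
        return shared.get(new Key(indexName, fileName));
    }

    public static void main(String[] args) {
        Map<Key, String> shared = new ConcurrentHashMap<>();
        IndexNamespaceSketch ordersOnNodeA = new IndexNamespaceSketch(shared, "orders");
        IndexNamespaceSketch ordersOnNodeB = new IndexNamespaceSketch(shared, "orders");
        IndexNamespaceSketch customers = new IndexNamespaceSketch(shared, "customers");

        ordersOnNodeA.write("segments_1", "orders segment");
        System.out.println(ordersOnNodeB.read("segments_1")); // same name: content is shared
        System.out.println(customers.read("segments_1"));     // different name: nothing found
    }
}
```

Two instances created with the same name see each other's entries, while an instance with a different name sees none of them, even though all three use the same underlying storage.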

Nodes can be added or removed dynamically, which makes administration easy and suits cloud environments: reacting to load spikes is simple, since adding memory and CPU power to the search system only requires starting more nodes.

Limitations

As with an IndexWriter on a filesystem-based Directory, only one IndexWriter can be open at a time, and in the clustered case this limit applies across the whole cluster. Hibernate Search, which has included integration with this Lucene Directory since version 3.3, sends index change requests across a JMS queue or a JGroups channel. Other valid approaches are to proxy the remote IndexWriter or simply to design your application so that only one node attempts to write to the index. Reading (searching) is of course possible in parallel, from any number of threads on each node; changes applied through the single IndexWriter affect the results seen by all threads on all nodes within a very short time.
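
The idea of funnelling all change requests to a single writer can be sketched in plain Java. This is a minimal illustration, not how Hibernate Search implements it: an in-process queue stands in for the JMS queue or JGroups channel, a list of strings stands in for the index, and SingleWriterSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: an in-process queue stands in for the JMS queue or JGroups
// channel, and a list of strings stands in for the index behind the IndexWriter.
final class SingleWriterSketch {
    // An empty Optional is the signal that tells the writer thread to stop.
    private final BlockingQueue<Optional<String>> requests = new LinkedBlockingQueue<>();
    private final List<String> applied = new ArrayList<>();
    private final Thread writer = new Thread(this::applyRequests, "single-index-writer");

    void start() {
        writer.start();
    }

    // Called by any thread (or "node") that wants to change the index.
    void submit(String change) {
        requests.add(Optional.of(change));
    }

    // Stops the writer once it has drained the queue and returns what it applied.
    List<String> stop() {
        requests.add(Optional.empty());
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the writer", e);
        }
        return applied;
    }

    // Runs on the single writer thread: the only place where changes are applied.
    private void applyRequests() {
        try {
            while (true) {
                Optional<String> next = requests.take();
                if (!next.isPresent()) {
                    return;
                }
                applied.add(next.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SingleWriterSketch index = new SingleWriterSketch();
        index.start();
        index.submit("add document 1");
        index.submit("delete document 7");
        System.out.println(index.stop()); // [add document 1, delete document 7]
    }
}
```

Because only the writer thread ever touches the list, no extra locking is needed around the changes themselves; the queue is the only shared structure.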

Configuration

The directory works with local-only configurations as well as with any clustering mode supported by Infinispan. A transaction manager is not mandatory, but batching must be enabled. An example configuration:

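import org.infinispan.config.Configuration;

public static Configuration createTestConfiguration() {
   Configuration c = new Configuration();
   // synchronous distribution; any other clustering mode would also work
   c.setCacheMode(Configuration.CacheMode.DIST_SYNC);
   // batching must be enabled for the Lucene Directory
   c.setInvocationBatchingEnabled(true);
   return c;
}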

As explained in more detail in the Javadoc of org.infinispan.lucene.InfinispanDirectory, it can use more than a single cache, with specific configurations for different purposes. When using readlocks, make sure not to enable transactions on this cache.

Demo

There is a simple command-line demo of its capabilities distributed with Infinispan under demos/lucene-directory; make sure you grab the *-all.zip distribution from SourceForge, which contains all demos.

Start several instances, then try adding text in one instance and searching for it in another. The configuration is not tuned at all, but should work out of the box without any changes. If your network interface has multicast enabled, it will cluster across the local network with other instances of the demo.

Maven dependencies

All you need is org.infinispan:infinispan-lucene-directory:

<dependencies>
   <dependency>
      <groupId>org.infinispan</groupId>
      <artifactId>infinispan-lucene-directory</artifactId>
      <version>5.0.0.FINAL</version>
   </dependency>
</dependencies>

Early versions of the Infinispan Lucene Directory required a transaction manager. This is no longer the case, but wrapping all the changes you make to the index in a transaction is still supported.
