Infinispan includes a highly scalable distributed Apache Lucene Directory implementation.
This directory closely mimics the semantics of the traditional filesystem and RAM-based directories, so it can work as a drop-in replacement for existing applications using Lucene. It provides reliable index sharing along with other Infinispan features such as node autodiscovery, automatic failover and rebalancing, and optional transactions, and it can be backed by traditional storage solutions such as a filesystem, databases or cloud store engines.
The implementation extends Lucene's org.apache.lucene.store.Directory so it can be used to store the index in cluster-wide shared memory, making it easy to distribute the index. Compared to rsync-based replication, this solution is suited to use cases in which your application makes frequent changes to the index and you need them to be quickly distributed to all nodes, with configurable consistency levels, synchronicity and guarantees, total elasticity and autodiscovery. Changes applied to the index can also optionally participate in a JTA transaction; since version 5, XA transactions with recovery are supported.
Two different LockFactory implementations are provided to guarantee only one IndexWriter at a time will make changes to the index, again implementing the same semantics as when opening an index on a local filesystem. As with other Lucene Directories, you can override the LockFactory if you prefer to use an alternative implementation.
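The single-writer guarantee can be illustrated with a minimal sketch that uses only the JDK. This is not one of the LockFactory implementations shipped with Infinispan: the class name MapBackedWriteLock is hypothetical, and a ConcurrentMap stands in for the shared cache. It only shows how an atomic put-if-absent on shared storage lets exactly one writer proceed at a time.

```java
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical illustration only: NOT an Infinispan LockFactory.
 * A ConcurrentMap stands in for the cluster-wide shared cache.
 */
final class MapBackedWriteLock {
    // "write.lock" is the conventional name of Lucene's index write lock.
    private static final String LOCK_NAME = "write.lock";

    private final ConcurrentMap<String, String> sharedStore;
    private final String lockKey;
    private final String ownerId;

    MapBackedWriteLock(ConcurrentMap<String, String> sharedStore, String indexName, String ownerId) {
        this.sharedStore = sharedStore;
        this.lockKey = indexName + "/" + LOCK_NAME;
        this.ownerId = ownerId;
    }

    /** Tries to take the lock; succeeds only if nobody currently holds it. */
    boolean obtain() {
        return sharedStore.putIfAbsent(lockKey, ownerId) == null;
    }

    /** Releases the lock, but only if this owner is the one holding it. */
    void release() {
        sharedStore.remove(lockKey, ownerId);
    }

    /** Reports whether any owner currently holds the lock. */
    boolean isLocked() {
        return sharedStore.containsKey(lockKey);
    }
}
```

Two instances created over the same map with different owner ids behave like two nodes competing for the index: the second obtain() returns false until the first owner calls release().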
Issue tracker: https://jira.jboss.org/browse/ISPN/component/12312732
Source code: http://www.jboss.org/infinispan/sourcecode.html
The current version was developed and compiled against Lucene 3.5.0, and has also been tested with Lucene versions 3.0.x to 3.4.0, version 2.9.x, and the older 2.4.1.
To create a Directory instance:
Nodes can be added or removed dynamically, which makes administering the service very easy and well suited to cloud environments: reacting to load spikes is simple, as adding more memory and CPU power to the search system is done by just starting more nodes.
As with an IndexWriter on a filesystem-based Directory, in the clustered edition only one IndexWriter can be open at a time across the whole cluster. Hibernate Search, which has included integration with this Lucene Directory since version 3.3, sends index change requests across a JMS queue or a JGroups channel. Other valid approaches are to proxy the remote IndexWriter or to design your application so that only one node attempts to write to the index. Reading (searching) is of course possible in parallel, from any number of threads on each node; changes applied through the single IndexWriter affect the results seen by all threads on all nodes within a very short time.
This works with local-only configurations and also with any clustering mode supported by Infinispan. A transaction manager is not mandatory, but batching needs to be enabled. An example configuration:
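The approach of funnelling all index changes to a single writer can be sketched in plain Java. This is an in-process stand-in, not Hibernate Search's JMS or JGroups integration: the class SingleWriterQueue is hypothetical, a BlockingQueue plays the role of the message queue, and appending to a list stands in for applying a change through the one IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: many submitters, exactly one thread applying changes. */
final class SingleWriterQueue {
    // An empty Optional is used as the "stop" signal for the writer thread.
    private final BlockingQueue<Optional<String>> requests = new LinkedBlockingQueue<>();
    // Only the writer thread touches this list while it is running.
    private final List<String> applied = new ArrayList<>();
    private final Thread writerThread = new Thread(this::drain, "single-index-writer");

    /** Starts the single thread that owns all writes. */
    void start() {
        writerThread.start();
    }

    /** Called from any thread: enqueue a change instead of writing directly. */
    void submit(String document) {
        requests.add(Optional.of(document));
    }

    /** Asks the writer to stop, waits for it, and returns changes in applied order. */
    List<String> stopAndGetApplied() {
        requests.add(Optional.empty());
        try {
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(applied);
    }

    private void drain() {
        try {
            while (true) {
                Optional<String> next = requests.take();
                if (!next.isPresent()) {
                    return; // stop signal received
                }
                applied.add(next.get()); // stand-in for a write through the single IndexWriter
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because only the writer thread applies changes, submitters never contend for the index write lock; they only need a way to deliver their requests to the node that holds it.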
There is a simple command-line demo of its capabilities distributed with Infinispan under demos/lucene-directory; make sure you grab the "Binaries, server and demos" package from the download page, which contains all demos.
Start several instances, then try adding text in one instance and searching for it in another. The configuration is not tuned at all, but should work out of the box without any changes. If your network interface has multicast enabled, it will cluster across the local network with other instances of the demo.
All you need is org.infinispan:infinispan-lucene-directory:
Using a CacheLoader you can have the index content backed up to permanent storage; you can use a shared store for all nodes or one store per node. See CacheLoaders for more details.
When using a CacheLoader to store a Lucene index, configure the CacheLoader with async=true to get the best write performance.
It might be useful to store the Lucene index in a relational database. On its own this would be very slow, but Infinispan can act as an efficient cache between the application and the JDBC interface, which makes this setup useful in both clustered and non-clustered configurations.
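The reason an asynchronous store helps write performance can be shown with a minimal write-behind sketch. This is not Infinispan's CacheLoader code: the class WriteBehindCache is hypothetical, one map stands in for the in-memory cache and another for the slow permanent store, and a single background thread plays the role of the asynchronous store writer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of write-behind: callers never wait for the slow store. */
final class WriteBehindCache {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final Map<String, String> slowStore; // stand-in for a file or JDBC store; must be thread-safe
    private final ExecutorService storeWriter = Executors.newSingleThreadExecutor();

    WriteBehindCache(Map<String, String> slowStore) {
        this.slowStore = slowStore;
    }

    /** Updates memory immediately and schedules the store write in the background. */
    void put(String key, String value) {
        memory.put(key, value);
        storeWriter.execute(() -> slowStore.put(key, value));
    }

    /** Reads from memory first and falls back to the store on a miss. */
    String get(String key) {
        String cached = memory.get(key);
        return cached != null ? cached : slowStore.get(key);
    }

    /** Waits for pending store writes; returns false if they did not finish in time. */
    boolean flushAndClose(long timeoutSeconds) {
        storeWriter.shutdown();
        try {
            return storeWriter.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is the usual one for write-behind: a put returns before the store has the data, so changes still queued at the time of a crash can be lost from the permanent store.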
When storing indexes in a JDBC database, it's suggested to use the , which will need this attribute: