This module adds querying capabilities to Infinispan. It uses Hibernate Search and Apache Lucene to index and search objects in the cache. It allows users to find objects within the cache without knowing the key of each object they want to retrieve: you can search for objects based on some of their properties, for example to retrieve all red cars (exact metadata match) or all books about a specific topic (full-text search and relevance scoring).
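The following toy, stdlib-only Java sketch makes the idea concrete. It shows what an index buys you and is not how Infinispan or Lucene are implemented. Alongside the key-to-value map, it keeps a hypothetical secondary index from a property value (a car's color) to the keys that hold it. With that index, "all red cars" can be answered without knowing any key in advance. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration only: a "cache" plus a hypothetical secondary index on one property. */
public class ToyPropertyIndex {

    public record Car(String model, String color) {}

    private final Map<String, Car> cache = new HashMap<>();
    // Secondary index: color -> keys of the cars with that color (sorted for stable output).
    private final Map<String, Set<String>> byColor = new HashMap<>();

    public void put(String key, Car car) {
        Car previous = cache.put(key, car);
        if (previous != null) {
            // Keep the index consistent when a value is overwritten.
            Set<String> keys = byColor.get(previous.color());
            keys.remove(key);
            if (keys.isEmpty()) {
                byColor.remove(previous.color());
            }
        }
        byColor.computeIfAbsent(car.color(), c -> new TreeSet<>()).add(key);
    }

    /** Exact metadata match: looks up matching keys in the index, then resolves them in the cache. */
    public List<Car> findByColor(String color) {
        List<Car> result = new ArrayList<>();
        for (String key : byColor.getOrDefault(color, Set.of())) {
            result.add(cache.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyPropertyIndex index = new ToyPropertyIndex();
        index.put("car-1", new Car("Fiat 500", "red"));
        index.put("car-2", new Car("Panda", "blue"));
        index.put("car-3", new Car("Mustang", "red"));
        System.out.println(index.findByColor("red"));
    }
}
```

Note that the index has to be updated on every write. Maintaining it automatically is the job the query module takes over.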
Indexing must be enabled in the configuration (as explained below). You then interact with the search capabilities through a SearchManager, which exposes all the needed functionality.
Infinispan 4 included an experimental preview of this technology, and the API has changed in version 5. The documentation for version 4 is [here].
We're going to store Book instances in Infinispan, each Book defined as in the following example. We have to choose which properties are indexed. For each property we can optionally choose advanced indexing options, using the annotations defined in the Hibernate Search project.
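The actual mapping uses the Hibernate Search annotations (for instance @Indexed and @Field), which need the Hibernate Search jar on the classpath. The code below is a self-contained sketch of what annotation-driven indexing does; it is not the Hibernate Search implementation. It defines a hypothetical @IndexedField annotation and uses reflection to turn only the annotated properties of a Book into an index "document".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

public class AnnotationDrivenIndexing {

    /** Hypothetical stand-in for Hibernate Search's @Field: marks a property as indexed. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface IndexedField {}

    /** Example entity: only title and description are marked for indexing. */
    public static class Book {
        @IndexedField String title;
        @IndexedField String description;
        String internalNotes; // not annotated, so it never reaches the index

        public Book(String title, String description, String internalNotes) {
            this.title = title;
            this.description = description;
            this.internalNotes = internalNotes;
        }
    }

    /** Builds an index "document" (field name -> value) from the annotated fields only. */
    public static Map<String, Object> toDocument(Object entity) {
        Map<String, Object> document = new TreeMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(IndexedField.class)) {
                try {
                    field.setAccessible(true);
                    document.put(field.getName(), field.get(entity));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read field " + field.getName(), e);
                }
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Book book = new Book("Infinispan for beginners", "A gentle introduction", "draft");
        System.out.println(toDocument(book));
    }
}
```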
Now, assuming we have stored several Book instances in our Infinispan cache, we can search them on any indexed field, as in the following example.
A Lucene Query is often created by parsing a query in text format such as "title:infinispan AND authors.name:sanne", or by using the query builder provided by Hibernate Search.
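The text syntax combines field:term clauses with boolean operators. The following toy sketch shows how the example string breaks down into per-field conditions and how a document is matched against them. It handles only the "field:term AND field:term" subset and makes simplifying assumptions: it lowercases and splits on non-word characters instead of using a real Analyzer. In practice you would use Lucene's QueryParser or the Hibernate Search query builder. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy parser and matcher for the "field:term AND field:term" subset only. */
public class ToyFieldQuery {

    /** One "field:term" clause; the term is stored lowercased. */
    public record Clause(String field, String term) {}

    public static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String part : query.trim().split("\\s+AND\\s+")) {
            int colon = part.indexOf(':');
            if (colon <= 0 || colon == part.length() - 1) {
                throw new IllegalArgumentException("Expected field:term but got: " + part);
            }
            String field = part.substring(0, colon);
            String term = part.substring(colon + 1).toLowerCase(Locale.ROOT);
            clauses.add(new Clause(field, term));
        }
        return clauses;
    }

    /** Matches if, for every clause, the field's text contains the term as a whole word. */
    public static boolean matches(Map<String, String> document, List<Clause> clauses) {
        for (Clause clause : clauses) {
            String text = document.getOrDefault(clause.field(), "");
            List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
            if (!tokens.contains(clause.term())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Clause> query = parse("title:infinispan AND authors.name:sanne");
        Map<String, String> book = Map.of("title", "Infinispan in Action", "authors.name", "Sanne");
        System.out.println(query + " matches: " + matches(book, query));
    }
}
```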
This barely scratches the surface of what is possible: see the Hibernate Search reference documentation to learn about sorting, numeric fields, declarative filters, caching filters, complex object graph indexing, custom types and the powerful faceted search API.
Marking a field as the identifier with @DocumentId is not supported. Instead, all @Indexed objects should also be annotated with @ProvidedId: Infinispan provides the identifier, which is the key used to store each value in the cache.
To enable indexing via XML, add the <indexing ... /> element to your cache configuration. Optionally, you can pass additional properties to the embedded Hibernate Search engine:
Index management is currently controlled by the Configuration.setIndexLocalOnly() setter, or by the indexLocalOnly attribute in XML, as in <indexing indexLocalOnly="true" />. If you set it to true, only modifications made locally on each node are indexed. Otherwise, changes originating on remote nodes are indexed too.
To configure the Lucene Directory itself, see the Hibernate Search documentation. It explains how to pass in the appropriate Lucene configuration via the Properties object passed to QueryHelper.
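The helper below is a rough sketch of what such a Properties object can contain. It selects either a RAM-based or a filesystem-based Lucene Directory. The property names hibernate.search.default.directory_provider and hibernate.search.default.indexBase, and the short values "ram" and "filesystem", come from the Hibernate Search reference. Older releases may expect a fully qualified provider class name instead, so check the documentation for your version. How the object is handed to the engine depends on the version and is not shown. The class name is hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper building Hibernate Search properties that select a Lucene Directory. */
public class DirectoryProperties {

    // Property names as documented by Hibernate Search; verify them for your version.
    static final String DIRECTORY_PROVIDER = "hibernate.search.default.directory_provider";
    static final String INDEX_BASE = "hibernate.search.default.indexBase";

    /** In-memory index: simple, but lost when the node stops. */
    public static Properties ramDirectory() {
        Properties props = new Properties();
        props.setProperty(DIRECTORY_PROVIDER, "ram");
        return props;
    }

    /** Filesystem index stored under the given base directory. */
    public static Properties filesystemDirectory(String indexBase) {
        if (indexBase == null || indexBase.isBlank()) {
            throw new IllegalArgumentException("indexBase must be a non-empty path");
        }
        Properties props = new Properties();
        props.setProperty(DIRECTORY_PROVIDER, "filesystem");
        props.setProperty(INDEX_BASE, indexBase);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(filesystemDirectory("/var/lib/myapp/indexes"));
    }
}
```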
In local mode, you may use any Lucene Directory implementation, and it doesn't matter what you set indexLocalOnly to.
In replication mode, each node can have its own local copy of the index. Indexes can be stored locally on each node (RAMDirectory, FSDirectory, etc.), but then you need to set indexLocalOnly to false. That way each node applies the updates it receives from other nodes in addition to the updates started locally. Any Directory implementation can be used, but you have to make sure that a newly started node receives an up-to-date copy of the index. rsync is typically well suited to this task. Because it is an external operation, though, you might end up with a slightly out-of-sync index, especially if updates are very frequent.
Alternatively, if you use some form of shared storage for indexes (see Sharing the Index), you have to set indexLocalOnly to true, so that each node applies only the changes originating locally. In this case there's no risk of an out-of-sync index. To avoid write contention on the index, however, you should make sure that a single node is "in charge" of updating it. Again, the Hibernate Search reference documentation describes how to use a JMS queue or JGroups to send indexing tasks to a master node.
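Having a single node "in charge" is essentially a single-writer queue. The following in-process analogy uses hypothetical names: every node submits indexing tasks to a queue that one master thread drains, so the index is never written concurrently. In a cluster, that queue would be the JMS or JGroups channel described in the Hibernate Search documentation, not a local BlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Analogy only: many producers, exactly one thread applying changes to the "index". */
public class SingleWriterIndex {

    /** Hypothetical indexing task: which node produced it and which cache key changed. */
    public record IndexingTask(String originNode, String key) {}

    // Sentinel compared by identity, so a submitted task with equal fields never stops the master.
    private static final IndexingTask STOP = new IndexingTask("", "");

    private final BlockingQueue<IndexingTask> queue = new LinkedBlockingQueue<>();
    // Written only by the master thread; join() in shutdown() makes the writes visible.
    private final List<IndexingTask> applied = new ArrayList<>();
    private final Thread master = new Thread(this::drain, "index-master");

    public void start() {
        master.start();
    }

    /** Any node calls this instead of writing the index itself. */
    public void submit(IndexingTask task) {
        queue.add(task);
    }

    private void drain() {
        try {
            while (true) {
                IndexingTask task = queue.take();
                if (task == STOP) {
                    return;
                }
                applied.add(task); // stand-in for the real index write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the master after all previously submitted tasks have been applied, in order. */
    public List<IndexingTask> shutdown() {
        queue.add(STOP);
        try {
            master.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the master", e);
        }
        return List.copyOf(applied);
    }

    public static void main(String[] args) {
        SingleWriterIndex index = new SingleWriterIndex();
        index.start();
        index.submit(new IndexingTask("node-A", "book-1"));
        index.submit(new IndexingTask("node-B", "book-2"));
        System.out.println(index.shutdown());
    }
}
```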
The diagram below shows a replicated deployment, in which each node has a local index.
For these two cache modes, you need to use a shared index and set indexLocalOnly to true. In the future we will be able to handle truly distributed queries, but only after ISPN-200.
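The indexLocalOnly rules described above can be condensed into a small decision helper. This is a hypothetical sketch, not an Infinispan API: the Deployment enum simply names the local, replicated and distributed layouts discussed in this section.

```java
/** Hypothetical helper encoding the indexLocalOnly rules from this section. */
public class IndexLocalOnlyAdvisor {

    public enum Deployment {
        LOCAL,                    // single node, any Directory implementation
        REPLICATED_LOCAL_INDEX,   // replicated cache, each node keeps its own index copy
        REPLICATED_SHARED_INDEX,  // replicated cache, index on shared storage
        DISTRIBUTED_SHARED_INDEX  // distributed cache: a shared index is required
    }

    /** Recommended indexLocalOnly value; for LOCAL the value does not matter, so true is returned. */
    public static boolean indexLocalOnly(Deployment deployment) {
        return switch (deployment) {
            case LOCAL -> true;
            // Each node must also apply the updates it receives from other nodes.
            case REPLICATED_LOCAL_INDEX -> false;
            // With a shared index, each node indexes only the changes it originated.
            case REPLICATED_SHARED_INDEX, DISTRIBUTED_SHARED_INDEX -> true;
        };
    }

    public static void main(String[] args) {
        for (Deployment d : Deployment.values()) {
            System.out.println(d + " -> indexLocalOnly=" + indexLocalOnly(d));
        }
    }
}
```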
The diagram below shows a deployment with a shared index. Note that while not mandatory, a shared index can be used for replicated (vs. distributed) caches as well.
The simplest way to share an index is to use some form of shared storage for the indexes, like an FSDirectory on a shared disk. However, this approach is problematic. FSDirectory relies on specific locking semantics, which network filesystems often implement incompletely or not reliably enough. If you go for this approach, make sure to search the Lucene mailing lists for other users' experiences and workarounds. Good luck, and test well.
There are many alternative Directory implementations. One of the best-suited approaches when working with Infinispan is, of course, to store the index in an Infinispan cache: have a look at the InfinispanDirectoryProvider. Like all Infinispan-based layers, it can be combined with persistent CacheLoaders. These can keep the index on a shared filesystem without the locking issues, or alternatively in a database, cloud storage, or any other CacheLoader implementation. You could even back up the index in the same store used to back up your values.
For full documentation on clustering the Lucene engine, refer to the Hibernate Search documentation.
Again, the configuration details are in the Hibernate Search reference, in particular in the infinispan-directories section. By default this backend starts a secondary Infinispan CacheManager, and it can optionally take a separate Infinispan configuration file. Don't reuse the same configuration, or you will start grids recursively!
It is currently not possible for this backend to share your application's CacheManager.
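The following sketch of the backend configuration guards against the recursive-grid mistake. It selects the Infinispan directory and points the secondary CacheManager at a separate configuration file. The property names hibernate.search.default.directory_provider (value "infinispan") and hibernate.search.infinispan.configuration_resourcename follow the Hibernate Search 4.x reference. Check them against the infinispan-directories section for your version. The class, the method and the file names are hypothetical.

```java
import java.util.Objects;
import java.util.Properties;

/** Hypothetical helper: Infinispan-backed Lucene directory with its own configuration file. */
public class InfinispanDirectoryConfig {

    public static Properties forIndex(String appConfigFile, String indexConfigFile) {
        Objects.requireNonNull(indexConfigFile, "indexConfigFile");
        if (indexConfigFile.equals(appConfigFile)) {
            // Reusing the application's configuration would start grids recursively.
            throw new IllegalArgumentException(
                    "Use a separate Infinispan configuration for the index, not " + indexConfigFile);
        }
        Properties props = new Properties();
        // Property names assumed from the Hibernate Search 4.x reference; verify for your version.
        props.setProperty("hibernate.search.default.directory_provider", "infinispan");
        props.setProperty("hibernate.search.infinispan.configuration_resourcename", indexConfigFile);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(forIndex("infinispan-app.xml", "infinispan-index.xml"));
    }
}
```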