Chapter 3. Configuration

3.1. Directory configuration

Apache Lucene has a notion of Directory to store the index files. The Directory implementation can be customized, but Lucene comes bundled with a file system (FSDirectoryProvider) and an in-memory (RAMDirectoryProvider) implementation. Hibernate Search has the notion of DirectoryProvider, which handles the configuration and the initialization of the Lucene Directory.

Table 3.1. List of built-in Directory Providers

Class: org.hibernate.search.store.FSDirectoryProvider

Description: File system based directory. The directory used will be <indexBase>/<@Indexed.name>.

Properties:

  • indexBase: base directory

  • indexName: overrides @Indexed.name (useful for sharded indexes)

Class: org.hibernate.search.store.FSMasterDirectoryProvider

Description: File system based directory, like FSDirectoryProvider. It also copies the index to a source directory (aka copy directory) on a regular basis.

The recommended value for the refresh period is (at least) 50% higher than the time needed to copy the information (default 3600 seconds, i.e. 60 minutes).

Note that the copy is based on an incremental copy mechanism, which reduces the average copy time.

DirectoryProvider typically used on the master node in a JMS back end cluster.

Properties:

  • indexBase: base directory

  • indexName: overrides @Indexed.name (useful for sharded indexes)

  • sourceBase: source (copy) base directory

  • source: source directory suffix (defaults to @Indexed.name). The actual source directory name is <sourceBase>/<source>

  • refresh: refresh period in seconds (the copy will take place every refresh seconds)

Class: org.hibernate.search.store.FSSlaveDirectoryProvider

Description: File system based directory, like FSDirectoryProvider, but it retrieves a master version (source) on a regular basis. To avoid locking and inconsistent search results, 2 local copies are kept.

The recommended value for the refresh period is (at least) 50% higher than the time needed to copy the information (default 3600 seconds, i.e. 60 minutes).

Note that the copy is based on an incremental copy mechanism, which reduces the average copy time.

DirectoryProvider typically used on slave nodes using a JMS back end.

Properties:

  • indexBase: base directory

  • indexName: overrides @Indexed.name (useful for sharded indexes)

  • sourceBase: source (copy) base directory

  • source: source directory suffix (defaults to @Indexed.name). The actual source directory name is <sourceBase>/<source>

  • refresh: refresh period in seconds (the copy will take place every refresh seconds)

Class: org.hibernate.search.store.RAMDirectoryProvider

Description: Memory based directory. The directory will be uniquely identified (in the same deployment unit) by the @Indexed.name element.

Properties: none

If the built-in directory providers do not fit your needs, you can write your own directory provider by implementing the org.hibernate.search.store.DirectoryProvider interface.

Each indexed entity is associated with a Lucene index (an index can be shared by several entities, but this is not usually the case). You can configure the index through properties prefixed by hibernate.search.indexname. Default properties inherited by all indexes can be defined using the prefix hibernate.search.default.

To define the directory provider of a given index, use the hibernate.search.indexname.directory_provider property:

hibernate.search.default.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.default.indexBase=/usr/lucene/indexes

hibernate.search.Rules.directory_provider org.hibernate.search.store.RAMDirectoryProvider        

applied on

@Indexed(name="Status")
public class Status { ... }

@Indexed(name="Rules")
public class Rule { ... }

will create a file system directory in /usr/lucene/indexes/Status where the Status entities will be indexed, and use an in memory directory named Rules where Rule entities will be indexed.

You can easily define common rules, such as the directory provider and base directory, and override those defaults later on a per index basis.

When writing your own DirectoryProvider, you can use this configuration mechanism as well.

3.2. Index sharding

In some extreme cases involving huge indexes (in size), it is necessary to split (shard) the indexing data of a given entity type into several Lucene indexes. This solution is not recommended until you reach significant index sizes and index update times are slowing down. The main drawback of index sharding is that searches will end up being slower, since more files have to be opened for a single search. In other words, don't do it until you have problems :)

Despite this strong warning, Hibernate Search allows you to index a given entity type into several sub indexes. Data is sharded into the different sub indexes by an IndexShardingStrategy. By default, no sharding strategy is enabled unless the number of shards is configured. To configure the number of shards, use the following property:

hibernate.search.<indexName>.sharding_strategy.nbr_of_shards 5

This will use 5 different shards.

The default sharding strategy, when shards are set up, splits the data according to the hash value of the id string representation (generated by the Field Bridge). This ensures a fairly balanced sharding. You can replace the strategy by implementing IndexShardingStrategy and setting the following property:

hibernate.search.<indexName>.sharding_strategy my.shardingstrategy.Implementation
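The idea behind the default strategy (hash the string form of the id, then map it onto the configured number of shards) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name IdHashShardingSketch is hypothetical, it does not implement the IndexShardingStrategy interface, and String.hashCode() stands in for whatever hash function Hibernate Search actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; not the Hibernate Search implementation. */
final class IdHashShardingSketch {

    private final int nbrOfShards;

    IdHashShardingSketch(int nbrOfShards) {
        if (nbrOfShards < 1) {
            throw new IllegalArgumentException("nbrOfShards must be >= 1");
        }
        this.nbrOfShards = nbrOfShards;
    }

    /** Maps the string representation of an id to a shard number in [0, nbrOfShards). */
    int shardFor(String idAsString) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(idAsString.hashCode(), nbrOfShards);
    }

    public static void main(String[] args) {
        IdHashShardingSketch sketch = new IdHashShardingSketch(5);
        // Count how many of 1000 sequential ids land in each shard.
        Map<Integer, Integer> documentsPerShard = new TreeMap<>();
        for (int id = 1; id <= 1000; id++) {
            documentsPerShard.merge(sketch.shardFor(String.valueOf(id)), 1, Integer::sum);
        }
        System.out.println(documentsPerShard);
    }
}
```

Because the shard depends only on the id string, the same entity always maps to the same sub index, which is what lets later updates and deletions find the document again.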

Each shard has an independent directory provider configuration as described in Section 3.1, “Directory configuration”. The default DirectoryProvider names for the previous example are <indexName>.0 to <indexName>.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.

hibernate.search.default.indexBase /usr/lucene/indexes

hibernate.search.Animal.sharding_strategy.nbr_of_shards 5
hibernate.search.Animal.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.Animal.0.indexName Animal00
hibernate.search.Animal.3.indexBase /usr/lucene/sharded
hibernate.search.Animal.3.indexName Animal03

This configuration uses the default id string hashing strategy and shards the Animal index into 5 subindexes. All subindexes are FSDirectoryProvider instances, and the directory where each subindex is stored is as follows:

  • for subindex 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)

  • for subindex 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)

  • for subindex 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)

  • for subindex 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)

  • for subindex 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
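The list above can be reproduced with a small, self-contained Java sketch. The lookup order (shard scope first, then the owning index, then default) is an assumption inferred from the example, and the class and method names (ShardDirectorySketch, lookup, shardDirectories) are hypothetical; none of this is Hibernate Search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative sketch only; it assumes sharding is configured for the index. */
final class ShardDirectorySketch {

    private static final String PREFIX = "hibernate.search.";

    /** Looks a key up on the shard, then on the owning index, then on "default". */
    static String lookup(Properties config, String index, int shard, String key) {
        String[] scopes = { index + "." + shard, index, "default" };
        for (String scope : scopes) {
            String value = config.getProperty(PREFIX + scope + "." + key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Computes <indexBase>/<indexName> for every shard of the given index. */
    static List<String> shardDirectories(Properties config, String index) {
        String shardCount = config.getProperty(PREFIX + index + ".sharding_strategy.nbr_of_shards");
        if (shardCount == null) {
            throw new IllegalArgumentException("no nbr_of_shards configured for " + index);
        }
        List<String> directories = new ArrayList<>();
        for (int shard = 0; shard < Integer.parseInt(shardCount.trim()); shard++) {
            String base = lookup(config, index, shard, "indexBase");
            // The shard name defaults to <indexName>.<shard number> unless overridden.
            String name = config.getProperty(
                    PREFIX + index + "." + shard + ".indexName", index + "." + shard);
            directories.add(base + "/" + name);
        }
        return directories;
    }

    /** The Animal configuration from the example, expressed as Properties. */
    static Properties exampleConfig() {
        Properties config = new Properties();
        config.setProperty("hibernate.search.default.indexBase", "/usr/lucene/indexes");
        config.setProperty("hibernate.search.Animal.sharding_strategy.nbr_of_shards", "5");
        config.setProperty("hibernate.search.Animal.0.indexName", "Animal00");
        config.setProperty("hibernate.search.Animal.3.indexBase", "/usr/lucene/sharded");
        config.setProperty("hibernate.search.Animal.3.indexName", "Animal03");
        return config;
    }

    public static void main(String[] args) {
        shardDirectories(exampleConfig(), "Animal").forEach(System.out::println);
    }
}
```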

3.3. Worker configuration

It is possible to refine how Hibernate Search interacts with Lucene through the worker configuration. The work can be executed against the Lucene directory or sent to a JMS queue for later processing. When executed against the Lucene directory, the work can be processed synchronously or asynchronously with respect to the transaction commit.

You can define the worker configuration using the following properties:

Table 3.2. Worker configuration

Property: hibernate.search.worker.backend

Description: Out of the box support for the Apache Lucene back end and the JMS back end. Defaults to lucene. Also supports jms.

Property: hibernate.search.worker.execution

Description: Supports synchronous and asynchronous execution. Defaults to sync. Also supports async.

Property: hibernate.search.worker.thread_pool.size

Description: Defines the number of threads in the pool. Useful only for asynchronous execution. Defaults to 1.

Property: hibernate.search.worker.buffer_queue.max

Description: Defines the maximum number of work items queued when the thread pool is starved. Useful only for asynchronous execution. Defaults to infinite. If the limit is reached, the work is done by the main thread.

Property: hibernate.search.worker.jndi.*

Description: Defines the JNDI properties used to initiate the InitialContext (if needed). JNDI is only used by the JMS back end.

Property: hibernate.search.worker.jms.connection_factory

Description: Mandatory for the JMS back end. Defines the JNDI name used to look up the JMS connection factory (java:/ConnectionFactory by default in JBoss AS).

Property: hibernate.search.worker.jms.queue

Description: Mandatory for the JMS back end. Defines the JNDI name used to look up the JMS queue. The queue will be used to post work messages.

Property: hibernate.search.worker.batch_size

Description: Defines the maximum number of elements indexed before flushing the transaction-bound queue. Defaults to 0 (i.e. no limit). See Chapter 6, Manual indexing for more information.
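The asynchronous semantics described by the thread_pool.size and buffer_queue.max settings (a fixed pool, a bounded queue, and a fallback to the submitting thread once the queue is full) can be illustrated with the JDK's ThreadPoolExecutor. This is a sketch of the behaviour only, not of how Hibernate Search builds its executor; the class AsyncWorkerSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; plain JDK classes, no Hibernate Search API. */
final class AsyncWorkerSketch {

    /** Builds a fixed pool whose overflow work runs on the submitting thread. */
    static ThreadPoolExecutor newExecutor(int threadPoolSize, int bufferQueueMax) {
        return new ThreadPoolExecutor(
                threadPoolSize, threadPoolSize,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(bufferQueueMax),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits three units of work to a 1-thread pool with a queue of 1. */
    static List<String> demo() {
        ThreadPoolExecutor executor = newExecutor(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        List<String> ranOn = new CopyOnWriteArrayList<>();
        Thread caller = Thread.currentThread();
        Runnable record = () -> ranOn.add(Thread.currentThread() == caller ? "caller" : "pool");

        // First unit occupies the only pool thread until the latch is released.
        executor.execute(() -> { awaitQuietly(release); record.run(); });
        // Second unit waits in the queue, which is now full.
        executor.execute(record);
        // Third unit cannot be queued, so it runs immediately on the calling thread.
        executor.execute(record);

        release.countDown();
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ranOn;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running work on the caller when the queue is full acts as back-pressure: the transaction thread slows down instead of letting an unbounded backlog of index work accumulate in memory.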

3.4. JMS Master/Slave configuration

This section describes in greater detail how to configure the Master / Slaves Hibernate Search architecture.

3.4.1. Slave nodes

Every index update operation is sent to a JMS queue. Index querying operations are executed on a local index copy.

### slave configuration

## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = org.hibernate.search.store.FSSlaveDirectoryProvider

## Backend configuration
hibernate.search.worker.backend = jms
hibernate.search.worker.jms.connection_factory = java:/ConnectionFactory
hibernate.search.worker.jms.queue = queue/hibernatesearch
#optional jndi configuration (check your JMS provider for more information)

## Optional asynchronous execution strategy
# hibernate.search.worker.execution = async
# hibernate.search.worker.thread_pool.size = 2
# hibernate.search.worker.buffer_queue.max = 50

A file system local copy is recommended for faster search results.

The refresh period should be higher than the expected copy time.
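Combined with the 50% margin recommended in Table 3.1, a refresh value can be derived from a measured copy time. The helper below is a hypothetical convenience, not part of Hibernate Search; it assumes you have timed an index copy yourself and simply applies the margin, rounding up to whole seconds.

```java
import java.time.Duration;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class RefreshPeriodSketch {

    /** Returns the measured copy time plus a 50% margin, in whole seconds (rounded up). */
    static long recommendedRefreshSeconds(Duration measuredCopyTime) {
        long millis = measuredCopyTime.toMillis();
        // ceil(millis * 1.5 / 1000) computed with integer arithmetic.
        return (millis * 3 + 1999) / 2000;
    }

    public static void main(String[] args) {
        // A copy measured at 20 minutes gives 1800, the value used in the configuration above.
        System.out.println(recommendedRefreshSeconds(Duration.ofMinutes(20)));
    }
}
```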

3.4.2. Master node

Every index update operation is taken from a JMS queue and executed. The master indexes are copied on a regular basis.

### master configuration

## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local master location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = org.hibernate.search.store.FSMasterDirectoryProvider

## Backend configuration
#Backend is the default lucene one

The refresh period should be higher than the expected copy time.

In addition to the Hibernate Search framework configuration, a Message Driven Bean should be written and set up to process the index work queue through JMS.

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.MessageListener;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.hibernate.Session;
import org.hibernate.search.backend.impl.jms.AbstractJMSHibernateSearchController;

@MessageDriven(activationConfig = {
      @ActivationConfigProperty(propertyName="destinationType", propertyValue="javax.jms.Queue"),
      @ActivationConfigProperty(propertyName="destination", propertyValue="queue/hibernatesearch"),
      @ActivationConfigProperty(propertyName="DLQMaxResent", propertyValue="1")
   } )
public class MDBSearchController extends AbstractJMSHibernateSearchController implements MessageListener {
    @PersistenceContext EntityManager em;

    //method retrieving the appropriate session
    protected Session getSession() {
        return (Session) em.getDelegate();
    }

    //potentially close the session opened in #getSession(), not needed here
    protected void cleanSessionIfNeeded(Session session) {
    }
}

This example inherits from the abstract JMS controller class and implements a Java EE 5 MDB. This implementation is given as an example and, while the result is most likely more complex, it can be adjusted to make use of non Java EE Message Driven Beans. For more information about getSession() and cleanSessionIfNeeded(), please check the javadoc of AbstractJMSHibernateSearchController.

Note

The Hibernate Search test suite makes use of JBoss Embedded to test the JMS integration. It allows the unit tests to run both the MDB container and JBoss Messaging (JMS provider) in a standalone way (marketed by some as "lightweight").

3.5. Reader strategy configuration

The different reader strategies are described in Reader strategy. The default reader strategy is shared. This can be adjusted:

hibernate.search.reader.strategy = not-shared

Adding this property switches to the non-shared strategy.

Or if you have a custom reader strategy:

hibernate.search.reader.strategy = my.corp.myapp.CustomReaderProvider

where my.corp.myapp.CustomReaderProvider is the custom strategy implementation.

3.6. Enabling Hibernate Search and automatic indexing

3.6.1. Enabling Hibernate Search

Hibernate Search is enabled out of the box when using Hibernate Annotations or Hibernate EntityManager. If, for some reason, you need to disable it, set hibernate.search.autoregister_listeners to false. Note that there is no runtime performance penalty when the listeners are enabled while no entity is indexable.

To enable Hibernate Search in Hibernate Core, add the FullTextIndexEventListener for the three Hibernate events that occur after changes are executed to the database. Once again, such a configuration is not useful with Hibernate Annotations or Hibernate EntityManager.

<hibernate-configuration>
    <session-factory>
        ...
        <event type="post-update">
            <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
        </event>
        <event type="post-insert">
            <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
        </event>
        <event type="post-delete">
            <listener class="org.hibernate.search.event.FullTextIndexEventListener"/>
        </event>
    </session-factory>
</hibernate-configuration>

Be sure to add the appropriate jar files in your classpath. Check lib/README.TXT for the list of third party libraries. A typical installation on top of Hibernate Annotations will add:

  • hibernate-search.jar: the core engine

  • lucene-core-*.jar: Lucene core engine

3.6.1.1. Hibernate Core 3.2.6 and beyond

If you use Hibernate Core 3.2.6 or later, make sure to add three additional event listeners that cope with collection events:

<hibernate-configuration>
    <session-factory>
        ...
        <event type="post-collection-recreate">
            <listener class="org.hibernate.search.event.FullTextIndexCollectionEventListener"/>
        </event>
        <event type="post-collection-remove">
            <listener class="org.hibernate.search.event.FullTextIndexCollectionEventListener"/>
        </event>
        <event type="post-collection-update">
            <listener class="org.hibernate.search.event.FullTextIndexCollectionEventListener"/>
        </event>
    </session-factory>
</hibernate-configuration>

Those additional event listeners were introduced in Hibernate 3.2.6. Note the use of FullTextIndexCollectionEventListener. You need to reference those event listeners explicitly unless you use Hibernate Annotations 3.3.1 or above.

3.6.2. Automatic indexing

By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the corresponding Lucene index. It is sometimes desirable to disable this feature, either because your index is read-only or because index updates are done in batches (see Chapter 6, Manual indexing).

To disable event based indexing, set

hibernate.search.indexing_strategy manual

Note

In most cases, the JMS backend provides the best of both worlds: a lightweight event based system keeps track of all changes in the system, and the heavyweight indexing process is done by a separate process or machine.

3.7. Tuning Lucene indexing performance

Hibernate Search allows you to tune the Lucene indexing performance by specifying a set of parameters which are passed through to the underlying Lucene IndexWriter, such as mergeFactor, maxMergeDocs and maxBufferedDocs. You can specify these parameters either as default values applying to all indexes or on a per index basis.

There are two sets of parameters allowing for different performance settings depending on the use case. During indexing operations triggered by database modifications, the following ones are used:

  • hibernate.search.[default|<indexname>].transaction.merge_factor

  • hibernate.search.[default|<indexname>].transaction.max_merge_docs

  • hibernate.search.[default|<indexname>].transaction.max_buffered_docs

When indexing occurs via FullTextSession.index() (see Chapter 6, Manual indexing), the following properties are used:

  • hibernate.search.[default|<indexname>].batch.merge_factor

  • hibernate.search.[default|<indexname>].batch.max_merge_docs

  • hibernate.search.[default|<indexname>].batch.max_buffered_docs

Unless the corresponding .batch property is explicitly set, the value will default to the .transaction property.
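The fallback from a .batch property to its .transaction counterpart can be sketched as a simple lookup chain. The sketch below resolves values within a single scope only (default or one index name), because the interaction between per-index and default scopes is not spelled out here. The class IndexingParameterSketch is hypothetical, and the built-in default passed by the caller is taken from Table 3.3.

```java
import java.util.Properties;

/** Illustrative sketch only; not the Hibernate Search configuration code. */
final class IndexingParameterSketch {

    /**
     * Resolves the value used for batch indexing: the .batch property if set,
     * otherwise the .transaction property, otherwise the built-in default.
     */
    static int batchValue(Properties config, String scope, String parameter, int builtInDefault) {
        String prefix = "hibernate.search." + scope + ".";
        String value = config.getProperty(prefix + "batch." + parameter);
        if (value == null) {
            value = config.getProperty(prefix + "transaction." + parameter);
        }
        return value == null ? builtInDefault : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("hibernate.search.default.transaction.merge_factor", "20");
        config.setProperty("hibernate.search.default.batch.max_buffered_docs", "1000");

        // No batch.merge_factor is set, so the transaction value applies.
        System.out.println(batchValue(config, "default", "merge_factor", 10));
        // batch.max_buffered_docs is set explicitly.
        System.out.println(batchValue(config, "default", "max_buffered_docs", 10));
        // Neither variant of max_merge_docs is set, so the built-in default applies.
        System.out.println(batchValue(config, "default", "max_merge_docs", Integer.MAX_VALUE));
    }
}
```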

For more information about Lucene indexing performance, please refer to the Lucene documentation.

Table 3.3. List of indexing performance properties

Property: hibernate.search.[default|<indexname>].transaction.merge_factor

Description: Controls segment merge frequency and size.

Determines how often segment indices are merged when insertion occurs. With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained. The value must not be lower than 2.

Used by Hibernate Search during index update operations as part of database modifications.

Default value: 10

Property: hibernate.search.[default|<indexname>].transaction.max_merge_docs

Description: Defines the largest number of documents allowed in a segment.

Used by Hibernate Search during index update operations as part of database modifications.

Default value: Unlimited (Integer.MAX_VALUE)

Property: hibernate.search.[default|<indexname>].transaction.max_buffered_docs

Description: Controls the number of documents buffered in memory during indexing. The bigger the value, the more RAM is consumed.

Used by Hibernate Search during index update operations as part of database modifications.

Default value: 10

Property: hibernate.search.[default|<indexname>].batch.merge_factor

Description: Controls segment merge frequency and size.

Determines how often segment indices are merged when insertion occurs. With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained. The value must not be lower than 2.

Used during indexing via FullTextSession.index().

Default value: 10

Property: hibernate.search.[default|<indexname>].batch.max_merge_docs

Description: Defines the largest number of documents allowed in a segment.

Used during indexing via FullTextSession.index().

Default value: Unlimited (Integer.MAX_VALUE)

Property: hibernate.search.[default|<indexname>].batch.max_buffered_docs

Description: Controls the number of documents buffered in memory during indexing. The bigger the value, the more RAM is consumed.

Used during indexing via FullTextSession.index().

Default value: 10