Chapter 7. Index Optimization

7.1. Automatic optimization
7.2. Manual optimization
7.3. Adjusting optimization

This section explains some low level tricks to keep your indexes at peak performance. We cover some Lucene details which in most cases you don't have to know about: Hibernate Search will handle these operations optimally and transparently in most cases without the need for further configuration. Still, it is good to know that there are ways to configure the behaviour, if the need arises.

The index is physically stored in several smaller segments. Each segment is immutable and represents a generation of index writes. Index segments are periodically compacted, both to merge smaller segments and to remove stale entries; this merging process happens constantly in the background and can be tuned with the options specified in Section 3.6, “Tuning Lucene indexing performance”, but you can also define policies to fully run index optimizations when it is most suited for your specific workload.

With older versions of Lucene it was important to frequently optimize the index to maintain good performance, but with current Lucene versions this doesn't apply anymore. The benefit of explicit optimization is very low, and in certain cases even counter-productive. During an explicit optimization the whole index is processed and rewritten inflicting a significant performance cost. Optimization is for this reason a double-edged sword.

Another reason to avoid optimizing the index too often is that an optimization will, as a side effect, invalidate cached filters and field caches and internal buffers need to be refreshed.

Tip

Optimizing the index is often not needed, does not benefit write (update) performance at all, and is a slow operation: make sure you need it before activating it.

Of course optimizing the index does not only present drawbacks: after the optimization process is completed and new IndexReader instances have loaded their buffers, queries will perform at peak performance and you will have reclaimed all disk space potentially used by stale entries.

It is recommended to not schedule any optimization, but if you wish to perform it periodically you should run it:

on an idle system or when the searches are less frequent
after a lot of index modifications

When using a MassIndexer (see Section 6.3.2, “Using a MassIndexer”) it will optimize involved indexes by default at the start and at the end of processing; you can change this behavior by using MassIndexer.optimizeAfterPurge and MassIndexer.optimizeOnFinish respectively. The initial optimization is actually very cheap as it is performed on an emtpy index: its purpose is to release the storage space occupied by the old index.

7.1. Automatic optimization

While in most cases this is not needed, Hibernate Search can automatically optimize an index after:

a certain amount of write operations
or after a certain amount of transactions

The configuration for automatic index optimization can be defined on a global level or per index:

Example 7.1. Defining automatic optimization parameters

hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max = 100
hibernate.search.Animal.optimizer.transaction_limit.max = 50

With the above example an optimization will be triggered to the Animal index as soon as either:

the number of additions and deletions reaches 1000
the number of transactions reaches 50 (hibernate.search.Animal.optimizer.transaction_limit.max having priority over hibernate.search.default.optimizer.transaction_limit.max)

If none of these parameters are defined, no optimization is processed automatically.

The default implementation of OptimizerStrategy can be overriden by implementing org.hibernate.search.store.optimization.OptimizerStrategy and setting the optimizer.implementation property to the fully qualified name of your implementation. This implementation must implement the interface, be a public class and have a public constructor taking no arguments.

Example 7.2. Loading a custom OptimizerStrategy

hibernate.search.default.optimizer.implementation = com.acme.worlddomination.SmartOptimizer
hibernate.search.default.optimizer.SomeOption = CustomConfigurationValue
hibernate.search.humans.optimizer.implementation = default

The keyword default can be used to select the Hibernate Search default implementation; all properties after the .optimizer key separator will be passed to the implementation's initialize method at start.

7.2. Manual optimization

You can programmatically optimize (defragment) a Lucene index from Hibernate Search through the SearchFactory:

Example 7.3. Programmatic index optimization

FullTextSession fullTextSession = Search.getFullTextSession(regularSession);

SearchFactory searchFactory = fullTextSession.getSearchFactory();


searchFactory.optimize(Order.class);

// or

searchFactory.optimize();

The first example optimizes the Lucene index holding Orders; the second, optimizes all indexes.

Note

searchFactory.optimize() has no effect on a JMS or JGroups backend: you must apply the optimize operation on the Master node.

7.3. Adjusting optimization

The Lucene index is constantly being merged in the background to keep a good balance between write and read performance; in a sense this is a form of background optimization which is always applied.

The following match attributes of Lucene's IndexWriter and are commonly used to tune how often merging occurs and how aggressive it is applied. They are exposed by Hibernate Search via:

hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs
hibernate.search.[default|<indexname>].indexwriter.max_merge_docs
hibernate.search.[default|<indexname>].indexwriter.merge_factor
hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size
hibernate.search.[default|<indexname>].indexwriter.term_index_interval

See Section 3.6, “Tuning Lucene indexing performance” for a description of these properties.