Chapter 2. Architecture

2.1. Overview

Hibernate Search consists of an indexing component as well as an index search component. Both are backed by Apache Lucene.

Each time an entity is inserted, updated or removed in/from the database, Hibernate Search keeps track of this event (through the Hibernate event system) and schedules an index update. All these updates are handled without you having to interact with the Apache Lucene APIs directly (see Section 3.1, “Enabling Hibernate Search and automatic indexing”). Instead, the interaction with the underlying Lucene indexes is handled via so called IndexManagers.

Each Lucene index is managed by one index manager which is uniquely identified by name. In most cases there is also a one to one relationship between an indexed entity and a single IndexManager. The exceptions are the use cases of index sharding and index sharing. The former can be applied when the index for a single entity becomes too big and indexing operations are slowing down the application. In this case a single entity is indexed into multiple indexes each with its own index manager (see Section 10.5, “Sharding indexes”). The latter, index sharing, is the ability to index multiple entities into the same Lucene index (see Section 10.6, “Sharing indexes”).

The index manager abstracts from the specific index configuration. In the case of the default index manager this includes details about the selected backend, the configured reader strategy and the chosen DirectoryProvider. These components will be discussed in greater detail later on. It is recommended that you start with the default index manager which uses different Lucene Directory types to manage the indexes (see Section 3.3, “Directory configuration”). You can, however, also provide your own IndexManager implementation (see Section 3.2, “Configuring the IndexManager”).

Once the index is created, you can search for entities and return lists of managed entities saving you the tedious object to Lucene Document mapping. The same persistence context is shared between Hibernate and Hibernate Search. As a matter of fact, the FullTextSession is built on top of the Hibernate Session so that the application code can use the unified org.hibernate.Query or javax.persistence.Query APIs exactly the same way a HQL, JPA-QL or native query would do.

To be more efficient Hibernate Search batches the write interactions with the Lucene index. This batching is the responsibility of the Worker. There are currently two types of batching. Outside a transaction, the index update operation is executed right after the actual database operation. This is really a no batching setup. In the case of an ongoing transaction, the index update operation is scheduled for the transaction commit phase and discarded in case of transaction rollback. The batching scope is the transaction. There are two immediate benefits:

Performance: Lucene indexing works better when operation are executed in batch.
ACIDity: The work executed has the same scoping as the one executed by the database transaction and is executed if and only if the transaction is committed. This is not ACID in the strict sense of it, but ACID behavior is rarely useful for full text search indexes since they can be rebuilt from the source at any time.

You can think of those two batch modes (no scope vs transactional) as the equivalent of the (infamous) autocommit vs transactional behavior. From a performance perspective, the in transaction mode is recommended. The scoping choice is made transparently. Hibernate Search detects the presence of a transaction and adjust the scoping (see Section 3.4, “Worker configuration”).

Tip

It is recommended - for both your database and Hibernate Search - to execute your operations in a transaction, be it JDBC or JTA.

Note

Hibernate Search works perfectly fine in the Hibernate / EntityManager long conversation pattern aka. atomic conversation.

2.2. Back end

Hibernate Search offers the ability to let the batched work being processed by different back ends. Several back ends are provided out of the box and you have the option to plugin your own. It is important to understand that in this context back end encompasses more than just the configuration option hibernate.search.default.worker.backend. This property just specifies a implementation of the BackendQueueProcessor interface which is a part of a back end configuration. In most cases, however, additional configuration settings are needed to successfully configure a specific backend setup, like for example the JMS back end.

2.2.1. Lucene

In this mode, all index update operations applied on a given node (JVM) will be executed to the Lucene directories (through the directory providers) by the same node. This mode is typically used in non clustered environment or in clustered environments where the directory store is shared.

This mode targets non clustered applications, or clustered applications where the Directory is taking care of the locking strategy.

The main advantage is simplicity and immediate visibility of the changes in Lucene queries (a requirement in some applications).

An alternative back end viable for non-clustered and non-shared index configurations is the near- real-time backend.

2.2.2. JMS

All index update operations applied on a given node are sent to a JMS queue. A unique reader will then process the queue and update the master index. The master index is then replicated on a regular basis to the slave copies. This is known as the master/slaves pattern. The master is the sole responsible for updating the Lucene index. The slaves can accept read as well as write operations. However, while they process the read operations on their local index copy, they will delegate the update operations to the master.

This mode targets clustered environments where throughput is critical, and index update delays are affordable. Reliability is ensured by the JMS provider and by having the slaves working on a local copy of the index.

The JMS integration can be transactional. With this backend (and currently only this backend) you can have Hibernate Search send the indexing work into the queue within the same transaction applying changes to the relational database. This options requires you to use an XA transaction.

By default this backend’s transactional capabilities are disabled: messages will be enqueued as a post-transaction event, consistently with other backends. To change this configuration see also Section 3.4, “Worker configuration”.

2.2.3. JGroups

The JGroups based back end works similar to the JMS one and is designed after the same master/slave pattern. However, instead of JMS the JGroups toolkit is used as a replication mechanism. This back end can be used as an alternative to JMS when response time is critical, but i.e. JNDI service is not available.

Note that while JMS can usually be configured to use persistent queues, JGroups talks directly to other nodes over network. Message delivery to other reachable nodes is guaranteed, but if no master node is available, index operations are silently discarded. This backend can be configured to use asynchronous messages, or to wait for each indexing operation to be completed on the remote node before returning.

The JGroups backend can be configured with static master or slave roles, or can be setup to perform an auto-election of the master. This mode is particularly useful to have the system automatically pick a new master in case of failure, but during a reelection process some indexing operations might be lost. For this reason this mode is not suited for use cases requiring strong consistency guarantees. When configured to perform an automatic election, the master node is defined as an hash on the index name: the role is therefore possibly different for each index or shard.

2.3. Reader strategy

When executing a query, Hibernate Search interacts with the Apache Lucene indexes through a reader strategy. Choosing a reader strategy will depend on the profile of the application (frequent updates, read mostly, asynchronous index update etc). See also Section 3.5, “Reader strategy configuration”

2.3.1. shared

With this strategy, Hibernate Search will share the same IndexReader, for a given Lucene index, across multiple queries and threads provided that the IndexReader is still up-to-date. If the IndexReader is not up-to-date, a new one is opened and provided. Each IndexReader is made of several SegmentReaders. This strategy only reopens segments that have been modified or created after last opening and shares the already loaded segments from the previous instance. This approach is quite efficient and guarantees that each query is run on the most recent index snapshot; the drawback is that for every query the strategy will have to verify if the IndexReader is still fresh, and if not perform a refresh; such a refresh is typically a cheap operation but if you have a significant amount of writes and queries happening concurrently then one of the other strategies might be preferred. This strategy is the default.

The name of this strategy is shared.

2.3.2. not-shared

Every time a query is executed, a Lucene IndexReader is opened. This strategy is not efficient since opening and warming up an IndexReader can be a relatively expensive operation, but is very simple code. Use it as an example implementation if you’re interested to learn about Hibernate Search internals or want to extend it.

The name of this strategy is not-shared.

2.3.3. async

This implementation keeps an IndexReader open and ready to be used by all queries, while a background thread periodically verifies if there is need to open a fresh one, replaces the active one and disposes the outdated one. The frequency of checks - and refreshing - of this background thread is configurable, but defaults to 5000 milliseconds. The drawback of this design is that queries are effectively run on an index snapshot which might be approximately 5 seconds out of date (assuming the refresh period is not reconfigured); the benefit is that if your application writes frequently to the index, the query performance will be more consistent.

The name of this strategy is async.

2.3.4. Custom

You can write your own reader strategy that suits your application needs by implementing org.hibernate.search.reader.ReaderProvider. The implementation must be thread safe.