Hibernate.orgCommunity Documentation
Hibernate Search consists of an indexing component as well as an index search component. Both are backed by Apache Lucene.
Each time an entity is inserted, updated or removed in/from the
database, Hibernate Search keeps track of this event (through the
Hibernate event system) and schedules an index update. All these updates
are handled without you having to interact with the Apache Lucene APIs
directly (see Section 3.1, “Enabling Hibernate Search and automatic indexing”). Instead, the
interaction with the underlying Lucene indexes is handled via so called
IndexManager
s.
Each Lucene index is managed by one index manager which is uniquely
identified by name. In most cases there is also a one to one relationship
between an indexed entity and a single
IndexManager
. The exceptions are the use cases of
index sharding and index sharing. The former can be applied when the index
for a single entity becomes too big and indexing operations are slowing
down the application. In this case a single entity is indexed into
multiple indexes each with its own index manager (see Section 10.4, “Sharding indexes”). The latter, index
sharing, is the ability to index multiple entities into the same Lucene
index (see Section 10.5, “Sharing indexes”).
The index manager abstracts from the specific index configuration.
In the case of the default index manager this includes details about the
selected backend, the configured reader strategy and the chosen
DirectoryProvider
. These components will be
discussed in greater detail later on. It is recommended that you start
with the default index manager which uses different Lucene
Directory
types to manage the indexes (see Section 3.3, “Directory configuration”). You can, however, also
provide your own IndexManager
implementation (see
Section 3.2, “Configuring the IndexManager”).
Once the index is created, you can search for entities and return
lists of managed entities saving you the tedious object to Lucene
Document
mapping. The same persistence context is
shared between Hibernate and Hibernate Search. As a matter of fact, the
FullTextSession
is built on top of the Hibernate
Session
so that the application code can use the
unified org.hibernate.Query
or
javax.persistence.Query
APIs exactly the same way a
HQL, JPA-QL or native query would do.
To be more efficient Hibernate Search batches the write interactions
with the Lucene index. This batching is the responsibility of the
Worker
. There are currently two types of batching.
Outside a transaction, the index update operation is executed right after
the actual database operation. This is really a no batching setup. In the
case of an ongoing transaction, the index update operation is scheduled
for the transaction commit phase and discarded in case of transaction
rollback. The batching scope is the transaction. There are two immediate
benefits:
Performance: Lucene indexing works better when operation are executed in batch.
ACIDity: The work executed has the same scoping as the one executed by the database transaction and is executed if and only if the transaction is committed. This is not ACID in the strict sense of it, but ACID behavior is rarely useful for full text search indexes since they can be rebuilt from the source at any time.
You can think of those two batch modes (no scope vs transactional) as the equivalent of the (infamous) autocommit vs transactional behavior. From a performance perspective, the in transaction mode is recommended. The scoping choice is made transparently. Hibernate Search detects the presence of a transaction and adjust the scoping (see Section 3.4, “Worker configuration”).
It is recommended - for both your database and Hibernate Search - to execute your operations in a transaction, be it JDBC or JTA.
Hibernate Search offers the ability to let the batched work being
processed by different back ends. Several back ends are provided out of
the box and you have the option to plugin your own. It is important to
understand that in this context back end encompasses more than just the
configuration option
hibernate.search.default.worker.backend
. This property
just specifies a implementation of the
BackendQueueProcessor
interface which is a part of
a back end configuration. In most cases, however, additional configuration
settings are needed to successfully configure a specific backend setup,
like for example the JMS back end.
In this mode, all index update operations applied on a given node (JVM) will be executed to the Lucene directories (through the directory providers) by the same node. This mode is typically used in non clustered environment or in clustered environments where the directory store is shared.
Lucene back end configuration.
This mode targets non clustered applications, or clustered
applications where the Directory
is taking care
of the locking strategy.
The main advantage is simplicity and immediate visibility of the changes in Lucene queries (a requirement in some applications).
An alternative back end viable for non-clustered and non-shared index configurations is the near-real-time backend.
All index update operations applied on a given node are sent to a JMS queue. A unique reader will then process the queue and update the master index. The master index is then replicated on a regular basis to the slave copies. This is known as the master/slaves pattern. The master is the sole responsible for updating the Lucene index. The slaves can accept read as well as write operations. However, they only process the read operation on their local index copy and delegate the update operations to the master.
JMS back end configuration.
This mode targets clustered environments where throughput is critical, and index update delays are affordable. Reliability is ensured by the JMS provider and by having the slaves working on a local copy of the index.
The JGroups based back end works similar to the JMS one and is designed after the same master/slave pattern. However, instead of JMS the JGroups toolkit is used as a replication mechanism. This back end can be used as an alternative to JMS when response time is critical, but i.e. JNDI service is not available.
Note that while JMS can usually be configured to use persistent queues, JGroups can only talk directly with the other nodes over network: it does guarantee message delivery to other participants, but if no master node is available, index update operations will be silently discarded. A new mode for auto-election of a master node is being developed.
When executing a query, Hibernate Search interacts with the Apache Lucene indexes through a reader strategy. Choosing a reader strategy will depend on the profile of the application (frequent updates, read mostly, asynchronous index update etc). See also Section 3.5, “Reader strategy configuration”
With this strategy, Hibernate Search will share the same
IndexReader
, for a given Lucene index, across
multiple queries and threads provided that the
IndexReader
is still up-to-date. If the
IndexReader
is not up-to-date, a new one is
opened and provided. Each IndexReader
is made of
several SegmentReader
s. This strategy only
reopens segments that have been modified or created after last opening
and shares the already loaded segments from the previous instance. This
strategy is the default.
The name of this strategy is shared
.
Every time a query is executed, a Lucene
IndexReader
is opened. This strategy is not the
most efficient since opening and warming up an
IndexReader
can be a relatively expensive
operation.
The name of this strategy is not-shared
.