Hibernate.org Community Documentation

Chapter 9. Advanced features

9.1. Accessing the SearchFactory
9.2. Accessing a Lucene Directory
9.3. Using an IndexReader
9.4. Use external services in Hibernate Search components (experimental)
9.4.1. Exposing a service
9.4.2. Using a service
9.5. Customizing Lucene's scoring formula

This final chapter offers a smorgasbord of tips and tricks that may prove useful as you dive deeper into Hibernate Search.

9.1. Accessing the SearchFactory

The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be obtained from a FullTextSession by calling getSearchFactory().

9.2. Accessing a Lucene Directory

You can always access the Lucene directories through plain Lucene; the Directory structure is in no way different with or without Hibernate Search. However, there are more convenient ways to access a given Directory. The SearchFactory keeps track of the DirectoryProviders per indexed class. One directory provider can be shared amongst several indexed classes if the classes share the same underlying index directory. Although it is usually not the case, a given entity can have several DirectoryProviders if the index is sharded (see Section 3.3, “Sharding indexes”).

For example, a Directory looked up for an indexed Order entity points to the Lucene index storing Orders information. Note that the obtained Lucene directory must not be closed (this is Hibernate Search's responsibility).
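The lookup described above can be sketched as follows. So that the snippet compiles without Hibernate Search or Lucene on the classpath, it declares small stand-in interfaces named SearchFactory, DirectoryProvider and Directory; they are illustration-only and keep just the two methods used here. DirectoryLookup and directoriesFor are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for org.apache.lucene.store.Directory (illustration only). */
interface Directory {}

/** Stand-in for Hibernate Search's DirectoryProvider, reduced to getDirectory(). */
interface DirectoryProvider {
    Directory getDirectory();
}

/** Stand-in for SearchFactory, reduced to the directory provider lookup. */
interface SearchFactory {
    DirectoryProvider[] getDirectoryProviders(Class<?> entity);
}

class DirectoryLookup {
    /** Collects every Directory backing an entity: usually one, several when sharded. */
    static List<Directory> directoriesFor(SearchFactory searchFactory, Class<?> entity) {
        List<Directory> directories = new ArrayList<>();
        for (DirectoryProvider provider : searchFactory.getDirectoryProviders(entity)) {
            // Do not close the returned Directory: Hibernate Search owns it.
            directories.add(provider.getDirectory());
        }
        return directories;
    }
}
```

For an unsharded index the returned list holds a single Directory, so taking the first element is enough.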

9.3. Using an IndexReader

Queries in Lucene are executed on an IndexReader. Hibernate Search caches all index readers to maximize performance. Your code can access these cached resources, but you have to follow some "good citizen" rules.

The ReaderProvider (described in Reader strategy) will open an IndexReader on top of the index(es) referenced by the directory providers. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:

  • Never call indexReader.close(), but always call readerProvider.closeReader(reader), preferably in a finally block.

  • Don't use this IndexReader for modification operations (you would get an exception). If you want to use a read/write index reader, open one from the Lucene Directory object.

Aside from those rules, you can use the IndexReader freely, especially to do native queries. Using the shared IndexReaders will make most queries more efficient.
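The rules above can be sketched as follows. The snippet is self-contained: IndexReader and ReaderProvider are illustration-only stand-ins, not the real Lucene and Hibernate Search types. The method names openReader and closeReader follow the ReaderProvider contract, but the real openReader takes the DirectoryProviders to read from rather than index names. CountingReaderProvider and SharedReaderExample are hypothetical helpers.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for org.apache.lucene.index.IndexReader (illustration only). */
interface IndexReader {
    int numDocs();
}

/** Stand-in for Hibernate Search's ReaderProvider (illustration only). */
interface ReaderProvider {
    IndexReader openReader(String... indexNames); // the real method takes DirectoryProviders
    void closeReader(IndexReader reader);
}

/** Hypothetical provider that hands out one shared reader and counts open leases. */
class CountingReaderProvider implements ReaderProvider {
    final AtomicInteger openLeases = new AtomicInteger();
    private final IndexReader shared = () -> 42; // pretend the index holds 42 documents

    public IndexReader openReader(String... indexNames) {
        openLeases.incrementAndGet();
        return shared;
    }

    public void closeReader(IndexReader reader) {
        openLeases.decrementAndGet();
    }
}

class SharedReaderExample {
    /** Read-only use of a shared reader; it is handed back to the provider in finally. */
    static int countDocs(ReaderProvider readerProvider) {
        IndexReader reader = readerProvider.openReader("Order", "Client");
        try {
            return reader.numDocs(); // read-only operations only
        } finally {
            readerProvider.closeReader(reader); // never reader.close()
        }
    }
}
```

Returning the reader in the finally block keeps the lease count balanced even when the read operation throws.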

9.4. Use external services in Hibernate Search components (experimental)

By components, this section means any of the pluggable contracts, DirectoryProvider being the most useful use case.

Some of these components need to access a service which is either available in the environment or whose lifecycle is bound to the SearchFactory. Sometimes, you even want the same service to be shared amongst several instances of these contracts. One example is the ability to share an Infinispan cache instance between several directory providers to store the various indexes using the same underlying infrastructure.

9.4.1. Exposing a service

To expose a service, you need to implement org.hibernate.search.spi.ServiceProvider<T>. T is the type of the service you want to use. Services are retrieved by components via their ServiceProvider class implementation.
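A minimal sketch of such a provider is shown below. It is self-contained, so ServiceProvider here is a stand-in interface; its three lifecycle methods (start, getService, stop) are an assumption about the shape of the real contract and should be checked against the Javadoc. SimpleCache, CacheServiceProvider and the property key cache.name are hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in mirroring the assumed shape of org.hibernate.search.spi.ServiceProvider<T>. */
interface ServiceProvider<T> {
    void start(Properties properties);
    T getService();
    void stop();
}

/** Hypothetical service: a named in-memory cache shared by several components. */
class SimpleCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    final String name;

    SimpleCache(String name) { this.name = name; }
    void put(String key, String value) { entries.put(key, value); }
    String get(String key) { return entries.get(key); }
    void clear() { entries.clear(); }
}

/** Exposes one SimpleCache whose lifecycle follows start() and stop(). */
class CacheServiceProvider implements ServiceProvider<SimpleCache> {
    private SimpleCache cache;

    public void start(Properties properties) {
        // "cache.name" is a made-up property key used only in this sketch
        cache = new SimpleCache(properties.getProperty("cache.name", "default"));
    }

    public SimpleCache getService() {
        return cache; // every caller receives the same instance
    }

    public void stop() {
        cache.clear();
        cache = null;
    }
}
```

Because getService() always returns the same instance, several components that ask for the service end up sharing one cache.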

9.5. Customizing Lucene's scoring formula

Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

tf(t in d): Term frequency factor for the term (t) in the document (d).
idf(t): Inverse document frequency of the term.
coord(q,d): Score factor based on how many of the query terms are found in the specified document.
queryNorm(q): Normalizing factor used to make scores between queries comparable.
t.getBoost(): Field boost.
norm(t,d): Encapsulates a few (indexing time) boost and length factors.

It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.
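To make the shape of the formula concrete, the following self-contained sketch combines already computed factors in the same way the formula does. It does not call Lucene; ScoreSketch and TermFactors are hypothetical names, and the numbers in the usage note are made up for illustration.

```java
class ScoreSketch {
    /** Per-term factors that enter the sum over the query terms. */
    static final class TermFactors {
        final double tf;    // tf(t in d)
        final double idf;   // idf(t)
        final double boost; // t.getBoost()
        final double norm;  // norm(t,d)

        TermFactors(double tf, double idf, double boost, double norm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.norm = norm;
        }
    }

    /** score(q,d) = coord · queryNorm · sum over terms of tf · idf² · boost · norm */
    static double score(double coord, double queryNorm, TermFactors... terms) {
        double sum = 0.0;
        for (TermFactors t : terms) {
            sum += t.tf * t.idf * t.idf * t.boost * t.norm;
        }
        return coord * queryNorm * sum;
    }
}
```

For instance, score(0.5, 2.0, new TermFactors(2, 3, 1, 0.5), new TermFactors(1, 2, 1, 0.25)) evaluates to 0.5 · 2.0 · (2 · 3² · 1 · 0.5 + 1 · 2² · 1 · 0.25) = 10.0.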

Hibernate Search provides three ways to modify Lucene's similarity calculation.

First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.

You can also override the similarity used for a specific index by setting the similarity property for that index:

hibernate.search.default.similarity my.custom.Similarity

Finally, you can override the default similarity at class level using the @Similarity annotation.

@Similarity(impl = DummySimilarity.class)
public class Book {
    // fields and mappings omitted
}

As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.
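A sketch of such an implementation follows. To stay self-contained it declares a stand-in DefaultSimilarity reduced to the tf method, using the square root of the frequency as in Lucene's default implementation; in a real project you would extend org.apache.lucene.search.DefaultSimilarity instead. The class name DummySimilarity is borrowed from the annotation example above, and its body is an assumption.

```java
/** Stand-in for org.apache.lucene.search.DefaultSimilarity, reduced to tf (illustration only). */
class DefaultSimilarity {
    public float tf(float freq) {
        return (float) Math.sqrt(freq); // default term frequency factor
    }
}

/** Similarity that ignores how often a term occurs in a document. */
class DummySimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return 1.0f; // one occurrence scores the same as many
    }
}
```

With this override, a document containing the term 25 times gets the same term frequency factor as a document containing it once.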