Chapter 8. Advanced features

8.1. SearchFactory

The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is also a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:

Example 8.1. Accessing the SearchFactory

FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();

8.2. Accessing a Lucene Directory

You can always access the Lucene directories through plain Lucene; the Directory structure is in no way different with or without Hibernate Search. However, there are more convenient ways to access a given Directory. The SearchFactory keeps track of the DirectoryProviders per indexed class. One directory provider can be shared among several indexed classes if the classes share the same underlying index directory. Although this is not usually the case, a given entity can have several DirectoryProviders if the index is sharded (see Section 3.2, “Sharding indexes”).

Example 8.2. Accessing the Lucene Directory

// An entity has more than one provider only when its index is sharded
DirectoryProvider[] providers = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = providers[0].getDirectory();

In this example, directory points to the Lucene index storing Order information. Note that the obtained Lucene directory must not be closed (this is Hibernate Search's responsibility).

8.3. Using an IndexReader

Queries in Lucene are executed on an IndexReader. Hibernate Search caches all index readers to maximize performance. Your code can access these cached resources, but you have to follow some "good citizen" rules.

Example 8.3. Accessing an IndexReader

DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];

ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);

try {
    //do read-only operations on the reader
}
finally {
    readerProvider.closeReader(reader);
}

The ReaderProvider (described in Reader strategy) will open an IndexReader on top of the index(es) referenced by the directory providers. Because this IndexReader is shared among several clients, you must adhere to the following rules:

Never call indexReader.close(), but always call readerProvider.closeReader(reader), preferably in a finally block.
Don't use this IndexReader for modification operations (you would get an exception). If you want to use a read/write index reader, open one from the Lucene Directory object.

Aside from those rules, you can use the IndexReader freely, especially to do native queries. Using the shared IndexReaders will make most queries more efficient.

8.4. Customizing Lucene's scoring formula

Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Factor	Description
tf(t in d)	Term frequency factor for the term (t) in the document (d).
idf(t)	Inverse document frequency of the term.
coord(q,d)	Score factor based on how many of the query terms are found in the specified document.
queryNorm(q)	Normalizing factor used to make scores between queries comparable.
t.getBoost()	Search-time boost of the term (t) in the query (q).
norm(t,d)	Encapsulates a few (indexing time) boost and length factors.

It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.

Hibernate Search provides two ways to modify Lucene's similarity calculation. First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity. Additionally, you can override the default similarity at the class level using the @Similarity annotation.

@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
   ...
}
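
The first option, the global default, comes down to a single key/value pair. The sketch below builds it with plain java.util.Properties. The helper class SimilarityConfigSketch and the class name com.example.search.DummySimilarity are hypothetical, and how the properties reach Hibernate (a properties file, hibernate.cfg.xml, or programmatic configuration) depends on your setup.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hibernate Search.
class SimilarityConfigSketch {
    // Property key named in the text above.
    static final String SIMILARITY_KEY = "hibernate.search.similarity";

    // Builds properties that select the given Similarity class as the global default.
    static Properties withDefaultSimilarity(String similarityClassName) {
        Properties props = new Properties();
        props.setProperty(SIMILARITY_KEY, similarityClassName);
        return props;
    }

    public static void main(String[] args) {
        // com.example.search.DummySimilarity is an assumed class name for illustration.
        Properties props = withDefaultSimilarity("com.example.search.DummySimilarity");
        System.out.println(props.getProperty(SIMILARITY_KEY));
    }
}
```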

As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case, your custom implementation of the method tf(float freq) should return 1.0f.
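
To make the effect concrete, here is a small standalone sketch. It does not use Lucene: SimilaritySketch is a hypothetical stand-in that models only the tf and idf factors of the formula, using the square-root and logarithm definitions that, to our knowledge, DefaultSimilarity applies, and it fixes every other factor at 1. FlatTfSimilaritySketch then overrides tf in the way described above.

```java
// Stand-in for the idea behind org.apache.lucene.search.Similarity.
// This is NOT the Lucene class; only tf and idf are modelled.
class SimilaritySketch {
    // Term frequency factor; assumed default: sqrt(freq).
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency; assumed default: 1 + ln(numDocs / (docFreq + 1)).
    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Contribution of a single term: tf * idf^2, all other factors fixed at 1.
    float termScore(float freq, int docFreq, int numDocs) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf;
    }
}

// Hypothetical counterpart of the DummySimilarity from the text:
// every matching document gets the same term frequency factor.
class FlatTfSimilaritySketch extends SimilaritySketch {
    @Override
    float tf(float freq) {
        return 1.0f;
    }
}

class FlatTfDemo {
    public static void main(String[] args) {
        SimilaritySketch standard = new SimilaritySketch();
        SimilaritySketch flat = new FlatTfSimilaritySketch();

        // Same term, same index statistics; only the in-document frequency differs (1 vs 4).
        System.out.println("default: " + standard.termScore(1f, 9, 1000)
                + " vs " + standard.termScore(4f, 9, 1000));
        System.out.println("flat tf: " + flat.termScore(1f, 9, 1000)
                + " vs " + flat.termScore(4f, 9, 1000));
    }
}
```

With the flat variant, one occurrence and four occurrences of a term produce the same term score, while the stand-in default scores the four-occurrence document higher. A real implementation would extend Lucene's Similarity class (or DefaultSimilarity) rather than this stand-in.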

Warning

When two entities share the same index, they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so overriding the Similarity implementation in a subtype is not allowed.