Chapter 9. Advanced features

This final chapter offers a smorgasbord of tips and tricks that might become useful as you dive deeper into Hibernate Search.

9.1. Accessing the SearchFactory

The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:

Example 9.1. Accessing the SearchFactory

FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();

9.2. Accessing a Lucene Directory

You can always access the Lucene directories through plain Lucene. The Directory structure is in no way different with or without Hibernate Search. However, there are more convenient ways to access a given Directory. The SearchFactory keeps track of the DirectoryProviders per indexed class. One directory provider can be shared amongst several indexed classes if the classes share the same underlying index directory. While usually not the case, a given entity can have several DirectoryProviders if the index is sharded (see Section 3.3, “Sharding indexes”).

Example 9.2. Accessing the Lucene Directory

DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = provider[0].getDirectory();

In this example, directory points to the Lucene index storing Order information. Note that the obtained Lucene directory must not be closed; closing it is Hibernate Search's responsibility.

9.3. Using an IndexReader

Queries in Lucene are executed on an IndexReader. Hibernate Search caches all index readers to maximize performance. Your code can access these cached resources, but you have to follow some "good citizen" rules.

Example 9.3. Accessing an IndexReader

DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];

ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);

try {
   //do read-only operations on the reader
}
finally {
   readerProvider.closeReader(reader);
}

The ReaderProvider (described in Reader strategy) opens an IndexReader on top of the index(es) referenced by the directory providers. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:

Never call indexReader.close(), but always call readerProvider.closeReader(reader), preferably in a finally block.
Don't use this IndexReader for modification operations (you would get an exception). If you want to use a read/write index reader, open one from the Lucene Directory object.

Aside from those rules, you can use the IndexReader freely, especially to do native queries. Using the shared IndexReaders will make most queries more efficient.
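
The first rule can be illustrated with a toy model in plain Java. Everything in the sketch below (ToyReader, ToyReaderProvider, secondClientCanRead) is a hypothetical stand-in invented for illustration and is not a Hibernate Search or Lucene type; it only assumes, as stated above, that the same reader instance is handed to several clients.

```java
/** Toy model only: these classes are hypothetical stand-ins, not Hibernate Search or Lucene types. */
final class SharedReaderSketch {

    /** Stand-in for a cached reader that several clients hold at the same time. */
    static final class ToyReader {
        private boolean closed;

        String read() {
            if (closed) {
                throw new IllegalStateException("reader was closed by another client");
            }
            return "hit";
        }

        void close() {
            closed = true;
        }
    }

    /** Stand-in for a provider that hands the same cached reader to every caller. */
    static final class ToyReaderProvider {
        private final ToyReader cached = new ToyReader();

        ToyReader openReader() {
            return cached;
        }

        void closeReader(ToyReader reader) {
            // Only signals that this caller is done; the shared reader stays open.
        }
    }

    /** Returns true if a second client can still read after the first one is done. */
    static boolean secondClientCanRead(boolean firstClientClosesDirectly) {
        ToyReaderProvider provider = new ToyReaderProvider();
        ToyReader first = provider.openReader();
        ToyReader second = provider.openReader();
        if (firstClientClosesDirectly) {
            first.close();               // breaks the first rule
        } else {
            provider.closeReader(first); // follows the first rule
        }
        try {
            second.read();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

In this model, closing the reader directly makes it unusable for the second client, whereas handing it back through the provider leaves it usable.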

9.4. Use external services in Hibernate Search components (experimental)

In this section, components are any of the pluggable contracts, DirectoryProvider being the most useful use case:

DirectoryProvider
ReaderProvider
OptimizerStrategy
BackendQueueProcessorFactory
Worker

Some of these components need to access a service which is either available in the environment or whose lifecycle is bound to the SearchFactory. Sometimes, you even want the same service to be shared amongst several instances of these contracts. One example is the ability to share an Infinispan cache instance between several directory providers, so that the various indexes are stored using the same underlying infrastructure.

9.4.1. Exposing a service

To expose a service, you need to implement org.hibernate.search.spi.ServiceProvider<T>. T is the type of the service you want to use. Services are retrieved by components via their ServiceProvider class implementation.

9.4.1.1. Managed services

If your service ought to be started when Hibernate Search starts and stopped when Hibernate Search stops, you can use a managed service. Make sure to properly implement the start and stop methods of ServiceProvider. When the service is requested, the getService method is called.

Example 9.4. Example of ServiceProvider implementation

public class CacheServiceProvider implements ServiceProvider<Cache> {
    private CacheManager manager;

    public void start(Properties properties) {
        //read configuration
        manager = new CacheManager(properties);
    }

    public Cache getService() {
        return manager.getCache(DEFAULT);
    }

    public void stop() {
        manager.close();
    }
}

Note

The ServiceProvider implementation must have a no-arg constructor.

To be transparently discoverable, such a service should have an accompanying META-INF/services/org.hibernate.search.spi.ServiceProvider file whose content lists the service provider implementation(s).

Example 9.5. Content of META-INF/services/org.hibernate.search.spi.ServiceProvider

com.acme.infra.hibernate.CacheServiceProvider

9.4.1.2. Provided services

Alternatively, the service can be provided by the environment bootstrapping Hibernate Search. For example, Infinispan, which uses Hibernate Search as its internal search engine, can pass the CacheContainer to Hibernate Search. In this case, the CacheContainer instance is not managed by Hibernate Search, and the start/stop methods of its corresponding service provider are not used.

Note

Provided services have priority over managed services. If a provided service is registered with the same ServiceProvider class as a managed service, the provided service will be used.

The provided services are passed to Hibernate Search via the SearchConfiguration interface (getProvidedServices).

Important

Provided services are used by frameworks controlling the lifecycle of Hibernate Search and not by traditional users.

If, as a user, you want to retrieve a service instance from the environment, use registry services like JNDI and look the service up in the provider.

9.4.2. Using a service

Many of the pluggable contracts of Hibernate Search can use services. Services are accessible via the BuildContext interface.

Example 9.6. Example of a directory provider using a cache service

public class CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
    private BuildContext context;

    public void initialize(
        String directoryProviderName, 
        Properties properties, 
        BuildContext context) {
        //initialize
        this.context = context;
    }

    public void start() {
        Cache cache = context.requestService( CacheServiceProvider.class );
        //use cache
    }

    public RAMDirectory getDirectory() {
        // use cache
    }

    public void stop() {
        //stop services
        context.releaseService( CacheServiceProvider.class );
    } 
}

When you request a service, an instance of the service is served to you. Make sure to release the service once you are done with it; this is fundamental. If the DirectoryProvider uses the service during its whole lifetime, release it in the DirectoryProvider.stop method. If the service is only needed at initialization time, release it right away.
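
The two release patterns can be sketched with a toy registry in plain Java. All names below (ToyServiceRegistry, initOnlyComponent, LongLivedComponent) are hypothetical and invented for illustration; the counter is an assumption used to make unmatched requests visible, not a description of how Hibernate Search tracks services internally.

```java
/** Toy model only: hypothetical stand-ins, not the Hibernate Search service API. */
final class ServiceReleaseSketch {

    /** Stand-in for whatever hands out a shared service and tracks who still holds it. */
    static final class ToyServiceRegistry {
        private final Object service = new Object();
        private int outstanding;

        synchronized Object requestService() {
            outstanding++;
            return service;
        }

        synchronized void releaseService() {
            if (outstanding == 0) {
                throw new IllegalStateException("release without a matching request");
            }
            outstanding--;
        }

        synchronized int outstandingRequests() {
            return outstanding;
        }
    }

    /** Uses the service only while initializing, so it releases right away. */
    static void initOnlyComponent(ToyServiceRegistry registry) {
        Object service = registry.requestService();
        try {
            service.hashCode(); // placeholder for one-off work during initialization
        } finally {
            registry.releaseService();
        }
    }

    /** Holds the service for its whole lifetime and releases it in stop(). */
    static final class LongLivedComponent {
        private final ToyServiceRegistry registry;
        private Object service;

        LongLivedComponent(ToyServiceRegistry registry) {
            this.registry = registry;
        }

        void start() {
            service = registry.requestService();
        }

        void stop() {
            service = null;
            registry.releaseService();
        }
    }
}
```

In both patterns every request is paired with exactly one release, so the number of outstanding requests returns to zero once the components are done.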

9.5. Customizing Lucene's scoring formula

Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

Factor	Description
tf(t in d)	Term frequency factor for the term (t) in the document (d).
idf(t)	Inverse document frequency of the term.
coord(q,d)	Score factor based on how many of the query terms are found in the specified document.
queryNorm(q)	Normalizing factor used to make scores between queries comparable.
t.getBoost()	Search-time boost of the term (t) in the query (q).
norm(t,d)	Encapsulates a few (indexing time) boost and length factors.

It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.
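
To see how the factors combine, the following plain-Java sketch evaluates the formula for caller-supplied values. It is not Lucene code: the class ScoreFormulaSketch, the TermFactors holder and the numbers used in the usage note are hypothetical, and a real Similarity implementation computes each factor from index statistics.

```java
/** Arithmetic illustration of the scoring formula; not part of Lucene or Hibernate Search. */
final class ScoreFormulaSketch {

    /** Per-term factors, supplied by the caller for illustration. */
    static final class TermFactors {
        final double tf;
        final double idf;
        final double boost;
        final double norm;

        TermFactors(double tf, double idf, double boost, double norm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.norm = norm;
        }

        /** tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) */
        double contribution() {
            return tf * idf * idf * boost * norm;
        }
    }

    /** coord(q,d) * queryNorm(q) * sum of the per-term contributions. */
    static double score(double coord, double queryNorm, TermFactors... terms) {
        double sum = 0.0;
        for (TermFactors term : terms) {
            sum += term.contribution();
        }
        return coord * queryNorm * sum;
    }
}
```

With the made-up values tf=2, idf=1.5, boost=1, norm=0.5 for a first term and tf=1, idf=2, boost=2, norm=0.5 for a second, the contributions are 2.25 and 4.0. With coord=1.0 and queryNorm=0.5 the score is 0.5 · 6.25 = 3.125.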

Hibernate Search provides three ways to modify Lucene's similarity calculation.

First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.

You can also override the similarity used for a specific index by setting the similarity property of that index:

hibernate.search.default.similarity my.custom.Similarity

Finally, you can override the default similarity at class level using the @Similarity annotation.

@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}

As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.
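
The effect of such an override can be sketched without Lucene on the classpath. The TermFrequency interface below is a hypothetical stand-in for the tf(float freq) hook, and the square-root variant is included only as an example of a frequency-sensitive function for comparison; in a real project the method would be overridden in your Similarity implementation instead.

```java
/** Stand-alone illustration; TermFrequency is a hypothetical stand-in for the tf(float freq) hook. */
final class ConstantTfSketch {

    interface TermFrequency {
        float tf(float freq);
    }

    /** Frequency-sensitive variant: more occurrences yield a higher factor. */
    static final TermFrequency SQRT = freq -> (float) Math.sqrt(freq);

    /** Variant described in the text: every matching document gets the same factor. */
    static final TermFrequency CONSTANT = freq -> 1.0f;

    public static void main(String[] args) {
        // One occurrence versus five occurrences of the same term.
        System.out.println("sqrt:     " + SQRT.tf(1f) + " vs " + SQRT.tf(5f));
        System.out.println("constant: " + CONSTANT.tf(1f) + " vs " + CONSTANT.tf(5f));
    }
}
```

With the constant variant, a document containing the term once and a document containing it five times receive the same term frequency factor, so the number of occurrences no longer influences the score.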

Warning

When two entities share the same index they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so it's not allowed to override the Similarity implementation in a subtype.

Likewise, it does not make sense to define the similarity via the index setting and the class-level setting as they would conflict. Such a configuration will be rejected.