This final chapter offers a smorgasbord of tips and tricks that may prove useful as you dive deeper into Hibernate Search.
The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:
Example 9.1. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
You can always access the Lucene directories through plain Lucene. The Directory structure is in no way different with or without Hibernate Search. However, there are some more convenient ways to access a given Directory. The SearchFactory keeps track of the DirectoryProviders per indexed class. One directory provider can be shared amongst several indexed classes, if the classes share the same underlying index directory. While usually not the case, a given entity can have several DirectoryProviders if the index is sharded (see Section 3.3, “Sharding indexes”).
Example 9.2. Accessing the Lucene Directory
DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = provider[0].getDirectory();
In this example, directory points to the Lucene index storing Order information. Note that the obtained Lucene directory must not be closed (this is Hibernate Search's responsibility).
Queries in Lucene are executed on an IndexReader. Hibernate Search caches all index readers to maximize performance. Your code can access these cached resources, but you have to follow some "good citizen" rules.
Example 9.3. Accessing an IndexReader
DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];
ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);
try {
    //do read-only operations on the reader
}
finally {
    readerProvider.closeReader(reader);
}
The ReaderProvider (described in Reader strategy) will open an IndexReader on top of the index(es) referenced by the directory providers. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:
Never call indexReader.close(), but always call readerProvider.closeReader(reader), preferably in a finally block.
Don't use this IndexReader for modification operations (you would get an exception). If you want to use a read/write index reader, open one from the Lucene Directory object.
Aside from those rules, you can use the IndexReader freely, especially to do native queries. Using the shared IndexReaders will make most queries more efficient.
In this section, components means any of the pluggable contracts, DirectoryProvider being the most common use case:
DirectoryProvider
ReaderProvider
OptimizerStrategy
BackendQueueProcessorFactory
Worker
Some of these components need to access a service which is either available in the environment or whose lifecycle is bound to the SearchFactory. Sometimes, you even want the same service to be shared amongst several instances of these contracts. One example is the ability to share an Infinispan cache instance between several directory providers to store the various indexes using the same underlying infrastructure.
To expose a service, you need to implement org.hibernate.search.spi.ServiceProvider<T>. T is the type of the service you want to use. Services are retrieved by components via their ServiceProvider class implementation.
If your service ought to be started when Hibernate Search starts and stopped when Hibernate Search stops, you can use a managed service. Make sure to properly implement the start and stop methods of ServiceProvider. When the service is requested, the getService method is called.
Example 9.4. Example of ServiceProvider implementation
public class CacheServiceProvider implements ServiceProvider<Cache> {
    private CacheManager manager;

    public void start(Properties properties) {
        //read configuration
        manager = new CacheManager(properties);
    }

    public Cache getService() {
        return manager.getCache(DEFAULT);
    }

    // must be public, like every method implementing an interface contract
    public void stop() {
        manager.close();
    }
}
The ServiceProvider implementation must have a no-arg constructor.
To be transparently discoverable, such a service should have an accompanying META-INF/services/org.hibernate.search.spi.ServiceProvider file whose content lists the service provider implementation(s), one fully qualified class name per line.
Example 9.5. Content of META-INF/services/org.hibernate.search.spi.ServiceProvider
com.acme.infra.hibernate.CacheServiceProvider
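The discovery convention can be illustrated with plain JDK code. The following is a minimal sketch, not Hibernate Search's actual bootstrap code: it parses the content of such a file and instantiates a listed class reflectively, which shows why a no-arg constructor is required. The names ServiceFileSketch, parseProviderNames, instantiate and DemoProvider are hypothetical, and the comment syntax ('#') is assumed to follow the usual JDK provider-configuration file format.

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceFileSketch {

    /** Hypothetical stand-in for a provider class with a no-arg constructor. */
    public static class DemoProvider {
        public DemoProvider() { }
    }

    /** Extracts class names from the file content: one per line, '#' starts a comment. */
    static List<String> parseProviderNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\n")) {
            int comment = line.indexOf('#');
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    /** Creates an instance through the no-arg constructor; fails if there is none. */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String content = "# service providers\n" + DemoProvider.class.getName() + "\n";
        for (String name : parseProviderNames(content)) {
            System.out.println("Discovered provider: " + instantiate(name).getClass().getName());
        }
    }
}
```

A class without a no-arg constructor would make instantiate fail with an IllegalStateException in this sketch, which mirrors the constraint stated above.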
Alternatively, the service can be provided by the environment bootstrapping Hibernate Search. For example, Infinispan, which uses Hibernate Search as its internal search engine, can pass the CacheContainer to Hibernate Search. In this case, the CacheContainer instance is not managed by Hibernate Search and the start/stop methods of its corresponding service provider will not be used.
Provided services have priority over managed services. If a provided service is registered with the same ServiceProvider class as a managed service, the provided service will be used.
The provided services are passed to Hibernate Search via the SearchConfiguration interface (getProvidedServices).
Provided services are used by frameworks controlling the lifecycle of Hibernate Search and not by traditional users.
If, as a user, you want to retrieve a service instance from the environment, use registry services like JNDI and look the service up in the provider.
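The priority rule between provided and managed services can be pictured with a small lookup sketch. This is an illustration under assumptions, not the Hibernate Search implementation: ServicePrioritySketch and its methods are hypothetical names, and services are keyed by their provider class as the text describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class ServicePrioritySketch {
    // Instances handed over by the bootstrapping environment (not started or stopped here).
    private final Map<Class<?>, Object> provided = new HashMap<>();
    // Factories for services whose lifecycle would be managed by the registry.
    private final Map<Class<?>, Supplier<?>> managed = new HashMap<>();

    void registerProvided(Class<?> providerClass, Object service) {
        provided.put(providerClass, service);
    }

    void registerManaged(Class<?> providerClass, Supplier<?> factory) {
        managed.put(providerClass, factory);
    }

    /** Provided services win over managed ones registered under the same provider class. */
    Object lookup(Class<?> providerClass) {
        Object fromEnvironment = provided.get(providerClass);
        if (fromEnvironment != null) {
            return fromEnvironment;
        }
        Supplier<?> factory = managed.get(providerClass);
        if (factory == null) {
            throw new IllegalArgumentException("No service for " + providerClass.getName());
        }
        return factory.get();
    }

    public static void main(String[] args) {
        ServicePrioritySketch registry = new ServicePrioritySketch();
        registry.registerManaged(String.class, () -> "managed instance");
        registry.registerProvided(String.class, "provided instance");
        System.out.println(registry.lookup(String.class));
    }
}
```

Here String.class merely stands in for a provider class; in a real setup the key would be the ServiceProvider implementation class.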
Many of the pluggable contracts of Hibernate Search can use services. Services are accessible via the BuildContext interface.
Example 9.6. Example of a directory provider using a cache service
public class CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
    private BuildContext context;
    private RAMDirectory directory; // expected to be set up in start()

    public void initialize(
            String directoryProviderName,
            Properties properties,
            BuildContext context) {
        //initialize
        this.context = context;
    }

    public void start() {
        Cache cache = context.requestService( CacheServiceProvider.class );
        //use cache
    }

    public RAMDirectory getDirectory() {
        // use cache
        return directory;
    }

    public void stop() {
        //stop services
        context.releaseService( CacheServiceProvider.class );
    }
}
When you request a service, an instance of the service is served to you. Make sure to release the service once you no longer need it. This is fundamental. Note that the service can be released in the DirectoryProvider.stop method if the DirectoryProvider uses the service during its lifetime, or it can be released right away if the service is only used at initialization time.
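One way to picture why releasing matters is a reference-counting scheme, sketched below. Treat this as an assumption made for illustration rather than a description of Hibernate Search internals: the class RefCountedServiceSketch and its members are hypothetical. In the sketch, the service starts on the first request and stops only when every request has been matched by a release.

```java
final class RefCountedServiceSketch {
    private int users;        // outstanding requests that have not been released yet
    private boolean started;  // true while the underlying service is running
    private int startCount;   // how many times the service was started
    private int stopCount;    // how many times the service was stopped

    /** Starts the service on first use and hands it out. */
    synchronized String request() {
        if (users == 0) {
            started = true;
            startCount++;
        }
        users++;
        return "service";
    }

    /** Stops the service once the last user has released it. */
    synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release() called without a matching request()");
        }
        users--;
        if (users == 0) {
            started = false;
            stopCount++;
        }
    }

    synchronized boolean isStarted() { return started; }
    synchronized int getStartCount() { return startCount; }
    synchronized int getStopCount() { return stopCount; }

    public static void main(String[] args) {
        RefCountedServiceSketch service = new RefCountedServiceSketch();
        service.request();
        service.request();
        service.release();
        System.out.println("still started: " + service.isStarted());
        service.release();
        System.out.println("still started: " + service.isStarted());
    }
}
```

Under this model, a component that forgets to release keeps the service alive indefinitely, which is the kind of leak the rule above guards against.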
Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:
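$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$$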
Factor | Description |
---|---|
tf(t in d) | Term frequency factor for the term (t) in the document (d). |
idf(t) | Inverse document frequency of the term. |
coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
t.getBoost() | Field boost. |
norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.
Hibernate Search provides three ways to modify Lucene's similarity calculation.
First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.
You can also override the similarity used for a specific index by setting the similarity property:
hibernate.search.default.similarity my.custom.Similarity
Finally, you can override the default similarity on class level using the @Similarity annotation.
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}
As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.
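The effect of a constant tf can be checked with a small, library-free calculation. The sketch below is not Lucene code: TfScoringSketch and termScore are hypothetical names, and only the per-term part of the formula (tf · idf² · boost · norm) is modelled. The square-root variant mirrors the tf used by Lucene's DefaultSimilarity; the idf, boost and norm values are arbitrary illustration inputs.

```java
import java.util.function.DoubleUnaryOperator;

final class TfScoringSketch {

    /** Term frequency as computed by Lucene's DefaultSimilarity: sqrt(freq). */
    static final DoubleUnaryOperator SQRT_TF = Math::sqrt;

    /** Custom term frequency that ignores how often the term occurs. */
    static final DoubleUnaryOperator CONSTANT_TF = freq -> 1.0;

    /** Per-term contribution to the score: tf(freq) * idf^2 * boost * norm. */
    static double termScore(DoubleUnaryOperator tf, double freq,
                            double idf, double boost, double norm) {
        return tf.applyAsDouble(freq) * idf * idf * boost * norm;
    }

    public static void main(String[] args) {
        double idf = 2.0, boost = 1.0, norm = 1.0;
        // With sqrt(freq), nine occurrences score higher than one occurrence.
        System.out.println(termScore(SQRT_TF, 1, idf, boost, norm));
        System.out.println(termScore(SQRT_TF, 9, idf, boost, norm));
        // With a constant tf, both documents receive the same contribution.
        System.out.println(termScore(CONSTANT_TF, 1, idf, boost, norm));
        System.out.println(termScore(CONSTANT_TF, 9, idf, boost, norm));
    }
}
```

In an actual Similarity subclass, the same idea amounts to overriding tf(float freq) so that it returns 1.0 regardless of its argument.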
When two entities share the same index they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so it's not allowed to override the Similarity implementation in a subtype.
Likewise, it does not make sense to define the similarity via both the index setting and the class-level setting, as they would conflict. Such a configuration will be rejected.