In this final chapter we are offering a smorgasbord of tips and tricks which might become useful as you dive deeper and deeper into Hibernate Search.
The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:
Example 10.1. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
Queries in Lucene are executed on an IndexReader. Hibernate Search caches index readers to maximize performance and implements other strategies to retrieve updated IndexReaders in order to minimize IO operations. Your code can access these cached resources, but you have to follow some "good citizen" rules.
Example 10.2. Accessing an IndexReader
IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
    //perform read-only operations on the reader
}
finally {
    searchFactory.getIndexReaderAccessor().close(reader);
}
In this example the SearchFactory figures out which indexes are needed to query this entity. Using the configured ReaderProvider (described in Reader strategy) on each index, it returns a compound IndexReader on top of all involved indexes. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:
Never call indexReader.close(); always call searchFactory.getIndexReaderAccessor().close(reader) instead, preferably in a finally block.
Don't use this IndexReader for modification operations (it is a read-only IndexReader, so you would get an exception).
Aside from those rules, you can use the IndexReader freely, especially to do native Lucene queries. Using these shared IndexReaders is more efficient than opening one directly from, for example, the filesystem.
As an alternative to the method open(Class... types) you can use open(String... indexNames); in this case you pass in one or more index names. Using this strategy you can also select a subset of the indexes for any indexed type if sharding is used.
Example 10.3. Accessing an IndexReader by index names
IndexReader reader = searchFactory
    .getIndexReaderAccessor()
    .open("Products.1", "Products.3");
A Directory is the most common abstraction used by Lucene to represent the index storage. Hibernate Search doesn't interact directly with a Lucene Directory but abstracts these interactions via an IndexManager: an index does not necessarily need to be implemented by a Directory.
If you are certain that your index is represented as a Directory and need to access it, you can get a reference to the Directory via the IndexManager. You will have to cast the IndexManager instance to a DirectoryBasedIndexManager and then use getDirectoryProvider().getDirectory() to get a reference to the underlying Directory. This is not recommended; if you need low-level access to the index using Lucene APIs, see Section 10.2, “Using an IndexReader” instead.
In some cases it can be useful to split (shard) the data into several Lucene indexes. There are two main use cases:
A single index is so big that index update times are slowing the application down. In this case static sharding can be used to split the data into a pre-defined number of shards.
Data is naturally segmented by customer, region, language or other application parameter and the index should be split according to these segments. This is a use case for dynamic sharding.
By default sharding is not enabled.
To enable static sharding set the hibernate.search.<indexName>.sharding_strategy.nbr_of_shards property as seen in Example 10.4, “Enabling index sharding”.
Example 10.4. Enabling index sharding
hibernate.search.[default|<indexName>].sharding_strategy.nbr_of_shards = 5
The default sharding strategy, which gets enabled by setting this property, splits the data according to the hash value of the document id (generated by the FieldBridge). This ensures a fairly balanced sharding. You can replace the default strategy by implementing a custom IndexShardingStrategy. To use your custom strategy you have to set the hibernate.search.[default|<indexName>].sharding_strategy property to the fully qualified class name of your custom IndexShardingStrategy.
Example 10.5. Registering a custom IndexShardingStrategy
hibernate.search.[default|<indexName>].sharding_strategy = my.custom.RandomShardingStrategy
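The idea behind the hash-based default strategy can be illustrated with a small standalone sketch. It does not use any Hibernate Search classes: the class and method names are hypothetical, and String.hashCode() merely stands in for whatever hash function the library applies to the document id. The point is only that the same id always lands in the same shard, and that the shard index stays within the configured number of shards.

```java
// Standalone sketch of hash-based static sharding; NOT Hibernate Search code.
class HashShardingSketch {

    // Maps a document id (in its string form) to a shard index in [0, nbrOfShards).
    // Assumption: String.hashCode() stands in for the hash the library really uses.
    static int shardFor(String idAsString, int nbrOfShards) {
        if (nbrOfShards < 1) {
            throw new IllegalArgumentException("nbrOfShards must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(idAsString.hashCode(), nbrOfShards);
    }

    public static void main(String[] args) {
        // Print the shard chosen for a few sample ids with five shards.
        for (String id : new String[] {"1", "2", "42", "order-7"}) {
            System.out.println(id + " -> shard " + shardFor(id, 5));
        }
    }
}
```

A custom IndexShardingStrategy would typically replace this mapping with one driven by application data rather than by a hash.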
Dynamic sharding allows you to manage the shards yourself and even create new shards on the fly. To do so you need to implement the interface ShardIdentifierProvider and set the hibernate.search.[default|<indexName>].sharding_strategy property to the fully qualified name of this class. Note that instead of implementing the interface directly, you should rather derive your implementation from org.hibernate.search.store.ShardIdentifierProviderTemplate, which provides a basic implementation. Let's look at Example 10.6, “Custom ShardIdentifierProvider” for an example.
Example 10.6. Custom ShardIdentifierProvider
public static class AnimalShardIdentifierProvider extends ShardIdentifierProviderTemplate {

    @Override
    public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
        if ( entityType.equals(Animal.class) ) {
            String type = document.getFieldable("type").stringValue();
            addShard(type);
            return type;
        }
        throw new RuntimeException("Animal expected but found " + entityType);
    }

    @Override
    protected Set<String> loadInitialShardNames(Properties properties, BuildContext buildContext) {
        ServiceManager serviceManager = buildContext.getServiceManager();
        SessionFactory sessionFactory = serviceManager.requestService(
                HibernateSessionFactoryServiceProvider.class, buildContext);
        Session session = sessionFactory.openSession();
        try {
            Criteria initialShardsCriteria = session.createCriteria(Animal.class);
            initialShardsCriteria.setProjection(
                    Projections.distinct(Property.forName("type")));

            @SuppressWarnings("unchecked")
            List<String> initialTypes = initialShardsCriteria.list();
            return new HashSet<String>(initialTypes);
        }
        finally {
            session.close();
        }
    }
}
There are several things happening in AnimalShardIdentifierProvider. First off, its purpose is to create one shard per animal type (e.g. mammal, insect, etc.). It does so by inspecting the class type and the Lucene document passed to the getShardIdentifier() method. It extracts the type field from the document and uses it as the shard name. getShardIdentifier() is called for every addition to the index, and a new shard is created for every new animal type encountered. The base class ShardIdentifierProviderTemplate maintains a set of all known shards, to which every identifier must be added by calling addShard().
It is important to understand that Hibernate Search cannot know which shards already exist when the application starts. When using ShardIdentifierProviderTemplate as the base class of a ShardIdentifierProvider implementation, the initial set of shard identifiers must be returned by the loadInitialShardNames() method. How this is done will depend on the use case. However, a common case in combination with Hibernate ORM is that the initial shard set is defined by the distinct values of a given database column. Example 10.6, “Custom ShardIdentifierProvider” shows how to handle such a case. In its loadInitialShardNames() implementation, AnimalShardIdentifierProvider makes use of a service called HibernateSessionFactoryServiceProvider (see also Section 10.6, “Using external services”), which is available within an ORM environment. It allows you to request a Hibernate SessionFactory instance, which can be used to run a Criteria query in order to determine the initial set of shard identifiers.
Last but not least, the ShardIdentifierProvider also allows for optimizing searches by selecting which shard to run a query against. By activating a filter (see Section 5.3.1, “Using filters in a sharded environment”), a sharding strategy can select a subset of the shards used to answer a query (getShardIdentifiersForQuery(), not shown in the example) and thus speed up the query execution.
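The interplay between registering shards at indexing time and narrowing them at query time can be modelled with a small standalone sketch. It is not the Hibernate Search API: the class, its methods and the way a filter value is passed in are all hypothetical, chosen only to mirror the behaviour described above (a set of known shards, a new shard per newly seen type, and a query that touches either every shard or just the one matching a filter).

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Standalone model of dynamic sharding; NOT the Hibernate Search API.
class ShardRegistrySketch {

    // Hypothetical stand-in for the set of known shards kept by the base class.
    private final Set<String> knownShards = new TreeSet<>();

    // The initial shards play the role of what loadInitialShardNames() returns.
    ShardRegistrySketch(Set<String> initialShards) {
        knownShards.addAll(initialShards);
    }

    // Called for every added document: register the shard, then route to it.
    String shardForDocument(String animalType) {
        knownShards.add(animalType);
        return animalType;
    }

    // Query-time selection: without a filter value every known shard is searched;
    // with one, only the matching shard is searched (or none if it is unknown).
    Set<String> shardsForQuery(String filteredType) {
        if (filteredType == null) {
            return Collections.unmodifiableSet(new TreeSet<>(knownShards));
        }
        return knownShards.contains(filteredType)
                ? Collections.singleton(filteredType)
                : Collections.<String>emptySet();
    }
}
```

In this model a query filtered on one animal type reads a single shard instead of all of them, which is where the speed-up comes from.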
This ShardIdentifierProvider is considered experimental. We might need to apply some changes to the defined method signatures to accommodate unforeseen use cases. Please provide feedback if you have ideas, or just to let us know how you're using this API.
It is technically possible to store the information of more than one entity into a single Lucene index. There are two ways to accomplish this:
Configuring the underlying directory providers to point to the same physical index directory. In practice, you set the property hibernate.search.[fully qualified entity name].indexName to the same value. As an example let’s use the same index (directory) for the Furniture and Animal entity. We just set indexName for both entities to, for example, “Animal”. Both entities will then be stored in the Animal directory.
hibernate.search.org.hibernate.search.test.shards.Furniture.indexName = Animal
hibernate.search.org.hibernate.search.test.shards.Animal.indexName = Animal
Setting the @Indexed annotation’s index attribute of the entities you want to merge to the same value. If we again wanted all Furniture instances to be indexed in the Animal index along with all instances of Animal, we would specify @Indexed(index="Animal") on both the Animal and Furniture classes.
This is only presented here so that you know the option is available. There is really not much benefit in sharing indexes.
Any of the pluggable contracts we have seen so far allows for the injection of a service. The most notable example is the DirectoryProvider. The full list is:
DirectoryProvider
ReaderProvider
OptimizerStrategy
BackendQueueProcessor
Worker
ErrorHandler
MassIndexerProgressMonitor
Some of these components need to access a service which is either available in the environment or whose lifecycle is bound to the SearchFactory. Sometimes, you even want the same service to be shared amongst several instances of these contracts. One example is the ability to share an Infinispan cache instance between several directory providers running in different JVMs to store the various indexes using the same underlying infrastructure; this provides real-time replication of indexes across nodes.
To expose a service, you need to implement org.hibernate.search.spi.ServiceProvider<T>. T is the type of the service you want to use. Services are retrieved by components via their ServiceProvider class implementation.
If your service ought to be started when Hibernate Search starts and stopped when Hibernate Search stops, you can use a managed service. Make sure to properly implement the start and stop methods of ServiceProvider. When the service is requested, the getService method is called.
Example 10.7. Example of ServiceProvider implementation
public class CacheServiceProvider implements ServiceProvider<Cache> {
    private CacheManager manager;

    public void start(Properties properties) {
        //read configuration
        manager = new CacheManager(properties);
    }

    public Cache getService() {
        return manager.getCache(DEFAULT);
    }

    //must be public, since it implements an interface method
    public void stop() {
        manager.close();
    }
}
The ServiceProvider implementation must have a no-arg constructor.
To be transparently discoverable, such a service should have an accompanying META-INF/services/org.hibernate.search.spi.ServiceProvider file whose content lists the service provider implementation(s).
Example 10.8. Content of META-INF/services/org.hibernate.search.spi.ServiceProvider
com.acme.infra.hibernate.CacheServiceProvider
Alternatively, the service can be provided by the environment bootstrapping Hibernate Search. For example, Infinispan, which uses Hibernate Search as its internal search engine, can pass the CacheContainer to Hibernate Search. In this case, the CacheContainer instance is not managed by Hibernate Search and the start/stop methods of its corresponding service provider will not be used.
Provided services have priority over managed services. If a provided service is registered with the same ServiceProvider class as a managed service, the provided service will be used.
The provided services are passed to Hibernate Search via the SearchConfiguration interface (getProvidedServices).
Provided services are used by frameworks controlling the lifecycle of Hibernate Search and not by traditional users.
If, as a user, you want to retrieve a service instance from the environment, use registry services like JNDI and look the service up in the provider.
Many of the pluggable contracts of Hibernate Search can use services. Services are accessible via the BuildContext interface.
Example 10.9. Example of a directory provider using a cache service
public class CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
    private BuildContext context;

    public void initialize(
            String directoryProviderName,
            Properties properties,
            BuildContext context) {
        //initialize
        this.context = context;
    }

    public void start() {
        Cache cache = context.requestService(CacheServiceProvider.class);
        //use cache
    }

    public RAMDirectory getDirectory() {
        // use cache and return the directory (body elided in this example)
    }

    public void stop() {
        //stop services
        context.releaseService(CacheServiceProvider.class);
    }
}
When you request a service, an instance of the service is served to you. Make sure to then release the service. This is fundamental. Note that the service can be released in the DirectoryProvider.stop method if the DirectoryProvider uses the service during its lifetime, or it can be released right away if the service is only used at initialization time.
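The request/release discipline can be illustrated with a standalone model. Nothing here is Hibernate Search API: the class, its methods and the usage counting are assumptions made for the sake of the sketch, namely that a service is started on the first request and stopped once every request has been matched by a release. Under that assumption, a forgotten release keeps the service alive indefinitely, which is why the release belongs in a finally block for short-lived uses.

```java
// Standalone model of the request/release discipline; NOT the Hibernate Search API.
class ServiceLeaseSketch {

    private int users;        // number of outstanding requests (assumed bookkeeping)
    private boolean started;  // whether the modelled service is currently running

    // Hands out the service, starting it on first use.
    synchronized String requestService() {
        if (users++ == 0) {
            started = true;
        }
        return "cache-service";
    }

    // Gives the service back; the model stops it once nobody holds it.
    synchronized void releaseService() {
        if (users == 0) {
            throw new IllegalStateException("release without matching request");
        }
        if (--users == 0) {
            started = false;
        }
    }

    synchronized boolean isStarted() {
        return started;
    }

    // Short-lived use: request, use, and always release in a finally block.
    static int useOnce(ServiceLeaseSketch registry) {
        String service = registry.requestService();
        try {
            return service.length();
        }
        finally {
            registry.releaseService();
        }
    }
}
```

A component that needs the service for its whole lifetime would instead request it when it starts and release it when it stops, as the directory provider example does.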
Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:
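$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$$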
Factor | Description |
---|---|
tf(t in d) | Term frequency factor for the term (t) in the document (d). |
idf(t) | Inverse document frequency of the term. |
coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
t.getBoost() | Field boost. |
norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity's Javadocs for more information.
Hibernate Search provides two ways to modify Lucene's similarity calculation.
First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.
Secondly, you can override the similarity used for a specific index by setting the similarity property for this index (see Section 3.3, “Directory configuration” for more information about index configuration):
hibernate.search.[default|<indexname>].similarity = my.custom.Similarity
As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.
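The effect of a constant tf on the per-term part of the formula can be checked with a standalone sketch. It does not use Lucene classes: the TermFrequency interface and the termScore helper are hypothetical stand-ins, and the square-root variant is included only as a frequency-sensitive point of comparison. With the constant variant, a term occurring once and a term occurring ten times contribute the same amount to the score.

```java
// Standalone illustration of the per-term part of the scoring formula; NOT Lucene code.
class ConstantTfSketch {

    // Hypothetical stand-in for the tf(float freq) hook of a similarity.
    interface TermFrequency {
        float tf(float freq);
    }

    // Frequency-sensitive variant used for comparison: grows with the square root of freq.
    static final TermFrequency SQRT = freq -> (float) Math.sqrt(freq);

    // Variant from the text: every matching term counts the same, however often it occurs.
    static final TermFrequency CONSTANT = freq -> 1.0f;

    // Contribution of one term: tf(t in d) * idf(t)^2 * boost * norm(t,d).
    static float termScore(TermFrequency similarity, float freq, float idf, float boost, float norm) {
        return similarity.tf(freq) * idf * idf * boost * norm;
    }

    public static void main(String[] args) {
        // Compare one occurrence with ten occurrences under both variants.
        for (float freq : new float[] {1f, 10f}) {
            System.out.println("freq=" + freq
                    + " sqrt=" + termScore(SQRT, freq, 2f, 1f, 1f)
                    + " constant=" + termScore(CONSTANT, freq, 2f, 1f, 1f));
        }
    }
}
```

In a real application the same idea would be expressed by overriding tf(float freq) in a Similarity implementation and registering that class through the similarity property.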
When two entities share the same index they must declare the same Similarity implementation.
The @Similarity annotation, which was used to configure the similarity at the class level, has been deprecated since Hibernate Search 4.4. Use the configuration property instead of the annotation.