In some extreme cases involving very large indexes, it is necessary to split (shard) the indexing data of a given entity type into several Lucene indexes. This solution is not recommended until the index has grown large enough that index update times are slowing the application down. The main drawback of index sharding is that searches become slower, since more files have to be opened for a single search. In other words, don't do it until you have problems :)
Despite this strong warning, Hibernate Search allows you to index a given entity type into several sub indexes. Data is sharded into the different sub indexes thanks to an IndexShardingStrategy. By default, no sharding strategy is enabled unless the number of shards is configured. To configure the number of shards, use the following property:
Example 3.3. Enabling index sharding by specifying nbr_of_shards for a specific index
hibernate.search.<indexName>.sharding_strategy.nbr_of_shards 5
This will use 5 different shards.
The default sharding strategy, when shards are set up, splits the
data according to the hash value of the id string representation
(generated by the Field Bridge). This ensures a fairly balanced sharding.
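The idea behind id-hash sharding can be sketched in a few lines. This is a minimal illustration, not Hibernate Search's actual implementation: the class name IdHashShardingSketch and the method shardFor are hypothetical, and the use of String.hashCode() with Math.floorMod is an assumption chosen for brevity.

```java
// Illustrative sketch only; class and method names are hypothetical,
// and the hash function is an assumption, not the library's own.
class IdHashShardingSketch {

    // Maps the string representation of an id to a shard number
    // in the range [0, nbrOfShards).
    static int shardFor(String idInString, int nbrOfShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(idInString.hashCode(), nbrOfShards);
    }

    public static void main(String[] args) {
        int nbrOfShards = 5;
        int[] counts = new int[nbrOfShards];

        // Distribute 10,000 sequential ids and count how many land in each shard.
        for (int id = 1; id <= 10_000; id++) {
            counts[shardFor(String.valueOf(id), nbrOfShards)]++;
        }
        for (int shard = 0; shard < nbrOfShards; shard++) {
            System.out.println("shard " + shard + ": " + counts[shard] + " documents");
        }
    }
}
```

Because the shard is derived only from the id string, the same entity always maps to the same shard, which is what lets a later update or deletion find the document again.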
You can replace the strategy by implementing IndexShardingStrategy and setting the following property:
Example 3.4. Specifying a custom sharding strategy
hibernate.search.<indexName>.sharding_strategy my.shardingstrategy.Implementation
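To give a feel for what a custom routing rule might look like, the sketch below routes documents by the first character of their id instead of by hash. It deliberately does not reproduce the real IndexShardingStrategy contract: ShardRouter and PrefixShardRouter are hypothetical stand-ins invented for this illustration, and an actual implementation has to follow the interface shipped with your Hibernate Search version.

```java
// Hypothetical, simplified stand-in for a sharding contract.
// The real IndexShardingStrategy interface is NOT reproduced here.
interface ShardRouter {
    // Shard that receives a newly indexed document.
    int shardForAddition(String entityName, String idInString);

    // Shards that must be visited to delete a document.
    int[] shardsForDeletion(String entityName, String idInString);
}

// Example rule (an assumption for illustration): route by the first
// character of the id, e.g. a region or tenant prefix.
class PrefixShardRouter implements ShardRouter {
    private final int nbrOfShards;

    PrefixShardRouter(int nbrOfShards) {
        if (nbrOfShards < 1) {
            throw new IllegalArgumentException("nbrOfShards must be >= 1");
        }
        this.nbrOfShards = nbrOfShards;
    }

    @Override
    public int shardForAddition(String entityName, String idInString) {
        if (idInString.isEmpty()) {
            return 0; // fall back to the first shard for empty ids
        }
        // char is unsigned in Java, so the remainder is never negative.
        return idInString.charAt(0) % nbrOfShards;
    }

    @Override
    public int[] shardsForDeletion(String entityName, String idInString) {
        // The rule is deterministic, so only one shard can hold the document.
        return new int[] { shardForAddition(entityName, idInString) };
    }
}
```

A prefix rule like this trades balance for locality: related documents end up in the same shard, but shard sizes depend on how ids are distributed.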
Each shard has an independent directory provider configuration as described in Section 3.1, “Directory configuration”. The default DirectoryProvider names for the previous example are <indexName>.0 to <indexName>.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.
Example 3.5. Configuring sharding for an example entity Animal
hibernate.search.default.indexBase /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards 5
hibernate.search.Animal.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.Animal.0.indexName Animal00
hibernate.search.Animal.3.indexBase /usr/lucene/sharded
hibernate.search.Animal.3.indexName Animal03
This configuration uses the default id string hashing strategy and shards the Animal index into 5 subindexes. All subindexes are FSDirectoryProvider instances, and the directory where each subindex is stored is as follows:
for subindex 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
for subindex 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
for subindex 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
for subindex 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)
for subindex 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
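The override rules in this example can be mimicked with a small lookup routine. The following is a sketch under stated assumptions, not the library's resolution code: ShardDirectorySketch, lookupIndexBase and directoryFor are hypothetical names, and the fallback order (shard-level setting, then index-level, then default) is inferred from the example above.

```java
import java.util.Properties;

// Illustrative sketch only; names and fallback order are assumptions
// inferred from the Animal example, not Hibernate Search internals.
class ShardDirectorySketch {
    static final String PREFIX = "hibernate.search.";

    // The properties from the Animal example.
    static Properties exampleConfig() {
        Properties p = new Properties();
        p.setProperty(PREFIX + "default.indexBase", "/usr/lucene/indexes");
        p.setProperty(PREFIX + "Animal.sharding_strategy.nbr_of_shards", "5");
        p.setProperty(PREFIX + "Animal.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        p.setProperty(PREFIX + "Animal.0.indexName", "Animal00");
        p.setProperty(PREFIX + "Animal.3.indexBase", "/usr/lucene/sharded");
        p.setProperty(PREFIX + "Animal.3.indexName", "Animal03");
        return p;
    }

    // indexBase: shard-specific value first, then index-level, then default.
    static String lookupIndexBase(Properties p, String index, int shard) {
        String base = p.getProperty(PREFIX + index + "." + shard + ".indexBase");
        if (base == null) {
            base = p.getProperty(PREFIX + index + ".indexBase");
        }
        if (base == null) {
            base = p.getProperty(PREFIX + "default.indexBase");
        }
        return base;
    }

    // indexName: shard-specific value, otherwise "<indexName>.<shard number>".
    static String directoryFor(Properties p, String index, int shard) {
        String name = p.getProperty(PREFIX + index + "." + shard + ".indexName");
        if (name == null) {
            name = index + "." + shard;
        }
        return lookupIndexBase(p, index, shard) + "/" + name;
    }

    public static void main(String[] args) {
        Properties p = exampleConfig();
        int shards = Integer.parseInt(
                p.getProperty(PREFIX + "Animal.sharding_strategy.nbr_of_shards"));
        for (int shard = 0; shard < shards; shard++) {
            System.out.println("subindex " + shard + ": " + directoryFor(p, "Animal", shard));
        }
    }
}
```

Running main prints one resolved directory per shard, which makes it easy to check a sharded configuration before any index is created.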