Chapter 4. Resharding

When an application's dataset grows beyond the capacity of the databases originally allocated to the application it becomes necessary to add more databases, and it is often desirable to redistribute the data across the shards (either to achieve proper load balancing or to satisfy application invariants) - this is called resharding. Resharding is a complicated problem, and it has the potential to cause major complications in the management of your production application if it is not considered during the design. In order to ease some of the pain associated with resharding, Hibernate Shards provides support for virtual shards.

4.1. Virtual Shards

In the general case, each object lives on a shard. Resharding consists of two tasks: moving the object to another shard, and changing object-shard mappings. The object-shard mapping is captured either by the shard ID encoded into the object ID or by the internal logic of the shard resolution strategy which the object uses. In the former case, resharding would require changing all the object IDs and FKs. In the latter case, resharding could require anything from changing the runtime configuration of a given ShardResolutionStrategy to changing the algorithm of the ShardResolutionStrategy. Unfortunately, the problem of changing object-shard mappings becomes even worse once we consider the fact that Hibernate Shards does not support cross-shard relationships. This limitation prevents us from moving a subset of an object graph from one shard to another.

The task of changing object-shard mappings can be simplified by adding a level of indirection - each object lives on a virtual shard, and each virtual shard is mapped to one physical shard. During design, developers must decide on the maximum number of physical shards the application will ever require. This maximum is then used as the number of virtual shards, and these virtual shards are then mapped to the physical shards currently required by the application. Since Hibernate Shards' ShardSelectionStrategy, ShardResolutionStrategy, and ShardEncodingIdentifierGenerator all operate on virtual shards, the objects will correctly be distributed across virtual shards. During resharding, object-shard mappings can now simply be changed by changing virtual shard to physical shard mappings.

If you're worried about correctly estimating the maximum number of physical shards your application will ever require, aim high. Virtual shards are cheap. Down the road you'll be much better off with extra virtual than if you have to add virtual shards.

In order to enable virtual sharding you need to create your ShardedConfiguration with a Map from virtual shard ids to physical shard ids. Here's an example where we have 4 virtual shards mapped to 2 physical shards.

Map<Integer, Integer> virtualShardMap = new HashMap<Integer, Integer>();
virtualShardMap.put(0, 0);
virtualShardMap.put(1, 0);
virtualShardMap.put(2, 1);
virtualShardMap.put(3, 1);
ShardedConfiguration shardedConfig =
    new ShardedConfiguration(
        prototypeConfiguration,
        configurations,
        strategyFactory,
        virtualShardMap);
return shardedConfig.buildShardedSessionFactory();

In order to change the virtual shard to physical shard mapping later on it is only necessary to change the virtualShardToShardMap passed to this constructor.

We mentioned that the second task during resharding is moving data from one physical shard to another. Hibernate Shards does not try to provide automatic support for this as this is usually very application-specific, and complexity varies based on the potential need for hot-resharding, deployment architecture of the application, etc.