Chapter 3. Shard Strategy

3.1. Overview

Hibernate Shards gives you enormous flexibility in configuring how your data is distributed across your shards and how your data is queried across your shards. The entry point for this configuration is the org.hibernate.shards.strategy.ShardStrategy interface:

public interface ShardStrategy {
    ShardSelectionStrategy getShardSelectionStrategy();
    ShardResolutionStrategy getShardResolutionStrategy();
    ShardAccessStrategy getShardAccessStrategy();
}

As you can see, a ShardStrategy is composed of three sub-strategies. We'll discuss each of these in turn.

3.2. ShardAccessStrategy

We'll start with the simplest of the strategies: ShardAccessStrategy. Hibernate Shards uses the ShardAccessStrategy to determine how to apply database operations across multiple shards. The ShardAccessStrategy is consulted whenever you execute a query against your shards. We've already provided two implementations of this interface that we expect will suffice for the majority of applications.

3.2.1. SequentialShardAccessStrategy

SequentialShardAccessStrategy behaves exactly as its name implies: queries are executed against your shards in sequence. Depending on the types of queries you execute, you may want to avoid this implementation because it executes queries against your shards in the same order every time. If you execute a lot of row-limited, unordered queries, this could result in poor utilization across your shards (shards that appear early in your list will get hammered and shards that appear late will sit idly by, twiddling their shard-thumbs). If this is a concern, consider using the LoadBalancedSequentialShardAccessStrategy instead. This implementation receives a rotated view of your shards on each invocation, thereby distributing query load evenly.
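To make the idea of a "rotated view" concrete, here is a minimal sketch of one way such a rotation could work. It is an illustration under our own assumptions, not the code inside LoadBalancedSequentialShardAccessStrategy; the class name RotatingShardOrder and its next() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Hibernate Shards: each call to next()
// returns a copy of the shard list that starts one position later than the
// previous call, wrapping around at the end.
final class RotatingShardOrder<T> {
    private final List<T> shards;
    private final AtomicInteger nextStart = new AtomicInteger();

    RotatingShardOrder(List<T> shards) {
        this.shards = new ArrayList<>(shards);
    }

    List<T> next() {
        List<T> view = new ArrayList<>(shards);
        if (view.isEmpty()) {
            return view;
        }
        // floorMod keeps the offset non-negative even if the counter overflows.
        int offset = Math.floorMod(nextStart.getAndIncrement(), view.size());
        // A negative distance moves the element at index 'offset' to index 0.
        Collections.rotate(view, -offset);
        return view;
    }
}
```

With three shards, successive calls start at the first, second, and third shard in turn, so a row-limited query that is satisfied early does not always land on the same shard.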

3.2.2. ParallelShardAccessStrategy

ParallelShardAccessStrategy also behaves in the exact way that is implied by its name: queries are executed against your shards in parallel. When you use this implementation you need to provide a java.util.concurrent.ThreadPoolExecutor that is suitable for the performance and throughput needs of your application. Here's a simple example:

    ThreadFactory factory = new ThreadFactory() {
        public Thread newThread(Runnable r) {
            Thread t = Executors.defaultThreadFactory().newThread(r);
            t.setDaemon(true);
            return t;
        }
    };

    ThreadPoolExecutor exec =
        new ThreadPoolExecutor(
            10,
            50,
            60,
            TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),
            factory);

    return new ParallelShardAccessStrategy(exec);

Please note that these are just sample values - proper thread pool configuration is beyond the scope of this document.
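To illustrate what executing a query "in parallel" can look like, the following sketch fans one unit of work out to every shard through an executor and merges the results. It is not how ParallelShardAccessStrategy is implemented; ParallelFanOut and queryAll are hypothetical names, and each Callable stands in for a query against one shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hibernate Shards.
final class ParallelFanOut {
    // Runs every per-shard task on the executor and concatenates the
    // results in the order the tasks were supplied.
    static <T> List<T> queryAll(ExecutorService exec,
                                List<Callable<List<T>>> perShardQueries) {
        List<T> merged = new ArrayList<>();
        try {
            // invokeAll blocks until all tasks have completed and returns
            // the futures in the same order as the tasks.
            for (Future<List<T>> future : exec.invokeAll(perShardQueries)) {
                merged.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while querying shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A shard query failed", e.getCause());
        }
        return merged;
    }
}
```

Because every shard's task is submitted at once, the pool needs enough threads (or queue capacity) to accept one task per shard for each concurrent query; that is one of the things to weigh when choosing pool sizes.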

3.3. ShardSelectionStrategy

Hibernate Shards uses the ShardSelectionStrategy to determine the shard on which a new object should be created. It's entirely up to you to decide what you want your implementation of this interface to look like, but we've provided a round-robin implementation to get you started (RoundRobinShardSelectionStrategy). We expect many applications will want to implement attribute-based sharding, so for our example application that stores weather reports let's shard reports by the continents on which the reports originate:

public class WeatherReportShardSelectionStrategy implements ShardSelectionStrategy {
    public ShardId selectShardIdForNewObject(Object obj) {
        if(obj instanceof WeatherReport) {
            return ((WeatherReport)obj).getContinent().getShardId();
        }
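        // Fail loudly with a message so unexpected types are easy to diagnose.
        throw new IllegalArgumentException("Cannot select a shard for: " + obj);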
    }
}

It's important to note that if a multi-level object graph is being saved via Hibernate's cascading functionality, the ShardSelectionStrategy will only be consulted when saving the top-level object. All child objects will automatically be saved to the same shard as the parent. You may find your ShardSelectionStrategy easier to implement if you prevent developers from creating new objects at more than one level in your object hierarchy. You can accomplish this by making your ShardSelectionStrategy implementation aware of the top-level objects in your model and having it throw an exception if it encounters an object that is not in this set. If you do not wish to impose this restriction, that's fine. Just remember that if you're doing attribute-based shard selection, the attributes you use to make your decision need to be available on every object that gets passed to session.save().
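The restriction to top-level objects described above could be sketched as a small guard that a ShardSelectionStrategy implementation calls before choosing a shard. TopLevelEntityGuard and its check method are hypothetical names introduced for this illustration; they are not part of Hibernate Shards.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard, not part of Hibernate Shards: rejects any object whose
// class has not been registered as a top-level entity.
final class TopLevelEntityGuard {
    private final Set<Class<?>> topLevelTypes;

    TopLevelEntityGuard(Class<?>... topLevelTypes) {
        this.topLevelTypes =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(topLevelTypes)));
    }

    // Throws if obj is null or its exact class is not registered.
    // Assumption: an exact class match is enough; subclasses would need
    // to be registered separately.
    void check(Object obj) {
        if (obj == null || !topLevelTypes.contains(obj.getClass())) {
            throw new IllegalArgumentException(
                "Not a top-level entity: " + (obj == null ? "null" : obj.getClass().getName()));
        }
    }
}
```

In the weather report example, the guard would be constructed with WeatherReport.class and invoked at the start of selectShardIdForNewObject.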

3.4. ShardResolutionStrategy

Hibernate Shards uses the ShardResolutionStrategy to determine the set of shards on which an object with a given id might reside. Let's go back to our weather report application and suppose, for example, that each continent has a range of ids associated with it. Whenever we assign an id to a WeatherReport we pick one that falls within the legal range for the continent to which the WeatherReport belongs. Our ShardResolutionStrategy can use this information to identify which shard a WeatherReport resides on simply by looking at the id:

public class WeatherReportShardResolutionStrategy extends AllShardsShardResolutionStrategy {
    public WeatherReportShardResolutionStrategy(List<ShardId> shardIds) {
        super(shardIds);
    }

    public List<ShardId> selectShardIdsFromShardResolutionStrategyData(
            ShardResolutionStrategyData srsd) {
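        if(srsd.getEntityName().equals(WeatherReport.class.getName())) {
            // The method returns a List<ShardId>, so wrap the single shard id
            // (requires import java.util.Collections).
            return Collections.singletonList(
                Continent.getContinentByReportId(srsd.getId()).getShardId());
        }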
        return super.selectShardIdsFromShardResolutionStrategyData(srsd);
    }
}
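The listing above relies on a lookup from an id to the continent that owns it. One way to sketch such a range lookup is with a sorted map keyed by the first id of each range. IdRangeShardLookup, its methods, and the use of plain integers for shards are assumptions made for this illustration; any ranges you register are your own.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical range lookup, not part of Hibernate Shards.
final class IdRangeShardLookup {
    // Key: first id of a range. Value: the shard that owns every id from
    // that key up to (but not including) the next key.
    private final TreeMap<Long, Integer> rangeStartToShard = new TreeMap<>();

    void addRange(long firstId, int shardId) {
        rangeStartToShard.put(firstId, shardId);
    }

    int shardFor(long id) {
        // floorEntry finds the range with the greatest start that is <= id.
        Map.Entry<Long, Integer> entry = rangeStartToShard.floorEntry(id);
        if (entry == null) {
            throw new IllegalArgumentException("No range contains id " + id);
        }
        return entry.getValue();
    }
}
```

Since the last range is open-ended in this sketch, an id beyond the intended upper bound still resolves to the last shard; add an explicit upper limit if that matters for your data.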

It's worth pointing out that while we have not (yet) implemented a cache that maps entity name/id to shard, the ShardResolutionStrategy would be an excellent place to plug in such a cache.

Shard Resolution is tightly tied to ID Generation. If you select an ID Generator for your class that encodes the shard id in the id of the object, your ShardResolutionStrategy will never even be called. If you plan to only use ID Generators that encode the shard id in the ids of your object you should use AllShardsShardResolutionStrategy as your ShardResolutionStrategy.

3.5. ID Generation

Hibernate Shards supports any ID generation strategy; the only requirement is that object IDs have to be unique across all the shards. There are a few simple ID generation strategies which support this requirement:

  • Native ID generation - use Hibernate's native ID generation strategy, and configure your databases so that the IDs never collide. For example, if you are using identity ID generation, you have 5 databases across which you will evenly distribute the data, and you don't expect you will ever have more than 1 million records, you could configure database 0 to return IDs starting at 0, configure database 1 to return IDs starting at 200000, configure database 2 to return IDs starting at 400000, and so on. As long as your assumptions about the data are correct, the IDs of your objects would never collide.

  • Application-level UUID generation - by definition you don't have to worry about ID collisions, but you do need to be willing to deal with potentially unwieldy primary keys for your objects.

    Hibernate Shards provides an implementation of a simple, shard-aware UUID generator - ShardedUUIDGenerator.

  • Distributed hilo generation - the idea is to have a hilo table on only one shard, which ensures that the identifiers generated by the hi/lo algorithm are unique across all shards. Two main drawbacks of this approach are that the access to the hilo table can become the bottleneck in ID generation, and that storing the hilo table on a single database creates a single point of failure of the system.

    Hibernate Shards provides an implementation of a distributed hilo generation algorithm - ShardedTableHiLoGenerator. This implementation is based on org.hibernate.id.TableHiLoGenerator, so for information on the expected structure of the database table on which the implementation depends please see the documentation for this class.
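The arithmetic in the native ID generation example above (5 databases, at most 1 million records, starting IDs of 0, 200000, 400000, and so on) can be written down as a small helper. IdentityRanges and startingId are hypothetical names; how you actually configure the starting value depends on your database.

```java
// Hypothetical helper, not part of Hibernate Shards.
final class IdentityRanges {
    // Returns the first id that database dbIndex should hand out when
    // maxRecords ids are split evenly across dbCount databases.
    static long startingId(int dbIndex, int dbCount, long maxRecords) {
        if (dbCount <= 0 || dbIndex < 0 || dbIndex >= dbCount) {
            throw new IllegalArgumentException(
                "dbIndex must be in [0, dbCount): " + dbIndex + " of " + dbCount);
        }
        long idsPerDatabase = maxRecords / dbCount;
        return dbIndex * idsPerDatabase;
    }
}
```

Each database then owns a block of maxRecords / dbCount ids. If any one database outgrows its block, its ids run into the next database's block, which is why the assumptions about data volume and even distribution have to hold.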

ID generation is also tightly tied to shard resolution. The objective of shard resolution is to find the shard an object lives on, given the object's ID. There are two ways to accomplish this objective:

  • Use the ShardResolutionStrategy, described above

  • Encode the shard ID into the object ID during the ID generation, and retrieve the shard ID during shard resolution

The main advantage of encoding the shard ID into the object ID is that it enables Hibernate Shards to resolve the shard from the object's ID much faster without database lookups, cache lookups, etc. Hibernate Shards does not require any specific algorithm for encoding/decoding of the shard ID - all you have to do is use an ID generator that implements the ShardEncodingIdentifierGenerator interface. Of the two ID generators included with Hibernate Shards, the ShardedUUIDGenerator implements this interface.
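As an illustration of what encoding a shard ID into an object ID can mean, the sketch below packs a shard number into the high bits of a 64-bit id and reads it back with a shift. This is one possible scheme chosen for the example, not the algorithm used by ShardedUUIDGenerator, and it does not implement ShardEncodingIdentifierGenerator; ShardEncodedId, the 16/48 bit split, and all method names are assumptions.

```java
// Hypothetical encoding scheme, not part of Hibernate Shards.
final class ShardEncodedId {
    private static final int SHARD_BITS = 16;
    private static final int SEQUENCE_BITS = 64 - SHARD_BITS;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;
    // Leave the sign bit clear so encoded ids are never negative.
    private static final int MAX_SHARD = (1 << (SHARD_BITS - 1)) - 1;

    // Packs the shard id into the high bits and the per-shard sequence
    // number into the low bits.
    static long encode(int shardId, long sequence) {
        if (shardId < 0 || shardId > MAX_SHARD) {
            throw new IllegalArgumentException("shardId out of range: " + shardId);
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        return ((long) shardId << SEQUENCE_BITS) | sequence;
    }

    // Recovers the shard id with a shift; no database or cache lookup needed.
    static int extractShardId(long id) {
        return (int) (id >>> SEQUENCE_BITS);
    }

    static long extractSequence(long id) {
        return id & MAX_SEQUENCE;
    }
}
```

The trade-off of a scheme like this is that the shard count and the number of ids per shard are both capped by the bit split chosen up front.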