Chapter 5. Querying

5.1. Overview

Executing queries across shards can be hard. In this chapter we'll discuss what works, what doesn't, and what you can do to stay out of trouble.

5.2. Criteria

As we discuss in the chapter on Limitations, we do not yet have a complete implementation of the Hibernate Core API. This limitation applies to ShardedCriteriaImpl, which is a shard-aware implementation of the Criteria interface. In this chapter we won't go into the details of specific things that haven't been implemented. Rather, we're going to discuss the types of Criteria queries that are problematic in a sharded environment.

Simply put, queries that do sorting are trouble. Why? Because we can't return a properly sorted list without the ability to compare any value in the list to any other value in the list, and the entire list isn't available until the results of the individual queries have been collected in the application tier. The sorting needs to take place inside Hibernate Shards, and in order for this to happen we require that all objects returned by a Criteria query with an order-by clause implement the Comparable interface. If the type of the objects you return does not implement this interface, you'll receive an exception.
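To make the reasoning above concrete, here is a minimal sketch of an application-tier merge. It is not the Hibernate Shards implementation; the Reading and ShardMerge classes and the mergeSorted method are hypothetical names invented for this illustration. Each shard can only order its own rows, so the combined list has to be sorted again once every shard has answered, and that is only possible if the elements can be compared with one another.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entity used only for this illustration.
final class Reading implements Comparable<Reading> {
    final String city;
    final int temperature;

    Reading(String city, int temperature) {
        this.city = city;
        this.temperature = temperature;
    }

    // Natural ordering: ascending temperature.
    @Override
    public int compareTo(Reading other) {
        return Integer.compare(temperature, other.temperature);
    }
}

// Hypothetical helper, not part of any library.
final class ShardMerge {
    // Concatenates the per-shard result lists and sorts the combined list.
    // The bound on T is what forces the elements to be Comparable.
    static <T extends Comparable<? super T>> List<T> mergeSorted(List<List<T>> perShard) {
        List<T> all = new ArrayList<>();
        for (List<T> shardResults : perShard) {
            all.addAll(shardResults);
        }
        Collections.sort(all);
        return all;
    }
}
```

In this sketch the compiler rejects element types that are not Comparable. A framework that receives untyped query results can only discover the problem at runtime, which is consistent with the exception described above.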

Distinct clauses are trouble as well. So much trouble, in fact, that at the moment we don't even support them. Sorry about that.

On the other hand, while distinct and order-by are trouble, aggregation works just fine. Consider the following example:

        // fetch the average of all temperatures recorded since last Thursday
        Criteria crit = session.createCriteria(WeatherReport.class);
        crit.add(Restrictions.gt("timestamp", lastThursday));
        crit.setProjection(Projections.avg("temperature"));
        return crit.list();
            

In a single-shard environment this query can be easily answered, but in a multi-shard environment it's a little bit trickier. Why? Because just getting the average from each shard isn't enough to calculate the average across all shards. In order to calculate this piece of information we need not just the average but also the number of records from each shard. This is exactly what we do, and the performance hit (doing an extra count as part of each query) is probably negligible. Now, if we wanted the median we'd be in trouble (just adding the count to the query would not provide enough information to perform the calculation), but at the moment Criteria doesn't expose a median function, so we'll deal with that if and when it becomes an issue.
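The arithmetic behind this can be sketched as a count-weighted average. The ShardAverage and AverageCombiner classes below are hypothetical names for illustration only and do not reflect how Hibernate Shards structures its internals. The sketch assumes every shard reports both its local average and the number of rows that average was computed from.

```java
import java.util.List;

// Hypothetical holder for one shard's partial result.
final class ShardAverage {
    final double average; // average computed on this shard
    final long count;     // number of rows behind that average

    ShardAverage(double average, long count) {
        this.average = average;
        this.count = count;
    }
}

// Hypothetical helper, not part of any library.
final class AverageCombiner {
    // Weights each shard's average by its row count, then divides by the
    // total row count to get the average across all shards.
    static double combine(List<ShardAverage> parts) {
        double weightedSum = 0.0;
        long totalCount = 0;
        for (ShardAverage part : parts) {
            weightedSum += part.average * part.count;
            totalCount += part.count;
        }
        if (totalCount == 0) {
            throw new IllegalArgumentException("no rows on any shard");
        }
        return weightedSum / totalCount;
    }
}
```

For example, if one shard reports an average of 10 over 1 row and another reports an average of 20 over 3 rows, the combined average is (10 * 1 + 20 * 3) / 4 = 17.5, whereas naively averaging the two averages would give 15.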

5.3. HQL

Our support for HQL is, at this point, not nearly as good as the support we have for Criteria queries. We have not yet implemented any extensions to the query parser, so we don't support distinct, order-by, or aggregation. This means you can only use HQL for very simple queries. You're probably better off steering clear of HQL in this release if you can help it.

5.4. Use of Shard Strategy When Querying

The only component of your shard strategy that is consulted when executing a query (Criteria or HQL) is the ShardAccessStrategy. ShardSelectionStrategy is ignored because executing a query doesn't create any new records in the database. ShardResolutionStrategy is ignored because we currently assume that you always want your query executed on all shards. If this isn't the case, the best thing to do is just downcast your Session to a ShardedSession and dig out the shard-specific Sessions you need. Clunky, but it works. We'll come up with a better solution for this in later releases.