Hibernate.orgCommunity Documentation

Chapter 9. Spatial

With the spatial extensions you can combine full-text queries with distance restrictions, filter results based on distances or sort results on such a distance criteria.

The spatial support of Hibernate Search has the following goals:

For example, you might search for restaurants somewhere in a 2 km radius around your office.

In order to use the spatial extensions for an indexed entity, you need to add the @Spatial annotation (org.hibernate.search.annotations.Spatial) and specify one or more sets of coordinates.

9.1. Enable indexing of Spatial Coordinates

There are different techniques to index point coordinates. Hibernate Search Spatial offers a choice between two strategies:

  • index as numbers
  • index as labeled spatial hashes

We will now describe both methods, so you can make a suitable choice. You can pick a different strategy for each set of coordinates. The strategy is selected by specifying the spatialMode attribute of the @Spatial annotation.

9.1.1. Indexing coordinates for range queries

When setting the @Spatial.spatialMode attribute to SpatialMode.RANGE (which is the default) coordinates are indexed as numeric fields, so that range queries can be performed to narrow down the initial area of interest.

Pros:

  • Is quick on small data sets (< 100k entities)
  • Is very simple: straightforward to debug/analyze
  • Impact on index size is moderate

Cons:

  • Poor performance on large data sets
  • Poor performance if your data set is distributed across the whole world (for example when indexing points of interest in the United States, in Europe and in Asia, large areas collide because they share the same latitude. The latitude range query returns large amounts of data that need to be cross checked with those returned by the longitude range).

To index your entities for range querying you have to:

  • add the @Spatial annotation on your entity
  • add the @Latitude and @Longitude annotations on your properties representing the coordinates; these must be of type Double
Example 9.1. Sample Spatial indexing: Hotel class
import org.hibernate.search.annotations.*;

@Entity
@Indexed
@Spatial
public class Hotel {

  @Latitude
  Double latitude

  @Longitude
  Double longitude

  // ...

9.1.2. Indexing coordinates in a grid with spatial hashes

When setting @Spatial.spatialMode to SpatialMode.HASH the coordinates are encoded in several fields representing different zoom levels. Each box for each level is labeled so coordinates are assigned matching labels for each zoom level. This results in a grid encoding of labels called spatial hashes.

Pros :

  • Good performance even with large data sets
  • World wide data distribution independent

Cons :

  • Index size is larger: need to encode multiple labels per pair of coordinates

To index your entities you have to:

  • add the @Spatial annotation on the entity with the SpatialMode set to GRID : @Spatial(spatialMode = SpatialMode.HASH)
  • add the @Latitude and @Longitude annotations on the properties representing your coordinates; these must be of type Double
Example 9.2. Indexing coordinates in a grid using spatial hashes
@Spatial(spatialMode = SpatialMode.HASH)
@Indexed
@Entity
public class Hotel {

  @Latitude
  Double latitude;

  @Longitude
  Double longitude;

  // ...

9.1.3. Implementing the Coordinates interface

Instead of using the @Latitude and @Longitude annotations you can choose to implement the org.hibernate.search.spatial.Coordinates interface.

Example 9.3. Implementing the Coordinates interface
import org.hibernate.search.annotations.*;
import org.hibernate.search.spatial.Coordinates;

@Entity
@Indexed
@Spatial
public class Song implements Coordinates {

  @Id long id;
  double latitude;
  double longitude;
  // ...

  @Override
  Double getLatitude() {
    return latitude;
  }

  @Override
  Double getLongitude() {
    return longitude;
  }

  // ...

As we will see in the section Section 9.3, “Multiple Coordinate pairs”, an entity can have multiple @Spatial annotations; when having the entity implement Coordinates, the implemented methods refer to the default @Spatial annotation with the default pair of coordinates.

Tip

The default (field) name in case @Spatial is placed on the entity level is org.hibernate.search.annotations.Spatial.COORDINATES_DEFAULT_FIELD.

An alternative is to use properties implementing the Coordinates interface; this way you can have multiple Spatial instances:

Example 9.4. Using attributes of type Coordinates
@Entity
@Indexed
public class Event {
  @Id
  Integer id;

  @Field(store = Store.YES)
  String name;

  double latitude;
  double longitude;

  @Spatial(spatialMode = SpatialMode.HASH)
  public Coordinates getLocation() {
    return new Coordinates() {
      @Override
      public Double getLatitude() {
        return latitude;
      }

      @Override
      public Double getLongitude() {
        return longitude;
      }
    };
  }

// ...

When using this form the @Spatial.name automatically defaults to the property name. In the above case to location.

9.2. Performing Spatial Queries

You can use the Hibernate Search query DSL to build a query to search around a pair of coordinates (latitude, longitude) or around a bean implementing the Coordinates interface.

As with any full-text query, the spatial query creation flow looks like:

  1. retrieve a QueryBuilder from the SearchFactory
  2. use the DSL to build a spatial query, defining search center and radius
  3. optionally combine the resulting Query with other filters
  4. call the createFullTextQuery() and use the resulting query like any standard Hibernate or JPA query
Example 9.5. Search for an Hotel by distance
QueryBuilder builder = fullTextSession.getSearchFactory()
  .buildQueryBuilder().forEntity( Hotel.class ).get();

org.apache.lucene.search.Query luceneQuery = builder
  .spatial()
  .within( radius, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
  .createQuery();

org.hibernate.Query hibQuery = fullTextSession
  .createFullTextQuery( luceneQuery, Hotel.class );
List results = hibQuery.list();

Note

In the above example we did not explicitly specify the field name to use. The default coordinates field name was used implicitly. To target an alternative pair of coordinates at query time, we need to specify the field name as well. See Section 9.3, “Multiple Coordinate pairs”.

A fully working example can be found in the test-suite of the source code. Refer to SpatialIndexingTest.testSpatialAnnotationOnClassLevel() and its corresponding Hotel test class.

Alternatively to passing separate latitude and longitude values, you can also pass an instance implementing the Coordinates interface:

Example 9.6. DSL example with Coordinates
Coordinates coordinates = Point.fromDegrees(24d, 31.5d);
Query query = builder
  .spatial()
    .within( 51, Unit.KM )
      .ofCoordinates( coordinates )
  .createQuery();

List results = fullTextSession.createFullTextQuery( query, POI.class ).list();

9.2.1. Returning distance to query point in the search results

9.2.1.1. Returning distance to the center in the results

To retrieve the actual distance values (in kilometers) you need to use projection (see Section 5.1.3.5, “Projection”):

Example 9.7. Distance projection example
double centerLatitude = 24.0d;
double centerLongitude= 32.0d;

QueryBuilder builder = fullTextSession.getSearchFactory()
  .buildQueryBuilder().forEntity(POI.class).get();
org.apache.lucene.search.Query luceneQuery = builder
  .spatial()
     .onField("location")
     .within(100, Unit.KM)
       .ofLatitude(centerLatitude)
       .andLongitude(centerLongitude)
  .createQuery();

FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
hibQuery.setProjection(FullTextQuery.SPATIAL_DISTANCE, FullTextQuery.THIS);
hibQuery.setSpatialParameters(centerLatitude, centerLongitude, "location");
List results = hibQuery.list();

  • Use FullTextQuery.setProjection with FullTextQuery.SPATIAL_DISTANCE as one of the projected fields.
  • Call FullTextQuery.setSpatialParameters with the latitude, longitude and the name of the spatial field used to build the spatial query. Note that using coordinates different than the center used for the query will have unexpected results.

Tip

The default (field) name in case @Spatial is placed on the entity level is org.hibernate.search.annotations.Spatial.COORDINATES_DEFAULT_FIELD.

Distance projection and null values

When a spatial field on an entity has a null value for either its latitude or longitude (or both), the resulting projected distance will always be null.

9.2.1.2. Sorting by distance

To sort the results by distance to the center of the search you will have to build a Sort instance using Hibernate Search sort DSL:

Example 9.8. Distance sort example using the sort DSL
double centerLatitude = 24.0d;
double centerLongitude = 32.0d;

QueryBuilder builder = fullTextSession.getSearchFactory()
   .buildQueryBuilder().forEntity( POI.class ).get();
org.apache.lucene.search.Query luceneQuery = builder
  .spatial()
    .onField("location")
      .within(100, Unit.KM)
      .ofLatitude(centerLatitude)
      .andLongitude(centerLongitude)
  .createQuery();

FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
List results = query.list();
Sort distanceSort = qb
  .sort()
    .byDistance()
      .onField("location")
      .fromLatitude(centerLatitude)
      .andLongitude(centerLongitude)
    .createSort();
hibQuery.setSort(distanceSort);

The sort must be constructed using the same coordinates on the same spatial field used to build the spatial query, otherwise the sorting will occur with another center than the query. This repetition is needed to allow you to define Queries with any tool.

Sorting and null values

When a spatial field on an entity has a null value for either its latitude or longitude (or both):

  • if you are filtering the results using a distance query, the entity with missing coordinates will not appear in the query results and its rank in the sort is irrelevant.
  • otherwise, the resulting distance will always be the greatest possible value, which means the entity will be ranked last if the sort is ascending, or first if the sort is descending.

Alternatively, you may also use a DistanceSortField directly, as it was done before the introduction of Hibernate Search sort DSL:

Example 9.9. Distance sort example without using the sort DSL
double centerLatitude = 24.0d;
double centerLongitude = 32.0d;

QueryBuilder builder = fullTextSession.getSearchFactory()
   .buildQueryBuilder().forEntity( POI.class ).get();
org.apache.lucene.search.Query luceneQuery = builder
  .spatial()
    .onField("location")
      .within(100, Unit.KM)
      .ofLatitude(centerLatitude)
      .andLongitude(centerLongitude)
  .createQuery();

FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
Sort distanceSort = new Sort(
	new DistanceSortField(centerLatitude, centerLongitude, "location"));
hibQuery.setSort(distanceSort);

9.3. Multiple Coordinate pairs

You can associate multiple pairs of coordinates to the same entity, as long as each pair is uniquely identified by using a different name. This is achieved by stacking multiple @Spatial annotations within a single @Spatials annotation and specifying the name attribute on the individual @Spatial annotations.

Example 9.10. Multiple sets of coordinates
import org.hibernate.search.annotations.*;

@Entity
@Indexed
@Spatials({
  @Spatial,
  @Spatial(name="work",  spatialMode = SpatialMode.HASH)
})
public class UserEx {

  @Id
  Integer id;

  @Latitude
  Double homeLatitude;

  @Longitude
  Double homeLongitude;

  @Latitude(of="work")
  Double workLatitude;

  @Longitude(of="work")
  Double workLongitude;

To target an alternative pair of coordinates at query time, we need to specify the pair by name using onField(String):

Example 9.11. Querying on non-default coordinate set
QueryBuilder builder = fullTextSession.getSearchFactory()
  .buildQueryBuilder().forEntity( UserEx.class ).get();

org.apache.lucene.search.Query luceneQuery = builder
  .spatial()
  .onField( "work" )
  .within( radius, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
  .createQuery();

org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery( luceneQuery,
   Hotel.class );
List results = hibQuery.list();

9.4. Insight: implementation details of spatial hashes indexing

The following chapter is meant to provide a technical insight in spatial hash (grid) indexing. It discusses how coordinates are mapped to the index and how queries are implemented.

9.4.1. At indexing level

When Hibernate Search indexes an entity annotated with @Spatial, it instantiates a SpatialFieldBridge to transform the latitude and longitude fields accessed via the Coordinates interface to the multiple index fields stored in the Lucene index.

Principle of the spatial index: the spatial index used in Hibernate Search is a grid based spatial index where grid ids are hashes derived from latitude and longitude.

To make computations easier the latitude and longitude field values will be projected into a flat coordinate system with the help of a sinusoidal projection. Origin value space is :

[-90 → +90],]-180 →; 180]

for latitude,longitude coordinates and projected space is:

]-pi → +pi],[-pi/2 → +pi/2]

for Cartesian x,y coordinates (beware of fields order inversion: x is longitude and y is latitude).

The index is divided into n levels labeled from 0 to n-1.

At the level 0 the projected space is the whole Earth. At the level 1 the projected space is divided into 4 rectangles (called boxes as in bounding box):

[-pi,-pi/2]→[0,0], [-pi,0]→[0,+pi/2], [0,-pi/2]→[+pi,0] and [0,0]→[+pi,+pi/2]

At level n+1 each box of level n is divided into 4 new boxes and so on. The numbers of boxes at a given level is 4^n.

Each box is given an id, in this format: [Box index on the X axis]|[Box index on the Y axis]. To calculate the index of a box on an axis we divide the axis range in 2^n slots and find the slot the box belongs to. At the n level the indexes on an axis are from -(2^n)/2 to (2^n)/2. For instance, the 5th level has 4^5 = 1024 boxes with 32 indexes on each axis (32x32 is 1024) and the box of Id "0|8" is covering the [0,8/32*pi/2]→[1/32*pi,9/32*pi/2] rectangle is projected space.

Beware! The boxes are rectangles in projected space but the related area on Earth is not rectangular!

Now that we have all these boxes at all these levels, we index points "into" them.

For a point (lat,long) we calculate its projection (x,y) and then we calculate for each level of the spatial index, the ids of the boxes it belongs to.

At each level the point is in one and only one box. For points on the edges the box are considered exclusive n the left side and inclusive on the right i-e ]start,end] (the points are normalized before projection to [-90,+90],]-180,+180]).

We store in the Lucene document corresponding to the entity to index one field for each level of the spatial hash grid. The field is named: HSSI[n]. [spatial index fields name] is given either by the parameter at class level annotation or derived from the name of the spatial annotated method of the entity, HSSI stands for Hibernate Search Spatial Index and n is the level of the spatial hashes grid.

We also store the latitude and longitude as a numeric field under [spatial index fields name]_HSSI_Latitude and [spatial index fields name]_HSSI_Longitude fields. They will be used to filter precisely results by distance in the second stage of the search.

9.4.2. At search level

Now that we have all these fields, what are they used for?

When you ask for a spatial search by providing a search discus (center+radius) we will calculate the box ids that do cover the search discus in the projected space, fetch all the documents that belong to these boxes (thus narrowing the number of documents for which we will have to calculate distance to the center) and then filter this subset with a real distance calculation. This is called two level spatial filtering.

9.4.2.1. Step 1: Compute the best spatial hashes grid level for the search discus

For a given search radius there is an optimal hash grid level where the number of boxes to retrieve shall be minimal without bringing back to many documents (level 0 has only 1 box but retrieve all documents). The optimal hash grid level is the maximum level where the width of each box is larger than the search area. Near the equator line where projection deformation is minimal, this will lead to the retrieval of at most 4 boxes. Towards the poles where the deformation is more significant, it might need to examine more boxes but as the sinusoidal projection has a simple Tissot’s indicatrix (see Sinusoidal projection) in populated areas, the overhead is minimal.

9.4.2.2. Step 2: Compute ids of the corresponding covering boxes at that level

Now that we have chosen the optimal level, we can compute the ids of the boxes covering the search discus (which is not a discus in projected space anymore).

This is done by org.hibernate.search.spatial.impl.SpatialHelper.getSpatialHashCellsIds(Point center, double radius, int spatialHashLevel)

It will calculate the bounding box of the search discus and then call org.hibernate.search.spatial.impl.SpatialHelper.getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel) that will do the actual computation. If the bounding box crosses the meridian line it will cut the search in two and make two calls to getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel) with left and right parts of the box.

There are some geo related hacks (search radius too large, search radius crossing the poles) that are handled in bounding box computations done by Rectangle.fromBoundingCircle(Coordinates center, double radius) (see http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates for reference on those subjects).

The SpatialHelper.getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel) project the defining points of the bounding box and compute the boxes they belong to. It returns all the box Ids between the lower left to the upper right corners, thus covering the area.

9.4.2.3. Step 3: Lucene index lookup

The query is built with theses Ids searching for documents having a HSSI[n] (n the level found at Step 1) field valued with one of the ids of Step 2.

See also the implementation of org.hibernate.search.spatial.impl.SpatialHashFilter.

This query will return all documents in the boxes covering the projected bounding box of the search discus. So it is too large and needs refining. But we have narrowed the distance calculation problems to a subset of our data.

9.4.2.4. Step 4: Refine

A distance calculation filter is set after the Lucene index lookup query of Step 3 to exclude false candidates from the result list.

See SpatialQueryBuilderFromCoordinates.buildSpatialQuery(Coordinates center, double radius, String fieldName)