With the spatial extensions you can combine fulltext queries with distance restrictions, filter results based on distance, or sort results by distance.
The spatial support of Hibernate Search has the following goals:
For example, you might search for restaurants within a 2 km radius of your office.
In order to use the spatial extensions for an indexed entity, you need to add the @Spatial annotation (org.hibernate.search.annotations.Spatial) and specify one or more sets of coordinates.
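There are different techniques to index point coordinates. Hibernate Search Spatial offers a choice between two strategies: indexing coordinates as numeric fields for range queries (SpatialMode.RANGE), or indexing them in a grid of spatial hashes (SpatialMode.HASH).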
We will now describe both methods, so you can make a suitable choice. You can pick a different strategy for each set of coordinates. The strategy is selected by specifying the spatialMode attribute of the @Spatial annotation.
When setting the @Spatial.spatialMode attribute to SpatialMode.RANGE (which is the default), coordinates are indexed as numeric fields, so that range queries can be performed to narrow down the initial area of interest.
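The idea of narrowing the area of interest with numeric range checks can be sketched in plain Java, independently of Hibernate Search. This is only an illustration: the class and method names are hypothetical, a spherical Earth with a mean radius of 6371 km is assumed, and the poles and the ±180 meridian are deliberately ignored.

```java
/** Hypothetical illustration of a range-based prefilter; not Hibernate Search code. */
final class RangePrefilter {
    // Assumption: spherical Earth with a mean radius of 6371 km.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees for a search circle. */
    static double[] boundingBox(double centerLat, double centerLon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // search radius as an angle, in radians
        double latRad = Math.toRadians(centerLat);
        double deltaLat = Math.toDegrees(angular);
        // The longitude span widens with latitude; only valid away from the poles.
        double deltaLon = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(latRad)));
        return new double[] {
            centerLat - deltaLat, centerLat + deltaLat,
            centerLon - deltaLon, centerLon + deltaLon
        };
    }

    /** Two numeric range checks, comparable to range queries on two numeric fields. */
    static boolean inBox(double[] box, double lat, double lon) {
        return lat >= box[0] && lat <= box[1] && lon >= box[2] && lon <= box[3];
    }

    public static void main(String[] args) {
        double[] box = boundingBox(48.8566, 2.3522, 2.0); // 2 km around a sample point
        System.out.println(inBox(box, 48.8600, 2.3500));   // a point a few hundred meters away
        System.out.println(inBox(box, 48.9500, 2.3500));   // a point roughly 10 km north
    }
}
```

Points that pass both range checks are only candidates: the box is larger than the circle, so an exact distance check would still be needed afterwards.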
Pros:
Cons:
To index your entities for range querying you have to add the @Spatial annotation on your entity, and add the @Latitude and @Longitude annotations on the properties representing the coordinates; these must be of type Double.
Example 9.1. Sample Spatial indexing: Hotel class
import javax.persistence.Entity;
import org.hibernate.search.annotations.*;

@Entity
@Indexed
@Spatial
public class Hotel {

    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    // ...
}
When setting @Spatial.spatialMode to SpatialMode.HASH, the coordinates are encoded in several fields representing different zoom levels. Each box for each level is labeled, so coordinates are assigned matching labels for each zoom level. This results in a grid encoding of labels called spatial hashes.
Pros:
Cons:
To index your entities you have to add the @Spatial annotation on the entity with the SpatialMode set to HASH:
@Spatial(spatialMode = SpatialMode.HASH)
and add the @Latitude and @Longitude annotations on the properties representing your coordinates; these must be of type Double.
Example 9.2. Indexing coordinates in a grid using spatial hashes
@Spatial(spatialMode = SpatialMode.HASH)
@Indexed
@Entity
public class Hotel {

    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    // ...
}
Instead of using the @Latitude and @Longitude annotations you can choose to implement the org.hibernate.search.spatial.Coordinates interface.
Example 9.3. Implementing the Coordinates interface
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;
import org.hibernate.search.spatial.Coordinates;

@Entity
@Indexed
@Spatial
public class Song implements Coordinates {

    @Id
    long id;

    double latitude;
    double longitude;

    // ...

    // Interface methods must be public when implemented in a class.
    @Override
    public Double getLatitude() {
        return latitude;
    }

    @Override
    public Double getLongitude() {
        return longitude;
    }

    // ...
}
As we will see in Section 9.3, “Multiple Coordinate pairs”, an entity can have multiple @Spatial annotations; when the entity implements Coordinates, the implemented methods refer to the default @Spatial annotation with the default pair of coordinates.
The default (field) name in case @Spatial is placed on the entity level is org.hibernate.search.annotations.Spatial.COORDINATES_DEFAULT_FIELD.
An alternative is to use properties implementing the Coordinates interface; this way you can have multiple Spatial instances:
Example 9.4. Using attributes of type Coordinates
@Entity
@Indexed
public class Event {

    @Id
    Integer id;

    @Field(store = Store.YES)
    String name;

    double latitude;
    double longitude;

    @Spatial(spatialMode = SpatialMode.HASH)
    public Coordinates getLocation() {
        return new Coordinates() {
            @Override
            public Double getLatitude() {
                return latitude;
            }

            @Override
            public Double getLongitude() {
                return longitude;
            }
        };
    }

    // ...
}
When using this form the @Spatial.name automatically defaults to the property name; in the above case, to location.
You can use the Hibernate Search query DSL to build a query to search around a pair of coordinates (latitude, longitude) or around a bean implementing the Coordinates interface.
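As with any fulltext query, the spatial query creation flow looks like this: retrieve a QueryBuilder from the SearchFactory; use the DSL to build a spatial query, defining the search center and radius; optionally combine the resulting Query with other filters; then call createFullTextQuery() and use the resulting query like any standard Hibernate or JPA query.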
Example 9.5. Search for a Hotel by distance
QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( Hotel.class ).get();

org.apache.lucene.search.Query luceneQuery = builder
    .spatial()
    .within( radius, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
    .createQuery();

org.hibernate.Query hibQuery = fullTextSession
    .createFullTextQuery( luceneQuery, Hotel.class );
List results = hibQuery.list();
In the above example we did not explicitly specify the field name to use. The default coordinates field name was used implicitly. To target an alternative pair of coordinates at query time, we need to specify the field name as well. See Section 9.3, “Multiple Coordinate pairs”.
A fully working example can be found in the test-suite of the source code. Refer to SpatialIndexingTest.testSpatialAnnotationOnClassLevel() and its corresponding Hotel test class.
As an alternative to passing separate latitude and longitude values, you can also pass an instance implementing the Coordinates interface:
Example 9.6. DSL example with Coordinates
Coordinates coordinates = Point.fromDegrees(24d, 31.5d);

Query query = builder
    .spatial()
    .within( 51, Unit.KM )
    .ofCoordinates( coordinates )
    .createQuery();

List results = fullTextSession.createFullTextQuery( query, POI.class ).list();
To retrieve the actual distance values you need to use projection (see Section 5.1.3.5, “Projection”):
Example 9.7. Distance projection example
double centerLatitude = 24.0d;
double centerLongitude = 32.0d;

QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity(POI.class).get();

org.apache.lucene.search.Query luceneQuery = builder
    .spatial()
    .onField("location")
    .within(100, Unit.KM)
    .ofLatitude(centerLatitude)
    .andLongitude(centerLongitude)
    .createQuery();

FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
// Project the computed distance together with the entity itself.
hibQuery.setProjection(FullTextQuery.SPATIAL_DISTANCE, FullTextQuery.THIS);
// Same center and field as the spatial query above.
hibQuery.setSpatialParameters(centerLatitude, centerLongitude, "location");
List results = hibQuery.list();
To do so, call FullTextQuery.setProjection with FullTextQuery.SPATIAL_DISTANCE as one of the projected fields, and call FullTextQuery.setSpatialParameters with the latitude, longitude and the name of the spatial field used to build the spatial query. Note that using coordinates different from the center used for the query will have unexpected results.
The default (field) name in case @Spatial is placed on the entity level is org.hibernate.search.annotations.Spatial.COORDINATES_DEFAULT_FIELD.
Using distance projection on entities that are not @Spatial enabled, and/or with a non-spatial Query, will have unexpected results: entities that are not spatially indexed and/or have null values for latitude or longitude will be considered to be at (0,0)/(lat,0)/(0,long).
Using distance projection with a spatial query on spatially indexed entities that possibly have null values for latitude and/or longitude is safe, as they will not be found by the spatial query and won’t have a distance calculated.
To sort the results by distance to the center of the search you will have to build a Sort instance using a DistanceSortField:
Example 9.8. Distance sort example
double centerLatitude = 24.0d;
double centerLongitude = 32.0d;

QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( POI.class ).get();

org.apache.lucene.search.Query luceneQuery = builder
    .spatial()
    .onField("location")
    .within(100, Unit.KM)
    .ofLatitude(centerLatitude)
    .andLongitude(centerLongitude)
    .createQuery();

FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
// Same center and field as the spatial query above.
Sort distanceSort = new Sort(
    new DistanceSortField(centerLatitude, centerLongitude, "location"));
hibQuery.setSort(distanceSort);
The DistanceSortField must be constructed using the same coordinates and the same spatial field used to build the spatial query; otherwise the results will be sorted relative to a different center than the one used by the query. This repetition is needed to allow you to define Queries with any tool.
Using distance sort on entities that are not @Spatial enabled, and/or with a non-spatial Query, will also have unexpected results: entities that are not spatially indexed and/or have null values for latitude or longitude will be considered to be at (0,0)/(lat,0)/(0,long).
Using distance sort with a spatial query on spatially indexed entities that possibly have null values for latitude and/or longitude is safe, as they will not be found by the spatial query and so won’t be sorted.
You can associate multiple pairs of coordinates to the same entity, as long as each pair is uniquely identified by using a different name. This is achieved by stacking multiple @Spatial annotations within a single @Spatials annotation and specifying the name attribute on the individual @Spatial annotations.
Example 9.9. Multiple sets of coordinates
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;

@Entity
@Indexed
@Spatials({
    @Spatial,
    @Spatial(name="work", spatialMode = SpatialMode.HASH)
})
public class UserEx {

    @Id
    Integer id;

    @Latitude
    Double homeLatitude;

    @Longitude
    Double homeLongitude;

    @Latitude(of="work")
    Double workLatitude;

    @Longitude(of="work")
    Double workLongitude;

    // ...
}
To target an alternative pair of coordinates at query time, we need to specify the pair by name using onField(String):
Example 9.10. Querying on non-default coordinate set
QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( UserEx.class ).get();

org.apache.lucene.search.Query luceneQuery = builder
    .spatial()
    .onField( "work" )
    .within( radius, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
    .createQuery();

// Target the same entity type the QueryBuilder was created for.
org.hibernate.Query hibQuery = fullTextSession
    .createFullTextQuery( luceneQuery, UserEx.class );
List results = hibQuery.list();
The following chapter is meant to provide technical insight into spatial hash (grid) indexing. It discusses how coordinates are mapped to the index and how queries are implemented.
When Hibernate Search indexes an entity annotated with @Spatial, it instantiates a SpatialFieldBridge to transform the latitude and longitude fields accessed via the Coordinates interface to the multiple index fields stored in the Lucene index.
Principle of the spatial index: the spatial index used in Hibernate Search is a grid based spatial index where grid ids are hashes derived from latitude and longitude.
To make computations easier, the latitude and longitude field values are projected into a flat coordinate system with the help of a sinusoidal projection. The origin value space is:
[-90 → +90],]-180 → +180]
for latitude,longitude coordinates and the projected space is:
]-pi → +pi],[-pi/2 → +pi/2]
for Cartesian x,y coordinates (beware of the inverted field order: x is longitude and y is latitude).
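As an illustration, the textbook form of the sinusoidal projection can be written in a few lines of Java. This is a sketch based on the standard definition of the projection (x = longitude × cos(latitude), y = latitude, both in radians); the class and method names are hypothetical, and the code is not taken from Hibernate Search.

```java
/** Hypothetical illustration of a sinusoidal projection; not Hibernate Search code. */
final class SinusoidalProjection {

    /** Projects (latitude, longitude) in degrees to {x, y} in radians. */
    static double[] project(double latitudeDeg, double longitudeDeg) {
        double lat = Math.toRadians(latitudeDeg);
        double lon = Math.toRadians(longitudeDeg);
        double x = lon * Math.cos(lat); // longitude is compressed towards the poles
        double y = lat;                 // latitude is kept unchanged
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        double[] onEquator = project(0.0, 90.0);
        double[] at60North = project(60.0, 90.0);
        System.out.println(onEquator[0] + ", " + onEquator[1]);
        System.out.println(at60North[0] + ", " + at60North[1]);
    }
}
```

The same longitude yields a smaller x value at higher latitudes, which is why boxes that are rectangular in projected space do not correspond to rectangular areas on Earth.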
The index is divided into n levels labeled from 0 to n-1.
At the level 0 the projected space is the whole Earth. At the level 1 the projected space is divided into 4 rectangles (called boxes as in bounding box):
[-pi,-pi/2]→[0,0], [-pi,0]→[0,+pi/2], [0,-pi/2]→[+pi,0] and [0,0]→[+pi,+pi/2]
At level n+1 each box of level n is divided into 4 new boxes, and so on. The number of boxes at level n is 4^n.
Each box is given an id, in this format: [Box index on the X axis]|[Box index on the Y axis]. To calculate the index of a box on an axis we divide the axis range in 2^n slots and find the slot the box belongs to. At level n the indexes on an axis range from -(2^n)/2 to (2^n)/2. For instance, the 5th level has 4^5 = 1024 boxes with 32 indexes on each axis (32x32 is 1024) and the box with id "0|8" covers the [0,8/32*pi/2]→[1/32*pi,9/32*pi/2] rectangle in projected space.
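The id computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class and method names are hypothetical, the axis ranges are taken as 2*pi for x and pi for y, slots are found with a plain floor division, points lying exactly on a box edge get no special treatment, and the sketch is meant for levels of 1 and above.

```java
/** Hypothetical illustration of box id computation; not Hibernate Search code. */
final class SpatialHashCell {

    /** Index of the slot containing a coordinate, for an axis range centered on 0. */
    static int cellIndex(double coordinate, double axisRange, int level) {
        double cellWidth = axisRange / (1 << level); // 2^level slots per axis
        return (int) Math.floor(coordinate / cellWidth);
    }

    /** Box id in the "[X index]|[Y index]" format for a projected point (x, y). */
    static String cellId(double x, double y, int level) {
        int xIndex = cellIndex(x, 2 * Math.PI, level); // assumption: x spans a range of 2*pi
        int yIndex = cellIndex(y, Math.PI, level);     // assumption: y spans a range of pi
        return xIndex + "|" + yIndex;
    }

    public static void main(String[] args) {
        // The same projected point gets a different id at each level.
        for (int level = 1; level <= 5; level++) {
            System.out.println("level " + level + ": " + cellId(0.05, 0.8, level));
        }
    }
}
```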
Beware! The boxes are rectangles in projected space but the related area on Earth is not rectangular!
Now that we have all these boxes at all these levels, we index points "into" them.
For a point (lat,long) we calculate its projection (x,y) and then we calculate for each level of the spatial index, the ids of the boxes it belongs to.
At each level the point is in one and only one box. For points on the edges, the boxes are considered exclusive on the left side and inclusive on the right, i.e. ]start,end] (the points are normalized before projection to [-90,+90],]-180,+180]).
In the Lucene document corresponding to the indexed entity, we store one field for each level of the spatial hash grid. The field is named [spatial index fields name]_HSSI_[n]. [spatial index fields name] is given either by the parameter of the class level annotation or derived from the name of the spatial annotated method of the entity, HSSI stands for Hibernate Search Spatial Index, and n is the level of the spatial hash grid.
We also store the latitude and longitude as numeric fields under the [spatial index fields name]_HSSI_Latitude and [spatial index fields name]_HSSI_Longitude fields. They will be used to precisely filter results by distance in the second stage of the search.
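To make the resulting document layout more concrete, the sketch below builds the set of fields for a single point as a plain map. It is only an illustration: the class and method names are hypothetical, the per-level field name pattern (name, then `_HSSI_`, then the level) is an assumption modeled on the latitude and longitude field names given above, and the box ids are computed with a simple floor division that ignores edge cases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the per-document fields; not Hibernate Search code. */
final class SpatialHashFields {

    /** Box id "[X index]|[Y index]" for a projected point at a given level. */
    static String cellId(double x, double y, int level) {
        int slots = 1 << level; // 2^level slots per axis
        int xIndex = (int) Math.floor(x / (2 * Math.PI / slots));
        int yIndex = (int) Math.floor(y / (Math.PI / slots));
        return xIndex + "|" + yIndex;
    }

    /** One label field per level, plus the raw coordinates for exact distance checks. */
    static Map<String, Object> fieldsFor(String fieldName, double latDeg, double lonDeg, int levels) {
        double lat = Math.toRadians(latDeg);
        double x = Math.toRadians(lonDeg) * Math.cos(lat); // sinusoidal projection
        double y = lat;
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int level = 0; level < levels; level++) {
            // Assumption: level fields follow the same naming pattern as the coordinate fields.
            fields.put(fieldName + "_HSSI_" + level, cellId(x, y, level));
        }
        fields.put(fieldName + "_HSSI_Latitude", latDeg);
        fields.put(fieldName + "_HSSI_Longitude", lonDeg);
        return fields;
    }

    public static void main(String[] args) {
        fieldsFor("location", 24.0, 31.5, 4)
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```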
Now that we have all these fields, what are they used for?
When you ask for a spatial search by providing a search disc (center + radius), we calculate the ids of the boxes that cover the search disc in the projected space, fetch all the documents that belong to these boxes (thus narrowing the number of documents for which we have to calculate the distance to the center), and then filter this subset with a real distance calculation. This is called two-level spatial filtering.
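The two stages can be sketched in plain Java as follows. This is an illustration under assumptions rather than the actual implementation: all names are hypothetical, the set of covering box ids is passed in ready-made, the exact distance uses the haversine formula on a spherical Earth of radius 6371 km (the text does not say which formula the library uses), and the box ids in the demo were worked out by hand for level 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of two-level filtering; not Hibernate Search code. */
final class TwoLevelFilter {
    // Assumption: spherical Earth with a mean radius of 6371 km.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** A stored point together with the box id it was indexed under at the chosen level. */
    static final class Poi {
        final String name;
        final double lat;
        final double lon;
        final String cellId;

        Poi(String name, double lat, double lon, String cellId) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.cellId = cellId;
        }
    }

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static List<String> search(List<Poi> indexed, Set<String> coveringCellIds,
                               double centerLat, double centerLon, double radiusKm) {
        List<String> hits = new ArrayList<>();
        for (Poi poi : indexed) {
            // Stage 1: cheap label lookup, keeps only points in the covering boxes.
            if (!coveringCellIds.contains(poi.cellId)) {
                continue;
            }
            // Stage 2: exact distance check on the remaining candidates.
            if (distanceKm(centerLat, centerLon, poi.lat, poi.lon) <= radiusKm) {
                hits.add(poi.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Poi> indexed = Arrays.asList(
            new Poi("near", 24.5, 31.5, "2|4"),
            new Poi("same box, too far", 26.0, 31.5, "2|4"),
            new Poi("other box", 40.0, -3.0, "-1|7"));
        System.out.println(search(indexed, Collections.singleton("2|4"), 24.0, 31.5, 100.0));
    }
}
```

In the demo, one point is discarded by the first stage because its box is not in the covering set, and another shares a box with the center but is dropped by the distance check.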
For a given search radius there is an optimal hash grid level where the number of boxes to retrieve is minimal without bringing back too many documents (level 0 has only 1 box but retrieves all documents). The optimal hash grid level is the maximum level where the width of each box is larger than the search area. Near the equator, where projection deformation is minimal, this leads to the retrieval of at most 4 boxes. Towards the poles, where the deformation is more significant, more boxes might need to be examined, but as the sinusoidal projection has a simple Tissot’s indicatrix (see Sinusoidal projection) in populated areas, the overhead is minimal.
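One possible reading of this rule is sketched below. Several points are assumptions made for the illustration: the "search area" is interpreted as the diameter of the search circle, the box width is measured along the equator using a circumference of roughly 40075 km, and the `maxLevel` cap as well as all names are hypothetical.

```java
/** Hypothetical illustration of choosing a hash grid level; not Hibernate Search code. */
final class HashLevelChooser {
    // Assumption: approximate length of the equator in km.
    static final double EQUATOR_CIRCUMFERENCE_KM = 40075.0;

    /** Width in km, measured along the equator, of a box at the given level. */
    static double boxWidthKm(int level) {
        return EQUATOR_CIRCUMFERENCE_KM / (1L << level); // 2^level boxes along the x axis
    }

    /** Deepest level (up to maxLevel) whose boxes are still wider than the search diameter. */
    static int bestLevel(double radiusKm, int maxLevel) {
        double diameterKm = 2 * radiusKm;
        int level = 0;
        while (level < maxLevel && boxWidthKm(level + 1) > diameterKm) {
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        for (double radiusKm : new double[] { 1, 10, 100, 1000 }) {
            int level = bestLevel(radiusKm, 20);
            System.out.println(radiusKm + " km -> level " + level
                + " (box width " + boxWidthKm(level) + " km)");
        }
    }
}
```

With boxes wider than the search diameter, the circle can overlap at most two boxes per axis, which matches the "at most 4 boxes" figure given for the equator.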
Now that we have chosen the optimal level, we can compute the ids of the boxes covering the search disc (which is not a disc in projected space anymore).
This is done by org.hibernate.search.spatial.impl.SpatialHelper.getSpatialHashCellsIds(Point center, double radius, int spatialHashLevel). It will calculate the bounding box of the search disc and then call org.hibernate.search.spatial.impl.SpatialHelper.getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel), which will do the actual computation. If the bounding box crosses the meridian line, it will cut the search in two and make two calls to getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel) with the left and right parts of the box.
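The splitting step can be illustrated with a small sketch. It assumes that "the meridian line" refers to the ±180 meridian, and that a box crossing it is recognizable because its lower-left longitude is greater than its upper-right longitude. The class, method and array layout are hypothetical and are not the SpatialHelper API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of splitting a bounding box; not Hibernate Search code. */
final class MeridianSplit {

    /**
     * Each returned part is {minLat, minLon, maxLat, maxLon} in degrees.
     * Assumption: minLon > maxLon means the box wraps around the ±180 meridian.
     */
    static List<double[]> split(double minLat, double minLon, double maxLat, double maxLon) {
        List<double[]> parts = new ArrayList<>();
        if (minLon <= maxLon) {
            parts.add(new double[] { minLat, minLon, maxLat, maxLon });
        } else {
            parts.add(new double[] { minLat, minLon, maxLat, 180.0 });  // part ending at +180
            parts.add(new double[] { minLat, -180.0, maxLat, maxLon }); // part starting at -180
        }
        return parts;
    }

    public static void main(String[] args) {
        for (double[] part : split(10.0, 170.0, 20.0, -170.0)) {
            System.out.println(java.util.Arrays.toString(part));
        }
    }
}
```

Each part can then be processed like an ordinary box, and the resulting box ids merged.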
There are some geo related hacks (search radius too large, search radius crossing the poles) that are handled in bounding box computations done by Rectangle.fromBoundingCircle(Coordinates center, double radius) (see http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates for reference on those subjects).
The SpatialHelper.getSpatialHashCellsIds(Point lowerLeft, Point upperRight, int spatialHashLevel) method projects the defining points of the bounding box and computes the boxes they belong to. It returns all the box ids between the lower left and the upper right corners, thus covering the area.
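Enumerating the ids between two corners amounts to a nested loop over the index ranges on both axes. The sketch below takes corners that are already projected; its names are hypothetical, the axis ranges (2*pi for x, pi for y) and the floor-based slot lookup are assumptions, and it is not the SpatialHelper code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of enumerating covering boxes; not Hibernate Search code. */
final class CellRange {

    /** Index of the slot containing a coordinate, for an axis range centered on 0. */
    static int cellIndex(double coordinate, double axisRange, int level) {
        return (int) Math.floor(coordinate / (axisRange / (1 << level)));
    }

    /** All box ids from the lower-left to the upper-right projected corner, inclusive. */
    static List<String> cellIdsBetween(double lowX, double lowY,
                                       double highX, double highY, int level) {
        int startX = cellIndex(lowX, 2 * Math.PI, level);
        int endX = cellIndex(highX, 2 * Math.PI, level);
        int startY = cellIndex(lowY, Math.PI, level);
        int endY = cellIndex(highY, Math.PI, level);
        List<String> ids = new ArrayList<>();
        for (int x = startX; x <= endX; x++) {
            for (int y = startY; y <= endY; y++) {
                ids.add(x + "|" + y);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(cellIdsBetween(0.05, 0.8, 0.25, 0.95, 5));
    }
}
```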
The query is built with these ids, searching for documents having a HSSI[n] field (n being the level found at Step 1) valued with one of the ids of Step 2.
See also the implementation of org.hibernate.search.spatial.impl.SpatialHashFilter.
This query will return all documents in the boxes covering the projected bounding box of the search disc. The result set is therefore too large and needs refining, but the distance calculation has been narrowed down to a subset of the data.