Hibernate.org Community Documentation
With the Spatial extensions you can combine full-text queries with restrictions based on the distance from a point in space, filter results based on their distance from given coordinates, or sort results by such a distance criterion.
The spatial support of Hibernate Search has a few goals:
Enable spatial search on entities: find entities within x km of a location (latitude, longitude) on Earth
Provide an easy way to enable spatial indexing via expressive annotations
Provide a simple way to query
Hide geographical complexity
For example, you might search for that Italian place named approximately "Il Ciociaro" that is somewhere within 2 km of your office.
To be able to filter an @Indexed @Entity on a distance criterion, you need to add the @Spatial annotation (org.hibernate.search.annotations.Spatial) and specify one or more sets of coordinates.
There are different techniques to index point coordinates; in particular, Hibernate Search Spatial offers a choice between two strategies:
as numbers formatted for range queries
in Quad-Tree labels for two stage spatial queries
We will now describe both methods so you can make a suitable choice; of course, you can pick a different strategy for each set of coordinates.
These strategies are selected by specifying spatialMode, an attribute of the @Spatial annotation.
When setting the @Spatial.spatialMode attribute to SpatialMode.RANGE (which is the default), coordinates are indexed as numeric fields, so that range queries can be performed to narrow down the initial area of interest.
Pros:
Is quick on small data sets (< 100k entities)
Is very simple: straightforward to debug/analyze
Impact on index size is moderate
Cons:
Poor performance on large data sets
Poor performance if your data set is distributed across the whole world (for example, when indexing points of interest in the United States, in Europe and in Asia, large areas collide because they share the same latitude: the latitude range query returns large amounts of data that need to be cross-checked with those returned by the longitude range query).
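To make the last drawback concrete, here is a minimal in-memory sketch, assuming RANGE mode behaves like two independent numeric range conditions (one on latitude, one on longitude) whose matches are intersected. It is not Hibernate Search code: the class, the Poi holder and the method names are hypothetical, and the coordinates are approximate and for illustration only. It counts how many points match the latitude range alone versus both ranges.

```java
import java.util.List;

// Hypothetical illustration of RANGE mode filtering, not Hibernate Search code:
// latitude and longitude behave like two independent numeric range conditions.
final class RangeModeSketch {

    static final class Poi {
        final String name;
        final double latitude;
        final double longitude;

        Poi(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // What the latitude range condition alone matches.
    static boolean inLatitudeRange(Poi p, double minLat, double maxLat) {
        return p.latitude >= minLat && p.latitude <= maxLat;
    }

    // What the longitude range condition alone matches.
    static boolean inLongitudeRange(Poi p, double minLon, double maxLon) {
        return p.longitude >= minLon && p.longitude <= maxLon;
    }

    // Returns {matches of the latitude range alone, matches of both ranges}.
    static long[] countMatches(List<Poi> pois, double minLat, double maxLat,
                               double minLon, double maxLon) {
        long latitudeOnly = pois.stream()
                .filter(p -> inLatitudeRange(p, minLat, maxLat))
                .count();
        long both = pois.stream()
                .filter(p -> inLatitudeRange(p, minLat, maxLat) && inLongitudeRange(p, minLon, maxLon))
                .count();
        return new long[] { latitudeOnly, both };
    }

    public static void main(String[] args) {
        // Approximate coordinates, for illustration only.
        List<Poi> pois = List.of(
                new Poi("New York", 40.71, -74.01),
                new Poi("Madrid", 40.42, -3.70),
                new Poi("Beijing", 39.90, 116.40),
                new Poi("Rome", 41.90, 12.50));
        // A window of about one degree around Madrid.
        long[] counts = countMatches(pois, 39.42, 41.42, -4.70, -2.70);
        System.out.println("latitude range alone: " + counts[0] + ", both ranges: " + counts[1]);
    }
}
```

With a window of about one degree around Madrid, the latitude range alone also matches New York and Beijing, which only the longitude range then discards.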
To index your entities for range querying you have to:
add the @Spatial annotation on your entity
add the @Latitude and @Longitude annotations on your properties representing the coordinates; these must be of type Double
Example 9.1. Sample Spatial indexing: Hotel class
import javax.persistence.Entity;
import org.hibernate.search.annotations.*;

@Spatial @Indexed @Entity
public class Hotel {
    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    [...]
}
When setting @Spatial.spatialMode to SpatialMode.GRID, the coordinates are encoded in several fields representing different zoom levels. Each box for each level is labelled, so coordinates are assigned the matching labels for each zoom level. This results in a tree encoding of labels called a quad tree.
Pros:
Good performance even with large data sets
Independent of worldwide data distribution
Cons:
Index size is larger: multiple labels need to be encoded per pair of coordinates
To index your entities you have to:
add the @Spatial annotation on the entity with the SpatialMode set to GRID: @Spatial(spatialMode = SpatialMode.GRID)
add the @Latitude and @Longitude annotations on the properties representing your coordinates; these must be of type Double
Example 9.2. Indexing coordinates in a Grid using Quad Trees
import javax.persistence.Entity;
import org.hibernate.search.annotations.*;

@Spatial(spatialMode = SpatialMode.GRID)
@Indexed
@Entity
public class Hotel {
    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    [...]
}
Instead of using the @Latitude and @Longitude annotations, you can choose to implement the org.hibernate.search.spatial.Coordinates interface.
Example 9.3. Implementing the Coordinates interface
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;
import org.hibernate.search.spatial.Coordinates;

@Spatial @Indexed @Entity
public class Song implements Coordinates {
    @Id long id;
    double latitude;
    double longitude;
    [...]

    // Interface methods must be public
    @Override
    public Double getLatitude() {
        return latitude;
    }

    @Override
    public Double getLongitude() {
        return longitude;
    }
    [...]
}
As we will see in Section 9.3, “Multiple Coordinate pairs”, a @Spatial @Indexed @Entity can have multiple @Spatial annotations; when the entity itself implements Coordinates, the implemented methods refer to the default @Spatial name, i.e. the default pair of coordinates.
An alternative is to use properties implementing the Coordinates
interface; this way you can have multiple Spatial instances:
Example 9.4. Using attributes of type Coordinates
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;
import org.hibernate.search.spatial.Coordinates;

@Indexed @Entity
public class Event {
    @Id
    Integer id;

    @Field(store = Store.YES)
    String name;

    double latitude;
    double longitude;

    @Spatial(spatialMode = SpatialMode.GRID)
    public Coordinates getLocation() {
        return new Coordinates() {
            @Override
            public Double getLatitude() {
                return latitude;
            }

            @Override
            public Double getLongitude() {
                return longitude;
            }
        };
    }

    [...]
}
When using this form, @Spatial.name automatically defaults to the property name.
The Hibernate Search DSL has been extended to support the spatial
feature. You can build a query to search around a pair of coordinates
(latitude,longitude) or around a bean implementing the Coordinates
interface.
As with any full-text query, for spatial queries you:
retrieve a QueryBuilder from the SearchFactory as a starting point
use the DSL to build a spatial query with your search center and radius
optionally combine the resulting Query with other filters
call createFullTextQuery() and run the result as any standard Hibernate or JPA Query
Example 9.5. Search for an Hotel by distance
QueryBuilder builder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity( Hotel.class ).get();
org.apache.lucene.search.Query luceneQuery = builder.spatial()
.onDefaultCoordinates()
.within( radius, Unit.KM )
.ofLatitude( centerLatitude )
.andLongitude( centerLongitude )
.createQuery();
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery( luceneQuery,
Hotel.class );
List results = hibQuery.list();
A fully working example can be found in the source code, in the testsuite. See SpatialIndexingTest.testSpatialAnnotationOnClassLevel() and the Hotel class.
As an alternative to passing separate values for latitude and longitude, you can also pass an object implementing the Coordinates interface:
Example 9.6. DSL example with Coordinates
Coordinates coordinates = Point.fromDegrees(24d, 31.5d);
Query query = builder
.spatial()
.onCoordinates( "location" )
.within( 51, Unit.KM )
.ofCoordinates( coordinates )
.createQuery();
List results = fullTextSession.createFullTextQuery( query, POI.class ).list();
To get the distance to the center of the search returned with the results, you just need to project it:
Example 9.7. Distance projection example
double centerLatitude = 24.0d;
double centerLongitude= 32.0d;
QueryBuilder builder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(POI.class).get();
org.apache.lucene.search.Query luceneQuery = builder.spatial()
.onCoordinates("location")
.within(100, Unit.KM)
.ofLatitude(centerLatitude)
.andLongitude(centerLongitude)
.createQuery();
FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
hibQuery.setProjection(FullTextQuery.SPATIAL_DISTANCE, FullTextQuery.THIS);
hibQuery.setSpatialParameters(centerLatitude, centerLongitude, "location");
List results = hibQuery.list();
Use FullTextQuery.setProjection with FullTextQuery.SPATIAL_DISTANCE as one of the projected fields.
Call FullTextQuery.setSpatialParameters with the latitude, longitude and the name of the spatial field used to build the spatial query. Note that using coordinates different from the center used for the query will have unexpected results.
Using distance projection on non-@Spatial-enabled entities and/or with a non-spatial Query will have unexpected results, as entities that are not spatially indexed and/or have null values for latitude or longitude will be considered to be at (0,0)/(lat,0)/(0,long).
Using distance projection with a spatial query on spatially indexed entities that possibly have null values for latitude and/or longitude is safe, as they will not be found by the spatial query and won't have a distance calculated.
To sort the results by distance to the center of the search, you will have to build a Sort object using a DistanceSortField:
Example 9.8. Distance sort example
double centerLatitude = 24.0d;
double centerLongitude = 32.0d;
QueryBuilder builder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity( POI.class ).get();
org.apache.lucene.search.Query luceneQuery = builder.spatial()
.onCoordinates("location")
.within(100, Unit.KM)
.ofLatitude(centerLatitude)
.andLongitude(centerLongitude)
.createQuery();
FullTextQuery hibQuery = fullTextSession.createFullTextQuery(luceneQuery, POI.class);
Sort distanceSort = new Sort(
new DistanceSortField(centerLatitude, centerLongitude, "location"));
hibQuery.setSort(distanceSort);
The DistanceSortField must be constructed using the same coordinates on the same spatial field used to build the spatial query; otherwise, the sorting will occur around a different center than the query's. This repetition is needed to allow you to define Queries with any tool.
Using distance sort on non-@Spatial-enabled entities and/or with a non-spatial Query will also have unexpected results, as entities that are not spatially indexed and/or have null values for latitude or longitude will be considered to be at (0,0)/(lat,0)/(0,long).
Using distance sort with a spatial query on spatially indexed entities that potentially have null values for latitude and/or longitude is safe, as they will not be found by the spatial query and so won't be sorted.
You can associate multiple pairs of coordinates to the same entity, as long as each pair is uniquely identified by using a different name. This is achieved by stacking multiple @Spatial annotations in a @Spatials annotation and specifying the name attribute on the @Spatial annotation.
Example 9.9. Multiple sets of coordinates
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.*;

@Spatials({
    @Spatial,
    @Spatial(name="work", spatialMode = SpatialMode.GRID)
})
@Entity
@Indexed
public class UserEx {
    @Id
    Integer id;

    @Latitude
    Double homeLatitude;

    @Longitude
    Double homeLongitude;

    @Latitude(of="work")
    Double workLatitude;

    @Longitude(of="work")
    Double workLongitude;

    [...]
}
In Example 9.5, “Search for an Hotel by distance” we used onDefaultCoordinates(), which points to the coordinates defined by a @Spatial annotation whose name attribute was not specified.
To target an alternative pair of coordinates at query time, we need to specify the pair by name using onCoordinates(String) instead of onDefaultCoordinates():
Example 9.10. Querying on non-default coordinate set
QueryBuilder builder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity( UserEx.class ).get();
org.apache.lucene.search.Query luceneQuery = builder.spatial()
.onCoordinates( "work" )
.within( radius, Unit.KM )
.ofLatitude( centerLatitude )
.andLongitude( centerLongitude )
.createQuery();
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery( luceneQuery,
UserEx.class );
List results = hibQuery.list();
This section provides technical insight into quad-tree (grid) indexing: how coordinates are mapped to the index and how queries are implemented.
When Hibernate Search indexes an entity annotated with @Spatial, it instantiates a SpatialFieldBridge to transform the latitude and longitude fields, accessed via the Coordinates interface, into the multiple index fields stored in the Lucene index.
Principle of the spatial index: the spatial index used in Hibernate Search is a QuadTree (http://en.wikipedia.org/wiki/Quadtree).
To make computations in a flat coordinate system, the latitude and longitude field values are projected with a sinusoidal projection (http://en.wikipedia.org/wiki/Sinusoidal_projection). The original value space is:
[-90 -> +90],]-180 -> +180]
for latitude,longitude coordinates, and the projected space is:
]-pi -> +pi],[-pi/2 -> +pi/2]
for cartesian x,y coordinates (beware of the field order inversion: x is longitude and y is latitude).
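A minimal sketch of this projection, assuming the standard unit-sphere sinusoidal formulas x = longitude * cos(latitude) and y = latitude, both in radians. These formulas produce exactly the projected ranges given above, but whether Hibernate Search computes them this way is an assumption; the class and method names are hypothetical.

```java
// Hypothetical sketch of a unit-sphere sinusoidal projection:
// x = longitude * cos(latitude), y = latitude, both in radians.
final class SinusoidalProjection {

    // Input must already be normalized to latitude in [-90, +90] and longitude in ]-180, +180].
    // Returns {x, y}: x is derived from longitude, y from latitude.
    static double[] project(double latitudeDeg, double longitudeDeg) {
        double lat = Math.toRadians(latitudeDeg);
        double lon = Math.toRadians(longitudeDeg);
        return new double[] { lon * Math.cos(lat), lat };
    }

    public static void main(String[] args) {
        double[] onEquator = project(0.0, 180.0);
        double[] at60North = project(60.0, 180.0);
        System.out.printf("equator: x=%.5f y=%.5f%n", onEquator[0], onEquator[1]);
        System.out.printf("60N:     x=%.5f y=%.5f%n", at60North[0], at60North[1]);
    }
}
```

A point on the equator at longitude 180 projects to x = pi, while at latitude 60 the same longitude only reaches x = pi/2 because cos(60°) = 0.5.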
The index is divided into n levels labeled from 0 to n-1.
At level 0 the projected space is the whole Earth. At level 1 the projected space is divided into 4 rectangles (called boxes, as in bounding box):
[-pi,-pi/2]->[0,0], [-pi,0]->[0,+pi/2], [0,-pi/2]->[+pi,0] and [0,0]->[+pi,+pi/2]
At level n+1 each box of level n is divided into 4 new boxes, and so on. The number of boxes at level n is 4^n.
Each box is given an id in this format: [Box index on the X axis]|[Box index on the Y axis]. To calculate the index of a box on an axis, we divide the axis range into 2^n slots and find the slot the box belongs to. At level n the indexes on an axis range from -(2^n)/2 to (2^n)/2. For instance, level 5 has 4^5 = 1024 boxes with 32 indexes on each axis (32x32 is 1024); since the x axis spans 2*pi and the y axis spans pi, each box at that level is 2/32*pi wide and 1/32*pi high, so the box with id "0|8" covers the [0,8/32*pi]->[2/32*pi,9/32*pi] rectangle in projected space.
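The layout can be checked with a small sketch, assuming the projected ranges given earlier (x spans 2*pi, y spans pi) and boxes aligned on the origin, as in the level 1 example. The class and method names are hypothetical; the sketch only derives a box's rectangle from its level and axis indexes, in the [xMin,yMin]->[xMax,yMax] notation used for level 1.

```java
// Hypothetical sketch of the box layout: projected x spans 2*pi, projected y spans pi,
// and level n splits each axis into 2^n slots aligned on the origin.
final class QuadTreeBoxes {

    // 4^n boxes at level n.
    static long boxCount(int level) {
        return 1L << (2 * level);
    }

    // 2^n slots on each axis at level n.
    static int slotsPerAxis(int level) {
        return 1 << level;
    }

    // Rectangle of box "xIndex|yIndex" as {xMin, yMin, xMax, yMax} in projected space.
    static double[] bounds(int level, int xIndex, int yIndex) {
        double width = 2 * Math.PI / slotsPerAxis(level);   // slot width on the x axis
        double height = Math.PI / slotsPerAxis(level);      // slot height on the y axis
        return new double[] {
            xIndex * width, yIndex * height,
            (xIndex + 1) * width, (yIndex + 1) * height
        };
    }

    static String id(int xIndex, int yIndex) {
        return xIndex + "|" + yIndex;
    }

    public static void main(String[] args) {
        double[] box = bounds(5, 0, 8);
        System.out.printf("%s: [%.4f,%.4f]->[%.4f,%.4f]%n", id(0, 8), box[0], box[1], box[2], box[3]);
    }
}
```

For instance, bounds(1, -1, -1) gives the [-pi,-pi/2]->[0,0] box listed for level 1.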
Beware! The boxes are rectangles in projected space but the related area on Earth is not a rectangle!
Now that we have all these boxes at all these levels, we index points "into" them.
For a point (lat,long) we calculate its projection (x,y) and then we calculate for each level of the spatial index, the ids of the boxes it belongs to.
At each level the point is in one and only one box. For points on the edges, the boxes are considered exclusive on the left side and inclusive on the right, i.e. ]start,end] (the points are normalized to [-90,+90],]-180,+180] before projection).
We store in the Lucene document corresponding to the entity one field for each level of the quad tree. The field is named [spatial index fields name]_HSSI_[n]. [spatial index fields name] is given either by the parameter of the class-level annotation or derived from the name of the spatially annotated method of the entity; HSSI stands for Hibernate Search Spatial Index, and n is the level of the quad tree.
We also store the latitude and longitude as Numeric fields under [spatial index fields name]_HSSI_Latitude and [spatial index fields name]_HSSI_Longitude. They are used to filter results precisely by distance in the second stage of the search.
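The per-level fields can be sketched as follows. This is a hypothetical illustration, not SpatialFieldBridge: it takes an already projected point (x, y), applies the ]start,end] slot convention described above, and returns the [spatial index fields name]_HSSI_[n] field names with their "xIndex|yIndex" values; the _HSSI_Latitude and _HSSI_Longitude numeric fields are left out. Clamping the lowest index keeps points on the lower edge (such as y = -pi/2) in the first slot, and at level 0 it folds every point into the single box "0|0".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the per-level quad-tree fields; not SpatialFieldBridge.
final class QuadTreeFieldsSketch {

    // Slot index on an axis of the given range, using the ]start,end] convention.
    // The result is clamped to the first slot, -(2^n)/2, so points on the lower edge
    // (for example y = -pi/2) stay inside; at level 0 every point ends up in slot 0.
    static int cellIndex(double coordinate, double axisRange, int level) {
        int index = (int) Math.ceil(coordinate * (1 << level) / axisRange) - 1;
        return Math.max(index, -(1 << level) / 2);
    }

    // One "[name]_HSSI_[n]" field per level n in [0, levels - 1] for a projected point (x, y).
    static Map<String, String> fields(String spatialName, double x, double y, int levels) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int n = 0; n < levels; n++) {
            int xIndex = cellIndex(x, 2 * Math.PI, n);   // x spans ]-pi, +pi]
            int yIndex = cellIndex(y, Math.PI, n);       // y spans [-pi/2, +pi/2]
            fields.put(spatialName + "_HSSI_" + n, xIndex + "|" + yIndex);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical projected point and spatial field name.
        fields("location", 0.1, 0.8, 6).forEach((name, id) -> System.out.println(name + " = " + id));
    }
}
```

For the projected point (0.1, 0.8), the level 5 field location_HSSI_5 holds "0|8".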
Now that we have all these fields, what are they used for?
When you ask for a spatial search by providing a search disc (center + radius), we calculate the ids of the boxes that cover the search disc in the projected space, fetch all the documents that belong to these boxes (thus narrowing the number of documents for which we have to calculate the distance to the center), and then filter this subset with a real distance calculation. This is called two-level spatial filtering.
For a given search radius there is an optimal quad tree level where the number of boxes to retrieve is minimal without bringing back too many documents (level 0 has only 1 box but retrieves all documents). The optimal quad tree level is the maximum level where the width of each box is larger than the search area. Near the equator, where projection deformation is minimal, this leads to the retrieval of at most 4 boxes. Towards the poles, where the deformation is more significant, more boxes might need to be examined, but as the sinusoidal projection has a simple Tissot's indicatrix (see http://en.wikipedia.org/wiki/Sinusoidal_projection) in populated areas, the overhead is minimal.
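One way to express this level choice, as a hypothetical sketch: it assumes the box width is measured along the equator (circumference / 2^n, since the x axis spans 2*pi) and reads "larger than the search area" as "at least the search diameter". Neither assumption, nor the 40075.017 km equatorial circumference constant, is taken from the Hibernate Search sources, and the class and method names are illustrative.

```java
// Hypothetical sketch of picking the quad-tree level for a search radius.
final class BestLevelSketch {

    // Assumed equatorial circumference (WGS 84 value), in km.
    static final double EARTH_EQUATOR_CIRCUMFERENCE_KM = 40075.017;

    // Largest level n such that a box, circumference / 2^n wide at the equator,
    // is at least as wide as the search diameter; clamped to [0, maxLevel].
    static int bestLevel(double radiusKm, int maxLevel) {
        double ratio = EARTH_EQUATOR_CIRCUMFERENCE_KM / (2 * radiusKm);
        int level = (int) Math.floor(Math.log(ratio) / Math.log(2));
        return Math.max(0, Math.min(level, maxLevel));
    }

    public static void main(String[] args) {
        for (double radius : new double[] { 1, 51, 500, 5000 }) {
            System.out.println(radius + " km -> level " + bestLevel(radius, 16));
        }
    }
}
```

With a 51 km radius this returns level 8, whose boxes are about 156 km wide at the equator; level 9 boxes (about 78 km) would be narrower than the 102 km search diameter.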
Now that we have chosen the optimal level, we can compute the ids of the boxes covering the search disc (which is no longer a disc in projected space).
This is done by org.hibernate.search.spatial.impl.SpatialHelper.getQuadTreeCellsIds(Point center, double radius, int quadTreeLevel).
It calculates the bounding box of the search disc and then calls org.hibernate.search.spatial.impl.SpatialHelper.getQuadTreeCellsIds(Point lowerLeft, Point upperRight, int quadTreeLevel), which does the actual computation. If the bounding box crosses the meridian line, it cuts the search in two and makes two calls to getQuadTreeCellsIds(Point lowerLeft, Point upperRight, int quadTreeLevel) with the left and right parts of the box.
There are some geo-related hacks (search radius too large, search radius crossing the poles) that are handled in the bounding box computations done by Rectangle.fromBoundingCircle(Point center, double radius) (see http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates for reference on those subjects).
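The bounding-box step can be sketched with the method from the article referenced above, assuming a spherical Earth with a mean radius of 6371 km. This is not Rectangle.fromBoundingCircle itself; the class and method names are hypothetical. It handles the two cases mentioned: a circle that contains a pole (all longitudes are then covered) and a circle crossing the 180th meridian (the returned minimum longitude is then greater than the maximum).

```java
// Hypothetical sketch of a bounding box around a circle on a sphere,
// following the method described in the referenced article.
final class BoundingBoxSketch {

    // Assumed mean Earth radius, in km.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0;

    // Center in degrees, radius in km; returns {minLat, minLon, maxLat, maxLon} in degrees.
    static double[] fromBoundingCircle(double latDeg, double lonDeg, double radiusKm) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double angular = radiusKm / EARTH_MEAN_RADIUS_KM;   // angular radius in radians

        double minLat = lat - angular;
        double maxLat = lat + angular;
        double minLon;
        double maxLon;
        if (minLat > -Math.PI / 2 && maxLat < Math.PI / 2) {
            // Here sin(angular) < cos(lat), so asin's argument stays below 1.
            double deltaLon = Math.asin(Math.sin(angular) / Math.cos(lat));
            minLon = lon - deltaLon;
            maxLon = lon + deltaLon;
            // Wrap around the 180th meridian; the box then has minLon > maxLon.
            if (minLon < -Math.PI) {
                minLon += 2 * Math.PI;
            }
            if (maxLon > Math.PI) {
                maxLon -= 2 * Math.PI;
            }
        } else {
            // The circle contains a pole: clamp latitudes and cover all longitudes.
            minLat = Math.max(minLat, -Math.PI / 2);
            maxLat = Math.min(maxLat, Math.PI / 2);
            minLon = -Math.PI;
            maxLon = Math.PI;
        }
        return new double[] {
            Math.toDegrees(minLat), Math.toDegrees(minLon),
            Math.toDegrees(maxLat), Math.toDegrees(maxLon)
        };
    }
}
```

Near the equator a radius of one degree of arc (about 111 km) yields a box of plus or minus one degree on both axes.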
SpatialHelper.getQuadTreeCellsIds(Point lowerLeft, Point upperRight, int quadTreeLevel) projects the defining points of the bounding box and computes the boxes they belong to. It returns all the box ids between the lower-left and upper-right corners, thus covering the area.
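The corner-to-ids step, including the split for boxes crossing the meridian described earlier, could look like the following hypothetical sketch. It is not the SpatialHelper implementation: it works on already projected corners, uses the ]start,end] slot convention, and assumes a box crosses the 180th meridian when its lower-left x is greater than its upper-right x.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of listing the box ids that cover a projected bounding box at one level.
final class CoveringCellsSketch {

    // Slot index on an axis, ]start,end] convention, clamped to the first slot -(2^n)/2.
    static int cellIndex(double coordinate, double axisRange, int level) {
        int index = (int) Math.ceil(coordinate * (1 << level) / axisRange) - 1;
        return Math.max(index, -(1 << level) / 2);
    }

    // Appends every "xIndex|yIndex" id between two corners of a box that does not cross the meridian.
    static void addIds(List<String> ids, double minX, double minY, double maxX, double maxY, int level) {
        int xStart = cellIndex(minX, 2 * Math.PI, level);   // x axis spans 2*pi
        int xEnd = cellIndex(maxX, 2 * Math.PI, level);
        int yStart = cellIndex(minY, Math.PI, level);       // y axis spans pi
        int yEnd = cellIndex(maxY, Math.PI, level);
        for (int xi = xStart; xi <= xEnd; xi++) {
            for (int yi = yStart; yi <= yEnd; yi++) {
                ids.add(xi + "|" + yi);
            }
        }
    }

    // Corners are projected coordinates in ]-pi, +pi] x [-pi/2, +pi/2].
    static List<String> cellIds(double lowerLeftX, double lowerLeftY,
                                double upperRightX, double upperRightY, int level) {
        List<String> ids = new ArrayList<>();
        if (lowerLeftX > upperRightX) {
            // Crosses the 180th meridian: cover the part up to +pi, then the part from -pi.
            addIds(ids, lowerLeftX, lowerLeftY, Math.PI, upperRightY, level);
            addIds(ids, -Math.PI, lowerLeftY, upperRightX, upperRightY, level);
        } else {
            addIds(ids, lowerLeftX, lowerLeftY, upperRightX, upperRightY, level);
        }
        return ids;
    }
}
```

At level 1, a small box around the origin yields the four ids -1|-1, -1|0, 0|-1 and 0|0.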
The Query is built with these ids to look up documents having a [spatial index fields name]_HSSI_[n] field (n being the level found at Step 1) valued with one of the ids of Step 2.
See also the implementation of org.hibernate.search.spatial.impl.QuadTreeFilter.
This Query will return all documents in the boxes covering the projected bounding box of the search disc. It is therefore too broad and needs refining, but we have narrowed the distance calculation problem down to a subset of our data.
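The whole two-level filtering can be mimicked in memory as below. This is a hypothetical sketch, not QuadTreeFilter: each document is reduced to its cell id at the chosen level plus its stored latitude and longitude, the first stage keeps documents whose cell id is among the covering ids, and the second stage computes the exact distance only for that subset. The haversine formula on a 6371 km sphere is an assumption of the sketch, not necessarily the formula Hibernate Search uses.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory sketch of two-level spatial filtering; not QuadTreeFilter.
final class TwoLevelFilterSketch {

    // Assumed mean Earth radius for the distance computation, in km.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0;

    // A document reduced to what the filter needs: its cell id at the chosen level
    // and its stored latitude/longitude.
    static final class Doc {
        final String id;
        final String cellId;
        final double latitude;
        final double longitude;

        Doc(String id, String cellId, double latitude, double longitude) {
            this.id = id;
            this.cellId = cellId;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Great-circle distance with the haversine formula, inputs in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    static List<String> search(List<Doc> docs, Set<String> coveringCellIds,
                               double centerLat, double centerLon, double radiusKm) {
        return docs.stream()
                // Stage 1: keep documents whose cell id is one of the covering ids.
                .filter(d -> coveringCellIds.contains(d.cellId))
                // Stage 2: exact distance, computed only for the remaining subset.
                .filter(d -> haversineKm(centerLat, centerLon, d.latitude, d.longitude) <= radiusKm)
                .map(d -> d.id)
                .collect(Collectors.toList());
    }
}
```

The design point mirrors the text: the cheap cell-id lookup discards most documents, so the trigonometry in the second stage only runs on the candidates from the covering boxes.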