Preface

Full text search engines like Apache Lucene are very powerful technologies to add efficient free text search capabilities to applications. However, Lucene suffers from several mismatches when dealing with object domain models. Amongst other things, indexes have to be kept up to date, and mismatches between the index structure and the domain model, as well as query mismatches, have to be avoided.

Hibernate Search addresses these shortcomings: it indexes your domain model with the help of a few annotations, takes care of database/index synchronization and brings back regular managed objects from free text queries.

To achieve this, Hibernate Search combines the power of Hibernate ORM and Apache Lucene/Elasticsearch/OpenSearch.

1. Getting started

This section will guide you through the initial steps required to integrate Hibernate Search into your application.

1.1. Compatibility

Table 1. Compatibility

Dependency | Version | Note
Java Runtime | 8, 11 or 17 |
Hibernate ORM (for the ORM mapper) | 5.6.12.Final |
JPA (Java EE) (for the ORM mapper) | 2.2 |
Jakarta Persistence (for the ORM mapper) | 3.0 | Need to use different Maven artifacts.
Apache Lucene (for the Lucene backend) | 8.11.1 |
Elasticsearch server (for the Elasticsearch backend) | 5.6, 6.8, 7.10 or 7.16 | Other minor versions (e.g. 6.0 or 7.0) may work but are not given priority for bugfixes and new features.
OpenSearch server (for the Elasticsearch backend) | 1.0 or 1.2 | Other minor versions may work but are not given priority for bugfixes and new features.

Find more information for all versions of Hibernate Search on our compatibility matrix.

The compatibility policy may also be of interest.

Elasticsearch 7.11+ licensing

While Elasticsearch up to 7.10 was distributed under the Apache License 2.0, be aware that Elasticsearch 7.11 and later are distributed under the Elastic License and the SSPL, which are not considered open-source by the Open Source Initiative.

Only the low-level Java REST client, which Hibernate Search depends on, remains open-source.

OpenSearch

While it historically targeted Elastic’s Elasticsearch distribution, Hibernate Search is also compatible with OpenSearch and regularly tested against it.

Every section of this documentation referring to Elasticsearch is also relevant for the OpenSearch distribution.

1.2. Migration notes

If you are upgrading an existing application from an earlier version of Hibernate Search to the latest release, make sure to check out the migration guide.

To Hibernate Search 5 users

If you pull our artifacts from a Maven repository, and you come from Hibernate Search 5, be aware that just bumping the version number will not be enough.

In particular, the group IDs changed from org.hibernate to org.hibernate.search, most of the artifact IDs changed to reflect the new mapper/backend design, and the Lucene integration now requires an explicit dependency instead of being available by default. Read Dependencies for more information.

Additionally, be aware that a lot of APIs have changed, some only because of a package change, others because of more fundamental changes (like moving away from using Lucene types in Hibernate Search APIs). For that reason, you are encouraged to migrate first to Hibernate Search 6.0 using the 6.0 migration guide, and only then to later versions (which will be significantly easier).

1.3. Framework support

1.3.1. Quarkus

Quarkus has an official extension for Hibernate Search with Elasticsearch. We recommend you follow Quarkus’s Hibernate Search Guide: it is a great hands-on introduction to Hibernate Search, and it covers the specifics of Quarkus (different dependencies, different configuration properties, …​).

1.3.2. Spring Boot

Hibernate Search can easily be integrated into a Spring Boot application. Just read about Spring Boot’s specifics below, then follow the getting started guide.

Configuration properties

application.properties/application.yaml are Spring Boot configuration files, not JPA or Hibernate Search configuration files. Adding Hibernate Search properties starting with hibernate.search. directly in that file will not work.

Instead, prefix your Hibernate Search properties with spring.jpa.properties., so that Spring Boot passes along the properties to Hibernate ORM, which will pass them along to Hibernate Search.

For example:

spring.jpa.properties.hibernate.search.backend.hosts = elasticsearch.mycompany.com
Dependency versions

Spring Boot automatically sets the version of many dependencies on your behalf. While this is ordinarily a good thing, the versions picked by Spring Boot are sometimes a little out of date. Thus, it is recommended to override Spring Boot’s defaults at least for some key dependencies.

With Maven, add this to your POM’s <properties>:

<properties>
    <hibernate.version>5.6.12.Final</hibernate.version>
    <elasticsearch.version>7.16.3</elasticsearch.version>
    <!-- ... plus any other properties of yours ... -->
</properties>

If, after setting the properties above, you still have problems (e.g. NoClassDefFoundError) with some of Hibernate Search’s dependencies, look for the version of that dependency in Spring Boot’s POM and Hibernate Search’s POM: there will probably be a mismatch, and generally overriding Spring Boot’s version to match Hibernate Search’s version will work fine.

Application hanging on startup

Spring Boot 2.3.x and above is affected by a bug that causes the application to hang on startup when using Hibernate Search, particularly when using custom components (custom bridges, analysis configurers, …​).

The problem, which is not limited to just Hibernate Search, has been reported, but hasn’t been fixed yet in Spring Boot 2.5.1.

As a workaround, you can set the property spring.data.jpa.repositories.bootstrap-mode to deferred or, if that doesn’t work, default. Interestingly, using @EnableJpaRepositories(bootstrapMode = BootstrapMode.DEFERRED) has been reported to work even in situations where setting spring.data.jpa.repositories.bootstrap-mode to deferred didn’t work.
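For illustration, here is what the two workarounds mentioned above could look like; the JpaConfig class name is just a placeholder for one of your own Spring configuration classes:

# application.properties: defer the bootstrap of Spring Data JPA repositories
spring.data.jpa.repositories.bootstrap-mode=deferred

// Or, equivalently, on a Spring configuration class:
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.data.repository.config.BootstrapMode;

@Configuration
@EnableJpaRepositories(bootstrapMode = BootstrapMode.DEFERRED)
public class JpaConfig {
}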

Alternatively, if you do not need dependency injection in your custom components, you can refer to those components with the prefix constructor: so that Hibernate Search doesn’t even try to use Spring to retrieve the components, and thus avoids the deadlock in Spring. See this section for more information.

1.3.3. Other

If your framework of choice is not mentioned in the previous sections, don’t worry: Hibernate Search works just fine with plenty of other frameworks.

Just skip right to the next section to try it out.

1.4. Architecture

For the sake of simplicity, this guide assumes we are building an application deployed as a single instance on a single node.

For more advanced setups, you are encouraged to have a look at the Examples of architectures.

1.5. Dependencies

The Hibernate Search artifacts can be found in Maven’s Central Repository.

If you do not want to, or cannot, fetch the JARs from a Maven repository, you can get them from the distribution bundle hosted at Sourceforge.

In order to use Hibernate Search, you will need at least two direct dependencies:

  • a dependency to the "mapper", which extracts data from your domain model and maps it to indexable documents;

  • and a dependency to the "backend", which allows indexing and searching these documents.

Below are the most common setups and matching dependencies for a quick start; read Architecture for more information.

Hibernate ORM + Lucene

Allows indexing of ORM entities in a single application node, storing the index on the local filesystem.

If you get Hibernate Search from Maven, use these dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.1.8.Final</version>
</dependency>

If you get Hibernate Search from the distribution bundle, copy the JARs from dist/engine, dist/mapper/orm, dist/backend/lucene, and their respective lib subdirectories.

Hibernate ORM + Elasticsearch (or OpenSearch)

Allows indexing of ORM entities on multiple application nodes, storing the index on a remote Elasticsearch or OpenSearch cluster (to be configured separately).

If you get Hibernate Search from Maven, use these dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch</artifactId>
   <version>6.1.8.Final</version>
</dependency>

If you get Hibernate Search from the distribution bundle, copy the JARs from dist/engine, dist/mapper/orm, dist/backend/elasticsearch, and their respective lib subdirectories.

1.6. Configuration

Once you have added all required dependencies to your application, it’s time to have a look at the configuration file.

If you are new to Hibernate ORM, we recommend you start there to implement entity persistence in your application, and only then come back here to add Hibernate Search indexing.

The configuration properties of Hibernate Search are sourced from Hibernate ORM, so they can be added to any file from which Hibernate ORM takes its configuration:

  • A hibernate.properties file in your classpath.

  • The hibernate.cfg.xml file in your classpath, if using Hibernate ORM native bootstrapping.

  • The persistence.xml file in your classpath, if using Hibernate ORM JPA bootstrapping.

Hibernate Search provides sensible defaults for all configuration properties, but depending on your setup you might want to set the following:

Example 1. Hibernate Search properties in persistence.xml for a "Hibernate ORM + Lucene" setup
<property name="hibernate.search.backend.directory.root"
          value="some/filesystem/path"/> (1)
1 Set the location of indexes in the filesystem. By default, the backend will store indexes in the current working directory.
Example 2. Hibernate Search properties in persistence.xml for a "Hibernate ORM + Elasticsearch/OpenSearch" setup
<property name="hibernate.search.backend.hosts"
          value="elasticsearch.mycompany.com"/> (1)
<property name="hibernate.search.backend.protocol"
          value="https"/> (2)
<property name="hibernate.search.backend.username"
          value="ironman"/> (3)
<property name="hibernate.search.backend.password"
          value="j@rV1s"/>
1 Set the Elasticsearch hosts to connect to. By default, the backend will attempt to connect to localhost:9200.
2 Set the protocol. The default is http, but you may need to use https.
3 Set the username and password for basic HTTP authentication. You may also be interested in AWS IAM authentication.

1.7. Mapping

Let’s assume that your application contains the Hibernate ORM managed classes Book and Author and you want to index them in order to search the books contained in your database.

Example 3. Book and Author entities BEFORE adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

@Entity
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    private String isbn;

    private int pageCount;

    @ManyToMany
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}

To make these entities searchable, you will need to map them to an index structure. The mapping can be defined using annotations, or using a programmatic API; this getting started guide will show you a simple annotation mapping. For more details, refer to Mapping Hibernate ORM entities to indexes.

Below is an example of how the model above can be mapped.

Example 4. Book and Author entities AFTER adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

@Entity
@Indexed (1)
public class Book {

    @Id (2)
    @GeneratedValue
    private Integer id;

    @FullTextField (3)
    private String title;

    @KeywordField (4)
    private String isbn;

    @GenericField (5)
    private int pageCount;

    @ManyToMany
    @IndexedEmbedded (6)
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;

@Entity (7)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField (3)
    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 @Indexed marks Book as indexed, i.e. an index will be created for that entity, and that index will be kept up to date.
2 By default, the JPA @Id is used to generate a document identifier.
3 @FullTextField maps a property to a full-text index field with the same name and type. Full-text fields are broken down into tokens and normalized (lowercased, …​). Here we’re relying on default analysis configuration, but most applications need to customize it; this will be addressed further down.
4 @KeywordField maps a property to a non-analyzed index field. Useful for identifiers, for example.
5 Hibernate Search is not just for full-text search: you can index non-String types with the @GenericField annotation. A broad range of property types are supported out-of-the-box, such as primitive types (int, double, …​) and their boxed counterparts (Integer, Double, …​), enums, date/time types, BigInteger/BigDecimal, etc.
6 @IndexedEmbedded "embeds" the indexed form of associated objects (entities or embeddables) into the indexed form of the embedding entity.

Here, the Author class defines a single indexed field, name. Thus adding @IndexedEmbedded to the authors property of Book will add a single field named authors.name to the Book index. This field will be populated automatically based on the content of the authors property, and the books will be re-indexed automatically whenever the name property of their author changes. See Mapping associated elements with @IndexedEmbedded for more information.

7 Entities that are only @IndexedEmbedded in other entities, but do not require to be searchable by themselves, do not need to be annotated with @Indexed.

This is a very simple example, but is enough to get started. Just remember that Hibernate Search allows more complex mappings:

  • Multiple @*Field annotations exist, some of them allowing full-text search, some of them allowing finer-grained configuration for fields of a certain type. You can find out more about @*Field annotations in Mapping a property to an index field with @GenericField, @FullTextField, …​.

  • Properties, or even types, can be mapped with finer-grained control using "bridges". This allows the mapping of types that are not supported out-of-the-box. See Bridges for more information.

1.8. Initialization

Before the application is started for the first time, some initialization may be required:

  • The indexes and their schema need to be created.

  • Data already present in the database (if any) needs to be indexed.

1.8.1. Schema management

Before indexing can take place, indexes and their schema need to be created, either on disk (Lucene) or through REST API calls (Elasticsearch).

Fortunately, by default, Hibernate Search will take care of creating indexes on the first startup: you don’t have to do anything.

The next time the application is started, existing indexes will be re-used.

Any change to your mapping (adding new fields, changing the type of existing fields, …​) between two restarts of the application will require an update to the index schema.

This will require some special handling, though it can easily be solved by dropping and re-creating the index. See Changing the mapping of an existing application for more information.
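If, during development, you prefer to simply drop and re-create the indexes on startup, one way to do that is to change the schema management strategy. The snippet below is only a sketch for illustration; drop-and-create wipes the indexes and is generally not suitable for production:

<property name="hibernate.search.schema_management.strategy"
          value="drop-and-create"/>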

1.8.2. Initial indexing

As we’ll see later, Hibernate Search takes care of triggering indexing every time an entity changes in the application.

However, data already present in the database when you add the Hibernate Search integration is unknown to Hibernate Search, and thus has to be indexed through a batch process. To that end, you can use the mass indexer API, as shown in the following code:

Example 5. Using Hibernate Search MassIndexer API to manually (re)index the already persisted data
SearchSession searchSession = Search.session( entityManager ); (1)

MassIndexer indexer = searchSession.massIndexer( Book.class ) (2)
        .threadsToLoadObjects( 7 ); (3)

indexer.startAndWait(); (4)
1 Get a Hibernate Search session, called SearchSession, from the EntityManager.
2 Create an "indexer", passing the entity types you want to index. To index all entity types, call massIndexer() without any argument.
3 It is possible to set the number of threads to be used. For the complete list of options see Reindexing large volumes of data with the MassIndexer.
4 Invoke the batch indexing process.
If no data is initially present in the database, mass indexing is not necessary.

1.9. Indexing

Hibernate Search will transparently index every entity persisted, updated or removed through Hibernate ORM. Thus, this code would transparently populate your index:

Example 6. Using Hibernate ORM to persist data, and implicitly indexing it through Hibernate Search
// Not shown: get the entity manager and open a transaction
Author author = new Author();
author.setName( "John Doe" );

Book book = new Book();
book.setTitle( "Refactoring: Improving the Design of Existing Code" );
book.setIsbn( "978-0-58-600835-5" );
book.setPageCount( 200 );
book.getAuthors().add( author );
author.getBooks().add( book );

entityManager.persist( author );
entityManager.persist( book );
// Not shown: commit the transaction and close the entity manager

By default, in particular when using the Elasticsearch backend, changes will not be visible right after the transaction is committed. A slight delay (by default one second) will be necessary for Elasticsearch to process the changes.

For that reason, if you modify entities in a transaction, and then execute a search query right after that transaction, the search results may not be consistent with the changes you just performed.

See Synchronization with the indexes for more information about this behavior and how to tune it.
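For example, if you need index changes to be visible as soon as the transaction commit returns (e.g. in integration tests), one option is to switch to a stricter synchronization strategy, at the cost of throughput. A minimal sketch, in the same persistence.xml style as the earlier examples:

<property name="hibernate.search.automatic_indexing.synchronization.strategy"
          value="sync"/>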

1.10. Searching

Once the data is indexed, you can perform search queries.

The following code will prepare a search query targeting the index for the Book entity, filtering the results so that at least one field among title and authors.name contains the string refactoring. The matches are implicitly on words ("tokens") instead of the full string, and are case-insensitive: that’s because the targeted fields are full-text fields.

Example 7. Using Hibernate Search to query the indexes
// Not shown: get the entity manager and open a transaction
SearchSession searchSession = Search.session( entityManager ); (1)

SearchResult<Book> result = searchSession.search( Book.class ) (2)
        .where( f -> f.match() (3)
                .fields( "title", "authors.name" )
                .matching( "refactoring" ) )
        .fetch( 20 ); (4)

long totalHitCount = result.total().hitCount(); (5)
List<Book> hits = result.hits(); (6)

List<Book> hits2 =
        /* ... same DSL calls as above... */
        .fetchHits( 20 ); (7)
// Not shown: commit the transaction and close the entity manager
1 Get a Hibernate Search session, called SearchSession, from the EntityManager.
2 Initiate a search query on the index mapped to the Book entity.
3 Define that only documents matching the given predicate should be returned. The predicate is created using a factory f passed as an argument to the lambda expression.
4 Build the query and fetch the results, limiting to the top 20 hits.
5 Retrieve the total number of matching entities.
6 Retrieve matching entities.
7 In case you’re not interested in the whole result, but only in the hits, you can also call fetchHits() directly.

If for some reason you don’t want to use lambdas, you can use an alternative, object-based syntax, but it will be a bit more verbose:

Example 8. Using Hibernate Search to query the indexes — object-based syntax
// Not shown: get the entity manager and open a transaction
SearchSession searchSession = Search.session( entityManager ); (1)

SearchScope<Book> scope = searchSession.scope( Book.class ); (2)

SearchResult<Book> result = searchSession.search( scope ) (3)
        .where( scope.predicate().match() (4)
                .fields( "title", "authors.name" )
                .matching( "refactoring" )
                .toPredicate() )
        .fetch( 20 ); (5)

long totalHitCount = result.total().hitCount(); (6)
List<Book> hits = result.hits(); (7)

List<Book> hits2 =
        /* ... same DSL calls as above... */
        .fetchHits( 20 ); (8)
// Not shown: commit the transaction and close the entity manager
1 Get a Hibernate Search session, called SearchSession, from the EntityManager.
2 Create a "search scope", representing the indexed types that will be queried.
3 Initiate a search query targeting the search scope.
4 Define that only documents matching the given predicate should be returned. The predicate is created using the same search scope as the query.
5 Build the query and fetch the results, limiting to the top 20 hits.
6 Retrieve the total number of matching entities.
7 Retrieve matching entities.
8 In case you’re not interested in the whole result, but only in the hits, you can also call fetchHits() directly.

It is possible to get just the total hit count, using fetchTotalHitCount().

Example 9. Using Hibernate Search to count the matches
// Not shown: get the entity manager and open a transaction
SearchSession searchSession = Search.session( entityManager );

long totalHitCount = searchSession.search( Book.class )
        .where( f -> f.match()
                .fields( "title", "authors.name" )
                .matching( "refactoring" ) )
        .fetchTotalHitCount(); (1)
// Not shown: commit the transaction and close the entity manager
1 Fetch the total hit count.

Note that, while the examples above retrieved hits as managed entities, it is just one of the possible hit types. See Projection DSL for more information.
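For instance, if you only need the titles rather than managed entities, a field projection avoids loading entities from the database entirely. The sketch below reuses the Book index from the previous examples and assumes the title field was marked as projectable (e.g. @FullTextField(projectable = Projectable.YES)), which the Lucene backend requires for projections:

SearchSession searchSession = Search.session( entityManager );

List<String> titles = searchSession.search( Book.class )
        .select( f -> f.field( "title", String.class ) ) // project on the "title" field instead of loading entities
        .where( f -> f.match()
                .fields( "title", "authors.name" )
                .matching( "refactoring" ) )
        .fetchHits( 20 );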

1.11. Analysis

Full-text search allows fast matches on words in a case-insensitive way, which is one step further than substring search in a relational database. But it can get much better: what if we want a search with the term "refactored" to match our book whose title contains "refactoring"? That’s possible with custom analysis.

Analysis is the way text is processed when indexing and searching. This involves analyzers, which are made up of three types of components, applied one after the other:

  • zero or (rarely) more character filters, to clean up the input text: A <strong>GREAT</strong> résume → A GREAT résume.

  • a tokenizer, to split the input text into words, called "tokens": A GREAT résume → [A, GREAT, résume].

  • zero or more token filters, to normalize the tokens and remove meaningless tokens: [A, GREAT, résume] → [great, resume].

There are built-in analyzers, in particular the default one, which will:

  • tokenize (split) the input according to the Word Break rules of the Unicode Text Segmentation algorithm;

  • filter (normalize) tokens by turning uppercase letters to lowercase.

The default analyzer is a good fit for most languages, but is not very advanced. To get the most out of analysis, you will need to define a custom analyzer by picking the tokenizer and filters most suited to your specific needs.

The following paragraphs will explain how to configure and use a simple yet reasonably useful analyzer. For more information about analysis and how to configure it, refer to the Analysis section.

Each custom analyzer needs to be given a name in Hibernate Search. This is done through analysis configurers, which are defined per backend:

  1. First, you need to implement an analysis configurer, a Java class that implements a backend-specific interface: LuceneAnalysisConfigurer or ElasticsearchAnalysisConfigurer.

  2. Second, you need to alter the configuration of your backend to actually use your analysis configurer.

As an example, let’s assume that one of your indexed Book entities has the title "Refactoring: Improving the Design of Existing Code", and you want to get hits for any of the following search terms: "Refactor", "refactors", "refactored" and "refactoring". One way to achieve this is to use an analyzer with the following components:

  • A "standard" tokenizer, which splits words at whitespaces, punctuation characters and hyphens. It is a good general-purpose tokenizer.

  • A "lowercase" filter, which converts every character to lowercase.

  • A "snowball" filter, which applies language-specific stemming.

  • Finally, an "ascii-folding" filter, which replaces characters with diacritics ("é", "à", …​) with their ASCII equivalent ("e", "a", …​).

The examples below show how to define an analyzer with these components, depending on the backend you picked.

Example 10. Analysis configurer implementation and configuration in persistence.xml for a "Hibernate ORM + Lucene" setup
package org.hibernate.search.documentation.gettingstarted.withhsearch.customanalysis;

import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        context.analyzer( "english" ).custom() (1)
                .tokenizer( "standard" ) (2)
                .tokenFilter( "lowercase" ) (3)
                .tokenFilter( "snowballPorter" ) (3)
                        .param( "language", "English" ) (4)
                .tokenFilter( "asciiFolding" );

        context.analyzer( "name" ).custom() (5)
                .tokenizer( "standard" )
                .tokenFilter( "lowercase" )
                .tokenFilter( "asciiFolding" );
    }
}
<property name="hibernate.search.backend.analysis.configurer"
          value="class:org.hibernate.search.documentation.gettingstarted.withhsearch.customanalysis.MyLuceneAnalysisConfigurer"/> (6)
1 Define a custom analyzer named "english", to analyze English text such as book titles.
2 Set the tokenizer to a standard tokenizer. You need to pass Lucene-specific names to refer to tokenizers; see Custom analyzers and normalizers for information about available tokenizers, their name and their parameters.
3 Set the token filters. Token filters are applied in the order they are given. Here too, Lucene-specific names are expected; see Custom analyzers and normalizers for information about available token filters, their name and their parameters.
4 Set the value of a parameter for the last added char filter/tokenizer/token filter.
5 Define another custom analyzer, called "name", to analyze author names. Contrary to the first one, do not enable stemming, as it is unlikely to lead to useful results on proper nouns.
6 Assign the configurer to the backend in the Hibernate Search configuration (here in persistence.xml). For more information about the format of bean references, see Parsing of bean references.
Example 11. Analysis configurer implementation and configuration in persistence.xml for a "Hibernate ORM + Elasticsearch/OpenSearch" setup
package org.hibernate.search.documentation.gettingstarted.withhsearch.customanalysis;

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;

public class MyElasticsearchAnalysisConfigurer implements ElasticsearchAnalysisConfigurer {
    @Override
    public void configure(ElasticsearchAnalysisConfigurationContext context) {
        context.analyzer( "english" ).custom() (1)
                .tokenizer( "standard" ) (2)
                .tokenFilters( "lowercase", "snowball_english", "asciifolding" ); (3)

        context.tokenFilter( "snowball_english" ) (4)
                .type( "snowball" )
                .param( "language", "English" ); (5)

        context.analyzer( "name" ).custom() (6)
                .tokenizer( "standard" )
                .tokenFilters( "lowercase", "asciifolding" );
    }
}
<property name="hibernate.search.backend.analysis.configurer"
          value="class:org.hibernate.search.documentation.gettingstarted.withhsearch.customanalysis.MyElasticsearchAnalysisConfigurer"/> (7)
1 Define a custom analyzer named "english", to analyze English text such as book titles.
2 Set the tokenizer to a standard tokenizer. You need to pass Elasticsearch-specific names to refer to tokenizers; see Custom analyzers and normalizers for information about available tokenizers, their name and their parameters.
3 Set the token filters. Token filters are applied in the order they are given. Here too, Elasticsearch-specific names are expected; see Custom analyzers and normalizers for information about available token filters, their name and their parameters.
4 Note that, for Elasticsearch, any parameterized char filter, tokenizer or token filter must be defined separately and assigned a new name.
5 Set the value of a parameter for the char filter/tokenizer/token filter being defined.
6 Define another custom analyzer, named "name", to analyze author names. Contrary to the first one, do not enable stemming, as it is unlikely to lead to useful results on proper nouns.
7 Assign the configurer to the backend in the Hibernate Search configuration (here in persistence.xml). For more information about the format of bean references, see Parsing of bean references.

Once analysis is configured, the mapping must be adapted to assign the relevant analyzer to each field:

Example 12. Book and Author entities after adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "english") (1)
    private String title;

    @KeywordField
    private String isbn;

    @GenericField
    private int pageCount;

    @ManyToMany
    @IndexedEmbedded
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "name") (1)
    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Replace the @GenericField annotation with @FullTextField, and set the analyzer parameter to the name of the custom analyzer configured earlier.

That’s it! Now, once the entities have been re-indexed, you will be able to search for the terms "Refactor", "refactors", "refactored" or "refactoring", and the book entitled "Refactoring: Improving the Design of Existing Code" will show up in the results.

Mapping changes are not auto-magically applied to already-indexed data. Unless you know what you are doing, you should remember to reindex your data after changing the Hibernate Search mapping of your entities.

Example 13. Using Hibernate Search to query the indexes after analysis was configured
// Not shown: get the entity manager and open a transaction
SearchSession searchSession = Search.session( entityManager );

SearchResult<Book> result = searchSession.search( Book.class )
        .where( f -> f.match()
                .fields( "title", "authors.name" )
                .matching( "refactored" ) )
        .fetch( 20 );
// Not shown: commit the transaction and close the entity manager

1.12. What’s next

The above paragraphs gave you an overview of Hibernate Search.

The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Architecture) and review the Examples of architectures to pick the most appropriate for your use case; distributed applications in particular require a specific setup involving a coordination strategy.

You may also want to explore the basic features in more detail. Two topics that were only briefly touched on in this tutorial are analysis configuration (Analysis) and bridges (Bridges). Both are important features required for more fine-grained indexing.

When it comes to initializing your index, you will be interested in schema management and mass indexing.

When querying, you will probably want to know more about predicates, sorts, projections, aggregations.

You can also have a look at sample applications:

2. Concepts

2.1. Full-text search

Full-text search is a set of techniques for searching, in a corpus of text documents, the documents that best match a given query.

The main difference with traditional search — for example in an SQL database — is that the stored text is not considered as a single block of text, but as a collection of tokens (words).

Hibernate Search relies on either Apache Lucene or Elasticsearch to implement full-text search. Since Elasticsearch uses Lucene internally, they share a lot of characteristics and their general approach to full-text search.

To simplify, these search engines are based on the concept of inverted indexes: a dictionary where the key is a token (word) found in a document, and the value is the list of identifiers of every document containing this token.

Still simplifying, once all documents are indexed, searching for documents involves three steps:

  1. extracting tokens (words) from the query;

  2. looking up these tokens in the index to find matching documents;

  3. aggregating the results of the lookups to produce a list of matching documents.
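The following toy sketch, in plain Java (it is not Hibernate Search or Lucene code, and the document contents are invented for illustration), shows the idea behind an inverted index and these three steps:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index: token (word) -> identifiers of the documents containing it.
Map<String, Set<Integer>> invertedIndex = new HashMap<>();

// Indexing: tokenize each document and record its identifier under each token.
Map<Integer, String> documents = Map.of(
        1, "Refactoring: Improving the Design of Existing Code",
        2, "Domain-Driven Design" );
documents.forEach( (id, text) -> {
    for ( String token : text.toLowerCase( Locale.ROOT ).split( "\\W+" ) ) {
        invertedIndex.computeIfAbsent( token, t -> new HashSet<>() ).add( id );
    }
} );

// Searching: extract tokens from the query, look them up, aggregate the matching identifiers.
Set<Integer> hits = new HashSet<>();
for ( String token : "improving design".toLowerCase( Locale.ROOT ).split( "\\W+" ) ) {
    hits.addAll( invertedIndex.getOrDefault( token, Set.of() ) );
}
// hits now contains the identifiers 1 and 2; a real engine would also score and sort the matches.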

Lucene and Elasticsearch are not limited to just text search: numeric data is also supported, enabling support for integers, doubles, longs, dates, etc. These types are indexed and queried using a slightly different approach, which obviously does not involve text processing.

2.2. Mapping

Applications targeted by Hibernate Search generally use an entity-based model to represent data. In this model, each entity is a single object with a few properties of atomic type (String, Integer, LocalDate, …​). Each entity can have multiple associations to one or even many other entities.

Entities are thus organized as a graph, where each node is an entity and each association is an edge.

By contrast, Lucene and Elasticsearch work with documents. Each document is a collection of "fields", each field being assigned a name — a unique string — and a value — which can be text, but also numeric data such as an integer or a date. Fields also have a type, which not only determines the type of values (text/numeric), but more importantly the way this value will be stored: indexed, stored, with doc values, etc. It is possible to introduce nested documents, but not real associations.

Documents are thus organized, at best, as a collection of trees, where each tree is a document, optionally with nested documents.

There are multiple mismatches between the entity model and the document model: properties vs. fields, associations vs. nested documents, graph vs. collection of trees.

The goal of mapping, in Hibernate Search, is to resolve these mismatches by defining how to transform one or more entities into a document, and how to resolve a search hit back into the original entity. This is the main added value of Hibernate Search, the basis for everything else from automatic indexing to the various search DSLs.

Mapping is usually configured using annotations in the entity model, but this can also be achieved using a programmatic API. To learn more about how to configure mapping, see Mapping Hibernate ORM entities to indexes.

To learn how to index the resulting documents, see Indexing Hibernate ORM entities (hint: it’s automatic).

To learn how to search with an API that takes advantage of the mapping to be closer to the entity model, in particular by returning hits as entities instead of just document identifiers, see Searching.

2.3. Analysis

As mentioned in Full-text search, the full-text engine works on tokens, which means text has to be processed both when indexing (document processing, to build the token → document index) and when searching (query processing, to generate a list of tokens to look up).

However, the processing is not just about "tokenizing". Index lookups are exact lookups, which means that looking up Great (capitalized) will not return documents containing only great (all lowercase). An extra step is performed when processing text to address this caveat: token filtering, which normalizes tokens. Thanks to that "normalization", Great will be indexed as great, so that an index lookup for the query great will match as expected.

In the Lucene world (Lucene, Elasticsearch, Solr, …​), text processing during both the indexing and searching phases is called "analysis" and is performed by an "analyzer".

The analyzer is made up of three types of components, which will each process the text successively in the following order:

  1. Character filter: transforms the input characters. Replaces, adds or removes characters.

  2. Tokenizer: splits the text into several words, called "tokens".

  3. Token filter: transforms the tokens. Replaces, adds or removes characters in a token, derives new tokens from the existing ones, removes tokens based on some condition, …​

The tokenizer usually splits on whitespaces (though there are other options). Token filters are usually where customization takes place. They can remove accented characters, remove meaningless suffixes (-ing, -s, …​) or tokens (a, the, …​), replace tokens with a chosen spelling (wi-fi → wifi), etc.

Character filters, though useful, are rarely used, because they have no knowledge of token boundaries.

Unless you know what you are doing, you should generally favor token filters.

In some cases, it is necessary to index text in one block, without any tokenization:

  • For some types of text, such as SKUs or other business codes, tokenization simply does not make sense: the text is a single "keyword".

  • For sorts by field value, tokenization is not necessary. It is also forbidden in Hibernate Search due to performance issues; only non-tokenized fields can be sorted on.

To address these use cases, a special type of analyzer, called "normalizer", is available. Normalizers are simply analyzers that are guaranteed not to use a tokenizer: they can only use character filters and token filters.

In Hibernate Search, analyzers and normalizers are referenced by their name, for example when defining a full-text field. Analyzers and normalizers have two separate namespaces.

Some names are already assigned to built-in analyzers (in Elasticsearch in particular), but it is possible (and recommended) to assign names to custom analyzers and normalizers, assembled using built-in components (tokenizers, filters) to address your specific needs.
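As an illustration, a custom normalizer could be defined and assigned like this with the Lucene backend; the "lowercase" normalizer name and its use on the isbn property of the getting-started example are arbitrary choices, not something imposed by Hibernate Search:

// In a LuceneAnalysisConfigurer, alongside the analyzers defined earlier:
context.normalizer( "lowercase" ).custom() // normalizers accept no tokenizer, only char filters and token filters
        .tokenFilter( "lowercase" )
        .tokenFilter( "asciiFolding" );

// In the entity mapping:
@KeywordField(normalizer = "lowercase")
private String isbn;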

Each backend exposes its own APIs to define analyzers and normalizers, and generally to configure analysis. See the documentation of each backend for more information:

2.4. Commit and refresh

In order to get the best throughput when indexing and when searching, both Elasticsearch and Lucene rely on "buffers" when writing to and reading from the index:

  • When writing, changes are not directly written to the index, but to an "index writer" that buffers changes in-memory or in temporary files.

    The changes are "pushed" to the actual index when the writer is committed. Until the commit happens, uncommitted changes are in an "unsafe" state: if the application crashes or if the server suffers from a power loss, uncommitted changes will be lost.

  • When reading, e.g. when executing a search query, data is not read directly from the index, but from an "index reader" that exposes a view of the index as it was at some point in the past.

    The view is updated when the reader is refreshed. Until the refresh happens, results of search queries might be slightly out of date: documents added since the last refresh will be missing, documents deleted since the last refresh will still be there, etc.

Unsafe changes and out-of-sync indexes are obviously undesirable, but they are a trade-off that improves performance.

Different factors influence when commits and refreshes happen:

  • Automatic indexing will, by default, require that a commit of the index writer is performed after each set of changes, meaning the changes are safe after the Hibernate ORM transaction commit returns. However, no refresh is requested by default, meaning the changes may only be visible at a later time, when the backend decides to refresh the index reader. This behavior can be customized by setting a different synchronization strategy.

  • The mass indexer will not require any commit or refresh until the very end of mass indexing, to maximize indexing throughput.

  • Whenever there are no particular commit or refresh requirements, backend defaults will apply:

  • A commit may be forced explicitly through the flush() API.

  • A refresh may be forced explicitly through the refresh() API.
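A minimal sketch of the explicit flush() and refresh() calls mentioned above, assuming the Book entity from the getting-started example:

SearchSession searchSession = Search.session( entityManager );

SearchWorkspace workspace = searchSession.workspace( Book.class ); // or workspace() to target all indexed types
workspace.flush();   // force a commit of the index writer
workspace.refresh(); // force a refresh of the index reader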

Even though we use the word "commit", this is not the same concept as a commit in relational database transactions: there is no transaction and no "rollback" is possible.

There is no concept of isolation, either. After a refresh, all changes to the index are taken into account: those committed to the index, but also those that are still buffered in the index writer.

For this reason, commits and refreshes can be treated as completely orthogonal concepts: certain setups will occasionally lead to committed changes not being visible in search queries, while others will allow even uncommitted changes to be visible in search queries.

2.5. Sharding and routing

Sharding consists in splitting index data into multiple "smaller indexes", called shards, in order to improve performance when dealing with large amounts of data.

In Hibernate Search, similarly to Elasticsearch, another concept is closely related to sharding: routing. Routing consists in resolving a document identifier, or generally any string called a "routing key", into the corresponding shard.

When indexing:

  • A document identifier and optionally a routing key are generated from the indexed entity.

  • The document, along with its identifier and optionally its routing key, is passed to the backend.

  • The backend "routes" the document to the correct shard, and adds the routing key (if any) to a special field in the document (so that it’s indexed).

  • The document is indexed in that shard.

When searching:

  • The search query can optionally be passed one or more routing keys.

  • If no routing key is passed, the query will be executed on all shards.

  • If one or more routing keys are passed:

    • The backend resolves these routing keys into a set of shards, and the query will only be executed on those shards, ignoring the other shards.

    • A filter is added to the query so that only documents indexed with one of the given routing keys are matched.

Sharding, then, can be leveraged to boost performance in two ways:

  • When indexing: a sharded index can spread the "stress" onto multiple shards, which can be located on different disks (Lucene) or different servers (Elasticsearch).

  • When searching: if one property, let’s call it category, is often used to select a subset of documents, this property can be defined as a routing key in the mapping, so that it’s used to route documents instead of the document ID. As a result, documents with the same value for category will be indexed in the same shard. Then when searching, if a query already filters documents so that it is known that the hits will all have the same value for category, the query can be manually routed to the shards containing documents with this value, and the other shards can be ignored.

To enable sharding, some configuration is required:

  • The backends require explicit configuration: see here for Lucene and here for Elasticsearch.

  • In most cases, document IDs are used to route documents to shards by default. This does not allow taking advantage of routing when searching, which requires multiple documents to share the same routing key. Applying routing to a search query in that case will return at most one result. To explicitly define the routing key to assign to each document, assign routing bridges to your entities.

Sharding is static by nature: each index is expected to have the same shards, with the same identifiers, from one boot to the other. Changing the number of shards or their identifiers will require full reindexing.
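As an example, with the Lucene backend, hash-based sharding could be enabled with properties such as the following (4 shards is an arbitrary value chosen for illustration); with Elasticsearch, the number of shards is configured on the index/cluster side instead:

<property name="hibernate.search.backend.sharding.strategy"
          value="hash"/>
<property name="hibernate.search.backend.sharding.number_of_shards"
          value="4"/>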

3. Architecture

3.1. Components of Hibernate Search

From the user’s perspective, Hibernate Search consists of two components:

Mapper

The mapper "maps" the user model to an index model, and provide APIs consistent with the user model to perform indexing and searching.

Most applications rely on the ORM mapper, which offers the ability to index properties of Hibernate ORM entities.

The mapper is configured partly through annotations on the domain model, and partly through configuration properties.

Backend

The backend is the abstraction over the full-text engines, where "things get done". It implements generic indexing and searching interfaces for use by the mapper through "index managers", each providing access to one index.

For instance the Lucene backend delegates to the Lucene library, and the Elasticsearch backend delegates to a remote Elasticsearch cluster.

The backend is configured partly by the mapper, which tells the backend which indexes must exist and what fields they must have, and partly through configuration properties.

The mapper and backend work together to provide three main features:

Mass indexing

This is how Hibernate Search rebuilds indexes from zero based on the content of a database.

The mapper queries the database to retrieve the identifier of every entity, then processes these identifiers in batches, loading the entities then processing them to generate documents that are sent to the backend for indexing. The backend puts the document in an internal queue, and will index documents in batches, in background processes, notifying the mapper when it’s done.

Automatic indexing

This is how Hibernate Search keeps indexes in sync with a database.

When an entity changes, the mapper detects the change and stores the information in an indexing plan. Upon transaction commit, these changes are processed (either in the same thread or in a background process, depending on the coordination strategy), and documents are generated, then sent to the backend for indexing. The backend puts the documents in an internal queue, and will index documents in batches, in background processes, notifying the mapper when it’s done.

See Automatic indexing for details.

Searching

This is how Hibernate Search provides ways to query an index.

The mapper exposes entry points to the search DSL, allowing selection of entity types to query. When one or more entity types are selected, the mapper delegates to the corresponding index managers to provide a Search DSL and ultimately create the search query. Upon query execution, the backend submits a list of entity references to the mapper, which loads the corresponding entities. The entities are then returned by the query.

See Searching for details.

3.2. Examples of architectures

3.2.1. Overview

Table 2. Comparison of architectures

Architecture | Single-node with Lucene | No coordination with Elasticsearch | Outbox polling with Elasticsearch
Application topology | Single-node | Single-node or multi-node | Single-node or multi-node
Extra bits to maintain | Indexes on filesystem | Elasticsearch cluster | Elasticsearch cluster
Guarantee of index updates | When the commit returns (non-transactional) | When the commit returns (non-transactional) | On commit (transactional)
Visibility of index updates | Configurable: immediate or eventual | Configurable: immediate (poor performance) or eventual | Eventual
Native features | Mostly for experts | For anyone | For anyone
Overhead for application threads | Low to medium | Low to medium | Very low
Overhead for the database | Low | Low | Low to medium
Impact on database schema | None | None | Extra tables
Limitations | Automatic indexing ignores: JPQL/SQL queries, asymmetric association updates | Out-of-sync indexes in rare situations: concurrent @IndexedEmbedded, backend I/O errors | No other known limitation

3.2.2. Single-node application with the Lucene backend

Description

With the Lucene backend, indexes are local to a given application node (JVM). They are accessed through direct calls to the Lucene library, without going through the network.

Simple architecture with Lucene backend

This mode is only relevant to single-node applications.

Pros and cons

Pros:

  • Simplicity: no external services are required, everything lives on the same server.

  • Immediate visibility (~milliseconds) of index updates. While other architectures can perform comparably well for most use cases, a single-node, Lucene backend is the best way to implement indexing if you need changes to be visible immediately after the database changes.

Cons:

Getting started

To implement this architecture, use the following Maven dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.1.8.Final</version>
</dependency>

3.2.3. Single-node or multi-node application, without coordination and with the Elasticsearch backend

Description

With the Elasticsearch backend, indexes are not tied to the application node. They are managed by a separate cluster of Elasticsearch nodes, and accessed through calls to REST APIs.

Thus, it is possible to set up multiple application nodes in such a way that they all perform index updates and search queries independently, without coordinating with each other.

Simple architecture with Elasticsearch backend
The Elasticsearch cluster may be a single node living on the same server as the application.
Pros and cons

Pros:

Cons:

Getting started

To implement this architecture, use the following Maven dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch</artifactId>
   <version>6.1.8.Final</version>
</dependency>

3.2.4. Multi-node application with outbox polling and Elasticsearch backend

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

Description

With Hibernate Search’s outbox-polling coordination strategy, entity change events are not processed immediately in the ORM session where they arise, but are pushed to an outbox table in the database.

A background process polls that outbox table for new events, and processes them asynchronously, updating the indexes as necessary. Since that queue can be sharded, multiple application nodes can share the workload of indexing.

This requires the Elasticsearch backend so that indexes are not tied to a single application node and can be updated or queried from multiple application nodes.

Clustered architecture with outbox polling and Elasticsearch backend
Pros and cons

Pros:

Cons:

Getting started

The outbox-polling coordination strategy requires an extra dependency. To implement this architecture, use the following Maven dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm-coordination-outbox-polling</artifactId>
   <version>6.1.8.Final</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch</artifactId>
   <version>6.1.8.Final</version>
</dependency>
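Besides the extra dependency, the outbox-polling coordination strategy has to be selected explicitly through configuration; a minimal sketch in the persistence.xml style used earlier:

<property name="hibernate.search.coordination.strategy"
          value="outbox-polling"/>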

4. Configuration

4.1. Configuration sources

When using Hibernate Search within Hibernate ORM, configuration properties are retrieved from Hibernate ORM.

This means that wherever you set Hibernate ORM properties, you can set Hibernate Search properties:

  • In a hibernate.properties file at the root of your classpath.

  • In persistence.xml, if you bootstrap Hibernate ORM with the JPA APIs

  • In JVM system properties (-DmyProperty=myValue passed to the java command)

  • In the configuration file of your framework, for example application.yaml/application.properties.

When setting properties through the configuration file of your framework, the keys of configuration properties will likely be different from the keys mentioned in this documentation.

For example hibernate.search.backend.hosts will become quarkus.hibernate-search-orm.elasticsearch.hosts in Quarkus or spring.jpa.properties.hibernate.search.backend.hosts in Spring.

See Framework support for more information.

4.2. Configuration properties

4.2.1. Structure of configuration properties

Configuration properties are all grouped under a common root. In the ORM integration, this root is hibernate.search, but other integrations (Infinispan, …​) may use a different one. This documentation will use hibernate.search in all examples.

Under that root, we can distinguish between three categories of properties.

Global properties

These properties potentially affect Hibernate Search as a whole. They are generally located just under the hibernate.search root.

Global properties are explained in the relevant parts of this documentation:

Backend properties

These properties affect a single backend. They are grouped under a common root:

  • hibernate.search.backend for the default backend (most common usage).

  • hibernate.search.backends.<backend name> for a named backend (advanced usage).

Backend properties are explained in the relevant parts of this documentation.

Index properties

These properties affect either one or multiple indexes, depending on the root.

With the root hibernate.search.backend, they set defaults for all indexes of the backend.

With the root hibernate.search.backend.indexes.<index name>, they set the value for a specific index, overriding the defaults (if any). The backend and index names must match the names defined in the mapping. For ORM entities, the default index name is the name of the indexed class, without the package: org.mycompany.Book will have Book as its default index name. Index names can be customized in the mapping.

Alternatively, the backend can also be referenced by name, i.e. the roots above can also be hibernate.search.backends.<backend name> or hibernate.search.backends.<backend name>.indexes.<index name>.

Examples:

  • hibernate.search.backend.io.commit_interval = 500 sets the io.commit_interval property for all indexes of the default backend.

  • hibernate.search.backend.indexes.Product.io.commit_interval = 2000 sets the io.commit_interval property for the Product index of the default backend.

  • hibernate.search.backends.myBackend.io.commit_interval = 500 sets the io.commit_interval property for all indexes of backend myBackend.

  • hibernate.search.backends.myBackend.indexes.Product.io.commit_interval = 2000 sets the io.commit_interval property for the Product index of backend myBackend.

Other index properties are explained in the relevant parts of this documentation.

4.2.2. Building property keys programmatically

Both BackendSettings and IndexSettings provide tools to help build the configuration property keys.

BackendSettings

BackendSettings.backendKey(ElasticsearchBackendSettings.HOSTS) is equivalent to hibernate.search.backend.hosts.

BackendSettings.backendKey("myBackend", ElasticsearchBackendSettings.HOSTS) is equivalent to hibernate.search.backends.myBackend.hosts.

For a list of available property keys, see ElasticsearchBackendSettings or LuceneBackendSettings.

IndexSettings

IndexSettings.indexKey("myIndex", ElasticsearchIndexSettings.INDEXING_QUEUE_SIZE) is equivalent to hibernate.search.backend.indexes.myIndex.indexing.queue_size.

IndexSettings.indexKey("myBackend", "myIndex", ElasticsearchIndexSettings.INDEXING_QUEUE_SIZE) is equivalent to hibernate.search.backends.myBackend.indexes.myIndex.indexing.queue_size.

For a list of available property keys, see ElasticsearchIndexSettings or LuceneIndexSettings.

Example 14. Using the helpers to build the Hibernate configuration
private Properties buildHibernateConfiguration() {
    Properties config = new Properties();
    // backend configuration
    config.put( BackendSettings.backendKey( ElasticsearchBackendSettings.HOSTS ), "127.0.0.1:9200" );
    config.put( BackendSettings.backendKey( ElasticsearchBackendSettings.PROTOCOL ), "http" );
    // index configuration
    config.put(
            IndexSettings.indexKey( "myIndex", ElasticsearchIndexSettings.INDEXING_MAX_BULK_SIZE ),
            20
    );
    // orm configuration
    config.put(
            HibernateOrmMapperSettings.AUTOMATIC_INDEXING_SYNCHRONIZATION_STRATEGY,
            AutomaticIndexingSynchronizationStrategyNames.ASYNC
    );
    // engine configuration
    config.put( EngineSettings.BACKGROUND_FAILURE_HANDLER, "myFailureHandler" );
    return config;
}

4.3. Type of configuration properties

Property values can be set programmatically as Java objects, or through a configuration file as a string that will have to be parsed.

Each configuration property in Hibernate Search has an assigned type, and this type defines the accepted values in both cases.

Here are the definitions of all property types.

Designation Accepted Java objects Accepted String format

String

java.lang.String

Any string

Boolean

java.lang.Boolean

true or false (case-insensitive)

Integer

java.lang.Number (will call .intValue())

Any string that can be parsed by Integer.parseInt

Long

java.lang.Number (will call .longValue())

Any string that can be parsed by Long.parseLong

Bean reference of type T

An instance of T or BeanReference or a reference by type as a java.lang.Class (see Bean references)

A string to be parsed as a bean reference (see Parsing of bean references)

Multivalued bean reference of type T

A java.util.Collection containing bean references (see above)

Comma-separated string containing bean references (see above)

4.4. Configuration property checking

Hibernate Search will track the parts of the provided configuration that are actually used and will log a warning if any configuration property starting with "hibernate.search." is never used, because that might indicate a configuration issue.

To disable this warning, set the hibernate.search.configuration_property_checking.strategy property to ignore.
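
For example:

hibernate.search.configuration_property_checking.strategy = ignore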

4.5. Beans

Hibernate Search allows plugging in references to custom beans in various places: configuration properties, mapping annotations, arguments to APIs, …​

4.5.1. Supported frameworks

When using the Hibernate Search integration into Hibernate ORM, all dependency injection frameworks supported by Hibernate ORM are supported.

This includes, but may not be limited to, CDI-based and Spring DI-based frameworks.

When the framework is not supported, or when using Hibernate Search without Hibernate ORM, beans can only be retrieved using reflection by calling the public, no-arg constructor of the referenced type.

4.5.2. Bean references

Bean references are composed of two parts:

  • The type, i.e. a java.lang.Class.

  • Optionally, the name, as a String.

When referencing beans using a string value in configuration properties, the type is implicitly set to whatever interface Hibernate Search expects for that configuration property.

For experienced users, Hibernate Search also provides the org.hibernate.search.engine.environment.bean.BeanReference type, which is accepted in configuration properties and APIs. This interface allows plugging in custom instantiation and cleanup code. See the javadoc of this interface for details.
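
For example, the background failure handler (see Background failure handling) can be set programmatically to a BeanReference instead of a string. The sketch below assumes the MyFailureHandler class shown later in Example 15 and uses the EngineSettings constant from Example 14:

private Properties buildConfigurationWithBeanReference() {
    Properties config = new Properties();
    // Pass a BeanReference instead of a String: the bean will be retrieved by type,
    // through CDI/Spring if available, or through its public, no-arg constructor otherwise.
    config.put(
            EngineSettings.BACKGROUND_FAILURE_HANDLER,
            BeanReference.of( MyFailureHandler.class )
    );
    return config;
}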

4.5.3. Parsing of bean references

When referencing beans using a string value in configuration properties, that string is parsed.

Here are the most common formats:

  • bean: followed by the name of a Spring or CDI bean. For example bean:myBean.

  • class: followed by the fully-qualified name of a class, to be instantiated through Spring/CDI if available, or through its public, no-argument constructor otherwise. For example class:com.mycompany.MyClass.

  • An arbitrary string that doesn’t contain a colon: it will be interpreted as explained in Bean resolution. In short:

    • first, look for a built-in bean with the given name;

    • then try to retrieve a bean with the given name from Spring/CDI (if available);

    • then try to interpret the string as a fully-qualified class name and to retrieve the corresponding bean from Spring/CDI (if available);

    • then try to interpret the string as a fully-qualified class name and to instantiate it through its public, no-argument constructor.

The following formats are also accepted, but are only useful for advanced use cases:

  • any: followed by an arbitrary string. Equivalent to leaving out the prefix in most cases. Only useful if the arbitrary string contains a colon.

  • builtin: followed by the name of a built-in bean, e.g. simple for the Elasticsearch index layout strategies. This will not fall back to Spring/CDI or a direct constructor call.

  • constructor: followed by the fully-qualified name of a class, to be instantiated through its public, no-argument constructor. This will ignore built-in beans and will not try to instantiate the class through Spring/CDI.

4.5.4. Bean resolution

Bean resolution (i.e. the process of turning this reference into an object instance) happens as follows by default:

  • If the given reference matches a built-in bean, that bean is used.

    Example: the name simple, when used as the value of the property hibernate.search.backend.layout.strategy to configure the Elasticsearch index layout strategy, resolves to the built-in simple strategy.

  • Otherwise, if a dependency injection framework is integrated into Hibernate ORM, the reference is resolved using the DI framework (see Supported frameworks).

    • If a managed bean with the given type (and if provided, name) exists, that bean is used.

      Example: the name myLayoutStrategy, when used as the value of the property hibernate.search.backend.layout.strategy to configure the Elasticsearch index layout strategy, resolves to any bean known from CDI/Spring of type IndexLayoutStrategy and annotated with @Named("myLayoutStrategy").

    • Otherwise, if a name is given, and that name is a fully-qualified class name, and a managed bean of that type exists, that bean is used.

      Example: the name com.mycompany.MyLayoutStrategy, when used as the value of the property hibernate.search.backend.layout.strategy to configure the Elasticsearch index layout strategy, resolves to any bean known from CDI/Spring and extending com.mycompany.MyLayoutStrategy.

  • Otherwise, reflection is used to resolve the bean.

    • If a name is given, and that name is a fully-qualified class name, and that class extends the type reference, an instance is created by invoking the public, no-argument constructor of that class.

      Example: the name com.mycompany.MyLayoutStrategy, when used as the value of the property hibernate.search.backend.layout.strategy to configure the Elasticsearch index layout strategy, resolves to an instance of com.mycompany.MyLayoutStrategy.

    • If no name is given, an instance is created by invoking the public, no-argument constructor of the referenced type.

      Example: the class com.mycompany.MyLayoutStrategy.class (a java.lang.Class, not a String), when used as the value of the property hibernate.search.backend.layout.strategy to configure the Elasticsearch index layout strategy, resolves to an instance of com.mycompany.MyLayoutStrategy.

It is possible to control bean retrieval more finely by selecting a BeanRetrieval; see the javadoc of org.hibernate.search.engine.environment.bean.BeanRetrieval for more information. See also Parsing of bean references for the prefixes that allow to specify the bean retrieval when referencing a bean from configuration properties.

4.5.5. Bean injection

All beans resolved by Hibernate Search using a supported framework can take advantage of injection features of this framework.

For example, a custom bridge can be injected with another bean by annotating one of its fields with @Inject.

Lifecycle annotations such as @PostConstruct should also work as expected.

Even when not using any framework, it is still possible to take advantage of the BeanResolver. This component, passed to several methods during bootstrap, exposes several methods to resolve a reference into a bean, exposing programmatically what would usually be achieved with an @Inject annotation. See the javadoc of BeanResolver for more information.
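
For instance, a custom FailureHandler (see Background failure handling and Example 15) resolved through CDI or Spring could inject another bean. This is only a sketch: MyErrorReportingService is a hypothetical application bean, and the @Inject annotation is assumed to come from the DI framework in use.

public class MyInjectedFailureHandler implements FailureHandler {

    @Inject
    MyErrorReportingService reportingService; // hypothetical application bean, injected by CDI/Spring

    @Override
    public void handle(FailureContext context) {
        reportingService.report( context.failingOperation().toString(), context.throwable() );
    }

    @Override
    public void handle(EntityIndexingFailureContext context) {
        reportingService.report( context.failingOperation().toString(), context.throwable() );
    }

}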

4.5.6. Bean lifecycle

As soon as beans are no longer needed, Hibernate Search will release them and let the dependency injection framework call the appropriate methods (@PreDestroy, …​).

Some beans are only necessary during bootstrap, such as ElasticsearchAnalysisConfigurers, so they will be released just after bootstrap.

Other beans are necessary at runtime, such as ValueBridges, so they will be released on shutdown.

Be careful to define the scope of your beans as appropriate.

Immutable beans or beans used only once such as ElasticsearchAnalysisConfigurer may be safely assigned any scope.

However, some beans are expected to be mutable and instantiated multiple times, such as PropertyBinder. For these beans, it is recommended to use the "dependent" scope (CDI terminology) or the "prototype" scope (Spring terminology). When in doubt, this is also generally the safest choice for beans injected into Hibernate Search.


4.6. Global configuration

This section presents global configuration, common to all mappers and backends.

4.6.1. Background failure handling

Hibernate Search generally propagates exceptions occurring in background threads to the user thread, but in some cases, such as Lucene segment merging failures, or some failures during automatic indexing, the exception in background threads cannot be propagated. By default, when that happens, the failure is logged at the ERROR level.

To customize background failure handling, you will need to:

  1. Define a class that implements the org.hibernate.search.engine.reporting.FailureHandler interface.

  2. Configure Hibernate Search to use that implementation by setting the configuration property hibernate.search.background_failure_handler to a bean reference pointing to the implementation, for example class:com.mycompany.MyFailureHandler.

Hibernate Search will call the handle methods whenever a failure occurs.

Example 15. Implementing and using a FailureHandler
package org.hibernate.search.documentation.reporting.failurehandler;

import java.util.ArrayList;
import java.util.List;

import org.hibernate.search.engine.reporting.EntityIndexingFailureContext;
import org.hibernate.search.engine.reporting.FailureContext;
import org.hibernate.search.engine.reporting.FailureHandler;

public class MyFailureHandler implements FailureHandler {

    @Override
    public void handle(FailureContext context) { (1)
        String failingOperationDescription = context.failingOperation().toString(); (2)
        Throwable throwable = context.throwable(); (3)

        // ... report the failure ... (4)
    }

    @Override
    public void handle(EntityIndexingFailureContext context) { (5)
        String failingOperationDescription = context.failingOperation().toString();
        Throwable throwable = context.throwable();
        List<String> entityReferencesAsStrings = new ArrayList<>();
        for ( Object entityReference : context.entityReferences() ) { (6)
            entityReferencesAsStrings.add( entityReference.toString() );
        }

        // ... report the failure ... (7)
    }

}
1 handle(FailureContext) is called for generic failures that do not fit any other specialized handle method.
2 Get a description of the failing operation from the context.
3 Get the throwable thrown when the operation failed from the context.
4 Use the context-provided information to report the failure in any relevant way.
5 handle(EntityIndexingFailureContext) is called for failures occurring when indexing entities.
6 On top of the failing operation and throwable, the context also lists references to entities that could not be indexed correctly because of the failure.
7 Use the context-provided information to report the failure in any relevant way.
hibernate.search.background_failure_handler = org.hibernate.search.documentation.reporting.failurehandler.MyFailureHandler (1)
1 Assign the background failure handler using a Hibernate Search configuration property.

When a failure handler’s handle method throws an error or exception, Hibernate Search will catch it and log it at the ERROR level. It will not be propagated.

4.6.2. Multi-tenancy

If your application uses Hibernate ORM’s multi-tenancy support, Hibernate Search should detect that and configure your backends transparently; see the documentation of your backend for details.

In some cases, in particular when using the outbox-polling coordination strategy, you will need to list explicitly all tenant identifiers that your application might use. This information is used by Hibernate Search when spawning background processes that should apply an operation to every tenant.

The list of identifiers is defined through the following configuration property:

hibernate.search.multi_tenancy.tenant_ids = mytenant1,mytenant2,mytenant3

This property may be set to a String containing multiple tenant identifiers separated by commas, or a Collection<String> containing tenant identifiers.
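
For instance, when building the configuration programmatically (in the style of Example 14), the tenant identifiers can be passed as a collection; a minimal sketch:

private Properties buildMultiTenancyConfiguration() {
    Properties config = new Properties();
    // Pass a Collection<String> instead of a comma-separated String.
    config.put(
            "hibernate.search.multi_tenancy.tenant_ids",
            Arrays.asList( "mytenant1", "mytenant2", "mytenant3" )
    );
    return config;
}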

5. Mapping Hibernate ORM entities to indexes

5.1. Configuration

5.1.1. Enabling/disabling Hibernate Search

The Hibernate Search integration into Hibernate ORM is enabled by default as soon as it is present in the classpath. If for some reason you need to disable it, set the hibernate.search.enabled boolean property to false.

5.1.2. Configuring the mapping

By default, Hibernate Search will automatically process mapping annotations for entity types, as well as nested types in those entity types, for instance embedded types. See Entity/index mapping and Mapping a property to an index field with @GenericField, @FullTextField, …​ to get started with annotation-based mapping.

If you want to ignore these annotations, set hibernate.search.mapping.process_annotations to false.

To configure the mapping programmatically, see Programmatic mapping.

5.1.3. Other configuration properties

Other configuration properties are mentioned in the relevant parts of this documentation. You can find a full reference of available properties in the Hibernate Search javadoc: org.hibernate.search.mapper.orm.cfg.HibernateOrmMapperSettings.

5.2. Programmatic mapping

5.2.1. Basics

Most examples in this documentation use annotation-based mapping, which is generally enough for most applications. However, some applications have needs that go beyond what annotations can offer:

  • a single entity type must be mapped differently for different deployments — e.g. for different customers.

  • many entity types must be mapped similarly, without code duplication.

To address those needs, you can use programmatic mapping: define the mapping through code that will get executed on startup.

Implementing a programmatic mapping requires two steps:

  1. Define a class that implements the org.hibernate.search.mapper.orm.mapping.HibernateOrmSearchMappingConfigurer interface.

  2. Configure Hibernate Search to use that implementation by setting the configuration property hibernate.search.mapping.configurer to a bean reference pointing to the implementation, for example class:com.mycompany.MyMappingConfigurer.

Hibernate Search will call the configure method of this implementation on startup, and the configurer will be able to take advantage of a DSL to define the programmatic mapping.

Programmatic mapping is declarative and exposes the exact same features as annotation-based mapping.

In order to implement more complex, "imperative" mapping, for example to combine two entity properties into a single index field, use custom bridges.

Alternatively, if you only need to repeat the same mapping for several types or properties, you can apply a custom annotation on those types or properties, and have Hibernate Search execute some programmatic mapping code when it encounters that annotation. This solution doesn’t require a mapping configurer.

See Custom mapping annotations for more information.

See below for an example. The following sections also provide one example of programmatic mapping for each feature.

Example 16. Implementing a mapping configurer
public class MySearchMappingConfigurer implements HibernateOrmSearchMappingConfigurer {
    @Override
    public void configure(HibernateOrmMappingConfigurationContext context) {
        ProgrammaticMappingConfigurationContext mapping = context.programmaticMapping(); (1)
        TypeMappingStep bookMapping = mapping.type( Book.class ); (2)
        bookMapping.indexed(); (3)
        bookMapping.property( "title" ) (4)
                .fullTextField().analyzer( "english" ); (5)
    }
}
1 Access the programmatic mapping.
2 Access the programmatic mapping of type Book.
3 Define Book as indexed.
4 Access the programmatic mapping of property title of type Book.
5 Define an index field based on property title of type Book.

By default, programmatic mapping will be merged with annotation mapping (if any).

To disable annotation mapping, set hibernate.search.mapping.process_annotations to false.

5.2.2. Mapping Map-based models

"dynamic-map" entity models, i.e. models based on java.util.Map instead of custom classes, cannot be mapped using annotations. However, they can be mapped using the programmatic mapping API. You just need to refer to the types by their name using context.programmaticMapping().type("thename"):

  • Pass the entity name for dynamic entity types.

  • Pass the "role" for dynamic embedded/component types, i.e. the name of the owning entity, followed by a dot ("."), followed by the dot-separated path to the component in that entity. For example MyEntity.myEmbedded or MyEntity.myEmbedded.myNestedEmbedded.

5.3. Entity/index mapping

5.3.1. Basics

In order to index an entity, it must be annotated with @Indexed.

Example 17. Marking a class for indexing with @Indexed
@Entity
@Indexed
public class Book {

Subclasses inherit the @Indexed annotation and will also be indexed by default. Each indexed subclass will have its own index, though this will be transparent when searching (all targeted indexes will be queried simultaneously).

If the fact that @Indexed is inherited is a problem for your application, you can annotate subclasses with @Indexed(enabled = false).

By default:

  • The index name will be equal to the entity name, which in Hibernate ORM is set using the @Entity annotation and defaults to the simple class name.

  • The identifier of indexed documents will be generated from the entity identifier. Most types commonly used for entity identifiers are supported out of the box, but for more exotic types you may need specific configuration. See Mapping the document identifier for details.

  • The index won’t have any field. Fields must be mapped to properties explicitly. See Mapping a property to an index field with @GenericField, @FullTextField, …​ for details.

5.3.2. Explicit index/backend

You can change the name of the index by setting @Indexed(index = …​). Note that index names must be unique in a given application.

Example 18. Explicit index name with @Indexed.index
@Entity
@Indexed(index = "AuthorIndex")
public class Author {

If you defined named backends, you can map entities to another backend than the default one. By setting @Indexed(backend = "backend2") you inform Hibernate Search that the index for your entity must be created in the backend named "backend2". This may be useful if your model has clearly defined sub-parts with very different indexing requirements.

Example 19. Explicit backend with @Indexed.backend
@Entity
@Table(name = "\"user\"")
@Indexed(backend = "backend2")
public class User {

Entities indexed in different backends cannot be targeted by the same query. For example, with the mappings defined above, the following code will throw an exception because Author and User are indexed in different backends:

// This will fail because Author and User are indexed in different backends
searchSession.search( Arrays.asList( Author.class, User.class ) )
        .where( f -> f.matchAll() )
        .fetchHits( 20 );

5.3.3. Conditional indexing and routing

The mapping of an entity to an index is not always as straightforward as "this entity type goes to this index". For many reasons, but mainly for performance reasons, you may want to customize when and where a given entity is indexed:

  • You may not want to index all entities of a given type: for example, prevent indexing of entities when their status property is set to DRAFT or ARCHIVED, because users are not supposed to search for those entities.

  • You may want to route entities to a specific shard of the index: for example, route entities based on their language property, because each user has a specific language and only searches for entities in their language.

These behaviors can be implemented in Hibernate Search by assigning a routing bridge to the indexed entity type through @Indexed(routingBinder = …​).

For more information about routing bridges, see Routing bridge.

5.3.4. Programmatic mapping

You can mark an entity as indexed through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 20. Marking a class for indexing with .indexed()
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
TypeMappingStep authorMapping = mapping.type( Author.class );
authorMapping.indexed().index( "AuthorIndex" );
TypeMappingStep userMapping = mapping.type( User.class );
userMapping.indexed().backend( "backend2" );

5.4. Mapping the document identifier

5.4.1. Basics

Index documents, much like entities, need to be assigned an identifier so that Hibernate Search can handle updates and deletion.

When indexing Hibernate ORM entities, the entity identifier is used as a document identifier by default.

Provided the entity identifier has a supported type, identifier mapping will work out of the box and no explicit mapping is necessary.

5.4.2. Explicit identifier mapping

Explicit identifier mapping is required in the following cases:

  • The document identifier is not the entity identifier.

  • OR the entity identifier has a type that is not supported by default. This is the case of composite identifiers, in particular.

To select a property to map to the document identifier, just apply the @DocumentId annotation to that property:

Example 21. Mapping a property to the document identifier explicitly with @DocumentId
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @NaturalId
    @DocumentId
    private String isbn;

    public Book() {
    }

    // Getters and setters
    // ...

}

When the property type is not supported, it is also necessary to implement a custom identifier bridge, then refer to it in the @DocumentId annotation:

Example 22. Mapping a property with unsupported type to the document identifier with @DocumentId
@Entity
@Indexed
public class Book {

    @Id
    @Convert(converter = ISBNAttributeConverter.class)
    @DocumentId(identifierBridge = @IdentifierBridgeRef(type = ISBNIdentifierBridge.class))
    private ISBN isbn;

    public Book() {
    }

    // Getters and setters
    // ...

}

5.4.3. Supported identifier property types

Below is a table listing all types with built-in identifier bridges, i.e. property types that are supported out of the box when mapping a property to a document identifier.

The table also explains the value assigned to the document identifier, i.e. the value passed to the underlying backend.

Table 3. Property types with built-in identifier bridges
Property type Value of document identifiers Limitations

All enum types

name() as a java.lang.String

-

java.lang.String

Unchanged

-

java.lang.Character, char

A single-character java.lang.String

-

java.lang.Byte, byte

toString()

-

java.lang.Short, short

toString()

-

java.lang.Integer, int

toString()

-

java.lang.Long, long

toString()

-

java.lang.Double, double

toString()

-

java.lang.Float, float

toString()

-

java.lang.Boolean, boolean

toString()

-

java.math.BigDecimal

toString()

-

java.math.BigInteger

toString()

-

java.net.URI

toString()

-

java.net.URL

toExternalForm()

-

java.time.Instant

Formatted according to DateTimeFormatter.ISO_INSTANT.

-

java.time.LocalDate

Formatted according to DateTimeFormatter.ISO_LOCAL_DATE.

-

java.time.LocalTime

Formatted according to DateTimeFormatter.ISO_LOCAL_TIME.

-

java.time.LocalDateTime

Formatted according to DateTimeFormatter.ISO_LOCAL_DATE_TIME.

-

java.time.OffsetDateTime

Formatted according to DateTimeFormatter.ISO_OFFSET_DATE_TIME.

-

java.time.OffsetTime

Formatted according to DateTimeFormatter.ISO_OFFSET_TIME.

-

java.time.ZonedDateTime

Formatted according to DateTimeFormatter.ISO_ZONED_DATE_TIME.

-

java.time.ZoneId

getId()

-

java.time.ZoneOffset

getId()

-

java.time.Period

Formatted according to the ISO 8601 format for a duration (e.g. P1900Y12M21D).

-

java.time.Duration

Formatted according to the ISO 8601 format for a duration, using seconds and nanoseconds only (e.g. PT1.000000123S).

-

java.time.Year

Formatted according to the ISO 8601 format for a Year (e.g. 2017 for 2017 AD, 0000 for 1 BC, -10000 for 10,001 BC, etc.).

-

java.time.YearMonth

Formatted according to the ISO 8601 format for a Year-Month (e.g. 2017-11 for November 2017).

-

java.time.MonthDay

Formatted according to the ISO 8601 format for a Month-Day (e.g. --11-06 for November 6th).

-

java.util.UUID

toString() as a java.lang.String

-

java.util.Calendar

A java.time.ZonedDateTime representing the same date/time and timezone, formatted according to DateTimeFormatter.ISO_ZONED_DATE_TIME.

See Support for legacy java.util date/time APIs.

java.util.Date

Instant.ofEpochMilli(long) as a java.time.Instant formatted according to DateTimeFormatter.ISO_INSTANT.

See Support for legacy java.util date/time APIs.

java.sql.Timestamp

Instant.ofEpochMilli(long) as a java.time.Instant formatted according to DateTimeFormatter.ISO_INSTANT.

See Support for legacy java.util date/time APIs.

java.sql.Date

Instant.ofEpochMilli(long) as a java.time.Instant formatted according to DateTimeFormatter.ISO_INSTANT.

See Support for legacy java.util date/time APIs.

java.sql.Time

Instant.ofEpochMilli(long) as a java.time.Instant, formatted according to DateTimeFormatter.ISO_INSTANT.

See Support for legacy java.util date/time APIs.

GeoPoint and subtypes

Latitude as double and longitude as double, separated by a comma (e.g. 41.8919, 12.51133).

-

5.4.4. Programmatic mapping

You can map the document identifier through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 23. Mapping a property to the document identifier explicitly with .documentId()
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "isbn" ).documentId();

5.5. Mapping a property to an index field with @GenericField, @FullTextField, …​

5.5.1. Basics

Properties of an entity can be mapped to an index field directly: you just need to add an annotation, configure the field through the annotation attributes, and Hibernate Search will take care of extracting the property value and populating the index field when necessary.

Mapping a property to an index field looks like this:

Example 24. Mapping properties to fields directly
@FullTextField(analyzer = "english", projectable = Projectable.YES) (1)
@KeywordField(name = "title_sort", normalizer = "english", sortable = Sortable.YES) (2)
private String title;

@GenericField(projectable = Projectable.YES, sortable = Sortable.YES) (3)
private Integer pageCount;
1 Map the title property to a full-text field with the same name. Some options can be set to customize the field’s behavior, in this case the analyzer (for full-text indexing) and the fact that this field is projectable (its value can be retrieved from the index).
2 Map the title property to another field, configured differently: it is not analyzed, but simply normalized (i.e. it’s not split into multiple tokens), and it is stored in such a way that it can be used in sorts.

Mapping a single property to multiple fields is particularly useful when doing full-text search: at query time, you can use a different field depending on what you need. You can map a property to as many fields as you want, but each must have a unique name.

3 Map another property to its own field.

Before you map a property, you must consider two things:

The @*Field annotation

In its simplest form, property/field mapping is achieved by applying the @GenericField annotation to a property. This annotation will work for every supported property type, but is rather limited: it does not allow full-text search in particular. To go further, you will need to rely on different, more specific annotations, which offer specific attributes. The available annotations are described in detail in Available field annotations.

The type of the property

In order for the @*Field annotation to work correctly, the type of the mapped property must be supported by Hibernate Search. See Supported property types for a list of all types that are supported out of the box, and Mapping custom property types for indications on how to handle more complex types, be it simply containers (List<String>, Map<String, Integer>, …​) or custom types.

5.5.2. Available field annotations

Various field annotations exist, each offering its own set of attributes.

This section lists the different annotations and their use. For more details about available attributes, see Field annotation attributes.

@GenericField

A good default choice that will work for every property type with built-in support.

Fields mapped using this annotation do not provide any advanced features such as full-text search: matches on a generic field are exact matches.

@FullTextField

A text field whose value is considered as multiple words. Only works for String fields.

Matches on a full-text field can be more subtle than exact matches: match fields which contain a given word, match fields regardless of case, match fields ignoring diacritics, …​

Full-text fields should be assigned an analyzer, referenced by its name. By default, the analyzer named default will be used. See Analysis for more details about analyzers and full-text analysis.

Note you can also define a search analyzer to analyze searched terms differently.

Full-text fields cannot be sorted on. If you need to sort on the value of a property, it is recommended to use @KeywordField, with a normalizer if necessary (see below). Note that multiple fields can be added to the same property, so you can use both @FullTextField and @KeywordField if you need both full-text search and sorting.

@KeywordField

A text field whose value is considered as a single keyword. Only works for String fields.

Keyword fields allow more subtle matches, similarly to full-text fields, with the limitation that keyword fields only contain one token. On the other hand, this limitation allows keyword fields to be sorted on.

Keyword fields may be assigned a normalizer, referenced by its name. See Analysis for more details about normalizers and full-text analysis.

@ScaledNumberField

A numeric field for integer or floating-point values that require a higher precision than doubles but always have roughly the same scale. Only works for either java.math.BigDecimal or java.math.BigInteger fields.

Scaled numbers are indexed as integers, typically a long (64 bits), with a fixed scale that is consistent for all values of the field across all documents. Because scaled numbers are indexed with a fixed precision, they cannot represent all BigDecimal or BigInteger values. Values that are too large to be indexed will trigger a runtime exception. Values that have trailing decimal digits will be rounded to the nearest integer.

This annotation allows setting the decimalScale attribute.

@NonStandardField

An annotation for advanced use cases where a value binder is used and that binder is expected to define an index field type that does not support any of the standard options: searchable, sortable, …​

This annotation is very useful for cases when a field type native to the backend is necessary: defining the mapping directly as JSON for Elasticsearch, or manipulating IndexableField directly for Lucene.

Fields mapped using this annotation have very limited configuration options from the annotation (no searchable/sortable/etc.), but the value binder will be able to pick a non-standard field type, which generally gives much more flexibility.

5.5.3. Field annotation attributes

Various field mapping annotations exist, each offering its own set of attributes.

This section lists the different annotation attributes and their use. For more details about available annotations, see Available field annotations.

name

The name of the index field. By default, it is the same as the property name. You may want to change it in particular when mapping a single property to multiple fields.

Value: String. The name must not contain the dot character (.). Defaults to the name of the property.

sortable

Whether the field can be sorted on, i.e. whether a specific data structure is added to the index to allow efficient sorts when querying.

Value: Sortable.YES, Sortable.NO, Sortable.DEFAULT.

This option is not available for @FullTextField. See here for an explanation and some solutions.

projectable

Whether the field can be projected on, i.e. whether the field value is stored in the index to allow retrieval later when querying.

Value: Projectable.YES, Projectable.NO, Projectable.DEFAULT.

aggregable

Whether the field can be aggregated, i.e. whether the field value is stored in a specific data structure in the index to allow aggregations later when querying.

Value: Aggregable.YES, Aggregable.NO, Aggregable.DEFAULT.

searchable

Whether the field can be searched on, i.e. whether the field is indexed in order to allow applying predicates later when querying.

Value: Searchable.YES, Searchable.NO, Searchable.DEFAULT.

indexNullAs

The value to use as a replacement anytime the property value is null.

Disabled by default.

The replacement is defined as a String. Thus, its value has to be parsed. Look up the column Parsing method for 'indexNullAs' in Supported property types to find out the format used when parsing.
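
For example, the sketch below (using a hypothetical category property) indexes the string "unknown" whenever the property value is null:

@KeywordField(indexNullAs = "unknown")
private String category;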

extraction

How elements to index should be extracted from the property in the case of container types (List, Optional, Map, …​).

By default, for properties that have a container type, the innermost elements will be indexed. For example for a property of type List<String>, elements of type String will be indexed.

This default behavior and ways to override it are described in the section Mapping container types with container extractors.

analyzer

The analyzer to apply to field values when indexing and querying. Only available on @FullTextField.

By default, the analyzer named default will be used.

See Analysis for more details about analyzers and full-text analysis.

searchAnalyzer

An optional different analyzer, overriding the one defined with the analyzer attribute, to use only when analyzing searched terms.

If not defined, the analyzer assigned to analyzer will be used.

See Analysis for more details about analyzers and full-text analysis.
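
For example, the sketch below (with hypothetical analyzer names) indexes values with one analyzer but analyzes searched terms with another:

@FullTextField(analyzer = "autocomplete_indexing", searchAnalyzer = "autocomplete_query")
private String title;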

normalizer

The normalizer to apply to field values when indexing and querying. Only available on @KeywordField.

See Analysis for more details about normalizers and full-text analysis.

norms

Whether index-time scoring information for the field should be stored or not. Only available on @KeywordField and @FullTextField.

Enabling norms will improve the quality of scoring. Disabling norms will reduce the disk space used by the index.

Value: Norms.YES, Norms.NO, Norms.DEFAULT.

termVector

The term vector storing strategy. Only available on @FullTextField.

The different values of this attribute are:

Value Definition

TermVector.YES

Store the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term’s frequency.

TermVector.NO

Do not store term vectors.

TermVector.WITH_POSITIONS

Store the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document.

TermVector.WITH_OFFSETS

Store the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms.

TermVector.WITH_POSITION_OFFSETS

Store the term vector, token position and offset information. This is a combination of YES, WITH_OFFSETS and WITH_POSITIONS.

TermVector.WITH_POSITIONS_PAYLOADS

Store the term vector, token position and token payloads. This is the same as TermVector.WITH_POSITIONS plus it contains the payload of each occurrence of a term in a document.

TermVector.WITH_POSITIONS_OFFSETS_PAYLOADS

Store the term vector, token position, offset information and token payloads. This is the same as TermVector.WITH_POSITION_OFFSETS plus it contains the payload of each occurrence of a term in a document.

decimalScale

How the scale of a large number (BigInteger or BigDecimal) should be adjusted before it is indexed as a fixed-precision integer. Only available on @ScaledNumberField.

To index numbers that have significant digits after the decimal point, set the decimalScale to the number of digits you need indexed. The decimal point will be shifted that many times to the right before indexing, preserving that many digits from the decimal part. To index very large numbers that cannot fit in a long, set the decimalScale to a negative value. The decimal point will be shifted that many times to the left before indexing, dropping all digits from the decimal part.

decimalScale with strictly positive values is allowed only for BigDecimal, since BigInteger values have no decimal digits.

Note that shifting of the decimal point is completely transparent and will not affect how you use the search DSL: you are expected to provide "normal" BigDecimal or BigInteger values, and Hibernate Search will apply the decimalScale and rounding transparently.

As a result of the rounding, search predicates and sorts will only be as precise as what the decimalScale allows.

Note that rounding does not affect projections, which will return the original value without any loss of precision.

A typical use case is monetary amounts, with a decimal scale of 2 because only two digits are generally needed beyond the decimal point.

Using Hibernate ORM mapping, a default decimalScale is taken automatically from the scale of the corresponding SQL @Column, using the Hibernate ORM metadata. The value can be overridden explicitly using the decimalScale attribute.
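
For example, the sketch below (using a hypothetical price property) preserves two decimal digits for a monetary amount:

@ScaledNumberField(decimalScale = 2)
private BigDecimal price;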

5.5.4. Supported property types

Below is a table listing all types with built-in value bridges, i.e. property types that are supported out of the box when mapping a property to an index field.

The table also explains the value assigned to the index field, i.e. the value passed to the underlying backend for indexing.

For information about the underlying indexing and storage used by the backend, see Lucene field types or Elasticsearch field types depending on your backend.

Table 4. Property types with built-in value bridges
Property type Value of index field (if different) Limitations Parsing method for 'indexNullAs'

All enum types

name() as a java.lang.String

-

Enum.valueOf(String)

java.lang.String

-

-

-

java.lang.Character, char

A single-character java.lang.String

-

Accepts any single-character java.lang.String

java.lang.Byte, byte

-

-

Byte.parseByte(String)

java.lang.Short, short

-

-

Short.parseShort(String)

java.lang.Integer, int

-

-

Integer.parseInt(String)

java.lang.Long, long

-

-

Long.parseLong(String)

java.lang.Double, double

-

-

Double.parseDouble(String)

java.lang.Float, float

-

-

Float.parseFloat(String)

java.lang.Boolean, boolean

-

-

Accepts the strings true or false, ignoring case

java.math.BigDecimal

-

-

new BigDecimal(String)

java.math.BigInteger

-

-

new BigInteger(String)

java.net.URI

toString() as a java.lang.String

-

new URI(String)

java.net.URL

toExternalForm() as a java.lang.String

-

new URL(String)

java.time.Instant

-

Possibly lower range/resolution

Instant.parse(String)

java.time.LocalDate

-

Possibly lower range/resolution

LocalDate.parse(String).

java.time.LocalTime

-

Possibly lower range/resolution

LocalTime.parse(String)

java.time.LocalDateTime

-

Possibly lower range/resolution

LocalDateTime.parse(String)

java.time.OffsetDateTime

-

Possibly lower range/resolution

OffsetDateTime.parse(String)

java.time.OffsetTime

-

Possibly lower range/resolution

OffsetTime.parse(String)

java.time.ZonedDateTime

-

Possibly lower range/resolution

ZonedDateTime.parse(String)

java.time.ZoneId

getId() as a java.lang.String

-

ZoneId.of(String)

java.time.ZoneOffset

getTotalSeconds() as a java.lang.Integer

-

ZoneOffset.of(String)

java.time.Period

A formatted java.lang.String: <years on 11 characters><months on 11 characters><days on 11 characters>

-

Period.parse(String)

java.time.Duration

toNanos() as a java.lang.Long

Possibly lower range/resolution

Duration.parse(String)

java.time.Year

-

Possibly lower range/resolution

Year.parse(String)

java.time.YearMonth

-

Possibly lower range/resolution

YearMonth.parse(String)

java.time.MonthDay

-

-

MonthDay.parse(String)

java.util.UUID

toString() as a java.lang.String

-

UUID.fromString(String)

java.util.Calendar

A java.time.ZonedDateTime representing the same date/time and timezone.

See Support for legacy java.util date/time APIs.

ZonedDateTime.parse(String)

java.util.Date

Instant.ofEpochMilli(long) as a java.time.Instant.

See Support for legacy java.util date/time APIs.

Instant.parse(String)

java.sql.Timestamp

Instant.ofEpochMilli(long) as a java.time.Instant.

See Support for legacy java.util date/time APIs.

Instant.parse(String)

java.sql.Date

Instant.ofEpochMilli(long) as a java.time.Instant.

See Support for legacy java.util date/time APIs.

Instant.parse(String)

java.sql.Time

Instant.ofEpochMilli(long) as a java.time.Instant.

See Support for legacy java.util date/time APIs.

Instant.parse(String)

GeoPoint and subtypes

-

-

Latitude as double and longitude as double, separated by a comma. Ex: 41.8919, 12.51133.

Range and resolution of date/time fields

With a few exceptions, most date and time values are passed as-is to the backend; e.g. a LocalDateTime property would be passed as a LocalDateTime to the backend.

Internally, however, the Lucene and Elasticsearch backends use a different representation of date/time types. As a result, date and time fields stored in the index may have a smaller range and resolution than the corresponding Java type.

The documentation of each backend provides more information: see here for Lucene and here for Elasticsearch.

5.5.5. Support for legacy java.util date/time APIs

Using legacy date/time types such as java.util.Calendar, java.util.Date, java.sql.Timestamp, java.sql.Date, java.sql.Time is not recommended, due to their numerous quirks and shortcomings. The java.time package introduced in Java 8 should generally be preferred.

That being said, integration constraints may force you to rely on the legacy date/time APIs, which is why Hibernate Search still attempts to support them on a best effort basis.

Since Hibernate Search uses the java.time APIs to represent date/time internally, the legacy date/time types need to be converted before they can be indexed. Hibernate Search keeps things simple: java.util.Date, java.util.Calendar, etc. will be converted using their time-value (number of milliseconds since the epoch), which will be assumed to represent the same date/time in Java 8 APIs. In the case of java.util.Calendar, timezone information will be preserved for projections.

For all dates after 1900, this will work exactly as expected.

Before 1900, indexing and searching through Hibernate Search APIs will also work as expected, but if you need to access the index natively, for example through direct HTTP calls to an Elasticsearch server, you will notice that the indexed values are slightly "off". This is caused by differences in the implementation of java.time and legacy date/time APIs which lead to slight differences in the interpretation of time-values (number of milliseconds since the epoch).

The "drifts" are consistent: they will also happen when building a predicate, and they will happen in the opposite direction when projecting. As a result, the differences will not be visible from an application relying on the Hibernate Search APIs exclusively. They will, however, be visible when accessing indexes natively.

For the large majority of use cases, this will not be a problem. If this behavior is not acceptable for your application, you should look into implementing custom value bridges and instructing Hibernate Search to use them by default for java.util.Date, java.util.Calendar, etc.: see Assigning default bridges with the bridge resolver.

Technically, conversions are difficult because the java.time APIs and the legacy date/time APIs do not have the same internal calendar.

In particular:

  • java.time assumes a "Local Mean Time" before 1900, while legacy date/time APIs do not support it (JDK-6281408). As a result, time values (number of milliseconds since the epoch) reported by the two APIs will be different for dates before 1900.

  • java.time uses a proleptic Gregorian calendar before October 15, 1582, meaning it acts as if the Gregorian calendar, along with its system of leap years, had always existed. Legacy date/time APIs, on the other hand, use the Julian calendar before that date (by default), meaning the leap years are not exactly the same ones. As a result, some dates that are deemed valid by one API will be deemed invalid by the other, for example February 29, 1500.

Those are the two main problems, but there may be others.

5.5.6. Mapping custom property types

Even types that are not supported out of the box can be mapped. There are various solutions, some simple and some more powerful, but they all come down to extracting data from the unsupported type and converting it to types that are supported by the backend.

There are two cases to distinguish between:

  1. If the unsupported type is simply a container (List<String>) or multiple nested containers (Map<Integer, List<String>>) whose elements have a supported type, then what you need is a container extractor. See Mapping container types with container extractors for more information.

  2. Otherwise, you will have to rely on a custom component, called a bridge, to extract data from your type. See Bridges for more information on custom bridges.

5.5.7. Programmatic mapping

You can map properties of an entity to an index field directly through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 25. Mapping properties to fields directly with .genericField(), .fullTextField(), …​
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "title" )
        .fullTextField()
                .analyzer( "english" ).projectable( Projectable.YES )
        .keywordField( "title_sort" )
                .normalizer( "english" ).sortable( Sortable.YES );
bookMapping.property( "pageCount" )
        .genericField().projectable( Projectable.YES ).sortable( Sortable.YES );

5.6. Mapping associated elements with @IndexedEmbedded

5.6.1. Basics

Using only @Indexed combined with @*Field annotations allows indexing an entity and its direct properties, which is nice but simplistic. A real-world model will include multiple object types holding references to one another, like the authors association in the example below.

Example 26. A multi-entity model with associations

This mapping will declare the following fields in the Book index:

  • title

  • …​ and nothing else.

@Entity
@Indexed (1)
public class Book {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english") (2)
    private String title;

    @ManyToMany
    private List<Author> authors = new ArrayList<>(); (3)

    public Book() {
    }

    // Getters and setters
    // ...

}
@Entity
public class Author {

    @Id
    private Integer id;

    private String name;

    @ManyToMany(mappedBy = "authors")
    private List<Book> books = new ArrayList<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 The Book entity is indexed.
2 The title of the book is mapped to an index field.
3 But how to index the Author name into the Book index?

When searching for a book, users will likely need to search by author name. In the world of high-performance indexes, cross-index joins are costly and usually not an option. The best way to address such use cases is generally to copy data: when indexing a Book, just copy the name of all its authors into the Book document.

That’s what @IndexedEmbedded does: it instructs Hibernate Search to embed the fields of an associated object into the main object. In the example below, it will instruct Hibernate Search to embed the name field defined in Author into Book, creating the field authors.name.

@IndexedEmbedded can be used on Hibernate ORM’s @Embedded properties as well as associations (@OneToOne, @OneToMany, @ManyToMany, …​).

Example 27. Using @IndexedEmbedded to index associated elements

This mapping will declare the following fields in the Book index:

  • title

  • authors.name

@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english")
    private String title;

    @ManyToMany
    @IndexedEmbedded (1)
    private List<Author> authors = new ArrayList<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
@Entity
public class Author {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name") (2)
    private String name;

    @ManyToMany(mappedBy = "authors")
    private List<Book> books = new ArrayList<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Add an @IndexedEmbedded to the authors property.
2 Map Author.name to an index field, even though Author is not directly mapped to an index (no @Indexed).

Document identifiers are not index fields. Consequently, they will be ignored by @IndexedEmbedded.

To embed another entity’s identifier with @IndexedEmbedded, map that identifier to a field explicitly using @GenericField or another @*Field annotation.
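
For example, starting from the Author entity of Example 27, the sketch below exposes the identifier as a regular index field, so that @IndexedEmbedded on Book will also embed it as authors.id:

@Entity
public class Author {

    @Id
    @GenericField // mapped explicitly: document identifiers alone are ignored by @IndexedEmbedded
    private Integer id;

    // ...

}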

When @IndexedEmbedded is applied to an association, i.e. to a property that refers to entities (like the example above), the association must be bidirectional. Otherwise, Hibernate Search will throw an exception on startup.

See Reindexing when embedded elements change for the reasons behind this restriction and ways to circumvent it.

Index-embedding can be nested on multiple levels; for example you can decide to index-embed the place of birth of authors, to be able to search for books written by Russian authors exclusively:

Example 28. Nesting multiple @IndexedEmbedded

This mapping will declare the following fields in the Book index:

  • title

  • authors.name

  • authors.placeOfBirth.country

@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english")
    private String title;

    @ManyToMany
    @IndexedEmbedded (1)
    private List<Author> authors = new ArrayList<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
@Entity
public class Author {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name") (2)
    private String name;

    @Embedded
    @IndexedEmbedded (3)
    private Address placeOfBirth;

    @ManyToMany(mappedBy = "authors")
    private List<Book> books = new ArrayList<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
@Embeddable
public class Address {

    @FullTextField(analyzer = "name") (4)
    private String country;

    private String city;

    private String street;

    public Address() {
    }

    // Getters and setters
    // ...

}
1 Add an @IndexedEmbedded to the authors property.
2 Map Author.name to an index field, even though Author is not directly mapped to an index (no @Indexed).
3 Add an @IndexedEmbedded to the placeOfBirth property.
4 Map Address.country to an index field, even though Address is not directly mapped to an index (no @Indexed).

By default, @IndexedEmbedded will nest other @IndexedEmbedded encountered in the indexed-embedded type recursively, without any limit, which can cause infinite recursion. See Filtering embedded fields and breaking @IndexedEmbedded cycles for ways to address this.

5.6.2. @IndexedEmbedded and null values

When properties targeted by an @IndexedEmbedded contain null elements, these elements are simply not indexed.

Contrary to Mapping a property to an index field with @GenericField, @FullTextField, …​, there is no indexNullAs feature to index a specific value for null objects. However, you can take advantage of the exists predicate in search queries to look for documents where a given @IndexedEmbedded has (or doesn’t have) a value: simply pass the name of the object field to the exists predicate, for example authors in the example above.

5.6.3. @IndexedEmbedded on container types

When properties targeted by an @IndexedEmbedded have a container type (List, Optional, Map, …​), the innermost elements will be embedded. For example for a property of type List<MyEntity>, elements of type MyEntity will be embedded.

This default behavior and ways to override it are described in the section Mapping container types with container extractors.

5.6.4. Setting the object field name with name

By default, @IndexedEmbedded will create an object field with the same name as the annotated property, and will add embedded fields to that object field. So if @IndexedEmbedded is applied to a property named authors in a Book entity, the name field of the authors will be copied to the authors.name index field when Book is indexed.

It is possible to change the name of the object field by setting the name attribute; for example using @IndexedEmbedded(name = "allAuthors") in the example above will result in the name field of authors being copied to the index field allAuthors.name instead of authors.name.

The name must not contain the dot character (.).
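
For example, a minimal sketch of the renamed object field, reusing the authors property from the examples above:

@ManyToMany
@IndexedEmbedded(name = "allAuthors") // embedded fields are now added under "allAuthors" instead of "authors"
private List<Author> authors = new ArrayList<>();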

5.6.5. Setting the field name prefix with prefix

The prefix attribute in @IndexedEmbedded is deprecated and will ultimately be removed. Use name instead.

By default, @IndexedEmbedded will prefix the name of embedded fields with the name of the property it is applied to, followed by a dot. So if @IndexedEmbedded is applied to a property named authors in a Book entity, the name field of the authors will be copied to the authors.name field when Book is indexed.

It is possible to change this prefix by setting the prefix attribute, for example @IndexedEmbedded(prefix = "author.") (do not forget the trailing dot!).

The prefix should generally be a sequence of non-dot characters ending with a single dot, for example my_Property. (note the trailing dot).

Changing the prefix to a string that does not include any dot at the end (my_Property), or that includes a dot anywhere but at the very end (my.Property.), will lead to complex, undocumented, legacy behavior. Do this at your own risk.

In particular, a prefix that does not end with a dot will lead to incorrect behavior in some APIs exposed to custom bridges: the addValue/addObject methods that accept a field name.

5.6.6. Casting the target of @IndexedEmbedded with targetType

By default, the type of indexed-embedded values is detected automatically using reflection, taking into account container extraction if relevant; for example @IndexedEmbedded List<MyEntity> will be detected as having values of type MyEntity. Fields to be embedded will be inferred from the mapping of the value type and its supertypes; in the example, @GenericField annotations present on MyEntity and its superclasses will be taken into account, but annotations defined in its subclasses will be ignored.

If for some reason a schema does not expose the correct type for a property (e.g. a raw List, or List<MyEntityInterface> instead of List<MyEntityImpl>) it is possible to define the expected type of values by setting the targetType attribute in @IndexedEmbedded. On bootstrap, Hibernate Search will then resolve fields to be embedded based on the given target type, and at runtime it will cast values to the given target type.

Failures to cast indexed-embedded values to the designated type will be propagated and lead to indexing failure.
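
As an illustration, here is a minimal sketch using targetType, assuming a hypothetical AuthorImpl entity implementing an Author interface:

@ManyToMany(targetEntity = AuthorImpl.class)
@IndexedEmbedded(targetType = AuthorImpl.class) // resolve embedded fields from AuthorImpl, and cast values to AuthorImpl at runtime
private List<Author> authors = new ArrayList<>();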

5.6.7. Reindexing when embedded elements change

When the "embedded" entity changes, Hibernate Search will handle reindexing of the "embedding" entity.

This will work transparently most of the time, as long as the association @IndexedEmbedded is applied to is bidirectional (uses Hibernate ORM’s mappedBy).

When Hibernate Search is unable to handle an association, it will throw an exception on bootstrap. If this happens, refer to Basics to know more.

5.6.8. Filtering embedded fields and breaking @IndexedEmbedded cycles

By default, @IndexedEmbedded will "embed" everything: every field encountered in the indexed-embedded element, and every @IndexedEmbedded encountered in the indexed-embedded element, recursively.

This will work just fine for simpler use cases, but may lead to problems for more complex models:

  • If the indexed-embedded element declares many index fields (Hibernate Search fields), only some of which are actually useful to the "index-embedding" type, the extra fields will decrease indexing performance needlessly.

  • If there is a cycle of @IndexedEmbedded (e.g. A index-embeds b of type B, which index-embeds a of type A) the index-embedding type will end up with an infinite amount of fields (a.b.someField, a.b.a.b.someField, a.b.a.b.a.b.someField, …​), which Hibernate Search will detect and reject with an exception.

To address these problems, it is possible to filter the fields to embed, to only include those that are actually useful. Two filtering attributes are available on @IndexedEmbedded and may be combined:

includePaths

The paths of index fields from the indexed-embedded element that should be embedded.

Provided paths must be relative to the indexed-embedded element, i.e. they must not include its name or prefix.

This takes precedence over includeDepth (see below).

includeDepth

The number of levels of indexed-embedded that will have all their fields included by default.

includeDepth is the number of @IndexedEmbedded that will be traversed and for which all fields of the indexed-embedded element will be included, even if these fields are not included explicitly through includePaths:

  • includeDepth=0 means that fields of the indexed-embedded element are not included, nor is any field of nested indexed-embedded elements, unless these fields are included explicitly through includePaths.

  • includeDepth=1 means that fields of the indexed-embedded element are included, but not fields of nested indexed-embedded elements, unless these fields are included explicitly through includePaths.

  • And so on.

The default value depends on the value of the includePaths attribute: if includePaths is empty, the default is Integer.MAX_VALUE (include all fields at every level); if includePaths is not empty, the default is 0 (only include fields included explicitly).

Dynamic fields and filtering

Dynamic fields are not directly affected by filtering rules: a dynamic field will be included if and only if its parent is included.

This means in particular that includeDepth and includePaths constraints only need to match the nearest static parent of a dynamic field in order for that field to be included.

Below are two examples: one leveraging includePaths only, and one leveraging includePaths and includeDepth.

Example 29. Filtering indexed-embedded fields with includePaths

This mapping will declare the following fields in the Human index:

  • name

  • nickname

  • parents.name: explicitly included because includePaths on parents includes name.

  • parents.nickname: explicitly included because includePaths on parents includes nickname.

  • parents.parents.name: explicitly included because includePaths on parents includes parents.name.

The following fields in particular are excluded:

  • parents.parents.nickname: not implicitly included because includeDepth is not set and defaults to 0, and not explicitly included either because includePaths on parents does not include parents.nickname.

  • parents.parents.parents.name: not implicitly included because includeDepth is not set and defaults to 0, and not explicitly included either because includePaths on parents does not include parents.parents.name.

@Entity
@Indexed
public class Human {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name")
    private String name;

    @FullTextField(analyzer = "name")
    private String nickname;

    @ManyToMany
    @IndexedEmbedded(includePaths = { "name", "nickname", "parents.name" })
    private List<Human> parents = new ArrayList<>();

    @ManyToMany(mappedBy = "parents")
    private List<Human> children = new ArrayList<>();

    public Human() {
    }

    // Getters and setters
    // ...

}
Example 30. Filtering indexed-embedded fields with includePaths and includeDepth

This mapping will declare the following fields in the Human index:

  • name

  • nickname

  • parents.name: implicitly included at depth 0 because includeDepth > 0 (so parents.* is included implicitly).

  • parents.nickname: implicitly included at depth 0 because includeDepth > 0 (so parents.* is included implicitly).

  • parents.parents.name: implicitly included at depth 1 because includeDepth > 1 (so parents.parents.* is included implicitly).

  • parents.parents.nickname: implicitly included at depth 1 because includeDepth > 1 (so parents.parents.* is included implicitly).

  • parents.parents.parents.name: not implicitly included at depth 2 because includeDepth = 2 (so parents.parents.parents is included implicitly, but subfields can only be included explicitly) but explicitly included because includePaths on parents includes parents.parents.name.

The following fields in particular are excluded:

  • parents.parents.parents.nickname: not implicitly included at depth 2 because includeDepth = 2 (so parents.parents.parents is included implicitly, but subfields must be included explicitly) and not explicitly included either because includePaths on parents does not include parents.parents.nickname.

  • parents.parents.parents.parents.name: not implicitly included at depth 3 because includeDepth = 2 (so parents.parents.parents is included implicitly, but parents.parents.parents.parents and subfields can only be included explicitly) and not explicitly included either because includePaths on parents does not include parents.parents.parents.name.

@Entity
@Indexed
public class Human {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name")
    private String name;

    @FullTextField(analyzer = "name")
    private String nickname;

    @ManyToMany
    @IndexedEmbedded(includeDepth = 2, includePaths = { "parents.parents.name" })
    private List<Human> parents = new ArrayList<>();

    @ManyToMany(mappedBy = "parents")
    private List<Human> children = new ArrayList<>();

    public Human() {
    }

    // Getters and setters
    // ...

}

5.6.9. Structuring embedded elements as nested documents using structure

Indexed-embedded fields can be structured in one of two ways, configured through the structure attribute of the @IndexedEmbedded annotation. To illustrate structure options, let’s assume the class Book is annotated with @Indexed and its authors property is annotated with @IndexedEmbedded:

  • Book instance

    • title = Leviathan Wakes

    • authors =

      • Author instance

        • firstName = Daniel

        • lastName = Abraham

      • Author instance

        • firstName = Ty

        • lastName = Frank

DEFAULT or FLATTENED structure

By default, or when using @IndexedEmbedded(structure = FLATTENED) as shown below, indexed-embedded fields are "flattened", meaning that the tree structure is not preserved.

Example 31. @IndexedEmbedded with a flattened structure
@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english")
    private String title;

    @ManyToMany
    @IndexedEmbedded(structure = ObjectStructure.FLATTENED) (1)
    private List<Author> authors = new ArrayList<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
1 Explicitly set the structure of indexed-embedded to FLATTENED. This is not strictly necessary, since FLATTENED is the default.
@Entity
public class Author {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name")
    private String firstName;

    @FullTextField(analyzer = "name")
    private String lastName;

    @ManyToMany(mappedBy = "authors")
    private List<Book> books = new ArrayList<>();

    public Author() {
    }

    // Getters and setters
    // ...

}

The book instance mentioned earlier would be indexed with a structure roughly similar to this:

  • Book document

    • title = Leviathan Wakes

    • authors.firstName = [Daniel, Ty]

    • authors.lastName = [Abraham, Frank]

The authors.firstName and authors.lastName fields were "flattened" and now each has two values; the knowledge of which last name corresponds to which first name has been lost.

This is more efficient for indexing and querying, but can cause unexpected behavior when querying the index on both the author’s first name and the author’s last name.

For example, the book instance described above would show up as a match to a query such as authors.firstName:Ty AND authors.lastName:Abraham, even though "Ty Abraham" is not one of this book’s authors:

Example 32. Searching with a flattened structure
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .must( f.match().field( "authors.firstName" ).matching( "Ty" ) ) (1)
                .must( f.match().field( "authors.lastName" ).matching( "Abraham" ) ) ) (1)
        .fetchHits( 20 );

assertThat( hits ).isNotEmpty(); (2)
1 Require that hits have an author with the first name Ty and an author with the last name Abraham…​ but not necessarily the same author!
2 The hits will include the book whose authors are "Daniel Abraham" and "Ty Frank", even though neither of them is named "Ty Abraham".
NESTED structure

When indexed-embedded elements are "nested", i.e. when using @IndexedEmbedded(structure = NESTED) as shown below, the tree structure is preserved by transparently creating one separate "nested" document for each indexed-embedded element.

Example 33. @IndexedEmbedded with a nested structure
@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english")
    private String title;

    @ManyToMany
    @IndexedEmbedded(structure = ObjectStructure.NESTED) (1)
    private List<Author> authors = new ArrayList<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
1 Explicitly set the structure of indexed-embedded objects to NESTED.
@Entity
public class Author {

    @Id
    private Integer id;

    @FullTextField(analyzer = "name")
    private String firstName;

    @FullTextField(analyzer = "name")
    private String lastName;

    @ManyToMany(mappedBy = "authors")
    private List<Book> books = new ArrayList<>();

    public Author() {
    }

    // Getters and setters
    // ...

}

The book instance mentioned earlier would be indexed with a structure roughly similar to this:

  • Book document

    • title = Leviathan Wakes

    • Nested documents

      • Nested document #1 for "authors"

        • authors.firstName = Daniel

        • authors.lastName = Abraham

      • Nested document #2 for "authors"

        • authors.firstName = Ty

        • authors.lastName = Frank

The book is effectively indexed as three documents: the root document for the book, and two internal, "nested" documents for the authors, preserving the knowledge of which last name corresponds to which first name at the cost of degraded performance when indexing and querying.

The nested documents are "hidden" and won’t directly show up in search results. No need to worry about nested documents being "mixed up" with root documents.

If special care is taken when building predicates on fields within nested documents, using a nested predicate, queries containing predicates on both the author’s first name and the author’s last name will behave as one would (intuitively) expect.

For example, the book instance described above would not show up as a match to a query such as authors.firstName:Ty AND authors.lastName:Abraham, thanks to the nested predicate (which can only be used when indexing with the NESTED structure):

Example 34. Searching with a nested structure
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.nested().objectField( "authors" ).nest( f.bool() (1)
                .must( f.match().field( "authors.firstName" ).matching( "Ty" ) ) (2)
                .must( f.match().field( "authors.lastName" ).matching( "Abraham" ) ) ) ) (2)
        .fetchHits( 20 );

assertThat( hits ).isEmpty(); (3)
1 Require that the two constraints (first name and last name) apply to the same author.
2 Require that hits have an author with the first name Ty and an author with the last name Abraham.
3 The hits will not include the book whose authors are "Daniel Abraham" and "Ty Frank", since neither of them is named "Ty Abraham".

5.6.10. Programmatic mapping

You can embed the fields of an associated object into the main object through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 35. Using .indexedEmbedded() to index associated elements

This mapping will declare the following fields in the Book index:

  • title

  • authors.name

TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "title" )
        .fullTextField().analyzer( "english" );
bookMapping.property( "authors" )
        .indexedEmbedded();
TypeMappingStep authorMapping = mapping.type( Author.class );
authorMapping.property( "name" )
        .fullTextField().analyzer( "name" );

5.7. Mapping container types with container extractors

5.7.1. Basics

Most built-in annotations applied to properties will work transparently when applied to container types:

  • @GenericField applied to a property of type String will index the property value directly.

  • @GenericField applied to a property of type OptionalInt will index the optional’s value (an integer).

  • @GenericField applied to a property of type List<String> will index the list elements (strings).

  • @GenericField applied to a property of type Map<Integer, String> will index the map values (strings).

  • @GenericField applied to a property of type Map<Integer, List<String>> will index the list elements in the map values (strings).

  • Etc.

The same goes for other field annotations such as @FullTextField, as well as for @IndexedEmbedded in particular.

What happens behind the scenes is that Hibernate Search will inspect the property type and attempt to apply "container extractors", picking the first that works.
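
For instance, a minimal sketch of this default behavior, assuming a hypothetical genres property holding plain strings:

@ElementCollection
@GenericField // container extraction is applied automatically: each element of the list is indexed
private List<String> genres = new ArrayList<>();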

5.7.2. Explicit container extraction

In some cases, you will want to pick the container extractors to use explicitly. This is the case when a map’s keys must be indexed, instead of the values. Relevant annotations offer an extraction attribute to configure this, as shown in the example below.

All built-in extractor names are available as constants in org.hibernate.search.mapper.pojo.extractor.builtin.BuiltinContainerExtractors.
Example 36. Mapping Map keys to an index field using explicit container extractor definition
@ElementCollection (1)
@JoinTable(name = "book_pricebyformat")
@MapKeyColumn(name = "format")
@Column(name = "price")
@OrderBy("format asc")
@GenericField( (2)
        name = "availableFormats",
        extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY) (3)
)
private Map<BookFormat, BigDecimal> priceByFormat = new LinkedHashMap<>();
1 This annotation — and those below — are just Hibernate ORM configuration.
2 Declare an index field based on the priceByFormat property.
3 By default, Hibernate Search would index the map values (the book prices). This uses the extraction attribute to specify that map keys (the book formats) must be indexed instead.
When multiple levels of extractions are necessary, multiple extractors can be configured: extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY, BuiltinContainerExtractors.OPTIONAL). However, such complex mappings are unlikely since they are generally not supported by Hibernate ORM.

It is possible to implement and use custom container extractors, but at the moment these extractors will not be handled correctly for automatic reindexing, so the corresponding property must have automatic reindexing disabled.

See HSEARCH-3688 for more information.

5.7.3. Disabling container extraction

In some rare cases, container extraction is not wanted, and the @GenericField/@IndexedEmbedded is meant to be applied to the List/Optional/etc. directly. To ignore the default container extractors, most annotations offer an extraction attribute. Set it as below to disable extraction altogether:

Example 37. Disabling container extraction
@ManyToMany
@GenericField( (1)
        name = "authorCount",
        valueBridge = @ValueBridgeRef(type = MyCollectionSizeBridge.class), (2)
        extraction = @ContainerExtraction(extract = ContainerExtract.NO) (3)
)
private List<Author> authors = new ArrayList<>();
1 Declare an index field based on the authors property.
2 Instruct Hibernate Search to use the given bridge, which will extract the collection size (the number of authors).
3 Because the bridge is applied to the collection as a whole, and not to each author, the extraction attribute is used to disable container extraction.

5.7.4. Programmatic mapping

You can pick the container extractors to use explicitly when defining fields or indexed-embeddeds through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 38. Mapping Map keys to an index field using .extractor(…​)/.extractors(…​) for explicit container extractor definition
bookMapping.property( "priceByFormat" )
        .genericField( "availableFormats" )
                .extractor( BuiltinContainerExtractors.MAP_KEY );

Similarly, you can disable container extraction.

Example 39. Disabling container extraction with .noExtractors()
bookMapping.property( "authors" )
        .genericField( "authorCount" )
                .valueBridge( new MyCollectionSizeBridge() )
                .noExtractors();

5.8. Mapping geo-point types

5.8.1. Basics

Hibernate Search provides a variety of spatial features such as a distance predicate and a distance sort. These features require that spatial coordinates are indexed. More precisely, they require that a geo-point, i.e. a latitude and longitude in the geographic coordinate system, be indexed.

Geo-points are a bit of an exception, because there isn’t any type in the standard Java library to represent them. For that reason, Hibernate Search defines its own interface, org.hibernate.search.engine.spatial.GeoPoint. Since your model probably uses a different type to represent geo-points, mapping geo-points requires some extra steps.

Two options are available:

  • If your geo-points are represented by a dedicated, immutable type, simply use @GenericField and the GeoPoint interface, as explained in Using @GenericField and the GeoPoint interface.

  • For every other case, use the more complex (but more powerful) @GeoPointBinding, as explained in Using @GeoPointBinding, @Latitude and @Longitude.

5.8.2. Using @GenericField and the GeoPoint interface

When geo-points are represented in your entity model by a dedicated, immutable type, you can simply make that type implement the GeoPoint interface, and use simple property/field mapping with @GenericField:

Example 40. Mapping spatial coordinates by implementing GeoPoint and using @GenericField
@Embeddable
public class MyCoordinates implements GeoPoint { (1)

    @Basic
    private Double latitude;

    @Basic
    private Double longitude;

    protected MyCoordinates() {
        // For Hibernate ORM
    }

    public MyCoordinates(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public double latitude() { (2)
        return latitude;
    }

    @Override
    public double longitude() {
        return longitude;
    }
}
@Entity
@Indexed
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    @Embedded
    @GenericField (3)
    private MyCoordinates placeOfBirth;

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Model the geo-point as an embeddable implementing GeoPoint. A custom type with a corresponding Hibernate ORM UserType would work as well.
2 The geo-point type must be immutable: it does not declare any setter.
3 Apply the @GenericField annotation to the placeOfBirth property holding the coordinates. An index field named placeOfBirth will be added to the index. Options generally used on @GenericField can be used here as well.

The geo-point type must be immutable, i.e. the latitude and longitude of a given instance may never change.

This is a core assumption of @GenericField and generally all @*Field annotations: changes to the coordinates will be ignored and will not trigger reindexing as one would expect.

If the type holding your coordinates is mutable, do not use @GenericField and refer to Using @GeoPointBinding, @Latitude and @Longitude instead.

If your geo-point type is immutable, but extending the GeoPoint interface is not an option, you can also use a custom value bridge converting between the custom geo-point type and GeoPoint. GeoPoint offers static methods to quickly build a GeoPoint instance.
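
For instance, a minimal sketch of such a bridge, assuming a hypothetical immutable MyGeo type exposing latitude()/longitude() accessors:

public class MyGeoBridge implements ValueBridge<MyGeo, GeoPoint> {
    @Override
    public GeoPoint toIndexedValue(MyGeo value, ValueBridgeToIndexedValueContext context) {
        // GeoPoint.of(...) is one of the static factory methods mentioned above
        return value == null ? null : GeoPoint.of( value.latitude(), value.longitude() );
    }
}

Such a bridge would then be applied with @GenericField(valueBridge = @ValueBridgeRef(type = MyGeoBridge.class)) on the property holding the coordinates.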

5.8.3. Using @GeoPointBinding, @Latitude and @Longitude

For cases where coordinates are stored in a mutable object, the solution is the @GeoPointBinding annotation. Combined with the @Latitude and @Longitude annotations, it can map the coordinates of any type that declares a latitude and longitude of type double:

Example 41. Mapping spatial coordinates using @GeoPointBinding
@Entity
@Indexed
@GeoPointBinding(fieldName = "placeOfBirth") (1)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    @Latitude (2)
    private Double placeOfBirthLatitude;

    @Longitude (3)
    private Double placeOfBirthLongitude;

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Apply the @GeoPointBinding annotation to the type, setting fieldName to the name of the index field.
2 Apply @Latitude to the property holding the latitude. It must be of double or Double type.
3 Apply @Longitude to the property holding the longitude. It must be of double or Double type.

The @GeoPointBinding annotation may also be applied to a property, in which case the @Latitude and @Longitude annotations must be applied to properties of the property’s type:

Example 42. Mapping spatial coordinates using @GeoPointBinding on a property
@Embeddable
public class MyCoordinates { (1)

    @Basic
    @Latitude (2)
    private Double latitude;

    @Basic
    @Longitude (3)
    private Double longitude;

    protected MyCoordinates() {
        // For Hibernate ORM
    }

    public MyCoordinates(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    public double getLatitude() {
        return latitude;
    }

    public void setLatitude(Double latitude) { (4)
        this.latitude = latitude;
    }

    public double getLongitude() {
        return longitude;
    }

    public void setLongitude(Double longitude) {
        this.longitude = longitude;
    }
}
@Entity
@Indexed
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "name")
    private String name;

    @Embedded
    @GeoPointBinding (5)
    private MyCoordinates placeOfBirth;

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Model the geo-point as embeddable. An entity would work as well.
2 In the geo-point type, apply @Latitude to the property holding the latitude.
3 In the geo-point type, apply @Longitude to the property holding the longitude.
4 The geo-point type may safely declare setters (it can be mutable).
5 Apply the @GeoPointBinding annotation to the property. Setting fieldName to the name of the index field is possible, but optional: the property name will be used by default.

It is possible to handle multiple sets of coordinates by applying the annotations multiple times and setting the markerSet attribute to a unique value:

Example 43. Mapping multiple sets of spatial coordinates using @GeoPointBinding
@Entity
@Indexed
@GeoPointBinding(fieldName = "placeOfBirth", markerSet = "birth") (1)
@GeoPointBinding(fieldName = "placeOfDeath", markerSet = "death") (2)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "name")
    private String name;

    @Latitude(markerSet = "birth") (3)
    private Double placeOfBirthLatitude;

    @Longitude(markerSet = "birth") (4)
    private Double placeOfBirthLongitude;

    @Latitude(markerSet = "death") (5)
    private Double placeOfDeathLatitude;

    @Longitude(markerSet = "death") (6)
    private Double placeOfDeathLongitude;

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Apply the @GeoPointBinding annotation to the type, setting fieldName to the name of the index field, and markerSet to a unique value.
2 Apply the @GeoPointBinding annotation to the type a second time, setting fieldName to the name of the index field (different from the first one), and markerSet to a unique value (different from the first one).
3 Apply @Latitude to the property holding the latitude for the first geo-point field. Set the markerSet attribute to the same value as the corresponding @GeoPointBinding annotation.
4 Apply @Longitude to the property holding the longitude for the first geo-point field. Set the markerSet attribute to the same value as the corresponding @GeoPointBinding annotation.
5 Apply @Latitude to the property holding the latitude for the second geo-point field. Set the markerSet attribute to the same value as the corresponding @GeoPointBinding annotation.
6 Apply @Longitude to the property holding the longitude for the second geo-point field. Set the markerSet attribute to the same value as the corresponding @GeoPointBinding annotation.

5.8.4. Programmatic mapping

You can map geo-point fields through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 44. Mapping spatial coordinates by implementing GeoPoint and using .genericField()
TypeMappingStep authorMapping = mapping.type( Author.class );
authorMapping.indexed();
authorMapping.property( "placeOfBirth" )
        .genericField();
Example 45. Mapping spatial coordinates using GeoPointBinder
TypeMappingStep authorMapping = mapping.type( Author.class );
authorMapping.indexed();
authorMapping.binder( GeoPointBinder.create().fieldName( "placeOfBirth" ) );
authorMapping.property( "placeOfBirthLatitude" )
        .marker( GeoPointBinder.latitude() );
authorMapping.property( "placeOfBirthLongitude" )
        .marker( GeoPointBinder.longitude() );

5.9. Mapping multiple alternatives

5.9.1. Basics

In some situations, it is necessary for a particular property to be indexed differently depending on the value of another property.

For example there may be an entity that has text properties whose content is in a different language depending on the value of another property, say language. In that case, you probably want to analyze the text differently depending on the language.

While this could definitely be solved with a custom type bridge, a convenient solution to that problem is to use the AlternativeBinder. This binder solves the problem this way:

  • at bootstrap, declare one index field per language, assigning a different analyzer to each field;

  • at runtime, put the content of the text property in a different field based on the language.

In order to use this binder, you will need to:

  • annotate a property with @AlternativeDiscriminator (e.g. the language property);

  • implement an AlternativeBinderDelegate that will declare the index fields (e.g. one field per language) and create an AlternativeValueBridge. This bridge is responsible for passing the property value to the relevant field at runtime.

  • apply the AlternativeBinder to the type hosting the properties (e.g. the type declaring the language property and the multi-language text properties). Generally you will want to create your own annotation for that.

Below is an example of how to use the binder.

Example 46. Mapping a property to a different index field based on a language property using AlternativeBinder
public enum Language { (1)

    ENGLISH( "en" ),
    FRENCH( "fr" ),
    GERMAN( "de" );

    public final String code;

    Language(String code) {
        this.code = code;
    }
}
1 A Language enum defines supported languages.
@Entity
@Indexed
public class BlogEntry {

    @Id
    private Integer id;

    @AlternativeDiscriminator (1)
    @Enumerated(EnumType.STRING)
    private Language language;

    @MultiLanguageField (2)
    private String text;

    // Getters and setters
    // ...

}
1 Mark the language property as the discriminator which will be used to determine the language.
2 Map the text property to multiple fields using a custom annotation.
@Retention(RetentionPolicy.RUNTIME) (1)
@Target({ ElementType.METHOD, ElementType.FIELD }) (2)
@PropertyMapping(processor = @PropertyMappingAnnotationProcessorRef( (3)
        type = MultiLanguageField.Processor.class
))
@Documented (4)
public @interface MultiLanguageField {

    String name() default ""; (5)

    class Processor implements PropertyMappingAnnotationProcessor<MultiLanguageField> { (6)
        @Override
        public void process(PropertyMappingStep mapping, MultiLanguageField annotation,
                PropertyMappingAnnotationProcessorContext context) {
            LanguageAlternativeBinderDelegate delegate = new LanguageAlternativeBinderDelegate( (7)
                    annotation.name().isEmpty() ? null : annotation.name()
            );
            mapping.hostingType() (8)
                    .binder( AlternativeBinder.create( (9)
                            Language.class, (10)
                            context.annotatedElement().name(), (11)
                            String.class, (12)
                            BeanReference.ofInstance( delegate ) (13)
                    ) );
        }
    }
}
1 Define an annotation with retention RUNTIME. Any other retention policy will cause the annotation to be ignored by Hibernate Search.
2 Allow the annotation to target either methods (getters) or fields.
3 Mark this annotation as a property mapping, and instruct Hibernate Search to apply the given processor whenever it finds this annotation. It is also possible to reference the processor by its name, in the case of a CDI/Spring bean.
4 Optionally, mark the annotation as documented, so that it is included in the javadoc of your entities.
5 Optionally, define parameters. Here we allow customizing the field name (which will default to the property name, see further down).
6 The processor must implement the PropertyMappingAnnotationProcessor interface, setting its generic type argument to the type of the corresponding annotation. Here the processor class is nested in the annotation class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
7 In the annotation processor, instantiate a custom binder delegate (see below for the implementation).
8 Access the mapping of the type hosting the property (in this example, BlogEntry).
9 Apply the AlternativeBinder to the type hosting the property (in this example, BlogEntry).
10 Pass to AlternativeBinder the expected type of discriminator values.
11 Pass to AlternativeBinder the name of the property from which field values should be extracted (in this example, text).
12 Pass to AlternativeBinder the expected type of the property from which index field values are extracted.
13 Pass to AlternativeBinder the binder delegate.
public class LanguageAlternativeBinderDelegate implements AlternativeBinderDelegate<Language, String> { (1)

    private final String name;

    public LanguageAlternativeBinderDelegate(String name) { (2)
        this.name = name;
    }

    @Override
    public AlternativeValueBridge<Language, String> bind(IndexSchemaElement indexSchemaElement, (3)
            PojoModelProperty fieldValueSource) {
        EnumMap<Language, IndexFieldReference<String>> fields = new EnumMap<>( Language.class );
        String fieldNamePrefix = ( name != null ? name : fieldValueSource.name() ) + "_";

        for ( Language language : Language.values() ) { (4)
            String languageCode = language.code;
            IndexFieldReference<String> field = indexSchemaElement.field(
                    fieldNamePrefix + languageCode, (5)
                    f -> f.asString().analyzer( "text_" + languageCode ) (6)
            )
                    .toReference();
            fields.put( language, field );
        }

        return new Bridge( fields ); (7)
    }

    private static class Bridge implements AlternativeValueBridge<Language, String> { (8)
        private final EnumMap<Language, IndexFieldReference<String>> fields;

        private Bridge(EnumMap<Language, IndexFieldReference<String>> fields) {
            this.fields = fields;
        }

        @Override
        public void write(DocumentElement target, Language discriminator, String bridgedElement) {
            target.addValue( fields.get( discriminator ), bridgedElement ); (9)
        }
    }
}
1 The binder delegate must implement AlternativeBinderDelegate. The first type parameter is the expected type of discriminator values (in this example, Language); the second type parameter is the expected type of the property from which field values are extracted (in this example, String).
2 Any (custom) parameter can be passed through the constructor.
3 Implement bind, to bind a property to index fields.
4 Define one field per language.
5 Make sure to give a different name to each field. Here we’re using the language code as a suffix, i.e. text_en, text_fr, text_de, …​
6 Assign a different analyzer to each field. The analyzers text_en, text_fr, text_de must have been defined in the backend; see Analysis.
7 Return a bridge.
8 The bridge must implement the AlternativeValueBridge interface. Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
9 The bridge is called when indexing; it selects the field to write to based on the discriminator value, then writes the value to index to that field.

5.9.2. Programmatic mapping

You can apply AlternativeBinder through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 47. Applying an AlternativeBinder with .binder(…​)
TypeMappingStep blogEntryMapping = mapping.type( BlogEntry.class );
blogEntryMapping.indexed();
blogEntryMapping.property( "language" )
        .marker( AlternativeBinder.alternativeDiscriminator() );
LanguageAlternativeBinderDelegate delegate = new LanguageAlternativeBinderDelegate( null );
blogEntryMapping.binder( AlternativeBinder.create( Language.class,
        "text", String.class, BeanReference.ofInstance( delegate ) ) );

5.10. Tuning automatic reindexing

5.10.1. Basics

When an entity property is mapped to the index, be it through @GenericField, @IndexedEmbedded, or a custom bridge, this mapping introduces a dependency: the document will need to be updated when the property changes.

For simpler, single-entity mappings, this only means that Hibernate Search will need to detect when an entity changes and reindex the entity. This will be handled transparently.

If the mapping includes a "derived" property, i.e. a property that is not persisted directly, but instead is dynamically computed in a getter that uses other properties as input, Hibernate Search will be unable to guess which part of the persistent state these properties are based on. In this case, some explicit configuration will be required; see Reindexing when a derived value changes with @IndexingDependency for more information.

When the mapping crosses the entity boundaries, things get more complicated. Let’s consider a mapping where a Book entity is mapped to a document, and that document must include the name property of the Author entity (for example using @IndexedEmbedded). Hibernate Search will need to track changes to the author’s name, and whenever that happens, it will need to retrieve all the books of that author, to reindex these books automatically.

In practice, this means that whenever an entity mapping relies on an association to another entity, this association must be bidirectional: if Book.authors is @IndexedEmbedded, Hibernate Search must be aware of an inverse association Author.books. An exception will be thrown on startup if the inverse association cannot be resolved.

Most of the time, Hibernate Search is able to take advantage of Hibernate ORM metadata (the mappedBy attribute of @OneToOne and @OneToMany) to resolve the inverse side of an association, so this is all handled transparently.

In some rare cases, with more complex mappings, it is possible that even Hibernate ORM is not aware that an association is bidirectional, because mappedBy cannot be used. A few solutions exist; they are described in the following sections.

5.10.2. Enriching the entity model with @AssociationInverseSide

Given an association from an entity type A to entity type B, @AssociationInverseSide defines the inverse side of an association, i.e. the path from B to A.

This is mostly useful when a bidirectional association is not mapped as such in Hibernate ORM (no mappedBy).

Example 48. Mapping the inverse side of an association with @AssociationInverseSide
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    @ElementCollection (1)
    @JoinTable(
            name = "book_editionbyprice",
            joinColumns = @JoinColumn(name = "book_id")
    )
    @MapKeyJoinColumn(name = "edition_id")
    @Column(name = "price")
    @OrderBy("edition_id asc")
    @IndexedEmbedded( (2)
            name = "editionsForSale",
            extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY)
    )
    @AssociationInverseSide( (3)
            extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY),
            inversePath = @ObjectPath( @PropertyValue( propertyName = "book" ) )
    )
    private Map<BookEdition, BigDecimal> priceByEdition = new LinkedHashMap<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
@Entity
public class BookEdition {

    @Id
    @GeneratedValue
    private Integer id;

    @ManyToOne (4)
    private Book book;

    @FullTextField(analyzer = "english")
    private String label;

    public BookEdition() {
    }

    // Getters and setters
    // ...

}
1 This annotation and the following ones are the Hibernate ORM mapping for a Map<BookEdition, BigDecimal> where the keys are BookEdition entities and the values are the price of that edition.
2 Index-embed the editions that are actually for sale.
3 In Hibernate ORM, it is not possible to use mappedBy for an association modeled by a Map key. Thus, we use @AssociationInverseSide to tell Hibernate Search what the inverse side of this association is.
4 We could have applied the @AssociationInverseSide annotation here instead: either side will do.

5.10.3. Reindexing when a derived value changes with @IndexingDependency

When a property is not persisted directly, but instead is dynamically computed in a getter that uses other properties as input, Hibernate Search will be unable to guess which part of the persistent state these properties are based on, and thus will be unable to trigger automatic reindexing when the relevant persistent state changes. By default, Hibernate Search will detect such cases on bootstrap and throw an exception.

Annotating the property with @IndexingDependency(derivedFrom = …​) will give Hibernate Search the information it needs and allow automatic reindexing.

Example 49. Mapping a derived value with @IndexingDependency.derivedFrom
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    @ElementCollection
    private List<String> authors = new ArrayList<>(); (1)

    public Book() {
    }

    // Getters and setters
    // ...

    @Transient (2)
    @FullTextField(analyzer = "name") (3)
    @IndexingDependency(derivedFrom = @ObjectPath( (4)
            @PropertyValue(propertyName = "authors")
    ))
    public String getMainAuthor() {
        return authors.isEmpty() ? null : authors.get( 0 );
    }
}
1 Authors are modeled as a list of string containing the author names.
2 The transient mainAuthor property dynamically returns the main author (the first one).
3 We use @FullTextField on the getMainAuthor() getter to index the name of the main author.
4 We use @IndexingDependency.derivedFrom to tell Hibernate Search that whenever the list of authors changes, the result of getMainAuthor() may have changed.

5.10.4. Limiting automatic reindexing with @IndexingDependency

In some cases, fully automatic reindexing is not realistically achievable:

  • When an association is massive, for example a single entity instance is indexed-embedded in thousands of other entities.

  • When a property mapped to the index is updated very frequently, leading to very frequent reindexing and unacceptable usage of disks or the database.

  • Etc.

When that happens, it is possible to tell Hibernate Search to ignore updates to a particular property (and, in the case of @IndexedEmbedded, anything beyond that property).

Several options are available to control exactly how updates to a given property affect reindexing. See the sections below for an explanation of each option.

ReindexOnUpdate.SHALLOW: limiting automatic reindexing to same-entity updates only

ReindexOnUpdate.SHALLOW is most useful when an association is highly asymmetric and therefore unidirectional. Think associations to "reference" data such as categories, types, cities, countries, …​

It essentially tells Hibernate Search that changing an association — adding or removing associated elements, i.e. "shallow" updates — should trigger automatic reindexing, but changing properties of associated entities — "deep" updates — should not.

For example, let’s consider the (incorrect) mapping below:

Example 50. A highly-asymmetric, unidirectional association
@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    private String title;

    @ManyToOne (1)
    @IndexedEmbedded (2)
    private BookCategory category;

    public Book() {
    }

    // Getters and setters
    // ...

}
@Entity
public class BookCategory {

    @Id
    private Integer id;

    @FullTextField(analyzer = "english")
    private String name;

    (3)

    // Getters and setters
    // ...

}
1 Each book has an association to a BookCategory entity.
2 We want to index-embed the BookCategory into the Book …​
3 …​ but we really don’t want to model the (huge) inverse association from BookCategory to Book: There are potentially thousands of books for each category, so calling a getBooks() method would lead to loading thousands of entities into the Hibernate ORM session at once, and would perform badly. Thus, there isn’t any getBooks() method to list all books in a category.

With this mapping, Hibernate Search will not be able to reindex all books when the category name changes: the getter that would list all books for that category simply doesn’t exist. Since Hibernate Search tries to be safe by default, it will reject this mapping and throw an exception at bootstrap, saying it needs an inverse side to the Book → BookCategory association.

However, in this case, we don’t expect the name of a BookCategory to change. That’s really "reference" data, which changes so rarely that we can conceivably plan ahead for such a change and reindex all books whenever it happens. So we would really not mind if Hibernate Search just ignored changes to BookCategory…​

That’s what @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW) is for: it tells Hibernate Search to ignore the impact of updates to an associated entity. See the modified mapping below:

Example 51. Limiting automatic reindexing to same-entity updates with ReindexOnUpdate.SHALLOW
@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    private String title;

    @ManyToOne
    @IndexedEmbedded
    @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW) (1)
    private BookCategory category;

    public Book() {
    }

    // Getters and setters
    // ...

}
1 We use ReindexOnUpdate.SHALLOW to tell Hibernate Search that Book should be re-indexed automatically when it’s assigned a new category (book.setCategory( newCategory )), but not when properties of its category change (category.setName( newName )).

Hibernate Search will accept the mapping above and boot successfully, since the inverse side of the association from Book to BookCategory is no longer deemed necessary.

Only shallow changes to a book’s category will trigger automatic reindexing:

  • When a book is assigned a new category (book.setCategory( newCategory )), Hibernate Search will consider it a "shallow" change, since it only affects the Book entity. Thus, Hibernate Search will reindex the book automatically.

  • When a category itself changes (category.setName( newName )), Hibernate Search will consider it a "deep" change, since it occurs beyond the boundaries of the Book entity. Thus, Hibernate Search will not reindex books of that category automatically. The index will become slightly out-of-sync, but this can be solved by reindexing Book entities, for example every night.

ReindexOnUpdate.NO: disabling automatic reindexing for updates of a particular property

ReindexOnUpdate.NO is most useful for properties that change very frequently and don’t need to be up-to-date in the index.

It essentially tells Hibernate Search that changes to that property should not trigger automatic reindexing.

For example, let’s consider the mapping below:

Example 52. A frequently-changing property
@Entity
@Indexed
public class Sensor {

    @Id
    private Integer id;

    @FullTextField
    private String name; (1)

    @KeywordField
    private SensorStatus status; (1)

    @Column(name = "\"value\"")
    private double value; (2)

    @GenericField
    private double rollingAverage; (3)

    public Sensor() {
    }

    // Getters and setters
    // ...

}
1 The sensor name and status get updated very rarely.
2 The sensor value gets updated every few milliseconds.
3 When the sensor value gets updated, we also update the rolling average over the last few seconds (based on data not shown here).

Updates to the name and status, which are rarely updated, can perfectly well trigger automatic reindexing. But considering there are thousands of sensors, updates to the sensor value cannot reasonably trigger automatic reindexing: reindexing thousands of sensors every few milliseconds probably won’t perform well.

In this scenario, however, search on sensor value is not considered critical and indexes don’t need to be as fresh. We can accept indexes to lag behind a few minutes when it comes to sensor value. We can consider setting up a batch process that runs every few seconds to reindex all sensors, either through a mass indexer or other means. So we would really not mind if Hibernate Search just ignored changes to sensor values…​

That’s what @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.NO) is for: it tells Hibernate Search to ignore the impact of updates to the rollingAverage property. See the modified mapping below:

Example 53. Disabling automatic reindexing for a particular property with ReindexOnUpdate.NO
@Entity
@Indexed
public class Sensor {

    @Id
    private Integer id;

    @FullTextField
    private String name;

    @KeywordField
    private SensorStatus status;

    @Column(name = "\"value\"")
    private double value;

    @GenericField
    @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.NO) (1)
    private double rollingAverage;

    public Sensor() {
    }

    // Getters and setters
    // ...

}
1 We use ReindexOnUpdate.NO to tell Hibernate Search that updates to rollingAverage should not trigger automatic reindexing.

With this mapping:

  • When a sensor is assigned a new name (sensor.setName( newName )) or status (sensor.setStatus( newStatus )), Hibernate Search will reindex the sensor automatically.

  • When a sensor is assigned a new rolling average (sensor.setRollingAverage( newAverage )), Hibernate Search will not reindex the sensor automatically.

5.10.5. Programmatic mapping

You can control reindexing through the programmatic mapping too. Behavior and options are identical to annotation-based mapping.

Example 54. Mapping the inverse side of an association with .associationInverseSide(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "priceByEdition" )
        .indexedEmbedded( "editionsForSale" )
                .extractor( BuiltinContainerExtractors.MAP_KEY )
        .associationInverseSide( PojoModelPath.parse( "book" ) )
                .extractor( BuiltinContainerExtractors.MAP_KEY );
TypeMappingStep bookEditionMapping = mapping.type( BookEdition.class );
bookEditionMapping.property( "label" )
        .fullTextField().analyzer( "english" );
Example 55. Mapping a derived value with .indexingDependency().derivedFrom(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "mainAuthor" )
        .fullTextField().analyzer( "name" )
        .indexingDependency().derivedFrom( PojoModelPath.parse( "authors" ) );
Example 56. Limiting automatic reindexing with .indexingDependency().reindexOnUpdate(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "category" )
        .indexedEmbedded()
        .indexingDependency().reindexOnUpdate( ReindexOnUpdate.SHALLOW );
TypeMappingStep bookCategoryMapping = mapping.type( BookCategory.class );
bookCategoryMapping.property( "name" )
        .fullTextField().analyzer( "english" );

5.11. Changing the mapping of an existing application

Over the lifetime of an application, the mapping of a particular indexed entity type will occasionally have to change. When this happens, the mapping changes are likely to require changes to the structure of the index, i.e. its schema. Hibernate Search does not handle this structure change automatically, so manual intervention is required.

The simplest solution when the index structure needs to change is to:

  1. Drop and re-create the index and its schema, either manually by deleting the filesystem directory for Lucene or using the REST API to delete the index for Elasticsearch, or using Hibernate Search’s schema management features.

  2. Re-populate the index, for example using the mass indexer.
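
For illustration, a minimal sketch of step 2 using the mass indexer, assuming a SearchSession obtained from an open EntityManager as in the other examples of this guide:

SearchSession searchSession = Search.session( entityManager ); // obtain a search session
searchSession.massIndexer() // target all indexed entity types
        .startAndWait(); // block until reindexing is complete; throws InterruptedException if interrupted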

Technically, dropping the index and reindexing is not strictly required if the mapping changes include only:

  • adding new indexed entities that will not have any persisted instance, e.g. adding an @Indexed annotation on an entity which has no rows in the database;

  • adding new fields that will be empty for all currently persisted entities, e.g. adding a new property on an entity type and mapping it to a field, but with the guarantee that this property will initially be null for every instance of this entity;

  • and/or removing data from existing indexes/fields, e.g. removing an index field, or removing the need for a field to be stored.

However, you will still need to:

  • create missing indexes: this can generally be done automatically by starting up the application with the create, create-or-validate, or create-or-update schema management strategy.

  • (Elasticsearch only:) update the schema of existing indexes to declare the new fields. This will be more complex: either do it manually using Elasticsearch’s REST API, or start up the application with the create-or-update strategy, but be warned that it may fail.
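
For illustration, a minimal sketch of setting the schema management strategy mentioned above when bootstrapping JPA programmatically; the persistence unit name is hypothetical, and the same property can be set in persistence.xml or hibernate.properties instead:

Map<String, Object> properties = new HashMap<>();
// Create missing indexes and attempt to update existing schemas on startup
properties.put( "hibernate.search.schema_management.strategy", "create-or-update" );
EntityManagerFactory entityManagerFactory =
        Persistence.createEntityManagerFactory( "my-persistence-unit", properties );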

5.12. Custom mapping annotations

By default, Hibernate Search only recognizes built-in mapping annotations such as @Indexed, @GenericField or @IndexedEmbedded.

To use custom annotations in a Hibernate Search mapping, two steps are required:

  1. Implementing a processor for that annotation: TypeMappingAnnotationProcessor for type annotations or PropertyMappingAnnotationProcessor for method/field annotations.

  2. Annotating the custom annotation with either @TypeMapping or @PropertyMapping, passing as an argument the reference to the annotation processor.

Once this is done, Hibernate Search will be able to detect custom annotations in indexed classes. Whenever a custom annotation is encountered, Hibernate Search will instantiate the annotation processor and call its process method, passing the following as arguments:

  • A mapping parameter allowing you to define the mapping for the type or property using the programmatic mapping API.

  • An annotation parameter representing the annotation instance.

  • A context object with various helpers.

Custom annotations are most frequently used to apply custom, parameterized bridges; see the sections dedicated to bridges for examples.

It is completely possible to use custom annotations for parameter-less bridges, or even for more complex features such as indexed-embedded: every feature available in the programmatic API can be triggered by a custom annotation.
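
As an additional illustration, here is a minimal sketch of a custom type annotation; the annotation name and its effect (simply marking the type as indexed) are hypothetical, and the property-level equivalent is shown in the AlternativeBinder example above:

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@TypeMapping(processor = @TypeMappingAnnotationProcessorRef(type = MySearchEntity.Processor.class))
public @interface MySearchEntity {

    class Processor implements TypeMappingAnnotationProcessor<MySearchEntity> {
        @Override
        public void process(TypeMappingStep mapping, MySearchEntity annotation,
                TypeMappingAnnotationProcessorContext context) {
            mapping.indexed(); // delegate to the programmatic mapping API: mark the annotated type as indexed
        }
    }
}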

5.13. Inspecting the mapping

After Hibernate Search has successfully booted, the SearchMapping can be used to get a list of indexed entities and get more direct access to the corresponding indexes, as shown in the example below.

Example 57. Accessing indexed entities
SearchMapping mapping = Search.mapping( entityManagerFactory ); (1)
SearchIndexedEntity<Book> bookEntity = mapping.indexedEntity( Book.class ); (2)
String jpaName = bookEntity.jpaName(); (3)
IndexManager indexManager = bookEntity.indexManager(); (4)
Backend backend = indexManager.backend(); (5)

SearchIndexedEntity<?> bookEntity2 = mapping.indexedEntity( "Book" ); (6)
Class<?> javaClass = bookEntity2.javaClass();

for ( SearchIndexedEntity<?> entity : mapping.allIndexedEntities() ) { (7)
    // ...
}
1 Retrieve the SearchMapping.
2 Retrieve the SearchIndexedEntity by its entity class. SearchIndexedEntity gives access to information pertaining to that entity and its index.
3 Get the JPA name of that entity.
4 Get the index manager for that entity.
5 Get the backend for that index manager.
6 Retrieve the SearchIndexedEntity by its entity name.
7 Retrieve all indexed entities.

From an IndexManager, you can then access the index metamodel, to inspect available fields and their main characteristics, as shown below.

Example 58. Accessing the index metamodel
SearchIndexedEntity<Book> bookEntity = mapping.indexedEntity( Book.class ); (1)
IndexManager indexManager = bookEntity.indexManager(); (2)
IndexDescriptor indexDescriptor = indexManager.descriptor(); (3)

indexDescriptor.field( "releaseDate" ).ifPresent( field -> { (4)
    String path = field.absolutePath(); (5)
    String relativeName = field.relativeName();
    // Etc.

    if ( field.isValueField() ) { (6)
        IndexValueFieldDescriptor valueField = field.toValueField(); (7)

        IndexValueFieldTypeDescriptor type = valueField.type(); (8)
        boolean projectable = type.projectable();
        Class<?> dslArgumentClass = type.dslArgumentClass();
        Class<?> projectedValueClass = type.projectedValueClass();
        Optional<String> analyzerName = type.analyzerName();
        Optional<String> searchAnalyzerName = type.searchAnalyzerName();
        Optional<String> normalizerName = type.normalizerName();
        // Etc.
    }
    else if ( field.isObjectField() ) { (9)
        IndexObjectFieldDescriptor objectField = field.toObjectField();

        IndexObjectFieldTypeDescriptor type = objectField.type();
        boolean nested = type.nested();
        // Etc.
    }
} );
1 Retrieve a SearchIndexedEntity.
2 Get the index manager for that entity. IndexManager gives access to information pertaining to the index. This includes the metamodel, but not only (see below).
3 Get the descriptor for that index. The descriptor exposes the index metamodel.
4 Retrieve a field by name. The method returns an Optional, which is empty if the field does not exist.
5 The field descriptor exposes information about the field structure: path, name, parent, …​
6 Check that the field is a value field, holding a value (integer, text, …​), as opposed to object fields, holding other fields.
7 Narrow down the field descriptor to a value field descriptor.
8 Get the descriptor for the field type. The type descriptor exposes information about the field’s capabilities: is it searchable, sortable, projectable, what is the expected Java class for arguments to the Search DSL, what are the analyzers/normalizer set on this field, …​
9 Object fields can also be inspected.

The Backend and IndexManager can also be used to retrieve the Elasticsearch REST client or retrieve Lucene analyzers.

The SearchMapping also exposes methods to retrieve an IndexManager by name, or even a whole Backend by name.
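
For example, with the Elasticsearch backend, the Backend can be unwrapped to access the low-level REST client. The following is only a minimal sketch; it assumes the default index name for the Book entity ("Book") and that the Elasticsearch backend is actually in use:

SearchMapping mapping = Search.mapping( entityManagerFactory );
IndexManager bookIndexManager = mapping.indexManager( "Book" ); // retrieve an index manager by index name
Backend backend = mapping.backend(); // retrieve the default backend
ElasticsearchBackend elasticsearchBackend = backend.unwrap( ElasticsearchBackend.class ); // Elasticsearch backend only
RestClient restClient = elasticsearchBackend.client( RestClient.class ); // the low-level Elasticsearch REST client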

6. Bridges

6.1. Basics

In Hibernate Search, bridges are the components responsible for converting pieces of data from the entity model to the document model.

For example, when @GenericField is applied to a property of a custom enum type, a built-in bridge will be used to convert this enum to a string when indexing, and to convert the string back to an enum when projecting.

Similarly, when an entity identifier of type Long is mapped to a document identifier, a built-in bridge will be used to convert the Long to a String (since all document identifiers are strings) when indexing, and back from a String to a Long when loading search results.

Bridges are not limited to one-to-one mapping: for example, the @GeoPointBinding annotation, which maps two properties annotated with @Latitude and @Longitude to a single field, is backed by another built-in bridge.

While built-in bridges are provided for a wide range of standard types, they may not be enough for complex models. This is where custom bridges become useful: it is possible to implement your own bridges and to refer to them in the Hibernate Search mapping. Using custom bridges, custom types can be mapped, even complex types that require user code to execute at indexing time.

There are multiple types of bridges, detailed in the next sections. If you need to implement a custom bridge, but don’t quite know which type of bridge you need, the following table may help:

Table 5. Comparison of available bridge types

  • ValueBridge: applied to a class field or getter; maps to one index field (value field only: integer, text, geopoint, etc.; no object/composite field); built-in annotations: @GenericField, @FullTextField, …​; supports container extractors; does not support mutable types.

  • PropertyBridge: applied to a class field or getter; maps to one or more index fields (value fields as well as object/composite fields); built-in annotation: @PropertyBinding; does not support container extractors; supports mutable types.

  • TypeBridge: applied to a class; maps to one or more index fields (value fields as well as object/composite fields); built-in annotation: @TypeBinding; does not support container extractors; supports mutable types.

  • IdentifierBridge: applied to a class field or getter (usually the entity ID); maps to the document identifier; built-in annotation: @DocumentId; does not support container extractors; does not support mutable types.

  • RoutingBridge: applied to a class; maps to the route (conditional indexing, routing key); built-in annotation: @Indexed(routingBinder = …​); does not support container extractors; supports mutable types.

6.2. Value bridge

6.2.1. Basics

A value bridge is a pluggable component that implements the mapping of a property to an index field. It is applied to a property with a @*Field annotation (@GenericField, @FullTextField, …​) or with a custom annotation.

A value bridge is relatively straightforward to implement: in its simplest form, it boils down to converting a value from the property type to the index field type. Thanks to the integration with the @*Field annotations, several features come for free:

  • The type of the index field can be customized directly in the @*Field annotation: it can be defined as sortable, projectable, it can be assigned an analyzer, …​

  • The bridge can be transparently applied to elements of a container. For example, you can implement a ValueBridge<ISBN, String> and transparently use it on a property of type List<ISBN>: the bridge will simply be applied once per list element and populate the index field with as many values as there are elements.

However, due to these features, several limitations are imposed on a value bridge that are not present in, for example, a property bridge:

  • A value bridge only allows one-to-one mapping: one property to one index field. A single value bridge cannot populate more than one index field.

  • A value bridge will not work correctly when applied to a mutable type. A value bridge is expected to be applied to "atomic" data, such as a LocalDate; if it is applied to an entity, for example, extracting data from its properties, Hibernate Search will not be aware of which properties are used and will not be able to automatically trigger reindexing when these properties change.

Below is an example of a custom value bridge that converts a custom ISBN type to its string representation to index it:

Example 59. Implementing and using a ValueBridge
public class ISBNValueBridge implements ValueBridge<ISBN, String> { (1)

    @Override
    public String toIndexedValue(ISBN value, ValueBridgeToIndexedValueContext context) { (2)
        return value == null ? null : value.getStringValue();
    }

}
1 The bridge must implement the ValueBridge interface. Two generic type arguments must be provided: the first one is the type of property values (values in the entity model), and the second one is the type of index fields (values in the document model).
2 The toIndexedValue method is the only one that must be implemented: all other methods are optional. It takes the property value and a context object as parameters, and is expected to return the corresponding index field value. It is called when indexing, but also when parameters to the search DSL must be transformed.
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @Convert(converter = ISBNAttributeConverter.class) (1)
    @KeywordField( (2)
            valueBridge = @ValueBridgeRef(type = ISBNValueBridge.class), (3)
            normalizer = "isbn" (4)
    )
    private ISBN isbn;

    // Getters and setters
    // ...

}
1 This is unrelated to the value bridge, but necessary in order for Hibernate ORM to store the data correctly in the database.
2 Map the property to an index field.
3 Instruct Hibernate Search to use our custom value bridge. It is also possible to reference the bridge by its name, in the case of a CDI/Spring bean.
4 Customize the field as usual.

Here is an example of what an indexed document would look like, with the Elasticsearch backend:

{
  "isbn": "978-0-58-600835-5"
}

The example above is just a minimal implementation; a custom value bridge can do more. See the next sections for more information.

6.2.2. Type resolution

By default, the value bridge’s property type and index field type are determined automatically, using reflection to extract the generic type arguments of the ValueBridge interface: the first argument is the property type while the second argument is the index field type.

For example, in public class MyBridge implements ValueBridge<ISBN, String>, the property type is resolved to ISBN and the index field type is resolved to String: the bridge will be applied to properties of type ISBN and will populate an index field of type String.

The fact that types are resolved automatically using reflection brings a few limitations. In particular, it means the generic type arguments cannot be just anything; as a general rule, you should stick to literal types (MyBridge implements ValueBridge<ISBN, String>) and avoid generic type parameters and wildcards (MyBridge<T> implements ValueBridge<List<T>, T>).

If you need more complex types, you can bypass the automatic resolution and specify types explicitly using a ValueBinder.

6.2.3. Using value bridges in other @*Field annotations

In order to use a custom value bridge with specialized annotations such as @FullTextField, the bridge must declare a compatible index field type.

For example:

  • @FullTextField and @KeywordField require an index field type of type String (ValueBridge<Whatever, String>);

  • @ScaledNumberField requires an index field type of type BigDecimal (ValueBridge<Whatever, BigDecimal>) or BigInteger (ValueBridge<Whatever, BigInteger>).

Refer to Available field annotations for the specific constraints of each annotation.

Attempts to use a bridge that declares an incompatible type will trigger exceptions at bootstrap.

6.2.4. Supporting projections with fromIndexedValue()

By default, any attempt to project on a field using a custom bridge will result in an exception, because Hibernate Search doesn’t know how to convert the projected values obtained from the index back to the property type.

It is possible to disable conversion explicitly to get the raw value from the index, but another way of solving the problem is to simply implement fromIndexedValue in the custom bridge. This method will be called whenever a projected value needs to be converted.

Example 60. Implementing fromIndexedValue to convert projected values
public class ISBNValueBridge implements ValueBridge<ISBN, String> {

    @Override
    public String toIndexedValue(ISBN value, ValueBridgeToIndexedValueContext context) {
        return value == null ? null : value.getStringValue();
    }

    @Override
    public ISBN fromIndexedValue(String value, ValueBridgeFromIndexedValueContext context) { (1)
        return value == null ? null : ISBN.parse( value );
    }
}
1 Implement fromIndexedValue as necessary.
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @Convert(converter = ISBNAttributeConverter.class) (1)
    @KeywordField( (2)
            valueBridge = @ValueBridgeRef(type = ISBNValueBridge.class), (3)
            normalizer = "isbn",
            projectable = Projectable.YES (4)
    )
    private ISBN isbn;

    // Getters and setters
    // ...

}
1 This is unrelated to the value bridge, but necessary in order for Hibernate ORM to store the data correctly in the database.
2 Map the property to an index field.
3 Instruct Hibernate Search to use our custom value bridge.
4 Do not forget to configure the field as projectable.
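
With fromIndexedValue implemented and the field configured as projectable, projections on the isbn field return ISBN instances directly. The following is only a minimal sketch, assuming a SearchSession obtained with Search.session( entityManager ):

List<ISBN> isbns = searchSession.search( Book.class )
        .select( f -> f.field( "isbn", ISBN.class ) ) // projected values are converted back through fromIndexedValue
        .where( f -> f.matchAll() )
        .fetchHits( 20 );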

6.2.5. Supporting indexNullAs with parse()

By default, the indexNullAs attribute of @*Field annotations cannot be used together with a custom bridge.

In order to make it work, the bridge needs to implement the parse method so that Hibernate Search can convert the string assigned to indexNullAs to a value of the correct type for the index field.

Example 61. Implementing parse to support indexNullAs
public class ISBNValueBridge implements ValueBridge<ISBN, String> {

    @Override
    public String toIndexedValue(ISBN value, ValueBridgeToIndexedValueContext context) {
        return value == null ? null : value.getStringValue();
    }

    @Override
    public String parse(String value) {
        // Just check the string format and return the string
        return ISBN.parse( value ).getStringValue(); (1)
    }
}
1 Implement parse as necessary. The bridge may throw exceptions for invalid strings.
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @Convert(converter = ISBNAttributeConverter.class) (1)
    @KeywordField( (2)
            valueBridge = @ValueBridgeRef(type = ISBNValueBridge.class), (3)
            normalizer = "isbn",
            indexNullAs = "000-0-00-000000-0" (4)
    )
    private ISBN isbn;

    // Getters and setters
    // ...

}
1 This is unrelated to the value bridge, but necessary in order for Hibernate ORM to store the data correctly in the database.
2 Map the property to an index field.
3 Instruct Hibernate Search to use our custom value bridge.
4 Set indexNullAs to a valid value.

6.2.6. Compatibility across indexes with isCompatibleWith()

A value bridge is involved in indexing, but also in the various search DSLs, to convert values passed to the DSL to an index field value that the backend will understand.

When creating a predicate targeting a single field across multiple indexes, Hibernate Search will have multiple bridges to choose from: one per index. Since only one predicate with a single value can be created, Hibernate Search needs to pick a single bridge. By default, when a custom bridge is assigned to the field, Hibernate Search will throw an exception because it cannot decide which bridge to pick.

If the bridges assigned to the field in all indexes produce the same result, it is possible to indicate to Hibernate Search that any bridge will do by implementing isCompatibleWith.

This method accepts another bridge as a parameter, and returns true if that bridge can be expected to always behave the same as this one.

Example 62. Implementing isCompatibleWith to support multi-index search
public class ISBNValueBridge implements ValueBridge<ISBN, String> {

    @Override
    public String toIndexedValue(ISBN value, ValueBridgeToIndexedValueContext context) {
        return value == null ? null : value.getStringValue();
    }

    @Override
    public boolean isCompatibleWith(ValueBridge<?, ?> other) { (1)
        return getClass().equals( other.getClass() );
    }
}
1 Implement isCompatibleWith as necessary. Here we just deem any instance of the same class to be compatible.
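
With isCompatibleWith implemented, a predicate targeting the isbn field across several indexes can be created. The following is only a minimal sketch, assuming a SearchSession and a second indexed entity type, called Magazine here purely for illustration, whose isbn property is mapped with the same bridge:

List<?> hits = searchSession.search( Arrays.asList( Book.class, Magazine.class ) ) // target both indexes
        .where( f -> f.match().field( "isbn" )
                .matching( ISBN.parse( "978-0-58-600835-5" ) ) ) // the value is converted through the (compatible) bridges
        .fetchHits( 20 );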

6.2.7. Configuring the bridge more finely with ValueBinder

To configure a bridge more finely, it is possible to implement a value binder that will be executed at bootstrap. In particular, this binder will be able to define a custom index field type.

Example 63. Implementing a ValueBinder
public class ISBNValueBinder implements ValueBinder { (1)
    @Override
    public void bind(ValueBindingContext<?> context) { (2)
        context.bridge( (3)
                ISBN.class, (4)
                new ISBNValueBridge(), (5)
                context.typeFactory() (6)
                        .asString() (7)
                        .normalizer( "isbn" ) (8)
        );
    }

    private static class ISBNValueBridge implements ValueBridge<ISBN, String> {
        @Override
        public String toIndexedValue(ISBN value, ValueBridgeToIndexedValueContext context) { (9)
            return value == null ? null : value.getStringValue();
        }
    }
}
1 The binder must implement the ValueBinder interface.
2 Implement the bind method.
3 Call context.bridge(…​) to define the value bridge to use.
4 Pass the expected type of property values.
5 Pass the value bridge instance.
6 Use the context’s type factory to create an index field type.
7 Pick a base type for the index field using an as*() method.
8 Configure the type as necessary. This configuration will set defaults that are applied for any type using this bridge, but they can be overridden. Type configuration is similar to the attributes found in the various @*Field annotations. See Defining index field types for more information.
9 The value bridge must still be implemented. Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @Convert(converter = ISBNAttributeConverter.class) (1)
    @KeywordField( (2)
            valueBinder = @ValueBinderRef(type = ISBNValueBinder.class), (3)
            sortable = Sortable.YES (4)
    )
    private ISBN isbn;

    // Getters and setters
    // ...

}
1 This is unrelated to the value bridge, but necessary in order for Hibernate ORM to store the data correctly in the database.
2 Map the property to an index field.
3 Instruct Hibernate Search to use our custom value binder. Note the use of valueBinder instead of valueBridge. It is also possible to reference the binder by its name, in the case of a CDI/Spring bean.
4 Customize the field as usual. Configuration set using annotation attributes takes precedence over the index field type configuration set by the value binder. For example, in this case, the field will be sortable even if the binder didn’t define the field as sortable.

When using a value binder with a specialized @*Field annotation, the index field type must be compatible with the annotation.

For example, @FullTextField will only work if the index field type was created using asString().

These restrictions are similar to those when assigning a value bridge directly; see Using value bridges in other @*Field annotations.

6.2.8. Passing parameters

Value bridges are usually applied using the built-in @*Field annotations, which already accept parameters to configure the field name, whether the field is sortable, etc.

However, these parameters are not passed to the value bridge or value binder. There are two ways to pass parameters to value bridges:

  • One is (mostly) limited to string parameters, but is trivial to implement.

  • The other can allow any type of parameters, but requires you to declare your own annotations.

Simple, string parameters

You can pass string parameters to the @ValueBinderRef annotation and then use them later in the binder:

Example 64. Passing parameters to a ValueBridge using the @ValueBinderRef annotation
public class BooleanAsStringBridge implements ValueBridge<Boolean, String> { (1)

    private final String trueAsString;
    private final String falseAsString;

    public BooleanAsStringBridge(String trueAsString, String falseAsString) { (2)
        this.trueAsString = trueAsString;
        this.falseAsString = falseAsString;
    }

    @Override
    public String toIndexedValue(Boolean value, ValueBridgeToIndexedValueContext context) {
        if ( value == null ) {
            return null;
        }
        return value ? trueAsString : falseAsString;
    }
}
1 Implement a bridge that does not index booleans directly, but indexes them as strings instead.
2 The bridge accepts two parameters in its constructor: the string representing true and the string representing false.
public class BooleanAsStringBinder implements ValueBinder {

    @Override
    @SuppressWarnings("unchecked")
    public void bind(ValueBindingContext<?> context) {
        String trueAsString = (String) context.param( "trueAsString" ); (1)
        String falseAsString = (String) context.param( "falseAsString" );

        context.bridge( Boolean.class, (2)
                new BooleanAsStringBridge( trueAsString, falseAsString ) );
    }
}
1 Use the binding context to get the parameter values. The param method assumes that the parameter has been defined and will throw an exception otherwise. Alternatively, it is possible to use paramOptional to get a java.util.Optional of the parameter.
2 Pass them as arguments to the bridge constructor.
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    @GenericField(valueBinder = @ValueBinderRef(type = BooleanAsStringBinder.class, (1)
            params = {
                    @Param(name = "trueAsString", value = "yes"),
                    @Param(name = "falseAsString", value = "no")
            }))
    private boolean published;

    @ElementCollection
    @GenericField(valueBinder = @ValueBinderRef(type = BooleanAsStringBinder.class, (2)
            params = {
                    @Param(name = "trueAsString", value = "passed"),
                    @Param(name = "falseAsString", value = "failed")
            }), name = "censorshipAssessments_allYears")
    private Map<Year, Boolean> censorshipAssessments = new HashMap<>();

    // Getters and setters
    // ...

}
1 Define the binder to use on the property, setting the trueAsString and falseAsString parameters.
2 Because we use a value bridge, the annotation can be transparently applied to containers. Here, the bridge will be applied successively to each value in the map.
Parameters with custom annotations

You can pass parameters of any type to the bridge by defining a custom annotation with attributes:

Example 65. Passing parameters to a ValueBridge using a custom annotation
public class BooleanAsStringBridge implements ValueBridge<Boolean, String> { (1)

    private final String trueAsString;
    private final String falseAsString;

    public BooleanAsStringBridge(String trueAsString, String falseAsString) { (2)
        this.trueAsString = trueAsString;
        this.falseAsString = falseAsString;
    }

    @Override
    public String toIndexedValue(Boolean value, ValueBridgeToIndexedValueContext context) {
        if ( value == null ) {
            return null;
        }
        return value ? trueAsString : falseAsString;
    }
}
1 Implement a bridge that does not index booleans directly, but indexes them as strings instead.
2 The bridge accepts two parameters in its constructor: the string representing true and the string representing false.
@Retention(RetentionPolicy.RUNTIME) (1)
@Target({ ElementType.METHOD, ElementType.FIELD }) (2)
@PropertyMapping(processor = @PropertyMappingAnnotationProcessorRef( (3)
        type = BooleanAsStringField.Processor.class
))
@Documented (4)
@Repeatable(BooleanAsStringField.List.class) (5)
public @interface BooleanAsStringField {

    String trueAsString() default "true"; (6)

    String falseAsString() default "false";

    String name() default ""; (7)

    ContainerExtraction extraction() default @ContainerExtraction(); (7)

    @Documented
    @Target({ ElementType.METHOD, ElementType.FIELD })
    @Retention(RetentionPolicy.RUNTIME)
    @interface List {
        BooleanAsStringField[] value();
    }

    class Processor implements PropertyMappingAnnotationProcessor<BooleanAsStringField> { (8)
        @Override
        public void process(PropertyMappingStep mapping, BooleanAsStringField annotation,
                PropertyMappingAnnotationProcessorContext context) {
            BooleanAsStringBridge bridge = new BooleanAsStringBridge( (9)
                    annotation.trueAsString(), annotation.falseAsString()
            );
            mapping.genericField( annotation.name().isEmpty() ? null : annotation.name() ) (10)
                    .valueBridge( bridge ) (11)
                    .extractors( context.toContainerExtractorPath( annotation.extraction() ) ); (12)
        }
    }
}
1 Define an annotation with retention RUNTIME. Any other retention policy will cause the annotation to be ignored by Hibernate Search.
2 Since we’re defining a value bridge, allow the annotation to target either methods (getters) or fields.
3 Mark this annotation as a property mapping, and instruct Hibernate Search to apply the given processor whenever it finds this annotation. It is also possible to reference the processor by its name, in the case of a CDI/Spring bean.
4 Optionally, mark the annotation as documented, so that it is included in the javadoc of your entities.
5 Optionally, mark the annotation as repeatable, in order to be able to declare multiple fields on the same property.
6 Define custom attributes to configure the value bridge. Here we define two strings that the bridge should use to represent the boolean values true and false.
7 Since we will be using a custom annotation, and not the built-in @*Field annotation, the standard parameters that make sense for this bridge need to be declared here, too.
8 The processor must implement the PropertyMappingAnnotationProcessor interface, setting its generic type argument to the type of the corresponding annotation. Here the processor class is nested in the annotation class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
9 In the process method, instantiate the bridge and pass the annotation attributes as constructor arguments.
10 Declare the field with the configured name (if provided).
11 Assign our bridge to the field. Alternatively, we could assign a value binder instead, using the valueBinder() method.
12 Configure the remaining standard parameters. Note that the context object passed to the process method exposes utility methods to convert standard Hibernate Search annotations to something that can be passed to the mapping (here, @ContainerExtraction is converted to a container extractor path).
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    @BooleanAsStringField(trueAsString = "yes", falseAsString = "no") (1)
    private boolean published;

    @ElementCollection
    @BooleanAsStringField( (2)
            name = "censorshipAssessments_allYears",
            trueAsString = "passed", falseAsString = "failed"
    )
    private Map<Year, Boolean> censorshipAssessments = new HashMap<>();

    // Getters and setters
    // ...

}
1 Apply the bridge using its custom annotation, setting the parameters.
2 Because we use a value bridge, the annotation can be transparently applied to containers. Here, the bridge will be applied successively to each value in the map.

6.2.9. Accessing the ORM session or session factory from the bridge

Contexts passed to the bridge methods can be used to retrieve the Hibernate ORM session or session factory.

Example 66. Retrieving the ORM session or session factory from a ValueBridge
public class MyDataValueBridge implements ValueBridge<MyData, String> {

    @Override
    public String toIndexedValue(MyData value, ValueBridgeToIndexedValueContext context) {
        SessionFactory sessionFactory = context.extension( HibernateOrmExtension.get() ) (1)
                .sessionFactory(); (2)
        // ... do something with the factory ...
    }

    @Override
    public MyData fromIndexedValue(String value, ValueBridgeFromIndexedValueContext context) {
        Session session = context.extension( HibernateOrmExtension.get() ) (3)
                .session(); (4)
        // ... do something with the session ...
    }
}
1 Apply an extension to the context to access content specific to Hibernate ORM.
2 Retrieve the SessionFactory from the extended context. The Session is not available here.
3 Apply an extension to the context to access content specific to Hibernate ORM.
4 Retrieve the Session from the extended context.

6.2.10. Injecting beans into the value bridge or value binder

With compatible frameworks, Hibernate Search supports injecting beans into both the ValueBridge and the ValueBinder.

This only applies to bridges/binders instantiated by Hibernate Search itself. As a rule of thumb, if you need to call new MyBridge() at some point, the bridge won’t get auto-magically injected.

The context passed to the value binder’s bind method also exposes a beanResolver() method to access the bean resolver and instantiate beans explicitly.

See Bean injection for more details.
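
As an illustration only, below is a sketch of a binder written for a CDI environment: when the binder is referenced by type in the mapping, Hibernate Search retrieves it through the bean manager, so its dependencies are injected. The IsbnConfiguration bean and its normalizerName() method are hypothetical:

@ApplicationScoped // assumes a CDI environment; Spring beans work similarly
public class InjectedISBNValueBinder implements ValueBinder {

    @Inject
    IsbnConfiguration isbnConfiguration; // hypothetical application bean, injected by the framework

    @Override
    public void bind(ValueBindingContext<?> context) {
        // The injected bean is available here because Hibernate Search retrieved this binder
        // through the dependency injection framework, not through "new".
        context.bridge(
                ISBN.class,
                new ISBNValueBridge(),
                context.typeFactory().asString()
                        .normalizer( isbnConfiguration.normalizerName() ) // hypothetical getter
        );
    }
}

The binder must then be referenced from the mapping, for example with @KeywordField(valueBinder = @ValueBinderRef(type = InjectedISBNValueBinder.class)), so that Hibernate Search, and not application code, instantiates it.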

6.2.11. Programmatic mapping

You can apply a value bridge through the programmatic mapping too. Just pass an instance of the bridge.

Example 67. Applying a ValueBridge with .valueBridge(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "isbn" )
        .keywordField().valueBridge( new ISBNValueBridge() );

Similarly, you can pass a binder instance. You can pass arguments either through the binder’s constructor or through setters.

Example 68. Applying a ValueBinder with .valueBinder(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "isbn" )
        .genericField()
                .valueBinder( new ISBNValueBinder() )
                .sortable( Sortable.YES );

6.2.12. Incubating features

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

The context passed to the value binder’s bind method exposes a bridgedElement() method that gives access to metadata about the value being bound, in particular its type.

See the javadoc for more information.

6.3. Property bridge

6.3.1. Basics

A property bridge, like a value bridge, is a pluggable component that implements the mapping of a property to one or more index fields. It is applied to a property with the @PropertyBinding annotation or with a custom annotation.

Compared to the value bridge, the property bridge is more complex to implement, but covers a broader range of use cases:

  • A property bridge can map a single property to more than one index field.

  • A property bridge can work correctly when applied to a mutable type, provided it is implemented correctly.

However, due to its rather flexible nature, the property bridge does not transparently provide all the features that come for free with a value bridge. They can be supported, but have to be implemented manually. This includes in particular container extractors, which cannot be combined with a property bridge: the property bridge must extract container values explicitly.

Implementing a property bridge requires two components:

  1. A custom implementation of PropertyBinder, to bind the bridge to a property at bootstrap. This involves declaring the parts of the property that will be used, declaring the index fields that will be populated along with their type, and instantiating the property bridge.

  2. A custom implementation of PropertyBridge, to perform the conversion at runtime. This involves extracting data from the property, transforming it if necessary, and pushing it to index fields.

Below is an example of a custom property bridge that maps a list of invoice line items to several fields summarizing the invoice.

Example 69. Implementing and using a PropertyBridge
public class InvoiceLineItemsSummaryBinder implements PropertyBinder { (1)

    @Override
    public void bind(PropertyBindingContext context) { (2)
        context.dependencies() (3)
                .use( "category" )
                .use( "amount" );

        IndexSchemaObjectField summaryField = context.indexSchemaElement() (4)
                .objectField( "summary" );

        IndexFieldType<BigDecimal> amountFieldType = context.typeFactory() (5)
                .asBigDecimal().decimalScale( 2 ).toIndexFieldType();

        context.bridge( List.class, new Bridge( (6)
                summaryField.toReference(), (7)
                summaryField.field( "total", amountFieldType ).toReference(), (8)
                summaryField.field( "books", amountFieldType ).toReference(), (8)
                summaryField.field( "shipping", amountFieldType ).toReference() (8)
        ) );
    }

    // ... class continues below
1 The binder must implement the PropertyBinder interface.
2 Implement the bind method in the binder.
3 Declare the dependencies of the bridge, i.e. the parts of the property value that the bridge will actually use. This is absolutely necessary in order for Hibernate Search to correctly trigger reindexing when these parts are modified. See Declaring dependencies to bridged elements for more information about declaring dependencies.
4 Declare the fields that are populated by this bridge. In this case we’re creating a summary object field, which will have multiple subfields (see below). See Declaring and writing to index fields for more information about declaring index fields.
5 Declare the type of the subfields. We’re going to index monetary amounts, so we will use a BigDecimal type with two digits after the decimal point. See Defining index field types for more information about declaring index field types.
6 Call context.bridge(…​) to define the property bridge to use, and pass an instance of the bridge.
7 Pass a reference to the summary object field to the bridge.
8 Create a subfield for the total amount of the invoice, a subfield for the subtotal for books, and a subfield for the subtotal for shipping. Pass references to these fields to the bridge.
    // ... class InvoiceLineItemsSummaryBinder (continued)

    @SuppressWarnings("rawtypes")
    private static class Bridge implements PropertyBridge<List> { (1)

        private final IndexObjectFieldReference summaryField;
        private final IndexFieldReference<BigDecimal> totalField;
        private final IndexFieldReference<BigDecimal> booksField;
        private final IndexFieldReference<BigDecimal> shippingField;

        private Bridge(IndexObjectFieldReference summaryField, (2)
                IndexFieldReference<BigDecimal> totalField,
                IndexFieldReference<BigDecimal> booksField,
                IndexFieldReference<BigDecimal> shippingField) {
            this.summaryField = summaryField;
            this.totalField = totalField;
            this.booksField = booksField;
            this.shippingField = shippingField;
        }

        @Override
        public void write(DocumentElement target, List bridgedElement, PropertyBridgeWriteContext context) { (3)
            @SuppressWarnings("unchecked")
            List<InvoiceLineItem> lineItems = (List<InvoiceLineItem>) bridgedElement;

            BigDecimal total = BigDecimal.ZERO;
            BigDecimal books = BigDecimal.ZERO;
            BigDecimal shipping = BigDecimal.ZERO;
            for ( InvoiceLineItem lineItem : lineItems ) { (4)
                BigDecimal amount = lineItem.getAmount();
                total = total.add( amount );
                switch ( lineItem.getCategory() ) {
                    case BOOK:
                        books = books.add( amount );
                        break;
                    case SHIPPING:
                        shipping = shipping.add( amount );
                        break;
                }
            }

            DocumentElement summary = target.addObject( this.summaryField ); (5)
            summary.addValue( this.totalField, total ); (6)
            summary.addValue( this.booksField, books ); (6)
            summary.addValue( this.shippingField, shipping ); (6)
        }
    }
}
1 The bridge must implement the PropertyBridge interface. One generic type argument must be provided: the type of the property, i.e. the type of the "bridged element". Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
2 The bridge stores references to the fields: it will need them when indexing.
3 Implement the write method in the bridge. This method is called on indexing.
4 Extract data from the bridged element, and optionally transform it.
5 Add an object to the summary object field. Note the summary field was declared at the root, so we call addObject directly on the target argument.
6 Add a value to each of the summary.total, summary.books and summary.shipping fields. Note the fields were declared as subfields of summary, so we call addValue on summary instead of target.
@Entity
@Indexed
public class Invoice {

    @Id
    @GeneratedValue
    private Integer id;

    @ElementCollection
    @OrderColumn
    @PropertyBinding(binder = @PropertyBinderRef(type = InvoiceLineItemsSummaryBinder.class)) (1)
    private List<InvoiceLineItem> lineItems = new ArrayList<>();

    // Getters and setters
    // ...

}
1 Apply the bridge using the @PropertyBinding annotation.

Here is an example of what an indexed document would look like, with the Elasticsearch backend:

{
  "summary": {
    "total": 38.96,
    "books": 30.97,
    "shipping": 7.99
  }
}
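
The fields declared by the binder can then be queried like any other index field. The following is only a minimal sketch, assuming a SearchSession obtained with Search.session( entityManager ):

List<Invoice> hits = searchSession.search( Invoice.class )
        .where( f -> f.range().field( "summary.total" ) // the field declared by the binder
                .atLeast( new BigDecimal( "30.00" ) ) )
        .fetchHits( 20 );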

6.3.2. Passing parameters

There are two ways to pass parameters to property bridges:

  • One is (mostly) limited to string parameters, but is trivial to implement.

  • The other can allow any type of parameters, but requires you to declare your own annotations.

Simple, string parameters

You can pass string parameters to the @PropertyBinderRef annotation and then use them later in the binder:

Example 70. Passing parameters to a PropertyBinder using the @PropertyBinderRef annotation
public class InvoiceLineItemsSummaryBinder implements PropertyBinder {

    @Override
    @SuppressWarnings("uncheked")
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                .use( "category" )
                .use( "amount" );

        String fieldName = (String) context.param( "fieldName" ); (1)
        IndexSchemaObjectField summaryField = context.indexSchemaElement()
                .objectField( fieldName ); (2)

        IndexFieldType<BigDecimal> amountFieldType = context.typeFactory()
                .asBigDecimal().decimalScale( 2 ).toIndexFieldType();

        context.bridge( List.class, new Bridge(
                summaryField.toReference(),
                summaryField.field( "total", amountFieldType ).toReference(),
                summaryField.field( "books", amountFieldType ).toReference(),
                summaryField.field( "shipping", amountFieldType ).toReference()
        ) );
    }

    @SuppressWarnings("rawtypes")
    private static class Bridge implements PropertyBridge<List> {

        /* ... same implementation as before ... */

    }
}
1 Use the binding context to get the parameter value. The param method assumes that the parameter has been defined and will throw an exception otherwise. Alternatively, it is possible to use paramOptional to get a java.util.Optional of the parameter.
2 In the bind method, use the value of the parameters. Here we use the fieldName parameter to set the field name, but we could pass parameters for any purpose: defining the field as sortable, defining a normalizer, …​
@Entity
@Indexed
public class Invoice {

    @Id
    @GeneratedValue
    private Integer id;

    @ElementCollection
    @OrderColumn
    @PropertyBinding(binder = @PropertyBinderRef( (1)
            type = InvoiceLineItemsSummaryBinder.class,
            params = @Param(name = "fieldName", value = "itemSummary")))
    private List<InvoiceLineItem> lineItems = new ArrayList<>();

    // Getters and setters
    // ...

}
1 Define the binder to use on the property, setting the fieldName parameter.
Parameters with custom annotations

You can pass parameters of any type to the bridge by defining a custom annotation with attributes:

Example 71. Passing parameters to a PropertyBinder using a custom annotation
@Retention(RetentionPolicy.RUNTIME) (1)
@Target({ ElementType.METHOD, ElementType.FIELD }) (2)
@PropertyMapping(processor = @PropertyMappingAnnotationProcessorRef( (3)
        type = InvoiceLineItemsSummaryBinding.Processor.class
))
@Documented (4)
public @interface InvoiceLineItemsSummaryBinding {

    String fieldName() default ""; (5)

    class Processor implements PropertyMappingAnnotationProcessor<InvoiceLineItemsSummaryBinding> { (6)
        @Override
        public void process(PropertyMappingStep mapping, InvoiceLineItemsSummaryBinding annotation,
                PropertyMappingAnnotationProcessorContext context) {
            InvoiceLineItemsSummaryBinder binder = new InvoiceLineItemsSummaryBinder(); (7)
            if ( !annotation.fieldName().isEmpty() ) { (8)
                binder.fieldName( annotation.fieldName() );
            }
            mapping.binder( binder ); (9)
        }
    }
}
1 Define an annotation with retention RUNTIME. Any other retention policy will cause the annotation to be ignored by Hibernate Search.
2 Since we’re defining a property bridge, allow the annotation to target either methods (getters) or fields.
3 Mark this annotation as a property mapping, and instruct Hibernate Search to apply the given processor whenever it finds this annotation. It is also possible to reference the processor by its name, in the case of a CDI/Spring bean.
4 Optionally, mark the annotation as documented, so that it is included in the javadoc of your entities.
5 Define an attribute of type String to specify the field name.
6 The processor must implement the PropertyMappingAnnotationProcessor interface, setting its generic type argument to the type of the corresponding annotation. Here the processor class is nested in the annotation class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
7 In the annotation processor, instantiate the binder.
8 Process the annotation attributes and pass the data to the binder. Here we’re using a setter, but passing the data through the constructor would work, too.
9 Apply the binder to the property.
public class InvoiceLineItemsSummaryBinder implements PropertyBinder {

    private String fieldName = "summary";

    public InvoiceLineItemsSummaryBinder fieldName(String fieldName) { (1)
        this.fieldName = fieldName;
        return this;
    }

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                .use( "category" )
                .use( "amount" );

        IndexSchemaObjectField summaryField = context.indexSchemaElement()
                .objectField( this.fieldName ); (2)

        IndexFieldType<BigDecimal> amountFieldType = context.typeFactory()
                .asBigDecimal().decimalScale( 2 ).toIndexFieldType();

        context.bridge( List.class, new Bridge(
                summaryField.toReference(),
                summaryField.field( "total", amountFieldType ).toReference(),
                summaryField.field( "books", amountFieldType ).toReference(),
                summaryField.field( "shipping", amountFieldType ).toReference()
        ) );
    }

    @SuppressWarnings("rawtypes")
    private static class Bridge implements PropertyBridge<List> {

        /* ... same implementation as before ... */

    }
}
1 Implement setters in the binder. Alternatively, we could expose a parameterized constructor.
2 In the bind method, use the value of the parameters. Here we use the fieldName parameter to set the field name, but we could pass parameters for any purpose: defining the field as sortable, defining a normalizer, …​
@Entity
@Indexed
public class Invoice {

    @Id
    @GeneratedValue
    private Integer id;

    @ElementCollection
    @OrderColumn
    @InvoiceLineItemsSummaryBinding( (1)
            fieldName = "itemSummary"
    )
    private List<InvoiceLineItem> lineItems = new ArrayList<>();

    // Getters and setters
    // ...

}
1 Apply the bridge using its custom annotation, setting the fieldName parameter.

6.3.3. Accessing the ORM session from the bridge

Contexts passed to the bridge methods can be used to retrieve the Hibernate ORM session.

Example 72. Retrieving the ORM session from a PropertyBridge
private static class Bridge implements PropertyBridge<Object> {

    private final IndexFieldReference<String> field;

    private Bridge(IndexFieldReference<String> field) {
        this.field = field;
    }

    @Override
    public void write(DocumentElement target, Object bridgedElement, PropertyBridgeWriteContext context) {
        Session session = context.extension( HibernateOrmExtension.get() ) (1)
                .session(); (2)
        // ... do something with the session ...
    }
}
1 Apply an extension to the context to access content specific to Hibernate ORM.
2 Retrieve the Session from the extended context.

6.3.4. Injecting beans into the binder

With compatible frameworks, Hibernate Search supports injecting beans into:

  • the PropertyMappingAnnotationProcessor if you use custom annotations and instantiate the binder yourself.

  • the PropertyBinder if you use the @PropertyBinding annotation and let Hibernate Search instantiate the binder using your dependency injection framework.

This only applies to binders instantiated by Hibernate Search itself. As a rule of thumb, if you need to call new MyBinder() at some point, the binder won’t get auto-magically injected.

The context passed to the property binder’s bind method also exposes a beanResolver() method to access the bean resolver and instantiate beans explicitly.

See Bean injection for more details.

6.3.5. Programmatic mapping

You can apply a property bridge through the programmatic mapping too. Just pass an instance of the binder. You can pass arguments either through the binder’s constructor, or through setters.

Example 73. Applying a PropertyBinder with .binder(…​)
TypeMappingStep invoiceMapping = mapping.type( Invoice.class );
invoiceMapping.indexed();
invoiceMapping.property( "lineItems" )
        .binder( new InvoiceLineItemsSummaryBinder().fieldName( "itemSummary" ) );

6.3.6. Incubating features

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

The context passed to the property binder’s bind method exposes a bridgedElement() method that gives access to metadata about the property being bound.

The metadata can be used to inspect the property in details:

  • Getting the name of the property.

  • Checking the type of the property.

  • Getting accessors to properties.

  • Detecting properties with markers. Markers are applied by specific annotations carrying a @MarkerBinding meta-annotation.

See the javadoc for more information.

Below is an example of the simplest use of this metadata, getting the property name and using it as a field name.

Example 74. Naming a field after the property being bound in a PropertyBinder
public class InvoiceLineItemsSummaryBinder implements PropertyBinder {

    @Override
    @SuppressWarnings("uncheked")
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                .use( "category" )
                .use( "amount" );

        PojoModelProperty bridgedElement = context.bridgedElement(); (1)
        IndexSchemaObjectField summaryField = context.indexSchemaElement()
                .objectField( bridgedElement.name() ); (2)

        IndexFieldType<BigDecimal> amountFieldType = context.typeFactory()
                .asBigDecimal().decimalScale( 2 ).toIndexFieldType();

        context.bridge( List.class, new Bridge(
                summaryField.toReference(),
                summaryField.field( "total", amountFieldType ).toReference(),
                summaryField.field( "books", amountFieldType ).toReference(),
                summaryField.field( "shipping", amountFieldType ).toReference()
        ) );
    }

    @SuppressWarnings("rawtypes")
    private static class Bridge implements PropertyBridge<List> {

        /* ... same implementation as before ... */

    }
}
1 Use the binding context to get the bridged element.
2 Use the name of the property as the name of a newly declared index field.
@Entity
@Indexed
public class Invoice {

    @Id
    @GeneratedValue
    private Integer id;

    @ElementCollection
    @OrderColumn
    @PropertyBinding(binder = @PropertyBinderRef(type = InvoiceLineItemsSummaryBinder.class)) (1)
    private List<InvoiceLineItem> lineItems = new ArrayList<>();

    // Getters and setters
    // ...

}
1 Apply the bridge using the @PropertyBinding annotation.

Here is an example of what an indexed document would look like, with the Elasticsearch backend:

{
  "lineItems": {
    "total": 38.96,
    "books": 30.97,
    "shipping": 7.99
  }
}

6.4. Type bridge

6.4.1. Basics

A type bridge is a pluggable component that implements the mapping of a whole type to one or more index fields. It is applied to a type with the @TypeBinding annotation or with a custom annotation.

The type bridge is very similar to the property bridge in its core principles and in how it is implemented. The only (obvious) difference is that the property bridge is applied to properties (fields or getters), while the type bridge is applied to the type (class or interface). This entails some slight differences in the APIs exposed to the type bridge.

Implementing a type bridge requires two components:

  1. A custom implementation of TypeBinder, to bind the bridge to a type at bootstrap. This involves declaring the properties of the type that will be used, declaring the index fields that will be populated along with their type, and instantiating the type bridge.

  2. A custom implementation of TypeBridge, to perform the conversion at runtime. This involves extracting data from an instance of the type, transforming the data if necessary, and pushing it to index fields.

Below is an example of a custom type bridge that maps two properties of the Author class, the firstName and lastName, to a single fullName field.

Example 75. Implementing and using a TypeBridge
public class FullNameBinder implements TypeBinder { (1)

    @Override
    public void bind(TypeBindingContext context) { (2)
        context.dependencies() (3)
                .use( "firstName" )
                .use( "lastName" );

        IndexFieldReference<String> fullNameField = context.indexSchemaElement() (4)
                .field( "fullName", f -> f.asString().analyzer( "name" ) ) (5)
                .toReference();

        context.bridge( (6)
                Author.class, (7)
                new Bridge(
                    fullNameField (8)
                )
        );
    }

    // ... class continues below
1 The binder must implement the TypeBinder interface.
2 Implement the bind method in the binder.
3 Declare the dependencies of the bridge, i.e. the parts of the type instances that the bridge will actually use. This is absolutely necessary in order for Hibernate Search to correctly trigger reindexing when these parts are modified. See Declaring dependencies to bridged elements for more information about declaring dependencies.
4 Declare the field that will be populated by this bridge. In this case we’re creating a single fullName String field. Multiple index fields can be declared. See Declaring and writing to index fields for more information about declaring index fields.
5 Declare the type of the field. Since we’re indexing a full name, we will use a String type with a name analyzer (defined separately, see Analysis). See Defining index field types for more information about declaring index field types.
6 Call context.bridge(…​) to define the type bridge to use, and pass an instance of the bridge.
7 Pass the expected type of the entity.
8 Pass a reference to the fullName field to the bridge.
    // ... class FullNameBinder (continued)

    private static class Bridge implements TypeBridge<Author> { (1)

        private final IndexFieldReference<String> fullNameField;

        private Bridge(IndexFieldReference<String> fullNameField) { (2)
            this.fullNameField = fullNameField;
        }

        @Override
        public void write(DocumentElement target, Author author, TypeBridgeWriteContext context) { (3)
            String fullName = author.getLastName() + " " + author.getFirstName(); (4)
            target.addValue( this.fullNameField, fullName ); (5)
        }
    }
}
1 The bridge must implement the TypeBridge interface. One generic type argument must be provided: the type of the "bridged element". Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
2 The bridge stores references to the fields: it will need them when indexing.
3 Implement the write method in the bridge. This method is called on indexing.
4 Extract data from the bridged element, and optionally transform it.
5 Set the value of the fullName field. Note the fullName field was declared at the root, so we call addValue directly on the target argument.
@Entity
@Indexed
@TypeBinding(binder = @TypeBinderRef(type = FullNameBinder.class)) (1)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String firstName;

    private String lastName;

    @GenericField (2)
    private LocalDate birthDate;

    // Getters and setters
    // ...

}
1 Apply the bridge using the @TypeBinding annotation.
2 It is still possible to map properties directly using other annotations, as long as index field names are distinct from the names used in the type binder. But no annotation is necessary on the firstName and lastName properties: these are already handled by the bridge.

Here is an example of what an indexed document would look like, with the Elasticsearch backend:

{
  "fullName": "Asimov Isaac"
}

6.4.2. Passing parameters

There are two ways to pass parameters to type bridges:

  • One is (mostly) limited to string parameters, but is trivial to implement.

  • The other can allow any type of parameters, but requires you to declare your own annotations.

Simple, string parameters

You can pass string parameters to the @TypeBinderRef annotation and then use them later in the binder:

Example 76. Passing parameters to a TypeBinder using the @TypeBinderRef annotation
public class FullNameBinder implements TypeBinder {

    @Override
    @SuppressWarnings("unchecked")
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .use( "firstName" )
                .use( "lastName" );

        IndexFieldReference<String> fullNameField = context.indexSchemaElement()
                .field( "fullName", f -> f.asString().analyzer( "name" ) )
                .toReference();

        IndexFieldReference<String> fullNameSortField = null;
        String sortField = (String) context.param( "sortField" ); (1)
        if ( "true".equalsIgnoreCase( sortField ) ) { (2)
            fullNameSortField = context.indexSchemaElement()
                    .field(
                            "fullName_sort",
                            f -> f.asString().normalizer( "name" ).sortable( Sortable.YES )
                    )
                    .toReference();
        }

        context.bridge( Author.class, new Bridge(
                fullNameField,
                fullNameSortField
        ) );
    }

    private static class Bridge implements TypeBridge<Author> {

        private final IndexFieldReference<String> fullNameField;
        private final IndexFieldReference<String> fullNameSortField;

        private Bridge(IndexFieldReference<String> fullNameField,
                IndexFieldReference<String> fullNameSortField) {
            this.fullNameField = fullNameField;
            this.fullNameSortField = fullNameSortField;
        }

        @Override
        public void write(DocumentElement target, Author author, TypeBridgeWriteContext context) {
            String fullName = author.getLastName() + " " + author.getFirstName();

            target.addValue( this.fullNameField, fullName );
            if ( this.fullNameSortField != null ) {
                target.addValue( this.fullNameSortField, fullName );
            }
        }
    }
}
1 Use the binding context to get the parameter value. The param method assumes the parameter has been defined and will fail if it has not; alternatively, paramOptional returns a java.util.Optional of the parameter.
2 In the bind method, use the value of the parameters. Here we use the sortField parameter to decide whether to add another, sortable field, but parameters could be used for any purpose: defining the field name, defining a normalizer, …​
@Entity
@Indexed
@TypeBinding(binder = @TypeBinderRef(type = FullNameBinder.class, (1)
        params = @Param(name = "sortField", value = "true")))
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String firstName;

    private String lastName;

    // Getters and setters
    // ...

}
1 Define the binder to use on the type, setting the sortField parameter.
Parameters with custom annotations

You can pass parameters of any type to the bridge by defining a custom annotation with attributes:

Example 77. Passing parameters to a TypeBinder using a custom annotation
@Retention(RetentionPolicy.RUNTIME) (1)
@Target({ ElementType.TYPE }) (2)
@TypeMapping(processor = @TypeMappingAnnotationProcessorRef(type = FullNameBinding.Processor.class)) (3)
@Documented (4)
public @interface FullNameBinding {

    boolean sortField() default false; (5)

    class Processor implements TypeMappingAnnotationProcessor<FullNameBinding> { (6)
        @Override
        public void process(TypeMappingStep mapping, FullNameBinding annotation,
                TypeMappingAnnotationProcessorContext context) {
            FullNameBinder binder = new FullNameBinder() (7)
                    .sortField( annotation.sortField() ); (8)
            mapping.binder( binder ); (9)
        }
    }
}
1 Define an annotation with retention RUNTIME. Any other retention policy will cause the annotation to be ignored by Hibernate Search.
2 Since we’re defining a type bridge, allow the annotation to target types.
3 Mark this annotation as a type mapping, and instruct Hibernate Search to apply the given binder whenever it finds this annotation. It is also possible to reference the binder by its name, in the case of a CDI/Spring bean.
4 Optionally, mark the annotation as documented, so that it is included in the javadoc of your entities.
5 Define an attribute of type boolean to specify whether a sort field should be added.
6 The processor must implement the TypeMappingAnnotationProcessor interface, setting its generic type argument to the type of the corresponding annotation. Here the processor class is nested in the annotation class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
7 In the annotation processor, instantiate the binder.
8 Process the annotation attributes and pass the data to the binder. Here we’re using a setter, but passing the data through the constructor would work, too.
9 Apply the binder to the type.
public class FullNameBinder implements TypeBinder {

    private boolean sortField;

    public FullNameBinder sortField(boolean sortField) { (1)
        this.sortField = sortField;
        return this;
    }

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .use( "firstName" )
                .use( "lastName" );

        IndexFieldReference<String> fullNameField = context.indexSchemaElement()
                .field( "fullName", f -> f.asString().analyzer( "name" ) )
                .toReference();

        IndexFieldReference<String> fullNameSortField = null;
        if ( this.sortField ) { (2)
            fullNameSortField = context.indexSchemaElement()
                    .field(
                            "fullName_sort",
                            f -> f.asString().normalizer( "name" ).sortable( Sortable.YES )
                    )
                    .toReference();
        }

        context.bridge( Author.class, new Bridge(
                fullNameField,
                fullNameSortField
        ) );
    }

    private static class Bridge implements TypeBridge<Author> {

        private final IndexFieldReference<String> fullNameField;
        private final IndexFieldReference<String> fullNameSortField;

        private Bridge(IndexFieldReference<String> fullNameField,
                IndexFieldReference<String> fullNameSortField) {
            this.fullNameField = fullNameField;
            this.fullNameSortField = fullNameSortField;
        }

        @Override
        public void write(DocumentElement target, Author author, TypeBridgeWriteContext context) {
            String fullName = author.getLastName() + " " + author.getFirstName();

            target.addValue( this.fullNameField, fullName );
            if ( this.fullNameSortField != null ) {
                target.addValue( this.fullNameSortField, fullName );
            }
        }
    }
}
1 Implement setters in the binder. Alternatively, we could expose a parameterized constructor.
2 In the bind method, use the value of the parameters. Here we use the sortField parameter to decide whether to add another, sortable field, but parameters could be used for any purpose: defining the field name, defining a normalizer, …​
@Entity
@Indexed
@FullNameBinding(sortField = true) (1)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String firstName;

    private String lastName;

    // Getters and setters
    // ...

}
1 Apply the bridge using its custom annotation, setting the sortField parameter.

6.4.3. Accessing the ORM session from the bridge

Contexts passed to the bridge methods can be used to retrieve the Hibernate ORM session.

Example 78. Retrieving the ORM session from a TypeBridge
private static class Bridge implements TypeBridge<Object> {

    private final IndexFieldReference<String> field;

    private Bridge(IndexFieldReference<String> field) {
        this.field = field;
    }

    @Override
    public void write(DocumentElement target, Object bridgedElement, TypeBridgeWriteContext context) {
        Session session = context.extension( HibernateOrmExtension.get() ) (1)
                .session(); (2)
        // ... do something with the session ...
    }
}
1 Apply an extension to the context to access content specific to Hibernate ORM.
2 Retrieve the Session from the extended context.

6.4.4. Injecting beans into the binder

With compatible frameworks, Hibernate Search supports injecting beans into:

  • the TypeMappingAnnotationProcessor if you use custom annotations and instantiate the binder yourself.

  • the TypeBinder if you use the @TypeBinding annotation and let Hibernate Search instantiate the binder using your dependency injection framework.

This only applies to binders instantiated by Hibernate Search itself. As a rule of thumb, if you need to call new MyBinder() at some point, the binder won’t get auto-magically injected.

The context passed to the type binder’s bind method also exposes a beanResolver() method to access the bean resolver and instantiate beans explicitly.

See Bean injection for more details.
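
For illustration, below is a minimal sketch of constructor injection into a binder. It assumes a CDI environment and a hypothetical NameFormatter application bean; the binder is referenced by type from @TypeBinding, so Hibernate Search retrieves it through the dependency injection framework instead of instantiating it directly.

public class InjectedFullNameBinder implements TypeBinder { // hypothetical binder name

    private final NameFormatter nameFormatter; // hypothetical application bean

    @Inject // javax.inject.Inject or jakarta.inject.Inject, depending on your stack
    public InjectedFullNameBinder(NameFormatter nameFormatter) {
        // Injection only works because Hibernate Search resolves this binder
        // through the dependency injection framework (CDI, Spring, ...).
        this.nameFormatter = nameFormatter;
    }

    @Override
    public void bind(TypeBindingContext context) {
        // ... use nameFormatter while declaring dependencies, index fields and the bridge ...
    }
}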

6.4.5. Programmatic mapping

You can apply a type bridge through the programmatic mapping too. Just pass an instance of the binder. You can pass arguments either through the binder’s constructor, or through setters.

Example 79. Applying a TypeBinder with .binder(…​)
TypeMappingStep authorMapping = mapping.type( Author.class );
authorMapping.indexed();
authorMapping.binder( new FullNameBinder().sortField( true ) );

6.4.6. Incubating features

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

The context passed to the type binder’s bind method exposes a bridgedElement() method that gives access to metadata about the type being bound.

The metadata can in particular be used to inspect the type in details:

  • Getting accessors to properties.

  • Detecting properties with markers. Markers are applied by specific annotations carrying a @MarkerBinding meta-annotation.

See the javadoc for more information.
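
As a purely illustrative, non-normative sketch, the binder below inspects the bound type through bridgedElement() and creates an accessor for a hypothetical title property; the PojoElementAccessor-based API shown here is an assumption based on the incubating metadata model and may change.

public class TitleReadingBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .use( "title" );

        // Assumption: the incubating metadata model exposes property(...) and createAccessor(...).
        PojoElementAccessor<String> titleAccessor = context.bridgedElement()
                .property( "title" )
                .createAccessor( String.class );

        IndexFieldReference<String> titleField = context.indexSchemaElement()
                .field( "title", f -> f.asString().analyzer( "english" ) )
                .toReference();

        context.bridge( Object.class, new Bridge( titleAccessor, titleField ) );
    }

    private static class Bridge implements TypeBridge<Object> {

        private final PojoElementAccessor<String> titleAccessor;
        private final IndexFieldReference<String> titleField;

        private Bridge(PojoElementAccessor<String> titleAccessor,
                IndexFieldReference<String> titleField) {
            this.titleAccessor = titleAccessor;
            this.titleField = titleField;
        }

        @Override
        public void write(DocumentElement target, Object bridgedElement, TypeBridgeWriteContext context) {
            // Read the property through the accessor instead of a typed getter.
            target.addValue( titleField, titleAccessor.read( bridgedElement ) );
        }
    }
}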

6.5. Identifier bridge

6.5.1. Basics

An identifier bridge is a pluggable component that implements the mapping of an entity property to a document identifier. It is applied to a property with the @DocumentId annotation or with a custom annotation.

Implementing an identifier bridge boils down to implementing two methods:

  • one method to convert the property value (any type) to the document identifier (a string);

  • one method to convert the document identifier back to the original property value.

Below is an example of a custom identifier bridge that converts a custom BookId type to its string representation and back:

Example 80. Implementing and using an IdentifierBridge
public class BookIdBridge implements IdentifierBridge<BookId> { (1)

    @Override
    public String toDocumentIdentifier(BookId value,
            IdentifierBridgeToDocumentIdentifierContext context) { (2)
        return value.getPublisherId() + "/" + value.getPublisherSpecificBookId();
    }

    @Override
    public BookId fromDocumentIdentifier(String documentIdentifier,
            IdentifierBridgeFromDocumentIdentifierContext context) { (3)
        String[] split = documentIdentifier.split( "/" );
        return new BookId( Long.parseLong( split[0] ), Long.parseLong( split[1] ) );
    }

}
1 The bridge must implement the IdentifierBridge interface. One generic type argument must be provided: the type of property values (values in the entity model).
2 The toDocumentIdentifier method takes the property value and a context object as parameters, and is expected to return the corresponding document identifier. It is called when indexing, but also when parameters to the search DSL must be transformed, in particular for the ID predicate.
3 The fromDocumentIdentifier method takes the document identifier and a context object as parameters, and is expected to return the original property value. It is called when mapping search hits to the corresponding entity.
@Entity
@Indexed
public class Book {

    @EmbeddedId
    @DocumentId( (1)
            identifierBridge = @IdentifierBridgeRef(type = BookIdBridge.class) (2)
    )
    private BookId id = new BookId();

    private String title;

    // Getters and setters
    // ...

}
1 Map the property to the document identifier.
2 Instruct Hibernate Search to use our custom identifier bridge. It is also possible to reference the bridge by its name, in the case of a CDI/Spring bean.

6.5.2. Type resolution

By default, the identifier bridge’s property type is determined automatically, using reflection to extract the generic type argument of the IdentifierBridge interface.

For example, in public class MyBridge implements IdentifierBridge<BookId>, the property type is resolved to BookId: the bridge will be applied to properties of type BookId.

The fact that the type is resolved automatically using reflection brings a few limitations. In particular, it means the generic type argument cannot be just anything; as a general rule, you should stick to literal types (MyBridge implements IdentifierBridge<BookId>) and avoid generic type parameters and wildcards (MyBridge<T extends Number> implements IdentifierBridge<T>, MyBridge implements IdentifierBridge<List<? extends Number>>).

If you need more complex types, you can bypass the automatic resolution and specify types explicitly using an IdentifierBinder.

6.5.3. Compatibility across indexes with isCompatibleWith()

An identifier bridge is involved in indexing, but also in the search DSLs, to convert values passed to the id predicate to a document identifier that the backend will understand.

When creating an id predicate targeting multiple entity types (and their indexes), Hibernate Search will have multiple bridges to choose from: one per entity type. Since only one predicate with a single value can be created, Hibernate Search needs to pick a single bridge.

By default, when a custom bridge is assigned to the document identifier, Hibernate Search will throw an exception because it cannot decide which bridge to pick.

If the bridges assigned to the document identifier in all targeted indexes produce the same result, it is possible to indicate to Hibernate Search that any of them will do by implementing isCompatibleWith.

This method accepts another bridge as a parameter, and returns true if that bridge can be expected to always behave the same as this one.

Example 81. Implementing isCompatibleWith to support multi-index search
public class BookOrMagazineIdBridge implements IdentifierBridge<BookOrMagazineId> {

    @Override
    public String toDocumentIdentifier(BookOrMagazineId value,
            IdentifierBridgeToDocumentIdentifierContext context) {
        return value.getPublisherId() + "/" + value.getPublisherSpecificBookId();
    }

    @Override
    public BookOrMagazineId fromDocumentIdentifier(String documentIdentifier,
            IdentifierBridgeFromDocumentIdentifierContext context) {
        String[] split = documentIdentifier.split( "/" );
        return new BookOrMagazineId( Long.parseLong( split[0] ), Long.parseLong( split[1] ) );
    }

    @Override
    public boolean isCompatibleWith(IdentifierBridge<?> other) {
        return getClass().equals( other.getClass() ); (1)
    }
}
1 Implement isCompatibleWith as necessary. Here we just deem any instance of the same class to be compatible.

6.5.4. Configuring the bridge more finely with IdentifierBinder

To configure a bridge more finely, it is possible to implement an identifier binder that will be executed at bootstrap. This binder will be able, in particular, to inspect the type of the property.

Example 82. Implementing an IdentifierBinder
public class BookIdBinder implements IdentifierBinder { (1)

    @Override
    public void bind(IdentifierBindingContext<?> context) { (2)
        context.bridge( (3)
                BookId.class, (4)
                new Bridge() (5)
        );
    }

    private static class Bridge implements IdentifierBridge<BookId> { (6)
        @Override
        public String toDocumentIdentifier(BookId value,
                IdentifierBridgeToDocumentIdentifierContext context) {
            return value.getPublisherId() + "/" + value.getPublisherSpecificBookId();
        }

        @Override
        public BookId fromDocumentIdentifier(String documentIdentifier,
                IdentifierBridgeFromDocumentIdentifierContext context) {
            String[] split = documentIdentifier.split( "/" );
            return new BookId( Long.parseLong( split[0] ), Long.parseLong( split[1] ) );
        }
    }
}
1 The binder must implement the IdentifierBinder interface.
2 Implement the bind method.
3 Call context.bridge(…​) to define the identifier bridge to use.
4 Pass the expected type of property values.
5 Pass the identifier bridge instance.
6 The identifier bridge must still be implemented. Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
@Entity
@Indexed
public class Book {

    @EmbeddedId
    @DocumentId( (1)
            identifierBinder = @IdentifierBinderRef(type = BookIdBinder.class) (2)
    )
    private BookId id = new BookId();

    @FullTextField(analyzer = "english")
    private String title;

    // Getters and setters
    // ...

}
1 Map the property to the document identifier.
2 Instruct Hibernate Search to use our custom identifier binder. Note the use of identifierBinder instead of identifierBridge. It is also possible to reference the binder by its name, in the case of a CDI/Spring bean.

6.5.5. Passing parameters

There are two ways to pass parameters to identifier bridges:

  • One is (mostly) limited to string parameters, but is trivial to implement.

  • The other can allow any type of parameters, but requires you to declare your own annotations.

Simple, string parameters

You can pass string parameters to the @IdentifierBinderRef annotation and then use them later in the binder:

Example 83. Passing parameters to an IdentifierBridge using the @IdentifierBinderRef annotation
public class OffsetIdentifierBridge implements IdentifierBridge<Integer> { (1)

    private final int offset;

    public OffsetIdentifierBridge(int offset) { (2)
        this.offset = offset;
    }

    @Override
    public String toDocumentIdentifier(Integer propertyValue, IdentifierBridgeToDocumentIdentifierContext context) {
        return String.valueOf( propertyValue + offset );
    }

    @Override
    public Integer fromDocumentIdentifier(String documentIdentifier,
            IdentifierBridgeFromDocumentIdentifierContext context) {
        return Integer.parseInt( documentIdentifier ) - offset;
    }
}
1 Implement a bridge that indexes the identifier as is, but adds a configurable offset. For example, with an offset of 1 and database identifiers starting at 0, index identifiers will start at 1.
2 The bridge accepts one parameter in its constructor: the offset to apply to identifiers.
public class OffsetIdentifierBinder implements IdentifierBinder {

    @Override
    @SuppressWarnings("unchecked")
    public void bind(IdentifierBindingContext<?> context) {
        String offset = (String) context.param( "offset" ); (1)
        context.bridge(
                Integer.class,
                new OffsetIdentifierBridge( Integer.parseInt( offset ) ) (2)
        );
    }
}
1 Use the binding context to get the parameter value. The param method assumes the parameter has been defined and will fail if it has not; alternatively, paramOptional returns a java.util.Optional of the parameter.
2 Pass it as argument to the bridge constructor.
@Entity
@Indexed
public class Book {

    @Id
    // DB identifiers start at 0, but index identifiers start at 1
    @DocumentId(identifierBinder = @IdentifierBinderRef( (1)
            type = OffsetIdentifierBinder.class,
            params = @Param(name = "offset", value = "1")))
    private Integer id;

    private String title;

    // Getters and setters
    // ...

}
1 Define the binder to use on the identifier, setting the parameter.
Parameters with custom annotations

You can pass parameters of any type to the bridge by defining a custom annotation with attributes:

Example 84. Passing parameters to an IdentifierBridge using a custom annotation
public class OffsetIdentifierBridge implements IdentifierBridge<Integer> { (1)

    private final int offset;

    public OffsetIdentifierBridge(int offset) { (2)
        this.offset = offset;
    }

    @Override
    public String toDocumentIdentifier(Integer propertyValue, IdentifierBridgeToDocumentIdentifierContext context) {
        return String.valueOf( propertyValue + offset );
    }

    @Override
    public Integer fromDocumentIdentifier(String documentIdentifier,
            IdentifierBridgeFromDocumentIdentifierContext context) {
        return Integer.parseInt( documentIdentifier ) - offset;
    }
}
1 Implement a bridge that indexes the identifier as is, but adds a configurable offset. For example, with an offset of 1 and database identifiers starting at 0, index identifiers will start at 1.
2 The bridge accepts one parameter in its constructor: the offset to apply to identifiers.
@Retention(RetentionPolicy.RUNTIME) (1)
@Target({ ElementType.METHOD, ElementType.FIELD }) (2)
@PropertyMapping(processor = @PropertyMappingAnnotationProcessorRef( (3)
        type = OffsetDocumentId.Processor.class
))
@Documented (4)
public @interface OffsetDocumentId {

    int offset(); (5)

    class Processor implements PropertyMappingAnnotationProcessor<OffsetDocumentId> { (6)
        @Override
        public void process(PropertyMappingStep mapping, OffsetDocumentId annotation,
                PropertyMappingAnnotationProcessorContext context) {
            OffsetIdentifierBridge bridge = new OffsetIdentifierBridge( (7)
                    annotation.offset()
            );
            mapping.documentId() (8)
                    .identifierBridge( bridge ); (9)
        }
    }
}
1 Define an annotation with retention RUNTIME. Any other retention policy will cause the annotation to be ignored by Hibernate Search.
2 Since we’re defining an identifier bridge, allow the annotation to target either methods (getters) or fields.
3 Mark this annotation as a property mapping, and instruct Hibernate Search to apply the given processor whenever it finds this annotation. It is also possible to reference the processor by its name, in the case of a CDI/Spring bean.
4 Optionally, mark the annotation as documented, so that it is included in the javadoc of your entities.
5 Define custom attributes to configure the identifier bridge. Here we define an offset that the bridge should add to entity identifiers.
6 The processor must implement the PropertyMappingAnnotationProcessor interface, setting its generic type argument to the type of the corresponding annotation. Here the processor class is nested in the annotation class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
7 In the process method, instantiate the bridge and pass the annotation attribute as constructor argument.
8 Declare that this property is to be used to generate the document identifier.
9 Instruct Hibernate Search to use our bridge to convert between the property and the document identifiers. Alternatively, we could pass an identifier binder instead, using the identifierBinder() method.
@Entity
@Indexed
public class Book {

    @Id
    // DB identifiers start at 0, but index identifiers start at 1
    @OffsetDocumentId(offset = 1) (1)
    private Integer id;

    private String title;

    // Getters and setters
    // ...

}
1 Apply the bridge using its custom annotation, setting the parameter.

6.5.6. Accessing the ORM session or session factory from the bridge

Contexts passed to the bridge methods can be used to retrieve the Hibernate ORM session or session factory.

Example 85. Retrieving the ORM session or session factory from an IdentifierBridge
public class MyDataIdentifierBridge implements IdentifierBridge<MyData> {

    @Override
    public String toDocumentIdentifier(MyData propertyValue, IdentifierBridgeToDocumentIdentifierContext context) {
        SessionFactory sessionFactory = context.extension( HibernateOrmExtension.get() ) (1)
                .sessionFactory(); (2)
        // ... do something with the factory ...
    }

    @Override
    public MyData fromDocumentIdentifier(String documentIdentifier,
            IdentifierBridgeFromDocumentIdentifierContext context) {
        Session session = context.extension( HibernateOrmExtension.get() ) (3)
                .session(); (4)
        // ... do something with the session ...
    }
}
1 Apply an extension to the context to access content specific to Hibernate ORM.
2 Retrieve the SessionFactory from the extended context. The Session is not available here.
3 Apply an extension to the context to access content specific to Hibernate ORM.
4 Retrieve the Session from the extended context.

6.5.7. Injecting beans into the bridge or binder

With compatible frameworks, Hibernate Search supports injection of beans into both the IdentifierBridge and the IdentifierBinder.

This only applies to beans instantiated by Hibernate Search itself. As a rule of thumb, if you need to call new MyBridge() at some point, the bridge won’t get auto-magically injected.

The context passed to the identifier binder’s bind method also exposes a beanResolver() method to access the bean resolver and instantiate beans explicitly.

See Bean injection for more details.

6.5.8. Programmatic mapping

You can apply an identifier bridge through the programmatic mapping too. Just pass an instance of the bridge.

Example 86. Applying an IdentifierBridge with .identifierBridge(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "id" )
        .documentId().identifierBridge( new BookIdBridge() );

Similarly, you can pass a binder instance:

Example 87. Applying an IdentifierBinder with .identifierBinder(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed();
bookMapping.property( "id" )
        .documentId().identifierBinder( new BookIdBinder() );

6.5.9. Incubating features

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

The context passed to the identifier binder’s bind method exposes a bridgedElement() method that gives access to metadata about the value being bound, in particular its type.

See the javadoc for more information.

6.6. Routing bridge

6.6.1. Basics

A routing bridge is a pluggable component that defines, at runtime, whether an entity should be indexed and to which shard the corresponding indexed document should be routed. It is applied to an indexed entity type with the @Indexed annotation, using its routingBinder attribute (@Indexed(routingBinder = …​)).

Implementing a routing bridge requires two components:

  1. A custom implementation of RoutingBinder, to bind the bridge to an indexed entity type at bootstrap. This involves declaring the properties of the indexed entity type that will be used by the routing bridge and instantiating the routing bridge.

  2. A custom implementation of RoutingBridge, to route entities to the index at runtime. This involves extracting data from an instance of the type, transforming the data if necessary, and defining the current route (or marking the entity as "not indexed").

    If routing can change during the lifetime of an entity instance, you will also need to define the potential previous routes, so that Hibernate Search can find and delete previous documents indexed for this entity instance.

In the sections below, you will find examples for the main use cases:

6.6.2. Using a routing bridge for conditional indexing

Below is a first example of a custom routing bridge that disables indexing for instances of the Book class if their status is ARCHIVED.

Example 88. Implementing and using a RoutingBridge for conditional indexing
public class BookStatusRoutingBinder implements RoutingBinder { (1)

    @Override
    public void bind(RoutingBindingContext context) { (2)
        context.dependencies() (3)
                .use( "status" );

        context.bridge( (4)
                Book.class, (5)
                new Bridge() (6)
        );
    }

    // ... class continues below
1 The binder must implement the RoutingBinder interface.
2 Implement the bind method in the binder.
3 Declare the dependencies of the bridge, i.e. the parts of the entity instances that the bridge will actually use. See Declaring dependencies to bridged elements for more information about declaring dependencies.
4 Call context.bridge(…​) to define the routing bridge to use.
5 Pass the expected type of indexed entities.
6 Pass the routing bridge instance.
    // ... class BookStatusRoutingBinder (continued)

    public static class Bridge implements RoutingBridge<Book> { (1)
        @Override
        public void route(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity, (2)
                RoutingBridgeRouteContext context) {
            switch ( indexedEntity.getStatus() ) { (3)
                case PUBLISHED:
                    routes.addRoute(); (4)
                    break;
                case ARCHIVED:
                    routes.notIndexed(); (5)
                    break;
            }
        }

        @Override
        public void previousRoutes(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity, (6)
                RoutingBridgeRouteContext context) {
            routes.addRoute(); (7)
        }
    }
}
1 The bridge must implement the RoutingBridge interface. Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
2 Implement the route(…​) method in the bridge. This method is called on indexing.
3 Extract data from the bridged element and inspect it.
4 If the Book status is PUBLISHED, then we want to proceed with indexing: add a route so that Hibernate Search indexes the entity as usual.
5 If the Book status is ARCHIVED, then we don’t want to index it: call notIndexed() so that Hibernate Search knows it should not index the entity.
6 When a book gets archived, there might be a previously indexed document that needs to be deleted. The previousRoutes(…​) method allows you to tell Hibernate Search where this document can possibly be. When necessary, Hibernate Search will follow each given route, look for documents corresponding to this entity, and delete them.
7 In this case, routing is very simple: there is only one possible previous route, so we only register that route.
@Entity
@Indexed(routingBinder = @RoutingBinderRef(type = BookStatusRoutingBinder.class)) (1)
public class Book {

    @Id
    private Integer id;

    private String title;

    @Basic(optional = false)
    @KeywordField (2)
    private Status status;

    // Getters and setters
    // ...

}
1 Apply the bridge using the @Indexed annotation.
2 Properties used in the bridge can still be mapped as index fields, but they don’t have to be.

6.6.3. Using a routing bridge to control routing to index shards

For a preliminary introduction to sharding, including how it works in Hibernate Search and what its limitations are, see Sharding and routing.

Routing bridges can also be used to control routing to index shards.

Below is an example of a custom routing bridge that uses the genre property of the Book class as a routing key. See Routing for an example of how to use routing in search queries, with the same mapping as the example below.

Example 89. Implementing and using a RoutingBridge to control routing to index shards
public class BookGenreRoutingBinder implements RoutingBinder { (1)

    @Override
    public void bind(RoutingBindingContext context) { (2)
        context.dependencies() (3)
                .use( "genre" );

        context.bridge( (4)
                Book.class, (5)
                new Bridge() (6)
        );
    }

    // ... class continues below
1 The binder must implement the RoutingBinder interface.
2 Implement the bind method in the binder.
3 Declare the dependencies of the bridge, i.e. the parts of the entity instances that the bridge will actually use. See Declaring dependencies to bridged elements for more information about declaring dependencies.
4 Call context.bridge(…​) to define the routing bridge to use.
5 Pass the expected type of indexed entities.
6 Pass the routing bridge instance.
    // ... class BookGenreRoutingBinder (continued)

    public static class Bridge implements RoutingBridge<Book> { (1)
        @Override
        public void route(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity, (2)
                RoutingBridgeRouteContext context) {
            String routingKey = indexedEntity.getGenre().name(); (3)
            routes.addRoute().routingKey( routingKey ); (4)
        }

        @Override
        public void previousRoutes(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity, (5)
                RoutingBridgeRouteContext context) {
            for ( Genre possiblePreviousGenre : Genre.values() ) {
                String routingKey = possiblePreviousGenre.name();
                routes.addRoute().routingKey( routingKey ); (6)
            }
        }
    }
}
1 The bridge must implement the RoutingBridge interface. Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
2 Implement the route(…​) method in the bridge. This method is called on indexing.
3 Extract data from the bridged element and derive a routing key.
4 Add a route with the generated routing key. Hibernate Search will follow this route when adding/updating/deleting the entity in the index.
5 When the genre of a book changes, the route will change, and there might be a previously indexed document that needs to be deleted. The previousRoutes(…​) method allows you to tell Hibernate Search where this document can possibly be. When necessary, Hibernate Search will follow each given route, look for documents corresponding to this entity, and delete them.
6 In this case, we simply don’t know what the previous genre of the book was, so we tell Hibernate Search to follow all possible routes, one for every possible genre.
@Entity
@Indexed(routingBinder = @RoutingBinderRef(type = BookGenreRoutingBinder.class)) (1)
public class Book {

    @Id
    private Integer id;

    private String title;

    @Basic(optional = false)
    @KeywordField (2)
    private Genre genre;

    // Getters and setters
    // ...

}
1 Apply the bridge using the @Indexed annotation.
2 Properties used in the bridge can still be mapped as index fields, but they don’t have to be.
Optimizing previousRoutes(…​)

In some cases you might have more information about the previous routes than in the example above, and you can take advantage of that information to trigger fewer deletions in the index:

  • If the routing key is derived from an immutable property, then you can be sure the route never changes. In that case, just call route(…​) with the arguments passed to previousRoutes(…​) to tell Hibernate Search that the previous route is the same as the current route, and Hibernate Search will skip the deletion.

  • If the routing key is derived from a property that changes in a predictable way, e.g. a status that always goes from DRAFT to PUBLISHED to ARCHIVED and never goes back, then you can be sure the previous routes are those corresponding to the possible previous values. In that case, just add one route for each possible previous status, e.g. if the current status is PUBLISHED you only need to add a route for DRAFT and PUBLISHED, but not for ARCHIVED. A sketch of this approach is shown below.
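
For the first case, previousRoutes(…​) can simply perform the same work as route(…​) with the arguments it receives. For the second case, here is a minimal sketch of what previousRoutes(…​) could look like, assuming a Book entity whose Status enum declares DRAFT, PUBLISHED and ARCHIVED in that order and whose name is used as the routing key; this is an illustration of the optimization, not code from Hibernate Search itself.

public static class Bridge implements RoutingBridge<Book> {

    @Override
    public void route(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity,
            RoutingBridgeRouteContext context) {
        routes.addRoute().routingKey( indexedEntity.getStatus().name() );
    }

    @Override
    public void previousRoutes(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity,
            RoutingBridgeRouteContext context) {
        // The status only ever moves forward (DRAFT -> PUBLISHED -> ARCHIVED),
        // so previous routes can only correspond to the current status or an earlier one.
        for ( Status possiblePreviousStatus : Status.values() ) {
            routes.addRoute().routingKey( possiblePreviousStatus.name() );
            if ( possiblePreviousStatus == indexedEntity.getStatus() ) {
                break; // statuses after the current one can never have been used before
            }
        }
    }
}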

6.6.4. Passing parameters

There are two ways to pass parameters to routing bridges:

  • One is (mostly) limited to string parameters, but is trivial to implement.

  • The other can allow any type of parameters, but requires you to declare your own annotations.

Refer to this example for TypeBinder, which is fairly similar to what you’ll need for a RoutingBinder.
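
Purely as an illustration, a string parameter could be read in a routing binder as sketched below. This assumes that @RoutingBinderRef exposes the same params attribute as @TypeBinderRef and that the routing binding context exposes the same param(…​)/paramOptional(…​) methods as the type binding context; the binder name and parameter are hypothetical.

public class MyRoutingBinder implements RoutingBinder { // hypothetical binder

    @Override
    public void bind(RoutingBindingContext context) {
        context.dependencies().use( "status" );

        // The parameter would be set on the mapping with something like:
        // @Indexed(routingBinder = @RoutingBinderRef(type = MyRoutingBinder.class,
        //         params = @Param(name = "fallbackRoutingKey", value = "other")))
        String fallbackRoutingKey = (String) context.param( "fallbackRoutingKey" );

        // ... pass fallbackRoutingKey to the routing bridge, then call context.bridge( ... )
        // as in the examples above ...
    }
}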

6.6.5. Accessing the ORM session from the bridge

Contexts passed to the bridge methods can be used to retrieve the Hibernate ORM session.

Example 90. Retrieving the ORM session from a RoutingBridge
private static class Bridge implements RoutingBridge<MyEntity> {

    @Override
    public void route(DocumentRoutes routes, Object entityIdentifier, MyEntity indexedEntity,
            RoutingBridgeRouteContext context) {
        Session session = context.extension( HibernateOrmExtension.get() ) (1)
                .session(); (2)
        // ... do something with the session ...
    }

    @Override
    public void previousRoutes(DocumentRoutes routes, Object entityIdentifier, MyEntity indexedEntity,
            RoutingBridgeRouteContext context) {
        // ...
    }
}
1 Apply an extension to the context to access content specific to Hibernate ORM.
2 Retrieve the Session from the extended context.

6.6.6. Injecting beans into the binder

With compatible frameworks, Hibernate Search supports injecting beans into:

  • the TypeMappingAnnotationProcessor if you use custom annotations and instantiate the binder yourself.

  • the RoutingBinder if you use @Indexed(routingBinder = …​) and let Hibernate Search instantiate the binder using your dependency injection framework.

This only applies to binders instantiated by Hibernate Search itself. As a rule of thumb, if you need to call new MyBinder() at some point, the binder won’t get auto-magically injected.

The context passed to the routing binder’s bind method also exposes a beanResolver() method to access the bean resolver and instantiate beans explicitly.

See Bean injection for more details.

6.6.7. Programmatic mapping

You can apply a routing bridge through the programmatic mapping too. Just pass an instance of the binder.

Example 91. Applying a RoutingBinder with .routingBinder(…​)
TypeMappingStep bookMapping = mapping.type( Book.class );
bookMapping.indexed()
        .routingBinder( new BookStatusRoutingBinder() );
bookMapping.property( "status" ).keywordField();

6.6.8. Incubating features

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

The context passed to the routing binder’s bind method exposes a bridgedElement() method that gives access to metadata about the type being bound.

The metadata can in particular be used to inspect the type in details:

  • Getting accessors to properties.

  • Detecting properties with markers. Markers are applied by specific annotations carrying a @MarkerBinding meta-annotation.

See the javadoc for more information.

6.7. Declaring dependencies to bridged elements

6.7.1. Basics

In order to keep the index synchronized, Hibernate Search needs to be aware of all the entity properties that are used to produce indexed documents, so that it can trigger reindexing when they change.

When using a type bridge or a property bridge, the bridge itself decides which entity properties to access during indexing. Thus, it needs to let Hibernate Search know of its "dependencies" (the entity properties it may access).

This is done through a dedicated DSL, accessible from the bind(…​) method of TypeBinder and PropertyBinder.

Below is an example of a type binder that expects to be applied to the ScientificPaper type, and declares a dependency to the paper author’s last name and first name.

Example 92. Declaring dependencies in a bridge
public class AuthorFullNameBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies() (1)
                .use( "author.firstName" ) (2)
                .use( "author.lastName" ); (3)

        IndexFieldReference<String> authorFullNameField = context.indexSchemaElement()
                .field( "authorFullName", f -> f.asString().analyzer( "name" ) )
                .toReference();

        context.bridge( ScientificPaper.class, new Bridge( authorFullNameField ) );
    }

    private static class Bridge implements TypeBridge<ScientificPaper> {

        // ...
    }
}
1 Start the declaration of dependencies.
2 Declare that the bridge will access the paper’s author property, then the author’s firstName property.
3 Declare that the bridge will access the paper’s author property, then the author’s lastName property.

The above should be enough to get started, but if you want to know more, here are a few facts about declaring dependencies.

Paths are relative to the bridged element

For example:

  • for a type bridge on type ScientificPaper, path author will refer to the value of property author on ScientificPaper instances.

  • for a property bridge on the property author of ScientificPaper, path name will refer to the value of property name on Author instances.

Every component of given paths will be considered as a dependency

You do not need to declare every sub-path.

For example, if the path myProperty.someOtherProperty is declared as used, Hibernate Search will automatically assume that myProperty is also used.

Only mutable properties need to be declared

If a property never, ever changes after the entity is first persisted, then it will never trigger reindexing and Hibernate Search does not need to know about the dependency.

If your bridge only relies on immutable properties, see useRootOnly(): declaring no dependency at all.

Associations included in dependency paths need to have an inverse side

If you declare a dependency that crosses entity boundaries through an association, and that association has no inverse side in the other entity, an exception will be thrown.

For example, when you declare a dependency to path author.lastName, Hibernate Search infers that whenever the last name of an author changes, its books need to be re-indexed. Thus, when it detects an author’s last name changed, Hibernate Search will need to retrieve the books to reindex them. That’s why the author association in entity ScientificPaper needs to have an inverse side in entity Author, e.g. a books association.
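
To make this concrete, below is a minimal sketch of what such an inverse side could look like, assuming ScientificPaper maps its author property with @ManyToOne; the exact ORM mapping details are illustrative, not prescriptive.

@Entity
public class Author {

    @Id
    private Integer id;

    private String firstName;

    private String lastName;

    // Inverse side of ScientificPaper.author: it lets Hibernate Search retrieve
    // the papers to reindex when an author's last name changes.
    @OneToMany(mappedBy = "author")
    private List<ScientificPaper> books = new ArrayList<>();

    // Getters and setters
    // ...

}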

See Tuning automatic reindexing for more information about these constraints and how to address non-trivial models.

6.7.2. Traversing non-default containers (map keys, …​)

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

When a path element refers to a property of a container type (List, Map, Optional, …​), the path will be implicitly resolved to elements of that container. For example, someMap.otherObject will resolve to the otherObject property of the values (not the keys) of someMap.

If the default resolution is not what you need, you can explicitly control how to traverse containers by passing PojoModelPath objects instead of just strings:

Example 93. Declaring dependencies in a bridge with explicit container extractors
@Entity
@Indexed
@TypeBinding(binder = @TypeBinderRef(type = BookEditionsForSaleTypeBinder.class)) (1)
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "name")
    private String title;

    @ElementCollection
    @JoinTable(
            name = "book_editionbyprice",
            joinColumns = @JoinColumn(name = "book_id")
    )
    @MapKeyJoinColumn(name = "edition_id")
    @Column(name = "price")
    @OrderBy("edition_id asc")
    @AssociationInverseSide(
            extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY),
            inversePath = @ObjectPath( @PropertyValue( propertyName = "book" ) )
    )
    private Map<BookEdition, BigDecimal> priceByEdition = new LinkedHashMap<>(); (2)

    public Book() {
    }

    // Getters and setters
    // ...

}
1 Apply a custom bridge to the Book entity.
2 This (rather complex) map is the one we’ll access in the custom bridge.
public class BookEditionsForSaleTypeBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .use( PojoModelPath.builder() (1)
                        .property( "priceByEdition" ) (2)
                        .value( BuiltinContainerExtractors.MAP_KEY ) (3)
                        .property( "label" ) (4)
                        .toValuePath() ); (5)

        IndexFieldReference<String> editionsForSaleField = context.indexSchemaElement()
                .field( "editionsForSale", f -> f.asString().analyzer( "english" ) )
                .multiValued()
                .toReference();

        context.bridge( Book.class, new Bridge( editionsForSaleField ) );
    }

    private static class Bridge implements TypeBridge<Book> {

        private final IndexFieldReference<String> editionsForSaleField;

        private Bridge(IndexFieldReference<String> editionsForSaleField) {
            this.editionsForSaleField = editionsForSaleField;
        }

        @Override
        public void write(DocumentElement target, Book book, TypeBridgeWriteContext context) {
            for ( BookEdition edition : book.getPriceByEdition().keySet() ) { (6)
                target.addValue( editionsForSaleField, edition.getLabel() );
            }
        }
    }
}
1 Start building a PojoModelPath.
2 Append the priceByEdition property (a Map) to the path.
3 Explicitly mention that the bridge will access keys from the priceByEdition map, i.e. the book editions. Without this, Hibernate Search would have assumed that values are accessed.
4 Append the label property to the path. This is the label property of the book editions.
5 Create the path and pass it to .use(…​) to declare the dependency.
6 This is the actual code that accesses the paths as declared above.

For property binders applied to a container property, you can control how to traverse the property itself by passing a container extractor path as the first argument to use(…​):

Example 94. Declaring dependencies in a bridge with explicit container extractors for the bridged property
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "name")
    private String title;

    @ElementCollection
    @JoinTable(
            name = "book_editionbyprice",
            joinColumns = @JoinColumn(name = "book_id")
    )
    @MapKeyJoinColumn(name = "edition_id")
    @Column(name = "price")
    @OrderBy("edition_id asc")
    @AssociationInverseSide(
            extraction = @ContainerExtraction(BuiltinContainerExtractors.MAP_KEY),
            inversePath = @ObjectPath( @PropertyValue( propertyName = "book" ) )
    )
    @PropertyBinding(binder = @PropertyBinderRef(type = BookEditionsForSalePropertyBinder.class)) (1)
    private Map<BookEdition, BigDecimal> priceByEdition = new LinkedHashMap<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
1 Apply a custom bridge to the priceByEdition property of the Book entity.
public class BookEditionsForSalePropertyBinder implements PropertyBinder {

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                .use( ContainerExtractorPath.explicitExtractor( BuiltinContainerExtractors.MAP_KEY ), (1)
                        "label" ); (2)

        IndexFieldReference<String> editionsForSaleField = context.indexSchemaElement()
                .field( "editionsForSale", f -> f.asString().analyzer( "english" ) )
                .multiValued()
                .toReference();

        context.bridge( Map.class, new Bridge( editionsForSaleField ) );
    }

    @SuppressWarnings("rawtypes")
    private static class Bridge implements PropertyBridge<Map> {

        private final IndexFieldReference<String> editionsForSaleField;

        private Bridge(IndexFieldReference<String> editionsForSaleField) {
            this.editionsForSaleField = editionsForSaleField;
        }

        @Override
        public void write(DocumentElement target, Map bridgedElement, PropertyBridgeWriteContext context) {
            @SuppressWarnings("unchecked")
            Map<BookEdition, ?> priceByEdition = (Map<BookEdition, ?>) bridgedElement;

            for ( BookEdition edition : priceByEdition.keySet() ) { (3)
                target.addValue( editionsForSaleField, edition.getLabel() );
            }
        }
    }
}
1 Explicitly mention that the bridge will access keys from the priceByEdition property, i.e. the book editions. Without this, Hibernate Search would have assumed that values are accessed.
2 Declare a dependency to the label property of the book editions.
3 This is the actual code that accesses the paths as declared above.

6.7.3. useRootOnly(): declaring no dependency at all

If your bridge only accesses immutable properties, then it’s safe to declare that its only dependency is to the root object.

To do so, call .dependencies().useRootOnly().

Without this call, Hibernate Search will suspect an oversight and will throw an exception on startup.
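
For reference, here is a minimal sketch of such a declaration in a type binder; the binder name is hypothetical.

public class ConstantDataBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .useRootOnly(); // the bridge only reads immutable properties of the root object

        // ... declare index fields and call context.bridge( ... ) as usual ...
    }
}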

6.7.4. fromOtherEntity(…​): declaring dependencies using the inverse path

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

It is not always possible to represent the dependency as a path from the bridged element to the values accessed by the bridge.

In particular, when the bridge relies on other components (queries, services) to retrieve another entity, there may not even be a path from the bridge element to that entity. In this case, if there is an inverse path from the other entity to the bridged element, and the bridged element is an entity, you can simply declare the dependency from the other entity, as shown below.

Example 95. Declaring dependencies in a bridge using the inverse path
@Entity
@Indexed
@TypeBinding(binder = @TypeBinderRef(type = ScientificPapersReferencedByBinder.class)) (1)
public class ScientificPaper {

    @Id
    private Integer id;

    private String title;

    @ManyToMany
    private List<ScientificPaper> references = new ArrayList<>();

    public ScientificPaper() {
    }

    // Getters and setters
    // ...

}
1 Apply a custom bridge to the ScientificPaper type.
public class ScientificPapersReferencedByBinder implements TypeBinder {

    @Override
    public void bind(TypeBindingContext context) {
        context.dependencies()
                .fromOtherEntity( ScientificPaper.class, "references" ) (1)
                .use( "title" ); (2)

        IndexFieldReference<String> papersReferencingThisOneField = context.indexSchemaElement()
                .field( "referencedBy", f -> f.asString().analyzer( "english" ) )
                .multiValued()
                .toReference();

        context.bridge( ScientificPaper.class, new Bridge( papersReferencingThisOneField ) );
    }

    private static class Bridge implements TypeBridge<ScientificPaper> {

        private final IndexFieldReference<String> referencedByField;

        private Bridge(IndexFieldReference<String> referencedByField) {
            this.referencedByField = referencedByField;
        }

        @Override
        public void write(DocumentElement target, ScientificPaper paper, TypeBridgeWriteContext context) {
            for ( String referencingPaperTitle : findReferencingPaperTitles( context, paper ) ) { (3)
                target.addValue( referencedByField, referencingPaperTitle );
            }
        }

        private List<String> findReferencingPaperTitles(TypeBridgeWriteContext context, ScientificPaper paper) {
            Session session = context.extension( HibernateOrmExtension.get() ).session();
            Query<String> query = session.createQuery(
                    "select p.title from ScientificPaper p where :this member of p.references",
                    String.class );
            query.setParameter( "this", paper );
            return query.list();
        }
    }
}
1 Declare that this bridge relies on other entities of type ScientificPaper, and that those other entities reference the indexed entity through their references property.
2 Declare which parts of the other entities are actually used by the bridge.
3 The bridge retrieves the other entity through a query, but then uses exclusively the parts that were declared previously.

Currently, dependencies declared this way will be ignored when the "other entity" gets deleted.

See HSEARCH-3567 to track progress on solving this problem.

6.8. Declaring and writing to index fields

6.8.1. Basics

When implementing a PropertyBinder or TypeBinder, it is necessary to declare the index fields that the bridge will contribute to. This declaration is performed using a dedicated DSL.

The entry point to this DSL is the IndexSchemaElement, which represents the part of the document structure that the binder will push data to. From the IndexSchemaElement, it is possible to declare fields.

The declaration of each field yields a field reference. This reference is to be stored in the bridge, which will use it at runtime to set the value of this field in a given document, represented by a DocumentElement.

Below is a simple example using the DSL to declare a single field in a property binder and then write to that field in a property bridge.

Example 96. Declaring a simple index field and writing to that field
public class ISBNBinder implements PropertyBinder {

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                /* ... (declaration of dependencies, not relevant) ... */

        IndexSchemaElement schemaElement = context.indexSchemaElement(); (1)

        IndexFieldReference<String> field =
                schemaElement.field( (2)
                        "isbn", (3)
                        f -> f.asString() (4)
                                .normalizer( "isbn" )
                )
                        .toReference(); (5)

        context.bridge( (6)
                ISBN.class, (7)
                new ISBNBridge( field ) (8)
        );
    }
}
1 Get the IndexSchemaElement, the entry point to the index field declaration DSL.
2 Declare a field.
3 Pass the name of the field.
4 Declare the type of the field. This is done through a lambda taking advantage of another DSL. See Defining index field types for more information.
5 Get a reference to the declared field.
6 Call context.bridge(…​) to define the bridge to use.
7 Pass the expected type of values.
8 Pass the bridge instance.
private static class ISBNBridge implements PropertyBridge<ISBN> {

    private final IndexFieldReference<String> fieldReference;

    private ISBNBridge(IndexFieldReference<String> fieldReference) {
        this.fieldReference = fieldReference;
    }

    @Override
    public void write(DocumentElement target, ISBN bridgedElement, PropertyBridgeWriteContext context) {
        String indexedValue = /* ... (extraction of data, not relevant) ... */
        target.addValue( this.fieldReference, indexedValue ); (1)
    }
}
1 In the bridge, use the reference obtained above to add a value to the field for the current document.

6.8.2. Type objects

The lambda syntax to declare the type of each field is convenient, but sometimes gets in the way, in particular when multiple fields must be declared with the exact same type.

For that reason, the context object passed to binders exposes a typeFactory() method. Using this factory, it is possible to build IndexFieldType objects that can be re-used in multiple field declarations.

Example 97. Re-using an index field type in multiple field declarations
@Override
public void bind(TypeBindingContext context) {
    context.dependencies()
            /* ... (declaration of dependencies, not relevant) ... */

    IndexSchemaElement schemaElement = context.indexSchemaElement();

    IndexFieldType<String> nameType = context.typeFactory() (1)
            .asString() (2)
            .analyzer( "name" )
            .toIndexFieldType(); (3)

    context.bridge( Author.class, new Bridge(
            schemaElement.field( "firstName", nameType ) (4)
                    .toReference(),
            schemaElement.field( "lastName", nameType ) (4)
                    .toReference(),
            schemaElement.field( "fullName", nameType ) (4)
                    .toReference()
    ) );
}
1 Get the type factory.
2 Define the type.
3 Get the resulting type.
4 Pass the type directly instead of using a lambda when defining the field.

6.8.3. Multivalued fields

Fields are considered single-valued by default: if you attempt to add multiple values to a single-valued field during indexing, an exception will be thrown.

In order to add multiple values to a field, this field must be marked as multivalued during its declaration:

Example 98. Declaring a field as multivalued
@Override
public void bind(TypeBindingContext context) {
    context.dependencies()
            /* ... (declaration of dependencies, not relevant) ... */

    IndexSchemaElement schemaElement = context.indexSchemaElement();

    context.bridge( Author.class, new Bridge(
            schemaElement.field( "names", f -> f.asString().analyzer( "name" ) )
                    .multiValued() (1)
                    .toReference()
    ) );
}
1 Declare the field as multivalued.

6.8.4. Object fields

The previous sections only presented flat schemas with value fields, but the index schema can actually be organized in a tree structure, with two categories of index fields:

  • Value fields, often simply called "fields", which hold an atomic value of a specific type: string, integer, date, …​

  • Object fields, which hold a composite value.

Object fields are declared similarly to value fields, with an additional step to declare each subfield, as shown below.

Example 99. Declaring an object field
@Override
public void bind(PropertyBindingContext context) {
    context.dependencies()
            /* ... (declaration of dependencies, not relevant) ... */

    IndexSchemaElement schemaElement = context.indexSchemaElement();

    IndexSchemaObjectField summaryField =
            schemaElement.objectField( "summary" ); (1)

    IndexFieldType<BigDecimal> amountFieldType = context.typeFactory()
            .asBigDecimal().decimalScale( 2 )
            .toIndexFieldType();

    context.bridge( List.class, new Bridge(
            summaryField.toReference(), (2)
            summaryField.field( "total", amountFieldType ) (3)
                    .toReference(),
            summaryField.field( "books", amountFieldType ) (3)
                    .toReference(),
            summaryField.field( "shipping", amountFieldType ) (3)
                    .toReference()
    ) );
}
1 Declare an object field with objectField, passing its name in parameter.
2 Get a reference to the declared object field and pass it to the bridge for later use.
3 Create subfields, get references to these fields and pass them to the bridge for later use.

The subfields of an object field can themselves include object fields.

Just like value fields, object fields are single-valued by default. Be sure to call .multiValued() during the object field definition if you want to make it multivalued, as in the sketch below.
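Below is a minimal sketch of such a declaration, assuming the same binding context as in the previous examples; the field names (authors, address, city) are hypothetical.

IndexSchemaObjectField authorsField =
        schemaElement.objectField( "authors" )
                .multiValued(); // several authors per document
IndexSchemaObjectField addressField =
        authorsField.objectField( "address" ); // an object field nested inside another object field
IndexFieldReference<String> cityField =
        addressField.field( "city", f -> f.asString() )
                .toReference();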

Object fields as well as their subfields are each assigned a reference, which will be used by the bridge to write to documents, as shown in the example below.

Example 100. Writing to an object field
@Override
public void write(DocumentElement target, List bridgedElement, PropertyBridgeWriteContext context) {
    @SuppressWarnings("unchecked")
    List<InvoiceLineItem> lineItems = (List<InvoiceLineItem>) bridgedElement;

    BigDecimal total = BigDecimal.ZERO;
    BigDecimal books = BigDecimal.ZERO;
    BigDecimal shipping = BigDecimal.ZERO;
    /* ... (computation of amounts, not relevant) ... */

    DocumentElement summary = target.addObject( this.summaryField ); (1)
    summary.addValue( this.totalField, total ); (2)
    summary.addValue( this.booksField, books ); (2)
    summary.addValue( this.shippingField, shipping ); (2)
}
1 Add an object to the summary object field for the current document, and get a reference to that object.
2 Add a value to the subfields for the object we just added. Note we’re calling addValue on the object we just added, not on target.

6.8.5. Object structure

By default, object fields are flattened, meaning that the tree structure is not preserved. See DEFAULT or FLATTENED structure for more information.

It is possible to switch to a nested structure by passing an argument to the objectField method, as shown below. Each value of the object field will then transparently be indexed as a separate nested document, without any change to the write method of the bridge.

Example 101. Declaring an object field as nested
@Override
public void bind(PropertyBindingContext context) {
    context.dependencies()
            /* ... (declaration of dependencies, not relevant) ... */

    IndexSchemaElement schemaElement = context.indexSchemaElement();

    IndexSchemaObjectField lineItemsField =
            schemaElement.objectField( (1)
                    "lineItems", (2)
                    ObjectStructure.NESTED (3)
            )
            .multiValued(); (4)

    context.bridge( List.class, new Bridge(
            lineItemsField.toReference(), (5)
            lineItemsField.field( "category", f -> f.asString() ) (6)
                    .toReference(),
            lineItemsField.field( "amount", f -> f.asBigDecimal().decimalScale( 2 ) ) (7)
                    .toReference()
    ) );
}
1 Declare an object field with objectField.
2 Define the name of the object field.
3 Define the structure of the object field, here NESTED.
4 Define the object field as multivalued.
5 Get a reference to the declared object field and pass it to the bridge for later use.
6 Create subfields, get references to these fields and pass them to the bridge for later use.

6.8.6. Dynamic fields with field templates

Fields declared in the sections above are all static: their path and type are known on bootstrap.

In some very specific cases, the path of a field is not known until you actually index it; for example, you may want to index a Map<String, Integer> by using the map keys as field names, or index the properties of a JSON object whose schema is not known in advance. The fields, then, are considered dynamic.

Dynamic fields are not declared on bootstrap, but need to match a field template that is declared on bootstrap. The template includes the field types and structural information (multivalued or not, …​), but omits the field names.

A field template is always declared in a binder: either in a type binder or in a property binder. As with static fields, the entry point to declaring a template is the IndexSchemaElement retrieved from the context passed to the binder’s bind(…​) method. A call to the fieldTemplate method on the schema element declares a field template.

Assuming a field template was declared during binding, the bridge can then add dynamic fields to the DocumentElement when indexing, by calling addValue and passing the field name (as a string) and the field value.

Below is a simple example using the DSL to declare a field template in a property binder and then write to that field in a property bridge.

Example 102. Declaring a field template and writing to a dynamic field
public class UserMetadataBinder implements PropertyBinder {

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                /* ... (declaration of dependencies, not relevant) ... */

        IndexSchemaElement schemaElement = context.indexSchemaElement();

        IndexSchemaObjectField userMetadataField =
                schemaElement.objectField( "userMetadata" ); (1)

        userMetadataField.fieldTemplate( (2)
                "userMetadataValueTemplate", (3)
                f -> f.asString().analyzer( "english" ) (4)
        ); (5)

        context.bridge( Map.class, new UserMetadataBridge( userMetadataField.toReference() ) ); (6)
    }
}
1 Declare an object field with objectField. It’s better to always host your dynamic fields on a dedicated object field, to avoid conflicts with other templates.
2 Declare a field template with fieldTemplate.
3 Pass the template name — this is not the field name, and is only used to uniquely identify the template.
4 Define the field type.
5 Contrary to static field declarations, field template declarations do not return a field reference, because you won’t need one when writing to the document.
6 Get a reference to the declared object field and pass it to the bridge for later use.
@SuppressWarnings("rawtypes")
private static class UserMetadataBridge implements PropertyBridge<Map> {

    private final IndexObjectFieldReference userMetadataFieldReference;

    private UserMetadataBridge(IndexObjectFieldReference userMetadataFieldReference) {
        this.userMetadataFieldReference = userMetadataFieldReference;
    }

    @Override
    public void write(DocumentElement target, Map bridgedElement, PropertyBridgeWriteContext context) {
        @SuppressWarnings("unchecked")
        Map<String, String> userMetadata = (Map<String, String>) bridgedElement;

        DocumentElement indexedUserMetadata = target.addObject( userMetadataFieldReference ); (1)

        for ( Map.Entry<String, String> entry : userMetadata.entrySet() ) {
            String fieldName = entry.getKey();
            String fieldValue = entry.getValue();
            indexedUserMetadata.addValue( fieldName, fieldValue ); (2)
        }
    }
}
1 Add an object to the userMetadata object field for the current document, and get a reference to that object.
2 Add one field per user metadata entry, with the field name and field value defined by the user. Note that field names should usually be validated before that point, in order to avoid exotic characters (whitespaces, dots, …​).

Though rarely necessary, you can also declare templates for object fields using the objectFieldTemplate methods, as sketched below.
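Below is a minimal sketch reusing the userMetadataField object field from the previous example; the template name, the someGroup object name and the someKey field name are hypothetical.

userMetadataField.objectFieldTemplate( "userMetadataGroupTemplate" ); // matches dynamic object fields

// Later, in the bridge's write(...) method, a dynamic object can be added by name;
// the value field template declared in the previous example also applies to its fields:
DocumentElement group = indexedUserMetadata.addObject( "someGroup" );
group.addValue( "someKey", "some value" );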

It is also possible to add multiple fields with different types to the same object. To that end, make sure that the format of a field can be inferred from the field name. You can then declare multiple templates and assign a path pattern to each template, as shown below.

Example 103. Declaring multiple field templates with different types
public class MultiTypeUserMetadataBinder implements PropertyBinder {

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies()
                /* ... (declaration of dependencies, not relevant) ... */

        IndexSchemaElement schemaElement = context.indexSchemaElement();

        IndexSchemaObjectField userMetadataField =
                schemaElement.objectField( "multiTypeUserMetadata" ); (1)

        userMetadataField.fieldTemplate( (2)
                "userMetadataValueTemplate_int", (3)
                f -> f.asInteger().sortable( Sortable.YES ) (4)
        )
                .matchingPathGlob( "*_int" ); (5)

        userMetadataField.fieldTemplate( (6)
                "userMetadataValueTemplate_default",
                f -> f.asString().analyzer( "english" )
        );

        context.bridge( Map.class, new Bridge( userMetadataField.toReference() ) );
    }
}
1 Declare an object field with objectField.
2 Declare a field template for integer with fieldTemplate.
3 Pass the template name.
4 Define the field type as integer, sortable.
5 Assign a path pattern to the template, so that only fields ending with _int will be considered as integers.
6 Declare another field template, so that fields are considered as english text if they do not match the previous template.
@SuppressWarnings("rawtypes")
private static class Bridge implements PropertyBridge<Map> {

    private final IndexObjectFieldReference userMetadataFieldReference;

    private Bridge(IndexObjectFieldReference userMetadataFieldReference) {
        this.userMetadataFieldReference = userMetadataFieldReference;
    }

    @Override
    public void write(DocumentElement target, Map bridgedElement, PropertyBridgeWriteContext context) {
        @SuppressWarnings("unchecked")
        Map<String, Object> userMetadata = (Map<String, Object>) bridgedElement;

        DocumentElement indexedUserMetadata = target.addObject( userMetadataFieldReference ); (1)

        for ( Map.Entry<String, Object> entry : userMetadata.entrySet() ) {
            String fieldName = entry.getKey();
            Object fieldValue = entry.getValue();
            indexedUserMetadata.addValue( fieldName, fieldValue ); (2)
        }
    }
}
1 Add an object to the userMetadata object field for the current document, and get a reference to that object.
2 Add one field per user metadata entry, with the field name and field value defined by the user. Note that field values should be validated before that point; in this case, adding a field named foo_int with a value of type String will lead to a SearchException when indexing.
Precedence of field templates

Hibernate Search tries to match templates in the order they are declared, so you should always declare the templates with the most specific path pattern first.

Templates declared on a given schema element can be matched in children of that element. For example, if you declare templates at the root of your entity (through a type bridge), these templates will be implicitly available in every single property bridge of that entity. In such cases, templates declared in property bridges will take precedence over those declared in the type bridge.

6.9. Defining index field types

6.9.1. Basics

A specificity of Lucene-based search engines (including Elasticsearch) is that field types are much more complex than just a data type ("string", "integer", …​).

When declaring a field, you must not only declare the data type, but also various characteristics that will define how the data is stored exactly: is the field sortable, is it projectable, is it analyzed and if so with which analyzer, …​

Because of this complexity, when field types must be defined in the various binders (ValueBinder, PropertyBinder, TypeBinder), they are defined using a dedicated DSL.

The entry point to this DSL is the IndexFieldTypeFactory. The type factory is generally accessible through the context object passed to the binders (context.typeFactory()). In the case of PropertyBinder and TypeBinder, the type factory can also be passed to the lambda expression passed to the field method to define the field type inline.

The type factory exposes various as*() methods, for example asString or asLocalDate. These are the first steps of the type definition DSL, where the data type is defined. They return other steps, from which options can be set, such as the analyzer. See below for an example.

Example 104. Defining a field type
IndexFieldType<String> type = context.typeFactory() (1)
        .asString() (2)
        .normalizer( "isbn" ) (3)
        .sortable( Sortable.YES ) (3)
        .toIndexFieldType(); (4)
1 Get the IndexFieldTypeFactory from the binding context.
2 Define the data type.
3 Define options. Available options differ based on the field type: for example, normalizer is available for String fields, but not for Double fields.
4 Get the index field type.

In ValueBinder, the call to toIndexFieldType() is omitted: context.bridge(…​) expects to be passed the last DSL step, not a fully built type.

toIndexFieldType() is also omitted in the lambda expressions passed to the field method of the field declaration DSL.
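For instance, below is a minimal sketch of a ValueBinder passing the last DSL step directly, without calling toIndexFieldType(); the same pattern appears again in Example 108, and ISBNValueBridge is assumed to be an existing ValueBridge implementation.

public class ISBNValueBinder implements ValueBinder {
    @Override
    public void bind(ValueBindingContext<?> context) {
        context.bridge( ISBN.class, new ISBNValueBridge(),
                context.typeFactory().asString().normalizer( "isbn" ) ); // note: no toIndexFieldType()
    }
}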

6.9.2. Available data types

All available data types have a dedicated as*() method in IndexFieldTypeFactory. For details, see the javadoc of IndexFieldTypeFactory or the backend-specific documentation.
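As an illustration, a few of the as*() entry points used elsewhere in this chapter are shown below; this is only a small sample, not the full list.

IndexFieldTypeFactory typeFactory = context.typeFactory();
IndexFieldType<String> stringType = typeFactory.asString().toIndexFieldType();
IndexFieldType<Integer> integerType = typeFactory.asInteger().toIndexFieldType();
IndexFieldType<LocalDate> localDateType = typeFactory.asLocalDate().toIndexFieldType();
IndexFieldType<BigDecimal> bigDecimalType = typeFactory.asBigDecimal().decimalScale( 2 ).toIndexFieldType();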

6.9.3. Available type options

Most of the options available in the index field type DSL are identical to the options exposed by @*Field annotations. See Field annotation attributes for details about them.

Other options are explained in the following sections.

6.9.4. DSL converter

This section is not relevant for ValueBinder: Hibernate Search sets the DSL converter automatically for value bridges, creating a DSL converter that simply delegates to the value bridge.

The various search DSLs expose some methods that expect a field value: matching(), between(), atMost(), missingValue().use(), …​ By default, the expected type will be the same as the data type, i.e. String if you called asString(), LocalDate if you called asLocalDate(), etc.

This can be annoying when the bridge converts values from a different type when indexing. For example, if the bridge converts an enum to a string when indexing, you probably don’t want to pass a string to search predicates, but rather the enum.

By setting a DSL converter on a field type, it is possible to change the expected type of values passed to the various DSLs. See below for an example.

Example 105. Assigning a DSL converter to a field type
IndexFieldType<String> type = context.typeFactory()
        .asString() (1)
        .normalizer( "isbn" )
        .sortable( Sortable.YES )
        .dslConverter( (2)
                ISBN.class, (3)
                (value, convertContext) -> value.getStringValue() (4)
        )
        .toIndexFieldType();
1 Define the data type as String.
2 Define a DSL converter that converts from ISBN to String. This converter will be used transparently by the search DSLs.
3 Define the input type as ISBN by passing ISBN.class as the first parameter.
4 Define how to convert an ISBN to a String by passing a converter as the second parameter.
ISBN expectedISBN = /* ... */
List<Book> result = searchSession.search( Book.class )
        .where( f -> f.match().field( "isbn" )
                .matching( expectedISBN ) ) (1)
        .fetchHits( 20 );
1 Thanks to the DSL converter, predicates targeting fields using our type accept ISBN values by default.
DSL converters can be disabled in the various DSLs where necessary. See Type of arguments passed to the DSL.
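For instance, below is a minimal sketch disabling the DSL converter for the isbn field defined above, so that the raw string value is expected instead of an ISBN; the literal value is hypothetical.

List<Book> result = searchSession.search( Book.class )
        .where( f -> f.match().field( "isbn" )
                .matching( "978-0-13-468599-1", ValueConvert.NO ) ) // raw string, converter disabled
        .fetchHits( 20 );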

6.9.5. Projection converter

This section is not relevant for ValueBinder: Hibernate Search sets the projection converter automatically for value bridges, creating a projection converter that simply delegates to the value bridge.

By default, the type of values returned by field projections or aggregations will be the same as the data type of the corresponding field, i.e. String if you called asString(), LocalDate if you called asLocalDate(), etc.

This can be annoying when the bridge converts values from a different type when indexing. For example, if the bridge converts an enum to a string when indexing, you probably don’t want projections to return a string, but rather the enum.

By setting a projection converter on a field type, it is possible to change the type of values returned by field projections or aggregations. See below for an example.

Example 106. Assigning a projection converter to a field type
IndexFieldType<String> type = context.typeFactory()
        .asString() (1)
        .projectable( Projectable.YES )
        .projectionConverter( (2)
                ISBN.class, (3)
                (value, convertContext) -> ISBN.parse( value ) (4)
        )
        .toIndexFieldType();
1 Define the data type as String.
2 Define a projection converter that converts from String to ISBN. This converter will be used transparently by the search DSLs.
3 Define the converted type as ISBN by passing ISBN.class as the first parameter.
4 Define how to convert a String to an ISBN by passing a converter as the second parameter.
List<ISBN> result = searchSession.search( Book.class )
        .select( f -> f.field( "isbn", ISBN.class ) ) (1)
        .where( f -> f.matchAll() )
        .fetchHits( 20 );
1 Thanks to the projection converter, fields using our type are projected to an ISBN by default.
Projection converters can be disabled in the projection DSL where necessary. See Type of projected values.

6.9.6. Backend-specific types

Backends extend this DSL to allow defining backend-specific types. See the documentation of each backend for details.

6.10. Defining named predicates

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

When implementing a PropertyBinder or TypeBinder, it is possible to assign "named predicates" to index schema elements (either the index root or an object field).

These named predicates will then be usable through the Search DSL, referencing them by name and optionally passing parameters. The main point is that the implementation is hidden from callers: they do not need to understand how data is indexed in order to use a named predicate.

Below is a simple example using the DSL to declare an object field and assign a named predicate to that field, in a property binder.

Example 107. Declaring a named predicate
/**
 * A binder for Stock Keeping Unit (SKU) identifiers, i.e. Strings with a specific format.
 */
public class SkuIdentifierBinder implements PropertyBinder {

    @Override
    public void bind(PropertyBindingContext context) {
        context.dependencies().useRootOnly();

        IndexSchemaObjectField skuIdObjectField = context.indexSchemaElement()
                .objectField( context.bridgedElement().name() );

        IndexFieldType<String> skuIdPartType = context.typeFactory()
                .asString().normalizer( "lowercase" ).toIndexFieldType();

        context.bridge( String.class, new Bridge(
                skuIdObjectField.toReference(),
                skuIdObjectField.field( "departmentCode", skuIdPartType ).toReference(),
                skuIdObjectField.field( "collectionCode", skuIdPartType ).toReference(),
                skuIdObjectField.field( "itemCode", skuIdPartType ).toReference()
        ) );

        skuIdObjectField.namedPredicate( (1)
                "skuIdMatch", (2)
                new SkuIdentifierMatchPredicateProvider() (3)
        );
    }

    // ... class continues below
1 The binder defines a named predicate. Note this predicate is assigned to an object field.
2 The predicate name will be used to refer to this predicate when calling the named predicate. Since the predicate is assigned to an object field, callers will have to prefix the predicate name with the path to that object field.
3 The named predicate provider will define how to create the predicate when searching.
// ... class SkuIdentifierBinder (continued)

private static class Bridge implements PropertyBridge<String> { (1)

    private final IndexObjectFieldReference skuIdObjectField;
    private final IndexFieldReference<String> departmentCodeField;
    private final IndexFieldReference<String> collectionCodeField;
    private final IndexFieldReference<String> itemCodeField;

    private Bridge(IndexObjectFieldReference skuIdObjectField,
            IndexFieldReference<String> departmentCodeField,
            IndexFieldReference<String> collectionCodeField,
            IndexFieldReference<String> itemCodeField) {
        this.skuIdObjectField = skuIdObjectField;
        this.departmentCodeField = departmentCodeField;
        this.collectionCodeField = collectionCodeField;
        this.itemCodeField = itemCodeField;
    }

    @Override
    public void write(DocumentElement target, String skuId, PropertyBridgeWriteContext context) {
        DocumentElement skuIdObject = target.addObject( this.skuIdObjectField );(2)

        // An SKU identifier is formatted this way: "<department code>.<collection code>.<item code>".
        String[] skuIdParts = skuId.split( "\\." );
        skuIdObject.addValue( this.departmentCodeField, skuIdParts[0] ); (3)
        skuIdObject.addValue( this.collectionCodeField, skuIdParts[1] ); (3)
        skuIdObject.addValue( this.itemCodeField, skuIdParts[2] ); (3)
    }
}

// ... class continues below
1 Here the bridge class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.
2 The bridge creates an object to hold the various components of the SKU identifier.
3 The bridge populates the various components of the SKU identifier.
    // ... class SkuIdentifierBinder (continued)

    private static class SkuIdentifierMatchPredicateProvider implements NamedPredicateProvider { (1)
        @Override
        public SearchPredicate create(NamedPredicateProviderContext context) {
            SearchPredicateFactory f = context.predicate(); (2)

            String pattern = (String) context.param( "pattern" ); (3)

            return f.bool( b -> { (4)
                // An SKU identifier pattern is formatted this way: "<department code>.<collection code>.<item code>".
                // Each part supports * and ? wildcards.
                String[] patternParts = pattern.split( "\\." );
                if ( patternParts.length > 0 ) {
                    b.must( f.wildcard()
                            .field( "departmentCode" ) (5)
                            .matching( patternParts[0] ) );
                }
                if ( patternParts.length > 1 ) {
                    b.must( f.wildcard()
                            .field( "collectionCode" )
                            .matching( patternParts[1] ) );
                }
                if ( patternParts.length > 2 ) {
                    b.must( f.wildcard()
                            .field( "itemCode" )
                            .matching( patternParts[2] ) );
                }
            } ).toPredicate(); (6)
        }
    }
}
1 The named predicate provider must implement the NamedPredicateProvider interface.

Here the named predicate provider class is nested in the binder class, because it is more convenient, but you are obviously free to implement it in a separate Java file.

2 The context passed to the provider exposes the predicate factory, which is the entry point to the predicate DSL, used to create predicates.
3 The provider can access parameters that are passed when calling the named predicate. Retrieving a parameter with the param method assumes it has been defined by the caller; alternatively, paramOptional returns a java.util.Optional of the parameter.
4 The provider uses the predicate factory to create predicates. In this example, this implementation transforms a pattern with a custom format into three patterns, one for each field populated by the bridge.
5 Be careful: the search predicate factory expects paths relative to the object field where the named predicate was registered. Here the path departmentCode will be understood as <path to the object field>.departmentCode. See also Field paths.
6 Do not forget to call toPredicate() to return a SearchPredicate instance.
@Entity
@Indexed
public class ItemStock {

    @Id
    @PropertyBinding(binder = @PropertyBinderRef(type = SkuIdentifierBinder.class)) (1)
    private String skuId;

    private int amountInStock;

    // Getters and setters
    // ...


}
1 Apply the bridge using the @PropertyBinding annotation. The predicate will be available in the Search DSL, as shown in named: call a predicate defined in the mapping.
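Below is a minimal sketch of what calling this named predicate from the Search DSL might look like; refer to the section mentioned above for the authoritative syntax. Since the predicate is assigned to the object field created for the skuId property, its name is prefixed with that field’s path; the pattern value is hypothetical.

List<ItemStock> hits = searchSession.search( ItemStock.class )
        .where( f -> f.named( "skuId.skuIdMatch" )
                .param( "pattern", "STORE.*.*" ) )
        .fetchHits( 20 );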

6.11. Assigning default bridges with the bridge resolver

6.11.1. Basics

Both the @*Field annotations and the @DocumentId annotation support a broad range of standard types by default, without needing to tell Hibernate Search how to convert values to something that can be indexed.

Under the hood, the support for default types is handled by the bridge resolver. For example, when a property is mapped with @GenericField and neither @GenericField.valueBridge nor @GenericField.valueBinder is set, Hibernate Search will resolve the type of this property, then pass it to the bridge resolver, which will return an appropriate bridge, or fail if there isn’t any.

It is possible to customize the bridge resolver, to override existing default bridges (indexing java.util.Date differently, for example) or to define default bridges for additional types (a geospatial type from an external library, for example).

To that end, define a mapping configurer as explained in Programmatic mapping, then define bridges as shown below:

Example 108. Defining default bridges with a mapping configurer
public class MyDefaultBridgesConfigurer implements HibernateOrmSearchMappingConfigurer {
    @Override
    public void configure(HibernateOrmMappingConfigurationContext context) {
        context.bridges().exactType( MyCoordinates.class )
                .valueBridge( new MyCoordinatesBridge() ); (1)

        context.bridges().exactType( MyProductId.class )
                .identifierBridge( new MyProductIdBridge() ); (2)

        context.bridges().exactType( ISBN.class )
                .valueBinder( new ValueBinder() { (3)
                    @Override
                    public void bind(ValueBindingContext<?> context) {
                        context.bridge( ISBN.class, new ISBNValueBridge(),
                                context.typeFactory().asString().normalizer( "isbn" ) );
                    }
                } );
    }
}
1 Use our custom bridge (MyCoordinatesBridge) by default when a property of type MyCoordinates is mapped to an index field (e.g. with @GenericField).
2 Use our custom bridge (MyProductIdBridge) by default when a property of type MyProductId is mapped to a document identifier (e.g. with @DocumentId).
3 It’s also possible to specify a binder instead of a bridge, so that additional settings can be tuned. Here we’re assigning the "isbn" normalizer every time we map an ISBN to an index field.
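As an illustration, below is a hypothetical entity taking advantage of the configurer above: the MyCoordinates property can be mapped with a plain @GenericField, and the bridge resolver will pick the custom bridge automatically.

@Entity
@Indexed
public class Shop {

    @Id
    private Long id;

    @GenericField // resolved to MyCoordinatesBridge by the bridge resolver configured above
    private MyCoordinates location;

    // Getters and setters
    // ...
}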

6.11.2. Assigning a single binder to multiple types

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

For more advanced use cases, it is also possible to assign a single binder to subtypes of a given type. This is useful when many types should be indexed similarly.

Below is an example where enums are not indexed as their .name() (which is the default), but instead are indexed as their label retrieved from an external service.

Example 109. Assigning a single default binder to multiple types with a mapping configurer
context.bridges().subTypesOf( Enum.class ) (1)
        .valueBinder( new ValueBinder() {
            @Override
            public void bind(ValueBindingContext<?> context) {
                Class<?> enumType = context.bridgedElement().rawType(); (2)
                doBind( context, enumType );
            }

            private <T> void doBind(ValueBindingContext<?> context, Class<T> enumType) {
                BeanHolder<EnumLabelService> serviceHolder =
                        context.beanResolver().resolve( EnumLabelService.class, BeanRetrieval.ANY ); (3)
                context.bridge( enumType, new EnumLabelBridge<>( enumType, serviceHolder ) ); (4)
            }
        } );
1 Match all subtypes of Enum.
2 Retrieve the type of the element being bridged.
3 Retrieve an external service (through CDI/Spring).
4 Create and assign the bridge.

7. Managing the index schema

7.1. Basics

Before indexes can be used for indexing or searching, they must be created on disk (Lucene) or in the remote cluster (Elasticsearch). With Elasticsearch in particular, this creation may not be obvious since it requires describing the schema for each index, which includes in particular:

  • the definition of every analyzer or normalizer used in this index;

  • the definition of every single field used in this index, including in particular its type, the analyzer assigned to it, whether it requires doc values, etc.

Hibernate Search has all the necessary information to generate this schema automatically, so it is possible to delegate the task of managing the schema to Hibernate Search.

7.2. Automatic schema management on startup/shutdown

The property hibernate.search.schema_management.strategy can be set to one of the following values in order to define what to do with the indexes and their schema on startup and shutdown.

Strategy Definition Warnings

none

A strategy that does not do anything on startup or shutdown.

Indexes and their schema will not be created nor deleted on startup or shutdown. Hibernate Search will not even check that the index actually exists.

With Elasticsearch, indexes and their schema will have to be created explicitly before startup.

validate

A strategy that does not change indexes nor their schema, but checks that indexes exist and validates their schema on startup.

An exception will be thrown on startup if:

  • Indexes are missing

  • OR, with the Elasticsearch backend only, indexes exist but their schema does not match the requirements of the Hibernate Search mapping: missing fields, fields with incorrect type, missing analyzer definitions or normalizer definitions, …​

"Compatible" differences such as extra fields are ignored.

Indexes and their schema will have to be created explicitly before startup.

With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

create

A strategy that creates missing indexes and their schema on startup, but does not touch existing indexes and assumes their schema is correct without validating it.

create-or-validate (default)

A strategy that creates missing indexes and their schema on startup, and validates the schema of existing indexes.

With the Elasticsearch backend only, an exception will be thrown on startup if some indexes already exist but their schema does not match the requirements of the Hibernate Search mapping: missing fields, fields with incorrect type, missing analyzer definitions or normalizer definitions, …​

"Compatible" differences such as extra fields are ignored.

With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

create-or-update

A strategy that creates missing indexes and their schema on startup, and updates the schema of existing indexes if possible.

This strategy is unfit for production environments, due to several limitations including the impossibility to change the type of an existing field or the requirement to close indexes while updating analyzer definitions (which is not possible at all on AWS).

With the Lucene backend, schema update is a no-op, because local Lucene indexes don’t have a schema.

drop-and-create

A strategy that drops existing indexes and re-creates them and their schema on startup.

drop-and-create-and-drop

A strategy that drops existing indexes and re-creates them and their schema on startup, then drops the indexes on shutdown.

All indexed data will be lost on startup and shutdown.
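The strategy is selected through the hibernate.search.schema_management.strategy property mentioned at the beginning of this section. Below is a minimal sketch showing one way to set it when bootstrapping JPA programmatically; the persistence unit name is hypothetical, and the property can just as well be set in persistence.xml or any other Hibernate ORM configuration source.

Map<String, Object> properties = new HashMap<>();
properties.put( "hibernate.search.schema_management.strategy", "drop-and-create-and-drop" );
EntityManagerFactory entityManagerFactory =
        Persistence.createEntityManagerFactory( "my-persistence-unit", properties );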

7.3. Manual schema management

Schema management does not have to happen automatically on startup and shutdown.

Using the SearchSchemaManager interface, it is possible to trigger schema management operations explicitly after Hibernate Search has started.

The most common use case is to set the automatic schema management strategy to none and handle the creation/deletion of indexes manually when some other conditions are met, for example when the Elasticsearch cluster has finished booting.

After schema management operations are complete, you will often want to populate indexes. To that end, use the mass indexer.

The SearchSchemaManager interface exposes the following methods.

Method Definition Warnings

validate()

Does not change indexes nor their schema, but checks that indexes exist and validates their schema.

With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

createIfMissing()

Creates missing indexes and their schema, but does not touch existing indexes and assumes their schema is correct without validating it.

createOrValidate()

Creates missing indexes and their schema, and validates the schema of existing indexes.

With the Lucene backend, validation is limited to checking that the indexes exist, because local Lucene indexes don’t have a schema.

createOrUpdate()

Creates missing indexes and their schema, and updates the schema of existing indexes if possible.

With the Elasticsearch backend, updating a schema may fail.

With the Elasticsearch backend, updating a schema may close indexes while updating analyzer definitions (which is not possible at all on AWS).

With the Lucene backend, schema update is a no-op because local Lucene indexes don’t have a schema: this method just creates missing indexes.

dropIfExisting()

Drops existing indexes.

dropAndCreate()

Drops existing indexes and re-creates them and their schema.

Below is an example using a SearchSchemaManager to drop and create indexes, then using a mass indexer to re-populate the indexes. The dropAndCreateSchemaOnStart setting of the mass indexer would be an alternative solution to achieve the same results.

Example 110. Reinitializing indexes using a SearchSchemaManager
SearchSession searchSession = Search.session( entityManager ); (1)
SearchSchemaManager schemaManager = searchSession.schemaManager(); (2)
schemaManager.dropAndCreate(); (3)
searchSession.massIndexer().startAndWait(); (4)
1 Get a SearchSession.
2 Get a schema manager.
3 Drop and create the indexes. This method is synchronous and will only return after the operation is complete.
4 Optionally, trigger mass indexing.

You can also select entity types when creating a schema manager, to manage the indexes of these types only (and their indexed subtypes, if any):

Example 111. Reinitializing only some indexes using a SearchSchemaManager
SearchSchemaManager schemaManager = searchSession.schemaManager( Book.class ); (1)
schemaManager.dropAndCreate(); (2)
1 Get a schema manager targeting the index mapped to the Book entity type.
2 Drop and create the index for the Book entity only. Other indexes are unaffected.

7.4. How schema management works

Creating/updating a schema does not create/update indexed data

Creating or updating indexes and their schema through schema management will not populate the indexes:

  • newly created indexes will always be empty.

  • indexes with a recently updated schema will still contain the same indexed data, i.e. new fields won’t be added to documents just because they were added to the schema.

This is by design: reindexing is a potentially long-running task that should be triggered explicitly. To populate indexes with pre-existing data from the database, use mass indexing.

Dropping the schema means losing indexed data

Dropping a schema will drop the whole index, including all indexed data.

A dropped index will need to be re-created through schema management, then populated with pre-existing data from the database through mass indexing.

Schema validation and update are not effective with Lucene

The Lucene backend will only validate that the index actually exists and create missing indexes, because there is no concept of schema in Lucene beyond the existence of index segments.

Schema validation is permissive

With Elasticsearch, schema validation is as permissive as possible:

  • Fields that are unknown to Hibernate Search will be ignored.

  • Settings that are more powerful than required will be deemed valid. For example, a field that is not marked as sortable in Hibernate Search but marked as "doc_values": true in Elasticsearch will be deemed valid.

  • Analyzer/normalizer definitions that are unknown to Hibernate Search will be ignored.

One exception: date formats must match exactly the formats specified by Hibernate Search, due to implementation constraints.

Schema updates may fail

A schema update, triggered by the create-or-update strategy, may simply fail. This is because schemas may change in an incompatible way, such as a field having its type changed, or its analyzer changed, etc.

Worse, since updates are handled on a per-index basis, a schema update may succeed for one index but fail on another, leaving your schema as a whole half-updated.

For these reasons, using schema updates in a production environment is not recommended. Whenever the schema changes, you should either:

  • drop and create indexes, then reindex.

  • OR update the schema manually through custom scripts.

When a schema update fails, the create-or-update strategy will prevent Hibernate Search from starting, but it may already have successfully updated the schema of another index, making a rollback difficult.

Schema updates on Elasticsearch may close indexes

Elasticsearch does not allow updating analyzer/normalizer definitions on an open index. Thus, when analyzer or normalizer definitions have to be updated during a schema update, Hibernate Search will temporarily stop the affected indexes.

For this reason, the create-or-update strategy should be used with caution when multiple clients use Elasticsearch indexes managed by Hibernate Search: those clients should be synchronized in such a way that while Hibernate Search is starting, no other client needs to access the index.

Also, since Elasticsearch on Amazon Web Services (AWS) does not support the _close/_open operations, the schema update will fail when trying to update analyzer definitions on an AWS Elasticsearch cluster. The only workaround is to avoid the schema update on AWS. It should be avoided in production environments regardless: see "Schema updates may fail" above.

8. Indexing Hibernate ORM entities

8.1. Automatic indexing

By default, every time an entity is changed through a Hibernate ORM Session, if that entity is mapped to an index, Hibernate Search updates the relevant index.

Exactly how and when the index update happens depends on the coordination strategy; see Overview for more information.
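For instance, with the default configuration, persisting an indexed entity through the EntityManager is enough to get it indexed; no explicit Hibernate Search call is needed. The sketch below assumes the Book entity used elsewhere in this documentation.

entityManager.getTransaction().begin();
try {
    Book book = new Book();
    book.setTitle( "The Caves of Steel" );
    entityManager.persist( book ); // Hibernate Search detects this change
    entityManager.getTransaction().commit(); // with default settings, indexing happens on commit
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
}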

8.1.1. Overview

Below is a summary of how automatic indexing works depending on the configured coordination strategy.

Follow the links for more details.

Table 6. Comparison of automatic indexing depending on the coordination strategy
Columns: No coordination (default) / Outbox polling

  • Detects changes occurring in ORM sessions (session.persist(…​), session.delete(…​), setters, …​): Yes / Yes

  • Detects changes caused by JPQL or SQL queries (insert/update/delete): No / No

  • Associations must be updated on both sides: Yes / Yes

  • Changes triggering reindexing: Only relevant changes / Only relevant changes

  • Guarantee of index updates: When the commit returns (non-transactional) / On commit (transactional)

  • Visibility of index updates: Configurable: immediate (poor performance) or eventual / Eventual

  • Overhead for application threads: Low to medium / Very low

  • Overhead for the database: Low / Low to medium

8.1.2. Configuration

Automatic indexing may be unnecessary if your index is read-only or if you update it regularly by reindexing, either using the MassIndexer or manually. You can disable automatic indexing by setting the configuration property hibernate.search.automatic_indexing.enabled to false.

8.1.3. In-session entity change detection and limitations

Hibernate Search uses internal events of Hibernate ORM in order to detect changes. These events will be triggered if you actually manipulate managed entity objects in your code: calls to session.persist(…​) or session.delete(…​), calls to entity setters, etc.

This works great for most applications, but some limitations need to be considered: in particular, changes caused by JPQL or native SQL queries are not detected, and associations must be updated on both sides for reindexing to be triggered correctly.

8.1.4. Dirty checking

Hibernate Search is aware of the entity properties that are accessed when building indexed documents. Thanks to that knowledge, it is able to detect which entity changes are actually relevant to indexing, and to skip reindexing when a modified property does not affect the indexed document.

You can control this "dirty checking" by setting the boolean property hibernate.search.automatic_indexing.enable_dirty_check:

  • by default, or when set to true, Hibernate Search will consider whether modified properties are relevant before triggering reindexing.

  • when set to false, Hibernate Search will trigger reindexing upon any change, regardless of the entity properties that changed.

8.1.5. Synchronization with the indexes

Basics

For a preliminary introduction to writing to and reading from indexes in Hibernate Search, including in particular the concepts of commit and refresh, see Commit and refresh.

Indexing synchronization is only relevant when coordination is disabled.

With coordination strategies such as outbox-polling, indexing happens in background threads and is always asynchronous.

When a transaction is committed, with default coordination settings, automatic indexing can (and, by default, will) block the application thread until indexing reaches a certain level of completion.

There are two main reasons for blocking the thread:

  1. Indexed data safety: if, once the database transaction completes, index data must be safely stored to disk, an index commit is necessary. Without it, index changes may only be safe after a few seconds, when a periodic index commit happens in the background.

  2. Real-time search queries: if, once the database transaction completes, any search query must immediately take the index changes into account, an index refresh is necessary. Without it, index changes may only be visible after a few seconds, when a periodic index refresh happens in the background.

These two requirements are controlled by the synchronization strategy. The default strategy is defined by the configuration property hibernate.search.automatic_indexing.synchronization.strategy. Below is a reference of all available strategies and their guarantees.

Strategy / Throughput / Guarantees when the application thread resumes:

  • async: Best throughput. Changes applied (with or without commit): no guarantee. Changes safe from crash/power loss (commit): no guarantee. Changes visible on search (refresh): no guarantee.

  • write-sync (default): Medium throughput. Changes applied: guaranteed. Changes safe from crash/power loss: guaranteed. Changes visible on search: no guarantee.

  • read-sync: Medium to worst throughput. Changes applied: guaranteed. Changes safe from crash/power loss: no guarantee. Changes visible on search: guaranteed.

  • sync: Worst throughput. Changes applied: guaranteed. Changes safe from crash/power loss: guaranteed. Changes visible on search: guaranteed.

Depending on the backend and its configuration, the sync and read-sync strategies may lead to poor indexing throughput, because the backend may not be designed for frequent, on-demand index refreshes.

This is why these strategies are only recommended if you know your backend is designed for them, or for integration tests. In particular, the sync strategy will work fine with the default configuration of the Lucene backend, but will perform poorly with the Elasticsearch backend.

Indexing failures may be reported differently depending on the chosen strategy:

  • Failure to extract data from entities:

    • Regardless of the strategy, throws an exception in the application thread.

  • Failure to apply index changes (i.e. I/O operations on the index):

    • For strategies that apply changes immediately: throws an exception in the application thread.

    • For strategies that do not apply changes immediately: forwards the failure to the failure handler, which by default will simply log the failure.

  • Failure to commit index changes:

    • For strategies that guarantee an index commit: throws an exception in the application thread.

    • For strategies that do not guarantee an index commit: forwards the failure to the failure handler, which by default will simply log the failure.

Per-session override

While the configuration property mentioned above defines a default, it is possible to override this default on a particular session by calling SearchSession#automaticIndexingSynchronizationStrategy(…​) and passing a different strategy.

The built-in strategies can be retrieved by calling:

  • AutomaticIndexingSynchronizationStrategy.async()

  • AutomaticIndexingSynchronizationStrategy.writeSync()

  • AutomaticIndexingSynchronizationStrategy.readSync()

  • or AutomaticIndexingSynchronizationStrategy.sync()

Example 112. Overriding the automatic indexing synchronization strategy
SearchSession searchSession = Search.session( entityManager ); (1)
searchSession.automaticIndexingSynchronizationStrategy(
        AutomaticIndexingSynchronizationStrategy.sync()
); (2)

entityManager.getTransaction().begin();
try {
    Book book = entityManager.find( Book.class, 1 );
    book.setTitle( book.getTitle() + " (2nd edition)" ); (3)
    entityManager.getTransaction().commit(); (4)
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
}

List<Book> result = searchSession.search( Book.class )
        .where( f -> f.match().field( "title" ).matching( "2nd edition" ) )
        .fetchHits( 20 ); (5)
1 Obtain the search session, which by default uses the synchronization strategy configured in properties.
2 Override the synchronization strategy.
3 Change an entity.
4 Commit the changes, triggering reindexing.
5 The overridden strategy guarantees that the modified book will be present in these results, even though the query was executed just after the transaction commit.
Custom strategy

You can also implement a custom strategy. The custom strategy can then be set just like the built-in strategies (see the sketch after the list below):

  • as the default by setting the configuration property hibernate.search.automatic_indexing.synchronization.strategy to a bean reference pointing to the custom implementation, for example class:com.mycompany.MySynchronizationStrategy.

  • at the session level by passing an instance of the custom implementation to SearchSession#automaticIndexingSynchronizationStrategy(…​).
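For example, assuming a hypothetical com.mycompany.MySynchronizationStrategy class implementing the strategy contract, the per-session override looks the same as with built-in strategies:

SearchSession searchSession = Search.session( entityManager );
searchSession.automaticIndexingSynchronizationStrategy( new MySynchronizationStrategy() );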

8.2. Reindexing large volumes of data with the MassIndexer

8.2.1. Basics

There are cases where automatic indexing is not enough, because a pre-existing database has to be indexed:

  • when restoring a database backup;

  • when indexes had to be wiped, for example because the Hibernate Search mapping or some core settings changed;

  • when automatic indexing had to be disabled for performance reasons, and periodic reindexing (every night, …​) is preferred.

To address these situations, Hibernate Search provides the MassIndexer: a tool to rebuild indexes completely based on the content of the database. It can be told to reindex a few selected indexed types, or all of them.

The MassIndexer takes the following approach to provide a reasonably high throughput:

  • Indexes are purged completely when mass indexing starts.

  • Mass indexing is performed by several parallel threads, each loading data from the database and sending indexing requests to the indexes.

Because of the initial index purge, and because mass indexing is a very resource-intensive operation, it is recommended to take your application offline while the MassIndexer works.

Querying the index while a MassIndexer is busy may be slower than usual and will likely return incomplete results.

The following snippet of code will rebuild the index of all indexed entities, deleting the index and then reloading all entities from the database.

Example 113. Reindexing everything using a MassIndexer
SearchSession searchSession = Search.session( entityManager ); (1)
searchSession.massIndexer() (2)
        .startAndWait(); (3)
1 Get the SearchSession.
2 Create a MassIndexer targeting every indexed entity type.
3 Start the mass indexing process and return when it is over.

The MassIndexer creates its own, separate sessions and (read-only) transactions, so there is no need to begin a database transaction before the MassIndexer is started or to commit a transaction after it is done.

A note to MySQL users: the MassIndexer uses forward-only scrollable results to iterate on the primary keys to be loaded, but MySQL’s JDBC driver will preload all values in memory.

To avoid this "optimization" set the idFetchSize parameter to Integer.MIN_VALUE.

You can also select entity types when creating a mass indexer, to reindex only these types (and their indexed subtypes, if any):

Example 114. Reindexing selected types using a MassIndexer
searchSession.massIndexer( Book.class ) (1)
        .startAndWait(); (2)
1 Create a MassIndexer targeting the Book type and its indexed subtypes (if any).
2 Start the mass indexing process for the selected types and return when it is over.

It is possible to run the mass indexer asynchronously, because the mass indexer does not rely on the original Hibernate ORM session. When used asynchronously, the mass indexer will return a completion stage to track the completion of mass indexing:

Example 115. Reindexing asynchronously using a MassIndexer
searchSession.massIndexer() (1)
        .start() (2)
        .thenRun( () -> { (3)
            logger.info( "Mass indexing succeeded!" );
        } )
        .exceptionally( throwable -> {
            logger.error( "Mass indexing failed!", throwable );
            return null;
        } );

// OR
Future<?> future = searchSession.massIndexer().start()
        .toCompletableFuture(); (4)
1 Create a MassIndexer.
2 Start the mass indexing process, but do not wait for the process to finish. A CompletionStage is returned.
3 The CompletionStage exposes methods to execute more code after indexing is complete.
4 Alternatively, call toCompletableFuture() on the returned object to get a Future.

Although the MassIndexer is simple to use, some tweaking is recommended to speed up the process. Several optional parameters are available, and can be set as shown below, before the mass indexer starts. See MassIndexer parameters for a reference of all available parameters, and Tuning the MassIndexer for best performance for details about key topics.

Example 116. Using a tuned MassIndexer
searchSession.massIndexer() (1)
        .idFetchSize( 150 ) (2)
        .batchSizeToLoadObjects( 25 ) (3)
        .threadsToLoadObjects( 12 ) (4)
        .startAndWait(); (5)
1 Create a MassIndexer.
2 Load Book identifiers by batches of 150 elements.
3 Load Book entities to reindex by batches of 25 elements.
4 Create 12 parallel threads to load the Book entities.
5 Start the mass indexing process and return when it is over.

Running the MassIndexer with many threads will require many connections to the database. If you don’t have a sufficiently large connection pool, the MassIndexer itself and/or your other applications could starve and be unable to serve other requests: make sure you size your connection pool according to the mass indexing parameters, as explained in Threads and JDBC connections.

8.2.2. Conditional reindexing

You can also select a subset of target entities to be reindexed.

Example 117. Use of Conditional reindexing
SearchSession searchSession = Search.session( entityManager ); (1)
MassIndexer massIndexer = searchSession.massIndexer(); (2)
massIndexer.type( Book.class ).reindexOnly( "e.publicationYear <= 2100" ); (3)
massIndexer.type( Author.class ).reindexOnly( "e.birthDate < :birthDate" ) (4)
        .param( "birthDate", LocalDate.ofYearDay( 2100, 77 ) ); (5)
massIndexer.startAndWait(); (6)
1 Get the SearchSession.
2 Create a MassIndexer targeting every indexed entity type.
3 Reindex only the books published before year 2100.
4 Reindex only the authors born prior to a given local date.
5 In this example the date is passed as a query parameter.
6 Start the mass indexing process and return when it is over.

Even if the reindexing is applied on a subset of entities, by default all entities will be purged at the start. The purge can be disabled completely, but when enabled there is no way to filter the entities that will be purged.

See HSEARCH-3304 for more information.

8.2.3. MassIndexer parameters

Table 7. MassIndexer parameters
Setter Default value Description

typesToIndexInParallel(int)

1

The number of types to index in parallel.

threadsToLoadObjects(int)

6

The number of threads for entity loading, for each type indexed in parallel. That is to say, the number of threads spawned for entity loading will be typesToIndexInParallel * threadsToLoadObjects (+ 1 thread per type to retrieve the IDs of entities to load).

idFetchSize(int)

100

The fetch size to be used when loading primary keys. Some databases accept special values, for example MySQL might benefit from using Integer#MIN_VALUE, otherwise it will attempt to preload everything in memory.

batchSizeToLoadObjects(int)

10

The fetch size to be used when loading entities from database. Some databases accept special values, for example MySQL might benefit from using Integer#MIN_VALUE, otherwise it will attempt to preload everything in memory.

dropAndCreateSchemaOnStart(boolean)

false

Drops the indexes and their schema (if they exist) and re-creates them before indexing.

Indexes will be unavailable for a short time during the dropping and re-creation, so this should only be used when failures of concurrent operations on the indexes (automatic indexing, …​) are acceptable.

This should be used when the existing schema is known to be obsolete, for example when the Hibernate Search mapping changed and some fields now have a different type, a different analyzer, new capabilities (projectable, …​), etc.

This may also be used when the schema is up-to-date, since it can be faster than a purge (purgeAllOnStart) on large indexes, especially with the Elasticsearch backend.

As an alternative to this parameter, you can also use a schema manager to manage schemas manually at the time of your choosing: Manual schema management.

purgeAllOnStart(boolean)

true

Removes all entities from the indexes before indexing.

Only set this to false if you know the index is already empty; otherwise, you will end up with duplicates in the index.

mergeSegmentsAfterPurge(boolean)

true

Force merging of each index into a single segment after the initial index purge, just before indexing. This setting has no effect if purgeAllOnStart is set to false.

mergeSegmentsOnFinish(boolean)

false

Force merging of each index into a single segment after indexing. This operation does not always improve performance: see Merging segments and performance.

cacheMode(CacheMode)

CacheMode.IGNORE

The Hibernate CacheMode when loading entities. The default is CacheMode.IGNORE, and it will be the most efficient choice in most cases, but using another mode such as CacheMode.GET may be more efficient if many of the entities being indexed refer to a small set of other entities.

transactionTimeout

-

Only supported in JTA-enabled environments. Timeout of transactions for loading ids and entities to be re-indexed. The timeout should be long enough to load and index all entities of one type. Note that these transactions are read-only, so choosing a large value (e.g. 1800, meaning 30 minutes) should not cause any problem.

limitIndexedObjectsTo(long)

-

The maximum number of results to load per entity type. This parameter lets you define a threshold value to avoid accidentally loading too many entities. The value defined must be greater than 0. The parameter is not used by default. It is equivalent to the LIMIT keyword in SQL.

monitor(MassIndexingMonitor)

A logging monitor.

The component responsible for monitoring progress of mass indexing.

As a MassIndexer can take some time to finish its job, it is often necessary to monitor its progress. The default, built-in monitor logs progress periodically at the INFO level, but a custom monitor can be set by implementing the MassIndexingMonitor interface and passing an instance using the monitor method.

Implementations of MassIndexingMonitor must be thread-safe.

failureHandler(MassIndexingFailureHandler)

A failure handler.

The component responsible for handling failures occurring during mass indexing.

A MassIndexer performs multiple operations in parallel, some of which can fail without stopping the whole mass indexing process. As a result, it may be necessary to trace individual failures.

The default, built-in failure handler just forwards the failures to the global background failure handler, which by default will log them at the ERROR level, but a custom handler can be set by implementing the MassIndexingFailureHandler interface and passing an instance using the failureHandler method. This can be used to simply log failures in a context specific to the mass indexer, e.g. a web interface in a maintenance console from which mass indexing was requested, or for more advanced use cases, such as cancelling mass indexing on the first failure.

Implementations of MassIndexingFailureHandler must be thread-safe.

8.2.4. Tuning the MassIndexer for best performance

Basics

The MassIndexer was designed to finish the re-indexing task as quickly as possible, but there is no one-size-fits-all solution, so some configuration is required to get the best out of it.

Performance optimization can get quite complex, so keep the following in mind while you attempt to configure the MassIndexer:

  • Always test your changes to assess their actual effect: advice provided in this section is true in general, but each application and environment is different, and some options, when combined, may produce unexpected results.

  • Take baby steps: before tuning mass indexing with 40 indexed entity types with two million instances each, try a more reasonable scenario with only one entity type, optionally limiting the number of entities to index to assess performance more quickly (see the sketch after this list).

  • Tune your entity types individually before you try to tune a mass indexing operation that indexes multiple entity types in parallel.
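
For instance, a first tuning run could target a single entity type and cap the number of indexed instances. Below is a minimal sketch using the limitIndexedObjectsTo parameter from MassIndexer parameters; the values are purely illustrative.

searchSession.massIndexer( Book.class ) // Target a single entity type while experimenting
        .limitIndexedObjectsTo( 10_000 ) // Cap the number of entities to index, to get feedback faster
        .startAndWait();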

Threads and JDBC connections

Increasing parallelism usually helps as the bottleneck usually is the latency to the database connection: it’s probably worth it to experiment with a number of threads significantly higher than the number of actual cores available.

However, each thread requires one JDBC connection, and JDBC connections are usually in limited supply. In order to increase the number of threads safely:

  1. You should make sure your database can actually handle the resulting number of connections.

  2. Your JDBC connection pool should be configured to provide a sufficient number of connections.

  3. The above should take into account the rest of your application (request threads in a web application): ignoring this may bring other processes to a halt while the MassIndexer is working.

There is a simple formula to understand how the different options applied to the MassIndexer affect the number of used worker threads and connections:

if ( using the default 'none' coordination strategy ) {
    threadsToCoordinate = 0;
}
else {
    threadsToCoordinate = 1;
}
threadsToLoadIdentifiers = 1;
threads = threadsToCoordinate + typesToIndexInParallel * (threadsToLoadObjects + threadsToLoadIdentifiers);
required JDBC connections = threads;
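
For example, with the default 'none' coordination strategy, typesToIndexInParallel( 2 ) and threadsToLoadObjects( 5 ) give threads = 0 + 2 * (5 + 1) = 12, so mass indexing alone will use 12 worker threads and 12 JDBC connections, on top of whatever the rest of the application needs.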

Here are a few suggestions for a roughly sane tuning starting point for the parameters that affect parallelism:

typesToIndexInParallel

Should probably be a low value, like 1 or 2, depending on how much spare CPU capacity you have and how slow a database round trip will be.

threadsToLoadObjects

Higher values increase the rate at which entities are preloaded from the database, but also increase memory usage and the pressure on the threads working on subsequent indexing. Note that each thread will extract data from the entity to reindex, which, depending on your mapping, might require accessing lazy associations and loading associated entities, thus making blocking calls to the database; you will therefore probably need a high number of threads working in parallel.

All internal thread groups have meaningful names prefixed with "Hibernate Search", so they should be easily identified with most diagnostic tools, including simply thread dumps.

8.3. Reindexing large volumes of data with the JSR-352 integration

Hibernate Search provides a JSR-352 job to perform mass indexing. It covers not only the existing functionality of the mass indexer described above, but also benefits from some powerful standard features of the Java Batch Platform (JSR-352), such as failure recovery using checkpoints, chunk oriented processing, and parallel execution. This batch job accepts different entity type(s) as input, loads the relevant entities from the database, then rebuilds the full-text index from these.

However, it requires a batch runtime for the execution. Note that we don’t provide any batch runtime; you are free to choose one that fits your needs, e.g. the default batch runtime embedded in your Java EE container. We provide full integration with the JBeret implementation (see how to configure it here). Other implementations can also be used, but will require a bit more configuration on your side.

If the runtime is JBeret, you need to add the following dependency:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm-batch-jsr352-jberet</artifactId>
   <version>6.1.8.Final</version>
</dependency>

For any other runtime, you need to add the following dependency:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm-batch-jsr352-core</artifactId>
   <version>6.1.8.Final</version>
</dependency>

Here is an example of how to run a batch instance:

Example 118. Reindexing everything using a JSR-352 mass-indexing job
Properties jobProps = MassIndexingJob.parameters() (1)
        .forEntities( Book.class, Author.class ) (2)
        .build();

JobOperator jobOperator = BatchRuntime.getJobOperator(); (3)
long executionId = jobOperator.start( MassIndexingJob.NAME, jobProps ); (4)
1 Start building parameters for a mass-indexing job.
2 Define some parameters. In this case, the list of the entity types to be indexed.
3 Get the JobOperator from the framework.
4 Start the job.

8.3.1. Job Parameters

The following table contains all the job parameters you can use to customize the mass-indexing job.

Table 8. Job Parameters in JSR 352 Integration
Parameter Name / Builder Method Default value Description

entityTypes / .forEntity(Class<?>), .forEntities(Class<?>, Class<?>…​)

-

This parameter is always required.

The entity types to index in this job execution, comma-separated.

purgeAllOnStart / .purgeAllOnStart(boolean)

True

Specify whether the existing index should be purged at the beginning of the job. This operation takes place before indexing.

mergeSegmentsAfterPurge / .mergeSegmentsAfterPurge(boolean)

True

Specify whether the mass indexer should merge segments at the beginning of the job. This operation takes place after the purge operation and before indexing.

mergeSegmentsOnFinish / .mergeSegmentsOnFinish(boolean)

True

Specify whether the mass indexer should merge segments at the end of the job. This operation takes place after indexing.

cacheMode / .cacheMode(CacheMode)

IGNORE

Specify the Hibernate CacheMode when loading entities. The default is IGNORE, and it will be the most efficient choice in most cases, but using another mode such as GET may be more efficient if many of the entities being indexed are already present in the Hibernate ORM second-level cache before mass indexing. Enabling caches has an effect only if the entity id is also the document id, which is the default. PUT or NORMAL values may lead to bad performance, because all the entities are also loaded into the Hibernate second-level cache.

idFetchSize / .idFetchSize(int)

1000

Specifies the fetch size to be used when loading primary keys. Some databases accept special values, for example MySQL might benefit from using Integer#MIN_VALUE, otherwise it will attempt to preload everything in memory.

entityFetchSize / .entityFetchSize(int)

The value of sessionClearInterval

Specifies the fetch size to be used when loading entities from database. Some databases accept special values, for example MySQL might benefit from using Integer#MIN_VALUE, otherwise it will attempt to preload everything in memory.

customQueryHQL / .restrictedBy(String)

-

Use HQL / JPQL to index entities of a target entity type. Your query should contain only one entity type. Mixing this approach with the criteria restriction is not allowed. Note that there is no query validation of your input. See Indexing mode for more details and limitations.

maxResultsPerEntity / .maxResultsPerEntity(int)

-

The maximum number of results to load per entity type. This parameter lets you define a threshold value to avoid accidentally loading too many entities. The value defined must be greater than 0. The parameter is not used by default. It is equivalent to the LIMIT keyword in SQL.

rowsPerPartition / .rowsPerPartition(int)

20,000

The maximum number of rows to process per partition. The value defined must be greater than 0, and equal to or greater than the value of checkpointInterval.

maxThreads / .maxThreads(int)

The number of partitions

The maximum number of threads to use for processing the job. Note that the batch runtime cannot guarantee that the requested number of threads is available; it will use as many as it can, up to the requested maximum.

checkpointInterval / .checkpointInterval(int)

2,000, or the value of rowsPerPartition if it is smaller

The number of entities to process before triggering a checkpoint. The value defined must be greater than 0, and equal to or less than the value of rowsPerPartition.

sessionClearInterval / .sessionClearInterval(int)

200, or the value of checkpointInterval if it is smaller

The number of entities to process before clearing the session. The value defined must be greater than 0, and equal to or less than the value of checkpointInterval.

entityManagerFactoryReference / .entityManagerFactoryReference(String)

-

This parameter is required when there is more than one persistence unit.

The string that will identify the EntityManagerFactory.

entityManagerFactoryNamespace / .entityManagerFactoryNamespace(String)

-

-

8.3.2. Indexing mode

The mass indexing job allows you to define your own entities to be indexed — you can start a full indexing or a partial indexing through two different methods: selecting the desired entity types, or using HQL.

Example 119. Conditional reindexing using a restrictedBy HQL parameter
Properties jobProps = MassIndexingJob.parameters() (1)
        .forEntities( Author.class ) (2)
        .restrictedBy( "from Author a where a.lastName = 'Smith1'" ) (3)
        .build();

JobOperator jobOperator = BatchRuntime.getJobOperator(); (4)
long executionId = jobOperator.start( MassIndexingJob.NAME, jobProps ); (5)
1 Start building parameters for a mass-indexing job.
2 Define the entity type to be indexed.
3 Restrict the scope of the job using an HQL restriction.
4 Get the JobOperator from the framework.
5 Start the job.

While the full indexing is useful when you perform the very first indexing, or after extensive changes to your whole database, it may also be time-consuming. If you want to reindex only part of your data, you need to add restrictions using HQL: they help you to define a customized selection, and only the entities inside that selection will be indexed. A typical use case is to index only the new entities that appeared since yesterday.

Note that, as detailed below, some features may not be supported depending on the indexing mode.

Table 9. Comparison of each indexing mode
Indexing mode Scope Parallel Indexing

Full Indexation

All entities

Supported

HQL

Some entities

Not supported

When using the HQL mode, there isn’t any query validation before the job’s start. If the query is invalid, the job will start and fail.

Also, parallel indexing is disabled in HQL mode, because our current parallelism implementation relies on selection order, which might not be provided by the HQL given by the user.

Because of those limitations, we suggest you use this approach only for indexing small numbers of entities, and only if you know that no entities matching the query will be created during indexing.

8.3.3. Parallel indexing

For better performance, indexing is performed in parallel using multiple threads. The set of entities to index is split into multiple partitions. Each thread processes one partition at a time.

The following section will explain how to tune the parallel execution.

The "sweet spot" of number of threads, fetch size, partition size, etc. to achieve best performance is highly dependent on your overall architecture, database design and even data values.

You should experiment with these settings to find out what’s best in your particular case.

Threads

The maximum number of threads used by the job execution is defined through the maxThreads() method. Within the N threads given, there’s 1 thread reserved for the core, so only N - 1 threads are available for different partitions. If N = 1, the program will work, and all batch elements will run in the same thread. The default number of threads used in Hibernate Search is 10. You can override it with your preferred number.

MassIndexingJob.parameters()
        .maxThreads( 5 )
        ...

Note that the batch runtime cannot guarantee that the requested number of threads is available; it will use as many as possible, up to the requested maximum (JSR352 v1.0 Final Release, page 34). Note also that all batch jobs share the same thread pool, so it’s not always a good idea to execute jobs concurrently.

Rows per partition

Each partition consists of a fixed number of elements to index. You may tune exactly how many elements a partition will hold with rowsPerPartition.

MassIndexingJob.parameters()
        .rowsPerPartition( 5000 )
        ...

This property has nothing to do with "chunk size", which is how many elements are processed together between each write. That aspect of processing is addressed by chunking.

Instead, rowsPerPartition is more about how parallel your mass indexing job will be.

Please see the Chunking section to see how to tune chunking.

When rowsPerPartition is low, there will be many small partitions, so processing threads will be less likely to starve (stay idle because there’s no more partition to process), but on the other hand you will only be able to take advantage of a small fetch size, which will increase the number of database accesses. Also, due to the failure recovery mechanisms, there is some overhead in starting a new partition, so with an unnecessarily large number of partitions, this overhead will add up.

When rowsPerPartition is high, there will be a few big partitions, so you will be able to take advantage of a higher chunk size, and thus a higher fetch size, which will reduce the number of database accesses, and the overhead of starting a new partition will be less noticeable, but on the other hand you may not use all the threads available.

Each partition deals with one root entity type, so two different entity types will never run under the same partition.

8.3.4. Chunking and session clearing

The mass indexing job supports restarting a suspended or failed job more or less from where it stopped.

This is made possible by splitting each partition in several consecutive chunks of entities, and saving process information in a checkpoint at the end of each chunk. When a job is restarted, it will resume from the last checkpoint.

The size of each chunk is determined by the checkpointInterval parameter.

MassIndexingJob.parameters()
        .checkpointInterval( 1000 )
        ...

But the size of a chunk is not only about saving progress, it is also about performance:

  • a new Hibernate session is opened for each chunk;

  • a new transaction is started for each chunk;

  • inside a chunk, the session is cleared periodically according to the sessionClearInterval parameter, which must thereby be smaller than (or equal to) the chunk size;

  • documents are flushed to the index at the end of each chunk.

In general the checkpoint interval should be small compared to the number of rows per partition.

Indeed, due to the failure recovery mechanism, the elements before the first checkpoint of each partition will take longer to process than the others, so in a 1000-element partition, having a 100-element checkpoint interval will be faster than having a 1000-element checkpoint interval.

On the other hand, chunks shouldn’t be too small in absolute terms. Performing a checkpoint means your JSR-352 runtime will write information about the progress of the job execution to its persistent storage, which also has a cost. Also, a new transaction and session are created for each chunk which doesn’t come for free, and implies that setting the fetch size to a value higher than the chunk size is pointless. Finally, the index flush performed at the end of each chunk is an expensive operation that involves a global lock, which essentially means that the less you do it, the faster indexing will be. Thus having a 1-element checkpoint interval is definitely not a good idea.
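
Putting these constraints together, the sketch below sets the three related parameters so that rowsPerPartition >= checkpointInterval >= sessionClearInterval; the values are purely illustrative and should be tuned for your data.

MassIndexingJob.parameters()
        .forEntities( Book.class )
        .rowsPerPartition( 5000 ) // each partition holds at most 5000 entities
        .checkpointInterval( 500 ) // save progress and flush documents every 500 entities
        .sessionClearInterval( 100 ) // clear the session every 100 entities to keep memory usage low
        ...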

8.3.5. Selecting the persistence unit (EntityManagerFactory)

Regardless of how the entity manager factory is retrieved, you must make sure that the entity manager factory used by the mass indexer will stay open during the whole mass indexing process.

JBeret

If your JSR-352 runtime is JBeret (used in WildFly in particular), you can use CDI to retrieve the EntityManagerFactory.

If you use only one persistence unit, the mass indexer will be able to access your database automatically without any special configuration.

If you want to use multiple persistence units, you will have to register the EntityManagerFactories as beans in the CDI context. Note that entity manager factories will probably not be considered as beans by default, in which case you will have to register them yourself. You may use an application-scoped bean to do so:

@ApplicationScoped
public class EntityManagerFactoriesProducer {

    @PersistenceUnit(unitName = "db1")
    private EntityManagerFactory db1Factory;

    @PersistenceUnit(unitName = "db2")
    private EntityManagerFactory db2Factory;

    @Produces
    @Singleton
    @Named("db1") // The name to use when referencing the bean
    public EntityManagerFactory createEntityManagerFactoryForDb1() {
        return db1Factory;
    }

    @Produces
    @Singleton
    @Named("db2") // The name to use when referencing the bean
    public EntityManagerFactory createEntityManagerFactoryForDb2() {
        return db2Factory;
    }
}

Once the entity manager factories are registered in the CDI context, you can instruct the mass indexer to use one in particular by naming it using the entityManagerFactoryReference parameter.

Due to limitations of the CDI APIs, it is not currently possible to reference an entity manager factory by its persistence unit name when using the mass indexer with CDI.
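
For instance, to target the persistence unit exposed above under the bean name "db1", the job parameters could be built as in the minimal sketch below, reusing the Book entity from earlier examples:

Properties jobProps = MassIndexingJob.parameters()
        .forEntities( Book.class )
        .entityManagerFactoryReference( "db1" ) // the CDI bean name declared by the producer above
        .build();

long executionId = BatchRuntime.getJobOperator().start( MassIndexingJob.NAME, jobProps );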

Other DI-enabled JSR-352 implementations

If you want to use a different JSR-352 implementation that happens to allow dependency injection:

  1. You must map the following two scope annotations to the relevant scope in the dependency injection mechanism:

    • org.hibernate.search.batch.jsr352.core.inject.scope.spi.HibernateSearchJobScoped

    • org.hibernate.search.batch.jsr352.core.inject.scope.spi.HibernateSearchPartitionScoped

  2. You must make sure that the dependency injection mechanism will register all injection-annotated classes (@Named, …​) from the hibernate-search-mapper-orm-batch-jsr352-core module in the dependency injection context. For instance this can be achieved in Spring DI using the @ComponentScan annotation.

  3. You must register a single bean in the dependency injection context that will implement the EntityManagerFactoryRegistry interface.

Plain Java environment (no dependency injection at all)

The following will work only if your JSR-352 runtime does not support dependency injection at all, i.e. it ignores @Inject annotations in batch artifacts. This is the case for JBatch in Java SE mode, for instance.

If you use only one persistence unit, the mass indexer will be able to access your database automatically without any special configuration: you only have to make sure to create the EntityManagerFactory (or SessionFactory) in your application before launching the mass indexer.

If you want to use multiple persistence units, you will have to add two parameters when launching the mass indexer:

  • entityManagerFactoryReference: this is the string that will identify the EntityManagerFactory.

  • entityManagerFactoryNamespace: this allows you to select how you want to reference the EntityManagerFactory. Possible values are:

    • persistence-unit-name (the default): use the persistence unit name defined in persistence.xml.

    • session-factory-name: use the session factory name defined in the Hibernate configuration by the hibernate.session_factory_name configuration property.

If you set the hibernate.session_factory_name property in the Hibernate configuration, and you don’t use JNDI, you will also have to set hibernate.session_factory_name_is_jndi to false.
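
For example, to reference a session factory by the name set through hibernate.session_factory_name, both parameters can be passed through the builder; here "my-session-factory" is a hypothetical name:

Properties jobProps = MassIndexingJob.parameters()
        .forEntities( Book.class )
        .entityManagerFactoryNamespace( "session-factory-name" ) // reference by session factory name instead of persistence unit name
        .entityManagerFactoryReference( "my-session-factory" ) // hypothetical value of hibernate.session_factory_name
        .build();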

8.4. Manual indexing

8.4.1. Basics

While automatic indexing and the MassIndexer or the mass indexing job should take care of most needs, it is sometimes necessary to control indexing manually, for example to reindex just a few entity instances that were affected by changes to the database that automatic indexing cannot detect, such as JPQL/SQL insert, update or delete queries.

To address these use cases, Hibernate Search exposes several APIs explained in the following sections.

As with everything in Hibernate Search, these APIs only affect the Hibernate Search indexes: they do not write anything to the database.

8.4.2. Controlling entity reads and index writes with SearchIndexingPlan

A fairly common use case when manipulating large datasets with JPA is the periodic "flush-clear" pattern, where a loop reads or writes entities for every iteration and flushes then clears the session every n iterations. This pattern allows processing a large number of entities while keeping the memory footprint reasonably low.

Below is an example of this pattern to persist a large number of entities when not using Hibernate Search.

Example 120. A batch process with JPA
entityManager.getTransaction().begin();
try {
    for ( int i = 0 ; i < NUMBER_OF_BOOKS ; ++i ) { (1)
        Book book = newBook( i );
        entityManager.persist( book ); (2)

        if ( ( i + 1 ) % BATCH_SIZE == 0 ) {
            entityManager.flush(); (3)
            entityManager.clear(); (4)
        }
    }
    entityManager.getTransaction().commit();
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}
1 Execute a loop for a large number of elements, inside a transaction.
2 For every iteration of the loop, instantiate a new entity and persist it.
3 Every BATCH_SIZE iterations of the loop, flush the entity manager to send the changes to the database-side buffer.
4 After a flush, clear the ORM session to release some memory.

With Hibernate Search 6 (contrary to Hibernate Search 5 and earlier), this pattern will work as expected.

However, each flush call will potentially add data to an internal buffer, which for large volumes of data may lead to an OutOfMemoryError, depending on the JVM heap size, the coordination strategy and the complexity and number of documents.

If you run into memory issues, the first solution is to break down the batch process into multiple transactions, each handling a smaller number of elements: the internal document buffer will be cleared after each transaction.

See below for an example.

With this pattern, if one transaction fails, part of the data will already be in the database and in indexes, with no way to roll back the changes.

However, the indexes will be consistent with the database, and it will be possible to (manually) restart the process from the last transaction that failed.

Example 121. A batch process with Hibernate Search using multiple transactions
try {
    int i = 0;
    while ( i < NUMBER_OF_BOOKS ) { (1)
        entityManager.getTransaction().begin(); (2)
        int end = Math.min( i + BATCH_SIZE, NUMBER_OF_BOOKS ); (3)
        for ( ; i < end; ++i ) {
            Book book = newBook( i );
            entityManager.persist( book ); (4)
        }
        entityManager.getTransaction().commit(); (5)
    }
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}
1 Add an outer loop that creates one transaction per iteration.
2 Begin the transaction at the beginning of each iteration of the outer loop.
3 Only handle a limited number of elements per transaction.
4 For every iteration of the loop, instantiate a new entity and persist it. Note we’re relying on automatic indexing to index the entity, but this would work just as well if automatic indexing was disabled, only requiring an extra call to index the entity. See Explicitly indexing and deleting specific documents.
5 Commit the transaction at the end of each iteration of the outer loop. The entities will be flushed and indexed automatically.

The multi-transaction solution and the original flush()/clear() loop pattern can be combined, breaking down the process in multiple medium-sized transactions, and periodically calling flush/clear inside each transaction.

This combined solution is the most flexible, hence the most suitable if you want to fine-tune your batch process.
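
A minimal sketch of that combined approach is shown below; NUMBER_OF_BOOKS, BATCH_SIZE and newBook() are the same hypothetical helpers as in the previous examples, and ENTITIES_PER_TRANSACTION is an additional illustrative constant.

try {
    int i = 0;
    while ( i < NUMBER_OF_BOOKS ) {
        entityManager.getTransaction().begin(); // one medium-sized transaction per iteration of the outer loop
        int end = Math.min( i + ENTITIES_PER_TRANSACTION, NUMBER_OF_BOOKS );
        for ( ; i < end; ++i ) {
            Book book = newBook( i );
            entityManager.persist( book );
            if ( ( i + 1 ) % BATCH_SIZE == 0 ) {
                entityManager.flush(); // send pending changes to the database-side buffer
                entityManager.clear(); // release memory held by already-flushed entities
            }
        }
        entityManager.getTransaction().commit(); // remaining entities are flushed and indexed on commit
    }
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}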

If breaking down the batch process into multiple transactions is not an option, a second solution is to just write to indexes after the call to session.flush()/session.clear(), without waiting for the database transaction to be committed: the internal document buffer will be cleared after each write to indexes.

This is done by calling the execute() method on the indexing plan, as shown in the example below.

With this pattern, if an exception is thrown, part of the data will already be in the index, with no way to roll back the changes, while the database changes will have been rolled back. The index will thus be inconsistent with the database.

To recover from that situation, you will have to either execute the exact same database changes that failed manually (to get the database back in sync with the index), or reindex the entities affected by the transaction manually (to get the index back in sync with the database).

Of course, if you can afford to take the indexes offline for a longer period of time, a simpler solution would be to wipe the indexes clean and reindex everything.

Example 122. A batch process with Hibernate Search using execute()
SearchSession searchSession = Search.session( entityManager ); (1)
SearchIndexingPlan indexingPlan = searchSession.indexingPlan(); (2)

entityManager.getTransaction().begin();
try {
    for ( int i = 0 ; i < NUMBER_OF_BOOKS ; ++i ) {
        Book book = newBook( i );
        entityManager.persist( book ); (3)

        if ( ( i + 1 ) % BATCH_SIZE == 0 ) {
            entityManager.flush();
            entityManager.clear();
            indexingPlan.execute(); (4)
        }
    }
    entityManager.getTransaction().commit(); (5)
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}
1 Get the SearchSession.
2 Get the search session’s indexing plan.
3 For every iteration of the loop, instantiate a new entity and persist it. Note we’re relying on automatic indexing to index the entity, but this would work just as well if automatic indexing was disabled, only requiring an extra call to index the entity. See Explicitly indexing and deleting specific documents.
4 After a flush()/clear(), call indexingPlan.execute(). The entities will be processed and the changes will be sent to the indexes immediately. Hibernate Search will wait for index changes to be "completed" as required by the configured synchronization strategy.
5 After the loop, commit the transaction. The remaining entities that were not flushed/cleared will be flushed and indexed automatically.

8.4.3. Explicitly indexing and deleting specific documents

When automatic indexing is disabled, the indexes will start empty and stay that way until explicit indexing commands are sent to Hibernate Search.

Indexing is done in the context of an ORM session using the SearchIndexingPlan interface. This interface represents the (mutable) set of changes that are planned in the context of a session, and will be applied to indexes upon transaction commit.

This interface offers the following methods:

addOrUpdate(Object entity)

Add or update a document in the index if the entity type is mapped to an index (@Indexed), and re-index documents that embed this entity (through @IndexedEmbedded for example).

delete(Object entity)

Delete a document from the index if the entity type is mapped to an index (@Indexed), and re-index documents that embed this entity (through @IndexedEmbedded for example).

purge(Class<?> entityType, Object id)

Delete the entity from the index, but do not try to re-index documents that embed this entity.

Compared to delete, this is mainly useful if the entity has already been deleted from the database and is not available, even in a detached state, in the session. In that case, reindexing associated entities will be the user’s responsibility, since Hibernate Search cannot know which entities are associated to an entity that no longer exists.

purge(String entityName, Object id)

Same as purge(Class<?> entityType, Object id), but the entity type is referenced by its name (see @javax.persistence.Entity#name).

process() and execute()

Respectively, process the changes and apply them to indexes.

These methods will be executed automatically on commit, so they are only useful when processing a large number of items, as explained in Controlling entity reads and index writes with SearchIndexingPlan.

Below are examples of using addOrUpdate and delete.

Example 123. Explicitly adding or updating an entity in the index using SearchIndexingPlan
SearchSession searchSession = Search.session( entityManager ); (1)
SearchIndexingPlan indexingPlan = searchSession.indexingPlan(); (2)

entityManager.getTransaction().begin();
try {
    Book book = entityManager.getReference( Book.class, 5 ); (3)

    indexingPlan.addOrUpdate( book ); (4)

    entityManager.getTransaction().commit(); (5)
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}
1 Get the SearchSession.
2 Get the search session’s indexing plan.
3 Fetch from the database the Book we want to index.
4 Submit the Book to the indexing plan for an add-or-update operation. The operation won’t be executed immediately, but will be delayed until the transaction is committed.
5 Commit the transaction, allowing Hibernate Search to actually write the document to the index.
Example 124. Explicitly deleting an entity from the index using SearchIndexingPlan
SearchSession searchSession = Search.session( entityManager ); (1)
SearchIndexingPlan indexingPlan = searchSession.indexingPlan(); (2)

entityManager.getTransaction().begin();
try {
    Book book = entityManager.getReference( Book.class, 5 ); (3)

    indexingPlan.delete( book ); (4)

    entityManager.getTransaction().commit(); (5)
}
catch (RuntimeException e) {
    entityManager.getTransaction().rollback();
    throw e;
}
1 Get the SearchSession.
2 Get the search session’s indexing plan.
3 Fetch from the database the Book we want to un-index.
4 Submit the Book to the indexing plan for a delete operation. The operation won’t be executed immediately, but will be delayed until the transaction is committed.
5 Commit the transaction, allowing Hibernate Search to actually delete the document from the index.

Multiple operations can be performed in a single indexing plan. The same entity can even be changed multiple times, for example added and then removed: Hibernate Search will simplify the operation as expected.

This will work fine for any reasonable number of entities, but changing or simply loading large numbers of entities in a single session requires special care with Hibernate ORM, and then some extra care with Hibernate Search. See Controlling entity reads and index writes with SearchIndexingPlan for more information.

8.4.4. Explicitly altering a whole index

Some index operations are not about a specific entity/document, but rather about a large number of documents, possibly all of them. This includes, for example, purging the index to remove all of its content.

The operations are performed outside the context of an ORM session, using the SearchWorkspace interface.

The SearchWorkspace can be retrieved from the SearchMapping, and can target one, several or all indexes:

Example 125. Retrieving a SearchWorkspace from the SearchMapping
SearchMapping searchMapping = Search.mapping( entityManagerFactory ); (1)
SearchWorkspace allEntitiesWorkspace = searchMapping.scope( Object.class ).workspace(); (2)
SearchWorkspace bookWorkspace = searchMapping.scope( Book.class ).workspace(); (3)
SearchWorkspace bookAndAuthorWorkspace = searchMapping.scope( Arrays.asList( Book.class, Author.class ) )
        .workspace(); (4)
1 Get a SearchMapping.
2 Get a workspace targeting all indexes.
3 Get a workspace targeting the index mapped to the Book entity type.
4 Get a workspace targeting the indexes mapped to the Book and Author entity types.

Alternatively, for convenience, the SearchWorkspace can be retrieved from the SearchSession:

Example 126. Retrieving a SearchWorkspace from the SearchSession
SearchSession searchSession = Search.session( entityManager ); (1)
SearchWorkspace allEntitiesWorkspace = searchSession.workspace(); (2)
SearchWorkspace bookWorkspace = searchSession.workspace( Book.class ); (3)
SearchWorkspace bookAndAuthorWorkspace = searchSession.workspace( Book.class, Author.class ); (4)
1 Get a SearchSession.
2 Get a workspace targeting all indexes.
3 Get a workspace targeting the index mapped to the Book entity type.
4 Get a workspace targeting the indexes mapped to the Book and Author entity types.

The SearchWorkspace exposes various large-scale operations that can be applied to an index or a set of indexes. These operations are triggered as soon as they are requested, without waiting for the transaction commit.

This interface offers the following methods:

purge()

Delete all documents from indexes targeted by this workspace.

With multi-tenancy enabled, only documents of the current tenant will be removed: the tenant of the session from which this workspace originated.

purgeAsync()

Asynchronous version of purge() returning a CompletionStage.

purge(Set<String> routingKeys)

Delete documents from indexes targeted by this workspace that were indexed with any of the given routing keys.

With multi-tenancy enabled, only documents of the current tenant will be removed: the tenant of the session from which this workspace originated.

purgeAsync(Set<String> routingKeys)

Asynchronous version of purge(Set<String>) returning a CompletionStage.

flush()

Flush to disk the changes to indexes that have not been committed yet. In the case of backends with a transaction log (Elasticsearch), also apply operations from the transaction log that were not applied yet.

This is generally not useful as Hibernate Search commits changes automatically. See Commit and refresh for more information.

flushAsync()

Asynchronous version of flush() returning a CompletionStage.

refresh()

Refresh the indexes so that all changes executed so far will be visible in search queries.

This is generally not useful as indexes are refreshed automatically. See Commit and refresh for more information.

refreshAsync()

Asynchronous version of refresh() returning a CompletionStage.

mergeSegments()

Merge each index targeted by this workspace into a single segment. This operation does not always improve performance: see Merging segments and performance.

mergeSegmentsAsync()

Asynchronous version of mergeSegments() returning a CompletionStage. This operation does not always improve performance: see Merging segments and performance.

Merging segments and performance

The merge-segments operation may affect performance positively as well as negatively.

This operation will regroup all index data into a single, huge segment (a file). This may speed up search at first, but as documents are deleted, this huge segment will begin to fill with "holes" which have to be handled as special cases during search, degrading performance.

Elasticsearch/Lucene do address this by rebuilding the segment at some point, but only once a certain ratio of deleted documents is reached. If all documents are in a single, huge segment, this ratio is less likely to be reached, and the index performance will continue to degrade for a long time.

There are, however, two situations in which merging segments may help:

  1. No deletions or document updates are expected for an extended period of time.

  2. Most, or all documents have just been removed from the index, leading to segments consisting mostly of deleted documents. In that case, it makes sense to regroup the few remaining documents into a single segment, though Elasticsearch/Lucene will probably do it automatically.

Below is an example using a SearchWorkspace to purge several indexes.

Example 127. Purging indexes using a SearchWorkspace
SearchSession searchSession = Search.session( entityManager ); (1)
SearchWorkspace workspace = searchSession.workspace( Book.class, Author.class ); (2)
workspace.purge(); (3)
1 Get a SearchSession.
2 Get a workspace targeting the indexes mapped to the Book and Author entity types.
3 Trigger a purge. This method is synchronous and will only return after the purge is complete, but an asynchronous method, purgeAsync, is also available.
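
Other workspace operations are triggered in the same way. For instance, the sketch below force-merges the segments of the same indexes, synchronously or asynchronously; keep in mind the caveats described in Merging segments and performance.

SearchWorkspace workspace = searchSession.workspace( Book.class, Author.class );
workspace.mergeSegments(); // synchronous: returns only once the merge is complete

// OR
CompletionStage<?> future = workspace.mergeSegmentsAsync(); // asynchronous variant returning a CompletionStage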

9. Searching

Beyond simply indexing, Hibernate Search also exposes high-level APIs to search these indexes without having to resort to native APIs.

One key feature of these search APIs is the ability to use indexes to perform the search, but to return entities loaded from the database, effectively offering a new type of query for Hibernate ORM entities.

9.1. Query DSL

9.1.1. Basics

Preparing and executing a query requires just a few lines:

Example 128. Executing a search query
// Not shown: get the entity manager and open a transaction
SearchSession searchSession = Search.session( entityManager ); (1)

SearchResult<Book> result = searchSession.search( Book.class ) (2)
        .where( f -> f.match() (3)
                .field( "title" )
                .matching( "robot" ) )
        .fetch( 20 ); (4)

long totalHitCount = result.total().hitCount(); (5)
List<Book> hits = result.hits(); (6)
// Not shown: commit the transaction and close the entity manager
1 Get a Hibernate Search session, called SearchSession, from the EntityManager.
2 Initiate a search query on the index mapped to the Book entity.
3 Define that only documents matching the given predicate should be returned. The predicate is created using a factory f passed as an argument to the lambda expression. See Predicate DSL for more information about predicates.
4 Build the query and fetch the results, limiting to the top 20 hits.
5 Retrieve the total number of matching entities. See Fetching the total (hit count, …​) for ways to optimize computation of the total hit count.
6 Retrieve matching entities.

By default, the hits of a search query will be entities managed by Hibernate ORM, bound to the entity manager used to create the search session. This provides all the benefits of Hibernate ORM, in particular the ability to navigate the entity graph to retrieve associated entities if necessary.

The query DSL offers many features, detailed in the following sections. Some commonly used features include:

  • predicates, the main component of a search query, i.e. the condition that every document must satisfy in order to be included in search results.

  • fetching the results differently: getting the hits directly as a list, using pagination, scrolling, etc.

  • sorts, to order the hits in various ways: by score, by the value of a field, by distance to a point, etc.

  • projections, to retrieve hits that are not just managed entities: data can be extracted from the index (field values), or even from both the index and the database.

  • aggregations, to group hits and compute aggregated metrics for each group — hit count by category, for example.

9.1.2. Advanced entity types targeting

Targeting multiple entity types

When multiple entity types have similar indexed fields, it is possible to search across these multiple types in a single search query: the search result will contain hits from any of the targeted types.

Example 129. Targeting multiple entity types in a single search query
SearchResult<Person> result = searchSession.search( Arrays.asList( (1)
                Manager.class, Associate.class
        ) )
        .where( f -> f.match() (2)
                .field( "name" )
                .matching( "james" ) )
        .fetch( 20 ); (3)
1 Initiate a search query targeting the indexes mapped to the Manager and Associate entity types. Since both entity types implement the Person interface, search hits will be instances of Person.
2 Continue building the query as usual. There are restrictions regarding the fields that can be used: see the note below.
3 Fetch the search result. Hits will all be instances of Person.

Multi-entity (multi-index) searches will only work well as long as the fields referenced in predicates/sorts/etc. are identical in all targeted indexes (same type, same analyzer, …​). Fields that are defined in only one of the targeted indexes will also work correctly.

If you want to reference index fields that are even slightly different in one of the targeted indexes (different type, different analyzer, …​), see Targeting multiple fields.

Targeting entity types by name

Though rarely necessary, it is also possible to use entity names instead of classes to designate the entity types targeted by the search:

Example 130. Targeting entity types by name
SearchResult<Person> result = searchSession.search( (1)
                searchSession.scope( (2)
                        Person.class,
                        Arrays.asList( "Manager", "Associate" )
                )
        )
        .where( f -> f.match() (3)
                .field( "name" )
                .matching( "james" ) )
        .fetch( 20 ); (4)
1 Initiate a search query.
2 Pass a custom scope encompassing the indexes mapped to the Manager and Associate entity types, expecting those entity types to implement the Person interface (Hibernate Search will check that).
3 Continue building the query as usual.
4 Fetch the search result. Hits will all be instances of Person.

9.1.3. Fetching results

Basics

In Hibernate Search, the default search result is a bit more complicated than just "a list of hits". This is why the default methods return a composite SearchResult object offering getters to retrieve the part of the result you want, as shown in the example below.

Example 131. Getting information from a SearchResult
SearchResult<Book> result = searchSession.search( Book.class ) (1)
        .where( f -> f.matchAll() )
        .fetch( 20 ); (2)

long totalHitCount = result.total().hitCount(); (3)
List<Book> hits = result.hits(); (4)
// ... (5)
1 Start building the query as usual.
2 Fetch the results, limiting to the top 20 hits.
3 Retrieve the total hit count, i.e. the total number of matching entities/documents, which could be 10,000 even if you only retrieved the top 20 hits. This is useful to give end users an idea of how many more hits their query produced. See Fetching the total (hit count, …​) for ways to optimize computation of the total hit count.
4 Retrieve the top hits, in this case the top 20 matching entities/documents.
5 Other kinds of results and information can be retrieved from SearchResult. They are explained in dedicated sections, such as Aggregation DSL.

It is possible to retrieve the total hit count alone, for cases where only the number of hits is of interest, not the hits themselves:

Example 132. Getting the total hit count directly
long totalHitCount = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetchTotalHitCount();

The top hits can also be obtained directly, without going through a SearchResult, which can be handy if only the top hits are useful, and not the total hit count:

Example 133. Getting the top hits directly
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetchHits( 20 );

If only zero to one hit is expected, it is possible to retrieve it as an Optional. An exception will be thrown if more than one hit is returned.

Example 134. Getting the only hit directly
Optional<Book> hit = searchSession.search( Book.class )
        .where( f -> f.id().matching( 1 ) )
        .fetchSingleHit();
Fetching all hits

Fetching all hits is rarely a good idea: if the query matches many entities/documents, this may lead to loading millions of entities in memory, which will likely crash the JVM, or at the very least slow it down to a crawl.

If you know your query will always have less than N hits, consider setting the limit to N to avoid memory issues.

If there is no bound to the number of hits you expect, you should consider Pagination or Scrolling to retrieve data in batches.

If you still want to fetch all hits in one call, be aware that the Elasticsearch backend will only ever return 10,000 hits at a time, due to internal safety mechanisms in the Elasticsearch cluster.

Example 135. Getting all hits in a SearchResult
SearchResult<Book> result = searchSession.search( Book.class )
        .where( f -> f.id().matchingAny( Arrays.asList( 1, 2 ) ) )
        .fetchAll();

long totalHitCount = result.total().hitCount();
List<Book> hits = result.hits();
Example 136. Getting all hits directly
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.id().matchingAny( Arrays.asList( 1, 2 ) ) )
        .fetchAllHits();
Fetching the total (hit count, …​)

A SearchResultTotal contains the count of all hits that matched the query, whether they are part of the current page or not. For pagination, see Pagination.

The total hit count is exact by default, but can be replaced with a lower-bound estimate, in particular when the totalHitCountThreshold option described below is enabled:

Example 137. Working with the result total
SearchResult<Book> result = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetch( 20 );

SearchResultTotal resultTotal = result.total(); (1)
long totalHitCount = resultTotal.hitCount(); (2)
long totalHitCountLowerBound = resultTotal.hitCountLowerBound(); (3)
boolean hitCountExact = resultTotal.isHitCountExact(); (4)
boolean hitCountLowerBound = resultTotal.isHitCountLowerBound(); (5)
1 Extract the SearchResultTotal from the SearchResult.
2 Retrieve the exact total hit count. This call will raise an exception if the only available hit count is a lower-bound estimate.
3 Retrieve a lower-bound estimate of the total hit count. This will return the exact hit count if available.
4 Test if the count is exact.
5 Test if the count is a lower bound.
totalHitCountThreshold(…​): optimizing total hit count computation

When working with large result sets, counting the number of hits exactly can be very resource-consuming.

When sorting by score (the default) and retrieving the result through fetch(…​), it is possible to yield significant performance improvements by allowing Hibernate Search to return a lower-bound estimate of the total hit count, instead of the exact total hit count. In that case, the underlying engine (Lucene or Elasticsearch) will be able to skip large chunks of non-competitive hits, leading to fewer index scans and thus better performance.

To enable this performance optimization, call totalHitCountThreshold(…​) when building the query, as shown in the example below.

This optimization has no effect in the following cases:

  • when calling fetchHits(…​): it is already optimized by default.

  • when calling fetchTotalHitCount(): it always returns an exact hit count.

  • when calling scroll(…​) with the Elasticsearch backend: Elasticsearch does not support this optimization when scrolling. The optimization is enabled for scroll(…​) calls with the Lucene backend, however.

Example 138. Defining a total hit count threshold
SearchResult<Book> result = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .totalHitCountThreshold( 1000 ) (1)
        .fetch( 20 );

SearchResultTotal resultTotal = result.total(); (2)
long totalHitCountLowerBound = resultTotal.hitCountLowerBound(); (3)
boolean hitCountExact = resultTotal.isHitCountExact(); (4)
boolean hitCountLowerBound = resultTotal.isHitCountLowerBound(); (5)
1 Define a totalHitCountThreshold for the current query
2 Extract the SearchResultTotal from the SearchResult.
3 Retrieve a lower-bound estimate of the total hit count. This will return the exact hit count if available.
4 Test if the count is exact.
5 Test if the count is a lower-bound estimate.
Pagination

Pagination is the concept of splitting hits in successive "pages", all pages containing a fixed number of elements (except potentially the last one). When displaying results on a web page, the user will be able to go to an arbitrary page and see the corresponding results, for example "results 151 to 170 of 14,265".

Pagination is achieved in Hibernate Search by passing an offset and a limit to the fetch or fetchHits method:

  • The offset defines the number of documents that should be skipped because they were displayed in previous pages. It is a number of documents, not a number of pages, so you will usually want to compute it from the page number and page size this way: offset = zero-based-page-number * page-size.

  • The limit defines the maximum number of hits to return, i.e. the page size.
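
For example, to display the third page of results with a page size of 20 (hits 41 to 60), the zero-based page number is 2, so the offset is 2 * 20 = 40 and the limit is 20; this is exactly what the examples below pass to fetch and fetchHits.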

Example 139. Pagination retrieving a SearchResult
SearchResult<Book> result = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetch( 40, 20 ); (1)
1 Set the offset to 40 and the limit to 20.
Example 140. Pagination retrieving hits directly
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetchHits( 40, 20 ); (1)
1 Set the offset to 40 and the limit to 20.

The index may be modified between the retrieval of two pages. As a result of that modification, it is possible that some hits change position, and end up being present on two subsequent pages.

If you’re running a batch process and want to avoid this, use Scrolling.

Scrolling

Scrolling is the concept of keeping a cursor on the search query at the lowest level, and advancing that cursor progressively to collect subsequent "chunks" of search hits.

Scrolling relies on the internal state of the cursor (which must be closed at some point), and thus is not appropriate for stateless operations such as displaying a page of results to a user in a webpage. However, thanks to this internal state, scrolling is able to guarantee that all returned hits are consistent: there is absolutely no way for a given hit to appear twice.

Scrolling is therefore most useful when processing a large result set as small chunks.

Below is an example of using scrolling in Hibernate Search.

SearchScroll exposes a close() method that must be called to avoid resource leaks.

With the Elasticsearch backend, scrolls can time out and become unusable after some time; See here for more information.

Example 141. Scrolling to retrieve search results in small chunks
try ( SearchScroll<Book> scroll = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .scroll( 20 ) ) { (1)
    for ( SearchScrollResult<Book> chunk = scroll.next(); (2)
            chunk.hasHits(); chunk = scroll.next() ) { (3)
        for ( Book hit : chunk.hits() ) { (4)
            // ... do something with the hits ...
        }

        totalHitCount = chunk.total().hitCount(); (5)

        entityManager.flush(); (6)
        entityManager.clear(); (6)
    }
}
1 Start a scroll that will return chunks of 20 hits. Note the scroll is used in a try-with-resource block to avoid resource leaks.
2 Retrieve the first chunk by calling next(). Each chunk will include at most 20 hits, since that was the selected chunk size.
3 Detect the end of the scroll by calling hasHits() on the last retrieved chunk, and retrieve the next chunk by calling next() again on the scroll.
4 Retrieve the hits of a chunk.
5 Optionally, retrieve the total number of matching entities.
6 Optionally, if using Hibernate ORM and retrieving entities, you might want to use the periodic "flush-clear" pattern to ensure entities don’t stay in the session taking more and more memory.

9.1.4. Routing

For a preliminary introduction to sharding, including how it works in Hibernate Search and what its limitations are, see Sharding and routing.

If, for a given index, there is one immutable value that documents are often filtered on, for example a "category" or a "user id", it is possible to match documents with this value using a routing key instead of a predicate.

The main advantage of a routing key over a predicate is that, on top of filtering documents, the routing key will also filter shards. If sharding is enabled, this means only part of the index will be scanned during query execution, potentially increasing search performance.

A prerequisite to using routing in search queries is to map your entity in such a way that it is assigned a routing key at indexing time.

Specifying routing keys is done by calling the .routing(String) or .routing(Collection<String>) methods when building the query:

Example 142. Routing a query to a subset of all shards
SearchResult<Book> result = searchSession.search( Book.class ) (1)
        .where( f -> f.match()
                .field( "genre" )
                .matching( Genre.SCIENCE_FICTION ) ) (2)
        .routing( Genre.SCIENCE_FICTION.name() ) (3)
        .fetch( 20 ); (4)
1 Start building the query.
2 Define that only documents matching the given genre should be returned.
3 In this case, the entity is mapped in such a way that the genre is also used as a routing key. We know all documents will have the given genre value, so we can specify the routing key to limit the query to relevant shards.
4 Build the query and fetch the results.
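For reference, the indexing-time side of such a mapping could look like the following sketch, which uses a routing bridge to derive the routing key from the genre. The class names and the getGenre() accessor are assumptions made for this illustration; the entity would additionally be mapped with @Indexed(routingBinder = @RoutingBinderRef(type = BookGenreRoutingBinder.class)).

public class BookGenreRoutingBinder implements RoutingBinder {
    @Override
    public void bind(RoutingBindingContext context) {
        context.dependencies().use( "genre" ); // reindex whenever the genre changes
        context.bridge( Book.class, new Bridge() );
    }

    public static class Bridge implements RoutingBridge<Book> {
        @Override
        public void route(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity,
                RoutingBridgeRouteContext context) {
            routes.addRoute().routingKey( indexedEntity.getGenre().name() ); // route to the shard for this genre
        }

        @Override
        public void previousRoutes(DocumentRoutes routes, Object entityIdentifier, Book indexedEntity,
                RoutingBridgeRouteContext context) {
            // The genre is immutable in this sketch, so the previous route is the same as the current one.
            routes.addRoute().routingKey( indexedEntity.getGenre().name() );
        }
    }
}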

9.1.5. Entity loading options

Hibernate Search executes database queries to load entities that are returned as part of the hits of a search query.

This section presents all available options related to entity loading in search queries.

Cache lookup strategy

By default, Hibernate Search will load entities from the database directly, without looking at any cache. This is a good strategy when the size of caches (Hibernate ORM session or second level cache) is much lower than the total number of indexed entities.

If a significant portion of your entities are present in the second level cache, you can force Hibernate Search to retrieve entities from the persistence context (the session) and/or the second level cache if possible. Hibernate Search will still need to execute a database query to retrieve entities missing from the cache, but the query will likely have to fetch fewer entities, leading to better performance and lower stress on your database.

This is done through the cache lookup strategy, which can be configured by setting the configuration property hibernate.search.query.loading.cache_lookup.strategy:

  • skip (the default) will not perform any cache lookup.

  • persistence-context will only look into the persistence context, i.e. will check if the entities are already loaded in the session. Useful if most search hits are expected to already be loaded in session, which is generally unlikely.

  • persistence-context-then-second-level-cache will first look into the persistence context, then into the second level cache, if enabled in Hibernate ORM for the searched entity. Useful if most search hits are expected to be cached, which may be likely if you have a small number of entities and a large cache.

Before a second-level cache can be used for a given entity type, some configuration is required in Hibernate ORM.
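For instance, here is a minimal sketch of setting the default strategy through JPA bootstrap properties; the persistence-unit name is hypothetical, and the same property can equally be set in persistence.xml or hibernate.properties:

Map<String, Object> properties = new HashMap<>();
properties.put( "hibernate.search.query.loading.cache_lookup.strategy",
        "persistence-context-then-second-level-cache" );

EntityManagerFactory entityManagerFactory =
        Persistence.createEntityManagerFactory( "my-persistence-unit", properties );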

It is also possible to override the configured strategy on a per-query basis, as shown below.

Example 143. Overriding the cache lookup strategy in a single search query
SearchResult<Book> result = searchSession.search( Book.class ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .loading( o -> o.cacheLookupStrategy( (2)
                EntityLoadingCacheLookupStrategy.PERSISTENCE_CONTEXT_THEN_SECOND_LEVEL_CACHE
        ) )
        .fetch( 20 ); (3)
1 Start building the query.
2 Access the loading options of the query, then mention that the persistence context and second level cache should be checked before entities are loaded from the database.
3 Fetch the results. The more entities found in the persistence context or second level cache, the fewer entities will be loaded from the database.
Fetch size

By default, Hibernate Search will use a fetch size of 100, meaning that for a single fetch*() call on a single query, it will run a first query to load the first 100 entities, then if there are more hits it will run a second query to load the next 100, etc.

The fetch size can be configured by setting the configuration property hibernate.search.query.loading.fetch_size. This property expects a strictly positive Integer value.

It is also possible to override the configured fetch size on a per-query basis, as shown below.

Example 144. Overriding the fetch size in a single search query
SearchResult<Book> result = searchSession.search( Book.class ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .loading( o -> o.fetchSize( 50 ) ) (2)
        .fetch( 200 ); (3)
1 Start building the query.
2 Access the loading options of the query, then set the fetch size to an arbitrary value (must be 1 or more).
3 Fetch the results, limiting to the top 200 hits. One query will be executed to load the hits if there are fewer hits than the given fetch size; two queries if there are more hits than the fetch size but fewer than twice the fetch size, etc.
Entity graph

By default, Hibernate Search will load associations according to the defaults of your mappings: associations marked as lazy won’t be loaded, while associations marked as eager will be loaded before returning the entities.

It is possible to force the loading of a lazy association, or to prevent the loading of an eager association, by referencing an entity graph in the query. See below for an example, and this section of the Hibernate ORM documentation for more information about entity graphs.

Example 145. Applying an entity graph to a search query
EntityManager entityManager = /* ... */

EntityGraph<Manager> graph = entityManager.createEntityGraph( Manager.class ); (1)
graph.addAttributeNodes( "associates" );

SearchResult<Manager> result = Search.session( entityManager ).search( Manager.class ) (2)
        .where( f -> f.match()
                .field( "name" )
                .matching( "james" ) )
        .loading( o -> o.graph( graph, GraphSemantic.FETCH ) ) (3)
        .fetch( 20 ); (4)
1 Build an entity graph.
2 Start building the query.
3 Access the loading options of the query, then set the entity graph to the graph built above. You must also pass a semantic: GraphSemantic.FETCH means only associations referenced in the graph will be loaded; GraphSemantic.LOAD means associations referenced in the graph and associations marked as EAGER in the mapping will be loaded.
4 Fetch the results. All managers loaded by this search query will have their associates association already populated.

Instead of building the entity graph on the spot, you can also define the entity graph statically using the @NamedEntityGraph annotation, and pass the name of your graph to Hibernate Search, as shown below. See this section of the Hibernate ORM documentation for more information about @NamedEntityGraph.

Example 146. Applying a named entity graph to a search query
SearchResult<Manager> result = Search.session( entityManager ).search( Manager.class ) (1)
        .where( f -> f.match()
                .field( "name" )
                .matching( "james" ) )
        .loading( o -> o.graph( "preload-associates", GraphSemantic.FETCH ) ) (2)
        .fetch( 20 ); (3)
1 Start building the query.
2 Access the loading options of the query, then set the entity graph to "preload-associates", which was defined elsewhere using the @NamedEntityGraph annotation.
3 Fetch the results. All managers loaded by this search query will have their associates association already populated.
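For reference, such a named graph could be declared on the entity roughly as follows; this is only a sketch, assuming the Manager entity and its associates association from the previous example:

@Entity
@Indexed
@NamedEntityGraph(
        name = "preload-associates",
        attributeNodes = @NamedAttributeNode( "associates" )
)
public class Manager {
    // ... identifier, "associates" association and other properties ...
}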

9.1.6. Timeout

You can limit the time it takes for a search query to execute in two ways:

  • Aborting (throwing an exception) when the time limit is reached with failAfter().

  • Truncating the results when the time limit is reached with truncateAfter().

Currently, the two approaches are incompatible: trying to set both failAfter and truncateAfter will result in unspecified behavior.

failAfter(): Aborting the query after a given amount of time

By calling failAfter(…​) when building the query, it is possible to set a time limit for the query execution. Once the time limit is reached, Hibernate Search will stop the query execution and throw a SearchTimeoutException.

Timeouts are handled on a best-effort basis.

Depending on the resolution of the internal clock and on how often Hibernate Search is able to check that clock, it is possible that a query execution exceeds the timeout. Hibernate Search will try to minimize this excess execution time.

Example 147. Triggering a failure on timeout
            try {
                SearchResult<Book> result = searchSession.search( Book.class ) (1)
                        .where( f -> f.match()
                                .field( "title" )
                                .matching( "robot" ) )
                        .failAfter( 500, TimeUnit.MILLISECONDS ) (2)
                        .fetch( 20 ); (3)
            }
            catch (SearchTimeoutException e) { (4)
                // ...
            }
1 Build the query as usual.
2 Call failAfter to set the timeout.
3 Fetch the results.
4 Catch the exception if necessary.

explain() does not honor this timeout: this method is used for debugging purposes and in particular to find out why a query is slow.

truncateAfter(): Truncating the results after a given amount of time

By calling truncateAfter(…​) when building the query, it is possible to set a time limit for the collection of search results. Once the time limit is reached, Hibernate Search will stop collecting hits and return an incomplete result.

Timeouts are handled on a best-effort basis.

Depending on the resolution of the internal clock and on how often Hibernate Search is able to check that clock, it is possible that a query execution exceeds the timeout. Hibernate Search will try to minimize this excess execution time.

Example 148. Truncating the results on timeout
            SearchResult<Book> result = searchSession.search( Book.class ) (1)
                    .where( f -> f.match()
                            .field( "title" )
                            .matching( "robot" ) )
                    .truncateAfter( 500, TimeUnit.MILLISECONDS ) (2)
                    .fetch( 20 ); (3)

            Duration took = result.took(); (4)
            Boolean timedOut = result.timedOut(); (5)
1 Build the query as usual.
2 Call truncateAfter to set the timeout.
3 Fetch the results.
4 Optionally extract took: how much time the query took to execute.
5 Optionally extract timedOut: whether the query timed out.

explain() and fetchTotalHitCount() do not honor this timeout. The former is used for debugging purposes and in particular to find out why a query is slow. For the latter it does not make sense to return a partial result.

9.1.7. Obtaining a query object

The examples presented throughout this documentation fetch the query results directly at the end of the query definition DSL, without showing any "query" object that can be manipulated. This is because the query object generally only makes code more verbose without bringing anything worthwhile.

However, in some cases a query object can be useful. To get a query object, just call toQuery() at the end of the query definition:

Example 149. Getting a SearchQuery object
SearchQuery<Book> query = searchSession.search( Book.class ) (1)
        .where( f -> f.matchAll() )
        .toQuery(); (2)
List<Book> hits = query.fetchHits( 20 ); (3)
1 Build the query as usual.
2 Retrieve a SearchQuery object.
3 Fetch the results.

This query object supports all fetch* methods supported by the query DSL. The main advantage over calling these methods directly at the end of a query definition is mostly related to troubleshooting, but the query object can also be useful if you need an adapter to another API.

Hibernate Search provides an adapter to JPA and Hibernate ORM’s native APIs, i.e. a way to turn a SearchQuery into a javax.persistence.TypedQuery (JPA) or a org.hibernate.query.Query (native ORM API):

Example 150. Turning a SearchQuery into a JPA or Hibernate ORM query
SearchQuery<Book> query = searchSession.search( Book.class ) (1)
        .where( f -> f.matchAll() )
        .toQuery(); (2)
javax.persistence.TypedQuery<Book> jpaQuery = Search.toJpaQuery( query ); (3)
org.hibernate.query.Query<Book> ormQuery = Search.toOrmQuery( query ); (4)
1 Build the query as usual.
2 Retrieve a SearchQuery object.
3 Turn the SearchQuery object into a JPA query.
4 Turn the SearchQuery object into a Hibernate ORM query.

The resulting query does not support all operations, so it is recommended to only convert search queries when absolutely required, for example when integrating with code that only works with Hibernate ORM queries.

The following operations are expected to work correctly in most cases, even though they may behave slightly differently from what is expected from a JPA TypedQuery or Hibernate ORM Query in some cases (including, but not limited to, the type of thrown exceptions):

  • Direct hit retrieval methods: list, getResultList, uniqueResult, …​

  • Scrolling: scroll(), scroll(ScrollMode) (but only with ScrollMode.FORWARDS_ONLY).

  • setFirstResult/setMaxResults and getters.

  • setFetchSize

  • unwrap

  • setHint

The following operations are known not to work correctly, with no plan to fix them at the moment:

  • getHints

  • Parameter-related methods: setParameter, …​

  • Result transformer: setResultTransformer, …​ Use composite projections instead.

  • Lock-related methods: setLockOptions, …​

  • And more: this list is not exhaustive.

9.1.8. explain(…​): Explaining scores

In order to explain the score of a particular document, create a SearchQuery object using toQuery() at the end of the query definition, and then use one of the backend-specific explain(…​) methods; the result of these methods will include a human-readable description of how the score of a specific document was computed.

Regardless of the API used, explanations are rather costly performance-wise: only use them for debugging purposes.

Example 151. Retrieving score explanation — Lucene
LuceneSearchQuery<Book> query = searchSession.search( Book.class )
        .extension( LuceneExtension.get() ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .toQuery(); (2)

Explanation explanation1 = query.explain( 1 ); (3)
Explanation explanation2 = query.explain( "Book", 1 ); (4)

LuceneSearchQuery<Book> luceneQuery = query.extension( LuceneExtension.get() ); (5)
1 Build the query as usual, but using the Lucene extension so that the retrieved query exposes Lucene-specific operations.
2 Retrieve a SearchQuery object.
3 Retrieve the explanation of the score of the entity with ID 1. The explanation is of type Explanation, but you can convert it to a readable string using toString().
4 For multi-index queries, it is necessary to refer to the entity not only by its ID, but also by the name of its type.
5 If you cannot change the code building the query to use the Lucene extension, you can instead use the Lucene extension on the SearchQuery to convert it after its creation.
Example 152. Retrieving score explanation — Elasticsearch
ElasticsearchSearchQuery<Book> query = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .toQuery(); (2)

JsonObject explanation1 = query.explain( 1 ); (3)
JsonObject explanation2 = query.explain( "Book", 1 ); (4)

ElasticsearchSearchQuery<Book> elasticsearchQuery = query.extension( ElasticsearchExtension.get() ); (5)
1 Build the query as usual, but using the Elasticsearch extension so that the retrieved query exposes Elasticsearch-specific operations.
2 Retrieve a SearchQuery object.
3 Retrieve the explanation of the score of the entity with ID 1.
4 For multi-index queries, it is necessary to refer to the entity not only by its ID, but also by the name of its type.
5 If you cannot change the code building the query to use the Elasticsearch extension, you can instead use the Elasticsearch extension on the SearchQuery to convert it after its creation.

9.1.9. took and timedOut: finding out how long the query took

Example 153. Returning query execution time and whether a timeout occurred
SearchQuery<Book> query = searchSession.search( Book.class )
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .toQuery();

SearchResult<Book> result = query.fetch( 20 ); (1)

Duration took = result.took(); (2)
Boolean timedOut = result.timedOut(); (3)
1 Fetch the results.
2 Extract took: how much time the query took (in case of Elasticsearch, ignoring network latency between the application and the Elasticsearch cluster).
3 Extract timedOut: whether the query timed out (in case of Elasticsearch, ignoring network latency between the application and the Elasticsearch cluster).

9.1.10. Elasticsearch: leveraging advanced features with JSON manipulation

Features detailed in this section are incubating: they are still under active development.

The usual compatibility policy does not apply: the contract of incubating elements (e.g. types, methods, configuration properties, etc.) may be altered in a backward-incompatible way — or even removed — in subsequent releases.

You are encouraged to use incubating features so the development team can get feedback and improve them, but you should be prepared to update code which relies on them as needed.

Elasticsearch ships with many features. It is possible that at some point, one feature you need will not be exposed by the Search DSL.

To work around such limitations, Hibernate Search provides ways to:

  • Transform the HTTP request sent to Elasticsearch for search queries.

  • Read the raw JSON of the HTTP response received from Elasticsearch for search queries.

Direct changes to the HTTP request may conflict with Hibernate Search features and be supported differently by different versions of Elasticsearch.

Similarly, the content of the HTTP response may change depending on the version of Elasticsearch, depending on which Hibernate Search features are used, and even depending on how Hibernate Search features are implemented.

Thus, features relying on direct access to HTTP requests or responses cannot be guaranteed to continue to work when upgrading Hibernate Search, even for micro upgrades (x.y.z to x.y.(z+1)).

Use this at your own risk.

Most simple use cases will only need to change the HTTP request slightly, as shown below.

Example 154. Transforming the Elasticsearch request manually in a search query
List<Book> hits = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .requestTransformer( context -> { (2)
            Map<String, String> parameters = context.parametersMap(); (3)
            parameters.put( "search_type", "dfs_query_then_fetch" );

            JsonObject body = context.body(); (4)
            body.addProperty( "min_score", 0.5f );
        } )
        .fetchHits( 20 ); (5)
1 Build the query as usual, but using the Elasticsearch extension so that Elasticsearch-specific options are available.
2 Add a request transformer to the query. Its transform method will be called whenever a request is about to be sent to Elasticsearch.
3 Inside the transform method, alter the HTTP query parameters.
4 It is also possible to alter the request’s JSON body as shown here, or even the request’s path (not shown in this example).
5 Retrieve the result as usual.

For more complicated use cases, it is possible to access the raw JSON of the HTTP response, as shown below.

Example 155. Accessing the Elasticsearch response body manually in a search query
ElasticsearchSearchResult<Book> result = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robt" ) )
        .requestTransformer( context -> { (2)
            JsonObject body = context.body();
            body.add( "suggest", jsonObject( suggest -> { (3)
                suggest.add( "my-suggest", jsonObject( mySuggest -> {
                    mySuggest.addProperty( "text", "robt" );
                    mySuggest.add( "term", jsonObject( term -> {
                        term.addProperty( "field", "title" );
                    } ) );
                } ) );
            } ) );
        } )
        .fetch( 20 ); (4)

JsonObject responseBody = result.responseBody(); (5)
JsonArray mySuggestResults = responseBody.getAsJsonObject( "suggest" ) (6)
        .getAsJsonArray( "my-suggest" );
1 Build the query as usual, but using the Elasticsearch extension so that Elasticsearch-specific options are available.
2 Add a request transformer to the query.
3 Add content to the request body, so that Elasticsearch will return more data in the response. Here we’re asking Elasticsearch to apply a suggester.
4 Retrieve the result as usual. Since we used the Elasticsearch extension when building the query, the result is an ElasticsearchSearchResult instead of the usual SearchResult.
5 Get the response body as a JsonObject.
6 Extract useful information from the response body. Here we’re extracting the result of the suggester we configured above.

Gson’s API for building JSON objects is quite verbose, so the example above relies on a small, custom helper method to make the code more readable:

private static JsonObject jsonObject(Consumer<JsonObject> instructions) {
    JsonObject object = new JsonObject();
    instructions.accept( object );
    return object;
}

When data needs to be extracted from each hit, it is often more convenient to use the jsonHit projection than to parse the whole response.
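For instance, here is a sketch of retrieving each hit as raw JSON through the jsonHit projection; the field names are the ones used in the previous examples:

List<JsonObject> rawHits = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() ) // gives access to Elasticsearch-specific projections
        .select( f -> f.jsonHit() ) // each hit is returned as the raw JSON object taken from the response
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .fetchHits( 20 );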

9.1.11. Lucene: retrieving low-level components

Lucene queries allow retrieving some low-level components. This should only be useful to integrators, but is documented here for the sake of completeness.

Example 156. Accessing low-level components in a Lucene search query
LuceneSearchQuery<Book> query = searchSession.search( Book.class )
        .extension( LuceneExtension.get() ) (1)
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" ) )
        .sort( f -> f.field( "title_sort" ) )
        .toQuery(); (2)

Sort sort = query.luceneSort(); (3)

LuceneSearchResult<Book> result = query.fetch( 20 ); (4)

TopDocs topDocs = result.topDocs(); (5)
1 Build the query as usual, but using the Lucene extension so that Lucene-specific options are available.
2 Since we used the Lucene extension when building the query, the query is a LuceneSearchQuery instead of the usual SearchQuery.
3 Retrieve the org.apache.lucene.search.Sort this query relies on.
4 Retrieve the result as usual. LuceneSearchQuery returns a LuceneSearchResult instead of the usual SearchResult.
5 Retrieve the org.apache.lucene.search.TopDocs for this result. Note that the TopDocs are offset according to the arguments to the fetch method, if any.

9.2. Predicate DSL

9.2.1. Basics

The main component of a search query is the predicate, i.e. the condition that every document must satisfy in order to be included in search results.

The predicate is configured when building the search query:

Example 157. Defining the predicate of a search query
SearchSession searchSession = Search.session( entityManager );

List<Book> result = searchSession.search( Book.class ) (1)
        .where( f -> f.match().field( "title" ) (2)
                .matching( "robot" ) )
        .fetchHits( 20 ); (3)
1 Start building the query.
2 Mention that the results of the query are expected to have a title field matching the value robot. If the field does not exist or cannot be searched on, an exception will be thrown.
3 Fetch the results, which will match the given predicate.

Alternatively, if you don’t want to use lambdas:

Example 158. Defining the predicate of a search query — object-based syntax
SearchSession searchSession = Search.session( entityManager );

SearchScope<Book> scope = searchSession.scope( Book.class );

List<Book> result = searchSession.search( scope )
        .where( scope.predicate().match().field( "title" )
                .matching( "robot" )
                .toPredicate() )
        .fetchHits( 20 );

The predicate DSL offers more predicate types, and multiple options for each type of predicate. To learn more about the match predicate, and all the other types of predicate, refer to the following sections.

9.2.2. matchAll: match all documents

The matchAll predicate simply matches all documents.

Example 159. Matching all documents
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetchHits( 20 );
except(…​): exclude documents matching a given predicate

Optionally, you can exclude a few documents from the hits:

Example 160. Matching all documents except those matching a given predicate
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll()
                .except( f.match().field( "title" )
                        .matching( "robot" ) )
        )
        .fetchHits( 20 );
Other options

9.2.3. id: match a document identifier

The id predicate matches documents by their identifier.

Example 161. Matching a document with a given identifier
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.id().matching( 1 ) )
        .fetchHits( 20 );

You can also match multiple ids in a single predicate:

Example 162. Matching all documents with an identifier among a given collection
List<Integer> ids = new ArrayList<>();
ids.add( 1 );
ids.add( 2 );
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.id().matchingAny( ids ) )
        .fetchHits( 20 );
Expected type of arguments

By default, the id predicate expects arguments to the matching(…​)/matchingAny(…​) methods to have the same type as the entity property corresponding to the document id.

For example, if the document identifier is generated from an entity identifier of type Long, the document identifier itself will be of type String, but matching(…​)/matchingAny(…​) will still expect arguments of type Long.

This should generally be what you want, but if you ever need to bypass conversion and pass an unconverted argument (of type String) to matching(…​)/matchingAny(…​), see Type of arguments passed to the DSL.

Other options

9.2.4. match: match a value

The match predicate matches documents for which a given field has a given value.

Example 163. Matching a value
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match().field( "title" )
                .matching( "robot" ) )
        .fetchHits( 20 );
Expected type of arguments

By default, the match predicate expects arguments to the matching(…​) method to have the same type as the entity property corresponding to the target field.

For example, if an entity property is of an enum type, the corresponding field may be of type String, but .matching(…​) will still expect its argument to have the enum type.

This should generally be what you want, but if you ever need to bypass conversion and pass an unconverted argument (of type String in the example above) to .matching(…​), see Type of arguments passed to the DSL.
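For instance, here is a sketch of bypassing conversion with ValueConvert.NO, passing the raw string stored in the index for the genre field used in earlier examples:

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match().field( "genre" )
                .matching( "SCIENCE_FICTION", ValueConvert.NO ) ) // pass the index value, bypassing conversion
        .fetchHits( 20 );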

Targeting multiple fields

Optionally, the predicate can target multiple fields. In that case, the predicate will match documents for which any of the given fields matches.
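Here is a minimal sketch of targeting several fields in one match predicate; the description field is an assumption made for this illustration:

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .fields( "title", "description" ) // documents match if any of these fields matches
                .matching( "robot" ) )
        .fetchHits( 20 );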

Analysis

For most field types (number, date, …​), the match is exact. However, for full-text fields or normalized keyword fields, the value passed to the matching(…​) method is analyzed or normalized before being compared to the values in the index. This means the match is more subtle in two ways.

First, the predicate will not just match documents for which a given field has the exact same value: it will match all documents for which this field has a value whose normalized form is identical. See below for an example.

Example 164. Matching normalized terms