Preface

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

1. Getting started

This section will guide you through the initial steps required to integrate Hibernate Search into your application.

Hibernate Search 6.0.0.Alpha1 is a technology preview and is not ready for production.

Use it to get a sneak peek at the APIs, to make suggestions, or to warn us early about anything you consider blocking so we can fix it, but do not use it to address business needs!

Read the dedicated page on our website for more detailed and up-to-date information.

1.1. Compatibility

Table 1. Compatibility

  Java Runtime                         Java 8 or greater.
  Hibernate ORM (for the ORM mapper)   Hibernate ORM 5.3.7.Final.
  JPA (for the ORM mapper)             JPA 2.2.

1.2. Migration notes

If you are upgrading an existing application from an earlier version of Hibernate Search to the latest release, make sure to check out the migration guide.

To Hibernate Search 5 users

If you pull our artifacts from a Maven repository and you come from Hibernate Search 5, be aware that just bumping the version number will not be enough.

In particular, the group IDs changed from org.hibernate to org.hibernate.search, most of the artifact IDs changed to reflect the new mapper/backend design, and the Lucene integration now requires an explicit dependency instead of being available by default. Read Dependencies for more information.

Additionally, be aware that a lot of APIs changed, some only because of a package change, others because of more fundamental changes (like moving away from using Lucene types in Hibernate Search APIs).

1.3. Dependencies

The Hibernate Search artifacts can be found in Maven’s Central Repository.

If you do not want to, or cannot, fetch the JARs from a Maven repository, you can get them from the distribution bundle hosted at Sourceforge.

In order to use Hibernate Search, you will need at least two direct dependencies:

  • a dependency on the "mapper", which extracts data from your domain model and maps it to indexable documents;

  • and a dependency on the "backend", which allows you to index and search these documents.

Below are the most common setups and matching dependencies for a quick start; read Architecture for more information.

Hibernate ORM + Lucene

Allows indexing of ORM entities in a single application node, storing the index on the local filesystem.

If you get Hibernate Search from Maven, use these dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.0.0.Alpha1</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-lucene</artifactId>
   <version>6.0.0.Alpha1</version>
</dependency>

If you get Hibernate Search from the distribution bundle, copy the JARs from dist/engine, dist/mapper/orm, dist/backend/lucene, and their respective lib subdirectories.

Hibernate ORM + Elasticsearch

Allows indexing of ORM entities on multiple application nodes, storing the index on a remote Elasticsearch cluster (to be configured separately).

If you get Hibernate Search from Maven, use these dependencies:

<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-mapper-orm</artifactId>
   <version>6.0.0.Alpha1</version>
</dependency>
<dependency>
   <groupId>org.hibernate.search</groupId>
   <artifactId>hibernate-search-backend-elasticsearch</artifactId>
   <version>6.0.0.Alpha1</version>
</dependency>

If you get Hibernate Search from the distribution bundle, copy the JARs from dist/engine, dist/mapper/orm, dist/backend/elasticsearch, and their respective lib subdirectories.

1.4. Configuration

Once you have added all required dependencies to your application, you have to add a couple of properties to your Hibernate ORM configuration file.

If you are new to Hibernate ORM, we recommend you start there to implement entity persistence in your application, and only then come back here to add Hibernate Search indexing.

The properties are sourced from Hibernate ORM, so they can be added to any file from which Hibernate ORM takes its configuration:

  • A hibernate.properties file in your classpath.

  • The hibernate.cfg.xml file in your classpath, if using Hibernate ORM native bootstrapping.

  • The persistence.xml file in your classpath, if using Hibernate ORM JPA bootstrapping.
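
Since the properties go through Hibernate ORM, you can also pass them programmatically when bootstrapping JPA. Below is a minimal sketch for the "Hibernate ORM + Lucene" setup, assuming a persistence unit named "myPersistenceUnit"; Persistence.createEntityManagerFactory is plain JPA API:

import java.util.HashMap;
import java.util.Map;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// Pass the Hibernate Search properties as a Map instead of (or in addition to)
// declaring them in persistence.xml; Hibernate ORM merges both sources.
Map<String, String> properties = new HashMap<>();
properties.put( "hibernate.search.backends.myBackend.type",
        "org.hibernate.search.backend.lucene.impl.LuceneBackendFactory" );
properties.put( "hibernate.search.backends.myBackend.lucene.directory_provider",
        "local_directory" );
properties.put( "hibernate.search.indexes.default.backend", "myBackend" );

EntityManagerFactory entityManagerFactory =
        Persistence.createEntityManagerFactory( "myPersistenceUnit", properties );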

The minimal working configuration is short, but depends on your setup:

Example 1. Hibernate Search properties in persistence.xml for a "Hibernate ORM + Lucene" setup
<property name="hibernate.search.backends.myBackend.type"
          value="org.hibernate.search.backend.lucene.impl.LuceneBackendFactory"/> (1)
<property name="hibernate.search.backends.myBackend.lucene.directory_provider"
          value="local_directory"/> (2)
<!--
<property name="hibernate.search.backends.myBackend.lucene.root_directory"
          value="some/filesystem/path"/>
 --> (3)
<property name="hibernate.search.indexes.default.backend"
          value="myBackend"/> (4)
1 Define a backend named "myBackend" relying on Lucene technology.
2 Define the storage for that backend as a local filesystem directory.
3 The backend will store indexes in the current working directory by default. If you want to store the indexes elsewhere, uncomment this property and set its value.
4 Make sure to use the backend we just defined for all indexes.
Example 2. Hibernate Search properties in persistence.xml for a "Hibernate ORM + Elasticsearch" setup
<property name="hibernate.search.backends.myBackend.type"
          value="org.hibernate.search.backend.elasticsearch.impl.ElasticsearchBackendFactory" /> (1)
<!--
<property name="hibernate.search.backends.myBackend.host"
          value="https://elasticsearch.mycompany.com"/>
<property name="hibernate.search.backends.myBackend.username"
          value="ironman"/>
<property name="hibernate.search.backends.myBackend.password"
          value="j@rV1s"/>
 --> (2)
<property name="hibernate.search.indexes.default.backend"
          value="myBackend"/> (3)
1 Define a backend named "myBackend" relying on Elasticsearch technology.
2 The backend will attempt to connect to http://localhost:9200 by default. If you want to connect to another URL, uncomment these lines and set the value for the "host" property, and optionally the username and password.
3 Make sure to use the backend we just defined for all indexes.

1.5. Mapping

Let’s assume that your application contains the Hibernate ORM managed classes Book and Author and you want to index them in order to search the books contained in your database.

Example 3. Book and Author entities BEFORE adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

@Entity
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    @ManyToMany
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}

To make these entities searchable, you will need to map them to an index structure. The mapping can be defined using annotations, or using a programmatic API; this getting started guide will show you a simple annotation mapping. For more details, refer to Mapping Java entities to the index structure.

Below is an example of how the model above can be mapped.

Example 4. Book and Author entities AFTER adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;

@Entity
@Indexed (1)
public class Book {

    @Id (2)
    @GeneratedValue
    private Integer id;

    @GenericField (3)
    private String title;

    @ManyToMany
    @IndexedEmbedded (4)
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;

@Entity (5)
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @GenericField (3)
    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 @Indexed marks Book as indexed, i.e. an index will be created for that entity, and that index will be kept up to date.
2 By default, the JPA @Id is used to generate a document identifier.
3 @GenericField maps a property to an index field with the same name and type. As such, the field is indexed in a way that only allows exact matches; full-text matches will be discussed in a moment.
4 @IndexedEmbedded allows you to "embed" the indexed form of associated objects (entities or embeddables) into the indexed form of the embedding entity. Here, the Author class defines a single indexed field, name. Thus, adding @IndexedEmbedded to the authors property of Book will add a single authors.name field to the Book index. This field will be populated automatically based on the content of the authors property, and the books will be reindexed automatically whenever the name property of their author changes. See Indexed-embedded for more information.
5 Entities that are only @IndexedEmbedded in other entities, but do not need to be searchable by themselves, do not need to be annotated with @Indexed.

This is a very simple example, but is enough to get started. Just remember that Hibernate Search allows more complex mappings:

  • Other @*Field annotations exist, some of them allowing full-text search, some of them allowing finer-grained configuration for fields of a certain type. You can find out more about @*Field annotations in Direct field mapping.

  • Properties, or even types, can be mapped with finer-grained control using "bridges". See Bridges for more information.

1.6. Indexing

Hibernate Search will transparently index every entity persisted, updated or removed through Hibernate ORM. Thus, this code would transparently populate your index:

Example 5. Using Hibernate ORM to persist data, and implicitly indexing it through Hibernate Search
// Not shown: get the entity manager and open a transaction
Author author = new Author();
author.setName( "John Doe" );

Book book = new Book();
book.setTitle( "Refactoring: Improving the Design of Existing Code" );
book.getAuthors().add( author );
author.getBooks().add( book );

entityManager.persist( author );
entityManager.persist( book );
// Not shown: commit the transaction and close the entity manager

However, keep in mind that data already present in your database when you add the Hibernate Search integration is unknown to Hibernate Search, and thus has to be indexed through a batch process.

The mass indexer is not yet available in Hibernate Search 6.0.0.Alpha1. We will add an example here when it is implemented. See HSEARCH-3268.

1.7. Searching

Once the data is indexed, you can perform search queries.

The following code will prepare a search query targeting the index for the Book entity, filtering the results so that at least one field among title and authors.name matches the string Refactoring: Improving the Design of Existing Code exactly.

Example 6. Using Hibernate Search to query the indexes
// Not shown: get the entity manager and open a transaction
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager( entityManager ); (1)

FullTextSearchTarget<Book> searchTarget = fullTextEntityManager.search( Book.class ); (2)

FullTextQuery<Book> query = searchTarget.query() (3)
        .asEntity() (4)
        .predicate( searchTarget.predicate().match() (5)
                .onFields( "title", "authors.name" )
                .matching( "Refactoring: Improving the Design of Existing Code" )
                .toPredicate()
        )
        .build(); (6)

List<Book> result = query.getResultList(); (7)
// Not shown: commit the transaction and close the entity manager
1 Get the Hibernate Search-specific version of the EntityManager, called FullTextEntityManager.
2 Create a "search target", representing the indexed types that will be queried.
3 Use the "search target" to start creating the query.
4 Define the results expected from the query; here we expect managed Hibernate ORM entities, but other options are available.
5 Define that only documents matching the given predicate should be returned. The predicate is created using the same search target as the query.
6 Build the query.
7 Fetch the results.

If this first example looks too verbose to you, you can use an alternative, lambda-based syntax that spares you the declaration of a variable for the search target:

Example 7. Using Hibernate Search to query the indexes - lambda syntax
// Not shown: get the entity manager and open a transaction
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager( entityManager );

FullTextQuery<Book> query = fullTextEntityManager.search( Book.class ).query()
        .asEntity()
        .predicate( factory -> factory.match()
                .onFields( "title", "authors.name" )
                .matching( "Refactoring: Improving the Design of Existing Code" )
                .toPredicate()
        )
        .build();

List<Book> result = query.getResultList();
// Not shown: commit the transaction and close the entity manager

1.8. Analysis

Exact matches are all well and good, but obviously not what you would expect from a full-text search engine.

For non-exact matches, you will need to configure analysis.

1.8.1. Concept

In the Lucene world (Lucene, Elasticsearch, Solr, …​), non-exact matches can be achieved by applying what is called an "analyzer" to both documents (when indexing) and search terms (when querying).

The analyzer will perform three steps, delegated to the following components, in the following order:

  1. Character filter: transforms the input text by replacing, adding or removing characters. This step is rarely used; text is generally transformed during the third step.

  2. Tokenizer: splits the text into several words, called "tokens".

  3. Token filter: transforms the tokens: replaces, adds or removes characters in a token, derives new tokens from the existing ones, removes tokens based on some condition, …

In order to perform non-exact matches, you will need to either pick a pre-defined analyzer, or define your own by combining character filters, a tokenizer, and token filters.
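
If you want to see for yourself what an analyzer produces, you can run one directly through the Lucene APIs, independently of Hibernate Search. The sketch below (assuming the Lucene analysis classes are on the classpath, which they are with the Lucene backend) builds the same kind of analyzer as the one configured in the next section and prints the resulting tokens; exception handling is omitted for brevity:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Standard tokenizer, lowercase filter, English snowball stemmer.
Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer( StandardTokenizerFactory.class )
        .addTokenFilter( LowerCaseFilterFactory.class )
        .addTokenFilter( SnowballPorterFilterFactory.class, "language", "English" )
        .build();

try ( TokenStream stream = analyzer.tokenStream( "title",
        "Refactoring: Improving the Design of Existing Code" ) ) {
    CharTermAttribute term = stream.addAttribute( CharTermAttribute.class );
    stream.reset();
    while ( stream.incrementToken() ) {
        // Prints: refactor, improv, the, design, of, exist, code
        System.out.println( term.toString() );
    }
    stream.end();
}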

The following section will give a reasonable example of a general-purpose analyzer. For more advanced use cases, refer to the Analysis section.

1.8.2. Configuration

Once you know what analysis is and which analyzer you want to apply, you will need to define it, or at least give it a name in Hibernate Search. This is done through analysis configurers, which are defined per backend:

  1. First, you need to implement an analysis configurer, a Java class that implements a backend-specific interface: LuceneAnalysisConfigurer or ElasticsearchAnalysisConfigurer.

  2. Second, you need to alter the configuration of your backend to actually use your analysis configurer.

As an example, let’s assume that one of your indexed Book entities has the title "Refactoring: Improving the Design of Existing Code", and you want to get hits for any of the following search terms: "Refactor", "refactors", "refactored" and "refactoring". One way to achieve this is to use an analyzer with the following components:

  • A "standard" tokenizer, which splits words at whitespaces, punctuation characters and hyphens. It is a good general purpose tokenizer.

  • A "lowercase" filter, which converts every character to lowercase.

  • A "snowball" filter, which applies language-specific stemming.

The examples below show how to define an analyzer with these components, depending on the backend you picked.

Example 8. Analysis configurer implementation and configuration in persistence.xml for a "Hibernate ORM + Lucene" setup
package org.hibernate.search.documentation.gettingstarted.withhsearch.withanalysis;

import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;
import org.hibernate.search.backend.lucene.analysis.model.dsl.LuceneAnalysisDefinitionContainerContext;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisDefinitionContainerContext context) {
        context.analyzer( "myAnalyzer" ).custom() (1)
                .tokenizer( StandardTokenizerFactory.class ) (2)
                .tokenFilter( ASCIIFoldingFilterFactory.class ) (3)
                .tokenFilter( LowerCaseFilterFactory.class ) (3)
                .tokenFilter( SnowballPorterFilterFactory.class ) (3)
                        .param( "language", "English" ); (4)
    }
}
<property name="hibernate.search.backends.myBackend.analysis_configurer"
          value="org.hibernate.search.documentation.gettingstarted.withhsearch.withanalysis.MyLuceneAnalysisConfigurer"/> (5)
1 Define a custom analyzer named "myAnalyzer".
2 Set the tokenizer to a standard tokenizer. You need to pass factory classes to refer to components.
3 Set the token filters. Token filters are applied in the order they are given.
4 Set the value of a parameter for the last added char filter/tokenizer/token filter.
5 Assign the configurer to the backend "myBackend" in the Hibernate Search configuration (here in persistence.xml).
Example 9. Analysis configurer implementation and configuration in persistence.xml for a "Hibernate ORM + Elasticsearch" setup
package org.hibernate.search.documentation.gettingstarted.withhsearch.withanalysis;

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;
import org.hibernate.search.backend.elasticsearch.analysis.model.dsl.ElasticsearchAnalysisDefinitionContainerContext;

public class MyElasticsearchAnalysisConfigurer implements ElasticsearchAnalysisConfigurer {
    @Override
    public void configure(ElasticsearchAnalysisDefinitionContainerContext context) {
        context.analyzer( "myAnalyzer" ).custom() (1)
                .withTokenizer( "standard" ) (2)
                .withTokenFilters( "asciifolding", "lowercase", "mySnowballFilter" ); (3)

        context.tokenFilter( "mySnowballFilter" ) (4)
                .type( "snowball" )
                .param( "language", "English" ); (5)
    }
}
<property name="hibernate.search.backends.myBackend.analysis_configurer"
          value="org.hibernate.search.documentation.gettingstarted.withhsearch.withanalysis.MyElasticsearchAnalysisConfigurer"/> (6)
1 Define a custom analyzer named "myAnalyzer".
2 Set the tokenizer to a standard tokenizer.
3 Set the token filters. Token filters are applied in the order they are given.
4 Note that, for Elasticsearch, any parameterized char filter, tokenizer or token filter must be defined separately and given a name.
5 Set the value of a parameter for the char filter/tokenizer/token filter being defined.
6 Assign the configurer to the backend "myBackend" in the Hibernate Search configuration (here in persistence.xml).

Once analysis is configured, the mapping must be adapted to assign the relevant analyzer to each field:

Example 10. Book and Author entities after adding Hibernate Search specific annotations
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;

@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "myAnalyzer") (1)
    private String title;

    @ManyToMany
    @IndexedEmbedded
    private Set<Author> authors = new HashSet<>();

    public Book() {
    }

    // Getters and setters
    // ...

}
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @FullTextField(analyzer = "myAnalyzer") (1)
    private String name;

    @ManyToMany(mappedBy = "authors")
    private Set<Book> books = new HashSet<>();

    public Author() {
    }

    // Getters and setters
    // ...

}
1 Replace the @GenericField annotation with @FullTextField, and set the analyzer parameter to the name of the custom analyzer configured earlier.

That’s it! Now, once the entities are reindexed, you will be able to search for the terms "Refactor", "refactors", "refactored" or "refactoring", and the book with the title "Refactoring: Improving the Design of Existing Code" will show up in the results.

Mapping changes are not auto-magically applied to already-indexed data. Unless you know what you are doing, you should remember to reindex your data after changing the Hibernate Search mapping of your entities.

Example 11. Using Hibernate Search to query the indexes after analysis was configured
// Not shown: get the entity manager and open a transaction
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager( entityManager );

FullTextQuery<Book> query = fullTextEntityManager.search( Book.class ).query()
        .asEntity()
        .predicate( factory -> factory.match()
                .onFields( "title", "authors.name" )
                .matching( "refactor" )
                .toPredicate()
        )
        .build();

List<Book> result = query.getResultList();
// Not shown: commit the transaction and close the entity manager

1.9. What’s next

The above paragraphs helped you get an overview of Hibernate Search. The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Architecture) and explore the basic features in more detail.

Two topics that were only briefly touched upon in this tutorial were analysis configuration (Analysis) and bridges (Bridges). Both are important features required for more fine-grained indexing.

Other features that you will probably want to use include sorts and projections.

2. Architecture

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

3. Configuration

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

4. Mapping Java entities to the index structure

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

4.1. Direct field mapping

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

4.2. Bridges

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

Starting with Hibernate Search 6, there are three main interfaces for bridges:

  • ValueBridge can be used for simple use cases when mapping an object’s property.

    The ValueBridge is applied at the property level using one of the pre-defined @*Field annotations: @GenericField, @FullTextField, …​

    ValueBridge is a suitable interface for your custom bridge if:

    • The property value should be mapped to a single index field.

    • The bridge should be applied to a property whose type is effectively immutable. For example Integer, or a custom enum type, or a custom bean type whose content never changes would be suitable candidates, but a custom bean type with setters would most definitely not.

  • PropertyBridge can be used for more complex use cases when mapping an object’s property.

    The PropertyBridge is applied at the property level using a custom annotation.

    PropertyBridge can be used even if the property being mapped has a mutable type, or if its value should be mapped to multiple index fields.

  • TypeBridge should be used when mapping multiple properties of an object, potentially combining them in the process.

    The TypeBridge is applied at the type level using a custom annotation.

    Similarly to PropertyBridge, TypeBridge can be used even if the properties being mapped have a mutable type, or if their values should be mapped to multiple index fields.

You can find examples of custom bridges in the Hibernate Search source code (a simplified sketch follows this list):

  • org.hibernate.search.integrationtest.showcase.library.bridge.ISBNBridge implements ValueBridge.

  • org.hibernate.search.integrationtest.showcase.library.bridge.MultiKeywordStringBridge implements PropertyBridge. The corresponding annotation is org.hibernate.search.integrationtest.showcase.library.bridge.annotation.MultiKeywordStringBridge.

  • org.hibernate.search.integrationtest.showcase.library.bridge.AccountBorrowalSummaryBridge implements TypeBridge. The corresponding annotation is org.hibernate.search.integrationtest.showcase.library.bridge.annotation.AccountBorrowalSummaryBridge.
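
To give a rough idea of the simplest case, here is a hypothetical sketch of a ValueBridge converting an immutable ISBN type to a string field. The method name and the ISBN type are illustrative assumptions, not the actual 6.0.0.Alpha1 signatures; refer to ISBNBridge above for real code:

// Hypothetical sketch only: the actual ValueBridge interface declares
// different methods; see ISBNBridge for a working implementation.
public class MyISBNValueBridge implements ValueBridge<ISBN, String> {

    // Convert the (effectively immutable) property value to the index field value.
    @Override
    public String toIndexedValue(ISBN value) {
        return value == null ? null : value.getStringValue(); // getStringValue() is assumed
    }
}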

4.3. Indexed-embedded

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

5. Analysis

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

To know which character filters, tokenizers and token filters are available, refer to the documentation specific to each backend:

  • For Lucene, either browse the Lucene JavaDoc or read the corresponding section on the Solr Wiki.

  • For Elasticsearch, have a look at the online documentation. If you want to use a built-in analyzer and not create your own: analyzers; if you want to define your own analyzer: character filters, tokenizers, token filters.

Why the reference to the Apache Solr wiki for Lucene?

The analyzer factory framework was originally created in the Apache Solr project. Most of these implementations have been moved to Apache Lucene, but the documentation for these additional analyzers can still be found in the Solr Wiki. You might find other documentation referring to the "Solr Analyzer Framework"; just remember you don’t need to depend on Apache Solr anymore: the required classes are part of the core Lucene distribution.

6. Search DSL

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

6.1. Sort

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

6.2. Projection

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

7. Manual index changes

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

8. Lucene backend

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

9. Elasticsearch backend

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

10. Index Optimization

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

11. Monitoring

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

12. Spatial

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

13. Advanced features

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

14. Internals of Hibernate Search

This section is intended for new Hibernate Search contributors looking for an introduction to how Hibernate Search works.

Knowledge of the Hibernate Search APIs and how to use them is a requirement to understand this section.

14.1. General overview

This section focuses on describing what the different parts of Hibernate Search are at a high level and how they interact with each other.

Hibernate Search internals are split into three parts:

Backends

The backends are where "things get done". They implement common indexing and searching interfaces for use by the mappers through "index managers", each providing access to one index. Examples include the Lucene backend, delegating to the Lucene library, and the Elasticsearch backend, delegating to a remote Elasticsearch cluster.

The word "backend" may refer either to a whole Maven module (e.g. "the Elasticsearch backend") or to a single, central class in this module (e.g. the ElasticsearchBackend class implementing the Backend interface), depending on context.
Mappers

Mappers are what users see. They "map" the user model to an index, and provide APIs consistent with the user model to perform indexing and searching. For instance, the POJO mapper provides APIs that allow indexing getters and fields of Java objects according to a configuration provided at boot time.

The word "mapper" may refer either to a whole Maven module (e.g. "the POJO mapper") or to a single, central class in this module (e.g. the PojoMapper class implementing the Mapper interface), depending on context.

Engine

The engine defines some APIs, a lot of SPIs, and implements the code needed to start and stop Hibernate Search, and to "glue" mappers and backends together during bootstrap.

Those parts are strictly separated in order to allow using them interchangeably. For instance, the Elasticsearch backend can be used with either a POJO mapper or a JSON mapper, and we only have to implement the backend once.

Here is an example of what Hibernate Search would look like at runtime, from a high level perspective:

High-level view of a Hibernate Search instance at runtime

A "mapping" is a very coarse-grained term here. A single POJO mapping, for instance, may support many indexed entities.

The mapping was provided, during bootstrap, with several "index managers", each exposing SPIs for searching and indexing. The purpose of the mapping is to transform calls to its APIs into calls to the index manager SPIs. This requires conversions of:

  • indexed data: the data manipulated by the mapping may take any form, but it has to be converted to a document accepted by the index manager.

  • index references, e.g. a search query targeting classes MyEntity and MyOtherEntity must instead target index manager 1 and index manager 2.

  • document references, e.g. a search query executed at the index manager level may return "document 1 in index 1 matched the query", but the user wants to see "entity 1 of type MyEntity matched the query".

The purpose of the SearchIntegration is mainly to keep track of every resource (mapping or backend) created at bootstrap, and to allow closing them all with a single call.

Finally, the purpose of the backend and its index managers is to execute the actual work and return results when relevant.

The architecture is able to support more complex user configurations. The example below shows a Hibernate Search instance with two mappings: a POJO mapping and a JSON mapping.

High-level view of a more complex Hibernate Search instance at runtime

The example is deliberately a bit contrived, in order to demonstrate some subtleties:

  • There are two mappings in this example. Most setups will only configure one mapping, but it is important to keep in mind there may be more. In particular, we anticipate that Infinispan may need multiple different mappings in a single Hibernate Search instance, in order to handle the multiple input types it accepts from its users.

  • There are multiple backends in this example. Again, most setups will only ever configure one, but there may be good reasons to use more: for instance, someone may want to index some entities in one Elasticsearch cluster, and the rest in another cluster.

  • Here, the two mappings each use one index manager from the same Elasticsearch backend. This is currently possible, though whether there are valid use cases for this remains to be determined, mainly based on the Infinispan needs.

14.1.1. Bootstrap

Bootstrap starts by creating at least two components:

  • The SearchIntegrationBuilder, which allows setting up all the mapper-independent configuration: bean resolver, configuration property sources for the backends, …

  • At least one MappingInitiator instance, of a type provided by the mapper module, which will register itself with the SearchIntegrationBuilder. From the point of view of the engine, it is a callback that will come into play later.

The idea is that the SearchIntegrationBuilder will allow one or more initiators to provide configuration about their mapping, in particular metadata about various "mappable" types (in short, the types manipulated by the user). Then the builder will organize this metadata, check its consistency to some extent, create backends and index manager builders as necessary, and then provide the (organized) metadata back to the mapper module along with handles to index manager builders so that it can start its own bootstrapping.

To sum up: the SearchIntegrationBuilder is a facilitator, providing everything necessary to start mapper bootstrapping:

  • engine services and components (BuildContext);

  • configuration properties (ConfigurationPropertySource);

  • organized metadata (TypeMetadataContributorProvider);

  • one handle to the backend layer (IndexManagerBuildingState) for each indexed type.

All this is provided to the mapper through the MappingInitiator and Mapper interfaces.
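
In outline, and with hypothetical method names (assumptions; the actual SPI differs), the sequence looks like this:

// Hypothetical outline of bootstrap; method names are assumptions.
SearchIntegrationBuilder integrationBuilder = SearchIntegration.builder( propertySource );

// Each mapper registers an initiator: a callback that contributes
// mapping metadata and later receives it back, organized.
integrationBuilder.addMappingInitiator( new MyMappingInitiator() );

// During build(), the engine organizes the metadata, checks its consistency,
// creates backends and index manager builders, then hands the organized
// metadata and index manager handles back to each mapper.
SearchIntegration integration = integrationBuilder.build();

// The resulting SearchIntegration keeps track of every mapping and backend,
// and closes them all with a single call.
integration.close();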

Mapper bootstrapping is really up to the mapper module, but one thing that won’t change is what mappers can do with the handles to the backend layer. These handles are instances of IndexManagerBuildingState and each one represents an index manager being built. As the mapper inspects the metadata, it will infer the required fields in the index, and will contribute this information to the backend using the dedicated SPI: IndexModelBindingContext, IndexSchemaElement, IndexSchemaFieldContext are the most important parts.

All this information about the required fields and their options (field type, whether it’s stored, how it is analyzed, …​) will be validated and will allow the backend to build an internal representation of the index schema, which will be used for various, backend-specific purposes, for example initializing a remote Elasticsearch index or inferring the required type of parameters to a range query on a given field.

14.1.2. Indexing

The entry point for indexing is specific to each mapper, and so are the upper levels of each mapper implementation. But at the lower levels, indexing in a mapper comes down to using the backend SPIs.

When indexing, the mapper must build a document that will be passed to the backend. This is done using index field accessors. During bootstrap, whenever the mapper declared a field, the backend returned an accessor (see IndexSchemaFieldTerminalContext#createAccessor). In order to build a document, the mapper extracts data from an object to index, but it then needs to store the extracted data at the appropriate place in the document. The accessor is responsible for exactly this: given a document, it sets the value of the field it represents to some mapper-provided value.
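
Schematically, and with deliberately simplified signatures (assumptions, not the actual SPI), the two steps look like this:

// Simplified sketch; the actual SPI types and signatures differ.

// At bootstrap: when the mapper declares a field, the backend hands back an accessor.
IndexFieldAccessor<String> titleAccessor =
        indexSchemaElement.field( "title" ).asString().createAccessor();

// At runtime: while building a document for one entity, the mapper uses the
// accessor to store the extracted value at the right place in the document.
titleAccessor.write( document, book.getTitle() );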

The other part of indexing (or altering the index in any way) is to give orders to the index manager: "add this document", "delete this document", … This is done through the IndexWorkPlan class. The mapper should create a work plan whenever it needs to execute a series of works.

IndexWorkPlan carries some context usually associated with a "session" in the JPA world, including, in particular, the tenant identifier when using multi-tenancy. Thus the mapper should instantiate a new work plan whenever this context changes.

For now, index-scoped operations such as flush, optimize, etc. are unavailable from work plans. HSEARCH-3305 will introduce APIs and SPIs for these.

14.1.3. Searching

Searching is a bit different from indexing, in that users are presented with APIs focused on the index rather than the mapped objects. The idea is that when you search, you will mainly target index fields, not properties of mapped objects (though they may happen to have the same name).

As a result, mapper APIs only define entry points for searching, so as to offer more natural ways of defining the search target and to provide additional settings. For example, PojoSearchManager#search allows defining the search target using the Java classes of mapped types instead of index names. But somewhere along the API calls, mappers end up exposing generic APIs, for instance SearchQueryResultDefinitionContext or SearchPredicateContainerContext.

Those generic APIs are mostly implemented in the engine. The implementation itself relies on lower-level, less "user-focused" SPIs implemented by backends, such as SearchPredicateFactory or FieldSortBuilder.

Note that the APIs implemented by the engine include ways for the mapper to wrap the resulting search query (SearchQueryWrappingDefinitionResultContext#asWrappedQuery). Also, the SPIs implemented by backends allow mappers to inject an "object loader" (see IndexSearchTarget.query) that will essentially transform document references into the object that was initially indexed.
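
To illustrate the "object loader" idea, here is a toy sketch of the document-reference-to-entity conversion for an ORM-based mapper. The DocumentReference accessors and the toEntityId helper are assumptions:

import java.util.ArrayList;
import java.util.List;

// Toy sketch: transform backend document references back into the entities
// that were initially indexed. Helper method names are assumptions.
List<Book> loadAll(List<DocumentReference> references) {
    List<Book> loaded = new ArrayList<>();
    for ( DocumentReference reference : references ) {
        // The mapper knows which entity type each index corresponds to, and how
        // to convert the document identifier back into an entity identifier.
        Integer entityId = toEntityId( reference );
        loaded.add( entityManager.find( Book.class, entityId ) );
    }
    return loaded;
}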

14.2. POJO mapper

What we call the POJO mapper is in fact an abstract basis for implementing mappers from Java objects to a full-text index. This module implements most of the necessary logic, and defines SPIs to implement the bits that are specific to each mapper.

There are currently only two implementations: the Hibernate ORM mapper, and the JavaBean mapper. The second one is mostly here to demonstrate that implementing a mapper that doesn’t rely on Hibernate ORM is possible: we do not expect much real-life usage.

The following sections do not address everything in the POJO mapper, but instead focus on the more complex parts.

14.2.1. Representation of the POJO metamodel

The bootstrapping process of the POJO mapper relies heavily on the POJO metamodel to infer what will have to be done at runtime. Multiple constructs are used to represent this metamodel.

Models

PojoTypeModel, PojoPropertyModel and similar are at the root of everything. They are SPIs, to be implemented by the Hibernate ORM mapper for instance, and they provide basic information about mapped types: Java annotations, list of properties, type of each property, "handle" to access each property on an instance of this type, …​

Container value extractor paths

ContainerValueExtractorPath and BoundContainerValueExtractorPath both represent a list of ContainerValueExtractor to be applied to a property. For example, they represent what has to be done to get from a property of type Map<String, List<MyEntity>> to a sequence of MyEntity. The difference between the "bound" version and the other is that the "bound" version was applied to a POJO model, which guarantees that it will work when applied to that model and allows the type of extracted values to be inferred. See ContainerValueExtractorBinder for more information.
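
As an illustration, container value extractors can be thought of as small, composable extraction functions. This is a toy model under assumed names; the actual ContainerValueExtractor SPI differs:

import java.util.Collection;
import java.util.Map;
import java.util.stream.Stream;

// Toy model of container value extractors; all names are assumptions.
interface ToyContainerValueExtractor<C, V> {
    Stream<V> extract(C container);
}

// Extracts the values of a Map.
class ToyMapValueExtractor<K, V> implements ToyContainerValueExtractor<Map<K, V>, V> {
    @Override
    public Stream<V> extract(Map<K, V> map) {
        return map.values().stream();
    }
}

// Extracts the elements of a Collection.
class ToyCollectionElementExtractor<E>
        implements ToyContainerValueExtractor<Collection<E>, E> {
    @Override
    public Stream<E> extract(Collection<E> collection) {
        return collection.stream();
    }
}

// Applied in sequence, the two extractors above get from a property of type
// Map<String, List<MyEntity>> to a sequence of MyEntity.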

Paths

POJO paths come in two flavors: PojoModelPath and BoundPojoModelPath. Each has a number of subtypes representing "nodes" in a path. The POJO paths represent how to get from a given type to a given value, by accessing properties, extracting container values (see container value extractor paths above), and casting types. As for container value extractor paths, the difference between the "bound" version and the other is that the "bound" version was applied to a POJO model, which guarantees that it will work when applied to that model (except for casts, obviously) and allows the type of extracted values to be inferred.

Additional metadata

PojoTypeAdditionalMetadata, PojoPropertyAdditionalMetadata and PojoValueAdditionalMetadata represent POJO metadata that would not typically be found in a "plain old Java object" without annotations. The metadata may come from various sources: Hibernate Search’s annotations, Hibernate Search’s programmatic API, or even other metamodels such as Hibernate ORM’s. The "additional metadata" objects are a way to represent this metadata the same way, wherever it comes from. Examples of "additional metadata" include whether a given type is an entity type, property markers ("this property represents a latitude"), or information about inter-entity associations.

Model elements

PojoModelElement, PojoModelProperty and similar are representations of the POJO metamodel for use by Hibernate Search users in bridges. They are API, contrary to PojoTypeModel et al., which are SPI, but their implementation relies on both the POJO model and additional metadata. Their main purpose is to shield users from future changes in our SPIs, and to allow users to get "accessors" so that they can extract information from the bridge elements at runtime.

When retrieving accessors, users indirectly declare what parts of the POJO model they will extract and use in their bridge, and Hibernate Search actually makes use of this information (see Implicit reindexing resolvers).

14.2.2. Indexing processors

Indexing processors are the objects responsible for extracting data from a POJO and pushing it to a document.

Indexing processors are organized as trees, each node being an implementation of PojoIndexingProcessor. The POJO mapper assigns one tree to each indexed entity type.

Here are the main types of nodes:

  • PojoIndexingProcessorTypeNode: A node representing a POJO type (a Java class).

  • PojoIndexingProcessorPropertyNode: A node representing a POJO property.

  • PojoIndexingProcessorContainerElementNode: A node representing elements in a container (List, Optional, …​).

At runtime, the root node will be passed the entity to index and a handle to the document being built. Then each node will "process" its input, i.e. perform one (or more) of the following:

  • extract data from the Java object passed as input: extract the value of a property, the elements of a list, …​

  • pass the extracted data along with the handle to the document being built to a user-configured bridge, which will add fields to the document.

  • pass the extracted data along with the handle to the document being built to a nested node, which will in turn "process" its input.

For nodes representing an indexed embedded, some more work is involved to add an object field to the document and ensure nested nodes add fields to that object field instead of the root document. But this is specific to indexed embedded: manipulation of the document is generally only performed by bridges.

This representation is flexible enough to allow it to represent almost any mapping, simply by defining the appropriate node types and ensuring the indexing processor tree is built correctly, yet explicit enough to not require any metadata lookup at runtime.
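
Schematically, a processor node has the following shape. This is a toy model under assumed names, not the actual PojoIndexingProcessor SPI, and "DocumentBuilder" stands in for the handle to the document being built:

import java.util.List;
import java.util.function.Function;

// Toy model of indexing processor nodes; all names here are assumptions.
interface DocumentBuilder {
    void addFieldValue(String fieldPath, Object value);
}

abstract class ToyIndexingProcessor<T> {
    // Process the input: extract data and contribute it to the document.
    abstract void process(T input, DocumentBuilder document);
}

// A property node: extracts one property value, then delegates to nested nodes.
class ToyPropertyNode<T, P> extends ToyIndexingProcessor<T> {
    private final Function<T, P> propertyGetter;
    private final List<ToyIndexingProcessor<? super P>> nested;

    ToyPropertyNode(Function<T, P> propertyGetter,
            List<ToyIndexingProcessor<? super P>> nested) {
        this.propertyGetter = propertyGetter;
        this.nested = nested;
    }

    @Override
    void process(T input, DocumentBuilder document) {
        P value = propertyGetter.apply( input );
        for ( ToyIndexingProcessor<? super P> processor : nested ) {
            processor.process( value, document );
        }
    }
}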

Indexing processors are logged at the debug level during bootstrap. Enable this level of logging for the Hibernate Search classes if you want to understand the indexing processor tree that was generated for a given mapping.

Bootstrap

For each indexed type, the building process consists of creating a root PojoIndexingProcessorTypeNode builder and applying metadata contributors to this builder (see Bootstrap), creating nested builders as the need arises (when a metadata contributor mentions a POJO property, for instance). Whenever an @IndexedEmbedded is found, the process is simply applied recursively on a type node created as a child of the @IndexedEmbedded property node.

As an example, let’s consider the following mapped model:

POJO model mapped using Hibernate Search

The class IndexedEntityClass is indexed. It has two mapped fields, plus an indexed-embedded on a property named embedded of type EmbeddedEntityClass. The class EmbeddedEntityClass has one mapped field, plus an indexed-embedded on a property named secondLevelEmbedded of type SecondLevelEmbeddedEntityClass. The class SecondLevelEmbeddedEntityClass, finally, has one mapped field, plus an indexed-embedded on a property named thirdLevelEmbedded of type IndexedEntityClass. To avoid any infinite recursion, that indexed-embedded is limited to a maximum depth of 1, meaning it will embed fields mapped directly in the IndexedEntityClass type, but will not transitively include any of that type’s own indexed-embedded fields.

This model is converted using the process described above into this node builder tree:

Indexing processor node builder tree for the mapping above

While the mapped model was originally organized as a cyclic graph, the indexing processor nodes are organized as a tree, which means, among other things, that it is acyclic. This is necessary to be able to process entities in a straightforward way at runtime, without relying on complex logic, mutable state or metadata lookups.

This transformation from a potentially cyclic graph into a tree results from the fact that we "unroll" the indexed-embedded definitions, breaking cycles by creating multiple indexing processor nodes for the same type if the type appears at different levels of embedding.

In our example, IndexedEntityClass is exactly in this case: the root node represents this type, but the type node near the bottom also represents the same type, only at a different level of embedding.

If you want to learn more about how @IndexedEmbedded path filtering, depth filtering, cycles, and prefixes are handled, a good starting point is IndexModelBindingContextImpl#addIndexedEmbeddedIfIncluded.

Ultimately, the created indexing processor tree will follow approximately the same structure as the builder tree. The indexing processor tree may be a bit different from the builder tree, due to optimizations. In particular, some nodes may be trimmed down if we detect that they will not contribute anything to documents at runtime, which may happen for some property nodes when using @IndexedEmbedded with path filtering (includePaths) or depth filtering (maxDepth).

This is the case in our example for the "embedded" node near the bottom. The builder node was created when applying and interpreting metadata, but it turns out the node does not have any children or bridges. As a result, this node will be ignored when creating the indexing processor.

14.2.3. Implicit reindexing resolvers

Reindexing resolvers are the objects responsible for determining, whenever an entity changes, which other entities include that changed entity in their indexed form and should thus be reindexed.

Similarly to indexing processors, the PojoImplicitReindexingResolver contains nodes organized as a tree, each node being an implementation of PojoImplicitReindexingResolverNode. The POJO mapper assigns one PojoImplicitReindexingResolver containing one tree to each indexed or contained entity type. Indexed entity types are those mapped to an index (using @Indexed or similar), while "contained" entity types are those that are the target of an @IndexedEmbedded or are manipulated in a bridge through the PojoModelElement API.

Here are the main types of nodes:

  • PojoImplicitReindexingResolverOriginalTypeNode: A node representing a POJO type (a Java class).

  • PojoImplicitReindexingResolverCastedTypeNode: A node representing a POJO type (a Java class) to be cast to a supertype or subtype, applying nested nodes only if the cast succeeds.

  • PojoImplicitReindexingResolverPropertyNode: A node representing a POJO property.

  • PojoImplicitReindexingResolverContainerElementNode: A node representing elements in a container (List, Optional, …​).

  • PojoImplicitReindexingResolverDirtinessFilterNode: A node representing a filter, delegating to its nested nodes only if some precise paths are considered dirty.

  • PojoImplicitReindexingResolverMarkingNode: A node representing a value to be marked as "to reindex".

At runtime, the root node will be passed the changed entity, the "dirtiness state" of that entity (in short, a list of properties that changed in that entity), and a collector of entities to re-index. Then each node will "resolve" entities to reindex according to its input, i.e. perform one (or more) of the following:

  • check that the "dirtiness state" contains specific dirty paths that make reindexing relevant for this node

  • extract data from the Java object passed as input: extract the value of a property, the elements of a list, try to cast the object to a given type, …​

  • pass the extracted data to the collector

  • pass the extracted data along with the collector to a nested node, which will in turn "resolve" entities to reindex according to its input.

As with indexing processors, this representation is very flexible, yet explicit enough not to require any metadata lookup at runtime.
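
Schematically, a resolver node has the following shape. This is a toy model under assumed names, not the actual PojoImplicitReindexingResolverNode SPI:

import java.util.Set;

// Toy model of reindexing resolver nodes; all names here are assumptions.
interface ToyCollector {
    void markForReindexing(Object entity);
}

interface ToyDirtinessState {
    boolean isAnyDirty(Set<String> paths);
}

abstract class ToyReindexingResolverNode<T> {
    // Given a changed object, resolve and collect the entities to reindex.
    abstract void resolveEntitiesToReindex(ToyCollector collector, T dirty,
            ToyDirtinessState state);
}

// A marking node: its input is itself an entity to mark as "to reindex".
class ToyMarkingNode<T> extends ToyReindexingResolverNode<T> {
    @Override
    void resolveEntitiesToReindex(ToyCollector collector, T dirty,
            ToyDirtinessState state) {
        collector.markForReindexing( dirty );
    }
}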

Reindexing resolvers are logged at the debug level during bootstrap. Enable this level of logging for the Hibernate Search classes if you want to understand the reindexing resolver tree that was generated for a given mapping.

Bootstrap

One reindexing resolver tree is built during bootstrap for each indexed or contained type. The entry point to building these resolvers may not be obvious: it is the indexing processor building process. Indeed, as we build the indexing processor for a given indexed type, we discover all the paths that will be walked through in the entity graph when indexing this type, and thus everything the indexed type’s indexing process depends on. This is all the information we need to build the reindexing resolvers.

In order to understand how reindexing resolvers are built, it is important to keep in mind that reindexing resolvers mirror indexing processors: if the indexing processor for entity A references entity B at some point, then you can be sure that the reindexing resolver for entity B will reference entity A at some point.

As an example, let’s consider the indexing processor builder tree from the previous section (Indexing processors):

Indexing processor node builder tree used as an input

As we build the indexing processors, we will also build another tree to represent dependencies from the root type (IndexedEntityClass) to each dependency. This is where dependency collectors come into play.

Dependency collectors are organized approximately the same way as the indexing processor builders, as a tree. A root node is provided to the root builder, then one node will be created for each of its children, and so on. Along the way, each builder will be able to notify its dependency collector that it will actually build an indexing processor (it wasn’t trimmed down due to some optimization), which means the node needs to be taken into account in the dependency tree. This is done through the PojoIndexingDependencyCollectorValueNode#collectDependency method, which triggers some additional steps.

TypeBridge and PropertyBridge implementations are allowed to go through associations and access properties from different entities. For this reason, when such bridges appear in an indexing processor, we create dependency collector nodes as necessary to model the bridge’s dependencies. For more information, see PojoModelTypeRootElement#contributeDependencies (type bridges) and PojoModelPropertyRootElement#contributeDependencies (property bridges).

Let’s see what our dependency collector tree will ultimately look like:

Dependency collector tree for the indexing processor node builder tree above

The value nodes in red are those that we will mark as a dependency using PojoIndexingDependencyCollectorValueNode#collectDependency. The embedded property at the bottom will be detected as not being used during indexing, so the corresponding value node will not be marked as a dependency, but all the other value nodes will.

The actual reindexing resolver building happens when PojoIndexingDependencyCollectorValueNode#collectDependency is called for each value node. To understand how it works, let us use the value node for longField as an example.

When collectDependency is called on this node, the dependency collector will first backtrack to the last encountered entity type, because that is the type for which "change events" will be received by the POJO mapper. Once this entity type is found, the dependency collector type node will retrieve the reindexing resolver builder for this type from a common pool, shared among all dependency collectors for all indexed types.

Reindexing resolver builders follow the same structure as the reindexing resolvers they build: they are nodes in a tree, and there is one type of builder for each type of reindexing resolver node: PojoImplicitReindexingResolverOriginalTypeNodeBuilder, PojoImplicitReindexingResolverPropertyNodeBuilder, …​

Back to our example, when collectDependency is called on the value node for longField, we backtrack to the last encountered entity type, and the dependency collector type node retrieves what will be the builder of our "root" reindexing resolver node:

Initial state of the reindexing resolver builder

From there, the reindexing resolver builder is passed to the next dependency collector value node using the PojoIndexingDependencyCollectorValueNode#markForReindexing method. This method also takes as a parameter the path to the property that is depended on, in this case longField.

The value node will then use its knowledge of the dependency tree (using its ancestors in the dependency collector tree) to build a BoundPojoModelPath from the previous entity type to that value. In our case, this path is Type EmbeddedEntityClass ⇒ Property "secondLevelEmbedded" ⇒ No container value extractor.

This path represents an association between two entity types: EmbeddedEntityClass on the containing side, and SecondLevelEmbeddedEntityClass on the contained side. In order to complete the reindexing resolver tree, we need to invert this association, i.e. find out the inverse path from SecondLevelEmbeddedEntityClass to EmbeddedEntityClass. This is done in PojoAssociationPathInverter using the "additional metadata" mentioned in Representation of the POJO metamodel.

Once the path is successfully inverted, the dependency collector value node can add new children to the reindexing resolver builder:

State of the reindexing resolver builder after inverting "secondLevelEmbedded"

The resulting reindexing resolver builder is then passed to the next dependency collector value node, and the process repeats:

State of the reindexing resolver builder after inverting "embedded"

Once we reach the dependency collector root, we are almost done. The reindexing resolver builder tree has been populated with every node needed to reindex IndexedEntityClass whenever a change occurs in the longField property of SecondLevelEmbeddedEntityClass.

The only thing left to do is register the path that is depended on (in our example, longField). With this path registered, we will be able to build a PojoPathFilter, so that whenever SecondLevelEmbeddedEntityClass changes, we will walk through the tree, but not the whole tree: if at some point we notice that a node is relevant only if longField changed, but the "dirtiness state" tells us that longField did not change, we can skip a whole branch of the tree, avoiding useless lazy loading and reindexing.

The example above was deliberately simple, to give a general idea of how reindexing resolvers are built. In the actual algorithm, we have to handle several circumstances that make the whole process significantly more complex:

Polymorphism

Due to polymorphism, the target of an association at runtime may not be of the exact type declared in the model. Also because of polymorphism, an association may be defined on an abstract entity type, but have different inverse sides, and even different target types, depending on the concrete entity subtype.

There are all sorts of intricate corner cases to take into account, but they are for the most part addressed this way:

  • Whenever we create a type node in the reindexing resolver building tree, we take care to determine all the possible concrete entity types for the considered type, and create one reindexing resolver type node builder per possible entity type.

  • Whenever we resolve the inverse side of an association, we take care to resolve it for every concrete "source" entity type, and to apply all of the resulting inverse paths.

If you want to observe the algorithm handling this live, try debugging AutomaticIndexingPolymorphicOriginalSideAssociationIT or AutomaticIndexingPolymorphicInverseSideAssociationIT, and put breakpoints in the collectDependency/markForReindexing methods of dependency collectors.

Embedded types

Types in the dependency collector tree may not always be entity types. Thus, the path of associations (both the ones to invert and the inverse paths) may be more complex than just one property plus one container value extractor.

If you want to observe the algorithm handling this live, try debugging AutomaticIndexingEmbeddableIT, and put breakpoints in the collectDependency/markForReindexing methods of dependency collectors.

Fine-grained dirty checking

Fine-grained dirty checking consists of keeping track of which properties are dirty in a given entity, so as to only reindex "containing" entities that actually use at least one of the dirty properties. Without this, Hibernate Search could trigger unnecessary reindexing from time to time, which could have a very bad impact on performance depending on the user model.

In order to implement fine-grained dirty checking, each reindexing resolver node builder not only stores the information that the corresponding node should be reindexed whenever the root entity changes, but also keeps track of which properties of the root entity should trigger reindexing of this particular node. Each builder keeps this state in a PojoImplicitReindexingResolverMarkingNodeBuilder instance it delegates to.
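
Reusing the toy resolver types sketched earlier in this section, the filtering performed by PojoImplicitReindexingResolverDirtinessFilterNode boils down to the following (again, names and signatures are assumptions):

import java.util.Set;

// Toy model: delegate to the nested node only if a relevant path is dirty.
class ToyDirtinessFilterNode<T> extends ToyReindexingResolverNode<T> {
    private final Set<String> pathsOfInterest; // e.g. { "longField" }
    private final ToyReindexingResolverNode<T> delegate;

    ToyDirtinessFilterNode(Set<String> pathsOfInterest,
            ToyReindexingResolverNode<T> delegate) {
        this.pathsOfInterest = pathsOfInterest;
        this.delegate = delegate;
    }

    @Override
    void resolveEntitiesToReindex(ToyCollector collector, T dirty,
            ToyDirtinessState state) {
        if ( !state.isAnyDirty( pathsOfInterest ) ) {
            return; // nothing this branch depends on changed: skip the whole branch
        }
        delegate.resolveEntitiesToReindex( collector, dirty, state );
    }
}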

If you want to observe the algorithm handling this live, try debugging AutomaticIndexingBasicIT.directValueUpdate_nonIndexedField, and put breakpoints in the collectDependency/markForReindexing methods of dependency collectors (to see what happens at bootstrap), and in the resolveEntitiesToReindex method of PojoImplicitReindexingResolverDirtinessFilterNode (to see what happens at runtime).

14.3. JSON mapper

The JSON mapper does not currently exist, but there are plans to work on it.

15. Further reading

This section is incomplete. It will be completed during the Alpha/Beta phases of Hibernate Search 6.0.0.

16. Credits

The full list of contributors to Hibernate Search can be found in the copyright.txt file in the Hibernate Search sources, available in particular in our git repository.

The following contributors have been involved in this documentation:

  • Emmanuel Bernard

  • Hardy Ferentschik

  • Gustavo Fernandes

  • Sanne Grinovero

  • Mincong Huang

  • Nabeel Ali Memon

  • Gunnar Morling

  • Yoann Rodière

  • Guillaume Smet