Hibernate.orgCommunity Documentation
Welcome to Hibernate Search. The following chapter will guide you through the initial steps required to integrate Hibernate Search into an existing Hibernate ORM enabled application. In case you are a Hibernate new timer we recommend you start here.
Table 1.1. System requirements
Java Runtime | Requires Java version 7 or greater. You can download a Java Runtime for Windows/Linux/Solaris here. |
Hibernate Search |
|
Hibernate Core | You will need
|
JPA 2.1 | Hibernate Search can be used without JPA but the following instructions will use JPA annotations for basic
entity configuration ( |
If you are upgrading an existing application from an earlier version of Hibernate Search to the latest release, make sure to check the out the migration guide.
The Hibernate Search library is split in several modules to allow you to pick the minimal set of dependencies you need. It requires Apache Lucene, Hibernate ORM and some standard APIs such as the Java Persistence API and the Java Transactions API. Other dependencies are optional, providing additional integration points. To get the correct jar files on your classpath we highly recommend to use a dependency manager such as Maven, or similar tools such as Gradle or Ivy. These alternatives are also able to consume the artifacts from the Section 1.3.1, “Using Maven” section.
The Hibernate Search artifacts can be found in Maven’s Central Repository but are released first in the JBoss Maven Repository. See also the Maven Getting Started wiki page to use the JBoss repository.
All you have to add to your pom.xml is:
Example 1.1. Maven artifact identifier for Hibernate Search
<dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-search-orm</artifactId> <version>5.1.1.Final</version> </dependency>
Example 1.2. Optional Maven dependencies for Hibernate Search
<dependency> <!-- If using JPA, add: --> <dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-entitymanager</artifactId> <version>4.3.8.Final</version> </dependency> <!-- Infinispan integration: --> <dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-search-infinispan</artifactId> <version>5.1.1.Final</version> </dependency>
Only the hibernate-search-orm dependency is mandatory. hibernate-entitymanager is only required if you want to use Hibernate Search in conjunction with JPA.
You can download zip bundles from Sourcefroge containing all needed Hibernate Search dependencies. This includes - among others - the latest compatible version of Hibernate ORM. However, only the essential parts you need to start experimenting with are included. You will probably need to combine this with downloads from the other projects, for example the Hibernate ORM distribution on Sourceforge also provides the modules to enable caching or use a connection pool.
If you are creating an application to be deployed on WildFly you’re lucky: Hibernate Search is included in the application server. This means that you don’t need to package it along with your application but remember that you need to activate the module, see Section 3.9, “Hibernate Search as a WildFly module” for details.
Due to he modular design of WildFly, you can also bundle a more recent version of Hibernate Search than the one included in the popular application server. This is also explained further in Section 3.9, “Hibernate Search as a WildFly module”.
Once you have added all required dependencies to your application you have to add a couple of
properties to your Hibernate configuration file.
If you are using Hibernate directly this can be done in hibernate.properties
or hibernate.cfg.xml
.
If you are using Hibernate via JPA you can also add the properties to persistence.xml
.
The good news is that for standard use most properties offer a sensible default.
An example persistence.xml
configuration could look like this:
Example 1.3. Basic configuration options to be added to hibernate.properties
, hibernate.cfg.xml
or persistence.xml
... <property name="hibernate.search.default.directory_provider" value="filesystem"/> <property name="hibernate.search.default.indexBase" value="/var/lucene/indexes"/> ...
First you have to tell Hibernate Search which DirectoryProvider
to use. This can be achieved by
setting the hibernate.search.default.directory_provider
property. Apache Lucene has the notion
of a Directory
to store the index files. Hibernate Search handles the initialization and
configuration of a Lucene Directory
instance via a DirectoryProvider
. In this tutorial we will
use a a directory provider which stores the index on the file system. This will give us the ability to
inspect the Lucene indexes created by Hibernate Search (eg via
Luke). Once you have a working configuration you can start
experimenting with other directory providers (see Section 3.3, “Directory configuration”).
You also have to specify the default base directory for all indexes via
hibernate.search.default.indexBase
. This defines the path where indexes are stored.
Let’s assume that your application contains the Hibernate managed classes example.Book
and
example.Author
and you want to add free text search capabilities to your application in order to
search the books contained in your database.
Example 1.4. Example entities Book and Author before adding Hibernate Search specific annotations
package example; ... @Entity public class Book { @Id @GeneratedValue private Integer id; private String title; private String subtitle; @ManyToMany private Set<Author> authors = new HashSet<Author>(); private Date publicationDate; public Book() {} // standard getters/setters follow ... }
package example; ... @Entity public class Author { @Id @GeneratedValue private Integer id; private String name; public Author() {} // standard getters/setters follow ... }
To achieve this you have to add a few annotations to the Book
and Author
class. The first annotation
@Indexed
marks Book
as indexable. By design Hibernate Search needs to store an untokenized id in
the index to ensure index uniqueness for a given entity (for now don’t worry if you don’t know what
untokenized means, it will soon be clear).
Next you have to mark the fields you want to make searchable. Let’s start with title
and
subtitle
and annotate both with @Field
. The parameter index=Index.YES
will ensure that the
text will be indexed, while analyze=Analyze.YES
ensures that the text will be analyzed using the
default Lucene analyzer. Usually, analyzing or tokenizing means chunking a sentence into individual
words and potentially excluding common words like "a" or "the". We will talk more about analyzers a
little later on.
The third parameter we specify is store=Store.NO
, which ensures that the actual data
will not be stored in the index.
Whether data is stored in the index or not has nothing to do with the ability to search for it.
It is not necessary to store fields in the index to allow Lucene to search for them: the benefit of
storing them is the ability to retrieve them via projections (see Section 5.1.3.5, “Projection”).
Without projections, Hibernate Search will per default execute a Lucene query in order to find the database identifiers of the entities matching the query criteria and use these identifiers to retrieve managed objects from the database. The decision for or against projection has to be made on a case by case basis.
Note that index=Index.YES
, analyze=Analyze.YES
and store=Store.NO
are the default values for
these parameters and could be omitted.
After this short look under the hood let’s go back to annotating the Book
class. Another annotation
we have not yet discussed is @DateBridge
. This annotation is one of the built-in field bridges in
Hibernate Search. The Lucene index is mostly string based, with special support for encoding numbers.
Hibernate Search must convert the data types of the indexed fields to their respective Lucene
encoding and vice versa. A range of predefined bridges is provided for this purpose, including the
DateBridge
which will convert a java.util.Date
into a numeric value (a long
) with the
specified resolution. For more details see Section 4.4.1, “Built-in bridges”.
This leaves us with @IndexedEmbedded
. This annotation is used to index associated entities
(@ManyToMany
, @*ToOne
, @Embedded
and @ElementCollection
) as part of the owning entity.
This is needed since a Lucene index document is a flat data structure which does not know anything
about object relations.
To ensure that the author names will be searchable you have to make sure that the names are indexed
as part of the book itself. On top of @IndexedEmbedded
you will also have to mark all fields of
the associated entity you want to have included in the index with @Indexed
.
For more details see Section 4.1.3, “Embedded and associated objects”.
These settings should be sufficient for now. For more details on entity mapping refer to Section 4.1, “Mapping an entity”.
Example 1.5. Example entities after adding Hibernate Search annotations
package example; ... @Entity @Indexed public class Book { @Id @GeneratedValue private Integer id; @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO) private String title; @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO) private String subtitle; @Field(index = Index.YES, analyze=Analyze.NO, store = Store.YES) @DateBridge(resolution = Resolution.DAY) private Date publicationDate; @IndexedEmbedded @ManyToMany private Set<Author> authors = new HashSet<Author>(); public Book() { } // standard getters/setters follow here ... }
@Entity public class Author { @Id @GeneratedValue private Integer id; @Field private String name; public Author() { } // standard getters/setters follow here ... }
Hibernate Search will transparently index every entity persisted, updated or removed through Hibernate ORM. However, you have to create an initial Lucene index for the data already present in your database. Once you have added the above properties and annotations it is time to trigger an initial batch index of your books. You can achieve this by using one of the following code snippets (see also Section 6.3, “Rebuilding the whole index”):
Example 1.6. Using Hibernate Session to index data
FullTextSession fullTextSession = Search.getFullTextSession(session); fullTextSession.createIndexer().startAndWait();
Example 1.7. Using JPA to index data
EntityManager em = entityManagerFactory.createEntityManager(); FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em); fullTextEntityManager.createIndexer().startAndWait();
After executing the above code, you should be able to see a Lucene index under /var/lucene/indexes/example.Book
(or based on a different path depending how you configured the property hibernate.search.default.directory_provider
).
Go ahead an inspect this index with Luke: it will help you to understand how Hibernate Search works.
Now it is time to execute a first search. The general approach is to create a Lucene query, either
via the Lucene API (Section 5.1.1, “Building a Lucene query using the Lucene API”) or via the Hibernate Search query DSL
(Section 5.1.2, “Building a Lucene query with the Hibernate Search query DSL”), and then wrap this query into a org.hibernate.Query
in order to get all the
functionality one is used to from the Hibernate API. The following code will prepare a query against
the indexed fields, execute it and return a list of Book
instances.
Example 1.8. Using Hibernate Session to create and execute a search
FullTextSession fullTextSession = Search.getFullTextSession(session); Transaction tx = fullTextSession.beginTransaction(); // create native Lucene query using the query DSL // alternatively you can write the Lucene query using the Lucene query parser // or the Lucene programmatic API. The Hibernate Search DSL is recommended though QueryBuilder qb = fullTextSession.getSearchFactory() .buildQueryBuilder().forEntity(Book.class).get(); org.apache.lucene.search.Query query = qb .keyword() .onFields("title", "subtitle", "authors.name") .matching("Java rocks!") .createQuery(); // wrap Lucene query in a org.hibernate.Query org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(query, Book.class); // execute search List result = hibQuery.list(); tx.commit(); session.close();
Example 1.9. Using JPA to create and execute a search
EntityManager em = entityManagerFactory.createEntityManager(); FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em); em.getTransaction().begin(); // create native Lucene query using the query DSL // alternatively you can write the Lucene query using the Lucene query parser // or the Lucene programmatic API. The Hibernate Search DSL is recommended though QueryBuilder qb = fullTextEntityManager.getSearchFactory() .buildQueryBuilder().forEntity(Book.class).get(); org.apache.lucene.search.Query query = qb .keyword() .onFields("title", "subtitle", "authors.name") .matching("Java rocks!") .createQuery(); // wrap Lucene query in a javax.persistence.Query javax.persistence.Query persistenceQuery = fullTextEntityManager.createFullTextQuery(query, Book.class); // execute search List result = persistenceQuery.getResultList(); em.getTransaction().commit(); em.close();
Let’s make things a little more interesting now. Assume that one of your indexed book entities has the title "Refactoring: Improving the Design of Existing Code" and you want to get hits for all of the following queries: "refactor", "refactors", "refactored" and "refactoring". In Lucene this can be achieved by choosing an analyzer class which applies word stemming during the indexing as well as the search process. Hibernate Search offers several ways to configure the analyzer to be used (see Section 4.3.1, “Default analyzer and analyzer by class”):
hibernate.search.analyzer
property in the configuration file.
The specified class will then be the default analyzer.
@Analyzer
annotation at the entity level.
@Analyzer
annotation at the field level.
When using the @Analyzer
annotation one can either specify the fully qualified classname of the
analyzer to use or one can refer to an analyzer definition defined by the @AnalyzerDef
annotation.
In the latter case the analyzer framework with its factories approach is utilized.
To find out more about the factory classes available you can either browse the Lucene JavaDoc or read the corresponding section on the Solr Wiki.
You can use @AnalyzerDef
or @AnalyzerDefs
on any:
*@Indexed
entity regardless of where the analyzer is applied to;
* parent class of an @Indexed
entity;
* package-info.java of a package containing an @Indexed
entity.
This implies that analyzer definitions are global and their names must be unique.
Why the reference to the Apache Solr wiki?
Apache Solr was historically an indepedent sister project of Apache Lucene and the analyzer factory framework was originally created in Solr. Since then the Apache Lucene and Solr projects have merged, but the documentation for these additional analyzers can still be found in the Solr Wiki. You might find other documentation referring to the "Solr Analyzer Framework" - just remember you don’t need to depend on Apache Solr anymore to use it. The required classes are part of the core Lucene distribution.
In the example below a StandardTokenizerFactory
is used followed by two filter factories,
LowerCaseFilterFactory
and SnowballPorterFilterFactory
. The standard tokenizer splits words at
punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It
is a good general purpose tokenizer.
The lowercase filter converts to lowercase the letters in each token
whereas the snowball filter finally applies language specific stemming.
Generally, when using the Analyzer Framework you have to start with a tokenizer followed by an arbitrary number of filters.
Example 1.10. Using @AnalyzerDef
and the Analyzer Framework to define and use an analyzer
@Entity @Indexed @AnalyzerDef(name = "customanalyzer", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class), @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = { @Parameter(name = "language", value = "English") }) }) public class Book { @Id @GeneratedValue @DocumentId private Integer id; @Field @Analyzer(definition = "customanalyzer") private String title; @Field @Analyzer(definition = "customanalyzer") private String subtitle; @IndexedEmbedded @ManyToMany private Set<Author> authors = new HashSet<Author>(); @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES) @DateBridge(resolution = Resolution.DAY) private Date publicationDate; public Book() { } // standard getters/setters follow here ... }
Using @AnalyzerDef
only defines an Analyzer, you still have to apply it to entities and or
properties using @Analyzer
. Like in the above example the customanalyzer
is defined but not
applied on the entity: it’s applied on the title
and subtitle
properties only. An analyzer
definition is global, so you can define it on any entity and reuse the definition on other entities.
The above paragraphs helped you getting an overview of Hibernate Search. The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Chapter 2, Architecture) and explore the basic features in more detail. Two topics which were only briefly touched in this tutorial were analyzer configuration (Section 4.3.1, “Default analyzer and analyzer by class”) and field bridges (Section 4.4, “Bridges”). Both are important features required for more fine-grained indexing. More advanced topics cover clustering (Section 3.4.1, “JMS Master/Slave back end”, Section 3.3.1, “Infinispan Directory configuration”) and large index handling (Section 10.4, “Sharding indexes”).