Welcome to Hibernate Search. The following chapter will guide you through the initial steps required to integrate Hibernate Search into an existing Hibernate enabled application. If you are new to Hibernate we recommend you start here.
Table 1.1. System requirements
Java Runtime | A JDK or JRE version 5 or greater. You can download a Java Runtime for Windows/Linux/Solaris here. If using Java version 7, make sure you avoid builds 0 and 1: those versions contained an optimisation bug which would be triggered by Lucene. |
Hibernate Search | hibernate-search-3.4.2.Final.jar and all runtime dependencies. You can get the jar artifacts either from the dist/lib directory of the Hibernate Search distribution or you can download them from the JBoss maven repository. |
Hibernate Core | These instructions have been tested against Hibernate 3.6. You will need hibernate-core-3.6.7.Final.jar and its transitive dependencies (either from the distribution bundle or the maven repository). |
JPA 2 | Even though Hibernate Search can be used without JPA annotations, the following instructions will use them for basic entity configuration (@Entity, @Id, @OneToMany, ...). This part of the configuration could also be expressed in XML or code. Hibernate Search, however, has its own set of annotations (@Indexed, @DocumentId, @Field, ...) for which no alternative configuration exists so far. |
Instead of managing all dependencies manually, maven users can use the JBoss maven repository. Add the following to your Maven settings.xml file (see also Maven Getting Started):
Example 1.1. Adding the JBoss maven repository to
settings.xml
<settings>
...
<profiles>
...
<profile>
<id>jboss-public-repository</id>
<repositories>
<repository>
<id>jboss-public-repository-group</id>
<name>JBoss Public Maven Repository Group</name>
<url>https://repository.jboss.org/nexus/content/groups/public-jboss/</url>
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>jboss-public-repository-group</id>
<name>JBoss Public Maven Repository Group</name>
<url>https://repository.jboss.org/nexus/content/groups/public-jboss/</url>
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>never</updatePolicy>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
<activeProfiles>
<activeProfile>jboss-public-repository</activeProfile>
</activeProfiles>
...
</settings>
Then add the following dependencies to your pom.xml:
Example 1.2. Maven dependencies for Hibernate Search
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-search</artifactId>
<version>3.4.2.Final</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>3.6.7.Final</version>
</dependency>
Only the hibernate-search dependency is mandatory: together with its transitive dependencies it contains all classes needed to use Hibernate Search. hibernate-entitymanager is only required if you want to use Hibernate Search in conjunction with JPA.
There is no XML configuration available for Hibernate Search, but we provide a powerful programmatic mapping API that elegantly replaces this kind of deployment form (see Section 4.6, “Programmatic API” for more information).
Once you have downloaded and added all required dependencies to your application you have to add a couple of properties to your Hibernate configuration file. If you are using Hibernate directly this can be done in hibernate.properties or hibernate.cfg.xml. If you are using Hibernate via JPA you can also add the properties to persistence.xml. The good news is that for standard use most properties offer a sensible default. An example persistence.xml configuration could look like this:
Example 1.3. Basic configuration options to be added to hibernate.properties, hibernate.cfg.xml or persistence.xml
...
<property name="hibernate.search.default.directory_provider"
value="filesystem"/>
<property name="hibernate.search.default.indexBase"
value="/var/lucene/indexes"/>
...
First you have to tell Hibernate Search which DirectoryProvider to use. This can be achieved by setting the hibernate.search.default.directory_provider property. Apache Lucene has the notion of a Directory to store the index files. Hibernate Search handles the initialization and configuration of a Lucene Directory instance via a DirectoryProvider. In this tutorial we will use a directory provider storing the index in the file system. This will give us the ability to physically inspect the Lucene indexes created by Hibernate Search (eg via Luke).
Once you have a working configuration you can start experimenting with other directory providers (see Section 3.2, “Directory configuration”). Next to the directory provider you also have to specify the default base directory for all indexes via hibernate.search.default.indexBase.
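If you configure Hibernate directly via hibernate.properties instead of XML, the same two options could be written as follows (same property names and the same example index location as above):

```
hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /var/lucene/indexes
```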
Let's assume that your application contains the Hibernate managed classes example.Book and example.Author and you want to add free text search capabilities to your application in order to search the books contained in your database.
Example 1.4. Example entities Book and Author before adding Hibernate Search specific annotations
package example;
...
@Entity
public class Book {
@Id
@GeneratedValue
private Integer id;
private String title;
private String subtitle;
@ManyToMany
private Set<Author> authors = new HashSet<Author>();
private Date publicationDate;
public Book() {}
// standard getters/setters follow here
...
}
package example;
...
@Entity
public class Author {
@Id
@GeneratedValue
private Integer id;
private String name;
public Author() {}
// standard getters/setters follow here
...
}
To achieve this you have to add a few annotations to the Book and Author classes. The first annotation @Indexed marks Book as indexable. By design Hibernate Search needs to store an untokenized id in the index to ensure index uniqueness for a given entity. @DocumentId marks the property to use for this purpose and is in most cases the same as the database primary key. The @DocumentId annotation is optional in the case where an @Id annotation exists.
Next you have to mark the fields you want to make searchable. Let's start with title and subtitle and annotate both with @Field. The parameter index=Index.TOKENIZED will ensure that the text will be tokenized using the default Lucene analyzer. Usually, tokenizing means chunking a sentence into individual words and potentially excluding common words like 'a' or 'the'. We will talk more about analyzers a little later on. The second parameter we specify within @Field, store=Store.NO, ensures that the actual data will not be stored in the index. Whether this data is stored in the index or not has nothing to do with the ability to search for it. From Lucene's perspective it is not necessary to keep the data once the index is created. The benefit of storing it is the ability to retrieve it via projections (see Section 5.1.3.5, “Projection”).
Without projections, Hibernate Search will by default execute a Lucene query in order to find the database identifiers of the entities matching the query criteria and use these identifiers to retrieve managed objects from the database. The decision for or against projection has to be made on a case-by-case basis. The default behaviour is recommended since it returns managed objects whereas projections only return object arrays.
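For illustration, a projection query in the Session API could look like the following sketch. It assumes an already built Lucene query object query, and that the projected fields were indexed with store=Store.YES (unlike in this tutorial's mapping, which uses Store.NO):

```
// Sketch only: projection requires the projected fields to be stored in the index
org.hibernate.search.FullTextQuery hibQuery =
    fullTextSession.createFullTextQuery(query, Book.class);
hibQuery.setProjection("title", "subtitle");
List results = hibQuery.list(); // each element is an Object[], not a managed Book
```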
After this short look under the hood let's go back to annotating the Book class. Another annotation we have not yet discussed is @DateBridge. This annotation is one of the built-in field bridges in Hibernate Search. The Lucene index is purely string based. For this reason Hibernate Search must convert the data types of the indexed fields to strings and vice versa. A range of predefined bridges is provided, including the DateBridge which will convert a java.util.Date into a String with the specified resolution. For more details see Section 4.4, “Bridges”.
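The effect of a DAY resolution can be sketched in plain Java. Whatever string format the bridge uses internally is an implementation detail; the essential point is that the time-of-day part is dropped, so all timestamps from the same day map to the same indexed string. The yyyyMMdd pattern below is an assumption chosen purely for illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DayResolutionSketch {
    // Hypothetical helper mimicking what a day-resolution date bridge does:
    // format the date without its time component so that all timestamps on
    // the same day produce an identical index token.
    static String toDayString(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date morning = new Date(0L);        // 1970-01-01T00:00:00Z
        Date evening = new Date(82800000L); // 1970-01-01T23:00:00Z
        // Both timestamps collapse to the same token at DAY resolution.
        System.out.println(toDayString(morning));
        System.out.println(toDayString(evening));
    }
}
```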
This leaves us with @IndexedEmbedded. This annotation is used to index associated entities (@ManyToMany, @*ToOne and @Embedded) as part of the owning entity. This is needed since a Lucene index document is a flat data structure which does not know anything about object relations. To ensure that the authors' name will be searchable you have to make sure that the names are indexed as part of the book itself. On top of @IndexedEmbedded you will also have to mark all fields of the associated entity you want to have included in the index with @Field. For more details see Section 4.1.3, “Embedded and associated objects”.
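To visualize why this flattening matters, here is a plain-Java sketch of the resulting document structure. The field names mirror the real prefixing scheme (such as authors.name), but the Map representation and the sample values are purely illustrative, not an actual Lucene API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatDocumentSketch {
    // Hypothetical illustration of the flat document produced for a Book with
    // @IndexedEmbedded authors: there is no nested Author object in the index,
    // only flat fields whose names carry the association prefix.
    static Map<String, String> buildBookDocument() {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("title", "An Example Book Title");
        doc.put("subtitle", "An Example Subtitle");
        doc.put("authors.name", "Jane Doe John Smith"); // embedded, flattened
        return doc;
    }

    public static void main(String[] args) {
        // The author names are searchable only because they were indexed
        // as part of the book document itself.
        System.out.println(buildBookDocument().keySet());
    }
}
```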
These settings should be sufficient for now. For more details on entity mapping refer to Section 4.1, “Mapping an entity”.
Example 1.5. Example entities after adding Hibernate Search annotations
package example;
...
@Entity @Indexed
public class Book {
@Id
@GeneratedValue
private Integer id;
@Field(index=Index.TOKENIZED, store=Store.NO)
private String title;
@Field(index=Index.TOKENIZED, store=Store.NO)
private String subtitle;
@IndexedEmbedded
@ManyToMany
private Set<Author> authors = new HashSet<Author>();

@Field(index = Index.UN_TOKENIZED, store = Store.YES)
@DateBridge(resolution = Resolution.DAY)
private Date publicationDate;
public Book() {
}
// standard getters/setters follow here
...
}
package example;
...
@Entity
public class Author {
@Id
@GeneratedValue
private Integer id;
@Field(index=Index.TOKENIZED, store=Store.NO)
private String name;
public Author() {
}
// standard getters/setters follow here
...
}
Hibernate Search will transparently index every entity persisted, updated or removed through Hibernate Core. However, you have to create an initial Lucene index for the data already present in your database. Once you have added the above properties and annotations it is time to trigger an initial batch index of your books. You can achieve this by using one of the following code snippets (see also Section 6.3, “Rebuilding the whole index”):
Example 1.6. Using Hibernate Session to index data
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.createIndexer().startAndWait();
Example 1.7. Using JPA to index data
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
fullTextEntityManager.createIndexer().startAndWait();
After executing the above code, you should be able to see a Lucene index under /var/lucene/indexes/example.Book. Go ahead and inspect this index with Luke. It will help you to understand how Hibernate Search works.
Now it is time to execute a first search. The general approach is to create a Lucene query (either via the Lucene API (Section 5.1.1, “Building a Lucene query using the Lucene API”) or via the Hibernate Search query DSL (Section 5.1.2, “Building a Lucene query with the Hibernate Search query DSL”)) and then wrap this query into a org.hibernate.Query in order to get all the functionality one is used to from the Hibernate API. The following code will prepare a query against the indexed fields, execute it and return a list of Books.
Example 1.8. Using Hibernate Session to create and execute a search
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
// create native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity( Book.class ).get();
org.apache.lucene.search.Query query = qb
.keyword()
.onFields("title", "subtitle", "authors.name", "publicationDate")
    .matching("Java rocks!")
.createQuery();
// wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibQuery =
fullTextSession.createFullTextQuery(query, Book.class);
// execute search
List result = hibQuery.list();
tx.commit();
session.close();
Example 1.9. Using JPA to create and execute a search
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
em.getTransaction().begin();
// create native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity( Book.class ).get();
org.apache.lucene.search.Query query = qb
.keyword()
.onFields("title", "subtitle", "authors.name", "publicationDate")
.matching("Java rocks!")
.createQuery();
// wrap Lucene query in a javax.persistence.Query
javax.persistence.Query persistenceQuery =
fullTextEntityManager.createFullTextQuery(query, Book.class);
// execute search
List result = persistenceQuery.getResultList();
em.getTransaction().commit();
em.close();
Let's make things a little more interesting now. Assume that one of your indexed book entities has the title "Refactoring: Improving the Design of Existing Code" and you want to get hits for all of the following queries: "refactor", "refactors", "refactored" and "refactoring". In Lucene this can be achieved by choosing an analyzer class which applies word stemming during the indexing as well as the search process. Hibernate Search offers several ways to configure the analyzer to be used (see Section 4.3.1, “Default analyzer and analyzer by class”):
Setting the hibernate.search.analyzer property in the configuration file. The specified class will then be the default analyzer.
Setting the @Analyzer annotation at the entity level.
Setting the @Analyzer annotation at the field level.
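As an illustration of the first option, the default analyzer could be set in persistence.xml as follows. The example value is Lucene's StandardAnalyzer, which is also the default, so this particular setting is shown only to demonstrate the mechanism:

```
<property name="hibernate.search.analyzer"
          value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
```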
When using the @Analyzer annotation one can either specify the fully qualified classname of the analyzer to use or one can refer to an analyzer definition defined by the @AnalyzerDef annotation. In the latter case the Solr analyzer framework with its factories approach is utilized. To find out more about the factory classes available you can either browse the Solr JavaDoc or read the corresponding section on the Solr Wiki.
In the example below a StandardTokenizerFactory is used followed by two filter factories, LowerCaseFilterFactory and SnowballPorterFilterFactory. The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. The lowercase filter lowercases the letters in each token whereas the snowball filter finally applies language specific stemming.
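To make the effect of stemming concrete, here is a deliberately naive plain-Java sketch. The real SnowballPorterFilterFactory applies the full Porter stemming algorithm; this toy suffix-stripper only illustrates why "refactoring", "refactors" and "refactored" can all collapse to the same index token:

```java
public class NaiveStemmerSketch {
    // Toy stemmer for illustration only: lowercase the token, then strip a
    // few common English suffixes. A real stemming filter handles far more
    // cases and irregular forms.
    static String stem(String word) {
        String token = word.toLowerCase();
        if (token.endsWith("ing")) return token.substring(0, token.length() - 3);
        if (token.endsWith("ed"))  return token.substring(0, token.length() - 2);
        if (token.endsWith("s"))   return token.substring(0, token.length() - 1);
        return token;
    }

    public static void main(String[] args) {
        // All four variants reduce to the same token, so a query for any of
        // them matches a document indexed with any other.
        System.out.println(stem("Refactoring"));
        System.out.println(stem("refactors"));
        System.out.println(stem("refactored"));
        System.out.println(stem("refactor"));
    }
}
```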
Generally, when using the Solr framework you have to start with a tokenizer followed by an arbitrary number of filters.
Example 1.10. Using @AnalyzerDef and the Solr framework to define and use an analyzer
@Entity
@Indexed
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = { @Parameter(name = "language", value = "English") })
    })
public class Book {
@Id
@GeneratedValue
@DocumentId
private Integer id;
@Field(index=Index.TOKENIZED, store=Store.NO)
@Analyzer(definition = "customanalyzer")
private String title;
@Field(index=Index.TOKENIZED, store=Store.NO)
@Analyzer(definition = "customanalyzer")
private String subtitle;
@IndexedEmbedded
@ManyToMany
private Set<Author> authors = new HashSet<Author>();

@Field(index = Index.UN_TOKENIZED, store = Store.YES)
@DateBridge(resolution = Resolution.DAY)
private Date publicationDate;
public Book() {
}
// standard getters/setters follow here
...
}
The above paragraphs gave you an overview of Hibernate Search. The next step after this tutorial is to get more familiar with the overall architecture of Hibernate Search (Chapter 2, Architecture) and to explore the basic features in more detail. Two topics which were only briefly touched upon in this tutorial were analyzer configuration (Section 4.3.1, “Default analyzer and analyzer by class”) and field bridges (Section 4.4, “Bridges”). Both are important features required for more fine-grained indexing. More advanced topics cover clustering (Section 3.6, “JMS Master/Slave configuration”, Section 3.8, “Infinispan Directory configuration”) and large index handling (Section 3.3, “Sharding indexes”).