The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is also a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:
Example 8.1. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
You can always access the Lucene directories through plain Lucene; the Directory structure is in no way different with or without Hibernate Search. However, there are more convenient ways to access a given Directory. The SearchFactory keeps track of the DirectoryProviders per indexed class. One directory provider can be shared among several indexed classes if the classes share the same underlying index directory. While usually not the case, a given entity can have several DirectoryProviders if the index is sharded (see Section 3.2, “Sharding indexes”).
Example 8.2. Accessing the Lucene Directory
DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = provider[0].getDirectory();
In this example, directory points to the Lucene index storing Order information. Note that the obtained Lucene directory must not be closed; closing it is the responsibility of Hibernate Search.
Queries in Lucene are executed on an IndexReader. Hibernate Search caches all index readers to maximize performance. Your code can access these cached resources, but you have to follow some "good citizen" rules.
Example 8.3. Accessing an IndexReader
DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];

ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);

try {
    //do read-only operations on the reader
}
finally {
    readerProvider.closeReader(reader);
}
The ReaderProvider (described in Reader strategy) opens an IndexReader on top of the index(es) referenced by the directory providers. Because this IndexReader is shared among several clients, you must adhere to the following rules:
Never call indexReader.close(), but always call readerProvider.closeReader(reader), preferably in a finally block.
Don't use this IndexReader for modification operations (you would get an exception). If you want to use a read/write index reader, open one from the Lucene Directory object.
Aside from those rules, you can use the IndexReader freely, especially to do native queries. Using the shared IndexReaders will make most queries more efficient.
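The reason for the first rule can be illustrated with a small stand-alone sketch. It does not use the Hibernate Search or Lucene classes; FakeReader and Provider are hypothetical stand-ins, and the user counter is only an assumption about how a shared resource might be tracked, not a description of the actual ReaderProvider implementations. It shows that every client receives the same cached instance, so releasing a claim through the provider leaves the reader usable for others, whereas closing the reader directly would not.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a shared, cached reader; names are illustrative, not Hibernate Search API. */
final class SharedReaderSketch {

    /** Stand-in for an expensive-to-open index reader. */
    static final class FakeReader {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        /** Clients of a shared reader must NOT call this directly. */
        void close() {
            closed = true;
        }
    }

    /** Hands the same reader to every client and counts the outstanding users. */
    static final class Provider {
        private final FakeReader cached = new FakeReader();
        private final AtomicInteger users = new AtomicInteger();

        FakeReader openReader() {
            users.incrementAndGet();
            return cached;
        }

        /** Releases one client's claim; the cached reader itself stays open for others. */
        void closeReader(FakeReader reader) {
            users.decrementAndGet();
        }

        int activeUsers() {
            return users.get();
        }
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        FakeReader a = provider.openReader();
        FakeReader b = provider.openReader();
        try {
            System.out.println(a == b);               // true: both clients share one instance
        } finally {
            provider.closeReader(a);
            provider.closeReader(b);
        }
        System.out.println(provider.activeUsers());   // 0
        System.out.println(a.isClosed());             // false: still usable by later clients
    }
}
```

Had either client called a.close() instead, the other client would have been left holding a closed reader, which is the situation the rules above are meant to prevent.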
Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula, which calculates the score of query q for document d:
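$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$$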
Factor | Description |
---|---|
tf(t in d) | Term frequency factor for the term (t) in the document (d). |
idf(t) | Inverse document frequency of the term. |
coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
t.getBoost() | Field boost. |
norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
Explaining this formula in more detail is beyond the scope of this manual. Please refer to the Similarity Javadocs for more information.
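To make the arithmetic of the formula concrete, the following stand-alone sketch evaluates it in plain Java. It is not Lucene code: the class ScoreFormulaSketch, the TermFactors holder and all numeric values are made up for illustration, and the factors are simply passed in rather than computed from an index.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: plain-Java evaluation of the scoring formula; not Lucene code. */
final class ScoreFormulaSketch {

    /** Per-term factors for one query term t against one document d (hypothetical holder). */
    static final class TermFactors {
        final double tf;
        final double idf;
        final double boost;
        final double norm;

        TermFactors(double tf, double idf, double boost, double norm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.norm = norm;
        }
    }

    /**
     * score(q,d) = coord(q,d) * queryNorm(q)
     *              * sum over t in q of tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)
     */
    static double score(double coord, double queryNorm, List<TermFactors> terms) {
        double sum = 0.0;
        for (TermFactors t : terms) {
            sum += t.tf * t.idf * t.idf * t.boost * t.norm;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        List<TermFactors> terms = Arrays.asList(
                new TermFactors(2.0, 1.5, 1.0, 0.5),    // 2.0 * 1.5^2 * 1.0 * 0.5  = 2.25
                new TermFactors(1.0, 2.0, 2.0, 0.25));  // 1.0 * 2.0^2 * 2.0 * 0.25 = 2.0
        // coord = 1.0, queryNorm = 0.5, so the score is 1.0 * 0.5 * (2.25 + 2.0)
        System.out.println(score(1.0, 0.5, terms));     // prints 2.125
    }
}
```

Note that idf enters the sum squared, so a rare term influences the score more strongly than its raw idf value alone would suggest.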
Hibernate Search provides two ways to modify Lucene's similarity calculation. First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity. Additionally, you can override the default similarity at class level using the @Similarity annotation:
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}
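For the global setting, the property is an ordinary key/value pair. The sketch below builds it with java.util.Properties; the class SimilarityConfigSketch and the class name com.example.search.DummySimilarity are hypothetical, and how the properties reach Hibernate (a properties file, programmatic configuration, and so on) depends on your setup and is not shown here.

```java
import java.util.Properties;

/** Sketch: building the configuration property that selects a default Similarity. */
final class SimilarityConfigSketch {

    /** Property key taken from the text above. */
    static final String SIMILARITY_KEY = "hibernate.search.similarity";

    /** Returns properties that name the given Similarity implementation class. */
    static Properties withDefaultSimilarity(String fullyQualifiedClassName) {
        Properties props = new Properties();
        props.setProperty(SIMILARITY_KEY, fullyQualifiedClassName);
        return props;
    }

    public static void main(String[] args) {
        // "com.example.search.DummySimilarity" is a hypothetical class name.
        Properties props = withDefaultSimilarity("com.example.search.DummySimilarity");
        System.out.println(props.getProperty(SIMILARITY_KEY));
    }
}
```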
As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method tf(float freq) should return 1.0.
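The effect of such a flat term-frequency factor can be sketched without Lucene on the classpath. The TermFrequency interface below is a hypothetical stand-in that mirrors only the tf(float freq) method; in a real project the method would be overridden in your own Similarity implementation. The square-root variant is included for comparison and reflects how DefaultSimilarity is commonly understood to damp frequency; treat it as an assumption and check the Javadocs of your Lucene version.

```java
/** Illustrative stand-in for the tf(float freq) hook; this is not the Lucene class itself. */
final class FlatTfSketch {

    /** Hypothetical minimal interface mirroring the single method discussed above. */
    interface TermFrequency {
        float tf(float freq);
    }

    /** Square-root damping, shown for comparison (assumed default behaviour). */
    static final TermFrequency SQRT_TF = freq -> (float) Math.sqrt(freq);

    /** Flat variant: every document containing the term gets the same factor. */
    static final TermFrequency FLAT_TF = freq -> 1.0f;

    public static void main(String[] args) {
        float[] freqs = {1f, 4f, 9f};
        for (float f : freqs) {
            System.out.println(f + " -> sqrt=" + SQRT_TF.tf(f) + ", flat=" + FLAT_TF.tf(f));
        }
    }
}
```

With the flat variant, a document mentioning the term nine times contributes the same tf factor as one mentioning it once, so only the remaining factors of the formula differentiate the two.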
When two entities share the same index, they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so overriding the Similarity implementation in a subtype is not allowed.