From time to time, the Lucene index needs to be optimized. The process is essentially a defragmentation: until the optimization occurs, deleted documents are only marked as deleted and are not physically removed. The optimization can also adjust the number of files in the Lucene Directory.
The optimization speeds up searches but in no way speeds up indexing (updates). Searches can still be performed during an optimization (though they will most likely be slowed down), but all index updates are stopped. Prefer optimizing:
on an idle system or when searches are less frequent
after a lot of index modifications (doing so before will not speed up the indexing process)
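The two recommendations above can be combined into a simple check before calling optimize manually. The following is a minimal sketch, not part of Hibernate Search: the class OptimizeGate, its methods, and its thresholds are hypothetical names chosen for illustration, and the application is assumed to report its own searches and index modifications to it.

```java
/**
 * Hypothetical helper (NOT a Hibernate Search class): decides whether now is
 * a reasonable moment to trigger a manual optimization.
 */
final class OptimizeGate {
    private final int minModifications; // what counts as "a lot of index modifications"
    private final long minIdleMillis;   // time without searches that counts as "idle"

    private int pendingModifications;
    private long lastSearchMillis;

    OptimizeGate(int minModifications, long minIdleMillis) {
        this.minModifications = minModifications;
        this.minIdleMillis = minIdleMillis;
    }

    /** Call once per index insertion, update, or deletion. */
    synchronized void recordModification() {
        pendingModifications++;
    }

    /** Call whenever a search is executed, passing the current time. */
    synchronized void recordSearch(long nowMillis) {
        lastSearchMillis = nowMillis;
    }

    /** True when enough changes have accumulated and no search ran recently. */
    synchronized boolean shouldOptimize(long nowMillis) {
        boolean enoughChanges = pendingModifications >= minModifications;
        boolean idle = nowMillis - lastSearchMillis >= minIdleMillis;
        return enoughChanges && idle;
    }

    /** Call after the optimization has run, to start counting afresh. */
    synchronized void optimized() {
        pendingModifications = 0;
    }
}
```

A caller would invoke the optimize operation only when shouldOptimize(System.currentTimeMillis()) returns true, then call optimized(). Suitable thresholds depend entirely on the application's workload.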
Hibernate Search can automatically optimize an index after:
a certain number of operations (insertions, deletions) have been applied
or a certain number of transactions have been applied
The configuration can be global or defined at the index level:
hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max = 100
hibernate.search.Animal.optimizer.transaction_limit.max = 50
An optimization will be triggered on the Animal index as soon as either:
the number of additions and deletions reaches 1000
the number of transactions reaches 50 (hibernate.search.Animal.optimizer.transaction_limit.max having priority over hibernate.search.default.optimizer.transaction_limit.max)
If none of these parameters are defined, no optimization is processed automatically.
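The override rule described above (an index-level value takes priority over the default one, and an unset limit never triggers anything) can be illustrated with a small self-contained sketch. This is not Hibernate Search's internal implementation: the class OptimizerLimits and its methods are hypothetical, and only the property names come from the configuration shown earlier.

```java
import java.util.Properties;

/**
 * Illustrative sketch only (NOT Hibernate Search internals): resolves the
 * optimizer limits for one index and tracks when an optimization is due.
 */
final class OptimizerLimits {
    private static final String PREFIX = "hibernate.search.";

    final long operationLimit;   // 0 means "no limit configured"
    final long transactionLimit; // 0 means "no limit configured"

    private long operations;
    private long transactions;

    OptimizerLimits(Properties config, String indexName) {
        this.operationLimit = resolve(config, indexName, "optimizer.operation_limit.max");
        this.transactionLimit = resolve(config, indexName, "optimizer.transaction_limit.max");
    }

    /** The index-specific key wins; otherwise fall back to the "default" key. */
    static long resolve(Properties config, String indexName, String suffix) {
        String value = config.getProperty(PREFIX + indexName + "." + suffix);
        if (value == null) {
            value = config.getProperty(PREFIX + "default." + suffix);
        }
        return value == null ? 0L : Long.parseLong(value.trim());
    }

    /** Records one transaction; returns true if either limit has been reached. */
    boolean recordTransaction(int operationsInTransaction) {
        operations += operationsInTransaction;
        transactions++;
        boolean due = (operationLimit > 0 && operations >= operationLimit)
                || (transactionLimit > 0 && transactions >= transactionLimit);
        if (due) { // counters restart once an optimization would be triggered
            operations = 0;
            transactions = 0;
        }
        return due;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("hibernate.search.default.optimizer.operation_limit.max", "1000");
        config.setProperty("hibernate.search.default.optimizer.transaction_limit.max", "100");
        config.setProperty("hibernate.search.Animal.optimizer.transaction_limit.max", "50");

        OptimizerLimits animal = new OptimizerLimits(config, "Animal");
        // prints "1000 / 50"
        System.out.println(animal.operationLimit + " / " + animal.transactionLimit);
    }
}
```

With the sample configuration, the Animal index inherits the default operation limit of 1000 but uses its own transaction limit of 50. An index resolved against an empty configuration gets 0 for both limits, so recordTransaction never reports an optimization as due.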
You can programmatically optimize (defragment) a Lucene index from Hibernate Search through the SearchFactory:
searchFactory.optimize(Order.class);
searchFactory.optimize();
The first example optimizes the Lucene index holding Orders; the second optimizes all indexes.
The SearchFactory can be accessed from a FullTextSession:
FullTextSession fullTextSession = Search.createFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
Note that searchFactory.optimize() has no effect on a JMS backend. You must apply the optimize operation on the Master node.
Apache Lucene has a few parameters that influence how optimization is performed. Hibernate Search exposes those parameters.
Further index optimization parameters include hibernate.search.[default|<indexname>].merge_factor, hibernate.search.[default|<indexname>].max_merge_docs and hibernate.search.[default|<indexname>].max_buffered_docs. See Section 3.7, “Tuning Lucene indexing performance” for more details.