It is sometimes useful to index an entity even if it is not being
inserted into or updated in the database. This is the case, for example,
when you build your index for the first time.
FullTextSession.index() allows you to do so.
Example 6.1. Indexing an entity via FullTextSession.index()
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.index(customer);
}
tx.commit(); //index changes are written at commit time
For maximum efficiency, Hibernate Search batches index operations
and executes them at commit time. If you expect to index a lot of data,
however, you need to be careful about memory consumption, since all
documents are kept in a queue until the transaction commits. You can
potentially run into an OutOfMemoryError. To avoid this, you can call
fullTextSession.flushToIndexes(). Every time
fullTextSession.flushToIndexes() is called (or the transaction is
committed), the batch queue is processed, applying all index changes
and freeing memory. Be aware that once changes are flushed, they
cannot be rolled back.
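The effect of flushing periodically on queue size can be illustrated with a toy model. The sketch below is not Hibernate Search code: FlushQueueSketch, index(), flush() and peakWithBatchSize() are hypothetical names that only mimic a work queue that grows until it is flushed. It assumes each queued document costs the same amount of memory.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushQueueSketch {

    // Toy stand-in for the pending-work queue; NOT Hibernate Search's real implementation.
    private final List<String> pending = new ArrayList<>();
    private int peak = 0;

    // Queue one document and record the largest queue size seen so far.
    void index(String document) {
        pending.add(document);
        peak = Math.max(peak, pending.size());
    }

    // Plays the role of flushToIndexes(): the queued work is applied and released.
    void flush() {
        pending.clear();
    }

    // Queues `total` documents, flushing after every `batchSize` documents,
    // then flushes the remainder (as a commit would). Returns the peak queue size.
    static int peakWithBatchSize(int total, int batchSize) {
        FlushQueueSketch queue = new FlushQueueSketch();
        for (int i = 1; i <= total; i++) {
            queue.index("doc-" + i);
            if (i % batchSize == 0) {
                queue.flush();
            }
        }
        queue.flush();
        return queue.peak;
    }

    public static void main(String[] args) {
        System.out.println(peakWithBatchSize(1000, 100));  // prints 100
        System.out.println(peakWithBatchSize(1000, 5000)); // prints 1000
    }
}
```

In this model the queue never holds more than one batch when flushes are regular, whereas without an intermediate flush it holds every document until the end.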
hibernate.search.worker.batch_size has been deprecated in favor of this
explicit API, which provides better control.
Other parameters that can also affect indexing time and memory consumption are:
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_docs
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_length
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size
hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_interval
These parameters are Lucene-specific; Hibernate Search simply passes them through. See Section 3.8, “Tuning Lucene indexing performance” for more details.
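As an illustration of how the key pattern above expands, the following sketch assembles a few of these properties in a java.util.Properties object. The class IndexWriterTuning and its methods are hypothetical helpers, and the numeric values are placeholders chosen for the example, not recommendations.

```java
import java.util.Properties;

public class IndexWriterTuning {

    // Builds one property key. indexName is "default" or a specific index name;
    // mode is "batch" or "transaction"; setting is one of the names listed above.
    static String key(String indexName, String mode, String setting) {
        return "hibernate.search." + indexName + ".indexwriter." + mode + "." + setting;
    }

    // Illustrative settings only: the values are assumptions, not tuned defaults.
    static Properties batchTuning() {
        Properties props = new Properties();
        props.setProperty(key("default", "batch", "ram_buffer_size"), "64");
        props.setProperty(key("default", "batch", "merge_factor"), "20");
        props.setProperty(key("default", "transaction", "max_buffered_docs"), "10");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key/value pairs (iteration order is not guaranteed).
        batchTuning().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The same keys can equally be written directly in whichever configuration file your application already uses for Hibernate properties.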
Example 6.2. Efficiently indexing a given class (useful for index (re)initialization)
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .setFetchSize(BATCH_SIZE)
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % BATCH_SIZE == 0) {
        fullTextSession.flushToIndexes(); //apply changes to indexes
        fullTextSession.clear(); //clear since the queue is processed
    }
}
transaction.commit();
Choose a batch size small enough that the entities loaded and the index work queued between two flushes fit in memory, so that your application does not run out of memory.