It is sometimes useful to index an object even if this object is neither inserted into nor updated in the database. This is especially true when you build your index for the first time. You can achieve this goal using the FullTextSession.
FullTextSession fullTextSession = Search.createFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.index(customer);
}
tx.commit(); //index changes are written at commit time
For maximum efficiency, Hibernate Search batches index operations and executes them at commit time. (Note: you do not need to use org.hibernate.Transaction in a JTA environment.)
If you expect to index a lot of data, you need to be careful about memory consumption: since all documents are kept in a queue until the transaction commits, you can potentially face an OutOfMemoryException.
To avoid that, you can set the hibernate.search.worker.batch_size property to a sensible value: all index operations are queued until batch_size is reached. Every time batch_size is reached (or when the transaction is committed), the queue is processed (freeing memory) and emptied. Be aware that the index changes cannot be rolled back if the number of indexed elements goes beyond batch_size. Be also aware that the queue limits also apply to regular transparent indexing (not only when session.index() is used). That's why a sensible batch_size value is recommended.
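For example, the property can be set in hibernate.properties (or the equivalent configuration file); the value 100 below is only an illustrative choice and should be tuned to your memory budget:

```properties
# flush the indexing queue every 100 operations (illustrative value)
hibernate.search.worker.batch_size = 100
```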
Other parameters which also affect indexing time and memory consumption are hibernate.search.[default|<indexname>].batch.merge_factor, hibernate.search.[default|<indexname>].batch.max_merge_docs and hibernate.search.[default|<indexname>].batch.max_buffered_docs. These parameters are Lucene specific and Hibernate Search simply passes them through - see Section 3.7, “Tuning Lucene indexing performance” for more details.
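As a configuration sketch, these Lucene parameters could be set as follows; the values shown are illustrative only (appropriate numbers depend on your data set and heap size - see Section 3.7):

```properties
# Lucene batch indexing parameters, passed through by Hibernate Search
# (illustrative values, applied to the default index)
hibernate.search.default.batch.merge_factor = 10
hibernate.search.default.batch.max_merge_docs = 1000
hibernate.search.default.batch.max_buffered_docs = 1000
```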
Here is an especially efficient way to index a given class (useful for index (re)initialization):
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % batchSize == 0) fullTextSession.clear(); //clear every batchSize since the queue is processed
}
transaction.commit();
It is critical that batchSize in the previous example matches the batch_size value described previously.
It is equally possible to remove an entity or all entities of a given type from a Lucene index without the need to physically remove them from the database. This operation is named purging and is done through the FullTextSession.
FullTextSession fullTextSession = Search.createFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.purge( Customer.class, customer.getId() );
}
tx.commit(); //index changes are written at commit time
Purging will remove the entity with the given id from the Lucene index but will not touch the database.
If you need to remove all entities of a given type, you can use the purgeAll method.
FullTextSession fullTextSession = Search.createFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
fullTextSession.purgeAll( Customer.class );
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index changes are written at commit time
It is recommended to optimize the index after such an operation.
The methods index, purge and purgeAll are available on FullTextEntityManager as well.
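As a sketch, the JPA variant mirrors the Session-based code above; em is assumed to be an already obtained EntityManager, and customerId a known identifier:

```java
// sketch only: requires Hibernate Search on the classpath and a configured EntityManager
FullTextEntityManager fullTextEntityManager =
        Search.createFullTextEntityManager(em);
em.getTransaction().begin();
for (Customer customer : customers) {
    fullTextEntityManager.index(customer); //manually index an entity
}
//remove a single entity from the index, or all entities of a type
fullTextEntityManager.purge( Customer.class, customerId );
fullTextEntityManager.purgeAll( Customer.class );
em.getTransaction().commit(); //index changes are written at commit time
```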