JBoss.org Community Documentation

Index atomicity and durability

To provide index consistency and recovery after unexpected crashes or damage, XCMIS uses the write-ahead logging (WAL) technique, a standard approach to transaction logging. Briefly, WAL's central concept is that changes to data files (indexes) must be written only after those changes have been logged, that is, after the change log records have been flushed to permanent storage. If you follow this procedure, you do not need to flush data pages to disk on every transaction commit, because in the event of a crash the index can be recovered from the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
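The log-before-data rule can be illustrated with a minimal Java sketch. It is not XCMIS code: the class `WalSketch`, the one-line `ADD <uuid>` / `REMOVE <uuid>` record format, and the in-memory map standing in for the index are all hypothetical. At commit, only the log is forced to disk with `FileChannel.force`; after a restart, `recover()` replays the log (REDO).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal WAL: one text record per line, "ADD <uuid>" or "REMOVE <uuid>". */
class WalSketch {
    private final Path logFile;
    // Stand-in for the index data files; only updated AFTER the log is durable.
    private final Map<String, Boolean> index = new HashMap<>();

    WalSketch(Path logFile) { this.logFile = logFile; }

    /** Commit: append the records, force them to disk, then apply them to the index. */
    public void commit(List<String> records) throws IOException {
        try (FileChannel ch = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String r : records) {
                ch.write(ByteBuffer.wrap((r + "\n").getBytes(StandardCharsets.UTF_8)));
            }
            ch.force(true); // the only flush needed at commit time
        }
        // Apply to the in-memory index; a real index could defer writing its data files.
        records.forEach(this::apply);
    }

    /** REDO recovery: replay every logged record against the index, in log order. */
    public void recover() throws IOException {
        if (!Files.exists(logFile)) return;
        for (String r : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            if (!r.isBlank()) apply(r);
        }
    }

    private void apply(String record) {
        String[] parts = record.split(" ", 2);
        if (parts[0].equals("ADD")) index.put(parts[1], Boolean.TRUE);
        else if (parts[0].equals("REMOVE")) index.remove(parts[1]);
    }

    public boolean contains(String uuid) { return index.containsKey(uuid); }
}
```

Creating a second `WalSketch` on the same log file and calling `recover()` simulates a restart: the index state is rebuilt purely from the durable log records.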

A major benefit of using WAL is a significantly reduced number of disk writes, because only the log file needs to be flushed to disk at the time of transaction commit, rather than every data file changed by the transaction.

When the indexer starts, it checks for uncommitted transaction logs. If at least one log exists, the recovery process starts. The indexer reads all logs and collects the added, updated and removed UUIDs into a set. It then walks through this set and looks up the object for each UUID. If the object exists, the indexer puts it into the list of added documents; otherwise, the UUID is added to the list of removed documents. Finally, the changes are applied to the index according to the lists of added and removed documents.
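The recovery pass can be sketched as follows, assuming each uncommitted log has already been parsed into a list of UUIDs and that a hypothetical `objectExists` predicate queries the content storage. The names `RecoverySketch` and `RecoveryPlan` are illustrative, and the `record` syntax requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical result of recovery: UUIDs to (re)index and UUIDs to delete. */
record RecoveryPlan(List<String> added, List<String> removed) {}

class RecoverySketch {
    /**
     * @param logs         each inner list holds the UUIDs touched (added, updated or
     *                     removed) by one uncommitted transaction log
     * @param objectExists hypothetical lookup against the content storage
     */
    static RecoveryPlan plan(List<List<String>> logs, Predicate<String> objectExists) {
        // 1. Merge every logged UUID into one set; duplicates across logs collapse.
        Set<String> uuids = new LinkedHashSet<>();
        logs.forEach(uuids::addAll);

        // 2. Classify each UUID by the object's current state in storage.
        List<String> added = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        for (String uuid : uuids) {
            if (objectExists.test(uuid)) {
                added.add(uuid);   // object exists: (re)build its document
            } else {
                removed.add(uuid); // object is gone: drop it from the index
            }
        }
        // 3. The caller applies both lists to the index.
        return new RecoveryPlan(added, removed);
    }
}
```

Because only the object's current state decides the outcome, the sketch does not need to know whether a UUID was logged as added, updated or removed.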

On startup, the indexer also checks the number of documents in the index. If there are no documents in the index, or the previous re-indexation did not complete successfully, re-indexation of all content is started. The first step is cleaning up old index data: uncommitted transaction logs and old persistent data are removed. This data is no longer useful, because all content is about to be re-indexed. The indexer then walks through all objects and builds a Lucene document for each one. These documents are saved to the index in batches of fewer than 100 elements. After re-indexation, all WAL logs are removed, because all the data referenced in these change logs has already been indexed.
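A hedged sketch of the full re-indexation path follows. `SimpleIndex`, `ReindexSketch`, the WAL directory layout and the `"doc:"` placeholder documents are hypothetical stand-ins for the real Lucene-based indexer. Treating the presence of the `reindexProcessing` flag file as the marker of an unsuccessful previous run is an assumption based on the note about that file; the error message matches the one quoted there. The batch limit of 99 is this sketch's reading of "fewer than 100 elements".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical index abstraction; the real indexer writes Lucene documents. */
interface SimpleIndex {
    int documentCount();
    void clear();                     // drop old persistent index data
    void addBatch(List<String> docs); // persist one batch of documents
}

class ReindexSketch {
    static final int MAX_BATCH = 99;  // assumption: "fewer than 100 elements" per batch
    static final String FLAG = "reindexProcessing";

    /** Returns true if a full re-indexation was performed. */
    static boolean reindexIfNeeded(Path indexDir, Path walDir, SimpleIndex index,
                                   Iterable<String> allObjects) throws IOException {
        Path flag = indexDir.resolve(FLAG);
        if (index.documentCount() > 0 && !Files.exists(flag)) {
            return false; // index looks complete; normal WAL recovery applies instead
        }
        Files.createDirectories(indexDir);
        if (!Files.exists(flag)) Files.createFile(flag); // marks re-indexation in progress

        // Step 1: old logs and index data are useless because everything is rebuilt.
        deleteLogs(walDir);
        index.clear();

        // Step 2: build one document per object and save them in batches.
        List<String> batch = new ArrayList<>();
        for (String object : allObjects) {
            batch.add("doc:" + object); // placeholder for building a Lucene document
            if (batch.size() == MAX_BATCH) {
                index.addBatch(new ArrayList<>(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) index.addBatch(batch);

        // Step 3: every logged change is now indexed, so the WAL is discarded.
        deleteLogs(walDir);
        try {
            Files.delete(flag);
        } catch (IOException e) {
            throw new IOException("Can't remove reindex flag.", e);
        }
        return true;
    }

    private static void deleteLogs(Path walDir) throws IOException {
        if (!Files.isDirectory(walDir)) return;
        List<Path> logs;
        try (Stream<Path> s = Files.list(walDir)) {
            logs = s.toList(); // collect first, then delete outside the directory stream
        }
        for (Path log : logs) Files.delete(log);
    }
}
```

Creating the flag before any destructive step and deleting it only at the very end means a crash anywhere in between leaves the flag behind, so the next start re-indexes again.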

Note

If the administrator gets an exception with the message "Can't remove reindex flag.", it means that index restoration finished but the flag file was not removed (look in the index directory for a file named "reindexProcessing"). You can remove this flag file manually to avoid a new re-indexing of the repository on the next JCR start.
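For the manual cleanup described in the note, a small Java helper could look like the following. The class name `RemoveReindexFlag` and its command-line usage are hypothetical; only the flag file name `reindexProcessing` comes from the text. It is assumed to be run while the repository is stopped, so that the indexer is not using the directory at the same time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical admin helper: removes a stale re-index flag from an index directory. */
class RemoveReindexFlag {
    static boolean removeFlag(Path indexDir) throws IOException {
        // deleteIfExists returns false when there was no flag to remove
        return Files.deleteIfExists(indexDir.resolve("reindexProcessing"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RemoveReindexFlag <index-directory>");
            System.exit(2);
        }
        // Assumption: run only while the repository is stopped.
        boolean removed = removeFlag(Paths.get(args[0]));
        System.out.println(removed ? "flag removed" : "no flag found");
    }
}
```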