Class SoftIndexFileStore

  • All Implemented Interfaces:
    Lifecycle, AdvancedCacheLoader<Object,​Object>, AdvancedCacheWriter<Object,​Object>, AdvancedLoadWriteStore<Object,​Object>, CacheLoader<Object,​Object>, CacheWriter<Object,​Object>, ExternalStore<Object,​Object>

    public class SoftIndexFileStore
    extends Object
    implements AdvancedLoadWriteStore<Object,​Object>
    Local file-based cache store, optimized for write-through use with strong consistency guarantees (ability to flush disk operations before returning from the store call). * DESIGN: There are three threads operating in the cache-store: - LogAppender: Requests to store entries are passed to the LogAppender thread via queue, then the requestor threads wait until LogAppender notifies them about successful store. LogAppender serializes the writes into append-only file, writes the offset into TemporaryTable and enqueues request to update index into UpdateQueue. The append-only files have limited size, when the file is full, new file is started. - IndexUpdater: Reads the UpdateQueue, applies the operation into B-tree-like structure Index (exact description below) and then removes the entry from TemporaryTable. When the Index is overwriten, the current entry offset is retrieved and IndexUpdater increases the unused space statistics in FileStats. - Compactor: When a limit of unused space in some file is reached (according to FileStats), the Compactor starts reading this file sequentially, querying TemporaryTable or Index for the current entry position and copying the unchanged entries into another file. For the entries that are still valid in the original file, a compare-and-set (file-offset based) request is enqueued into UpdateQueue - therefore this operation cannot interfere with concurrent writes overwriting the entry. Multiple files can be merged into single file during compaction. Structures: - TemporaryTable: keeps the records about current entry location until this is applied to the Index. Each read request goes to the TemporaryTable, if the key is not found here, Index is queried. - UpdateQueue: bounded queue (to prevent grow the TemporaryTable too much) of either forced writes (used for regular stores) or compare-and-set writes (used by Compactor). - FileStats: simple (Concurrent)HashTable with actual file size and amount of unused space for each file. - Index: B+-tree of IndexNodes. The tree is dropped and built a new if the process crashes, it does not need to flush disk operations. On disk it is kept as single random-accessed file, with free blocks list stored in memory. As IndexUpdater may easily become a bottleneck under heavy load, the IndexUpdater thread, UpdateQueue and tree of IndexNodes may be multiplied several times - the Index is divided into Segments. Each segment owns keys according to the hashCode() of the key. Amount of entries in IndexNode is limited by the size it occupies on disk. This size is limited by configurable nodeSize (4096 bytes by default?), only in case that the node contains single pivot (too long) it can be longer. A key_prefix common for all keys in the IndexNode is stored in order to reduce space requirements. For implementation reasons the keys are limited to 32kB - this requirement may be circumvented later. The pivots are not whole keys - it is the shortest part of key that is greater than all left children (but lesser or equal to all right children) - let us call this key_part. The key_parts are sorted in the IndexNode, naturally. On disk it has this format: key_prefix_length(2 bytes), key_prefix, num_parts(2 bytes), ( key_part_length (2 bytes), key_part, left_child_index_node_offset (8 bytes))+, right_child_index_node_offset (8 bytes) In memory, for every child a SoftReference is held. When this reference is empty (but the offset in file is set), any reader may load the reference using double-locking pattern (synchronized over the reference itself). The entry is never loaded by multiple threads in parallel and even may block other threads trying to read this node. For each node in memory a RW-lock is held. When the IndexUpdater thread updates the Index (modifying some IndexNodes), it prepares a copy of these nodes (already stored into index file). Then, in locks only the uppermost node for writing, overwrites the references to new data and unlocks the this node. After that the changed nodes are traversed from top down, write locked and their record in index file is released. Reader threads crawl the tree from top down, locking the parent node (for reading), locking child node and unlocking parent node.
    Radim Vansa <>