ignite-dev mailing list archives

From "Alexey Goncharuk (Jira)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-12263) Introduce native persistence compaction operation
Date Mon, 07 Oct 2019 08:30:00 GMT
Alexey Goncharuk created IGNITE-12263:

             Summary: Introduce native persistence compaction operation
                 Key: IGNITE-12263
                 URL: https://issues.apache.org/jira/browse/IGNITE-12263
             Project: Ignite
          Issue Type: Improvement
            Reporter: Alexey Goncharuk

Currently, Ignite native persistence does not shrink storage files after key-value pairs are
removed. The causes of this behavior are:
 * The absence of a mechanism that allows Ignite to track the highest non-empty page position
in a partition file
 * The absence of a mechanism that allows Ignite to select the page closest to the beginning
of the file for a write
 * The absence of a mechanism that allows Ignite to move a key-value pair from page to page
during defragmentation

As an initial change, I suggest introducing a new node startup mode that runs a defragmentation
procedure allowing the node to shrink its storage files. The procedure will not mutate the logical
state of a partition, so a subsequent historical rebalance can quickly bring the node up to date.
Since the procedure will run during node startup (during the final stages of recovery), there
will be no concurrent load, so entries can be freely moved from page to page without
tricky synchronization.

If the procedure is applied during a restart of the whole cluster, all nodes will be defragmented
simultaneously, which gives a quicker parallel defragmentation at the cost of downtime.

The procedure should accept an optional list of cache groups, so that defragmentation can be
limited to an arbitrary selection of cache groups.
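
A minimal sketch of how such an optional selection could be resolved. The function name, its
parameters, and the rule that an omitted list means "all cache groups" are assumptions made for
illustration; none of this is an existing Ignite API or startup option.

```python
def select_cache_groups(all_groups, requested=None):
    """Return the cache groups to defragment, in the order they appear in `all_groups`.

    Hypothetical helper: `requested` is the optional list from the proposal.
    An empty or missing list is assumed to mean "defragment everything".
    """
    if not requested:
        return list(all_groups)

    # Fail early on names that do not match any known cache group.
    unknown = set(requested) - set(all_groups)
    if unknown:
        raise ValueError(f"Unknown cache groups: {sorted(unknown)}")

    wanted = set(requested)
    return [group for group in all_groups if group in wanted]
```

Rejecting unknown names up front is one possible design choice; silently ignoring them would also
be defensible, but would make a typo in the list look like a successful run.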

An outline of the actions taken for each partition selected for defragmentation:
 * Partition pages are preloaded into memory if possible to avoid excessive page replacement.
During the scan, the high-water mark (HWM) of the written data is detected (empty pages are skipped)
 * Page references in the free list are sorted so that the pages closest to the file start
are picked first
 * The partition is scanned in reverse order; key-value pairs are moved closer to the file
start, and the HWM is updated accordingly. This step is particularly open to optimization
because different strategies will work well for different fragmentation patterns.
 * After the scan is complete, the file size can be reduced to match the HWM
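
The steps above can be illustrated with a toy model. This is only a sketch under simplifying
assumptions, not Ignite code: a partition is modelled as a Python list of pages, each page is a
list of (key, value) pairs, and the fixed per-page capacity, the function names, and the use of
a min-heap as the sorted free list are all hypothetical.

```python
import heapq

PAGE_CAPACITY = 2  # hypothetical: maximum number of entries per page


def find_hwm(pages):
    """Return the index of the highest non-empty page, or -1 if every page is empty."""
    for idx in range(len(pages) - 1, -1, -1):
        if pages[idx]:
            return idx
    return -1


def defragment(pages, capacity=PAGE_CAPACITY):
    """Compact `pages` in place and return the new number of pages."""
    hwm = find_hwm(pages)

    # "Free list": indexes of pages with spare room, lowest index first.
    free = [i for i in range(hwm + 1) if len(pages[i]) < capacity]
    heapq.heapify(free)

    # Scan from the HWM towards the file start, moving entries into the
    # lowest-indexed page that still has room and lies below the source page.
    src = hwm
    while src >= 0:
        while pages[src] and free and free[0] < src:
            dst = free[0]
            pages[dst].append(pages[src].pop())
            if len(pages[dst]) == capacity:
                heapq.heappop(free)  # target page is now full
        if pages[src]:
            break  # every page below `src` is full, nothing more to move
        src -= 1

    # Shrink the "file" to the new HWM.
    del pages[find_hwm(pages) + 1:]
    return len(pages)


if __name__ == "__main__":
    partition = [[("a", 1)], [], [("b", 2), ("c", 3)], [], [("d", 4)]]
    print(defragment(partition), partition)
```

In this example the partition shrinks from five pages to two while the set of key-value pairs
stays the same, which mirrors the requirement that the logical state of the partition is not
mutated. A real implementation would also have to update the index structures that point at the
moved entries, which this model leaves out.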

As a further improvement, this partition defragmentation procedure can later be run in online
mode, once the necessary changes to the cache update protocol have been designed.

This message was sent by Atlassian Jira
