ignite-issues mailing list archives

From "Denis A. Magda (Jira)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-12263) Introduce native persistence compaction operation
Date Mon, 07 Oct 2019 17:25:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis A. Magda updated IGNITE-12263:
------------------------------------

Dev list discussion: http://apache-ignite-developers.2346864.n4.nabble.com/How-to-free-up-space-on-disc-after-removing-entries-from-IgniteCache-with-enabled-PDS-td39839.html

> Introduce native persistence compaction operation
> -------------------------------------------------
>
>                 Key: IGNITE-12263
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12263
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Priority: Critical
>
> Currently, Ignite native persistence does not shrink storage files after key-value pairs are removed.
> The causes of this behavior are:
>  * The absence of a mechanism that allows Ignite to track the highest non-empty page position in a partition file
>  * The absence of a mechanism that allows Ignite to select the page closest to the beginning of the file for a write
>  * The absence of a mechanism that allows Ignite to move a key-value pair from page to page during defragmentation
> As an initial change, I suggest introducing a new node startup mode that runs a defragmentation procedure allowing the node to shrink its storage files. The procedure will not mutate the logical state of a partition, so a subsequent historical rebalance can quickly catch the node up. Since the procedure will run during node startup (during the final stages of recovery), there will be no concurrent load, so entries can be freely moved from page to page without tricky synchronization.
> If the procedure is applied during a whole-cluster restart, all nodes will be defragmented simultaneously, allowing for quicker parallel defragmentation at the cost of downtime.
> The procedure should accept an optional list of cache groups, so that an arbitrary set of cache groups can be selected for defragmentation.
> A rough outline of the actions taken during the run for each partition selected for defragmentation:
>  * Partition pages are preloaded into memory, if possible, to avoid excessive page replacement. During the scan, the high-water mark (HWM) of the written data is detected (empty pages are skipped)
>  * Page references in the free list are sorted so that the pages closest to the file start are picked first
>  * The partition is scanned in reverse order, and key-value pairs are moved closer to the file start; the HWM is updated accordingly. This step is particularly open to various optimizations, because different strategies will work well for different fragmentation patterns.
>  * After the scan iteration is completed, the file size can be updated according to the HWM
> As a further improvement, this partition defragmentation procedure could later be run in online mode, once the required changes to the cache update protocol are designed.
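
The per-partition steps quoted above can be illustrated with a toy model. This is only a sketch under loud assumptions: a partition file is modeled as a Python list of pages, each page is a list of key-value tuples, and PAGE_CAPACITY, find_hwm and defragment are hypothetical names invented for the illustration. None of this is Ignite's actual page memory or API.

```python
import heapq

PAGE_CAPACITY = 4  # hypothetical: max entries per page in this toy model


def find_hwm(pages):
    """Return the page count up to and including the last non-empty page."""
    for idx in range(len(pages) - 1, -1, -1):
        if pages[idx]:
            return idx + 1
    return 0


def defragment(pages):
    """Move entries toward the file start, then truncate at the HWM.

    The set of key-value pairs is unchanged; only their placement differs.
    """
    pages = [list(p) for p in pages]  # work on a copy (the "preload" step)
    hwm = find_hwm(pages)

    # Free list ordered by page index, so the page closest to the
    # file start is always picked first.
    free = [i for i in range(hwm) if len(pages[i]) < PAGE_CAPACITY]
    heapq.heapify(free)

    src = hwm - 1  # reverse scan starts at the last non-empty page
    while src > 0:
        if not pages[src]:
            src -= 1
            continue
        # Discard free-list heads that have filled up.
        while free and len(pages[free[0]]) >= PAGE_CAPACITY:
            heapq.heappop(free)
        if not free or free[0] >= src:
            break  # no free slot closer to the file start remains
        pages[free[0]].append(pages[src].pop())

    hwm = find_hwm(pages)  # HWM after the moves
    return pages[:hwm]     # "update the file size according to the HWM"


before = [[("a", 1)], [], [("b", 2), ("c", 3)], [], [("d", 4)], []]
after = defragment(before)
print(len(before), "pages ->", len(after), "pages")
```

The model moves one entry at a time and ignores everything that makes the real problem hard, such as variable entry sizes and any references that point at an entry's location. It is meant only to show how the HWM, the ordered free list and the reverse scan fit together.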



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
