activemq-issues mailing list archives

From "Tim Bain (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (AMQ-5905) Allow KahaDB to perform compaction of sparsely-used data files
Date Tue, 28 Jul 2015 14:37:04 GMT

     [ https://issues.apache.org/jira/browse/AMQ-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tim Bain closed AMQ-5905.
-------------------------
    Resolution: Duplicate

> Allow KahaDB to perform compaction of sparsely-used data files
> --------------------------------------------------------------
>
>                 Key: AMQ-5905
>                 URL: https://issues.apache.org/jira/browse/AMQ-5905
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB, Message Store
>            Reporter: Tim Bain
>
> As currently implemented, KahaDB can only reduce data file usage by deleting an entire
data file.  As a result, the only situation where KahaDB can reduce the amount of space it uses
on disk is when there are no old messages still in a data file; if there is even one message
in an old file that must be kept, the entire file cannot be deleted.  And if one (deleted)
message in the old file has its deletion record in a later file, that later file must also
be kept, even if none of the messages in it are actually needed otherwise; as a result, a
single old message could keep alive a long chain of data files.
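The chain effect described above can be illustrated with a small fixed-point computation. This is a deliberately simplified model, not KahaDB's actual garbage-collection code: the function name `files_to_keep` and both input dictionaries are hypothetical, and a file is assumed to be retained either because it holds a live message or because it holds a deletion record for a message stored in a retained file.

```python
def files_to_keep(live_counts, ack_targets):
    """Return the set of data file ids that cannot be deleted.

    live_counts: {file_id: number of live messages in that file}
    ack_targets: {file_id: set of file ids whose messages are
                  referenced by deletion records stored in file_id}
    Both structures are assumptions made for this illustration.
    """
    # Files that still hold at least one live message must stay.
    keep = {f for f, n in live_counts.items() if n > 0}

    # A file holding a deletion record for a message in a retained file
    # must also stay; repeat until no further file gets pulled in.
    changed = True
    while changed:
        changed = False
        for f, targets in ack_targets.items():
            if f not in keep and targets & keep:
                keep.add(f)
                changed = True
    return keep


# One live message in file 1; files 2-4 each hold a deletion record
# for a message in the previous file; file 5 refers to nothing older.
chain = files_to_keep(
    {1: 1, 2: 0, 3: 0, 4: 0, 5: 0},
    {2: {1}, 3: {2}, 4: {3}, 5: set()},
)
print(sorted(chain))
```

In this model the single live message in file 1 keeps files 1 through 4 on disk, while file 5 can be deleted.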
> The current advice that's been given is 1) don't keep messages for very long, and 2)
use small KahaDB files so that you'll be able to delete at least some portions of what would
have been a single large file that had to stick around (and in the hopes that you'll get lucky
and be able to break the chain of kept files).  These are both workarounds (and not very good
ones, particularly since the entire concept of a DLQ is fundamentally opposed to #1) for the
fundamentally flawed assumption in KahaDB: that it's reasonable for its files to be read-only
and for the database itself to be powerless to do anything when files are sparsely populated
by live messages.  The fundamental paradigm of files being append-only for individual message
deletion was a good one and provides excellent performance characteristics; however, restricting
occasional maintenance tasks to the same paradigm handcuffs them unreasonably and should be
changed.
> The periodic cleanup task that already looks for files that are unused should be changed
so that if it determines that it cannot delete the file because it contains at least one live
message but it contains less than a configurable percentage of live messages, it will rewrite
the journal file in question so that it contains only those live messages, updating any
in-memory indices that might show the offsets of messages within the file (if there are any
such things).  If any in-memory data structures will need to be updated, we need to appropriately
synchronize to ensure that no one can use the portions of the data structure related to the
file currently being compacted; access to similar information for all other data files can
continue unrestricted.
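The proposed cleanup decision could look roughly like the following sketch. It is not based on KahaDB's real classes: `DataFile`, `cleanup`, the `index` dictionary and the `compaction_threshold` parameter are all hypothetical names, offsets are record positions rather than byte offsets, and the per-file lock stands in for whatever synchronization the real in-memory structures would need.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class DataFile:
    file_id: int
    records: list  # (message_id, payload) tuples in on-disk order
    lock: threading.Lock = field(default_factory=threading.Lock)


def cleanup(data_file, index, compaction_threshold=0.25):
    """Delete, compact, or keep one data file.

    index maps each live message id to (file_id, offset); a record is
    live only if the index still points at this file. Returns
    'deleted', 'compacted', or 'kept'.
    """
    # Only this file is locked; other files stay fully accessible.
    with data_file.lock:
        live = [
            (mid, payload)
            for mid, payload in data_file.records
            if mid in index and index[mid][0] == data_file.file_id
        ]
        if not live:
            # Existing behaviour: nothing live, drop the whole file.
            data_file.records = []
            return "deleted"
        if len(live) / len(data_file.records) >= compaction_threshold:
            # Enough live messages that rewriting is not worthwhile.
            return "kept"
        # Rewrite the file with only the live records and repair the
        # offsets stored in the index.
        data_file.records = live
        for offset, (mid, _) in enumerate(live):
            index[mid] = (data_file.file_id, offset)
        return "compacted"
```

For example, a file with ten records of which only the last is live would be rewritten to hold that single record, and the index entry for it would move from offset 9 to offset 0.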
> Note that this will result in us still having potentially many individual files, with
each one having a much smaller file size than our target size.  If that is problematic, it
would be possible to combine multiple partial files together during the compaction process
(while respecting the max file size) instead of writing live messages back into their current
file.
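Combining partial files while respecting the max file size amounts to a packing problem. The sketch below uses a simple first-fit strategy over files in id order; the function name `pack_partial_files` and its inputs are hypothetical, and it ignores any ordering constraints a real journal would impose on merged files.

```python
def pack_partial_files(live_sizes, max_file_size):
    """Group sparsely-used files so each group fits in one new file.

    live_sizes: {file_id: total bytes of live messages in that file}
    Returns a list of lists of file ids; each inner list would be
    merged into a single compacted file of at most max_file_size bytes.
    """
    groups = []  # each entry is [remaining_capacity, [file_ids]]
    for file_id, size in sorted(live_sizes.items()):
        if size > max_file_size:
            raise ValueError(f"file {file_id} exceeds the max file size")
        for group in groups:
            if group[0] >= size:
                # First existing group with enough room takes the file.
                group[0] -= size
                group[1].append(file_id)
                break
        else:
            # No group has room, so start a new output file.
            groups.append([max_file_size - size, [file_id]])
    return [ids for _, ids in groups]


merged = pack_partial_files({1: 10, 2: 25, 3: 5, 4: 20}, max_file_size=32)
print(merged)
```

With these sizes, files 1 and 3 share one output file while files 2 and 4 each get their own, so four sparse files become three.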



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
