activemq-issues mailing list archives

From "Tim Bain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMQ-3978) Allow KahaDB to "compact" journal files to remove messages that are no longer needed
Date Thu, 27 Oct 2016 03:35:58 GMT

    [ https://issues.apache.org/jira/browse/AMQ-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610533#comment-15610533
] 

Tim Bain commented on AMQ-3978:
-------------------------------

I believe that AMQ-6203 reduces the severity of the problem by allowing the removal of files
that are being kept only for the sake of acking messages in a file that contains live messages,
so it solves part of the problem by eliminating chains of journal files.

However, it doesn't eliminate the problem that a single file can be kept alive by just one
message.  And it doesn't eliminate the need to keep the replayed acks around (in the most
recent journal file at the time of the replay), so in practice you could have as many as two
journal files kept for long periods of time for each such message.  So it's a partial solution
(and a very welcome improvement that will substantially reduce disk usage for installations
where just a few messages are kept for long periods of time), but it's not a full solution:
it doesn't help installations where a large enough percentage of messages are kept that
most journal files contain at least one.

This fix would still be useful, if someone felt like implementing it, but AMQ-6203 does reduce
the urgency a bit.

> Allow KahaDB to "compact" journal files to remove messages that are no longer needed
> ------------------------------------------------------------------------------------
>
>                 Key: AMQ-3978
>                 URL: https://issues.apache.org/jira/browse/AMQ-3978
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB, Message Store
>    Affects Versions: 5.6.0
>            Reporter: Tim Bain
>            Priority: Minor
>
> KahaDB uses a write-only journaling approach that ensures that a journal file will be
deleted only when no content within it is still in use.  If a single byte of the file is still
needed, the entire file must be kept, even if the rest of the file is not needed.
> This works fine when all messages are immediately removed from destinations within an
ActiveMQ broker, but it fails ungracefully when messages that are consumed infrequently (or
not at all, relying on TTL to delete the messages) are interspersed with large volumes of
messages that are consumed quickly.  In this scenario, if a single infrequently-consumed message
ends up in a journal file with a large number of quickly-consumed messages, the entire file
will be kept even though nearly all of the content of the file is no longer needed.  When
this happens for enough journal files, KahaDB's disk/file limits are reached, even though
the amount of actual "live" data within the KahaDB is far below the configured limits.
> To fix this, the periodic cleanup task that already looks for unused files should be changed.
If it determines that it cannot delete a file because the file contains at least one live
message, but the file contains less than a configurable percentage of live messages, the task
will rewrite that journal file so that it contains only the live messages, updating any
in-memory indices that record the offsets of messages within the file (if there are any such
things). If any in-memory data structures need to be updated, we need to synchronize
appropriately so that no one can use the portions of the data structure related to the file
currently being compacted; access to similar information for all other data files can continue
unrestricted.
> Note that this will result in us still having potentially many individual files, with
each one having a much smaller file size than our target size. If that is problematic, it
would be possible to combine multiple partial files together during the compaction process
(while respecting the max file size) instead of writing live messages back into their current
file.
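
The proposal in the description above could be sketched roughly as follows. This is only an
illustration of the selection and merge-planning logic, not KahaDB code: JournalFileInfo,
CompactionSketch, selectCandidates, planMerges, and the maxLiveRatio and maxFileBytes
parameters are all hypothetical names, and the sketch assumes "percentage of live messages"
is measured in bytes rather than in message count. It does not touch the index or do any
locking.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one journal data file; not an actual KahaDB class. */
final class JournalFileInfo {
    final int id;
    final long totalBytes;
    final long liveBytes; // bytes still referenced by unconsumed messages (assumed known)

    JournalFileInfo(int id, long totalBytes, long liveBytes) {
        this.id = id;
        this.totalBytes = totalBytes;
        this.liveBytes = liveBytes;
    }
}

final class CompactionSketch {
    /**
     * Picks files that cannot be deleted (they hold some live data) but whose
     * live fraction is below the configurable threshold maxLiveRatio.
     */
    static List<JournalFileInfo> selectCandidates(List<JournalFileInfo> files, double maxLiveRatio) {
        List<JournalFileInfo> candidates = new ArrayList<>();
        for (JournalFileInfo f : files) {
            if (f.liveBytes == 0) {
                continue; // fully unused: the existing cleanup task can simply delete it
            }
            if ((double) f.liveBytes / f.totalBytes < maxLiveRatio) {
                candidates.add(f);
            }
        }
        return candidates;
    }

    /**
     * Greedy first-fit grouping: each group's live bytes are meant to be rewritten
     * into one new file without exceeding maxFileBytes.
     */
    static List<List<JournalFileInfo>> planMerges(List<JournalFileInfo> candidates, long maxFileBytes) {
        List<List<JournalFileInfo>> groups = new ArrayList<>();
        List<Long> used = new ArrayList<>(); // live bytes already assigned to each group
        for (JournalFileInfo f : candidates) {
            boolean placed = false;
            for (int i = 0; i < groups.size(); i++) {
                if (used.get(i) + f.liveBytes <= maxFileBytes) {
                    groups.get(i).add(f);
                    used.set(i, used.get(i) + f.liveBytes);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<JournalFileInfo> group = new ArrayList<>();
                group.add(f);
                groups.add(group);
                used.add(f.liveBytes);
            }
        }
        return groups;
    }
}
```

Calling selectCandidates with a threshold of 0.25 and then passing the result to planMerges
with the maximum journal file size would give the sets of files whose live messages could be
combined. First-fit is just one simple choice here; compacting each file in place, as the
description first suggests, corresponds to skipping planMerges entirely.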



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
