incubator-accumulo-user mailing list archives

From: Aaron Cordova
Subject: Re: 'Redundant' mutations
Date: Thu, 09 Feb 2012 15:20:27 GMT
Short answer: yes, on disk these redundant keys are eventually removed.

On Feb 9, 2012, at 10:14 AM, Keith Turner wrote:

> On Thu, Feb 9, 2012 at 9:50 AM, Benson Margulies wrote:
>> On Thu, Feb 9, 2012 at 9:47 AM, Aaron Cordova wrote:
>>> You get "a"
>>> By default, tables are configured with a "versioning iterator" that filters out
>>> all but the latest "version" of a key, meaning the key with the latest timestamp.
>>> That gives you the behavior you describe: redundant keys that differ only in
>>> timestamp are cleaned out.
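
The filtering described above can be sketched in plain Java. This is not Accumulo's VersioningIterator, only a small model of the idea. The class `VersioningSketch`, its `Entry` type, the row name "r", and the `maxVersions` parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class VersioningSketch {
    // Hypothetical stand-in for a stored entry: key columns, timestamp, value.
    static final class Entry {
        final String row, family, qualifier, value;
        final long timestamp;

        Entry(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Order by row, family, qualifier, then newest timestamp first.
    static final Comparator<Entry> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    };

    static boolean sameKey(Entry a, Entry b) {
        return a.row.equals(b.row)
                && a.family.equals(b.family)
                && a.qualifier.equals(b.qualifier);
    }

    // Keep at most maxVersions entries per (row, family, qualifier).
    static List<Entry> latestVersions(List<Entry> entries, int maxVersions) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(ORDER);
        List<Entry> kept = new ArrayList<>();
        Entry previous = null;
        int seen = 0;
        for (Entry e : sorted) {
            seen = (previous != null && sameKey(previous, e)) ? seen + 1 : 1;
            if (seen <= maxVersions) {
                kept.add(e);
            }
            previous = e;
        }
        return kept;
    }

    public static void main(String[] args) {
        // The same put("a", "b", "c") written at time 0 and again at time 1.
        List<Entry> written = List.of(
                new Entry("r", "a", "b", 0L, "c"),
                new Entry("r", "a", "b", 1L, "c"));
        for (Entry e : latestVersions(written, 1)) {
            System.out.println(e.row + " " + e.family + ":" + e.qualifier
                    + " @" + e.timestamp + " = " + e.value);
        }
    }
}
```

With `maxVersions` set to 1, only the entry written at time 1 survives the filter, which matches the "latest version only" view described in the thread.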
>> I understood that the default was only to see the latest, but does
>> disk space remain consumed with older ones until something happens, or
>> does it clean out itself?
>>> On Feb 9, 2012, at 9:43 AM, Benson Margulies wrote:
>>>> At time 0, I make a Mutation with put("a", "b", "c");
>>>> At time 1, I do it again.
>>>> Do I get:
>>>> a) two copies of the same data with different timestamps?
>>>> b) an error?
>>>> c) something else?
>>>> If the idea I'm looking for is to end up with one item without doing a
>>>> scan each time to see if it's out there, is there a 'garbage
>>>> collection' cliche for cleaning out redundant items that differ only
>>>> in timestamp?
> It depends on a few factors.
>  * If the two mutations were written to the same in-memory map, when
> it is minor compacted only one is written out.
>  * If the two mutations were written to different in-memory maps,
> then the data will be minor compacted to separate files.  In this case
> it will not go away until a major compaction occurs (merges multiple
> files, controlled by the major compaction ratio).  This can be caused
> by additional data being written or a user forcing major compaction on
> a table.
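
The two cases above can be modeled with a small plain-Java sketch. It does not use the Accumulo API. It assumes that the "keep only the newest version" filter is applied whenever a compaction writes a file, as the thread implies. `CompactionSketch`, `Cell`, `minorCompact`, and `majorCompact` are hypothetical names, and each "file" is just a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompactionSketch {
    // One version of a cell: key is "row/family/qualifier", plus timestamp and value.
    static final class Cell {
        final String key;
        final long timestamp;
        final String value;

        Cell(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Keep only the newest cell per key (assumed filter with one version kept).
    static List<Cell> keepNewest(List<Cell> cells) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell c : cells) {
            Cell current = newest.get(c.key);
            if (current == null || c.timestamp > current.timestamp) {
                newest.put(c.key, c);
            }
        }
        return new ArrayList<>(newest.values());
    }

    // Minor compaction: flush one in-memory map to one file.
    static List<Cell> minorCompact(List<Cell> memoryMap) {
        return keepNewest(memoryMap);
    }

    // Major compaction: merge several files into a single file.
    static List<Cell> majorCompact(List<List<Cell>> files) {
        List<Cell> merged = new ArrayList<>();
        for (List<Cell> file : files) {
            merged.addAll(file);
        }
        return keepNewest(merged);
    }

    // Total number of cells stored across all files.
    static int cellsOnDisk(List<List<Cell>> files) {
        int total = 0;
        for (List<Cell> file : files) {
            total += file.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Cell t0 = new Cell("r/a/b", 0L, "c");
        Cell t1 = new Cell("r/a/b", 1L, "c");

        // Case 1: both mutations land in the same in-memory map.
        List<List<Cell>> sameMap = List.of(minorCompact(List.of(t0, t1)));
        System.out.println("same map, after minor compaction: " + cellsOnDisk(sameMap));

        // Case 2: the mutations land in different in-memory maps.
        List<List<Cell>> separate = List.of(
                minorCompact(List.of(t0)),
                minorCompact(List.of(t1)));
        System.out.println("separate maps, after minor compactions: " + cellsOnDisk(separate));

        // A major compaction that merges both files drops the older copy.
        List<List<Cell>> merged = List.of(majorCompact(separate));
        System.out.println("after major compaction: " + cellsOnDisk(merged));
    }
}
```

In this model the counts are 1, 2, and 1: the redundant copy occupies disk space only in the second case, and only until the files holding both copies are merged.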
