incubator-accumulo-user mailing list archives

From Keith Turner <>
Subject Re: 'Redundant' mutations
Date Thu, 09 Feb 2012 15:14:40 GMT
On Thu, Feb 9, 2012 at 9:50 AM, Benson Margulies <> wrote:
> On Thu, Feb 9, 2012 at 9:47 AM, Aaron Cordova <> wrote:
>> You get "a"
>> By default tables are configured with a "versioning iterator" that filters out all
>> but the latest "version" of a key, meaning the key with the latest timestamp. That provides
>> the behavior you describe: cleaning out redundant keys that differ only in timestamp.
> I understood that the default was only to see the latest, but does
> disk space remain consumed with older ones until something happens, or
> does it clean out itself?
>> On Feb 9, 2012, at 9:43 AM, Benson Margulies wrote:
>>> At time 0, I make a Mutation with put("a", "b", "c");
>>> At time 1, I do it again.
>>> Do I get:
>>> a) two copies of the same data with different timestamps?
>>> b) an error?
>>> c) something else?
>>> If the idea I'm looking for is to end up with one item without doing a
>>> scan each time to see if it's out there, is there a 'garbage
>>> collection' cliche for cleaning out redundant items that differ only
>>> in timestamp?

It depends on a few factors.
  * If the two mutations were written to the same in-memory map, only
one is written out when that map is minor compacted.
  * If the two mutations were written to different in-memory maps,
then the data will be minor compacted to separate files.  In this case
the older entry will not go away until a major compaction occurs (one
that merges multiple files; this is controlled by the major compaction
ratio).  A major compaction can be triggered by additional data being
written or by a user forcing a major compaction on the table.
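
The two cases above can be sketched with a small toy model. This is plain
Java, not Accumulo code: the Store class, the compact() helper, and the
"row1:a:b" cell name are all made up for illustration, and compact() simply
assumes a versioning filter that keeps one version per key, as the default
table configuration is described as doing earlier in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {

    /** Toy sorted store: cell -> (timestamp, newest first) -> value.
     *  Stands in for both an in-memory map and a file. Hypothetical, not an Accumulo type. */
    static class Store {
        final TreeMap<String, TreeMap<Long, String>> cells = new TreeMap<>();

        void put(String cell, long ts, String value) {
            cells.computeIfAbsent(cell, k -> new TreeMap<Long, String>(Collections.reverseOrder()))
                 .put(ts, value);
        }

        int entryCount() {
            int n = 0;
            for (TreeMap<Long, String> versions : cells.values()) n += versions.size();
            return n;
        }
    }

    /** Merge the inputs and keep only the newest version of each cell
     *  (assumed versioning filter with one version kept). */
    static Store compact(List<Store> inputs) {
        Store merged = new Store();
        for (Store in : inputs)
            for (Map.Entry<String, TreeMap<Long, String>> cell : in.cells.entrySet())
                for (Map.Entry<Long, String> v : cell.getValue().entrySet())
                    merged.put(cell.getKey(), v.getKey(), v.getValue());

        Store out = new Store();
        for (Map.Entry<String, TreeMap<Long, String>> cell : merged.cells.entrySet()) {
            // Timestamps sort newest first, so the first entry is the latest version.
            Map.Entry<Long, String> newest = cell.getValue().firstEntry();
            out.put(cell.getKey(), newest.getKey(), newest.getValue());
        }
        return out;
    }

    /** Case 1: both puts land in the same in-memory map before it is minor compacted. */
    static int sameMemoryMap() {
        Store mem = new Store();
        mem.put("row1:a:b", 0, "c");
        mem.put("row1:a:b", 1, "c");
        Store file = compact(Arrays.asList(mem)); // minor compaction
        return file.entryCount();
    }

    /** Case 2: a minor compaction happens between the puts, so each lands in its own file.
     *  Returns {entries on disk before major compaction, entries after}. */
    static int[] separateMemoryMaps() {
        Store mem1 = new Store();
        mem1.put("row1:a:b", 0, "c");
        Store file1 = compact(Arrays.asList(mem1)); // first minor compaction

        Store mem2 = new Store();
        mem2.put("row1:a:b", 1, "c");
        Store file2 = compact(Arrays.asList(mem2)); // second minor compaction

        int before = file1.entryCount() + file2.entryCount();
        Store mergedFile = compact(Arrays.asList(file1, file2)); // major compaction
        return new int[] { before, mergedFile.entryCount() };
    }

    public static void main(String[] args) {
        System.out.println("same map, entries on disk: " + sameMemoryMap());
        int[] r = separateMemoryMaps();
        System.out.println("separate maps, before major compaction: " + r[0] + ", after: " + r[1]);
    }
}
```

In this model the first case leaves one entry on disk, while the second
leaves two entries (one per file) until the files are merged. A scan would
still show only the newest value in both cases; the difference is only in
how much space is used in the meantime.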
