hbase-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: Custom compaction
Date Thu, 27 May 2010 08:34:24 GMT

Actually, for us it would be nice to be able to hook into the compaction, too.

We store records that are basically events that occur at certain times. We store the record
itself as the qualifier and a timeline as the column value (so multiple records+timelines per row
key are possible). When a new record comes in, we do a get for the timeline, merge the new
timestamp into the existing timeline in memory, and do a put to update the column value with
the new timeline.
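
For illustration, the get/merge/put cycle described above can be sketched in plain Python. This is a minimal sketch under assumptions, not the project's code: the table is modeled as a nested dict, a timeline is assumed to be a sorted list of timestamps, and the names `table` and `record_event` are hypothetical.

```python
from bisect import insort

# Hypothetical in-memory stand-in for the table: {row_key: {record: timeline}}.
# The record plays the role of the qualifier, the timeline is the column value.
table = {}

def record_event(row_key, record, timestamp):
    """Get the timeline, merge the new timestamp in memory, put it back."""
    row = table.setdefault(row_key, {})
    timeline = list(row.get(record, []))  # the "get"
    if timestamp not in timeline:
        insort(timeline, timestamp)       # merge, keeping the timeline sorted
    row[record] = timeline                # the "put" overwrites the column value

record_event("prefix-a", "route-1", 300)
record_event("prefix-a", "route-1", 100)
record_event("prefix-a", "route-2", 200)
```

Every write in this scheme pays for a read first, which is the cost the rest of the message is about.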

In our first version, we just wrote the individual timestamps as values and used versioning
to keep all timestamps as versions of the value. On each read, we then combined all the timelines
and individual timestamps into a single timeline in memory. We ran an MR job periodically to do
the timeline combining in the table and delete the obsolete timestamps in order to keep read
performance OK (otherwise each read would involve a lot of additional work to create
a timeline, and lots of versions would accumulate). In the end, the deletes in the MR job were
a bottleneck (as I understand it; I was not on the project at that moment).

Now, if we could hook into the compactions, then we could just always insert individual timestamps
as new versions and do the combining of versions into a single timeline during compaction
(as compaction needs to go through the complete table anyway). This would also improve our
insertion performance (no more gets in there, just puts like in the first version), which
is nice. We collect internet routing information, which arrives at 80 million records
per day, with updates coming in batches every 5 minutes (http://ris.ripe.net). We'd like
to try to be efficient before just throwing more machines at the problem.
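
The idea of put-only writes with merging deferred to compaction can be sketched as follows. This is only a plain-Python model under assumptions: it does not use any HBase API, the cell layout (a list of versions, each either a single timestamp or an already combined timeline) is assumed, and `put_timestamp`, `combine`, and `compact` are hypothetical names standing in for the write path and the desired hook.

```python
# Hypothetical model: {row_key: {record: [version, ...]}}, where a version is
# either a single timestamp (int) or an already combined timeline (list of ints).

def put_timestamp(table, row_key, record, timestamp):
    """Write path: append a new version, no read required."""
    table.setdefault(row_key, {}).setdefault(record, []).append(timestamp)

def combine(versions):
    """Collapse all versions of one cell into a single sorted timeline."""
    merged = set()
    for version in versions:
        if isinstance(version, list):
            merged.update(version)
        else:
            merged.add(version)
    return sorted(merged)

def compact(table):
    """Stand-in for the compaction hook: one pass over every cell."""
    for row in table.values():
        for record, versions in row.items():
            row[record] = [combine(versions)]

events = {}
for ts in (300, 100, 300):
    put_timestamp(events, "prefix-a", "route-1", ts)
compact(events)                                    # versions collapse to one timeline
put_timestamp(events, "prefix-a", "route-1", 200)  # later writes stay put-only
```

Between compactions a reader would still call something like `combine` on the few versions that have accumulated, so reads stay cheap as long as compaction keeps up.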

Will there be anything like this on the roadmap?


On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:

> Invisible. What's your need?
> J-D
> On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> <vidhyash@yahoo-inc.com> wrote:
>> Is there a way to customize the compaction function (like a hook provided by the
>> API) or is it invisible to the user?
>> Thank you
>> Vidhya
