hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Custom compaction
Date Sat, 29 May 2010 18:43:38 GMT
I like this idea.
St.Ack

On Thu, May 27, 2010 at 6:29 PM, Andrew Purtell <apurtell@apache.org> wrote:
> We could put a hook out of that iterator up into RegionObserver (HBASE-2001), for example.
>
> Currently the observer only gets notified that a compaction has happened.
>
>   - Andy
>
>> From: Jonathan Gray <jgray@facebook.com>
>> Subject: RE: Custom compaction
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Date: Thursday, May 27, 2010, 6:21 AM
>> And of course, HBase is open source, so you can hack it up
>> to do what you want :)
>>
>> The compaction API basically has an iterator of KeyValues
>> as input and then returns KeyValues as well.
>>
>> > -----Original Message-----
>> > From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
>> > Sent: Thursday, May 27, 2010 1:34 AM
>> > To: user@hbase.apache.org
>> > Subject: Re: Custom compaction
>> >
>> > Hi,
>> >
>> > Actually, for us it would be nice to be able to hook into the
>> > compaction, too.
>> >
>> > We store records that are basically events that occur at certain
>> > times. We store the record itself as qualifier and a timeline as
>> > column value (so multiple records+timelines per row key are
>> > possible). So when a new record comes in, we do a get for the
>> > timeline, merge the new timestamp with the existing timeline in
>> > memory and do a put to update the column value with the new
>> > timeline.
>> >
>> > In our first version, we just wrote the individual timestamps as
>> > values and used versioning to keep all timestamps in the value. Then
>> > we combined all the timelines and individual timestamps into a
>> > single timeline in memory on each read. We ran an MR job
>> > periodically to do the timeline combining in the table and delete
>> > the obsolete timestamps in order to keep read performance OK
>> > (because otherwise the read operation would involve a lot of
>> > additional work to create a timeline and lots of versions would be
>> > created). In the end, the deletes in the MR job were a bottleneck
>> > (as I understand it; I was not on the project at the time).
>> >
>> > Now, if we could hook into the compactions, then we could just
>> > always insert individual timestamps as new versions and do the
>> > combining of versions into a single timeline during compaction (as
>> > compaction needs to go through the complete table anyway). This
>> > would also improve our insertion performance (no more gets in
>> > there, just puts like in the first version), which is nice. We
>> > collect internet routing information, which arrives at a rate of
>> > 80 million records per day, with updates coming in batches every
>> > 5 minutes (http://ris.ripe.net). We'd like to try to be efficient
>> > before just throwing more machines at the problem.
>> >
>> > Will there be anything like this on the roadmap?
>> >
>> > Cheers,
>> > Friso
>> >
>> > On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
>> >
>> > > Invisible. What's your need?
>> > >
>> > > J-D
>> > >
>> > > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
>> > > <vidhyash@yahoo-inc.com> wrote:
>> > >> Is there a way to customize the compaction function (like a hook
>> > >> provided by the API) or is it invisible to the user?
>> > >>
>> > >> Thank you
>> > >> Vidhya
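A minimal sketch, in Java, of the kind of hook discussed in this thread: Jonathan describes compaction as an iterator of KeyValues in and KeyValues out, Andy suggests exposing that iterator through a hook, and Friso wants all versions of a record combined into one timeline at compaction time. `Cell`, `CompactionHook` and `TimelineMergingHook` are hypothetical names invented for illustration; they are not HBase classes, and per J-D the compaction in this HBase version offers no such hook. The sketch assumes input arrives sorted by row and then qualifier, and that every version's value is a timeline (a single event timestamp is just a one-element timeline).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

/** Hypothetical sketch: none of these types exist in HBase. */
public class TimelineCompactionSketch {

    /** Stand-in for a KeyValue: row, qualifier (the record), version timestamp, timeline value. */
    record Cell(String row, String qualifier, long version, List<Long> timeline) {}

    /** Hypothetical hook shape: cells in, cells out, as described in the thread. */
    interface CompactionHook {
        Iterator<Cell> apply(Iterator<Cell> input);
    }

    /**
     * Collapses consecutive versions of the same (row, qualifier) into one cell whose
     * timeline is the sorted union of every version's timestamps. Assumes the input
     * is sorted by row, then qualifier, so that versions of one record are adjacent.
     */
    static final class TimelineMergingHook implements CompactionHook {
        @Override
        public Iterator<Cell> apply(Iterator<Cell> input) {
            return new Iterator<>() {
                // First cell of the next group, or null once the input is exhausted.
                private Cell pending = input.hasNext() ? input.next() : null;

                @Override
                public boolean hasNext() {
                    return pending != null;
                }

                @Override
                public Cell next() {
                    if (pending == null) {
                        throw new NoSuchElementException();
                    }
                    Cell first = pending;
                    TreeSet<Long> merged = new TreeSet<>(first.timeline());
                    long newest = first.version();
                    pending = null;
                    // Absorb every following cell that belongs to the same record.
                    while (input.hasNext()) {
                        Cell c = input.next();
                        if (c.row().equals(first.row()) && c.qualifier().equals(first.qualifier())) {
                            merged.addAll(c.timeline());
                            newest = Math.max(newest, c.version());
                        } else {
                            pending = c; // start of the next group
                            break;
                        }
                    }
                    return new Cell(first.row(), first.qualifier(), newest, new ArrayList<>(merged));
                }
            };
        }
    }

    public static void main(String[] args) {
        List<Cell> input = List.of(
                new Cell("prefix1", "recordA", 300, List.of(300L)),
                new Cell("prefix1", "recordA", 200, List.of(100L, 200L)),
                new Cell("prefix1", "recordB", 150, List.of(150L)));
        Iterator<Cell> out = new TimelineMergingHook().apply(input.iterator());
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

Running `main` collapses recordA's two versions into a single cell with timeline [100, 200, 300] and passes recordB through unchanged. Returning a lazy iterator keeps the streaming shape Jonathan describes, so only one record's versions are held in memory at a time; stamping the merged cell with the newest version's timestamp is just one reasonable convention.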
