hbase-user mailing list archives

From: Andrew Purtell <apurt...@apache.org>
Subject: RE: Custom compaction
Date: Fri, 28 May 2010 01:29:22 GMT
We could put a hook out of that iterator up into RegionObserver (HBASE-2001), for example.

Currently the observer only gets notified that a compaction has happened. 
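
To sketch the idea (none of this is committed API; HBASE-2001 is still taking shape and the names below are made up), a compaction hook on the observer might look something like:

import java.util.Iterator;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.HRegion;

// Hypothetical hook (names invented): called while a compaction runs,
// letting the observer wrap or replace the stream of KeyValues that will
// be written to the compacted store file.
public interface CompactionObserver {
  Iterator<KeyValue> preCompact(HRegion region, Iterator<KeyValue> kvs);
}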

   - Andy

> From: Jonathan Gray <jgray@facebook.com>
> Subject: RE: Custom compaction
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Date: Thursday, May 27, 2010, 6:21 AM
> And of course, HBase is open source
> so you can hack it up to do what you want :)
> 
> The compaction API basically has an iterator of KeyValues
> as input and then returns KeyValues as well. 
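
That shape is easy to play with: a custom compaction transform is basically a wrapper around that iterator. A toy sketch (made-up class, not an existing HBase API) that drops KeyValues older than a cutoff; the same structure would work for merging versions into a single value:

import java.util.Iterator;
import java.util.NoSuchElementException;
import org.apache.hadoop.hbase.KeyValue;

// Toy wrapper: pass through only KeyValues newer than a cutoff timestamp.
// A custom compaction would plug something like this into the loop that
// feeds the compacted store file.
public class CutoffIterator implements Iterator<KeyValue> {
  private final Iterator<KeyValue> in;
  private final long cutoff;
  private KeyValue next;

  public CutoffIterator(Iterator<KeyValue> in, long cutoff) {
    this.in = in;
    this.cutoff = cutoff;
    advance();
  }

  // Move to the next KeyValue that survives the cutoff, or null if none.
  private void advance() {
    next = null;
    while (in.hasNext()) {
      KeyValue kv = in.next();
      if (kv.getTimestamp() >= cutoff) {
        next = kv;
        return;
      }
    }
  }

  public boolean hasNext() { return next != null; }

  public KeyValue next() {
    if (next == null) throw new NoSuchElementException();
    KeyValue kv = next;
    advance();
    return kv;
  }

  public void remove() { throw new UnsupportedOperationException(); }
}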
> 
> > -----Original Message-----
> > From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
> > Sent: Thursday, May 27, 2010 1:34 AM
> > To: user@hbase.apache.org
> > Subject: Re: Custom compaction
> > 
> > Hi,
> > 
> > Actually, for us it would be nice to be able to hook
> > into the compaction, too.
> > 
> > We store records that are basically events that occur at certain times.
> > We store the record itself as the qualifier and a timeline as the column
> > value (so multiple records+timelines per row key are possible). So when a
> > new record comes in, we do a get for the timeline, merge the new timestamp
> > with the existing timeline in memory, and do a put to update the column
> > value with the new timeline.
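
A rough sketch of that get/merge/put sequence against the client API (table, family, and qualifier names are invented, and the merge step here is just a stand-in for the real timeline logic):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineUpdate {
  public static void main(String[] args) throws Exception {
    // Table, family, row, and qualifier names are made up for the example.
    HTable table = new HTable(new HBaseConfiguration(), "events");
    byte[] row = Bytes.toBytes("some-row-key");
    byte[] family = Bytes.toBytes("t");
    byte[] record = Bytes.toBytes("record-as-qualifier");

    // 1. Get the current timeline for this record.
    Get get = new Get(row);
    get.addColumn(family, record);
    Result result = table.get(get);
    byte[] timeline = result.getValue(family, record);
    if (timeline == null) timeline = new byte[0];

    // 2. Merge the new timestamp into the timeline in memory.
    //    (Here simply appended; the real merge is application-specific.)
    byte[] merged = Bytes.add(timeline, Bytes.toBytes(System.currentTimeMillis()));

    // 3. Put the updated timeline back.
    Put put = new Put(row);
    put.add(family, record, merged);
    table.put(put);
  }
}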
> > 
> > In our first version, we just wrote the individual timestamps as values
> > and used versioning to keep all timestamps in the value. Then we combined
> > all the timelines and individual timestamps into a single timeline in
> > memory on each read. We ran an MR job periodically to do the timeline
> > combining in the table and delete the obsolete timestamps, in order to
> > keep read performance OK (because otherwise the read operation would
> > involve a lot of additional work to create a timeline, and lots of
> > versions would accumulate). In the end, the deletes in the MR job were a
> > bottleneck (as I understand it; I was not on the project at that time).
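
For reference, that first approach (one version per event timestamp, merged on read) looks roughly like this against the client API (class and argument names are invented):

import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedTimestamps {
  // Write each event timestamp as its own version of the column.
  static void record(HTable table, byte[] row, byte[] family, byte[] qual,
      long eventTs) throws Exception {
    Put put = new Put(row);
    put.add(family, qual, eventTs, Bytes.toBytes(eventTs));
    table.put(put);
  }

  // Read all versions back; the caller merges them into one timeline.
  static List<KeyValue> readAllVersions(HTable table, byte[] row,
      byte[] family, byte[] qual) throws Exception {
    Get get = new Get(row);
    get.addColumn(family, qual);
    get.setMaxVersions();            // fetch every stored version
    Result result = table.get(get);
    return result.list();            // one KeyValue per event timestamp
  }
}

The column family would of course need its maximum-versions setting raised well above the default of 3 for this to retain anything useful.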
> > 
> > Now, if we could hook into the compactions, then we could just always
> > insert individual timestamps as new versions and do the combining of
> > versions into a single timeline during compaction (as compaction needs to
> > go through the complete table anyway). This would also improve our
> > insertion performance (no more gets in there, just puts like in the first
> > version), which is nice. We collect internet routing information, which
> > comes in at 80 million records per day, with updates arriving in batches
> > every 5 minutes (http://ris.ripe.net). We'd like to try to be efficient
> > before just throwing more machines at the problem.
> > 
> > Will there be anything like this on the roadmap?
> > 
> > 
> > Cheers,
> > Friso
> > 
> > 
> > 
> > On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
> > 
> > > Invisible. What's your need?
> > >
> > > J-D
> > >
> > > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> > > <vidhyash@yahoo-inc.com> wrote:
> > >> Is there a way to customize the compaction function (like a hook
> > >> provided by the API) or is it invisible to the user?
> > >>
> > >> Thank you
> > >> Vidhya
> > >>
> 
> 


      

