hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: Custom compaction
Date Thu, 27 May 2010 15:06:51 GMT

>> And of course, HBase is open source so you can hack it up to do what you want :)

:) That was what we had been thinking too..

Thanks Friso! Your problem is quite similar to the one we are having: we also want
to perform a content-based merge, i.e., merge a row only after performing a read
operation on one or more values in that row.
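For readers following along, the read-merge-put cycle Friso describes below can be sketched in plain Java. This is only an illustration of the in-memory logic: a HashMap stands in for the HBase table, and every name here (TimelineStore, merge, record) is invented for the example, not part of any HBase API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: simulates the read-merge-put cycle with a HashMap
// standing in for the HBase table. All names are invented for this example.
public class TimelineStore {
    private final Map<String, long[]> table = new HashMap<>();

    // Merge a new event timestamp into an existing sorted timeline.
    static long[] merge(long[] timeline, long newTs) {
        long[] merged = Arrays.copyOf(timeline, timeline.length + 1);
        merged[timeline.length] = newTs;
        Arrays.sort(merged);
        return merged;
    }

    // The read-merge-put pattern: "get" the current timeline, merge the new
    // timestamp in memory, then "put" the updated value back.
    public void record(String rowKey, long eventTs) {
        long[] current = table.getOrDefault(rowKey, new long[0]); // get
        table.put(rowKey, merge(current, eventTs));               // put
    }

    public long[] timeline(String rowKey) {
        return table.getOrDefault(rowKey, new long[0]);
    }
}
```

The extra get per insert is exactly the cost a compaction hook would remove: with the hook, inserts would be pure puts and the merge would happen later, during compaction.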


> -----Original Message-----
> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
> Sent: Thursday, May 27, 2010 1:34 AM
> To: user@hbase.apache.org
> Subject: Re: Custom compaction
> Hi,
> Actually, for us it would be nice to be able to hook into the
> compaction, too.
> We store records that are basically events that occur at certain times.
> We store the record itself as qualifier and a timeline as column value
> (so multiple records+timelines per row key is possible). So when a new
> record comes in, we do a get for the timeline, merge the new timestamp
> with the existing timeline in memory and do a put to update the column
> value with the new timeline.
> In our first version, we just wrote the individual timestamps as values
> and used versioning to keep all timestamps in the value. Then we
> combined all the timelines and individual timestamp into a single
> timeline in memory on each read. We ran a MR job periodically to do the
> timeline combining in the table and delete the obsolete timestamps in
> order to keep read performance OK (because otherwise the read operation
> would involve a lot of additional work to create a timeline and lots of
> versions would be created). In the end, the deletes in the MR job were
> a bottleneck (as I understand, but I was not on the project at that
> moment).
> Now, if we could hook into the compactions, then we could just always
> insert individual timestamps as new versions and do the combining of
> versions into a single timeline during compaction (as compaction needs
> to go through the complete table anyway). This would also improve our
> insertion performance (no more gets in there, just puts like in the
> first version), which is nice. We collect internet routing information,
> which is collected at 80 million records per day with updates coming in
> in batches every 5 minutes (http://ris.ripe.net). We'd like to try to
> be efficient before just throwing more machines at the problem.
> Will there be anything like this on the roadmap?
> Cheers,
> Friso
> On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
> > Invisible. What's your need?
> >
> > J-D
> >
> > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> > <vidhyash@yahoo-inc.com> wrote:
> >> Is there a way to customize the compaction function (like a hook
> provided by the API) or is it invisible to the user?
> >>
> >> Thank you
> >> Vidhya
> >>
