hbase-user mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: 1 table, 1 dense CF => N tables, 1 dense CF ?
Date Fri, 09 Jan 2015 20:19:38 GMT
> 2) is more expensive than 1).
> I'm wondering if we could use Compaction Coprocessor for 2)?  HBaseHUT
> needs to be able to grab N rows and merge them into 1, delete those N rows,
> and just write that 1 new row.  This N could be several thousand rows.
> Could Compaction Coprocessor really be used for that?
It would depend on the details.  If you're simply aggregating the data into
one row, and:
* the thousands of rows are contiguous in the scan
* you can somehow incrementally update or emit the new row that you want to
create so that you don't need to retain all the old rows in memory
* the new row you want to emit would sort sequentially into the same

Then overriding the scanner used for compaction could be a good solution.
This would allow you to transform the cells emitted during compaction,
including dropping the cells from the old rows and emitting new
(transformed) cells for the new row.
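Here is a minimal sketch of that transformation in plain Java. It models the idea without the HBase classes, so it is not the HBase API. The `Cell` class, the `#` row-key separator, `bucketOf()` and the summing aggregation are all hypothetical. In a real coprocessor the equivalent logic would wrap the scanner handed to the compaction hook. It keeps only a running sum, which satisfies the "don't retain all the old rows in memory" condition. It also assumes that each bucket's rows are contiguous and that the merged key (the bare prefix) sorts into the position the old rows occupied.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical model of a compaction scanner wrapper (not the HBase API).
// Input cells arrive sorted by row key, as they do during a compaction.
class MergingScanner implements Iterator<MergingScanner.Cell> {

    // Simplified stand-in for an HBase Cell: one row key, one numeric value.
    static final class Cell {
        private final String row;
        private final long value;
        Cell(String row, long value) { this.row = row; this.value = value; }
        String row() { return row; }
        long value() { return value; }
        @Override public String toString() { return row + "=" + value; }
    }

    private final Iterator<Cell> source;
    private Cell pending; // read-ahead: first cell not yet consumed

    MergingScanner(Iterator<Cell> source) {
        this.source = source;
        this.pending = source.hasNext() ? source.next() : null;
    }

    // Hypothetical key design: rows "k", "k#1", "k#2", ... form bucket "k".
    static String bucketOf(String row) {
        int i = row.indexOf('#');
        return i < 0 ? row : row.substring(0, i);
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Cell next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String bucket = bucketOf(pending.row());
        long sum = 0;
        // Fold the contiguous run of cells for this bucket into a running sum,
        // so memory use does not grow with the number of old rows.
        while (pending != null && bucketOf(pending.row()).equals(bucket)) {
            sum += pending.value();
            pending = source.hasNext() ? source.next() : null;
        }
        // The old cells are not re-emitted (i.e. dropped); one new cell replaces them.
        return new Cell(bucket, sum);
    }
}
```

For example, the input rows a#1=1, a#2=2, b=10, c#1=5, c#2=7 come out as a=3, b=10, c=12. The sort-order assumption matters. A row such as "a!" sorts before "a#1", so the output would contain "a!" followed by "a", which is out of order. The row-key design has to rule that out.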

> Also, would that come into play during minor or major compactions or both?
You can distinguish between them in your coprocessor hooks based on the
ScanType.  So it's up to you.
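The sketch below shows how that choice might look. As we understand the HBase 1.x coprocessor API, the compaction hooks (e.g. `RegionObserver.preCompact`) receive a `ScanType` argument. Compactions that include all store files (which covers major compactions) run with `COMPACT_DROP_DELETES`, and other minor compactions run with `COMPACT_RETAIN_DELETES`. The enum here only mirrors those constant names so the example compiles on its own. `CompactionPolicy`, `shouldMerge` and `mergeOnMinor` are hypothetical. One plausible reason to merge only when all files are compacted: in a minor compaction, rows of the same bucket may also sit in store files outside the compaction, so the merge would only be partial.

```java
// Stand-in mirroring the constant names of HBase's ScanType enum, so this compiles alone.
enum ScanType { USER_SCAN, COMPACT_RETAIN_DELETES, COMPACT_DROP_DELETES }

// Hypothetical policy deciding whether to wrap the compaction scanner.
class CompactionPolicy {
    static boolean shouldMerge(ScanType type, boolean mergeOnMinor) {
        switch (type) {
            case COMPACT_DROP_DELETES:
                return true;          // all store files participate: merge is complete
            case COMPACT_RETAIN_DELETES:
                return mergeOnMinor;  // partial file set: merge only if explicitly allowed
            default:
                return false;         // never transform ordinary user scans
        }
    }
}
```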
