>
>
> 2) is more expensive than 1).
> I'm wondering if we could use Compaction Coprocessor for 2)? HBaseHUT
> needs to be able to grab N rows and merge them into 1, delete those N rows,
> and just write that 1 new row. This N could be several thousand rows.
> Could Compaction Coprocessor really be used for that?
>
>
It would depend on the details. If you're simply aggregating the data into
one row, and:
* the thousands of rows are contiguous in the scan
* you can incrementally update or emit the new row as you go, so that you
  don't need to retain all the old rows in memory
* the new row you want to emit sorts into the same position as the rows it
  replaces
then overriding the scanner used for compaction could be a good solution.
This would allow you to transform the cells emitted during compaction,
including dropping the cells from the old rows and emitting new
(transformed) cells for the new row.
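To make the streaming idea concrete, here is a rough sketch in plain Java. It deliberately uses no HBase classes: Cell, MergingScanner, CompactionMergeSketch, and the "<group>#<suffix>" row key layout are all made up for illustration, and the aggregation is assumed to be a simple sum. It only shows the shape of the logic (wrap a sorted stream, keep a running aggregate, emit one merged cell per contiguous run), not how to wire it into a coprocessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a cell: a row key plus a numeric value. Not an HBase type. */
final class Cell {
    final String row;
    final long value;
    Cell(String row, long value) { this.row = row; this.value = value; }
    @Override public String toString() { return row + "=" + value; }
}

/**
 * Wraps a sorted stream of cells and collapses each contiguous run of rows
 * that share a group prefix into a single cell. Only a running sum and one
 * read-ahead cell are held in memory, never the whole run.
 */
final class MergingScanner implements Iterator<Cell> {
    private final Iterator<Cell> source;
    private Cell pending; // first cell of the next group, read ahead

    MergingScanner(Iterator<Cell> source) {
        this.source = source;
        this.pending = source.hasNext() ? source.next() : null;
    }

    // Assumption: row keys look like "<group>#<suffix>".
    private static String groupOf(String row) {
        int i = row.indexOf('#');
        return i < 0 ? row : row.substring(0, i);
    }

    @Override public boolean hasNext() { return pending != null; }

    @Override public Cell next() {
        if (pending == null) throw new NoSuchElementException();
        String firstRow = pending.row;
        String group = groupOf(firstRow);
        long sum = pending.value;
        pending = null;
        while (source.hasNext()) {
            Cell c = source.next();
            if (!groupOf(c.row).equals(group)) {
                pending = c; // belongs to the next group; keep it for later
                break;
            }
            sum += c.value; // incremental update; the old cell is dropped
        }
        // Emit under the first row key of the run so the merged cell sorts
        // into the same position the old rows occupied.
        return new Cell(firstRow, sum);
    }
}

final class CompactionMergeSketch {
    static List<String> merge(List<Cell> sorted) {
        List<String> out = new ArrayList<>();
        Iterator<Cell> it = new MergingScanner(sorted.iterator());
        while (it.hasNext()) out.add(it.next().toString());
        return out;
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
            new Cell("a#1", 1), new Cell("a#2", 2), new Cell("a#3", 3),
            new Cell("b#1", 10));
        System.out.println(merge(input)); // prints [a#1=6, b#1=10]
    }
}
```

In a real coprocessor the same wrapping would be applied to the scanner handed to the compaction hook, and the three conditions above are what make it safe: the run is contiguous, the state is a single running value, and the emitted key keeps the output sorted.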
> Also, would that come into play during minor or major compactions or both?
>
>
You can distinguish between them in your coprocessor hooks based on the
ScanType, so it's up to you: you can apply the merge during minor
compactions, major compactions, or both.