hbase-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Re: cell level coprocessor
Date Fri, 15 May 2015 01:39:25 GMT
Hi Navdeep,

I believe you will need to:
* implement a RegionScanner that applies the aggregation at the Cell level
* extend BaseRegionObserver to force the use of your RegionScanner in
preGet and preScan

I don't have a simple example in front of me, but the following may give
you some pointers. We use versions of a Cell to store delta values when
performing append-style increments (you put the delta in the next version
of a cell instead of incrementing the existing value). Then, during
scanning, those deltas are summed up into a single value. I assume you want
to do something along those lines, so you may learn something from that
code.
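
To make the delta idea concrete, here is a minimal, HBase-free sketch in plain Java. DeltaVersion and DeltaColumn are hypothetical names made up for illustration; they are not part of HBase or of the code linked in [1] and [2]. The sketch only models the write path (append a delta as a new version) and the read path (sum all versions).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one stored version of a cell; not the real HBase Cell API. */
final class DeltaVersion {
    final long timestamp;
    final long delta;

    DeltaVersion(long timestamp, long delta) {
        this.timestamp = timestamp;
        this.delta = delta;
    }
}

/** Models one column that stores increments as versions instead of updating in place. */
final class DeltaColumn {
    private final List<DeltaVersion> versions = new ArrayList<>();

    /** Write path: append the delta as a new version, with no read-modify-write. */
    void increment(long timestamp, long delta) {
        versions.add(new DeltaVersion(timestamp, delta));
    }

    /** Read path: collapse all stored versions into a single value. */
    long read() {
        long sum = 0;
        for (DeltaVersion v : versions) {
            sum += v.delta;
        }
        return sum;
    }

    /** Number of versions currently stored for this column. */
    int storedVersions() {
        return versions.size();
    }
}
```

In a real coprocessor the summing would happen inside the scanner on the region server, and a different aggregation (e.g. top-N) would replace the sum in read().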

Here's the RegionScanner implementation [1]. Note that in next(List<Cell>
cells, int limit) you'll need to check for crossing a cell boundary (i.e.
the cells handed to you in one call may contain, e.g., 3 versions of a cell
in column1 followed by 2 versions of a cell in column2).
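
The boundary check can be sketched roughly as follows, again in plain Java without the HBase API. VersionedValue and BoundaryAwareSummer are hypothetical names, and the sketch assumes cells arrive sorted by column and ignores row keys, which a real scanner would also have to compare. The point is only that a partial sum must be carried over when one column's versions are split across two batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified cell: a column name plus the numeric value of one version. */
final class VersionedValue {
    final String column;
    final long value;

    VersionedValue(String column, long value) {
        this.column = column;
        this.value = value;
    }
}

/** Sums consecutive versions of the same column, even when they span several batches. */
final class BoundaryAwareSummer {
    private String currentColumn = null; // column whose versions are being accumulated
    private long runningSum = 0;

    /** Consumes one batch and returns results only for columns known to be complete. */
    List<VersionedValue> accept(List<VersionedValue> batch) {
        List<VersionedValue> out = new ArrayList<>();
        for (VersionedValue v : batch) {
            if (currentColumn != null && !currentColumn.equals(v.column)) {
                // Crossed a cell boundary: the previous column is finished.
                out.add(new VersionedValue(currentColumn, runningSum));
                runningSum = 0;
            }
            currentColumn = v.column;
            runningSum += v.value;
        }
        return out;
    }

    /** Call once after the last batch to emit the column still being accumulated. */
    List<VersionedValue> finish() {
        List<VersionedValue> out = new ArrayList<>();
        if (currentColumn != null) {
            out.add(new VersionedValue(currentColumn, runningSum));
            currentColumn = null;
            runningSum = 0;
        }
        return out;
    }
}
```

The last column seen in a batch is held back on purpose, because the next batch may still contain more of its versions.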

Here's the BaseRegionObserver implementation [2].

On a side note, be sure not to overuse the versions of a Cell. Using
columns is often a better schema design [3].

Cheers,
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase

[1]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementSummingScanner.java

[2]
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-0.98/src/main/java/co/cask/cdap/data2/increment/hbase98/IncrementHandler.java

[3] http://hbase.apache.org/book.html#schema.versions

On Thu, May 14, 2015 at 5:37 AM, Navdeep Agrawal <
Navdeep_Agrawal@symantec.com> wrote:

> Hi,
> I am trying to use a coprocessor to do some aggregations (e.g. top-N) over
> all versions of a cell and return the result. Most of the aggregation
> implementations with coprocessors that I found work on a column. How can we
> achieve this for every cell in that column? Any ideas or links?
>
> Use case: I want to do some aggregation over all versions of a cell and
> return a single value for that cell, given a row key and column.
>
>
> Thanks,
> Navdeep
>
