hbase-issues mailing list archives

From "Paul Wilkinson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3434) ability to increment a counter without reading original value from storage
Date Thu, 24 Mar 2016 22:17:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211053#comment-15211053 ]

Paul Wilkinson commented on HBASE-3434:
---------------------------------------

Hey folks, happy to take this on. The current prototype code (based on co-processors) is at
https://github.com/paulmw/hbase-aggregation/tree/master/src/main/java/aggregation/coprocessor

It's a work in progress for sure, but most of the ideas are in there. It aggregates data during
flushes and compactions, and also during gets and scans, so counters are implemented simply by
adding the co-processor and performing puts. It's very much not limited to summation though,
as you can plug in a custom value aggregation function (by implementing https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/ValueAccumulator.java).
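
As a rough sketch of the pluggable-aggregation idea (not the prototype's actual code): each put stores a delta as a new version, and a fold function collapses the versions whenever they are read, flushed, or compacted. The interface `ValueFolder` and every class name below are hypothetical and do not mirror the signature of the linked ValueAccumulator; values are assumed to be 8-byte longs.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical interface for illustration only; NOT the linked ValueAccumulator.
interface ValueFolder {
    // Combine the value accumulated so far with the next stored version.
    byte[] fold(byte[] accumulated, byte[] next);
}

// Helper for the assumed 8-byte long encoding.
final class Longs {
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}

// Summation: the behaviour a plain counter needs.
final class LongSum implements ValueFolder {
    @Override
    public byte[] fold(byte[] accumulated, byte[] next) {
        return Longs.encode(Longs.decode(accumulated) + Longs.decode(next));
    }
}

// A non-summing alternative, to show the function is swappable.
final class LongMax implements ValueFolder {
    @Override
    public byte[] fold(byte[] accumulated, byte[] next) {
        return Longs.encode(Math.max(Longs.decode(accumulated), Longs.decode(next)));
    }
}

final class VersionFolding {
    // Collapse all stored versions of one cell into a single value.
    static byte[] collapse(List<byte[]> versions, ValueFolder folder) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no versions to aggregate");
        }
        byte[] acc = versions.get(0);
        for (int i = 1; i < versions.size(); i++) {
            acc = folder.fold(acc, versions.get(i));
        }
        return acc;
    }
}
```

With `LongSum`, three puts of 1, 5 and -2 collapse to 4; swapping in `LongMax` yields 5 from the same stored versions without changing the write path.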

The decision on which cells to aggregate is also pluggable - the default is versions of the
same cell (https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/DefaultCellAccumulator.java,
which implements CellAccumulator), but it's easy to imagine the kind of multi-level rollup
you often get in time series - keeping 1 minute granularity for today, 10 minute granularity
for the previous 6 days, hourly beyond that, etc. So long as those values are all consecutive
in KV terms, that's still possible in a stateless fashion.
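
To make the rollup example concrete, here is a minimal sketch of a stateless bucketing rule: the bucket for a cell depends only on the cell's timestamp and the current time, so no per-cell state is needed. The class `RollupBuckets` and its methods are hypothetical, and "today" is assumed to mean "less than 24 hours old" rather than a calendar day.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the linked prototype.
final class RollupBuckets {
    static final long MINUTE = TimeUnit.MINUTES.toMillis(1);
    static final long HOUR = TimeUnit.HOURS.toMillis(1);
    static final long DAY = TimeUnit.DAYS.toMillis(1);

    // Granularity as a function of age only:
    // 1 minute for the last day, 10 minutes for the 6 days before that, hourly beyond.
    static long granularityMillis(long ageMillis) {
        if (ageMillis < DAY) {
            return MINUTE;
        }
        if (ageMillis < 7 * DAY) {
            return 10 * MINUTE;
        }
        return HOUR;
    }

    // Start of the bucket that a cell with timestamp tsMillis falls into at time nowMillis.
    static long bucketStart(long tsMillis, long nowMillis) {
        long g = granularityMillis(nowMillis - tsMillis);
        return Math.floorDiv(tsMillis, g) * g;
    }

    // Two cells may be merged only if they share both granularity and bucket start.
    static boolean sameBucket(long tsA, long tsB, long nowMillis) {
        return granularityMillis(nowMillis - tsA) == granularityMillis(nowMillis - tsB)
                && bucketStart(tsA, nowMillis) == bucketStart(tsB, nowMillis);
    }
}
```

A cell accumulator built on a rule like this would merge consecutive cells for which `sameBucket` holds, which is why the values need to be adjacent in KV order.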

What's missing as yet is a design for how aggregation functions are registered - happy to
take direction there. It's also possible this could be supported more directly in HBase itself,
rather than living in client land. Again, happy to take direction from folks here. What is
certain, though, is that the custom aggregation part of this needs to be retained, rather than
just doing a better version of counters.

> ability to increment a counter without reading original value from storage
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3434
>                 URL: https://issues.apache.org/jira/browse/HBASE-3434
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, regionserver
>            Reporter: dhruba borthakur
>            Assignee: stack
>              Labels: gsoc2016, mentor
>
> There are a bunch of applications that do read-modify-write operations on HBase constructs,
> e.g. a counter: the counter value has to be read in from HDFS before it can be incremented.
> We have an application where the number of increments on a counter far outnumbers the number
> of times the counter is used or read. For this type of application, it would be very beneficial
> to not have to read in the counter from disk before it can be incremented.
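
The request above can be illustrated with a small in-memory sketch, assuming a design where increments are appended as deltas and only summed when the counter is actually read. `DeltaCounter` is a hypothetical stand-in for a single counter cell; it does not use or model any HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for one counter cell; hypothetical, not HBase code.
final class DeltaCounter {
    private final List<Long> deltas = new ArrayList<>();

    // Write path: append a delta. The current total is never read here.
    void increment(long delta) {
        deltas.add(delta);
    }

    // Read path: sum whatever deltas are currently stored.
    long read() {
        long total = 0;
        for (long d : deltas) {
            total += d;
        }
        return total;
    }

    // Compaction-like step: replace all stored deltas with their sum.
    void compact() {
        long total = read();
        deltas.clear();
        deltas.add(total);
    }

    // Number of entries currently stored, to show the effect of compact().
    int storedEntries() {
        return deltas.size();
    }
}
```

The cost of summing moves from every increment to the rare read (or to a background compaction), which suits workloads where increments far outnumber reads.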



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
