Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 24 Mar 2016 22:17:25 +0000 (UTC)
From: "Paul Wilkinson (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12495119.1294654920000.45856.1458857845616@Atlassian.JIRA>
In-Reply-To: <JIRA.12495119.1294654920000@Atlassian.JIRA>
References: <JIRA.12495119.1294654920000@Atlassian.JIRA>
 <JIRA.12495119.1294654920478@arcas>
Subject: [jira] [Commented] (HBASE-3434) ability to increment a counter
 without reading original value from storage
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211053#comment-15211053 ] 

Paul Wilkinson commented on HBASE-3434:
---------------------------------------

Hey folks, happy to take this on. The current prototype code (based on co-processors) is at https://github.com/paulmw/hbase-aggregation/tree/master/src/main/java/aggregation/coprocessor

It's a work in progress for sure, but most of the ideas are in there. It aggregates data during flushes and compactions, as well as during gets and scans, so counters are implemented simply by adding the coprocessor and performing puts. It's very much not limited to summation though: you can plug in a custom value aggregation function by implementing ValueAccumulator (https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/ValueAccumulator.java).
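
To make the idea concrete, here is a rough sketch of a pluggable value fold applied to the stored versions of one cell. The names ValueFold, AggregationSketch, encode, decode and collapse are made up for illustration and are not the ValueAccumulator interface from the repository; it also assumes each put stores an 8-byte big-endian long delta.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical interface for illustration only; the real ValueAccumulator
// in the prototype may have a different shape.
interface ValueFold {
    byte[] fold(byte[] accumulated, byte[] next);
}

final class AggregationSketch {
    // Assumption: values are stored as 8-byte big-endian longs.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // A counter is just the SUM fold; other folds plug in the same way.
    static final ValueFold SUM = (a, b) -> encode(decode(a) + decode(b));
    static final ValueFold MAX = (a, b) -> encode(Math.max(decode(a), decode(b)));

    // Collapse every stored version of a cell into a single value. The same
    // fold could run at flush, compaction, get or scan time, so the writer
    // never has to read the old value before putting a new delta.
    static byte[] collapse(List<byte[]> versions, ValueFold fold) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no versions to aggregate");
        }
        byte[] acc = versions.get(0);
        for (int i = 1; i < versions.size(); i++) {
            acc = fold.fold(acc, versions.get(i));
        }
        return acc;
    }
}
```

With SUM, three puts of 1, 1 and 5 collapse to 7; swapping in MAX over the same versions gives 5 without touching the write path.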

The decision on which cells to aggregate is also pluggable. The default is versions of the same cell (https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/DefaultCellAccumulator.java, which implements CellAccumulator), but it's easy to imagine the kind of multi-level rollup you often get in time series: keeping 1-minute granularity for today, 10-minute granularity for the previous 6 days, hourly beyond that, and so on. So long as the values being combined are all consecutive in KV terms, that's still possible in a stateless fashion.
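
As a rough sketch of that rollup rule (not code from the prototype), the bucket for a timestamp can be derived purely from the timestamp and the current time, which is what keeps it stateless. RollupSketch, granularityMillis and bucketStart are hypothetical names, and "today" is assumed to mean the current UTC day.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the prototype.
final class RollupSketch {
    static final long MINUTE = TimeUnit.MINUTES.toMillis(1);
    static final long HOUR = TimeUnit.HOURS.toMillis(1);
    static final long DAY = TimeUnit.DAYS.toMillis(1);

    // Pick the bucket width from how many UTC days old the timestamp is:
    // today -> 1 minute, previous 6 days -> 10 minutes, older -> 1 hour.
    static long granularityMillis(long tsMillis, long nowMillis) {
        long daysAgo = Math.floorDiv(nowMillis, DAY) - Math.floorDiv(tsMillis, DAY);
        if (daysAgo <= 0) {
            return MINUTE;
        }
        if (daysAgo <= 6) {
            return 10 * MINUTE;
        }
        return HOUR;
    }

    // Round the timestamp down to the start of its bucket. Cells whose
    // timestamps share a bucket start are adjacent when sorted by timestamp,
    // so they can be folded together in a single pass.
    static long bucketStart(long tsMillis, long nowMillis) {
        long width = granularityMillis(tsMillis, nowMillis);
        return Math.floorDiv(tsMillis, width) * width;
    }
}
```

A cell accumulator built on this would treat two cells as mergeable when they share row, column and bucketStart, and would widen the buckets as data ages across successive compactions.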

What's still missing is a design for how aggregation functions are registered - happy to take direction there. It's also possible this could become better supported in HBase itself, rather than living in client land. Again, happy to take direction from folks here. What is certain, though, is that the custom aggregation part needs to be retained, rather than just building a better version of counters.

> ability to increment a counter without reading original value from storage
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3434
>                 URL: https://issues.apache.org/jira/browse/HBASE-3434
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, regionserver
>            Reporter: dhruba borthakur
>            Assignee: stack
>              Labels: gsoc2016, mentor
>
> There are a bunch of applications that do read-modify-write operations on HBase constructs, e.g. a counter: the counter value has to be read in from HDFS before it can be incremented. We have an application where the number of increments on a counter far outnumbers the number of times the counter is used or read. For this type of application, it would be very beneficial not to have to read the counter from disk before it can be incremented.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)