Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
Date: Tue, 30 Apr 2013 22:25:10 +0300
Subject: Re: checkAnd...
From: Lior Schachter
To: user
Content-Type: multipart/alternative; boundary=089e013a1978b81d4f04db98f5b7

--089e013a1978b81d4f04db98f5b7
Content-Type: text/plain; charset=ISO-8859-1

Hi,

We have a simple HBase schema:
row key = subscriber id.
Column family A = counters - all kinds of aggregations.

Event records have a UUID; in some scenarios we might get duplicate events.
We should not count the duplicates.

A possible solution was to keep event ids as qualifiers in another CF and
do checkAndIncrement only if the event id is not found.

I understand how to utilize RegionObserver to solve the problem. Any other
suggestions ?

Thanks,
Lior.

On Sun, Apr 28, 2013 at 10:55 PM, Asaf Mesika wrote:

> Yep.
> You can write a RegionObserver which takes all event qualifiers with a
> timestamp larger than a certain grace period, sums them up, adds the sum to
> the current value of the Count qualifier and emits an updated Count qualifier.
> I wrote something very similar for us at Akamai and it improved throughput
> by x10. I'm working on open sourcing it.
>
> On Saturday, April 27, 2013, Lior Schachter wrote:
>
> > Hi Ted,
> > Thanks for the prompt response.
> > I've already had a look at HRegionServer.checkAndPut and the implementation
> > looks quite straightforward.
> > That's why I was wondering why the other 2 methods are not available...or
> > planned (couldn't find Jira).
> > Seems like a useful functionality.
> >
> > Anyhow, I'm not allowed to make any source code modifications to the HBase
> > installation (in production) so I reckon I'll have to find a workaround.
> >
> > This is my use case:
> > Updating user counters by events.
> > We may get (in rare cases) duplicate events.
> > Should not count the duplicates.
> >
> > My initial thought was to have an event_id qualifier for each incoming
> > event (with '1' value). By checking if event_id exists before incrementing
> > I can avoid duplicates.
> > Without the checkAndIncrement functionality I must make 2 round trips for
> > each event (which doesn't make sense).
> >
> > Any ideas how I can solve this issue ?
> >
> > Thanks,
> > Lior
> >
> > On Sat, Apr 27, 2013 at 4:23 PM, Ted Yu wrote:
> >
> > > Take a look at the following method in HRegionServer:
> > >
> > > public boolean checkAndPut(final byte[] regionName, final byte[] row,
> > >     final byte[] family, final byte[] qualifier, final byte[] value,
> > >     final Put put) throws IOException {
> > >
> > > You can create checkAndIncrement() in a similar way.
> > >
> > > Cheers
> > >
> > > On Sat, Apr 27, 2013 at 9:02 PM, Lior Schachter wrote:
> > >
> > > > Hi,
> > > > I want to increment a cell value only after checking a condition on
> > > > another cell. I could find checkAndPut/checkAndDelete on
> > > > HTableInterface. It seems that checkAndIncrement (and checkAndAppend)
> > > > are missing.
> > > >
> > > > Can you suggest a workaround for my use-case ? working with version
> > > > 0.94.5.
> > > >
> > > > Thanks,
> > > > Lior

--089e013a1978b81d4f04db98f5b7--
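
For readers following the thread, the semantics being asked for (increment a counter only if the event id has not been seen) can be sketched in plain Java. This is a minimal in-memory model, not HBase client or coprocessor code: the class name SubscriberRow and its methods are hypothetical, the set stands in for the column family of event-id qualifiers, and the map stands in for the counters column family.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory model of one subscriber row; NOT the HBase API. */
final class SubscriberRow {
    // Stands in for the column family that keeps event ids as qualifiers.
    private final Set<String> seenEventIds = ConcurrentHashMap.newKeySet();
    // Stands in for the counters column family: counter name -> value.
    private final ConcurrentHashMap<String, AtomicLong> counters =
            new ConcurrentHashMap<>();

    /**
     * Adds delta to the named counter only if eventId was not recorded before.
     * Returns true if the increment was applied, false for a duplicate event.
     */
    boolean checkAndIncrement(String eventId, String counter, long delta) {
        // Set.add returns false when the id is already present, so exactly
        // one caller per event id proceeds to the increment.
        if (!seenEventIds.add(eventId)) {
            return false;
        }
        counters.computeIfAbsent(counter, k -> new AtomicLong()).addAndGet(delta);
        return true;
    }

    /** Current value of the named counter, or 0 if it was never incremented. */
    long get(String counter) {
        AtomicLong value = counters.get(counter);
        return value == null ? 0L : value.get();
    }
}
```

In this model the check and the marker write are a single atomic step, which is what removes the second round trip described in the thread. The marker write and the counter update are still two separate steps here; a server-side version would presumably perform both while holding the row lock, which is an assumption about how such a method would be built rather than something the thread confirms.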