Subject: Re: Increment Counters in HBase during MapReduce
From: Michael Segel
Date: Sun, 24 Jun 2012 18:19:11 -0500
To: user@hbase.apache.org

There are a couple of issues, and I'm sure others will point them out.

If you turn off speculative execution on the job, you don't get duplicate tasks running in parallel.

You could create a table to store your aggregations on a per-job basis, where the row ID incorporates the job ID.

Then, at the end of the job, if you didn't have any task failures or speculatively executed tasks, you could count on your aggregations being correct.

If a task failed or was killed (a simple test if, for some reason, the job ran with speculative execution), you could discard that row's data.

On Jun 24, 2012, at 4:15 PM, David Koch wrote:

> Hello J-D
>
> I have a similar requirement to that presented by the original poster, i.e.
> updating a totals count without having to push the entire data set through
> the Mapper again.
>
> Are you advising against calling incrementColumnValue on a mapper's HTable
> instance because the operation is not idempotent, or are there other
> reasons? It is even suggested in the docs:
> http://hbase.apache.org/book/mapreduce.example.html (section 7.2.6).
>
> Do you know of any "count-exactly-once" implementations on top of Hadoop
> Map/Reduce?
>
> Thanks,
>
> /David
>
>
> On Tue, Jun 19, 2012 at 6:55 PM, Jean-Daniel Cryans wrote:
>
>> This question was answered here already:
>>
>> http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%3CAANLkTinnW2d7DMCyFu3ptv1Hu_i3XqK_1pDSgD5NT_Lk@mail.gmail.com%3E
>>
>> Counters are not idempotent; this can be hard to manage.
>>
>> J-D
>>
>> On Mon, Jun 18, 2012 at 5:49 PM, Sid Kumar wrote:
>>> Hi everyone,
>>>
>>> I have a use case in HBase that I was wondering if someone may have
>>> stumbled upon. I am maintaining an ad impressions table with columns that
>>> are counters for certain metrics.
>>> I started using the incrementColumnValue
>>> method of the HTable API to update these metrics, and that works great.
>>> I was wondering if this function could be used from a MapReduce job.
>>> The TableOutputFormat supports only Delete and Put operations. Using
>>> increment counters saves me from doing any aggregations in my MapReduce
>>> code. Ideally I would like to just call this function in my mapper and
>>> wouldn't even need a Reducer.
>>> Has anyone run into this use case? I would also love to know if there
>>> are any better alternatives for solving this. Any info would be great.
>>>
>>> Thanks
>>> Sid
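
For illustration only, here is a minimal in-memory sketch of the per-job staging idea described in the reply at the top of this message: increments go to rows whose key starts with the job ID, and at the end of the job those rows are either folded into the real totals (clean run) or discarded (a task failed, was killed, or ran speculatively). It deliberately does not use the HBase client. The class name JobScopedCounters, its methods, and the "jobId|counterKey" row-key layout are all hypothetical stand-ins, and deciding whether a run was "clean" is left to the caller.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory stand-in for a staging table keyed by job ID (hypothetical names;
 * a real job would issue increments against an HBase table instead).
 */
class JobScopedCounters {
    // Staging "table": row key "<jobId>|<counterKey>" -> running total.
    private final Map<String, Long> staging = new TreeMap<>();
    // Final totals, only modified when a job finishes cleanly.
    private final Map<String, Long> totals = new HashMap<>();

    /** Non-idempotent increment, scoped to one job by the row-key prefix. */
    void increment(String jobId, String counterKey, long delta) {
        staging.merge(jobId + "|" + counterKey, delta, Long::sum);
    }

    /**
     * Ends a job: folds its staged rows into the totals if the run was clean,
     * otherwise drops them. Returns the number of staged rows handled.
     */
    int finishJob(String jobId, boolean cleanRun) {
        String prefix = jobId + "|";
        int handled = 0;
        Iterator<Map.Entry<String, Long>> it = staging.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> row = it.next();
            if (!row.getKey().startsWith(prefix)) {
                continue; // row belongs to another job
            }
            if (cleanRun) {
                String counterKey = row.getKey().substring(prefix.length());
                totals.merge(counterKey, row.getValue(), Long::sum);
            }
            it.remove();
            handled++;
        }
        return handled;
    }

    /** Committed total for a counter, or 0 if nothing was committed. */
    long total(String counterKey) {
        return totals.getOrDefault(counterKey, 0L);
    }

    public static void main(String[] args) {
        JobScopedCounters table = new JobScopedCounters();
        // jobA ran cleanly: each record was counted once.
        table.increment("jobA", "ad42:impressions", 3);
        table.increment("jobA", "ad42:impressions", 2);
        table.finishJob("jobA", true);
        // jobB had a task retried, so one record was counted twice.
        table.increment("jobB", "ad42:impressions", 4);
        table.increment("jobB", "ad42:impressions", 4); // duplicate from the retry
        table.finishJob("jobB", false); // discard instead of committing
        System.out.println(table.total("ad42:impressions"));
    }
}
```

The point of the sketch is only that the non-idempotent increments never touch the final totals directly, so a job with duplicated work can be thrown away and rerun.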