From: "Buttler, David"
To: "user@hbase.apache.org"
Date: Fri, 25 Mar 2011 10:00:26 -0700
Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

I would certainly find it useful if you wrote such a blog post.

Dave

-----Original Message-----
From: Michael Segel [mailto:michael_segel@hotmail.com]
Sent: Friday, March 25, 2011 8:55 AM
To: user@hbase.apache.org
Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?

"During inserts into the table, there was one field that was populated from hand-crafted HTML that should only have a small range of values (e.g. a primary color). We wanted to keep a log of all of the unique values that were found here, and so the values were the map job output and then sorted and counted in the reduce phase."

Ahhh, have you heard about dynamic counters?

You don't need a reducer and all you have to do is dump the counters in your main job after your mappers run.

Maybe I should write a blog entry where you can do your word counter app using just dynamic counters and no reducers?

HTH

-Mike

----------------------------------------
> From: buttler1@llnl.gov
> To: user@hbase.apache.org
> Date: Fri, 25 Mar 2011 08:44:12 -0700
> Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
>
> We ran across a use-case this week. During inserts into the table, there was one field that was populated from hand-crafted HTML that should only have a small range of values (e.g. a primary color).
> We wanted to keep a log of all of the unique values that were found here, and so the values were the map job output and then sorted and counted in the reduce phase. A handy way for us to debug the HTML into a persistent file (we could have just used counters, but those disappear after a while unless you manually copy them).
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Friday, March 25, 2011 8:26 AM
> To: user@hbase.apache.org
> Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
>
> Yeah...
> Uhm I don't know of many use cases where you would want or need a reducer step when dealing with HBase.
> I'm sure one may exist, but from past practical experience... you shouldn't need one.
>
> ----------------------------------------
> > From: buttler1@llnl.gov
> > To: user@hbase.apache.org
> > Date: Fri, 25 Mar 2011 08:20:45 -0700
> > Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
> >
> > There is no reason to use a reducer in this scenario. I frequently do map-only update jobs. Skipping the reduce step saves a lot of unnecessary work.
> >
> > Dave
> >
> > -----Original Message-----
> > From: Stanley Xu [mailto:wenhao.xu@gmail.com]
> > Sent: Thursday, March 24, 2011 7:37 PM
> > To: user@hbase.apache.org
> > Subject: How could I re-calculate every entries in hbase efficiently through mapreduce?
> >
> > Dear Buddies,
> >
> > I need to re-calculate the entries in a hbase everyday, like let x = 0.9x everyday, to make the time has impact on the entry values.
> >
> > So I write a TableMapper to get the Entry, and recalculate the result, and use Context.write(key, put) to put the update operation in context, and then use a IdentityTableReducer to write that directly back the hbase.
> > In order to make the job done in a short time, I use the HRegionPartitioner to increase the reducer number to 50.
> >
> > But I have two doubts here:
> > 1. It looks the partitioner will do a lots of shuffling, I am wondering why it couldn't just do the put on the local region since the read and write on the same entry should be on the same region, isn't it?
> >
> > 2. If the job failed for any reason(like timeout), the HBase might be in a partial-updated status, is it?
> >
> > Is there any suggestion that I could avoid these two problems?
> >
> > Thanks.
> >
> > Best wishes,
> > Stanley Xu
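
The two ideas discussed in this thread, updating each row from the map step with no reducer and tallying unique field values with per-value counters, can be modeled in plain Java. What follows is only a minimal sketch under stated assumptions, not an HBase job: it uses no Hadoop or HBase classes, the "table" is an in-memory TreeMap, and the names Row, DECAY and mapOnlyPass are hypothetical. In a real job the same shape would presumably be a TableMapper that emits its Put with the number of reduce tasks set to zero, and the tallies would go through Hadoop counters instead of a local map.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java model of a map-only update pass with per-value counters.
 * Assumption: the table is an in-memory sorted map; nothing here talks to HBase.
 */
public class MapOnlyDecaySketch {

    /** Decay factor from the "x = 0.9x" example in the thread. */
    static final double DECAY = 0.9;

    /** Hypothetical row: a numeric value plus a small-range field (e.g. a color). */
    static final class Row {
        double value;
        final String color;

        Row(double value, String color) {
            this.value = value;
            this.color = color;
        }
    }

    /**
     * Stands in for the map phase: visit every row once, write the decayed
     * value back in place, and bump a counter named after the row's field
     * value. There is no shuffle and no reduce step.
     */
    static Map<String, Long> mapOnlyPass(Map<String, Row> table) {
        Map<String, Long> counters = new TreeMap<>();
        for (Row row : table.values()) {
            row.value = row.value * DECAY;             // in-place update of the row just read
            counters.merge(row.color, 1L, Long::sum);  // one counter per distinct field value
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<String, Row> table = new TreeMap<>();
        table.put("row1", new Row(10.0, "red"));
        table.put("row2", new Row(20.0, "blue"));
        table.put("row3", new Row(5.0, "red"));

        // "Dump the counters" once the pass has finished.
        Map<String, Long> counters = mapOnlyPass(table);
        counters.forEach((name, count) -> System.out.println(name + " = " + count));
    }
}
```

Because each row is rewritten where it is read, there is nothing to partition or shuffle. Note that the sketch does nothing about the partial-update concern in question 2: running mapOnlyPass a second time would apply the decay twice.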