hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
Date Fri, 25 Mar 2011 15:55:16 GMT

"During inserts into the table, there was one field that was populated 
from hand-crafted HTML that should only have a small range of values 
(e.g. a primary color). We wanted to keep a log of all of the unique 
values that were found here, and so the values were the map job output 
and then sorted and counted in the reduce phase."

Ahhh, have you heard about dynamic counters?
You don't need a reducer; all you have to do is dump the counters in your driver after
the mappers run.

Maybe I should write a blog entry showing how to do the word-count app using just dynamic
counters and no reducers?
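[Editor's note: a minimal sketch of the word count Mike describes, using dynamic counters and no reducer. The class names, counter group name, and input path handling are invented for illustration; this targets the vanilla Hadoop MapReduce API and needs a Hadoop runtime to execute.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CounterWordCount {

  public static class CountMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) {
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) {
          // Dynamic counter: the group/name strings are created on the fly,
          // one counter per distinct word. The framework aggregates counter
          // values across all map tasks, so no reducer is needed.
          context.getCounter("WORDS", word).increment(1);
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "counter-wordcount");
    job.setJarByClass(CounterWordCount.class);
    job.setMapperClass(CountMapper.class);
    job.setNumReduceTasks(0);                 // map-only, no shuffle
    job.setOutputFormatClass(NullOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.waitForCompletion(true);

    // Dump the aggregated counters in the driver. If you need them to
    // persist (they fall out of the JobTracker history eventually), copy
    // them to a file here.
    for (Counter c : job.getCounters().getGroup("WORDS")) {
      System.out.println(c.getName() + "\t" + c.getValue());
    }
  }
}
```

One caveat: Hadoop caps the number of counters per job (a small configurable limit), so this only works when the set of distinct values is small, as in Dave's primary-color example; it is not a general word-count replacement.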

HTH

-Mike


----------------------------------------
> From: buttler1@llnl.gov
> To: user@hbase.apache.org
> Date: Fri, 25 Mar 2011 08:44:12 -0700
> Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
>
> We ran across a use-case this week. During inserts into the table, there was one field
> that was populated from hand-crafted HTML that should only have a small range of values
> (e.g. a primary color). We wanted to keep a log of all of the unique values that were found
> here, and so the values were the map job output and then sorted and counted in the reduce
> phase. This was a handy way for us to log the bad HTML to a persistent file (we could have
> just used counters, but those disappear after a while unless you manually copy them out).
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Friday, March 25, 2011 8:26 AM
> To: user@hbase.apache.org
> Subject: RE: How could I re-calculate every entries in hbase efficiently through mapreduce?
>
>
>
> Yeah...
> Uhm I don't know of many use cases where you would want or need a reducer step when dealing
with HBase.
> I'm sure one may exist, but from past practical experience... you shouldn't need one.
>
> ----------------------------------------
> > From: buttler1@llnl.gov
> > To: user@hbase.apache.org
> > Date: Fri, 25 Mar 2011 08:20:45 -0700
> > Subject: RE: How could I re-calculate every entries in hbase efficiently through
> > mapreduce?
> >
> > There is no reason to use a reducer in this scenario. I frequently do map-only update
> > jobs. Skipping the reduce step saves a lot of unnecessary work.
> >
> > Dave
> >
> > -----Original Message-----
> > From: Stanley Xu [mailto:wenhao.xu@gmail.com]
> > Sent: Thursday, March 24, 2011 7:37 PM
> > To: user@hbase.apache.org
> > Subject: How could I re-calculate every entries in hbase efficiently through mapreduce?
> >
> > Dear Buddies,
> >
> > I need to re-calculate the entries in an HBase table every day, e.g. setting x = 0.9x,
> > so that time has an impact on the entry values.
> >
> > So I wrote a TableMapper to read each entry and recalculate the result, used
> > Context.write(key, put) to emit the update, and then an IdentityTableReducer to write
> > it directly back to HBase. To get the job done in a short time, I used the
> > HRegionPartitioner to increase the reducer count to 50.
> >
> > But I have two doubts here:
> > 1. It looks like the partitioner does a lot of shuffling. I am wondering why it
> > couldn't just do the put on the local region, since the read and write for the same
> > entry should hit the same region, shouldn't they?
> >
> > 2. If the job fails for any reason (like a timeout), won't HBase be left in a
> > partially updated state?
> >
> > Are there any suggestions for how I could avoid these two problems?
> >
> >
> > Thanks.
> >
> > Best wishes,
> > Stanley Xu
>
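[Editor's note: a sketch of the map-only approach recommended in the thread, applied to Stanley's daily 0.9x decay pass. The table name "scores" and column "f:value" are invented; this targets the HBase TableMapper/TableMapReduceUtil API of the era and needs a live cluster to run.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class DecayJob {

  static final byte[] FAMILY = Bytes.toBytes("f");
  static final byte[] QUALIFIER = Bytes.toBytes("value");

  public static class DecayMapper
      extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result,
        Context context) throws IOException, InterruptedException {
      byte[] raw = result.getValue(FAMILY, QUALIFIER);
      if (raw == null) return;
      double x = Bytes.toDouble(raw);
      Put put = new Put(row.get());
      put.add(FAMILY, QUALIFIER, Bytes.toBytes(0.9 * x));
      // With zero reducers there is no shuffle and no partitioner:
      // TableOutputFormat sends each Put straight to the region server
      // that already holds the row, which answers doubt #1.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "daily-decay");
    job.setJarByClass(DecayJob.class);

    Scan scan = new Scan();
    scan.addColumn(FAMILY, QUALIFIER);
    scan.setCaching(500);        // larger scanner batches for a full-table pass
    scan.setCacheBlocks(false);  // don't pollute the block cache

    TableMapReduceUtil.initTableMapperJob("scores", scan,
        DecayMapper.class, ImmutableBytesWritable.class, Put.class, job);
    // Passing a null reducer class wires up TableOutputFormat for writes.
    TableMapReduceUtil.initTableReducerJob("scores", null, job);
    job.setNumReduceTasks(0);    // map-only
    job.waitForCompletion(true);
  }
}
```

Doubt #2 still stands with or without reducers: Puts are applied as tasks complete, so a failed job leaves the table partially decayed, and because x = 0.9x is not idempotent a blind re-run would decay some rows twice. One hedge is to write each pass into a dated column (or a copy of the table) and switch readers over only after the job succeeds.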