hbase-user mailing list archives

From Ben Snively <bsniv...@gmail.com>
Subject Re: multiple puts in reducer?
Date Wed, 29 Feb 2012 13:21:06 GMT
I would enjoy seeing this:

"  Maybe I should submit this as an HBaseconn topic for a presentation? "

Thanks,
Ben

On Wed, Feb 29, 2012 at 8:18 AM, Michel Segel <michael_segel@hotmail.com> wrote:

> There is nothing wrong in writing the output from a reducer to HBase.
>
> The question you have to ask yourself is why are you using a reducer in
> the first place. ;-)
>
> Look, you have a database. Why do you need a reducer?
>
> It's a simple question... Right? ;-)
>
> Look, I apologize for being cryptic. This is one of those philosophical
> design questions where you the developer/architect have to figure out the
> answer for yourself.  Maybe I should submit this as an HBaseconn topic for
> a presentation?
>
> Sort of like how to do an efficient table join in HBase....
>
> HTH
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 28, 2012, at 11:16 PM, Jacques <whshub@gmail.com> wrote:
>
> > I see nothing wrong with writing the output of the reducer into HBase. You
> > just need to make sure duplicated operations wouldn't cause problems. If
> > using TableOutputFormat, don't use randomly seeded keys. If working straight
> > against HTable, don't use increment. We do this for some situations and
> > either don't care about overwrites or use checkAndPut with a skip option in
> > the application code.
> > On Feb 28, 2012 9:40 AM, "Ben Snively" <bsnively@gmail.com> wrote:
> >
> >> Is there an assertion that you would never need to run a reducer when
> >> writing to the DB?
> >>
> >> It seems that there are cases when you would not need one, but the general
> >> statement doesn't apply to all use cases.
> >>
> >> If you were processing data where a map task (or set of map tasks) may
> >> output the same key, you could have a case where you need to reduce the
> >> data for that key prior to inserting the result into HBase.
> >>
> >> Am I missing something? To me, that would be the deciding factor: whether
> >> the key/values output by the map task are the exact values that need to be
> >> inserted into HBase, or multiple values are aggregated together and the
> >> result is put into the HBase entry.
> >>
> >> Thanks,
> >> Ben
> >>
> >>
> >> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel
> >> <michael_segel@hotmail.com> wrote:
> >>
> >>> The better question is why would you need a reducer?
> >>>
> >>> That's a bit cryptic, I understand, but you have to ask yourself when do
> >>> you need to use a reducer when you are writing to a database... ;-)
> >>>
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <tvinod@readypulse.com>
> >>> wrote:
> >>>
> >>>> Mike,
> >>>> I didn't understand - why would I not need a reducer in HBase m/r? There
> >>>> can be cases, right?
> >>>> My use case is very similar to Sujee's blog on frequency counting -
> >>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
> >>>> So in the reducer, I can do all the aggregations. Is there a better way?
> >>>> I can think of another way - to use increments in the map job itself. I
> >>>> have to figure out if that's possible though.
> >>>>
> >>>> thanks
> >>>>
> >>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel <michael_segel@hotmail.com>
> >>>> wrote:
> >>>>
> >>>>> Yes you can do it.
> >>>>> But why do you have a reducer when running a m/r job against HBase?
> >>>>>
> >>>>> The trick in writing multiple rows... You do it independently of the
> >>>>> output from the map() method.
> >>>>>
> >>>>>
> >>>>> Sent from a remote device. Please excuse any typos...
> >>>>>
> >>>>> Mike Segel
> >>>>>
> >>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <tvinod@readypulse.com>
> >>>>> wrote:
> >>>>>
> >>>>>> While doing map reduce on HBase tables, is it possible to do multiple
> >>>>>> puts in the reducer? What I want is a way to be able to write multiple
> >>>>>> rows. If it's not possible, then what are the other alternatives? I
> >>>>>> mean, like creating a wider table in that case.
> >>>>>>
> >>>>>> thanks
> >>>>>
> >>>
> >>
>
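
Editorial illustration (not part of the original messages): the thread's two
technical points are that a reducer is useful when several map outputs share a
key and must be aggregated before the write, and that the write itself should
tolerate a task being run more than once. The sketch below is plain Java with
no HBase dependency. FakeTable, reduceWithPut and reduceWithIncrement are
hypothetical names invented for this example, and the in-memory map only
stands in for a table; it is not the HBase client API. It assumes a reduce
task may be executed twice (a retry or a speculative attempt) and compares an
overwrite-style put with an increment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReducerWriteSketch {

    // Hypothetical in-memory stand-in for a table: row key -> stored count.
    static final class FakeTable {
        final Map<String, Long> rows = new HashMap<>();

        // Overwrite semantics: writing the same value twice leaves one value.
        void put(String rowKey, long value) {
            rows.put(rowKey, value);
        }

        // Additive semantics: applying the same delta twice adds it twice.
        void increment(String rowKey, long delta) {
            rows.merge(rowKey, delta, Long::sum);
        }
    }

    // Reducer-style aggregation: sum every value emitted for one key.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Reduce step that writes the aggregate with an overwrite.
    static void reduceWithPut(FakeTable table, String key, List<Long> values) {
        table.put(key, sum(values));
    }

    // Reduce step that adds the aggregate to whatever is already stored.
    static void reduceWithIncrement(FakeTable table, String key, List<Long> values) {
        table.increment(key, sum(values));
    }

    public static void main(String[] args) {
        // Three map outputs for the same key, as in a frequency count.
        List<Long> values = Arrays.asList(1L, 1L, 1L);
        FakeTable withPut = new FakeTable();
        FakeTable withIncrement = new FakeTable();

        // Run each reduce step twice to mimic a repeated task attempt.
        for (int attempt = 0; attempt < 2; attempt++) {
            reduceWithPut(withPut, "word", values);
            reduceWithIncrement(withIncrement, "word", values);
        }

        System.out.println("put: " + withPut.rows.get("word"));
        System.out.println("increment: " + withIncrement.rows.get("word"));
    }
}
```

In this toy model the put path ends at 3 after both attempts, while the
increment path ends at 6, which is the kind of duplicate-operation problem
Jacques warns about. Whether a real job sees repeated attempts depends on its
configuration and failure behaviour.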
