hbase-user mailing list archives

From: Michel Segel <michael_se...@hotmail.com>
Subject: Re: multiple puts in reducer?
Date: Wed, 29 Feb 2012 13:04:15 GMT

The assertion is that in most cases you shouldn't need one. The rule of thumb is that
you should have to defend your use of one.

Reducers are expensive. Running multiple mappers in a job can be cheaper.

All I am saying is that you need to rethink your solution if you insist on using a reducer.
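
To make the map-only idea concrete, here is a minimal sketch of the frequency-count case raised further down the thread. It is not code from the thread. Each map task aggregates counts locally and then writes them itself as increments, so there is no shuffle and no reducer. The names CounterSink, InMemorySink and CountingMapTask are hypothetical: CounterSink stands in for an atomic increment against the table, and the in-memory sink exists only so the sketch runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Driver that simulates two map tasks over two input splits.
class MapOnlyCounts {
    public static void main(String[] args) {
        InMemorySink table = new InMemorySink();
        String[][] splits = { {"alice", "bob", "alice"}, {"bob", "alice"} };
        for (String[] split : splits) {
            CountingMapTask task = new CountingMapTask(table);
            for (String user : split) {
                task.map(user);
            }
            task.cleanup();
        }
        System.out.println(table.rows + " written with " + table.calls + " increments");
    }
}

// Hypothetical stand-in for a table client (assumption, not an HBase API).
interface CounterSink {
    void increment(String rowKey, long amount);
}

// In-memory sink used only to make the sketch runnable.
class InMemorySink implements CounterSink {
    final Map<String, Long> rows = new HashMap<>();
    int calls = 0;

    @Override
    public void increment(String rowKey, long amount) {
        rows.merge(rowKey, amount, Long::sum);
        calls++;
    }
}

// Mimics one map task: map() is called per record, cleanup() once at the end.
class CountingMapTask {
    private final Map<String, Long> local = new HashMap<>();
    private final CounterSink sink;

    CountingMapTask(CounterSink sink) {
        this.sink = sink;
    }

    // Aggregate locally instead of emitting (key, 1) for a reducer.
    void map(String key) {
        local.merge(key, 1L, Long::sum);
    }

    // Write one increment per distinct key seen by this task.
    void cleanup() {
        for (Map.Entry<String, Long> e : local.entrySet()) {
            sink.increment(e.getKey(), e.getValue());
        }
        local.clear();
    }
}
```

Because increments are additive, two map tasks that see the same key still end up with the right total in the row. This only covers aggregations that can be expressed as increments; anything that needs all values for a key in one place is a case where you would defend the reducer.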

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 28, 2012, at 11:40 AM, Ben Snively <bsnively@gmail.com> wrote:

> Is there an assertion that you would never need to run a reducer when
> writing to the DB?
> It seems that there are cases when you would not need one, but the general
> statement doesn't apply to all use cases.
> If you were trying to process data where two map tasks (or sets of map
> tasks) may output the same key, you could have a case where you need to
> reduce the data for that key prior to inserting the result into HBase.
> Am I missing something? To me, the deciding factor is whether the
> key/values output in the map task are the exact values that need to be
> inserted into HBase, or whether multiple values must be aggregated together
> and the result put into the HBase entry.
> Thanks,
> Ben
> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> The better question is why would you need a reducer?
>> That's a bit cryptic, I understand, but you have to ask yourself when you
>> need to use a reducer when you are writing to a database... ;-)
>> Sent from my iPhone
>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <tvinod@readypulse.com>
>> wrote:
>>> Mike,
>>> I didn't understand. Why would I not need a reducer in HBase m/r? There
>>> can be cases, right?
>>> My use case is very similar to Sujee's blog on frequency counting -
>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
>>> So in the reducer, I can do all the aggregations. Is there a better way?
>>> I can think of another way: use increments in the map job itself. I
>>> have to figure out if that's possible, though.
>>> thanks
>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel <michael_segel@hotmail.com>
>>> wrote:
>>>> Yes, you can do it.
>>>> But why do you have a reducer when running an m/r job against HBase?
>>>> The trick to writing multiple rows: you do it independently of the
>>>> output from the map() method.
>>>> Sent from a remote device. Please excuse any typos...
>>>> Mike Segel
>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <tvinod@readypulse.com>
>>>> wrote:
>>>>> While doing map reduce on HBase tables, is it possible to do multiple
>>>>> puts in the reducer? What I want is a way to be able to write multiple
>>>>> rows. If it's not possible, then what are the other alternatives? I mean
>>>>> like creating a wider table in that case.
>>>>> thanks