hbase-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: multiple puts in reducer?
Date Wed, 29 Feb 2012 13:18:10 GMT
There is nothing wrong with writing the output from a reducer to HBase.

The question you have to ask yourself is why are you using a reducer in the first place. ;-)

Look, you have a database. Why do you need a reducer?

It's a simple question... Right? ;-)

Look, I apologize for being cryptic. This is one of those philosophical design questions where
you, the developer/architect, have to figure out the answer for yourself. Maybe I should submit
this as an HBaseCon topic for a presentation?

Sort of like how to do an efficient table join in HBase.... 

HTH
Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 28, 2012, at 11:16 PM, Jacques <whshub@gmail.com> wrote:

> I see nothing wrong with writing the output of the reducer into HBase. You
> just need to make sure duplicated operations won't cause problems. If
> using TableOutputFormat, don't use randomly seeded keys. If working straight
> against HTable, don't use increment. We do this for some situations and
> either don't care about overwrites or use checkAndPut with a skip option in
> the application code.
> On Feb 28, 2012 9:40 AM, "Ben Snively" <bsnively@gmail.com> wrote:
> 
>> Is there an assertion that you would never need to run a reducer when
>> writing to the DB?
>> 
>> It seems that there are cases when you would not need one, but the general
>> statement doesn't apply to all use cases.
>> 
>> If you were trying to process data where you may have a map task (or
>> set of map tasks) output the same key more than once, you could have a case
>> where you need to reduce the data for that key prior to inserting the result
>> into HBase.
>> 
>> Am I missing something? To me, that would be the deciding factor: whether
>> the key/values output in the map task are the exact values that need to be
>> inserted into HBase, or multiple values need to be aggregated together and
>> the results put into the HBase entry.
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel
>> <michael_segel@hotmail.com> wrote:
>> 
>>> The better question is why would you need a reducer?
>>> 
>>> That's a bit cryptic, I understand, but you have to ask yourself when do
>>> you need to use a reducer when you are writing to a database... ;-)
>>> 
>>> 
>>> Sent from my iPhone
>>> 
>>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <tvinod@readypulse.com>
>>> wrote:
>>> 
>>>> Mike,
>>>> I didn't understand - why would I not need a reducer in HBase m/r? There
>>>> can be cases, right?
>>>> My use case is very similar to Sujee's blog on frequency counting -
>>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
>>>> So in the reducer, I can do all the aggregations. Is there a better way? I
>>>> can think of another way - to use increments in the map job itself. I have
>>>> to figure out if that's possible though.
>>>> 
>>>> thanks
>>>> 
>>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel
>>>> <michael_segel@hotmail.com> wrote:
>>>> 
>>>>> Yes you can do it.
>>>>> But why do you have a reducer when running an m/r job against HBase?
>>>>> 
>>>>> The trick in writing multiple rows... You do it independently of the
>>>>> output from the map() method.
>>>>> 
>>>>> 
>>>>> Sent from a remote device. Please excuse any typos...
>>>>> 
>>>>> Mike Segel
>>>>> 
>>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <tvinod@readypulse.com>
>>>>> wrote:
>>>>> 
>>>>>> While doing map reduce on HBase tables, is it possible to do multiple
>>>>>> puts in the reducer? What I want is a way to be able to write multiple
>>>>>> rows. If it's not possible, then what are the other alternatives? I mean
>>>>>> like creating a wider table in that case.
>>>>>> 
>>>>>> thanks
>>>>> 
>>> 
>> 
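
The point raised in the thread about duplicated operations can be illustrated with a small sketch. MapReduce may run a task more than once (a retry after a failure, or a speculative attempt), so any write made from that task may be replayed. The code below is a minimal simulation under stated assumptions: FakeCounterTable is a hypothetical in-memory stand-in, not the HBase client API, and the row key "user42" and the helper method names are invented for illustration. It only shows why an overwrite-style put of an aggregated total survives a replay while increment-style writes do not.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a table of counters; NOT the HBase client API.
class FakeCounterTable {
    private final Map<String, Long> cells = new HashMap<>();

    // Put-like write: stores an absolute value, overwriting whatever was there.
    void put(String rowKey, long value) {
        cells.put(rowKey, value);
    }

    // Increment-like write: adds a delta to the stored value.
    void increment(String rowKey, long delta) {
        cells.merge(rowKey, delta, Long::sum);
    }

    // Returns the stored value, or 0 if the row was never written.
    long get(String rowKey) {
        return cells.getOrDefault(rowKey, 0L);
    }
}

class ReplayDemo {
    // Sums the values for one key, as a reducer would, then writes the total as one put.
    static void reduceWithPut(FakeCounterTable table, String key, List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        table.put(key, total);
    }

    // Applies each value as an increment, with no aggregation step first.
    static void writeWithIncrement(FakeCounterTable table, String key, List<Long> values) {
        for (long v : values) {
            table.increment(key, v);
        }
    }

    public static void main(String[] args) {
        List<Long> counts = List.of(1L, 1L, 1L);
        FakeCounterTable puts = new FakeCounterTable();
        FakeCounterTable incs = new FakeCounterTable();

        // Run each write path twice to mimic a task attempt that gets replayed.
        for (int attempt = 0; attempt < 2; attempt++) {
            reduceWithPut(puts, "user42", counts);
            writeWithIncrement(incs, "user42", counts);
        }

        System.out.println("put after replay:       " + puts.get("user42"));
        System.out.println("increment after replay: " + incs.get("user42"));
    }
}
```

In this simulation the put path still holds 3 after the replay, while the increment path holds 6. That is the kind of duplicated-operation problem the earlier reply warns about; a real job would need to decide, per table and per write path, whether overwrites are acceptable or whether a conditional write is required.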
