hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: anyway to do "local" reduce like the combiner does?
Date Wed, 01 Feb 2012 13:27:33 GMT
There is currently no framework-provided way to do this in map-only
jobs, but you can do it with an in-mapper combiner of your own: buffer
the values per key inside the mapper and write them out when the task
finishes. Projects like Apache Pig and Apache Hive carry such features
too, for some of their operations. You'd end up requiring more memory
in the map tasks this way, though.
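
A minimal sketch of the in-mapper combiner idea, written in plain Java without Hadoop dependencies so it can be read on its own. The class name, the `maxBufferedNames` cap, and the `sink` callback (standing in for the DB write) are all hypothetical, not part of any Hadoop API. In a real job, the body of `map()` here would go in `Mapper.map()` and `flush()` would be called from `Mapper.cleanup()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Sketch only: buffers attributes per name and persists each name once per flush. */
class InMapperCombiner {
    // Hypothetical cap on distinct buffered names, to bound map-task memory.
    private final int maxBufferedNames;
    // Hypothetical sink standing in for the external store (e.g. a DB write).
    private final BiConsumer<String, List<String>> sink;
    // Insertion-ordered buffer: name -> attributes seen so far.
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();

    InMapperCombiner(int maxBufferedNames, BiConsumer<String, List<String>> sink) {
        this.maxBufferedNames = maxBufferedNames;
        this.sink = sink;
    }

    /** Called once per input line of the form "name : attribute". */
    void map(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // skip malformed lines
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);
        // Flush early if the buffer grows past the cap.
        if (buffer.size() > maxBufferedNames) {
            flush();
        }
    }

    /** Persists everything buffered so far, one call per name, then clears the buffer. */
    void flush() {
        buffer.forEach(sink);
        buffer.clear();
    }
}
```

With a cap in place, a name can be persisted more than once if an early flush happens between two of its lines. That should be acceptable only because attributes for the same name can be safely persisted separately, as stated in the original question.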

On Sun, Jan 29, 2012 at 1:36 PM, Jianhui Zhang <wuhuagua@yahoo.com> wrote:
> I have a problem at hand that seems to need "local" reducing:
> I have a large data input, in which each line is a data mapping, something
> like "name : attribute". The attributes for the same name are usually pretty
> close in the file, so they are very likely to be processed by the same
> mapper. I need to persist the "name:attributes" somewhere else (think DB).
> It'll be optimal if I can combine the attributes of the same name together
> and only persist them once. Attributes for the same name from different
> mappers can be safely persisted separately.
> I don't want to use reducers due to the network traffic. What I need is
> exactly what a combiner does, but as far as I can tell, combiners are not
> guaranteed to run, or to run only once (correct me if I'm wrong here), so I
> guess I am not supposed to implement the persistence in the combiner.
> Has anybody had a similar problem before? What was your solution?
> Appreciate your help.
> Thanks,
> James

Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about