hadoop-mapreduce-user mailing list archives

From Jianhui Zhang <wuhua...@yahoo.com>
Subject Any way to do "local" reduce like the combiner does?
Date Sun, 29 Jan 2012 08:06:40 GMT

I have a problem at hand that seems to need "local" reducing: 

I have a large input in which each line is a mapping of the form "name : attribute".
The attributes for a given name are usually close together in the file, so they are very likely
to be processed by the same mapper. I need to persist each "name : attributes" record somewhere else
(think DB). Ideally, I would combine the attributes of the same name and
persist them only once. Attributes for the same name that end up in different mappers can safely be
persisted separately.

I don't want to use reducers because of the network traffic. What I need is exactly what a combiner
does, but as far as I can tell, a combiner is not guaranteed to run at all, and it may run more than
once (correct me if I'm wrong here), so I guess I shouldn't implement the persistence in the combiner.
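
Roughly, the behavior I'm after looks like the sketch below. It is plain Java with no Hadoop classes, just to make the requirement concrete. The class name LocalAttributeCombiner, the methods accept and flush, and the callback used for persistence are all made up for illustration; in a real job the buffering would presumably live inside the mapper, with the flush happening once when the mapper finishes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Hadoop: buffers attributes per name in
// memory and hands each name to a persistence callback once per flush().
class LocalAttributeCombiner {
    // LinkedHashMap keeps names in first-seen order, which makes output predictable.
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();
    private final BiConsumer<String, List<String>> persister;

    LocalAttributeCombiner(BiConsumer<String, List<String>> persister) {
        this.persister = persister;
    }

    // Parse one "name : attribute" line and buffer the attribute.
    void accept(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // assumed policy: silently skip lines without a separator
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);
    }

    // Persist every buffered name once, then empty the buffer.
    void flush() {
        buffer.forEach(persister);
        buffer.clear();
    }

    public static void main(String[] args) {
        // Stand-in for the DB write: just print each combined record.
        LocalAttributeCombiner combiner = new LocalAttributeCombiner(
                (name, attrs) -> System.out.println(name + " -> " + attrs));
        combiner.accept("alice : red");
        combiner.accept("alice : tall");
        combiner.accept("bob : blue");
        combiner.flush();
    }
}
```

The obvious catch is memory: the buffer grows with the number of distinct names a mapper sees. Since attributes for the same name are usually adjacent in my input, flushing early whenever the buffer gets large would probably be acceptable, at the cost of occasionally persisting the same name more than once.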

Has anybody run into a similar problem before? What was your solution?

Appreciate your help. 

