hadoop-mapreduce-user mailing list archives

From Yaron Gonen <yaron.go...@gmail.com>
Subject Re: Keeping Map-Tasks alive
Date Sun, 05 Aug 2012 18:41:01 GMT
Thanks for the fast reply, but I don't see how a custom record reader will
help.
Consider k-means again: the mappers need to stand by until all the
reducers have finished calculating the new cluster centers. Only then,
after the reducers have finished, can the waiting mappers resume and
perform the next round of work.
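
To make the dependency concrete, here is a minimal single-process sketch in plain Java. It uses no Hadoop APIs, and the class and method names are hypothetical, chosen only for illustration. The "map" step assigns each point to its nearest center, the "reduce" step averages the points in each cluster, and the next "map" step cannot start until those new centers exist. The points array never changes between iterations.

```java
import java.util.Arrays;

/** Hypothetical illustration only: plain Java, not a Hadoop job. */
final class KMeansIterationSketch {

    /** "Map" step: index of the center nearest to the point (squared Euclidean distance). */
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** "Reduce" step: each new center is the mean of its assigned points.
     *  A cluster with no points keeps its old center (an assumption of this sketch). */
    static double[][] recomputeCenters(double[][] points, int[] assignment, double[][] oldCenters) {
        int k = oldCenters.length;
        int dims = oldCenters[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dims; d++) {
                sums[assignment[i]][d] += points[i][d];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                next[c] = oldCenters[c].clone();
                continue;
            }
            next[c] = new double[dims];
            for (int d = 0; d < dims; d++) {
                next[c][d] = sums[c][d] / counts[c];
            }
        }
        return next;
    }

    /** Runs a fixed number of iterations; the points are reused unchanged, only the centers change. */
    static double[][] run(double[][] points, double[][] initialCenters, int iterations) {
        double[][] centers = initialCenters;
        for (int it = 0; it < iterations; it++) {
            int[] assignment = new int[points.length];
            for (int i = 0; i < points.length; i++) {
                assignment[i] = nearestCenter(points[i], centers); // needs the current centers
            }
            centers = recomputeCenters(points, assignment, centers); // must finish before the next round
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centers = run(points, new double[][] {{0, 0}, {10, 10}}, 2);
        System.out.println(Arrays.deepToString(centers));
    }
}
```

In a single process the wait is just the order of statements inside the loop in `run`. Across separate map and reduce tasks, the same ordering is what would force the mappers to stay idle until the reducers are done.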

On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <harsh@cloudera.com> wrote:

> Sure you can, as we provide pluggable code points via the API. Just write
> a custom record reader that doubles the work (first round reads actual
> input, second round reads your known output and reiterates). In the mapper,
> separate the first and second logic via a flag.
>
>
> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.gonen@gmail.com> wrote:
>
>> Hi,
>> Is there a way to keep a map task alive after it has finished its work,
>> so that it can later perform another task on the same input?
>> For example, consider the k-means clustering algorithm (k-means
>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>> implementation<http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>> The only thing that changes between iterations is the cluster centers. All
>> the input points remain the same. Keeping the mapper alive and running the
>> next round of map tasks on the same node would save a lot of communication
>> cost.
>>
>> Thanks,
>> Yaron
>>
>
>
>
> --
> Harsh J
>
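
For reference, the two-round reader idea from the quoted reply might look roughly like the sketch below. This is plain Java written only to illustrate the control flow. It does not use Hadoop's actual `RecordReader` API, all class and method names are hypothetical, and it simply replays the same input in the second round. It shows how one reader can feed a mapper twice with a flag separating the two passes; it does not show any mechanism for waiting on reducer output between the rounds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration only: plain Java, not Hadoop's RecordReader API. */
final class TwoRoundReaderSketch {

    /** A record plus a flag telling the mapper which round it belongs to. */
    static final class TaggedRecord {
        final String value;
        final boolean secondRound;

        TaggedRecord(String value, boolean secondRound) {
            this.value = value;
            this.secondRound = secondRound;
        }
    }

    /** Yields every input record once for round one, then once more for round two. */
    static final class TwoRoundReader implements Iterator<TaggedRecord> {
        private final List<String> input;
        private int position = 0; // runs from 0 to 2 * input.size()

        TwoRoundReader(List<String> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return position < 2 * input.size();
        }

        @Override
        public TaggedRecord next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            boolean second = position >= input.size();
            String value = input.get(position % input.size());
            position++;
            return new TaggedRecord(value, second);
        }
    }

    /** Mapper logic split by the flag; the string prefixes stand in for real work. */
    static String map(TaggedRecord record) {
        if (record.secondRound) {
            return "round2:" + record.value;
        }
        return "round1:" + record.value;
    }

    /** Drives the mapper over both rounds and collects its output in order. */
    static List<String> runMapper(List<String> input) {
        List<String> out = new ArrayList<>();
        Iterator<TaggedRecord> reader = new TwoRoundReader(input);
        while (reader.hasNext()) {
            out.add(map(reader.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("a");
        input.add("b");
        System.out.println(runMapper(input));
    }
}
```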
