hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Keeping Map-Tasks alive
Date Sun, 05 Aug 2012 16:49:08 GMT
Sure you can, since the API provides pluggable code points. Just write a
custom record reader that makes two passes over the split (the first pass
reads the actual input, the second reads your known output and iterates
again). In the mapper, use a flag to separate the first-pass logic from the
second-pass logic.
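
To illustrate the idea, here is a minimal plain-Java sketch of the two-pass pattern. It is a simplified model with no Hadoop dependency: TwoPassReader and Tagged are hypothetical names, not Hadoop classes, and the second pass here simply replays the same records rather than reading any previous output. In a real job, similar logic would presumably live in a custom RecordReader subclass whose nextKeyValue() restarts the underlying reader once the first pass is exhausted.

```java
import java.util.Iterator;

// Hypothetical stand-in for a two-pass record reader; not part of the Hadoop API.
final class TwoPassReader<T> {

    // A record plus the flag the consumer (the "mapper") branches on.
    static final class Tagged<T> {
        final T value;
        final boolean secondPass;

        Tagged(T value, boolean secondPass) {
            this.value = value;
            this.secondPass = secondPass;
        }
    }

    private final Iterable<T> source;   // the input that stays the same across passes
    private Iterator<T> current;        // cursor for the pass in progress
    private boolean secondPass = false; // flips once the first pass is exhausted

    TwoPassReader(Iterable<T> source) {
        this.source = source;
        this.current = source.iterator();
    }

    // Returns the next tagged record, or null once both passes are done.
    Tagged<T> next() {
        if (!current.hasNext() && !secondPass) {
            secondPass = true;
            current = source.iterator(); // reopen the same input for pass two
        }
        if (!current.hasNext()) {
            return null;
        }
        return new Tagged<>(current.next(), secondPass);
    }
}
```

The consumer loops on next() until it returns null and checks secondPass on each record to decide which logic to run, which is the flag-based split described above.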

On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.gonen@gmail.com> wrote:

> Hi,
> Is there a way to keep a map-task alive after it has finished its work, to
> later perform another task on its same input?
> For example, consider the k-means clustering algorithm (k-means
> description <http://en.wikipedia.org/wiki/K-means_clustering> and Hadoop
> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
> The only thing that changes between iterations is the cluster centers. All
> the input points remain the same. Keeping the mapper alive and performing
> the next round of map tasks on the same node would save a lot of
> communication cost.
> Thanks,
> Yaron

Harsh J
