hadoop-mapreduce-user mailing list archives

From Yaron Gonen <yaron.go...@gmail.com>
Subject Re: Keeping Map-Tasks alive
Date Mon, 06 Aug 2012 07:23:19 GMT
As I see it, this cannot be done in the MapReduce 1 framework without
changing the TaskTracker and JobTracker.
The problem is that I'm not at all familiar with YARN; it might be possible there.
Thanks again!

On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <harsh@cloudera.com> wrote:

> Ah, my bad - I skipped over the K-Means part of your original post.
> There currently isn't a way to do this with the existing MR framework and
> APIs. A Reducer is initiated upon map completion, and the Task JVM is torn
> down after the Maps end. Perhaps you can use YARN to write something along
> the lines of what you desire?
> On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <yaron.gonen@gmail.com> wrote:
>> Thanks for the fast reply, but I don't see how a custom record reader
>> will help.
>> Consider k-means again: the mappers need to stand by until all the
>> reducers finish calculating the new cluster centers. Only then, after the
>> reducers finish their work, do the standby mappers come back to life and
>> perform their work.
>> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <harsh@cloudera.com> wrote:
>>> Sure you can, as we provide pluggable code points via the API. Just
>>> write a custom record reader that doubles the work (the first round reads
>>> the actual input, the second round reads your known output and reiterates).
>>> In the mapper, separate the first-round and second-round logic via a flag.
>>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.gonen@gmail.com> wrote:
>>>> Hi,
>>>> Is there a way to keep a map-task alive after it has finished its work,
>>>> so that it can later perform another task on the same input?
>>>> For example, consider the k-means clustering algorithm (k-means
>>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>>>> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>>>> The only thing that changes between iterations is the cluster centers. All the
>>>> input points remain the same. Keeping the mapper alive and performing the
>>>> next round of map-tasks on the same node would save a lot of communication
>>>> cost.
>>>> Thanks,
>>>> Yaron
>>> --
>>> Harsh J
> --
> Harsh J
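
For readers following the k-means discussion in this thread, here is a minimal plain-Java sketch of the iteration structure Yaron describes. It is not Hadoop code and does not keep map tasks alive; it only illustrates that the input points stay fixed across iterations while the cluster centers are the one thing that changes. The class and method names (KMeansSketch, nearestCenter, iterate, run) and the sample points are assumptions made up for this illustration.

```java
import java.util.Arrays;

// Plain-Java simulation of iterative k-means; no Hadoop classes are used.
// All names here are hypothetical and exist only for illustration.
class KMeansSketch {

    // "Map" step: index of the center nearest to the point
    // (squared Euclidean distance).
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // One iteration: assign every point to a center (map), then
    // average the points of each cluster (reduce).
    static double[][] iterate(double[][] points, double[][] centers) {
        int k = centers.length;
        int dims = centers[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearestCenter(p, centers);
            counts[c]++;
            for (int d = 0; d < dims; d++) {
                sums[c][d] += p[d];
            }
        }
        double[][] next = new double[k][dims];
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dims; d++) {
                // An empty cluster keeps its previous center.
                next[c][d] = counts[c] == 0 ? centers[c][d] : sums[c][d] / counts[c];
            }
        }
        return next;
    }

    // Repeats iterations over the same points until the centers stop
    // changing or maxIterations is reached.
    static double[][] run(double[][] points, double[][] initial, int maxIterations) {
        double[][] centers = initial;
        for (int i = 0; i < maxIterations; i++) {
            double[][] next = iterate(points, centers);
            if (Arrays.deepEquals(next, centers)) {
                break;
            }
            centers = next;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.5, 2.0}, {8.0, 8.0}, {9.0, 9.5}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(Arrays.deepToString(run(points, centers, 10)));
    }
}
```

In this sketch the points array is read once and reused by every call to iterate, which is the locality Yaron wants to preserve. In a MapReduce 1 job, by contrast, each iteration is typically a separate job that re-reads the input.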
