From: Yaron Gonen <yaron.gonen@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Mon, 6 Aug 2012 10:23:19 +0300
Subject: Re: Keeping Map-Tasks alive

Thanks.
As I see it, it cannot be done in the MapReduce 1 framework without changing the TaskTracker and JobTracker.
The problem is that I'm not familiar with YARN at all... it might be possible there.
Thanks again!

On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <harsh@cloudera.com> wrote:
Ah, my bad, I skipped over the k-means part of your original post.

There currently isn't a way to do this with the existing MR framework and APIs. A reducer is initiated upon map completion, and the task JVM is discarded once the maps end. Perhaps you could use YARN to write something that does what you want?


On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <yaron.gonen@gmail.com> wrote:
Thanks for the fast reply, but I don't see how a custom record reader will help.
Consider k-means again: the mappers need to stand by until all the reducers finish calculating the new cluster centers. Only then, after the reducers finish their work, do the standing-by mappers come back to life and perform their work.
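To make the map/reduce dependency described above concrete, here is a minimal in-memory Java sketch. It is an illustration under stated assumptions, not Hadoop code and not the implementation linked in the original post: the class `KMeansIterationSketch` and its methods are hypothetical, and each call to `runIteration` merely stands in for one MapReduce job. The map phase assigns every point to its nearest center, the shuffle groups points by center id (which needs every map's output, hence the barrier), and the reduce phase averages each group into a new center. The driver `run` re-reads all the points on every iteration even though only the centers changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch: each runIteration() call stands in for one MapReduce job.
// It does not use the Hadoop API.
final class KMeansIterationSketch {

    // Map-side helper: index of the nearest center by squared Euclidean distance.
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = 0.0;
            for (int k = 0; k < point.length; k++) {
                double diff = point[k] - centers[c][k];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Reduce-side helper: component-wise mean of a non-empty group of points.
    static double[] mean(List<double[]> group) {
        double[] m = new double[group.get(0).length];
        for (double[] p : group) {
            for (int k = 0; k < m.length; k++) m[k] += p[k];
        }
        for (int k = 0; k < m.length; k++) m[k] /= group.size();
        return m;
    }

    // One "job": map every point, group by center id, then reduce each group to a new center.
    static double[][] runIteration(List<double[]> points, double[][] centers) {
        // Map phase: emit (centerId, point) for every input point.
        List<Map.Entry<Integer, double[]>> emitted = new ArrayList<>();
        for (double[] p : points) emitted.add(Map.entry(nearestCenter(p, centers), p));

        // Shuffle: group by centerId. This needs the output of every map, hence the barrier.
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (Map.Entry<Integer, double[]> e : emitted) {
            groups.computeIfAbsent(e.getKey(), id -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: each new center is the mean of its group; an empty cluster keeps its old center.
        double[][] next = new double[centers.length][];
        for (int c = 0; c < centers.length; c++) {
            List<double[]> group = groups.get(c);
            next[c] = (group == null) ? centers[c].clone() : mean(group);
        }
        return next;
    }

    // Driver: submit one "job" per iteration until the centers stop changing or maxIters is reached.
    static double[][] run(List<double[]> points, double[][] initial, int maxIters) {
        double[][] centers = initial;
        for (int i = 0; i < maxIters; i++) {
            // Every iteration re-reads all the points, although only the centers changed.
            double[][] next = runIteration(points, centers);
            if (Arrays.deepEquals(next, centers)) return next;
            centers = next;
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[] {1, 1}, new double[] {1, 2}, new double[] {9, 9}, new double[] {9, 10});
        double[][] centers = run(points, new double[][] {{0, 0}, {10, 10}}, 20);
        System.out.println(Arrays.deepToString(centers)); // [[1.0, 1.5], [9.0, 9.5]]
    }
}
```

The `run` loop is where a real deployment would submit a new job per iteration; keeping mappers alive, as asked in this thread, would amount to hoisting the point loading out of that loop.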


On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <harsh@cloudera.com> wrote:
Sure you can, as we provide pluggable code points via the API. Just write a custom record reader that doubles the work (the first round reads the actual input; the second round reads your known output and reiterates). In the mapper, separate the first-round and second-round logic with a flag.
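One possible reading of the record-reader suggestion, sketched in plain Java. The class `TwoPassReaderSketch` and its methods are hypothetical; the loop is loosely modeled on the `nextKeyValue()` style of Hadoop's `RecordReader`, but the sketch neither extends nor calls the real class. The reader serves the task's input records in a first round and a second list, standing in for the "known output", in a second round; the mapper tells the rounds apart with a pass flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "doubling" record reader idea. It is loosely modeled on the
// nextKeyValue() loop of Hadoop's RecordReader, but it is plain Java, not the Hadoop class.
final class TwoPassReaderSketch {
    private final List<String> input;       // pass 1: the task's actual input records
    private final List<String> knownOutput; // pass 2: data already known when the task starts
    private int pass = 1;                   // the flag the mapper branches on
    private int pos = -1;                   // index of the current record within the current pass

    TwoPassReaderSketch(List<String> input, List<String> knownOutput) {
        this.input = input;
        this.knownOutput = knownOutput;
    }

    private List<String> current() {
        return pass == 1 ? input : knownOutput;
    }

    // Advances to the next record; after the last record of pass 1 it switches to pass 2.
    boolean nextKeyValue() {
        pos++;
        if (pos < current().size()) return true;
        if (pass == 1) {
            pass = 2;
            pos = 0;
            return pos < current().size();
        }
        return false;
    }

    int currentPass() { return pass; }

    String currentValue() { return current().get(pos); }

    // Hypothetical mapper: the pass flag separates first-round and second-round logic.
    static List<String> runMapper(TwoPassReaderSketch reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            if (reader.currentPass() == 1) {
                out.add("first:" + reader.currentValue());
            } else {
                out.add("second:" + reader.currentValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TwoPassReaderSketch reader = new TwoPassReaderSketch(List.of("p1", "p2"), List.of("c1"));
        System.out.println(runMapper(reader)); // [first:p1, first:p2, second:c1]
    }
}
```

Because the second list must be supplied when the reader is constructed, whatever the second round consumes has to exist before the map task starts; it cannot be output that the same job's reducers have yet to produce.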


On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.gonen@gmail.com> wrote:
Hi,
Is there a way to keep a map task alive after it has finished its work, so that it can later perform another task on the same input?
For example, consider the k-means clustering algorithm (k-means description and Hadoop implementation). The only thing that changes between iterations is the cluster centers; all the input points remain the same. Keeping the mapper alive and performing the next round of map tasks on the same node would save a lot of communication cost.

Thanks,
Yaron
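For reference, one standard iteration of k-means (Lloyd's algorithm) over points x_1, ..., x_n with k centers is an assignment step followed by an update step, with ties in the assignment broken arbitrarily and each cluster S_j assumed non-empty. The points x_i are identical in every iteration; only the centers mu_j^(t) change, which is the observation the question above builds on. In the MapReduce formulation discussed in this thread, the mappers perform the assignment and the reducers compute the new centers.

```latex
\begin{aligned}
S_j^{(t)} &= \left\{ x_i : \lVert x_i - \mu_j^{(t)} \rVert^2 \le \lVert x_i - \mu_l^{(t)} \rVert^2 \text{ for all } l = 1, \dots, k \right\} && \text{(assignment)} \\
\mu_j^{(t+1)} &= \frac{1}{\lvert S_j^{(t)} \rvert} \sum_{x_i \in S_j^{(t)}} x_i && \text{(update)}
\end{aligned}
```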



--
Harsh J




--
Harsh J
