From: Amogh Vasekar <amogh@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org
Date: Tue, 9 Feb 2010 11:10:07 +0530
Subject: Re: avoiding data redistribution in iterative mapreduce

Hi,
AFAIK no. I'm not sure how much of a task it is to write a HOD-like scheduler, or if it's even feasible given the new architecture of a single managing JobTracker (JT) talking directly to the TaskTrackers (TTs). Probably someone more familiar with the scheduler architecture can help you better.
What I was trying to suggest with serialization was to write the initial mapper data to a known location and then, instead of streaming from the split, ignore it and read from there.
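Concretely, something like this is what I mean -- a minimal sketch only, assuming <LongWritable, Text> records and a SequenceFile at a per-task path of your choosing; the class and method names are placeholders, not tested code:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;

public class SideStore {

  /** First iteration: persist this map task's <k,v> pairs to a known path. */
  static void save(FileSystem fs, Configuration conf, Path file,
                   Iterable<Map.Entry<LongWritable, Text>> records)
      throws IOException {
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, LongWritable.class, Text.class);
    try {
      for (Map.Entry<LongWritable, Text> r : records) {
        writer.append(r.getKey(), r.getValue());
      }
    } finally {
      writer.close();
    }
  }

  /** Later iterations: ignore the supplied split and replay the saved pairs. */
  static void replay(FileSystem fs, Configuration conf, Path file,
                     OutputCollector<LongWritable, Text> out)
      throws IOException {
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      LongWritable key = new LongWritable();
      Text value = new Text();
      while (reader.next(key, value)) {
        out.collect(key, value);
      }
    } finally {
      reader.close();
    }
  }
}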
Sorry for the delayed response,

Amogh



On 2/4/10 2:01 PM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:

Hi,

So is it not possible to avoid redistribution in this case? If that is the case, can a custom scheduler be written -- will it be an easy task?

Regards,
Raghava.

On Thu, Feb 4, 2010 at 2:52 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,

>>Will there be a re-assignment of Map & Reduce nodes by the Master?
In general, using the available schedulers, I believe so. Because if it weren't, and I submit job 2 needing a different/additional set of inputs, the data locality considerations would be somewhat hampered, right? When we had HOD, this was certainly possible.

Amogh



On 2/4/10 1:06 AM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:
Hi Amogh,

Thank you for the reply.

>>> What you need, I believe, is “just run on whatever map has”.
You got that right :). An example of a sequential program would be Bubble Sort, which needs several iterations for the end result, and in each iteration it needs to work on the previous output (a partially sorted list) rather than the initial input. In my case also, the same thing should happen.

>>> If you are using an exclusive private cluster, you can probably localize <k,v> from the first iteration and
>>> use dummy input data (to ensure the same number of mapper tasks as the first round, and use custom
>>> classes of MapRunner, RecordReader to not read data from the supplied input)
Yes, it would be a local cluster, the one at my university. If we set the number of map tasks, would it not be followed in each iteration? As mentioned in the documentation, I think I need to use JobClient to control the number of iterations.


>>> But how can you ensure that you get the same nodes always to run your map reduce job on a
>>> shared cluster?

while (!done) { JobClient.runJob(jobConf); <<Do something to check termination condition>> }

If I write something like that in the code, would the Map node not run on the same data chunk it has each time? Will there be a re-assignment of Map & Reduce nodes by the Master?
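To make it concrete, the loop I have in mind would look roughly like this against the old mapred API (the Progress counter, paths, and mapper/reducer classes are just placeholders for whatever the real job uses):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class IterativeDriver {
  // Placeholder counter that the reduce phase bumps while the data is still changing.
  public enum Progress { CHANGED }

  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    boolean done = false;
    for (int i = 0; !done; i++) {
      JobConf jobConf = new JobConf(IterativeDriver.class);
      jobConf.setJobName("iteration-" + i);
      // jobConf.setMapperClass(...); jobConf.setReducerClass(...);
      Path output = new Path(args[1] + "/iter-" + i);
      FileInputFormat.setInputPaths(jobConf, input);
      FileOutputFormat.setOutputPath(jobConf, output);

      RunningJob job = JobClient.runJob(jobConf); // blocks until this pass finishes
      done = (job.getCounters().getCounter(Progress.CHANGED) == 0);
      input = output; // the next pass consumes this pass's output
    }
  }
}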


Regards,
Raghava.

On Wed, Feb 3, 2010 at 9:59 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,
If each of your sequential iterations is map+reduce, then no.
The lifetime of a split is confined to a single map reduce job. The split is actually a reference to the data, which is used to schedule the job as close as possible to the data. The record reader then uses the same object to pass the <k,v> pairs in the split to the mapper.
What you need, I believe, is “just run on whatever map has”. If you are using an exclusive private cluster, you can probably localize <k,v> from the first iteration and use dummy input data (to ensure the same number of mapper tasks as the first round, and use custom classes of MapRunner, RecordReader to not read data from the supplied input). But how can you ensure that you always get the same nodes to run your map reduce job on a shared cluster?
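As a rough sketch of the dummy-input idea (old mapred API; the class names are illustrative, and you would still need a MapRunner or mapper that pulls the real records from the localized location):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class DummyInputFormat implements InputFormat<NullWritable, NullWritable> {

  /** A split that carries no data; only its index matters. */
  public static class DummySplit implements InputSplit {
    private int id;
    public DummySplit() {}
    public DummySplit(int id) { this.id = id; }
    public long getLength() { return 0; }
    public String[] getLocations() { return new String[0]; } // no locality hint
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  }

  // One synthetic split per requested map task, so the framework launches
  // the same number of mappers in every round regardless of the real input.
  public InputSplit[] getSplits(JobConf job, int numSplits) {
    InputSplit[] splits = new InputSplit[numSplits];
    for (int i = 0; i < numSplits; i++) splits[i] = new DummySplit(i);
    return splits;
  }

  public RecordReader<NullWritable, NullWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    // Emit exactly one empty record so each map task runs once; the map
    // side is expected to fetch its real input from the side location.
    return new RecordReader<NullWritable, NullWritable>() {
      private boolean done = false;
      public boolean next(NullWritable key, NullWritable value) {
        if (done) return false;
        done = true;
        return true;
      }
      public NullWritable createKey() { return NullWritable.get(); }
      public NullWritable createValue() { return NullWritable.get(); }
      public long getPos() { return 0; }
      public void close() {}
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}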
Please correct me if I misunderstood your question.

Amogh



On 2/3/10 11:34 AM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:

Hi all,

I need to run a map reduce task repeatedly in order to achieve the desired result. Is it possible that at the beginning of each iteration the data set is not distributed (divided into chunks and distributed) again and again, i.e. once the distribution occurs the first time, map nodes work on the same chunk in every iteration? Can this be done? I have only brief experience with MapReduce, and I think that the input data set is redistributed every time.

Thank you.

Regards,
Raghava.




