From: Amogh Vasekar <amogh@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org
Date: Tue, 9 Feb 2010 11:10:07 +0530
Subject: Re: avoiding data redistribution in iterative mapreduce

Hi,
AFAIK no. I'm not sure how much of a task it is to write a HOD-like scheduler, or if it's even feasible given the new architecture of a single managing JobTracker (JT) talking directly to the TaskTrackers (TTs). Probably someone more familiar with the scheduler architecture can help you better.
What I was trying to suggest with serialization was to write the initial mapper data to a known location and then, instead of streaming from the split, ignore it and read from there.
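Concretely, something like this is what I mean -- a minimal sketch only, assuming <LongWritable, Text> records and a SequenceFile at a per-task path of your choosing; the class and method names are placeholders, not tested code:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;

public class SideStore {

  /** First iteration: persist this map task's <k,v> pairs to a known path. */
  static void save(FileSystem fs, Configuration conf, Path file,
                   Iterable<Map.Entry<LongWritable, Text>> records)
      throws IOException {
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, LongWritable.class, Text.class);
    try {
      for (Map.Entry<LongWritable, Text> r : records) {
        writer.append(r.getKey(), r.getValue());
      }
    } finally {
      writer.close();
    }
  }

  /** Later iterations: ignore the supplied split and replay the saved pairs. */
  static void replay(FileSystem fs, Configuration conf, Path file,
                     OutputCollector<LongWritable, Text> out)
      throws IOException {
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      LongWritable key = new LongWritable();
      Text value = new Text();
      while (reader.next(key, value)) {
        out.collect(key, value);
      }
    } finally {
      reader.close();
    }
  }
}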
Sorry for the delayed response,

Amogh



On 2/4/10 2:01 PM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:

Hi,

So is it not possible to avoid redistribution in this case? If that is the case, can a custom scheduler be written -- will it be an easy task?

Regards,
Raghava.

On Thu, Feb 4, 2010 at 2:52 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,

>>Will there be a re-assignment of Map & Reduce nodes by the Master?
In general, using the available schedulers, I believe so. Because if it weren't, and I submit job 2 needing a different/additional set of inputs, the data locality considerations would be somewhat hampered, right? When we had HOD, this was certainly possible.

Amogh



On 2/4/10 1:06 AM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:
Hi Amogh,

Thank you for the reply.

>>> What you need, I believe, is “just run on whatever map has”.
You got that right :). An example of a sequential program would be Bubble Sort, which needs several iterations for the end result, and in each iteration it needs to work on the previous output (a partially sorted list) rather than the initial input. In my case also, the same thing should happen.

>>> If you are using an exclusive private cluster, you can probably localize <k,v> from the first iteration and
>>> use dummy input data (to ensure the same number of mapper tasks as the first round, and use custom
>>> classes of MapRunner, RecordReader to not read data from the supplied input)
Yes, it would be a local cluster, the one at my university. If we set the number of map tasks, would it not be followed in each iteration? As mentioned in the documentation, I think I need to use JobClient to control the number of iterations.


>>> But how can you ensure that you get the same nodes always to run your map reduce job on a
>>> shared cluster?

while (!done) { JobClient.runJob(jobConf); <<Do something to check termination condition>> }

If I write something like that in the code, would the Map node not run on the same data chunk it has each time? Will there be a re-assignment of Map & Reduce nodes by the Master?
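To make it concrete, the loop I have in mind would look roughly like this against the old mapred API (the Progress counter, paths, and mapper/reducer classes are just placeholders for whatever the real job uses):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class IterativeDriver {
  // Placeholder counter that the reduce phase bumps while the data is still changing.
  public enum Progress { CHANGED }

  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    boolean done = false;
    for (int i = 0; !done; i++) {
      JobConf jobConf = new JobConf(IterativeDriver.class);
      jobConf.setJobName("iteration-" + i);
      // jobConf.setMapperClass(...); jobConf.setReducerClass(...);
      Path output = new Path(args[1] + "/iter-" + i);
      FileInputFormat.setInputPaths(jobConf, input);
      FileOutputFormat.setOutputPath(jobConf, output);

      RunningJob job = JobClient.runJob(jobConf); // blocks until this pass finishes
      done = (job.getCounters().getCounter(Progress.CHANGED) == 0);
      input = output; // the next pass consumes this pass's output
    }
  }
}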


Regards,
Raghava.

On Wed, Feb 3, 2010 at 9:59 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,
If each of your sequential iterations is map+reduce, then no.
The lifetime of a split is confined to a single map reduce job. The split is actually a reference to the data, which is used to schedule the job as close as possible to the data. The record reader then uses the same object to pass the <k,v> pairs in the split to the mapper.
What you need, I believe, is “just run on whatever map has”. If you are using an exclusive private cluster, you can probably localize <k,v> from the first iteration and use dummy input data (to ensure the same number of mapper tasks as the first round, and use custom classes of MapRunner, RecordReader to not read data from the supplied input). But how can you ensure that you always get the same nodes to run your map reduce job on a shared cluster?
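As a rough sketch of the dummy-input idea (old mapred API; the class names are illustrative, and you would still need a MapRunner or mapper that pulls the real records from the localized location):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class DummyInputFormat implements InputFormat<NullWritable, NullWritable> {

  /** A split that carries no data; only its index matters. */
  public static class DummySplit implements InputSplit {
    private int id;
    public DummySplit() {}
    public DummySplit(int id) { this.id = id; }
    public long getLength() { return 0; }
    public String[] getLocations() { return new String[0]; } // no locality hint
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  }

  // One synthetic split per requested map task, so the framework launches
  // the same number of mappers in every round regardless of the real input.
  public InputSplit[] getSplits(JobConf job, int numSplits) {
    InputSplit[] splits = new InputSplit[numSplits];
    for (int i = 0; i < numSplits; i++) splits[i] = new DummySplit(i);
    return splits;
  }

  public RecordReader<NullWritable, NullWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    // Emit exactly one empty record so each map task runs once; the map
    // side is expected to fetch its real input from the side location.
    return new RecordReader<NullWritable, NullWritable>() {
      private boolean done = false;
      public boolean next(NullWritable key, NullWritable value) {
        if (done) return false;
        done = true;
        return true;
      }
      public NullWritable createKey() { return NullWritable.get(); }
      public NullWritable createValue() { return NullWritable.get(); }
      public long getPos() { return 0; }
      public void close() {}
      public float getProgress() { return done ? 1.0f : 0.0f; }
    };
  }
}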
Please correct me if I misunderstood your question.

Amogh



On 2/3/10 11:34 AM, "Raghava Mutharaju" <m.vijayaraghava@gmail.com> wrote:

Hi all,

I need to run a map reduce task repeatedly in order to achieve the desired result. Is it possible that at the beginning of each iteration the data set is not distributed (divided into chunks and distributed) again and again, i.e. once the distribution occurs the first time, map nodes work on the same chunk in every iteration? Can this be done? I have only brief experience with MapReduce, and I think that the input data set is redistributed every time.

Thank you.

Regards,
Raghava.




