Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of tmp5330@gmail.com designates
 209.85.217.171 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAC4A+kcaMgXZkqT27XkfMXC9+bWbsB8tpRuBO+r9pWfrNDVoqg@mail.gmail.com>
References: 
 <CAC4A+kfsHOn0h-=OLsv26qYJF7hahLGfhYn93xiNcxqKRoLB=Q@mail.gmail.com>
	<CAFwH+a-c-DP-weSvLRrp-0zuUoGc5rwe7reytcNezpEcci550A@mail.gmail.com>
	<CAC4A+kfuCtg2cwv=98fSm3dMOoBTAHQ_T1L5xjSQVUsZ3ooO8A@mail.gmail.com>
	<CAFwH+a-zDpUGQXodixBCWfjo7FZWzAShrg3LyDqeNzwJ9hd3tw@mail.gmail.com>
	<CAC4A+kcaMgXZkqT27XkfMXC9+bWbsB8tpRuBO+r9pWfrNDVoqg@mail.gmail.com>
Date: Thu, 5 Dec 2013 13:31:45 +0100
Message-ID: 
 <CAFwH+a_M_Jgp57Y+jk-oVj7V0UH8pq1RdoY0S9eue=H_jS+U4g@mail.gmail.com>
Subject: Re: Implementing and running an applicationmaster
From: Rob Blah <tmp5330@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e012277166d633e04ecc8b666

--089e012277166d633e04ecc8b666
Content-Type: text/plain; charset=ISO-8859-1

Hi

There is a way, but it's not an easy one: you would have to change the
container request code in the MapReduce ApplicationMaster (MR_AM). Each
container in MapReduce gets the same amount of memory, and OOM shouldn't be
a problem because the inner task "buffers" can be spilled to disk. I am no
MapReduce (code) specialist, but I would start by finding MR_Driver.class
and MR_AM.class. Then modify the Driver.class so that it launches your own
class, Custom_MR_AM (C_MR_AM). C_MR_AM would be a copy of MR_AM with the
container request code changed so that you can allocate N containers with
X memory and M containers with Y memory.
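
To make the "N containers with X memory and M containers with Y memory"
idea concrete, here is a minimal sketch in plain Java. It only builds the
list of requests that a C_MR_AM would then have to send to the
ResourceManager (in a real AM, probably as AMRMClient.ContainerRequest
objects). The class MixedContainerPlan, its Request type, and the
one-priority-per-size rule are my own assumptions for illustration; none of
this is taken from the MR_AM code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): plans a mixed set of container requests. */
public final class MixedContainerPlan {

    /** One planned request: memory in MB plus a priority label (hypothetical type). */
    public static final class Request {
        public final int memoryMb;
        public final int priority;

        Request(int memoryMb, int priority) {
            this.memoryMb = memoryMb;
            this.priority = priority;
        }

        @Override
        public String toString() {
            return memoryMb + "MB@p" + priority;
        }
    }

    private MixedContainerPlan() {}

    /**
     * Builds n requests of xMb followed by m requests of yMb.
     * Assumption: each size gets its own priority value so the two groups
     * stay distinguishable when the allocations come back.
     */
    public static List<Request> plan(int n, int xMb, int m, int yMb) {
        if (n < 0 || m < 0 || xMb <= 0 || yMb <= 0) {
            throw new IllegalArgumentException("counts must be >= 0 and sizes > 0");
        }
        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            requests.add(new Request(xMb, 0));
        }
        for (int i = 0; i < m; i++) {
            requests.add(new Request(yMb, 1));
        }
        return Collections.unmodifiableList(requests);
    }

    /** Total memory the plan asks for, in MB. */
    public static long totalMemoryMb(List<Request> requests) {
        long total = 0;
        for (Request r : requests) {
            total += r.memoryMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two 100MB and two 200MB containers, the sizes from your example.
        List<Request> requests = plan(2, 100, 2, 200);
        System.out.println(requests + " total=" + totalMemoryMb(requests) + "MB");
    }
}
```

The hard part, which this sketch leaves out, is the rest of C_MR_AM:
submitting the requests and deciding which map or reduce task runs in which
size of container.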

The hadoop-mapreduce-examples.jar is just a bunch of HelloWorld-style jobs,
there so that a new user can pick them up and "learn" MR quickly.

Maybe some real MR specialist can give you better advice than me.

regards
tmp


2013/12/5 Yue Wang <terranwy@gmail.com>

> Hi,
>
> Thank you for your answer. Now I understand the connection between the two
> ways.
>
> I asked this question because I want to benefit from the YARN
> architecture.
> If I understood correctly, I can let my ApplicationMaster request
> containers more flexibly. For example, I can request two containers with
> 100MB memory and two containers with 200MB memory for my mappers on YARN.
> However, I cannot do that on MRv1.
>
> So if I execute a WordCount program by typing "yarn jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount
> wordcount/ wc-output/", such flexibility is gone.
>
> Is there a way to let my ApplicationMaster execute WordCount on HDFS on
> containers?
>
>
> Thanks!
>
>
> On Thu, Dec 5, 2013 at 4:28 AM, Rob Blah <tmp5330@gmail.com> wrote:
>
>> Hi
>>
>> If I understood you correctly, you would like to run your AM with the YARN
>> Client from the shell, as opposed to running the Driver as in MRv1. But
>> it's the same thing (more or less). In the example you provided
>> (org.apache.hadoop.yarn.applications.DistributedShell) the Client.class is
>> the "driver". However, since distributed-shell is a "simple" application,
>> you do not need a lot of configuration (setting fields in
>> Configuration.class, I/O formats etc.). The same goes for any other
>> application. As for the second example
>> (org.apache.hadoop.examples.WordCount), the MapReduce AM requires certain
>> configuration, thus you have to do it the "old way". The main difference
>> would be: MR -> end-user-config -> driver, DS -> driver (but you can
>> still create your own end-user-config). Hope this answers your question
>> and that I understood it correctly.
>>
>> regards
>> tmp
>>
>>
>> 2013/12/5 Yue Wang <terranwy@gmail.com>
>>
>>> Hi,
>>>
>>> I took a look at the code and found some examples on the web.
>>> One example is: http://wiki.opf-labs.org/display/SP/Resource+management
>>>
>>> It seems that users can run simple shell commands using Client of YARN.
>>> But when it comes to a practical MapReduce example like WordCount,
>>> people still run commands in the old way as in MRv1.
>>>
>>> How can I run WordCount using Client and ApplicationMaster of YARN so
>>> that I can request resources flexibly?
>>>
>>>
>>> Thanks!
>>>
>>>
>>> On Mon, Dec 2, 2013 at 11:26 AM, Rob Blah <tmp5330@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Follow the example provided in
>>>> Yarn_dist/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.
>>>>
>>>> regards
>>>> tmp
>>>>
>>>>
>>>> 2013/12/1 Yue Wang <terranwy@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> I found the page (
>>>>> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html)
>>>>> and know how to write an ApplicationMaster.
>>>>>
>>>>> However, is there a complete example showing how to run this
>>>>> ApplicationMaster with a real Hadoop Program (e.g. WordCount) on YARN?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> Yue
>>>>>
>>>>
>>>>
>>>
>>
>

--089e012277166d633e04ecc8b666--