Subject: Re: Implementing and running an applicationmaster
From: Rob Blah
To: user@hadoop.apache.org
Date: Thu, 5 Dec 2013 13:31:45 +0100

Hi

There is a way, but it's not an easy one. You should rewrite the container request code in MR_AM. As each container in MapReduce gets the same amount of memory, OOM shouldn't be a problem, as inner task "buffers" can be spilled to disk. I am no MapReduce (code) specialist, but I would start by finding MR_Driver.class and MR_AM.class. Then modify the Driver.class to execute your class Custom_MR_AM (C_MR_AM). C_MR_AM will be a copy of MR_AM, but you should change the container request code so that you can allocate N containers with X memory and M containers with Y memory.
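
To make the "N containers with X memory and M containers with Y memory" idea concrete, here is a minimal sketch in plain Java. It is not MR_AM code and uses no Hadoop classes: ContainerSpec and ContainerPlan are hypothetical names made up for illustration, and the numbers are the 100MB/200MB ones from the question. In a real AM each spec would have to be turned into a request to the ResourceManager (in Hadoop 2.x I believe this goes through AMRMClient.ContainerRequest and addContainerRequest, but please check the API of your version).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one container request; NOT a Hadoop class.
final class ContainerSpec {
    final int memoryMb;

    ContainerSpec(int memoryMb) {
        if (memoryMb <= 0) {
            throw new IllegalArgumentException("memoryMb must be positive");
        }
        this.memoryMb = memoryMb;
    }
}

// Hypothetical helper that builds a mixed-size list of container requests.
final class ContainerPlan {

    // Returns n specs of xMb followed by m specs of yMb.
    static List<ContainerSpec> plan(int n, int xMb, int m, int yMb) {
        if (n < 0 || m < 0) {
            throw new IllegalArgumentException("container counts must not be negative");
        }
        List<ContainerSpec> specs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            specs.add(new ContainerSpec(xMb));
        }
        for (int i = 0; i < m; i++) {
            specs.add(new ContainerSpec(yMb));
        }
        return specs;
    }

    // Sums the memory of all planned containers.
    static int totalMemoryMb(List<ContainerSpec> specs) {
        int total = 0;
        for (ContainerSpec s : specs) {
            total += s.memoryMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two containers with 100 MB and two with 200 MB, as in the question.
        List<ContainerSpec> specs = plan(2, 100, 2, 200);
        for (ContainerSpec s : specs) {
            // In a real AM, this is where the request would be handed to the RM client.
            System.out.println("request container: " + s.memoryMb + " MB");
        }
        System.out.println("total: " + totalMemoryMb(specs) + " MB");
    }
}
```

Keep in mind that the scheduler may round each request up to a multiple of its configured minimum allocation (yarn.scheduler.minimum-allocation-mb), so 100MB and 200MB requests only come out differently if the cluster's minimum is set low enough.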

The hadoop-mapreduce-examples.jar is just a bunch of HelloWorld jobs, so that a new user can pick up and "learn" MR quickly.
Maybe some real MR specialist can give you better advice than me.

regards
tmp

2013/12/5 Yue Wang <terranwy@gmail.com>
Hi,

Thank you for your answer. Now I understand the connection between the two ways.

I asked this question because I want to benefit from the YARN architecture.
If I understood correctly, I can let my ApplicationMaster request containers more flexibly. For example, I can request two containers with 100MB memory and two containers with 200MB memory for my mappers on YARN. However, I cannot do that on MRv1.

So if I execute a WordCount program by typing "yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount wordcount/ wc-output/", such flexibility is gone.

Is there a way to let my ApplicationMaster execute WordCount on HDFS on containers?


Thanks!


On Thu, Dec 5, 2013 at 4:28 AM, Rob Blah <tmp5330@gmail.com> wrote:
Hi

If I understood you correctly, you would like to run your AM with the YARN Client from the shell, as opposed to running the Driver as in MRv1. But it's the same thing (more or less). In the example you provided (org.apache.hadoop.yarn.applications.DistributedShell) the Client.class is the "driver". However, since distributed-shell is a "simple" application, you do not need a lot of configuration (setting fields in Configuration.class, I/O formats etc.). The same goes for any other application. As for the second example (org.apache.hadoop.examples.WordCount), the MapReduce AM requires certain configuration, thus you have to do it the "old way". The main difference would be: MR -> end-user-config -> driver, DS -> driver (but you can still create your own end-user-config). Hope this answers your question and that I understood it correctly.

regards
tmp

2013/12/5 Yue Wang <terranwy@gmail.com>
Hi,

I took a look at the code and found some examples on the web.
One example is: http://wiki.opf-labs.org/display/SP/Resource+management

It seems that users can run simple shell commands using the YARN Client.
But when it comes to a practical MapReduce example like WordCount, people still run commands the old way, as in MRv1.

How can I run WordCount using the Client and ApplicationMaster of YARN so that I can request resources flexibly?

Thanks!


On Mon, Dec 2, 2013 at 11:26 AM, Rob Blah <tmp5330@gmail.com> wrote:
Hi

Follow the example provided in Yarn_dist/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

regards
tmp


2013/12/1 Yue Wang <terranwy@gmail.com>
Hi,

I found the page (http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) and know how to write an ApplicationMaster.

However, is there a complete example showing how to run this ApplicationMaster with a real Hadoop program (e.g. WordCount) on YARN?

Thanks!



Yue




