From: Arun C Murthy <acm@hortonworks.com>
Subject: Re: How Yarn execute MRv1 job?
Date: Wed, 19 Jun 2013 23:59:31 -0700
To: user@hadoop.apache.org

I'd use Hive 0.11.

On Jun 19, 2013, at 11:56 PM, sam liu wrote:

> Hi Azuryy,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0, do not support Hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu
> Hi Sam,
> Please look at: http://hbase.apache.org/book.html#d2617e499
>
> Generally, when we say YARN we mean Hadoop 2.x; you can download hadoop-2.0.4-alpha, and Hive 0.10 supports Hadoop 2.x very well.
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu wrote:
> Thanks Arun!
>
> #1, Yes, I did tests and found that MRv1 jobs can run against YARN directly, without recompiling.
>
> #2, Do you mean the old versions of HBase/Hive cannot run against YARN, and only certain versions of them can? If yes, how can I find the versions that work with YARN?
>
>
> 2013/6/20 Arun C Murthy
>
> On Jun 19, 2013, at 6:45 PM, sam liu wrote:
>
>> Thanks for the detailed answers! Here are three further questions:
>>
>> - YARN maintains backwards compatibility, and an MRv1 job can run on YARN. If YARN does not require any code change to existing MRv1 jobs, why would we need to recompile them?
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
>> - Which YARN jar files are required for the recompiling?
>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related components (HBase 0.94.3, Hive 0.9.0, ZooKeeper 3.4.5, ...), if we want to replace Hadoop 1.1.1 with YARN, do we need to recompile all the other Hadoop-related components against the YARN jar files? Without any code change?
>
> You will need versions of HBase, Hive etc. which are integrated with hadoop-2.x, but you do not need to change any of your end-user applications (MR jobs, Hive queries, Pig scripts etc.)
>
> Arun
>
>>
>> Thanks in advance!
>>
>>
>> 2013/6/19 Rahul Bhattacharjee
>> Thanks Arun and Devaraj, good to know.
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy wrote:
>> Not true, the CapacityScheduler has support for both CPU & memory now.
>>
>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee wrote:
>>
>>> Hi Devaraj,
>>>
>>> As for the resource request for a YARN container, currently only memory is considered as a resource, not CPU. Please correct me if I'm wrong.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k wrote:
>>> Hi Sam,
>>>
>>> Please find the answers to your queries.
>>>
>>> >- YARN can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x. How does YARN execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>
>>> In YARN, the central concept is the application. An MR job is one kind of application, which makes use of the MRAppMaster (i.e. the ApplicationMaster for that application). If we want to run different kinds of applications, we need an ApplicationMaster for each kind of application.
>>>
>>> >- Do the MRv1 parameters still work for YARN? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>
>>> These configurations still work for an MR job in YARN.
>>>
>>> >- What's the general process for the ApplicationMaster of YARN to execute a job?
>>>
>>> The MRAppMaster (ApplicationMaster for an MR job) manages the job life cycle, which includes getting the containers for maps & reducers, launching the containers via the NM, tracking the task status until completion, and managing failed tasks.
>>>
>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>>> >- For YARN, the above two parameters do not work any more, as YARN uses containers instead, right?
>>>
>>> Correct, these params don't work in YARN. In YARN it is completely based on the resources (memory, CPU). The ApplicationMaster can request resources from the RM to complete the tasks for that application.
>>>
>>> >- For YARN, we can set the whole physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size of physical memory for a container?
>>>
>>> The ApplicationMaster is responsible for getting the containers from the RM by sending resource requests.
For an MR job, you can use the "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations to specify the map & reduce container memory sizes.
>>>
>>> >- How to set the maximum size of physical memory of a container? By the parameter 'mapred.child.java.opts'?
>>>
>>> It can be set based on the resources requested for that container.
>>>
>>> Thanks
>>>
>>> Devaraj K
>>>
>>> From: sam liu [mailto:samliuhadoop@gmail.com]
>>> Sent: 19 June 2013 08:16
>>> To: user@hadoop.apache.org
>>> Subject: How Yarn execute MRv1 job?
>>>
>>> Hi,
>>>
>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together, with a typical process (map > shuffle > reduce). In YARN, as I understand it, an MRv1 job is executed only by the ApplicationMaster.
>>> - YARN can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x. How does YARN execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for YARN? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for the ApplicationMaster of YARN to execute a job?
>>>
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>>> - For YARN, the above two parameters do not work any more, as YARN uses containers instead, right?
>>> - For YARN, we can set the whole physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size of physical memory for a container?
>>> - How to set the maximum size of physical memory of a container? By the parameter 'mapred.child.java.opts'?
>>>
>>> Thanks!
>>>
>>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
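The container-memory and framework settings discussed in this thread can be collected into a minimal mapred-site.xml sketch for Hadoop 2.x. The property names are the real Hadoop 2.x keys mentioned above; the values are illustrative assumptions, not tuning recommendations:

```xml
<!-- mapred-site.xml (sketch): run existing MRv1 job jars unchanged on YARN -->
<configuration>
  <!-- Submit jobs to YARN instead of the classic/local framework;
       no recompilation of MRv1 jobs is needed. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <!-- Per-container memory for map and reduce tasks (illustrative values) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>

  <!-- The JVM heap must fit inside the container allocation above -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx800m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1600m</value>
  </property>
</configuration>
```

With this in place, an existing MRv1 jar can be submitted unchanged, e.g. `hadoop jar myjob.jar ...` (the jar name is hypothetical). Note that the old 'mapred.child.java.opts' still works as a deprecated fallback, but the per-task `mapreduce.{map,reduce}.java.opts` variants above are preferred in Hadoop 2.x.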

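The NodeManager-level and scheduler-level memory limits that Devaraj refers to live in yarn-site.xml; a minimal sketch follows. The property names are the actual Hadoop 2.x keys; the values are assumptions for illustration:

```xml
<!-- yarn-site.xml (sketch): node and container memory bounds -->
<configuration>
  <!-- Total physical memory the NodeManager may hand out to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

  <!-- Smallest and largest container the scheduler will grant;
       requests are normalized up to a multiple of the minimum -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```

With these example values, one 8192 MB NodeManager could run at most eight 1024 MB map containers, or four 2048 MB reduce containers, at a time.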