hadoop-common-user mailing list archives

From sam liu <samliuhad...@gmail.com>
Subject Re: How Yarn execute MRv1 job?
Date Thu, 20 Jun 2013 06:11:51 GMT
Thanks Arun!

#1, Yes, I did some tests and found that MRv1 jobs can run against YARN
directly, without recompiling.

#2, do you mean that old versions of HBase/Hive cannot run against YARN, and
only certain versions of them can? If so, how can I find out which versions
work with YARN?


2013/6/20 Arun C Murthy <acm@hortonworks.com>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <samliuhadoop@gmail.com> wrote:
>
> Thanks for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and an MRv1 job can run on Yarn.
> If yarn does not require any code change to an existing MRv1 job, why would
> we need to recompile the MRv1 job?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required for the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop-related components (HBase
> 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want to replace Hadoop
> 1.1.1 with yarn, do we need to recompile all the other Hadoop-related
> components against the yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but you will not need to change any of your end-user
> applications (MR jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <rahul.rec.dgp@gmail.com>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the resource request for a yarn container, currently only
>>> memory is considered as a resource, not cpu. Please correct me if I'm wrong.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <devaraj.k@huawei.com> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> Please find the answers for your queries.
>>>>
>>>>
>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>
>>>>
>>>> In Yarn, everything is an application. An MR job is one kind of
>>>> application, which makes use of MRAppMaster (i.e. the ApplicationMaster for
>>>> that application). If we want to run different kinds of applications, we
>>>> need an ApplicationMaster for each kind of application.
>>>>
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>
>>>> These configurations still work for an MR job in Yarn.
>>>>
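>>>> As a sketch (the values here are only illustrative, not recommendations),
>>>> these sort settings can still be placed in mapred-site.xml when the jobs
>>>> run on Yarn:
>>>>
>>>> ```xml
>>>> <!-- mapred-site.xml: per-job sort buffer and spill threshold -->
>>>> <property>
>>>>   <name>mapreduce.task.io.sort.mb</name>
>>>>   <value>100</value> <!-- sort buffer size in MB -->
>>>> </property>
>>>> <property>
>>>>   <name>mapreduce.map.sort.spill.percent</name>
>>>>   <value>0.80</value> <!-- spill to disk when the buffer is 80% full -->
>>>> </property>
>>>> ```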
>>>>
>>>> >- What's the general process for the ApplicationMaster of Yarn to execute
>>>> a job?
>>>>
>>>> MRAppMaster (the ApplicationMaster for an MR job) manages the job life
>>>> cycle, which includes getting the containers for maps & reducers, launching
>>>> the containers via the NM, tracking the tasks' status till completion, and
>>>> managing the failed tasks.
>>>>
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> >- For Yarn, the above two parameters do not work any more, as yarn uses
>>>> containers instead, right?
>>>>
>>>> Correct, these params don't work in yarn. In Yarn, scheduling is completely
>>>> based on resources (memory, cpu). The ApplicationMaster can request
>>>> resources from the RM to complete the tasks of that application.
>>>>
>>>>
>>>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?
>>>>
>>>> The ApplicationMaster is responsible for getting the containers from the RM
>>>> by sending resource requests. For an MR job, you can use the
>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>>>> to specify the map & reduce container memory sizes.
>>>>
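>>>> A minimal, illustrative mapred-site.xml fragment (the values are just
>>>> examples, tune them to your workload):
>>>>
>>>> ```xml
>>>> <!-- mapred-site.xml: container sizes requested for map and reduce tasks -->
>>>> <property>
>>>>   <name>mapreduce.map.memory.mb</name>
>>>>   <value>1024</value> <!-- each map task container gets 1 GB -->
>>>> </property>
>>>> <property>
>>>>   <name>mapreduce.reduce.memory.mb</name>
>>>>   <value>2048</value> <!-- each reduce task container gets 2 GB -->
>>>> </property>
>>>> ```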
>>>>
>>>> >- How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> It is determined by the resources requested for that container
>>>> ('mapred.child.java.opts' only controls the JVM heap inside the container,
>>>> not the container's memory limit).
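>>>> For reference, the cluster-wide bounds on container sizes live in
>>>> yarn-site.xml; a sketch with illustrative values:
>>>>
>>>> ```xml
>>>> <!-- yarn-site.xml: NodeManager capacity and container size bounds -->
>>>> <property>
>>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>>   <value>8192</value> <!-- total memory the NM offers to containers -->
>>>> </property>
>>>> <property>
>>>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>>>   <value>1024</value> <!-- smallest container the RM will grant -->
>>>> </property>
>>>> <property>
>>>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>>>   <value>8192</value> <!-- largest single container request allowed -->
>>>> </property>
>>>> ```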
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Devaraj K
>>>>
>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>> *Sent:* 19 June 2013 08:16
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* How Yarn execute MRv1 job?
>>>>
>>>>
>>>> Hi,
>>>>
>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>> know, an MRv1 job will be executed only by the ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for the ApplicationMaster of Yarn to execute
>>>> a job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>> containers instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
