hadoop-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: How Yarn execute MRv1 job?
Date Thu, 20 Jun 2013 06:33:42 GMT
Hi Sam,
Please look at: http://hbase.apache.org/book.html#d2617e499

Generally speaking, YARN is Hadoop 2.x; you can download hadoop-2.0.4-alpha,
and Hive 0.10 supports Hadoop 2.x very well.



On Thu, Jun 20, 2013 at 2:11 PM, sam liu <samliuhadoop@gmail.com> wrote:

> Thanks Arun!
>
> #1, Yes, I ran tests and found that MRv1 jobs could run against YARN
> directly, without recompiling.
>
> #2, do you mean the old versions of HBase/Hive cannot run against YARN,
> and only some special versions of them can? If yes, how can I get the
> versions for YARN?
>
>
> 2013/6/20 Arun C Murthy <acm@hortonworks.com>
>
>>
>> On Jun 19, 2013, at 6:45 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>
>> Thanks for the detailed answers! Here are three further questions:
>>
>> - YARN maintains backwards compatibility, and MRv1 jobs can run on YARN.
>> If YARN does not require any code change to existing MRv1 jobs, why should
>> we recompile them?
>>
>>
>> You don't need to recompile MRv1 jobs to run against YARN.
>>
>> - Which YARN jar files are required for the recompilation?
>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related
>> components (HBase 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want to
>> replace Hadoop 1.1.1 with YARN, do we need to recompile all the other
>> Hadoop-related components against the YARN jar files? Without any code change?
>>
>>
>> You will need versions of HBase, Hive etc. which are integrated with
>> hadoop-2.x, but you will not need to change any of your end-user
>> applications (MR jobs, Hive queries, Pig scripts etc.).
>>
>> Arun
>>
>>
>> Thanks in advance!
>>
>>
>>
>> 2013/6/19 Rahul Bhattacharjee <rahul.rec.dgp@gmail.com>
>>
>>> Thanks Arun and Devraj , good to know.
>>>
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>>>
>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>
>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>
>>>> Hi Devaraj,
>>>>
>>>> As for the resource request for a YARN container, currently only
>>>> memory is considered as a resource, not CPU. Please correct me if I'm wrong.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <devaraj.k@huawei.com> wrote:
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> Please find the answers for your queries.
>>>>>
>>>>>
>>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>> job has a special execution process (map > shuffle > reduce) in Hadoop
>>>>> 1.x. How does Yarn execute an MRv1 job? Does it still include the special
>>>>> MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>
>>>>>
>>>>> In Yarn, there is a concept of an application. An MR job is one kind of
>>>>> application, which makes use of MRAppMaster (i.e. the ApplicationMaster
>>>>> for that application). If we want to run different kinds of applications,
>>>>> we should have an ApplicationMaster for each kind of application.
>>>>>
>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>
>>>>> These configurations still work for MR jobs in Yarn.
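[As a concrete illustration: the two sort settings named above keep their
mapreduce.* names in a Hadoop 2.x mapred-site.xml. The values below are only
examples, not recommendations from this thread.]

```xml
<!-- mapred-site.xml: example values only (the Hadoop 2.x defaults are 100 MB / 0.80) -->
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value> <!-- in-memory sort buffer per map task, in MB -->
  </property>
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value> <!-- buffer fill ratio that triggers a spill to disk -->
  </property>
</configuration>
```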
>>>>>
>>>>>
>>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>>> execute a job?
>>>>>
>>>>> MRAppMaster (the ApplicationMaster for an MR job) drives the job life
>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>> launching the containers via the NM, tracking the task status till
>>>>> completion, and managing failed tasks.
>>>>>
>>>>>
>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> >- For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>>
>>>>> Correct, these params don't work in Yarn. In Yarn, scheduling is
>>>>> completely based on resources (memory, CPU). The ApplicationMaster can
>>>>> request resources from the RM to complete the tasks for that application.
>>>>>
>>>>>
>>>>> >- For Yarn, we can set the whole physical memory for a NodeManager
>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>>> default size of the physical memory of a container?
>>>>>
>>>>> The ApplicationMaster is responsible for getting containers from the RM
>>>>> by sending resource requests. For an MR job, you can use the
>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>>>>> to specify the map & reduce container memory sizes.
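[A minimal sketch of those two settings in mapred-site.xml; the values are
illustrative, not recommendations from this thread. The java.opts properties
are included because the JVM heap must fit inside the requested container.]

```xml
<!-- mapred-site.xml: per-task container sizes for an MR job (example values) -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value> <!-- memory requested for each map container -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value> <!-- memory requested for each reduce container -->
  </property>
  <!-- JVM heap sizes are set somewhat below the container sizes above -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1228m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2457m</value>
  </property>
</configuration>
```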
>>>>>
>>>>> >- How do we set the maximum size of the physical memory of a
>>>>> container? With the parameter 'mapred.child.java.opts'?
>>>>>
>>>>> It can be set based on the resources requested for that container.
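[For completeness, the cluster-wide bounds on container sizes live in
yarn-site.xml. The property names below are standard Hadoop 2.x settings, but
the values are only an example, not something stated in this thread.]

```xml
<!-- yarn-site.xml: NodeManager capacity and scheduler allocation bounds (example values) -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value> <!-- total physical memory a NodeManager offers to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value> <!-- smallest container the RM will grant -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value> <!-- largest container the RM will grant -->
  </property>
</configuration>
```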
>>>>>
>>>>> Thanks
>>>>>
>>>>> Devaraj K
>>>>>
>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>> *Sent:* 19 June 2013 08:16
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* How Yarn execute MRv1 job?
>>>>>
>>>>> Hi,
>>>>>
>>>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>> know, an MRv1 job will be executed only by the ApplicationMaster.
>>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x.
>>>>> How does Yarn execute an MRv1 job? Does it still include the special MR
>>>>> steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>> execute a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the whole physical memory for a NodeManager using
>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>>> size of the physical memory of a container?
>>>>> - How do we set the maximum size of the physical memory of a container?
>>>>> With the parameter 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
