hadoop-hdfs-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: How Yarn execute MRv1 job?
Date Thu, 20 Jun 2013 07:17:52 GMT
HBase 0.94.* does support hadoop-2.x; did you look at the web site I
provided?

Hive 0.9.0 doesn't support hadoop-2.x.

On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <acm@hortonworks.com> wrote:

> I'd use hive-0.11.
>
> On Jun 19, 2013, at 11:56 PM, sam liu <samliuhadoop@gmail.com> wrote:
>
> Hi Azurry,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
> do not support hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu <azuryyyu@gmail.com>
>
>> Hi Sam,
>> please look at: http://hbase.apache.org/book.html#d2617e499
>>
>> Generally speaking, YARN means Hadoop-2.x; you can download
>> hadoop-2.0.4-alpha. And Hive-0.10 supports hadoop-2.x very well.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>
>>> Thanks Arun!
>>>
>>> #1, Yes, I did some tests and found that MRv1 jobs could run against YARN
>>> directly, without recompiling.
>>>
>>> #2, Do you mean that old versions of HBase/Hive cannot run against YARN,
>>> and only some specific versions of them can? If yes, how can I get the
>>> versions for YARN?
>>>
>>>
>>> 2013/6/20 Arun C Murthy <acm@hortonworks.com>
>>>
>>>>
>>>> On Jun 19, 2013, at 6:45 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>
>>>> Thanks for the detailed answers! Here are three further questions:
>>>>
>>>> - Yarn maintains backwards compatibility, and MRv1 jobs can run on
>>>> Yarn. If Yarn does not require any code changes to existing MRv1 jobs,
>>>> why should we recompile them?
>>>>
>>>>
>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>
>>>> - Which Yarn jar files are required for the recompiling?
>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related
>>>> components (HBase 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want
>>>> to replace Hadoop 1.1.1 with Yarn, do we need to recompile all the other
>>>> Hadoop-related components against the Yarn jar files? Without any code change?
>>>>
>>>>
>>>> You will need versions of HBase, Hive etc. which are integrated with
>>>> hadoop-2.x, but you will not need to change any of your end-user
>>>> applications (MR jobs, hive queries, pig scripts etc.)
>>>>
>>>> Arun
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> 2013/6/19 Rahul Bhattacharjee <rahul.rec.dgp@gmail.com>
>>>>
>>>>> Thanks Arun and Devraj , good to know.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>>>>>
>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>>
>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>
>>>>>> Hi Devaraj,
>>>>>>
>>>>>> As for the resource request for a Yarn container, currently only
>>>>>> memory is considered as a resource, not CPU. Please correct me if I'm wrong.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <devaraj.k@huawei.com> wrote:
>>>>>>
>>>>>>> Hi Sam,
>>>>>>>
>>>>>>> Please find the answers for your queries.
>>>>>>>
>>>>>>> > - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>>> > job has a special execution process (map > shuffle > reduce) in
>>>>>>> > Hadoop 1.x, so how does Yarn execute an MRv1 job? Does it still
>>>>>>> > include the special MR steps of Hadoop 1.x, like map, sort, merge,
>>>>>>> > combine and shuffle?
>>>>>>>
>>>>>>> In Yarn, there is a concept of an application. An MR job is one kind of
>>>>>>> application, which makes use of MRAppMaster (i.e. the ApplicationMaster
>>>>>>> for the application). If we want to run different kinds of applications,
>>>>>>> we should have an ApplicationMaster for each kind of application.
>>>>>>>
>>>>>>> > - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> > mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>
>>>>>>> These configurations still work for the MR job in Yarn.
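[For readers of the archive: the sort parameters discussed above go in mapred-site.xml. This is a minimal sketch; the values shown (100 MB buffer, spill at 80%) are illustrative examples only, not tuning recommendations.]

```xml
<!-- mapred-site.xml: MRv1-era sort settings, still honored by MR jobs on YARN.
     Values are illustrative examples, not tuning advice. -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>   <!-- size of the in-memory map-side sort buffer, in MB -->
</property>
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>  <!-- begin spilling to disk when the buffer is 80% full -->
</property>
```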
>>>>>>>
>>>>>>>
>>>>>>> > - What's the general process for the ApplicationMaster of Yarn to
>>>>>>> > execute a job?
>>>>>>>
>>>>>>> MRAppMaster (the ApplicationMaster for an MR job) handles the job life
>>>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>>>> launching the containers via the NM, tracking the tasks' status till
>>>>>>> completion, and managing failed tasks.
>>>>>>>
>>>>>>>
>>>>>>> > 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> > 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> > 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> > - For Yarn, the above two parameters do not work any more, as Yarn
>>>>>>> > uses containers instead, right?
>>>>>>>
>>>>>>> Correct, these params don't work in Yarn. In Yarn it is completely
>>>>>>> based on the resources (memory, cpu). The ApplicationMaster can request
>>>>>>> resources from the RM to complete the tasks for that application.
>>>>>>>
>>>>>>>
>>>>>>> > - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> > using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> > default size of physical mem of a container?
>>>>>>>
>>>>>>> The ApplicationMaster is responsible for getting the containers from
>>>>>>> the RM by sending resource requests. For an MR job, you can use the
>>>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb"
>>>>>>> configurations to specify the map & reduce container memory sizes.
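[For readers of the archive: as a sketch, the per-job container sizes described above would be set in mapred-site.xml. The 1536/3072 numbers below are example values only.]

```xml
<!-- mapred-site.xml: per-container memory requests for MR tasks on YARN.
     The numbers are example values only. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>   <!-- memory requested for each map container, in MB -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>   <!-- memory requested for each reduce container, in MB -->
</property>
```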
>>>>>>>
>>>>>>> > - How to set the maximum size of physical mem of a container? By
>>>>>>> > the parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> It can be set based on the resources requested for that container.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Devaraj K
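[For readers of the archive: the cluster-side limits live in yarn-site.xml. 'yarn.nodemanager.resource.memory-mb' is mentioned in the thread; 'yarn.scheduler.maximum-allocation-mb', which caps the largest single container the RM will grant, is not mentioned above and is added here for completeness. The values are example numbers only.]

```xml
<!-- yarn-site.xml: NodeManager capacity and scheduler limit.
     Example values only. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>   <!-- total physical memory a NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>   <!-- largest single container the RM will allocate -->
</property>
```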
>>>>>>>
>>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>> *Sent:* 19 June 2013 08:16
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* How Yarn execute MRv1 job?
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
>>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as
>>>>>>> I know, an MRv1 job will be executed only by the ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>>> job has a special execution process (map > shuffle > reduce) in
>>>>>>> Hadoop 1.x, so how does Yarn execute an MRv1 job? Does it still
>>>>>>> include the special MR steps of Hadoop 1.x, like map, sort, merge,
>>>>>>> combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn
>>>>>>> uses containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
