hadoop-common-user mailing list archives

From Devaraj k <devara...@huawei.com>
Subject RE: How Yarn execute MRv1 job?
Date Wed, 19 Jun 2013 05:35:42 GMT
Hi Sam,
  Please find the answers to your queries below.

>- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution
process (map > shuffle > reduce) in Hadoop 1.x. So how does Yarn execute an MRv1 job? Does it
still include the MR-specific steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?

In Yarn, everything runs as an application. An MR job is one kind of application, and it makes
use of MRAppMaster (i.e. the ApplicationMaster for that application type). To run different kinds
of applications, we need an ApplicationMaster for each kind of application. The MR-specific steps
(map, sort, merge, combine and shuffle) still happen inside the map and reduce tasks as in 1.x;
Yarn only changes how those tasks obtain resources and where they run.
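
As a hedged illustration (the class name and paths are my own placeholders, not from the original
mail): an ordinary MRv1-style driver is submitted to Yarn unchanged once mapreduce.framework.name
is set to 'yarn'; the MR client then asks the RM to launch an MRAppMaster container for the job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class YarnMRDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Submit through Yarn; the RM then starts an MRAppMaster container for this job.
    conf.set("mapreduce.framework.name", "yarn");

    Job job = Job.getInstance(conf, "mrv1-style-job-on-yarn");
    job.setJarByClass(YarnMRDriver.class);
    // No mapper/reducer set: the default identity Mapper/Reducer keep the sketch self-contained.
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input path supplied by the user
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path supplied by the user
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}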

>- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
These configurations still work for MR jobs in Yarn.
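
For example (the values are only illustrative; the rest of the job setup is omitted), they can
still be set on the job's Configuration just as in 1.x:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortTuningExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Map-side sort buffer in MB; 256 is only an example (the stock default is 100).
    conf.setInt("mapreduce.task.io.sort.mb", 256);
    // Start spilling when the sort buffer is 80% full; 0.80 is the usual default.
    conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
    Job job = Job.getInstance(conf, "sort-tuning-example");
    // ... set input/output/mapper/reducer as usual, then job.waitForCompletion(true).
  }
}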

>- What's the general process for ApplicationMaster of Yarn to execute a job?
MRAppMaster (the ApplicationMaster for an MR job) drives the whole job life cycle: it gets
containers for the maps and reducers from the RM, launches those containers through the NMs,
tracks task status until completion, and manages failed tasks.
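
Below is a minimal sketch of that life cycle using the Hadoop 2.x AMRMClient/NMClient APIs. It is
not the real MRAppMaster code, only the register / request containers / launch via NM / unregister
protocol; the resource sizes and the "sleep 10" command are illustrative assumptions.

import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    NMClient nm = NMClient.createNMClient();
    nm.init(conf);
    nm.start();

    // 1. Register this AM with the ResourceManager.
    rm.registerApplicationMaster("", 0, "");

    // 2. Ask the RM for one container of 1024 MB / 1 vcore (illustrative sizes).
    Resource capability = Resource.newInstance(1024, 1);
    rm.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // 3. Heartbeat until the RM allocates the container, then launch a command in it via the NM.
    boolean launched = false;
    while (!launched) {
      AllocateResponse response = rm.allocate(0.1f);
      for (Container container : response.getAllocatedContainers()) {
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            Collections.emptyMap(),                 // local resources (jars etc.)
            Collections.emptyMap(),                 // environment
            Collections.singletonList("sleep 10"),  // command to run (placeholder)
            null, null, null);
        nm.startContainer(container, ctx);
        launched = true;
      }
      Thread.sleep(1000);
    }

    // 4. A real AM would track task status and retry failures; here we simply unregister.
    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    rm.stop();
    nm.stop();
  }
}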

>2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum'
and 'mapred.tasktracker.reduce.tasks.maximum'
>- For Yarn, the above two parameters do not work any more, as Yarn uses containers instead,
right?
Correct, these params don't work in Yarn. In Yarn, scheduling is based entirely on resources
(memory, CPU). The ApplicationMaster requests resources from the RM to complete the tasks of
that application.
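
For context, a hedged sketch of the cluster-side knobs that replace slots (the 8192/1024/8192 MB
defaults are my assumption of stock Hadoop 2.x values; the properties themselves live in
yarn-site.xml and can be inspected from a YarnConfiguration):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeResourceCheck {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Total physical memory a NodeManager offers to containers.
    System.out.println("yarn.nodemanager.resource.memory-mb = "
        + conf.getInt("yarn.nodemanager.resource.memory-mb", 8192));
    // The scheduler typically rounds each container request up to a multiple of the minimum ...
    System.out.println("yarn.scheduler.minimum-allocation-mb = "
        + conf.getInt("yarn.scheduler.minimum-allocation-mb", 1024));
    // ... and will not grant more than the maximum allocation per container.
    System.out.println("yarn.scheduler.maximum-allocation-mb = "
        + conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192));
  }
}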

>- For Yarn, we can set the total physical memory available to a NodeManager using
'yarn.nodemanager.resource.memory-mb'. But how do we set the default physical memory size of a container?
The ApplicationMaster is responsible for getting the containers from the RM by sending resource
requests. For an MR job, you can use the "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb"
configurations to specify the map and reduce container memory sizes.
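
For example (the sizes below are only illustrative), a job can request larger map and reduce
containers like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ContainerSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.framework.name", "yarn");
    // Each map task container will be requested with 1536 MB of physical memory.
    conf.setInt("mapreduce.map.memory.mb", 1536);
    // Each reduce task container will be requested with 3072 MB of physical memory.
    conf.setInt("mapreduce.reduce.memory.mb", 3072);
    Job job = Job.getInstance(conf, "container-size-example");
    // ... set input/output/mapper/reducer as usual, then job.waitForCompletion(true).
  }
}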

>- How do we set the maximum physical memory of a container? Via the parameter 'mapred.child.java.opts'?
It is determined by the resources requested for that container (the same "mapreduce.map.memory.mb"
and "mapreduce.reduce.memory.mb" settings above); 'mapred.child.java.opts' only controls the JVM
heap inside the container, not the container's physical memory limit.
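
To make the relationship concrete (the numbers are illustrative assumptions, not recommendations):
the container's physical-memory limit comes from the memory.mb request, the JVM heap set via the
java.opts properties has to fit inside it, and the RM will never grant more than
yarn.scheduler.maximum-allocation-mb per container.

import org.apache.hadoop.conf.Configuration;

public class HeapVsContainerExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Physical memory limit of each map container, enforced by the NodeManager.
    conf.setInt("mapreduce.map.memory.mb", 2048);
    // JVM heap for the map task; kept below the container limit to leave room for non-heap
    // memory (otherwise the NM may kill the container for exceeding 2048 MB).
    conf.set("mapreduce.map.java.opts", "-Xmx1536m");
    // The older 'mapred.child.java.opts' plays the same heap-setting role when the
    // map/reduce specific opts above are not set.
  }
}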


Thanks
Devaraj K
From: sam liu [mailto:samliuhadoop@gmail.com]
Sent: 19 June 2013 08:16
To: user@hadoop.apache.org
Subject: How Yarn execute MRv1 job?

Hi,

1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together, with a typical
process (map > shuffle > reduce). In Yarn, as far as I know, an MRv1 job is executed only by the
ApplicationMaster.
- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution
process (map > shuffle > reduce) in Hadoop 1.x. So how does Yarn execute an MRv1 job? Does it
still include the MR-specific steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
- What's the general process for ApplicationMaster of Yarn to execute a job?

2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum'
and 'mapred.tasktracker.reduce.tasks.maximum'
- For Yarn, the above two parameters do not work any more, as Yarn uses containers instead, right?
- For Yarn, we can set the total physical memory available to a NodeManager using
'yarn.nodemanager.resource.memory-mb'. But how do we set the default physical memory size of a container?
- How do we set the maximum physical memory of a container? Via the parameter 'mapred.child.java.opts'?
Thanks!
