hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: communication path for task assignment in hadoop 2.X
Date Thu, 11 Apr 2013 12:56:17 GMT
Hi Hari,

My response inline.

On Thu, Apr 11, 2013 at 11:51 AM, hari <haribaha@gmail.com> wrote:
> Hi list,
> I was looking at the node communication (in hadoop-2.0.3-alpha)
> responsible for task assignment. So
> far I saw that the resourcemanager and nodemanager
> exchange heartbeat request/response.

Very glad you're checking out YARN. Do provide feedback on what we can
improve over the YARN JIRA when you get to try it!

> I have a couple of questions on this:
> 1. I saw that the heartbeat response to nodemanager does
> not include new task assignment. I think in the previous
> versions (eg. 0.20.2) the task assignment was piggybacked in the
> heartbeat response to the tasktracker. In the v2.0.3, so far
> I could only see that the response can trigger nodemanager
> shutdown, reboot, app cleanup, and container cleanup. Are there
> any other actions being triggered on the nodemanager
> by the heartbeat response ?

Note: in YARN, the new term for 'Task' is 'Container'.

Yes, the heartbeats between the NodeManager and the ResourceManager
no longer carry container assignments. Container launches
are handled by a separate NodeManager-embedded service called the
ContainerManager [1].

You can read about all the functions a heartbeat currently performs at a
NodeManager in its NodeStatusUpdater service code at [2].
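As a rough illustration of the response handling described above, here is a toy sketch in Java. The class, enum, and field names are made up for this example and are not the real YARN API; see NodeStatusUpdaterImpl at [2] for the actual code. The point is that the response dispatches on a node action and on cleanup lists, and never carries a container assignment:

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatSketch {
    // Illustrative stand-in for the actions a heartbeat response can
    // trigger on the NodeManager: shutdown, reboot, app cleanup, and
    // container cleanup (as discussed in this thread).
    enum NodeAction { NORMAL, SHUTDOWN, REBOOT }

    static class HeartbeatResponse {
        NodeAction action = NodeAction.NORMAL;
        List<String> appsToCleanup = new ArrayList<>();
        List<String> containersToCleanup = new ArrayList<>();
    }

    // Returns the list of effects the NodeManager would carry out.
    static List<String> process(HeartbeatResponse resp) {
        List<String> effects = new ArrayList<>();
        switch (resp.action) {
            case SHUTDOWN: effects.add("shutdown"); return effects;
            case REBOOT:   effects.add("reboot");   return effects;
            default:       break;
        }
        // Note: no container *assignment* handling here -- launches
        // arrive via the separate ContainerManager service, not via
        // the heartbeat response.
        for (String app : resp.appsToCleanup)
            effects.add("cleanup-app:" + app);
        for (String c : resp.containersToCleanup)
            effects.add("cleanup-container:" + c);
        return effects;
    }

    public static void main(String[] args) {
        HeartbeatResponse resp = new HeartbeatResponse();
        resp.appsToCleanup.add("app_01");
        resp.containersToCleanup.add("container_01_000002");
        System.out.println(process(resp));
    }
}
```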

> 2. Is there a different communication path for task assignment ?
> Is the scheduler making the remote calls or are there other classes outside
> of yarn responsible for making the remote calls ?

The latter. The Scheduler is no longer responsible for asking NodeManagers
to launch the containers. An ApplicationMaster asks the Scheduler to
reserve containers with specified resource (memory/CPU/etc.)
allocations on available or preferred NodeManagers. Once the
ApplicationMaster gets a response back that the allocation succeeded,
it communicates directly with the ContainerManager on the allocated
NodeManager to launch the command needed to run the 'task'.
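Schematically, the allocate-then-launch flow described above looks like the toy Java sketch below. All class and method names here (Scheduler, ContainerManager, startContainer, etc.) are illustrative stand-ins for this sketch, not the real YARN protocol interfaces; the real request path is in the DistributedShell link [3]:

```java
import java.util.ArrayList;
import java.util.List;

public class AllocateAndLaunchSketch {
    static class Container {
        final String id, nodeId;
        Container(String id, String nodeId) { this.id = id; this.nodeId = nodeId; }
    }

    // Stand-in for the RM-side scheduler: it only *reserves* containers
    // in response to an ApplicationMaster's resource request.
    static class Scheduler {
        private int next = 1;
        List<Container> allocate(int count, int memoryMb) {
            List<Container> out = new ArrayList<>();
            for (int i = 0; i < count; i++)
                out.add(new Container("container_" + (next++), "node_1"));
            return out; // the scheduler never launches anything itself
        }
    }

    // Stand-in for the NodeManager-embedded ContainerManager: the
    // ApplicationMaster calls it directly to launch a command.
    static class ContainerManager {
        final List<String> launched = new ArrayList<>();
        void startContainer(Container c, String command) {
            launched.add(c.id + ":" + command);
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        ContainerManager cm = new ContainerManager();
        // 1. The AM asks the scheduler to reserve two 1024 MB containers.
        List<Container> allocated = scheduler.allocate(2, 1024);
        // 2. On success, the AM contacts the ContainerManager on each
        //    allocated node to launch the actual 'task' command.
        for (Container c : allocated)
            cm.startContainer(c, "sh run-task.sh");
        System.out.println(cm.launched);
    }
}
```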

A good example to read is the DistributedShell example.
Link [3] shows where the above AppMaster -> ContainerManager
request happens in its ApplicationMaster implementation, which
should help clear this up for you.

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L392
[startContainer, stopContainer, getContainerStatus etc… are all
protocol calls invocable from an ApplicationMaster]
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java#L433
[Link is to the heartbeat retry loop, and then the response processing
follows right after the loop]
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L756
[See the whole method, i.e. above from this code line point which is
just the final RPC call, and you can notice how we build the
command/environment to launch our 'task']
Harsh J
