hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hitesh Shah <hit...@hortonworks.com>
Subject Re: YARN Features
Date Tue, 12 Mar 2013 18:38:21 GMT
Answers inline. Will address the DistributedShell questions in a follow-up. 

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> Hello Hadoop Team,
> 
> I need some help/support in evaluating Hadoop YARN from the perspective of:
> 
> - Adding/removing test nodes dynamically without restart
> Is this possible in YARN? or is it only a MapReduce application feature?
> 

Yarn Feature. Each node has a NodeManager daemon running on it. The Resourcemanager tracks
nodes' health
via heartbeats from the NodeManager. In that respect, unhealthy nodes will be removed from
the available 
pool and when an application requests for resources, it will not get allocated to an unhealthy
or down node.


> - Failover from crashed test nodes
> Is this automatically done by the YARN framework?
> 

When a node fails, containers running on it are also reported as failed. This feedback about
failed nodes and failed containers
is passed to the application by the ResourceManager. How the application handles these failures
and acts upon the failure events to 
recover is based on application code. Each application will have its own logic to recover
- YARN provides all the information for an application 
to act upon but does not take any action on the application's behalf ( slight caveat that
the yarn framework does support restarting ( up to a max 
retry attempt limit ) the application itself if the application died or the node on which
the application was running died.

> - Simple prioritization of jobs
> I know there is a prioritization possibility of the tasks in the
> Scheduler, right?
> 

Yes and no - depends on which scheduler you configure the RM with and how you configure it.
The default scheduler used is the CapacityScheduler. 
Details on that are at http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

> - Insight into queue (simple reporting, including running jobs on nodes)
> Is there any reporting possibility in YARN regarding the status of
> the not executed, running, finished jobs?
> 

The YARN UI provides these details for the last N applications that were handled ( N being
around 10000 ). There are also webservices to access this 
data in xml/json format. There is a plan for a generic application history server but that
is still being worked on.

> - Support heterogenous OSes on nodes (or use separate masters for
> homogenous grids)
> I think this is supported, right?
> 

I don't believe that there is anything stopping support of a heterogenous cluster. However,
I am not sure if we have come across anyone 
who has tried that out. A lot of the problems may arise based on how well a application is
written to handle launching containers correctly on different OS types.
The scheduler today does not account for asking for containers on only certain OS types. I
am guessing there might be some minor features that be needed to be
addressed to fully support it. If you do try the above use case, please let us know if there
are features/issues that you would like to see addressed.


> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 
> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 
> Thank you in advance for your support,
> Ioan Zeng


Mime
View raw message