hadoop-hdfs-user mailing list archives

From Ioan Zeng <zengi...@gmail.com>
Subject Re: YARN Features
Date Tue, 12 Mar 2013 19:26:03 GMT
Another evaluation criterion was the community support of the framework,
which I now rate as very good :)

I would like to ask other questions:

I have seen YARN or MR used only in the context of HDFS. Would it be
possible to keep all YARN features when running without HDFS (with no
HDFS installed)?

You mentioned the CapacityScheduler. Does this require MapReduce, or
is it included in YARN? I understood that MRv2 is just an application
built on top of the YARN framework. For our use case we don't need MR.

To give some context for my questions regarding the Distributed Shell:
we intend to use YARN for a distributed automated test environment
which will execute sets of test suites for specific builds in parallel.
Do you know of similar usages of YARN or MR, maybe case studies?


On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
> Answers regarding DistributedShell.
> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf
> has some details on YARN's architecture.
> -- Hitesh
> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> Another point I would like to evaluate is the Distributed Shell example usage.
>> Our use case is to start different scripts on a grid. Once a node has
>> finished a script, a new script has to be started on it. A report about
>> the script executions has to be provided. In case a node has failed to
>> execute a script, it should be re-executed on a different node. Some
>> scripts are Windows-specific, others are Unix-specific, and they have
>> to be executed on a node with the matching OS.
> The current implementation of distributed shell is effectively a piece of
> example code to help folks write more complex applications. It simply
> supports launching a script on a given number of containers (without
> accounting for where the containers are assigned), does not handle
> retries on failures, and simply reports a success/failure based on the
> number of failures in running the script.
> Based on your use case, it should be easy enough to build on the example
> code to handle the features that you require.
> The OS-specific resource ask is something which will need to be
> addressed in YARN. Could you file a JIRA for this feature request with
> some details about your use case?
>> The question is:
>> Would it be feasible to adapt the example "Distributed Shell"
>> application to have the above features?
>> If yes, how could I run some specific scripts only on a specific OS? Is
>> this the ResourceManager's responsibility? What happens if there is no
>> Windows node in the grid, for example, but there is a Windows script
>> in the queue?
>> How are failed scripts re-executed? Does this have to be implemented by
>> custom code, or is it a built-in feature of YARN?
> The way YARN works is slightly different from what you describe above.
> What you would do is write some form of a controller, which in YARN
> terminology is referred to as an ApplicationMaster.
> It would request containers from the RM (for example, 5 containers on
> WinOS and 5 on Linux, with 1 GB of RAM each). Once a container is
> assigned, the controller would be responsible for launching the correct
> script based on the container allocated. The RM would be responsible for
> ensuring the correct set of containers is allocated to the controller
> based on resource usage limits, priorities, etc. [Again, to clarify, OS
> type scheduling is currently not supported.] If a script fails, the
> container's exit code and completion status would be fed back to the
> controller, which would then have to handle retries (this may require
> asking the RM for a new container).
>> Thank you in advance for your support,
>> Ioan Zeng
