hadoop-user mailing list archives

From Hitesh Shah <hit...@hortonworks.com>
Subject Re: YARN Features
Date Tue, 12 Mar 2013 21:01:49 GMT
Answers inline. 

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criterion was the community support of the framework
> which I rate now as very good :)
> I would like to ask other questions:
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem APIs from Hadoop to a very large extent, so it should work with
any filesystem solution.
There are a couple of features that do depend on HDFS, though. Log aggregation, for example
( collecting the logs of all containers into a central place ), would need to be disabled.
There may be some cases that I am unaware of. If you do see anything that
depends on HDFS, please do file JIRAs so that we can address the issue.
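
For reference, log aggregation is controlled by the yarn.log-aggregation-enable property, normally set in yarn-site.xml. A minimal fragment that turns it off might look like the following. This is a sketch only; check the defaults and the file location for the Hadoop release in use.

```xml
<!-- yarn-site.xml: keep container logs on the local nodes instead of
     aggregating them into a central filesystem location -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
```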

> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.

Yes - you are right - there would be no dependency on MapReduce. 
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is
YARN only ). 

> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?

There are a few others who are using YARN in various scenarios, but none who use it for their
test infrastructure as far as I know.
The closest I can think of would be LinkedIn's use case, where they launch and monitor a bunch
of services on a YARN cluster
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help ).

> Thanks,
> Ioan
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hitesh@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf
>> has some details on YARN's architecture.
>> -- Hitesh
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts' execution has to be provided. In case a node has failed to
>>> execute a script, it should be re-executed on a different node. Some
>>> scripts are Windows specific, others are Unix specific, and they have to be
>>> executed on a node with a specific OS.
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> The OS-specific resource ask is something which will need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use case?
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes, how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager's responsibility? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>> The way YARN works is slightly different from what you describe above.
>> What you would do is write some form of a controller, which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the controller based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller, which
>> would then have to handle retries ( may require asking the RM for a new container
>>> Thank you in advance for your support,
>>> Ioan Zeng
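
The controller behaviour described in the quoted reply (launch a script in a container, read back the exit code, retry on a different node after a failure) can be sketched without any YARN code. The Python below is only an illustration under stated assumptions: run_all, Script, fake_launch and the node and script names are all hypothetical, the launch callback stands in for starting a script in an allocated container, and picking an untried node stands in for asking the RM for a new container. It does not use the YARN API and does not model OS-specific placement, which the thread says is not supported.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Script:
    """Bookkeeping for one script (hypothetical helper type)."""
    name: str
    attempts: int = 0
    failed_nodes: set = field(default_factory=set)


def run_all(scripts, nodes, launch, max_attempts=3):
    """Run every script once, retrying failures on a node not yet tried.

    launch(node, script_name) must return an exit code (0 means success).
    Returns {script_name: (status, attempts)}.
    """
    pending = deque(Script(name) for name in scripts)
    report = {}
    while pending:
        script = pending.popleft()
        # Stand-in for requesting a new container: choose a node that
        # has not already failed this script.
        candidates = [n for n in nodes if n not in script.failed_nodes]
        if not candidates or script.attempts >= max_attempts:
            report[script.name] = ("FAILED", script.attempts)
            continue
        node = candidates[0]
        script.attempts += 1
        exit_code = launch(node, script.name)
        if exit_code == 0:
            report[script.name] = ("SUCCEEDED", script.attempts)
        else:
            # Remember the failing node and queue the script for a retry.
            script.failed_nodes.add(node)
            pending.append(script)
    return report


def fake_launch(node, script_name):
    """Pretend that suite-b.sh always fails on node-1 (test double)."""
    return 1 if (node, script_name) == ("node-1", "suite-b.sh") else 0


report = run_all(["suite-a.sh", "suite-b.sh"], ["node-1", "node-2"], fake_launch)
```

In a real ApplicationMaster the retry decision would be driven by the container completion status reported back by the RM, and the final report would be assembled from those statuses rather than from a local callback.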
