hadoop-hdfs-user mailing list archives

From John Conwell <j...@iamjohn.me>
Subject Re: What else can be built on top of YARN.
Date Wed, 29 May 2013 18:04:20 GMT
Two scenarios I can think of are re-implementations of Twitter's Storm
(http://storm-project.net/) and DryadLinq.

Storm, a distributed realtime computation framework used for analyzing
realtime streams of data, doesn't really need to be ported. It's doing fine
by itself, though I think it's a prime candidate for a Yarn port.

DryadLinq is a (now closed) research project out of Microsoft Research that
allowed the user to write standard LINQ code (in any .NET language). It
built a DAG-based execution plan from the LINQ statement and executed
that DAG on an MS HPC cluster.

The LINQ syntax is very much like Pig, though far more flexible, has
full IDE support (in Visual Studio), and is used in standard single-process
programming. That, to me, was the beauty behind DryadLinq: the programming
language for distributed execution was exactly the same as a well-known
language already used for standard single-process programming by
hundreds of thousands of programmers, so the learning curve and adoption cost
are really low. But, like all good things that come out of MS Research, it
was killed because they sat on it too long.
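For readers who haven't used LINQ, Java's Streams API gives a comparable feel for the declarative, single-process query style described above. The snippet below is only an analogy, not DryadLinq code: it is a plain in-memory word count, and the class and method names (`WordCountQuery`, `countWords`) are hypothetical. The point is that a DryadLinq-style system would take a pipeline written like this and turn its operators (split, filter, group, count) into a distributed DAG.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountQuery {
    // Counts word occurrences with a declarative, LINQ-like pipeline (single process only).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // split each line into lowercase words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // drop empty tokens produced by leading whitespace or empty lines
                .filter(word -> !word.isEmpty())
                // group identical words and count them; TreeMap keeps keys sorted
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog");
        // prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(countWords(lines));
    }
}
```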

The interesting thing is that distributed DAG execution is one of the main
examples given for the types of Yarn applications that could be developed.
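To make "distributed DAG execution" concrete, here is a minimal sketch of the scheduling logic such an application would need: stages are grouped into waves so that every stage in a wave depends only on earlier waves, which means the stages inside one wave could run in parallel. This is plain Java with hypothetical names (`DagWaves`, `schedule`); it does not use any YARN API. In a real YARN application, the assumption is that an ApplicationMaster would request containers for each ready wave, rather than chaining separate MR jobs that materialize every intermediate result in HDFS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DagWaves {
    // deps maps each stage name to the names of the stages it depends on.
    // Returns stages grouped into waves (Kahn's topological sort, level by level).
    static List<List<String>> schedule(Map<String, List<String>> deps) {
        Map<String, Integer> remaining = new HashMap<>();        // unmet dependency count per stage
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges: stage -> stages waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                if (!deps.containsKey(d)) throw new IllegalArgumentException("unknown stage: " + d);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<List<String>> waves = new ArrayList<>();
        List<String> ready = new ArrayList<>();
        for (String s : deps.keySet()) if (remaining.get(s) == 0) ready.add(s);
        int scheduled = 0;
        while (!ready.isEmpty()) {
            waves.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            // "complete" the current wave and release stages whose inputs are now all done
            for (String done : ready) {
                for (String child : dependents.getOrDefault(done, List.of())) {
                    int left = remaining.merge(child, -1, Integer::sum);
                    if (left == 0) next.add(child);
                }
            }
            ready = next;
        }
        // any stage never released must sit on a cycle
        if (scheduled != deps.size()) throw new IllegalStateException("dependency cycle detected");
        return waves;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("readLogs", List.of());
        deps.put("readUsers", List.of());
        deps.put("join", List.of("readLogs", "readUsers"));
        deps.put("aggregate", List.of("join"));
        // prints [[readLogs, readUsers], [join], [aggregate]]
        System.out.println(schedule(deps));
    }
}
```

Expressed as MR, the same plan would typically be a chain of jobs (the join, then the aggregation), each writing its output to HDFS before the next starts. A DAG-aware application can schedule the two independent reads together and hand data between stages however it chooses.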

On Wed, May 29, 2013 at 10:30 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Thanks for the response, Krishna.
> I was wondering whether it would be possible to use MR to solve your problem
> instead of building the whole stack on top of YARN.
> Most likely it's not possible, and that's why you are building it. I wanted to
> know why that is.
> I am just trying to find out why we might need to write an
> application on YARN.
> Rahul
> On Wed, May 29, 2013 at 8:23 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>> Hi Rahul,
>>   I am porting a distributed application that runs on a fixed set of
>> given resources to YARN, with the aim of being able to run it on
>> dynamically selected resources, whichever are available at the time the
>> application runs.
>> Thanks,
>> Kishore
>> On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>> Hi all,
>>> I was going through the motivation behind YARN. Splitting the
>>> responsibilities of the JobTracker (JT) is the major concern. Ultimately, the base (YARN) was
>>> built in a generic way so that other distributed applications could be built on it
>>> too.
>>> I am not able to think of any other parallel processing use case that
>>> would be useful to build on top of YARN. I thought of a lot of use cases
>>> that would be beneficial when run in parallel, but again, we can do those
>>> using map-only jobs in MR.
>>> Can someone tell me a scenario where an application can use YARN
>>> features or be built on top of YARN, and at the same time cannot be
>>> done efficiently using MRv2 jobs?
>>> thanks,
>>> Rahul


John C
