hadoop-hdfs-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: What else can be built on top of YARN.
Date Thu, 30 May 2013 05:16:09 GMT

Historically, many applications/frameworks wanted to take advantage of just the resource management
capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced
to use MapReduce even though they did not need it. Obvious examples are graph processing (Giraph),
BSP (Hama), Storm/S4, and even a simple tool like DistCp.

There are issues even with map-only jobs:
 - You have to fake key-value processing, periodic pings, and key-value outputs
 - You are limited to the map slot capacity of the cluster
 - The number of tasks is static, so you cannot grow and shrink your job
 - You are forced to sort data all the time (although this has changed recently)
 - You are tied to faking things like OutputCommit even if you don't need them
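
To make the slot-capacity point concrete, here is a toy model. It assumes a hypothetical cluster in which every node has a fixed number of map and reduce slots, and it simplifies generic resource allocation to "each former slot becomes one equally sized container". The class, function names, and numbers are illustrative only and are not Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical worker node with statically configured slots."""
    map_slots: int
    reduce_slots: int


def map_only_parallelism_with_slots(nodes):
    # With typed slots, a map-only job can occupy only the map slots;
    # the reduce slots stay idle for that job.
    return sum(n.map_slots for n in nodes)


def parallelism_with_generic_containers(nodes):
    # Simplifying assumption: each former slot is one generic container,
    # so the same job could in principle use all of them.
    return sum(n.map_slots + n.reduce_slots for n in nodes)


# Made-up cluster: 10 nodes, each with 6 map slots and 4 reduce slots.
cluster = [Node(map_slots=6, reduce_slots=4) for _ in range(10)]
slot_limit = map_only_parallelism_with_slots(cluster)
container_limit = parallelism_with_generic_containers(cluster)
print(slot_limit, container_limit)
```

In this made-up cluster the map-only job tops out at the map slots alone, while the generic model can draw on every container.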

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.

+Vinod Kumar Vavilapalli
Hortonworks Inc.

On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
> I was going through the motivation behind YARN. Splitting the responsibility of the JT is
> the major concern. Ultimately the base (YARN) was built in a generic way for building other
> generic distributed applications too.
> I am not able to think of any other parallel processing use case that would be useful
> to build on top of YARN. I thought of a lot of use cases that would be beneficial when run
> in parallel, but again, we can do those using map-only jobs in MR.
> Can someone tell me a scenario where an application can utilize YARN features or can
> be built on top of YARN and, at the same time, cannot be done efficiently using MRv2 jobs?
> thanks,
> Rahul
