From: Vinod Kumar Vavilapalli
To: user@hadoop.apache.org
Subject: Re: What else can be built on top of YARN.
Date: Wed, 29 May 2013 22:16:09 -0700

Historically, many applications and frameworks wanted to take advantage of just the resource management and failure handling capabilities of Hadoop (via JobTracker/TaskTracker), but were forced to use MapReduce even though they didn't need it. Obvious examples are graph processing (Giraph), BSP (Hama), Storm/S4, and even a simple tool like DistCp.

There are issues even with map-only jobs.
 - You have to fake key-value processing, periodic pings, and key-value outputs.
 - You are limited to the map slot capacity in the cluster.
 - The number of tasks is static, so you cannot grow and shrink your job.
 - You are forced to sort data all the time (even though this has changed recently).
 - You are tied to faking things like OutputCommit even if you don't need to.

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
>
> I was going through the motivation behind YARN. Splitting the responsibility of the JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too.
>
> I am not able to think of any other parallel processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would be beneficial when run in parallel, but again, we can do those using map-only jobs in MR.
>
> Can someone tell me a scenario where an application can utilize YARN features or can be built on top of YARN and, at the same time, cannot be done efficiently using MRv2 jobs?
>
> thanks,
> Rahul