Subject: Re: Hadoop or spark
From: Ashutosh Kumar <ashutosh.k78@gmail.com>
To: user@hadoop.apache.org
Date: Sun, 12 Apr 2015 12:55:16 +0530

Thanks. I read this article, and it seems that for all practical purposes
Spark is preferred over Hadoop MapReduce. Only when processing very large
files does Hadoop MapReduce score over Spark. But how large is "very
large"? Is it TBs or PBs, or does it vary with cluster size? Please share
your views.

Thanks
Ashutosh

On Fri, Apr 10, 2015 at 8:23 PM, Moty Michaely <moty@xplenty.com> wrote:

> Hey,
>
> Xplenty's CTO wrote a good comparison of the two:
>
> https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social
>
> Hope this helps with deciding.
>
> Good luck!
>
> On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <shahab.yunus@gmail.com>
> wrote:
>
>> Thanks for this. Slides #77 and #87 are pretty good. Quite a bit of it
>> is new stuff and still emerging.
>>
>> Regards,
>> Shahab
>>
>> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mohajeri@gmail.com>
>> wrote:
>>
>>> There actually is such a discussion, e.g.:
>>>
>>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>>
>>> You can have a standalone Spark cluster with no dependency on Hadoop.
>>>
>>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <shahab.yunus@gmail.com>
>>> wrote:
>>>
>>>> I hope I am not misunderstanding your question, but I don't think
>>>> there is a comparison between Spark and Hadoop. They are different
>>>> things.
>>>>
>>>> Hadoop is a platform on which you can run YARN, HBase and even Spark.
>>>> E.g. Cloudera's Hadoop distribution has Spark, HBase, Impala, Pig etc.
>>>> as part of its installation. Spark can run within a Hadoop cluster
>>>> deployment.
>>>>
>>>> I think a more apt comparison would be something like whether you
>>>> should use regular MapReduce on YARN on Hadoop OR Spark on Hadoop.
>>>>
>>>> Or even more direct would be Spark vs. Storm, which has been
>>>> discussed here:
>>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar
>>>> <ashutosh.k78@gmail.com> wrote:
>>>>
>>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>>> greenfield project? I tried to find out, and it looks like Spark can
>>>>> do everything that Hadoop can do. I'd appreciate your thoughts on it.
>>>>>
>>>>> Thanks
>
> --
> Moty Michaely
> VP R&D, Xplenty