Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of ashutosh.k78@gmail.com
 designates 209.85.220.174 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAGA5JhzxDN=F9G3oxjP=UqkvLj+Zw=a8kMYW7OFpe=BRfS1Uqw@mail.gmail.com>
References: 
 <CAFZGXH2ZM-QjKmoDGPe5e_pm_xGPdbHEg_iQquTBLOS4FZ+Xeg@mail.gmail.com>
	<CAEo-6+T8hALVoTQygdLa0Ly904pagbOHqHSVe1rLJjsvsvwKPw@mail.gmail.com>
	<CAO6JcpiWfrCydbmQxrW=moBAMK8x-fumr=Ja=u6-Na8xOKi=yQ@mail.gmail.com>
	<CAEo-6+Q+82_OKFeVZh7shVL05PXQ1=zYD2xrTzKFP2irmO9iiQ@mail.gmail.com>
	<CAGA5JhzxDN=F9G3oxjP=UqkvLj+Zw=a8kMYW7OFpe=BRfS1Uqw@mail.gmail.com>
Date: Sun, 12 Apr 2015 12:55:16 +0530
Message-ID: 
 <CAFZGXH1Lvzyfuz+bEbyu3hywq3e_p9bXM5R+tGbbiBcD2CNvCg@mail.gmail.com>
Subject: Re: Hadoop or spark
From: Ashutosh Kumar <ashutosh.k78@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a113d105825e383051381e664

--001a113d105825e383051381e664
Content-Type: text/plain; charset=UTF-8

Thanks. I read this article, and it seems that for all practical purposes
Spark is preferred over Hadoop MapReduce. Only when processing very large
files does Hadoop MapReduce score over Spark. But how large is "very large"?
Is it TBs or PBs, or does it vary with cluster size? Please share your views.

Thanks
Ashutosh


On Fri, Apr 10, 2015 at 8:23 PM, Moty Michaely <moty@xplenty.com> wrote:

> Hey,
>
> Xplenty's CTO wrote a good piece of comparison between the two:
>
> https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social
>
> Hope this helps with deciding.
>
> Good luck!
>
> On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <shahab.yunus@gmail.com>
> wrote:
>
>> Thanks for this. Slides #77 and #87 are pretty good. Quite a lot of it is
>> new stuff and still emerging.
>>
>> Regards,
>> Shahab
>>
>> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mohajeri@gmail.com>
>> wrote:
>>
>>> There actually is such a discussion, e.g.:
>>>
>>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>>
>>> You can have a standalone Spark cluster with no dependency on Hadoop.
>>>
>>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <shahab.yunus@gmail.com>
>>> wrote:
>>>
>>>> I hope I am not misunderstanding your question but I don't think there
>>>> is a comparison between Spark and Hadoop. They are different things.
>>>>
>>>> Hadoop is a platform on which you can run YARN, HBase and even Spark.
>>>> E.g. Cloudera's Hadoop distribution has Spark, HBase, Impala, Pig, etc. as
>>>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>>>
>>>> I think a more apt comparison would be something like whether you
>>>> should use regular MapReduce on YARN on Hadoop OR Spark on Hadoop.
>>>>
>>>> Or even more direct would be Spark vs. Storm, which has been discussed
>>>> here.
>>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <ashutosh.k78@gmail.com
>>>> > wrote:
>>>>
>>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>>> greenfield project? I tried to find out, and it looks like Spark can do
>>>>> everything that Hadoop can do. I would appreciate your thoughts on it.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
>
> Moty Michaely
>
> VP R&D, Xplenty
>
>
>


--001a113d105825e383051381e664--