Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of apivovarov@gmail.com
 designates 209.85.217.177 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <COL126-DS58DDF550D4BB1A70608A598A80@phx.gbl>
References: 
 <CACeqxwQ7sMmo92RG=SFuuOhCHn2x1Rf=WvE1_9Hs38q-EBX+Kw@mail.gmail.com>
 <CALr1C9oYpmFZUnc1LK-aEtJxPGMBvqQZA7kxFbMk9+0egK=9qw@mail.gmail.com>
 <CACeqxwTcFqN6PJtz8kB9NHJts9L3a=HH-JDwjS+TL1cAWFRO3Q@mail.gmail.com>
 <COL126-DS146CAD630D36FF9B04F93A98A80@phx.gbl>
 <CAEo-6+TQA3=Yd1rw5gZJ1xi7YbtJY4Xkt_bM3Rs7zjKKZPOnsw@mail.gmail.com>
 <COL126-DS58DDF550D4BB1A70608A598A80@phx.gbl>
From: Alexander Pivovarov <apivovarov@gmail.com>
Date: Fri, 17 Oct 2014 11:25:11 -0700
Message-ID: 
 <CAKKt98R9gKNdF5bcRZkxXpQH1xvCt-HBNL9xVyHu_iDO9ZXiNg@mail.gmail.com>
Subject: Re: Spark vs Tez
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a11c344f67e323f0505a27d01

--001a11c344f67e323f0505a27d01
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

There is going to be a Spark engine for Hive (in addition to MR and Tez).

The Spark API is available for Java and Python as well as Scala.

The Tez engine is available now and it's quite stable. As for speed, on
complex queries it shows a 10x-20x improvement over the MR engine. For
example, one of my queries runs for 30 min on MR (about 100 MR jobs); if I
switch to Tez, it finishes in 100 sec.

I'm using HDP-2.1.5 (hive-0.13.1, tez 0.4.1)
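
As a quick sanity check on the numbers above, here is a minimal sketch that
converts the two runtimes to seconds and computes the ratio. The helper name
`speedup` is just illustrative, and the figures are the approximate ones
quoted for my single query, not a benchmark.

```python
# Approximate runtimes quoted in the message for one query.
mr_seconds = 30 * 60   # about 30 min on the MR engine
tez_seconds = 100      # about 100 sec on the Tez engine


def speedup(baseline_seconds, new_seconds):
    """Return how many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


factor = speedup(mr_seconds, tez_seconds)
print(f"Tez speedup over MR: {factor:.0f}x")
```

That works out to roughly 18x, which sits inside the 10x-20x range mentioned
for complex queries.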

On Fri, Oct 17, 2014 at 11:23 AM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

>   It was my understanding that Spark is faster batch processing. Tez is
> the new execution engine that replaces MapReduce and is also supposed to
> speed up batch processing. Is that not correct?
> B.
>
>
>
>  *From:* Shahab Yunus <shahab.yunus@gmail.com>
> *Sent:* Friday, October 17, 2014 1:12 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Spark vs Tez
>
>  What aspects of Tez and Spark are you comparing? They have different
> purposes and thus not directly comparable, as far as I understand.
>
> Regards,
> Shahab
>
> On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>>   Does anybody have any performance figures on how Spark stacks up
>> against Tez? If you don’t have figures, does anybody have an opinion?
>> Spark seems so popular but I’m not really seeing why.
>> B.
>>
>
>

--001a11c344f67e323f0505a27d01--