hadoop-hdfs-user mailing list archives

From: Alexander Pivovarov <apivova...@gmail.com>
Subject: Re: Spark vs Tez
Date: Fri, 17 Oct 2014 18:25:11 GMT
There is going to be a Spark engine for Hive (in addition to the MR and Tez engines).

The Spark API is available for Java and Python as well.

The Tez engine is available now, and it's quite stable. As for speed: for
complex queries it shows a 10x-20x improvement compared to the MR engine.
For example, one of my queries runs for 30 min using MR (about 100 MR jobs);
if I switch to Tez, it is done in 100 sec.
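The single-query timing above lines up with the quoted 10x-20x range: 30 min is 1800 sec, and 1800 / 100 = 18. A quick Python check of that arithmetic, using only the numbers reported in this message (the variable names are illustrative):

```python
# Back-of-the-envelope check of the timings reported for one query.
mr_seconds = 30 * 60   # ~30 min on the MapReduce engine (about 100 MR jobs)
tez_seconds = 100      # ~100 sec after switching the same query to Tez

speedup = mr_seconds / tez_seconds
print(f"Tez speedup over MR: {speedup:.0f}x")

# The improvement reported for complex queries was roughly 10x-20x.
assert 10 <= speedup <= 20
```

This is one query on one cluster, so it illustrates the claim rather than benchmarking it.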

I'm using HDP-2.1.5 (hive-0.13.1, tez 0.4.1)

On Fri, Oct 17, 2014 at 11:23 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:

>   It was my understanding that Spark is for faster batch processing. Tez is
> the new execution engine that replaces MapReduce and is also supposed to
> speed up batch processing. Is that not correct?
> B.
>
>
>
>  *From:* Shahab Yunus <shahab.yunus@gmail.com>
> *Sent:* Friday, October 17, 2014 1:12 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Spark vs Tez
>
>  What aspects of Tez and Spark are you comparing? They have different
> purposes and thus are not directly comparable, as far as I understand.
>
> Regards,
> Shahab
>
> On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:
>
>>   Does anybody have any performance figures on how Spark stacks up
>> against Tez? If you don’t have figures, does anybody have an opinion? Spark
>> seems so popular but I’m not really seeing why.
>> B.
>>
>
>
