hadoop-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Spark vs Tez
Date Sun, 19 Oct 2014 11:56:37 GMT
Very interesting!
What makes Tez more scalable than Spark?
What architectural "thing" makes the difference?

Niels Basjes
On Oct 19, 2014 3:07 AM, "Jeff Zhang" <zjffdu@gmail.com> wrote:

> Tez has a feature called pre-warm, which launches JVMs before you use them,
> and you can reuse the containers afterwards. So it is also suitable for
> interactive queries and is more stable and scalable than Spark IMO.
>
> On Sat, Oct 18, 2014 at 4:22 PM, Niels Basjes <Niels@basjes.nl> wrote:
>
>> It is my understanding that one of the big differences between Tez and
>> Spark is that a Tez-based query still has the startup overhead of
>> starting JVMs on the YARN cluster. Spark-based queries are immediately
>> executed on "already running JVMs".
>>
>> So for interactive dashboards Spark seems more suitable.
>>
>> Did I understand correctly?
>>
>> Niels Basjes
>> On Oct 17, 2014 8:30 PM, "Gavin Yue" <yue.yuanyuan@gmail.com> wrote:
>>
>>> Spark and Tez both make MR faster; there is no doubt about that.
>>>
>>> They also provide new features like DAGs, which are quite important for
>>> interactive query processing. From this perspective, you could view them
>>> as wrappers around MR that try to handle the intermediate buffers (files)
>>> more efficiently, which is a big pain in MR.
>>>
>>> Also, they both try to use memory as the buffer instead of only the
>>> filesystem. Spark has the concept of the RDD, which is quite interesting
>>> but also limited.
>>>
>>>
>>>
>>> On Fri, Oct 17, 2014 at 11:23 AM, Adaryl "Bob" Wakefield, MBA <
>>> adaryl.wakefield@hotmail.com> wrote:
>>>
>>>>   It was my understanding that Spark is faster batch processing. Tez
>>>> is the new execution engine that replaces MapReduce and is also supposed
>>>> to speed up batch processing. Is that not correct?
>>>> B.
>>>>
>>>>
>>>>
>>>>  *From:* Shahab Yunus <shahab.yunus@gmail.com>
>>>> *Sent:* Friday, October 17, 2014 1:12 PM
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* Re: Spark vs Tez
>>>>
>>>>  What aspects of Tez and Spark are you comparing? They have different
>>>> purposes and are thus not directly comparable, as far as I understand.
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>> On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA <
>>>> adaryl.wakefield@hotmail.com> wrote:
>>>>
>>>>>   Does anybody have any performance figures on how Spark stacks up
>>>>> against Tez? If you don’t have figures, does anybody have an opinion?
>>>>> Spark seems so popular but I’m not really seeing why.
>>>>> B.
>>>>>
>>>>
>>>>
>>>
>>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
