hadoop-mapreduce-user mailing list archives

From Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com>
Subject Re: Spark vs Tez
Date Sun, 19 Oct 2014 13:56:04 GMT
Is Tez's architecture similar to Akka's distributed architecture? I think
I remember Jonas Bonér mentioning, during a presentation on distributed
computing, Akka's support for protocols like Raft. What makes Tez
more scalable in this regard?

Thanks,
Mohan

On Sun, Oct 19, 2014 at 5:26 PM, Niels Basjes <Niels@basjes.nl> wrote:

> Very interesting!
> What makes Tez more scalable than Spark?
> What architectural "thing" makes the difference?
>
> Niels Basjes
> On Oct 19, 2014 3:07 AM, "Jeff Zhang" <zjffdu@gmail.com> wrote:
>
>> Tez has a feature called pre-warm, which launches JVMs before you use
>> them, and you can reuse the containers afterwards. So it is also suitable
>> for interactive queries, and is more stable and scalable than Spark IMO.
>>
>> On Sat, Oct 18, 2014 at 4:22 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>
>>> It is my understanding that one of the big differences between Tez and
>>> Spark is that a Tez-based query still has the startup overhead of
>>> starting JVMs on the YARN cluster. Spark-based queries are immediately
>>> executed on "already running JVMs".
>>>
>>> So for interactive dashboards Spark seems more suitable.
>>>
>>> Did I understand correctly?
>>>
>>> Niels Basjes
>>> On Oct 17, 2014 8:30 PM, "Gavin Yue" <yue.yuanyuan@gmail.com> wrote:
>>>
>>>> Spark and Tez both make MR faster; there is no doubt about that.
>>>>
>>>> They also provide new features like DAGs, which are quite important for
>>>> interactive query processing. From this perspective, you could view them
>>>> as wrappers around MR that try to handle the intermediate buffers (files)
>>>> more efficiently, which is a big pain point in MR.
>>>>
>>>> Also, they both try to use memory as the buffer instead of only the
>>>> filesystem. Spark has a concept called RDD, which is quite interesting
>>>> but also limited.
>>>>
>>>>
>>>>
>>>> On Fri, Oct 17, 2014 at 11:23 AM, Adaryl "Bob" Wakefield, MBA <
>>>> adaryl.wakefield@hotmail.com> wrote:
>>>>
>>>>>   It was my understanding that Spark offers faster batch processing.
>>>>> Tez is the new execution engine that replaces MapReduce and is also
>>>>> supposed to speed up batch processing. Is that not correct?
>>>>> B.
>>>>>
>>>>>
>>>>>
>>>>>  *From:* Shahab Yunus <shahab.yunus@gmail.com>
>>>>> *Sent:* Friday, October 17, 2014 1:12 PM
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* Re: Spark vs Tez
>>>>>
>>>>>  What aspects of Tez and Spark are you comparing? They have different
>>>>> purposes and thus are not directly comparable, as far as I understand.
>>>>>
>>>>> Regards,
>>>>> Shahab
>>>>>
>>>>> On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA <
>>>>> adaryl.wakefield@hotmail.com> wrote:
>>>>>
>>>>>>   Does anybody have any performance figures on how Spark stacks up
>>>>>> against Tez? If you don’t have figures, does anybody have an opinion?
>>>>>> Spark seems so popular but I’m not really seeing why.
>>>>>> B.
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
