hadoop-hdfs-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Spark vs Tez
Date Tue, 21 Oct 2014 16:06:29 GMT
Scala is not an interpreted language. From my non-authoritative view it
seems to have 2-3 (thousand) more compile phases than Java, and as a result
some of the things you are doing that look like they are "interpreted" are
actually macros that get compiled into "usually" efficient JVM bytecode.

About Scala in general, I have a few complaints. The interop is kinda
clunky. I have to work in Scala and run into stuff like this: the JSON
mapper in Scala works! That is, until one property in my Scala object is
actually a Java object, and then it does not. Or I should be able to call a
method in Java from Scala, but cannot figure out how to turn a Comparator
into a Comparator[_ <: Any].
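
A minimal sketch of the Comparator case, assuming the Java method in
question accepts a wildcard comparator (the actual method is not shown in
this message, so `ComparatorInterop` and its members are hypothetical
names). It relies on scala.math.Ordering extending java.util.Comparator,
and uses Scala 2 wildcard syntax (Scala 3 prefers `?` over `_`):

```scala
import java.util.{ArrayList, Collections, Comparator}

// Hypothetical names throughout; this is an illustration, not the code
// from the project described above.
object ComparatorInterop {
  // scala.math.Ordering[T] extends java.util.Comparator[T], so an Ordering
  // can be handed to any Java API that expects a Comparator.
  val byLength: Comparator[String] = Ordering.by[String, Int](_.length)

  // A Java wildcard such as Comparator<?> shows up in Scala as an
  // existential type; a concrete comparator widens to it by ascription.
  val widened: Comparator[_ <: Any] = byLength

  // Call java.util.Collections.sort from Scala with that comparator.
  def sortWithJava(words: Seq[String]): java.util.List[String] = {
    val list = new ArrayList[String]()
    words.foreach(w => list.add(w))
    Collections.sort(list, byLength)
    list
  }
}
```

With this sketch, ComparatorInterop.sortWithJava(Seq("ccc", "a", "bb"))
should hand back a Java list ordered by length: a, bb, ccc.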

The immutability aspect I find to be a real PITA. It becomes really hard to
write code the way you want to, and then if you do not use an immutable
collection or some other fancy Scala construct, people get on your case that
you're not writing idiomatic Scala (even though few agree on what that really
means).

Generally people have a large capacity to assume, "I'm smart, I know Java,
and I learned Lisp in school, so this Scala stuff is going to be a breeze."
Don't make that assumption. You will not be proficient in writing Scala for
months. You likely won't be able to hire anyone that has done much
production Scala. And everyone will come up to you and say, "So I'm trying to
(sort list|cast objects|simple thing) in Scala. I can do it in Java but
YOU'RE THE EXPERT and wondering how to do it in Scala".

On Tue, Oct 21, 2014 at 10:04 AM, Tim Randles <trandles@lanl.gov> wrote:

> Yeah, compared to something as performant as java...
> </sarcasm>
> On 10/20/2014 10:16 PM, Adaryl "Bob" Wakefield, MBA wrote:
>> Using an interpreted scripting language with something that is billing
>> itself as being fast doesn’t sound like the best idea...
>> B.
>> *From:* Russell Jurney <russell.jurney@gmail.com>
>> *Sent:* Saturday, October 18, 2014 7:38 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Spark vs Tez
>> Check out PySpark. No Scala required.
>> On Friday, October 17, 2014, Adaryl "Bob" Wakefield, MBA
>> <adaryl.wakefield@hotmail.com> wrote:
>>     “The only problem with Spark adoption is the steep learning curve of
>>     Scala, and understanding the API properly.”
>>     This is why I’m looking for reasons to avoid Spark. In my mind, it’s
>>     one more thing to have to master and doesn’t really have anything to
>>     offer that can’t be done with other tools that are already inside my
>>     skillset. I spoke with some software engineers recently and
>>     basically the discussion boiled down to if you need to master Java
>>     or Scala go with Java. Three months into Java I don’t want to stop
>>     that and start learning Scala.
>>     B.
>>     *From:* kartik saxena <kartik.sxn@gmail.com>
>>     *Sent:* Friday, October 17, 2014 1:12 PM
>>     *To:* user@hadoop.apache.org
>>     *Subject:* Re: Spark vs Tez
>>     I did a performance benchmark during my summer internship. I am
>>     currently a grad student. Can't reveal much about the specific
>>     project, but Spark is still faster than around the 4th-5th iteration
>>     of Tez on the same query/dataset. By iteration I mean utilizing the
>>     "hot-container" property of Apache Tez. See the latest release of Tez
>>     and some Hortonworks tutorials on their website.
>>     The only problem with Spark adoption is the steep learning curve of
>>     Scala, and understanding the API properly.
>>     Thanks
>>     On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA
>>     <adaryl.wakefield@hotmail.com> wrote:
>>         Does anybody have any performance figures on how Spark stacks up
>>         against Tez? If you don’t have figures, does anybody have an
>>         opinion? Spark seems so popular but I’m not really seeing why.
>>         B.
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> datasyndrome.com
