hadoop-hdfs-user mailing list archives

From "Adaryl \"Bob\" Wakefield, MBA" <adaryl.wakefi...@hotmail.com>
Subject Re: Spark vs Tez
Date Fri, 24 Oct 2014 19:46:08 GMT
My comment was in response to the suggestion to use PySpark. Perhaps I misunderstand what PySpark
is. It was my understanding that it let you work with Spark in Python. Is that not correct?
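
For reference, that understanding matches how PySpark is generally described: it is Spark's Python API. Below is a minimal sketch of a word count written only in Python against the RDD API. It assumes `pyspark` is installed locally. The function names and the `local[1]` master are illustrative choices, not anything from this thread. The plain-Python version is included only as a reference result.

```python
from collections import Counter


def split_words(line):
    """Plain Python function; in the Spark version it is passed to flatMap."""
    return line.lower().split()


def word_count_local(lines):
    """Reference word count computed without Spark, for comparison."""
    return dict(Counter(word for line in lines for word in split_words(line)))


def word_count_spark(lines):
    """Same computation through PySpark's RDD API (assumes pyspark is installed)."""
    # Imported lazily so the helpers above can be used without Spark.
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "wordcount-sketch")  # hypothetical app name
    try:
        pairs = (
            sc.parallelize(lines)
            .flatMap(split_words)              # one record per word
            .map(lambda word: (word, 1))       # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b)   # sum the counts per word
            .collect()
        )
        return dict(pairs)
    finally:
        sc.stop()
```

With Spark available, `word_count_spark(["Spark vs Tez", "spark"])` should return the same counts as `word_count_local` on the same input. The job itself is written without any Scala.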


From: Edward Capriolo 
Sent: Tuesday, October 21, 2014 11:06 AM
To: user@hadoop.apache.org 
Subject: Re: Spark vs Tez

Scala is not an interpreted language. From my non-authoritative view, it seems to have 2-3
(thousand) more compile phases than Java, and as a result some of the things you are doing
that look like they are "interpreted" are actually macros that get converted into "usually"
efficient Java code.

About Scala in general, I have a few complaints. The interop is kind of clunky. I have to work
in Scala and run into stuff like this: the JSON mapper in Scala works, that is, until one property
in my Scala object is actually a Java object, and then it does not. Or I should be able to call
a method in Java from Scala but cannot figure out how to turn a Comparator into a Comparator[_:

The immutability aspect I find to be a real PITA. It becomes really hard to write code the
way you want to, and then if you do not use an immutable collection or some other fancy Scala
construct, people get on your case that you're not writing idiomatic Scala (even though few agree
on what that really is).

Generally people have a large capacity to assume, "I'm smart, I know Java, and I learned Lisp
in school, so this Scala stuff is going to be a breeze." Don't make that assumption. You will
not be proficient in writing Scala for months. You likely won't be able to hire anyone who
has done much production Scala. And everyone will come up to you and say, "So I'm trying to
(sort list|cast objects|simple thing) in Scala. I can do it in Java but YOU'RE THE EXPERT and
I'm wondering how to do it in Scala."

On Tue, Oct 21, 2014 at 10:04 AM, Tim Randles <trandles@lanl.gov> wrote:

  Yeah, compared to something as performant as java...

  On 10/20/2014 10:16 PM, Adaryl "Bob" Wakefield, MBA wrote:

    Using an interpreted scripting language with something that is billing
    itself as being fast doesn’t sound like the best idea...
    *From:* Russell Jurney <russell.jurney@gmail.com>
    *Sent:* Saturday, October 18, 2014 7:38 AM
    *To:* user@hadoop.apache.org
    *Subject:* Re: Spark vs Tez
    Check out PySpark. No Scala required.

    On Friday, October 17, 2014, Adaryl "Bob" Wakefield, MBA
    <adaryl.wakefield@hotmail.com> wrote:

        “The only problem with Spark adoption is the steep learning curve of
        Scala, and understanding the API properly.”
        This is why I’m looking for reasons to avoid Spark. In my mind, it’s
        one more thing to have to master and doesn’t really have anything to
        offer that can’t be done with other tools that are already inside my
        skillset. I spoke with some software engineers recently and
        basically the discussion boiled down to this: if you need to master
        Java or Scala, go with Java. Three months into Java, I don’t want to
        stop that and start learning Scala.
        *From:* kartik saxena
        *Sent:* Friday, October 17, 2014 1:12 PM
        *To:* user@hadoop.apache.org
        *Subject:* Re: Spark vs Tez
        I did a performance benchmark during my summer internship. I am
        currently a grad student. I can't reveal much about the specific
        project, but Spark is still faster than around the 4th-5th iteration
        of Tez on the same query/dataset. By iteration I mean utilizing the
        "hot-container" property of Apache Tez. See the latest release of Tez
        and some Hortonworks tutorials on their website.
        The only problem with Spark adoption is the steep learning curve of
        Scala, and understanding the API properly.
        On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA
        <adaryl.wakefield@hotmail.com> wrote:

            Does anybody have any performance figures on how Spark stacks up
            against Tez? If you don’t have figures, does anybody have an
            opinion? Spark seems so popular but I’m not really seeing why.

    Russell Jurney twitter.com/rjurney
    <mailto:russell.jurney@gmail.com> datasyndrome.com
