hadoop-user mailing list archives

From Brian O'Neill <b...@alumni.brown.edu>
Subject Re: Spark vs Tez
Date Tue, 21 Oct 2014 16:34:40 GMT
@edwardcapriolo, funny running into you over here in the hadoop community.

FWIW, I have the same perspective and had the same experience with Scala and
(I had LISP/Scheme in College too. =)

Additionally, with the JDK8 enhancements (lambda expressions, method
references, etc.), there is less motivation to move to Scala.
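
As a rough illustration of the JDK 8 features mentioned above, here is a minimal sketch in plain Java with no Spark dependency. The class and method names (LambdaSortDemo, sortByLength, upperCase) are hypothetical and do not come from this thread; the sketch only shows a lambda expression and a method reference standing in for anonymous inner classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
final class LambdaSortDemo {

    // Returns a copy of the input sorted by string length.
    // The lambda replaces an anonymous Comparator<String> class.
    static List<String> sortByLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    // Returns a new list with every element upper-cased.
    // The method reference replaces an anonymous Function<String, String> class.
    static List<String> upperCase(List<String> words) {
        return words.stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("spark", "tez", "hadoop");
        System.out.println(sortByLength(names));
        System.out.println(upperCase(names));
    }
}
```

Before JDK 8, each of these calls would have needed an anonymous inner class, which is much of the verbosity that tends to push Java developers toward Scala.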

Specifically, with Spark, take a look at this:


Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>

From:  Edward Capriolo <edlinuxguru@gmail.com>
Reply-To:  <user@hadoop.apache.org>
Date:  Tuesday, October 21, 2014 at 12:06 PM
To:  "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject:  Re: Spark vs Tez

Scala is not an interpreted language. From my non-authoritative view, it
seems to have 2-3 (thousand) more compile phases than Java, and as a result
some of the things you are doing that look like they are "interpreted" are
actually macros that get converted into "usually" efficient Java code.

About Scala in general, I have a few complaints. The interop is kinda
clunky. I have to work in Scala and I run into stuff. For example, the JSON
mapper in Scala works! That is, until one property in my Scala object is
actually a Java object; then it does not. Or I should be able to call a Java
method from Scala, but I cannot figure out how to turn a Comparator into a
Comparator[_ <: Any].

The immutability aspect I find to be a real PITA. It becomes really hard to
write code the way you want to, and then if you do not use an immutable
collection or some other fancy Scala construct, people get on your case that
you're not writing idiomatic Scala (even though few agree on what that really
is).

Generally people have a large capacity to assume, "I'm smart, I know Java,
and I learned Lisp in school, so this Scala stuff is going to be a breeze."
Don't make that assumption. You will not be proficient in writing Scala for
months. You likely won't be able to hire anyone who has done much production
Scala. And everyone will come up to you and say, "So I'm trying to (sort
list|cast objects|simple thing) in Scala. I can do it in Java but YOU'RE THE
EXPERT and I'm wondering how to do it in Scala."

On Tue, Oct 21, 2014 at 10:04 AM, Tim Randles <trandles@lanl.gov> wrote:
> Yeah, compared to something as performant as java...
> </sarcasm>
> On 10/20/2014 10:16 PM, Adaryl "Bob" Wakefield, MBA wrote:
>> Using an interpreted scripting language with something that is billing
>> itself as being fast doesn't sound like the best idea...
>> B.
>> *From:* Russell Jurney <russell.jurney@gmail.com>
>> *Sent:* Saturday, October 18, 2014 7:38 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Spark vs Tez
>> Check out PySpark. No Scala required.
>> On Friday, October 17, 2014, Adaryl "Bob" Wakefield, MBA
>> <adaryl.wakefield@hotmail.com> wrote:
>>     "The only problem with Spark adoption is the steep learning curve of
>>     Scala, and understanding the API properly."
>>     This is why I'm looking for reasons to avoid Spark. In my mind, it's
>>     one more thing to have to master and doesn't really have anything to
>>     offer that can't be done with other tools that are already inside my
>>     skillset. I spoke with some software engineers recently and
>>     basically the discussion boiled down to if you need to master Java
>>     or Scala go with Java. Three months into Java I don't want to stop
>>     that and start learning Scala.
>>     B.
>>     *From:* kartik saxena <kartik.sxn@gmail.com>
>>     *Sent:* Friday, October 17, 2014 1:12 PM
>>     *To:* user@hadoop.apache.org
>>     *Subject:* Re: Spark vs Tez
>>     I did a performance benchmark during my summer internship. I am
>>     currently a grad student. I can't reveal much about the specific
>>     project, but Spark is still faster than around the 4th-5th iteration
>>     of Tez on the same query/dataset. By iteration I mean utilizing the
>>     "hot-container" property of Apache Tez. See the latest release of Tez
>>     and some Hortonworks tutorials on their website.
>>     The only problem with Spark adoption is the steep learning curve of
>>     Scala, and understanding the API properly.
>>     Thanks
>>     On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA
>>     <adaryl.wakefield@hotmail.com> wrote:
>>         Does anybody have any performance figures on how Spark stacks up
>>         against Tez? If you don't have figures, does anybody have an
>>         opinion? Spark seems so popular but I'm not really seeing why.
>>         B.
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> datasyndrome.com