From: Brian O'Neill
Reply-To: user@hadoop.apache.org
Date: Tue, 21 Oct 2014 12:34:40 -0400
Subject: Re: Spark vs Tez

@edwardcapriolo, funny running into you over here in the hadoop community. =)

FWIW, I have the same perspective and had the same experience with Scala and Spark.
(I had LISP/Scheme in college too. =)

Additionally, with the JDK8 enhancements (lambda expressions, method references, etc.), there is less motivation to move to Scala.

Specifically, with Spark, take a look at this:
http://blog.cloudera.com/blog/2014/04/making-apache-spark-easier-to-use-in-java-with-java-8/
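
For a rough idea of what those JDK 8 features look like, here is a minimal plain-Java sketch. It is not taken from the linked post and has no Spark dependency; the class name Java8LambdaSketch and the helper longWordsUpper are made up for illustration. It only shows a lambda expression and a method reference on the standard java.util.stream API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8LambdaSketch {

    // Hypothetical helper: keep words of at least minLength characters,
    // upper-case them, and return them in natural (alphabetical) order.
    static List<String> longWordsUpper(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength) // lambda expression
                .map(String::toUpperCase)             // method reference
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "tez", "hadoop", "pig");
        System.out.println(longWordsUpper(words, 5)); // prints [HADOOP, SPARK]
    }
}
```

Before JDK 8 the filter and map steps would each have needed an anonymous inner class, which is most of the verbosity people used to cite as a reason to reach for Scala.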

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science
The Science of Better Results
2700 Horizon Drive, King of Prussia, PA 19406
M: 215.588.6024, @boneill42
healthmarketscience.com

 


From: Edward Capriolo <edlinuxguru@gmail.com>
Reply-To: <user@hadoop.apache.org>
Date: Tuesday, October 21, 2014 at 12:06 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Spark vs Tez

Scala is not an interpreted language. From my non-authoritative view, it seems to have 2-3 (thousand) more compile phases than Java, and as a result some of the things you are doing that look like they are "interpreted" are actually macros that get converted into "usually" efficient Java code.

About Scala in general: I have a few complaints. The interop is kinda clunky. I have to work in Scala and run into stuff. For example, the JSON mapper in Scala works! That is, until one property in my Scala object is actually a Java object; then it does not. Or I should be able to call a method in Java from Scala, but cannot figure out how to turn a Comparator into a Comparator[_ <: Any].

The immutability aspect I find to be a real PITA. It becomes really hard to write code the way you want to, and then if you do not use an immutable collection or some other fancy Scala construct, people get on your case that you're not writing idiomatic Scala (even though few agree on what that really is).

Generally people have a large capacity to assume, "I'm smart, I know Java, and I learned Lisp in school, so this Scala stuff is going to be a breeze." Don't make that assumption. You will not be proficient in writing Scala for months. You likely won't be able to hire anyone that has done much production Scala. And everyone will come up to you and say, "So I'm trying to (sort list|cast objects|simple thing) in Scala. I can do it in Java, but YOU'RE THE EXPERT, and wondering how to do it in Scala."






On Tue, Oct 21, 2014 at 10:04 AM, Tim Randles <trandles@lanl.gov> wrote:
Yeah, compared to something as performant as java...
</sarcasm>

On 10/20/2014 10:16 PM, Adaryl "Bob" Wakefield, MBA wrote:
Using an interpreted scripting language with something that is billing
itself as being fast doesn’t sound like the best idea...
B.
*From:* Russell Jurney <mailto:russell.jurney@gmail.com>
*Sent:* Saturday, October 18, 2014 7:38 AM
*To:* user@hadoop.apache.org <mailto:user@hadoop.apache.org>
*Subject:* Re: Spark vs Tez
Check out PySpark. No Scala required.

On Friday, October 17, 2014, Adaryl "Bob" Wakefield, MBA
<adaryl.wakefield@hotmail.com <mailto:adaryl.wakefield@hotmail.com>> wrote:
    “The only problem with Spark adoption is the steep learning curve of
    Scala, and understanding the API properly.”
    This is why I’m looking for reasons to avoid Spark. In my mind, it’s
    one more thing to have to master and doesn’t really have anything to
    offer that can’t be done with other tools that are already inside my
    skillset. I spoke with some software engineers recently and
    basically the discussion boiled down to if you need to master Java
    or Scala go with Java. Three months into Java I don’t want to stop
    that and start learning Scala.
    B.
    *From:* kartik saxena <kartik.sxn@gmail.com>
    *Sent:* Friday, October 17, 2014 1:12 PM
    *To:* user@hadoop.apache.org
    *Subject:* Re: Spark vs Tez
    I did a performance benchmark during my summer internship. I am
    currently a grad student. Can't reveal much about the specific
    project, but Spark is still faster than around the 4th-5th iteration of Tez
    on the same query/dataset. By iteration I mean utilizing the
    "hot-container" property of Apache Tez. See the latest release of Tez
    and some Hortonworks tutorials on their website.
    The only problem with Spark adoption is the steep learning curve of
    Scala, and understanding the API properly.
    Thanks
    On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA
    <adaryl.wakefield@hotmail.com> wrote:

        Does anybody have any performance figures on how Spark stacks up
        against Tez? If you don’t have figures, does anybody have an
        opinion? Spark seems so popular but I’m not really seeing why.
        B.



--
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
