From: Alexander Pivovarov
Date: Fri, 17 Oct 2014 11:25:11 -0700
Subject: Re: Spark vs Tez
To: user@hadoop.apache.org

There is going to be a Spark engine for Hive (in addition to MR and Tez).

The Spark API is available for Java and Python as well.

The Tez engine is available now and it's quite stable. As for speed: for complex queries it shows a 10x-20x improvement over the MR engine.
E.g. one of my queries runs for 30 min using MR (about 100 MR jobs); if I switch to Tez, it is done in 100 sec.

I'm using HDP-2.1.5 (hive-0.13.1, tez 0.4.1).

On Fri, Oct 17, 2014 at 11:23 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:

> It was my understanding that Spark is faster batch processing. Tez is
> the new execution engine that replaces MapReduce and is also supposed to
> speed up batch processing. Is that not correct?
> B.
>
> *From:* Shahab Yunus
> *Sent:* Friday, October 17, 2014 1:12 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Spark vs Tez
>
> What aspects of Tez and Spark are you comparing? They have different
> purposes and are thus not directly comparable, as far as I understand.
>
> Regards,
> Shahab
>
> On Fri, Oct 17, 2014 at 2:06 PM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:
>
>> Does anybody have any performance figures on how Spark stacks up
>> against Tez? If you don’t have figures, does anybody have an opinion? Spark
>> seems so popular but I’m not really seeing why.
>> B.
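As a quick sanity check on the figures in the reply at the top of this message, the arithmetic can be sketched as below. This is only an illustration: the helper name `speedup` and the constant names are made up here and are not part of Hive or Tez. The two runtimes are the ones quoted in the reply (30 min on MR, 100 sec on Tez).

```python
# Runtimes quoted in the reply; the names below are hypothetical.
MR_RUNTIME_SEC = 30 * 60   # "30 min using mr"
TEZ_RUNTIME_SEC = 100      # "done in 100 sec"


def speedup(baseline_sec: float, new_sec: float) -> float:
    """Return how many times faster the new runtime is than the baseline."""
    if new_sec <= 0:
        raise ValueError("new_sec must be positive")
    return baseline_sec / new_sec


factor = speedup(MR_RUNTIME_SEC, TEZ_RUNTIME_SEC)
print(f"Tez speedup over MR for this query: {factor:.0f}x")  # prints 18x
```

For this one query the ratio comes out to 18x, which sits inside the 10x-20x range mentioned for complex queries. It is a single data point from one cluster, not a benchmark.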