From: Koert Kuipers <koert@tresata.com>
To: user@hive.apache.org
Date: Tue, 2 Feb 2016 23:49:37 -0500
Subject: Re: Hive on Spark Engine versus Spark using Hive metastore

yeah, but have you ever seen someone write a real analytical program in
hive? how? where are the basic abstractions to wrap up a large number of
operations (joins, group-bys) into a single function call? where are the
tools to write nice unit tests for that?

for example, in spark i can write a DataFrame => DataFrame function that
internally does many joins, groupBys and complex operations, all unit
tested and perfectly re-usable. and in hive? copy-pasting sql queries
around? that's just dangerous.
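a minimal sketch of the kind of function i mean, in the Scala DataFrame
API (the frame and column names here are made up for illustration):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions._

    // one reusable transformation wrapping a join and an aggregation;
    // feed it small hand-built DataFrames in a unit test, or the real
    // Hive-backed tables in production
    def revenueBySegment(transactions: DataFrame, customers: DataFrame): DataFrame =
      transactions
        .join(customers, Seq("customer_id"))
        .groupBy("segment")
        .agg(sum("amount").as("total_revenue"))

because it is just a Scala function it composes with other
DataFrame => DataFrame functions, and a test can assert directly on the
rows it returns.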
On Tue, Feb 2, 2016 at 8:09 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> Hive has numerous extension points; you are not boxed in by a long shot.
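The classic example of such an extension point is a user-defined
function. A minimal sketch against Hive's classic UDF API, with a
hypothetical package, class and column logic:

    package com.example.udf

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // strips everything after '@' from an email address;
    // Hive resolves the evaluate method by reflection
    class StripDomain extends UDF {
      def evaluate(email: Text): Text =
        if (email == null) null
        else new Text(email.toString.takeWhile(_ != '@'))
    }

Compiled into a jar, it would be wired into a session with ADD JAR and
CREATE TEMPORARY FUNCTION strip_domain AS 'com.example.udf.StripDomain';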
> On Tuesday, February 2, 2016, Koert Kuipers <koert@tresata.com> wrote:
>
>> uuuhm, with spark using the Hive metastore you actually have a real
>> programming environment and you can write real functions, versus being
>> boxed into some version of sql and limited udfs?
>>
>> On Tue, Feb 2, 2016 at 6:46 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>>
>>> When comparing the performance, you need to compare apples to apples.
>>> In another thread, you mentioned that Hive on Spark is much slower
>>> than Spark SQL, but you had configured Hive such that only two tasks
>>> could run in parallel, and you didn't provide information on how much
>>> resource Spark SQL was utilizing. Thus, it's hard to tell whether it's
>>> just a configuration problem in your Hive setup or whether Spark SQL
>>> is indeed faster. You should be able to see the resource usage in the
>>> YARN resource manager URL.
>>>
>>> --Xuefu
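For reference, the parallelism in question is governed by the executor
settings that Hive on Spark passes through to Spark, set in the Hive
session along these lines (the values are illustrative only, not the
configuration used in the tests below):

    set hive.execution.engine=spark;
    set spark.executor.instances=8;
    set spark.executor.cores=4;
    set spark.executor.memory=4g;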
>>> On Tue, Feb 2, 2016 at 3:31 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>>>
>>>> Thanks Jeff.
>>>>
>>>> Obviously Hive is much more feature-rich than Spark. Having said
>>>> that, in certain areas, for example where the same SQL feature is
>>>> available in Spark, Spark seems to deliver faster.
>>>>
>>>> This may be because:
>>>>
>>>> 1. Spark does both the optimisation and the execution seamlessly
>>>> 2. Hive on Spark has to invoke YARN, which adds another layer to
>>>>    the process
>>>>
>>>> Now I did some simple tests on a 100-million-row ORC table available
>>>> through Hive to both.
>>>>
>>>> Spark 1.5.2 on Hive 1.2.1 metastore:
>>>>
>>>> spark-sql> select * from dummy where id in (1, 5, 100000);
>>>> 1       0   0    63   rMLTDXxxqXOZnqYRJwInlGfGBTxNkAszBGEUGELqTSRnFjRGbi  1       xxxxxxxxxx
>>>> 5       0   4    31   vDsFoYAOcitwrWNXCxPHzIIIxwKpTlrsVjFFKUDivytqJqOHGA  5       xxxxxxxxxx
>>>> 100000  99  999  188  abQyrlxKzPTJliMqDpsfDTJUQzdNdfofUQhrKqXvRKwulZAoJe  100000  xxxxxxxxxx
>>>> Time taken: 50.805 seconds, Fetched 3 row(s)
>>>>
>>>> The same query run twice more took 50.358 and 50.563 seconds.
>>>>
>>>> So three runs returning three rows, each in just over 50 seconds.
>>>>
>>>> Hive 1.2.1 on Spark 1.3.1 execution engine:
>>>>
>>>> 0: jdbc:hive2://rhes564:10010/default> select * from dummy where id in (1, 5, 100000);
>>>> INFO  : Query Hive on Spark job[4] stages:
>>>> INFO  : 4
>>>> INFO  : Status: Running (Hive on Spark job[4])
>>>> INFO  : Status: Finished successfully in 82.49 seconds
>>>> +-----------+------------------+------------------+-------------------+-----------------------------------------------------+-----------------+----------------+--+
>>>> | dummy.id  | dummy.clustered  | dummy.scattered  | dummy.randomised  | dummy.random_string                                 | dummy.small_vc  | dummy.padding  |
>>>> +-----------+------------------+------------------+-------------------+-----------------------------------------------------+-----------------+----------------+--+
>>>> | 1         | 0                | 0                | 63                | rMLTDXxxqXOZnqYRJwInlGfGBTxNkAszBGEUGELqTSRnFjRGbi  | 1               | xxxxxxxxxx     |
>>>> | 5         | 0                | 4                | 31                | vDsFoYAOcitwrWNXCxPHzIIIxwKpTlrsVjFFKUDivytqJqOHGA  | 5               | xxxxxxxxxx     |
>>>> | 100000    | 99               | 999              | 188               | abQyrlxKzPTJliMqDpsfDTJUQzdNdfofUQhrKqXvRKwulZAoJe  | 100000          | xxxxxxxxxx     |
>>>> +-----------+------------------+------------------+-------------------+-----------------------------------------------------+-----------------+----------------+--+
>>>> 3 rows selected (82.66 seconds)
>>>>
>>>> The same query run twice more finished in 76.67 seconds (76.835
>>>> end-to-end) and 80.54 seconds (80.718 end-to-end).
>>>>
>>>> Three runs returning the same rows in around 80 seconds.
>>>>
>>>> It is possible that my Spark engine under Hive, being 1.3.1, is out
>>>> of date and that this causes the lag.
>>>>
>>>> There are certain queries that one cannot run with Spark. Besides,
>>>> it does not recognize CHAR fields, which is a pain.
>>>>
>>>> spark-sql> CREATE TEMPORARY TABLE tmp AS
>>>>          > SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales
>>>>          > FROM sales s, times t, channels c
>>>>          > WHERE s.time_id = t.time_id
>>>>          > AND   s.channel_id = c.channel_id
>>>>          > GROUP BY t.calendar_month_desc, c.channel_desc
>>>>          > ;
>>>> Error in query: Unhandled clauses: TEMPORARY 1, 2,2, 7.
>>>> You are likely trying to use an unsupported Hive feature.
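The usual workaround on the Spark 1.5 side is to run the same query
through a HiveContext and register the result as a temporary table. A
sketch, assuming his sales/times/channels tables and a spark-shell
session where sc is the SparkContext:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // the aggregation that CREATE TEMPORARY TABLE ... AS rejected,
    // materialised as a session-scoped temporary table instead
    val tmp = hc.sql(
      """SELECT t.calendar_month_desc, c.channel_desc,
        |       SUM(s.amount_sold) AS TotalSales
        |FROM sales s, times t, channels c
        |WHERE s.time_id = t.time_id
        |  AND s.channel_id = c.channel_id
        |GROUP BY t.calendar_month_desc, c.channel_desc""".stripMargin)
    tmp.registerTempTable("tmp")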
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> Sybase ASE 15 Gold Medal Award 2008
>>>> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>>>> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>>>> Author of the book "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
>>>> Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
>>>> Publications due shortly:
>>>> Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
>>>> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> NOTE: The information in this email is proprietary and confidential.
>>>> This message is for the designated recipient only; if you are not
>>>> the intended recipient, you should destroy it immediately. Any
>>>> information in this message shall not be understood as given or
>>>> endorsed by Peridale Technology Ltd, its subsidiaries or their
>>>> employees, unless expressly so stated. It is the responsibility of
>>>> the recipient to ensure that this email is virus free; therefore
>>>> neither Peridale Technology Ltd, its subsidiaries nor their
>>>> employees accept any responsibility.
>>>>
>>>> From: Xuefu Zhang [mailto:xzhang@cloudera.com]
>>>> Sent: 02 February 2016 23:12
>>>> To: user@hive.apache.org
>>>> Subject: Re: Hive on Spark Engine versus Spark using Hive metastore
>>>>
>>>> I think the difference is not only about which component does the
>>>> optimization but more about feature parity. Hive on Spark offers all
>>>> the functional features that Hive offers, and these features now run
>>>> faster. However, Spark SQL is far from offering this parity, as far
>>>> as I know.
>>>>
>>>> On Tue, Feb 2, 2016 at 2:38 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> My understanding is that with the Hive on Spark engine, one gets the
>>>> Hive optimizer and the Spark query engine.
>>>>
>>>> With Spark using the Hive metastore, Spark does both the
>>>> optimization and the query execution. The only value-add is that one
>>>> can access the underlying Hive tables from spark-sql etc.
>>>>
>>>> Is this assessment correct?
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check
> than usual.