Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CADx-ob3ZwKUBH2mrL0_3wZ=DoA09gJ-Y+x5-P+x3-psC1O58rA@mail.gmail.com>
References: <201505201338147389944@yahoo.com.hk>
	<CADx-ob3ZwKUBH2mrL0_3wZ=DoA09gJ-Y+x5-P+x3-psC1O58rA@mail.gmail.com>
Date: Thu, 21 May 2015 22:31:57 -0700
Message-ID: 
 <CAD7JkQF3rnRwcoH8ATNcZpRNi1reLzzXGWOyiuQj7d2iKqq1SA@mail.gmail.com>
Subject: Re: Hive on Spark VS Spark SQL
From: Cheolsoo Park <piaozhexiu@gmail.com>
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary=bcaec51969418d2e290516a4fa38

--bcaec51969418d2e290516a4fa38
Content-Type: text/plain; charset=UTF-8

Hi Xuefu,

Thanks for the good comparison. I agree with most points, but #1 isn't true.

SparkSQL has its own parser (implemented with the Scala parser combinator
library), analyzer, and optimizer, although they're not as mature as Hive's.
What it depends on Hive for is the metastore, CliDriver, the DDL parser, etc.
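For readers unfamiliar with the parser-combinator style mentioned above, here is a minimal Python sketch of the idea, assuming a toy grammar of the form `SELECT col, ... FROM table`. It is not SparkSQL's actual parser (which is written in Scala), and every name in it (`token`, `seq`, `sep_by1`, `fmap`, `parse_sql`) is hypothetical, chosen only for illustration.

```python
import re
from typing import Any, Callable, List, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Skip leading whitespace, then match `pattern` (case-insensitive)."""
    rx = re.compile(r"\s*(" + pattern + r")", re.IGNORECASE)

    def parse(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values: List[Any] = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def sep_by1(item: Parser, sep: Parser) -> Parser:
    """Parse one or more `item`s separated by `sep`."""
    def parse(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        first = item(text, pos)
        if first is None:
            return None
        value, pos = first
        items = [value]
        while True:
            s = sep(text, pos)
            nxt = item(text, s[1]) if s else None
            if nxt is None:
                # No further item: leave any dangling separator unconsumed.
                return items, pos
            value, pos = nxt
            items.append(value)

    return parse


def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by `p` with `f`."""
    def parse(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        result = p(text, pos)
        return None if result is None else (f(result[0]), result[1])

    return parse


# Toy grammar (hypothetical): SELECT ident (, ident)* FROM ident
IDENT = token(r"[A-Za-z_][A-Za-z0-9_]*")
SELECT_STMT = fmap(
    seq(token(r"SELECT\b"), sep_by1(IDENT, token(",")), token(r"FROM\b"), IDENT),
    lambda v: {"columns": v[1], "table": v[3]},
)


def parse_sql(text: str) -> dict:
    """Parse the whole string or raise ValueError."""
    result = SELECT_STMT(text, 0)
    if result is None or text[result[1]:].strip():
        raise ValueError("syntax error in: " + text)
    return result[0]


print(parse_sql("SELECT name, age FROM users"))
# {'columns': ['name', 'age'], 'table': 'users'}
```

Each combinator returns a new parser, so the grammar is assembled by composing small functions in the host language rather than generated from a separate grammar file, as Hive's ANTLR-based parser is.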

Cheolsoo

On Wed, May 20, 2015 at 10:45 AM, Xuefu Zhang <xzhang@cloudera.com> wrote:

> I have been working on Hive on Spark, and know a little about SparkSQL.
> Here are a few factors to consider:
>
> 1. SparkSQL is similar to Shark (discontinued) in that it clones Hive's
> front end (parser and semantic analyzer) and metastore, and injects a
> layer in between where Hive's operator tree is reinterpreted as Spark
> constructs (transformations and actions). Thus, it's tied to a specific
> version of Hive, which always lags behind the official Hive releases.
> 2. Because of the reinterpretation, many Hive features (window functions,
> lateral views, etc.) need to be reimplemented in the Spark world. Where an
> implementation hasn't been done yet, you see a gap. That's why you should
> expect functional disparity, not to mention future Hive features.
> 3. SparkSQL is far from production-ready.
> 4. On the other hand, Hive on Spark is native to Hive, embracing all Hive
> features and evolving with Hive. Hive's operators are executed as they
> are, without reinterpretation. The integration happens at the execution
> layer, where Spark is simply an advanced MapReduce engine.
> 5. Hive targets enterprise use cases, where concerns such as security
> matter more than merely whether a query works or how fast it runs. Hive
> on Spark certainly makes queries run faster, while keeping the same
> enterprise-readiness.
> 6. SparkSQL is a good fit if you're a heavy Spark user who occasionally
> needs to run some SQL, or a casual SQL user who likes to try something
> new.
> 7. If you haven't used either Spark or Hive, I'd suggest starting with
> Hive, especially in an enterprise.
> 8. If you're an existing Hive user looking to take advantage of Spark,
> consider Hive on Spark.
> 9. Mixing Hive and SparkSQL in the same deployment is strongly
> discouraged. SparkSQL bundles its own version of Hive, which is very likely
> different from the Hive version you have (even if you don't use Hive on
> Spark). The resulting library conflicts can be a nightmare.
> 10. I haven't benchmarked SparkSQL myself, but I have heard several reports
> that SparkSQL, when tried at scale, is either fast or fails your queries.
>
> Hope this helps.
>
> Thanks,
>
>
> On Tue, May 19, 2015 at 10:38 PM, guoqing0629@yahoo.com.hk <
> guoqing0629@yahoo.com.hk> wrote:
>
>> Between Hive on Spark and SparkSQL, which is better? What are the key
>> characteristics, advantages, and disadvantages of each?
>>
>> ------------------------------
>> guoqing0629@yahoo.com.hk
>>
>
>

--bcaec51969418d2e290516a4fa38--