hive-user mailing list archives

From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: Building Spark to use for Hive on Spark
Date Mon, 23 Nov 2015 06:09:38 GMT
Hive on Spark is supposed to work with any version of Hive (1.1+) and a
version of Spark built w/o Hive. Thus, to make HoS work reliably and also
to simplify matters, I think it still makes sense to require that the
spark-assembly jar not contain Hive jars. Otherwise, you have to make sure
that your Hive version matches the "other" Hive version that's included in
Spark.
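
For example, a Hive-free assembly can be produced along these lines (just a
sketch; the script arguments and jar path are illustrative and depend on
your Spark/Hadoop versions):

  # build Spark without the -Phive profile, so no Hive classes end up
  # in the assembly
  ./make-distribution.sh --name "hadoop2-without-hive" --tgz \
      "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

  # sanity check: expect no output (no Hive classes in the assembly)
  jar tf dist/lib/spark-assembly-*.jar | grep org/apache/hadoop/hive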

In CDH 5.x, the Spark version is 1.5, and we still build the Spark jar w/o
Hive.

Therefore, I don't see a need to update the doc.

--Xuefu

On Sun, Nov 22, 2015 at 9:23 PM, Lefty Leverenz <leftyleverenz@gmail.com>
wrote:

> Gopal, can you confirm the doc change that Jone Zhang suggests?  The
> second sentence confuses me:  "You can choose Spark1.5.0+ which  build
> include the Hive jars."
>
> Thanks.
>
> -- Lefty
>
>
> On Thu, Nov 19, 2015 at 8:33 PM, Jone Zhang <joyoungzhang@gmail.com>
> wrote:
>
>> I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build
>> with -Phive.
>>
>> So this page
>> <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>
>> should read like below:
>> “Note that you must have a version of Spark which does *not* include the
>> Hive jars if you use Spark1.4.1 and before, You can choose Spark1.5.0+
>> which  build include the Hive jars ”
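>>
>> For example (a sketch; the Hadoop profile is illustrative):
>>
>>   # Spark 1.5 built with -Phive bundles Hive 1.2.1 classes:
>>   mvn -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -DskipTests clean package
>>
>>   # built without -Phive, the assembly stays Hive-free:
>>   mvn -Pyarn -Phadoop-2.6 -DskipTests clean package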
>>
>>
>> 2015-11-19 5:12 GMT+08:00 Gopal Vijayaraghavan <gopalv@apache.org>:
>>
>>>
>>>
>>> > I wanted to know why it is necessary to remove the Hive jars from the
>>> > Spark build as mentioned on this page
>>>
>>> Because SparkSQL was originally based on Hive & still uses the Hive AST
>>> to parse SQL.
>>>
>>> The org.apache.spark.sql.hive package contains the parser, which has
>>> hard references to Hive's internal AST; that AST is unfortunately
>>> auto-generated code (HiveParser.TOK_TABNAME etc.).
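>>>
>>> A quick way to see the skew (jar names are illustrative; the printed
>>> values are whatever ANTLR generated for each release):
>>>
>>>   javap -constants -cp hive-exec-1.1.0.jar \
>>>       org.apache.hadoop.hive.ql.parse.HiveParser | grep -w TOK_TABNAME
>>>   javap -constants -cp hive-exec-1.2.1.jar \
>>>       org.apache.hadoop.hive.ql.parse.HiveParser | grep -w TOK_TABNAME
>>>
>>> If the two commands print different numbers, bytecode compiled against
>>> one jar silently misreads ASTs from the other.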
>>>
>>> Every time Hive makes a release, those constants change in value; they
>>> are private API precisely because there is no backwards-compat
>>> guarantee, and SparkSQL violates that.
>>>
>>> So Hive-on-Spark forces mismatched versions of Hive classes, because it's
>>> a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
>>> laws of causality.
>>>
>>> Spark cannot depend on a version of Hive that is unreleased and
>>> Hive-on-Spark release cannot depend on a version of Spark that is
>>> unreleased.
>>>
>>> Cheers,
>>> Gopal
>>>
>>>
>>>
>>
>
