hive-dev mailing list archives

From Reynold Xin <r...@apache.org>
Subject Re: using the Hive SQL parser in Spark
Date Thu, 19 May 2016 10:36:52 GMT
I want to give an update since there have been some new developments since
my last email.

We did import Hive's parser into Spark in February, but then in April that
was replaced by another ANTLR4-based parser. So the net effect is that this
didn't happen (no release was made with the Hive parser).

Thanks for the support.

On Friday, December 18, 2015, Reynold Xin <rxin@apache.org> wrote:

> (Please use reply-all so I see the replies)
>
> Responses inline.
>
>
> On Fri, Dec 18, 2015 at 1:17 PM, Yin Huai <huaiyin.thu@gmail.com> wrote:
>
>> Let me add Reynold to the thread.
>>
>> On Fri, Dec 18, 2015 at 12:36 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
>>
>>>
>>> >We have looked into various options, and it looks like the best option is
>>> >to copy the ANTLR grammar file from Hive into Spark. Because the grammar
>>> >file is tightly coupled with Hive's semantic analysis, we need to refactor
>>> >some code to use it, so it will end up becoming the .g file plus some
>>> >coupled code.
>>>
>>> Is the eventual goal to contribute that fork back into Hive & have Hive
>>> devs maintain a compatible parser for SparkSQL?
>>>
>>> Would that affect Hive's ability to refactor the SQL parser in the future
>>> or is this a one-time only deal?
>>
>>
> I am not sure it is useful at all to port that back to Hive, since it
> has zero user-facing benefit and would require Hive devs to spend a lot of
> time reviewing the changes. Refactoring like this is always risky for an
> established project.
>
>
>>
>>>
>>> >parser. From Hive's perspective this does not provide any immediate
>>> >benefits. From Spark's perspective, we iterate very quickly, so having to
>>> >depend on an external component also slows down our development. We also
>>> >have some requirements that simply don't apply in other projects (e.g.
>>> >being able to parse DataFrame expressions).
>>>
>>> From that I assume this involves some form of cut-and-paste duplication
>>> of the code into the SparkSQL project, with that version diverging away
>>> from Hive's.
>>
>>
> That is correct.
>
>
>>
>>>
>>> > Thanks a lot for developing this parser, and we will try our best to
>>> > contribute back as we fix bugs. I will also make sure we have the proper
>>> > acknowledgment when we do this.
>>>
>>>
>>> Under the Apache license, there's no actual restriction against a hostile
>>> embrace-extend by copying Hive's code verbatim, as long as the fork
>>> retains the license notices.
>>>
>>> The maintainability concerns are mostly around whether this is intended
>>> as an ongoing relationship, including any compatibility commitments from
>>> hive-dev@.
>>>
>>
> No commitments needed from Hive. You should update/improve the parser as
> you see fit. We do have a pretty comprehensive suite of Hive compatibility
> tests (by using the Hive tests directly) to ensure SQL compatibility with
> Hive. We will continue running those. We will also try our best to
> contribute back bug fixes to the parser.
>
>
>
