hive-dev mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: ORC separate project
Date Wed, 08 Apr 2015 04:03:17 GMT
Actually not so -- a spin-off project would have its own PMC and the Hive
PMC wouldn't have any say-so.  Of course, there would be some overlap of
the two PMCs.

(I'm not even sure if the PMC has governance of code, technically.  That
might belong to the committers or the development community.  Well, the PMC
does vote on release candidates so that's a kind of governance.  But the
community is supposed to decide on major issues.)

Anyway, under the Apache License, nobody needs permission from the PMC to
grab some code and use it for another purpose.


-- Lefty

On Tue, Apr 7, 2015 at 11:49 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:

> If I understood Alan's #2 comment, we are moving existing ORC code out of
> Hive and making it a separate project, which I definitely missed. Since the
> existing Hive PMC has governance of the code, I would expect that to still be
> the case even after the spinoff. Obviously the proposal doesn't reflect this.
>
> Thanks,
> Xuefu
>
> On Fri, Apr 3, 2015 at 12:51 PM, Alan Gates <alanfgates@gmail.com> wrote:
>
>> A couple of points:
>>
>> 1) ORC isn't going into the incubator.  The proposal before the board is
>> for it to go straight to TLP.  There's no graduation to depend on.
>> 2) As currently proposed Hive would not depend on ORC to build.  Hive
>> users who wished to use ORC would obviously need to pull in ORC artifacts
>> in addition to Hive.  Given this I don't think it makes any sense to fork
>> ORC and have it in both places.  This actually seems the worst outcome, as
>> the two will inevitably diverge.
>>
>> Alan.
>>
>>   Xuefu Zhang <xzhang@cloudera.com>
>>  April 3, 2015 at 6:41
>> I actually have a different thought to share along the same line.
>>
>> ORC is not a subproject in Hive. I'm not sure that performing surgery on
>> Hive in order to make ORC a TLP is the best we can do. Not only may this
>> bring instability to Hive, but it also makes Hive depend on an incubating
>> project. Not every project graduates (though I do wish ORC success as a
>> TLP); some of them fail.
>>
>> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
>> whatever it has. This way, the new project can do whatever it wants, and
>> the Hive community probably doesn't care and has no say in it. Once ORC as
>> a TLP graduates, the Hive community can decide whether to go along with it
>> and, if so, how to integrate with it.
>>
>> I think this will defuse the current controversy, help ORC proceed faster
>> as a TLP, and leave the decision to the near future.
>>
>> Thanks,
>> Xuefu
>>
>>
>>   Szehon Ho <szehon@cloudera.com>
>>  April 2, 2015 at 23:54
>> I also agree with this goal.
>>
>> As such, I think we should first see the proposal (JIRA?) for the
>> storage-api refactoring and other related work for separating ORC as a TLP
>> before the actual separation happens, to make sure the separation is not
>> done in a way that takes us further from this goal. It may very well be this
>> refactoring moves us closer to the goal, but seeing the proposal first
>> would give a lot of clarity.
>>
>> Thanks
>> Szehon
>>
>>   Edward Capriolo <edlinuxguru@gmail.com>
>>  April 2, 2015 at 22:20
>> To reiterate, one thing I want to avoid is having Hive rely on code that
>> sits in several tiny silos across Apache projects, or in Apache-licensed
>> but non-ASF projects. Hive is a mature TLP with a large number of
>> committers, and it would not be a good situation if work often gets
>> bottlenecked because changes have to be made across two projects
>> simultaneously to commit a feature, especially if the two projects do not
>> share the same committer list.
>>
>> I think if it could be done perfectly, things like ORC, Parquet, whatever
>> would be <provided> scope dependencies, meaning the project can be built
>> without a particular piece but as a whole the project still works. (That
>> might be easier said than done :)
>>
>>
>>   Nick Dimiduk <ndimiduk@gmail.com>
>>  April 1, 2015 at 11:51
>> I think the storage-api would be very helpful for HBase integration as
>> well.
>>
>>
>>   Owen O'Malley <omalley@apache.org>
>>  April 1, 2015 at 11:22
>>
>>
>>
>>>
>>> What I'd like to see here is well defined interfaces in Hive so that any
>>> storage format that wants can implement them.  Hopefully that means things
>>> like interfaces and utility classes for acid, sargs, and vectorization move
>>> into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
>>> on this module without needing to pull in all of Hive.
>>>
>>> Then Hive contributors would only be forced to make changes in Orc when
>>> they want to implement something in Orc.
>>>
>>
>> Agreed. The goal of the new module is to keep a clean separation between the
>> code for ORC and Hive so that vectorization, sargs, and acid are kept in
>> Hive and are not moved to or duplicated in the ORC project.
>>
>> .. Owen
>>
>>
>
