hive-dev mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: ORC separate project
Date Fri, 03 Apr 2015 20:25:42 GMT
>
> Hive users who wished to use ORC would obviously need to pull in ORC
> artifacts in addition to Hive.
>

What would happen with Hive features that (currently) only work with ORC?
Would they be extended to work with other file formats and stay in Hive?
What about future features -- would they have to work with multiple file
formats from the get-go?

-- Lefty

On Fri, Apr 3, 2015 at 3:51 PM, Alan Gates <alanfgates@gmail.com> wrote:

> A couple of points:
>
> 1) ORC isn't going into the incubator.  The proposal before the board is
> for it to go straight to TLP.  There's no graduation to depend on.
> 2) As currently proposed Hive would not depend on ORC to build.  Hive
> users who wished to use ORC would obviously need to pull in ORC artifacts
> in addition to Hive.  Given this I don't think it makes any sense to fork
> ORC and have it in both places.  This actually seems the worse outcome, as
> the two will inevitably diverge.
>
> Alan.
>
>   Xuefu Zhang <xzhang@cloudera.com>
>  April 3, 2015 at 6:41
> I actually have a different thought to share along the same line.
>
> ORC is not a subproject in Hive. I'm not sure that performing surgery on
> Hive in order to make ORC a TLP is the best we can do. Not only may this
> bring instability to Hive, but it also makes Hive depend on an incubating
> project. Not every project graduates (though I do wish ORC success as a
> TLP); some of them fail.
>
> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
> whatever it has. This way, the new project can do whatever it wants, and
> the Hive community probably doesn't care and has no say in it. Once ORC as
> a TLP graduates, the Hive community can decide whether to go along with it
> and, if so, how to integrate with it.
>
> I think this will calm the current controversy, help ORC proceed faster
> as a TLP, and leave the decision to the near future.
>
> Thanks,
> Xuefu
>
>
>   Szehon Ho <szehon@cloudera.com>
>  April 2, 2015 at 23:54
> I also agree with this goal.
>
> As such, I think we should first see the proposal (JIRA?) for the
> storage-api refactoring and other work related to separating ORC as a TLP
> before the actual separation happens, to make sure the separation is not
> done in a way that takes us further from this goal. It may very well be
> that this refactoring moves us closer to the goal, but seeing the proposal
> first would give a lot of clarity.
>
> Thanks
> Szehon
>
> On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <edlinuxguru@gmail.com>
>
>   Edward Capriolo <edlinuxguru@gmail.com>
>  April 2, 2015 at 22:20
> To reiterate, one thing I want to avoid is having Hive rely on code that
> sits in several tiny silos across Apache projects, or in Apache-licensed
> but non-ASF projects. Hive is a mature TLP with a large number of
> committers, and it would not be a good situation if work often got
> bottlenecked because changes had to be made across two projects
> simultaneously to commit a feature, especially if the two projects do not
> share the same committer list.
>
> I think if it could be done perfectly, things like ORC, Parquet, or
> whatever would be <provided> scope dependencies, meaning the project can
> be built without a particular piece but as a whole the project still
> works. (That might be easier said than done :)
>
>
>   Nick Dimiduk <ndimiduk@gmail.com>
>  April 1, 2015 at 11:51
> I think the storage-api would be very helpful for HBase integration as
> well.
>
>
>   Owen O'Malley <omalley@apache.org>
>  April 1, 2015 at 11:22
>
>
>
>>
>> What I'd like to see here is well defined interfaces in Hive so that any
>> storage format that wants can implement them.  Hopefully that means things
>> like interfaces and utility classes for acid, sargs, and vectorization move
>> into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
>> on this module without needing to pull in all of Hive.
>>
>> Then Hive contributors would only be forced to make changes in Orc when
>> they want to implement something in Orc.
>>
>
> Agreed. The goal of the new module is to keep a clean separation between
> the code for ORC and Hive so that vectorization, sargs, and acid are kept
> in Hive and are not moved to or duplicated in the ORC project.
>
> .. Owen
>
>
