hive-dev mailing list archives

From Lefty Leverenz <>
Subject Re: ORC separate project
Date Fri, 03 Apr 2015 20:25:42 GMT
> Hive users who wished to use ORC would obviously need to pull in ORC
> artifacts in addition to Hive.

What would happen with Hive features that (currently) only work with ORC?
Would they be extended to work with other file formats and stay in Hive?
What about future features -- would they have to work with multiple file
formats from the get-go?

-- Lefty

On Fri, Apr 3, 2015 at 3:51 PM, Alan Gates <> wrote:

> A couple of points:
> 1) ORC isn't going into the incubator.  The proposal before the board is
> for it to go straight to TLP.  There's no graduation to depend on.
> 2) As currently proposed Hive would not depend on ORC to build.  Hive
> users who wished to use ORC would obviously need to pull in ORC artifacts
> in addition to Hive.  Given this I don't think it makes any sense to fork
> ORC and have it in both places.  This actually seems the worst outcome, as
> the two will inevitably diverge.
> Alan.
>   Xuefu Zhang <>
>  April 3, 2015 at 6:41
> I actually have a different thought to share along the same line.
> ORC is not a subproject in Hive. I'm not sure that performing surgery on
> Hive in order to make ORC a TLP is the best we can do. Not only may this
> bring instability to Hive, but it also makes Hive depend on an incubating
> project. Not every project graduates (though I do wish ORC success as a
> TLP); some of them fail.
> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
> whatever it has. This way, the new project can do whatever it wants, and the
> Hive community probably doesn't care and has no say in it. Once ORC as a TLP
> graduates, the Hive community can decide whether to go along with it and, if
> so, how to integrate with it.
> I think this will calm the current controversy, help ORC proceed faster
> as a TLP, and leave the decision to the near future.
> Thanks,
> Xuefu
>   Szehon Ho <>
>  April 2, 2015 at 23:54
> I also agree with this goal.
> As such, I think we should first see the proposal (JIRA?) for the
> storage-api refactoring and other work related to ORC separating as a TLP
> before the actual separation happens, to make sure the separation is not
> done in a way that takes us further from this goal. It may very well be that
> this refactoring moves us closer to the goal, but seeing the proposal first
> would give a lot of clarity.
> Thanks
> Szehon
>   Edward Capriolo <>
>  April 2, 2015 at 22:20
> To reiterate, one thing I want to avoid is having Hive rely on code that
> sits in several tiny silos across Apache projects, or Apache Licensed but
> not ASF projects. Hive is a mature TLP with a large number of committers
> and it would not be a good situation if work often gets bottlenecked
> because changes had to be made across two projects simultaneously to commit
> a feature. Especially if the two projects do not share the same committer
> list.
> I think if it could be done perfectly, things like ORC, Parquet, whatever
> would be <provided> scope dependencies, meaning the project can be built
> without a particular piece but as a whole the project still works. (That
> might be easier said than done :)
>   Nick Dimiduk <>
>  April 1, 2015 at 11:51
> I think the storage-api would be very helpful for HBase integration as
> well.
>   Owen O'Malley <>
>  April 1, 2015 at 11:22
>> What I'd like to see here is well defined interfaces in Hive so that any
>> storage format that wants can implement them.  Hopefully that means things
>> like interfaces and utility classes for acid, sargs, and vectorization move
>> into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
>> on this module without needing to pull in all of Hive.
>> Then Hive contributors would only be forced to make changes in Orc when
>> they want to implement something in Orc.
> Agreed. The goal of the new module is to keep a clean separation between the
> code for ORC and Hive so that vectorization, sargs, and acid are kept in
> Hive and are not moved to or duplicated in the ORC project.
> .. Owen
