hive-dev mailing list archives

From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: ORC separate project
Date Fri, 03 Apr 2015 13:41:44 GMT
I actually have a different thought to share along the same line.

ORC is not a subproject in Hive. I'm not sure that performing surgery on
Hive in order to make ORC a TLP is the best we can do. Not only may this
bring instability to Hive, but it also makes Hive depend on an incubating
project. Not every project graduates (though I do wish ORC success as a
TLP); some of them fail.

Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
whatever it has. This way, the new project can do whatever it wants, and the
Hive community probably doesn't care and has no say in it. Once ORC
graduates as a TLP, the Hive community can decide whether to go along with
it and, if so, how to integrate with it.

I think this will settle the current controversy, help ORC proceed faster
as a TLP, and leave the decision to the near future.

Thanks,
Xuefu

On Thu, Apr 2, 2015 at 11:54 PM, Szehon Ho <szehon@cloudera.com> wrote:

> I also agree with this goal.
>
> As such, I think we should first see the proposal (JIRA?) for the
> storage-api refactoring and other work related to Orc separating as a TLP
> before the actual separation happens, to make sure the separation is not
> done in a way that takes us further from this goal.  It may very well be
> that this refactoring moves us closer to the goal, but seeing the proposal
> first would give a lot of clarity.
>
> Thanks
> Szehon
>
> On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> > To reiterate, one thing I want to avoid is having Hive rely on code that
> > sits in several tiny silos across Apache projects, or Apache-licensed but
> > non-ASF projects. Hive is a mature TLP with a large number of committers,
> > and it would not be a good situation if work often got bottlenecked
> > because changes had to be made across two projects simultaneously to
> > commit a feature, especially if the two projects do not share the same
> > committer list.
> >
> > I think if it could be done perfectly, things like ORC, Parquet, whatever
> > would be <provided> scope dependencies, meaning the project can be built
> > without a particular piece but as a whole the project still works. (That
> > might be easier said than done :)
> >
> > On Wed, Apr 1, 2015 at 2:51 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> > > I think the storage-api would be very helpful for HBase integration as
> > > well.
> > >
> > > > On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley <omalley@apache.org> wrote:
> > > >
> > > > > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates <alanfgates@gmail.com> wrote:
> > > >
> > > >>
> > > >>
> > > >>   Carl Steinbach <cwsteinbach@gmail.com>
> > > >>  April 1, 2015 at 0:01
> > > >>
> > > >> Hi Owen,
> > > >>
> > > > >> I think you're referring to the following questions I asked last week on
> > > > >> the PMC mailing list:
> > > > >>
> > > > >> 1) How much, if any, of the code for vectorization/sargs/ACID will migrate
> > > > >> over to the new ORC project?
> > > > >>
> > > > >> 2) Will Hive contributors encounter situations where they are required to
> > > > >> make changes to ORC in order to complete work on projects related to
> > > > >> vectorization/sargs/ACID or other Hive features?
> > > >>
> > > > >>  What I'd like to see here is well-defined interfaces in Hive so that any
> > > > >> storage format that wants to can implement them.  Hopefully that means
> > > > >> things like interfaces and utility classes for acid, sargs, and
> > > > >> vectorization move into this new Hive module storage-api.  Then Orc,
> > > > >> Parquet, etc. can depend on this module without needing to pull in all
> > > > >> of Hive.
> > > > >>
> > > > >> Then Hive contributors would only be forced to make changes in Orc when
> > > > >> they want to implement something in Orc.
> > > >>
> > > >
> > > > > Agreed. The goal of the new module is to keep a clean separation between
> > > > > the code for ORC and Hive so that vectorization, sargs, and acid are kept
> > > > > in Hive and are not moved to or duplicated in the ORC project.
> > > >
> > > > .. Owen
> > > >
> > >
> >
>
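
For readers unfamiliar with the <provided> scope Edward mentions in the quoted
thread, a minimal Maven sketch follows. The coordinates (org.example,
example-storage-format, 1.0.0) are hypothetical placeholders, not artifacts
named anywhere in this thread; only the <scope>provided</scope> element itself
is standard Maven.

```xml
<!-- Hypothetical pom.xml fragment; groupId, artifactId, and version are placeholders. -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>example-storage-format</artifactId>
  <version>1.0.0</version>
  <!-- provided: on the compile and test classpaths, but not bundled with the
       build or passed on transitively; the runtime environment is expected
       to supply the jar. -->
  <scope>provided</scope>
</dependency>
```

With this scope the format jar is still needed at compile time but is not
packaged with the build. That is close to, though not the same as, the "can be
built without a particular piece" idea; a build that truly omits a format would
probably also need the format-specific code isolated in its own module.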
