hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: ORC separate project
Date Fri, 03 Apr 2015 05:20:01 GMT
To reiterate, one thing I want to avoid is having Hive rely on code that
sits in several tiny silos across Apache projects, or in projects that are
Apache Licensed but not ASF projects. Hive is a mature TLP with a large
number of committers, and it would not be a good situation if work often
got bottlenecked because changes had to be made across two projects
simultaneously to commit a feature, especially if the two projects do not
share the same committer list.

I think if it could be done perfectly, things like ORC, Parquet, or whatever
would be <provided> scope dependencies, meaning the project can be built
without a particular piece but as a whole the project still works. (That
might be easier said than done :)
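
To make that a bit more concrete, here is a minimal sketch of the runtime
side of the idea: the core checks whether an optional format is on the
classpath and simply skips it when it is not. The class names
(org.example.orc.OrcFormat and so on) and the OptionalFormats helper are
hypothetical, made up for illustration; this is not how Hive actually
loads ORC or Parquet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: probes the classpath for optional storage formats.
final class OptionalFormats {

    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized (initialize = false).
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalFormats.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Keeps only the candidates that are actually present, in input order.
    static List<String> availableFormats(List<String> candidates) {
        List<String> present = new ArrayList<>();
        for (String name : candidates) {
            if (isAvailable(name)) {
                present.add(name);
            }
        }
        return present;
    }

    public static void main(String[] args) {
        // Assumed, illustrative class names for optional format modules.
        List<String> candidates = Arrays.asList(
                "org.example.orc.OrcFormat",
                "org.example.parquet.ParquetFormat");
        System.out.println("Formats on classpath: " + availableFormats(candidates));
    }
}
```

With neither hypothetical jar on the classpath the list comes back empty
and the rest of the program carries on, which is the "as a whole the
project still works" part.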

On Wed, Apr 1, 2015 at 2:51 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> I think the storage-api would be very helpful for HBase integration as
> well.
>
> On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley <omalley@apache.org> wrote:
>
> >
> >
> > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates <alanfgates@gmail.com> wrote:
> >
> >>
> >>
> >>   Carl Steinbach <cwsteinbach@gmail.com>
> >>  April 1, 2015 at 0:01
> >>
> >> Hi Owen,
> >>
> >> I think you're referring to the following questions I asked last week on
> >> the PMC mailing list:
> >>
> >> 1) How much, if any, of the code for vectorization/sargs/ACID will
> >> migrate over to the new ORC project?
> >>
> >> 2) Will Hive contributors encounter situations where they are required
> >> to make changes to ORC in order to complete work on projects related to
> >> vectorization/sargs/ACID or other Hive features?
> >>
> >>  What I'd like to see here is well-defined interfaces in Hive so that
> >> any storage format that wants to can implement them.  Hopefully that
> >> means things like interfaces and utility classes for acid, sargs, and
> >> vectorization move into this new Hive module storage-api.  Then Orc,
> >> Parquet, etc. can depend on this module without needing to pull in all
> >> of Hive.
> >>
> >> Then Hive contributors would only be forced to make changes in Orc when
> >> they want to implement something in Orc.
> >>
> >
> > Agreed. The goal of the new module is to keep a clean separation between the
> > code for ORC and Hive so that vectorization, sargs, and acid are kept in
> > Hive and are not moved to or duplicated in the ORC project.
> >
> > .. Owen
> >
>
