hive-dev mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: [DISCUSS] ORC separate project
Date Sat, 11 Apr 2015 21:06:24 GMT
Speaking of the C++ ORC reader and writer, could they be included in the
Hive project or do they have to be separate because they aren't Java code?

By the way, gmail thwarts adding [DISCUSS] to the subject line.  It shows
up in the mail archives, although pre- & post-DISCUSS threads are separate.

-- Lefty

On Fri, Apr 10, 2015 at 11:56 PM, Gopal Vijayaraghavan <gopalv@apache.org>
wrote:

>
>
> On 4/10/15, 8:05 PM, "Xuefu Zhang" <xzhang@cloudera.com> wrote:
>
> >To Owen's explanation - Thanks. I guess my major concern is that we
> >seemingly are breaking apart Hive's integrity and making it hard to
> >release
> >and maintain due to increasing number of external dependents. Let's say
> >that Hive depends on a certain version of ORC (as TLP) and it's found that
> >ORC has a bug that seriously impacts Hive users. We cannot release Hive as
> >fast as we can, since doing so would need ORC community to fix the problem
> >and make a release, for which Hive PMC has no control. On the contrary,
> >Hive community can quickly fix the problem and make a release without
> >waiting for other projects to make a release. I'm not sure this move (ORC
> >as TLP) will be beneficial to vast Hive users.
>
> You need to understand exactly what this brings about for Hive, in fact to
> those who do not use ORC today.
>
> With the proposed changes, competing formats like Parquet might be able to
> compete with ORC in terms of hive features.
>
> That is the direct impact of standardization of a Storage-API
> implementation.
>
> As an independent project, new ORC features cannot use the fact that it is
> included in the ql/ source to introduce circular dependencies between
> ql.exec -> orc -> ql.exec.vector classes.
>
> As far as your concern for risks go, I would ask for a comparison against
> the bugs/release cycles of "STORED AS PARQUET".
>
> As a Hive contributor, I'm certain that if I find a core issue in Parquet,
> my patches would be welcome there.
>
> That should be beneficial to the Parquet community, but might not be
> aligned entirely along employer lines, since my patch might be good, but
> my intention would be to migrate warehouses with
> parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive.
>
> Resolving that conflict should be ideally left to the Parquet IPMC & the
> ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).
>
> Now - reverse that argument and replay it, except instead we're talking
> about the C++ ORC reader plus a non-ASF SQL competitor to Hive.
>
>
> >If this is not convincing, let me propose that we spin off metastore also as
> >TLP tomorrow!
>
> http://incubator.apache.org/projects/hcatalog.html
>
> Cheers,
> Gopal
>
>
>
