hive-dev mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: [DISCUSS] ORC separate project
Date Sat, 11 Apr 2015 03:56:59 GMT


On 4/10/15, 8:05 PM, "Xuefu Zhang" <xzhang@cloudera.com> wrote:

>To Owen's explanation - Thanks. I guess my major concern is that we
>seemingly are breaking apart Hive's integrity and making it hard to
>release
>and maintain due to increasing number of external dependents. Let's say
>that Hive depends on a certain version of ORC (as TLP) and it's found that
>ORC has a bug that seriously impacts Hive users. We cannot release Hive as
>fast as we can, since doing so would need ORC community to fix the problem
>and make a release, for which Hive PMC has no control. On the contrary,
>Hive community can quickly fix the problem and make a release without
>waiting for other projects to make a release. I'm not sure this move (ORC
>as TLP) will be beneficial to vast Hive users.

You need to understand exactly what this brings about for Hive, and in
fact for those who do not use ORC today.

With the proposed changes, other formats like Parquet might be able to
compete with ORC in terms of Hive features.

That is the direct impact of standardization of a Storage-API
implementation.

As an independent project, new ORC features cannot use the fact that it is
included in the ql/ source to introduce circular dependencies between
ql.exec -> orc -> ql.exec.vector classes.

As far as your concern about risk goes, I would ask for a comparison
against the bugs/release cycles of "STORED AS PARQUET".

As a Hive contributor, I'm certain that if I find a core issue in Parquet,
my patches would be welcome there.

That should be beneficial to the Parquet community, but might not be
aligned entirely along employer lines, since my patch might be good, but
my intention would be to migrate warehouses with
parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive.

Resolving that conflict should ideally be left to the Parquet IPMC & the
ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).

Now - reverse that argument and replay it, except instead we're talking
about the C++ ORC reader plus a non-ASF SQL competitor to Hive.


>If this not convincing, let me propose that we spin off metastore also as
>TLP tomorrow!

http://incubator.apache.org/projects/hcatalog.html

Cheers,
Gopal


