hive-dev mailing list archives

From Thejas Nair <>
Subject Re: ORC separate project
Date Fri, 03 Apr 2015 22:59:13 GMT
On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz <> wrote:

>> Hive users who wished to use ORC would obviously need to pull in ORC
>> artifacts in addition to Hive.
> What would happen with Hive features that (currently) only work with ORC?
> Would they be extended to work with other file formats and stay in Hive?
> What about future features -- would they have to work with multiple file
> formats from the get-go?

The storage-api module proposed above would lead to clearer storage
interfaces in Hive. That will in turn help to implement such features using
other storage, including Parquet, HBase, etc.
This work will not by itself make those features work without ORC;
somebody would need to do that.

Whether future features would work for all formats would depend on whether
the new feature needs new functionality from the storage layer. If it does,
I would expect new interfaces to be defined in Hive, and then implemented by
the storage engines that want to support that feature.
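To make that pattern concrete, here is a minimal Java sketch under stated assumptions. Every name in it (StorageFormat, SupportsRowDelete, FormatA, FormatB, delete) is hypothetical and does not correspond to any real Hive or ORC API, and "row-level delete" is just an illustrative stand-in for a feature that needs storage support. The point it shows: the Hive-side feature code depends only on an interface defined in Hive, a format that wants the feature implements that interface, and a format that has not done so gets a clear "unsupported" error instead of silently wrong behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: these types are illustrative only, not actual Hive or ORC APIs.
public class StorageCapabilityDemo {

    /** Baseline contract that every storage format implements. */
    interface StorageFormat {
        String name();
        List<Integer> scan();
    }

    /** New interface defined on the Hive side for a feature that needs storage support. */
    interface SupportsRowDelete {
        /** Removes rows matching the condition and returns how many were removed. */
        int deleteWhere(Predicate<Integer> condition);
    }

    /** A format whose developers chose to implement the new capability. */
    static final class FormatA implements StorageFormat, SupportsRowDelete {
        private final List<Integer> rows;
        FormatA(List<Integer> rows) { this.rows = new ArrayList<>(rows); }
        public String name() { return "format-a"; }
        public List<Integer> scan() { return new ArrayList<>(rows); }
        public int deleteWhere(Predicate<Integer> condition) {
            int before = rows.size();
            rows.removeIf(condition);
            return before - rows.size();
        }
    }

    /** A format that has not implemented the capability (yet). */
    static final class FormatB implements StorageFormat {
        private final List<Integer> rows;
        FormatB(List<Integer> rows) { this.rows = new ArrayList<>(rows); }
        public String name() { return "format-b"; }
        public List<Integer> scan() { return new ArrayList<>(rows); }
    }

    /** Hive-side feature code: depends only on the interfaces, never on a concrete format. */
    static int delete(StorageFormat table, Predicate<Integer> condition) {
        if (!(table instanceof SupportsRowDelete)) {
            throw new UnsupportedOperationException(
                "DELETE is not supported by storage format " + table.name());
        }
        return ((SupportsRowDelete) table).deleteWhere(condition);
    }

    public static void main(String[] args) {
        StorageFormat a = new FormatA(Arrays.asList(1, 5, 10, 15));
        StorageFormat b = new FormatB(Arrays.asList(1, 5, 10, 15));

        System.out.println(delete(a, v -> v > 7)); // 2
        System.out.println(a.scan());              // [1, 5]

        try {
            delete(b, v -> v > 7);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());    // DELETE is not supported by storage format format-b
        }
    }
}
```

The instanceof check keeps the baseline StorageFormat contract small: adding a new feature never forces every format to change, and each storage engine can opt in on its own release schedule.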

This will not negatively impact the experience of users of ORC or other
storage formats. We can package ORC in Hive the same way we package Parquet.
In fact, users would be able to upgrade the version of ORC they use more
easily, as Hive and ORC releases can happen independently of each other.
