hive-dev mailing list archives

From Prasanth Jayachandran <>
Subject Re: [DISCUSS] Making storage-api a separately released artifact
Date Wed, 17 Aug 2016 18:39:12 GMT
+1 for making it a subproject with a separate (preferably shorter) release cycle. The module
by itself is too small for a separate project. Also, a faster release cycle would resolve the
circular dependency and would help other projects make use of vectorization, sargs, and bloom filters.

For version management, how about adding another version component after the patch version, i.e. a
sub-project version? For example, 2.2.0.[0] would be storage-api's release version, while Hive would
always depend on 2.2.0-SNAPSHOT. I think Maven will let us release modules with different versions.
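To make the proposed scheme concrete, here is a minimal Java sketch, assuming a storage-api release version is just Hive's major.minor.patch plus one extra sub-project component (e.g. 2.2.0.0, 2.2.0.1). The class `StorageApiVersion` and its methods are hypothetical, not part of Hive or Maven; the sketch only shows that every release in one line shares its Hive base version and that releases order numerically, component by component.

```java
/**
 * Hypothetical helper for the proposed scheme: a storage-api release version is
 * the Hive version it tracks (major.minor.patch) plus one sub-project component.
 */
public final class StorageApiVersion implements Comparable<StorageApiVersion> {
    private final int[] parts; // exactly four numeric components

    private StorageApiVersion(int[] parts) {
        this.parts = parts;
    }

    /** Parses a four-component version such as "2.2.0.1"; rejects anything else. */
    public static StorageApiVersion parse(String text) {
        String[] tokens = text.split("\\.");
        if (tokens.length != 4) {
            throw new IllegalArgumentException("expected major.minor.patch.sub: " + text);
        }
        int[] parsed = new int[4];
        for (int i = 0; i < 4; i++) {
            parsed[i] = Integer.parseInt(tokens[i]);
        }
        return new StorageApiVersion(parsed);
    }

    /** The Hive release line (major.minor.patch) this storage-api release belongs to. */
    public String hiveLine() {
        return parts[0] + "." + parts[1] + "." + parts[2];
    }

    /** Numeric, component-by-component ordering (so 2.2.0.10 sorts after 2.2.0.9). */
    @Override
    public int compareTo(StorageApiVersion other) {
        for (int i = 0; i < parts.length; i++) {
            int cmp = Integer.compare(parts[i], other.parts[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    @Override
    public String toString() {
        return hiveLine() + "." + parts[3];
    }

    public static void main(String[] args) {
        StorageApiVersion first = parse("2.2.0.0");
        StorageApiVersion second = parse("2.2.0.1");
        System.out.println(second + " belongs to Hive line " + second.hiveLine());
        System.out.println("newer than " + first + "? " + (second.compareTo(first) > 0));
    }
}
```

Both 2.2.0.0 and 2.2.0.1 report the same Hive line, so storage-api could ship several fixes between Hive releases without renumbering Hive itself.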


> On Aug 17, 2016, at 10:46 AM, Alan Gates <> wrote:
> +1 for making the API clean and easy for other projects to work with. A few questions:
> 1) Would this also make it easier for Parquet and others to implement Hive’s ACID interfaces?
> 2) Would we make any attempt to coordinate version numbers between Hive and the storage
> module, or would a given version of Hive just depend on a given version of the storage module?
> Alan.
>> On Aug 15, 2016, at 17:01, Owen O'Malley <> wrote:
>> All,
>> As part of moving ORC out of Hive, we pulled all of the vectorization
>> storage and sarg classes into a separate module, which is named
>> storage-api.  Although it is currently only used by ORC, it could be used
>> by Parquet or Avro if they wanted to make a fast vectorized reader that
>> read directly into Hive's VectorizedRowBatch without needing a shim or
>> data copy. Note that this is in many ways similar to pulling the Arrow
>> project out of Drill.
>> This unfortunately still leaves us with a circular dependency between Hive
>> and ORC. I'd hoped that storage-api wouldn't change that much, but that
>> doesn't seem to be happening. As a result, ORC ends up shipping its own
>> fork of storage-api.
>> Although we could make a new project for just the storage-api, I think it
>> would be better to make it a subproject of Hive that is released
>> independently.
>> What do others think?
>>  Owen
