hive-dev mailing list archives

From Carl Steinbach
Subject Re: ORC separate project
Date Wed, 01 Apr 2015 07:01:56 GMT
Hi Owen,

I think you're referring to the following questions I asked last week on
the PMC mailing list:

1) How much, if any, of the code for vectorization/sargs/ACID will migrate
over to the new ORC project?

2) Will Hive contributors encounter situations where they are required to
make changes to ORC in order to complete work on projects related to
vectorization/sargs/ACID or other Hive features?

Thanks for taking the time to write a response, but I don't think what you
wrote really answers either of these questions.

Some more comments/questions inline:

> One of the concerns that has been mentioned is how to deal with the
> vectorization and SARG APIs.

I'm actually more concerned about what will happen to the code that
provides the implementation for these APIs. Can you comment on that?

> I'd like to propose that we pull the minimal
> set of classes in a new Hive module named "storage-api". This module will
> include VectorizedRowBatch, the various ColumnVector classes, and the SARG
> classes.

"storage-api" implies that there will be a separate "storage-impl" module.
Where will that live?

> It will form the start of an API that high performance storage
> formats can use to integrate with Hive. Both ORC and Parquet can use the
> new API to support vectorization and SARGs without performance destroying
> shims.

I'd like to understand this problem better, but I don't know where to
start. Can you provide a pointer to these "performance destroying shims"?


- Carl
