orc-dev mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: [DISCUSS] ORC 2.0
Date Fri, 04 Aug 2017 19:15:06 GMT
Let me make sure I have the backwards compatibility straight.  If a user
switches to ORC 2.0, he could choose to continue writing in older formats
so that his old tools could read it.  Then once all his tools are upgraded
he could throw a config switch and new data would be written in the new
format.  Once that switch was thrown, any pre-ORC 2.0 tools would be
unusable.  Before throwing that switch, he would get none of the benefits
of ORC 2.0.  Is this summary correct?

If so, I agree we should do this.  The list of potential benefits for
performance and space efficiency is compelling.  And the long lag for users
with many old tools to upgrade will never get better.

Alan.
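
The rollout described above can be sketched in a few lines. This is only an illustration of the compatibility rule, not ORC's actual API: the names `FileVersion`, `can_read`, and `choose_write_version` are hypothetical, and the sketch assumes each reader handles every format up to the newest one it knows, which matches the statement in the quoted proposal that the new reader reads all three versions.

```python
from enum import IntEnum


class FileVersion(IntEnum):
    # Hypothetical names for the three formats listed in the thread,
    # ordered from oldest to newest.
    HIVE_0_11 = 1
    HIVE_0_12 = 2
    ORC_2_0 = 3


def can_read(reader_max: FileVersion, file_version: FileVersion) -> bool:
    """Assumption: a reader handles every format up to the newest it knows."""
    return file_version <= reader_max


def choose_write_version(tool_max_versions, preferred=FileVersion.ORC_2_0):
    """Pick the newest format that every deployed tool can still read."""
    if not tool_max_versions:
        return preferred
    return min(preferred, min(tool_max_versions))


# One tool is still pre-2.0, so the writer stays on the older format.
mixed = [FileVersion.HIVE_0_12, FileVersion.ORC_2_0]
print(choose_write_version(mixed).name)

# Once every tool is upgraded, the switch can be flipped.
upgraded = [FileVersion.ORC_2_0, FileVersion.ORC_2_0]
print(choose_write_version(upgraded).name)
```

Until the last old tool is upgraded, the chosen version stays at an older format, which is why none of the ORC 2.0 benefits show up before the switch.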

On Fri, Aug 4, 2017 at 9:29 AM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> All,
>   We've started the process of updating the encodings for ORC. These
> changes are going to extend the format in ways that aren't forward
> compatible. (E.g., the ORC 1.4 readers won't be able to read the new format.)
>
> The changes that I've heard about are:
> * Decimal encoding - this will likely be separated into two categories
>    + precision <= 18
>    + precision > 18
>   In both cases the precision and scale will be fixed for the entire file
> rather than per value.
> * a new Float/Double encoding
> * a new RLE encoding
>
> Are there other encodings that we should consider adding?
>
> We haven't made forward incompatible changes in a while. Currently the ORC
> Writer can write either:
>  * Hive 0.11 ORC files
>  * Hive 0.12 ORC files
>
> So I'd like to propose that we add a new ORC 2.0 file version and that all of
> these changes be tagged with it.
>
> The new ORC writers will maintain the ability to write the old versions of
> the files (Hive 0.11 ORC and Hive 0.12 ORC) as well as the ORC 2.0 files.
> The new reader will automatically read all three versions.
>
> Thoughts?
>
>   Owen
>
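
The thread does not say why the decimal split falls at precision 18. One plausible reading is that every unscaled value of up to 18 digits fits in a signed 64-bit integer, while 19 digits can overflow it. The sketch below checks that arithmetic and shows what a scale fixed for the whole file could look like. The helper names are hypothetical and this is not the proposed ORC encoding itself.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value with this many digits fits in int64."""
    return 10**precision - 1 <= INT64_MAX


def to_unscaled(value: Decimal, scale: int) -> int:
    """Store a decimal as an integer; the scale is kept once, not per value."""
    shifted = value.scaleb(scale)
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(shifted)


def from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Rebuild the decimal from the stored integer and the shared scale."""
    return Decimal(unscaled).scaleb(-scale)


print(fits_in_int64(18), fits_in_int64(19))

scale = 2  # assumed to be fixed for the entire file
stored = to_unscaled(Decimal("12.34"), scale)
print(stored, from_unscaled(stored, scale))
```

With the scale stored once, each value in the small-precision category reduces to a plain integer, and the larger category would need a wider representation.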
