orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject [DISCUSS] ORC 2.0
Date Fri, 04 Aug 2017 16:29:03 GMT
We've started the process of updating the encodings for ORC. These
changes are going to extend the format in ways that aren't forward
compatible (e.g., the ORC 1.4 readers won't be able to read the new format).

The changes that I've heard about are:
* Decimal encoding - this will likely be separated into two categories:
   + precision <= 18
   + precision > 18
  In both cases the precision and scale will be fixed for the entire file
rather than per value.
* a new Float/Double encoding
* a new RLE encoding
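
On the decimal split: one plausible reason for the boundary at precision 18
(an assumption, not stated in this message) is that every 18-digit unscaled
value fits in a signed 64-bit integer, while a 19-digit value can overflow.
With the scale fixed for the whole file, each value can then be stored as
just that integer. A minimal Python sketch; the helper names are
hypothetical and not part of any ORC API:

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value with `precision` digits fits in int64."""
    return 10**precision - 1 <= INT64_MAX


def to_unscaled(value: Decimal, scale: int) -> int:
    """Convert `value` to its unscaled integer at a fixed, file-wide scale.

    Raises ValueError if the value has more fractional digits than `scale`.
    """
    scaled = value * (Decimal(10) ** scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} does not fit scale {scale}")
    return int(scaled)


if __name__ == "__main__":
    print(fits_in_int64(18), fits_in_int64(19))
    print(to_unscaled(Decimal("12.34"), 2))
```

Because the scale is stored once per file in this sketch, only the integer
from to_unscaled would need to be written per value.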

Are there other encodings that we should consider adding?

We haven't made forward-incompatible changes in a while. Currently the ORC
Writer can write either:
 * Hive 0.11 ORC files
 * Hive 0.12 ORC files

So I'd like to propose that we add a new ORC 2.0 file version and that all
of these changes be tagged with it.

The new ORC writers will maintain the ability to write the old versions of
the files (Hive 0.11 ORC and Hive 0.12 ORC) as well as the ORC 2.0 files.
The new reader will automatically read all three versions.
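
To illustrate the compatibility rule described above, here is a rough
Python sketch. The enum, the numeric tuples, and can_read are all
hypothetical names chosen for illustration; they are not the actual ORC
reader or writer API, and the (2, 0) value is an assumption about how the
proposed version would be numbered.

```python
from enum import Enum


class FileVersion(Enum):
    """File format versions named in this thread (values are illustrative)."""
    HIVE_0_11 = (0, 11)
    HIVE_0_12 = (0, 12)
    ORC_2_0 = (2, 0)  # proposed; numbering is an assumption


def can_read(reader_max: FileVersion, file_version: FileVersion) -> bool:
    """A reader handles every file version up to the newest one it knows."""
    return file_version.value <= reader_max.value


if __name__ == "__main__":
    old_reader = FileVersion.HIVE_0_12  # e.g. an ORC 1.4 reader
    new_reader = FileVersion.ORC_2_0
    for version in FileVersion:
        print(version.name, can_read(old_reader, version),
              can_read(new_reader, version))
```

In this sketch the old reader rejects ORC_2_0 files, which is the forward
incompatibility, while the new reader accepts all three versions.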


