orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: [DISCUSS] ORC 2.0
Date Fri, 04 Aug 2017 21:51:55 GMT
On Fri, Aug 4, 2017 at 12:15 PM, Alan Gates <alanfgates@gmail.com> wrote:

> Let me make sure I have the backwards compatibility straight.  If a user
> switches to ORC 2.0, he could choose to continue writing in older formats
> so that his old tools could read it.  Then once all his tools are upgraded
> he could throw a config switch and new data would be written in the new
> format.  Once that switch was thrown, any pre-ORC 2.0 tools would be
> unusable.  Before throwing that switch, he would get none of the benefits
> of ORC 2.0.  Is this summary correct?
>

Yes, exactly.
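
The rollout described above can be sketched roughly as follows. This is a minimal model, not ORC's API: the `FileVersion` enum and the `can_read` and `safe_to_switch` helpers are hypothetical names made up for illustration, and it assumes a pre-2.0 reader handles both older formats.

```python
from enum import Enum


class FileVersion(Enum):
    # Hypothetical labels for the three file versions named in this thread.
    HIVE_0_11 = "0.11"
    HIVE_0_12 = "0.12"
    ORC_2_0 = "2.0"


def can_read(reader_supports_2_0: bool, file_version: FileVersion) -> bool:
    """Old readers stop at Hive 0.12; upgraded readers open all three versions."""
    if file_version is FileVersion.ORC_2_0:
        return reader_supports_2_0
    return True


def safe_to_switch(tools_upgraded: list) -> bool:
    """The writer can move to the new format only once every tool is upgraded."""
    return all(tools_upgraded)


# One tool is still on an old reader, so the writer stays on an older format.
tools = [True, True, False]
write_version = FileVersion.ORC_2_0 if safe_to_switch(tools) else FileVersion.HIVE_0_12
```

Until `safe_to_switch` returns true the writer keeps producing an older format, which is also why none of the new encodings are available before the switch.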


>
> If so, I agree we should do this.  The list of potential benefits for
> performance and space efficiency is compelling.  And the long lag for users
> with many old tools to upgrade will never get better.
>
> Alan.
>
> On Fri, Aug 4, 2017 at 9:29 AM, Owen O'Malley <owen.omalley@gmail.com>
> wrote:
>
> > All,
> >   We've started the process of updating the encodings for ORC. These
> > changes are going to extend the format in ways that aren't forward
> > compatible. (e.g., the ORC 1.4 readers won't be able to read the new
> > format.)
> >
> > The changes that I've heard about are:
> > * Decimal encoding - this will likely be separated into two categories
> >    + precision <= 18
> >    + precision > 18
> >   In both cases the precision and scale will be fixed for the entire file
> > rather than per value.
> > * a new Float/Double encoding
> > * a new RLE encoding
> >
> > Are there other encodings that we should consider adding?
> >
> > We haven't made forward incompatible changes in a while. Currently the ORC
> > Writer can write either:
> >  * Hive 0.11 ORC files
> >  * Hive 0.12 ORC files
> >
> > So I'd like to propose that we add a new ORC 2.0 file version and all of
> > these changes need to be so tagged.
> >
> > The new ORC writers will maintain the ability to write the old versions of
> > the files (Hive 0.11 ORC and Hive 0.12 ORC) as well as the ORC 2.0 files.
> > The new reader will automatically read all three versions.
> >
> > Thoughts?
> >
> >   Owen
> >
>
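
One plausible reason for splitting decimals at precision 18 is that every 18-digit unscaled value fits in a signed 64-bit integer, while 19 digits can overflow it. The thread does not state this rationale, so treat it as an assumption. The sketch below checks that bound and shows how a file-wide scale lets each value be stored as a plain integer. The helper names `fits_in_int64` and `to_unscaled` are hypothetical and are not part of ORC.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def fits_in_int64(precision: int) -> bool:
    """True if the largest unscaled value with this many digits fits in int64."""
    return 10**precision - 1 <= INT64_MAX


def to_unscaled(value: Decimal, scale: int) -> int:
    """Convert a finite Decimal to its unscaled integer for a fixed, file-wide scale.

    Strict by design: a value written with more fractional digits than `scale`
    is rejected, even if the extra digits are zeros.
    """
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError("value has more fractional digits than the file scale")
    magnitude = int("".join(map(str, digits))) * 10**shift
    return -magnitude if sign else magnitude


# With scale fixed at 2 for the whole file, only the integers need to be stored.
column = [Decimal("123.45"), Decimal("7"), Decimal("-0.5")]
unscaled = [to_unscaled(v, 2) for v in column]
```

Because the scale is stored once per file rather than per value, a reader would recover each decimal by dividing the stored integer by 10 to the power of the scale.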
