orc-dev mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: WriterOptions.writerVersion(version)?
Date Fri, 01 Mar 2019 23:19:46 GMT
The goal of WriterVersion is to record changes to the writer software so
that the readers can cope with unknown bugs. It is not intended to mark
format changes. A good example of this is when we switched from the
row-by-row writer to the vectorized writer in HIVE-12055. This changed the
implementation of the writer, but didn't change the format. If the change
had introduced a bug, the writer version would tell us which files the
reader had to compensate for.
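To make the idea concrete, here is a minimal, hypothetical Java sketch of reader-side compensation keyed on the recorded writer version. It is not ORC's actual reader code. The enum, the `includes` helper and `timestampStatsAreUtc` are illustrative names. The ids 4 (HIVE_13083) and 6 (ORC_135) come from the quoted message, and the id 3 for HIVE_12055 is an assumption.

```java
public class WriterVersionCompensation {

  // Hypothetical model of writer-version ids. HIVE_13083 = 4 and ORC_135 = 6
  // are taken from the thread; HIVE_12055 = 3 is an assumption.
  enum WriterVersion {
    HIVE_12055(3),  // vectorized writer: implementation change, same format
    HIVE_13083(4),
    ORC_135(6);     // timestamp statistics written in UTC

    final int id;

    WriterVersion(int id) {
      this.id = id;
    }

    // True if a file written at this version already contains the given change.
    boolean includes(WriterVersion change) {
      return this.id >= change.id;
    }
  }

  // Hypothetical reader-side check: treat timestamp min/max statistics as UTC
  // only when the recorded writer version includes ORC_135. For older files
  // the reader would have to compensate (for example, by not trusting them).
  static boolean timestampStatsAreUtc(WriterVersion fileVersion) {
    return fileVersion.includes(WriterVersion.ORC_135);
  }

  public static void main(String[] args) {
    System.out.println(timestampStatsAreUtc(WriterVersion.HIVE_13083)); // false
    System.out.println(timestampStatsAreUtc(WriterVersion.ORC_135));    // true
  }
}
```

In this model, a file whose postscript records 4 would be treated as predating ORC_135 even if its timestamp statistics were actually written in UTC, which is the kind of mismatch the quoted question asks about.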

If the older versions of Hive are broken with higher writer versions, we
absolutely should fix that. I seem to remember fixing that at some point,
but I probably didn't push it back into Hive 1.x. In which version did you
see the problem?
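One way an older reader could cope with higher writer versions, sketched here under assumptions rather than taken from Hive or ORC: decode any id newer than the ones it knows as a FUTURE sentinel instead of failing, and apply known-bug workarounds only to files older than the fix. `NEWEST_KNOWN_ID`, `decode`, `needsWorkaround` and the fix id 3 used below are hypothetical.

```java
public class TolerantWriterVersion {

  // Hypothetical older reader that recognizes writer-version ids up to 4
  // (HIVE_13083, per the thread).
  static final int NEWEST_KNOWN_ID = 4;

  // Sentinel meaning "written by software newer than this reader".
  static final int FUTURE = Integer.MAX_VALUE;

  // Decode the postscript's writer version without failing on ids this
  // reader has never seen: anything newer is treated as FUTURE.
  static int decode(int postscriptId) {
    if (postscriptId < 0) {
      throw new IllegalArgumentException("Invalid writer version: " + postscriptId);
    }
    return postscriptId > NEWEST_KNOWN_ID ? FUTURE : postscriptId;
  }

  // A workaround for a bug fixed at fixedAtId applies only to older files;
  // FUTURE files never trigger it.
  static boolean needsWorkaround(int decodedId, int fixedAtId) {
    return decodedId < fixedAtId;
  }

  public static void main(String[] args) {
    int v = decode(6);                                  // from a newer writer
    System.out.println(v == FUTURE);                    // true
    System.out.println(needsWorkaround(v, 3));          // false
    System.out.println(needsWorkaround(decode(2), 3));  // true
  }
}
```

Treating unknown ids as "newest" means the reader skips workarounds it cannot know are needed, rather than refusing to open the file.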

.. Owen

On Wed, Feb 27, 2019 at 9:43 AM Dain Sundstrom <dain@iq80.com> wrote:

> Hi, we recently updated to Hive 3.0+ and have noticed that older versions
> of Hive have trouble reading data written by newer versions of ORC.
> Specifically, older readers only understand writer versions up to 4, and
> newer versions write 6. This causes older readers to fail. I see that the
> workaround is to set
> `WriterOptions.writerVersion(WriterVersion.HIVE_13083)`, which causes the
> writer to put a `4` in the postscript but doesn’t seem to change anything
> else in the writer’s behavior. My question is: did I miss something in
> the writer where behavior changes based on the version? If not, does that
> workaround work? I ask because newer versions have comments like
> `ORC_135(6) => timestamp stats use utc`, which would seem to me to require
> that the behavior changes.
>
> Thanks,
>
> -dain
>
> ----
> Dain Sundstrom
> Co-founder @ Presto Software Foundation, Co-creator of Presto (
> https://prestosql.io)
>
>
