orc-dev mailing list archives

From Dain Sundstrom <d...@iq80.com>
Subject Re: WriterOptions.writerVersion(version)?
Date Sat, 02 Mar 2019 01:58:44 GMT
Thanks Owen,

I’m not the one working on this directly, but the folks looking at this said:

> This was fixed in ORC-125 which made it into Hive 2.3 via HIVE-14007. I believe this
> affects all earlier Hive 2.x versions but I didn't confirm that.

So it looks like you fixed it 2 years ago.

Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)

> On Mar 1, 2019, at 3:19 PM, Owen O'Malley <owen.omalley@gmail.com> wrote:
> The goal of WriterVersion is to record changes to the writer software so
> that the readers can cope with unknown bugs. It is not intended to mark
> format changes. A good example of this is when we switched from the
> row-by-row writer to the vectorized writer in HIVE-12055. This changed the
> implementation of the writer, but didn't change the format. If the change
> had introduced a bug, we'd know that the reader had to compensate.
> If the older versions of Hive are broken with higher writer versions, we
> absolutely should fix that. I seem to remember fixing that at some point,
> but I probably didn't push it back into Hive 1.x. In which version did you see
> the problem?
> .. Owen
> On Wed, Feb 27, 2019 at 9:43 AM Dain Sundstrom <dain@iq80.com> wrote:
>> Hi, we recently updated to Hive 3.0+ and have noticed some issues with
>> older versions of Hive being able to read data written by newer versions of
>> ORC.  Specifically, older readers only understand writer version up to 4
>> and newer versions write 6.  This causes older readers to fail.  I see that
>> the workaround is to set
>> `WriterOptions.writerVersion(WriterVersion.HIVE_13083)`, which causes the
>> writer to put a `4` in the postscript, but doesn’t seem to change anything
>> else in the writer’s behavior.  My question is, did I miss something in
>> the writer where behavior changes based on version?  If not, does that
>> work?  I ask because newer versions have comments like `ORC_135(6) =>
>> timestamp stats use utc`, which to me would seem to require that the
>> behavior changes.
>> Thanks,
>> -dain
>> ----
>> Dain Sundstrom
>> Co-founder @ Presto Software Foundation, Co-creator of Presto (
>> https://prestosql.io)
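
The failure described in the thread (an older reader that knows writer versions up to 4 meeting a file stamped with 6) can be illustrated with a small, self-contained sketch. It does not reproduce ORC's or Hive's actual reader code. The class `WriterVersionSketch`, the methods `strictLookup` and `tolerantLookup`, and the `maxKnownId` parameter are hypothetical names chosen for illustration; only the ids 4 (`HIVE_13083`) and 6 (`ORC_135`) come from the messages above. The `FUTURE` placeholder is an assumption about how a tolerant reader might represent an id it does not recognize.

```java
public class WriterVersionSketch {

    /** Hypothetical subset of writer versions; ids 4 and 6 are the ones quoted in the thread. */
    public enum WriterVersion {
        HIVE_13083(4),
        ORC_135(6),
        FUTURE(Integer.MAX_VALUE); // assumed placeholder for ids the reader does not know

        final int id;

        WriterVersion(int id) {
            this.id = id;
        }
    }

    /** Models an older reader: it knows ids up to maxKnownId and rejects anything newer. */
    public static WriterVersion strictLookup(int id, int maxKnownId) {
        if (id > maxKnownId) {
            throw new IllegalArgumentException("Unknown writer version: " + id);
        }
        return find(id);
    }

    /** Models a tolerant reader: ids newer than maxKnownId are mapped to FUTURE instead of failing. */
    public static WriterVersion tolerantLookup(int id, int maxKnownId) {
        return id > maxKnownId ? WriterVersion.FUTURE : find(id);
    }

    /** Finds the enum constant for an id; ids not modeled in this sketch are rejected. */
    private static WriterVersion find(int id) {
        for (WriterVersion v : WriterVersion.values()) {
            if (v.id == id) {
                return v;
            }
        }
        throw new IllegalStateException("Id not modeled in this sketch: " + id);
    }

    public static void main(String[] args) {
        int oldReaderMax = 4; // the older reader understands writer versions up to 4

        // A file stamped with 4 is readable by both styles of reader.
        System.out.println(tolerantLookup(4, oldReaderMax));

        // A file stamped with 6 is still readable by the tolerant reader.
        System.out.println(tolerantLookup(6, oldReaderMax));

        // The strict reader fails on the same file.
        try {
            strictLookup(6, oldReaderMax);
        } catch (IllegalArgumentException e) {
            System.out.println("strict reader failed: " + e.getMessage());
        }
    }
}
```

Under these assumptions, stamping the file with 4 sidesteps the strict reader's check without touching the data, which matches the observation in the thread that the workaround only changes the number written to the postscript.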
