orc-dev mailing list archives

From Dain Sundstrom <d...@iq80.com>
Subject Re: WriterOptions.writerVersion(version)?
Date Sat, 02 Mar 2019 20:46:17 GMT
After a bit more investigation, it looks like this was a regression only present in Hive 2.0
to 2.2.

----
Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)

> On Mar 1, 2019, at 5:58 PM, Dain Sundstrom <dain@iq80.com> wrote:
> 
> Thanks Owen,
> 
> I’m not the one working on this directly, but the folks looking at this said:
> 
>> This was fixed in ORC-125 which made it into Hive 2.3 via HIVE-14007. I believe this
>> affects all earlier Hive 2.x versions but I didn't confirm that.
> 
> So it looks like you fixed it 2 years ago.
> 
> ----
> Dain Sundstrom
> Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)
> 
>> On Mar 1, 2019, at 3:19 PM, Owen O'Malley <owen.omalley@gmail.com> wrote:
>> 
>> The goal of WriterVersion is to record changes to the writer software so
>> that the readers can cope with unknown bugs. It is not intended to mark
>> format changes. A good example of this is when we switched from the
>> row-by-row writer to the vectorized writer in HIVE-12055. This changed the
>> implementation of the writer, but didn't change the format. If the change
>> had introduced a bug, we'd know that the reader had to compensate.
>> 
>> If the older versions of Hive are broken with higher writer versions, we
>> absolutely should fix that. I seem to remember fixing that at some point,
>> but I probably didn't push it back into Hive 1.x. In which version did you
>> see the problem?
>> 
>> .. Owen
>> 
>> On Wed, Feb 27, 2019 at 9:43 AM Dain Sundstrom <dain@iq80.com> wrote:
>> 
>>> Hi, we recently updated to Hive 3.0+ and have noticed some issues with
>>> older versions of Hive being able to read data written by newer versions of
>>> ORC.  Specifically, older readers only understand writer version up to 4
>>> and newer versions write 6.  This causes older readers to fail.  I see that
>>> the workaround is to set
>>> `WriterOptions.writerVersion(WriterVersion.HIVE_13083)`, which causes the
>>> writer to put a `4` in the postscript, but doesn’t seem to change anything
>>> else in the writer’s behavior.  My question is, did I miss something in
>>> the writer where behavior changes based on version?  If not, does that
>>> work?  I ask because newer versions have comments like `ORC_135(6) =>
>>> timestamp stats use utc`, which to me would seem to require that the
>>> behavior changes.
>>> 
>>> Thanks,
>>> 
>>> -dain
>>> 
>>> ----
>>> Dain Sundstrom
>>> Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)
>>> 
>>> 
> 
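The failure mode discussed in this thread can be sketched as a self-contained Java model. It is a simplified illustration, not the ORC or Hive source: the class name, method names, and the FUTURE sentinel are assumptions made for the example, and only the ids 4 (HIVE_13083) and 6 (ORC_135) come from the messages above. A strict reader rejects any writer version id it does not recognize, while a tolerant reader maps newer ids to a catch-all value and keeps going. The last method shows the kind of reader-side compensation the writer version is meant to enable.

```java
public class WriterVersionSketch {
    // Highest writer version id the hypothetical old reader knows (4 in the thread).
    static final int MAX_KNOWN_ID = 4;
    // Catch-all for ids newer than the reader; name and value are assumptions.
    static final int FUTURE = Integer.MAX_VALUE;

    // Strict lookup: rejects any id the reader has never seen.
    static int strictLookup(int id) {
        if (id < 0 || id > MAX_KNOWN_ID) {
            throw new IllegalArgumentException("Unknown writer version " + id);
        }
        return id;
    }

    // Tolerant lookup: maps unseen newer ids to FUTURE instead of failing.
    static int tolerantLookup(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("Negative writer version " + id);
        }
        return id > MAX_KNOWN_ID ? FUTURE : id;
    }

    // Reader-side compensation keyed on the id: per the quoted comment,
    // writers at id 6 (ORC_135) and later record timestamp stats in UTC.
    static boolean timestampStatsUseUtc(int id) {
        return id >= 6;
    }

    public static void main(String[] args) {
        System.out.println("tolerant(4) = " + tolerantLookup(4));
        System.out.println("tolerant(6) is FUTURE: " + (tolerantLookup(6) == FUTURE));
        System.out.println("utc stats at 6: " + timestampStatsUseUtc(6));
        try {
            strictLookup(6);
        } catch (IllegalArgumentException e) {
            System.out.println("strict(6) failed: " + e.getMessage());
        }
    }
}
```

In this model, a file stamped with 6 breaks the strict reader but not the tolerant one, and stamping the file with 4 sidesteps the strict check without changing what the writer actually produced.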

