hive-issues mailing list archives

From "Matt McCline (JIRA)"
Subject [jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
Date Mon, 11 Jul 2016 21:21:11 GMT


Matt McCline commented on HIVE-13974:

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes differ from those of equals. The TypeDescription.equals
method compares the (type) id and maximumId, which does not work when an interior STRUCT
column has a different number of columns. It makes it look as though a type conversion is needed
when none is, and other parts of the code then throw exceptions complaining "no need to
convert a STRING to a STRING".
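A minimal Java sketch of the problem, assuming a simplified, hypothetical type tree (`Type`, `equalsWithIds`, and `sameCategoryAndAttributes` below are illustrative stand-ins, not the ORC TypeDescription API). Ids are assigned in pre-order, so adding `y` to the interior struct `a` shifts the id of the later column `b`, and an id-sensitive comparison then reports a difference between two plain STRING columns.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaIdSketch {
  // Hypothetical, simplified type categories for this sketch only.
  enum Category { INT, STRING, STRUCT }

  // Hypothetical type node; not the ORC TypeDescription class.
  static final class Type {
    final Category category;
    final List<String> fieldNames = new ArrayList<>();
    final List<Type> children = new ArrayList<>();
    int id = -1;
    int maximumId = -1;

    Type(Category category) { this.category = category; }

    Type addField(String name, Type child) {
      fieldNames.add(name);
      children.add(child);
      return this;
    }

    // Assigns ids in pre-order; maximumId is the largest id in this subtree.
    // Returns the next unused id.
    int assignIds(int start) {
      id = start;
      int next = start + 1;
      for (Type child : children) {
        next = child.assignIds(next);
      }
      maximumId = next - 1;
      return next;
    }

    Type field(String name) { return children.get(fieldNames.indexOf(name)); }
  }

  // Id-sensitive comparison, like the equals behavior described in the comment.
  static boolean equalsWithIds(Type a, Type b) {
    return a.category == b.category && a.id == b.id && a.maximumId == b.maximumId;
  }

  // Compares only the type itself. In this sketch that is just the category;
  // a fuller check would also compare attributes such as length, precision, scale.
  static boolean sameCategoryAndAttributes(Type a, Type b) {
    return a.category == b.category;
  }

  // struct<a:struct<x:int[,y:int]>,b:string>
  static Type schema(boolean withNewInnerField) {
    Type inner = new Type(Category.STRUCT).addField("x", new Type(Category.INT));
    if (withNewInnerField) {
      inner.addField("y", new Type(Category.INT));
    }
    Type root = new Type(Category.STRUCT)
        .addField("a", inner)
        .addField("b", new Type(Category.STRING));
    root.assignIds(0);
    return root;
  }

  public static void main(String[] args) {
    Type fileB = schema(false).field("b");  // file schema: b gets id 3
    Type readerB = schema(true).field("b"); // reader schema: b gets id 4
    System.out.println("file b id=" + fileB.id + ", reader b id=" + readerB.id);
    System.out.println("equalsWithIds: " + equalsWithIds(fileB, readerB));
    System.out.println("sameCategoryAndAttributes: " + sameCategoryAndAttributes(fileB, readerB));
  }
}
```

Running `main` should print `file b id=3, reader b id=4`, then `equalsWithIds: false` and `sameCategoryAndAttributes: true`: the id-based check asks for a STRING-to-STRING conversion that is not needed.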

There are 3 kinds of schema, not 2. Part of the problem I'm trying to solve is the ambiguity
in different parts of the code about which schema is being used. Is it the one returned
by the input file format, the schema fed back to the ORC raw merger that includes the
ACID columns, or the unconverted file schema? I don't care what the first 2 schemas
are called as long as the names are distinct. Maybe the names could be reader, internalReader,
and file.
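To make the distinction concrete, a minimal Java sketch that simply gives each schema its own field, using the names proposed above. The `Schemas` class, the `withAcidColumns` helper, and the string representation of schemas are hypothetical; the ACID field names are modeled on Hive's ACID file layout and should be treated as an assumption, not a reference.

```java
public class ThreeSchemas {
  // Hypothetical holder using the three suggested names. Schemas are plain
  // ORC-style type strings so the sketch stays self-contained.
  static final class Schemas {
    final String reader;         // schema returned by the input file format
    final String internalReader; // reader schema plus ACID columns, fed to the raw merger
    final String file;           // unconverted schema as stored in the file

    Schemas(String reader, String internalReader, String file) {
      this.reader = reader;
      this.internalReader = internalReader;
      this.file = file;
    }
  }

  // Assumption: ACID event layout modeled on Hive ACID files, with the
  // user's row nested under the last field, "row".
  static String withAcidColumns(String rowSchema) {
    return "struct<operation:int,originalTransaction:bigint,bucket:int,"
        + "rowId:bigint,currentTransaction:bigint,row:" + rowSchema + ">";
  }

  public static void main(String[] args) {
    String reader = "struct<a:struct<x:int,y:int>,b:string>";
    String fileRow = "struct<a:struct<x:int>,b:string>";
    Schemas s = new Schemas(reader, withAcidColumns(reader), withAcidColumns(fileRow));
    System.out.println("reader:         " + s.reader);
    System.out.println("internalReader: " + s.internalReader);
    System.out.println("file:           " + s.file);
  }
}
```

Keeping the three as separately named fields means no code path has to guess whether a given schema already carries the ACID columns or has already been converted.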

About ORC-54: it is not practical right now in terms of time. We have got to get Erie out
the door, and we have very little runway left. I've had 10+ JIRAs on my plate for weeks; whenever
I knock some down, more appear. Also, there really needs to be a parallel HIVE JIRA for it, and we
must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution
has been, I simply don't believe it will *just work* with ORC-only unit tests.

FYI [~hagleitn] [~ekoifman]

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---------------------------------------------------------------------------
>                 Key: HIVE-13974
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, ORC, Transactions
>    Affects Versions: 1.3.0, 2.1.0, 2.2.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Blocker
>         Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, HIVE-13974.03.patch, HIVE-13974.04.patch,
>                      HIVE-13974.05.WIP.patch, HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, HIVE-13974.09.patch,
>
> Currently, the included columns are based on the fileSchema and not the readerSchema,
> which doesn't work for adding columns to non-last STRUCT data type columns.
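A minimal Java sketch of why the include array has to follow the reader schema, assuming pre-order column ids and a hypothetical `Node` tree (this is not Hive's or ORC's actual include logic). With a new field `y` added to the interior struct `a`, an include array built from the file schema for column `b` lines up with `a.y` when read against reader ids, and is one slot too short to cover `b` at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncludedColumnsSketch {
  // Hypothetical type-tree node; ids are assigned in pre-order.
  static final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  // Pre-order list of dotted column paths; a column's index is its id.
  static List<String> columnIds(Node root) {
    List<String> out = new ArrayList<>();
    walk(root, "", out);
    return out;
  }

  private static void walk(Node node, String prefix, List<String> out) {
    String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
    out.add(path);
    for (Node child : node.children) {
      walk(child, path, out);
    }
  }

  // Marks the selected top-level column and all of its descendants as included.
  static boolean[] include(List<String> ids, String column) {
    boolean[] result = new boolean[ids.size()];
    for (int i = 0; i < ids.size(); i++) {
      String path = ids.get(i);
      result[i] = path.equals(column) || path.startsWith(column + ".");
    }
    return result;
  }

  public static void main(String[] args) {
    // file:   struct<a:struct<x:int>,b:string>
    Node file = new Node("", new Node("a", new Node("x")), new Node("b"));
    // reader: struct<a:struct<x:int,y:int>,b:string>
    Node reader = new Node("", new Node("a", new Node("x"), new Node("y")), new Node("b"));
    List<String> readerIds = columnIds(reader);
    boolean[] fromFile = include(columnIds(file), "b");
    boolean[] fromReader = include(readerIds, "b");
    System.out.println(Arrays.toString(fromFile));
    System.out.println(Arrays.toString(fromReader));
    // Interpreting the file-based array with reader ids selects the wrong column.
    for (int i = 0; i < fromFile.length; i++) {
      if (fromFile[i]) {
        System.out.println("file-based include picks reader column: " + readerIds.get(i));
      }
    }
  }
}
```

The last line of output should read `file-based include picks reader column: a.y`, while the reader-based array correctly marks id 4 (`b`).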

This message was sent by Atlassian JIRA
