orc-dev mailing list archives

From ddrinka <...@git.apache.org>
Subject [GitHub] orc pull request #308: Deliver a lower-case schema to OrcFile
Date Thu, 13 Sep 2018 23:11:26 GMT
GitHub user ddrinka opened a pull request:

    https://github.com/apache/orc/pull/308

    Deliver a lower-case schema to OrcFile

    Mixed-case struct field names don't work in Hive.  There should be a way to convert a
    camel-cased JSON document into ORC without having to pre-process the JSON.

    This pull request is a proof of concept that generates two schemas: one in the original
    case, which is provided to the JsonReader as usual, and a lower-cased one, which is
    provided to OrcFile.

    TypeDescription is immutable and non-trivial to clone manually using public accessors,
    so to make the idea clear, I do the conversion at schema ingest rather than at the point
    where the schema is provided to OrcFile.  The downside of this approach is that automatic
    schema detection doesn't benefit from these changes.  A more experienced implementer
    could certainly do better.
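
The two-schema idea can be illustrated with a minimal sketch. It is not the code from this pull request. It assumes the schema arrives as an ORC type string (the form accepted by `TypeDescription.fromString`) and that no two field names differ only by case. The class `LowerCaseSchema` and its method are hypothetical names introduced for illustration.

```java
import java.util.Locale;

/** Hypothetical helper; not part of ORC or of this pull request. */
public class LowerCaseSchema {

    /**
     * Returns a lower-cased copy of an ORC type string.
     * Assumption: type keywords (struct, string, int, ...) are already
     * lower case, so only the field names are affected.
     */
    public static String lowerCase(String schema) {
        // Locale.ROOT avoids locale-specific case mappings (e.g. Turkish dotless i).
        return schema.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Original-case schema: would be handed to the JSON reader so that
        // field lookups match the camel-cased JSON keys.
        String readerSchema = "struct<firstName:string,homeAddress:struct<zipCode:int>>";

        // Lower-cased schema: would be handed to the ORC writer so that Hive
        // sees lower-case field names.
        String writerSchema = lowerCase(readerSchema);

        System.out.println(readerSchema);
        System.out.println(writerSchema);
    }
}
```

Both strings describe the same structure in the same field order, so rows read against the first schema line up positionally with columns written under the second. A fuller version would probably need to reject schemas in which lower-casing makes two sibling field names collide.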

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ddrinka/orc ddrinka-pr-lowercase-schema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308
    
----
commit cc7e909725d059b69f9a8c384aca2691b52ce0ff
Author: Douglas Drinka <ddrinka@...>
Date:   2018-09-13T22:59:11Z

    Deliver a lower-case schema to OrcFile

----

