orc-dev mailing list archives

From tkcode123 <...@git.apache.org>
Subject [GitHub] orc pull request #217: Provide additional constructor to JsonReader (java or...
Date Sun, 11 Feb 2018 22:45:30 GMT
GitHub user tkcode123 opened a pull request:

    https://github.com/apache/orc/pull/217

    Provide additional constructor to JsonReader (java orc tools)

    Provide additional constructor to JsonReader so that embedding code can use its own JsonParser
implementation. Intended to plug in a parser that transforms JSON while reading (flattening
nested structs, renaming and filtering capabilities).
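
A minimal sketch of the kind of parser wrapper described above, assuming the reader consumes its parser as an iterator over parsed records (Gson's JsonStreamParser, for example, implements Iterator<JsonElement>). The class name TransformingIterator is hypothetical and not part of ORC, and plain type parameters stand in for JSON elements so the example needs no external libraries. It covers per-record transformation only, not filtering.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

/**
 * Hypothetical wrapper (not part of ORC): applies a transformation to each
 * record at the moment the consumer pulls it from the underlying parser.
 */
final class TransformingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;          // the original parser
    private final Function<T, T> transform;    // reshaping applied per record

    TransformingIterator(Iterator<T> source, Function<T, T> transform) {
        this.source = source;
        this.transform = transform;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        // Transform lazily, one record at a time, so nothing is buffered.
        return transform.apply(source.next());
    }

    public static void main(String[] args) {
        // Strings stand in for parsed JSON records in this sketch.
        Iterator<String> records = Arrays.asList("a", "b").iterator();
        Iterator<String> shaped =
            new TransformingIterator<String>(records, String::toUpperCase);
        while (shaped.hasNext()) {
            System.out.println(shaped.next());
        }
    }
}
```

Because the wrapper is itself an iterator, embedding code could hand it to any consumer that accepts one, which is the kind of hook the additional constructor is meant to open up.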
    
    Rationale: Our application often receives JSON files with deeply nested arrays of structs
    whose innermost elements are generic, like <name:string,type:tinyint,value:something>.
    I would like to move the value element into separate, correctly typed elements that hold
    bigints, doubles, strings, booleans, etc., so that compression and value handling
    improve. The plan is to leverage JOLT (https://github.com/bazaarvoice/jolt) for this:
    read the original files, transform them in memory into JSON objects of the target shape,
    and then create ORC files from that representation.
    Adding just one more constructor would allow us to implement such a transformation step.
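
To illustrate the reshaping described in the rationale, here is a rough sketch that moves a generic value into a field chosen by its runtime type. It uses plain Java maps rather than JOLT or Gson. The class name TypedValueSplitter and the target field names (boolValue, doubleValue, longValue, stringValue) are assumptions made for this example, and the meaning of the tinyint type code is not given in the description, so the sketch ignores it and drops that field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of ORC or JOLT): rewrites a generic
 * {name, type, value} element so the value lands in a typed field.
 */
final class TypedValueSplitter {

    static Map<String, Object> split(Map<String, Object> element) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("name", element.get("name"));
        Object value = element.get("value");
        // Choose the target field from the runtime type of the value.
        // The original "type" code is intentionally not copied over.
        if (value instanceof Boolean) {
            out.put("boolValue", value);
        } else if (value instanceof Double || value instanceof Float) {
            out.put("doubleValue", ((Number) value).doubleValue());
        } else if (value instanceof Number) {
            out.put("longValue", ((Number) value).longValue());
        } else if (value != null) {
            out.put("stringValue", value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("name", "temperature");
        in.put("type", 3);          // arbitrary code, unused by this sketch
        in.put("value", 21.5);
        System.out.println(split(in));
    }
}
```

With each typed field holding a single kind of value, the resulting ORC columns can use a concrete type instead of a catch-all one, which is the compression and handling benefit the rationale refers to.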


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tkcode123/orc master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #217
    
----
commit f8134e167718035eea0b3a1796162c74a667adf0
Author: Thomas Krüger <tkcode123>
Date:   2018-02-11T22:33:49Z

    Provide additional constructor to JsonReader so that embedding code can
    use its own JsonParser implementation. Intended to plug in a parser
    that transforms JSON while reading (flattening nested structs, renaming
    and filtering capabilities).

----