orc-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Converting json record to ORC.
Date Mon, 20 Feb 2017 17:50:37 GMT
A few of us have written hacky converters, but we should have an official
one that is more robust. Mine was in this pull request
https://github.com/apache/orc/pull/43/commits/48a9f3443062bfaee4b684e49b137106bbfe9947#diff-efa8880e64e22de68f1e34c2f1d5b538
where I was converting the GitHub archives data to ORC for benchmarking.

I've created a JIRA, https://issues.apache.org/jira/browse/ORC-150, for
adding one.

.. Owen


On Sun, Feb 19, 2017 at 11:14 PM, Piyush Mukati (Data Platform) <
piyush.mukati@flipkart.com> wrote:

> Hi,
> we have a use case where our MR job has to read from both old JSON data
> (each line is a JSON object with a fixed schema) and ORC files. The output
> of the job will be ORC files.
>
> I've tried a few approaches:
>
> 1) HCatalog, but it doesn't currently support reading from multiple
> tables, and the JSON data doesn't have Hive tables either.
>
> 2) Using the Hive ORC library and SerDe. But I'm unable to pass OrcStruct
> objects through the shuffle phase, since they don't implement Writable
> (I'm creating the OrcStruct in the mapper).
>
> 3) I'm currently looking at the org.apache.orc.mapreduce APIs, and
> everything works well there. What remains is converting the existing JSON
> records to OrcStruct. This seems like a common use case, and writing a
> converter myself feels like reinventing the wheel.
>
> I'm hoping someone in the community knows of utilities that can help
> convert JSON to OrcStruct. Any other suggestions are welcome.
>
> Thanks
>
>
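
Converting a fixed-schema JSON line to an ORC row, as asked in item 3 above, is mostly a schema-driven mapping: walk the fields of the target struct type in order, look each one up in the parsed JSON record, and coerce the value to the declared type. The Java sketch below shows only that mapping step. It is a hypothetical illustration, not the converter from the pull request or ORC-150, and the names JsonToRow, Field, parseSchema and convert are made up. It assumes the line has already been parsed into a Map (for example by a JSON library such as Jackson), supports only flat structs of boolean, int, bigint, double and string, and leaves missing keys as null. With the org.apache.orc.mapred API, each slot would then be wrapped in the matching Writable (IntWritable, LongWritable, Text, ...) and set via setFieldValue on an OrcStruct created from the same TypeDescription; that part is left out so the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not an ORC API): coerces one parsed JSON record into
 * a row whose slots follow the field order and types of a flat ORC schema.
 */
public class JsonToRow {

  /** One top-level field of a flat struct schema, e.g. "id:bigint". */
  public static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Parses "struct<a:int,b:string>"; nested types are NOT supported. */
  public static List<Field> parseSchema(String schema) {
    String s = schema.trim();
    if (!s.startsWith("struct<") || !s.endsWith(">")) {
      throw new IllegalArgumentException("expected struct<...>: " + schema);
    }
    String body = s.substring("struct<".length(), s.length() - 1);
    List<Field> fields = new ArrayList<>();
    for (String part : body.split(",")) {
      int colon = part.indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("bad field: " + part);
      }
      fields.add(new Field(part.substring(0, colon).trim(),
                           part.substring(colon + 1).trim()));
    }
    return fields;
  }

  /** Builds a row in schema order; keys missing from the JSON become null. */
  public static Object[] convert(List<Field> fields, Map<String, ?> json) {
    Object[] row = new Object[fields.size()];
    for (int i = 0; i < fields.size(); i++) {
      Field f = fields.get(i);
      Object value = json.get(f.name);
      row[i] = (value == null) ? null : coerce(f.type, value);
    }
    return row;
  }

  /** Coerces a JSON-parsed value to the Java type matching the ORC type. */
  private static Object coerce(String type, Object value) {
    switch (type) {
      case "boolean":
        return (Boolean) value;
      case "int":
        return ((Number) value).intValue();
      case "bigint":
        return ((Number) value).longValue();
      case "double":
        return ((Number) value).doubleValue();
      case "string":
        return value.toString();
      default:
        throw new IllegalArgumentException("unsupported type: " + type);
    }
  }

  public static void main(String[] args) {
    List<Field> schema =
        parseSchema("struct<id:bigint,name:string,score:double,active:boolean>");
    // The JSON line {"id":42,"name":"octocat","score":3} after parsing:
    Object[] row = convert(schema, Map.of("id", 42, "name", "octocat", "score", 3));
    System.out.println(Arrays.toString(row)); // prints [42, octocat, 3.0, null]
  }
}
```

Splitting the schema on commas only works for flat structs; handling nested structs, lists and maps would call for a real type parser such as ORC's TypeDescription.fromString and a recursive conversion.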
