From: "Owen O'Malley"
Date: Mon, 20 Feb 2017 09:50:37 -0800
Subject: Re: Converting json record to ORC.
To: user@orc.apache.org
Cc: gopalv@apache.org

A few of us have written hacky JSON-to-ORC converters, but we should have
an official one that is more robust. Mine was in this pull request:

https://github.com/apache/orc/pull/43/commits/48a9f3443062bfaee4b684e49b137106bbfe9947#diff-efa8880e64e22de68f1e34c2f1d5b538

where I was converting the GitHub archives data to ORC for benchmarking.

I've created a JIRA, https://issues.apache.org/jira/browse/ORC-150, for
adding one.

.. Owen

On Sun, Feb 19, 2017 at 11:14 PM, Piyush Mukati (Data Platform)
<piyush.mukati@flipkart.com> wrote:

> Hi,
> We have a use case where our MR job has to read from old JSON data
> (each line is a JSON record with a fixed schema) and from ORC files.
> The output of the job will be an ORC file.
>
> I tried some approaches:
>
> 1) HCatalog, but it does not currently support reading from multiple
> tables. The JSON data doesn't have Hive tables either.
>
> 2) The Hive ORC library and SerDe. But I was unable to pass the ORC
> struct through the shuffle phase, as it doesn't implement Writable.
> (I am creating the OrcStruct in the mapper.)
>
> 3) Currently I am checking the org.apache.orc.mapreduce APIs.
> Everything is good here, but I have to convert the existing JSON
> records to OrcStruct. This looks like a common use case; writing a
> converter myself looks like reinventing the wheel.
>
> I am hoping someone in the community is aware of a utility that can
> help me convert JSON to OrcStruct. Any other suggestions are welcome.
>
> Thanks