From: Telco Phone
Reply-To: user@orc.apache.org
Date: Wed, 13 Dec 2017 20:42:26 +0000 (UTC)
Message-ID: <579810703.2883342.1513197746193@mail.yahoo.com>
Subject: Re: convert avro to orc

I wrote my own Avro -> ORC converter. It was a pain, and the code is a bit on the garbage side (not something I would like to share without cleaning it up).

I would like to contribute the code if I thought enough folks would like this kind of option.

Thoughts?



On Friday, December 8, 2017, 12:43:57 AM MST, Oleg Ruchovets <oruchovets@gmail.com> wrote:


Hello Owen.
That is interesting. From your experience, will it support Hive external / managed tables?
My idea was to prepare the ORC object (without Hive) and after that register it as an external Hive table. The motivation is to avoid Hive schema maintenance.

Thanks
Oleg.

On Thu, Dec 7, 2017 at 2:55 AM, Owen O'Malley <owen.omalley@gmail.com> wrote:
It would be a nice addition to the conversion tools. A first pass of converting Avro schemas to ORC would be pretty easy with:

boolean -> boolean
int -> int
long -> long
float -> float
double -> double
bytes -> binary
string -> string
enum -> string
fixed -> binary
map<X> -> map<string,X>
array<X> -> array<X>
record<X,Y,Z> -> struct<X,Y,Z>
union<X,Y,Z> -> union<X,Y,Z>

with special handling for union<null,X> -> X
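
The mapping above is small enough to sketch in a few lines. The Python below is only an illustration and not part of the ORC tools: `avro_to_orc` is a hypothetical helper that walks a parsed Avro schema and emits an ORC type string. It assumes that ORC schema strings spell long as `bigint` and union as `uniontype`, it drops `null` branches from every union (the union<null,X> -> X case, extended to larger unions as an assumption), and it does not handle named type references or logical types.

```python
import json

# Primitive mapping from the list above. Assumption: ORC schema strings
# spell the 64-bit integer type "bigint".
PRIMITIVES = {
    "boolean": "boolean",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def avro_to_orc(schema):
    """Return an ORC type string for a parsed Avro schema (hypothetical helper)."""
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return PRIMITIVES[schema]
        raise ValueError(f"unsupported Avro type: {schema!r}")
    if isinstance(schema, list):
        # Avro union: drop "null" branches, since ORC values can already be null.
        branches = [avro_to_orc(s) for s in schema if s != "null"]
        if not branches:
            raise ValueError("union contains only null")
        if len(branches) == 1:
            return branches[0]
        return "uniontype<" + ",".join(branches) + ">"
    kind = schema["type"]
    if kind == "record":
        fields = ",".join(
            f"{f['name']}:{avro_to_orc(f['type'])}" for f in schema["fields"]
        )
        return f"struct<{fields}>"
    if kind == "enum":
        return "string"
    if kind == "fixed":
        return "binary"
    if kind == "array":
        return f"array<{avro_to_orc(schema['items'])}>"
    if kind == "map":
        # Avro map keys are always strings.
        return f"map<string,{avro_to_orc(schema['values'])}>"
    # Wrapped form such as {"type": "int"}: unwrap and map the inner type.
    return avro_to_orc(kind)


if __name__ == "__main__":
    avro = json.loads("""
    {"type": "record", "name": "example", "fields": [
        {"name": "id", "type": "long"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "note", "type": ["null", "string"]}
    ]}
    """)
    print(avro_to_orc(avro))  # struct<id:bigint,tags:array<string>,note:string>
```

The resulting string has the same shape as a Hive column type, so it could also serve as the starting point for deriving a Hive schema.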

In terms of the conversion, you would just need to extend ConvertTool to create RecordReaders for Avro. There are already examples of JSON and CSV.
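
As a rough picture of what a batch-oriented reader for Avro would have to do, here is a minimal Python sketch. It is not the ORC Java API: `AvroBatchReader` and `next_batch` are hypothetical names, and the sketch assumes the Avro records have already been decoded into dicts. It only shows the core step of turning row-oriented records into fixed-size column batches.

```python
from itertools import islice


class AvroBatchReader:
    """Hypothetical reader: turns row-oriented records into column batches."""

    def __init__(self, records, columns, batch_size=1024):
        self._records = iter(records)  # already-decoded Avro records (dicts)
        self._columns = list(columns)  # top-level field names, in schema order
        self._batch_size = batch_size
        self.rows_read = 0

    def next_batch(self):
        """Return {column: [values]} for the next batch, or None at end of input."""
        rows = list(islice(self._records, self._batch_size))
        if not rows:
            return None
        self.rows_read += len(rows)
        # A field missing from a record becomes None (a null in that column).
        return {c: [r.get(c) for r in rows] for c in self._columns}


if __name__ == "__main__":
    reader = AvroBatchReader(
        [{"id": 1, "name": "a"}, {"id": 2}, {"id": 3, "name": "c"}],
        columns=["id", "name"],
        batch_size=2,
    )
    batch = reader.next_batch()
    while batch is not None:
        print(batch)
        batch = reader.next_batch()
```

A real reader would decode the Avro container file itself and fill the writer's column vectors directly instead of building Python lists.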

.. Owen


On Mon, Dec 4, 2017 at 11:31 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
Hello.
I wonder if there is a utility to convert Avro to ORC, similar to the one for JSON to ORC?

Background of what I am doing:
I am reading SQL data using NiFi, which returns the data in Avro format. I want to store this data on S3 in ORC format and use it for a Hive external table. For that, I need to convert Avro to ORC and derive the Hive schema. NiFi has an Avro-to-ORC component, but it only supports older versions of Hive and ORC.

So the question is how to convert Avro to ORC and derive the Hive schema. I really like the utility that you guys built for JSON: it does both the conversion to ORC and the Hive schema extraction. What is the way to achieve the same for the Avro format?
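
For the last step described here, registering ORC files that already sit on S3 as an external Hive table, the DDL can be generated from the column names and types. The Python below is a minimal sketch under stated assumptions: `hive_external_table_ddl` is a hypothetical helper, and the table name, column list, and bucket path in the usage are made up for illustration.

```python
def hive_external_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for ORC data (hypothetical helper).

    columns is a list of (name, hive_type) pairs, for example ("id", "bigint").
    """
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n{cols}\n)\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Illustrative values only: table name, columns, and bucket are assumptions.
    print(
        hive_external_table_ddl(
            "events",
            [("id", "bigint"), ("tags", "array<string>")],
            "s3a://example-bucket/events/",
        )
    )
```

Because the table is external, dropping it in Hive leaves the ORC files in place, which fits the goal of preparing the data without Hive.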

Thanks
Oleg.

