From: Jeff Evans
Date: Wed, 30 Jan 2019 15:17:08 -0600
Subject: Re: Orc core (Java) dependency on hadoop-common
To: user@orc.apache.org

To close the loop on this admittedly old thread, we now have some code
that performs this conversion as part of our open source product. I'm
mentioning it here in case anyone else finds it useful, or has any
feedback on implementation, bugs, invalid assumptions, etc.

This class converts an Avro schema to an ORC schema:
https://github.com/streamsets/datacollector/blob/master/mapreduce-protolib/src/main/java/com/streamsets/pipeline/lib/util/avroorc/AvroToOrcSchemaConverter.java

This class converts an Avro file to an ORC file (using a schema built
using the above):
https://github.com/streamsets/datacollector/blob/master/mapreduce-protolib/src/main/java/com/streamsets/pipeline/lib/util/avroorc/AvroToOrcRecordConverter.java

Both make use of a utility class:
https://github.com/streamsets/datacollector/blob/master/commonlib/src/main/java/com/streamsets/pipeline/lib/util/AvroTypeUtil.java

There are some test cases here:
https://github.com/streamsets/datacollector/tree/master/mapreduce-protolib/src/test/java/com/streamsets/pipeline/lib/util/avroorc

Thanks for all the info before, and any feedback/critiques are welcome!

On Wed, Jan 17, 2018 at 4:55 PM Jeff Evans wrote:
>
> Thanks, István and Owen!
>
> I appreciate the input. At the moment, I'm developing against
> orc-core 1.4.1. I think I will go the route of excluding
> hadoop-common from the orc-core dependency and explicitly scoping it
> as provided. For modules that will ultimately live in one of our Hadoop
> deployments, this should work fine. Moreover, we already adopt this
> sort of packaging strategy in our project, so it wouldn't be too much
> of a stretch. For "standalone" operation, I will probably just
> create a separate module that explicitly declares hadoop-common as a
> compile dependency, so those not on Hadoop can simply bring in the
> same version that orc itself specifies.
>
> I think the longer term approach you describe makes good sense,
> István.
> Unfortunately, given other priorities, I wouldn't be able to
> devote any time to it in the near future. As far as Hadoop
> versioning, I think the minimum/desired approach outlined in Owen's
> message would work fine.
>
> On Wed, Jan 17, 2018 at 3:58 PM, István wrote:
> > Hi Jeff,
> >
> > A few months back I was wondering about the same topic. Unfortunately,
> > dependency management and importing libraries is not the strongest suit
> > of Hadoop-related libraries, and that includes ORC. With our project, we
> > got to the point where we considered forking ORC and creating our own
> > version of it, because we want to use it outside Hadoop. Unfortunately,
> > Hadoop-related code is all over the place, so we decided to just exclude
> > a bunch of libraries, and we ended up with a pom.xml like this:
> >
> > https://gist.github.com/l1x/0c00fe69bdcb6db305e0bffae042817c
> >
> > Keep in mind this is an older version of ORC that is included in the Hive
> > 1.2.1 release. I also started work on a project to make dealing with
> > Hadoop dependencies easier, but we dropped that project altogether.
> >
> > I think what would be reasonable is to have libraries like ORC at the
> > bottom of the dependency stack (orc-core) and create a library that
> > provides an interface for Hadoop or any project that wants to use this
> > file format (orc-hadoop, orc-something, etc.) so that we don't have the
> > dependency hell that you can see in projects like ORC. I am not sure who
> > else is interested in such a project, but if you are, I think I could
> > provide you some development time.
> >
> > Owen was really helpful with these efforts. See more here:
> > https://issues.apache.org/jira/browse/ORC-151
> > https://github.com/apache/orc/pull/96
> >
> > Thanks,
> > Istvan
> >
> > On Wed, Jan 17, 2018 at 6:16 PM, Jeff Evans
> > wrote:
> >>
> >> Hi,
> >>
> >> I am a software engineer with StreamSets, and am working on a project
> >> to incorporate ORC support into our product.
> >> The first phase of this
> >> will be to support Avro to ORC conversion. (I saw a post on this topic
> >> on this list a couple of months ago, before I joined. Would be happy to
> >> share more details/code for scrutiny once it's closer to completion.)
> >>
> >> One issue I'm running into is the dependency of orc-core on
> >> hadoop-common. Our product can be deployed in a variety of Hadoop
> >> distributions from different vendors, and also standalone (i.e. not in
> >> Hadoop at all). Therefore, this dependency makes it difficult for us
> >> to incorporate orc-core in a central way in our codebase (since the
> >> vendor typically provides this jar in their installation). Besides
> >> that, hadoop-common also brings in a number of other problematic
> >> dependencies for us (the deprecated com.sun.jersey group for Jersey,
> >> and zookeeper, to name a couple).
> >>
> >> Does anyone have suggestions for how to work around this? It seems
> >> the only actual classes I reference are the same ones referenced in
> >> the core-java tutorial (org.apache.hadoop.conf.Configuration and
> >> org.apache.hadoop.fs.Path), although obviously the library may be
> >> making use of more itself. Are there any plans to remove the
> >> dependency on Hadoop down the line, or should I accommodate this by
> >> shuffling our dependencies such that our code only lives in a
> >> Hadoop-provided packaging configuration? Any insight is appreciated.
> >
> > --
> > the sun shines for all
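[Editor's note: as a rough, self-contained illustration of the kind of primitive-type mapping an Avro-to-ORC schema converter performs (the class and method names below are hypothetical, not the StreamSets code linked above; nested types like records, arrays, maps, and unions are omitted):]

```java
import java.util.Map;

// Hypothetical sketch of an Avro-primitive to ORC-type mapping.
// ORC type names here are the strings TypeDescription.fromString accepts;
// note that Avro "long" corresponds to ORC "bigint" and Avro "bytes" to
// ORC "binary", while the other primitives keep the same name.
public class AvroOrcTypes {
    private static final Map<String, String> PRIMITIVES = Map.of(
            "boolean", "boolean",
            "int",     "int",
            "long",    "bigint",
            "float",   "float",
            "double",  "double",
            "bytes",   "binary",
            "string",  "string");

    public static String toOrcType(String avroType) {
        String orc = PRIMITIVES.get(avroType);
        if (orc == null) {
            throw new IllegalArgumentException("Unsupported Avro type: " + avroType);
        }
        return orc;
    }
}
```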
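[Editor's note: the exclusion-plus-provided-scope approach discussed in the thread might look roughly like the following pom.xml fragment; the version numbers are illustrative assumptions, not values taken from the thread:]

```xml
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.1</version>
  <exclusions>
    <!-- Exclude hadoop-common so the Hadoop vendor's jar is used at runtime -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
  <!-- provided: present at compile time, supplied by the deployment -->
  <scope>provided</scope>
</dependency>
```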