From: István
Date: Wed, 17 Jan 2018 22:58:40 +0100
Subject: Re: Orc core (Java) dependency on hadoop-common
To: user@orc.apache.org

Hi Jeff,

A few months back I was wondering about the same topic. Unfortunately, dependency management and importing libraries is not the strongest suit of Hadoop-related libraries, and that includes ORC. Our project got to the point where we considered forking ORC and creating our own version of it, because we want to use it outside Hadoop. Unfortunately, Hadoop-related code is all over the place, so we decided to just exclude a bunch of libraries, and we ended up with a pom.xml like this:

https://gist.github.com/l1x/0c00fe69bdcb6db305e0bffae042817c

Keep in mind this is an older version of ORC, the one included in the Hive 1.2.1 release. I also started to work on a project to make dealing with Hadoop dependencies easier, but we dropped that project altogether.

I think what would be reasonable is to have libraries like ORC at the bottom of the dependency stack (orc-core), and to create a library that provides an interface for Hadoop or any other project that wants to use this file format (orc-hadoop, orc-something, etc.), so that we don't have the dependency hell you can see in projects like ORC today. I am not sure who else is interested in such a project, but if you are, I think I could provide some development time.

Owen was really helpful with these efforts. See more here:
https://issues.apache.org/jira/browse/ORC-151
https://github.com/apache/orc/pull/96

Thanks,
Istvan

On Wed, Jan 17, 2018 at 6:16 PM, Jeff Evans wrote:
> Hi,
>
> I am a software engineer with StreamSets, and am working on a project
> to incorporate ORC support into our product. The first phase of this
> will be to support Avro to ORC conversion. (I saw a post on this topic
> to this list a couple of months ago, before I joined. I would be happy to
> share more details/code for scrutiny once it's closer to completion.)
>
> One issue I'm running into is the dependency of orc-core on
> hadoop-common. Our product can be deployed in a variety of Hadoop
> distributions from different vendors, and also standalone (i.e. not in
> Hadoop at all). Therefore, this dependency makes it difficult for us
> to incorporate orc-core in a central way in our codebase (since the
> vendor typically provides this jar in their installation). Besides
> that, hadoop-common also brings in a number of other problematic
> dependencies for us (the deprecated com.sun.jersey group for Jersey,
> and zookeeper, to name a couple).
>
> Does anyone have suggestions for how to work around this? It seems
> the only actual classes I reference are the same ones referenced in
> the core-java tutorial (org.apache.hadoop.conf.Configuration and
> org.apache.hadoop.fs.Path), although obviously the library may be
> making use of more itself. Are there any plans to remove the
> dependency on Hadoop down the line, or should I accommodate this by
> shuffling our dependencies such that our code only lives in a
> Hadoop-provided packaging configuration? Any insight is appreciated.

--
the sun shines for all
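[Editor's note: for readers who land here with the same problem, the exclusion approach Istvan describes has roughly this shape in a pom.xml. This is only a sketch; the version number is illustrative, and the exclusions shown are just the two groups Jeff names (Jersey and ZooKeeper) — the actual, much longer list Istvan used is in the gist linked above.]

```xml
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.1</version> <!-- illustrative version -->
  <exclusions>
    <!-- Transitive dependencies pulled in via hadoop-common that the
         thread calls out as problematic; extend this list as needed. -->
    <exclusion>
      <groupId>com.sun.jersey</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```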
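[Editor's note: the "core-java tutorial" usage Jeff refers to boils down to the writer setup below. This is a minimal sketch, assuming orc-core and its transitive Hadoop jars are on the classpath; the file name and schema are made up for illustration. It shows that even when your own code touches only Configuration and Path, creating a writer still requires hadoop-common at runtime, which is exactly the coupling discussed in this thread.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteSketch {
    public static void main(String[] args) throws Exception {
        // The only Hadoop classes referenced directly by application code:
        // Configuration and Path.
        Configuration conf = new Configuration();
        TypeDescription schema =
                TypeDescription.fromString("struct<x:int,y:string>");

        // OrcFile.createWriter still drags in hadoop-common internals
        // (filesystem, compression codecs, etc.) behind the scenes.
        Writer writer = OrcFile.createWriter(new Path("example.orc"),
                OrcFile.writerOptions(conf).setSchema(schema));
        writer.close();
    }
}
```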