orc-user mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: Orc core (Java) dependency on hadoop-common
Date Wed, 17 Jan 2018 18:23:09 GMT
Which version of ORC are you using? We've been pretty aggressive about
keeping the dependencies down as much as possible. In the upcoming 1.5,
we've made the support more flexible: we define a minimum and a desired
version of Hadoop, and ORC dynamically uses the features from the newer
version when the Hadoop version you are running supports them.

There have been some users talking about trying to remove it, but it is
pretty complicated.

The hard bits:
* Configuration
* FileSystem
* zlib compression codec
* HDFS (in 1.5 for controlling the variable length blocks via shims)
* KeyProvider (for upcoming column encryption via shims)

You should probably look at the shims module on the master branch that was
refactored in ORC-234 and ORC-91. We made the non-shims modules depend only
on Hadoop 2.2, but the shims module depends on Hadoop 2.7. Thus we can
ensure that the core unit tests run with Hadoop 2.2 and yet have access to
the features that were only added in Hadoop 2.7+.
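
To make the idea concrete, here is a minimal sketch of the general pattern:
probe at runtime for a newer feature and fall back to a baseline
implementation when it is missing. This is not the actual ORC shims code.
The names ShimSketch, Shims, BaselineShims and ModernShims are hypothetical,
and the probe uses a JDK class in place of a Hadoop one so that the example
runs without any Hadoop jar on the classpath.

```java
import java.lang.reflect.Method;

public class ShimSketch {
    /** Hypothetical shim interface; core code would only talk to this. */
    interface Shims {
        String describe();
    }

    /** Fallback that relies only on the minimum supported version. */
    static final class BaselineShims implements Shims {
        public String describe() { return "baseline"; }
    }

    /** Variant that would call APIs only present in the newer version. */
    static final class ModernShims implements Shims {
        public String describe() { return "modern"; }
    }

    /** True if the class can be loaded and has a public method of that name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or unusable: treat the feature as unavailable.
            return false;
        }
    }

    /** Picks the richest implementation the runtime classpath supports. */
    static Shims select(String className, String methodName) {
        return hasMethod(className, methodName)
            ? new ModernShims()
            : new BaselineShims();
    }

    public static void main(String[] args) {
        // Stand-ins for "is the newer Hadoop API present?"
        System.out.println(select("java.lang.String", "isEmpty").describe());
        System.out.println(select("org.example.Missing", "foo").describe());
    }
}
```

In a real setup the class that touches the newer API would live in its own
module, compiled against the newer version, so that the rest of the code
still compiles and tests against the minimum version.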

So would that level of version flexibility be enough, or do you need more?

.. Owen

On Wed, Jan 17, 2018 at 9:16 AM, Jeff Evans <jeffrey.wayne.evans@gmail.com>
wrote:

> Hi,
> I am a software engineer with StreamSets, and am working on a project
> to incorporate ORC support into our product.  The first phase of this
> will be to support Avro to ORC conversion. (I saw a post on this topic
> to this list a couple months ago, before I joined.  Would be happy to
> share more details/code for scrutiny once it's closer to completion.)
> One issue I'm running into is the dependency of orc-core on
> hadoop-common.  Our product can be deployed in a variety of Hadoop
> distributions from different vendors, and also standalone (i.e. not in
> Hadoop at all).  Therefore, this dependency makes it difficult for us
> to incorporate orc-core in a central way in our codebase (since the
> vendor typically provides this jar in their installation).  Besides
> that, hadoop-common also brings in a number of other problematic
> dependencies for us (the deprecated com.sun.jersey group for Jersey
> and zookeeper, to name a couple).
> Does anyone have suggestions for how to work around this?  It seems
> the only actual classes I reference are the same ones referenced in
> the core-java tutorial (org.apache.hadoop.conf.Configuration and
> org.apache.hadoop.fs.Path), although obviously the library may be
> making use of more itself.  Are there any plans to remove the
> dependency on Hadoop down the line, or should I accommodate this by
> shuffling our dependencies such that our code only lives in a
> Hadoop-provided packaging configuration?  Any insight is appreciated.
