orc-user mailing list archives

From István <lecc...@gmail.com>
Subject Re: ORC without Hadoop
Date Wed, 22 Feb 2017 19:11:36 GMT
Thanks Owen for explaining.

I understand that ORC was originally developed in Hadoop land, but by
now there are several other use cases that do not require HDFS or Hadoop.
I have looked into Configuration, and we are initialising the Writer with a
configuration that has 100+ entries that are totally irrelevant to writing
ORC files locally. I prefer composable systems built from smaller building
blocks to pulling in 106 packages, several of them duplicates of, or in
conflict with, other packages. I am going to look into how easy it would be
to split out orc-core and make it independent of Hadoop (using only the
Hive library for the VectorizedRowBatch if necessary).
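
To make the idea concrete, here is a minimal sketch of what a Hadoop-free
core might depend on in place of Configuration and FileSystem. Every name in
it (HadoopFreeSketch, WriterConf, Storage, MapConf, LocalStorage) is
hypothetical and does not exist in orc-core; "orc.compress" is used only as
an example key. It does not write ORC data, it only shows that the plumbing
for settings and local file access can be built on the JDK alone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: none of these types exist in orc-core.
final class HadoopFreeSketch {

    /** Stand-in for the handful of settings a writer actually reads. */
    interface WriterConf {
        String get(String key, String defaultValue);
    }

    /** Stand-in for the small part of a file system needed to write and read a file. */
    interface Storage {
        OutputStream create(Path path) throws IOException;
        InputStream open(Path path) throws IOException;
    }

    /** Map-backed configuration that starts empty rather than with 100+ defaults. */
    static final class MapConf implements WriterConf {
        private final Map<String, String> values = new HashMap<>();

        MapConf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        @Override
        public String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Local-disk storage built only on java.nio.file. */
    static final class LocalStorage implements Storage {
        @Override
        public OutputStream create(Path path) throws IOException {
            return Files.newOutputStream(path);
        }

        @Override
        public InputStream open(Path path) throws IOException {
            return Files.newInputStream(path);
        }
    }

    public static void main(String[] args) throws IOException {
        WriterConf conf = new MapConf().set("orc.compress", "ZLIB");
        Storage storage = new LocalStorage();
        Path file = Files.createTempFile("sketch", ".bin");
        try (OutputStream out = storage.create(file)) {
            // A real writer would emit ORC stripes here; this only exercises the plumbing.
            out.write(conf.get("orc.compress", "NONE").getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = storage.open(file)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        Files.delete(file);
    }
}
```

A writer coded against two interfaces like these would need Hadoop only in
an optional adapter module, which is roughly the split I would like to
explore.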

Here is the current dependency tree of hadoop-common:

[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.6.4:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-annotations:jar:2.6.4:compile
[INFO] |  |  +- com.google.guava:guava:jar:11.0.2:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  |  +- (commons-logging:commons-logging:jar:1.0.4:compile - omitted for conflict with 1.1.3)
[INFO] |  |  |  \- (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.4)
[INFO] |  |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  |  +- commons-net:commons-net:jar:3.1:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] |  |  +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] |  |  +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |  |  |     +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] |  |  |  |     \- javax.activation:activation:jar:1.1:compile
[INFO] |  |  |  +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] |  |  |  |  +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  |  \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  +- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] |  |  |  |  +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  |  \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] |  |  |  \- (com.sun.jersey:jersey-core:jar:1.9:compile - omitted for duplicate)
[INFO] |  |  +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] |  |  |  +- asm:asm:jar:3.1:compile
[INFO] |  |  |  \- (com.sun.jersey:jersey-core:jar:1.9:compile - omitted for duplicate)
[INFO] |  |  +- tomcat:jasper-compiler:jar:5.5.23:runtime
[INFO] |  |  +- tomcat:jasper-runtime:jar:5.5.23:runtime
[INFO] |  |  |  \- (commons-el:commons-el:jar:1.0:runtime - omitted for duplicate)
[INFO] |  |  +- commons-el:commons-el:jar:1.0:runtime
[INFO] |  |  |  \- (commons-logging:commons-logging:jar:1.0.3:runtime - omitted for conflict with 1.0.4)
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] |  |  +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
[INFO] |  |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  |  +- (commons-codec:commons-codec:jar:1.4:compile - omitted for duplicate)
[INFO] |  |  |  +- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] |  |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  |  |  \- (org.apache.httpcomponents:httpcore:jar:4.1.2:compile - omitted for duplicate)
[INFO] |  |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
[INFO] |  |  |  \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:compile
[INFO] |  |  +- (commons-lang:commons-lang:jar:2.6:compile - omitted for duplicate)
[INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  +- (commons-collections:commons-collections:jar:3.2.1:compile - omitted for conflict with 3.2.2)
[INFO] |  |  |  +- (commons-lang:commons-lang:jar:2.4:compile - omitted for conflict with 2.6)
[INFO] |  |  |  +- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  +- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  |  |  \- (commons-logging:commons-logging:jar:1.0.3:compile - omitted for conflict with 1.1.3)
[INFO] |  |  |  |  \- (commons-logging:commons-logging:jar:1.1:compile - omitted for conflict with 1.1.3)
[INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  |     \- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] |  |  +- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  +- (org.slf4j:slf4j-log4j12:jar:1.7.5:compile - scope updated from runtime; omitted for duplicate)
[INFO] |  |  +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
[INFO] |  |  +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
[INFO] |  |  +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
[INFO] |  |  +- com.google.code.gson:gson:jar:2.2.4:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.6.4:compile
[INFO] |  |  |  +- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  +- (commons-codec:commons-codec:jar:1.4:compile - omitted for duplicate)
[INFO] |  |  |  +- (log4j:log4j:jar:1.2.17:runtime - omitted for duplicate)
[INFO] |  |  |  +- (org.slf4j:slf4j-log4j12:jar:1.7.5:runtime - omitted for duplicate)
[INFO] |  |  |  +- (org.apache.httpcomponents:httpclient:jar:4.2.5:compile - omitted for conflict with 4.1.2)
[INFO] |  |  |  +- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:compile
[INFO] |  |  |  |  +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:compile
[INFO] |  |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  |  +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:compile
[INFO] |  |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  |  +- org.apache.directory.api:api-util:jar:1.0.0-M20:compile
[INFO] |  |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] |  |  |  \- org.apache.curator:curator-framework:jar:2.6.0:compile
[INFO] |  |  |     +- (org.apache.curator:curator-client:jar:2.6.0:compile - omitted for duplicate)
[INFO] |  |  |     +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] |  |  |     \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] |  |  +- com.jcraft:jsch:jar:0.1.42:compile
[INFO] |  |  +- org.apache.curator:curator-client:jar:2.6.0:compile
[INFO] |  |  |  +- (org.slf4j:slf4j-api:jar:1.7.6:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] |  |  |  \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] |  |  +- org.apache.curator:curator-recipes:jar:2.6.0:compile
[INFO] |  |  |  +- (org.apache.curator:curator-framework:jar:2.6.0:compile - omitted for duplicate)
[INFO] |  |  |  +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] |  |  |  \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] |  |  +- org.htrace:htrace-core:jar:3.0.4:compile
[INFO] |  |  |  +- (com.google.guava:guava:jar:12.0.1:compile - omitted for conflict with 11.0.2)
[INFO] |  |  |  \- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] |  |  |  +- (org.slf4j:slf4j-api:jar:1.6.1:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  +- org.slf4j:slf4j-log4j12:jar:1.6.1:compile
[INFO] |  |  |  |  +- (org.slf4j:slf4j-api:jar:1.6.1:compile - omitted for conflict with 1.7.7)
[INFO] |  |  |  |  \- (log4j:log4j:jar:1.2.16:compile - omitted for conflict with 1.2.17)
[INFO] |  |  |  +- (log4j:log4j:jar:1.2.16:compile - omitted for conflict with 1.2.17)
[INFO] |  |  |  \- io.netty:netty:jar:3.7.0.Final:compile
[INFO] |  |  \- (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.8.1)
[INFO] |  +- org.apache.hive:hive-storage-api:jar:2.2.0:compile
[INFO] |  |  +- (commons-lang:commons-lang:jar:2.6:compile - omitted for duplicate)
[INFO] |  |  \- (org.slf4j:slf4j-api:jar:1.7.10:compile - omitted for conflict with 1.7.7)
[INFO] |  \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)


Regards,
Istvan

On Wed, Feb 22, 2017 at 5:31 PM, Owen O'Malley <omalley@apache.org> wrote:

>
> On Wed, Feb 22, 2017 at 12:41 AM, István <leccine@gmail.com> wrote:
>
>> Hi,
>>
>> I was wondering how hard it would be to drop Hadoop as a dependency from
>> ORC.
>>
>
> We could make a new module that removes the Hadoop dependency. The
> fundamental parts we would need to abstract out are:
>
> * Configuration
> * FileSystem
>
> The biggest concern is API compatibility and making sure that we don't
> break users.
>
> Another concern is that we'd need to change the storage-api jar to not
> depend on Hadoop either. That would be harder in some ways, because it has
> some uses of the Writable interfaces.
>
>
>> I need Hadoop because I would like to set a path (not on HDFS) for the
>> ORC file and OrcFile requires an empty Hadoop config. If I am not mistaken,
>> this could be achieved without using the Hadoop libraries.
>>
>
> You shouldn't need hdfs or an empty hadoop config. My Mac laptop can use
> the orc-tools-1.3.3-uber.jar to read ORC files from local disk without
> Hadoop (or its configuration) installed. The uber tools jar has the Hadoop
> jars included, but it doesn't have an impact other than making the size
> larger.
>
> I've filed a jira, https://issues.apache.org/jira/browse/ORC-151, for going
> through and excluding more of the transitive dependencies from the direct
> dependencies, especially the hadoop jar.
>
> So
>
>
>> Does anybody have a solution for avoiding Hadoop libraries in an ORC
>> project?
>>
>> Thank you in advance,
>> Istvan
>>
>> --
>> the sun shines for all
>>
>>
>>
>
>


-- 
the sun shines for all
