hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11804) POC Hadoop Client w/o transitive dependencies
Date Tue, 08 Nov 2016 22:00:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648921#comment-15648921 ]

Andrew Wang commented on HADOOP-11804:
--------------------------------------

Thanks for the rev, Sean. I tried it with Avro and got a NoClassDefFoundError for Log4J:

{noformat}
testSort(org.apache.avro.mapred.TestAvroTextSort)  Time elapsed: 0.051 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/log4j/Level
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:356)
	at org.apache.avro.mapred.TestAvroTextSort.testSort(TestAvroTextSort.java:37)
{noformat}
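
One quick way to confirm whether a class such as org.apache.log4j.Level is visible on the test classpath is a small probe like the one below. This is only a sketch: ClasspathProbe and isLoadable are hypothetical names for illustration, not part of Hadoop or Avro.

```java
// Hypothetical helper for illustration; not part of Hadoop or Avro.
final class ClasspathProbe {

    // Returns true if the named class can be located by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it links against, is missing from the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.log4j.Level";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running this with the same classpath as the failing test should show whether log4j is actually being pulled in.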

I think this is expected based on the contents of the hadoop-client-runtime pom.xml, which
marks log4j as optional. I manually added this dependency, and then hit a NullPointerException:

{noformat}
testReadAvro(org.apache.avro.hadoop.io.TestAvroSequenceFile)  Time elapsed: 0.016 sec  <<< ERROR!
java.lang.NullPointerException: null
	at org.apache.hadoop.io.serializer.SerializationFactory.<init>(SerializationFactory.java:58)
	at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1248)
	at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1207)
	at org.apache.avro.hadoop.io.AvroSequenceFile$Writer.<init>(AvroSequenceFile.java:532)
	at org.apache.avro.hadoop.io.TestAvroSequenceFile.writeSequenceFile(TestAvroSequenceFile.java:200)
	at org.apache.avro.hadoop.io.TestAvroSequenceFile.testReadAvro(TestAvroSequenceFile.java:53)
{noformat}

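I decompiled the SerializationFactory class and noticed that the shading had rewritten the
config key string: the relocated class looks up "org.apache.hadoop.shaded.io.serializations"
instead of the original key, so the lookup presumably returns null and triggers the NPE above.
I think we need to add some kind of exclusion for CommonConfigurationKeysPublic.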

{code}
    // before
    if (conf.get(CommonConfigurationKeys.IO_SERIALIZATIONS_KEY).equals("")) {
    // decompiled
    if (conf.get("org.apache.hadoop.shaded.io.serializations").equals("")) {
{code}
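
To illustrate why a rewritten key would produce the NPE, here is a minimal sketch that does not use Hadoop itself. FakeConf is a hypothetical stand-in for Configuration, assuming only that get() returns null for a key that has no value, and the "default" entry mimics a value that would normally come from the default configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo; FakeConf only mimics "get() returns null for an unset key".
final class RewrittenKeyDemo {

    static final class FakeConf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); } // null when the key is unset
    }

    static final String ORIGINAL_KEY = "io.serializations";
    static final String REWRITTEN_KEY = "org.apache.hadoop.shaded.io.serializations";

    // Same shape as the check in the snippet above: dereferences the lookup result.
    static boolean isEmpty(FakeConf conf, String key) {
        return conf.get(key).equals("");
    }

    public static void main(String[] args) {
        FakeConf conf = new FakeConf();
        // Placeholder value standing in for whatever the defaults would provide.
        conf.set(ORIGINAL_KEY, "some.Serialization");

        System.out.println("original key empty? " + isEmpty(conf, ORIGINAL_KEY));
        try {
            isEmpty(conf, REWRITTEN_KEY);
        } catch (NullPointerException e) {
            // Nothing is registered under the relocated name, so get() returned null.
            System.out.println("rewritten key lookup threw NullPointerException");
        }
    }
}
```

The point is that only the string literal changed, not the configuration data, so any default registered under the original name is invisible to the relocated class.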

Here's my Avro diff for master (without the log4j addition) if you want to try this yourself:

https://gist.github.com/anonymous/c064c283348a2d1bbec00845678339f9

> POC Hadoop Client w/o transitive dependencies
> ---------------------------------------------
>
>                 Key: HADOOP-11804
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11804
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>         Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, HADOOP-11804.3.patch, HADOOP-11804.4.patch, HADOOP-11804.5.patch, HADOOP-11804.6.patch, HADOOP-11804.7.patch
>
>
> Make a hadoop-client-api and hadoop-client-runtime that, e.g., HBase can use to talk with a Hadoop cluster without seeing any of the implementation dependencies.
> see proposal on parent for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
