avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable
Date Tue, 25 Jan 2011 02:45:11 GMT
That looks like a bug in org.apache.pig.piggybank.storage.avro.AvroStorage():
it is not using Hadoop's wildcard/glob matching. You'll want to file a bug against it
(it is part of piggybank, not Avro).

If the schema is being read dynamically from the files, it needs to use Hadoop's globbing
to resolve one of the files in order to read it and inspect the schema, rather than taking
the passed-in string literally.
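
A minimal sketch of that idea, not AvroStorage's actual code: expand the pattern first, then open one concrete match to inspect its schema. To stay self-contained it uses java.nio's glob support on a local directory as a stand-in; on HDFS the analogous call would be Hadoop's FileSystem.globStatus(Path). The class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: resolve a glob to concrete files before opening any of them.
// This is a local-filesystem stand-in for what a loader would do with Hadoop's globbing.
class GlobFirst {

    // Expand a pattern such as "*.avro" inside dir into a sorted list of matches.
    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    // Pick one concrete file to inspect for its schema; fail clearly if nothing matches.
    static Path firstMatch(Path dir, String glob) throws IOException {
        List<Path> matches = expand(dir, glob);
        if (matches.isEmpty()) {
            throw new IOException("No files match " + glob + " in " + dir);
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(firstMatch(dir, "*.avro"));
    }
}
```

The point of the design is only the ordering: the literal string '/user/felix/avro/*.avro' is never handed to open(); a resolved file path is.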

At this point, you'll need help from those who contributed AvroStorage to piggybank. The Pig
JIRA issue related to it is:
https://issues.apache.org/jira/browse/PIG-1748
But that may not be the best place for this usability question.



On 1/24/11 6:14 PM, "felix gao" <gre1600@gmail.com<mailto:gre1600@gmail.com>>
wrote:

Thanks for the info. I have not compiled a new version of Pig and it works when I load a single
Avro file, but it fails when I use wildcard filename matching.
log_load = LOAD '/user/felix/avro/access_log.test.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
  <--- works fine
but
log_load = LOAD '/user/felix/avro/*.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

ERROR 1018: Problem determining schema during load

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem
determining schema during load
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1342)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1286)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:460)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:738)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema
during load
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:752)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1336)
    ... 8 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining
schema during load
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:156)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:750)
    ... 10 more
Caused by: java.io.FileNotFoundException: File does not exist: /user/felix/avro/*.avro
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:185)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:431)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:181)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:133)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:108)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:233)
    at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:169)
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:150)
    ... 11 more


How do I load multiple Avro files with the load function?

Felix


On Mon, Jan 24, 2011 at 4:25 PM, Scott Carey <scott@richrelevance.com<mailto:scott@richrelevance.com>>
wrote:
A jar that comes before the Jackson jar on the classpath contains an earlier version of Jackson inside it.

Pig's jar typically bundles all of its dependencies (there is also 'pig-withouthadoop.jar' as an alternative).

So my guess is that one of these has Jackson in it (look at a listing of the jar contents):

/usr/lib/pig/bin/../pig-0.7.0+16-core.jar
/usr/lib/pig/bin/../pig-0.7.0+16.jar
/usr/lib/pig/bin/../build/pig-*-core.jar
/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar
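
One way to do that listing programmatically, as a rough sketch: open each candidate jar and check whether it carries the class file for org.codehaus.jackson.JsonFactory. The helper class below is hypothetical and uses only java.util.jar; wildcard entries such as pig-*-core.jar would need to be expanded to real file names before being passed in.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: report which jars bundle a given class file.
class JarScan {

    // Entry name inside a jar for org.codehaus.jackson.JsonFactory.
    static final String JACKSON_ENTRY = "org/codehaus/jackson/JsonFactory.class";

    // True if the jar at jarPath contains the named entry.
    static boolean contains(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    // Return, in the order given, the jars that contain the entry.
    // Keeping classpath order matters: the first hit is the copy that gets loaded.
    static List<String> jarsContaining(List<String> jarPaths, String entryName) throws IOException {
        List<String> hits = new ArrayList<>();
        for (String path : jarPaths) {
            if (contains(path, entryName)) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Pass jar paths in classpath order as command-line arguments.
        for (String hit : jarsContaining(Arrays.asList(args), JACKSON_ENTRY)) {
            System.out.println(hit);
        }
    }
}
```

If more than one jar is reported, the one listed first on the classpath is the likely source of the older Jackson classes.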


On 1/24/11 4:06 PM, "felix gao" <gre1600@gmail.com<mailto:gre1600@gmail.com>>
wrote:

Here is the actual process that is running the Pig script; I hope this helps.

root     20820 19838 66 18:57 pts/0    00:00:00 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/..
-Dpig.root.logger=INFO,console,DRFA -classpath /usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop
-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/*
org.apache.pig.Main avro.pig


pig -secretDebugCmd
dry run:
/usr/java/default/bin/java -Xmx1000m -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/..
-Dpig.root.logger=INFO,console,DRFA -classpath /usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig/bin/../pig-0.7.0+16-core.jar:/usr/lib/pig/bin/../pig-0.7.0+16.jar:/usr/lib/pig/bin/../build/pig-*-core.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0-test.jar:/usr/lib/pig/bin/../lib/hbase-0.20.0.jar:/usr/lib/pig/bin/../lib/zookeeper-hbase-1329.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.6.jar:/usr/lib/hadoop-0.20/lib/hadoop-thriftfs-0.20.2+737.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jdiff:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/native:/usr/lib/hadoop
-0.20/lib/native_libs.tar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo.0.4.4.jar:/usr/lib/hadoop-0.20/conf::/home/felix/hadoop-lzo.jar:/home/felix/elephant-bird.jar:/home/felix/elephant-bird/lib/*
org.apache.pig.Main


It seems Jackson 1.5.5 is on the classpath of Pig as well as the TaskTracker and the actual
job.

Felix


On Mon, Jan 24, 2011 at 3:44 PM, Tatu Saloranta <tsaloranta@gmail.com<mailto:tsaloranta@gmail.com>>
wrote:
On Mon, Jan 24, 2011 at 3:13 PM, Scott Carey <scott@richrelevance.com<mailto:scott@richrelevance.com>>
wrote:
> That is confusing.  Can you capture the classpath of an actual task process,
> not just the TT?  They shouldn't differ much, but it is worth checking.
> Jackson 1.3 (or was it 1.2?) and above have all been backwards compatible
> with each other I believe.   And the error you are getting is definitely
> caused by accessing the enable() methods that were added after 1.0.1.
> I can change the Avro dependency on Jackson to 1.5.5, 1.7.1, or 1.3, and
> unit tests pass.  If I change it to 1.2, 1.1, or 1.0.1 they break.

Just in case anyone is interested, this is due to a change in 1.3.0
that changed the return type of the configuration methods from 'void' to
ObjectMapper, to allow fluent-style chaining of configuration. This is
a source-compatible but, unfortunately, binary-incompatible change. On
the plus side, it is the only known such problem, which makes it easier to
recognize.

-+ Tatu +-
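
The source-compatible versus binary-incompatible distinction above can be illustrated with a small sketch. The JVM links a call by method name plus its full descriptor, and the descriptor includes the return type, so code compiled against a non-void version of a method will not find a void version at run time and fails with NoSuchMethodError. The class names below are hypothetical stand-ins, not Jackson's types.

```java
import java.lang.invoke.MethodType;

// Hypothetical illustration: same method name and parameters,
// different return type, therefore different JVM descriptors.
class DescriptorDemo {

    // Stand-in for a factory type whose configuration method returns itself.
    static class Factory {}

    // Descriptor of a method taking one parameter and returning the given type.
    static String descriptor(Class<?> returnType, Class<?> paramType) {
        return MethodType.methodType(returnType, paramType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Old style: void enable(String)
        System.out.println(descriptor(void.class, String.class));
        // Fluent style: Factory enable(String)
        System.out.println(descriptor(Factory.class, String.class));
    }
}
```

Recompiling the caller against whichever version is actually on the classpath makes the descriptors agree again, which is why the change is harmless at the source level.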


