avro-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Date Fri, 03 Feb 2012 00:33:10 GMT
I cleaned up my environment by unsetting HADOOP_HOME and removing some old
Jackson jars from my CLASSPATH, and Pig's AvroStorage works again.

Woot!
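
For anyone hitting the same NoSuchMethodError: below is a minimal sketch, in
Python 3, of how one might look for more than one version of a Jackson jar on
a classpath string. The function name find_conflicting_jars and the jar-name
pattern are my own assumptions, not part of Pig or Avro. It only inspects
literal jar paths, so it will not expand wildcard or directory entries, and it
will not see jars that Hadoop adds on its own via HADOOP_HOME.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: <artifact>-<version>.jar,
# e.g. jackson-core-asl-1.7.3.jar -> ("jackson-core-asl", "1.7.3").
JAR_PATTERN = re.compile(r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$")


def find_conflicting_jars(classpath, prefix="jackson", sep=os.pathsep):
    """Return {artifact: sorted versions} for artifacts present in 2+ versions."""
    versions = defaultdict(set)
    for entry in classpath.split(sep):
        match = JAR_PATTERN.match(os.path.basename(entry.strip()))
        if match and match.group("name").startswith(prefix):
            versions[match.group("name")].add(match.group("version"))
    # Keep only artifacts that appear with more than one distinct version.
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    conflicts = find_conflicting_jars(os.environ.get("CLASSPATH", ""))
    for name, found in sorted(conflicts.items()):
        print("%s: %s" % (name, ", ".join(found)))
```

Run with CLASSPATH set, this prints one line per Jackson artifact that shows
up in more than one version, and nothing if there are no duplicates.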

On Thu, Feb 2, 2012 at 3:47 PM, Russell Jurney <russell.jurney@gmail.com> wrote:

> Spoke too soon... this now happens no matter which Avro files I load.  As far
> as I can tell, nothing has changed regarding jars, etc.  Confused.
>
> I think this happens when Avro is parsing the schema?
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>
> java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>     at org.apache.avro.Schema.<clinit>(Schema.java:82)
>     at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
>     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:495)
>     at org.apache.pig.Main.main(Main.java:111)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ================================================================================
>
> On Thu, Feb 2, 2012 at 2:53 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>
>> Further examination shows that the problematic emails I am encoding are
>> formatted in ISO-8859-1, not UTF-8.  That is why I am getting character
>> problems.  Looks like it is not an Avro problem after all.  Thanks!  :)
>>
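
The 0xa0 byte in the UnicodeDecodeError fits this diagnosis: on its own it is
not valid UTF-8, but in ISO-8859-1 it is a non-breaking space. Below is a
minimal Python 3 sketch of re-encoding such text to UTF-8 before it is written
as an Avro string. The helper name to_utf8 and the sample bytes are made up
for illustration, and the sketch simply assumes that anything that is not
valid UTF-8 is ISO-8859-1, which cannot be verified from the bytes alone.

```python
def to_utf8(raw, fallback="iso-8859-1"):
    """Return UTF-8 bytes for raw, assuming `fallback` if raw is not valid UTF-8."""
    try:
        raw.decode("utf-8")  # Already valid UTF-8: pass it through untouched.
        return raw
    except UnicodeDecodeError:
        # Assumption: the input is in the fallback encoding. ISO-8859-1 maps
        # every byte to a character, so this decode step never fails.
        return raw.decode(fallback).encode("utf-8")


# Made-up sample: 0xe9 and 0xa0 are single-byte ISO-8859-1 characters.
latin1_body = b"caf\xe9\xa0au lait"
fixed = to_utf8(latin1_body)
print(fixed.decode("utf-8"))
```

Because the ISO-8859-1 decode accepts any byte sequence, this will quietly
produce wrong characters for input in some other encoding, so it is better to
use the charset declared in each email's headers when one is available.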
>>
>> On Thu, Feb 2, 2012 at 2:49 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>
>>> A little bit more searching shows this:
>>>
>>>
>>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>>>
>>>
>>> On Thu, Feb 2, 2012 at 2:48 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>
>>>> The jars being used are:
>>>>
>>>> REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
>>>> REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
>>>> REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
>>>> REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
>>>> REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
>>>>
>>>> On Thu, Feb 2, 2012 at 2:41 PM, James Baldassari <jbaldassari@gmail.com> wrote:
>>>>
>>>>> Hi Russell,
>>>>>
>>>>> I'm not sure about the Python error, but the Java error looks like a
>>>>> classpath problem, not a schema parsing issue.  The NoSuchMethodError in
>>>>> the stack trace indicates that Avro was trying to invoke a method in the
>>>>> Jackson library that wasn't present at run-time.  My guess is that your
>>>>> program (or Pig?) either has two incompatible versions of the Jackson
>>>>> library on its classpath, or Avro's Jackson dependency has been
>>>>> excluded and a version that is incompatible with Avro is on the classpath.
>>>>>
>>>>> Which version of Avro is being used?  Running 'mvn dependency:tree' in
>>>>> Avro trunk, I see that it depends on Jackson 1.8.6.  Can you verify that
>>>>> only one version of Jackson is on the classpath and that it's the version
>>>>> required by whatever version of Avro is on the classpath?
>>>>>
>>>>> -James
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Feb 2, 2012 at 5:21 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>>>
>>>>>> Correction: when I read the file in Python, I get the error below.
>>>>>> It looks like a Unicode problem?  Can one tell Avro how to handle this?
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "./cat_avro", line 21, in <module>
>>>>>>     for record in df_reader:
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
>>>>>>     datum = self.datum_reader.read(self.datum_decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
>>>>>>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
>>>>>>     return self.read_record(writers_schema, readers_schema, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
>>>>>>     field_val = self.read_data(field.type, readers_field.type, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
>>>>>>     return self.read_union(writers_schema, readers_schema, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 654, in read_union
>>>>>>     return self.read_data(selected_writers_schema, readers_schema, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 458, in read_data
>>>>>>     return self.read_data(writers_schema, s, decoder)
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 468, in read_data
>>>>>>     return decoder.read_utf8()
>>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 233, in read_utf8
>>>>>>     return unicode(self.read_bytes(), "utf-8")
>>>>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543: invalid start byte
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 2, 2012 at 2:06 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>>>>
>>>>>>> I am writing Avro records in Ruby using the avro ruby gem in 1.8.7.
>>>>>>> I have problems with loading these files sometimes.  As a result, I am
>>>>>>> unable to write large files that are readable.
>>>>>>>
>>>>>>> The exception I get is below.  Anyone have an idea what this means?
>>>>>>> It looks like Avro is having trouble parsing the schema.  The Avro files
>>>>>>> parse in Ruby and Python, just not Pig.  Are there more rigorous checks in
>>>>>>> Java?
>>>>>>>
>>>>>>> Pig Stack Trace
>>>>>>> ---------------
>>>>>>> ERROR 2998: Unhandled internal error. org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>>>>>>>
>>>>>>> java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>>>>>>>     at org.apache.avro.Schema.<clinit>(Schema.java:82)
>>>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
>>>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
>>>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
>>>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
>>>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>>>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
>>>>>>>     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>>>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
>>>>>>>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>>>>>>>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>>>>>>     at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
>>>>>>>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
>>>>>>>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>>>>>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>>>>>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>>>>>>>     at org.apache.pig.Main.run(Main.java:495)
>>>>>>>     at org.apache.pig.Main.main(Main.java:111)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>>
>>>>>>> ================================================================================
>>>>>>>
>>>>>>> --
>>>>>>> Russell Jurney
>>>>>>> twitter.com/rjurney
>>>>>>> russell.jurney@gmail.com
>>>>>>> datasyndrome.com
>>>>>>>



-- 
Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com
