avro-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Date Thu, 02 Feb 2012 22:53:16 GMT
Further examination shows that the problematic emails I am encoding are
formatted in ISO-8859-1, not UTF-8.  That is why I am getting character
problems.  Looks like it is not an Avro problem after all.  Thanks!  :)
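
For anyone who hits the same error: Avro's "string" type is UTF-8 by
specification, so bytes in another charset have to be transcoded before they
are written. Below is a minimal Python 3 sketch of that step. The helper name
to_unicode and the sample bytes are hypothetical, the fallback charset is an
assumption (ideally you would use the charset declared in each email's
headers), and the same idea applies on the Ruby writer side.

```python
def to_unicode(raw, fallback="iso-8859-1"):
    """Decode raw bytes as UTF-8, falling back to another charset.

    `fallback` is an assumption for illustration; prefer the charset
    declared in the message headers when one is available.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a code point, so this
        # fallback never raises, though it may mis-decode other charsets.
        return raw.decode(fallback)


# Hypothetical sample: 0xa0 is a no-break space in ISO-8859-1, but it is
# not a valid start byte in UTF-8 (the same byte as in the traceback).
body = b"price:\xa0100"

text = to_unicode(body)            # unicode text, safe to hand to Avro
utf8_bytes = text.encode("utf-8")  # what a UTF-8 string field should hold
```

Trying UTF-8 first keeps already-valid input untouched; only input that
fails strict UTF-8 decoding is reinterpreted with the fallback charset.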

On Thu, Feb 2, 2012 at 2:49 PM, Russell Jurney <russell.jurney@gmail.com> wrote:

> A little bit more searching shows this:
>
>
> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>
>
> On Thu, Feb 2, 2012 at 2:48 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>
>> The jars being used are:
>>
>> REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
>> REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
>> REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
>> REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
>> REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
>>
>> On Thu, Feb 2, 2012 at 2:41 PM, James Baldassari <jbaldassari@gmail.com> wrote:
>>
>>> Hi Russell,
>>>
>>> I'm not sure about the Python error, but the Java error looks like a
>>> classpath problem, not a schema parsing issue.  The NoSuchMethodError in
>>> the stack trace indicates that Avro was trying to invoke a method in the
>>> Jackson library that wasn't present at run-time.  My guess is that your
>>> program (or Pig?) either has two incompatible versions of the Jackson
>>> library on its classpath or maybe Avro's Jackson dependency has been
>>> excluded and a version that is incompatible with Avro is on the classpath.
>>>
>>> Which version of Avro is being used?  Running 'mvn dependency:tree' in
>>> Avro trunk I see that it's depending on Jackson 1.8.6.  Can you verify that
>>> only one version of Jackson is on the classpath and that it's the version
>>> that is required by whatever version of Avro is on the classpath?
>>>
>>> -James
>>>
>>>
>>>
>>> On Thu, Feb 2, 2012 at 5:21 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>
>>>> Correction: when I read the file in Python, I get the error below.  It
>>>> looks like a unicode problem?  Can one tell Avro how to handle this?
>>>>
>>>> Traceback (most recent call last):
>>>>   File "./cat_avro", line 21, in <module>
>>>>     for record in df_reader:
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
>>>>     datum = self.datum_reader.read(self.datum_decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
>>>>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
>>>>     return self.read_record(writers_schema, readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
>>>>     field_val = self.read_data(field.type, readers_field.type, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
>>>>     return self.read_union(writers_schema, readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 654, in read_union
>>>>     return self.read_data(selected_writers_schema, readers_schema, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 458, in read_data
>>>>     return self.read_data(writers_schema, s, decoder)
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 468, in read_data
>>>>     return decoder.read_utf8()
>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 233, in read_utf8
>>>>     return unicode(self.read_bytes(), "utf-8")
>>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543: invalid start byte
>>>>
>>>>
>>>> On Thu, Feb 2, 2012 at 2:06 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>>
>>>>> I am writing Avro records in Ruby using the avro ruby gem in 1.8.7.  I
>>>>> have problems with loading these files sometimes.  As a result, I am
>>>>> unable to write large files that are readable.
>>>>>
>>>>> The exception I get is below.  Anyone have an idea what this means?
>>>>> It looks like Avro is having trouble parsing the schema.  The avro files
>>>>> parse in Ruby and Python, just not Pig.  Are there more rigorous checks
>>>>> in Java?
>>>>>
>>>>> Pig Stack Trace
>>>>> ---------------
>>>>> ERROR 2998: Unhandled internal error.
>>>>> org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>>>>>
>>>>> java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>>>>>     at org.apache.avro.Schema.<clinit>(Schema.java:82)
>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
>>>>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
>>>>>     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>>>>>     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
>>>>>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>>>>>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>>>>     at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
>>>>>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
>>>>>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>>>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>>>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>>>>>     at org.apache.pig.Main.run(Main.java:495)
>>>>>     at org.apache.pig.Main.main(Main.java:111)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>
>>>>> ================================================================================
>>>>>
>>>>> --
>>>>> Russell Jurney
>>>>> twitter.com/rjurney
>>>>> russell.jurney@gmail.com
>>>>> datasyndrome.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Russell Jurney
>>>> twitter.com/rjurney
>>>> russell.jurney@gmail.com
>>>> datasyndrome.com
>>>>
>>>
>>>
>>
>>
>> --
>> Russell Jurney
>> twitter.com/rjurney
>> russell.jurney@gmail.com
>> datasyndrome.com
>>
>
>
>
> --
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com
>



-- 
Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com
