avro-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Thurn <king...@dcswcc.org>
Subject Re: Hadoop Streaming mangles Python-produced Avro
Date Wed, 27 Mar 2013 20:16:50 GMT
I want to confirm the bug report by Alex Hanna made to this mailing list in
message #2233 on 2013-03-06 (also entered into JIRA as
https://issues.apache.org/jira/browse/AVRO-1271, but apparently punted to
this mailing list).

There is a bug in org.apache.avro.mapred.AvroAsTextInputFormat.
In my case, it looks like a faulty regular expression which is trying to do
something with smart quotes.  If an Avro datum destined to be a string
value in JSON contains a smart quote, the datum string gets weirdly
duplicated and uppercased in the JSON output.  For example, if the Avro
datum is

[before“after

the output JSON will contain the following (which is not legal JSON and
cannot be parsed):

[before\u[BEFORE“AFTERafter

 - - Martin

Mime
View raw message