avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Newb question on imorting JSON and defaults
Date Thu, 23 May 2013 19:24:06 GMT

On 5/22/13 2:26 PM, "Gregory (Grisha) Trubetskoy" <grisha@apache.org> wrote:

>I have a test.json file that looks like this:
>{"first":"John", "last":"Doe", "middle":"C"}
>{"first":"John", "last":"Doe"}
>(Second line does NOT have a "middle" element).
>And I have a test.schema file that looks like this:
>{
>  "type":"record",
>  "fields": [
>     {"name":"first",  "type":"string"},
>     {"name":"middle", "type":"string", "default":""},
>     {"name":"last",   "type":"string"}
>  ]
>}
>I then try to use fromjson, as follows, and it chokes on the second line:
>$ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema
>test.json > test.avro
>Exception in thread "main" org.apache.avro.AvroTypeException: Expected
>field name not found: middle
>         at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
>         ...
>         at org.apache.avro.tool.Main.run(Main.java:80)
>         at org.apache.avro.tool.Main.main(Main.java:69)
>The short story is - I need to convert a bunch of JSON where an element
>may not be present sometimes, in which case I'd want it to default to
>something sensible, e.g. blank or null.
>According to the Schema Resolution "if the reader's record schema has a
>field that contains a default value, and writer's schema does not have a
>field with the same name, then the reader should use the default value
>from its field."
>I'm clearly missing something obvious, any help would be appreciated!

There are two things that seem to be missing here:
 1. The fromjson tool uses the schema you provided as both the "writer's
schema" and the reader's schema.  Avro therefore expects every JSON
fragment you give it to conform to that one schema.
 2. The tool will not work for arbitrary JSON; it expects JSON in the
format that the Avro JSON encoder writes.  That format differs from plain
JSON in a few places, primarily when disambiguating union types and maps from
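
To illustrate the union case in point 2: in Avro's JSON encoding, a
non-null union value is wrapped in a one-key object named after its type,
while null stays a bare null.  The sketch below assumes "middle" were
declared as the union ["null", "string"], which is NOT the schema in the
quoted message; the helper names are made up for illustration.

```python
import json

def encode_nullable_string(value):
    """Encode a value for an assumed ["null", "string"] union field."""
    if value is None:
        return None               # the null branch stays a bare JSON null
    return {"string": value}      # a non-null branch is wrapped by type name

def to_avro_json(record, nullable_fields):
    """Return a copy of a plain JSON record with the given fields wrapped."""
    out = dict(record)
    for name in nullable_fields:
        # A field absent from the input is written as an explicit null.
        out[name] = encode_nullable_string(record.get(name))
    return out

plain = {"first": "John", "last": "Doe", "middle": "C"}
print(json.dumps(to_avro_json(plain, ["middle"])))
```

With such a union schema, a record without "middle" would still have to
carry the field explicitly, as "middle": null.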

To perform schema evolution while reading, you may need to separate json
fragments missing "middle" from those that have it, and run the tool
twice, with corresponding schemas for each case.
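
A rough sketch of the separation step, assuming one JSON object per line as
in test.json.  The function name is hypothetical; writing each group to its
own file and running fromjson on it with the matching schema is left out.

```python
import json

def split_by_field(lines, field="middle"):
    """Partition JSON lines by whether the top-level field is present."""
    with_field, without_field = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        (with_field if field in record else without_field).append(line)
    return with_field, without_field

sample = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]
has_middle, no_middle = split_by_field(sample)
print(len(has_middle), len(no_middle))
```

Each group can then be written to its own file and converted with a schema
that matches exactly the fields it contains.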
Alternatively, the tool could be modified to handle schema resolution or
deal with different JSON encodings as

Alternatively, you can avoid schema resolution at this stage: separate the
records, then write two files, one per schema.  You can then handle schema
resolution in a later pass through the data with other tools (e.g. data
file reader + writer), or resolve the data into the schema you want
lazily, only when reading.
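
For intuition only, here is what resolving a record into the reader's
schema amounts to for a missing field with a default.  This is a plain
Python imitation for flat records, not Avro's own resolver; the field list
is copied from the quoted schema and the function name is made up.

```python
READER_FIELDS = [
    {"name": "first", "type": "string"},
    {"name": "middle", "type": "string", "default": ""},
    {"name": "last", "type": "string"},
]

def resolve(record, reader_fields):
    """Project a flat record onto the reader's fields, applying defaults."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # the reader's default fills the gap
        else:
            raise ValueError(f"missing field with no default: {name}")
    return resolved

print(resolve({"first": "John", "last": "Doe"}, READER_FIELDS))
```

A record written without "middle" comes back with "middle" set to the
reader's default, which is the behaviour the quoted Schema Resolution text
describes.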

