avro-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Map having <string, Object>
Date Wed, 07 Dec 2011 17:36:42 GMT
On 12/07/2011 05:16 AM, Gaurav wrote:
> One option is to construct a record schema on the fly; the second option is
> to use unions to write the schema in a general way.
> 
> The problem with option 1 is that we have to construct the schema every time,
> depending on the keys, and then attach the entire schema string to a
> relatively small record.

You might instead write the Schema more efficiently in binary.

It could be written as binary Json using the following:

http://avro.apache.org/docs/current/api/java/org/apache/avro/data/Json.html

Or there's an even more efficient schema-for-schemas approach in:

https://issues.apache.org/jira/browse/AVRO-251

(I don't know if that patch is still up to date.  If you like I can
update it.  If someone finds it useful then I'll commit it.)

> But with the second option, you don't need to write the schema on the wire,
> as the client already has it.
> 
> I have written one such sample schema:
> {"type":"map","values":["int","long","float","double","string","boolean",{"type":"map","values":["int","long","float","double","string","boolean"]}]}
> 
> Do you guys think writing something of this sort makes sense or is there any
> better approach to this?

A map like that is a totally reasonable approach when things vary a lot.
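
To make concrete what the map schema quoted above admits, here is a minimal sketch in plain Python. It uses no Avro library calls; the function names are hypothetical, and it ignores the 32/64-bit range limits of int and long.

```python
def is_primitive(value):
    # int/long -> int, float/double -> float, string -> str, boolean -> bool
    return isinstance(value, (bool, int, float, str))


def fits_schema(data):
    """True if `data` matches the quoted map schema: string keys, and each
    value either a primitive or a one-level map of primitives."""
    if not isinstance(data, dict):
        return False
    for key, value in data.items():
        if not isinstance(key, str):
            return False
        if is_primitive(value):
            continue
        if isinstance(value, dict) and all(
            isinstance(k, str) and is_primitive(v) for k, v in value.items()
        ):
            continue
        return False
    return True
```

Under these assumptions, `{"a": 1, "b": {"c": "x"}}` fits, while a map nested two levels deep, a list value, or a null value does not, since none of those appear in the union.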

If the schema is really different for each instance written then
building a new schema each time might end up hurting performance.

If only a relatively small number of schemas actually recur, then they
can be cached and reused.
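
A rough sketch of such a cache, in plain Python with only the standard library. The names `schema_for`, `build_count` and the record name "Dynamic" are made up for illustration. With an Avro library you would presumably cache the parsed schema object rather than JSON text.

```python
import json

# Hypothetical helper; none of this is Avro API.
_schema_cache = {}
build_count = 0  # counts cache misses, for illustration only


def schema_for(field_types):
    """Return a record schema (as JSON text) for a {field name: Avro type} dict.

    Schemas are cached by their sorted (name, type) pairs, so instances with
    the same fields share one schema regardless of key order.
    """
    global build_count
    key = tuple(sorted(field_types.items()))
    cached = _schema_cache.get(key)
    if cached is None:
        build_count += 1
        schema = {
            "type": "record",
            "name": "Dynamic",  # assumed record name
            # Fields are emitted in sorted order; a writer would have to
            # serialize field values in this same order.
            "fields": [{"name": n, "type": t} for n, t in key],
        }
        cached = json.dumps(schema)
        _schema_cache[key] = cached
    return cached


a = schema_for({"uid": "string", "count": "int"})
b = schema_for({"count": "int", "uid": "string"})  # same fields, other order
c = schema_for({"uid": "string"})
```

Here `a` and `b` come from a single cache entry, so the schema is built only twice for the three calls.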

If some fields are always present then you might put those in a record
and have a field in the record with a map like that for other stuff.
This is a common approach.  Every record might have a date and uid or
somesuch, but other aspects may vary.
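
As an illustration of that layout, the sketch below builds one possible schema as JSON using only the Python standard library. The record name "Event", the field name "extras", and the choice of long for the date are assumptions, not anything prescribed here. The value types are the primitive ones from the map schema quoted above, without the nested map.

```python
import json

# Primitive value types from the quoted map schema (nesting left out).
PRIMITIVES = ["int", "long", "float", "double", "string", "boolean"]

event_schema = {
    "type": "record",
    "name": "Event",  # assumed name
    "fields": [
        {"name": "date", "type": "long"},   # assumed: epoch milliseconds
        {"name": "uid", "type": "string"},
        # Everything that varies between instances goes in this map.
        {"name": "extras", "type": {"type": "map", "values": PRIMITIVES}},
    ],
}

schema_json = json.dumps(event_schema)
print(schema_json)
```

The fixed fields get compact, typed encodings, and only the variable part pays the per-entry cost of map keys and union branches.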

Doug
