avro-user mailing list archives

From Marshall Bockrath-Vandegrift <llas...@gmail.com>
Subject Re: Hadoop serialization DatumReader/Writer
Date Mon, 13 May 2013 23:22:49 GMT
Scott Carey <scottcarey@apache.org> writes:

> Making the DatumReader/Writers configurable would be a welcome
> addition.


> Ideally, much more of what goes on there could be:
>  1. configuration driven
>  2. pre-computed to avoid repeated work during decoding/encoding
> We do some of both already.  The trick is to do #1 without impacting
> performance and #2 requires a bigger overhaul.

Which work in particular?  In my pass through the AvroSerialization
implementation so far, it looks like each MR task would create either
one or two Serializers/Deserializers (key and value), each of which in
turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
Or do De/Serializers get created multiple times per task?
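To make sure we're talking about the same lifecycle, here's a minimal sketch of the pattern as I understand it -- the interfaces and names below are made-up stand-ins for the real Hadoop Serializer and Avro DatumWriter classes, just to illustrate that the writer is built once in open() and then reused across serialize() calls:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Hypothetical stand-in for Avro's DatumWriter, for illustration only.
interface DatumWriter<T> { void write(T datum, OutputStream out) throws Exception; }

class StringDatumWriter implements DatumWriter<String> {
    public void write(String datum, OutputStream out) throws Exception {
        out.write(datum.getBytes("UTF-8"));
    }
}

// A serializer in the Hadoop mold: open() is called once per task,
// serialize() many times -- so the writer is created once and reused.
class SketchSerializer {
    static int writersCreated = 0;       // instrumentation for the sketch
    private DatumWriter<String> writer;  // built once, in open()
    private OutputStream out;

    void open(OutputStream out) {
        this.out = out;
        this.writer = new StringDatumWriter();
        writersCreated++;
    }

    void serialize(String datum) throws Exception {
        writer.write(datum, out);        // reuses the same writer each call
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        SketchSerializer s = new SketchSerializer();
        s.open(buf);
        s.serialize("a");
        s.serialize("b");                // second call, still one writer
        System.out.println(writersCreated);          // 1
        System.out.println(buf.toString("UTF-8"));   // ab
    }
}
```

If that one-writer-per-open() picture is right, then per-record overhead from making the writer configurable should be nil -- the reflection cost lands once per task, not once per record.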

> If you would like, a contribution including a Clojure related maven
> module or two that depends on the Java stuff would be a welcome
> addition and allow us to identify compatibility issues as we change
> the Java library over time.

That sounds like a great end-goal.  Right now at the company I work for
(Damballa) we've just started getting our toes wet with Avro.  Avro won
our serialization-format bake-off, but we haven't started actually using
it.  I just finished an initial pass at Avro-Clojure integration, which
we have released under an open source license:


I would very much like to eventually get an iteration of it into Avro
proper, but I want to actually start using it and Avro first, so we can
hammer out any interface issues, etc.

Anyway, I'll try to work up a patch to add some more configuration hooks
to the AvroSerialization.  Should I also create a ticket in the Avro
issue tracker?
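For the configuration hooks, what I have in mind is roughly the
conf.getClass()/ReflectionUtils.newInstance() idiom Hadoop uses
elsewhere -- sketched below with a plain Map standing in for the Hadoop
Configuration object, and with a made-up property key and interface,
just to show the shape of the hook:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader interface, standing in for Avro's DatumReader.
interface DatumReader { String read(byte[] bytes); }

class GenericReader implements DatumReader {
    public String read(byte[] bytes) { return new String(bytes); }
}

class ReaderFactory {
    // Hypothetical property key; the real patch would pick a proper name.
    static final String KEY = "avro.serialization.datum.reader";

    // Look up the reader class from configuration, falling back to a
    // default when the hook is not set -- mirroring conf.getClass().
    static DatumReader create(Map<String, String> conf) throws Exception {
        String cls = conf.getOrDefault(KEY, GenericReader.class.getName());
        return (DatumReader) Class.forName(cls)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        // Unconfigured: the default reader is used.
        System.out.println(create(conf).getClass().getSimpleName());
        // Configured: the user-supplied class name wins.
        conf.put(KEY, GenericReader.class.getName());
        System.out.println(create(conf).read("ok".getBytes()));
    }
}
```

Since this happens once per De/Serializer (i.e., once or twice per
task, per the discussion above), the reflection cost shouldn't show up
in per-record decoding/encoding performance.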

