avro-user mailing list archives

From Mike Stanley <m...@mikestanley.org>
Subject Re: Go library
Date Fri, 21 Mar 2014 12:08:45 GMT
Cool.  Thanks for the response.

Quick update:

I've had early success reading Avro files from Go by calling the Avro
C library through cgo.  It was relatively straightforward.  It's a tad
tedious, as the new "value" interface in the C library uses a lot of
macros, and cgo cannot (AFAIK) call macros directly.  Rather, I needed
to create C wrapper functions for all the macros.  I did this for
about 8 or so macros (just the ones I needed as a proof of concept,
but that covered most everything you'd expect on the reading side:
generic readers, retrieving the writer schema, iterating over
record values, teasing out union/discriminant branches, retrieving
string & long values, getting fields by index and by name, and the
corresponding incref/decref).  Aside from the macros,
integrating with C from Go is straightforward and, with some quick
tests, seems to be comparable in performance to C.

I tested performance using a simple script that reads through an
Avro file, extracts two fields (a string and a long), and sums up the
longs across all records (the strings are just dropped on the floor).  I
tested with a ~900M Avro file (compressed blocks) that has about 25M
records.  On my machine, the simple C library I built runs through it
in about 42 seconds.  The Go library I have, which does essentially the
same thing through cgo, accomplishes the same task in about 51 seconds.
A more common (in my domain) input size (a ~270M Avro file containing
~7.5M records) runs in ~15s in C and ~18s in Go.  We regularly process 100s
of files of that size/shape.  This is not taking advantage of any of
Go's concurrency features, etc., and the Go code is largely just the
C code in Go clothing.  But I was pleased to see pretty negligible
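
For scale, the timings above can be turned into throughput and relative overhead with a few lines of arithmetic. This is only a sketch: the inputs are the approximate figures quoted above, and the helper names (recordsPerSec, overheadPct) are made up for illustration.

```go
package main

import "fmt"

// recordsPerSec returns the average number of records processed per second.
func recordsPerSec(records, secs float64) float64 {
	return records / secs
}

// overheadPct returns how much slower otherSecs is than baseSecs, in percent.
func overheadPct(baseSecs, otherSecs float64) float64 {
	return (otherSecs - baseSecs) / baseSecs * 100
}

func main() {
	// Large file: ~25M records, ~42s in C vs ~51s in Go (approximate figures).
	fmt.Printf("large: C %.0f rec/s, Go %.0f rec/s, Go overhead %.1f%%\n",
		recordsPerSec(25e6, 42), recordsPerSec(25e6, 51), overheadPct(42, 51))

	// Typical file: ~7.5M records, ~15s in C vs ~18s in Go (approximate figures).
	fmt.Printf("typical: C %.0f rec/s, Go %.0f rec/s, Go overhead %.1f%%\n",
		recordsPerSec(7.5e6, 15), recordsPerSec(7.5e6, 18), overheadPct(15, 18))
}
```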

Looking down the road, an idiomatic library should follow a pattern
similar to the Go "encoding/json" package.  That shouldn't be too
difficult.  The only real barrier is time ;-)  I currently have a
task at hand and have enough pieces to accomplish it.  I will circle
back on this though as I get a little more comfortable with Go idioms and
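
To make the "encoding/json" comparison concrete, here is one possible shape for such an API, as a sketch only. Everything in it is an assumption: the Record type stands in for an already-decoded Avro record, the `avro` struct tag is invented by analogy with `json` tags, and Unmarshal is a hypothetical function, not part of any existing library.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Record is a hypothetical stand-in for one decoded Avro record.
type Record map[string]interface{}

// Unmarshal copies fields from rec into the struct that v points to, matching
// on the (assumed) `avro` struct tag and falling back to the Go field name.
func Unmarshal(rec Record, v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return errors.New("avro: Unmarshal needs a non-nil pointer to a struct")
	}
	sv := rv.Elem()
	st := sv.Type()
	for i := 0; i < st.NumField(); i++ {
		f := st.Field(i)
		if f.PkgPath != "" {
			continue // skip unexported fields
		}
		name := f.Tag.Get("avro")
		if name == "" {
			name = f.Name
		}
		raw, ok := rec[name]
		if !ok || raw == nil {
			continue // leave missing or null fields at their zero value
		}
		val := reflect.ValueOf(raw)
		if !val.Type().AssignableTo(f.Type) {
			return fmt.Errorf("avro: field %q: cannot assign %s to %s", name, val.Type(), f.Type)
		}
		sv.Field(i).Set(val)
	}
	return nil
}

// Event is an example target type with one string and one long field.
type Event struct {
	Name  string `avro:"name"`
	Count int64  `avro:"count"`
}

func main() {
	var e Event
	err := Unmarshal(Record{"name": "click", "count": int64(7)}, &e)
	fmt.Println(e, err)
}
```

A real implementation would also need to handle unions, nested records, arrays, and maps; this sketch only covers flat records whose values already have the target Go types.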

I wanted to share the above though as I view these quick results as promising.

p.s. I also tested using C to convert a record to a JSON char* and
pass that to a Go function that unmarshals it into a Go struct.  This
worked fine but, as one would expect, adds a considerable amount of
overhead: 12 minutes for the same ~51-second test noted above.  It
does work, though, as a quick approach.
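
The Go side of that JSON approach might look roughly like the sketch below. The struct and JSON key names are assumptions, and the records are hard-coded strings here; in the cgo setup each string would come from the C side (for example via C.GoString on the char*).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Rec mirrors the two fields read in the test; the JSON key names are assumed.
type Rec struct {
	Name  string `json:"name"`
	Value int64  `json:"value"`
}

// decodeRecord parses the JSON text of one record into a Rec.
func decodeRecord(js string) (Rec, error) {
	var r Rec
	err := json.Unmarshal([]byte(js), &r)
	return r, err
}

func main() {
	// Stand-ins for the JSON strings produced on the C side.
	inputs := []string{`{"name":"a","value":40}`, `{"name":"b","value":2}`}

	var sum int64
	for _, js := range inputs {
		r, err := decodeRecord(js)
		if err != nil {
			panic(err)
		}
		sum += r.Value // the string field is simply ignored, as in the test
	}
	fmt.Println(sum)
}
```

Every record pays for a JSON encode in C plus a reflection-based decode in Go, which is consistent with the large slowdown reported.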

On Mar 20, 2014 4:33 PM, "Doug Cutting" <cutting@apache.org> wrote:
> I have not heard of any work on an implementation of Avro in go.  It
> would make a great addition, even if only data file support.
> Doug
> On Sat, Mar 15, 2014 at 5:59 AM, Mike Stanley <mike@mikestanley.org> wrote:
> > Anyone know of any Avro libraries for Go?   I haven't had much luck finding
> > anything.  Either cgo or pure Go is fine by me.  I'm a long-time user of
> > Avro and have a considerable amount of data in it. (Avro is our
> > serialization format of choice for all archive data, event logs, and other
> > data stored on S3 and in HDFS).  Go is quickly becoming a core technology
> > in our stack as well, and Avro support is one of the areas impeding wider
> > adoption.
> >
> > Worst case scenario, this may be something I take on.  I'd much rather pick
> > up where someone else left off though.    I don't need any RPC functionality.
> > Just read/write (with compression support).
