arrow-user mailing list archives

From Micah Kornfield
Subject Re: Avro -> TensorFlow
Date Wed, 29 Jul 2020 04:19:53 GMT
Hi Cindy,
I haven't tried this but the best guidance I can give is the following:
1.  Create an appropriate decoder using Avro's DecoderFactory [1].
2.  Construct an Arrow adapter with a schema and the decoder.  There are
some examples in the unit tests [2].
3.  Adapt the method Uwe describes in his blog post about JDBC
[3] to use the adapter.  From there I think you can use the TensorFlow
APIs (sorry, I've not used them, but my understanding is TF only has python
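
Steps 1 and 2 might look roughly like the sketch below.  It is untested
against any particular release and rests on several assumptions: the class
name AvroToArrowSketch, the two-field sample schema and the helper
sampleBytes() are made up for illustration, and the adapter classes are
imported from org.apache.arrow, where they lived around Arrow 1.0 (newer
releases may place them in org.apache.arrow.adapter.avro instead).

```java
import java.io.ByteArrayOutputStream;
import org.apache.arrow.AvroToArrow;               // assumption: package as of Arrow 1.0
import org.apache.arrow.AvroToArrowConfig;
import org.apache.arrow.AvroToArrowConfigBuilder;
import org.apache.arrow.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroToArrowSketch {
  // Hypothetical schema standing in for the real *.avsc file.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"int\"},"
          + "{\"name\":\"label\",\"type\":\"string\"}]}");

  // Builds a byte[] of three serialized records so the sketch is self-contained.
  static byte[] sampleBytes() throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
    for (int i = 0; i < 3; i++) {
      GenericRecord rec = new GenericData.Record(SCHEMA);
      rec.put("id", i);
      rec.put("label", "row-" + i);
      writer.write(rec, encoder);
    }
    encoder.flush();
    return out.toByteArray();
  }

  // Step 1: wrap the bytes in an Avro decoder.
  // Step 2: hand schema + decoder to the Arrow adapter and walk the batches.
  static int countRows(byte[] avroBytes, Schema schema) throws Exception {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
      AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator).build();
      int rows = 0;
      try (AvroToArrowVectorIterator batches =
               AvroToArrow.avroToArrowIterator(schema, decoder, config)) {
        while (batches.hasNext()) {
          // Each call hands back a fresh VectorSchemaRoot that the caller closes.
          try (VectorSchemaRoot root = batches.next()) {
            rows += root.getRowCount();
          }
        }
      }
      return rows;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("rows read: " + countRows(sampleBytes(), SCHEMA));
  }
}
```

In a real pipeline the loop body would pass each root on to whatever
consumes it (step 3) rather than just counting rows.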

If step 3 doesn't work for you due to environment constraints, you could
write out an Arrow file using the file writer [4] and see if the
examples listed in [5] help.

One thing to note is that, I believe, the Avro adapter library currently has
an impedance mismatch with the ArrowFileWriter.  The adapter returns a new
VectorSchemaRoot per batch, and the writer libraries are designed around
loading/unloading a single VectorSchemaRoot.  I think the method with the
least overhead for transferring the data is to create a VectorUnloader
[6] per VectorSchemaRoot, convert it to a record batch and then load it
into the writer's VectorSchemaRoot.  This will unfortunately cause some
amount of memory churn due to extra allocations.
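
A minimal sketch of that unload/load workaround, assuming the batches arrive
as an Iterator<VectorSchemaRoot> (which the adapter's iterator should
satisfy).  The class BatchCopySketch, the helpers intBatch() and demo(), and
the single-column "id" schema are hypothetical and only there to make the
example self-contained; I have not run this against a specific Arrow version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class BatchCopySketch {
  // Copies every incoming root into the writer's single root, then writes it.
  static int writeBatches(Iterator<VectorSchemaRoot> batches, Schema schema,
                          BufferAllocator allocator, WritableByteChannel out)
      throws IOException {
    int written = 0;
    try (VectorSchemaRoot writerRoot = VectorSchemaRoot.create(schema, allocator);
         ArrowFileWriter writer = new ArrowFileWriter(writerRoot, null, out)) {
      VectorLoader loader = new VectorLoader(writerRoot);
      writer.start();
      while (batches.hasNext()) {
        try (VectorSchemaRoot batchRoot = batches.next();
             ArrowRecordBatch recordBatch =
                 new VectorUnloader(batchRoot).getRecordBatch()) {
          loader.load(recordBatch);   // move the batch into the writer's root
          writer.writeBatch();        // write whatever the writer's root now holds
          written++;
        }
      }
      writer.end();
    }
    return written;
  }

  // Hypothetical helper: builds one batch with a single int column named "id".
  static VectorSchemaRoot intBatch(Schema schema, BufferAllocator allocator, int... values) {
    VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
    IntVector ids = (IntVector) root.getVector("id");
    ids.allocateNew(values.length);
    for (int i = 0; i < values.length; i++) {
      ids.setSafe(i, values[i]);
    }
    root.setRowCount(values.length);
    return root;
  }

  // Writes two small batches to an in-memory sink and returns the batch count.
  static int demo() throws IOException {
    Schema schema = new Schema(Collections.singletonList(
        Field.nullable("id", new ArrowType.Int(32, true))));
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      List<VectorSchemaRoot> batches = Arrays.asList(
          intBatch(schema, allocator, 1, 2, 3),
          intBatch(schema, allocator, 4, 5));
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      return writeBatches(batches.iterator(), schema, allocator, Channels.newChannel(sink));
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("batches written: " + demo());
  }
}
```

Closing each incoming root right after it has been written keeps at most one
extra batch alive at a time, which seems like the simplest way to bound the
churn.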

There is a short overview of working with Arrow generally available at [7].

Hope this helps,


On Tue, Jul 28, 2020 at 9:06 AM Cindy McMullen wrote:

> Hi -
> I've got a byte[] of serialized Avro, along w/ the Avro Schema (*.avsc
> file or SpecificRecord Java class) that I'd like to send to TensorFlow as
> input tensors, preferably via Arrow.  Can you suggest some existing
> adapters or code patterns (Java or Scala) that I can use?
> Thanks -
> -- Cindy
