Thanks, Micah, for your thoughtful response.  We'll give it a try and let you know how it goes.

-- Cindy

On Tue, Jul 28, 2020 at 10:20 PM Micah Kornfield <emkornfield@gmail.com> wrote:
Hi Cindy,
I haven't tried this but the best guidance I can give is the following:
1. Create an appropriate decoder using Avro's DecoderFactory [1].
2. Construct an Arrow adapter with a schema and the decoder.  There are some examples in the unit tests [2].
3. Adapt the method Uwe describes in his blog post about JDBC [3] to use the adapter.  From there I think you can use the TensorFlow APIs (sorry, I've not used them, but my understanding is that TF only has Python APIs?).
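
Steps 1 and 2 might look roughly like the sketch below. It is untested against your data and rests on a few assumptions: the class name AvroBytesToArrow and the method countRows are made up for illustration, the adapter classes are imported from the org.apache.arrow package as in the unit test at [2] (other releases may place them in a different package), and the byte[] is assumed to hold back-to-back binary-encoded records rather than an Avro container file.

```java
import org.apache.arrow.AvroToArrow;
import org.apache.arrow.AvroToArrowConfig;
import org.apache.arrow.AvroToArrowConfigBuilder;
import org.apache.arrow.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical helper class, named for illustration only.
public class AvroBytesToArrow {

    /** Decodes the Avro bytes batch by batch and returns the total row count. */
    static long countRows(byte[] avroBytes, Schema avroSchema, BufferAllocator allocator)
            throws Exception {
        // Step 1: a binary decoder over the raw bytes (no decoder to reuse, hence null).
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);

        // Step 2: the adapter, configured with the allocator that will own the vectors.
        AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator).build();

        long rows = 0;
        try (AvroToArrowVectorIterator batches =
                 AvroToArrow.avroToArrowIterator(avroSchema, decoder, config)) {
            while (batches.hasNext()) {
                // Each call hands back a fresh VectorSchemaRoot that the caller must close.
                try (VectorSchemaRoot root = batches.next()) {
                    rows += root.getRowCount();
                }
            }
        }
        return rows;
    }
}
```

In real use you would hand each root on to whatever consumes it (step 3) instead of just counting rows.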

If number 3 doesn't work for you due to environment constraints, you could write out an Arrow file using the file writer [4] and see whether the examples listed in [5] help.

One thing to note: I believe the Avro adapter library currently has an impedance mismatch with the ArrowFileWriter.  The adapter returns a new VectorSchemaRoot per batch, while the writer libraries are designed around loading/unloading a single VectorSchemaRoot.  I think the method with the least overhead for transferring the data is to create a VectorUnloader [6] per VectorSchemaRoot, convert it to a record batch, and then load it into the writer's VectorSchemaRoot.  This will unfortunately cause some memory churn due to extra allocations.
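
A minimal sketch of that unload/load hand-off, assuming the batches arrive as an Iterator<VectorSchemaRoot> (which the adapter's iterator implements) and that no dictionary-encoded columns are involved, so the writer gets a null DictionaryProvider. The class name BatchCopy and the method writeAll are hypothetical, and I haven't run this against the Avro adapter itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.Iterator;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Schema;

// Hypothetical helper class, named for illustration only.
public class BatchCopy {

    /** Copies every incoming root into the writer's single root and returns the Arrow file bytes. */
    static byte[] writeAll(Iterator<VectorSchemaRoot> batches, Schema schema,
                           BufferAllocator allocator) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (VectorSchemaRoot writerRoot = VectorSchemaRoot.create(schema, allocator);
             ArrowFileWriter writer =
                 new ArrowFileWriter(writerRoot, null, Channels.newChannel(out))) {
            VectorLoader loader = new VectorLoader(writerRoot);
            writer.start();
            while (batches.hasNext()) {
                // Unload the adapter's root into a record batch, then load that batch
                // into the one root the writer is bound to.
                try (VectorSchemaRoot source = batches.next();
                     ArrowRecordBatch batch = new VectorUnloader(source).getRecordBatch()) {
                    loader.load(batch);
                    writer.writeBatch();
                }
            }
            writer.end();
        }
        return out.toByteArray();
    }
}
```

The schema passed in has to match the schema of the roots the adapter produces; taking it from the first root would be one way to guarantee that.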

There is a short overview of working with Arrow in general at [7].

Hope this helps,
Micah

[1] https://avro.apache.org/docs/1.10.0/api/java/org/apache/avro/io/DecoderFactory.html
[2] https://github.com/apache/arrow/blob/master/java/adapter/avro/src/test/java/org/apache/arrow/AvroToArrowIteratorTest.java#L77
[3] https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html
[4] https://github.com/apache/arrow/blob/fe541e8fad2e6d7d5532e715f5287292c515d93b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileWriter.java
[5] https://blog.tensorflow.org/2019/08/tensorflow-with-apache-arrow-datasets.html
[6] https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java
[7] https://arrow.apache.org/docs/java/

On Tue, Jul 28, 2020 at 9:06 AM Cindy McMullen <cmcmullen@twitter.com> wrote:
Hi -

I've got a byte[] of serialized Avro, along w/ the Avro Schema (*.avsc file or SpecificRecord Java class) that I'd like to send to TensorFlow as input tensors, preferably via Arrow.  Can you suggest some existing adapters or code patterns (Java or Scala) that I can use?  

Thanks -

-- Cindy