arrow-user mailing list archives

From Bryan Cutler <>
Subject Re: tensorflow-io Arrow Datasets and thoughts on support for tensor columns
Date Wed, 27 Mar 2019 18:18:24 GMT
Thanks Wes! I am most interested in the last option, adding Tensor as a
logical type, but if it makes sense to embed tensors in a BinaryArray as a
first step, that would still be useful too. I'll work on a design doc with a
use case and report back. I know there are a lot of different efforts going
on right now and I hate to pile more on, but I'd appreciate any time for
feedback and review.
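Embedding tensors in a BinaryArray means serializing each tensor into a byte string and storing those strings in a variable-length binary column, which consists of an offsets buffer and one contiguous data buffer (the validity bitmap is omitted here). The following is a minimal plain-Python sketch of that layout. The per-tensor encoding (an int32 `ndim`, int64 dimensions, then row-major float64 values) and every function name are hypothetical, chosen only for illustration. This is not an Arrow wire format and does not use the C++ ExtensionType API.

```python
import struct
from typing import List, Sequence, Tuple


def encode_tensor(shape: Sequence[int], values: Sequence[float]) -> bytes:
    """Hypothetical payload: int32 ndim, int64 dims, row-major float64 data."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match the number of values")
    # "<" = little-endian, standard sizes, no alignment padding.
    header = struct.pack(f"<i{len(shape)}q", len(shape), *shape)
    return header + struct.pack(f"<{len(values)}d", *values)


def decode_tensor(payload: bytes) -> Tuple[Tuple[int, ...], List[float]]:
    """Inverse of encode_tensor for one row's bytes."""
    (ndim,) = struct.unpack_from("<i", payload, 0)
    shape = struct.unpack_from(f"<{ndim}q", payload, 4)
    data_start = 4 + 8 * ndim
    count = (len(payload) - data_start) // 8
    values = list(struct.unpack_from(f"<{count}d", payload, data_start))
    return tuple(shape), values


def pack_binary_column(payloads: Sequence[bytes]) -> Tuple[List[int], bytes]:
    """Variable-length binary layout: row i is data[offsets[i]:offsets[i + 1]]."""
    offsets = [0]
    for payload in payloads:
        offsets.append(offsets[-1] + len(payload))
    return offsets, b"".join(payloads)


def get_row(offsets: List[int], data: bytes, i: int) -> bytes:
    """Slice one row's payload out of the shared data buffer."""
    return data[offsets[i]:offsets[i + 1]]


# Two tensors of different shapes stored in the same "column".
t1 = encode_tensor((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
t2 = encode_tensor((4,), [0.5, 1.5, 2.5, 3.5])
offsets, data = pack_binary_column([t1, t2])
print(offsets)  # [0, 68, 112]
print(decode_tensor(get_row(offsets, data, 1)))  # ((4,), [0.5, 1.5, 2.5, 3.5])
```

Because each payload carries its own shape, rows in one column can hold tensors of different shapes in this sketch. The trade-off is that every reader has to know the (hypothetical) encoding to interpret the bytes. That is the kind of agreement an extension type or a dedicated logical type would formalize.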

Best Regards,

On Mon, Mar 25, 2019 at 2:36 PM Wes McKinney <> wrote:

> hi Bryan,
> I agree this would be useful to work out.
> There are a few options:
> * Sending multiple tensors as a sequence of encapsulated IPC messages
> (as described in
> There is no conflict with the columnar streaming protocol that
> prevents this.
> * Embedding tensors in BinaryArray columns in some way (e.g. as an
> ExtensionType, which we have now in C++)
> * Adding Tensor as a logical type (this is essentially ARROW-1614)
> I would like to understand the use cases more precisely. Perhaps you
> can write a design document that describes the use cases in detail and
> proposed solution? This doesn't fall anywhere on my list of 2019
> priorities but I'm happy to give feedback on discussions and review
> PRs where relevant.
> In conjunction with embedding sequences of tensors in a BinaryArray,
> we would probably need to first develop a LargeBinaryArray with 64-bit
> offsets, so that buffers can be arbitrarily large (well, within 64-bit
> address space at least).
> - Wes
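The case for 64-bit offsets comes down to arithmetic. In a binary column the last offset equals the total length of the data buffer, and BinaryArray stores its offsets as 32-bit signed integers, so one column's data is capped at 2^31 - 1 bytes. The sketch below checks that limit for a hypothetical batch of 1024 x 1024 float64 tensors (8 MiB each). The helper names are illustrative, and any per-tensor header bytes are ignored.

```python
from typing import Sequence

INT32_MAX = 2**31 - 1  # largest end offset a 32-bit signed offset can hold
INT64_MAX = 2**63 - 1  # the same limit with 64-bit offsets


def tensor_nbytes(shape: Sequence[int], itemsize: int) -> int:
    """Raw data size of a dense tensor (header bytes not counted)."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


def fits_offsets(row_sizes: Sequence[int], max_offset: int) -> bool:
    """The final offset is the sum of all row sizes, so it must be representable."""
    return sum(row_sizes) <= max_offset


row = tensor_nbytes((1024, 1024), 8)  # float64 -> 8_388_608 bytes
print(fits_offsets([row] * 255, INT32_MAX))  # True
print(fits_offsets([row] * 256, INT32_MAX))  # False: exactly 2**31 bytes
print(fits_offsets([row] * 256, INT64_MAX))  # True
```

So a few hundred moderately sized tensors already exceed what 32-bit offsets can address in a single column, which is why a 64-bit-offset variant is a prerequisite for the BinaryArray approach.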
> On Fri, Mar 22, 2019 at 1:24 PM Bryan Cutler <> wrote:
> >
> > Hi All,
> >
> > Recently I have been working with the TensorFlow SIG-IO community to
> > introduce Apache Arrow-based Datasets for bringing Arrow data into
> > TensorFlow. SIG-IO is a community-maintained repository focused on
> > input/output support for TF, see (a lot
> > of formats from contrib/ ended up here). Since it is community driven, if
> > anyone is interested, participation is highly encouraged!
> >
> > I'm bringing this up for a couple of reasons. First, I want to make sure
> > that this stays in line with any related efforts within the Arrow project
> > and welcome any feedback. Second, the initial response has been great and
> > people are excited about using Arrow and looking to use it in other areas
> > of TF, but I've noticed there has been some confusion about how Arrow
> > handles tensor data. Specifically, it is often assumed that tensors can be
> > part of a RecordBatch and readily used in an Arrow stream.
> >
> > I know we have talked about making tensors a logical type for columnar
> > data before in
> > and there is a JIRA ARROW-1614, but since there is work needed to fully
> > support the current spec for 1.0, I don't think it has moved forward much.
> > I'm wondering if maybe now is a better time to start working on this? I
> > think having built-in support for tensor columns would really help to
> > increase adoption of Arrow in frameworks that use tensor data. What are
> > other people's thoughts?
> >
> > Best Regards,
> > Bryan
> >
