arrow-user mailing list archives

From Andrew Spott <andrew.sp...@gmail.com>
Subject Fwd: RecordBatch with Tensors/Arrays
Date Wed, 19 Jun 2019 14:13:41 GMT
I was told to post this here rather than as an issue on GitHub.

====

I'm looking to serialize data that looks something like this:

```
record<n1> = { "predicted": <tensor with shape n1, m>,
               "truth": <tensor with shape n1, m>,
               "loss": <double>,
               "index": <array with shape n1>}

data = [
    pa.array([record<n1>, record<n2>, record<n3>]),
    pa.array([<float>, <float>, <float>]),
    pa.array([<float>, <float>, <float>])
]

batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])
```

But I'm not sure how to do that, or even if what I'm trying to do is the
right way to do it.

What is the difference between `pa.array` and `pa.list_`?  The formulation
above is an array of structs; is the struct-of-arrays formulation also
possible? i.e.:

```
data = [
    pa.array([<tensor with shape n1, m>, <tensor with shape n2, m>, <tensor with shape n3, m>]),
    pa.array([<tensor with shape n1, m>, <tensor with shape n2, m>, <tensor with shape n3, m>]),
    pa.array([<float>, <float>, <float>]),
...
]
```

This doesn't currently work.  There seems to be a separation between
'1d arraylike' datatypes, 'pythonlike' datatypes, and 'nd arraylike'
datatypes, so I can't put an array inside a struct.

-Andrew
