arrow-user mailing list archives

From Max Grossman <jma...@gmail.com>
Subject returning an arrow::Array from C++ to Python through pybind11
Date Fri, 14 Aug 2020 21:24:10 GMT
Hi all,

I've written a C++ library that uses Arrow for its in-memory data
structures. I'd also like to add some Python APIs on top of this
library using pybind11, so that I can grab pyarrow wrappers of my C++
Arrow Arrays, convert them to NumPy arrays, and then pass them into
scikit-learn (or other Python libraries) without copying data around.

As an example, on the C++ side I've got a 1D vector class that wraps
an Arrow array and has a method to convert that Arrow array into a
pyarrow array:

        PyObject* get_local_pyarrow_array() {
            return arrow::py::wrap_array(std::dynamic_pointer_cast<arrow::Array,
                arrow::FixedSizeBinaryArray>(_arr));
        }

I've got some pybind11 registration code that registers the class and
that method:

    py::class_<ShmemML1D<double>>(m, "ShmemML1DD")
        .def("get_local_pyarrow_array",
                &ShmemML1D<double>::get_local_pyarrow_array);

And then I've got some Python code that calls this method (and which I
hope gets a pyarrow array as the return value):

arr = dist_arr.get_local_pyarrow_array()

Note that these are arrays that I'm constructing in C++ code and want
to expose to Python, so I don't have an existing pyarrow instance to
use. I'm trying to create a new pyarrow array around each of my C++
arrays, so that Python code can start manipulating them.

When I build and run all of this, I get the error "Unable to convert
function return value to a Python type!":

Traceback (most recent call last):
  File "/global/homes/j/jmg3/shmem_ml/example/python_wrapper.py", line 15, in <module>
    random.rand(vec)
  File "/global/homes/j/jmg3/shmem_ml/src/shmem_ml/random.py", line 8, in rand
    arr = dist_arr.get_local_pyarrow_array()
TypeError: Unable to convert function return value to a Python type!
The signature was
        (self: shmem_ml.core.ShmemML1DD) -> _object

I'm new to pybind11, so I suspect the problem lies less with my Arrow
usage than with my pybind11 usage. Is there a recommended way to do
this for pyarrow applications? There seem to be Cython examples in the
docs; would the suggestion be to drop pybind11 and write a wrapper of
my C++ class in Cython instead?

Thanks for any suggestions,

Max
