arrow-user mailing list archives

From: Wes McKinney <wesmck...@gmail.com>
Subject: Re: returning an arrow::Array from C++ to Python through pybind11
Date: Sat, 15 Aug 2020 21:07:01 GMT

Are you using the arrow::py::wrap_array function? You can follow other
pybind11 projects that successfully use the pyarrow C/C++ API, such as
turbodbc (linked below). You also have to call the import_pyarrow()
function before calling any of the wrap functions.

https://github.com/blue-yonder/turbodbc/blob/0369d1329a0ea39982a4d8d169b8dd3f473e6689/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp#L338

On Fri, Aug 14, 2020 at 4:24 PM Max Grossman <jmaxg3@gmail.com> wrote:
>
> Hi all,
>
> I've written a C++ library that uses arrow for its in-memory data
> structures. I'd like to also add some Python APIs on top of this
> library using pybind11, so that I can grab pyarrow wrappers of my C++
> Arrow Arrays, convert them to numpy arrays, and then pass them in to
> scikit-learn (or other Python libraries) without copying data around.
>
> As an example, on the C++ side I've got a 1D vector class that wraps
> an arrow array and has a method to convert the arrow array into a
> pyarrow array:
>
>         PyObject* get_local_pyarrow_array() {
>             return arrow::py::wrap_array(std::dynamic_pointer_cast<arrow::Array,
>                 arrow::FixedSizeBinaryArray>(_arr));
>         }
>
> I've got some pybind11 registration code that registers the class and
> that method:
>
>     py::class_<ShmemML1D<double>>(m, "ShmemML1DD")
>         .def("get_local_pyarrow_array",
>                 &ShmemML1D<double>::get_local_pyarrow_array);
>
> And then I've got some Python code that calls this method (and which I
> hope gets a pyarrow array as the return value):
>
> arr = dist_arr.get_local_pyarrow_array()
>
> Note that these are arrays that I'm constructing in C++ code and want
> to expose to Python, so I don't already have a pre-existing pyarrow
> instance to use. I'm trying to create a new one around my C++ arrays,
> so that Python code can start manipulating those C++ arrays.
>
> When I build and run all this, I just get told "Unable to convert
> function return value to a Python type!":
>
> Traceback (most recent call last):
>   File "/global/homes/j/jmg3/shmem_ml/example/python_wrapper.py", line 15, in <module>
>     random.rand(vec)
>   File "/global/homes/j/jmg3/shmem_ml/src/shmem_ml/random.py", line 8, in rand
>     arr = dist_arr.get_local_pyarrow_array()
> TypeError: Unable to convert function return value to a Python type!
> The signature was
>         (self: shmem_ml.core.ShmemML1DD) -> _object
>
> I'm new to pybind11, so I suspect this is a problem with my pybind11
> usage rather than with my arrow usage. Is there a better, recommended
> way to do this for pyarrow applications? The docs have Cython
> examples; would the suggestion be to drop pybind11 and write a
> Cython wrapper of my C++ class instead?
>
> Thanks for any suggestions,
>
> Max
