arrow-user mailing list archives

From Yue Ni <niyue....@gmail.com>
Subject Re: pyarrow + pybind11 segfault under Linux
Date Thu, 27 Aug 2020 03:02:54 GMT
Could anyone shed some light on this one? Any help is appreciated.

All the statuses are OK and the table validates without any issue, but
for some reason only part of the wrapped table object can be accessed
in Python without triggering a segfault.
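
One thing that may be worth ruling out here (an assumption, not a confirmed
diagnosis) is that the process ends up with more than one copy of the Arrow
C++ library loaded, for example one bundled with a pip-installed pyarrow and
one that the binding links against. The Python sketch below lists the Arrow
shared objects mapped into the current process on Linux by parsing
`/proc/self/maps`. The helper names `arrow_libraries` and `report` are made
up for illustration.

```python
import re
from pathlib import Path


def arrow_libraries(maps_text, name_pattern=r"arrow"):
    """Return the distinct shared-object paths in a /proc/<pid>/maps dump
    whose file name matches name_pattern."""
    found = set()
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        filename = path.rsplit("/", 1)[-1]
        if ".so" in filename and re.search(name_pattern, filename):
            found.add(path)
    return sorted(found)


def report(maps_path="/proc/self/maps"):
    """Print the Arrow shared objects mapped into this process (Linux only)."""
    maps_file = Path(maps_path)
    if not maps_file.exists():
        print("no /proc maps available on this platform")
        return []
    libraries = arrow_libraries(maps_file.read_text())
    for path in libraries:
        print(path)
    return libraries
```

Calling `report()` after `import binding` and `binding.generate(100)` would
show which copies are mapped. Separately, `faulthandler.enable()` from the
standard library prints the Python-level traceback when the interpreter
receives SIGSEGV, which can complement the gdb backtrace.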

On Mon, Aug 24, 2020 at 8:48 AM Yue Ni <niyue.com@gmail.com> wrote:

> Thanks Wes. I should have been clearer: I did check the statuses and
> validate the table previously, and the table validates successfully
> when calling `ValidateFull`. I had removed those checks only to
> make the demonstration code shorter.
>
> Here is the same code with more status checked and info logged:
> ```
> pybind11::object generate(const int32_t count) {
>   _logger->info("generate_table");
>   shared_ptr<arrow::Array> array;
>   arrow::Int64Builder builder;
>   for (auto i = 0; i < count; i++) {
>     auto status = builder.Append(i);
>     if (!status.ok()) {
>       _logger->error("failed_to_append_value");
>     }
>   }
>   auto array_status = builder.Finish(&array);
>   if (!array_status.ok()) {
>     _logger->error("failed_to_build_array");
>   }
>
>   auto record_batch = RecordBatch::Make(
>       arrow::schema(vector{arrow::field("int_value", arrow::int64())}),
>       count, vector{array});
>   auto table =
>       arrow::Table::FromRecordBatches(vector{record_batch}).ValueOrDie();
>   auto table_status = table->ValidateFull();
>   if (!table_status.ok()) {
>     _logger->error("failed_to_validate_table");
>   } else {
>     _logger->info("table_validated_successfully");
>   }
>   auto result = arrow::py::import_pyarrow();
>   if (result == 0) {
>     _logger->info("pyarrow_successfully_imported");
>   } else {
>     _logger->error("failed_to_import_pyarrow");
>   }
>   auto wrapped_table = pybind11::reinterpret_borrow<pybind11::object>(
>       pybind11::handle(arrow::py::wrap_table(table)));
>   return wrapped_table;
> }
> ```
>
> Here is the output when I run it under Debian Bullseye (inside a Docker
> container) + Python 3.8.5:
> ```
> >>> table = binding.generate(100)
> [2020-08-24T08:29:27.035650+08:00] [info] [2172] [2172] [pyland] msg="generate_table"
> [2020-08-24T08:29:27.040507+08:00] [info] [2172] [2172] [pyland] msg="table_validated_successfully"
> [2020-08-24T08:29:27.312662+08:00] [info] [2172] [2172] [pyland] msg="pyarrow_successfully_imported"
> >>> print(table.num_rows)
> 100
> >>> print(table.shape)
> (100, 1)
> >>> print(table.num_columns)
> 1
> >>> print(table.column_names)
> ['']
> >>> print(table.columns)
> Segmentation fault
> ```
>
> I think the problem may happen in the last two lines, either wrapping the
> table is problematic or converting the wrapped table into a python object
> is problematic, but I am far from understanding what happens under the
> hood, and there could be other reasons I am missing.
>
> On Sun, Aug 23, 2020 at 11:07 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> There are a lot of unchecked Statuses in your code. I would suggest
>> checking them all and additionally adding a (checked!) call to
>> Validate() or ValidateFull() to make sure that everything is well
>> formed (it seems like it should be, but this is a pre-requisite before
>> debugging further)
>>
>> On Sun, Aug 23, 2020 at 1:27 AM Yue Ni <niyue.com@gmail.com> wrote:
>> >
>> > Hi there,
>> >
>> > I tried to create a Python binding for our Apache Arrow C++ based
>> > program, using pybind11 and the pyarrow wrapping code. For some reason,
>> > the code works on macOS but segfaults under Linux.
>> >
>> > I created a minimal test case to reproduce this behavior. Could anyone
>> > help take a look at what may be going wrong here?
>> >
>> > Here is the C++ code for creating the binding (it simply generates a
>> > fixed-size array, puts it into a record batch, and then creates a table):
>> > ```
>> > pybind11::object generate(const int32_t count) {
>> >   shared_ptr<arrow::Array> array;
>> >   arrow::Int64Builder builder;
>> >   for (auto i = 0; i < count; i++) {
>> >     auto _ = builder.Append(i);
>> >   }
>> >   auto _ = builder.Finish(&array);
>> >   auto record_batch = RecordBatch::Make(
>> >       arrow::schema(vector{arrow::field("int_value", arrow::int64())}),
>> >       count, vector{array});
>> >   auto table =
>> >       arrow::Table::FromRecordBatches(vector{record_batch}).ValueOrDie();
>> >   auto result = arrow::py::import_pyarrow();
>> >   auto wrapped_table = pybind11::reinterpret_borrow<pybind11::object>(
>> >       pybind11::handle(arrow::py::wrap_table(table)));
>> >   return wrapped_table;
>> > }
>> > ```
>> >
>> > Here is the Python code that uses the binding (it calls the binding to
>> > generate a 100-row, single-column table, then prints its row count,
>> > shape, and columns):
>> > ```
>> > >>> table = binding.generate(100)
>> > >>> print(table.num_rows) # this works correctly
>> > 100
>> > >>> print(table.shape) # this works correctly
>> > (100, 1)
>> > >>> print(table.num_columns) # this works correctly
>> > 1
>> > >>> print(table.column_names) # this prints a list with one empty name, which is incorrect, but the program still runs
>> > ['']
>> > >>> print(table.columns) # this causes the segfault
>> > Segmentation fault (core dumped)
>> > ```
>> >
>> > The same code works correctly on macOS (Apple clang 11, Python 3.7.5,
>> > arrow 1.0.0 C++ lib, pyarrow 1.0.0), but it doesn't work on Debian
>> > bullseye (gcc 10.2.0, Python 3.8.5, arrow 1.0.1 C++ lib, pyarrow 1.0.1).
>> > I also tried some combinations of Python 3.7.5 and arrow/pyarrow 1.0.0,
>> > but none of them works for me.
>> >
>> > I got the core dump and used gdb for some simple debugging. It seems
>> > the segfault happens when pyarrow calls `pyarrow_wrap_data_type`
>> > during `Field.init`.
>> >
>> > Here is the backtrace from the core dump:
>> > ```
>> > [Thread debugging using libthread_db enabled]
>> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> > Core was generated by `python3'.
>> > Program terminated with signal SIGSEGV, Segmentation fault.
>> > #0  0x00007f604ea484cc in __pyx_f_7pyarrow_3lib_pyarrow_wrap_data_type () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > [Current thread is 1 (Thread 0x7f60550bc740 (LWP 2205))]
>> > (gdb) where
>> > #0  0x00007f604ea484cc in __pyx_f_7pyarrow_3lib_pyarrow_wrap_data_type () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #1  0x00007f604eb05df0 in __pyx_f_7pyarrow_3lib_5Field_init(__pyx_obj_7pyarrow_3lib_Field*, std::shared_ptr<arrow::Field> const&) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #2  0x00007f604ea35d80 in __pyx_f_7pyarrow_3lib_pyarrow_wrap_field () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #3  0x00007f604ea68c8f in __pyx_pw_7pyarrow_3lib_6Schema_28_field(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #4  0x00007f604ea69595 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #5  0x00007f604ea755de in __pyx_pw_7pyarrow_3lib_6Schema_7__getitem__(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #6  0x00007f604ea476f0 in __pyx_sq_item_7pyarrow_3lib_Schema(_object*, long) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #7  0x00007f604ead554e in __pyx_pw_7pyarrow_3lib_5Table_55_column(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #8  0x00007f604ea69595 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #9  0x00007f604ea8c8df in __pyx_getprop_7pyarrow_3lib_5Table_columns(_object*, void*) () from /usr/local/lib/python3.8/dist-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so
>> > #10 0x000000000051bafa in ?? ()
>> > #11 0x0000000000518b3f in _PyObject_GenericGetAttrWithDict ()
>> > #12 0x0000000000505509 in _PyEval_EvalFrameDefault ()
>> > #13 0x0000000000503b25 in _PyEval_EvalCodeWithName ()
>> > #14 0x00000000005ce503 in PyEval_EvalCode ()
>> > #15 0x00000000005ec461 in ?? ()
>> > #16 0x00000000005e7a5f in ?? ()
>> > #17 0x000000000045b2dc in ?? ()
>> > #18 0x000000000045aee5 in PyRun_InteractiveLoopFlags ()
>> > #19 0x00000000005ef745 in PyRun_AnyFileExFlags ()
>> > #20 0x000000000044ddde in ?? ()
>> > #21 0x00000000005c3899 in Py_BytesMain ()
>> > #22 0x00007f60550e5cca in __libc_start_main (main=0x5c3860 <main>, argc=1, argv=0x7fffc5992db8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffc5992da8) at ../csu/libc-start.c:308
>> > #23 0x00000000005c379a in _start ()
>> > ```
>> >
>> > Given the complexity of the C++/Python conversion, I have no idea whether
>> > this is an issue in pyarrow, Cython, or pybind11. Can anyone shed some
>> > light on it, or suggest how I can troubleshoot such an issue? Thanks.
>> >
>> > Regards,
>> > Yue
>> >
>>
>
