arrow-issues mailing list archives

From "Jeroen Hoekx (Jira)" <>
Subject [jira] [Created] (ARROW-12622) [Python] Segfault when reading CSV inside Flight server
Date Sat, 01 May 2021 13:01:00 GMT
Jeroen Hoekx created ARROW-12622:

             Summary: [Python] Segfault when reading CSV inside Flight server
                 Key: ARROW-12622
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 4.0.0
         Environment: Arch Linux 5.11.16-arch1-1
Originally found on GitHub Actions Ubuntu 20.04.2
Python 3.8 and Python 3.9
            Reporter: Jeroen Hoekx
         Attachments:, test.csv

Using pyarrow.csv.read_csv inside a Flight server results in a segfault. This did not happen
in pyarrow 3.0.0.

The CI build of a library we're building failed and made us aware of the issue.

Attached are a CSV file and a Python server/client pair that demonstrate the problem.
 * Run the server with `python server`.
 * Run the client with `python client`. The server segfaults with 'Segmentation fault (core dumped)'.

The crash does not happen when just reading the CSV (`python`).

This is the stack trace generated by `coredumpctl debug` of a debug build of commit 2746266addddf71d20a4fe49381497b894c4d15c:
#0  0x00007f9275cffedc in __gnu_cxx::__atomic_add (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:55
#1  __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:96
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x8)
    at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:142
#3  0x00007f9275cfe0a5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count
    __r=...) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:740
#4  0x00007f9275cfd01f in std::__shared_ptr<arrow::StopSourceImpl, (__gnu_cxx::_Lock_policy)2>::__shared_ptr
    this=0x7f92735a2770) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:1181
#5  0x00007f9275cfd045 in std::shared_ptr<arrow::StopSourceImpl>::shared_ptr (this=0x7f92735a2770)
    at /usr/include/c++/10.2.0/bits/shared_ptr.h:149
#6  0x00007f9275cfd06b in arrow::StopToken::StopToken (this=0x7f92735a2770)
    at /home/jeroen/dev/python/apache-arrow/dist/include/arrow/util/cancel.h:57
#7  0x00007f9275ce96f7 in __pyx_pf_7pyarrow_4_csv_read_csv (__pyx_self=0x0, __pyx_v_input_file=0x7f929e9f28b0,
    __pyx_v_read_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_parse_options=0x7f929f49ee80
    __pyx_v_convert_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_memory_pool=0x7f929f49ee80
    at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14208
#8  0x00007f9275ce8b92 in __pyx_pw_7pyarrow_4_csv_1read_csv (__pyx_self=0x0, __pyx_args=0x7f929ea64be0,
    at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14036
#9  0x00007f929f22cf98 in ?? () from /usr/lib/
#10 0x00007f929f22d5f8 in _PyObject_MakeTpCall () from /usr/lib/

Based on my limited understanding of the code, it looks like the error is here:
    with SignalStopHandler() as stop_handler:
        io_context = CIOContext(
            (<StopToken> stop_handler.stop_token).stop_token)
Where `stop_token` is null, because the `SignalStopHandler` had an empty list of signals on
creation:
        if (signal_handlers_enabled and
                threading.current_thread() is threading.main_thread()):
            self._signals = [
                sig for sig in (signal.SIGINT, signal.SIGTERM)
                if signal.getsignal(sig) not in (signal.SIG_DFL,
                                                 signal.SIG_IGN, None)]
        if not self._signals.empty():
            self._stop_token = StopToken()
            self._enabled = True
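
The condition quoted above can be tried out on its own with only the standard library. The following is a minimal sketch, not pyarrow code: `collect_signals` and `collect_in_worker_thread` are hypothetical names, and it assumes (the report does not state this) that the Flight server invokes its handlers on a thread other than the main thread. Under that assumption the signal list stays empty, which matches the path on which no `StopToken` is created.

```python
import signal
import threading


def collect_signals(handlers_enabled=True):
    """Hypothetical stand-in for the quoted check: only the main thread
    collects the signals that have a non-default Python-level handler."""
    signals = []
    if (handlers_enabled and
            threading.current_thread() is threading.main_thread()):
        signals = [
            sig for sig in (signal.SIGINT, signal.SIGTERM)
            if signal.getsignal(sig) not in (signal.SIG_DFL,
                                             signal.SIG_IGN, None)]
    return signals


def collect_in_worker_thread():
    """Run collect_signals() on a non-main thread and return its result."""
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(signals=collect_signals()))
    worker.start()
    worker.join()
    return result["signals"]


if __name__ == "__main__":
    # The main-thread result depends on which handlers are installed.
    print("main thread:  ", collect_signals())
    # On a worker thread the main-thread check fails, so the list is empty.
    print("worker thread:", collect_in_worker_thread())
```

On a worker thread the list is always empty, regardless of which handlers are installed, so any code that later reads the token unconditionally would see an uninitialized value.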

This message was sent by Atlassian Jira
