arrow-dev mailing list archives

From "Yevgeni Litvin (JIRA)" <>
Subject [jira] [Created] (ARROW-5260) [Python][C++] Crash when deserializing from components in a fresh new process
Date Sun, 05 May 2019 05:01:00 GMT
Yevgeni Litvin created ARROW-5260:

             Summary: [Python][C++] Crash when deserializing from components in a fresh new process
                 Key: ARROW-5260
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.12.1, 0.13.0, 0.12.0
            Reporter: Yevgeni Litvin

Deserializing a table from its components in a fresh new process crashes with SIGSEGV:
#1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*)
from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./
#2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*)
() from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./
#3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*,
_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/
#4 0x00000000004ad919 in PyCFunction_Call ()
#5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186]
from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/
#6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) ()
from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/
#7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) ()
from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/
#8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*,
_object*) ()
from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/
#9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
#10 0x0000000000545e34 in ?? ()
#11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
#12 0x0000000000545a51 in ?? ()
#13 0x0000000000546890 in PyEval_EvalCode ()
#14 0x000000000042a9a8 in PyRun_FileExFlags ()
#15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
#16 0x000000000043e0ba in Py_Main ()
#17 0x0000000000421b04 in main ()
The following script reproduces the issue:
import pickle
import sys

import pandas as pd
import pyarrow as pa

if __name__ == '__main__':
    if sys.argv[1] == 'w':
        # Writer process: serialize a small table and pickle its components.
        df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
        table = pa.Table.from_pandas(df)
        table_serialized = pa.serialize(table)
        table_serialized_components = table_serialized.to_components()
        with open('/tmp/p.pickle', 'wb') as f:
            pickle.dump(table_serialized_components, f)
        print('/tmp/p.pickle written ok')

    if sys.argv[1] == 'r':
        # Reader process: a fresh interpreter that has not serialized anything
        # yet; pa.deserialize_components() segfaults here.
        with open('/tmp/p.pickle', 'rb') as f:
            table_serialized_components = pickle.load(f)
        table = pa.deserialize_components(table_serialized_components)

Then run:
$ python w
/tmp/p.pickle written ok

$ python r
Segmentation fault (core dumped)
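Because the crash only shows up when the read happens in a fresh interpreter, it can help to automate the two steps so each phase always runs in its own new process. The sketch below is a hypothetical helper, not part of the report: it assumes the reproducer above is saved as `repro.py` (the report does not name the script) and relies on the POSIX convention that `subprocess.run` reports a child killed by signal N with return code -N.

```python
import signal
import subprocess
import sys


def run_phase(script_args, mode):
    """Run one phase of the reproducer in a brand-new Python interpreter.

    script_args selects the script, e.g. ["repro.py"] (hypothetical filename)
    or ["-c", source_code]; mode is "w" (write /tmp/p.pickle) or "r" (read it).
    Returns the child's return code.
    """
    completed = subprocess.run([sys.executable, *script_args, mode])
    return completed.returncode


def describe_returncode(returncode):
    """Translate a subprocess return code into a readable outcome."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit code {returncode}"


if __name__ == "__main__":
    script = ["repro.py"]  # hypothetical name for the reproducer above
    for mode in ("w", "r"):
        rc = run_phase(script, mode)
        print(f"phase {mode}: {describe_returncode(rc)}")
        if rc != 0:
            break  # no point reading if the write phase failed
```

On an affected pyarrow version, the report's behaviour would show up as `phase r: SIGSEGV` after a successful write phase.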
The crash does not occur if some unrelated data is serialized in the same process before deserializing
(see the commented-out line in the reproduction instructions).


