arrow-user mailing list archives

From Jason Sachs <jmsa...@gmail.com>
Subject bug? pyarrow.Table.from_pydict does not handle binary type correctly with embedded 00 bytes?
Date Wed, 04 Nov 2020 23:05:13 GMT
It looks like pyarrow.Table.from_pydict() truncates binary values at the first embedded NUL (0x00)
byte when given a NumPy fixed-width bytes array (dtype '|S13' below). Is this a known bug?
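The truncation is consistent with each 13-byte '|S13' slot being read as a NUL-terminated C string (stop at the first 0x00), whereas NumPy, and therefore pandas, strips only the trailing NUL padding. That is an inference from the observed output, not something checked against pyarrow's source. A minimal pure-Python sketch of the two readings follows; `pack_fixed_width`, `read_as_c_string` and `read_as_numpy_bytes` are hypothetical helpers written for illustration, not pyarrow or NumPy APIs.

```python
# Hypothetical helpers modelling two ways to read a fixed-width bytes slot.
FIELD_WIDTH = 13  # matches dtype='|S13' in the report


def pack_fixed_width(value: bytes, width: int = FIELD_WIDTH) -> bytes:
    """Right-pad value with NUL bytes to `width`, as NumPy stores 'S' items."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return value.ljust(width, b"\x00")


def read_as_c_string(slot: bytes) -> bytes:
    """Stop at the first NUL, like C strlen(); embedded NULs truncate the value."""
    return slot.split(b"\x00", 1)[0]


def read_as_numpy_bytes(slot: bytes) -> bytes:
    """Strip only trailing NUL padding, which is how NumPy returns 'S' items."""
    return slot.rstrip(b"\x00")


values = [b'', b'', b'', b'Foo!!', b'Bar!!',
          b'\x00Baz!', b'half\x00baked', b'']
slots = [pack_fixed_width(v) for v in values]

print([read_as_c_string(s) for s in slots])
# [b'', b'', b'', b'Foo!!', b'Bar!!', b'', b'half', b'']
print([read_as_numpy_bytes(s) for s in slots])
# [b'', b'', b'', b'Foo!!', b'Bar!!', b'\x00Baz!', b'half\x00baked', b'']
```

The NumPy-style reading has its own limitation: a value that genuinely ends in NUL bytes is indistinguishable from padding, so those bytes are lost as well.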

(py3) C:\>python
Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pyarrow as pa
>>>
>>> data = np.array([b'', b'', b'', b'Foo!!', b'Bar!!',
...        b'\x00Baz!', b'half\x00baked', b''], dtype='|S13')
>>> t = pa.Table.from_pydict({'data':data})
>>> t.to_pandas()
       data
0       b''
1       b''
2       b''
3  b'Foo!!'
4  b'Bar!!'
5       b''
6   b'half'
7       b''
>>> import pandas as pd
>>> pd.DataFrame(data)
                  0
0               b''
1               b''
2               b''
3          b'Foo!!'
4          b'Bar!!'
5       b'\x00Baz!'
6  b'half\x00baked'
7               b''
>>>
