arrow-dev mailing list archives

From "Armin Berres (JIRA)" <>
Subject [jira] [Created] (ARROW-3651) [Python] Datetimes from non-DateTimeIndex cannot be deserialized
Date Tue, 30 Oct 2018 09:24:00 GMT
Armin Berres created ARROW-3651:

             Summary: [Python] Datetimes from non-DateTimeIndex cannot be deserialized
                 Key: ARROW-3651
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1
            Reporter: Armin Berres

Given a columns index which contains datetimes but is not a DatetimeIndex, writing the file works
but reading it back fails.
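import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(1, index=pd.MultiIndex.from_arrays([[1, 2], [3, 4]]), columns=[pd.to_datetime("2018/01/01")])

# after this, the columns index is no longer a DatetimeIndex
df = df.reset_index().set_index(['level_0', 'level_1'])

table = pa.Table.from_pandas(df)
pq.write_table(table, 'test.parquet')

# reading back is the step that fails (call assumed; the report does not show it)
pq.read_table('test.parquet').to_pandas()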


Reading the file back results in:
KeyError                                  Traceback (most recent call last)
~/venv/mpptool/lib/python3.7/site-packages/pyarrow/ in _pandas_type_to_numpy_type(pandas_type)
    676     try:
--> 677         return _pandas_logical_type_map[pandas_type]
    678     except KeyError:

KeyError: 'datetime'

The created schema:

2018-01-01 00:00:00: int64
level_0: int64
level_1: int64
{b'pandas': b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
            b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
            b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
            b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
            b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
            b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
            b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
            b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
            b'64", "metadata": null}], "pandas_version": "0.23.4"}'}
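
The metadata above can be inspected without pyarrow. The following is a minimal sketch using only the standard library: the bytes are copied from the dump above, and the helper name `suspicious_column_indexes` is hypothetical, not part of pyarrow. It flags `column_indexes` entries that declare `pandas_type` "datetime" while storing `numpy_type` "object", which appears to be the combination behind the `KeyError: 'datetime'` in the traceback.

```python
import json

# Metadata bytes copied verbatim from the schema dump in this report.
pandas_metadata = (
    b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
    b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
    b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
    b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
    b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
    b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
    b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
    b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
    b'64", "metadata": null}], "pandas_version": "0.23.4"}'
)


def suspicious_column_indexes(raw):
    """Hypothetical helper: return column_indexes entries whose logical
    type is 'datetime' but whose storage type is a plain 'object'."""
    meta = json.loads(raw.decode("utf-8"))
    return [
        entry
        for entry in meta["column_indexes"]
        if entry["pandas_type"] == "datetime" and entry["numpy_type"] == "object"
    ]


flagged = suspicious_column_indexes(pandas_metadata)
for entry in flagged:
    print(entry)
```

On a real file, the same bytes should be available from the table schema's metadata under the `b'pandas'` key, as the dump above shows.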

This message was sent by Atlassian JIRA