arrow-dev mailing list archives

From "Uwe L. Korn (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion
Date Mon, 09 Apr 2018 15:42:00 GMT
Uwe L. Korn created ARROW-2428:
----------------------------------

             Summary: [Python] Support ExtensionArrays in to_pandas conversion
                 Key: ARROW-2428
                 URL: https://issues.apache.org/jira/browse/ARROW-2428
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Uwe L. Korn
             Fix For: 1.0.0


Starting with the next release of Pandas, users will be able to define custom column types that back
a {{pandas.Series}}. Because we cannot be aware of every extension array, the {{to_pandas}} conversion
will not be able to cover all possible column types by default.

To let users create {{ExtensionArray}} instances from Arrow columns during the {{to_pandas}} conversion,
we should provide a hook in the {{to_pandas}} call that lets them override the default conversion
routines with ones that produce their {{ExtensionArray}} instances.
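
To make the idea concrete, here is a rough sketch of what such a hook could look like. It is pure Python and does not use Arrow or Pandas: {{to_pandas_sketch}}, the {{converters}} argument, the {{(type_name, values)}} column representation and the toy {{IPArray}} class are all hypothetical names chosen for illustration, not an existing or proposed API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins: a "column" is a (type_name, values) pair,
# not a real Arrow column.
Column = Tuple[str, List[Any]]
Converter = Callable[[List[Any]], Any]


class IPArray:
    """Toy stand-in for a user-defined ExtensionArray."""

    def __init__(self, packed: List[int]) -> None:
        # Keep a reference to the incoming values instead of copying them.
        self.packed = packed


def default_convert(values: List[Any]) -> List[Any]:
    # Fallback path: a generic list of Python objects, loosely analogous
    # to an object-dtype Pandas column.
    return list(values)


def to_pandas_sketch(
    columns: Dict[str, Column],
    converters: Optional[Dict[str, Converter]] = None,
) -> Dict[str, Any]:
    """Convert each column, preferring a user-supplied converter per type."""
    converters = converters or {}
    result = {}
    for name, (type_name, values) in columns.items():
        # The hook: a registered converter overrides the default routine.
        convert = converters.get(type_name, default_convert)
        result[name] = convert(values)
    return result


table = {
    "id": ("int64", [1, 2]),
    "addr": ("ipv4", [167772161, 167772162]),
}
out = to_pandas_sketch(table, converters={"ipv4": IPArray})
```

Here the {{addr}} column goes straight to the user's array type, while {{id}} falls through to the default routine. Keying the converters on the column type rather than the column name is only one possible design; the real hook may well look different.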

This should avoid the additional copies incurred today, where we first convert the
Arrow column into a default Pandas column (probably of object type) and the user then
converts it to a more efficient {{ExtensionArray}}. The hook will be especially useful
for {{ExtensionArray}} implementations whose storage is backed by Arrow.

The meta-issue that tracks the implementation inside of Pandas is: https://github.com/pandas-dev/pandas/issues/19696



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
