arrow-dev mailing list archives

From "Joris Van den Bossche (Jira)" <>
Subject [jira] [Created] (ARROW-7066) [Python] support returning ChunkedArray from __arrow_array__ ?
Date Tue, 05 Nov 2019 12:53:00 GMT
Joris Van den Bossche created ARROW-7066:

             Summary: [Python] support returning ChunkedArray from __arrow_array__ ?
                 Key: ARROW-7066
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche
             Fix For: 1.0.0

The {{\_\_arrow_array\_\_}} protocol was added so that custom objects can define how they
should be converted to a pyarrow Array (similar to numpy's {{\_\_array\_\_}}). This is then
also used to support converting pandas DataFrames with columns using pandas' ExtensionArrays
to a pyarrow Table (if the pandas ExtensionArray, such as nullable integer type, implements
this {{\_\_arrow_array\_\_}} method).

This last use case could also be useful for fletcher (
a package that implements pandas ExtensionArrays that wrap pyarrow arrays, so they can be
stored as is in a pandas DataFrame).
However, fletcher stores ChunkedArrays in its ExtensionArray / the columns of a pandas DataFrame
(to have a better mapping with a Table, where the columns also consist of chunked arrays),
while we currently require that the return value of {{\_\_arrow_array\_\_}} is a pyarrow.Array.

So I was wondering: could we relax this constraint and also allow ChunkedArray as return value?

However, this protocol is currently called in the {{pa.array(..)}} function, which probably
should keep returning an Array (and not ChunkedArray in certain cases).

cc [~uwe]

