arrow-dev mailing list archives

From "ARF (Jira)" <j...@apache.org>
Subject [jira] [Created] (ARROW-6486) [Python] Allow subclassing & monkey-patching of Table
Date Sun, 08 Sep 2019 13:38:00 GMT
ARF created ARROW-6486:
--------------------------

             Summary: [Python] Allow subclassing & monkey-patching of Table
                 Key: ARROW-6486
                 URL: https://issues.apache.org/jira/browse/ARROW-6486
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: ARF


Currently, many classes in {{pyarrow}} behave unexpectedly for Python users: they are neither
subclassable nor monkey-patchable.

 

{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}

The factory method did not return an instance of our subclass...
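
One plausible explanation, sketched in pure Python rather than taken from the {{pyarrow}} sources: a factory classmethod that constructs the base class by name ignores the class it was called on, whereas one that constructs through {{cls}} returns the subclass. All class names below are hypothetical stand-ins, not {{pyarrow}} APIs.

```python
# Hypothetical stand-ins for illustration only; this is not pyarrow code.


class HardcodedTable:
    """Factory names the base class explicitly, so subclasses are ignored."""

    def __init__(self, columns, names):
        self.columns = list(columns)
        self.names = list(names)

    @classmethod
    def from_arrays(cls, columns, names):
        # `cls` is never used: the result is always a HardcodedTable.
        return HardcodedTable(columns, names)


class SubclassAwareTable:
    """Factory builds the result through `cls`, so subclasses are honoured."""

    def __init__(self, columns, names):
        self.columns = list(columns)
        self.names = list(names)

    @classmethod
    def from_arrays(cls, columns, names):
        # `cls` is whichever class the method was called on.
        return cls(columns, names)


class MyHardcoded(HardcodedTable):
    pass


class MyAware(SubclassAwareTable):
    pass


print(type(MyHardcoded.from_arrays([], [])).__name__)
print(type(MyAware.from_arrays([], [])).__name__)
```

The first call yields the base class and the second yields the subclass, which mirrors the difference between the current behaviour and the behaviour a Python user would expect.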

Never mind, let's monkey-patch {{Table}}:


{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class 'pyarrow.lib.Table'>}}

 

OK, that did not work either.

Let's be sneaky:

{{>>> table.__class__ = MyTable}}
{{Traceback (most recent call last):}}
{{ File "<stdin>", line 1, in <module>}}
{{TypeError: __class__ assignment only supported for heap types or ModuleType subclasses}}
{{>>>}}
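
This error is not specific to {{pyarrow}}. CPython permits {{__class__}} reassignment only between compatible classes defined in Python code (heap types), and classes compiled into an extension module are typically static, like the built-in types. A minimal sketch, using the built-in {{list}} as a stand-in for an extension type (the class names are hypothetical, and the exact error text varies between Python versions):

```python
class Base:
    pass


class Extended(Base):
    def describe(self):
        return "extended"


# Both classes are defined in Python and share a layout, so this is allowed.
obj = Base()
obj.__class__ = Extended
print(obj.describe())


class MyList(list):
    pass


# `list` is a static built-in type, so reassigning __class__ is rejected.
items = []
try:
    items.__class__ = MyList
except TypeError as exc:
    print("TypeError:", exc)
```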

 

There is currently no way to modify or extend the behaviour of a {{Table}} instance. Users
can use only what {{pyarrow}} provides out of the box. This is likely to be a source of
frustration for many Python users.

 

The attached PR remedies this for the {{Table}} class:

{{>>> import pyarrow as pa}}
{{>>> class MyTable(pa.Table):}}
{{...     pass}}
{{...}}
{{>>> table = MyTable.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}
{{>>> pa.TableOriginal = pa.Table}}
{{>>> pa.Table = MyTable}}
{{>>> table = pa.Table.from_arrays([], [])}}
{{>>> type(table)}}
{{<class '__main__.MyTable'>}}
{{>>>}}

 

Ideally, these modifications would be extended to the other Cython-defined classes of {{pyarrow}},
but given that {{Table}} is likely to be the interface through which most users begin their
interaction with {{pyarrow}}, I thought this would be a good start.

Keeping the changes limited to a single class should also keep merge conflicts manageable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
