arrow-dev mailing list archives

From "Wes McKinney (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ARROW-228) [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
Date Sat, 25 Jun 2016 22:43:37 GMT
Wes McKinney created ARROW-228:
----------------------------------

             Summary: [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
                 Key: ARROW-228
                 URL: https://issues.apache.org/jira/browse/ARROW-228
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Wes McKinney
            Assignee: Wes McKinney


In practice, IO interfaces in PyArrow will need to be bidirectional:

- Exposing internal IO interfaces written purely in C++ to Python users as file-like objects

- Exposing Python file-like objects to the C++ IO subsystem

To do this efficiently, we may want to introduce an arrow::Buffer subclass that manages the
lifetime of a PyBytes object in a GIL-safe way (i.e., on destruction, the GIL is acquired
and the object's refcount is decremented). We can still implement a Read method that copies
bytes into some other buffer, after which the PyBytes is immediately destroyed.
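
As a rough Python-level illustration of the copying Read path (the real version would live in C++ and handle the GIL explicitly), a sketch might look like the following. The name read_into is hypothetical and is not an Arrow API.

```python
import io


def read_into(source, dest, nbytes):
    """Copy up to nbytes from a file-like object into a writable buffer.

    Hypothetical helper, not part of Arrow. Returns the number of bytes copied.
    """
    nbytes = min(nbytes, len(dest))  # never read more than dest can hold
    chunk = source.read(nbytes)      # temporary bytes object owned by Python
    n = len(chunk)
    memoryview(dest)[:n] = chunk     # copy into the caller-owned buffer
    return n                         # the local reference to chunk ends here


dest = bytearray(8)
copied = read_into(io.BytesIO(b"arrow"), dest, 8)
```

Here the temporary bytes object lives only for the duration of the call, and the caller keeps ownership of the destination buffer.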

Outside of these byte buffer management issues, wrapping a file-like object (having read()
-> bytes, seek(), tell(), and other basic file methods) is fairly straightforward, and
will allow any of the current or upcoming IO adapters to read either from native classes (file
system, HDFS, etc.) or arbitrary Python streams.
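
A minimal sketch of such a wrapper, written in Python for illustration only. PyFileReader and its method names are assumptions made for this example, not existing Arrow classes; the actual adapter would be implemented against the C++ IO interfaces.

```python
import io


class PyFileReader:
    """Hypothetical adapter giving a Python file-like object a small
    random-access read interface. Not an Arrow class."""

    def __init__(self, fileobj):
        self._f = fileobj

    def tell(self):
        return self._f.tell()

    def seek(self, position):
        self._f.seek(position)

    def size(self):
        # Find the total size by seeking to the end, then restore the position.
        current = self._f.tell()
        self._f.seek(0, io.SEEK_END)
        end = self._f.tell()
        self._f.seek(current)
        return end

    def read(self, nbytes):
        data = self._f.read(nbytes)
        if not isinstance(data, bytes):
            raise TypeError("read() must return bytes")
        return data


reader = PyFileReader(io.BytesIO(b"0123456789"))
reader.seek(4)
chunk = reader.read(3)
```

The type check in read() is one way to reject text-mode streams early, since the C++ side would only ever expect raw bytes.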

To give a concrete example: consider the response body of an HTTP GET request. It can be placed
in a {{io.BytesIO}} object and then treated as a first-class citizen alongside the native (C++)
IO classes.
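
The idea can be sketched in plain Python as follows. The helper read_header and the payload bytes are made up for illustration, and the HTTP response is replaced by a stand-in so the example needs no network access.

```python
import io
import tempfile


def read_header(stream, nbytes=4):
    """Read the first nbytes of any seekable binary stream (hypothetical helper)."""
    stream.seek(0)
    return stream.read(nbytes)


# Stand-in for the body of an HTTP GET response; a real one might come from
# urllib.request.urlopen(url).read().
body = b"example payload bytes"
from_http = read_header(io.BytesIO(body))

# The same helper applied to a regular file on disk.
with tempfile.TemporaryFile() as f:
    f.write(body)
    from_file = read_header(f)
```

Both calls go through the same read()/seek() methods, which is the property that would let an in-memory stream stand in for a native file.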



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
