arrow-user mailing list archives

From "Uwe L. Korn" <uw...@xhochy.com>
Subject Re: Go / Python Sharing
Date Mon, 08 Jul 2019 08:04:45 GMT
Hello all,

I've been using the in-process sharing method for quite some time for the Python<->Java
interaction, and I really like the ease of doing it all in the same process, especially as
this avoids any memory copies or shared-memory handling. This is really useful for the case
where you only want to call a single routine in another language.

Thus I would really like to see this also implemented for Go (and Rust) so that one can build
custom UDFs in them and use them from Python code. The preconditions for this are that we have
IPC tests that verify that both libraries use the exact same memory layout, and that we can
pull the memory pointer out of the Go Arrow structures into the C++ memory structures while
keeping a reference between both, so that memory tracking doesn't deallocate the underlying
memory. For that, we have the pyarrow.foreign_buffer function in Python:
https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292

For the Go<->Python case, though, I would recommend solving this as a Go<->C++
interface, as this would make interaction possible for all the libraries based on the C++
one (like R, Ruby, ...).

Uwe

On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
> My bad, IPC in Go seems to be implemented - https://issues.apache.org/jira/browse/ARROW-3679
> 
> On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <seb.binet@gmail.com> wrote:
>> As far as i know, Go does support IPC (as in the arrow IPC format)
>> 
>> Another option which has been discussed at some point was to have a shared memory
allocator so the arrow arrays could be shared between processes.
>> 
>> I haven't looked in detail at what implementing plasma support for Go would need on
>> the Go side...
>> 
>> -s
>> 
>> 
>> sent from my droid
>> 
>> On Mon, Jul 8, 2019, 08:29 Miki Tebeka <miki@353solutions.com> wrote:
>>> Hi Clive,
>>> 
>>>> I'd like to understand the high level design for a system where a Go process
can communicate an Arrow data structure to a python process on the same CPU
>>> I see two options
>>> - Different processes with shared memory, probably using plasma
>>> - Same process. Then either Go uses the Python shared library, or Python uses Go compiled
>>> to a shared library (-buildmode=c-shared)
>>> 
>>>> - and for the python process to zero-copy gain access to that data, change
it and inform the Go process. This is low latency so I don't want to save to file.
>>> IIRC arrow is not built for mutation. You build an Array/Table once and then
use it.
>>> 
>>>> Would this need the use of Plasma as a zero-copy store for the data between
the two processes or do I need to use IPC? But with IPC you are transferring the data which
is not needed in this case as I understand it. Any pointers to examples would be appreciated.
>>> See above about options. Note that currently the Go arrow implementation doesn't
support IPC or plasma (though it's in the works).
>>> 
>>> Yoni & I are working on another option, which is using the C++ arrow library
>>> from Go. It does support plasma, and since it uses the same underlying C++ library that Python
>>> does, you'll be able to pass a pointer around without copying data. It's in a very alpha-ish
>>> state, but you're more than welcome to give it a try - https://github.com/353solutions/carrow
>>> 
>>> Happy hacking,
>>> Miki 
