arrow-user mailing list archives

From Sebastien Binet <seb.bi...@gmail.com>
Subject Re: Go / Python Sharing
Date Mon, 08 Jul 2019 08:51:36 GMT
I haven't yet looked at how much work implementing plasma in Go would be,
so you may just ignore me :) but I think implementing a shared-memory Go
allocator would be easier (as in fewer human hours to implement).
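
To make the shared-memory idea concrete, here is a minimal sketch, written in Python for brevity rather than Go. One side allocates a named segment and writes a fixed-width int64 column into it; the other side attaches by name and reads it back. The function names are hypothetical, the reader copies the values out instead of handing back a zero-copy view, and nothing here is wired into Arrow's own allocator interface, so treat it as an illustration of the mechanism only.

```python
import struct
from multiprocessing import shared_memory  # Python 3.8+


def write_int64_column(values):
    """Create a shared-memory segment holding `values` as little-endian int64."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"<{len(values)}q", shm.buf, 0, *values)
    return shm  # the caller owns the segment and must close() and unlink() it


def read_int64_column(name, count):
    """Attach to an existing segment by name and copy out `count` int64 values."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The segment may be larger than requested (page rounding),
        # so the element count has to be communicated separately.
        return list(struct.unpack_from(f"<{count}q", shm.buf, 0))
    finally:
        shm.close()
```

The writer would pass `shm.name` and the element count to the other process through any side channel (a pipe, a socket, a command-line argument); only that small handle crosses the process boundary, not the data itself.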

Another option could be to have a CGo package exposing a set of functions
(compiled as a C shlib) that call into the Go based arrow package to do
what you need.
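
From the Python side, consuming such a C shared library could look roughly like the sketch below. No Go-built library is assumed to exist here, so the C library's `strlen` stands in for an exported Go function; the file name `libgoarrow.so` in the comment is hypothetical. The sketch assumes a POSIX system, where `ctypes.CDLL(None)` opens the running process.

```python
import ctypes

# Stand-in: open the running process to reach the C library's strlen.
# With a Go package built as a C shared library, this would instead be
# something like ctypes.CDLL("./libgoarrow.so") (hypothetical file name).
lib = ctypes.CDLL(None)

# Declare the C signature explicitly; ctypes assumes int arguments and
# an int return value otherwise.
lib.strlen.argtypes = [ctypes.c_char_p]
lib.strlen.restype = ctypes.c_size_t

length = lib.strlen(b"arrow")
```

The same pattern (load, declare `argtypes` and `restype`, call) would apply to whatever functions the Go package chose to export.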

-s

sent from my droid

On Mon, Jul 8, 2019, 10:30 Clive Cox <cc@seldon.io> wrote:

>
> Thanks for all the informative replies.
>
> In our case the Python and Go code would run in separate processes. As I
> understand the conversation so far, the options for that are:
>
>    - Use of Plasma. This requires pending updates for the current Go
>    implementation? (happy to help here)
>    - IPC - but this will require sending the data over the wire?
>
> Thanks,
>
>  Clive
>
>
>
>
>
> On Mon, 8 Jul 2019 at 09:05, Uwe L. Korn <uwelk@xhochy.com> wrote:
>
>> Hello all,
>>
>> I've been using the in-process sharing method for quite some time for the
>> Python<->Java interaction and I really like the ease of doing it all in the
>> same process. Especially as this avoids any memory-copy or shared memory
>> handling. This is really useful for the case where you only want to call a
>> single routine in another language.
>>
>> Thus I would really like to see this also implemented for Go (and Rust)
>> so that one can build custom UDFs in it and use them from Python code. The
>> pre-conditions for this are that we have IPC tests that verify that both
>> libraries use the exact same memory layout and that we can pull out the
>> memory pointer from the Go Arrow structures into the C++ memory structures
>> and also keep a reference between both so that memory tracking doesn't
>> deallocate the underlying memory. For that, Python has the
>> pyarrow.foreign_buffer function:
>> https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292
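
The requirement described above (take a raw memory pointer from another runtime and keep a reference so the owner does not free it) can be sketched with the standard library alone. `ForeignView` is a hypothetical helper, not pyarrow's implementation; it only takes the same three ingredients: an address, a size, and a base object to keep alive. A ctypes buffer stands in for memory owned by Go or C++.

```python
import ctypes


class ForeignView:
    """View `size` bytes at `address` without copying, keeping `base` alive."""

    def __init__(self, address, size, base):
        # Holding `base` prevents the owner of the memory from being
        # garbage-collected while this view is still in use.
        self.base = base
        self._array = (ctypes.c_ubyte * size).from_address(address)
        self.view = memoryview(self._array)


# Stand-in for a buffer allocated by another runtime.
owner = ctypes.create_string_buffer(b"arrow", 5)
fv = ForeignView(ctypes.addressof(owner), 5, base=owner)
```

Because the view aliases the original memory, a write through `owner` is visible through `fv.view` immediately, which is the zero-copy property being asked for.
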
>>
>> For the Go<->Python case, though, I would recommend solving this as a
>> Go<->C++ interface, as this would make interaction possible for all the
>> libraries based on the C++ one (like R, Ruby, ...).
>>
>> Uwe
>>
>> On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
>>
>> My bad, IPC in Go seems to be implemented -
>> https://issues.apache.org/jira/browse/ARROW-3679
>>
>> On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <seb.binet@gmail.com>
>> wrote:
>>
>> As far as I know, Go does support IPC (as in the Arrow IPC format).
>>
>> Another option which has been discussed at some point was to have a
>> shared memory allocator so the arrow arrays could be shared between
>> processes.
>>
>> I haven't looked in detail at what implementing plasma support for Go
>> would need on the Go side...
>>
>> -s
>>
>>
>> sent from my droid
>>
>> On Mon, Jul 8, 2019, 08:29 Miki Tebeka <miki@353solutions.com> wrote:
>>
>> Hi Clive,
>>
>> I'd like to understand the high level design for a system where a Go
>> process can communicate an Arrow data structure to a python process on the
>> same CPU
>>
>> I see two options
>> - Different processes with shared memory, probably using plasma
>> - Same process. Then either Go uses the Python shared library, or Python
>> uses Go compiled to a shared library (-buildmode=c-shared)
>>
>>
>> - and for the python process to zero-copy gain access to that data,
>> change it and inform the Go process.  This is low latency so I don't want
>> to save to file.
>>
>> IIRC arrow is not built for mutation. You build an Array/Table once and
>> then use it.
>>
>> Would this need the use of Plasma as a zero-copy store for the data
>> between the two processes or do I need to use IPC? But with IPC you are
>> transferring the data which is not needed in this case as I understand it.
>> Any pointers to examples would be appreciated.
>>
>> See above about options. Note that currently the Go arrow implementation
>> doesn't support IPC or plasma (though it's in the works).
>>
>> Yoni & I are working on another option which is using the C++ arrow
>> library from Go. It does support plasma and since it uses the same
>> underlying C++ library that Python does you'll be able to pass a pointer
>> around without copying data. It's in a very alpha-ish state, but you're more
>> than welcome to give it a try - https://github.com/353solutions/carrow
>>
>> Happy hacking,
>> Miki
>>
>>
>>
>
> --
>
>
> <https://www.seldon.io>
> Seldon Technologies Ltd, Rise London, 41 Luke Street, Shoreditch, EC2A 4DP
> (map <https://goo.gl/maps/BbJgCdNso5Q2>). Registered in England & Wales,
> No. 9188032. VAT GB 258424587. Privacy Policy
> <https://www.seldon.io/privacy/>.
>
