From: Clive Cox
Date: Mon, 8 Jul 2019 09:28:45 +0100
Subject: Re: Go / Python Sharing
To: user@arrow.apache.org

Thanks for all the informative replies.

In our case the Python and Go would be in separate processes. So, as I
understand the conversation so far, the options are:

- Use of Plasma. Does this require pending updates to the current Go
  implementation? (Happy to help here.)
- IPC, but would this require sending the data over the wire?

Thanks,

Clive

On Mon, 8 Jul 2019 at 09:05, Uwe L. Korn <uwelk@xhochy.com> wrote:
> Hello all,
>
> I've been using the in-process sharing method for quite some time for the
> Python<->Java interaction and I really like the ease of doing it all in the
> same process, especially as this avoids any memory copy or shared-memory
> handling. This is really useful for the case where you only want to call a
> single routine in another language.
>
> Thus I would really like to see this also implemented for Go (and Rust) so
> that one can build custom UDFs in it and use them from Python code. The
> preconditions for this are that we have IPC tests that verify that both
> libraries use the exact same memory layout, and that we can pull the memory
> pointer out of the Go Arrow structures into the C++ memory structures while
> keeping a reference between both, so that memory tracking doesn't
> deallocate the underlying memory. For that, Python has the
> pyarrow.foreign_buffer function:
> https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292
>
> For the Go<->Python case, though, I would recommend solving this as a
> Go<->C++ interface, as this would make interaction possible for all the
> libraries based on the C++ one (R, Ruby, ...).
>
> Uwe
>
> On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
> > My bad, IPC in Go seems to be implemented:
> > https://issues.apache.org/jira/browse/ARROW-3679
> >
> > On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <seb.binet@gmail.com> wrote:
> > > As far as I know, Go does support IPC (as in the Arrow IPC format).
> > >
> > > Another option, which has been discussed at some point, was to have a
> > > shared-memory allocator so the Arrow arrays could be shared between
> > > processes.
> > >
> > > I haven't looked in detail at what implementing Plasma support for Go
> > > would need on the Go side...
> > >
> > > -s
> > >
> > > sent from my droid
> > >
> > > On Mon, Jul 8, 2019, 08:29 Miki Tebeka <miki@353solutions.com> wrote:
> > > > Hi Clive,
> > > >
> > > > > I'd like to understand the high-level design for a system where a
> > > > > Go process can communicate an Arrow data structure to a Python
> > > > > process on the same CPU
> > > >
> > > > I see two options:
> > > > - Different processes with shared memory, probably using Plasma
> > > > - Same process. Either Go uses the Python shared library, or Python
> > > >   uses Go compiled to a shared library (-buildmode=c-shared)
> > > >
> > > > > - and for the Python process to gain zero-copy access to that data,
> > > > > change it and inform the Go process. This is low latency, so I
> > > > > don't want to save to file.
> > > >
> > > > IIRC Arrow is not built for mutation. You build an Array/Table once
> > > > and then use it.
> > > >
> > > > > Would this need the use of Plasma as a zero-copy store for the data
> > > > > between the two processes, or do I need to use IPC? But with IPC you
> > > > > are transferring the data, which is not needed in this case as I
> > > > > understand it. Any pointers to examples would be appreciated.
> > > >
> > > > See above about options. Note that currently the Go Arrow
> > > > implementation doesn't support IPC or Plasma (though it's in the
> > > > works).
> > > >
> > > > Yoni & I are working on another option, which is using the C++ Arrow
> > > > library from Go. It does support Plasma, and since it uses the same
> > > > underlying C++ library that Python does, you'll be able to pass a
> > > > pointer around without copying data. It's in a very alpha-ish state,
> > > > but you're more than welcome to give it a try:
> > > > https://github.com/353solutions/carrow
> > > >
> > > > Happy hacking,
> > > > Miki

--
Seldon Technologies Ltd, Rise London, 41 Luke Street, Shoreditch, EC2A 4DP.
Registered in England & Wales, No. 9188032. VAT GB 258424587.
