From: Sebastien Binet <seb.binet@gmail.com>
Date: Mon, 8 Jul 2019 10:51:36 +0200
Subject: Re: Go / Python Sharing
To: user@arrow.apache.org

Having not yet looked at how much work implementing plasma in Go would be, you may just ignore me :) but I think implementing a shared-memory Go allocator would be easier (as in fewer human hours to implement).
Another option could be to have a cgo package exposing a set of functions (compiled as a C shlib) that call into the Go-based arrow package to do what you need.
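
(To make the allocator idea above concrete, here is a rough, untested sketch: a bump allocator over a file mapped with MAP_SHARED, implementing arrow's memory.Allocator interface so that builders place their buffers in memory another process could map too. The ShmAllocator name is invented, and growth, freeing and cross-process coordination are all glossed over.)

// Rough, untested sketch of a shared-memory arrow allocator: a bump
// allocator over a file mapped with MAP_SHARED. The ShmAllocator name is
// invented; a real version needs growth, freeing, and a way to tell the
// other process where each buffer lives.
package shmalloc

import (
    "os"

    "github.com/apache/arrow/go/arrow/memory"
    "golang.org/x/sys/unix"
)

type ShmAllocator struct {
    buf []byte // the MAP_SHARED region
    off int    // bump pointer
}

// compile-time check that we satisfy arrow's allocator interface
var _ memory.Allocator = (*ShmAllocator)(nil)

func NewShmAllocator(path string, size int) (*ShmAllocator, error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0600)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    if err := f.Truncate(int64(size)); err != nil {
        return nil, err
    }
    b, err := unix.Mmap(int(f.Fd()), 0, size, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
    if err != nil {
        return nil, err
    }
    return &ShmAllocator{buf: b}, nil
}

// Allocate hands out 64-byte-aligned slices from the mapped region;
// it simply panics once the region is exhausted.
func (a *ShmAllocator) Allocate(size int) []byte {
    const align = 64
    a.off = (a.off + align - 1) &^ (align - 1)
    p := a.buf[a.off : a.off+size : a.off+size]
    a.off += size
    return p
}

// Reallocate is naive: always copy into a fresh slot.
func (a *ShmAllocator) Reallocate(size int, b []byte) []byte {
    n := a.Allocate(size)
    copy(n, b)
    return n
}

// Free is a no-op for a bump allocator.
func (a *ShmAllocator) Free(b []byte) {}

Such an allocator would be handed to the builders (array.NewInt64Builder(alloc) and friends); the hard part it skips is agreeing with the other process on offsets and lifetimes.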

-s

sent from my droid

On Mon, Jul 8, 2019, 10:30 Clive Cox <cc@seldon.io> wrote:

Thanks for all the informative replies.

In our case the Python and Go would be in separate processes. So, as I understand the conversation so far, the options are:
- Use of Plasma. This requires pending updates for the current Go implementation? (happy to help here)
- IPC, but this will require sending the data over the wire?
Thanks,

Clive



On Mon, 8 Jul 2019 at 09:05, Uwe L. Korn <uwelk@xhochy.com> wrote:
Hello all,

I've been using the in-process sharing method for quite some time for the Python<->Java interaction and I really like the ease of doing it all in the same process, especially as it avoids any memory copy or shared-memory handling. This is really useful for the case where you only want to call a single routine in another language.

Thus I would really like to see this also implemented for Go (and Rust) so that one can build custom UDFs in it and use them from Python code. The pre-conditions for this are that we have IPC tests that verify that both libraries use the exact same memory layout, and that we can pull the memory pointer out of the Go Arrow structures into the C++ memory structures while keeping a reference between both, so that memory tracking doesn't deallocate the underlying memory. For that we have in Python the pyarrow.foreign_buffer function: https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292
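
(A rough, untested Go-side sketch of that "pull the pointer out" step: build an array in Go, pin it in a package-level variable so the GC keeps it alive, and export the address and length of its values buffer through cgo, which is what pyarrow.foreign_buffer(address, size) would wrap on the Python side. The MakeInt64Array function and library name are invented; error handling and a release/unpin function are omitted.)

// Untested sketch: hand an arrow array's data buffer across cgo so another
// runtime can wrap it zero-copy. Build with:
//   go build -buildmode=c-shared -o libarrowgo.so
package main

/*
#include <stdint.h>
*/
import "C"

import (
    "unsafe"

    "github.com/apache/arrow/go/arrow/array"
    "github.com/apache/arrow/go/arrow/memory"
)

// Keep arrays referenced so the Go GC does not reclaim their buffers while
// the other side is still looking at them (not goroutine-safe; sketch only).
var live []*array.Int64

//export MakeInt64Array
func MakeInt64Array(n C.int64_t, addr *C.uint64_t, length *C.int64_t) {
    b := array.NewInt64Builder(memory.NewGoAllocator())
    defer b.Release()
    for i := int64(0); i < int64(n); i++ {
        b.Append(i)
    }
    arr := b.NewInt64Array()
    live = append(live, arr)

    buf := arr.Data().Buffers()[1] // buffer 0 is the validity bitmap, buffer 1 holds the values
    *addr = C.uint64_t(uintptr(unsafe.Pointer(&buf.Bytes()[0])))
    *length = C.int64_t(len(buf.Bytes()))
}

func main() {} // required by -buildmode=c-shared

The lifetime question raised above is exactly what the live slice hand-waves over: both sides need to agree on when the buffer may go away.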

For the Go<->Python case, though, I would recommend solving this as a Go<->C++ interface, as that would make the interaction possible for all the libraries based on the C++ one (like R, Ruby, ...).

Uwe

On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
My bad, IPC in Go seems to be implemented - https://issues.apache.org/jira/browse/ARROW-3679

On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <seb.binet@gmail.com> wrote:
As far as I know, Go does support IPC (as in the arrow IPC format).
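
(Roughly, and untested, moving a record batch through the Go arrow ipc package looks like the sketch below; the "wire" here is just a bytes.Buffer, but the same writer/reader pair would sit on a socket or pipe between the two processes.)

// Untested sketch: serialize a record batch with the arrow IPC stream format
// and read it back, as the receiving process would.
package main

import (
    "bytes"
    "fmt"

    "github.com/apache/arrow/go/arrow"
    "github.com/apache/arrow/go/arrow/array"
    "github.com/apache/arrow/go/arrow/ipc"
    "github.com/apache/arrow/go/arrow/memory"
)

func main() {
    mem := memory.NewGoAllocator()
    schema := arrow.NewSchema([]arrow.Field{{Name: "x", Type: arrow.PrimitiveTypes.Int64}}, nil)

    // build one record batch
    bldr := array.NewRecordBuilder(mem, schema)
    defer bldr.Release()
    bldr.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
    rec := bldr.NewRecord()
    defer rec.Release()

    // sender side: write the IPC stream (a socket or pipe in practice)
    var wire bytes.Buffer
    w := ipc.NewWriter(&wire, ipc.WithSchema(schema), ipc.WithAllocator(mem))
    if err := w.Write(rec); err != nil {
        panic(err)
    }
    if err := w.Close(); err != nil {
        panic(err)
    }

    // receiver side: read the record batches back out of the stream
    r, err := ipc.NewReader(&wire, ipc.WithAllocator(mem))
    if err != nil {
        panic(err)
    }
    defer r.Release()
    for r.Next() {
        fmt.Println(r.Record())
    }
}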

Another option which has been discussed at some point was to have a shared memory allocator so the arrow arrays could be shared between processes.

I haven't looked in detail at what implementing plasma support for Go would need on the Go side...

-s


sent from my droid

On Mon, Jul 8, 2019, 08:29 Miki Tebeka <miki@353solutions.com> wrote:
Hi Clive,

> I'd like to understand the high level design for a system where a Go process can communicate an Arrow data structure to a Python process on the same CPU

I see two options:
- Different processes with shared memory, probably using plasma
- Same process. Either Go uses Python as a shared library, or Python uses Go compiled to a shared library (-buildmode=c-shared)

> - and for the Python process to gain zero-copy access to that data, change it and inform the Go process. This is low latency, so I don't want to save to a file.

IIRC arrow is not built for mutation. You build an Array/Table once and then use it.
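
(A tiny, untested illustration of that build-once pattern in Go: values go in through a builder, and the resulting array is read-only; "changing" the data means building a new array.)

// Untested sketch of the build-once pattern with the Go arrow package.
package main

import (
    "fmt"

    "github.com/apache/arrow/go/arrow/array"
    "github.com/apache/arrow/go/arrow/memory"
)

func main() {
    b := array.NewInt64Builder(memory.NewGoAllocator())
    defer b.Release()

    b.AppendValues([]int64{1, 2, 3}, nil)
    arr := b.NewInt64Array() // immutable from here on: read it, don't mutate it
    defer arr.Release()

    fmt.Println(arr.Int64Values()) // [1 2 3]
}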

> Would this need the use of Plasma as a zero-copy store for the data between the two processes, or do I need to use IPC? But with IPC you are transferring the data, which is not needed in this case as I understand it. Any pointers to examples would be appreciated.

See above about options. Note that currently the Go arrow implementation doesn't support IPC or plasma (though it's in the works).

Yoni & I are working on another option, which is using the C++ arrow library from Go. It does support plasma, and since it uses the same underlying C++ library that Python does, you'll be able to pass a pointer around without copying data. It's at a very alpha-ish state, but you're more than welcome to give it a try - https://github.com/353solutions/carrow

Happy hacking,
Miki



--

3D""
Seldon Technologies Ltd, Rise London, 41 Luke Street, Shoreditch, EC2A 4DP (map). Registered in England & Wales, No. 9188032. VAT GB 258424587. Privacy Policy.