From: "Uwe L. Korn" <uwelk@xhochy.com>
To: user@arrow.apache.org
Date: Mon, 08 Jul 2019 10:04:45 +0200
Subject: Re: Go / Python Sharing

Hello all,

I've been using the in-process sharing method for quite some time for the Python<->Java interaction, and I really like the ease of doing it all in the same process, especially as it avoids any memory copy or shared-memory handling. This is really useful when you only want to call a single routine in another language.

I would therefore really like to see this implemented for Go (and Rust) as well, so that one can build custom UDFs in those languages and use them from Python code. The preconditions for this are that we have IPC tests verifying that both libraries use the exact same memory layout, and that we can pull the memory pointer out of the Go Arrow structures into the C++ memory structures while keeping a reference between the two, so that memory tracking doesn't deallocate the underlying memory. On the Python side we have the pyarrow.foreign_buffer function for that: https://github.com/apache/arrow/blob/1b798a317df719d32312ca2c3253a2e399e949b8/python/pyarrow/io.pxi#L1276-L1292
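To illustrate, here is a minimal sketch of what that zero-copy wrapping looks like on the Python side. The ctypes array merely stands in for memory owned by another runtime (Go, Java, ...), so everything apart from pyarrow.foreign_buffer itself is illustrative:

    import ctypes
    import pyarrow as pa

    # Stand-in for memory owned by another runtime.
    raw = (ctypes.c_int64 * 4)(1, 2, 3, 4)

    # Wrap the foreign memory without copying; passing base=raw keeps the
    # owner alive for as long as the resulting buffer is referenced.
    buf = pa.foreign_buffer(ctypes.addressof(raw), ctypes.sizeof(raw), base=raw)

    # Reinterpret the buffer as an Arrow array, still zero-copy.
    # Buffers for a primitive type are [validity_bitmap, data]; None means
    # "no nulls".
    arr = pa.Array.from_buffers(pa.int64(), 4, [None, buf])
    print(arr)  # -> [1, 2, 3, 4]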
For the Go<->Python case, though, I would recommend solving this as a Go<->C++ interface, as that would make the interaction possible for all the libraries based on the C++ one (like R, Ruby, ...).

Uwe

On Mon, Jul 8, 2019, at 9:57 AM, Miki Tebeka wrote:
> My bad, IPC in Go seems to be implemented - https://issues.apache.org/jira/browse/ARROW-3679
>
> On Mon, Jul 8, 2019 at 10:18 AM Sebastien Binet <seb.binet@gmail.com> wrote:
>> As far as I know, Go does support IPC (as in the Arrow IPC format).
>>
>> Another option which has been discussed at some point was to have a shared-memory allocator so that Arrow arrays could be shared between processes.
>>
>> I haven't looked in detail at what implementing plasma support for Go would need on the Go side...
>>
>> -s
>>
>> sent from my droid
>>
>> On Mon, Jul 8, 2019, 08:29 Miki Tebeka <miki@353solutions.com> wrote:
>>> Hi Clive,
>>>
>>>> I'd like to understand the high level design for a system where a Go process can communicate an Arrow data structure to a Python process on the same CPU
>>> I see two options:
>>> - Different processes with shared memory, probably using plasma (see the sketch after this thread).
>>> - Same process: either Go uses the Python shared library, or Python uses Go compiled to a shared library (-buildmode=c-shared).
>>>
>>>> - and for the Python process to zero-copy gain access to that data, change it and inform the Go process. This is low latency so I don't want to save to file.
>>> IIRC Arrow is not built for mutation. You build an Array/Table once and then use it.
>>>
>>>> Would this need the use of Plasma as a zero-copy store for the data between the two processes, or do I need to use IPC? With IPC you are transferring the data, which is not needed in this case as I understand it. Any pointers to examples would be appreciated.
>>> See the options above. Note that the Go Arrow implementation currently doesn't support IPC or plasma (though it's in the works).
>>>
>>> Yoni & I are working on another option, which is using the C++ Arrow library from Go. It does support plasma, and since it uses the same underlying C++ library that Python does, you'll be able to pass a pointer around without copying data. It's in a very alpha-ish state, but you're more than welcome to give it a try - https://github.com/353solutions/carrow
>>>
>>> Happy hacking,
>>> Miki
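For reference, the plasma-based sharing mentioned above looks roughly like this on the Python side (Python-only here, since the Go library had no plasma support at the time). This is a sketch following pyarrow's documented plasma usage and assumes a store is already running, e.g. started with "plasma_store -m 1000000000 -s /tmp/plasma":

    import numpy as np
    import pyarrow as pa
    import pyarrow.plasma as plasma

    # --- Producer process ---
    client = plasma.connect("/tmp/plasma")
    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3, 4])], ["f0"])

    # Measure the serialized size first, then stream the batch directly
    # into a buffer allocated inside the plasma store.
    mock_sink = pa.MockOutputStream()
    writer = pa.RecordBatchStreamWriter(mock_sink, batch.schema)
    writer.write_batch(batch)
    writer.close()

    object_id = plasma.ObjectID(np.random.bytes(20))
    buf = client.create(object_id, mock_sink.size())
    writer = pa.RecordBatchStreamWriter(pa.FixedSizeBufferWriter(buf), batch.schema)
    writer.write_batch(batch)
    writer.close()
    client.seal(object_id)  # once sealed, other processes can fetch it

    # --- Consumer process (only the 20-byte object_id is transferred) ---
    client_b = plasma.connect("/tmp/plasma")
    [buf_b] = client_b.get_buffers([object_id])
    reader = pa.RecordBatchStreamReader(pa.BufferReader(buf_b))
    batch_b = reader.read_next_batch()  # backed by shared memory, not copied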