MIME-Version: 1.0
References: <22b22927-94a8-f804-0e75-97b0f1f0f7c6@crvm.io> <0f1b2235-57bb-8b17-6c71-dc0125c2e5cc@crvm.io>
From: Neal Richardson
Date: Mon, 4 Jan 2021 10:15:16 -0800
Subject: Re: Plasma store implementation status across client libraries
To: user@arrow.apache.org
Content-Type: multipart/alternative; boundary="0000000000001cc2be05b8171318"

--0000000000001cc2be05b8171318
Content-Type: text/plain; charset="UTF-8"

I believe Plasma only has Python bindings. FWIW it has not seen active
development in quite a while.

Neal

On Mon, Jan 4, 2021 at 8:58 AM Chris Nuernberger <chris@techascent.com> wrote:

> Yes that makes sense. I guess you also need something to broker shared
> memory filenames/ids. The database isn't in-memory, however, although I
> know what you mean. One huge advantage of mmap is you can have much larger
> than memory storage act like in-memory storage; so the plasma store can be
> roughly the size of your disk and larger than your RAM but your program,
> unless it attempts to verbatim copy a column, wouldn't know any better.
>
> Numerical larger-than-memory-but-in-memory Redis indeed; that is an
> interesting way to think of it.
>
> On Mon, Jan 4, 2021 at 9:45 AM Thomas Browne <thomas@crvm.io> wrote:
>
>> Interesting and agreed. I guess this is a big advantage of the "on the
>> wire" unserialised format - just read it in and it's already native. I'll
>> go this way possibly.
>>
>> However I also note the beginnings of more advanced functionality in the
>> Plasma store, for example, notification API on buffer seal (ie when
>> something changes, all clients can be notified).
>>
>> https://arrow.apache.org/docs/python/generated/pyarrow.plasma.PlasmaClient.html#pyarrow.plasma.PlasmaClient.subscribe
>>
>> I'm assuming the plasma store will add functionality over time, and if
>> this is the case, having all client libraries implement it means I can
>> almost have a Redis-like column-store specialising in numerical computation
>> (which would be awesome), and for which I don't need to write my own
>> functionality for each client library.
>>
>> A numerical in-memory database, if you will.
>>
>> On 04/01/2021 15:55, Chris Nuernberger wrote:
>>
>> Julia, Python, and R all have some support for mmap operations.
>>
>> On Mon, Jan 4, 2021 at 8:55 AM Chris Nuernberger <chris@techascent.com>
>> wrote:
>>
>>> Could simply saving the arrow file in streaming mode to shared memory
>>> and then mmap-ing the result in each language solve your problem? Plasma
>>> seems to me to be a layer on top of basic mmap operations; as long as you
>>> have shared memory and mmap then you can have multiple processes talking to
>>> the same logical block of memory.
>>>
>>> On Mon, Jan 4, 2021 at 8:27 AM Thomas Browne <thomas@crvm.io> wrote:
>>>
>>>> I am hoping to use the Apache Arrow project for cross-language
>>>> numerical computation, and for that the shared-memory idea is very
>>>> powerful. Am I correct that the Plasma Store is the enabling technology
>>>> for this, especially for soft real-time computation (ie not moving to
>>>> parquet or any file-based sharing system)?
>>>>
>>>> Is that the case? And if so, then I'm wondering which client libraries,
>>>> other than Python (and I assume C[++]), implement the Plasma Store.
>>>> This table doesn't feature a row for Plasma:
>>>>
>>>> https://arrow.apache.org/docs/status.html
>>>>
>>>> and I can't seem to find any reference to the Plasma store in the
>>>> Julia, R, or JavaScript libraries.
>>>>
>>>> https://arrow.apache.org/docs/r/
>>>>
>>>> https://arrow.apache.org/docs/js/
>>>>
>>>> https://arrow.juliadata.org/stable/
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Thomas
>>>>
--0000000000001cc2be05b8171318--
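
For readers of the archive: the mmap approach Chris describes in the thread
(one process writes a column to a file in shared memory, every other process
maps the same file) can be sketched roughly as below. This is only an
illustration of the underlying mechanism using the Python standard library,
not Arrow or Plasma code. The file layout (an int64 count followed by float64
values) and the names write_column and open_column are hypothetical and were
chosen for this sketch. With pyarrow one would presumably write an IPC stream
and open it through pyarrow.memory_map instead of hand-rolling a layout.

```python
import mmap
import os
import struct
import tempfile
from array import array

# Hypothetical layout for this sketch: native-endian int64 count, then float64 values.
HEADER = struct.Struct("=q")


def write_column(path, values):
    """Producer side: write a float64 column to a file other processes can map."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(values)))
        array("d", values).tofile(f)


def open_column(path):
    """Consumer side: map the file read-only and view the values without copying."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (count,) = HEADER.unpack_from(mapped, 0)
    start = HEADER.size
    view = memoryview(mapped)[start:start + 8 * count].cast("d")
    return mapped, view


if __name__ == "__main__":
    # On Linux a path under /dev/shm would keep the file in RAM (assumption);
    # a temporary directory is used here so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        write_column(path, [1.0, 2.5, -3.0])
        mapped, view = open_column(path)
        print(len(view), sum(view))
        view.release()  # the view must be released before the map can be closed
        mapped.close()
```

Any number of processes can call open_column on the same path, and the
operating system backs all of the mappings with the same pages. As noted in
the thread, something still has to broker the filenames/ids between
processes, which this sketch leaves out.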