arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Plasma store implementation status across client libraries
Date Mon, 04 Jan 2021 20:20:07 GMT
hi Burke -- you have to e-mail user-unsubscribe@arrow.apache.org

On Mon, Jan 4, 2021 at 1:28 PM Burke Kaltenberger
<burke@firsttalentsearch.com> wrote:
>
> please remove me from your email list. Thank you
>
> On Mon, Jan 4, 2021 at 10:15 AM Neal Richardson <neal.p.richardson@gmail.com> wrote:
>>
>> I believe Plasma only has Python bindings. FWIW it has not seen active development in quite a while.
>>
>> Neal
>>
>> On Mon, Jan 4, 2021 at 8:58 AM Chris Nuernberger <chris@techascent.com> wrote:
>>>
>>> Yes, that makes sense. I guess you also need something to broker shared memory filenames/ids. The database isn't in-memory, however, although I know what you mean. One huge advantage of mmap is that larger-than-memory storage can act like in-memory storage; so the plasma store can be roughly the size of your disk, and larger than your RAM, but your program, unless it attempts to verbatim copy a column, wouldn't know any better.
>>>
>>> Numerical larger-than-memory-but-in-memory redis indeed; that is an interesting way to think of it.
>>>
>>> On Mon, Jan 4, 2021 at 9:45 AM Thomas Browne <thomas@crvm.io> wrote:
>>>>
>>>> Interesting and agreed. I guess this is a big advantage of the "on the wire" unserialised format - just read it in and it's already native. I'll go this way possibly.
>>>>
>>>> However, I also note the beginnings of more advanced functionality in the Plasma store, for example, a notification API on buffer seal (i.e. when something changes, all clients can be notified).
>>>>
>>>> https://arrow.apache.org/docs/python/generated/pyarrow.plasma.PlasmaClient.html#pyarrow.plasma.PlasmaClient.subscribe
>>>>
>>>> I'm assuming the plasma store will add functionality over time, and if this is the case, having all client libraries implement it means I can almost have a redis-like column-store specialising in numerical computation (which would be awesome), and for which I don't need to write my own functionality for each client library.
>>>>
>>>> A numerical in-memory database, if you will.
>>>>
>>>> On 04/01/2021 15:55, Chris Nuernberger wrote:
>>>>
>>>> Julia, Python, and R all have some support for mmap operations.
>>>>
>>>> On Mon, Jan 4, 2021 at 8:55 AM Chris Nuernberger <chris@techascent.com> wrote:
>>>>>
>>>>> Could simply saving the arrow file in streaming mode to shared memory and then mmap-ing the result in each language solve your problem? Plasma seems to me to be a layer on top of basic mmap operations; as long as you have shared memory and mmap then you can have multiple processes talking to the same logical block of memory.
>>>>>
>>>>> On Mon, Jan 4, 2021 at 8:27 AM Thomas Browne <thomas@crvm.io> wrote:
>>>>>>
>>>>>> I am hoping to use the Apache Arrow project for cross-language numerical
>>>>>> computation, and for that the shared-memory idea is very powerful. Am I
>>>>>> correct that the Plasma Store is the enabling technology for this,
>>>>>> especially for soft real-time computation (ie not moving to parquet or
>>>>>> any file-based sharing system)?
>>>>>>
>>>>>> Is that the case? And if so, then I'm wondering which client libraries,
>>>>>> other than Python (and I assume C[++]), implement the Plasma Store. This
>>>>>> table doesn't feature a row for Plasma:
>>>>>>
>>>>>> https://arrow.apache.org/docs/status.html
>>>>>>
>>>>>> and I can't seem to find any reference to the Plasma store in the Julia,
>>>>>> R, or Javascript libraries.
>>>>>>
>>>>>> https://arrow.apache.org/docs/r/
>>>>>>
>>>>>> https://arrow.apache.org/docs/js/
>>>>>>
>>>>>> https://arrow.juliadata.org/stable/
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>
>
> --
> First Talent Search & Placement
> Burke Kaltenberger | Founder
> 408.458.0071
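
The mmap approach Chris suggests in the thread (write the data once, then map it from each consumer so that all of them address the same block of memory) can be illustrated with a minimal sketch that uses only the Python standard library. This is not Plasma and not the Arrow IPC format: the raw float64 file stands in for an Arrow buffer, and the helper names `write_column`, `open_column`, and `demo`, as well as the file name `column.f64`, are hypothetical. With pyarrow one would more likely write an IPC stream and open it through `pyarrow.memory_map`, which this sketch does not attempt.

```python
import mmap
import os
import tempfile
from array import array


def write_column(path, values):
    """Write a float64 column as raw native-endian bytes (stand-in for an Arrow buffer)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def open_column(path, writable=False):
    """Memory-map the file and return (mmap, float64 view); no bytes are copied."""
    mode = "r+b" if writable else "rb"
    access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
    with open(path, mode) as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=access)
    return mm, memoryview(mm).cast("d")


def demo():
    # Hypothetical location; on Linux a path under /dev/shm would keep the file in RAM.
    path = os.path.join(tempfile.mkdtemp(), "column.f64")
    write_column(path, [1.0, 2.0, 3.0])

    # Two independent mappings of the same file, as two processes would hold.
    writer_mm, writer_view = open_column(path, writable=True)
    reader_mm, reader_view = open_column(path)

    writer_view[1] = 20.0       # update through the writable mapping
    seen = list(reader_view)    # read back through the read-only mapping

    # Views must be released before the underlying mappings can be closed.
    writer_view.release()
    reader_view.release()
    writer_mm.close()
    reader_mm.close()
    return seen


if __name__ == "__main__":
    print(demo())
```

Both mappings here live in one process for brevity; the same calls made from separate processes would share the pages in the same way. As noted in the thread, something still has to tell each consumer which file name or id to map.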
