arrow-user mailing list archives

From Burke Kaltenberger <bu...@firsttalentsearch.com>
Subject Re: Plasma store implementation status across client libraries
Date Mon, 04 Jan 2021 19:27:39 GMT
Please remove me from your email list. Thank you.

On Mon, Jan 4, 2021 at 10:15 AM Neal Richardson <neal.p.richardson@gmail.com>
wrote:

> I believe Plasma only has Python bindings. FWIW it has not seen active
> development in quite a while.
>
> Neal
>
> On Mon, Jan 4, 2021 at 8:58 AM Chris Nuernberger <chris@techascent.com>
> wrote:
>
>> Yes that makes sense.  I guess you also need something to broker shared
>> memory filenames/ids.  The database isn't in-memory, however, although I
>> know what you mean.  One huge advantage of mmap is that you can have
>> larger-than-memory storage act like in-memory storage; so the plasma store
>> can be roughly the size of your disk, and larger than your RAM, but your
>> program, unless it attempts to copy a column verbatim, wouldn't know any
>> better.
>>
>> Numerical larger-than-memory-but-in-memory redis indeed; that is an
>> interesting way to think of it.
>>
>> On Mon, Jan 4, 2021 at 9:45 AM Thomas Browne <thomas@crvm.io> wrote:
>>
>>> Interesting and agreed. I guess this is a big advantage of the "on the
>>> wire" unserialised format - just read it in and it's already native. I'll
>>> possibly go this way.
>>>
>>> However I also note the beginnings of more advanced functionality in the
>>> Plasma store, for example, notification API on buffer seal (ie when
>>> something changes, all clients can be notified).
>>>
>>>
>>> https://arrow.apache.org/docs/python/generated/pyarrow.plasma.PlasmaClient.html#pyarrow.plasma.PlasmaClient.subscribe
>>>
>>> I'm assuming the plasma store will add functionality over time, and if
>>> this is the case, having all client libraries implement it means I can
>>> almost have a redis-like column-store specialising in numerical computation
>>> (which would be awesome), and for which I don't need to write my own
>>> functionality for each client library.
>>>
>>> A numerical in-memory database, if you will.
>>> On 04/01/2021 15:55, Chris Nuernberger wrote:
>>>
>>> Julia, Python, and R all have some support for mmap operations.
>>>
>>> On Mon, Jan 4, 2021 at 8:55 AM Chris Nuernberger <chris@techascent.com>
>>> wrote:
>>>
>>>> Could simply saving the arrow file in streaming mode to shared memory
>>>> and then mmap-ing the result in each language solve your problem?  Plasma
>>>> seems to me to be a layer on top of basic mmap operations; as long as you
>>>> have shared memory and mmap then you can have multiple processes talking
>>>> to the same logical block of memory.
>>>>
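The mmap suggestion above can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not Arrow's or Plasma's implementation: it stores a raw native-endian float64 buffer rather than the Arrow IPC streaming format, and the helper names (`shared_dir`, `write_column`, `open_column`) and the file name are hypothetical. The point is only that a reader process can map the file and index it as numbers without copying or deserialising.

```python
import mmap
import os
import tempfile
from array import array


def shared_dir():
    """Prefer Linux tmpfs shared memory; fall back to the temp dir elsewhere."""
    return "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()


def write_column(path, values):
    """Writer side: store the values as one contiguous float64 buffer."""
    with open(path, "wb") as f:
        f.write(array("d", values).tobytes())


def open_column(path):
    """Reader side: map the file read-only and view it as float64, zero-copy.

    Returns (mapping, view). Release the view before closing the mapping.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, memoryview(mm).cast("d")


if __name__ == "__main__":
    path = os.path.join(shared_dir(), "demo_column.f64")  # hypothetical name
    write_column(path, [1.0, 2.5, 4.0])
    mm, col = open_column(path)  # in practice this runs in another process
    print(len(col), col[1])
    col.release()
    mm.close()
    os.remove(path)
```

Any process that knows the path can call `open_column` on it, which is why something still has to broker the file names or ids between writers and readers. Only the pages that are actually touched get loaded, so the mapped file can be larger than RAM.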
>>>> On Mon, Jan 4, 2021 at 8:27 AM Thomas Browne <thomas@crvm.io> wrote:
>>>>
>>>>> I am hoping to use the Apache Arrow project for cross-language numerical
>>>>> computation, and for that the shared-memory idea is very powerful. Am I
>>>>> correct that the Plasma Store is the enabling technology for this,
>>>>> especially for soft real-time computation (ie not moving to parquet or
>>>>> any file-based sharing system)?
>>>>>
>>>>> Is that the case? And if so, then I'm wondering which client libraries,
>>>>> other than Python (and I assume C[++]), implement the Plasma Store. This
>>>>> table doesn't feature a row for Plasma:
>>>>>
>>>>> https://arrow.apache.org/docs/status.html
>>>>>
>>>>> and I can't seem to find any reference to the Plasma store in the Julia,
>>>>> R, or JavaScript libraries.
>>>>>
>>>>> https://arrow.apache.org/docs/r/
>>>>>
>>>>> https://arrow.apache.org/docs/js/
>>>>>
>>>>> https://arrow.juliadata.org/stable/
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>

-- 
*First Talent Search & Placement*
*Burke Kaltenberger
<https://www.linkedin.com/in/burke-kaltenberger-3a41731/> | Founder*
*408.458.0071*
