arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: [python] Share data between multiple processes
Date Sat, 23 Jan 2021 04:45:23 GMT
The easiest way I can think of is to write the data to a single file, then
have each process use mmap [1] and Arrow's file IO [2] to read the data of
interest. I haven't tested this, but I think it would be zero-copy.

Another alternative, which isn't currently maintained, is Plasma [3].

[1] https://docs.python.org/3/library/mmap.html
[2] https://arrow.apache.org/docs/python/memory.html#input-and-output
[3] https://arrow.apache.org/docs/python/plasma.html


On Fri, Dec 11, 2020 at 11:21 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hello,
>
> I'm implementing a text data analyzer that compares all the data against
> each other. This process benefits a lot from using multiprocessing.
> However, I'm having problems sharing the data between the processes and I
> think arrow will solve this easily.
> I was wondering if someone could point me in the right direction. How does
> one go about sharing the location of the mapped data among the different
> processes spawned by the main process? Do I have to share the table pointer
> between the processes?
>
> Any guidance would be much appreciated
> Fernando
>
