arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: [python] Table.filter outputs in memory with no option to direct it to memory map
Date Thu, 25 Mar 2021 15:42:09 GMT
hi Theo — I think this use case needs to align with our query engine work
that's currently percolating. So rather than eagerly evaluating a filter,
instead we would produce a query plan whose sink is an IPC file or
collection of IPC files.

So from

result = table.filter(boolean_array)

to something like

filter_step = source.filter(filter_expr)
sink_step = write_to_ipc(filter_step, location)
sink_step.execute()

The filtered version of "source" would never be materialized in memory, so
this could run with a limited memory footprint.
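
Until something like that exists, a similar effect can be approximated by hand
with existing pyarrow calls. The following is a minimal sketch, not the planned
query engine API: it assumes the source is an Arrow IPC file, and the names
filter_ipc_file, make_mask and demo are hypothetical helpers made up for
illustration. Each record batch is read from the memory map, filtered, and
written straight to a new IPC file, so only one filtered batch should need to
be held in RAM at a time.

```python
import os
import tempfile

import pyarrow as pa
import pyarrow.compute as pc


def filter_ipc_file(src_path, dst_path, make_mask):
    """Filter an IPC file batch by batch and write the result to dst_path.

    make_mask is a hypothetical callback: RecordBatch -> boolean Array.
    """
    with pa.memory_map(src_path, "r") as source:
        reader = pa.ipc.open_file(source)
        with pa.OSFile(dst_path, "wb") as sink:
            with pa.ipc.new_file(sink, reader.schema) as writer:
                for i in range(reader.num_record_batches):
                    # Batches read from a memory map reference the mapped file.
                    batch = reader.get_batch(i)
                    # Only this batch's filtered rows are materialized.
                    writer.write_batch(batch.filter(make_mask(batch)))


def demo():
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "source.arrow")
    dst = os.path.join(tmp, "filtered.arrow")

    # Write a small source file split into several record batches.
    table = pa.table({"x": list(range(10))})
    with pa.OSFile(src, "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table, max_chunksize=4)

    def keep_large(batch):
        # Look the column up by index so this works on RecordBatch.
        col = batch.column(batch.schema.get_field_index("x"))
        return pc.greater_equal(col, 5)

    filter_ipc_file(src, dst, keep_large)

    # The filtered result can itself be opened memory mapped.
    with pa.memory_map(dst, "r") as source:
        return pa.ipc.open_file(source).read_all().column("x").to_pylist()
```

Peak memory here depends on the batch size of the source file rather than on
the size of the whole table, so a file written as one very large batch would
not benefit.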

On Thu, Mar 25, 2021 at 11:19 AM Théo Matussière <theo@huggingface.co>
wrote:

> Hi all,
> Thanks for all the cool work on Arrow, it's definitely making things
> easier for us :)
>
> I'm wondering if there is a workaround for the current behaviour of
> `Table.filter` that I'm seeing, in that its result goes to RAM even if the
> table is memory mapped.
>
> Here's some example code to highlight the behaviour:
>
> [image: Screenshot 2021-03-25 at 16.11.31.png]
>
> Thanks for the attention!
> Théo
>
