This will be new work that we anticipate will be available at some point in the future (sooner if others help out!). 

You could do this by hand today: break a large table into small chunks, filter each one, and write it out to an output file.
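
A minimal sketch of that chunk-at-a-time approach with the current pyarrow API might look like the following (the file paths, the "value" column, and the filter condition are made up for illustration):

import pyarrow as pa
import pyarrow.compute as pc

with pa.memory_map("big_table.arrow", "r") as source:
    reader = pa.ipc.open_file(source)
    with pa.OSFile("filtered.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, reader.schema) as writer:
            for i in range(reader.num_record_batches):
                # Only one chunk (and its filtered copy) is held in memory at a time.
                chunk = pa.Table.from_batches([reader.get_batch(i)])
                mask = pc.greater(chunk.column("value"), 0)
                writer.write_table(chunk.filter(mask))

Each filtered chunk is written to the output IPC file immediately, so peak memory stays roughly proportional to the chunk size rather than the size of the full filtered result.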

On Thu, Mar 25, 2021 at 12:21 PM Théo Matussière <theo@huggingface.co> wrote:
Hi Wes, thanks for the quick reply! 
I'm sorry, but I'm not sure I understand what you're referring to with "our query engine work that's currently percolating". Are you referring to ongoing work on Arrow that we can expect to land in the near future, or something that's already available that you're working to leverage in your own use case?
I think the ambiguity for me comes from your example, which shows the same API as the one that currently exists, so it's unclear what actually makes it a query plan.
Best,
Théo

On Thu, Mar 25, 2021 at 4:42 PM Wes McKinney <wesmckinn@gmail.com> wrote:
hi Theo — I think this use case needs to align with our query engine work that's currently percolating. So rather than eagerly evaluating a filter, instead we would produce a query plan whose sink is an IPC file or collection of IPC files.

So from

result = table.filter(boolean_array)

to something like

filter_step = source.filter(filter_expr)
sink_step = write_to_ipc(filter_step, location)
sink_step.execute()

The filtered version of "source" would never be materialized in memory, so this could run with a limited memory footprint.

On Thu, Mar 25, 2021 at 11:19 AM Théo Matussière <theo@huggingface.co> wrote:
Hi all,
Thanks for all the cool work on Arrow, it's definitely making things easier for us :)

I'm wondering if there is a workaround for the current behaviour of `Table.filter` that I'm seeing, namely that its result is materialized in RAM even if the table is memory-mapped.

Here's some example code to highlight the behaviour:

[attached screenshot of the example code: Screenshot 2021-03-25 at 16.11.31.png]
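
A minimal sketch of the kind of example the screenshot showed might look like this (the file path, the "value" column, and the filter condition are assumptions, since the original code exists only in the attached image):

import pyarrow as pa
import pyarrow.compute as pc

with pa.memory_map("big_table.arrow", "r") as source:
    table = pa.ipc.open_file(source).read_all()
    # Reading from the memory-mapped file is zero-copy, so almost nothing
    # is allocated on the Arrow heap at this point.
    print(pa.total_allocated_bytes())

    filtered = table.filter(pc.greater(table.column("value"), 0))
    # The filter result is fully materialized in RAM, so allocation jumps
    # by roughly the size of the selected rows.
    print(pa.total_allocated_bytes())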

Thanks for your attention!
Théo