hi Theo — I think this use case needs to align with our query engine work that's currently percolating. So rather than eagerly evaluating a filter, instead we would produce a query plan whose sink is an IPC file or collection of IPC files.
result = table.filter(boolean_array)
to something like
filter_step = source.filter(filter_expr)
sink_step = write_to_ipc(filter_step, location)
The filtered version of "source" would never be materialized in memory, so this could run with limited memory footprint