arrow-user mailing list archives

From Wes McKinney <>
Subject Re: Filtering list/map arrays
Date Fri, 21 May 2021 18:07:09 GMT
I think we would want to implement a scalar "list_isin" function as a
core C++ function, so the type signature looks like this:

(Array<List<T>>, Scalar<T>) -> Array<Boolean>
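
As a rough illustration of what that signature could mean row by row, here is a pure-Python sketch. It is not Arrow code and not an existing Arrow function: the name list_isin is only the proposed one, and the null handling (a null list gives a null result, null elements never match) is an assumption that the proposal above does not spell out.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def list_isin(
    lists: Sequence[Optional[Sequence[Optional[T]]]], value: T
) -> List[Optional[bool]]:
    """Hypothetical reference semantics for
    (Array<List<T>>, Scalar<T>) -> Array<Boolean>."""
    result: List[Optional[bool]] = []
    for row in lists:
        if row is None:
            # Assumption: a null list yields a null result, not False.
            result.append(None)
        else:
            # Assumption: null elements never match; an empty list gives False.
            result.append(any(item is not None and item == value for item in row))
    return result


# One entry per row of a List(int64) column, including a null row and an empty list.
column = [[1, 2, 3], [4], None, [], [123, None]]
mask = list_isin(column, 123)
```

The result has one entry per input row, so it could be used directly as a row filter, which is the use case described in the quoted message below.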

I couldn't find an issue like this with a quick Jira search so I created

On Fri, May 21, 2021 at 8:06 AM Elad Rosenheim <> wrote:
> Hi!
> One of the gaps I currently have in Funnel Rocket (
is supporting nested columns, as in: given a Parquet file with a column of type List(int64),
be able to find rows where the list holds a specific int element.
> Right now, the need is fortunately limited to lists of primitives (mostly int) and maps
of string->string, rather than any arbitrary complexity.
> Currently, I load Parquet files via pyarrow, then call to_pandas() and run multiple filters
on the DataFrame.
> After reading Uwe's blog post (
and looking at the Fletcher project (, seems the "proper"
way to do it would be:
> * Write an ExtensionDtype/ExtensionArray that can wrap an Arrow ChunkedArray made of ListArrays.
Not even sure what the operator should be for lookup in a list - should I treat list_series==123
as "for each list in this series, look for the element 123 in it"?
> * Potentially use a @jitclass for more performant lookup, as Uwe has outlined.
> * For now, for any abstract method I'm not sure what to do with - start with raising
an exception, then run some unit tests based on my project's needs, and see that they pass
> * When calling Table.to_pandas(), supply a type mapper argument to map the specific supported
types to the appropriate extension class.
> * If it seems to work, figure out if I've missed something important in the concrete
classes :-/
> Am I getting this right, more or less?
> Thanks a lot,
> Elad
