arrow-user mailing list archives

From Yeshwanth Sriram <>
Subject Re: [C++] - How to extract indices of nested MapArray
Date Wed, 03 Mar 2021 23:41:16 GMT
Hi Micah,

Thank you for the detailed response. Apologies for not responding earlier.

a.) I compared the latencies with and without filtering (using just a foreach loop), and the latency
is dominated by the Parquet write operation. So I'm going to go with what I have, which already
provides a substantial improvement for my use case.

b.) I would like to contribute an implementation of ANY over booleans in the Arrow compute kernels. I'm waiting
for the permission to come through.
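To pin down what such a kernel would compute, here is a minimal, dependency-free sketch of "any" over a List<Bool>. It models the list in Arrow's list layout (an int32 offsets buffer of length n + 1 plus a flat child array) using std::vector rather than real Arrow arrays. `ListAny` is a hypothetical name, not an existing Arrow function, and null handling is deliberately left out; an empty list is assumed to reduce to false.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: reduce a List<Bool> to one bool per list, true if any
// element of that list is true. List i spans child[offsets[i], offsets[i + 1]).
// Assumptions: no null lists or null children; an empty list yields false.
std::vector<bool> ListAny(const std::vector<int32_t>& offsets,
                          const std::vector<bool>& child) {
  assert(!offsets.empty());
  assert(static_cast<std::size_t>(offsets.back()) <= child.size());
  std::vector<bool> result(offsets.size() - 1, false);
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    for (int32_t j = offsets[i]; j < offsets[i + 1]; ++j) {
      if (child[j]) {
        result[i] = true;
        break;  // short-circuit: one true element is enough for this list
      }
    }
  }
  return result;
}
```

For example, offsets {0, 2, 2, 5} over child {false, true, true, false, false} describe three lists and yield {true, false, true}; the empty middle list reduces to false.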

I’m also interested in contributing to an Azure/ADLS filesystem, but the library I was looking
at (here <>) requires C++14. Is C++14 a no-go for a dependency in Arrow (even a conditional one)?

Thank you

> On Feb 28, 2021, at 2:09 PM, Micah Kornfield <> wrote:
> Hi Yeshwanth,
> I think you can do the first part of the filtering using the Equals kernel and IsIn kernel on the child arrays of the Map. I took a quick look but I don't think that there is anything implemented that would allow you to map the resulting bitmaps to the parent lists. It seems that we would want to add an "Any" function for List<Bool> that returns a Bool array if any of the elements are true. There is already one for flat Boolean Arrays [1] but I don't think that is useful here.
> So I think the logic that you would ultimately want in pseudo-code:
> children_bitmap = Equals(map.key, "some string") && IsIn( <>, ["aaa", "bee", "see"])
> list = MakeList(map.offsets, children_bitmap)
> final_selection = Any(list)
> Is the new Kernel something you would be interested in contributing? 
> -Micah
> [1] <>
> On Sun, Feb 28, 2021 at 9:05 AM Yeshwanth Sriram <> wrote:
> I'm using C++/Arrow to filter large Parquet files, and I'm able to do this successfully. The current PoC implementation is based on nested for loops, which I would like to avoid; instead I'd like to use the built-in filter/take functions, or get recommendations on how to extract arrays of indices or booleans (for the take functions?) to filter out rows.
> The input (data) array/column type is MapArray[key:String, value:StructArray[id:String,
> The input filter is {filter_key: “some string”, filter_ids: [“aaa”, “bee”, “see”, ..]}
>   - where filter_key and filter_ids are matched against the contents of the input MapArray
> The output I’m looking for is either an array of booleans or the indices of input rows that match the input filter.
> Thank you
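Micah's pseudo-code in the quoted reply (a per-entry Equals && IsIn mask, folded back to rows with Any over the map offsets) can be sketched end to end in plain C++. This is a model of the logic under stated assumptions, not Arrow code: `FlatMapColumn`, `MatchRows`, and `MaskToIndices` are hypothetical names, the map column is assumed to be flattened into its offsets, its key array, and the struct's `id` field, and nulls are ignored. In Arrow itself the per-entry step would presumably use the `equal` and `is_in` compute functions on `MapArray::keys()` and the `id` child of `MapArray::items()`, with the resulting row mask passed to the `filter` function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical flattened view of MapArray<String, Struct<id: String, ...>>:
// row i owns map entries [offsets[i], offsets[i + 1]) of keys/ids.
struct FlatMapColumn {
  std::vector<int32_t> offsets;   // length = number of rows + 1
  std::vector<std::string> keys;  // map keys (child array)
  std::vector<std::string> ids;   // "id" field of the struct values
};

// Per-entry mask = (key == filter_key) && (id in filter_ids), then "Any"
// over each row's entries. Returns one bool per row. Nulls are not modeled.
std::vector<bool> MatchRows(const FlatMapColumn& col,
                            const std::string& filter_key,
                            const std::vector<std::string>& filter_ids) {
  assert(col.keys.size() == col.ids.size());
  assert(!col.offsets.empty());
  assert(static_cast<std::size_t>(col.offsets.back()) <= col.keys.size());
  const std::unordered_set<std::string> id_set(filter_ids.begin(),
                                               filter_ids.end());

  // Step 1: mask over the flat child arrays (the Equals && IsIn part).
  std::vector<bool> entry_mask(col.keys.size());
  for (std::size_t j = 0; j < col.keys.size(); ++j) {
    entry_mask[j] = col.keys[j] == filter_key && id_set.count(col.ids[j]) > 0;
  }

  // Step 2: fold the entry mask back to rows using the offsets (the Any part).
  std::vector<bool> row_mask(col.offsets.size() - 1, false);
  for (std::size_t i = 0; i + 1 < col.offsets.size(); ++i) {
    for (int32_t j = col.offsets[i]; j < col.offsets[i + 1]; ++j) {
      if (entry_mask[j]) {
        row_mask[i] = true;
        break;
      }
    }
  }
  return row_mask;
}

// Convert a boolean row mask into row indices (for a take-style selection).
std::vector<int64_t> MaskToIndices(const std::vector<bool>& mask) {
  std::vector<int64_t> indices;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) indices.push_back(static_cast<int64_t>(i));
  }
  return indices;
}
```

The boolean mask covers the "array of booleans" output asked for above, and `MaskToIndices` covers the "indices" variant; keeping the per-entry step separate from the per-row fold mirrors the split between the existing kernels and the proposed Any for List<Bool>.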
