Hi Wes,
Thanks. Off the top of my head, that is similar to the algorithm I had
in mind as well.
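
To check my understanding of the index translation and batching steps, here is a minimal sketch in plain Python. It is not Arrow code: the names resolve_indices and batch_by_chunk are made up for illustration, and it only computes (chunk, relative index) pairs and batches rather than calling Take.

```python
from bisect import bisect_right
from itertools import accumulate


def resolve_indices(chunk_lengths, indices):
    """Map logical indices to (chunk, index-within-chunk) pairs.

    Hypothetical helper for illustration only, not part of Arrow.
    """
    # Offset table, e.g. [0, 10, 60, 160, 180] for lengths [10, 50, 100, 20].
    offsets = [0] + list(accumulate(chunk_lengths))
    resolved = []
    for i in indices:
        if not 0 <= i < offsets[-1]:
            raise IndexError(f"index {i} out of bounds")
        # Binary search for the last offset <= i: O(log K) per index.
        chunk = bisect_right(offsets, i) - 1
        resolved.append((chunk, i - offsets[chunk]))
    return resolved


def batch_by_chunk(resolved):
    """Group consecutive pairs that fall in the same chunk."""
    batches = []
    for chunk, rel in resolved:
        if batches and batches[-1][0] == chunk:
            batches[-1][1].append(rel)
        else:
            batches.append((chunk, [rel]))
    return batches
```

With the chunk sizes from your example, resolve_indices([10, 50, 100, 20], [5, 40, 100, 170]) gives (0, 5), (1, 30), (2, 40), (3, 10). Each batch from batch_by_chunk would then correspond to one Take call on a single chunk.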
Is this the JIRA you were referring to? [1]
I see that some improvements have already been made here [2].
I guess bug reports like this [3] are also related to the same scenario.
Is there anyone working on this?
Best
[1] https://issues.apache.org/jira/browse/ARROW-5454
[2] https://github.com/apache/arrow/pull/8823
[3] https://issues.apache.org/jira/browse/ARROW-10799
On Fri, Jan 15, 2021 at 10:38 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> You can do that, but note that the implementation is currently not
> efficient, see
>
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909
>
> Rather than preconcatenating the chunks (which can easily fail) and
> then invoking Take on the resulting concatenated Array, it would be
> better to do an O(N log K) take on the chunks directly, where N is the
> number of take indices and K is the number of chunks.
>
> For example, if you have chunks of size
>
> 10
> 50
> 100
> 20
>
> then the algorithm computes the following offset table:
>
> 0
> 10
> 60
> 160
> 180
>
> Indices relative to the whole ChunkedArray are translated to (chunk
> number, intrachunk index), for example:
>
> take with [5, 40, 100, 170] is translated by doing binary searches in
> the offset table to:
>
> (chunk=0, relative_index=5)
> (1, 30)
> (2, 40)
> (3, 10)
>
> Consecutive indices from the same chunk are batched together and then
> Take is invoked on the respective chunk (with bounds checking disabled)
> to select a chunk for the resulting output ChunkedArray.
>
> Might be helpful to copy this to the appropriate Jira (I'm sure there
> is one already) to assist the person who implements this.
>
> Thanks,
> Wes
>
> On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera
> <niranda.perera@gmail.com> wrote:
> >
> > Hi all,
> >
> > I was wondering how the Take API works with ChunkedArrays?
> > e.g., if we have a ChunkedArray[100] with Array1[50] and Array2[50]
> > and I want an element from each array, can I pass something like
> > [10, 60] as the indices?
> >
> > 
>

