arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: unsubscribe
Date Fri, 15 Jan 2021 17:09:37 GMT
You have to e-mail user-unsubscribe@arrow.apache.org

On Fri, Jan 15, 2021 at 11:09 AM Sisneros, Dominic E (FAA)
<Dominic.E.Sisneros@faa.gov> wrote:
>
>
>
> Dominic Sisneros
> FAA, WSA Engineering Services, AJW-2W13B
> Office: 801-320-2377
> Cell: 801-558-1966
>
> -----Original Message-----
> From: Wes McKinney <wesmckinn@gmail.com>
> Sent: Friday, January 15, 2021 8:38 AM
> To: user@arrow.apache.org
> Subject: Re: compute::Take & ChunkedArrays
>
> You can do that, but note that the implementation is currently not efficient; see:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909
>
> Rather than pre-concatenating the chunks (which can easily fail) and then invoking Take on the resulting concatenated Array, it would be better to do an O(N log K) take on the chunks directly, where N is the number of take indices and K is the number of chunks.
>
> For example, if you have chunks of size
>
> 10
> 50
> 100
> 20
>
> then the algorithm computes the following offset table:
>
> 0
> 10
> 60
> 160
> 180
>
> Indices relative to the whole ChunkedArray are translated to (chunk number, intrachunk index), for example:
>
> take with [5, 40, 100, 170] is translated by doing binary searches in the offset table to:
>
> (chunk=0, relative_index=5)
> (1, 30)
> (2, 40)
> (3, 10)
>
> Consecutive indices from the same chunk are batched together and then Take is invoked on the respective chunk (with bounds checking disabled) to select a chunk for the resulting output ChunkedArray.
>
> Might be helpful to copy this to the appropriate Jira (I'm sure there is one already) to assist the person who implements this.
>
> Thanks,
> Wes
>
> On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera <niranda.perera@gmail.com> wrote:
> >
> > Hi all,
> >
> > I was wondering how the Take API works with ChunkedArrays?
> > ex: If we have a ChunkedArray[100] with Array1[50] and Array2[50] so,
> > if I want an element from each array, can I pass something like [10, 60] as the indices?
> >
> > --
> > Niranda Perera
> > @n1r44
> > +1 812 558 8884 / +94 71 554 8430
> > https://www.linkedin.com/in/niranda
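
The offset-table translation described in the quoted message can be sketched in plain Python. This is only an illustration under stated assumptions, not Arrow's implementation: the helper names build_offsets, resolve, and batch_by_chunk are hypothetical, and the standard-library bisect module stands in for the binary search over the offset table.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def build_offsets(chunk_lengths):
    """Return the cumulative offset table, starting at 0 (hypothetical helper)."""
    return [0] + list(accumulate(chunk_lengths))


def resolve(offsets, index):
    """Translate a whole-array index into (chunk number, intrachunk index)."""
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds for length {offsets[-1]}")
    # bisect_right finds the first offset strictly greater than index,
    # so the chunk containing index is the one just before it.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


def batch_by_chunk(offsets, indices):
    """Group consecutive indices that fall in the same chunk.

    Each (chunk, relative_indices) pair corresponds to one Take call on
    that chunk; the order of the input indices is preserved.
    """
    resolved = [resolve(offsets, i) for i in indices]
    return [
        (chunk, [rel for _, rel in group])
        for chunk, group in groupby(resolved, key=lambda pair: pair[0])
    ]


offsets = build_offsets([10, 50, 100, 20])
pairs = [resolve(offsets, i) for i in [5, 40, 100, 170]]
batches = batch_by_chunk(offsets, [5, 7, 40, 100, 101, 6])
```

With the chunk sizes from the example, resolve maps 5, 40, 100, and 170 to the same (chunk, relative index) pairs listed in the message. For the ChunkedArray in the original question (two chunks of 50), the indices [10, 60] resolve to (0, 10) and (1, 10). Each binary search costs O(log K), which gives the O(N log K) total for N indices.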
