arrow-user mailing list archives

From Antoine Pitrou <anto...@python.org>
Subject Re: [Python] Why is the access to values in ChunkedArray O(n_chunks) ?
Date Mon, 15 Mar 2021 18:14:05 GMT
On Mon, 15 Mar 2021 11:02:09 -0700
Micah Kornfield <emkornfield@gmail.com> wrote:
> >
> > Do you know if the iteration is done on the Python side, or on the C++
> > side?
> 
> It appears to be in Cython [1], which, judging by the definition, I would
> expect to compile down to pretty straightforward C code.
> 
> > May I create a post on JIRA about adding an indexing structure for
> > ChunkedArray?
> 
> Yes, please do, I'm sure others will have thoughts on the best way to
> incorporate this (it might also be a good first contribution to the project
> if you are interested).  I think also providing some context from this
> thread on the relative slowness would be good (the bottleneck still might
> be something else, that others more familiar with the code could point to).

Just for the record, there is already something like this in the
compute layer.  The general idea can probably be reused.

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L94
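
For illustration only, here is a minimal sketch of what such an indexing structure could look like: precompute the cumulative chunk offsets once, then binary-search them, so a lookup costs O(log n_chunks) instead of O(n_chunks). This is not the code at the link above. The name ChunkResolver and its resolve() method are hypothetical, and plain Python lists stand in for the chunks of a ChunkedArray.

```python
from bisect import bisect_right
from itertools import accumulate


class ChunkResolver:
    """Hypothetical helper: maps a logical index to (chunk_index, index_in_chunk)."""

    def __init__(self, chunk_lengths):
        # offsets[i] is the logical index of the first element of chunk i;
        # the last entry is the total number of elements.
        self.offsets = [0, *accumulate(chunk_lengths)]

    def resolve(self, index):
        total = self.offsets[-1]
        if index < 0:
            index += total  # support Python-style negative indices
        if not 0 <= index < total:
            raise IndexError("index out of range")
        # Binary search for the last chunk whose start offset is <= index.
        # Taking the rightmost match skips over empty chunks.
        chunk = bisect_right(self.offsets, index) - 1
        return chunk, index - self.offsets[chunk]


# Plain lists stand in for the chunks (an assumption made for this sketch).
chunks = [[10, 11, 12], [], [13, 14], [15]]
resolver = ChunkResolver([len(c) for c in chunks])
chunk, pos = resolver.resolve(4)
value = chunks[chunk][pos]
```

Building the offsets is a one-time O(n_chunks) cost, so this only pays off when many lookups are made against the same chunk layout.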

Regards

Antoine.
