arrow-dev mailing list archives

From "Zhuo Peng (Jira)" <>
Subject [jira] [Created] (ARROW-6775) Proposal for several Array utility functions
Date Wed, 02 Oct 2019 22:43:00 GMT
Zhuo Peng created ARROW-6775:

             Summary: Proposal for several Array utility functions
                 Key: ARROW-6775
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Zhuo Peng


We developed several utilities that compute or access certain properties of Arrays, and we wonder
whether it makes sense to get them upstream (into both the C++ API and pyarrow) and, assuming
yes, where the best place to put them would be.

Maybe I have overlooked existing APIs that already do the same. In that case, please point them out.


1/ ListLengthFromListArray(ListArray&)

Returns the lengths of the lists in a ListArray, as an Int32Array (or Int64Array for large lists).
For example:

[[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can
be converted to numpy)
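
A minimal sketch of the idea in plain Python, not the pyarrow or C++ API. It assumes the usual Arrow list layout: an offsets buffer with n + 1 entries for n lists, plus an optional per-element validity mask. The name list_lengths and the use of Python lists in place of Arrow buffers are illustrative assumptions. It implements the [3, 0, 0] variant, where null lists report length 0.

```python
def list_lengths(offsets, validity=None):
    """Return the length of each list given its offsets buffer.

    offsets:  sequence of n + 1 non-decreasing integers (hypothetical stand-in
              for the ListArray offsets buffer).
    validity: optional sequence of n booleans, True meaning "not null".
    Null lists are reported with length 0 so the result has no nulls.
    """
    n = len(offsets) - 1
    lengths = [offsets[i + 1] - offsets[i] for i in range(n)]
    if validity is not None:
        lengths = [length if valid else 0
                   for length, valid in zip(lengths, validity)]
    return lengths


# [[1, 2, 3], [], None] is stored with offsets [0, 3, 3, 3]
example_lengths = list_lengths([0, 3, 3, 3], [True, True, False])
print(example_lengths)  # [3, 0, 0]
```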


2/ GetBinaryArrayTotalByteSize(BinaryArray&)

Returns the total byte size of a BinaryArray (basically the last offset minus the first, i.e.
offset[len - 1] - offset[0], where len is the number of entries in the offsets buffer).

Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.
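
As a rough illustration in plain Python (again a sketch over a hypothetical offsets list, not an existing Arrow function), the computation reduces to a single subtraction. Taking the first offset into account matters for sliced arrays, whose offsets need not start at 0.

```python
def binary_total_byte_size(offsets):
    """Total number of value bytes referenced by a binary array.

    offsets: sequence of n + 1 non-decreasing integers for n values
             (hypothetical stand-in for the BinaryArray offsets buffer).
    """
    if len(offsets) < 2:
        return 0  # no values, so no bytes
    return offsets[-1] - offsets[0]


# [b"ab", b"", b"cde"] is stored with offsets [0, 2, 2, 5]
full_size = binary_total_byte_size([0, 2, 2, 5])
# A slice covering only the last two values sees offsets [2, 2, 5]
sliced_size = binary_total_byte_size([2, 2, 5])
print(full_size, sliced_size)  # 5 3
```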


3/ GetArrayNullBitmapAsByteArray(Array&)

Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool
numpy array)
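
A sketch of the expansion in plain Python, assuming Arrow's bitmap convention: bits are packed least-significant-bit first, and a set bit means the element is valid (not null). The function name and the offset parameter (for sliced arrays) are illustrative assumptions. The result is a validity mask with one byte per element; a null mask would be its inverse.

```python
def bitmap_to_bytes(bitmap, length, offset=0):
    """Expand a packed validity bitmap to one byte (0 or 1) per element.

    bitmap: bytes object holding the packed bits, LSB first.
    length: number of elements to expand.
    offset: index of the first bit to read (non-zero for sliced arrays).
    """
    out = bytearray(length)
    for i in range(length):
        bit = offset + i
        out[i] = (bitmap[bit >> 3] >> (bit & 7)) & 1
    return bytes(out)


# [1, None, 3, None, 5] has validity bits 1,0,1,0,1 -> 0b00010101 == 0x15
mask = bitmap_to_bytes(b"\x15", 5)
print(list(mask))  # [1, 0, 1, 0, 1]
```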


4/ GetFlattenedArrayParentIndices(ListArray&)

Makes an int32 array of the same length as the flattened ListArray. returned_array[i] == j
means the i-th element in the flattened ListArray came from the j-th list in the ListArray.

For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
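
A minimal plain-Python sketch of this, working from a hypothetical offsets list rather than a real ListArray. It assumes null lists occupy an empty range in the offsets buffer, as in the example above, so they contribute no entries.

```python
def parent_indices(offsets):
    """For each element of the flattened values, return its parent list index.

    offsets: sequence of n + 1 non-decreasing integers for n lists
             (hypothetical stand-in for the ListArray offsets buffer).
    """
    indices = []
    for j in range(len(offsets) - 1):
        # List j owns offsets[j + 1] - offsets[j] consecutive child elements.
        indices.extend([j] * (offsets[j + 1] - offsets[j]))
    return indices


# [[1, 2, 3], [], None, [4, 5]] is stored with offsets [0, 3, 3, 3, 5]
parents = parent_indices([0, 3, 3, 3, 5])
print(parents)  # [0, 0, 0, 3, 3]
```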


This message was sent by Atlassian Jira
