arrow-user mailing list archives

From: Wes McKinney <wesmck...@gmail.com>
Subject: Re: Question about memoryviews and array construction
Date: Sat, 07 Mar 2020 14:18:11 GMT

hi Dan,

Yes, we should support constructing StringArray directly from
memoryview as we do with bytes and unicode -- you're the first person
to ask about this so far. I opened
https://issues.apache.org/jira/browse/ARROW-8026. This should not be a
huge amount of work, so it would be a good first contribution to the
project.
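
In the meantime, here is a rough sketch (not how pyarrow does it
internally) of packing the memoryviews into the two buffers that a
StringArray is laid out as: 32-bit offsets plus one contiguous data
buffer. It copies each view's bytes once, with no intermediate bytes
objects. The name pack_views is made up for illustration, and the
sketch assumes every view is contiguous and already holds valid UTF-8.

```python
import struct


def pack_views(views):
    """Pack contiguous byte views into Arrow-style (offsets, data) buffers.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    data = bytearray()
    offsets = [0]
    for view in views:
        data += view               # single copy of the payload, no bytes() temporaries
        offsets.append(len(data))  # end of this string is the start of the next
    # Offsets are stored as little-endian 32-bit integers, one more than the
    # number of strings.
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return offsets_buf, data


# Stand-in for the memoryviews returned by the third-party library.
sample = [memoryview(b) for b in (b"this", b"is", b"a", b"sample")]
offsets_buf, data = pack_views(sample)
```

If pyarrow is installed, the two buffers could presumably be wrapped
with pa.py_buffer and handed to pa.StringArray.from_buffers. That step
is an assumption and is untested here, and it would not validate the
UTF-8.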

Thanks

Wes

On Fri, Mar 6, 2020 at 8:29 PM Nugent, Daniel <Daniel.Nugent@mlp.com> wrote:
>
> Hi,
>
> I have a short program whose sensibility I’m wondering about. Could anyone let me know
> if this is reasonable or not:
>
> >>> import pyarrow as pa, third_party_library
> >>> memory_views = third_party_library.get_strings()
> >>> memory_views
> [<memory at 0x7f1745cc0870>, <memory at 0x7f1745cc0940>, <memory at 0x7f1745cc0a10>, <memory at 0x7f1745cc0ae0>]
> >>> pa.array(memory_views,pa.string())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: Expected a string or bytes object, got a 'memoryview' object
> >>> pa.array(map(bytes,memory_views),pa.string())
> <pyarrow.lib.StringArray object at 0x7f1745cbdd00>
> [
>   "this",
>   "is",
>   "a",
>   "sample"
> ]
>
> I have a big list of byte sequences being provided to me as memoryviews from a third-party
> library. I’d like to create an Arrow StringArray from them as efficiently as possible.
> Having to map them through the bytes constructor, and consequently copy them, seems not
> great (and the memoryview tobytes function appears to just call the bytes constructor, afaict).
>
> To me, it seemed like pa.array should be able to use the memoryview objects directly
> to construct the StringArray, but it seems like Arrow wants them copied into fresh
> bytes objects first. I don’t understand why, and was ultimately wondering whether it’s
> a reasonable thing to desire.
>
> Thanks in advance,
> -Dan Nugent
