arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Does arrow streaming support splitting big string/binary data?
Date Tue, 25 Jun 2019 01:35:48 GMT
hi Ivan,

Currently all implementations of Arrow treat record batch protocol
messages as atomic entities, in the sense that IPC protocol readers
expect to have access to the entire message in virtual address space.

If Arrow protocol payloads need to be split on the wire, usually
that's handled by the underlying transport layer. For example, in
Flight (which uses gRPC as its default transport), gRPC breaks large
messages into smaller buffers internally.
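
To make that concrete, here is a rough pyarrow sketch (illustrative only:
the column names, the ~70K string, and the per-row chunking are assumptions
made up for this example, not something prescribed by the format). If you
want smaller IPC messages, you slice the data into smaller record batches
before writing; each batch still travels as one atomic message, and a single
oversized value cannot be split below the batch level by the Arrow format
itself.

import pyarrow as pa

# Illustrative data: one row carries a ~70K string value, mirroring the
# scenario in the question.
table = pa.table({
    "id": pa.array([1, 2, 3], type=pa.int32()),
    "payload": pa.array(["a" * 100, "b" * 70_000, "c" * 50]),
})

# Write each row as its own record batch. Note max_chunksize is a row
# count, not a byte budget, so the 70K value still ends up inside one
# message; splitting below that granularity is the transport's job
# (e.g. gRPC in Flight).
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    for batch in table.to_batches(max_chunksize=1):
        writer.write_batch(batch)

# On the read side, each iteration hands back a complete record batch;
# there is no API for consuming a partially delivered batch.
buf = sink.getvalue()
for batch in pa.ipc.open_stream(buf):
    print(batch.num_rows, batch.nbytes)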

- Wes

On Mon, Jun 24, 2019 at 8:29 PM Ivan Popivanov <ivan.popivanov@gmail.com> wrote:
>
> Hello,
>
> Looking at these examples and the documentation, it seems that a record batch cannot
> span multiple messages. Is my understanding correct?
>
> Here is the scenario I am considering: two columns, an int and a string. Let's assume that
> we want the maximum message size to be 64K. If there is a row with a string value of, let's
> say, 70K, it has to span multiple messages. Does the current message format support this?
>
> If it doesn't, then another layer is needed to create the messages when a column's size
> exceeds the maximum message size.
>
> Thanks
> Ivan
