arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: [c++] Help with serializing and IPC with dictionary arrays
Date Thu, 18 Feb 2021 19:09:32 GMT
I believe you have to implement the ipc::MessageReader interface. Have you
looked at the details here?

https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/client.cc#L425

There is analogous code handling the Put side in server.cc. The idea is
that you feed in the stream of IPC messages, and the dictionary
accounting and record batch reconstruction are handled internally.

On Thu, Feb 18, 2021 at 12:14 PM Dawson D'Almeida <dawson.dalmeida@snowflake.com> wrote:

> Hi Wes,
>
> We have our own implementation of something like Flight for flexibility of
> use.
>
> The main thing I am trying to figure out is how to get the dictionary
> record batches properly deserialized on the server side. On the client
> side, I can deserialize them properly using the DictionaryMemo taken directly
> from the record batch we create, but on the server side I do not have access
> to the same DictionaryMemo. How is this passed in Flight? I have been
> trying to find this in the source code but haven't found it yet.
>
> Thanks,
> Dawson
>
> On Fri, Feb 12, 2021 at 3:34 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> hi Dawson, you need to follow the IPC stream protocol, e.g. what
>> RecordBatchStreamWriter and RecordBatchStreamReader do
>> internally. Is there a reason you cannot use these interfaces
>> (particularly their internal bits, which are also used to implement
>> Flight, where messages are split across different elements of a gRPC
>> stream)?
>>
>> I'm not sure I would advise you to handle dictionary
>> disassembly and reconstruction on your own unless it's your only
>> option. That said, if you look in the unit test suite you should be
>> able to find examples where DictionaryBatch IPC messages are
>> reconstructed manually and then used to reconstitute a RecordBatch
>> IPC message using the arrow::ipc::ReadRecordBatch API. We can try to
>> help you look in the right place; let us know.
>>
>> Thanks,
>> Wes
>>
>> On Fri, Feb 12, 2021 at 2:58 PM Dawson D'Almeida
>> <dawson.dalmeida@snowflake.com> wrote:
>> >
>> > I am trying to create a record batch containing any number of
>> > dictionary and/or normal Arrow arrays, serialize the record batch into
>> > bytes (a normal std::string), and send it via gRPC to another server
>> > process. On the receiving end, we take the Arrow bytes and deserialize
>> > them using the schema.
>> >
>> > Is there a standard way to serialize/deserialize these dictionary
>> > arrays? It seems like all of the information is packaged correctly into
>> > the record batch.
>> >
>> > I've looked through a lot of the Apache Arrow C++ source and test code,
>> > but I can't find how to approach our use case.
>> >
>> > The current failure is the following error, from the Status returned by
>> > arrow::ipc::ReadRecordBatch:
>> > Field with memory address 140283497044320 not found
>> >
>> > Thanks,
>>
>
>
> --
> Dawson d'Almeida
> Software Engineer
>
> MOBILE  +1 360 499 1852
> EMAIL  dawson.dalmeida@snowflake.com
>
>
> Snowflake Inc.
> 227 Bellevue Way NE
> Bellevue, WA, 98004
>
