arrow-user mailing list archives

From "Dawson D'Almeida" <dawson.dalme...@snowflake.com>
Subject Re: [c++] Help with serializing and IPC with dictionary arrays
Date Thu, 18 Feb 2021 18:13:50 GMT
Hi Wes,

We have our own implementation of something like Flight for flexibility of
use.

The main thing I am trying to figure out is how to get record batches that
contain dictionary arrays properly deserialized on the server side. On the
client side, I can deserialize them using the DictionaryMemo taken directly
from the record batch we create, but the server does not have access to that
DictionaryMemo. How is it passed in Flight? I have been looking for this in
the source code but haven't found it yet.

Thanks,
Dawson

On Fri, Feb 12, 2021 at 3:34 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> hi Dawson — you need to follow the IPC stream protocol, e.g. what
> RecordBatchStreamWriter or RecordBatchStreamReader are doing
> internally. Is there a reason you cannot use these interfaces
> (particularly their internal bits, which are also used to implement
> Flight where messages are split across different elements of a gRPC
> stream)?
>
> I'm not sure that I would advise you to deal with dictionary
> disassembly and reconstruction on your own unless it's your only
> option. That said if you look in the unit test suite you should be
> able to find examples of where DictionaryBatch IPC messages are
> reconstructed manually, and then used to reconstitute a RecordBatch
> IPC message using the arrow::ipc::ReadRecordBatch API. We can try to
> help you look in the right place, let us know.
>
> Thanks,
> Wes
>
> On Fri, Feb 12, 2021 at 2:58 PM Dawson D'Almeida
> <dawson.dalmeida@snowflake.com> wrote:
> >
> > I am trying to create a record batch containing any number of dictionary
> > and/or normal arrow arrays, serialize the record batch into bytes (a normal
> > std::string), and send it via grpc to another server process. On that end
> > we receive the arrow bytes and deserialize using the bytes and the schema.
> >
> > Is there a standard way to serialize/deserialize these dictionary
> > arrays? It seems like all of the info is packaged correctly into the record
> > batch.
> >
> > I've looked through a lot of the c++ apache arrow source and test code
> > but I can't find how to approach our use case.
> >
> > The current failure is:
> > Field with memory address 140283497044320 not found
> > from the status returned by arrow::ipc::ReadRecordBatch
> >
> > Thanks,
> > --
> > Dawson d'Almeida
> > Software Engineer
> >
> > MOBILE  +1 360 499 1852
> > EMAIL  dawson.dalmeida@snowflake.com
> >
> >
> > Snowflake Inc.
> > 227 Bellevue Way NE
> > Bellevue, WA, 98004
>


