arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: [C++] Storing/retrieving a Table in plasma
Date Mon, 20 May 2019 18:45:26 GMT
hi Miki,

In

https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc#L47

GetRecordBatchSize does not give you the size of the entire stream
(which includes the schema). If you are serializing the Schema separately
from the RecordBatch then you need to use the lower-level
arrow::ipc::ReadRecordBatch / WriteRecordBatch functions. Have a look at
the unit tests.
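A rough sketch of that lower-level write path, assuming the ~0.13-era C++
signatures (the exact arguments may differ between releases; check
arrow/ipc/writer.h and the IPC unit tests):

#include <arrow/api.h>
#include <arrow/io/interfaces.h>
#include <arrow/ipc/writer.h>

// Serializes the batch's metadata and body but NOT the schema; the reader
// must already have the schema and pass it to arrow::ipc::ReadRecordBatch.
arrow::Status WriteBatchWithoutSchema(const arrow::RecordBatch& batch,
                                      arrow::io::OutputStream* dst) {
  int32_t metadata_length = 0;
  int64_t body_length = 0;
  return arrow::ipc::WriteRecordBatch(batch, /*buffer_start_offset=*/0, dst,
                                      &metadata_length, &body_length,
                                      arrow::default_memory_pool());
}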

If you are going to use RecordBatchStreamWriter then you need to
compute the size using MockOutputStream, per my original e-mail.
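For the RecordBatchStreamWriter route, here is a minimal sketch of the
MockOutputStream sizing approach, again assuming the ~0.13-era Arrow and
Plasma C++ APIs (PutBatch and its arguments are just illustrative names):

#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/ipc/writer.h>
#include <plasma/client.h>

arrow::Status PutBatch(plasma::PlasmaClient* client,
                       const plasma::ObjectID& id,
                       const std::shared_ptr<arrow::RecordBatch>& batch) {
  // Dry run: stream schema + batch + EOS into a MockOutputStream, which
  // only counts bytes, to learn the total stream size.
  arrow::io::MockOutputStream mock;
  std::shared_ptr<arrow::ipc::RecordBatchWriter> counter;
  ARROW_RETURN_NOT_OK(arrow::ipc::RecordBatchStreamWriter::Open(
      &mock, batch->schema(), &counter));
  ARROW_RETURN_NOT_OK(counter->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(counter->Close());
  const int64_t data_size = mock.GetExtentBytesWritten();

  // Create a Plasma object of exactly that size, write the real stream
  // into its buffer, then seal it so other processes can Get() it.
  std::shared_ptr<arrow::Buffer> data;
  ARROW_RETURN_NOT_OK(client->Create(id, data_size, /*metadata=*/nullptr,
                                     /*metadata_size=*/0, &data));
  arrow::io::FixedSizeBufferWriter sink(data);
  std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
  ARROW_RETURN_NOT_OK(arrow::ipc::RecordBatchStreamWriter::Open(
      &sink, batch->schema(), &writer));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  return client->Seal(id);
}

The dry run costs a second serialization pass, but it guarantees the Plasma
allocation is exactly the size of the final stream.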

- Wes

On Mon, May 20, 2019 at 12:50 PM Miki Tebeka <miki@353solutions.com> wrote:
>>
>> That link didn't work for me.
>
> Doh! I moved it to https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc
>
>>
>> Would it not be better to do this work in Apache Arrow rather than an
>> external project? I would guess the community would be interested in this.
>
> I do plan to suggest this as a patch to Arrow once the code is usable;
> currently it's just noise.
>
> The idea behind carrow is to use the underlying C++ from both Python and Go
> so that in the same process we can simply share pointers (and maybe later use
> a shared-memory allocator to do it between processes). I don't see a clear
> path to do that with the current Go implementation since it uses the Go
> runtime to allocate memory, and carrow has a complicated build process that
> currently won't work with a simple "go get".
>
> To get an initial usable Go<->Python IPC path quickly, I'm trying to utilize
> Plasma for now. However, in the long run I'd like to just share pointers with
> no serialization at all.
>
> I'd love to discuss how we can make this project usable and get the
> community's help in solving some "ease of build" issues later on. I would
> love to have it in the main Arrow project eventually.
