arrow-user mailing list archives

From Bipin Mathew <bipinmat...@gmail.com>
Subject Re: Help with writing Apache Arrow tables to shared memory.
Date Fri, 12 Oct 2018 04:08:49 GMT
Good Evening Everyone,

    Circling back to this ask. I just wanted to suggest that, instead of an
email thread, it may be more valuable for some of the noobs out there
like me to have an "Apache Arrow and Shared Memory" "Hello World" example
program available on the developer wiki, possibly without the additional
complication of Plasma. I managed to get many ancillary features of Apache
Arrow working (IPC, for example), but have not quite closed the circle on
the raison d'etre for Apache Arrow, which is efficiently sharing tables and
record batches in shared memory. It is not even obvious to me whether it is
possible to construct the tables in shared memory, or whether they have to
be copied there after being constructed elsewhere.
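
To make the in-place-versus-copy question concrete, here is a minimal sketch
at the operating-system level only. It uses plain POSIX calls (open,
ftruncate, mmap) and no Arrow API, so it says nothing about whether Arrow's
builders can target such memory. The helper names (map_region,
write_column_in_place, sum_column) and any path passed to them are
hypothetical, and a second mapping in the same process stands in for a
second process.

```cpp
// Sketch only: plain POSIX, no Arrow types. All helper names are hypothetical.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Maps `size` bytes of the file at `path` as shared, read-write memory.
// When `create` is true the file is created (or truncated) and sized first.
// Returns nullptr on failure.
void* map_region(const std::string& path, std::size_t size, bool create) {
  assert(size > 0 && "mmap rejects zero-length mappings");
  const int flags = create ? (O_RDWR | O_CREAT | O_TRUNC) : O_RDWR;
  const int fd = open(path.c_str(), flags, 0600);
  if (fd < 0) return nullptr;
  if (create && ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* addr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : addr;
}

// Writes n int64 values (the squares 0, 1, 4, ...) directly into the mapped
// region. No intermediate heap buffer is filled and then copied.
bool write_column_in_place(const std::string& path, std::size_t n) {
  const std::size_t bytes = n * sizeof(std::int64_t);
  auto* values = static_cast<std::int64_t*>(map_region(path, bytes, true));
  if (values == nullptr) return false;
  for (std::size_t i = 0; i < n; ++i) {
    values[i] = static_cast<std::int64_t>(i * i);
  }
  return munmap(values, bytes) == 0;
}

// Maps the same file again, as a reader would, and sums the values straight
// from the mapping. Returns -1 if the file cannot be mapped.
std::int64_t sum_column(const std::string& path, std::size_t n) {
  const std::size_t bytes = n * sizeof(std::int64_t);
  auto* values = static_cast<std::int64_t*>(map_region(path, bytes, false));
  if (values == nullptr) return -1;
  std::int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += values[i];
  munmap(values, bytes);
  return total;
}
```

Calling write_column_in_place with a path on a tmpfs mount such as /dev/shm,
followed by sum_column on the same path, exercises both sides. The sketch
fixes the region size up front, which is one reason a growable buffer is the
harder part of the problem.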

    I also happened to come across this currently unanswered question on
Stack Overflow, which references an approach I was thinking about
(basically, create a shared-memory subclass of MemoryPool), but I was not
sure that was the appropriate level of the stack at which to attack this
problem.

https://stackoverflow.com/questions/52673910/allocate-apache-arrow-memory-pool-in-external-memory

Another approach I was considering is subclassing from ResizableBuffer,
but I was not sure that is the right method either, since I was not sure
whether I could construct tables in shared memory without copying.

Thank you to this great community for all your help in this matter. I am
very excited about this project and its prospects.

Regards,

Bipin



On Wed, Oct 3, 2018 at 4:37 PM Bipin Mathew <bipinmathew@gmail.com> wrote:

> Totally understandable. Thank you Wes! We can continue this correspondence
> there. Looking forward to the 0.11 release :-)
>
> Regards,
>
> Bipin
>
> On Wed, Oct 3, 2018 at 4:22 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> hi Bipin -- I will reply to your mail on the dev@ mailing list but it
>> may take me some time. I'm traveling internationally to conferences
>> and also have been focused on moving the 0.11 release forward.
>>
>> - Wes
>> On Wed, Oct 3, 2018 at 12:00 PM Bipin Mathew <bipinmathew@gmail.com>
>> wrote:
>> >
>> > Good Morning Everyone,
>> >
>> >     I originally posted this question to the dev channel, not knowing a
>> > user channel was available. This channel is probably more appropriate,
>> > and I am hoping the kind souls here can help me. How, fundamentally, are
>> > we expected to copy, or indeed directly write, an Arrow table to shared
>> > memory using the C++ SDK? Currently, I have an implementation like this:
>> >
>> >>  77   std::shared_ptr<arrow::Buffer> B;
>> >>  78   std::shared_ptr<arrow::io::BufferOutputStream> buffer;
>> >>  79   std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
>> >>  80   arrow::MemoryPool* pool = arrow::default_memory_pool();
>> >>  81   arrow::io::BufferOutputStream::Create(4096,pool,&buffer);
>> >>  82   std::shared_ptr<arrow::Table> table;
>> >>  83   karrow::ArrowHandle *h;
>> >>  84   h = (karrow::ArrowHandle *)Kj(khandle);
>> >>  85   table = h->table;
>> >>  86
>> >>  87   arrow::ipc::RecordBatchStreamWriter::Open(buffer.get(),table->schema(),&writer);
>> >>  88   writer->WriteTable(*table);
>> >>  89   writer->Close();
>> >>  90   buffer->Finish(&B);
>> >>  91
>> >>  92   // printf("Investigate Memory usage.");
>> >>  93   // getchar();
>> >>  94
>> >>  95
>> >>  96   std::shared_ptr<arrow::io::MemoryMappedFile> mm;
>> >>  97   arrow::io::MemoryMappedFile::Create("/dev/shm/arrow_table",B->size(),&mm);
>> >>  98   mm->Write(B->data(),B->size());
>> >>  99   mm->Close();
>> >
>> >
>> > "table" on line 85 is a shared_ptr to an arrow::Table object. As you can
>> > see there, I write to an arrow::Buffer, then write that to a memory-mapped
>> > file. Is there a more direct approach? I watched this video of a talk
>> > @Wes McKinney gave here:
>> >
>> > https://www.dremio.com/webinars/arrow-c++-roadmap-and-pandas2/
>> >
>> > There, a method, arrow::MemoryMappedBuffer, was referenced, but I have
>> > not seen any documentation regarding this function. Has it been deprecated?
>> >
>> > Also, as I mentioned, "table" up there is an arrow::Table object. I
>> > create it column-wise using various arrow::[type]Builder functions. Is
>> > there any way to write the original table directly into shared memory?
>> > Any guidance on the proper way to do these things would be greatly
>> > appreciated.
>> >
>> > Regards,
>> >
>> > Bipin
>>
>
