From: Bipin Mathew
Date: Fri, 12 Oct 2018 00:08:49 -0400
Subject: Re: Help with writing Apache Arrow tables to shared memory.
To: user@arrow.apache.org

Good Evening Everyone,

Circling back to this ask. I just wanted to suggest that, instead of an email thread, it may be more valuable for some of the noobs out there like me to have an "Apache Arrow and Shared Memory" "Hello World" example program on the developer wiki, possibly without the additional complication of Plasma. I managed to get many ancillary features of Apache Arrow working (IPC, for example), but have not quite closed the circle on the raison d'être for Apache Arrow, which is efficiently sharing tables and record batches in shared memory. It is not even obvious to me whether it is possible to construct the tables in shared memory, or whether they have to be copied there after being constructed elsewhere.

I also happened to come across this currently unanswered question on Stack Overflow, which references an approach I was thinking about (basically, creating a shared-memory subclass of MemoryPool), but I was not sure that was the appropriate level of the stack at which to attack this problem.

https://stackoverflow.com/questions/52673910/allocate-apache-arrow-memory-pool-in-external-memory

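For the MemoryPool route, the idea is that every buffer a builder requests is carved out of one shared region, so the finished table already lives there. Below is a minimal sketch, assuming a POSIX system and a file-backed mapping like the /dev/shm path used later in this thread. `ShmArena` and its methods are hypothetical names. The sketch mirrors the Allocate/Reallocate/Free shape of arrow::MemoryPool without deriving from it, because the exact virtual signatures have changed across Arrow releases. Treat it as an illustration of the allocation strategy rather than a drop-in pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // ftruncate, close

// Hypothetical bump-pointer arena over a file-backed shared mapping. Its
// Allocate/Reallocate/Free methods mirror the shape of arrow::MemoryPool but
// do not override it; a real pool would derive from arrow::MemoryPool using
// the virtual signatures of the Arrow release in use.
class ShmArena {
 public:
  ShmArena() = default;
  ShmArena(const ShmArena&) = delete;
  ShmArena& operator=(const ShmArena&) = delete;
  ~ShmArena() {
    if (base_ != nullptr) ::munmap(base_, static_cast<size_t>(capacity_));
  }

  // Creates (or reuses) the file at `path`, sizes it to `capacity` bytes and
  // maps it shared, so other processes mapping the same file see the writes.
  bool Open(const std::string& path, int64_t capacity) {
    if (base_ != nullptr || capacity <= 0) return false;
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0) return false;
    if (::ftruncate(fd, capacity) != 0) { ::close(fd); return false; }
    void* p = ::mmap(nullptr, static_cast<size_t>(capacity),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    base_ = static_cast<uint8_t*>(p);
    capacity_ = capacity;
    offset_ = 0;
    return true;
  }

  // Hands out `size` bytes at a 64-byte-aligned offset (the alignment the
  // Arrow format recommends); fails once the region is exhausted.
  bool Allocate(int64_t size, uint8_t** out) {
    if (base_ == nullptr || size < 0) return false;
    const int64_t start = (offset_ + kAlignment - 1) & ~(kAlignment - 1);
    if (start + size > capacity_) return false;
    *out = base_ + start;
    offset_ = start + size;
    return true;
  }

  // Grows a block by allocating a fresh one and copying; the old block is
  // abandoned, which is the cost of a bump arena.
  bool Reallocate(int64_t old_size, int64_t new_size, uint8_t** ptr) {
    uint8_t* fresh = nullptr;
    if (!Allocate(new_size, &fresh)) return false;
    std::memcpy(fresh, *ptr, static_cast<size_t>(std::min(old_size, new_size)));
    *ptr = fresh;
    return true;
  }

  // Individual blocks are never reused; memory is released only when the
  // whole mapping is unmapped.
  void Free(uint8_t* /*buffer*/, int64_t /*size*/) {}

  // Bytes consumed so far, including alignment padding.
  int64_t bytes_allocated() const { return offset_; }

  // Other processes map the file at a different address, so buffer locations
  // must be exchanged as offsets from the start of the region, not pointers.
  int64_t OffsetOf(const uint8_t* p) const { return p - base_; }

 private:
  static constexpr int64_t kAlignment = 64;
  uint8_t* base_ = nullptr;
  int64_t capacity_ = 0;
  int64_t offset_ = 0;
};
```

A second process would open and map the same file, then locate each buffer from offsets the producer publishes (for example, in a small header at the start of the region), since the mapping's base address generally differs between processes. The bump strategy keeps allocation trivial, at the cost of never reusing freed blocks.
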
Another approach I was considering is subclassing ResizableBuffer, but I was not sure that is the right method either, since I was not sure I could construct tables in shared memory without copying.
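
The sticking point for a ResizableBuffer subclass over shared memory is growth. A heap buffer can be reallocated and moved, but a region that other processes have mapped cannot. The sketch below assumes the region is reserved up front with enough capacity and shows a non-owning buffer that resizes only within that capacity. `FixedRegionBuffer` is a hypothetical name. In Arrow, the closest existing pieces appear to be the non-owning `arrow::MutableBuffer(uint8_t* data, int64_t size)` constructor and `arrow::io::FixedSizeBufferWriter`, but how they fit a given release should be checked against its headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical non-owning buffer over memory owned elsewhere, e.g. a region
// of a shared-memory mapping. Its logical size can change like a
// ResizableBuffer's, but its data can never move: growth succeeds only within
// the capacity reserved when the region was mapped.
class FixedRegionBuffer {
 public:
  FixedRegionBuffer(uint8_t* data, int64_t capacity)
      : data_(data), capacity_(capacity) {}

  // Changes the logical size without reallocating; going past the reserved
  // capacity would require a new region plus a copy, so it is refused.
  bool Resize(int64_t new_size) {
    if (new_size < 0 || new_size > capacity_) return false;
    size_ = new_size;
    return true;
  }

  // Appends bytes directly into the region, as a builder filling its value
  // buffer would; returns false instead of growing when the region is full.
  bool Append(const void* src, int64_t n) {
    if (n < 0) return false;
    const int64_t old_size = size_;
    if (!Resize(old_size + n)) return false;
    std::memcpy(data_ + old_size, src, static_cast<size_t>(n));
    return true;
  }

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }
  int64_t capacity() const { return capacity_; }

 private:
  uint8_t* data_;  // not owned: unmapping the region is the caller's job
  int64_t capacity_;
  int64_t size_ = 0;
};
```

With this design, values appended by a builder land directly in the shared region, so no second copy is needed. The trade-off is that the producer must know, or bound, the final size before it starts writing.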

Thank you to this great community for all your help in this matter. I am very excited about this project and its prospects.

Regards,

Bipin



On Wed, Oct 3, 2018 at 4:37 PM Bipin Mathew <bipinmathew@gmail.com> wrote:
Totally understandable. Thank you Wes! We can continue this correspondence there. Looking forward to the 0.11 release :-)

Regards,

Bipin

On Wed, Oct 3, 2018 at 4:22 PM Wes McKinney <wesmckinn@gmail.com> wrote:
hi Bipin -- I will reply to your mail on the dev@ mailing list but it
may take me some time. I'm traveling internationally to conferences
and also have been focused on moving the 0.11 release forward.

- Wes
On Wed, Oct 3, 2018 at 12:00 PM Bipin Mathew <bipinmathew@gmail.com> wrote:
>
> Good Morning Everyone,
>
> I originally posted this question to the dev channel, not knowing a user channel was available. This channel is probably more appropriate, and I am hoping the kind souls here can help me. How, fundamentally, are we expected to copy, or indeed directly write, an Arrow table to shared memory using the C++ SDK? Currently, I have an implementation like this:
>
>>  77   std::shared_ptr<arrow::Buffer> B;
>>  78   std::shared_ptr<arrow::io::BufferOutputStream> buffer;
>>  79   std::shared_ptr<arrow::ipc::RecordBatchWriter> writer;
>>  80   arrow::MemoryPool* pool = arrow::default_memory_pool();
>>  81   arrow::io::BufferOutputStream::Create(4096,pool,&buffer);
>>  82   std::shared_ptr<arrow::Table> table;
>>  83   karrow::ArrowHandle *h;
>>  84   h = (karrow::ArrowHandle *)Kj(khandle);
>>  85   table = h->table;
>>  86
>>  87   arrow::ipc::RecordBatchStreamWriter::Open(buffer.get(),table->schema(),&writer);
>>  88   writer->WriteTable(*table);
>>  89   writer->Close();
>>  90   buffer->Finish(&B);
>>  91
>>  92   // printf("Investigate Memory usage.");
>>  93   // getchar();
>>  94
>>  95
>>  96   std::shared_ptr<arrow::io::MemoryMappedFile> mm;
>>  97   arrow::io::MemoryMappedFile::Create("/dev/shm/arrow_table",B->size(),&mm);
>>  98   mm->Write(B->data(),B->size());
>>  99   mm->Close();
>
>
> "table" on line 85 is a shared_ptr to an arrow::Table object. As you can see there, I write to an arrow::Buffer and then write that to a memory-mapped file. Is there a more direct approach? I watched this video of a talk @Wes McKinney gave here:
>
> https://www.dremio.com/webinars/arrow-c++-roadmap-and-pandas2/
>
> Where a method, arrow::MemoryMappedBuffer, was referenced, but I have not seen any documentation regarding this function. Has it been deprecated?
>
> Also, as I mentioned, "table" up there is an arrow::Table object. I create it column-wise using various arrow::[type]Builder classes. Is there any way to actually write the original table directly into shared memory? Any guidance on the proper way to do these things would be greatly appreciated.
>
> Regards,
>
> Bipin
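
One way to drop the intermediate BufferOutputStream from the snippet above is to serialize twice: first into a sink that only counts bytes, then straight into a shared mapping created at exactly that size. The standalone sketch below shows the pattern with a toy serializer in place of the IPC writer. `Sink`, `CountingSink`, `RegionSink`, `SerializeInts`, `WriteToSharedFile` and `SumSharedInts` are hypothetical names, not Arrow APIs. In Arrow terms, `arrow::io::MockOutputStream` is meant for the counting pass. Because `arrow::io::MemoryMappedFile` is itself an output stream, it should be possible to hand it to `RecordBatchStreamWriter::Open` for the second pass. That correspondence is an assumption to verify against the release in use.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // ftruncate, close

// Hypothetical sink interface standing in for an Arrow output stream.
struct Sink {
  virtual ~Sink() = default;
  virtual bool Write(const void* data, int64_t n) = 0;
};

// Pass 1: counts bytes without storing them.
struct CountingSink : Sink {
  int64_t total = 0;
  bool Write(const void*, int64_t n) override { total += n; return true; }
};

// Pass 2: copies bytes sequentially into a pre-sized region.
struct RegionSink : Sink {
  RegionSink(uint8_t* b, int64_t c) : base(b), capacity(c) {}
  bool Write(const void* data, int64_t n) override {
    if (n < 0 || pos + n > capacity) return false;
    std::memcpy(base + pos, data, static_cast<size_t>(n));
    pos += n;
    return true;
  }
  uint8_t* base;
  int64_t capacity;
  int64_t pos = 0;
};

// Toy serializer (an int64 count followed by int32 values) standing in for
// the IPC stream writer. It must emit identical bytes on both passes.
bool SerializeInts(const int32_t* values, int64_t count, Sink* sink) {
  return sink->Write(&count, static_cast<int64_t>(sizeof(count))) &&
         sink->Write(values, count * static_cast<int64_t>(sizeof(int32_t)));
}

// Measures, sizes the shared file exactly, then serializes straight into the
// mapping: no intermediate heap buffer. Returns bytes written, or -1.
int64_t WriteToSharedFile(const std::string& path, const int32_t* values,
                          int64_t count) {
  CountingSink counter;
  if (!SerializeInts(values, count, &counter)) return -1;
  const int64_t size = counter.total;
  int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;
  if (::ftruncate(fd, size) != 0) { ::close(fd); return -1; }
  void* p = ::mmap(nullptr, static_cast<size_t>(size), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
  ::close(fd);
  if (p == MAP_FAILED) return -1;
  RegionSink region(static_cast<uint8_t*>(p), size);
  const bool ok = SerializeInts(values, count, &region);
  ::munmap(p, static_cast<size_t>(size));
  return ok ? region.pos : -1;
}

// Consumer side: maps the file read-only and sums the values in place,
// without copying them out of the mapping.
bool SumSharedInts(const std::string& path, int64_t* sum) {
  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (::fstat(fd, &st) != 0 || st.st_size < static_cast<off_t>(sizeof(int64_t))) {
    ::close(fd);
    return false;
  }
  const size_t len = static_cast<size_t>(st.st_size);
  void* p = ::mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
  ::close(fd);
  if (p == MAP_FAILED) return false;
  const uint8_t* base = static_cast<const uint8_t*>(p);
  int64_t count = 0;
  std::memcpy(&count, base, sizeof(count));
  const bool ok = count >= 0 &&
      sizeof(int64_t) + static_cast<size_t>(count) * sizeof(int32_t) <= len;
  if (ok) {
    const int32_t* values = reinterpret_cast<const int32_t*>(base + sizeof(int64_t));
    *sum = 0;
    for (int64_t i = 0; i < count; ++i) *sum += values[i];
  }
  ::munmap(p, len);
  return ok;
}
```

The consumer maps the file read-only and reads the values where they lie, which is the zero-copy property this thread is after. The cost on the producer side is running the serializer twice.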