From: Miki Tebeka
Date: Tue, 21 May 2019 06:29:31 +0300
To: user@arrow.apache.org
Subject: Re: [C++] Storing/retreiving a Table in plasma

Thanks Wes!

On Mon, May 20, 2019 at 9:46 PM Wes McKinney <wesmckinn@gmail.com> wrote:
> hi Miki,
>
> In
>
> https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc#L47
>
> GetRecordBatchSize does not represent the entire size of the stream
> including the schema. If you are serializing the Schema separately from
> the RecordBatch, then you need to use the lower-level
> arrow::ipc::ReadRecordBatch/WriteRecordBatch functions. Have a look at
> the unit tests.
>
> If you are going to use RecordBatchStreamWriter, then you need to
> compute the size using MockOutputStream, per my original e-mail.
>
> - Wes
>
> On Mon, May 20, 2019 at 12:50 PM Miki Tebeka <miki@353solutions.com> wrote:
> >>
> >> That link didn't work for me.
> >
> > Doh! I moved it to https://github.com/353solutions/carrow/blob/plasma/_misc/plasma.cc
> >
> >>
> >> Would it not be better to do this work in Apache Arrow rather than an
> >> external project? I would guess the community would be interested in this.
> >
> > I do plan to suggest this as a patch to Arrow once the code is usable; currently it's just noise.
> >
> > The idea behind carrow is to use the underlying C++ library from both Python & Go, so that within the same process we can simply share pointers (and maybe later use a shared memory allocator to do it between processes). I don't see a clear path to doing it with the current Go implementation, since it uses the Go runtime to allocate memory, and carrow has a complicated build process that currently won't work with a simple "go get".
> >
> > To get usable Go<->Python IPC quickly, I'm trying to use plasma for now. However, in the long run I'd like to just share pointers with no serialization at all.
> >
> > I'd love to discuss how we can make this project usable and get the community's help in solving some "ease of build" issues later on. Would love to have it in the main Arrow eventually.
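Wes's second option, measuring the whole stream before allocating the plasma buffer, can be illustrated without depending on Arrow. The sketch below is a toy model under stated assumptions, not Arrow's API. `Sink`, `CountingSink`, `FixedBufferSink`, `WriteStream`, `BatchOnlySize` and `SerializeStream` are hypothetical names, and the 4-byte length-prefix framing is invented for illustration. `CountingSink` plays the role of `arrow::io::MockOutputStream`, and `FixedBufferSink` stands in for the fixed-size memory a plasma object provides. Because the schema message is written ahead of the batch, a batch-only size undercounts the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal sink interface; NOT Arrow's arrow::io::OutputStream.
struct Sink {
  virtual ~Sink() = default;
  virtual void Write(const void* data, std::size_t n) = 0;
};

// Pass-1 sink: counts bytes and discards them, in the spirit of
// arrow::io::MockOutputStream.
struct CountingSink : Sink {
  std::size_t bytes = 0;
  void Write(const void*, std::size_t n) override { bytes += n; }
};

// Pass-2 sink: writes into a fixed, caller-owned buffer (a stand-in for the
// memory a plasma client hands back for a newly created object).
struct FixedBufferSink : Sink {
  std::uint8_t* buf;
  std::size_t cap;
  std::size_t pos = 0;
  FixedBufferSink(std::uint8_t* b, std::size_t c) : buf(b), cap(c) {}
  void Write(const void* data, std::size_t n) override {
    if (pos + n > cap) throw std::length_error("fixed buffer too small");
    std::memcpy(buf + pos, data, n);
    pos += n;
  }
};

// Toy framing (assumption): a 4-byte length prefix followed by the body.
// Real Arrow IPC framing differs; only the two-pass idea carries over.
void WriteMessage(Sink& sink, const std::string& body) {
  const std::uint32_t len = static_cast<std::uint32_t>(body.size());
  sink.Write(&len, sizeof(len));
  sink.Write(body.data(), body.size());
}

// Toy "stream writer": the schema message always precedes the batch.
void WriteStream(Sink& sink, const std::string& schema, const std::string& batch) {
  WriteMessage(sink, schema);
  WriteMessage(sink, batch);
}

// Size of the batch message alone; like a batch-only size calculation,
// it leaves the schema message out.
std::size_t BatchOnlySize(const std::string& batch) {
  return sizeof(std::uint32_t) + batch.size();
}

// Two-pass write: measure the full stream, then allocate exactly that much
// and serialize into it.
std::vector<std::uint8_t> SerializeStream(const std::string& schema,
                                          const std::string& batch) {
  CountingSink counter;
  WriteStream(counter, schema, batch);

  std::vector<std::uint8_t> storage(counter.bytes);
  FixedBufferSink out(storage.data(), storage.size());
  WriteStream(out, schema, batch);
  assert(out.pos == storage.size());  // the measured size matched exactly
  return storage;
}
```

With toy inputs such as `SerializeStream("schema:x:int64", std::string(64, '\x01'))`, the measured stream is 86 bytes while the batch-only size is 68, so a buffer sized from the latter overflows on the second pass. In Arrow itself, the first pass would run `RecordBatchStreamWriter` over a `MockOutputStream` and read back the number of bytes written, per Wes's reply. Exact signatures differ across Arrow releases, so check the documentation for the version in use.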