Subject: Re: Arrow-C++ "Zero-Copy-Append" to BinaryArray
From: Simon Dumke
To: user@arrow.apache.org
Date: Wed, 29 Jul 2020 14:53:15 +0200

Hi Micah,

Very late on my part, but still: thanks for your reply! I've followed your suggestion and it works as expected. I believe this functionality (letting a producer such as a compressor write directly into the builder's values buffer) could be added to BinaryBuilder. Would this be a sensible feature to add?

Kind regards,
Simon

On 19.06.2020 at 05:39, Micah Kornfield wrote:
> Hi Simon,
>
> I don't think there is a public API for this in C++. You would have to presize a values buffer to the size expected for the compressed data and have the compressor output directly to that buffer while recording the necessary offsets. You could then construct the BinaryArray directly with these buffers (I would need to double check, but you might need to construct an intermediate ArrayData object).
>
> Hope this helps.
>
> Micah
>
> On Thursday, June 18, 2020, Simon Dumke <simon.dumke@ipp.mpg.de> wrote:
> > Hi all,
> >
> > I would like to build RecordBatches with (among others) a BinaryArray column containing compressed data. When filling the BinaryArray, I would like to allow the compressor to output directly into the Arrow Buffer instead of allocating an output buffer and then copying the data into Arrow Buffers.
> >
> > Is such an approach possible? And if so, how do I achieve this?
> >
> > I'd be thankful for any insights!
> >
> > Best regards,
> >
> > Simon
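For reference, here is a minimal, self-contained sketch of the approach Micah describes above. It uses plain `std::vector`s in place of Arrow buffers so that it compiles without Arrow. The values buffer is presized once to the worst-case compressed size, the compressor writes straight into it, and an int32 offsets buffer of length n + 1 (as in Arrow's variable-size binary layout) records where each value ends. `CompressInto` is a hypothetical toy run-length encoder standing in for a real codec. `BinaryColumnBuffers`, `AppendCompressed` and `BuildCompressedColumn` are illustrative names, not Arrow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Arrow-style variable-size binary layout (illustrative, not an Arrow type):
// one contiguous values buffer plus an int32 offsets buffer of length n + 1;
// value i is the byte range values[offsets[i], offsets[i + 1]).
struct BinaryColumnBuffers {
  std::vector<uint8_t> values;
  std::vector<int32_t> offsets{0};  // the first offset is always 0
  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

// Hypothetical stand-in for a real codec: a toy run-length encoder that
// writes (count, byte) pairs directly into the caller-provided `out`
// (capacity `cap`) and returns the number of bytes written.
std::size_t CompressInto(const std::string& in, uint8_t* out, std::size_t cap) {
  std::size_t w = 0;
  for (std::size_t i = 0; i < in.size();) {
    const auto b = static_cast<uint8_t>(in[i]);
    std::size_t run = 1;
    while (i + run < in.size() && run < 255 &&
           static_cast<uint8_t>(in[i + run]) == b) {
      ++run;
    }
    if (w + 2 > cap) throw std::length_error("output buffer too small");
    out[w++] = static_cast<uint8_t>(run);
    out[w++] = b;
    i += run;
  }
  return w;
}

// Worst-case output size of CompressInto: every input byte may become a pair.
std::size_t MaxCompressedSize(std::size_t n) { return 2 * n; }

// Append one value: grow the values buffer to the worst case, let the
// compressor write straight into it, trim to the bytes actually written,
// and record the new end offset. No intermediate output buffer is used.
void AppendCompressed(BinaryColumnBuffers& col, const std::string& raw) {
  const std::size_t start = col.values.size();
  const std::size_t bound = MaxCompressedSize(raw.size());
  col.values.resize(start + bound);
  const std::size_t written = CompressInto(raw, col.values.data() + start, bound);
  assert(written <= bound);
  col.values.resize(start + written);
  col.offsets.push_back(static_cast<int32_t>(col.values.size()));
}

// Presize once for the whole batch so the appends above never reallocate
// (and therefore never copy already-compressed bytes).
BinaryColumnBuffers BuildCompressedColumn(const std::vector<std::string>& raws) {
  std::size_t worst = 0;
  for (const auto& r : raws) worst += MaxCompressedSize(r.size());
  // BinaryArray uses int32 offsets; conservatively reject batches whose
  // worst case would not fit.
  if (worst > static_cast<std::size_t>(std::numeric_limits<int32_t>::max())) {
    throw std::overflow_error("batch too large for int32 offsets");
  }
  BinaryColumnBuffers col;
  col.values.reserve(worst);
  col.offsets.reserve(raws.size() + 1);
  for (const auto& r : raws) AppendCompressed(col, r);
  return col;
}
```

For example, `BuildCompressedColumn({"aaaa", "", "abc"})` yields offsets `{0, 2, 2, 8}` and 8 value bytes. With Arrow itself, one would presumably allocate the two buffers as Arrow buffers (e.g. via `arrow::AllocateResizableBuffer`) rather than as `std::vector`s. They would then be passed, together with the length, to the `arrow::BinaryArray` constructor or assembled through `arrow::ArrayData::Make`, as Micah suggests. The exact calls vary between Arrow versions, so treat that step as an assumption to check against the API reference.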


-- 
Simon Dumke

Developer, CoDaC
Department Operation

Max Planck Institute for Plasma Physics
Wendelsteinstrasse 1
17491 Greifswald, Germany

Phone: +49(0)3834 88 1215 