From user-return-595-archive-asf-public=cust-asf.ponee.io@arrow.apache.org Sat Aug 1 02:27:44 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mailroute1-lw-us.apache.org (mailroute1-lw-us.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with ESMTPS id 600C7180647 for ; Sat, 1 Aug 2020 04:27:44 +0200 (CEST) Received: from mail.apache.org (localhost [127.0.0.1]) by mailroute1-lw-us.apache.org (ASF Mail Server at mailroute1-lw-us.apache.org) with SMTP id 95D591256FA for ; Sat, 1 Aug 2020 02:27:43 +0000 (UTC) Received: (qmail 96910 invoked by uid 500); 1 Aug 2020 02:27:43 -0000 Mailing-List: contact user-help@arrow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@arrow.apache.org Delivered-To: mailing list user@arrow.apache.org Received: (qmail 96898 invoked by uid 99); 1 Aug 2020 02:27:43 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 01 Aug 2020 02:27:43 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 837DA1A325C for ; Sat, 1 Aug 2020 02:27:42 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.001 X-Spam-Level: X-Spam-Status: No, score=0.001 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id g-SQZaATdU1N for ; Sat, 1 Aug 2020 
02:27:40 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::52c; helo=mail-ed1-x52c.google.com; envelope-from=emkornfield@gmail.com; receiver= Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id E7A147F747 for ; Sat, 1 Aug 2020 02:27:39 +0000 (UTC) Received: by mail-ed1-x52c.google.com with SMTP id a14so6627109edx.7 for ; Fri, 31 Jul 2020 19:27:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:reply-to:from:date:message-id :subject:to; bh=NY2dwJErUa2a28GzWvL0xKZx5Q+kIxrcJoDvKkUh/0E=; b=HMbDLz0c4o+KnYFPQVzDYbzcr5w3jppMNt0lMxyT/2X9bzMxIW59xCCR+jwEvY4Bev yDoUxTEzuHNpAMPFL31mV968M0I3uXfG1CUyf6TVH4SGQUNpoGoGE7xGsuH1TJpx2/g+ 6FfCgBRjtONqijNkIR76ZL1dkApKW8b6g33sNiJlubSB7a9PM+0t5XHXagEY0Fq3F2Jy 9vu1/BhhW2SCMJOGC5FOWwRcuBrsiIDRjTUNELv3O0Iits+sHN4UDg2nL6tRtMAxJCCI jUH6oUU7/kpoBS6pxDisXclwYUwYUSqWuGgdp6d3muh5M8JryLyRH25ypr8JZx+szYUx 8nNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:reply-to :from:date:message-id:subject:to; bh=NY2dwJErUa2a28GzWvL0xKZx5Q+kIxrcJoDvKkUh/0E=; b=mJ+YdY7aOFniuxetM/rfrR/m8XjFdJFyuvY4cqO0BCR/rjBjApC41SkphozU/1kkWM iKFSu+mcDprbiYji18iIu1XiVrcqlxbNApzkRBwRIccDJcgsk/EPX5D0IAcGxI+WPr9a yEJV8OHeSt/5/W95bW7iPzdKMSnQ2ItUxeUmPxK4GGfxpmvpCEpxczWC6ICJxT37k8vI yUDRknyLREKm6grxhQjJm62hou00HR0WvcMtDrhTyMGKmkEVUCT73O8AML5SZTzVcQR/ PoHpU6AUeSgsJe2t0XZCqQqsiljxfKgLLntxdnBwEiZeP4XRFcKyV7OiAa3AuILV/P/C DIKg== X-Gm-Message-State: AOAM5316IpPbjNE2JDt22sVqyyyc+5lJWMKuBUWa67kEPxg9h1cj1dwL Kdbzs5j56hUUdBWZdgsN9E+F3qB58WHUh/K3tkcL/3u7EWg= X-Google-Smtp-Source: ABdhPJz7vyaU2P9IfcsuCV6u+IiOHeCiccZB1R+36Qs/0aycLzj92465rWxRqOgS1Xsh7aeADHhisFrThr2Ztaqg7A0= X-Received: by 2002:a05:6402:1bc1:: with SMTP id 
 ch1mr6666290edb.142.1596248852816; Fri, 31 Jul 2020 19:27:32 -0700 (PDT)
MIME-Version: 1.0
References: <8205051f-7ad7-5bc9-c008-fd9af666555b@ipp.mpg.de>
In-Reply-To:
Reply-To: emkornfield@gmail.com
From: Micah Kornfield
Date: Fri, 31 Jul 2020 19:27:21 -0700
Message-ID:
Subject: Re: Arrow-C++ "Zero-Copy-Append" to BinaryArray
To: user@arrow.apache.org
Content-Type: multipart/alternative; boundary="000000000000d867c105abc7a52e"

--000000000000d867c105abc7a52e
Content-Type: text/plain; charset="UTF-8"

Hi Simon,
Yes, I think it would potentially be a good addition to the BinaryBuilder.
I'm sure other people might have opinions on this; the best way forward
would be to open a JIRA with a proposal for an API and send a PR (I imagine
this should be a fairly small change, so most discussion could probably
happen on the PR).

Thanks,
Micah

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_binary.h#L505

On Wed, Jul 29, 2020 at 5:53 AM Simon Dumke wrote:

> Hi Micah,
>
> Very late on my part, but still: thanks for your reply! I've followed your
> suggestion and it is working as expected. I believe this functionality
> could be added to the BinaryBuilder. Would this be a sensible feature to
> add?
>
> Kind regards,
> Simon
>
> On 19.06.2020 at 05:39, Micah Kornfield wrote:
>
> Hi Simon,
> I don't think there is a public API for this in C++. You would have to
> presize a values buffer to the size expected for the compressed data, have
> the compressor output directly to that buffer while recording the necessary
> offsets. You could then construct the BinaryArray directly with these
> buffers (I would need to double check, but you might need to construct an
> intermediate ArrayData object).
>
> Hope this helps.
>
> Micah
>
>
>
> On Thursday, June 18, 2020, Simon Dumke wrote:
>
>> Hi all,
>>
>> I would like to build RecordBatches with (besides others) a BinaryArray
>> column containing compressed data. When filling the BinaryArray, I would
>> like to allow the compressor to write directly into the Arrow Buffer
>> instead of allocating an output buffer and then copying the data into Arrow
>> Buffers.
>>
>> Is such an approach possible? And if so - how do I achieve this?
>>
>> I'd be thankful for any insights!
>>
>> Best regards,
>>
>> Simon
>>
>>
> --
> Simon Dumke
>
> Developer - CoDaC
> Department Operation
>
> Max Planck Institut for Plasmaphysics
> Wendelsteinstrasse 1
> 17491 Greifswald, Germany
>
> Phone: +49(0)3834 88 1215
>
>
--000000000000d867c105abc7a52e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Simon,
Yes, I think it would potentially be a good addition to the BinaryBuilder. I'm sure other people might have opinions on this; the best way forward would be to open a JIRA with a proposal for an API and send a PR (I imagine this should be a fairly small change, so most discussion could probably happen on the PR).

Thanks,

On Wed, Jul 29, 2020 at 5:53 AM Simon Dumke <simon.dumke@ipp.mpg.de> wrote:
Hi Micah,

Very late on my part, but still: thanks for your reply! I've followed your suggestion and it is working as expected. I believe this functionality could be added to the BinaryBuilder. Would this be a sensible feature to add?

Kind regards,
Simon

On 19.06.2020 at 05:39, Micah Kornfield wrote:
Hi Simon,
I don't think there is a public API for this in C++. You would have to presize a values buffer to the size expected for the compressed data, have the compressor output directly to that buffer while recording the necessary offsets. You could then construct the BinaryArray directly with these buffers (I would need to double check, but you might need to construct an intermediate ArrayData object).

Hope this helps.

Micah
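
The steps described above can be sketched without any Arrow dependency. This is only an illustration under stated assumptions, not Arrow API: `MaxCompressedSize`, `CompressInto`, `BinaryColumnBuffers`, and `BuildCompressedColumn` are hypothetical names, and a toy run-length encoder stands in for the real compressor. It presizes one values buffer, lets the compressor write each row directly at the current position, and records the int32 offsets (length + 1 entries, starting at 0) that the binary layout uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical worst-case bound for the toy compressor below:
// every input byte may become one (run length, byte) pair.
std::size_t MaxCompressedSize(std::size_t n) { return 2 * n; }

// Hypothetical stand-in for a real compressor: run-length encodes `in`
// directly into caller-provided memory and returns the bytes written.
std::size_t CompressInto(const std::string& in, std::uint8_t* out) {
  std::size_t written = 0;
  for (std::size_t i = 0; i < in.size();) {
    std::size_t run = 1;
    while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
    out[written++] = static_cast<std::uint8_t>(run);
    out[written++] = static_cast<std::uint8_t>(in[i]);
    i += run;
  }
  return written;
}

// The two buffers of a variable-size binary column (validity bitmap omitted).
struct BinaryColumnBuffers {
  std::vector<std::int32_t> offsets;  // rows + 1 entries, offsets[0] == 0
  std::vector<std::uint8_t> values;   // all compressed rows, back to back
};

BinaryColumnBuffers BuildCompressedColumn(const std::vector<std::string>& rows) {
  // 1. Presize the values buffer to the expected (here: worst-case) size.
  std::size_t capacity = 0;
  for (const auto& r : rows) capacity += MaxCompressedSize(r.size());

  BinaryColumnBuffers col;
  col.values.resize(capacity);
  col.offsets.reserve(rows.size() + 1);
  col.offsets.push_back(0);

  // 2. Compress each row straight into the buffer and record the end offset.
  //    Assumes the total stays below 2^31 bytes so it fits an int32 offset.
  std::size_t pos = 0;
  for (const auto& r : rows) {
    pos += CompressInto(r, col.values.data() + pos);
    col.offsets.push_back(static_cast<std::int32_t>(pos));
  }

  // 3. Trim the buffer to the bytes actually used.
  col.values.resize(pos);
  return col;
}
```

For example, `BuildCompressedColumn({"aaaa", "", "bc"})` yields three rows in one contiguous values buffer with no intermediate per-row output buffer. With Arrow itself, the two buffers would presumably be Arrow-allocated Buffers rather than `std::vector`, and would then be handed to the BinaryArray constructor (or wrapped in an ArrayData first, as mentioned above).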



On Thursday, June 18, 2020, Simon Dumke <simon.dumke@ipp.mpg.de> wrote:
Hi all,

I would like to build RecordBatches with (besides others) a BinaryArray column containing compressed data. When filling the BinaryArray, I would like to allow the compressor to write directly into the Arrow Buffer instead of allocating an output buffer and then copying the data into Arrow Buffers.
Is such an approach possible? And if so - how do I achieve this?

I'd be thankful for any insights!

Best regards,

Simon


--
Simon Dumke

Developer - CoDaC
Department Operation

Max Planck Institut for Plasmaphysics
Wendelsteinstrasse 1
17491 Greifswald, Germany

Phone: +49(0)3834 88 1215 
--000000000000d867c105abc7a52e--