From: Micah Kornfield
Date: Thu, 18 Jun 2020 20:39:50 -0700
Subject: Re: Arrow-C++ "Zero-Copy-Append" to BinaryArray
To: "user@arrow.apache.org"
Reply-To: emkornfield@gmail.com
In-Reply-To: <8205051f-7ad7-5bc9-c008-fd9af666555b@ipp.mpg.de>

Hi Simon,
I don't think there is a public API for this in C++. You would have to presize a values buffer to the size expected for the compressed data, have the compressor output directly to that buffer while recording the necessary offsets. You could then construct the BinaryArray directly with these buffers (I would need to double check, but you might need to construct an intermediate ArrayData object).
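
To make the buffer layout concrete, here is a minimal sketch of the approach using only the standard library. It is an illustration under stated assumptions, not Arrow code: `FakeCompress`, `BinaryColumn` and `BuildColumn` are hypothetical names, the "compressor" simply copies its input, and nulls are ignored. In Arrow itself the two vectors would instead be Arrow buffers (for example from `arrow::AllocateResizableBuffer`) that are handed to the `arrow::BinaryArray` constructor or to `arrow::ArrayData::Make`; please check the exact signatures against the Arrow version you use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a real compressor: writes at most `capacity`
// bytes to `dst` and returns the number of bytes written. It only copies
// the input so that the sketch has no external dependencies.
std::size_t FakeCompress(const std::string& input, std::uint8_t* dst,
                         std::size_t capacity) {
  assert(input.size() <= capacity);
  if (!input.empty()) {
    std::memcpy(dst, input.data(), input.size());
  }
  return input.size();
}

// The two buffers behind a binary column (validity bitmap omitted, so this
// assumes there are no null entries).
struct BinaryColumn {
  std::vector<std::uint8_t> values;   // bytes of all elements, back to back
  std::vector<std::int32_t> offsets;  // element i spans offsets[i]..offsets[i+1]
};

BinaryColumn BuildColumn(const std::vector<std::string>& inputs) {
  // Presize the values buffer to an upper bound on the total output size.
  // The copying "compressor" never expands its input; a real codec would
  // need its own worst-case bound here.
  std::size_t bound = 0;
  for (const auto& in : inputs) bound += in.size();

  BinaryColumn col;
  col.values.resize(bound);
  col.offsets.reserve(inputs.size() + 1);
  col.offsets.push_back(0);

  std::size_t pos = 0;
  for (const auto& in : inputs) {
    // The compressor writes straight into the final buffer: no temporary
    // output buffer and no second copy.
    pos += FakeCompress(in, col.values.data() + pos, bound - pos);
    col.offsets.push_back(static_cast<std::int32_t>(pos));
  }
  col.values.resize(pos);  // trim to the bytes actually written
  return col;
}
```

Calling `BuildColumn` on three inputs yields four offsets, the first of which is 0 and the last of which equals the total number of bytes written. The 32-bit offsets follow the layout Arrow uses for its binary type, which also caps the values buffer at 2 GiB per array.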

Hope this helps.

Micah



On Thursday, June 18, 2020, Simon Dumke <simon.dumke@ipp.mpg.de> wrote:
> Hi all,
>
> I would like to build RecordBatches with (among others) a BinaryArray column containing compressed data. When filling the BinaryArray, I would like to allow the compressor to output directly into the Arrow Buffer instead of allocating an output buffer and then copying the data into Arrow Buffers.
>
> Is such an approach possible? And if so, how do I achieve this?
>
> I'd be thankful for any insights!
>
> Best regards,
>
> Simon
