From: Chris Nuernberger
Date: Thu, 31 Dec 2020 12:40:42 -0700
Subject: Re: Apache Arrow Java
To: user@arrow.apache.org
Reply-To: user@arrow.apache.org
References: <017A1CA3-D9A2-44A6-91E8-434BB3CBEF80@upgini.com>
In-Reply-To: <017A1CA3-D9A2-44A6-91E8-434BB3CBEF80@upgini.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000034abc105b7c7cd03"

--00000000000034abc105b7c7cd03
Content-Type: text/plain; charset="UTF-8"

Igor,

I am not an Arrow developer, but to my knowledge the only Java pathway that
can use mmap is the one I wrote for Clojure:

https://techascent.com/blog/memory-mapping-arrow.html

The underlying library is tech.ml.dataset, and we also have generic Python
bindings.

I do wonder what the pointer actually points at with pyarrow. A column may
point to up to three buffers (data, validity, offsets) in the case of text,
and usually has two pointers, one for data and one for validity. Potentially
the pointer you get back is a pointer to the low-level record batch, but that
specifically cannot include a pointer to a dictionary.

Considering just the actual Arrow file format, a single pointer cannot point
to both the schema information (which contains the dictionary) and the record
batch column data.

I am not aware of a single-column interchange format, aside from potentially
writing the streaming format with a single column.

On Wed, Dec 30, 2020 at 8:08 AM Igor <igor@upgini.com> wrote:

> Hello Apache Arrow developers!
>
> We are using apache arrow library in java and python, using arrow-vector
> arrow-memory-unsafe in java and Pyarrow in python.
>
> We try to implement in memory zero copy DataFrame, but we can’t find
> appropriate API in java libraries to get memory address of our vectors from
> python. I have found that API in Pyarrow library, but not in java libraries.
>
> What we need:
> 1) Create vector in java, collect data in memory using arrow as memory map
> API
> 2) Get memory address or descriptor in java
> 3) Pass it to the python library Pyarrow
> 4) Read vector data
>
> We have problem in the point 2
>
> Tell us please, how we can do that. Thank you!
>
>
> Best regards,
> Eshtyganov Igor
> https://www.upgini.com
>
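
The remark in the reply above, that a text column points to up to three buffers (data, validity, offsets), can be illustrated with a small sketch. This uses only the Python standard library and follows the variable-size binary layout as I understand it from the Arrow columnar format: a validity bitmap with one bit per row (least-significant bit first), n + 1 little-endian int32 offsets, and the concatenated UTF-8 bytes. The function names are hypothetical, buffer padding and alignment are omitted, and none of this is pyarrow or Arrow Java API.

```python
import struct


def encode_utf8_column(values):
    """Encode a list of str-or-None into (validity, offsets, data) buffers."""
    validity = bytearray((len(values) + 7) // 8)  # 1 bit per row, LSB first
    offsets = [0]                                 # n + 1 offsets into data
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)      # mark row i as non-null
            data += value.encode("utf-8")
        offsets.append(len(data))                 # a null row repeats the offset
    packed_offsets = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), packed_offsets, bytes(data)


def decode_utf8_column(validity, offsets, data):
    """Rebuild the Python list from the three buffers."""
    n = len(offsets) // 4 - 1
    offs = struct.unpack(f"<{n + 1}i", offsets)
    out = []
    for i in range(n):
        if (validity[i // 8] >> (i % 8)) & 1:
            out.append(data[offs[i]:offs[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


validity, offsets, data = encode_utf8_column(["a", None, "bc"])
print(validity.hex(), struct.unpack("<4i", offsets), data)
# prints: 05 (0, 1, 1, 3) b'abc'
```

A reader needs all three buffers plus the row count to reconstruct the column, which is why a single raw address is not enough to describe a text column on its own.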
--00000000000034abc105b7c7cd03--