From: Thomas Browne <thomas@crvm.io>
To: user@arrow.apache.org
Subject: Re: Question the nature of the "Zero Copy" advantages of Apache Arrow
Date: Tue, 26 Jan 2021 18:07:23 +0000
Message-ID: <1fb8087e-1463-857a-507c-5aa63002a47c@crvm.io>

Don't I lose the benefit of mmapping huge files if I use a ramdisk? The file now has to fit on the ramdisk.

Personally, I'm working with financial tick data, which can be enormous.
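
To make the mmap point concrete, here is a minimal sketch using Python's standard-library `mmap` module rather than Arrow's own API. The record layout and the names `RECORD`, `write_ticks`, `read_tick` and `demo` are hypothetical, chosen only for illustration. It maps a file that lives on ordinary disk and decodes a single record from the mapping, so the file as a whole never has to be loaded into RAM (or fit on a ramdisk).

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width tick record: int64 timestamp, float64 price.
RECORD = struct.Struct("<qd")


def write_ticks(path, n):
    """Write n fixed-width records to path (a stand-in for a large tick file)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(i, 100.0 + i * 0.5))


def read_tick(path, index):
    """Map the file read-only and decode one record without reading the rest."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # unpack_from decodes straight from the mapping; the OS only
            # needs to page in the part of the file backing this record.
            return RECORD.unpack_from(mm, index * RECORD.size)


def demo():
    """Create a small temporary file, read one record back, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_ticks(path, 1000)
        return read_tick(path, 750)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a disk-backed file the OS can evict and re-read pages as needed, whereas a ramdisk holds the whole file in memory, which is the trade-off I am asking about.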

On 26/01/2021 18:00, Daniel Nugent wrote:
> Is there a problem with just using a RAM disk as the method for sharing the Arrow buffers? It just seems easier and less finicky than a separate API to program against.
>
> It also makes storing the data permanently a lot more straightforward, I think.
>
> --
> -Dan Nugent
> On Jan 26, 2021, 12:47 -0500, Thomas Browne <thomas@crvm.io>, wrote:
>> So one of the big advantages of Arrow is the common format in memory, on
>> the wire, across languages.
>>
>> I get that this makes it very easy and fast to transfer data between
>> nodes, and between languages, which will all share the in-memory format
>> and therefore the (often expensive) serialisation step is removed.
>>
>> However, is it true that one of the core objectives of the project is
>> also to allow shared memory objects across different languages on the
>> same node? For example, a fast C-based ingest system constantly
>> populates a pyarrow buffer, which can be read directly by any other
>> application on that node, through pointer sharing?
>>
>> If this is a core objective, what is the canonical way for brokering the
>> "pointers" to this data between languages? Is it the Plasma store? And
>> if so, are there plans for Plasma to be implemented in other client
>> languages?
>>
>> In short: is Plasma (or, if not Plasma, the functionality it provides,
>> implemented some other way) a core objective of the project?
>>
>> Or instead is Flight supposed to be used between languages on the same
>> node, and if so, does Flight provide true zero-copy (i.e. the same
>> buffer, not a copy of the buffer) if run between processes on the same node?
>>
>> Many thanks.