From: Todd Lipcon
Date: Tue, 15 Mar 2016 20:30:20 -0700
Subject: Re: Understanding "shared" memory implications
To: dev@arrow.apache.org

Having thought
about this quite a bit in the past, I think the mechanics of how to share
memory are by far the easiest part. The much harder part is the resource
management and ownership. Questions like:

- If you are using an mmapped file in /dev/shm/, how do you make sure it gets
  cleaned up if the process crashes?
- How do you allocate memory to it? Nothing ensures that /dev/shm doesn't swap
  out if you try to put too much in there, and then your super-fast in-memory
  access will basically collapse under swap thrashing.
- How do you do lifecycle management across the two processes? If, say, Kudu
  wants to pass a block of data to some Python program, how does it know when
  the Python program is done reading it and it should be deleted? What if the
  Python program crashes in the middle: when can Kudu release it?
- How do you do security? If the two sides of the connection don't trust each
  other, and use length prefixes and offsets, you have to be constantly
  validating and re-validating everything you read.

Another big factor is that shared memory is not, in my experience, immediately
faster than just copying data over a Unix domain socket. In particular, the
first time you read an mmapped file, you'll end up paying minor page fault
overhead on every page. This can be improved with HugePages, but huge page
mmaps are not supported yet in current Linux (work is going on currently to
address this). So you're left with hugetlbfs, which involves static
allocations and much more pain.

All the above is a long way to say: let's make sure we do the right
prototyping and up-front design before jumping into code.

-Todd

On Tue, Mar 15, 2016 at 5:54 PM, Jacques Nadeau wrote:

> @Corey
> The POC Steven and Wes are working on is based on MappedBuffer but I'm
> looking at using netty's fork of tcnative to use shared memory directly.
>
> @Yiannis
> We need to have both RPC and a shared memory mechanism (what I'm inclined
> to call IPC but is a specific kind of IPC).
> The idea is we negotiate via
> RPC and then if we determine shared locality, we work over shared memory
> (preferably for both data and control). So the system interacting with
> HBase in your example would be the one responsible for placing collocated
> execution to take advantage of IPC.
>
> How do others feel about my redefinition of IPC to mean same-memory-space
> communication (either via shared memory or RDMA) versus RPC as socket-based
> communication?
>
>
> On Tue, Mar 15, 2016 at 5:38 PM, Corey Nolet wrote:
>
> > I was seeing Netty's unsafe classes being used here, not mapped byte
> > buffer. Not sure if that statement is completely correct, but I'll have
> > to dig through the code again to figure that out.
> >
> > The more I was looking at unsafe, it makes sense why that would be
> > used. Apparently it's also supposed to be included in Java 9 as a
> > first-class API.
> > On Mar 15, 2016 7:03 PM, "Wes McKinney" wrote:
> >
> > > My understanding is that you can use java.nio.MappedByteBuffer to work
> > > with memory-mapped files as one way to share memory pages between Java
> > > (and non-Java) processes without copying.
> > >
> > > I am hoping that we can reach a POC of zero-copy Arrow memory sharing
> > > Java-to-Java and Java-to-C++ in the near future. Indeed this will have
> > > huge implications once we get it working end to end (for example,
> > > receiving memory from a Java process in Python without a heavy ser-de
> > > step -- it's what we've always dreamed of) and with the metadata and
> > > shared memory control flow standardized.
> > >
> > > - Wes
> > >
> > > On Wed, Mar 9, 2016 at 9:25 PM, Corey J Nolet wrote:
> > > > If I understand correctly, Arrow is using Netty underneath which is
> > > > using Sun's Unsafe API in order to allocate direct byte buffers off
> > > > heap. It is using Netty to communicate between "client" and "server"
> > > > information about memory addresses for data that is being requested.
> > > >
> > > > I've never attempted to use the Unsafe API to access off heap memory
> > > > that has been allocated in one JVM from another JVM but I'm assuming
> > > > this must be the case in order to claim that the memory is being
> > > > accessed directly without being copied, correct?
> > > >
> > > > The implication here is huge. If the memory is being directly shared
> > > > across processes by them being allowed to directly reach into the
> > > > direct byte buffers, that's true shared memory. Otherwise, if there's
> > > > copies going on, it's less appealing.
> > > >
> > > > Thanks.
> > > >
> > > > Sent from my iPad

--
Todd Lipcon
Software Engineer, Cloudera
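
To make the memory-mapped-file approach and the "validate every length
prefix" concern from this thread concrete, here is a minimal Python sketch.
It is an illustration under assumptions, not Arrow's or Kudu's actual
mechanism: the 8-byte little-endian length header, the function names
write_region and read_region, and the temporary directory standing in for
/dev/shm are all hypothetical. It does not address crash cleanup, swapping,
or cross-process lifecycle, which are the harder problems raised above.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: an 8-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<Q")


def write_region(path, payload):
    """Producer: size the file, map it shared, and write header plus payload."""
    total = HEADER.size + len(payload)
    with open(path, "wb+") as f:
        f.truncate(total)
        with mmap.mmap(f.fileno(), total) as region:
            HEADER.pack_into(region, 0, len(payload))
            region[HEADER.size:total] = payload
            region.flush()


def read_region(path):
    """Consumer: map read-only and check the length prefix before trusting it."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size < HEADER.size:
            raise ValueError("region too small to hold a header")
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as region:
            (length,) = HEADER.unpack_from(region, 0)
            if length > size - HEADER.size:
                raise ValueError("length prefix points past the end of the mapping")
            # Slicing copies the bytes out; a real zero-copy reader would keep
            # a view into the mapping instead of copying.
            return region[HEADER.size:HEADER.size + length]


if __name__ == "__main__":
    # The temporary directory is a stand-in for /dev/shm and is removed on a
    # normal exit only; a crash would leave the file behind.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "block")
        write_region(path, b"arrow")
        print(read_region(path))
```

Even in this toy version the reader has to compare the claimed length with
the real mapping size before touching the payload, which is the kind of
repeated validation an untrusted peer forces on every read.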