Date: Wed, 30 Mar 2016 17:17:25 -0700
Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
 for Apache Arrow
From: Henry Saputra
To: dev@mnemonic.incubator.apache.org
Cc: "dev@arrow.apache.org", "dev@mnemonic.apache.org"

The communities of both podlings are bigger than the group that shows up at
Strata =)

Would love to have a summary of the discussions on the dev@ list if some
discussions do indeed happen at Strata.

- Henry

On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping wrote:

> Hi, All
>
> I met with Jacques today at Strata. We think it would be great if the Arrow
> and Mnemonic communities could have a F2F meeting together to talk about our
> integration.
> I have the following two days available: 4/11 Monday afternoon, or 4/15
> Friday.
> We can meet at the Intel SC campus.
>
> Would you let me know if you are able to join us and which day you'd
> prefer?
>
> Thanks
> Yanping
>
>
> On Mar 29, 2016, at 4:38 PM, Gary <garyw@apache.org> wrote:
>
> Yes, I agree with you, and it would be great if we could brainstorm here to
> collect more ideas about enabling non-volatile memory usage for Apache
> Arrow through Mnemonic.
>
> For the questions, my ideas are:
>
> - Right now you are using unpooled persistent memory. Does that make sense
> or does chunking make more sense?
>
> Gary: I think it could make sense if developers know that their datasets
> are very big and they want Apache Arrow to keep most of them in memory for
> intensive computing, e.g. sort. Developers can certainly spill their
> Mnemonic-managed datasets to disk, but that seems a bit inefficient in some
> scenarios, depending on the concrete application logic.
>
> - What do you think is the right way to transition back and forth between
> persistent and ephemeral memory? What do you think will be the first
> pattern to be adopted.
> For example, do you think we should try to use it as a tiered storage for
> sort spilling (before hitting the disk), or should we use it for caching?
>
> Gary: my 2 cents: the netty library does not yet seem to provide an elegant
> switch mechanism for Arrow to use. We could probably change the logic around
> "initialCapacity > directArena.chunkSize" to control which buffers are put
> off-heap and which are managed by Mnemonic. Another approach is to let the
> memory clustering mechanism of Mnemonic manage hybrid memory-like spaces in
> place of part of the logic of class PooledByteBufAllocatorL.
> Regarding sorting, I think it is a typical case of random access to the
> data, so we should avoid spilling as much as possible.
> My 2 cents, the performance ranking could be
>   all in off-heap if possible > Mnemonic used as cache > all in Mnemonic using NVMe/disk > off-heap + spilling
> and the code simplicity ranking would be
>   all in off-heap if possible > all in Mnemonic using NVMe/disk > Mnemonic used as cache > off-heap + spilling
>
> The mode "Mnemonic used as cache + spilling" is probably unnecessary
> because Mnemonic could provide capacity nearly equivalent to disk.
>
> Thanks.
> Gary.
>
> > -----Original Message-----
> > From: Jacques Nadeau [mailto:jacques@apache.org]
> > Sent: Tuesday, March 29, 2016 8:05 AM
> > To: dev@arrow.apache.org
> > Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
> > for Apache Arrow
> >
> > This is super cool. A couple of questions:
> >
> > - Right now you are using unpooled persistent memory. Does that make sense
> > or does chunking make more sense?
> > - What do you think is the right way to transition back and forth between
> > persistent and ephemeral memory? What do you think will be the first
> > pattern to be adopted. For example, do you think we should try to use it as
> > a tiered storage for sort spilling (before hitting the disk), or should we
> > use it for caching?
> >
> > I think it will be much easier to think about this in the context of a
> > primary or first use case. Do you have something in mind or should we
> > brainstorm here?
> >
> > On Wed, Mar 23, 2016 at 7:16 PM, Gary <garyw@apache.org> wrote:
> >
> > > Hello,
> > >
> > > We have created a patch for Apache Arrow to leverage Apache
> > > Incubator Mnemonic as an alternative infrastructure for underlying
> > > memory resource allocation. You can find it in the forked repo below.
> > >
> > > https://github.com/NonVolatileComputing/arrow
> > >
> > > This way, Apache Arrow could gain some structural benefits from
> > > the Mnemonic project:
> > >
> > > - Arrow is able to leverage the larger capacity of high-performance
> > > hybrid storage devices, e.g. high-end SSD, NVMe.
> > >
> > > - Mnemonic provides a potential opportunity for Arrow to
> > > optimize and tune its allocation algorithms as native Arrow-oriented
> > > allocation services.
> > >
> > > - The non-volatile features of Mnemonic make it possible for
> > > Arrow to share its columnar in-memory data between different
> > > applications or across the life cycle of a single application.
> > >
> > > - Arrow could take advantage of upcoming Mnemonic features such as
> > > memory clustering/DOG (distributed object graph) and massive native
> > > computing.
> > >
> > > - Mnemonic helps to reduce the pressure on main memory utilization
> > > and its related system-wide overheads.
> > >
> > > This patch is designed to minimize the changes users need to make to
> > > use Arrow; please check out the test cases provided by this patch for
> > > your reference.
> > >
> > > Note that we need to put the allocator services in a specified
> > > location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> > > cases to run, because those services are required for external
> > > memory-like device management.
> > >
> > > Please give your comments and review feedback for better
> > > collaboration between Apache Arrow and Mnemonic. Thanks.
> > >
> > > Best Regards.
> > > Gary.
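
The threshold idea in Gary's reply (routing a buffer by comparing its requested capacity with the pooled arena's chunk size) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not code from Arrow, Netty, or Mnemonic. The names Tier, TieredPolicy, chunk_size, and off_heap_budget are hypothetical, and the rule that small requests fall over to Mnemonic once an off-heap budget is used up is an added assumption rather than something proposed in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Where a buffer would be placed (hypothetical labels)."""
    OFF_HEAP = "off-heap"
    MNEMONIC = "mnemonic"


@dataclass
class TieredPolicy:
    """Hypothetical placement policy; not an Arrow, Netty, or Mnemonic API."""
    chunk_size: int         # stands in for directArena.chunkSize
    off_heap_budget: int    # assumed cap on bytes kept off-heap
    off_heap_used: int = 0  # bytes already placed off-heap

    def choose(self, initial_capacity: int) -> Tier:
        # Oversized requests skip the pooled arena, mirroring the
        # "initialCapacity > directArena.chunkSize" check from the thread.
        if initial_capacity > self.chunk_size:
            return Tier.MNEMONIC
        # Assumption: once the off-heap budget would be exceeded, fall over
        # to Mnemonic-managed memory instead of spilling to disk.
        if self.off_heap_used + initial_capacity > self.off_heap_budget:
            return Tier.MNEMONIC
        self.off_heap_used += initial_capacity
        return Tier.OFF_HEAP


if __name__ == "__main__":
    policy = TieredPolicy(chunk_size=16 << 20, off_heap_budget=24 << 20)
    for size in (1 << 20, 64 << 20, 16 << 20, 16 << 20):
        print(size, policy.choose(size).value)
```

A policy of this shape corresponds to the "all in off-heap if possible, then Mnemonic" ordering in the rankings above; a cache-style use of Mnemonic would need eviction logic that this sketch leaves out.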