From user-return-568-archive-asf-public=cust-asf.ponee.io@arrow.apache.org  Sun Jul 26 20:48:29 2020
Mailing-List: contact user-help@arrow.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:user-help@arrow.apache.org>
List-Unsubscribe: <mailto:user-unsubscribe@arrow.apache.org>
List-Post: <mailto:user@arrow.apache.org>
List-Id: <user.arrow.apache.org>
Reply-To: user@arrow.apache.org
Delivered-To: mailing list user@arrow.apache.org
MIME-Version: 1.0
References: <CADbpEJtCvLYBPQ46=_+t+DJgqhBWSKmDdet+C9289fm2g-+tQA@mail.gmail.com>
 <CAKa9qDmWngXhpRd_=9gOd8X0PpGsExCTDdYrEzTRxzd9srwxQA@mail.gmail.com>
 <CADbpEJsUfs+w52ouHyFszyc-DWDrgSqQKOerTWtfyfFaY9SR1g@mail.gmail.com>
 <CAKa9qD=FyiYhgBHdFbzY-cP7vr+y2UQxx22st9X7rpG2K8gk_w@mail.gmail.com>
 <CADbpEJvvs0VjVRfd4Jn7fi8H0-qzD02B6uN3LCncK-fxqG_Nwg@mail.gmail.com>
 <CAKa9qD=wwd_1=5V0WtT12HR5SpMa0atEmko+i_2D2t-e9Y23cA@mail.gmail.com> <CADbpEJvuqST3=X5pP-tGvsC=DT8Lqtjmg+DetV5tUSH0OAkkTA@mail.gmail.com>
In-Reply-To: <CADbpEJvuqST3=X5pP-tGvsC=DT8Lqtjmg+DetV5tUSH0OAkkTA@mail.gmail.com>
From: Jacques Nadeau <jacques@apache.org>
Date: Sun, 26 Jul 2020 13:48:09 -0700
Message-ID: <CAKa9qDnAh0S5BeK0ngoUO0ZMDbxjg7KuLH2c9Dk3oXg-cDSqzA@mail.gmail.com>
Subject: Re: memory mapped record batches in Java
To: user@arrow.apache.org
Content-Type: multipart/alternative; boundary="000000000000920b3e05ab5e53d2"

--000000000000920b3e05ab5e53d2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, Jul 26, 2020 at 11:57 AM Chris Nuernberger <chris@techascent.com>
wrote:

> The distinction between heap and off-heap is confusing to someone who
> works in both java and c++ but I understand what you are saying; there is
> some minimal overhead there.
>
In the JVM there is a very clear distinction, and this is precisely what I
was referring to. Heap memory in the context of the JVM is garbage
collected, and there is a cost to the churn of objects within this
garbage-collected space. The VectorSchemaRoot pipelining pattern was built
to minimize this heap churn.
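
To make the churn point concrete, here is a minimal JDK-only sketch, not Arrow code: one long-lived direct buffer is refilled for each incoming batch instead of allocating fresh storage per batch, which is the idea behind reloading a single VectorSchemaRoot. The class and method names are hypothetical, and a fixed maximum batch size is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration (not Arrow's API): one long-lived off-heap buffer
// is refilled for every incoming batch, the way a reader reloads a single
// VectorSchemaRoot instead of allocating new vectors per batch.
public class BatchReuseSketch {
    // Capacity, in ints, of the single reusable buffer (assumed max batch size).
    static final int BATCH_CAPACITY = 1024;

    /** Sums all values across batches while reusing one direct buffer. */
    static long sumBatches(int[][] batches) {
        // Allocated once; the data lives off-heap and only the small
        // ByteBuffer wrapper object sits on the GC-managed heap.
        ByteBuffer reusable = ByteBuffer.allocateDirect(BATCH_CAPACITY * Integer.BYTES)
                                        .order(ByteOrder.nativeOrder());
        long total = 0;
        for (int[] batch : batches) {
            if (batch.length > BATCH_CAPACITY) {
                throw new IllegalArgumentException("batch larger than buffer");
            }
            reusable.clear();              // reset position/limit; no new allocation
            for (int v : batch) {
                reusable.putInt(v);        // "load" the batch into the shared buffer
            }
            reusable.flip();               // switch to reading the loaded values
            while (reusable.hasRemaining()) {
                total += reusable.getInt();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] batches = { {1, 2, 3}, {4, 5}, {6} };
        System.out.println(sumBatches(batches)); // prints 21
    }
}
```

The per-batch cost is just resetting the buffer's position and limit; nothing new is handed to the garbage collector as batches stream through.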

> What I keep trying to say is that when you use malloc (or create a new
> object in the JVM) you are allocating memory that can't be paged out of
> process;
>
Sigh. Per my original response: create an allocation manager that works
with one or many memory-mapped Arrow IPC files.
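
A minimal sketch of the primitive such an allocation manager could sit on, assuming nothing beyond the JDK: FileChannel.map gives memory whose pages are backed by a file, so the OS can evict clean pages and re-read them later rather than keeping them resident. This is not Arrow's AllocationManager API and it does not parse the IPC format; the file is just a flat array of little-endian ints, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: the JDK memory-mapping primitive an mmap-backed allocation
// manager could build on. It does not use Arrow's AllocationManager or read
// the Arrow IPC format; the file is a flat array of little-endian ints.
public class MmapSketch {

    /** Writes the values into the file through a read-write mapping. */
    static void writeInts(Path file, int[] values) throws IOException {
        long size = (long) values.length * Integer.BYTES;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping a region past the current end grows the file to fit.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            map.order(ByteOrder.LITTLE_ENDIAN);
            for (int v : values) {
                map.putInt(v);
            }
            map.force(); // flush dirty pages back to the file
        }
    }

    /** Sums the ints by reading them straight out of the mapped pages. */
    static long sumInts(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Pages are backed by the file, so the OS can drop and re-read
            // them under memory pressure instead of keeping them resident.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            IntBuffer ints = map.order(ByteOrder.LITTLE_ENDIAN).asIntBuffer();
            long total = 0;
            while (ints.hasRemaining()) {
                total += ints.get();
            }
            return total;
        }
    }

    /** Round trip through a temporary file; wraps I/O errors as unchecked. */
    static long roundTripSum(int[] values) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            writeInts(file, values);
            return sumInts(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripSum(new int[] {10, 20, 30})); // prints 60
    }
}
```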

> I bet in general you are completely wrong

lol

> What algorithms are you thinking...

large joins and aggregations of a pipelined input.
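
As a rough illustration of why those workloads care about memory, here is a hypothetical JDK-only streaming SUM ... GROUP BY: batches pass through one at a time, but the hash table grows with the number of distinct keys, and that retained state (like the build side of a hash join) is what can get large. The names and the batch layout are assumptions, not Arrow classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a streaming hash aggregation (SUM ... GROUP BY key).
// Input batches flow through one at a time, but the hash table keeps growing
// with the number of distinct keys; that retained state is what large
// aggregations (and the build side of a hash join) must hold in memory.
public class StreamingAggregationSketch {

    /** One incoming batch as parallel key/value columns (illustrative layout). */
    static final class Batch {
        final String[] keys;
        final long[] values;

        Batch(String[] keys, long[] values) {
            if (keys.length != values.length) {
                throw new IllegalArgumentException("column lengths differ");
            }
            this.keys = keys;
            this.values = values;
        }
    }

    /** Consumes the batches in order and returns the per-key sums. */
    static Map<String, Long> sumByKey(Iterable<Batch> batches) {
        Map<String, Long> state = new HashMap<>();
        for (Batch batch : batches) {
            for (int i = 0; i < batch.keys.length; i++) {
                state.merge(batch.keys[i], batch.values[i], Long::sum);
            }
            // Only the running totals (and the key objects) outlive this batch.
        }
        return state;
    }

    public static void main(String[] args) {
        List<Batch> input = Arrays.asList(
            new Batch(new String[] {"a", "b"}, new long[] {1, 2}),
            new Batch(new String[] {"a", "c"}, new long[] {3, 4}));
        Map<String, Long> sums = sumByKey(input);
        System.out.println(sums.get("a") + " " + sums.size()); // prints 4 3
    }
}
```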

--000000000000920b3e05ab5e53d2--