From: Jacques Nadeau
Date: Sun, 26 Jul 2020 13:48:09 -0700
Subject: Re: memory mapped record batches in Java
To: user@arrow.apache.org

On Sun, Jul 26, 2020 at 11:57 AM Chris Nuernberger wrote:

> The distinction between heap and off-heap is confusing to someone who
> works in both java and c++ but I understand what you are saying; there is
> some minimal overhead there.

In the JVM there is a very clear distinction and this is precisely what I was referring to. Heap memory in the context of the JVM is garbage collected, and there is a cost to the churn of objects within this garbage-collected space. The vector schema root pipelining pattern was built to minimize this heap churn.

> What I keep trying to say is that when you use malloc (or create a new
> object in the JVM) you are allocating memory that can’t be paged out of
> process;

Sigh. Per my original response: create an allocation manager which works with one or many mmapped Arrow-IPC formatted files.

> I bet in general you are completely wrong

lol

> What algorithms are you thinking...

large joins and aggregations of a pipelined input.
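To make the heap versus mmap distinction in this message concrete, here is a minimal sketch that uses only the JDK. It is not Arrow's allocation manager and does not parse the Arrow IPC format; the class name MmapSketch, the helper mapReadOnly, and the temp file contents are all hypothetical stand-ins. It only shows that a file mapped with FileChannel.map is exposed as a direct buffer that lives outside the garbage-collected heap and is backed by the file's pages, which the OS is free to evict and reload.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    /**
     * Maps the whole file read-only (hypothetical helper, not an Arrow API).
     * The returned buffer is off-heap and file-backed; the mapping remains
     * valid after the channel is closed.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: a real use case would point at an Arrow IPC file instead.
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "example bytes".getBytes(StandardCharsets.US_ASCII));

        MappedByteBuffer mapped = mapReadOnly(file);

        // Copy the mapped bytes onto the heap only to print them.
        byte[] copy = new byte[mapped.remaining()];
        mapped.get(copy);
        System.out.println(new String(copy, StandardCharsets.US_ASCII));
        System.out.println("direct (off-heap) buffer: " + mapped.isDirect());
    }
}
```

An allocation manager along the lines suggested above would hand out slices of such a mapping as Arrow buffers rather than copying them onto the heap; how that plugs into Arrow's Java memory module is not shown here.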