arrow-user mailing list archives

From Jacques Nadeau <>
Subject Re: memory mapped record batches in Java
Date Sun, 26 Jul 2020 20:48:09 GMT
On Sun, Jul 26, 2020 at 11:57 AM Chris Nuernberger <>

> The distinction between heap and off-heap is confusing to someone who
> works in both Java and C++, but I understand what you are saying; there is
> some minimal overhead there.
In the JVM there is a very clear distinction, and this is precisely what I
was referring to. Heap memory, in the context of the JVM, is garbage
collected, and there is a cost to the churn of objects within that
garbage-collected space. The VectorSchemaRoot pipelining pattern was built
to minimize this heap churn.
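To illustrate the idea behind that pattern without pulling in the Arrow API itself, here is a plain-JDK sketch: one container is allocated up front and successive batches are loaded into it in place, so the garbage collector sees a fixed working set rather than a fresh allocation per batch (the same way successive record batches are loaded into a single reused VectorSchemaRoot). The class and method names here are invented for the example.

```java
// Hypothetical sketch (plain JDK, not the Arrow API): reuse one preallocated
// container across batches instead of allocating a new one per batch.
public class ReusedBatch {
    // Stream 'batches' synthetic batches through one reused array and sum them.
    static long sumBatches(int batches, int batchSize) {
        long[] batch = new long[batchSize]; // allocated once, reused every batch
        long total = 0;
        for (int b = 0; b < batches; b++) {
            // "Load" the next batch in place (stand-in for loadNextBatch).
            for (int i = 0; i < batchSize; i++) {
                batch[i] = (long) b * batchSize + i;
            }
            for (long v : batch) {
                total += v;
            }
        }
        // No per-batch heap allocation occurred in the loop above.
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBatches(100, 1024));
    }
}
```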

> What I keep trying to say is that when you use malloc (or create a new
> object in the JVM) you are allocating memory that can’t be paged out of
> process;
Sigh. Per my original response: create an AllocationManager that works
with one or many mmap'd Arrow IPC formatted files.
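As a rough sketch of what that could look like, the plain-JDK snippet below memory-maps a file and hands out zero-copy slices of it. Mapped pages live outside the Java heap and can be paged in and out by the OS on demand; an allocation manager built on this idea would point Arrow vector buffers at slices of an mmap'd IPC file. This uses only `java.nio`, not Arrow's actual `AllocationManager` API, and the `MmapSlices`/`slice` names are invented for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch (plain JDK NIO, not Arrow's AllocationManager API):
// map a file once and hand out read-only, zero-copy views of its contents.
public class MmapSlices {
    private final MappedByteBuffer mapped;

    public MmapSlices(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Mapped memory is off-heap; the OS may page it in/out on demand.
            this.mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Zero-copy view of [offset, offset + length) within the mapped file.
    public ByteBuffer slice(int offset, int length) {
        ByteBuffer dup = mapped.duplicate();
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        MmapSlices slices = new MmapSlices(tmp);
        ByteBuffer view = slices.slice(2, 4); // bytes 3..6 of the file
        System.out.println(view.get(0) + " " + view.get(3));
    }
}
```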

> I bet in general you are completely wrong

> What algorithms are you thinking...

Large joins and aggregations of a pipelined input.
