The distinction between heap and off-heap is confusing to someone who works in both Java and C++, but I understand what you are saying; there is some minimal overhead there.

In the JVM there is a very clear distinction, and this is precisely what I was referring to. In the context of the JVM, heap memory is garbage collected, and churning objects within that garbage-collected space has a cost. The VectorSchemaRoot pipelining pattern was built to minimize this heap churn.
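To make the heap-churn point concrete, here is a minimal sketch using only the JDK, not the Arrow API. It assumes a hypothetical batch size (`BATCH_ROWS`) and a hypothetical class name, and it stands in for the general idea of loading every batch into the same off-heap buffer rather than allocating new heap objects per batch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReusedBatchSketch {
    // Hypothetical batch size, chosen only for illustration.
    static final int BATCH_ROWS = 1024;

    // Fills and consumes `batches` batches of longs through ONE direct
    // (off-heap) buffer, so the loop allocates no per-batch heap objects.
    static long sumBatches(int batches) {
        ByteBuffer batch = ByteBuffer.allocateDirect(BATCH_ROWS * Long.BYTES)
                                     .order(ByteOrder.nativeOrder());
        long total = 0;
        long next = 0;
        for (int b = 0; b < batches; b++) {
            batch.clear();                       // reuse the same memory
            for (int i = 0; i < BATCH_ROWS; i++) {
                batch.putLong(next++);           // "producer" stage
            }
            batch.flip();
            while (batch.hasRemaining()) {
                total += batch.getLong();        // "consumer" stage
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBatches(4));
    }
}
```

The only allocation happens once, up front, and the garbage collector never sees the batch contents.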

What I keep trying to say is that when you use malloc (or create a new object in the JVM) you are allocating memory that can’t be paged out of process;

Sigh. Per my original response: create an allocation manager which works with one or many mmapped Arrow IPC-formatted files.
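As a rough illustration of the underlying mechanism, and not of an actual Arrow allocation manager, the sketch below maps a plain file with the JDK's `FileChannel.map`. The class and method names are hypothetical, and the file holds raw longs rather than Arrow IPC data. The point is only that the mapped region is backed by the file, so the OS can evict those pages and reload them on demand.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Writes the longs 0..count-1 through a read-write mapping of `file`.
    static void writeLongs(Path file, int count) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) count * Long.BYTES);
            for (int i = 0; i < count; i++) {
                buf.putLong(i);
            }
            buf.force(); // flush dirty pages to the file
        }
    }

    // Sums every long in `file` through a read-only mapping.
    static long sumLongs(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long total = 0;
            while (buf.remaining() >= Long.BYTES) {
                total += buf.getLong();
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        try {
            writeLongs(file, 1000);
            System.out.println(sumLongs(file));
        } finally {
            // May fail on platforms that keep mapped files locked.
            Files.deleteIfExists(file);
        }
    }
}
```

A real allocation manager would hand out slices of such mappings as buffers; that part is left out here.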

I bet in general you are completely wrong


What algorithms are you thinking...

Large joins and aggregations of a pipelined input.