arrow-user mailing list archives

From Chris Nuernberger <>
Subject Re: Apache Arrow Java
Date Thu, 31 Dec 2020 19:40:42 GMT

I am not an Arrow developer, but to my knowledge the only Java pathway that
can use mmap is the one I wrote for Clojure:

The underlying library is
<> and we also have generic
python bindings <>.

I do wonder what the pointer actually points at with pyarrow.  A column
may point to up to 3 buffers (data, validity, offsets) in the case of
text, and usually has 2 buffers, data and validity. Potentially the
pointer you get back is a pointer to the low-level record batch, but that
specifically cannot carry a pointer to a dictionary.
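
To make the three-buffer point concrete, here is a rough pure-Python sketch
of how a text column splits into validity, offsets and data buffers. This is
not Arrow's implementation: the function names are made up for illustration,
and the padding and alignment that real Arrow buffers carry are left out.

```python
import struct


def encode_utf8_column(values):
    """Encode a list of str-or-None into (validity, offsets, data) bytes.

    Hypothetical helper for illustration only; real Arrow buffers are
    also padded and aligned, which is omitted here.
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)   # one bit per row, LSB first
    offsets = [0]                        # n + 1 int32 offsets into data
    data = bytearray()                   # concatenated UTF-8 bytes
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null row repeats the previous offset (zero-length slot).
        offsets.append(len(data))
    offsets_buf = struct.pack("<%di" % (n + 1), *offsets)
    return bytes(validity), offsets_buf, bytes(data)


def decode_utf8_column(validity, offsets_buf, data, n):
    """Read the rows back from the three buffers."""
    offsets = struct.unpack("<%di" % (n + 1), offsets_buf)
    rows = []
    for i in range(n):
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            rows.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            rows.append(None)
    return rows


validity, offsets_buf, data = encode_utf8_column(["ab", None, "cde"])
print(decode_utf8_column(validity, offsets_buf, data, 3))
```

A reader needs all three buffers plus the row count to recover the column,
which is why a single bare address is not enough to describe a text column.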

Just considering the actual Arrow file format, a single pointer cannot
point to both the schema information (which contains the dictionary) and
the record batch column data.

There isn't a single-column interchange format that I am aware of, aside
from potentially writing a streaming format with a single column.

On Wed, Dec 30, 2020 at 8:08 AM Igor <> wrote:

> Hello Apache Arrow developers!
> We are using the Apache Arrow library in Java and Python: arrow-vector and
> arrow-memory-unsafe in Java, and pyarrow in Python.
> We are trying to implement an in-memory, zero-copy DataFrame, but we can’t
> find an appropriate API in the Java libraries to get the memory address of
> our vectors for use from Python. I have found such an API in pyarrow, but
> not in the Java libraries.
> What we need:
> 1) Create a vector in Java and collect data in memory, using Arrow as a memory map
> 2) Get its memory address or descriptor in Java
> 3) Pass it to the Python library pyarrow
> 4) Read the vector data
> We have a problem at point 2.
> Please tell us how we can do that. Thank you!
> Best regards,
> Eshtyganov Igor
