arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: (java) Producing an in-memory Arrow buffer from a file
Date Fri, 24 Jan 2020 05:16:43 GMT
Hi Andrew,
It might help to provide a little more detail on where you are starting
from and what you want to do once you have the data in arrow format.

If the data is already available in some sort of off-heap data structure,
you can potentially avoid copies by wrapping it with the existing ArrowBuf
machinery [1].  If you have an iterator over the data, you can also build a
ListVector [2] directly.
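
To make the "value and offset buffers" idea concrete, here is a minimal sketch
of how a list column is laid out when it is held as two off-heap buffers. It
deliberately uses only java.nio rather than the Arrow API, and the class and
method names (ListLayoutSketch, readList, directInts, directFloats) are
hypothetical. It assumes 32-bit offsets, little-endian byte order, float
values, and no validity bitmap (no nulls).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain java.nio buffers, not the Arrow API.
final class ListLayoutSketch {

    // Returns list number `index`, given an offsets buffer of N+1 int32
    // entries and a values buffer of contiguous float32 entries.
    static List<Float> readList(ByteBuffer offsets, ByteBuffer values, int index) {
        int start = offsets.getInt(index * Integer.BYTES);
        int end = offsets.getInt((index + 1) * Integer.BYTES);
        List<Float> out = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            out.add(values.getFloat(i * Float.BYTES));
        }
        return out;
    }

    // Copies ints into a little-endian direct (off-heap) buffer.
    static ByteBuffer directInts(int... xs) {
        ByteBuffer buf = ByteBuffer.allocateDirect(xs.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < xs.length; i++) {
            buf.putInt(i * Integer.BYTES, xs[i]);
        }
        return buf;
    }

    // Copies floats into a little-endian direct (off-heap) buffer.
    static ByteBuffer directFloats(float... xs) {
        ByteBuffer buf = ByteBuffer.allocateDirect(xs.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < xs.length; i++) {
            buf.putFloat(i * Float.BYTES, xs[i]);
        }
        return buf;
    }

    public static void main(String[] args) {
        // Encodes the three lists [1, 2], [], [3, 4, 5].
        ByteBuffer offsets = directInts(0, 2, 2, 5);
        ByteBuffer values = directFloats(1f, 2f, 3f, 4f, 5f);
        for (int i = 0; i < 3; i++) {
            System.out.println(readList(offsets, values, i));
        }
    }
}
```

Buffers that already have this shape are the case where wrapping, rather than
copying element by element, is worth investigating.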

Depending on your end goal, you might want to stream the values through a
VectorSchemaRoot instead.

Some documentation giving an overview of the Java libraries [3] has been
written and will be published with the next release; it might be helpful.

Cheers,
Micah

[1]
https://javadoc.io/static/org.apache.arrow/arrow-memory/0.15.1/io/netty/buffer/ArrowBuf.html
[2]
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java
[3] https://github.com/apache/arrow/tree/master/docs/source/java

On Thu, Jan 23, 2020 at 5:02 AM Andrew Melo <andrew.melo@gmail.com> wrote:

> Hello all,
>
> I work in particle physics, which has standardized on the ROOT (
> http://root.cern) file format to store/process our data. The format
> itself is quite complicated, but the relevant part here is that after
> parsing/decompression, we end up with value and offset buffers holding our
> data.
>
> What I'd like to do is represent these data in-memory in the Arrow format.
> I've written a very rough POC where I manually put an Arrow stream into a
> ByteBuffer, then replaced the values and offset buffers with the bytes from
> my files, and I'm wondering what the "proper" way to do this is. From my
> reading of the code, it appears (?) that what I want to do is produce an
> org.apache.arrow.vector.types.pojo.Schema object, and N ArrowRecordBatch
> objects, then use MessageSerializer to stick them into a ByteBuffer one
> after another.
>
> Is this correct? Or, is there another API I'm missing?
>
> Thanks!
> Andrew
>
