arrow-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Bulk copy methods to/from Java vectors
Date Sat, 25 Jul 2020 23:21:59 GMT
You can allocate exactly for both fixed-width [1] and variable-width [2]
vectors.

1:
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L292
2:
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L401
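
For the variable-width case, exact allocation needs the total number of data bytes as well as the value count (as far as I can tell, the method linked in [2] takes both). Here is a minimal sketch of precalculating those numbers from a Java array using only the JDK. The class and method names are hypothetical and nothing here calls Arrow itself. It assumes the standard Arrow variable-width layout of n+1 32-bit offsets starting at 0.

```java
import java.nio.charset.StandardCharsets;

public class VarWidthSizing {
    /** Returns n+1 offsets; offsets[n] is the total number of data bytes. */
    static int[] computeOffsets(String[] values) {
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            // Null entries occupy zero data bytes; their validity bit would be 0.
            int len = values[i] == null
                    ? 0
                    : values[i].getBytes(StandardCharsets.UTF_8).length;
            // addExact throws instead of silently overflowing the 32-bit offset.
            offsets[i + 1] = Math.addExact(offsets[i], len);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[] column = {"arrow", null, "jvm"};
        int[] offsets = computeOffsets(column);
        System.out.println("valueCount=" + column.length
                + " totalBytes=" + offsets[column.length]);
    }
}
```

The last offset is the byte count you would hand to the allocation call, and the offsets array itself is what would end up in the vector's offset buffer.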

You can then either call the set method per cell, or grab the memory
address (e.g. getDataBufferAddress()) and use Unsafe to bulk copy. The
latter is obviously more advanced and requires you to do things like set
the validity buffers yourself.
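
To make the two halves of the bulk route concrete, here is a rough sketch using only the JDK. A direct ByteBuffer stands in for the vector's data buffer, so this is not the Unsafe-on-address code you would write against Arrow. The class and method names are hypothetical. It assumes the Arrow format conventions: little-endian values, and a validity bitmap where bit i (least significant bit first) is 1 when value i is non-null.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BulkCopySketch {
    /** Copies all ints into a little-endian direct buffer with one bulk put. */
    static ByteBuffer copyInts(int[] values) {
        ByteBuffer data = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        // Single bulk transfer through an int view; no per-cell calls.
        data.asIntBuffer().put(values);
        return data;
    }

    /** Builds a validity bitmap: bit i (LSB first) is set when value i is non-null. */
    static byte[] buildValidity(boolean[] isValid) {
        byte[] bitmap = new byte[(isValid.length + 7) / 8];
        for (int i = 0; i < isValid.length; i++) {
            if (isValid[i]) {
                bitmap[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return bitmap;
    }

    public static void main(String[] args) {
        ByteBuffer data = copyInts(new int[] {10, 20, 30});
        byte[] validity = buildValidity(new boolean[] {true, false, true});
        System.out.println("second value = " + data.getInt(Integer.BYTES));
        System.out.println("validity byte = " + Integer.toBinaryString(validity[0] & 0xFF));
    }
}
```

With a real vector you would write the same bytes to the data and validity buffers of an exactly allocated vector and then call setValueCount.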


On Sat, Jul 25, 2020 at 6:02 AM Chris Nuernberger <chris@techascent.com>
wrote:

> Hey,
>
> I would like to have bulk methods for copying data into a vector.
> Specifically, I have an existing data table so I know the desired lengths
> of the columns.  I can also precalculate the necessary buffer sizes for any
> variable sized column.
>
>
> What I don't see is how to pre-allocate columns of a given size.  When I
> use setValueCount on a column and then use the set method I get a netty
> error.  What I was hoping for is some allocation method, especially for
> primitive data, that allocates the desired uninitialized memory for the
> validity and data buffers and then hands those two buffers back to me so I can
> use memcpy and friends as opposed to repeated calls to setSafe.
>
>
> Repeated calls to setSafe are time consuming, not parallelizable, and
> unnecessary when I know the data rectangle I would like to transfer into a
> record batch.
>
>
> In my case I have the data pre-cut.  How would you recommend copying bulk
> portions of data (that may be in java arrays or in some more abstract
> interface) into a record batch?
>
> Thanks for any help,
>
> Chris
>
