arrow-user mailing list archives

From: Chris Nuernberger
Subject: Re: Bulk copy methods to/from Java vectors
Date: Sun, 26 Jul 2020 15:35:22 GMT
Also, it appears that `allocateNew` fails to set the value count for
BaseVariableWidthVectors.  And if you set the value count after you have
assigned data, it clears *only* the offset buffer, not the validity or
data buffers.
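
For readers unfamiliar with the three buffers mentioned here, the following is a minimal sketch of how the offset and data buffers of a variable-width column relate to each other. It uses plain Java arrays as a stand-in and does not call any Arrow API; the class and method names (`VarWidthLayoutSketch`, `buildOffsets`, `buildData`) are hypothetical. It assumes the standard Arrow layout, where a column of n values has n + 1 offsets starting at 0, and it gives null slots zero length as a simplification.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: plain arrays standing in for Arrow buffers.
class VarWidthLayoutSketch {

    /** Returns n + 1 offsets; value i occupies bytes [offsets[i], offsets[i + 1]). */
    static int[] buildOffsets(String[] values) {
        int[] offsets = new int[values.length + 1]; // offsets[0] stays 0
        for (int i = 0; i < values.length; i++) {
            // Assumption for this sketch: a null slot contributes zero bytes.
            int len = values[i] == null
                    ? 0
                    : values[i].getBytes(StandardCharsets.UTF_8).length;
            offsets[i + 1] = offsets[i] + len;
        }
        return offsets;
    }

    /** Concatenates the UTF-8 bytes of all non-null values at their offsets. */
    static byte[] buildData(String[] values, int[] offsets) {
        // The last offset is the total size of the data buffer.
        byte[] data = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
                System.arraycopy(bytes, 0, data, offsets[i], bytes.length);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        String[] column = {"ab", null, "cde"};
        int[] offsets = buildOffsets(column);
        byte[] data = buildData(column, offsets);
        System.out.println(java.util.Arrays.toString(offsets));
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

Because the last offset equals the total byte count, the data buffer size can be computed before any allocation, which is the precalculation described further down in the thread.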

On Sun, Jul 26, 2020 at 9:01 AM Chris Nuernberger wrote:

> It appears that those methods do not allocate the validity buffer *and*
> the function `allocateValidityBuffer` is private.
> How do you recommend allocating the validity buffer?
> On Sun, Jul 26, 2020 at 6:48 AM Chris Nuernberger wrote:
>> Perfect, thank you.  I tried setCapacity and setValueCount together and
>> this didn't have the result I was hoping for.  The methods you provided are
>> what I was looking for.
>> On Sat, Jul 25, 2020 at 5:22 PM Jacques Nadeau wrote:
>>> You can allocate exactly for both fixed [1] and variable types [2].
>>> 1:
>>> 2:
>>> You can then use the set method per cell or just grab the memory address
>>> (e.g. getDataBufferAddress()) and use Unsafe to bulk copy. The latter
>>> obviously is more advanced and requires you to do things like set the
>>> validity buffers as well.
>>> On Sat, Jul 25, 2020 at 6:02 AM Chris Nuernberger wrote:
>>>> Hey,
>>>> I would like to have bulk methods for copying data into a vector.
>>>> Specifically, I have an existing data table so I know the desired lengths
>>>> of the columns.  I can also precalculate the necessary buffer sizes for any
>>>> variable sized column.
>>>> What I don't see is how to pre-allocate columns of a given size.  When
>>>> I use setValueCount on a column and then use the set method I get a Netty
>>>> error.  What I was hoping for is some allocation method, especially for
>>>> primitive data, that allocates the desired uninitialized memory for the
>>>> validity and data buffers and then hands those two buffers back to me so I can
>>>> use memcpy and friends as opposed to repeated calls to setSafe.
>>>> Repeated calls to setSafe are time consuming, not parallelizable, and
>>>> unnecessary when I know the data rectangle I would like to transfer into
>>>> a record batch.
>>>> In my case I have the data pre-cut.  How would you recommend copying
>>>> bulk portions of data (that may be in java arrays or in some more abstract
>>>> interface) into a record batch?
>>>> Thanks for any help,
>>>> Chris
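
The bulk-copy idea discussed in this thread (fill the data buffer in one operation, then set the validity bits yourself) can be sketched roughly as follows. This is an illustration under stated assumptions, not Arrow's API: a direct `java.nio.ByteBuffer` and a `byte[]` stand in for the vector's data and validity buffers, and the names `BulkCopySketch`, `packInts`, and `packValidity` are hypothetical. It relies on two properties of the Arrow format: data buffers are little-endian, and the validity bitmap stores slot i in bit `i % 8` of byte `i / 8`, with a set bit meaning non-null.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only: java.nio buffers standing in for Arrow buffers.
class BulkCopySketch {

    /** Copies the whole array into a little-endian buffer with one bulk put. */
    static ByteBuffer packInts(int[] values) {
        ByteBuffer data = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        // The int view inherits the byte order set above; put(int[]) is a bulk copy.
        data.asIntBuffer().put(values);
        return data; // position is still 0, so absolute reads start at the first value
    }

    /** Builds a validity bitmap in which a set bit marks a non-null slot. */
    static byte[] packValidity(boolean[] valid) {
        byte[] bitmap = new byte[(valid.length + 7) / 8]; // one bit per slot, rounded up
        for (int i = 0; i < valid.length; i++) {
            if (valid[i]) {
                bitmap[i >> 3] |= (byte) (1 << (i & 7)); // least-significant bit first
            }
        }
        return bitmap;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        boolean[] valid = {true, false, true, true, false};
        ByteBuffer data = packInts(values);
        byte[] validity = packValidity(valid);
        System.out.println("third value: " + data.getInt(2 * Integer.BYTES));
        System.out.println("validity bits: " + Integer.toBinaryString(validity[0] & 0xFF));
    }
}
```

With a real vector, the same bytes would be written to the addresses of the allocated data and validity buffers instead of to freshly created ones, and the value count would still have to be set afterwards.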
