arrow-user mailing list archives

From: Chris Nuernberger <ch...@techascent.com>
Subject: Re: Bulk copy methods to/from Java vectors
Date: Sun, 26 Jul 2020 15:01:54 GMT

It appears that those methods do not allocate the validity buffer *and* the
function `allocateValidityBuffer` is private.

How do you recommend allocating the validity buffer?
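
For readers following the validity-buffer question: in the Arrow columnar
format the validity bitmap holds one bit per value, least-significant bit
first within each byte, with 1 meaning non-null. The sketch below only shows
how such a bitmap could be computed in plain java.nio; it does not call any
Arrow API, and the class and method names (ValidityBitmapSketch,
bitmapBytes, buildValidity) are made up for illustration. How the bytes get
into a vector's validity buffer is the open question above and is not
answered here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Arrow: computes an Arrow-style validity bitmap.
class ValidityBitmapSketch {

    /** Bytes needed to hold one bit per value, rounded up to a whole byte. */
    static int bitmapBytes(int valueCount) {
        return (valueCount + 7) / 8;
    }

    /** Sets bit i (LSB-first within each byte) when value i is non-null. */
    static ByteBuffer buildValidity(boolean[] isValid) {
        // Direct buffers start zero-filled, so every slot begins as "null".
        ByteBuffer bitmap = ByteBuffer.allocateDirect(bitmapBytes(isValid.length));
        for (int i = 0; i < isValid.length; i++) {
            if (isValid[i]) {
                int byteIndex = i >> 3;        // i / 8
                int bitMask = 1 << (i & 7);    // i % 8, least-significant bit first
                bitmap.put(byteIndex, (byte) (bitmap.get(byteIndex) | bitMask));
            }
        }
        return bitmap;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true, true};
        ByteBuffer bitmap = buildValidity(flags);
        System.out.println(Integer.toBinaryString(bitmap.get(0) & 0xFF));
    }
}
```

For a column that is known to contain no nulls, the same layout means every
bit below the value count is 1.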

On Sun, Jul 26, 2020 at 6:48 AM Chris Nuernberger <chris@techascent.com>
wrote:

> Perfect, thank you.  I tried setCapacity and setValueCount together and
> this didn't have the result I was hoping for.  The methods you provide are
> what I was looking for.
>
> On Sat, Jul 25, 2020 at 5:22 PM Jacques Nadeau <jacques@apache.org> wrote:
>
>> You can allocate exactly for both fixed [1] and variable types [2].
>>
>> 1:
>> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L292
>> 2:
>> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L401
>>
>> You can then use the set method per cell or just grab the memory address
>> (e.g. getDataBufferAddress()) and use Unsafe to bulk copy. The latter
>> obviously is more advanced and requires you to do things like set the
>> validity buffers as well.
>>
>>
>> On Sat, Jul 25, 2020 at 6:02 AM Chris Nuernberger <chris@techascent.com>
>> wrote:
>>
>>> Hey,
>>>
>>> I would like to have bulk methods for copying data into a vector.
>>> Specifically, I have an existing data table so I know the desired lengths
>>> of the columns.  I can also precalculate the necessary buffer sizes for any
>>> variable sized column.
>>>
>>>
>>> What I don't see is how to pre-allocate columns of a given size.  When I
>>> use setValueCount on a column and then use the set method I get a Netty
>>> error.  What I was hoping for is some allocation method, especially for
>>> primitive data, that allocates the desired uninitialized memory for the
>>> validity and data buffers and then hands those two buffers back to me so I
>>> can use memcpy and friends as opposed to repeated calls to setSafe.
>>>
>>>
>>> Repeated calls to setSafe are time consuming, not parallelizable, and
>>> unnecessary when I know the data rectangle I would like to transfer into a
>>> record batch.
>>>
>>>
>>> In my case I have the data pre-cut.  How would you recommend copying
>>> bulk portions of data (that may be in java arrays or in some more abstract
>>> interface) into a record batch?
>>>
>>> Thanks for any help,
>>>
>>> Chris
>>>
>>
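
On the bulk-copy idea discussed in the thread (one transfer instead of
repeated per-cell calls), here is a minimal sketch of the pattern using only
java.nio. The direct ByteBuffer stands in for a fixed-width data buffer; it
is not an Arrow buffer, and BulkCopySketch and copyInts are hypothetical
names. Arrow stores fixed-width values little-endian, which is why the byte
order is set explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Arrow: bulk-copies an int[] into off-heap memory.
class BulkCopySketch {

    /** Copies the whole array into a little-endian direct buffer in one call. */
    static ByteBuffer copyInts(int[] values) {
        ByteBuffer data = ByteBuffer
                .allocateDirect(values.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        // The int view inherits the byte order; put(int[]) is a single bulk transfer.
        data.asIntBuffer().put(values);
        return data;
    }

    public static void main(String[] args) {
        ByteBuffer data = copyInts(new int[] {1, 258, -1});
        System.out.println(data.getInt(4));
    }
}
```

Because each row range maps to a fixed byte range, disjoint slices of such a
buffer could in principle be filled independently, which is the
parallelism the original message asks about.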
