arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: why there is no batch set interface in arrow?
Date Wed, 03 Jun 2020 05:00:19 GMT
Hi Yunfan,
That is mostly correct given the current API.  If you are looking to write
performant code, there are a few tweaks.  You should presize the
ValueVector and use the "set" [1] method rather than setSafe, which checks
capacity (and reallocates if needed) on every call.
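
As a rough sketch of that first suggestion (assuming the Arrow Java vector and
memory artifacts are on the classpath; the class name PresizedFill and the
vector name "values" are made up for illustration), the loop from the original
question could allocate once and then call set:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;

public class PresizedFill {
    /** Copies myData into a vector sized once up front. The caller closes the vector. */
    static BigIntVector fill(BufferAllocator allocator, List<Long> myData) {
        BigIntVector vector = new BigIntVector("values", allocator);
        int n = myData.size();
        vector.allocateNew(n);            // reserve room for n values before the loop
        for (int i = 0; i < n; i++) {
            // set() assumes index i already fits; it does not grow the vector.
            // Note: a null element in myData would throw on unboxing here.
            vector.set(i, myData.get(i));
        }
        vector.setValueCount(n);          // record how many slots were written
        return vector;
    }

    public static void main(String[] args) {
        try (BufferAllocator allocator = new RootAllocator();
             BigIntVector vector = fill(allocator, Arrays.asList(1L, 2L, 3L))) {
            System.out.println(vector.getValueCount());
        }
    }
}
```

Whether this is measurably faster than setSafe depends on the workload, so it
is worth benchmarking both variants on your own data.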

Additionally, there are system properties and environment variables that
you can toggle to turn off bounds checking [2] for setters, which can have
a big performance impact.
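
A minimal sketch of toggling the flag from code follows. The property name
arrow.enable_unsafe_memory_access is the one I believe the README in [2]
documents; please verify it against your Arrow version. The class name
UnsafeAccessFlag is made up for illustration. As far as I know the flag is read
once when Arrow's classes are initialized, so it has to be set before any Arrow
class is touched:

```java
public class UnsafeAccessFlag {
    // Assumed property name, taken from the Arrow Java README; check your version.
    static final String FLAG = "arrow.enable_unsafe_memory_access";

    /** Requests that Arrow skip bounds checks. Call before loading any Arrow class. */
    static void disableBoundsChecking() {
        System.setProperty(FLAG, "true");
    }

    public static void main(String[] args) {
        disableBoundsChecking();
        // Only after this point should code that uses Arrow vectors run.
        System.out.println(FLAG + "=" + System.getProperty(FLAG));
    }
}
```

Passing -Darrow.enable_unsafe_memory_access=true on the java command line is
probably the safer route, since it does not depend on class-loading order.
Either way, out-of-range writes are no longer caught, so this only makes sense
for code that is already well tested.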

Last, there was discussion on the dev@ mailing list a while ago about
creating Builder classes [3], but there didn't seem to be a strong use-case
for them; in particular, there was no strong case that they would improve
performance.  For test code, there is a ValueVectorDataPopulator which can
reduce boilerplate in testing and is close to the API you suggest.

For the code that you provided above, I do not think SIMD would be
applicable, since the List holds "boxed" Long values that are not
contiguous in memory. If the first two suggestions don't help
performance-wise and you can get something faster than the loop you
provided, then a proof-of-concept PR would be helpful to continue the
discussion.
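
To illustrate the boxing point with plain JDK classes only (this is not Arrow
API; the direct ByteBuffer merely stands in for an off-heap buffer, and the
class and method names are made up): a List<Long> has to be walked and unboxed
element by element, whereas a primitive long[] is one contiguous block that can
be handed to a single bulk copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.Arrays;
import java.util.List;

public class BoxedVsPrimitive {
    /** Each element is a separate Long object, so every read follows a reference. */
    static long[] unbox(List<Long> boxed) {
        long[] out = new long[boxed.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = boxed.get(i);   // per-element unboxing, no bulk path
        }
        return out;
    }

    /** A long[] is contiguous, so one bulk put can move the whole array. */
    static LongBuffer bulkCopy(long[] values) {
        LongBuffer target = ByteBuffer.allocateDirect(values.length * Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asLongBuffer();
        target.put(values);          // single call, copies all elements
        target.flip();               // make the written values readable from index 0
        return target;
    }

    public static void main(String[] args) {
        long[] primitive = unbox(Arrays.asList(1L, 2L, 3L));
        LongBuffer copied = bulkCopy(primitive);
        System.out.println(copied.remaining());
    }
}
```

A batch setter would only have a chance of beating the loop if the input were
already in the second form; starting from a List<Long>, the unboxing pass is
needed in any case.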

Thanks,
Micah

[1]
https://arrow.apache.org/docs/java/org/apache/arrow/vector/BigIntVector.html#set-int-long-
[2]
https://github.com/apache/arrow/blob/master/java/README.md#performance-tuning
[3]
https://lists.apache.org/thread.html/43cc4fb214a63efa5a80157ba852b00eea94f31057a439e4b7f784a1%40%3Cdev.arrow.apache.org%3E

On Tue, Jun 2, 2020 at 9:39 PM yunfan <yunfanfighting@foxmail.com> wrote:

> In my understanding, I have to use Arrow like this:
>
> BigIntVector bigIntVector;
> List<Long> myData;
> for (int i = 0; i < myData.size(); i++) {
>       bigIntVector.setSafe(i, myData.get(i));
> }
>
> Why not support an interface like this:
>
> bigIntVector.setBatchSafe(myData);
>
> Then Arrow could use SIMD to set the data faster than it does now.
>
>
