arrow-user mailing list archives

From Fan Liya <liya.fa...@gmail.com>
Subject Re: [Java JDBC adapter] non-nullable fields?
Date Fri, 07 May 2021 02:21:51 GMT
Hi Joris,

I think you are right.

We only use the nullability information in the consumers, because it makes
a difference in performance.
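
To illustrate the performance point, here is a minimal sketch of the
pattern, not the adapter's actual code. `DoubleSource` and `DoubleColumn`
are hypothetical stand-ins for `java.sql.ResultSet` and an Arrow vector, so
that the example runs without a database or the Arrow libraries. The
nullability flag is consulted once, when the consumer is created, and the
non-nullable variant then skips the per-row null check.

```java
import java.util.BitSet;

public class ConsumerSketch {
    /** Hypothetical stand-in for the two ResultSet calls a consumer needs. */
    interface DoubleSource {
        double getDouble(); // like ResultSet.getDouble: 0.0 when the SQL value is NULL
        boolean wasNull();  // like ResultSet.wasNull: true if the last read was NULL
    }

    /** Hypothetical stand-in for an Arrow vector: values plus a validity bitmap. */
    static final class DoubleColumn {
        final double[] values;
        final BitSet validity = new BitSet();

        DoubleColumn(int capacity) {
            values = new double[capacity];
        }

        void set(int index, double value) {
            values[index] = value;
            validity.set(index);
        }
    }

    interface Consumer {
        void consume(DoubleSource row);
    }

    /** Nullable column: check wasNull() per row and leave the slot unset for NULL. */
    static final class NullableConsumer implements Consumer {
        private final DoubleColumn column;
        private int index;

        NullableConsumer(DoubleColumn column) {
            this.column = column;
        }

        @Override
        public void consume(DoubleSource row) {
            double value = row.getDouble();
            if (!row.wasNull()) {
                column.set(index, value);
            }
            index++;
        }
    }

    /** Non-nullable column: no per-row branch, every value is written. */
    static final class NonNullableConsumer implements Consumer {
        private final DoubleColumn column;
        private int index;

        NonNullableConsumer(DoubleColumn column) {
            this.column = column;
        }

        @Override
        public void consume(DoubleSource row) {
            column.set(index, row.getDouble());
            index++;
        }
    }

    /** The nullability flag is consulted once, when the consumer is chosen. */
    static Consumer create(DoubleColumn column, boolean nullable) {
        return nullable ? new NullableConsumer(column) : new NonNullableConsumer(column);
    }
}
```

Calling `ConsumerSketch.create(column, false)` for a NOT NULL column gives
the branch-free variant. Note that nothing here touches the schema, which
is why the schema's flag can drift from what the consumers assume.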

The nullability information in the schema is not accurate, as you have
observed.
That said, the Java implementation does not make much use of this
information anyway (IMHO). For example, the validity buffer is allocated
even if the vector is non-nullable.

That said, I think it would be better to keep the nullability information
in sync.
So maybe we can open a JIRA to track it?
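
For the JIRA, a minimal sketch of the check Joris suggests could look like
the following. It uses only `java.sql.ResultSetMetaData` from the JDK. The
class and method names are hypothetical, and treating
`columnNullableUnknown` as nullable is an assumption (the conservative
choice when the driver cannot tell), not something the adapter is known to
do.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class NullabilitySketch {
    /**
     * Maps a JDBC nullability code to the boolean a schema field would need.
     * Only columnNoNulls yields false. columnNullableUnknown is treated as
     * nullable, which is an assumption made for this sketch.
     */
    static boolean isColumnNullable(int jdbcNullability) {
        return jdbcNullability != ResultSetMetaData.columnNoNulls;
    }

    /** Convenience overload that reads the code for a 1-based column index. */
    static boolean isColumnNullable(ResultSetMetaData metaData, int column)
            throws SQLException {
        return isColumnNullable(metaData.isNullable(column));
    }
}
```

The result would then replace the literal `true` passed as the first
argument of `new FieldType(true, arrowType, null, metadata)`.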

Best,
Liya Fan


On Thu, May 6, 2021 at 3:09 PM Joris Peeters <joris.mg.peeters@gmail.com>
wrote:

> Hello Fan,
>
> Yes, but it seems that code path only affects the consumers, and whether
> they set a value in the vector or not, see e.g.
> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
> However, the VectorSchemaRoot's schema, defined I believe at
> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
> does not appear to use this info, and just sets every column's nullability
> to true (as per the link in my original email).
>
> Note that we are indeed using the ArrowVectorIterator, and it's when
> iterating over the iterator and inspecting the schema of the elements
> (VectorSchemaRoot) that I notice this.
> Maybe all this needs is a `isColumnNullable(i, ..)` instead of `true` in
> `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary
> encoding */ null, metadata);`.
>
> Cheers,
> -J
>
> On Thu, May 6, 2021 at 5:53 AM Fan Liya <liya.fan03@gmail.com> wrote:
>
>> Hi Joris,
>>
>> Thanks for reporting the problem.
>>
>> We make use of the nullable information
>> in ArrowVectorIterator#initialize. (Details can be found in
>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>> )
>>
>> Please note that the ArrowVectorIterator is our encouraged way of using
>> the JDBC adapter.
>>
>> Best,
>> Liya Fan
>>
>>
>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <emkornfield@gmail.com>
>> wrote:
>>
>>> I would need to look further, but I thought we handled null vs not
>>> null.  At least I thought we had specialized conversion code to avoid
>>> branches.  If this isn't the case it seems reasonable to contribute a patch.
>>>
>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <joris.mg.peeters@gmail.com>
>>> wrote:
>>>
>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>> Server into Arrow record batches.
>>>>
>>>> At first glance the Arrow JDBC adapter seems to work well but, unless
>>>> I'm mistaken, it simply makes every vector nullable, irrespective of
>>>> whether the corresponding SQL column is nullable or not.
>>>>
>>>> I think the line
>>>>
>>>> final FieldType fieldType = new FieldType(true, arrowType, /*
>>>> dictionary encoding */ null, metadata);
>>>>
>>>> in
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>> might be the cause here.
>>>>
>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>> transferred, or is this something I could contribute in a PR? (which
>>>> I'd be happy to) I guess it's just a matter of inspecting the result
>>>> metadata.
>>>>
>>>> Cheers,
>>>> -J
>>>>
>>>
