arrow-user mailing list archives

From Joris Peeters <joris.mg.peet...@gmail.com>
Subject Re: [Java JDBC adapter] non-nullable fields?
Date Fri, 07 May 2021 09:18:51 GMT
https://issues.apache.org/jira/browse/ARROW-12679

On Fri, May 7, 2021 at 8:54 AM Joris Peeters <joris.mg.peeters@gmail.com>
wrote:

> Fair enough.
> I have this data moving through a few different servers and clients, in
> IPC streaming format, consumed on various platforms/languages. The
> nullability in the schema is often used in "language-friendly" clients,
> e.g. to build a `std::vector<bool>` or `std::vector<std::optional<bool>>`
> depending on whether the bit column is nullable, so preserving this
> information is quite important, even if locally in Java it makes little
> difference.
>
> I've worked around it for now by fudging the VectorSchemaRoot's schema
> myself, but I'll open a JIRA to track, and I'll assign it to myself and
> provide a fix.
>
> Cheers!
> -Joris.
>
>
> On Fri, May 7, 2021 at 3:22 AM Fan Liya <liya.fan03@gmail.com> wrote:
>
>> Hi Joris,
>>
>> I think you are right.
>>
>> We only use the nullability information in the consumers, because it
>> makes a difference in performance.
>>
>> The nullability information in the schema is not accurate, as you have
>> observed.
>> However, such information is not well-used in the Java implementation
>> (IMHO). For example, the validity buffer is allocated even if the vector is
>> non-nullable.
>>
>> That said, I think it would be better to keep the nullability information
>> in sync.
>> So maybe we can open a JIRA to track it?
>>
>> Best,
>> Liya Fan
>>
>>
>> On Thu, May 6, 2021 at 3:09 PM Joris Peeters <joris.mg.peeters@gmail.com>
>> wrote:
>>
>>> Hello Fan,
>>>
>>> Yes, but it seems that code path only affects the consumers, and whether
>>> they set a value in the vector or not, see e.g.
>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
>>> However, the VectorSchemaRoot's schema, defined I believe at
>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
>>> does not appear to use this info, and just sets every column's nullability
>>> to true (as per the link in my original email).
>>>
>>> Note that we are indeed using the ArrowVectorIterator, and it's when
>>> iterating over the iterator and inspecting the schema of the elements
>>> (VectorSchemaRoot) that I notice this.
>>> Maybe all this needs is a `isColumnNullable(i, ..)` instead of `true` in
>>> `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary
>>> encoding */ null, metadata);`.
>>>
>>> Cheers,
>>> -J
>>>
>>> On Thu, May 6, 2021 at 5:53 AM Fan Liya <liya.fan03@gmail.com> wrote:
>>>
>>>> Hi Joris,
>>>>
>>>> Thanks for reporting the problem.
>>>>
>>>> We make use of the nullable information
>>>> in ArrowVectorIterator#initialize. (Details can be found in
>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>>>> )
>>>>
>>>> Please note that the ArrowVectorIterator is our encouraged way of
>>>> using the JDBC adapter.
>>>>
>>>> Best,
>>>> Liya Fan
>>>>
>>>>
>>>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <emkornfield@gmail.com>
>>>> wrote:
>>>>
>>>>> I would need to look further, but I thought we handled null vs not
>>>>> null.  At least I thought we had specialized conversion code to avoid
>>>>> branches.  If this isn't the case it seems reasonable to contribute a
>>>>> path.
>>>>>
>>>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <
>>>>> joris.mg.peeters@gmail.com> wrote:
>>>>>
>>>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>>>> Server into Arrow record batches.
>>>>>>
>>>>>> At first glance the Arrow JDBC adapter seems to work well but, unless
>>>>>> I'm mistaken, it simply makes every vector nullable, irrespective of
>>>>>> whether the corresponding SQL column is nullable or not.
>>>>>>
>>>>>> I think the line
>>>>>>
>>>>>> final FieldType fieldType = new FieldType(true, arrowType, /*
>>>>>> dictionary encoding */ null, metadata);
>>>>>>
>>>>>> in
>>>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>>>> might be the cause here.
>>>>>>
>>>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>>>> transferred, or is this something I could contribute in a PR? (which I'd be
>>>>>> happy to) I guess it's just a matter of inspecting the result metadata.
>>>>>>
>>>>>> Cheers,
>>>>>> -J
>>>>>>
>>>>>
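
Editor's sketch (not part of the original thread): the suggestion above, replacing the hard-coded `true` in `new FieldType(true, arrowType, null, metadata)` with a value derived from the JDBC result metadata, might look roughly like the following. The class name `NullabilitySketch` and this helper's signature are hypothetical, and treating `columnNullableUnknown` as nullable is an assumption made here, not something the thread settles. The sketch uses only `java.sql` so it runs without Arrow on the classpath.

```java
import java.sql.ResultSetMetaData;

/** Hypothetical helper for illustration; not the Arrow JDBC adapter's own code. */
final class NullabilitySketch {

    private NullabilitySketch() {}

    /**
     * Maps a code returned by ResultSetMetaData#isNullable(int) to the boolean
     * that would replace the literal `true` in the FieldType constructor call
     * quoted in the thread.
     */
    static boolean isColumnNullable(int jdbcNullability) {
        // Assumption: only an explicit columnNoNulls yields a non-nullable field.
        // columnNullableUnknown is treated as nullable, the conservative choice.
        return jdbcNullability != ResultSetMetaData.columnNoNulls;
    }

    public static void main(String[] args) {
        System.out.println(isColumnNullable(ResultSetMetaData.columnNoNulls));         // false
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullable));        // true
        System.out.println(isColumnNullable(ResultSetMetaData.columnNullableUnknown)); // true
    }
}
```

In the adapter, the result would presumably be fed to the constructor as `new FieldType(isColumnNullable(rsmd.isNullable(i)), arrowType, null, metadata)`, where `rsmd` and `i` stand for the result set metadata and the 1-based column index.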
