arrow-user mailing list archives

From Joris Peeters <joris.mg.peet...@gmail.com>
Subject Re: [Java JDBC adapter] non-nullable fields?
Date Fri, 07 May 2021 07:54:53 GMT
Fair enough.
I have this data moving through a few different servers and clients, in IPC
streaming format, consumed on various platforms/languages. The nullability
in the schema is often used in "language-friendly" clients, e.g. to build a
`std::vector<bool>` or `std::vector<std::optional<bool>>` depending on
whether the bit column is nullable, so preserving this information is quite
important, even if locally in Java it makes little difference.

I've worked around it for now by fudging the VectorSchemaRoot's schema
myself, but I'll open a JIRA to track, and I'll assign it to myself and
provide a fix.

Cheers!
-Joris.


On Fri, May 7, 2021 at 3:22 AM Fan Liya <liya.fan03@gmail.com> wrote:

> Hi Joris,
>
> I think you are right.
>
> We only use the nullability information in the consumers, because it makes
> a difference in performance.
>
> The nullability information in the schema is not accurate, as you have
> observed.
> However, such information is not well-used in the Java implementation
> (IMHO). For example, the validity buffer is allocated even if the vector is
> non-nullable.
>
> That said, I think it would be better to keep the nullability information
> in sync.
> So maybe we can open a JIRA to track it?
>
> Best,
> Liya Fan
>
>
> On Thu, May 6, 2021 at 3:09 PM Joris Peeters <joris.mg.peeters@gmail.com>
> wrote:
>
>> Hello Fan,
>>
>> Yes, but it seems that code path only affects the consumers, and whether
>> they set a value in the vector or not, see e.g.
>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57
>> However, the VectorSchemaRoot's schema, defined I believe at
>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59,
>> does not appear to use this info, and just sets every column's nullability
>> to true (as per the link in my original email).
>>
>> Note that we are indeed using the ArrowVectorIterator, and it's when
>> iterating over the iterator and inspecting the schema of the elements
>> (VectorSchemaRoot) that I notice this.
>> Maybe all this needs is an `isColumnNullable(i, ..)` instead of `true` in
>> `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary
>> encoding */ null, metadata);`.
>>
>> Cheers,
>> -J
>>
>> On Thu, May 6, 2021 at 5:53 AM Fan Liya <liya.fan03@gmail.com> wrote:
>>
>>> Hi Joris,
>>>
>>> Thanks for reporting the problem.
>>>
>>> We make use of the nullable information
>>> in ArrowVectorIterator#initialize. (Details can be found in
>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73
>>> )
>>>
>>> Please note that the ArrowVectorIterator is our encouraged way of using
>>> the JDBC adapter.
>>>
>>> Best,
>>> Liya Fan
>>>
>>>
>>> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield <emkornfield@gmail.com>
>>> wrote:
>>>
>>>> I would need to look further, but I thought we handled null vs not
>>>> null.  At least I thought we had specialized conversion code to avoid
>>>> branches.  If this isn't the case it seems reasonable to contribute a patch.
>>>>
>>>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters <
>>>> joris.mg.peeters@gmail.com> wrote:
>>>>
>>>>> I'm looking to use the Java JDBC adapter for loading tables from SQL
>>>>> Server into Arrow record batches.
>>>>>
>>>>> At first glance the Arrow JDBC adapter seems to work well but, unless
>>>>> I'm mistaken, it simply makes every vector nullable, irrespective of
>>>>> whether the corresponding SQL column is nullable or not.
>>>>>
>>>>> I think the line
>>>>>
>>>>> final FieldType fieldType = new FieldType(true, arrowType, /*
>>>>> dictionary encoding */ null, metadata);
>>>>>
>>>>> in
>>>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158
>>>>> might be the cause here.
>>>>>
>>>>> Is my interpretation correct, or am I missing a setting of sorts? If
>>>>> indeed correct, is there a fundamental reason the NULL-ness is not
>>>>> transferred, or is this something I could contribute in a PR? (which I'd be
>>>>> happy to) I guess it's just a matter of inspecting the result metadata.
>>>>>
>>>>> Cheers,
>>>>> -J
>>>>>
>>>>
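
The thread suggests that the fix is "just a matter of inspecting the result metadata". As a rough illustration only, the sketch below shows how the standard JDBC `ResultSetMetaData.isNullable` code could be turned into the boolean that the `FieldType` constructor quoted above takes as its first argument. The class `NullabilitySketch` and both helper methods are hypothetical names invented for this example; they are not part of the Arrow JDBC adapter, and treating `columnNullableUnknown` as nullable is an assumption, not something the thread settles.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper class, not part of the Arrow JDBC adapter.
class NullabilitySketch {

    /** Maps a JDBC nullability code to a nullable/non-nullable boolean. */
    static boolean isNullable(int jdbcNullability) {
        // Only columnNoNulls is a definite NOT NULL. Assumption: both
        // columnNullable and columnNullableUnknown are treated as nullable,
        // since claiming non-nullability without proof would be unsafe.
        return jdbcNullability != ResultSetMetaData.columnNoNulls;
    }

    /** Looks up the nullability of a 1-based column index in the metadata. */
    static boolean columnIsNullable(ResultSetMetaData rsmd, int column)
            throws SQLException {
        return isNullable(rsmd.isNullable(column));
    }

    public static void main(String[] args) {
        System.out.println(isNullable(ResultSetMetaData.columnNoNulls));         // false
        System.out.println(isNullable(ResultSetMetaData.columnNullable));        // true
        System.out.println(isNullable(ResultSetMetaData.columnNullableUnknown)); // true
    }
}
```

Under these assumptions, the result of `columnIsNullable(rsmd, i)` would stand in for the hard-coded `true` when the field type for column `i` is built, so a NOT NULL SQL column would yield a non-nullable Arrow field.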
