From user-return-1225-archive-asf-public=cust-asf.ponee.io@arrow.apache.org Fri May 7 02:22:56 2021 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mxout1-he-de.apache.org (mxout1-he-de.apache.org [95.216.194.37]) by mx-eu-01.ponee.io (Postfix) with ESMTPS id EE7BC18060E for ; Fri, 7 May 2021 04:22:55 +0200 (CEST) Received: from mail.apache.org (mailroute1-lw-us.apache.org [207.244.88.153]) by mxout1-he-de.apache.org (ASF Mail Server at mxout1-he-de.apache.org) with SMTP id 37CED617AD for ; Fri, 7 May 2021 02:22:54 +0000 (UTC) Received: (qmail 15426 invoked by uid 500); 7 May 2021 02:22:53 -0000 Mailing-List: contact user-help@arrow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@arrow.apache.org Delivered-To: mailing list user@arrow.apache.org Received: (qmail 15411 invoked by uid 99); 7 May 2021 02:22:52 -0000 Received: from spamproc1-he-de.apache.org (HELO spamproc1-he-de.apache.org) (116.203.196.100) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 07 May 2021 02:22:52 +0000 Received: from localhost (localhost [127.0.0.1]) by spamproc1-he-de.apache.org (ASF Mail Server at spamproc1-he-de.apache.org) with ESMTP id 455281FF4AD for ; Fri, 7 May 2021 02:22:52 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamproc1-he-de.apache.org X-Spam-Flag: NO X-Spam-Score: 0.249 X-Spam-Level: X-Spam-Status: No, score=0.249 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=0.2, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamproc1-he-de.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-he-de.apache.org ([116.203.227.195]) by localhost (spamproc1-he-de.apache.org [116.203.196.100]) (amavisd-new, port 10024) with ESMTP id aB1SOvUjb_3u for ; Fri, 7 May 2021 02:22:51 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2607:f8b0:4864:20::b36; helo=mail-yb1-xb36.google.com; envelope-from=liya.fan03@gmail.com; receiver= Received: from mail-yb1-xb36.google.com (mail-yb1-xb36.google.com [IPv6:2607:f8b0:4864:20::b36]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id 8FA6D7FDC1 for ; Fri, 7 May 2021 02:22:51 +0000 (UTC) Received: by mail-yb1-xb36.google.com with SMTP id m9so10068725ybm.3 for ; Thu, 06 May 2021 19:22:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=9fv5aeootkwztaz3HIHoT1PWbgVCY7RVJUSBiJIrWEw=; b=P28zSPzUOHVxKvxZllnhkv9/7cs8pkL4KAkw5j97z3nAelQS7S4s2lW9DbWwKLb7ah DFtRKyI0+lhPp4EIJov+2GNVRJzWsGAQ5AwrGX3uJMzzWXttl29WRfCKfoFQYzdEfIIE bPG89E3sHU7dk+OWJdqSAr6UUUMabjwXaCQjm6kVoxkoFbnKTD0K+SZyAflJCCcKeMuK K8P+JPJaRsJmA/L4Vy5uP5IN98qlVdclCJ7rrw9jgPg52V4EVpbv3mzrOY0CozUVlytP SzLNWSXfz9TN/EgEqC4MNt5yZ17WblDbfeMgQCzBlbiib3/E7e/Txa+IZPzgqfFJ/84H hQOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=9fv5aeootkwztaz3HIHoT1PWbgVCY7RVJUSBiJIrWEw=; b=VNf6jvpsMfWX8J+pwu8SJWFG8QAkItRRJMqPEsuvvtDQBkt/NyVOV28Eix182T2GRq fsKaXb80hpTn+wCWUYzo9XLDpxwvkEURNJmlm1nVqNiYpRhoI5eimleKlGAptrFmG3UT Ya0/VMYcvmKTycj5hLTvd3/OQzb0aqyYkwTEAibEjs57kODbL0ARfnEn6VIuPUDjFVt2 6IL9ZEznnMZWo5wl2u3E9qCvfQGUM5WTwWRILr74ERbraX0VEWH/jKy++HkPOx/XbtES EnmJuW2u3awRwfBIBUBIgum53JONBNu0kbH3WCc5F0yhGeCkIh+qGObY1wElf20FEqEI mujQ== X-Gm-Message-State: AOAM5300qT9V23+/DqWQS8NrTiSGUtR0nObqUL7XdTuQlwD766ykusJe sLbjEZGb7wW7mVdY7LFqvsVhk26WiDfRP6tooNBxoT+V X-Google-Smtp-Source: ABdhPJyV5gakOqaFQLMPoLfydhN/b07U3JBZBkE6fSzie8neOgkbOHj0rjLaIvSHqvznIkxdqJ21DLWjQEfCBpTA31g= X-Received: by 2002:a25:3f07:: with SMTP id m7mr10136276yba.241.1620354170239; Thu, 06 May 2021 19:22:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Fan Liya Date: Fri, 7 May 2021 10:21:51 +0800 Message-ID: Subject: Re: [Java JDBC adapter] non-nullable fields? To: user@arrow.apache.org Cc: Micah Kornfield Content-Type: multipart/alternative; boundary="000000000000ba3f6605c1b41a56" --000000000000ba3f6605c1b41a56 Content-Type: text/plain; charset="UTF-8" Hi Joris, I think you are right. We only use the nullability information in the consumers, because it makes a difference in performance. The nullability information in the schema is not accurate, as you have observed. However, such information is not well-used in the Java implementation (IMHO). For example, the validity buffer is allocated even if the vector is non-nullable. That said, I think it would be better to keep the nullability information in sync. So maybe we can open a JIRA to track it? Best, Liya Fan On Thu, May 6, 2021 at 3:09 PM Joris Peeters wrote: > Hello Fan, > > Yes, but it seems that code path only affects the consumers, and whether > they set a value in the vector or not, see e.g. > https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/DoubleConsumer.java#L57 > However, the VectorSchemaRoot's schema, defined I believe at > https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L59, > does not appear to use this info, and just sets every column's nullability > to true (as per the link in my original email). > > Note that we are indeed using the ArrowVectorIterator, and it's when > iterating over the iterator and inspecting the schema of the elements > (VectorSchemaRoot) that I notice this. > Maybe all this needs is a `isColumnNullable(i, ..)` instead of `true` in > `final FieldType fieldType = new FieldType(true, arrowType, /* dictionary > encoding */ null, metadata);`. > > Cheers, > -J > > On Thu, May 6, 2021 at 5:53 AM Fan Liya wrote: > >> Hi Joris, >> >> Thanks for reporting the problem. >> >> We make use of the nullable information >> in ArrowVectorIterator#initialize. (Details can be found in >> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73 >> ) >> >> Please note that the ArrowVectorIterator is our encouraged way of using >> the JDBC adapter. >> >> Best, >> Liya Fan >> >> >> On Wed, May 5, 2021 at 1:42 PM Micah Kornfield >> wrote: >> >>> I would need to look further, but I thought we handled null vs not >>> null. At least I thought we had specialized conversion code to avoid >>> branches. If this isn't the case it seems reasonable to contribute a path. >>> >>> On Tue, May 4, 2021 at 3:39 AM Joris Peeters >>> wrote: >>> >>>> I'm looking to use the Java JDBC adapter for loading tables from SQL >>>> Server into Arrow record batches. >>>> >>>> At first glance the Arrow JDBC adapter seems to work well but, unless >>>> I'm mistaken, it simply makes every vector nullable, irrespective of >>>> whether the corresponding SQL column is nullable or not. >>>> >>>> I think the line >>>> >>>> final FieldType fieldType = new FieldType(true, arrowType, /* >>>> dictionary encoding */ null, metadata); >>>> >>>> in >>>> https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java#L158 >>>> might be the cause here. >>>> >>>> Is my interpretation correct, or am I missing a setting of sorts? If >>>> indeed correct, is there a fundamental reason the NULL-ness is not >>>> transferred, or is this something I could contribute in a PR? (which I'd be >>>> happy to) I guess it's just a matter of inspecting the result metadata. >>>> >>>> Cheers, >>>> -J >>>> >>> --000000000000ba3f6605c1b41a56 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Joris,

I think you are right.
<= div>
We only use the nullability information in the consumers= , because it makes a difference in performance.

Th= e nullability information in the schema is not accurate, as you have observ= ed.=C2=A0
However, such information is not well-used in the Java = implementation (IMHO). For example, the validity buffer is allocated even i= f the vector is non-nullable.

That said, I think i= t would be better to keep the nullability information in sync.
So= maybe we can open a JIRA to track it?

Best,
=
Liya Fan
=C2=A0

On Thu, May 6, 2021 at 3:09 PM Joris Pe= eters <joris.mg.peeters@gm= ail.com> wrote:
Hello Fan,

Yes, but it seems tha= t code path only affects the consumers, and whether they set a value in the= vector or not, see e.g.=C2=A0https://github.com/apache/= arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/= jdbc/consumer/DoubleConsumer.java#L57
However, the VectorSche= maRoot's schema, defined I believe at=C2=A0https://githu= b.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/a= rrow/adapter/jdbc/ArrowVectorIterator.java#L59, does not appear to use = this info, and just sets every column's nullability to true (as per the= link in my original email).

Note that we are inde= ed using the ArrowVectorIterator, and it's when iterating over the iter= ator and inspecting the schema of the elements (VectorSchemaRoot) that I no= tice this.
Maybe all this needs is a `isColumnNullable(i, ..)` in= stead of `true` in `final FieldType fieldType =3D new FieldType(true, arrow= Type, /* dictionary encoding */ null, metadata);`.

=
Cheers,
-J

On Thu, May 6, 2021 at 5:53 AM Fan Liya <= liya.fan03@gmail.= com> wrote:
Hi Joris,

Thanks for reporting the p= roblem.

We make use of the nullable information in= =C2=A0ArrowVectorIterator#initialize. (Details can be found in https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/j= ava/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L73)

Please note that the=C2=A0 ArrowVectorIterator is our encouraged way of using the JDBC adapter.
<= div>
Best,
Liya Fan


<= div class=3D"gmail_quote">
On Wed, May= 5, 2021 at 1:42 PM Micah Kornfield <emkornfield@gmail.com> wrote:
I would need = to look further, but I thought we handled null vs not null.=C2=A0 At least = I thought we had specialized conversion code to avoid branches.=C2=A0 If th= is isn't the case it seems reasonable to contribute a path.

On Tue, May = 4, 2021 at 3:39 AM Joris Peeters <joris.mg.peeters@gmail.com> wrote:
I'= m looking to use the Java JDBC adapter for loading tables from SQL Server i= nto Arrow record batches.=C2=A0

At first glance the Arro= w JDBC adapter seems to work well but, unless I'm mistaken, it simply m= akes every vector nullable, irrespective of whether the corresponding SQL c= olumn is nullable or not.=C2=A0

I think the line= =C2=A0

final FieldType fieldType =3D new FieldType= (true, arrowType, /* dictionary encoding */ null, metadata);

=

Is my i= nterpretation correct, or am I missing a setting of sorts? If indeed correc= t, is there a fundamental reason the NULL-ness is not transferred, or is th= is something I could contribute in a PR? (which I'd be happy to) I gues= s it's just a matter of inspecting the result metadata.

<= /div>
Cheers,
-J
--000000000000ba3f6605c1b41a56--