arrow-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Validity masks and nullable
Date Sun, 26 Jul 2020 16:38:02 GMT
I think your first question is: can I skip the validity buffer if I know
all values are defined?

In the Java library, you cannot. This was a design choice to simplify
implementations. The memory consumption difference is relatively small and
collapsing the concepts was done to clean up code.

Fun fact: This was done in the second design iteration of the Java library
(the first one included support for this). We identified that many sources
of data are annotated as nullable but are mostly or entirely non-null. Part
of this is user laziness and part is due to tools, since they frequently
don't support generating both types of data (writers of Parquet frequently
do this, for example). As such, we found that wordwise operations against
validity vectors, which adapt the processing code to contiguous runs of
null and non-null values, were substantially more beneficial to
generalized real-world workloads (while also simplifying the codebase).
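
To make "wordwise" concrete, here is a minimal sketch of the idea, not the
actual Arrow Java code. It assumes a plain long[] standing in for the
validity buffer, with bit (i % 64) of word (i / 64) set when element i is
non-null. The class and method names are hypothetical.

```java
// Hypothetical illustration only; not part of the Arrow Java library.
final class WordwiseValiditySum {

    /**
     * Sums the non-null entries of values[0..length).
     * Assumption: bit (i % 64) of validity[i / 64] is 1 when values[i] is non-null.
     */
    static long sum(int[] values, long[] validity, int length) {
        long total = 0;
        for (int base = 0, w = 0; base < length; base += 64, w++) {
            // Number of elements covered by this validity word (1..64).
            int count = Math.min(64, length - base);
            // Mask with one bit set per element actually present in this word.
            long full = (count == 64) ? -1L : (1L << count) - 1;
            long word = validity[w] & full;

            if (word == full) {
                // Every element in this word is valid: tight loop, no per-value checks.
                for (int i = 0; i < count; i++) {
                    total += values[base + i];
                }
            } else if (word != 0) {
                // Mixed word: visit only the set bits.
                while (word != 0) {
                    int bit = Long.numberOfTrailingZeros(word);
                    total += values[base + bit];
                    word &= word - 1; // clear the lowest set bit
                }
            }
            // word == 0: every element in this word is null, so skip it entirely.
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = new int[130];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        // First 64 values valid, next 64 null, then one valid and one null.
        long[] validity = { -1L, 0L, 0b01L };
        System.out.println(sum(values, validity, values.length));
    }
}
```

With data that is declared nullable but is in practice all non-null, every
word takes the first branch, so the cost of carrying the validity buffer is
one comparison per 64 values.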

On Sun, Jul 26, 2020 at 7:00 AM Chris Nuernberger <chris@techascent.com>
wrote:

> Hi, I have a question about the actual file format and how it is reflected
> in the Java api.
>
> 1.  Are validity masks necessary if nullable is false?
> 2.  Does the java system reflect the implications of #1?  Can I create a
> vector with a null validity mask?
>
> Thanks again (and again and again) for your help :-).
>
> Chris
>
