arrow-user mailing list archives

From Chris Nuernberger <ch...@techascent.com>
Subject Re: Validity masks and nullable
Date Sun, 26 Jul 2020 16:43:03 GMT
Makes sense, I buy that :-).  Thanks.

On Sun, Jul 26, 2020 at 10:38 AM Jacques Nadeau <jacques@apache.org> wrote:

> I think your first question is: can I skip the validity buffer if I know
> all values are defined.
>
> In the Java library, you cannot. This was a design choice to simplify
> implementations. The memory consumption difference is relatively small and
> collapsing the concepts was done to clean up code.
>
> Fun fact: This was done in the second design iteration of the Java library
> (the first one included support for this). We identified that many sources
> of data are annotated as nullable but are mostly or entirely non-null.
> Part of this is user laziness, and part is due to tools, since they
> frequently don't support generating both types of data (writers of Parquet
> frequently do this, for example). As such, we found that using wordwise
> operations against validity vectors, with processing code that adapts to
> continuous sequences of nullable and non-nullable values, was actually
> substantially more beneficial to generalized real-world workloads (while
> also simplifying the codebase).
>
> On Sun, Jul 26, 2020 at 7:00 AM Chris Nuernberger <chris@techascent.com>
> wrote:
>
>> Hi, I have a question about the actual file format and how it is
>> reflected in the Java API.
>>
>> 1.  Are validity masks necessary if nullable is false?
>> 2.  Does the Java system reflect the implications of #1?  Can I create a
>> vector with a null validity mask?
>>
>> Thanks again (and again and again) for your help :-).
>>
>> Chris
>>
>
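
Editor's note: the wordwise approach Jacques describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the Arrow Java implementation: it uses no Arrow classes, the names WordwiseValidity and sumValid are hypothetical, and a plain long[] stands in for the validity buffer, with bit i (least-significant bit first, 64 bits per word) set when values[i] is defined. The point is that one comparison per 64 values picks a null-check-free loop for fully defined runs and skips fully null runs.

```java
final class WordwiseValidity {

    // Sums the defined entries of `values`.
    // `validity` is a hypothetical stand-in for a validity buffer:
    // bit (i % 64) of validity[i / 64] is 1 when values[i] is non-null.
    static long sumValid(long[] values, long[] validity) {
        long sum = 0;
        int n = values.length;
        for (int w = 0; w * 64 < n; w++) {
            int start = w * 64;
            int end = Math.min(start + 64, n);
            long word = validity[w];
            if (end - start == 64 && word == -1L) {
                // All 64 values defined: tight loop with no per-value null check.
                for (int i = start; i < end; i++) {
                    sum += values[i];
                }
            } else if (word != 0L) {
                // Mixed word: visit only the set bits.
                while (word != 0L) {
                    int i = start + Long.numberOfTrailingZeros(word);
                    if (i < end) {
                        sum += values[i];
                    }
                    word &= word - 1; // clear the lowest set bit
                }
            }
            // word == 0: all 64 values are null, nothing to add.
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = new long[130];
        java.util.Arrays.fill(values, 1L);
        // Word 0: all defined, word 1: all null, word 2: only index 128 defined.
        long[] validity = { -1L, 0L, 0b01L };
        System.out.println(sumValid(values, validity)); // prints 65
    }
}
```

With data that is declared nullable but is in practice all non-null, every word takes the first branch, so the cost of always carrying a validity buffer is one comparison per 64 values.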
