drill-issues mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3831) Allow null values in lists
Date Fri, 25 Sep 2015 19:03:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908533#comment-14908533 ]

Jason Altekruse commented on DRILL-3831:

I think this might not be the correct issue for this problem. The issue discussed in
DRILL-2796 is actually related to untyped nulls. While we will want to support untyped
nulls in a list (such as the JSON below), this issue is primarily concerned with allowing
nulls in lists where the type is known. Another issue, DRILL-3806, was opened recently to
allow for general untyped nulls. These two units of work will have to be combined to allow
for untyped nulls in lists. However, I don't know whether we are actually using the Drill
repeated type to represent the members of the IN list. If that is the case, then the
combination of these two JIRAs would be needed to solve the problem.
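
To make the distinction between the two cases concrete, here is a minimal sketch in Python.
It is not Drill code: member_type is a hypothetical helper, and it only shows that a list
with at least one non-null member has a type that can be determined, while a list of only
nulls gives no type information at all.

```python
import json


def member_type(json_list):
    """Return the type name shared by the non-null members of a JSON list.

    Returns None when every member is null (the "untyped null" case).
    Hypothetical helper for illustration only, not part of Drill.
    """
    members = json.loads(json_list)
    kinds = {type(m).__name__ for m in members if m is not None}
    if not kinds:
        # Nothing but nulls: there is no value to infer a type from.
        return None
    if len(kinds) > 1:
        raise ValueError("mixed member types: %s" % sorted(kinds))
    return kinds.pop()


typed = member_type('[1, null, 3]')     # null sits next to typed members
untyped = member_type('[null, null]')   # no member reveals a type
```

In the first call the null appears in a list whose type is known, which is the case this
issue covers. The second call is the untyped case that DRILL-3806 is about.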

> Allow null values in lists
> --------------------------
>                 Key: DRILL-3831
>                 URL: https://issues.apache.org/jira/browse/DRILL-3831
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Data Types
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
>             Fix For: 1.3.0
> Drill currently fails to read a JSON file where a list has a value of null in it. We
> have a workaround with all_text_mode for this case, but we need to enhance Drill to support
> this concept in the core ValueVector data structure used to represent records.
> As part of this change, I am considering removing the concept of a list that requires
> all of its members to be non-null, effectively the only type of list we have today. The data
> that can be read today would simply be read into a list where the members could be nullable,
> but they all happen to be non-null. This would simplify the code by removing the need to cover
> the null and non-null cases explicitly.
> Initially this could pose a risk of a minor performance hit, but overall our approach
> to complex data has not been heavily performance tested. Keeping the code simple for now
> will at least allow for more thorough testing of the smaller number of cases, and hopefully
> make it easier to reason about and improve as we evaluate the performance of Drill with complex
> data more thoroughly.
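
The idea in the description, a single list representation whose members may be null, can be
sketched roughly as follows. This is an illustration in Python under stated assumptions, not
Drill's ValueVector API: the three buffers (offsets, values, validity) and both function
names are hypothetical, and the members are assumed to be integers so the type is known.

```python
import json


def to_nullable_list_vector(rows):
    """Flatten lists of ints (possibly containing None) into parallel buffers.

    Hypothetical layout for illustration only, not Drill's ValueVector.
    """
    offsets = [0]   # offsets[i]..offsets[i + 1] delimit the members of row i
    values = []     # member values; null slots hold a placeholder 0
    validity = []   # 1 = member is set, 0 = member is null
    for row in rows:
        for member in row:
            validity.append(0 if member is None else 1)
            values.append(0 if member is None else member)
        offsets.append(len(values))
    return offsets, values, validity


def read_list(offsets, values, validity, i):
    """Rebuild row i, turning unset members back into None."""
    start, end = offsets[i], offsets[i + 1]
    return [values[j] if validity[j] else None for j in range(start, end)]


# Two records: one list with a null member, one with none.
lines = ['{"a": [1, null, 3]}', '{"a": [4, 5]}']
rows = [json.loads(line)["a"] for line in lines]
offsets, values, validity = to_nullable_list_vector(rows)
first = read_list(offsets, values, validity, 0)
second = read_list(offsets, values, validity, 1)
```

A list with no nulls goes through exactly the same code path. Its validity entries simply
all happen to be 1, which is the simplification the description argues for.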

This message was sent by Atlassian JIRA
