drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Query a field with empty list
Date Mon, 02 Nov 2015 21:27:11 GMT
I do agree that we should distinguish between empty lists and nulls. I
consider this related to the outstanding task to allow for untyped nulls,
as empty lists like this in JSON does not give us enough type information
to materialize any known type.

This does have the unfortunate property that we are actually losing
information today during the JSON read. There is very clearly a list there,
we just don't know the inner type. In this case we leave it out of the
schema entirely because we require a Drill type to specify both
repeatability/nullability and a data type (int, varchar, map, etc.). We
should materialize a list type with a untyped null as the inner type, so we
at least know that it is not a scalar. The confusing behavior with flatten
(or other operations that take a list, like repeated_contains) is that it
will fail with a message saying there is no repeated_contains function
implementation that takes a nullable bigint type, because this is what we
fill in when there is nothing at that element in the schema (the way we
left it after encountering an untyped empty list in the JSON reader).

As I understand it, Steven's outstanding work to enable changing types with
the union vector does allow for untyped nulls, I'm not sure if this also
covers untyped empty lists. He does currently has the union type disabled
by default, and it can be turned on with an option. We should discuss if we
should implement some of the concepts he adds with union type in the case
where it is not enabled. I assume that after some more thorough testing we
will want to default turn it on, but in the meantime we might want to ease
some users pain points like this.

On Neeraja's point about whether or not an empty list should produce an
output row when flattened, please see my comment here [1]. If there is a
need for this feature we can consider adding it, perhaps with an alternate
function name, but I think that the current behavior of flatten concerning
empty lists (to ignore them) is useful for the reasons I mention on the
JIRA.

[1] - https://issues.apache.org/jira/browse/DRILL-2153


On Mon, Nov 2, 2015 at 1:09 PM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> Seems empty list is getting the vote for select. For consistency would it
> then make sense to have flatten produce a single row with an empty list in
> the result?
>
>
> —Andries
>
> > On Nov 2, 2015, at 12:55 PM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
> >
> > Empty list sounds like the appropriate thing.
> > What would be the result of we do flatten on this list?
> > I think user should get a row back in the results.
> >
> >
> >
> > On Mon, Nov 2, 2015 at 11:53 AM, Hsuan Yi Chu <hyichu@maprtech.com>
> wrote:
> >
> >> I agree. I think in any context, empty list should appear as [] in the
> >> result.
> >>
> >> Any opposition?
> >>
> >> On Mon, Nov 2, 2015 at 11:50 AM, Zelaine Fong <zfong@maprtech.com>
> wrote:
> >>
> >>> Wouldn't it make more sense to return an empty list in this case?  Null
> >> is
> >>> not quite the same as an empty list.  So, I would think you'd want a
> >>> distinction between the two.
> >>>
> >>> -- Zelaine
> >>>
> >>> On Mon, Nov 2, 2015 at 11:41 AM, Hsuan Yi Chu <hyichu@maprtech.com>
> >> wrote:
> >>>
> >>>> There could be inconsistent results:
> >>>> https://issues.apache.org/jira/browse/DRILL-4007
> >>>>
> >>>> On Mon, Nov 2, 2015 at 8:17 AM, Andries Engelbrecht <
> >>>> aengelbrecht@maprtech.com> wrote:
> >>>>
> >>>>> Currently in 1.2 I observe the following.
> >>>>>
> >>>>> 0: jdbc:drill:> select a from dfs.json.`/test.json`;
> >>>>> +-------+
> >>>>> |   a   |
> >>>>> +-------+
> >>>>> | null  |
> >>>>> +———+
> >>>>>
> >>>>> But then flatten produces an error
> >>>>>
> >>>>> 0: jdbc:drill:> select flatten(a) from dfs.json.`/test.json`;
> >>>>> Error: SYSTEM ERROR: ClassCastException: Cannot cast
> >>>>> org.apache.drill.exec.vector.NullableIntVector to
> >>>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
> >>>>>
> >>>>> Ideally both should return similar responses, in most cases it may
be
> >>>>> easier to work with a null being returned than no record or an empty
> >>>>> string. Returning [] might be interpreted as a string data type
> >> pending
> >>>>> user experience or tools being used??
> >>>>>
> >>>>> —Andries
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Oct 31, 2015, at 4:44 PM, Hsuan Yi Chu <hyichu@maprtech.com>
> >>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>> For example, say, we are querying a field with empty list:
> >>>>>>
> >>>>>> select a from `XXX,json`
> >>>>>>
> >>>>>> {
> >>>>>> a: []
> >>>>>> }
> >>>>>>
> >>>>>> What is the expected result in the sqlline? Which one of the
> >>> following
> >>>>>> should it be?
> >>>>>>
> >>>>>> 1. []
> >>>>>> 2. null
> >>>>>> 3. no record
> >>>>>>
> >>>>>> Thanks.
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message