drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Query a field with empty list
Date Mon, 02 Nov 2015 21:37:51 GMT
Sounds to me that FLATTEN over an empty list NULL should produce 0 row.

That's actually the behavior of UNNEST of empty array, or NULL, in Postgres.

Unless we have a strong reason that FLATTEN should behavior
differently than UNNEST, I think it makes sense for FLATTEN to produce
0 row over empty list.


mydb=# create table temp (id integer, pnumber integer[]);
CREATE TABLE
mydb=# insert into temp values (1, null);
INSERT 0 1
mydb=# insert into temp values (2, ARRAY[100,200]);
INSERT 0 1
mydb=# insert into temp values (3, '{}');
INSERT 0 1
mydb=# select * from temp;
 id |  pnumber
----+-----------
  1 |
  2 | {100,200}
  3 | {}
(3 rows)

mydb=# select id, t.phone_number from temp, unnest(temp.pnumber) as
t(phone_number);
 id | phone_number
----+--------------
  2 |          100
  2 |          200
(2 rows)



On Mon, Nov 2, 2015 at 1:27 PM, Jason Altekruse
<altekrusejason@gmail.com> wrote:
> I do agree that we should distinguish between empty lists and nulls. I
> consider this related to the outstanding task to allow for untyped nulls,
> as empty lists like this in JSON does not give us enough type information
> to materialize any known type.
>
> This does have the unfortunate property that we are actually losing
> information today during the JSON read. There is very clearly a list there,
> we just don't know the inner type. In this case we leave it out of the
> schema entirely because we require a Drill type to specify both
> repeatability/nullability and a data type (int, varchar, map, etc.). We
> should materialize a list type with a untyped null as the inner type, so we
> at least know that it is not a scalar. The confusing behavior with flatten
> (or other operations that take a list, like repeated_contains) is that it
> will fail with a message saying there is no repeated_contains function
> implementation that takes a nullable bigint type, because this is what we
> fill in when there is nothing at that element in the schema (the way we
> left it after encountering an untyped empty list in the JSON reader).
>
> As I understand it, Steven's outstanding work to enable changing types with
> the union vector does allow for untyped nulls, I'm not sure if this also
> covers untyped empty lists. He does currently has the union type disabled
> by default, and it can be turned on with an option. We should discuss if we
> should implement some of the concepts he adds with union type in the case
> where it is not enabled. I assume that after some more thorough testing we
> will want to default turn it on, but in the meantime we might want to ease
> some users pain points like this.
>
> On Neeraja's point about whether or not an empty list should produce an
> output row when flattened, please see my comment here [1]. If there is a
> need for this feature we can consider adding it, perhaps with an alternate
> function name, but I think that the current behavior of flatten concerning
> empty lists (to ignore them) is useful for the reasons I mention on the
> JIRA.
>
> [1] - https://issues.apache.org/jira/browse/DRILL-2153
>
>
> On Mon, Nov 2, 2015 at 1:09 PM, Andries Engelbrecht <
> aengelbrecht@maprtech.com> wrote:
>
>> Seems empty list is getting the vote for select. For consistency would it
>> then make sense to have flatten produce a single row with an empty list in
>> the result?
>>
>>
>> —Andries
>>
>> > On Nov 2, 2015, at 12:55 PM, Neeraja Rentachintala <
>> nrentachintala@maprtech.com> wrote:
>> >
>> > Empty list sounds like the appropriate thing.
>> > What would be the result of we do flatten on this list?
>> > I think user should get a row back in the results.
>> >
>> >
>> >
>> > On Mon, Nov 2, 2015 at 11:53 AM, Hsuan Yi Chu <hyichu@maprtech.com>
>> wrote:
>> >
>> >> I agree. I think in any context, empty list should appear as [] in the
>> >> result.
>> >>
>> >> Any opposition?
>> >>
>> >> On Mon, Nov 2, 2015 at 11:50 AM, Zelaine Fong <zfong@maprtech.com>
>> wrote:
>> >>
>> >>> Wouldn't it make more sense to return an empty list in this case?  Null
>> >> is
>> >>> not quite the same as an empty list.  So, I would think you'd want a
>> >>> distinction between the two.
>> >>>
>> >>> -- Zelaine
>> >>>
>> >>> On Mon, Nov 2, 2015 at 11:41 AM, Hsuan Yi Chu <hyichu@maprtech.com>
>> >> wrote:
>> >>>
>> >>>> There could be inconsistent results:
>> >>>> https://issues.apache.org/jira/browse/DRILL-4007
>> >>>>
>> >>>> On Mon, Nov 2, 2015 at 8:17 AM, Andries Engelbrecht <
>> >>>> aengelbrecht@maprtech.com> wrote:
>> >>>>
>> >>>>> Currently in 1.2 I observe the following.
>> >>>>>
>> >>>>> 0: jdbc:drill:> select a from dfs.json.`/test.json`;
>> >>>>> +-------+
>> >>>>> |   a   |
>> >>>>> +-------+
>> >>>>> | null  |
>> >>>>> +———+
>> >>>>>
>> >>>>> But then flatten produces an error
>> >>>>>
>> >>>>> 0: jdbc:drill:> select flatten(a) from dfs.json.`/test.json`;
>> >>>>> Error: SYSTEM ERROR: ClassCastException: Cannot cast
>> >>>>> org.apache.drill.exec.vector.NullableIntVector to
>> >>>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> >>>>>
>> >>>>> Ideally both should return similar responses, in most cases
it may be
>> >>>>> easier to work with a null being returned than no record or
an empty
>> >>>>> string. Returning [] might be interpreted as a string data type
>> >> pending
>> >>>>> user experience or tools being used??
>> >>>>>
>> >>>>> —Andries
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> On Oct 31, 2015, at 4:44 PM, Hsuan Yi Chu <hyichu@maprtech.com>
>> >>> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>> For example, say, we are querying a field with empty list:
>> >>>>>>
>> >>>>>> select a from `XXX,json`
>> >>>>>>
>> >>>>>> {
>> >>>>>> a: []
>> >>>>>> }
>> >>>>>>
>> >>>>>> What is the expected result in the sqlline? Which one of
the
>> >>> following
>> >>>>>> should it be?
>> >>>>>>
>> >>>>>> 1. []
>> >>>>>> 2. null
>> >>>>>> 3. no record
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>

Mime
View raw message