asterixdb-dev mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Homogeneous lists with nullable items
Date Fri, 18 Dec 2015 17:59:23 GMT
Hi Till,

As I was thinking this through, I also realized there are two separate issues. For now I am going
to concentrate on 1) as a quick solution to the existing bugs, as you have pointed out.
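To make the decision in 1) concrete, here is a minimal Python sketch. It assumes a toy model in which each list item is described only by a type name string, with "null" marking a null item. The function name choose_list_representation and the representation labels are hypothetical, and this is not the code of IntroduceEnforcedListTypeRule. It only illustrates the current rule (all items of one type gives a homogeneous list) plus the proposed nullable variant.

```python
from typing import List, Optional, Tuple

NULL = "null"  # hypothetical marker for the type of a null item


def choose_list_representation(item_types: List[str]) -> Tuple[str, Optional[str]]:
    """Return (representation, item_type) for a list, given its items' types.

    Hypothetical sketch: "homogeneous" stores one item type in the type
    definition; "homogeneous_nullable" is the proposed variant whose item
    type is nullable; "heterogeneous" falls back to tagged items of type ANY.
    """
    non_null = {t for t in item_types if t != NULL}
    has_null = NULL in item_types
    if len(non_null) == 1:
        (item_type,) = non_null
        if has_null:
            # Proposed: one declared type plus some nulls -> nullable item type
            return ("homogeneous_nullable", item_type + "?")
        # Current rule: every item has the same type -> homogeneous list
        return ("homogeneous", item_type)
    # Mixed types, an empty list, or an all-null list stay heterogeneous (ANY)
    return ("heterogeneous", "ANY")
```

Under the current rule a list such as [1, null, 3] falls into the heterogeneous case; that is exactly where the nullable variant would change the outcome.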

From a design perspective, I was thinking that the serialization for homogeneous lists with nullable item types
could reuse the one for heterogeneous lists, except that instead of ANY the list would carry a
nullable type tag.
Yes, this will require a separate type tag for the nullable type (we already have one, ATypeTag.Union,
but we do not provide a serde for it).
The current homogeneous list representation should be unaffected.
I am not sure what you meant by redefining the current representation.
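A rough byte-level sketch of the layouts discussed here follows. It assumes int64 items only, uses made-up one-byte tag values (TAG_NULL, TAG_INT64, TAG_ANY and TAG_UNION are hypothetical, not the real ATypeTag ordinals), and ignores any length or offset fields the real format may carry. The homogeneous layout writes the item type once and leaves items untagged; the tagged layout prefixes every item with its own tag, so one encoder covers both today's heterogeneous list (list item tag ANY) and the proposed nullable list (list item tag UNION).

```python
import struct
from typing import List, Optional

# Hypothetical one-byte tags; the real ATypeTag values differ.
TAG_NULL, TAG_INT64, TAG_ANY, TAG_UNION = 0x01, 0x04, 0x10, 0x11


def encode_homogeneous(items: List[int]) -> bytes:
    """Item type written once in the header; items are untagged int64s."""
    out = struct.pack(">Bi", TAG_INT64, len(items))  # item tag + item count
    for v in items:
        out += struct.pack(">q", v)
    return out


def encode_tagged(items: List[Optional[int]], list_item_tag: int) -> bytes:
    """Every item carries its own tag; a null item is just its tag byte.

    list_item_tag is TAG_ANY for a heterogeneous list, or TAG_UNION for the
    proposed homogeneous list with a nullable item type.
    """
    out = struct.pack(">Bi", list_item_tag, len(items))
    for v in items:
        if v is None:
            out += struct.pack(">B", TAG_NULL)
        else:
            out += struct.pack(">Bq", TAG_INT64, v)
    return out
```

In this sketch the nullable variant costs one tag byte per item compared with the untagged homogeneous layout, which is the overhead the bit-mask idea from the original message aims to reduce for sparse lists.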

> On Dec 18, 2015, at 01:57, Till Westmann <tillw@apache.org> wrote:
> 
> Hi Ildar,
> 
> it seems that we have 2 separate points here:
> 1) There are bugs in the way we decide which list representation to use and
> 2) we could add support for (and an optimized representation for) a list of a fixed but nullable type.
> It seems that - by fixing 1) - we could get rid of the issues you’ve listed.
> 
> But I also think that it would be nice to support lists of a nullable type (feels like an omission that we don’t support that in the language) - and potentially provide an efficient representation for them.
> However, it is not clear to me how we would do this.
> A few thoughts:
> - Would we maintain the current representation for homogeneous lists of non-nullable types?
> - Would we introduce a new type tag for “nullable lists”?
> - Would we redefine the current representation to mean something else?
> Do you have thoughts on those?
> 
> Cheers,
> Till
> 
> On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:
> 
>> Hi devs,
>> 
>> Recently I have been playing around with lists and with functions that receive or return list parameters/values. I have noticed one particular behavior that seems incorrect.
>> As you might know, internally we support 2 types of lists: homogeneous, where all the items are untagged and the item type is stored in the type definition, and heterogeneous, where, on the contrary, the items are tagged and the list item type is effectively ANY.
>> The decision which of the two types is used is usually made by the parser, or is altered by IntroduceEnforcedListTypeRule, which effectively turns a heterogeneous list into a homogeneous one if all the items have the same type.
>> Right now we only allow homogeneous lists to be defined as a field in some type, and we also restrict the item type to non-nullable types:
>> create type listType as {
>>   id: int64,
>>   list: [int64]   // [int64?] is not possible
>> };
>> 
>> This constraint applies at both the language level and the serialization level. Under that restriction, the only way to load a list that contains null values is to make the corresponding field open (open lists are heterogeneous by definition).
>> 
>> 1) It seems we are missing an optimization opportunity when dealing with large sparse lists. In this case the serialization could use a bit mask to specify which items in the list are not null, and then encode only those items.
>> 2) I believe that if we alter IntroduceEnforcedListTypeRule to enforce a homogeneous list with a nullable item type, we might resolve issues https://issues.apache.org/jira/browse/ASTERIXDB-905, https://issues.apache.org/jira/browse/ASTERIXDB-867, and https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
>> 
>> Thoughts?
>> 
>> Best regards,
>> Ildar
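For point 1) in the quoted message, here is a minimal sketch of the bit-mask idea, again assuming int64 items and a hypothetical layout: the item count, then a null bitmap with one bit per item (bit set means the item is present), then only the non-null values. The names encode_sparse and decode_sparse are illustrative only and do not correspond to any existing serde.

```python
import struct
from typing import List, Optional


def encode_sparse(items: List[Optional[int]]) -> bytes:
    """Hypothetical sparse layout: count, null bitmap, then non-null int64s."""
    n = len(items)
    mask = bytearray((n + 7) // 8)  # one bit per item, rounded up to bytes
    payload = b""
    for i, v in enumerate(items):
        if v is not None:
            mask[i // 8] |= 1 << (i % 8)  # bit i set => item i is not null
            payload += struct.pack(">q", v)
    return struct.pack(">i", n) + bytes(mask) + payload


def decode_sparse(data: bytes) -> List[Optional[int]]:
    """Inverse of encode_sparse: rebuild the list, filling gaps with None."""
    (n,) = struct.unpack_from(">i", data, 0)
    mask_len = (n + 7) // 8
    mask = data[4:4 + mask_len]
    pos = 4 + mask_len
    items: List[Optional[int]] = []
    for i in range(n):
        if mask[i // 8] & (1 << (i % 8)):
            (v,) = struct.unpack_from(">q", data, pos)
            pos += 8
            items.append(v)
        else:
            items.append(None)
    return items
```

The bitmap costs a fixed ceil(n/8) bytes, so a list of 100 items with 2 non-null values encodes to 4 + 13 + 16 = 33 bytes in this sketch; the layout pays off when most items are null.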

Best regards,
Ildar

