asterixdb-dev mailing list archives

From Wail Alkowaileet <>
Subject Re: Asterix Schema Provider Framework
Date Wed, 20 Jan 2016 20:33:00 GMT
Hi Chen,

In Asterix, I think UNION is an Algebricks-only type. Users cannot
declare a UNION type in their schema (except for optional fields, which
are equivalent to UNION(null,OTHER_TYPE)).
To represent "name" in Asterix, the user should leave it open. At the
storage level, "name" will be a "tagged" field, and the Algebricks type
computer will infer its type as ANY.

What I'm suggesting is that if types are inferred on ingestion, "name"
can have the type UNION(record,list[record]) instead of ANY. The question
is probably: how is that useful? Well, it can help Asterix fail at
compile time instead of at run time. For example,

Let's assume the following:
"x": [1, 2, 3, 4]
"x": ["hello", "world"]

The inferred type of "x" will be UNION(list[int32],list[string]), which
implies that we can apply the function count() without a problem, since
every alternative in the union is a list. However, for "name" in the
previous example, count() will throw an exception.
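The idea above can be illustrated with a minimal sketch in Python. This is not Asterix or Algebricks code: the helper names (infer_type, infer_field, render, count_is_safe) and the sample "name" records are hypothetical, and the type names are simplified strings. It only shows how a union inferred on ingestion could let a compiler decide ahead of time whether count() is applicable.

```python
def infer_type(value):
    """Name the inferred type of one parsed JSON value (simplified)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int32"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "record"
    if isinstance(value, list):
        items = frozenset(infer_type(v) for v in value)
        # An empty list carries no item information, so fall back to ANY.
        return "list[" + (render(items) if items else "ANY") + "]"
    raise TypeError("unsupported value: %r" % (value,))


def render(alternatives):
    """Print one type as-is, several types as UNION(...)."""
    names = sorted(alternatives)
    return names[0] if len(names) == 1 else "UNION(" + ",".join(names) + ")"


def infer_field(records, field):
    """Collect every type observed for `field` across the ingested records."""
    return frozenset(infer_type(r[field]) for r in records if field in r)


def count_is_safe(alternatives):
    """count() is safe only if every alternative of the union is a list."""
    return bool(alternatives) and all(t.startswith("list[") for t in alternatives)


# The "x" example from this message.
xs = [{"x": [1, 2, 3, 4]}, {"x": ["hello", "world"]}]
x_type = infer_field(xs, "x")
print(render(x_type), count_is_safe(x_type))

# Hypothetical data standing in for the earlier "name" example.
names = [{"name": {"first": "a"}}, {"name": [{"first": "b"}]}]
name_type = infer_field(names, "name")
print(render(name_type), count_is_safe(name_type))
```

With a check like count_is_safe available at compile time, a query applying count() to "name" could be rejected before execution, whereas with ANY the failure only shows up at run time.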

Also, I believe knowing the schema will reduce the amount of code needed
to handle corner cases of open types. For example, here is a bug I forgot
to file:

use dataverse wosDataverse
let $c := (for $x in dataset wos
let $id := $
group by $country := $ with $id
return {"country":$country, "id" : $id})

return count($

Unsupported type UNION(NULL, [ null: open { id: [ ANY ] } ]) for field
access expression: function-call: asterix:field-access-by-name,
Args:[%0->$4, AString: {id}] [AlgebricksException]

I know it's a stupid query :-)) .. but I had a case that forced me to do
it that way.

As for Spark, String is their "open" type, but with the limitation that
you cannot apply any operation such as count() because it's just a string.

Thank you Chen for your feedback and engagement.

P.S. @Till: I'm removing the modifications to Algebricks so that the
schema framework lives in Asterix.

On Mon, Jan 18, 2016 at 1:47 PM, Chen Li <> wrote:

> Wail, Thanks for the detailed examples and documentation.  They are
> very helpful.  Just curious: for the provided example, we infer this
> "name" type as UNION of both record and a list of records.  Is it just
> a heuristic?  Is there any "principle" behind this approach, compared
> to Spark's approach of inferring it as a String?
> Chen


Wail Alkowaileet
