asterixdb-dev mailing list archives

From Dmitry Lychagin <dmitry.lycha...@couchbase.com>
Subject New identifier resolution rules in SQL++
Date Sat, 16 Dec 2017 02:45:01 GMT
Hi All,

I just merged the change that modifies how identifiers are resolved in SQL++.
https://asterix-gerrit.ics.uci.edu/#/c/2207/

Previously we relied on input schema information when resolving identifiers that did not refer
to in-scope variables.
For example,
select count(*) from customer join order on c_id = o_cid

variables in scope: customer, order
If we knew the schemas for customer (c_id) and order (o_cid), then we resolved c_id as customer.c_id
and o_cid as order.o_cid.
We raised an error if an identifier could potentially be resolved to more than one variable
(let’s say both datasets were open types and c_id could appear in both).
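To make the old behavior concrete, here is a minimal Python sketch of schema-driven resolution. It is not AsterixDB's implementation. The function resolve_old, the AmbiguousReference exception, and the declared/open_types inputs are hypothetical. A variable counts as a candidate if its type declares the field or if its type is open. Treating "no candidate" as an error is an assumption, because the message doesn't say what happened in that case.

```python
from typing import Dict, List, Set


class AmbiguousReference(Exception):
    """Raised when an identifier could belong to more than one variable."""


def resolve_old(identifier: str, in_scope: List[str],
                declared: Dict[str, Set[str]], open_types: Set[str]) -> str:
    """Hypothetical model of the pre-change, schema-driven resolution."""
    # A variable in scope always resolves to itself.
    if identifier in in_scope:
        return identifier
    # A variable is a candidate if its type declares the field, or if its
    # type is open (so the field might be present in the data anyway).
    candidates = [v for v in in_scope
                  if identifier in declared.get(v, set()) or v in open_types]
    if len(candidates) == 1:
        return f"{candidates[0]}.{identifier}"
    if len(candidates) > 1:
        raise AmbiguousReference(f"{identifier} could refer to {candidates}")
    # Assumption: no candidate at all is treated as an error in this sketch.
    raise NameError(f"cannot resolve {identifier}")
```

With closed types, resolve_old("c_id", ["customer", "order"], {"customer": {"c_id"}, "order": {"o_cid"}}, set()) returns "customer.c_id". If both types are passed in open_types, the same call raises AmbiguousReference. This dependence on schema information is what the change removes.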

After this change, we no longer rely on schema information when resolving identifiers.
The new rules are the following:

- If the identifier refers to a variable in scope, then it is resolved to that variable
  (same as before).

- Otherwise, if we’re in the FROM clause (or IN), then it is resolved to a dataset
  if it exists (dataverse.dataset is also OK); otherwise it’s an error.

- Otherwise (the identifier is not in the FROM clause):

    o If there’s a single variable bound in the nearest query block, then it’s resolved as
      a field access on that variable.

    o Otherwise, if there’s more than one variable bound in the nearest query block, then we
      raise an error (“ambiguous reference”).

    o We haven’t finalized the case when there are no variables bound. Currently we resolve
      to a dataset, but we might raise an error in the future.
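The rules above can be sketched as a small decision procedure. This is a minimal Python model under stated assumptions, not the actual AsterixDB code (which operates on the parsed query). Scope, resolve, and ResolutionError are hypothetical names. Identifiers are plain strings, and qualified dataverse.dataset names are listed verbatim in the dataset set. The in_from_clause flag stands in for knowing the identifier's syntactic position. For the no-variables case, the sketch falls back to a dataset as currently described, and raising when no such dataset exists is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


class ResolutionError(Exception):
    """Raised when an identifier cannot be resolved unambiguously."""


@dataclass(frozen=True)
class Scope:
    in_scope: FrozenSet[str]         # all visible variables, outer blocks included
    nearest_block: Tuple[str, ...]   # variables bound by the nearest query block
    datasets: FrozenSet[str]         # known datasets, e.g. "customer", "tpch.customer"


def resolve(identifier: str, scope: Scope, in_from_clause: bool) -> str:
    # Rule 1: an in-scope variable always wins (unchanged behavior).
    if identifier in scope.in_scope:
        return identifier
    # Rule 2: in FROM (or IN), the identifier must name an existing dataset.
    if in_from_clause:
        if identifier in scope.datasets:
            return f"dataset {identifier}"
        raise ResolutionError(f"cannot find dataset {identifier}")
    # Rule 3: elsewhere, only the count of variables in the nearest block
    # matters; schemas are never consulted.
    bound = scope.nearest_block
    if len(bound) == 1:
        return f"{bound[0]}.{identifier}"
    if len(bound) > 1:
        raise ResolutionError(f"ambiguous reference: {identifier}")
    # No variables bound: provisional behavior, fall back to a dataset.
    if identifier in scope.datasets:
        return f"dataset {identifier}"
    raise ResolutionError(f"cannot resolve {identifier}")
```

For the join query above, the scope has two bound variables, so resolve("c_id", Scope(frozenset({"customer", "order"}), ("customer", "order"), frozenset({"customer", "order"})), False) raises "ambiguous reference". With a single alias c bound, the same identifier resolves to "c.c_id".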

So the query above fails under the new resolution rules: there are two variables bound
in the nearest query block, and we no longer rely on their schemas to choose between them.
There are a couple of options to fix it.

1) Use the variables implicitly created by the FROM clause:
select count(*) from customer join order on customer.c_id = order.o_cid

2) Explicitly define variables in the FROM clause and refer to them:
select count(*) from customer c join order o on c.c_id = o.o_cid


Note that queries with a single variable in the from clause continue to work as before:
select c_id from customer
is resolved as
select customer.c_id from customer
because there’s a variable ‘customer’ introduced by the from clause.

select c_id from customer c
also works, because there’s a single variable in scope (explicitly defined in the from clause).

The review in the link above contains more examples of how our existing test cases were modified
to follow the new rules.

Please let me know if you have any comments or questions regarding this change.

Thanks,
-- Dmitry
