calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: Rewriting IN
Date Wed, 29 Jul 2015 21:37:49 GMT

> On Jul 28, 2015, at 4:06 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> 
>> But if you solve the general problem,
> 
> That is important. However, as we solve that "general" problem, we
> should take into account that "downstream engines" might support
> 3-valued logic.
> For instance, OracleDB has null-aware anti-join. In other words, "not
> in" just works yet still supports 3-valued logic behind the scenes.
> 
> My point #1 is:
> 1.1) When I approach IN/NOT IN kind of queries, I typically add NOT
> NULL everywhere. This solves 3-valued problem. As you add "not null"
> you identify the required handling of nulls in a particular query.
> 1.2) As Calcite focuses on query optimization, it should probably
> treat "the most typical cases" first. For instance, it shouldn't
> convert every join to cross join+count "just in case" there will be
> nulls.

I have to do both — run every query correctly, and run the common cases quickly. I am finding
that I can optimize quite a lot, and easily, by starting from the general case and removing
every “is not null” that I know at compile time will always evaluate to true.

There is a further optimization where EXISTS and IN turn into a semi-join. The EXISTS or IN
expressions are completely removed from the WHERE clause - because if they had failed the
row would have been removed already. That is a more difficult optimization to make.

Null-aware anti-join would be a nice feature to add. Hive has null-safe join also. The present
framework could probably accommodate it, but I am not thinking about it right now.

There are also big gains to be had in decorrelation. I am not looking at decorrelation right
now.

> 
>> So, Oracle is compliant with my understanding of the standard.
> 
> Thanks for taking time and explaining this.
> I just tried to provoke some conversation over a well-known
> hard-to-reason-about problem.
> I was inspired with your topic and tried a couple of
> never-ask-in-an-interview queries.

It would be great if you could add some of your nasty queries to subquery.oq. Test coverage
will help us stay on the right path.

If you write a query and say “This ought to be done using a plan that uses just a semi-join”
then I can either refute your assertion or implement the necessary planner rule.

Julian


Mime
View raw message