calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Wills <jwi...@cloudera.com>
Subject Re: bigint conversion issue w/IN clauses
Date Thu, 30 Jul 2015 14:53:08 GMT
As a follow-up for this, I posted a patch to fix the problem here:
https://issues.apache.org/jira/browse/CALCITE-824

On Tue, Jul 28, 2015 at 7:13 PM, Josh Wills <jwills@cloudera.com> wrote:

> Hey devs,
>
> Found a fun issue for you that I couldn't find anywhere else after some
> googling.
>
> Let's say I have a BIGINT column named "q" in a table, and I want to do
> some filtering on a list of values from that column based on an IN clause.
> If all of the values in my IN clause happen to be INTs, then my query will
> work properly (i.e., it will return any rows from the table whose "q" value
> contains one of the values from my list) if the IN clause contains less
> than 20 values, but as soon as the IN clause contains more than 20 values,
> the query will always return zero rows.
>
> I did some explains and can see that the planner changes the strategy for
> the query execution when the number of values in an IN clause exceeds 20--
> less than 20 generates a big (q = 1 OR q = 2 OR ...) statement, whereas >=
> 20 switches to a SemiJoin between the original table and an
> EnumerableValues object that contains the list of values from the IN
> clause. I think the rub here is that if the values in the IN clause look
> like INTs or at least one of them isn't explicitly cast to a BIGINT, then
> the EnumerableValues will also have type int, and the semi-join will fail
> b/c of the type difference (that last part is me hypothesizing, you all
> would obviously know better what the problem is.)
>
> I can get around the issue by either casting one of the values in the IN
> clause to be a BIGINT, or by explicitly including a value in the list that
> can't be stored as an INT, but it seemed like there should be a cleaner way
> to handle this case so that the query behavior doesn't change dramatically
> when one extra value gets added to the IN list.
>
> Thanks!
> Josh
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message