calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: bigint conversion issue w/IN clauses
Date Thu, 30 Jul 2015 18:53:47 GMT
And a test case too. Awesome!

I will review.

> On Jul 30, 2015, at 7:53 AM, Josh Wills <jwills@cloudera.com> wrote:
> 
> As a follow-up for this, I posted a patch to fix the problem here:
> https://issues.apache.org/jira/browse/CALCITE-824
> 
> On Tue, Jul 28, 2015 at 7:13 PM, Josh Wills <jwills@cloudera.com> wrote:
> 
>> Hey devs,
>> 
>> Found a fun issue for you that I couldn't find anywhere else after some
>> googling.
>> 
>> Let's say I have a BIGINT column named "q" in a table, and I want to do
>> some filtering on a list of values from that column based on an IN clause.
>> If all of the values in my IN clause happen to be INTs, then my query will
>> work properly (i.e., it will return any rows from the table whose "q" value
>> contains one of the values from my list) if the IN clause contains less
>> than 20 values, but as soon as the IN clause contains more than 20 values,
>> the query will always return zero rows.
>> 
>> I did some explains and can see that the planner changes the strategy for
>> the query execution when the number of values in an IN clause exceeds 20--
>> less than 20 generates a big (q = 1 OR q = 2 OR ...) statement, whereas >=
>> 20 switches to a SemiJoin between the original table and an
>> EnumerableValues object that contains the list of values from the IN
>> clause. I think the rub here is that if the values in the IN clause look
>> like INTs or at least one of them isn't explicitly cast to a BIGINT, then
>> the EnumerableValues will also have type int, and the semi-join will fail
>> b/c of the type difference (that last part is me hypothesizing, you all
>> would obviously know better what the problem is.)
>> 
>> I can get around the issue by either casting one of the values in the IN
>> clause to be a BIGINT, or by explicitly including a value in the list that
>> can't be stored as an INT, but it seemed like there should be a cleaner way
>> to handle this case so that the query behavior doesn't change dramatically
>> when one extra value gets added to the IN list.
>> 
>> Thanks!
>> Josh
>> 
>> 
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>> 
> 
> 
> 
> -- 
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>


Mime
View raw message