Return-Path: X-Original-To: apmail-calcite-dev-archive@www.apache.org Delivered-To: apmail-calcite-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id ACB7F184D4 for ; Thu, 30 Jul 2015 18:54:00 +0000 (UTC) Received: (qmail 34590 invoked by uid 500); 30 Jul 2015 18:53:51 -0000 Delivered-To: apmail-calcite-dev-archive@calcite.apache.org Received: (qmail 34524 invoked by uid 500); 30 Jul 2015 18:53:51 -0000 Mailing-List: contact dev-help@calcite.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@calcite.incubator.apache.org Delivered-To: mailing list dev@calcite.incubator.apache.org Received: (qmail 34513 invoked by uid 99); 30 Jul 2015 18:53:50 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Jul 2015 18:53:50 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 698FFD8F3B for ; Thu, 30 Jul 2015 18:53:50 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.001 X-Spam-Level: X-Spam-Status: No, score=0.001 tagged_above=-999 required=6.31 tests=[HEADER_FROM_DIFFERENT_DOMAINS=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-eu-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id Xva-_BDBOcBA for ; Thu, 30 Jul 2015 18:53:42 +0000 (UTC) Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id 91C4E20B6F for ; Thu, 30 Jul 2015 18:53:41 +0000 (UTC) Received: by pacan13 with SMTP id an13so28075257pac.1 for ; Thu, 30 Jul 2015 11:53:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=content-type:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=tOH0aumJnP+MNJrT9+7iKblphV0OjRa59Ax6DHaCN7A=; b=IcI6DwRCnagpHR3jBBcHYTNmDycuacawKK0nyYe2AWHcawRxhU5I43kCX39z+e0CFJ SUE5RpvL1revIzeENNq5MWLq68lb+aGJDIPlvOKWjghtW1dAA8ACLCacw42LEc56GS3Y KDhWedh2ATJYuoRHQCVeDdn0W3MTWOQGb6HofsbP4wLgU21dZPSvm7H8giVtVp2Yswb6 9gKyS1kqnjlkDRBJmo8y92sapd1JhBe5LTnnwh7LjdMFFxQhYkmd4ON+WL09YfATtkcx L03tQqocUQr7adD7ByvWh/2zpO7m8sz5TwIge6rNL7Qwftvno8jm6VulFDdc3jFucZQj RIgw== X-Received: by 10.66.235.104 with SMTP id ul8mr110810710pac.33.1438282420123; Thu, 30 Jul 2015 11:53:40 -0700 (PDT) Received: from [10.10.9.208] ([192.175.27.21]) by smtp.gmail.com with ESMTPSA id ca13sm3473068pac.25.2015.07.30.11.53.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 30 Jul 2015 11:53:39 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\)) Subject: Re: bigint conversion issue w/IN clauses From: Julian Hyde In-Reply-To: Date: Thu, 30 Jul 2015 11:53:47 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: dev@calcite.incubator.apache.org X-Mailer: Apple Mail (2.2102) And a test case too. Awesome! I will review. > On Jul 30, 2015, at 7:53 AM, Josh Wills wrote: >=20 > As a follow-up for this, I posted a patch to fix the problem here: > https://issues.apache.org/jira/browse/CALCITE-824 >=20 > On Tue, Jul 28, 2015 at 7:13 PM, Josh Wills = wrote: >=20 >> Hey devs, >>=20 >> Found a fun issue for you that I couldn't find anywhere else after = some >> googling. >>=20 >> Let's say I have a BIGINT column named "q" in a table, and I want to = do >> some filtering on a list of values from that column based on an IN = clause. >> If all of the values in my IN clause happen to be INTs, then my query = will >> work properly (i.e., it will return any rows from the table whose "q" = value >> contains one of the values from my list) if the IN clause contains = less >> than 20 values, but as soon as the IN clause contains more than 20 = values, >> the query will always return zero rows. >>=20 >> I did some explains and can see that the planner changes the strategy = for >> the query execution when the number of values in an IN clause exceeds = 20-- >> less than 20 generates a big (q =3D 1 OR q =3D 2 OR ...) statement, = whereas >=3D >> 20 switches to a SemiJoin between the original table and an >> EnumerableValues object that contains the list of values from the IN >> clause. I think the rub here is that if the values in the IN clause = look >> like INTs or at least one of them isn't explicitly cast to a BIGINT, = then >> the EnumerableValues will also have type int, and the semi-join will = fail >> b/c of the type difference (that last part is me hypothesizing, you = all >> would obviously know better what the problem is.) >>=20 >> I can get around the issue by either casting one of the values in the = IN >> clause to be a BIGINT, or by explicitly including a value in the list = that >> can't be stored as an INT, but it seemed like there should be a = cleaner way >> to handle this case so that the query behavior doesn't change = dramatically >> when one extra value gets added to the IN list. >>=20 >> Thanks! >> Josh >>=20 >>=20 >> -- >> Director of Data Science >> Cloudera >> Twitter: @josh_wills >>=20 >=20 >=20 >=20 > --=20 > Director of Data Science > Cloudera > Twitter: @josh_wills