From dev-return-1546-apmail-calcite-dev-archive=calcite.apache.org@calcite.incubator.apache.org Wed Jul 29 00:13:38 2015 Return-Path: X-Original-To: apmail-calcite-dev-archive@www.apache.org Delivered-To: apmail-calcite-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CDCED183F7 for ; Wed, 29 Jul 2015 00:13:38 +0000 (UTC) Received: (qmail 47634 invoked by uid 500); 29 Jul 2015 00:13:38 -0000 Delivered-To: apmail-calcite-dev-archive@calcite.apache.org Received: (qmail 47574 invoked by uid 500); 29 Jul 2015 00:13:38 -0000 Mailing-List: contact dev-help@calcite.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@calcite.incubator.apache.org Delivered-To: mailing list dev@calcite.incubator.apache.org Received: (qmail 47563 invoked by uid 99); 29 Jul 2015 00:13:38 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Jul 2015 00:13:38 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 09ED71A8B1A for ; Wed, 29 Jul 2015 00:13:38 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3 X-Spam-Level: *** X-Spam-Status: No, score=3 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=3, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id sV-HPUfBLxPh for ; Wed, 29 Jul 2015 00:13:32 +0000 (UTC) Received: from mail-yk0-f169.google.com (mail-yk0-f169.google.com [209.85.160.169]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id CF6902134E for ; Wed, 29 Jul 2015 00:13:31 +0000 (UTC) Received: by ykba194 with SMTP id a194so7183241ykb.0 for ; Tue, 28 Jul 2015 17:13:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to :content-type; bh=oUICjUnIKIug2Dwdgq1v6qsg436XFbMVTaEVy91ioSM=; b=NPrgi0n+kKYkfkvV1xiCvC1MuOEmx8gg+gEFZbb6ijsoB7JbHzqvtF84yBcweGWCvZ M1hNwnQXpY0mRK2U1dBmebBD6i09SLKI5WcFo9vn7SjO6ZZaZccnEqZySaoZ0yKZP4ev x91fwM6t2mScs47VAakKi4Ak1BOsQ+kdiU1e9BCc0++oY+yICajPYlbZtK16sDkdVuVk u3ylT8Jb3GYp8Q/YqOmYeGrYykoT2G/EOCzORcY3owP+M25DqV1yQBKxYA2kK2TMBqkU 4pEJrmPu/Vwl3r0HfglLW3VG3fbt0m3uMkKsuR7Ndxt/uPndGSRcJoyHiQ0NaMcNMyNS iR2Q== X-Gm-Message-State: ALoCoQnm8VdZKh5zzkWt5pB5EyfvFk/j/ZRvp/O99zotNKVVnk2ElRtEPFtykU+BB9FcZTRe8T0q X-Received: by 10.13.232.83 with SMTP id r80mr39131438ywe.16.1438128805448; Tue, 28 Jul 2015 17:13:25 -0700 (PDT) MIME-Version: 1.0 Received: by 10.13.196.197 with HTTP; Tue, 28 Jul 2015 17:13:06 -0700 (PDT) From: Josh Wills Date: Tue, 28 Jul 2015 17:13:06 -0700 Message-ID: Subject: bigint conversion issue w/IN clauses To: dev@calcite.incubator.apache.org Content-Type: multipart/alternative; boundary=94eb2c084140974313051bf87471 --94eb2c084140974313051bf87471 Content-Type: text/plain; charset=UTF-8 Hey devs, Found a fun issue for you that I couldn't find anywhere else after some googling. Let's say I have a BIGINT column named "q" in a table, and I want to do some filtering on a list of values from that column based on an IN clause. If all of the values in my IN clause happen to be INTs, then my query will work properly (i.e., it will return any rows from the table whose "q" value contains one of the values from my list) if the IN clause contains less than 20 values, but as soon as the IN clause contains more than 20 values, the query will always return zero rows. I did some explains and can see that the planner changes the strategy for the query execution when the number of values in an IN clause exceeds 20-- less than 20 generates a big (q = 1 OR q = 2 OR ...) statement, whereas >= 20 switches to a SemiJoin between the original table and an EnumerableValues object that contains the list of values from the IN clause. I think the rub here is that if the values in the IN clause look like INTs or at least one of them isn't explicitly cast to a BIGINT, then the EnumerableValues will also have type int, and the semi-join will fail b/c of the type difference (that last part is me hypothesizing, you all would obviously know better what the problem is.) I can get around the issue by either casting one of the values in the IN clause to be a BIGINT, or by explicitly including a value in the list that can't be stored as an INT, but it seemed like there should be a cleaner way to handle this case so that the query behavior doesn't change dramatically when one extra value gets added to the IN list. Thanks! Josh -- Director of Data Science Cloudera Twitter: @josh_wills --94eb2c084140974313051bf87471--