Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 38C6B200BAC for ; Wed, 26 Oct 2016 22:50:35 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 373EA160AEE; Wed, 26 Oct 2016 20:50:35 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 552EB160AE1 for ; Wed, 26 Oct 2016 22:50:34 +0200 (CEST) Received: (qmail 29441 invoked by uid 500); 26 Oct 2016 20:50:33 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 29429 invoked by uid 99); 26 Oct 2016 20:50:33 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Oct 2016 20:50:33 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id B1E35C787A for ; Wed, 26 Oct 2016 20:50:32 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.379 X-Spam-Level: ** X-Spam-Status: No, score=2.379 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=maprtech.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id e3r08wcr1no3 for ; Wed, 26 Oct 2016 20:50:30 +0000 (UTC) Received: from mail-wm0-f48.google.com (mail-wm0-f48.google.com [74.125.82.48]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 50EB85F24F for ; Wed, 26 Oct 2016 20:50:30 +0000 (UTC) Received: by mail-wm0-f48.google.com with SMTP id b80so247540526wme.1 for ; Wed, 26 Oct 2016 13:50:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=maprtech.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=ToKFuNpzlLuFpeQnNuveLrkpQwbYFypE1FbwI/DALGs=; b=Kf156eEw+NsXt8OF3rOe2TK6bzKhJ6qA6yRPBwjdV7BtavklhyL5nyHjaaPtFqsWQa Mz4v7E53K+RjElVJ9pARqbHBgGoJJl/Ey58V0pyrjwXTNOK87FJqHKrFIBGak3fV7RHe whkFVSekIgm13JrK+ZAFObn0pTfZrm0uATC50= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=ToKFuNpzlLuFpeQnNuveLrkpQwbYFypE1FbwI/DALGs=; b=mW89vkwkk9A4RVAiq4532jGNmyDFc7DEpB7tyj1LVvST+37Ek6Rg7DgBQUKrsqRu9n DFAfXJweMM1JXNqE5eX5HNnCX/aHlmEuDOz3uyDXpebTI8axx/U8T4pH45i5Zs6s2z8L vdotKDnbvxVq5kl1+yWQkFwC4eoGHkZQcuV+kU5X6nZNNjlxrfsy65VbN4nLz3BSxJIE b9dRtvyaxryNQDkkAsznm7kMogbYyYbbarfqUfZuGjjz9/G2Ifh9gJ0IcJJvP5+TVhIn ugNiylfnM84BUjSiMbRr2XHI1a2XsNg5stt4DSoPj1B+OivEITfqiCJ7DBwzN0wtLHI0 RAhA== X-Gm-Message-State: ABUngvdfEnFejt6ErV2TfpGHpZiWy43F43IjWLOIfF5D6YEV3R+LoItKoPE0c9wWKQaGmlrERAS38wfxtkLgQKe8 X-Received: by 10.194.235.34 with SMTP id uj2mr3608369wjc.144.1477515029066; Wed, 26 Oct 2016 13:50:29 -0700 (PDT) MIME-Version: 1.0 Received: by 10.80.165.163 with HTTP; Wed, 26 Oct 2016 13:50:28 -0700 (PDT) In-Reply-To: References: From: Zelaine Fong Date: Wed, 26 Oct 2016 13:50:28 -0700 Message-ID: Subject: Re: Filter appears above project in query plan - Null Equality Join To: dev Content-Type: multipart/alternative; boundary=089e013d11b0754078053fcac61d archived-at: Wed, 26 Oct 2016 20:50:35 -0000 --089e013d11b0754078053fcac61d Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable The filter, I assume you' referring to, is a join filter. So, at a minimum, it needs to be applied after the hash join. I'm not sure there's a lot of benefit in pushing that filter past the project that's on top of the hash join. -- Zelaine On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz wrote: > Hi All, > > Filter is seen on top of Project in query plan for a null equality join. > This is over CSV data, shouldn't the filter appear below the project in t= he > query plan ? > I am on Drill 1.9.0 git commit id: a29f1e29 > > Note : t1 has some nulls in it > t2 does not have any nulls in it. > > {noformat} > 0: jdbc:drill:schema=3Ddfs.tmp> explain plan for > select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2 > ON t1.columns[0] =3D t2.columns[0] > WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0] > OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL ); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 ProjectAllowDup(*=3D[$0], *0=3D[$1]) > 00-02 Project(T43=C2=A6=C2=A6*=3D[$0], T44=C2=A6=C2=A6*=3D[$2]) > 00-03 SelectionVectorRemover > 00-04 Filter(condition=3D[OR(CAST(CASE(IS NULL(ITEM($1, 0)), I= S > NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =3D(ITEM($= 1, > 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS > NULL(ITEM($3, 0))))]) > 00-05 Project(T43=C2=A6=C2=A6*=3D[$0], columns=3D[$1], T44= =C2=A6=C2=A6*=3D[$3], > columns0=3D[$4]) > 00-06 HashJoin(condition=3D[=3D($2, $5)], joinType=3D[inne= r]) > 00-07 Project(T44=C2=A6=C2=A6*=3D[$0], columns0=3D[$1], = $f20=3D[$2]) > 00-09 Project(T44=C2=A6=C2=A6*=3D[$0], columns=3D[$1],= $f2=3D[ITEM($1, > 0)]) > 00-11 Project(T44=C2=A6=C2=A6*=3D[$0], columns=3D[$1= ]) > 00-13 Scan(groupscan=3D[EasyGroupScan > [selectionRoot=3Dmaprfs:/tmp/oneColWOnulls.csv, numFiles=3D1, columns=3D[= `*`], > files=3D[maprfs:///tmp/oneColWOnulls.csv]]]) > 00-08 Project(T43=C2=A6=C2=A6*=3D[$0], columns=3D[$1], $= f2=3D[ITEM($1, > 0)]) > 00-10 Project(T43=C2=A6=C2=A6*=3D[$0], columns=3D[$1]) > 00-12 Scan(groupscan=3D[EasyGroupScan > [selectionRoot=3Dmaprfs:/tmp/oneColDupsWnulls.csv, numFiles=3D1, > columns=3D[`*`], > files=3D[maprfs:///tmp/oneColDupsWnulls.csv]]]) > {noformat} > > Results returned by query > > {noformat} > 0: jdbc:drill:schema=3Ddfs.tmp> select * from `oneColDupsWnulls.csv` t1 J= OIN > `oneColWOnulls.csv` t2 ON t1.columns[0] =3D t2.columns[0] WHERE t1.column= s[0] > IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND > t2.columns[0] IS NULL ); > +-------------+-------------+ > | columns | columns0 | > +-------------+-------------+ > | ["test"] | ["test"] | > | ["foo"] | ["foo"] | > | ["foo"] | ["foo"] | > | ["bar"] | ["bar"] | > | ["yes"] | ["yes"] | > | ["yes"] | ["yes"] | > | ["no"] | ["no"] | > | ["no"] | ["no"] | > | ["foobar"] | ["foobar"] | > | ["foobar"] | ["foobar"] | > | ["never"] | ["never"] | > | ["never"] | ["never"] | > | ["ever"] | ["ever"] | > | ["ever"] | ["ever"] | > | ["here"] | ["here"] | > | ["there"] | ["there"] | > | ["no"] | ["no"] | > | ["no"] | ["no"] | > | ["yes"] | ["yes"] | > | ["yes"] | ["yes"] | > | ["foobar"] | ["foobar"] | > | ["foobar"] | ["foobar"] | > | ["temp"] | ["temp"] | > +-------------+-------------+ > 23 rows selected (0.341 seconds) > {noformat} > > Thanks, > Khurram > --089e013d11b0754078053fcac61d--