From dev-return-15635-archive-asf-public=cust-asf.ponee.io@calcite.apache.org Thu Mar 26 11:22:34 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 56BA9180637 for ; Thu, 26 Mar 2020 12:22:34 +0100 (CET) Received: (qmail 22348 invoked by uid 500); 26 Mar 2020 11:22:33 -0000 Mailing-List: contact dev-help@calcite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@calcite.apache.org Delivered-To: mailing list dev@calcite.apache.org Received: (qmail 22335 invoked by uid 99); 26 Mar 2020 11:22:33 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 Mar 2020 11:22:33 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9EE01C2DE6 for ; Thu, 26 Mar 2020 11:22:32 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.003 X-Spam-Level: * X-Spam-Status: No, score=1.003 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id n9pYz68Oo33x for ; Thu, 26 Mar 2020 11:22:28 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.210.65; helo=mail-ot1-f65.google.com; envelope-from=njucshuo@gmail.com; receiver= Received: from mail-ot1-f65.google.com (mail-ot1-f65.google.com [209.85.210.65]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id A77E9BB815 for ; Thu, 26 Mar 2020 11:22:28 +0000 (UTC) Received: by mail-ot1-f65.google.com with SMTP id g23so5402356otq.4 for ; Thu, 26 Mar 2020 04:22:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=SHf1ao+m3wxOq2aucnb7sMqweL3xGLylGpysJxr/JhI=; b=j2YS/FGbxtfGc6JPPw/oTzs/kPdyIemIxXcktKN5odf42qrro3eROJC+vr7DmQ3Ntx ZPgxwkTDEqqSA4gL+xnNEprOyTqp+r632K/4Zkc0ozioZOGiO1/n3zQfRjGUSGJJemjg xTI7eW1jiyTGXefIP6V3i5fzxzpfuz8W+t0+sw0HhuKJ1eiTWHoqKExjx+HaRhXc0wrS /XwHRcuViA6Y6DalAdbM2BUSK3/eVZIhCPDiOpgnzoyrv+wijafvxlDeJ6MrStBo+0NK 4A/5p6iW+fMFVGVt7JslWu1YuJ+nF2ymqWQMWU5oc2KTHprHhdpVsE/40E0a46JATSE2 OC/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=SHf1ao+m3wxOq2aucnb7sMqweL3xGLylGpysJxr/JhI=; b=NLauRa9fEHXkcx6G4R2zEHxswnLyyHti8JmFTYK/sBZhVatyAuLKq9WidQ6yH1pbjn WttQfqjYrNFFEeTBmuKr2XqY+ebCirF4Y2T9fzBxYKxlhQrJAQ5pZadvZ3rjxCltQ9XY nMseikL5OPq66a5VTWmNkp0rFCcinVZ5XSxEqYkc/I9ONfZ2UlIU2q5vl9Hoqb8RGwjt X6N+aK+j3HFzk5Crq5MMMkB7SSAI064Ga5Mg8xPFlALcuMNXZ8zrND6w/W9gcU7qrYdQ C3UPGcAXRsiF3LyXzRt0JZlqhAnPGDDP5JsCwEkNa1n1zsOSQNnRbtl6Z+gq9zOz0qxj XXaw== X-Gm-Message-State: ANhLgQ0z7JnymAqMs6+Gn3gXXi8eke3VE00lip4OOAgTejbJmfewYYMS 804ENJIzd4iOGjKRed9T7AgqW5Ui4zuCQigGEEvPziix X-Google-Smtp-Source: ADFU+vvOl5vfhdUOs8KEFB2WT0XlD4GVnn/VJlyAk9DgyX7W75zESfNFF/TgtMEA7DD++0l32WXHfhPmTHxls3YpNiU= X-Received: by 2002:a4a:9c86:: with SMTP id z6mr4846709ooj.68.1585221741578; Thu, 26 Mar 2020 04:22:21 -0700 (PDT) MIME-Version: 1.0 References: <4213FFD3-FBF8-42B0-AF49-D28BE49A1E6B@gmail.com> <17df552b-cbb2-bfd6-5cd5-ccbc1dc0d4aa@rxd.hu> <9892F37B-F2F6-4E42-BE31-1919883E1704@apache.org> In-Reply-To: From: Shuo Cheng Date: Thu, 26 Mar 2020 19:22:10 +0800 Message-ID: Subject: Re: Split Join condition with CAST which only widening nullability To: dev@calcite.apache.org Content-Type: multipart/alternative; boundary="000000000000cc0bc805a1c0326a" --000000000000cc0bc805a1c0326a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I think we may solve the problem from two aspects: 1. Do not try to preserve type (nullability) of Join/Filter condition expression when simplifying or something like pushing down. 2. We can do some work (remove unnecessary CAST) right before create a Join/Filter, as Julian said, something in RelBuilder could be done. I've do some fix in above Link (remove unnecessary CAST when doing pushDownEqualJoinConditions) On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng wrote: > Sorry for the late reply, I've reproduced the problem here > https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d= 4df7929 > . > > On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde wrote: > >> It does seem to be something that RelBuilder could do. (RexSimplify can= =E2=80=99t >> really do it, because it doesn=E2=80=99t know how the expression is bein= g used.) >> >> It=E2=80=99s also worth discovering why the CAST was added in the first = place. It >> doesn=E2=80=99t seem to be helpful. I think we should strive to eliminat= e all of >> the slightly unhelpful things that Calcite does; those things can add up >> and cause major inefficiencies in the planning process and/or sub-optima= l >> plans. >> >> Julian >> >> >> > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich wrote: >> > >> > Hey, >> > >> > That's a great diagnosis :) >> > I would guess that newCondition became non-nullable for some reason >> (rexSimplify runs under RexProgramBuilder so it might be able to narrow = the >> nullability) >> > you could try invoking simplify.simplifyPreservingType() on it to see >> if that would help. >> > >> > > I know it's necessary to preserve the nullability when simplifying a >> boolean expression in project columns, but as for condition in Filter/Ca= lc, >> may be we can omit the >> > > nullability? >> > I think that could probably work - we can't change the nullability on >> project columns because those could be referenced (and the reference als= o >> has the type) ; but for filter/join conditions we don't need to care wit= h >> it. >> > It seems we already have a "matchnullability" in ReduceExpressionsRule >> ; for FILTER/JOIN we should probably turn that off... :) >> > >> > cheers, >> > Zoltan >> > >> > >> > On 3/24/20 9:15 AM, Shuo Cheng wrote: >> >> Hi Zoltan, >> >> I encountered the problem when running TPC tests, and have not >> reproduced it in Calcite master. >> >> But I figured it out how the problem is produced: >> >> There is semi join with the condition=EF=BC=9AAND(EXPANDED_INDF1, >> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true` >> >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), >> type of AND is BOOLEAN with nullable `true` >> >> After SemiJoinProjectTransposeRule --> Join condition: >> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false` >> >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule` >> where forced type correction is added by `RexProgramBuilder#addCondition= `, >> which will call `RexSimplify#simplifyPreservingType` before registering = an >> expression. >> >> I know it's necessary to preserve the nullability when simplifying a >> boolean expression in project columns, but as for condition in Filter/Ca= lc, >> may be we can omit the nullability? >> >> Best Regards, >> >> Shuo >> >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich > kirk@rxd.hu>> wrote: >> >> Hey Shuo! >> >> I think that simplification should been made on join conditions - >> I've done a quick check; and it seems to be working for me. >> >> I suspected that it will be either a missing call to RexSimplify >> for some reason - or it is added by a forced return type correction: IIR= C >> there are some cases in which >> >> the >> >> RexNode type should retained after simplification. >> >> Is this reproducible on current master; could you share a testcase= ? >> >> cheers, >> >> Zoltan >> >> On 3/24/20 7:28 AM, Shuo Cheng wrote: >> >> > Hi, Julian, That's what we do as a workaround way. we remove >> CAST which are >> >> > only widening nullability as what CALCITE-2695 does before >> applying >> >> > hash-join/sort-merge-join rule, such that equiv predicate can b= e >> split >> >> > out. I'm not sure whether it's properly for Calcite to do the >> 'convert >> >> > back' job, for example, simplify the join condition when create >> a Join; Or >> >> > maybe let other systems what use Calcite to do the "convert >> back" job as an >> >> > optimization? What do you think? >> >> > >> >> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde < >> jhyde.apache@gmail.com > wrote: >> >> > >> >> >> Or convert it back to a not-nullable BOOLEAN? The join >> condition treats >> >> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never >> occur, so the >> >> >> conditions with and without the CAST are equivalent. >> >> >> >> >> >> Julian >> >> >> >> >> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng > > wrote: >> >> >>> >> >> >>> =EF=BB=BFHi all, >> >> >>> >> >> >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, >> $2), >> >> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable >> BOOLEAN, >> >> >> Calcite >> >> >>> can not split out equiv predicate, thus some join operation >> like hash >> >> >> join >> >> >>> / sort merge join may not be used. Maybe we can >> >> >>> expand RelOptUtil#splitJoinCondition to support this scenario= ? >> >> >> >> >> > >> >> --000000000000cc0bc805a1c0326a--