calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shuo Cheng <njucs...@gmail.com>
Subject Re: Split Join condition with CAST which only widening nullability
Date Thu, 26 Mar 2020 11:22:10 GMT
I think we may solve the problem from two aspects:
1. Do not try to preserve type (nullability) of Join/Filter condition
expression when simplifying or something like pushing down.
2. We can do some work (remove unnecessary CAST) right before create a
Join/Filter, as Julian said, something in RelBuilder could be done.
I've do some fix in above Link (remove unnecessary CAST when doing
pushDownEqualJoinConditions)

On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng <njucshuo@gmail.com> wrote:

> Sorry for the late reply, I've reproduced the problem here
> https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929
> .
>
> On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde <jhyde@apache.org> wrote:
>
>> It does seem to be something that RelBuilder could do. (RexSimplify can’t
>> really do it, because it doesn’t know how the expression is being used.)
>>
>> It’s also worth discovering why the CAST was added in the first place. It
>> doesn’t seem to be helpful. I think we should strive to eliminate all of
>> the slightly unhelpful things that Calcite does; those things can add up
>> and cause major inefficiencies in the planning process and/or sub-optimal
>> plans.
>>
>> Julian
>>
>>
>> > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich <kirk@rxd.hu> wrote:
>> >
>> > Hey,
>> >
>> > That's a great diagnosis :)
>> > I would guess that newCondition became non-nullable for some reason
>> (rexSimplify runs under RexProgramBuilder so it might be able to narrow the
>> nullability)
>> > you could try invoking simplify.simplifyPreservingType() on it to see
>> if that would help.
>> >
>> > > I know it's necessary to preserve the nullability when simplifying a
>> boolean expression in project columns, but as for condition in Filter/Calc,
>> may be we can omit the
>> > > nullability?
>> > I think that could probably work - we can't change the nullability on
>> project columns because those could be referenced (and the reference also
>> has the type) ; but for filter/join conditions we don't need to care with
>> it.
>> > It seems we already have a "matchnullability" in ReduceExpressionsRule
>> ; for FILTER/JOIN we should probably turn that off...  :)
>> >
>> > cheers,
>> > Zoltan
>> >
>> >
>> > On 3/24/20 9:15 AM, Shuo Cheng wrote:
>> >> Hi Zoltan,
>> >> I encountered the problem when running TPC tests, and have not
>> reproduced it in Calcite master.
>> >> But I figured it out how the problem is produced:
>> >> There is semi join with the condition:AND(EXPANDED_INDF1,
>> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true`
>> >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2),
>> type of AND is BOOLEAN with nullable `true`
>> >> After  SemiJoinProjectTransposeRule --> Join condition:
>> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false`
>> >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule`
>> where forced type correction is added by `RexProgramBuilder#addCondition`,
>> which will call `RexSimplify#simplifyPreservingType` before registering an
>> expression.
>> >> I know it's necessary to preserve the nullability when simplifying a
>> boolean expression in project columns, but as for condition in Filter/Calc,
>> may be we can omit the nullability?
>> >> Best Regards,
>> >> Shuo
>> >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich <kirk@rxd.hu <mailto:
>> kirk@rxd.hu>> wrote:
>> >>    Hey Shuo!
>> >>    I think that simplification should been made on join conditions -
>> I've done a quick check; and it seems to be working for me.
>> >>    I suspected that it will be either a missing call to RexSimplify
>> for some reason - or it is added by a forced return type correction: IIRC
>> there are some cases in which
>> >>    the
>> >>    RexNode type should retained after simplification.
>> >>    Is this reproducible on current master; could you share a testcase?
>> >>    cheers,
>> >>    Zoltan
>> >>    On 3/24/20 7:28 AM, Shuo Cheng wrote:
>> >>     > Hi, Julian, That's what we do as a workaround way. we remove
>> CAST which are
>> >>     > only widening nullability as what CALCITE-2695 does before
>> applying
>> >>     > hash-join/sort-merge-join rule, such that equiv predicate can be
>> split
>> >>     > out.  I'm not sure whether it's properly for Calcite to do the
>> 'convert
>> >>     > back' job, for example, simplify the join condition when create
>> a Join; Or
>> >>     > maybe let other systems what use Calcite to do the "convert
>> back" job as an
>> >>     > optimization? What do you think?
>> >>     >
>> >>     > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde <
>> jhyde.apache@gmail.com <mailto:jhyde.apache@gmail.com>> wrote:
>> >>     >
>> >>     >> Or convert it back to a not-nullable BOOLEAN? The join
>> condition treats
>> >>     >> UNKNOWN the same as FALSE, and besides UNKNOWN will never
>> occur, so the
>> >>     >> conditions with and without the CAST are equivalent.
>> >>     >>
>> >>     >> Julian
>> >>     >>
>> >>     >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng <njucshuo@gmail.com
>> <mailto:njucshuo@gmail.com>> wrote:
>> >>     >>>
>> >>     >>> Hi all,
>> >>     >>>
>> >>     >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1,
>> $2),
>> >>     >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable
>> BOOLEAN,
>> >>     >> Calcite
>> >>     >>> can not split out equiv predicate, thus some join operation
>> like hash
>> >>     >> join
>> >>     >>> / sort merge join may not be used. Maybe we can
>> >>     >>> expand RelOptUtil#splitJoinCondition to support this scenario?
>> >>     >>
>> >>     >
>>
>>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message