kylin-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
Date Sun, 18 Mar 2018 09:25:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403931#comment-16403931
] 

ASF subversion and git services commented on KYLIN-3149:
--------------------------------------------------------

Commit b81d7a2df289029768edd6532e63eae251f28b41 in kylin's branch refs/heads/master from [~hn5092]
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=b81d7a2 ]

KYLIN-3149 Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected


> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-3149
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3149
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.2.0
>            Reporter: hongbin ma
>            Assignee: yiming.xu
>            Priority: Major
>         Attachments: dump.txt
>
>
> for queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case 
> WHEN  '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, '999999999999'), 1,
1)
> when  '1030101' = '1030102' then substring(COALESCE(OPS_REGION, '999999999999'), 1, 1)
 
> when  '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, '999999999999'),
1, 1)
> when  '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, '999999999999'),
1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>     OLAPLimitRel(ctx=[], fetch=[50000])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1)],
TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> however the actual is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[50000])
>     OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT
NULL($9), $9, '999999999999'), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10),
$10, '999999999999'), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2,
'999999999999'), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, '999999999999'),
1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0],
ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected.
If we dump the internal state of this VolcanoPlanner (org.apache.calcite.plan.volcano.VolcanoPlanner#dump),
line 19-21 from the complete dump is attached:
> {code}
> 	rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
> 		rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101',
'1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1), =('1030101', '1030102'),
SUBSTRING(CASE(IS NOT NULL($10), $10, '999999999999'), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS
NOT NULL($2), $2, '999999999999'), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2),
$2, '999999999999'), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=),
rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> 		rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS
NOT NULL($9), $9, '999999999999'), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0
rows, 25.05 cpu, 0.0 io}
> {code}
> we see two rels with same cost:  #339 and #348, where #339 is created from LogicalProject
= (OLAPProjectRule)=> OLAPProject, and #348 is created from LogicalProject =( ReduceExpressionsRule)
=> Reduced LogicalProject =(OLAPProjectRule)=> Reduced OLAPProject . Since ReduceExpressionsRule
require Logical Project rather than OLAP Project, #339 is never reduced.
> The worse thing is that cost of #339 and #348 are same. By current volcano planner algorithm
 the first met rel will be chosen, so unexpected rel is chosen
> A simple approach to fix this is to refine the rel choosing algorithm: when two rels
are equal in cost, choose a "simpler" one. Since we don't have a perfect measurement of "simple",
we simply choose the rel with smaller toString() length



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message