drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5468) TPCH Q18 regressed ~3x due to execution plan changes
Date Wed, 12 Jul 2017 19:11:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084522#comment-16084522 ]

Jinfeng Ni commented on DRILL-5468:

[~amansinha100], you are right that the HAVING predicate {{having sum(l_quantity) > 300}}
is the one that reduces the rowcount the most.  Since it uses SUM(), having NDV would not
help with estimating this HAVING predicate.

For tpch-sf100, the rowcount is below the default broadcast threshold (10M).  Prior to DRILL-4678,
the rowcount on the broadcast side was 300k; it increased to 3M after DRILL-4678. Both
are below 10M, so the threshold alone does not explain the change.  I think it's the relative
cost comparison between broadcast and hash exchange that causes the change of plan.
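
To illustrate how a relative cost comparison can flip a plan even when both rowcounts stay
under the threshold, here is a toy sketch in Python. The cost formulas, the receiver count,
and the probe-side rowcount are assumptions made up for illustration; they are not Drill's
actual cost model and not numbers taken from the profiles. Only the 300k, 3M, and 10M figures
come from the discussion above.

```python
# Toy model only: these formulas are NOT Drill's actual cost functions.
BROADCAST_THRESHOLD = 10_000_000  # default broadcast threshold cited above


def broadcast_cost(build_rows: int, receivers: int) -> int:
    """Assumed cost: every receiving node gets a full copy of the build side."""
    return build_rows * receivers


def hash_exchange_cost(build_rows: int, probe_rows: int) -> int:
    """Assumed cost: both inputs are repartitioned, so each row is shipped once."""
    return build_rows + probe_rows


def choose_plan(build_rows: int, probe_rows: int, receivers: int) -> str:
    """Pick the cheaper distribution; broadcast is eligible only under the threshold."""
    if build_rows >= BROADCAST_THRESHOLD:
        return "hash"
    b = broadcast_cost(build_rows, receivers)
    h = hash_exchange_cost(build_rows, probe_rows)
    return "broadcast" if b < h else "hash"


RECEIVERS = 10            # assumption, loosely based on the 10+1 node cluster
PROBE_ROWS = 15_000_000   # hypothetical probe-side rowcount

before = choose_plan(300_000, PROBE_ROWS, RECEIVERS)    # rowcount prior to DRILL-4678
after = choose_plan(3_000_000, PROBE_ROWS, RECEIVERS)   # rowcount after DRILL-4678
print(before, after)
```

With these assumed inputs the cheaper option differs between the two rowcounts, although
neither crosses the 10M threshold. The direction of the switch is an artifact of the assumed
numbers and says nothing about which plan Drill actually picked in either build.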

> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>                 Key: DRILL-5468
>                 URL: https://issues.apache.org/jira/browse/DRILL-5468
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>         Environment: 10+1 node ucs-micro cluster RHEL6.4
>            Reporter: Dechang Gu
>            Assignee: Jinfeng Ni
>             Fix For: 1.11.0
>         Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
> In a regular regression test on Drill master (commit id 841ead4) TPCH Q18 on SF100 parquet
dataset took ~81 secs, while the same query on 1.10.0 took only ~27 secs.  The query time
on the commit adbf363 which is right before 841ead4 is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will be uploaded)

This message was sent by Atlassian JIRA
