impala-issues mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-1391) TPC-DS query 17 very slow
Date Tue, 04 Apr 2017 17:32:41 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-1391.
-------------------------------------
    Resolution: Fixed

Runtime filters speed up the query by more than 10x.
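Runtime filters work like this: the build side of a hash join publishes a compact summary of its join keys, typically a Bloom filter, and the scan feeding the probe side uses that summary to drop rows that cannot match. Those rows never reach the join, so they are not shuffled, hashed, or spilled. The toy single-process sketch below illustrates the idea under stated assumptions. The `BloomFilter` class, the `hash_join_with_runtime_filter` helper, and the table and column names are all hypothetical. None of this is Impala's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (illustrative): no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def hash_join_with_runtime_filter(build_rows, probe_rows, key):
    """Inner-join two lists of dicts on `key`; return (joined rows, probe rows scanned)."""
    # Build phase: hash table on the smaller input, plus a runtime filter on its keys.
    table = {}
    runtime_filter = BloomFilter()
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
        runtime_filter.add(row[key])

    # Scan phase: drop probe rows whose key cannot match, before they reach the join.
    scanned = [row for row in probe_rows if runtime_filter.might_contain(row[key])]

    # Probe phase: the exact hash lookup removes any Bloom false positives.
    joined = [{**b, **p} for p in scanned for b in table.get(p[key], [])]
    return joined, len(scanned)


# Usage with hypothetical data: a 10-row "dimension" and a 10,000-row "fact" table.
dim = [{"date_sk": k, "quarter": "Q1"} for k in range(10)]
fact = [{"date_sk": k % 1000, "amount": k} for k in range(10_000)]
rows, scanned = hash_join_with_runtime_filter(dim, fact, "date_sk")
print(len(rows), scanned)  # 100 matches; scanned is 100 plus any false positives
```

Only about 1% of the probe rows survive the scan in this example. The exact hash-table lookup in the probe phase keeps the result correct even when the Bloom filter lets a non-matching key through.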

> TPC-DS query 17 very slow
> -------------------------
>
>                 Key: IMPALA-1391
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1391
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Perf Investigation
>    Affects Versions: Impala 2.0
>            Reporter: David Rorke
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>              Labels: performance
>             Fix For: Impala 2.5.0
>
>         Attachments: q17.profile, q17_shuffle_hint.plan, tpc_ds_q17.pdf
>
>
> TPC-DS query 17 takes 56 minutes on a 15 TB scale factor data set (20-node cluster).
> This is with explicit partition filters added on each of the large fact tables. A few points
> I noticed in the plan/profile:
> (1) The bulk of the time is spent in the joins and aggregation. There is some spilling
> (mostly in one of the joins).
> (2) The plan uses broadcast joins in all cases, even when joining large tables/result
> sets.
> (3) I rewrote the query to use SQL-92-style joins and added "shuffle" hints on what should
> be the larger joins. The resulting plan uses a partitioned join for one of the two cases where
> I added a shuffle hint, but continues to use a broadcast join for the other large join.
> The profile for the original query and the explain plan output for the modified (hinted)
> query are attached.
> Published results from HWX (Hortonworks) claim that this query runs in 300 seconds with Hive/Tez
> at a 30 TB scale factor (we haven't independently verified this Hive time).
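Point (2) concerns the planner's choice between two join strategies. A broadcast join copies the entire build side to every node. A partitioned (shuffle) join hash-partitions both inputs on the join key. The sketch below makes a simplified network-cost comparison in that spirit. It assumes per-input size estimates in GB, and the `JoinInput` type, function names, and exact formulas are illustrative assumptions rather than Impala's planner code. It shows why an underestimated build side can make broadcast look cheaper on a 20-node cluster.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    """One join input with a planner-style size estimate (hypothetical units: GB)."""
    name: str
    est_gb: float


def broadcast_cost(build: JoinInput, num_nodes: int) -> float:
    # A broadcast join ships a full copy of the build side to every node.
    return build.est_gb * num_nodes


def partitioned_cost(probe: JoinInput, build: JoinInput) -> float:
    # A partitioned (shuffle) join sends roughly every row of both inputs over the network once.
    return probe.est_gb + build.est_gb


def choose_join_strategy(probe: JoinInput, build: JoinInput, num_nodes: int) -> str:
    """Pick the strategy with the lower estimated network cost (ties go to broadcast)."""
    if broadcast_cost(build, num_nodes) <= partitioned_cost(probe, build):
        return "broadcast"
    return "partitioned"


# Hypothetical sizes on a 20-node cluster, as in the report.
probe = JoinInput("fact_a", 1000.0)
accurate = choose_join_strategy(probe, JoinInput("fact_b", 200.0), 20)
underestimated = choose_join_strategy(probe, JoinInput("fact_b", 10.0), 20)
print(accurate, underestimated)  # partitioned broadcast
```

With an accurate 200 GB build-side estimate, a broadcast would move 4000 GB against 1200 GB for a partitioned join. If the same input is estimated at 10 GB, a broadcast appears to cost only 200 GB, and the planner in this model picks it even though the real build side is large.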



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
