impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5196) Consider reducing partition fanout in Hash Join
Date Tue, 11 Apr 2017 16:38:41 GMT
Tim Armstrong created IMPALA-5196:
-------------------------------------

             Summary: Consider reducing partition fanout in Hash Join
                 Key: IMPALA-5196
                 URL: https://issues.apache.org/jira/browse/IMPALA-5196
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


Reducing the partition fanout would reduce the minimum memory requirement. The downside is
that it can increase the number of required repartitions while spilling, or require spilling
a bit more data (because the partition granularity is larger).

There's no effect on in-memory perf on TPC-H but there appears to be some in the targeted
perf:
{code}
+-----------+-----------------------+---------+------------+------------+----------------+
| Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCH(_60) | parquet / none / none | 17.62   | +0.23%     | 12.10      | -0.16%         |
+-----------+-----------------------+---------+------------+------------+----------------+

+-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)
| Base StdDev(%) | Num Clients | Iters |
+-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.88  | 11.36       |   +4.56%   |   0.02%
  |   0.39%        | 1           | 5     |
| TPCH(_60) | TPCH-Q21 | parquet / none / none | 68.54  | 67.43       |   +1.64%   |   1.60%
  |   0.33%        | 1           | 5     |
| TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.44   | 2.40        |   +1.60%   |   5.36%
  |   3.88%        | 1           | 5     |
| TPCH(_60) | TPCH-Q7  | parquet / none / none | 36.05  | 35.49       |   +1.59%   |   0.42%
  |   0.27%        | 1           | 5     |
| TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.86   | 7.80        |   +0.80%   |   0.77%
  |   0.58%        | 1           | 5     |
| TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.84  | 10.79       |   +0.49%   |   1.57%
  |   1.54%        | 1           | 5     |
| TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.49   | 6.46        |   +0.38%   |   0.83%
  |   0.48%        | 1           | 5     |
| TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.21  | 12.18       |   +0.27%   |   0.67%
  |   1.31%        | 1           | 5     |
| TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.62        |   +0.24%   |   0.45%
  |   0.43%        | 1           | 5     |
| TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.94  | 10.92       |   +0.18%   |   0.38%
  |   0.60%        | 1           | 5     |
| TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.71  | 27.70       |   +0.05%   |   0.53%
  |   0.64%        | 1           | 5     |
| TPCH(_60) | TPCH-Q18 | parquet / none / none | 56.23  | 56.29       |   -0.11%   |   0.94%
  |   0.55%        | 1           | 5     |
| TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.50  | 13.53       |   -0.18%   |   0.50%
  |   1.86%        | 1           | 5     |
| TPCH(_60) | TPCH-Q17 | parquet / none / none | 16.45  | 16.55       |   -0.60%   |   1.06%
  |   0.82%        | 1           | 5     |
| TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.33   | 9.39        |   -0.62%   |   1.04%
  |   1.53%        | 1           | 5     |
| TPCH(_60) | TPCH-Q9  | parquet / none / none | 36.26  | 36.59       |   -0.92%   |   0.63%
  |   0.31%        | 1           | 5     |
| TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.58   | 4.63        |   -1.04%   |   0.79%
  |   0.47%        | 1           | 5     |
| TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.86   | 7.96        |   -1.31%   |   0.29%
  |   0.81%        | 1           | 5     |
| TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 22.33       |   -1.60%   |   0.19%
  |   0.67%        | 1           | 5     |
| TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.22   | 3.29        |   -2.23%   |   3.87%
  |   0.77%        | 1           | 5     |
| TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.06   | 8.25        |   -2.33%   |   1.83%
  |   3.19%        | 1           | 5     |
| TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.47   | 5.69        |   -4.02%   |   0.85%
  |   1.35%        | 1           | 5     |
+-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
{code}
{code}
+--------------------+-----------------------+---------+------------+------------+----------------+
| Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean)
|
+--------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_60) | parquet / none / none | 18.92   | +2.34%     | 5.00       | +1.85%
        |
+--------------------+-----------------------+---------+------------+------------+----------------+

+--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------
------+-------+
| Workload           | Query                                                  | File Format
          | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Cl
ients | Iters |
+--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------
------+-------+
| TARGETED-PERF(_60) | primitive_exchange_broadcast                           | parquet /
none / none | 57.73  | 39.60       | R +45.78%  |   0.32%    |   2.60%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_conjunct_ordering_1                          | parquet /
none / none | 0.11   | 0.09        | R +31.01%  | * 21.46% * | * 26.37% *     | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_topn_bigint                                  | parquet /
none / none | 5.19   | 4.41        |   +17.76%  | * 14.63% * |   8.25%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_in_predicate                          | parquet /
none / none | 2.39   | 2.27        |   +5.06%   |   4.95%    |   4.89%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_exchange_shuffle                             | parquet /
none / none | 85.73  | 82.08       |   +4.44%   |   2.41%    |   0.55%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_string_non_selective                  | parquet /
none / none | 1.89   | 1.84        |   +2.56%   |   1.27%    |   1.16%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_groupby_bigint_lowndv                        | parquet /
none / none | 3.85   | 3.77        |   +2.11%   |   2.73%    |   1.72%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_bigint_selective                      | parquet /
none / none | 0.18   | 0.17        |   +1.95%   |   6.48%    |   1.66%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_decimal_selective                     | parquet /
none / none | 1.66   | 1.63        |   +1.74%   |   1.88%    |   0.17%        | 1     
      | 5     |
| TARGETED-PERF(_60) | PERF_STRING-Q3                                         | parquet /
none / none | 3.76   | 3.70        |   +1.59%   |   0.62%    |   0.62%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_string_selective                      | parquet /
none / none | 1.94   | 1.92        |   +1.36%   |   1.35%    |   1.24%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_shuffle_join_union_all_with_groupby          | parquet /
none / none | 49.09  | 48.47       |   +1.29%   |   0.11%    |   0.30%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_decimal_non_selective                 | parquet /
none / none | 1.70   | 1.68        |   +1.28%   |   2.00%    |   1.80%        | 1     
      | 5     |
| TARGETED-PERF(_60) | primitive_filter_string_like                           | parquet /
none / none | 14.83  | 14.66       |   +1.17%   |   0.01%    |   0.14%        | 1        
  | 5     |

{code}


I'm not considering reducing the aggregation fanout because there's quite a strong "pre-partitioning"
effect between the pre-aggregation and merge aggregation - reducing the fanout led to big
regressions on some large aggregations in my experiment:
{code}
Report Generated on 2017-04-07
Run Description: "Base: c24142491b3e140aac8c6818668472707d015cc7 vs Ref: e1395f670f30df823adf682101b00fc72e15bd71"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 2.9.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE ()

+-----------+-----------------------+---------+------------+------------+----------------+
| Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCH(_60) | parquet / none / none | 18.84   | +3.78%     | 12.28      | -0.19%         |
+-----------+-----------------------+---------+------------+------------+----------------+

+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)
 | Base StdDev(%) | Num Clients | Iters |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_60) | TPCH-Q18 | parquet / none / none | 86.46  | 56.09       | R +54.15%  | * 12.70%
* |   0.60%        | 1           | 5     |
| TPCH(_60) | TPCH-Q17 | parquet / none / none | 17.71  | 16.32       |   +8.55%   |   0.77%
   |   1.05%        | 1           | 5     |
| TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.21   | 3.13        |   +2.73%   |   5.51%
   |   3.46%        | 1           | 5     |
| TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.63  | 13.31       |   +2.41%   |   1.04%
   |   0.97%        | 1           | 5     |
| TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.99  | 10.86       |   +1.12%   |   0.29%
   |   0.49%        | 1           | 5     |
| TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.43   | 2.41        |   +0.64%   |   2.18%
   |   3.88%        | 1           | 5     |
| TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.07   | 8.05        |   +0.15%   |   1.59%
   |   1.00%        | 1           | 5     |
| TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 21.94       |   +0.13%   |   0.30%
   |   0.19%        | 1           | 5     |
| TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.80   | 7.80        |   +0.03%   |   0.42%
   |   0.63%        | 1           | 5     |
| TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.16  | 12.17       |   -0.09%   |   1.14%
   |   0.54%        | 1           | 5     |
| TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.68        |   -0.44%   |   0.54%
   |   0.48%        | 1           | 5     |
| TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.52   | 6.56        |   -0.66%   |   0.94%
   |   0.77%        | 1           | 5     |
| TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.30   | 9.37        |   -0.81%   |   0.19%
   |   1.81%        | 1           | 5     |
| TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.56   | 4.60        |   -0.84%   |   0.60%
   |   0.81%        | 1           | 5     |
| TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.55  | 27.84       |   -1.03%   |   1.32%
   |   0.92%        | 1           | 5     |
| TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.79   | 7.88        |   -1.12%   |   1.10%
   |   0.27%        | 1           | 5     |
| TPCH(_60) | TPCH-Q21 | parquet / none / none | 67.65  | 69.84       |   -3.14%   |   0.25%
   |   0.13%        | 1           | 5     |
| TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.45   | 5.67        |   -3.73%   |   0.96%
   |   1.46%        | 1           | 5     |
| TPCH(_60) | TPCH-Q9  | parquet / none / none | 33.72  | 36.25       |   -6.96%   |   0.54%
   |   0.27%        | 1           | 5     |
| TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.22  | 11.06       |   -7.60%   |   0.23%
   |   0.63%        | 1           | 5     |
| TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.86  | 13.32       |   -10.96%  |   0.62%
   |   1.00%        | 1           | 5     |
| TPCH(_60) | TPCH-Q7  | parquet / none / none | 35.72  | 45.17       | I -20.93%  |   0.47%
   |   6.56%        | 1           | 5     |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

(R) Regression: TPCH(_60) TPCH-Q18 [parquet / none / none] (56.09s -> 86.46s [+54.15%])
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg    | Base Avg | Delta(Avg) | StdDev(%)  | Max    | Base
Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
| 13:AGGREGATE | 56.02%     | 61.84s | 27.05s   | +128.61%   |   9.36%    | 71.87s | 27.37s
  | +162.55%   | 1      | 3.85K  | 8.60M     |
| 04:AGGREGATE | 15.87%     | 17.52s | 23.23s   | -24.59%    | * 30.12% * | 26.50s | 23.43s
  | +13.09%    | 1      | 90.00M | 86.02M    |
| 05:HASH JOIN | 18.97%     | 20.95s | 23.27s   | -9.99%     | * 14.01% * | 26.01s | 23.56s
  | +10.37%    | 1      | 92.82K | 360.01M   |
| 02:SCAN HDFS | 4.13%      | 4.56s  | 4.62s    | -1.31%     |   1.14%    | 4.64s  | 4.71s
   | -1.42%     | 1      | 92.82K | 360.01M   |
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+

(I) Improvement: TPCH(_60) TPCH-Q7 [parquet / none / none] (45.17s -> 35.72s [-20.93%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base
Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| 11:AGGREGATE | 4.63%      | 1.75s    | 1.80s    | -2.69%     |   0.30%   | 1.76s    | 1.86s
   | -5.38%     | 1      | 4       | 1.64M     |
| 10:HASH JOIN | 10.12%     | 3.83s    | 4.33s    | -11.56%    |   0.50%   | 3.86s    | 4.50s
   | -14.20%    | 1      | 350.34K | 36.00M    |
| 09:HASH JOIN | 5.43%      | 2.06s    | 2.52s    | -18.44%    |   1.58%   | 2.11s    | 2.66s
   | -20.51%    | 1      | 109.37M | 36.00M    |
| 08:HASH JOIN | 9.95%      | 3.77s    | 5.09s    | -25.84%    |   0.31%   | 3.79s    | 5.34s
   | -29.10%    | 1      | 109.37M | 36.00M    |
| 03:SCAN HDFS | 2.47%      | 935.77ms | 947.71ms | -1.26%     |   1.11%   | 945.37ms | 969.54ms
| -2.49%     | 1      | 9.00M   | 9.00M     |
| 07:HASH JOIN | 34.60%     | 13.11s   | 19.43s   | -32.54%    |   0.51%   | 13.21s   | 21.49s
  | -38.55%    | 1      | 109.37M | 36.00M    |
| 14:EXCHANGE  | 2.50%      | 947.18ms | 931.62ms | +1.67%     |   1.96%   | 978.10ms | 948.27ms
| +3.14%     | 1      | 90.00M  | 90.00M    |
| 02:SCAN HDFS | 3.14%      | 1.19s    | 1.20s    | -0.97%     |   0.69%   | 1.20s    | 1.22s
   | -1.35%     | 1      | 90.00M  | 90.00M    |
| 06:HASH JOIN | 23.24%     | 8.81s    | 9.67s    | -8.92%     |   0.52%   | 8.87s    | 10.16s
  | -12.71%    | 1      | 109.37M | 36.00M    |
| 00:SCAN HDFS | 2.32%      | 879.94ms | 890.98ms | -1.24%     |   1.29%   | 891.75ms | 892.31ms
| -0.06%     | 1      | 600.00K | 600.00K   |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+

(V) Significant Variability: TPCH(_60) TPCH-Q18 [parquet / none / none] (0.60% -> 12.70%)
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
| Operator     | % of Query | StdDev(%) | Base StdDev(%) | Delta(StdDev(%)) | #Hosts | #Rows
 | Est #Rows |
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
| 04:AGGREGATE | 15.87%     | 30.12%    | 0.65%          | +4526.94%        | 1      | 90.00M
| 86.02M    |
| 05:HASH JOIN | 18.97%     | 14.01%    | 0.98%          | +1323.57%        | 1      | 92.82K
| 360.01M   |
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+

{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message