impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5196) Consider reducing partition fanout in Hash Join
Date Thu, 13 Jul 2017 20:36:00 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5196.
-----------------------------------
    Resolution: Won't Fix

It looks like we probably won't need to do this to get the memory consumption low enough -
the buffer size scaling seems to be effective at reducing memory consumption for small joins.

We can revisit this later - maybe it makes sense to scale the number of partitions instead of
the buffer size in some cases, but that adds extra complexity that doesn't seem necessary for now.
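
To make the two knobs concrete, here is a minimal sketch, assuming the minimum reservation
of a partitioned hash join is roughly one write buffer per partition. The function name, the
fanout values, and the buffer sizes are illustrative assumptions, not values taken from Impala.

```python
MIB = 1024 * 1024


def min_join_reservation(fanout: int, buffer_bytes: int) -> int:
    """Rough lower bound on join memory under the assumed model:
    one write buffer must be held for every partition."""
    if fanout < 1 or buffer_bytes < 1:
        raise ValueError("fanout and buffer_bytes must be positive")
    return fanout * buffer_bytes


# Hypothetical baseline: 16 partitions with 8 MiB buffers.
baseline = min_join_reservation(16, 8 * MIB)

# Knob 1: keep the fanout, shrink the buffers (buffer size scaling).
smaller_buffers = min_join_reservation(16, 2 * MIB)

# Knob 2: keep the buffers, shrink the fanout (what this issue proposed).
lower_fanout = min_join_reservation(4, 8 * MIB)

print(baseline // MIB, smaller_buffers // MIB, lower_fanout // MIB)
```

Under this simplified model both knobs cut the minimum by the same factor, which is why
scaling the buffer size alone can be enough for small joins.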

> Consider reducing partition fanout in Hash Join
> -----------------------------------------------
>
>                 Key: IMPALA-5196
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5196
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>              Labels: resource-management
>
> Reducing the partition fanout would reduce the minimum memory requirement. The downside
> is that it can increase the number of required repartitions while spilling, or require spilling
> a bit more data (because the partition granularity is larger).
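
The two downsides can be sketched with a simplified model, assuming rows hash uniformly
across partitions and ignoring the memory held by buffers of spilled partitions. The function
names and sizes below are hypothetical and only illustrate the arithmetic.

```python
def partition_levels(build_bytes: int, mem_bytes: int, fanout: int) -> int:
    """Number of partitioning levels needed until one partition of the
    build side fits in memory, assuming each level splits evenly."""
    if fanout < 2 or mem_bytes < 1:
        raise ValueError("fanout must be >= 2 and mem_bytes positive")
    levels = 0
    size = build_bytes
    while size > mem_bytes:
        size = -(-size // fanout)  # ceiling division
        levels += 1
    return levels


def spilled_bytes(build_bytes: int, mem_bytes: int, fanout: int) -> int:
    """Bytes spilled when whole partitions must be evicted to free
    the excess over the memory limit."""
    if build_bytes <= mem_bytes:
        return 0
    partition = -(-build_bytes // fanout)
    excess = build_bytes - mem_bytes
    evicted = -(-excess // partition)  # whole partitions only
    return min(evicted * partition, build_bytes)


GIB = 1024 ** 3

# A 64 GiB build side with 1 GiB of memory: a lower fanout needs more levels.
print(partition_levels(64 * GIB, GIB, 16), partition_levels(64 * GIB, GIB, 4))

# A build side 1024 units over the limit: coarser partitions spill more.
print(spilled_bytes(10240, 9216, 16), spilled_bytes(10240, 9216, 4))
```

In this model a fanout of 4 needs one more level than a fanout of 16 for the same input,
and spills twice as much when the build side only slightly exceeds memory.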
> There's no effect on in-memory perf on TPC-H. On targeted perf there was one regression
> because the broadcast join spilled a partition:
> {code}
> +-----------+-----------------------+---------+------------+------------+----------------+
> | Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +-----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(_60) | parquet / none / none | 17.62   | +0.23%     | 12.10      | -0.16%         |
> +-----------+-----------------------+---------+------------+------------+----------------+
> +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
> +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.88  | 11.36       |   +4.56%   |   0.02%   |   0.39%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q21 | parquet / none / none | 68.54  | 67.43       |   +1.64%   |   1.60%   |   0.33%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.44   | 2.40        |   +1.60%   |   5.36%   |   3.88%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q7  | parquet / none / none | 36.05  | 35.49       |   +1.59%   |   0.42%   |   0.27%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.86   | 7.80        |   +0.80%   |   0.77%   |   0.58%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.84  | 10.79       |   +0.49%   |   1.57%   |   1.54%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.49   | 6.46        |   +0.38%   |   0.83%   |   0.48%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.21  | 12.18       |   +0.27%   |   0.67%   |   1.31%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.62        |   +0.24%   |   0.45%   |   0.43%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.94  | 10.92       |   +0.18%   |   0.38%   |   0.60%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.71  | 27.70       |   +0.05%   |   0.53%   |   0.64%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q18 | parquet / none / none | 56.23  | 56.29       |   -0.11%   |   0.94%   |   0.55%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.50  | 13.53       |   -0.18%   |   0.50%   |   1.86%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q17 | parquet / none / none | 16.45  | 16.55       |   -0.60%   |   1.06%   |   0.82%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.33   | 9.39        |   -0.62%   |   1.04%   |   1.53%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q9  | parquet / none / none | 36.26  | 36.59       |   -0.92%   |   0.63%   |   0.31%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.58   | 4.63        |   -1.04%   |   0.79%   |   0.47%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.86   | 7.96        |   -1.31%   |   0.29%   |   0.81%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 22.33       |   -1.60%   |   0.19%   |   0.67%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.22   | 3.29        |   -2.23%   |   3.87%   |   0.77%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.06   | 8.25        |   -2.33%   |   1.83%   |   3.19%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.47   | 5.69        |   -4.02%   |   0.85%   |   1.35%        | 1           | 5     |
> +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> {code}
> {code}
> +--------------------+-----------------------+---------+------------+------------+----------------+
> | Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +--------------------+-----------------------+---------+------------+------------+----------------+
> | TARGETED-PERF(_60) | parquet / none / none | 18.92   | +2.34%     | 5.00       | +1.85%         |
> +--------------------+-----------------------+---------+------------+------------+----------------+
> +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | Workload           | Query                                                  | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | TARGETED-PERF(_60) | primitive_exchange_broadcast                           | parquet / none / none | 57.73  | 39.60       | R +45.78%  |   0.32%    |   2.60%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_conjunct_ordering_1                          | parquet / none / none | 0.11   | 0.09        | R +31.01%  | * 21.46% * | * 26.37% *     | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_topn_bigint                                  | parquet / none / none | 5.19   | 4.41        |   +17.76%  | * 14.63% * |   8.25%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_in_predicate                          | parquet / none / none | 2.39   | 2.27        |   +5.06%   |   4.95%    |   4.89%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_exchange_shuffle                             | parquet / none / none | 85.73  | 82.08       |   +4.44%   |   2.41%    |   0.55%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_string_non_selective                  | parquet / none / none | 1.89   | 1.84        |   +2.56%   |   1.27%    |   1.16%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 3.85   | 3.77        |   +2.11%   |   2.73%    |   1.72%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_bigint_selective                      | parquet / none / none | 0.18   | 0.17        |   +1.95%   |   6.48%    |   1.66%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_decimal_selective                     | parquet / none / none | 1.66   | 1.63        |   +1.74%   |   1.88%    |   0.17%        | 1           | 5     |
> | TARGETED-PERF(_60) | PERF_STRING-Q3                                         | parquet / none / none | 3.76   | 3.70        |   +1.59%   |   0.62%    |   0.62%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_string_selective                      | parquet / none / none | 1.94   | 1.92        |   +1.36%   |   1.35%    |   1.24%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 49.09  | 48.47       |   +1.29%   |   0.11%    |   0.30%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_decimal_non_selective                 | parquet / none / none | 1.70   | 1.68        |   +1.28%   |   2.00%    |   1.80%        | 1           | 5     |
> | TARGETED-PERF(_60) | primitive_filter_string_like                           | parquet / none / none | 14.83  | 14.66       |   +1.17%   |   0.01%    |   0.14%        | 1           | 5     |
> {code}
> I'm not considering reducing the aggregation fanout because there's quite a strong "pre-partitioning"
> effect between the pre-aggregation and merge aggregation - reducing the fanout led to big
> regressions on some large aggregations in my experiment:
> {code}
> Report Generated on 2017-04-07
> Run Description: "Base: c24142491b3e140aac8c6818668472707d015cc7 vs Ref: e1395f670f30df823adf682101b00fc72e15bd71"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:          impalad version 2.9.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE ()
> +-----------+-----------------------+---------+------------+------------+----------------+
> | Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +-----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(_60) | parquet / none / none | 18.84   | +3.78%     | 12.28      | -0.19%         |
> +-----------+-----------------------+---------+------------+------------+----------------+
> +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | TPCH(_60) | TPCH-Q18 | parquet / none / none | 86.46  | 56.09       | R +54.15%  | * 12.70% * |   0.60%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q17 | parquet / none / none | 17.71  | 16.32       |   +8.55%   |   0.77%    |   1.05%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.21   | 3.13        |   +2.73%   |   5.51%    |   3.46%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.63  | 13.31       |   +2.41%   |   1.04%    |   0.97%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.99  | 10.86       |   +1.12%   |   0.29%    |   0.49%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.43   | 2.41        |   +0.64%   |   2.18%    |   3.88%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.07   | 8.05        |   +0.15%   |   1.59%    |   1.00%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 21.94       |   +0.13%   |   0.30%    |   0.19%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.80   | 7.80        |   +0.03%   |   0.42%    |   0.63%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.16  | 12.17       |   -0.09%   |   1.14%    |   0.54%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.68        |   -0.44%   |   0.54%    |   0.48%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.52   | 6.56        |   -0.66%   |   0.94%    |   0.77%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.30   | 9.37        |   -0.81%   |   0.19%    |   1.81%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.56   | 4.60        |   -0.84%   |   0.60%    |   0.81%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.55  | 27.84       |   -1.03%   |   1.32%    |   0.92%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.79   | 7.88        |   -1.12%   |   1.10%    |   0.27%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q21 | parquet / none / none | 67.65  | 69.84       |   -3.14%   |   0.25%    |   0.13%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.45   | 5.67        |   -3.73%   |   0.96%    |   1.46%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q9  | parquet / none / none | 33.72  | 36.25       |   -6.96%   |   0.54%    |   0.27%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.22  | 11.06       |   -7.60%   |   0.23%    |   0.63%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.86  | 13.32       |   -10.96%  |   0.62%    |   1.00%        | 1           | 5     |
> | TPCH(_60) | TPCH-Q7  | parquet / none / none | 35.72  | 45.17       | I -20.93%  |   0.47%    |   6.56%        | 1           | 5     |
> +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> (R) Regression: TPCH(_60) TPCH-Q18 [parquet / none / none] (56.09s -> 86.46s [+54.15%])
> +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
> | Operator     | % of Query | Avg    | Base Avg | Delta(Avg) | StdDev(%)  | Max    | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
> +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
> | 13:AGGREGATE | 56.02%     | 61.84s | 27.05s   | +128.61%   |   9.36%    | 71.87s | 27.37s   | +162.55%   | 1      | 3.85K  | 8.60M     |
> | 04:AGGREGATE | 15.87%     | 17.52s | 23.23s   | -24.59%    | * 30.12% * | 26.50s | 23.43s   | +13.09%    | 1      | 90.00M | 86.02M    |
> | 05:HASH JOIN | 18.97%     | 20.95s | 23.27s   | -9.99%     | * 14.01% * | 26.01s | 23.56s   | +10.37%    | 1      | 92.82K | 360.01M   |
> | 02:SCAN HDFS | 4.13%      | 4.56s  | 4.62s    | -1.31%     |   1.14%    | 4.64s  | 4.71s    | -1.42%     | 1      | 92.82K | 360.01M   |
> +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
> (I) Improvement: TPCH(_60) TPCH-Q7 [parquet / none / none] (45.17s -> 35.72s [-20.93%])
> +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
> | Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |
> +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
> | 11:AGGREGATE | 4.63%      | 1.75s    | 1.80s    | -2.69%     |   0.30%   | 1.76s    | 1.86s    | -5.38%     | 1      | 4       | 1.64M     |
> | 10:HASH JOIN | 10.12%     | 3.83s    | 4.33s    | -11.56%    |   0.50%   | 3.86s    | 4.50s    | -14.20%    | 1      | 350.34K | 36.00M    |
> | 09:HASH JOIN | 5.43%      | 2.06s    | 2.52s    | -18.44%    |   1.58%   | 2.11s    | 2.66s    | -20.51%    | 1      | 109.37M | 36.00M    |
> | 08:HASH JOIN | 9.95%      | 3.77s    | 5.09s    | -25.84%    |   0.31%   | 3.79s    | 5.34s    | -29.10%    | 1      | 109.37M | 36.00M    |
> | 03:SCAN HDFS | 2.47%      | 935.77ms | 947.71ms | -1.26%     |   1.11%   | 945.37ms | 969.54ms | -2.49%     | 1      | 9.00M   | 9.00M     |
> | 07:HASH JOIN | 34.60%     | 13.11s   | 19.43s   | -32.54%    |   0.51%   | 13.21s   | 21.49s   | -38.55%    | 1      | 109.37M | 36.00M    |
> | 14:EXCHANGE  | 2.50%      | 947.18ms | 931.62ms | +1.67%     |   1.96%   | 978.10ms | 948.27ms | +3.14%     | 1      | 90.00M  | 90.00M    |
> | 02:SCAN HDFS | 3.14%      | 1.19s    | 1.20s    | -0.97%     |   0.69%   | 1.20s    | 1.22s    | -1.35%     | 1      | 90.00M  | 90.00M    |
> | 06:HASH JOIN | 23.24%     | 8.81s    | 9.67s    | -8.92%     |   0.52%   | 8.87s    | 10.16s   | -12.71%    | 1      | 109.37M | 36.00M    |
> | 00:SCAN HDFS | 2.32%      | 879.94ms | 890.98ms | -1.24%     |   1.29%   | 891.75ms | 892.31ms | -0.06%     | 1      | 600.00K | 600.00K   |
> +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
> (V) Significant Variability: TPCH(_60) TPCH-Q18 [parquet / none / none] (0.60% -> 12.70%)
> +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
> | Operator     | % of Query | StdDev(%) | Base StdDev(%) | Delta(StdDev(%)) | #Hosts | #Rows  | Est #Rows |
> +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
> | 04:AGGREGATE | 15.87%     | 30.12%    | 0.65%          | +4526.94%        | 1      | 90.00M | 86.02M    |
> | 05:HASH JOIN | 18.97%     | 14.01%    | 0.98%          | +1323.57%        | 1      | 92.82K | 360.01M   |
> +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
> {code}
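
For reading the reports above, the Delta(Avg) column appears to be the relative change of
Avg(s) against Base Avg(s). The sketch below assumes that definition; the function name is
hypothetical, and because the report presumably works from unrounded timings, recomputing from
the printed two-decimal values can differ in the last digit.

```python
def delta_avg_percent(avg_s: float, base_avg_s: float) -> float:
    """Relative change of the new average against the baseline, in percent.
    Positive values mean the new build is slower."""
    if base_avg_s <= 0:
        raise ValueError("base_avg_s must be positive")
    return (avg_s - base_avg_s) / base_avg_s * 100.0


# Rows taken from the reports above: (label, Avg(s), Base Avg(s), reported delta).
rows = [
    ("TPCH-Q18, reduced aggregation fanout", 86.46, 56.09, 54.15),
    ("primitive_exchange_broadcast", 57.73, 39.60, 45.78),
    ("TPCH-Q7, reduced aggregation fanout", 35.72, 45.17, -20.93),
]

for label, avg_s, base_avg_s, reported in rows:
    computed = delta_avg_percent(avg_s, base_avg_s)
    print(f"{label}: computed {computed:+.2f}%, reported {reported:+.2f}%")
```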



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
