drill-dev mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Hash Agg vs Streaming Agg for a smaller data set
Date Fri, 10 Jul 2015 23:48:42 GMT
My guess is that the second query operates on a smaller dataset, which
makes the cost of sorting low enough that Sort + StreamAgg comes out
cheaper than the HashAgg.
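
One way to check this guess is to compare the planner's cost estimates for both aggregation strategies on the second query. The sketch below was not run in this thread. It assumes the session options `planner.enable_streamagg` and `planner.enable_hashagg` are available in the Drill version in use, and that `EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR` prints per-operator cost estimates.

```sql
-- Step 1: show the plan the optimizer picks by default, with cost attributes.
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT DISTINCT l_modline, l_moddate
FROM `tpch_multiple_partitions/lineitem_twopart`
WHERE l_moddate = DATE '1992-01-01' AND l_shipdate = DATE '1992-01-01';

-- Step 2: disable the streaming aggregate so the planner has to use HashAgg
-- (assumption: this option exists and is honored for this query).
ALTER SESSION SET `planner.enable_streamagg` = false;

EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT DISTINCT l_modline, l_moddate
FROM `tpch_multiple_partitions/lineitem_twopart`
WHERE l_moddate = DATE '1992-01-01' AND l_shipdate = DATE '1992-01-01';

-- Step 3: restore the setting so later queries are planned normally.
ALTER SESSION SET `planner.enable_streamagg` = true;
```

If the estimated cost of the HashAgg plan from step 2 is higher than the Sort + StreamAgg plan from step 1, that would be consistent with the explanation above.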

On Fri, Jul 10, 2015 at 4:27 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Hi,
>
> Info about the data: it is auto-partitioned TPC-H 0.01 data. The second
> filter is on a non-partitioned column, so in the first case the 'OR'
> predicate results in a full-table scan, while in the second case partition
> pruning takes effect.
>
> The first case results in a hash agg and the second case in a streaming
> agg. Any idea why?
>
> 1. explain plan for select distinct l_modline, l_moddate from
> `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
> '1992-01-01' or l_shipdate=date'1992-01-01';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(l_modline=[$0], l_moddate=[$1])
> 00-02        Project(l_modline=[$0], l_moddate=[$1])
> 00-03          HashAgg(group=[{0, 1}])
> 00-04            Project(l_modline=[$2], l_moddate=[$0])
> 00-05              SelectionVectorRemover
> 00-06                Filter(condition=[OR(=($0, 1992-01-01), =($1, 1992-01-01))])
> 00-07                  Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
> 00-08                    Scan..........
>
> 2. explain plan for select distinct l_modline, l_moddate from
> `tpch_multiple_partitions/lineitem_twopart` where l_moddate=date
> '1992-01-01' and l_shipdate=date'1992-01-01';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(l_modline=[$0], l_moddate=[$1])
> 00-02        Project(l_modline=[$0], l_moddate=[$1])
> 00-03          StreamAgg(group=[{0, 1}])
> 00-04            Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-05              Project(l_modline=[$2], l_moddate=[$0])
> 00-06                SelectionVectorRemover
> 00-07                  Filter(condition=[AND(=($0, 1992-01-01), =($1, 1992-01-01))])
> 00-08                    Project(l_moddate=[$2], l_shipdate=[$1], l_modline=[$0])
> 00-09                      Scan.....................
>
> - Rahul
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com
